[OTDev] Luca Settimo
Vedrin Jeliazkov vedrin.jeliazkov at gmail.com
Thu Aug 4 14:28:56 CEST 2011
Hi Luca,

> could you give me some more info on the databases that you collected for AMBIT?

The database dump that is available at
http://ambit.uni-plovdiv.bg/downloads/ambit2/db/ambit2-2011051401.7z
contains the following datasets:

ECHA list of pre-registered substances (143835 entries)
ChemIDplus (structures for 80468 chemicals from the ECHA list of pre-registered substances)
Chemical Identifier Resolver (structures for 72985 chemicals from the ECHA list of pre-registered substances)
ChemDraw (structures for 22519 chemicals from the ECHA list of pre-registered substances)
CPDBAS (1547 entries)
DBPCAN (209 entries)
EPAFHM (617 entries)
FDAMDD (1216 entries)
HPVCSI (3548 entries)
HPVISD (1006 entries)
IRISTR (544 entries)
KIERBL (278 entries)
NCTRER (232 entries)
NTPBSI (2330 entries)
NTPHTS (1408 entries)
ISSCAN (1150 entries)
ISSMIC (151 entries)
ISSSTY (232 entries)
TOXCST (320 entries)
TXCST2 (960 entries)
ECETOC Technical Report No. 66 Skin irritation and corrosion Reference Chemicals data base (1995) (176 entries)
Local Lymph Node Data for the Evaluation of Skin Sensitization - Compilation of historical data (Dermatitis Vol 16 No 4 2005) (209 entries)
Local Lymph Node Data for the Evaluation of Skin Sensitization - Second compilation (Dermatitis Vol 21 No 1 2010) (108 entries)
Bioconcentration factor (BCF) Gold Standard Database (1130 entries)
Benchmark Data Set for pKa Prediction of Monoprotic Small Molecules the SMARTS Way (185 entries)
Benchmark Data Set for In Silico Prediction of Ames Mutagenicity (6512 entries)
Bursi AMES Toxicity Dataset (4337 entries)
EPI_AOP (818 entries)
EPI_BCF (685 entries)
EPI_BioHC (175 entries)
EPI_Biowin (1263 entries)
EPI_Boil_Pt (5890 entries)
EPI_Henry (1829 entries)
EPI_KM (631 entries)
EPI_KOA (308 entries)
EPI_Kowwin (15809 entries)
EPI_Melt_Pt (10051 entries)
EPI_PCKOC (788 entries)
EPI_VP (3037 entries)
EPI_WaterFrag (5764 entries)
EPI_Wskowwin (2348 entries)
TOXCST_ACEA (320 entries)
TOXCST_Attagene (320 entries)
TOXCST_BioSeek (320 entries)
TOXCST_Cellumen (320 entries)
TOXCST_CellzDirect (320 entries)
TOXCST_Gentronix (320 entries)
TOXCST_NCGC (320 entries)
TOXCST_Novascreen (320 entries)
TOXCST_Solidus (320 entries)
TOXCST_ToxRefDB (320 entries)
ECBPRS (structures and data for 80410 chemicals from the ECHA list of pre-registered substances)
OPSIN (structures for 78458 chemicals from the ECHA list of pre-registered substances)

You can also access all of the above-mentioned datasets at
https://ambit.uni-plovdiv.bg:8443/ambit2/dataset
after you log in with your OpenTox username and password at
https://ambit.uni-plovdiv.bg:8443/ambit2/opentoxuser
(You can register as an OpenTox user at http://www.opentox.org/join_form
if you haven't already).

In addition to these datasets, you can access at the same location the
PubChem Structures + Assays dataset (473965 entries). It is not included
in the MySQL dump that is available for download, in order to keep the
dump more compact.

Please note that some additional datasets (not listed above, but
available in the DB) are accessible only by OpenTox partners, due to
specific licensing requirements and agreements.

> Are you aware of this paper? [http://dx.doi.org/10.1016/j.taap.2009.08.022]
> Perhaps you will find very useful Table 1 because it shows all databases for tox that are available in the literature. Which of these
> do you have?

As you can see from the list above, there's some degree of overlap
between the references in Table 1 of this paper and the datasets
included in the OpenTox DB, but each list has entries that are absent
from the other. One major obstacle to including some of the sources
that you mention is the lack of a computer-readable bulk download for
them. In addition, the AMBIT database is evolving continuously (even as
I write these lines), so it can be somewhat hard to tell what's included
and what's not: all registered users with sufficient privileges can add
datasets at any time.

In general, the OpenTox framework (and AMBIT as one particular
implementation of the OpenTox API) provides the infrastructure to store
and process relevant data, much as the Apache HTTP server provides the
infrastructure for serving web site content. It's up to the users to
upload whatever datasets, algorithms, models, etc. they would like to
use or make available to others. So, in essence, the OpenTox DB is a
kind of starting reference point, with particular emphasis on datasets
that are relevant to the European REACH legislation, mainly due to the
specific context of the OpenTox project.

However, the OpenTox framework was designed in a generic way, to enable
its use in other domains as well. It's up to the users to install,
populate, run and maintain their own instances of OpenTox services.
Furthermore, thanks to the common API, these services can be linked
together and rely on each other for executing specific tasks (e.g. an
algorithm provided by service A can be used to build a model at service
B, using a training dataset available at service C; the model at
service B could be validated by service D and used to predict
properties for a dataset hosted at service E, etc.). You can have all of
these running on a single box, or on a private cluster, or as
(distributed) services that you offer to the public.

> So Barry told me that you have a linux version of tox-create/tox-predict? Is that true?

See my previous mail and Micha's mail for detailed answers to these
questions. The apps are platform independent and can run on any OS.
ToxPredict and its dependencies are Java-based; ToxCreate and its
dependencies are Ruby-based. As a somewhat easier first step you might
want to try the OpenTox virtual appliance, which has all of these apps
pre-installed for you on a recent version of Linux:

http://ambit.uni-plovdiv.bg/downloads/opentox/Opentox%20Virtual%20Appliance%20DC.ova

Please note that this is a large file (2730474496 bytes). Its MD5
checksum, which you can use to verify that no errors occurred while
downloading it, is:

1530bb83e88c3c646bcbac3183745bab

You can import and run the appliance in VirtualBox
(http://www.virtualbox.org/).

Let us know if we can be of further assistance.

Kind regards,
Vedrin
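
The size and checksum check described above could be scripted roughly as follows. This is only a minimal sketch and not part of the original message: the helper names and the local file name are assumptions made for illustration, while the expected size and MD5 value are the ones quoted in the message.

```python
import hashlib
import os

# Values quoted in the message above.
EXPECTED_SIZE = 2730474496
EXPECTED_MD5 = "1530bb83e88c3c646bcbac3183745bab"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size=EXPECTED_SIZE, expected_md5=EXPECTED_MD5):
    """Return True only if both the file size and the MD5 digest match."""
    # Cheap check first: a truncated download fails here without hashing.
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Hypothetical local file name; adjust to wherever the .ova was saved.
    ova_path = "Opentox Virtual Appliance DC.ova"
    if os.path.exists(ova_path):
        print("OK" if verify_download(ova_path) else "MISMATCH")
```

Comparing the size before hashing is just a convenience: it rejects an incomplete download immediately, whereas hashing a file of this size takes noticeably longer.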