[OTDev] Validation: Efficiency
Nina Jeliazkova jeliazkova.nina at gmail.com
Fri Feb 25 12:53:27 CET 2011
Andreas,

On 25 February 2011 13:28, Andreas Maunz <andreas at maunz.de> wrote:

> Nina,
>
> you are right (I think it still is the case that datasets are redundant).
> However, with different model parameters, which will probably be used a lot
> in validation, new datasets will be created.
> I think it would be definitely necessary to not store data redundantly (as
> you indicated), but that might be only part of the solution.
> So it may still be necessary to compress the amount of policies needed.
>

Well, thinking further:

1) I would implement validation splits (at least at our services) as logical
splits of the same dataset, assigning some tags, similar to what is in the
mutagenicity Benchmark dataset (look for the column "Set",
http://apps.ideaconsult.net:8080/ambit2/feature/28956 ):

http://apps.ideaconsult.net:8080/ambit2/dataset/2344?max=100

and introduce searching similar to the queries below (restricted to the
property in question).

Training set:
http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=TRAIN

Crossvalidation sets:
http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=CV1
http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=CV2
http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=CV3
...

Thus, everything is in the original dataset (or a single copy of it on
another dataset service) and there is no need for additional policies.
Different features, calculated during a validation run, would be specified
via the feature_uris[] parameter on the same dataset URI:

http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=CV3?feature_uris[]=.. ..

2) Not changing your current approach, perhaps it makes sense to introduce in
the API a resource for "groups of datasets" that could be used as a
placeholder for the URIs of several datasets, and to use some wildcards on
the policy server to ensure that only one policy is needed for the whole
group of datasets. I guess groups of datasets could be useful in other cases
as well.

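A minimal sketch of how a client might build the split URIs from option 1, assuming the dataset service accepts the `search` and `feature_uris[]` query parameters as proposed above. The helper names `split_uri` and `crossvalidation_uris` are hypothetical and not part of any OpenTox API; the sketch joins multiple query parameters with `&`, which is the standard URI query syntax.

```python
from urllib.parse import urlencode

# Dataset URI taken from the message above.
DATASET_URI = "http://apps.ideaconsult.net:8080/ambit2/dataset/2344"


def split_uri(dataset_uri, tag, feature_uris=()):
    """Build a URI selecting the rows of one dataset that carry `tag`.

    Hypothetical helper: it only assembles the query string and does not
    check that the service actually supports tag-based search.
    """
    params = [("search", tag)]
    # One feature_uris[] entry per additional feature to include.
    params += [("feature_uris[]", uri) for uri in feature_uris]
    # Keep brackets, colons and slashes readable instead of percent-encoding.
    return dataset_uri + "?" + urlencode(params, safe="[]:/")


def crossvalidation_uris(dataset_uri, folds):
    """Return the URIs for the folds tagged CV1 .. CV<folds>."""
    return [split_uri(dataset_uri, f"CV{i}") for i in range(1, folds + 1)]


training = split_uri(DATASET_URI, "TRAIN")
cv_folds = crossvalidation_uris(DATASET_URI, 3)
with_feature = split_uri(
    DATASET_URI,
    "CV3",
    ["http://apps.ideaconsult.net:8080/ambit2/feature/28956"],
)

print(training)
for uri in cv_folds:
    print(uri)
print(with_feature)
```

Since every split is a query on the same dataset URI, a single policy on that dataset would cover the training set and all folds, which is the point of the proposal.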
Nina

> Andreas
>
> Nina Jeliazkova wrote on 02/25/2011 12:06 PM:
>
>> Andreas,
>>
>> I have not thought about it in detail, but having in mind differences in
>> dataset implementation at Freiburg and ours, I think part of the problem
>> is (AFAIK) your implementation makes full copy of the dataset on each
>> run, regardless of using same URIs (e.g. as same records in the database)
>>
>> So may be this is just an implementation specific?
>>
>> Nina
>>
>> On 25 February 2011 13:02, Andreas Maunz <andreas at maunz.de
>> <mailto:andreas at maunz.de>> wrote:
>>
>> Dear all,
>>
>> since a single validation of a model on a dataset creates multiple
>> resources (currently > 50), and by the fact that everything is
>> decentralized (i.e. linked via URIs) and referenceable in OpenTox,
>> we are facing the problem that currently prohibitively high load is
>> placed on the AA services, because a policy must be created and
>> requested multiple times (and eventually deleted) for each of the
>> resources.
>>
>> For example the spike in http://tinyurl.com/6amuo8x to the very
>> right is produced by a single validation. Moreover, the validation
>> service is very slow, the AA related part alone takes at least
>> several minutes. All this is induced by the amount of single
>> policies that have to be created.
>>
>> Martin argues that currently there seems no API compliant way of
>> improving performance: One way could be to collect all URIs and
>> create a policy covering all of them at the end of the validation.
>> However, there is no way of notifying validation-involved services
>> to not create policies in the first place. Also, without policies,
>> there would be no way for validation to access the resource, since
>> default (without associated policy) is "deny".
>>
>> We consider this issue high priority, which should be dealt with
>> before everyone starts using validation in production. Perhaps we
>> would need an API extension that allows the collection strategy
>> discussed before, or are there other suggestions?
>>
>> Best regards
>> Andreas
>> _______________________________________________
>> Development mailing list
>> Development at opentox.org <mailto:Development at opentox.org>
>>
>> http://www.opentox.org/mailman/listinfo/development
>>
>>
>>
> --
> http://www.maunz.de
>
> According to my calculations the problem doesn't exist.
>
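The collection strategy mentioned in the quoted message (collect all URIs, then create one policy covering them at the end of the validation) could look roughly like the sketch below. Everything here is an assumption for illustration: `PolicyCollector`, `fake_create_policy`, the example.org URIs and the count of 50 resources are hypothetical, and the AA service is replaced by a stand-in that only counts requests. The sketch does not solve the two obstacles named in the message: services still need a way to be told not to create their own policies, and resources remain inaccessible under the default "deny" until the collected policy exists.

```python
class PolicyCollector:
    """Accumulate resource URIs and create one policy covering all of them.

    Hypothetical helper; `create_policy` is any callable that takes a list
    of URIs and returns a policy identifier.
    """

    def __init__(self, create_policy):
        self._create_policy = create_policy
        self._uris = []

    def register(self, uri):
        """Remember a resource URI, ignoring duplicates."""
        if uri not in self._uris:
            self._uris.append(uri)

    def flush(self):
        """Issue a single policy request for everything collected so far."""
        if not self._uris:
            return None
        policy_id = self._create_policy(list(self._uris))
        self._uris.clear()
        return policy_id


# Stand-in for the AA service: records each policy-creation request.
requests_seen = []


def fake_create_policy(uris):
    requests_seen.append(uris)
    return f"policy-{len(requests_seen)}"


collector = PolicyCollector(fake_create_policy)
for i in range(50):  # illustrative count; the message reports "> 50" resources
    collector.register(f"http://example.org/validation/resource/{i}")
policy_id = collector.flush()

print(len(requests_seen), policy_id)
```

With one policy per resource the stand-in would see 50 requests; with collection it sees one request that lists all 50 URIs.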