[OTDev] RDF in OpenTox
chung chvng at mail.ntua.gr
Fri May 27 23:48:34 CEST 2011
Hi Egon,

Thanks a lot for your comments.

On Fri, 2011-05-27 at 19:27 +0200, Egon Willighagen wrote:
> Dear Pantelis,
>
> On Fri, May 27, 2011 at 5:50 PM, chung <chvng at mail.ntua.gr> wrote:
> > Some criticism on RDF from the experience we've gained in OpenTox:
> > http://is.gd/qLJG3h . The article is not complete yet and will be
> > enriched with more facts and diagrams.
>
> Please do, because right now you left out so much detail on what you
> are in fact doing. I do appreciate your frustration, and the
> difference is unacceptable.
>
> I have these questions:
>
> * RDF is not a format, while ARFF is a file format; you mix RDF and
> RDF/XML as if they are the same thing; why?

I use RDF to refer to RDF/XML for brevity. I should make that clear in
the text.

> * what RDF file format have you used? RDF/XML, as you later refer to?

RDF/XML solely. I have some other measurements showing that RDF/XML
performs better than any other RDF variant (using Jena).

> * are you using reasoning, and if so why? moreover, you should not
> compare a reasoning environment with a non-reasoning one (of course,
> you'd see differences)

No inference engines are involved.

> * what information is specified in the ARFF header?

Just the URIs of the features, in the order they appear in the body of
the document.

> * why aren't you using a vector annotation in RDF?
> * how large is the file, and what are you doing to use 2GB of heap space?

2.79GB to be precise :) That measurement was taken using a Java profiler
on the following piece of code:

    VRI dataset = Services.ideaconsult().augment("dataset", "585036");
    DatasetSpider dss = new DatasetSpider(dataset);
    dss.parse();

OK, it's not very clear because it's ToxOtis code, but all I do is
download an RDF document and parse it into an OntModel.

> * how large is your data set?

Approximately 2600 compounds, 177 features.

> * what does your code look like?
>

It's layered (lots of dependencies), so I can't really present it as a
standalone piece of code. It is optimized, however, meaning that with
Jena and the current OpenTox framework it's the best we can do. I would
be very interested if someone has different results with Jena or
experience with a better library.

> A fair comment would be that ARFF takes a shortcut: it imposes
> additional structure on the data, something you identify in your
> report. RDF does not do that by itself. A vector environment does.
> That does not mean that such is not possible with RDF. Have you
> considered what options there are to introduce this vector restriction
> into the computational framework, forced to use RDF? Do you believe it
> is impossible to achieve that with RDF? Would you see it impossible to
> define an ontology to capture vector notation, allowing you to specify
> what each column in that vector represents?
>

Were it machine-readable, I would suggest Greek as the proper
language/representation for serializing datasets. On the other hand, we
have ARFF: it is no more than an algorithm needs. RDF is somewhere in
the middle. Personally I like it, but we should use it wisely and for
the purposes it was designed for. And it is supposed to be a framework
for **metadata data modeling**, not data modeling. A dataset has both
parts: data + metadata. I would suggest something like ARFF for the
actual data. As far as I know from other partners in the project,
everything ends up as ARFF or matrices. People don't exploit the
flexibility of RDF, nor do they use it for reasoning. OK, after some
time there might be a need to do something very elaborate, and then RDF
will prove invaluable. But I'm more in favor of the scalable solution of
an ARFF-like representation.

Another point I want to make is that RDF is by design memory consuming.
Let me put it figuratively once again: if you read a book page by page
and forget everything once you turn to the next page, there's no way
you can say anything about the book as a whole. We need either an
improvement of RDF in this direction or an alternative scheme. And I
don't think we can impose improvised restrictions.

> Now, given that you do see that option too, you would probably end up
> with an ontology looking very much like the ARFF specification, but
> then in RDF.
>
> In short, based on your report I really cannot judge if RDF is the
> problem, because your results do not make such a conclusion possible.
> Instead, I rather think that you are running into a highly confounded
> analysis where it is not possible to assign the slowness to any
> factor. I think you are comparing two widely different data models,
> one optimized for computation (ARFF) and one not (your current RDF/XML
> file). Would that perhaps be the significant factor in the difference
> in speed?
>
> I am looking forward to a more detailed report on the various involved
> factors that determine the speed here,
>

I'll present a more detailed version of the report with more evidence.
Till then, any comments are welcome.

Best regards,
Pantelis

> Egon
>
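
A minimal sketch of the ARFF side of this comparison, assuming the
layout described in the message: a header that lists the feature URIs in
column order, followed by one comma-separated row of values per
compound. Because each row is self-contained once the header is known,
it can be processed and discarded immediately, whereas the measurement
discussed here loads the whole RDF/XML document into a Jena OntModel.
The class ArffStreamSketch, its Summary type and the example.org feature
URIs are hypothetical names chosen for illustration; this is not ToxOtis
code, and it assumes well-formed input with numeric columns and no
missing values.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ArffStreamSketch {

    /** Feature URIs in column order, plus one running sum per column. */
    public static final class Summary {
        public final List<String> featureUris = new ArrayList<>();
        public double[] sums = new double[0];
        public int rows = 0;

        public double mean(int column) {
            return sums[column] / rows;
        }
    }

    /** Reads an ARFF-like document row by row; no row is kept after it is summed. */
    public static Summary summarize(Reader source) {
        Summary s = new Summary();
        boolean inData = false;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("%")) {
                    continue; // blank line or ARFF comment
                }
                if (!inData) {
                    String lower = line.toLowerCase();
                    if (lower.startsWith("@attribute")) {
                        // "@attribute <name> <type>": the name is assumed to be the feature URI
                        String name = line.split("\\s+")[1];
                        s.featureUris.add(name.replaceAll("^['\"]|['\"]$", ""));
                    } else if (lower.startsWith("@data")) {
                        s.sums = new double[s.featureUris.size()];
                        inData = true;
                    }
                    continue; // other header lines such as @relation are ignored
                }
                // Data row: the header fixes which feature each column belongs to.
                String[] cells = line.split(",");
                for (int i = 0; i < s.sums.length; i++) {
                    s.sums[i] += Double.parseDouble(cells[i].trim());
                }
                s.rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical two-feature dataset with two compounds.
        String doc = String.join("\n",
                "@relation example",
                "@attribute http://example.org/feature/1 numeric",
                "@attribute http://example.org/feature/2 numeric",
                "@data",
                "1.0,10.0",
                "3.0,20.0");
        Summary s = summarize(new StringReader(doc));
        for (int i = 0; i < s.featureUris.size(); i++) {
            System.out.println(s.featureUris.get(i) + " mean=" + s.mean(i));
        }
    }
}
```

Memory use in this sketch grows with the number of features rather than
the number of compounds, which is the scalability property argued for in
the message. It says nothing about whether an RDF vocabulary for vectors
could offer the same property, which is the open question in the thread.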