Dear Ricardo,
The issues you raise are the ones we are currently playing with at Glasgow. It's still early days, but here are a few thoughts.
In terms of distributed queries, languages such as SPARQL support querying remote sources (for an introduction see Leigh Dodds' article http://www.xml.com/pub/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html). SPARQL has its own query protocol using web services. I've not played with this much yet, but it looks promising. Basically, your query can contain the URL of a resource and SPARQL will query it. How this will scale across multiple sources I'm not sure. Another tool to consider is TAP (http://tap.stanford.edu/).
In terms of discovery, this is a separate issue in some ways. I think an obvious thing to look at is BioMoby. A lightweight version of this would be to have an RDF source that listed data sources and described what kinds of information they contain. A bit like GBIF's UDDI registry, but actually useful ;-)
Long term, what I think might happen is that users have their own triple stores, and as they do queries the results get added to their own triple store and they can make inferences locally that they are interested in. MIT's Piggy Bank project (http://simile.mit.edu/piggy-bank/) is an example of this sort of approach.
Regards
Rod
On 3 Jan 2006, at 13:24, Ricardo Scachetti Pereira wrote:
Dear Rod,
Thanks for putting those ideas together and sharing them with the group. We really needed some more concrete examples to explore the use of LSID and RDF in the GUID metadata. It definitely sounds like a good combination of technologies to explore.
Reading your manuscript, one question kept coming to my mind. The most important benefits of using RDF seem to be the graph representation, the possibility of merging separate graphs into larger ones (the merge operation), and the possibility of making inferences on the resulting graph. All those features are supported by the so-called triple stores.
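To make the merge operation and inference on the merged graph concrete, here is a minimal Python sketch. It assumes a graph can be modelled as a plain set of (subject, predicate, object) tuples and ignores blank-node handling; the `urn:lsid:example.org` identifiers and the `childOf` predicate are hypothetical and do not come from the manuscript.

```python
# Sketch only: a graph is a set of (subject, predicate, object) tuples.
# All identifiers and the "childOf" predicate are hypothetical.

def merge(*graphs):
    """Merge graphs by taking the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def infer_transitive(graph, predicate):
    """Return the graph plus every triple implied by treating
    `predicate` as transitive (repeat until nothing new appears)."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for (s, p, o) in inferred if p == predicate]
        for s1, o1 in links:
            for s2, o2 in links:
                if o1 == s2 and (s1, predicate, o2) not in inferred:
                    inferred.add((s1, predicate, o2))
                    changed = True
    return inferred


# Two providers each hold one statement about a name.
source_a = {("urn:lsid:example.org:names:1", "childOf",
             "urn:lsid:example.org:names:2")}
source_b = {("urn:lsid:example.org:names:2", "childOf",
             "urn:lsid:example.org:names:3")}

merged = merge(source_a, source_b)
closed = infer_transitive(merged, "childOf")
```

The triple linking name 1 to name 3 appears in neither source on its own; it only becomes derivable once the two graphs are merged, which is why where the merge happens matters.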
However, the triple store examples I come across always use a local and centralized architecture. I'm aware that those are simplifications needed to get the new concepts across, but I suppose we need to address the scalability issues of those technologies if we want to use them in a production environment.
So what I wanted to explore a little on this list is the following issue: how do the RDF inference mechanisms scale up to a distributed environment like ours? In my opinion, GUID technologies provide only a bit (although a fundamental bit) of the infrastructure needed to make the RDF metadata framework fully functional, i.e., they provide the function "get me all metadata for this ID" (besides persistence and maybe provenance). I suppose that other pieces of functionality will be required, such as: i) other metadata discovery mechanisms (to find out, for example, who has metadata using this or that ontology); ii) a more flexible search mechanism (such as RDQL or SPARQL). In other words, we might need an RDF-compatible query protocol.
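The difference between "get me all metadata for this ID" and a more flexible search can be sketched in a few lines of Python. This is an illustration under assumptions, not how any LSID resolver or SPARQL engine is implemented: the graph is a set of tuples, `None` stands in for a query variable, and the identifiers and the `ex:` terms are hypothetical.

```python
# Sketch only: triple-pattern matching over an in-memory graph.
# Identifiers and the ex: vocabulary are hypothetical.

def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None matches anything."""
    return sorted(
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def metadata_for(graph, guid):
    """What GUID resolution gives us: every triple about one ID."""
    return match(graph, subject=guid)


graph = {
    ("urn:lsid:example.org:names:1", "dc:title", "Aus bus"),
    ("urn:lsid:example.org:names:1", "rdf:type", "ex:TaxonName"),
    ("urn:lsid:example.org:names:2", "rdf:type", "ex:TaxonName"),
    ("urn:lsid:example.org:specimens:9", "rdf:type", "ex:Specimen"),
}

# Lookup by ID: only possible if the ID is already known.
about_name_1 = metadata_for(graph, "urn:lsid:example.org:names:1")

# Pattern query: find the IDs in the first place. Roughly what
#   SELECT ?s WHERE { ?s rdf:type ex:TaxonName }
# expresses in SPARQL.
taxon_names = [s for (s, p, o) in
               match(graph, predicate="rdf:type", obj="ex:TaxonName")]
```

Resolution by ID answers the first kind of question only; the second kind needs something that can see many providers' triples at once, or a protocol for sending the pattern to each provider.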
Then it makes a lot of sense to have "aggregators" all around (such as GBIF or others) harvesting and merging RDF triples of interest into local triple stores and providing query and inference services, and providing links (for humans and machines) back to the information source.
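A rough sketch of such an aggregator in Python follows. It assumes each harvested triple is stored together with the URL of the source it came from, so that query results can link back to the provider. The `Aggregator` class, the provider URLs, and the identifiers are all hypothetical, and the harvesting step is reduced to passing in triples that have already been fetched.

```python
from collections import defaultdict

# Sketch only: an aggregator that remembers where each triple came from.
# The class, provider URLs, and identifiers are hypothetical.


class Aggregator:
    def __init__(self):
        # Maps each triple to the set of source URLs that asserted it.
        self._sources = defaultdict(set)

    def harvest(self, source_url, triples):
        """Merge triples from one provider into the local store."""
        for triple in triples:
            self._sources[triple].add(source_url)

    def query(self, subject=None, predicate=None, obj=None):
        """Return (triple, [source URLs]) pairs matching a pattern;
        None matches anything."""
        results = []
        for triple, sources in self._sources.items():
            s, p, o = triple
            if ((subject is None or s == subject)
                    and (predicate is None or p == predicate)
                    and (obj is None or o == obj)):
                results.append((triple, sorted(sources)))
        return sorted(results)


provider_a = "http://provider-a.example.org/rdf"
provider_b = "http://provider-b.example.org/rdf"
name = "urn:lsid:example.org:names:1"

agg = Aggregator()
agg.harvest(provider_a, {(name, "dc:title", "Aus bus"),
                         (name, "rdf:type", "ex:TaxonName")})
agg.harvest(provider_b, {(name, "rdf:type", "ex:TaxonName"),
                         (name, "ex:publishedIn", "urn:lsid:example.org:pubs:7")})

hits = agg.query(subject=name)
```

Keeping the source alongside each triple costs little and means a statement asserted by two providers is stored once but still credited to both.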
Does anyone have an idea of how far along SPARQL or related technologies currently are in supporting this kind of functionality?
Cheers,
Ricardo
Roderic Page wrote:
I've finally managed to put down some ideas on LSIDs, metadata, and taxonomic names in the form of a manuscript. It's a bit rough, but I thought members of this list might find it of interest. You can grab a PDF here: http://darwin.zoology.gla.ac.uk/~rpage/lsid/examples/lsid.pdf
The web site http://darwin.zoology.gla.ac.uk/~rpage/lsid/examples has links to some example RDF files that are mentioned in the manuscript (where they are trimmed to save space).
I'd welcome any comments.
Regards
Rod
----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org