[tdwg-tapir] Output models based on RDFS class definitions?

Wed Mar 22 11:07:11 CET 2006

Hi Guys,

This may be a crazy idea, but I thought it was worth running past you.

I have been working on an RDFS vocabulary for TCS and this has spun off a 
bunch of ideas. One is stolen from prismstandard.org's practice of having 
avowed serializations of RDF data. You can read more here 
http://biodiv.hyam.net/schemas/TCS_RDF/tcs_rdf_examples.zip if you 
haven't already.

Following on from this I have had the idea of generating Tapir output 
models from RDFS class definitions. The TaxonName class, for instance, 
has 23 properties. Most of these are literals. It might be possible to 
automatically generate a Tapir output model using XSLT from this class 
definition. This would then let people query over Tapir for TaxonName 
objects in an avowed XML Schema serialization of RDF. The data mapping 
done by the client could be quite simple because they only have to 
retrieve a single object (more or less a table per class if we were 
building a cache database), which would also make it fast. It would be 
easy for people to understand and work through a data mapping process to 
set up a provider, and the process could make use of querying a central 
repository of object types that is periodically updated.
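
To make the idea concrete, here is a rough sketch of the generation step, 
written in Python rather than XSLT just to keep it short. It reads a 
cut-down RDFS class definition and emits a flat XML Schema element with 
one child per property. Everything in the sample is an assumption for 
illustration: only TaxonName and authorshipTeam come from this message, 
the other property names are invented, and the output is a plain XML 
Schema fragment, not a real Tapir output model document.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, heavily trimmed vocabulary. Only TaxonName and
# authorshipTeam are taken from the message; the rest is invented.
SAMPLE_RDFS = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="TaxonName"/>
  <rdfs:Class rdf:ID="Team"/>
  <rdf:Property rdf:ID="nameComplete">
    <rdfs:domain rdf:resource="#TaxonName"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
  <rdf:Property rdf:ID="authorshipTeam">
    <rdfs:domain rdf:resource="#TaxonName"/>
    <rdfs:range rdf:resource="#Team"/>
  </rdf:Property>
  <rdf:Property rdf:ID="teamMember">
    <rdfs:domain rdf:resource="#Team"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
</rdf:RDF>"""


def class_properties(rdfs_xml, class_name):
    """Return (property name, is_literal) pairs whose domain is class_name."""
    root = ET.fromstring(rdfs_xml)
    props = []
    for prop in root.findall(f"{{{RDF}}}Property"):
        domain = prop.find(f"{{{RDFS}}}domain")
        rng = prop.find(f"{{{RDFS}}}range")
        if domain is None or domain.get(f"{{{RDF}}}resource") != "#" + class_name:
            continue  # property belongs to some other class
        is_literal = (rng is not None
                      and rng.get(f"{{{RDF}}}resource") == RDFS + "Literal")
        props.append((prop.get(f"{{{RDF}}}ID"), is_literal))
    return props


def flat_schema(class_name, props):
    """Build one flat element per class: literals become xs:string,
    resource pointers become xs:anyURI (i.e. a GUID to resolve later)."""
    ET.register_namespace("xs", XSD)
    schema = ET.Element(f"{{{XSD}}}schema")
    element = ET.SubElement(schema, f"{{{XSD}}}element", name=class_name)
    ctype = ET.SubElement(element, f"{{{XSD}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XSD}}}sequence")
    for name, is_literal in props:
        ET.SubElement(seq, f"{{{XSD}}}element", name=name,
                      type="xs:string" if is_literal else "xs:anyURI",
                      minOccurs="0")
    return ET.tostring(schema, encoding="unicode")


taxon_name_props = class_properties(SAMPLE_RDFS, "TaxonName")
print(flat_schema("TaxonName", taxon_name_props))
```

The point of the sketch is only that the mapping is mechanical: the class 
gives the element, each property gives a child, and the property's range 
decides whether the child holds a value or a pointer.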

The problem comes with properties that are not literals but resource 
pointers. An example is TaxonName->authorshipTeam. In RDF this would be 
a resource that points to a team of people. If we all use GUIDs then the 
data source could just issue a GUID for the resource identifier and the 
client could then run another query to get the author team (or do it by 
resolving an LSID...). This multi-call process is obviously 
inefficient: it effectively gets the client to do their own joins. But 
the client is going to want to do this over multiple data sources 
anyway, and it could be prohibitive to allow clients to do arbitrary 
joins on a single host machine, so it might be a price worth paying. It 
also solves the non-relational problem, i.e. what if the data source can 
provide AuthorTeams and TaxonNames but can't permit arbitrary queries 
between the two (e.g. one is a researcher's Access db and the other is 
the institution's collections management system)?
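
The client-side join described above might look something like the 
following sketch. It assumes each TaxonName record carries a GUID in its 
authorshipTeam field and that the client has some way to fetch a team by 
GUID; here that is a hypothetical fetch_team function backed by an 
in-memory dictionary, standing in for a second Tapir query or an LSID 
resolution. The GUIDs and names are made-up placeholders. Caching by 
GUID is one simple way to soften the multi-call cost, since each team is 
only fetched once however many names point at it.

```python
def join_author_teams(names, fetch_team):
    """Replace each record's authorshipTeam GUID with the team record,
    fetching every distinct GUID only once."""
    cache = {}
    joined = []
    for name in names:
        guid = name.get("authorshipTeam")
        if guid is not None and guid not in cache:
            cache[guid] = fetch_team(guid)  # the extra call per new GUID
        row = dict(name)  # copy so the source records stay untouched
        row["authorshipTeam"] = cache.get(guid)  # None if no pointer
        joined.append(row)
    return joined


# Hypothetical source 1: TaxonName records from one provider.
NAME_SOURCE = [
    {"nameComplete": "Aus bus", "authorshipTeam": "urn:lsid:example.org:teams:1"},
    {"nameComplete": "Aus cus", "authorshipTeam": "urn:lsid:example.org:teams:1"},
    {"nameComplete": "Xus yus", "authorshipTeam": "urn:lsid:example.org:teams:2"},
    {"nameComplete": "Xus zus"},  # no team recorded
]

# Hypothetical source 2: team records held somewhere else entirely.
TEAM_SOURCE = {
    "urn:lsid:example.org:teams:1": {"members": ["Author A", "Author B"]},
    "urn:lsid:example.org:teams:2": {"members": ["Author C"]},
}

calls = []  # records each remote lookup so the cost is visible


def fetch_team(guid):
    """Stand-in for a second query or an LSID resolution."""
    calls.append(guid)
    return TEAM_SOURCE[guid]


for row in join_author_teams(NAME_SOURCE, fetch_team):
    print(row["nameComplete"], row["authorshipTeam"])
print("team lookups made:", len(calls))
```

Note that the two sources never need to know about each other, which is 
the non-relational case: the join only exists on the client.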

Effectively this is a mechanism for formally creating lots of Darwin 
Core schemas to use....

Does this sound like a crazy idea or not?

All the best,

Roger

-- 

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger at tdwg.org
 +44 1578 722782
-------------------------------------