[Tdwg-tag] TCS in RDF for use in LSIDs and possible generic mechanism.

Roger Hyam roger at tdwg.org
Wed Mar 29 11:16:56 CEST 2006


Steve,

I find the triple store debate interesting because the goal posts seem 
to shift. No one would claim to design a conventional relational 
structure (or perhaps generate one from an XML Schema?) that was 
guaranteed to perform equally quickly for any arbitrary query. All real 
world relational schemas are optimized for a particular purpose. When 
people talk about triple stores they expect to be able to ask them 
*anything* and get a responsive answer - when that is never expected of 
a relational database.

As an example: Donald would be mad to build the GBIF data portal as a 
basic triple store because 90% of queries are currently going to be the 
same i.e. by taxon name and geographical area. Even if there is a triple 
store out the back some place one is going to want to optimize for the 
most common queries. If you want to ask something weird you are going to 
have to wait!

Enabling a client to ask an arbitrary query of a data store with no 
knowledge of the underlying structure (only the semantics) and 
guaranteeing response times seems, to me, to be a general problem of any 
system - whether we are in a triple-based world or an XML Schema-based 
one. It also seems to be one we don't need to answer.

I imagine that the people who are looking at optimizing triple stores 
are looking at using the queries to build 'clever' indexes that amount 
to separate tables for triple patterns that occur regularly, a little 
like MS SQL Server does with regular indexes. But then this is just me 
speculating.
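
To make that speculation concrete, here is a minimal sketch, not a 
description of how any real triple store works. It assumes the single 
subject/predicate/object table Steve describes below, held in SQLite, 
plus one extra table materialized for the "taxon name and country" 
pattern. The predicate names (ex:taxonName, ex:country), the table and 
function names, and the sample records are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (subject, predicate, object) triples.
EXAMPLE_TRIPLES = [
    ("rec:1", "ex:taxonName", "Bellis perennis"),
    ("rec:1", "ex:country", "GB"),
    ("rec:2", "ex:taxonName", "Bellis perennis"),
    ("rec:2", "ex:country", "FR"),
    ("rec:3", "ex:taxonName", "Quercus robur"),
    ("rec:3", "ex:country", "GB"),
]


def build_store(triples):
    """Load triples into one generic table plus one pattern-specific table."""
    db = sqlite3.connect(":memory:")
    # The generic store: a single de-normalized table with ordinary indexes.
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    db.execute("CREATE INDEX idx_sp ON triples (s, p)")
    db.execute("CREATE INDEX idx_po ON triples (p, o)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    # The 'clever' index: a separate table for a pattern that occurs often,
    # here "?rec ex:taxonName ?name . ?rec ex:country ?country".
    db.execute(
        """CREATE TABLE name_area AS
           SELECT n.s AS record, n.o AS name, c.o AS country
           FROM triples n JOIN triples c ON n.s = c.s
           WHERE n.p = 'ex:taxonName' AND c.p = 'ex:country'"""
    )
    db.execute("CREATE INDEX idx_name_area ON name_area (name, country)")
    return db


def records_by_name_and_country(db, name, country):
    """Answer the common query from the pattern table, with no self-join."""
    rows = db.execute(
        "SELECT record FROM name_area "
        "WHERE name = ? AND country = ? ORDER BY record",
        (name, country),
    )
    return [row[0] for row in rows]
```

The common query is answered from name_area alone; anything "weird" 
would still have to go to the generic triples table and pay for the 
self-joins.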

We have to accept that data publishers are only going to offer a limited 
range of queries of their data. Complex queries have to be answered by 
gathering a subset of data (probably from several publishers) locally or 
in a grid and then querying that in interesting ways. Triple stores 
would be great for this local cache as it will be smaller and can sit in 
memory etc. The way to get data into these local caches is by making 
sure the publishers supply it in RDF using common vocabularies - even if 
they don't care a fig about RDF and are just using an XML Schema which 
has an RDF mapping.
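
As a rough illustration of such a local cache, the sketch below merges 
triples from two publishers into one in-memory set and then asks a 
question neither publisher offers directly. It is only a sketch under 
stated assumptions: the class and function names, the ex: predicates 
and the sample records are hypothetical, and a real cache would more 
likely sit on an existing RDF toolkit.

```python
from collections import defaultdict


class LocalTripleCache:
    """A small in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()
        self._by_predicate = defaultdict(set)

    def add_all(self, triples):
        """Merge triples from one publisher; duplicates collapse in the set."""
        for s, p, o in triples:
            triple = (s, p, o)
            self._triples.add(triple)
            self._by_predicate[p].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        if p is not None:
            candidates = self._by_predicate.get(p, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (s is None or t[0] == s) and (o is None or t[2] == o)
        )


def names_in_country(cache, country):
    """Taxon names of all cached records located in the given country."""
    records = {s for s, _, _ in cache.match(p="ex:country", o=country)}
    return sorted(
        {o for s, _, o in cache.match(p="ex:taxonName") if s in records}
    )


# Hypothetical data as it might arrive from two different publishers.
PUBLISHER_A = [
    ("a:1", "ex:taxonName", "Bellis perennis"),
    ("a:1", "ex:country", "GB"),
]
PUBLISHER_B = [
    ("b:7", "ex:taxonName", "Quercus robur"),
    ("b:7", "ex:country", "GB"),
    ("b:8", "ex:taxonName", "Bellis perennis"),
    ("b:8", "ex:country", "FR"),
]
```

Because both publishers use the same vocabulary, the merged cache can be 
queried as one data set without knowing how either publisher stores its 
records.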

Can we make a separation between the use of RDF/S for transfer, for 
query and for storage or are these things only in my mind?

Thanks for your input on this,

Roger


Steven Perry wrote:
> Bob,
>
> This is interesting stuff.  I don't know what claims Oracle is making 
> for its triple store, but there are many other database-backed triple 
> stores out there and I've examined several of them in depth.
>
> With most of the current crop of triple stores (including Jena's which 
> we use in DiGIR2), triples are stored in a single de-normalized table 
> that has columns for subject, predicate, and object.  This table is 
> heavily indexed to allow quick lookups, for example, to find statements 
> with a given subject and predicate.  The difficult thing in measuring 
> triple store performance is not raw throughput (how fast triples can be 
> loaded or listed), but is instead their performance on queries.  With 
> many of the triple stores I've examined, raw throughput is limited only 
> by how fast the underlying database is at performing SQL inserts and 
> selects.  Granted there is some overhead in the RDF framework that sits 
> atop the database, but performance for insertions and basic retrievals 
> is dominated by the underlying database.
>
> With sophisticated queries the story is quite different.  For a long 
> time every triple store had its own query language.  Now that the world 
> is starting to standardize on SPARQL I hope to see a standard set of 
> SPARQL-based metrics that will allow query performance comparisons to be 
> made across triple store implementations.  SPARQL is very powerful and 
> allows a large variety of useful queries.  However much of SPARQL cannot 
> be pushed down into SQL queries.  This makes any triple store designed 
> to work over a relational database at risk of having to load all triples 
> into memory for examination by the RDF framework in order to answer 
> sophisticated SPARQL queries.  The simplest example of such a query is 
> one that uses the filter(regex()) pattern because most relational 
> databases cannot perform XPath's matches regex function.
>
> I hope to have more information about Oracle's performance claims soon 
> and I'll share them with the list when I get them.
>
> -Steve
>
>
>
> Bob Morris wrote:
>
>   
>> http://www.franz.com/resources/educational_resources/white_papers/AllegroCache_RDF_Dobbs2006.pdf
>> is a rather interesting piece about RDF scalability. They claim to load 
>> 300,000 triples/sec from a triple store based on Allegro Common Lisp.
>>
>> Allegro CL is also at the heart of Ora Lassila's Wilbur toolkit. OINK, a 
>> way cool new Wilbur application is described at 
>> http://www.lassila.org/blog/archive/2006/03/oink.html. [Does Wilbur run 
>> on the free version of Allegro?]. Wilbur loads 2600 triples/sec. Lassila 
>> is generally regarded, with Hendler and Berners-Lee, as one of the 
>> founders of the Semantic Web.
>>
>> [Bill Campbell, a colleague of mine and author of UMB-Scheme distributed 
>> with RedHat Linux, once remarked "XML is just Lisp with pointy 
>> brackets". The above might support: "RDF is just CLOS with pointy 
>> brackets". Which, by the way, is positive.]
>>
>> Does anyone know what triple retrieval claims Oracle is making for its 
>> triple store support?
>>
>> There is a good current survey of RDF programming support at 
>> http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/
>>
>> --Bob
>>
>> _______________________________________________
>> Tdwg-tag mailing list
>> Tdwg-tag at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
>>  
>>
>>     
>
>


-- 

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger at tdwg.org
 +44 1578 722782
-------------------------------------
