Bob,

This is interesting stuff.  I don't know what claims Oracle is making 
for its triple store, but there are many other database-backed triple 
stores out there and I've examined several of them in depth.

With most of the current crop of triple stores (including Jena's which 
we use in DiGIR2), triples are stored in a single de-normalized table 
that has columns for subject, predicate, and object.  This table is 
heavily indexed to allow quick lookups, for example, to find statements 
with a given subject and predicate.  The hard part of measuring 
triple store performance is not raw throughput (how fast triples can be 
loaded or listed) but performance on queries.  With 
many of the triple stores I've examined, raw throughput is limited only 
by how fast the underlying database is at performing SQL inserts and 
selects.  Granted, there is some overhead in the RDF framework that sits 
atop the database, but performance for insertions and basic retrievals 
is dominated by the underlying database.
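
To make the single-table layout concrete, here is a rough sketch in 
Python using SQLite.  The table, column, and index names are my own 
inventions for illustration; this is not Jena's actual schema, just the 
general shape of a de-normalized triple table with a couple of 
composite indexes and a subject/predicate lookup.

```python
import sqlite3


def make_store():
    """Create an in-memory triple table (hypothetical schema, not Jena's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL)"
    )
    # Composite indexes so common access patterns avoid a full table scan.
    conn.execute("CREATE INDEX idx_sp ON triples (subject, predicate)")
    conn.execute("CREATE INDEX idx_po ON triples (predicate, object)")
    return conn


def add_triples(conn, triples):
    """Bulk-insert (subject, predicate, object) tuples."""
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)


def objects_for(conn, subject, predicate):
    """Find the objects of all statements with a given subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM triples"
        " WHERE subject = ? AND predicate = ?"
        " ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]
```

Both the insert and the lookup are plain SQL, which is why raw 
throughput for this kind of operation tracks the database so closely.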

With sophisticated queries the story is quite different.  For a long 
time every triple store had its own query language.  Now that the world 
is starting to standardize on SPARQL I hope to see a standard set of 
SPARQL-based metrics that will allow query performance comparisons to be 
made across triple store implementations.  SPARQL is very powerful and 
allows a large variety of useful queries.  However, much of SPARQL cannot 
be pushed down into SQL queries.  This puts any triple store designed 
to work over a relational database at risk of having to load all triples 
into memory for examination by the RDF framework in order to answer 
sophisticated SPARQL queries.  The simplest example of such a query is 
one that uses the FILTER(regex()) pattern, because most relational 
databases cannot evaluate XPath's fn:matches function, on which 
SPARQL's regex() is defined.
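
Here is a rough sketch of what that looks like in practice, again in 
Python over SQLite with a made-up schema and made-up function names.  
The predicate constraint is pushed down into SQL, but the regex test 
has to run in the framework after the candidate rows have been fetched.  
Python's re module is only a stand-in here; its regex syntax is not 
identical to the XPath flavor that SPARQL specifies.

```python
import re
import sqlite3


def build_demo_store():
    """Small in-memory triple table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("ex:p1", "ex:name", "Alice"),
            ("ex:p2", "ex:name", "Bob"),
            ("ex:p3", "ex:name", "Carol"),
            ("ex:p1", "ex:age", "42"),
        ],
    )
    return conn


def select_with_regex_filter(conn, predicate, pattern):
    """Evaluate roughly: SELECT ?s ?o WHERE { ?s <predicate> ?o .
    FILTER(regex(?o, pattern)) }.

    Returns (matches, fetched), where fetched counts the rows pulled out
    of the database before the filter was applied in memory.
    """
    regex = re.compile(pattern)
    matches = []
    fetched = 0
    # Only the triple pattern is expressed in SQL.
    cursor = conn.execute(
        "SELECT subject, object FROM triples"
        " WHERE predicate = ? ORDER BY subject",
        (predicate,),
    )
    for subject, obj in cursor:
        fetched += 1
        # The regex test runs here, outside the database.
        if regex.search(obj):
            matches.append((subject, obj))
    return matches, fetched
```

With the demo data, filtering ex:name values on "^Al" fetches every 
ex:name row from the database in order to return a single match.  On a 
large store, and with a less selective triple pattern, that gap is 
where the cost comes from.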

I hope to have more information about Oracle's performance claims soon 
and I'll share them with the list when I get them.

-Steve

Bob Morris wrote: