Bob,
This is interesting stuff. I don't know what claims Oracle is making
for its triple store, but there are many other database-backed triple
stores out there and I've examined several of them in depth.
With most of the current crop of triple stores (including Jena's, which
we use in DiGIR2), triples are stored in a single de-normalized table
that has columns for subject, predicate, and object. This table is
heavily indexed to allow quick lookups, for example, to find statements
with a given subject and predicate. The difficult part of measuring
triple store performance is not raw throughput (how fast triples can be
loaded or listed) but performance on queries. With
many of the triple stores I've examined, raw throughput is limited only
by how fast the underlying database is at performing SQL inserts and
selects. Granted, there is some overhead in the RDF framework that sits
atop the database, but performance for insertions and basic retrievals
is dominated by the underlying database.
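
To make the single-table layout concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name, column names, index names, and helper functions are all hypothetical; this is not Jena's actual schema, only an illustration of the general shape described above.

```python
import sqlite3


def make_store():
    """Create an in-memory, de-normalized triple table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    # Index the column combinations that lookups are expected to use.
    conn.execute("CREATE INDEX idx_sp ON triples (subject, predicate)")
    conn.execute("CREATE INDEX idx_po ON triples (predicate, object)")
    return conn


def add_triples(conn, triples):
    """Bulk-insert (subject, predicate, object) tuples with plain SQL inserts."""
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)


def objects_for(conn, subject, predicate):
    """Return the objects of all statements with this subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM triples "
        "WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]
```

Both loading and the basic lookup here are single SQL statements, which is why throughput for these operations mostly tracks the underlying database.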
With sophisticated queries the story is quite different. For a long
time every triple store had its own query language. Now that the world
is starting to standardize on SPARQL, I hope to see a standard set of
SPARQL-based metrics that will allow query performance comparisons to be
made across triple store implementations. SPARQL is very powerful and
allows a large variety of useful queries. However, much of SPARQL cannot
be pushed down into SQL queries. This puts any triple store designed
to work over a relational database at risk of having to load all triples
into memory for examination by the RDF framework in order to answer
sophisticated SPARQL queries. The simplest example of such a query is
one that uses the filter(regex()) pattern, because most relational
databases cannot perform XPath's fn:matches regex function.
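
A rough sketch of what that fallback looks like, again in Python with sqlite3 and a hypothetical triples table. It assumes the database has no usable regex operator, so SQL can only narrow the candidates by predicate and the regex test has to run in application memory. Python's re module stands in for XPath regex semantics here, which is an approximation.

```python
import re
import sqlite3


def build_store(triples):
    """Create and fill a hypothetical in-memory triple table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.execute("CREATE INDEX idx_predicate ON triples (predicate)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def filter_regex(conn, predicate, pattern):
    """Emulate a SPARQL-style FILTER regex(?o, pattern) outside the database.

    Only the predicate match is pushed down into SQL. Every candidate row is
    then pulled into the application and tested against the regex there.
    """
    regex = re.compile(pattern)
    rows = conn.execute(
        "SELECT subject, object FROM triples "
        "WHERE predicate = ? ORDER BY subject",
        (predicate,),
    )
    return [(s, o) for s, o in rows if regex.search(o)]
```

The cost of filter_regex grows with the number of triples that share the predicate, not with the number of matches, which is the scaling problem described above.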
I hope to have more information about Oracle's performance claims soon,
and I'll share it with the list when I get it.
-Steve
Bob Morris wrote:
http://www.franz.com/resources/educational_resources/white_papers/AllegroCache_RDF_Dobbs2006.pdf
is a rather interesting piece about RDF scalability. They claim to load
300,000 triples/sec from a triple store based on Allegro Common Lisp.
Allegro CL is also at the heart of Ora Lassila's Wilbur toolkit. OINK, a
way cool new Wilbur application, is described at
http://www.lassila.org/blog/archive/2006/03/oink.html. [Does Wilbur run
on the free version of Allegro?]. Wilbur loads 2600 triples/sec. Lassila
is generally regarded with Hendler and Berners-Lee as one of the
founders of the Semantic Web.
[Bill Campbell, a colleague of mine and author of UMB-Scheme distributed
with RedHat Linux, once remarked "XML is just Lisp with pointy
brackets". The above might support: "RDF is just CLOS with pointy
brackets". Which, by the way, is positive.]
Does anyone know what triple retrieval claims Oracle is making for its
triple store support?
There is a good current survey of RDF programming support at
http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/
--Bob
_______________________________________________
Tdwg-tag mailing list
Tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org