<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<br>

Steve,<br>

<br>

I find the triple store debate interesting because the goalposts seem

to shift. No one would claim to design a conventional relational

structure (or perhaps generate one from an XML Schema?) that was

guaranteed to perform equally quickly for any arbitrary query. All

real-world relational schemas are optimized for a particular purpose. Yet when

people talk about triple stores they expect to be able to ask them <b>anything</b>

and get a responsive answer, which is never expected of a

relational database.<br>

<br>

As an example: Donald would be mad to build the GBIF data portal as a

basic triple store, because 90% of queries are currently going to be the

same, i.e. by taxon name and geographical area. Even if there is a

triple store out the back somewhere, one is going to want to optimize

for the most common queries. If you want to ask something weird you are

going to have to wait!<br>

<br>

Enabling a client to ask an arbitrary query of a data store with no

knowledge of the underlying structure (only the semantics) and

guaranteeing response times seems, to me, to be a general problem of

any system, whether we are in a triple-based world or an XML Schema

based one. It also seems to be one we don't need to solve.<br>

<br>

I imagine that the people who are looking at optimizing triple stores

are looking at using the queries they see to build 'clever' indexes that amount

to separate tables for triple patterns that occur regularly, a little

like MS SQL Server does with regular indexes. But then this is just me

speculating.<br>

<br>
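A rough sketch of that speculation, in case it helps: the frequent
"taxon name plus area" pattern is copied out of the generic triple table
into its own indexed table. This is only an illustration using SQLite from
the Python standard library. The ex: predicate names, the table names and
the sample records are all made up for the example, and I am not claiming
any real triple store works this way.<br>
<pre>
```python
import sqlite3

# Hypothetical predicate names; a real store would use full URIs.
TAXON = "ex:taxonName"
AREA = "ex:area"


def build_store(triples):
    """Load triples into the usual single subject/predicate/object table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return db


def materialize_pattern(db):
    """Copy one frequent pattern (?s taxonName ?taxon . ?s area ?area)
    into its own table and index it for the common lookup."""
    db.execute("CREATE TABLE by_taxon_area (s TEXT, taxon TEXT, area TEXT)")
    db.execute(
        "INSERT INTO by_taxon_area "
        "SELECT t.s, t.o, a.o FROM triples t JOIN triples a ON a.s = t.s "
        "WHERE t.p = ? AND a.p = ?",
        (TAXON, AREA),
    )
    db.execute("CREATE INDEX idx_taxon_area ON by_taxon_area (taxon, area)")


def find_records(db, taxon, area):
    """Answer the common query from the dedicated table only."""
    rows = db.execute(
        "SELECT s FROM by_taxon_area WHERE taxon = ? AND area = ? ORDER BY s",
        (taxon, area),
    )
    return [row[0] for row in rows]


def demo_store():
    """Build a small store from invented sample records."""
    triples = [
        ("rec1", TAXON, "Quercus robur"), ("rec1", AREA, "Scotland"),
        ("rec2", TAXON, "Quercus robur"), ("rec2", AREA, "Wales"),
        ("rec3", TAXON, "Bellis perennis"), ("rec3", AREA, "Scotland"),
    ]
    db = build_store(triples)
    materialize_pattern(db)
    return db
```
</pre>
Calling find_records(demo_store(), "Quercus robur", "Scotland") answers the
common query from the small dedicated table, while anything weird still has
to go through the generic triples table.<br>
<br>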

We have to accept that data publishers are only going to offer a

limited range of queries on their data. Complex queries have to be

answered by gathering a subset of data (probably from several

publishers) locally or in a grid and then querying that in interesting

ways. Triple stores would be great for this local cache as it will be

smaller and can sit in memory etc. The way to get data into these local

caches is by making sure the publishers supply it in RDF using common

vocabularies - even if they don't care a fig about RDF and are just

using an XML Schema which has an RDF mapping.<br>

<br>
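To make the local cache idea concrete, here is a minimal sketch, assuming
each publisher hands over plain (subject, predicate, object) tuples that
share a vocabulary. The TripleCache class, the ex: predicates and the two
publisher lists are invented for illustration and do not correspond to any
existing toolkit.<br>
<pre>
```python
from collections import defaultdict


class TripleCache:
    """A tiny in-memory triple store; None acts as a wildcard in queries."""

    def __init__(self):
        self.triples = set()
        self.by_predicate = defaultdict(set)

    def add_all(self, triples):
        """Merge triples from one publisher into the cache."""
        for triple in triples:
            self.triples.add(triple)
            self.by_predicate[triple[1]].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return all cached triples matching the given pattern, sorted."""
        candidates = self.triples if p is None else self.by_predicate.get(p, set())
        return sorted(
            t for t in candidates
            if (s is None or t[0] == s) and (o is None or t[2] == o)
        )


# Invented sample data from two hypothetical publishers.
publisher_a = [("rec1", "ex:taxonName", "Quercus robur")]
publisher_b = [
    ("rec1", "ex:collector", "A. Smith"),
    ("rec2", "ex:taxonName", "Quercus robur"),
]


def build_cache(*sources):
    """Gather triples from several publishers into one local cache."""
    cache = TripleCache()
    for source in sources:
        cache.add_all(source)
    return cache


def collectors_of(cache, taxon):
    """A cross-publisher query: who collected records of this taxon?"""
    records = [s for s, _, _ in cache.match(p="ex:taxonName", o=taxon)]
    return sorted(
        o for r in records for _, _, o in cache.match(s=r, p="ex:collector")
    )
```
</pre>
Neither publisher could answer collectors_of on its own, but once both sets
of triples share a vocabulary the merged cache can.<br>
<br>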

Can we make a separation between the use of RDF/S for transfer, for

query, and for storage, or are these distinctions only in my mind?<br>

<br>

Thanks for your input on this,<br>

<br>

Roger<br>

<br>

<br>

Steven Perry wrote:

<blockquote cite="mid442959D7.8050603@ku.edu" type="cite">

  <pre wrap="">Bob,

This is interesting stuff.  I don't know what claims Oracle is making 

for its triple store, but there are many other database-backed triple 

stores out there and I've examined several of them in depth.

With most of the current crop of triple stores (including Jena's which 

we use in DiGIR2), triples are stored in a single de-normalized table 

that has columns for subject, predicate, and object.  This table is 

heavily indexed to allow quick lookups, for example, to find statements 

with a given subject and predicate.  The difficult thing in measuring 

triple store performance is not raw throughput (how fast triples can be 

loaded or listed), but is instead their performance on queries.  With 

many of the triple stores I've examined, raw throughput is limited only 

by how fast the underlying database is at performing SQL inserts and 

selects.  Granted there is some overhead in the RDF framework that sits 

atop the database, but performance for insertions and basic retrievals 

is dominated by the underlying database.

With sophisticated queries the story is quite different.  For a long 

time every triple store had its own query language.  Now that the world 

is starting to standardize on SPARQL I hope to see a standard set of 

SPARQL-based metrics that will allow query performance comparisons to be 

made across triple store implementations.  SPARQL is very powerful and 

allows a large variety of useful queries.  However, much of SPARQL cannot 

be pushed down into SQL queries.  This puts any triple store designed 

to work over a relational database at risk of having to load all triples 

into memory for examination by the RDF framework in order to answer 

sophisticated SPARQL queries.  The simplest example of such a query is 

one that uses the filter(regex()) pattern because most relational 

databases cannot evaluate XPath's fn:matches regex function.

I hope to have more information about Oracle's performance claims soon 

and I'll share them with the list when I get them.

-Steve

Bob Morris wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap=""><a class="moz-txt-link-freetext" href="http://www.franz.com/resources/educational_resources/white_papers/AllegroCache_RDF_Dobbs2006.pdf">http://www.franz.com/resources/educational_resources/white_papers/AllegroCache_RDF_Dobbs2006.pdf</a>

is a rather interesting piece about RDF scalability. They claim to load 

300,000 triples/sec from a triple store based on Allegro Common Lisp.

Allegro CL is also at the heart of Ora Lassila's Wilbur toolkit. OINK, a 

way cool new Wilbur application, is described at 

<a class="moz-txt-link-freetext" href="http://www.lassila.org/blog/archive/2006/03/oink.html">http://www.lassila.org/blog/archive/2006/03/oink.html</a>. [Does Wilbur run 

on the free version of Allegro?]. Wilbur loads 2600 triples/sec. Lassila 

is generally regarded, with Hendler and Berners-Lee, as one of the 

founders of the Semantic Web.

[Bill Campbell, a colleague of mine and author of UMB-Scheme distributed 

with RedHat Linux, once remarked "XML is just Lisp with pointy 

brackets". The above might support: "RDF is just CLOS with pointy 

brackets". Which, by the way, is positive.]

Does anyone know what triple retrieval claims Oracle is making for its 

triple store support?

There is a good current survey of RDF programming support at 

<a class="moz-txt-link-freetext" href="http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/">http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/</a>

--Bob

_______________________________________________

Tdwg-tag mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Tdwg-tag@lists.tdwg.org">Tdwg-tag@lists.tdwg.org</a>

<a class="moz-txt-link-freetext" href="http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org">http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org</a>

    </pre>

  </blockquote>


</blockquote>

<br>

<br>

<pre class="moz-signature" cols="72">-- 

-------------------------------------

 Roger Hyam

 Technical Architect

 Taxonomic Databases Working Group

-------------------------------------

 <a class="moz-txt-link-freetext" href="http://www.tdwg.org">http://www.tdwg.org</a>

 <a class="moz-txt-link-abbreviated" href="mailto:roger@tdwg.org">roger@tdwg.org</a>

 +44 1578 722782

-------------------------------------

</pre>

</body>

</html>