RDF query and inference in a distributed environment

Blum, Stan sblum at CALACADEMY.ORG
Wed Jan 4 14:03:51 CET 2006


Robert Huber wrote:

I would be very interested to know whether OAI was ever discussed in TDWG,
or considered for use in GBIF.

---------------------


Robert,

Back in the 1990s, issues of "control" (integrity and availability, even down
to classes of users) and attribution really dominated discussions.  Many
institutions wanted assurance that users would understand who created, or at
least digitized, and maintained the data.  Opposing schools of thought
emerged: the "data hoarders" versus the "data anarchists".  Given that
context, several forays into data integration (Species Analyst, REMIB, AVH,
etc.) prior to DiGIR and BioCASe all assumed a requirement for distributed
queries (and later indexing), rather than harvesting and wholesale caching.
We were aware of OAI by 2000, and it was even implemented in the vPlants
project (http://www.vplants.org/), but the rest of us avoided that approach
because we fully expected "let us serve your data for you" to be a much
harder sell.  The community is STILL very wary of empire building.


Regarding Rod Page's contrast between the approaches in biodiversity
informatics and (molecular) bioinformatics: I think one of the drivers behind
the two approaches is that (lab-)bench scientists tend to view sequence data
as observation events that are relatively fixed once they are submitted,
whereas collections people tend to view specimens (and maybe even
classifications) as evolving entities that must be updated.  We take a longer
view of the specimens and the data about them.

Having said that, there is no doubt that building large warehouses is a
simpler and better-performing architecture for integration.  We just need to
address data replication explicitly and deliberately, in a heterogeneous
environment where "subscribers" can go offline (i.e., a transaction-oriented
database approach would be too stringent).

A GUID system that incorporates versioning should help support data
replication.  Open, standards-based protocols appear to be very effective
in calming fears about empire building.
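
To make the replication idea concrete, here is a minimal sketch, assuming
that each provider stamps every record with its GUID plus a version taken
from a provider-wide change counter, and that each subscriber remembers the
highest version it has seen.  A subscriber that goes offline simply asks for
everything newer than that mark when it reconnects; no distributed
transaction is involved.  The class and method names and the
"urn:example:..." identifiers are hypothetical, not part of any TDWG or GBIF
protocol; the pattern is loosely analogous to OAI-PMH selective harvesting
with the `from` argument.  Deleted records are not handled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    guid: str      # persistent identifier; unchanged across edits
    version: int   # provider's change counter at the time of the last edit
    data: dict


@dataclass
class Provider:
    """Hypothetical data provider holding the authoritative records."""
    records: Dict[str, Record] = field(default_factory=dict)
    clock: int = 0  # incremented on every change

    def update(self, guid: str, data: dict) -> None:
        self.clock += 1
        self.records[guid] = Record(guid, self.clock, data)

    def changes_since(self, version: int) -> List[Record]:
        # Only records edited after the subscriber's mark, oldest first.
        newer = [r for r in self.records.values() if r.version > version]
        return sorted(newer, key=lambda r: r.version)


@dataclass
class Subscriber:
    """Hypothetical cache that may go offline between syncs."""
    cache: Dict[str, Record] = field(default_factory=dict)
    high_water: int = 0  # highest version already replicated

    def sync(self, provider: Provider) -> None:
        for rec in provider.changes_since(self.high_water):
            self.cache[rec.guid] = rec  # newer version replaces older one
            self.high_water = rec.version


p = Provider()
s = Subscriber()
p.update("urn:example:spec:1", {"taxon": "Quercus alba"})
s.sync(p)
# Subscriber goes offline; the provider keeps editing.
p.update("urn:example:spec:1", {"taxon": "Quercus montana"})
p.update("urn:example:spec:2", {"taxon": "Pinus ponderosa"})
s.sync(p)  # back online: pulls only the two changes it missed
print(s.cache["urn:example:spec:1"].data["taxon"], s.high_water)
```

Running it prints "Quercus montana 3".  Because the provider owns the GUIDs
and the versions, the data stay under its control while the subscriber gets
the performance of a local warehouse.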

-Stan

Stanley D. Blum, Ph.D.
Research Information Manager
California Academy of Sciences
875 Howard St.
San Francisco,  CA
+1 (415) 321-8183
