[Tdwg-guid] Zoobank LSID Service
smperry at ku.edu
Fri Sep 22 22:58:16 CEST 2006
peter.hollas at thomson.com wrote:
> 2. Is there any standard scheme for LSID discovery? i.e. Would it be a
> good/bad idea to extend the LSID service to allow machine queries of
> LSIDs by taxon name rather than discovering them through the web
> Any comments and suggestions are very welcome!
> Regards, Peter.
In my view, LSID is primarily a naming scheme. While it does allow for
the resolution of data objects, it was not designed to support other
common data access tasks such as discovery, search, query, or harvest.
To support these additional data access tasks I feel we ought to look to
other standard or well-established protocols. Some protocols, like the
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), are
well suited to harvesting. Others, like the W3C's SPARQL protocol, are
well suited to search and query. Unfortunately, there's no single
protocol for working with RDF data that does everything we need. The
problem for data providers is that each additional protocol that they
are asked to support adds an additional burden on them. At the same
time, the easier a provider makes it to access their data, the more
their data will be used. However, many data providing organizations
wish to devote their resources to the creation and curation of data,
rather than to the implementation of data access protocols.
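To make the "naming scheme" point concrete, here is a minimal sketch of what an LSID carries on its own. An LSID is a URN of the form urn:lsid:authority:namespace:object[:revision], so splitting it yields an authority, a namespace, an object identifier, and an optional revision, and nothing that helps with search or discovery. The function name parse_lsid and the example.org identifier are illustrative assumptions, not part of any existing library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    # The "urn:lsid" prefix is matched case-insensitively; other fields are kept as given.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(field == "" for field in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, used only for illustration.
example = parse_lsid("urn:lsid:example.org:names:12345")
```

The parsed value tells a client which authority to ask for the object, but a query such as "which LSIDs exist for this taxon name?" has to be answered by some other protocol.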
We're hoping to provide an end-to-end solution for this problem with the
wasabi project (formerly known as DiGIR2). While the rest of this
message is a bit of a plug for wasabi, it also illustrates the
fact that different stakeholders in a data network have different
requirements for data access.
The central idea behind wasabi is that data providing organizations
generate RDF data objects, assign LSIDs to them, and drop them into a
locally-installed wasabi server. The wasabi server makes data objects
available through a variety of standard data access protocols. It
supports LSID to metadata resolution through a simple HTTP-get protocol
and through a plugin to the IBM LSID resolver. The wasabi server also
allows data aggregators to use OAI-PMH to efficiently fetch data objects
in bulk so that they can be indexed for search. One benefit of the OAI
protocol, especially the implementation in the wasabi server, is that it
allows for incremental harvesting. Because it only sends the data
objects that have changed since the last harvest, OAI-PMH decreases the
load on a provider's server and allows for fast indexing of their data.
Finally, the wasabi server provides direct SPARQL query access to data
objects. Any time a wasabi server is queried (through OAI, SPARQL, or
LSID resolution), the access is logged. This helps data providers keep
track of who is using their data.
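As a rough illustration of incremental harvesting, the sketch below builds an OAI-PMH ListRecords request that asks only for records changed since the last harvest. The verb, metadataPrefix, and from arguments are standard OAI-PMH request parameters. The base URL, the function name, and the choice of the oai_dc prefix are assumptions made for the example; a wasabi server may well expose a different endpoint and metadata format.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def incremental_harvest_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """Build a ListRecords request for records changed since last_harvest.

    last_harvest must be a timezone-aware datetime; it is sent in UTC.
    """
    if last_harvest.tzinfo is None:
        raise ValueError("last_harvest must be timezone-aware")
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # Seconds granularity is optional for repositories, so a real client
        # should check the Identify response before using it.
        "from": last_harvest.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical provider endpoint, used only for illustration.
url = incremental_harvest_url(
    "http://provider.example.org/oai",
    datetime(2006, 9, 1, tzinfo=timezone.utc),
)
```

A harvester would store the time of each successful harvest and pass it as last_harvest on the next run, so the provider only has to return what changed in between.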
If a provider doesn't want to write a custom program that generates
RDF/XML they can use the wasabi server's synchronizer program, along
with a concept mapping configuration file, to periodically connect to
their database, transform its contents into RDF data objects, and load
them into the wasabi server. The wasabi server will keep track of which
objects are newly added, deleted, or updated.
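One simple way to track added, deleted, and updated objects is to compare a fingerprint of each object against the fingerprints recorded at the previous synchronization. The sketch below shows that idea only; it is not a description of how the wasabi synchronizer is actually implemented, and the function names are hypothetical.

```python
import hashlib


def fingerprint(rdf_xml: str) -> str:
    """Return a content hash used to detect whether an object changed."""
    return hashlib.sha256(rdf_xml.encode("utf-8")).hexdigest()


def classify_changes(previous, current):
    """Compare two synchronization runs.

    previous: dict mapping LSID -> fingerprint from the last run
    current:  dict mapping LSID -> serialized RDF from this run
    Returns (added, deleted, updated, new_fingerprints).
    """
    now = {lsid: fingerprint(doc) for lsid, doc in current.items()}
    added = sorted(now.keys() - previous.keys())
    deleted = sorted(previous.keys() - now.keys())
    updated = sorted(
        lsid for lsid in now.keys() & previous.keys() if now[lsid] != previous[lsid]
    )
    return added, deleted, updated, now
```

The returned new_fingerprints dictionary is what would be stored for the next run. The lists of changed identifiers are also exactly what an incremental harvest needs in order to report only modified records.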
The RDF server is only one component of the wasabi project. It also
includes a library that implements the client side of the supported
data access protocols. This makes it easy for people to write custom
software to grab data from wasabi servers. Researchers who want to
gather large amounts of data for analysis can use the client library to
simplify the task.
Another important part of the wasabi project is the indexer. The
indexer supports harvesting from multiple distributed wasabi servers.
Harvested data are then pushed into the indexer which can generate
indices that are designed to support various types of queries. For
example, the indexer can use a Google-style inverted index that is well
suited to full-text queries, a database with geospatial extensions to
support geographical queries, or even a triple store. The wasabi
indexer could be used by large data aggregators like GBIF or by custom
software developers. One example of the latter is developers of
collections management software who might want to index and cache data
from TCS providers so that their users can associate specimens with
taxon concepts through an easy-to-use desktop software package.
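For readers unfamiliar with the term, an inverted index maps each word to the set of objects that contain it, which is what makes name-based lookup of LSIDs cheap once the data have been harvested. The following is a toy sketch of the idea, not the wasabi indexer itself; the function names and the example.org identifiers are made up for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase token to the set of document ids containing it.

    documents: dict mapping an identifier (e.g. an LSID) -> text to index
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every token in the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical harvested records, used only for illustration.
records = {
    "urn:lsid:example.org:names:1": "Puma concolor",
    "urn:lsid:example.org:names:2": "Puma yagouaroundi",
}
name_index = build_inverted_index(records)
```

With this index, search(name_index, "puma concolor") returns only the first identifier, while search(name_index, "puma") returns both. This is the kind of taxon-name-to-LSID lookup that the LSID protocol by itself does not offer.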
Both the client library and the indexer allow computers to access and
work with data. The wasabi project also provides an extensible web
portal. Built over the client library and an index of data objects
harvested from wasabi servers, the portal component allows data to be
accessed by people through a customizable interface. It supports
browsing, searching, and downloading of data objects. One use for the
portal component might be as the public face of a thematic data network
like FishNet2, MaNIS, OrNIS, or HerpNet.
The wasabi project is free and open source. It is implemented in Java.
The server, client library, and indexer portions of the project are now
in beta and we plan to release them by the end of the year. The portal
is still under active development.
I'll be presenting wasabi at the TDWG meeting in St. Louis next month.