[tdwg-guid] An approach to Abstract LSIDs[Scanned]

Paul Kirk p.kirk at cabi.org
Mon Jul 16 09:16:29 CEST 2007

Isn't this where I came in last week, Rich?

The LSID urn:lsid:indexfungorum.org:names:178962 is assigned to the IF
database record for Amanita phalloides (the deathcap - used as one of
the EoL sample species pages [well done EoL]). If I recall correctly,
getData() returns "Amanita phalloides" [not the primary key of the
database record, which is 178962 - Rich, why did you restrict
getData() to returning a PK?]; getMetadata() returns "(Vaill. ex
Fr.) Link", "1833", etc. Whatever the encoding, this LSID will always
return the same 'payload'. If any character in the string has to
change for whatever reason (e.g. a typographical 'spelling' error, a
Code-required orthographic correction, even the capitalization of the
specific epithet), the corrected name gets a new LSID (a new database
record), and the 'old' database record behind the 'old' LSID gains a
column containing the new PK and a metadata element in getMetadata()
containing the new LSID. Kevin will correct me if I'm wrong on this.
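
To make the versioning rule above concrete, here is a minimal Python
sketch. It is an illustration under stated assumptions, not the Index
Fungorum implementation: NameStore, NameRecord, the "supersededBy"
metadata key and the PKs 900001/900002 are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def make_lsid(pk: int) -> str:
    """Build a name LSID in the pattern quoted above from a primary key."""
    return f"urn:lsid:indexfungorum.org:names:{pk}"


@dataclass
class NameRecord:
    pk: int
    name: str                            # payload served by get_data()
    authors: str
    year: str
    superseded_by: Optional[int] = None  # PK of the replacement record


class NameStore:
    """Hypothetical in-memory stand-in for the names table."""

    def __init__(self) -> None:
        self._rows: Dict[int, NameRecord] = {}

    def add(self, row: NameRecord) -> str:
        if row.pk in self._rows:
            raise ValueError("a PK, and so its LSID, is never reused")
        self._rows[row.pk] = row
        return make_lsid(row.pk)

    def _row(self, lsid: str) -> NameRecord:
        return self._rows[int(lsid.rsplit(":", 1)[1])]

    def get_data(self, lsid: str) -> bytes:
        # The name string is never edited in place, so these bytes never change.
        return self._row(lsid).name.encode("utf-8")

    def get_metadata(self, lsid: str) -> Dict[str, str]:
        row = self._row(lsid)
        meta = {"authors": row.authors, "year": row.year}
        if row.superseded_by is not None:
            # Assumed key name; the message only says "a metadata element".
            meta["supersededBy"] = make_lsid(row.superseded_by)
        return meta

    def correct(self, old_pk: int, new_pk: int, corrected_name: str) -> str:
        """Record a correction as a new row and point the old row at it."""
        old = self._rows[old_pk]
        new_lsid = self.add(NameRecord(new_pk, corrected_name, old.authors, old.year))
        old.superseded_by = new_pk
        return new_lsid


store = NameStore()
# PKs 900001 and 900002 are invented for illustration only.
old_lsid = store.add(
    NameRecord(900001, "Amanita Phalloides", "(Vaill. ex Fr.) Link", "1833")
)
new_lsid = store.correct(900001, 900002, "Amanita phalloides")
print(store.get_data(old_lsid))      # the old payload is untouched
print(store.get_metadata(old_lsid))  # now carries a pointer to new_lsid
```

Only the metadata of the old record changes; its data bytes stay fixed,
which is the property the LSID is meant to guarantee.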

Is this your proposed solution, Rich?


PS the author string above could have been represented by "(" &
urn:lsid:ipni.org:authors:11023-1 & "ex" &
urn:lsid:ipni.org:authors:2913-1 & ")" &
urn:lsid:ipni.org:authors:22401-1 but more elegantly rendered in XML ...
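
The concatenation in the PS could be sketched in Python as follows. The
lookup table is a hypothetical local stand-in for resolving each IPNI
author LSID, and the LSID-to-abbreviation pairing is only inferred from
the order of the parts in the PS; render_author_string is an invented
helper, not part of any LSID library.

```python
from typing import Dict, List

# Hypothetical local lookup; a real client would resolve each LSID instead.
# The pairing below is inferred from the order of the parts in the PS.
AUTHOR_ABBREVIATIONS: Dict[str, str] = {
    "urn:lsid:ipni.org:authors:11023-1": "Vaill.",
    "urn:lsid:ipni.org:authors:2913-1": "Fr.",
    "urn:lsid:ipni.org:authors:22401-1": "Link",
}


def render_author_string(parts: List[str]) -> str:
    """Join literal tokens and author LSIDs into a display string."""
    rendered = ""
    for part in parts:
        # Anything that is not a known LSID is treated as a literal token.
        token = AUTHOR_ABBREVIATIONS.get(part, part)
        # No space after "(" and none before ")"; one space everywhere else.
        if rendered and not rendered.endswith("(") and token != ")":
            rendered += " "
        rendered += token
    return rendered


parts = [
    "(",
    "urn:lsid:ipni.org:authors:11023-1",
    "ex",
    "urn:lsid:ipni.org:authors:2913-1",
    ")",
    "urn:lsid:ipni.org:authors:22401-1",
]
print(render_author_string(parts))  # (Vaill. ex Fr.) Link
```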

-----Original Message-----
From: tdwg-guid-bounces at lists.tdwg.org
[mailto:tdwg-guid-bounces at lists.tdwg.org] On Behalf Of Richard Pyle
Sent: 15 July 2007 20:00
To: tdwg-guid at lists.tdwg.org
Subject: [tdwg-guid] An approach to Abstract LSIDs[Scanned]

In my previous post, I quoted the section of the LSID Best Practices page
(http://www-128.ibm.com/developerworks/opensource/library/os-lsidbp/)
describing "Abstract" LSIDs.  Here is the full section:

Abstract LSIDs

The data behind the data bytes of a concept might exist in multiple data
formats or derivations. One approach using a single LSID would be to
append all different instances together, using some token to separate
the different formats. This solution is poor for many reasons, primarily
because the client must download all formats. The best approach is to
create a different LSID for each data format or for derivations and
connect them with a single abstract LSID.

The benefit of using an abstract scheme is that it allows for LSIDs that
do not name actual data bytes but instead provide only metadata.
These LSIDs can be used to represent abstract notions, such as a gene or
protein, which may have many concrete representations. The metadata
documents associated with these abstract LSIDs can contain multiple
relationships pointing to LSIDs that name data bytes.

In this way, researchers can use a series of LSIDs to create an
interconnected metadata graph to name objects that may have many
different representations. The abstract LSID provides the anchor point
for software and users to explore the metadata and obtain further
pointers to all the concrete LSID references that contain data, along
with the data's exact relationship to the abstract concept. This level
of indirection is very powerful.

Previously, we've debated about whether an LSID assigned to a
non-digital object should be assigned to the "Abstract" object, or to a
specific database record created for that object.  I'll stick with the
Taxon Name example, but the same principles apply to other non-digital
objects like specimens, observations, reference citations, etc.

Many, many databases in the world include a database record to represent
the butterflyfish genus described by Linnaeus in 1758 (which, for the
sake of simplicity, I'll henceforth refer to via the ASCII rendering
"Chaetodon").

Database records (rows) are, inherently, digital objects, and therefore
can (with some level of established convention) be represented by binary
data -- retrievable via getData().  Thus, the many, many database records
out there can each receive a proper data-bearing LSID.  Obviously, there
would need to be mechanisms to make sure that the bytestream returned by
getData() for these inherently digital database records is always the
same.  This could be relatively easy if the only "data" returned for the
LSID is a specified encoding of the primary key value for the database
record, and all the other columns/fields were returned via
getMetadata().  But the point is, a database record *is* an inherently
digital object, and therefore *can* be legitimately represented by a
data-bearing (non-Abstract) LSID.
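
As a rough illustration of the split described above (only the primary
key behind getData(), every other column behind getMetadata()), here is
a small Python sketch. The row, its PK "12345", the ASCII encoding and
the SHA-256 check are assumptions made for the example, not part of any
LSID specification or existing database.

```python
import hashlib
from typing import Dict


def get_data(row: Dict[str, str]) -> bytes:
    """Return only the primary key, in one fixed encoding (ASCII assumed)."""
    return row["pk"].encode("ascii")


def get_metadata(row: Dict[str, str]) -> Dict[str, str]:
    """Return every other column; these may be edited over time."""
    return {column: value for column, value in row.items() if column != "pk"}


def fingerprint(data: bytes) -> str:
    """Digest used to check that the getData() bytes have not changed."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical row; the PK "12345" is invented for illustration.
row = {"pk": "12345", "name": "Chaetodon", "author": "Linnaeus", "year": "1758"}
before = fingerprint(get_data(row))

row["author"] = "Linnaeus, C."  # a metadata-only edit
after = fingerprint(get_data(row))

print(before == after)  # True: the data bytes do not depend on the edited column
```

Because the data bytes depend on nothing but the key, corrections to the
other columns never disturb them.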

We could then assign an "Abstract" LSID for the "idea" or "notion" of
the scientific name "Chaetodon", and use that LSID in the spirit of the
above-quoted best practices description of Abstract LSIDs to track
"further pointers to all the concrete LSID [for database records
established for the genus Chaetodon] references that contain data".
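
A minimal Python sketch of that arrangement follows, assuming an
in-memory registry. Every authority, identifier and the
"hasDatabaseRecord" relation name below is invented for illustration;
none of them is an existing LSID or vocabulary term.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LsidEntry:
    lsid: str
    data: Optional[bytes] = None  # None marks an abstract LSID (no data bytes)
    links: List[Tuple[str, str]] = field(default_factory=list)  # (relation, target)


def concrete_records(registry: Dict[str, LsidEntry], abstract_lsid: str) -> List[str]:
    """Follow the abstract LSID's metadata links to data-bearing LSIDs."""
    entry = registry[abstract_lsid]
    return [
        target
        for _relation, target in entry.links
        if registry[target].data is not None
    ]


# All authorities and identifiers below are invented for illustration.
ABSTRACT = "urn:lsid:example.org:abstract-names:chaetodon"
RECORD_A = "urn:lsid:db-a.example.org:names:101"
RECORD_B = "urn:lsid:db-b.example.org:taxa:77"

registry = {
    ABSTRACT: LsidEntry(
        ABSTRACT,
        links=[("hasDatabaseRecord", RECORD_A), ("hasDatabaseRecord", RECORD_B)],
    ),
    RECORD_A: LsidEntry(RECORD_A, data=b"101"),
    RECORD_B: LsidEntry(RECORD_B, data=b"77"),
}

print(concrete_records(registry, ABSTRACT))
```

The abstract entry carries no data of its own; it only points at the
database-record LSIDs, each of which returns its own bytes.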

That would effectively allow the Abstract LSID to serve the needs of
those of us who *want* a shared, reusable, persistent identifier for
the idea/notion/concept of the taxon name "Chaetodon", which itself
serves as an index of sorts to all manner of database records (digital
objects) that contain data (and metadata) associated with that taxon
name.


Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
