In my previous post, I quoted the description of "Abstract" LSIDs from the LSID Best Practices page (http://www-128.ibm.com/developerworks/opensource/library/os-lsidbp/). Here is the full section:
***************************************
Abstract LSIDs
The data behind the data bytes of a concept might exist in multiple data formats or derivations. One approach using a single LSID would be to append all different instances together, using some token to separate the different formats. This solution is poor for many reasons, primarily because the client must download all formats. The best approach is to create a different LSID for each data format or for derivations and connect them with a single abstract LSID.
The benefit of using an abstract scheme is that it allows for LSIDs that do not name actual data bytes but instead provide only metadata documents. These LSIDs can be used to represent abstract notions, such as a gene or protein, which may have many concrete representations. The metadata documents associated with these abstract LSIDs can contain multiple relationships pointing to LSIDs that name data bytes.
In this way, researchers can use a series of LSIDs to create an interconnected metadata graph to name objects that may have many different representations. The abstract LSID provides the anchor point for software and users to explore the metadata and obtain further pointers to all the concrete LSID references that contain data, along with the data's exact relationship to the abstract concept. This level of indirection is very powerful.
***************************************
Previously, we've debated whether an LSID for a non-digital object should be assigned to the "Abstract" object, or to a specific database record created for that object. I'll stick with the Taxon Name example, but the same principles apply to other non-digital objects like specimens, observations, reference citations, etc.
Many, many databases in the world include a database record to represent the butterflyfish genus described by Linnaeus in 1758 (which, for the sake of simplicity, I'll henceforth refer to via the ASCII rendering "Chaetodon").
Database records (rows) are, inherently, digital objects, and therefore can (with some level of established convention) be represented by binary "data" -- retrievable via getData(). Thus, each of the many, many database records out there can receive a proper data-bearing LSID. Obviously, there would need to be mechanisms to make sure that the bytestream returned by getData() for each of these inherently digital database records is always bit-consistent. This could be relatively easy if the only "data" returned for the LSID is a specified encoding of the primary key value for the database record, and all the other columns/fields are returned via getMetadata(). But the point is, a database record *is* an inherently digital object, and therefore *can* be legitimately represented by a data-bearing (non-Abstract) LSID.
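To make the primary-key idea concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: the class name RecordLSID, the example.org authority, the UTF-8 encoding of the key, and the column names are all hypothetical, and the get_data()/get_metadata() methods merely stand in for the getData()/getMetadata() calls rather than reproducing any real LSID library.

```python
from dataclasses import dataclass, field


@dataclass
class RecordLSID:
    """Hypothetical data-bearing LSID for a single database row."""
    lsid: str
    primary_key: int
    columns: dict = field(default_factory=dict)

    def get_data(self) -> bytes:
        # Only the primary key is encoded (assumed convention: decimal
        # digits as UTF-8), so the bytes stay identical no matter how
        # the other columns are edited later.
        return str(self.primary_key).encode("utf-8")

    def get_metadata(self) -> dict:
        # Every other column is served as metadata, which may change.
        return dict(self.columns)


record = RecordLSID(
    lsid="urn:lsid:example.org:names:1234",  # made-up identifier
    primary_key=1234,
    columns={"genus": "Chaetodon", "author": "Linnaeus", "year": 1758},
)

before = record.get_data()
record.columns["remarks"] = "butterflyfish genus"  # edit a metadata column
after = record.get_data()

print(before == after)  # the data bytes did not change
```

The design choice here is simply that anything liable to be corrected or extended lives on the metadata side, leaving the data side trivially stable.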
We could then assign an "Abstract" LSID for the "idea" or "notion" of the scientific name "Chaetodon", and use that LSID in the spirit of the above-quoted best practices description of Abstract LSIDs to track "further pointers to all the concrete LSID [for database records established for the genus Chaetodon] references that contain data".
That would effectively allow the Abstract LSID to serve the needs of those of us who *want* a shared, reusable, persistent identifier for the idea/notion/concept of the taxon name "Chaetodon", which itself serves as an index of sorts to all manner of database records (digital objects) that contain data (and metadata) associated with that taxon name.
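As a rough sketch of that indexing role, the Python below models one Abstract LSID whose metadata points at two record-level LSIDs. Everything named here is an assumption for illustration: the LSIDResource class, the "representedBy" relationship label, the in-memory registry standing in for resolution, and the example authorities. An Abstract LSID is modelled as one whose get_data() returns nothing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LSIDResource:
    """Hypothetical in-memory stand-in for a resolvable LSID."""
    lsid: str
    data: Optional[bytes] = None  # None marks an abstract (metadata-only) LSID
    metadata: Dict[str, object] = field(default_factory=dict)

    def get_data(self) -> Optional[bytes]:
        return self.data

    def get_metadata(self) -> Dict[str, object]:
        return self.metadata


def concrete_records(abstract: LSIDResource,
                     registry: Dict[str, LSIDResource]) -> List[LSIDResource]:
    """Follow the abstract LSID's pointers to the LSIDs that carry data."""
    pointers = abstract.get_metadata().get("representedBy", [])
    return [registry[p] for p in pointers
            if p in registry and registry[p].get_data() is not None]


# Two databases, each with its own record for the genus (made-up LSIDs).
record_a = LSIDResource("urn:lsid:museum-a.example:names:1234",
                        data=b"1234", metadata={"genus": "Chaetodon"})
record_b = LSIDResource("urn:lsid:museum-b.example:taxa:77",
                        data=b"77", metadata={"genus": "Chaetodon"})

# The shared abstract LSID names the idea of the taxon name itself.
abstract = LSIDResource(
    "urn:lsid:names.example:genus:chaetodon",
    metadata={"label": "Chaetodon Linnaeus, 1758",
              "representedBy": [record_a.lsid, record_b.lsid]},
)

registry = {r.lsid: r for r in (record_a, record_b, abstract)}

for record in concrete_records(abstract, registry):
    print(record.lsid, record.get_data())
```

A client holding only the abstract identifier can then discover every participating database record without any of those databases having to know about each other.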
Aloha, Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html