[tdwg-guid] First step in implementing LSIDs?[Scanned]

Sun Jun 3 13:03:58 CEST 2007

I think we need to be clear what gets an LSID (or a GUID in general).

Some of the things listed by Anna are digital records, such as an  
image. It seems simplest to give these GUIDs that identify the image,  
with metadata linking the image to the thing the image depicts (there  
are existing RDF vocabularies to do this).

Some things listed, such as a specimen, are physical objects. These
are different from digital objects, and the way in which GUIDs that
identify real things are handled has caused all manner of discussion
(see http://www.w3.org/DesignIssues/HTTP-URI and related pages
bookmarked at http://del.icio.us/rdmpage/303). LSIDs don't handle
this well, unless we rely on metadata saying "the thing identified by".

So, at least on this level, saying that all seven things get the same
GUID is clearly a non-starter.

Relationships between things can be easily specified in metadata ("is  
part of", "depicts", "is kind of").
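
As a rough sketch of what such metadata might look like (in Python
rather than RDF, with made-up example.org LSIDs and a hypothetical
helper function, none of which come from any real provider), each
statement is just a subject, predicate, object triple:

```python
# Hypothetical identifiers, for illustration only.
SPECIMEN = "urn:lsid:example.org:specimen:weevil-type"
PAPER_1990S = "urn:lsid:example.org:publication:paper-1990s"
IMAGE_PUBLISHED = "urn:lsid:example.org:image:published-figure"
IMAGE_MUSEUM = "urn:lsid:example.org:image:museum-scan"

# Each statement is a (subject, predicate, object) triple.
# Every thing keeps its own GUID; the links live in the metadata.
triples = [
    (IMAGE_PUBLISHED, "is part of", PAPER_1990S),
    (IMAGE_PUBLISHED, "depicts", SPECIMEN),
    (IMAGE_MUSEUM, "depicts", SPECIMEN),
]


def subjects_with(predicate, target, statements):
    """Return the sorted GUIDs that stand in `predicate` relation to `target`."""
    return sorted(s for s, p, o in statements if p == predicate and o == target)


# Both images are found via the specimen, without sharing its GUID.
depictions = subjects_with("depicts", SPECIMEN, triples)
print(depictions)
```

The point is only that distinct identifiers plus explicit relationships
are enough to get from the specimen to everything about it.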

The final issue is GUID reuse, that is, if somebody uses an INOTAXA
record, they should at a minimum refer to the INOTAXA LSID. This
would particularly apply to aggregators such as GBIF, who should not
present their own identifiers unless GBIF has actually created the
data. You often state "presumably shortly also available to GBIF in
some form". It's not clear to me what that means, but if it's in GBIF
because INOTAXA serves it, then I think GBIF should use INOTAXA LSIDs
to refer to INOTAXA records.

Clearly, generating a plethora of new, effectively local IDs
(masquerading as global) is not a recipe for progress. If we don't
reuse GUIDs we are wasting our time.
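
To make the reuse rule concrete, here is a minimal sketch of how an
aggregator might decide which identifier to publish. The Record class,
the guid_to_publish function, and the example.org LSIDs are all
hypothetical and do not describe how GBIF or INOTAXA actually work:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Record:
    data: str
    source_guid: Optional[str]  # GUID assigned by the original provider, if any


def guid_to_publish(record: Record, mint_local_id: Callable[[], str]) -> str:
    """Reuse the provider's GUID; mint a new one only if none exists."""
    if record.source_guid is not None:
        return record.source_guid
    return mint_local_id()


def mint() -> str:
    # Stand-in for whatever the aggregator would do for data it created itself.
    return "urn:lsid:aggregator.example.org:record:1"


harvested = Record("weevil type record", "urn:lsid:provider.example.org:record:42")
home_grown = Record("record created by the aggregator", None)

print(guid_to_publish(harvested, mint))   # the provider's LSID, unchanged
print(guid_to_publish(home_grown, mint))  # a newly minted LSID
```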

Regards

Rod

On 2 Jun 2007, at 18:53, Weitzman, Anna wrote:

> Hi Rich (et al.),
> I'm going to join this particular discussion  in spite of the fact  
> that I have not been able to follow the entire GUID discussion over  
> the past couple of years and I may be repeating things that have  
> been resolved.
>
> Let's continue to investigate whether an LSID applies to the  
> physical specimen or the database record (or both?).
>
> What about the record(s) for that same physical object in the  
> literature?  As we mark up literature, we are going to generate  
> LSIDs for specimen records that will need to be resolved to be  
> related to the same physical object (in a collection) and the data  
> record (usually in that same collection's database).
>
> Let's look at the example that Chris Lyal and I are contemplating  
> as we work on implementing an INOTAXA pilot to show in Bratislava:
> 1) a weevil specimen here at USNM (a type described in the BCA)
> 2) a record for it in the museum's database (we do have a type  
> database for insects, and it will be available in a year or two),  
> available on the museum's website, through GBIF, and through INOTAXA
> 3) a record from digitized and parsed BCA in INOTAXA (presumably  
> shortly also available to GBIF in some form)
> 4) a record for the same weevil from a paper published in the 1950s  
> available through INOTAXA (presumably shortly also available to  
> GBIF in some form)
> 5) a record for that weevil from a paper published in the 1990s  
> available through INOTAXA  (presumably shortly also available to  
> GBIF in some form)
> 6) a published image (or series of images) in the paper from the  
> 1990s -- but now also digitized and made available through INOTAXA  
> (presumably shortly also available to GBIF in some form)
> 7) a digitized image (or series of images) made in our imaging  
> project and made available through the museum's database, INOTAXA,  
> GBIF and MorphoBank
>
> Either each of these (1-7) will need to have its own LSID (or an  
> equivalent in the case of the specimen itself) or they will all  
> need to have the same LSID.  If the former, they will all have to  
> resolve to the same parent LSID--is this for the specimen or the  
> record in its home database?--in order for the overall biodiversity  
> information system to really work.
>
> Or let's take that a step further and make that a fish, where not  
> only is there a record in the museum's database with its LSID, but  
> that same record for the same fish that was imported some years ago  
> into FishBase (now out of date perhaps, but still available to GBIF
> and via FishBase).  At the time, it was imported without an LSID
> and FishBase has (presumably) assigned its own LSID...
>
> Or let's say that someone else digitized their copy of the same BCA  
> volume and followed the INOTAXA (taXMLit) and assigned yet another  
> LSID for the specimen record...is that really the same 'record' or  
> different from the one in #3?
>
> I would like to think that in the long run we do not need multiple
> LSIDs for records that refer to the same specimen or record (as
> long as we can be truly certain that they are 'the same').  After
> all, the literature markup has a whole series of unique IDs for its  
> various parts already, so can't we refer to 'the use of LSID 123 in  
> workID 987' or 'the use of LSID 123 on pageID 456 in workID 987'?
>
> There are a lot of IDs here, but unless every collection database  
> already has an LSID that we can 'grab' and use in INOTAXA we are  
> going to have to create our own LSIDs and count on a community  
> resolver to sort it all out (and even if that were true, not all  
> the specimens that we are going to be referring to from INOTAXA  
> have been put in electronic form anyplace else, so we will have to  
> assign LSIDs at least temporarily--Paul did not mention how they  
> are going to deal with the Zoological name LSIDs as at least a  
> temporary solution--but I assume that they have a similar problem).
>
> I'm sure I don't know what the best solution is, but that's what  
> I'm counting on the computer scientists in this group to tell me.   
> I just hope they tell me soon, since we're going to need answers soon!
>
> Cheers,
> Anna
>
> Anna L. Weitzman, PhD
> Botanical and Biodiversity Informatics Research
> National Museum of Natural History
> Smithsonian Institution
>
> office: 202.633.0846
> mobile: 202.415.4684
> weitzman at si.edu
>
> ________________________________
>
> From: tdwg-guid-bounces at lists.tdwg.org on behalf of Richard Pyle
> Sent: Sat 02-Jun-07 5:08 AM
> To: 'Paul Kirk'; 'Jason Best'; tdwg-guid at lists.tdwg.org
> Subject: RE: [tdwg-guid] First step in implementing LSIDs?[Scanned]
>
>
>
>
> Paul and List,
>
> First, I should clarify something about my earlier post.  I wrote at the
> start of Scenario 3:
>
> "3) Issue data-less LSIDs without using the revision ID feature, and track
> data change history separately from the LSIDs"
>
> That should have been "...and track *metadata* change history separately
> from the LSIDs" (metadata, not data).
>
>> So, without making things too complicated as we 'start to walk'
>> in this domain of biodiversity informatics my vote is for a
>> variation of scenario 3) from Rich. The reason I vote for this
>> is that in the fullness of time, and the 'herb.IMI' database
>> has already started this, much of the metadata will be
>> LSIDs and its correctness (i.e. sorting out typos etc) will
>> be delegated to the entities who issue those LSIDs. As IPNI
>> improves the quality of the metadata associated with the
>> LSIDs they issue (and if I understand correctly they do use
>> the scenario 3) from Rich) so the quality of the metadata
>> associated with a 'herb.IMI' LSID improves. The reason I
>> prefer the data + metadata 'model' is that in this instance
>> the data is fixed ... who changes collection/accession
>> numbers? ... so perfect for this role. Even if a collection
>> moves to a new owner the original data need not 'disappear'
>> in the same way that DOIs move with the objects as book and
>> journal titles change from one publisher to another.
>
> So...if I understand correctly, you differ from my scenario 3 in that you do
> generate data-bearing LSIDs for specimens, but the data part is limited to
> only the Accession number, not the complete set of data fields associated
> with the record -- correct?  So, in effect, the object the LSID actually
> applies to is the binary accession number, not the "concept" of the
> specimen.  I can imagine in this case that the LSID can be thought of as
> representing the "concept of the specimen" because the accession number
> itself is a surrogate for the physical specimen.  The only thing that
> concerns me about this approach is that there is a non-zero incidence of
> accidental duplicate catalog numbers within a given collection, and possibly
> errors in associating catalog numbers.  For example, if the computer database
> for a collection had an error created by a technician who, for example,
> entered the metadata for accession number IMI1234569 by mistake, when it
> should have been IMI1234596 (and vice versa), then branding the accession
> number as "data" for the LSID means that the LSID technically *must* stay
> with the accession number (not the specimen associated with the metadata for
> that LSID), after the error is discovered.  Not a huge problem, but could
> surprise people who had indexed the LSID before the error was discovered,
> who then came back to resolve it again after the error was fixed (i.e., they
> would get totally wrong information).  Given how rare this problem is likely
> to be (against a backdrop of many far more likely problems we will have to
> overcome), I don't see this as a strong reason not to proceed with your
> plan.
>
>> Final point, the 'data' is the 'herb.IMI' accession number;
>> in context this is a GUID because of the existence of Index
>> Herbariorum. So, our data will be 123456 not IMI123456
>> because ... in the fullness of time we will include an
>> Index Herbariorum LSID to 'identify' the 'institutional
>> acronym' element of the metadata.
>
> Is the binary data for the accession number in 8-bit, or 16-bit?  I'm
> assuming 8-bit would be fine, as I suspect all collections would have
> accession numbers that can be rendered with 8-bit (256-character)
> extended ASCII.  Is there any "wrapper" to the number as binary data, or
> is it a straight ASCII binary representation (e.g.:
> 001100010011001000110011001101000011010100110110 for "123456")?
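
A quick check of the "straight ASCII" reading, as a minimal Python
sketch (the ascii_bits function name is mine, and this says nothing
about how herb.IMI actually encodes its data). Eight bits per
character gives 48 bits for a six-digit number such as "123456":

```python
def ascii_bits(text: str) -> str:
    """Return the 8-bit-per-character ASCII encoding of `text` as 0s and 1s."""
    # str.encode("ascii") raises UnicodeEncodeError for non-ASCII input.
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


bits = ascii_bits("123456")
print(bits)       # 001100010011001000110011001101000011010100110110
print(len(bits))  # 48
```

With no wrapper, nothing in those bits says which collection the
number belongs to.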
>
> I'm not sure I follow the logic of how embedding the accession number as
> data for the LSID allows the LSID to move to a new owner.  I would think the
> opposite. Isn't it likely that the new owner would create their own
> accession number for the specimen?  In this case, they would be forced to
> generate a new LSID if they were following the same practice of encoding the
> accession number as "data", rather than metadata.
>
> Also, wouldn't it make more sense to include the acronym (IMI) as part of
> the data for the LSID? At least that way the "12345" would have *some*
> context.
>
> Finally, this approach would work only for collections where there is a
> strict 1:1 correlation between accession numbers and specimen objects for
> which an LSID is desired.
>
> Thanks for your comments -- this thread is already forcing me to think about
> things in a way I hadn't thought of them before.
>
> Aloha,
> Rich
>
> Richard L. Pyle, PhD
> Database Coordinator for Natural Sciences
>   and Associate Zoologist in Ichthyology
> Department of Natural Sciences, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef at bishopmuseum.org
> http://hbs.bishopmuseum.org/staff/pylerichard.html
>
>
>
> _______________________________________________
> tdwg-guid mailing list
> tdwg-guid at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-guid
>

----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page at bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
iChat: aim://rodpage1962
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Rod's rants on ants: http://semant.blogspot.com
