[tdwg-guid] First step in implementing LSIDs?

Sat Jun 2 20:42:59 CEST 2007

Thanks for the additional information, Paul.

> Yes Rich, our plan is to apply the LSID to the accession 'number' 
> (actually an accession 'code' as we have an historical legacy of
> suffix 'a', 'b', etc for subdivisions of the original collection
> which in many cases is a collection of objects rather than one
> physical object - a bag of leaves for example). And yes, there
> are some possible problems with errors associated with the
> metadata but ... in the context of a DBMS where the accession
> number is set to unique values only, duplications are in reality
> impossible, and yes there are far more important challenges
> to address than this ... ;-)

O.K., thanks -- and I agree!

> I assume you are correct about the
> 001100010011001000110011001101000011010100110110
> ... I'm a systematist leaning towards nomenclature rather 
> than an IT person.

You and me both.  I was able to create the binary conversion not because I'm
a techno-whiz, but because I know just enough about how to use Google to be
dangerous (i.e., http://www.theskull.com/javascript/ascii-binary.html).  But
the main question was about exactly how one would convert the text "12345"
into a binary data blob for the LSID "data" (as opposed to metadata).
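
For what it's worth, here is a rough Python sketch of that conversion.  It
assumes the "data" is simply the ASCII encoding of the accession text, which
is my assumption and not something any LSID document prescribes; the function
names are made up for illustration.  Note that the 48-bit string quoted above
decodes to the six characters "123456".

```python
def text_to_blob(text: str) -> bytes:
    """Encode accession text as raw bytes (assumes plain ASCII)."""
    return text.encode("ascii")


def blob_to_bits(blob: bytes) -> str:
    """Render each byte as 8 binary digits, for human inspection only."""
    return "".join(f"{byte:08b}" for byte in blob)


def bits_to_text(bits: str) -> str:
    """Reverse blob_to_bits: split into 8-bit groups and decode as ASCII."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    return bytes(values).decode("ascii")


if __name__ == "__main__":
    blob = text_to_blob("12345")
    print(blob_to_bits(blob))
    print(bits_to_text(blob_to_bits(blob)))
```

The string of ones and zeros is only a display form; the blob itself is just
the five bytes of "12345".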

> I guess the 'change of ownership' comment was directed at the 
> importance of retaining the accession number as this is cited
> in the literature, and the utility of keeping this as a resolvable LSID.

Ah!  That makes sense.  But still, I'm a little uneasy about "committing" an
LSID to an accession number by branding it with data.  The main advantage I
see is that no matter how much manipulation happens with the metadata, there
will still be "something" permanently included with the LSID as data that a
human might be able to use to sort things out if the metadata get changed
too much.  Even so, I think I would want to also include an
institutional and/or collection prefix, so that the embedded number is
(potentially) more interpretable to an outside observer.

> A rather complex model is required for 'managing' the objects of 
> a collecting event and what subsequently happens to those objects,
> which others have more experience of and valid opinions on -
> I refer, for example, to a pit trap for insects where multiple
> objects are assigned an initial accession number, the objects
> are subsequently divided and divided again and again and finally
> a few may end up on pins as name bearing types.

Right -- that's a common occurrence in natural history collections.  The
number "12345" is assigned to a multi-species lot, and then later that lot
is split up into constituent parts.  I don't see this as a major problem,
though.  In cases where institutions typically retain the original number
for one of the original parts, and simply assign new numbers to the bits
that were "removed", then the only thing that changes on the original
accession number/LSID is the metadata for its contents, and the new
accession numbers/LSIDs would, presumably, include pointers back to the
original number/LSID as a "removed from" indicator -- but again, it only
affects metadata.  Conversely, institutions that assign new numbers to all
components of a split-up lot (effectively deprecating the original number
and retaining its meaning to the original multi-part lot) will also only
need to manage metadata changes.
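
To make the two cases concrete, here is a minimal Python sketch.  The record
layout and the field names (contents, removed_from, deprecated) are purely
hypothetical, not any institution's actual schema; the only point is that the
accession code itself never changes, while everything that does change lives
in metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Accession:
    code: str                                          # fixed once assigned
    contents: List[str] = field(default_factory=list)  # metadata
    removed_from: Optional[str] = None                 # metadata: pointer back
    deprecated: bool = False                           # metadata


def split_keep_original(lot: Accession, removed: List[str],
                        new_code: str) -> Accession:
    """Case 1: the original number stays with what remains of the lot."""
    missing = [item for item in removed if item not in lot.contents]
    if missing:
        raise ValueError(f"not in lot {lot.code}: {missing}")
    lot.contents = [item for item in lot.contents if item not in removed]
    return Accession(new_code, list(removed), removed_from=lot.code)


def split_renumber_all(lot: Accession,
                       parts: Dict[str, List[str]]) -> List[Accession]:
    """Case 2: every part gets a new number; the original is deprecated
    but keeps describing the original multi-part lot."""
    lot.deprecated = True
    return [Accession(code, list(items), removed_from=lot.code)
            for code, items in parts.items()]


if __name__ == "__main__":
    lot = Accession("12345", ["beetle", "ant", "spider"])
    pinned = split_keep_original(lot, ["ant"], "12346")
    print(lot)
    print(pinned)
```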

The more I think about it, the more I like your approach of branding the
LSID to the accession event, rather than some conceptual notion of a
"specimen" (which is actually more dynamic than I think most people
realize).  But I'd like to see TDWG create some sort of standard that we can
all collectively follow in how to actually do this.  Personally, I'd like to
see that standard include institution code and collection code within the
binary data blob.
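
As a sketch of what I have in mind (and only a sketch, since no such standard
exists yet), the blob could be the ASCII bytes of the three parts joined by a
separator.  The colon separator, the part order, and the codes "INST" and
"COLL" below are all placeholders I made up for illustration.

```python
from typing import Tuple

SEPARATOR = ":"  # assumed delimiter; a real standard would have to define one


def make_blob(institution_code: str, collection_code: str,
              accession_code: str) -> bytes:
    """Join the three parts and encode them as ASCII bytes."""
    parts = [institution_code, collection_code, accession_code]
    for part in parts:
        if not part or SEPARATOR in part:
            raise ValueError(f"invalid part: {part!r}")
    return SEPARATOR.join(parts).encode("ascii")


def parse_blob(blob: bytes) -> Tuple[str, str, str]:
    """Recover (institution, collection, accession) from a blob."""
    parts = blob.decode("ascii").split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError("expected exactly three parts")
    return parts[0], parts[1], parts[2]


if __name__ == "__main__":
    blob = make_blob("INST", "COLL", "12345a")  # placeholder codes
    print(blob)
    print(parse_blob(blob))
```

An outside observer who decodes the bytes then sees the holding institution
and collection alongside the accession code, not just a bare number.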

Aloha,
Rich