[tdwg-guid] An approach to Abstract LSIDs

Wed Jul 18 18:12:56 CEST 2007

    Greg,

    I believe that there is a misunderstanding here.

    The proposal of assigning LSIDs to physical-world entities and to all 
of their digital representations is perfectly feasible and sensible, but 
it is not a required practice. In other words, you are not obligated to 
keep LSIDs for physical and digital objects separately when assigning 
and resolving LSIDs.

    While some providers may find it useful to represent the world in such 
a way (and agree to perform the extra management tasks this requires), 
other providers may very well adopt a simpler approach. For example, 
many providers will find it simpler just to keep the data model they 
currently implement in their institutions and assign an LSID to every 
record that they wish to share from their databases. This way they 
would keep a one-to-one mapping between LSIDs and database records. In 
those cases, data providers don't even need to care about whether they 
are modeling physical entities or just their digital representations.
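A minimal Python sketch of the one-to-one scheme described above, assuming the record's primary key is reused as the LSID object identifier. The function names and the `example.org` authority are hypothetical illustrations, not any provider's actual implementation.

```python
def record_lsid(authority: str, namespace: str, pk: int) -> str:
    """Mint the LSID for one shared database record (hypothetical scheme)."""
    # One LSID per record: the record's primary key becomes the object id.
    return f"urn:lsid:{authority}:{namespace}:{pk}"


def record_pk(lsid: str, authority: str, namespace: str) -> int:
    """Recover the primary key from an LSID minted by record_lsid()."""
    prefix = f"urn:lsid:{authority}:{namespace}:"
    # Strict match: only LSIDs minted by this provider are accepted.
    if not lsid.startswith(prefix):
        raise ValueError(f"{lsid!r} was not minted for {authority}:{namespace}")
    return int(lsid[len(prefix):])


if __name__ == "__main__":
    lsid = record_lsid("example.org", "names", 178962)
    print(lsid)
    print(record_pk(lsid, "example.org", "names"))
```

Because the mapping is a pure function of the primary key in both directions, no extra lookup table is needed to keep LSIDs and records one-to-one.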

    I believe that you assumed that this simpler approach to assigning 
LSIDs was feasible, and that this was your main motivation for adopting 
LSIDs. I am quite sure that this assumption still holds true.

    Furthermore, I believe that you would still benefit from replacing 
your local trusted, persistent and opaque surrogates with LSIDs. First, 
because LSIDs have all the properties you mentioned and more: they are 
globally (not only locally) unique, their syntax has been standardized, 
they have a standard resolution mechanism, and they provide provenance. 
A good example of putting LSIDs to good use would be to keep track of 
the relationships between taxon names exported by APNI to IPNI.
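Since the syntax is standardized as `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`, any consumer can split an LSID into its parts without knowing who issued it. A minimal Python sketch follows; the `LSID` class and `parse_lsid` function are hypothetical names, and this only checks the component layout, not resolution.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """Components of urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components; raise ValueError if malformed."""
    parts = text.strip().split(":")
    # Five parts without a revision, six with one; the "urn:lsid" prefix
    # is accepted in any letter case.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

For example, `parse_lsid("urn:lsid:indexfungorum.org:names:178962")` yields authority `indexfungorum.org`, namespace `names`, object `178962` and no revision.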

    I hope this makes things clearer and helps you see the LSID 
specification again as a useful tool to share and manage your data records.

    Best regards,

Ricardo


Greg Whitbread wrote:
> Now I am worried. This is becoming very confusing.
>
> For a while there I almost believed that we might use LSIDs as unique 
> identifiers but ... I (CANB) simply cannot afford the risk associated 
> with any implementation of LSIDs within our database as either object 
> keys or instance identifiers when what we really need is to stay with 
> our trusted, persistent and opaque surrogates.
>
> But given that we already use a GUID to manage the identity of an 
> object, the LSID still adds two very useful methods to our persistence 
> model:
>
> getData, to return a particular instance of an object (same state); and
>
> getMetadata, to establish relationships within and between states.
>
> The LSID becomes a surrogate for a query about an object rather than 
> the object itself.  The relationship object:LSID is one-to-many.
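The one-to-many model described above (a local GUID carries object identity, while each LSID stands for one frozen state) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `SnapshotResolver` and its methods are hypothetical, the optional revision field is used as a simple per-object counter, and local GUIDs are assumed to contain no colons.

```python
import json
from typing import Dict, List


class SnapshotResolver:
    """Hypothetical store: one local object GUID -> many LSIDs, one per state."""

    def __init__(self, authority: str, namespace: str) -> None:
        self.authority = authority
        self.namespace = namespace
        self._data: Dict[str, bytes] = {}          # LSID -> frozen state bytes
        self._object_of: Dict[str, str] = {}       # LSID -> local object GUID
        self._lsids_of: Dict[str, List[str]] = {}  # local GUID -> its LSIDs

    def snapshot(self, guid: str, state: dict) -> str:
        """Freeze the object's current state under a new LSID."""
        lsids = self._lsids_of.setdefault(guid, [])
        # Assumption: the LSID revision field is a per-object counter.
        lsid = f"urn:lsid:{self.authority}:{self.namespace}:{guid}:{len(lsids) + 1}"
        self._data[lsid] = json.dumps(state, sort_keys=True).encode("utf-8")
        self._object_of[lsid] = guid
        lsids.append(lsid)
        return lsid

    def get_data(self, lsid: str) -> bytes:
        """Return exactly the bytes stored when this state was frozen."""
        return self._data[lsid]

    def get_metadata(self, lsid: str) -> dict:
        """Relate this state to its object and to the object's other states."""
        guid = self._object_of[lsid]
        return {"object": guid, "states": list(self._lsids_of[guid])}
```

Here the provider's trusted surrogate (`guid`) never changes, and an LSID is only minted when a state is published, which keeps persistence management on the provider's existing keys.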
>
> To meet our TDWG obligations we will deliver data sets about objects 
> uniquely identified by LSIDs and we will establish the necessary 
> resolvers.  The question of underlying persistence implied by this 
> agreement is another matter.  There is nothing in the candidate 
> standard to help data providers deal with the issues of object, or 
> instance, identity management and many will simply find it beyond 
> their resources and/or capabilities.
>
> Perhaps there is a way, peer-to-peer like, if providers can be 
> convinced to use both object and instance identifiers, for our 
> aggregators to provide services delivering the kinds of metadata and 
> instance persistence required. I don't think that is going to come 
> from many providers.
>
> While LSIDs may provide a useful framework for managing and testing 
> object persistence their true value will still lie in the guarantee, 
> even if ephemeral or without the benefits of resolution, of an 
> object's identity and provenance; and in the advantages their mere 
> presence in a dataset can offer to both providers and users of these 
> data.
>
> greg
>
>
>
> Richard Pyle wrote:
>>> Isn't this where I came in last week Rich?
>>
>> Sort of....but we weren't clear back then about the role of the
>> "Abstract" LSID that would encompass all of the many database-record
>> LSIDs.  I know this is almost a no-brainer for those who have been part
>> of the LSID discussions over the years, but I think part of our
>> confusion is that we're talking about data-bearing LSIDs applied to
>> database records as if *they* would be the LSIDs we also use to
>> represent the abstract notion of "the name".
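The distinction drawn above, a data-less "abstract" LSID for the name that points to many data-bearing record LSIDs, might be sketched in Python like this. It is a hypothetical illustration only: `AbstractNameLSID` is an invented name and the `example.org` LSID is made up; the Index Fungorum LSID is the one quoted in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AbstractNameLSID:
    """Hypothetical data-less LSID for the abstract notion of a name."""
    lsid: str
    usage_lsids: List[str] = field(default_factory=list)

    def get_data(self) -> Optional[bytes]:
        # The abstract name has no digital manifestation, so there is no
        # bytestream to return.
        return None

    def get_metadata(self) -> dict:
        # Metadata may be updated over time: here it links the abstract
        # name to the data-bearing LSIDs of individual database records.
        return {"identifies": "abstract name", "usages": list(self.usage_lsids)}


if __name__ == "__main__":
    name = AbstractNameLSID("urn:lsid:example.org:names:amanita-phalloides")
    name.usage_lsids.append("urn:lsid:indexfungorum.org:names:178962")
    print(name.get_data(), name.get_metadata())
```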
>>
>>> The LSID urn:lsid:indexfungorum.org:names:178962 is assigned to the 
>>> IF database record for Amanita phalloides (the deathcap - used as 
>>> one of the EoL sample species pages [well done EoL]). 
>>
>> Right -- so this is really a "Name Usage" instance -- that is, the
>> usage of the name "Amanita phalloides" by Index Fungorum.  Or, maybe
>> it's a usage instance from some other publication that IF indexes, like
>> the original description (=protologue) of the name "Amanita phalloides".
>>
>>> If I recall correctly the
>>> getData() returns "Amanita phalloides" 
>>
>> Not according to Kevin (who agrees with me on the data-less LSIDs for
>> names) -- but he can answer that himself when he gets back to his email
>> (I think he and Sally are just now getting on board a plane leaving
>> Hawaii as I type this).
>>
>>> [not the primary key of the database record, which is 178962 - Rich, 
>>> why did you restrict the getData() to returning a PK?]
>>
>> My rationale is this:  If getData() returns a bytestream (rather than
>> nothing), then pretty much by definition the LSID identifies a digital
>> object -- not an abstract object.  The "name" is an abstract object,
>> with no digital (or even physical) manifestation.  So, if the LSID
>> returns binary data via getData(), then the LSID identifies a digital
>> object, which in the scenario I described would be a computer database
>> row (record).  I suggested the PK as a "natural" binary representation
>> for a database record because it's the attribute of a database record
>> that is LEAST likely to ever need to be changed.  Technically, if the PK
>> changes, then you're really talking about a *different* database row,
>> and as such, it would be a different digital object, and as such, it
>> would need a new LSID.
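The PK-as-data approach described above could look roughly like the following Python sketch: getData() returns only the primary key's bytes, and every other column is served as (changeable) metadata. `RecordResolver` and the in-memory `table` are hypothetical stand-ins for a provider's database and resolver.

```python
from typing import Dict


class RecordResolver:
    """Hypothetical resolver where getData() returns only the record's PK."""

    def __init__(self, authority: str, namespace: str, table: Dict[int, dict]) -> None:
        self.prefix = f"urn:lsid:{authority}:{namespace}:"
        self.table = table  # primary key -> mutable column values

    def _pk(self, lsid: str) -> int:
        if not lsid.startswith(self.prefix):
            raise KeyError(lsid)
        pk = int(lsid[len(self.prefix):])
        if pk not in self.table:
            raise KeyError(lsid)
        return pk

    def get_data(self, lsid: str) -> bytes:
        # The PK is the row attribute least likely to change, so these
        # bytes stay fixed for as long as the row exists.
        return str(self._pk(lsid)).encode("ascii")

    def get_metadata(self, lsid: str) -> dict:
        # All other columns are metadata and may be edited freely.
        return dict(self.table[self._pk(lsid)])
```

Editing a non-key column changes only what get_metadata() returns, so the same LSID remains valid after the edit.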
>>
>> In most cases, the content of other columns (fields) in a database
>> record is more subject to change.  If you embedded the content of other
>> columns/fields into the "data" part of the LSID, then you would be
>> duty-bound (per the LSID specs) to generate a new LSID every time you
>> changed any part of any column/field that was included within the
>> scope of "data" returned by getData().
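The rule above can be expressed as a small check: serialize only the fields placed inside getData() scope, and if an edit changes those bytes, the old LSID can no longer be reused. A minimal Python sketch, assuming JSON with sorted keys as the (hypothetical) canonical serialization; `data_bytes` and `requires_new_lsid` are illustrative names.

```python
import json
from typing import Iterable, Mapping


def data_bytes(record: Mapping[str, object], data_fields: Iterable[str]) -> bytes:
    """Canonical bytes for the fields placed inside getData() scope."""
    subset = {name: record[name] for name in data_fields}
    return json.dumps(subset, sort_keys=True).encode("utf-8")


def requires_new_lsid(old: Mapping[str, object], new: Mapping[str, object],
                      data_fields: Iterable[str]) -> bool:
    """True if an edit changes the data bytes, so the old LSID cannot be reused."""
    fields = list(data_fields)  # materialize so a generator can be passed safely
    return data_bytes(old, fields) != data_bytes(new, fields)
```

With only the PK in scope, editing a remarks column leaves the data unchanged; once that column is in scope, the same edit forces a new LSID, which is the trade-off described above.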
>>
>> Because I like the idea of GUID reusability, my inclination would be to
>> follow a protocol that minimized the generation of new GUIDs for
>> objects that I would otherwise intuitively think of as the "same"
>> thing.  Frankly, the biologist in me is FAR more interested in GUIDs
>> for abstract objects (i.e., objects without inherent digital
>> manifestation, such as taxon names, specimens, etc.) than in GUIDs that
>> identify specific database records.
>>
>>> Is this your proposed solution Rich?
>>
>> Not exactly....but I only had 5 hrs sleep last night, and it's been a
>> REALLY long day (11pm now), so it's probably best for everyone concerned
>> that I shut up now and go to bed....
>>
>> :-)
>>
>> Aloha,
>> Rich
>>
>>
>> _______________________________________________
>> tdwg-guid mailing list
>> tdwg-guid at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-guid
>