tdwg-content
I believe a centralised system that maintains and assigns the LSIDs would be
too large a job for one organisation and would create a bottleneck in the
system (every call, request or assignment of IDs would need to pass
through one web server located at GBIF).
The reality of the data that will be transferred using the SDD format is
that it is quite decentralised and is represented in quite different ways
at each data source. I therefore think it should be up to the organisation
holding the data to provide an LSID and to resolve these LSIDs. For example,
we are considering an LSID with a MAC GUID for the unique ID of our
taxonomic name data here at Landcare Research, which would be something
like URN:LSID:LandcareResearch:TaxonName:86BA062A-ADC6-4516-956F-34CDA0F465EC.
With a centralised system this would not be allowed, i.e. the
LSID would probably be limited to integers, which would then need to be
resolved at the individual organisation to find the matching data.
Having a non-centralised system would put more work on each organisation
involved, and create problems when data is moved, or organisations are
closed, but this just means that procedures need to be put in place to
handle such situations. It is possible that there may be intermediate
services that provide the LSID resolution for a bunch of databases/data
sources and serve up this data to those who request it.
I also think there is a bit of a tie between the authority name of the LSID
and the URL that is used for obtaining the data. This doesn't need to be
so. The job of the central Resolver would be to match the
authority name to the URL for obtaining the data. So the authority name in
the LSID could just be any name.
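
To make that last point concrete, here is a minimal sketch of a central
resolver that treats the authority name as an opaque lookup key. The
registry contents, the example.org base URL and the function names are all
hypothetical and invented for illustration; this is not an existing GBIF or
LSID service.

```python
from typing import Dict, NamedTuple


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:<authority>:<namespace>:<object> into its parts."""
    parts = text.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    # Any optional revision part after the object id is ignored here.
    return Lsid(parts[2], parts[3], parts[4])


# Hypothetical central registry: authority name -> base URL of the data source.
# The authority is only a key, so it does not have to look like a domain name.
RESOLVER_REGISTRY: Dict[str, str] = {
    "landcareresearch": "https://data.example.org/lsid/",
}


def resolve_url(text: str, registry: Dict[str, str] = RESOLVER_REGISTRY) -> str:
    """Return the URL where the data for this LSID could be fetched."""
    lsid = parse_lsid(text)
    base = registry[lsid.authority.lower()]  # KeyError if authority is unknown
    return f"{base}{lsid.namespace}/{lsid.object_id}"


if __name__ == "__main__":
    print(resolve_url(
        "URN:LSID:LandcareResearch:TaxonName:"
        "86BA062A-ADC6-4516-956F-34CDA0F465EC"
    ))
```

If the data moved to another host, only the registry entry would change and
the LSID itself would stay the same.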
Kevin Richards
On Thu, 23 Sep 2004 13:29:26 -1000, Richard Pyle
<deepreef(a)BISHOPMUSEUM.ORG> wrote:
>I want to start by wholeheartedly endorsing Wouter's plea for
>non-information-bearing (meaningless) GUIDs. This feature is CRITICAL to
>the long-term success of any GUID system. It is absolutely imperative that
>there NEVER be any motivation to change the content of a GUID (i.e., it
>should be permanent). If the GUID itself contains any information
>whatsoever, there may be motivation to change that information at a later
>time.
>
>For this reason, I had initially preferred the DOI approach, but over time,
>I am gradually warming up to the LSID approach. While components of an LSID
>do, indeed, represent information, they represent the one piece of
>information that I think may legitimately belong embedded within a GUID:
>context. That is, the context, or domain, of the GUID itself. The context
>in this case would be the "issuer" of the GUID -- not necessarily the
>current "owner" of the GUID (see more discussion on this below). Though the
>organization that issued a GUID may eventually disappear, the fact that the
>organization was the one to issue the GUID in the first place will never
>change, and thus represents a permanent and unchanging component of the
>GUID. Without the context portion, the GUID itself is really nothing more
>than a random string of characters. In summary, I'm warming up to the LSID
>approach because it represents embedded context, without the risk of
>temptation to change the content of a GUID after it has been issued.
>
>Regarding Donald's PPT file, I have a couple of comments and questions:
>(Assumes Title slide is "Slide 1")
>
>Slide 2:
>You note there is "No reliable mechanism" to relate the same record from
>different providers to each other. But in the context of DarwinCore, the
>combination of [InstitutionCode]+[CollectionCode]+[CatalogNumber] should
>represent a virtual GUID (provided that the Global Provider Registry ensures
>no duplication of [InstitutionCode]). I do realize that words like "should"
>and "reliable" are critical here. Perhaps the DarwinCore implementation
>should enforce the requirement of uniqueness of
>[CollectionCode]+[CatalogNumber] within a single [InstitutionCode], and
>further ensure globally unique [InstitutionCode] values via the Global
>Provider Registry.
>
>Slide 3:
>Wouldn't most of the problems indicated in the first four bulleted points be
>largely solved by the Global Provider Registry? Using the [InstitutionCode]
>would allow lookup in the registry for a (current/active) metadata URL, and
>the metadata URL would provide information on where to access a particular
>[CollectionCode]+[CatalogNumber] piece of data.
>
>The issue of specimens changing numbers and/or collections is problematic,
>of course.
>
>The issue of versioning is a bit dicey, in my mind (e.g., at what resolution
>of information change)? Some things, like changing taxonomic determinations
>(i.e., "real" changes) need to be handled in a robust way. Other things,
>like the correction of typos and different styles of representing the exact
>same information (e.g., R.L. Pile==>R.L. Pyle; or R.L. Pyle==>Pyle, R.L.)
>probably don't need to be versioned. Other sorts of changes (e.g., the
>elaboration of previously existing information, such as the addition of
>retroactively-generated georeference coordinates) fall somewhere in-between.
>
>Slide 4:
>We should all get behind SEEK in addressing these issues (Taxon concept
>mapping). Ultimately, we minimally need a GUID pool for References
>(inclusive of unpublished works), and a GUID pool for what I call
>"Protonyms" (original creations of IC_N Code-compliant names). The union of
>these two GUIDs (what I would call "Assertions") would itself represent a
>GUID to a "potential concept" (Berendsohn). (Note: my preference would be to
>define Protonyms as a subtype of Assertions, and therefore Protonym GUIDs
>would be a subset drawn from the same pool as Assertion GUIDs -- but this is
>a technical discussion for another time).
>
>Slide 5:
>Nice summary!!
>
>Slide 6:
>Good stuff here, but I'll respond with some of my personal opinions:
>
>- RevisionID: see points of concern already expressed above
>
>- Specimen Record LSIDs: I gather from subsequent slides that you recognize
>two alternative approaches: having the "owner" of a specimen assign the LSID
>within the context of their own <domainName>, or adopting GBIF as the
>international standard issuer for ALL specimen GUIDs. In other words, GBIF
>would represent the centralized issuer of GUIDs for all biological
>specimens, and the biological specimen community would/should rally around
>GBIF for this purpose, and adopt GBIF specimen GUIDs as their own. I
>personally have no problem with this (I do not live in fear of "Big Brother"
>centralization when it serves the benefit of all, as I believe it would in
>this case) -- but I know there are many who might have a problem with it,
>and therefore it might not garner widespread adoption without large volumes
>of "fuss".
>
>If, on the other hand, each organization issues its own GUIDs for its own
>set of specimens, then the question is when, if ever, GBIF would assign a
>specimen GUID? Perhaps as a surrogate for institutions that lack the
>technological ability to assign their own LSIDs? But I wonder, how many
>institutions that could serve electronic data of their holdings to the
>internet would lack the ability to assign their own LSIDs?
>
>As you've outlined in subsequent slides, I see two alternative paths: A)
>Get the biological world to rally around GBIF as the centralized provider of
>GUIDs for specimens for all collections; or B) Have each
>collection/institution issue its own set of LSIDs for its own specimens, and
>have GBIF adopt those LSIDs for its own internal purposes. I could get
>behind either approach, but I see danger in the adoption of a mixture of
>these two approaches. I'll defer elaboration, but a lot of it has to do with
>potential confusion about whether the GUID applies fundamentally to the
>physical specimen, or the electronic conglomeration of data associated with
>the specimen. Also, I think we should avoid the risk of assigning two
>separate GUIDs for the same "single data element" (sensu your Slide 5).
>
>- Name record LSIDs: I understand the example of an IPNI LSID for a plant
>name, and presumably there would be analogous "Catalog of Fishes" LSIDs for
>each fish name, etc. But I don't think that would be a wise approach.
>Unlike specimen records, where there are fairly unambiguous "owner"
>institutions (or at least "original owner" institutions that issued a GUID),
>taxonomic aggregators (IPNI, ITIS, Species2000, GBIF, uBio, etc.) are most
>certainly not owners of the taxonomic names that they include in their
>databases. We would want to avoid the risk of duplicate GUIDs for the same
>name, and thus the need for mapping, e.g., an IPNI GUID for a name to its
>ITIS equivalent. Again, I can't help but think that the world will be a
>better place if we can avoid assigning multiple GUIDs to the same "single
>data element".
>
>One approach would be to rally around GBIF, and rely on them to issue GUIDs
>for all taxon names. However, I also recognize that we do not exist in a
>political/personality vacuum with regards to "ownership" of taxonomic names,
>or the electronic representations thereof. Therefore, the closest thing
>that exists to an "owner" of a taxonomic name is the Commission of
>Nomenclature (and its respective Code of Nomenclature) under which the name
>was established. Thus, when it comes to assigning GUIDs for names (not
>concepts), I would propose the following:
>
>urn:lsid:ICZN.org:TaxonName:XXXXXX (all zoological names)
>urn:lsid:ICBN.org:TaxonName:XXXXXX (all botanical names)
>urn:lsid:ICNB[or LBSN??].org:TaxonName:XXXXXX (all bacteriological names)
>urn:lsid:ICTV[or ICVCN??].org:TaxonName:XXXXXX (all virus names)
>
>In an ideal world, we'd get to the point where there would be a need for
>only one registrar of nomenclature, e.g.:
>urn:lsid:BioCode.org:TaxonName:XXXXXXX
>
>Or, perhaps:
>urn:lsid:gbif.net:TaxonName:XXXXXXX
>
>But I don't think we're quite there yet.
>
>In any case, the idea would be for the taxon name aggregators to adopt the
>unambiguously unique GUID for each taxon name.
>
>Taxonomic concepts are a whole 'nother ball of wax....
>
>Slide 8:
>I actually prefer this approach (GBIF as the central issuer of specimen
>GUIDs), for a variety of reasons. One of the main reasons is that it would
>assure uniqueness of an integer within a given <namespace> (e.g.,
>Specimens), which would make things a bit easier for those of us who like to
>use integers as primary keys in databases. In other words, it avoids the
>possibility of urn:lsid:bishopmuseum.org:Specimen:1234567 colliding with
>urn:lsid:usnm.gov:Specimen:1234567, when reducing the GUID to just its
>integer component for local application purposes (where context can be
>enforced by other means). However, I should point something out regarding
>the "Advantage" part of this slide, which is that the "problem" of
>transferring record locations doesn't exist, provided that the <domainName>
>component of the LSID is taken as the issuer of the GUID, not as the current
>owner of the specimen. In other words, if Bishop Museum assigned GUID
>urn:lsid:bishopmuseum.org:Specimen:1234567 to a specimen, and then gave that
>specimen to Smithsonian, then Smithsonian would retain the complete GUID
>intact as: urn:lsid:bishopmuseum.org:Specimen:1234567.
>
>The danger comes when you try to use the <domainName> component as metadata
>to represent the current location of the specimen and/or its electronically
>represented data. This is where Wouter's original point about 'meaningless'
>GUIDs comes into play. If the whole point of using LSIDs is to embed the
>"current location" information within the ID itself so that applications can
>retrieve additional data associated with the GUID directly, then I have some
>concerns (mostly addressed already).
>
>Why is there a reference to urn:lsid:gbif.net:TaxonConcept:106734 at the top
>of this slide???
>
>Slide 9:
>Again, I'm not sure I understand on this slide why there is a reference to
>urn:lsid:ipni.org:TaxonName:82090-3:1.1
>Also, in this model, what function does the LSID serve that is not met by
>the concatenated [InstitutionCode]+[CollectionCode]+[CatalogNumber] (in the
>context of Global Provider Registry).
>
>Slide 10 (taxon concepts and literature):
>This message is already getting too long... :-)
>I already touched on this above under "Slide 4". I definitely agree that we
>need a GUID system for References. This should include more than just
>published references. It doesn't quite exist yet among the existing
>Reference registrars (as far as I can tell) to accommodate the specific
>needs of taxonomists (e.g. referring to a subsection of a reference as
>representing an original taxonomic description), so I do see a need to
>create a Reference GUID system specific to biology. I could rant for pages
>on this, but I'll summarize simply with a plea to *DEFINE* a Concept GUID as
>an intersection between a Name GUID and a Reference GUID (i.e., what I
>would call an "Assertion"). Not all Name-Reference combinations will be
>worthy of recognition as a distinct "Concept", but all are *potentially*
>representative of a concept (Berendsohn), and thus all should be drawn from
>the same pool of GUIDs as Concept GUIDs. In other words, "Concepts" should
>be thought of as a subtype of Name-Reference instances. I would go further
>to suggest (as I did above) that "Name" GUIDs should also be a subtype of
>Name-Reference instances (non-exclusive of Concept subtype instances), using
>the Name-Reference instance that represents the Code-recognized original
>description of the name as the "handle" to the Name.
>
>By this approach, you need only two GUID object classes <objectClass>: one
>for References, and one for Name-Reference intersections (Assertions). The
>latter of these could serve as the source for both Concept GUIDs and Name
>GUIDs.
>
>Last Slide:
>
>My own answers to your questions:
>
>1) Are LSIDs the most appropriate technology?
>
> I'm increasingly coming to that conclusion.
>
>2) Should identifiers be assigned and resolved centrally or via a fully
>distributed model (or should providers have the option of using either
>model)?
>
> I think the best option would be central. The next option would be fully
>distributed. Leaving it as an option would, in my opinion, be a BIG
>mistake.
>
>3) Which objects should receive identifiers?
>
> Specimens, References, Name-Reference intersections (Assertions), and
>perhaps Agents. [TaxonNames and Concepts can be subsets of Name-Reference
>intersections].
>
>3a) Should we develop a set of object classes for biodiversity informatics
>and assign identifiers to instances of all of these?
>
> I think so, yes. Of course, it depends a bit on who you mean by "we". I'm
>thinking sensu lato.
>
>3b) Should identifiers be associated with real world objects (e.g.
>specimens), or with digitised records representing them (e.g. perhaps
>multiple records representing different digitisation attempts by different
>researchers for the same specimen), or both?
>
> I would say definitely real-world objects (treating things like
>Code-recognized original descriptions of taxon names, and citable references
>as "real-world objects"). I do NOT think we should have separate GUIDs for
>digital representations thereof. Alternative digital representations are
>simply clutter that will eventually be weeded out of the system, once we all
>get organized on this stuff, and harness the power of the internet to
>implement a global editing/QA system.
>
>4) What should be done about existing records without identifiers?
>
> As far as I know, ALL records are currently without identifiers (unless
>someone established a widely accepted GUID system and I missed the
>announcement...)
>
>4a) Should they be left alone?
>
> Ultimately, no.
>
>4b) Should they all be updated with identifiers?
>
> Ultimately, yes.
>
>4c) Should the provider software be modified to generate "soft" identifiers
>(ones which we cannot guarantee in all cases to be unique) based e.g. on the
>combination of InstitutionCode, CollectionCode and CatalogNumber?
>
> As an interim solution, perhaps. See my comments under "Slide 2" above.
>
>5) Are revision identifiers a useful feature?
>
> I would like to think not. If the information is truly dynamic over time
>(e.g., re-determinations of taxonomic identity of specimens), then
>individual instances should probably receive their own set of GUIDs (as
>opposed to versions of the "parent" GUID). If the information is static
>over time, and changes represent objective corrections, then I don't see a
>real need to track that within the context of a GUID (record edit history
>may or may not need to be tracked, but this seems to me to be a separate
>issue from GUIDs).
>
>5b) How many providers will be able to provide and handle them?
>
> If versioning is incorporated, then it should be designed such that a
>"default" version is provided automatically when versioning is not handled.
>
>
>Sorry for the long post, but I feel that this issue is extremely important
>at this point in bioinformatics history.
>
>Aloha,
>Rich
>
>Richard L. Pyle, PhD
>Natural Sciences Database Coordinator, Bishop Museum
>1525 Bernice St., Honolulu, HI 96817
>Ph: (808)848-4115, Fax: (808)847-8252
>email: deepreef(a)bishopmuseum.org
>http://www.bishopmuseum.org/bishop/HBS/pylerichard.html
>
>> -----Original Message-----
>> From: TDWG - Structure of Descriptive Data
>> [mailto:TDWG-SDD@LISTSERV.NHM.KU.EDU]On Behalf Of Donald Hobern
>> Sent: Thursday, September 23, 2004 6:22 AM
>> To: TDWG-SDD(a)LISTSERV.NHM.KU.EDU
>> Subject: Re: Globally Unique Identifier
>>
>>
>> This is precisely one of the key questions we need to address with any
>> identifier framework we adopt. I think we could easily use LSIDs in a
>> way that should overcome your concerns, and I think that the built-in
>> mechanisms for discovery and metadata access within the LSID model are
>> really exciting.
>>
>> I have just put together a PowerPoint presentation to explain some of
>> what I think we could achieve with globally unique identifiers and
>> particularly with LSIDS. It can be downloaded from:
>>
>> http://circa.gbif.net/Public/irc/gbif/dadi/library?l=/architecture/globallyuniqueidentifier/
>>
>> It may be clearest if you go through it as a slide show rather than in
>> edit mode.
>>
>> Thanks,
>>
>> Donald
>>
>> ---------------------------------------------------------------
>> Donald Hobern (dhobern(a)gbif.org)
>> Programme Officer for Data Access and Database Interoperability
>> Global Biodiversity Information Facility Secretariat
>> Universitetsparken 15, DK-2100 Copenhagen, Denmark
>> Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
>> ---------------------------------------------------------------
>>
>>
>> -----Original Message-----
>> From: TDWG - Structure of Descriptive Data
>> [mailto:TDWG-SDD@LISTSERV.NHM.KU.EDU] On Behalf Of Wouter Addink
>> Sent: 23. september 2004 17:38
>> To: TDWG-SDD(a)LISTSERV.NHM.KU.EDU
>> Subject: Re: Globally Unique Identifier
>>
>> It seems that DOI allows for any existing IDs to be used as part of the
>> unique identifier. That seems to me as a fast to adopt short term solution
>> but not a good idea for the long term. At first sight I very much liked the
>> LSID specification, but the longer I think about it, the less I like some
>> parts. What I think is missing in the LSID specification is that the unique
>> identifier should be 'meaningless' apart from being an identifier to become
>> time independent (and to avoid possible political problems). Any solution
>> with a URN I can think of has some meaning, which makes solutions like a
>> MAC-address generated GUID favorable in my opinion. And any meaning you need
>> (like an authority of an object) can be specified in metadata instead of
>> using it in the identifier. What is not very clear to me in the LSID
>> specification is where the LSID generated by a LSIDAssigningService is
>> actually stored.
>>
>> Wouter Addink
>>
>> ----- Original Message -----
>> From: "Gregor Hagedorn" <G.Hagedorn(a)BBA.DE>
>> To: <TDWG-SDD(a)LISTSERV.NHM.KU.EDU>
>> Sent: Wednesday, September 08, 2004 6:20 PM
>> Subject: Re: Globally Unique Identifier
>>
>>
>> >I am not quite sure, but to me it seems with "GUID" you refer to the
>> > numeric, MAC-address generated GUID type. I have nothing against
>> > these. However, any URN in my view is a GUID that has most of the
>> > properties you mention:
>> >
>> >> - it is guaranteed to be unique globally, and can be created anywhere,
>> >> anytime by any server or client machine - it has no meaning as to
>> >> where the data is physically located and will therefore not confuse any
>> >> user about this
>> >
>> >> - most id
>> >> mechanisms, especially URI/URN ids require a 'governing body' to
>> >> handle namespaces/urls to ensure every URN is unique, whereas a GUID
>> >> is always unique
>> >
>> > The governing body is restricted to the primary web address, and in
>> > most cases such an address is already available. Being a member of a
>> > governmental institution that explicitly forbids the use without
>> > prior consent, and forbids the use of its domain name once you are no
>> > longer working for them, I realize some potential for problem.
>> >
>> >> I do think a URL of some kind would be useful for things such as
>> >> global searches of multiple databases, as this will allow the search
>> >> to go directly to the data source where the name, reference, etc comes
>> >> from. But this should not be part of its ID. Maybe a name/id should
>> >> have several forms, a GUID for an ID and a URL + a GUID for a fully
>> >> specified name.
>> >>
>> >> What are the current thoughts on these ideas?
>> >
>> > A GUID is only part of the problem. The other half of the problem is
>> > actually getting at the resource. URN schemes like DOI or LSID (I
>> > prefer the latter) intend to define resolution mechanisms. That makes
>> > the URN not yet a URL - in my view the good comes with the good,
>> > location and reorganization independence.
>> >
>> > I believe GBIF should install such an LSID resolver, which is why in
>> > the UBIF proxy model, under Links, I propose to support a general URL
>> > (including potentially URNS), a typed LSID and a typed DOI. This
>> > could be simplified to have just a URN (LSID and DOI are URNs), but
>> > that would then require string parsing to determine and recognize the
>> > preferred resolvable GUID types. Comments on splitting/not splitting
>> > this are welcome!
>> >
>> > There may be some need to define a non-resolvable URN/numeric GUID as
>> > well. However, that would not be under the linking question. Is it
>> > correct that linking requires resolvability, or am I thinking into a
>> > wrong direction?
>> >
>> > Gregor
>> >>
>> >
>> >
>> > ----------------------------------------------------------
>> > Gregor Hagedorn (G.Hagedorn(a)bba.de)
>> > Institute for Plant Virology, Microbiology, and Biosafety
>> > Federal Research Center for Agriculture and Forestry (BBA)
>> > Koenigin-Luise-Str. 19 Tel: +49-30-8304-2220
>> > 14195 Berlin, Germany Fax: +49-30-8304-2203
>> >
>> > Often wrong but never in doubt!
Richard Pyle wrote:
...
>
>>Perhaps there's a
>>delegation mechanism that can be used? So when a DN can't be resolved,
>>the system backs down to a default DN, such as gbif.org that would then
>>indicate that smithsonian.org is now bishop.org?
>
>
> But it's not that simple, is it? If there is an LSID:
>
> urn:lsid:bishopmuseum.org:Specimen:1234567
>
> and another LSID, to a completely different specimen:
>
> urn:lsid:smithsonian.gov:Specimen:1234567
>
> ...then simply re-directing all bishopmuseum.org requests to Smithsonian
> wouldn't work....would it? Or would Smithsonian recognize the domain and
> deal with it accordingly?
>
> It seems to me that a lot of complexity would disappear if we could all get
> behind a single issuer of GUIDs, and mirror the capability to resolve those
> GUIDs on dozens or hundreds of servers around the world, and only use the
> GUIDs in a semantic context that is self-evident.
Perhaps. But what if an institution wants to provide IDs for more
than just specimen or name objects? Should we always rely on a single
authority to provide a mechanism for doing that? I don't think that
would go very far.
>
> Re-reading something I wrote:
>
>
>>>I would go further
>>>to suggest (as I did above) that "Name" GUIDs should also be a subtype of
>>>Name-Reference instances (non-exclusive of Concept subtype instances), using
>>>the Name-Reference instance that represents the Code-recognized original
>>>description of the name as the "handle" to the Name.
>
>
> Actually, it's probably safe to say that all "name-bearing" Name+Reference
> instances (i.e., original descriptions) are also, virtually by definition,
> also "concept-bearing" Name+Reference instances. So, not only would
> name-bearing and concept-bearing Name+Reference instances be non-exclusive
> of each other, it would probably be safe to think of name-bearing instances
> as a subset (Subtype) of Concept-bearing instances, which themselves are a
> subset (Subtype) of all Name+Reference instances.
>
ok.
>
>>>My own answers to your questions:
>>>
>>>1) Are LSIDs the most appropriate technology?
>>>
>>> I'm increasingly coming to that conclusion.
>>
>>I agree. The LSID system is easy to implement, stable, scalable and
>>does everything we need. The DOI system is good as well, but the fee
>>scheme bothers me (though I understand there are ways around that).
>
>
> My understanding is that it would be easy to develop a DOI-like system that
> is not part of the fee-based DOI system, and I still find it appealing
> because it could be as simple as an integer ID and very basic context tag.
>
> As for LSIDs -- If I understand correctly that the purpose of the
> <DomainName> portion of the LSID is to point to the one (and only?) server
> that can resolve the ID, then all of a sudden I don't like them at all. If
> it's true that the embedded Domain portion of an LSID *requires* that the
> domain exist for as long as the GUID exists in order for the GUID to be
> useful, then I definitely have reservations. If, on the other hand, the
> Domain portion can be seen as representing the issuer (somewhat analogous to
> the function of "InstitutionCode" in DwC), and could be resolved by any
> server set up to deal with the <namespace> part of the LSID, then I'm much
> less concerned.
>
The DN portion is meant to be resolvable by the DNS system. So yes,
there is a dependency on the continued existence of the DN, but it can
be set up to be resolved by any LSID service endpoint.
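
As a rough illustration of that indirection: as far as I understand the
LSID resolution specification, a client finds the service endpoint by
asking DNS for an SRV record named _lsid._tcp.<authority>. The sketch below
only builds that query name and consults a stand-in table instead of real
DNS. The table, its host name and port, and the function names are made up
for the example.

```python
from typing import Dict, Tuple


def lsid_authority(lsid: str) -> str:
    """Return the lower-cased authority (DN) part of an LSID."""
    parts = lsid.split(":")
    if len(parts) < 5 or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[2].lower()


def srv_query_name(lsid: str) -> str:
    """Name of the DNS SRV record a client would look up for this LSID."""
    return f"_lsid._tcp.{lsid_authority(lsid)}"


# Stand-in for DNS (hypothetical data): the SRV record for a DN may point
# at any host, so the resolving server need not sit at the issuer's site.
FAKE_SRV_RECORDS: Dict[str, Tuple[str, int]] = {
    "_lsid._tcp.bishopmuseum.org": ("lsid-mirror.example.org", 8080),
}


def find_endpoint(
    lsid: str, records: Dict[str, Tuple[str, int]] = FAKE_SRV_RECORDS
) -> Tuple[str, int]:
    """Return (host, port) of the service endpoint for this LSID."""
    return records[srv_query_name(lsid)]


if __name__ == "__main__":
    print(find_endpoint("urn:lsid:bishopmuseum.org:Specimen:1234567"))
```

The identifier keeps the issuer's DN forever; only the record behind it has
to be kept alive and pointed somewhere useful.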
>
>>> I think the best option would be central. The next option would be fully
>>>distributed. Leaving it as an option would, in my opinion, be a BIG
>>>mistake.
>>
>>I disagree- the assignment of identifiers should be by the curators of
>>the data. However, I do strongly consider that there should be some
>>sort of trust scheme in place, where identifiers are issued only by
>>entities trusted by the rest of the system. A scheme similar to that
>>used by certificate authorities and delegates should be adequate.
>
>
> Maybe I'm misunderstanding the use of the word "issuers", but in my mind,
> the issuer's job is only to provide a guaranteed-unique set of ID's. It
> would not, necessarily, be the location where the ID is applied to its
> associated data.
>
If we use the MAC approach + a context such as an LSID or DOI form, then
there is absolutely no need for a central issuing agency. If data
providers are more careful about assuring they meet requirements about
their identifiers, then again there is no need for a central issuer.
> In Donald's PowerPoint file, he made reference to "mechanisms for data
> providers to request and use blocks of LSIDs from central service". Here's
> how I imagine a system would work:
>
> GBIF (or some other central entity) establishes a service that can generate
> unique <objectID> numbers within its own LSID context. The same service
> also maintains a complete set of data associated with each <objectID>.
> Major (and minor) institutions (essentially your set of "Trusted" entities)
> would establish mirrored copies of the complete set of all data (or,
> perhaps, only a filtered subset of the complete data), but would not be able
> to issue new GUIDs directly. However, the mirrored sites could serve as
> real-time "pass-through" to the central site so as to be functionally able
> to provide new GUIDs in real time, by retrieving them directly (in real
> time) from the central server. Also, the mirrored sites would all maintain
> synchrony of their copies of the data with the central "master" copy, on a
> realistic time frame (e.g., every 24 hours, or on-demand if a data provider
> chose to initiate a synchronization command).
Again, just use a MAC based GUID inside an LSID context. If you have
any MS dev tools on your machine, type "guidgen" at the command prompt.
Voila! Globally unique identifiers. No matter how many times you
push the "New GUID" button.
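
As a rough illustration of the "MAC based GUID inside an LSID context" idea, here is a minimal Python sketch. It assumes a version 1 UUID (timestamp plus the host's network address; Python falls back to a random node value if no MAC can be read) is an acceptable stand-in for a MAC-based GUID. The authority "mymuseum.org" and namespace "specimen" are placeholders, and make_lsid is a hypothetical helper, not part of any LSID toolkit.

```python
import uuid


def make_lsid(authority: str, namespace: str) -> str:
    """Wrap a freshly generated version 1 UUID in an LSID-style URN."""
    # uuid1() combines a timestamp with the host's MAC address
    # (or a random 48-bit node value when the MAC is unavailable).
    object_id = uuid.uuid1()
    return f"urn:lsid:{authority}:{namespace}:{object_id}"


# Placeholder authority and namespace, for illustration only.
lsid = make_lsid("mymuseum.org", "specimen")
print(lsid)
```

No central service is consulted here; uniqueness rests on the UUID, while the authority and namespace supply the context.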
>
> If a curator of a local institution's data needed to assign a new batch of
> numbers for a new set of specimens, the curator would issue a request to the
> central server (or via one of mirrored sites as a pass-through request) for
> a block of N numbers. The central server would never re-issue those same
> numbers again to anyone else. But those numbers remain "empty" until the
> curator assigns them to data, and uploads that data either to the central
> server or to one of the mirrors. In other words, even though the numbers
> are "issued" by a central server, they are applied to real data only by
> local curators.
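
The block-request scheme described in the quoted paragraph could be sketched roughly as follows. This is only an in-memory toy under stated assumptions: BlockIssuer and request_block are hypothetical names, the numbers are plain integers, and persistence, mirroring and authentication are all left out.

```python
import threading


class BlockIssuer:
    """Toy central issuer: hands out non-overlapping blocks of ID numbers."""

    def __init__(self, start: int = 1):
        self._next = start
        self._lock = threading.Lock()
        self.issued = {}  # requester name -> list of blocks handed out

    def request_block(self, requester: str, n: int) -> range:
        """Reserve n consecutive numbers; they stay 'empty' until used."""
        if n <= 0:
            raise ValueError("block size must be positive")
        with self._lock:  # make sure two requests never get the same numbers
            block = range(self._next, self._next + n)
            self._next += n
            self.issued.setdefault(requester, []).append(block)
        return block


issuer = BlockIssuer()
first = issuer.request_block("curator-a", 100)   # numbers 1..100
second = issuer.request_block("curator-b", 50)   # numbers 101..150
```

The issuer only guarantees that a number is never handed out twice; attaching the numbers to real data remains the curator's job.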
>
> A big issue, of course, is control over editing of data associated with a
> given GUID. In the case of specimens, the central server and mirrored sites
> could (perhaps at the discretion of the data curator who initially requested
> the number) restrict subsequent editing of those data to a defined set of
> password-protected user accounts. In the case of more public data, such as
> taxon names and publications, the control of data editing would be less
> restrictive (e.g., either full accessible by the public, or accessible to
> anyone who goes to the trouble to register themselves as a taxonomist with
> the central server or with any of the mirrored sites).
>
> Maybe this approach would not be practical for specimen data -- but I think
> it would be the optimal approach for taxon data. Perhaps those two
> fundamentally different kinds of data (owned, vs. public domain) need
> fundamentally different approaches to GUID issuance and assignment?
>
>
>>>3) Which objects should receive identifiers?
>>>
>>> Specimens, References, Name-Reference intersections
>>
>>(Assertions), and
>>
>>>perhaps Agents. [TaxonNames and Concepts can be subsets of
>>
>>Name-Reference
>>
>>>intersections].
>>
>>Any object. It doesn't matter what it is, just that it can be resolved,
>>and when you find it, you can figure out what it is. Sensible use of
>>the NameSpace portion of the LSID will help a lot with this. A trusted
>>organization should issue the NameSpace portion to avoid NS conflicts.
>
>
> I'd have to think this through some more. Leaving it too open might lead to
> a plethora of (potentially overlapping, but not quite equivalent)
> NameSpaces, which seems like it could turn into a real mess, really quickly.
> Centralized ID systems such as social security numbers in the U.S.,
> telephone numbers, etc. definitely have some advantage over totally open
> systems. I suppose that the pool of NS's would be self-cleaning simply by
> use or non-use....but I still wonder how much better this approach would be
> over the status quo.
That's why there needs to be some agreement over the issuance of namespaces.
>
>
>>>3a) Should we develop a set of object classes for biodiversity
>>
>>informatics
>>
>>>and assign identifiers to instances of all of these?
>>>
>>> I think so, yes. Of course, it depends a bit on who you
>>
>>mean by "we". I'm
>>
>>>thinking sensu lato.
>>
>>Sure, and these could be a core from which others can be built. But we
>>should absolutely not restrict the capability of the "system" to accept
>>new classes - even classes that represent the same information in a
>>different way that may be appropriate to a group of users.
>
>
> Again, I'll have to think about this some more. I certainly don't think
> that the "system" should be incapable of dealing with new classes -- sort of
> like how anyone can develop their own Federation Schema and use DiGIR to
> establish specific information networks. But I'd hate to see a breakdown in
> the global transmission of biodiversity information simply because different
> subgroups establish their own special-needs, non-mutually-compatible classes
> for dealing with essentially the same kinds of information (especially if
> they do not also conform to a generalized international standard).
Bah. That's the whole point of this - to facilitate data exchange. If
a small subgroup wants to start exchanging data in an abbreviated
format, so what? As long as the identifiers being used are able to
resolve the type of object being passed around, and the objects conform
to their definitions, it shouldn't be a problem. By initially
establishing a robust framework for Scientific Names and perhaps
specimen data / collections, then there will be little need for others
to recreate new ways to represent that data. The benefits of a robust
reliable representation and provision of cheap, effective software tools
will hopefully overcome the steep learning curve needed to even
understand what's in some of these schemas.
>
>
>>>4) What should be done about existing records without identifiers?
>>>
>>> As far as I know, ALL records are currently without
>>
>>identifiers (unless
>>
>>>someone established a widely accepted GUID system and I missed the
>>>announcement...)
>>
>>All records currently have some sort of identifier, the problem is their
>>uniqueness is not rigorously enforced or even evaluated, so their
>>usefulness is probably limited.
>
>
> O.K., in that case I misunderstood the meaning of "identifiers". All
> historical identifiers (e.g., catalog numbers for specimens) should be
> maintained, preserved, and cross-referenced to GUIDs just like any other
> metadata about the physical object. I think of catalog numbers not so much
> as unique identifiers, but as "labels" -- not altogether unlike taxonomic
> names. In the databases I manage, I do not use catalog numbers as
> identifiers -- the computer generates the UID, which is never seen, read,
> written, or typed by a human. That's how I'd like to see the sorts of GUIDs
> we're discussing be implemented -- i.e., for the benefit of
> computer-computer data exchange; not human-human data exchange or
> human-computer/computer-human data exchange.
>
But if I want to say to you, hey look at this specimen xxx while we're
chatting from around the world using an instant messenger while
collaborating on some project, wouldn't it be nice to just be able to
type in lsid:mymuseum.org:specimen:1234 and have your client retrieve
that exact data and associated metadata directly? A trivial example but
one that can form the foundation of some cool stuff for data exchange
and interaction. I thought that was the whole point of these GUID
things. But maybe I'm mistaken?
>
>>>4c) Should the provider software be modified to generate "soft"
>>
>>identifiers
>>
>>>(ones which we cannot guarantee in all cases to be unique)
>>
>>based e.g. on the
>>
>>>combination of InstitutionCode, CollectionCode and CatalogNumber?
>>>
>>> As an interim solution, perhaps. See my comments under
>>
>>"Slide 2" above.
>>
>>Yes, but not soft. The providers should assign their own identifiers,
>>but there must be a mechanism to ensure that identifiers are being
>>properly assigned.
>
>
> Agreed -- but I still think of these as "soft" identifiers, because
> CatalogNumber values can change over time, in certain circumstances. GUIDs
> should *never* need to be changed (even if the institution that issued them
> vanishes without a trace from the face of the Earth).
>
Yep. That would be the ideal.
>
>>Revision information is very helpful in dealing with errors such as
>>keystroke errors or other such details that do not change the object.
>
>
> I agree the revision information *can* be helpful in dealing with errors;
> but I don't see that function as being integral to the assignment of GUID
> values.
Except in the somewhat bizarre case when you need the old version of the
object.
>
>
>>Not many. It seems most collections don't record any history in their
>>record edits, so without a major alteration in the way the data are
>>stored, it will be a significant undertaking to provide useful revision
>>information.
>
>
> For what it's worth, the databases I have developed for my institution are
> designed to log every change made to every field (except
> performance-enhancing, purely derivative fields), including what the
> previous value was, who made the change, and when the change was made. When
> records are deleted, a "snapshot" of the value of every non-null field is
> logged, including the time the record was deleted, and by whom. The reason
> I say all of this is to underscore that my stance on not including
> versioning IDs as part of a GUID system is NOT from lack of appreciation for
> the value of preserving edit histories (something I clearly value very, very
> much -- given that the total diskspace occupied by my edit logs exceeds the
> total diskspace occupied by the "real" data!)
Awesome. That's a nice way to do it.
>
> In closing, I apologize to those who find my overly-long posts on this topic
> to be an annoyance. I also am starting to wonder: is this the appropriate
> email forum to have this discussion?
>
Yeah, good question. Maybe this should be on the GBIF DADI list or TDWG
general? Or even the LSID list?
> Aloha,
> Rich
>
Kia ora,
Dave V.
Greg Whitbread wrote:
> On Fri, 2004-09-24 at 18:17, Dave Vieglais wrote:
> ....
>
>>document. Or perhaps a system might be developed that provided an LSID
>>for a DiGIR query document- so the dataset could be completely recreated
>
>
> exactly?
>
Nope. Well, assuming the contributing datasets haven't been modified,
the content will be the same set of records. If collections supported
versioning, then perhaps one day it might be possible to always return
exactly the same data...
All the system I mentioned does (will do) is associate an LSID with a
query document (and the target datasets). Resolving the LSID actually
returns the query document, which can then be resolved by the DiGIR network.
>
>>just by hitting on the LSID (yes, one is under construction). One could
>>imagine simply passing the LSID to another infrastructure that say,
>>estimated potential distribution, or highlighted relevant news reports
>>from an AP feed mentioning the species for which the query was created.
>>...
>
>
> --
> Greg Whitbread <ghw(a)anbg.gov.au>
> ANBG/CPBR/ANH
>
Richard Pyle wrote:
>>Yes, certainly, a GUID within the context of an XML document is pretty
>>well defined by the schema, dtd or just its loose association with
>>other elements in the document.
>>
>>But what about if one appears in a journal article, a citation in a
>>policy document, etc?
>
>
> Well...that's partly why I emphasized that I think GUIDs should be for
> computer-computer data exchange only. But even if printed for a pair of
> human eyes to read, surely there would be *some* stated context. E.g.,
> "ITIS TSN 1234567"; "BPBM 123456"; "GBIF Specimen ID 9876543"; "ICZN NameID
> 92AB5B37-70E9-4f05-9E97-CBABD08513ED"; etc....
>
So formalize that a little and you might have something more
consistently machine parsable like: ITIS.ORG:TSN:1234567;
BPBM.EDU:something:123456;GBIF.ORG:Specimen:9876543, ...
Add in the system identifier for resolution (urn:lsid:...) and you have
LSIDs. The result is a far more consistent, legible and widely useful
mechanism for referencing objects. Allowing an author to arbitrarily
provide the context for identifiers gets us little further along.
Have you seen how LSIDs and DOIs are being used in electronic publications?
>
>>It would be nice to be able to provide a unique
>>identifier as perhaps a footnote for a scientific name mentioned in a
>>document.
>
>
> How hard would it be in such cases to include within the Methods section of
> the document, something to the effect of "All taxon IDs listed in this paper
> refer to GBIF Specimen ID's, which can be resolved at gbif.net". If the
> problem is one involving a pair of human eyes reading a number, then the
> problem can be solved in the context of a pair of human eyes reading the
> context.
>
Sure, but do that consistently, by all authors? And do it in a way that
is without ambiguity? Machine parsable (for electronic publication)?
Easily reusable in other documents?
>
>>Or perhaps a system might be developed that provided an LSID
>>for a DiGIR query document- so the dataset could be completely recreated
>>just by hitting on the LSID (yes, one is under construction). One could
>>imagine simply passing the LSID to another infrastructure that say,
>>estimated potential distribution, or highlighted relevant news reports
>>from an AP feed mentioning the species for which the query was created.
>> Using a simple, meaningless GUID buys us none of this potential, and
>>forces us to always use a wrapper to provide a contextual basis on how
>>to interpret the identifier.
>
>
> I guess my question is, why *must* the wrapper be integral to the ID itself?
> Why can't the contextual basis be established around the ID, at the time the
> ID is presented/transferred, as needed? If the cost of embedding the
> context within the GUID is that all links to, say, Bishop Museum ichthyology
> GUIDs for specimens become useless if the collection is transferred to
> another institution and the embedded DomainName terminated, then I say put
> the burden of context establishment on the ID exchange system
> ("presentation layer"), rather than embedded within the ID itself.
>
It must be so that it can be reused outside the original context of the
document that contained the identifier. I believe there are mechanisms
in the LSID spec for dealing with this problem - but I need to go back
through it to be sure.
> Aloha,
> Rich
>
Kia ora,
Dave V.
I sent the other response before going through the whole document. Treat
this as part II. This is getting so huge, there might even be a part III...
Richard Pyle wrote:
...
>
>
>
>>That's
>>the point of the LSID or DOI, they provide GUIDs that identify what
>>system can be used to resolve them. If GUIDs for names or specimens or
>>whatever are to be used in other systems, then it is essential that the
>>GUID can be associated with a resolving system.
>
>
> I tend to agree -- which is why I preferred DOIs (and increasingly, LSIDs)
> to MAC ID's (which show up all over the place in all sorts of contexts).
> Even still, though, I think we'll find that all electronic exchanges
> involving GUIDs of which we speak, will do so within an evident context.
Maybe. Perhaps for individual records there is no need for a resolvable
identifier to a single object, and using a MAC type guid there makes
some sense. But if we go to the trouble of making GUIDs, why not make
them useful as well?
>
>>Both the DOI and LSID approaches are structured and provide context.
>>The DOI system uses the NISO Z39.84-2000 standard for categorization,
>>the LSID uses the domain name system. Both provide a context essential
>>for reuse of an identifier outside its original context.
>
>
> Yes, but I initially preferred DOIs to LSIDs because there tends to be less
> "context baggage" associated with them. My sense of DOIs is that each
> institution would not create its own DOI category; but rather there would be
> a single agreed-upon DOI category that is independent of any particular
> institution (with all the potential for political baggage an
> institution-specified context might afford).
>
Yes, that would be the way they are created - the DOI category would be
assigned by the governing agency (probably DOI.org). Then the baggage,
the unique part, would be up to the data providers or some other authority.
>
>>This was one of the first recommendations to GBIF - to provide a
>>registry of institution codes for exactly this purpose. Having a tool
>>that verified the uniqueness of records within a collection as exposed
>>by its provider (either biocase or digir) would help this uniqueness
>>problem. Now that the UDDI registry is available, we could in theory
>>use the institution identifiers in there.
>
>
> More power to you (and GBIF, and the future of DiGIR)! But in my view, it
> should still be seen only as a temporary solution, until we can get our acts
> together with more specific (and less information-contingent) ID systems.
>
Yeah, I think the UDDI registry can really be leveraged to help with this.
>
>>I strongly disagree that there should be a single GUID issuer or
>>resolver.
>
>
> I believe you are in the majority on this. But when I think it all through,
> I still feel that consolidation of GUID issuance will be more advantageous
> in the long term.
>
Nope. You'll have to try harder to convince me :-)
>
>>What we really need is an organization that operates kind of
>>like a certificate authority- GBIF could act as the root from which
>>other trusted GUID issuers may be created. In this way we can avoid the
>>arbitrary creation of GUIDs yet still provide considerable flexibility
>>and de-centralization in the community.
>
>
> If I read you correctly, I gather you are saying that the issuance of
> numbers would be distributed and isolated, but the issuers would fall under
> a centralized authority. I'm not sure I understand why this system is
> necessarily advantageous over a centralized issuer.
>
Because there's no single point of failure, it is more scalable, and in
the (unlikely) event the centralized authority no longer exists, it
would be a fairly trivial matter to delegate root authority to another
trusted party.
>
>>It would be a relatively simple task to include a LSID resolver service
>>along with a DiGIR provider. I have prototyped such a system a while
>>back, but other issues prevented deployment. With such an
>>implementation, it would be trivial to assign unique identifiers to
>>specimens - but first the problems institutions seem to have even
>>providing unique identifiers within a collection must be resolved.
>
>
> AGREED!
>
>
>>>As you've outlined in subsequent slides, I see two alternative
>>
>>paths: A)
>>
>>>Get the biological world to rally around GBIF as the
>>
>>centralized provider of
>>
>>>GUIDs for specimens for all collections; or B) Have each
>>>collection/institution issue its own set of LSIDs for its own
>>
>>specimens, and
>>
>>>have GBIF adopt those LSIDs for its own internal purposes. I could get
>>>behind either approach, but I see danger in the adoption of a mixture of
>>>these two approaches. I'll defer elaboration, but a lot of it
>>
>>has to do with
>>
>>>potential confusion about whether the GUID applies fundamentally to the
>>>physical specimen, or the electronic conglomeration of data
>>
>>associated with
>>
>>>the specimen. Also, I think we should avoid the risk of assigning two
>>>separate GUIDs for the same "single data element" (sensu your Slide 5).
>>>
>>
>>A mixture would still work, provided there was appropriate coordination
>>between the efforts.
>
>
> With the level of coordination required, you might as well go for the "brass
> ring" (in my opinion). But maybe what I see as the "brass ring" is seen as
> a dud to others.
>
>
>>>Thus, when it comes to assigning GUIDs for names (not
>>>concepts), I would propose the following:
>>>
>>>urn:lsid:ICZN.org:TaxonName:XXXXXX (all zoological names)
>>>urn:lsid:ICBN.org:TaxonName:XXXXXX (all botanical names)
>>>urn:lsid:ICNB[or LBSN??].org:TaxonName:XXXXXX (all
>>
>>bacteriological names)
>>
>>>urn:lsid:ICTV[or ICVCN??].org:TaxonName:XXXXXX (all virus names)
>>>
>>>In an ideal world, we'd get to the point where there would be a need for
>>>only one registrar of nomenclature, e.g.:
>>>urn:lsid:BioCode.org:TaxonName:XXXXXXX
>>>
>>>Or, perhaps:
>>>urn:lsid:gbif.net:TaxonName:XXXXXXX
>>
>>It is quite likely that there will be multiple LSID generators and
>>issuers. There is no real reason why this should be prevented, except
>>to ensure that appropriate measures are taken to avoid duplication of
>>GUIDs for the same object (taxonomic concept in this case).
>
>
> Actually, I was talking about Taxonomic Names, specifically -- but if Names
> are considered as represented by a subset of Concepts (as I hope they will
> be), then it's the same GUID pool.
>
Not sure what you mean here- If Joe enters a citation someplace and Rich
uses its LSID within a Taxonomic Object he entered, why does it have to
be in the same pool? As long as the LSID resolved to the appropriate
object, all would be good.
>
>>So a
>>critical piece of infrastructure for a name service that was intending
>>to assign GUIDs would be a mechanism for determining if the object they
>>are about to assign the GUID to is not already present in the system,
>>held at some other location. There needs to be something like a global
>>"findThisObject(taxon_object)" that absolutely guarantees that the
>>instance doesn't exist some other place. And if duplicates were to
>>occur, then there must also be a mechanism for indicating equivalence
>>between GUIDs, or perhaps a way of deleting the duplicate (how to decide
>>which is the duplicate?).
>
>
> I agree with all of this, but it seems that the infrastructure you describe
> would yield a higher total cost than the single GUID provider approach
> would.
Yeah, but it really concerns me having a single point of failure for
such a critical system.
>
>
>>Forcing the use of a single DN such as BioCode.org for all names would
>>seem to be a mistake, since that implies a single resolver service for
>>all names- with obvious implications in case of failure. Perhaps there
>>can be multiple resolver services with a single DN? That would probably
>>work fine then.
>
>
> Hmmm...I'm not sure I follow. If I interpret your word "resolver"
> correctly, then I see no reason why BioCode.org LSIDs could only be resolved
> by one server. Is that what the DomainName component of a LSID is
> specifically for? That is, "go to this domain to resolve the meaning of
> this LSID"? I thought the DomainName component was simply to give
> uniqueness to an LSID in the form of representing the issuer (analogous to
> the function of InstitutionCode in DwC). I see no reason why there couldn't
> be dozens, or hundreds of mirrored caches of the complete dataset all over
> the world, maintained automatically in synchrony with the "master" set
> (which would presumably, but not necessarily, reside at BioCode.org). Any
> one of the mirrors could resolve any BioCode.org LSID. With such a system,
> resolving an LSID would require that *any one* of potentially dozens of
> mirrored servers to be functional.
>
> If I understand you correctly, and an LSID is resolved only by the server at
> the Domain embedded within the LSID, then a dataset containing a
> heterogeneous assortment of LSIDs would need *all* of potentially dozens of
> distributed servers to be functional.
>
How an LSID is resolved is described in detail in the document:
http://www.omg.org/cgi-bin/doc?lifesci/2003-12-02
Section 8.3 describes the use of DNS for resolution.
Basically, the LSID client:
1. Parses the LSID urn:lsid:DN:NS:ID[:Rev]
2. Using DNS, locate the SRV record for DN, which points to the service
3. Using DNS again, resolve the location of the service
4. Once you have the service endpoint, basically ask it for the object
with NS:ID:Rev
That's a gross simplification, and it appears that the LSID definition
now treats DNS resolution as one resolution mechanism, rather than the
only one.
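
The first two steps above can be sketched in Python as follows. This is a simplification under stated assumptions: parse_lsid and srv_query_name are hypothetical helpers, the "_lsid._tcp." prefix is my reading of how the SRV record is named and should be checked against the spec, and no actual DNS query is made.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str                  # the DN part
    namespace: str                  # NS
    object_id: str                  # ID
    revision: Optional[str] = None  # optional Rev


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:DN:NS:ID[:Rev] into its components."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


def srv_query_name(lsid: LSID) -> str:
    """Name to look up in DNS for the authority's service (assumed form)."""
    return f"_lsid._tcp.{lsid.authority}"


parsed = parse_lsid("urn:lsid:mymuseum.org:specimen:1234")
print(parsed, srv_query_name(parsed))
```

Steps 3 and 4 (resolving the service location and asking it for NS:ID:Rev) would need a DNS library and a service client, so they are left out here.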
>
>>The LSID service must be able to resolve the object. When the object
>>moves some other place, then there will need to be a mechanism for the
>>LSID service to forward the resolution to the appropriate service. The
>>really big problem is when an institution no longer exists - so the
>>hypothetical example of Bishop museum consuming all the Smithsonian fish
>>collections - the Smithsonian LSID resolver would perhaps no longer
>>exist, and so those LSIDs become meaningless.
>
>
> In that case, I would vehemently oppose the use of LSIDs -- especially ones
> issued from multiple sources, which rely on the issuer existing into
> perpetuity. It seems MUCH more feasible to me that the GUIDs only be used
> within a prescribed context, than it would to require that all LSID issuers
> exist into perpetuity, and be functional at all times that someone needs to
> resolve the information associated with any particular ID value.
>
> Embedding issuer context in a GUID makes sense to me. Restricting
> resolution of GUID to the embedded issuer *only*, seems like a very
> dangerous system to me.
>
Yeah, but once again - if the single issuer no longer exists, then
everything is gone. That would be a real drag.
Part III to follow!
Dave V.
Richard Pyle wrote:
>>I have to disagree - kind of. A non-information-bearing GUID such as
>>one generated by a MAC, eg
>>
>>{92AB5B37-70E9-4f05-9E97-CBABD08513ED}
>>
>>is completely useless unless it only appears within the context of a
>>system that provides more information about what it actually is.
>
>
> Yes, that would be an assumption. But not an unreasonable one. I'm trying
> to imagine a scenario where I am presented with a series of MAC id's where I
> don't inherently understand the context. I suppose if I came in to work and
> found such a number scribbled on a piece of paper, with no other
> information, I'd be in a fix to figure out what the number refers to. But
> obviously that's not a realistic scenario. I suspect that such IDs would be
> used by computers (not humans), and would only be exchanged among computers
> in some sort of semantic context; e.g., within the context of a DwC2 XML
> file, nestled between appropriate tags:
>
> <GlobalUniqueIdentifier>92AB5B37-70E9-4f05-9E97-CBABD08513ED</GlobalUniqueIdentifier>
>
> ...these themselves nestled within further context tags.
>
Yes, certainly, a GUID within the context of an XML document is pretty
well defined by the schema, dtd or just its loose association with
other elements in the document.
But what about if one appears in a journal article, a citation in a
policy document, etc? It would be nice to be able to provide a unique
identifier as perhaps a footnote for a scientific name mentioned in a
document. Or perhaps a system might be developed that provided an LSID
for a DiGIR query document- so the dataset could be completely recreated
just by hitting on the LSID (yes, one is under construction). One could
imagine simply passing the LSID to another infrastructure that say,
estimated potential distribution, or highlighted relevant news reports
from an AP feed mentioning the species for which the query was created.
Using a simple, meaningless GUID buys us none of this potential, and
forces us to always use a wrapper to provide a contextual basis on how
to interpret the identifier.
cheers,
Dave V.
On Fri, 2004-09-24 at 18:17, Dave Vieglais wrote:
....
> document. Or perhaps a system might be developed that provided an LSID
> for a DiGIR query document- so the dataset could be completely recreated
exactly?
> just by hitting on the LSID (yes, one is under construction). One could
> imagine simply passing the LSID to another infrastructure that say,
> estimated potential distribution, or highlighted relevant news reports
> from an AP feed mentioning the species for which the query was created.
> ...
--
Greg Whitbread <ghw(a)anbg.gov.au>
ANBG/CPBR/ANH
At 02:30 PM 9/24/2004, Richard Pyle wrote:
>I think the important parts of this discussion surround the functional
>parameters of the GUIDs for biological objects:
>
>1) Should issuance of IDs be controlled from a single source; or freely
>created by anyone with a computer, anywhere, anytime; or something
>in-between?
>
>2) Is it important that all biological objects use the same scheme for ID
>sourcing, or is it advantageous to choose a scheme optimal for each class of
>object (e.g., privately owned and managed specimen data, vs. publicly owned
>and managed taxonomic nomenclature data)?
To provide another context for GUID's and LSID's, the CIPRes project (via
Dan Miranker) is considering LSID's as the glue (there is probably a
better metaphor) for backpointers to datasources for taxa/characters/states
from the next version of Treebase. That is, if you deposit a Nexus file
(which will obviously no longer be a monolithic 'file', but an XML
document) in TreeBase and have images/movies/voxel data sets to illustrate
a character state, a LSID will provide a source for exploring additional
supporting documentation for that cell of the matrix. SDD fits into this
communication in ways that are clear only at a pretty abstract level, but
are involved in standardizing the exchange of data (mostly morphological)
between character sources and CIPRes. We also imagine that a LSID will
provide a link to the taxon concept invoked for a row in the matrix and
presumably for specimens used to make the observations.
The interlinking of biodiversity and phylogenetic data in this context
suggests to this observer (somewhere between Richard Pyle and his aquarium
fish in terms of understanding), that a single source of ID's would not
scale. It also strongly argues for LSID's over meaningless keys as
essential to projects like CIPRes, which only want to store the logic for
resolving a URN (and in real time for users of the data) that could refer
to a bunch of different kinds of things.
Hope that is clear (enough).
Wishing I was going to New Zealand, Julian
Julian Humphries
DigiMorph.Org
Geological Sciences
University of Texas at Austin
Austin, TX 78712
512-471-3275
Some comments in text below.
Dave V.
Richard Pyle wrote:
> I want to start by wholeheartedly endorsing Wouter's plea for
> non-information-bearing (meaningless) GUIDs. This feature is CRITICAL to
> the long-term success of any GUID system. It is absolutely imperative that
> there NEVER be any motivation to change the content of a GUID (i.e., it
> should be permanent). If the GUID itself contains any information
> whatsoever, there may be motivation to change that information at a later
> time.
I have to disagree - kind of. A non-information-bearing GUID such as
one generated by a MAC, eg
{92AB5B37-70E9-4f05-9E97-CBABD08513ED}
is completely useless unless it only appears within the context of a
system that provides more information about what it actually is. That's
the point of the LSID or DOI, they provide GUIDs that identify what
system can be used to resolve them. If GUIDs for names or specimens or
whatever are to be used in other systems, then it is essential that the
GUID can be associated with a resolving system.
>
> For this reason, I had initially preferred the DOI approach, but over time,
> I am gradually warming up to the LSID approach. While components of an LSID
> do, indeed, represent information, they represent the one piece of
> information that I think may legitimately belong embedded within a GUID:
> context. That is, the context, or domain, of the GUID itself. The context
> in this case would be the "issuer" of the GUID -- not necessarily the
> current "owner" of the GUID (see more discussion on this below). Though the
> organization that issued a GUID may eventually disappear, the fact that the
> organization was the one to issue the GUID in the first place will never
> change, and thus represents a permanent and unchanging component of the
> GUID. Without the context portion, the GUID itself is really nothing more
> than a random string of characters. In summary, I'm warming up to the LSID
> approach because it represents embedded context, without the risk of
> temptation to change the content of a GUID after it has been issued.
Both the DOI and LSID approaches are structured and provide context.
The DOI system uses the NISO Z39.84-2000 standard for categorization,
the LSID uses the domain name system. Both provide a context essential
for reuse of an identifier outside its original context.
>
> Regarding Donald's PPT file, I have a couple of comments and questions:
> (Assumes Title slide is "Slide 1")
>
> Slide 2:
> You note there is "No reliable mechanism" to relate the same record from
> different providers to each other. But in the context of DarwinCore, the
> combination of [InstitutionCode]+[CollectionCode]+[CatalogNumber] should
> represent a virtual GUID (provided that the Global Provider Registry ensures
> no duplication of [InstitutionCode]). I do realize that words like "should"
> and "reliable" are critical here. Perhaps the DarwinCore implementation
> should enforce the requirement of uniqueness of
> [CollectionCode]+[CatalogNumber] within a single [InstitutionCode], and
> further ensure globally unique [InstitutionCode] values via the Global
> Provider Registry.
This was one of the first recommendations to GBIF - to provide a
registry of institution codes for exactly this purpose. Having a tool
that verified the uniqueness of records within a collection as exposed
by its provider (either BioCASe or DiGIR) would help with this uniqueness
problem. Now that the UDDI registry is available, we could in theory
use the institution identifiers in there.
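
As a rough sketch of what such a verification tool might check, assuming the provider's records have already been fetched into plain dictionaries keyed by DarwinCore element name (the function name and the sample values are made up for illustration):

```python
from collections import Counter


def duplicate_keys(records):
    """Return the (InstitutionCode, CollectionCode, CatalogNumber)
    triplets that occur more than once in the given records."""
    counts = Counter(
        (r["InstitutionCode"], r["CollectionCode"], r["CatalogNumber"])
        for r in records
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample data, not taken from any real provider.
records = [
    {"InstitutionCode": "AAA", "CollectionCode": "Fish", "CatalogNumber": "1234"},
    {"InstitutionCode": "AAA", "CollectionCode": "Fish", "CatalogNumber": "1235"},
    {"InstitutionCode": "AAA", "CollectionCode": "Fish", "CatalogNumber": "1234"},
]
print(duplicate_keys(records))
```

This only detects duplicates within one provider's output; uniqueness of the InstitutionCode itself still has to come from the registry.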
>
> Slide 3:
> Wouldn't most of the problems indicated in the first four bulleted points be
> largely solved by the Global Provider Registry? Using the [InstitutionCode]
> would allow lookup in the registry for a (current/active) metadata URL, and
> the metadata URL would provide information on where to access a particular
> [CollectionCode]+[CatalogNumber] piece of data.
>
> The issue of specimens changing numbers and/or collections is problematic,
> of course.
>
> The issue of versioning is a bit dicey, in my mind (e.g., at what resolution
> of information change)? Some things, like changing taxonomic determinations
> (i.e., "real" changes) need to be handled in a robust way. Other things,
> like the correction of typos and different styles of representing the exact
> same information (e.g., R.L. Pile==>R.L. Pyle; or R.L. Pyle==>Pyle, R.L.)
> probably don't need to be versioned. Other sorts of changes (e.g., the
> elaboration of previously existing information, such as the addition of
> retroactively-generated georeference coordinates) fall somewhere in-between.
>
> Slide 4:
> We should all get behind SEEK in addressing these issues (Taxon concept
> mapping). Ultimately, we minimally need a GUID pool for References
> (inclusive of unpublished works), and a GUID pool for what I call
> "Protonyms" (original creations of IC_N Code-compliant names). The union of
> these two GUIDs (what I would call "Assertions") would itself represent a
> GUID to a "potential concept" (Berendsohn). (Note: my preference would be to
> define Protonyms as a subtype of Assertions, and therefore Protonym GUIDs
> would be a subset drawn from the same pool as Assertion GUIDs -- but this is
> a technical discussion for another time).
>
Good progress is being made on this; prototypes should be ready for
evaluation of the LSID approach soon.
> Slide 5:
> Nice summary!!
>
> Slide 6:
> Good stuff here, but I'll respond with some of my personal opinions:
>
> - RevisionID: see points of concern already expressed above
>
> - Specimen Record LSIDs: I gather from subsequent slides that you recognize
> two alternative approaches: having the "owner" of a specimen assign the LSID
> within the context of their own <domainName>, or adopting GBIF as the
> international standard issuer for ALL specimen GUID. In other words, GBIF
> would represent the centralized issuer of GUIDs for all biological
> specimens, and the biological specimen community would/should rally around
> GBIF for this purpose, and adopt GBIF specimen GUIDs as their own. I
> personally have no problem with this (I do not live in fear of "Big Brother"
> centralization when it serves the benefit of all, as I believe it would in
> this case) -- but I know there are many who might have a problem with it,
> and therefore it might not garner widespread adoption without large volumes
> of "fuss".
I strongly disagree that there should be a single GUID issuer or
resolver. What we really need is an organization that operates kind of
like a certificate authority: GBIF could act as the root from which
other trusted GUID issuers may be created. In this way we can avoid the
arbitrary creation of GUIDs yet still provide considerable flexibility
and de-centralization in the community.
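
A minimal sketch of the idea in Python, assuming a hypothetical registry that records which party vouched for each issuer, with the root vouching for no one but itself. The names and the choice of gbif.org as root are placeholders, not an existing service.

```python
ROOT = "gbif.org"  # assumed root of trust, for illustration only


def is_trusted(issuer, vouched_by, root=ROOT):
    """Walk the chain of vouchers from issuer up towards the root.

    vouched_by maps an issuer's domain name to the domain name of the
    party that delegated issuing rights to it.
    """
    seen = set()
    current = issuer
    while current not in seen:
        if current == root:
            return True
        seen.add(current)
        current = vouched_by.get(current)
        if current is None:
            return False  # chain ends before reaching the root
    return False  # chain loops without ever reaching the root


# Hypothetical delegations: the root vouches for a museum,
# which in turn vouches for one of its own departments.
vouched_by = {
    "museum.example": "gbif.org",
    "fishes.museum.example": "museum.example",
}
print(is_trusted("fishes.museum.example", vouched_by))
print(is_trusted("unknown.example", vouched_by))
```

A real scheme would need signed delegations rather than a lookup table, but the chain walk is the same shape.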
>
> If, on the other hand, each organization issues its own GUIDs for its own
> set of specimens, then the question is when, if ever, GBIF would assign a
> specimen GUID? Perhaps as a surrogate for institutions that lack the
> technological ability to assign their own LSIDs? But I wonder, how many
> institutions that could serve electronic data of their holdings to the
> internet would lack the ability to assign their own LSIDs?
It would be a relatively simple task to include an LSID resolver service
along with a DiGIR provider. I prototyped such a system a while
back, but other issues prevented deployment. With such an
implementation, it would be trivial to assign unique identifiers to
specimens - but first the problems institutions seem to have even
providing unique identifiers within a collection must be resolved.
>
> As you've outlined in subsequent slides, I see two alternative paths: A)
> Get the biological world to rally around GBIF as the centralized provider of
> GUIDs for specimens for all collections; or B) Have each
> collection/institution issue its own set of LSIDs for its own specimens, and
> have GBIF adopt those LSIDs for its own internal purposes. I could get
> behind either approach, but I see danger in the adoption of a mixture of
> these two approaches. I'll defer elaboration, but a lot of it has to do with
> potential confusion about whether the GUID applies fundamentally to the
> physical specimen, or the electronic conglomeration of data associated with
> the specimen. Also, I think we should avoid the risk of assigning two
> separate GUIDs for the same "single data element" (sensu your Slide 5).
>
A mixture would still work, provided there was appropriate coordination
between the efforts.
> - Name record LSIDs: I understand the example of an IPNI LSID for a plant
> name, and presumably there would be analogous "Catalog of Fishes" LSIDs for
> each fish name, etc. But I don't think that would be a wise approach.
> Unlike specimen records, where there are fairly unambiguous "owner"
> institutions (or at least "original owner" institutions that issued a GUID),
> taxonomic aggregators (IPNI, ITIS, Species2000, GBIF, uBio, etc.) are most
> certainly not owners of the taxonomic names that they include in their
> databases. We would want to avoid the risk of duplicate GUIDs for the same
> name, and thus the need for mapping, e.g., an IPNI GUID for a name to its
> ITIS equivalent. Again, I can't help but think that the world will be a
> better place if we can avoid assigning multiple GUIDs to the same "single
> data element".
>
> One approach would be to rally around GBIF, and rely on them to issue GUIDs
> for all taxon names. However, I also recognize that we do not exist in a
> political/personality vacuum with regards to "ownership" of taxonomic names,
> or the electronic representations thereof. Therefore, the closest thing
> that exists to an "owner" of a taxonomic name is the Commission of
> Nomenclature (and its respective Code of Nomenclature) under which the name
> was established. Thus, when it comes to assigning GUIDs for names (not
> concepts), I would propose the following:
>
> urn:lsid:ICZN.org:TaxonName:XXXXXX (all zoological names)
> urn:lsid:ICBN.org:TaxonName:XXXXXX (all botanical names)
> urn:lsid:ICNB[or LBSN??].org:TaxonName:XXXXXX (all bacteriological names)
> urn:lsid:ICTV[or ICVCN??].org:TaxonName:XXXXXX (all virus names)
>
> In an ideal world, we'd get to the point where there would be a need for
> only one registrar of nomenclature, e.g.:
> urn:lsid:BioCode.org:TaxonName:XXXXXXX
>
> Or, perhaps:
> urn:lsid:gbif.net:TaxonName:XXXXXXX
It is quite likely that there will be multiple LSID generators and
issuers. There is no real reason why this should be prevented, except
to ensure that appropriate measures are taken to avoid duplication of
GUIDs for the same object (taxonomic concept in this case). So a
critical piece of infrastructure for a name service that was intending
to assign GUIDs would be a mechanism for determining if the object they
are about to assign the GUID to is not already present in the system,
held at some other location. There needs to be something like a global
"findThisObject(taxon_object)" that absolutely guarantees that the
instance doesn't exist some other place. And if duplicates were to
occur, then there must also be a mechanism for indicating equivalence
between GUIDs, or perhaps a way of deleting the duplicate (how to decide
which is the duplicate?).
Forcing the use of a single DN such as BioCode.org for all names would
seem to be a mistake, since that implies a single resolver service for
all names, with obvious implications in case of failure. Perhaps there
can be multiple resolver services with a single DN? That would probably
work fine then.
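
For the equivalence mechanism, one possible shape is a small union-find structure that groups GUIDs declared to denote the same object and reports one of them as the representative. This is only a sketch: the class name is mine, the LSIDs are made up, and picking the lexicographically smallest GUID as the one to keep is an arbitrary stand-in for whatever policy the community would actually agree on.

```python
class GuidEquivalence:
    """Track which GUIDs have been declared to identify the same object."""

    def __init__(self):
        self._parent = {}

    def find(self, guid):
        """Return the representative GUID for the group containing guid."""
        self._parent.setdefault(guid, guid)
        root = guid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited GUID straight at the root.
        while self._parent[guid] != root:
            next_guid = self._parent[guid]
            self._parent[guid] = root
            guid = next_guid
        return root

    def declare_same(self, a, b):
        """Record that a and b identify the same object."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            keep, drop = sorted((root_a, root_b))  # arbitrary tie-break
            self._parent[drop] = keep

    def same_object(self, a, b):
        return self.find(a) == self.find(b)


eq = GuidEquivalence()
eq.declare_same("urn:lsid:a.example:TaxonName:1", "urn:lsid:b.example:TaxonName:77")
print(eq.same_object("urn:lsid:b.example:TaxonName:77", "urn:lsid:a.example:TaxonName:1"))
```

Nothing is deleted here: both GUIDs keep resolving, and the duplicate is simply mapped to its representative.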
>
> But I don't think we're quite there yet.
>
> In any case, the idea would be for the taxon name aggregators to adopt the
> unambiguously unique GUID for each taxon name.
>
> Taxonomic concepts are a whole 'nother ball of wax....
>
> Slide 8:
> I actually prefer this approach (GBIF as the central issuer of specimen
> GUIDs), for a variety of reasons. One of the main reasons is that it would
> assure uniqueness of an integer within a given <namespace> (e.g.,
> Specimens), which would make things a bit easier for those of us who like to
> use integers as primary keys in databases. In other words, it avoids the
> possibility of urn:lsid:bishopmuseum.org:Specimen:1234567 colliding with
> urn:lsid:usnm.gov:Specimen:1234567, when reducing the GUID to just its
> integer component for local application purposes (where context can be
> enforced by other means). However, I should point something out regarding
> the "Advantage" part of this slide, which is that the "problem" of
> transferring record locations doesn't exist, provided that the <domainName>
> component of the LSID is taken as the issuer of the GUID, not as the current
> owner of the specimen. In other words, if Bishop Museum assigned GUID
> urn:lsid:bishopmuseum.org:Specimen:1234567 to a specimen, and then gave that
> specimen to Smithsonian, then Smithsonian would retain the complete GUID
> intact as: urn:lsid:bishopmuseum.org:Specimen:1234567.
>
> The danger comes when you try to use the <domainName> component as metadata
> to represent the current location of the specimen and/or its electronically
> represented data. This is where Wouter's original point about 'meaningless'
> GUIDs comes into play. If the whole point of using LSIDs is to embed the
> "current location" information within the ID itself so that applications can
> retrieve additional data associated with the GUID directly, then I have some
> concerns (mostly addressed already).
The LSID service must be able to resolve the object. When the object
moves some other place, then there will need to be a mechanism for the
LSID service to forward the resolution to the appropriate service. The
really big problem is when an institution no longer exists. Take the
hypothetical example of Bishop Museum absorbing all the Smithsonian fish
collections: the Smithsonian LSID resolver would perhaps no longer
exist, and so those LSIDs would become meaningless. Perhaps there's a
delegation mechanism that can be used? So when a DN can't be resolved,
the system backs down to a default DN, such as gbif.org that would then
indicate that smithsonian.org is now bishop.org?
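
A sketch of how that fallback might behave, assuming the default registry holds a simple forwarding table from defunct authorities to their successors. The function name, the table, and the hop limit are all hypothetical; the LSID itself is never rewritten, only the choice of resolver changes.

```python
def resolve_authority(authority, live_resolvers, forwarding, max_hops=10):
    """Return the authority whose resolver should answer, or None.

    live_resolvers: set of authorities that currently run a resolver.
    forwarding: table kept by the default registry, mapping an authority
                that no longer resolves to the one that took over.
    """
    for _ in range(max_hops):
        if authority in live_resolvers:
            return authority
        # The resolver is gone: ask the default registry where it went.
        authority = forwarding.get(authority)
        if authority is None:
            return None
    return None  # give up on overly long or circular forwarding chains


live = {"bishop.org"}
forwarding = {"smithsonian.org": "bishop.org"}
print(resolve_authority("smithsonian.org", live, forwarding))
```

The hop limit is there only to keep a misconfigured forwarding table from looping forever.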
>
> Why is there a reference to urn:lsid:gbif.net:TaxonConcept:106734 at the top
> of this slide???
>
> Slide 9:
> Again, I'm not sure I understand on this slide why there is a reference to
> urn:lsid:ipni.org:TaxonName:82090-3:1.1
> Also, in this model, what function does the LSID serve that is not met by
> the concatenated [InstitutionCode]+[CollectionCode]+[CatalogNumber] (in the
> context of Global Provider Registry).
>
> Slide 10 (taxon concepts and literature):
> This message is already getting too long... :-)
> I already touched on this above under "Slide 4". I definitely agree that we
> need a GUID system for References. This should include more than just
> published references. It doesn't quite exist yet among the existing
> Reference registrars (as far as I can tell) to accommodate the specific
> needs of taxonomists (e.g. referring to a subsection of a reference as
> representing an original taxonomic description), so I do see a need to
> create a Reference GUID system specific to biology. I could rant for pages
> on this, but I'll summarize simply with a plea to *DEFINE* a Concept GUID as
> an intersection between a Name GUID and a Reference GUID (i.e., what I
> would call an "Assertion"). Not all Name-Reference combinations will be
> worthy of recognition as a distinct "Concept", but all are *potentially*
> representative of a concept (Berendsohn), and thus all should be drawn from
> the same pool of GUIDs as Concept GUIDs. In other words, "Concepts" should
> be thought of as a subtype of Name-Reference instances. I would go further
> to suggest (as I did above) that "Name" GUIDs should also be a subtype of
> Name-Reference instances (non-exclusive of Concept subtype instances), using
> the Name-Reference instance that represents the Code-recognized original
> description of the name as the "handle" to the Name.
>
> By this approach, you need only two GUID object classes <objectClass>: one
> for References, and one for Name-Reference intersections (Assertions). The
> latter of these could serve as the source for both Concept GUIDs and Name
> GUIDs.
>
> Last Slide:
>
> My own answers to your questions:
>
> 1) Are LSIDs the most appropriate technology?
>
> I'm increasingly coming to that conclusion.
I agree. The LSID system is easy to implement, stable, scalable and
does everything we need. The DOI system is good as well, but the fee
scheme bothers me (though I understand there are ways around that).
>
> 2) Should identifiers be assigned and resolved centrally or via a fully
> distributed model (or should providers have the option of using either
> model)?
>
> I think the best option would be central. The next option would be full
> distributed. Leaving it as an option would, in my opinion, be a BIG
> mistake.
I disagree: the assignment of identifiers should be by the curators of
the data. However, I do strongly believe that there should be some
sort of trust scheme in place, where identifiers are issued only by
entities trusted by the rest of the system. A scheme similar to that
used by certificate authorities and delegates should be adequate.
>
> 3) Which objects should receive identifiers?
>
> Specimens, References, Name-Reference intersections (Assertions), and
> perhaps Agents. [TaxonNames and Concepts can be subsets of Name-Reference
> intersections].
Any object. It doesn't matter what it is, just that it can be resolved,
and when you find it, you can figure out what it is. Sensible use of
the NameSpace portion of the LSID will help a lot with this. A trusted
organization should issue the NameSpace portion to avoid NS conflicts.
>
> 3a) Should we develop a set of object classes for biodiversity informatics
> and assign identifiers to instances of all of these?
>
> I think so, yes. Of course, it depends a bit on who you mean by "we". I'm
> thinking sensu lato.
Sure, and these could be a core from which others can be built. But we
should absolutely not restrict the capability of the "system" to accept
new classes - even classes that represent the same information in a
different way that may be appropriate to a group of users.
>
> 3b) Should identifiers be associated with real world objects (e.g.
> specimens), or with digitised records representing them (e.g. perhaps
> multiple records representing different digitisation attempts by different
> researchers for the same specimen), or both?
>
> I would say definitely real-world objects (treating things like
> Code-recognized original descriptions of taxon names, and citable references
> as "real-world objects"). I do NOT think we should have separate GUIDs for
> digital representations thereof. Alternative digital representations are
> simply clutter that will eventually be weeded out of the system, once we all
> get organized on this stuff, and harness the power of the internet to
> implement a global editing/QA system.
Yeah, we need to be very clear about what these identifiers are assigned
to. There should be very clear documentation about this that is
accepted by the relevant community. Where possible, it makes a lot of
sense to use the same identifier in the electronic record as that
associated with the physical object. After all, the electronic data is
really just metadata about the physical object.
>
> 4) What should be done about existing records without identifiers?
>
> As far as I know, ALL records are currently without identifiers (unless
> someone established a widely accepted GUID system and I missed the
> announcement...)
All records currently have some sort of identifier; the problem is that
their uniqueness is not rigorously enforced or even evaluated, so their
usefulness is probably limited.
>
> 4a) Should they be left alone?
>
> Ultimately, no.
>
> 4b) Should they all be updated with identifiers?
>
> Ultimately, yes.
All records that will be referenced by another entity need to have
unique identifiers in order for a robust system that allows reuse of
data to be properly implemented.
>
> 4c) Should the provider software be modified to generate "soft" identifiers
> (ones which we cannot guarantee in all cases to be unique) based e.g. on the
> combination of InstitutionCode, CollectionCode and CatalogNumber?
>
> As an interim solution, perhaps. See my comments under "Slide 2" above.
Yes, but not soft. The providers should assign their own identifiers,
but there must be a mechanism to ensure that identifiers are being
properly assigned.
>
> 5) Are revision identifiers a useful feature?
>
> I would like to think not. If the information is truly dynamic over time
> (e.g., re-determinations of taxonomic identity of specimens), then
> individual instances should probably receive their own set of GUIDs (as
> opposed to versions of the "parent" GUID). If the information is static
> over time, and changes represent objective corrections, then I don't see a
> real need to track that within the context of a GUID (record edit history
> may or may not need to be tracked, but this seems to me to be a separate
> issue from GUIDs).
Revision information is very helpful in dealing with errors such as
keystroke errors or other such details that do not change the object.
>
> 5b) How many providers will be able to provide and handle them?
>
> If versioning is incorporated, then it should be designed such that a
> "default" version is provided automatically when versioning is not handled.
>
Not many. It seems most collections don't record any history in their
record edits, so without a major alteration in the way the data are
stored, it will be a significant undertaking to provide useful revision
information.
>
> Sorry for the long post, but I feel that this issue is extremely important
> at this point in bioinformatics history.
>
> Aloha,
> Rich
>
> Richard L. Pyle, PhD
> Natural Sciences Database Coordinator, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef(a)bishopmuseum.org
> http://www.bishopmuseum.org/bishop/HBS/pylerichard.html
>
>
>>-----Original Message-----
>>From: TDWG - Structure of Descriptive Data
>>[mailto:TDWG-SDD@LISTSERV.NHM.KU.EDU]On Behalf Of Donald Hobern
>>Sent: Thursday, September 23, 2004 6:22 AM
>>To: TDWG-SDD(a)LISTSERV.NHM.KU.EDU
>>Subject: Re: Globally Unique Identifier
>>
>>
>>This is precisely one of the key questions we need to address with any
>>identifier framework we adopt. I think we could easily use LSIDs in a
>>way that should overcome your concerns, and I think that the built-in
>>mechanisms for discovery and metadata access within the LSID model are
>>really exciting.
>>
>>I have just put together a PowerPoint presentation to explain some of
>>what I think we could achieve with globally unique identifiers and
>>particularly with LSIDs. It can be downloaded from:
>>
>>http://circa.gbif.net/Public/irc/gbif/dadi/library?l=/architecture/globallyuniqueidentifier/
>>
>>It may be clearest if you go through it as a slide show rather than in
>>edit mode.
>>
>>Thanks,
>>
>>Donald
>>
>>---------------------------------------------------------------
>>Donald Hobern (dhobern(a)gbif.org)
>>Programme Officer for Data Access and Database Interoperability
>>Global Biodiversity Information Facility Secretariat
>>Universitetsparken 15, DK-2100 Copenhagen, Denmark
>>Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
>>---------------------------------------------------------------
>>
>>
>>-----Original Message-----
>>From: TDWG - Structure of Descriptive Data
>>[mailto:TDWG-SDD@LISTSERV.NHM.KU.EDU] On Behalf Of Wouter Addink
>>Sent: 23. september 2004 17:38
>>To: TDWG-SDD(a)LISTSERV.NHM.KU.EDU
>>Subject: Re: Globally Unique Identifier
>>
>>It seems that DOI allows for any existing IDs to be used as part of the
>>unique identifier. That seems to me as a fast to adopt short term
>>solution but not a good idea for the long term. At first sight I very
>>much liked the LSID specification, but the longer I think about it, the
>>less I like some parts. What I think is missing in the LSID
>>specification is that the unique identifier should be 'meaningless'
>>apart from being an identifier to become time independent (and to avoid
>>possible political problems). Any solution with a URN I can think of
>>has some meaning, which makes solutions like a MAC-address generated
>>GUID favorable in my opinion. And any meaning you need (like an
>>authority of an object) can be specified in metadata instead of using
>>it in the identifier. What is not very clear to me in the LSID
>>specification is where the LSID generated by a LSIDAssigningService is
>>actually stored.
>>
>>Wouter Addink
>>
>>----- Original Message -----
>>From: "Gregor Hagedorn" <G.Hagedorn(a)BBA.DE>
>>To: <TDWG-SDD(a)LISTSERV.NHM.KU.EDU>
>>Sent: Wednesday, September 08, 2004 6:20 PM
>>Subject: Re: Globally Unique Identifier
>>
>>
>>
>>>I am not quite sure, but to me it seems with "GUID" you refer to the
>>>numeric, MAC-address generated GUID type. I have nothing against
>>>these. However, any URN in my view is a GUID that has most of the
>>>properties you mention:
>>>
>>>
>>>>- it is guaranteed to be unique globally, and can be created anywhere,
>>>>anytime by any server or client machine
>>>>- it has no meaning as to where the data is physically located and
>>>>will therefore not confuse any user about this
>>>
>>>>- most id mechanisms, especially URI/URN ids require a 'governing
>>>>body' to handle namespaces/urls to ensure every URN is unique, whereas
>>>>a GUID is always unique
>>>
>>>The governing body is restricted to the primary web address, and in
>>>most cases such an address is already available. Being a member of a
>>>governmental institution that explicitly forbids the use without
>>>prior consent, and forbids the use of its domain name once you are no
>>>longer working for them, I realize some potential for problems.
>>>
>>>
>>>>I do think a URL of some kind would be useful for things such as
>>>>global searches of multiple databases, as this will allow the search
>>>>to go directly to the data source where the name, reference, etc comes
>>>>from. But this should not be part of its ID. Maybe a name/id should
>>>>have several forms, a GUID for an ID and a URL + a GUID for a fully
>>>>specified name.
>>>>
>>>>What are the current thoughts on these ideas?
>>>
>>>A GUID is only part of the problem. The other half of the problem is
>>>actually getting at the resource. URN schemes like DOI or LSID (I
>>>prefer the latter) intend to define resolution mechanisms. That makes
>>>the URN not yet a URL - in my view the good comes with the good,
>>>location and reorganization independence.
>>>
>>>I believe GBIF should install such an LSID resolver, which is why in
>>>the UBIF proxy model, under Links, I propose to support a general URL
>>>(including potentially URNS), a typed LSID and a typed DOI. This
>>>could be simplified to have just a URN (LSID and DOI are URNs), but
>>>that would then require string parsing to determine and recognize the
>>>preferred resolvable GUID types. Comments on splitting/not splitting
>>>this are welcome!
>>>
>>>There may be some need to define a non-resolvable URN/numeric GUID as
>>>well. However, that would not be under the linking question. Is it
>>>correct that linking requires resolvability, or am I thinking in the
>>>wrong direction?
>>>
>>>Gregor
>>>
>>>
>>>----------------------------------------------------------
>>>Gregor Hagedorn (G.Hagedorn(a)bba.de)
>>>Institute for Plant Virology, Microbiology, and Biosafety
>>>Federal Research Center for Agriculture and Forestry (BBA)
>>>Koenigin-Luise-Str. 19 Tel: +49-30-8304-2220
>>>14195 Berlin, Germany Fax: +49-30-8304-2203
>>>
>>>Often wrong but never in doubt!
>
>
Thanks, Julian!
As I stew over this stuff, I am ever so gradually warming back up to LSID,
even with their (apparent) handicaps. At least for "owned" data (specimens,
etc.). I'd still like to know what happens (in terms of LSIDs) when one
museum gives a specimen to another museum. This sort of thing happens very
routinely. Less routine, but by no means unheard of, are cases where an
entire collection is transferred from one institution to another. I'm still
trying to contemplate the implications for "public domain" data (Reference
citations, Taxon Names, published Taxon Concepts, etc.), and it still gives
me gastric Lepidopterans.
> observer (somewhere between Richard Pyle and his aquarium
> fish in terms of understanding)
Ha! Nope -- I think you're somewhere in the much broader gap; between me
and Bill Gates.
Aloha,
Rich