Maybe it's time to bite the bullet and consider the elephant in the room -- LSIDs might not be what we want. Markus Döring sent some nice references to the list in April, which I've repeated below; there is also http://dx.doi.org/10.1109/MIS.2006.62 .

I think the LSID debate is throwing up issues which have been addressed elsewhere (e.g., identifiers for physical things versus digital records), and which, some would argue, have been solved to at least some people's satisfaction.

LSIDs got us thinking about RDF, which is great. But otherwise I think they are making things more complicated than they need to be. I think this community is running a grave risk of committing to a technology that nobody else takes that seriously (hell, even the http://lsid.sourceforge.net/ web site is broken).

The references posted by Markus Döring were:

(1) http://www.dfki.uni-kl.de/dfkidok/publications/TM/07/01/tm-07-01.pdf
"Cool URIs for the Semantic Web" by Leo Sauermann DFKI GmbH, Richard Cyganiak Freie Universität Berlin (D2R author), Max Völkel FZI Karlsruhe
The authors of this document come from the semantic web community and discuss what kind of URIs should be used for RDF resources.

(2) http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
This one here is written by the W3C and addresses the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemas are examined, and their properties compared with those of the existing http: URI scheme.


Regards

Rod

On 5 Jun 2007, at 19:40, Chuck Miller wrote:

I just want to take the time here to say AAAAAARGGGGGH!!!  Does anyone
else share this with me?  

It is clear as day that we do not have a decision on what an LSID
specifically and exactly is (for this community), much less how to
actually make use of an LSID, and then how in the world could we
reliably relate them together.  

It's difficult to accept that so much time has been spent by so many
people discussing this at length in meetings and workshops, in Wikis,
and in emails but still we have different opinions of what an LSID
specifically is for biodiversity data.

What mechanism can we employ that will end this LSID fuzziness once and
for all, so we can move on to making them functional?  

Is the official, unambiguous definition of an LSID something that needs
to be voted on and accepted at the TDWG meeting?  Perhaps several
definitions, one for each type of LSID - concept, name, specimen,
literature, and so on?  But a definition (right or wrong in some folks'
opinion) that will end the debate.  

Chuck

Chuck Miller
VP-IT & CIO
Missouri Botanical Garden
St. Louis, MO, USA


-----Original Message-----
[mailto:tdwg-guid-bounces@lists.tdwg.org] On Behalf Of Bob Morris
Sent: Tuesday, June 05, 2007 12:15 AM
To: Kevin Richards
Subject: Re: [tdwg-guid] First step in implementing LSIDs?[Scanned]

I am constantly confused in this thread by the term "use an LSID"
seeming to mean "have an LSID".

Whenever there is a "conceptual" object such as a physical object or
an observation, there is a strong tradition that its LSID have empty
data and meaningful metadata. In these cases, I strongly hope there
will be such an LSID and that there be only a single one for it, which
is not actually required by the LSID spec. Without this, for example,
an assertion that two different LSIDs, LSID1 and LSID2, both identify
the type specimen for Aus bus depends on good will and accurate data. On
the other hand, if I am presented with two LSIDs that are identical,
no reasoning on returned metadata, nor even  resolution,  is required
to know that they are talking about the same thing. Thus, if two
different specimen records, intending to describe the same thing,
refer to the same LSID in their data in some attribute like
"DESCRIBES", then those records are unambiguously describing the same
thing. On the other hand, if the specimen records \themselves/ are to
have LSIDs, they must NOT have the same LSID, as that violates the
specification and purpose of LSID (or any other GUID).
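
The distinction above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the record layout,
the DESCRIBES key and every LSID string below are invented for the
example and do not come from any real resolver or database.

```python
from collections import defaultdict

# Hypothetical specimen records held by two different databases.
# Each record carries its own LSID (these must all differ) plus a
# DESCRIBES attribute naming the LSID of the thing it is about.
records = [
    {"lsid": "urn:lsid:example.org:records:1",
     "DESCRIBES": "urn:lsid:example.org:specimens:42"},
    {"lsid": "urn:lsid:example.net:records:9",
     "DESCRIBES": "urn:lsid:example.org:specimens:42"},
    {"lsid": "urn:lsid:example.net:records:10",
     "DESCRIBES": "urn:lsid:example.org:specimens:43"},
]


def group_by_target(recs):
    """Group record LSIDs by the LSID they say they describe.

    Only plain string equality is used: no resolution and no
    reasoning over returned metadata is needed.
    """
    groups = defaultdict(list)
    for rec in recs:
        groups[rec["DESCRIBES"]].append(rec["lsid"])
    return dict(groups)


def record_lsids_unique(recs):
    """True if no two records share the same LSID of their own."""
    ids = [rec["lsid"] for rec in recs]
    return len(ids) == len(set(ids))


groups = group_by_target(records)
```

Here the first two records fall into one group because their
DESCRIBES values are identical strings, while each record keeps a
distinct LSID of its own.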

I think the confusion here may be some notion that an LSID inherently
describes something. It doesn't. It only serves to distinguish its
target from other things that may have LSIDs. Nothing in the LSID
spec implies that the metadata returned by LSID resolution is a
description, though some communities may wish to make it so. For
mutable electronic objects, such as a specimen record given an LSID,
it would be entirely reasonable for the resolution to return the
current content of the record as its LSID metadata. It would be
incorrect to return that record as data, if the content of that record
is ever permitted to change (which specimen records usually are).

On 6/5/07, Kevin Richards <richardsk@landcareresearch.co.nz> wrote:
I agree with Roger that LSIDs should identify the data record, not the
physical object.  In many cases there will not be a physical object,
event or whatever.  Any physical object is only really "described" by
the digital record (and this does not even guarantee the connection is
correct - due to typos etc).

Also consider the situation where you have a single specimen and two
different databases that have data about that specimen - should they
use the same LSID?  I think not.  For one, when you "resolve" the LSID
it will only ever fetch the data from the database that is the "owner"
of the LSID, and is therefore handling the resolution.  It would be
better, I think, to use metadata of the objects to connect these two
datasets - e.g. the sameAs, replacedBy, etc. RDF tags that others have
mentioned on this thread.  Of course it is crucial that we reuse LSIDs
where possible, but only when referring to an external ID (i.e. there
is not much point in having the ID for "your" data be the LSID of
another system - then your data will never be uniquely identified, or
retrievable).  Does this make sense?
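
One way to picture the sameAs approach is as pairwise links that a
client merges into clusters of equivalent records. The Python sketch
below is an assumption-laden illustration, not part of any LSID
tooling: the LSIDs and the list of harvested sameAs pairs are
hypothetical, and a simple union-find stands in for whatever
reasoning a real client would do.

```python
# Hypothetical sameAs assertions harvested from record metadata.
# Each database keeps its own LSID for its own record.
same_as = [
    ("urn:lsid:db-a.example:specimen:1", "urn:lsid:db-b.example:specimen:77"),
    ("urn:lsid:db-b.example:specimen:77", "urn:lsid:db-c.example:specimen:5"),
]


def equivalence_classes(pairs):
    """Merge pairwise sameAs links into sets of equivalent LSIDs."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then follow
        # parent links to the root, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for lsid in list(parent):
        classes.setdefault(find(lsid), set()).add(lsid)
    return list(classes.values())


classes = equivalence_classes(same_as)
```

With the two links above, all three LSIDs end up in a single cluster,
yet each one still resolves at its own authority.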

Kevin


Roger Hyam <roger@tdwg.org> 06/04/07 9:31 PM >>>
Wow, a big thread appeared over the weekend!

I am posting without having time to read and digest everything in its
entirety but only to answer Jason's original question I hope.

Received wisdom is: don't use the version part of an LSID. If you did
you would be creating new LSIDs anyhow, so in a way it doesn't matter.
The identifier is supposed to be opaque, so the client should never
break the LSID into parts anyhow and would only do byte-identical
comparisons on the LSIDs themselves to see whether they are the same
things or not.

If you want to do versioning create an LSID for the "thing that
changes" and an LSID for each version of that thing.

Each of the versions are linked by dcterm:replaces and
dcterm:isReplacedBy.



Each version points to the LSID for the "thing that changes" with
dcterm:isVersionOf


We have our own vocabulary item for the one link in the chain that
isn't supported by Dublin Core: tcom:versionedAs


This points from the "thing that changes" to the current version.
i.e. the version that has identical metadata to itself. Anyone who
caches the data for the current version of the LSID can know which
version they have so if it becomes retrospectively important to get
back to the actual version it is possible.
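
The linking scheme described above can be sketched as a handful of
triples. This is a minimal Python illustration, assuming made-up
LSIDs and a plain in-memory list rather than a real RDF store; the
predicate names are written with the prefixes used in this message.

```python
# Invented identifiers: one LSID for the "thing that changes" and one
# for each version. They are treated as opaque strings throughout.
THING = "urn:lsid:example.org:names:100"
V1 = "urn:lsid:example.org:names:100-v1"
V2 = "urn:lsid:example.org:names:100-v2"

# (subject, predicate, object) triples for the version chain.
triples = [
    (V1, "dcterm:isVersionOf", THING),
    (V2, "dcterm:isVersionOf", THING),
    (V2, "dcterm:replaces", V1),
    (V1, "dcterm:isReplacedBy", V2),
    (THING, "tcom:versionedAs", V2),  # points at the current version
]


def objects(subject, predicate):
    """All objects of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def current_version(thing):
    """The version LSID the thing is currently versioned as, if any."""
    found = objects(thing, "tcom:versionedAs")
    return found[0] if found else None


def history(version):
    """Follow dcterm:replaces links back in time, newest first."""
    chain = [version]
    seen = {version}
    while True:
        previous = objects(chain[-1], "dcterm:replaces")
        if not previous or previous[0] in seen:
            break
        chain.append(previous[0])
        seen.add(previous[0])
    return chain
```

A client that cached the metadata for THING can record
current_version(THING) alongside it, and later use history() to get
back to the exact version it held.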

Philosophically when do you version? Only the provider can say that a
change is significant enough to warrant a new version - but if you
have gone to the trouble of implementing version control you may as
well do it for any change. Only the data provider can say whether the
"thing that changes" has changed so much that it is no longer the
same thing.

Personally I believe this approach nails the versioning issues.

There is the perennial debate about whether an LSID points to a
physical object or not (when it doesn't have a byte stream associated
with it). The answer is easy. It points to a digital object. If you
doubt this, try destroying a physical specimen and then asking whether
you should do away with the LSID and associated data. Clearly you
would maintain a record of something you once had so that you could
still return the data. Likewise, if you gave the specimen to another
institution you would maintain a record of having had it but would,
hopefully, link to the new institution's record of it.

Hope this is helpful.

Roger



On 3 Jun 2007, at 15:08, Paul Kirk wrote:

Anybody got any views (strong, otherwise or proxy for others' views)
on whether the LSID should refer to data+metadata or just metadata?

From where I sit, closer to physical objects than bits in a bit
stream, I favour the former. Take names for example. Strings of
characters and spaces whose form is governed by Codes and one of
the means, if not the primary means, by which we communicate
(verbally, in print or electronically) about biodiversity. For
LSIDs applied to names my understanding is that they must resolve
to an unchanging bit stream representing the name (we implemented
this in Index Fungorum 1st May 2005 when we set up the demo
resolver) but the associated metadata may change. If I'm correct on
this one, how does it work for LSIDs only resolving metadata, which
is not fixed? I know Roger tried to explain this one to me but I'm
still not sure it's entirely logical.

I think I'm with Rod on the LSIDs for specimens - they do not
represent the physical object but are a sort of digital substitute
(or substitutes) of that object.

And I also support Rod's view that we should as far as possible
avoid the duplication of GUIDs. Thus, for names it appears logical
(although I must declare an 'interest' here so others may see a
conflict) that the globally recognized nomenclators (IPNI, IF,
ZooBank (soon), the bacterial list, the algal list, the virus
database - I forget the acronyms here) be charged with providing
these GUIDs (currently as LSIDs) for all of us to use. And
following on from that, the 'institution' which is charged with
providing the digital representation of specimens is the
institution which is the custodian of the physical object.

Regards,

Paul

From: Roderic Page [mailto:r.page@bio.gla.ac.uk]
Sent: Sun 03/06/2007 12:03
To: Weitzman, Anna
Cc: Richard Pyle; Paul Kirk; Jason Best; tdwg-guid@lists.tdwg.org
Subject: Re: [tdwg-guid] First step in implementing LSIDs?[Scanned]

I think we need to be clear what gets an LSID (or a GUID in
general).

Some of the things listed by Anna are digital records, such as an
image. It seems simplest to give these GUIDs that identify the
image, with metadata linking the image to the thing the image
depicts (there are existing RDF vocabularies to do this).

Some things listed, such as a specimen, are physical objects. These
are different from digital objects, and the way in which GUIDs
that identify real things are handled has caused all manner of
debate (see the related pages bookmarked at
http://del.icio.us/rdmpage/303). LSIDs don't handle this well,
unless we rely on metadata saying "the thing identified by."

So, at least on this level to say that all seven things get the
same GUID is clearly a non starter.

Relationships between things can be easily specified in metadata
("is part of", "depicts", "is kind of").

The final issue is GUID reuse, that is, if somebody uses an INOTAXA
record, they should at a minimum refer to the INOTAXA LSID. This
would particularly apply to aggregators such as GBIF, who should
not present their own identifiers unless GBIF has actually created
the data. You often state "presumably shortly also available to
GBIF in some form". It's not clear to me what that means, but if
it's available to GBIF because INOTAXA serves it, then I think GBIF
should use INOTAXA LSIDs to refer to INOTAXA records.

Clearly, generating a plethora of new, effectively local ids
(masquerading as global) is not a recipe for progress. If we don't
reuse GUIDs we are wasting our time.

Regards

Rod





On 2 Jun 2007, at 18:53, Weitzman, Anna wrote:

Hi Rich (et al.),
I'm going to join this particular discussion  in spite of the fact
that I have not been able to follow the entire GUID discussion
over the past couple of years and I may be repeating things that
have been resolved.

Let's continue to investigate whether an LSID applies to the
physical specimen or the database record (or both?).

What about the record(s) for that same physical object in the
literature?  As we mark up literature, we are going to generate
LSIDs for specimen records that will need to be resolved to be
related to the same physical object (in a collection) and the data
record (usually in that same collection's database).

Let's look at the example that Chris Lyal and I are contemplating
as we work on implementing an INOTAXA pilot to show in Bratislava:
1) a weevil specimen here at USNM (a type described in the BCA)
2) a record for it in the museum's database (we do have a type
database for insects, and it will be available in a year or two),
available on the museum's website, through GBIF, and through
INOTAXA
3) a record from digitized and parsed BCA in INOTAXA (presumably
shortly also available to GBIF in some form)
4) a record for the same weevil from a paper published in the
1950s available through INOTAXA (presumably shortly also available
to GBIF in some form)
5) a record for that weevil from a paper published in the 1990s
available through INOTAXA  (presumably shortly also available to
GBIF in some form)
6) a published image (or series of images) in the paper from the
1990s -- but now also digitized and made available through INOTAXA
(presumably shortly also available to GBIF in some form)
7) a digitized image (or series of images) made in our imaging
project and made available through the museum's database, INOTAXA,
GBIF and MorphoBank

Either each of these (1-7) will need to have its own LSID (or an
equivalent in the case of the specimen itself) or they will all
need to have the same LSID.  If the former, they will all have to
resolve to the same parent LSID--is this for the specimen or the
record in its home database?--in order for the overall
biodiversity information system to really work.

Or let's take that a step further and make that a fish, where not
only is there a record in the museum's database with its LSID, but
that same record for the same fish was imported some years ago
into FishBase (now out of date perhaps, but still available to
GBIF and via FishBase).  At the time, it was imported without an
LSID and FishBase has (presumably) assigned its own LSID...

Or let's say that someone else digitized their copy of the same
BCA volume and followed the INOTAXA (taXMLit) and assigned yet
another LSID for the specimen record...is that really the same
'record' or different from the one in #3?

I would like to think that in the long run we do not need multiple
LSIDs for records that refer to the same specimen or record (as
long as we can be truly certain that they are 'the same').  After
all, the literature markup has a whole series of unique IDs for
its various parts already, so can't we refer to 'the use of LSID
123 in workID 987' or 'the use of LSID 123 on pageID 456 in workID
987'?

There are a lot of IDs here, but unless every collection database
already has an LSID that we can 'grab' and use in INOTAXA we are
going to have to create our own LSIDs and count on a community
resolver to sort it all out (and even if that were true, not all
the specimens that we are going to be referring to from INOTAXA
have been put in electronic form anyplace else, so we will have to
assign LSIDs at least temporarily--Paul did not mention how they
are going to deal with the Zoological name LSIDs as at least a
temporary solution--but I assume that they have a similar problem).

I'm sure I don't know what the best solution is, but that's what
I'm counting on the computer scientists in this group to tell me.
I just hope they tell me soon, since we're going to need answers
soon!

Cheers,
Anna

Anna L. Weitzman, PhD
Botanical and Biodiversity Informatics Research
National Museum of Natural History
Smithsonian Institution

office: 202.633.0846
mobile: 202.415.4684

________________________________

From: tdwg-guid-bounces@lists.tdwg.org on behalf of Richard Pyle
Sent: Sat 02-Jun-07 5:08 AM
To: 'Paul Kirk'; 'Jason Best'; tdwg-guid@lists.tdwg.org
Subject: RE: [tdwg-guid] First step in implementing LSIDs?[Scanned]




Paul and List,

First, I should clarify something about my earlier post.  I wrote at
the start of Scenario 3:

"3) Issue data-less LSIDs without using the revision ID feature, and
track data change history separately from the LSIDs"

That should have been "...and track *metadata* change history
separately from the LSIDs" (metadata, not data).

So, without making things too complicated as we 'start to walk'
in this domain of biodiversity informatics my vote is for a
variation of scenario 3) from Rich. The reason I vote for this
is that in the fullness of time, and the 'herb.IMI' database
has already started this, much of the metadata will be
LSIDs and its correctness (i.e. sorting out typos etc.) will
be delegated to the entities who issue those LSIDs. As IPNI
improves the quality of the metadata associated with the
LSIDs they issue (and if I understand correctly they do use
the scenario 3) from Rich) so the quality of the metadata
associated with a 'herb.IMI' LSID improves. The reason I
prefer the data + metadata 'model' is that in this instance
the data is fixed ... who changes collection/accession
numbers? ... so perfect for this role. Even if a collection
moves to a new owner the original data need not 'disappear'
in the same way that DOIs move with the objects as book and
journal titles change from one publisher to another.

So...if I understand correctly, you differ from my scenario 3 in that
you do generate data-bearing LSIDs for specimens, but the data part is
limited to only the accession number, not the complete set of data
fields associated with the record -- correct?  So, in effect, the
object the LSID actually applies to is the binary accession number,
not the "concept" of the specimen.  I can imagine in this case that
the LSID can be thought of as representing the "concept of the
specimen" because the accession number itself is a surrogate for the
physical specimen.  The only thing that concerns me about this
approach is that there is a non-zero incidence of accidental duplicate
catalog numbers within a given collection, and possibly errors in
associating catalog numbers.  For example, if the computer database
for a collection had an error created by a technician who, for
example, entered the metadata for accession number IMI1234569 by
mistake, when it should have been IMI1234596 (and vice versa), then
branding the accession number as "data" for the LSID means that the
LSID technically *must* stay with the accession number (not the
specimen associated with the metadata for that LSID), after the error
is discovered.  Not a huge problem, but it could surprise people who
had indexed the LSID before the error was discovered, who then came
back to resolve it again after the error was fixed (i.e., they would
get totally wrong information).  Given how rare this problem is likely
to be (against a backdrop of many far more likely problems we will
have to overcome), I don't see this as a strong reason not to proceed
with your plan.

Final point, the 'data' is the 'herb.IMI' accession number;
in context this is a GUID because of the existence of Index
Herbariorum. So, our data will be 123456 not IMI123456
because ... in the fullness of time we will include an
Index Herbariorum LSID to 'identify' the 'institutional
acronym' element of the metadata.

Is the binary data for the accession number in 8-bit, or 16-bit?  I'm
assuming 8-bit would be fine, as I suspect all collections would have
accession numbers that can be rendered with a 256-character extended
ASCII set.  Is there any "wrapper" to the number as binary data, or is
it a straight ASCII binary representation (e.g.:
001100010011001000110011001101000011010100110110 for "123456")?

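The "straight ASCII binary representation" asked about above is easy
to check by hand or in code. The short Python sketch below assumes
plain 8-bit-per-character ASCII with no wrapper, which is only one of
the possibilities raised in the question; the function names are
invented for the example. The 48-bit string in the example decodes to
the six characters 123456.

```python
def ascii_bits(text):
    """Return text as a string of bits, 8 bits per ASCII character."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_ascii(bits):
    """Inverse of ascii_bits: turn groups of 8 bits back into text."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    return bytes(values).decode("ascii")


encoded = ascii_bits("123456")
decoded = bits_to_ascii("001100010011001000110011001101000011010100110110")
```

Byte-identical comparison of two such bit streams is then the same as
comparing the accession number strings themselves.
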
I'm not sure I follow the logic of how embedding the accession number
as data for the LSID allows the LSID to move to a new owner.  I would
think the opposite. Isn't it likely that the new owner would create
their own accession number for the specimen?  In this case, they would
be forced to generate a new LSID if they were following the same
practice of encoding the accession number as "data", rather than
metadata.

Also, wouldn't it make more sense to include the acronym (IMI) as part
of the data for the LSID? At least that way the "12345" would have
*some* context.

Finally, this approach would work only for collections where there is
a strict 1:1 correlation between accession numbers and specimen
objects for which an LSID is desired.

Thanks for your comments -- this thread is already forcing me to think
about things in a way I hadn't thought of them before.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252



_______________________________________________
tdwg-guid mailing list



----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org
Search for taxon names:
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Rod's rants on ants: http://semant.blogspot.com




