Rich,
I should say that my inclusion of references, acronym definitions, etc.
is not to insinuate that you are unaware of those things, but is a
recognition that this is a discussion on a public list and that some of
the readers may have never heard of these things and may not be aware
of the references. Also, the message to which you responded was a
response to chunks of several emails - I guess a bad practice intended
to cut down on the number of postings and to group related thoughts. I
thought I had included enough of the "Roderic Page wrote:" and
"Kevin Richards wrote:" headings to make it clear to which message I was
referring. A couple of the statements to which you responded were written
by Kevin and not me.
For the purposes of clarity, any time I say "GUID" here, I intend it in
the sense of the TDWG GUID Applicability Statement. In the GBIF
"Adoption of Persistent Identifiers for Biodiversity Informatics"
document (http://www2.gbif.org/Persistent-Identifiers.pdf), the term
"persistent actionable identifiers" is used instead of GUID, but in the
interest of brevity I'll use GUID.
Thanks for taking the time to explain more about how GNUB will work. I
am anxious to see it come to fruition and to use it. I have additional
comments and questions relative to your description of it, but they
will have to wait for another email. I think it would be best to focus
this post on the subject of GUIDs because I think that this is the crux
of our disagreement here.
First, a word about the TDWG GUID Applicability Statement. You were
expressing some reservations about calling it a "standard". If you go
to http://www.tdwg.org/standards/, you will find it listed under
"Current Standards". My understanding is (and I may be corrected by
those who know better) that a TDWG Standard can be either an
Applicability Statement or a Technical Definition (like Darwin Core).
In either case, the standard has gone through the review process, been
subjected to public comment, and been approved by the TDWG Executive. So I
consider either an Applicability Statement or a Technical Definition to
have considerably more "weight" than something like a blog post or ad
hoc usage guide. One problem with the GUID A.S. (Applicability
Statement) is found on the title page. It says "there is, or will be,
a separate document for the applicability of each specific GUID
technology". Unfortunately, the "there is" part currently only applies
to LSIDs - no other GUID technology has its own document. So an
understanding of the "appropriate" way to apply something like a UUID
must be inferred in three ways: from the general statements and examples
about UUIDs, by "reading between the lines" to see how the general
recommendations about GUIDs would affect the handling of UUIDs, and by
analogy to how LSIDs (another GUID that is not based on an HTTP URI) are
handled.
You quoted p.7 of the guide:
============================
The global uniqueness of an identifier is often confused with the issue of
resolution of the identifier. These two attributes of GUIDs can be
distinguished and discussed separately.
For example a Universally Unique Identifier (UUID) is a globally unique
identifier, but there are no widely known and used protocols for resolving a
UUID over the Internet (unlike HTTP URIs). This form of GUID is perfectly
acceptable for uniquely identifying data objects within a dataset.
Some identifiers therefore provide uniqueness, but not resolvability.
============================
So based on this, you are correct to call a UUID a GUID. However, the
part that I disagree with is:
... I think it's foolish to regard all of these different
resolution mechanisms as distinct "identifiers". There is *ONE* GUID. It
is: A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523. There are ten different ways to
make it actionable. It therefore meets the recommendations of the
applicability statement.
The problem is that when you create an HTTP URI out of a UUID, you are
creating an identifier whether you think you are or not. I suppose as
a matter of semantics, you could say "I don't intend for the ten ways I
showed of making my UUID actionable to be GUIDs", but if I encounter
one of them, how am I supposed to know that? You may not think that an
HTTP proxied non-HTTP URI GUID (e.g. an HTTP proxied UUID) is a GUID,
but anyone who is interested in describing the properties of the
identified resource in RDF (which should be everyone, GUID A.S.
recommendation 10) will think so. The GUID A.S. does not contain any
RDF examples (unfortunately) but the LSID Applicability Statement talks
in detail about how LSIDs should be used in RDF. Recommendation 29 of
the LSID A.S. states that "objects must be identified by an LSID in its
standard form using the rdf:about attribute". You can do this with an
LSID because it is a URN (a subset of the more generic URI) and therefore
a describable thing in RDF. However, a UUID cannot be used similarly
in an rdf:about attribute because it is not any kind of URI. It is
just a globally unique string. Recommendation 31 says "All references
to objects identified by LSIDs using the rdf:resource attribute must
use a proxy version of the LSID." This is because neither an LSID nor a
UUID can be used by a client to retrieve information about the object of
the property (the value of the rdf:resource attribute). That can only
be done if the GUID is an HTTP URI. Recommendation 30 says that the
description of all objects identified by an LSID must contain an
owl:sameAs, owl:equivalentProperty or owl:equivalentClass statement
expressing the equivalence between the object identifier in its standard
form and its proxy version. The RDF example given on page 18 shows how
this is to be accomplished (fragment shown here):
  <rdf:Description rdf:about="urn:lsid:ubio.org:namebank:11815">
    <dc:identifier>urn:lsid:ubio.org:namebank:11815</dc:identifier>
    <owl:sameAs rdf:resource="http://lsid.tdwg.org/urn:lsid:ubio.org:namebank:11815"/>
    ...
  </rdf:Description>
In this example, the HTTP URI
"http://lsid.tdwg.org/urn:lsid:ubio.org:namebank:11815" is not just a
"resolution mechanism". It IS an identifier whether you want it to be
or not. I suppose you could try to define it out of a role as a "GUID"
but that would be playing with semantics (no pun intended). Semantic
clients would consider it to be just as much an identifier as the
unproxied LSID.
Now consider how the example you were giving would need
to be handled in RDF. I am extrapolating here because, as I said, there
is no "UUID Applicability guide". To handle all of the identifiers you
listed:
A. A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
B. urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
C. http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
D. http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
E. http://lsid.tdwg.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
F. http://www.google.com/search?q=Danaus+plexippus+(Linnaeus+1758)
G. http://lsid.tdwg.org/summary/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
H. http://darwin.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523&submit=Go
I. http://zoobank.org/?lsid=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
J. http://zoobank.org/?id=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
K. http://zoobank.org/?id=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
L. http://zoobank.org/?uuid=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
one would write this:
  <rdf:Description rdf:about="urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">
    <dc:identifier>A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</dc:identifier>
    <owl:sameAs rdf:resource="http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"/>
    <owl:sameAs rdf:resource="http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"/>
    <owl:sameAs rdf:resource="http://lsid.tdwg.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"/>
    <owl:sameAs rdf:resource="http://www.google.com/search?q=Danaus+plexippus+(Linnaeus+1758)"/>
    <owl:sameAs rdf:resource="http://lsid.tdwg.org/summary/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"/>
    <owl:sameAs rdf:resource="http://darwin.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523&amp;submit=Go"/>
    <owl:sameAs rdf:resource="http://zoobank.org/?lsid=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"/>
    <owl:sameAs rdf:resource="http://zoobank.org/?id=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"/>
    <owl:sameAs rdf:resource="http://zoobank.org/?id=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"/>
    <owl:sameAs rdf:resource="http://zoobank.org/?uuid=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"/>
    ...
  </rdf:Description>
Note that it would not be necessary (nor in my opinion a good idea) to
use the LSID in the rdf:about attribute. Any of the 10 HTTP URIs could
have been switched with it. (Well, the google.com one really shouldn't
be there because it represents a web page, not a name.) However, the
UUID can NOT be used in the rdf:about attribute, nor can it be used in
an rdf:resource attribute. From the standpoint of the RDF, it has no
use as an identifier that the client can "understand" (i.e. use as a
subject or object of any object property).
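To make the distinction concrete, here is a minimal Python sketch that sorts identifiers from the list above by what a client could do with them. The function names are hypothetical, and checking the URI scheme is only a rough syntactic stand-in for "can be a subject or object in RDF" and "can be dereferenced"; it is my assumption, not something taken from either Applicability Statement.

```python
from urllib.parse import urlparse


def uri_scheme(identifier: str) -> str:
    """Return the URI scheme of an identifier, or '' if it has none."""
    return urlparse(identifier).scheme.lower()


def usable_as_rdf_node(identifier: str) -> bool:
    """Assumption: anything with a URI scheme can go in rdf:about."""
    return uri_scheme(identifier) != ""


def retrievable_by_client(identifier: str) -> bool:
    """Assumption: only HTTP(S) URIs can be fetched by a generic client."""
    return uri_scheme(identifier) in ("http", "https")


# Identifiers A, B and D from the list in this message.
bare_uuid = "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"
lsid = "urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"
http_uri = "http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"

for ident in (bare_uuid, lsid, http_uri):
    print(ident, usable_as_rdf_node(ident), retrievable_by_client(ident))
```

Under these assumptions the bare UUID fails both checks, the LSID passes only the first, and the HTTP URI passes both, which is the pattern described above.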
I don't think you were seriously suggesting that all 12 of the
identifiers on the list would actually be used in "real life". You
were making a point about how a UUID could be made actionable. But my
point is that you simply cannot meet the requirements of the GUID A.S.
with ONLY a UUID. You MUST have an HTTP proxied version of it in order
to "do the right thing" (i.e. GUID A.S. rec 10) and provide metadata in
the form of RDF serialized as XML. That HTTP proxied version isn't just
going to be seen as a "resolution mechanism". It is going to be
the ONLY identifier of any relevance in terms of the operation of the
RDF, which will see the UUID in the dc:identifier property as nothing
more than a string literal. If you and GNUB are going to participate
in BiSciCol as I understand it to be developing (and I believe that you
are), you will HAVE to have an HTTP URI version of your UUIDs and in
that context the raw UUID will be relatively irrelevant.
My point is that you should decide on just one of these HTTP URIs and
use that as your identifier when you communicate with the outside
world. My preference would be
"http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" as the
shortest and least complex one that would do everything that needs to
get done. I guess that there isn't a problem with the other nine
existing, but from my point of view there is nothing but harm to be
done by exposing them to the outside world. If you do, there is a
chance that people will think that you intend for them to be an HTTP
URI GUID for the object and you will be stuck forever having to put
owl:sameAs statements about them in your RDF. You noted that the GUID
A.S. says about UUIDs: "This form of GUID is perfectly acceptable for
uniquely identifying data objects within a dataset." I would put
emphasis on the word "within". Outside of that dataset, the UUID is
not as useful as its HTTP proxied version. You could (from the
standpoint of the outside world) refer to your object by both
"http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" and
"A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523", or you could ONLY refer to your
object in the outside world as
"http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523". You can't
ONLY refer to your object to the outside world as
"A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" and describe it in RDF. From
this point of view, why would you want to expose two identifiers when
you only need to expose one? This is what I meant when I said you
should just pick one and stick with it.
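As an illustration of "pick one and stick with it", the following Python sketch derives a single HTTP URI from a raw UUID and emits a minimal RDF/XML fragment in which the UUID survives only as a string literal. The names BASE_URI, http_uri_for and describe are hypothetical, the upper-casing is an assumption made only to match the examples in this thread, and nothing here claims that zoobank.org actually serves this URI pattern.

```python
import uuid
from xml.sax.saxutils import escape, quoteattr

# Assumed base for the single exposed identifier (form D in the list above).
BASE_URI = "http://zoobank.org/"


def http_uri_for(raw_uuid: str, base: str = BASE_URI) -> str:
    """Build the one HTTP URI to expose for a UUID.

    uuid.UUID() raises ValueError if the string is not a well-formed UUID.
    """
    parsed = uuid.UUID(raw_uuid)
    return base + str(parsed).upper()


def describe(raw_uuid: str) -> str:
    """Return an RDF/XML fragment: HTTP URI as subject, UUID as literal."""
    uri = http_uri_for(raw_uuid)
    literal = str(uuid.UUID(raw_uuid)).upper()
    return (
        f"<rdf:Description rdf:about={quoteattr(uri)}>\n"
        f"  <dc:identifier>{escape(literal)}</dc:identifier>\n"
        f"</rdf:Description>"
    )


print(describe("A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"))
```

Because every external reference goes through one function, no second HTTP form is ever exposed, so no owl:sameAs statements are needed to reconcile variants.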
The other point which I was trying to make is: why would you choose to
expose to the outside world an identifier that only does part of the
desirable things that we want (i.e. my list of 8 desirable attributes
of a GUID), when you could use a modification of that identifier that
would do everything you want? You mention how GUIDs for names are
primarily of interest to machines. That is undoubtedly true. But with
virtually no additional cost (15 minutes of time from somebody who
knows how to create a single 3 kB XSLT file) an HTTP URI GUID could
resolve to something readable by humans in addition to the more
useful machine-readable RDF/XML.
I would assert the same thing about LSIDs. Why would you create an
identifier that is part of (what seems to me to be universally
recognized as) a dead technology when you could create a simpler HTTP
URI that would do the same thing and potentially more? In the case of
uBio and Biodiversity Collections Index, they were set up when LSIDs
were believed to be the "Next Big Thing". That did not turn out to be
the case, so those organizations are stuck with painful HTTP URIs like
"http://biocol.org/urn:lsid:biocol.org:col:35115" and
"http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:9479554"
when they could have had "http://biocol.org/35115" and
"http://www.ubio.org/9479554". I would say "lesson learned" - we know
how to construct good HTTP URI GUIDs that will do everything people
want so why not just do that? If it turns out that Linked Data and the
Semantic Web are also a "Next Big Thing" that turns out to be a flop,
we will still have globally unique strings, even if they are not
actionable. But I
think that the demonstrations of multiple members of our community show
that at least to some degree LOD/Semantic Web technologies "work" and
can be implemented by almost anybody.
You said:
Here is where I completely disagree. I've said it before, and I'll keep
saying it: GUIDs are (should be) intended and necessary for
computer-computer communication; *NOT* for human-computer or human-human
communication. Their beauty or ugliness should be determined by what's
beautiful or ugly to a computer, not to a human. A consistent 128 bits is
"beautiful" to a computer, but a UUID is ugly to a human; whereas "Danaus
plexippus (Linnaeus 1758)" is beautiful to a human, but ugly to a computer
(for reasons Dima already outlined).
More fundamentally, one lesson of history that seems to be perpetually
repeated is the mistake of encoding human-interpretable information into
what is intended to be a stable, permanent identifier. INEVITABLY, a system
that uses human-interpretable information as identifiers will include some
fraction of instances where the human-interpretable part is somehow "wrong"
(e.g., a Cyrillic "а" was accidentally entered instead of
a Latin "a", or a typographic error in a scientific name, or worst of all,
the assignment of a text-string name to a homonym due to a mix-up in
authorship). The temptation to "fix" those "wrong" values is enormous. And,
of course, by "fixing" them, permanence is broken.
Almost by definition, then, a "beautiful" identifier for computer-computer
communication should be "ugly" to a pair of human eyeballs.
I disagree with you completely here. If you haven't read the "Cool
URIs" piece, you should before we talk about this more. It is full of
examples that are easy to read and type and are intended to be
"understood" by both humans and computers. The piece at
http://www.w3.org/Provider/Style/URI is an even easier read. GUIDs CAN
be easy to "read" and type, although they don't have to be. The degree
to which it "matters" whether a GUID is human readable or not depends
primarily on the likelihood that humans will see it in print or type it
in the URL box of a web browser. In the examples of GUIDs for names
that you provided, I will agree that it's not very likely that humans
will be seeing them. But if the GUID is of a specimen, an image, or a
tree (which could easily appear in print or be written down by somebody
to look at its web page), I would argue that readability is desirable,
e.g. http://bioimages.vanderbilt.edu/uncg/966 . I realize that
not everyone agrees with me on this, particularly the fans of
UUIDs. As far as I know, there isn't any rule about what characters
should be in an HTTP URI. But there is a general understanding that it
is a best practice that an HTTP URI that is intended as an identifier
should do content negotiation and produce both HTML for humans and RDF
for machines.
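For readers who have not met content negotiation, here is a minimal Python sketch of the server-side decision: look at the client's Accept header and pick HTML or RDF/XML for the same HTTP URI. The function names and the OFFERED list are hypothetical, the parsing is simplified (it handles q-values and wildcards only), and a real deployment would more likely rely on the web server or framework to do this.

```python
# Representations this hypothetical server can produce, in preference order.
OFFERED = ["text/html", "application/rdf+xml"]


def parse_accept(header: str):
    """Parse an Accept header into a list of (media_type, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs


def quality(offer: str, prefs) -> float:
    """Quality the client assigns to an offer; most specific match wins."""
    main_type = offer.split("/")[0]
    for pattern in (offer, main_type + "/*", "*/*"):
        for media_type, q in prefs:
            if media_type == pattern:
                return q
    return 0.0


def choose_representation(accept_header: str, offered=OFFERED):
    """Return the media type to serve, or None if nothing is acceptable."""
    prefs = parse_accept(accept_header or "")
    if not prefs:
        return offered[0]  # no preference stated: serve the default
    best, best_q = None, 0.0
    for offer in offered:
        q = quality(offer, prefs)
        if q > best_q:  # ties keep the earlier (preferred) offer
            best, best_q = offer, q
    return best


print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xml;q=0.9,*/*;q=0.8"))
```

A semantic client asking for application/rdf+xml gets the RDF, while a browser-style header falls through to the HTML page, so one identifier serves both audiences.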
[lots of stuff cut out here that will have to wait for another email]
Errr..sort of. I say we identify things using GUIDs, and provide services
that resolve those GUIDs via actionable HTTP URIs (or, if you prefer,
embedding those GUIDs within a resolution metadata "wrapper"). Yes, I know
it's all the rage to collapse the functions of actionability and globally
unique identification into the same text-string URI (what I've been
referring to as the TB-L perspective). But to be perfectly blunt, I see
this as a mistake that will, in the long run, slow down our progress.
Why does this slow down our progress? I don't get that at all. I see
your viewpoint as the one impeding progress because non-HTTP GUIDs make
it difficult or impossible to describe things in RDF.
...
Agreed! I think when we distill this entire exchange, we'll find that we
have slightly different interpretations about what the GUID applicability
statement actually says & means, and a non-trivial amount of
miscommunication, but otherwise (as was the case the last time we had such a
voluminous exchange), we're actually more on the same page than not.
I'm sure this is probably the case! I hope that I am not coming across
as rude or disrespectful in this kind of discussion. When I question
your statements and those of others, I expect to often be shown to be
wrong and learn from the experience. I also expect that my statements
will be subjected to the same scrutiny and criticism that I may dish
out! :-)
So I'm actually pretty optimistic about the whole venture
assuming that we can get people and organizations to
actually read and try to follow the standards that we
have already agreed upon.
I think it's nice to end this email on a point of strong agreement!
Likewise!
Steve
Aloha,
Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Associate Zoologist in Ichthyology
Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu