Thank you Gregor for very succinctly expressing what I think is the important take-home message in this discussion! I think that one of the things that makes me want to scream and run away from TDWG is the excessive "point making" that goes on on this list. I hesitate to make the following statement because somebody is going to find an earlier email of mine and find one that I wrote "just to make a point". But I would like to believe the following is true: I am not participating in TDWG for intellectual stimulation, social networking, career advancement, or entertainment. I am participating because I believe that it will help me achieve some useful result in a reasonable amount of time.
Given that axiom, I personally don't care very much how many "correct" ways there are of creating identifiers that will "officially" work in RDF or what clever possible future technology or URI schemes Android might or might not be creating. I will happily allow Rich to call the UUID version of his identifiers a "GUID" and the HTTP proxied version "Rumpelstiltskin" while I call the HTTP proxied version the "GUID" and the UUID "a string". That simply does not matter. What does matter is that after the years of time that TDWG has spent spinning its wheels on the issue of GUIDs, we finally have a system (HTTP URIs, HTTP as a universally understood information transfer method, and RDF as a lingua franca for marking up metadata) that is implementable for creating a distributed system for unambiguously identifying and transferring information about biodiversity resources (the "dream" laid out in the old TAG roadmaps). Not only is that system implementable, but it HAS been very successfully implemented by people within our community and is increasingly being more broadly implemented outside our community. We have people on this list (mostly silent, but I know they are there because I sometimes get off-list emails from them) who don't know how to do the things that the "experts" know how to do and they are coming here for information and advice on how to actually make things work according to the standards TDWG has established. Given that, I consider it extremely helpful to have examples of implementations that actually "work" right now, and not particularly helpful to have examples of things that could be done but would be a bad idea, or which can't be implemented in a finite amount of time, or that might be done at some point in the future but that don't actually work in the present. I would suggest that we keep that at the forefront in our minds when we post.
I do apologize for my part in the long email exchanges with Rich which some might consider tedious, but in the end I think they have produced some useful results. From Rich's last post, I now know that he intends for http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF4... to be the widely circulated http proxied form of his zoobank UUIDs. Given that zoobank issued LSIDs, it is doing the right thing to maintain them even if nobody uses them. So from the perspective of offering a useful example, I would suggest that
<rdf:Description rdf:about="urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">
<dcterms:identifier>A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</dcterms:identifier>
<owl:sameAs rdf:resource="http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF4..."/>
...
</rdf:Description>
would be a way to mark up the information about his various identifiers that would be both "correct" (in the sense of not breaking any rules about RDF) and also useful as a template for people who want to give LOD a chance. I looked up dcterms:identifier to see what the definition was. It says "An unambiguous reference to the resource within a given context. ... Recommended best practice is to identify the resource by means of a string conforming to a formal identification system." The bugaboo here is "formal identification system" and whether UUID fits that definition. But I would venture to say that one could "get away with it" and it would achieve Rich's goal of letting the universe in hundreds of years know "this is the identifier that I intend for that object". Because XML is just plain text, the RDF file would not have to be considered to be part of a magically actionable system. It could also just be read as a marked up plain text file and dcterms are about the most well-known and stable thing we have at the moment for imparting information about what we intend things to mean. However, rdf:about and owl:sameAs statements would also make either the HTTP URI or the LSID work in the "here and now" of Linked Data. Including the HTTP URI version would allow a semantic client to "look up" information through the existing network system (i.e. using HTTP protocol) assuming that Rich gets the zoobank system to return content-type=rdf+xml when that is asked for by a semantic client rather than always HTML.
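To illustrate the point that the RDF file can also be read as plain marked-up text, here is a minimal Python sketch that pulls the three identifiers out of a description like the one above using only the standard XML parser, with no RDF tooling at all. The enclosing rdf:RDF element, the function name read_identifiers, and the example.org proxy URI are my assumptions for illustration; the real zoobank.org proxied form is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for the prefixes used in the markup above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
}

UUID_TEXT = "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"
LSID = "urn:lsid:zoobank.org:act:" + UUID_TEXT
# Hypothetical proxy URI, used only so the example is complete.
PROXY_URI = "http://example.org/" + LSID

RDF_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="{NS['rdf']}"
         xmlns:dcterms="{NS['dcterms']}"
         xmlns:owl="{NS['owl']}">
  <rdf:Description rdf:about="{LSID}">
    <dcterms:identifier>{UUID_TEXT}</dcterms:identifier>
    <owl:sameAs rdf:resource="{PROXY_URI}"/>
  </rdf:Description>
</rdf:RDF>
"""


def read_identifiers(rdf_xml: str) -> dict:
    """Return the subject URI, the plain identifier string, and any sameAs URIs."""
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    desc = root.find("rdf:Description", NS)
    rdf_ns = "{" + NS["rdf"] + "}"
    return {
        "about": desc.get(rdf_ns + "about"),
        "identifier": desc.findtext("dcterms:identifier", namespaces=NS),
        "same_as": [e.get(rdf_ns + "resource") for e in desc.findall("owl:sameAs", NS)],
    }


if __name__ == "__main__":
    print(read_identifiers(RDF_XML))
```

A reader centuries from now would need nothing more than the text itself to see which string was intended as the identifier; the rdf:about and owl:sameAs values are what a Linked Data client would use today.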
Further, in the interest of achieving what Gregor so clearly stated, I would recommend (beg on my knees?) that when GNUB is set up (assuming that it uses UUIDs) that it creates a single, simple HTTP URI proxied form of the UUID (another GUID or a Rumpelstiltskin if you prefer) that can be used by those who want to give LOD a chance. The domain name should be something that is intended to persist for a very long time (purl.org would allow maintenance to be transferred but I personally don't care). The RDF for the TNUs could then look something like this:
<rdf:Description rdf:about="http://purl.org/tnu/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41524">
<dcterms:identifier>A9F435E0-8ED7-46DD-BAB4-EA8E5BF41524</dcterms:identifier>
...
</rdf:Description>
(replace purl.org/tnu with domain name of your choice). Assuming that interest in LSIDs is as low as it seems to be, I would just skip the hassle of messing with them. All of your RDF served would then be one line shorter and do what Gregor suggested (minimize the number of IDs) as well.
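For anyone generating such records systematically, the "single, simple HTTP URI proxied form" could be produced along the lines of the following Python sketch. It assumes the placeholder base http://purl.org/tnu/ used above and upper-case UUID strings; the function names proxied_uri and tnu_description are hypothetical, and nothing here reflects how GNUB will actually be built.

```python
import uuid
from xml.sax.saxutils import escape, quoteattr

# Placeholder base URI from the example above; substitute your own domain.
BASE = "http://purl.org/tnu/"


def canonical_uuid(uuid_text: str) -> str:
    """Validate the UUID and return it in upper case.

    uuid.UUID raises ValueError if the string is not a well-formed UUID.
    """
    return str(uuid.UUID(uuid_text)).upper()


def proxied_uri(uuid_text: str, base: str = BASE) -> str:
    """Build the one HTTP URI form of the UUID."""
    return base + canonical_uuid(uuid_text)


def tnu_description(uuid_text: str, base: str = BASE) -> str:
    """Emit an rdf:Description fragment like the one shown above."""
    canonical = canonical_uuid(uuid_text)
    return (
        f"<rdf:Description rdf:about={quoteattr(base + canonical)}>\n"
        f"<dcterms:identifier>{escape(canonical)}</dcterms:identifier>\n"
        f"</rdf:Description>"
    )


if __name__ == "__main__":
    print(tnu_description("A9F435E0-8ED7-46DD-BAB4-EA8E5BF41524"))
```

Keeping the URI construction in one function is what guarantees there is only one way of embedding the UUID in a resolvable URI.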
Also in the interest of providing examples, I mentioned the XSLT option for simple human-friendly content negotiation. Here is how I did it. I made a single 3 kb XSL stylesheet (XSLT) file: http://bioimages.vanderbilt.edu/taxon/taxonconcepts.xsl which is sitting in the same directory as the rdf files. Then each RDF file contains
<?xml-stylesheet type="text/xsl" href="taxonconcepts.xsl"?>
right after the
<?xml version="1.0" encoding="UTF-8"?>
line. The server is set up so that when a URI like http://bioimages.vanderbilt.edu/taxon/19422-weakley2010 is dereferenced, the client is sent the file http://bioimages.vanderbilt.edu/taxon/19422-weakley2010.rdf regardless of the content-type requested. So a semantic client gets the RDF/XML and a web browser formats the XML for humans according to the XSL stylesheet. I call this "poor man's content negotiation" because it requires virtually no maintenance or sophisticated server resources. One does have to maintain a consistent RDF structure because the XSLT is a "dumb" static file, but if your RDF is being generated systematically, it will probably have a consistent format anyway. It also means that a human has to use "view page source" to look at the underlying RDF, but 99% of human clients won't care about that anyway, and the 1% that does care will probably know how to view the page source anyway. I want to be clear here that what I am trying to show in this example is NOT anything about taxon concepts, proper RDF format, the correctness of Darwin-SW, etc. or to say that this is the only, best, or most proper way to achieve content negotiation. What I AM trying to show is that if you are going to provide RDF for computers, there is virtually no additional cost to also providing a human readable version. I am essentially a computer dummy. I went to a bookstore and bought a book on XSLT and wrote the file myself. If I can do that, then any organization who has a "real" computer person on their staff could accomplish this as well and I don't see any reason NOT to do it, even if the information provided is intended primarily for computer to computer communication.
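If the RDF files are generated by a script, adding the stylesheet line can be automated. The following is only a sketch in Python of one way to do it, not a description of how the bioimages files were actually produced; the function name add_stylesheet is hypothetical, and the default href is the stylesheet file name mentioned above.

```python
import re

# Matches an XML declaration at the very start of the document.
# "<?xml-stylesheet" does not match because "xml" must be followed by whitespace.
XML_DECL = re.compile(r"<\?xml\s[^?]*\?>")


def add_stylesheet(xml_text: str, href: str = "taxonconcepts.xsl") -> str:
    """Insert an xml-stylesheet processing instruction after the XML declaration.

    If the text already contains a stylesheet instruction it is returned unchanged.
    """
    if "<?xml-stylesheet" in xml_text:
        return xml_text
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    match = XML_DECL.match(xml_text)
    if match is None:
        # No XML declaration: the instruction simply goes first.
        return pi + "\n" + xml_text
    end = match.end()
    return xml_text[:end] + "\n" + pi + xml_text[end:]


if __name__ == "__main__":
    sample = '<?xml version="1.0" encoding="UTF-8"?>\n<doc/>'
    print(add_stylesheet(sample))
```

Because the function leaves already-styled files alone, it can be run repeatedly over a whole directory of RDF files without duplicating the line.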
Steve
Gregor Hagedorn wrote:
While I generally accept Bob's careful research, and while I think it is imperative that multiple IDs are allowed in principle, as to avoid monopolies, my feeling is:
- The number of IDs should be minimized.
- The http:-URI is de facto the most relevant ID in the semantic web.
- Avoid multiple alternative ways of embedding a UUID-string in a resolvable URI. All forms have to be added as sameAs information. They become ballast for future generations.
That is: take the http-ID seriously as a resource requiring long-term management and persistence.
Finally: Success is measured by the adoption by people not visiting tdwg meetings. This is a social issue.
The last point is my main reservation about the TDWG Applicability Statement for GUID's.
Gregor