[tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time
steve.baskauf at vanderbilt.edu
Thu Jan 6 17:35:56 CET 2011
Rod and Joel,
I stand corrected on all points. Thanks for the clarification.
Roderic Page wrote:
> Dear Steve,
> On 6 Jan 2011, at 15:10, Steve Baskauf wrote:
>> A quick comment about ubio.org's LSID resolution: I took a look at
>> the complete RDF source and noticed that the namespace declarations
>> for "ubi:" and "gla:" don't start with "http://". I'm pretty sure
>> that isn't kosher RDF, although neither of the RDF validators
>> complained about it (doesn't give me much confidence in RDF
>> validators). So much for being overly optimistic about there being
>> one source that worked completely. :-)
> I don't think RDF namespaces need to be HTTP URIs. There is an
> expectation by most RDF clients that namespaces are resolvable via
> HTTP URIs and will return RDF, but this need not be true (some well
> known namespaces such as http://search.yahoo.com/mrss/
> and http://www.georss.org/georss/ don't resolve to RDF). So, AFAIK
> LSIDs are perfectly valid, if "unfriendly".
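A minimal sketch of the point about namespaces, using only the Python standard library. The "urn:example:names:" namespace, the example.org LSID and the nameComplete property are all invented for illustration. The sketch only shows that a non-HTTP namespace yields a perfectly usable property URI; it is not a full RDF validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical non-HTTP namespace and subject, invented for this example.
EX_NS = "urn:example:names:"
SUBJECT = "urn:lsid:example.org:names:123"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="{SUBJECT}">
    <ex:nameComplete>Quercus alba</ex:nameComplete>
  </rdf:Description>
</rdf:RDF>"""


def property_uris(rdf_xml):
    """Return the full URI of every property element in an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for description in root:
        for prop in description:
            # ElementTree writes qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}", 1)
            # RDF/XML forms the property URI by simple concatenation.
            uris.append(namespace + local)
    return uris


if __name__ == "__main__":
    for uri in property_uris(DOC):
        print(uri, "scheme:", urlparse(uri).scheme)
```

The property URI that comes out has the scheme "urn" rather than "http", and nothing in the parsing step objects to that. Whether a client can do anything useful with such a URI is a separate question.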
>> I should say that I'm not an advocate of LSIDs. I actually don't
>> like them at all. I simply investigated them in this context as
>> something that might work in a Linked Data context. I should also
>> say that I'm not necessarily an advocate of Linked Data - I'm in the
>> "wait and see" camp. I'd like to give it a chance but I'm not
>> betting the store on it. I AM an advocate of having identifiers that
>> are persistent and globally unique and it appears that both Catalog
>> of Life and IPNI (in my view) flunk the persistence test if you just
>> look at the URI (LSID with version) as a string (which you SHOULD be
>> able to do) that can be an unchanging identifier. We desperately
>> need persistent and globally unique identifiers for a lot of things.
>> If they resolve to RDF/XML then that's a bonus. We need an Emperor,
>> clothing is somewhat optional.
>> A few specific responses:
>> Roderic Page wrote:
>>> These issues won't go away simply by replacing LSIDs with HTTP URIs.
>>> Some of the linked data sources go belly up fairly regularly (notably http://bio2rdf.org
>> The reason I see HTTP URIs [capable of producing either HTML or
>> RDF/XML through content negotiation] as something of greater value
>> than LSIDs is that people can at a minimum get a web page out of it.
>> That's the cake. If some people can also use them to get RDF for the
>> Linked Data dream, that's the icing. The fact that HTTP URI GUIDs
>> fail to "work" is similar to the fact that regular web URLs fail to
>> work. That doesn't stop people from using the web and it shouldn't
>> stop well designed (i.e. http://www.w3.org/TR/cooluris/ and
>> http://www.w3.org/Provider/Style/URI) URIs from being used as GUIDs.
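To illustrate the "cake and icing" idea, here is a rough sketch of how a server might pick between a web page and RDF/XML from the Accept header. The function names and the list of offered types are assumptions made for this example, and the Accept parsing is deliberately simplified (no wildcards like "text/*", no specificity ordering).

```python
# Representations a hypothetical server can produce; HTML comes first
# so that it is the default ("the cake").
OFFERED = ["text/html", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((fields[0].lower(), q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header, offered=OFFERED):
    """Pick the media type to serve for a given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in offered:
            return media_type
        if media_type == "*/*":
            return offered[0]
    # Nothing matched: fall back to the web page rather than refusing.
    return offered[0]


if __name__ == "__main__":
    print(negotiate("application/rdf+xml"))
    print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

A semantic client asking for application/rdf+xml gets RDF; a browser, or a client that sends no Accept header at all, gets the HTML page.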
> I'm not disagreeing at all, simply saying that replacing LSIDs with
> HTTP URIs (or anything else) doesn't magic away the issue of keeping
> the identifiers "live".
>>> IPNI supports versioning of LSIDs, but the LSID without the version is
>>> also valid (it resolves to the latest version). I think this is a
>>> perfectly valid thing to do. Versioning matters to some, but not
>>> everyone cares about it, IPNI supports both views. If you want to cite
>>> a specific version, do so, if not, don't.
>> OK, I guess as long as people can just leave the version off if they
>> want, they'll get unchanging strings for their identifiers and so I
>> guess that moves them into the "usable" column. But as far as I'm
>> concerned, putting the versions on adds to the confusion. If I can
>> say this without launching an unnecessary thread about the opacity of
>> GUIDs, nobody is supposed to be looking at any GUID itself to infer
>> meaning about the identified object. That "no-no" seems to be
>> exactly what Catalog of Life and IPNI seem to be suggesting that
>> people do with the LSID version numbers they are tacking on. Get rid
>> of them.
> Version numbers are optional and are part of the LSID spec. Given that
> the unversioned ones work I don't see a huge problem here, especially
> as the first thing people will say regarding any identifier is "what
> happens if my data changes?" Versioning is there if you need it, if
> you don't need it, don't use it.
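The "leave the version off" option can be sketched as a small helper that parses an LSID and drops the optional revision. This assumes the usual urn:lsid:authority:namespace:object[:revision] layout; the example.org LSIDs below are made up and do not refer to real records.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID into its parts; the revision is optional."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


def unversioned(text):
    """Return the LSID without its revision, for use as a stable string."""
    lsid = parse_lsid(text)
    return f"urn:lsid:{lsid.authority}:{lsid.namespace}:{lsid.object_id}"


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(unversioned("urn:lsid:example.org:names:12345:1.2"))
    print(unversioned("urn:lsid:example.org:names:12345"))
```

Both calls give the same string, which is the property needed if the identifier is going to be stored and compared as text.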
> Opacity is another red herring. Does anybody know any truly opaque
> identifiers, ones about which I can't infer anything, not even that
> it's an identifier? In point of fact, most real world identifiers are
> laden with embedded semantics (including checksums, etc.).
> Furthermore, any recent discussion of good URL structure pretty much
> converges on making them clean, human-readable, and hackable.
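As one concrete case of an identifier with embedded semantics, the last digit of an ISBN-13 is a checksum over the first twelve. A short sketch of the standard calculation follows; the function names are my own, and the ISBN in the usage lines is a commonly used textbook example, not one taken from this thread.

```python
def isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit from the first 12 digits."""
    digits = [int(c) for c in first12]
    if len(digits) != 12:
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn):
    """Check an ISBN-13, ignoring hyphens and spaces."""
    compact = isbn.replace("-", "").replace(" ", "")
    if len(compact) != 13 or not (compact.isascii() and compact.isdigit()):
        return False
    return isbn13_check_digit(compact[:12]) == int(compact[12])


if __name__ == "__main__":
    print(is_valid_isbn13("978-0-306-40615-7"))
    print(is_valid_isbn13("978-0-306-40615-8"))
```

So even before resolving anything, a client can tell from the string alone whether it is plausibly a well-formed ISBN.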
>>> The data/metadata distinction is a complete red herring and has side-
>>> tracked more LSID conversations than I care to remember. The
>>> distinction stems from LSIDs originally being conceived as identifiers
>>> for large data objects (e.g., sequences, images, binary streams) that
>>> people want accurately versioned so that they could reproduce digital
>>> experiments. In this scenario, metadata can change because it's not
>>> what mattered for this reproducibility. Some people think of data
>>> about taxonomic names as "metadata", and then we're off on a wild
>>> goose chase about should LSIDs change when metadata changes. I
>>> wouldn't lose any sleep over this (see discussion of IPNI above).
>> Agreed. The important thing is that the identifier for a certain
>> "thing" (however the "thing" is defined) should not change. I have
>> previously expressed the opinion that it won't really work to have
>> any URI (sensu Linked Data) point to "data" and that all URIs serving
>> as GUIDs should be considered to reference non-information resources
>> (which by definition have only "metadata"). Under that scenario, all
>> resource properties are "metadata" and subject to change. The data
>> vs. metadata argument then becomes irrelevant.
>>> Regarding DOIs for books, these are uncommon, although there are moves
>>> to express ISBNs within the DOI framework, so there may be more of
>>> these. But for a DOI to exist someone has to claim ownership of a
>>> resource and register the DOI with CrossRef. For a lot of older
>>> literature there won't be a publisher around to do this, but that's
>>> where BHL comes in.
>> Then PLEASE BHL, create simple, unchanging URIs [suitable for use in
>> the Linked Data world] to identify your resources. If you don't want
>> to provide RDF/XML, that's fine for now. You can wait and see if
>> that is necessary later. But at least create URIs that COULD be used
>> for Linked Data in the future and that are not long and loaded with
>> "?" and "&" characters. I don't know how to do that because I'm not
>> a server dude. But it apparently isn't that hard to do with a mod
>> rewrite. I know there are at least three people (probably more) on
>> this list who do that routinely - Rod does it with his bioguid.info
>> site (i.e. "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 gets
>> written into "ugly" URL
>> http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody
>> cares as long as the "cool" URI doesn't change).
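The cool-to-ugly mapping above can be sketched in a few lines. This is only an illustration of the idea in Python, not how bioguid.info actually does it (that is presumably a server rewrite rule); the pattern and function name are assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed pattern: any path of the form /doi:<something> is a "cool" URI.
COOL_PATH = re.compile(r"^/(doi:.+)$")


def rewrite(url):
    """Map a 'cool' URI onto the script URL that actually serves it."""
    parts = urlsplit(url)
    match = COOL_PATH.match(parts.path)
    if match is None:
        return url  # not a pattern we rewrite; leave untouched
    query = "id=" + match.group(1)
    return urlunsplit((parts.scheme, parts.netloc, "/openurl.php", query, ""))


if __name__ == "__main__":
    print(rewrite("http://bioguid.info/doi:10.1093/bib/bbn022"))
```

The public identifier stays the same even if the script behind it is later renamed or replaced; only the rule has to change.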
> BHL does have simple, clean, stable URIs, such
> as http://www.biodiversitylibrary.org/page/26246005 for a
> page, http://www.biodiversitylibrary.org/item/84644 for an item, and
> so on.
>> My two cents worth...
>>> In summary, we're in a mess, and I don't think this is really down to
>>> technology. It's a failure of our community to create the appropriate
>>> resources (e.g., centralised, curated resources of identifiers and
>>> associated metadata for names, publications, and specimens).
>>> On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
>>>> Well, I have continued my quest for resolvable, RDF-producing GUIDs for
>>>> taxon/name-related stuff. I have gotten a lot of good information from
>>>> reading Rod Page's BMC Bioinformatics paper
>>>> (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating
>>>> his http://bioguid.info/ site.
>>>> From the standpoint of the "sec./sensu" part of a TNU/taxon concept,
>>>> based on the recent discussion, it sounds like the DOI solution for
>>>> publications is a good direction to go IF resolution services producing
>>>> RDF come into existence and IF it becomes possible to actually search
>>>> for the DOIs of more obscure publications. I tried using Rod's site to
>>>> look up a journal article using the ISSN, volume, and page and the web
>>>> interface found the DOI and generated RDF just fine. However, an
>>>> attempt to use the web to find the DOI of Gleason and Cronquist's Manual
>>>> of vascular plants of Northeastern United States and adjacent Canada
>>>> failed despite a half hour of effort (I found the UPC, the LOC call
>>>> number, and the ISBN, but no DOI). Maybe there just isn't a DOI for it,
>>>> but there should be a way for me to know that. So DOIs for books and
>>>> old journal articles are not really ready for prime time.
>>>> From the standpoint of the "scientific name" part of a TNU/taxon
>>>> concept, I had better luck (sort of). Rod's "Status of biodiversity
>>>> services" page (http://www.bioguid.info/status/) was really cool. I found
>>>> resources I hadn't known about before. I tried out several of the
>>>> services that claimed to issue LSIDs.
>>>> Catalog of Life's LSIDs didn't work with either the
>>>> http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with either a
>>>> web browser or the OpenLink RDF browser. I only got an empty RDF
>>>> element in response.
>>>> Index Fungorum was down.
>>>> IPNI seemed to work. However, I was somewhat appalled to observe that
>>>> they seem to change the revision identifier any time that they change
>>>> any part of the metadata. That renders the LSID useless as a
>>>> GUID for the name and I believe is inconsistent with the design of LSIDs,
>>>> where the revision is only supposed to change if the underlying data
>>>> itself (NOT metadata) changes. (Catalog of Life says that they change
>>>> the revision identifier EACH YEAR for all of their records! That's
>>>> worse!) If I'm remembering the TDWG LSID recommendations, it is not
>>>> even recommended to use the revision part of an LSID at all in the
>>>> biodiversity informatics context.
>>>> ubio.org's LSIDs seemed to work properly.
>>>> [sorry - didn't try zoobank since I was looking for plants]
>>>> I don't know which (if any) of the Web sites listed on Rod's status page
>>>> use generic HTTP URI guids (rather than LSIDs) to refer to taxon
>>>> I tried out the Global Names index that Pete was mentioning. The URI
>>>> version of the UUIDs (e.g.
>>>> do resolve under content negotiation, but the only useful information
>>>> that the RDF representation seems to provide is the actual name string
>>>> that was used to generate the UUID. Until some other useful linked
>>>> information is added to the RDF, there doesn't seem to be much point
>>>> in pointing a semantic client to the URI over just using a string
>>>> literal for the name.
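For readers who have not seen name-derived UUIDs, here is a minimal sketch of how a UUID can be generated deterministically from a name string. It uses version 5 (SHA-1 based) UUIDs and a made-up namespace; the message does not say which UUID version or namespace Global Names actually uses, so both are assumptions.

```python
import uuid

# Hypothetical namespace, invented for this example. The real service's
# namespace (and UUID version) is not stated in this thread.
NAME_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "names.example.org")


def name_uuid(name_string):
    """Derive a version 5 UUID from a name string; same input, same UUID."""
    return uuid.uuid5(NAME_NAMESPACE, name_string)


if __name__ == "__main__":
    print(name_uuid("Quercus alba"))
    print(name_uuid("Quercus alba") == name_uuid("Quercus alba"))
```

Because the UUID is a pure function of the string, anyone holding the name can regenerate it, which is consistent with the observation that the identifier carries no more information than the string literal does.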
>>>> So the bottom line is that of the LSID services for names that I've
>>>> tried so far, only ubio.org seems to have LSIDs for names that are
>>>> unchanging, can work as a proxied URI, and that produce actual useful
>>>> RDF. That's pretty disappointing given the apparently huge amount of
>>>> work that's been put into building these various systems.
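The kind of check described above (proxy an LSID, then see whether anything useful comes back) could be scripted roughly as follows. This sketch assumes the proxies accept the LSID appended directly to their base URL, and it leaves out the HTTP request itself; the example.org LSID is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def proxied(lsid, proxy="http://lsid.tdwg.org/"):
    """Build an HTTP URI for an LSID, assuming a base-plus-LSID proxy scheme."""
    return proxy.rstrip("/") + "/" + lsid


def is_empty_rdf(xml_text):
    """Return True if the response is an rdf:RDF element with no content."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{RDF_NS}}}RDF":
        raise ValueError("response is not an rdf:RDF document")
    return len(root) == 0


if __name__ == "__main__":
    # Hypothetical LSID, for illustration only.
    print(proxied("urn:lsid:example.org:names:12345"))
    empty = f'<rdf:RDF xmlns:rdf="{RDF_NS}"/>'
    print(is_empty_rdf(empty))
```

An empty rdf:RDF element is well-formed XML, so a check like this is needed to tell "resolved but said nothing" apart from "resolved to real metadata".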
>>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>>> Vanderbilt University Dept. of Biological Sciences
>>>> postal mail address:
>>>> VU Station B 351634
>>>> Nashville, TN 37235-1634, U.S.A.
>>>> delivery address:
>>>> 2125 Stevenson Center
>>>> 1161 21st Ave., S.
>>>> Nashville, TN 37235
>>>> office: 2128 Stevenson Center
>>>> phone: (615) 343-4582, fax: (615) 343-6707
>>> Roderic Page
>>> Professor of Taxonomy
>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>> College of Medical, Veterinary and Life Sciences
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QQ, UK
>>> Email: r.page at bio.gla.ac.uk
>>> Tel: +44 141 330 4778
>>> Fax: +44 141 330 2792
>>> AIM: rodpage1962 at aim.com
>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>> Twitter: http://twitter.com/rdmpage
>>> Blog: http://iphylo.blogspot.com
>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707