[tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time
joel sachs
jsachs at csee.umbc.edu
Thu Jan 6 16:31:32 CET 2011
On Thu, 6 Jan 2011, Steve Baskauf wrote:
> A quick comment about ubio.org's LSID resolution: I took a look at the
> complete RDF source and noticed that the namespace declarations for "ubi:"
> and "gla:" don't start with "http://". I'm pretty sure that isn't kosher
> RDF,
It is kosher. Linked Data best practice is to use only http URIs, but RDF
resources can be identified by any type of URI reference [1]. In my wild
idea talk in Montpellier, I advocated incorporating non-http URIs
(specifically LSIDs) into Linked Data. Tim Finin and I later fleshed this
idea out slightly [2]. Backtracking, I'm not sure LSIDs *should* be
part of Linked Data, but I'm certain that they *can* be.
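To make the point concrete, here is a minimal Python sketch (standard library only) that writes an N-Triples statement whose subject is an LSID rather than an http URI. The identifiers are hypothetical placeholders and the literal escaping is deliberately partial. The only thing the serializer checks is that each URI is absolute, meaning it has some scheme; nothing requires that scheme to be http.

```python
from urllib.parse import urlsplit

# Hypothetical identifiers, for illustration only.
LSID = "urn:lsid:example.org:names:12345"
HTTP_URI = "http://example.org/names/12345"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def is_absolute_uri(uri: str) -> bool:
    """True if the string has a scheme followed by something else.

    RDF asks for an absolute URI reference; it does not care whether
    the scheme is http, urn, or anything else.
    """
    scheme = urlsplit(uri).scheme
    return bool(scheme) and len(uri) > len(scheme) + 1


def ntriple(subject: str, predicate: str, label: str) -> str:
    """Serialize one triple with a plain-literal object as an N-Triples line."""
    for uri in (subject, predicate):
        if not is_absolute_uri(uri):
            raise ValueError(f"not an absolute URI: {uri!r}")
    # Escape backslashes, double quotes and newlines (a partial N-Triples escape).
    escaped = (label.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'<{subject}> <{predicate}> "{escaped}" .'


print(ntriple(LSID, RDFS_LABEL, "Quercus alba"))
print(ntriple(HTTP_URI, RDFS_LABEL, "Quercus alba"))
```

Both lines are syntactically valid N-Triples. Whether a Linked Data client can dereference the urn:lsid: subject is a separate question, which is why HTTP proxies for LSIDs exist.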
Joel.
1. http://www.w3.org/TR/rdf-schema/
2. http://ebiquity.umbc.edu/paper/html/id/485/What-Does-it-Mean-for-a-URI-to-Resolve-
> although neither of the RDF validators complained about it (doesn't give
> me much confidence in RDF validators). So much for being overly optimistic
> about there being one source that worked completely. :-)
>
> I should say that I'm not an advocate of LSIDs. I actually don't like them
> at all. I simply investigated them as something that might work in a
> Linked Data context. I should also say that I'm not necessarily an
> advocate of Linked Data - I'm in the "wait and see" camp. I'd like to give
> it a chance but I'm not betting the store on it. I AM an advocate of having
> identifiers that are persistent and globally unique, and in my view both
> Catalog of Life and IPNI flunk the persistence test if you treat the URI
> (LSID with version) as a string that serves as an unchanging identifier
> (which you SHOULD be able to do). We desperately need persistent and
> globally unique identifiers for a lot of things. If they resolve to RDF/XML
> then that's a bonus. We need an Emperor, clothing is somewhat optional.
>
> A few specific responses:
>
> Roderic Page wrote:
>> These issues won't go away simply by replacing LSIDs with HTTP URIs. Some
>> of the linked data sources go belly up fairly regularly (notably
>> http://bio2rdf.org ).
>>
> The reason I see HTTP URIs [capable of producing either HTML or RDF/XML
> through content negotiation] as something of greater value than LSIDs is that
> people can at a minimum get a web page out of it. That's the cake. If some
> people can also use them to get RDF for the Linked Data dream, that's the
> icing. The fact that HTTP URI GUIDs sometimes fail to "work" is no different
> from the fact that regular web URLs sometimes break. That doesn't stop people
> from using the web, and it shouldn't stop well-designed URIs (see
> http://www.w3.org/TR/cooluris/ and http://www.w3.org/Provider/Style/URI)
> from being used as GUIDs.
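Content negotiation, as described here, means one HTTP URI returns HTML to a browser and RDF/XML to a semantic client depending on the request's Accept header. The following is a simplified Python sketch of the server-side choice. The OFFERS list and the tie-breaking rule are assumptions, and it does not implement every detail of HTTP's media-range matching.

```python
from typing import List, Optional, Tuple

# Representations a hypothetical server offers for each URI; on a tie the
# first entry wins, so plain browsers get HTML.
OFFERS = ["text/html", "application/rdf+xml"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def matches(media_range: str, offer: str) -> bool:
    """Does a media range such as */* or text/* cover the offered type?"""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return offer.split("/")[0] == media_range[:-2]
    return media_range == offer


def negotiate(accept: Optional[str]) -> str:
    """Return the offered media type the client rates highest.

    Falls back to HTML when there is no Accept header or nothing matches
    (a stricter server might answer 406 Not Acceptable instead).
    """
    if not accept:
        return OFFERS[0]
    ranges = parse_accept(accept)
    best, best_q = OFFERS[0], 0.0
    for offer in OFFERS:
        # Simplification: take the highest q among matching ranges rather than
        # the most specific matching range that HTTP calls for.
        q = max((rq for rng, rq in ranges if matches(rng, offer)), default=0.0)
        if q > best_q:
            best, best_q = offer, q
    return best
```

For example, a typical browser header such as `text/html,application/xhtml+xml,*/*;q=0.8` yields `text/html`, while an RDF client sending `application/rdf+xml` gets RDF/XML from the same URI.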
>> IPNI supports versioning of LSIDs, but the LSID without the version is
>> also valid (it resolves to the latest version). I think this is a
>> perfectly valid thing to do. Versioning matters to some, but not everyone
>> cares about it, IPNI supports both views. If you want to cite a specific
>> version, do so, if not, don't.
>>
> OK, as long as people can just leave the version off if they want, they'll
> get unchanging strings for their identifiers, so I guess that moves them
> into the "usable" column. But as far as I'm concerned, putting
> the versions on adds to the confusion. If I can say this without launching
> an unnecessary thread about the opacity of GUIDs, nobody is supposed to be
> looking at any GUID itself to infer meaning about the identified object.
> That "no-no" is exactly what Catalog of Life and IPNI seem to be inviting
> people to do with the LSID version numbers they are tacking on.
> Get rid of them.
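The disagreement is easiest to see by treating an LSID as an opaque string. This Python sketch parses the urn:lsid:authority:namespace:objectID[:revision] form and drops the optional revision, which is what a client that doesn't care about versions would store. The two identifiers are hypothetical, written only to resemble IPNI's style.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

    def unversioned(self) -> str:
        """The LSID with any revision dropped: the string to keep as a GUID."""
        return f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"


def parse_lsid(text: str) -> LSID:
    """Parse urn:lsid:authority:namespace:objectID[:revision]."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = (parts[5] or None) if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Two hypothetical revisions of one name record, written in IPNI's LSID style.
v1 = "urn:lsid:ipni.org:names:12345-1:1.1"
v2 = "urn:lsid:ipni.org:names:12345-1:1.2"

print(v1 == v2)  # False: compared as opaque strings, the identifier "changed"
print(parse_lsid(v1).unversioned() == parse_lsid(v2).unversioned())  # True
```

Stripping the revision works only if everyone agrees to do it; a consumer that stores the full string as given will see two different identifiers for the same name.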
>> The data/metadata distinction is a complete red herring and has side-
>> tracked more LSID conversations than I care to remember. The distinction
>> stems from LSIDs originally being conceived as identifiers for large data
>> objects (e.g., sequences, images, binary streams) that people want
>> accurately versioned so that they could reproduce digital experiments. In this
>> scenario, metadata can change because it's not what mattered for this
>> reproducibility. Some people think of data about taxonomic names as
>> "metadata", and then we're off on a wild goose chase about should LSIDs
>> change when metadata changes. I wouldn't lose any sleep over this (see
>> discussion of IPNI above).
>>
> Agreed. The important thing is that the identifier for a certain "thing"
> (however the "thing" is defined) should not change. I have previously
> expressed the opinion that it won't really work to have any URI (sensu Linked
> Data) point to "data" and that all URIs serving as GUIDs should be considered
> to reference non-information resources (which by definition have only
> "metadata"). Under that scenario, all resource properties are "metadata" and
> subject to change. The data vs. metadata argument then becomes irrelevant.
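One way to read the argument that GUIDs should name non-information resources is the "303" pattern from the W3C Cool URIs note: the identifier for the thing never changes and is never served directly, while the documents describing it (HTML or RDF) live at separate URLs and can change freely. This is a minimal Python routing sketch; the host, the /id/, /page/ and /data/ paths, and the crude Accept check are all hypothetical.

```python
from typing import Dict, Optional, Tuple

BASE = "http://example.org"  # hypothetical host


def respond(path: str, accept: Optional[str] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status code, response headers) for a request path.

    /id/<key>   names the thing itself (a non-information resource), so the
                server never returns it directly; it answers 303 See Other.
    /page/<key> and /data/<key> are the documents *about* the thing.
    """
    if path.startswith("/id/"):
        key = path[len("/id/"):]
        # Crude check; a real server would parse q-values in the Accept header.
        if accept and "application/rdf+xml" in accept:
            return 303, {"Location": f"{BASE}/data/{key}"}
        return 303, {"Location": f"{BASE}/page/{key}"}
    if path.startswith("/data/"):
        return 200, {"Content-Type": "application/rdf+xml"}
    if path.startswith("/page/"):
        return 200, {"Content-Type": "text/html"}
    return 404, {}
```

Under this design every property in the RDF document is "metadata" about the thing, so the document can be corrected or extended while the /id/ URI stays fixed.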
>> Regarding DOIs for books, these are uncommon, although there are moves to
>> express ISBNs within the DOI framework, so there may be more of these. But
>> for a DOI to exist someone has to claim ownership of a resource and
>> register the DOI with CrossRef. For a lot of older literature there won't
>> be a publisher around to do this, but that's where BHL comes in.
>>
> Then PLEASE BHL, create simple, unchanging URIs [suitable for use in the
> Linked Data world] to identify your resources. If you don't want to provide
> RDF/XML, that's fine for now. You can wait and see if that is necessary
> later. But at least create URIs that COULD be used for Linked Data in the
> future and that are not long and loaded with "?" and "&" characters. I don't
> know how to do that because I'm not a server dude. But it apparently isn't
> that hard to do with mod_rewrite. I know there are at least three people
> (probably more) on this list who do that routinely - Rod does it with his
> bioguid.info site (e.g. the "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022
> gets rewritten to the "ugly" URL
> http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody cares as
> long as the "cool" URI doesn't change).
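The bioguid.info example amounts to a single pattern-based rewrite. Below, a Python sketch reproduces the mapping given in the message; the Apache RewriteRule in the comment is an illustrative guess, not the configuration bioguid.info actually uses.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Roughly what an Apache rule like
#     RewriteRule ^/?(doi:.+)$ /openurl.php?id=$1 [L]
# would do internally. The rule is illustrative, not bioguid.info's actual
# configuration.
COOL_PATH = re.compile(r"^/(doi:.+)$")


def rewrite(cool_uri: str) -> str:
    """Map a "cool" URI onto the script URL that actually serves it."""
    parts = urlsplit(cool_uri)
    match = COOL_PATH.match(parts.path)
    if match is None:
        return cool_uri  # not a pattern we rewrite; serve as-is
    return urlunsplit((parts.scheme, parts.netloc, "/openurl.php",
                       "id=" + match.group(1), ""))


print(rewrite("http://bioguid.info/doi:10.1093/bib/bbn022"))
```

Because the rewrite is internal, clients only ever see and store the cool URI; the script name and query string behind it can change without breaking anyone's identifiers.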
> My two cents worth...
> Steve
>> In summary, we're in a mess, and I don't think this is really down to
>> technology. It's a failure of our community to create the appropriate
>> resources (e.g., centralised, curated resources of identifiers and
>> associated metadata for names, publications, and specimens).
>>
>> Regards
>>
>> Rod
>>
>>
>>
>> On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
>>
>>
>>> Well, I have continued my quest for resolvable, RDF-producing GUIDs for
>>> taxon/name-related stuff. I have gotten a lot of good information from
>>> reading Rod Page's BMC Bioinformatics paper
>>> (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating
>>> his http://bioguid.info/ site.
>>>
>>> From the standpoint of the "sec./sensu" part of a TNU/taxon concept,
>>> based on the recent discussion, it sounds like the DOI solution for
>>> publications is a good direction to go IF resolution services producing
>>> RDF come into existence and IF it becomes possible to actually search
>>> for the DOIs of more obscure publications. I tried using Rod's site to
>>> look up a journal article by ISSN, volume, and page, and the web
>>> interface found the DOI and generated RDF just fine. However, an
>>> attempt to use the web to find the DOI of Gleason and Cronquist's Manual
>>> of vascular plants of Northeastern United States and adjacent Canada
>>> failed despite a half hour of effort (I found the UPC, the LOC call
>>> number, and the ISBN, but no DOI). Maybe there just isn't a DOI for it
>>> but there should be a way for me to know that. So DOIs for books and
>>> old journal articles are not really ready for prime time.
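Turning a DOI into a resolvable HTTP URI is purely mechanical, as the dx.doi.org link to Rod's paper shows. This Python sketch normalizes a few common written forms; the regular expression is only a loose approximation of DOI syntax. Note what it cannot do: a well-formed string says nothing about whether the DOI is actually registered, which is the gap encountered with the Gleason and Cronquist manual.

```python
import re
from urllib.parse import quote

# Loose approximation of DOI shape: "10." + registrant code, "/", then a suffix.
DOI_RE = re.compile(r"^10\.\d+(\.\d+)*/\S+$")
PREFIXES = ("doi:", "http://dx.doi.org/", "https://dx.doi.org/",
            "http://doi.org/", "https://doi.org/")


def normalize_doi(text: str) -> str:
    """Strip common prefixes and check the remainder looks like a DOI."""
    doi = text.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_RE.match(doi):
        raise ValueError(f"does not look like a DOI: {text!r}")
    return doi


def resolver_url(text: str) -> str:
    """HTTP URI for a DOI via the dx.doi.org proxy used in the message."""
    return "http://dx.doi.org/" + quote(normalize_doi(text), safe="/")
```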
>>>
>>> From the standpoint of the "scientific name" part of a TNU/taxon
>>> concept, I had better luck (sort of). Rod's "Status of biodiversity
>>> services" page (http://www.bioguid.info/status/) was really cool. I saw
>>> resources I hadn't known about before. I tried out several of the
>>> services that claimed to issue LSIDs.
>>>
>>> Catalog of Life's LSIDs didn't work with either the
>>> http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with either a
>>> web browser or the OpenLink RDF browser. I only got an empty RDF
>>> element in response.
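The "empty RDF element" symptom is easy to detect automatically. This Python sketch builds a proxy URL for an LSID and checks whether an RDF/XML response body contains any statements. It assumes the proxy accepts the full LSID appended to its base URL, and it does not fetch anything itself; the body has to come from whatever HTTP client you use.

```python
import xml.etree.ElementTree as ET
from typing import Union

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def proxy_url(lsid: str, proxy: str = "http://lsid.tdwg.org/") -> str:
    """HTTP proxy URL for an LSID; assumes the proxy takes the LSID as its path."""
    return proxy.rstrip("/") + "/" + lsid


def is_empty_rdf(body: Union[str, bytes]) -> bool:
    """True when an RDF/XML response is a bare rdf:RDF element with no statements."""
    root = ET.fromstring(body)
    if root.tag != f"{{{RDF_NS}}}RDF":
        raise ValueError("response is not RDF/XML")
    return len(root) == 0  # number of child elements, i.e. resource descriptions
```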
>>>
>>> Index Fungorum was down.
>>>
>>> IPNI seemed to work. However, I was somewhat appalled to observe that
>>> they seem to change the revision identifier any time that they change
>>> any part of the metadata. That renders the LSID useless as a permanent
>>> GUID for the name and, I believe, is inconsistent with the design of LSIDs
>>> where the revision is only supposed to change if the underlying data
>>> itself (NOT metadata) changes. (Catalog of Life says that they change
>>> the revision identifier EACH YEAR for all of their records! That's even
>>> worse!) If I remember the TDWG LSID recommendations correctly, they do not
>>> even recommend using the revision part of an LSID at all in the
>>> biodiversity informatics context.
>>>
>>> ubio.org's LSIDs seemed to work properly.
>>>
>>> [sorry - didn't try zoobank since I was looking for plants]
>>>
>>> I don't know which (if any) of the Web sites listed on Rod's status page
>>> use generic HTTP URI GUIDs (rather than LSIDs) to refer to taxon names.
>>> I tried out the Global Names Index that Pete was mentioning. The URI
>>> versions of the UUIDs (e.g.
>>> http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f)
>>> do resolve under content negotiation, but the only useful information
>>> that the RDF representation seems to provide is the actual name string
>>> that was used to generate the UUID. Until some other useful linked
>>> information is added to the RDF, there doesn't seem to be much advantage
>>> in pointing a semantic client to the URI over just using a string
>>> literal for the name.
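The example GNI UUID has a 5 in its version position, which under RFC 4122 marks a name-based UUID: a SHA-1 hash of a namespace plus the name string. That fits the observation that the RDF gives back little more than the string itself, since the UUID carries no other information. In this Python sketch the namespace value is an assumption and is not claimed to reproduce GNI's actual identifiers.

```python
import uuid

# Assumption: a namespace derived from "globalnames.org"; verify against GNI's
# own documentation before relying on it.
GN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")
GNI_BASE = "http://gni.globalnames.org/name_strings/"


def name_string_uri(name: str) -> str:
    """Deterministic URI derived only from the name string."""
    return GNI_BASE + str(uuid.uuid5(GN_NAMESPACE, name))


example = uuid.UUID("3a70f04d-fd29-5570-ba91-52dae0c3d07f")  # from the message
print(example.version)  # 5: a name-based (SHA-1) UUID
print(name_string_uri("Quercus alba") == name_string_uri("Quercus alba"))  # True
```

The determinism is the useful property (anyone can compute the same UUID from the same string), but it also means the URI adds nothing a client couldn't get from the string literal until other linked data is attached to it.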
>>>
>>> So the bottom line is that of the LSID services for names that I've
>>> tried so far, only ubio.org seems to have LSIDs for names that are
>>> unchanging, work as proxied URIs, and produce genuinely useful
>>> RDF. That's pretty disappointing given the apparently huge amount of
>>> work that's been put into building these various systems.
>>>
>>> Steve
>>>
>>> --
>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>> Vanderbilt University Dept. of Biological Sciences
>>>
>>> postal mail address:
>>> VU Station B 351634
>>> Nashville, TN 37235-1634, U.S.A.
>>>
>>> delivery address:
>>> 2125 Stevenson Center
>>> 1161 21st Ave., S.
>>> Nashville, TN 37235
>>>
>>> office: 2128 Stevenson Center
>>> phone: (615) 343-4582, fax: (615) 343-6707
>>> http://bioimages.vanderbilt.edu
>>>
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>
>>>
>>
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> Institute of Biodiversity, Animal Health and Comparative Medicine
>> College of Medical, Veterinary and Life Sciences
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>>
>> Email: r.page at bio.gla.ac.uk
>> Tel: +44 141 330 4778
>> Fax: +44 141 330 2792
>> AIM: rodpage1962 at aim.com
>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>> Twitter: http://twitter.com/rdmpage
>> Blog: http://iphylo.blogspot.com
>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html