most GUIDs/URIs for names/taxon stuff not ready for prime time
Well, I have continued my quest for resolvable, RDF-producing GUIDs for taxon/name-related stuff. I have gotten a lot of good information from reading Rod Page's BMC Bioinformatics paper (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating his http://bioguid.info/ site.
From the standpoint of the "sec./sensu" part of a TNU/taxon concept, based on the recent discussion, it sounds like the DOI solution for publications is a good direction to go IF resolution services producing RDF come into existence and IF it becomes possible to actually search for the DOIs of more obscure publications. I tried using Rod's site to look up a journal article using the ISSN, volume, and page and the web interface found the DOI and generated RDF just fine. However, an attempt to use the web to find the DOI of Gleason and Cronquist's Manual of vascular plants of Northeastern United States and adjacent Canada failed despite a half hour of effort (I found the UPC, the LOC call number, and the ISBN, but no DOI). Maybe there just isn't a DOI for it but there should be a way for me to know that. So DOIs for books and old journal articles are not really ready for prime time.
From the standpoint of the "scientific name" part of a TNU/taxon concept, I had better luck (sort of). Rod's "Status of biodiversity services" page (http://www.bioguid.info/status/) was really cool. I saw resources I hadn't known about before. I tried out several of the services that claimed to issue LSIDs.
Catalog of Life's LSIDs didn't work with either the http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with either a web browser or the OpenLink RDF browser. I only got an empty RDF element in response.
Index Fungorum was down.
IPNI seemed to work. However, I was somewhat appalled to observe that they seem to change the revision identifier any time that they change any part of the metadata. That renders the LSID useless as a permanent GUID for the name and I believe is inconsistent with the design of LSIDs where the revision is only supposed to change if the underlying data itself (NOT metadata) changes. (Catalog of Life says that they change the revision identifier EACH YEAR for all of their records! That's even worse!) If I'm remembering the TDWG LSID recommendations, it is not even recommended to use the revision part of an LSID at all in the biodiversity informatics context.
ubio.org's LSIDs seemed to work properly.
[sorry - didn't try zoobank since I was looking for plants]
I don't know which (if any) of the Web sites listed on Rod's status page use generic HTTP URI guids (rather than LSIDs) to refer to taxon names. I tried out the Global Names index that Pete was mentioning. The URI version of the UUIDs (e.g. http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f) do resolve under content negotiation, but the only useful information that the RDF representation seems to provide is the actual name string that was used to generate the UUID. Until some other useful linked information is added to the RDF, there doesn't seem to be much advantage in pointing a semantic client to the URI over just using a string literal for the name.
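For anyone who wants to repeat the test, here is a minimal sketch of what "resolving under content negotiation" means on the client side. It only builds the request; the function name rdf_request is made up for illustration, and whether the GNI server still answers this URI with RDF/XML is an assumption, so the actual network call is left as a comment.

```python
from urllib.request import Request

# The GNI URI quoted above; whether it still resolves is not guaranteed.
GNI_URI = (
    "http://gni.globalnames.org/name_strings/"
    "3a70f04d-fd29-5570-ba91-52dae0c3d07f"
)


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF/XML.

    Content negotiation happens through the Accept header: the same URI
    can return HTML to a browser and RDF/XML to a semantic client.
    """
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request(GNI_URI)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# To actually resolve the URI one would call, for example:
#   from urllib.request import urlopen
#   with urlopen(req) as response:
#       rdf_xml = response.read().decode("utf-8")
```

A browser sending "Accept: text/html" to the same URI would be expected to get a web page instead, which is the whole point of using one identifier for both audiences.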
So the bottom line is that of the LSID services for names that I've tried so far, only ubio.org seems to have LSIDs for names that are unchanging, can work as a proxied URI, and that produce actual useful RDF. That's pretty disappointing given the apparently huge amount of work that's been put into building these various systems.
Steve
IPNI seemed to work. However, I was somewhat appalled to observe that they seem to change the revision identifier any time that they change any part of the metadata. That renders the LSID useless as a permanent GUID for the name and I believe is inconsistent with the design of LSIDs where the revision is only supposed to change if the underlying data itself (NOT metadata) changes.
It is my understanding that this is exactly how LSID's are supposed to work and one reason I don't like them.
[sorry - didn't try zoobank since I was looking for plants]
I have tried ZooBank and those LSID's seem to work.
I don't know which (if any) of the Web sites listed on Rod's status page use generic HTTP URI guids (rather than LSIDs) to refer to taxon names. I tried out the Global Names index that Pete was mentioning. The URI version of the UUIDs (e.g. http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f) do resolve under content negotiation, but the only useful information that the RDF representation seems to provide is the actual name string that was used to generate the UUID. Until some other useful linked information is added to the RDF, there doesn't seem to be much advantage in pointing a semantic client to the URI over just using a string literal for the name.
Yes, this is one reason this part of the GNI is still experimental, but the idea is that these can be further developed to expose RDF that
1) Indicates what contributing databases contain that namestring
2) Links to the resolution group which contains all the variations of that name string (It might make the most sense to have the linkto's there)
3) These will then be tied to various types of TaxonConcepts, TNU's etc.
I can't speak for the GNI or GNA but my understanding is that the initial goal is to get all the names used into one very large repository.
So the bottom line is that of the LSID services for names that I've tried so far, only ubio.org seems to have LSIDs for names that are unchanging, can work as a proxied URI, and that produce actual useful RDF. That's pretty disappointing given the apparently huge amount of work that's been put into building these various systems.
I think that part of the problem is that LSID's are really only used and advocated by a small group within the larger informatics community.
They do not benefit from all the related tools and code that already exists for the general semantic web and linked data community.
In addition, some of the problems you experience are the result of the design of LSID's - to add versioning they lost stability.
The assumption is that you will write the tools that understand that LSIDs are different from LOD identifiers and process them accordingly.
How has that assumption held up?
Respectfully,
- Pete
--------------------------------------------------------------- Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies Knowledge Base http://lod.geospecies.org/ About the GeoSpecies Knowledge Base http://about.geospecies.org/ ------------------------------------------------------------
Peter DeVries wrote:
that was used to generate the UUID. Until some other useful linked information is added to the RDF, there doesn't seem to be much advantage in pointing a semantic client to the URI over just using a string literal for the name.
Yes, this is one reason this part of the GNI is still experimental, but the idea is that these can be further developed to expose RDF that
1) Indicates what contributing databases contain that namestring
2) Links to the resolution group which contains all the variations of that name string (It might make the most sense to have the linkto's there)
3) These will then be tied to various types of TaxonConcepts, TNU's etc.
I can't speak for the GNI or GNA but my understanding is that the initial goal is to get all the names used into one very large repository.
Good luck, I'll stay tuned. But I stick with my assessment of "not currently ready for prime time".
Steve
On 06/01/2011, at 6:16 PM, Peter DeVries wrote:
IPNI seemed to work. However, I was somewhat appalled to observe that they seem to change the revision identifier any time that they change any part of the metadata. That renders the LSID useless as a permanent GUID for the name and I believe is inconsistent with the design of LSIDs where the revision is only supposed to change if the underlying data itself (NOT metadata) changes.
It is my understanding that this is exactly how LSID's are supposed to work and one reason I don't like them.
My understanding is that this is not the case, and I'd suggest that it is possible that the approach taken by this resolution service may be not entirely correct. The specification mandates "It is allowed to have an LSID representing [I'd have preferred 'identifying'] abstract entities or concepts. If an LSID represents real data, the LSID resolution service must resolve always the same set [I'd have preferred the word 'sequence'] of bytes representing the data".
To make sense of this, we must discuss the distinction between 'data' and 'metadata' in this context. This distinction is the same as the linked data "information resource"/"other resource" distinction and is old news for a lot of people here.
A linked data URI or an LSID may serve as the name of a real-world thing: a specimen, Fluffy the Tiger at London Zoo, the idea of 'Fabaceae' expressed in Mabberley's Plant Book Edn 3. In those cases, we cannot stuff the actual real-world thing down the fibre-optic pipe: it either won't fit (in the case of Fluffy the tiger) or is rather too abstract (in the case of a taxonomic name).
However, we do have an enormous amount of information *about* the things identified by that LSID or URI, and we can serve it up as RDF. We might provide you with Fluffy's weight and age, we might provide you the name's parts, some accepted representations of it, and the id of its publication.
For things like this, there is no requirement that the result you get back be byte-for-byte identical. As Peter points out - that is actually a bit pointless.
On the other hand, we may have documents and media which most certainly can be stuffed down the pipe: pdfs, audio clips, what have you. These things are LSID "data", linked data "information resources". The requirement in the world of LSIDs is that the data must always be byte-for-byte identical, and that's where version numbers come into play.
The point of confusion is that the RDF metadata is also "stuff that can be put down the pipe" - you could understand it as data rather than metadata if you chose. The crucial point is that urn:lsid:zoo.uk:individual:Fluffy is not the name of some particular chunk of RDF, it is the name of Fluffy over in that cage there. It's obvious in the case of Fluffy, less obvious in the case of "Australhypopus flagellifer Fain & Friend, 1984", but the distinction holds. A name, or a taxon, is not the same thing as a chunk of RDF describing it. The spec does not at all mandate that that description - the metadata - be static.
The LSID version identifier seems to me a way of mitigating the "data must be static" problem, of handling it when people update an image or a pdf document that is named with an lsid. For example: the zoo might formally publish an infectious materials handling policy that gets updated from time to time, and wants to have an LSID referring not simply to "the policy", but to the particular PDF document whose publishing is an important act by the zoo. The version mechanism allows you to have a persistent lsid for "the current policy" or perhaps "the collection of these important policy documents", while also allowing you to have a different LSID for the pdf document promulgated 1998. The first lsid has no data - it refers to an abstract thing - but its metadata will indicate what versioned LSID is the current one.
(Without using LSID version numbers, another solution for this is to make use of namespaces: zoo.documentseries and zoo.pdfs, for instance. The point of lsid version numbers is that you can see a relationship between the pdf and the series it is part of by looking at the lsid itself. http URIs can be structured as deeply as you like and the problem does not arise.)
This might be a sensible way for IPNI to have its cake and eat it too if the goal of versioning is to keep all of the old versions available. But if the version business at the IPNI resolution service is simply - I hesitate to suggest it - a misunderstanding of the spec, then perhaps it ought to be fixed.
Dear Steve,
Yep, the Emperor has no clothes.
I don't think the failure of LSIDs is completely down to LSID-specific issues. HTTP URIs in the linked data sense have their own issues, and if you follow the technical discussions every so often people get very agitated about the (in)famous HTTP 303 redirect solution to information versus non-information resources. DOIs are also technically non-trivial to implement.
I suspect the failure of LSIDs is due to a combination of issues:
1. Most biodiversity data providers are small, poorly resourced operations that can't guarantee 24/7 operations
2. Nobody provides central support/monitoring of LSIDs (i.e., there is no CrossRef equivalent where you can report a broken LSID and have a reasonable expectation that someone will fix it).
3. The community as a whole doesn't value unique identifiers (if they did, we wouldn't be in this mess)
4. There are no LSID consumers. The scientific publishing industry's linking infrastructure depends on DOIs, publishers use CrossRef services as part of their publishing workflow. There is no equivalent for our field (another way of restating #3).
These issues won't go away simply by replacing LSIDs with HTTP URIs. Some of the linked data sources go belly up fairly regularly (notably http://bio2rdf.org ).
Regarding specifics, Catalogue of Life LSIDs have been broken for over a year (see http://iphylo.blogspot.com/2010/01/thoughts-on-international-year-of.html ). Nobody seems to care (see #3 above).
IPNI supports versioning of LSIDs, but the LSID without the version is also valid (it resolves to the latest version). I think this is a perfectly valid thing to do. Versioning matters to some, but not everyone cares about it, IPNI supports both views. If you want to cite a specific version, do so, if not, don't.
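To make the "cite it with or without the version" point concrete, here is a small sketch that splits an LSID into its parts and drops the optional revision. It assumes the usual urn:lsid:authority:namespace:object[:revision] layout; the example.org identifier and the names parse_lsid and without_revision are made up for illustration, and this is not how IPNI or any resolver actually implements it.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no version


def parse_lsid(lsid: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {lsid!r}")
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    if any(p == "" for p in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {lsid!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def without_revision(lsid: str) -> str:
    """Return the unversioned form, usable as an unchanging identifier string."""
    p = parse_lsid(lsid)
    return f"urn:lsid:{p.authority}:{p.namespace}:{p.object_id}"


# Hypothetical versioned LSID (not a real record).
print(without_revision("urn:lsid:example.org:names:12345:1.2"))
# An LSID with no revision part comes back unchanged.
print(without_revision("urn:lsid:zoo.uk:individual:Fluffy"))
```

Anyone who treats the identifier as an opaque string would store only the output of without_revision and leave the versioned form to those who need to cite a specific revision.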
Index Fungorum's web services are broken (see #1 above).
The data/metadata distinction is a complete red herring and has sidetracked more LSID conversations than I care to remember. The distinction stems from LSIDs originally being conceived as identifiers for large data objects (e.g., sequences, images, binary streams) that people want accurately versioned so that they could reproduce digital experiments. In this scenario, metadata can change because it's not what mattered for this reproducibility. Some people think of data about taxonomic names as "metadata", and then we're off on a wild goose chase about whether LSIDs should change when metadata changes. I wouldn't lose any sleep over this (see discussion of IPNI above).
uBio is one of the few success stories of biodiversity informatics (props to Dave Remsen and David Patterson for setting this up, and people like Anthony Goddard for keeping it running). They've had LSIDs running since about 2005, and pretty much the only reason I keep my LSID server running is because they use some vocabulary terms I coined way back when.
Regarding DOIs for books, these are uncommon, although there are moves to express ISBNs within the DOI framework, so there may be more of these. But for a DOI to exist someone has to claim ownership of a resource and register the DOI with CrossRef. For a lot of older literature there won't be a publisher around to do this, but that's where BHL comes in.
In summary, we're in a mess, and I don't think this is really down to technology. It's a failure of our community to create the appropriate resources (e.g., centralised, curated resources of identifiers and associated metadata for names, publications, and specimens).
Regards
Rod
On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
A quick comment about ubio.org's LSID resolution: I took a look at the complete RDF source and noticed that the namespace declarations for "ubi:" and "gla:" don't start with "http://". I'm pretty sure that isn't kosher RDF, although neither of the RDF validators complained about it (doesn't give me much confidence in RDF validators). So much for being overly optimistic about there being one source that worked completely. :-)
I should say that I'm not an advocate of LSIDs. I actually don't like them at all. I simply investigated them in this context as something that might work in a Linked Data context. I should also say that I'm not necessarily an advocate of Linked Data - I'm in the "wait and see" camp. I'd like to give it a chance but I'm not betting the store on it. I AM an advocate of having identifiers that are persistent and globally unique and it appears that both Catalog of Life and IPNI (in my view) flunk the persistence test if you just look at the URI (LSID with version) as a string (which you SHOULD be able to do) that can be an unchanging identifier. We desperately need persistent and globally unique identifiers for a lot of things. If they resolve to RDF/XML then that's a bonus. We need an Emperor, clothing is somewhat optional.
A few specific responses:
Roderic Page wrote:
These issues won't go away simply by replacing LSIDs with HTTP URIs. Some of the linked data sources go belly up fairly regularly (notably http://bio2rdf.org ).
The reason I see HTTP URIs [capable of producing either HTML or RDF/XML through content negotiation] as something of greater value than LSIDs is that people can at a minimum get a web page out of it. That's the cake. If some people can also use them to get RDF for the Linked Data dream, that's the icing. The fact that HTTP URI GUIDs fail to "work" is similar to the fact that regular web URLs fail to work. That doesn't stop people from using the web and it shouldn't stop well designed (i.e. http://www.w3.org/TR/cooluris/ and http://www.w3.org/Provider/Style/URI) URIs from being used as GUIDs.
IPNI supports versioning of LSIDs, but the LSID without the version is also valid (it resolves to the latest version). I think this is a perfectly valid thing to do. Versioning matters to some, but not everyone cares about it, IPNI supports both views. If you want to cite a specific version, do so, if not, don't.
OK, I guess as long as people can just leave the version off if they want, they'll get unchanging strings for their identifiers and so I guess that moves them into the "usable" column. But as far as I'm concerned, putting the versions on adds to the confusion. If I can say this without launching an unnecessary thread about the opacity of GUIDs, nobody is supposed to be looking at any GUID itself to infer meaning about the identified object. That "no-no" seems to be exactly what Catalog of Life and IPNI seem to be suggesting that people do with the LSID version numbers they are tacking on. Get rid of them.
The data/metadata distinction is a complete red herring and has sidetracked more LSID conversations than I care to remember. The distinction stems from LSIDs originally being conceived as identifiers for large data objects (e.g., sequences, images, binary streams) that people want accurately versioned so that they could reproduce digital experiments. In this scenario, metadata can change because it's not what mattered for this reproducibility. Some people think of data about taxonomic names as "metadata", and then we're off on a wild goose chase about whether LSIDs should change when metadata changes. I wouldn't lose any sleep over this (see discussion of IPNI above).
Agreed. The important thing is that the identifier for a certain "thing" (however the "thing" is defined) should not change. I have previously expressed the opinion that it won't really work to have any URI (sensu Linked Data) point to "data" and that all URIs serving as GUIDs should be considered to reference non-information resources (which by definition have only "metadata"). Under that scenario, all resource properties are "metadata" and subject to change. The data vs. metadata argument then becomes irrelevant.
Regarding DOIs for books, these are uncommon, although there are moves to express ISBNs within the DOI framework, so there may be more of these. But for a DOI to exist someone has to claim ownership of a resource and register the DOI with CrossRef. For a lot of older literature there won't be a publisher around to do this, but that's where BHL comes in.
Then PLEASE BHL, create simple, unchanging URIs [suitable for use in the Linked Data world] to identify your resources. If you don't want to provide RDF/XML, that's fine for now. You can wait and see if that is necessary later. But at least create URIs that COULD be used for Linked Data in the future and that are not long and loaded with "?" and "&" characters. I don't know how to do that because I'm not a server dude. But it apparently isn't that hard to do with a mod rewrite. I know there are at least three people (probably more) on this list who do that routinely - Rod does it with his bioguid.info site (i.e. "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 gets written into "ugly" URL http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody cares as long as the "cool" URI doesn't change).
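The rewrite idea can be sketched in a few lines. This is only an illustration in Python of the mapping described above, not the actual server configuration used at bioguid.info (which I haven't seen); the function name and the pattern are assumptions.

```python
import re

# Assumed pattern: a "cool" path that is just "/doi:" plus the DOI itself.
COOL_PATH = re.compile(r"^/(doi:.+)$")


def rewrite(path: str) -> str:
    """Map a short, stable path onto the script that really serves it.

    Paths that do not match the pattern are passed through untouched.
    """
    match = COOL_PATH.match(path)
    if match is None:
        return path
    return f"/openurl.php?id={match.group(1)}"


print(rewrite("/doi:10.1093/bib/bbn022"))
print(rewrite("/status/"))
```

The public identifier never contains the "?" and script name, so the script behind it can be replaced later without breaking anything that cites the short form.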
My two cents worth... Steve
On Thu, 6 Jan 2011, Steve Baskauf wrote:
A quick comment about ubio.org's LSID resolution: I took a look at the complete RDF source and noticed that the namespace declarations for "ubi:" and "gla:" don't start with "http://". I'm pretty sure that isn't kosher RDF,
It is kosher. Linked Data best practice is to use only http URIs, but RDF resources can be identified by any type of URI reference [1]. In my wild idea talk in Montpellier, I advocated incorporating non-http URIs (specifically LSIDs) into Linked Data. Tim Finin and I later fleshed this idea out slightly [2]. Backtracking, I'm not sure lsids *should* be part of Linked Data, but I'm certain that they *can* be.
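As a rough illustration, here is a sketch showing that an RDF/XML document whose predicate namespace is a urn: URI parses like any other, and that the predicate URI is simply the namespace plus the local name. The example.org namespace and the sample record are invented for this sketch; they are not uBio's actual declarations.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical non-http namespace, made up for this sketch.
EX_NS = "urn:lsid:example.org:predicates:"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="urn:lsid:example.org:names:12345">
    <ex:canonicalName>Quercus alba</ex:canonicalName>
  </rdf:Description>
</rdf:RDF>"""


def predicate_uris(rdf_xml: str) -> list:
    """List the full predicate URIs used in simple rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        for prop in description:
            # ElementTree stores tags as "{namespace}localname".
            namespace, _, local = prop.tag[1:].partition("}")
            uris.append(namespace + local)
    return uris


print(predicate_uris(SAMPLE))
```

This only shows that the syntax is acceptable; whether a Linked Data client can do anything useful with a urn: predicate is the separate question raised above.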
Joel.
[1] http://www.w3.org/TR/rdf-schema/
[2] http://ebiquity.umbc.edu/paper/html/id/485/What-Does-it-Mean-for-a-URI-to-Re...
The data/metadata distinction is a complete red herring and has sidetracked more LSID conversations than I care to remember. The distinction stems from LSIDs originally being conceived as identifiers for large data objects (e.g., sequences, images, binary streams) that people want accurately versioned so that they could reproduce digital experiments. In this scenario, metadata can change because it's not what mattered for this reproducibility. Some people think of data about taxonomic names as "metadata", and then we're off on a wild goose chase about whether LSIDs should change when metadata changes. I wouldn't lose any sleep over this (see discussion of IPNI above).
Agreed. The important thing is that the identifier for a certain "thing" (however the "thing" is defined) should not change. I have previously expressed the opinion that it won't really work to have any URI (sensu Linked Data) point to "data" and that all URIs serving as GUIDs should be considered to reference non-information resources (which by definition have only "metadata"). Under that scenario, all resource properties are "metadata" and subject to change. The data vs. metadata argument then becomes irrelevant.
Regarding DOIs for books, these are uncommon, although there are moves to express ISBNs within the DOI framework, so there may be more of these. But for a DOI to exist someone has to claim ownership of a resource and register the DOI with CrossRef. For a lot of older literature there won't be a publisher around to do this, but that's where BHL comes in.
Then PLEASE BHL, create simple, unchanging URIs [suitable for use in the Linked Data world] to identify your resources. If you don't want to provide RDF/XML, that's fine for now. You can wait and see if that is necessary later. But at least create URIs that COULD be used for Linked Data in the future and that are not long and loaded with "?" and "&" characters. I don't know how to do that because I'm not a server dude. But it apparently isn't that hard to do with mod_rewrite. I know there are at least three people (probably more) on this list who do that routinely - Rod does it with his bioguid.info site (e.g. the "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 gets rewritten into the "ugly" URL http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody cares as long as the "cool" URI doesn't change).
My two cents worth... Steve
In summary, we're in a mess, and I don't think this is really down to technology. It's a failure of our community to create the appropriate resources (e.g., centralised, curated resources of identifiers and associated metadata for names, publications, and specimens).
Regards
Rod
On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
Well, I have continued my quest for resolvable, RDF-producing GUIDs for taxon/name-related stuff. I have gotten a lot of good information from reading Rod Page's BMC Bioinformatics paper (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating his http://bioguid.info/ site.
From the standpoint of the "sec./sensu" part of a TNU/taxon concept, based on the recent discussion, it sounds like the DOI solution for publications is a good direction to go IF resolution services producing RDF come into existence and IF it becomes possible to actually search for the DOIs of more obscure publications. I tried using Rod's site to look up a journal article using the ISSN, volume, and page and the web interface found the DOI and generated RDF just fine. However, an attempt to use the web to find the DOI of Gleason and Cronquist's Manual of vascular plants of Northeastern United States and adjacent Canada failed despite a half hour of effort (I found the UPC, the LOC call number, and the ISBN, but no DOI). Maybe there just isn't a DOI for it but there should be a way for me to know that. So DOIs for books and old journal articles are not really ready for prime time.
From the standpoint of the "scientific name" part of a TNU/taxon concept, I had better luck (sort of). Rod's "Status of biodiversity services" page (http://www.bioguid.info/status/) was really cool. I saw resources I hadn't known about before. I tried out several of the services that claimed to issue LSIDs.
Catalog of Life's LSIDs didn't work with either the http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with either a web browser or the OpenLink RDF browser. I only got an empty RDF element in response.
Index Fungorum was down.
IPNI seemed to work. However, I was somewhat appalled to observe that they seem to change the revision identifier any time that they change any part of the metadata. That renders the LSID useless as a permanent GUID for the name and I believe is inconsistent with the design of LSIDs where the revision is only supposed to change if the underlying data itself (NOT metadata) changes. (Catalog of Life says that they change the revision identifier EACH YEAR for all of their records! That's even worse!) If I'm remembering the TDWG LSID recommendations, it is not even recommended to use the revision part of an LSID at all in the biodiversity informatics context.
ubio.org's LSIDs seemed to work properly.
[sorry - didn't try zoobank since I was looking for plants]
I don't know which (if any) of the Web sites listed on Rod's status page use generic HTTP URI GUIDs (rather than LSIDs) to refer to taxon names. I tried out the Global Names Index that Pete was mentioning. The URI versions of the UUIDs (e.g. http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f) do resolve under content negotiation, but the only useful information that the RDF representation seems to provide is the actual name string that was used to generate the UUID. Until some other useful linked information is added to the RDF, there doesn't seem to be much advantage in pointing a semantic client to the URI over just using a string literal for the name.
So the bottom line is that of the LSID services for names that I've tried so far, only ubio.org seems to have LSIDs for names that are unchanging, can work as a proxied URI, and that produce actual useful RDF. That's pretty disappointing given the apparently huge amount of work that's been put into building these various systems.
Steve
Dear Steve,
On 6 Jan 2011, at 15:10, Steve Baskauf wrote:
A quick comment about ubio.org's LSID resolution: I took a look at the complete RDF source and noticed that the namespace declarations for "ubi:" and "gla:" don't start with "http://". I'm pretty sure that isn't kosher RDF, although neither of the RDF validators complained about it (doesn't give me much confidence in RDF validators). So much for being overly optimistic about there being one source that worked completely. :-)
I don't think RDF namespaces need to be HTTP URIs. There is an expectation by most RDF clients that namespaces are resolvable via HTTP URIs and will return RDF, but this need not be true (some well known namespaces such as http://search.yahoo.com/mrss/ and http://www.georss.org/georss/ don't resolve to RDF). So, AFAIK LSIDs are perfectly valid, if "unfriendly".
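As a rough illustration of the point that a namespace name need not be an HTTP URI, the sketch below parses a small RDF/XML fragment whose property namespace is an LSID-style URN. The `ex:` prefix, the `urn:lsid:example.org:predicates:` namespace, the `canonicalName` property, and the subject LSID are all made up for the example; they are not uBio's actual declarations. It only shows that a namespace-aware XML parser accepts such a document, not that every RDF client will do something useful with it.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical non-HTTP namespace, standing in for declarations like "ubi:" or "gla:".
EX_NS = "urn:lsid:example.org:predicates:"

# Made-up record: subject, property, and value are illustrative only.
RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="urn:lsid:example.org:names:12345">
    <ex:canonicalName>Quercus alba</ex:canonicalName>
  </rdf:Description>
</rdf:RDF>
"""


def extract_properties(rdf_xml):
    """Return (subject, predicate URI, literal) tuples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores tags as "{namespace}local"; RDF/XML forms the
            # predicate URI by concatenating the namespace and the local name.
            namespace, local = prop.tag[1:].split("}", 1)
            triples.append((subject, namespace + local, prop.text))
    return triples


triples = extract_properties(RDF_XML)
```

The predicate that comes out is itself a `urn:lsid:` string rather than an `http://` one, which is the "valid, if unfriendly" situation described above.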
I should say that I'm not an advocate of LSIDs. I actually don't like them at all. I simply investigated them in this context as something that might work in a Linked Data context. I should also say that I'm not necessarily an advocate of Linked Data - I'm in the "wait and see" camp. I'd like to give it a chance but I'm not betting the store on it. I AM an advocate of having identifiers that are persistent and globally unique and it appears that both Catalog of Life and IPNI (in my view) flunk the persistence test if you just look at the URI (LSID with version) as a string (which you SHOULD be able to do) that can be an unchanging identifier. We desperately need persistent and globally unique identifiers for a lot of things. If they resolve to RDF/XML then that's a bonus. We need an Emperor, clothing is somewhat optional.
A few specific responses:
Roderic Page wrote:
These issues won't go away simply by replacing LSIDs with HTTP URIs. Some of the linked data sources go belly up fairly regularly (notably http://bio2rdf.org ).
The reason I see HTTP URIs [capable of producing either HTML or RDF/XML through content negotiation] as something of greater value than LSIDs is that people can at a minimum get a web page out of it. That's the cake. If some people can also use them to get RDF for the Linked Data dream, that's the icing. The fact that HTTP URI GUIDs fail to "work" is similar to the fact that regular web URLs fail to work. That doesn't stop people from using the web and it shouldn't stop well designed (i.e. http://www.w3.org/TR/cooluris/ and http://www.w3.org/Provider/Style/URI) URIs from being used as GUIDs.
I'm not disagreeing at all, simply saying that replacing LSIDs with HTTP URIs (or anything else) doesn't magic away the issue of keeping the identifiers "live".
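The content negotiation mentioned above can be sketched with nothing more than an `Accept` header: the same URI is requested twice, once asking for HTML and once for RDF/XML. This is a minimal sketch using Python's standard `urllib.request`. The helper names are invented for the example, the GNI URI is the one quoted earlier in this thread, and whether that server still answers (or honours the header) is not something the code can promise.

```python
import urllib.request


def build_negotiated_request(uri, want_rdf):
    """Build a GET request whose Accept header asks for RDF/XML or HTML."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch(uri, want_rdf, timeout=10.0):
    """Dereference the URI; return (Content-Type chosen by the server, body)."""
    request = build_negotiated_request(uri, want_rdf)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()


# One identifier, two requested representations.
GUID = "http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f"
html_request = build_negotiated_request(GUID, want_rdf=False)
rdf_request = build_negotiated_request(GUID, want_rdf=True)
```

Calling `fetch(GUID, want_rdf=True)` performs the actual network request. A client should check the returned Content-Type, since a server is free to ignore the header and send HTML anyway.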
IPNI supports versioning of LSIDs, but the LSID without the version is also valid (it resolves to the latest version). I think this is a perfectly valid thing to do. Versioning matters to some, but not everyone cares about it, IPNI supports both views. If you want to cite a specific version, do so, if not, don't.
OK, I guess as long as people can just leave the version off if they want, they'll get unchanging strings for their identifiers and so I guess that moves them into the "usable" column. But as far as I'm concerned, putting the versions on adds to the confusion. If I can say this without launching an unnecessary thread about the opacity of GUIDs, nobody is supposed to be looking at any GUID itself to infer meaning about the identified object. That "no-no" seems to be exactly what Catalog of Life and IPNI seem to be suggesting that people do with the LSID version numbers they are tacking on. Get rid of them.
Version numbers are optional and are part of the LSID spec. Given that the unversioned ones work I don't see a huge problem here, especially as the first thing people will say regarding any identifier is "what happens if my data changes?" Versioning is there if you need it, if you don't need it, don't use it.
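For anyone who wants the unchanging string, dropping the optional revision is a small string operation. The sketch below assumes the usual colon-separated layout `urn:lsid:authority:namespace:object[:revision]`. The LSID used is a made-up `example.org` one, not a real IPNI or Catalogue of Life record, and the proxied form is built on the assumption that the TDWG proxy mentioned earlier in the thread accepts the LSID appended to its base URL.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID into its components; the revision is optional."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


def format_lsid(lsid):
    """Reassemble the components, omitting the revision when it is absent."""
    parts = ["urn", "lsid", lsid.authority, lsid.namespace, lsid.object_id]
    if lsid.revision is not None:
        parts.append(lsid.revision)
    return ":".join(parts)


def without_revision(text):
    """Return the unversioned form of an LSID, suitable as a stable string."""
    return format_lsid(parse_lsid(text)._replace(revision=None))


def proxied(text, proxy="http://lsid.tdwg.org/"):
    """Assumed proxy convention: append the LSID to the proxy base URL."""
    return proxy + text


versioned = "urn:lsid:example.org:names:12345:1.2"  # hypothetical LSID
stable = without_revision(versioned)
```

Storing `stable` rather than `versioned` gives the same string no matter how often the provider bumps the revision, while the versioned form remains available to anyone who needs to cite a specific state of the record.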
Opacity is another red herring. Does anybody know any truly opaque identifiers, ones about which I can't infer anything, not even that it's an identifier? In point of fact, most real world identifiers are laden with embedded semantics (including checksums, etc.). Furthermore, any recent discussion of good URL structure pretty much converges on making them clean, human-readable, and hackable.
The data/metadata distinction is a complete red herring and has sidetracked more LSID conversations than I care to remember. The distinction stems from LSIDs originally being conceived as identifiers for large data objects (e.g., sequences, images, binary streams) that people want accurately versioned so that they could reproduce digital experiments. In this scenario, metadata can change because it's not what mattered for this reproducibility. Some people think of data about taxonomic names as "metadata", and then we're off on a wild goose chase about whether LSIDs should change when metadata changes. I wouldn't lose any sleep over this (see discussion of IPNI above).
Agreed. The important thing is that the identifier for a certain "thing" (however the "thing" is defined) should not change. I have previously expressed the opinion that it won't really work to have any URI (sensu Linked Data) point to "data" and that all URIs serving as GUIDs should be considered to reference non-information resources (which by definition have only "metadata"). Under that scenario, all resource properties are "metadata" and subject to change. The data vs. metadata argument then becomes irrelevant.
Regarding DOIs for books, these are uncommon, although there are moves to express ISBNs within the DOI framework, so there may be more of these. But for a DOI to exist someone has to claim ownership of a resource and register the DOI with CrossRef. For a lot of older literature there won't be a publisher around to do this, but that's where BHL comes in.
Then PLEASE BHL, create simple, unchanging URIs [suitable for use in the Linked Data world] to identify your resources. If you don't want to provide RDF/XML, that's fine for now. You can wait and see if that is necessary later. But at least create URIs that COULD be used for Linked Data in the future and that are not long and loaded with "?" and "&" characters. I don't know how to do that because I'm not a server dude. But it apparently isn't that hard to do with mod_rewrite. I know there are at least three people (probably more) on this list who do that routinely - Rod does it with his bioguid.info site (e.g. the "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 gets rewritten into the "ugly" URL http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody cares as long as the "cool" URI doesn't change).
BHL does have simple, clean, stable URIs, such as http://www.biodiversitylibrary.org/page/26246005 for a page, http://www.biodiversitylibrary.org/item/84644 for an item, and so on.
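The "cool" to "ugly" mapping quoted above for bioguid.info amounts to a single pattern rewrite. The sketch below expresses that idea as a plain Python function rather than as server configuration. The pattern and the `rewrite` helper are assumptions that merely reproduce the two URLs given in the message; the rule actually deployed on bioguid.info is not shown in this thread and may well differ.

```python
import re

# Public path shape taken from the example URI in the message: /doi:<DOI>
COOL_PATH = re.compile(r"^/(?P<scheme>doi):(?P<value>.+)$")


def rewrite(path):
    """Map a public "cool" path to the internal script path.

    Returns None when no rule applies, so the caller can fall through
    to normal handling (as a rewrite engine would).
    """
    match = COOL_PATH.match(path)
    if match is None:
        return None
    return f"/openurl.php?id={match['scheme']}:{match['value']}"


internal = rewrite("/doi:10.1093/bib/bbn022")
```

The published identifier never mentions `openurl.php`, so the script behind it can be renamed or replaced without breaking any URI that people have already cited.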
Regards
Rod
My two cents worth... Steve
In summary, we're in a mess, and I don't think this is really down to technology. It's a failure of our community to create the appropriate resources (e.g., centralised, curated resources of identifiers and associated metadata for names, publications, and specimens).
Regards
Rod
Rod and Joel, I stand corrected on all points. Thanks for the clarification. Steve
Roderic Page wrote:
Dear Steve,
On 6 Jan 2011, at 15:10, Steve Baskauf wrote:
A quick comment about ubio.org's LSID resolution: I took a look at the complete RDF source and noticed that the namespace declarations for "ubi:" and "gla:" don't start with "http://". I'm pretty sure that isn't kosher RDF, although neither of the RDF validators complained about it (doesn't give me much confidence in RDF validators). So much for being overly optimistic about there being one source that worked completely. :-)
I don't think RDF namespaces need to be HTTP URIs. There is an expectation by most RDF clients that namespaces are resolvable via HTTP URIs and will return RDF, but this need not be true (some well known namespaces such as http://search.yahoo.com/mrss/ and http://www.georss.org/georss/ don't resolve to RDF). So, ASFAIK LSIDs are perfectly valid, if "unfriendly".
I should say that I'm not an advocate of LSIDs. I actually don't like them at all. I simply investigated them in this context as something that might work in a Linked Data context. I should also say that I'm not necessarily an advocate of Linked Data - I'm in the "wait and see" camp. I'd like to give it a chance but I'm not betting the store on it. I AM an advocate of having identifiers that are persistent and globally unique and it appears that both Catalog of Life and IPNI (in my view) flunk the persistence test if you just look at the URI (LSID with version) as a string (which you SHOULD be able to do) that can be an unchanging identifier. We desperately need persistent and globally unique identifiers for a lot of things. If they resolve to RDF/XML then that's a bonus. We need an Emperor, clothing is somewhat optional.
A few specific responses:
Roderic Page wrote:
These issues won't go away simply by replacing LSIDs with HTTP URIs. Some of the linked data sources go belly up fairly regularly (notably http://bio2rdf.org ).
The reason I see HTTP URIs [capable of producing either HTML or RDF/XML through content negotiation] as something of greater value than LSIDs is that people can at a minimum get a web page out of it. That's the cake. If some people can also use them to get RDF for the Linked Data dream, that's the icing. The fact that HTTP URI GUIDs fail to "work" is similar to the fact that regular web URLs fail to work. That doesn't stop people from using the web and it shouldn't stop well designed (i.e. http://www.w3.org/TR/cooluris/ and http://www.w3.org/Provider/Style/URI) URIs from being used as GUIDs.
I'm not disagreeing at all, simply saying that replacing LSIDs with HTTP URIs (or anything else) doesn't magic away the issue of keeping the identifiers "live".
IPNI supports versioning of LSIDs, but the LSID without the version is also valid (it resolves to the latest version). I think this is a perfectly valid thing to do. Versioning matters to some, but not everyone cares about it, IPNI supports both views. If you want to cite a specific version, do so, if not, don't.
OK, I guess as long as people can just leave the version off if they want, they'll get unchanging strings for their identifiers and so I guess that moves them into the "usable" column. But as far as I'm concerned, putting the versions on adds to the confusion. If I can say this without launching an unnecessary thread about the opacity of GUIDs, nobody is supposed to be looking at any GUID itself to infer meaning about the identified object. That "no-no" seems to be exactly what Catalog of Life and IPNI seem to be suggesting that people do with the LSID version numbers they are tacking on. Get rid of them.
Version numbers are optional and are part of the LSID spec. Given that the unversioned ones work I don't see a huge problem here, especially as the first thing people will say regarding any identifier is "what happens if my data changes?" Versioning is there if you need it, if you don't need it, don't use it.
Opacity is another red herring. Does anybody know any truly opaque identifiers, ones about which I can't infer anything, not even that its an identifier? In point of fact, most real world identifiers are laden with embedded semantics (including checksums, etc.). Furthermore, any recent discussion of good URL structure pretty much converges on making them clean, human-readable, and hackable.
The data/metadata distinction is a complete red herring and has side- tracked more LSID conversations than I care to remember. The distinction stems from LSIDs originally being conceived as identifiers for large data objects (e.g., sequences, images, binary streams) that people want accurately versioned so that could reproduce digital experiments. In this scenario, metadata can change because it's not what mattered for this reproducibility. Some people think of data about taxonomic names as "metadata", and then we're off on a wild goose chase about should LSIDs change when metadata changes. I wouldn't loose any sleep over this (see discussion of IPNI above).
Agreed. The important thing is that the identifier for a certain "thing" (however the "thing" is defined) should not change. I have previously expressed the opinion that it won't really work to have any URI (sensu Linked Data) point to "data" and that all URIs serving as GUIDs should be considered to reference non-information resources (which by definition have only "metadata"). Under that scenario, all resource properties are "metadata" and subject to change. The data vs. metadata argument then becomes irrelevant.
Regarding DOIs for books, these are uncommon, although there are moves to express ISBNs within the DOI framework, so there may be more of these. But for a DOI to exist someone has to claim ownership of a resource and register the DOI with CrossRef. For a lot of older literature there won't be a publisher around to do this, but that's where BHL comes in.
Then PLEASE BHL, create simple, unchanging URIs [suitable for use in the Linked Data world] to identify your resources. If you don't want to provide RDF/XML, that's fine for now. You can wait and see if that is necessary later. But at least create URIs that COULD be used for Linked Data in the future and that are not long and loaded with "?" and "&" characters. I don't know how to do that because I'm not a server dude. But it apparently isn't that hard to do with a mod rewrite. I know there are at least three people (probably more) on this list who do that routinely - Rod does it with his bioguid.info site (i.e. "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 gets written into "ugly" URL http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody cares as long as the "cool" URI doesn't change).
BHL does have simple, clean, stable URIs, such as http://www.biodiversitylibrary.org/page/26246005 for a page, http://www.biodiversitylibrary.org/item/84644 for an item, and so on.
Regards
Rod
My two cents worth... Steve
In summary, we're in a mess, and I don't think this is really down to technology. It's a failure of our community to create the appropriate resources (e.g., centralised, curated resources of identifiers and associated metadata for names, publications, and specimens).
Regards
Rod
On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
Well, I have continued my quest for resolvable, RDF-producing GUIDs for taxon/name-related stuff. I have gotten a lot of good information from reading Rod Page's BMC Bioinformatics paper (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating his http://bioguid.info/ site.
From the standpoint of the "sec./sensu" part of a TNU/taxon concept,
based on the recent discussion, it sounds like the DIO solution for publications is a good direction to go IF resolution services producing RDF comes into existence and IF it becomes possible to actually search for the DIOs of more obscure publications. I tried using Rod's site to look up a journal article using the ISSN, volume, and page and the web interface found the DOI and generated RDF just fine. However, an attempt to use the web to find the DIO of Gleason and Cronquist's Manual of vascular plants of Northeastern United States and adjacent Canada failed despite a half hour of effort (I found the UPC, the LOC call number, and the ISBN, but no DOI). Maybe there just isn't a DOI for it but there should be a way for me to know that. So DOIs for books and old journal articles are not really ready for prime time.
From the standpoint of the "scientific name" part of a TNU/taxon
concept, I had better luck (sort of). Rod's "Status of biodiversity services" page (http://www.bioguid.info/status/) was really cool. I saw resources I hadn't known about before. I tried out several of the services that claimed to issue LSIDs.
Catalog of Life's LSIDs didn't work with either the http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with either a web browser or the OpenLink RDF browser. I only got an empty RDF element in response.
Index Fungorum was down.
IPNI seemed to work. However, I was somewhat appalled to observe that they seem to change the revision identifier any time that they change any part of the metadata. That renders the LSID useless as a permanent GUID for the name and I believe is inconsistent with the design of LSIDs where the revision is only supposed to change if the underlying data itself (NOT metadata) changes. (Catalog of Life says that they change the revision identifier EACH YEAR for all of their records! That's even worse!) If I'm remembering the TDWG LSID recommendations, it is not even recommended to use the revision part of an LSID at all in the biodiversity informatics context.
ubio.org's LSIDs seemed to work properly.
[sorry - didn't try zoobank since I was looking for plants]
I don't know which (if any) of the Web sites listed on Rod's status page use generic HTTP URI GUIDs (rather than LSIDs) to refer to taxon names. I tried out the Global Names Index that Pete was mentioning. The URI versions of the UUIDs (e.g. http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f) do resolve under content negotiation, but the only useful information that the RDF representation seems to provide is the actual name string that was used to generate the UUID. Until some other useful linked information is added to the RDF, there doesn't seem to be much advantage in pointing a semantic client to the URI over just using a string literal for the name.
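For readers who have not tried content negotiation, here is a minimal Python sketch of how a client might ask an HTTP URI GUID for RDF rather than HTML. It uses only the standard library. The helper names (`build_rdf_request`, `fetch_rdf`) are made up for this illustration, and whether a given server honors the `Accept` header is entirely up to that server.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def fetch_rdf(uri: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body.

    The server decides what to return, so a caller should check the
    Content-Type before assuming the body really is RDF/XML.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # URI taken from the message above. This only builds the request;
    # it does not touch the network.
    req = build_rdf_request(
        "http://gni.globalnames.org/name_strings/"
        "3a70f04d-fd29-5570-ba91-52dae0c3d07f"
    )
    print(req.get_method(), req.full_url)
    print("Accept:", req.get_header("Accept"))
```

A browser sending `Accept: text/html` to the same URI would normally get a web page instead, which is the whole point of using one identifier for both audiences.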
So the bottom line is that of the LSID services for names that I've tried so far, only ubio.org seems to have LSIDs for names that are unchanging, can work as a proxied URI, and that produce actual useful RDF. That's pretty disappointing given the apparently huge amount of work that's been put into building these various systems.
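A "proxied URI" in this sense is simply the LSID appended to the base address of an HTTP proxy, so that an ordinary web client can dereference it. The Python sketch below shows that convention under stated assumptions: the function name and the sample LSID are hypothetical, the two proxy bases are the ones mentioned above, and the exact URL layout each proxy accepts should be checked against the proxy itself.

```python
# Proxy bases mentioned in the message above.
PROXIES = ("http://lsid.tdwg.org/", "http://www.bioguid.info/")


def proxied_uri(lsid: str, proxy: str = PROXIES[0]) -> str:
    """Turn an LSID into an HTTP URI by prefixing a proxy base address."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    return proxy.rstrip("/") + "/" + lsid


if __name__ == "__main__":
    # Hypothetical LSID, invented for this example.
    sample = "urn:lsid:example.org:names:12345"
    for base in PROXIES:
        print(proxied_uri(sample, base))
```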
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
There are too many ideas to respond to at once but there is one issue that is being overblown.
That is this issue of uptime.
If these resources are exposed properly at any given time there should be several copies of any resource available somewhere else in the cloud.
For instance, if EUNIS or Bioimages is down, the data is still available on my SPARQL endpoint.
In the case of EUNIS it should also be available at:
http://linkeddata.uriburner.com/fct/
http://sindice.com/
And probably many other places.
Yes, we all strive for 99.99% uptime, but if I had my choice I would rather that these sites spend what little developmental resources they have on properly exposing real semantic web identifiers than on a site that has 99.99% uptime but exposes LSIDs.
The Linked Data system has built-in redundancy; we might as well take advantage of it.
Respectfully,
- Pete
On Thu, Jan 6, 2011 at 10:11 AM, Roderic Page r.page@bio.gla.ac.uk wrote:
Dear Steve,
On 6 Jan 2011, at 15:10, Steve Baskauf wrote:
A quick comment about ubio.org's LSID resolution: I took a look at the complete RDF source and noticed that the namespace declarations for "ubi:" and "gla:" don't start with "http://". I'm pretty sure that isn't kosher RDF, although neither of the RDF validators complained about it (doesn't give me much confidence in RDF validators). So much for being overly optimistic about there being one source that worked completely. :-)
I don't think RDF namespaces need to be HTTP URIs. There is an expectation by most RDF clients that namespaces are resolvable via HTTP URIs and will return RDF, but this need not be true (some well-known namespaces such as http://search.yahoo.com/mrss/ and http://www.georss.org/georss/ don't resolve to RDF). So, AFAIK LSIDs are perfectly valid, if "unfriendly".
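The point that a namespace is just a URI string, whether or not it starts with "http://", can be checked with a few lines of Python. This is only a sketch: the `urn:example:vocab#` namespace, the sample LSID, and the `nameString` property are invented for illustration and are not the actual uBio declarations. The standard-library XML parser is used here rather than a full RDF toolkit, so it shows only how the property URI is formed (namespace plus local name).

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical non-HTTP namespace, invented for this example.
EX_NS = "urn:example:vocab#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="urn:lsid:example.org:names:12345">
    <ex:nameString>Quercus alba</ex:nameString>
  </rdf:Description>
</rdf:RDF>"""


def property_uris(rdf_xml: str) -> List[str]:
    """Return the full property URIs used inside each rdf:Description."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        for prop in desc:
            # ElementTree stores tags as "{namespace}localname".
            namespace, local = prop.tag[1:].split("}", 1)
            uris.append(namespace + local)
    return uris


if __name__ == "__main__":
    print(property_uris(DOC))
```

The parser accepts the `urn:` namespace without complaint, which is consistent with the validators not objecting.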
I should say that I'm not an advocate of LSIDs. I actually don't like them at all. I simply investigated them in this context as something that might work in a Linked Data context. I should also say that I'm not necessarily an advocate of Linked Data - I'm in the "wait and see" camp. I'd like to give it a chance but I'm not betting the store on it. I AM an advocate of having identifiers that are persistent and globally unique and it appears that both Catalog of Life and IPNI (in my view) flunk the persistence test if you just look at the URI (LSID with version) as a string (which you SHOULD be able to do) that can be an unchanging identifier. We desperately need persistent and globally unique identifiers for a lot of things. If they resolve to RDF/XML then that's a bonus. We need an Emperor, clothing is somewhat optional.
A few specific responses:
Roderic Page wrote:
These issues won't go away simply by replacing LSIDs with HTTP URIs. Some of the linked data sources go belly up fairly regularly (notably http://bio2rdf.org ).
The reason I see HTTP URIs [capable of producing either HTML or RDF/XML through content negotiation] as something of greater value than LSIDs is that people can at a minimum get a web page out of it. That's the cake. If some people can also use them to get RDF for the Linked Data dream, that's the icing. The fact that HTTP URI GUIDs fail to "work" is similar to the fact that regular web URLs fail to work. That doesn't stop people from using the web and it shouldn't stop well designed (i.e. http://www.w3.org/TR/cooluris/ and http://www.w3.org/Provider/Style/URI) URIs from being used as GUIDs.
I'm not disagreeing at all, simply saying that replacing LSIDs with HTTP URIs (or anything else) doesn't magic away the issue of keeping the identifiers "live".
IPNI supports versioning of LSIDs, but the LSID without the version is also valid (it resolves to the latest version). I think this is a perfectly valid thing to do. Versioning matters to some, but not everyone cares about it, IPNI supports both views. If you want to cite a specific version, do so, if not, don't.
OK, I guess as long as people can just leave the version off if they want, they'll get unchanging strings for their identifiers and so I guess that moves them into the "usable" column. But as far as I'm concerned, putting the versions on adds to the confusion. If I can say this without launching an unnecessary thread about the opacity of GUIDs, nobody is supposed to be looking at any GUID itself to infer meaning about the identified object. That "no-no" seems to be exactly what Catalog of Life and IPNI seem to be suggesting that people do with the LSID version numbers they are tacking on. Get rid of them.
Version numbers are optional and are part of the LSID spec. Given that the unversioned ones work I don't see a huge problem here, especially as the first thing people will say regarding any identifier is "what happens if my data changes?" Versioning is there if you need it, if you don't need it, don't use it.
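To make the versioned/unversioned distinction concrete, here is a small Python sketch that splits an LSID into its colon-separated parts and drops the optional revision. It assumes the usual layout `urn:lsid:authority:namespace:object[:revision]`. The class and function names, and the sample LSID, are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Lsid:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> Lsid:
    """Parse 'urn:lsid:authority:namespace:object[:revision]'."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    # Treat a missing or empty sixth component as "no revision".
    revision = (parts[5] or None) if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def unversioned(text: str) -> str:
    """Return the LSID string with any revision component removed."""
    return str(replace(parse_lsid(text), revision=None))


if __name__ == "__main__":
    # Hypothetical LSID, invented for this example.
    print(unversioned("urn:lsid:example.org:names:12345:1.1"))
```

Someone who wants a string that never changes would store the output of `unversioned`; someone who wants to cite a specific state of the record would keep the full form.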
Opacity is another red herring. Does anybody know any truly opaque identifiers, ones about which I can't infer anything, not even that it's an identifier? In point of fact, most real-world identifiers are laden with embedded semantics (including checksums, etc.). Furthermore, any recent discussion of good URL structure pretty much converges on making them clean, human-readable, and hackable.
The data/metadata distinction is a complete red herring and has sidetracked more LSID conversations than I care to remember. The distinction stems from LSIDs originally being conceived as identifiers for large data objects (e.g., sequences, images, binary streams) that people want accurately versioned so that they could reproduce digital experiments. In this scenario, metadata can change because it's not what mattered for this reproducibility. Some people think of data about taxonomic names as "metadata", and then we're off on a wild goose chase about whether LSIDs should change when metadata changes. I wouldn't lose any sleep over this (see discussion of IPNI above).
Agreed. The important thing is that the identifier for a certain "thing" (however the "thing" is defined) should not change. I have previously expressed the opinion that it won't really work to have any URI (sensu Linked Data) point to "data" and that all URIs serving as GUIDs should be considered to reference non-information resources (which by definition have only "metadata"). Under that scenario, all resource properties are "metadata" and subject to change. The data vs. metadata argument then becomes irrelevant.
Regarding DOIs for books, these are uncommon, although there are moves to express ISBNs within the DOI framework, so there may be more of these. But for a DOI to exist someone has to claim ownership of a resource and register the DOI with CrossRef. For a lot of older literature there won't be a publisher around to do this, but that's where BHL comes in.
Then PLEASE BHL, create simple, unchanging URIs [suitable for use in the Linked Data world] to identify your resources. If you don't want to provide RDF/XML, that's fine for now. You can wait and see if that is necessary later. But at least create URIs that COULD be used for Linked Data in the future and that are not long and loaded with "?" and "&" characters. I don't know how to do that because I'm not a server dude. But it apparently isn't that hard to do with mod_rewrite. I know there are at least three people (probably more) on this list who do that routinely - Rod does it with his bioguid.info site (e.g., the "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 gets rewritten into the "ugly" URL http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody cares as long as the "cool" URI doesn't change).
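The kind of mapping a rewrite rule performs can be sketched in a few lines of Python. This only mimics the cool-to-ugly translation quoted above. It is not the actual bioguid.info configuration, and the pattern and function name are assumptions made for illustration.

```python
import re

# Matches a request path such as "/doi:10.1093/bib/bbn022".
_COOL_PATH = re.compile(r"^/(doi:.+)$")


def rewrite(path: str) -> str:
    """Map a 'cool' path onto the script URL that actually serves it.

    Paths that do not look like an identifier are passed through unchanged.
    """
    match = _COOL_PATH.match(path)
    if match is None:
        return path
    return "/openurl.php?id=" + match.group(1)


if __name__ == "__main__":
    print(rewrite("/doi:10.1093/bib/bbn022"))
    print(rewrite("/status/"))
```

The design point is that only the left-hand side is public. The script name and query string on the right can change at any time without breaking anyone's stored identifiers.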
BHL does have simple, clean, stable URIs, such as http://www.biodiversitylibrary.org/page/26246005 for a page, http://www.biodiversitylibrary.org/item/84644 for an item, and so on.
Regards
Rod
My two cents worth... Steve
In summary, we're in a mess, and I don't think this is really down to technology. It's a failure of our community to create the appropriate resources (e.g., centralised, curated resources of identifiers and associated metadata for names, publications, and specimens).
Regards
Rod
Dear Pete,
Replication is a good point, although it does depend on the source being available for at least part of the time (not always a given, sadly), plus the copies need to be easily discoverable (ideally transparently as far as the user is concerned -- they shouldn't need to go hunting). Phil Cryer wrote a nice post about this, proposing a solution based on CouchDB, which makes replication pretty trivial. See http://fak3r.com/2009/04/29/resolving-lsids-wit-url-resolvers-and-couchdb/
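The "transparent to the user" requirement can be illustrated with a small Python sketch that tries a list of sources in order and returns the first copy that answers. This is a generic fallback loop, not the CouchDB design from the linked post. The function name, the host names in the usage example, and the injected `fetch` callable are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def resolve_with_fallback(
    identifier: str,
    sources: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each source in turn; return (url, body) for the first that works.

    `fetch` is any callable that returns the body for a URL or raises
    OSError (urllib.error.URLError is a subclass) when the source is down.
    Returns (None, None) if every source fails.
    """
    for base in sources:
        url = base.rstrip("/") + "/" + identifier
        try:
            return url, fetch(url)
        except OSError:
            continue  # this copy is unavailable, try the next one
    return None, None


if __name__ == "__main__":
    def fake_fetch(url: str) -> bytes:
        # Stand-in for a real HTTP call: pretend the primary host is down.
        if url.startswith("http://primary.example/"):
            raise OSError("unreachable")
        return b"<rdf:RDF/>"

    print(resolve_with_fallback(
        "urn:lsid:example.org:names:12345",
        ["http://primary.example/", "http://mirror.example/"],
        fake_fetch,
    ))
```

The hard part, as noted above, is not the loop but knowing which mirrors exist in the first place.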
Regards
Rod
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base / GeoSpecies Knowledge Base About the GeoSpecies Knowledge Base
Hi Rod,
I still think that these resources will be more reliable in the long term if they are based on LOD URIs rather than depending on members of this community to properly maintain separate code that is specific to LSIDs.
It is also likely that we will be able to leverage someone else's work that does more transparent replication of LOD data.
Respectfully,
- Pete
On Thu, Jan 6, 2011 at 3:48 PM, Roderic Page r.page@bio.gla.ac.uk wrote:
Dear Pete,
Replication is a good point, although it does depend on the source being available for at least part of the time (not always a given, sadly), plus the copies need to be easily discoverable (ideally transparently as far as the user is concerned -- they shouldn't need to go hunting). Phil Cryer wrote a nice post about this, proposing a solution based on CouchDB, which makes replication pretty trivial. See http://fak3r.com/2009/04/29/resolving-lsids-wit-url-resolvers-and-couchdb/
Regards
Rod
On 6 Jan 2011, at 20:49, Peter DeVries wrote:
There are too many ideas to respond to at once but there is one issue that is being overblown.
That is this issue of uptime.
If these resources are exposed properly at any given time there should be several copies of any resource available somewhere else in the cloud.
For instance, if EUNIS or Bioimages is down, the data is still available on my SPARQL endpoint.
In the case of EUNIS it should also be available at:
http://linkeddata.uriburner.com/fct/
http://sindice.com/
And probably many other places.
Yes, we all strive for 99.99% uptime, but if I had my choice I would rather that these sites spend what little developmental resources they have on properly exposing real semantic web identifiers than on a site that has 99.99% uptime but exposes LSIDs.
The Linked Data system has built-in redundancy; we might as well take advantage of it.
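To make the redundancy argument concrete, here is a minimal sketch of a client that falls back to a replicated copy when the original provider is down. The URLs and the fake_fetch stand-in are hypothetical and exist only so the example runs without network access; a real client would still need some way to discover where the copies live, which is the point Rod raises.

```python
from typing import Callable, Iterable, Optional


def fetch_from_first_available(
    sources: Iterable[str],
    fetch: Callable[[str], str],
) -> Optional[str]:
    """Return the first successful response, trying each source in order."""
    for url in sources:
        try:
            return fetch(url)
        except OSError:
            # Source is down or unreachable; fall through to the next copy.
            continue
    return None


def fake_fetch(url: str) -> str:
    """Stand-in fetcher (assumption): pretends the primary source is down."""
    if "primary" in url:
        raise ConnectionError("primary source is down")
    return f"<rdf:RDF><!-- served by {url} --></rdf:RDF>"


copies = [
    "http://primary.example.org/resource/1",  # hypothetical original provider
    "http://mirror.example.org/resource/1",   # hypothetical replicated copy
]
result = fetch_from_first_available(copies, fake_fetch)
print(result)
```

Passing the fetch function in as a parameter keeps the fallback logic independent of any particular HTTP library.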
Respectfully,
- Pete
On Thu, Jan 6, 2011 at 10:11 AM, Roderic Page r.page@bio.gla.ac.uk wrote:
Dear Steve,
On 6 Jan 2011, at 15:10, Steve Baskauf wrote:
A quick comment about ubio.org's LSID resolution: I took a look at the complete RDF source and noticed that the namespace declarations for "ubi:" and "gla:" don't start with "http://". I'm pretty sure that isn't kosher RDF, although neither of the RDF validators complained about it (doesn't give me much confidence in RDF validators). So much for being overly optimistic about there being one source that worked completely. :-)
I don't think RDF namespaces need to be HTTP URIs. There is an expectation by most RDF clients that namespaces are resolvable via HTTP URIs and will return RDF, but this need not be true (some well-known namespaces such as http://search.yahoo.com/mrss/ and http://www.georss.org/georss/ don't resolve to RDF). So, AFAIK LSIDs are perfectly valid, if "unfriendly".
I should say that I'm not an advocate of LSIDs. I actually don't like them at all. I simply investigated them in this context as something that might work in a Linked Data context. I should also say that I'm not necessarily an advocate of Linked Data - I'm in the "wait and see" camp. I'd like to give it a chance but I'm not betting the store on it. I AM an advocate of having identifiers that are persistent and globally unique and it appears that both Catalog of Life and IPNI (in my view) flunk the persistence test if you just look at the URI (LSID with version) as a string (which you SHOULD be able to do) that can be an unchanging identifier. We desperately need persistent and globally unique identifiers for a lot of things. If they resolve to RDF/XML then that's a bonus. We need an Emperor, clothing is somewhat optional.
A few specific responses:
Roderic Page wrote:
These issues won't go away simply by replacing LSIDs with HTTP URIs. Some of the linked data sources go belly up fairly regularly (notably http://bio2rdf.org ).
The reason I see HTTP URIs [capable of producing either HTML or RDF/XML through content negotiation] as something of greater value than LSIDs is that people can at a minimum get a web page out of it. That's the cake. If some people can also use them to get RDF for the Linked Data dream, that's the icing. The fact that HTTP URI GUIDs fail to "work" is similar to the fact that regular web URLs fail to work. That doesn't stop people from using the web and it shouldn't stop well designed (i.e. http://www.w3.org/TR/cooluris/ and http://www.w3.org/Provider/Style/URI) URIs from being used as GUIDs.
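As a rough illustration of the content negotiation mentioned above, here is a minimal sketch of how a server might decide between the HTML "cake" and the RDF/XML "icing" from the client's Accept header. The function name is hypothetical, and the parsing is deliberately simplified: it ignores wildcards such as */* and falls back to HTML, so it is not a full implementation of HTTP negotiation.

```python
def choose_representation(accept_header: str) -> str:
    """Pick a media type for a GUID based on the client's Accept header."""
    supported = ["text/html", "application/rdf+xml"]  # the default comes first
    best_type, best_q = supported[0], 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # a media type with no q parameter has quality 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Keep the supported type with the highest quality value.
        if media_type in supported and q > best_q:
            best_type, best_q = media_type, q
    return best_type


print(choose_representation("text/html,application/xhtml+xml;q=0.9"))
print(choose_representation("application/rdf+xml"))
```

A browser gets a web page and a semantic client gets RDF/XML, both from the same unchanging URI.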
I'm not disagreeing at all, simply saying that replacing LSIDs with HTTP URIs (or anything else) doesn't magic away the issue of keeping the identifiers "live".
IPNI supports versioning of LSIDs, but the LSID without the version is also valid (it resolves to the latest version). I think this is a perfectly valid thing to do. Versioning matters to some, but not everyone cares about it, IPNI supports both views. If you want to cite a specific version, do so, if not, don't.
OK, I guess as long as people can just leave the version off if they want, they'll get unchanging strings for their identifiers and so I guess that moves them into the "usable" column. But as far as I'm concerned, putting the versions on adds to the confusion. If I can say this without launching an unnecessary thread about the opacity of GUIDs, nobody is supposed to be looking at any GUID itself to infer meaning about the identified object. That "no-no" seems to be exactly what Catalog of Life and IPNI seem to be suggesting that people do with the LSID version numbers they are tacking on. Get rid of them.
Version numbers are optional and are part of the LSID spec. Given that the unversioned ones work I don't see a huge problem here, especially as the first thing people will say regarding any identifier is "what happens if my data changes?" Versioning is there if you need it, if you don't need it, don't use it.
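For anyone who wants the unchanging string, dropping the optional revision is a purely mechanical step. The sketch below assumes the general LSID layout urn:lsid:authority:namespace:object[:revision]; the function names and the example.org LSID are made up for illustration and do not come from any real resolver.

```python
def split_lsid(lsid: str) -> dict:
    """Split an LSID into its parts; the revision is None when absent."""
    parts = lsid.split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
    ):
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def without_revision(lsid: str) -> str:
    """Return the LSID with any revision component removed."""
    p = split_lsid(lsid)
    return ":".join(["urn", "lsid", p["authority"], p["namespace"], p["object"]])


# Hypothetical LSID used only for illustration.
versioned = "urn:lsid:example.org:names:12345:1.1"
print(without_revision(versioned))
```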
Opacity is another red herring. Does anybody know any truly opaque identifiers, ones about which I can't infer anything, not even that it's an identifier? In point of fact, most real world identifiers are laden with embedded semantics (including checksums, etc.). Furthermore, any recent discussion of good URL structure pretty much converges on making them clean, human-readable, and hackable.
The data/metadata distinction is a complete red herring and has sidetracked more LSID conversations than I care to remember. The distinction stems from LSIDs originally being conceived as identifiers for large data objects (e.g., sequences, images, binary streams) that people want accurately versioned so that they could reproduce digital experiments. In this scenario, metadata can change because it's not what mattered for this reproducibility. Some people think of data about taxonomic names as "metadata", and then we're off on a wild goose chase about whether LSIDs should change when metadata changes. I wouldn't lose any sleep over this (see discussion of IPNI above).
Agreed. The important thing is that the identifier for a certain "thing" (however the "thing" is defined) should not change. I have previously expressed the opinion that it won't really work to have any URI (sensu Linked Data) point to "data" and that all URIs serving as GUIDs should be considered to reference non-information resources (which by definition have only "metadata"). Under that scenario, all resource properties are "metadata" and subject to change. The data vs. metadata argument then becomes irrelevant.
Regarding DOIs for books, these are uncommon, although there are moves to express ISBNs within the DOI framework, so there may be more of these. But for a DOI to exist someone has to claim ownership of a resource and register the DOI with CrossRef. For a lot of older literature there won't be a publisher around to do this, but that's where BHL comes in.
Then PLEASE BHL, create simple, unchanging URIs [suitable for use in the Linked Data world] to identify your resources. If you don't want to provide RDF/XML, that's fine for now. You can wait and see if that is necessary later. But at least create URIs that COULD be used for Linked Data in the future and that are not long and loaded with "?" and "&" characters. I don't know how to do that because I'm not a server dude. But it apparently isn't that hard to do with a mod rewrite. I know there are at least three people (probably more) on this list who do that routinely - Rod does it with his bioguid.info site (i.e. "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 gets written into "ugly" URL http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody cares as long as the "cool" URI doesn't change).
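The mapping in that bioguid.info example is simple enough to sketch. The following is only an illustration of the idea in Python, not Rod's actual server configuration (which would more likely be a rewrite rule on the web server); the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_cool_uri(cool_uri: str) -> str:
    """Map a stable "cool" URI onto the internal script URL that serves it."""
    parts = urlsplit(cool_uri)
    identifier = parts.path.lstrip("/")  # e.g. "doi:10.1093/bib/bbn022"
    return urlunsplit(
        (parts.scheme, parts.netloc, "/openurl.php", "id=" + identifier, "")
    )


print(rewrite_cool_uri("http://bioguid.info/doi:10.1093/bib/bbn022"))
# prints http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022
```

The script name behind the rewrite can change at any time; only the "cool" form is ever published, so only it has to stay stable.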
BHL does have simple, clean, stable URIs, such as http://www.biodiversitylibrary.org/page/26246005 for a page, http://www.biodiversitylibrary.org/item/84644 for an item, and so on.
Regards
Rod
My two cents worth... Steve
In summary, we're in a mess, and I don't think this is really down to technology. It's a failure of our community to create the appropriate resources (e.g., centralised, curated resources of identifiers and associated metadata for names, publications, and specimens).
Regards
Rod
On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
Well, I have continued my quest for resolvable, RDF-producing GUIDs for taxon/name-related stuff. I have gotten a lot of good information from reading Rod Page's BMC Bioinformatics paper (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating his http://bioguid.info/ site.
From the standpoint of the "sec./sensu" part of a TNU/taxon concept, based on the recent discussion, it sounds like the DOI solution for publications is a good direction to go IF resolution services producing RDF come into existence and IF it becomes possible to actually search for the DOIs of more obscure publications. I tried using Rod's site to look up a journal article using the ISSN, volume, and page and the web interface found the DOI and generated RDF just fine. However, an attempt to use the web to find the DOI of Gleason and Cronquist's Manual of vascular plants of Northeastern United States and adjacent Canada failed despite a half hour of effort (I found the UPC, the LOC call number, and the ISBN, but no DOI). Maybe there just isn't a DOI for it but there should be a way for me to know that. So DOIs for books and old journal articles are not really ready for prime time.
From the standpoint of the "scientific name" part of a TNU/taxon concept, I had better luck (sort of). Rod's "Status of biodiversity services" page (http://www.bioguid.info/status/) was really cool. I saw resources I hadn't known about before. I tried out several of the services that claimed to issue LSIDs.
Catalog of Life's LSIDs didn't work with either the http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with either a web browser or the OpenLink RDF browser. I only got an empty RDF element in response.
Index Fungorum was down.
IPNI seemed to work. However, I was somewhat appalled to observe that they seem to change the revision identifier any time that they change any part of the metadata. That renders the LSID useless as a permanent GUID for the name and I believe is inconsistent with the design of LSIDs where the revision is only supposed to change if the underlying data itself (NOT metadata) changes. (Catalog of Life says that they change the revision identifier EACH YEAR for all of their records! That's even worse!) If I'm remembering the TDWG LSID recommendations, it is not even recommended to use the revision part of an LSID at all in the biodiversity informatics context.
ubio.org's LSIDs seemed to work properly.
[sorry - didn't try zoobank since I was looking for plants]
I don't know which (if any) of the Web sites listed on Rod's status page use generic HTTP URI GUIDs (rather than LSIDs) to refer to taxon names. I tried out the Global Names Index that Pete was mentioning. The URI versions of the UUIDs (e.g. http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f) do resolve under content negotiation, but the only useful information that the RDF representation seems to provide is the actual name string that was used to generate the UUID. Until some other useful linked information is added to the RDF, there doesn't seem to be much advantage in pointing a semantic client to the URI over just using a string literal for the name.
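A UUID generated from a name string can be sketched with Python's standard uuid module. The example UUID above carries version digit 5, which suggests a name-based (SHA-1) UUID, but the namespace GNI actually uses is not stated here, so the namespace below is a made-up placeholder and the output will not match GNI's identifiers.

```python
import uuid

# Hypothetical namespace for illustration only (assumption); the namespace
# GNI really uses is not given in this thread.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "names.example.org")


def name_uuid(name_string: str) -> uuid.UUID:
    """Derive a deterministic, name-based (version 5) UUID from a name string."""
    return uuid.uuid5(EXAMPLE_NAMESPACE, name_string)


a = name_uuid("Mus musculus")
b = name_uuid("Mus musculus")
c = name_uuid("mus musculus")
print(a == b)  # the same string always yields the same UUID
print(a == c)  # a different string (even by case alone) yields a different UUID
```

Because the identifier is a pure function of the string, it carries no information beyond the string itself, which is consistent with the observation that the RDF adds little until other links are attached to it.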
So the bottom line is that of the LSID services for names that I've tried so far, only ubio.org seems to have LSIDs for names that are unchanging, can work as a proxied URI, and that produce actual useful RDF. That's pretty disappointing given the apparently huge amount of work that's been put into building these various systems.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
On 06/01/2011, at 7:28 PM, Roderic Page wrote:
I suspect the failure of LSIDs is due to a combination of issues:
Another problem with LSIDs is that the specification is complex and difficult to implement, especially its reliance on WSDL. You have to absorb a great deal of spec before you can implement it correctly. One of the goals of our implementation was to provide some software which would take care of all that: you can put any data you like into it without being bound to any particular XML schema or set of database views, and it will serve up your data according to the spec.
- Most biodiversity data providers are small, poorly resourced operations that can't guarantee 24/7 operations
As you point out, HTTP URIs don't make this problem go away. Service-oriented software approaches must be designed at each layer with the expectation that services will go down, and each layer must cope with asynchronous delays. This applies even in-house: I worked at one shop where their entire system simply stopped when the image repository jammed (which happened about twice a week, and needless to say it was 3rd party software so couldn't be tinkered with). Use of the internet makes the problem worse. The problem is simply that it's hard to do.
- There are no LSID consumers.
Partly a result of the specification being so complex - the most complex part of it being that you have to understand WSDLs. Talking to an LSID resolver is as byzantine as writing one. A Java "protocol handler" that you could drop into your lib directory might go a ways to fixing this. The spec is at the OMG site, and the only support there is a single PDF. Everything else is an effort. When we started, the only code was that old IBM thing, and we gave up on it.
By contrast, just about every software platform these days will fetch an http URI - http is ubiquitous.
It's a failure of our community to create the appropriate resources (e.g., centralised, curated resources of identifiers and associated metadata for names, publications, and specimens).
"Why should we use LSIDs? No one else is, and it's really hard to do."
Another difficulty is that the community already has a system for handing out unique identifiers and extensive systems for managing them - these identifiers are scientific names, and the systems are the various nomenclatural committees. Isn't it absurd to coin unique, stable identifiers when "Ixodes tasmani" is such an identifier already? Isn't that good enough? Turns out it isn't: that it actually only uniquely identifies things when it's used in a context. Nevertheless, you don't get buy-in unless what you are proposing is clearly better than what people already have. It's difficult to persuade someone that a human-readable system that has worked just fine (more or less) for 300 years needs to be fixed, particularly when the fix is 'urn:lsid:lsid:biodiversity.org.au:afd.name:291425'.
Hi Paul,
I am using part of an email from another conversation, but it seems to apply here.
Why use URIs instead of string literals?
Aren't the semantics the same?
No, for a couple of reasons.
Is <scientificName>mus musculus</scientificName> the same as <scientificName>Mus musculus</scientificName> ?
More importantly, by using a "cool URI" (http://www.w3.org/TR/cooluris/):
<Occurrence> <hasScientificName rdf:resource="http://example.org/id/12345" /> </Occurrence>
any other statements or data sets with http://example.org/id/12345 are linked and findable.
This is not the case with the integer id, or a string literal
Also, triple stores and quad stores can handle URIs *much* more efficiently than plain text or UTF-8 text (32 bytes per triple).
Lastly, literals cannot be used as subjects in triples.
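A small sketch of the linking argument, using plain Python tuples as stand-ins for triples rather than a real triple store. The URIs, predicate names, and data sets are all hypothetical.

```python
# Each triple is a (subject, predicate, object) tuple; all values are made up.
NAME_URI = "http://example.org/id/12345"

museum_data = [
    ("http://example.org/occurrence/1", "hasScientificName", NAME_URI),
]
other_data = [
    # The URI can appear as a subject here; a string literal could not.
    (NAME_URI, "rdfs:label", "Mus musculus"),
]


def statements_about(uri, *datasets):
    """Collect every triple that mentions the URI as subject or object."""
    return [t for ds in datasets for t in ds if uri in (t[0], t[2])]


linked = statements_about(NAME_URI, museum_data, other_data)
for triple in linked:
    print(triple)

# Two spellings of the "same" name are simply different strings.
literals_match = "mus musculus" == "Mus musculus"
print(literals_match)
```

Both data sets are found through the shared URI without any string matching, while the two literal spellings fail a plain equality test.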
Is the current GNI RDF the greatest thing since sliced bread?
No, it is a start and a needed part of a system to track name use, name to name relations and name to concept relations etc.
Also, the systems are moving toward formatting the human-readable views such that the URI is replaced with the rdfs:label.
You can already see this in this example from Sig.ma where the URIs are replaced in the human view with the names of states etc.
http://sig.ma/search?pid=95b2be387166066bf7aa9c1a1c661611
So the human view is the string people are used to seeing, but hidden from view is the URI that ties that name to other data.
Respectfully,
- Pete
--------------------------------------------------------------- Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies Knowledge Base http://lod.geospecies.org/ About the GeoSpecies Knowledge Base http://about.geospecies.org/ ------------------------------------------------------------
participants (5): joel sachs, Paul Murray, Peter DeVries, Roderic Page, Steve Baskauf