Why we should not use LSID
Note that part of my concern about the use of "concept" when talking about classes/properties/data elements is that I more and more believe we will want to use ontology reasoners for uses other than software design, i.e. as part of what we currently consider data (taxon names, concepts, rank hierarchy, parts of organisms, properties of organisms, etc.). All these are ontological concepts, and efforts such as www.plantontology.org do use OWL to reason on them.
The SDD presentation (the one not held in EDI, attached) contained some examples of how we might want to query our data - in ways that OWL-for-software-design seems not to cover - and which using LSIDs would even prevent.
Please discuss:
http://wiki.tdwg.org/twiki/bin/view/TAG/WhyWeShouldNotUseLSIDs
http://wiki.tdwg.org/twiki/bin/view/TAG/UsePURLsAsGUIDs
Gregor
Gregor Hagedorn (G.Hagedorn@bba.de) Institute for Plant Virology, Microbiology, and Biosafety Federal Research Center for Agriculture and Forestry (BBA) Königin-Luise-Str. 19 Tel: +49-30-8304-2220 14195 Berlin, Germany Fax: +49-30-8304-2203
---- File information ----------- File: SDD-TAG1.ppt Date: 23 Apr 2006, 18:10 Size: 1056768 bytes. Type: Unknown
Dear Gregor,
For the benefit of those not at TAG 1, can you please explain why "LSIDs are not interoperable with semantic web technologies"?
Regards
Rod
On 2 May 2006, at 16:44, Gregor Hagedorn wrote: [...]
Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/ Find out what we know about a species: http://ispecies.org Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Hi Rod,
From the meeting report - which I am struggling to get back to - these two bullet points sum it up I think
· There are certain things for which LSIDs are not appropriate. It would be legal to use them for RDF resource identifiers for controlled vocabularies and XML Schema locations BUT we would have to extend existing software libraries to do this which is not desirable.
· Recommendation: LSIDs are not used for controlled vocabularies, ontologies or XML Schema locations. LSIDs should be used to refer to instances.
Basically it was felt that if we used LSIDs for things like rdfs:Class definitions then any library that went off to fetch the definitions automatically would have to be extended so that it understood LSID resolution. On the other hand it was felt that use of LSIDs for real resources (things we are actually describing like specimens and people) was fine. Once an ontology is loaded then it is all fine though so to an extent this may be a false problem.
We spent a long time talking about what is part of the ontology and what isn't and went round in circles (please let's not do it again). Basically class and property descriptions should be URL-type URIs but instance URIs can be LSIDs. If you want to define the genus Rhododendron as being an OWL DL class retrieved remotely then you should probably give it a URL. If you want to define it as a data item then use an LSID.
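As a rough illustration of the split described above (not anything agreed at the meeting), the following Python sketch shows why a client that only speaks HTTP can fetch a URL-type class URI directly but needs extra resolution logic for an LSID. The example URIs and the function names are assumptions made up for this sketch; only the `urn:lsid:authority:namespace:object[:revision]` layout comes from the LSID syntax itself.

```python
from urllib.parse import urlparse


def parse_lsid(uri):
    """Split an LSID into its parts, or return None if uri is not an LSID.

    Expected layout: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
    """
    parts = uri.split(":")
    if len(parts) not in (5, 6):
        return None
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        return None
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def http_only_client_can_fetch(uri):
    """True if a library that only understands HTTP could dereference uri as-is."""
    return urlparse(uri).scheme in ("http", "https")


# Hypothetical identifiers, for illustration only.
class_uri = "http://example.org/ontology#Genus"
instance_uri = "urn:lsid:example.org:specimens:12345"

for uri in (class_uri, instance_uri):
    print(uri, http_only_client_can_fetch(uri), parse_lsid(uri))
```

Under these assumptions the class URI is fetchable by any HTTP library, while the instance URI is recognised as an LSID and would have to be handed to an LSID-aware resolver first.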
I think Gregor's worries (correct me if I am wrong Gregor) are that in SDD (possibly our whole domain) many things could be considered classes and properties, i.e. things you want your reasoner to use in the reasoning rather than simply reason about. In this case it may be better to have URLs for everything.
There is a niggling doubt (in my mind) that we may come across 'cool' tools and libraries that assume that all resource URIs are URLs and that we would not be able to use them or would need to extend them if we use LSIDs. Imagine a semantic web browser where you click on a node and it fetches the associated resource to expand itself.
I do occasionally struggle to see the advantages of LSIDs as GUIDs over just conventions for use of URLs but these may be matters of personal faith. Another bullet point in the report says:
· Recommendation: GUIDs Group should issue a document clearly justifying adoption of GUID technology. The advantages need to be clearly explained.
I'll try and get this report out ASAP but it looks very similar to the wiki page here:
http://wiki.tdwg.org/twiki/bin/view/TAG/TagMeeting1ReportDraft
Obviously would be grateful for your thoughts.
Roger
Tdwg-tag mailing list Tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
Thanks, Roger, for the clarification from TAG-1.
I suspect that we still have some confusion between what should be part of the biodiversity ontology and what is actually data described by that ontology. (Of course this is not at all surprising, since the boundary has to be arbitrary.)
As I understand the TAG meeting's concerns, we should ensure that software libraries are able to access any definition elements required for them to validate the form and content of exchange documents, and that it would be foolish to use LSIDs in such situations.
Considering SDD as an example, my interpretation of this is that the RDFS or other definitions for classes such as Character, State or Modifier should be accessible through URLs. This does not mean that the instances of these classes (i.e. what I might call the individual SDD data elements) need themselves to be accessible in the same way.
I do understand that Gregor is concerned about our ability to reason over the data as well as to validate the underlying documents. However, if we do choose to use technologies such as OWL to support such reasoning, I do not believe that we can expect to reason over fully federated data. Surely we would expect to resolve the data (through LSIDs, PURLs, or whatever else) and then process them locally? We should certainly consider whether this is an issue, but we should keep it separate from the main issue identified by the TAG.
I also suspect that LSIDs may be a really good way for us to handle many of our controlled vocabularies. Obviously those vocabularies which make up the definition of classes and their properties may need URL access, but in many other contexts (including, I would have thought, SDD) it may be more sensible to treat the vocabulary terms as data objects. This will allow us to extend them with all kinds of metadata.
I would say that the major reason the GUID meetings avoided adopting PURLs was simply that they give us no clean separation between the identifier and the owner and location of the document. LSIDs (provided we sort out appropriate best practices for how they are constructed) may, among other things, give us an intermediate layer we can conveniently manage to handle this.
Thanks,
Donald
Donald Hobern (dhobern@gbif.org) Programme Officer for Data Access and Database Interoperability Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roger Hyam Sent: 03 May 2006 12:29 To: Roderic Page Cc: tdwg-tag@lists.tdwg.org Subject: Re: [Tdwg-tag] Why we should not use LSID
[...]
Donald wrote:
> Considering SDD as an example, my interpretation of this is that the RDFS or other definitions for classes such as Character, State or Modifier should be accessible through URLs. This does not mean that the instances of these classes (i.e. what I might call the individual SDD data elements) need themselves to be accessible in the same way.
> I do understand that Gregor is concerned about our ability to reason over the data as well as to validate the underlying documents. However, if we do choose to use technologies such as OWL to support such reasoning, I do not believe that we can expect to reason over fully federated data. Surely we would expect to resolve the data (through LSIDs, PURLs, or whatever else) and then process them locally? We should certainly consider whether this is an issue, but we should keep it separate from the main issue identified by the TAG.
I fail to understand the distinction between class and instance. So far I understand that it is arbitrary, depending on the purpose. The class "SDD.Character" is expressed in an xml instance document (a w3c schema document, using instances of classes expressed in the w3c-schema-schema). So is the class "flower color" which is an instance (or "data" in Donald's sense) expressed in an SDD instance document.
However, a taxon description makes use of "flower color" in the sense of a class definition, i.e. all descriptions using this term with a specific value are instances of the class flower color.
"Flower color" can be generalized to "color of flower-like structure". This often makes a lot of sense, e.g. in Compositae (sunflower, etc.) many people will give an answer about the inflorescence rather than about an individual flower, and even botany students get confused about the cyathium of Euphorbia. So we do want to make use of reasoning engines when processing taxon identification queries.
Exactly the same generalization relationships hold for taxa.
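To make the generalization argument concrete, here is a minimal Python sketch of the kind of inference meant. It is not an OWL reasoner, only a hand-rolled walk up a "broader term" table. The table contents are assumptions for illustration: "inflorescence color" is a made-up sibling term, and the Rhododendron/Ericaceae entry simply stands in for a taxon hierarchy.

```python
# Hypothetical "broader term" table mixing descriptive terms and taxa.
BROADER = {
    "flower color": "color of flower-like structure",
    "inflorescence color": "color of flower-like structure",
    "Rhododendron": "Ericaceae",
}


def generalizations(term, broader=BROADER):
    """Return every broader term reachable from term, nearest first."""
    seen = []
    while term in broader and broader[term] not in seen:
        term = broader[term]
        seen.append(term)
    return seen


def matches(query_term, described_term, broader=BROADER):
    """True if a query on query_term should find data recorded under described_term."""
    return query_term == described_term or query_term in generalizations(described_term, broader)


print(matches("color of flower-like structure", "flower color"))
print(matches("flower color", "color of flower-like structure"))
print(generalizations("Rhododendron"))
```

A query phrased at the general level finds descriptions recorded under the narrower term, but not the other way round. A standard reasoner would do this (and much more) once the terms are available to it as classes, which is the point at issue in this thread.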
What I am trying to convey is my perception that those charged with developing the GBIF software make the class/instance distinction from their own perspective, which is a good thing. However, by requiring LSIDs for what is currently perceived as an instance from the software development perspective, we might be going down a path that prevents us from ever changing that perspective. If my information is correct, this would prevent exactly the use of standard reasoners to answer questions such as those I posed in my talk prepared for TAG-1 (which I cannot post to the email list, as only 200 kB is allowed).
> I also suspect that LSIDs may be a really good way for us to handle many of our controlled vocabularies. Obviously those vocabularies which make up the definition of classes and their properties may need URL access, but in many other contexts (including, I would have thought, SDD) it may be more sensible to treat the vocabulary terms as data objects. This will allow us to extend them with all kinds of metadata.
> I would say that the major reason the GUID meetings avoided adopting PURLs was simply that they give us no clean separation between the identifier and the owner and location of the document. LSIDs (provided we sort out appropriate best practices for how they are constructed) may, among other things, give us an intermediate layer we can conveniently manage to handle this.
Can you explain this? I believe a PURL does exactly this. If I have purl.org/xyz it is an id, but the document may be anywhere and may in fact move.
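The indirection described here can be sketched in a few lines of Python. This is only a toy model of a redirect table, not how purl.org is actually implemented. The class name and the host names are assumptions; `purl.org/xyz` is the example identifier from the message.

```python
class RedirectTable:
    """Toy model of a PURL-style service: stable identifier -> current location."""

    def __init__(self):
        self._targets = {}

    def register(self, persistent_id, location):
        # Registering again under the same identifier models the document moving.
        self._targets[persistent_id] = location

    def resolve(self, persistent_id):
        return self._targets[persistent_id]


table = RedirectTable()
table.register("purl.org/xyz", "http://host-a.example/docs/xyz.rdf")
first = table.resolve("purl.org/xyz")

# The document moves to another server; the identifier itself does not change.
table.register("purl.org/xyz", "http://host-b.example/archive/xyz.rdf")
second = table.resolve("purl.org/xyz")

print(first, second)
```

The sketch only shows that the identifier is decoupled from the document's location. It says nothing about who owns or operates the redirect service, which may be the part of the objection that still needs explaining.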
Thanks!
Gregor
Folks,
I think that this problem could be solved by using an LSID HTTP proxy, such as the Biopathways LSID Resolver (http://lsid.biopathways.org/resolver/), to allow clients that are only aware of HTTP to resolve LSIDs without any additional software.
Such a proxy accepts HTTP GET/POST requests following some well-known rules (such as these: http://lsid.biopathways.org/resolver/weblinks.shtml), resolves the LSID passed as a parameter, and returns either the data or metadata associated with the LSID to the original client. Here is one example:
http://lsid.biopathways.org/resolver/metadata/urn:lsid:lsid.tdwg.gbif.org:st...
The result is very similar, if not equivalent, to what you would have with PURL, without having to give up on the benefits of LSIDs.
The problem then would be to choose whether to use the pure LSIDs or the proxy link. One idea would be to use the proxy links to identify ontology classes and predicates. Also, LSID-aware clients could extract the LSID from the proxy link with one additional well-known rule.
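A minimal Python sketch of the two directions mentioned above, under stated assumptions: the proxy prefix is taken from the pattern of the example link in this message, the LSID used is made up, and the extraction rule (take everything from the first `urn:lsid:` onwards) is only one possible "well-known rule", not an agreed convention.

```python
# Prefix inferred from the example link above; treat it as an assumption.
PROXY_METADATA_BASE = "http://lsid.biopathways.org/resolver/metadata/"


def to_proxy_link(lsid, base=PROXY_METADATA_BASE):
    """Build an HTTP link that an HTTP-only client could follow for this LSID."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError("not an LSID: %r" % lsid)
    return base + lsid


def extract_lsid(link):
    """Recover the LSID embedded in a proxy link, or None if there is none.

    Hypothetical rule: the LSID is everything from the first 'urn:lsid:' on.
    """
    start = link.lower().find("urn:lsid:")
    return link[start:] if start != -1 else None


lsid = "urn:lsid:example.org:names:42"  # made-up identifier
link = to_proxy_link(lsid)
print(link)
print(extract_lsid(link))
```

With a rule like this, HTTP-only tools follow the proxy link as an ordinary URL, while LSID-aware clients can strip the prefix and resolve the LSID natively.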
Besides the proxy I mentioned for LSID, there are well developed HTTP proxies for both DOI and ARK as well. There is really good documentation about the DOI bridge on the DOI site.
Regards,
Ricardo
Participants (5): Donald Hobern, Gregor Hagedorn, Ricardo Scachetti Pereira, Roderic Page, Roger Hyam