RDF/OWL - Please answer these questions.
Hi Everyone,
One of the action items out of the kick-off meeting of the new RDF/OWL best practices group was a questionnaire with two purposes: i) to assess people's expectations and wishes for the group; and ii) to serve as an RDF audit, a partial "state of the semantic web in TDWG".
The questionnaire is at http://code.google.com/p/tdwg-rdf/wiki/Survey
All are encouraged to participate (especially the 20 or so people who volunteered to do so at the kick-off).
Filling out the survey myself, I realized that it can actually take a lot of time to present and explain an organization's RDF. So please, don't be shy about answering in parts. Take ten minutes to provide some pointers now, and then return later to provide more detailed explanations.
For the first ten people to complete the survey, I will buy beer at iEvoBio 2012 (or whenever I next see you).
Thanks! Joel.
Dear Joel,
I guess part of the problem with surveys is that they require people to do work without any immediately obvious benefit to them.
Couldn't many of the questions here be tackled simply by looking at the RDF people are pumping out, given that it's essentially self-describing?
For example, harvest a record from each of the major RDF users (uBio, ION, IPNI, Index Fungorum, ZooBank, CoL, WoRMS, ALA, UniProt, etc.) and:
1. build list of vocabularies used
2. list shared vocabularies
3. look for any shared identifiers
One quick way to do this would be to look at the cache for http://lsid.tdwg.org (assuming it has one). If tdwg.org doesn't have a cache I have one at http://darwin.zoology.gla.ac.uk/~rpage/lsid/tester/cache/ (and another one at http://bioguid.info ). This will give a good sense of what is actually being used (rather than what people say is being used).
Then the task is to discover what is unique to a particular project, and what vocabularies are used for the same things (why oh why do we have two different sets of vocabularies for taxonomic names, for example?).
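The three harvesting steps listed earlier in this message could be sketched roughly as follows. This is only an illustration, assuming each provider's record has already been parsed into (subject, predicate, object) triples; the provider names, the example.org/example.net URIs, and the `audit` helper are all made up for the example, and taking "everything up to the last # or /" as the vocabulary namespace is a heuristic, not a rule.

```python
from collections import defaultdict

def namespace(uri):
    """Heuristic: the vocabulary is everything up to the last '#' or '/'."""
    for sep in ("#", "/"):
        if sep in uri:
            return uri.rsplit(sep, 1)[0] + sep
    return uri

def audit(records):
    """records maps a provider name to a list of (s, p, o) triples."""
    # Step 1: vocabularies used by each provider (taken from predicates).
    vocabularies = {
        provider: {namespace(p) for _, p, _ in triples}
        for provider, triples in records.items()
    }
    # Step 2: vocabularies used by more than one provider.
    users = defaultdict(set)
    for provider, spaces in vocabularies.items():
        for ns in spaces:
            users[ns].add(provider)
    shared_vocab = {ns: ps for ns, ps in users.items() if len(ps) > 1}
    # Step 3: identifiers (subjects or objects) seen in more than one provider.
    seen = defaultdict(set)
    for provider, triples in records.items():
        for s, _, o in triples:
            for term in (s, o):
                if term.startswith(("http://", "https://", "urn:", "doi:")):
                    seen[term].add(provider)
    shared_ids = {i: ps for i, ps in seen.items() if len(ps) > 1}
    return vocabularies, shared_vocab, shared_ids

# Hypothetical records; none of these URIs come from a real provider.
records = {
    "providerA": [
        ("urn:lsid:example.org:names:1", "http://purl.org/dc/terms/title", "Aus bus"),
        ("urn:lsid:example.org:names:1", "http://example.org/vocabA#publishedIn", "doi:10.1234/example"),
    ],
    "providerB": [
        ("http://example.net/taxon/9", "http://purl.org/dc/terms/title", "Aus bus"),
        ("http://example.net/taxon/9", "http://example.net/vocabB/hasName", "urn:lsid:example.org:names:1"),
    ],
}

vocabularies, shared_vocab, shared_ids = audit(records)
print(sorted(shared_vocab))
print(sorted(shared_ids))
```

In this toy data the two providers share one vocabulary and one identifier; with one real record per provider the same three outputs would answer much of the audit part of the survey.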
My recent grumbles about RDF mainly concern the lack of shared identifiers, which is harder to assess because you need to look at multiple records. But my guess is that very few RDF providers reuse other providers' identifiers. People who aggregate may well end up using lots of identifiers to cope with this, but ideally we'd want taxonomic concept databases (e.g., CoL) to use nomenclator identifiers for names (e.g., ION, ZooBank, IPNI, Index Fungorum), and existing identifiers for literature (e.g., DOIs, PubMed). So, the question is, how connected is the biodiversity data web? (not very, is the short answer). It shouldn't be too hard to quantify this, and identify what needs to be done to fix the problem (for example, what links could we add to make the biodiversity data cloud coalesce more rapidly?).
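One possible way to put a number on "how connected", sketched under several assumptions: each provider is taken to mint identifiers under a known prefix, a link counts as cross-provider when its target carries another provider's prefix, and providers are grouped into clusters by those links. The provider names, prefixes, and links below are hypothetical, and `owner`, `connectivity`, and `components` are illustrative helpers rather than any existing tool.

```python
from collections import defaultdict

def owner(identifier, prefixes):
    """Return the provider whose identifier prefix matches, or None."""
    for provider, prefix in prefixes.items():
        if identifier.startswith(prefix):
            return provider
    return None

def connectivity(links, prefixes):
    """links is an iterable of (source_provider, target_identifier) pairs.

    Returns the fraction of links that reuse another provider's identifier,
    plus the set of provider pairs joined by at least one such link.
    """
    total = cross = 0
    edges = set()
    for source, target in links:
        total += 1
        target_owner = owner(target, prefixes)
        if target_owner is not None and target_owner != source:
            cross += 1
            edges.add(frozenset((source, target_owner)))
    return (cross / total if total else 0.0), edges

def components(providers, edges):
    """Group providers into connected clusters with a small union-find."""
    parent = {p: p for p in providers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edges:
        a, b = tuple(edge)
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for p in providers:
        groups[find(p)].add(p)
    return list(groups.values())

# Hypothetical providers and links, for illustration only.
prefixes = {
    "names": "urn:lsid:names.example:",
    "concepts": "http://concepts.example/",
    "literature": "doi:",
}
links = [
    ("concepts", "urn:lsid:names.example:42"),   # reuses a name identifier
    ("concepts", "http://concepts.example/7"),   # internal link
    ("names", "http://names.example/ref/1"),     # locally minted literature id
    ("literature", "doi:10.1234/x"),             # internal link
]

ratio, edges = connectivity(links, prefixes)
clusters = components(prefixes, edges)
print(ratio, clusters)
```

The fewer clusters and the higher the ratio, the more the data cloud has coalesced; the links whose targets match no known prefix are the candidates for replacement with shared identifiers.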
What we're seeing outside biodiversity is the growth of services that say identifier x is the same as identifier y (e.g., http://sameas.org/ and http://identifiers.org/), which reflects the adage that there's no problem in computer science that can't be solved by adding a layer of indirection. I'd argue that this is symptomatic of a failure to reuse identifiers -- which one could also argue is a natural consequence of the way we use the web, but that's another argument ;)
Regards
Rod
On 16 Nov 2011, at 04:18, joel sachs wrote:
[quoted message snipped; it appears in full at the top of this thread]
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Rod, What you have suggested (looking at known RDF providers) can and certainly should be done. However, looking at the sources we already know about isn't going to help us discover the sources we don't know about. That's what the survey is for. Also, the list you provided is (as is typical for TDWG) heavy on taxonomy and light on the broad range of resources we want to be relating (Occurrences, Identifications, Locations, genes, DNA sequences, etc.). Hopefully we can discover more examples of how people are modeling those things in RDF.
Steve
Dear Steve,
Sure, it is taxonomy heavy. A number of the other resources of interest have RDF sources that aren't managed by members of this community, so I'm not sure they'll be captured by this survey (unless people list what they consume, or may consume, as well as what they serve).
There are several vocabularies for sequences, genes, etc. that are used by services such as Bio2RDF, for example (sadly these ignore much of the stuff that is of interest in biodiversity, such as the link to the specimen).
Maybe a better way to do this is to have a tool where people submit a URL: it fetches the RDF, reports which vocabularies are used, and creates a graph or table of how those relate to other vocabularies.
If RDF is as easy and as self-describing as we think it is, this survey should be a simple task that could be done using RDF tools, or am I missing something...?
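A very rough sketch of the core of such a tool, using only the Python standard library. It assumes the URL can return N-Triples when asked for it, which many providers will not do (RDF/XML is common), so a real tool would want a proper RDF parsing library; the regular expression only handles simple one-triple-per-line input. The function names and the sample data are invented for illustration.

```python
import re
import urllib.request
from collections import Counter

# Matches "<subject> <predicate> ..." or "_:bnode <predicate> ..." and
# captures the predicate IRI. Simple N-Triples lines only.
TRIPLE_START = re.compile(r"^\s*(?:<[^>]*>|_:\S+)\s+<([^>]*)>")

def namespace(uri):
    """Heuristic: the vocabulary is everything up to the last '#' or '/'."""
    for sep in ("#", "/"):
        if sep in uri:
            return uri.rsplit(sep, 1)[0] + sep
    return uri

def fetch_ntriples(url):
    """Fetch a URL, asking for N-Triples (assumes the server supports it)."""
    request = urllib.request.Request(url, headers={"Accept": "application/n-triples"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")

def report_vocabularies(ntriples_text):
    """Count how many triples use a predicate from each vocabulary."""
    counts = Counter()
    for line in ntriples_text.splitlines():
        match = TRIPLE_START.match(line)
        if match:
            counts[namespace(match.group(1))] += 1
    return counts

# Made-up sample record standing in for a fetched document.
sample = """\
<http://example.org/name/1> <http://purl.org/dc/terms/title> "Aus bus" .
<http://example.org/name/1> <http://purl.org/dc/terms/creator> "Smith" .
<http://example.org/name/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/vocab#Name> .
"""

for vocabulary, uses in report_vocabularies(sample).most_common():
    print(uses, vocabulary)
```

Running `report_vocabularies(fetch_ntriples(url))` over one URL per provider and lining up the resulting counts would give the table of who uses which vocabulary.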
Regards
Rod
On 16 Nov 2011, at 15:54, Steve Baskauf wrote:
[quoted messages snipped; they appear in full earlier in this thread]
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
participants (3)
- joel sachs
- Roderic Page
- Steve Baskauf