What you have suggested (looking at known RDF providers) can and
certainly should be done. However, looking at the sources we already
know about isn't going to help us discover the sources we don't know
about. That's what the survey is for. Also, the list you provided is
(as is typical for TDWG) heavy on taxonomy and light on the broad range
of resources we want to be relating (Occurrences, Identifications,
Locations, genes, DNA sequences, etc.). Hopefully we can discover more
examples of how people are modeling those things in RDF.
Dear Joel,
I guess part of the problem with surveys is that they require
people to do work without any immediately obvious benefit to them.
Couldn't many of the questions here be tackled simply by looking
at the RDF people are pumping out, given that it's essentially
self-describing?
For example, harvest a record from each of the major RDF users
(uBio, ION, IPNI, Index Fungorum, ZooBank, CoL, WoRMS, ALA, UniProt,
etc.) and:
1. build list of vocabularies used
2. list shared vocabularies
3. look for any shared identifiers
Then the task is to discover what is unique to a particular
project, and what vocabularies are used for the same things (why oh why
do we have two different sets of vocabularies for taxonomic names, for
example?).
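The three steps above can be sketched in a few lines of Python. This is only a rough illustration: the provider names, URIs, and triples below are made up, and it assumes each harvested record has already been parsed into (subject, predicate, object) string tuples by an RDF parser of your choice. A vocabulary is approximated here as the predicate URI up to its last '#' or '/'.

```python
from collections import defaultdict

def namespace(uri):
    """Return the URI up to and including its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1] if cut >= 0 else uri

def vocabularies(triples):
    """Step 1: namespaces of the predicates used in one provider's record."""
    return {namespace(p) for _, p, _ in triples}

def shared(per_provider):
    """Steps 2 and 3: keep only the items used by two or more providers."""
    users = defaultdict(set)
    for provider, items in per_provider.items():
        for item in items:
            users[item].add(provider)
    return {item: provs for item, provs in users.items() if len(provs) > 1}

# Hypothetical records, one per provider (not real harvested data).
records = {
    "providerA": [
        ("http://a.example/name/1", "http://purl.org/dc/terms/title", "Aus bus"),
        ("http://a.example/name/1", "http://a.example/terms#rank", "species"),
    ],
    "providerB": [
        ("http://b.example/taxon/9", "http://purl.org/dc/terms/title", "Aus bus"),
        ("http://b.example/taxon/9", "http://b.example/ns/hasName",
         "http://a.example/name/1"),
    ],
}

vocabs = {prov: vocabularies(t) for prov, t in records.items()}
shared_vocabs = shared(vocabs)

# Treat any subject or object that looks like an HTTP URI as an identifier.
ids = {
    prov: {x for s, _, o in t for x in (s, o) if x.startswith("http")}
    for prov, t in records.items()
}
shared_ids = shared(ids)

print(shared_vocabs)
print(shared_ids)
```

Whatever is left in each provider's set after removing the shared items is what is unique to that project.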
My recent grumbles about RDF mainly concern the lack of shared
identifiers, which is harder to assess because you need to look at
multiple records. But my guess is that very few RDF providers reuse
other providers' identifiers. People who aggregate may well end up
using lots of identifiers to cope with this, but ideally we'd want
taxonomic concept databases (e.g., CoL) to use nomenclator identifiers
for names (e.g., ION, ZooBank, IPNI, Index Fungorum), and existing
identifiers for literature (e.g., DOIs, PubMed). So, the question is,
how connected is the biodiversity data web? (not very, is the short
answer). It shouldn't be too hard to quantify this, and identify what
needs to be done to fix the problem (for example, what links could we
add to make the biodiversity data cloud coalesce more rapidly?).
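One possible way to put a number on "how connected" is to treat identifiers as nodes and cross-references between them as links, then count the connected components. The sketch below does that with a small union-find. The identifiers and links are invented for illustration, and the choice of metric (share of identifiers in the largest component) is just one assumption among several reasonable ones.

```python
def components(nodes, links):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

# Hypothetical identifiers from different providers (not real records).
nodes = [
    "http://concepts.example/taxon/1",
    "http://names.example/name/1",
    "http://literature.example/article/1",
    "http://othernames.example/name/2",
    "http://species.example/taxon/3",
]

# Hypothetical links: the taxon concept reuses a name identifier and a
# literature identifier; the other two records link to nothing.
links = [
    ("http://concepts.example/taxon/1", "http://names.example/name/1"),
    ("http://concepts.example/taxon/1", "http://literature.example/article/1"),
]

comps = components(nodes, links)
largest_share = max(len(c) for c in comps) / len(nodes)

print(len(comps), "components")
print("share of identifiers in largest component:", largest_share)
```

Fewer components and a larger share in the biggest one would mean a better connected data web. Re-running the count after adding a candidate set of links gives a rough idea of which links would do the most to pull things together.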
What we're seeing outside biodiversity is the growth of services
that say identifier x is the same as identifier y (e.g.,
http://sameas.org/
and
http://identifiers.org/),
which reflects the adage that there's no problem in computer science
that can't be solved by adding a layer of indirection. I'd argue that
this is symptomatic of a failure to reuse identifiers -- which one
could also argue is a natural consequence of the way we use the
web, but that's another argument ;)
Regards
Rod
On 16 Nov 2011, at 04:18, joel sachs wrote:
Hi Everyone,
One of the action items out of the kick-off meeting of the new RDF/OWL
best practices group was a questionnaire with two purposes: i) to assess
people's expectations and wishes for the group; and ii) to serve as an RDF
audit, a partial "state of the semantic web in TDWG".
The questionnaire is at
http://code.google.com/p/tdwg-rdf/wiki/Survey
All are encouraged to participate (especially the 20 or so people who
volunteered to do so at the kick-off).
Filling out the survey myself, I realized that it can actually take a lot
of time to present and explain an organization's RDF. So please, don't be
shy about answering in parts. Take ten minutes to provide some
pointers now, and then return later to provide more detailed
explanations.
For the first ten people to complete the survey, I will buy beer at
iEvoBio 2012 (or whenever I next see you).
Thanks!
Joel.
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content