Hi all,
having just joined this list, I find it a great idea to have such an
RDF Task Group.
I am going to publish a little species catalog for the Federal
Environment Agency in Germany as Linked Data, and I am looking for
the best way to express it in RDF.
In the Linked Data cloud [1] I find several related contributions,
such as Geospecies, TaxonConcept, EUNIS, and more.
Comparing these approaches I prefer the idea of reusing SKOS [2]
labels and hierarchical relations, as in the Geospecies example [3].
It might be a good idea to apply the SKOS XL extension as well to go
deeper into the taxon name properties.
Finally, I would add the taxon ranks as a distinct concept scheme
and link them to the taxon concepts with a mapping relation.
Certainly this is not the only way to go, but it is rather simple
and will be easily understood as SKOS is quite common in the Linked
Data community. This might be called "Simple Darwin Core" and give
room for more complex ontology approaches beyond that.
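A minimal sketch of the SKOS layout described above, written as plain Python that emits N-Triples. The example.org URIs, the helper names, and the choice of skos:relatedMatch as the "mapping relation" to the rank scheme are assumptions made for illustration; the SKOS XL part is not shown.

```python
# Sketch only: taxa as skos:Concept with labels and skos:broader, ranks as a
# separate concept scheme. All example.org URIs are hypothetical placeholders.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TAXA = "http://example.org/species/"   # hypothetical taxon concept scheme
RANKS = "http://example.org/ranks/"    # hypothetical rank concept scheme


def uri(value):
    """Wrap a URI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(text):
    """Quote a plain literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def taxon_triples(local_id, name, rank, parent_id=None):
    """Describe one taxon concept; parent_id links it to its broader taxon."""
    subject = uri(TAXA + local_id)
    triples = [
        (subject, uri(RDF_TYPE), uri(SKOS + "Concept")),
        (subject, uri(SKOS + "inScheme"), uri(TAXA + "scheme")),
        (subject, uri(SKOS + "prefLabel"), literal(name)),
        # Assumption: skos:relatedMatch stands in for the mapping relation.
        (subject, uri(SKOS + "relatedMatch"), uri(RANKS + rank)),
    ]
    if parent_id is not None:
        triples.append((subject, uri(SKOS + "broader"), uri(TAXA + parent_id)))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


catalog = (
    taxon_triples("puma", "Puma", "genus")
    + taxon_triples("puma-concolor", "Puma concolor", "species", parent_id="puma")
)
print(to_ntriples(catalog))
```

Keeping the ranks in their own scheme means the taxon hierarchy itself only needs skos:broader, which most SKOS-aware tools already understand.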
Looking forward to discussion,
Thomas
[1] http://lod-cloud.net
[2] http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
[3] http://lod.geospecies.org/ses/v6n7p.rdf
Peter J. DeVries
On 07.10.2010 19:29, Blum, Stan wrote:
Re: [tdwg-content] Idea for Discussion,Differentiating
between "type's" of identifiers
Hi Steve,
Sorry, I missed your message below (as well as your response
to Roger) before I sent my reply about the utility of an RDF
guide for DwC. Obviously, I think it’s a great idea. To do
this within the “normal” TDWG process, this should be done as
a Task Group. I could help you draft a charter for that,
which would then need to be reviewed by the TAG and Exec.
Once approved, we would put the charter up on the web site,
and do our best to provide any other resources that would help
speed the task. I don’t mean to slow you down. The Charter
doesn’t have to be elaborate. Its function is to let others
in TDWG and beyond know that this task is proceeding, who to
contact, how to get involved, etc. It also gives you the
backing of the TDWG community.
Let me know if you’d like to pursue this.
-Stan
On 10/7/10 7:41 AM, "Steve Baskauf" <steve.baskauf@vanderbilt.edu>
wrote:
I agree that it is best to avoid a
proliferation of terms and I agree that it is best to keep
Darwin Core technology independent to the maximum extent
possible. However, I think that the case of facilitating
HTTP URIs is a special one because of the requirements of
GUIDs/Persistent Identifiers. Both the TDWG and GBIF
guidelines such as they currently stand say that GUIDs must
be resolvable, that in their resolution they must return
RDF, and that the RDF has to be in an XML format. Like it
or not, that is what we have. Given the amount of time that
it seems to have taken to settle on that much, I think it is
best for us to decide to live with it, warts and all, rather
than re-opening the discussion and delaying the
implementation of GUIDs for another five years.
Given that assumption, there needs to be within Darwin Core
some way to support this particular "technology" (Linked
Data, RDF/XML) even if we don't do "special" things to
support other technologies such as LSID, DOI, etc. The
point is well taken that most of those other technologies
have mechanisms for turning their identifiers into URIs and
the aforementioned guidelines lay out how owl:sameAs can be
used within the RDF to associate the non-HTTP-resolvable
forms with the URIs. Based on my admittedly limited
experience with trying to write RDF using Darwin Core terms,
I think that in most cases there already exist appropriate
terms for getting the job done. What may be lacking are
concrete examples and community consensus on what terms to
use for what. I also think that there are probably some
"ID" terms where it isn't really very important (from an RDF
point of view) that there exist both a URI form and a text
string form. I'm thinking of something like
dwc:identificationID, which is most likely to be needed to
allow a machine to make a connection between some resource
and its identification. The machine isn't going to care if
there is a human-readable version. In contrast, something
like dwc:collectionID is likely to need both a URI version
(e.g. proxied version of the BCI LSID) for the machines and
a string version (the name of the collection as it would be
displayed) for humans. I think that trying to make
example/template RDF for various types of resources will
help make it clear in which cases one version (URI), the
other (string), or both are actually necessary.
I "volunteered" a couple weeks ago to have a go at writing
an RDF guide for Darwin Core. I am still willing to do
this, although I'm still getting caught up at work from
being at the TDWG meeting. However, next week we have fall
break and I will make it a priority to come up with a draft
which can be the subject of discussion. As a part of this
process, I think it would be good to create one or more
"boilerplate" RDF files for the various kinds of resources
that are likely to be identified with GUIDs (e.g.
Occurrences, Taxa, etc.). This can also be a subject of
discussion and I think it will help to clarify what will
meet the actual needs that we have discussed in this thread.
I have a pretty clear picture of what I think Occurrence
RDF should look like. I'm going to have to depend on Pete
and others to deal with the taxonomy part.
Steve
Markus Döring wrote:
Steve, Pete,
I'd like to draw your attention to a basic DarwinCore
design pattern. Dwc has the goal of being technology
independent by simply providing a list of abstract terms
one can use in various arenas such as xml, rdf, xhtml, csv
etc. And even within those there might be various ways of
using them (e.g. we have a normalised and a simple flat
xml schema), which is why we should have a guideline for each
of them on how to use them. We are missing such a
guideline for rdf currently, hence this debate.
Whether scientificName is a literal string or some complex
object shouldn't matter - it's defined to be a scientific
name. Such a dwc rdf property could either hold a literal
string or a url to some name rdf:resource (potentially
with a rdfs:label).
With the introduction of many ID terms we have diluted
that idea a little already in my mind. We could have as
well used scientificName in xml to hold some identifier
for that name. All URNs tell you what they are by their
urn prefix (not necessarily how to resolve them), so you
can easily detect a UUID, LSID, http(s) url, ftp, doi and
apply the conventional resolution mechanism. The hardest
problems are the local ids and other plain identifiers. For
those mainly we created the ID terms (at least in my
mind). I am feeling rather uncomfortable discussing the
introduction of specific dwc terms for each type of id.
Maybe we should remove all id terms in dwc and use the
specific guidelines to specify these? At least if you
really think having all those id terms for rdf is a good
thing I would feel much more comfortable going down this
route instead of diluting dwc by adding more and more
rather redundant terms. The abstract concept is key to a
dwc term, not the actual data type forced by the
technology you are using it with. Would you
want several date terms for various date formats? In fact
we do that already to some degree (eventDate, eventTime,
year, month, day, verbatimEventDate) and I always felt
this is not a good idea. There are also a number of
verbatimXXX terms in dwc which also contradict this
pattern.
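A minimal sketch of the prefix-based detection mentioned above: the identifier is classified by its scheme or prefix, and anything unrecognised is treated as a local id. The function name, the category labels, and the exact prefix list are assumptions made for illustration, not part of Darwin Core.

```python
import re

# A bare UUID has no scheme prefix, so it is recognised by its shape instead.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def classify_identifier(value):
    """Return a coarse identifier type based on the value's prefix or shape."""
    text = value.strip()
    lowered = text.lower()
    if lowered.startswith("urn:lsid:"):
        return "lsid"
    if lowered.startswith("urn:uuid:") or UUID_RE.match(text):
        return "uuid"
    if lowered.startswith(("http://", "https://")):
        return "http"
    if lowered.startswith("ftp://"):
        return "ftp"
    if lowered.startswith(("doi:", "info:doi/")):
        return "doi"
    # Everything else is a local or otherwise plain identifier.
    return "local"


examples = [
    "urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010",
    "http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8",
    "6c3dc35f-d901-5cc5-b9c8-ad241069b9f8",
    "specimen-0042",  # hypothetical local catalogue number
]
for identifier in examples:
    print(classify_identifier(identifier), identifier)
```

The "local" fallback is exactly the case the ID terms were created for: nothing in the value itself says how to resolve it.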
Talking about new dwc terms - in the examples given
a property like "hasScientificName" is not strictly the
correct dwc term, which is simply scientificName. I think
it would be fine to have the convention in the rdf
guidelines to use hasDwcTerm instead of dwcTerm, this is
exactly what an rdf guideline is for. On the flip side I
am sure this only applies to some terms, recordedBy for
example is likely to remain as it is. It's unclear to me
what is best to do really. Always stick to the original
dwc terms? Refine them through some rdfs or owl schema and
define the relation to the original term? Should we still
use the same namespace in this case?
As an rdf beginner, even after a few years of exposure, I wonder
if we can't simply stick to the non-ID terms and use them
either as literals or with a uri pointer. As in the rdf
world a resolvable http uri is really required for resource
relations to work, why not simply mandate this in the
guidelines? If you only happen to have non-resolvable uris
like lsids or dois, the guidelines should ask you to
use proxied versions, knowing it will break rdf frameworks
and lod conventions otherwise. On the resolving side one
could always include such urns with owl:sameAs (or something
similar), I believe. But how many non-resolvable ids with no
matching http counterpart are really out there yet?
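A minimal sketch of the proxying idea: an LSID is turned into an http uri by prepending a proxy base, and an owl:sameAs statement ties the proxied form back to the original urn. The proxy base below is a hypothetical placeholder, since the thread does not name a specific proxy, and the function names are illustrative.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
PROXY_BASE = "http://proxy.example.org/"  # hypothetical proxy, not a real service


def proxy_lsid(lsid, proxy_base=PROXY_BASE):
    """Return the http form of an LSID by prepending the proxy base."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    return proxy_base + lsid


def same_as_statement(lsid, proxy_base=PROXY_BASE):
    """Build one N-Triples line linking the proxied uri to the bare LSID."""
    return f"<{proxy_lsid(lsid, proxy_base)}> <{OWL_SAME_AS}> <{lsid}> ."


col_lsid = (
    "urn:lsid:catalogueoflife.org:taxon:"
    "24e7d624-60a7-102d-be47-00304854f810:ac2010"
)
print(same_as_statement(col_lsid))
```

Under this convention the data would carry only the proxied uri in rdf:resource positions, with the bare urn appearing solely as the object of owl:sameAs.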
- Markus
On Oct 6, 2010, at 9:02, Peter DeVries wrote:
Hi Steve,
You are probably right that it might be best to use
rdfs:label, but I am thinking we might be able to get
the same result by defining the string variants as
subproperties of rdfs:label.
This would make them an rdfs:label, but a special kind of
rdfs:label.
This is one of those things that I would test with
Sindice and URIburner to see if they interpret these
correctly.
This would require a live vocabulary that Sindice could
look at to determine that hasScientificName is to be
treated as an rdfs:label.
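A minimal sketch of what a consumer would have to do for the subproperty idea to work: apply the RDFS subproperty rule (if p is a subproperty of q and "s p o" holds, then "s q o" holds) so that hasScientificName values also surface as rdfs:label. Triples are plain tuples of prefixed names here; "ex:occurrence1" is a hypothetical resource, and this says nothing about what Sindice or URIburner actually do.

```python
RDFS_LABEL = "rdfs:label"
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def infer_superproperties(triples):
    """Add (s, q, o) for every (s, p, o) where p is a subproperty of q.

    The loop repeats until nothing new is added, so chains of
    subproperties are followed as well.
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = {(s, o) for s, p, o in inferred if p == SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in sub_props:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


def labels(triples, subject):
    """Return all rdfs:label values of a subject after inference."""
    return sorted(
        o for s, p, o in infer_superproperties(triples)
        if s == subject and p == RDFS_LABEL
    )


# The "live vocabulary": one statement declaring the subproperty.
vocabulary = [("dwc:hasScientificName", SUBPROPERTY_OF, RDFS_LABEL)]
# The data: uses only the specialised property.
data = [("ex:occurrence1", "dwc:hasScientificName", "Puma concolor (Linnaeus 1771)")]

print(labels(vocabulary + data, "ex:occurrence1"))
```

Without the vocabulary statement the label list is empty, which is why the vocabulary has to be published somewhere the consuming system can read it.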
- Pete
On Mon, Oct 4, 2010 at 10:41 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu>
wrote:
Although this specific example deals with taxonomic name
identifiers, it is related to a previous discussion on
this list about how we should use the dwc:xxxxxID terms
and other terms (such as recordedBy and identifiedBy)
that could have either a string (literal) or URI form.
Although I don't really want to see an unnecessary
proliferation of Darwin Core terms, I think that in the
interest of clarity (particularly where RDF is involved)
there either should be multiple terms that make it clear
what form of identifier is expected, or else there
should be an understanding that in RDF the default for
such a term is a URI which would then have an rdfs:label
property which was the string form. I think the former
would be preferable to the latter.
I came to this opinion when trying to write RDF
describing an herbarium specimen. The collector should
be the dwc:recordedBy property of the specimen.
Optimally, there would be a database in which known
collectors were assigned URIs so that "Glen N. Montz",
"Glen Montz", "G. N. Montz", etc. would all be different
labels for the same resource. However, realistically,
I'm not going to drop what I'm doing to set up such a
database (even if I were capable of doing it, which I'm
not). So I ended up just writing it as
<dwc:recordedBy>Glen N. Montz</dwc:recordedBy>
even though I knew it probably wasn't the best thing. In a large Occurrence
database that was compiled from the RDF created by a lot
of people, there might end up being a mixture of strings
and URIs for dwc:recordedBy properties of the specimens.
It seems to me like it would be better to have
properties like dwc:recordedBy for strings and
dwc:recordedByURI for a corresponding URI (and I suppose
dwc:recordedByLSID if anyone wants to use it). Of course, this
would require a number of term additions to DwC and
clarification in the DwC documentation that the generic
version was intended for strings.
With respect to the example
<dwc:hasScientificNameLSID
rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
I think you are right that (with the possible exception
of rdfs:seeAlso) there is an expectation that an
rdf:resource attribute will be a resolvable URI that
produces RDF. So
<dwc:hasScientificNameLSID>urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010</dwc:hasScientificNameLSID>
is probably better.
Steve
Peter DeVries wrote:
I have been thinking about the following pattern. In
part after looking at the GBIF vocabulary.
I am not sure if it is even a good idea but might be
worth some discussion.
For those fields that have both a string and "ID" form
maybe the following pattern might be useful
hasScientificName = string form
hasScientificNameURI = Resolvable LOD compliant
identifier
hasScientificNameLSID = LSID identifier which could be
resolvable once you add the "http:proxy"
etc.
This allows all three forms to be included if desired,
it also provides a hint as to how the field should be
interpreted or resolved.
One group could also provide a mapping service so that
each record does not need to include all three forms,
but would allow systems
to find the matching LSID for a given URI or vice versa.
My concern was that it would be difficult to infer how
a scientificNameID should be interpreted by other
systems.
Is this an LSID, is it a URI, is it a UUID, etc.?
This impacts the structure of the RDF.
* Note that the actual identifiers might not be
correct, the example below is more about the form of
the RDF
* For instance, it is probably not correct
to see the COL LSID as just a namestring
* Also in this example the GNI name does not exactly
match the string name
<dwc:hasScientificName>Puma concolor (Linnaeus 1771)</dwc:hasScientificName>
<dwc:hasScientificNameURI rdf:resource="http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8"/>
<dwc:hasScientificNameLSID
rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
Some systems may choke on the LSID form, assuming that
it uses a standard resolution mechanism.
So it might be best to use this form:
<dwc:hasScientificNameLSID>urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010</dwc:hasScientificNameLSID>
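A minimal sketch of generating the three-property pattern above as RDF/XML with Python's standard library. The string form and the LSID are written as literals and only the resolvable URI goes into rdf:resource. The hasScientificName* properties are the proposed names from this message, not existing Darwin Core terms; placing them in the dwc namespace, the subject URI, and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dwc", DWC)


def describe_name(about, name=None, name_uri=None, name_lsid=None):
    """Build an rdf:RDF element carrying whichever name forms are given."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
    if name is not None:
        # String form: a plain literal.
        ET.SubElement(desc, f"{{{DWC}}}hasScientificName").text = name
    if name_uri is not None:
        # Resolvable URI form: the only one expressed as rdf:resource.
        ET.SubElement(
            desc, f"{{{DWC}}}hasScientificNameURI", {f"{{{RDF}}}resource": name_uri}
        )
    if name_lsid is not None:
        # LSID form: kept as a literal so consumers do not try to dereference it.
        ET.SubElement(desc, f"{{{DWC}}}hasScientificNameLSID").text = name_lsid
    return root


root = describe_name(
    "http://example.org/occurrence/1",  # hypothetical subject
    name="Puma concolor (Linnaeus 1771)",
    name_uri="http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8",
    name_lsid="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010",
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because each form is optional, a record can carry one, two, or all three without changing the code that reads it.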
- Pete
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base / GeoSpecies Knowledge
Base
About the GeoSpecies Knowledge Base
--
Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491