[tdwg-content] Use of dwc:xxxxID terms in RDF, was Re: dwc:associatedOccurrences

John Wieczorek tuco at berkeley.edu
Wed Sep 8 00:51:35 CEST 2010

Comments inline.

On Sun, Sep 5, 2010 at 11:16 PM, Markus Döring <m.doering at mac.com> wrote:

> Steve,
> I was about to fix the xsl when I thought we might first want to decide on
> what the whole dwc "site" should be offering.
> Currently we have:
> - normative html definitions for all current and historical term versions
> - 2 guidelines (xml & text) on how to use dwc terms with the respective
> technology including xml schemas and other files needed for that.
> - dwcterms.rdf and dwctermshistory.rdf for rdf definitions of the terms.
> Primarily these are intended to support the use of dwc terms in rdf and
> allow a resolution of the term URIs into rdf.
> It would be really nice if you or someone else would add a new guideline on
> how to use dwc terms in the context of rdf.
> For that I am wondering if dwctermshistory.rdf is needed at all or whether
> its not good enough to maintain the html version only.
> And in the light of having already normative html versions I support the
> idea of removing the xsl stylesheet from the rdf files.

A bit of history. The dwctermshistory.rdf is meant to be a complete and
single normative document of the Darwin Core terms whether current or
obsolete. The term description content in HTML on the Darwin Core site can
be generated from the dwctermshistory.rdf file, but not vice versa. I would
not remove it.
The dwcterms.rdf file was created for convenience. It is meant to contain
only what is necessary for the Darwin Core as it currently stands, not all
of the history that led up to its current form. If you don't need to know
about how something came to be in its current form, dwcterms.rdf should be
sufficient.

I'm neutral about the xsl stylesheet, because I don't use it or maintain it.
I think it might be useful to have a functional XSL stylesheet as one of the
tools available from the Tools and Applications section of the Darwin Core
wiki (http://code.google.com/p/darwincore/wiki/ToolsAndApplications), as
there are people who would like to adapt it for customized documentation
based on the normative standard.

> In regards to the resolution of the term URIs I have setup some content
> negotiation that takes you to the rdf in case you request
> "application/rdf+xml" - otherwise it defaults to the html definitions. I
> think that is one more reason that we dont need any xsl on the rdf anymore.
> http://rs.tdwg.org/dwc/terms/taxonID
> now gets redirected to either:
> http://darwincore.googlecode.com/svn/trunk/rdf/dwcterms.rdf
> http://darwincore.googlecode.com/svn/trunk/terms/index.htm#taxonID
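
A minimal sketch of the redirect choice described above, for illustration
only. The function name redirect_target and the plain match on the Accept
header are assumptions; the actual server configuration is not shown in this
thread. Only the two target URLs and the "application/rdf+xml" media type
come from the message.

```python
# Hypothetical sketch: pick the redirect target for a Darwin Core term URI
# based on the Accept header. Names and matching logic are assumptions.
TERM_BASE = "http://rs.tdwg.org/dwc/terms/"
RDF_TARGET = "http://darwincore.googlecode.com/svn/trunk/rdf/dwcterms.rdf"
HTML_TARGET = "http://darwincore.googlecode.com/svn/trunk/terms/index.htm"


def redirect_target(term_uri, accept=""):
    """Return the URL a request for term_uri would be redirected to."""
    if not term_uri.startswith(TERM_BASE):
        raise ValueError("not a Darwin Core term URI: " + term_uri)
    local_name = term_uri[len(TERM_BASE):]
    # Drop any ";q=..." parameters and compare the bare media types.
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept.split(",")]
    if "application/rdf+xml" in media_types:
        return RDF_TARGET
    # Anything else falls back to the html definition of the term.
    return HTML_TARGET + "#" + local_name
```

Called with "http://rs.tdwg.org/dwc/terms/taxonID" and an Accept header of
"application/rdf+xml" this returns the dwcterms.rdf URL; with any other
header it returns the index.htm#taxonID anchor.
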
> For the domains of the ID terms its good to have a more in depth
> discussion. To me those ID terms can always be both, a primary or a foreign key
> depending where you use them. For RDF as you say there is a natural primary
> key with rdf:about for all rdf resources, so the purpose for dwc ID terms
> should be restricted to act as "foreign keys", i.e. have no domain at all
> (unless we want to go into the discussion which is the definite set of
> accepted domains and how they actually relate - sth we had decided dwc will
> not do but wait for the proper ontology work to take on).
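
To make the "foreign key" reading concrete, here is a rough sketch that reads
a small RDF/XML fragment and lists the links made with dwc:locationID. The
identifiers are taken from Steve's Occurrence/Location example quoted further
down; the helper names triples and foreign_keys are hypothetical, and the
parser only handles plain rdf:Description elements, which is an assumption
that keeps the example short.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Sample data modelled on the example in this thread.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dwc="{DWC}">
  <rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
    <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://resolver.org/urn:catalog:MVZ:Mammals:14523">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
    <dwc:locationID rdf:resource="http://guid.mvz.org/sites/arg/127"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Yield (subject, predicate, object) for each rdf:Description property."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF}}}Description"):
        # rdf:about plays the role of the primary key.
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF}}}resource", prop.text)
            yield subject, predicate, obj


def foreign_keys(rdf_xml, predicate=DWC + "locationID"):
    """Return (subject, object) pairs linked by the given ID term."""
    return [(s, o) for s, p, o in triples(rdf_xml) if p == predicate]
```

Here rdf:about identifies each resource, and dwc:locationID appears only on
the Occurrence, pointing at the Location.
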

> For that discussion I'd also like to point out that analogous to rdf we have
> created guidelines for using dwc terms in normalised xml that defines
> "classes". In the matching xml schema actually all ID terms are allowed in
> every class:
> http://rs.tdwg.org/dwc/terms/guides/xml/index.htm#classes
> http://rs.tdwg.org/dwc/xsd/tdwg_dwc_class_terms.xsd
> In the html definitions we assign a "class" to all terms, some of which are
> "all" though (for example all record level ones). We should consider for all
> ID terms a Class of "all" too, so we can use them as "foreign keys" as well.
> In the historical html file we additionally have a "Has Domain" definition.
> It seems to me this is a left over from the initial definitions and this can
> be removed.

Just so everyone knows, the "assignment" of a class to a term in Darwin Core
is nothing more than an organizational convenience accomplished through an
rdf predicate called organizedInClass created specifically for this purpose.
This predicate allows us to make these convenient groupings without going so
far as to say that a term is a property of a given class.
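
As a rough illustration of that distinction, the sketch below groups a few
terms by organizedInClass and then shows that no rdfs:domain is declared for
them. The full URI used for organizedInClass, the sample triples, and the
helper names are assumptions made for the example, not a copy of the
normative file.

```python
from collections import defaultdict

DWC = "http://rs.tdwg.org/dwc/terms/"
# Assumed URI for the grouping predicate; check the RDF file before relying on it.
ORGANIZED_IN_CLASS = "http://rs.tdwg.org/dwc/terms/attributes/organizedInClass"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"

# Illustrative (term, predicate, class) triples.
TRIPLES = [
    (DWC + "occurrenceID", ORGANIZED_IN_CLASS, DWC + "Occurrence"),
    (DWC + "catalogNumber", ORGANIZED_IN_CLASS, DWC + "Occurrence"),
    (DWC + "identificationID", ORGANIZED_IN_CLASS, DWC + "Identification"),
]


def group_terms(triples):
    """Group term URIs under the class they are organized in."""
    groups = defaultdict(list)
    for term, predicate, cls in triples:
        if predicate == ORGANIZED_IN_CLASS:
            groups[cls].append(term)
    return dict(groups)


def declared_domains(term, triples):
    """Return any rdfs:domain values asserted for a term (none here)."""
    return [o for s, p, o in triples if s == term and p == RDFS_DOMAIN]
```

The grouping is enough to lay out documentation by class, while
declared_domains stays empty, so nothing restricts which subjects a term may
be used with.
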

> In the dwcterms.rdf definitions we dont use rdfs:domain at all right now.
> Btw, Ive changed rdfs:replaces to dcterms:replaces in the rdf files.
> If we all agree I would propose to do the following additional changes:
> - remove xsl from rdf files

I agree if you move functional versions to the Tools wiki page.

> - remove dwctermshistory.rdf in favor of html alone?

I don't agree. This is a normative document.

> - remove "Has Domain" from term history.

If there is any chance that we will use domains for anything in the future I
would leave this as it is. Otherwise, there may be a number of places in the
documentation where hasDomain needs to be excised. Since it is functionally
equivalent to have hasDomain blank, I would leave it that way until we are
absolutely sure.

> Markus
> On Sep 6, 2010, at 2:38, Steve Baskauf wrote:
> > Thanks for the references, John.  Sorry for bringing up a settled issue -
> I didn't mean to plow old ground again (I think I wasn't on the list yet
> when these posts came out).  I can see the advantage of not actually
> specifying a domain for the DwC terms in terms of not squelching innovative
> uses of the DwC terms.  However, rdfs:domain issue aside, if I'm
> understanding the cited archived messages correctly, the point of putting
> the DwC terms into classes at all is to help to clarify what they mean and
> how they should be used.  The point of my post was that the dwc:xxxxID terms
> have two possible uses, one of which is consistent with their current
> placement in classes and one (the one that I think is most useful) that is
> not.
> >
> > I guess the reason that I'm bringing this up is that I'm using these
> terms pretty heavily in a way that I'm not sure is the way they are
> intended.  What I would like is either some confirmation from the community
> that this use is OK or else we need to define some other terms that will get
> the job done.  Darwin Core doesn't at this point have an "instruction book"
> except for the Google Code Wiki and that Wiki has no descriptions yet for
> most of the terms in the vocabulary.  I don't feel comfortable creating my
> own meanings for terms that don't have clear guidelines.  After all, the
> point of having Darwin Core terms is so that people use them to refer to the
> same "thing".
> >
> > I would also like to make two suggestions about the RDF file at
> http://darwincore.googlecode.com/svn/trunk/rdf/dwcterms.rdf .  One is that
> either someone needs to fix the "human.xsl" file so that it renders the RDF
> in a web browser, or remove the xml-stylesheet tag and just let web browsers
> display the raw RDF XML.  This has not rendered properly since the rdf was
> moved over from the tdwg domain (I think at least six months ago).  It
> doesn't matter to me since I know how to use "view source" to see the source
> XML, but it's really pretty shoddy form for the definition of a major
> vocabulary.  The second thing is that if the DwC terms are not going to have
> rdfs:domains specified, then somebody should take those erroneous comments
> out of the dwcterms.rdf file.  I think that would be allowable without
> requiring the issuing of a new version because it's correcting an error and
> doesn't actually change the DwC standard itself.
> >
> > Steve
> >
> > John Wieczorek wrote:
> >> For background on the state of and reasons for the Darwin Core domains,
> here are relevant messages in the tdwg-content list archives from the public
> review period for the standard as it currently stands:
> >>
> >> assertions in DwC terms:
> >>
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-June/000019.html
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-June/000021.html
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-June/000022.html
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-June/000024.html
> >>
> >> Request for Decision for Public Review of DarwinCore Draft Standard:
> >>
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-July/000035.html
> >>
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-August/000088.html
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-August/000089.html
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-August/000098.html
> >> http://lists.tdwg.org/pipermail/tdwg-content/2009-August/000092.html
> >>
> >> On Tue, Aug 31, 2010 at 12:34 PM, Steve Baskauf <
> steve.baskauf at vanderbilt.edu> wrote:
> >> Markus,
> >> Thanks!  After I posted the other night I was thinking that the way I
> had constructed the RDF was a bit convoluted.  Your example is much more
> straightforward.  (although except for a lack of typing on my part the two
> examples actually result in the same triples).
> >>
> >> As you noted, the other thing that my example has is that the record for
> the offspring makes reference to the dwc:resourceRelationshipID as a way to
> assert that the offspring has that relationship.  I would be interested in
> some discussion about the appropriate use of the various dwc:xxxxID terms
> (e.g. dwc:occurrenceID, dwc:institutionID, etc.).  As was the case with the
> resourceRelationship terms, there aren't a lot of examples showing the
> proper use of these terms, with the noteworthy exception of
> http://rs.tdwg.org/dwc/terms/guides/xml/index.htm .  As I see it, there
> are two possible uses of these terms.  One (the "first" use) is to identify
> the actual resource that is the subject of a record.  The other (the
> "second" use) is as an "IDRef" term that connects the subject resource to
> another record of a different type.  In the XML example, we see both uses.
>  In the first example in section 2.7 we see
> >>
> >>     <dcterms:Location>
> >>         <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
> >>              ... more metadata...
> >>     </dcterms:Location>
> >>     <dwc:Occurrence>
> >>              ... more metadata ...
> >>         <dwc:occurrenceID>urn:catalog:MVZ:Mammals:14523</dwc:occurrenceID>
> >>              ... more metadata ...
> >>         <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
> >>     </dwc:Occurrence>
> >>
> >> where in the dcterms:Location record, dwc:locationID indicates the
> identifier for the Location itself (i.e. the subject), while in the
> dwc:Occurrence record, dwc:locationID refers to the object of the location
> property of the Occurrence.  Because of the freewheeling nature of generic
> XML files, I guess this is OK.  However, when I'm trying to think clearly
> about how to express relationships in RDF, I've come to decide that I don't
> like this dual use.  I would express the above relationships in RDF as
> follows:
> >>
> >> <rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
> >>     <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
> >>        ...metadata...
> >> </rdf:Description>
> >> <rdf:Description rdf:about="http://resolver.org/urn:catalog:MVZ:Mammals:14523">
> >>     <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
> >>        ... metadata...
> >>     <dwc:locationID rdf:resource="http://guid.mvz.org/sites/arg/127"/>
> >> </rdf:Description>
> >>
> >> To me this makes the most sense.  It corresponds to the "second" use in
> the XML example (it indicates the Location property of the Occurrence, i.e.
> serves as the predicate connecting the subject Occurrence to the object
> Location).   One could also do something like
> >>
> >> <rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
> >>     <dwc:locationID rdf:resource="http://guid.mvz.org/sites/arg/127"/>
> >>     <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
> >>        ...metadata...
> >> etc.
> >>
> >> or
> >>
> >> <rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
> >>     <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
> >>     <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
> >>        ...metadata...
> >> etc.
> >>
> >> But this seems rather pointless because in the first case, the rdf:about
> attribute already provides information about the subject of the relationship
> in a Linked Data manner.  In the second case, why have a special term to
> indicate a literal version of the identifier when there is a more generic
> and well-known term that is in common use? e.g.
> >>
> >> <rdf:Description rdf:about="http://guid.mvz.org/sites/arg/127">
> >>     <dcterms:identifier>http://guid.mvz.org/sites/arg/127</dcterms:identifier>
> >>     <rdf:type rdf:resource="http://purl.org/dc/terms/Location"/>
> >>        ...metadata...
> >> etc.
> >>
> >> In contrast, there is no simple alternative that I can see (with the
> possible exception of the dwc:relatedResource, which I wouldn't call a
> SIMPLE alternative) to using dwc:locationID as the predicate that points to
> the Location object of an Occurrence subject.  Darwin Core doesn't have any
> terms specifically designed for this (such as "hasLocation" or
> "hasIdentification" for example) but I don't see any reason why the
> dwc:xxxxID terms couldn't be used this way.
> >>
> >> The main issue I can see with the use of these dwc:xxxxID terms is their
> placement in classes which I guess is related to their rdfs:domain .
>  Although the term description comments at
> http://darwincore.googlecode.com/svn/trunk/rdf/dwcterms.rdf (is this the
> actual place where DwC is officially defined?) say that each term has a
> rdfs:domain property, they actually don't in the RDF.  So the only real clue
> to the intended subjects of the dwc:xxxxID terms is the placement of the
> dwc:xxxxID terms into classes.  In most cases, a dwc:xxxxID term is placed
> in the class of the thing that the term is identifying (e.g. occurrenceID is
> in the Occurrence class, identificationID is in the Identification class,
> etc.).  This would hint that the terms were intended to be used with
> instances of their class as their subject.  However, if the terms were used
> as IDRefs (the "second" use shown in the XML example) then they really
> should be in different classes.  For example, dwc:locationID could be a
> member of the class Occurrence, since an Occurrence could have
> dwc:locationID as a property with an instance of a Location as its object.
>  But then there might be the question of whether an Event could also have a
> dwc:locationID as a property and dwc:locationID shouldn't be in two classes.
>  In the absence of defined rdfs:domains, I guess the terms can be used with
> pretty much any subject a user thinks makes sense.  As Bob Morris pointed
> out in a recent post, there aren't very many applications that are doing
> much in the way of semantic reasoning at this point.  But I suppose (hope?)
> that there could be some in the future, so achieving some kind of consensus
> on how the terms should be used in RDF would probably be good.
> >>
> >> At this point, the dwc:xxxxID terms serve a useful purpose for me in
> connecting one kind of resource to another (examples in
> http://bioimages.vanderbilt.edu/ind-baskauf/41870.rdf and
> http://bioimages.vanderbilt.edu/baskauf/51363.rdf), so unless somebody
> tells me that's "wrong" and comes up with a better term to describe the
> relationships I need to describe, I'll keep using the dwc:xxxxID terms as
> IDRefs.
> >>
> >> Comments?
> >> Steve
> >>
> >> Markus Döring wrote:
> >>> I must say it feels strange to use the id terms in rdf to refer to
> rdf:resources instead of holding ID literals.
> >>> More natural to the rdf language would be <dwc:resource
> rdf:resource="..." /> and <dwc:relatedResource rdf:resource="..." /> I
> suppose, but the dwc vocabulary was built for different technologies, not
> rdf alone. So I guess thats the price we have to pay.
> >>>
> >>> Markus
> >>>
> >>>
> >> --
> >> Steven J. Baskauf, Ph.D., Senior Lecturer
> >> Vanderbilt University Dept. of Biological Sciences
> >>
> >> postal mail address:
> >> VU Station B 351634
> >> Nashville, TN  37235-1634,  U.S.A.
> >>
> >> delivery address:
> >> 2125 Stevenson Center
> >> 1161 21st Ave., S.
> >> Nashville, TN 37235
> >>
> >> office: 2128 Stevenson Center
> >> phone: (615) 343-4582,  fax: (615) 343-6707
> >>
> >> http://bioimages.vanderbilt.edu