[tdwg-tag] Creating a TDWG standard for documenting Data Standards

Steve Baskauf steve.baskauf at vanderbilt.edu
Wed Mar 7 16:10:31 CET 2012

Dag and Éamonn,
Thanks for your interest and response.  Responses to your comments inline.

On 3/7/2012 7:18 AM, Dag Endresen [GBIF] wrote:
> Do you envision that the guidelines for providing documentation on a 
> controlled (value) vocabulary might be included as part of the TDWG 
> Standards Documentation Specification 
> (http://www.tdwg.org/standards/147/)? And to eventually move to seek 
> ratification for this standard initiated by Roger Hyam some years ago...?
Well, I think that it would be extremely beneficial to finish that 
specification.  I think it provides very useful guidelines for 
documenting human-readable documents.  However, it says little about how 
to document machine-readable documents such as RDF.  I have brought this issue up 
in the context of Darwin Core where it is not very clear which RDF 
documents are normative.  If Roger is not working on this any more, 
perhaps it could be passed off to somebody else (not me, please).  I 
would recommend that whoever works on it consult with the authors and 
review managers of the standards which are undergoing ratification or 
which have been ratified since the existing documentation standard draft 
was written.  I think that would be Darwin Core, Audubon Core, the 
GUID/LSID Applicability Statement, and TAPIR (I might have missed one).  
Those people would probably have useful feedback on what did and did not 
work for them.  In particular, I think that the part of the draft which 
says that there is no versioning of standards needs to be reworked.  I 
understand the rationale, but as a practical matter we are ignoring that 
prohibition in Darwin Core, which has a namespace policy that allows the 
standard to evolve without re-submission and ratification.  Audubon Core 
is considering a similar policy.  If the DwC namespace policy is 
something that works (I suppose that is subject to debate) it could be 
written into the documentation standards.
> Or could these RDF guidelines be seen more as the kind of Type 3 (or 
> type 2) documents that Roger describes with the proposed documentation 
> specification standard.
I think that it might have been a mistake on my part to have mentioned 
creating a standard in the email because it seems to have brought up all 
kinds of ancient history of which I'm not aware.  But then my initial 
email was just to bring up the idea with you and Éamonn rather than to 
start a list discussion (which happened inadvertently when I used "reply 
all" to a previous message of yours - remind me not to do that again).  
At this point it may be premature to suggest that it be a standard, 
although it could be one of the type "Current Best Practice".  But first 
I think that there needs to be some more serious discussion and 
experimentation in the context of the RDF Task Group to find out whether 
creating RDF representations of controlled vocabularies actually 
accomplishes anything useful or not.
> I have been following the first discussions on the TDWG-RDF Google 
> Group with great interest. The discussion on how to report values as 
> literals when a term is declared with the range to be a non-literal 
> resource type has been very educational for me to follow. It seems to 
> me that this is still a topic under exploratory discussion in the 
> Dublin Core Metadata Initiative group, however some guidelines for the 
> TDWG best practices would be valuable! A large part of the RDF 
> vocabulary guidelines would probably be on a less technically detailed 
> level?
As one who has struggled to understand the DCMI Abstract Model, I think 
that if we cannot provide guidelines which can be understood in a short 
period of time by people with a general understanding of data 
management, then we are wasting our time.  I think that an important 
part of this is providing concrete examples.
> Do you think that the same guidelines can cover the scope for both the 
> RDF vocabularies of terms and the controlled value vocabularies? My 
> thinking is that either terms or controlled values can be described as 
> concepts in basic RDF vocabularies - and that these concepts can be 
> re-used in derived resources such as the Darwin Core Archive 
> extensions and vocabularies (or re-used by ontologies). The best 
> practices for constructing a vocabulary defining concepts intended 
> either as controlled values or concepts intended as terms could be 
> largely the same?
I don't know the answer to this.  I was thinking about controlled 
vocabularies because they are generally initially defined as text 
strings.  It would therefore be relatively easy to combine some form of 
those strings with a namespace to create a URI, then provide minimal RDF 
to support dereferencing.  We have examples of this in the three 
controlled vocabularies I mentioned in my initial email.  I think that 
we have learned from the experience with the Darwin Core type vocabulary 
that it is probably unwise to embed things like subclass properties 
within the term definition RDF itself.  However, I am intrigued by the 
possibility of including enough semantics to allow a client to figure 
out that terms which are expressed in different languages are equivalent 
(as Éamonn mentioned in his response).  I don't know what the best 
strategy for doing that is.  The Library of Congress language tag RDF 
expresses relationships using SKOS terms, but I do not have enough 
experience with KOS to know how widely clients are equipped to make use 
of that kind of information.  But I know that we have people in our 
community who are familiar with that and hopefully they can advise.
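The approach described above (combine each controlled-vocabulary string with a namespace to mint a URI, serve minimal RDF for dereferencing, and attach language-tagged SKOS labels so a client can tell that terms in different languages are the same concept) might look something like the Python sketch below. It is only a minimal illustration under stated assumptions: the namespace http://example.org/vocab/lifeStage/, the camelCase minting rule, the example terms and their definitions, and all function names are hypothetical. It also builds Turtle with plain string formatting rather than an RDF library. Giving one skos:Concept several language-tagged skos:prefLabel values is just one possible strategy, not necessarily the one the RDF Task Group would settle on.

```python
import re

# W3C namespace for SKOS.
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace for the controlled vocabulary (illustrative only).
NAMESPACE = "http://example.org/vocab/lifeStage/"


def mint_local_name(label: str) -> str:
    """Turn a controlled-vocabulary string into a camelCase local name (assumed rule)."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError(f"cannot mint a local name from {label!r}")
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def term_uri(labels: dict) -> str:
    """Mint the URI from the English label so every language variant shares one URI."""
    return NAMESPACE + mint_local_name(labels["en"])


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(terms: list) -> str:
    """Emit minimal Turtle: one skos:Concept per term, with labels tagged by language."""
    out = [f"@prefix skos: <{SKOS}> .", ""]
    for labels, definition in terms:
        out.append(f"<{term_uri(labels)}> a skos:Concept ;")
        for lang, text in sorted(labels.items()):
            out.append(f'    skos:prefLabel "{escape(text)}"@{lang} ;')
        out.append(f'    skos:definition "{escape(definition)}"@en .')
        out.append("")
    return "\n".join(out)


def label_index(terms: list) -> dict:
    """Client-side lookup: (lowercased label, language tag) -> term URI."""
    index = {}
    for labels, _ in terms:
        uri = term_uri(labels)
        for lang, text in labels.items():
            index[(text.lower(), lang)] = uri
    return index


# Hypothetical vocabulary entries, invented for this sketch.
TERMS = [
    ({"en": "adult", "es": "adulto"}, "A sexually mature organism."),
    ({"en": "larval stage", "es": "larva"}, "An immature, pre-metamorphic form."),
]

if __name__ == "__main__":
    print(to_turtle(TERMS))
    index = label_index(TERMS)
    # Both labels resolve to the same URI, so a client can treat them as equivalent.
    print(index[("adulto", "es")] == index[("adult", "en")])
```

Because the URI is minted from a single label and the other languages hang off it as labels, a client never has to infer equivalence between separate URIs. The trade-off is that the minting rule becomes part of what has to stay stable over time.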

The question of how best to define general-use vocabulary terms in RDF 
seems to me to be more difficult.  In Darwin Core, we have the 
normative definitions in RDF.  This becomes a problem if errors are later 
found in that RDF: fixing them requires going through the term change 
process.  Audubon Core is following a 
different model, which is to define the terms in human language and 
leave the problem of creating the RDF for URI dereferencing until later 
(I think).  I am hoping that, as the RDF group looks carefully at how 
Darwin Core terms can be used in RDF, we can figure out what the 
actual best practices are and provide some guidance for future 
vocabulary development.

I need to take some time to digest Paul's comments on this subject.


Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
