[tdwg-tag] Creating a TDWG standard for documenting Data Standards
steve.baskauf at vanderbilt.edu
Wed Mar 7 16:10:31 CET 2012
Dag and Éamonn,
Thanks for your interest and response. Responses to your comments inline.
On 3/7/2012 7:18 AM, Dag Endresen [GBIF] wrote:
> Do you envision that the guidelines for providing documentation on a
> controlled (value) vocabulary might be included as part of the TDWG
> Standards Documentation Specification
> (http://www.tdwg.org/standards/147/)? And to eventually move to seek
> ratification for this standard initiated by Roger Hyam some years ago...?
Well, I think that it would be extremely beneficial to finish that
specification. I think it provides very useful guidelines for
documenting human-readable documents. However, it says little about how
to document non-human readable documents. I have brought this issue up
in the context of Darwin Core where it is not very clear which RDF
documents are normative. If Roger is not working on this any more,
perhaps it could be passed off to somebody else (not me, please). I
would recommend that whoever works on it consult with the authors and
review managers of the standards which are undergoing ratification or
which have been ratified since the existing documentation standard draft
was written. I think that would be Darwin Core, Audubon Core, the
GUID/LSID Applicability Statement, and TAPIR (I might have missed one).
Those people would probably have useful feedback on what did and did not
work for them. In particular, I think that the part of the draft which
says that there is no versioning of standards needs to be reworked. I
understand the rationale, but as a practical matter we are ignoring that
prohibition in Darwin Core, which has a namespace policy that allows the
standard to evolve without re-submission and ratification. Audubon Core
is considering a similar policy. If the DwC namespace policy is
something that works (I suppose that is subject to debate) it could be
written into the documentation standards.
> Or could these RDF guidelines be seen more as the kind of Type 3 (or
> type 2) documents that Roger describes with the proposed documentation
> specification standard.
I think that it might have been a mistake on my part to have mentioned
creating a standard in the email because it seems to have brought up all
kinds of ancient history of which I'm not aware. But then my initial
email was just to bring up the idea with you and Éamonn rather than to
start a list discussion (which happened inadvertently when I used "reply
all" to a previous message of yours - remind me not to do that again).
At this point it may be premature to suggest that it be a standard,
although it could be one of the type "Current Best Practice". But first
I think that there needs to be some more serious discussion and
experimentation in the context of the RDF Task Group to find out whether
creating RDF representations of controlled vocabularies actually
accomplishes anything useful or not.
> I have been following the first discussions on the TDWG-RDF Google
> Group with great interest. The discussion on how to report values as
> literals when a term is declared with the range to be a non-literal
> resource type has been very educational for me to follow. It seems to
> me that this is still a topic under exploratory discussion in the
> Dublin Core Metadata Initiative group, however some guidelines for the
> TDWG best practices would be valuable! A large part of the RDF
> vocabulary guidelines would probably be on a less technically detailed
As one who has struggled to understand the DCMI Abstract Model, I think
that if we cannot provide guidelines which can be understood in a short
period of time by people with a general understanding of data
management, then we are wasting our time. I think that an important
part of this is providing concrete examples.
> Do you think that the same guidelines can cover the scope for both the
> RDF vocabularies of terms and the controlled value vocabularies? My
> thinking is that either terms or controlled values can be described as
> concepts in a basic RDF vocabularies - and that these concepts can be
> re-used in derived resources such as the Darwin Core Archive
> extensions and vocabularies (or re-used by ontologies). The best
> practices for constructing a vocabulary defining concepts intended
> either as controlled values or concepts intended as terms could be
> largely the same?
I don't know the answer to this. I was thinking about controlled
vocabularies because they are generally initially defined as text
strings. It would therefore be relatively easy to combine some form of
those strings with a namespace to create a URI, then provide minimal RDF
to support dereferencing. We have examples of this in the three
controlled vocabularies I mentioned in my initial email. I think that
we have learned from the experience with the Darwin Core type vocabulary
that it is probably unwise to embed things like subclass properties
within the term definition RDF itself. However, I am intrigued by the
possibility of including enough semantics to allow a client to figure
out that terms which are expressed in different languages are equivalent
(as Éamonn mentioned in his response). I don't know what the best
strategy for doing that is. The Library of Congress language tag RDF
expresses relationships using SKOS terms, but I do not have enough
experience with KOS to know how widely clients are equipped to make use
of that kind of information. But I know that we have people in our
community who are familiar with that and hopefully they can advise.
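To make the idea above concrete, here is a minimal sketch of combining a controlled-value string with a namespace to form a URI and then emitting a small amount of RDF (as Turtle) that could be served when the URI is dereferenced. Everything specific in it is an assumption for illustration only: the namespace http://example.org/vocab/, the function names mint_uri and minimal_turtle, the underscore normalization rule, and the example labels. It uses SKOS preferred labels with language tags as one possible way to tie labels in different languages to the same URI; whether clients would actually make use of that is exactly the open question.

```python
from urllib.parse import quote

# Hypothetical namespace, used only for illustration.
NAMESPACE = "http://example.org/vocab/"


def mint_uri(value, namespace=NAMESPACE):
    """Build a URI from a controlled-value string.

    Whitespace runs are collapsed to single underscores and the result is
    percent-encoded. This normalization rule is an assumption, not a
    TDWG or Darwin Core convention.
    """
    local_name = "_".join(value.strip().split())
    return namespace + quote(local_name, safe="")


def minimal_turtle(value, labels, namespace=NAMESPACE):
    """Return a minimal Turtle description of one controlled value.

    `labels` maps language tags (e.g. "en") to human-readable labels.
    """
    statements = ["    a skos:Concept"]
    for lang, text in sorted(labels.items()):
        # Escape backslashes and double quotes for a Turtle string literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        statements.append('    skos:prefLabel "%s"@%s' % (escaped, lang))
    return "\n".join([
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        "<%s>" % mint_uri(value, namespace),
        " ;\n".join(statements) + " .",
    ])


if __name__ == "__main__":
    # Example labels are invented for illustration.
    print(minimal_turtle(
        "Preserved Specimen",
        {"en": "preserved specimen", "es": "ejemplar preservado"},
    ))
```

Note that the sketch deliberately leaves out things like subclass relationships, in line with the caution above about embedding them in the term definition RDF itself.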
The issue of how best to define in RDF general-use vocabulary terms
seems to me to be a more difficult issue. In Darwin Core, we have the
normative definitions in RDF. This creates difficulties if there later
turn out to be problems with that RDF: fixing them requires going through the
term change process to fix those problems. Audubon Core is following a
different model, which is to define the terms in human language and
leave the problem of creating the RDF for URI dereferencing until later
(I think). I am hoping that as the RDF group looks carefully at how
Darwin Core terms can be used in RDF that we can figure out what the
actual best-practices are and provide some guidance for future
I need to take some time to digest Paul's comments on this subject.
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707