[tdwg-tag] Proposed charter for a TAG Vocabulary Management Task Group (VOMAG)

Steve Baskauf steve.baskauf at vanderbilt.edu
Fri Feb 10 20:01:43 CET 2012


I'm going to post this message here rather than on the GBIF community 
site for the Vocabulary Management group because the subject lies at the 
interface between what the VoMaG is proposing and the core mission of TDWG.

As the review manager for Audubon Core, I have struggled to understand 
how one is supposed to reconcile the practical details of publishing and 
maintaining a vocabulary with the standards documentation process 
described in the draft TDWG Standards Documentation Specification 
(http://www.tdwg.org/standards/147/).  I think that the specification 
works pretty well for a standard like an Applicability Statement (sensu 
http://www.tdwg.org/standards/status-and-categories/) which is by its 
nature somewhat descriptive and not likely to have repeated, small 
technical changes over time.  However, I don't think that the 
specification really works for something like a vocabulary, which falls 
into the Technical Specification category.  Vocabularies such as Darwin 
Core and the draft Audubon Core standard are subject to continual change 
and evolution.  The documentation specification does not allow that because 
"once a standard has been ratified it cannot be changed in any 
substantive way; it must be superseded by a standard with a different 
name" (section 8).  Vocabularies may also be defined in a form that is 
not human-readable (e.g. Darwin Core, whose normative definition is in 
RDF).  The documentation specification does not deal with this 
adequately because it focuses primarily on the formatting of 
human-readable documents.  So the documentation specification leaves us 
with little guidance on the subject of vocabularies.

In the case of Darwin Core, its namespace policy 
(http://rs.tdwg.org/dwc/terms/namespace/index.htm) describes the process 
by which the vocabulary can evolve.  Although this process is somewhat 
lengthy and perhaps confusing to the uninitiated, it has actually worked 
and resulted in improvements in the standard.  In the Audubon Core 
review process, we have talked about using a similar mechanism to ensure 
that the AC vocabulary can also evolve over time without the necessity 
to go through the really painful process of resubmitting the vocabulary 
as a new standard (which would be required under the draft documentation 
standard).  But such a mechanism is not included (or even allowed) in the 
current version of the documentation specification.

I think that the standards development process at TDWG has been hampered 
by the lack of a ratified Documentation Standard.  However, I don't 
think the solution is to just ratify the existing version because, as 
I've noted, I don't think it really works for vocabularies.  If it is 
appropriate, it might be good for the VoMaG to look at the process of 
vocabulary maintenance and documentation in the "standards" context.  
For example, should the process described in the DwC namespace policy be 
adopted more broadly as a policy for all TDWG standards that are 
vocabularies?  Who should bear the responsibility of managing the 
revisions to vocabulary standards - the task group that created them? - 
the TAG? - somebody else?  How do we ensure that the normative 
definitions of the vocabularies are readily accessible and understandable 
to software developers and content providers who want to use them?  
These are all things that I think should probably be included in a 
final Documentation Standard, and based on recent experience I think we 
now have a better idea of what might work than when the existing 
draft was created in 2007.  If it turns out that figuring this out falls 
outside of the mandate of the VoMaG, then perhaps another task group 
could be formed to work in parallel with the VoMaG to work out these 
details and to put them into writing in the form of a ratified Standards 
Documentation Specification.

I also would like to suggest that the VoMaG take a look at the fourth 
category of standards described at 
http://www.tdwg.org/standards/status-and-categories/ , namely the "Data 
Standard" category.  The category description says that a Data Standard 
"specifies valid values in controlled vocabularies".  I'm not sure 
exactly what that means, but I'm assuming it means what one uses when a 
term definition says "Recommended best practice is to use a controlled 
vocabulary".  This frequently comes up, but to my knowledge, nobody has 
ever tried to define a controlled vocabulary as a Data Standard.  In 
that context, it would be useful to establish just what a "controlled 
vocabulary" means.  Most current applications provide text literals as 
the values for properties such as basisOfRecord.  However, in the case 
of RDF, it might be a better practice to reference URIs rather than text 
strings.  Or not.  I'm not sure.  Again this is something that would be 
useful to work out.  We currently don't have any precedents in the TDWG 
community so maybe there are examples elsewhere that we could use as a 
model.
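
To make the literal-versus-URI distinction concrete, here is a minimal 
sketch in Python that writes the same basisOfRecord statement both ways 
as N-Triples.  The subject URI (http://example.org/specimen/1) is 
hypothetical, and using a URI in the Darwin Core terms namespace as the 
value is only an assumption made for illustration, not an established 
TDWG practice.

```python
# Sketch only: the subject URI and the choice of value URI are
# illustrative assumptions, not an agreed TDWG convention.
DWC = "http://rs.tdwg.org/dwc/terms/"
SUBJECT = "http://example.org/specimen/1"  # hypothetical record identifier


def literal_triple(subject, predicate, text):
    """Return an N-Triples line whose object is a plain text literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


def uri_triple(subject, predicate, value_uri):
    """Return an N-Triples line whose object is a URI reference."""
    return f"<{subject}> <{predicate}> <{value_uri}> ."


# Option 1: the value is a text string, as most current applications do.
as_literal = literal_triple(SUBJECT, DWC + "basisOfRecord", "PreservedSpecimen")

# Option 2: the value is a URI that could itself be described elsewhere.
as_uri = uri_triple(SUBJECT, DWC + "basisOfRecord", DWC + "PreservedSpecimen")

print(as_literal)
print(as_uri)
```

With a literal, consumers have to match strings (and cope with spelling 
or language variants); with a URI, the value can be looked up and 
compared exactly, but someone has to mint and maintain those URIs.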

Steve

On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:
> Dear TAG,
>
> After battling with the plans for a biodiversity knowledge organization
> (KOS) framework for biodiversity information resources we have
> identified the requirement to develop guidelines and best practices for
> the management of vocabularies of terms. Basic terms organized in
> vocabularies provides an essential element to underpin the biodiversity
> information standards. As introduced at the TDWG 2011 TAG meetings in
> New Orleans, we propose the formation of a new Vocabulary Management
> Task Group (VOMAG) [1] to be organized under the TDWG technical
> architecture group (TAG). Please find the draft charter available from
> the GBIF Community Site [2][3]. Here you will also find another draft
> document "Biodiversity Knowledge Organization System: Proposed
> Architecture; Version 0.1 February 2012" that provides an overview of
> the proposed KOS landscape and the context for the proposed work-plan of
> the Vocabulary Management Task Group Charter.
>
> This is the first draft so far only discussed with Greg Whitbread as
> convener of the TDWG TAG and with Steve Baskauf and Joel Sachs as
> convener of the TDWG RDF/OWL task group. We invite feedback and comments
> to the proposed formation of the task group including suggestions with
> regard to the work-plan. Please join the Vocabulary Management group at
> the GBIF Community Site [1]. You can start or participate in discussions
> or share suggestions using the GBIF Community Site. Feel also free to
> make contact with us to volunteer as a core member for this proposed
> task group!
>
> [1] http://community.gbif.org/pg/groups/21382/vocabulary-management/
> [2] http://community.gbif.org/pg/file/read/21388/
> [3] http://community.gbif.org/pg/blog/read/21387/
> [4] http://community.gbif.org/pg/file/read/21582/
>
>
> Best regards
> Dag, Eamonn and David
>
>
>

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu



