[tdwg-tag] Proposed charter for a TAG Vocabulary Management Task Group (VOMAG)
Steve Baskauf
steve.baskauf at vanderbilt.edu
Fri Feb 10 20:01:43 CET 2012
I'm going to post this message here rather than on the GBIF community
site for the Vocabulary Management group because the subject lies at the
interface between what the VoMaG is proposing and the core mission of TDWG.
As the review manager for Audubon Core, I have struggled to understand
how one is supposed to reconcile the practical details of publishing and
maintaining a vocabulary with the standards documentation process
described in the draft TDWG Standards Documentation Specification
(http://www.tdwg.org/standards/147/). I think that the specification
works pretty well for a standard like an Applicability Statement (sensu
http://www.tdwg.org/standards/status-and-categories/) which is by its
nature somewhat descriptive and not likely to have repeated, small
technical changes over time. However, I don't think that the
specification really works for something like a vocabulary, which falls
into the Technical Specification category. Vocabularies such as Darwin Core and
the draft Audubon Core standard are subject to continual change and
evolution. The documentation specification does not allow that because
"once a standard has been ratified it cannot be changed in any
substantive way; it must be superseded by a standard with a different
name" (section 8). Vocabularies may also be defined in a form that is not
human-readable (e.g. Darwin Core, whose normative definition is in RDF),
a case that the documentation specification does not deal with
adequately because it focuses primarily on the formatting of
human-readable documents. So the documentation specification leaves us
with little guidance on the subject of vocabularies.
In the case of Darwin Core, its namespace policy
(http://rs.tdwg.org/dwc/terms/namespace/index.htm) describes the process
by which the vocabulary can evolve. Although this process is somewhat
lengthy and perhaps confusing to the uninitiated, it has actually worked
and resulted in improvements in the standard. In the Audubon Core
review process, we have talked about using a similar mechanism to ensure
that the AC vocabulary can also evolve over time without the necessity
to go through the really painful process of resubmitting the vocabulary
as a new standard (which would be required under the draft documentation
standard). But such a mechanism is not included (or even allowed) in the
current version of the documentation specification.
I think that the standards development process at TDWG has been hampered
by the lack of a ratified Documentation Standard. However, I don't
think the solution is to just ratify the existing version because as
I've noted, I don't think it really works for vocabularies. If it is
appropriate, it might be good for the VoMaG to look at the process of
vocabulary maintenance and documentation in the "standards" context.
For example, should the process described in the DwC namespace policy be
adopted more broadly as a policy for all TDWG standards that are
vocabularies? Who should bear the responsibility of managing the
revisions to vocabulary standards - the task group that created them? -
the TAG? - somebody else? How do we ensure that the normative
definitions of the vocabularies are readily accessible and understandable
to software developers and content providers who want to use them?
These are all things which I think probably should be included in a
final Documentation Standard and I think that based on recent experience
we have a better idea now as to what might work than when the existing
draft was created in 2007. If it turns out that figuring this out falls
outside of the mandate of the VoMaG, then perhaps another task group
could be formed to work in parallel with the VoMaG to work out these
details and to put them into writing in the form of a ratified Standards
Documentation Specification.
I also would like to suggest that the VoMaG take a look at the fourth
category of standards described at
http://www.tdwg.org/standards/status-and-categories/ , namely the "Data
Standard" category. The category description says that a Data Standard
"specifies valid values in controlled vocabularies". I'm not sure
exactly what that means, but I'm assuming it means what one uses when a
term definition says "Recommended best practice is to use a controlled
vocabulary". This frequently comes up, but to my knowledge, nobody has
ever tried to define a controlled vocabulary as a Data Standard. In
that context, it would be useful to establish just what a "controlled
vocabulary" means. Most current applications provide text literals as
the values for properties such as basisOfRecord. However, in the case
of RDF, it might be a better practice to reference URIs rather than text
strings. Or not. I'm not sure. Again this is something that would be
useful to work out. We currently don't have any precedents in the TDWG
community so maybe there are examples elsewhere that we could use as a
model.
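To make the literal-versus-URI question concrete, here is a rough sketch
of the two ways a basisOfRecord value could be written as an RDF triple
(in N-Triples form). Only the dwc:basisOfRecord property URI is taken
from Darwin Core; the record URI and the controlled-term URI are
hypothetical placeholders I made up for illustration, and the helper
functions are not part of any existing library.

```python
# Sketch only: SUBJECT and VALUE_URI are hypothetical, not real identifiers.
DWC_BASIS_OF_RECORD = "http://rs.tdwg.org/dwc/terms/basisOfRecord"
SUBJECT = "http://example.org/specimen/1"  # hypothetical record URI
VALUE_URI = "http://example.org/vocab/PreservedSpecimen"  # hypothetical term URI


def literal_triple(subject, predicate, text):
    """Return an N-Triples line whose object is a plain text literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


def uri_triple(subject, predicate, value_uri):
    """Return an N-Triples line whose object is a URI reference."""
    return f"<{subject}> <{predicate}> <{value_uri}> ."


# Option 1: the value is a text string, as most current applications do.
as_literal = literal_triple(SUBJECT, DWC_BASIS_OF_RECORD, "PreservedSpecimen")

# Option 2: the value points at a term in a controlled vocabulary.
as_uri = uri_triple(SUBJECT, DWC_BASIS_OF_RECORD, VALUE_URI)

print(as_literal)
print(as_uri)
```

With the literal form, "PreservedSpecimen" and "Preserved Specimen" are
simply different strings; with the URI form, every provider that means
the same thing has to point at the same identifier, which is roughly
what I would expect a Data Standard to pin down.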
Steve
On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:
> Dear TAG,
>
> After battling with the plans for a biodiversity knowledge organization
> (KOS) framework for biodiversity information resources we have
> identified the requirement to develop guidelines and best practices for
> the management of vocabularies of terms. Basic terms organized in
> vocabularies provides an essential element to underpin the biodiversity
> information standards. As introduced at the TDWG 2011 TAG meetings in
> New Orleans, we propose the formation of a new Vocabulary Management
> Task Group (VOMAG) [1] to be organized under the TDWG technical
> architecture group (TAG). Please find the draft charter available from
> the GBIF Community Site [2][3]. Here you will also find another draft
> document "Biodiversity Knowledge Organization System: Proposed
> Architecture; Version 0.1 February 2012" that provides an overview of
> the proposed KOS landscape and the context for the proposed work-plan of
> the Vocabulary Management Task Group Charter.
>
> This is the first draft so far only discussed with Greg Whitbread as
> convener of the TDWG TAG and with Steve Baskauf and Joel Sachs as
> convener of the TDWG RDF/OWL task group. We invite feedback and comments
> to the proposed formation of the task group including suggestions with
> regard to the work-plan. Please join the Vocabulary Management group at
> the GBIF Community Site [1]. You can start or participate in discussions
> or share suggestions using the GBIF Community Site. Feel also free to
> make contact with us to volunteer as a core member for this proposed
> task group!
>
> [1] http://community.gbif.org/pg/groups/21382/vocabulary-management/
> [2] http://community.gbif.org/pg/file/read/21388/
> [3] http://community.gbif.org/pg/blog/read/21387/
> [4] http://community.gbif.org/pg/file/read/21582/
>
>
> Best regards
> Dag, Eamonn and David
>
>
>
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu