I'm posting this message here rather than on the GBIF community site for the Vocabulary Management group because the subject lies at the interface between what the VoMaG is proposing and the core mission of TDWG.
As the review manager for Audubon Core, I have struggled to understand how one is supposed to reconcile the practical details of publishing and maintaining a vocabulary with the standards documentation process described in the draft TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/). I think that the specification works pretty well for a standard like an Applicability Statement (sensu http://www.tdwg.org/standards/status-and-categories/), which is by its nature somewhat descriptive and not likely to undergo repeated, small technical changes over time. However, I don't think that the specification really works for something like a vocabulary, which falls into the Technical Specification category. Vocabularies such as Darwin Core and the draft Audubon Core standard are subject to continual change and evolution. The documentation specification does not allow that, because "once a standard has been ratified it cannot be changed in any substantive way; it must be superseded by a standard with a different name" (section 8). Vocabularies may also be defined in a form that is not human-readable (e.g. Darwin Core, whose normative definition is in RDF). The documentation specification does not deal with that case adequately, because it focuses primarily on the formatting of human-readable documents. So the documentation specification leaves us with little guidance on the subject of vocabularies.
In the case of Darwin Core, its namespace policy (http://rs.tdwg.org/dwc/terms/namespace/index.htm) describes the process by which the vocabulary can evolve. Although this process is somewhat lengthy and perhaps confusing to the uninitiated, it has actually worked and has resulted in improvements to the standard.
In the Audubon Core review process, we have talked about using a similar mechanism to ensure that the AC vocabulary can also evolve over time without having to go through the really painful process of resubmitting the vocabulary as a new standard (which would be required under the draft documentation standard). But such a mechanism is not included (or even allowed) in the current version of the documentation specification. I think that the standards development process at TDWG has been hampered by the lack of a ratified Documentation Standard. However, I don't think the solution is to just ratify the existing version because, as I've noted, I don't think it really works for vocabularies.
If it is appropriate, it might be good for the VoMaG to look at the process of vocabulary maintenance and documentation in the "standards" context. For example, should the process described in the DwC namespace policy be adopted more broadly as a policy for all TDWG standards that are vocabularies? Who should bear the responsibility of managing the revisions to vocabulary standards: the task group that created them, the TAG, or somebody else? How do we ensure that the normative definitions of the vocabularies are readily accessible and understandable to software developers and content providers who want to use them? These are all things that I think should probably be included in a final Documentation Standard, and I think that, based on recent experience, we have a better idea now of what might work than when the existing draft was created in 2007. If it turns out that figuring this out falls outside the mandate of the VoMaG, then perhaps another task group could be formed to work in parallel with the VoMaG to work out these details and put them into writing in the form of a ratified Standards Documentation Specification.
I would also like to suggest that the VoMaG take a look at the fourth category of standards described at http://www.tdwg.org/standards/status-and-categories/ , namely the "Data Standard" category. The category description says that a Data Standard "specifies valid values in controlled vocabularies". I'm not sure exactly what that means, but I'm assuming it refers to what one uses when a term definition says "Recommended best practice is to use a controlled vocabulary". This frequently comes up, but to my knowledge, nobody has ever tried to define a controlled vocabulary as a Data Standard. In that context, it would be useful to establish just what a "controlled vocabulary" means. Most current applications provide text literals as the values for properties such as basisOfRecord. However, in the case of RDF, it might be a better practice to reference URIs rather than text strings. Or not. I'm not sure. Again, this is something that would be useful to work out. We currently don't have any precedents in the TDWG community, so maybe there are examples elsewhere that we could use as a model.
Steve
On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:
Dear TAG,
After battling with the plans for a biodiversity knowledge organization system (KOS) framework for biodiversity information resources, we have identified the requirement to develop guidelines and best practices for the management of vocabularies of terms. Basic terms organized in vocabularies provide an essential element to underpin the biodiversity information standards. As introduced at the TDWG 2011 TAG meetings in New Orleans, we propose the formation of a new Vocabulary Management Task Group (VOMAG) [1] to be organized under the TDWG technical architecture group (TAG). Please find the draft charter available from the GBIF Community Site [2][3]. There you will also find another draft document, "Biodiversity Knowledge Organization System: Proposed Architecture; Version 0.1 February 2012", which provides an overview of the proposed KOS landscape and the context for the proposed work-plan of the Vocabulary Management Task Group Charter.
This is the first draft, so far discussed only with Greg Whitbread as convener of the TDWG TAG and with Steve Baskauf and Joel Sachs as conveners of the TDWG RDF/OWL task group. We invite feedback and comments on the proposed formation of the task group, including suggestions with regard to the work-plan. Please join the Vocabulary Management group at the GBIF Community Site [1]. You can start or participate in discussions or share suggestions using the GBIF Community Site. Also feel free to contact us to volunteer as a core member of this proposed task group!
[1] http://community.gbif.org/pg/groups/21382/vocabulary-management/
[2] http://community.gbif.org/pg/file/read/21388/
[3] http://community.gbif.org/pg/blog/read/21387/
[4] http://community.gbif.org/pg/file/read/21582/
Best regards,
Dag, Eamonn and David
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu