[tdwg-tag] Creating a TDWG standard for documenting Data Standards

Steve Baskauf steve.baskauf at vanderbilt.edu
Tue Mar 6 18:57:35 CET 2012


Markus,

Thanks for pointing these things out.  I haven't looked at all of the 
prior standards, but all of the ones that I've looked at predate the 
development of RDF and are therefore defined in either "human readable" 
terms, or as XML schemas.  My interest in this is to try to describe an 
uncomplicated but standardized way to create dereferenceable URIs for 
what would otherwise be text-only terms.

As far as the process of ratification and maintenance of controlled 
vocabularies is concerned, I think that is going to have to be taken 
up by somebody other than me (maybe the Vocabulary Management Task 
Group).  I'm not really suggesting here that we create a standard for 
the PROCESS of adopting controlled vocabularies, but rather a standard 
for describing and documenting them.  This would perform a function 
similar to that of the TDWG Standards Documentation Specification 
(http://www.tdwg.org/standards/147/), which specifies how standards are 
documented for humans, while the process under which they are developed 
is defined separately at http://www.tdwg.org/about-tdwg/process/ .

Steve

On 3/6/2012 10:32 AM, "Markus Döring (GBIF)" wrote:
> Thanks for bringing this to attention Steve.
> It is a very important piece in the puzzle of interoperability.
>
> Just quickly, I'd like to point to some more of such data standards.
> Some have been created by TDWG some time ago and now are marked as "prior standards":
> http://www.tdwg.org/standards/
> For example, the "World Geographical Scheme for Recording Plant Distributions" is still a widely adopted standard, to my knowledge. But as you can see, there are others which don't even have a download link and are effectively lost.
>
> For dwc archives, GBIF and others have also created many formal vocabularies, though none of them are ratified by TDWG. Ratification of controlled vocabularies is a bit harder in my mind, as many of the vocabularies are living to some degree and definitely not as static as data exchange formats or protocols. So versioning and frequent updates become critical.
>
> For many of the dwc terms that you have listed you will find a vocabulary in use here (in particular under gbif):
> http://rs.gbif.org/vocabulary/
>
> best,
> Markus
>
>
>
> On 06.03.2012, at 17:11, Steve Baskauf wrote:
>
>> Dag and Éamonn,
>>
>> In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...".  That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType.
>>
>> It seems to me that it would be optimal for TDWG to have a standard for documenting controlled vocabularies of this sort, which I believe fall under the standard category "Data Standard (DS)" described at http://www.tdwg.org/standards/status-and-categories/ .  To my knowledge, there are no current DS standards, nor are there any guidelines as to how they should be defined or documented.  But there SHOULD be such standards, and the lack of them is impeding progress in our community.  I think this is reflected in your effort to form the VOMAG group.
>>
>> TDWG does have a model for standards documentation in the form of the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/), which although unratified is being used to specify how standards should be documented for humans.  There are also several models for defining controlled values in RDF:
>>
>> http://dublincore.org/2010/10/11/dctype.rdf
>> http://id.loc.gov/vocabulary/iso639-1.rdf
>> http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctype.rdf
>>
>> I could probably find more examples if I looked for them.  These models share enough consistency in format to guide us.
>>
>> Since I have been the Audubon Core review manager, I am now pretty familiar with the process of TDWG standards development and ratification, so I think that it would be possible to draft a standard based on the existing models that I have listed above, and to do so in a reasonable amount of time.  I would envision that this standard would define how the human-readable and machine readable documentation of the controlled vocabularies would be written, but it would not specify who should maintain the vocabularies, where and how they should be maintained, etc.  Those are technical details that can be worked out by groups such as your Vocabulary Management Task Group.
>>
>> Normally, the first step in creating a standard would be to form a task group to take it on.  However, as far as I am concerned, the TDWG RDF group has already been given the task of creating this kind of thing, and when your task group is approved, I think this task would be within its charge as well (your charter goal "Develop technical guidelines for TDWG vocabularies of basic terms...").  So what I am proposing is that at some point in the near future, we work on creating this documentation standard as a joint project of the two task groups.  I am willing to get it started by writing a draft.  It could then be discussed by the two task groups and revised as necessary.  I do not think that this documentation standard needs to be very complex, nor will it be controversial.  So I think that it could be ratified in a period much shorter than what has been required for Technical Standards like Darwin Core and Audubon Core.  Please note that I am NOT proposing that we actually create any Data Standards, but rather that we create guidelines for how they should be documented for humans and machines.
>>
>> Although I am not in a position to initiate this at the present moment, I wanted to give you some time to think about this before I make any kind of official proposal.  I believe that the current discussion on the TDWG RDF email list is highly relevant to this issue because we need to work out how to present in RDF controlled vocabulary values in both string and URI reference forms.  In the context of the ongoing discussion on the list, I intend to specifically bring up the issue of dwc:basisOfRecord and dcterms:type which both specify the use of a controlled vocabulary and which both are defined in RDF as well as having text values.
>>
>> I don't want to seem "pushy" about this.  However, when I took on the role as co-convener of the RDF Task Group, it was with the understanding that we would make significant progress in one year.  It has now been five months out of that year and I feel the need to demonstrate that we can accomplish something tangible.  If we cannot actually accomplish something in the course of a year, then I need to move on to other projects that I've put aside in order to work on this.  So I am committed to trying to keep things moving.
>>
>> Please think about this and we can talk about it more later.
>>
>> Steve
>>
>> On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:
>>> Dear TAG,
>>>
>>> After battling with the plans for a biodiversity knowledge organization
>>> system (KOS) framework for biodiversity information resources, we have
>>> identified the requirement to develop guidelines and best practices for
>>> the management of vocabularies of terms. Basic terms organized in
>>> vocabularies provide an essential element to underpin the biodiversity
>>> information standards. As introduced at the TDWG 2011 TAG meetings in
>>> New Orleans, we propose the formation of a new Vocabulary Management
>>> Task Group (VOMAG) [1] to be organized under the TDWG technical
>>> architecture group (TAG). Please find the draft charter available from
>>> the GBIF Community Site [2][3]. Here you will also find another draft
>>> document "Biodiversity Knowledge Organization System: Proposed
>>> Architecture; Version 0.1 February 2012" that provides an overview of
>>> the proposed KOS landscape and the context for the proposed work-plan of
>>> the Vocabulary Management Task Group Charter.
>>>
>>> This is the first draft, so far discussed only with Greg Whitbread as
>>> convener of the TDWG TAG and with Steve Baskauf and Joel Sachs as
>>> conveners of the TDWG RDF/OWL task group. We invite feedback and comments
>>> to the proposed formation of the task group including suggestions with
>>> regard to the work-plan. Please join the Vocabulary Management group at
>>> the GBIF Community Site [1]. You can start or participate in discussions
>>> or share suggestions using the GBIF Community Site. Please also feel free to
>>> make contact with us to volunteer as a core member for this proposed
>>> task group!
>>>
>>> [1]
>>> http://community.gbif.org/pg/groups/21382/vocabulary-management/
>>>
>>> [2]
>>> http://community.gbif.org/pg/file/read/21388/
>>>
>>> [3]
>>> http://community.gbif.org/pg/blog/read/21387/
>>>
>>> [4]
>>> http://community.gbif.org/pg/file/read/21582/
>>>
>>>
>>>
>>> Best regards
>>> Dag, Eamonn and David
>>>
>>>
>>>
>>>
>> -- 
>> Steven J. Baskauf, Ph.D., Senior Lecturer
>> Vanderbilt University Dept. of Biological Sciences
>>
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN  37235-1634,  U.S.A.
>>
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>>
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582,  fax: (615) 343-6707
>>
>> http://bioimages.vanderbilt.edu





