[tdwg-tag] Creating a TDWG standard for documenting Data Standards

Éamonn Ó Tuama [GBIF] eotuama at gbif.org
Wed Mar 7 09:57:48 CET 2012


Hi Steve,
I do think you raise an important issue - it would be good to develop
guidelines for expressing controlled vocabularies in RDF and I think the
task fits nicely within the RDF and VoMaG groups. I think it would be the
foundation for developing multilingual controlled vocabs (probably expressed
in SKOS) - something GBIF also needs to address.
Éamonn

-----Original Message-----
From: Steve Baskauf [mailto:steve.baskauf at vanderbilt.edu] 
Sent: 06 March 2012 20:53
To: Chuck Miller
Cc: "Markus Döring (GBIF)"; "Éamonn Ó Tuama (GBIF)"; TDWG TAG
Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data
Standards

Well, this is moderately amusing because I didn't realize I had sent the 
original email to the TAG email list - it was intended for Eamonn and 
Dag to think about as a possible future course of action.  :-) But no 
matter - fortunately when I write emails I assume that anyone could be 
reading them at some point in the future.

In any case, I will reiterate what I said in the initial posting which 
is that this has nothing to do with defining the process for 
ratifying and maintaining data standards.  I'm not interested in 
touching that with a 100 meter pole.  It would simply be an attempt to 
lay out how one would document controlled vocabularies in RDF, which I 
believe falls within the charge of the RDF/OWL group.  In fact, 
guidelines on how to represent controlled vocabularies in RDF would be 
as useful for unratified controlled vocabularies as they would be for 
ratified data standards.
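
As a rough illustration of what such guidelines might cover, here is a 
minimal sketch in Python (standard library only) that writes a tiny 
controlled vocabulary as SKOS in Turtle and resolves a text value to a 
term URI.  The namespace http://example.org/vocab/sex/, the two terms, 
their labels, and the function names are all assumptions made up for 
this example; none of them come from an existing TDWG or GBIF 
vocabulary.

```python
# Hypothetical namespace and terms, for illustration only.
BASE = "http://example.org/vocab/sex/"

# local name -> {language tag: preferred label}
TERMS = {
    "female": {"en": "female", "es": "hembra"},
    "male": {"en": "male", "es": "macho"},
}


def term_uri(local_name):
    """Build the URI reference form of a term."""
    return BASE + local_name


def to_turtle(terms):
    """Serialize the vocabulary as a SKOS concept scheme in Turtle.

    Labels are assumed to be plain text without quotes or backslashes,
    so no string escaping is attempted here.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{BASE}> a skos:ConceptScheme .",
    ]
    for local_name, labels in sorted(terms.items()):
        tagged = [f'"{text}"@{lang}' for lang, text in sorted(labels.items())]
        lines.append("")
        lines.append(f"<{term_uri(local_name)}> a skos:Concept ;")
        lines.append(f"    skos:inScheme <{BASE}> ;")
        lines.append("    skos:prefLabel " + ", ".join(tagged) + " .")
    return "\n".join(lines)


def resolve(value, terms):
    """Map a string value (in any listed language) to its term URI, or None."""
    needle = value.strip().lower()
    for local_name, labels in terms.items():
        if needle in (text.lower() for text in labels.values()):
            return term_uri(local_name)
    return None


if __name__ == "__main__":
    print(to_turtle(TERMS))
    print(resolve("Hembra", TERMS))
```

Keeping the labels in one table keyed by language tag means the string 
form and the URI form of a value come from the same source, which is 
one possible way to handle both forms side by side.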

I was hoping to wait to bring this up with a larger group until after 
the discussions within the RDF/OWL group had made more progress and possibly 
achieved some consensus.  But since it's been brought up, I would 
encourage anyone with an interest in discussing effective means of data 
transfer in RDF to join the RDF group discussion.  You can request to be 
added to the email list at http://groups.google.com/group/tdwg-rdf .

Steve

On 3/6/2012 12:31 PM, Chuck Miller wrote:
> Steve,
> I think you may be opening Pandora's box with the question of TDWG Data
> Standards for DwC, particularly given the pervasiveness of some of the DwC
> and DCTerms terms for which a controlled vocabulary is needed/recommended.
> But the box needs opening.
>
> Establishing global consensus for controlled values for the more narrowly
> circumscribed data exchange formats of the past was difficult, notably the
> World Geographical Scheme still in use, which was initially a paper
> publication and created for botany before TDWG broadened its charter, and
> which required many months or years of work and collaborative agreement.
> Some of the prior data standards merely declared standard works as the basis
> for controlled values, such as TL-2 and BPH (both data standards for
> botanical literature references that took years and years to create.)  But,
> that approach couldn't extend to zoology where there is no equivalent to
> TL-2 or BPH.  The practices of the past point out how things could be agreed
> as standard back then before the modern era of linked objects.  But, even
> then it took years of work.
>
> Leaving the past and coming forward to now, some of the more pervasive
> terms in DwC will require wide involvement of many people to truly establish
> a consensus for controlled values.  It's work that needs to be done.
> But it will need to be community-inclusive and probably parsed into groups
> of terms rather than attempting to tackle them all at once.  GBIF has started
> the ball rolling on some of it.
>
> I still have plans to update the World Geographical Scheme data standard
> to make it truly world, including oceans.  But, like everyone, the demands
> of the work I'm paid to do continue to limit my time to devote to TDWG.
>
> Chuck
>
>
> -----Original Message-----
> From: tdwg-tag-bounces at lists.tdwg.org
> [mailto:tdwg-tag-bounces at lists.tdwg.org] On Behalf Of "Markus Döring (GBIF)"
> Sent: Tuesday, March 06, 2012 10:32 AM
> To: Steve Baskauf
> Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG
> Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data
> Standards
>
> Thanks for bringing this to attention Steve.
> It is a very important piece in the puzzle of interoperability.
>
> Just quickly I'd like to point to some more of such data standards.
> Some were created by TDWG some time ago and are now marked as "prior
> standards":
> http://www.tdwg.org/standards/
> For example the "World Geographical Scheme for Recording Plant
> Distributions" is still a widely adopted standard to my knowledge. But as
> you can see there are others which don't even have a download link and
> effectively are lost.
>
> For dwc archives GBIF and others have also created many formal
> vocabularies, none of which are ratified by TDWG though. Ratification of
> controlled vocabularies is a bit harder in my mind, as many of the
> vocabularies are living to some degree and definitely not as static as data
> exchange formats or protocols. So versioning and frequent updates become
> critical.
>
> For many of the dwc terms that you have listed you will find a vocabulary
> in use here (in particular under gbif):
> http://rs.gbif.org/vocabulary/
>
> best,
> Markus
>
>
>
> On 06.03.2012, at 17:11, Steve Baskauf wrote:
>
>> Dag and Éamonn,
>>
>> In the context of the discussion which has been going on in the TDWG RDF
>> mailing list, I have been thinking more about the issue of how to deal with
>> DwC terms which state "Recommended best practice is to use a controlled
>> vocabulary...".  That would be dcterms:type, dwc:language,
>> dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition,
>> dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition,
>> dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country,
>> dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus,
>> dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode,
>> dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType.
>>
>> It seems to me that it would be optimal for TDWG to have a standard for
>> documenting controlled vocabularies of this sort which I believe to be the
>> standard category "Data Standard (DS)" described at
>> http://www.tdwg.org/standards/status-and-categories/ .  To my knowledge,
>> there are no such current DS standards, nor are there any guidelines as to
>> how they should be defined/documented.  But there SHOULD be such standards
>> and the lack of them is impeding progress in our community.  I think this is
>> reflected in your effort to form the VOMAG group.
>>
>> TDWG does have a model for standards documentation in the form of the
>> TDWG Standards Documentation Specification
>> (http://www.tdwg.org/standards/147/), which although unratified is being
>> used to specify how standards should be documented for humans.  There are
>> also several models for defining controlled values in RDF:
>>
>> http://dublincore.org/2010/10/11/dctype.rdf
>> http://id.loc.gov/vocabulary/iso639-1.rdf
>> http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctype.rdf
>>
>> There are probably more examples if I looked for them.  These models
>> have some consistency in format which can guide us.
>>
>> Since I have been the Audubon Core review manager, I am now pretty
>> familiar with the process of TDWG standards development and ratification, so
>> I think that it would be possible to draft a standard based on the existing
>> models that I have listed above, and to do so in a reasonable amount of
>> time.  I would envision that this standard would define how the
>> human-readable and machine-readable documentation of the controlled
>> vocabularies would be written, but it would not specify who should maintain
>> the vocabularies, where and how they should be maintained, etc.  Those are
>> technical details that can be worked out by groups such as your Vocabulary
>> Management Task Group.
>>
>> Normally, the first step in creating a standard would be to form a task
>> group to take on the work.  However, as far as I am concerned, the TDWG RDF
>> group already has been given the task of creating this kind of thing and
>> when your task group is approved, I think this task would be within the
>> charge of it as well (your charter goal "Develop technical guidelines for
>> TDWG vocabularies of basic terms...").  So what I am proposing is that at
>> some point in the near future, we work on creating this documentation
>> standard as a joint project of the two task groups.  I am willing to get it
>> started by writing a draft.  It could then be discussed by the two task
>> groups and revised as necessary.  I do not think that this documentation
>> standard needs to be very complex, nor will it be controversial.  So I think
>> that it could be ratified in a period much shorter than what has been
>> required for Technical Standards like Darwin Core and Audubon Core.  Please
>> note that I am NOT proposing that we actually create any Data Standards, but
>> rather that we create guidelines for how they should be documented for
>> humans and machines.
>>
>> Although I am not in a position to initiate this at the present moment, I
>> wanted to give you some time to think about this before I make any kind of
>> official proposal.  I believe that the current discussion on the TDWG RDF
>> email list is highly relevant to this issue because we need to work out how
>> to present in RDF controlled vocabulary values in both string and URI
>> reference forms.  In the context of the ongoing discussion on the list, I
>> intend to specifically bring up the issue of dwc:basisOfRecord and
>> dcterms:type which both specify the use of a controlled vocabulary and which
>> both are defined in RDF as well as having text values.
>>
>> I don't want to seem "pushy" about this.  However, when I took on the
>> role as co-convener of the RDF Task Group, it was with the understanding
>> that we would make significant progress in one year.  It has now been five
>> months out of that year and I feel the need to demonstrate that we can
>> accomplish something tangible.  If we cannot actually accomplish something
>> in the course of a year, then I need to move on to other projects that I've
>> put aside in order to work on this.  So I am committed to trying to keep
>> things moving.
>>
>> Please think about this and we can talk about it more later.
>>
>> Steve
>>
>> On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:
>>> Dear TAG,
>>>
>>> After battling with the plans for a biodiversity knowledge organization
>>> system (KOS) framework for biodiversity information resources we have
>>> identified the requirement to develop guidelines and best practices
>>> for the management of vocabularies of terms. Basic terms organized in
>>> vocabularies provide an essential element to underpin the
>>> biodiversity information standards. As introduced at the TDWG 2011
>>> TAG meetings in New Orleans, we propose the formation of a new
>>> Vocabulary Management Task Group (VOMAG) [1] to be organized under
>>> the TDWG technical architecture group (TAG). Please find the draft
>>> charter available from the GBIF Community Site [2][3]. Here you will
>>> also find another draft document "Biodiversity Knowledge Organization
>>> System: Proposed Architecture; Version 0.1 February 2012" that
>>> provides an overview of the proposed KOS landscape and the context
>>> for the proposed work-plan of the Vocabulary Management Task Group
>>> Charter.
>>>
>>> This is the first draft, so far only discussed with Greg Whitbread as
>>> convener of the TDWG TAG and with Steve Baskauf and Joel Sachs as
>>> conveners of the TDWG RDF/OWL task group. We invite feedback and
>>> comments on the proposed formation of the task group including
>>> suggestions with regard to the work-plan. Please join the Vocabulary
>>> Management group at the GBIF Community Site [1]. You can start or
>>> participate in discussions or share suggestions using the GBIF
>>> Community Site. Also feel free to contact us to volunteer
>>> as a core member for this proposed task group!
>>>
>>> [1]
>>> http://community.gbif.org/pg/groups/21382/vocabulary-management/
>>>
>>> [2]
>>> http://community.gbif.org/pg/file/read/21388/
>>>
>>> [3]
>>> http://community.gbif.org/pg/blog/read/21387/
>>>
>>> [4]
>>> http://community.gbif.org/pg/file/read/21582/
>>>
>>>
>>>
>>> Best regards
>>> Dag, Eamonn and David
>>>
>>>
>>>
>>>
>> --
>> Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept.
>> of Biological Sciences
>>
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN  37235-1634,  U.S.A.
>>
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>>
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582,  fax: (615) 343-6707
>>
>> http://bioimages.vanderbilt.edu
>> _______________________________________________
>> tdwg-tag mailing list
>> tdwg-tag at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-tag







