Proposed charter for a TAG Vocabulary Management Task Group (VOMAG)


Dag Endresen (GBIF)

10 Feb 2012

11:20

Dear TAG,

After battling with the plans for a biodiversity knowledge organization system (KOS) framework for biodiversity information resources, we have identified the requirement to develop guidelines and best practices for the management of vocabularies of terms. Basic terms organized in vocabularies provide an essential element to underpin the biodiversity information standards. As introduced at the TDWG 2011 TAG meetings in New Orleans, we propose the formation of a new Vocabulary Management Task Group (VOMAG) [1] to be organized under the TDWG Technical Architecture Group (TAG). Please find the draft charter available from the GBIF Community Site [2][3]. Here you will also find another draft document, "Biodiversity Knowledge Organization System: Proposed Architecture; Version 0.1 February 2012", that provides an overview of the proposed KOS landscape and the context for the proposed work-plan of the Vocabulary Management Task Group Charter.

This is the first draft, so far discussed only with Greg Whitbread as convener of the TDWG TAG and with Steve Baskauf and Joel Sachs as conveners of the TDWG RDF/OWL task group. We invite feedback and comments on the proposed formation of the task group, including suggestions with regard to the work-plan. Please join the Vocabulary Management group at the GBIF Community Site [1]. You can start or participate in discussions or share suggestions using the GBIF Community Site. Feel free also to contact us to volunteer as a core member for this proposed task group!

[1] http://community.gbif.org/pg/groups/21382/vocabulary-management/
[2] http://community.gbif.org/pg/file/read/21388/
[3] http://community.gbif.org/pg/blog/read/21387/
[4] http://community.gbif.org/pg/file/read/21582/

Best regards
Dag, Eamonn and David

--
Dag Endresen, PhD
Knowledge Systems Engineer
Global Biodiversity Information Facility (GBIF)
Universitetsparken 15, DK-2100 Copenhagen, Denmark
http://community.gbif.org/pg/profile/dag.endresen


Steve Baskauf

10 Feb

19:01

New subject: [tdwg-tag] Proposed charter for a TAG Vocabulary Management Task Group (VOMAG)

I'm going to post this message here rather than on the GBIF community site for the Vocabulary Management group because the subject lies at the interface between what the VoMaG is proposing and the core mission of TDWG.

As the review manager for Audubon Core, I have struggled to understand how one is supposed to reconcile the practical details of publishing and maintaining a vocabulary with the standards documentation process described in the draft TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/). I think that the specification works pretty well for a standard like an Applicability Statement (sensu http://www.tdwg.org/standards/status-and-categories/), which is by its nature somewhat descriptive and not likely to have repeated, small technical changes over time. However, I don't think that the specification really works for something like a vocabulary, which falls into the Technical Specification category. Vocabularies such as Darwin Core and the draft Audubon Core standard are subject to continual change and evolution. The documentation specification does not allow that because "once a standard has been ratified it cannot be changed in any substantive way; it must be superseded by a standard with a different name" (section 8). Vocabularies may also be defined in non-human-readable form (e.g. Darwin Core, whose normative definition is in RDF), which is not really dealt with adequately in the documentation specification, which is focused primarily on the formatting of human-readable documents.

So the documentation specification leaves us with little guidance on the subject of vocabularies. In the case of Darwin Core, its namespace policy (http://rs.tdwg.org/dwc/terms/namespace/index.htm) describes the process by which the vocabulary can evolve. Although this process is somewhat lengthy and perhaps confusing to the uninitiated, it has actually worked and resulted in improvements in the standard.

In the Audubon Core review process, we have talked about using a similar mechanism to ensure that the AC vocabulary can also evolve over time without the necessity to go through the really painful process of resubmitting the vocabulary as a new standard (which would be required under the draft documentation standard). But such a mechanism is not included (or even allowed) in the current version of the documentation specification.

I think that the standards development process at TDWG has been hampered by the lack of a ratified Documentation Standard. However, I don't think the solution is to just ratify the existing version because, as I've noted, I don't think it really works for vocabularies. If it is appropriate, it might be good for the VoMaG to look at the process of vocabulary maintenance and documentation in the "standards" context. For example, should the process described in the DwC namespace policy be adopted more broadly as a policy for all TDWG standards that are vocabularies? Who should bear the responsibility of managing the revisions to vocabulary standards: the task group that created them, the TAG, or somebody else? How do we ensure that the normative definitions of the vocabularies are readily accessible and understandable to software developers and content providers who want to use them? These are all things which I think probably should be included in a final Documentation Standard, and I think that based on recent experience we have a better idea now as to what might work than when the existing draft was created in 2007. If it turns out that figuring this out falls outside of the mandate of the VoMaG, then perhaps another task group could be formed to work in parallel with the VoMaG to work out these details and to put them into writing in the form of a ratified Standards Documentation Specification.

I also would like to suggest that the VoMaG take a look at the fourth category of standards described at http://www.tdwg.org/standards/status-and-categories/ , namely the "Data Standard" category. The category description says that a Data Standard "specifies valid values in controlled vocabularies". I'm not sure exactly what that means, but I'm assuming it means what one uses when a term definition says "Recommended best practice is to use a controlled vocabulary". This frequently comes up, but to my knowledge, nobody has ever tried to define a controlled vocabulary as a Data Standard. In that context, it would be useful to establish just what a "controlled vocabulary" means. Most current applications provide text literals as the values for properties such as basisOfRecord. However, in the case of RDF, it might be a better practice to reference URIs rather than text strings. Or not. I'm not sure. Again, this is something that would be useful to work out. We currently don't have any precedents in the TDWG community, so maybe there are examples elsewhere that we could use as a model.

Steve

On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:

...


-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A. delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235 office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
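To make the literal-versus-URI question in the message above a little more concrete, here is a minimal sketch of the two ways a dwc:basisOfRecord value could be written as an RDF triple. It is only an illustration under stated assumptions: the record URI http://example.org/specimen/1 is hypothetical, the type-vocabulary namespace is assumed to be http://rs.tdwg.org/dwc/dwctype/, and N-Triples was picked purely for simplicity. Nothing in the thread prescribes this form.

```python
# Two candidate ways of stating a controlled value in RDF (N-Triples syntax).
DWC = "http://rs.tdwg.org/dwc/terms/"
DWCTYPE = "http://rs.tdwg.org/dwc/dwctype/"  # assumed namespace for the type vocabulary


def literal_triple(subject: str, predicate: str, text: str) -> str:
    """Return a triple whose object is a plain text literal."""
    # Escape backslashes first, then double quotes, as N-Triples requires.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


def uri_triple(subject: str, predicate: str, obj: str) -> str:
    """Return a triple whose object is a URI reference."""
    return f"<{subject}> <{predicate}> <{obj}> ."


record = "http://example.org/specimen/1"  # hypothetical record identifier

# Option 1: the value is a text string, as most current applications provide it.
as_text = literal_triple(record, DWC + "basisOfRecord", "PreservedSpecimen")

# Option 2: the value is a URI that could be dereferenced for a definition.
as_uri = uri_triple(record, DWC + "basisOfRecord", DWCTYPE + "PreservedSpecimen")

if __name__ == "__main__":
    print(as_text)
    print(as_uri)
```

With the literal form, two providers agree only if they spell the string identically; with the URI form, the value itself can point to its own documentation. Which of the two is the better practice is exactly the open question raised above.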

Steve Baskauf

6 Mar

16:11

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType.

It seems to me that it would be optimal for TDWG to have a standard for documenting controlled vocabularies of this sort, which I believe to be the standard category "Data Standard (DS)" described at http://www.tdwg.org/standards/status-and-categories/ . To my knowledge, there are no such current DS standards, nor are there any guidelines as to how they should be defined/documented. But there SHOULD be such standards, and the lack of them is impeding progress in our community. I think this is reflected in your effort to form the VOMAG group.

TDWG does have a model for standards documentation in the form of the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/), which although unratified is being used to specify how standards should be documented for humans. There are also several models for defining controlled values in RDF:

http://dublincore.org/2010/10/11/dctype.rdf
http://id.loc.gov/vocabulary/iso639-1.rdf
http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctype.rdf

There are probably more examples if I were to look for them. These models have some consistency in format which can guide us.

Since I have been the Audubon Core review manager, I am now pretty familiar with the process of TDWG standards development and ratification, so I think that it would be possible to draft a standard based on the existing models that I have listed above, and to do so in a reasonable amount of time. I would envision that this standard would define how the human-readable and machine-readable documentation of the controlled vocabularies would be written, but it would not specify who should maintain the vocabularies, where and how they should be maintained, etc. Those are technical details that can be worked out by groups such as your Vocabulary Management Task Group.

Normally, the first step in creating a standard would be to form a task group to take on the task. However, as far as I am concerned, the TDWG RDF group already has been given the task of creating this kind of thing, and when your task group is approved, I think this task would be within its charge as well (your charter goal "Develop technical guidelines for TDWG vocabularies of basic terms..."). So what I am proposing is that at some point in the near future, we work on creating this documentation standard as a joint project of the two task groups. I am willing to get it started by writing a draft. It could then be discussed by the two task groups and revised as necessary. I do not think that this documentation standard needs to be very complex, nor will it be controversial. So I think that it could be ratified in a period much shorter than what has been required for Technical Standards like Darwin Core and Audubon Core. Please note that I am NOT proposing that we actually create any Data Standards, but rather that we create guidelines for how they should be documented for humans and machines.

Although I am not in a position to initiate this at the present moment, I wanted to give you some time to think about this before I make any kind of official proposal. I believe that the current discussion on the TDWG RDF email list is highly relevant to this issue because we need to work out how to present controlled vocabulary values in RDF in both string and URI reference forms. In the context of the ongoing discussion on the list, I intend to specifically bring up the issue of dwc:basisOfRecord and dcterms:type, which both specify the use of a controlled vocabulary and which both are defined in RDF as well as having text values.

I don't want to seem "pushy" about this. However, when I took on the role as co-convener of the RDF Task Group, it was with the understanding that we would make significant progress in one year. It has now been five months out of that year and I feel the need to demonstrate that we can accomplish something tangible. If we cannot actually accomplish something in the course of a year, then I need to move on to other projects that I've put aside in order to work on this. So I am committed to trying to keep things moving.

Please think about this and we can talk about it more later.

Steve

On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:

...

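As a rough sketch of what documenting a controlled vocabulary "for humans and machines" might involve, the example below keeps one record per term and derives both a plain-text line and a Turtle description from it, plus a lookup from a text value to its URI. Everything specific here is an assumption made for illustration: the namespace http://example.org/vocab/basisOfRecord/, the two sample definitions, and the choice of rdfs:label and rdfs:comment are hypothetical and are not taken from the RDF models cited in the message above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical namespace; a real vocabulary would use its own stable URIs.
EXAMPLE_NS = "http://example.org/vocab/basisOfRecord/"
RDFS_PREFIX = "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."


@dataclass(frozen=True)
class Term:
    uri: str         # URI reference form of the value
    label: str       # text string form of the value
    definition: str  # human-readable definition (illustrative wording)


TERMS = [
    Term(EXAMPLE_NS + "PreservedSpecimen", "PreservedSpecimen",
         "A specimen that has been preserved."),
    Term(EXAMPLE_NS + "HumanObservation", "HumanObservation",
         "An observation made by a person."),
]


def _quote(text: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def human_readable(term: Term) -> str:
    """One documentation line intended for people."""
    return f"{term.label}: {term.definition} ({term.uri})"


def machine_readable(term: Term) -> str:
    """A Turtle description of the same term (needs RDFS_PREFIX declared first)."""
    return (f'<{term.uri}> rdfs:label "{_quote(term.label)}" ;\n'
            f'    rdfs:comment "{_quote(term.definition)}" .')


def resolve(text: str, terms: Sequence[Term] = TERMS) -> Optional[str]:
    """Map a text value to its URI, ignoring case and surrounding whitespace."""
    wanted = text.strip().casefold()
    for term in terms:
        if term.label.casefold() == wanted:
            return term.uri
    return None


if __name__ == "__main__":
    print(RDFS_PREFIX)
    for t in TERMS:
        print(human_readable(t))
        print(machine_readable(t))
    print(resolve("humanobservation"))
```

Keeping a single record per term and generating both outputs from it is one way to stop the human-readable and machine-readable documentation from drifting apart; who maintains the records, and where, is left open here just as it is in the proposal.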

Markus Döring (GBIF)

16:32

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Thanks for bringing this to attention, Steve. It is a very important piece in the puzzle of interoperability.

Just quickly, I'd like to point to some more such data standards. Some were created by TDWG some time ago and are now marked as "prior standards": http://www.tdwg.org/standards/ For example, the "World Geographical Scheme for Recording Plant Distributions" is still a widely adopted standard, to my knowledge. But as you can see, there are others which don't even have a download link and are effectively lost.

For dwc archives, GBIF and others have also created many formal vocabularies, none of which are ratified by TDWG though. Ratification of controlled vocabularies is a bit harder in my mind, as many of the vocabularies are living to some degree and definitely not as static as data exchange formats or protocols. So versioning and frequent updates become critical.

For many of the dwc terms that you have listed, you will find a vocabulary in use here (in particular under gbif): http://rs.gbif.org/vocabulary/

best,
Markus

On 06.03.2012, at 17:11, Steve Baskauf wrote:

...


Steve Baskauf

17:57

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Markus,

Thanks for pointing these things out. I haven't looked at all of the prior standards, but all of the ones that I've looked at predate the development of RDF and are therefore defined either in "human readable" terms or as XML schemas. My interest in this is to try to describe an uncomplicated but standardized way to create dereferenceable URIs for what would otherwise be text-only terms.

As far as the process of ratification and maintenance of controlled vocabularies is concerned, I think that is going to have to be taken up by somebody other than me (maybe the Vocabulary Management Task Group). I'm not really suggesting here that we create a standard for the PROCESS of adopting controlled vocabularies, but rather a standard for describing and documenting them. This would perform a function similar to the way that the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/) specifies how one documents, for humans, standards which are developed under a process that is defined separately at http://www.tdwg.org/about-tdwg/process/ .

Steve

On 3/6/2012 10:32 AM, "Markus Döring (GBIF)" wrote:

...

Thanks for bringing this to attention Steve. It is a very important piece in the puzzle of interoperability.

Just quickly Id like to point to some more of such data standards. Some have been created by TDWG some time ago and now are marked as "prior standards": http://www.tdwg.org/standards/ For example the "World Geographical Scheme for Recording Plant Distributions" is still a widely adopted standard to my knowledge. But as you can see there are others which dont even have a download link and effectively are lost.

For dwc archives GBIF and others have also created many formal vocabularies, none of which are ratified by TDWG though. Ratification of controlled vocabularies is a bit harder in my mind, as many of the vocabularies are living to some degree and definitely not as static as data exchange formats or protocols. So versioning and frequent updates become critical.

For many of the dwc terms that you have listed you will find a vocabulary in use here (in particular under gbif): http://rs.gbif.org/vocabulary/

best, Markus

On 06.03.2012, at 17:11, Steve Baskauf wrote:

...
Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank; dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType .

It seems to me that it would be optimal for TDWG to have a standard for documenting controlled vocabularies of this sort which I believe to be the standard category "Data Standard (DS)" described at http://www.tdwg.org/standards/status-and-categories/ . To my knowledge, there are no such current DS standards, nor are there any guidelines as to how they should be defined/documented. But there SHOULD be such standards and the lack of them is impeding progress in our community. I think this is reflected in your effort to form the VOMAG group.

TDWG does have a model for standards documentation in the form of the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/), which although unratified is being used to specify how standards should be documented for humans. There are also several models for defining controlled values in RDF:

http://dublincore.org/2010/10/11/dctype.rdf http://id.loc.gov/vocabulary/iso639-1.rdf http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctype.rdf

There are probably more examples if I would look for them. These models have some consistency in format which can guide us.

Since I have been the Audubon Core review manager, I am now pretty familiar with the process of TDWG standards development and ratification, so I think that it would be possible to draft a standard based on the existing models that I have listed above, and to do so in a reasonable amount of time. I would envision that this standard would define how the human-readable and machine readable documentation of the controlled vocabularies would be written, but it would not specify who should maintain the vocabularies, where and how they should be maintained, etc. Those are technical details that can be worked out by groups such as your Vocabulary Management Task Group.

Normally, the creation of a standard would be to form a task group to take on the task of creating the standard. However, as far as I am concerned, the TDWG RDF group already has been given the task of creating this kind of thing and when your task group is approved, I think this task would be within the charge of it as well (your charter goal " Develop technical guidelines for TDWG vocabularies of basic terms..." . So what I am proposing is that at some point in the near future, we work on creating this documentation standard as a joint project of the two task groups. I am willing to get it started by writing a draft. It could then be discussed by the two task groups and revised as necessary. I do not think that this documentation standard needs to be very complex, nor will it be controversial. So I think that it could be ratified in a period much shorter than what has been required for Technical Standards like Darwin Core and Audubon Core. Please note that I am NOT proposing that we actually create any Data Standards, but rather that we create guidelines for how they should be documented for humans and machines.

Although I am not in a position to initiate this at the present moment, I wanted to give you some time to think about this before I make any kind of official proposal. I believe that the current discussion on the TDWG RDF email list is highly relevant to this issue because we need to work out how to present in RDF controlled vocabulary values in both string and URI reference forms. In the context of the ongoing discussion on the list, I intend to specifically bring up the issue of dwc:basisOfRecord and dcterms:type which both specify the use of a controlled vocabulary and which both are defined in RDF as well as having text values.

I don't want to seem "pushy" about this. However, when I took on the role as co-convener of the RDF Task Group, it was with the understanding that we would make significant progress in one year. It has now been five months out of that year and I feel the need to demonstrate that we can accomplish something tangible. If we cannot actually accomplish something in the course of a year, then I need to move on to other projects that I've put aside in order to work on this. So I am committed to trying to keep things moving.

Please think about this and we can talk about it more later.

Steve

On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:

...

-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences

postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.

delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235

office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707

http://bioimages.vanderbilt.edu

_______________________________________________
tdwg-tag mailing list
tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag


Chuck Miller

18:31

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Steve, I think you may be opening Pandora's box with the question of TDWG Data Standards for DwC, particularly given the pervasiveness of some of the DwC and DCTerms terms for which a controlled vocabulary is needed/recommended. But the box needs opening.

Establishing global consensus for controlled values for the more narrowly circumscribed data exchange formats of the past was difficult, notably the World Geographical Scheme still in use, which was initially a paper publication and created for botany before TDWG broadened its charter, and which required many months or years of work and collaborative agreement. Some of the prior data standards merely declared standard works as the basis for controlled values, such as TL-2 and BPH (both data standards for botanical literature references that took years and years to create). But that approach couldn't extend to zoology, where there is no equivalent to TL-2 or BPH. The practices of the past point out how things could be agreed as standard back then, before the modern era of linked objects. But even then it took years of work.

Forgetting the past and coming forward to now, some of the more pervasive terms in DwC will require wide involvement of many people to truly establish a consensus for controlled values. It's work that needs to be done. But it will need to be community-inclusive and probably parsed into groups of terms rather than attempting to tackle them all at once. GBIF has started the ball rolling on some of it.

I still have plans to update the World Geographical Scheme data standard to make it truly worldwide, including oceans. But, like everyone, the demands of the work I'm paid to do continue to limit my time to devote to TDWG.

Chuck

-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of "Markus Döring (GBIF)"
Sent: Tuesday, March 06, 2012 10:32 AM
To: Steve Baskauf
Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG
Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Thanks for bringing this to attention, Steve. It is a very important piece in the puzzle of interoperability.

Just quickly, I'd like to point to some more of such data standards. Some have been created by TDWG some time ago and now are marked as "prior standards": http://www.tdwg.org/standards/ For example, the "World Geographical Scheme for Recording Plant Distributions" is still a widely adopted standard to my knowledge. But as you can see, there are others which don't even have a download link and effectively are lost.

For dwc archives GBIF and others have also created many formal vocabularies, none of which are ratified by TDWG though. Ratification of controlled vocabularies is a bit harder in my mind, as many of the vocabularies are living to some degree and definitely not as static as data exchange formats or protocols. So versioning and frequent updates become critical.

For many of the dwc terms that you have listed you will find a vocabulary in use here (in particular under gbif): http://rs.gbif.org/vocabulary/

best, Markus

On 06.03.2012, at 17:11, Steve Baskauf wrote:

...

Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType.

It seems to me that it would be optimal for TDWG to have a standard for documenting controlled vocabularies of this sort which I believe to be the standard category "Data Standard (DS)" described at http://www.tdwg.org/standards/status-and-categories/ . To my knowledge, there are no such current DS standards, nor are there any guidelines as to how they should be defined/documented. But there SHOULD be such standards and the lack of them is impeding progress in our community. I think this is reflected in your effort to form the VOMAG group.

TDWG does have a model for standards documentation in the form of the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/), which although unratified is being used to specify how standards should be documented for humans. There are also several models for defining controlled values in RDF:

http://dublincore.org/2010/10/11/dctype.rdf

http://id.loc.gov/vocabulary/iso639-1.rdf

http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctype.rdf

There are probably more examples if I looked for them. These models have some consistency in format which can guide us.

Since I have been the Audubon Core review manager, I am now pretty familiar with the process of TDWG standards development and ratification, so I think that it would be possible to draft a standard based on the existing models that I have listed above, and to do so in a reasonable amount of time. I would envision that this standard would define how the human-readable and machine-readable documentation of the controlled vocabularies would be written, but it would not specify who should maintain the vocabularies, where and how they should be maintained, etc. Those are technical details that can be worked out by groups such as your Vocabulary Management Task Group.

Normally, the first step in creating a standard would be to form a task group to take on the work. However, as far as I am concerned, the TDWG RDF group has already been given the task of creating this kind of thing, and when your task group is approved, I think this task would be within its charge as well (your charter goal "Develop technical guidelines for TDWG vocabularies of basic terms..."). So what I am proposing is that at some point in the near future, we work on creating this documentation standard as a joint project of the two task groups. I am willing to get it started by writing a draft. It could then be discussed by the two task groups and revised as necessary. I do not think that this documentation standard needs to be very complex, nor will it be controversial. So I think that it could be ratified in a period much shorter than what has been required for Technical Standards like Darwin Core and Audubon Core. Please note that I am NOT proposing that we actually create any Data Standards, but rather that we create guidelines for how they should be documented for humans and machines.

Although I am not in a position to initiate this at the present moment, I wanted to give you some time to think about this before I make any kind of official proposal. I believe that the current discussion on the TDWG RDF email list is highly relevant to this issue because we need to work out how to present in RDF controlled vocabulary values in both string and URI reference forms. In the context of the ongoing discussion on the list, I intend to specifically bring up the issue of dwc:basisOfRecord and dcterms:type which both specify the use of a controlled vocabulary and which both are defined in RDF as well as having text values.

I don't want to seem "pushy" about this. However, when I took on the role as co-convener of the RDF Task Group, it was with the understanding that we would make significant progress in one year. It has now been five months out of that year and I feel the need to demonstrate that we can accomplish something tangible. If we cannot actually accomplish something in the course of a year, then I need to move on to other projects that I've put aside in order to work on this. So I am committed to trying to keep things moving.

Please think about this and we can talk about it more later.

Steve

On 2/10/2012 5:20 AM, Dag Endresen (GBIF) wrote:

...

Steve Baskauf

19:52

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Well, this is moderately amusing because I didn't realize I had sent the original email to the TAG email list - it was intended for Eamonn and Dag to think about as a possible future course of action. :-) But no matter - fortunately when I write emails I assume that anyone could be reading them at some point in the future.

In any case, I will reiterate what I said in the initial posting, which is that this has nothing to do with defining the process for ratifying and maintaining data standards. I'm not interested in touching that with a 100 meter pole. It would simply be an attempt to lay out how one would document controlled vocabularies in RDF, which I believe falls within the charge of the RDF/OWL group. In fact, guidelines on how to represent controlled vocabularies in RDF would be as useful for unratified controlled vocabularies as they would be for ratified data standards.

I was hoping to wait to bring this up with a larger group until after the discussions within the RDF/OWL group had made more progress and possibly achieved some consensus. But since it's been brought up, I would encourage anyone with an interest in discussing effective means of data transfer in RDF to join the RDF group discussion. You can request to be added to the email list at http://groups.google.com/group/tdwg-rdf .

Steve

On 3/6/2012 12:31 PM, Chuck Miller wrote:

...


Éamonn Ó Tuama [GBIF]

7 Mar

08:57

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards


Hi Steve, I do think you raise an important issue - it would be good to develop guidelines for expressing controlled vocabularies in RDF and I think the task fits nicely within the RDF and VoMaG groups. I think it would be the foundation for developing multilingual controlled vocabs (probably expressed in SKOS) - something GBIF also needs to address. Éamonn -----Original Message----- From: Steve Baskauf [mailto:steve.baskauf@vanderbilt.edu] Sent: 06 March 2012 20:53 To: Chuck Miller Cc: "Markus Döring (GBIF)"; "Éamonn Ó Tuama (GBIF)"; TDWG TAG Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards Well, this is moderately amusing because I didn't realize I had sent the original email to the TAG email list - it was intended for Eamonn and Dag to think about as a possible future course of action. :-) But no matter - fortunately when I write emails I assume that anyone could be reading them at some point in the future. In any case, I will reiterate what I said in the initial posting which is that this has nothing to do with the defining the process for ratifying and maintaining data standards. I'm not interested in touching that with a 100 meter pole. It would simply be an attempt to lay out how one would document controlled vocabularies in RDF, which I believe falls within the charge of the RDF/OWL group. In fact, guidelines on how to represent controlled vocabularies in RDF would be as useful for unratified controlled vocabularies as they would be for ratified data standards. I was hoping to wait to bring this up with a larger group until after the discussions within the RDF/OWL had made more progress and possibly achieved some consensus. But since it's been brought up, I would encourage anyone with an interest in discussing effective means of data transfer in RDF to join the RDF group discussion. You can request to be added to the email list at http://groups.google.com/group/tdwg-rdf . 
Steve On 3/6/2012 12:31 PM, Chuck Miller wrote: publication and created for botany before TDWG broadened its charter, and which required many months or years of work and collaborative agreement. Some of the prior data standards merely declared standard works as the basis for controlled values, such as TL-2 and BPH (both data standards for botanical literature references that took years and years to create.) But, that approach couldn't extend to zoology where there is no equivalent to TL-2 or BPH. The practices of the past point out how things could be agreed as standard back then before the modern era of linked objects. But, even then it took years of work.

...

Forgetting the past coming forward to now, some of the more pervasive

terms in DwC will require wide involvement of many people to truly establish a consensus for controlled values. It's work that is needed to be done. But, it will need to be community-inclusive and probably parsed into groups of terms rather than attempt to tackle them all at once. GBIF has started the ball rolling on some of it.

...

I still have plans to update the World Geographical Scheme data standard

to make it truly world, including oceans. But, like everyone, the demands of the work I'm paid to do continue to limit my time to devote to TDWG.

...

Chuck

-----Original Message----- From: tdwg-tag-bounces@lists.tdwg.org

...

Sent: Tuesday, March 06, 2012 10:32 AM To: Steve Baskauf Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Thanks for bringing this to attention Steve. It is a very important piece in the puzzle of interoperability.

Just quickly Id like to point to some more of such data standards. Some have been created by TDWG some time ago and now are marked as "prior standards": http://www.tdwg.org/standards/ For example the "World Geographical Scheme for Recording Plant Distributions" is still a widely adopted standard to my knowledge. But as you can see there are others which dont even have a download link and effectively are lost.

For dwc archives GBIF and others have also created many formal vocabularies, none of which are ratified by TDWG though. Ratification of controlled vocabularies is a bit harder in my mind, as many of the vocabularies are living to some degree and definitely not as static as data exchange formats or protocols. So versioning and frequent updates become critical.

For many of the dwc terms that you have listed you will find a vocabulary in use here (in particular under gbif): http://rs.gbif.org/vocabulary/

best, Markus

On 06.03.2012, at 17:11, Steve Baskauf wrote:

...
Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank; dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType .

It seems to me that it would be optimal for TDWG to have a standard for documenting controlled vocabularies of this sort which I believe to be the standard category "Data Standard (DS)" described at http://www.tdwg.org/standards/status-and-categories/ . To my knowledge, there are no such current DS standards, nor are there any guidelines as to how they should be defined/documented. But there SHOULD be such standards and the lack of them is impeding progress in our community. I think this is reflected in your effort to form the VOMAG group.

TDWG does have a model for standards documentation in the form of the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/), which although unratified is being used to specify how standards should be documented for humans. There are also several models for defining controlled values in RDF:

http://dublincore.org/2010/10/11/dctype.rdf
http://id.loc.gov/vocabulary/iso639-1.rdf
http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctype.rdf

There are probably more examples if I would look for them. These models have some consistency in format which can guide us.

Since I have been the Audubon Core review manager, I am now pretty familiar with the process of TDWG standards development and ratification, so I think that it would be possible to draft a standard based on the existing models that I have listed above, and to do so in a reasonable amount of time. I would envision that this standard would define how the human-readable and machine readable documentation of the controlled vocabularies would be written, but it would not specify who should maintain the vocabularies, where and how they should be maintained, etc. Those are technical details that can be worked out by groups such as your Vocabulary Management Task Group.

Normally, the creation of a standard would be to form a task group to take on the task of creating the standard. However, as far as I am concerned, the TDWG RDF group already has been given the task of creating this kind of thing and when your task group is approved, I think this task would be within the charge of it as well (your charter goal "Develop technical guidelines for TDWG vocabularies of basic terms..."). So what I am proposing is that at some point in the near future, we work on creating this documentation standard as a joint project of the two task groups. I am willing to get it started by writing a draft. It could then be discussed by the two task groups and revised as necessary. I do not think that this documentation standard needs to be very complex, nor will it be controversial. So I think that it could be ratified in a period much shorter than what has been required for Technical Standards like Darwin Core and Audubon Core. Please note that I am NOT proposing that we actually create any Data Standards, but rather that we create guidelines for how they should be documented for humans and machines.

Although I am not in a position to initiate this at the present moment, I wanted to give you some time to think about this before I make any kind of official proposal. I believe that the current discussion on the TDWG RDF email list is highly relevant to this issue because we need to work out how to present in RDF controlled vocabulary values in both string and URI reference forms. In the context of the ongoing discussion on the list, I intend to specifically bring up the issue of dwc:basisOfRecord and dcterms:type which both specify the use of a controlled vocabulary and which both are defined in RDF as well as having text values.

I don't want to seem "pushy" about this. However, when I took on the role as co-convener of the RDF Task Group, it was with the understanding that we would make significant progress in one year. It has now been five months out of that year and I feel the need to demonstrate that we can accomplish something tangible. If we cannot actually accomplish something in the course of a year, then I need to move on to other projects that I've put aside in order to work on this. So I am committed to trying to keep things moving.

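As a rough illustration of the string-versus-URI question raised above, here is a minimal Python sketch of a controlled vocabulary whose values can be given either as a text label or as a URI reference. The three terms are taken from the DCMI Type Vocabulary; the table name `CONTROLLED_VALUES` and the helper functions `to_uri` and `to_label` are hypothetical and only serve the example. It is not a proposal for how the documentation standard itself should look.

```python
from typing import Optional

DCMI_TYPE = "http://purl.org/dc/dcmitype/"

# Hypothetical lookup table: text label -> URI for a few DCMI Type terms.
CONTROLLED_VALUES = {
    "StillImage": DCMI_TYPE + "StillImage",
    "Sound": DCMI_TYPE + "Sound",
    "PhysicalObject": DCMI_TYPE + "PhysicalObject",
}


def to_uri(value: str) -> Optional[str]:
    """Return the URI for a controlled text value, or None if it is not in the list."""
    return CONTROLLED_VALUES.get(value.strip())


def to_label(uri: str) -> Optional[str]:
    """Return the text label for a term URI, or None if the URI is unknown."""
    for label, term_uri in CONTROLLED_VALUES.items():
        if term_uri == uri:
            return label
    return None
```

A provider holding only text values could call `to_uri` to find the matching URI reference, and a consumer holding URIs could call `to_label` to recover the text value; values outside the list come back as `None` rather than being guessed.
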
Please think about this and we can talk about it more later.

Steve


-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences

postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.

delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235

office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707

http://bioimages.vanderbilt.edu

_______________________________________________
tdwg-tag mailing list
tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag

Paul Murray

01:52

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED]

On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:

Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank; dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType .

We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and declare them as being the correct type.

For instance, http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm

is declared to be a subclass of http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm

And we have a few specific items of that type:

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-li...

These individuals are therefore correctly typed and can legitimately be used as a TDWG relationshipCategory.

Your list of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create its own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason for this is that that is in fact the way the world actually is at the moment, and there's not a lot of help for it.

-------------------------------------------------------------

There are two or three approaches to using a standard vocabulary when your own data does not quite match it.

You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go.

You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.

In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that.

You can use the "define your own term" mechanism and assert both

    _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
    _:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can have a completely separate predicate:

    _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
    _:_ my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the tdwg triple is also asserted.

Another alternative is to create an OWL rule that says "if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of". But this creates a performance hit.

-------------------------------------------------------------

That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to have the design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from being carried away subdividing types into a hierarchy that they think is a good idea, but which doesn't match the data people already have.

I'd also suggest that every enumeration (i.e., list of individuals) include two special values:

NOT_SPECIFIED. This value is not present in the source, underlying data. It isn't in the database, or the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name - assuming people at this level know what it means.

OTHER. This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional.

These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations.
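
The rule described above ("if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modelled as plain tuples rather than with an RDF library, the subject `_:rel1` is made up, and the `BROADER` table and `materialize` function are hypothetical names, not part of any TDWG or AFD software.

```python
PRED = "tdwg:has_relationship_type"

# Hypothetical table: site-specific term -> standard term whose definition encompasses it.
BROADER = {
    "my-voc:is-recently-declared-subtaxon-of": "tdwg:is-subtaxon-of",
}


def materialize(triples):
    """Return the input triples plus the implied standard-term triples."""
    result = set(triples)
    for subject, predicate, obj in triples:
        # Apply the rule: a site-specific value also implies its broader standard value.
        if predicate == PRED and obj in BROADER:
            result.add((subject, PRED, BROADER[obj]))
    return result


data = {("_:rel1", PRED, "my-voc:is-recently-declared-subtaxon-of")}
expanded = materialize(data)
```

Applying the rule once, when the data is published, is one possible way to avoid paying the reasoning cost at query time, at the price of storing the extra triple. The result is the same pair of assertions as in the "define your own term" approach.
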

Tony.Rees＠csiro.au

1 Nov 1 Nov

22:41

New subject: [tdwg-tag] Any TCS users with experiences to report?

Hi TDWG persons,

I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS but it appears few if any major systems have ever gone that route - I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more... the nearest would perhaps be AFD/APNI (hence copying Paul on this email) however their "ibis" schema, though apparently based originally on TCS, http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any explicit reference to the TCS schema so far as I can see. (Note also the cited schema definition http://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something).

I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information so *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better.

It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts?

Regards - Tony

Tony Rees
Manager, Divisional Data Centre,
CSIRO Marine and Atmospheric Research,
GPO Box 1538, Hobart, Tasmania 7001, Australia
Ph: 0362 325318 (Int: +61 362 325318)
Fax: 0362 325000 (Int: +61 362 325000)
e-mail: Tony.Rees@csiro.au
Manager, OBIS Australia regional node, http://www.obis.org.au/
Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm
Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36

Roderic Page

23:55

New subject: [tdwg-tag] Any TCS users with experiences to report?

A TDWG standard not actually being used, surely not ;)

Leaving aside the wisdom of XML schema (yuck) and developing standards independently of actual products, it does puzzle me that the work Roger Hyam did on the LSID vocabularies is consistently overlooked. There is an RDF version of TCS: http://rs.tdwg.org/ontology/voc/TaxonConcept

This was used by CoL in their LSIDs, but because they usually broke I suspect nobody used them.

We seem to be in a muddled state at present where there are competing vocabularies in use for taxonomic names and concepts, and these two notions are often not cleanly separated. Whereas nomenclators such as IPNI and Zoobank use the LSID taxon name vocabulary, other databases use vocabularies such as Darwin Core, which rather conflate names and concepts. It's not clear to me how this situation arose, but it somewhat defeats the point of having standards.

Regards,

Rod

Sent from my iPhone

greg whitbread

2 Nov 2 Nov

00:53

New subject: [tdwg-tag] Any TCS users with experiences to report? [SEC=UNCLASSIFIED]

+1 for the TDWG ontology.

Not a standard (yet) but one of the most useful and enduring things that TDWG has produced (thanks Roger). Overlooked perhaps because it is CSV enough for aggregation. For real work its use should be encouraged.

ALA-NSL uses it as the basis for the LSID vocabularies, RDF and JSON services delivered from http://biodiversity.org.au

urn:lsid:biodiversity.org.au:apni.name:54321 ,
http://biodiversity.org.au/apni.name/54321.rdf or
http://biodiversity.org.au/apni.name/54321.json

I will leave the XML response to Paul.

greg

On 2 November 2012 10:55, Roderic Page <r.page@bio.gla.ac.uk> wrote:

A TDWG standard not actually being used, surely not ;)

Leaving aside the wisdom of XML schema (yuck) and developing standards independently of actual products, it does puzzle me that the work Roger Hyam did on the LSID vocabularies is consistently overlooked. There is an RDF version of TCS: http://rs.tdwg.org/ontology/voc/TaxonConcept

This was used by CoL in their LSIDs, but because they usually broke I suspect nobody used them.

We seem to be in a muddled state at present where there are competing vocabularies in use for taxonomic names and concepts, and these two notions are often not cleanly separated. Whereas nomenclators such as IPNI and Zoobank use the LSID taxon name vocabulary, other databases use vocabularies such as Darwin Core, which rather conflate names and concepts. It's not clear to me how this situation arose, but it somewhat defeats the point of having standards.

Regards,

Rod

Sent from my iPhone

On 1 Nov 2012, at 22:41, <Tony.Rees@csiro.au> wrote:

Hi TDWG persons,

I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS but it appears few if any major systems have ever gone that route – I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more… the nearest would perhaps be AFD/APNI (hence copying Paul on this email) however their “ibis” schema, though apparently based originally on TCS, http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any explicit reference to the TCS schema so far as I can see. (Note also the cited schema definition http://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something).

I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information so *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better.

It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts?

Regards - Tony

Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000)

e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566

LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36

From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray Sent: Wednesday, 7 March 2012 12:52 PM To: Steve Baskauf Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED]

On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:

Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType.

We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and declare them as being of the correct type.

For instance,

http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm

Is declared to be a subclass of

http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm

And we have a few specific items of that type:

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-li...

These individuals are therefore correctly typed and can legitimately be used as a TDWG relationshipCategory.
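The declarations described above can be sketched in Turtle. This is a minimal illustration rather than the published AFD vocabulary: the IRIs are the ones quoted above, the prefix names (afd:, afdrel:, tc:) are chosen here for readability, and the property IRI tc:relationshipCategory and the blank node _:rel are assumptions inferred from the phrase "TDWG relationshipCategory".

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix tc:     <http://rs.tdwg.org/ontology/voc/TaxonConcept#> .
@prefix afd:    <http://biodiversity.org.au/voc/afd/AFD#> .
@prefix afdrel: <http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#> .

# The local class is declared as a subclass of the TDWG class.
afd:RelationshipTypeTerm a owl:Class ;
    rdfs:subClassOf tc:TaxonRelationshipTerm .

# Locally minted terms are individuals of the local class, so an RDFS/OWL
# reasoner can infer that they are also tc:TaxonRelationshipTerm individuals.
afdrel:has-emendation     a afd:RelationshipTypeTerm .
afdrel:has-invalid-name   a afd:RelationshipTypeTerm .
afdrel:has-junior-homonym a afd:RelationshipTypeTerm .

# Assumed usage: a relationship (hypothetical blank node) categorised
# with one of the local terms.
_:rel tc:relationshipCategory afdrel:has-junior-homonym .
```

A consumer that only knows the TDWG vocabulary can still treat the local terms as relationship terms, provided it applies the rdfs:subClassOf inference.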

Your list of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create its own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason is that this is in fact the way the world is at the moment, and there's not a lot of help for it.

-------------------------------------------------------------

There are two or three approaches to using a standard vocabulary when your own data does not quite match it.

You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go.

You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.

In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that.

You can use the "define your own term" mechanism and assert both

_:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .

_:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can have a completely separate predicate:

_:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .

_:_ my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the tdwg triple is also asserted.
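A minimal Turtle sketch of the super-property approach follows. The tdwg: and my-voc: namespaces are placeholders, and the class names tdwg:RelationshipTerm and my-voc:RelationshipTerm are hypothetical stand-ins for whatever the two vocabularies actually define.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix tdwg:   <http://example.org/tdwg#> .     # placeholder namespace
@prefix my-voc: <http://example.org/my-voc#> .   # placeholder namespace

# Every tdwg:has_relationship_type statement is also a
# my-voc:has_relationship_type statement (the local predicate is the
# super-property).
tdwg:has_relationship_type rdfs:subPropertyOf my-voc:has_relationship_type .

# The local super-property accepts either standard or local terms:
# its range is the union of the two term classes.
my-voc:has_relationship_type a owl:ObjectProperty ;
    rdfs:range [ a owl:Class ;
                 owl:unionOf ( tdwg:RelationshipTerm my-voc:RelationshipTerm ) ] .

# Data: the standard triple is still asserted explicitly, so consumers
# that ignore the local vocabulary lose nothing.
_:r tdwg:has_relationship_type   tdwg:is-subtaxon-of ;
    my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .
```

The sub-property axiom only flows from the TDWG predicate up to the local one, which is why the TDWG triple has to be stated in the data rather than inferred.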

Another alternative is to create an OWL rule that says

"if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of"

But this creates a performance hit.
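One way such a rule might be written is as an OWL 2 general class axiom between two hasValue restrictions. This is a sketch under assumptions: the namespaces are placeholders, and whether a given triple store or reasoner evaluates this kind of axiom is something to check for the tool in use.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix tdwg:   <http://example.org/tdwg#> .     # placeholder namespace
@prefix my-voc: <http://example.org/my-voc#> .   # placeholder namespace

# "If a thing has relationship-type my-voc:is-recently-declared-subtaxon-of,
#  then it also has relationship-type tdwg:is-subtaxon-of."
[ a owl:Restriction ;
  owl:onProperty tdwg:has_relationship_type ;
  owl:hasValue   my-voc:is-recently-declared-subtaxon-of ]
    rdfs:subClassOf
[ a owl:Restriction ;
  owl:onProperty tdwg:has_relationship_type ;
  owl:hasValue   tdwg:is-subtaxon-of ] .

# Only the local triple is asserted; the standard one is left to the reasoner.
_:r tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .
```

The cost mentioned above comes from having to run the reasoner over the data before the standard triple becomes visible to queries.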

-------------------------------------------------------------

That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to have the design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from being carried away subdividing types into a hierarchy that they think is a good idea, but which doesn't match the data people already have.

I'd also suggest that every enumeration (i.e., list of individuals) include two special values:

NOT_SPECIFIED. This value is not present in the source, underlying data. It isn't in the database, the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name - assuming people at this level know what it means.

OTHER. This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional.

These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations.
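The alternative to a single shared instance, giving each enumeration its own pair of special values, can be sketched as follows. All names in the ex: namespace are hypothetical; no TDWG vocabulary is being quoted here.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/voc#> .   # hypothetical namespace

# Each enumeration carries its own NOT_SPECIFIED and OTHER individuals.
ex:IslandType a owl:Class ;
    owl:equivalentClass [ a owl:Class ;
        owl:oneOf ( ex:IslandType_Continental
                    ex:IslandType_Oceanic
                    ex:IslandType_NOT_SPECIFIED
                    ex:IslandType_OTHER ) ] .

ex:SexTerm a owl:Class ;
    owl:equivalentClass [ a owl:Class ;
        owl:oneOf ( ex:SexTerm_Female
                    ex:SexTerm_Male
                    ex:SexTerm_NOT_SPECIFIED
                    ex:SexTerm_OTHER ) ] .

# Because no individual is shared, the classes can remain disjoint and
# each predicate keeps a simple, single-class range.
ex:IslandType owl:disjointWith ex:SexTerm .
```

The trade-off is the one described above: finding every "not specified" value in a dataset now means matching several individuals instead of one.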

If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments. Please consider the environment before printing this email.

_______________________________________________ tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag


-- Greg Whitbread Australian National Botanic Gardens Australian National Herbarium +61 2 62509482 ghw@anbg.gov.au

Steve Baskauf

01:10

New subject: [tdwg-tag] Any TCS users with experiences to report?

Rod,

Not entirely overlooked:

1. The Australian Plant Name Index (APNI) uses some components of the TDWG Taxon Concept LSID vocabulary for rdf+xml representations. See http://www.biodiversity.org.au/confluence/display/bdv/NSL%2bServices for information and http://biodiversity.org.au/apni.taxon/118883.rdf for an example. Paul Murray may want to comment on his experience with using it.

2. It is referred to in the Beginner's Guide to RDF (http://code.google.com/p/tdwg-rdf/wiki/Beginners7OWL#7.6.3._Properties_in_OW...) and the RDF Task Group's class inventory (http://code.google.com/p/tdwg-rdf/wiki/ClassInventory#1.4.__Classes_in_the_T...).

The TDWG Ontology as a whole has a very "unfinished" feel to it, which may contribute to why people haven't jumped to embrace it. However, because the TaxonConcept Ontology component of the TDWG Ontology is based on TCS, it appears to me to be pretty "finished" and quite usable.

I agree with you that the Darwin Core Taxon class rather confusingly mixes aspects of taxon names and taxon concepts. That's why, when Cam and I put together Darwin-SW (which is primarily based on Darwin Core), we opted to incorporate the TaxonConcept Ontology rather than try to figure out how to use the DwC Taxon class terms (see http://code.google.com/p/darwin-sw/wiki/ClassTaxon).

For a live example, in http://bioimages.vanderbilt.edu/ind-baskauf/37770.rdf a dwc:Identification instance is related to a tc:TaxonConcept through a dsw:toTaxon property. I would have preferred to link to a stable, well-known HTTP URI for the taxon concept, but since I don't think they exist for most of my non-Australian taxa, I was forced to mint some, e.g. http://bioimages.vanderbilt.edu/taxon/37297-fna1993.rdf (use view page source to see the underlying RDF). I attempted to use the TDWG TaxonConcept ontology as well as I could understand it as a non-taxonomist.

Hmmm. I see that Greg has mentioned APNI already in his email before I had time to finish this.

Steve
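The Identification-to-TaxonConcept link described in this message can be sketched in Turtle. This is an illustration only: the ex: resources are hypothetical and are not the actual Bioimages records, and the dsw: namespace IRI shown is an assumption that should be checked against the Darwin-SW documentation.

```turtle
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix tc:  <http://rs.tdwg.org/ontology/voc/TaxonConcept#> .
@prefix dsw: <http://purl.org/dsw/> .        # assumed Darwin-SW namespace
@prefix ex:  <http://example.org/> .         # hypothetical resources

# An identification is linked to a taxon concept, not to a bare name.
ex:identification-1 a dwc:Identification ;
    dsw:toTaxon ex:taxon-concept-1 .

# The target is typed with the TDWG TaxonConcept ontology class.
ex:taxon-concept-1 a tc:TaxonConcept .
```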


Richard Pyle

01:13

New subject: [tdwg-tag] Any TCS users with experiences to report?

As the Convenor of the TDWG Taxon Names and Concept group, I have failed in one of my core duties to address this issue. My inability to attend TDWG this year has only exacerbated this problem. Having said that…

I have had many discussions with many folks over the past couple of years on this issue, and for various reasons the time is now ripe to re-visit this age-old problem and make some decisions about how to move forward. For the ZooBank LSID resolver, we used Roger’s vocabularies; and to some extent, the DwC terms harmonize (but not completely).

A few years ago I made a push to either revitalize TCS (e.g., through TCS 2.0), or to allow it to retire (if it hasn’t already done so de facto). Having just emerged from nearly two very thick years of development on ZooBank, GNA/GNUB, etc., I am now more energized (and liberated, in terms of available time) to re-focus on how to move forward. My hope is that we can make some core decisions about how to move forward well before next year’s TDWG meeting.

I would very much welcome feedback from people on:

1) Who is actively using TCS? Does it work? Can it be improved? Should it be retired?
2) Who is using Roger’s vocabulary? Does it work? Can it be improved?
3) How much of DwC:Taxon is in active use? Just the “traditional” terms, or some of the new ones introduced with the ratified DwC? Does it work? Can it be improved?
4) What other standards are being used in this space?

Now that we have launched the new ZooBank, we will turn our attention to GNUB services that will start to put that content to work. It is therefore very much in our interest to support the sorts of data exchange mechanisms that people most need and, ideally, collapse the various “flavors” into something we can all rally around.

Aloha,
Rich

Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html

Note: This disclaimer formally apologizes for the disclaimer below, over which I have no control.

This message is only intended for the addressee named above. Its contents may be privileged or otherwise protected. Any unauthorized use, disclosure or copying of this message or its contents is prohibited. If you have received this message by mistake, please notify us immediately by reply mail or by collect telephone call. Any personal opinions expressed in this message do not necessarily represent the views of the Bishop Museum.

Kevin Richards

02:01

New subject: [tdwg-tag] Any TCS users with experiences to report?

We have used TCS for several projects, TDWG LSID vocabs (ontology) for some and Darwin Core taxon terms for another.

E.g. the Global Compositae Checklist uses a variation on TCS (as we need to provide "provider data" with the generated consensus checklist data). The web service returns this TCS variant, e.g. http://dixon.iplantcollaborative.org/CompositaeWebService/TICAChecklistServi...

TCS works very well for handling taxon names and concepts in their fuller form. Our biggest issue using TCS was data providers not being able to actually give this structured data – it was often just NameId, FullName, ParentNameId, AcceptedNameId data.

I also agree with Rod that RDF is probably a better way to go, and this is what the TDWG LSID vocabs (ontology), http://wiki.tdwg.org/twiki/bin/view/TAG/LsidVocs, are good for, i.e. the equivalent of the fuller TCS XML but in RDF form. We have used these vocabs for most LSID resolvers that resolve names:

our names at Landcare, e.g. http://lsid.landcareresearch.co.nz/authority/metadata/?lsid=urn:lsid:landcar...
Index Fungorum names, e.g. urn:lsid:indexfungorum.org:names:213645 (currently broken, but on the move to a new server)

Recently, with the NZOR project (www.nzor.org), we have delivered our names dataset to GBIF using the Taxon terms of Darwin Core, in the Darwin Core Archive format. This brought up several issues. Firstly, we still don't seem to have sorted out what actually constitutes a taxon concept, and what to apply the ID for the concept to. This is evident from the conflation of the name and concept terms in DwC. I think the Darwin-SW stuff Steve mentioned is a good step towards sorting this out, but it really needs some clarification, especially when applying it in a practical sense, e.g. as a DwC-A. Other issues were around the flattened nature of DwC and the struggle to get concept data into it.

I agree with Rich that all this stuff needs sorting out and we need to define a way forward – which standards should be recommended for use, etc. This also overlaps with the TAG work, and there has been some talk of a TAG workshop that may be a good place to look at these issues. E.g., if we had a core, technology-independent model that all standards could map to, then the standards would be more like applicability statements, best practices etc. for particular use cases. There shouldn't really be any issue with having multiple taxon-oriented standards (as there are many use cases that vary slightly), but at least they should all map together and be consistent in some way.

Kevin
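To make the point about minimal provider data concrete, here is a rough Python sketch of what can be done with records that carry only NameId, FullName, ParentNameId and AcceptedNameId. The class and function names, and the placeholder names "Aus", "Aus bus" and "Aus cus", are invented for illustration and are not taken from any provider's actual data or from the Global Compositae Checklist code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NameRecord:
    """One flat provider row: the four fields mentioned in the message."""
    name_id: str
    full_name: str
    parent_name_id: Optional[str]
    accepted_name_id: Optional[str]


def index_by_id(records: List[NameRecord]) -> Dict[str, NameRecord]:
    """Build a lookup table from NameId to record."""
    return {r.name_id: r for r in records}


def accepted_name(record: NameRecord, by_id: Dict[str, NameRecord]) -> NameRecord:
    """Follow AcceptedNameId links until reaching a record that is its own accepted name."""
    seen = set()
    current = record
    while current.accepted_name_id and current.accepted_name_id != current.name_id:
        if current.name_id in seen:
            raise ValueError("cycle in AcceptedNameId links")
        seen.add(current.name_id)
        current = by_id[current.accepted_name_id]
    return current


def classification(record: NameRecord, by_id: Dict[str, NameRecord]) -> List[str]:
    """Return full names from the record up to the root via ParentNameId."""
    path = []
    seen = set()
    current: Optional[NameRecord] = record
    while current is not None and current.name_id not in seen:
        seen.add(current.name_id)
        path.append(current.full_name)
        parent_id = current.parent_name_id
        current = by_id.get(parent_id) if parent_id else None
    return path


# Placeholder data: a genus, an accepted species and a synonym of that species.
records = [
    NameRecord("n1", "Aus", None, "n1"),
    NameRecord("n2", "Aus bus", "n1", "n2"),
    NameRecord("n3", "Aus cus", "n1", "n2"),
]
by_id = index_by_id(records)
print(accepted_name(by_id["n3"], by_id).full_name)
print(classification(by_id["n3"], by_id))
```

Such rows give one synonymy and one hierarchy per name, but nothing says whose view of the taxon is being recorded, which is roughly why they cannot fill the fuller TCS structure.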

"Markus Döring (GBIF)"

12:08

New subject: [tdwg-tag] Any TCS users with experiences to report?

Hi, apart from GBIF, the Catalogue of Life is now heavily engaged in using DwC-A to exchange taxonomic data between its various components. They have established a more rigorous specification that is a subset of what DwC actually allows. Many others, including EOL and EDIT, already make use of DwC archives, but I don't know about the exact usages.

The only problems I am aware of right now are indeed, as Kevin pointed out, with fully taxon-concept-oriented systems that have multiple concepts for the same name and qualified relations between them. That said, I think it's mainly a lack of experience and clear guidelines so far that leaves us with open questions, not so much the format itself.

I hope we can arrange a workshop in the coming year to work out the remaining issues that people have with DwC for taxonomic data. So far I have Kevin and some people from CoL on my radar, but I would be interested to know if more people are interested in such an event.

Markus

On 02.11.2012, at 02:13, Richard Pyle wrote:

...

As the Convenor of the TDWG Taxon Names and Concept group, I have failed in one of my core duties to address this issue. My inability to attend TDWG this year has only exacerbated this problem.

Having said that….. I have had many discussions with many folks over the past couple of years on this issue, and for various reasons the time is now ripe to re-visit this age-old problem and make some decisions about how to move forward.

For the ZooBank LSID resolver, we used Roger’s vocabularies; and to some extent, the DwC terms harmonize (but not completely). A few years ago I made a push to either revitalize TCS (e.g., through TCS 2.0), or to allow it to retire (if it hasn’t already done so de facto).

Having just emerged from nearly two very thick years of development on ZooBank, GNA/GNUB, etc., I am now more energized (and liberated, in terms of available time) to re-focus on how to move forward. My hope is that we can make some core decisions about how to move forward well before next year’s TDWG meeting.

I would very-much welcome feedback from people on:

1) Who is actively using TCS? Does it work? Can it be improved? Should it be retired?
2) Who is using Roger’s vocabulary? Does it work? Can it be improved?
3) How much of DwC:Taxon is in active use? Just the “traditional” terms; or some of the new ones introduced with the ratified DwC? Does it work? Can it be improved?
4) What other standards are being used in this space?

Now that we have launched the new ZooBank, we will turn our attention to GNUB services that will start to put that content to work. It is therefore very much in our interest to support the sorts of data exchange mechanisms that people most need and, ideally, collapse the various “flavors” into something we can all rally around.

Aloha, Rich

Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html

Note: This disclaimer formally apologizes for the disclaimer below, over which I have no control.

From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roderic Page Sent: Thursday, November 01, 2012 1:56 PM To: <Tony.Rees@csiro.au> Cc: pmurray@anbg.gov.au; <tdwg-tag@lists.tdwg.org>; Simon.Pigot@csiro.au Subject: Re: [tdwg-tag] Any TCS users with experiences to report?

A TDWG standard not actually being used, surely not ;)

Leaving aside the wisdom of XML schema (yuck) and developing standards independently of actual products, it does puzzle me that the work Roger Hyam did on the LSID vocabularies is consistently overlooked. There is an RDF version of TCS: http://rs.tdwg.org/ontology/voc/TaxonConcept

This was used by CoL in their LSIDs, but because they usually broke I suspect nobody used them.

We seem to be in a muddled state at present where there are competing vocabularies in use for taxonomic names and concepts, and these two notions are often not cleanly separated. Whereas nomenclators such as IPNI and Zoobank use the LSID taxon name vocabulary, other databases use vocabularies such as Darwin Core, which rather conflate names and concepts. It's not clear to me how this situation arose, but it somewhat defeats the point of having standards.

Regards,

Rod

Sent from my iPhone

On 1 Nov 2012, at 22:41, <Tony.Rees@csiro.au> wrote:

Hi TDWG persons,

I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS but it appears few if any major systems have ever gone that route – I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more… the nearest would perhaps be AFD/APNI (hence copying Paul on this email) however their “ibis” schema, though apparently based originally on TCS, http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any explicit reference to the TCS schema so far as I can see. (Note also the cited schema definition http://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something).

I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information so *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better.

It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts?

Regards - Tony

Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566 LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36

From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray Sent: Wednesday, 7 March 2012 12:52 PM To: Steve Baskauf Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED]

On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:

Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank; dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType .

We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and declare them as being the correct type.

For instance, http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm

is declared to be a subclass of http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm

and we have a few specific items of that type:

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-li...

These individuals are therefore correctly typed and can legitimately be used as a TDWG relationshipCategory.
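A minimal Python sketch of why the subclass declaration makes the locally minted terms usable where a TDWG term is expected. It models triples as plain tuples rather than using an RDF library, and it assumes the declarations are expressed with rdf:type and rdfs:subClassOf; the function names are hypothetical.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

AFD_TERM_CLASS = "http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm"
TDWG_TERM_CLASS = "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm"
HAS_EMENDATION = "http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation"

triples = {
    # The local class is declared a subclass of the TDWG class.
    (AFD_TERM_CLASS, SUBCLASS_OF, TDWG_TERM_CLASS),
    # A locally minted individual is typed with the local class only.
    (HAS_EMENDATION, RDF_TYPE, AFD_TERM_CLASS),
}


def superclasses(cls, graph):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def types_of(individual, graph):
    """Return asserted types plus those implied by the subclass hierarchy."""
    result = set()
    for s, p, o in graph:
        if s == individual and p == RDF_TYPE:
            result |= superclasses(o, graph)
    return result


inferred = types_of(HAS_EMENDATION, triples)
print(TDWG_TERM_CLASS in inferred)
```

A consumer that only knows the TDWG class can still accept the AFD individual, provided it applies this kind of subclass reasoning.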

Your list of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create its own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason is that this is in fact the way the world actually is at the moment, and there's not a lot of help for it.

-------------------------------------------------------------

There are two or three approaches to using a standard vocabulary when your own data does not quite match it.

You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go.

You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.

In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that.

You can use the "define your own term" mechanism and assert both

  _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
  _:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can have a completely separate predicate:

  _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
  _:_ my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the tdwg triple is also asserted.

Another alternative is to create an OWL rule that says "if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of"

But this creates a performance hit.
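As a rough illustration of the rule-based alternative, the following Python sketch materializes the implied TDWG triple from the site-specific one. It is not an OWL reasoner: triples are plain tuples, the rule table and function name are invented for this example, and the node label "_:rel1" is a placeholder.

```python
HAS_REL_TYPE = "tdwg:has_relationship_type"
LOCAL_TERM = "my-voc:is-recently-declared-subtaxon-of"
STANDARD_TERM = "tdwg:is-subtaxon-of"

# Hypothetical rule table: a local term and the standard term it implies.
IMPLIES = {LOCAL_TERM: STANDARD_TERM}


def materialize(graph, implies=IMPLIES):
    """Return the graph plus every relationship-type triple implied by the rule table."""
    closed = set(graph)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        for s, p, o in list(closed):
            if p == HAS_REL_TYPE and o in implies:
                implied = (s, p, implies[o])
                if implied not in closed:
                    closed.add(implied)
                    changed = True
    return closed


# Only the site-specific triple is published.
published = {("_:rel1", HAS_REL_TYPE, LOCAL_TERM)}
closed = materialize(published)
print(sorted(closed))
```

The extra passes over the data are where the cost comes from: either the publisher asserts both triples up front, or every consumer pays for the inference.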

-------------------------------------------------------------

That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to have the design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from being carried away subdividing types into a hierarchy that they think is a good idea, but which doesn't match the data people already have.

I'd also suggest that every enumeration (i.e., list of individuals) include two special values:

NOT_SPECIFIED. This value is not present in the source, underlying data. It isn't in the database, or the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name - assuming people at this level know what it means.

OTHER. This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional.

These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations.
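A small Python sketch of the disjointness problem described above. The class names (ex:SexTerm and so on) and the individuals are invented for illustration, and the enumeration classes are simply assumed to be declared pairwise disjoint.

```python
from itertools import combinations

RDF_TYPE = "rdf:type"

# Hypothetical enumeration classes, assumed to be declared pairwise disjoint.
DISJOINT_CLASSES = ["ex:SexTerm", "ex:LifeStageTerm", "ex:DispositionTerm"]


def disjointness_violations(graph, disjoint_classes):
    """List (individual, class_a, class_b) where one individual has two disjoint types."""
    types = {}
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    violations = []
    for individual, classes in sorted(types.items()):
        relevant = sorted(classes & set(disjoint_classes))
        for a, b in combinations(relevant, 2):
            violations.append((individual, a, b))
    return violations


# Option 1: a single shared NOT_SPECIFIED individual typed with every enumeration class.
shared = {("ex:NOT_SPECIFIED", RDF_TYPE, c) for c in DISJOINT_CLASSES}

# Option 2: one NOT_SPECIFIED individual per enumeration.
per_enum = {
    ("ex:sex-NOT_SPECIFIED", RDF_TYPE, "ex:SexTerm"),
    ("ex:lifeStage-NOT_SPECIFIED", RDF_TYPE, "ex:LifeStageTerm"),
    ("ex:disposition-NOT_SPECIFIED", RDF_TYPE, "ex:DispositionTerm"),
}

print(len(disjointness_violations(shared, DISJOINT_CLASSES)))
print(len(disjointness_violations(per_enum, DISJOINT_CLASSES)))
```

The shared individual is easy to query for but clashes with every pair of disjoint classes; the per-enumeration individuals avoid the clash at the cost of needing one query per enumeration.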


Kennedy, Jessie

17:02

New subject: [tdwg-tag] Any TCS users with experiences to report?

Hello Folks (Ghost from the past…)

Interesting discussion on who's using TCS - I was wondering the same myself…

Below is a short history of TCS from my perspective, so you can stop now if it's not of interest.

The important thing to note is that when TCS was developed, the majority of the work was trying to negotiate with all the stakeholders on how they saw taxon concepts, and to try and capture this in a format that would allow people to describe taxon concepts in as full or as minimal a way as possible. Until that point there were a couple of taxon concept schemas - the Berlin Model (W. Berendsohn et al) and the Prometheus model (Pullan et al). These had differences and seemed to be too detailed for certain groups, or didn't specify alternative ways of describing taxon concepts that other groups wanted. When I was asked to lead the initiative for TDWG, it became apparent that nearly every group wanted something slightly different, and the job was to try and describe something that everyone could work with.

However, we also wanted to work with all the other standards groups (ABCD, Character description (forgot its name sorry) etc.), but they weren't complete, nor was the idea of GUIDs, which we also wanted to incorporate and felt were central for the approach to work. We went ahead with the belief that these would get finalised and we would have a way of cross-linking the different schemas. TCS was therefore never fully specified as I had hoped.

This was also in the time when XML was the thing to do; however, the ideas behind TCS were more about the modelling - XML was just one way to specify it. As RDF was getting popular we started working on the ontology for TDWG, again using the existing groups as the basis - this is where what is being referred to as Roger's ontology came from, and it was an RDF representation of the part of TCS required immediately by some folks. At that point, as I was no longer working on any project in the area, I gave up the TCS group and handed it to Rich…

I very occasionally read emails from the group and often think - oh, looks like we're starting the taxon concept debate again ;-) and realise people still want proper taxon concepts, but then I find we have not pushed TCS (in some form useful to people) and haven't moved on very much.

I have looked at DwC-A recently, as I have funding to do some more taxonomy visualisation, and see that it makes a very slight nod to concepts, but I do not think it would be of any use to do the job properly. I think the problem is partly because of the focus on legacy data (which of course is important), but I think something else is required to improve our data for the future - this was always my hope.

So before all the effort goes only into DwC-A, I would encourage folks to think hard about what it is you want. If it is only to be able to exchange basic information then fine; if it is to do any real analysis and try to improve the state of biodiversity data and knowledge, then DwC-A may not be the answer IMHO.

So someone who really cares, please take on the concept challenge and create proper taxon concept data (be it in XML or whatever) - when enough is available people will make it work, as they'll want to use your data… While the only data we have is basic DwC specimen records, no one will bother…

I think it would be great if the major databases that describe taxa (not just list names) described their data as concepts and allowed people to link to their databases when identifying specimens, when sequencing etc. This would be the start of a really useful biodiversity network.

Oh well, back to dreaming and occasional lurking on the mailing list…

Hope you're all well and I see you again soon,

Jessie

On 02/11/2012 12:08, "Markus Döring (GBIF)" <mdoering@gbif.org> wrote:

...

Hi, apart from GBIF the Catalogue of Life is now heavily engaged in using DwC-A to exchange taxonomic data between its various components. They have established a more rigorous specification being a subset of what dwc actually allows. Many other including EOL and EDIT already make use of dwc archives, but I dont know about the exact usages.

The only problems I am aware of right now is indeed like Kevin pointed out with fully taxon concept oriented systems, having multiple concepts for the same name and qualified relations between them. But said that I think its mainly a lack of experience and clear guidelines so far that leaves us with open questions, not so much the format itself.

I hope we can arrange a workshop in the new year to work out the remaining issues that people have with DwC for taxonomic data. So far I have Kevin and some people from CoL on my radar, but I would be interested to know if more people are interested in such an event.

Markus

On 02.11.2012, at 02:13, Richard Pyle wrote:

...
As the Convenor of the TDWG Taxon Names and Concept group, I have failed in one of my core duties to address this issue. My inability to attend TDWG this year has only exacerbated this problem.

Having said that... I have had many discussions with many folks over the past couple of years on this issue, and for various reasons the time is now ripe to re-visit this age-old problem and make some decisions about how to move forward.

For the ZooBank LSID resolver, we used Roger's vocabularies; and to some extent, the DwC terms harmonize (but not completely). A few years ago I made a push to either revitalize TCS (e.g., through TCS 2.0), or to allow it to retire (if it hasn't already done so de facto).

Having just emerged from nearly two very thick years of development on ZooBank, GNA/GNUB, etc., I am now more energized (and liberated, in terms of available time) to re-focus on how to move forward. My hope is that we can make some core decisions about how to move forward well before next year's TDWG meeting.

I would very-much welcome feedback from people on:

1) Who is actively using TCS? Does it work? Can it be improved? Should it be retired?
2) Who is using Roger's vocabulary? Does it work? Can it be improved?
3) How much of DwC:Taxon is in active use? Just the "traditional" terms, or some of the new ones introduced with the ratified DwC? Does it work? Can it be improved?
4) What other standards are being used in this space?

Now that we have launched the new ZooBank, we will turn our attention to GNUB services that will start to put that content to work. It is therefore very much in our interest to support the sorts of data exchange mechanisms that people most need and, ideally, collapse the various "flavors" into something we can all rally around.

Aloha, Rich

Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html

Note: This disclaimer formally apologizes for the disclaimer below, over which I have no control.

From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roderic Page Sent: Thursday, November 01, 2012 1:56 PM To: <Tony.Rees@csiro.au> Cc: pmurray@anbg.gov.au; <tdwg-tag@lists.tdwg.org>; Simon.Pigot@csiro.au Subject: Re: [tdwg-tag] Any TCS users with experiences to report?

A TDWG standard not actually being used, surely not ;)

Leaving aside the wisdom of XML schema (yuck) and developing standards independently of actual products, it does puzzle me that the work Roger Hyam did on the LSID vocabularies is consistently overlooked. There is an RDF version of TCS: http://rs.tdwg.org/ontology/voc/TaxonConcept

This was used by CoL in their LSIDs, but because they usually broke I suspect nobody used them.

We seem to be in a muddled state at present where there are competing vocabularies in use for taxonomic names and concepts, and these two notions are often not cleanly separated. Whereas nomenclators such as IPNI and Zoobank use the LSID taxon name vocabulary, other databases use vocabularies such as Darwin Core, which rather conflate names and concepts. It's not clear to me how this situation arose, but it somewhat defeats the point of having standards.

Regards,

Rod

Sent from my iPhone

On 1 Nov 2012, at 22:41, <Tony.Rees@csiro.au> wrote:

Hi TDWG persons,

I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS, but it appears few if any major systems have ever gone that route. I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more... The nearest would perhaps be AFD/APNI (hence copying Paul on this email); however, their "ibis" schema, though apparently based originally on TCS, http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any explicit reference to the TCS schema so far as I can see. (Note also the cited schema definition http://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something).

I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information so *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better.

It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts?

Regards - Tony

Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566 LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36

From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray Sent: Wednesday, 7 March 2012 12:52 PM To: Steve Baskauf Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED]

On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:

Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType.

We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and declare them as being the correct type.

For instance, http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm

Is declared to be a subclass of

http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm

And we have a few specific items of that type:

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-literature-name

These individuals are therefore correctly typed and can legitimately be used as a TDWG relationshipCategory.
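To make the typing argument concrete, here is a minimal Python sketch using plain tuples as triples. It only illustrates the reasoning step (an individual typed with the AFD class is also a member of the TDWG class via the subclass declaration); it is not how the AFD service is implemented, and the helper names `types_of` and `is_tdwg_relationship_term` are invented for this example.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library needed.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

AFD_TERM_CLASS = "http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm"
TDWG_TERM_CLASS = "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm"
AFD_TERMS = "http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#"

triples = {
    # The locally minted class is declared a subclass of the TDWG class.
    (AFD_TERM_CLASS, RDFS_SUBCLASS_OF, TDWG_TERM_CLASS),
    # Two of the locally minted individuals, typed with the local class.
    (AFD_TERMS + "has-emendation", RDF_TYPE, AFD_TERM_CLASS),
    (AFD_TERMS + "has-invalid-name", RDF_TYPE, AFD_TERM_CLASS),
}


def types_of(individual, triples):
    """Return the asserted types of an individual plus all their superclasses."""
    found = {o for s, p, o in triples if s == individual and p == RDF_TYPE}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for s, p, o in triples:
            if s == cls and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def is_tdwg_relationship_term(individual, triples):
    """True if the individual can be used where a TDWG relationship term is expected."""
    return TDWG_TERM_CLASS in types_of(individual, triples)


print(is_tdwg_relationship_term(AFD_TERMS + "has-emendation", triples))
```

A consumer that only knows the TDWG class can therefore still accept the site-specific terms, provided the subclass declaration is published alongside the data.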

Your list of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create their own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason for this is that that is in fact the way the world actually is at the moment, and there's not a lot of help for it.

-------------------------------------------------------------

There are two or three approaches to using a standard vocabulary when your own data does not quite match it.

You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go.

You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.

In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that.

You can use the "define your own term" mechanism and assert both:

    _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
    _:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can have a completely separate predicate:

    _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
    _:_ my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the tdwg triple is also asserted.

Another alternative is to create an OWL rule that says "if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of"

But this creates a performance hit.
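As a rough illustration of what such a rule amounts to, the Python sketch below applies it once, up front, as a plain loop over the data, so that both triples end up asserted. Doing it this way (rather than with an OWL reasoner at query time) is an assumption of this sketch, and the `IMPLIES` table and `materialize` function are hypothetical names, not part of any TDWG vocabulary.

```python
HAS_TYPE = "tdwg:has_relationship_type"

# Hypothetical mapping: site-specific term -> the broader standard term it implies.
IMPLIES = {
    "my-voc:is-recently-declared-subtaxon-of": "tdwg:is-subtaxon-of",
}


def materialize(triples, implies=IMPLIES):
    """Return the input triples plus every standard-term triple the rule implies."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == HAS_TYPE and o in implies:
            # "if a thing has the local relationship type,
            #  then it also has the standard relationship type"
            inferred.add((s, HAS_TYPE, implies[o]))
    return inferred


data = {("_:rel1", HAS_TYPE, "my-voc:is-recently-declared-subtaxon-of")}
expanded = materialize(data)
for triple in sorted(expanded):
    print(triple)
```

The result is the same pair of triples as in the "assert both" approach; the difference is only whether the publisher writes them out or derives them.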

-------------------------------------------------------------

That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to have the design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from being carried away subdividing types into a hierarchy that they think is a good idea, but which doesn't match the data people already have.

I'd also suggest that every enumeration (i.e., list of individuals) include two special values:

NOT_SPECIFIED. This value is not present in the source, underlying data. It isn't in the database, the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name - assuming people at this level know what it means.

OTHER. This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional.

These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations.
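A small Python sketch of the two special values, assuming a short enumeration for dwc:disposition. The two ordinary members and the `to_disposition` helper are illustrative only; no TDWG vocabulary is being quoted here.

```python
from enum import Enum


class Disposition(Enum):
    # Illustrative ordinary values; a real list would come from the vocabulary.
    IN_COLLECTION = "in collection"
    MISSING = "missing"
    # The two special values suggested above.
    NOT_SPECIFIED = "NOT_SPECIFIED"  # nothing recorded in the source data
    OTHER = "OTHER"                  # a specific value, but not in the list


def to_disposition(raw):
    """Map a raw source value onto the enumeration."""
    if raw is None or not raw.strip():
        # Absent or blank in the source: the value was never given.
        return Disposition.NOT_SPECIFIED
    try:
        return Disposition(raw.strip().lower())
    except ValueError:
        # Something was recorded, but the list does not cover it.
        return Disposition.OTHER


for raw in (None, "Missing", "on loan"):
    print(repr(raw), "->", to_disposition(raw).name)
```

Keeping NOT_SPECIFIED and OTHER distinct preserves the difference between "nobody recorded this" and "somebody recorded something the list does not cover".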


_______________________________________________ tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag

This message is only intended for the addressee named above. Its contents may be privileged or otherwise protected. Any unauthorized use, disclosure or copying of this message or its contents is prohibited. If you have received this message by mistake, please notify us immediately by reply mail or by collect telephone call. Any personal opinions expressed in this message do not necessarily represent the views of the Bishop Museum.



Tony.Rees＠csiro.au

3 Nov 3 Nov

03:41

New subject: [tdwg-tag] Any TCS users with experiences to report?

Hi Jessie, also others who have responded thus far, You said:

...

I think it would be great if the major databases that describe taxa (not just list names) described their data as concepts and allowed people to link to their databases when identifying specimens and when sequencing etc, this would be the start of a really useful biodiversity network.

Agreed! And also the databases that "just list names" are dealing with concepts, as we know, comprising a valid name plus all listed synonyms in these cases...

My feeling is that the reason there is not yet any standardization in this area is that every data resource does its own thing, using its own home-grown schema in the main (that is, presuming a web service is even offered), and the "standards group" (TDWG) has not pushed a model of any sort of standard client which expects to be able to access distributed taxonomic information in a standard way, so there is no incentive for the sources to provide this. Sort of like a fax machine with no-one on the other end wishing to communicate with it.

By contrast (for example), the OGC has defined standards for geospatial web services which, once adhered to, allow one's own data to be accessed by standards-compliant remote client apps in a standard way. So if I publish a layer (map) from my geoserver here (http://www.cmar.csiro.au/geoserver/ ) as layer name = bioreg:CAAB37020002, then any remote client can access it via standard syntax which will retrieve it in a specified format, for example

http://www.cmar.csiro.au/geoserver/wms?service=WMS&version=1.1.0&request=GetMap&layers=bioreg:CAAB37020002&styles=&bbox=109.0,-44.5,156.5,-8.5&width=512&height=388&srs=EPSG:4326&format=image/gif

So maybe for TCS, DwC and so on a missing part of the task is to define the syntax for such calls (plus relevant expected responses) for taxonomic data, and then create some example publishing and retrieving (client) software to exercise this - provided there is an interest in doing so of course!
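As a sketch of what "standard syntax" buys a client, the WMS GetMap request quoted above can be assembled from its named parameters with nothing but the Python standard library. The function name `wms_getmap_url` and its defaults are invented for this example; the parameter names and values are the ones in the URL above.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width, height,
                   srs="EPSG:4326", image_format="image/gif", version="1.1.0"):
    """Build a WMS GetMap URL from its named parameters."""
    params = {
        "service": "WMS",
        "version": version,
        "request": "GetMap",
        "layers": layer,
        "styles": "",
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": width,
        "height": height,
        "srs": srs,
        "format": image_format,
    }
    # Leave ':', ',' and '/' unescaped so the result reads like the hand-written URL.
    return endpoint + "?" + urlencode(params, safe=":,/")


url = wms_getmap_url(
    "http://www.cmar.csiro.au/geoserver/wms",
    "bioreg:CAAB37020002",
    (109.0, -44.5, 156.5, -8.5),
    512,
    388,
)
print(url)
```

Any client that knows the parameter names can request any layer from any compliant server; a comparable agreed call syntax for taxonomic data is the missing piece described above.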
More soon,

Regards - Tony

________________________________________
From: Kennedy, Jessie [J.Kennedy@napier.ac.uk]
Sent: Saturday, 3 November 2012 4:02 AM
To: "Markus Döring (GBIF)"; Richard Pyle; Rees, Tony (CMAR, Hobart)
Cc: pmurray@anbg.gov.au; Éamonn Ó Tuama; Technical Architecture Group mailing list; Pigot, Simon (CMAR, Hobart)
Subject: Re: [tdwg-tag] Any TCS users with experiences to report?

Hello Folks (Ghost from the past...)

Interesting discussion on who's using TCS - I was wondering the same myself... Below is a short history of TCS from my perspective, so you can stop now if it's not of interest.

The important thing to note is that when TCS was developed, the majority of the work was trying to negotiate with all the stakeholders on how they saw taxon concepts and to try and capture this in a format that would allow people to describe taxon concepts in as full or as minimal a way as possible. Until that point there were a couple of taxon concept schemas - the Berlin Model (W. Berendsohn et al) and the Prometheus model (Pullan et al). These had differences and seemed to be too detailed for certain groups or didn't specify alternative ways of describing taxon concepts that other groups wanted. When I was asked to lead the initiative for TDWG, it became apparent that nearly every group wanted something slightly different and the job was to try and describe something that everyone could work with. However, we also wanted to work with all the other standards groups - ABCD, character description (forgot its name, sorry) etc. - but they weren't complete, nor was the idea of GUIDs, which we also wanted to incorporate and felt were central for the approach to work. We went ahead with the belief that these would get finalised and we would have a way of cross-linking the different schemas. TCS was therefore never fully specified as I had hoped.

This was also in the time when XML was the thing to do; however, the ideas behind TCS were more about the modelling - XML was just some way to specify it. As RDF was getting popular we started working on the ontology for TDWG, again using the existing groups as the basis - this is where what is being referred to as Roger's ontology came from, and it was an RDF representation of part of TCS required immediately by some folks. At that point, as I was no longer working on any project in the area, I gave up the TCS group and handed it to Rich...

I very occasionally read emails from the group and often think - oh, looks like we're starting the taxon concept debate again ;-) - and realise people still want proper taxon concepts, but then I find we have not pushed TCS (in some form useful to people) and haven't moved on very much. I have looked at DWCa recently, as I have funding to do some more taxonomy visualisation, and see that it makes a very slight nod to concepts, but I do not think it would be of any use to do the job properly. I think the problem is partly because of the focus on legacy data (which of course is important), but I think something else is required to improve our data for the future - this was always my hope.

So before all the effort goes only into DWCa I would encourage folks to think hard about what it is you want. If it is only to be able to exchange basic information then fine; if it is to do any real analysis and try to improve the state of biodiversity data and knowledge then DWCa may not be the answer IMHO. So someone who really cares, please take on the concept challenge and create proper taxon concept data (be it in XML or whatever) - when enough is available people will make it work as they'll want to use your data... While the only data we have is basic DWC specimen records no one will bother...

I think it would be great if the major databases that describe taxa (not just list names) described their data as concepts and allowed people to link to their databases when identifying specimens and when sequencing etc. This would be the start of a really useful biodiversity network.

Oh well, back to dreaming and occasional lurking on the mailing list... Hope you're all well and I see you again soon,

Jessie

On 02/11/2012 12:08, "Markus Döring (GBIF)" <mdoering@gbif.org> wrote:

...

Hi, apart from GBIF the Catalogue of Life is now heavily engaged in using DwC-A to exchange taxonomic data between its various components. They have established a more rigorous specification being a subset of what dwc actually allows. Many other including EOL and EDIT already make use of dwc archives, but I dont know about the exact usages.

The only problems I am aware of right now is indeed like Kevin pointed out with fully taxon concept oriented systems, having multiple concepts for the same name and qualified relations between them. But said that I think its mainly a lack of experience and clear guidelines so far that leaves us with open questions, not so much the format itself.

I hope we can arrange some workshop in the near year to work out the remaining issues that people have with dwc for taxonomic data. So far I have Kevin and some people from CoL on my radar, but I would be interested to know if more people are interested in such an event.

Markus

On 02.11.2012, at 02:13, Richard Pyle wrote:

...
As the Convenor of the TDWG Taxon Names and Concept group, I have failed in one of my core duties to address this issue. My inability to attend TDWG this year has only exacerbated this problem.

Having said thatŠ.. I have had many discussions with many folks over the past couple of years on this issue, and for various reasons the time is now ripe to re-visit this age-old problem and make some decisions about how to move forward.

For the ZooBank LSID resolver, we used Roger¹s vocabularies; and to some extent, the DwC terms harmonize (but not completely). A few years ago I made a push to either revitalize TCS (e.g., through TCS 2.0), or to allow it to retire (if it hasn¹t already done so de facto).

Having just emerged from nearly two very thick years of development on ZooBank, GNA/GNUB, etc., I am now more energized (and liberated, in terms of available time) to re-focus on how to move forward. My hope is that we can make some core decisions about how to move forward well before next year¹s TDWG meeting.

I would very-much welcome feedback from people on:

1) Who is actively using TCS? Does it work? Can it be improved? Should it be retired? 2) Who is using Roger¹s vocabulary? Does it work? Can it be improved? 3) How much of DwC:Taxon is in active use? Just the ³traditional² terms; or some of the new ones introduced with the ratified DwC? Does it work? Can it be improved? 4) What other standards are being used in this space?

Now that we have launched the new ZooBank, we will turn our attention to GNUB services that will start to put that content to work. It is therefore very much in our interest to support the sorts of data exchange mechanisms that people most need and, ideally, collapse the various ³flavors² into something we can all rally around.

Aloha, Rich

Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html

Note: This disclaimer formally apologizes for the disclaimer below, over which I have no control.

From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roderic Page Sent: Thursday, November 01, 2012 1:56 PM To: <Tony.Rees@csiro.au> Cc: pmurray@anbg.gov.au; <tdwg-tag@lists.tdwg.org>; Simon.Pigot@csiro.au Subject: Re: [tdwg-tag] Any TCS users with experiences to report?

A TDWG standard not actually being used, surely not ;)

Leaving aside the wisdom of XML schema (yuck) and developing standards independently of actual products, it does puzzle me that the work Roger Hyam did on the LSID vocabularies is consistently overlooked. The is a RDF version of TCS http://rs.tdwg.org/ontology/voc/TaxonConcept

This was used by CoL in their LSIDs, but because they usually broke I suspect nobody used them.

We seem to be in a muddled state at present where there are competing vocabularies in use for taxonomic names and concepts, and these two notions are often not cleanly separated. Whereas nomenclators such as IPNI and Zoobank use the LSID taxon name vocabulary, other databases use vocabularies such as Darwin Core, which rather conflate names and concepts. It's not clear to me how this situation arose, but it somewhat defeats the point of having standards.

Regards,

Rod

Sent from my iPhone

On 1 Nov 2012, at 22:41, <Tony.Rees@csiro.au> wrote:

Hi TDWG persons,

I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS but it appears few if any major systems have ever gone that route I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, moreŠ the nearest would perhaps be AFD/APNI (hence copying Paul on this email) however their ³ibis² schema, though apparently based originally on TCS, http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any explicit reference to the TCS schema so far as I can see. (Note also the cited schema definitionhttp://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something).

I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information so *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better.

It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts?

Regards - Tony

Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566 LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36

From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray Sent: Wednesday, 7 March 2012 12:52 PM To: Steve Baskauf Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED]

On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:

Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank; dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType .

We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and to declare them as being the correct type.

For instance, http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm

Is declared to be a subclass of

http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm

And we have a few specific items of that type:

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homony m

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous -literature-name

These individuals are therefore correctly typed to be legitimately be used as a TDWG relationshipCategory.

Your lists of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create their own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason for this is that that is in fact the way the world actually is at the moment, and there's not a lot of help for it.

-------------------------------------------------------------

There are two or three approaches to using a standard vocabulary when your own data does not quite match it.

You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go.

You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.

In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that.

You can use the "define your own term" mechanism and assert both _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of . _:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can have a completely separate predicate: _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of . _:_ myvoc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the tdwg triple is also asserted.

Another alternative is to create an OWL rule that says "if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of"

But this creates a performance hit.

-------------------------------------------------------------

That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to have the design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from being carried away subdividing types into a hierarchy that they think is a good idea, but which doesn't match the data people already have.

I'd also suggest that that every enumeration (ie, ist of individuals) include two special values:

NOT_SPECIFIED. This value is not present in the source, underlying data. It isn't in the database, the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name - assuming people at this level know what it means. OTHER. This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional.

These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations.

If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments. Please consider the environment before printing this email.

_______________________________________________ tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag

This message is only intended for the addressee named above. Its contents may be privileged or otherwise protected. Any unauthorized use, disclosure or copying of this message or its contents is prohibited. If you have received this message by mistake, please notify us immediately by reply mail or by collect telephone call. Any personal opinions expressed in this message do not necessarily represent the views of the Bishop Museum.

Richard Pyle

05:33

New subject: [tdwg-tag] Any TCS users with experiences to report?

I never received Jessie's original post; I only saw it in Tony's reply (I wonder what other posts are not getting through...). In any case, thanks to Jessie for sending that -- her description very closely matches my own recollection of how TCS came to be, and the context in which it happened. But I wanted to comment on the same bit that Tony did, but coming from a different angle:

...

...
I think it would be great if the major databases that describe taxa (not just list names) described their data as concepts and allowed people to link to their databases when identifying specimens and when sequencing etc, this would be the start of a really useful biodiversity network.

I think the fundamental problem that existed during the development of TCS continues to this day; which is that there is no consensus definition of what a taxon concept "is" (in an informatics sense). We throw words like "circumscription" and such around, and although those words do help clarify the conversation a bit, they still leave a great deal of "wiggle room" for (mis?)interpretation. Is it a clade? Is it a class? Is it a set? In the same way that light is both a wave and a particle, taxon concepts can be different things at the same time.

Three things that I think TCS got right (and should not be abandoned or forgotten with TCS 2.0, or whatever else takes the lead) are:

1) Separation of nomenclature from taxon concepts
2) Flexibility in how to represent the boundaries/definition of taxon concepts
3) Support for 3rd-party concept mapping (i.e., RelationshipAssertions)

But even with these assets, we still have very different ideas about how we want to define what a concept "is", which we really need to do before we can come up with a standard data model to represent them. Pete DeVries has done a lot of work in this area for establishing Taxon Concepts in LOD space. I think there is a great deal of overlap between what he is on about, and what other, more traditional peddlers of taxon concepts (e.g., CoL), are on about. However, there is still a rather broad disconnect in how we might cross-walk these different notions of a concept to each other.

This is why I have always advocated an approach that focuses on atomized taxon name usage (TNU) instances. These are factual in nature, and much easier to model. TNUs serve as the foundation for both nomenclatural and taxon concept domains (the least common denominator, so to speak), and can serve to bridge not only names to concepts, but concepts to other concepts, and cross-linking of other information (occurrences, classifications, etc.)
We've focused a lot of our effort on this these past two years, in part funded by the BiSciCol project and more robustly funded by the Global Names (U.S.) project, and we're developing some basic web services to leverage the TNU approach to representing multiple classifications, cross-mapping taxon concepts, and bridging nomenclature to taxonomy. But the weakest link for the TNU approach is the lack of robust data content. I'm referring to much more than just missing names (which will total in the low single-digit millions); but also to the missing name-usage instances (which will number in the hundreds of millions to billions). The gears are slowly starting to turn in this area, but there is still a long way yet to go.

We have another grant pending to further support GNA/GNUB development, and this time we also have Nico Franz on board to help flesh out a TNU-based taxon concept model -- which I have tremendous confidence will help guide the way forward in the long term (I'll leave it to Nico to elaborate). But as the TDWG TNC person, my current focus is more on the short- and medium-term needs of the broader community. For most needs, by most data providers, TCS, the vocabularies, and DwC/DwCa meet (or exceed) most of the technical needs (excepting the examples given by Paul). What I think we need to do is harmonize those largely overlapping but not-quite-identical approaches, and integrate the ideas that have come from Pete DeVries' work (http://www.taxonconcept.org) and from the Darwin-SW efforts. If we do it right, we can get rid of the redundancy and expand the functionality at the same time.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Associate Zoologist in Ichthyology
Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html

Note: This disclaimer formally apologizes for the disclaimer below, over which I have no control.

This message is only intended for the addressee named above. Its contents may be privileged or otherwise protected. Any unauthorized use, disclosure or copying of this message or its contents is prohibited. If you have received this message by mistake, please notify us immediately by reply mail or by collect telephone call. Any personal opinions expressed in this message do not necessarily represent the views of the Bishop Museum.

Paul Murray

2 Nov

01:31

New subject: [tdwg-tag] Any TCS users with experiences to report? [SEC=UNCLASSIFIED]

Firstly, the XML schema:

http://biodiversity.org.au/xml/ibis is an xml namespace, which works a bit differently to RDF namespaces. RDF does not have an explicit mechanism for finding schema metadata. By convention (and it is just a convention), we usually find the schema for a namespace by assuming that the namespace URI will work as a URL that can be fetched, and that fetching it will pull back a schema description (possibly in any one of several formats, using HTTP content negotiation). In XML, however, namespaces are explicitly linked to schema documents by the xsi:schemaLocation attribute.

The xml generated by biodiversity.org.au

    http://biodiversity.org.au/apni.taxon/54321.xml

comes back with the declaration

    <app:documents
        xmlns:app="http://biodiversity.org.au/xml/servicelayer/content"
        xmlns:ibis="http://biodiversity.org.au/xml/ibis"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:cfg="http://biodiversity.org.au/xml/servicelayer/configuration"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="
            http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/dc.xsd
            http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/dcterms.xsd
            http://purl.org/dc/dcmitype/ http://dublincore.org/schemas/xmls/qdc/dcmitype.xsd
            http://biodiversity.org.au/xml/servicelayer/content http://anbg.gov.au/ala/schemas/xml/app.xsd
            http://biodiversity.org.au/xml/ibis http://biodiversity.org.au/xml/ibis-20120706.xsd
            http://biodiversity.org.au/xml/apni http://biodiversity.org.au/xml/apni-20120706.xsd
            http://biodiversity.org.au/xml/afd http://biodiversity.org.au/xml/afd-20120706.xsd
            http://biodiversity.org.au/xml/col http://biodiversity.org.au/xml/col-20110615.xsd
        ">

Although it's a bit buried in there, XML parsers can see from this that the xml namespace "http://biodiversity.org.au/xml/ibis" has a location of "http://biodiversity.org.au/xml/ibis-20120706.xsd". All of our XML is supposed to validate, and last time I checked it did.
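The namespace-to-location pairing described above can be checked mechanically. The following is a minimal sketch, assuming only the Python standard library; the function name `schema_locations` and the trimmed sample document are illustrative and are not part of the biodiversity.org.au service.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi:schemaLocation attribute (from the declaration above).
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Map each XML namespace to the schema document it is paired with.

    xsi:schemaLocation holds whitespace-separated pairs:
    namespace URI, schema location, namespace URI, schema location, ...
    """
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


# Trimmed-down sample based on the declaration quoted in the message.
sample = """<app:documents
    xmlns:app="http://biodiversity.org.au/xml/servicelayer/content"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
      http://biodiversity.org.au/xml/servicelayer/content http://anbg.gov.au/ala/schemas/xml/app.xsd
      http://biodiversity.org.au/xml/ibis http://biodiversity.org.au/xml/ibis-20120706.xsd"/>"""

locations = schema_locations(sample)
print(locations["http://biodiversity.org.au/xml/ibis"])
```

Because the lookup is driven by the attribute rather than by fetching the namespace URI, a dated schema file can be swapped in without the namespace itself changing.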
By the way - note the date on the filename. We have changed the schema from time to time. Another change is upcoming: the addition of an "excluded" flag for concepts that have been considered for APC and have been explicitly excluded (for a variety of reasons). This will be managed by a new schema document being available on our server and the generated xsi:schemaLocation attribute being changed.

Secondly, TCS:

The issue with TCS is that it is very difficult to extend. To use a bit of TCS in some other schema, you would import the element types and extend them. But TCS mostly does not expose its element types as named types that can be referenced externally - it's all done inline. This means that the only place a TCS "ScientificName" or "Rank" element can appear is somewhere inside a TCS DataSet element. This is not in itself a show-stopper: we could simply generate a DataSet wrapper when we produce output in response to fetches. But there were other issues such as (and I can only recall one or two at the moment - this mail is not a full defence of our decision to not go with TCS):

A TaxonConcepts element may have multiple TaxonRelationships elements. We would like to attach additional data to each relationship to capture information that TCS cannot. There is a ProviderSpecificData element, but this is at the end of the TaxonConcept element, and I could not work out a way to stuff the extra data for each relationship into that ProviderSpecificData element in such a way as it was attached to the correct relationship - although re-looking at it now I see a "ref" attribute and perhaps that is meant to do the job.

There are multiple TCS "relationship types", but these did not quite match the data we had. It is not possible to put anything but a TCS relationship type enum into the "type" attribute of a TaxonRelationship element, so we wind up having to provide two fields - the "real" type and the nearest TCS equivalent. The "real" type needs to go in the ProviderSpecificData section - miles away (in the document) from what is supposed to be the primary place where the relationship is described. It's ugly.

Furthermore, some of our relationships don't really match the TCS ones at all well - to the point that using a TCS type would be misleading. The TCS enumeration does not have an "other" value, so there was a bit of an impasse. In any event, we were looking at either putting some relationships in the TCS array, and some in the PSD array, or putting corresponding arrays in each. Of course, in the provider specific data section we cannot use any of the TCS elements, because the element types are not exposed and can only appear in a TCS DataSet at the correct spot.

It just got to the point where the ProviderSpecificData section was bigger and more interesting than the TCS, so we broke it out into a separate XML document (which was bundled with the TCS using an ibis:documents wrapper), at which point we couldn't help but ask "Can someone explain again why we are trying to do this?". After more discussions with both the zoologists and the botanists, attempting to work out which TCS enumerated values I should use for what, we gave up.

TCS does an admirable job of being watertight. If you have any valid XML document with any TCS element, then you know that it will be enclosed in a DataSet element and come bundled with enough context to make sense of it. It's a model for shifting around entire, self-contained *sets* of data. Entire taxonomies, sitting as big files on a disk (or in an xml store). But our service layer serves up fragments - one or several taxa in response to a request, and TCS turned out to not be a good model for what we do. The history of trying to use it has left us with a legacy of having multiple relationship-type fields (relationship "description" and relationship "category") whose product does not form a sensible set of values.

What we have now is a site-specific schema that captures and exposes the data we have. Admittedly, this means that the grand goal we are all trying to accomplish - a consistent worldwide net of data - is not as far down the track as we were hoping to go. It means that the problem of working out how data set 1 matches data set 2 is pushed off onto aggregators, a job that is in general impossible for an aggregator to accomplish. If we could have fitted our data into TCS, if everyone else could also have done so, then that would have been wonderful. We were reluctant to abandon it, but to get our data out the door we eventually did.

On 02/11/2012, at 9:41 AM, <Tony.Rees@csiro.au> wrote:

...

Hi TDWG persons,

I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS but it appears few if any major systems have ever gone that route – I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more… the nearest would perhaps be AFD/APNI (hence copying Paul on this email); however, their “ibis” schema, though apparently based originally on TCS, http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any explicit reference to the TCS schema so far as I can see. (Note also the cited schema definition http://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something).

I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information so *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better.

It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts?

Regards - Tony

Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566 LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36

From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray
Sent: Wednesday, 7 March 2012 12:52 PM
To: Steve Baskauf
Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG
Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED]

On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:

Dag and Éamonn,

In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank; dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType .

We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and declare them as being of the correct type.

For instance, http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm

Is declared to be a subclass of http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm

And we have a few specific items of that type:

http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-li...

These individuals are therefore correctly typed and can legitimately be used as a TDWG relationshipCategory.
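The typing argument above can be illustrated with a small sketch, assuming plain Python dictionaries in place of a real triple store. The two class URIs and the individual URI are taken from the message; the dictionaries and the helper `all_types` are hypothetical, and the sketch assumes each class has at most one declared superclass.

```python
AFD_TERM = "http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm"
TDWG_TERM = "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm"
HAS_EMENDATION = "http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation"

# Declared subclass relationships (child class -> parent class).
SUBCLASS_OF = {AFD_TERM: TDWG_TERM}

# Declared type of each individual.
TYPE_OF = {HAS_EMENDATION: AFD_TERM}


def all_types(individual):
    """Return the declared type of an individual plus every superclass of it."""
    types = []
    seen = set()
    current = TYPE_OF.get(individual)
    while current is not None and current not in seen:
        seen.add(current)
        types.append(current)
        current = SUBCLASS_OF.get(current)
    return types


# The site-specific individual is also an instance of the TDWG class.
print(TDWG_TERM in all_types(HAS_EMENDATION))
```

A consumer that only knows the TDWG class can still accept the site-specific individual, provided it follows the subclass declaration.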

Your list of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create their own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason for this is that that is in fact the way the world actually is at the moment, and there's not a lot of help for it.

-------------------------------------------------------------

There are two or three approaches to using a standard vocabulary when your own data does not quite match it.

You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go.

You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.

In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that.

You can use the "define your own term" mechanism and assert both:

    _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
    _:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can have a completely separate predicate:

    _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
    _:_ my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the tdwg triple is also asserted.

Another alternative is to create an OWL rule that says "if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of"

But this creates a performance hit.
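The rule quoted above can be sketched as a tiny forward-chaining step over plain tuples. This is an illustration only, assuming no RDF library; the rule table `IMPLIES` and the function `materialise` are hypothetical names. Applying the rule once at load time, as here, is one possible way to avoid paying the inference cost on every query, at the price of storing the extra triples.

```python
HAS_TYPE = "tdwg:has_relationship_type"
SPECIFIC = "my-voc:is-recently-declared-subtaxon-of"
GENERAL = "tdwg:is-subtaxon-of"

# Hypothetical rule table: a site-specific value implies a standard value.
IMPLIES = {SPECIFIC: GENERAL}


def materialise(triples):
    """Return the input triples plus every triple the rule table entails."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triples appear
        changed = False
        for subject, predicate, obj in list(inferred):
            if predicate == HAS_TYPE and obj in IMPLIES:
                new_triple = (subject, HAS_TYPE, IMPLIES[obj])
                if new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


data = {("_:rel1", HAS_TYPE, SPECIFIC)}
closed = materialise(data)
for triple in sorted(closed):
    print(triple)
```

After materialisation both the site-specific and the TDWG triple are asserted, which is the situation the first option in the message produces by hand.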

-------------------------------------------------------------

That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to have the design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from being carried away subdividing types into a hierarchy that they think is a good idea, but which doesn't match the data people already have.

I'd also suggest that every enumeration (i.e., list of individuals) include two special values:

NOT_SPECIFIED. This value is not present in the source, underlying data. It isn't in the database, the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name - assuming people at this level know what it means.

OTHER. This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional.

These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations.
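A minimal sketch of the two special values, assuming a Python `Enum` stands in for one enumeration class. The class name `IslandType` and its two ordinary members are invented for illustration and do not come from any TDWG vocabulary; only NOT_SPECIFIED and OTHER follow the suggestion above.

```python
from enum import Enum


class IslandType(Enum):
    # Hypothetical ordinary members of the enumeration.
    CONTINENTAL = "continental"
    OCEANIC = "oceanic"
    # The two special values proposed in the message.
    NOT_SPECIFIED = "not_specified"  # nothing was recorded in the source data
    OTHER = "other"                  # a value was recorded, but it is not in the list


def to_term(raw):
    """Map a raw source value onto the enumeration."""
    if raw is None or not raw.strip():
        return IslandType.NOT_SPECIFIED
    try:
        return IslandType(raw.strip().lower())
    except ValueError:
        return IslandType.OTHER


print(to_term(None), to_term(" Oceanic "), to_term("atoll"))
```

Here each enumeration carries its own pair of special members, which corresponds to the per-class alternative rather than the single shared instance discussed in the paragraph above.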


Dag Endresen [GBIF]

7 Mar

13:18

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Dear Steve,

Many thanks for bringing up the documentation practices for controlled vocabularies, and also for offering to take the lead on writing a first draft.

Do you envision that the guidelines for providing documentation on a controlled (value) vocabulary might be included as part of the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/)? And to eventually move to seek ratification for this standard initiated by Roger Hyam some years ago...? Or could these RDF guidelines be seen more as the kind of Type 3 (or type 2) documents that Roger describes with the proposed documentation specification standard?

I have been following the first discussions on the TDWG-RDF Google Group with great interest. The discussion on how to report values as literals when a term is declared with the range to be a non-literal resource type has been very educational for me to follow. It seems to me that this is still a topic under exploratory discussion in the Dublin Core Metadata Initiative group; however, some guidelines for the TDWG best practices would be valuable! A large part of the RDF vocabulary guidelines would probably be on a less technically detailed level?

Do you think that the same guidelines can cover the scope for both the RDF vocabularies of terms and the controlled value vocabularies? My thinking is that either terms or controlled values can be described as concepts in a basic RDF vocabulary - and that these concepts can be re-used in derived resources such as the Darwin Core Archive extensions and vocabularies (or re-used by ontologies). The best practices for constructing a vocabulary defining concepts intended either as controlled values or concepts intended as terms could be largely the same?

Our intention for the proposed VoMaG task group is to look at the management practices for vocabularies - including the evaluation of some tools to support an expert group formed to be in charge of such a vocabulary [1]. The KOS Wiki [2] is one such tool I am particularly interested to get feedback on. The KOS Wiki (Semantic MediaWiki) is based on the recommendations and role model of the Species-ID Wiki [3]. The KOS Wiki is for testing purposes and similar in many ways to the Species-ID Wiki. The Wiki tool could be used as a platform for collaborative management of vocabularies and their concepts. The normative RDF vocabulary could then be based on the RDF descriptions extracted from the Wiki. [Welcome to make contact for a username to test the demo KOS Wiki!].

[1] http://kos.gbif.org/
[2] http://kos.gbif.org/wiki/Main_Page
[3] http://species-id.net/wiki/Category:Term

Best regards
Dag

Steve Baskauf

15:10

New subject: [tdwg-tag] Creating a TDWG standard for documenting Data Standards

Dag and Éamonn,

Thanks for your interest and response. Responses to your comments inline.

On 3/7/2012 7:18 AM, Dag Endresen [GBIF] wrote:

...

Do you envision that the guidelines for providing documentation on a controlled (value) vocabulary might be included as part of the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/)? And to eventually move to seek ratification for this standard initiated by Roger Hyam some years ago...?

Well, I think that it would be extremely beneficial to finish that specification. I think it provides very useful guidelines for documenting human-readable documents. However, it says little about how to document non-human readable documents. I have brought this issue up in the context of Darwin Core where it is not very clear which RDF documents are normative. If Roger is not working on this any more, perhaps it could be passed off to somebody else (not me, please). I would recommend that whoever works on it consult with the authors and review managers of the standards which are undergoing ratification or which have been ratified since the existing documentation standard draft was written. I think that would be Darwin Core, Audubon Core, the GUID/LSID Applicability Statement, and TAPIR (I might have missed one). Those people would probably have useful feedback on what did and did not work for them. In particular, I think that the part of the draft which says that there is no versioning of standards needs to be reworked. I understand the rationale, but as a practical matter we are ignoring that prohibition in Darwin Core, which has a namespace policy that allows the standard to evolve without re-submission and ratification. Audubon Core is considering a similar policy. If the DwC namespace policy is something that works (I suppose that is subject to debate) it could be written into the documentation standards.

...

Or could these RDF guidelines be seen more as the kind of Type 3 (or type 2) documents that Roger describes with the proposed documentation specification standard.

I think that it might have been a mistake on my part to have mentioned creating a standard in the email because it seems to have brought up all kinds of ancient history of which I'm not aware. But then my initial email was just to bring up the idea with you and Éamonn rather than to start a list discussion (which happened inadvertently when I used "reply all" to a previous message of yours - remind me not to do that again). At this point it may be premature to suggest that it be a standard, although it could be one of the type "Current Best Practice". But first I think that there needs to be some more serious discussion and experimentation in the context of the RDF Task Group to find out whether creating RDF representations of controlled vocabularies actually accomplishes anything useful or not.

...

I have been following the first discussions on the TDWG-RDF Google Group with great interest. The discussion on how to report values as literals when a term is declared with the range to be a non-literal resource type has been very educational for me to follow. It seems to me that this is still a topic under exploratory discussion in the Dublin Core Metadata Initiative group, however some guidelines for the TDWG best practices would be valuable! A large part of the RDF vocabulary guidelines would probably be on a less technically detailed level?

As one who has struggled to understand the DCMI Abstract Model, I think that if we cannot provide guidelines which can be understood in a short period of time by people with a general understanding of data management, then we are wasting our time. I think that an important part of this is providing concrete examples.

...

Do you think that the same guidelines can cover the scope for both the RDF vocabularies of terms and the controlled value vocabularies? My thinking is that either terms or controlled values can be described as concepts in a basic RDF vocabulary - and that these concepts can be re-used in derived resources such as the Darwin Core Archive extensions and vocabularies (or re-used by ontologies). The best practices for constructing a vocabulary defining concepts intended either as controlled values or concepts intended as terms could be largely the same?

I don't know the answer to this. I was thinking about controlled vocabularies because they are generally initially defined as text strings. It would therefore be relatively easy to combine some form of those strings with a namespace to create a URI, then provide minimal RDF to support dereferencing. We have examples of this in the three controlled vocabularies I mentioned in my initial email. I think that we have learned from the experience with the Darwin Core type vocabulary that it is probably unwise to embed things like subclass properties within the term definition RDF itself. However, I am intrigued by the possibility of including enough semantics to allow a client to figure out that terms which are expressed in different languages are equivalent (as Éamonn mentioned in his response). I don't know what the best strategy for doing that is. The Library of Congress language tag RDF expresses relationships using SKOS terms, but I do not have enough experience with KOS to know how widely clients are equipped to make use of that kind of information. But I know that we have people in our community who are familiar with that and hopefully they can advise.

The issue of how best to define in RDF general-use vocabulary terms seems to me to be a more difficult issue. In Darwin Core, we have the normative definitions in RDF. This presents problems if there later turn out to be problems with that RDF - it requires going through the term change process to fix those problems. Audubon Core is following a different model, which is to define the terms in human language and leave the problem of creating the RDF for URI dereferencing until later (I think). I am hoping that as the RDF group looks carefully at how Darwin Core terms can be used in RDF that we can figure out what the actual best-practices are and provide some guidance for future vocabulary development. I need to take some time to digest Paul's comments on this subject.

Steve

--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
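The idea in the message above of combining controlled-value strings with a namespace to create URIs, then providing minimal RDF, can be sketched as follows. This is an assumption-laden illustration: the namespace `http://example.org/voc/basisOfRecord#`, the underscore convention, and the helper names `mint_uri` and `label_triple` are all hypothetical, and the output is a single N-Triples line carrying only an `rdfs:label`.

```python
import re
from urllib.parse import quote

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical namespace; a real vocabulary would publish its own.
NS = "http://example.org/voc/basisOfRecord#"


def mint_uri(namespace, label):
    """Build a URI by appending a normalised form of the label to the namespace."""
    local_name = re.sub(r"\s+", "_", label.strip())
    return namespace + quote(local_name, safe="")


def label_triple(uri, label):
    """Return one N-Triples line attaching the original string as rdfs:label."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{uri}> <{RDFS_LABEL}> "{escaped}" .'


uri = mint_uri(NS, "Preserved Specimen")
print(uri)
print(label_triple(uri, "Preserved Specimen"))
```

Keeping the minimal RDF to a label, with no subclass or other structural statements, is in line with the caution in the message about embedding such properties in the term definition itself.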

21 comments

participants (13)

"Markus Döring (GBIF)"
Chuck Miller
Dag Endresen (GBIF)
Dag Endresen [GBIF]
greg whitbread
Kennedy, Jessie
Kevin Richards
Paul Murray
Richard Pyle
Roderic Page
Steve Baskauf
Tony.Rees＠csiro.au
Éamonn Ó Tuama [GBIF]