Dag and Éamonn, Thanks for your interest and response. Responses to your comments inline.
On 3/7/2012 7:18 AM, Dag Endresen [GBIF] wrote:
Do you envision that the guidelines for providing documentation on a controlled (value) vocabulary might be included as part of the TDWG Standards Documentation Specification (http://www.tdwg.org/standards/147/)? And to eventually move to seek ratification for this standard initiated by Roger Hyam some years ago...?
Well, I think that it would be extremely beneficial to finish that specification. I think it provides very useful guidelines for documenting human-readable documents. However, it says little about how to document non-human-readable documents. I have brought this issue up in the context of Darwin Core, where it is not very clear which RDF documents are normative. If Roger is not working on this any more, perhaps it could be passed off to somebody else (not me, please). I would recommend that whoever works on it consult with the authors and review managers of the standards which are undergoing ratification or which have been ratified since the existing documentation standard draft was written. I think that would be Darwin Core, Audubon Core, the GUID/LSID Applicability Statement, and TAPIR (I might have missed one). Those people would probably have useful feedback on what did and did not work for them. In particular, I think that the part of the draft which says that there is no versioning of standards needs to be reworked. I understand the rationale, but as a practical matter we are ignoring that prohibition in Darwin Core, which has a namespace policy that allows the standard to evolve without re-submission and ratification. Audubon Core is considering a similar policy. If the DwC namespace policy is something that works (I suppose that is subject to debate), it could be written into the documentation standards.
Or could these RDF guidelines be seen more as the kind of Type 3 (or type 2) documents that Roger describes with the proposed documentation specification standard.
I think that it might have been a mistake on my part to have mentioned creating a standard in the email because it seems to have brought up all kinds of ancient history of which I'm not aware. But then my initial email was just to bring up the idea with you and Éamonn rather than to start a list discussion (which happened inadvertently when I used "reply all" to a previous message of yours - remind me not to do that again). At this point it may be premature to suggest that it be a standard, although it could be one of the type "Current Best Practice". But first I think that there needs to be some more serious discussion and experimentation in the context of the RDF Task Group to find out whether creating RDF representations of controlled vocabularies actually accomplishes anything useful or not.
I have been following the first discussions on the TDWG-RDF Google Group with great interest. The discussion on how to report values as literals when a term is declared with the range to be a non-literal resource type has been very educational for me to follow. It seems to me that this is still a topic under exploratory discussion in the Dublin Core Metadata Initiative group, however some guidelines for the TDWG best practices would be valuable! A large part of the RDF vocabulary guidelines would probably be on a less technically detailed level?
As one who has struggled to understand the DCMI Abstract Model, I think that if we cannot provide guidelines which can be understood in a short period of time by people with a general understanding of data management, then we are wasting our time. I think that an important part of this is providing concrete examples.
Do you think that the same guidelines can cover the scope for both the RDF vocabularies of terms and the controlled value vocabularies? My thinking is that either terms or controlled values can be described as concepts in a basic RDF vocabulary - and that these concepts can be re-used in derived resources such as the Darwin Core Archive extensions and vocabularies (or re-used by ontologies). The best practices for constructing a vocabulary defining concepts intended either as controlled values or concepts intended as terms could be largely the same?
I don't know the answer to this. I was thinking about controlled vocabularies because they are generally initially defined as text strings. It would therefore be relatively easy to combine some form of those strings with a namespace to create a URI, then provide minimal RDF to support dereferencing. We have examples of this in the three controlled vocabularies I mentioned in my initial email. I think that we have learned from the experience with the Darwin Core type vocabulary that it is probably unwise to embed things like subclass properties within the term definition RDF itself. However, I am intrigued by the possibility of including enough semantics to allow a client to figure out that terms which are expressed in different languages are equivalent (as Éamonn mentioned in his response). I don't know what the best strategy for doing that is. The Library of Congress language tag RDF expresses relationships using SKOS terms, but I do not have enough experience with KOS to know how widely clients are equipped to make use of that kind of information. But I know that we have people in our community who are familiar with that and hopefully they can advise.
The issue of how best to define general-use vocabulary terms in RDF seems to me to be a more difficult one. In Darwin Core, we have the normative definitions in RDF. This presents difficulties if problems later turn up in that RDF - fixing them requires going through the term change process. Audubon Core is following a different model, which is to define the terms in human language and leave the problem of creating the RDF for URI dereferencing until later (I think). I am hoping that as the RDF group looks carefully at how Darwin Core terms can be used in RDF, we can figure out what the actual best practices are and provide some guidance for future vocabulary development.
I need to take some time to digest Paul's comments on this subject.
Steve