November 2012 - tdwg-tag - lists.tdwg.org

joining the TDWG vocabulary management task group
by Éamonn Ó Tuama [GBIF] 30 Jan '13


Apologies for any cross-postings.

At its recent conference in Beijing, the TDWG Executive/TAG approved the formation of a task group on vocabulary management. It goes by the acronym VoMaG (Vocabulary MAnagement Group). A revised charter (http://community.gbif.org/pg/file/read/28812/vocabulary-management-group-vomag-charter-v1), outlining the goals and mode of working of the group, is available on the GBIF community site (http://community.gbif.org/pg/groups/21382/vocabulary-management/). In the absence of a functioning TDWG site, we intend to use this site as the meeting place for discussions, sharing docs, etc.

Please consider joining VoMaG as a core member. You should email me and also register on the GBIF community site. You can log in using OpenID if you have a Google or Yahoo account. Once the core group is established, we can proceed with developing a work programme/timeline to achieve our goals.

With regards,
Éamonn Ó Tuama
____________________________________________________
Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama(a)gbif.org)
Senior Programme Officer, Global Biodiversity Information Facility Secretariat,
Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK
Phone: +45 3532 1494; Fax: +45 3532 1480


Re: [tdwg-tag] [tdwg-rdf: 105] Re: Any TCS users with experiences to report?
by Steve Baskauf 28 Nov '12


I read Rich's email as quoted in Nico's reply; I think Rich's post may not actually have gone out on the tdwg-tag or RDF group lists. Rich mentions that he is swamped and will reply later. For the moment, it may be helpful to cite an earlier email of Rich's, which took me some time to dig out of the tdwg-content email list (http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001703.html). In that post, Rich was responding to a thread that started when I asked how one would handle a real-life situation (the specimen pictured in http://images.cyberfloralouisiana.com/images/specimensheets/lsu/0/0/4/28/LS…). The relevant part begins about halfway down the page with "In the web example given by Steve, we have...". In that section, Rich notes that "Eventually, a third party may be able to deduce (perhaps through a suite of other, external information) a RelationshipAssertion that maps the TNU "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" to some other, perhaps published and well-defined taxon concept (of the same or different name). Also, if there are 100 specimens in the collection that L. Urbatsch identified as "Juncus diffusissimus Buckl." in 2009, then anchoring all 100 Identification instances to the one TNU, allows all of those specimens to inherit the mapping of the one "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" TNU instance to some other better-defined taxon concept." From that post, I understood that a TNU (a.k.a. "assertion" in Pyle 2004, http://systbio.org/files/phyloinformatics/1.pdf) can be as vague as an idea that some determiner had in his/her head about how organism/specimen instances should be mapped to a name. From what Rich said there, I think that we as metadata aggregators may at some later point be able to map how that idea in the determiner's head fits in with a more well-defined (e.g. published) taxon description, which one may choose to call a taxon concept rather than a TNU.
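Rich's point about the 100 specimens can be sketched as a small data structure. This is a minimal Python illustration under stated assumptions, not GNUB's actual model: it assumes each identification stores a pointer to a TNU identifier rather than a bare name string, and every identifier, as well as the relationship label "congruent", is a hypothetical placeholder. A single mapping asserted once for the TNU is then visible from every specimen anchored to it.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Identification:
    """One determination of one specimen, anchored to a TNU identifier (assumed model)."""
    specimen_id: str
    tnu_id: str

# Hypothetical identifier standing in for "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009".
URBATSCH_TNU = "tnu:juncus-diffusissimus-sec-urbatsch-2009"

# 100 specimens, all identified to the same TNU.
identifications = [Identification(f"specimen-{i:03d}", URBATSCH_TNU) for i in range(100)]

# One third-party assertion mapping the TNU to a better-defined concept.
# The relationship label and the target concept ID are placeholders.
Mapping = Tuple[str, str]
mappings: Dict[str, Mapping] = {
    URBATSCH_TNU: ("congruent", "concept:published-juncus-concept"),
}

def inherited_mapping(ident: Identification, maps: Dict[str, Mapping]) -> Optional[Mapping]:
    """Look the mapping up through the TNU; specimens carry no mapping of their own."""
    return maps.get(ident.tnu_id)

inherited = [inherited_mapping(x, mappings) for x in identifications]
print(sum(m is not None for m in inherited))  # 100: every specimen sees the one mapping
```

The design choice being illustrated is the indirection: the mapping is stored once, keyed by the TNU, so adding or correcting it later never requires touching the 100 identification records.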
As so often is the case, I think the problem here boils down to identifiers and the metadata that we associate with them. Let's say in the real-life example above, somebody (we can say GNUB) assigns a persistent identifier (perhaps a URI constructed from an opaque UUID) to "Juncus diffusissimus Buckl. sec L. Urbatsch 2009". We could say with an rdf:type statement that the resource identified by the URI is a TNU. We can give that resource a tc:hasName property linking it to the name which is represented by the string "Juncus diffusissimus Buckl.". (I'm not sure what property we use to say that L. Urbatsch made the assertion.) Now let's say that L. Urbatsch publishes a paper describing in detail her concept of Juncus diffusissimus Buckl. We can now assign the resource identified by the URI a tc:accordingTo property whose value is the DOI of the paper she wrote. If we want, we can replace the previous rdf:type statement with a different one stating that the resource is a taxon concept rather than a TNU, or, if we believe that all taxon concepts are also TNUs, we can leave the rdf:type statement that we had before and just add a second one saying that the resource is also of type taxon concept. The point I'm trying to make is that as long as this "thing" that we are variously calling "taxon name usage", "taxon concept", "shallow taxonomic concept", or "deep taxonomic concept" can be assigned an identifier, what really matters is the metadata we associate with it, not really what we call it. The more metadata that we can connect with it, either through datatype properties like name strings or object properties that describe how the "thing" is related to other resources, the "deeper" the concept. At the other extreme, we may know nothing more than the name string. In that case we could call it a "nominal concept", but we could still assign it an identifier and maybe with luck we could associate more metadata with it (make it "deeper") at some point in the future.
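The identifier-first approach described above can be sketched without any RDF library by treating a graph as a set of (subject, predicate, object) triples. The rdf:type IRI and the tc: namespace of the TDWG TaxonConcept ontology are real; the ex:TaxonNameUsage class, the base URI, the name URI and the reference URI are hypothetical placeholders (in practice the sec. reference would be a DOI). This sketch takes the "keep the old type and add a second one" option; deepening the description adds triples while the subject URI never changes.

```python
import uuid

# Real vocabulary IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TC = "http://rs.tdwg.org/ontology/voc/TaxonConcept#"  # TDWG TaxonConcept ontology
# Hypothetical namespace for a TNU class (placeholder, not a TDWG term).
EX = "http://example.org/vocab#"

def mint_uri(base: str = "http://example.org/tnu/") -> str:
    """Mint an opaque, persistent-looking URI from a random UUID (base is a placeholder)."""
    return base + str(uuid.uuid4())

def describe_tnu(name_uri: str):
    """Start with the minimum: an identifier, a type, and a link to the name."""
    tnu = mint_uri()
    graph = {
        (tnu, RDF_TYPE, EX + "TaxonNameUsage"),
        (tnu, TC + "hasName", name_uri),
    }
    return tnu, graph

def add_sec_reference(graph: set, tnu: str, reference_uri: str) -> set:
    """Deepen the description with tc:accordingTo and a second rdf:type.
    The earlier type is kept, so the resource is both a TNU and a taxon concept."""
    graph.add((tnu, TC + "accordingTo", reference_uri))
    graph.add((tnu, RDF_TYPE, TC + "TaxonConcept"))
    return graph

# Usage with placeholder URIs.
tnu, graph = describe_tnu("http://example.org/names/juncus-diffusissimus-buckl")
graph = add_sec_reference(graph, tnu, "http://example.org/ref/urbatsch-paper")
assert all(s == tnu for s, _, _ in graph)  # the identifier itself never changed
```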
Returning to the original question of the thread (which was about the utility of TCS), TCS tries to deal with this problem using a thing called "signatures" (section 17.2, see http://bioimages.vanderbilt.edu/pages/TCS-Schema-UserGuide-v1.3.pdf), which are a somewhat crude attempt to make identifying strings unique by standardizing their format. However, TCS was written in 2005-2006. Since then, the development of DOIs, the TDWG GUID Applicability Statement standard, and best practices in the Linked Data world have provided well-established and standardized ways to create persistent and dereferenceable identifiers. So there isn't any reason I can see why we can't use them. I am going to be bold and say that we already have the minimum tools required to get started implementing TNUs/TaxonConcepts:
- URI GUIDs to identify the TNU/concepts (which, if one preferred, could be UUIDs or LSIDs, HTTP-proxied to make Linked Data people happy; see the TDWG GUID Applicability Statement standard if you don't know how to do this),
- the two terms tc:hasName and tc:accordingTo (from the TDWG Taxon Concept ontology) to relate the TNUs/TaxonConcepts to names and sec. references, and
- some sources for name and publication URI GUIDs.
There are deficiencies all over the place for that last item, but they can be addressed over time by improving the scope of the relevant databases and the quality of the metadata provided. uBio has URIs for almost every name I've ever looked for. BHL has a growing collection of old literature which has been assigned identifiers by Rod Page's BioStor; new literature usually has an assigned, dereferenceable proxied DOI; and one can even make valid URIs from ISBNs of books (although they aren't resolvable). I'm not sure how one should address the situation where the "sec." reference of a TNU is a person and date, since there isn't a standard database of people (as far as I know). But that could be remedied.
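The identifier forms mentioned above can be produced with the Python standard library alone. This sketch mints a UUID URN (the RFC 4122 `urn:uuid:` form), prefixes an LSID with an HTTP proxy base, and builds an RFC 3187 `urn:isbn:` URI after checking the ISBN-13 check digit. The proxy base URL is a hypothetical placeholder rather than a real resolver, and the simple prefixing scheme is an assumption; the TDWG GUID Applicability Statement is the place to check the recommended proxying conventions.

```python
import uuid

def uuid_urn() -> str:
    """An opaque, globally unique identifier in the RFC 4122 'urn:uuid:' form."""
    return uuid.uuid4().urn

def http_proxied_lsid(lsid: str, proxy: str = "http://lsid-proxy.example.org/") -> str:
    """Prefix an LSID with an HTTP proxy base so it can be dereferenced over HTTP.
    The default proxy base is a hypothetical placeholder, not a real resolver."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid}")
    return proxy + lsid

def isbn_urn(isbn: str) -> str:
    """Build an RFC 3187 'urn:isbn:' URI from an ISBN-13 after checking its check digit.
    The result is a valid URI but is not resolvable over HTTP."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected a 13-digit ISBN")
    # ISBN-13 weights alternate 1, 3, 1, 3, ...; a valid number sums to a multiple of 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    if total % 10 != 0:
        raise ValueError("ISBN-13 check digit does not match")
    return "urn:isbn:" + digits

print(isbn_urn("978-0-306-40615-7"))  # urn:isbn:9780306406157
```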
Ultimately, one could create the kinds of mapping tools that Nico and Rich are talking about, which relate different taxon concepts/TNUs that have set-theory relationships to one another. Whether that would be done with RDF, OWL, or something completely different I don't know, but the basic anchoring of persistent identifiers for the TNU/concepts to the names and sec. references wouldn't have to wait on that. We could also get hung up on what terms to use to express the metadata describing the basic TNU/name/sec. resources, but there is nothing that says that metadata can't change or be improved over time. It's the identifier that shouldn't change. Am I wrong about this???

Steve

Nico Franz wrote:
> Thank you, Rich.
>
> So we seem to agree on something like this:
>
>   Rich                     Nico
>   taxon name usage  <===>  "shallow" taxonomic concept
>   taxon concept     <===>  "deep" taxonomic concept
>
> Both: labeling is via name sec. author
> Both: authoring concepts/usages vs. identifying to those => slippery issue; ideally requires proper speaker awareness.
>
> Why the latter? - well, because (again) the desirable effect of using concepts - the desirable situation where these would have a justification that goes beyond just really meticulous data management and advances to the level of facilitating better science qua more precise taxonomic semantics - only obtains if a great number of name occurrences in a wide range of shallow-ish sources is linked via identification to a presumably smaller number of occurrences where those names are well defined and where successive definitions of names are semantically linked. So there needs to be an emerging culture of minimizing concept inflation. Otherwise we obtain what we have now (mostly just names) and on top of that add new baggage (lots of really shallow concepts) that nobody can do good semantics with.
>
> Here is where I think we disagree, perhaps just in terms of sales strategy:
>
> You seem to suggest that making an a priori distinction between TNUs and concepts is (1) possible in a good number of cases, (2) desirable perhaps in the form of registry, and (3) even necessary for building and populating databases, etc.
>
> Here I disagree, for a number of reasons. First off I do believe that not defining certain things too soon or too narrowly is sometimes actually really good science and on the other hand, doing so can be a show stopper if other people don't share this narrowness and find it limiting. Second, while we can perhaps readily agree that a lengthy monograph published in American Museum Novitates rises to the level of authoring next concepts whereas a label saying "Family Carabidae" on a specimen submitted as part of an insect student collection does not, there are enough in-between cases where only time will tell.
>
> Example: USDA Plants promotes a particular perspective of groundcherry taxonomy, genus-level concept Physalis - http://plants.usda.gov/java/profile?symbol=physa - with some 29 species-level concepts recognized. ASU's herbarium curator Les Landrum is a bit of a groundcherry nerd (I say this with admiration). If you go here: http://swbiodiversity.org/seinet/index.php, then Search Collections => Select All => Next => Scientific Name = Physalis => Search, you get some 3700 pertinent specimen records. If you then switch to the Species List tab, you see 115 concepts listed overall. Switching to the USDA Plants Thesaurus will give you only 46 concepts that these 3700 specimens are mapped to. Using instead the ASU Taxonomic Thesaurus will yield 89 concepts linking variously to those specimens. This is based on Les' classification of groundcherries which is not further documented in the SEINet environment at this moment.
>
> Now, saying a priori whether Les' list represents a set of TNUs versus concepts would presumably require you to assert that there is nobody who is Les or very much like him that can provide a semantically very accurate mapping of the 89 name usages in the SEINet-ASU Physalis list to the much more thoroughly circumscribed USDA Plants concepts. That could seem like a daring prediction given how little Les might think of the USDA perspective. At the very moment that Les or someone very much like him DOES provide the mapping, what looked like a list of TNUs then all of a sudden acquires - via the mapping - a much deeper semantic status where others can readily go from one classification to the next, even though each comes with very different amounts of information in their original appearances. Some people may prefer Les' concepts at least for Arizonan groundcherries, and in either case, the mapping puts both on an even playing field.
>
> So this exemplifies IMO why so far the concept approach has been too abstract, the TCN has been too depauperate on the relationships/mapping side (worrying instead almost needlessly about what constitutes a concept per se), and definitions between identifications, name usages, shallow, deep concepts have been too abstract as well. I believe we should focus less discussion on those issues and more emphasis on building mapping tools that can carry a wide range of input and logically infer additional implied mappings from the initial expert-given set. The actual semantic properties of that input will emerge a posteriori and will be hard to predict in some cases. Some descriptions are lengthy but nobody understands them. Some names lists are profoundly informative if the context of their origin is well known to an expert.
>
> There will be some obvious overreaches in both directions (too many unconnected items, some items that are connected more precisely than their inherent information would seem to justify). I think these overreaches would be tolerable. What's less productive to me is a restrictive set of definitions that provide an early blockage in the way towards an environment where mapping is supposed to occur very frequently. We're not at the registry stage yet. More at the "can this work in principle" stage. As I mentioned before, the mappings ARE the concepts under a certain viewpoint. We don't want to pre-determine their fate by separating TNUs from concepts in a great number of cases.
>
> I hope this was not a misrepresentation of your view and also a clarification of my view. In the end, we both advocate some sort of balance for the same concerns, but perhaps disagree only strategically about the moment where/when that balance will materialize - upfront via precise definitions and registration or later on via the presence/lack of actual mappings.
>
> Best,
>
> Nico
>
> On Mon, Nov 26, 2012 at 5:18 PM, Richard Pyle <deepreef(a)bishopmuseum.org> wrote:
>
> > I want to get into this topic in more detail (going back to Steve’s original post), but this week is hell-week for me, so only a quick comment now.
> >
> > I generally agree with everything Nico says, but I think we need to be a little more clear of what we mean by “name sec. author”
> >
> > The core unit of the data model we’ve been building towards (GNUB, which underlies ZooBank) uses as its fundamental unit something we’ve been calling a “Taxon Name Usage Instance” (TNU). The scope of what can be a TNU is intentionally very broad – anything from an original taxon name description, to a mention in a newspaper article, and potentially even a scribbled hand-written label or letter.
> > The only requirement is that it be static – that is, a snapshot in time. I mention this because database records can be represented as TNUs, but only as a static snapshot of the record. If the essence of the database record changes over time (e.g., due to changing taxonomic opinion), then a new TNU is generated for a different snapshot in time.
> >
> > A very small subset of the universe of TNUs represent Code-governed Nomenclatural Acts (original descriptions of new names and other code-governed nomenclatural actions). In the case of such TNUs involving the ICZN Code (for example), the TNUs are registered in ZooBank. But the point is, one subset of all TNUs are those that involve actions governed by a Code of nomenclature.
> >
> > The reason I mention this is that, if I read Nico’s email correctly, I think he’s saying that not all TNUs de-facto represent taxon concepts. Rather, analogous to the nomenclatural subset of TNUs, there is a subset of TNUs that rise to the level of representing Taxon Concept definitions. In the case of nomenclatural acts, someone must make some sort of declaration (assertion) that a specific TNU constitutes a Code-governed nomenclatural act, along with relevant metadata relating to that assertion and the nature of the Act. In the case of zoological names, ZooBank is intended to facilitate this role (i.e., when a person registers a TNU in ZooBank, there is an implied assertion that the TNU represents a nomenclatural act under the ICZN Code).
> >
> > What would be nice to have (and what TDWG could play a helpful role in facilitating), is a registry of sorts (analogous to ZooBank) for those TNUs that represent taxon concepts. In other words, a mechanism for people to “register” the subset of all TNUs that represent taxon concepts.
> > Secondarily, there would also be a mechanism to make assertions about how registered taxon concepts map to each other (via some sort of set theory relationship[s]).
> >
> > In summary, my points are
> >
> > 1) We should be clear when we say “name sec. author” whether we mean it sensu lato (i.e., all TNUs); or sensu stricto (i.e., only those TNUs that rise to the level of representing taxon concepts).
> >
> > 2) There ought to be a registry (perhaps administered by CoL?) for identifying the subset of TNUs that represent concept definitions, and it should include a mechanism for making set-theory relationship assertions among registered concept-TNUs.
> >
> > 3) The two things mentioned in #2 should be separate; that is, one can assert that a particular TNU represents a taxon concept separately from (potentially multiple) assertions about how that taxon concept relates to other taxon concepts.
> >
> > More later.
> >
> > Aloha,
> >
> > Rich
> >
> > P.S. By my standards that WAS quick!

--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
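Rich's quoted proposal contains two structural ideas that are easy to lose in prose: TNUs are immutable snapshots (a changed database record yields a new TNU rather than an edited one), and the assertion that a TNU represents a taxon concept is stored separately from any number of relationship assertions about it. The following Python sketch models both under stated assumptions; the class names, fields, placeholder names ("Genus species") and the `ConceptRegistry` itself are hypothetical and do not describe GNUB, ZooBank, or any existing registry. (In his later reply Rich suggests treating such a registry as a possible end-game rather than a first step.)

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class TNU:
    """A static snapshot of one taxon name usage; frozen, so it cannot be edited."""
    tnu_id: str
    name: str
    source: str   # e.g. a publication, label, or database record (free text here)
    content: str  # the usage as it stood when the snapshot was taken

def snapshot(name: str, source: str, content: str) -> TNU:
    """Every snapshot gets its own opaque identifier."""
    return TNU(str(uuid.uuid4()), name, source, content)

@dataclass
class ConceptRegistry:
    """Hypothetical registry that keeps the two kinds of assertion apart."""
    # tnu_id -> people asserting that this TNU represents a taxon concept
    concept_assertions: Dict[str, Set[str]] = field(default_factory=dict)
    # (tnu_id_a, relation, tnu_id_b, asserter); several may exist for the same pair
    relationship_assertions: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def assert_concept(self, tnu: TNU, asserter: str) -> None:
        self.concept_assertions.setdefault(tnu.tnu_id, set()).add(asserter)

    def assert_relationship(self, a: TNU, relation: str, b: TNU, asserter: str) -> None:
        # Following point 2 of the proposal, only registered concept-TNUs are related.
        for t in (a, b):
            if t.tnu_id not in self.concept_assertions:
                raise ValueError(f"{t.tnu_id} has not been asserted to be a concept")
        self.relationship_assertions.append((a.tnu_id, relation, b.tnu_id, asserter))

# Usage with made-up records: a changed database record yields a second, distinct TNU.
v1 = snapshot("Genus species", "database record 42", "opinion as of version 1")
v2 = snapshot("Genus species", "database record 42", "opinion as of version 2")
registry = ConceptRegistry()
registry.assert_concept(v1, "expert A")
registry.assert_concept(v2, "expert B")
registry.assert_relationship(v1, "congruent", v2, "expert A")
registry.assert_relationship(v1, "overlaps", v2, "expert C")  # competing assertions coexist
```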


Re: [tdwg-tag] [tdwg-rdf: 103] Re: Any TCS users with experiences to report?
by Richard Pyle 27 Nov '12


Hi Nico,

> So we seem to agree on something like this:
>
>   Rich                     Nico
>   taxon name usage  <===>  "shallow" taxonomic concept
>   taxon concept     <===>  "deep" taxonomic concept

Not quite. They're all based on TNUs. I would represent it more like this:

  TNU without well-defined concept  <===>  "shallow" taxonomic concept (aka "potential taxon" sensu Berendsohn)
  TNU with well-defined concept     <===>  "deep" taxonomic concept

> Both: labeling is via name sec. author

Labeling of what? Should "sec." (or whatever abbreviated term I used) mean simply "this author used this name", or should it be restricted to mean "this author used this name in a way that was accompanied by a well-defined taxon concept circumscription"? In my discussions with various folk on questions related to GNUB & ZooBank, there seems to be a preference for "sensu" rather than "sec".

> Both: authoring concepts/usages vs. identifying to those => slippery issue; ideally requires proper speaker awareness.

Agreed -- this is where the assertions are needed. In the case of nomenclatural acts, we have robust Codes that prescribe specific criteria for what nomenclatural acts are, and thus there is a (mostly) objective mechanism for recognizing which among the entire universe of TNUs are associated with nomenclatural acts (and, hence, which should be registered as such). However, there is no universally adopted "Code of Taxon Concepts" that outlines such criteria, to determine which TNUs are accompanied by sufficiently robust concept circumscription definitions, and which are merely non-concept TNUs. I think it would be a mistake to attempt to define such a Code -- (similar problem to the "what is a species" issue).

> Why the latter?
> - well, because (again) the desirable effect of using concepts - the desirable situation where these would have a justification that goes beyond just really meticulous data management and advances to the level of facilitating better science qua more precise taxonomic semantics - only obtains if a great number of name occurrences in a wide range of shallow-ish sources is linked via identification to a presumably smaller number of occurrences where those names are well defined and where successive definitions of names are semantically linked. So there needs to be an emerging culture of minimizing concept inflation. Otherwise we obtain what we have now (mostly just names) and on top of that add new baggage (lots of really shallow concepts) that nobody can do good semantics with.

Agreed!

> Here is where I think we disagree, perhaps just in terms of sales strategy:
>
> You seem to suggest that making an a priori distinction between TNUs and concepts is (1) possible in a good number of cases, (2) desirable perhaps in the form of registry, and (3) even necessary for building and populating databases, etc.

Sort of; but again, they're all based on TNUs. The issue is how to recognize which ones are associated with robust concept definitions, and which are not. So it's not about distinguishing between TNUs and concepts; it's about recognizing which TNUs include concept definitions, and therefore represent reference-points to concepts. I don't think this can be easily defined -- I think the only way to do it is by letting people assert it to be so. In other words, the definition of a TNU that represents a taxon concept is that someone says it is so. (Much like my own definition of a "species"!)

> Here I disagree, for a number of reasons.
> First off I do believe that not defining certain things too soon or too narrowly is sometimes actually really good science

By "certain things", do you mean "defining a particular TNU as having a robust concept definition associated with it"? Or, do you mean "defining what a taxon concept is, in general"?

> and on the other hand, doing so can be a show stopper if other people don't share this narrowness and find it limiting.

I think I agree -- but until I understand what you mean by "certain things", I can't be sure.

> Second, while we can perhaps readily agree that a lengthy monograph published in American Museum Novitates rises to the level of authoring next concepts whereas a label saying "Family Carabidae" on a specimen submitted as part of an insect student collection does not, there are enough in-between cases where only time will tell.

Absolutely!! No disagreement here. The idea is not to say "this TNU includes a good concept definition". The idea is to say "this person asserts that this TNU is a good concept definition". I agree, we don't want to be too precise in drawing that arbitrary line. Indeed, I don't think it's possible to do so. Hence, the "Code of Taxon Concepts" is, I think, a bad idea (as stated above). Ultimately, the definition of a taxon concept is that someone says it is so.

> So this exemplifies IMO why so far the concept approach has been too abstract, the TCN has been too depauperate on the relationships/mapping side (worrying instead almost needlessly about what constitutes a concept per se), and definitions between identifications, name usages, shallow, deep concepts have been too abstract as well. I believe we should focus less discussion on those issues and more emphasis on building mapping tools that can carry a wide range of input and logically infer additional implied mappings from the initial expert-given set.
> The actual semantic properties of that input will emerge a posteriori and will be hard to predict in some cases. Some descriptions are lengthy but nobody understands them. Some names lists are profoundly informative if the context of their origin is well known to an expert.

I agree with all of this. I think it would be wise to postpone the development of any sort of concept registry until the a posteriori analysis is done. Build the maps first (based on assertions from willing practitioners), then decide whether there needs to be a "registry" for which TNUs represent concepts, and which do not.

> There will be some obvious overreaches in both directions (too many unconnected items, some items that are connected more precisely than their inherent information would seem to justify).

That is inevitable!

> I think these overreaches would be tolerable. What's less productive to me is a restrictive set of definitions that provide an early blockage in the way towards an environment where mapping is supposed to occur very frequently.

Agreed.

> We're not at the registry stage yet. More at the "can this work in principle" stage. As I mentioned before, the mappings ARE the concepts under a certain viewpoint. We don't want to pre-determine their fate by separating TNUs from concepts in a great number of cases.

Yes. I think we're on the same page -- just on different time scales. I was looking towards the end-game, but you're focused more on "where do we go from here". I agree that we should focus on your timescale first, and let the longer-term solution emerge from that.

> I hope this was not a misrepresentation of your view

Only slightly.... :-)

> and also a clarification of my view.
> In the end, we both advocate some sort of balance for the same concerns, but perhaps disagree only strategically about the moment where/when that balance will materialize - upfront via precise definitions and registration or later on via the presence/lack of actual mappings.

Actually, I don't think we disagree on strategy. I agree with everything you describe above, and now feel that a registry of concepts is better thought of as a possible end-game. Better now to define a mechanism to allow people to make cross-reference assertions among TNUs, and let the issue of "what is a taxon concept" (specifically: which TNUs represent good taxon concept definitions) emerge from that.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Associate Zoologist in Ichthyology
Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef(a)bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html
Note: This disclaimer formally apologizes for the disclaimer below, over which I have no control.


Re: [tdwg-tag] [tdwg-rdf: 103] Re: Any TCS users with experiences to report?
by Richard Pyle 27 Nov '12


I want to get into this topic in more detail (going back to Steve's original post), but this week is hell-week for me, so only a quick comment now.

I generally agree with everything Nico says, but I think we need to be a little more clear of what we mean by "name sec. author"

The core unit of the data model we've been building towards (GNUB, which underlies ZooBank) uses as its fundamental unit something we've been calling a "Taxon Name Usage Instance" (TNU). The scope of what can be a TNU is intentionally very broad - anything from an original taxon name description, to a mention in a newspaper article, and potentially even a scribbled hand-written label or letter. The only requirement is that it be static - that is, a snapshot in time. I mention this because database records can be represented as TNUs, but only as a static snapshot of the record. If the essence of the database record changes over time (e.g., due to changing taxonomic opinion), then a new TNU is generated for a different snapshot in time.

A very small subset of the universe of TNUs represent Code-governed Nomenclatural Acts (original descriptions of new names and other code-governed nomenclatural actions). In the case of such TNUs involving the ICZN Code (for example), the TNUs are registered in ZooBank. But the point is, one subset of all TNUs are those that involve actions governed by a Code of nomenclature.

The reason I mention this is that, if I read Nico's email correctly, I think he's saying that not all TNUs de-facto represent taxon concepts. Rather, analogous to the nomenclatural subset of TNUs, there is a subset of TNUs that rise to the level of representing Taxon Concept definitions. In the case of nomenclatural acts, someone must make some sort of declaration (assertion) that a specific TNU constitutes a Code-governed nomenclatural act, along with relevant metadata relating to that assertion and the nature of the Act.
In the case of zoological names, ZooBank is intended to facilitate this role (i.e., when a person registers a TNU in ZooBank, there is an implied assertion that the TNU represents a nomenclatural act under the ICZN Code).

What would be nice to have (and what TDWG could play a helpful role in facilitating), is a registry of sorts (analogous to ZooBank) for those TNUs that represent taxon concepts. In other words, a mechanism for people to "register" the subset of all TNUs that represent taxon concepts. Secondarily, there would also be a mechanism to make assertions about how registered taxon concepts map to each other (via some sort of set theory relationship[s]).

In summary, my points are
1) We should be clear when we say "name sec. author" whether we mean it sensu lato (i.e., all TNUs); or sensu stricto (i.e., only those TNUs that rise to the level of representing taxon concepts).
2) There ought to be a registry (perhaps administered by CoL?) for identifying the subset of TNUs that represent concept definitions, and it should include a mechanism for making set-theory relationship assertions among registered concept-TNUs.
3) The two things mentioned in #2 should be separate; that is, one can assert that a particular TNU represents a taxon concept separately from (potentially multiple) assertions about how that taxon concept relates to other taxon concepts.

More later.

Aloha,
Rich

P.S. By my standards that WAS quick!

From: Nico Franz [mailto:nico.franz@asu.edu]
Sent: Monday, November 26, 2012 1:40 PM
To: tdwg-rdf(a)googlegroups.com
Cc: Roderic Page; Tony.Rees(a)csiro.au; pmurray(a)anbg.gov.au; Simon.Pigot(a)csiro.au; J.Kennedy(a)napier.ac.uk; eotuama(a)gbif.org; tdwg-tag(a)lists.tdwg.org; deepreef(a)bishopmuseum.org; David Patterson
Subject: Re: [tdwg-rdf: 103] Re: [tdwg-tag] Any TCS users with experiences to report?

Thank you, Steve. This I think is quite helpful in summarizing where we are and aren't on this issue.
I'd like to respond / reiterate mainly in relation to Rich's comments: http://lists.tdwg.org/pipermail/tdwg-tag/2012-November/002526.html

Regarding the notion of what a concept "is" - well, scientific definitions quite often can be deliberately vague and still work well enough (or work precisely because of that vagueness). I think there is acceptable middle ground here. Perhaps it is best to focus on the expected actions of concepts.
(1) Concepts are intended to provide taxonomic resolution (i.e. help us understand whether two authors talk about the same or different perceived entities in nature [natural entities for the realist, human perceptions for the constructivist - no great matter where we stand on that one]) BEYOND what can be achieved via name strings and synonymy relationships.
(2) Concepts are LABELED with a name sec. author combination (or something similar to that).
(3) (1) is relevantly fulfilled IF an expert feels confident enough in asserting that concept 1 and concept 2 have a single, or a combination of multiple, set theory relationship(s), or some sort of percent matching relationship, that goes beyond "somehow overlapping / not" (which type-anchored nomenclatural relationships could typically convey as well).

This means that, to a degree (and reflecting real biological practice, I'd think), whether some name sec. author instance attains the level of concept - as in "concept that does more taxonomic resolution than just names" - or not is contingent on there being sufficient context for *some* expert to make such an assertion ("congruent or more inclusive than, but not any of the other three RCC-5 relationships"). As you know, learning about context is something that experts are very good at. It doesn't always jump out from the written pages. I cannot quite see how this is a profound issue for modeling in a database environment. You handle the name sec. author items.
The presence or absence of concept relationships will be (by and large, over time) reflective of which subsets of the name sec. author items allow deeper semantics in the sense of (1) and (3). Providers can have the option of IDENTIFYING specimens/observations TO a (set of) concept(s) if suitable concepts are already available in the environment and thus they see no need to add new ones. Most of this may not be a technical challenge but more a challenge of relearning speaker roles and making context explicit as one's own or someone else's. We tend to have this information in mind when we write systematic and non-systematic works. I think there are lots of valuable datasets out there presenting single, snapshot concept hierarchies of incredible value. What we need, if for nothing else then for the sake of argument, are tools to import these - two at a time initially - into a concept-compatible environment that would allow visualization and mapping to take place and be checked for consistency. I say "for the sake of argument" because that model by itself has no reward scheme yet for an expert to get involved (though some might do it on their own initiative). I do believe that TDWG is a proper body to take on and promote this task, but TDWG has perhaps more readily focused on leveraging existing information as opposed to being very proactive in facilitating new taxonomic practices. IMO we cannot just focus on the data that are there; we can also build tools to generate better primary data from now on. Best, Nico On Wed, Nov 21, 2012 at 8:24 AM, Steve Baskauf <steve.baskauf(a)vanderbilt.edu> wrote: In an effort to prevent the loss of information presented in the recent thread on TCS as it relates to RDF, I have created a summary at http://code.google.com/p/tdwg-rdf/wiki/TCSthread with links to the archived messages. 
I left out posts focused specifically on XML schemas, which I consider to be outside the scope of the TDWG RDF group (that is somebody else's problem). I will continue to add to this page as additional relevant posts occur and can also post more "further information" links if anybody wants to send them to me. Steve -- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A. delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235 office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
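Rich's proposal keeps two acts apart: registering a TNU as a taxon concept, and making (possibly many, possibly conflicting) assertions about how registered concepts relate, which Nico frames in terms of the five RCC-5 relations. A minimal Python sketch of that separation follows. The registry, its method names and the concept labels are hypothetical, not the interface of ZooBank, GNUB or CoL. Deriving a relation from member sets also assumes each concept's extension can be listed (e.g. as specimen identifiers), which in practice is exactly what an expert supplies by judgement rather than by enumeration.

```python
from dataclasses import dataclass, field
from enum import Enum


class RCC5(Enum):
    """The five RCC-5 relations between two taxon concepts A and B."""
    CONGRUENT = "congruent"          # A and B have the same members
    INCLUDES = "includes"            # A properly includes B
    INCLUDED_IN = "is included in"   # A is properly included in B
    OVERLAPS = "overlaps"            # shared members, neither contains the other
    DISJOINT = "disjoint"            # no shared members


def rcc5_from_extensions(a: frozenset, b: frozenset) -> RCC5:
    """Derive the RCC-5 relation, assuming both extensions are fully known."""
    if a == b:
        return RCC5.CONGRUENT
    if a > b:
        return RCC5.INCLUDES
    if a < b:
        return RCC5.INCLUDED_IN
    if a & b:
        return RCC5.OVERLAPS
    return RCC5.DISJOINT


@dataclass(frozen=True)
class Assertion:
    subject: str       # registered concept-TNU label
    relation: RCC5
    obj: str
    asserted_by: str   # different parties may assert different relations


@dataclass
class ConceptRegistry:
    """Hypothetical registry: registration and relationships are kept apart."""
    concepts: set = field(default_factory=set)
    assertions: list = field(default_factory=list)

    def register(self, tnu: str) -> None:
        """Act 1: declare that this TNU represents a taxon concept."""
        self.concepts.add(tnu)

    def assert_relation(self, a: str, rel: RCC5, b: str, by: str) -> Assertion:
        """Act 2: relate two already-registered concepts."""
        for tnu in (a, b):
            if tnu not in self.concepts:
                raise KeyError(f"{tnu!r} is not a registered concept-TNU")
        assertion = Assertion(a, rel, b, by)
        self.assertions.append(assertion)
        return assertion

    def relations_of(self, tnu: str) -> list:
        """All assertions (by anyone) that mention the given concept."""
        return [x for x in self.assertions if tnu in (x.subject, x.obj)]


if __name__ == "__main__":
    reg = ConceptRegistry()
    reg.register("Taxon X sec. Author A")   # hypothetical labels
    reg.register("Taxon X sec. Author B")
    ext_a = frozenset({"spec1", "spec2", "spec3"})  # hypothetical specimens
    ext_b = frozenset({"spec2", "spec3"})
    rel = rcc5_from_extensions(ext_a, ext_b)
    reg.assert_relation("Taxon X sec. Author A", rel,
                        "Taxon X sec. Author B", by="expert 1")
    print(rel.value)  # includes
```

Keeping `asserted_by` on every assertion lets two experts disagree about the same pair of concepts without either overwriting the other, which is the point of Rich's item 3.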


Re: [tdwg-tag] Any TCS users with experiences to report?
by Roderic Page 21 Nov '12

21 Nov '12

Playing devil's advocate, I think there are several issues here: 1. The example you gave of an OGC query illustrates what for me is a major limitation of existing approaches (such as DiGiR and TAPIR): they focus on standardising queries, not identifiers. Hence we can query databases in a consistent (if cumbersome) way, but have no easy way to refer to the things (taxa, specimens, etc.) we retrieve. Having stable, reusable, resolvable identifiers would be a step forward. 2. Taxonomic concepts aren't much use unless connected to data. Arguably the most widely used taxonomic database in biodiversity is the NCBI taxonomy database, which has stable identifiers, an API, and taxa that are connected to data (sequences and publications). The GBIF backbone classification is also connected to data (specimens and observations), although its taxon identifiers (like its occurrence ids) aren't terribly stable. 3. I think the standards-first approach tends to put the cart before the horse. I'm not sure it's the lack of standards that is the problem; it's the lack of usable information in taxonomic databases. Apart from NCBI and GBIF, what science can I do with taxonomic databases? What questions do they allow me to ask? Regards Rod Sent from my iPhone On 3 Nov 2012, at 03:41, <Tony.Rees(a)csiro.au> wrote: > Hi Jessie, also others who have responded thus far, > > You said: > >> I think it would be great if the major databases that describe taxa (not >> just list names) described their data as concepts and allowed people to >> link to their databases when identifying specimens and when sequencing >> etc, this would be the start of a really useful biodiversity network. > > Agreed! And also the databases that "just list names" are dealing with concepts as we know, comprising a valid name plus all listed synonyms in these cases... 
> > My feeling is that the reason there is not yet any standardization in this area (every data resource does its own thing, mostly using its own home-grown schema, presuming a web service is even offered) is that the "standards group" (TDWG) has not pushed a model of any sort of standard client which expects to be able to access distributed taxonomic information in a standard way, so there is no incentive for the sources to provide this. Sort of like a fax machine with no-one on the other end wishing to communicate with it. By contrast (for example) the OGC has defined standards for geospatial web services which, once adhered to, allow one's own data to be accessed by standards-compliant remote client apps in a standard way, so if I publish a layer (map) from my geoserver here (http://www.cmar.csiro.au/geoserver/ ) as layer name = bioreg:CAAB37020002 then any remote client can access it via standard syntax which will retrieve it in a specified format, for example > > http://www.cmar.csiro.au/geoserver/wms?service=WMS&version=1.1.0&request=Ge… > > So maybe for TCS, DwC and so on, a missing part of the task is to define the syntax for such calls (plus relevant expected responses) for taxonomic data and then create some example publishing and retrieving (client) software to exercise this - provided there is an interest in doing so of course! > > More soon, >
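For readers unfamiliar with the OGC pattern Tony describes, the sketch below assembles a WMS 1.1.0 GetMap request from the standard parameter set. The endpoint and layer name are taken from his message; because the example URL in the message is truncated, the bounding box, image size, SRS and output format here are assumptions chosen for illustration, and `getmap_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint from Tony's message; every other value below is an assumption.
WMS_ENDPOINT = "http://www.cmar.csiro.au/geoserver/wms"


def getmap_url(layer: str, bbox: tuple, width: int = 512, height: int = 512,
               srs: str = "EPSG:4326", fmt: str = "image/png") -> str:
    """Build a WMS 1.1.0 GetMap URL (hypothetical helper, standard parameters)."""
    minx, miny, maxx, maxy = bbox
    params = {
        "service": "WMS",
        "version": "1.1.0",
        "request": "GetMap",
        "layers": layer,
        "styles": "",          # empty value selects the server's default style
        "bbox": f"{minx},{miny},{maxx},{maxy}",
        "width": width,
        "height": height,
        "srs": srs,            # WMS 1.1.x name; WMS 1.3.0 uses CRS instead
        "format": fmt,
    }
    return f"{WMS_ENDPOINT}?{urlencode(params)}"


# Layer name from the message; the bounding box (lon/lat degrees) is illustrative.
url = getmap_url("bioreg:CAAB37020002", (110.0, -45.0, 160.0, -10.0))
print(url)
```

Nothing in the request is specific to this server, which is Tony's point: any compliant client can build it for any compliant server. No equivalent standard call for taxonomic data is implied here.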


Proposed charter for a TAG Vocabulary Management Task Group (VOMAG)
by Dag Endresen (GBIF) 03 Nov '12

03 Nov '12

Dear TAG, After battling with the plans for a biodiversity knowledge organization system (KOS) framework for biodiversity information resources, we have identified the requirement to develop guidelines and best practices for the management of vocabularies of terms. Basic terms organized in vocabularies provide an essential element to underpin the biodiversity information standards. As introduced at the TDWG 2011 TAG meetings in New Orleans, we propose the formation of a new Vocabulary Management Task Group (VOMAG) [1] to be organized under the TDWG technical architecture group (TAG). Please find the draft charter on the GBIF Community Site [2][3]. Here you will also find another draft document, "Biodiversity Knowledge Organization System: Proposed Architecture; Version 0.1 February 2012", that provides an overview of the proposed KOS landscape and the context for the proposed work-plan of the Vocabulary Management Task Group Charter. This is a first draft, so far discussed only with Greg Whitbread as convener of the TDWG TAG and with Steve Baskauf and Joel Sachs as conveners of the TDWG RDF/OWL task group. We invite feedback and comments on the proposed formation of the task group, including suggestions regarding the work-plan. Please join the Vocabulary Management group at the GBIF Community Site [1]. You can start or participate in discussions or share suggestions using the GBIF Community Site. Feel free also to contact us to volunteer as a core member for this proposed task group! [1] http://community.gbif.org/pg/groups/21382/vocabulary-management/ [2] http://community.gbif.org/pg/file/read/21388/ [3] http://community.gbif.org/pg/blog/read/21387/ [4] http://community.gbif.org/pg/file/read/21582/ Best regards Dag, Eamonn and David -- Dag Endresen, PhD Knowledge Systems Engineer Global Biodiversity Information Facility (GBIF) Universitetsparken 15, DK-2100 Copenhagen, Denmark http://community.gbif.org/pg/profile/dag.endresen


Re: [tdwg-tag] tdwg-tag Digest, Vol 67, Issue 4
by Éamonn Ó Tuama [GBIF] 02 Nov '12

02 Nov '12

As it happens, the TDWG exec has just given the go-ahead (at the Beijing meeting) to form a vocabulary management task group under the TAG to look into how TDWG can better manage the development, maintenance and governance of vocabularies. A general call for participation will be issued in the next few weeks. One obvious task is to see where the TDWG Ontology now stands in relation to DwC and vocab best practices (e.g., why does the TDWG Ontology re-invent rather than re-use an existing vocab such as FOAF?) and whether we should extract the best parts (e.g., Taxon Name, Taxon Concept, Collection) for clean-up, promotion, etc. In the absence of a functioning TDWG web site, we have put up a draft of the VoMaG charter on the GBIF community site http://community.gbif.org/pg/groups/21382/vocabulary-management/ . The doc itself is here: http://community.gbif.org/pg/file/read/21388/vocabulary-management-task-group-charter-a-task-group-of-the-tag-interest-group-draft-version-february-2012 . Please get involved and help us to refine the charter and define tasks for the group. Éamonn -----Original Message----- From: tdwg-tag-bounces(a)lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of tdwg-tag-request(a)lists.tdwg.org Sent: 02 November 2012 02:13 To: tdwg-tag(a)lists.tdwg.org Subject: tdwg-tag Digest, Vol 67, Issue 4 Send tdwg-tag mailing list submissions to tdwg-tag(a)lists.tdwg.org To subscribe or unsubscribe via the World Wide Web, visit http://lists.tdwg.org/mailman/listinfo/tdwg-tag or, via email, send a message with subject or body 'help' to tdwg-tag-request(a)lists.tdwg.org You can reach the person managing the list at tdwg-tag-owner(a)lists.tdwg.org When replying, please edit your Subject line so it is more specific than "Re: Contents of tdwg-tag digest..." Today's Topics: 1. Re: Any TCS users with experiences to report? 
(Richard Pyle) ---------------------------------------------------------------------- Message: 1 Date: Thu, 1 Nov 2012 15:13:20 -1000 From: Richard Pyle <deepreef(a)bishopmuseum.org> Subject: Re: [tdwg-tag] Any TCS users with experiences to report? To: "'Roderic Page'" <r.page(a)bio.gla.ac.uk>, <Tony.Rees(a)csiro.au> Cc: pmurray(a)anbg.gov.au, tdwg-tag(a)lists.tdwg.org, Simon.Pigot(a)csiro.au Message-ID: <006c01cdb897$3f978700$bec69500$(a)bishopmuseum.org> Content-Type: text/plain; charset="utf-8" As the Convenor of the TDWG Taxon Names and Concept group, I have failed in one of my core duties to address this issue. My inability to attend TDWG this year has only exacerbated this problem. Having said that… I have had many discussions with many folks over the past couple of years on this issue, and for various reasons the time is now ripe to re-visit this age-old problem and make some decisions about how to move forward. For the ZooBank LSID resolver, we used Roger's vocabularies; and to some extent, the DwC terms harmonize (but not completely). A few years ago I made a push to either revitalize TCS (e.g., through TCS 2.0), or to allow it to retire (if it hasn't already done so de facto). Having just emerged from nearly two very thick years of development on ZooBank, GNA/GNUB, etc., I am now more energized (and liberated, in terms of available time) to re-focus on how to move forward. My hope is that we can make some core decisions about how to move forward well before next year's TDWG meeting. I would very much welcome feedback from people on: 1) Who is actively using TCS? Does it work? Can it be improved? Should it be retired? 2) Who is using Roger's vocabulary? Does it work? Can it be improved? 3) How much of DwC:Taxon is in active use? Just the 'traditional' terms, or some of the new ones introduced with the ratified DwC? Does it work? Can it be improved? 4) What other standards are being used in this space? 
Now that we have launched the new ZooBank, we will turn our attention to GNUB services that will start to put that content to work. It is therefore very much in our interest to support the sorts of data exchange mechanisms that people most need and, ideally, collapse the various 'flavors' into something we can all rally around. Aloha, Rich Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef(a)bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html Note: This disclaimer formally apologizes for the disclaimer below, over which I have no control. From: tdwg-tag-bounces(a)lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roderic Page Sent: Thursday, November 01, 2012 1:56 PM To: <Tony.Rees(a)csiro.au> Cc: pmurray(a)anbg.gov.au; <tdwg-tag(a)lists.tdwg.org>; Simon.Pigot(a)csiro.au Subject: Re: [tdwg-tag] Any TCS users with experiences to report? A TDWG standard not actually being used, surely not ;) Leaving aside the wisdom of XML schema (yuck) and developing standards independently of actual products, it does puzzle me that the work Roger Hyam did on the LSID vocabularies is consistently overlooked. There is an RDF version of TCS (http://rs.tdwg.org/ontology/voc/TaxonConcept). This was used by CoL in their LSIDs, but because they usually broke I suspect nobody used them. We seem to be in a muddled state at present where there are competing vocabularies in use for taxonomic names and concepts, and these two notions are often not cleanly separated. Whereas nomenclators such as IPNI and ZooBank use the LSID taxon name vocabulary, other databases use vocabularies such as Darwin Core, which rather conflate names and concepts. It's not clear to me how this situation arose, but it somewhat defeats the point of having standards. 
Regards, Rod Sent from my iPhone On 1 Nov 2012, at 22:41, <Tony.Rees(a)csiro.au> wrote: Hi TDWG persons, I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS, but it appears few if any major systems have ever gone that route - I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more… The nearest would perhaps be AFD/APNI (hence copying Paul on this email); however, their "ibis" schema, though apparently based originally on TCS, http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any explicit reference to the TCS schema so far as I can see. (Note also the cited schema definition http://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something.) I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information, so I *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better. It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts? 
Regards - Tony Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees(a)csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566 LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36 From: tdwg-tag-bounces(a)lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray Sent: Wednesday, 7 March 2012 12:52 PM To: Steve Baskauf Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED] On 07/03/2012, at 3:11 AM, Steve Baskauf wrote: Dag and Éamonn, In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType. We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and declare them to be of the correct type. 
For instance,

    http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm

is declared to be a subclass of

    http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm

and we have a few specific items of that type:

    http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation
    http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name
    http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym
    http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-literature-name

These individuals are therefore correctly typed to be legitimately used as a TDWG relationshipCategory. Your list of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create their own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason is that this is, in fact, the way the world actually is at the moment, and there's not a lot of help for it. ------------------------------------------------------------- There are two or three approaches to using a standard vocabulary when your own data does not quite match it. You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go. You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.
In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that. You can use the "define your own term" mechanism and assert both:

    _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
    _:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can have a completely separate predicate:

    _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
    _:_ my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .

You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the tdwg triple is also asserted. Another alternative is to create an OWL rule that says "if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of", but this creates a performance hit. ------------------------------------------------------------- That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc.) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to have a design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from getting carried away subdividing types into a hierarchy that they think is a good idea but which doesn't match the data people already have. I'd also suggest that every enumeration (i.e., list of individuals) include two special values: NOT_SPECIFIED. This value is not present in the underlying source data: it isn't in the database, or the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name, assuming people at this level know what it means. OTHER.
This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional. These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations. ________________________________ This message is only intended for the addressee named above. Its contents may be privileged or otherwise protected. Any unauthorized use, disclosure or copying of this message or its contents is prohibited. If you have received this message by mistake, please notify us immediately by reply mail or by collect telephone call. Any personal opinions expressed in this message do not necessarily represent the views of the Bishop Museum.
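Paul's advice (supplement the nearest encompassing standard term with a site-specific one, and give every enumeration NOT_SPECIFIED and OTHER values) can be sketched in Python. Everything here is hypothetical: the enumeration is deliberately tiny, and the prefixed names follow the informal `tdwg:`/`my-voc:` notation of his examples rather than real vocabulary IRIs. The sketch uses his separate-predicate option, so the standard predicate always carries exactly one value even when that value is OTHER. That avoids the two-values-on-a-functional-predicate problem he raises, but it is this sketch's design choice, not his recommendation.

```python
from enum import Enum
from typing import Optional

# Informal prefixed names as in Paul's examples, not resolvable IRIs.
STD_PRED = "tdwg:has_relationship_type"
LOCAL_PRED = "my-voc:has_relationship_type"


class RelationshipType(Enum):
    """Hypothetical standard enumeration, kept deliberately small."""
    IS_SUBTAXON_OF = "tdwg:is-subtaxon-of"
    NOT_SPECIFIED = "tdwg:NOT_SPECIFIED"  # hypothetical: no value in the source data
    OTHER = "tdwg:OTHER"                  # hypothetical: a real value the list lacks


# Paul's suggested (arbitrary) design guideline: at most 10 values per enumeration.
assert len(RelationshipType) <= 10

# Site-specific term -> standard term whose definition encompasses it.
LOCAL_TO_STANDARD = {
    "my-voc:is-recently-declared-subtaxon-of": RelationshipType.IS_SUBTAXON_OF,
}


def relationship_triples(subject: str, local_term: Optional[str]) -> list:
    """Return (subject, predicate, object) triples for one relationship record.

    The standard predicate always gets exactly one value; the site-specific
    term, if any, goes on its own separate predicate.
    """
    if local_term is None:
        return [(subject, STD_PRED, RelationshipType.NOT_SPECIFIED.value)]
    standard = LOCAL_TO_STANDARD.get(local_term, RelationshipType.OTHER)
    return [
        (subject, STD_PRED, standard.value),
        (subject, LOCAL_PRED, local_term),
    ]


for term in ("my-voc:is-recently-declared-subtaxon-of", "my-voc:unmapped-term", None):
    print(relationship_triples("_:r1", term))
```

Because the standard predicate is single-valued in every case, a consumer that only understands the TDWG list can still pull out every NOT_SPECIFIED or OTHER record, while the site's own term stays available on its own predicate.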


Re: [tdwg-tag] Any TCS users with experiences to report? [SEC=UNCLASSIFIED]
by Paul Murray 02 Nov '12

02 Nov '12

On 02/11/2012, at 12:57 PM, <Tony.Rees(a)csiro.au> <Tony.Rees(a)csiro.au> wrote: > Thanks, Paul, for the detailed response. > > So in the document I am writing, I will be able to say that the present Australian NSLs as a resource do *not* use TCS, although arguably it might not be too difficult to transform specific elements to TCS if the full native richness of the information is not required for our use case? At this stage, yes. We are influenced by TCS, and we have copied and extended some of the TCS enumerations, but they aren't TCS, and there has been enough water under the bridge that I cannot say with confidence that the values that do match TCS values mean the same thing in all cases. > > For example in our planned use case (extension of ISO metadata) we could still define elements using TCS schema if we wished, and suck/re-render just those from relevant NSL services? Perhaps I spoke too soon. The problem is that if you pull out TCS elements and use them, then the result will not be valid XML. This:

    <?xml version="1.0" encoding="UTF-8"?>
    <somedata xmlns="http://myschema" xmlns:tcs="http://www.tdwg.org/schemas/tcs/1.01">
      <myname>
        <tcs:ScientificName>
          <tcs:CanonicalName>
            <tcs:Simple>Canis</tcs:Simple>
          </tcs:CanonicalName>
        </tcs:ScientificName>
      </myname>
    </somedata>

is not valid XML: tcs:ScientificName is not an element definition in TCS. However, TCS does expose some element *types*, notably CanonicalName and ScientificName (the type, not the element). (It would have been nice for them to prepend 'type' to their element types, but pressing on.) So you could define your own schema with your own type, and have it extend the TCS CanonicalName type. Or you could simply define an XML element as having type CanonicalName. Either way, this element could not be inserted into a TCS DataSet, but it would still be valid to use it. Give me a moment, and I'll see if I can make it happen … Ok! Success!
If you define your own schema like so (I'm doing it all in the one directory to make things easier):

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.org/MySchema"
        xmlns:tcs="http://www.tdwg.org/schemas/tcs/1.01"
        elementFormDefault="qualified">
      <xs:import namespace="http://www.tdwg.org/schemas/tcs/1.01" schemaLocation="TCSv101.xsd"/>
      <xs:element name="MyData">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Name" type="tcs:CanonicalName" maxOccurs="unbounded" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

then you can use the TCS elements like so:

    <?xml version="1.0" encoding="UTF-8"?>
    <mydata:MyData xmlns:tcs="http://www.tdwg.org/schemas/tcs/1.01"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:mydata="http://www.example.org/MySchema">
      <mydata:Name>
        <tcs:Simple>CANIDAE</tcs:Simple>
      </mydata:Name>
      <mydata:Name>
        <tcs:Simple>CANIDAE</tcs:Simple>
        <tcs:Uninomial>CANIDAE</tcs:Uninomial>
      </mydata:Name>
      <mydata:Name>
        <tcs:Simple>Canis</tcs:Simple>
        <tcs:Uninomial>Canis</tcs:Uninomial>
      </mydata:Name>
      <mydata:Name>
        <tcs:Simple>Canis familiaris</tcs:Simple>
        <tcs:Genus>Canis</tcs:Genus>
        <tcs:SpecificEpithet>familiaris</tcs:SpecificEpithet>
      </mydata:Name>
    </mydata:MyData>

and it validates when I use the command

    xmllint --nonet --schema MySchema.xsd MyXml.xml

Most importantly, this:

    <?xml version="1.0" encoding="UTF-8"?>
    <mydata:MyData xmlns:tcs="http://www.tdwg.org/schemas/tcs/1.01"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:mydata="http://www.example.org/MySchema">
      <mydata:Name>
        <tcs:Simple>Canis</tcs:Simple>
        <tcs:Genus>Canis</tcs:Genus>
      </mydata:Name>
    </mydata:MyData>

does *not* validate, as it shouldn't. In TCS, you do not use "Genus" except as part of a name that is more than a uninomial.
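On the publishing side Tony mentions, a document shaped like Paul's validating example can be produced with Python's standard `xml.etree.ElementTree`. The namespaces are those in Paul's example (`http://www.example.org/MySchema` is his throwaway namespace); `build_mydata` is a hypothetical helper. ElementTree does not validate, so the caller must supply the TCS children in schema order, and rules such as "Genus only as part of a name that is more than a uninomial" are still enforced only by running xmllint against the schema.

```python
import xml.etree.ElementTree as ET

# Namespaces from Paul's example schema and instance documents.
TCS_NS = "http://www.tdwg.org/schemas/tcs/1.01"
MY_NS = "http://www.example.org/MySchema"
ET.register_namespace("tcs", TCS_NS)
ET.register_namespace("mydata", MY_NS)


def build_mydata(names: list) -> ET.Element:
    """Wrap each name's TCS CanonicalName parts in a mydata:Name element.

    `names` is a list of lists of (tcs_child, text) pairs, given in the
    order the TCS CanonicalName type expects; nothing here checks that order.
    """
    root = ET.Element(f"{{{MY_NS}}}MyData")
    for parts in names:
        name_el = ET.SubElement(root, f"{{{MY_NS}}}Name")
        for child, text in parts:
            ET.SubElement(name_el, f"{{{TCS_NS}}}{child}").text = text
    return root


doc = build_mydata([
    [("Simple", "Canis"), ("Uninomial", "Canis")],
    [("Simple", "Canis familiaris"), ("Genus", "Canis"),
     ("SpecificEpithet", "familiaris")],
])
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Writing `xml_text` to a file and running `xmllint --nonet --schema MySchema.xsd` on it, as Paul does, is what actually checks it; a Name holding only Simple plus Genus would be built just as happily here and rejected only at validation.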
TCS exposes several types that can be used in this way, and quite possibly I should have looked at doing this in the IBIS XML schemas, particularly as you can extend these types in your own schemas. 20/20 hindsight and all that. > I am thinking at this stage we might only use scientific name, LSID, authorship, rank, common names as available; and possibly a navigable parent tree to generate a taxonomic hierarchy, either as separate elements, or a concatenated string. Synonyms are also a possible area of interest but possibly more trouble than they will be worth in this use case. TCS types of interest might be: ScientificName, NameCitation, AgentNames (for authors), RelationshipType, AccordingToType. > > I guess there are attractions to using a globally defined rather than locally defined schema where possible (although maybe not if it’s one no other clients support…) Very much so. But to correctly leverage TCS in a way that can be validated, you will need your own schema. But this can be as simple as exposing each of the types as your own element: <xs:element name="CanonicalName" type="tcs:CanonicalName" maxOccurs="unbounded" minOccurs="0"/> and so on. It might be worth revisiting this for the biodiversity.org.au data at some stage when we are ready to do a fairly major revision of the structure of the output. > > Cheers - Tony > > From: Paul Murray [mailto:pmurray@anbg.gov.au] > Sent: Friday, 2 November 2012 12:31 PM > To: Rees, Tony (CMAR, Hobart) > Cc: TDWG TAG; Pigot, Simon (CMAR, Hobart); Whitbread, Greg (ANBG) - Contact > Subject: Re: [tdwg-tag] Any TCS users with experiences to report? [SEC=UNCLASSIFIED] > > Firstly, the XML schema: > > http://biodiversity.org.au/xml/ibis is an XML namespace, which works a bit differently to RDF namespaces. > > RDF does not have an explicit mechanism for finding schema metadata. 
By convention (and it is just a convention), we usually find the schema for a namespace by assuming that the namespace URI will work as a URL that can be fetched, and that fetching it will pull back a schema description (possibly in any one of several formats, using HTTP content negotiation). > > In XML, however, namespaces are explicitly linked to schema documents by the xsi:schemaLocation attribute. > > The XML generated by biodiversity.org.au > > http://biodiversity.org.au/apni.taxon/54321.xml > > comes back with the declaration > > <app:documents xmlns:app="http://biodiversity.org.au/xml/servicelayer/content" xmlns:ibis="http://biodiversity.org.au/xml/ibis" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cfg="http://biodiversity.org.au/xml/servicelayer/configuration" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/dcterms.xsd http://purl.org/dc/dcmitype/ http://dublincore.org/schemas/xmls/qdc/dcmitype… http://biodiversity.org.au/xml/servicelayer/content http://anbg.gov.au/ala/schemas/xml/app.xsd http://biodiversity.org.au/xml/ibis http://biodiversity.org.au/xml/ibis-20120706.xsd http://biodiversity.org.au/xml/apni http://biodiversity.org.au/xml/apni-2012… http://biodiversity.org.au/xml/afd http://biodiversity.org.au/xml/afd-20120706.xsd http://biodiversity.org.au/xml/col http://biodiversity.org.au/xml/col-20110615.xsd "> > > Although it's a bit buried in there, XML parsers can see from this that the XML namespace "http://biodiversity.org.au/xml/ibis" has a location of "http://biodiversity.org.au/xml/ibis-20120706.xsd". All of our XML is supposed to validate, and last time I checked it did. > > By the way - note the date on the filename. We have changed the schema from time to time. 
Another change is upcoming: the addition of an "excluded" flag for concepts that have been considered for APC and have been explicitly excluded (for a variety of reasons). This will be managed by making a new schema document available on our server and changing the generated xsi:schemaLocation attribute. > > Secondly, TCS: > > The issue with TCS is that it is very difficult to extend. To use a bit of TCS in some other schema, you would import the element types and extend them. But TCS mostly does not expose its element types as named types that can be referenced externally - it's all done inline. This means that the only place a TCS "ScientificName" or "Rank" element can appear is somewhere inside a TCS DataSet element. This is not in itself a show-stopper: we could simply generate a DataSet wrapper when we produce output in response to fetches. > > But there were other issues, such as the following (and I can only recall one or two at the moment - this mail is not a full defence of our decision not to go with TCS): > > A TaxonConcept element may have multiple TaxonRelationship elements. We would like to attach additional data to each relationship to capture information that TCS cannot. There is a ProviderSpecificData element, but this is at the end of the TaxonConcept element, and I could not work out a way to stuff the extra data for each relationship into that ProviderSpecificData element in such a way that it was attached to the correct relationship - although re-looking at it now I see a "ref" attribute and perhaps that is meant to do the job. > > There are multiple TCS "relationship types", but these did not quite match the data we had. It is not possible to put anything but a TCS relationship type enum into the "type" attribute of a TaxonRelationship element, so we wind up having to provide two fields - the "real" type and the nearest TCS equivalent. 
> The "real" type needs to go in the ProviderSpecificData section - miles away (in the document) from what is supposed to be the primary place where the relationship is described. It's ugly. Furthermore, some of our relationships don't match the TCS ones well at all - to the point that using a TCS type would be misleading. The TCS enumeration does not have an "other" value, so there was a bit of an impasse.
>
> In any event, we were looking at either putting some relationships in the TCS array and some in the PSD array, or putting corresponding arrays in each. Of course, in the ProviderSpecificData section we cannot use any of the TCS elements, because the element types are not exposed and can only appear in a TCS DataSet at the correct spot. It just got to the point where the ProviderSpecificData section was bigger and more interesting than the TCS, so we broke it out into a separate XML document (which was bundled with the TCS using an ibis:documents wrapper), at which point we couldn't help but ask "Can someone explain again why we are trying to do this?". After more discussions with both the zoologists and the botanists, attempting to work out which TCS enumerated values I should use for what, we gave up.
>
> TCS does an admirable job of being watertight. If you have any valid XML document with any TCS element, then you know that it will be enclosed in a DataSet element and come bundled with enough context to make sense of it. It's a model for shifting around entire, self-contained *sets* of data: entire taxonomies, sitting as big files on a disk (or in an XML store). But our service layer serves up fragments - one or several taxa in response to a request - and TCS turned out not to be a good model for what we do. The history of trying to use it has left us with a legacy of having multiple relationship-type fields (relationship "description" and relationship "category") whose product does not form a sensible set of values.
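The workaround Paul describes, carrying both the site-specific "real" relationship type and the nearest TCS equivalent, can be sketched as a record with two fields. This is a minimal illustration only: the local type names, the mapping table, and the "standard" values are hypothetical placeholders, not actual AFD/APNI or TCS vocabulary. The `None` case models a closed enumeration with no "other" value.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from site-specific relationship types to the nearest
# value in a standard enumeration. Keys and values are placeholders only.
NEAREST_STANDARD_TYPE: dict[str, Optional[str]] = {
    "local-synonym-of": "standard-synonym-of",
    "local-recently-declared-subtaxon-of": "standard-subtaxon-of",
    # No standard value fits, and the enumeration offers no "other" value.
    "local-miscellaneous-literature": None,
}


@dataclass(frozen=True)
class Relationship:
    """A relationship that keeps both the "real" type and the nearest standard type."""

    subject_id: str
    object_id: str
    real_type: str                # site-specific type, always kept
    standard_type: Optional[str]  # None when no standard value would be honest


def make_relationship(subject_id: str, object_id: str, real_type: str) -> Relationship:
    """Build a Relationship, looking up the nearest standard type for real_type."""
    if real_type not in NEAREST_STANDARD_TYPE:
        raise KeyError(f"unknown relationship type: {real_type}")
    return Relationship(subject_id, object_id, real_type, NEAREST_STANDARD_TYPE[real_type])


rel = make_relationship("taxon-1", "taxon-2", "local-miscellaneous-literature")
print(rel.standard_type)  # prints None
```

The `None` branch is where a publisher is stuck: it must either emit a misleading standard value or leave the standard field empty.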
>
> What we have now is a site-specific schema that captures and exposes the data we have. Admittedly, this means that the grand goal we are all trying to accomplish - a consistent worldwide net of data - is not as far down the track as we had hoped. It means that the problem of working out how data set 1 matches data set 2 is pushed off onto aggregators, a job that is in general impossible for an aggregator to accomplish. If we could have fitted our data into TCS, and if everyone else could also have done so, then that would have been wonderful. We were reluctant to abandon it, but to get our data out the door we eventually did.
>
> On 02/11/2012, at 9:41 AM, <Tony.Rees(a)csiro.au> <Tony.Rees(a)csiro.au> wrote:
>
> > Hi TDWG persons,
> >
> > I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS, but it appears few if any major systems have ever gone that route – I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more… The nearest would perhaps be AFD/APNI (hence copying Paul on this email); however, their “ibis” schema (http://biodiversity.org.au/xml/ibis-20120909.xsd), though apparently based originally on TCS, does not make any explicit reference to the TCS schema so far as I can see. (Note also that the cited schema definition http://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something.)
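Paul's answer to Tony's last point is that the namespace URI "http://biodiversity.org.au/xml/ibis" is only an identifier; the xsi:schemaLocation attribute pairs it with the dated schema file. A minimal sketch of reading those pairs with Python's standard library follows. The `SAMPLE` document is a hypothetical, cut-down excerpt keeping three of the untruncated pairs from the declaration quoted earlier, and nothing is fetched from the network.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi:schemaLocation attribute (XML Schema instance).
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical, cut-down excerpt of the app:documents root element.
SAMPLE = """<app:documents
    xmlns:app="http://biodiversity.org.au/xml/servicelayer/content"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
      http://biodiversity.org.au/xml/servicelayer/content http://anbg.gov.au/ala/schemas/xml/app.xsd
      http://biodiversity.org.au/xml/ibis http://biodiversity.org.au/xml/ibis-20120706.xsd
      http://biodiversity.org.au/xml/afd http://biodiversity.org.au/xml/afd-20120706.xsd
    "/>"""


def schema_locations(xml_text: str) -> dict[str, str]:
    """Map each namespace URI to the schema document URL declared for it.

    xsi:schemaLocation is a whitespace-separated list of pairs:
    a namespace URI followed by the location of a schema for it.
    """
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced attributes in {namespace}local form.
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    if len(tokens) % 2:
        raise ValueError("schemaLocation must contain namespace/location pairs")
    # Even positions are namespace URIs; odd positions are their locations.
    return dict(zip(tokens[0::2], tokens[1::2]))


locations = schema_locations(SAMPLE)
print(locations["http://biodiversity.org.au/xml/ibis"])
# prints http://biodiversity.org.au/xml/ibis-20120706.xsd
```

A validating parser would fetch the dated location, not the bare namespace URI, which is why the namespace itself need not resolve.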
> >
> > I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information, so I *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better.
> >
> > It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts?
> >
> > Regards - Tony
> >
> > Tony Rees
> > Manager, Divisional Data Centre,
> > CSIRO Marine and Atmospheric Research,
> > GPO Box 1538,
> > Hobart, Tasmania 7001, Australia
> > Ph: 0362 325318 (Int: +61 362 325318)
> > Fax: 0362 325000 (Int: +61 362 325000)
> > e-mail: Tony.Rees(a)csiro.au
> > Manager, OBIS Australia regional node, http://www.obis.org.au/
> > Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm
> > Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
> > LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36
> >
> > From: tdwg-tag-bounces(a)lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray
> > Sent: Wednesday, 7 March 2012 12:52 PM
> > To: Steve Baskauf
> > Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG
> > Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED]
> >
> > On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:
> >
> > > Dag and Éamonn,
> > >
> > > In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...".
> > > That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType.
> >
> > We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and declare them as being of the correct type.
> >
> > For instance,
> > http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm
> >
> > is declared to be a subclass of
> > http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm
> >
> > and we have a few specific items of that type:
> > http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation
> > http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name
> > http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym
> > http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-l…
> >
> > These individuals are therefore correctly typed and can legitimately be used as a TDWG relationshipCategory.
> >
> > Your list of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create its own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason for this is that this is, in fact, the way the world is at the moment, and there's not a lot of help for it.
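Paul's claim that the AFD terms are "correctly typed" rests on RDFS subclass entailment: if an individual has type C and C is a subclass of D, then it also has type D. The sketch below shows that inference over plain Python tuples, using the class URIs and two of the individual URIs quoted above. It is a dependency-free illustration under those assumptions; a real deployment would use an RDF library or a triple store.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

AFD_TERM = "http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm"
TDWG_TERM = "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm"
AFD = "http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#"

triples = {
    (AFD_TERM, SUBCLASS_OF, TDWG_TERM),
    (AFD + "has-emendation", RDF_TYPE, AFD_TERM),
    (AFD + "has-junior-homonym", RDF_TYPE, AFD_TERM),
}


def infer_types(graph: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply subClassOf transitivity and type propagation until nothing new appears."""
    inferred = set(graph)
    while True:
        subclass = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        new = set()
        # rdfs11: subClassOf is transitive.
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))
        # rdfs9: an instance of a subclass is an instance of the superclass.
        for s, p, o in inferred:
            if p == RDF_TYPE:
                for sub, sup in subclass:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= inferred:
            return inferred
        inferred |= new


def instances_of(graph: set[tuple[str, str, str]], cls: str) -> list[str]:
    """Sorted subjects typed as cls after inference."""
    return sorted(s for s, p, o in infer_types(graph) if p == RDF_TYPE and o == cls)


for uri in instances_of(triples, TDWG_TERM):
    print(uri)
# prints the has-emendation URI, then the has-junior-homonym URI
```

A consumer that only knows the TDWG class still finds the AFD individuals, provided it performs this inference.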
> >
> > -------------------------------------------------------------
> >
> > There are two or three approaches to using a standard vocabulary when your own data does not quite match it.
> >
> > You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go.
> >
> > You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.
> >
> > In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that.
> >
> > You can use the "define your own term" mechanism and assert both
> > _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
> > _:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .
> >
> > You can have a completely separate predicate:
> > _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
> > _:_ my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .
> >
> > You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the TDWG triple is also asserted.
> >
> > Another alternative is to create an OWL rule that says
> > "if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of"
> >
> > But this creates a performance hit.
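The last two options in Paul's list can be compared with a small forward-chaining sketch. It keeps his example prefixed names (tdwg:, my-voc:) as plain strings rather than real URIs, materialises his rule once, and shows why declaring my-voc:has_relationship_type a super-property of tdwg:has_relationship_type gives nothing to a consumer that only queries the TDWG predicate: the subproperty entailment runs from the TDWG predicate to the local one, never back.

```python
# Hypothetical prefixed names from the message, kept as plain strings.
TDWG_TYPE = "tdwg:has_relationship_type"
MY_TYPE = "my-voc:has_relationship_type"
LOCAL_VALUE = "my-voc:is-recently-declared-subtaxon-of"
TDWG_VALUE = "tdwg:is-subtaxon-of"
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

Triple = tuple[str, str, str]


def apply_rule(graph: set[Triple]) -> set[Triple]:
    """Materialise the rule: a thing whose relationship type is the local
    value also gets the TDWG value."""
    extra = {(s, TDWG_TYPE, TDWG_VALUE) for s, p, o in graph
             if p == TDWG_TYPE and o == LOCAL_VALUE}
    return graph | extra


def apply_subproperties(graph: set[Triple]) -> set[Triple]:
    """One pass of RDFS rule rdfs7: (p subPropertyOf q) and (s p o) entail (s q o)."""
    pairs = {(s, o) for s, p, o in graph if p == SUB_PROPERTY_OF}
    extra = {(s, q, o) for s, p, o in graph for sub, q in pairs if p == sub}
    return graph | extra


def values(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """Sorted objects of (subject, predicate, ?) in graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# "Define your own term" data plus the rule: TDWG-only consumers now see a value.
ruled = apply_rule({("_:rel1", TDWG_TYPE, LOCAL_VALUE)})
print(values(ruled, "_:rel1", TDWG_TYPE))
# prints ['my-voc:is-recently-declared-subtaxon-of', 'tdwg:is-subtaxon-of']

# "Terribly clever" option with only the local triple asserted.
clever = {(TDWG_TYPE, SUB_PROPERTY_OF, MY_TYPE),
          ("_:rel2", MY_TYPE, LOCAL_VALUE)}
print(values(apply_subproperties(clever), "_:rel2", TDWG_TYPE))  # prints []
```

Presumably the performance cost Paul mentions arises when such a rule is evaluated by a reasoner at query time; materialising it in advance, as here, trades that cost for extra stored triples.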
> >
> > -------------------------------------------------------------
> >
> > That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc.) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to adopt the design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from getting carried away subdividing types into a hierarchy that they think is a good idea, but which doesn't match the data people already have.
> >
> > I'd also suggest that every enumeration (i.e., list of individuals) include two special values:
> >
> > NOT_SPECIFIED. The value is not present in the underlying source data: it isn't in the database, or the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name - assuming people at this level know what it means.
> > OTHER. This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional.
> >
> > These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations.
> >
> > If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments.
> > Please consider the environment before printing this email.
> >
> > _______________________________________________
> > tdwg-tag mailing list
> > tdwg-tag(a)lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-tag
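Paul's two suggestions for enumerations in the March message, a cap of about 10 values and the special NOT_SPECIFIED and OTHER members, can be sketched as a Python Enum plus a checker. The `Disposition` class and its regular values are hypothetical placeholders, not the Darwin Core vocabulary, and the cap of 10 is his self-described arbitrary figure.

```python
from enum import Enum
from typing import Optional

MAX_VALUES = 10  # the arbitrary cap suggested in the message


class Disposition(Enum):
    """Hypothetical controlled vocabulary; the regular values are placeholders."""

    IN_COLLECTION = "in collection"
    ON_LOAN = "on loan"
    MISSING = "missing"
    NOT_SPECIFIED = "not specified"  # the value is absent from the source data
    OTHER = "other"                  # a real value, but not one in this list


def check_vocabulary(enum_cls: type[Enum]) -> None:
    """Enforce the size cap and the presence of both special values."""
    names = {member.name for member in enum_cls}
    if len(enum_cls) > MAX_VALUES:
        raise ValueError(f"{enum_cls.__name__} has more than {MAX_VALUES} values")
    if not {"NOT_SPECIFIED", "OTHER"} <= names:
        raise ValueError(f"{enum_cls.__name__} lacks NOT_SPECIFIED or OTHER")


def to_term(raw: Optional[str]) -> Disposition:
    """Map a raw source value onto the vocabulary."""
    if raw is None or not raw.strip():
        return Disposition.NOT_SPECIFIED  # nothing recorded in the source
    try:
        return Disposition(raw.strip().lower())
    except ValueError:
        return Disposition.OTHER  # recorded, but not covered by the list


check_vocabulary(Disposition)
print(to_term(None).name, to_term("On loan").name, to_term("destroyed").name)
# prints NOT_SPECIFIED ON_LOAN OTHER
```

Paul's caveat still applies: a publisher that mints its own term should emit that term rather than OTHER, so that a naturally functional property keeps a single value.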
