tdwg-tag
November 2012
- 11 participants
- 8 discussions
Apologies for any cross postings.
At its recent conference in Beijing, the TDWG Executive/TAG approved the
formation of a task group on vocabulary management. It goes by the acronym
VoMaG (Vocabulary MAnagement Group).
A revised charter
(http://community.gbif.org/pg/file/read/28812/vocabulary-management-group-vomag-charter-v1),
outlining the goals and mode of working of the group, is
available on the GBIF community site
(http://community.gbif.org/pg/groups/21382/vocabulary-management/). In the
absence of a functioning TDWG site, we intend to use this site as the
meeting place for discussions, sharing docs, etc.
Please consider joining VoMaG as a core member. You should email me and also
register on the GBIF community site. You can log in using OpenID if you have
a Google or Yahoo account. Once the core group is established, we can
proceed with developing a work programme/timeline to achieve our goals.
With regards,
Éamonn Ó Tuama
____________________________________________________
Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama(a)gbif.org),
Senior Programme Officer, Global Biodiversity Information Facility
Secretariat,
Universitetsparken 15, DK-2100, Copenhagen Ø, DENMARK
Phone: +45 3532 1494; Fax: +45 3532 1480
Re: [tdwg-tag] [tdwg-rdf: 105] Re: Any TCS users with experiences to report?
by Steve Baskauf 28 Nov '12
I read Rich's email as quoted in Nico's reply - I think maybe Rich's
post didn't actually go out on the tdwg-tag or RDF group lists. Rich
mentions that he is swamped and will reply later. For the moment it may
be helpful to cite an earlier email of Rich's which it took me some time
to dig out of the tdwg-content email list:
http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001703.html
In that post, Rich was responding to a thread that started when I asked
how one would handle a real-life situation (the specimen pictured in
http://images.cyberfloralouisiana.com/images/specimensheets/lsu/0/0/4/28/LS…).
The relevant part begins about half way down the page with "In the web
example given by Steve, we have... ". In that section, Rich notes that
"Eventually, a third party may be able to deduce (perhaps through a suite of
other, external information) a RelationshipAssertion that maps the TNU
"[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" to some other, perhaps
published and well-defined taxon concept (of the same or different name).
Also, if there are 100 specimens in the collection that L. Urbatsch
identified as "Juncus diffusissimus Buckl." in 2009, then anchoring all 100
Identification instances to the one TNU, allows all of those specimens to
inherit the mapping of the one "[Juncus] diffusissimus Buckl. sec L.
Urbatsch 2009" TNU instance to some other better-defined taxon concept."
From that post, I understood that a TNU (a.k.a. "assertion" in Pyle
2004 http://systbio.org/files/phyloinformatics/1.pdf) can be as vague as
an idea that some determiner had in his/her head about how
organism/specimen instances should be mapped to a name. I think from
what Rich said there that there is the potential that we as metadata
aggregators may at some later point be able to map how that idea in the
determiner's head fits in with a more well-defined (e.g. published)
taxon description which one may choose to call a taxon concept rather
than a TNU.
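The anchoring idea in Rich's quote can be sketched in a few lines of Python. This is only an illustration: the class names, the identifiers "tnu-1" and "concept-A", and the "congruent" relation label are all assumptions, not terms taken from TCS or GNUB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TNU:
    """A taxon name usage: a name as used by someone at some time."""
    tnu_id: str   # hypothetical identifier
    label: str


@dataclass(frozen=True)
class Identification:
    """Links one specimen to the TNU the determiner had in mind."""
    specimen_id: str
    tnu_id: str


@dataclass(frozen=True)
class RelationshipAssertion:
    """A third party's claim about how a TNU maps to another concept."""
    from_tnu: str
    relation: str      # hypothetical label, e.g. "congruent"
    to_concept: str
    asserted_by: str


tnu = TNU("tnu-1", "Juncus diffusissimus Buckl. sec L. Urbatsch 2009")

# 100 specimens, all anchored to the single TNU.
identifications = [
    Identification(f"specimen-{i:03d}", tnu.tnu_id) for i in range(1, 101)
]

# One mapping, made once, possibly long after the identifications.
assertions = [
    RelationshipAssertion(tnu.tnu_id, "congruent", "concept-A", "third party")
]


def concepts_for_specimen(specimen_id, identifications, assertions):
    """Return the (relation, concept) pairs a specimen inherits via its TNU."""
    tnu_ids = {i.tnu_id for i in identifications if i.specimen_id == specimen_id}
    return [(a.relation, a.to_concept) for a in assertions if a.from_tnu in tnu_ids]
```

Because every Identification points at the TNU rather than at the target concept, adding or revising one RelationshipAssertion changes what all 100 specimens inherit without touching any specimen record.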
As so often is the case, I think the problem here boils down to
identifiers and the metadata that we associate with them. Let's say in
the real-life example above, somebody (we can say GNUB) assigns a
persistent identifier (perhaps a URI constructed from an opaque UUID) to
"Juncus diffusissimus Buckl. sec L. Urbatsch 2009". We could say with
an rdf:type statement that the resource identified by the URI is a TNU.
We can give that resource a tc:hasName property linking it to the name
which is represented by the string "Juncus diffusissimus Buckl.". (I'm
not sure what property we use to say that L. Urbatsch made the
assertion). Now let's say that L. Urbatsch publishes a paper describing
in detail her concept of Juncus diffusissimus Buckl. We can now assign
the resource identified by the URI a tc:accordingTo property whose value
is the DOI of the paper she wrote. If we want, we can replace the
previous rdf:type statement with a different one stating that the resource
is a taxon concept rather than a TNU, or if we believe that all taxon
concepts are also TNUs we can leave the rdf:type statement that we had
before and just add a second one saying that the resource is also of
type taxon concept.
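A minimal sketch of that sequence, using plain Python tuples as triples rather than any RDF library. Only tc:hasName, tc:accordingTo and rdf:type come from the discussion above; the example.org URIs, the placeholder DOI, and the ex: class and property names are assumptions made up for illustration.

```python
# Hypothetical identifiers; the UUID-based URI stands in for one that
# a service such as GNUB might assign.
TNU_URI = "http://example.org/tnu/123e4567-e89b-12d3-a456-426614174000"
NAME_URI = "http://example.org/name/juncus-diffusissimus-buckl"
PAPER_DOI = "https://doi.org/10.9999/placeholder"  # placeholder, not a real DOI

graph = set()  # a set of (subject, predicate, object) triples


def add(subject, predicate, obj):
    """Add one triple; adding metadata never changes the subject URI."""
    graph.add((subject, predicate, obj))


def objects(subject, predicate):
    """Return all objects for a subject/predicate pair, sorted."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


# Step 1: all we know is that someone used a name.
add(TNU_URI, "rdf:type", "ex:TaxonNameUsage")        # ex: class is hypothetical
add(TNU_URI, "tc:hasName", NAME_URI)
add(NAME_URI, "ex:nameString", "Juncus diffusissimus Buckl.")  # hypothetical property

# Step 2: a paper describing the concept is published later.
add(TNU_URI, "tc:accordingTo", PAPER_DOI)
add(TNU_URI, "rdf:type", "ex:TaxonConcept")          # second type, first one kept
```

After step 2, `objects(TNU_URI, "rdf:type")` holds both types, and the identifier is the same string it was in step 1.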
The point I'm trying to make is that as long as this "thing" that we are
variously calling "taxon name usage", "taxon concept", "shallow
taxonomic concept", or "deep taxonomic concept" can be assigned an
identifier, what really matters is the metadata we associate with it,
not really what we call it. The more metadata that we can connect with
it, either through datatype properties like name strings or object
properties that describe how the "thing" is related to other resources,
the "deeper" the concept. On the other extreme, we may know nothing
more than the name string. In that case we could call it a "nominal
concept", but we could still assign it an identifier and maybe with luck
we could associate more metadata with it (make it "deeper") at some
point in the future.
Returning to the original question of the thread (which was about the
utility of TCS), TCS tries to deal with this problem using a thing
called "signatures" (section 17.2, see
http://bioimages.vanderbilt.edu/pages/TCS-Schema-UserGuide-v1.3.pdf)
which are a somewhat crude attempt to make identifying strings unique by
standardizing their format. However, TCS was written in 2005-2006.
Since then, the development of DOIs, the TDWG GUID Applicability
Statement standard, and best practices in the Linked Data world have
provided well-established and standardized ways to create persistent and
dereferenceable identifiers. So there isn't any reason I can see why we
can't use them.
I am going to be bold and say that we already have the minimum tools
required to get started implementing TNUs/TaxonConcepts:
- URI GUIDs (which if one preferred could be UUIDs or LSIDs -- HTTP
proxied to make Linked Data people happy; see the TDWG GUID
Applicability Statement standard if you don't know how to do this) to
identify the TNU/concepts,
- the two terms tc:hasName and tc:accordingTo (from the TDWG Taxon
Concept ontology) to relate the TNUs/TaxonConcepts to names and sec.
references, and
- some sources for name and publication URI GUIDs.
There are deficiencies all over the place for that last item, but they
can be addressed over time by improving the scope of the relevant
databases and the quality of the metadata provided. uBio has URIs for
almost every name I've ever looked for. BHL has a growing collection of
old literature which has been assigned identifiers by Rod Page's
BioStor, new literature usually has an assigned, dereferenceable proxied
DOI, and one can even make valid URIs from ISBNs of books (although they
aren't resolvable). I'm not sure how one should address the situation
where the "sec." reference of a TNU is a person and date since there
isn't a standard database of people (as far as I know). But that could
be remedied. Ultimately, one could create the kinds of mapping tools
that Nico and Rich are talking about which relate different taxon
concepts/TNUs which have set theory relationships. Whether that would
be done with RDF, OWL, or something completely different I don't know,
but the basic anchoring of persistent identifiers for the TNU/concepts
to the names and sec. references wouldn't have to wait on that. We
could also get hung up about what terms to use to express the metadata
describing the basic TNU/name/sec. resources, but there is nothing that
says that metadata can't change or be improved over time. It's the
identifier that shouldn't change.
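The identifier pieces listed above can be sketched roughly as follows. The base URI and the LSID proxy prefix are assumptions (example.org placeholders), and the function names are invented for illustration; the doi.org resolver prefix and the urn:isbn: form are the only parts not made up here.

```python
import uuid

TNU_BASE = "http://example.org/tnu/"      # hypothetical base URI
LSID_PROXY = "http://example.org/lsid/"   # hypothetical HTTP proxy for LSIDs


def mint_tnu_uri(base=TNU_BASE):
    """Mint an opaque, UUID-based HTTP URI for a new TNU/concept."""
    return base + str(uuid.uuid4())


def proxied_lsid(lsid, proxy=LSID_PROXY):
    """Prefix an LSID with an HTTP proxy so it can be dereferenced."""
    return proxy + lsid


def doi_uri(doi):
    """Turn a bare DOI into its HTTP proxied form."""
    return "https://doi.org/" + doi


def isbn_urn(isbn):
    """Build a URN from an ISBN (a valid URI, though not resolvable)."""
    return "urn:isbn:" + isbn
```

The point of minting from a UUID is that nothing about the name string or the sec. reference is baked into the identifier, so the metadata can be corrected later without the identifier having to change.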
Am I wrong about this???
Steve
Nico Franz wrote:
> Thank you, Rich.
>
> So we seem to agree on something like this:
>
> Rich Nico
> taxon name usage <===> "shallow" taxonomic concept
> taxon concept <===> "deep" taxonomic concept
>
> Both: labeling is via name sec. author
> Both: authoring concepts/usages vs. identifying to those => slippery
> issue; ideally requires proper speaker awareness.
>
> Why the latter? - well, because (again) the desirable effect of
> using concepts - the desirable situation where these would have a
> justification that goes beyond just really meticulous data management
> and advances to the level of facilitating better science qua more
> precise taxonomic semantics - only obtains if a great number of name
> occurrences in a wide range of shallow-ish sources is linked via
> identification to a presumably smaller number of occurrences where
> those names are well defined and where successive definitions of names
> are semantically linked. So there needs to be an emerging culture of
> minimizing concept inflation. Otherwise we obtain what we have now
> (mostly just names) and on top of that add new baggage (lots of really
> shallow concepts) that nobody can do good semantics with.
>
> Here is where I think we disagree, perhaps just in terms of sales
> strategy:
>
> You seem to suggest that making an a priori distinction between
> TNUs and concepts is (1) possible in a good number of cases, (2) is
> desirable perhaps in the form of registry, and (3) even necessary for
> building and populating databases, etc.
>
> Here I disagree, for a number of reasons. First off I do believe
> that not defining certain things too soon or too narrowly is sometimes
> actually really good science and on the other hand, doing so can be a
> show stopper if other people don't share this narrowness and find it
> limiting. Second, while we can perhaps readily agree that a lengthy
> monograph published in American Museum Novitates rises to the level of
> authoring next concepts whereas a label saying "Family Carabidae" on a
> specimen submitted as part of an insect student collection does not,
> there are enough in-between cases where only time will tell.
>
> Example: USDA Plants promotes a particular perspective of
> groundcherry taxonomy, genus-level concept Physalis -
> http://plants.usda.gov/java/profile?symbol=physa - with some 29
> species-level concepts recognized. ASU's herbarium curator Les Landrum
> is a bit of a groundcherry nerd (I say this with admiration). If you
> go here: http://swbiodiversity.org/seinet/index.php, then Search
> Collections => Select All => Next => Scientific Name = Physalis =>
> Search, you get some 3700 pertinent specimen records. If you then
> switch to the Species List tab, you see 115 concepts listed overall.
> Switching to the USDA Plants Thesaurus will give you only 46 concepts
> that these 3700 specimens are mapped to. Using instead the ASU
> Taxonomic Thesaurus will yield 89 concepts linking variously to those
> specimens. This is based on Les' classification of groundcherries
> which is not further documented in the SEINet environment at this moment.
>
> Now, saying a priori whether Les' list represents a set of TNUs
> versus concepts would presumably require you to assert that there is
> nobody who is Les or very much like him that can provide a
> semantically very accurate mapping of the 89 name usages in the
> SEINet-ASU Physalis list to the much more thoroughly circumscribed
> USDA Plants concepts. That could seem like a daring prediction given
> how little Les might think of the USDA perspective. At the very moment
> that Les or someone very much like him DOES provide the mapping, what
> looked like a list of TNUs then all of a sudden acquires - via the
> mapping - a much deeper semantic status where others can readily go
> from one classification to the next, even though each come with very
> different amounts of information in their original appearances. Some
> people may prefer Les' concepts at least for Arizonan groundcherries,
> and in either case, the mapping put both on an even playing field.
>
> So this exemplifies IMO why so far the concept approach has been
> too abstract, the TCN has been too depauperate on the
> relationships/mapping side (worrying instead almost needlessly about
> what constitutes a concept per se), and definitions between
> identifications, name usages, shallow, deep concepts have been too
> abstract as well. I believe we should focus less discussion on those
> issues and more emphasis on building mapping tools that can carry a
> wide range of input and logically infer additional implied mappings
> from the initial expert-given set. The actual semantic properties of
> that input will emerge a posteriori and will be hard to predict in
> some cases. Some descriptions are lengthy but nobody understands them.
> Some names lists are profoundly informative if the context of their
> origin is well known to an expert.
>
> There will be some obvious overreaches in both directions (too many
> unconnected items, some items that are connected more precisely than
> their inherent information would seem to justify). I think these
> overreaches would be tolerable. What's less productive to me is a
> restrictive set of definitions that provide an early blockage in the
> way towards an environment where mapping is supposed to occur very
> frequently. We're not at the registry stage yet. More at the "can this
> work in principle" stage. As I mentioned before, the mappings ARE the
> concepts under a certain viewpoint. We don't want to pre-determine
> their fate by separating TNUs from concepts in a great number of cases.
>
> I hope this was not a misrepresentation of your view and also a
> clarification of my view. In the end, we both advocate some sort of
> balance for the same concerns, but perhaps disagree only strategically
> about the moment where/when that balance will materialize - upfront
> via precise definitions and registration or later on via the
> presence/lack of actual mappings.
>
> Best,
>
> Nico
>
>
> On Mon, Nov 26, 2012 at 5:18 PM, Richard Pyle
> <deepreef(a)bishopmuseum.org> wrote:
>
> I want to get into this topic in more detail (going back to
> Steve’s original post), but this week is hell-week for me, so only
> a quick comment now.
>
>
>
> I generally agree with everything Nico says, but I think we need
> to be a little more clear of what we mean by “name sec. author”
>
>
>
> The core unit of the data model we’ve been building towards (GNUB,
> which underlies ZooBank) uses as its fundamental unit something
> we’ve been calling a “Taxon Name Usage Instance” (TNU). The scope
> of what can be a TNU is intentionally very broad – anything from
> an original taxon name description, to a mention in a newspaper
> article, and potentially even a scribbled hand-written label or
> letter. The only requirement is that it be static – that is, a
> snapshot in time. I mention this because database records can be
> represented as TNUs, but only as a static snapshot of the record.
> If the essence of the database record changes over time (e.g., due
> to changing taxonomic opinion), then a new TNU is generated for a
> different snapshot in time.
>
>
>
> A very small subset of the universe of TNUs represent
> Code-governed Nomenclatural Acts (original descriptions of new
> names and other code-governed nomenclatural actions). In the case
> of such TNUs involving the ICZN Code (for example), the TNUs are
> registered in ZooBank. But the point is, one subset of all TNUs
> are those that involve actions governed by a Code of nomenclature.
>
>
>
> The reason I mention this is that, if I read Nico’s email
> correctly, I think he’s saying that not all TNUs de-facto
> represent taxon concepts. Rather, analogous to the nomenclatural
> subset of TNUs, there is a subset of TNUs that rise to the level
> of representing Taxon Concept definitions. In the case of
> nomenclatural acts, someone must make some sort of declaration
> (assertion) that a specific TNU constitutes a Code-governed
> nomenclatural act, along with relevant metadata relating to that
> assertion and the nature of the Act. In the case of zoological
> names, ZooBank is intended to facilitate this role (i.e., when a
> person registers a TNU in ZooBank, there is an implied assertion
> that the TNU represents a nomenclatural act under the ICZN Code).
>
>
>
> What would be nice to have (and what TDWG could play a helpful
> role in facilitating), is a registry of sorts (analogous to
> ZooBank) for those TNUs that represent taxon concepts. In other
> words, a mechanism for people to “register” the subset of all TNUs
> that represent taxon concepts. Secondarily, there would also be a
> mechanism to make assertions about how registered taxon concepts
> map to each other (via some sort of set theory relationship[s]).
>
>
>
> In summary, my points are
>
> 1) We should be clear when we say “name sec. author” whether
> we mean it sensu lato (i.e., all TNUs); or sensu stricto (i.e.,
> only those TNUs that rise to the level of representing taxon
> concepts).
>
> 2) There ought to be a registry (perhaps administered by
> CoL?) for identifying the subset of TNUs that represent concept
> definitions, and it should include a mechanism for making
> set-theory relationship assertions among registered concept-TNUs.
>
> 3) The two things mentioned in #2 should be separate; that
> is, one can assert that a particular TNU represents a taxon
> concept separately from (potentially multiple) assertions about
> how that taxon concept relates to other taxon concepts.
>
>
>
> More later.
>
>
>
> Aloha,
>
> Rich
>
>
>
> P.S By my standards that WAS quick!
>
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
Re: [tdwg-tag] [tdwg-rdf: 103] Re: Any TCS users with experiences to report?
by Richard Pyle 27 Nov '12
Hi Nico,
> So we seem to agree on something like this:
>
> Rich Nico
> taxon name usage <===> "shallow" taxonomic concept
> taxon concept <===> "deep" taxonomic concept
Not quite. They're all based on TNUs. I would represent it more like this:
TNU without well-defined concept <===> "shallow" taxonomic concept (aka
"potential taxon" sensu Berendsohn)
TNU with well-defined concept <===> "deep" taxonomic concept
> Both: labeling is via name sec. author
Labeling of what? Should "sec." (or whatever abbreviated term I used) mean
simply "this author used this name", or should it be restricted to mean
"this author used this name in a way that was accompanied by a well-defined
taxon concept circumscription"? In my discussions with various folk on
questions related to GNUB & ZooBank, there seems to be a preference for
"sensu" rather than "sec".
> Both: authoring concepts/usages vs. identifying to those => slippery
> issue; ideally requires proper speaker awareness.
Agreed -- this is where the assertions are needed. In the case of
nomenclatural acts, we have robust Codes that prescribe specific criteria
for what nomenclatural acts are, and thus there is a (mostly) objective
mechanism for recognizing which among the entire universe of TNUs are
associated with nomenclatural acts (and, hence, which should be registered
as such). However, there is no universally adopted "Code of Taxon Concepts"
that outlines such criteria, to determine which TNUs are accompanied by
sufficiently robust concept circumscription definitions, and which are
merely non-concept TNUs. I think it would be a mistake to attempt to define
such a Code -- (similar problem to the "what is a species" issue).
> Why the latter? - well, because (again) the desirable effect of using
> concepts - the desirable situation where these would have a justification
> that goes beyond just really meticulous data management and advances to
> the level of facilitating better science qua more precise taxonomic
> semantics - only obtains if a great number of name occurrences in a wide
> range of shallow-ish sources is linked via identification to a presumably
> smaller number of occurrences where those names are well defined and where
> successive definitions of names are semantically linked. So there needs to
> be an emerging culture of minimizing concept inflation. Otherwise we obtain
> what we have now (mostly just names) and on top of that add new baggage
> (lots of really shallow concepts) that nobody can do good semantics with.
Agreed!
> Here is where I think we disagree, perhaps just in terms of sales
> strategy:
>
> You seem to suggest that making an a priori distinction between TNUs and
> concepts is (1) possible in a good number of cases, (2) is desirable
> perhaps in the form of registry, and (3) even necessary for building and
> populating databases, etc.
Sort of; but again, they're all based on TNUs. The issue is how to
recognize which ones are associated with robust concept definitions, and
which are not. So it's not about distinguishing between TNUs and concepts;
it's about recognizing which TNUs include concept definitions, and therefore
represent reference-points to concepts. I don't think this can be easily
defined -- I think the only way to do it is by letting people assert it to
be so. In other words, the definition of a TNU that represents a taxon
concept is that someone says it is so. (Much like my own definition of a
"species"!)
> Here I disagree, for a number of reasons. First off I do believe that not
> defining certain things too soon or too narrowly is sometimes actually
> really good science
By "certain things", do you mean "defining a particular TNU as having a
robust concept definition associated with it"? Or, do you mean "defining
what a taxon concept is, in general"?
> and on the other hand, doing so can be a show stopper if other people
> don't share this narrowness and find it limiting.
I think I agree -- but until I understand what you mean by "certain things",
I can't be sure.
> Second, while we can perhaps readily agree that a lengthy monograph
> published in American Museum Novitates rises to the level of authoring
> next concepts whereas a label saying "Family Carabidae" on a specimen
> submitted as part of an insect student collection does not, there are
> enough in-between cases where only time will tell.
Absolutely!! No disagreement here. The idea is not to say "this TNU
includes a good concept definition". The idea is to say "this person
asserts that this TNU is a good concept definition". I agree, we don't
want to be too precise on drawing that arbitrary line. Indeed, I don't
think it's possible to do so. Hence, the "Code of Taxon Concepts" is, I
think, a bad idea (as stated above). Ultimately, the definition of a taxon
concept is that someone says it is so.
> So this exemplifies IMO why so far the concept approach has been too
> abstract, the TCN has been too depauperate on the relationships/mapping
> side (worrying instead almost needlessly about what constitutes a concept
> per se), and definitions between identifications, name usages, shallow,
> deep concepts have been too abstract as well. I believe we should focus
> less discussion on those issues and more emphasis on building mapping
> tools that can carry a wide range of input and logically infer additional
> implied mappings from the initial expert-given set. The actual semantic
> properties of that input will emerge a posteriori and will be hard to
> predict in some cases. Some descriptions are lengthy but nobody understands
> them. Some names lists are profoundly informative if the context of their
> origin is well known to an expert.
I agree with all of this. I think it would be wise to postpone the
development of any sort of concept registry until the a posteriori analysis
is done. Build the maps first (based on assertions from willing
practitioners), then decide whether there needs to be a "registry" for which
TNUs represent concepts, and which do not.
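As a rough sketch of what "infer additional implied mappings from the initial expert-given set" could look like, the following Python composes asserted relationships until nothing new appears. It covers only two relation labels ("congruent" and "included_in"), which are assumptions chosen for illustration; a real tool would need the full set of set-theory relationships and would have to handle conflicting assertions.

```python
# Composition table for the two relations handled in this sketch:
# if A r1 B and B r2 C, then A COMPOSE[(r1, r2)] C.
COMPOSE = {
    ("congruent", "congruent"): "congruent",
    ("congruent", "included_in"): "included_in",
    ("included_in", "congruent"): "included_in",
    ("included_in", "included_in"): "included_in",
}


def infer(asserted):
    """Return mappings implied by, but not present in, the asserted set.

    `asserted` is a set of (tnu_a, relation, tnu_b) tuples supplied by
    experts. The loop repeats until no new tuple can be derived.
    """
    known = set(asserted)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(known):
            for (c, r2, d) in list(known):
                if b == c and a != d:
                    implied = COMPOSE.get((r1, r2))
                    if implied and (a, implied, d) not in known:
                        known.add((a, implied, d))
                        changed = True
    return known - set(asserted)


# Hypothetical TNU identifiers.
expert_given = {
    ("TNU-A", "congruent", "TNU-B"),
    ("TNU-B", "included_in", "TNU-C"),
}
implied = infer(expert_given)
```

Here the experts never related TNU-A to TNU-C directly; the mapping between them falls out of the two assertions they did make.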
> There will be some obvious overreaches in both directions (too many
> unconnected items, some items that are connected more precisely than
> their inherent information would seem to justify).
That is inevitable!
> I think these overreaches would be tolerable. What's less productive to me
> is a restrictive set of definitions that provide an early blockage in the
> way towards an environment where mapping is supposed to occur very
> frequently.
Agreed.
> We're not at the registry stage yet. More at the "can this work in
> principle" stage. As I mentioned before, the mappings ARE the concepts
> under a certain viewpoint. We don't want to pre-determine their fate by
> separating TNUs from concepts in a great number of cases.
Yes. I think we're on the same page -- just on different time scales. I was
looking towards the end-game, but you're focused more on "where do we go
from here". I agree that we should focus on your timescale first, and let
the longer-term solution emerge from that.
> I hope this was not a misrepresentation of your view
Only slightly.... :-)
> and also a clarification of my view. In the end, we both advocate some
> sort of balance for the same concerns, but perhaps disagree only
> strategically about the moment where/when that balance will materialize -
> upfront via precise definitions and registration or later on via the
> presence/lack of actual mappings.
Actually, I don't think we disagree on strategy. I agree with everything
you describe above, and now feel that a registry of concepts is better
thought of as a possible end-game. Better now to define a mechanism to allow
people to make cross-reference assertions among TNUs, and let the issue of
"what is a taxon concept" (specifically: which TNUs represent good taxon
concept definitions) emerge from that.
Aloha,
Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Associate Zoologist in Ichthyology
Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef(a)bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html
Note: This disclaimer formally apologizes for the disclaimer below, over
which I have no control.
Re: [tdwg-tag] [tdwg-rdf: 103] Re: Any TCS users with experiences to report?
by Richard Pyle 27 Nov '12
I want to get into this topic in more detail (going back to Steve's original
post), but this week is hell-week for me, so only a quick comment now.
I generally agree with everything Nico says, but I think we need to be a
little more clear of what we mean by "name sec. author"
The core unit of the data model we've been building towards (GNUB, which
underlies ZooBank) uses as its fundamental unit something we've been calling
a "Taxon Name Usage Instance" (TNU). The scope of what can be a TNU is
intentionally very broad - anything from an original taxon name description,
to a mention in a newspaper article, and potentially even a scribbled
hand-written label or letter. The only requirement is that it be static -
that is, a snapshot in time. I mention this because database records can be
represented as TNUs, but only as a static snapshot of the record. If the
essence of the database record changes over time (e.g., due to changing
taxonomic opinion), then a new TNU is generated for a different snapshot in
time.
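The "static snapshot" requirement can be illustrated with an immutable record type. This is a minimal sketch, not the GNUB data model: the field names, the identifiers, and the placeholder names "Aus bus" and "Aus cus" are all assumptions.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


@dataclass(frozen=True)
class TNU:
    """A Taxon Name Usage instance: fixed once created."""
    tnu_id: str        # hypothetical identifier
    name_string: str
    source: str
    as_of: date        # the moment the snapshot was taken


def snapshot(record: dict, tnu_id: str, as_of: date) -> TNU:
    """Freeze the current state of a mutable database record as a TNU."""
    return TNU(tnu_id, record["name_string"], record["source"], as_of)


# A mutable database record whose taxonomic opinion changes over time.
record = {"name_string": "Aus bus", "source": "example checklist database"}

first = snapshot(record, "tnu-001", date(2012, 11, 1))

record["name_string"] = "Aus cus"   # the database record is revised
second = snapshot(record, "tnu-002", date(2012, 11, 27))
```

The first TNU still says "Aus bus" after the record is revised, and trying to assign to one of its fields raises `FrozenInstanceError`; the change in opinion is captured only as a second TNU with its own identifier.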
A very small subset of the universe of TNUs represent Code-governed
Nomenclatural Acts (original descriptions of new names and other
code-governed nomenclatural actions). In the case of such TNUs involving the
ICZN Code (for example), the TNUs are registered in ZooBank. But the point
is, one subset of all TNUs are those that involve actions governed by a Code
of nomenclature.
The reason I mention this is that, if I read Nico's email correctly, I think
he's saying that not all TNUs de-facto represent taxon concepts. Rather,
analogous to the nomenclatural subset of TNUs, there is a subset of TNUs
that rise to the level of representing Taxon Concept definitions. In the
case of nomenclatural acts, someone must make some sort of declaration
(assertion) that a specific TNU constitutes a Code-governed nomenclatural
act, along with relevant metadata relating to that assertion and the nature
of the Act. In the case of zoological names, ZooBank is intended to
facilitate this role (i.e., when a person registers a TNU in ZooBank, there
is an implied assertion that the TNU represents a nomenclatural act under
the ICZN Code).
What would be nice to have (and what TDWG could play a helpful role in
facilitating), is a registry of sorts (analogous to ZooBank) for those TNUs
that represent taxon concepts. In other words, a mechanism for people to
"register" the subset of all TNUs that represent taxon concepts.
Secondarily, there would also be a mechanism to make assertions about how
registered taxon concepts map to each other (via some sort of set theory
relationship[s]).
In summary, my points are
1) We should be clear when we say "name sec. author" whether we mean it
sensu lato (i.e., all TNUs); or sensu stricto (i.e., only those TNUs that
rise to the level of representing taxon concepts).
2) There ought to be a registry (perhaps administered by CoL?) for
identifying the subset of TNUs that represent concept definitions, and it
should include a mechanism for making set-theory relationship assertions
among registered concept-TNUs.
3) The two things mentioned in #2 should be separate; that is, one can
assert that a particular TNU represents a taxon concept separately from
(potentially multiple) assertions about how that taxon concept relates to
other taxon concepts.
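The separation described in points 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the relation labels, and the placeholder names "Aus bus", "Smith 1990", and "Jones 2005" are all hypothetical, and nothing here corresponds to an existing ZooBank, CoL, or TDWG service.

```python
from dataclasses import dataclass, field

# Hypothetical relation labels for set-theory style assertions.
RELATIONS = {"congruent", "includes", "included_in", "overlaps", "excludes"}


@dataclass(frozen=True)
class TNU:
    """A taxon name usage: a name as used in one source ("name sec. author")."""
    name: str
    sec: str


@dataclass
class ConceptRegistry:
    concepts: set = field(default_factory=set)         # TNUs asserted to be concepts
    relationships: list = field(default_factory=list)  # (a, relation, b, asserted_by)

    def register_concept(self, tnu):
        """Assert that a TNU rises to the level of a taxon concept."""
        self.concepts.add(tnu)

    def assert_relationship(self, a, relation, b, asserted_by):
        """Separately assert how two registered concepts relate to each other."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if a not in self.concepts or b not in self.concepts:
            raise KeyError("both TNUs must be registered as concepts first")
        self.relationships.append((a, relation, b, asserted_by))


registry = ConceptRegistry()
a = TNU("Aus bus", "Smith 1990")
b = TNU("Aus bus", "Jones 2005")
registry.register_concept(a)
registry.register_concept(b)
# Two experts may disagree; both assertions are kept, each with its author.
registry.assert_relationship(a, "included_in", b, asserted_by="expert-1")
registry.assert_relationship(a, "overlaps", b, asserted_by="expert-2")
```

Keeping registration and relationship assertions as separate calls means a TNU can be flagged as a concept long before anyone maps it to another concept, and several competing mappings can coexist.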
More later.
Aloha,
Rich
P.S. By my standards that WAS quick!
From: Nico Franz [mailto:nico.franz@asu.edu]
Sent: Monday, November 26, 2012 1:40 PM
To: tdwg-rdf(a)googlegroups.com
Cc: Roderic Page; Tony.Rees(a)csiro.au; pmurray(a)anbg.gov.au;
Simon.Pigot(a)csiro.au; J.Kennedy(a)napier.ac.uk; eotuama(a)gbif.org;
tdwg-tag(a)lists.tdwg.org; deepreef(a)bishopmuseum.org; David Patterson
Subject: Re: [tdwg-rdf: 103] Re: [tdwg-tag] Any TCS users with experiences
to report?
Thank you, Steve.
This I think is quite helpful in summarizing where we are and aren't on
this issue. I'd like to respond / reiterate mainly in relation to Rich's
comments.
http://lists.tdwg.org/pipermail/tdwg-tag/2012-November/002526.html
Regarding the notion of what a concept "is" - Well, scientific
definitions quite often can be deliberately vague and still work well enough
(or work precisely because of that vagueness). I think there is acceptable
middle ground here. Perhaps it is best to focus on the expected actions of
concepts.
(1) Concepts are intended to provide taxonomic resolution (i.e. help us
understand whether two authors talk about the same or different perceived
entities in nature [natural entities for the realist, human perceptions for
the constructivist - no great matter where we stand on that one]) BEYOND
what can be achieved via name strings and synonymy relationships.
(2) Concepts are LABELED with a name sec. author combination (or something
similar to that).
(3) (1) is relevantly fulfilled IF an expert feels confident enough in
asserting that concept 1 and concept 2 have a single, or a combination of
multiple, set theory relationship(s), or some sort of percent matching
relationship, that goes beyond "somehow overlapping / not" (which
type-anchored nomenclatural relationships could typically convey as well).
This means that, to a degree (and reflecting real biological practice,
I'd think), whether some name sec. author instance attains the level of
concept - as in "concept that does more taxonomic resolution than just
names" - or not is contingent on there being sufficient context for *some*
expert to make such an assertion ("congruent or more inclusive than", but not
any of the other three RCC-5 relationships). As you know, learning about
context is something that experts are very good at. It doesn't always jump
out from the written pages.
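For readers less familiar with RCC-5, here is a minimal Python sketch of the five relationships. It assumes, purely for illustration, that each concept can be represented as a set of member identifiers (specimens or child concepts); in practice an expert often asserts a relationship without such an enumeration, and the relation labels used here are not an agreed vocabulary.

```python
def rcc5(a, b):
    """Return the RCC-5 relation of member set a to member set b."""
    a, b = set(a), set(b)
    if a == b:
        return "congruent"    # EQ: same members
    if a < b:
        return "included in"  # PP: a is a proper part of b
    if a > b:
        return "includes"     # PPi: a properly includes b
    if a & b:
        return "overlaps"     # PO: some shared, some unshared members
    return "excludes"         # DR: no shared members


# An uncertain expert assertion can be kept as a set of still-possible relations,
# e.g. "congruent or more inclusive than, but none of the other three".
possible = {"congruent", "includes"}
observed = rcc5({"s1", "s2", "s3"}, {"s1", "s2"})
consistent = observed in possible
```

Representing an uncertain assertion as a set of candidate relations keeps the expert's partial knowledge usable: later evidence can only narrow the set.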
I cannot quite see how this is a profound issue for modeling in a
database environment. You handle the name sec. author items. The presence or
absence of concept relationships will be (by and large, over time)
reflective of which subsets of the name sec. author items allow deeper
semantics in the sense of (1) and (3). Providers can have the option of
IDENTIFYING specimens/observations TO a (set of) concept(s) if suitable
concepts are already available in the environment and thus they see no need
to add new ones. Most of this may not be a technical challenge but more a
challenge of relearning speaker roles and making context explicit as one's
own or someone else's. We tend to have this information in mind when we
write systematic and non-systematic works.
I think there are lots of valuable datasets out there presenting single,
snapshot concept hierarchies of incredible value. What we need, if for
nothing else then for the sake of argument, are tools to import these - two
at a time initially - into a concept compatible environment that would allow
visualization and mapping to take place and be checked for consistency. I
say "for the sake of argument" because that model by itself has no reward
scheme yet for an expert to get involved (though some might do it on their
own initiative).
I do believe that TDWG is a proper body to take on and promote this task
but TDWG has perhaps more readily focused on leveraging existing information
as opposed to being very proactive in facilitating new taxonomic practices.
IMO we cannot just focus on the data that are there, but can also build
tools to generate better primary data from now on moving forward.
Best,
Nico
On Wed, Nov 21, 2012 at 8:24 AM, Steve Baskauf
<steve.baskauf(a)vanderbilt.edu> wrote:
In an effort to prevent the loss of information presented in the recent
thread on TCS as it relates to RDF, I have created a summary at
http://code.google.com/p/tdwg-rdf/wiki/TCSthread with links to the archived
messages. I left out posts focused specifically on XML schemas which I
consider to be outside the scope of the TDWG RDF group (that is somebody
else's problem). I will continue to add to this page as additional relevant
posts occur and can also post more "further information" links if anybody
wants to send them to me.
Steve
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
Playing devil's advocate I think there are several issues here:
1. The example you gave of an OGC query illustrates what for me is a major limitation of existing approaches (such as DiGIR and TAPIR): they focus on standardising queries, not identifiers. Hence we can query databases in a consistent (if cumbersome) way, but have no easy way to refer to the things (taxa, specimens, etc.) we retrieve. Having stable, reusable, resolvable identifiers would be a step forward.
2. Taxonomic concepts aren't much use unless connected to data. Arguably the most widely used taxonomic database in biodiversity is the NCBI taxonomy database, which has stable identifiers, an API, and taxa that are connected to data (sequences and publications). The GBIF backbone classification is also connected to data (specimens and observations) although its taxon identifiers (like its occurrence ids) aren't terribly stable.
3. I think the standards-first approach tends to put the cart before the horse. I'm not sure it's the lack of standards that is the problem, it's the lack of usable information in taxonomic databases. Apart from NCBI and GBIF, what science can I do with taxonomic databases? What questions do they allow me to ask?
Regards
Rod
Sent from my iPhone
On 3 Nov 2012, at 03:41, <Tony.Rees(a)csiro.au> wrote:
> Hi Jessie, also others who have responded thus far,
>
> You said:
>
>> I think it would be great if the major databases that describe taxa (not
>> just list names) described their data as concepts and allowed people to
>> link to their databases when identifying specimens and when sequencing
>> etc, this would be the start of a really useful biodiversity network.
>
> Agreed! And also the databases that "just list names" are dealing with concepts as we know, comprising a valid name plus all listed synonyms in these cases...
>
> My feeling is that there is not yet any standardization in this area (every data resource does its own thing, in the main using its own home-grown schema, presuming a web service is even offered) because the "standards group" (TDWG) has not pushed a model of any sort of standard client which expects to be able to access distributed taxonomic information in a standard way, so there is no incentive for the sources to provide this. It is sort of like a fax machine with no-one on the other end wishing to communicate with it. By contrast, the OGC (for example) has defined standards for geospatial web services which, once adhered to, allow one's own data to be accessed by standards-compliant remote client apps in a standard way. So if I publish a layer (map) from my geoserver here (http://www.cmar.csiro.au/geoserver/ ) as layer name = bioreg:CAAB37020002, then any remote client can access it via standard syntax which will retrieve it in a specified format, for example:
>
> http://www.cmar.csiro.au/geoserver/wms?service=WMS&version=1.1.0&request=Ge…
>
> So maybe, for TCS, DwC and so on, a missing part of the task is to define the syntax for such calls (plus the relevant expected responses) for taxonomic data, and then create some example software for both publishing and retrieving (client) to exercise this - provided there is an interest in doing so, of course!
>
> More soon,
>
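To make the "standard syntax" point concrete, here is a minimal Python sketch of how a client might assemble an OGC WMS 1.1.0 GetMap request. The parameter names are the standard WMS ones; the bounding box, image size, and output format are made-up values, and this is not a reconstruction of the truncated URL quoted above.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width=512, height=512,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.0 GetMap URL for a single layer."""
    params = {
        "service": "WMS",
        "version": "1.1.0",
        "request": "GetMap",
        "layers": layer,
        "styles": "",                            # empty = server default style
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": width,
        "height": height,
        "srs": srs,
        "format": image_format,
    }
    return endpoint + "?" + urlencode(params)


url = wms_getmap_url(
    "http://www.cmar.csiro.au/geoserver/wms",
    "bioreg:CAAB37020002",
    bbox=(110.0, -45.0, 155.0, -10.0),  # hypothetical extent, roughly Australia
)
```

The point of the analogy is that any client knowing only the endpoint and the layer name can build this call; an equivalent agreed syntax for taxonomic data is what the message above suggests is missing.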
Proposed charter for a TAG Vocabulary Management Task Group (VOMAG)
by Dag Endresen (GBIF) 03 Nov '12
Dear TAG,
After battling with the plans for a biodiversity knowledge organization
system (KOS) framework for biodiversity information resources, we have
identified the requirement to develop guidelines and best practices for
the management of vocabularies of terms. Basic terms organized in
vocabularies provide an essential element to underpin the biodiversity
information standards. As introduced at the TDWG 2011 TAG meetings in
New Orleans, we propose the formation of a new Vocabulary Management
Task Group (VOMAG) [1] to be organized under the TDWG technical
architecture group (TAG). Please find the draft charter available from
the GBIF Community Site [2][3]. Here you will also find another draft
document "Biodiversity Knowledge Organization System: Proposed
Architecture; Version 0.1 February 2012" that provides an overview of
the proposed KOS landscape and the context for the proposed work-plan of
the Vocabulary Management Task Group Charter.
This is the first draft, so far discussed only with Greg Whitbread as
convener of the TDWG TAG and with Steve Baskauf and Joel Sachs as
conveners of the TDWG RDF/OWL task group. We invite feedback and comments
on the proposed formation of the task group, including suggestions with
regard to the work-plan. Please join the Vocabulary Management group at
the GBIF Community Site [1]. You can start or participate in discussions
or share suggestions using the GBIF Community Site. Please also feel free to
contact us to volunteer as a core member for this proposed
task group!
[1] http://community.gbif.org/pg/groups/21382/vocabulary-management/
[2] http://community.gbif.org/pg/file/read/21388/
[3] http://community.gbif.org/pg/blog/read/21387/
[4] http://community.gbif.org/pg/file/read/21582/
Best regards
Dag, Eamonn and David
--
Dag Endresen, PhD
Knowledge Systems Engineer
Global Biodiversity Information Facility (GBIF)
Universitetsparken 15, DK-2100 Copenhagen, Denmark
http://community.gbif.org/pg/profile/dag.endresen
As it happens, the TDWG exec has just given the go-ahead (at the Beijing
meeting) to form a vocabulary management task group under the TAG to look
into how TDWG can better manage the development, maintenance and governance
of vocabularies. A general call for participation will be issued in the next
few weeks. One obvious task is to see where the TDWG Ontology now stands in
relation to DwC and vocab best practices (e.g., why does the TDWG Ontology
re-invent rather than re-use an existing vocab like, e.g., FOAF) and whether
we should extract the best parts (e.g., Taxon Name, Taxon Concept,
Collection) for clean-up, promotion, etc. In the absence of a functioning
TDWG web site, we have put up a draft of the VoMaG charter on the GBIF
community site:
http://community.gbif.org/pg/groups/21382/vocabulary-management/
The doc itself is here:
http://community.gbif.org/pg/file/read/21388/vocabulary-management-task-group-charter-a-task-group-of-the-tag-interest-group-draft-version-february-2012
Please get involved and help us to refine the charter and define tasks for
the group.
Éamonn
-----Original Message-----
From: tdwg-tag-bounces(a)lists.tdwg.org
[mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of
tdwg-tag-request(a)lists.tdwg.org
Sent: 02 November 2012 02:13
To: tdwg-tag(a)lists.tdwg.org
Subject: tdwg-tag Digest, Vol 67, Issue 4
Today's Topics:
1. Re: Any TCS users with experiences to report? (Richard Pyle)
----------------------------------------------------------------------
Message: 1
Date: Thu, 1 Nov 2012 15:13:20 -1000
From: Richard Pyle <deepreef(a)bishopmuseum.org>
Subject: Re: [tdwg-tag] Any TCS users with experiences to report?
To: "'Roderic Page'" <r.page(a)bio.gla.ac.uk>, <Tony.Rees(a)csiro.au>
Cc: pmurray(a)anbg.gov.au, tdwg-tag(a)lists.tdwg.org, Simon.Pigot(a)csiro.au
Message-ID: <006c01cdb897$3f978700$bec69500$(a)bishopmuseum.org>
Content-Type: text/plain; charset="utf-8"
As the Convenor of the TDWG Taxon Names and Concept group, I have failed in
one of my core duties to address this issue. My inability to attend TDWG
this year has only exacerbated this problem.
Having said that... I have had many discussions with many folks over the
past couple of years on this issue, and for various reasons the time is now
ripe to re-visit this age-old problem and make some decisions about how to
move forward.
For the ZooBank LSID resolver, we used Roger's vocabularies; and to some
extent, the DwC terms harmonize (but not completely). A few years ago I
made a push to either revitalize TCS (e.g., through TCS 2.0), or to allow it
to retire (if it hasn't already done so de facto).
Having just emerged from nearly two very thick years of development on
ZooBank, GNA/GNUB, etc., I am now more energized (and liberated, in terms of
available time) to re-focus on how to move forward. My hope is that we can
make some core decisions about how to move forward well before next year's
TDWG meeting.
I would very much welcome feedback from people on:
1) Who is actively using TCS? Does it work? Can it be improved?
Should it be retired?
2) Who is using Roger's vocabulary? Does it work? Can it be improved?
3) How much of DwC:Taxon is in active use? Just the "traditional"
terms; or some of the new ones introduced with the ratified DwC? Does it
work? Can it be improved?
4) What other standards are being used in this space?
Now that we have launched the new ZooBank, we will turn our attention to
GNUB services that will start to put that content to work. It is therefore
very much in our interest to support the sorts of data exchange mechanisms
that people most need and, ideally, collapse the various "flavors" into
something we can all rally around.
Aloha,
Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Associate Zoologist in Ichthyology
Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef(a)bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html
Note: This disclaimer formally apologizes for the disclaimer below, over
which I have no control.
From: tdwg-tag-bounces(a)lists.tdwg.org
[mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Roderic Page
Sent: Thursday, November 01, 2012 1:56 PM
To: <Tony.Rees(a)csiro.au>
Cc: pmurray(a)anbg.gov.au; <tdwg-tag(a)lists.tdwg.org>; Simon.Pigot(a)csiro.au
Subject: Re: [tdwg-tag] Any TCS users with experiences to report?
A TDWG standard not actually being used, surely not ;)
Leaving aside the wisdom of XML schema (yuck) and developing standards
independently of actual products, it does puzzle me that the work Roger Hyam
did on the LSID vocabularies is consistently overlooked. There is an RDF
version of TCS: http://rs.tdwg.org/ontology/voc/TaxonConcept
This was used by CoL in their LSIDs, but because they usually broke I
suspect nobody used them.
We seem to be in a muddled state at present where there are competing
vocabularies in use for taxonomic names and concepts, and these two notions
are often not cleanly separated. Whereas nomenclators such as IPNI and
ZooBank use the LSID taxon name vocabulary, other databases use vocabularies
such as Darwin Core, which rather conflate names and concepts. It's not
clear to me how this situation arose, but it somewhat defeats the point of
having standards.
Regards,
Rod
Sent from my iPhone
On 1 Nov 2012, at 22:41, <Tony.Rees(a)csiro.au> wrote:
Hi TDWG persons,
I am involved in an activity here to set a local standard for storing
taxonomic name, identifier and (probably) hierarchy information in metadata
records using our profile of ISO 19115 for the latter, and the question will
come up as to whether to use elements from TCS, DwC, EML, NCBII extension to
ISO 19115, or other. By default I would expect the front runner to be TCS
but it appears few if any major systems have ever gone that route - I have
looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more... the
nearest would perhaps be AFD/APNI (hence copying Paul on this email); however,
their "ibis" schema, though apparently based originally on TCS,
http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any
explicit reference to the TCS schema so far as I can see. (Note also the
cited schema definition http://biodiversity.org.au/xml/ibis [or presumably
http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I
am missing something).
I am in the interesting position of also wishing to make apps which both
publish and consume taxonomic name information so *could* implement TCS for
these, but if no-one else is doing so maybe that is not a path to future
data harmonisation, and something like DwC might be better.
It does seem odd that we have a standard endorsed in 2005 by TDWG which is
apparently unused by any current major players in the real world. Any
thoughts?
Regards - Tony
Tony Rees
Manager, Divisional Data Centre,
CSIRO Marine and Atmospheric Research,
GPO Box 1538,
Hobart, Tasmania 7001, Australia
Ph: 0362 325318 (Int: +61 362 325318)
Fax: 0362 325000 (Int: +61 362 325000)
e-mail: Tony.Rees(a)csiro.au
Manager, OBIS Australia regional node, http://www.obis.org.au/
Biodiversity informatics research activities:
http://www.cmar.csiro.au/datacentre/biodiversity.htm
Personal info:
http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36
From: tdwg-tag-bounces(a)lists.tdwg.org
[mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray
Sent: Wednesday, 7 March 2012 12:52 PM
To: Steve Baskauf
Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG
Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data
Standards [SEC=UNCLASSIFIED]
On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:
Dag and Éamonn,
In the context of the discussion which has been going on in the TDWG RDF
mailing list, I have been thinking more about the issue of how to deal with
DwC terms which state "Recommended best practice is to use a controlled
vocabulary...". That would be dcterms:type, dwc:language,
dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition,
dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition,
dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country,
dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus,
dwc:identificationVerificationStatus, dwc:taxonRank; dwc:nomenclaturalCode,
dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType .
We here have had all sorts of problems using other people's vocabularies -
they never quite match the data we have. Our solution has been to use the
standard terms where possible, but to mint our own where needed. We create
RDF objects and declare them as being the correct type.
For instance,
http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm
Is declared to be a subclass of
http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm
And we have a few specific items of that type:
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym
http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-literature-name
These individuals are therefore correctly typed and can legitimately be used
as a TDWG relationshipCategory.
Your list of dwc:disposition values does not need to be exhaustive. It's
legitimate (from a machine point of view) for a site to create their own
terms. However, this does mean that the world becomes fragmented into a
number of site-specific vocabularies that cannot be machine-reasoned over.
The underlying reason for this is that this is in fact the way the world
actually is at the moment, and there's not a lot of help for it.
-------------------------------------------------------------
There are two or three approaches to using a standard vocabulary when your
own data does not quite match it.
You can use the standard term that is *closest in meaning* to your own term.
The difficulty here is that if the meaning of the standard term implies
things that are not true of your data, using it means that you are
asserting things that are in fact not true, and for that reason I suggest
that it's not the way to go.
You can use the standard term whose definition encompasses your term. The
difficulty here is that some vocabularies (notably Taxon Concept Schema)
don't have "other" or "unspecified" values for their enumerations - they are
not exhaustive.
In either of these cases, you will want to supplement the standard term with
another value specific to your own data set, whose definition you make
available. There are a few ways to do that.
You can use the "define your own term" mechanism and assert both
_:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
_:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .
You can have a completely separate predicate:
_:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
_:_ my-voc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .
You can also be terribly clever and declare your own predicate to be a
super-property of the TDWG predicate, one whose range is a union. This isn't
terribly useful to people using your data unless the tdwg triple is also
asserted.
Another alternative is to create an OWL rule that says "if a thing has
relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has
relationship-type tdwg:is-subtaxon-of"
But this creates a performance hit.
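The rule described above can be sketched as a one-off materialisation step in plain Python, with triples held as tuples. This is an illustration only: it does not use an OWL reasoner, the subject identifier "rel:1" is made up, and running the rule once before publishing (rather than at query time) is an assumption about how the cost might be contained, not something measured here.

```python
# Triples are (subject, predicate, object) tuples; all identifiers are illustrative.
HAS_TYPE = "tdwg:has_relationship_type"

# Local term -> the broader standard term it implies.
IMPLIES = {
    "my-voc:is-recently-declared-subtaxon-of": "tdwg:is-subtaxon-of",
}


def materialise(triples):
    """Add the standard-term triple wherever only the local term is asserted."""
    result = set(triples)
    for s, p, o in triples:
        if p == HAS_TYPE and o in IMPLIES:
            result.add((s, p, IMPLIES[o]))
    return result


data = {("rel:1", HAS_TYPE, "my-voc:is-recently-declared-subtaxon-of")}
expanded = materialise(data)
```

After this step the published data carries both triples, which is equivalent to the first "assert both" option but generated mechanically from a small mapping table.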
-------------------------------------------------------------
That little discussion aside, my main concern is that you don't get mired in
attempting to exhaustively list all the different island types (etc) as part
of the vocabulary that you are creating. It's a never-ending job. It might
be an idea to have the design guideline that no enumeration class defined by
the vocabulary shall have more than 10 values. It's arbitrary, but it will
keep people from being carried away subdividing types into a hierarchy that
they think is a good idea, but which doesn't match the data people already
have.
I'd also suggest that every enumeration (i.e., list of individuals)
include two special values:
NOT_SPECIFIED. This value is not present in the source, underlying data. It
isn't in the database, the respondent didn't fill out the form fully.
Perhaps "NULL" might be a better name - assuming people at this level know
what it means.
OTHER. This means the value is some specific value, but it's not covered in
the TDWG list. I am not sure if this value should be explicitly used if you
are publishing your own vocabulary and using terms from that. I'm inclined
to say it should not be, because doing that would result in two values for
predicates that naturally should be functional.
These special values *can* be done as a single instance, which means you
could easily pull all "not specifieds" out of a dataset, but that means that
either the ranges would have to be declared as a union, which is messy, or
the individuals would have to be declared as having all possible types,
which would break disjoint class declarations.
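A minimal Python sketch of the two special values, assuming each enumeration carries its own NOT_SPECIFIED and OTHER members rather than sharing a single instance across enumerations. The three specific values listed are placeholders for illustration and are not an official controlled vocabulary.

```python
from enum import Enum


class EstablishmentMeans(Enum):
    # Placeholder values, not an official list.
    NATIVE = "native"
    INTRODUCED = "introduced"
    CULTIVATED = "cultivated"
    NOT_SPECIFIED = "not_specified"  # the value is absent from the source data
    OTHER = "other"                  # a specific value that is outside this list


def to_enum(raw):
    """Map a raw source value onto the enumeration."""
    if raw is None or not raw.strip():
        return EstablishmentMeans.NOT_SPECIFIED
    try:
        return EstablishmentMeans(raw.strip().lower())
    except ValueError:
        return EstablishmentMeans.OTHER
```

Giving each enumeration its own pair of special members sidesteps the union-range and disjointness problems mentioned above, at the cost of not being able to pull every "not specified" out of a dataset with a single lookup.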
________________________________
This message is only intended for the addressee named above. Its contents
may be privileged or otherwise protected. Any unauthorized use, disclosure
or copying of this message or its contents is prohibited. If you have
received this message by mistake, please notify us immediately by reply mail
or by collect telephone call. Any personal opinions expressed in this
message do not necessarily represent the views of the Bishop Museum.
Re: [tdwg-tag] Any TCS users with experiences to report? [SEC=UNCLASSIFIED]
by Paul Murray 02 Nov '12
On 02/11/2012, at 12:57 PM, <Tony.Rees(a)csiro.au> wrote:
> Thanks, Paul, for the detailed response.
>
> So in the document I am writing, I will be able to say that the present Australian NSLs as a resource do *not* use TCS although arguably it might not be too difficult to transform specific elements to TCS if the full native richness of the information is not required for our use case?
At this stage, yes. We are influenced by TCS, we have copied and extended some of the TCS enumerations, but they aren't TCS and there has been enough water under the bridge that I cannot say with confidence that the values that do match TCS values mean the same thing in all cases.
>
> For example in our planned use case (extension of ISO metadata) we could still define elements using TCS schema if we wished, and suck/re-render just those from relevant NSL services?
Perhaps I spoke too soon.
The problem is that if you pull out TCS elements and use them, then the result will not be valid XML. This:
<?xml version="1.0" encoding="UTF-8"?>
<somedata
xmlns="http://myschema"
xmlns:tcs="http://www.tdwg.org/schemas/tcs/1.01"
>
<myname>
<tcs:ScientificName>
<tcs:CanonicalName>
<tcs:Simple>Canis</tcs:Simple>
</tcs:CanonicalName>
</tcs:ScientificName>
</myname>
</somedata>
Is not valid xml - tcs:ScientificName element is not an element definition in tcs.
However: TCS does expose some element *types*, notably CanonicalName, ScientificName - the type, not the element. (it would have been nice for them to prepend 'type' to their element types, but pressing on:)
So you could define your own schema, your own type, and have it extend TCS CanonicalName. Or you could simply define an XML element as having type CanonicalName. Either way, this element could not be inserted into a TCS DataSet, but it would still be valid to use it.
Give me a moment, and I'll see if I can make it happen …
Ok! Success!
If you define your own schema like so (I'm doing it all in the one directory to make things easier):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.org/MySchema"
xmlns:tcs="http://www.tdwg.org/schemas/tcs/1.01"
elementFormDefault="qualified"
>
<xs:import namespace="http://www.tdwg.org/schemas/tcs/1.01" schemaLocation="TCSv101.xsd" />
<xs:element name="MyData">
<xs:complexType>
<xs:sequence>
<xs:element name="Name" type="tcs:CanonicalName" maxOccurs="unbounded" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Then you can use the TCS elements like so:
<?xml version="1.0" encoding="UTF-8"?>
<mydata:MyData
xmlns:tcs="http://www.tdwg.org/schemas/tcs/1.01"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mydata="http://www.example.org/MySchema"
>
<mydata:Name>
<tcs:Simple>CANIDAE</tcs:Simple>
</mydata:Name>
<mydata:Name>
<tcs:Simple>CANIDAE</tcs:Simple>
<tcs:Uninomial>CANIDAE</tcs:Uninomial>
</mydata:Name>
<mydata:Name>
<tcs:Simple>Canis</tcs:Simple>
<tcs:Uninomial>Canis</tcs:Uninomial>
</mydata:Name>
<mydata:Name>
<tcs:Simple>Canis familiaris</tcs:Simple>
<tcs:Genus>Canis</tcs:Genus>
<tcs:SpecificEpithet>familiaris</tcs:SpecificEpithet>
</mydata:Name>
</mydata:MyData>
and it validates when I use the command
xmllint --nonet --schema MySchema.xsd MyXml.xml
Most importantly, this:
<?xml version="1.0" encoding="UTF-8"?>
<mydata:MyData
xmlns:tcs="http://www.tdwg.org/schemas/tcs/1.01"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mydata="http://www.example.org/MySchema"
>
<mydata:Name>
<tcs:Simple>Canis</tcs:Simple>
<tcs:Genus>Canis</tcs:Genus>
</mydata:Name>
</mydata:MyData>
Does *not* validate, as it shouldn't. In TCS, you do not use "Genus" except as part of a more than uninomial name.
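Instance documents like the ones above can also be produced programmatically. The following is a minimal sketch using only Python's standard library; the build function and its input shape are hypothetical, and the code only constructs the XML. It does not validate anything, so the output would still need to be checked against MySchema.xsd with xmllint or another XSD validator.

```python
import xml.etree.ElementTree as ET

TCS = "http://www.tdwg.org/schemas/tcs/1.01"
MYDATA = "http://www.example.org/MySchema"
# Use the same prefixes as the hand-written examples.
ET.register_namespace("tcs", TCS)
ET.register_namespace("mydata", MYDATA)


def build(names):
    """names: list of dicts mapping TCS child element names to text, in schema order."""
    root = ET.Element(f"{{{MYDATA}}}MyData")
    for parts in names:
        name = ET.SubElement(root, f"{{{MYDATA}}}Name")
        for tag, text in parts.items():
            ET.SubElement(name, f"{{{TCS}}}{tag}").text = text
    return root


root = build([
    {"Simple": "Canis", "Uninomial": "Canis"},
    {"Simple": "Canis familiaris", "Genus": "Canis", "SpecificEpithet": "familiaris"},
])
xml_text = ET.tostring(root, encoding="unicode")
```

Written to a file such as MyXml.xml, the result can be passed to the xmllint command shown earlier.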
TCS exposes several types that can be used in this way, and quite possibly I should have looked at doing this in the IBIS xml schemas, particularly as you can extend these types in your own schemas. 20/20 hindsight and all that.
> I am thinking at this stage we might only use scientific name, LSID, authorship, rank, common names as available; and possibly a navigable parent tree to generate a taxonomic hierarchy, either as separate elements, or a concatenated string. Synonyms are also a possible area of interest but possibly more trouble than they will be worth in this use case.
TCS Types of interest might be:
ScientificName
NameCitation
AgentNames (for authors)
RelationshipType
AccordingToType
>
> I guess there are attractions to using a globally defined rather than locally defined schema where possible (although maybe not if it’s one no other clients support…)
Very much so. But to correctly leverage TCS in a way that can be validated, you will need your own schema. But this can be as simple as exposing each of the types as your own element:
<xs:element name="CanonicalName" type="tcs:CanonicalName" maxOccurs="unbounded" minOccurs="0"/>
and so on.
It might be worth revisiting this for the biodiversity.org.au data at some stage when we are ready to do a fairly major revision of the structure of the output.
>
> Cheers - Tony
>
> From: Paul Murray [mailto:pmurray@anbg.gov.au]
> Sent: Friday, 2 November 2012 12:31 PM
> To: Rees, Tony (CMAR, Hobart)
> Cc: TDWG TAG; Pigot, Simon (CMAR, Hobart); Whitbread, Greg (ANBG) - Contact
> Subject: Re: [tdwg-tag] Any TCS users with experiences to report? [SEC=UNCLASSIFIED]
>
> Firstly, the XML schema:
>
> http://biodiversity.org.au/xml/ibis is an xml namespace, which works a bit differently to RDF namespaces.
>
> RDF does not have an explicit mechanism for finding schema metadata. By convention (and it is just a convention), we usually find the schema for a namespace by assuming that the namespace URI will work as a URL that can be fetched, and that fetching it will pull back a schema description (possibly in any one of several formats, using HTTP content negotiation).
>
> In XML, however, namespaces are explicitly linked to schema documents by the xsi:schemaLocation attribute.
>
> The xml generated by biodiversity.org.au
>
> http://biodiversity.org.au/apni.taxon/54321.xml
>
> Comes back with the declaration
>
> <app:documents xmlns:app="http://biodiversity.org.au/xml/servicelayer/content" xmlns:ibis="http://biodiversity.org.au/xml/ibis" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cfg="http://biodiversity.org.au/xml/servicelayer/configuration" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/dcterms.xsd http://purl.org/dc/dcmitype/http://dublincore.org/schemas/xmls/qdc/dcmitype… http://biodiversity.org.au/xml/servicelayer/content http://anbg.gov.au/ala/schemas/xml/app.xsd http://biodiversity.org.au/xml/ibis http://biodiversity.org.au/xml/ibis-20120706.xsd http://biodiversity.org.au/xml/apnihttp://biodiversity.org.au/xml/apni-2012… http://biodiversity.org.au/xml/afd http://biodiversity.org.au/xml/afd-20120706.xsd http://biodiversity.org.au/xml/col http://biodiversity.org.au/xml/col-20110615.xsd ">
>
> Although it's a bit buried in there, XML parsers can see from this that the xml namespace "http://biodiversity.org.au/xml/ibis" has a location of "http://biodiversity.org.au/xml/ibis-20120706.xsd". All of our XML is supposed to validate, and last time I checked it did.
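
As a rough sketch of what a parser does with that attribute: xsi:schemaLocation is a whitespace-separated list of namespace/location pairs, so pairing up the tokens gives the mapping described above. This is plain Python using only the standard library. The function name is made up for illustration, and the sample document is trimmed down to two of the pairs from the declaration above (the archived copy of that declaration has truncated entries, so it is not used verbatim).

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return {namespace URI: schema URL} from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes as "{namespace}localname".
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


# Trimmed sample: only two of the pairs from the real declaration.
SAMPLE = """<app:documents
    xmlns:app="http://biodiversity.org.au/xml/servicelayer/content"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
      http://biodiversity.org.au/xml/servicelayer/content http://anbg.gov.au/ala/schemas/xml/app.xsd
      http://biodiversity.org.au/xml/ibis http://biodiversity.org.au/xml/ibis-20120706.xsd"/>"""

locations = schema_locations(SAMPLE)
print(locations["http://biodiversity.org.au/xml/ibis"])
```

Note that this only reads the hint; it does not fetch the schema or validate anything against it.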
>
> By the way - note the date on the filename. We have changed the schema from time to time. Another change is upcoming: the addition of an "excluded" flag for concepts that have been considered for APC and have been explicitly excluded (for a variety of reasons). This will be managed by a new schema document being available on our server and the generated xsi:schemaLocation attribute being changed.
>
> Secondly, TCS:
>
> The issue with TCS is that it is very difficult to extend. To use a bit of TCS in some other schema, you would import the element types and extend them. But TCS mostly does not expose its element types as named types that can be referenced externally - it's all done inline. This means that the only place a TCS "ScientificName" or "Rank" element can appear is somewhere inside a TCS DataSet element. This is not in itself a show-stopper: we could simply generate a DataSet wrapper when we produce output in response to fetches.
>
> But there were other issues such as (and I can only recall one or two at the moment - this mail is not a full defence of our decision to not go with TCS):
>
> A TaxonConcept element may have multiple TaxonRelationship elements. We would like to attach additional data to each relationship to capture information that TCS cannot. There is a ProviderSpecificData element, but this is at the end of the TaxonConcept element, and I could not work out a way to stuff the extra data for each relationship into that ProviderSpecificData element in such a way that it was attached to the correct relationship - although re-looking at it now I see a "ref" attribute and perhaps that is meant to do the job.
>
> There are multiple TCS "relationship types", but these did not quite match the data we had. It is not possible to put anything but a TCS relationship type enum into the "type" attribute of a TaxonRelationship element, so we wind up having to provide two fields - the "real" type and the nearest TCS equivalent. The "real" type needs to go in the ProviderSpecificData section - miles away (in the document) from what is supposed to be the primary place where the relationship is described. It's ugly. Furthermore, some of our relationships don't really match the TCS ones at all well - to the point that using a TCS type would be misleading. The TCS enumeration does not have an "other" value, so there was a bit of an impasse.
>
> In any event, we were looking at either putting some relationships in the TCS array, and some in the PSD array, or putting corresponding arrays in each. Of course, in the provider specific data section we cannot use any of the TCS elements, because the element types are not exposed and can only appear in a TCS DataSet at the correct spot. It just got to the point where the ProviderSpecificData section was bigger and more interesting than the TCS, so we broke it out into a separate XML document (which was bundled with the TCS using an ibis:documents wrapper), at which point we couldn't help but ask "Can someone explain again why we are trying to do this?". After more discussions with both the zoologists and the botanists, attempting to work out which TCS enumerated values I should use for what, we gave up.
>
> TCS does an admirable job of being watertight. If you have any valid XML document with any TCS element, then you know that it will be enclosed in a DataSet element and come bundled with enough context to make sense of it. It's a model for shifting around entire, self-contained *sets* of data. Entire taxonomies, sitting as big files on a disk (or in an xml store). But our service layer serves up fragments - one or several taxa in response to a request, and TCS turned out to not be a good model for what we do. The history of trying to use it has left us with a legacy of having multiple relationship-type fields (relationship "description" and relationship "category") whose product does not form a sensible set of values.
>
> What we have now is a site-specific schema that captures and exposes the data we have. Admittedly, this means that the grand goal we are all trying to accomplish - a consistent worldwide net of data - is not as far down the track as we were hoping to go. It means that the problem of working out how data set 1 matches data set 2 is pushed off onto aggregators, a job that is in general impossible for an aggregator to accomplish. If we could have fitted our data into TCS, if everyone else could also have done so, then that would have been wonderful. We were reluctant to abandon it, but to get our data out the door we eventually did.
>
>
> On 02/11/2012, at 9:41 AM, <Tony.Rees(a)csiro.au> <Tony.Rees(a)csiro.au> wrote:
>
>
>
>
> Hi TDWG persons,
>
> I am involved in an activity here to set a local standard for storing taxonomic name, identifier and (probably) hierarchy information in metadata records using our profile of ISO 19115 for the latter, and the question will come up as to whether to use elements from TCS, DwC, EML, NCBII extension to ISO 19115, or other. By default I would expect the front runner to be TCS but it appears few if any major systems have ever gone that route – I have looked at ITIS, COL, TROPICOS, WoRMS, IPNI, GBIF, AFD/APNI, more… the nearest would perhaps be AFD/APNI (hence copying Paul on this email) however their “ibis” schema, though apparently based originally on TCS, http://biodiversity.org.au/xml/ibis-20120909.xsd , does not make any explicit reference to the TCS schema so far as I can see. (Note also the cited schema definition http://biodiversity.org.au/xml/ibis [or presumably http://biodiversity.org.au/xml/ibis.xsd] does not seem to exist, but maybe I am missing something).
>
> I am in the interesting position of also wishing to make apps which both publish and consume taxonomic name information so *could* implement TCS for these, but if no-one else is doing so maybe that is not a path to future data harmonisation, and something like DwC might be better.
>
> It does seem odd that we have a standard endorsed in 2005 by TDWG which is apparently unused by any current major players in the real world. Any thoughts?
>
> Regards - Tony
>
> Tony Rees
> Manager, Divisional Data Centre,
> CSIRO Marine and Atmospheric Research,
> GPO Box 1538,
> Hobart, Tasmania 7001, Australia
> Ph: 0362 325318 (Int: +61 362 325318)
> Fax: 0362 325000 (Int: +61 362 325000)
> e-mail: Tony.Rees(a)csiro.au
> Manager, OBIS Australia regional node, http://www.obis.org.au/
> Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm
> Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
> LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36
>
> From: tdwg-tag-bounces(a)lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Paul Murray
> Sent: Wednesday, 7 March 2012 12:52 PM
> To: Steve Baskauf
> Cc: "Éamonn Ó Tuama (GBIF)"; TDWG TAG
> Subject: Re: [tdwg-tag] Creating a TDWG standard for documenting Data Standards [SEC=UNCLASSIFIED]
>
>
> On 07/03/2012, at 3:11 AM, Steve Baskauf wrote:
>
>
>
> Dag and Éamonn,
>
> In the context of the discussion which has been going on in the TDWG RDF mailing list, I have been thinking more about the issue of how to deal with DwC terms which state "Recommended best practice is to use a controlled vocabulary...". That would be dcterms:type, dwc:language, dwc:basisOfRecord, dwc:sex, dwc:lifeStage, dwc:reproductiveCondition, dwc:behavior, dwc:establishmentMeans, dwc:occurrenceStatus, dwc:disposition, dwc:continent, dwc:waterBody, dwc:islandGroup, dwc:island, dwc:country, dwc:verbatimCoordinateSystem, dwc:georeferenceVerificationStatus, dwc:identificationVerificationStatus, dwc:taxonRank, dwc:nomenclaturalCode, dwc:taxonomicStatus, dwc:relationshipOfResource, and dwc:measurementType.
>
>
> We here have had all sorts of problems using other people's vocabularies - they never quite match the data we have. Our solution has been to use the standard terms where possible, but to mint our own where needed. We create RDF objects and declare them as being the correct type.
>
> For instance,
> http://biodiversity.org.au/voc/afd/AFD#RelationshipTypeTerm
>
> Is declared to be a subclass of
> http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonRelationshipTerm
>
> And we have a few specific items of that type:
> http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-emendation
> http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-invalid-name
> http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-junior-homonym
> http://biodiversity.org.au/voc/afd/RelationshipTypeTerm#has-miscellaneous-l…
>
> These individuals are therefore correctly typed and can legitimately be used as a TDWG relationshipCategory.
>
> Your list of dwc:disposition values does not need to be exhaustive. It's legitimate (from a machine point of view) for a site to create their own terms. However, this does mean that the world becomes fragmented into a number of site-specific vocabularies that cannot be machine-reasoned over. The underlying reason is that this is in fact the way the world actually is at the moment, and there's not a lot of help for it.
>
> -------------------------------------------------------------
>
> There are two or three approaches to using a standard vocabulary when your own data does not quite match it.
>
> You can use the standard term that is *closest in meaning* to your own term. The difficulty here is that if the meaning of the standard term implies things that are not true of your data, using it means that you are asserting things that are in fact not true, and for that reason I suggest that it's not the way to go.
>
> You can use the standard term whose definition encompasses your term. The difficulty here is that some vocabularies (notably Taxon Concept Schema) don't have "other" or "unspecified" values for their enumerations - they are not exhaustive.
>
> In either of these cases, you will want to supplement the standard term with another value specific to your own data set, whose definition you make available. There are a few ways to do that.
>
> You can use the "define your own term" mechanism and assert both
> _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
> _:_ tdwg:has_relationship_type my-voc:is-recently-declared-subtaxon-of .
>
> You can have a completely separate predicate:
> _:_ tdwg:has_relationship_type tdwg:is-subtaxon-of .
> _:_ myvoc:has_relationship_type my-voc:is-recently-declared-subtaxon-of .
>
> You can also be terribly clever and declare your own predicate to be a super-property of the TDWG predicate, one whose range is a union. This isn't terribly useful to people using your data unless the tdwg triple is also asserted.
>
> Another alternative is to create an OWL rule that says
> "if a thing has relationship-type my-voc:is-recently-declared-subtaxon-of, then it also has relationship-type tdwg:is-subtaxon-of"
>
> But this creates a performance hit.
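
To make the rule idea concrete, here is a minimal sketch in plain Python rather than OWL: triples are ordinary tuples, and the "if it has my term, then it also has the TDWG term" rule is applied once to add the implied triples. The predicate and term names are the ones used in the examples above; the IMPLIES table, the materialise function and the "_:rel1" node are hypothetical and only there for illustration. A real deployment would use an RDF store or reasoner instead.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
HAS_TYPE = "tdwg:has_relationship_type"

# Hypothetical rule table: site-specific term -> TDWG term it implies.
IMPLIES = {
    "my-voc:is-recently-declared-subtaxon-of": "tdwg:is-subtaxon-of",
}


def materialise(triples):
    """Return the input triples plus every triple implied by IMPLIES.

    A single pass is enough here because the implied terms are never
    themselves keys of IMPLIES (no chained rules).
    """
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == HAS_TYPE and obj in IMPLIES:
            inferred.add((subject, HAS_TYPE, IMPLIES[obj]))
    return inferred


data = {("_:rel1", HAS_TYPE, "my-voc:is-recently-declared-subtaxon-of")}
closed = materialise(data)
for triple in sorted(closed):
    print(triple)
```

Running the rule once when the data is published and storing the result amounts to the "assert both" approach; leaving it to be applied at query time is where the performance cost would presumably be paid.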
>
> -------------------------------------------------------------
>
> That little discussion aside, my main concern is that you don't get mired in attempting to exhaustively list all the different island types (etc) as part of the vocabulary that you are creating. It's a never-ending job. It might be an idea to have the design guideline that no enumeration class defined by the vocabulary shall have more than 10 values. It's arbitrary, but it will keep people from being carried away subdividing types into a hierarchy that they think is a good idea, but which doesn't match the data people already have.
>
> I'd also suggest that every enumeration (i.e., list of individuals) include two special values:
>
> NOT_SPECIFIED. This value is not present in the source, underlying data. It isn't in the database, the respondent didn't fill out the form fully. Perhaps "NULL" might be a better name - assuming people at this level know what it means.
> OTHER. This means the value is some specific value, but it's not covered in the TDWG list. I am not sure if this value should be explicitly used if you are publishing your own vocabulary and using terms from that. I'm inclined to say it should not be, because doing that would result in two values for predicates that naturally should be functional.
>
> These special values *can* be done as a single instance, which means you could easily pull all "not specifieds" out of a dataset, but that means that either the ranges would have to be declared as a union, which is messy, or the individuals would have to be declared as having all possible types, which would break disjoint class declarations.
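
A small sketch of the per-enumeration variant (each enumeration carries its own two special values, rather than sharing a single instance across all of them), again in plain Python. The EstablishmentMeans class, its NATIVE and INTRODUCED members and the to_term helper are hypothetical and are not taken from any TDWG vocabulary.

```python
from enum import Enum


class EstablishmentMeans(Enum):
    # Hypothetical domain values, for illustration only.
    NATIVE = "native"
    INTRODUCED = "introduced"
    # The two special values suggested above.
    NOT_SPECIFIED = "not-specified"  # nothing was recorded in the source data
    OTHER = "other"                  # a value was recorded, but it is not in the list


def to_term(raw):
    """Map a raw source value onto the enumeration."""
    if raw is None or not raw.strip():
        return EstablishmentMeans.NOT_SPECIFIED
    try:
        return EstablishmentMeans(raw.strip().lower())
    except ValueError:
        # A specific value exists but the enumeration does not cover it.
        return EstablishmentMeans.OTHER


print(to_term(None), to_term(" Native "), to_term("naturalised"))
```

Because the special values belong to the one class, each property keeps a single, non-union range, at the cost of not being able to pull every "not specified" out of a dataset with one query.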
>
> If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments. Please consider the environment before printing this email.
>
> _______________________________________________
> tdwg-tag mailing list
> tdwg-tag(a)lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
>