[tdwg-tnc] LSIDs and taxon concepts
roger at tdwg.org
Tue Oct 9 15:32:40 CEST 2007
Yes, WFS (not sure about WMS) is very similar to TAPIR.
In the mapping environment the data needs to be valid in order to put a
point on a map (or something more tricky). In our world it is a little
more complex what the validity is for. If you ask for a TaxonConcept and
it has a TaxonName embedded in it, will the authority string of the TN
be broken into basionym and combination authorship? Some providers
might be able to do this (but few). Is it valid not to include an
authority string at all? It is difficult to see how we would define a
structure that would meet everyone's needs.
The only way to do more validity testing in the transport layer is to
have clearer use-cases. Perhaps I should say "Valid for what?" rather
than "Valid for whom?".
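
To make "Valid for what?" concrete, here is a minimal Python sketch in which validity is decided per use-case rather than once for everybody. The use-case names, the field names and the simple parenthesis rule for splitting an authority string are assumptions made for illustration; none of them come from TAPIR or from a TDWG schema.

```python
import re

# Hypothetical use-cases and the fields each one needs (illustrative only).
REQUIRED_FIELDS = {
    "label_printing": {"nameString"},
    "nomenclatural_index": {
        "nameString",
        "basionymAuthorship",
        "combinationAuthorship",
    },
}


def split_authority(authority):
    """Naively split '(Basionym author) Combination author'.

    Returns (None, None) when the string does not follow that pattern.
    """
    match = re.fullmatch(r"\(([^)]+)\)\s*(.+)", authority.strip())
    if match is None:
        return None, None
    return match.group(1).strip(), match.group(2).strip()


def missing_for(use_case, record):
    """List the fields this use-case needs that the record lacks."""
    present = {field for field, value in record.items() if value}
    return sorted(REQUIRED_FIELDS[use_case] - present)


record = {"nameString": "Rhododendron ponticum", "authorship": "L."}
print(missing_for("label_printing", record))
print(missing_for("nomenclatural_index", record))
```

The same record passes for one use-case and fails for the other, which is the difficulty with pushing a single notion of validity into the transport layer.
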
All the best,
On 9 Oct 2007, at 13:23, Eamonn O Tuama wrote:
> Hi Roger,
> I was thinking "valid" for a particular consumer who expected
> certain items
> of data. For example, a GetCapabilities document in an OGC Web Map
> Service is commonly validated before it is returned to a client. So your
> scenario is something like the one I had in mind. As you pointed out
> earlier, it is possible to add an arbitrary literature reference to
> TaxonConcept, but you might want to enforce that (in a relatively
> coupled system) by using an XML Schema. But I do like the RDF "open
> world" approach and I assume that in scenarios 2 and 3, you could traverse
> relationships and find a link from a TaxonConcept to a literature
> ref in one
> of the triples that get aggregated, i.e. each TaxonConcept instance
> does not
> have to come packaged with a lit ref.
> -----Original Message-----
> From: Roger Hyam [mailto:rogerhyam at googlemail.com] On Behalf Of
> Roger Hyam
> Sent: 08 October 2007 17:38
> To: Eamonn O Tuama
> Cc: 'Richard Pyle'; tdwg-tnc at lists.tdwg.org
> Subject: Re: [tdwg-tnc] LSIDs and taxon concepts
> When I hear validity I always think "Valid for whom?" Maybe I am too
> much of a relativist/liberal etc. Anyhow...
> There are three scenarios
> 1) The consumer and producer of the document have an agreement that
> the serialized RDF will agree with a particular XML Schema. It is not
> possible to include a schemaLocation attribute in serialized RDF so
> this has to be asserted somewhere else. If you (as a consumer) are
> using TAPIR then you will have specified the output model (an XML
> Schema) when you asked for the data in the first place and the data
> should be valid against that schema or the provider is being
> naughty. You could then validate it against another schema of your
> own prior to import.
> 2) If you are not getting the RDF from TAPIR then you don't know the
> document structure. It could be serialized in a number of ways. In
> this case you should use an RDF parser (they are available for all
> languages) to generate an in memory (or database backed) model and
> programmatically work over this to do as you please. Using the
> resource centered approach your code would do things like get all the
> TaxonConcepts in the graph then work over them asking for properties
> and values and doing something sensible with them. This is similar to
> what you would do if you got a schema validated XML document that you
> had bound to Java objects using JAXB or even if you read a valid
> document into a DOM of some kind.
> 3) If you are a really sophisticated semantic web wise client you
> would put RDF straight into a model (or triple store) with your own
> ontology that made assertions about what it considered valid
> TaxonConcepts. You would then just ask the model for a list of valid
> TaxonConcepts.
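
The resource-centred approach of scenario 2 can be sketched in Python as follows. The sketch assumes an RDF parser has already turned whatever serialization arrived into triples, represented here as plain tuples. The `TC` namespace URI and the `nameString` property are placeholders invented for the example, and the rule that a concept needs an `accordingToString` is one consumer's hypothetical requirement, not a rule of any vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TC = "http://example.org/voc/TaxonConcept#"  # placeholder namespace (assumption)

# Triples as a parser might hand them over, whatever the serialization was.
triples = {
    ("urn:lsid:example.com:concepts:1", RDF_TYPE, TC + "TaxonConcept"),
    ("urn:lsid:example.com:concepts:1", TC + "nameString", "Rhododendron ponticum L."),
    ("urn:lsid:example.com:concepts:1", TC + "accordingToString", "Brown and Smith 1995"),
    ("urn:lsid:example.com:concepts:2", RDF_TYPE, TC + "TaxonConcept"),
    ("urn:lsid:example.com:concepts:2", TC + "nameString", "Rhododendron luteum"),
}


def subjects_of_type(graph, type_uri):
    """Get every resource in the graph declared to be of the given type."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == type_uri)


def properties_of(graph, subject):
    """Collect the properties and values attached to one resource."""
    props = {}
    for s, p, o in graph:
        if s == subject and p != RDF_TYPE:
            props.setdefault(p, []).append(o)
    return props


def usable_concepts(graph):
    """Keep only the concepts that carry what this consumer needs."""
    kept = []
    for concept in subjects_of_type(graph, TC + "TaxonConcept"):
        if TC + "accordingToString" in properties_of(graph, concept):
            kept.append(concept)
    return kept


print(usable_concepts(triples))
```

Nothing here depends on element order or nesting in the original document, so the same code works whichever way the provider serialized the RDF.
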
> I would be interested to meet anyone who is actually validating XML
> documents they get back from a supplier using XML Schema and relying
> on that validity alone to import the data into their own data model.
> I have talked with people who generate Java from XML Schema but they
> usually then mess with it to get it to work for their application.
> If I received an ABCD document that wasn't valid, for example, I would
> have a choice of rejecting it entirely or trying to work out if the
> bit that broke the validity affects my data model. If it is valid I
> still have to check that the contents of the elements fit my model.
> Whatever happens I have to walk over the DOM programmatically, so there
> seems little point in actually validating it first; I may as well just
> let my own code fail in a way I understand and can recover from.
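
A minimal Python sketch of that "walk the document and fail in a way you understand" approach follows. The element names are invented for the example and are not real ABCD elements. Each record is checked against what the importing application needs, and a bad record is set aside with a reason instead of invalidating the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are NOT taken from ABCD.
DOCUMENT = """
<records>
  <record id="1"><name>Rhododendron ponticum</name><year>1995</year></record>
  <record id="2"><name>Rhododendron ponticum</name><year>unknown</year></record>
  <record id="3"><year>2001</year></record>
</records>
"""


def import_records(xml_text):
    """Walk the tree and import what fits our model, record by record."""
    imported, rejected = [], []
    for record in ET.fromstring(xml_text).iter("record"):
        record_id = record.get("id")
        try:
            name = record.findtext("name")
            if name is None:
                raise ValueError("no name element")
            # int() raises ValueError for missing or non-numeric years.
            year = int(record.findtext("year", default=""))
            imported.append((record_id, name, year))
        except ValueError as error:
            rejected.append((record_id, str(error)))
    return imported, rejected


imported, rejected = import_records(DOCUMENT)
print(imported)
print(rejected)
```
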
> There is an analogy I use. When I post a letter I have to have a
> valid envelope, stamp and address on it. I don't expect the postman
> to open the letter and say "Hey that is bad grammar I am not
> delivering it!". When I read a letter I can understand it if the
> grammar is bad in parts - especially if those parts are unimportant
> to me. It might be valid to me but not to the postman.
> (BTW there is a postal strike in the UK at the moment - not sure if
> that strengthens the analogy or invalidates it)
> I hope this helps,
> On 8 Oct 2007, at 15:25, Eamonn O Tuama wrote:
>> Hi Roger,
>> I'd like you to comment on the issue of validation. In RDF, with its Open
>> World assumption, we lose the ability to validate. So how easy is it to
>> take RDF output and, if an application requires it, re-format it so that it
>> can be validated against an XML Schema, i.e., take your example below and
>> link it to an XML Schema? I understand that there are multiple ways in RDF
>> to express the same thing, so would that create problems for a schema if the
>> RDF was coming from different data providers? Or do we have control of that
>> because of the LSID vocabularies?
>> -----Original Message-----
>> From: tdwg-tnc-bounces at lists.tdwg.org
>> [mailto:tdwg-tnc-bounces at lists.tdwg.org] On Behalf Of Roger Hyam
>> Sent: 05 October 2007 16:51
>> To: Richard Pyle
>> Cc: tdwg-tnc at lists.tdwg.org
>> Subject: Re: [tdwg-tnc] LSIDs and taxon concepts
>> Paul, Rich et al.
>> I'll try and answer all the questions in a single mail and also keep
>> it short.
>> Taxon Concept Schema (TCS) is an XML Schema that was standardized by
>> TDWG in 2005, but "TCS" is also used as shorthand for distinguishing
>> between Taxon Names and Taxon Concepts.
>> The fundamental thing that TCS does (both the schema and the way of
>> modeling) is separate TaxonNames (or nomenclatural acts) from
>> TaxonConcepts (actual delimited or implied taxa that one would
>> identify something to).
>> In order to issue LSIDs for TaxonNames or TaxonConcepts it is
>> necessary to represent them in RDF rather than XML Schema. RDF is far
>> more modular by nature than XML Schema and so two vocabularies were
>> put together to represent TaxonNames and TaxonConcepts (rather than
>> one schema) but unless you are issuing pure nomenclatural data you
>> will usually use both.
>> The TaxonName vocabulary is being used by IPNI, Index Fungorum and
>> soon ZooBank. It is also being used by GBIF and anyone else who uses
>> the TaxonConcept or TaxonOccurrence because it is embedded within
>> these vocabularies. In fact it could be used anywhere someone wants
>> to break apart a name string.
>> The TaxonOccurrence vocabulary is being issued by the CATE project
>> (don't have the reference to hand) and Species2k/Catalogue of Life
>> are going to use it for their checklist and of course by others who
>> issue TaxonOccurrence data.
>> I'll show an example of the embedding as it makes things clearer.
>> This is abbreviated for clarity. Suppose we want to express an
>> occurrence of a taxon (perhaps as a specimen):
>> <to:TaxonOccurrence rdf:about="urn:lsid:example.com:specimens:1234">
>> <dc:title>Hyam.R.D. 284927 - Rhododendron ponticum L.</dc:title>
>> <to:collector>Roger Hyam</to:collector>
>> <.... other stuff ...>
>> <to:expertName>Chris Browning</to:expertName>
>> <to:taxonName>Rhododendron ponticum
>> <to:taxon>
>> <tn:specificEpithet>ponticum</tn:specificEpithet>
>> <tn:authorship>L.</tn:authorship>
>> <tc:accordingToString>Brown and Smith 1995</tc:accordingToString>
>> <tcom:publishedIn>Some monograph by some guys</tcom:publishedIn>
>> So we have a TaxonOccurrence (really like a DarwinCore record but
>> with embedding). In order to express the identification of this
>> specimen in more detail than just a string we include a TaxonConcept
>> and a TaxonName. Neither the concept nor the name have identities
>> (they are both anonymous) but they are both objects of that type.
>> They could be replaced by references to external instances. There are
>> also properties to allow the supplier to "cop out" of embedding or
>> referencing anything and simply include a string if that is all they
>> have in their database.
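
As a rough illustration of how a consumer might handle both the embedded objects and the string-only "cop out", here is a Python sketch. It uses a well-formed stand-in for the abbreviated record: the namespace URIs are placeholders, and the `tc:TaxonConcept`, `tc:hasName` and `tn:TaxonName` nesting is an assumption about how the embedding looks, since the snippet in this message is abbreviated. It also reads one particular RDF/XML layout with a plain XML parser; a real RDF parser would cope with alternative serializations.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions, not the real vocabulary URIs).
NS = {
    "to": "http://example.org/voc/TaxonOccurrence#",
    "tc": "http://example.org/voc/TaxonConcept#",
    "tn": "http://example.org/voc/TaxonName#",
}

RECORD = """
<to:TaxonOccurrence
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:to="http://example.org/voc/TaxonOccurrence#"
    xmlns:tc="http://example.org/voc/TaxonConcept#"
    xmlns:tn="http://example.org/voc/TaxonName#"
    rdf:about="urn:lsid:example.com:specimens:1234">
  <to:taxonName>Rhododendron ponticum</to:taxonName>
  <to:taxon>
    <tc:TaxonConcept>
      <tc:hasName>
        <tn:TaxonName>
          <tn:specificEpithet>ponticum</tn:specificEpithet>
          <tn:authorship>L.</tn:authorship>
        </tn:TaxonName>
      </tc:hasName>
      <tc:accordingToString>Brown and Smith 1995</tc:accordingToString>
    </tc:TaxonConcept>
  </to:taxon>
</to:TaxonOccurrence>
"""


def describe_identification(xml_text):
    """Prefer the embedded TaxonName; fall back to the plain name string."""
    occurrence = ET.fromstring(xml_text)
    name = occurrence.find(
        "to:taxon/tc:TaxonConcept/tc:hasName/tn:TaxonName", NS
    )
    if name is not None:
        return {
            "epithet": name.findtext("tn:specificEpithet", namespaces=NS),
            "authorship": name.findtext("tn:authorship", namespaces=NS),
        }
    return {"string": occurrence.findtext("to:taxonName", namespaces=NS)}


print(describe_identification(RECORD))
```
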
>> So in issuing a TaxonOccurrence record I use both TaxonConcept and
>> TaxonName vocabularies. I am not using TCS in the sense of the XML
>> Schema but I am using it in the sense of the notions involved.
>> This is where we are headed with integrated standards and semantic
>> I hope this helps.
>> BTW I hope it answers Rich's question as it is possible to add
>> reference info in the TaxonConcept to say where it was published
>> using the common properties defined in:
>> All the best,
>> On 5 Oct 2007, at 13:58, Richard Pyle wrote:
>>> Hi Paul and others,
>>> This leads me to a couple of questions about serving TCS data. For
>>> strictly speaking, ZooBank will return metadata in accordance
>>> with the TaxonNameLsidVoc
>>> (http://wiki.tdwg.org/twiki/bin/view/TAG/TaxonNameLsidVoc), which
>>> is based
>>> on TCS, but is not TCS per se (ZooBank is concerned with taxon
>>> names, not
>>> concepts). There is also the TaxonConceptLsidVoc
>>> (http://wiki.tdwg.org/twiki/bin/view/TAG/TaxonConceptLsidVoc), which
>>> together with the TaxonNameLsidVoc and other more general ontologies,
>>> collectively represent the same information as a TCS XML document.
>>> I guess
>>> that one of the things I'm not clear on is whether RDF returned for
>>> an LSID
>>> counts as "TCS", or does TCS specifically mean a document structured
>>> according to the TCS XML Schema?
>>> Also, what are we really serving when we say we're serving TCS
>>> Name-only data is part of TCS, but I wouldn't think of it as TCS
>>> per se. I
>>> think you need it in the context of an "accordingTo" instance. (By
>>> the way
>>> -- Roger -- I'd always thought of "accordingTo" as referring to a
>>> PublicationCitation, not an Actor or Team. A topic of discussion
>>> for another day...)
>>> But my point is, I've got hundreds of thousands of database records
>>> [Name accordingTo Publication], which each represent a pointer to a
>>> concept (that is, "concept" sensu Kennedy, not sensu Pyle). And
>>> for many of
>>> these, I also have information on synonymies within the Publication
>>> taxon concepts defined at the resolution of names, which means at
>>> implied resolution of type specimens). What I don't have,
>>> however, is
>>> robust sets of "taxon concept" records that go into more specific
>>> regarding the definition of the concept itself (in terms of non-type
>>> specimens and/or character data, for example). Also, I don't have
>>> much in
>>> the way of third-party RelationshipAssertions to define how these
>>> concepts map to each other.
>>> This leads to the question I've been meaning to ask, which is "How much
>>> information do I need before I call it a TCS document?" I would
>>> say raw
>>> names data alone don't cut it -- you would need at least an
>>> before you could call it a concept/TCS document. But if all I have
>>> as an
>>> accordingTo (with no additional specimens or characters or
>>> RelationshipAssertions), do I still call it TCS?
>>> Sorry if I'm over-thinking this...
>>>> -----Original Message-----
>>>> From: tdwg-tnc-bounces at lists.tdwg.org
>>>> [mailto:tdwg-tnc-bounces at lists.tdwg.org] On Behalf Of Paul Allen
>>>> Sent: Friday, October 05, 2007 2:11 AM
>>>> To: tdwg-tnc at lists.tdwg.org
>>>> Subject: [tdwg-tnc] LSIDs and taxon concepts
>>>> Hi all,
>>>> I'm new to this list and hope that the following are
>>>> appropriate questions.
>>>> In Bratislava, I wasn't keeping detailed enough notes on
>>>> projects and their current and future plans wrt TCS.
>>>> What sites are currently publishing TCS-formatted data or
>>>> will be within the year? I know that zoobank.org will be
>>>> publishing TCS data in the near future. Is GBIF? ITIS? Species2000?
>>>> What sites are publishing real "taxon concept" data (in TCS
>>>> format or not)?
>>>> Conversely, what sites are simply publishing "nominal taxon
>>>> concepts" as opposed to detailed authoritative taxon concepts?
>>>> Is this the kind of thing for which we should generate a
>>>> survey to send to sites (i.e. their plans for publishing TCS)
>>>> or distribute to TDWG members?
>>>> Paul Allen, Assistant Director
>>>> Information Science pea1 at cornell.edu
>>>> Cornell Lab of Ornithology (800) 843-BIRD
>>>> 159 Sapsucker Woods Road (607) 254-2480 (direct)
>>>> Ithaca, NY 14850 (607) 254-2415 (fax)
>>>> tdwg-tnc mailing list
>>>> tdwg-tnc at lists.tdwg.org