[tdwg-content] Producing a global taxon register (was: ITIS TSNID to uBio NamebankIDs mapping)

Peter DeVries pete.devries at gmail.com
Sat Jun 4 01:26:07 CEST 2011


Hi Tony,

This is probably best handled with a coordinated set of vocabularies. Some
could apply to a large set of taxa, while others would need to be more
specific, probably at the level of family or lower.

I think that, given the instability of classification and the uncertainty of
reliably reasoning over millions of records, it might be best to apply these
properties at the level of species rather than at a rank like Order.

For instance, do all Diptera have two wings?

I made up an example for testing under GeoSpecies that you can see
here: http://about.geospecies.org/sparql.xhtml#example_8

Some things to think about are attributes like "Common" or "Rare". How does
one apply these to taxa as different as mosquitoes and elephants?

I did not mention this in my previous SPARQL example, but there is a
specific species of mosquito that seems to depend on a specific
pitcher plant to reproduce.

One could infer that the distribution of this species in Wisconsin is
limited by the distribution of that pitcher plant.

There is a whole host of other relationships, such as pollinator/plant,
pathogen/vector and predator/prey, that could be modeled and tested using
these tools.
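Pete's pitcher-plant inference can be sketched as a query over simple triples: a species that depends on a host cannot occur where the host does not. This is a minimal sketch that assumes an in-memory list of (subject, predicate, object) tuples. The species labels, the `dependsOn`/`occursIn` predicates and the county names are hypothetical placeholders, not GeoSpecies vocabulary or real distribution records.

```python
from typing import Optional

# Hypothetical triples (subject, predicate, object); all labels are placeholders.
TRIPLES = [
    ("mosquito_sp", "dependsOn", "pitcher_plant_sp"),
    ("pitcher_plant_sp", "occursIn", "County A"),
    ("pitcher_plant_sp", "occursIn", "County B"),
    ("mosquito_sp", "occursIn", "County B"),
    ("mosquito_sp", "occursIn", "County C"),
]


def objects(subject: str, predicate: str) -> set:
    """Every object linked to `subject` by `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def possible_range(species: str) -> Optional[set]:
    """Places allowed by the species' hosts, or None if it depends on no host."""
    allowed = None
    for host in objects(species, "dependsOn"):
        host_places = objects(host, "occursIn")
        allowed = host_places if allowed is None else allowed & host_places
    return allowed


def records_to_check(species: str) -> set:
    """Recorded places that fall outside the range its hosts allow."""
    allowed = possible_range(species)
    if allowed is None:
        return set()
    return objects(species, "occursIn") - allowed


print(sorted(possible_range("mosquito_sp")))    # ['County A', 'County B']
print(sorted(records_to_check("mosquito_sp")))  # ['County C']
```

A flagged record is not necessarily wrong. It marks a place where either the host data or the record itself deserves a second look.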

Respectfully,

- Pete

2011/6/3 <Tony.Rees at csiro.au>

> Hi all (jumping in with some trepidation...)
>
> It's good to hear that some ramp-up of activity may be coming in the GNUB
> space (congratulations, Rich et al.). My main concern, however, is that it
> does not solve my particular problem, which is, in a nutshell: given "any"
> cited taxonomic name, what can we tell about it with regard to its
> classification, nomenclatural and taxonomic/synonym status, and certain
> attributes (initially, for my use case, simple geologic time - is it extant
> or not - and simple habitat classification - is it marine or not - though of
> course infinitely expandable from there).
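Tony's use case maps onto a simple record type with a lookup that follows synonyms: given any cited name, report its classification, its status and a few attributes. This is a minimal sketch. The field names, the `lookup` helper and the sample entries are hypothetical illustrations, not the IRMNG schema. `None` stands for "not known", so a missing attribute is never mistaken for "no".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaxonRecord:
    name: str                              # the cited name string
    status: str                            # "accepted" or "synonym"
    accepted_name: Optional[str] = None    # set when status == "synonym"
    classification: list = field(default_factory=list)  # higher taxa, top down
    is_extant: Optional[bool] = None       # None means "not known"
    is_marine: Optional[bool] = None       # None means "not known"


# Illustrative entries only.
REGISTER = {r.name: r for r in [
    TaxonRecord("Danaus plexippus", "accepted",
                classification=["Animalia", "Arthropoda", "Insecta",
                                "Lepidoptera", "Nymphalidae", "Danaus"],
                is_extant=True, is_marine=False),
    TaxonRecord("Papilio plexippus", "synonym", accepted_name="Danaus plexippus"),
]}


def lookup(cited_name: str) -> Optional[TaxonRecord]:
    """Return the accepted record for a cited name, following one synonym link."""
    record = REGISTER.get(" ".join(cited_name.split()))
    if record is not None and record.status == "synonym":
        record = REGISTER.get(record.accepted_name)
    return record


print(lookup("Papilio  plexippus").is_marine)  # False
```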
>
> To me the vision of GNUB is too grand (to index all usages of all names in
> all sources) and the vision of GNI is too limited (to index the names but
> not actually record/harmonise/verify/manage, in a structured way, any
> associated information). I'm after something in between: what I have
> previously, tentatively called HCAL, a hierarchical catalogue of all life
> (presuming that at least one "management" hierarchy is incorporated), or
> maybe just a GTR, a global taxon register. Sort of what we are waiting for
> the Catalogue of Life and/or ITIS to become when complete, for both extant
> and fossil taxa, but also incorporating selected "taxon attributes" as above.
> (This is the space into which my IRMNG database is cast as a
> preliminary/"working for now" solution, but obviously without the
> significant resourcing / community cooperation required to build and
> sustain the thing for the long term.)
>
> So my question is, how can such a product emerge from ongoing developments
> in GN* space, or other...
>
> Over to the experts,
>
> Best - Tony
>
> ________________________________________
> From: tdwg-content-bounces at lists.tdwg.org [
> tdwg-content-bounces at lists.tdwg.org] On Behalf Of Richard Pyle [
> deepreef at bishopmuseum.org]
> Sent: Saturday, 4 June 2011 8:48 AM
> To: tdwg-content at lists.tdwg.org
> Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
>
> Working backwards through this thread...
>
> I hadn't read Dima's post until just now, and I see that at least a couple
> of his points (i.e., #2, #5, #6) apply to exposing the UUIDs externally.
> However, I think that a simple protocol (such as replacing spaces with "_",
> and avoiding characters that look the same but are different -- such as the
> Cyrillic 'a') could go a long way to mitigating those problems.
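Rich's "simple protocol" might look something like the following. This is a minimal sketch that assumes NFC normalisation and a Latin-letters-only rule. `name_to_slug` is a hypothetical helper, not part of GNI, and rejecting non-Latin letters outright is only one possible policy.

```python
import unicodedata


def name_to_slug(name: str) -> str:
    """Turn a name string into a URL path segment:
    1. normalise to Unicode NFC so composed/decomposed accents compare equal,
    2. reject letters that are not Latin (e.g. a Cyrillic lookalike 'а'),
    3. replace each run of whitespace with a single underscore."""
    name = unicodedata.normalize("NFC", name)
    for ch in name:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            raise ValueError(f"non-Latin letter {ch!r} (U+{ord(ch):04X}) in {name!r}")
    return "_".join(name.split())


print(name_to_slug("Danaus plexippus (Linnaeus 1758)"))  # Danaus_plexippus_(Linnaeus_1758)
```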
>
> On the other hand, it really depends on what the identifier is for.  The
> string "Danaus_plexippus_(Linnaeus_1758)" may be more friendly to our eyes,
> but "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" is definitely more friendly to a
> computer (Dima's points 1, 3 & 4, among others).  My feeling is that the
> push for GUIDs is more about enabling computer-computer conversations, than
> it is about enabling human-human or human-computer interactions; and
> therefore we should not get bogged down in the "ugliness" of the
> identifiers.  In the context of electronic data services, the "ugliness"
> potential of the "Danaus_plexippus_(Linnaeus_1758)" approach to identifiers
> is far greater than the ugliness potential of
> "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523", when it comes to interlinking
> electronic biodiversity data.  It is trivial for a computer to render the
> relevant metadata of the object identified by
> "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" as "Danaus plexippus
> (Linnaeus 1758)" on a computer screen or piece of paper for human-eyeball
> consumption.  But there are many pitfalls (some noted by Dima) for a
> computer to unambiguously resolve "Danaus_plexippus_(Linnaeus_1758)" back to
> a meaningful data object.
>
> I guess my revised point is:  GNI (and uBio/NameBank) are essentially the
> only taxonomic databases out there where a human-friendly
> persistent/actionable identifier of the sort being discussed is even
> plausible as an option.  It may not even be wise in this context (as per
> Dima's points), but it *might* be, depending on the need for a
> human-friendly identifier.
>
> Maybe the simplest thing to do would be to not regard
> "http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)"
> as an identifier per se, but rather as a protocol for a web service.  In
> other words, if you append a text string to the root URL
> "http://gni.globalnames.org/name_strings/", GNI would run that text string
> against its index and return whatever metadata it finds based on a
> text-string match.  This is not mutually exclusive with an "identifier" in
> the form of
> "http://gni.globalnames.org/name_strings/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523",
> which would less ambiguously resolve a known record in GNI.  At this point,
> the line between "identifier" and "service" gets fuzzy, of course.  But the
> same analogy holds in ZooBank:
>
> The persistent "identifier" looks like this:
> A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
>
> One way that this identifier can be represented as an *actionable*
> identifier is this:
> urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
>
> Another "actionable" form of the identifier might be this:
>
> http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
>
> or this:
> http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
>
> or even this(?):
>
> http://lsid.tdwg.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
>
> (all of which work, by the way)
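Every actionable form Rich lists is a mechanical wrapping of the same persistent UUID. This sketch rebuilds exactly those patterns for `act` identifiers. The function name is hypothetical, and the code only builds strings. It does not check that they resolve.

```python
import uuid


def zoobank_act_forms(identifier: str) -> list:
    """Wrap one ZooBank nomenclatural-act UUID in the actionable forms listed above."""
    u = str(uuid.UUID(identifier)).upper()   # validates, then uses upper-case hex
    lsid = f"urn:lsid:zoobank.org:act:{u}"
    return [
        lsid,
        f"http://zoobank.org/{lsid}",
        f"http://zoobank.org/{u}",
        f"http://lsid.tdwg.org/{lsid}",
    ]


for form in zoobank_act_forms("a9f435e0-8ed7-46dd-bab4-ea8e5bf41523"):
    print(form)
```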
>
> However, the following are examples of what I would think of as *services*:
> http://www.google.com/search?q=Danaus+plexippus+(Linnaeus+1758)
>
> http://lsid.tdwg.org/summary/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
>
> http://darwin.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523&submit=Go
>
> But really, from the perspective of the end-user, does it matter if it's an
> identifier or a service?  Ultimately, they ask the questions, and the
> answers appear on their computer screens.
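Rich's point that one root URL can serve both roles can be made concrete with a dispatcher that looks only at the shape of the last path segment. This is a hypothetical sketch. It assumes that anything shaped like a UUID is an identifier and that everything else is a text query. GNI's actual routing is not described in this thread.

```python
import re
from urllib.parse import unquote

ROOT = "http://gni.globalnames.org/name_strings/"
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)


def classify(url: str) -> tuple:
    """Treat a UUID-shaped tail as an identifier, anything else as a name search."""
    if not url.startswith(ROOT):
        raise ValueError(f"not under {ROOT}: {url}")
    tail = unquote(url[len(ROOT):])
    if UUID_RE.fullmatch(tail):
        return ("identifier", tail.lower())      # resolves one known record
    return ("search", tail.replace("_", " "))    # text match, may return many


print(classify(ROOT + "Danaus_plexippus_(Linnaeus_1758)"))
```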
>
> Aloha,
> Rich
>
>
>
>
>
> > -----Original Message-----
> > From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-
> > bounces at lists.tdwg.org] On Behalf Of Dmitry Mozzherin
> > Sent: Friday, June 03, 2011 4:34 AM
> > To: David Remsen (GBIF)
> > Cc: tdwg-content at lists.tdwg.org; Dmitry Mozzherin; Orrell, Thomas; Alan J
> > Hampson; Nicolson, David; Gerald Guala
> > Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
> >
> > In my opinion UUIDs have a few advantages over strings --
> >
> > 1. It is a UUID, so it will work with UUID tools (current and future ones)
> > 2. It is less ambiguous -- for example, what is the difference between
> > Betulа and Betula to your eyes? (one of them has a Cyrillic 'a')
> > 3. Database-wise it is faster to search, because it is just a 128-bit
> > number, while a name is at least a 245-byte varchar -- in relational
> > databases smaller keys make searching much faster
> > 4. UUID v5 (http://en.wikipedia.org/wiki/Universally_unique_identifier)
> > allows one to generate a UUID algorithmically without looking up a
> > database (no need for a network connection)
> > 5. Links like
> > http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)
> > might be ambiguous -- I can think of several ways I can represent the name
> > string part in the URL, and they will all resolve to the same thing in GNI.
> > 6. Unescaped Unicode characters in URLs containing literal name strings
> > (people will forget to escape them) will depend on the implementation of
> > the URL resolver
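Python's standard `urllib.parse` makes Dima's points 2, 5 and 6 easy to demonstrate. Four different URL spellings decode to the same name string, and the Cyrillic lookalike only becomes visible once the string is percent-encoded. `canonical_name` is a hypothetical helper, and its underscore-as-space rule is an assumption, not a documented GNI convention.

```python
from urllib.parse import quote, unquote_plus


def canonical_name(path_segment: str) -> str:
    """Decode one URL spelling of a name string back to the plain string.
    Assumes '_' and '+' both stand for a space, a convention a resolver has to guess."""
    text = unquote_plus(path_segment)          # %XX escapes decoded, '+' -> space
    return " ".join(text.replace("_", " ").split())


variants = [
    "Danaus_plexippus_(Linnaeus_1758)",
    "Danaus+plexippus+(Linnaeus+1758)",
    "Danaus%20plexippus%20(Linnaeus%201758)",
    "Danaus_plexippus_%28Linnaeus_1758%29",
]
print({canonical_name(v) for v in variants})   # {'Danaus plexippus (Linnaeus 1758)'}

# Dima's point 2: the lookalikes only differ once percent-encoded.
print(quote("Betula"), quote("Betul\u0430"))   # Betula Betul%D0%B0
```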
> >
> > That said, links like
> > http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)
> > are definitely attractive, and it is good to have them as another way to
> > access a name!
> > My personal preference would be not to use them as the main identifier,
> > for reasons 1, 2, 3 and 5.
> >
> > Dima
> >
> >
> >
> >
> > On Fri, Jun 3, 2011 at 7:59 AM, David Remsen (GBIF) <dremsen at gbif.org>
> > wrote:
> > > Why not use the name as the basis for the resolvable identifier
> > > instead of a UUID?  Isn't there a 1:1 cardinality between the name and
> > > the UUID in the GNI?  Doesn't that mean that
> > >
> > > http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec
> > > and
> > > http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)
> > >
> > > are equally unique?  The latter is certainly more readable.  In those
> > > cases where the namestring is a homonym like
> > >
> > > http://gni.globalnames.org/name_strings/Oenanthe
> > >
> > > couldn't you just return the addresses of the two globally unique
> > > forms of the name when you resolve it?
> > >
> > > http://gni.globalnames.org/name_strings/Oenanthe_Smith_1899
> > >
> > > http://gni.globalnames.org/name_strings/Oenanthe_Jones_1900
> > >
> > > Wouldn't those be as globally unique and easier to read and adjust to?
> > > Or am I missing something?  I always wanted to do that with uBio IDs
> > > after a back and forth with Gregor Hagedorn, and wished we hadn't
> > > exposed those integers.
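David's suggestion amounts to a prefix lookup: when a bare name is ambiguous, answer with the addresses of its fully qualified forms. This is a minimal sketch over a hypothetical in-memory index. Over HTTP, such an answer might be a 300 Multiple Choices response or a small list document rather than a Python list.

```python
ROOT = "http://gni.globalnames.org/name_strings/"

# Hypothetical index; the two Oenanthe authorities are David's illustrative ones.
INDEX = [
    "Oenanthe Smith 1899",
    "Oenanthe Jones 1900",
    "Danaus plexippus (Linnaeus 1758)",
]


def resolve(slug: str) -> list:
    """URLs of every indexed name string equal to, or extending, the requested name.
    One result: unambiguous.  Several results: a homonym list for the caller."""
    wanted = slug.replace("_", " ")
    hits = [n for n in INDEX if n == wanted or n.startswith(wanted + " ")]
    return [ROOT + n.replace(" ", "_") for n in hits]


for url in resolve("Oenanthe"):
    print(url)
```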
> > >
> > > DR
> > >
> > >> Hi Steve,
> > >>
> > >> I don't have time to go through this in detail, and I can't speak for
> > >> the GNI, but I can tell you how the GNI URIs work, at least for now.
> > >>
> > >> A while back Dima Mozzherin and I were looking into how triples etc.
> > >> might be of use to the GNI.
> > >>
> > >> We needed a way to generate unique URIs for each name.
> > >>
> > >> We wanted to avoid having to keep these in sync and not require
> > >> everyone to look each ID up through some service.
> > >>
> > >> Dima came up with the following plan. We use the namestring as seed
> > >> to generate a unique UUID.
> > >>
> > >> Basically this is a shared algorithm which the GNI and TaxonConcept
> > >> both use. But it could be used by anyone.
> > >>
> > >> You feed the name string to the algorithm and it spits out a UUID. We
> > >> then append that to a URI and web service so it is resolvable.
> > >>
> > >> So the name Danaus plexippus (Linnaeus 1758) =>
> > >> 4ef223c4-0c3e-5e84-ace9-755c34c601ec
> > >>
> > >> So if the GNI and another group have the same name string, they
> > >> have the same UUID.
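The scheme Pete describes is a name-based (version 5) UUID. A fixed namespace is hashed together with the name string, so anyone who runs the same algorithm on the same string gets the same UUID. This minimal sketch uses Python's standard `uuid` module. The namespace is an assumption, flagged in the code, so it is not verified here whether the sketch reproduces the 4ef223c4-0c3e-5e84-ace9-755c34c601ec value quoted above. Note also that "Quercus rubra L." and "Quercus rubra" yield different UUIDs: the scheme identifies exact strings, not names.

```python
import uuid

# ASSUMPTION: the namespace UUID below is derived from the DNS name
# "globalnames.org".  Check GNI's own documentation for the exact namespace
# before expecting the same UUIDs that GNI publishes.
GN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")


def name_string_uuid(name_string: str) -> uuid.UUID:
    """UUID v5: SHA-1 of namespace bytes + UTF-8 name string, cut to 128 bits
    with version/variant bits set.  Same string in, same UUID out, with no
    database lookup and no network call."""
    return uuid.uuid5(GN_NAMESPACE, name_string)


def gni_uri(name_string: str) -> str:
    return f"http://gni.globalnames.org/name_strings/{name_string_uuid(name_string)}"


u = name_string_uuid("Danaus plexippus (Linnaeus 1758)")
print(u.version, len(u.bytes))   # 5 16  (a 128-bit key, cf. Dima's point 3)
print(gni_uri("Danaus plexippus (Linnaeus 1758)"))
```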
> > >>
> > >> People can then link their data set to the GNI with the following
> > >> URI
> > >>
> > >> http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec
> > >>
> > >> RDF
> > >> http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec.rdf
> > >>
> > >> If you think of your data set as one table and the GNI
> > >> as another, this URI serves as the foreign key that connects them
> > >> together.
> > >>
> > >> Some on the list don't like how these look, but there is a tremendous
> > >> advantage in not having to worry about syncing two large data sets
> > >> and determining if a given integer is already in use.
> > >>
> > >> Rod Page has also written recently about UUIDs:
> > >> http://iphylo.blogspot.com/2011/05/zoobank-on-couchdb-uuids-replication.html
> > >>
> > >> There may be a way to do something similar with bit.ly-like
> > >> identifiers that are shorter (mCcSp), but I think the general idea
> > >> is a good one.
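On Pete's aside about shorter, bit.ly-like identifiers: a UUID can be shortened without any lookup table by writing the same 128-bit number in a larger alphabet. This sketch assumes a base-62 alphabet of digits and ASCII letters. It is not how bit.ly mints its codes. The result is still up to 22 characters long, because a five-character code like mCcSp cannot hold 128 bits (62^5 is under a billion).

```python
import string
import uuid

ALPHABET = string.digits + string.ascii_letters   # 62 URL-safe symbols


def short_id(u: uuid.UUID) -> str:
    """Re-encode the UUID's 128-bit integer in base 62 (at most 22 characters)."""
    n = u.int
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def from_short_id(s: str) -> uuid.UUID:
    """Invert short_id: decode base 62 back to the original UUID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return uuid.UUID(int=n)


u = uuid.UUID("4ef223c4-0c3e-5e84-ace9-755c34c601ec")
s = short_id(u)
print(s, len(s))
```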
> > >>
> > >> If you recall from my talk at TDWG, I was able to use these to make
> > >> statements that one name string was a synonym (etc.) of another.
> > >>
> > >> The algorithm we use is written in Ruby, but it could be ported to many
> > >> different languages since UUIDs are widely supported.
> > >>
> > >> Respectfully,
> > >>
> > >> - Pete
> > >>
> > >>
> > >>
> > >> On Thu, Jun 2, 2011 at 11:41 PM, Steven J. Baskauf <
> > >> steve.baskauf at vanderbilt.edu> wrote:
> > >>
> > >>>  My email access has been sporadic since this thread developed, so
> > >>> at this point I'll respond to points made in several of the
> > >>> messages.
> > >>>
> > >>> First, I should note that there has been previous discussion on this
> > >>> list on a similar topic from
> > >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html
> > >>> through
> > >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html.
> > >>> One can review what was said at that time rather quickly by starting
> > >>> on the first linked message and clicking on the "Next Message" link
> > >>> until you get to the end of the range I gave above.
> > >>>
> > >>> My reason for the request for information that started this thread
> > >>> was that I wanted to link to a URI that would anchor the name
> > >>> portion of a name/sensu pair (TNU or Taxon Concept a la TCS if you
> > >>> prefer) as in this RDF
> > >>> snippet:
> > >>>
> > >>>    <tc:nameString>Quercus rubra L.</tc:nameString>
> > >>>    <tc:hasName rdf:resource="http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:448439"/>
> > >>>
> > >>>
> > >>> At this point in the discussion, I'm not actually talking about
> > >>> creating a link to a taxon concept but rather to a taxon name, so
> > >>> some of the issues Pete raised don't apply here (e.g. what's the
> > >>> "right" name for a concept; the question here is simply what's a
> > >>> stable identifier for the name).  In principle, I could probably just
> > >>> provide the name string and be done with it.  However, having some
> > >>> degree of faith that Smart, Computer Savvy People might some day be
> > >>> able to use the metadata returned by the URI (or perhaps metadata
> > >>> which they already have in a triple store onsite) to do cool things
> > >>> like knowing that my name is the same as an orthographic variant or
> > >>> that "Quercus rubra  L." is basically the same thing as "Quercus
> > >>> rubra", I would like to also provide a functional URI.
> > >>>
> > >>> As an end-user who isn't very interested in the technical issues
> > >>> involving names, I don't really care what URI I use.  I would prefer
> > >>> for it to be widely recognized and for it to "work" (i.e. be
> > >>> resolvable).  In the earlier (January) thread, there was discussion
> > >>> about existing identifiers.  There were a number of posts, but in
> > >>> particular
> > >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002258.html
> > >>> and
> > >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002259.html
> > >>> discussed the relative merits of ITIS and uBio ID numbers.  My
> > >>> take-home message from this was that uBio represented the largest
> > >>> single set of names with assigned identifiers (see
> > >>> http://gni.globalnames.org/data_sources cited in Pete's email) and
> > >>> that uBio metadata provides useful references.
> > >>> Hence my interest in referencing uBio IDs as a URI.  However, as a
> > >>> practical matter, the organizations that I share images with either
> > >>> want ITIS TSNs (EOL and Morphbank) or just names (Discover Life).
> > >>> Nobody is asking for uBio identifiers or any other identifier.
> > >>>
> > >>> I found Kevin's comment at
> > >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002486.html
> > >>> very
> > >>> thought-provoking: "My thoughts are that the most likely way this
> > >>> will be solved is by standard market type pressures - ie the best
> > >>> solution/IDs will be used the most and 'float' to the top."  I'm not
> > >>> going to make a judgment about what is the "best" solution or ID.
> > >>> But I would say that in "computer"
> > >>> history, being the "best" doesn't necessarily mean that something
> > >>> will be used.  Take for example, the FOAF vocabulary.  What the heck
> > >>> is Friend of a Friend?  I would venture to say that most of the
> > >>> people using the FOAF vocabulary don't know or care.  The FOAF
> > >>> vocabulary was the one that people started to use and once that
> > >>> happened, people didn't switch even if there was something better.
> > >>> I'm not familiar with the history of other stuff like YouTube and
> > >>> Craigslist, but I would guess that they weren't necessarily "the
> > >>> best" systems - they were just the ones that the most people started
> > >>> using first, and once that happened, people didn't switch.  I'm using
> > >>> ITIS IDs because they are easy to get and the people I communicate
> > >>> with want them.  Whether they are the "best" or "done correctly"
> > >>> doesn't matter to me as much as the fact that they are widely
> > >>> recognized and stable (and that thus far every name that I've looked
> > >>> for has been in their database).
> > >>>
> > >>> I think that one reason why this question has been on my mind is
> > >>> that I've been waiting for GNUB (Global Name Use Bank) to come out.
> > >>> I'm not really up on how it is going to work, but my impression is
> > >>> that it was going to be based on the Global Name Index (GNI) which
> > >>> was mentioned in that earlier January thread.  At that point, the
> > >>> GNI names didn't have any identifiers that were exposed to the
> > >>> public as permanent GUIDs.  I'm assuming that if GNUB refers to GNI
> > >>> names, they will have some kind of identifiers.  So if that happens,
> > >>> how is GUID recommendation 8 going to be followed?  As Kevin
> > >>> said in
> > >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-June/002499.html
> > >>> "What I take from recommendation 8 of the GUID applicability guide
> > >>> ... is that if you DON'T already have a record in your own database
> > >>> for a taxon name/concept, then reuse an existing one.  "  What we
> > >>> have here with GNI is a situation where none of the records have
> > >>> identifiers.  In my mind, the "best practice" according to
> > >>> recommendation 8 would be for the GNI to reuse existing identifiers
> > >>> where they exist and NOT make up new ones.  This is a bit more
> > >>> complicated because the ITIS identifiers (which are in common use)
> > >>> don't have an http URI version that is resolvable, and while the
> > >>> uBio identifiers have a resolvable http URI, it's in the form of a
> > >>> proxied LSID, which I've already complained is very ugly.  So I'd
> > >>> like to hear some ideas about how to have "reused" identifiers in
> > >>> the GNI.
> > >>>
> > >>> One thing that comes to my mind would be to have a "domain name"
> > >>> like "http://purl.org/gni/" or "http://purl.org/tn/" ("tn" for
> > >>> "taxon name") and to follow it with a namespace/id combination
> > >>> similar to what is done with LSIDs.  So for example "itis/19408" and
> > >>> "ubio/448439" could be appended, creating
> > >>> http://purl.org/gni/itis/19408 and http://purl.org/gni/ubio/448439
> > >>> for "Quercus rubra  L."  Both URIs could point to the same RDF, and
> > >>> that RDF could indicate that the two identifiers are owl:sameAs.  I
> > >>> realize from what Bob Morris has cautioned in the past that there are
> > >>> problems with owl:sameAs when the two things aren't actually the same
> > >>> thing (e.g. if the uBio ID refers to a name string only but the ITIS
> > >>> TSN refers to the name plus an "accepted" status and a relationship
> > >>> to parent taxa).  However, if there were an understanding that the
> > >>> GNI only refers to name strings, then one could still refer to
> > >>> http://purl.org/gni/itis/19408 as an identifier for the name string
> > >>> of the thing (whatever it is) that is referred to by an ITIS TSN of
> > >>> 19408.  I don't think there would be a problem saying that it and the
> > >>> uBio ID were "owl:sameAs".  Some kind of solution like this would
> > >>> allow people to easily generate a resolvable URI for a name if they
> > >>> were using ITIS TSNs or uBio IDs.  If the name that one wanted to
> > >>> use was so obscure that it was one of the 9.5 million names that
> > >>> uBio has that ITIS doesn't have, then that name would only have the
> > >>> uBio version.  I have no idea whether this would be a good idea or
> > >>> not, but I was really cringing to think about 19 million newly
> > >>> minted UUIDs appended to "http://gni.globalnames.org/" and figuring
> > >>> out how to connect those horrid things to the names and ITIS TSNs
> > >>> that I'm already using.  I think that I said this before, but using
> > >>> the purl.org domain rather than one like http://gni.globalnames.org/
> > >>> would in the future allow somebody else to take over management of
> > >>> providing the metadata when the GUIDs are resolved, without having
> > >>> to deal with issues of who "owns" the domain name.
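Steve's namespace/id scheme needs nothing more than string construction, and the equivalence can be published as plain N-Triples. In this sketch, `name_uri` and `same_as` are hypothetical helpers. The `http://purl.org/gni/` base is Steve's proposal, not a registered PURL. The ITIS 19408 / uBio 448439 pair is the one he gives for "Quercus rubra L.".

```python
# Hypothetical base URI from Steve's proposal; no such PURL is assumed to exist.
BASE = "http://purl.org/gni/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def name_uri(namespace: str, local_id: str) -> str:
    """Build a namespace/id URI such as http://purl.org/gni/itis/19408."""
    return f"{BASE}{namespace}/{local_id}"


def same_as(a: str, b: str) -> str:
    """One owl:sameAs statement as an N-Triples line.
    Only safe if both URIs denote the *name string*, per Bob Morris's caveat."""
    return f"<{a}> <{OWL_SAME_AS}> <{b}> ."


print(same_as(name_uri("itis", "19408"), name_uri("ubio", "448439")))
```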
> > >>>
> > >>> Steve
> > >>>
> > >>>
> > >>>
> > >>> Kevin Richards wrote:
> > >>>
> > >>>  Pete,
> > >>>
> > >>> I'm not trying to say what you are doing is a waste of time/impossible.
> > >>> I actually think RDF + semantics are a good way forward, but this
> > >>> really implies that we need to rely on the semantics and linkages
> > >>> rather than having a SINGLE ID for a taxon name (which is what I
> > >>> thought Steve was getting at).  Each instance of a taxon name can
> > >>> have its own ID, and then all these instances are connected via
> > >>> ontology-defined semantic links.  This seems more appropriate to me
> > >>> than insisting everyone uses the "Global Taxon Name ID X".
> > >>>
> > >>>
> > >>>
> > >>> In your example of *Aedes triseriatus* and *Ochlerotatus
> > >>> triseriatus*, these are two different names, so they need two
> > >>> different IDs; they may be linked by a single taxon concept, but
> > >>> they are separate names.  So which of these three IDs do you expect
> > >>> people to use, and according to what source?
> > >>>
> > >>>
> > >>>
> > >>> For example, if we have a name, e.g. the Robin, Erithacus rubecula,
> > >>> mentioned in ITIS (TSN: 559964) and also in EOL
> > >>> (www.eol.org/pages/1051567), in GBIF
> > >>> (http://data.gbif.org/species/21266780) and in Avibase
> > >>> (http://avibase.bsc-eoc.org/species.jsp?avibaseid=C809B2B90399A43D),
> > >>> which ID are you hoping people will use?  Would you put the ITIS ID
> > >>> in your own dataset as the ID for that name?  Unlikely.  Or would it
> > >>> be better to link them up with semantic linkages?
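Under Kevin's alternative, each source keeps its own ID, and "the ID for this name" becomes a set computed from the links. This is a minimal sketch using union-find. The `source:id` prefixes are a hypothetical convention. The three links are illustrative assertions, not mappings published by ITIS, EOL, GBIF or Avibase. Union-find treats every link as symmetric and transitive, which is the owl:sameAs behaviour Bob Morris warned about: one link that confuses a name with a concept merges whole clusters.

```python
class IdLinks:
    """Union-find over identifiers.  Each link asserts that two IDs denote the
    same name; a cluster is everything reachable through such links."""

    def __init__(self):
        self.parent = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

    def cluster(self, x: str) -> set:
        root = self.find(x)
        return {y for y in list(self.parent) if self.find(y) == root}


links = IdLinks()
links.link("itis:559964", "eol:1051567")
links.link("eol:1051567", "gbif:21266780")
links.link("gbif:21266780", "avibase:C809B2B90399A43D")
print(sorted(links.cluster("avibase:C809B2B90399A43D")))
```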
> > >>>
> > >>>
> > >>>
> > >>> What I take from recommendation 8 of the GUID applicability guide
> > >>> (as Steve puts it, "stop making up new identifiers when somebody else
> > >>> already has one for the thing you are talking about") is that if you
> > >>> DON'T already have a record in your own database for a taxon
> > >>> name/concept, then reuse an existing one, NOT ditch all your current
> > >>> IDs and adopt someone else's (especially hard considering how
> > >>> difficult it is to work out which of the multitude of name and
> > >>> concept IDs directly relates to your taxon name).
> > >>>
> > >>>
> > >>>
> > >>> I am all for limiting the number of IDs for the "same" thing, but in
> > >>> some cases it is more useful to build linkages than force this tight
> > >>> integration of data and IDs, especially for taxon names and concepts,
> > >>> where it is complex to define whether you are even talking about the
> > >>> "same" thing or not.
> > >>>
> > >>>
> > >>>
> > >>> Kevin
> > >>>
> > >>>
> > >>>
> > >>> *From:* Peter DeVries [mailto:pete.devries at gmail.com]
> > >>>
> > >>> *Sent:* Wednesday, 1 June 2011 12:38 p.m.
> > >>> *To:* Kevin Richards
> > >>> *Cc:* Steve Baskauf; tdwg-content at lists.tdwg.org; Gerald Guala;
> > >>> Nicolson,
> > >>> David; Alan J Hampson; Orrell, Thomas
> > >>> *Subject:* Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
> > >>>
> > >>>
> > >>>
> > >>> Hi Kevin,
> > >>>
> > >>>
> > >>>
> > >>> I forgot to mention some other things that are different about my
> > >>> project.
> > >>>
> > >>>
> > >>>
> > >>> You can write a simple SPARQL query to get a list of all the
> > >>> TaxonConcepts that have ITIS IDs, or all those that have ITIS and
> > >>> NCBI IDs, etc.
> > >>>
> > >>>
> > >>>
> > >>> You can do this on any SPARQL endpoint that hosts the data.
> > >>>
> > >>>
> > >>>
> > >>> You can download the entire data set and run the queries on your own
> > >>> endpoint.
> > >>>
> > >>>
> > >>>
> > >>> You can write a script that runs the query and downloads the ITIS
> > >>> numbers
> > >>> and exports them to CSV etc.
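Pete's last step is a script that runs the query and exports the ITIS numbers to CSV. It can be sketched by converting the standard SPARQL 1.1 JSON results format (what an endpoint returns for `Accept: application/sparql-results+json`) into CSV. The query is only illustrative: the `tc:` prefix and `tc:hasITISId` predicate are hypothetical placeholders, and the sample result is hand-written. The sketch leaves out sending the query to a real endpoint, so it runs offline. SPARQL 1.1 also defines a CSV results format that many endpoints can return directly.

```python
import csv
import io
import json

# Hypothetical query: the tc: namespace and tc:hasITISId are placeholders,
# not the actual TaxonConcept vocabulary.
QUERY = """
PREFIX tc: <http://example.org/taxonconcept-terms#>
SELECT ?concept ?itis WHERE { ?concept tc:hasITISId ?itis }
"""


def results_to_csv(results_json: str) -> str:
    """Convert a SPARQL 1.1 JSON results document to CSV text (header row first)."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(variables)
    for binding in data["results"]["bindings"]:
        # A variable left unbound (e.g. by OPTIONAL) is absent from the binding.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return buf.getvalue()


sample = json.dumps({
    "head": {"vars": ["concept", "itis"]},
    "results": {"bindings": [
        {"concept": {"type": "uri", "value": "http://example.org/concept/1"},
         "itis": {"type": "literal", "value": "19408"}},
    ]},
})
print(results_to_csv(sample), end="")
```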
> > >>>
> > >>>
> > >>>
> > >>> - Pete
> > >>>
> > >>>
> > >>>
> > >>> On Tue, May 31, 2011 at 5:16 PM, Peter DeVries <pete.devries at gmail.com>
> > >>> wrote:
> > >>>
> > >>> Hi Kevin,
> > >>>
> > >>> On Tue, May 31, 2011 at 3:27 PM, Kevin Richards <
> > >>> RichardsK at landcareresearch.co.nz> wrote:
> > >>>
> > >>> This is exactly why this problem still exists and will be very
> > >>> complex to solve - everyone says "we should have a single ID for a
> > >>> specific taxon name; there seem to be several IDs 'out there' that
> > >>> refer to the same taxon name, so I'm going to create another ID to
> > >>> link them all up" - yet another ID that no one will particularly want
> > >>> to follow.  You would have to get everyone to agree that your
> > >>> combination/integration of taxon names is the best one and hope
> > >>> everyone follows it - unlikely in this domain.
> > >>>
> > >>>
> > >>>
> > >>> Isn't this kind of what The Plant List and eBird already do?
> > >>>
> > >>>
> > >>>
> > >>> One difference is that they tie these to a specific name and a
> > >>> specific classification.
> > >>>
> > >>>
> > >>>
> > >>> The Plant List is not really even open, so it is difficult for people
> > >>> to adopt it en masse.
> > >>>
> > >>>
> > >>>
> > >>> For instance, if I manage a herbarium, how do I easily reconcile my
> > >>> species
> > >>> list with the entities represented in the Plant List?
> > >>>
> > >>>
> > >>>
> > >>> eBird has millions of records, which implies that they have been able
> > >>> to convince the observers in the field to adopt their system.  You
> > >>> are correct that there are probably a lot of taxonomists who don't
> > >>> like their list.
> > >>>
> > >>> It differs from many of the other classifications, but remember that
> > >>> the system rewards them for not agreeing.  Note the difference
> > >>> between the microbial taxonomists and other taxonomists.  In the case
> > >>> of the microbial workers, the system rewards them for solving
> > >>> problems, not debating alternatives.  Also, if a good idea comes out
> > >>> that will make it easier for the microbiologists to solve the
> > >>> problems they are rewarded for solving, they are less likely to care
> > >>> whose idea it is.
> > >>>
> > >>>
> > >>>
> > >>> Like the microbiologists, there are lots of biologists that work with
> > >>> species with the goal of addressing some non-taxonomic problem.
> > >>>
> > >>>
> > >>>
> > >>> They don't really care if the name is *Aedes triseriatus* or
> > >>> *Ochlerotatus triseriatus*, but they do care that the identifier that
> > >>> they connect their data to is stable.
> > >>>
> > >>>
> > >>>
> > >>> Regarding the issue of market forces, I suspect (but have no
> > >>> knowledge of) that there were probably decisions made in devising
> > >>> these lists that have more to do with appeasing certain personalities
> > >>> than creating the best list.  With the way this system rewards people,
> > >>> it is likely that the "correct" version will float to the top only
> > >>> after that person has passed away.  I don't have much faith that the
> > >>> best system will always float to the top.  That has a lot to do with
> > >>> the personalities and how the system rewards are set up.
> > >>> Theoretically, it is possible for one strong personality or group to
> > >>> force others to adopt their less-than-optimal solution - at least
> > >>> this seems to happen in other environments.
> > >>>
> > >>>
> > >>>
> > >>> Also, there are all sorts of ways that people can use the publication
> > >>> record to rewrite history.  Simply cite the review paper that cites
> > >>> the original paper.  Or don't cite it at all.
> > >>>
> > >>>
> > >>>
> > >>> I would have used only the ITIS TSN but if the name changes the ID
> > >>> changes.
> > >>> This isn't "wrong", it just does not solve my problem.
> > >>>
> > >>>
> > >>>
> > >>> * ITIS also should add the spiders from the World Spider Catalog.
> > >>>
> > >>>
> > >>>
> > >>> Another issue that I think has inhibited adoption of a common list is
> > >>> that people can't agree on a particular name or a particular
> > >>> classification.
> > >>>
> > >>>
> > >>>
> > >>> Since you can model a species concept as having many names and many
> > >>> classifications, why not do so?
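Pete's suggestion fits Kevin's count of three IDs: one concept ID that survives renaming, plus one ID per name string. This is a minimal sketch. The concept ID, the namespace and the "source A"/"source B" labels are hypothetical. The two generic placements simply mirror the Aedes/Ochlerotatus example in this thread.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical namespace for name IDs; not GNI's or TaxonConcept's.
NAME_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "names.example.org")


@dataclass
class SpeciesConcept:
    concept_id: str                                        # stable, survives renaming
    names: dict = field(default_factory=dict)              # name string -> name ID
    classifications: dict = field(default_factory=dict)   # source -> higher taxa

    def add_name(self, name_string: str) -> str:
        """Attach a name; its ID depends only on the string, not on the concept."""
        name_id = str(uuid.uuid5(NAME_NS, name_string))
        self.names[name_string] = name_id
        return name_id


mosquito = SpeciesConcept("concept:example-0001")
mosquito.add_name("Aedes triseriatus")
mosquito.add_name("Ochlerotatus triseriatus")
mosquito.classifications["source A"] = ["Culicidae", "Aedes"]
mosquito.classifications["source B"] = ["Culicidae", "Ochlerotatus"]
print(len({mosquito.concept_id, *mosquito.names.values()}))  # 3
```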
> > >>>
> > >>>
> > >>>
> > >>> If this idea had been accepted originally, I would not have needed to
> > >>> create TaxonConcept.org.
> > >>>
> > >>>
> > >>>
> > >>> My plan has always been to get something that works to solve some
> > >>> problems and then let some larger group take it over.
> > >>>
> > >>>
> > >>>
> > >>> In a sense, I am more like the microbiologists in that I am not being
> > >>> paid
> > >>> to solve this or debate this problem.
> > >>>
> > >>>
> > >>>
> > >>> I am doing it because I think something like this is needed, and it
> > >>> is an interesting and personally rewarding puzzle.
> > >>>
> > >>>
> > >>>
> > >>> - Pete
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> My thoughts are that the most likely way this will be solved is by
> > >>> standard market-type pressures - i.e. the best solution/IDs will be
> > >>> used the most and "float" to the top.  It is easy to say that the
> > >>> global taxon name data is a mess, but if you think about it, 30 years
> > >>> ago taxon name data were very disparate, duplicated, unconnected, many
> > >>> with NO IDs at all.  So I believe we are making progress and that we
> > >>> will continue to do so, albeit at a fairly slow rate.
> > >>>
> > >>> Kevin
> > >>>
> > >>>
> > >>>
> > >>> "I agree. This was one of the reasons that I set up TaxonConcept the way
> > >>> I did. It attempts to connect both the LOD entities and the
> > >>> foreign-key-based entities."
> > >>>
> > >>> Please consider the environment before printing this email
> > >>> Warning: This electronic message together with any attachments is
> > >>> confidential. If you receive it in error: (i) you must not read, use,
> > >>> disclose, copy or retain it; (ii) please contact the sender immediately
> > >>> by reply email and then delete the emails.
> > >>> The views expressed in this email may not be those of Landcare Research
> > >>> New Zealand Limited. http://www.landcareresearch.co.nz
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>>
> > >>> ------------------------------------------------------------------------------------
> > >>> Pete DeVries
> > >>> Department of Entomology
> > >>> University of Wisconsin - Madison
> > >>> 445 Russell Laboratories
> > >>> 1630 Linden Drive
> > >>> Madison, WI 53706
> > >>> Email: pdevries at wisc.edu
> > >>> TaxonConcept <http://www.taxonconcept.org/>  &
> > >>> GeoSpecies<http://about.geospecies.org/> Knowledge
> > >>> Bases
> > >>> A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
> > >>>
> > >>>
> > >>> --------------------------------------------------------------------------------------
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Steven J. Baskauf, Ph.D., Senior Lecturer
> > >>> Vanderbilt University Dept. of Biological Sciences
> > >>>
> > >>> postal mail address:
> > >>> VU Station B 351634
> > >>> Nashville, TN  37235-1634,  U.S.A.
> > >>>
> > >>> delivery address:
> > >>> 2125 Stevenson Center
> > >>> 1161 21st Ave., S.
> > >>> Nashville, TN 37235
> > >>>
> > >>> office: 2128 Stevenson Center
> > >>> phone: (615) 343-4582,  fax: (615) 343-6707
> > >>> http://bioimages.vanderbilt.edu
> > >>>
> > >>>
> > >>
> > >>
> > >> _______________________________________________
> > >> tdwg-content mailing list
> > >> tdwg-content at lists.tdwg.org
> > >> http://lists.tdwg.org/mailman/listinfo/tdwg-content
> > >>
> > >
> > >
> > >
> > >
> > > ----------------------------------------------------------------------------
> > > David Remsen, Senior Programme Officer
> > > Electronic Catalog of Names of Known Organisms
> > > Global Biodiversity Information Facility Secretariat
> > > Universitetsparken 15, DK-2100 Copenhagen, Denmark
> > > Tel: +45-35321472   Fax: +45-35321480
> > > Skype: dremsen
> > >
> > > ----------------------------------------------------------------------------
> > >
> > >
> > >
> > >
>
>
>



-- 
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries at wisc.edu
TaxonConcept <http://www.taxonconcept.org/>  &
GeoSpecies<http://about.geospecies.org/> Knowledge
Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
--------------------------------------------------------------------------------------

