[tdwg-content] ITIS TSNID to uBio NamebankIDs mapping

Peter DeVries pete.devries at gmail.com
Tue May 31 22:04:52 CEST 2011


Hi David,

Thanks for this heads up. The ITIS IDs that I have come either from checking
manually or from matching on names in one of your downloads.

In the database that feeds TaxonConcept I try to store both the ITIS ID and
the "Scientific Name" that is tied to that ID.

I do this so I know whether ITIS uses the same name or a different one for
what I think is the same concept.
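
A minimal sketch of that bookkeeping, assuming a hypothetical record layout
(the class, field names, and sample values are illustrative only, not the
actual TaxonConcept schema):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConceptRecord:
    """One species concept plus its ITIS cross-reference (hypothetical layout)."""
    concept_name: str          # name used locally for the concept
    itis_tsn: Optional[int]    # ITIS TSN, if one has been mapped
    itis_name: Optional[str]   # scientific name ITIS ties to that TSN


def _normalize(name: str) -> str:
    # Collapse runs of whitespace and ignore case before comparing.
    return " ".join(name.split()).casefold()


def name_status(rec: ConceptRecord) -> str:
    """Return 'same', 'different', or 'unmapped' for the ITIS name."""
    if rec.itis_tsn is None or rec.itis_name is None:
        return "unmapped"
    if _normalize(rec.concept_name) == _normalize(rec.itis_name):
        return "same"
    return "different"


# Placeholder names and TSNs, not real ITIS data.
records = [
    ConceptRecord("Aus bus", 1, "Aus bus"),
    ConceptRecord("Aus bus", 2, "Aus cus"),
    ConceptRecord("Aus dus", None, None),
]
statuses = [name_status(r) for r in records]
```

Keeping the ITIS name next to the ID means a changed or mismatched name shows
up as "different" instead of passing silently.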

Here is an example of a species concept where my name and the ITIS name
differ.

http://lod.taxonconcept.org/ses/sDHAp.html

The lexical groups are there to indicate that there is some association
between a given namestring and the concept.

It does not mean that these are true synonyms. They are only there to allow
pattern matching between namestrings and data that might pertain to the
species concept.

Along the lines of Steve's earlier email it would be "relatively easy" or
perhaps relatively straightforward to create LOD URI's from your data set
with a separate controller that outputs either RDF or RDFa.

The URIs could look like http://www.itis.gov/tsn/113839 or, since Ruby on
Rails prefers plural controller names, http://www.itis.gov/tsns/113839
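
As a rough sketch of what such a controller would have to decide (the URI
pattern is only the one proposed in this message, not a confirmed ITIS
endpoint, and the function names are hypothetical):

```python
def tsn_uri(tsn: int, base: str = "http://www.itis.gov/tsn/") -> str:
    """Build an HTTP URI from a TSN using the proposed (assumed) pattern."""
    if tsn <= 0:
        raise ValueError("TSN must be a positive integer")
    return f"{base}{tsn}"


def pick_format(accept_header: str) -> str:
    """Choose a representation from an HTTP Accept header.

    Deliberately simplified: media-type parameters such as q-values are
    stripped and ignored, so this is not full content negotiation.
    """
    offered = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "application/rdf+xml" in offered:
        return "rdf+xml"
    return "html"  # default for browsers and unknown clients


uri = tsn_uri(113839)
fmt_for_browser = pick_format("text/html,application/xhtml+xml")
fmt_for_lod_client = pick_format("application/rdf+xml, text/html;q=0.5")
```

A real controller would also honour q-values and send the matching
Content-Type header. The point is only that one URI can serve both people
and machines.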

I could mark up an RDF / RDFa example of one of your records in DarwinCore
RDF if that would help people get their heads around this.
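
A sketch of what producing such a record could look like, using only the
Python standard library. The subject URI follows the pattern proposed above,
and the name and authorship values are placeholders, not the actual ITIS
record for that TSN:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace


def record_to_rdfxml(uri: str, scientific_name: str, authorship: str) -> str:
    """Serialize one name record as RDF/XML using two Darwin Core terms."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dwc", DWC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri}
    )
    # Name and authorship go in separate fields.
    ET.SubElement(desc, f"{{{DWC_NS}}}scientificName").text = scientific_name
    ET.SubElement(desc, f"{{{DWC_NS}}}scientificNameAuthorship").text = authorship
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
rdf_xml = record_to_rdfxml(
    "http://www.itis.gov/tsn/113839", "Aus bus", "(Author 1900)"
)
```

An RDFa version would carry the same two statements as attributes inside the
HTML page rather than in a separate XML document.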

RDFa is a form in which the RDF markup is embedded within the HTML page
itself. Here is a page with links to RDFa examples:
http://rdfa.info/wiki/Examples-in-the-wild

and a Wikipedia page on RDFa http://en.wikipedia.org/wiki/RDFa

Respectfully,

- Pete

On Tue, May 31, 2011 at 2:16 PM, Nicolson, David <NICOLSOD at si.edu> wrote:

> Dear Pete and Steve,
>
>
>
> I cannot comment on the technical content of your emails (sorry, I'm a
> content guy!), but I do note this comment by Pete:
>
> "Most of the plants also have the USDA Plants identifier. In fact you
> might be able to get the ITIS numbers via the USDA Plants Database."
>
>
>
> I would not recommend getting ITIS TSNs from any other source than ITIS
> (see my prior email for some how-to ideas).
>
>
>
> Firstly, the PLANTS Symbol-to-TSN matches were not always well managed due
> to some technical issues (involving early changes to ITIS, some due to
> problematic bulk-updates of ITIS from the PLANTS data, or sometimes due to
> other artifacts). In MOST cases the TSNs they list will be fine, but a
> silent subset will not.
>
>
>
> Secondly, as I noted, we are mid-stream in a full overhaul of the vascular
> plant data in ITIS, in almost every case using cooperatively-produced data
> sets that have also been made available to PLANTS as well. When/whether they
> use them to update that database is another question, but the ITIS updates
> are proceeding full-steam, with additional improvements where needed.
>
>
>
> Finally, at least when dealing with non-static data sets, I feel it is just
> 'best practice' to get them from the source wherever feasible, rather than
> from other places.
>
>
>
> Best,
>
> Dave
>
>
>
> David Nicolson
> Data Development Coordinator, Integrated Taxonomic Information System
> Biologist, USGS Core Science Systems, Biological Informatics Program
> nicolsod at si.edu    Office 202-633-2149    Fax 202-786-2934
> http://www.itis.gov/
> http://www.cbif.gc.ca/itis/
> "Nihil sumas necesse est..."
>
>
>
>
>
> *From:* Peter DeVries [mailto:pete.devries at gmail.com]
> *Sent:* Tuesday, May 31, 2011 2:48 PM
> *To:* Steve Baskauf
> *Cc:* Nicolson, David; tdwg-content at lists.tdwg.org; Gerald Guala; Orrell,
> Thomas; Alan J Hampson
>
> *Subject:* Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
>
>
>
> Hi Steve et al.,
>
>
>
> I agree. This was one of the reasons that I set up TaxonConcept the way I
> did. It attempts to connect both the LOD entities and the foreign key based
> entities.
>
>
>
> For example:
>
>
>
> The Raccoon http://lod.taxonconcept.org/ses/CTZ8z.html
>
>
>
> Has links to many other URL's and URI's as well as the integer id's for:
>
>
>
> EoL
>
> NCBI
>
> ITIS
>
> BOLD
>
>
>
> * For some of these it might be best to represent these as a one to many
> since there are often many names for each concept.
>
>
>
> I have uBio ID's in GeoSpecies but I thought that would be eventually
> pulled in via the GNI.
>
>
>
> I also have a small set of other foreign keys for things like the
> Hymenoptera name server, FishBase, Mushroom Observer and Tropicos.
>
>
>
> Since these are specific to specific subsets of organisms, and came on
> later in my project I thought it would be best to use a separate RDF file to
> map to those.
>
>
>
> For instance with Fishbase http://assets.taxonconcept.org/fb/index.rdf
>
>
>
> Insects like this one http://lod.taxonconcept.org/ses/ICmLC.html also have
> the id for bugguide if it exists there and I have found it under the same
> name or a synonym.
>
>
>
> Of the ~105,000 concepts I have about 47,000 with ITIS ID's. This may be
> useful for your plant list and I can send you a spreadsheet if that is
> easier.
>
>
>
> Most of the plants also have the USDA Plants identifier. In fact you might
> be able to get the ITIS numbers via the USDA Plants Database.
>
>
>
> I have come to realize that many other groups see a custom API as the
> solution to data access, but this requires understanding and debugging
> your code for each API.
>
>
>
> Once the data is available in RDF it is one API for everything. Some issues
> like what to call each field can be overcome by simply rewriting
> (converting) the RDF.
>
>
>
> This is easy as long as you have equivalent semantics in the meaning of the
> field.
>
>
>
> For instance, it does not really matter if this name is represented as
>
>
>
> <txn:hasScientificName>Procyon lotor</txn:hasScientificName> or
> <dwc:scientificName>Procyon lotor</dwc:scientificName>
>
>
>
> The important thing to understand is that in my model this field does not
> include the authorship string.
>
>
>
> This makes it easier to map this to other datasets and publications that
> don't include the authorship string.
>
>
>
> <txn:scientificNameAuthorship>(Linnaeus 1758)</txn:scientificNameAuthorship>
>
>
>
>  * The scientificNameAuthorship should eventually be mapped to a
> publication or a list of probable publications. It is too ambiguous.
>
>
>
> There was a debate about <scientificName> earlier on the list which seemed
> to go back and forth.
>
>
>
> I got tired of rewriting my examples each time and decided to use my own
> vocabulary that works in my example queries and has fields that map as
> closely to dwc as possible.
>
>
>
> - Pete
>
> On Tue, May 31, 2011 at 7:07 AM, Steve Baskauf <
> steve.baskauf at vanderbilt.edu> wrote:
>
> I had actually written a response to this thread about a week ago in which
> I tried to clarify why I wanted to connect the ITIS and uBio identifiers.
> However, I decided that the email was too cynical and not helpful, so I
> erased it.  However, I think that a couple of the points I had in that email
> probably should have been made, so I will try to state them again in a more
> constructive manner.
>
> My reason for wanting to connect the uBio and ITIS identifiers really had
> nothing to do with making use of any of the tools or services that either
> group provides.  Rather it has to do with my desire to follow the best
> practices for GUIDs as laid out in the TDWG GUID Applicability Statement
> (now an official standard).  In particular, I have in mind Recommendations 2
> and 8, which I paraphrase here as: "make HTTP URIs out of your identifiers"
> and "stop making up new identifiers when somebody else already has one for
> the thing you are talking about".  I suppose Recommendation 10 should also
> be mentioned, which I paraphrase as "provide RDF/XML to users that want
> it".
>
> I am actually using ITIS TSNs internally in my database.  However, last
> time I checked there were no GUIDs based on TSNs that met the
> recommendations I've paraphrased above.  (The ITIS website does mention
> "LSIDs" in the context of web services, but they don't follow either
> recommendation 2 or 10.)  However outdated they are, uBio identifiers do
> actually meet recommendations 2 and 10 and that is why I wanted to use them
> (although the http proxied forms are unnecessarily ugly and long).  So that
> explains in a nutshell the reason for my request.  If ITIS would provide a
> simple http URI form of their TSNs which could resolve via content
> negotiation to either HTML or RDF/XML, it would be much easier for me to
> just use them.
>
> OK, here is where I risk stepping on people's toes.  So I'll try to stomp
> gently.  I think that the area of taxon names is one where the TDWG
> community fails miserably at recommendation 8.  I've lost count of the
> number of different kinds of identifiers that are available for referring to
> taxon names (this issue was discussed previously in the thread that starts
> with http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html
> so I won't repeat it here).  I don't know about (nor particularly care
> about) "turf" in this area, but I would challenge the community to get
> serious about recommendation 8 and come up with some consensus about a
> single, universal set of GUIDs for taxon names.  Those identifiers should
> (in my opinion which stems from the GUID recommendations):
> - be http URIs (rec 2)
> - be based on an existing identifier (rec 8)
> - return RDF/XML when a client requests it (rec 10)
> - not change (rec 4)
> I do not like proxied LSIDs (unnecessarily long with many useless
> characters) and I despise UUIDs (what is the point of creating a long,
> un-typeable string to replace a serial number that is already globally
> unique if appended to a domain name?).  Why not just register something like
> "http://purl.org/tn/" (with "tn" representing "taxon
> name") and stick one of the existing serial numbers onto it?  The domain
> name would be "turf-neutral" and anybody (GBIF, TDWG, or another
> organization) could manage the actual resolution through redirection from
> that domain.  Somebody else could take over the management of the GUIDs if
> the first group got tired of it or ran out of money.  The result would be a
> short and simple URI like "http://purl.org/tn/12345".
> What would be wrong with that?  This is not rocket science and could be
> easily accomplished by a few tech-savvy people if the will were there.
>
> Steve
>
>
>
> Nicolson, David wrote:
>
> Hi Steve (and Dave),
>
>
>
> [NB: After having composed the email below, just before sending it, I re-read your initial email more carefully and realized that you said you already had the ITIS TSNs, and were looking to add the NamebankIDs! Doh! Well, in case you (or anyone else) is interested in methods of matching names to get TSNs, I'll go ahead and send this anyway. But do note the comments below about the ITIS "versions" and ongoing overhaul of the vascular plant data in ITIS!!! -Dave]
>
>
>
> I noticed this just before leaving work last week, and was out yesterday, but I wanted to chime in on this. I'm glad the uBio tools are meeting your needs (they do have some cool stuff!), but it should be noted that those tools are using a static snapshot of ITIS data from January 2009, and we have added about 50,000 additional scientific names, and updated tens of thousands of names beyond that (most of that in the last 6 months, as the frequency of loads dropped off in 2009-2010 due to technical issues).
>
>
>
> I also want to note that ITIS is right in the middle of a full update of the vascular plant data in ITIS, and we're loading updated families on a monthly basis... and at long last we are tackling all the leftover issues from several bulk loads from USDA PLANTS data that left unreconciled bits of ITIS' older vascular plant data in various confusing states... so it is a VAST improvement that is underway.
>
>
>
> There are several options for bouncing your names off the current version of ITIS.
>
>
>
> One is to automate a matching process using the live ITIS data, based on the existing ITIS Web Services. I am CC'ing Alan Hampson, our IT fellow who built the Web Services ( http://www.itis.gov/web_service.html ), in case you'd like to follow up with him on that option. The advantage is that once you have a process in place it is completely self-serve and can always utilize the current ITIS data. If you have the resources to do this I think it would be greatly to your advantage to use this approach.
>
>
>
> You can explore some ideas for client software to use the services at:
>
> http://www.itis.gov/ws_develop.html
>
>
>
> And for more information on ITIS web services try
>
> http://www.itis.gov/ws_description.html
>
> http://www.itis.gov/ITISWebService.xml
>
>
>
> The ability to flag multiply-matched names (as you noted) should probably be considered, so that appropriate manual steps can be taken. This solution will allow you to take advantage of subsequent updates to ITIS with a minimum of additional effort, and given that the plant data are in the middle of a major overhaul, this bears consideration!
>
>
>
> Another possibility is to grab a full snapshot of the ITIS data, and load it into a database so you can do what you wish. The obvious drawback is that it goes out of date, as with the ITIS snapshot uBio is currently using. But it puts you in the driver's seat re what to do & getting new versions of ITIS. Some general information about the full exports is in the following page, although conspicuously absent is any mention of the MySQL version which (assuming you have the free MySQL properly installed & configured) can be loaded with just a few clicks or a few command lines (depending on your platform):
>
> http://www.itis.gov/ftp_download.html
>
> And the current ITIS data are all here for downloading:
>
> http://www.itis.gov/downloads/
>
>
>
> A third option, which I note with some trepidation, is the old "Compare Nomenclature/Taxonomy" function on the ITIS site:
>
> http://www.itis.gov/taxmatch_ftp.html
>
> This is a VERY old function that we do plan on replacing (timeframe not yet certain), and it is vulnerable to timeouts, etc., which is why it notes to limit the number of names per pass. But with smaller chunks of names it does work quite well. The caveat is that I would make sure to choose the 4th option in Step 4, as it is at least aware (unlike the 3 other options) of multiply-matched name cases, and lists them separately at the bottom of the report. Just a bare listing of the scientific names, with the word "name" at the top, saved as plain text, is all that is needed for input.
>
>
>
> A final option would be to ask someone at ITIS to handle the matching for you (leaving you to decide re the multiply-matched names). This might be simple from your end, but is suboptimal as it leaves you in the same position as you are now should you want or need to compare names again in the future (whether due to acquiring new names in your system, or wanting to check against a later updated version of ITIS), and it pulls someone here (probably me) off of the push to get more updates into ITIS. But in a pinch, I'm certainly willing to try to help you, should it come down to that! I would just ask that you seriously consider the web services option (in particular) or the others above first.
>
>
>
> I hope this helps some. If you have already run all your matches against the old "ITIS" data via uBio then you might consider re-running (against the current ITIS data) at least the leftover names that you did not yet get matched. Let us know if you have questions (the itiswebmaster at itis.gov address goes to myself and Alan and several others, so that might be the best bet for a follow-up unless you have a question specifically for me).
>
>
>
> Regards,
>
> Dave
>
>
>
> David Nicolson
>
> Data Development Coordinator, Integrated Taxonomic Information System
>
> Biologist, USGS Core Science Systems, Biological Informatics Program
>
> nicolsod at si.edu     Office 202-633-2149    Fax 202-786-2934
>
> http://www.itis.gov/
>
> http://www.cbif.gc.ca/itis/
>
> "Nihil sumas necesse est..."
>
>
>
>
>
> -----Original Message-----
>
> Date: Fri, 20 May 2011 05:42:03 -0500
>
> From: Steve Baskauf <steve.baskauf at vanderbilt.edu>
>
> Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
>
> To: "David Remsen (GBIF)" <dremsen at gbif.org>
>
> Cc: "tdwg-content at lists.tdwg.org" <tdwg-content at lists.tdwg.org>
>
> Message-ID: <4DD6457B.2080204 at vanderbilt.edu>
>
> Content-Type: text/plain; charset="iso-8859-1"
>
>
>
> Thanks, all, for the responses.  The "Compare to ITIS" function does
>
> just what I want.  I did a test run of 1000 names and it worked like a
>
> charm.  I will need to do a little massaging because sometimes two or
>
> more ITIS IDs come back for each uBio ID.  But I can handle that.
>
> Steve
>
>
>
> David Remsen (GBIF) wrote:
>
>
>
> Steve
>
>
>
> Have you tried this?
>
> http://www.ubio.org/clients/ITIS/index.php
>
>
>
> or this?
>
> http://www.ubio.org/services/mapper/index2.php
>
>
>
> All this ubio talk makes me think we were on to something.  Worth a thought about adopting the new standards and tools and making it really smooth.
>
>
>
> DR
>
>
>
>
>
> On 20 May 2011, at 04:46, Steve Baskauf wrote:
>
>
>
>
>
>
>
> I have generated a csv spreadsheet of about 39 000 plant names for the
>
> U.S. which has the ITIS TSNIDs for the names in a column.  I would like
>
> to have the uBio Namebank IDs in another column of the table.  I have
>
> been looking them up on the uBio website by typing in the names as I
>
> need to know the IDs, but after doing about 300 of them, I'm getting
>
> tired of it.  Does anybody have a clever idea of a way to get the other
>
> 38 000 Namebank IDs without looking them up?  I'm sure that it would be
>
> possible to find this out because uBio gets names from ITIS.  However, I
>
> haven't seen any clues about how to do it in an automated fashion.  I'm
>
> guessing that there might be some way to use the uBio web services, but
>
> if so, it isn't obvious and I probably don't have the skills to carry it
>
> out anyway.
>
>
>
> Any ideas?
>
> Steve
>
>
>
> --
>
> Steven J. Baskauf, Ph.D., Senior Lecturer
>
> Vanderbilt University Dept. of Biological Sciences
>
>
>
> postal mail address:
>
> VU Station B 351634
>
> Nashville, TN  37235-1634,  U.S.A.
>
>
>
> delivery address:
>
> 2125 Stevenson Center
>
> 1161 21st Ave., S.
>
> Nashville, TN 37235
>
>
>
> office: 2128 Stevenson Center
>
> phone: (615) 343-4582,  fax: (615) 343-6707
>
> http://bioimages.vanderbilt.edu
>
>
>
> _______________________________________________
>
> tdwg-content mailing list
>
> tdwg-content at lists.tdwg.org
>
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>
>
>
>
>
>
>
>
>
>
> --
>
> ------------------------------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> Email: pdevries at wisc.edu
> TaxonConcept <http://www.taxonconcept.org/>  &  GeoSpecies<http://about.geospecies.org/> Knowledge
> Bases
> A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
>
> --------------------------------------------------------------------------------------
>


