New subject: ITIS TSNID to uBio NamebankIDs mapping

24 May 2011

Hi Steve (and Dave),

[NB: After having composed the email below, just before sending it, I re-read your initial email more carefully and realized that you said you already had the ITIS TSNs, and were looking to add the NamebankIDs! Doh! Well, in case you (or anyone else) are interested in methods of matching names to get TSNs, I'll go ahead and send this anyway. But do note the comments below about the ITIS "versions" and ongoing overhaul of the vascular plant data in ITIS!!! -Dave]

I noticed this just before leaving work last week, and was out yesterday, but I wanted to chime in on this. I'm glad the uBio tools are meeting your needs (they do have some cool stuff!), but it should be noted that those tools use a static snapshot of ITIS data from January 2009. Since then we have added about 50,000 scientific names and updated tens of thousands more (most of that in the last 6 months, as the frequency of loads dropped off in 2009-2010 due to technical issues).

I also want to note that ITIS is right in the middle of a full update of the vascular plant data in ITIS, and we're loading updated families on a monthly basis... and at long last we are tackling all the leftover issues from several bulk loads from USDA PLANTS data that left unreconciled bits of ITIS' older vascular plant data in various confusing states... so it is a VAST improvement that is underway.

There are several options for bouncing your names off the current version of ITIS.

One is to automate a matching process using the live ITIS data, based on the existing ITIS Web Services. I am CC'ing Alan Hampson, our IT fellow who built the Web Services ( http://www.itis.gov/web_service.html ), in case you'd like to follow up with him on that option. The advantage is that once you have a process in place it is completely self-serve and can always utilize the current ITIS data. If you have the resources to do this I think it would be greatly to your advantage to use this approach. 

You can explore some ideas for client software to use the services at: 
http://www.itis.gov/ws_develop.html

And for more information on ITIS web services try 
http://www.itis.gov/ws_description.html
http://www.itis.gov/ITISWebService.xml

The ability to flag multiply-matched names (as you noted) should probably be considered, so that appropriate manual steps can be taken. This solution will allow you to take advantage of subsequent updates to ITIS with a minimum of additional effort, and given that the plant data are in the middle of a major overhaul, this bears consideration!
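To make the multiply-matched point concrete, here is a minimal sketch in Python of how a matching script might sort names into three piles: one TSN, several TSNs, or none. It does not call the ITIS Web Services itself. The `lookup` argument is a hypothetical stand-in for whatever client call you end up writing against the services, and the names and TSNs in the example are placeholders, not real ITIS records.

```python
from typing import Callable, Dict, Iterable, List


def classify_matches(
    names: Iterable[str],
    lookup: Callable[[str], List[int]],
) -> Dict[str, object]:
    """Sort names into unique, multiple, and unmatched buckets.

    `lookup` is a hypothetical function that returns every TSN found
    for a name (an empty list when nothing matches).
    """
    unique: Dict[str, int] = {}
    multiple: Dict[str, List[int]] = {}
    unmatched: List[str] = []
    for name in names:
        tsns = sorted(set(lookup(name)))  # drop repeated TSNs
        if len(tsns) == 1:
            unique[name] = tsns[0]
        elif tsns:
            multiple[name] = tsns  # needs a manual decision
        else:
            unmatched.append(name)
    return {"unique": unique, "multiple": multiple, "unmatched": unmatched}


# Placeholder index standing in for a live lookup; values are made up.
_PLACEHOLDER_INDEX = {
    "Genus alpha": [101],
    "Genus beta": [201, 202],
}


def placeholder_lookup(name: str) -> List[int]:
    return _PLACEHOLDER_INDEX.get(name, [])


report = classify_matches(
    ["Genus alpha", "Genus beta", "Genus gamma"], placeholder_lookup
)
print(report)
```

Keeping the lookup separate from the sorting means the same report logic can be re-run later against updated ITIS data, and the "multiple" pile is the list to review by hand.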

Another possibility is to grab a full snapshot of the ITIS data, and load it into a database so you can do what you wish. The obvious drawback is that it goes out of date, as with the ITIS snapshot uBio is currently using. But it puts you in the driver's seat re what to do & getting new versions of ITIS. Some general information about the full exports is in the following page, although conspicuously absent is any mention of the MySQL version which (assuming you have the free MySQL properly installed & configured) can be loaded with just a few clicks or a few command lines (depending on your platform):
http://www.itis.gov/ftp_download.html
And the current ITIS data are all here for downloading:
http://www.itis.gov/downloads/
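As a rough illustration of what matching against a local snapshot could look like, here is a small Python sketch. It uses an in-memory SQLite database purely as a stand-in for a loaded MySQL copy, and the table and column names (`taxonomic_units`, `tsn`, `complete_name`) are assumptions for the example that should be checked against the schema in the actual download. The rows are placeholders, not real ITIS data.

```python
import sqlite3
from typing import List

# In-memory SQLite stands in for a locally loaded snapshot.
conn = sqlite3.connect(":memory:")

# Assumed table and column names; verify against the real export.
conn.execute(
    "CREATE TABLE taxonomic_units (tsn INTEGER PRIMARY KEY, complete_name TEXT)"
)
conn.executemany(
    "INSERT INTO taxonomic_units (tsn, complete_name) VALUES (?, ?)",
    [(101, "Genus alpha"), (201, "Genus beta"), (202, "Genus beta")],
)


def tsns_for(connection: sqlite3.Connection, name: str) -> List[int]:
    """Return every TSN whose name matches exactly, in ascending order."""
    rows = connection.execute(
        "SELECT tsn FROM taxonomic_units WHERE complete_name = ? ORDER BY tsn",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


for candidate in ["Genus alpha", "Genus beta", "Genus gamma"]:
    print(candidate, tsns_for(conn, candidate))
```

A name that comes back with more than one TSN is a multiply-matched case to resolve manually, and an empty list means no exact match in that snapshot.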

A third option, which I note with some trepidation, is the old "Compare Nomenclature/Taxonomy" function on the ITIS site:
http://www.itis.gov/taxmatch_ftp.html
This is a VERY old function that we do plan on replacing (timeframe not yet certain), and it is vulnerable to timeouts, etc., which is why it notes to limit the number of names per pass. But with smaller chunks of names it does work quite well. The caveat is that I would make sure to choose the 4th option in Step 4, as it is at least aware (unlike the 3 other options) of multiply-matched name cases, and lists them separately at the bottom of the report. Just a bare listing of the scientific names, with the word "name" at the top, saved as plain text, is all that is needed for input.
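If it helps, here is a small Python sketch for splitting a long name list into input files of that shape: the word "name" on the first line, then one scientific name per line, as plain text. The function name and the chunk size are my own assumptions for illustration. Use whatever limit the page itself states for names per pass.

```python
from typing import Iterable, List


def build_input_files(names: Iterable[str], chunk_size: int) -> List[str]:
    """Return the text of each input file, one chunk of names per file.

    Each file starts with the header line "name" and lists one
    scientific name per line. Blank entries are skipped.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    cleaned = [n.strip() for n in names if n.strip()]
    files = []
    for start in range(0, len(cleaned), chunk_size):
        chunk = cleaned[start:start + chunk_size]
        files.append("\n".join(["name"] + chunk) + "\n")
    return files


# Placeholder names; chunk_size=2 is only to keep the example short.
batches = build_input_files(["Genus alpha", " Genus beta ", "", "Genus gamma"], 2)
for number, text in enumerate(batches, start=1):
    print(f"--- file {number} ---")
    print(text, end="")
```

Each string in `batches` can then be saved to its own .txt file and submitted in a separate pass.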

A final option would be to ask someone at ITIS to handle the matching for you (leaving you to decide re the multiply-matched names). This might be simple from your end, but is suboptimal as it leaves you in the same position as you are now should you want or need to compare names again in the future (whether due to acquiring new names in your system, or wanting to check against a later updated version of ITIS), and it pulls someone here (probably me) off of the push to get more updates into ITIS. But in a pinch, I'm certainly willing to try to help you, should it come down to that! I would just ask that you seriously consider the web services option (in particular) or the others above first.

I hope this helps some. If you have already run all your matches against the old "ITIS" data via uBio then you might consider re-running (against the current ITIS data) at least the leftover names that you did not yet get matched. Let us know if you have questions (the itiswebmaster@itis.gov address goes to me, Alan, and several others, so that might be the best bet for a follow-up unless you have a question specifically for me).

Regards,
Dave

David Nicolson
Data Development Coordinator, Integrated Taxonomic Information System
Biologist, USGS Core Science Systems, Biological Informatics Program
nicolsod@si.edu     Office 202-633-2149    Fax 202-786-2934
http://www.itis.gov/
http://www.cbif.gc.ca/itis/
"Nihil sumas necesse est..."

-----Original Message-----
Date: Fri, 20 May 2011 05:42:03 -0500
From: Steve Baskauf <steve.baskauf@vanderbilt.edu>
Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
To: "David Remsen (GBIF)" <dremsen@gbif.org>
Cc: "tdwg-content@lists.tdwg.org" <tdwg-content@lists.tdwg.org>
Message-ID: <4DD6457B.2080204@vanderbilt.edu>
Content-Type: text/plain; charset="iso-8859-1"

Thanks, all, for the responses.  The "Compare to ITIS" function does 
just what I want.  I did a test run of 1000 names and it worked like a 
charm.  I will need to do a little massaging because sometimes two or 
more ITIS IDs come back for each uBio ID.  But I can handle that.
Steve

David Remsen (GBIF) wrote:
...
Steve
Have you tried this?
http://www.ubio.org/clients/ITIS/index.php
or this?
http://www.ubio.org/services/mapper/index2.php
All this uBio talk makes me think we were on to something.  Worth a thought about adopting the new standards and tools and making it really smooth.
DR
On 20 May 2011, at 04:46, Steve Baskauf wrote:
...
I have generated a csv spreadsheet of about 39 000 plant names for the 
U.S. which has the ITIS TSNIDs for the names in a column.  I would like 
to have the uBio Namebank IDs in another column of the table.  I have 
been looking them up on the uBio website by typing in the names as I 
need to know the IDs, but after doing about 300 of them, I'm getting 
tired of it.  Does anybody have a clever idea of a way to get the other 
38 000 Namebank IDs without looking them up?  I'm sure that it would be 
possible to find this out because uBio gets names from ITIS.  However, I 
haven't seen any clues about how to do it in an automated fashion.  I'm 
guessing that there might be some way to use the uBio web services, but 
if so, it isn't obvious and I probably don't have the skills to carry it 
out anyway.
Any ideas?
Steve
-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
