The mapping can get complicated. At the level of taxa, one could do a one-to-one mapping of TreeBASE taxa to NCBI taxa. At the level of OTUs, the mapping may be one-to-many. An OTU may correspond to a single specimen, a single sequence, a set of sequences from multiple individuals of the same taxon, or, indeed, a composite of exemplar taxa representing a higher taxon. So, OTUs map onto sets of observations.

Mapping OTUs to one or more specimen URIs would be great, if we had these. But as a rule we don't. Most individual data providers don't make individual specimens addressable. GBIF does, but we'd have to assume that these were stable over time, and that we have tools in place to map museum specimen codes to GBIF specimen URLs, and in general we don't. This wouldn't be a huge obstacle: if GBIF were to provide some guarantee that its specimen URLs were stable, we'd then "just" need some tools to convert "Museum abbreviation specimen code xxx" to a URL.
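
As a rough illustration of the kind of tool meant here, the sketch below parses a voucher string of the form "Museum abbreviation specimen code" and fills in a URL template. Everything specific in it is an assumption made for the example: the `example.org` template is a placeholder and not a real GBIF address, `voucher_to_url` is a hypothetical helper, and real voucher strings are far messier than this pattern allows.

```python
import re
from urllib.parse import quote

# Placeholder template (assumption): NOT a real GBIF or museum endpoint.
URL_TEMPLATE = "https://example.org/specimen/{institution}/{catalog}"

# Assumed voucher shape: letters (museum abbreviation), an optional
# separator (space, colon, or hyphen), then a numeric specimen code.
VOUCHER_RE = re.compile(r"^\s*([A-Za-z]+)[\s:\-]*(\d+)\s*$")


def voucher_to_url(voucher, template=URL_TEMPLATE):
    """Return a URL for a voucher such as 'MVZ 12345', or None if unparseable."""
    match = VOUCHER_RE.match(voucher)
    if match is None:
        return None
    institution, catalog = match.groups()
    return template.format(
        institution=quote(institution.upper()),
        catalog=quote(catalog),
    )


if __name__ == "__main__":
    for voucher in ["MVZ 12345", "usnm:987", "unknown"]:
        print(voucher, "->", voucher_to_url(voucher))
```

The hard part is not the string handling but the guarantee behind the template: the conversion is only useful if the target URLs stay stable.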

Regards

Rod

On 28 Aug 2010, at 23:59, Blum, Stan wrote:

Regarding #1, and assuming that this concerns molecular data where individuals often function as OTUs, I think it would be even more important for the long-term usefulness of the data to support the ability to reference the specimen with a resolvable GUID or at least a collection code and catalog number. A simple assertion that “this sequence came from [an unknown specimen identified to be] this taxon” can’t be re-examined or validated except by the sequence data. If you know what specimen it came from, the identification can be updated (by more methods).

On the other hand, a full name backed up by a URL or source/GUID would be a big improvement on codes and abbreviations.

-Stan


On 8/27/10 8:27 AM, "Arlin Stoltzfus" <arlin@umd.edu> wrote:

I'm sending this reply to only the tdwg-phylo list (sending to everyone seems like overkill).

Here are two ideas based on the use of phylogenies:

1.   For various reasons, it's important to be able to associate valid species sources or other universal identifiers (e.g., NCBI gis) with the human-readable OTU identifiers used in tree files, but this typically isn't done and it's not always easy.  The goal of this project is to enable ordinary phylogenetics & systematics users to use current standards (Newick, NHX, phyloxml, ...) to associate species names (possibly other tax ids) with phylogenies in their usual workflows.  The focus is on developing short-term tools and strategies that might lead to better long-term solutions.  In some cases, it's just a matter of knowing how to use the file format properly, possibly aided by better tools for data input.  For users whose workflows rely on Newick, we would need a way to keep a separate mapping of OTU ids and tax ids, along with tools to interconvert or translate to one of the other formats (this could be as simple as an Excel spreadsheet or as complex as a web service that maintains your mapping and does the translation for you).

2.  There is a huge variety of tree viewers.  To some extent, users need this variety because the viewers have different feature sets.  But users shouldn't have to choose a viewer based on data format restrictions.  The goal of this project is to improve the usability of tree viewers: assess the interoperability (standards compatibility) of tree viewing software, develop strategies to improve it, and get started on any strategies that can be implemented.  It's not possible to modify viewers whose source code is unavailable, but there may be ways to work around this with scripts and translation tools.
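
For idea 1, the "separate mapping" for Newick users could be as small as a two-column table plus a relabeling script. The following is a minimal sketch under stated assumptions, not a proposed tool: the tab-separated layout (OTU id, then name or tax id), the helper names `load_mapping` and `relabel_newick`, and the short OTU codes are all invented for illustration. It only handles unquoted leaf labels with no whitespace after `(` or `,`, and it leaves unmapped labels untouched.

```python
import csv
import io
import re

# Leaf labels in a Newick string directly follow "(" or ","; labels of
# internal nodes follow ")" and are deliberately not matched.
# Assumption: labels are unquoted and there is no whitespace before them.
LEAF_RE = re.compile(r"(?<=[(,])([^():,;\s]+)")


def load_mapping(tsv_text):
    """Read 'otu_id<TAB>replacement' rows into a dict (assumed layout)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def relabel_newick(newick, mapping):
    """Replace leaf labels found in the mapping; keep all others as-is."""
    def swap(match):
        label = match.group(1)
        return mapping.get(label, label)
    return LEAF_RE.sub(swap, newick)


if __name__ == "__main__":
    # Hypothetical OTU codes mapped to species names.
    table = "Hsap\tHomo_sapiens\nPtro\tPan_troglodytes\n"
    tree = "((Hsap:0.1,Ptro:0.1):0.2,Mmus:0.5);"
    print(relabel_newick(tree, load_mapping(table)))
```

Keeping the table outside the tree file means the same mapping can be exported from a spreadsheet and reused when translating to NHX or phyloxml, where the identifiers can be carried in the format itself.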

Arlin

_______________________________________________
tdwg-phylo mailing list
tdwg-phylo@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-phylo