Hi all,
we have reserved one full day for a Phylogenetics Standards hands-on activity at the upcoming TDWG conference. We asked for power strips, a wireless network, and tables that we can sit around to work together. The only thing that's missing right now is what exactly we should target. The options are wide open right now, with the only two major constraints being that (a) we have no funding this year to bring people in who wouldn't otherwise be there, and (b) the abstract submission deadline is next Wednesday.
Any feedback or ideas, wild or not, that you have would be welcome - send those our way.
And BTW, currently this workshop seems to be placed on the Wednesday of the conference week, so those of you blitzing the local biosphere will unfortunately have a conflict.
Cheers, Nico & Hilmar
<><><><><><><><><><><><><><><><><><><><> Nico Cellinese, Ph.D. Assistant Curator, Herbarium & Informatics Adjunct Assistant Professor, Department of Biology
Florida Museum of Natural History University of Florida 354 Dickinson Hall, PO Box 117800 Gainesville, FL 32611-7800, U.S.A. Tel. 352-273-1979 Fax 352-846-1861 http://www.flmnh.ufl.edu
I'm sending this reply to only the tdwg-phylo list (sending to everyone seems like overkill).
Here are two ideas based on the use of phylogenies:
1. For various reasons, it's important to be able to associate valid species sources or other universal identifiers (e.g., NCBI GIs) with the human-readable OTU identifiers used in tree files, but this typically isn't done and it's not always easy. The goal of this project is to enable ordinary phylogenetics & systematics users to use current standards (Newick, NHX, phyloXML, ...) to associate species names (possibly other tax ids) with phylogenies in their usual workflows. The focus is on developing short-term tools and strategies that might lead to better long-term solutions. In some cases, it's just a matter of knowing how to use the file format properly, possibly aided by better tools for data input. For users whose workflows rely on Newick, we would need a way to keep a separate mapping of OTU ids and tax ids, along with tools to interconvert or translate to one of the other formats (this could be as simple as an Excel spreadsheet or as complex as a web service that maintains your mapping and does the translation for you).
2. There is a huge variety of tree viewers. To some extent, users need this variety because the viewers have different feature sets. But users shouldn't have to choose a viewer based on data format restrictions. The goal of this project is to improve the usability of tree viewers: assess the interoperability (standards compatibility) of tree viewing software, develop strategies to improve it, and get started on any strategies that can be implemented. It's not possible to modify viewers whose source code is unavailable, but there may be ways to work around this with scripts and translation tools.
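To make the "separate mapping plus translation tool" part of idea 1 concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not an existing tool: the mapping table, the OTU labels, and the function names `load_mapping` and `newick_to_nhx` are made up for the example, and the regular expression assumes a simple Newick string with no quoted labels, no whitespace, and no existing comments. It uses the NHX tags `S` (species name) and `T` (taxonomy ID); replacing spaces in species names with underscores is a precaution taken here, not something the thread prescribes.

```python
import csv
import io
import re

# Hypothetical mapping table kept alongside the tree file:
# OTU label -> species name and NCBI taxonomy ID.
MAPPING_CSV = """otu,species,ncbi_taxid
Hsap,Homo sapiens,9606
Mmus,Mus musculus,10090
"""


def load_mapping(text):
    """Read the mapping table into a dict keyed by OTU label."""
    return {row["otu"]: row for row in csv.DictReader(io.StringIO(text))}


# A leaf label directly follows "(" or ","; an optional branch length follows it.
# Internal node labels follow ")" and are therefore not matched.
LEAF = re.compile(r"(?<=[(,])([^\s():,;\[\]]+)(:[^,();\[\]]+)?")


def newick_to_nhx(newick, mapping):
    """Append an NHX annotation to every leaf whose label is in the mapping."""

    def annotate(match):
        label, length = match.group(1), match.group(2) or ""
        row = mapping.get(label)
        if row is None:
            return match.group(0)  # leave unmapped OTUs untouched
        species = row["species"].replace(" ", "_")
        return f"{label}{length}[&&NHX:S={species}:T={row['ncbi_taxid']}]"

    return LEAF.sub(annotate, newick)


tree = "((Hsap:0.1,Mmus:0.2):0.05,Xlae:0.3);"
print(newick_to_nhx(tree, load_mapping(MAPPING_CSV)))
```

The unmapped label (`Xlae` here) passes through unchanged, so a partly filled-in mapping still yields a usable tree. A real tool would need a proper Newick parser rather than a regular expression.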
Arlin
On Aug 26, 2010, at 9:40 PM, Nico Cellinese wrote:
[...]
------- Arlin Stoltzfus (arlin@umd.edu) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD tel: 240 314 6208; web: www.molevol.org
Hi Arlin and others
I think that idea 1 below is a really valuable topic to address. The ability to link tree tips to other sources of data by taxon name or identifier is a threshold step for automating a whole range of analyses and displays. For example, the software I work on, Biodiverse http://www.purl.org/biodiverse/ , links phylogenies to species location data for visualisation and analysis. It would be great to have it running automatically online, linking (for example) trees from TreeBASE to distributions from GBIF. To do this (and many other things), however, we need a better solution to the taxon matching problem you describe.
cheers
Dan
From: tdwg-phylo-bounces@lists.tdwg.org [mailto:tdwg-phylo-bounces@lists.tdwg.org] On Behalf Of Arlin Stoltzfus Sent: Friday, August 27, 2010 11:27 AM To: tdwg-phylo@lists.tdwg.org Subject: Re: [tdwg-phylo] Upcoming TDWG meeting
[...]
Regarding #1, and assuming that this concerns molecular data where individuals often function as OTUs, I think it would be even more important for the long-term usefulness of the data to support the ability to reference the specimen with a resolvable GUID or at least a collection code and catalog number. A simple assertion that “this sequence came from [an unknown specimen identified to be] this taxon” can’t be re-examined or validated except by the sequence data. If you know what specimen it came from, the identification can be updated (by more methods).
On the other hand, a full name backed up by a URL or source/GUID would be a big improvement on codes and abbreviations.
-Stan
The mapping can get complicated. At the level of taxa, one could do a one-to-one mapping of TreeBASE taxa to NCBI taxa. At the level of OTUs, the mapping may be one to many. An OTU may correspond to a single specimen, a single sequence, or a set of sequences from multiple individuals of the same taxon, or, indeed, a composite of exemplar taxa representing a higher taxon. So, OTUs map onto sets of observations.
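One way to picture "OTUs map onto sets of observations" is as a small data structure. The following is a rough Python sketch, not a proposed standard: the class names `Observation` and `OTU`, the field names, and the identifiers (`SEQ0001` and so on) are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """A single specimen or sequence that an OTU is based on."""
    kind: str        # e.g. "specimen" or "sequence"
    identifier: str  # placeholder identifier, not a real accession
    taxon: str       # taxon name the observation is identified as


@dataclass
class OTU:
    """An OTU label together with the set of observations behind it."""
    label: str
    observations: set = field(default_factory=set)

    def taxa(self):
        """Distinct taxa represented among the observations."""
        return {obs.taxon for obs in self.observations}

    def is_composite(self):
        """True when the OTU combines exemplars of more than one taxon."""
        return len(self.taxa()) > 1


# An OTU backed by a single sequence.
single = OTU("Hsap_1", {Observation("sequence", "SEQ0001", "Homo sapiens")})

# A composite OTU: exemplar species standing in for a higher taxon.
composite = OTU("Muridae", {
    Observation("sequence", "SEQ0002", "Mus musculus"),
    Observation("sequence", "SEQ0003", "Rattus norvegicus"),
})

print(single.is_composite(), composite.is_composite())
```

With this shape, a one-to-one taxon mapping is just the special case where `taxa()` has exactly one element.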
Mapping OTUs to one or more specimen URIs would be great, if we had these. But as a rule we don't. Most individual data providers don't make individual specimens addressable. GBIF does, but we'd have to assume that these were stable over time, and that we have tools in place to map museum specimen codes to GBIF specimen URLs, and in general we don't. This wouldn't be a huge obstacle: if GBIF were to provide some guarantee that its specimen URLs were stable, we'd then "just" need some tools to convert "Museum abbreviation specimen code xxx" to a URL.
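As a rough illustration of the kind of tool meant here, the Python sketch below turns a voucher string of the form "abbreviation code" into a URL. Everything specific in it is an assumption: the URL template points at `example.org` and is not a GBIF URL pattern, the function name `voucher_to_url` is made up, and the parsing rule (letters, then a space or colon, then the catalog code) is only one of many ways specimen codes are written in practice.

```python
import re
from urllib.parse import quote

# Hypothetical URL template; a real resolver would define its own pattern.
URL_TEMPLATE = "https://example.org/specimen/{institution}/{catalog}"

# Institution abbreviation (letters), a separator, then the catalog code.
VOUCHER = re.compile(r"^\s*([A-Za-z]+)[\s:]+([A-Za-z0-9][A-Za-z0-9.\-]*)\s*$")


def voucher_to_url(voucher):
    """Return a URL for a voucher string, or None if it cannot be parsed."""
    match = VOUCHER.match(voucher)
    if match is None:
        return None
    institution = match.group(1).upper()
    catalog = match.group(2)
    return URL_TEMPLATE.format(
        institution=quote(institution), catalog=quote(catalog)
    )


print(voucher_to_url("ABC 12345"))
print(voucher_to_url("12345"))  # no institution abbreviation, so None
```

Returning `None` for strings that do not parse keeps the failures visible, which matters because the hard part in practice is the variety of ways codes are written, not the URL construction.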
Regards
Rod
On 28 Aug 2010, at 23:59, Blum, Stan wrote:
[...]
tdwg-phylo mailing list tdwg-phylo@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-phylo
On Aug 28, 2010, at 6:59 PM, Blum, Stan wrote:
Regarding #1, and assuming that this concerns molecular data where individuals often function as OTUs, I think it would be even more important for the long-term usefulness of the data to support the ability to reference the specimen with a resolvable GUID
... On the other hand, a full name backed up by a URL or source/GUID would be a big improvement on codes and abbreviations
My main aim is to support data integration (rather than validation), and the two most important integrating variables for the foreseeable future (at least in my limited vision) are species name (or other taxonomic identifier) and geographic coordinates. These are important partly because the great mass of users outside of TDWG are committed to using the same kinds of species names and the same kinds of coordinates.
Most phylogeny information artefacts (e.g., files) out there don't have either one of these, so integrating phylogenetic information into the global web of data isn't going to get very far until we make it easy for users to put this information into their trees.
To the extent that the scientific community is committed in the same way to specimen identifiers, the problem becomes simpler, because the specimen source would become the integrating variable and would mediate the integration of data by species or location (because the specimen would have a species and a location). But I don't think we are there yet.
Arlin
To build momentum for a possible working session on linking trees, I put some parts of this thread on the twiki page here:
http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/LinkingTrees2010
This is one place to indicate your interest by adding your name and describing what you can bring to the workshop (in terms of both real-world problems and possible solutions).
All of you should be able to access the twiki using credentials that you established as a TDWG member.
Arlin
On Aug 30, 2010, at 2:08 PM, Arlin Stoltzfus wrote:
[...]
Hi all -
thanks for everyone's input and the enthusiasm - very much appreciated! We tried to synthesize this and our own thoughts into an abstract. The resulting draft is here:
https://docs.google.com/document/pub?id=1ihH6MGKBCpH10sqvDK5ahLgvz6P_qP-TaN-...
It is due for submission tomorrow, Sep 1, at 23:59 UTC, which is (1 minute before) 8pm US Eastern Time (if I understand the time zones the same way as TDWG does), so there's not a whole lot of time for changes. That said, please don't hesitate if you have any suggestions; we aren't quite up to the 3,500 character limit yet, so adding on is still possible too.
-Nico & Hilmar
Just FYI, we submitted the abstract yesterday. The URL below should have the most up-to-date version. I've also linked to it from our wiki.
-hilmar
On Aug 31, 2010, at 10:25 PM, Hilmar Lapp wrote:
[...]
https://docs.google.com/document/pub?id=1ihH6MGKBCpH10sqvDK5ahLgvz6P_qP-TaN-...
[...]
--
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
participants (6)
- Arlin Stoltzfus
- Blum, Stan
- Dan Rosauer
- Hilmar Lapp
- Nico Cellinese
- Roderic Page