[tdwg-content] Darwin Core and phylogenies

Roderic Page Roderic.Page at glasgow.ac.uk
Thu Apr 16 12:18:48 CEST 2015

My apologies in advance if this has come up before.

Historically it seems that phylogenetics and classical biodiversity informatics have lived in separate worlds, for a bunch of reasons. One area where they clearly intersect, though, is “phylogeography” or “geophylogenies”.

I’ve been playing with simple ways to create and visualise geophylogenies (e.g., http://iphylo.blogspot.co.uk/2015/01/geojson-and-geophylogenies.html ) and it occurs to me that it might be useful to have a Darwin Core-style way of encoding these. Phylogenetic data is typically stored very differently from the simple row-based approach adopted by Darwin Core (e.g., NEXUS format, XML [yuck], or JSON [sometimes yucky if it’s full of XML-style baggage, but I digress]). This makes it a challenge to integrate phylogenetic data with projects such as GBIF.

This lack of integration is regrettable: anyone looking at the rise of phylogeography and DNA barcoding will recognise that many classically-defined species are poor representations of actual biodiversity, even in taxa that we think we know quite well (e.g., vertebrates).

Given that we can encode a phylogenetic tree as a set of rows, each one linking a node to its ancestor and so representing an edge in the tree (what those of a certain age would call an “ancestor function”), it would be easy to have a Darwin Core extension that had one row per node in the tree, optionally with branch lengths, and with leaf nodes also labelled by occurrence id (the same id used in the standard occurrence table, where ideally most if not all occurrences would be georeferenced). In this way we could add geophylogenies to Darwin Core archives, and existing tools for importing these (e.g., GBIF) could simply ignore the phylogeny until such time as they can handle it, but other tools could read the trees and display them.
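
To make the row-per-node idea concrete, here is a minimal sketch in Python. The column names nodeID, parentNodeID and branchLength are hypothetical (they are not existing Darwin Core terms), the occurrence ids are made up, and occurrenceID is assumed to match the id used in the occurrence table. The sketch reads such a table and rebuilds the tree as a Newick string, just to show that the rows carry the whole tree.

```python
import csv
import io

# Hypothetical extension table: one row per node, each pointing to its parent.
# nodeID, parentNodeID and branchLength are invented names for this sketch;
# occurrenceID is assumed to match the id in the occurrence table.
ROWS = """nodeID,parentNodeID,branchLength,occurrenceID
n0,,,
n1,n0,0.1,
n2,n0,0.3,occ-3
n3,n1,0.2,occ-1
n4,n1,0.25,occ-2
"""


def read_nodes(text):
    """Return the table rows as a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def to_newick(rows):
    """Rebuild the tree from the parent pointers and serialise it as Newick."""
    by_id = {}
    children = {}
    root = None
    for row in rows:
        by_id[row["nodeID"]] = row
        if row["parentNodeID"]:
            children.setdefault(row["parentNodeID"], []).append(row["nodeID"])
        else:
            root = row["nodeID"]  # the only row without a parent

    def render(node_id):
        row = by_id[node_id]
        kids = children.get(node_id, [])
        if kids:
            # Internal node: recurse into its children.
            text = "(" + ",".join(render(k) for k in kids) + ")"
        else:
            # Leaf: label it with the occurrence id when one is given.
            text = row["occurrenceID"] or node_id
        if row["branchLength"]:
            text += ":" + row["branchLength"]
        return text

    return render(root) + ";"


NEWICK = to_newick(read_nodes(ROWS))
print(NEWICK)
```

For the five rows above this prints ((occ-1:0.2,occ-2:0.25):0.1,occ-3:0.3); and a tool that does not understand the table can skip it without affecting the occurrence records.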

Is anyone doing something along these lines?



