Re: [tdwg-content] [Fwd: Re: most GUIDs/URIs for names/taxon stuff not ready for prime time]

11 Jan 2011

Hi Peter et al.,

You should look at this paper, *A Gross Anatomy Ontology for Hymenoptera*:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0015991

It is an example of how one might create an ontology that helps clear up the
meaning behind all these anatomical terms etc.

One of the things that struck me was we seem to think along similar lines
but in different domains.

   *"Though counterintuitive to some, the development of anatomy ontologies
proceeds more effectively without reference to homology. Circumscriptions of
classes in an ontology must first and foremost (at least within the goals of
the HAO, and we feel in general) allow for the identification of instances
of the class in question. These circumscriptions are crudely analogous to
engineering blueprints in that they allow a domain expert to identify, with
reference to an individual, some instance of a concept (e.g. the anterior
ocellus on the specimen identified with the identifier NCSU 1234). Another
central reason for decoupling anatomy from phylogeny (homology) is that it
maximizes the potential cross-domain application of the ontology."*

In other words, the concept comes first and then you attach names to that
concept.

Respectfully,

- Pete

On Mon, Jan 10, 2011 at 5:17 PM, Peter Stevens <peter.stevens@mobot.org> wrote:
...
To Donat's message also - absolutely, and it is because we cannot do this
that we are floundering so much. Because all we have are assertions - that
some dignify as "hypotheses" - that such and such specimens go in a species
(and in many floras you don't get even that), taxonomists simply cannot
readily build on the past. When the measurements we make are linked to the
specimens we observe, then we will be in business. A solution is to be a
neo-Linnaeus - i.e., we need to enable things like names in current use that
allow us to ignore earlier literature and the headaches that interpreting it
causes us, or we can go to that literature for any useful information it may
contain, but it will not have any nomenclatural implications.
OK, I know I am glossing over a great deal. I am not suggesting that the
only people who can be neo-Linnaeans are those who have carried out
multivariate analyses and have their data in a public repository - most of
us, myself included, do not carry out such analyses, nor are they necessary
in all cases. Digitisation of all literature is great, but why I as a
taxonomist should have to go back in perpetuity to check out whether
Zollinger's or Miquel's concept of a species is the same as mine, let alone
what they actually meant by lanceolate, is beyond me (unfortunately, Vernon
Heywood has pre-empted the Sisyphean title). Ditto about image digitization,
but I wait for the automatic recording of simple morphological data from
these.
I have to have all this worked out in six weeks or so for Berlin....
P.
On Jan 9, 2011, at 9:25 AM, Peter DeVries wrote:
Hi Donat,
On Sun, Jan 9, 2011 at 4:55 AM, Donat Agosti <agosti@amnh.org> wrote:
...
A simple solution would be that, in future, for each usage of a name in
taxonomic literature, at least in treatments, the name has to be
explicitly linked to the underlying specimen or observation data. Since
this is part of a treatment, the identifier should allow retrieval of the
treatment and, through it, the specimen data or at least the observation
data and the author who used it in this specific selection.
This is the goal I have been trying to achieve with TaxonConcept.org.
An open, resolvable, identifier that provides some information as to what
the entity is.
With the use of namespaces these can be short and easy to use.
Puma concolor se:v6n7p - the current scientific name followed by the
asserted concept.
They are currently not as informative as I would like, but allow linked
descriptions, revisions, type specimens, DNA etc.
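As a rough illustration of the namespaced form above, here is a minimal Python sketch that splits a label such as "Puma concolor se:v6n7p" into the name and the concept identifier, then expands the prefix to a full URI. The prefix table and the base URI (http://example.org/concept/) are placeholders assumed for the example, not the actual TaxonConcept.org addresses, and the function names are hypothetical.

```python
# Hypothetical prefix table: the base URI is a placeholder assumed for
# this sketch, not a confirmed TaxonConcept.org address.
PREFIXES = {"se": "http://example.org/concept/"}


def parse_label(label):
    """Split 'Puma concolor se:v6n7p' into (name, prefix, local_id)."""
    # The concept identifier is the last whitespace-separated token.
    name, _, curie = label.rpartition(" ")
    prefix, sep, local_id = curie.partition(":")
    if not name or not sep or not prefix or not local_id:
        raise ValueError(f"not a 'name prefix:id' label: {label!r}")
    return name, prefix, local_id


def expand(label, prefixes=PREFIXES):
    """Return the scientific name and the full URI of the asserted concept."""
    name, prefix, local_id = parse_label(label)
    if prefix not in prefixes:
        raise KeyError(f"unknown namespace prefix: {prefix!r}")
    return name, prefixes[prefix] + local_id


name, uri = expand("Puma concolor se:v6n7p")
print(name)
print(uri)
```

The short form stays readable in text, while the expanded URI is what a Linked Open Data client would actually dereference.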
Eventually the curation of these concepts would be done by taxon experts,
who should receive some sort of academically meaningful credit.
This credit could be easily tracked via Linked Open Data identifiers.
As I have said many times on the list, I would like to work in
conjunction with GBIF, EoL and the rest of the community.
I was not involved in the planning or development of the GNI, my role was
simply to test ways of connecting various names to some sort of "concept"
using the Linked Open Data approach. At the TDWG meeting we were able to
show that this could be used to create a queryable synonymy knowledge base.
As I have said before, I think it is best to start with concepts and then
connect that concept to the various related names.
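To make the "concept first, names second" idea concrete, the following Python sketch keeps concepts as the primary keys and attaches names to them, then answers a simple synonymy query by inverting that mapping. The data is illustrative only: the concept identifier is the one quoted above, and the names were attached by hand for the example rather than taken from any knowledge base. The function names are hypothetical.

```python
from collections import defaultdict

# Illustrative data only: concept identifier from the message above,
# names attached by hand for this example.
CONCEPT_NAMES = {
    "se:v6n7p": ["Puma concolor", "Felis concolor"],
}


def build_name_index(concept_names):
    """Invert concept -> names into lower-cased name -> set of concepts."""
    index = defaultdict(set)
    for concept, names in concept_names.items():
        for name in names:
            index[name.lower()].add(concept)
    return index


def synonyms(name, concept_names=CONCEPT_NAMES):
    """Return the other names attached to any concept that carries `name`."""
    index = build_name_index(concept_names)
    found = set()
    for concept in index.get(name.lower(), ()):
        found.update(concept_names[concept])
    return sorted(n for n in found if n.lower() != name.lower())


print(synonyms("Felis concolor"))
```

Because the concept is the key, adding a newly discovered synonym is a single append to one list, and every existing name for that concept picks it up in queries.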
Like Rod, I also like uBio and I am sure he would support moving those
resources to an updated vocabulary that does not use LSIDs.
Respectfully,
- Pete
...
Donat
...
Why aren't identifiers reused?
----------------------------------------
Because in most cases they offer no added value. If I have an ITIS TSN
there's not much I can do with it. I can get a name from ITIS (with
some vague assurance that this name is accepted - or not - with no
evidence for this assertion). I can think of only two taxonomic
identifiers that have real value and get much reuse, NCBI taxonomy ids
and uBio NameBankIDs. NCBI ids are reused because they underpin the
genomics databases, and genomics does real computational biology, and
makes extensive reuse of data (as exemplified by the annual Nucleic
Acids Research database issue). uBio NameBankIDs get reused because
uBio has lots of names, and provides services for discovering those
names in text (see, e.g., their use by BHL).
Few taxonomic name databases provide a compelling reason for anyone to
use their identifiers, most being digital sinks (you go there, get an
identifier for a name, and nothing else).
Why UUIDs?
----------------
UUIDs are ugly, and solve a problem that for the most part we don't
have. They are ideal for minting globally unique identifiers in a
distributed system, but we don't have distributed systems. Catalogue
of Life uses UUIDs, but these are centrally created (I suspect using
MySQL's UUID function, given how similar the UUIDs are to each other).
ZooBank uses them, but it is not (yet) a distributed system. If the
Catalogue of Life were genuinely a distributed system UUIDs would make
sense, but that's not actually how it works.
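The observation that centrally minted UUIDs look alike can be reproduced with a small Python sketch. It assumes the identifiers come from a time-based (version 1) generator, which is how MySQL's UUID() function is documented to work; whether Catalogue of Life actually uses it is only a suspicion in the text above. The node and clock sequence values are arbitrary constants chosen for the example.

```python
import uuid

# Assumed values for this sketch: a fixed node (normally derived from the
# host's MAC address) and a fixed clock sequence.
NODE = 0x0123456789AB
CLOCK_SEQ = 0x1234

# Version 1: timestamp + clock sequence + node, all minted on one "host".
central = [uuid.uuid1(node=NODE, clock_seq=CLOCK_SEQ) for _ in range(3)]
# Version 4: random bits, which is what independent minters would rely on.
random_ids = [uuid.uuid4() for _ in range(3)]

for u in central:
    print("v1", u)
for u in random_ids:
    print("v4", u)

# Every version 1 value from this host ends in the same 12 hex digits
# (the node), and the values differ mainly in the leading timestamp field.
shared_tail = {str(u)[-12:] for u in central}
print(shared_tail)
```

Run on one machine, the version 1 values differ only in a few leading characters, which is the kind of similarity that hints at a single central generator rather than a distributed one.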
I think users would cope with UUIDs if the databases using them
provided clear value. For example, MusicBrainz uses UUIDs
(http://musicbrainz.org/artist/563201cb-721c-4cfb-acca-c1ba69e3d1fb.html),
as does Mendeley, the latter hiding them from users via human-readable
URLs. Given that we have obvious user-friendly candidates for
URLs (taxonomic names), it would be trivial to hide UUIDs in names
(making homonyms distinct by adding authorship, or whatever it took to
make them unique as strings).
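One way to hide UUIDs behind names, sketched in Python under stated assumptions: the slug is built from the name alone, and authorship is appended only when the plain slug is already taken by a homonym. The registry class, the slug rules, and the placeholder names "Aus bus Smith, 1900" and "Aus bus Jones, 1950" are all hypothetical and not drawn from any existing database.

```python
import re
import uuid


def slugify(name, authorship=""):
    """Build a URL-safe key such as 'aus-bus-smith-1900'.

    Simplification for the sketch: any character outside a-z and 0-9
    (including accented letters) is collapsed into a hyphen.
    """
    text = f"{name} {authorship}".lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


class NameRegistry:
    """Maps readable slugs to UUIDs; the UUID stays the internal key."""

    def __init__(self):
        self._by_slug = {}

    def register(self, name, authorship=""):
        slug = slugify(name)
        if slug in self._by_slug:
            # Homonym: add authorship to keep the slug unique as a string.
            slug = slugify(name, authorship)
        if slug in self._by_slug:
            raise ValueError(f"cannot make a unique slug for {name!r}")
        self._by_slug[slug] = uuid.uuid4()
        return slug

    def resolve(self, slug):
        """Return the UUID hidden behind a readable slug."""
        return self._by_slug[slug]


registry = NameRegistry()
first = registry.register("Aus bus", "Smith, 1900")
second = registry.register("Aus bus", "Jones, 1950")
print(first, registry.resolve(first))
print(second, registry.resolve(second))
```

Users only ever see and share the slug; the UUID remains available for anyone who needs a stable key that does not change when a name is corrected.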
What, if anything, is a taxonomic name?
----------------------------------------------------
In my experience, when non-taxonomists meet taxonomists things get
ugly. For example, a publisher wanting to mark up taxonomic names in
text might ask taxonomists how to do this, and within minutes the
taxonomists are off into discussions of namestrings versus usages
versus concepts and pretty soon the publisher deeply regrets ever
asking the question. I've been at meetings where the look in
publishers' eyes said "run away, run away".
Part of the reason we have multiple databases is because different
projects are capturing different things (roughly speaking, uBio is
mostly about namestrings, Catalogue of Life is about concepts, IPNI
and ZooBank are about first usage of a name, etc.)
Most users outside our field won't give a damn about the niceties of
these distinctions, yet we persist in discussing them ad nauseam.
Until we provide a single, very simple service that takes a name
string and hides all this complexity (unless the user chooses to see
the gory details) while still providing useful information, we will be
stuck in multiple identifier hell. The tragedy is that we've never had
more people genuinely interested in linking to names than at present
-- publishers are desperately trying to add "semantic value" to their
content, and we are spectacularly ill-equipped to deliver this (and
it's our own fault).
I rather suspect we're rapidly approaching the point where users
outside taxonomy will simply say "to hell with these taxonomists,
let's just use Wikipedia and be done with it."
Regards
Rod
---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Dr. Donat Agosti
Research Associate, American Museum of Natural History and Smithsonian
Institution
Email: agosti@amnh.org
Web: http://antbase.org
CV: http://research.amnh.org/entomology/social_insects/agosticv_2003.html
Swiss Residence
Elahieh
Ave. Khazer no. 74
19649 Teheran
Iran
+98-21-2200 8765 (office)
+98-21-2260 6160 (home)
+98-919-489 2744 (mobile)
+1-202-558 0330 (skype-in US)
+41-44-5862911 (skype-in Switzerland)
--
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
Knowledge Base <http://lod.geospecies.org/>
About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
------------------------------------------------------------