Re: [tdwg-content] Name is species concept thinking

13 Jun 2010

      Tim: Coffee time.

Dave:

Here's how I imagine this would work under GNA, integrated with GBIF:

1. Person submits text-string "Puma concolor" to a GNA-aware mapping
service.  

2. Service fires text string off to GNI, and sees how many lexical buckets
are involved, and how many protonyms are represented in those buckets.

3. If problems of Homonymy/Homography exist (i.e., if more than one
legitimate Protonym for a species-group name "concolor" has ever been
combined with a genus-group name "Puma"), then the service replies with a
page that says "Do you mean the big cat, or do you mean the protozoan?"
(pretending, for a moment, that the name "Puma concolor" has also been
applied to a protozoan).  Perhaps the service can also review the usage
history of the two names, and algorithmically determine that the user most
likely meant the big cat -- but at least alert the user that a potential
case of homonymy/homography exists.

4. If step 2 yielded no apparent Homonymy/Homography, or if the user
selected one from among more than one Homonyms/Homographs, then the service
takes the selected ProtonymID and throws it at a GNUB-aware taxon concept
resolver.

5. The GNUB-aware Taxon Concept resolver looks at how many Taxon Concept
Service Providers (e.g., ITIS, EOL, WoRMS, etc.) have made some sort of
concept-definition assertion about the Protonym. In most cases, this
could/should be as simple as "Concept Service [X] says that for Protonym
[IDp], follow taxon name usage-instance [IDtnu]". Given [IDtnu], GNUB will
tell us which Genus combination to use, which orthographic spelling to use,
which taxon rank to use, and which set of Protonyms should be regarded as
subjective synonyms of the taxon concept represented by [IDtnu].  If the
different taxon concept providers (I call them "Meta Authorities") all agree
(i.e., each taxon concept provider yields the same set of ProtonymIDs), then
no user interaction is required on this step. If there are different
interpretations of what the current treatment of "Puma concolor [big cat]"
should be, then the user is presented with the different options (and
perhaps a bit of information on what the different active concepts are, in
terms of distribution and/or classification).

6. The resultant set of ProtonymIDs from step 5 (the original ProtonymID
from step 2/3, plus the exploded set of Protonyms for subjective/heterotypic
synonyms from step 5) is then thrown at GBIF (which would be GNA-aware,
and thus know how to translate all the ProtonymIDs into a larger set of
text-string names, and/or GBIF may have already cached this by converting
text-string names from occurrence providers into ProtonymIDs via GNI).

7. The user is then presented with a distributional map from GBIF occurrence
records, based on the selected Protonym of the original submitted
text-string name, cast in the context of the set of heterotypic synonyms
established in Step 5.
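
To make the control flow of steps 2 through 6 concrete, here is a minimal
sketch. Everything in it is an assumption for illustration only: the
dictionaries stand in for GNI and GNUB, the IDs ("protonym:1", etc.) and
the provider assertions are made up, and none of the function names
correspond to a real GNA, GNUB, or GBIF API.

```python
# Hypothetical in-memory stand-ins for GNI and GNUB. All IDs and
# provider assertions below are invented for illustration.

# GNI stand-in: text-string name -> ProtonymIDs it has been applied to.
GNI_INDEX = {
    "Puma concolor": ["protonym:1", "protonym:9"],  # pretend homonym
    "Felis concolor": ["protonym:1"],
}

# GNUB stand-in: for a ProtonymID, the set of ProtonymIDs each concept
# provider ("Meta Authority") treats as belonging to the same taxon concept.
CONCEPTS = {
    "protonym:1": {
        "ITIS": frozenset({"protonym:1", "protonym:2"}),
        "EOL": frozenset({"protonym:1", "protonym:2"}),
    },
}


def concept_options(protonym_id):
    """Step 5: group providers by the protonym set they assert."""
    groups = {}
    for provider, protonyms in CONCEPTS.get(protonym_id, {}).items():
        groups.setdefault(protonyms, []).append(provider)
    return groups


def resolve(name, choose_protonym=None, choose_concept=None):
    """Return the ProtonymIDs to send to the occurrence service (step 6).

    The two callbacks represent the only points where a user may have
    to decide something; they are not called when there is no ambiguity.
    """
    candidates = GNI_INDEX.get(name, [])
    if not candidates:
        raise LookupError(f"no protonym found for {name!r}")

    # Steps 3-4: ask the user only if homonymy/homography exists.
    if len(candidates) == 1:
        protonym = candidates[0]
    elif choose_protonym is not None:
        protonym = choose_protonym(candidates)
    else:
        raise ValueError("homonymy detected; a choice is required")

    # Step 5: ask the user only if the providers disagree.
    options = concept_options(protonym)
    if not options:
        return {protonym}
    if len(options) == 1:
        (protonyms,) = options  # the single agreed protonym set
        return set(protonyms)
    if choose_concept is None:
        raise ValueError("providers disagree; a choice is required")
    return set(choose_concept(options))
```

With this toy data, resolve("Felis concolor") needs no user input at all,
while resolve("Puma concolor", choose_protonym=...) needs only the
homonym decision, because the two providers agree on the concept.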

The bad news is that this sounds incredibly complicated.  The good news is
that it's actually not.  Especially not from the user's perspective.

In the WORST case scenario, the user needs to provide three pieces of
information:

1. The text-string name submitted in Step 1.

2. A decision, in the case of Homonyms/Homographs, about which
critter/weed/microbe they're after.

3. A decision about which Meta Authority to follow for the taxon concept.

This, again, is the WORST case scenario.  A much more likely scenario
involves fewer steps for the end user.

Consider:

Step 2 only applies in the 10%(ish) cases of text-string names involved in
some sort of Homonymy/Homography problem.  So in 90%(ish) of cases, step 2
won't come into play.

Step 3 only applies in cases where the Meta-Authorities disagree on the
current usage of a name (e.g., ITIS is a lumper, WoRMS is a splitter).  Even
in cases where there is disagreement, the user could simply be presented with
two (or more) maps, showing each of the current interpretations/statuses of
the selected critter/weed.  For example, the user might get a page that says
"If you follow the ITIS interpretation of this species, the map looks like
this. If you follow the WoRMS interpretation of the name, the map looks like
that."
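
As a rough sketch of that "one map per interpretation" idea: group the
Meta-Authorities by the set of Protonyms each one accepts, then collect
the occurrence points for each distinct set. The provider assertions,
ProtonymIDs, and coordinates below are all invented, and the function
name is hypothetical rather than part of any real service.

```python
# Hypothetical data: every ID, assertion, and coordinate is made up.
ASSERTIONS = {
    "ITIS": frozenset({"protonym:1", "protonym:2"}),  # lumper
    "WoRMS": frozenset({"protonym:1"}),               # splitter
}

# (ProtonymID the occurrence was identified as, (latitude, longitude))
OCCURRENCES = [
    ("protonym:1", (21.3, -157.8)),
    ("protonym:2", (-17.5, -149.8)),
    ("protonym:1", (13.4, 144.8)),
]


def maps_by_interpretation(assertions, occurrences):
    """Return {providers sharing a concept: points for that concept}."""
    # Providers that assert the same protonym set share one map.
    grouped = {}
    for provider, protonyms in sorted(assertions.items()):
        grouped.setdefault(protonyms, []).append(provider)

    # A point belongs on a map if its protonym is in that concept's set.
    return {
        tuple(providers): [pt for pid, pt in occurrences if pid in protonyms]
        for protonyms, providers in grouped.items()
    }
```

If all providers agreed, the result would have a single key and the user
would see a single map; here the lumper's map carries all three points
and the splitter's map only two.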

And, indeed, Step 1 wouldn't exist in the majority of cases, because I
suspect most people will get to the Map service by clicking on a link from
some web page article or database system.  In most cases, this link would
bypass Step 2 as well.

In other words, if we can continue to develop GNA the way we're already
developing it, we should be able to get to the point (Soon!) where a user
clicks a link on a web page, and immediately gets a single distribution map
using the taxon concept adopted by the overwhelming majority of
Meta-Authorities, or (at worst) gets more than one map based on more than
one contemporary/contentious view of what the species concept should be
(with links to more information, if the user wants the details).

So, if we keep building GNA, we should have exactly the service that Pete
says he'd like to have (i.e., a single map with the full distribution of the
species, regardless of what text-string name is used to label the georef'd
occurrence data-points).

Simple, really....

:-)

Rich
...
-----Original Message-----
From: David Remsen (GBIF) [mailto:dremsen@gbif.org] 
Sent: Saturday, June 12, 2010 10:50 AM
To: Peter DeVries
Cc: David Remsen (GBIF); Richard Pyle; 
tdwg-content@lists.tdwg.org; Kevin Richards; Jerry Cooper
Subject: Re: [tdwg-content] Name is species concept thinking
Pete -
This statement has been sticking with me since I read it.  It might be me
but I don't see any relationship between that statement and how this
relates to taxon concepts.  In a concept-based system you could easily
have two different maps for Puma concolor.  Whether Felis concolor is
included is not relevant because nomenclatural synonyms have no bearing on
the circumscription.  They are both names for the same type.

There may be two different concepts (circumscriptions) published for Aedes
triseriatus.  It could be quite legit for a different (objective synonym
only) name like Ochlerotatus triseriatus to refer to that same concept.
So in that sense, there is a rationale for different scientific names to
be able to reference the same concept to meet that requirement of the
example you cite.  But in zoology these examples aren't even considered
different names, and the rule of priority would prevent truly different
(heterotypic) names from referring to the same type, so the use cases for
different scientific names being able to refer to a single concept ID are
quite limited.

Mapping objective (homotypic) synonymy provides the basis for providing a
single map for those examples you cite, but it's not using true
concept-based principles.
Best,
David
...
Frankly I think it would be an improvement if we could get maps etc. that
combine Aedes triseriatus / Ochlerotatus triseriatus into one map
...
and Felis concolor and Puma concolor into a different single map. :-)
Respectfully,
- Pete