Re: [tdwg-content] Name is species concept thinking

13 Jun 2010

      Rich

What you described in 1-5 was exactly the scope and function of uBio  
NameBank and ClassificationBank.   This functionality has been refined  
in our ChecklistBank index.

It serves to provide a consistent resolution service for "Taxon  
Concept Service Providers".
By linking to a populated GNUB it would also have an improved means to  
provide the protonym circumscription of the concept, as you describe  
in (5).   In addition,  we would like to support the inclusion of  
bibliographic data, specimens,  geospatial information, and general  
descriptive data.   The DwC Archive approach provides one means (not
the only one, and I would appreciate pointers to others) to
mobilise these data from the people who hold them.

In (5) you describe using the protonym-based circumscription to
evaluate the relative agreement of the identified concepts (via
'meta-authorities').    This provides the basis for expanding the
potential set of names for a subsequent data retrieval from GBIF (for
example) to include all the related nomenclatural and lexical variants
for those names (of course checking for homonym conflicts among them).
Again, this is consistent with what was implemented in uBio services
and what we are currently implementing in our Checklist Bank (CLB)  (I
use the term General Concept Mapping for this process).   I'm not sure I agree
that this provides a true concept-based system, however.  I would call  
it a concept-informed system.

In (6) it appears the output of the Taxon Concept resolution process  
is either an expanded set of name strings or an array of  
protonymIDs.   I can see this is an option in (6).  If the latter,  I  
can see how this would provide a more precise concept-informed but  
name-based retrieval method and probably the best we can expect from  
large indices like GBIF.    But I don't see how it will support a  
strict concept-based retrieval.

The real world example that forms my litmus test is the blue-headed  
vireo,  Vireo solitarius (Wilson 1810) which was originally called  
Muscicapa solitaria and has also been combined to form Vireosylvia  
solitaria and Lanivireo solitarius.   Of course there are lexical  
variants as well (Google "Lanivireo solitaria" for example).   These,
properly structured, would be the sort of useful set of
lexical/nomenclatural content I would hope for in a response from a
GNI/GNUB resolution service based on protonymID.
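
To make the shape of such a response concrete, here is a minimal Python sketch. It is only an illustration: the identifier "protonym:muscicapa-solitaria", the field names and the resolve_names function are hypothetical and do not correspond to any existing GNI/GNUB interface. The names themselves are the ones listed above.

```python
from dataclasses import dataclass, field


@dataclass
class NameVariant:
    name: str
    kind: str  # "protonym", "combination" or "lexical variant"


@dataclass
class ProtonymRecord:
    protonym_id: str
    variants: list = field(default_factory=list)


# Hypothetical index keyed by a made-up protonym identifier.
INDEX = {
    "protonym:muscicapa-solitaria": ProtonymRecord(
        protonym_id="protonym:muscicapa-solitaria",
        variants=[
            NameVariant("Muscicapa solitaria", "protonym"),
            NameVariant("Vireo solitarius", "combination"),
            NameVariant("Vireosylvia solitaria", "combination"),
            NameVariant("Lanivireo solitarius", "combination"),
            NameVariant("Lanivireo solitaria", "lexical variant"),
        ],
    )
}


def resolve_names(protonym_id):
    """Return every name string recorded for one protonym (empty if unknown)."""
    record = INDEX.get(protonym_id)
    if record is None:
        return []
    return [variant.name for variant in record.variants]


names = resolve_names("protonym:muscicapa-solitaria")
```

Keeping the kind of each variant alongside the string is what would let a client tell a legitimate new combination from a misspelling.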

One current view of the taxon (concept C1) has this species occupying  
the eastern part of the US.   Another species, Vireo plumbeus Coues,  
1866, (concept C2) occupies the middle west USA, and a third species,  
Vireo cassini Xántus de Vesey, 1858 (concept C3) is on the western  
coast.

Another view lumps all three of these into a single species which,  
based on the rule of priority, has the valid name Vireo solitarius   
and results in a new concept (C4).  This concept includes C1, C2,  and
C3.   Both C1 and C4 therefore carry the scientific name Vireo solitarius.

We can access and represent these in a consistent fashion using our  
CLB and probably others can too in their own index models.

So, now we have a specimen of Vireo solitarius that was captured in  
Minnesota.   It might be an errant instance of C1, Vireo solitarius  
sensu stricto, that strayed a bit west of normal.   It might be (C4)  
Vireo solitarius, sensu lato.     The specimen would need that concept  
identifier tied to the record to make this explicit.    So,  let's say  
that the identifier was made using the lumped concept (C4).  Of  
course, if this doesn't make it into the record, we are stuck with the  
name alone.

Using the method (6) you described would allow a user to discover the  
different treatments of Vireo solitarius (C1 and C4) and provide some  
means to discriminate them via concept resolution.

- C4 includes C1, C2, and C3 which would include all the names above.
- C1 would only include the nomenclatural/lexical variants for Vireo  
solitarius.

Resolution will enable us to perform a significantly more useful and  
concept-informed search.  It will, however,  include the specimen I  
referenced above in BOTH cases because "Vireo solitarius" or its
protonymID will be a search term in both cases.
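
A small Python sketch of that name-based expansion shows why the Minnesota specimen comes back under either concept. Everything here is an assumption made for illustration: the protonym identifiers P1 to P3, the table layout and the function names are hypothetical, and the name strings are spelled as in this message.

```python
# Hypothetical protonym identifiers: P1 = solitarius, P2 = plumbeus, P3 = cassini.
CONCEPT_PROTONYMS = {
    "C1": {"P1"},
    "C2": {"P2"},
    "C3": {"P3"},
    "C4": {"P1", "P2", "P3"},  # the lumped concept includes C1, C2 and C3
}

PROTONYM_NAMES = {
    "P1": {
        "Muscicapa solitaria",
        "Vireo solitarius",
        "Vireosylvia solitaria",
        "Lanivireo solitarius",
        "Lanivireo solitaria",
    },
    "P2": {"Vireo plumbeus"},
    "P3": {"Vireo cassini"},
}

# A specimen record that carries a name but no concept identifier.
SPECIMENS = [
    {"id": "S1", "name": "Vireo solitarius", "locality": "Minnesota"},
]


def expand_names(concept_id):
    """Collect every name string attached to the protonyms of a concept."""
    names = set()
    for protonym_id in CONCEPT_PROTONYMS[concept_id]:
        names |= PROTONYM_NAMES[protonym_id]
    return names


def search(concept_id):
    """Name-based retrieval: match specimens on the expanded name set only."""
    names = expand_names(concept_id)
    return [s["id"] for s in SPECIMENS if s["name"] in names]


split_hits = search("C1")
lumped_hits = search("C4")
```

Both searches return S1, because the expanded name sets for C1 and C4 each contain "Vireo solitarius".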

A more precise concept-based system would utilise a required taxon
concept identifier in the specimen record to discriminate different  
uses of the SAME NAME.  In other words,  if you did a search of Vireo  
solitarius and the concept resolver indicated the different concepts  
above and you chose the sensu stricto (split) version,  you would get  
the C1 labelled records but the C4 labelled records would be excluded  
or at least come with a warning (may not be what you are looking  
for).  This of course requires our specimen records to have a concept  
identifier.   Or,  the concept definition itself will include
additional annotations to enable us to make inferences.

For example:

- Publication date of the concept: if the split didn't happen until
1980 and the specimen is from 1960, then we could infer C4.
- Distribution information for the concept: if we disregard errant
specimens, then we might infer that a 1985 Minnesota specimen is a C2
in spite of the different name.
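
The following Python sketch puts the two ideas together: use the concept identifier when the record has one, otherwise fall back on the date and distribution annotations. It is a rough illustration under stated assumptions. The split year of 1980 is the hypothetical one used above, and the region labels, record fields and function names are invented for the example.

```python
SPLIT_YEAR = 1980  # hypothetical publication year of the split, as in the example

# Hypothetical distribution annotations for the split concepts.
REGION_CONCEPT = {"east": "C1", "midwest": "C2", "west": "C3"}


def infer_concept(record):
    """Return (concept_id, basis) for a specimen record."""
    if record.get("concept_id"):
        return record["concept_id"], "asserted"
    if record["year"] < SPLIT_YEAR:
        # Before the split only the lumped concept was available.
        return "C4", "inferred from date"
    concept = REGION_CONCEPT.get(record["region"])
    if concept:
        return concept, "inferred from distribution"
    return None, "unknown"


def concept_search(records, wanted):
    """Split records into hits for the wanted concept and ones to flag."""
    hits, warnings = [], []
    for record in records:
        concept, _basis = infer_concept(record)
        if concept == wanted:
            hits.append(record["id"])
        else:
            warnings.append(record["id"])
    return hits, warnings


RECORDS = [
    {"id": "A", "concept_id": "C1", "year": 1990, "region": "midwest"},
    {"id": "B", "concept_id": "C4", "year": 1990, "region": "midwest"},
    {"id": "C", "year": 1960, "region": "midwest"},
    {"id": "D", "year": 1985, "region": "midwest"},
]

hits, warnings = concept_search(RECORDS, "C1")
```

All four records carry the name Vireo solitarius, yet only record A is returned for the sensu stricto search. The others would be excluded or shown with a warning.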

In sum,  we are on track for achieving this and I believe our data  
mobilisation strategy will support getting these sorts of data
published.   When Markus returns from paternity leave I would hope we  
could include his thoughts on how we might expose these as RDF via our  
indices to support all aspects of this discussion.

David

On Jun 13, 2010, at 2:37 AM, Richard Pyle wrote:
...
Tim: Coffee time.
Dave:
Here's how I imagine this would work under GNA, integrated with GBIF:
1. Person submits text-string "Puma concolor" to a GNA-aware mapping
service.
2. Service fires text string off to GNI, and sees how many lexical
buckets are involved, and how many protonyms are represented in those
buckets.
3. If problems of Homonymy/Homography exist (i.e., if more than one
legitimate Protonym for a species-group name "concolor" has ever been
combined with a genus-group name "Puma"), then the service replies
with a page that says "Do you mean the big cat, or do you mean the
protozoan?" (pretending, for a moment, that the name "Puma concolor"
has also been applied to a protozoan).  Perhaps the service can also
review the usage history of the two names, and algorithmically
determine that they most likely meant the big cat -- but at least
alert the user that a potential case of homonymy/homography exists.
4. If step 2 yielded no apparent Homonymy/Homography, or if the user
selected one from among more than one Homonyms/Homographs, then the
service takes the selected ProtonymID and throws it at a GNUB-aware
taxon concept resolver.
5. The GNUB-aware Taxon Concept resolver looks at how many Taxon
Concept Service Providers (e.g., ITIS, EOL, WoRMS, etc.) have made
some sort of concept-definition assertion about the Protonym. In most
cases, this could/should be as simple as "Concept Service [X] says
that for Protonym [IDp], follow taxon name usage-instance [IDtnu]".
Given [IDtnu], GNUB will tell us which Genus combination to use, which
orthographic spelling to use, which taxon rank to use, and which set
of Protonyms should be regarded as subjective synonyms of the taxon
concept represented by [IDtnu].  If the different taxon concept
providers (I call them "Meta Authorities") all agree (i.e., each taxon
concept provider yields the same set of ProtonymIDs), then no user
interaction is required on this step. If there are different
interpretations of what the current treatment of "Puma concolor [big
cat]" should be, then the user is presented with the different options
(and perhaps a bit of information on what the different active
concepts are, in terms of distribution and/or classification).
6. The resultant set of Protonym IDs from step 5 (the original
ProtonymID from step 2/3, plus the exploded set of Protonyms for
subjective/heterotypic synonyms from step 5), are then thrown at GBIF
(which would be GNA-Aware, and thus know how to translate all the
ProtonymIDs into a larger set of text-string names and/or GBIF may
have already cached this by converting text-string names from
occurrence providers into ProtonymIDs via GNI).
7. The user is then presented with a distributional map from GBIF
occurrence records, based on the selected Protonym of the original
submitted text-string name, cast in the context of the set of
heterotypic synonyms established in Step 5.
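
As a rough illustration only, steps 2 to 6 could be sketched in Python along these lines. None of this reflects an actual GNI, GNUB or GBIF interface: the lookup tables, the identifiers (P-cat, P-syn1, P-x, ...), the placeholder name "Aus bus", the authority labels and the resolve function are all hypothetical stand-ins.

```python
# Hypothetical stand-in for GNI: text string -> candidate protonym IDs.
GNI = {
    "Puma concolor": ["P-cat"],
    "Aus bus": ["P-x", "P-y"],  # made-up homonym case
}

# Hypothetical meta-authority assertions: protonym ID -> synonym set.
META_AUTHORITIES = {
    "AuthorityA": {
        "P-cat": frozenset({"P-cat", "P-syn1"}),
        "P-x": frozenset({"P-x"}),
    },
    "AuthorityB": {
        "P-cat": frozenset({"P-cat", "P-syn1"}),
        "P-x": frozenset({"P-x", "P-z"}),
    },
}


def resolve(text, chosen_protonym=None, chosen_authority=None):
    """Walk steps 2-6, asking the user only when there is a real choice."""
    candidates = GNI.get(text, [])
    if not candidates:
        return {"status": "unknown name"}
    # Step 3: more than one protonym behind the string means homonymy.
    if len(candidates) > 1 and chosen_protonym is None:
        return {"status": "choose homonym", "options": candidates}
    protonym = chosen_protonym or candidates[0]
    # Step 5: gather each authority's synonym set for this protonym.
    views = {
        authority: assertions[protonym]
        for authority, assertions in META_AUTHORITIES.items()
        if protonym in assertions
    }
    distinct = set(views.values())
    if len(distinct) > 1 and chosen_authority is None:
        return {"status": "choose authority", "options": sorted(views)}
    if chosen_authority is not None:
        ids = views[chosen_authority]
    elif distinct:
        ids = next(iter(distinct))  # all authorities agree
    else:
        ids = frozenset({protonym})
    # Step 6: this ID set is what would be sent on to the occurrence index.
    return {"status": "ok", "protonym_ids": sorted(ids | {protonym})}


simple = resolve("Puma concolor")
homonym = resolve("Aus bus")
disputed = resolve("Aus bus", chosen_protonym="P-x")
settled = resolve("Aus bus", chosen_protonym="P-x", chosen_authority="AuthorityB")
```

In the common case (one protonym, authorities in agreement) the call returns the ID set straight away with no user interaction.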
The bad news is that this sounds incredibly complicated.  The good
news is that it's actually not.  Especially not from the user's
perspective.
In the WORST case scenario, the user needs to provide three pieces of
information:
1. The text-string name submitted in Step 1.
2. A decision in the case of Homonyms/Homographs, what
critter/weed/microbe they're after.
3. A decision about which Meta Authority to follow for the taxon
concept.
This, again, is the WORST case scenario.  A much more likely scenario
involves fewer steps for the end user.
Consider:
Step 2 only applies in the 10%(ish) cases of text-string names
involved in some sort of Homonymy/Homography problem.  So in 90%(ish)
of cases, step 2 won't come into play.
Step 3 only applies in cases where the Meta-Authorities disagree on
the current usage of a name (e.g., ITIS is a lumper, WoRMS is a
splitter).  Even in cases where there is disagreement, the user could
simply be presented with two (or more) maps, showing each of the
current interpretations/statuses of the selected critter/weed.  For
example, the user might get a page that says "If you follow the ITIS
interpretation of this species, the map looks like this. If you follow
the WoRMS interpretation of the name, the map looks like that."
And, indeed, Step 1 wouldn't exist in the majority of cases, because I
suspect most people will get to the Map service by clicking on a link
from some web page article or database system.  In most cases, this
link would also bypass Step 2.
In other words, if we can continue to develop GNA the way we're
already developing it, we should be able to get to the point (Soon!)
where a user clicks a link on a web page, and immediately gets a
single map distribution using the taxon concept adopted by the
overwhelming majority of Meta-Authorities, or (at worst) gets more
than one map based on more than one contemporary/contentious view of
what the species concept should be (with links to more information, if
the user wants the details).
So, if we keep building GNA, we should have exactly the service that
Pete says he'd like to have (i.e., a single map with the full
distribution of the species, regardless of what text-string name is
used to label the georef'd occurrence data-points).
Simple, really....
:-)
Rich
...
-----Original Message-----
From: David Remsen (GBIF) [mailto:dremsen@gbif.org]
Sent: Saturday, June 12, 2010 10:50 AM
To: Peter DeVries
Cc: David Remsen (GBIF); Richard Pyle;
tdwg-content@lists.tdwg.org; Kevin Richards; Jerry Cooper
Subject: Re: [tdwg-content] Name is species concept thinking
Pete -
This statement has been sticking with me since I read it.   It might
be me but I don't see how that statement relates to taxon concepts.
In a concept-based system you could easily have two different maps for
Puma concolor.    Whether Felis concolor is included is not relevant
because nomenclatural synonyms have no bearing on the circumscription.
They are both names for the same type.
There may be two different concepts (circumscriptions) published for
Aedes triseriatus.   It could be quite legit for a different
(objective synonym only) name like Ochlerotatus triseriatus to refer
to that same concept.  So in that sense,   there is a rationale for
different scientific names to be able to reference the same concept to
meet that requirement of the example you cite.   But in zoology these
examples aren't even considered different names and the rule of
priority would prevent truly different (heterotypic) names from
referring to the same type so the use cases for different scientific
names being able to refer to a single concept ID are quite limited.
Mapping objective (homotypic) synonymy provides the basis for
providing a single map for those examples you cite but it's not using
true concept-based principles.
Best,
David
...
Frankly I think it would be an improvement if we could get maps etc
that combine Aedes triseriatus / Ochlerotatus triseriatus into one map
...
and Felis concolor and Puma concolor into a different single map. :-)
Respectfully,
- Pete