Re: [tdwg-content] Name is species concept thinking
Warning to Tim: Go get your cup of tea before reading....
Hi Jerry,
I believe what you describe:
I think we do need an agreed way for identifying a 'set of name/article pointers' that define a useful grouping (which I would hesitate to call a concept). I don't think an open-ended linked data chain does provide that defined grouping.
...is already built into the GNUB data model.
Starting with a particular TaxonNameUsageID instance, we can directly get a set of Protonyms that are asserted in the Usage to be included within the taxon. From these Protonyms, we can explode out as far as you want to go.
The limitation, however, is that GNUB only includes the explicit facts, not the interpreted meanings. In other words, we may know that Smith 1955 regarded Aus cus, Aus dus, and Aus eus as synonyms of Aus bus; but if he never mentioned Aus xus (either as another synonym, or as a distinct species) we can't know whether his circumscription of Aus bus would have included the type and implied other members of Aus xus. So, the facts alone don't cut it.
So if Smith treated Aus bus as follows (synonyms indented below the asserted valid name):
Aus bus L.
- Aus bus L.
- Aus cus Jones
- Aus dus Brown
- Aus eus Lamarck
And Pyle treated Aus bus as
Aus bus L.
- Aus bus L.
- Aus cus Jones
- Aus dus Brown
- Aus eus Lamarck
- Aus xus Cooper
[without any mention of Smith's treatment of Aus bus]
...then we need a third party to assert whether or not "Aus bus sec. Smith" and "Aus bus sec. Pyle" are congruent.
With our new GNUB data model, we *could* represent these third-party assertions; but my gut feeling is that the third-party assertions should be external to the core GNUB model. But this is, of course, open for discussion.
Pete's point about the two different name/article intersections referring to the same 'concept' is resolved by the fact they are based on the same type, and that issue I prefer to see resolved at the nomenclatural level by protologue & type-collection pointers (as in the GNUB model).
Types don't do it for us. Using my example above, suppose we have DeVries' treatment of the Aus bus complex as follows.
Aus bus L.
- Aus bus L.
- Aus cus Jones
- Aus dus Brown
Aus eus Lamarck
- Aus xus Cooper
That is, he treated Aus bus and Aus eus as valid species, and included Aus xus as a synonym of the latter.
In all three treatments of Aus bus (Smith, Pyle, DeVries), the name "Aus bus" shares the same type -- but that doesn't mean that all three had the same taxon concept (circumscription).
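To make the comparison concrete, here is a minimal Python sketch of the three treatments as sets of explicitly included protonyms. The `Usage` class and the `compare` function are hypothetical illustrations, not part of the GNUB model, and the sketch only captures the explicit facts: a difference between two sets does not by itself show that the circumscriptions differ (Smith may simply not have mentioned Aus xus).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """One usage instance (hypothetical structure, not the GNUB schema)."""
    valid_name: str      # protonym asserted as the valid name
    source: str          # the "sec." reference
    included: frozenset  # protonyms explicitly listed (valid name plus synonyms)


smith = Usage("Aus bus", "Smith",
              frozenset({"Aus bus", "Aus cus", "Aus dus", "Aus eus"}))
pyle = Usage("Aus bus", "Pyle",
             frozenset({"Aus bus", "Aus cus", "Aus dus", "Aus eus", "Aus xus"}))
devries = Usage("Aus bus", "DeVries",
                frozenset({"Aus bus", "Aus cus", "Aus dus"}))


def compare(a, b):
    """Return (same valid name, protonyms only in a, protonyms only in b).

    Sharing the valid name is used here as a stand-in for sharing the type.
    """
    same_name = a.valid_name == b.valid_name
    return same_name, sorted(a.included - b.included), sorted(b.included - a.included)


for other in (pyle, devries):
    print(f"Smith vs {other.source}:", compare(smith, other))
```

All three usages share the name (and so the type), yet the explicitly included protonyms differ in each pairing, which is why the type alone cannot settle whether the concepts are congruent.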
So, in my mind, the real question is:
Do we need a separate "taxonConceptID" (as in the TDWG version of DwC) that we can use to brand the abstract Concept? Or are we able to assemble the same results from individual usage-instance mappings?
So, let's say that we are confident that Smith and Pyle both had the same idea for the taxon concept/circumscription of Aus bus (it's just that Smith forgot to list Aus xus in his synonymy). Do we need a single taxonConceptID to express the fact that they are the same taxon concept? Or can we derive that easily enough from a third-party assertion of the congruency of the concepts represented by the two usage-instances? Presumably we would also have separate taxonConceptID's for "Aus bus sec. DeVries", and for "Aus eus sec. DeVries".
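As a rough illustration of the second option, the sketch below derives concept groupings from pairwise third-party congruence assertions instead of storing a taxonConceptID. It assumes congruence can be treated as transitive (an assumption, not something the model guarantees), and the function name and the string labels are hypothetical.

```python
def concept_groups(usages, congruent_pairs):
    """Group usage instances linked by congruence assertions (union-find)."""
    parent = {u: u for u in usages}

    def find(u):
        # Walk up to the representative, compressing the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in congruent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for u in usages:
        groups.setdefault(find(u), []).append(u)
    return sorted(sorted(g) for g in groups.values())


usages = [
    "Aus bus sec. Smith",
    "Aus bus sec. Pyle",
    "Aus bus sec. DeVries",
    "Aus eus sec. DeVries",
]
# A single third-party assertion: Smith and Pyle meant the same thing.
assertions = [("Aus bus sec. Smith", "Aus bus sec. Pyle")]

groups = concept_groups(usages, assertions)
for g in groups:
    print(g)
```

Each resulting group plays the role a taxonConceptID would play; the difference is that the grouping is recomputed from the assertions and changes if a different third party's assertions are used.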
I think it would help if we took a step back from using the term 'taxon concept' and agreed on what we are trying to achieve by linking/grouping the various constructs, and then arrive at a more precisely defined vocabulary for name/article intersections, and the open-ended universe of related stuff.
AGREED!!!
I suspect we will find that different end-user groups (e.g. hard core nomenclaturalists, nomenclaturally savvy taxonomists, most taxonomists, and the most important group ... non taxonomic savvy end-users of taxonomic services) all have differing and overlapping requirements, and a different understanding of the words being used.
NO DOUBT!
Despite my peripheral involvement in taxon concept space for many years I suspect the above comments reflect a deep seated blinkered view that stops me seeing how it should work given the existing vocabulary!
ME TOO! :-)
Rich
Richard and Jerry are getting at something that we need to think about.
There will be differences in what people think is an ideal species concept model.
Depending on the needs of different groups they may need some different conceptualization.
I recognize the utility in documenting the different conceptualizations over time.
There are however a lot of people who are more interested in these kinds of relationships.
<SnowShoeHare> <preyItemOf> <NorthAmericanLynx>
<Ochlerotatus_triseriatus> <documentedVectorOf> <LaCrosseEncephalitis>
A number of the records that have been submitted to GBIF etc are from groups that are primarily thinking about species concepts in this way.
In fact traditional taxonomy does not have much information that helps separate those individuals that can be vectors or pathogens and those that cannot.
For these kinds of users it might be more useful to have an open machine interpretable document that can be used to determine the criteria for what species concepts are good matches for a specimen and what species concepts are not good matches.
In this sense it is a way to agree on what are the characteristics (and the variability in those characteristics) that assign specimen x most closely with species concept Y.
I think that the species concepts that you describe in GNUB will allow you to capture the variations in circumscriptions you are talking about.
I also think that the general things Dima and I are working through will help, either directly through the TaxonConcept concepts or by figuring out the best way to get the GNUB concepts working as you describe in a triplestore. I think they may be thinking that the TaxonConcept concepts might help them determine how best to express the GNUB concepts in RDF if they will be different.
What proportion of currently described taxa have these types of overlapping circumscriptions? I know of several insects like this, but what about mammals like the Cougar?
I hope that people are willing to accept that there may be different types of species concepts depending on how they are intended to be used.
We may not be able to agree on what a species is, but I think we might be able to get to the point where it is clearer when we state that identifier X means instances with this somewhat variable set of characteristics.
Frankly I think it would be an improvement if we could get maps etc that combine Aedes triseriatus / Ochlerotatus triseriatus into one map and Felis concolor and Puma concolor into a different single map. :-)
Respectfully,
- Pete
This is sounding scarily close to my Organism Concept idea I had a while back. ;-)
Pete -
This statement has been sticking with me since I read it. It might be me but I don't see any relationship between that statement and how this relates to taxon concepts. In a concept-based system you could easily have two different maps for Puma concolor. Whether Felis concolor is included is not relevant because nomenclatural synonyms have no bearing on the circumscription. They are both names for the same type.
There may be two different concepts (circumscriptions) published for Aedes triseriatus. It could be quite legit for a different (objective synonym only) name like Ochlerotatus triseriatus to refer to that same concept. So in that sense, there is a rationale for different scientific names to be able to reference the same concept to meet that requirement of the example you cite. But in zoology these examples aren't even considered different names, and the rule of priority would prevent truly different (heterotypic) names from referring to the same type, so the use cases for different scientific names being able to refer to a single concept ID are quite limited.
Mapping objective (homotypic) synonymy provides the basis for providing a single map for those examples you cite but it's not using true concept-based principles.
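A minimal sketch of what that could look like, assuming a lookup from each name string to a shared homotypic group. The group keys, the `HOMOTYPIC_GROUP` table, and the coordinates are all invented for illustration; a real service would resolve names against a nomenclatural index.

```python
from collections import defaultdict

# Hypothetical lookup: name string -> key for its homotypic (same-type) group.
HOMOTYPIC_GROUP = {
    "Aedes triseriatus": "group:triseriatus",
    "Ochlerotatus triseriatus": "group:triseriatus",
    "Felis concolor": "group:concolor",
    "Puma concolor": "group:concolor",
}


def group_for_mapping(records):
    """Pool (name, lat, lon) records into one map layer per homotypic group."""
    layers = defaultdict(list)
    for name, lat, lon in records:
        # Names missing from the lookup stay on their own layer.
        key = HOMOTYPIC_GROUP.get(name, name)
        layers[key].append((lat, lon))
    return dict(layers)


# Invented occurrence records, for illustration only.
records = [
    ("Aedes triseriatus", 43.1, -89.4),
    ("Ochlerotatus triseriatus", 44.0, -90.1),
    ("Felis concolor", 39.5, -105.8),
    ("Puma concolor", 40.2, -106.3),
]

layers = group_for_mapping(records)
for key, points in layers.items():
    print(key, points)
```

This gives one layer per type, not one layer per circumscription, which is the distinction being drawn above.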
Best, David
Frankly I think it would be an improvement if we could get maps etc that combine Aedes triseriatus / Ochlerotatus triseriatus into one map and Felis concolor and Puma concolor into a different single map. :-)
Respectfully,
- Pete
Hi David,
A while back on Taxacom someone stated that they considered the scientific name including author is the species concept.
I refer to this mind set as "Name is species concept thinking"
It was in reference to a discussion of whether species concepts were even needed.
It might be useful to step back a bit and consider all the data sets that touch on the idea of a species.
This includes: occurrence records, field notes, academic publications.
Many of these do not include the authority information; they simply list the genus and species.
Few of these records are created by someone who has thought about the circumscription of the specific species concept for which they are creating data.
Many use the name in the key or the name that those around them use, with little thought as to the original type specimens and original species description.
The original description for Ochlerotatus triseriatus is about a paragraph and could have actually been one of about 10 species.
As far as I can tell the original type specimen is missing.
Modeling the relationships between a large number of these data sets as if they are based on the idea that the data creator actually read the original species description and thought about the actual species circumscription is inappropriate.
That said, modeling relationships between taxonomic publications where the authors actually read the original species description, reviewed the type specimens, and thought about the actual species circumscription is appropriate.
Also consider that a large proportion of specimens are misidentified, and it occurs to me that modeling things like species occurrences as if they are Puma concolor (Linnaeus, 1771) sensu stricto is probably not appropriate. At best they are something like (Felis concolor / Puma concolor) with some significant level of error.
- Pete
That said, modeling relationships between taxonomic publications where the authors actually read the original species description, reviewed the type specimens, and thought about the actual species circumscription is appropriate.
This is the sort of things the Meta-Authorities would take into account when selecting a "follow-this-treatment" Usage-Instance for the preferred treatment of a name.
Also consider that a large proportion of specimens are misidentified, and it occurs to me that modeling things like species occurrences as if they are Puma concolor (Linnaeus, 1771) sensu stricto is probably not appropriate. At best they are something like (Felis concolor / Puma concolor) with some significant level of error.
GNA can't help with that directly -- but it can help indirectly. Imagine a service that takes every specimen in a given collection's database, and runs it against a mapping service as I described in the previous message. I can easily imagine a GIS-based algorithm that finds "outliers" -- that is, occurrence records that appear to be outside the distribution based on the occurrence records from other sources. A clever/robust algorithm of this kind could probably even discern whether the outlier likely represented a range extension (e.g., poorly-known species, plausible extension) vs. a misidentification (e.g., well-known species and/or common misidentification).
This would lead to a set of flagged records from the collection that might be misidentified.
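A very rough sketch of the simplest version of such a check, assuming plain latitude/longitude points and a fixed distance threshold. The function names, the 500 km default, and the sample coordinates are all invented; this only flags records far from every reference record and makes no attempt to tell a range extension from a misidentification.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def flag_outliers(collection_points, reference_points, max_km=500.0):
    """Flag collection records farther than max_km from every reference record."""
    if not reference_points:
        return []  # nothing to compare against, so nothing can be flagged
    flagged = []
    for point in collection_points:
        nearest = min(haversine_km(point, ref) for ref in reference_points)
        if nearest > max_km:
            flagged.append(point)
    return flagged


# Invented coordinates: reference records from other sources, then one collection.
reference = [(43.0, -89.0), (44.0, -90.0), (42.0, -88.0)]
collection = [(43.5, -89.5), (10.0, 20.0)]

print(flag_outliers(collection, reference))
```

The flagged list is only a set of candidates for a human to review; the threshold would need to vary by how well the species' range is known.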
Rich
I think that the problem is that most species descriptions are written in a way that person1 interprets specimenA as conceptB and person2 interprets specimenA as conceptC.
This needs to be made more scientific so that one can test what proportions of specimens actually conform to the description (concept).
These descriptions should be open, world readable and reference-able via a URI.
Respectfully,
- Pete
** There also seems to be a mismatch between the concept the human identifier chose (often via a key) and the species description (concept) to which you are saying their data applies.
What you're asking for would certainly be nice! But I was aiming more for what you described as "an improvement". Baby steps.... :-)
Seriously, though -- I agree taxonomists have failed to be sufficiently explicit in their writings over the centuries to provide the raw material for machine-generated reasoning and inferencing through the content of their documents. However, I'm not so sure they have failed to provide sufficient information to allow for (mostly) reliable and accurate human- (or at least taxonomist-) generated reasoning and inferencing. That's why I think a key aspect of all of this -- especially for legacy content -- is third-party assertions. I don't think it's true that "most" species descriptions result in persons 1&2 assigning a given specimen to two separate concepts. But certainly there are enough to represent a non-trivial problem.
Rich
Whoops, did I say most? That was probably an overstatement.. sorry!
To Paddy et al. I don't know if we really know unless we have some idea of the process by which they determined what name to use.
- Pete
On Sun, Jun 13, 2010 at 4:00 AM, Richard Pyle deepreef@bishopmuseum.orgwrote:
What you're asking for would certainly be nice! But I was aiming more for what you described as "an improvement". Baby steps.... :-)
Seriously, though -- I agree taxonomists have failed to be sufficiently explicit in their writings over the centuries to provide the raw material for machine-generated reasoning and inferencing through the content of their documents. However, I'm not so sure they have failed to provide sufficient information to allow for (mostly) reliable and accurate human- (or at least taxonomist-) generated reasoning and inferencing. That's why I think a key aspect of all of this -- especially for legacy content -- is third-party assertions. I don't think it's true that "most" species descriptions result in persons 1&2 assinging a given specimen to two separate concepts. But certainly there are enough to represent a non-trivial problem.
Rich
From: Peter DeVries [mailto:pete.devries@gmail.com] Sent: Saturday, June 12, 2010 7:24 PM To: Richard Pyle Cc: David Remsen (GBIF); tdwg-content@lists.tdwg.org; Kevin Richards; Jerry Cooper; dmozzherin; David Patterson Subject: Re: [tdwg-content] Name is species concept thinking
I think that the problem is that most species descriptions are written in a way that person1 interprets specimenA as conceptB and person2 interprets specimenA as conceptC.
This needs to be made more scientific so that one can test what proportions of specimens actually conform to the description (concept).
These descriptions should be open, world readable and reference-able via a URI.
Respectfully,
- Pete
** There also seems to be a mismatch between the concept the human identifier chose (often via a key) and the species description (concept) to which you are saying their data applies.
On Sat, Jun 12, 2010 at 7:50 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
That said, modeling relationships between taxonomic publications where the authors actually read the original species description, reviewed the type specimens, and thought about the actual species circumscription is appropriate.
This is the sort of thing the Meta-Authorities would take into account when selecting a "follow-this-treatment" Usage-Instance for the preferred treatment of a name.
Also consider that a large proportion of specimens are misidentified, and it occurs to me that modeling things like species occurrences as if they are Puma concolor (Linnaeus, 1771) sensu stricto is probably not appropriate. At best they are something like (Felis concolor / Puma concolor) with some significant level of error.
GNA can't help with that directly -- but it can help indirectly. Imagine a service that takes every specimen in a given collection's database, and runs it against a mapping service as I described in the previous message. I can easily imagine a GIS-based algorithm that finds "outliers" -- that is, occurrence records that appear to be outside the distribution based on the occurrence records from other sources. A clever, robust algorithm of this kind could probably even discern whether the outlier likely represented a range extension (e.g., poorly-known species, plausible extension) vs. a misidentification (e.g., well-known species and/or common misidentification).
This would lead to a set of flagged records from the collection that might be misidentified.
Rich
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 GeoSpecies Knowledge Base About the GeoSpecies Knowledge Base
Pete - you just summed up much of where I left off in my just-sent post. If we can present those curating occurrence records with a comprehensive set of relevant concept options, ideally tied to reference publications like field guides, etc., we can support the use of concept identifiers in those records. That provides the basis for having some idea of the process by which they determined what name to use.
DR
To Paddy et al. I don't know if we really know unless we have some idea of the process by which they determined what name to use.
- Pete
It would be immensely useful if someone could capture a clear and concise summary of this lively discussion in the commentary on the Darwin Core term taxonConceptID at http://code.google.com/p/darwincore/wiki/Taxon, which is sadly lacking in any guidance on the subject.
On Sun, Jun 13, 2010 at 3:58 AM, David Remsen (GBIF) dremsen@gbif.org wrote:
Pete - you just summed up much of where I left off in my just-sent post. If we can present those curating occurrence records with a comprehensive set of relevant concept options, ideally tied to reference publications like field guides, etc., we can support the use of concept identifiers in those records. That provides the basis for having some idea of the process by which they determined what name to use. DR
It would certainly help if we could have the tools/services/content/UIs in place to allow people who taxonomically identify occurrence records to point-and-click among the options. However, we don't need to have that infrastructure in place to get started. Whenever I give advice to people setting up data-gathering protocols involving occurrence records (of any kind), it is to always always always *always* record the field guide or taxonomic treatment that was used in establishing the identification. Even if the person is an expert in the group, and identified it from their brain, they should still include some sort of published treatment as a "sensu" reference. Eventually, these treatments will get plugged into the taxon concept "matrix", and the identification will have taxon-concept context.
The point is, we don't need to build the tools before people can start capturing the information.
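As a rough illustration of capturing that information today, here is a small sketch of an identification record that always carries its "sensu" reference. The class and field names are hypothetical and do not come from any existing schema; the concept identifier is left empty until the treatment has been plugged into a concept "matrix".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Identification:
    """One taxonomic identification of an occurrence record (hypothetical fields)."""
    occurrence_id: str
    name_string: str                        # the name as written by the identifier
    sensu_reference: str                    # field guide or treatment that was used
    taxon_concept_id: Optional[str] = None  # filled in later, once concepts are mapped

def missing_sensu(identifications):
    """List occurrence IDs whose identification lacks a 'sensu' reference."""
    return [i.occurrence_id for i in identifications if not i.sensu_reference.strip()]
```

A check like `missing_sensu` could run at data-entry time, long before any concept resolution service exists.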
Rich
_____
From: David Remsen (GBIF) [mailto:dremsen@gbif.org] Sent: Sunday, June 13, 2010 12:59 AM To: Peter DeVries Cc: David Remsen (GBIF); Richard Pyle; tdwg-content@lists.tdwg.org; Kevin Richards; Jerry Cooper; dmozzherin; David Patterson Subject: Re: [tdwg-content] Name is species concept thinking
Pete - you just summed up much of where I left off in my just-sent post. If we can present those curating occurrence records with a comprehensive set of relevant concept options, ideally tied to reference publications like field guides, etc., we can support the use of concept identifiers in those records. That provides the basis for having some idea of the process by which they determined what name to use.
DR
Tim: Coffee time.
Dave:
Here's how I imagine this would work under GNA, integrated with GBIF:
1. Person submits text-string "Puma concolor" to a GNA-aware mapping service.
2. Service fires text string off to GNI, and sees how many lexical buckets are involved, and how many protonyms are represented in those buckets.
3. If problems of Homonymy/Homography exist (i.e., if more than one legitimate Protonym for a species-group name "concolor" has ever been combined with a genus-group name "Puma"), then the service replies with a page that says "Do you mean the big cat, or do you mean the protozoan?" (pretending, for a moment, that the name "Puma concolor" has also been applied to a protozoan). Perhaps the service can also review the usage history of the two names, and algorithmically determine that they most likely meant the big cat -- but at least alert the user that a potential case of homonymy/homography exists.
4. If step 2 yielded no apparent Homonymy/Homography, or if the user selected one from among more than one Homonyms/Homographs, then the service takes the selected ProtonymID and throws it at a GNUB-aware taxon concept resolver.
5. The GNUB-aware Taxon Concept resolver looks at how many Taxon Concept Service Providers (e.g., ITIS, EOL, WoRMS, etc.) have made some sort of concept-definition assertion about the Protonym. In most cases, this could/should be as simple as "Concept Service [X] says that for Protonym [IDp], follow taxon name usage-instance [IDtnu]". Given [IDtnu], GNUB will tell us which Genus combination to use, which orthographic spelling to use, which taxon rank to use, and which set of Protonyms should be regarded as subjective synonyms of the taxon concept represented by [IDtnu]. If the different taxon concept providers (I call them "Meta Authorities") all agree (i.e., each taxon concept provider yields the same set of ProtonymIDs), then no user interaction is required on this step. If there are different interpretations of what the current treatment of "Puma concolor [big cat]" should be, then the user is presented with the different options (and perhaps a bit of information on what the different active concepts are, in terms of distribution and/or classification).
6. The resultant set of Protonym IDs from step 5 (the original ProtonymID from step 2/3, plus the exploded set of Protonyms for subjective/heterotypic synonyms from step 5) is then thrown at GBIF (which would be GNA-aware, and thus know how to translate all the ProtonymIDs into a larger set of text-string names; and/or GBIF may have already cached this by converting text-string names from occurrence providers into ProtonymIDs via GNI).
7. The user is then presented with a distributional map from GBIF occurrence records, based on the selected Protonym of the original submitted text-string name, cast in the context of the set of heterotypic synonyms established in Step 5.
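The seven steps above can be sketched roughly as follows. This is a toy, in-memory illustration only: the function names, the dictionary layouts, and every identifier in the sample data are invented, and none of it reflects an actual GNI, GNUB, or GBIF interface.

```python
def resolve_name(name, gni_index, chosen_protonym=None):
    """Steps 2-4: map a text string to one ProtonymID, or report homonyms."""
    candidates = gni_index.get(name, [])
    if len(candidates) == 1:
        return candidates[0], []
    if chosen_protonym in candidates:
        return chosen_protonym, []
    return None, candidates  # user must choose (or the name is unknown)

def concept_options(protonym_id, authority_views):
    """Step 5: group Meta Authorities by the protonym set each one asserts."""
    options = {}
    for authority, views in authority_views.items():
        if protonym_id in views:
            options.setdefault(frozenset(views[protonym_id]), []).append(authority)
    return options  # a single key means all authorities agree

def occurrences_for(protonym_ids, occurrences):
    """Steps 6-7: select occurrence records whose ProtonymID is in the set."""
    return [rec for rec in occurrences if rec["protonym_id"] in protonym_ids]

# Toy data; every identifier below is invented for illustration.
GNI = {"Puma concolor": ["p1"], "Felis concolor": ["p1"]}
VIEWS = {"AuthorityA": {"p1": {"p1", "p2"}}, "AuthorityB": {"p1": {"p1", "p2"}}}
OCCURRENCES = [
    {"id": "o1", "protonym_id": "p1"},
    {"id": "o2", "protonym_id": "p2"},
    {"id": "o3", "protonym_id": "p9"},
]

protonym, homonyms = resolve_name("Puma concolor", GNI)
options = concept_options(protonym, VIEWS)
if len(options) == 1:  # authorities agree, so no user interaction is needed
    (protonym_set,) = options
    mapped = occurrences_for(protonym_set, OCCURRENCES)
```

When `options` has more than one key, the caller would present each protonym set (and the authorities behind it) as a separate choice or a separate map.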
The bad news is that this sounds incredibly complicated. The good news is that it's actually not. Especially not from the user's perspective.
In the WORST case scenario, the user needs to provide three pieces of information:
1. The text-string name submitted in Step 1.
2. A decision in the case of Homonyms/Homographs, what critter/weed/microbe they're after.
3. A decision about which Meta Authority to follow for the taxon concept.
This, again, is the WORST case scenario. A much more likely scenario involves fewer steps for the end user.
Consider:
Step 2 only applies in the 10%(ish) of cases where text-string names are involved in some sort of Homonymy/Homography problem. So in 90%(ish) of cases, step 2 won't come into play.
Step 3 only applies in cases where the Meta-Authorities disagree on the current usage of a name (e.g., ITIS is a lumper, WoRMS is a splitter). Even in cases where there is disagreement, the user could simply be presented with two (or more) maps, showing each of the current interpretations/statuses of the selected critter/weed. For example, the user might get a page that says "If you follow the ITIS interpretation of this species, the map looks like this. If you follow the WoRMS interpretation of the name, the map looks like that."
And, indeed, Step 1 wouldn't exist in the majority of cases, because I suspect most people will get to the Map service by clicking on a link from some web page article or database system. In most cases, this link would also bypass Step 2 as well.
In other words, if we can continue to develop GNA the way we're already developing it, we should be able to get to the point (Soon!) where a user clicks a link on a web page, and immediately gets a single map distribution using the taxon concept adopted by the overwhelming majority of Meta-Authorities, or (at worst) gets more than one map based on more than one contemporary/contentious view of what the species concept should be (with links to more information, if the user wants the details).
So, if we keep building GNA, we should have exactly the service that Pete says he'd like to have (i.e., a single map with the full distribution of the species, regardless of what text-string name is used to label the georef'd occurrence data-points).
Simple, really....
:-)
Rich
-----Original Message----- From: David Remsen (GBIF) [mailto:dremsen@gbif.org] Sent: Saturday, June 12, 2010 10:50 AM To: Peter DeVries Cc: David Remsen (GBIF); Richard Pyle; tdwg-content@lists.tdwg.org; Kevin Richards; Jerry Cooper Subject: Re: [tdwg-content] Name is species concept thinking
Pete -
This statement has been sticking with me since I read it. It might be me but I don't see any relationship between that statement and how this relates to taxon concepts. In a concept-based system you could easily have two different maps for Puma concolor. Whether Felis concolor is included is not relevant because nomenclatural synonyms have no bearing on the circumscription. They are both names for the same type.
There may be two different concepts (circumscriptions) published for Aedes triseriatus. It could be quite legit for a different (objective synonym only) name like Ochlerotatus triseriatus to refer to that same concept. So in that sense, there is a rationale for different scientific names to be able to reference the same concept to meet the requirement of the example you cite. But in zoology these examples aren't even considered different names, and the rule of priority would prevent truly different (heterotypic) names from referring to the same type, so the use cases for different scientific names being able to refer to a single concept ID are quite limited.
Mapping objective (homotypic) synonymy provides the basis for providing a single map for those examples you cite but it's not using true concept-based principles.
Best, David
Frankly I think it would be an improvement if we could get maps etc that combine Aedes triseriatus / Ochlerotatus triseriatus into one map and Felis concolor and Puma concolor into a different single map. :-)
Respectfully,
- Pete
Rich
What you described in 1-5 was exactly the scope and function of uBio NameBank and ClassificationBank. This functionality has been refined in our ChecklistBank index.
It serves to provide a consistent resolution service for "Taxon Concept Service Providers". By linking to a populated GNUB it would also have an improved means to provide the protonym circumscription of the concept, as you describe in (5). In addition, we would like to support the inclusion of bibliographic data, specimens, geospatial information, and general descriptive data. The DwC Archive approach provides one (not exclusively but I would appreciate pointers to others) means to mobilise these data from people who have it.
In (5) you describe the protonym-based circumscription to evaluate the relative agreement of the identified concepts (via 'meta-authorities'). This provides the basis for expanding the potential set of names for a subsequent data retrieval from GBIF (for example) to include all the related nomenclatural and lexical variants for those names (of course checking for homonym conflicts among them). Again, this is consistent with what was implemented in uBio services and what we are currently implementing in our Checklist Bank (CLB) (I use the term General Concept Mapping for this process). I'm not sure I agree that this provides a true concept-based system, however. I would call it a concept-informed system.
In (6) it appears the output of the Taxon Concept resolution process is either an expanded set of name strings or an array of protonymIDs. I can see this is an option in (6). If the latter, I can see how this would provide a more precise concept-informed but name-based retrieval method and probably the best we can expect from large indices like GBIF. But I don't see how it will support a strict concept-based retrieval.
The real world example that forms my litmus test is the blue-headed vireo, Vireo solitarius (Wilson 1810), which was originally called Muscicapa solitaria and has also been combined to form Vireosylvia solitaria and Lanivireo solitarius. Of course there are lexical variants as well (Google "Lanivireo solitaria" for example). These, properly structured, would be the sort of useful set of lexical/nomenclatural content I would hope for as a response from a GNI/GNUB resolution service based on protonymID.
One current view of the taxon (concept C1) has this species occupying the eastern part of the US. Another species, Vireo plumbeus Coues, 1866 (concept C2), occupies the middle west USA, and a third species, Vireo cassinii Xántus de Vesey, 1858 (concept C3), is on the western coast.
Another view lumps all three of these into a single species which, based on the rule of priority, has the valid name Vireo solitarius and results in a new concept (C4). This concept includes C1, C2, and C3. Both C1 and C4 therefore carry the scientific name Vireo solitarius.
We can access and represent these in a consistent fashion using our CLB and probably others can too in their own index models.
So, now we have a specimen of Vireo solitarius that was captured in Minnesota. It might be an errant instance of C1, Vireo solitarius sensu stricto, that strayed a bit west of normal. It might be (C4) Vireo solitarius, sensu lato. The specimen would need that concept identifier tied to the record to make this explicit. So, let's say that the identifier was made using the lumped concept (C4). Of course, if this doesn't make it into the record, we are stuck with the name alone.
Using the method (6) you described would allow a user to discover the different treatments of Vireo solitarius (C1 and C4) and provide some means to discriminate them via concept resolution.
- C4 includes C1, C2, and C3, which would include all the names above.
- C1 would only include the nomenclatural/lexical variants for Vireo solitarius.
Resolution will enable us to perform a significantly more useful and concept-informed search. It will, however, include the specimen I referenced above in BOTH cases because "Vireo solitarius" or its protonymID will be a search term in both cases.
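A small sketch of why the name-based (concept-informed) search returns the Minnesota specimen under either concept. The protonym identifiers and record layout are hypothetical; only the C1/C4 grouping follows the example above.

```python
# Hypothetical protonym IDs for the three vireo names discussed above.
P_SOL, P_PLU, P_CAS = "prot-solitarius", "prot-plumbeus", "prot-cassin"

CONCEPTS = {
    "C1": {P_SOL},                # Vireo solitarius sensu stricto
    "C4": {P_SOL, P_PLU, P_CAS},  # Vireo solitarius sensu lato
}

SPECIMENS = [
    {"id": "minnesota-1", "protonym_id": P_SOL},  # no concept ID on the record
    {"id": "west-1", "protonym_id": P_CAS},
]

def name_based_search(concept, specimens):
    """Concept-informed search: match on the concept's protonym set only."""
    return [s["id"] for s in specimens if s["protonym_id"] in CONCEPTS[concept]]
```

Because the protonym for "solitarius" belongs to both sets, the search cannot tell which sense of the name the identifier intended.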
A more precise concept-based system would utilise a required taxon concept identifier in the specimen record to discriminate different uses of the SAME NAME. In other words, if you did a search of Vireo solitarius and the concept resolver indicated the different concepts above and you chose the sensu stricto (split) version, you would get the C1 labelled records but the C4 labelled records would be excluded or at least come with a warning (may not be what you are looking for). This of course requires our specimen records to have a concept identifier. Or, the concept definition itself will include additional annotations to enable us to make inferences.
Ex.,
- Publication date of the concept: if the split didn't happen until 1980 and the specimen is from 1960, then we could infer C4.
- Distribution information for the concept: if we disregard errant specimens, then we might infer a 1985 Minnesota specimen is a C2 in spite of the different name.
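A hedged sketch of the stricter, concept-based retrieval with the date-based inference described above. The record layout and function names are hypothetical, and the 1980 split year is simply the illustrative figure used in the example, not a verified date. In this sketch, records labelled with a different concept are excluded, and unlabelled records that cannot be inferred are returned separately as warnings; the distribution-based inference is not attempted.

```python
def infer_concept(specimen, split_year=1980):
    """Return the recorded concept ID, or infer one from the collection year."""
    if specimen.get("concept_id"):
        return specimen["concept_id"]
    if specimen["year"] < split_year:
        return "C4"   # identified before the split, so the lumped concept applies
    return None       # after the split, the name alone is ambiguous

def concept_based_search(concept_id, specimens):
    """Split results into records matching the concept and records needing a warning."""
    matches, warnings = [], []
    for s in specimens:
        inferred = infer_concept(s)
        if inferred == concept_id:
            matches.append(s["id"])
        elif inferred is None:
            warnings.append(s["id"])
    return matches, warnings
```

The warning list corresponds to the "may not be what you are looking for" case: the records share the name but carry no evidence of which concept was meant.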
In sum, we are on track for achieving this and I believe our data mobilisation strategy will support getting these sort of data published. When Markus returns from paternity leave I would hope we could include his thoughts on how we might expose these as RDF via our indices to support all aspects of this discussion.
David
On Jun 13, 2010, at 2:37 AM, Richard Pyle wrote:
Tim: Coffee time.
Dave:
Here's how I imagine this would work under GNA, integrated with GBIF:
- Person submits text-string "Puma concolor" to a GNA-aware mapping
service.
- Service fires text string off to GNI, and sees how many lexical
buckets are involved, and how many protonyms are represented in those buckets.
- If problems of Homonymy/Homography exist (i.e., if more than one
legitimate Protonym for a species-group name "concolor" has ever been combined with a genus-group name "Puma"), then the service replies with a page that says "Do you mean the big cat, or do you mean the protozoa?" (pretending, for a moment, that the name "Puma concolor" has also been applied to a protozoa). Perhaps the service can also review the usage history of the two names, and algorithmically determine that they most likely meant the big cat -- but at least alert the user that a potential case of homonymy/homography exists.
- If step 2 yielded no apaprent homonymy/Homography, or if the user
selected one from among more than one Homonyms/Homographs, then the service takes the selected ProtonymID and throws it at a GNUB-aware taxon concept resolver.
- The GNUB-aware Taxon Concept resolver looks at how many Taxon
Concept Service Providers (e.g., ITIS, EOL, WoRMS, etc.) have made some sort of concept-definition assertion about the Protonym. In most cases, this could/should be as simple as "Concept Service [X] says that for Protonym [IDp], follow taxon name usage-instance [IDtnu]". Given [IDtnu], GNUB will tell us which Genus combination to use, which orthographic spelling to use, which taxon rank to use, and which set of Protonyms should be regarded as subjective synonyms of the taxon concept represented by [IDtnu]. If the different taxon concept providers (I call them "Meta Authorities") all agree (i.e., each taxon concept provider yields the same set of ProtonymIDs), then no user interaction is required on this step. If there are different interpretations of what the current treatment of "Puma concolor [big cat]" should be, then the user is presented with the different options (and perhaps a bit of information on what the different active concepts are, in terms of distribution and/or classification).
- The resultant set of Protonym IDs from step 5 (the original
ProtonymID from step 2/3, plus the exploded set of Protonyms for subjective/heterotypic synonyms from step 5) is then thrown at GBIF (which would be GNA-aware, and thus know how to translate all the ProtonymIDs into a larger set of text-string names; alternatively, GBIF may have already cached this by converting text-string names from occurrence providers into ProtonymIDs via GNI).
- The user is then presented with a distributional map from GBIF
occurrence records, based on the selected Protonym of the original submitted text-string name, cast in the context of the set of heterotypic synonyms established in Step 5.
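The flow in the steps above can be sketched in a few lines of Python. This is only a rough illustration, not GNI or GNUB code: the lookup tables, the authority names (AuthorityA, AuthorityB, AuthorityC), the identifiers (P1, TNU-10, and so on), and the function resolve() are all hypothetical stand-ins, and the "Aus bus" homonym case is invented in the same way the protozoan is above.

```python
# All IDs, authority names, and data below are hypothetical placeholders.

# GNI stand-in: text-string name -> candidate ProtonymIDs.
GNI_INDEX = {
    "Puma concolor": ["P1"],
    "Felis concolor": ["P1"],
    "Aus bus": ["P7", "P8"],  # pretend homonym/homograph case
}

# Meta Authority -> {ProtonymID: TaxonNameUsageID it says to follow}.
AUTHORITY_ASSERTIONS = {
    "AuthorityA": {"P1": "TNU-10"},
    "AuthorityB": {"P1": "TNU-11"},
    "AuthorityC": {"P1": "TNU-20"},
}

# GNUB stand-in: TaxonNameUsageID -> ProtonymIDs included in that usage.
GNUB_USAGES = {
    "TNU-10": frozenset({"P1", "P2"}),
    "TNU-11": frozenset({"P1", "P2"}),
    "TNU-20": frozenset({"P1", "P2", "P3"}),
}


def resolve(name, chosen_protonym=None, authorities=None):
    """Return either a resolved ProtonymID set or a question for the user."""
    candidates = GNI_INDEX.get(name, [])
    if not candidates:
        return {"status": "unknown_name"}

    # Homonymy/homography check: ask unless the user already chose.
    if chosen_protonym is None:
        if len(candidates) > 1:
            return {"status": "choose_homonym", "options": list(candidates)}
        chosen_protonym = candidates[0]
    elif chosen_protonym not in candidates:
        return {"status": "unknown_name"}

    # Group Meta Authorities by the ProtonymID set their usage implies.
    groups = {}
    for authority in authorities or sorted(AUTHORITY_ASSERTIONS):
        tnu_id = AUTHORITY_ASSERTIONS.get(authority, {}).get(chosen_protonym)
        if tnu_id is not None:
            groups.setdefault(GNUB_USAGES[tnu_id], []).append(authority)

    if not groups:
        # No authority has spoken: fall back to the protonym alone.
        return {"status": "resolved", "protonyms": [chosen_protonym]}
    if len(groups) == 1:
        (protonyms,) = groups
        return {"status": "resolved", "protonyms": sorted(protonyms)}
    # Disagreement: one option (one map) per distinct circumscription.
    return {
        "status": "choose_concept",
        "options": [
            {"authorities": auths, "protonyms": sorted(pset)}
            for pset, auths in groups.items()
        ],
    }
```

With this toy data, resolve("Felis concolor") comes back with two options because AuthorityC follows a broader usage than the other two, while restricting the call to AuthorityA and AuthorityB resolves without any user interaction. The resolved ProtonymID list is what would be handed to the occurrence index.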
The bad news is that this sounds incredibly complicated. The good news is that it's actually not. Especially not from the user's perspective.
In the WORST case scenario, the user needs to provide three pieces of information:
- The text-string name submitted in Step 1.
- A decision in the case of Homonyms/Homographs, what critter/weed/microbe they're after.
- A decision about which Meta Authority to follow for the taxon concept.
This, again, is the WORST case scenario. A much more likely scenario involves fewer steps for the end user.
Consider:
Step 2 only applies in the 10%(ish) cases of text-string names involved in some sort of Homonymy/Homography problem. So in 90%(ish) of cases, step 2 won't come into play.
Step 3 only applies in cases where the Meta-Authorities disagree on the current usage of a name (e.g., ITIS is a lumper, WoRMS is a splitter). Even in cases where there is disagreement, the user could simply be presented with two (or more) maps, showing each of the current interpretations/statuses of the selected critter/weed. For example, the user might get a page that says "If you follow the ITIS interpretation of this species, the map looks like this. If you follow the WoRMS interpretation of the name, the map looks like that."
And, indeed, Step 1 wouldn't exist in the majority of cases, because I suspect most people will get to the Map service by clicking on a link from some web page article or database system. In most cases, this link would also bypass Step 2 as well.
In other words, if we can continue to develop GNA the way we're already developing it, we should be able to get to the point (Soon!) where a user clicks a link on a web page, and immediately gets a single map distribution using the taxon concept adopted by the overwhelming majority of Meta-Authorities, or (at worst) gets more than one map based on more than one contemporary/contentious view of what the species concept should be (with links to more information, if the user wants the details).
So, if we keep building GNA, we should have exactly the service that Pete says he'd like to have (i.e., a single map with the full distribution of the species, regardless of what text-string name is used to label the georef'd occurrence data-points).
Simple, really....
:-)
Rich
-----Original Message----- From: David Remsen (GBIF) [mailto:dremsen@gbif.org] Sent: Saturday, June 12, 2010 10:50 AM To: Peter DeVries Cc: David Remsen (GBIF); Richard Pyle; tdwg-content@lists.tdwg.org; Kevin Richards; Jerry Cooper Subject: Re: [tdwg-content] Name is species concept thinking
Pete -
This statement has been sticking with me since I read it. It might be me but I don't see any relationship between that statement and how this relates to taxon concepts. In a concept-based system you could easily have two different maps for Puma concolor. Whether Felis concolor is included is not relevant because nomenclatural synonyms have no bearing on the circumscription. They are both names for the same type.
There may be two different concepts (circumscriptions) published for Aedes triseriatus. It could be quite legit for a different (objective synonym only) name like Ochlerotatus triseriatus to refer to that same concept. So in that sense, there is a rationale for different scientific names to be able to reference the same concept to meet that requirement of the example you cite. But in zoology these examples aren't even considered different names, and the rule of priority would prevent truly different (heterotypic) names from referring to the same type, so the use cases for different scientific names being able to refer to a single concept ID are quite limited.
Mapping objective (homotypic) synonymy provides the basis for providing a single map for those examples you cite but it's not using true concept-based principles.
Best, David
Frankly I think it would be an improvement if we could get maps etc that combine Aedes triseriatus / Ochlerotatus triseriatus into one map and Felis concolor and Puma concolor into a different single map. :-)
Respectfully,
- Pete
Dave,
I believe this combination of GNI(Namebank) + GNUB(+ClassificationBank) is what we have been calling the Global Names Architecture for quite a while now ;-)
Jerry
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of David Remsen (GBIF) Sent: Sunday, 13 June 2010 10:56 p.m. To: Richard Pyle Cc: tdwg-content@lists.tdwg.org Mailing List Subject: Re: [tdwg-content] Name is species concept thinking
Rich
What you described in 1-5 was exactly the scope and function of uBio NameBank and ClassificationBank. This functionality has been refined in our ChecklistBank index.
It serves to provide a consistent resolution service for "Taxon Concept Service Providers". By linking to a populated GNUB it would also have an improved means to provide the protonym circumscription of the concept, as you describe in (5). In addition, we would like to support the inclusion of bibliographic data, specimens, geospatial information, and general descriptive data. The DwC Archive approach provides one (not exclusively but I would appreciate pointers to others) means to mobilise these data from people who have it.
In (5) you describe the protonym-based circumscription to evaluate the relative agreement of the identified concepts (via 'meta-authorities'). This provides the basis for expanding the potential set of names for a subsequent data retrieval from GBIF (for example) to include all the related nomenclatural and lexical variants for those names (of course checking for homonym conflicts among them). Again, this is consistent with what was implemented in uBio services and what we are currently implementing in our Checklist Bank (CLB) (I use the term General Concept Mapping for this process). I'm not sure I agree that this provides a true concept-based system, however. I would call it a concept-informed system.
In (6) it appears the output of the Taxon Concept resolution process is either an expanded set of name strings or an array of protonymIDs. I can see this is an option in (6). If the latter, I can see how this would provide a more precise concept-informed but name-based retrieval method and probably the best we can expect from large indices like GBIF. But I don't see how it will support a strict concept-based retrieval.
The real world example that forms my litmus test is the blue-headed vireo, Vireo solitarius (Wilson 1810) which was originally called Muscicapa solitaria and has also been combined to form Vireosylvia solitaria and Lanivireo solitarius. Of course there are lexical variants as well (Google "Lanivireo solitaria" for example). These, properly structured, would be the sort of useful set of lexical/nomenclatural content I would hope as a response from a GNI/GNUB resolution service based on protonymID.
One current view of the taxon (concept C1) has this species occupying the eastern part of the US. Another species, Vireo plumbeus Coues, 1866, (concept C2) occupies the middle west USA, and a third species, Vireo cassini Xántus de Vesey, 1858 (concept C3) is on the western coast.
Another view lumps all three of these into a single species which, based on the rule of priority, has the valid name Vireo solitarius and results in a new concept (C4). This concept includes C1, C2, and C3. Both concepts have the scientific name of Vireo solitarius.
We can access and represent these in a consistent fashion using our CLB and probably others can too in their own index models.
So, now we have a specimen of Vireo solitarius that was captured in Minnesota. It might be an errant instance of C1, Vireo solitarius sensu stricto, that strayed a bit west of normal. It might be (C4) Vireo solitarius, sensu lato. The specimen would need that concept identifier tied to the record to make this explicit. So, let's say that the identifier was made using the lumped concept (C4). Of course, if this doesn't make it into the record, we are stuck with the name alone.
Using the method (6) you described would allow a user to discover the different treatments of Vireo solitarius (C1 and C4) and provide some means to discriminate them via concept resolution.
- C4 includes C1, C2, and C3 which would include all the names above.
- C1 would only include the nomenclatural/lexical variants for Vireo solitarius.
Resolution will enable us to perform a significantly more useful and concept-informed search. It will, however, include the specimen I referenced above in BOTH cases because "Vireo solitarius" or its protonymID will be a search term in both cases.
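To make that limitation concrete, here is a small Python sketch of a name-based search driven by concept resolution. The name strings are the ones given above; the ProtonymIDs, the record IDs, the second record, and the function names are hypothetical and exist only for illustration.

```python
# Hypothetical ProtonymIDs mapped to the name strings mentioned above.
NAMES_BY_PROTONYM = {
    "P-solitaria": {
        "Muscicapa solitaria",
        "Vireosylvia solitaria",
        "Lanivireo solitarius",
        "Lanivireo solitaria",  # lexical variant
        "Vireo solitarius",
    },
    "P-plumbeus": {"Vireo plumbeus"},
    "P-cassini": {"Vireo cassini"},
}

# Protonym-based circumscriptions of the split (C1) and lumped (C4) concepts.
CONCEPTS = {
    "C1": {"P-solitaria"},
    "C4": {"P-solitaria", "P-plumbeus", "P-cassini"},
}

# Toy occurrence records that carry a name but no concept identifier.
RECORDS = [
    {"id": "MN-1", "name": "Vireo solitarius", "locality": "Minnesota"},
    {"id": "X-2", "name": "Vireo plumbeus", "locality": "unspecified"},
]


def search_terms(concept_id):
    """Expand a concept into every name string implied by its protonyms."""
    terms = set()
    for protonym_id in CONCEPTS[concept_id]:
        terms |= NAMES_BY_PROTONYM[protonym_id]
    return terms


def name_based_search(records, concept_id):
    """Return IDs of records whose name string is among the expanded terms."""
    terms = search_terms(concept_id)
    return [record["id"] for record in records if record["name"] in terms]
```

Searching for C1 returns only MN-1, and searching for C4 returns both records. The Minnesota specimen appears in both result sets, because nothing in the record says which sense of "Vireo solitarius" the identifier had in mind.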
A more precise concept based system would utilise a required taxon concept identifier in the specimen record to discriminate different uses of the SAME NAME. In other words, if you did a search of Vireo solitarius and the concept resolver indicated the different concepts above and you chose the sensu stricto (split) version, you would get the C1 labelled records but the C4 labelled records would be excluded or at least come with a warning (may not be what you are looking for). This of course requires our specimen records to have a concept identifier. Or, the concept definition itself will include additional annotations to enable us to make inferences.
Ex.,
- Publication date of the concept: if the split didn't happen until 1980 and the specimen is from 1960, then we could infer C4.
- Distribution information for the concept: if we disregard errant specimens, then we might infer a 1985 Minnesota specimen is a C2 in spite of the different name.
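A rough Python sketch of that stricter retrieval follows, assuming records may carry an optional concept identifier and an identification year. The field names, record IDs, and functions are hypothetical; the 1980 split year is the hypothetical one from the example above; only the date-based inference is sketched, not the distribution-based one.

```python
SPLIT_YEAR = 1980  # hypothetical year of the split, as in the example above

# Toy specimen records; "concept_id" is absent when the identifier
# recorded only a name.
RECORDS = [
    {"id": "S1", "name": "Vireo solitarius", "concept_id": "C1", "year": 1995},
    {"id": "S2", "name": "Vireo solitarius", "concept_id": "C4", "year": 1990},
    {"id": "S3", "name": "Vireo solitarius", "year": 1960},
    {"id": "S4", "name": "Vireo solitarius", "year": 1985},
]


def infer_concept(record):
    """Return (concept_id or None, basis) for one record."""
    if record.get("concept_id"):
        return record["concept_id"], "stated"
    year = record.get("year")
    if year is not None and year < SPLIT_YEAR:
        # Identified before the split existed, so only the lumped
        # concept could have been meant.
        return "C4", "inferred from date"
    return None, "unknown"


def concept_search(records, wanted):
    """Split record IDs into confident hits and ones needing a warning."""
    hits, warnings = [], []
    for record in records:
        concept, _basis = infer_concept(record)
        if concept == wanted:
            hits.append(record["id"])
        elif concept is None:
            warnings.append(record["id"])  # may not be what you are looking for
    return hits, warnings
```

Here a search for C1 returns S1 as a hit and S4 with a warning, while S2 and S3 are excluded. Whether records with a different stated concept should be dropped or merely flagged is a design choice; this sketch drops them and flags only the undetermined ones.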
In sum, we are on track for achieving this and I believe our data mobilisation strategy will support getting these sorts of data published. When Markus returns from paternity leave I would hope we could include his thoughts on how we might expose these as RDF via our indices to support all aspects of this discussion.
David
Hi Dave,
By linking to a populated GNUB it would also have an improved means to provide the protonym circumscription of the concept, as you describe in (5).
Just to be clear, when you say "protonym circumscription of the concept", you mean a concept circumscription whose boundaries are defined by the set of included protonyms (as opposed to the concept circumscription established for the Protonym-usage instance; i.e., original description). Correct? Although such concept/circumscription definitions (effectively represented by the set of type specimens implied by the set of protonyms) are not as high-resolution as concept/circumscription definitions that are defined by a broader suite of specimens, populations, or characters, they are, I believe, the "best bang for the buck" in that they give us 80% of the benefit for 20% of the work.
In addition, we would like to support the inclusion of bibliographic data,
Already included via GNUB.
specimens,
In my mind, a *key* value of GNUB/GNA is to serve as a taxon authority for specimen collections (i.e., the anchorpoints for specimen/observation taxonomic identifications).
geospatial information,
Inherited from the specimens/observations.
and general descriptive data.
Inherited from the PLAZI treatments anchored to the publications, as well as the published and unpublished character data anchored through specimens.
In (5) you describe the protonym-based circumscription to evaluate the relative agreement of the identified concepts (via 'meta-authorities'). This provides the basis for expanding the potential set of names for a subsequent data retrieval from GBIF (for example) to include all the related nomenclatural and lexical variants for those names (of course checking for homonym conflicts among them).
Yes, exactly!
In (6) it appears the output of the Taxon Concept resolution process is either an expanded set of name strings or an array of protonymIDs.
Before the content is built, the name-strings can be fed back into GNI to snoop out additional possible protonym links. However, in a data-populated paradigm, it would be an array of ProtonymIDs.
If the latter, I can see how this would provide a more precise concept-informed but name-based retrieval method and probably the best we can expect from large indices like GBIF. But I don't see how it will support a strict concept-based retrieval.
If you are content with a protonym-based concept circumscription definition, it has all you need. Each Taxon Name Usage instance in GNUB represents an array of (minimally one) ProtonymIDs -- that is, the set of all protonyms representing the asserted taxon concept in the usage instance. Like I said, it's not as high-resolution as specimen/population/character-based concept/circumscription definitions, but I think it gets us most of the way there, with the least amount of effort (not to say that it requires little effort to get us that far -- just that trying to define concept boundaries at higher resolution requires *MUCH* more effort).
So, the question is, what concept boundaries are fuzzy when you use Protonym-based definitions?
Imagine an example where we have 7 protonyms of something in the Pacific; three described from type specimens collected in the eastern Pacific, and four from specimens collected throughout the western Pacific. We also have a bunch of specimens from the central Pacific, but no Protonyms typified from that region.
Taxonomist "A" declares that the three protonyms from the eastern Pacific represent one valid species (Aus bus), and the four from the west represent a second valid species (Aus xus). Taxonomist "B" declares the exact same thing. Using Protonym-based circumscriptions, we can infer that the taxon concepts of "Aus bus" and "Aus xus" are each congruent between the two taxonomists.
The fuzziness comes in for the central Pacific populations:
1) Suppose that Taxonomist "A" explicitly cited the populations in the central Pacific, and declared them to be "Aus bus"; but Taxonomist "B" never mentioned them. In that case, we would probably want to establish the concept relationship as "Aus bus sec. A <includes> Aus bus sec. B" (as opposed to "is congruent with", as would be the case for a Protonym-based circumscription).
2) Suppose that Taxonomist "A" explicitly cited the populations in the central Pacific, and declared them to be "Aus bus"; but Taxonomist "B" cited those same populations as belonging to "Aus xus". In that case, we would probably want to establish the concept relationship as "Aus bus sec. A <overlaps with> Aus bus sec. B". Again, the Protonym-based circumscription in this case would give us an imprecise representation of the concept mappings.
However, in my experience (working in the Pacific, where this sort of circumstance of eastern vs. western vs. central population differences happens a LOT), it's actually a very rare problem. That is, in scenario 1, it's most likely the case that Taxonomist B would have included the central populations the same way that Taxonomist A would have. As for scenario 2, I'm struggling to think of even a single example of this. I suspect that it's just very rare.
So the point is, I think that protonym-based circumscription definitions are perfectly adequate for the vast majority of use cases.
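As a rough illustration of what a protonym-based comparison amounts to, here is a Python sketch that treats each circumscription as a plain set and derives the relationship from set operations. The protonym labels (E1..E3, W1..W4), the "central-pacific" population marker, and the function name are hypothetical; only the baseline case and scenario 1 from above are shown.

```python
def relationship(a, b):
    """Describe how circumscription a relates to circumscription b."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "is congruent with"
    if a > b:
        return "includes"
    if a < b:
        return "is included in"
    if a & b:
        return "overlaps with"
    return "excludes"


# Hypothetical labels: three eastern and four western protonyms.
EAST = {"E1", "E2", "E3"}
WEST = {"W1", "W2", "W3", "W4"}

# Protonym-based circumscriptions: both taxonomists say the same thing.
bus_sec_a = set(EAST)
bus_sec_b = set(EAST)
xus_sec_a = set(WEST)
xus_sec_b = set(WEST)

# Scenario 1 at higher resolution: A also cites the central Pacific
# populations under Aus bus, and B never mentions them.
bus_sec_a_hi_res = bus_sec_a | {"central-pacific"}
bus_sec_b_hi_res = set(bus_sec_b)
```

On protonyms alone, relationship(bus_sec_a, bus_sec_b) gives "is congruent with"; once the central populations are added to A's circumscription, the same call on the higher-resolution sets gives "includes". The protonym-based answer is the coarser of the two, which is the 80/20 trade-off being argued for.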
The real world example that forms my litmus test is the blue-headed vireo, Vireo solitarius (Wilson 1810) which was originally called Muscicapa solitaria and has also been combined to form Vireosylvia solitaria and Lanivireo solitarius. Of course there are lexical variants as well (Google "Lanivireo solitaria" for example). These, properly structured, would be the sort of useful set of lexical/nomenclatural content I would hope as a response from a GNI/GNUB resolution service based on protonymID.
Send me a bunch of usage instances involving all the different name variants, and involving various concept definitions, and I can create a sample GNUB dataset that would illustrate how this would work. The name-mapping thing is trivial, once the TNU instances have been populated. The concept mapping stuff is a bit more complex -- but still relatively simple compared to algorithms for, say, oxygen control systems in rebreathers..... :-)
One current view of the taxon (concept C1) has this species occupying the eastern part of the US. Another species, Vireo plumbeus Coues, 1866, (concept C2) occupies the middle west USA, and a third species, Vireo cassini Xántus de Vesey, 1858 (concept C3) is on the western coast.
Another view lumps all three of these into a single species which, based on the rule of priority, has the valid name Vireo solitarius and results in a new concept (C4). This concept includes C1, C2, and C3. Both concepts have the scientific name of Vireo solitarius.
We can access and represent these in a consistent fashion using our CLB and probably others can too in their own index models.
So, now we have a specimen of Vireo solitarius that was captured in Minnesota. It might be an errant instance of C1, Vireo solitarius sensu stricto, that strayed a bit west of normal. It might be (C4) Vireo solitarius, sensu lato. The specimen would need that concept identifier tied to the record to make this explicit. So, let's say that the identifier was made using the lumped concept (C4). Of course, if this doesn't make it into the record, we are stuck with the name alone.
Right -- this sounds like the same as the hypothetical example I made above. But like I say, I think this example is the exception, rather than the rule (i.e., it falls in the missing 20% of the "benefit" in the 80% benefit/20% work ratio).
Using the method (6) you described would allow a user to discover the different treatments of Vireo solitarius (C1 and C4) and provide some means to discriminate them via concept resolution.
- C4 includes C1, C2, and C3 which would include all the names above.
- C1 would only include the nomenclatural/lexical variants
for Vireo solitarius.
Resolution will enable us to perform a significantly more useful and concept-informed search. It will, however, include the specimen I referenced above in BOTH cases because "Vireo solitarius" or its protonymID will be a search term in both cases.
Right -- until someone else comes along and provides a more explicit identification for that specimen.
A more precise concept based system would utilise a required taxon concept identifier in the specimen record to discriminate different uses of the SAME NAME.
Sure! That would be fantastic -- and maybe someday we'll get to the point where all specimen/observation identification events come in the form of "Aus bus sec. Smith 1955", rather than simply "Aus bus" (as the vast majority are now). This, in my mind, is the single greatest and most consistent informatics failure within legacy taxonomic works and specimen databases. But I think the good news is that we can still get 80% of the benefit by going only as far as protonyms (which we *can* derive from a name alone -- once we get past homonymy and gross misspellings).
In other words, if you did a search of Vireo solitarius and the concept resolver indicated the different concepts above and you chose the sensu stricto (split) version, you would get the C1 labelled records but the C4 labelled records would be excluded or at least come with a warning (may not be what you are looking for). This of course requires our specimen records to have a concept identifier. Or, the concept definition itself will include additional annotations to enable us to make inferences
I think the best we can do is flag those cases, and rely on caveat emptor.
Publication date of the concept - If the split didn't happen until 1980 and the specimen is from 1960 then we could infer C4. Distribution information for the concept - if we disregard errant specimens then we might infer a 1985 Minnesota specimen is a C2 in spite of the different name.
The date one could work within the GNUB architecture, because the dates are all there (as long as the specimen identification was also dated). With the right integration with GBIF, the distribution one *might* be derivable algorithmically, but it would depend on the nature of the data.
In sum, we are on track for achieving this and I believe our data mobilisation strategy will support getting these sort of data published. When Markus returns from paternity leave I would hope we could include his thoughts on how we might expose these as RDF via our indices to support all aspects of this discussion.
Keep on a keepin' on....
Rich
P.S. Congrats to Markus! I was unaware!
Silly me to think that you might actually be approaching done with this conversation. ;-)
On Sun, Jun 13, 2010 at 1:50 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Dave,
By linking to a populated GNUB it would also have an improved means to provide the protonym circumscription of the concept, as you describe in (5).
Just to be clear, when you say "protonym circumscription of the concept", you mean a concept circumscription whose boundaries are defined by the set of included protonyms (as opposed to the concept circumscription established for the Protonym-usage instance; i.e., original description). Correct? Although such concept/circumscription definitions (effectively represented by the set of type specimens implied by the set of protonyms) are not as high-resolution as concept/circumscription definitions that are defined by a broader suite of specimens, populations, or characters, they are, I believe, the "best bang for the buck" in that they give us 80% of the benefit for 20% of the work.
In addition, we would like to support the inclusion of bibliographic data,
Already included via GNUB.
specimens,
In my mind, a *key* value of GNUB/GNA is to serve as a taxon authority for specimen collections (i.e., the anchorpoints for specimen/observation taxonomic identifications).
geospatial information,
Inherited from the specimens/observations.
and general descriptive data.
Inherited from the PLAZI treatments anchored to the publications, as well as the published and unpublished character data anchored through specimens.
In (5) you describe the protonym-based circumscription to evaluate the relative agreement of the identified concepts (via 'meta-authorities'). This provides the basis for expanding the potential set of names for a subsequent data retrieval from GBIF (for example) to include all the related nomenclatural and lexical variants for those names (of course checking for homonym conflicts among them).
Yes, exactly!
In (6) it appears the output of the Taxon Concept resolution process is either an expanded set of name strings or an array of protonymIDs.
Before the content is built, the name-strings can be fed back into GNI to snoop out additional possible protonym links. However, in a data-populated paradigm, it would be an array of ProtonymIDs.
If the latter, I can see how this would provide a more precise concept-informed but name-based retrieval method and probably the best we can expect from large indices like GBIF. But I don't see how it will support a strict concept-based retrieval.
If you are content with a protonym-based concept circumscription definition, it has all you need. Each Taxon Name Usage instance in GNUB represents an array of (minimally one) ProtonymIDs -- that is, the set of all protonyms representing the asserted taxon concept in the usage instance. Like I said, it's not as high-resolution as specimen/population/character-based concept/circumscription definitions, but I think it gets us most of the way there, with the least amount of effort (not to say that it requires little effort to get us that far -- just that trying to define concept boundaries at higher resolution requires *MUCH* more effort).
So, the question is, what concept boundaries are fuzzy when you use Protonym-based definitions?
Imagine an example where we have 7 protonyms of something in the Pacific; three described from type specimens collected in the eastern Pacific, and four from specimens collected throughout the western Pacific. We also have a bunch of specimens from the central Pacific, but no Protonyms typified from that region.
Taxonomist "A" declares that the three protonyms from the eastern Pacific represent one valid species (Aus bus), and the four from the west represent a second valid species (Aus xus). Taxonomist "B" declares the exact same thing. Using Protonym-based circumscriptions, we can infer that the taxon concepts of "Aus bus" and "Aus xus" are each congruent between the two taxonomists.
The fuzziness comes in for the central Pacific populations:
- Suppose that Taxonomist "A" explicitly cited the populations in the central Pacific, and declared them to be "Aus bus"; but Taxonomist "B" never mentioned them. In that case, we would probably want to establish the concept relationship as "Aus bus sec. A <includes> Aus bus sec. B" (as opposed to "is congruent with", as would be the case for a Protonym-based circumscription).
- Suppose that Taxonomist "A" explicitly cited the populations in the central Pacific, and declared them to be "Aus bus"; but Taxonomist "B" cited those same populations as belonging to "Aus xus". In that case, we would probably want to establish the concept relationship as "Aus bus sec. A <overlaps with> Aus bus sec. B". Again, the Protonym-based circumscription in this case would give us an imprecise representation of the concept mappings.
However, in my experience (working in the Pacific, where this sort of circumstance of eastern vs. western vs. central population differences happens a LOT), it's actually a very rare problem. That is, in scenario 1, it's most likely the case that Taxonomist B would have included the central populations the same way that Taxonomist A would have. As for scenario 2, I'm struggling to think of even a single example of this. I suspect that it's just very rare.
So the point is, I think that protonym-based circumscription definitions are perfectly adequate for the vast majority of use cases.
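The protonym-based comparison described above can be sketched as plain set operations. This is only an illustration, not GNUB code: the function name `relate`, the relationship labels, and the protonym IDs `p1` to `p7` are all assumptions made up for the example.

```python
def relate(a: frozenset, b: frozenset) -> str:
    """Relationship of concept a to concept b, judged only by included protonyms."""
    if a == b:
        return "is congruent with"
    if a > b:  # a is a proper superset of b
        return "includes"
    if a < b:  # a is a proper subset of b
        return "is included in"
    if a & b:  # some, but not all, protonyms are shared
        return "overlaps with"
    return "excludes"

# Hypothetical protonym IDs: p1-p3 typified in the eastern Pacific, p4-p7 in the west.
east = frozenset({"p1", "p2", "p3"})
west = frozenset({"p4", "p5", "p6", "p7"})

# Taxonomists A and B make the same assertions, so both pairs come out congruent.
bus_sec_a, xus_sec_a = east, west
bus_sec_b, xus_sec_b = east, west
print(relate(bus_sec_a, bus_sec_b))
print(relate(xus_sec_a, xus_sec_b))
```

Because no protonym is typified from the central Pacific, those populations never enter either set, which is exactly why this comparison cannot detect the two fuzzy scenarios.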
The real world example that forms my litmus test is the blue-headed vireo, Vireo solitarius (Wilson 1810) which was originally called Muscicapa solitaria and has also been combined to form Vireosylvia solitaria and Lanivireo solitarius. Of course there are lexical variants as well (Google "Lanivireo solitaria" for example). These, properly structured, would be the sort of useful set of lexical/nomenclatural content I would hope as a response from a GNI/GNUB resolution service based on protonymID.
Send me a bunch of usage instances involving all the different name variants, and involving various concept definitions, and I can create a sample GNUB dataset that would illustrate how this would work. The name-mapping thing is trivial, once the TNU instances have been populated. The concept mapping stuff is a bit more complex -- but still relatively simple compared to algorithms for, say, oxygen control systems in rebreathers..... :-)
One current view of the taxon (concept C1) has this species occupying the eastern part of the US. Another species, Vireo plumbeus Coues, 1866, (concept C2) occupies the middle west USA, and a third species, Vireo cassinii Xántus de Vesey, 1858 (concept C3) is on the western coast.
Another view lumps all three of these into a single species which, based on the rule of priority, has the valid name Vireo solitarius and results in a new concept (C4). This concept includes C1, C2, and C3. Both concepts have the scientific name of Vireo solitarius.
We can access and represent these in a consistent fashion using our CLB and probably others can too in their own index models.
So, now we have a specimen of Vireo solitarius that was captured in Minnesota. It might be an errant instance of C1, Vireo solitarius sensu stricto, that strayed a bit west of normal. It might be (C4) Vireo solitarius, sensu lato. The specimen would need that concept identifier tied to the record to make this explicit. So, let's say that the identifier was made using the lumped concept (C4). Of course, if this doesn't make it into the record, we are stuck with the name alone.
Right -- this sounds like the same as the hypothetical example I made above. But like I say, I think this example is the exception, rather than the rule (i.e., it falls in the missing 20% of the "benefit" in the 80% benefit/20% work ratio).
Using the method (6) you described would allow a user to discover the different treatments of Vireo solitarius (C1 and C4) and provide some means to discriminate them via concept resolution.
- C4 includes C1, C2, and C3 which would include all the names above.
- C1 would only include the nomenclatural/lexical variants for Vireo solitarius.
Resolution will enable us to perform a significantly more useful and concept-informed search. It will, however, include the specimen I referenced above in BOTH cases because "Vireo solitarius" or its protonymID will be a search term in both cases.
Right -- until someone else comes along and provides a more explicit identification for that specimen.
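A small sketch of why the Minnesota specimen turns up under both concepts when retrieval is name-based. The name sets come from the names mentioned in this thread; the three specimen records and the `search` helper are invented for illustration and do not represent real GBIF data or any real API.

```python
# Name sets per concept, built from the names mentioned in this thread.
C1_NAMES = {
    "Vireo solitarius", "Muscicapa solitaria", "Vireosylvia solitaria",
    "Lanivireo solitarius", "Lanivireo solitaria",
}
# The lumped concept C4 adds the names of the two species it absorbs.
C4_NAMES = C1_NAMES | {"Vireo plumbeus", "Vireo cassinii"}

# Invented specimen records (id, name on label, locality), for illustration only.
RECORDS = [
    ("r1", "Vireo solitarius", "Minnesota"),
    ("r2", "Lanivireo solitarius", "Vermont"),
    ("r3", "Vireo plumbeus", "Colorado"),
]

def search(names):
    """Return ids of records whose label matches any name in the expanded set."""
    return [rec_id for rec_id, name, _ in RECORDS if name in names]

print(search(C1_NAMES))  # ['r1', 'r2']
print(search(C4_NAMES))  # ['r1', 'r2', 'r3']
```

Record `r1` is returned by both searches because its label carries only the name, with no indication of which concept the identifier had in mind.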
A more precise concept based system would utilise a required taxon concept identifier in the specimen record to discriminate different uses of the SAME NAME.
Sure! That would be fantastic -- and maybe someday we'll get to the point where all specimen/observation identification events come in the form of "Aus bus sec. Smith 1955", rather than simply "Aus bus" (as the vast majority are now). This, in my mind, is the single greatest and most consistent informatics failure within legacy taxonomic works and specimen databases. But I think the good news is that we can still get 80% of the benefit by going only as far as protonyms (which we *can* derive from a name alone -- once we get past homonymy and gross misspellings).
In other words, if you did a search of Vireo solitarius and the concept resolver indicated the different concepts above and you chose the sensu stricto (split) version, you would get the C1 labelled records but the C4 labelled records would be excluded or at least come with a warning (may not be what you are looking for). This of course requires our specimen records to have a concept identifier. Or, the concept definition itself will include additional annotations to enable us to make inferences.
I think the best we can do is flag those cases, and rely on caveat emptor.
- Publication date of the concept - If the split didn't happen until 1980 and the specimen is from 1960 then we could infer C4.
- Distribution information for the concept - if we disregard errant specimens then we might infer a 1985 Minnesota specimen is a C2 in spite of the different name.
The date one could work within the GNUB architecture, because the dates are all there (as long as the specimen identification was also dated). With the right integration with GBIF, the distribution one *might* be derivable algorithmically, but it would depend on the nature of the data.
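The date-based inference could look something like the sketch below. The 1980 split date is the illustrative figure from the example above, not a real publication date, and `infer_concept` is a hypothetical helper, not part of GNUB.

```python
from datetime import date
from typing import Optional

# Assumed, illustrative date taken from the example in the text, not a real one.
SPLIT_PUBLISHED = date(1980, 1, 1)

def infer_concept(name: str, identified_on: Optional[date]) -> Optional[str]:
    """Guess the concept behind a bare name from the identification date.

    Returns "C4" (lumped) when the identification predates the split,
    and None when the date is missing or the case is ambiguous.
    """
    if name != "Vireo solitarius" or identified_on is None:
        return None
    if identified_on < SPLIT_PUBLISHED:
        return "C4"
    # After the split, the name alone could mean either C1 or C4.
    return None

print(infer_concept("Vireo solitarius", date(1960, 6, 1)))  # C4
print(infer_concept("Vireo solitarius", date(1985, 6, 1)))  # None
```

An undated identification returns None, which matches the caveat that this only works when the specimen identification was also dated.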
In sum, we are on track for achieving this and I believe our data mobilisation strategy will support getting this sort of data published. When Markus returns from paternity leave I would hope we could include his thoughts on how we might expose these as RDF via our indices to support all aspects of this discussion.
Keep on a keepin' on....
Rich
P.S. Congrats to Markus! I was unaware!
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Dude...this conversation has been ongoing for more than 20 years (at least that's how long I've been participating in it -- the conversation has actually been going on since the dawn of biodiversity informatics). I doubt that we're going to resolve it now. But I do agree we need to get this kind of information captured in a more easily accessible, better-summarized archival form. Whether that should be in DwC wikispace, or GNA wikispace, however, is not clear.
But for now, the conversation is still hot, and in my experience, nothing throws a bucket of water on a hot topic of conversation more effectively than porting it from a push-based email list to a pull-based wiki forum. My greatest hope is that we can:
1) Get past the crude vocabulary and semantics (I'm using both of these terms in the vernacular sense here, not the technical sense) so that we can figure out if we really are all on the same page (or not); and
2) Spark the generation of some sort of summary document that can live on the appropriate web-based discussion forum, with associated dialog & discussion.
Rich
-----Original Message-----
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
Sent: Sunday, June 13, 2010 10:55 AM
To: Richard Pyle
Cc: David Remsen (GBIF); tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] Name is species concept thinking
Silly me to think that you might actually be approaching done with this conversation. ;-)
On Sun, Jun 13, 2010 at 1:50 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
On 13 Jun 2010, at 22:04, Richard Pyle wrote:
Dude...this conversation has been ongoing for more than 20 years
Rich,
I reckon about 2,500 years if you count the kick off as being Plato/Aristotle and the notion of essences (in Western cultures at least) but we are probably more up to date than this - perhaps advancing as far as John Locke in the 17th century.
What was the 'biological' (rather than epistemological) problem we were trying to solve?
All the best,
Roger
The only problem we're trying to solve here is how a computer can approximate taxon-concept boundaries (so one can get a proper distribution map of a species, even if the dots are tied to an assortment of different names/spellings), but without an ENORMOUS amount of work by humans. I don't think Plato or Aristotle, or even Locke were worried too much about that particular issue.
My contention is that the answer lies with TNUs. But of course, that's always my contention.
Rich
-----Original Message-----
From: Roger Hyam [mailto:rogerhyam@mac.com]
Sent: Sunday, June 13, 2010 10:04 PM
To: Richard Pyle
Cc: tuco@berkeley.edu; tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] Name is species concept thinking
On 13 Jun 2010, at 22:04, Richard Pyle wrote:
Dude...this conversation has been ongoing for more than 20 years
Rich,
I reckon about 2,500 years if you count the kick off as being Plato/Aristotle and the notion of essences (in Western cultures at least) but we are probably more up to date than this - perhaps advancing as far as John Locke in the 17th century.
What was the 'biological' (rather than epistemological) problem we were trying to solve?
All the best,
Roger
Hi David,
I would go about this in a different but not necessarily better way.
You have three entities that some would say make up one species and others would say are three species.
There are two aspects to this:
1) To what extent are these three entities more species-like than subspecies-like?
2) To what extent are other groups treating these as separate species or as one species?
1) I would check to what extent there is actual gene flow between these different entities. This seems a more direct way to answer this than analyzing other descriptions. If they do seem to be more species-like then document the within-population gene variation and document the morphological and other characters that seem to separate these entities. Expose that data via a URI for each of the species concepts.
If they seem to be more like subpopulations of one species then you have to decide if they will be treated as what I call "ObjectiveSpecies". Objective species are those entities that people have chosen to model as species even if they might not be. So all species are, in a sense, at least "Objective Species".
I use Objective species to separate domestic varieties like *Felis catus* from their wild relatives *Felis silvestris lybica*. Why? Because occurrence records and publications about the house cat should not necessarily be seen as relating to the African Wildcat.
2) How other people treat this entity is important. If they are seeing it as a separate entity and marking up their related records as if it is a separate entity then maybe it is best modeled as an objective species with its own URI. You can always merge these records yourself if you want to consider them one species in your analysis, and it is easier to merge than split. If the DNA population analysis suggests that there is some reality to these subpopulations then you record that; if not, you note the DNA issue in your species description info.
In looking over how other groups conceptualize these entities, it seems as if many are going with the three species alternative. This includes ITIS, CoL and Wikipedia (DBpedia) and various bird-related sites.
So here is how I modeled these, below are links to the RDF and to what the LOD knows about them via Sindice.
Blue-headed Vireo *Vireo solitarius*
Sigma http://sig.ma/search?pid=f896427d96a4d5a02e59ce44f32a6529
RDF http://lod.taxonconcept.org/ses/kw8XU.rdf
Cassin's Vireo *Vireo cassinii*
Sigma http://sig.ma/search?pid=476eeb19e803285cfde3f4c4b8b8594b
RDF http://lod.taxonconcept.org/ses/XAMBv.rdf
Plumbeous Vireo *Vireo plumbeus*
Sigma http://sig.ma/search?pid=de4f58bde659eba689b4af56476cacae
RDF http://lod.taxonconcept.org/ses/Jjvx5.rdf
I don't see these different approaches as either/or. I think that they are complementary but different ways of doing this, depending on what kinds of questions you want to ask.
Respectfully,
- Pete
On Sun, Jun 13, 2010 at 5:55 AM, David Remsen (GBIF) dremsen@gbif.org wrote:
Rich
What you described in 1-5 was exactly the scope and function of uBio NameBank and ClassificationBank. This functionality has been refined in our ChecklistBank index.
It serves to provide a consistent resolution service for "Taxon Concept Service Providers". By linking to a populated GNUB it would also have an improved means to provide the protonym circumscription of the concept, as you describe in (5). In addition, we would like to support the inclusion of bibliographic data, specimens, geospatial information, and general descriptive data. The DwC Archive approach provides one (not exclusively but I would appreciate pointers to others) means to mobilise these data from people who have it.
In (5) you describe the protonym-based circumscription to evaluate the relative agreement of the identified concepts (via 'meta-authorities'). This provides the basis for expanding the potential set of names for a subsequent data retrieval from GBIF (for example) to include all the related nomenclatural and lexical variants for those names (of course checking for homonym conflicts among them). Again, this is consistent with what was implemented in uBio services and what we are currently implementing in our Checklist Bank (CLB) (I use the term General Concept Mapping for this process). I'm not sure I agree that this provides a true concept-based system, however. I would call it a concept-informed system.
In (6) it appears the output of the Taxon Concept resolution process is either an expanded set of name strings or an array of protonymIDs. I can see this is an option in (6). If the latter, I can see how this would provide a more precise concept-informed but name-based retrieval method and probably the best we can expect from large indices like GBIF. But I don't see how it will support a strict concept-based retrieval.
The real world example that forms my litmus test is the blue-headed vireo, Vireo solitarius (Wilson 1810) which was originally called Muscicapa solitaria and has also been combined to form Vireosylvia solitaria and Lanivireo solitarius. Of course there are lexical variants as well (Google "Lanivireo solitaria" for example). These, properly structured, would be the sort of useful set of lexical/nomenclatural content I would hope as a response from a GNI/GNUB resolution service based on protonymID.
One current view of the taxon (concept C1) has this species occupying the eastern part of the US. Another species, Vireo plumbeus Coues, 1866, (concept C2) occupies the middle west USA, and a third species, Vireo cassinii Xántus de Vesey, 1858 (concept C3) is on the western coast.
Another view lumps all three of these into a single species which, based on the rule of priority, has the valid name Vireo solitarius and results in a new concept (C4). This concept includes C1, C2, and C3. Both concepts have the scientific name of Vireo solitarius.
We can access and represent these in a consistent fashion using our CLB and probably others can too in their own index models.
So, now we have a specimen of Vireo solitarius that was captured in Minnesota. It might be an errant instance of C1, Vireo solitarius sensu stricto, that strayed a bit west of normal. It might be (C4) Vireo solitarius, sensu lato. The specimen would need that concept identifier tied to the record to make this explicit. So, let's say that the identifier was made using the lumped concept (C4). Of course, if this doesn't make it into the record, we are stuck with the name alone.
Using the method (6) you described would allow a user to discover the different treatments of Vireo solitarius (C1 and C4) and provide some means to discriminate them via concept resolution.
- C4 includes C1, C2, and C3 which would include all the names above.
- C1 would only include the nomenclatural/lexical variants for Vireo solitarius.
Resolution will enable us to perform a significantly more useful and concept-informed search. It will, however, include the specimen I referenced above in BOTH cases because "Vireo solitarius" or its protonymID will be a search term in both cases.
A more precise concept-based system would utilise a required taxon concept identifier in the specimen record to discriminate different uses of the SAME NAME. In other words, if you did a search of Vireo solitarius and the concept resolver indicated the different concepts above and you chose the sensu stricto (split) version, you would get the C1 labelled records but the C4 labelled records would be excluded or at least come with a warning (may not be what you are looking for). This of course requires our specimen records to have a concept identifier. Or, the concept definition itself will include additional annotations to enable us to make inferences. For example:
- Publication date of the concept - If the split didn't happen until 1980 and the specimen is from 1960 then we could infer C4.
- Distribution information for the concept - if we disregard errant specimens then we might infer a 1985 Minnesota specimen is a C2 in spite of the different name.
In sum, we are on track for achieving this and I believe our data mobilisation strategy will support getting this sort of data published. When Markus returns from paternity leave I would hope we could include his thoughts on how we might expose these as RDF via our indices to support all aspects of this discussion.
David
On Jun 13, 2010, at 2:37 AM, Richard Pyle wrote:
Tim: Coffee time.
Dave:
Here's how I imagine this would work under GNA, integrated with GBIF:
- Person submits text-string "Puma concolor" to a GNA-aware mapping
service.
- Service fires text string off to GNI, and sees how many lexical buckets
are involved, and how many protonyms are represented in those buckets.
- If problems of Homonymy/Homography exist (i.e., if more than one
legitimate Protonym for a species-group name "concolor" has ever been combined with a genus-group name "Puma"), then the service replies with a page that says "Do you mean the big cat, or do you mean the protozoa?" (pretending, for a moment, that the name "Puma concolor" has also been applied to a protozoa). Perhaps the service can also review the usage history of the two names, and algorithmically determine that they most likely meant the big cat -- but at least alert the user that a potential case of homonymy/homography exists.
- If step 2 yielded no apaprent homonymy/Homography, or if the user
selected one from among more than one Homonyms/Homographs, then the service takes the selected ProtonymID and throws it at a GNUB-aware taxon concept resolver.
- The GNUB-aware Taxon Concept resolver looks at how many Taxon Concept
Service Providers (e.g., ITIS, EOL, WoRMS, etc.) have made some sort of concept-definition assertion about the Protonym. In most cases, this could/should be as simple as "Concept Service [X] says that for Protonym [IDp], follow taxon name usage-instance [IDtnu]". Given [IDtnu], GNUB will tell us which Genus combination to use, which orthographic spelling to use, which taxon rank to use, and which set of Protonyms should be regarded as subjective synonyms of the taxon concept represented by [IDtnu]. If the different taxon concept providers (I call them "Meta Authorities") all agree (i.e., each taxon concept provider yields the same set of ProtonymIDs), then no user interaction is required on this step. If there are different interpretations of what the current treatment of "Puma concolor [big cat]" should be, then the user is presented with the different options (and perhaps a bit of information on what the different active concepts are, in terms of distribution and/or classification).
- The resultant set of Protonym IDs from step 5 (the original ProtonymID from step 2/3, plus the exploded set of Protonyms for subjective/heterotypic synonyms from step 5) is then thrown at GBIF (which would be GNA-aware, and thus know how to translate all the ProtonymIDs into a larger set of text-string names; alternatively, GBIF may have already cached this by converting text-string names from occurrence providers into ProtonymIDs via GNI).
- The user is then presented with a distributional map from GBIF occurrence records, based on the selected Protonym of the original submitted text-string name, cast in the context of the set of heterotypic synonyms established in Step 5.
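As a rough illustration only, the steps above could be sketched as follows. Everything in it is made up for the example and is not an actual GNI, GNUB, or GBIF interface: the dictionaries standing in for GNI and GNUB, the IDs such as P1 and TNU-1, the function names, and "AuthorityA". The names are the Aus bus placeholders used earlier in this thread, plus a pretend combination "Xus bus" and a pretend homonym.

```python
# Hypothetical in-memory stand-ins; none of this is a real GNI/GNUB/GBIF API.

# Text-string name -> ProtonymIDs it could refer to.
# More than one ID means a Homonymy/Homography problem.
GNI_INDEX = {
    "Aus bus": ["P1", "P9"],  # P9 is a pretend homonym (the "protozoan")
    "Xus bus": ["P1"],        # pretend later combination of the same Protonym
}

# Meta Authority -> {ProtonymID: usage instance it says to follow}.
AUTHORITY_USAGE = {"AuthorityA": {"P1": "TNU-1"}}

# Usage instance -> Protonyms treated as heterotypic synonyms in that usage.
GNUB_USAGE_SYNONYMS = {"TNU-1": ["P2", "P3", "P4"]}

# ProtonymID -> every text-string name under which it has appeared.
PROTONYM_NAMES = {
    "P1": ["Aus bus", "Xus bus"],
    "P2": ["Aus cus"],
    "P3": ["Aus dus"],
    "P4": ["Aus eus"],
    "P9": ["Aus bus"],
}


class AmbiguousName(Exception):
    """Raised when a name string matches several Protonyms and none was chosen."""

    def __init__(self, name, candidates):
        super().__init__(f"{name!r} matches {len(candidates)} protonyms")
        self.candidates = candidates


def resolve_protonym(name, chosen=None):
    """Turn a text-string name into one ProtonymID, or ask the user to choose."""
    candidates = GNI_INDEX.get(name, [])
    if not candidates:
        raise KeyError(name)
    if len(candidates) == 1:
        return candidates[0]
    if chosen in candidates:
        return chosen
    raise AmbiguousName(name, candidates)


def protonym_set(protonym_id, authority):
    """The chosen Protonym plus the synonyms in the usage the authority follows."""
    usage_id = AUTHORITY_USAGE[authority][protonym_id]
    return {protonym_id, *GNUB_USAGE_SYNONYMS[usage_id]}


def names_for_occurrence_query(name, authority, chosen=None):
    """All text-string names an occurrence search would need to match."""
    protonym_id = resolve_protonym(name, chosen)
    names = set()
    for pid in protonym_set(protonym_id, authority):
        names.update(PROTONYM_NAMES[pid])
    return sorted(names)
```

In this sketch, asking for "Xus bus" needs no user input, while asking for "Aus bus" raises AmbiguousName until the caller passes chosen="P1", which corresponds to the "Do you mean the big cat?" page.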
The bad news is that this sounds incredibly complicated. The good news is that it's actually not. Especially not from the user's perspective.
In the WORST case scenario, the user needs to provide three pieces of information:
- The text-string name submitted in Step 1.
- A decision in the case of Homonyms/Homographs, what critter/weed/microbe they're after.
- A decision about which Meta Authority to follow for the taxon concept.
This, again, is the WORST case scenario. A much more likely scenario involves fewer steps for the end user.
Consider:
Step 2 only applies in the 10%(ish) cases of text-string names involved in some sort of Homonymy/Homography problem. So in 90%(ish) of cases, step 2 won't come into play.
Step 3 only applies in cases where the Meta-Authorities disagree on the current usage of a name (e.g., ITIS is a lumper, WoRMS is a splitter). Even in cases where there is disagreement, the user could simply be presented with two (or more) maps, showing each of the current interpretations/statuses of the selected critter/weed. For example, the user might get a page that says "If you follow the ITIS interpretation of this species, the map looks like this. If you follow the WoRMS interpretation of the name, the map looks like that."
And, indeed, Step 1 wouldn't exist in the majority of cases, because I suspect most people will get to the Map service by clicking on a link from some web page article or database system. In most cases, this link would also bypass Step 2 as well.
In other words, if we can continue to develop GNA the way we're already developing it, we should be able to get to the point (Soon!) where a user clicks a link on a web page, and immediately gets a single map distribution using the taxon concept adopted by the overwhelming majority of Meta-Authorities, or (at worst) gets more than one map based on more than one contemporary/contentious view of what the species concept should be (with links to more information, if the user wants the details).
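To make the "one map, or at worst a few maps" idea concrete, here is a minimal sketch. It assumes each Meta-Authority's interpretation has already been reduced to a set of ProtonymIDs; the authority names, the IDs, and both function names are hypothetical.

```python
from collections import defaultdict


def group_by_interpretation(authority_sets):
    """Map each distinct set of ProtonymIDs to the authorities that yield it."""
    groups = defaultdict(list)
    for authority, protonyms in authority_sets.items():
        groups[frozenset(protonyms)].append(authority)
    return dict(groups)


def maps_to_show(authority_sets):
    """One entry per distinct interpretation, most widely adopted first."""
    groups = group_by_interpretation(authority_sets)
    # Sort by number of supporting authorities (descending); break ties by
    # authority names so the order is deterministic.
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), sorted(kv[1])))
    return [(sorted(authorities), sorted(pset)) for pset, authorities in ranked]


# Made-up example: two "lumpers" include P5, one "splitter" does not.
example = {
    "AuthorityA": {"P1", "P2", "P3", "P4", "P5"},
    "AuthorityB": {"P1", "P2", "P3", "P4"},
    "AuthorityC": {"P1", "P2", "P3", "P4", "P5"},
}
maps = maps_to_show(example)
```

If maps has a single entry, the authorities agree and the user just gets the map. If it has more than one, the first entry is the majority view and the rest are the alternative interpretations to show alongside it.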
So, if we keep building GNA, we should have exactly the service that Pete says he'd like to have (i.e., a single map with the full distribution of the species, regardless of what text-string name is used to label the georef'd occurrence data-points).
Simple, really....
:-)
Rich
-----Original Message-----
From: David Remsen (GBIF) [mailto:dremsen@gbif.org]
Sent: Saturday, June 12, 2010 10:50 AM
To: Peter DeVries
Cc: David Remsen (GBIF); Richard Pyle; tdwg-content@lists.tdwg.org; Kevin Richards; Jerry Cooper
Subject: Re: [tdwg-content] Name is species concept thinking
Pete -
This statement has been sticking with me since I read it. It might be me but I don't see any relationship between that statement and how this relates to taxon concepts. In a concept-based system you could easily have two different maps for Puma concolor. Whether Felis concolor is included is not relevant because nomenclatural synonyms have no bearing on the circumscription. They are both names for the same type.
There may be two different concepts (circumscriptions) published for Aedes triseriatus. It could be quite legit for a different (objective synonym only) name like Ochlerotatus triseriatus to refer to that same concept. So in that sense, there is a rationale for different scientific names to be able to reference the same concept to meet that requirement of the example you cite. But in zoology these examples aren't even considered different names, and the rule of priority would prevent truly different (heterotypic) names from referring to the same type, so the use cases for different scientific names being able to refer to a single concept ID are quite limited.
Mapping objective (homotypic) synonymy provides the basis for providing a single map for those examples you cite but it's not using true concept-based principles.
Best, David
Frankly I think it would be an improvement if we could get maps etc that combine Aedes triseriatus / Ochlerotatus triseriatus into one map and Felis concolor and Puma concolor into a different single map. :-)
Respectfully,
- Pete
participants (7)
- David Remsen (GBIF)
- Jerry Cooper
- John Wieczorek
- Kevin Richards
- Peter DeVries
- Richard Pyle
- Roger Hyam