Hi David,<div><br></div><div>I would go about this in a different but not necessarily better way.</div><div><br></div><div>You have three entities that some would say make up one species and others would say are three species.</div>

<div><br></div><div>There are two aspects to this:</div><div><br></div><div>1) To what extent are these three entities more species-like than subspecies-like?</div><div>2) To what extent are other groups treating these as separate species or as one species?</div>

<div><br></div><div>1) I would check to what extent there is actual gene flow between these different entities. This seems a more direct way to answer the question than analyzing other descriptions.</div><div>     If they do seem to be more species-like, then document the within-population genetic variation and document the morphological and other characters that seem to separate these entities.</div>

<div>     Expose that data via a URI for each of the species concepts.</div><div><br></div><div>     If they seem to be more like subpopulations of one species then you have to decide if they will be treated as what I call &quot;ObjectiveSpecies&quot;. Objective species are those entities that people</div>

<div>     have chosen to model as species even if they might not be. So all species are, in a sense, at least &quot;Objective Species&quot;.</div><div><br></div><div>     I use Objective Species to separate domestic varieties like <i>Felis catus</i> from their wild relatives <span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; "><em>Felis silvestris lybica</em>. Why? Because occurrence records and publications about the house cat should</span></div>

<div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; ">     not necessarily be seen as relating to the African Wildcat.</span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; "><br>

</span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; ">2) How other people treat this entity is important. If they are seeing it as a separate entity and marking up their related records as if it were a separate entity, then it may be best modeled as an</span></div>

<div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; ">    objective species with its own URI. You can always merge these records yourself if you want to consider them one species in your analysis, and it is easier to merge than to split.</span></div>

<div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; ">    If the DNA population analysis suggests that there is some reality to these subpopulations, then you record that; if not, you note the DNA issue in your species description info.</span></div>
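<div>A minimal sketch of the merge-versus-split point, assuming each occurrence record is tagged with exactly one concept identifier. The short codes are borrowed from the RDF file names below purely as labels; the lumped ID and the records themselves are hypothetical. Relabelling three split concepts to one lumped concept is a simple lookup, but once records carry only the lumped ID, nothing in them says which of the three entities they belong to.</div>

```python
from collections import Counter

# Hypothetical concept IDs: the short codes from the RDF file names, used only as labels.
SOLITARIUS = "kw8XU"  # Blue-headed Vireo
CASSINII = "XAMBv"    # Cassin's Vireo
PLUMBEUS = "Jjvx5"    # Plumbeous Vireo
LUMPED = "solitarius-sensu-lato"  # hypothetical ID for the one-species view

# Hypothetical occurrence records, each tagged with the concept it was identified to.
records = [
    {"id": 1, "concept": SOLITARIUS},
    {"id": 2, "concept": CASSINII},
    {"id": 3, "concept": PLUMBEUS},
    {"id": 4, "concept": SOLITARIUS},
]


def merge(records, mapping):
    """Relabel records to a broader concept; a lookup table is all that is needed."""
    return [{**r, "concept": mapping.get(r["concept"], r["concept"])} for r in records]


def can_split(records, lumped_id):
    """True only if no record is tagged solely with the lumped concept."""
    return all(r["concept"] != lumped_id for r in records)


lumped = merge(records, {SOLITARIUS: LUMPED, CASSINII: LUMPED, PLUMBEUS: LUMPED})
print(Counter(r["concept"] for r in records))  # three distinct concepts
print(Counter(r["concept"] for r in lumped))   # a single concept
print(can_split(lumped, LUMPED))               # False: nothing left to split on
```

<div>Keeping the finer-grained URIs on the records is what keeps both analyses open: the lumped view is always one mapping away, while the split view cannot be recovered from lumped labels alone.</div>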

<div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; "><br></span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; ">In looking over how other groups conceptualize these entities, it seems as if many are going with the three-species alternative. This includes ITIS, CoL, Wikipedia (DBpedia), and various bird-related</span></div>

<div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; ">sites.</span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 15px; "><br></span></div>

<div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="line-height: 15px;">So here is how I modeled these; below are links to the RDF and to what the LOD cloud knows about them via Sindice.</span></font></div>

<div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="line-height: 15px;"><br></span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="line-height: 15px;"><div>

Blue-headed Vireo <i>Vireo solitarius</i></div><div>Sigma &lt;<a href="http://sig.ma/search?pid=f896427d96a4d5a02e59ce44f32a6529">http://sig.ma/search?pid=f896427d96a4d5a02e59ce44f32a6529</a>&gt;</div><div>RDF &lt;<a href="http://lod.taxonconcept.org/ses/kw8XU.rdf">http://lod.taxonconcept.org/ses/kw8XU.rdf</a>&gt;</div>

<div><br></div><div>Cassin&#39;s Vireo <i>Vireo cassinii </i></div><div>Sigma &lt;<a href="http://sig.ma/search?pid=476eeb19e803285cfde3f4c4b8b8594b">http://sig.ma/search?pid=476eeb19e803285cfde3f4c4b8b8594b</a>&gt;</div><div>

RDF &lt;<a href="http://lod.taxonconcept.org/ses/XAMBv.rdf">http://lod.taxonconcept.org/ses/XAMBv.rdf</a>&gt;</div><div><br></div><div><br></div><div>Plumbeous Vireo <i>Vireo plumbeus </i></div><div>Sigma &lt;<a href="http://sig.ma/search?pid=de4f58bde659eba689b4af56476cacae">http://sig.ma/search?pid=de4f58bde659eba689b4af56476cacae</a>&gt;</div>

<div>RDF &lt;<a href="http://lod.taxonconcept.org/ses/Jjvx5.rdf">http://lod.taxonconcept.org/ses/Jjvx5.rdf</a>&gt;</div><div><br></div></span></font></div><div>I don&#39;t see these different approaches as either/or; I think they are complementary ways of doing this, depending on what kinds of questions you want to ask.</div>

<div><br></div><div>Respectfully,</div><div><br></div><div>- Pete</div><div><br></div><div><br><div class="gmail_quote">On Sun, Jun 13, 2010 at 5:55 AM, David Remsen (GBIF) <span dir="ltr">&lt;<a href="mailto:dremsen@gbif.org">dremsen@gbif.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Rich<br>

<br>

What you described in 1-5 was exactly the scope and function of uBio NameBank and ClassificationBank.   This functionality has been refined in our ChecklistBank index.<br>

<br>

It serves to provide a consistent resolution service for &quot;Taxon Concept Service Providers&quot;.<br>

By linking to a populated GNUB it would also have an improved means to provide the protonym circumscription of the concept, as you describe in (5). In addition, we would like to support the inclusion of bibliographic data, specimens, geospatial information, and general descriptive data. The DwC Archive approach provides one means (not the only one; I would appreciate pointers to others) to mobilise these data from the people who have them.<br>


<br>

In (5) you describe the protonym-based circumscription to evaluate the relative agreement of the identified concepts (via &#39;meta-authorities&#39;). This provides the basis for expanding the potential set of names for a subsequent data retrieval from GBIF (for example) to include all the related nomenclatural and lexical variants for those names (of course checking for homonym conflicts among them). Again, this is consistent with what was implemented in uBio services and what we are currently implementing in our Checklist Bank (CLB) (I use the term General Concept Mapping for this process). I&#39;m not sure I agree that this provides a true concept-based system, however. I would call it a concept-informed system.<br>


<br>

In (6) it appears the output of the Taxon Concept resolution process is either an expanded set of name strings or an array of protonymIDs.   I can see this is an option in (6).  If the latter,  I can see how this would provide a more precise concept-informed but name-based retrieval method and probably the best we can expect from large indices like GBIF.    But I don&#39;t see how it will support a strict concept-based retrieval.<br>


<br>

The real-world example that forms my litmus test is the blue-headed vireo, Vireo solitarius (Wilson 1810), which was originally called Muscicapa solitaria and has also been combined to form Vireosylvia solitaria and Lanivireo solitarius. Of course there are lexical variants as well (Google &quot;Lanivireo solitaria&quot; for example). These, properly structured, would be the sort of useful lexical/nomenclatural content I would hope to receive as a response from a GNI/GNUB resolution service based on protonymID.<br>


<br>

One current view of the taxon (concept C1) has this species occupying the eastern part of the US. Another species, Vireo plumbeus Coues, 1866 (concept C2), occupies the middle west USA, and a third species, Vireo cassinii Xántus de Vesey, 1858 (concept C3), is on the western coast.<br>


<br>

Another view lumps all three of these into a single species which, based on the rule of priority, has the valid name Vireo solitarius  and results in a new concept (C4).  This concept includes C1, C2,  and C3.   Both concepts have the scientific name of Vireo solitarius.<br>


<br>

We can access and represent these in a consistent fashion using our CLB and probably others can too in their own index models.<br>

<br>

So, now we have a specimen of Vireo solitarius that was captured in Minnesota. It might be an errant instance of C1, Vireo solitarius sensu stricto, that strayed a bit west of normal. It might be (C4) Vireo solitarius, sensu lato. The specimen would need that concept identifier tied to the record to make this explicit. So, let&#39;s say that the identification was made using the lumped concept (C4). Of course, if this doesn&#39;t make it into the record, we are stuck with the name alone.<br>


<br>

Using the method (6) you described would allow a user to discover the different treatments of Vireo solitarius (C1 and C4) and provide some means to discriminate them via concept resolution.<br>

<br>

- C4 includes C1, C2, and C3 which would include all the names above.<br>

- C1 would only include the nomenclatural/lexical variants for Vireo solitarius.<br>

<br>

Resolution will enable us to perform a significantly more useful and concept-informed search. It will, however, include the specimen I referenced above in BOTH cases, because &quot;Vireo solitarius&quot; or its protonymID will be a search term in both cases.<br>
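A minimal sketch of that limitation, assuming a concept is reduced to the set of name strings it covers. The variant lists follow the Vireo example above and are not exhaustive; the specimen records and the helper name_based_search are hypothetical. Because the Minnesota specimen carries only a name, it comes back for both C1 and C4:<br>

```python
# Name strings per concept, taken from the Vireo example; variant lists are illustrative.
C1_NAMES = {
    "Vireo solitarius", "Muscicapa solitaria", "Vireosylvia solitaria",
    "Lanivireo solitarius", "Lanivireo solitaria",
}
C2_NAMES = {"Vireo plumbeus"}
C3_NAMES = {"Vireo cassinii"}

CONCEPT_NAMES = {
    "C1": C1_NAMES,                        # Vireo solitarius sensu stricto
    "C4": C1_NAMES | C2_NAMES | C3_NAMES,  # lumped concept: includes C1, C2 and C3
}

# Hypothetical specimen records that carry only a name string, no concept ID.
specimens = [
    {"id": "MN-1", "name": "Vireo solitarius", "locality": "Minnesota"},
    {"id": "CA-1", "name": "Vireo cassinii", "locality": "California"},
]


def name_based_search(concept):
    """Concept-informed but name-based: match records on the concept's expanded name set."""
    names = CONCEPT_NAMES[concept]
    return [s["id"] for s in specimens if s["name"] in names]


print(name_based_search("C1"))  # ['MN-1']
print(name_based_search("C4"))  # ['MN-1', 'CA-1']
```

The expansion correctly pulls in Vireo cassinii for C4, but nothing in the name string can keep MN-1 out of the C1 result.<br>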


<br>

A more precise concept-based system would utilise a required taxon concept identifier in the specimen record to discriminate different uses of the SAME NAME. In other words, if you did a search for Vireo solitarius, the concept resolver indicated the different concepts above, and you chose the sensu stricto (split) version, you would get the C1-labelled records, but the C4-labelled records would be excluded or at least come with a warning (may not be what you are looking for). This of course requires our specimen records to have a concept identifier. Or, the concept definition itself could include additional annotations to enable us to make inferences.<br>


<br>

For example:<br>

<br>

Publication date of the concept - If the split didn&#39;t happen until 1980 and the specimen is from 1960 then we could infer C4.<br>

Distribution information for the concept - if we disregard errant specimens then we might infer a 1985 Minnesota specimen is a C2 in spite of the different name.<br>
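<br>

A rough sketch of both ideas, assuming specimen records can carry an optional concept identifier and that concepts are annotated with a publication date and a range. The 1980 split date and the Minnesota-as-C2 inference follow the hypothetical examples above; the range sets, specimen records and the helpers concept_search and infer_concept are placeholders, not real distributions or an existing API:<br>

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Specimen:
    id: str
    name: str
    year: int
    state: str
    concept_id: Optional[str] = None  # taxon concept identifier, if the record has one


SPLIT_YEAR = 1980  # illustrative split date from the example above
# Placeholder ranges for the split concepts (illustrative only, not real distributions).
RANGES = {"C1": {"Maine", "Ohio"}, "C2": {"Minnesota", "Colorado"}, "C3": {"California"}}


def concept_search(
    specimens: List[Specimen], name: str, concept_id: str
) -> Tuple[List[str], List[str]]:
    """Return records labelled with the concept, plus same-named records to flag with a warning."""
    hits = [s.id for s in specimens if s.concept_id == concept_id]
    warnings = [s.id for s in specimens if s.name == name and s.concept_id != concept_id]
    return hits, warnings


def infer_concept(s: Specimen) -> Optional[str]:
    """Fall back on concept annotations when the record carries no concept ID."""
    if s.concept_id:
        return s.concept_id
    if s.year < SPLIT_YEAR:
        return "C4"  # only the lumped concept had been published
    for concept, states in RANGES.items():
        if s.state in states:
            return concept  # assumes the specimen is not an errant individual
    return None


specimens = [
    Specimen("A", "Vireo solitarius", 1995, "Maine", "C1"),
    Specimen("B", "Vireo solitarius", 1990, "Minnesota", "C4"),
    Specimen("C", "Vireo solitarius", 1960, "Minnesota"),
    Specimen("D", "Vireo solitarius", 1985, "Minnesota"),
]
print(concept_search(specimens, "Vireo solitarius", "C1"))  # (['A'], ['B', 'C', 'D'])
print(infer_concept(specimens[2]), infer_concept(specimens[3]))  # C4 C2
```

With a concept ID on the record the search is exact; without one, the annotations only support a best guess, which is why the same-named records are returned as warnings rather than dropped.<br>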

<br>

In sum, we are on track for achieving this, and I believe our data mobilisation strategy will support getting this sort of data published. When Markus returns from paternity leave I would hope we could include his thoughts on how we might expose these as RDF via our indices to support all aspects of this discussion.<br>

<font color="#888888">

<br>

David</font><div><div></div><div class="h5"><br>

<br>

<br>

<br>

<br>

On Jun 13, 2010, at 2:37 AM, Richard Pyle wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Tim: Coffee time.<br>

<br>

Dave:<br>

<br>

Here&#39;s how I imagine this would work under GNA, integrated with GBIF:<br>

<br>

1. Person submits text-string &quot;Puma concolor&quot; to a GNA-aware mapping<br>

service.<br>

<br>

2. Service fires text string off to GNI, and sees how many lexical buckets<br>

are involved, and how many protonyms are represented in those buckets.<br>

<br>

3. If problems of Homonymy/Homography exist (i.e., if more than one<br>

legitimate Protonym for a species-group name &quot;concolor&quot; has ever been<br>

combined with a genus-group name &quot;Puma&quot;), then the service replies with a<br>

page that says &quot;Do you mean the big cat, or do you mean the protozoa?&quot;<br>

(pretending, for a moment, that the name &quot;Puma concolor&quot; has also been<br>

applied to a protozoa).  Perhaps the service can also review the usage<br>

history of the two names, and algorithmically determine that they most<br>

likely meant the big cat -- but at least alert the user that a potential<br>

case of homonymy/homography exists.<br>

<br>

4. If step 2 yielded no apparent homonymy/Homography, or if the user<br>

selected one from among more than one Homonyms/Homographs, then the service<br>

takes the selected ProtonymID and throws it at a GNUB-aware taxon concept<br>

resolver.<br>

<br>

5. The GNUB-aware Taxon Concept resolver looks at how many Taxon Concept<br>

Service Providers (e.g., ITIS, EOL, WoRMS, etc.) have made some sort of<br>

concept-definition assertion about the Protonym. In most cases, this<br>

could/should be as simple as &quot;Concept Service [X] says that for Protonym<br>

[IDp], follow taxon name usage-instance [IDtnu]&quot;. Given [IDtnu], GNUB will<br>

tell us which Genus combination to use, which orthographic spelling to use,<br>

which taxon rank to use, and which set of Protonyms should be regarded as<br>

subjective synonyms of the taxon concept represented by [IDtnu].  If the<br>

different taxon concept providers (I call them &quot;Meta Authorities&quot;) all agree<br>

(i.e., each taxon concept provider yields the same set of ProtonymIDs), then<br>

no user interaction is required on this step. If there are different<br>

interpretations of what the current treatment of &quot;Puma concolor [big cat]&quot;<br>

should be, then the user is presented with the different options (and<br>

perhaps a bit of information on what the different active concepts are, in<br>

terms of distribution and/or classification).<br>

<br>

6. The resultant set of Protonym IDs from step 5 (the original ProtonymID<br>

from step 2/3, plus the exploded set of Protonyms for subjective/heterotypic<br>

synonyms from step 5), are then thrown at GBIF (which would be GNA-Aware,<br>

and thus know how to translate all the ProtonymIDs into a larger set of<br>

text-string names and/or GBIF may have already cached this by converting<br>

text-string names from occurrence providers into ProtonymIDs via GNI).<br>

<br>

7. The user is then presented with a distributional map from GBIF occurrence<br>

records, based on the selected Protonym of the original submitted<br>

text-string name, cast in the context of the set of heterotypic synonyms<br>

established in Step 5.<br>
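<br>

The seven steps above can be sketched as a single function, assuming in-memory dictionaries stand in for GNI, the meta-authorities and a GNA-aware GBIF index. These are not real service APIs; the protonym IDs, the pretend protozoan homonym and the occurrence points are made up. The two callbacks stand for the user decisions (which homonym, which meta-authority) and are only invoked when there is homonymy or disagreement:<br>

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical stand-ins for GNI, the meta-authorities and a GNA-aware GBIF (not real APIs).
GNI: Dict[str, List[str]] = {
    "Puma concolor": ["P:bigcat", "P:protozoan"],  # step 3's pretend homonymy
    "Felis concolor": ["P:bigcat"],                # objective synonym: same protonym
}
META_AUTHORITIES: Dict[str, Dict[str, FrozenSet[str]]] = {
    # authority -> protonym -> protonyms it treats as one taxon (itself included)
    "ITIS": {"P:bigcat": frozenset({"P:bigcat", "P:syn1"})},  # lumper
    "WoRMS": {"P:bigcat": frozenset({"P:bigcat"})},           # splitter
}
OCCURRENCES: Dict[str, List[str]] = {"P:bigcat": ["pt1"], "P:syn1": ["pt2"]}

Chooser = Callable[[List[str]], str]


def resolve_and_map(name: str, pick_protonym: Chooser, pick_authority: Chooser) -> List[str]:
    # Step 2: name string -> protonyms represented in its lexical buckets.
    protonyms = GNI.get(name, [])
    if not protonyms:
        return []
    # Steps 3-4: only ask the user when homonymy/homography exists.
    protonym = protonyms[0] if len(protonyms) == 1 else pick_protonym(protonyms)
    # Step 5: what does each meta-authority say about this protonym?
    views = {a: m[protonym] for a, m in META_AUTHORITIES.items() if protonym in m}
    if not views:
        synonyms = frozenset({protonym})
    elif len(set(views.values())) == 1:
        synonyms = next(iter(views.values()))  # all agree: no user interaction needed
    else:
        synonyms = views[pick_authority(sorted(views))]
    # Steps 6-7: pull occurrence points for every protonym in the exploded set.
    return [pt for p in sorted(synonyms) for pt in OCCURRENCES.get(p, [])]


print(resolve_and_map("Puma concolor", lambda c: "P:bigcat", lambda a: "ITIS"))   # ['pt1', 'pt2']
print(resolve_and_map("Puma concolor", lambda c: "P:bigcat", lambda a: "WoRMS"))  # ['pt1']
```

Note that "Felis concolor" resolves to the same protonym without any homonym question, which is the objective-synonym case.<br>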

<br>

The bad news is that this sounds incredibly complicated.  The good news is<br>

that it&#39;s actually not.  Especially not from the user&#39;s perspective.<br>

<br>

In the WORST case scenario, the user needs to provide three pieces of<br>

information:<br>

<br>

1. The text-string name submitted in Step 1.<br>

<br>

2. A decision in the case of Homonyms/Homographs, what critter/weed/microbe<br>

they&#39;re after.<br>

<br>

3. A decision about which Meta Authority to follow for the taxon concept.<br>

<br>

This, again, is the WORST case scenario.  A much more likely scenario<br>

involves fewer steps for the end user.<br>

<br>

Consider:<br>

<br>

Step 2 only applies in the 10%(ish) of cases where text-string names are involved in<br>

some sort of Homonymy/Homography problem.  So in 90%(ish) of cases, step 2<br>

won&#39;t come into play.<br>

<br>

Step 3 only applies in cases where the Meta-Authorities disagree on the<br>

current usage of a name (e.g., ITIS is a lumper, WoRMS is a splitter).  Even<br>

in cases where there is disagreement, the user could simply be presented with<br>

two (or more) maps, showing each of the current interpretations/statuses of<br>

the selected critter/weed.  For example, the user might get a page that says<br>

&quot;If you follow the ITIS interpretation of this species, the map looks like<br>

this. If you follow the WoRMS interpretation of the name, the map looks like<br>

that.&quot;<br>
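<br>

One way to produce those side-by-side maps is to group the meta-authorities by the set of protonyms they fold into the species and build one map per distinct set. This is a sketch with hypothetical data: the ITIS/WoRMS split follows the lumper/splitter example above, and the CoL entry is invented for illustration. When all authorities agree, it yields a single map:<br>

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical data: the protonym IDs each meta-authority folds into the selected species.
VIEWS: Dict[str, FrozenSet[str]] = {
    "ITIS": frozenset({"P1", "P2", "P3"}),  # lumper, as in the example above
    "CoL": frozenset({"P1", "P2", "P3"}),   # hypothetical second lumper
    "WoRMS": frozenset({"P1"}),             # splitter
}
# Hypothetical occurrence points keyed by protonym ID.
POINTS: Dict[str, List[str]] = {"P1": ["a", "b"], "P2": ["c"], "P3": ["d"]}


def maps_by_interpretation(
    views: Dict[str, FrozenSet[str]], points: Dict[str, List[str]]
) -> List[Tuple[List[str], List[str]]]:
    """Group authorities that agree, and build one map (list of points) per distinct view."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for authority, protonyms in views.items():
        groups[protonyms].append(authority)
    maps = [
        (sorted(authorities), [pt for p in sorted(protonyms) for pt in points.get(p, [])])
        for protonyms, authorities in groups.items()
    ]
    maps.sort(key=lambda m: -len(m[0]))  # the view most authorities share comes first
    return maps


for authorities, pts in maps_by_interpretation(VIEWS, POINTS):
    print(f"If you follow {', '.join(authorities)}, the map has points {pts}")
```

Ordering by the number of agreeing authorities puts the majority view first, which matches the idea of showing the widely adopted concept by default.<br>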

<br>

And, indeed, Step 1 wouldn&#39;t exist in the majority of cases, because I<br>

suspect most people will get to the Map service by clicking on a link from<br>

some web page article or database system.  In most cases, this link would<br>

also bypass Step 2.<br>

<br>

In other words, if we can continue to develop GNA the way we&#39;re already<br>

developing it, we should be able to get to the point (Soon!) where a user<br>

clicks a link on a web page, and immediately gets a single map distribution<br>

using the taxon concept adopted by the overwhelming majority of<br>

Meta-Authorities, or (at worst) gets more than one map based on more than<br>

one contemporary/contentious view of what the species concept should be<br>

(with links to more information, if the user wants the details).<br>

<br>

So, if we keep building GNA, we should have exactly the service that Pete<br>

says he&#39;d like to have (i.e., a single map with the full distribution of the<br>

species, regardless of what text-string name is used to label the georef&#39;d<br>

occurrence data-points).<br>

<br>

Simple, really....<br>

<br>

:-)<br>

<br>

Rich<br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

-----Original Message-----<br>

From: David Remsen (GBIF) [mailto:<a href="mailto:dremsen@gbif.org" target="_blank">dremsen@gbif.org</a>]<br>

Sent: Saturday, June 12, 2010 10:50 AM<br>

To: Peter DeVries<br>

Cc: David Remsen (GBIF); Richard Pyle;<br>

<a href="mailto:tdwg-content@lists.tdwg.org" target="_blank">tdwg-content@lists.tdwg.org</a>; Kevin Richards; Jerry Cooper<br>

Subject: Re: [tdwg-content] Name is species concept thinking<br>

<br>

Pete -<br>

<br>

This statement has been sticking with me since I read it. It might be me, but I don&#39;t see how that statement relates to taxon concepts. In a concept-based system you could easily have two different maps for Puma concolor. Whether Felis concolor is included is not relevant, because nomenclatural synonyms have no bearing on the circumscription. They are both names for the same type.<br>

<br>

There may be two different concepts (circumscriptions) published for Aedes triseriatus. It could be quite legitimate for a different (objective synonym only) name like Ochlerotatus triseriatus to refer to that same concept. So in that sense, there is a rationale for different scientific names to be able to reference the same concept, to meet the requirement of the example you cite. But in zoology these examples aren&#39;t even considered different names, and the rule of priority would prevent truly different (heterotypic) names from referring to the same type, so the use cases for different scientific names being able to refer to a single concept ID are quite limited.<br>

<br>

Mapping objective (homotypic) synonymy provides the basis for providing a single map for the examples you cite, but it&#39;s not using true concept-based principles.<br>
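<br>

A small sketch of that distinction, assuming each name string can be mapped to a protonym (name-bearing type) ID. The IDs and the two Aedes triseriatus circumscriptions (labelled sensu A and sensu B) are hypothetical. Grouping by protonym gives one map each for the triseriatus and concolor names, but it also puts records identified under two different concepts on the same map:<br>

```python
from collections import defaultdict

# Hypothetical protonym (name-bearing type) IDs for the names in the example.
PROTONYM_OF = {
    "Aedes triseriatus": "P:triseriatus",
    "Ochlerotatus triseriatus": "P:triseriatus",
    "Felis concolor": "P:concolor",
    "Puma concolor": "P:concolor",
}

# Hypothetical records: (name string as labelled, concept it was identified under).
records = [
    ("Aedes triseriatus", "sensu A"),
    ("Ochlerotatus triseriatus", "sensu B"),
    ("Felis concolor", None),
    ("Puma concolor", None),
]


def maps_by_protonym(records):
    """Homotypic grouping: one map per protonym, whatever genus the epithet is combined with."""
    maps = defaultdict(list)
    for name, concept in records:
        maps[PROTONYM_OF[name]].append((name, concept))
    return dict(maps)


for protonym, members in maps_by_protonym(records).items():
    # Both triseriatus concepts end up on one map: name-informed, not concept-based.
    print(protonym, members)
```
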

<br>

Best,<br>

David<br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Frankly, I think it would be an improvement if we could get maps etc. that combine Aedes triseriatus / Ochlerotatus triseriatus into one map, and Felis concolor and Puma concolor into a different single map. :-)<br>

<br>

Respectfully,<br>

<br>

- Pete<br>

<br>

<br>

<br>

</blockquote>

<br>

</blockquote>

<br>

<br>

<br>

</blockquote>

<br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br>----------------------------------------------------------------<br>Pete DeVries<br>Department of Entomology<br>University of Wisconsin - Madison<br>445 Russell Laboratories<br>

1630 Linden Drive<br>Madison, WI 53706<br>GeoSpecies Knowledge Base <br>About the GeoSpecies Knowledge Base<br>------------------------------------------------------------<br>

</div>