Hi Gregor,

I have had some more time to think about this issue.

In an ideal world the combination of genus and specific epithet should map to one concept.

I believe that was Linnaeus's original intent.

Unfortunately the current system seems to have strayed from this original goal.

Also, it was not anticipated that changes in phylogenetic thinking would cause so many changes to the "identifier" or name for a given species.

Dima and I came up with the somewhat strange URIs for name strings so we could look at ways to connect some sort of species concept to the various names in the GNI.

The GNI itself does not always know what kind of thing a given name string represents.

It also does not know if a given name string is valid or not.

The GNI needs some additional parts to make sense of these. I will leave it to others to describe those additional parts.

After looking through a large set of species-related publications etc., it is clear that the vast majority of information sources only list the genus and specific epithet.

To make sense of those and do entity extraction etc., I think the following service would be useful.

You have a name, let's say Duplictus annoyii.

To get information about what that name might be you reference a semantic web service using a URL form of the name.

http://example.org/names/Duplictus_annoyii

If you are a human using a web browser, this will return a page that lists all the things it might be.

Kingdom Animalia Duplictus annoyii Schmoe 1886 => appears in databases X, Y, Z
Kingdom Animalia Duplictus annoyii Schmoe 1886 sec Thomas 1920 => appears in databases A, I, P
Kingdom Plantae   Duplictus annoyii Carefulnot => appears in database M

If there are no exact matches it could return:

Similar Name Strings
Kingdom Animalia Duplictus annoyi Schmoe 1886 => appears in databases X, Y, Z
Kingdom Plantae   Duplictus annous Carefulnot  => appears in databases A, I, P
Kingdom Plantae   Duplictus annous Carefulnot  => appears in database M
Kingdom Plantae   Duplicta  annous (Carefulnot)  => appears in databases X, Y, Z
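
A minimal sketch of how the "Similar Name Strings" fallback might work, assuming plain string similarity from Python's standard difflib module is good enough for a first pass. The list of known names, the function name and the 0.8 cutoff are all made up for illustration; a real service would likely use something more taxonomically aware.

```python
import difflib

# Hypothetical index of known genus + epithet strings (taken from the examples above).
KNOWN_NAMES = ["Duplictus annoyi", "Duplictus annous", "Duplicta annous"]

def similar_names(query, known=KNOWN_NAMES, cutoff=0.8):
    """Return known name strings that resemble the query, best match first.

    The cutoff is an arbitrary illustrative threshold, not a recommended value.
    """
    return difflib.get_close_matches(query, known, n=5, cutoff=cutoff)

for candidate in similar_names("Duplictus annoyii"):
    print(candidate)
```

Each candidate returned this way would then be listed with its kingdom, authorship and source databases, as in the lines above.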

A semantic web program would receive an RDF version of the information above and could use various heuristics to determine which are the most likely full names.

Eventually the service could also include opinions about which name variants are valid and which are not.

Felis concolor => Felis concolor Linnaeus 1771

"Valid" names would include properly formed synonyms so Felis concolor would return Felis concolor Linnaeus 1771.

The associated URLs for each of these forms would be

HTML Page http://example.org/names/Duplictus_annoyii.html

RDF Page http://example.org/names/Duplictus_annoyii.rdf
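
As a rough sketch, a client could build these URLs from a name string like this. It assumes the placeholder example.org host and the underscore convention shown above; the function name is hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/names/"  # placeholder host from the examples above

def name_url(name, fmt=None):
    """Build the URL for a genus + specific epithet string.

    fmt is None for the generic URI, or "html" / "rdf" for a specific form.
    """
    # Collapse runs of whitespace and join the parts with underscores.
    slug = "_".join(name.split())
    url = BASE + quote(slug)
    return url if fmt is None else f"{url}.{fmt}"

print(name_url("Duplictus annoyii"))
print(name_url("Duplictus annoyii", "html"))
print(name_url("Duplictus annoyii", "rdf"))
```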

This service would help map a given genus and specific epithet combination to possible full scientific names with authorship.

You would need an additional even more opinionated web service that would tell you what the most current full name is for a given scientific name with authorship.

One that recognizes that "Felis concolor Linnaeus 1771" is an older synonym for "Puma concolor (Linnaeus, 1771)".

In GNUB / GNITE these two URIs could be considered reconciliation groups.

http://example.org/recgroups/Animal/Felis_concolor    => PrefName:"Felis concolor Linnaeus 1771"
http://example.org/recgroups/Animal/Puma_concolor   => PrefName:"Puma concolor (Linnaeus, 1771)"

This allows a different service or different groups to make statements about these.

http://example.org/recgroups/Animal/Felis_concolor olderSynonymFor  http://example.org/recgroups/Animal/Puma_concolor 

or 

http://example.org/recgroups/Animal/Felis_concolor isBasionymOf http://example.org/recgroups/Animal/Puma_concolor 
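
To illustrate how the more opinionated service might use such statements, here is a small sketch that follows olderSynonymFor links from one reconciliation group to the most current one. The dictionary stands in for whatever store holds the statements, and the function name is hypothetical.

```python
BASE = "http://example.org/recgroups/Animal/"
FELIS = BASE + "Felis_concolor"
PUMA = BASE + "Puma_concolor"

# subject -> object for every olderSynonymFor statement we choose to trust.
OLDER_SYNONYM_FOR = {FELIS: PUMA}

def current_group(uri, links):
    """Follow olderSynonymFor links until no newer group is known."""
    seen = set()
    # The 'seen' set guards against accidental cycles in the statements.
    while uri in links and uri not in seen:
        seen.add(uri)
        uri = links[uri]
    return uri

print(current_group(FELIS, OLDER_SYNONYM_FOR))
```

A group that has no newer synonym is simply returned unchanged.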

Note that this system allows for different groups to make different and potentially conflicting statements about the different reconciliation groups.

These different statements are from different groups, so they can be assigned to different "graphs":

<subject> <predicate> <object> <graph>

For example:

http://example.org/recgroups/Animal/Aedes_triseriatus isPreferedNameVariantOf <speciesXYZ> <urn:frustrated.medical.entomologists.org>
http://example.org/recgroups/Animal/Ochlerotatus_triseriatus isPreferedNameVariantOf <speciesXYZ> <urn:frustrated.dipterist.taxonomist.org>
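
A small sketch of why the fourth "graph" element is useful: with plain Python tuples standing in for a real quad store, you can ask which name each group prefers without the two opinions overwriting each other. The function name is hypothetical; the URIs and the predicate are the ones used above.

```python
from collections import defaultdict

BASE = "http://example.org/recgroups/Animal/"

# Each statement is (subject, predicate, object, graph); the graph records who said it.
QUADS = [
    (BASE + "Aedes_triseriatus", "isPreferedNameVariantOf", "speciesXYZ",
     "urn:frustrated.medical.entomologists.org"),
    (BASE + "Ochlerotatus_triseriatus", "isPreferedNameVariantOf", "speciesXYZ",
     "urn:frustrated.dipterist.taxonomist.org"),
]

def preferred_names(quads, species):
    """Map each graph (source) to the names it prefers for the given species."""
    by_graph = defaultdict(list)
    for subject, predicate, obj, graph in quads:
        if predicate == "isPreferedNameVariantOf" and obj == species:
            by_graph[graph].append(subject)
    return dict(by_graph)

for graph, names in preferred_names(QUADS, "speciesXYZ").items():
    print(graph, "prefers", names)
```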

The vast majority of people simply care about mapping these names to the same "thing".

They simply do the following in their own knowledge base:

http://example.org/recgroups/Animal/Aedes_triseriatus sameAs http://example.org/recgroups/Animal/Ochlerotatus_triseriatus

Now, for them, facts like:

http://example.org/recgroups/Animal/Aedes_triseriatus larvalDevelopmentIn <stdVocab:TreeHoles>

http://example.org/recgroups/Animal/Ochlerotatus_triseriatus takesBloodMealsFrom <stdVocab:Birds>

http://example.org/recgroups/Animal/Ochlerotatus_triseriatus knownVectorOf <stdVocab:LacCrosseEncephalitis>

are interpreted as being statements about the same "thing".
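
A minimal sketch of that merging step, assuming sameAs is treated as a simple equivalence and using a small union-find over plain Python tuples instead of a real reasoner. The function names are hypothetical; the facts are the ones listed above.

```python
BASE = "http://example.org/recgroups/Animal/"
AEDES = BASE + "Aedes_triseriatus"
OCHLEROTATUS = BASE + "Ochlerotatus_triseriatus"

SAME_AS = [(AEDES, OCHLEROTATUS)]
FACTS = [
    (AEDES, "larvalDevelopmentIn", "stdVocab:TreeHoles"),
    (OCHLEROTATUS, "takesBloodMealsFrom", "stdVocab:Birds"),
    (OCHLEROTATUS, "knownVectorOf", "stdVocab:LacCrosseEncephalitis"),
]

def find(parent, x):
    """Return the representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # shorten the path as we walk up
        x = parent[x]
    return x

def merge_facts(same_as, facts):
    """Group (predicate, object) pairs under one representative per sameAs class."""
    parent = {}
    for a, b in same_as:
        parent[find(parent, a)] = find(parent, b)
    merged = {}
    for subject, predicate, obj in facts:
        merged.setdefault(find(parent, subject), set()).add((predicate, obj))
    return merged

for thing, statements in merge_facts(SAME_AS, FACTS).items():
    print(thing)
    for predicate, obj in sorted(statements):
        print("   ", predicate, obj)
```

All three facts end up attached to a single representative URI, regardless of which name variant each fact was originally stated against.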

Respectfully,

- Pete

On Thu, Nov 25, 2010 at 10:06 AM, Gregor Hagedorn <g.m.hagedorn@gmail.com> wrote:
A side remark, about where I believe the whole discussion is misleading:

> Puma concolor   se:v6n7p
> That way in the future if the name changes without a change in the concept.
> Eupuma concolor se:v6n7p
> The data says linked.

This always looks nice... However, with such proposals we, the
computer guys, make the concept-assessment someone else's problem (i.e.
the taxonomists, ecologists, pathologists, etc.), and, at the same
time, do not provide them the means to communicate. The assumption is
that a scientist or applied worker would know whether to add se:v6n7p
to a given taxon name or not.

With my taxonomer/pathologist hat on: I mostly have no clue which
concept XXX concolor is - and whether it is changed or not. Puma
concolor may be a different concept than Puma concolor. We are, of
course, guilty of communicating in a shamefully loose way (s.str., s.
lat. etc.), which could and should be improved by citing a secundum,
but beyond that: mapping concepts is a taxonomic opinion, not an
objective truth.

So given that any trivial mapping mechanism can map multiple IDs (Puma
concolor, Eupuma concolor) to a single concept - the proposal saves
this trivial processing time, but does not contribute to the problem
of communicating in a way that is suitable to assess taxon concepts.

----

Aside: Please compare the highly linked, and generally correctly
linked Wikipedias with other content management systems for the
advantage of human legible IDs [[Puma concolor]] over
http://x.y.net/node/234872561 - links. My own observation is that in
the latter case only a fraction of the desirable links are created,
and that these are quite often going to wrong, or perhaps obsoleted
places.

I therefore think:
se:Puma_concolor_sec._Smith
would be a much more useful mechanism than all the
computer-scientists-only proposals like se:v6n7p or
Gregor



--
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
About the GeoSpecies Knowledge Base
------------------------------------------------------------