Hi Gregor,
I have had some more time to think about this issue.
In an ideal world the combination of genus and specific epithet should map to one concept.
I believe that was Linnaeus's original intent.
Unfortunately the current system seems to have strayed from this original goal.
Also, no one anticipated that changes in phylogenetic thinking would cause so many changes to the "identifier" or name for a given species.
Dima and I came up with the somewhat strange URIs for name strings so we could look at ways to connect some sort of species concept to the various names in the GNI.
The GNI itself does not always know what kind of thing a given name string represents.
It also does not know if a given name string is valid or not.
The GNI needs some additional parts to make sense of these. I will leave it to others to describe those additional parts.
After looking through a large set of species-related publications, etc., it is clear that the vast majority of information sources only list the genus and specific epithet.
To make sense of those, and to do entity extraction, etc., I think the following service would be useful.
You have a name, let's say Duplictus annoyii.
To get information about what that name might be, you reference a semantic web service using a URL form of the name.
http://example.org/names/Duplictus_annoyii
If you are a human using a web browser, this will return a page that lists all the things it might be:
Kingdom Animalia Duplictus annoyii Schmoe 1886 => appears in databases X, Y, Z
Kingdom Animalia Duplictus annoyii Schmoe 1886 sec Thomas 1920 => appears in databases A, I, P
Kingdom Plantae Duplictus annoyii Carefulnot => appears in database M
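For illustration, a minimal Python sketch of that lookup, assuming a hypothetical in-memory index whose sample data simply mirrors the listing above (`Candidate`, `INDEX`, `name_to_url` and `exact_candidates` are invented names, not part of GNI):

```python
from dataclasses import dataclass

BASE_URL = "http://example.org/names/"  # example base URL from this email


@dataclass(frozen=True)
class Candidate:
    kingdom: str
    full_name: str  # full scientific name with authorship (and secundum, if any)
    databases: tuple[str, ...]


# Hypothetical index keyed by the bare binomial (genus + specific epithet).
INDEX: dict[str, list[Candidate]] = {
    "Duplictus annoyii": [
        Candidate("Animalia", "Duplictus annoyii Schmoe 1886", ("X", "Y", "Z")),
        Candidate("Animalia", "Duplictus annoyii Schmoe 1886 sec Thomas 1920", ("A", "I", "P")),
        Candidate("Plantae", "Duplictus annoyii Carefulnot", ("M",)),
    ],
}


def name_to_url(binomial: str) -> str:
    """Build the URL form of a name: collapse whitespace, join words with underscores."""
    return BASE_URL + "_".join(binomial.split())


def exact_candidates(binomial: str) -> list[Candidate]:
    """Return every full name recorded for this exact binomial (empty list if none)."""
    return INDEX.get(" ".join(binomial.split()), [])


if __name__ == "__main__":
    print(name_to_url("Duplictus annoyii"))
    for c in exact_candidates("Duplictus annoyii"):
        print(f"Kingdom {c.kingdom} {c.full_name} => appears in {', '.join(c.databases)}")
```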
If there are no exact matches it could return:
Similar Name Strings
Kingdom Animalia Duplictus annoyi Schmoe 1886 => appears in databases X, Y, Z
Kingdom Plantae Duplictus annous Carefulnot => appears in databases A, I, P
Kingdom Plantae Duplictus annous Carefulnot => appears in database M
Kingdom Plantae Duplicta annous (Carefulnot) => appears in databases X, Y, Z
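One possible way to produce the "similar name strings" fallback, sketched in Python with the standard library's `difflib`. The list of known binomials is hypothetical sample data taken from the listing above, and the 0.7 cutoff is an arbitrary assumption; a real service might prefer a taxonomy-aware fuzzy matcher instead.

```python
import difflib

# Hypothetical binomials already known to the index (from the example listing).
KNOWN_BINOMIALS = [
    "Duplictus annoyi",
    "Duplictus annous",
    "Duplicta annous",
    "Felis concolor",
]


def similar_name_strings(binomial: str, n: int = 5, cutoff: float = 0.7) -> list[str]:
    """Return up to n known binomials whose difflib similarity ratio to the query
    is at least `cutoff`, best match first. Each hit would then be expanded to its
    full names with authorship, as in the exact-match case."""
    query = " ".join(binomial.split())
    return difflib.get_close_matches(query, KNOWN_BINOMIALS, n=n, cutoff=cutoff)


if __name__ == "__main__":
    print(similar_name_strings("Duplictus annoyii"))
```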
A semantic web program would receive an RDF version of the information above and could use various heuristics to determine which are the most likely full names.
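A rough sketch of what the RDF version might look like, serialized as N-Triples from Python. The vocabulary namespace and the predicates `possibleFullName`, `kingdom`, `fullName` and `appearsInDatabase` are hypothetical (this proposal does not define any), and `rank_by_coverage` is just one naive heuristic a client could apply: prefer full names cited by more databases.

```python
# Hypothetical vocabulary namespace; no predicate URIs are defined in the proposal.
VOCAB = "http://example.org/vocab/"

# (kingdom, full name with authorship, databases it appears in)
Candidate = tuple[str, str, tuple[str, ...]]


def literal(text: str) -> str:
    """Render a plain string as an N-Triples literal, escaping backslashes,
    double quotes, CR and LF."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def candidates_to_ntriples(name_url: str, candidates: list[Candidate]) -> str:
    """Serialize the candidate full names for one name URL as N-Triples."""
    lines = []
    for i, (kingdom, full_name, databases) in enumerate(candidates):
        node = f"<{name_url}#candidate{i}>"  # one fragment URI per candidate
        lines.append(f"<{name_url}> <{VOCAB}possibleFullName> {node} .")
        lines.append(f"{node} <{VOCAB}kingdom> {literal(kingdom)} .")
        lines.append(f"{node} <{VOCAB}fullName> {literal(full_name)} .")
        for db in databases:
            lines.append(f"{node} <{VOCAB}appearsInDatabase> {literal(db)} .")
    return "\n".join(lines)


def rank_by_coverage(candidates: list[Candidate]) -> list[Candidate]:
    """Naive client-side heuristic: most widely cited full names first."""
    return sorted(candidates, key=lambda c: len(c[2]), reverse=True)
```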
Eventually the service could also include opinions about which name variants are valid and which are not.
Felis concolor => Felis concolor Linnaeus 1771
"Valid" names would include properly formed synonyms so Felis concolor would return Felis concolor Linnaeus 1771.
The associated URLs for each of these forms would be:
HTML Page http://example.org/names/Duplictus_annoyii.html
RDF Page http://example.org/names/Duplictus_annoyii.rdf
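If a single name URL should serve both humans and programs, one option (an assumption here, since only the two suffixed URLs are described) is HTTP content negotiation on the `Accept` header. A small Python sketch of choosing between the `.html` and `.rdf` forms; `parse_accept` and `representation_url` are hypothetical helpers, and unknown or wildcard types fall back to the human-readable page:

```python
BASE_URL = "http://example.org/names/"

# Media types this sketch can serve, mapped to the suffixes used above.
SUFFIX_FOR_TYPE = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def representation_url(binomial: str, accept: str) -> str:
    """Pick the .html or .rdf URL for a name based on the client's Accept header."""
    stem = BASE_URL + "_".join(binomial.split())
    for media_type, q in parse_accept(accept):
        if q > 0 and media_type in SUFFIX_FOR_TYPE:
            return stem + SUFFIX_FOR_TYPE[media_type]
    return stem + ".html"
```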
This service would help map a given genus + specific epithet combination to possible full scientific names with authorship.
You would need an additional, even more opinionated web service that would tell you the most current full name for a given scientific name with authorship: one that recognizes that "Felis concolor Linnaeus 1771" is an older synonym for "Puma concolor (Linnaeus, 1771)".
In GNUB / GNITE these two URIs could be considered reconciliation groups:
http://example.org/recgroups/Animal/Felis_concolor => PrefName:"Felis concolor Linnaeus 1771"
http://example.org/recgroups/Animal/Puma_concolor => PrefName:"Puma concolor (Linnaeus, 1771)"
This allows a different service or different groups to make statements about these.
http://example.org/recgroups/Animal/Felis_concolor olderSynonymFor http://example.org/recgroups/Animal/Puma_concolor
or
http://example.org/recgroups/Animal/Felis_concolor isBasionymOf http://example.org/recgroups/Animal/Puma_concolor
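A minimal Python sketch of how the more opinionated service could use such statements: follow `olderSynonymFor` links until reaching a group that is not an older synonym of anything, then report that group's preferred name. The in-memory tables are hypothetical, only `olderSynonymFor` is followed (not `isBasionymOf`), and forks or cycles are reported as errors rather than resolved.

```python
BASE = "http://example.org/recgroups/Animal/"

# Preferred name strings per reconciliation group (from the example).
PREF_NAME = {
    BASE + "Felis_concolor": "Felis concolor Linnaeus 1771",
    BASE + "Puma_concolor": "Puma concolor (Linnaeus, 1771)",
}

# (subject, predicate, object) statements; one opinion from the example.
STATEMENTS = [
    (BASE + "Felis_concolor", "olderSynonymFor", BASE + "Puma_concolor"),
]


def current_group(group: str, statements: list[tuple[str, str, str]] = STATEMENTS) -> str:
    """Follow olderSynonymFor links to the current group.
    Raises ValueError if the statements fork or form a cycle."""
    seen = {group}
    while True:
        targets = [o for s, p, o in statements if s == group and p == "olderSynonymFor"]
        if not targets:
            return group
        if len(targets) > 1:
            raise ValueError(f"conflicting opinions for {group}: {targets}")
        group = targets[0]
        if group in seen:
            raise ValueError(f"synonym cycle at {group}")
        seen.add(group)


def current_name(group: str) -> str:
    """Preferred name of the group reached by following the synonym chain."""
    return PREF_NAME[current_group(group)]
```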
Note that this system allows different groups to make different and potentially conflicting statements about the different reconciliation groups.
Because these statements come from different groups, they can be assigned to different "graphs":
<subject> <predicate> <object> <graph>
For example:
http://example.org/recgroups/Animal/Aedes_triseriatus isPreferedNameVariantOf <speciesXYZ> urn:frustrated.medical.entomologists.org
http://example.org/recgroups/Animal/Ochlerotatus_triseriatus isPreferedNameVariantOf <speciesXYZ> urn:frustrated.dipterist.taxonomist.org
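To show how the fourth (graph) position keeps conflicting opinions apart, here is a minimal in-memory quad list in Python. It is only a sketch: a real deployment would use a quad store, and `Quad`, `opinions` and `preferred_variant` are hypothetical names. The predicate is spelled exactly as in the example.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # who is making the statement


BASE = "http://example.org/recgroups/Animal/"
PREFERRED = "isPreferedNameVariantOf"  # spelled as in the example

QUADS = [
    Quad(BASE + "Aedes_triseriatus", PREFERRED, "speciesXYZ",
         "urn:frustrated.medical.entomologists.org"),
    Quad(BASE + "Ochlerotatus_triseriatus", PREFERRED, "speciesXYZ",
         "urn:frustrated.dipterist.taxonomist.org"),
]


def opinions(concept: str, quads: list[Quad] = QUADS) -> dict[str, str]:
    """Map each graph to the name variant it prefers for `concept`."""
    return {q.graph: q.subject for q in quads
            if q.predicate == PREFERRED and q.obj == concept}


def preferred_variant(concept: str, graph: str, quads: list[Quad] = QUADS) -> Optional[str]:
    """The variant preferred by one graph, or None if that graph has no opinion."""
    return opinions(concept, quads).get(graph)
```

Both opinions coexist; a consumer simply chooses which graph(s) to trust.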
The vast majority of people, who simply care about mapping these names to the same "thing", would do the following in their own knowledge base:
http://example.org/recgroups/Animal/Aedes_triseriatus sameAs http://example.org/recgroups/Animal/Ochlerotatus_triseriatus
Now for them facts like:
http://example.org/recgroups/Animal/Aedes_triseriatus larvalDevelopmentIn stdVocab:TreeHoles
http://example.org/recgroups/Animal/Ochlerotatus_triseriatus takesBloodMealsFrom stdVocab:Birds
http://example.org/recgroups/Animal/Ochlerotatus_triseriatus knownVectorOf stdVocab:LacCrosseEncephalitis
are interpreted as statements about the same "thing".
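As a rough illustration of that local sameAs step, a Python sketch that merges URIs with a union-find structure and then collects facts about any member of the merged class. The class and function names are hypothetical, and plain union-find is a simplification of full owl:sameAs reasoning.

```python
BASE = "http://example.org/recgroups/Animal/"


class SameAs:
    """Union-find over URIs: URIs joined by sameAs share one representative."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, uri: str) -> str:
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: repoint every URI on the path straight at the root.
        while uri != root:
            nxt = self.parent[uri]
            self.parent[uri] = root
            uri = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


FACTS = [
    (BASE + "Aedes_triseriatus", "larvalDevelopmentIn", "stdVocab:TreeHoles"),
    (BASE + "Ochlerotatus_triseriatus", "takesBloodMealsFrom", "stdVocab:Birds"),
    (BASE + "Ochlerotatus_triseriatus", "knownVectorOf", "stdVocab:LacCrosseEncephalitis"),
]


def facts_about(uri: str, same_as: SameAs,
                facts: list[tuple[str, str, str]] = FACTS) -> list[tuple[str, str]]:
    """All (predicate, object) pairs stated about `uri` or anything sameAs it."""
    target = same_as.find(uri)
    return [(p, o) for s, p, o in facts if same_as.find(s) == target]
```

Usage: `kb = SameAs(); kb.union(BASE + "Aedes_triseriatus", BASE + "Ochlerotatus_triseriatus")` makes all three facts visible from either URI.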
Respectfully,
- Pete
On Thu, Nov 25, 2010 at 10:06 AM, Gregor Hagedorn <g.m.hagedorn@gmail.com> wrote:
A side remark, about where I believe the whole discussion is misleading:
Puma concolor se:v6n7p
Eupuma concolor se:v6n7p
That way, in the future, if the name changes without a change in the concept, the data stays linked.
This always looks nice... However, with such proposals we, the computer guys, make the concept assessment someone else's problem (i.e., the taxonomists', ecologists', pathologists', etc.), and at the same time we do not provide them the means to communicate. The assumption is that a scientist or applied worker would know whether to add se:v6n7p to a given taxon name or not.
With my taxonomist/pathologist hat on: I mostly have no clue which concept XXX concolor is, or whether it has changed. Puma concolor may be a different concept than Puma concolor. We are, of course, guilty of communicating in a shamefully loose way (s.str., s. lat., etc.), which could and should be improved by citing a secundum, but beyond that: mapping concepts is a taxonomic opinion, not an objective truth.
So, given that any trivial mapping mechanism can map multiple IDs (Puma concolor, Eupuma concolor) to a single concept, the proposal saves this trivial processing time but does not contribute to solving the problem of communicating in a way that is suitable for assessing taxon concepts.
Aside: Please compare the highly linked, and generally correctly linked, Wikipedias with other content management systems to see the advantage of human-legible IDs ([[Puma concolor]]) over http://x.y.net/node/234872561-style links. My own observation is that in the latter case only a fraction of the desirable links are created, and these quite often point to wrong or obsolete places.
I therefore think: se:Puma_concolor_sec._Smith would be a much more useful mechanism than all the computer-scientists-only proposals like se:v6n7p or
http://gni.globalnames.org/name_strings/772d5162-f5aa-596c-98e0-a1c6c5a29bb9
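To make the human-legible alternative concrete, a small Python sketch that builds and parses identifiers in the se:Puma_concolor_sec._Smith style. The "se:" prefix and the "_sec._" separator follow the example; what "se:" expands to is not specified, and `make_id`/`parse_id` are hypothetical helpers. The underscore-to-space round trip is an assumption and would break if a name or secundum itself contained "_sec._".

```python
from typing import Optional

PREFIX = "se:"  # prefix as in the example; its expansion is unspecified
SEC = "_sec._"  # separator between the name and the secundum


def make_id(name: str, secundum: Optional[str] = None) -> str:
    """Build a human-legible concept identifier from a name and optional secundum."""
    ident = PREFIX + "_".join(name.split())
    if secundum:
        ident += SEC + "_".join(secundum.split())
    return ident


def parse_id(identifier: str) -> tuple[str, Optional[str]]:
    """Split an identifier back into (name, secundum or None)."""
    if not identifier.startswith(PREFIX):
        raise ValueError(f"not a {PREFIX} identifier: {identifier}")
    name, sep, sec = identifier[len(PREFIX):].partition(SEC)
    return name.replace("_", " "), (sec.replace("_", " ") if sep else None)
```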
Gregor