I have been working toward consensus on a vocabulary for relating the namestrings in the Global Names Index to some sort of species concept.
The goal is to have a vocabulary that works with a number of different "types" of species concept.
It was suggested to me that I just go ahead and make one.
Here is what I am proposing and hope to demonstrate at the Nomina section on Thursday at TDWG.
One of the first things to decide is what should be the "best" form of a scientific name string.
After looking through the GBIF info, it seems that the following form, which has no comma between the author and year, might be the preferred form.
For example: Puma concolor (Linnaeus 1771)
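As a minimal sketch of how variant strings could be brought into that form, the following Python function drops a comma that sits directly before a four-digit year. The function name and the regular expression are illustrative assumptions, not part of GNI or GBIF, and real name strings will have cases this does not handle.

```python
import re

# Matches a comma (with optional surrounding spaces) directly before a four-digit year.
_COMMA_BEFORE_YEAR = re.compile(r"\s*,\s*(?=\d{4}\b)")

def to_preferred_form(name_string: str) -> str:
    """Return the name string with no comma between author and year."""
    return _COMMA_BEFORE_YEAR.sub(" ", name_string.strip())

print(to_preferred_form("Puma concolor (Linnaeus, 1771)"))
```

Strings already in the preferred form pass through unchanged, so the function can be applied to every variant before the variants are related to one another.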
The other various forms of the "same" name can then be related to each other using semantic web predicates in addition to the relationship between a name string and species concept.
Using this as a model, I thought the following triple structure might work well to map the concept to Accepted, Synonym etc.
Since the GNI namestrings are in the form of URIs, they can serve as both subjects and objects.
Literals cannot be used as subjects.
I use the form hasAcceptedScientificNameURI to distinguish between the URI form and the literal form txn:hasAcceptedScientificName.
I was a little reluctant to do this within the txn namespace since it might be something for the TDWG vocabulary.
GUID Forms
<concept> txn:hasAcceptedScientificNameURI <http://gni.globalnames.org/name_strings/772d5162-f5aa-596c-98e0-a1c6c5a29bb9>
<concept> txn:hasSynonymNameURI <http://gni.globalnames.org/name_strings/35da7f30-25ff-5111-ab29-1a4f9988ef51>
<concept> txn:hasBasionymNameURI <http://gni.globalnames.org/name_strings/35da7f30-25ff-5111-ab29-1a4f9988ef51>
<http://gni.globalnames.org/name_strings/772d5162-f5aa-596c-98e0-a1c6c5a29bb9...> txn:isAcceptedScientificNameURI_Of <concept>
<http://gni.globalnames.org/name_strings/35da7f30-25ff-5111-ab29-1a4f9988ef51...> txn:isAcceptedScientificNameURI_Of <concept>
<http://gni.globalnames.org/name_strings/35da7f30-25ff-5111-ab29-1a4f9988ef51...> txn:isBasionymNameURI_Of <concept>
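To illustrate how the forward and inverse statements fit together, here is a rough Python sketch that holds the triples as plain tuples, derives the inverse "_Of" triples, and prints them as N-Triples. The txn namespace URI and the concept URI are placeholders I made up for the example; only the predicate local names and the GNI URIs come from the listing above. The synonym predicate is left out because no inverse for it is named above.

```python
TXN = "http://example.org/txn#"            # placeholder, not the real txn namespace
CONCEPT = "http://example.org/concept/1"   # hypothetical concept URI
GNI = "http://gni.globalnames.org/name_strings/"

# Forward predicate local name -> inverse predicate local name.
INVERSE = {
    "hasAcceptedScientificNameURI": "isAcceptedScientificNameURI_Of",
    "hasBasionymNameURI": "isBasionymNameURI_Of",
}

def with_inverses(triples):
    """Return the given triples plus one inverse triple per known predicate."""
    out = list(triples)
    for s, p, o in triples:
        local = p[len(TXN):] if p.startswith(TXN) else None
        if local in INVERSE:
            # Works only because the object is a URI; a literal could not be a subject.
            out.append((o, TXN + INVERSE[local], s))
    return out

def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

forward = [
    (CONCEPT, TXN + "hasAcceptedScientificNameURI",
     GNI + "772d5162-f5aa-596c-98e0-a1c6c5a29bb9"),
    (CONCEPT, TXN + "hasBasionymNameURI",
     GNI + "35da7f30-25ff-5111-ab29-1a4f9988ef51"),
]
all_triples = with_inverses(forward)
print(to_ntriples(all_triples))
```

In a real ontology the inverses would more likely be declared once (for example with owl:inverseOf) and left to a reasoner; generating them explicitly is just the simplest way to show the idea.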
Humans, note that the RDF representation of this GNI namestring information can be seen by adding ".rdf" to the end of the URI. Semantic Web applications already understand the distinction between the HTML and RDF representations.
For example http://gni.globalnames.org/name_strings/772d5162-f5aa-596c-98e0-a1c6c5a29bb9...
It is also important to discuss and make a decision about the Domain and Range for each of these predicates.
txn:hasAcceptedScientificNameURI
Domain - Some sort of TaxonConcept, but it may be best to leave the Domain unspecified except that it should be a URI.
Range - Some sort of Namestring represented in RDF.
Within a typical browser, the RDF form of the URI
It might make some sense to have a range that follows some standard RDF representation, but they do not necessarily have to be namestrings in the GNI.
I was thinking that those RDF representations that follow a specific structure could be seen as instances of the class "NameString".
If these are namestrings provided only by the GNI, then they might be modeled as instances of the class "GNI_NameString".
==========
Literal Forms
txn:hasAcceptedScientificName "Ochlerotatus triseriatus (Say 1823)"
txn:hasAcceptedScientificName "Aedes triseriatus (Say 1823)"
txn:hasBasionymName "Culex triseriatus Say 1823"
GUID Forms
<concept> txn:hasAcceptedScientificNameURI <http://gni.globalnames.org/name_strings/3f418cbf-358a-5174-ade1-25ae846a219e>
<concept> txn:hasAcceptedScientificNameURI <http://gni.globalnames.org/name_strings/e9f5e126-5cdb-52bf-9050-349df4a9dd18>
<concept> txn:hasBasionymNameURI <http://gni.globalnames.org/name_strings/6ce63b2c-58be-5335-826b-622b7b4bc328>
<http://gni.globalnames.org/name_strings/3f418cbf-358a-5174-ade1-25ae846a219e...> txn:isAcceptedScientificNameURI_Of <concept>
<http://gni.globalnames.org/name_strings/e9f5e126-5cdb-52bf-9050-349df4a9dd18...> txn:isAcceptedScientificNameURI_Of <concept>
<http://gni.globalnames.org/name_strings/3f418cbf-358a-5174-ade1-25ae846a219e...> txn:isBasionymNameURI_Of <concept>
==========
Do we also want to include information like that described below? In some cases, each of the databases has its own version of the "Accepted Name".
This would allow one to determine which DBs have different "Accepted" names:
<concept> txn:hasCOL2010_ScientificNameURI <>
<concept> txn:hasNCBI_ScientificNameURI <>
<concept> txn:hasITIS_ScientificNameURI <>
<concept> txn:hasGBIF_ScientificNameURI <>
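A small Python sketch of what such per-database predicates would make possible: given the accepted-name URI each source reports for one concept, group the sources by name and check whether they agree. The name URIs and the assignment of names to databases below are invented purely for illustration.

```python
from collections import defaultdict

def group_sources_by_name(accepted_by_source):
    """Map each accepted-name URI to the sorted list of databases that use it."""
    groups = defaultdict(list)
    for source, name_uri in accepted_by_source.items():
        groups[name_uri].append(source)
    return {name: sorted(sources) for name, sources in groups.items()}

def sources_disagree(accepted_by_source):
    """True when the databases do not all report the same accepted name."""
    return len(set(accepted_by_source.values())) > 1

# Hypothetical data: which source treats which namestring as "Accepted".
accepted_by_source = {
    "COL2010": "http://example.org/name/A",
    "ITIS": "http://example.org/name/A",
    "NCBI": "http://example.org/name/B",
    "GBIF": "http://example.org/name/A",
}

groups = group_sources_by_name(accepted_by_source)
for name_uri, sources in groups.items():
    print(name_uri, sources)
print("disagreement:", sources_disagree(accepted_by_source))
```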
Tracking the Catalog of Life LSIDs over time: since there is a new COL LSID each year, you will need to create some mapping between these.
<concept> txn:hasCol2009LSID urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2009
<concept> txn:hasCol2010LSID urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010
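One possible way to build that mapping, sketched in Python: split each LSID into its colon-separated parts and compare everything except the trailing edition tag. This assumes that the object identifier stays the same from one annual edition to the next, as it does in the two example LSIDs here; I have not checked that this holds for COL in general. The class and function names are my own.

```python
from typing import NamedTuple

class ColLsid(NamedTuple):
    authority: str   # e.g. catalogueoflife.org
    namespace: str   # e.g. taxon
    object_id: str   # the UUID-like part
    revision: str    # the edition tag, e.g. ac2009

def parse_col_lsid(lsid: str) -> ColLsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object:revision."""
    parts = lsid.strip().split(":")
    if len(parts) != 6 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a six-part LSID: {lsid!r}")
    return ColLsid(*parts[2:])

def same_record_across_editions(a: str, b: str) -> bool:
    """True when two LSIDs differ at most in their revision (edition) part."""
    return parse_col_lsid(a)[:3] == parse_col_lsid(b)[:3]

lsid_2009 = "urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2009"
lsid_2010 = "urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"
print(parse_col_lsid(lsid_2009).revision)
print(same_record_across_editions(lsid_2009, lsid_2010))
```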
These LSIDs will work within Virtuoso only because this quadstore has extensions written to support them. LSIDs themselves are not really true LOD identifiers.
Most semantic web tools do not know how to deal with these.
The relationships between these "properly formed" Accepted Names, Synonyms, Basionyms can then be represented with additional predicates depending on the relationship between the concept and the namestring.
I have chosen not to emphasize what kind of taxon concept we are using, since these should be usable with many "kinds" of taxon concepts.
The advantage of representing these namestrings as URIs is that once loaded into a triplestore / quadstore one will be able to see and query what namestrings are related to a specific concept and what namestrings are related to various concepts.
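The two kinds of query mentioned here can be sketched in Python over an in-memory list of triples; a real triplestore would answer the same questions with SPARQL. All URIs and the shortened predicate names in the sample data are hypothetical.

```python
from collections import defaultdict

def names_for_concept(triples, concept):
    """All namestring URIs attached to one concept, whatever the predicate."""
    return {o for s, p, o in triples if s == concept}

def names_shared_between_concepts(triples):
    """Namestring URIs that are attached to more than one concept."""
    concepts_by_name = defaultdict(set)
    for s, p, o in triples:
        concepts_by_name[o].add(s)
    return {name: cs for name, cs in concepts_by_name.items() if len(cs) > 1}

# Hypothetical concept -> namestring triples.
triples = [
    ("http://example.org/concept/1", "txn:hasAcceptedScientificNameURI", "http://example.org/name/A"),
    ("http://example.org/concept/1", "txn:hasSynonymNameURI", "http://example.org/name/B"),
    ("http://example.org/concept/2", "txn:hasAcceptedScientificNameURI", "http://example.org/name/B"),
]

print(names_for_concept(triples, "http://example.org/concept/1"))
print(names_shared_between_concepts(triples))
```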
Unless there is some disagreement, I plan to modify my ontology and upload the example triples to the cloud so the namestring relationships are viewable in Sindice etc. by the time of the Nomina meeting on Thursday.
Respectfully,
- Pete