I thought I should follow up, since this representation may appear to be overly "wordy".

However, in the triplestore this representation:

<concept> hasScientificNameURI      <name_uri>
<name_uri> isScientificNameURI_Of  <concept>


will occupy about 62 bytes, which is less than many Unicode name strings take up by themselves.

- Pete

On Mon, Oct 11, 2010 at 2:29 PM, Peter DeVries <pete.devries@gmail.com> wrote:
I am starting to see some reasons why there might be a need for a more XML-type version of the DarwinCore and a separate, still undefined, but truly "semantic" version that still needs to be worked out.

Part of this is that I think most people don't understand the semantic issues and will not for some time.

So I am not sure that those who are submitting their data to GBIF need to understand all of the somewhat counterintuitive semantic issues.

Here is an example from one of our earlier discussions - I realized only later that I should have brought this up at the time.

For XML, using "scientificName" makes a lot of sense.

However:

1) The current semantic web cannot have literals as subjects.

2) Also, if you represent knowledge as triples without ontology-based inferencing, you generally need to define predicates for both "directions" of a particular relationship.

So if we had a URI for each scientific name, we would need to make both of the following kinds of statements.

<concept> hasScientificNameURI      <name_uri>
<name_uri> isScientificNameURI_Of  <concept>

In the LOD cloud, you cannot assume that everyone will load and be able to infer against your specific vocabulary.

This is, in part, the result of vocabularies not always playing well together.

Also, it is not clear to what extent inferencing will work well on really large data sets or for particular projects.

Eventually this will get figured out, but for now we need predicates that can be used to represent both directions (most of the time).

Without them you can only query in one direction.

This was not completely obvious to me until I had a test set and realized why I was not able to query it in the ways I wanted.
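The "one direction" problem can be illustrated with a toy, subject-first lookup, standing in for a client that only follows links outward from a resource (for example by dereferencing its URI). This is a minimal sketch under stated assumptions: triples are plain tuples, the predicate names are the hypothetical ones from the example above, and `materialize_inverses` / `objects_of` are illustrative helpers, not part of any triplestore API. Adding the inverse statement explicitly is what an `owl:inverseOf`-aware reasoner would otherwise infer for you.

```python
# Hypothetical predicate names from the example above; not part of any
# published Darwin Core vocabulary.
INVERSES = {
    "hasScientificNameURI": "isScientificNameURI_Of",
}


def materialize_inverses(triples, inverses):
    """Return the input triples plus an explicit inverse triple for every
    triple whose predicate appears in `inverses` (the forward-chaining step
    an owl:inverseOf-aware reasoner would otherwise perform)."""
    result = set(triples)
    for s, p, o in triples:
        if p in inverses:
            result.add((o, inverses[p], s))
    return result


def objects_of(triples, subject, predicate):
    """Subject-first lookup: all objects for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


data = {("<concept>", "hasScientificNameURI", "<name_uri>")}
full = materialize_inverses(data, INVERSES)

# Starting from the name URI, only the materialized inverse answers this.
print(objects_of(data, "<name_uri>", "isScientificNameURI_Of"))  # set()
print(objects_of(full, "<name_uri>", "isScientificNameURI_Of"))  # {'<concept>'}
```

The cost is one extra triple per relationship, which is the "wordiness" discussed in the follow-up.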

So for the more XML-centric DarwinCore, there is no problem with using scientificName.

For the fully semantic web version, the following pattern might be easiest to clearly interpret.

hasScientificName "Puma concolor (Linnaeus 1771)"
hasScientificNameURI <http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8>

This makes it clear that the first form is to be used for a literal while the second form is to be used for a conventionally resolvable URI.
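To make the literal/URI distinction concrete, here is a minimal Python sketch that serializes both forms as N-Triples. The namespace `http://example.org/dwc-sketch/` and the concept URI are hypothetical placeholders; only the name string and the GNI name URI come from the example above. The literal is quoted and escaped, while the URI is wrapped in angle brackets, so it is a resource that can also appear as a subject elsewhere.

```python
# Hypothetical namespace for the proposed predicates (illustration only).
NS = "http://example.org/dwc-sketch/"


def iri(value):
    """Serialize an absolute IRI in N-Triples form."""
    return f"<{value}>"


def literal(value):
    """Serialize a plain string literal, escaping the characters N-Triples
    requires (backslash first, then quote, newline and carriage return)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def name_triples(concept, name_string, name_uri):
    """Emit the literal form and the URI form for one concept."""
    return [
        f"{iri(concept)} {iri(NS + 'hasScientificName')} {literal(name_string)} .",
        f"{iri(concept)} {iri(NS + 'hasScientificNameURI')} {iri(name_uri)} .",
    ]


lines = name_triples(
    "http://example.org/concept/Puma_concolor",  # hypothetical concept URI
    "Puma concolor (Linnaeus 1771)",
    "http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8",
)
for line in lines:
    print(line)
```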

Why do we need these name URIs at all?

Because some groups will need to relate name strings to each other, and name strings to concepts, etc.

Hard to do when you can't use a literal as a subject. :-)
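A minimal Python sketch of that point: the `triple` helper below enforces the RDF rule that a subject must be an IRI (blank nodes are left out for brevity), so a name string can only be related to another name once it has a URI. The `IRI`/`Literal` classes, the second name URI, and the `isRelatedNameStringOf` predicate are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def triple(subject, predicate, obj):
    """Build a triple, rejecting literal subjects and non-IRI predicates
    (blank nodes are omitted from this sketch)."""
    if not isinstance(subject, IRI):
        raise TypeError("subject must be an IRI, not a literal")
    if not isinstance(predicate, IRI):
        raise TypeError("predicate must be an IRI")
    return (subject, predicate, obj)


NAME_A = IRI("http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8")
NAME_B = IRI("http://example.org/name_strings/another-name")  # hypothetical
RELATED = IRI("http://example.org/dwc-sketch/isRelatedNameStringOf")  # hypothetical

ok = triple(NAME_A, RELATED, NAME_B)  # a name URI as subject is allowed

try:
    triple(Literal("Puma concolor (Linnaeus 1771)"), RELATED, NAME_B)
except TypeError as err:
    print(err)  # subject must be an IRI, not a literal
```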

Respectfully,

- Pete
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
About the GeoSpecies Knowledge Base
------------------------------------------------------------
