OK, Pete, I'm going to try to write the other half of the email that I promised.  I'm going to start by saying that some of what I'm talking about here has already been posted on the Darwin-SW (DSW) wiki page called RelationshipToExistingModels (http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels).  I've actually wanted to bring this up with you for about six months, but have never taken the time to put it into an email.  Since this is going out to the list, I'll include some comments about background that you already know (assuming a broader audience).  I'd welcome your comments and feedback on what I've said and whether you think it is accurate or not.

One of the things that I think makes it difficult for people to follow what you are proposing on taxonconcept.org is that the structure of your RDF is complex.  I'm not saying that is a bad thing, I'm just saying that if you combine that with people's general unfamiliarity with RDF and the difficulty that some people have with visualizing RDF in XML format, it just isn't accessible to most people.  Even that difficulty in itself isn't necessarily a bad thing because RDF isn't really intended to be understood primarily by people - it's designed to be understood by computers, so many people on this list don't really need to care about it.  Nevertheless, in order to have a discussion about a proposal, one must be able to visualize it.  I am a very right-brained person and must have maps, diagrams, and graphs to conceptualize things.  So the first thing I did was to go to http://www.w3.org/RDF/Validator/, put in the URI of your example, and tell the parser to give me the graph only.  In the RelationshipToExistingModels wiki page, I looked at http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf which provided information about an occurrence.  One of the most obvious features of the resulting graph is that it is complex.  I used the word "reticulated" because there are many cross-connections between the nodes.  It takes a bit of time and a large-screen monitor to sort it all out, but if one ignores the literals and concentrates only on the nodes that are labeled with URIs, the structure is actually very similar to the structure we used in DSW.  So that tells me that we (Cam and I) are seeing the biodiversity informatics world in a very similar manner to the way you have been seeing it.  One obvious difference is in the class names used to type resources - we mostly used DwC classes and you used ones that you defined in your ontology, but that is a cosmetic difference if one assumes that the classes in DSW and at taxonconcept.org represent the same thing.

If one considers the basic RDF graph for DSW shown on the DSW home page (http://code.google.com/p/darwin-sw/), with the exception of dsw:Token which can be evidence for several things (and foaf:Person which was kind of thrown in at the end), the basic structure of DSW is linear.  There is a connection between each class wherever there is a potential need for a one-to-many join between class instances (see triangles [="crow's feet"] on the Fully Normalized Model on the RelationshipToExistingModels wiki page; DSW is like this diagram except there is no Time class, and TaxonNameUsage in the diagram is the Taxon class in DSW).  The connections are made by object properties that we defined in the DSW ontology.  A major difference between the DSW structure and the structure that can be seen in the RDF graph of the taxonconcept.org occurrence example is that in the latter there is not just one connection that can be used to traverse the classes.  For example, in DSW, to obtain information about Occurrences that are associated with an Identification, one would have to "surf" from a dwc:Identification instance to a dsw:Individual instance using the dsw:identifies property, then from the dsw:Individual instance to the dwc:Occurrence instance using the dsw:hasOccurrence property.  Similarly, in the taxonconcept.org example, one could go from the txn:Identification instance to the txn:SpeciesIndividual instance using the txn:identificationOfIndividual property, then from the txn:SpeciesIndividual instance to the txn:Occurrence instance using the txn:individualHasOccurrence property.  However, taxonconcept.org also allows one to make the connection from the txn:Identification instance directly to the txn:Occurrence instance using the txn:identificationHasOccurrence property, skipping the txn:SpeciesIndividual altogether.  Similar "shortcuts" connect other classes in taxonconcept.org whose analogues in DSW are separated by intervening classes, e.g. from txn:Occurrence to txn:SpeciesConcept (roughly analogous to dwc:Taxon) by txn:occurrenceHasSpeciesConcept, from dwc_area:Area (roughly analogous to dcterms:Location) to txn:SpeciesConcept by txn:areaHasObservedSpeciesConcept, etc.  I don't think the taxonconcept.org ontology has every possible connection between every class, but in theory one could do that if one wanted.  There would be even more connections and shortcuts if the taxonconcept.org ontology included a class that is analogous to dwc:Event (it's "flattened out" of the taxonconcept.org model, see http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001710.html and the posts that precede and follow it for more on the topic of "flattening" databases).  It is the presence of these "shortcut" properties in the taxonconcept.org ontology that makes its RDF graph so complex and "reticulated" and the absence of them that makes a DSW RDF graph much simpler.

Which approach is correct?  As the old adage says, "anybody can say anything about anything".  There isn't really anything intrinsically "wrong" with either including or excluding "shortcut" properties.  I am guessing that the reason why your ontology has them and DSW doesn't may be a reflection of the reasons why the ontologies were created.  From what you've said in the past, I gather that you would like to facilitate assembling masses of metadata in triple stores and running SPARQL queries on them to discover interesting things.  Cam and I want to make it possible to apply GUIDs to very diverse kinds of things and be able to track what happens to them if they end up in different places.  These are not necessarily mutually exclusive desires, but they do represent a difference in outlook.  I know virtually nothing about SPARQL, but at the risk of exposing myself as an ignoramus, I'm going to mention SPARQL queries in this post anyway.  I assume from your examples that it is relatively easy to run a query to discover resources that are one object property-step away from a subject resource.  I would assume that it would be much more difficult to run such queries on things that are five object property-steps apart.  For example, if one wanted to know all of the instances of txn:SpeciesConcept that occurred at a dwc_area:Area, all one would have to do is to search for all of the objects of the txn:areaHasObservedSpeciesConcept properties for instances of that particular dwc_area:Area in a triple store.  In DSW, one would need to look for all of the dwc:Events that happened at that dcterms:Location, then find all of the dwc:Occurrences that happened at those dwc:Events, then find out which dsw:Individuals were represented in those dwc:Occurrences, then look up all of the dwc:Identifications for those dsw:Individuals, and finally make a non-redundant list of dwc:Taxon instances that were represented in those dwc:Identifications.  
I don't know if there is a simple SPARQL query for that, but I doubt it.  So from the standpoint of querying, the "shortcut" property method that taxonconcept.org uses is much better.
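
To make the contrast concrete, here is a minimal sketch in Python of the two lookups described above, using a set of (subject, property, object) tuples as a stand-in for a triple store.  It is only an illustration under stated assumptions: dsw:identifies, dsw:hasOccurrence and txn:areaHasObservedSpeciesConcept are the properties named in this message, but every property with the "ex:" prefix and every instance name is hypothetical and made up for the example.

```python
# A toy triple store: each triple is (subject, property, object).
# Properties prefixed "ex:" are HYPOTHETICAL placeholders, not real DSW terms.
TRIPLES = {
    # Shortcut style: one property links the area straight to the species concept.
    ("area:1", "txn:areaHasObservedSpeciesConcept", "concept:oak"),
    ("area:1", "txn:areaHasObservedSpeciesConcept", "concept:maple"),
    # Linear style: Location <- Event <- Occurrence <- Individual <- Identification -> Taxon
    ("event:1", "ex:eventAtLocation", "loc:1"),
    ("event:2", "ex:eventAtLocation", "loc:1"),
    ("occ:1", "ex:occurrenceAtEvent", "event:1"),
    ("occ:2", "ex:occurrenceAtEvent", "event:2"),
    ("ind:1", "dsw:hasOccurrence", "occ:1"),
    ("ind:2", "dsw:hasOccurrence", "occ:2"),
    ("id:1", "dsw:identifies", "ind:1"),
    ("id:2", "dsw:identifies", "ind:2"),
    ("id:3", "dsw:identifies", "ind:2"),
    ("id:1", "ex:identificationToTaxon", "taxon:oak"),
    ("id:2", "ex:identificationToTaxon", "taxon:oak"),
    ("id:3", "ex:identificationToTaxon", "taxon:maple"),
}


def step(triples, nodes, prop, inverse=False):
    """Follow one property from a set of nodes.

    inverse=False walks subject -> object; inverse=True walks object -> subject.
    """
    if inverse:
        return {s for (s, p, o) in triples if p == prop and o in nodes}
    return {o for (s, p, o) in triples if p == prop and s in nodes}


def follow(triples, start, path):
    """Apply a list of (property, inverse) steps; sets keep the result non-redundant."""
    nodes = {start}
    for prop, inverse in path:
        nodes = step(triples, nodes, prop, inverse)
    return nodes


# One step with the shortcut property.
SHORTCUT_PATH = [("txn:areaHasObservedSpeciesConcept", False)]

# Five steps without it: Location -> Events -> Occurrences -> Individuals
# -> Identifications -> Taxa.
LINEAR_PATH = [
    ("ex:eventAtLocation", True),
    ("ex:occurrenceAtEvent", True),
    ("dsw:hasOccurrence", True),
    ("dsw:identifies", True),
    ("ex:identificationToTaxon", False),
]

species_by_shortcut = follow(TRIPLES, "area:1", SHORTCUT_PATH)
taxa_by_traversal = follow(TRIPLES, "loc:1", LINEAR_PATH)
```

Both calls return a set of two resources here; the difference is that the second one has to know the whole chain of properties, which is the burden that shifts from the metadata provider to whoever writes the query or the software.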

However, there is an important problem with the "shortcut" strategy.  In order to be able to make a simple query that makes use of single-step properties, one must know what kind of query a user will want to make and then make sure that there is a shortcut property that connects the classes of interest.  This requires either a crystal ball to be able to predict what people are interested in asking, or just making up properties for every possible shortcut.  If I'm doing the math right, with the 6 classes that are included in the existing DwC plus IndividualOrganism (or SpeciesIndividual if you prefer), there would be 15 connections among the classes which would make 30 object properties required to connect them if one wanted every connection to have a pair of inverse properties to enable going in either direction.  If one included the Token class, that would make 21 pairs.  The burden would then fall on the metadata provider to provide values for all of those properties.  And although 21 connections doesn't sound that bad, there could actually be a lot more actual property assignments than that because there isn't any restriction that says that there will only be one value for a property.  If an organism has two Identifications, then every xxxHasIdentification kind of property is going to have two values.  If there are many Identifications (e.g. http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf) there would be many values.  Essentially, the metadata provider is left with the job of pre-running every kind of query that a user could possibly want to do. 
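
The arithmetic above can be checked with a few lines of Python.  This is just a sketch of the counting argument: with n classes there are n(n-1)/2 unordered pairs of classes, and twice that many object properties if every connection gets a pair of inverse properties.  The function name is made up for the illustration.

```python
from math import comb


def shortcut_property_counts(n_classes):
    """Return (connections, object_properties) needed to link every pair of classes.

    connections: number of unordered class pairs, n * (n - 1) / 2
    object_properties: two per connection (a property and its inverse)
    """
    connections = comb(n_classes, 2)
    return connections, 2 * connections


six_classes = shortcut_property_counts(6)    # the 6 classes counted above
with_token = shortcut_property_counts(7)     # the same classes plus Token
```

Note that this only counts properties, not property values; as the paragraph above says, the number of actual assignments grows further whenever a property has more than one value.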

An alternative to this would be to simplify the model by "flattening out" certain classes (making the model less "normalized").  You did that with Event.  In my Biodiversity Informatics article (https://journals.ku.edu/index.php/jbi/article/view/3664) I did it for Event and Location.  Historically, museum people have "flattened out" IndividualOrganism and Token.  People normalize out Identification all of the time.  As Rich Pyle pointed out in the post I cited above, people "flatten" more complex models into simpler models all the time because it is convenient and it makes their databases simpler and easier to manage.  But if our desire is to come up with a general model that will work for museum people and their old specimen labels, bird and whale observation people, DNA barcoding people, people who document live organisms with images and sound, bioblitzers, etc., it has to include every class that participants can reasonably need in order to facilitate the necessary "one-to-many database joins" or whatever you want to call that.  In October, I posted a message to the tdwg-content list where I warned against setting precedents for using "wrong" RDF (http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001663.html).  In that post, my point was that people should not apply properties to instances of the wrong class.  That is exactly what happens when people simplify models by eliminating classes that have only one-to-one relationships with other classes in their database.  So if I were to restate my complaint, I'd frame it this way: a "wrong" RDF model is one that leaves out classes that potential users may need to express the complexities of their data.  This principle was an underlying assumption when we constructed DSW, and to know what classes people needed, we looked at the discussion that took place on the tdwg-content list in Oct/Nov.  
As I point out on the RelationshipToExistingModels wiki page, we could have included a Time class, but as a practical matter, nobody has expressed a need for it (at least not yet).  So given this principle, reducing the number of shortcut properties by getting rid of classes is simply not an option for any model that hopes to include all of the kinds of metadata that one would like to describe within a community. 

So the bottom line, in my opinion, is that in a model as complex as what we need in the biodiversity informatics community (i.e. a "fully normalized" model) we simply cannot hope to create and assign object properties that connect every class.  Hence in DSW we only created the minimum number of object properties needed to express what we considered the fundamental relationships among the classes.  What this means is that it simply may not be possible to make simple SPARQL queries on the data to find out what people want to know.  Rather, the burden will fall on software developers to create software that can traverse the network of connections among the classes and extract the information that they need to answer the questions they want people to be able to pose through use of their software.  Nailing down what the community consensus is on the classes and their connections is the first step to being able to create that kind of software. 

This email is already too long, but I think that I need to make one more point about the impossibility of expecting a metadata provider to pre-populate all of the necessary "shortcut" properties that one would want to use in simple SPARQL queries.  If there is only one person at one institution creating all of the metadata, then it is easy to make sure that all of the subject resources are assigned values for the appropriate shortcut object properties.  (I think this is the case in the example SPARQL queries that you have put out on the list, i.e. all of the metadata was provided by you - I didn't go back and look at the examples again, so I could be wrong about that.)  However, in the situation that Cam and I are interested in, the various connected resources may be at different institutions with metadata submitted "to the cloud" by different people.  For example, the tree http://bioimages.vanderbilt.edu/uncg/84 is in the University of North Carolina at Greensboro arboretum.  An image of that tree, http://bioimages.vanderbilt.edu/kirchoff/em1968 , is in the Bioimages image collection.  A specimen from that tree, http://bioimages.vanderbilt.edu/specimen/ncu592805 , is in the University of North Carolina herbarium in Chapel Hill.  Although at the moment these URIs are all under the http://bioimages.vanderbilt.edu subdomain, I would hope that at some point in the future, there would be permanent GUIDs for all of them (except the image) under the management of someone other than me.  Hopefully there will be a GUID for the Taxon assigned to an Identification of the tree which would eventually be managed at some community-maintained place like the Global Name Use Bank (GNUB).  So let's say I used the shortcut model and assigned a "dsw:occurrenceHasTaxon" property (which doesn't actually exist in DSW at present) to the Occurrence documented by the image in my collection (URI=http://bioimages.vanderbilt.edu/kirchoff/em1968#occ).  
Now let's say that a Quercus expert looks at the UNC specimen and decides that it is some different species (i.e. creates a different dwc:Identification).  There now should be an additional value of the "dsw:occurrenceHasTaxon" property of the Occurrence metadata that I'm managing, but I'm not going to know that because the Identification has been made by somebody else, not me.  [I should note that the BiSciCol project is hoping to make it possible for people to find out this kind of thing, see http://biscicol.blogspot.com/ .]  Is it my responsibility to continually trawl the cloud and always be updating all of the many shortcut properties that would be possible to assign to the resources whose metadata I'm managing?  If I don't do that, then SPARQL queries that people would run on "dsw:occurrenceHasTaxon" properties would miss information that had been added to the cloud by others - it would only find out things that I already knew when I created my metadata record for the resource I control.  It seems to me that a major point of Linked Open Data is that individuals add to the cloud by contributing their little bit to it and that Wonderful Things happen when people find out stuff by connecting those bits with other bits contributed by other people at another place in the cloud.  If we create a system that only works when people are expected to know in advance what those Wonderful Things are, then the whole exercise becomes pointless. 
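
The scenario above can be sketched in Python as two independently published sets of triples that are merged "in the cloud".  This is a toy illustration only: the tree and Occurrence URIs are the ones given above, dsw:occurrenceHasTaxon is the made-up shortcut property from the example (it does not exist in DSW), and the "ex:" identifiers for the Identifications, the Taxa, and the Identification-to-Taxon property are hypothetical.

```python
TREE = "http://bioimages.vanderbilt.edu/uncg/84"
OCC = "http://bioimages.vanderbilt.edu/kirchoff/em1968#occ"

# Triples published by the image provider, including a pre-computed shortcut.
my_triples = {
    (TREE, "dsw:hasOccurrence", OCC),
    ("ex:id-original", "dsw:identifies", TREE),
    ("ex:id-original", "ex:identificationToTaxon", "ex:taxon-A"),  # hypothetical
    (OCC, "dsw:occurrenceHasTaxon", "ex:taxon-A"),                 # shortcut
}

# Triples published later by someone else: a new Identification of the same tree.
expert_triples = {
    ("ex:id-expert", "dsw:identifies", TREE),
    ("ex:id-expert", "ex:identificationToTaxon", "ex:taxon-B"),    # hypothetical
}


def taxa_by_shortcut(triples, occurrence):
    """Read only the pre-populated shortcut values on the Occurrence."""
    return {o for (s, p, o) in triples
            if s == occurrence and p == "dsw:occurrenceHasTaxon"}


def taxa_by_traversal(triples, occurrence):
    """Walk Occurrence -> Individual -> Identification -> Taxon."""
    individuals = {s for (s, p, o) in triples
                   if p == "dsw:hasOccurrence" and o == occurrence}
    identifications = {s for (s, p, o) in triples
                       if p == "dsw:identifies" and o in individuals}
    return {o for (s, p, o) in triples
            if p == "ex:identificationToTaxon" and s in identifications}


cloud = my_triples | expert_triples
stale = taxa_by_shortcut(cloud, OCC)
current = taxa_by_traversal(cloud, OCC)
```

In this sketch the shortcut lookup still reports only the Taxon known when the original record was written, while the traversal also picks up the expert's Identification, because it relies only on triples that each contributor can assert about their own resources.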

Anyway, I hope that this explains to some extent one of the reasons why Cam and I created DSW rather than just jumping in and using the taxonconcept.org ontology.  We wanted something considerably simpler.

I was going to comment/ask about at least one more thing about the ontology at taxonconcept.org, but this email is already way too long, so I'll take that up in a subsequent email.

Steve


Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.

I was thinking that it would be best to create a separate class that can be used for populations of a species.

This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities.

http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar

See http://lod.taxonconcept.org/ses/v6n7p.html HTML
       http://lod.taxonconcept.org/ses/v6n7p.rdf  RDF
       http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%2Fses%2Fv6n7p%23Species Knowledge Base View (bit.ly/gMFqR1)
 
The model mints URIs for the following related entities. See the RDF or KB View.

http://lod.taxonconcept.org/ses/mCcSp#Image      - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy   - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
    
    
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.


This tag is used for an individual organism that is an instance of the species concept in the species concept RDF.
This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.


  <txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
    <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
    <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
    <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
    <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
    <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
    <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
  </txn:SpeciesIndividualTag>

Add a tag for a species population to the species concept RDF.
This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.

  <txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
    <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
    <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
    <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
    <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesPopulationTag</dcterms:description>
    <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
    <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
  </txn:SpeciesPopulationTag>


This is the RDF for a population; it has as one of its parts an individual organism.
It is typed to indicate that it refers to a population of Cougars.

  <owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
    <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
    <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7p</skos:prefLabel>
    <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individual"/>
    <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
  </owl:Class>
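
As a rough illustration of how a consumer might read the population fragment above, here is a sketch using Python's standard xml.etree.ElementTree.  It assumes the fragment is wrapped in an rdf:RDF element and that the prefixes map to the usual namespace URIs for RDF, OWL, SKOS, Dublin Core terms and POWDER-S; the fragments in this message do not declare their namespaces, so those mappings are assumptions, and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

# ASSUMED namespace URIs; the fragment in the message does not declare them.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "wdrs": "http://www.w3.org/2007/05/powder-s#",
}

FRAGMENT = """
  <owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
    <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
    <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individual"/>
    <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
  </owl:Class>
"""


def wrap(fragment):
    """Wrap a bare fragment in rdf:RDF with namespace declarations so it parses."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    return f"<rdf:RDF {decls}>{fragment}</rdf:RDF>"


def describe_population(fragment):
    """Return the population URI, its rdf:type values, and its dcterms:hasPart values."""
    root = ET.fromstring(wrap(fragment))
    node = root.find("owl:Class", NS)
    rdf = "{%s}" % NS["rdf"]  # ElementTree spells namespaced attributes as {uri}name
    return {
        "about": node.get(rdf + "about"),
        "types": [e.get(rdf + "resource") for e in node.findall("rdf:type", NS)],
        "parts": [e.get(rdf + "resource") for e in node.findall("dcterms:hasPart", NS)],
    }


population = describe_population(FRAGMENT)
```

This reads the XML serialization directly rather than interpreting it as RDF, so it would only work for fragments laid out like the one above; a real RDF parser would be the safer choice for anything beyond a quick look.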

Respectfully,

- Pete

-------------------------------------------------------------------------------------

Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept  &  GeoSpecies Knowledge Bases
A Semantic Web, Linked Open Data  Project

---------------------------------------------------------------------------------------




-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu