Pete,
A few things are being conflated here. Teasing them out:
1. My read of the sentence
"As examples, see the OpenLink Data Explorer [102] and offer it Quercus alba or the SPARQL query [103] in the GeoSpecies project [104] based on a small purpose-built ontology [105] of mosquito-borne human pathogens."
is that it's the SPARQL query that's based on a small purpose-built ontology of mosquito-borne human pathogens, not the GeoSpecies project. I think it's appropriate that the only example of linked biodiversity data given by the report comes from GeoSpecies/taxonconcept.org, since you've gone farther in this space than anyone else.
2. My RDF representation of the TDWG bioblitz data is primarily an experiment in representing Darwin Core on the semantic web. Amongst those thinking about the right way to do this, I probably advocate the least change from the current Darwin Core: one or two new classes, and possibly some "hasX" properties, where X is a class. I can see some utility for range constraints on these classes, but would avoid domain constraints almost entirely. Others on tdwg-content are advocating a different style. That is all to say that http://www.cs.umbc.edu/~jsachs/occurrences/TechnoBioblitzOccurrences.rdf may not be the best example of everything wrong with TDWG, if only because many in TDWG would not endorse it. That said, I'll address some of your points:
i. For the most part, there is no overlap between Darwin Core and "commonly used Linked Data vocabularies". DwC itself encourages use of Dublin Core where appropriate. The exceptions are for dateTimes and locations. I don't know why these exceptions were made (though I guess the reasons are in the archives somewhere). In any event, I rejected the somewhat baroque DwC construction for location, and opted to use the geo vocabulary. You're probably right that it makes sense to use dcterms for timestamps as well.
ii. The dataset works pretty well to query for instances of a particular species. It's not hard to query for people either. It would, I agree, be easier if people's names were more standardized, and assigning URIs of the sort you created (e.g. http://lod.taxonconcept.org/people/tdwg2010bioblitz#Donald_Hobern) is one way to do that. Your approach is in harmony with the recommendation of the GBIF report to "Promote the widespread adoption of URI-based standard values for key Darwin Core attribute values" (Recommendation 3.1.j).
iii. The current version of the data uses taxonconceptIDs from taxonconcept.org for 411 of the records. It remains (I think) non-trivial to assign taxonconceptIDs appropriately to all occurrence records. In response to some of the anomalies you pointed out earlier, I also made a pass at normalizing the transparent ETHAN IDs that the dataset uses.
iv. Regarding identifying which identifications are preferred, there are a number of ways forward. What would you suggest?
3. Broadly speaking, I don't see anything objectionable in the report, although I think the recommendations are heavy on building ontologies, and light on suggesting paths to linked data representations of instance data.
Joel.
On Fri, 11 Feb 2011, Peter DeVries wrote:
Sure, I will need to be brief.
The KOS document is still largely dismissive of Linked Open Data.
If you look at the current Darwin Core as represented by the TDWG BioBlitz Occurrence Data Set (http://www.cs.umbc.edu/~jsachs/occurrences/TechnoBioblitzOccurrences.rdf):
a) it uses its own date vocabulary and formatting rather than dc:date;
b) I don't think the current version of the vocabulary resolves correctly according to LOD standards;
c) other than *geo*, which TDWG does not seem to agree with, how much of it uses any other commonly used LOD vocabulary?
How well does this data set work to query for occurrences of a given species?
Or identifications or observations by a particular person?
Was there any thought to identifying which of the various identifications is the preferred one for mapping etc?
These are all issues that become apparent when you start marking up records and attempting queries.
I modified this somewhat so that at least some of the occurrences are tied not only to a particular non-normalized name but to a species concept.
I also started to normalize the various text strings for people to a standard URI. The data itself has the same person identified with several name variations.
It is here: http://lod.taxonconcept.org/tdwg2010bioblitz/TechnoBioblitzOccurrences_dates...
As I showed earlier, this version has the same person linked via the same identifier and allows browsing and queries based on semantically informative identifiers.
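A minimal sketch of this kind of name-to-URI normalization, in Python. The lookup table and the key rules (strip accents and punctuation, lowercase, collapse whitespace) are illustrative assumptions, not the procedure actually used on the TaxonConcept copy; only the Donald_Hobern URI comes from this thread.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical lookup table: normalized name key -> person URI.
# Only the Donald_Hobern URI appears in the thread; the key format is an assumption.
PERSON_URIS = {
    "donald hobern": "http://lod.taxonconcept.org/people/tdwg2010bioblitz#Donald_Hobern",
}

def name_key(raw: str) -> str:
    """Reduce a free-text name to a comparison key: strip accents,
    replace punctuation with spaces, lowercase, collapse whitespace."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.lower().split())

def person_uri(raw: str) -> Optional[str]:
    """Return the URI for a recorded name, or None if it is not mapped yet."""
    return PERSON_URIS.get(name_key(raw))
```

Spacing and case variants such as "  donald  HOBERN " resolve to the same URI, but initials such as "D. Hobern" still return None and would need a hand-curated alias entry.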
To query the original data for occurrences of a given species you would need to know all the various names that people entered for that taxon.
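To make the contrast concrete, here is a hedged Python sketch over hypothetical records: the record IDs, name strings, and the example.org concept URI are placeholders, not values from the bioblitz file. A query on one verbatim name finds a single record, while a query on the concept URI finds every record that has been assigned that concept.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder concept URI (hypothetical, not a real taxonconcept.org ID).
WHITE_OAK = "http://example.org/concept/quercus-alba"

@dataclass
class Occurrence:
    occurrence_id: str
    verbatim_name: str                # name string as the observer entered it
    concept_id: Optional[str] = None  # species-concept URI, if one has been assigned

# Hypothetical records: one taxon entered under several name strings.
RECORDS = [
    Occurrence("occ1", "Quercus alba", WHITE_OAK),
    Occurrence("occ2", "quercus alba", WHITE_OAK),
    Occurrence("occ3", "White Oak", WHITE_OAK),
    Occurrence("occ4", "Q. alba"),  # not yet linked to a concept
]

def by_name(records: List[Occurrence], name: str) -> List[Occurrence]:
    """Exact match on the verbatim string: misses every spelling variant."""
    return [r for r in records if r.verbatim_name == name]

def by_concept(records: List[Occurrence], concept_id: str) -> List[Occurrence]:
    """Match on the concept URI: finds all records linked to that species."""
    return [r for r in records if r.concept_id == concept_id]

print([r.occurrence_id for r in by_name(RECORDS, "Quercus alba")])  # ['occ1']
print([r.occurrence_id for r in by_concept(RECORDS, WHITE_OAK)])    # ['occ1', 'occ2', 'occ3']
```

Note that occ4, which has no concept assigned, is missed by both queries unless you already know that exact string, which is why linking every record to a concept matters.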
Here is an example of an improved record: http://bit.ly/hy4HFi (people and species concept unambiguously defined with URIs and linked to related records).
Here is a poorly linked record: http://bit.ly/fSReZS (the same people and the same species labeled with a number of different string combinations, etc.).
It is not clear who recorded what, what other things they recorded, or which things are the same species and which are different species.
In summary, to get these to work in a way that others expect I had to make them more like the TaxonConcept.org records.
I have been advocating for some of these differences, like a URI for a species concept and adopting well-understood external vocabularies, on this and other lists since 2006.
Considering how many examples and discussions I have been involved in, it seems a bit strange that the authors of the KOS paper characterize my efforts as
"in the GeoSpecies project [104] based on a small purpose-built ontology [105] of mosquito-borne human pathogens."
Respectfully,
- Pete
On Fri, Feb 11, 2011 at 4:34 PM, Hilmar Lapp <hlapp@nescent.org> wrote:
Pete:
On Feb 11, 2011, at 5:24 PM, Peter DeVries wrote:
Is there some reason why there is so much "push" towards specialized, near-proprietary solutions like LSIDs and LOD-unfriendly vocabularies?
Would you mind elaborating on what you mean by this?
-hilmar
--
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
--
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies Knowledge Base http://lod.geospecies.org/
About the GeoSpecies Knowledge Base http://about.geospecies.org/