[tdwg-content] More Strange Monkey Business-like things in GBIF KOS Document

joel sachs jsachs at csee.umbc.edu
Mon Feb 14 18:05:05 CET 2011


Pete,

A few things are being conflated here. Teasing them out:

1. My read of the sentence

"As examples, see the OpenLink Data Explorer [102] and
offer it Quercus alba or the SPARQL query [103] in the GeoSpecies project 
[104] based on a
small purpose-built ontology [105] of mosquito-borne human pathogens."

is that it's the SPARQL query that's based on a small purpose-built
ontology of mosquito-borne human pathogens, not the GeoSpecies project. I
think it's appropriate that the only example of linked biodiversity data
given by the report comes from GeoSpecies/taxonconcept.org, since you've
gone farther in this space than anyone else.


2. My RDF representation of the TDWG bioblitz data is primarily an 
experiment in representing Darwin Core on the semantic web. Amongst those 
thinking about the right way to do this, I probably advocate the least 
amount of change from the current Darwin Core: one or two new classes, 
and possibly some "hasX" properties, where X is a class. I can see some 
utility for range constraints on these classes, but would avoid domain 
constraints almost entirely. Others on tdwg-content are advocating a 
different style. That is all to say that 
http://www.cs.umbc.edu/~jsachs/occurrences/TechnoBioblitzOccurrences.rdf
may not be the best example of everything wrong with TDWG, only because 
many in TDWG would not endorse it. That said, I'll address some of your 
points:

i. For the most part, there is no overlap between Darwin Core and "commonly 
used Linked Data vocabularies". DwC itself encourages use of Dublin Core 
where appropriate. The exceptions are for dateTimes and locations. I don't 
know why these exceptions were made (though I guess the reasons are in 
the archives somewhere). In any event, I rejected the somewhat baroque DwC 
construction for location, and opted to use the geo vocabulary. You're 
probably right that it makes sense to use dcterms for timestamps as 
well.

ii. The dataset works pretty well to query for instances of a particular 
species. It's not hard to query for people either. It would, I agree, be easier if 
people's names were more standardized, and assigning URIs of the sort you 
created (e.g. 
http://lod.taxonconcept.org/people/tdwg2010bioblitz#Donald_Hobern) is one 
way to do that. Your approach is in harmony with the 
recommendation of the GBIF report to "Promote the widespread adoption of 
URI-based standard values for key Darwin Core attribute values". 
(Recommendation 3.1.j)

iii. The current version of the 
data uses taxonconceptIDs from taxonconcept.org for 411 of the records. 
It remains (I think) non-trivial to assign taxonconceptIDs appropriately 
to all occurrence records. 
In response to some of the anomalies you pointed out earlier, I also 
made a pass at normalizing the transparent ETHAN IDs that the dataset 
uses.

iv. With regard to identifying which identifications are preferred, 
there are a number of ways forward. What would you suggest?

3. Broadly speaking, I don't see anything objectionable in the report, 
although I think the recommendations are heavy on building ontologies, and 
light on suggesting paths to linked data representations of instance data.
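
As a rough illustration of points 2.i to 2.iii, here is a minimal sketch 
in Python (standard library only) of occurrence records held as triples, 
with the geo vocabulary for location, dcterms for the date, and URIs for 
people and taxon concepts. The geo, dcterms and Darwin Core namespaces 
and the Donald_Hobern URI are real; the example.org URIs, the name 
variants, the coordinates, the date, and the choice to put a URI in 
recordedBy are all assumptions made up for the example, not what either 
version of the dataset actually contains.

```python
# Vocabularies named in the message: geo for location, dcterms for dates,
# plus the Darwin Core terms namespace.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
DCTERMS = "http://purl.org/dc/terms/"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Person URI of the sort quoted in point 2.ii.
HOBERN = "http://lod.taxonconcept.org/people/tdwg2010bioblitz#Donald_Hobern"

# Hypothetical URIs, invented for illustration only.
TAXON = "http://example.org/taxonconcept/quercus-alba"
OCC1 = "http://example.org/occurrence/1"
OCC2 = "http://example.org/occurrence/2"

# Hypothetical name variants, all mapped to one person URI.
PERSON_URIS = {
    "donald hobern": HOBERN,
    "hobern, donald": HOBERN,
    "d. hobern": HOBERN,
}


def person_uri(name):
    """Return the URI for a recorded name string, or None if unknown."""
    key = " ".join(name.lower().split())  # lowercase, collapse whitespace
    return PERSON_URIS.get(key)


# Two occurrence records as (subject, predicate, object) triples.
# Coordinates and date are made-up values.
TRIPLES = [
    (OCC1, DWC + "taxonConceptID", TAXON),
    (OCC1, DWC + "recordedBy", person_uri("Donald  Hobern")),
    (OCC1, GEO + "lat", "41.5"),
    (OCC1, GEO + "long", "-70.6"),
    (OCC1, DCTERMS + "date", "2010-09-26"),
    (OCC2, DWC + "taxonConceptID", TAXON),
    (OCC2, DWC + "recordedBy", person_uri("Hobern, Donald")),
]


def subjects(triples, predicate, obj):
    """Return the sorted subjects that have the given predicate and object."""
    return sorted({s for s, p, o in triples if p == predicate and o == obj})


# Occurrences of one taxon concept, and occurrences recorded by one person,
# without needing to know every name string that was entered.
by_taxon = subjects(TRIPLES, DWC + "taxonConceptID", TAXON)
by_person = subjects(TRIPLES, DWC + "recordedBy", HOBERN)
```

Once names and taxa are URIs, both lookups reduce to an exact match on 
the object, which is the point Pete makes about the original strings.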

Joel.




On Fri, 11 Feb 2011, Peter DeVries wrote:

> Sure, I will need to be brief.
>
> 1) The KOS document is still largely dismissive of Linked Open Data
>
> 2) If you look at the current Darwin Core as represented by the TDWG
> BioBlitz Occurrence Data Set.
>    http://www.cs.umbc.edu/~jsachs/occurrences/TechnoBioblitzOccurrences.rdf
> .
>
> a) uses its own date vocab and formatting rather than dc:date
> b) don't think the current version of the vocab resolves correctly following
> LOD standards
> c) other than the *geo* vocabulary, which TDWG does not seem to agree with,
> how much of this is using any other commonly used LOD vocabulary?
>
> How well does this data set work to query for occurrences of a given
> species?
>
> Or identifications or observations by a particular person?
>
> Was there any thought to identifying which of the various identifications is
> the preferred one for mapping etc?
>
> These are all issues that become apparent when you start marking up records
> and attempting queries.
>
> I modified this somewhat so that at least some of the occurrences are tied
> not only to a particular non-normalized name but to a species concept.
>
> I also started to normalize the various text strings for people to a
> standard URI. The data itself has the same person identified with several
> name variations.
>
> It is here:
> http://lod.taxonconcept.org/tdwg2010bioblitz/TechnoBioblitzOccurrences_dates.rdf
>
> As I showed earlier, this version has the same person linked via the same
> identifier and allows browsing and queries based on semantically informative
> identifiers.
>
> To query the original data for occurrences of a given species you would need
> to know all the various names that people entered for that taxon.
>
>
>        Here is an example of an improved record  http://bit.ly/hy4HFi (People
> and species concept unambiguously defined with URI's and linked to related
> records)
>
>
>        Here is a poorly linked record http://bit.ly/fSReZS (The same people
> and the same species labeled with a number of different string combinations
> etc. )
>
>         Not clear who recorded what things, what other things they recorded,
> or what things are the same species and what things are different species.
>
>
> In summary, to get these to work in a way that others expect I had to make
> them more like the TaxonConcept.org records.
>
> I have been advocating for some of these differences, like a URI for a species
> concept and adopting well understood external vocabularies, on this and other
> lists since 2006.
>
> Considering how many examples etc. and discussions I have been involved in
> it seems a bit strange that the authors of the KOS paper characterize my
> efforts as
>
> "in the GeoSpecies project [104] based on a small purpose-built ontology [105] of
> mosquito-borne human pathogens."
>
> Respectfully,
>
> - Pete
>
>
> On Fri, Feb 11, 2011 at 4:34 PM, Hilmar Lapp <hlapp at nescent.org> wrote:
>
>> Pete:
>>
>> On Feb 11, 2011, at 5:24 PM, Peter DeVries wrote:
>>
>> Is there some reason why there is so much "push" towards specialized near
>> proprietary solutions like LSID's and LOD unfriendly vocabularies?
>>
>>
>> Would you mind to elaborate what you mean by this?
>>
>> -hilmar
>>
>>  --
>> ===========================================================
>> : Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
>> ===========================================================
>>
>>
>>
>>
>
>
> -- 
> ---------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
> Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------
>

