[tdwg-content] What I learned at the TechnoBioBlitz

Mon Oct 11 15:14:07 CEST 2010

Tim,

Thanks - responses below ...

On Mon, 11 Oct 2010, Tim Robertson (GBIF) wrote:

> Hi Joel,
>
> Thanks for taking the time to summarise this.  A few comments inline:
>
> On Oct 11, 2010, at 1:46 PM, joel sachs wrote:
>
>> One of the goals of the recent bioblitz was to think about the suitability
>> and appropriateness of TDWG standards for citizen science. Robert Stevenson
>> has volunteered to take the lead on preparing a technobioblitz lessons
>> learned document, and though the scope of this document is not yet
>> determined, I think the audience will include bioblitz organizers,
>> software developers, and TDWG as a whole. I hope no one is shy about
>> sharing lessons they think they learned, or suggestions that they have. We
>> can use the bioblitz google group for this discussion, and copy in
>> tdwg-content when our discussion is standards-specific.
>> 
>> Here are some of my immediate observations:
>> 
>> 1. Darwin Core is almost exactly right for citizen science. However, there
>> is a desperate need for examples and templates of its use. To illustrate
>> this need: one of the developers spoke of the design choice between "a
>> simple csv file and a Darwin Core record". But a simple csv file is a
>> legitimate representation of Darwin Core! To be fair to the developer,
>> such a sentence might not have struck me as absurd a year ago, before
>> Remsen said "let's use DwC for the bioblitz".
>> 
>> We provided a couple of example DwC records (text and rdf) in the bioblitz
>> data profile [1]. I think the lessons learned document should include an
>> on-line catalog of cut-and-pasteable examples covering a variety of use
>> cases, together with a dead simple description of DwC, something like
>> "Darwin Core is a collection of terms, together with definitions."
>> 
>> Here are areas where we augmented or diverged from DwC in the bioblitz:
>> 
>> i. We added obs:observedBy [2], since there is no equivalent property in
>> DwC, and it's important in Citizen Science (though often not available).
>
> Is this not the intention of recordedBy?
>
> http://rs.tdwg.org/dwc/terms/#recordedBy
> A list (concatenated and separated) of names of people, groups, or 
> organizations responsible for recording the original Occurrence. The primary 
> collector or observer, especially one who applies a personal identifier 
> (recordNumber), should be listed first.

I think it's useful to preserve the distinction between the primary 
observer and the record creator, and this distinction is lost in a 
concatenated list.
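
A minimal sketch of that loss, in Python. The names and the " | " separator
are assumptions made up for illustration; only dwc:recordedBy and
obs:observedBy come from the discussion above.

```python
# Hypothetical record: the names and the " | " separator are illustrative
# assumptions, not values from the bioblitz data.
record = {
    "dwc:recordedBy": "Alice Example | Bob Example",
    "obs:observedBy": "Alice Example",
}


def split_recorded_by(value, sep="|"):
    """Split a concatenated recordedBy string into a list of names."""
    return [name.strip() for name in value.split(sep) if name.strip()]


names = split_recorded_by(record["dwc:recordedBy"])

# From the list alone, the primary observer can only be guessed from the
# "listed first" convention; nothing marks who created the record.
primary_by_convention = names[0]

# A separate property states the observer explicitly, with no guessing.
explicit_observer = record["obs:observedBy"]
```

With a separate property, a consumer does not have to rely on list order
being followed by every transcriber.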

>> ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and
>> longitude. The geo namespace is a well used and supported standard, and
>> records with geo coordinates are automatically mapped by several
>> applications.
>
> Keeping an inventory of applications somewhere might be worthwhile to help 
> promote or decide on this.

An inventory of consumers of DwC, geo, and other namespaces would be 
great.

>> Since everyone was using GPS to retrieve their coordinates,
>> we were able to assume WGS-84 as the datum.
>> 
>> If someone had used another Datum, say XYZ, we would have added columns to
>> the Fusion table so that they could have expressed their coordinates in
>> DwC, as, e.g.:
>> DwC:decimalLatitude=41.5
>> DwC:decimalLongitude=-70.7
>> DwC:geodeticDatum=XYZ
>> 
>> (I would argue that it should be kosher DwC to express the above as simply
>> XYZ:lat and XYZ:long. DwC already incorporates terms from other
>> namespaces, such as Dublin Core, so there is precedent for this.)
>> 
>> 2. DwC:scientificName might be more user friendly than taxonomy:binomial
>> and the other taxonomy machine tags EOL uses for flickr images.  If
>> DwC:scientificName isn't self-explanatory enough, a user can look it up,
>> and see that any scientific name is acceptable, at any taxonomic rank, or
>> not having any rank. And once we have a scientific name, higher ranks can
>> be inferred.
>> 
>> 3. Catalogue of Life was an important part of the workflow, but we
>> had some problems with it. Future bioblitzes might consider using
>> something like a CoL fork, as recently described by Rod Page [4].
>> 
>> 4. We didn't include "basisOfRecord" in the original data profile, and so
>> it wasn't a column in the Fusion Table [5]. But when a transcriber felt it
>> was necessary to include in order to capture data in a particular field
>> sheet, she just added the column to the table. This flexibility of schema
>> is important, and is in harmony with the semantic web.
>
> For citizen science, would it not make more sense to apply some easy 
> guideline to select one of:
> - HumanObservation
> - PreservedSpecimen
> - LivingSpecimen
> (http://code.google.com/p/darwincore/wiki/RecordLevelTerms)
>
> Basis of record is one of the fundamental fields to know when consuming 
> content, so I think any effort to capture that at source will be worthwhile 
> in the long run.

I agree. We did include an "Evidence" field in the paper field sheet, 
although we neglected to modify the Fusion table and the data 
documentation. My point is that it's nice to have an architecture that 
doesn't punish you too much for thinking of additional terms at the last minute.
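
As a rough sketch of what that flexibility can look like with plain CSV
(Python standard library only): the header is built from whatever columns
the rows actually use, so a late addition such as dwc:basisOfRecord does not
break the earlier rows. The species names and the to_csv helper are
illustrative assumptions, not part of the bioblitz workflow.

```python
import csv
import io

# Illustrative rows; the species names are made-up sample data.
rows = [
    {"dwc:scientificName": "Danaus plexippus",
     "geo:lat": "41.5", "geo:long": "-70.7"},
    # A later row introduces a column that was not in the original profile.
    {"dwc:scientificName": "Quercus alba",
     "geo:lat": "41.5", "geo:long": "-70.7",
     "dwc:basisOfRecord": "HumanObservation"},
]


def to_csv(rows):
    """Write rows as CSV text, using the union of all keys as the header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills in an empty cell for rows that lack a newer column.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
```

Rows written before the new column existed simply get an empty cell for it.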

Cheers -
Joel.

>> 5. There seemed to be enthusiasm for another field event at next year's
>> TDWG. This could be an opportunity to gather other types of data (e.g.,
>> character data) and thereby
>> i) expose meeting participants to another set of everyday problems from the
>> world of biodiversity workflows, and ii) try other TDWG technology on
>> for size, e.g., the observation exchange format, annotation framework, etc.
>> 
>> 
>> Happy Thanksgiving to all in Canada -
>> Joel.
>> ----
>> 
>> 
>> 1. 
>> http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
>> 2. Slightly bastardizing our old observation ontology -
>> http://spire.umbc.edu/ontologies/Observation.owl
>> 3. http://www.w3.org/2003/01/geo/
>> 4. 
>> http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
>> 5. http://tables.googlelabs.com/DataSource?dsrcid=248798
>> 
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>> 
>