[tdwg-content] What I learned at the TechnoBioBlitz

Mon Oct 11 14:00:23 CEST 2010

Hi Joel,

Thanks for taking the time to summarise this.  A few comments inline:

On Oct 11, 2010, at 1:46 PM, joel sachs wrote:

> One of the goals of the recent bioblitz was to think about the
> suitability and appropriateness of TDWG standards for citizen science.
> Robert Stevenson has volunteered to take the lead on preparing a
> technobioblitz lessons-learned document, and though the scope of this
> document is not yet determined, I think the audience will include
> bioblitz organizers, software developers, and TDWG as a whole. I hope
> no one is shy about sharing lessons they think they learned, or
> suggestions that they have. We can use the bioblitz google group for
> this discussion, and copy in tdwg-content when our discussion is
> standards-specific.
>
> Here are some of my immediate observations:
>
> 1. Darwin Core is almost exactly right for citizen science. However,
> there is a desperate need for examples and templates of its use. To
> illustrate this need: one of the developers spoke of the design choice
> between "a simple csv file and a Darwin Core record". But a simple csv
> file is a legitimate representation of Darwin Core! To be fair to the
> developer, such a sentence might not have struck me as absurd a year
> ago, before Remsen said "let's use DwC for the bioblitz".
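
To make the point about csv concrete, here is a minimal sketch of a
csv file whose column headers are simply Darwin Core term names. The
values, including the observer name and date, are invented for
illustration and are not taken from the bioblitz data.

```python
import csv
import io

# Column headers are Darwin Core term names.
# All values below are made up for illustration.
rows = [
    {
        "scientificName": "Danaus plexippus",
        "recordedBy": "A. Observer",
        "eventDate": "2010-09-26",
        "decimalLatitude": "41.5",
        "decimalLongitude": "-70.7",
    },
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

dwc_csv = buffer.getvalue()
print(dwc_csv)
```

Nothing beyond the header row is needed for this to count as a flat
Darwin Core record, which is the point Joel makes above.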
>
> We provided a couple of example DwC records (text and rdf) in the
> bioblitz data profile [1]. I think the lessons learned document should
> include an on-line catalog of cut-and-pasteable examples covering a
> variety of use cases, together with a dead simple description of DwC,
> something like "Darwin Core is a collection of terms, together with
> definitions."
>
> Here are areas where we augmented or diverged from DwC in the
> bioblitz:
>
> i. We added obs:observedBy [2], since there is no equivalent property
> in DwC, and it's important in Citizen Science (though often not
> available).

Is this not the intention of recordedBy?

http://rs.tdwg.org/dwc/terms/#recordedBy
A list (concatenated and separated) of names of people, groups, or  
organizations responsible for recording the original Occurrence. The  
primary collector or observer, especially one who applies a personal  
identifier (recordNumber), should be listed first.
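
If recordedBy is used this way, a consumer can recover the individual
observers by splitting the value. The following is only a sketch: the
definition above does not name a separator, so the "|" used here is an
assumption, and split_recorded_by is a hypothetical helper, not part
of any DwC tooling.

```python
def split_recorded_by(value, sep="|"):
    """Split a concatenated recordedBy value into individual names.

    The separator is an assumption; the term definition only says the
    list is "concatenated and separated".
    """
    return [name.strip() for name in value.split(sep) if name.strip()]


# Invented example value; the first name is treated as the primary observer.
names = split_recorded_by("A. Observer | B. Helper")
primary_observer = names[0] if names else None
```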

> ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude
> and longitude. The geo namespace is a well-used and supported
> standard, and records with geo coordinates are automatically mapped by
> several applications.

Keeping an inventory of applications somewhere might be worthwhile to  
help promote or decide on this.

> Since everyone was using GPS to retrieve their coordinates,
> we were able to assume WGS-84 as the datum.
>
> If someone had used another datum, say XYZ, we would have added
> columns to the Fusion table so that they could have expressed their
> coordinates in DwC, as, e.g.:
> DwC:decimalLatitude=41.5
> DwC:decimalLongitude=-70.7
> DwC:geodeticDatum=XYZ
>
> (I would argue that it should be kosher DwC to express the above as
> simply XYZ:lat and XYZ:long. DwC already incorporates terms from other
> namespaces, such as Dublin Core, so there is precedent for this.)
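
Going the other way is mechanical, which may help consumers who only
understand the DwC terms. A rough sketch, assuming records are held as
plain dictionaries keyed by prefixed term names; geo_to_dwc and the
"dwc:" key spelling are hypothetical choices for illustration, and the
WGS84 default relies on the GPS assumption described above.

```python
def geo_to_dwc(record, datum="WGS84"):
    """Return a copy of record with geo:lat/geo:long re-expressed as DwC terms.

    The datum defaults to WGS84, which is only safe when the
    coordinates are known to come from GPS.
    """
    converted = {
        key: value
        for key, value in record.items()
        if key not in ("geo:lat", "geo:long")
    }
    converted["dwc:decimalLatitude"] = record["geo:lat"]
    converted["dwc:decimalLongitude"] = record["geo:long"]
    converted["dwc:geodeticDatum"] = datum
    return converted


# Coordinates reuse the example values from the quoted message.
dwc_record = geo_to_dwc({"geo:lat": 41.5, "geo:long": -70.7})
```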
>
> 2. DwC:scientificName might be more user friendly than
> taxonomy:binomial and the other taxonomy machine tags EOL uses for
> Flickr images. If DwC:scientificName isn't self-explanatory enough, a
> user can look it up, and see that any scientific name is acceptable,
> at any taxonomic rank, or with no rank at all. And once we have a
> scientific name, higher ranks can be inferred.
>
> 3. Catalogue of Life was an important part of the workflow, but we
> had some problems with it. Future bioblitzes might consider using
> something like a CoL fork, as recently described by Rod Page [4].
>
> 4. We didn't include "basisOfRecord" in the original data profile, and
> so it wasn't a column in the Fusion Table [5]. But when a transcriber
> felt it was necessary to include in order to capture data in a
> particular field sheet, she just added the column to the table. This
> flexibility of schema is important, and is in harmony with the
> semantic web.

For citizen science, would it not make more sense to apply some easy  
guideline to select one of:
- HumanObservation
- PreservedSpecimen
- LivingSpecimen
(http://code.google.com/p/darwincore/wiki/RecordLevelTerms)

Basis of record is one of the fundamental fields to know when  
consuming content, so I think any effort to capture that at source  
will be worthwhile in the long run.
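
By "easy guideline" I mean something on the order of two yes/no
questions on the field sheet. A minimal sketch of that idea follows;
guess_basis_of_record and its two questions are my own hypothetical
framing, not an agreed TDWG rule.

```python
def guess_basis_of_record(specimen_collected, specimen_alive=False):
    """Pick a basisOfRecord value from two yes/no field-sheet answers.

    Hypothetical guideline:
    - nothing collected            -> HumanObservation
    - collected and kept alive     -> LivingSpecimen
    - collected and preserved      -> PreservedSpecimen
    """
    if not specimen_collected:
        return "HumanObservation"
    return "LivingSpecimen" if specimen_alive else "PreservedSpecimen"


# A typical bioblitz sighting with nothing collected.
basis = guess_basis_of_record(specimen_collected=False)
```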

> 5. There seemed to be enthusiasm for another field event at next
> year's TDWG. This could be an opportunity to gather other types of
> data (e.g. character data) and thereby
> i) expose meeting participants to another set of everyday problems
> from the world of biodiversity workflows, and ii) try other TDWG
> technology on for size, e.g. the observation exchange format,
> annotation framework, etc.
>
>
> Happy Thanksgiving to all in Canada -
> Joel.
> ----
>
>
> 1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
> 2. Slightly bastardizing our old observation ontology -
> http://spire.umbc.edu/ontologies/Observation.owl
> 3. http://www.w3.org/2003/01/geo/
> 4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
> 5. http://tables.googlelabs.com/DataSource?dsrcid=248798
>