[tdwg-content] A proposal to improve Darwin Core for invasive species data

Donald Hobern dhobern at gbif.org
Fri May 27 16:28:15 CEST 2016


On behalf of the GBIF Secretariat, I’d like to emphasise our extreme interest in assisting with the next phase in developing Darwin Core and TDWG standards generally.

Many important points have already been made.  Flexibility to accommodate plain text and URIs in the same fields leaves some problems for data aggregators and users but is clearly the only workable way to enable data publishing from the widest possible range of sources.
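
As a rough illustration of what this flexibility means for an aggregator, the sketch below separates values that look like URIs from plain-text values in a single field. The function name and the example.org URI are hypothetical and only for illustration; a real aggregator would need more careful handling.

```python
from urllib.parse import urlparse


def classify_value(value: str) -> str:
    """Return 'uri' if the value looks like an absolute http(s) URI, else 'text'."""
    parsed = urlparse(value.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    return "text"


# The same field may carry either form (both values are made-up examples).
for raw in ["native", "http://example.org/vocab/native"]:
    print(raw, "->", classify_value(raw))
```

Values classified as text would still need matching against known labels, which is the residual problem for aggregators mentioned above.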

However, I think it is essential that we use this opportunity to revisit the whole architecture of how we represent, share, and use biodiversity data.  There are several interconnected aspects that should be included in this debate.

I take it as a given that our shared vision should include enabling human users and machines to find all of the information and to traverse all of the data connections that a knowledgeable researcher can see in the biodiversity literature, collections and other resources. By this, I mean that we should be able to start from any point in the biodiversity data graph and find the meaningful links to associated data objects. From specimen to taxon concept to taxon name to publication; from sequence to associated sequences to taxon concepts to species occurrences; etc., etc.

This means that our data architecture needs to pay attention to the following matters (quite independently of the challenges of delivering the infrastructures that underpin their successful implementation):


•       Agreement on the set of core data classes within the biodiversity domain which we consider important enough to standardise (specimen, collection, taxon name, taxon concept, sequence, gene, publication, taxon trait, … - or whatever we all agree).

•       Agreement on the set of core relationships between instances of these classes that we consider important enough to standardise (specimen identifiedAs taxon concept, taxon name publishedIn publication, etc.).

•       Making sure that our data publishing mechanisms (cores, extensions, etc.) align accurately with these classes and support these relationships – this mainly means reworking the current confused interplay between cores, DwC classes, use of dcterms:type and use of basisOfRecord – every record should be clearly identified as an instance of a class (or a view of several linked class instances) and (for the core data classes) this should form the basis for inference and interpretation.

•       An ongoing process of defining for each core class what properties are mandatory (maybe only: id, class), highly desirable (depending on the class, things like: decimal coordinates, scientific name, identifiedAs, publishedIn), generally agreed (many other properties for which we have working vocabularies and do not want unnecessary multiplication, e.g.: waterbody, maximumDepthInMeters) or optional/bespoke (anything else that any data publisher wishes to include). In other words, allow any properties to be shared but ensure that the contours of the data are clear to standard tools.

•       A set of good examples of datasets mapped into this model, using various serialisations.
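
To make the idea of property tiers concrete, here is a minimal sketch of how a standard tool might profile a record against them. The tier contents are taken from the examples in the list above; the function name, the assignment of terms to one hypothetical class, and the sample record are assumptions, not an agreed model.

```python
# Tiers for one hypothetical core class, using the examples given above.
MANDATORY = {"id", "class"}
HIGHLY_DESIRABLE = {"decimalLatitude", "decimalLongitude", "scientificName", "identifiedAs"}
GENERALLY_AGREED = {"waterBody", "maximumDepthInMeters"}


def profile_record(record: dict) -> dict:
    """Report which tiered properties are missing and which are bespoke."""
    present = {key for key, value in record.items() if value not in (None, "")}
    known = MANDATORY | HIGHLY_DESIRABLE | GENERALLY_AGREED
    return {
        "missing_mandatory": sorted(MANDATORY - present),
        "missing_desirable": sorted(HIGHLY_DESIRABLE - present),
        "bespoke": sorted(present - known),
    }


# Made-up record: bespoke properties are allowed, but reported separately.
record = {
    "id": "urn:example:1",
    "class": "Specimen",
    "scientificName": "Puma concolor",
    "collectorMood": "cheerful",
}
print(profile_record(record))
```

Nothing is rejected here; the record is only described, which matches the intent that any properties may be shared while the contours of the data stay clear.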

It would be valuable to get a feeling for what workshops might be needed, when might be best to hold them, and how much funding would be required to ensure the right attendance.  GBIF may be able to contribute some of this.

Donald

----------------------------------------------------------------------
Donald Hobern - GBIF Executive Secretary - dhobern at gbif.org
Global Biodiversity Information Facility http://www.gbif.org/
GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
Tel: +45 3532 1471  Mob: +45 2875 1471  Fax: +45 2875 1480
----------------------------------------------------------------------

From: tdwg-content [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Lee Belbin
Sent: Friday, 27 May 2016 5:31 AM
To: Quentin Groom <quentin.groom at plantentuinmeise.be>
Cc: TDWG Content Mailing List <tdwg-content at lists.tdwg.org>
Subject: Re: [tdwg-content] A proposal to improve Darwin Core for invasive species data

Hi John et al,

Ditto (to Annie's comment) from me.

As most of you know, I have been standing back from TDWG for a few years now, but it has been hard to avoid interest in the development of TDWG's most significant standard. I have been aware of at least three independent suggestions that are relevant to my interests.

As from the beginning, the difficulty remains in getting wise heads together (face-to-face) and getting consensus on recommendations in a timely manner. The TDWG meetings are too busy without sufficient time to achieve outcomes. Teleconferencing is great in theory but hopeless in practice. The only alternative is to find time and $ for dedicated meetings adjacent to, or separate from the TDWG Conference. It has been done before to good effect.

I raise this as I believe that the time for such a meeting is well and truly here.

Cheers

Lee

Lee Belbin
Blatant Fabrications Pty Ltd
Tasmania

On Thu, May 26, 2016 at 4:04 AM, Quentin Groom <quentin.groom at plantentuinmeise.be> wrote:
I second Annie, this is a real problem in Biodiversity Informatics in general. Unless you can turn maintenance tasks into peer reviewed papers then you get little credit for it.
Quentin



Dr. Quentin Groom
(Botany and Information Technology)

Botanic Garden Meise
Domein van Bouchout
B-1860 Meise
Belgium

ORCID: http://orcid.org/0000-0002-0596-5376

Landline: +32 (0) 226 009 20 ext. 364
FAX:      +32 (0) 226 009 45

E-mail:     quentin.groom at plantentuinmeise.be
Skype name: qgroom
Website:    www.botanicgarden.be


On 25 May 2016 at 17:57, John Wieczorek <tuco at berkeley.edu> wrote:
Thanks Annie. I don't plan to give up, it's just that I don't feel I have been doing it justice for a while now.

On Wed, May 25, 2016 at 12:52 PM, Simpson, Annie <asimpson at usgs.gov> wrote:
Chiming in now, very briefly...

John, you are the DwC champion who has kept it moving forward in a comprehensible and useful way. The TDWG community is very grateful for your work on this and I personally hope you don't give up the good fight.

Annie Simpson, biologist & information scientist
http://orcid.org/0000-0001-8338-5134
BISON project (http://bison.usgs.ornl.gov)
Core Science Analytics, Synthesis, & Libraries Program
U.S. Geological Survey, MS 302
12201 Sunrise Valley Drive
Reston, Virginia  20192
=================
asimpson at usgs.gov
703.648.4281 desk

On Wed, May 25, 2016 at 7:26 AM, John Wieczorek <tuco at berkeley.edu> wrote:
One primary idea behind the BCO is indeed as the proving ground you mention. The challenge is having consistent available resources to do that work. With BCO it could be a particular challenge, I think, since it can cover so much semantic space. I would love to be in a position to be a proper BCO caretaker, but I have not even been able to do a good job with Darwin Core as it is.

On Wed, May 25, 2016 at 2:00 AM, Quentin Groom <quentin.groom at plantentuinmeise.be> wrote:
Interesting! I had not come across Apple Core before. Isn't this proving-ground role something that the Biological Collections Ontology can also do?
I like the idea of a Darwin Core with different levels of adherence to rules. I agree that strict enforcement of rules will inhibit the flow of data, but in my own experience there are simple fields that I could have easily completed in a standardized way, if only I could have found a suitable recommendation to follow. Hierarchical vocabularies are particularly useful here, because they have built in flexibility.
Regards
Quentin





On 25 May 2016 at 03:17, John Wieczorek <tuco at berkeley.edu> wrote:
I think it is indeed worthwhile to have content standards to go with the term definitions. Apple Core (now under renewed development at https://github.com/tdwg/applecore) is a good example of this.

On Tue, May 24, 2016 at 6:16 PM, Chuck Miller <Chuck.Miller at mobot.org> wrote:
John,
It’s interesting how long that text has been out there, and without much comment.  It seems to presume there is a binary situation: tightly controlled vocabulary that is exclusive or loosely controlled that is inclusive.   Maybe it’s time now to consider something additional in the middle.  We know a lot more about how the Darwin Core standard is being used, or at least have plenty of examples.  With the addition of use cases into the standards for terms, progress could be made on use-case-based standard vocabulary that could reduce the “garbage in/garbage out” problem that comes from being totally inclusive.

TDWG standards in the 80s and 90s were a little more about controlled vocabulary and reducing garbage than they have been in the 00s and 10s.  Maybe we should spend some time on that aspect of data exchange again and use cases could be a method.

Best regards,
Chuck


From: tdwg-content [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of John Wieczorek
Sent: Tuesday, May 24, 2016 2:48 PM

To: Quentin Groom
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] A proposal to improve Darwin Core for invasive species data

I would say that the primary factor driving the philosophy for loose controlled vocabulary recommendations is a desire to promote the stability of Darwin Core term definitions, because changes can be disruptive. Section 1.4 on the Simple Darwin Core page (http://rs.tdwg.org/dwc/terms/simple/index.htm) gives further practical arguments for this stance. I have copied the relevant text here for convenience:

"There is a difference between having data in a field and requiring that field to have a value from among a legal set of values. The Darwin Core is simple in that it has minimal restrictions on the contents of fields. The term comments give recommendations about the use of controlled vocabularies and how to structure content wherever appropriate. Data contributors are encouraged to follow these recommendations as well as possible. You might argue that having no restrictions will promote "dirty" data (data of low quality or dubious value). Consider the simple axiom "It's not what you have, but what you do with it that matters." If data restrictions were in place at the fundamental level, then a record having any non-compliant data in any of its fields could not be shared via the standard. Not only would there be a dearth of shared data in that case (or an unused standard), but also there would be no way to use the standard to build shared data cleaning tools to actually improve the situation, nor to use data services to look up alternative representations (language translations, for example) to serve a broader audience. The rest is up to how the records will be used - in other words, it is up to applications to enforce further restrictions if appropriate, and it is up to the stakeholders of those applications to decide what the restrictions will be for the purpose the application is trying to serve."


On Tue, May 24, 2016 at 3:44 PM, Quentin Groom <quentin.groom at plantentuinmeise.be> wrote:
Hi Paco,
I'm glad to hear Plinian Core is active. I only recently discovered it and think it is a good initiative. The species data I've seen is quite diverse and in unstandardised formats. It would be nice to see some of the big providers using Plinian Core.

I'm not so worried about imposing limitations on users, because as far as I can see Darwin Core only recommends vocabularies, it doesn't enforce them. Having said that, it would be useful to know what is meant by "Recommended best practice is to use a controlled vocabulary", because if Darwin Core doesn't impose a vocabulary and there is no field to specify which vocabulary you are using then it doesn't help interoperability much.
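
To illustrate the difference between recommending and enforcing, here is a minimal sketch in which the record itself stays shareable and one application applies its own, explicitly named vocabulary. The vocabulary name, its values and the function name are all hypothetical; Darwin Core does not define them.

```python
from typing import List, Tuple

# Hypothetical vocabulary chosen by one application, named so users can see
# which list their values are being checked against.
APP_VOCABULARY_NAME = "example-establishment-means-v1"
APP_ESTABLISHMENT_MEANS = {"native", "introduced"}


def accept_for_app(record: dict) -> Tuple[bool, List[str]]:
    """Check a record against this application's vocabulary; never alter the record."""
    problems = []
    value = (record.get("establishmentMeans") or "").strip().lower()
    if value not in APP_ESTABLISHMENT_MEANS:
        problems.append(
            f"establishmentMeans={value!r} is not in {APP_VOCABULARY_NAME}"
        )
    return (not problems, problems)


print(accept_for_app({"establishmentMeans": "Native"}))
print(accept_for_app({"establishmentMeans": "escaped from garden"}))
```

A record that fails this check can still be published and used elsewhere; only this application declines it, and it says which vocabulary it used to decide.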

I'm happy to also discuss off list. Invasiveness and impact are difficult to standardize and so far I've chosen other fields that I consider easier to gain consensus on.

Regards
Quentin





On 24 May 2016 at 19:08, Francisco Pando <pando at gbif.es> wrote:

Quentin et al.,

Plinian Core is active and backed by an international group that seeks expansion. A session about it is planned at TDWG 2016, within the Species Information Interest Group slot.

"Invasiveness" is a section within the Plinian Core schema: https://github.com/PlinianCore/Documentation/wiki/InvasivenessClass

It is largely based on the GISIN schema. This can be revisited, updated and harmonized with current initiatives, some mentioned in this thread. Quentin, we may do a bit of exchange off-list.

Whereas shared vocabularies bring plenty of good things, I share Chuck’s concerns about imposing some unwanted limitations on some potential users of the schema.

Best,

Paco

Francisco Pando
Investigador
Real Jardín Botánico - CSIC
Plaza de Murillo, 2
28014 Madrid, Spain
Tel. +34 91 420 3017 x 172

From: tdwg-content [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Quentin Groom
Sent: Monday, May 23, 2016 10:10 PM
To: Chuck Miller

Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] A proposal to improve Darwin Core for invasive species data

Hi Chuck,
thanks for your point. The use cases I'm thinking of are conservation red-listing; horizon-scanning for potential new invasives; early warning of new aliens; impact assessment and invasion monitoring. We have recently been discussing the possibility of automating all of these processes so that they can be repeated regularly, or as soon as new data becomes available. Obviously, for this we need observations, but we also need check-lists to tell us what is considered native or alien, present or extinct.
I know more about invasive species research than red-listing, but I am aware that the current rate of red-listing is so slow that most things will become extinct before they are assessed. Given that the IUCN criteria are so clear, it should be possible to automate the whole process using GBIF. The only limitation would then be mobilizing the observations.
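
As a rough sketch of the early-warning case, the code below combines observations with a check-list to flag the first record of each taxon listed as alien. The function name, the "native"/"alien" status values and the sample data are assumptions for illustration only; it does not use any GBIF API.

```python
def first_alien_records(observations, checklist, known_taxa):
    """Return the earliest observation of each check-list alien not yet known locally.

    observations: iterable of dicts with 'scientificName' and ISO 'eventDate'
    checklist:    dict mapping scientificName -> 'native' or 'alien' (assumed values)
    known_taxa:   set of names already recorded in the region
    """
    alerts = []
    seen = set(known_taxa)
    # ISO 8601 date strings sort chronologically as plain text.
    for obs in sorted(observations, key=lambda o: o["eventDate"]):
        name = obs["scientificName"]
        if checklist.get(name) == "alien" and name not in seen:
            alerts.append(obs)
            seen.add(name)
    return alerts


# Illustrative check-list and observations (made-up records).
checklist = {"Impatiens glandulifera": "alien", "Quercus robur": "native"}
observations = [
    {"scientificName": "Impatiens glandulifera", "eventDate": "2016-05-02"},
    {"scientificName": "Quercus robur", "eventDate": "2016-05-01"},
    {"scientificName": "Impatiens glandulifera", "eventDate": "2016-04-20"},
]
print(first_alien_records(observations, checklist, known_taxa=set()))
```

Re-running this whenever new observations arrive is the kind of repeatable process described above; the quality of the result depends entirely on the check-list.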
Regards
Quentin





On 23 May 2016 at 21:10, Chuck Miller <Chuck.Miller at mobot.org> wrote:
Quentin,
I think in addition to defining the community that needs the new Origin term, you also need to define the use cases to which the proposed controlled vocabularies for establishmentMeans and occurrenceStatus apply.  Darwin Core is used in multiple ways.  I think there may be use cases for these terms that don’t match the invasive species use cases. One controlled vocabulary may not work for all Darwin Core users.

Best regards,
Chuck

Chuck Miller | VP-IT & CIO | Missouri Botanical Garden
4344 Shaw Boulevard | Saint Louis, MO 63110 | Phone 314-577-9419

From: tdwg-content [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of John Wieczorek
Sent: Monday, May 23, 2016 1:36 PM
To: Quentin Groom
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] A proposal to improve Darwin Core for invasive species data

Hi Quentin,

Thank you for your effort in putting forth these well thought out proposals. At various times I have heard discussions on the inadequacy of establishmentMeans. Your work encapsulates the problem well.
One of the things that helps when proposing to add a Darwin Core term is demonstrating that there is a community that needs it. Can you tell us who has a demonstrated need to share this information? Anyone out there who has this interest is also welcome to share that here to provide evidence of demand from more than one group, project or individual.

Cheers,

John

On Mon, May 23, 2016 at 3:23 PM, Quentin Groom <quentin.groom at plantentuinmeise.be> wrote:
I've been working on a proposal to improve Darwin Core for use with invasive species data.

The proposal is detailed on GitHub at https://github.com/qgroom/ias-dwc-proposal/blob/master/proposal.md.

The proposal is for a new term "origin" and suggested vocabularies for establishmentMeans and occurrenceStatus.

I'd welcome your feedback on the proposal.

From my perspective it provides some needed clarity on the establishmentMeans and occurrenceStatus fields, but also adds the origin that is needed for invasive species research and for conservation assessments.

I'm not sure of the best way to discuss this, but if you have concrete proposals for changes you might raise them as issues on GitHub, as well as mentioning them here.

Regards
Quentin






_______________________________________________
tdwg-content mailing list
tdwg-content at lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content



