DwC extension in the GBWG repository
Dear all
We have now added the work-in-progress DwC extension https://github.com/tdwg/gbwg/tree/main/dwc-mixs/dwc to the GBWG repository.
Issues related to questions and TODOs for the extension should/will be tagged DwC-A Extension https://github.com/tdwg/gbwg/labels/DwC-A%20Extension.
Best,
Thomas Stjernegaard Jeppesen
Web developer
https://orcid.org/0000-0003-1691-239X
Global Biodiversity Information Facility
Universitetsparken 15, 2100 København Ø
https://www.gbif.org/ | https://www.catalogueoflife.org/
Dear Thomas,
Many thanks - would you mind adding the intention of this content and some words on the relation to the mapping we're doing in the TG to the README? I'd also add its status to the README - Is this to be ratified by TDWG? Has it already been ratified? etc.
Best, Pier Luigi
On 05/03/2021 13:33, Thomas Stjernegaard Jeppesen wrote:
> Dear all
> We have now added the work-in-progress DwC extension https://github.com/tdwg/gbwg/tree/main/dwc-mixs/dwc to the GBWG repository.
> Issues related to questions and TODOs for the extension should/will be tagged DwC-A Extension https://github.com/tdwg/gbwg/labels/DwC-A%20Extension.
dwc-mixs mailing list dwc-mixs@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/dwc-mixs
Thanks Pier
I'll reply here for the benefit of all the group members too, so we can capture ideas before putting them into the READMEs.
The intention of this extension is to allow data exchange within the GBIF, OBIS and ALA infrastructures specifically using the Darwin Core Archive standard. Being an extension to DwC this only draws in the terms not covered by Darwin Core, and also only brings in fields that would complement the kinds of information that the Darwin Core provides. Therefore the cross-mapping work of the group is not fully relevant to this extension, although any changes in MIxS would be followed now, or in the future. The background for how this extension came to be is described in section 2 of the forthcoming guide (in draft) for exchanging DNA-derived data in GBIF https://doi.org/10.35035/doc-vf1a-nr22
The question on ratification is really one for the members of the task group to consider. It would be useful to have the task group approve that this was a sensible route for Darwin Core Archive use. Ratification by TDWG isn't strictly necessary for GBIF/ALA/OBIS but would be desirable. GBIF have committed to having DwC-A support during Q2 2021 so there are time pressures to consider and we believe this is nearly ready.
What GBIF are really seeking from the group is guidance on:
1. Is it correct use of MIxS in this specific application profile?
2. Are there considerations that OBIS would like to bring forward?
3. Is there scope to split the MIxS fields Thomas identified?
4. What should the name of this extension be? (bearing in mind 5 below)
5. Is it reasonable to supplement the MIxS fields with the additional ones to accommodate more use cases
We'll open github issues specifically for some of these, but I thought I'd share here for context.
Thanks, Tim
On 05/03/2021, 13.58, "dwc-mixs on behalf of Pier Luigi Buttigieg" <dwc-mixs-bounces@lists.tdwg.org on behalf of pier.buttigieg@awi.de> wrote:
> Dear Thomas,
> Many thanks - would you mind adding the intention of this content and some words on the relation to the mapping we're doing in the TG to the README? I'd also add its status to the README - Is this to be ratified by TDWG? Has it already been ratified? etc.
> Best, Pier Luigi
-- https://orcid.org/0000-0002-4366-3088
Hi Tim
> [...]
> The intention of this extension is to allow data exchange within the GBIF, OBIS and ALA infrastructures specifically using the Darwin Core Archive standard.
Thanks for the clarification; however, does this occur outside of the standards space (i.e. TDWG/GSC)? This is then ad hoc?
And the DwC-A standard is more about how to package (meta)data than about specifying the fields, right (asking from ignorance here)?
I still can't really understand the long-term role/utility of extensions if the fields they specify are not coordinated with the standards groups / used to extend the core standards "officially".
> Being an extension to DwC this only draws in the terms not covered by Darwin Core, and also only brings in fields that would complement the kinds of information that the Darwin Core provides
Brings them in from where? These are made up ad hoc?
> Therefore the cross-mapping work of the group is not fully relevant to this extension, although any changes in MIxS would be followed now, or in the future. The background for how this extension came to be is described in section 2 of the forthcoming guide (in draft) for exchanging DNA-derived data in GBIF https://doi.org/10.35035/doc-vf1a-nr22
Is this an internal way of handling this information outside of TDWG and GSC processes?
This sets off all kinds of alarm bells, especially if they are marketed widely as a sort of parallel de facto standard.
> The question on ratification is really one for the members of the task group to consider. It would be useful to have the task group approve that this was a sensible route for Darwin Core Archive use.
If it's outside the standardisation processes of TDWG/GSC (as imperfect as they are), I don't really see how that's sensible in a global sense.
In a local sense, working against time constraints, it does make sense; however, only with a declared intent and plan to fold advancements into the global processes/standards.
I don't fully understand the nuances, probably, but this just doesn't sound like good strategy.
> Ratification by TDWG isn't strictly necessary for GBIF/ALA/OBIS but would be desirable. GBIF have committed to having DwC-A support during Q2 2021 so there are time pressures to consider and we believe this is nearly ready.
Indeed, it isn't - anyone can do anything anytime - the question here is if this is creating more silos and not going through a process that the community at large (aside from GBIF users) can also use.
It feels like this is creating more work downstream, where we will then have three entities to map (GBIF/TDWG/MIxS) all with different ways of doing things. Does OBIS also want to create its own thing?
> What GBIF are really seeking from the group is guidance on:
> 1. Is it correct use of MIxS in this specific application profile?
Given that the MIxS standards are mainlined into the INSDC, it makes sense to use this for anything omic.
My hesitation about using MIxS, because it was basically a spreadsheet, has more or less been removed thanks to Bill et al.'s work moving the terms into the linked open data world and giving each term an IRI.
That being said, many of the MIxS terms in the environmental packages (many being biogeochemical parameters etc) should be replaced by IRIs of terms from standards bodies in those communities, once we find good parallels.
> 2. Are there considerations that OBIS would like to bring forward?
This would be very good to know. I think my concerns above would carry over.
> 3. Is there scope to split the MIxS fields Thomas identified?
MIxS v6 is almost out, but we can lodge issues on the GSC tracker to this effect, cross-linking them to those in our GBWG tracker.
> 4. What should the name of this extension be? (bearing in mind 5 below)
I'm not sure what you're referring to.
> 5. Is it reasonable to supplement the MIxS fields with the additional ones to accommodate more use cases
The best way to go about this is to post issues on the MIxS tracker to get them in there (they accept new environmental packages or extensions all the time, recently one from a global consortium of food agencies and one for the COVID response)
> We'll open github issues specifically for some of these, but I thought I'd share here for context.
Thanks - I still feel like I don't get the relationship between these actors over the archives, vs the core standards, vs the unilateral move, etc.
Is there somewhere where these things are explained?
Best, Pier Luigi
Hi Pier Luigi,
Maybe I'm missing the point, but I assume we are not questioning the existence of Darwin Core extensions, as creating a Darwin Core extension is exactly what this group set out to do? If the concern is with where some of the earlier discussions have happened, I can't speak for other communities but from the OBIS perspective we are happy with how things have progressed and we have had ample opportunity to contribute. I'm not sure this creates another silo, as far as I'm aware data formatting is outside the scope of the Darwin Core standard (although it offers some guidelines, https://dwc.tdwg.org/text/), and Darwin Core archives merely complement the standardized vocabularies by offering a flexible way to package vocabulary aligned data elements and metadata documents.
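To make the packaging point concrete, here is a minimal Python sketch of the kind of meta.xml descriptor a Darwin Core Archive carries: one core table plus one extension table linked back to it by a core ID column. The element names (archive, core, extension, files, location, id, coreid, field) and the rs.tdwg.org namespaces follow the Darwin Core text guide linked above. The extension rowType, the extension term IRI and the file names are hypothetical placeholders, not the identifiers of the extension under discussion.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical placeholders: NOT the identifiers used by the real extension.
EXT_ROW_TYPE = "http://example.org/terms/DnaDerivedData"
EXT_TERM = "http://example.org/terms/target_gene"


def q(tag: str) -> str:
    """Qualify an element name with the Darwin Core text namespace."""
    return f"{{{DWC_TEXT_NS}}}{tag}"


def add_table(parent, kind, row_type, filename, id_tag, terms):
    """Describe one delimited file: its location, its ID column and its fields."""
    table = ET.SubElement(parent, q(kind), {
        "rowType": row_type,
        "fieldsTerminatedBy": "\\t",  # literal backslash-t, as written in meta.xml
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(table, q("files"))
    ET.SubElement(files, q("location")).text = filename
    ET.SubElement(table, q(id_tag), {"index": "0"})
    for index, term in terms:
        ET.SubElement(table, q("field"), {"index": str(index), "term": term})
    return table


def build_meta_xml() -> str:
    """Return a descriptor with an Occurrence core and one extension table."""
    ET.register_namespace("", DWC_TEXT_NS)
    archive = ET.Element(q("archive"))
    add_table(archive, "core", DWC_TERMS + "Occurrence", "occurrence.txt", "id",
              [(0, DWC_TERMS + "occurrenceID"), (1, DWC_TERMS + "scientificName")])
    # Extension rows point back to a core row through the coreid column.
    add_table(archive, "extension", EXT_ROW_TYPE, "dna.txt", "coreid",
              [(1, EXT_TERM)])
    return ET.tostring(archive, encoding="unicode")


if __name__ == "__main__":
    print(build_meta_xml())
```

The descriptor only says which column of which file carries which term IRI; the terms themselves are defined elsewhere, which is the separation between packaging and vocabulary described above.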
Our main interest in this group is to be able to bring MIxS described datasets into our data model which is mostly Darwin Core based, so from a practical standpoint it makes sense to only include MIxS terms which have no Darwin Core equivalent into the extension (hence "extension").
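As a rough illustration of that selection rule, the Python sketch below keeps only the MIxS terms that have no Darwin Core equivalent. The mapping is a small illustrative assumption made up for this example, not the task group's agreed mapping, and the function name is hypothetical.

```python
# Illustrative assumption only: MIxS term name -> Darwin Core equivalent,
# with None meaning "no equivalent". Not the task group's agreed mapping.
MIXS_TO_DWC = {
    "lat_lon": "decimalLatitude/decimalLongitude",
    "collection_date": "eventDate",
    "target_gene": None,
    "pcr_primers": None,
}


def extension_terms(mapping):
    """Return the MIxS terms that would go into the extension."""
    return sorted(term for term, dwc in mapping.items() if dwc is None)


if __name__ == "__main__":
    print(extension_terms(MIXS_TO_DWC))  # ['pcr_primers', 'target_gene']
```

Terms that do have an equivalent would stay in the Darwin Core core record, so nothing is stated twice in the archive.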
Best, Pieter
Pieter Provoost
OBIS Data Manager
Intergovernmental Oceanographic Commission (IOC) of UNESCO
IOC Project Office for IODE
Wandelaarkaai 7/61 - 8400 Oostende - Belgium
+32 478 574420
On 08/03/2021, 23:15, "dwc-mixs on behalf of Pier Luigi Buttigieg" <dwc-mixs-bounces@lists.tdwg.org on behalf of pier.buttigieg@awi.de> wrote:
participants (4)
- Pier Luigi Buttigieg
- Provoost, Pieter
- Thomas Stjernegaard Jeppesen
- Tim Robertson