[dwc-mixs] [gensc-cig] Use of the MIxS-based DWC extension

Thomas Stjernegaard Jeppesen tsjeppesen at gbif.org
Thu Sep 30 08:25:16 UTC 2021


Hi Chris
This is explained in the task group report on page 8, section “Outcomes > DwC Extensions > Variations of the MIxS-DwC Extension”:

Additionally, the DNA-derived data extension also takes measures to optimise the formatting
and machine-readability of keys from MIxS. This stems from the fact that some MIxS
key-value pairs are not atomic, i.e. they include multiple values in the same field (e.g. the
MIxS key “pcr_primers” requires the user to enter a value which comprises a string that
represents both the forward and reverse primer sequence, separated by a semicolon). This
value-level formatting creates a bespoke data structure which then requires custom software
or code to parse, limiting interoperability with external systems. Thus, in the case of
pcr_primers, the DNA derived data extension uses alternative keys, based on the MIxS key,
which are associated with atomic values: pcr_primer_forward and pcr_primer_reverse. This
allows for more efficient and unambiguous data ingestion into search indices, relational
databases, or similar solutions with minimal processing.
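
A minimal sketch of the split the report describes, in Python. It assumes the combined MIxS value has the form "FWD:<sequence>;REV:<sequence>"; that layout, the function name split_pcr_primers and the example sequences are illustrative assumptions, not taken from the report or from the extension itself.

```python
from typing import Dict, Optional


def split_pcr_primers(value: str) -> Dict[str, Optional[str]]:
    """Split a combined pcr_primers string into two atomic fields.

    Assumes (hypothetically) a value shaped like "FWD:<seq>;REV:<seq>".
    Parts that do not match that shape are ignored.
    """
    result: Dict[str, Optional[str]] = {
        "pcr_primer_forward": None,
        "pcr_primer_reverse": None,
    }
    for part in value.split(";"):
        label, sep, seq = part.strip().partition(":")
        if not sep:
            continue  # no "LABEL:" prefix, so nothing to assign
        label = label.strip().upper()
        if label == "FWD":
            result["pcr_primer_forward"] = seq.strip()
        elif label == "REV":
            result["pcr_primer_reverse"] = seq.strip()
    return result


# Example with placeholder primer sequences
combined = "FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT"
print(split_pcr_primers(combined))
```

With atomic keys, this parsing step is not needed on the consumer side; each value can be indexed directly.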

We fully acknowledge that resolvable references are wanted, and we anticipate providing them at some point. However, the notion of using URIs (not URLs) to identify concepts has been around, and has been used by GBIF, for decades (a URI doesn't necessarily have to be digitally accessible; it's just an identifier format).
Out of curiosity: has MIxS 6 been released, or is there any news on when it will be?

Best
Thomas

From: Chris Mungall <cjmungall at lbl.gov>
Date: Wednesday, 29 September 2021 at 20:25
To: Chris H <only1chunts at gmail.com>
Cc: Thomas Stjernegaard Jeppesen <tsjeppesen at gbif.org>, "dwc-mixs at lists.tdwg.org" <dwc-mixs at lists.tdwg.org>, gensc-cig <gensc-cig at googlegroups.com>
Subject: Re: [gensc-cig] Use of the MIxS-based DWC extension

Excellent!

What is the status of mapping the remaining fields here:


  <extension encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.gbif.org/terms/1.0/DNADerivedData">
    <files>
      <location>dnaderiveddata.txt</location>
    </files>
    <coreid index="0" />
    <field index="1" term="https://w3id.org/gensc/terms/MIXS:0000012"/>
    <field index="2" term="https://w3id.org/gensc/terms/MIXS:0000013"/>
    <field index="3" term="https://w3id.org/gensc/terms/MIXS:0000014"/>
    <field index="4" term="https://w3id.org/gensc/terms/MIXS:0000044"/>
    <field index="5" term="https://w3id.org/gensc/terms/MIXS:0000090"/>
    <field index="6" term="http://rs.gbif.org/terms/pcr_primer_forward"/>
    <field index="7" term="http://rs.gbif.org/terms/pcr_primer_reverse"/>
    <field index="8" term="http://rs.gbif.org/terms/pcr_primer_name_forward"/>
    <field index="9" term="http://rs.gbif.org/terms/pcr_primer_name_reverse"/>
    <field index="10" term="http://rs.gbif.org/terms/pcr_primer_reference"/>
    <field index="11" term="http://rs.gbif.org/terms/dna_sequence"/>
  </extension>

Some are in MIxS, but not 1:1; e.g. MIxS has a combined field for primers, whereas many databases separate these into two. Also, it looks like URIs such as http://rs.gbif.org/terms/pcr_primer_forward don't resolve?
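
One rough way to see which fields are still outside the MIxS namespace is to group the term URIs of the extension by prefix. The Python sketch below does this on an abbreviated copy of the fragment above; the function name group_terms, the grouping labels and the abbreviated input are illustrative assumptions, and a complete meta.xml may declare an XML namespace that this sketch does not handle. It only compares strings and makes no attempt to check whether a URI resolves.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the extension fragment (two of the eleven fields).
META_FRAGMENT = """
<extension rowType="http://rs.gbif.org/terms/1.0/DNADerivedData">
  <files><location>dnaderiveddata.txt</location></files>
  <coreid index="0" />
  <field index="1" term="https://w3id.org/gensc/terms/MIXS:0000012"/>
  <field index="6" term="http://rs.gbif.org/terms/pcr_primer_forward"/>
</extension>
"""

MIXS_PREFIX = "https://w3id.org/gensc/terms/"
GBIF_PREFIX = "http://rs.gbif.org/terms/"


def group_terms(xml_text):
    """Group (index, term) pairs of an extension by term URI prefix."""
    root = ET.fromstring(xml_text.strip())
    groups = {"mixs": [], "gbif": [], "other": []}
    for field in root.iter("field"):
        index = int(field.get("index"))
        term = field.get("term", "")
        if term.startswith(MIXS_PREFIX):
            groups["mixs"].append((index, term))
        elif term.startswith(GBIF_PREFIX):
            groups["gbif"].append((index, term))
        else:
            groups["other"].append((index, term))
    return groups


for name, fields in group_terms(META_FRAGMENT).items():
    print(name, fields)
```

Run against the full fragment, the "gbif" group would list the fields at index 6 to 11, which are the ones without a MIxS term URI.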

On Wed, Sep 29, 2021 at 7:44 AM Chris H <only1chunts at gmail.com> wrote:
Good news! Thank you Thomas for sharing.
I am forwarding this on to the GSC-MIxS group, as I’m sure they will be pleased to see real-world usage of the work we do.

Thanks
Chris


From: Thomas Stjernegaard Jeppesen <tsjeppesen at gbif.org>
Sent: 29 September 2021 15:09
To: dwc-mixs at lists.tdwg.org
Subject: [dwc-mixs] Use of the MIxS-based DWC extension

Hi All
I just wanted to let you know that GBIF is now starting to see incoming data using the MIxS-based DNA derived data extension for DwC.
A few examples:

  1.  https://www.gbif.org/dataset/e0b59ee7-19ae-4eb0-9217-33317fb50d47
     *   Example occurrence record: https://www.gbif.org/occurrence/3357191905 (scroll to the bottom to see the MIxS / DNA data)
  2.  https://www.gbif.org/dataset/9e29a2fe-d780-48a8-a93f-9ce041f9202f
     *   Example occurrence record: https://www.gbif.org/occurrence/3356905303
  3.  https://www.gbif.org/dataset/040c5662-da76-4782-a48e-cdea1892d14c
     *   Example occurrence record: https://www.gbif.org/occurrence/2979635351

Best wishes

Thomas Stjernegaard Jeppesen
Web developer
https://orcid.org/0000-0003-1691-239X

Global Biodiversity Information Facility
Universitetsparken 15, 2100 København Ø
https://www.gbif.org/ | https://www.catalogueoflife.org/


