Thursday 28th August at 14:00 UTC - MIDS approach to missing data

Dear all,

In preparation for the expert review and the subsequent public review, we have been focussing on some key areas of the MIDS standard where questions remain. One of these is how to handle specimens with missing data and how such specimens would be scored.

We are therefore planning a special focus session of the regular MIDS meetings, to be held on Thursday 28th August at 14:00 UTC.

There are two main points here, which relate to the users of MIDS: in particular, curators/collection managers and researchers.

The proposed option in MIDS would enable any specimen to have the potential to reach MIDS3, meaning that a collection manager/curator could aim to have an entire collection digitised to MIDS3.

However, researchers may expect all specimens at a certain MIDS level to have 'informative' data in each required field, rather than an indication that the data are missing. It would therefore need to be made clear that reaching MIDS3 is not a guarantee of the quality of the data in the record, but rather a statement on the level of digitisation that has been carried out. This would be in alignment with the view that MIDS is not a measure of quality but of data presence or absence.

Option 1
These specimens would reach MIDS3 if values were entered for missing data which do not match any proposed excluded values.

Option 2
These specimens could potentially never achieve MIDS3 if the data were not available on the specimen label or associated literature.

The proposed management of missing data in MIDS is presented below.

Handling of unknown and incomplete data

Best practice dictates that, wherever possible, data should not be published with empty field values, as this is misleading for both human users and machines. There are many reasons why data can be missing, unknown, incomplete or explicitly withheld (Groom et al. 2019), and various tactics have been used in the past to deal with such situations. However, with the increasing use of machines to interpret and act upon data, more consistent practices should be promoted.

Entering values that explain why data are unknown or incomplete enables curators and researchers to make decisions about the relevant records. For curators and collection managers, knowing why the data are unknown may help the digitisation process, potentially indicating a different workflow from basic transcription. For researchers, it can help in highlighting records that need additional research to determine an unknown value, or that may not be possible to include in a desired analysis.

Unknown values for data information elements

If information is missing or incomplete in the specimen record for any field mapped to a MIDS information element, then it is recommended to enter one of the terms for missing data values proposed by Groom et al. (2019) (Table 5). When calculating the MIDS level, it would be possible to exclude some values on the grounds that they are insufficient non-null values, for example: "unknown", "unknown:undigitised", "known:undigitised", "NULL", whitespace only.

Some questions will need to be considered, and one of the largest relates to the translation of insufficient non-null values into different languages. In addition, some systems may not allow empty fields, so an enforced missing data value would be entered automatically. In this case, the default value could be flagged for exclusion.

We have a GitHub Discussion so that everyone can participate, even if you can't join the meeting:
https://github.com/tdwg/mids/discussions/159

There is also a Googledoc for people to add comments and notes here:
https://docs.google.com/document/d/1Wlku5hoKLrFKWq9dL29D_cChB9PWw_fzQBJNwv58...

With best wishes
Elspeth and Cat
MIDS Task Group Convenors

Dr Elspeth Haston
Deputy Herbarium Curator
Tel 00 44 (0)131 248 2800
20a Inverleith Row, Edinburgh, EH3 5LR Scotland
rbge.org.uk <https://rbge.org.uk/>
@emhaston | Google Scholar | https://orcid.org/0000-0001-9144-2848
Search our Herbarium collections online at http://data.rbge.org.uk/herb

The Royal Botanic Garden Edinburgh <https://www.rbge.org.uk> is a charity registered in Scotland (No SC007983) | Support Us <https://www.rbge.org.uk/support-us>

This notice applies to this email and to any other email subsequently sent by anyone at RBGE and appearing in the same chain of email correspondence. References below to "this email" should be read accordingly. This e-mail and its attachments (if any) are confidential, may be protected by copyright and may be privileged. If you receive this e-mail in error, notify us immediately by reply e-mail, delete it and do not use, disclose or copy it. Unless we expressly say otherwise in this e-mail, this e-mail does not create, form part of, or vary, any contractual or unilateral obligation. No liability is accepted for viruses and it is your responsibility to scan attachments (if any). Where this e-mail is unrelated to the business of RBGE, the opinions expressed within this e-mail are the opinions of the sender and do not necessarily constitute those of RBGE. RBGE emails are filtered and monitored.
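
As a rough illustration of the exclusion idea in the message above, the following Python sketch shows one way a scoring tool might decide whether a field value counts as present. It is not part of the MIDS specification: the function names, the case-insensitive matching, the record field names and the "n/a" system default are all assumptions made for the example. Only the excluded values themselves are taken from the message.

```python
# Example values from the message above, lower-cased so matching can ignore case.
# Case-insensitive matching is an assumption of this sketch.
EXCLUDED_VALUES = frozenset(
    {"unknown", "unknown:undigitised", "known:undigitised", "null"}
)


def counts_as_present(value, excluded=EXCLUDED_VALUES):
    """Return True if a field value would count towards a MIDS level."""
    if value is None:
        return False
    text = value.strip()
    if not text:  # empty string or whitespace only
        return False
    return text.casefold() not in excluded


def meets_level(record, required_fields, excluded=EXCLUDED_VALUES):
    """Return True if every required field holds a value that counts as present."""
    return all(
        counts_as_present(record.get(name), excluded) for name in required_fields
    )


# Hypothetical record; the field names are placeholders, not MIDS element names.
record = {"collector": "J. Smith", "locality": "n/a", "collection_date": "1902-05-14"}
required = ["collector", "locality", "collection_date"]

# With the default list, "n/a" is not excluded, so the record passes.
print(meets_level(record, required))
# Flagging a system's enforced default ("n/a" here) makes the same record fail.
print(meets_level(record, required, EXCLUDED_VALUES | {"n/a"}))
```

Passing the exclusion list as a parameter is one way to accommodate the two open questions raised above: a system that enforces a default value, or a dataset in another language, could supply its own additional terms without changing the scoring logic.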

Hi Elspeth,

Thanks, this is an important topic! I will not be able to attend on 28th August, but here is my view.

Thinking about the intent of MIDS, the presence of "unknown" values in required fields should prevent the record from reaching Level 3, because such values indicate that the information was not actually captured; only the field was filled. Those values don’t contribute true information and might undermine data usability or completeness assessments.

For MIDS3 there is also the issue that you cannot indicate for certain that a specimen has no additional identifiers or links (a requirement for MIDS3). Since the specification currently requires only the presence of fields, not meaningful content, the definition may need to be updated a little.

Kind regards,
Wouter

On Thu, 14 Aug 2025 at 16:08, Elspeth Haston <EHaston@rbge.org.uk> wrote:
--
Coordinator International E-infrastructures and Data International Biodiversity Infrastructures
Natural Biodiversity Center, P.O. Box 9517, 2300 RA Leiden, The Netherlands
Deputy director, Distributed System of Scientific Collections (DiSSCo <http://dissco.eu/>)
Node Manager for DiSSCo, Global Biodiversity Information Facility (GBIF <http://www.gbif.org/>)
Chair Biodiversity Data Integration IG, Research Data Alliance (RDA <http://www.rd-alliance.org/>)
ORCID: 0000-0002-3090-1761 | Linkedin: linkedin.com/in/wouteraddink/ <http://linkedin.com/in/wouteraddink/>
Twitter: @wouter99999 | Tel: +31 (0) 71 751 9364
wouter.addink@naturalis.nl - www.naturalis.nl - www.catalogueoflife.org - www.dissco.eu
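
The contrast between Option 1, Option 2 and the stricter reading argued for in the reply above can be sketched as a choice of which terms block a level. This is only one possible interpretation, written in Python for illustration: the function name, the field names and the extra term "unknown:missing" are hypothetical, and the term sets are not taken from the MIDS specification.

```python
def reaches_level(record, required_fields, blocking_terms):
    """Return True if no required field is blank or holds a blocking term."""
    for field in required_fields:
        value = (record.get(field) or "").strip().casefold()
        if not value or value in blocking_terms:
            return False
    return True


# Example excluded values quoted in the thread, lower-cased for matching.
EXCLUDED = {"unknown", "unknown:undigitised", "known:undigitised", "null"}

# Assumption: a fuller list of missing-data terms, including at least one
# term that is NOT on the exclusion list. "unknown:missing" is a placeholder.
ALL_MISSING_TERMS = EXCLUDED | {"unknown:missing"}

# Hypothetical record; the field names are placeholders, not MIDS element names.
record = {"collector": "unknown:missing", "locality": "Edinburgh"}
required = ["collector", "locality"]

# Option 1: only the excluded values block the level, so a non-excluded
# missing-data term still counts as a filled field.
option_1 = reaches_level(record, required, EXCLUDED)

# Option 2 (and the view in the reply): any missing-data term blocks the level.
option_2 = reaches_level(record, required, ALL_MISSING_TERMS)

print(option_1, option_2)
```

Under these assumptions the same record passes under Option 1 and fails under Option 2, which is the trade-off between a measure of digitisation effort and a measure of informative content.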

Dear all,

Apologies for any duplication of this message. Just a reminder about today's focus session of the MIDS meeting.

With best wishes
Elspeth and Cat
MIDS Task Group Convenors

Apologies for not including the Zoom link.

Join Zoom Meeting
https://us02web.zoom.us/j/89321928120?pwd=tc1AG6KqCz5Sv0eca4nvFKF0Df8sja.1

Meeting ID: 893 2192 8120
Passcode: 459138

One tap mobile
+13017158592,,89321928120#,,,,*459138# US (Washington DC)
+13052241968,,89321928120#,,,,*459138# US

Dial by your location
* +1 301 715 8592 US (Washington DC)
* +1 305 224 1968 US
* +1 309 205 3325 US
* +1 312 626 6799 US (Chicago)
* +1 346 248 7799 US (Houston)
* +1 360 209 5623 US
* +1 386 347 5053 US
* +1 507 473 4847 US
* +1 564 217 2000 US
* +1 646 931 3860 US
* +1 669 444 9171 US
* +1 669 900 6833 US (San Jose)
* +1 689 278 1000 US
* +1 719 359 4580 US
* +1 929 205 6099 US (New York)
* +1 253 205 0468 US
* +1 253 215 8782 US (Tacoma)

Find your local number: https://us02web.zoom.us/u/knnx5ANWd

From: Elspeth Haston
Sent: 28 August 2025 12:39
To: TDWG Minimum Information about a Digital Specimen (MIDS) Task Group Mailing List <tdwg-mids@lists.tdwg.org>
Subject: Reminder: TODAY at 14:00 UTC - MIDS approach to missing data

participants (2)
- Elspeth Haston
- Wouter Addink