Inferring non-detection of target taxa of presence-only data using Humboldt Extension | Test dataset with updated terms good to go
Hi all,
I have done the exercise mentioned in the subject. Please see the rendered HTML at this link: https://raw.githack.com/biodiversity-aq/humboldt-for-eco-survey-data/main/sr... The repository is public: https://github.com/biodiversity-aq/humboldt-for-eco-survey-data I hope some of the remarks and questions are useful!
I think the guiding principles are fine after last Wednesday's long meeting (special thanks to Yani, Ani and Wesley). The dataset based on the latest term names (from last Wednesday) is good to go. I can update the dataset in the test IPT https://ipt.gbif.org/resource?r=brokewest-fish when the new terms are up in the sandbox.
Cheers Ming
Ming, this is amazing. The embedded R in the document is very helpful. Steve
-- Steven J. Baskauf, Ph.D. he/him/his Data Science and Data Curation Specialist / Librarian III Jean & Alexander Heard Libraries, Vanderbilt University Nashville, TN 37235, USA
Biodiversity Information Standards (TDWG) Executive Committee/Technical Architecture Group Chair https://baskauf.github.io/
From: tdwg-humboldt tdwg-humboldt-bounces@lists.tdwg.org on behalf of Yi Ming Gan ymgan@naturalsciences.be Date: Monday, August 7, 2023 at 11:33 AM To: tdwg-humboldt@lists.tdwg.org tdwg-humboldt@lists.tdwg.org Cc: wmh6@cornell.edu wmh6@cornell.edu Subject: [tdwg-humboldt] Inferring non-detection of target taxa of presence-only data using Humboldt Extension | Test dataset with updated terms good to go
yuhuuu!!! amazing progress! Thanks Ming and John!
I hope everybody received my invitation to today's meeting.
I might be a couple of minutes late but hope to see everybody soon!
Cheers Yani
_______________________________________________ tdwg-humboldt mailing list tdwg-humboldt@lists.tdwg.org https://lists.tdwg.org/mailman/listinfo/tdwg-humboldt
Hi all,
One of the things that is still missing from the list of terms document (https://tdwg.github.io/hc/list/) is the list of contributors. In other vocabularies, this is the place where we’ve tried to list everyone who has made a substantive contribution to building the vocabulary. I am going to try to fill in the contributors by script and also to include the ORCIDs of the contributors. So if you could please send me the following, I would appreciate it:
- name as you would like it to be listed
- affiliation as you would like it to be listed
- ORCID identifier.
Thanks, Steve
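As a rough illustration of filling a contributors list by script, here is a minimal Python sketch. It is not the script Steve has in mind: the `Contributor` record, the `orcid_url` and `render_contributors` helpers, the output layout, and the sort order are all assumptions made for this example. The sample entries are taken from replies in this thread.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    # Hypothetical record holding the three items requested above.
    name: str
    affiliation: str
    orcid: str  # bare identifier, "orcid.org/..." form, or full URL


def orcid_url(orcid: str) -> str:
    """Normalise the ORCID forms seen in the replies to a full https URL."""
    bare = orcid.strip().rsplit("/", 1)[-1]   # drop any URL prefix
    bare = bare.replace("ORCID:", "").strip()  # drop a leading "ORCID:" label
    return f"https://orcid.org/{bare}"


def render_contributors(people: list[Contributor]) -> str:
    """Return one plain-text list item per contributor, sorted by last name."""
    ordered = sorted(people, key=lambda c: c.name.split()[-1].lower())
    return "\n".join(
        f"- {c.name}, {c.affiliation} ({orcid_url(c.orcid)})" for c in ordered
    )


people = [
    Contributor(
        "Zachary R. Kachian",
        "Keller Science Action Center, Field Museum of Natural History",
        "https://orcid.org/0000-0002-0500-0339",
    ),
    Contributor(
        "Peter Brenton",
        "Atlas of Living Australia, CSIRO",
        "ORCID: 0000-0001-9730-8340",
    ),
]
print(render_contributors(people))
```

Normalising the ORCID first means people can reply in whatever form they like (bare identifier or URL) and the generated list stays consistent.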
Hi Steve,
Thanks for working on this. Here is my info:
Zachary R. Kachian
Keller Science Action Center, Field Museum of Natural History
https://orcid.org/0000-0002-0500-0339
Best, Zach
Got it! Thanks Zach. Steve
From: tdwg-humboldt tdwg-humboldt-bounces@lists.tdwg.org on behalf of Zachary Kachian zkachian@fieldmuseum.org Date: Wednesday, August 9, 2023 at 3:49 PM To: Humboldt Core TG tdwg-humboldt@lists.tdwg.org Subject: Re: [tdwg-humboldt] List of terms authors
Hi Steve,
Here is mine:
Dmitry Schigel, Global Biodiversity Information Facility (GBIF), Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark https://orcid.org/0000-0002-2919-1168
PS Rob recruited a superreviewer from Prague here at ESA2023 – a very motivated person. Will reveal the name when he writes to me, as he promised
DS
From: tdwg-humboldt tdwg-humboldt-bounces@lists.tdwg.org On Behalf Of Baskauf, Steven James Sent: Wednesday, 9 August, 2023 19:42 To: Humboldt Core TG tdwg-humboldt@lists.tdwg.org Subject: Re: [tdwg-humboldt] List of terms authors
Hi Steve
Here is the information you requested.
Robert D Stevenson
Department of Biology
orcid.org/0000-0003-1617-5895
Hi Steve,
Mine is:
Peter Brenton
Atlas of Living Australia, CSIRO
ORCID: 0000-0001-9730-8340
From: tdwg-humboldt tdwg-humboldt-bounces@lists.tdwg.org On Behalf Of Baskauf, Steven James Sent: Wednesday, 9 August 2023 11:42 PM To: Humboldt Core TG tdwg-humboldt@lists.tdwg.org Cc: wmh6@cornell.edu Subject: [tdwg-humboldt] List of terms authors
Hi Steve,
Thanks for organizing this. My information is:
Kate Ingenloff, Global Biodiversity Information Facility (GBIF), Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. https://orcid.org/0000-0001-5942-9053
Cheers, Kate
Hi Steve,
Is this already at the stage where a full review, with one person going through all the terms, is welcome? I am asking so as not to be too late, as happened with my review of AC subjectPart and subjectOrientation.
Kind regards, Carlos
Hi Carlos,
The Humboldt Extension Task Group is pushing towards requesting a public review very soon. As soon as that starts, comments would be greatly appreciated.
In an earlier stage (implementation testing), a comprehensive review such as the one you are suggesting would also have been great, but I think at this point it would be hard to incorporate a lot of suggestions without holding up progress towards getting the public review started (which we’ve been pushing towards for a number of weeks).
Thanks for your interest in this and the feedback will be awesome! Steve
From: Myriatrix Admin archilegt@gmail.com Date: Thursday, August 10, 2023 at 5:56 AM To: Humboldt Core TG tdwg-humboldt@lists.tdwg.org, Baskauf, Steven James steve.baskauf@vanderbilt.edu Subject: Re: [tdwg-humboldt] List of terms authors
Thanks for the documents Ming and Yani. It is nice to see the continued progress. Sorry I am not able to join this week's meeting.
A few comments which may not be useful after the meeting.
I did look at eco:isLeastSpecificTargetCategoryQuantityInclusive Guidelines here https://docs.google.com/document/d/11fPW4JibRWUnrtTOZwDkAcxo5d-lwnATXyh3xmO6...
The purpose is clear after reading the document, but I found the term name "isLeastSpecificTargetCategoryQuantityInclusive" non-intuitive.
Could the words "counts" and "preCalculated" be used somehow?
At the beginning of the document there is a sentence that says
" need to be treated differently in order to calculate the total quantity of organisms in the *least specific category*. " I suggest saying "in the *least specific taxonomic category*", which will often be the species level.
A larger question: if someone has terms that the Event Extension does not support, what options do they have for sharing data?
I was thinking specifically of Jon Sullivan's old email. Could the event extension documents provide general advice for people with terms like Jon's?
Best, Rob
Sullivan, Jon Jon.Sullivan@lincoln.ac.nz via lists.tdwg.org Wed, Apr 5, 10:21 PM to tdwg-humboldt@lists.tdwg.org
Hello fine Humboldt Core people,
I’ve been lurking on the mailing list and finally sat down and had a proper look at how all the current HC terms map onto my ecological surveying. Perhaps my feedback is still useful at this late stage. (Rob Stevenson invited me to do this while you were doing the case studies at the end of last year but I got swamped.)
I’m on something of a personal mission to document the changes in the readily detectable and identifiable species around me. My surveys hit the 20-year mark last Saturday and I’m at over 1.5 million observations. They’re mostly still in geotagged audio notes but I’ve had AWS Transcribe cranking through them over the last month and a half and I’m about half way through now. I’m hoping that my dataset is just the kind of ecological survey project that will benefit from the standardisation offered by Humboldt Core.
Anyway, with that preamble, below is a summary of how my data structure maps onto the draft Humboldt Core. My biggest puzzle is how to describe my unbounded distance-sampled transect counts within a structure that only seems to quantify the surveyed sites by area.
Cheers,
Jon
# Humboldt Core Terms useful as is:
*samplingPerformedBy*
*identifiedBy*
*verbatimSiteNames*
*eventDuration* [I use samplingDurationMinutes at the moment, which will translate easily.]
eventDurationUnit
*targetTaxonomicScope*
*targetLifeStageScope*
*excludedLifeStageScope*
*targetDegreeOfEstablishmentScope* [I'm hoping I can use a different vocabulary in here as I also use "endemic", as some of my surveys are only of New Zealand endemics and do not include relatively recently established Australian natives that are now also NZ natives. I also find the concepts "invasive" and "widespread invasive" too slippery to use for species scope, so I use "naturalised" when I want to refer to wild exotic species.]
*excludedDegreeOfEstablishmentScope*
*targetGrowthFormScope* [eg when I'm surveying woody weeds, that would be targetTaxonomicScope == Tracheophyta, targetDegreeOfEstablishmentScope == "naturalised", excludedDegreeOfEstablishmentScope == "native", targetGrowthFormScope == "tree|shrub|liane"]
*targetHabitatScope* [eg targetHabitatScope == "road|roadside" for my roadkill surveys]
*reportedWeather*
*protocolNames*
*protocolDescription*
*protocolReferences*
*isAbundanceReported*
*hasVouchers*
*voucherInstitutions* [although often my collected specimens are still in my personal collection, eg waiting in my plant press]
isSamplingEffortReported
*samplingEffortValue* [at the moment I'm using the DWC "samplingEffort", eg samplingEffort == "11.91 km in 244 minutes"]
*taxonCompletenessReported* ["reported complete"]
*isTaxonomicScopeComplete*
*isLifeStageScopeComplete*
*isDegreeOfEstablishmentScopeComplete*
*isGrowthFormScopeComplete* [is there an issue here if these fields are ever interpreted independently of one another? If I'm surveying woody weeds, then, in combination, TaxonomicScope == "Tracheophyta" and LifeStageScope == "tree|shrub|liane" and DegreeOfEstablishmentScope == "naturalised" are complete. However, each independently is not complete, e.g., I didn't survey all TaxonomicScope == "Tracheophyta". Would it be better if there was just one isScopeComplete and that applies to all the scopes listed? I can't think at the moment of a survey type that would list a series of Scope values and differ in whether or not ScopeComplete == "true" for each.]
# Humboldt Core Terms useful but with modification:
*samplingEffortUnit* [how do I untangle distance and duration in this field? If I'm exploring at a site, I have the distance from my GPS and the duration. I've been regarding those in combination as my sampling effort for a site.]
*isLeastSpecificTargetCategoryQuantityInclusive* [I have a more complicated approach to this where I can mark individuals as "different", "possibly same", and "same", for possible or definite resightings. That lets me calculate a maximum and minimum range for my count of individuals of a species at a site. I'm not sure how to handle this here.]
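The minimum and maximum counts described for resightings can be sketched as follows. This is one possible reading of the three marks, not Jon's actual method: it assumes that every first sighting of an individual is marked "different", that "same" is a definite resighting (never a new individual), and that "possibly same" may or may not be a new individual. The function name `count_range` is made up for the example.

```python
from collections import Counter

VALID_MARKS = {"different", "possibly same", "same"}


def count_range(marks: list[str]) -> tuple[int, int]:
    """Return (minimum, maximum) individuals implied by the sighting marks.

    Assumed reading: "different" always adds an individual, "same" never
    does, and "possibly same" adds one only in the maximum.
    """
    tally = Counter(marks)
    unknown = set(tally) - VALID_MARKS
    if unknown:
        raise ValueError(f"unrecognised marks: {sorted(unknown)}")
    minimum = tally["different"]
    maximum = minimum + tally["possibly same"]
    return minimum, maximum


# Four sightings of one species at one site.
print(count_range(["different", "same", "possibly same", "different"]))
```

A range like this does not fit in a single dwc:organismQuantity value, which is the mapping difficulty raised in the comment above.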
# Concepts in my wildcounts data that seem to be missing from Humboldt Core Terms:
*geospatialScopeDistanceInKilometers* [most of my surveys are distance-sampling along unbounded transects, for which I have distance not area. Also, when I'm exploring sites, I record with my GPS the total distance I travel while surveying the site. In both cases, sampling distance is more relevant for my data than sampling area. I'm not sampling with plots. I need somewhere standard to put my samplingDistanceKm.]
*targetPhenologyScope* [eg often in my repeat surveys I'm just mapping out individual plants that are currently flowering or fruiting. The rest get ignored. I call this whatsoughtReproductiveCondition in my data.]
*targetSeenHeardScope* [e.g., if I'm in a car, I'm only counting the birds I see, while I also include the birds I hear when I'm biking. Similarly, if I was extracting data from my AudioMoth that runs in our garden, that would be all birds heard only.]
*targetWildCaptiveScope* [sometimes I only survey the wild individuals. This is an important distinction to make when surveying weeds in urban areas.]
*targetLargestBodyLengthScope* [eg sometimes I'm just surveying big birds, or big butterflies, such as when I'm surveying from a moving car. I have a minimum and maximum value for whatsoughtLongestBodyDimensionMetres in my data to capture this.]
*targetDeadOrAliveScope* [eg when surveying roadkill they're all dead. I can be surveying dead birds along a drive but not surveying all live birds.]
*whatsoughtBodyCondition* [when I'm repeat surveying roadkill along standard routes, I'm only counting the fresh carcasses, so in my data whatsoughtBodyCondition == "dead carcass <24 hours old". I don’t expect Humboldt Core to contain terms for details like this, but I’m also not sure how to handle them in Humboldt Core data.]
*samplingDurationMinutes* [this will be samplingEffortValue and samplingEffortUnit when there is somewhere else to put my sampled distance]
*predetermined* [sometimes I might hear an interesting bird and go outside and do a checklist including that bird. This, in my data, is a survey with predetermined=="false". If I do a planned survey at a planned time, irrespective of the conditions and species present, then that's predetermined=="true". In my data, predetermined can be assigned both at a whole survey level and at a howSought level for each Scope (eg if I went outside to look for an interesting bird I heard, and decided to survey butterflies while I'm at it, then I'm howSoughtPredetermined == "false" for my bird list but howSoughtPredetermined == "true" for my butterfly list). I'm not sure where that concept fits into Humboldt Core.]
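For transects stored as a footprintWKT LINESTRING, a sampling distance in kilometres can be derived from the geometry itself. The sketch below is only an illustration: it assumes coordinates are in WKT order (longitude, then latitude) in decimal degrees, uses a spherical-Earth haversine approximation, and the function names are invented for the example. The field name samplingDistanceKm is the one used in the list above; it is not a Humboldt or Darwin Core term.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def parse_linestring(wkt: str) -> list[tuple[float, float]]:
    """Parse 'LINESTRING (lon lat, lon lat, ...)' into (lon, lat) tuples."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if not match:
        raise ValueError("not a WKT LINESTRING")
    points = []
    for pair in match.group(1).split(","):
        lon, lat = map(float, pair.split()[:2])  # ignore any Z value
        points.append((lon, lat))
    return points


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sampling_distance_km(footprint_wkt: str) -> float:
    """Sum the segment lengths of a LINESTRING footprint."""
    pts = parse_linestring(footprint_wkt)
    return sum(haversine_km(p, q) for p, q in zip(pts, pts[1:]))


# One degree of latitude along a meridian, roughly 111 km.
print(round(sampling_distance_km("LINESTRING (172.0 -43.0, 172.0 -44.0)"), 1))
```

A GPS track with many closely spaced points would be handled the same way, although noise in the track will inflate the summed distance.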
# Humboldt Core Terms not used by me (at the moment):
*identificationReferences* [I use scientificNameAuthorship and namePublishedIn and namePublishedInYear from Darwin Core in my species database as a source for each species concept, so perhaps that works for this?]
*siteCount* [this could be calculated for each of my surveys, usually by summing up sections of my transects]
*siteNestingDescription* [I don't use this now but could add a generic text description of my method into here. Most of my surveys have a footprintWKT LINESTRING of where I searched, and these can optionally have nested subsets of linestrings in different habitats, or sections of a transect, or time periods. I could describe this in text but it will also be apparent from the data.]
*verbatimSiteDescriptions*
*geospatialScopeAreaInSquareKilometers* [all my surveys in recent years are unbounded distance-sampled transects. I use a footprintWKT LINESTRING generated from a GPX file to define the transect. For some taxa, I restrict the width of the distance sampled, and so could calculate an area. For most (eg birds), my observations are all counts in estimated distance bands, including as far away as I can see and hear with my unaided eye and ear. A geospatialScopeDistanceInKilometers would work much better for my surveys than area. At the moment I'm using a LINESTRING footprintWKT and samplingDistanceKm.]
*totalAreaSampledInSquareKilometers* [comment as above]
*reportedExtremeConditions*
*compilationType*
*compilationSourceTypes*
*inventoryTypes* [At the moment I don't understand what this refers to. If I do a distance-sampling transect and count and map out all butterflies, which I do, is that inventoryTypes == "open search"? Or does that mean something else?]
*isAbundanceCapReported* [I suppose I can include this always as isAbundanceCapReported == "FALSE"]
*abundanceCap*
*isVegetationCoverReported*
*isAbsenceReported* [lots of absences can be inferred from my surveys--that's one of the reasons I do them--but I don't have data like "blackbird = 0". I'm assuming that's the kind of data you're meaning by isAbsenceReported == "true"]
*absentTaxa* [although all absent taxa can be inferred from the target fields like targetTaxonomicScope. I assume you're not expecting impossibly long lists of absent taxa with every survey dataset. It's more efficient to say targetTaxonomicScope == "Aves" and just list the birds seen and heard.]
*hasMaterialSamples*
*materialSampleTypes*
*samplingEffortProtocol* [I don't see the difference between this and samplingEffortValue for my surveys, when my survey method has been described in protocolDescription]
*taxonCompletenessProtocols* [I'm confused at the moment by how to describe my sampling method across protocolDescription, samplingEffortProtocol, and taxonCompletenessProtocols. I sense that taxonCompletenessProtocols is trying to get at whether a survey has made an estimate of its detection probability for a surveyed taxon, and how that was assessed, and whether this influenced the sampling effort. In my case, for example, I might bike a 20 km route and distance count/map all birds I see and hear. That's a good fit for the protocolDescription. I'm not sure what I would then need to state for the samplingEffortProtocol and the taxonCompletenessProtocols here. Do I ignore them? Detection probability can be modelled from my data but that doesn't influence how quickly I'm biking or how far I'm biking, so I'm guessing I have no taxonCompletenessProtocol.]
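The idea that absent taxa can be inferred from the target fields, rather than listed, can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure from the exercise at the top of this thread: it uses the draft term names quoted above (targetTaxonomicScope, isTaxonomicScopeComplete), and the `checklist` mapping from a scope such as "Aves" to the species expected in the survey region is a hypothetical input that a real workflow would have to supply.

```python
def infer_non_detections(event, detected, checklist):
    """Return the target taxa not detected in an event, or None.

    Non-detection is only inferred when the event declares that its
    taxonomic scope was completely reported; otherwise nothing can be
    concluded about taxa missing from the occurrence list.
    """
    if not event.get("isTaxonomicScopeComplete"):
        return None
    expected = set()
    for scope in event["targetTaxonomicScope"].split("|"):
        expected |= checklist.get(scope.strip(), set())
    return sorted(expected - set(detected))


# Hypothetical regional checklist for the scope "Aves".
checklist = {
    "Aves": {"Turdus merula", "Passer domesticus", "Prosthemadera novaeseelandiae"}
}
event = {"targetTaxonomicScope": "Aves", "isTaxonomicScopeComplete": True}

print(infer_non_detections(event, ["Turdus merula"], checklist))
```

Returning None for an incomplete scope keeps "not inferable" distinct from "nothing was missed", which would be an empty list.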
# Examples of some details of my method that probably just don't belong as Humboldt Core Terms:
gpsSource
visualAid
auditoryAid
dataEntryHardware
dataEntrySoftware
distance bands from observer
On Wed, Aug 9, 2023 at 7:23 AM Yanina Sica yanina.sica@gmail.com wrote:
yuhuuu!!! amazing progress! Thanks Ming and John!
I hope everybody received my invitation to today's meeting.
I might be a couple of minutes late but hope to see everybody soon!
Cheers Yani
On Mon, Aug 7, 2023 at 6:33 PM Yi Ming Gan ymgan@naturalsciences.be wrote:
Hi all,
I have done the exercise mentioned in subject. Please see the rendered html with this link: https://raw.githack.com/biodiversity-aq/humboldt-for-eco-survey-data/main/sr... The repository is public: https://github.com/biodiversity-aq/humboldt-for-eco-survey-data I hope some of the remarks and questions are useful!
I think the guiding principles are fine after last Wednesday long meeting (special thanks to Yani, Ani and Wesley). The dataset based on latest term name (from last Wednesday) is good to go. I can update the dataset in test IPT https://ipt.gbif.org/resource?r=brokewest-fish when the new terms are up in the sandbox.
Cheers Ming
Hi Rob,
Thanks for these comments. I agree that eco:isLeastSpecificTargetCategoryQuantityInclusive is quite a terrible name, but we arrived at it because we took two main things into consideration:
* it is needed to understand how to treat dwc:organismQuantity and dwc:organismQuantityType, so we thought "Quantity" should be there
* it allows for multiple target categories (e.g., taxonomic ranks within a higher rank, or different life stages of the same species), which is why we left it quite open
Regarding your larger question, this is a very interesting comment and I think we should cover it in the User Guide (https://docs.google.com/document/d/1rX4m94rtZDR_8iIe3RvRnNYKDJcmSX3ii4S5hCznEA0/edit). My initial thought is that the Extended Measurements or Facts extension (eMoF) could be useful here. Also, I think you are allowed to include your own terms when using the IPT, but I am not sure.
Best!
Yani
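Yani's first point, that the flag tells a data user how to treat dwc:organismQuantity, can be sketched in Python. This is only an illustration under an assumption: that a true value means the quantity recorded for the least specific category (for example a genus) already includes the quantities recorded for its more specific categories (for example its species), while false means the records are separate counts that must be added. The function name, the record layout and the counts are hypothetical and are not taken from the guidelines document.

```python
from typing import Iterable, Mapping


def total_quantity(records: Iterable[Mapping], least_specific: str, inclusive: bool) -> float:
    """Total organismQuantity for one target in one event.

    records: dicts with the hypothetical keys 'category' and 'organismQuantity'.
    least_specific: the category that sits above all the others (e.g. a genus).
    inclusive: assumed meaning of eco:isLeastSpecificTargetCategoryQuantityInclusive.
    """
    records = list(records)
    if inclusive:
        # The least specific record already contains the counts of its subcategories.
        return sum(r["organismQuantity"] for r in records if r["category"] == least_specific)
    # Otherwise every record is a separate count, so they all add up.
    return sum(r["organismQuantity"] for r in records)


# Made-up example: 10 recorded at genus level, 3 of a species within that genus.
rows = [
    {"category": "Turdus", "organismQuantity": 10},
    {"category": "Turdus merula", "organismQuantity": 3},
]
inclusive_total = total_quantity(rows, "Turdus", inclusive=True)
separate_total = total_quantity(rows, "Turdus", inclusive=False)
print(inclusive_total, separate_total)
```

The same two records give different totals depending on the flag, which is why the flag has to travel with the data.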
________________________________
From: tdwg-humboldt tdwg-humboldt-bounces@lists.tdwg.org on behalf of Rob Stevenson rdstevenson10@gmail.com
Sent: Friday, August 11, 2023 6:14 PM
To: Humboldt Core TG tdwg-humboldt@lists.tdwg.org
Cc: wmh6@cornell.edu
Subject: Re: [tdwg-humboldt] Inferring non-detection of target taxa of presence-only data using Humboldt Extension | Test dataset with updated terms good to go
Thanks for the documents Ming and Yani. It is nice to see the continued progress. Sorry I am not able to join this week's meeting.
A few comments which may not be useful after the meeting.
I did look at eco:isLeastSpecificTargetCategoryQuantityInclusive Guidelines here https://docs.google.com/document/d/11fPW4JibRWUnrtTOZwDkAcxo5d-lwnATXyh3xmO6...
The purpose is clear after reading the document, but I found the term name "isLeastSpecificTargetCategoryQuantityInclusive" non-intuitive.
Could the words "counts" and "preCalculated" be used somehow?
Near the beginning of the document there is a sentence that says
" need to be treated differently in order to calculate the total quantity of organisms in the least specific category. " I suggest saying "in the least specific taxonomic category", which will often be the species level.
A larger question: if someone has terms that the Event Extension does not support, what options do they have for sharing their data?
I was thinking specifically of Jon Sullivan's old email. Could the event extension documents provide general advice for people with terms like Jon's?
Best, Rob
Sullivan, Jon Jon.Sullivan@lincoln.ac.nz via lists.tdwg.org
Wed, Apr 5, 10:21 PM
to tdwg-humboldt@lists.tdwg.org
Hello fine Humboldt Core people,
I’ve been lurking on the mailing list and finally sat down and had a proper look at how all the current HC terms map onto my ecological surveying. Perhaps my feedback is still useful at this late stage. (Rob Stevenson invited me to do this while you were doing the case studies at the end of last year but I got swamped.)
I’m on something of a personal mission to document the changes in the readily detectable and identifiable species around me. My surveys hit the 20-year mark last Saturday and I’m at over 1.5 million observations. They’re mostly still in geotagged audio notes but I’ve had AWS Transcribe cranking through them over the last month and a half and I’m about half way through now. I’m hoping that my dataset is just the kind of ecological survey project that will benefit from the standardisation offered by Humboldt Core.
Anyway, with that preamble, below is a summary of how my data structure maps onto the draft Humboldt Core. My biggest puzzle is how to describe my unbounded distance-sampled transect counts within a structure that only seems to quantify the surveyed sites by area.
Cheers,
Jon
# Humboldt Core Terms useful as is:
samplingPerformedBy
identifiedBy
verbatimSiteNames
eventDuration [I use samplingDurationMinutes at the moment, which will translate easily.]
eventDurationUnit
targetTaxonomicScope
targetLifeStageScope
excludedLifeStageScope
targetDegreeOfEstablishmentScope [I'm hoping I can use a different vocabulary in here as I also use "endemic", as some of my surveys are only of New Zealand endemics and do not include the relatively recently established Australian natives that are now also NZ natives. I also find the concepts "invasive" and "widespread invasive" too slippery to use for species scope, so I use "naturalised" when I want to refer to wild exotic species.]
excludedDegreeOfEstablishmentScope
targetGrowthFormScope [e.g., when I'm surveying woody weeds, that would be targetTaxonomicScope == "Tracheophyta", targetDegreeOfEstablishmentScope == "naturalised", excludedDegreeOfEstablishmentScope == "native", targetGrowthFormScope == "tree|shrub|liane"]
targetHabitatScope [e.g., targetHabitatScope == "road|roadside" for my roadkill surveys]
reportedWeather
protocolNames
protocolDescription
protocolReferences
isAbundanceReported
hasVouchers
voucherInstitutions [although often my collected specimens are still in my personal collection, e.g., waiting in my plant press]
isSamplingEffortReported
samplingEffortValue [at the moment I'm using the DWC "samplingEffort", e.g., samplingEffort == "11.91 km in 244 minutes"]
taxonCompletenessReported ["reported complete"]
isTaxonomicScopeComplete
isLifeStageScopeComplete
isDegreeOfEstablishmentScopeComplete
isGrowthFormScopeComplete [is there an issue here if these fields are ever interpreted independently of one another? If I'm surveying woody weeds, then, in combination, TaxonomicScope == "Tracheophyta" and GrowthFormScope == "tree|shrub|liane" and DegreeOfEstablishmentScope == "naturalised" are complete. However, each independently is not complete, e.g., I didn't survey all TaxonomicScope == "Tracheophyta". Would it be better if there was just one isScopeComplete that applies to all the scopes listed? I can't think at the moment of a survey type that would list a series of Scope values and differ in whether or not ScopeComplete == "true" for each.]
# Humboldt Core Terms useful but with modification:
samplingEffortUnit [how do I untangle distance and duration in this field? If I'm exploring at a site, I have the distance from my GPS and the duration. I've been regarding those in combination as my sampling effort for a site.]
isLeastSpecificTargetCategoryQuantityInclusive [I have a more complicated approach to this where I can mark individuals as "different", "possibly same", and "same", for possible or definite resightings. That lets me calculate a maximum and minimum range for my count of individuals of a species at a site. I'm not sure how to handle this here.]
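As an illustration of the "untangling" Jon asks about, a combined effort string like his samplingEffort == "11.91 km in 244 minutes" could be split into a distance and a duration before deciding where each belongs. The sketch below is in Python and assumes the string always follows the "<number> km in <number> minutes" pattern; the function name is hypothetical, and the output keys simply reuse Jon's own field names. It does not settle which Humboldt term should hold the distance.

```python
import re

# Assumed pattern: "<distance> km in <duration> minutes", as in Jon's example.
EFFORT_PATTERN = re.compile(r"^\s*([\d.]+)\s*km\s+in\s+([\d.]+)\s*minutes\s*$")


def split_effort(sampling_effort: str) -> dict:
    """Split a combined effort string into separate distance and duration values."""
    match = EFFORT_PATTERN.match(sampling_effort)
    if match is None:
        raise ValueError(f"Unrecognised samplingEffort format: {sampling_effort!r}")
    distance_km, duration_minutes = (float(group) for group in match.groups())
    return {
        "samplingDistanceKm": distance_km,
        "samplingDurationMinutes": duration_minutes,
    }


effort = split_effort("11.91 km in 244 minutes")
print(effort)
```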
# Concepts in my wildcounts data that seem to be missing from Humboldt Core Terms:
geospatialScopeDistanceInKilometers [most of my surveys are distance-sampling along unbounded transects, for which I have distance not area. Also, when I'm exploring sites, I record with my GPS the total distance I travel while surveying the site. In both cases, sampling distance is more relevant for my data than sampling area. I'm not sampling with plots. I need somewhere standard to put my samplingDistanceKm.]
targetPhenologyScope [e.g., often in my repeat surveys I'm just mapping out individual plants that are currently flowering or fruiting. The rest get ignored. I call this whatsoughtReproductiveCondition in my data.]
targetSeenHeardScope [e.g., if I'm in a car, I'm only counting the birds I see, while I also include the birds I hear when I'm biking. Similarly, if I was extracting data from my AudioMoth that runs in our garden, that would be all birds heard only.]
targetWildCaptiveScope [sometimes I only survey the wild individuals. This is an important distinction to make when surveying weeds in urban areas.]
targetLargestBodyLengthScope [e.g., sometimes I'm just surveying big birds, or big butterflies, such as when I'm surveying from a moving car. I have a minimum and maximum value for whatsoughtLongestBodyDimensionMetres in my data to capture this.]
targetDeadOrAliveScope [e.g., when surveying roadkill they're all dead. I can be surveying dead birds along a drive but not surveying all live birds.]
whatsoughtBodyCondition (when I'm repeat-surveying roadkill along standard routes, I'm only counting the fresh carcasses, so in my data whatsoughtBodyCondition == "dead carcass <24 hours old". I don't expect Humboldt Core to contain terms for details like this, but I'm also not sure how to handle them in Humboldt Core data.)
samplingDurationMinutes [this will be samplingEffortValue and samplingEffortUnit when there is somewhere else to put my sampled distance]
predetermined [sometimes I might hear an interesting bird and go outside and do a checklist including that bird. This, in my data, is a survey with predetermined == "false". If I do a planned survey at a planned time, irrespective of the conditions and species present, then that's predetermined == "true". In my data, predetermined can be assigned both at a whole survey level and at a howSought level for each Scope (e.g., if I went outside to look for an interesting bird I heard, and decided to survey butterflies while I'm at it, then I'm howSoughtPredetermined == "false" for my bird list but howSoughtPredetermined == "true" for my butterfly list). I'm not sure where that concept fits into Humboldt Core.]
# Humboldt Core Terms not used by me (at the moment):
identificationReferences [I use scientificNameAuthorship and namePublishedIn and namePublishedInYear from Darwin Core in my species database as a source for each species concept, so perhaps that works for this?]
siteCount [this could be calculated for each of my surveys, usually by summing up sections of my transects]
siteNestingDescription [I don't use this now but could add a generic text description of my method into here. Most of my surveys have a footprintWKT LINESTRING of where I searched, and these can optionally have nested subsets of linestrings in different habitats, or sections of a transect, or time periods. I could describe this in text but it will also be apparent from the data.]
verbatimSiteDescriptions
geospatialScopeAreaInSquareKilometers [all my surveys in recent years are unbounded distance-sampled transects. I use a footprintWKT LINESTRING generated from a GPX file to define the transect. For some taxa, I restrict the width of the distance sampled, and so could calculate an area. For most (e.g., birds), my observations are all counts in estimated distance bands, including as far away as I can see and hear with my unaided eye and ear. A geospatialScopeDistanceInKilometers would work much better for my surveys than area. At the moment I'm using a LINESTRING footprintWKT and samplingDistanceKm.]
totalAreaSampledInSquareKilometers [comment as above]
reportedExtremeConditions
compilationType
compilationSourceTypes
inventoryTypes [At the moment I don't understand what this refers to. If I do a distance-sampling transect and count and map out all butterflies, which I do, is that inventoryTypes == "open search"? Or does that mean something else?]
isAbundanceCapReported [I suppose I can include this always as isAbundanceCapReported == "FALSE"]
abundanceCap
isVegetationCoverReported
isAbsenceReported [lots of absences can be inferred from my surveys--that's one of the reasons I do them--but I don't have data like "blackbird = 0". I'm assuming that's the kind of data you're meaning by isAbsenceReported == "true"]
absentTaxa [although all absent taxa can be inferred from the target fields like targetTaxonomicScope. I assume you're not expecting impossibly long lists of absent taxa with every survey dataset. It's more efficient to say targetTaxonomicScope == "Aves" and just list the birds seen and heard.]
hasMaterialSamples
materialSampleTypes
samplingEffortProtocol [I don't see the difference between this and samplingEffortValue for my surveys, when my survey method has been described in protocolDescription]
taxonCompletenessProtocols [I'm confused at the moment by how to describe my sampling method across protocolDescription, samplingEffortProtocol, and taxonCompletenessProtocols. I sense that taxonCompletenessProtocols is trying to get at whether a survey has made an estimate of its detection probability for a surveyed taxon, and how that was assessed, and whether this influenced the sampling effort. In my case, for example, I might bike a 20 km route and distance count/map all birds I see and hear. That's a good fit for the protocolDescription. I'm not sure what I would then need to state for the samplingEffortProtocol and the taxonCompletenessProtocols here. Do I ignore them? Detection probability can be modelled from my data but that doesn't influence how quickly I'm biking or how far I'm biking, so I'm guessing I have no taxonCompletenessProtocol.]
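Jon's remark that absent taxa can be inferred from the target scope, rather than listed one by one, can be sketched as a set difference. The Python below is a minimal illustration, not a method from the Humboldt documentation. It assumes the data user has a reference checklist of taxa that fall inside the declared targetTaxonomicScope for the survey area (something the scope terms alone do not supply), and it only infers non-detections when the scope is reported as completely surveyed. The function name, the parameter names and the three-species checklist are hypothetical.

```python
def infer_non_detections(candidate_taxa, detected_taxa, scope_complete):
    """Return the taxa within the target scope that were sought but not detected.

    candidate_taxa: assumed reference checklist of taxa inside the target scope.
    detected_taxa: taxa actually recorded during the event.
    scope_complete: stands in for a flag such as isTaxonomicScopeComplete.
    """
    if not scope_complete:
        # Without a complete scope, a missing taxon may simply not have been recorded.
        return set()
    return set(candidate_taxa) - set(detected_taxa)


# Illustrative checklist for targetTaxonomicScope == "Aves" (not a real regional list).
checklist = {"Turdus merula", "Prunella modularis", "Zosterops lateralis"}
seen_or_heard = {"Turdus merula"}

not_detected = infer_non_detections(checklist, seen_or_heard, scope_complete=True)
print(sorted(not_detected))
```

Keeping the checklist outside the dataset is what avoids the "impossibly long lists of absent taxa" that Jon mentions.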
# Examples of some details of my method that probably just don't belong as Humboldt Core Terms:
gpsSource
visualAid
auditoryAid
dataEntryHardware
dataEntrySoftware
distance bands from observer
On Wed, Aug 9, 2023 at 7:23 AM Yanina Sica <yanina.sica@gmail.com> wrote:
yuhuuu!!! amazing progress! Thanks Ming and John!
I hope everybody received my invitation to today's meeting.
I might be a couple of minutes late but hope to see everybody soon!
Cheers Yani
On Mon, Aug 7, 2023 at 6:33 PM Yi Ming Gan <ymgan@naturalsciences.be> wrote:
Hi all,
I have done the exercise mentioned in subject. Please see the rendered html with this link: https://raw.githack.com/biodiversity-aq/humboldt-for-eco-survey-data/main/sr... The repository is public: https://github.com/biodiversity-aq/humboldt-for-eco-survey-data I hope some of the remarks and questions are useful!
I think the guiding principles are fine after last Wednesday's long meeting (special thanks to Yani, Ani and Wesley). The dataset based on the latest term names (from last Wednesday) is good to go. I can update the dataset in the test IPT https://ipt.gbif.org/resource?r=brokewest-fish when the new terms are up in the sandbox.
Cheers Ming
-- Robert D Stevenson Associate Professor Department of Biology UMass Boston
For Events, the Extended Measurements or Facts extension is the only way to declare and share properties outside of the mappable terms. For Occurrences, one also has the option to use dwc:dynamicProperties.
In the IPT, one can upload a file with fields beyond those that are mapped, but only the mapped fields plus those that are set to a constant within the IPT get propagated in the output.
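To illustrate the Extended Measurements or Facts route, here is a rough Python sketch that turns survey fields with no mappable term into eMoF rows linked to an event. The column names eventID, measurementType, measurementValue and measurementUnit are eMoF terms; everything else is an assumption made for the example: the function name, the event identifier "survey-001", the choice of Jon's samplingDistanceKm and visualAid as the extra fields, and the value "binoculars".

```python
import csv
import io

EMOF_COLUMNS = ["eventID", "measurementType", "measurementValue", "measurementUnit"]


def to_emof_rows(event_id, extra_fields):
    """Build one eMoF row per extra field; extra_fields maps name -> (value, unit)."""
    rows = []
    for name, (value, unit) in extra_fields.items():
        rows.append({
            "eventID": event_id,
            "measurementType": name,
            "measurementValue": str(value),
            "measurementUnit": unit,
        })
    return rows


# Hypothetical extra fields for one survey event.
extras = {
    "samplingDistanceKm": (11.91, "km"),
    "visualAid": ("binoculars", ""),
}
emof_rows = to_emof_rows("survey-001", extras)

# Write the rows as tab-separated text, as they might appear in an extension file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=EMOF_COLUMNS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(emof_rows)
print(buffer.getvalue())
```

Because each row is mapped to eMoF terms, it survives publication through the IPT, unlike unmapped columns in the uploaded file.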
Correction, as Tim pointed out to me, dwc:dynamicProperties can also be used in the Event Core.
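As a small illustration of the dwc:dynamicProperties option, extra survey details could be packed into a single JSON string on the Event record. The Python sketch below assumes JSON as the key-value encoding; the function name, the event identifier and the two example properties (taken from the field names in Jon's email) are hypothetical.

```python
import json


def build_dynamic_properties(extra_fields: dict) -> str:
    """Serialise extra fields into one JSON string for dwc:dynamicProperties."""
    return json.dumps(extra_fields, sort_keys=True)


# Hypothetical Event record carrying two details that have no mappable term.
event_record = {
    "eventID": "survey-001",
    "dynamicProperties": build_dynamic_properties({
        "predetermined": False,
        "samplingDistanceKm": 11.91,
    }),
}
print(event_record["dynamicProperties"])
```

Compared with eMoF rows, this keeps everything on one record, at the cost of values that a data user has to parse out of a string.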
On Mon, Aug 14, 2023 at 10:46 AM John Wieczorek tuco@berkeley.edu wrote:
For Events, the Extended Measurements or Facts extension is the only way to declare and share properties outside of the mappable terms. For Occurrences, one also has the option to use dwc:dynamicProperties.
In the IPT, one can upload a file with fields beyond those that are mapped, but only the mapped fields plus those that are set to a constant within the IPT get propagated in the output.
On Sun, Aug 13, 2023 at 1:16 PM ys628 yanina.sica@yale.edu wrote:
Hi Rob, Thanks for these comments. I agree that eco:isLeastSpecificTargetCategoryQuantityInclusive is quite a terrible name, but we arrived at it because we took two main things into consideration:
- it is needed to understand how to treat dwc:organismQuantity and dwc:organismQuantityType, so we thought Quantity should be there
- it allows for multiple target categories (e.g., taxonomic ranks within a higher rank, or different life stages for the same species), so that's why we left it quite open...
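To make the first point concrete, here is a minimal sketch of how a data consumer might total dwc:organismQuantity for the least specific category. It assumes one reading of the flag: when it is true, the record for the least specific category already includes the more specific ones; when it is false, all records have to be added up. The record layout, the category labels and the counts are invented for illustration.

```python
def total_for_least_specific(records, least_specific, inclusive):
    """Total organismQuantity for the least specific target category.

    records: list of dicts with 'category' and 'organismQuantity' keys,
             all assumed to belong to the least specific category.
    inclusive: assumed meaning of isLeastSpecificTargetCategoryQuantityInclusive.
    """
    if inclusive:
        # The least specific record is assumed to hold the full total already.
        return sum(r["organismQuantity"] for r in records
                   if r["category"] == least_specific)
    # Otherwise each record is a separate count, so all of them are summed.
    return sum(r["organismQuantity"] for r in records)


# Hypothetical counts for one species split by life stage.
inclusive_records = [
    {"category": "Turdus merula", "organismQuantity": 10},
    {"category": "Turdus merula (adult)", "organismQuantity": 6},
    {"category": "Turdus merula (juvenile)", "organismQuantity": 4},
]
exclusive_records = [
    {"category": "Turdus merula", "organismQuantity": 3},
    {"category": "Turdus merula (adult)", "organismQuantity": 6},
    {"category": "Turdus merula (juvenile)", "organismQuantity": 4},
]

inclusive_total = total_for_least_specific(inclusive_records, "Turdus merula", True)
exclusive_total = total_for_least_specific(exclusive_records, "Turdus merula", False)
print(inclusive_total, exclusive_total)
```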
Regarding your larger question, this is a very interesting comment and I think we should cover this in the User Guide https://docs.google.com/document/d/1rX4m94rtZDR_8iIe3RvRnNYKDJcmSX3ii4S5hCznEA0/edit. My initial thought is that the extended measurements or facts extension (emof) could be useful here. Also, I think you are allowed to include your own terms when using the IPT, but I am not sure.
Best!
Yani
*From:* tdwg-humboldt tdwg-humboldt-bounces@lists.tdwg.org on behalf of Rob Stevenson rdstevenson10@gmail.com *Sent:* Friday, August 11, 2023 6:14 PM *To:* Humboldt Core TG tdwg-humboldt@lists.tdwg.org *Cc:* wmh6@cornell.edu wmh6@cornell.edu *Subject:* Re: [tdwg-humboldt] Inferring non-detection of target taxa of presence-only data using Humboldt Extension | Test dataset with updated terms good to go
Thanks for the documents Ming and Yani. It is nice to see the continued progress. Sorry I am not able to join this week's meeting.
A few comments which may not be useful after the meeting.
I did look at eco:isLeastSpecificTargetCategoryQuantityInclusive Guidelines here https://docs.google.com/document/d/11fPW4JibRWUnrtTOZwDkAcxo5d-lwnATXyh3xmO6...
The purpose is clear after reading the document, but I found the term name "isLeastSpecificTargetCategoryQuantityInclusive" non-intuitive.
Could the words "counts" and "preCalculated" be used somehow?
At the beginning of the document there is a sentence that says
" need to be treated differently in order to calculate the total quantity of organisms in the *least specific category*. " I suggest saying "in the *least specific taxonomic category*", which will often be the species level.
A larger question: if someone has terms that the Event Extension does not support, what options do they have for sharing their data?
I was thinking specifically of Jon Sullivan's old email. Could the event extension documents provide general advice for people with terms like Jon's?
Best, Rob
Sullivan, Jon Jon.Sullivan@lincoln.ac.nz via lists.tdwg.org Wed, Apr 5, 10:21 PM to tdwg-humboldt@lists.tdwg.org
Hello fine Humboldt Core people,
I’ve been lurking on the mailing list and finally sat down and had a proper look at how all the current HC terms map onto my ecological surveying. Perhaps my feedback is still useful at this late stage. (Rob Stevenson invited me to do this while you were doing the case studies at the end of last year but I got swamped.)
I’m on something of a personal mission to document the changes in the readily detectable and identifiable species around me. My surveys hit the 20-year mark last Saturday and I’m at over 1.5 million observations. They’re mostly still in geotagged audio notes but I’ve had AWS Transcribe cranking through them over the last month and a half and I’m about half way through now. I’m hoping that my dataset is just the kind of ecological survey project that will benefit from the standardisation offered by Humboldt Core.
Anyway, with that preamble, below is a summary of how my data structure maps onto the draft Humboldt Core. My biggest puzzle is how to describe my unbounded distance-sampled transect counts within a structure that only seems to quantify the surveyed sites by area.
Cheers,
Jon
# Humboldt Core Terms useful as is:
- *samplingPerformedBy*
- *identifiedBy*
- *verbatimSiteNames*
- *eventDuration* [I use samplingDurationMinutes at the moment, which will translate easily.]
- *eventDurationUnit*
- *targetTaxonomicScope*
- *targetLifeStageScope*
- *excludedLifeStageScope*
- *targetDegreeOfEstablishmentScope* [I'm hoping I can use a different vocabulary in here as I also use "endemic", as some of my surveys are only of New Zealand endemics and do not include relatively recently established Australian natives that are now also NZ natives. I also find the concepts "invasive" and "widespread invasive" too slippery to use for species scope, so I use "naturalised" when I want to refer to wild exotic species.]
- *excludedDegreeOfEstablishmentScope*
- *targetGrowthFormScope* [eg when I'm surveying woody weeds, that would be targetTaxonomicScope == "Tracheophyta", targetDegreeOfEstablishmentScope == "naturalised", excludedDegreeOfEstablishmentScope == "native", targetGrowthFormScope == "tree|shrub|liane"]
- *targetHabitatScope* [eg targetHabitatScope == "road|roadside" for my roadkill surveys]
- *reportedWeather*
- *protocolNames*
- *protocolDescription*
- *protocolReferences*
- *isAbundanceReported*
- *hasVouchers*
- *voucherInstitutions* [although often my collected specimens are still in my personal collection, eg waiting in my plant press]
- *isSamplingEffortReported*
- *samplingEffortValue* [at the moment I'm using the DWC "samplingEffort", eg samplingEffort == "11.91 km in 244 minutes"]
- *taxonCompletenessReported* ["reported complete"]
- *isTaxonomicScopeComplete*
- *isLifeStageScopeComplete*
- *isDegreeOfEstablishmentScopeComplete*
- *isGrowthFormScopeComplete* [is there an issue here if these fields are ever interpreted independently of one another? If I'm surveying woody weeds, then, in combination, TaxonomicScope == "Tracheophyta" and LifeStageScope == "tree|shrub|liane" and DegreeOfEstablishmentScope == "naturalised" are complete. However, each independently is not complete, e.g., I didn't survey all TaxonomicScope == "Tracheophyta". Would it be better if there was just one isScopeComplete and that applies to all the scopes listed? I can't think at the moment of a survey type that would list a series of Scope values and differ in whether or not ScopeComplete == "true" for each.]
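The combined samplingEffort string quoted above ("11.91 km in 244 minutes") could be split into separate distance and duration values along these lines. This is only a sketch: the pattern handles that one phrasing, and the output keys samplingDistanceKm and samplingDurationMinutes are the names Jon uses for his own data, not Humboldt terms.

```python
import re

# Matches strings shaped like "<number> km in <number> minutes" only.
EFFORT_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*km\s+in\s+(\d+(?:\.\d+)?)\s*minutes\s*$"
)


def split_sampling_effort(text):
    """Split a combined effort string into distance (km) and duration (minutes)."""
    match = EFFORT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised samplingEffort string: {text!r}")
    return {
        "samplingDistanceKm": float(match.group(1)),
        "samplingDurationMinutes": float(match.group(2)),
    }


effort = split_sampling_effort("11.91 km in 244 minutes")
print(effort)
```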
# Humboldt Core Terms useful but with modification:
- *samplingEffortUnit* [how do I untangle distance and duration in this field? If I'm exploring at a site, I have the distance from my GPS and the duration. I've been regarding those in combination as my sampling effort for a site.]
- *isLeastSpecificTargetCategoryQuantityInclusive* [I have a more complicated approach to this where I can mark individuals as "different", "possibly same", and "same", for possible or definite resightings. That lets me calculate a maximum and minimum range for my count of individuals of a species at a site. I'm not sure how to handle this here.]
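One way to read the "different" / "possibly same" / "same" marking is sketched below: the minimum count treats every possible resighting as a resighting, and the maximum treats every possible resighting as a new individual. This is an interpretation for illustration, not necessarily how the wildcounts data are actually processed, and the example marks are invented.

```python
from collections import Counter

VALID_MARKS = {"different", "possibly same", "same"}


def count_range(resighting_marks):
    """Return (minimum, maximum) individuals from per-observation resighting marks.

    "different"     = a new individual
    "possibly same" = may be a resighting of an earlier individual
    "same"          = a definite resighting
    """
    tally = Counter(resighting_marks)
    unknown = set(tally) - VALID_MARKS
    if unknown:
        raise ValueError(f"unexpected marks: {sorted(unknown)}")
    minimum = tally["different"]
    maximum = tally["different"] + tally["possibly same"]
    return minimum, maximum


# Hypothetical marks for one species at one site.
marks = ["different", "different", "possibly same", "same", "different"]
low, high = count_range(marks)
print(low, high)
```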
# Concepts in my wildcounts data that seem to be missing from Humboldt Core Terms:
- *geospatialScopeDistanceInKilometers* [most of my surveys are distance-sampling along unbounded transects, for which I have distance not area. Also, when I'm exploring sites, I record with my GPS the total distance I travel while surveying the site. In both cases, sampling distance is more relevant for my data than sampling area. I'm not sampling with plots. I need somewhere standard to put my samplingDistanceKm.]
- *targetPhenologyScope* [eg often in my repeat surveys I'm just mapping out individual plants that are currently flowering or fruiting. The rest get ignored. I call this whatsoughtReproductiveCondition in my data.]
- *targetSeenHeardScope* [e.g., if I'm in a car, I'm only counting the birds I see, while I also include the birds I hear when I'm biking. Similarly, if I was extracting data from my AudioMoth that runs in our garden, that would be all birds heard only.]
- *targetWildCaptiveScope* [sometimes I only survey the wild individuals. This is an important distinction to make when surveying weeds in urban areas.]
- *targetLargestBodyLengthScope* [eg sometimes I'm just surveying big birds, or big butterflies, such as when I'm surveying from a moving car. I have a minimum and maximum value for whatsoughtLongestBodyDimensionMetres in my data to capture this.]
- *targetDeadOrAliveScope* [eg when surveying roadkill they're all dead. I can be surveying dead birds along a drive but not surveying all live birds.]
- *whatsoughtBodyCondition* [when I'm repeatedly surveying roadkill along standard routes, I'm only counting the fresh carcasses, so in my data whatsoughtBodyCondition == "dead carcass <24 hours old". I don't expect Humboldt Core to contain terms for details like this, but I'm also not sure how to handle them in Humboldt Core data.]
- *samplingDurationMinutes* [this will be samplingEffortValue and samplingEffortUnit when there is somewhere else to put my sampled distance]
- *predetermined* [sometimes I might hear an interesting bird and go outside and do a checklist including that bird. This, in my data, is a survey with predetermined=="false". If I do a planned survey at a planned time, irrespective of the conditions and species present, then that's predetermined=="true". In my data, predetermined can be assigned both at a whole survey level and at a howSought level for each Scope (eg if I went outside to look for an interesting bird I heard, and decided to survey butterflies while I'm at it, then I'm howSoughtPredetermined == "false" for my bird list but howSoughtPredetermined == "true" for my butterfly list). I'm not sure where that concept fits into Humboldt Core.]
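For the sampling-distance concept above, a distance in kilometres can be derived from a transect geometry. The sketch below assumes the transect is a WKT LINESTRING of longitude/latitude pairs (as in the footprintWKT mentioned later in this message) and sums great-circle segment lengths with the haversine formula. The parser is deliberately minimal, the coordinates are invented, and a GIS library would be the more robust choice in practice.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def parse_linestring(wkt):
    """Parse 'LINESTRING (lon lat, lon lat, ...)' into a list of (lon, lat) floats."""
    text = wkt.strip()
    if not text.upper().startswith("LINESTRING"):
        raise ValueError("not a LINESTRING")
    body = text[text.index("(") + 1:text.rindex(")")]
    points = []
    for pair in body.split(","):
        lon, lat = pair.split()[:2]
        points.append((float(lon), float(lat)))
    return points


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def linestring_length_km(wkt):
    """Sum the segment lengths of a lon/lat LINESTRING."""
    points = parse_linestring(wkt)
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))


# Hypothetical transect running due south along one meridian.
transect = "LINESTRING (172.0 -43.0, 172.0 -43.1, 172.0 -43.2)"
sampling_distance_km = linestring_length_km(transect)
print(round(sampling_distance_km, 2))
```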
# Humboldt Core Terms not used by me (at the moment):
- *identificationReferences* [I use scientificNameAuthorship and namePublishedIn and namePublishedInYear from Darwin Core in my species database as a source for each species concept, so perhaps that works for this?]
- *siteCount* [this could be calculated for each of my surveys, usually by summing up sections of my transects]
- *siteNestingDescription* [I don't use this now but could add a generic text description of my method into here. Most of my surveys have a footprintWKT LINESTRING of where I searched, and these can optionally have nested subsets of linestrings in different habitats, or sections of a transect, or time periods. I could describe this in text but it will also be apparent from the data.]
- *verbatimSiteDescriptions*
- *geospatialScopeAreaInSquareKilometers* [all my surveys in recent years are unbounded distance-sampled transects. I use a footprintWKT LINESTRING generated from a GPX file to define the transect. For some taxa, I restrict the width of the distance sampled, and so could calculate an area. For most (eg birds), my observations are all counts in estimated distance bands, including as far away as I can see and hear with my unaided eye and ear. A geospatialScopeDistanceInKilometers would work much better for my surveys than area. At the moment I'm using a LINESTRING footprintWKT and samplingDistanceKm.]
- *totalAreaSampledInSquareKilometers* [comment as above]
- *reportedExtremeConditions*
- *compilationType*
- *compilationSourceTypes*
- *inventoryTypes* [At the moment I don't understand what this refers to. If I do a distance-sampling transect and count and map out all butterflies, which I do, is that inventoryTypes == "open search"? Or does that mean something else?]
- *isAbundanceCapReported* [I suppose I can include this always as isAbundanceCapReported == "FALSE"]
- *abundanceCap*
- *isVegetationCoverReported*
- *isAbsenceReported* [lots of absences can be inferred from my surveys--that's one of the reasons I do them--but I don't have data like "blackbird = 0". I'm assuming that's the kind of data you're meaning by isAbsenceReported == "true"]
- *absentTaxa* [although all absent taxa can be inferred from the target fields like targetTaxonomicScope. I assume you're not expecting impossibly long lists of absent taxa with every survey dataset. It's more efficient to say targetTaxonomicScope == "Aves" and just list the birds seen and heard.]
- *hasMaterialSamples*
- *materialSampleTypes*
- *samplingEffortProtocol* [I don't see the difference between this and samplingEffortValue for my surveys, when my survey method has been described in protocolDescription]
- *taxonCompletenessProtocols* [I'm confused at the moment by how to describe my sampling method across protocolDescription, samplingEffortProtocol, and taxonCompletenessProtocols. I sense that taxonCompletenessProtocols is trying to get at whether a survey has made an estimate of its detection probability for a surveyed taxon, and how that was assessed, and whether this influenced the sampling effort. In my case, for example, I might bike a 20 km route and distance count/map all birds I see and hear. That's a good fit for the protocolDescription. I'm not sure what I would then need to state for the samplingEffortProtocol and the taxonCompletenessProtocols here. Do I ignore them? Detection probability can be modelled from my data but that doesn't influence how quickly I'm biking or how far I'm biking, so I'm guessing I have no taxonCompletenessProtocol.]
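The point about inferring absences from the target scope rather than listing them can be sketched as a set difference. The sketch assumes the scope (for example targetTaxonomicScope == "Aves") has already been expanded into a checklist of taxa that could have been detected, which is a step the terms themselves do not provide. The checklist, the detections and the function name are invented for illustration.

```python
def infer_non_detections(target_taxa, detected_taxa, scope_complete=True):
    """List taxa in the target scope that were not detected during a survey.

    Returns an empty list when the scope was not surveyed completely,
    because a missing record then says nothing about non-detection.
    """
    if not scope_complete:
        return []
    detected = set(detected_taxa)
    return sorted(taxon for taxon in set(target_taxa) if taxon not in detected)


# Hypothetical checklist standing in for targetTaxonomicScope == "Aves".
checklist = ["Turdus merula", "Passer domesticus", "Prosthemadera novaeseelandiae"]
detected = ["Passer domesticus"]

non_detections = infer_non_detections(checklist, detected)
print(non_detections)
```

With a presence-only dataset, the inference only holds when the scope-completeness fields say the listed scope was fully reported.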
# Examples of details of my method that probably just don't belong as Humboldt Core Terms
- gpsSource
- visualAid
- auditoryAid
- dataEntryHardware
- dataEntrySoftware
- distance bands from observer
On Wed, Aug 9, 2023 at 7:23 AM Yanina Sica yanina.sica@gmail.com wrote:
yuhuuu!!! amazing progress! Thanks Ming and John!
I hope everybody received my invitation to today's meeting.
I might be a couple of minutes late but hope to see everybody soon!
Cheers Yani
-- Robert D Stevenson Associate Professor Department of Biology UMass Boston
participants (11)
- Baskauf, Steven James
- Brenton, Peter (NCMI, Black Mountain)
- Dmitry Schigel
- John Wieczorek
- Kate Ingenloff
- Myriatrix Admin
- Rob Stevenson
- Yanina Sica
- Yi Ming Gan
- ys628
- Zachary Kachian