[tdwg-humboldt] Inferring non-detection of target taxa of presence-only data using Humboldt Extension | Test dataset with updated terms good to go

Sun Aug 13 16:16:17 UTC 2023

Hi Rob,
Thanks for these comments.
I agree that eco:isLeastSpecificTargetCategoryQuantityInclusive is quite a terrible name, but we arrived at it because we took two main things into consideration:

  *   it is needed to understand how to treat dwc:organismQuantity and dwc:organismQuantityType, so we thought Quantity should be in the name
  *   it allows for multiple target categories (e.g., taxonomic ranks within a higher rank or different life stages for the same species), so that's why we left it quite open...
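As a rough illustration of why the flag matters when reading dwc:organismQuantity, here is a minimal Python sketch. It assumes that a true value means the quantity recorded for the least specific category already includes the more specific categories, and that a false value means the records have to be summed. The record layout, the made-up counts and the function name total_quantity are hypothetical and not part of the extension.

```python
def total_quantity(records, least_specific, inclusive):
    """Total organismQuantity for the least specific target category.

    records: list of dicts with hypothetical keys "category" and
    "organismQuantity" (one dict per occurrence record in an event).
    inclusive: assumed meaning of
    eco:isLeastSpecificTargetCategoryQuantityInclusive.
    """
    if inclusive:
        # The least specific record is assumed to already hold the total.
        for rec in records:
            if rec["category"] == least_specific:
                return rec["organismQuantity"]
        raise ValueError("no record for the least specific category")
    # Otherwise every record is assumed to count different organisms.
    return sum(rec["organismQuantity"] for rec in records)


# Made-up counts for one event: a genus plus two of its species.
event = [
    {"category": "Genus A", "organismQuantity": 10},
    {"category": "Species A1", "organismQuantity": 4},
    {"category": "Species A2", "organismQuantity": 3},
]
total_if_inclusive = total_quantity(event, "Genus A", inclusive=True)
total_if_exclusive = total_quantity(event, "Genus A", inclusive=False)
```

The same three records give different totals depending on the flag, which is why the quantity needs to be interpreted together with it.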

Regarding your larger question, this is a very interesting comment and I think we should cover this in the User Guide<https://docs.google.com/document/d/1rX4m94rtZDR_8iIe3RvRnNYKDJcmSX3ii4S5hCznEA0/edit>. My initial thought is that the Extended Measurement or Fact extension (eMoF) could be useful here. Also, I think you are allowed to include your own terms when using the IPT, but I am not sure.

Best!

Yani

________________________________
From: tdwg-humboldt <tdwg-humboldt-bounces at lists.tdwg.org> on behalf of Rob Stevenson <rdstevenson10 at gmail.com>
Sent: Friday, August 11, 2023 6:14 PM
To: Humboldt Core TG <tdwg-humboldt at lists.tdwg.org>
Cc: wmh6 at cornell.edu <wmh6 at cornell.edu>
Subject: Re: [tdwg-humboldt] Inferring non-detection of target taxa of presence-only data using Humboldt Extension | Test dataset with updated terms good to go

Thanks for the documents Ming and Yani. It is nice to see the continued progress.  Sorry I am not able to join this week's meeting.

A few comments which may not be useful after the meeting.

I did look at the eco:isLeastSpecificTargetCategoryQuantityInclusive guidelines here: https://docs.google.com/document/d/11fPW4JibRWUnrtTOZwDkAcxo5d-lwnATXyh3xmO65rs/edit

The purpose is clear after reading the document, but I found the term name "isLeastSpecificTargetCategoryQuantityInclusive" non-intuitive.

Could the words "counts" and "preCalculated" be used somehow?

In reading the beginning of the document there is a sentence that says

" need to be treated differently in order to calculate the total quantity of organisms in the least specific category. "
I suggest saying "in the least specific taxonomic category" that will often be the species level.

A larger question: if someone has terms that the Event Extension does not support, what options does that person have for sharing the data?

I was thinking specifically of Jon Sullivan's old email. Could the event extension documents provide general advice for people with terms like Jon?

Best, Rob

Sullivan, Jon <Jon.Sullivan at lincoln.ac.nz> via lists.tdwg.org
Wed, Apr 5, 10:21 PM
to tdwg-humboldt at lists.tdwg.org
Hello fine Humboldt Core people,

I’ve been lurking on the mailing list and finally sat down and had a proper look at how all the current HC terms map onto my ecological surveying. Perhaps my feedback is still useful at this late stage. (Rob Stevenson invited me to do this while you were doing the case studies at the end of last year but I got swamped.)

I’m on something of a personal mission to document the changes in the readily detectable and identifiable species around me. My surveys hit the 20-year mark last Saturday and I’m at over 1.5 million observations. They’re mostly still in geotagged audio notes but I’ve had AWS Transcribe cranking through them over the last month and a half and I’m about half way through now. I’m hoping that my dataset is just the kind of ecological survey project that will benefit from the standardisation offered by Humboldt Core.

Anyway, with that preamble, below is a summary of how my data structure maps onto the draft Humboldt Core. My biggest puzzle is how to describe my unbounded distance-sampled transect counts within a structure that only seems to quantify the surveyed sites by area.

Cheers,

Jon

# Humboldt Core Terms useful as is:

samplingPerformedBy
identifiedBy
verbatimSiteNames
eventDuration [I use samplingDurationMinutes at the moment, which will translate easily.]
eventDurationUnit
targetTaxonomicScope
targetLifeStageScope
excludedLifeStageScope
targetDegreeOfEstablishmentScope [I'm hoping I can use a different vocabulary in here as I also use "endemic", as some of my surveys are only of New Zealand endemics and do not include relatively recently established Australian natives that are now also NZ natives. I also find the concepts "invasive" and "widespread invasive" too slippery to use for species scope, so I use "naturalised" when I want to refer to wild exotic species.]
excludedDegreeOfEstablishmentScope
targetGrowthFormScope [eg when I'm surveying woody weeds, that would be targetTaxonomicScope == "Tracheophyta", targetDegreeOfEstablishmentScope == "naturalised", excludedDegreeOfEstablishmentScope == "native", targetGrowthFormScope == "tree|shrub|liane"]
targetHabitatScope [eg targetHabitatScope == "road|roadside" for my roadkill surveys]
reportedWeather
protocolNames
protocolDescription
protocolReferences
isAbundanceReported
hasVouchers
voucherInstitutions [although often my collected specimens are still in my personal collection, eg waiting in my plant press]
isSamplingEffortReported
samplingEffortValue [at the moment I'm using the DWC "samplingEffort", eg samplingEffort == "11.91 km in 244 minutes"]
taxonCompletenessReported ["reported complete"]
isTaxonomicScopeComplete
isLifeStageScopeComplete
isDegreeOfEstablishmentScopeComplete
isGrowthFormScopeComplete [is there an issue here if these fields are ever interpreted independently of one another? If I'm surveying woody weeds, then, in combination, TaxonomicScope == "Tracheophyta" and GrowthFormScope == "tree|shrub|liane" and DegreeOfEstablishmentScope == "naturalised" are complete. However, each independently is not complete, e.g., I didn't survey all TaxonomicScope == "Tracheophyta". Would it be better if there was just one isScopeComplete that applies to all the scopes listed? I can't think at the moment of a survey type that would list a series of Scope values and differ in whether or not ScopeComplete == "true" for each.]
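
The point about the scope fields only being complete in combination can be illustrated with a minimal Python sketch. It assumes pipe-delimited scope values as in the examples above; the observation keys ("phylum", "growthForm", "degreeOfEstablishment") and the function name in_scope are hypothetical, and the taxonomic scope is reduced to a single flat field for simplicity.

```python
def in_scope(observation, scope):
    """True only if the observation matches every scope field at once."""
    for field, allowed in scope.items():
        # Scope values are assumed to be pipe-delimited lists.
        if observation.get(field) not in allowed.split("|"):
            return False
    return True


# Hypothetical combined scope for a woody weed survey.
woody_weed_scope = {
    "phylum": "Tracheophyta",
    "growthForm": "tree|shrub|liane",
    "degreeOfEstablishment": "naturalised",
}

naturalised_shrub = {
    "phylum": "Tracheophyta",
    "growthForm": "shrub",
    "degreeOfEstablishment": "naturalised",
}
native_tree = {
    "phylum": "Tracheophyta",
    "growthForm": "tree",
    "degreeOfEstablishment": "native",
}

shrub_sought = in_scope(naturalised_shrub, woody_weed_scope)
tree_sought = in_scope(native_tree, woody_weed_scope)
```

The native tree is a vascular plant and a tree, so it passes two of the three scope fields on their own, but it was not sought in the survey because the fields only apply jointly.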

# Humboldt Core Terms useful but with modification:

samplingEffortUnit [how do I untangle distance and duration in this field? If I'm exploring at a site, I have the distance from my GPS and the duration. I've been regarding those in combination as my sampling effort for a site.]
isLeastSpecificTargetCategoryQuantityInclusive [I have a more complicated approach to this where I can mark individuals as "different", "possibly same", and "same", for possible or definite resightings. That lets me calculate a maximum and minimum range for my count of individuals of a species at a site. I'm not sure how to handle this here.]
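
The minimum and maximum count described above might be derived along these lines. This is a minimal Python sketch under an assumed interpretation: "different" marks a new individual, "same" a definite resighting, and "possibly same" a sighting that may or may not be new. The function name count_range and the sample marks are hypothetical.

```python
def count_range(marks):
    """Return (minimum, maximum) individuals from a list of resighting marks."""
    allowed = {"different", "possibly same", "same"}
    unknown = set(marks) - allowed
    if unknown:
        raise ValueError(f"unexpected marks: {sorted(unknown)}")
    # Only sightings marked "different" are certainly distinct individuals.
    minimum = marks.count("different")
    # Each "possibly same" sighting could be one more individual.
    maximum = minimum + marks.count("possibly same")
    return minimum, maximum


# Made-up marks for one species at one site.
sightings = ["different", "same", "possibly same", "different", "possibly same"]
low, high = count_range(sightings)
```

A single dwc:organismQuantity value cannot carry both ends of such a range, which is the difficulty raised in the comment.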

# Concepts in my wildcounts data that seem to be missing from Humboldt Core Terms:

geospatialScopeDistanceInKilometers [most of my surveys are distance-sampling along unbounded transects, for which I have distance not area. Also, when I'm exploring sites, I record with my GPS the total distance I travel while surveying the site. In both cases, sampling distance is more relevant for my data than sampling area. I'm not sampling with plots. I need somewhere standard to put my samplingDistanceKm.]
targetPhenologyScope [eg often in my repeat surveys I'm just mapping out individual plants that are currently flowering or fruiting. The rest get ignored. I call this whatsoughtReproductiveCondition in my data.]
targetSeenHeardScope [e.g., if I'm in a car, I'm only counting the birds I see, while I also include the birds I hear when I'm biking. Similarly, if I was extracting data from my AudioMoth that runs in our garden, that would be all birds heard only.]
targetWildCaptiveScope [sometimes I only survey the wild individuals. This is an important distinction to make when surveying weeds in urban areas.]
targetLargestBodyLengthScope [eg sometimes I'm just surveying big birds, or big butterflies, such as when I'm surveying from a moving car. I have a minimum and maximum value for whatsoughtLongestBodyDimensionMetres in my data to capture this.]
targetDeadOrAliveScope [eg when surveying roadkill they're all dead. I can be surveying dead birds along a drive but not surveying all live birds.]
whatsoughtBodyCondition [when I'm repeat-surveying roadkill along standard routes, I'm only counting the fresh carcasses, so in my data whatsoughtBodyCondition == "dead carcass <24 hours old". I don’t expect Humboldt Core to contain terms for details like this, but I’m also not sure how to handle them in Humboldt Core data.]
samplingDurationMinutes [this will be samplingEffortValue and samplingEffortUnit when there is somewhere else to put my sampled distance]
predetermined [sometimes I might hear an interesting bird and go outside and do a checklist including that bird. This, in my data, is a survey with predetermined=="false". If I do a planned survey at a planned time, irrespective of the conditions and species present, then that's predetermined=="true". In my data, predetermined can be assigned both at a whole survey level and at a howSought level for each Scope (eg if I went outside to look for an interesting bird I heard, and decided to survey butterflies while I'm at it, then I'm howSoughtPredetermined == "false" for my bird list but howSoughtPredetermined == "true" for my butterfly list). I'm not sure where that concept fits into Humboldt Core.]
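
To show how a combined free-text effort string such as samplingEffort == "11.91 km in 244 minutes" could be untangled into separate distance and duration values, here is a minimal Python sketch. It assumes the string always follows that exact pattern; the function name split_effort and the "value"/"unit" keys are hypothetical, and the sketch does not settle where the distance would go in Humboldt Core.

```python
import re

# Matches strings shaped like "11.91 km in 244 minutes" (assumed format).
EFFORT_PATTERN = re.compile(
    r"^\s*(?P<km>\d+(?:\.\d+)?)\s*km\s+in\s+(?P<minutes>\d+(?:\.\d+)?)\s*minutes\s*$"
)


def split_effort(sampling_effort):
    """Split a combined effort string into distance and duration parts."""
    match = EFFORT_PATTERN.match(sampling_effort)
    if match is None:
        raise ValueError(f"unrecognised effort string: {sampling_effort!r}")
    return [
        {"value": float(match["km"]), "unit": "km"},
        {"value": float(match["minutes"]), "unit": "minutes"},
    ]


parts = split_effort("11.91 km in 244 minutes")
```

The two resulting value and unit pairs show that one samplingEffortValue and samplingEffortUnit pair cannot hold both quantities without some convention for the second.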

# Humboldt Core Terms not used by me (at the moment):

identificationReferences [I use scientificNameAuthorship and namePublishedIn and namePublishedInYear from Darwin Core in my species database as a source for each species concept, so perhaps that works for this?]
siteCount [this could be calculated for each of my surveys, usually by summing up sections of my transects]
siteNestingDescription [I don't use this now but could add a generic text description of my method into here. Most of my surveys have a footprintWKT LINESTRING of where I searched, and these can optionally have nested subsets of linestrings in different habitats, or sections of a transect, or time periods. I could describe this in text but it will also be apparent from the data.]
verbatimSiteDescriptions
geospatialScopeAreaInSquareKilometers [all my surveys in recent years are unbounded distance-sampled transects. I use a footprintWKT LINESTRING generated from a GPX file to define the transect. For some taxa, I restrict the width of the distance sampled, and so could calculate an area. For most (eg birds), my observations are all counts in estimated distance bands, including as far away as I can see and hear with my unaided eye and ear. A geospatialScopeDistanceInKilometers would work much better for my surveys than area. At the moment I'm using a LINESTRING footprintWKT and samplingDistanceKm.]
totalAreaSampledInSquareKilometers [comment as above]
reportedExtremeConditions
compilationType
compilationSourceTypes
inventoryTypes [At the moment I don't understand what this refers to. If I do a distance-sampling transect and count and map out all butterflies, which I do, is that inventoryTypes == "open search"? Or does that mean something else?]
isAbundanceCapReported [I suppose I can include this always as isAbundanceCapReported == "FALSE"]
abundanceCap
isVegetationCoverReported
isAbsenceReported [lots of absences can be inferred from my surveys--that's one of the reasons I do them--but I don't have data like "blackbird = 0". I'm assuming that's the kind of data you're meaning by isAbsenceReported == "true"]
absentTaxa [although all absent taxa can be inferred from the target fields like targetTaxonomicScope. I assume you're not expecting impossibly long lists of absent taxa with every survey dataset. It's more efficient to say targetTaxonomicScope == "Aves" and just list the birds seen and heard.]
hasMaterialSamples
materialSampleTypes
samplingEffortProtocol [I don't see the difference between this and samplingEffortValue for my surveys, when my survey method has been described in protocolDescription]
taxonCompletenessProtocols [I'm confused at the moment by how to describe my sampling method across protocolDescription, samplingEffortProtocol, and taxonCompletenessProtocols. I sense that taxonCompletenessProtocols is trying to get at whether a survey has made an estimate of its detection probability for a surveyed taxon, and how that was assessed, and whether this influenced the sampling effort. In my case, for example, I might bike a 20 km route and distance count/map all birds I see and hear. That's a good fit for the protocolDescription. I'm not sure what I would then need to state for the samplingEffortProtocol and the taxonCompletenessProtocols here. Do I ignore them? Detection probability can be modelled from my data but that doesn't influence how quickly I'm biking or how far I'm biking, so I'm guessing I have no taxonCompletenessProtocol.]
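
The idea that absences can be inferred from the target scope rather than listed explicitly can be sketched as follows. This minimal Python example assumes that a checklist of taxa expected under targetTaxonomicScope == "Aves" is available from some external source; the checklist contents, the scope_complete argument and the function name inferred_non_detections are all hypothetical.

```python
def inferred_non_detections(target_taxa, detected_taxa, scope_complete):
    """Taxa in the target checklist that were sought but not detected."""
    if not scope_complete:
        # Without a complete survey of the target scope, a missing
        # taxon cannot be read as a non-detection.
        return set()
    return set(target_taxa) - set(detected_taxa)


# Made-up checklist standing in for the taxa covered by the target scope.
aves_checklist = {"blackbird", "fantail", "silvereye"}
detected = {"fantail", "silvereye"}

missing = inferred_non_detections(aves_checklist, detected, scope_complete=True)
unknown = inferred_non_detections(aves_checklist, detected, scope_complete=False)
```

Under these assumptions the survey data only needs to list the birds seen and heard, and a record equivalent to "blackbird = 0" falls out of the scope and completeness fields.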

# Examples of some details of my method that probably just don't belong as Humboldt Core Terms:

gpsSource
visualAid
auditoryAid
dataEntryHardware
dataEntrySoftware
distance bands from observer

On Wed, Aug 9, 2023 at 7:23 AM Yanina Sica <yanina.sica at gmail.com> wrote:
yuhuuu!!! amazing progress! Thanks Ming and John!

I hope everybody received my invitation to today's meeting.

I might be a couple of minutes late but hope to see everybody soon!

Cheers
Yani

On Mon, Aug 7, 2023 at 6:33 PM Yi Ming Gan <ymgan at naturalsciences.be> wrote:
Hi all,

I have done the exercise mentioned in the subject.
Please see the rendered html with this link: https://raw.githack.com/biodiversity-aq/humboldt-for-eco-survey-data/main/src/mapping.html
The repository is public: https://github.com/biodiversity-aq/humboldt-for-eco-survey-data
I hope some of the remarks and questions are useful!

I think the guiding principles are fine after last Wednesday's long meeting (special thanks to Yani, Ani and Wesley). The dataset based on the latest term names (from last Wednesday) is good to go. I can update the dataset in the test IPT<https://ipt.gbif.org/resource?r=brokewest-fish> when the new terms are up in the sandbox.

Cheers
Ming
_______________________________________________
tdwg-humboldt mailing list
tdwg-humboldt at lists.tdwg.org<mailto:tdwg-humboldt at lists.tdwg.org>
https://lists.tdwg.org/mailman/listinfo/tdwg-humboldt

--
Robert D Stevenson
Associate Professor
Department of Biology
UMass Boston