[tdwg-humboldt] Feedback on current Humboldt Core terms

Thu Apr 6 02:20:51 UTC 2023

Hello fine Humboldt Core people,

I’ve been lurking on the mailing list and finally sat down and had a proper look at how all the current HC terms map onto my ecological surveying. Perhaps my feedback is still useful at this late stage. (Rob Stevenson invited me to do this while you were doing the case studies at the end of last year but I got swamped.)

I’m on something of a personal mission to document the changes in the readily detectable and identifiable species around me. My surveys hit the 20-year mark last Saturday and I’m at over 1.5 million observations. They’re mostly still in geotagged audio notes but I’ve had AWS Transcribe cranking through them over the last month and a half and I’m about halfway through now. I’m hoping that my dataset is just the kind of ecological survey project that will benefit from the standardisation offered by Humboldt Core.

Anyway, with that preamble, below is a summary of how my data structure maps onto the draft Humboldt Core. My biggest puzzle is how to describe my unbounded distance-sampled transect counts within a structure that only seems to quantify the surveyed sites by area.

Cheers,

Jon

# Humboldt Core Terms useful as is:

samplingPerformedBy
identifiedBy
verbatimSiteNames
eventDuration [I use samplingDurationMinutes at the moment, which will translate easily.]
eventDurationUnit
targetTaxonomicScope
targetLifeStageScope
excludedLifeStageScope
targetDegreeOfEstablishmentScope [I'm hoping I can use a different vocabulary in here, as I also use "endemic": some of my surveys are only of New Zealand endemics and do not include Australian natives that established relatively recently and are now also NZ natives. I also find the concepts "invasive" and "widespread invasive" too slippery to use for species scope, so I use "naturalised" when I want to refer to wild exotic species.]
excludedDegreeOfEstablishmentScope
targetGrowthFormScope [eg when I'm surveying woody weeds, that would be targetTaxonomicScope == "Tracheophyta", targetDegreeOfEstablishmentScope == "naturalised", excludedDegreeOfEstablishmentScope == "native", targetGrowthFormScope == "tree|shrub|liane"]
targetHabitatScope [eg targetHabitatScope == "road|roadside" for my roadkill surveys]
reportedWeather
protocolNames
protocolDescription
protocolReferences
isAbundanceReported
hasVouchers
voucherInstitutions [although often my collected specimens are still in my personal collection, eg waiting in my plant press]
isSamplingEffortReported
samplingEffortValue [at the moment I'm using the DWC "samplingEffort", eg samplingEffort == "11.91 km in 244 minutes"]
taxonCompletenessReported ["reported complete"]
isTaxonomicScopeComplete
isLifeStageScopeComplete
isDegreeOfEstablishmentScopeComplete
isGrowthFormScopeComplete [is there an issue here if these fields are ever interpreted independently of one another? If I'm surveying woody weeds, then, in combination, TaxonomicScope == "Tracheophyta" and GrowthFormScope == "tree|shrub|liane" and DegreeOfEstablishmentScope == "naturalised" are complete. However, each independently is not complete, e.g., I didn't survey all TaxonomicScope == "Tracheophyta". Would it be better if there was just one isScopeComplete that applies to all the scopes listed? I can't think at the moment of a survey type that would list a series of Scope values and differ in whether or not ScopeComplete == "true" for each.]
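
To illustrate the point about reading the scopes jointly, here is a minimal Python sketch of the woody-weed example. The record layout, the single isScopeComplete flag, the organism fields (higherTaxa, degreeOfEstablishment, growthForm) and the function names are all hypothetical; none of them are part of Humboldt Core or of the WildCounts data.

```python
# Hypothetical scope record for the woody-weed survey described above.
survey_scope = {
    "targetTaxonomicScope": "Tracheophyta",
    "targetDegreeOfEstablishmentScope": "naturalised",
    "excludedDegreeOfEstablishmentScope": "native",
    "targetGrowthFormScope": "tree|shrub|liane",
    "isScopeComplete": True,  # hypothetical single flag covering all scopes together
}


def split_values(value):
    """Split a pipe-delimited scope value into a set of terms."""
    return {v.strip() for v in value.split("|") if v.strip()}


def in_combined_scope(organism, scope):
    """Return True only if the organism satisfies every listed scope at once."""
    if scope["targetTaxonomicScope"] not in organism["higherTaxa"]:
        return False
    establishment = organism["degreeOfEstablishment"]
    if establishment not in split_values(scope["targetDegreeOfEstablishmentScope"]):
        return False
    if establishment in split_values(scope.get("excludedDegreeOfEstablishmentScope", "")):
        return False
    return organism["growthForm"] in split_values(scope["targetGrowthFormScope"])


# Illustrative organisms (assumed attributes, not real records).
naturalised_shrub = {"higherTaxa": ["Plantae", "Tracheophyta"],
                     "degreeOfEstablishment": "naturalised", "growthForm": "shrub"}
naturalised_herb = {"higherTaxa": ["Plantae", "Tracheophyta"],
                    "degreeOfEstablishment": "naturalised", "growthForm": "herb"}
native_tree = {"higherTaxa": ["Plantae", "Tracheophyta"],
               "degreeOfEstablishment": "native", "growthForm": "tree"}

for name, organism in [("naturalised shrub", naturalised_shrub),
                       ("naturalised herb", naturalised_herb),
                       ("native tree", native_tree)]:
    print(name, in_combined_scope(organism, survey_scope))
```

All three organisms are Tracheophyta, but only the naturalised shrub is in scope, which is why a completeness flag on the taxonomic scope alone could mislead.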

# Humboldt Core Terms useful but with modification:

samplingEffortUnit [how do I untangle distance and duration in this field? If I'm exploring at a site, I have the distance from my GPS and the duration. I've been regarding those in combination as my sampling effort for a site.]
isLeastSpecificTargetCategoryQuantityInclusive [I have a more complicated approach to this where I can mark individuals as "different", "possibly same", and "same", for possible or definite resightings. That lets me calculate a maximum and minimum range for my count of individuals of a species at a site. I'm not sure how to handle this here.]
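
The minimum and maximum count described above could be derived roughly as follows. This is only a sketch under stated assumptions: each observation is a (species, count, resighting) tuple for one site, "different" individuals always add to both bounds, "possibly same" individuals add only to the maximum, and "same" individuals add to neither. The tuple layout and the function name count_range are hypothetical.

```python
from collections import defaultdict


def count_range(observations):
    """Return {species: (min_count, max_count)} for the observations at one site."""
    totals = defaultdict(lambda: [0, 0])  # species -> [min, max]
    for species, count, resighting in observations:
        entry = totals[species]  # also registers species seen only as resightings
        if resighting == "different":
            entry[0] += count
            entry[1] += count
        elif resighting == "possibly same":
            entry[1] += count  # may be new individuals, so only the maximum grows
        elif resighting != "same":
            raise ValueError(f"unknown resighting mark: {resighting!r}")
    return {species: (low, high) for species, (low, high) in totals.items()}


# Illustrative observations (made-up numbers).
site_observations = [
    ("blackbird", 2, "different"),
    ("blackbird", 1, "possibly same"),
    ("blackbird", 1, "same"),
    ("fantail", 1, "different"),
]
print(count_range(site_observations))
```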

# Concepts in my wildcounts data that seem to be missing from Humboldt Core Terms:

geospatialScopeDistanceInKilometers [most of my surveys are distance-sampling along unbounded transects, for which I have distance not area. Also, when I'm exploring sites, I record with my GPS the total distance I travel while surveying the site. In both cases, sampling distance is more relevant for my data than sampling area. I'm not sampling with plots. I need somewhere standard to put my samplingDistanceKm.]
targetPhenologyScope [eg often in my repeat surveys I'm just mapping out individual plants that are currently flowering or fruiting. The rest get ignored. I call this whatsoughtReproductiveCondition in my data.]
targetSeenHeardScope [e.g., if I'm in a car, I'm only counting the birds I see, while I also include the birds I hear when I'm biking. Similarly, if I was extracting data from my AudioMoth that runs in our garden, that would be all birds heard only.]
targetWildCaptiveScope [sometimes I only survey the wild individuals. This is an important distinction to make when surveying weeds in urban areas.]
targetLargestBodyLengthScope [eg sometimes I'm just surveying big birds, or big butterflies, such as when I'm surveying from a moving car. I have a minimum and maximum value for whatsoughtLongestBodyDimensionMetres in my data to capture this.]
targetDeadOrAliveScope [eg when surveying roadkill they're all dead. I can be surveying dead birds along a drive but not surveying all live birds.]
whatsoughtBodyCondition [when I'm repeat surveying roadkill along standard routes, I'm only counting the fresh carcasses, so in my data whatsoughtBodyCondition == "dead carcass <24 hours old". I don't expect Humboldt Core to contain terms for details like this, but I'm also not sure how to handle them in Humboldt Core data.]
samplingDurationMinutes [this will be samplingEffortValue and samplingEffortUnit when there is somewhere else to put my sampled distance]
predetermined [sometimes I might hear an interesting bird and go outside and do a checklist including that bird. This, in my data, is a survey with predetermined=="false". If I do a planned survey at a planned time, irrespective of the conditions and species present, then that's predetermined=="true". In my data, predetermined can be assigned both at a whole survey level and at a howSought level for each Scope (eg if I went outside to look for an interesting bird I heard, and decided to survey butterflies while I'm at it, then I'm howSoughtPredetermined == "false" for my bird list but howSoughtPredetermined == "true" for my butterfly list). I'm not sure where that concept fits into Humboldt Core.]
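
As a sketch of how sampled distance and duration could be kept as two separate quantities, the transect length can be computed from the footprintWKT LINESTRING. The assumptions here are that coordinates are WGS84 longitude/latitude in degrees, that great-circle (haversine) distance between consecutive vertices is accurate enough, and that any Z values are ignored. The function names are hypothetical; samplingDistanceKm and samplingDurationMinutes are the field names from my own data mentioned above.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def parse_linestring(wkt):
    """Parse 'LINESTRING (lon lat, lon lat, ...)' into a list of (lon, lat) tuples."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE | re.DOTALL)
    if not match:
        raise ValueError("not a WKT LINESTRING")
    points = []
    for pair in match.group(1).split(","):
        lon, lat = pair.split()[:2]  # ignore an optional Z value
        points.append((float(lon), float(lat)))
    return points


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def transect_length_km(wkt):
    """Sum the segment lengths along the LINESTRING."""
    points = parse_linestring(wkt)
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))


def sampling_effort(wkt, duration_minutes):
    """Report distance and duration as two separate effort quantities."""
    return {
        "samplingDistanceKm": round(transect_length_km(wkt), 2),
        "samplingDurationMinutes": duration_minutes,
    }


# Illustrative footprint: one degree of longitude along the equator.
print(sampling_effort("LINESTRING (0 0, 1 0)", 244))
```

A GPS track with many vertices would be handled the same way; only the input string gets longer.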

# Humboldt Core Terms not used by me (at the moment):

identificationReferences [I use scientificNameAuthorship and namePublishedIn and namePublishedInYear from Darwin Core in my species database as a source for each species concept, so perhaps that works for this?]
siteCount [this could be calculated for each of my surveys, usually by summing up sections of my transects]
siteNestingDescription [I don't use this now but could add a generic text description of my method into here. Most of my surveys have a footprintWKT LINESTRING of where I searched, and these can optionally have nested subsets of linestrings in different habitats, or sections of a transect, or time periods. I could describe this in text but it will also be apparent from the data.]
verbatimSiteDescriptions
geospatialScopeAreaInSquareKilometers [all my surveys in recent years are unbounded distance-sampled transects. I use a footprintWKT LINESTRING generated from a GPX file to define the transect. For some taxa, I restrict the width of the distance sampled, and so could calculate an area. For most (eg birds), my observations are all counts in estimated distance bands, including as far away as I can see and hear with my unaided eye and ear. A geospatialScopeDistanceInKilometers would work much better for my surveys than area. At the moment I'm using a LINESTRING footprintWKT and samplingDistanceKm.]
totalAreaSampledInSquareKilometers [comment as above]
reportedExtremeConditions
compilationType
compilationSourceTypes
inventoryTypes [At the moment I don't understand what this refers to. If I do a distance-sampling transect and count and map out all butterflies, which I do, is that inventoryTypes == "open search"? Or does that mean something else?]
isAbundanceCapReported [I suppose I can include this always as isAbundanceCapReported == "FALSE"]
abundanceCap
isVegetationCoverReported
isAbsenceReported [lots of absences can be inferred from my surveys--that's one of the reasons I do them--but I don't have data like "blackbird = 0". I'm assuming that's the kind of data you're meaning by isAbsenceReported == "true"]
absentTaxa [although all absent taxa can be inferred from the target fields like targetTaxonomicScope. I assume you're not expecting impossibly long lists of absent taxa with every survey dataset. It's more efficient to say targetTaxonomicScope == "Aves" and just list the birds seen and heard.]
hasMaterialSamples
materialSampleTypes
samplingEffortProtocol [I don't see the difference between this and samplingEffortValue for my surveys, when my survey method has been described in protocolDescription]
taxonCompletenessProtocols [I'm confused at the moment by how to describe my sampling method across protocolDescription, samplingEffortProtocol, and taxonCompletenessProtocols. I sense that taxonCompletenessProtocols is trying to get at whether a survey has made an estimate of its detection probability for a surveyed taxon, and how that was assessed, and whether this influenced the sampling effort. In my case, for example, I might bike a 20 km route and distance count/map all birds I see and hear. That's a good fit for the protocolDescription. I'm not sure what I would then need to state for the samplingEffortProtocol and the taxonCompletenessProtocols here. Do I ignore them? Detection probability can be modelled from my data but that doesn't influence how quickly I'm biking or how far I'm biking, so I'm guessing I have no taxonCompletenessProtocol.]
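
The inferred absences mentioned under isAbsenceReported and absentTaxa could be derived along these lines. This is a minimal sketch and it assumes something the survey data alone does not provide: a checklist of the taxa that fall within targetTaxonomicScope for the region. The checklist contents, the record layout and the function name are hypothetical.

```python
def inferred_absences(survey, scope_checklist):
    """List taxa in the target scope that were not recorded.

    Returns an empty list when the scope was not surveyed completely,
    because no absence can safely be inferred in that case.
    """
    if not survey["isTaxonomicScopeComplete"]:
        return []
    return sorted(set(scope_checklist) - set(survey["observedTaxa"]))


# Hypothetical regional checklist for targetTaxonomicScope == "Aves".
aves_checklist = ["blackbird", "fantail", "house sparrow"]

survey = {
    "targetTaxonomicScope": "Aves",
    "isTaxonomicScopeComplete": True,
    "observedTaxa": ["fantail", "house sparrow"],
}
print(inferred_absences(survey, aves_checklist))
```

Here the blackbird absence is never stored with the survey; it is computed on demand from the scope and the observations.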

# Examples of some details of my method that probably just don't belong as Humboldt Core Terms:

gpsSource
visualAid
auditoryAid
dataEntryHardware
dataEntrySoftware
distance bands from observer

________________________________________
Jon Sullivan, Ph.D.
Senior Lecturer
Department of Pest-Management and Conservation
P.O. Box 84
Lincoln University / Te Whare Wanaka O Aoraki
Lincoln 7647
New Zealand

office: Room 524, Burns building, Lincoln University campus
Cnr Springs & Ellesmere Junction Roads
email: Jon.Sullivan at lincoln.ac.nz
tel: (03) 423 0756
for international calls, replace (03) with (643)

@jon_sullivan at iNaturalist NZ<https://inaturalist.nz/observations?place_id=any&user_id=jon_sullivan&verifiable=any>
@joncounts at mastodon.nz<https://mastodon.nz/@joncounts>
EcolLincNZ blog<http://lincolnecology.org.nz/>
ecological surveying with WildCounts<https://wildcounts.org/>
________________________________________

On 4/04/2023, at 12:56 PM, ys628 <yanina.sica at yale.edu> wrote:
Dear all,

We have made a lot of progress with the Implementation report.  Please have a look and edit here<https://docs.google.com/document/d/1RFdSHoyzWCQk9qO6uup4xQjWOMzPyBb-A0mcjj98hbk/edit#heading=h.u3r7un3jbl3s>.

I would like to thank Wesley, Ming, Zach and Steve for pushing this forward! I really appreciate it!

Here is a brief description of the document. The goal is to convince TDWG people that this extension is useful and that it would work!
- Authors will include the people writing the report and participating in the testing (we may even publish this as a paper in BISS)
- Introduction and background will be the basis of the Feature report as it already includes the rationale behind building this extension. This means that this Implementation report will include the information needed in the Feature report. For more info on TDWG required documents see here: http://rs.tdwg.org/vms/doc/specification/#421-feature-report
- Development of the vocabulary includes some description of what is considered an inventory, a basic description of the terms, and a link to the final table with the terms
- Use cases will include a description of the datasets and how the mapping was done
- Lessons learned will include all the challenges identified during the mapping and testing that were addressed using the Humboldt extension
- Unresolved issues/remaining challenges will include all the challenges identified during the mapping and testing that were NOT addressed using the Humboldt extension
- Conclusions will be some sort of summary stating that we are ready to go to public review

I will not be able to join our next meeting, but it would be great if you can meet and discuss how the document is looking; otherwise, please take the time to review the document. The sooner we finish this, the faster we can start the public review.

We also need to review the documentation and the list of terms<https://docs.google.com/spreadsheets/d/1AbUUKDkgilbtHu9Dh_5V2dnOeQNbKBcRo4d7VEDyqOg/edit#gid=697606170> that will accompany the implementation report. I would also ask you to discuss line 41 of that sheet.

Hope everybody is good and we got this!

All the best!

Yanina V. Sica, PhD
Lead Data Team
Map of Life<https://mol.org/> | Center for Biodiversity and Global Change<https://bgc.yale.edu/>
Yale University
pronouns: she/her/hers
If you are receiving this email outside of your working hours, I am not expecting you to read or respond.
_______________________________________________
tdwg-humboldt mailing list
tdwg-humboldt at lists.tdwg.org<mailto:tdwg-humboldt at lists.tdwg.org>
https://lists.tdwg.org/mailman/listinfo/tdwg-humboldt
