[tdwg-humboldt] Feedback on current Humboldt Core terms

Tue Apr 11 15:13:28 UTC 2023

Hi Jon,

Thank you for taking the time to compare your data set terms to those of
Humboldt Core.

It would be useful if you could join the Wednesday meetings for one or two
sessions.  That time slot has been difficult for me this semester, but a
conversation with the others based on your comparisons would be an excellent
starting point.

Another option is to have a side meeting with one or two of the group
members. I think Wes Hochachka, for instance, would have some good insights.

Best
Rob

On Wed, Apr 5, 2023 at 10:21 PM Sullivan, Jon <Jon.Sullivan at lincoln.ac.nz>
wrote:

> Hello fine Humboldt Core people,
>
> I’ve been lurking on the mailing list and finally sat down and had a
> proper look at how all the current HC terms map onto my ecological
> surveying. Perhaps my feedback is still useful at this late stage. (Rob
> Stevenson invited me to do this while you were doing the case studies at
> the end of last year but I got swamped.)
>
> I’m on something of a personal mission to document the changes in the
> readily detectable and identifiable species around me. My surveys hit the
> 20-year mark last Saturday and I’m at over 1.5 million observations.
> They’re mostly still in geotagged audio notes but I’ve had AWS Transcribe
> cranking through them over the last month and a half and I’m about half way
> through now. I’m hoping that my dataset is just the kind of ecological
> survey project that will benefit from the standardisation offered
> by Humboldt Core.
>
> Anyway, with that preamble, below is a summary of how my data structure
> maps onto the draft Humboldt Core. My biggest puzzle is how to describe my
> unbounded distance-sampled transect counts within a structure that only
> seems to quantify the surveyed sites by area.
>
> Cheers,
>
> Jon
>
> # Humboldt Core Terms useful as is:
>
> *samplingPerformedBy*
> *identifiedBy*
> *verbatimSiteNames*
> *eventDuration* [I use samplingDurationMinutes at the moment, which will
> translate easily.]
> eventDurationUnit
> *targetTaxonomicScope*
> *targetLifeStageScope*
> *excludedLifeStageScope*
> *targetDegreeOfEstablishmentScope* [I'm hoping I can use a different
> vocabulary in here as I also use "endemic", as some of my surveys are only
> of New Zealand endemics and do not include relatively recently established
> Australian natives that are now also NZ natives. I also find the concepts
> "invasive" and "widespread invasive" too slippery to use for species scope,
> so I use "naturalised" when I want to refer to wild exotic species.]
> *excludedDegreeOfEstablishmentScope*
> *targetGrowthFormScope* [eg when I'm surveying woody weeds, that would be
> targetTaxonomicScope == "Tracheophyta", targetDegreeOfEstablishmentScope ==
> "naturalised",  excludedDegreeOfEstablishmentScope == "native",
> targetGrowthFormScope == "tree|shrub|liane"]
> *targetHabitatScope* [eg targetHabitatScope == "road|roadside" for my
> roadkill surveys]
> *reportedWeather*
> *protocolNames*
> *protocolDescription*
> *protocolReferences*
> *isAbundanceReported*
> *hasVouchers*
> *voucherInstitutions* [although often my collected specimens are still in
> my personal collection, eg waiting in my plant press]
> isSamplingEffortReported
> *samplingEffortValue* [at the moment I'm using the DWC "samplingEffort",
> eg samplingEffort == "11.91 km in 244 minutes"]
> *taxonCompletenessReported* ["reported complete"]
> *isTaxonomicScopeComplete*
> *isLifeStageScopeComplete*
> *isDegreeOfEstablishmentScopeComplete*
> *isGrowthFormScopeComplete* [is there an issue here if these fields are
> ever interpreted independently of one another? If I'm surveying woody
> weeds, then, in combination, TaxonomicScope == "Tracheophyta" and
> GrowthFormScope == "tree|shrub|liane" and DegreeOfEstablishmentScope ==
> "naturalised" are complete. However, each independently is not complete,
> e.g., I didn't survey all TaxonomicScope == "Tracheophyta". Would it be
> better if there was just one isScopeComplete that applies to all the
> scopes listed? I can't think at the moment of a survey type that would list
> a series of Scope values and differ in whether or not ScopeComplete ==
> "true" for each.]
>
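A minimal sketch of how the combined samplingEffort string quoted above ("11.91 km in 244 minutes") might be split into separate distance and duration values. The function name split_sampling_effort is hypothetical, and the sketch assumes every string follows the "<number> km in <number> minutes" pattern; neither is defined by Darwin Core or Humboldt Core.

```python
import re

# Assumed free-text pattern: "<distance> km in <duration> minutes".
_EFFORT_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*km\s+in\s+(\d+(?:\.\d+)?)\s*minutes?\s*$"
)


def split_sampling_effort(text):
    """Split a combined effort string into distance (km) and duration (minutes)."""
    match = _EFFORT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised samplingEffort string: {text!r}")
    return {
        "samplingDistanceKm": float(match.group(1)),
        "samplingDurationMinutes": float(match.group(2)),
    }
```

Keeping the two numbers in separate fields would let the duration go into an effort value/unit pair while the distance waits for a dedicated field.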
>
> # Humboldt Core Terms useful but with modification:
>
> *samplingEffortUnit* [how do I untangle distance and duration in this
> field? If I'm exploring at a site, I have the distance from my GPS and the
> duration. I've been regarding those in combination as my sampling effort
> for a site.]
> *isLeastSpecificTargetCategoryQuantityInclusive* [I have a more
> complicated approach to this where I can mark individuals as "different",
> "possibly same", and "same", for possible or definite resightings. That
> lets me calculate a maximum and minimum range for my count of individuals
> of a species at a site. I'm not sure how to handle this here.]
>
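A minimal sketch of the minimum and maximum count described above, under an assumed reading of the three marks: "different" is a definite new individual, "possibly same" may or may not be a new individual, and "same" is a definite resighting that adds nothing. The function name count_range is hypothetical.

```python
from collections import Counter

VALID_MARKS = {"different", "possibly same", "same"}


def count_range(marks):
    """Return (minimum, maximum) individuals for one species at one site.

    Assumes one mark per sighting, with the first sighting of each
    individual marked "different".
    """
    tally = Counter(marks)
    unknown = set(tally) - VALID_MARKS
    if unknown:
        raise ValueError(f"unknown marks: {sorted(unknown)}")
    minimum = tally["different"]                    # definite individuals only
    maximum = minimum + tally["possibly same"]      # every doubtful one is new
    return minimum, maximum
```

For example, two "different" sightings, one "possibly same", and one "same" give a range of 2 to 3 individuals.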
>
> # Concepts in my wildcounts data that seem to be missing from Humboldt
> Core Terms:
>
> *geospatialScopeDistanceInKilometers* [most of my surveys are
> distance-sampling along unbounded transects, for which I have distance not
> area. Also, when I'm exploring sites, I record with my GPS the total
> distance I travel while surveying the site. In both cases, sampling
> distance is more relevant for my data than sampling area. I'm not sampling
> with plots. I need somewhere standard to put my samplingDistanceKm.]
> *targetPhenologyScope* [eg often in my repeat surveys I'm just mapping
> out individual plants that are currently flowering or fruiting. The rest
> get ignored. I call this whatsoughtReproductiveCondition in my data.]
> *targetSeenHeardScope* [e.g., if I'm in a car, I'm only counting the
> birds I see, while I also include the birds I hear when I'm biking.
> Similarly, if I was extracting data from my AudioMoth that runs in our
> garden, that would be all birds heard only.]
> *targetWildCaptiveScope* [sometimes I only survey the wild individuals.
> This is an important distinction to make when surveying weeds in urban
> areas.]
> *targetLargestBodyLengthScope* [eg sometimes I'm just surveying big
> birds, or big butterflies, such as when I'm surveying from a moving car. I
> have a minimum and maximum value for whatsoughtLongestBodyDimensionMetres
> in my data to capture this.]
> *targetDeadOrAliveScope* [eg when surveying roadkill they're all dead. I
> can be surveying dead birds along a drive but not surveying all live birds.]
> *whatsoughtBodyCondition* [when I'm repeat-surveying roadkill along
> standard routes, I'm only counting the fresh carcasses, so in my data
> whatsoughtBodyCondition == "dead carcass <24 hours old". I don't expect
> Humboldt Core to contain terms for details like this, but I'm also not sure
> how to handle them in Humboldt Core data.]
> *samplingDurationMinutes* [this will be samplingEffortValue and
> samplingEffortUnit when there is somewhere else to put my sampled distance]
> *predetermined* [sometimes I might hear an interesting bird and go
> outside and do a checklist including that bird. This, in my data, is a
> survey with predetermined=="false". If I do a planned survey at a planned
> time, irrespective of the conditions and species present, then that's
> predetermined=="true". In my data, predetermined can be assigned both at a
> whole survey level and at a howSought level for each Scope (eg if I went
> outside to look for an interesting bird I heard, and decided to survey
> butterflies while I'm at it, then I'm howSoughtPredetermined == "false" for
> my bird list but howSoughtPredetermined == "true" for my butterfly list).
> I'm not sure where that concept fits into Humboldt Core.]
>
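A minimal sketch of the two-level "predetermined" flag described above, assuming a survey-level value that individual scopes can override. The function name and the dictionary of per-scope overrides are hypothetical illustrations, not Humboldt Core terms.

```python
def is_predetermined(survey_default, how_sought_overrides, scope_name):
    """Return the predetermined flag for one scope.

    Falls back to the survey-level value when the scope has no override.
    """
    return how_sought_overrides.get(scope_name, survey_default)


# The example from the text: an unplanned bird checklist, with a
# butterfly list treated as planned.
survey_predetermined = False
overrides = {"butterflies": True}
bird_flag = is_predetermined(survey_predetermined, overrides, "birds")
butterfly_flag = is_predetermined(survey_predetermined, overrides, "butterflies")
```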
> # Humboldt Core Terms not used by me (at the moment):
>
> *identificationReferences* [I use scientificNameAuthorship and
> namePublishedIn and namePublishedInYear from Darwin Core in my species
> database as a source for each species concept, so perhaps that works for
> this?]
> *siteCount* [this could be calculated for each of my surveys, usually by
> summing up sections of my transects]
> *siteNestingDescription* [I don't use this now but could add a generic
> text description of my method into here. Most of my surveys have a
> footprintWKT LINESTRING of where I searched, and these can optionally have
> nested subsets of linestrings in different habitats, or sections of a
> transect, or time periods. I could describe this in text but it will also be
> apparent from the data.]
> *verbatimSiteDescriptions*
> *geospatialScopeAreaInSquareKilometers* [all my surveys in recent years
> are unbounded distance-sampled transects. I use a footprintWKT LINESTRING
> generated from a GPX file to define the transect. For some taxa, I restrict
> the width of the distance sampled, and so could calculate an area. For most
> (eg birds), my observations are all counts in estimated distance bands,
> including as far away as I can see and hear with my unaided eye and ear. A
> geospatialScopeDistanceInKilometers would work much better for my surveys
> than area. At the moment I'm using a LINESTRING footprintWKT and
> samplingDistanceKm.]
> *totalAreaSampledInSquareKilometers* [comment as above]
> *reportedExtremeConditions*
> *compilationType*
> *compilationSourceTypes*
> *inventoryTypes* [At the moment I don't understand what this refers to.
> If I do a distance-sampling transect and count and map out all butterflies,
> which I do, is that inventoryTypes == "open search"? Or does that mean
> something else?]
> *isAbundanceCapReported* [I suppose I can include this always as
> isAbundanceCapReported == "FALSE"]
> *abundanceCap*
> *isVegetationCoverReported*
> *isAbsenceReported* [lots of absences can be inferred from my
> surveys--that's one of the reasons I do them--but I don't have data like
> "blackbird = 0". I'm assuming that's the kind of data you're meaning by
> isAbsenceReported == "true"]
> *absentTaxa* [although all absent taxa can be inferred from the target
> fields like targetTaxonomicScope. I assume you're not expecting impossibly
> long lists of absent taxa with every survey dataset. It's more efficient to
> say targetTaxonomicScope == "Aves" and just list the birds seen and heard.]
> *hasMaterialSamples*
> *materialSampleTypes*
> *samplingEffortProtocol* [I don't see the difference between this and
> samplingEffortValue for my surveys, when my survey method has been
> described in protocolDescription]
> *taxonCompletenessProtocols* [I'm confused at the moment by how to
> describe my sampling method across protocolDescription,
> samplingEffortProtocol, and taxonCompletenessProtocols. I sense that
> taxonCompletenessProtocols is trying to get at whether a survey has made an
> estimate of its detection probability for a surveyed taxon, and how that
> was assessed, and whether this influenced the sampling effort. In my case,
> for example, I might bike a 20 km route and distance count/map all birds I
> see and hear. That's a good fit for the protocolDescription. I'm not sure
> what I would then need to state for the samplingEffortProtocol and the
> taxonCompletenessProtocols here. Do I ignore them? Detection probability can
> be modelled from my data but that doesn't influence how quickly I'm biking
> or how far I'm biking, so I'm guessing I have no taxonCompletenessProtocol.]
>
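A minimal sketch of how a transect distance such as samplingDistanceKm could be derived from a footprintWKT LINESTRING like the ones mentioned above. It assumes WGS84 longitude/latitude coordinates in WKT order (x = longitude, y = latitude) and a spherical Earth with the haversine formula; the function name is hypothetical, and a GPS device's own track distance may differ slightly.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def linestring_length_km(wkt):
    """Sum the great-circle lengths of the segments of a WKT LINESTRING."""
    match = re.match(r"^\s*LINESTRING\s*\((.*)\)\s*$", wkt,
                     flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("expected a WKT LINESTRING")
    points = []
    for pair in match.group(1).split(","):
        parts = pair.split()
        points.append((float(parts[0]), float(parts[1])))  # (lon, lat)
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(points, points[1:]):
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = phi2 - phi1
        dlmb = math.radians(lon2 - lon1)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
        total += 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
    return total
```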
> # Examples of some details of my method that probably just don't belong as
> Humboldt Core Terms
>
> gpsSource
> visualAid
> auditoryAid
> dataEntryHardware
> dataEntrySoftware
> distance bands from observer
>
>
>
> ________________________________________
> *Jon Sullivan*, Ph.D.
> Senior Lecturer
> Department of Pest-Management and Conservation
> P.O. Box 84
> Lincoln University / Te Whare Wanaka O Aoraki
> Lincoln 7647
> New Zealand
>
> *office:* Room 524, Burns building, Lincoln University campus
> Cnr Springs & Ellesmere Junction Roads
> *email:* Jon.Sullivan at lincoln.ac.nz
> *tel: *(03) 423 0756
> for international calls, replace (03) with (643)
>
> @jon_sullivan at *iNaturalist NZ*
> <https://inaturalist.nz/observations?place_id=any&user_id=jon_sullivan&verifiable=any>
> @joncounts at mastodon.nz <https://mastodon.nz/@joncounts>
> EcolLincNZ blog <http://lincolnecology.org.nz/>
> ecological surveying with *WildCounts* <https://wildcounts.org/>
> ________________________________________
>
>
>
>
>
> On 4/04/2023, at 12:56 PM, ys628 <yanina.sica at yale.edu> wrote:
> Dear all,
>
> We have made a lot of progress with the Implementation report.  *Please
> have a look and edit here
> <https://docs.google.com/document/d/1RFdSHoyzWCQk9qO6uup4xQjWOMzPyBb-A0mcjj98hbk/edit#heading=h.u3r7un3jbl3s>*
> *.*
>
> *I would like to thank Wesley, Ming, Zach and Steve for pushing this
> forward! I really appreciate it!*
>
> Here is a brief description of the document. The goal is to convince TDWG
> people that this extension is useful and that it would work!
> - *Authors *will include the people writing the report and participating
> in the testing (we may even publish this as a paper in BISS)
> - *Introduction and background* will be the basis of the Feature report
> as it already includes the rationale behind building this extension. This
> means that this Implementation report will include the information needed
> in the Feature report. For more info on TDWG required documents see here:
> http://rs.tdwg.org/vms/doc/specification/#421-feature-report
> - *Development of the vocabulary* includes some description of what is
> considered an inventory, a basic description of the terms, and a link to
> the final table with the terms
> - *Use cases* will include a description of the datasets and how the
> mapping was done
> - *Lessons learned *will include all the challenges identified during the
> mapping and testing that were addressed using the Humboldt extension
> - *Unresolved issues/remaining challenges* will include all the
> challenges identified during the mapping and testing that were NOT
> addressed using the Humboldt extension
> - *Conclusions* will be some sort of summary stating that we are ready to
> go to public review
>
> *I will not be able to join our next meeting, but it would be great if you
> can meet and discuss how the document is looking; otherwise, please take the
> time to review the document. *The sooner we finish this, the sooner we can
> start the public review.
>
> We also need to review the documentation and the list of terms
> <https://docs.google.com/spreadsheets/d/1AbUUKDkgilbtHu9Dh_5V2dnOeQNbKBcRo4d7VEDyqOg/edit#gid=697606170>
>  that will accompany the implementation report. I would also ask you to
> discuss line 41 of that sheet.
>
>
> Hope everybody is good and we got this!
>
> All the best!
>
>
>
> Yanina V. Sica, PhD
> Lead Data Team
> Map of Life <https://mol.org/> | Center for Biodiversity and Global Change
> <https://bgc.yale.edu/>
> Yale University
> pronouns: she/her/hers
> *If you are receiving this email outside of your working hours, I am not
> expecting you to read or respond.*
> _______________________________________________
> tdwg-humboldt mailing list
> tdwg-humboldt at lists.tdwg.org
> https://lists.tdwg.org/mailman/listinfo/tdwg-humboldt
>
>
>
>

-- 
Robert D Stevenson
Associate Professor
Department of Biology
UMass Boston