[tdwg-humboldt] Inferring non-detection of target taxa of presence-only data using Humboldt Extension | Test dataset with updated terms good to go

Fri Aug 11 16:14:15 UTC 2023

Thanks for the documents, Ming and Yani. It is nice to see the continued
progress. Sorry I am not able to join this week's meeting.

A few comments, which may no longer be useful after the meeting.

I looked at the eco:isLeastSpecificTargetCategoryQuantityInclusive
guidelines here:
https://docs.google.com/document/d/11fPW4JibRWUnrtTOZwDkAcxo5d-lwnATXyh3xmO65rs/edit

The purpose is clear after reading the document, but I found the term name
"isLeastSpecificTargetCategoryQuantityInclusive" non-intuitive.

Could the words "counts" and "preCalculated" be used somehow?

Near the beginning of the document there is a sentence that says:

"... need to be treated differently in order to calculate the total quantity
of organisms in the *least specific category*."
I suggest saying "in the *least specific taxonomic category*", which will
often be the species level.
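To make the distinction concrete, here is a minimal Python sketch of the total-quantity calculation the guidelines describe. It assumes (not confirmed in this thread) that true means the least specific category's quantity already includes the counts of the more specific categories, and false means those counts are additional individuals. `CategoryCount` and `total_quantity` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CategoryCount:
    """Hypothetical count record for one target category."""
    category: str
    quantity: int
    least_specific: bool  # True for the least specific target category


def total_quantity(records: list[CategoryCount], inclusive: bool) -> int:
    """Total organisms in the least specific target category.

    Assumption: `inclusive` plays the role of
    eco:isLeastSpecificTargetCategoryQuantityInclusive. If True, the least
    specific record already contains the more specific counts; if False,
    the more specific counts are extra individuals to add on.
    """
    least = sum(r.quantity for r in records if r.least_specific)
    if inclusive:
        return least
    return least + sum(r.quantity for r in records if not r.least_specific)


# Illustrative data: a species-level count plus counts split by life stage.
records = [
    CategoryCount("species", 10, True),
    CategoryCount("species, adults", 6, False),
    CategoryCount("species, juveniles", 4, False),
]
print(total_quantity(records, inclusive=True))   # 10
print(total_quantity(records, inclusive=False))  # 20
```

The same three records give 10 or 20 individuals depending only on the flag, which is why a data user cannot sum the rows safely without it.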

A larger question: if someone has terms that the Event Extension does
not support, what options do they have to share their data?

I was thinking specifically of Jon Sullivan's old email (below). Could the
Event Extension documents provide general advice for people with terms like Jon's?

Best, Rob

Sullivan, Jon <Jon.Sullivan at lincoln.ac.nz> via lists.tdwg.org
Wed, Apr 5, 10:21 PM
to tdwg-humboldt at lists.tdwg.org
Hello fine Humboldt Core people,

I’ve been lurking on the mailing list and finally sat down and had a proper
look at how all the current HC terms map onto my ecological surveying.
Perhaps my feedback is still useful at this late stage. (Rob Stevenson
invited me to do this while you were doing the case studies at the end of
last year but I got swamped.)

I’m on something of a personal mission to document the changes in the
readily detectable and identifiable species around me. My surveys hit the
20-year mark last Saturday and I’m at over 1.5 million observations.
They’re mostly still in geotagged audio notes but I’ve had AWS Transcribe
cranking through them over the last month and a half and I’m about halfway
through now. I’m hoping that my dataset is just the kind of ecological
survey project that will benefit from the standardisation offered
by Humboldt Core.

Anyway, with that preamble, below is a summary of how my data structure
maps onto the draft Humboldt Core. My biggest puzzle is how to describe my
unbounded distance-sampled transect counts within a structure that only
seems to quantify the surveyed sites by area.

Cheers,

Jon

# Humboldt Core Terms useful as is:

*samplingPerformedBy*
*identifiedBy*
*verbatimSiteNames*
*eventDuration* [I use samplingDurationMinutes at the moment, which will
translate easily.]
eventDurationUnit
*targetTaxonomicScope*
*targetLifeStageScope*
*excludedLifeStageScope*
*targetDegreeOfEstablishmentScope* [I'm hoping I can use a different
vocabulary here, as I also use "endemic": some of my surveys are only
of New Zealand endemics and exclude species native to Australia that
established here relatively recently and are now also NZ natives. I also
find the concepts "invasive" and "widespread invasive" too slippery to use
for species scope, so I use "naturalised" when I want to refer to wild
exotic species.]
*excludedDegreeOfEstablishmentScope*
*targetGrowthFormScope* [eg when I'm surveying woody weeds, that would be
targetTaxonomicScope == "Tracheophyta", targetDegreeOfEstablishmentScope ==
"naturalised", excludedDegreeOfEstablishmentScope == "native",
targetGrowthFormScope == "tree|shrub|liane"]
*targetHabitatScope* [eg targetHabitatScope == "road|roadside" for my
roadkill surveys]
*reportedWeather*
*protocolNames*
*protocolDescription*
*protocolReferences*
*isAbundanceReported*
*hasVouchers*
*voucherInstitutions* [although often my collected specimens are still in
my personal collection, eg waiting in my plant press]
isSamplingEffortReported
*samplingEffortValue* [at the moment I'm using the DWC "samplingEffort", eg
samplingEffort == "11.91 km in 244 minutes"]
*taxonCompletenessReported* ["reported complete"]
*isTaxonomicScopeComplete*
*isLifeStageScopeComplete*
*isDegreeOfEstablishmentScopeComplete*
*isGrowthFormScopeComplete* [is there an issue here if these fields are
ever interpreted independently of one another? If I'm surveying woody
weeds, then, in combination, TaxonomicScope == "Tracheophyta" and
GrowthFormScope == "tree|shrub|liane" and DegreeOfEstablishmentScope ==
"naturalised" are complete. However, each independently is not complete,
e.g., I didn't survey all TaxonomicScope == "Tracheophyta". Would it be
better if there were just one isScopeComplete that applies to all the
scopes listed? I can't think at the moment of a survey type that would list
a series of Scope values and differ in whether or not ScopeComplete ==
"true" for each.]

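The point above is that completeness holds for the intersection of the scopes, not for each scope on its own. A minimal Python sketch of that reading follows; the field names and example taxa are hypothetical and chosen only for illustration. A taxon counts as in scope only if it satisfies every listed scope at once, and a single combined completeness flag would describe exactly that set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    """Hypothetical taxon attributes used only to illustrate scope matching."""
    name: str
    taxonomic_group: str
    establishment: str
    growth_form: str


# Woody-weed survey scopes from the example above, as sets of allowed values.
SCOPES = {
    "taxonomic_group": {"Tracheophyta"},
    "establishment": {"naturalised"},
    "growth_form": {"tree", "shrub", "liane"},
}


def in_combined_scope(taxon: Taxon) -> bool:
    """True only if the taxon matches every scope simultaneously."""
    return all(getattr(taxon, field) in allowed for field, allowed in SCOPES.items())


gorse = Taxon("Ulex europaeus", "Tracheophyta", "naturalised", "shrub")
kahikatea = Taxon("Dacrycarpus dacrydioides", "Tracheophyta", "native", "tree")
cocksfoot = Taxon("Dactylis glomerata", "Tracheophyta", "naturalised", "grass")

for taxon in (gorse, kahikatea, cocksfoot):
    print(taxon.name, in_combined_scope(taxon))
```

Kahikatea passes the taxonomic and growth-form scopes individually but fails the establishment scope. This is why a per-scope completeness flag read on its own could mislead.
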
# Humboldt Core Terms useful but with modification:

*samplingEffortUnit* [how do I untangle distance and duration in this
field? If I'm exploring at a site, I have the distance from my GPS and the
duration. I've been regarding those in combination as my sampling effort
for a site.]
*isLeastSpecificTargetCategoryQuantityInclusive* [I have a more complicated
approach to this where I can mark individuals as "different", "possibly
same", and "same", for possible or definite resightings. That lets me
calculate a maximum and minimum range for my count of individuals of a
species at a site. I'm not sure how to handle this here.]
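Both modifications can be illustrated with a small Python sketch. `split_sampling_effort` assumes verbatim effort strings of the form used above ("11.91 km in 244 minutes") and returns separate (value, unit) pairs. Whether Humboldt Core allows more than one samplingEffortValue/samplingEffortUnit pair per event is exactly the open question, so this only shows the untangling step. `count_range` assumes each sighting of a species at a site carries one of the three resighting flags described, judged against all earlier sightings, with the first sighting always flagged "different". All names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for verbatim strings like "11.91 km in 244 minutes".
EFFORT_PATTERN = re.compile(
    r"^\s*(?P<distance>\d+(?:\.\d+)?)\s*km\s+in\s+"
    r"(?P<duration>\d+(?:\.\d+)?)\s*minutes?\s*$"
)


def split_sampling_effort(verbatim: str) -> list[tuple[float, str]]:
    """Split a combined distance-and-duration effort into (value, unit) pairs."""
    match = EFFORT_PATTERN.match(verbatim)
    if match is None:
        raise ValueError(f"unrecognised samplingEffort: {verbatim!r}")
    return [(float(match["distance"]), "km"), (float(match["duration"]), "minutes")]


def count_range(flags: list[str]) -> tuple[int, int]:
    """(minimum, maximum) individuals of one species at one site.

    "different" is always a new individual, "same" never is, and
    "possibly same" counts as new only in the maximum.
    """
    unknown = set(flags) - {"different", "possibly same", "same"}
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    if flags and flags[0] != "different":
        raise ValueError("the first sighting must be flagged 'different'")
    counts = Counter(flags)
    return counts["different"], counts["different"] + counts["possibly same"]


print(split_sampling_effort("11.91 km in 244 minutes"))
print(count_range(["different", "possibly same", "same", "different", "possibly same"]))
```

The first call yields a distance pair and a duration pair; the second yields a range of 2 to 4 individuals.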

# Concepts in my wildcounts data that seem to be missing from Humboldt Core Terms:

*geospatialScopeDistanceInKilometers* [most of my surveys are
distance-sampling along unbounded transects, for which I have distance not
area. Also, when I'm exploring sites, I record with my GPS the total
distance I travel while surveying the site. In both cases, sampling
distance is more relevant for my data than sampling area. I'm not sampling
with plots. I need somewhere standard to put my samplingDistanceKm.]
*targetPhenologyScope* [eg often in my repeat surveys I'm just mapping out
individual plants that are currently flowering or fruiting. The rest get
ignored. I call this whatsoughtReproductiveCondition in my data.]
*targetSeenHeardScope* [e.g., if I'm in a car, I'm only counting the birds
I see, while I also include the birds I hear when I'm biking. Similarly, if
I were extracting data from my AudioMoth that runs in our garden, that would
be all birds heard only.]
*targetWildCaptiveScope* [sometimes I only survey the wild individuals.
This is an important distinction to make when surveying weeds in urban
areas.]
*targetLargestBodyLengthScope* [eg sometimes I'm just surveying big birds,
or big butterflies, such as when I'm surveying from a moving car. I have a
minimum and maximum value for whatsoughtLongestBodyDimensionMetres in my
data to capture this.]
*targetDeadOrAliveScope* [eg when surveying roadkill they're all dead. I
can be surveying dead birds along a drive but not surveying all live birds.]
*whatsoughtBodyCondition* [when I'm repeatedly surveying roadkill along
standard routes, I'm only counting the fresh carcasses, so in my data
whatsoughtBodyCondition == "dead carcass <24 hours old". I don’t expect
Humboldt Core to contain terms for details like this, but I’m also not sure
how to handle them in Humboldt Core data.]
*samplingDurationMinutes* [this will be samplingEffortValue and
samplingEffortUnit when there is somewhere else to put my sampled distance]
*predetermined* [sometimes I might hear an interesting bird and go outside
and do a checklist including that bird. This, in my data, is a survey with
predetermined=="false". If I do a planned survey at a planned time,
irrespective of the conditions and species present, then that's
predetermined=="true". In my data, predetermined can be assigned both at a
whole survey level and at a howSought level for each Scope (eg if I went
outside to look for an interesting bird I heard, and decided to survey
butterflies while I'm at it, then howSoughtPredetermined == "false" for
my bird list but howSoughtPredetermined == "true" for my butterfly list).
I'm not sure where that concept fits into Humboldt Core.]
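One way to hold predetermined at both levels is a nested record, sketched in Python below. The class and field names are hypothetical, not Humboldt Core terms. The example encodes the case above: an opportunistic bird checklist with a butterfly list added on the spot. Falling back to the survey-level flag when no per-scope flag exists is an assumed design choice.

```python
from dataclasses import dataclass, field


@dataclass
class HowSought:
    """Hypothetical per-scope list with its own predetermined flag."""
    target_scope: str
    predetermined: bool


@dataclass
class Survey:
    """Hypothetical survey record with a whole-survey predetermined flag."""
    predetermined: bool
    how_sought: list[HowSought] = field(default_factory=list)

    def predetermined_for(self, target_scope: str) -> bool:
        """Per-scope flag if one is recorded, else fall back to the survey flag."""
        for entry in self.how_sought:
            if entry.target_scope == target_scope:
                return entry.predetermined
        return self.predetermined


# Went outside for an interesting bird, then decided to survey butterflies too.
survey = Survey(
    predetermined=False,
    how_sought=[HowSought("birds", False), HowSought("butterflies", True)],
)
print(survey.predetermined_for("birds"), survey.predetermined_for("butterflies"))
```
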

# Humboldt Core Terms not used by me (at the moment):

*identificationReferences* [I use scientificNameAuthorship and
namePublishedIn and namePublishedInYear from Darwin Core in my species
database as a source for each species concept, so perhaps that works for
this?]
*siteCount* [this could be calculated for each of my surveys, usually by
summing up sections of my transects]
*siteNestingDescription* [I don't use this now but could add a generic text
description of my method into here. Most of my surveys have a footprintWKT
LINESTRING of where I searched, and these can optionally have nested
subsets of linestrings in different habitats, or sections of a transect, or
time periods. I could describe this in text but it will also be apparent
from the data.]
*verbatimSiteDescriptions*
*geospatialScopeAreaInSquareKilometers* [all my surveys in recent years are
unbounded distance-sampled transects. I use a footprintWKT LINESTRING
generated from a GPX file to define the transect. For some taxa, I restrict
the width of the distance sampled, and so could calculate an area. For most
(eg birds), my observations are all counts in estimated distance bands,
including as far away as I can see and hear with my unaided eye and ear. A
geospatialScopeDistanceInKilometers would work much better for my surveys
than area. At the moment I'm using a LINESTRING footprintWKT and
samplingDistanceKm.]
*totalAreaSampledInSquareKilometers* [comment as above]
*reportedExtremeConditions*
*compilationType*
*compilationSourceTypes*
*inventoryTypes* [At the moment I don't understand what this refers to. If
I do a distance-sampling transect and count and map out all butterflies,
which I do, is that inventoryTypes == "open search"? Or does that mean
something else?]
*isAbundanceCapReported* [I suppose I can include this always as
isAbundanceCapReported == "FALSE"]
*abundanceCap*
*isVegetationCoverReported*
*isAbsenceReported* [lots of absences can be inferred from my
surveys--that's one of the reasons I do them--but I don't have data like
"blackbird = 0". I'm assuming that's the kind of data you mean by
isAbsenceReported == "true"]
*absentTaxa* [although all absent taxa can be inferred from the target
fields like targetTaxonomicScope. I assume you're not expecting impossibly
long lists of absent taxa with every survey dataset. It's more efficient to
say targetTaxonomicScope == "Aves" and just list the birds seen and heard.]
*hasMaterialSamples*
*materialSampleTypes*
*samplingEffortProtocol* [I don't see the difference between this and
samplingEffortValue for my surveys, when my survey method has been
described in protocolDescription]
*taxonCompletenessProtocols* [I'm confused at the moment by how to describe
my sampling method across protocolDescription, samplingEffortProtocol, and
taxonCompletenessProtocols. I sense that taxonCompletenessProtocols is
trying to get at whether a survey has made an estimate of its detection
probability for a surveyed taxon, and how that was assessed, and whether
this influenced the sampling effort. In my case, for example, I might bike
a 20 km route and distance count/map all birds I see and hear. That's a
good fit for the protocolDescription. I'm not sure what I would then need
to state for the samplingEffortProtocol and the taxonCompletenessProtocols
here. Do I ignore them? Detection probability can be modelled from my data
but that doesn't influence how quickly I'm biking or how far I'm biking, so
I'm guessing I have no taxonCompletenessProtocol.]
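The inference described under absentTaxa above can be sketched as a set difference: take the taxa that could plausibly occur, keep those inside the target scope, and subtract the taxa recorded. This minimal Python sketch rests on two assumptions: a hypothetical candidate list mapping each taxon to its class is available, and the survey is reported complete for that scope. Otherwise, nothing can be inferred.

```python
def infer_absent_taxa(
    candidates: dict[str, str], target_scope: str, recorded: set[str], complete: bool
) -> set[str]:
    """Taxa in target_scope that were not recorded (inferred non-detections)."""
    if not complete:
        # Without a completeness claim, unrecorded taxa are simply unknown.
        return set()
    in_scope = {name for name, group in candidates.items() if group == target_scope}
    return in_scope - recorded


# Hypothetical candidate list: taxon name -> class.
CANDIDATES = {
    "Turdus merula": "Aves",
    "Prosthemadera novaeseelandiae": "Aves",
    "Vanessa itea": "Insecta",
}
print(infer_absent_taxa(CANDIDATES, "Aves", {"Turdus merula"}, complete=True))
```

With targetTaxonomicScope == "Aves" and only the blackbird recorded, the tui is inferred as not detected and the butterfly is ignored. A data user with a suitable checklist can therefore derive absences without a long explicit absentTaxa list.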

# Examples of some details of my method that probably just don't belong as Humboldt Core Terms:

gpsSource
visualAid
auditoryAid
dataEntryHardware
dataEntrySoftware
distance bands from observer

On Wed, Aug 9, 2023 at 7:23 AM Yanina Sica <yanina.sica at gmail.com> wrote:

> yuhuuu!!! amazing progress! Thanks Ming and John!
>
> I hope everybody received my invitation to today's meeting.
>
> I might be a couple of minutes late but hope to see everybody soon!
>
> Cheers
> Yani
>
>
> On Mon, Aug 7, 2023 at 6:33 PM Yi Ming Gan <ymgan at naturalsciences.be>
> wrote:
>
>> Hi all,
>>
>>
>> I have done the exercise mentioned in the subject.
>> Please see the rendered html with this link:
>> https://raw.githack.com/biodiversity-aq/humboldt-for-eco-survey-data/main/src/mapping.html
>> The repository is public:
>> https://github.com/biodiversity-aq/humboldt-for-eco-survey-data
>> I hope some of the remarks and questions are useful!
>>
>> I think the guiding principles are fine after last Wednesday's long meeting
>> (special thanks to Yani, Ani and Wesley). The dataset based on the latest term
>> names (from last Wednesday) is good to go. I can update the dataset in
>> test IPT <https://ipt.gbif.org/resource?r=brokewest-fish> when the new
>> terms are up in the sandbox.
>>
>>
>>
>> Cheers
>> Ming
>> _______________________________________________
>> tdwg-humboldt mailing list
>> tdwg-humboldt at lists.tdwg.org
>> https://lists.tdwg.org/mailman/listinfo/tdwg-humboldt
>>

-- 
Robert D Stevenson
Associate Professor
Department of Biology
UMass Boston