[tdwg-humboldt] Inferring non-detection of target taxa of presence-only data using Humboldt Extension | Test dataset with updated terms good to go

Mon Aug 14 13:55:47 UTC 2023

Correction, as Tim pointed out to me, dwc:dynamicProperties can also be
used in the Event Core.
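
To make the two routes concrete, here is a minimal sketch in Python of how the same extra properties could be written either as a JSON string for dwc:dynamicProperties or as rows for the Extended Measurements or Facts extension. The property names are borrowed from Jon's list further down the thread, the values and the eventID "survey-001" are made up, and the helper functions are hypothetical; this is not an IPT feature.

```python
import json

# Hypothetical non-standard survey properties (names from Jon's list later in
# this thread; the values are invented for illustration).
custom = {"samplingDistanceKm": 11.91, "predetermined": True}


def to_dynamic_properties(props):
    """Serialise the properties as one JSON object for dwc:dynamicProperties."""
    return json.dumps(props, sort_keys=True)


def to_emof_rows(event_id, props, units=None):
    """Build one eMoF row per property, linked to the event by eventID."""
    units = units or {}
    rows = []
    for name, value in sorted(props.items()):
        rows.append(
            {
                "eventID": event_id,
                "measurementType": name,
                # Keep strings as they are; encode numbers and booleans as JSON text.
                "measurementValue": value if isinstance(value, str) else json.dumps(value),
                "measurementUnit": units.get(name, ""),
            }
        )
    return rows


dynamic_properties = to_dynamic_properties(custom)
emof_rows = to_emof_rows("survey-001", custom, units={"samplingDistanceKm": "km"})
```

The JSON route keeps everything in one field on the record, while the eMoF route gives each property its own row with a unit, which is probably easier for consumers to filter on.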

On Mon, Aug 14, 2023 at 10:46 AM John Wieczorek <tuco at berkeley.edu> wrote:

> For Events, the Extended Measurements or Facts extension is the only way
> to declare and share properties outside of the mappable terms. For
> Occurrences, one also has the option to use dwc:dynamicProperties.
>
> In the IPT, one can upload a file with fields beyond those that are
> mapped, but only the mapped fields plus those that are set to a constant
> within the IPT get propagated in the output.
>
> On Sun, Aug 13, 2023 at 1:16 PM ys628 <yanina.sica at yale.edu> wrote:
>
>> Hi Rob,
>> Thanks for these comments.
>> I agree that eco:isLeastSpecificTargetCategoryQuantityInclusive is quite
>> a terrible name, but we arrived at it because we took two main things
>> into consideration:
>>
>>    - it is needed to understand how to treat dwc:organismQuantity and
>>    dwc:organismQuantityType, so we thought Quantity should be there
>>    - it allows for multiple target categories (e.g., taxonomic ranks
>>    within a higher rank or different life stages for the same species), so
>>    that's why we left it quite open...
>>
>> Regarding your larger question, this is a very interesting comment and I
>> think we should cover this in the User Guide
>> <https://docs.google.com/document/d/1rX4m94rtZDR_8iIe3RvRnNYKDJcmSX3ii4S5hCznEA0/edit>.
>> My initial thought is that the extended measurements or facts extension
>> (emof) could be useful here. Also, I think you are allowed to include your
>> own terms when using the IPT, but I am not sure.
>>
>> Best!
>>
>> Yani
>>
>>
>> ------------------------------
>> *From:* tdwg-humboldt <tdwg-humboldt-bounces at lists.tdwg.org> on behalf
>> of Rob Stevenson <rdstevenson10 at gmail.com>
>> *Sent:* Friday, August 11, 2023 6:14 PM
>> *To:* Humboldt Core TG <tdwg-humboldt at lists.tdwg.org>
>> *Cc:* wmh6 at cornell.edu <wmh6 at cornell.edu>
>> *Subject:* Re: [tdwg-humboldt] Inferring non-detection of target taxa of
>> presence-only data using Humboldt Extension | Test dataset with updated
>> terms good to go
>>
>> Thanks for the documents Ming and Yani. It is nice to see the continued
>> progress.  Sorry I am not able to join this week's meeting.
>>
>> A few comments which may not be useful after the meeting.
>>
>> I did look at the eco:isLeastSpecificTargetCategoryQuantityInclusive
>> Guidelines here:
>> https://docs.google.com/document/d/11fPW4JibRWUnrtTOZwDkAcxo5d-lwnATXyh3xmO65rs/edit
>>
>> The purpose is clear after reading the document, but I found the term name
>> "isLeastSpecificTargetCategoryQuantityInclusive" non-intuitive.
>>
>> Could the words "counts" and "preCalculated" be used somehow?
>>
>> In reading the beginning of the document there is a sentence that says
>>
>> " need to be treated differently in order to calculate the total quantity
>> of organisms in the *least specific category*. "
>> I suggest saying "in the *least specific taxonomic category*", which will
>> often be the species level.
>>
>> A larger question: if someone has terms that the Event Extension does
>> not support, what options does that person have to share the data?
>>
>> I was thinking specifically of Jon Sullivan's old email. Could the event
>> extension documents provide general advice for people with terms like Jon's?
>>
>> Best, Rob
>>
>> From: Sullivan, Jon <Jon.Sullivan at lincoln.ac.nz> via lists.tdwg.org
>> Date: Wed, Apr 5, 10:21 PM
>> To: tdwg-humboldt at lists.tdwg.org
>> Hello fine Humboldt Core people,
>>
>> I’ve been lurking on the mailing list and finally sat down and had a
>> proper look at how all the current HC terms map onto my ecological
>> surveying. Perhaps my feedback is still useful at this late stage. (Rob
>> Stevenson invited me to do this while you were doing the case studies at
>> the end of last year but I got swamped.)
>>
>> I’m on something of a personal mission to document the changes in the
>> readily detectable and identifiable species around me. My surveys hit the
>> 20-year mark last Saturday and I’m at over 1.5 million observations.
>> They’re mostly still in geotagged audio notes but I’ve had AWS Transcribe
>> cranking through them over the last month and a half and I’m about half way
>> through now. I’m hoping that my dataset is just the kind of ecological
>> survey project that will benefit from the standardisation offered
>> by Humboldt Core.
>>
>> Anyway, with that preamble, below is a summary of how my data structure
>> maps onto the draft Humboldt Core. My biggest puzzle is how to describe my
>> unbounded distance-sampled transect counts within a structure that only
>> seems to quantify the surveyed sites by area.
>>
>> Cheers,
>>
>> Jon
>>
>> # Humboldt Core Terms useful as is:
>>
>> *samplingPerformedBy*
>> *identifiedBy*
>> *verbatimSiteNames*
>> *eventDuration* [I use samplingDurationMinutes at the moment, which will
>> translate easily.]
>> eventDurationUnit
>> *targetTaxonomicScope*
>> *targetLifeStageScope*
>> *excludedLifeStageScope*
>> *targetDegreeOfEstablishmentScope* [I'm hoping I can use a different
>> vocabulary in here as I also use "endemic", as some of my surveys are only
>> of New Zealand endemics and do not include relatively recently established
>> Australian natives that are now also NZ natives. I also find the concepts
>> "invasive" and "widespread invasive" too slippery to use for species scope,
>> so I use "naturalised" when I want to refer to wild exotic species.]
>> *excludedDegreeOfEstablishmentScope*
>> *targetGrowthFormScope* [eg when I'm surveying woody weeds, that would
>> be targetTaxonomicScope == Tracheophyta, targetDegreeOfEstablishmentScope
>> == "naturalised", excludedDegreeOfEstablishmentScope == "native",
>> targetGrowthFormScope == "tree|shrub|liane"]
>> *targetHabitatScope* [eg targetHabitatScope == "road|roadside" for my
>> roadkill surveys]
>> *reportedWeather*
>> *protocolNames*
>> *protocolDescription*
>> *protocolReferences*
>> *isAbundanceReported*
>> *hasVouchers*
>> *voucherInstitutions* [although often my collected specimens are still
>> in my personal collection, eg waiting in my plant press]
>> isSamplingEffortReported
>> *samplingEffortValue* [at the moment I'm using the DWC "samplingEffort",
>> eg samplingEffort == "11.91 km in 244 minutes"]
>> *taxonCompletenessReported* ["reported complete"]
>> *isTaxonomicScopeComplete*
>> *isLifeStageScopeComplete*
>> *isDegreeOfEstablishmentScopeComplete*
>> *isGrowthFormScopeComplete* [is there an issue here if these fields are
>> ever interpreted independently of one another? If I'm surveying woody
>> weeds, then, in combination, TaxonomicScope == "Tracheophyta" and
>> LifeStageScope == "tree|shrub|liane" and DegreeOfEstablishmentScope ==
>> "naturalised" are complete. However, each independently is not complete,
>> e.g., I didn't survey all TaxonomicScope == "Tracheophyta". Would it be
>> better if there was just one isScopeComplete and that applies to all the
>> scopes listed? I can't think at the moment of a survey type that would list
>> a series of Scope values and differ in whether or not ScopeComplete ==
>> "true" for each.]
>>
>>
>> # Humboldt Core Terms useful but with modification:
>>
>> *samplingEffortUnit* [how do I untangle distance and duration in this
>> field? If I'm exploring at a site, I have the distance from my GPS and the
>> duration. I've been regarding those in combination as my sampling effort
>> for a site.]
>> *isLeastSpecificTargetCategoryQuantityInclusive* [I have a more
>> complicated approach to this where I can mark individuals as "different",
>> "possibly same", and "same", for possible or definite resightings. That
>> lets me calculate a maximum and minimum range for my count of individuals
>> of a species at a site. I'm not sure how to handle this here.]
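
A minimal sketch of how the minimum and maximum counts described above might be derived from the three resighting labels. It assumes each sighting of a species at a site carries exactly one of "different", "possibly same", or "same", and that the first sighting is labelled "different"; the function name is hypothetical and this is only one reading of Jon's approach.

```python
from collections import Counter

ALLOWED_FLAGS = {"different", "possibly same", "same"}


def count_range(resighting_flags):
    """Return (minimum, maximum) individuals implied by the resighting labels.

    "different"     -> definitely a new individual (counts toward both bounds)
    "possibly same" -> may be a resighting (counts toward the maximum only)
    "same"          -> definite resighting (counts toward neither bound)
    """
    counts = Counter(resighting_flags)
    unknown = set(counts) - ALLOWED_FLAGS
    if unknown:
        raise ValueError(f"unexpected resighting labels: {sorted(unknown)}")
    minimum = counts["different"]
    maximum = counts["different"] + counts["possibly same"]
    return minimum, maximum


flags = ["different", "same", "possibly same", "different", "possibly same"]
low, high = count_range(flags)
```

With the five sightings above, two are definitely distinct individuals and two more might be, so the range is 2 to 4. How such a range would map onto a single dwc:organismQuantity value is still the open question.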
>>
>>
>> # Concepts in my wildcounts data that seem to be missing from Humboldt
>> Core Terms:
>>
>> *geospatialScopeDistanceInKilometers* [most of my surveys are
>> distance-sampling along unbounded transects, for which I have distance not
>> area. Also, when I'm exploring sites, I record with my GPS the total
>> distance I travel while surveying the site. In both cases, sampling
>> distance is more relevant for my data than sampling area. I'm not sampling
>> with plots. I need somewhere standard to put my samplingDistanceKm.]
>> *targetPhenologyScope* [eg often in my repeat surveys I'm just mapping
>> out individual plants that are currently flowering or fruiting. The rest
>> get ignored. I call this whatsoughtReproductiveCondition in my data.]
>> *targetSeenHeardScope* [e.g., if I'm in a car, I'm only counting the
>> birds I see, while I also include the birds I hear when I'm biking.
>> Similarly, if I was extracting data from my AudioMoth that runs in our
>> garden, that would be all birds heard only.]
>> *targetWildCaptiveScope* [sometimes I only survey the wild individuals.
>> This is an important distinction to make when surveying weeds in urban
>> areas.]
>> *targetLargestBodyLengthScope* [eg sometimes I'm just surveying big
>> birds, or big butterflies, such as when I'm surveying from a moving car. I
>> have a minimum and maximum value for whatsoughtLongestBodyDimensionMetres
>> in my data to capture this.]
>> *targetDeadOrAliveScope* [eg when surveying roadkill they're all dead. I
>> can be surveying dead birds along a drive but not surveying all live birds.]
>> *whatsoughtBodyCondition* [when I'm repeat-surveying roadkill along
>> standard routes, I'm only counting the fresh carcasses, so in my data
>> whatsoughtBodyCondition == "dead carcass <24 hours old". I don’t expect
>> Humboldt Core to contain terms for details like this, but I’m also not sure
>> how to handle them in Humboldt Core data.]
>> *samplingDurationMinutes* [this will be samplingEffortValue and
>> samplingEffortUnit when there is somewhere else to put my sampled distance]
>> *predetermined* [sometimes I might hear an interesting bird and go
>> outside and do a checklist including that bird. This, in my data, is a
>> survey with predetermined=="false". If I do a planned survey at a planned
>> time, irrespective of the conditions and species present, then that's
>> predetermined=="true". In my data, predetermined can be assigned both at a
>> whole survey level and at a howSought level for each Scope (eg if I went
>> outside to look for an interesting bird I heard, and decided to survey
>> butterflies while I'm at it, then I'm howSoughtPredetermined == "false" for
>> my bird list but howSoughtPredetermined == "true" for my butterfly list).
>> I'm not sure where that concept fits into Humboldt Core.]
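
As a rough illustration of untangling distance and duration, here is a sketch that splits a combined samplingEffort string such as the "11.91 km in 244 minutes" example above into separate values. The pattern only handles that one phrasing, and placing the distance in samplingDistanceKm (Jon's own field name, not a Humboldt term) is an assumption.

```python
import re

# Matches strings of the form "<number> km in <number> minutes" only.
EFFORT_PATTERN = re.compile(
    r"^\s*(?P<distance>\d+(?:\.\d+)?)\s*km\s+in\s+"
    r"(?P<duration>\d+(?:\.\d+)?)\s*minutes\s*$"
)


def split_effort(sampling_effort):
    """Split a combined effort string into distance and duration fields."""
    match = EFFORT_PATTERN.match(sampling_effort)
    if match is None:
        raise ValueError(f"unrecognised samplingEffort: {sampling_effort!r}")
    return {
        "samplingDistanceKm": float(match["distance"]),
        "samplingEffortValue": float(match["duration"]),
        "samplingEffortUnit": "minutes",
    }


effort = split_effort("11.91 km in 244 minutes")
```

This keeps duration as the reported sampling effort and carries distance alongside it until there is a standard place to put it.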
>>
>> # Humboldt Core Terms not used by me (at the moment):
>>
>> *identificationReferences* [I use scientificNameAuthorship and
>> namePublishedIn and namePublishedInYear from Darwin Core in my species
>> database as a source for each species concept, so perhaps that works for
>> this?]
>> *siteCount* [this could be calculated for each of my surveys, usually by
>> summing up sections of my transects]
>> *siteNestingDescription* [I don't use this now but could add a generic
>> text description of my method into here. Most of my surveys have a
>> footprintWKT LINESTRING of where I searched, and these can optionally have
>> nested subsets of linestrings in different habitats, or sections of a
>> transect, or time periods. I could describe this in text but it will also be
>> apparent from the data.]
>> *verbatimSiteDescriptions*
>> *geospatialScopeAreaInSquareKilometers* [all my surveys in recent years
>> are unbounded distance-sampled transects. I use a footprintWKT LINESTRING
>> generated from a GPX file to define the transect. For some taxa, I restrict
>> the width of the distance sampled, and so could calculate an area. For most
>> (eg birds), my observations are all counts in estimated distance bands,
>> including as far away as I can see and hear with my unaided eye and ear. A
>> geospatialScopeDistanceInKilometers would work much better for my surveys
>> than area. At the moment I'm using a LINESTRING footprintWKT and
>> samplingDistanceKm.]
>> *totalAreaSampledInSquareKilometers* [comment as above]
>> *reportedExtremeConditions*
>> *compilationType*
>> *compilationSourceTypes*
>> *inventoryTypes* [At the moment I don't understand what this refers to.
>> If I do a distance-sampling transect and count and map out all butterflies,
>> which I do, is that inventoryTypes == "open search"? Or does that mean
>> something else?]
>> *isAbundanceCapReported* [I suppose I can include this always as
>> isAbundanceCapReported == "FALSE"]
>> *abundanceCap*
>> *isVegetationCoverReported*
>> *isAbsenceReported* [lots of absences can be inferred from my
>> surveys--that's one of the reasons I do them--but I don't have data like
>> "blackbird = 0". I'm assuming that's the kind of data you're meaning by
>> isAbsenceReported == "true"]
>> *absentTaxa* [although all absent taxa can be inferred from the target
>> fields like targetTaxonomicScope. I assume you're not expecting impossibly
>> long lists of absent taxa with every survey dataset. It's more efficient to
>> say targetTaxonomicScope == "Aves" and just list the birds seen and heard.]
>> *hasMaterialSamples*
>> *materialSampleTypes*
>> *samplingEffortProtocol* [I don't see the difference between this and
>> samplingEffortValue for my surveys, when my survey method has been
>> described in protocolDescription]
>> *taxonCompletenessProtocols* [I'm confused at the moment by how to
>> describe my sampling method across protocolDescription,
>> samplingEffortProtocol, and taxonCompletenessProtocols. I sense that
>> taxonCompletenessProtocols is trying to get at whether a survey has made an
>> estimate of its detection probability for a surveyed taxon, and how that
>> was assessed, and whether this influenced the sampling effort. In my case,
>> for example, I might bike a 20 km route and distance count/map all birds I
>> see and hear. That's a good fit for the protocolDescription. I'm not sure
>> what I would then need to state for the samplingEffortProtocol and the
>> taxonCompletenessProtocols here. Do I ignore them? Detection probability can
>> be modelled from my data but that doesn't influence how quickly I'm biking
>> or how far I'm biking, so I'm guessing I have no taxonCompletenessProtocol.]
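
For transects stored as a footprintWKT LINESTRING, a sampled distance in kilometres could be derived roughly as sketched below. The sketch assumes coordinates are in longitude latitude order in decimal degrees (WGS84) and uses the haversine formula on a spherical Earth, so it is an approximation; the function names are hypothetical and only the standard library is used.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def parse_linestring(wkt):
    """Extract (lon, lat) pairs from a simple WKT LINESTRING."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a WKT LINESTRING")
    points = []
    for pair in match.group(1).split(","):
        lon, lat = pair.split()[:2]  # ignore any Z value
        points.append((float(lon), float(lat)))
    return points


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def linestring_length_km(wkt):
    """Sum the segment lengths along the LINESTRING."""
    points = parse_linestring(wkt)
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))


# One degree of latitude, roughly 111 km.
length_km = linestring_length_km("LINESTRING (172.0 -43.0, 172.0 -44.0)")
```

A value computed this way from the GPX-derived footprint should be close to the distance reported by the GPS itself, though the two will differ slightly depending on track smoothing.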
>>
>> # Examples of some details of my method that probably just don't belong as
>> Humboldt Core Terms
>>
>> gpsSource
>> visualAid
>> auditoryAid
>> dataEntryHardware
>> dataEntrySoftware
>> distance bands from observer
>>
>>
>>
>>
>> On Wed, Aug 9, 2023 at 7:23 AM Yanina Sica <yanina.sica at gmail.com> wrote:
>>
>> yuhuuu!!! amazing progress! Thanks Ming and John!
>>
>> I hope everybody received my invitation to today's meeting.
>>
>> I might be a couple of minutes late but hope to see everybody soon!
>>
>> Cheers
>> Yani
>>
>>
>> On Mon, Aug 7, 2023 at 6:33 PM Yi Ming Gan <ymgan at naturalsciences.be>
>> wrote:
>>
>> Hi all,
>>
>>
>> I have done the exercise mentioned in the subject.
>> Please see the rendered html with this link:
>> https://raw.githack.com/biodiversity-aq/humboldt-for-eco-survey-data/main/src/mapping.html
>> The repository is public:
>> https://github.com/biodiversity-aq/humboldt-for-eco-survey-data
>> I hope some of the remarks and questions are useful!
>>
>> I think the guiding principles are fine after last Wednesday's long meeting
>> (special thanks to Yani, Ani and Wesley). The dataset based on the latest term
>> names (from last Wednesday) is good to go. I can update the dataset in the
>> test IPT <https://ipt.gbif.org/resource?r=brokewest-fish> when the new
>> terms are up in the sandbox.
>>
>>
>>
>> Cheers
>> Ming
>> _______________________________________________
>> tdwg-humboldt mailing list
>> tdwg-humboldt at lists.tdwg.org
>> https://lists.tdwg.org/mailman/listinfo/tdwg-humboldt
>>
>> --
>> Robert D Stevenson
>> Associate Professor
>> Department of Biology
>> UMass Boston
>>
>