[tdwg-content] Delimiters for Darwin Core list-type terms

Markus Döring m.doering at mac.com
Mon Oct 7 21:04:53 CEST 2013


Rich,
is there any other term apart from dynamicProperties that would use the KVP delimiter?

Markus



On 07.10.2013, at 20:07, Richard Pyle wrote:

> I am very supportive of this being at least recommended (and used
> consistently in documentation) for DWC.  I understand Tim's and Hilmar's and
> Joel's and Bob's point(s), but we have HIGHLY normalized data, and we still
> find it very useful in some cases to pass delimited arrays within a single
> value (usually in the form of UUIDs, but also other things like full
> taxonomies and such).  So if there are *ever* times when it's useful (and I
> believe it is), then a recommended "best practice" (and consistency in
> documentation) is a good thing.  Maybe in the future we will no longer have
> any need to flatten data, but we're not at the future yet, so let's try to
> make the present (and near-term future) a bit easier to deal with.
> 
> I am also in strong favor of the pipe ("|") -- for the reasons that John and
> others mentioned, and also because to human eyeballs, it's more intuitive, I
> think -- and it's easier to detect programmatically when a value is an array
> of similar values separated by a rarely-used character (not that anyone
> *should* be trying to detect it programmatically, but reality does
> bite....).
> 
> I would like to go a step further and see a secondary standard delimiter
> defined (for key-value pairs).  The obvious one would be "=", but it may
> appear in fields not as a delimiter.  Thus, we have been using the tilde "~"
> for this secondary delimiter.  However, I don't have a strong opinion -- I
> just would like to see consistency.  The "uniqueness" of the secondary
> delimiter is less critical, because "=" seems unlikely to show up
> *within* a delimited value.  In other words, once you know
> it's a delimited list (by the presence of the primary delimiter), the risk
> of confusion on the secondary delimiter goes way down. But I'd still vote
> for tilde.
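
A minimal sketch of the two-level scheme proposed above, assuming the pipe as the primary (list) delimiter and the tilde as the secondary (key-value) delimiter. The function name parse_list and the sample values are hypothetical and not part of any Darwin Core tooling.

```python
def parse_list(value, primary="|", secondary="~"):
    """Split a flattened value on the primary delimiter.

    Items containing the secondary delimiter become (key, value) tuples;
    all other items stay plain strings. Both delimiters are assumptions
    taken from the proposal in this thread, not a ratified standard.
    """
    items = []
    for raw in value.split(primary):
        item = raw.strip()
        if not item:
            continue  # skip empty segments such as a trailing delimiter
        if secondary in item:
            # Split on the first tilde only, so later tildes stay in the value.
            key, _, val = item.partition(secondary)
            items.append((key.strip(), val.strip()))
        else:
            items.append(item)
    return items


# Hypothetical sample values for illustration only.
plain = parse_list("A | B | C")
pairs = parse_list("tragusLengthInMeters~0.014 | weightInGrams~120 | note~a=b")
```

Because "=" is not treated as a delimiter here, a value such as "a=b" passes through untouched, which is the situation the tilde is meant to handle.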
> 
> Aloha,
> Rich
> 
>> -----Original Message-----
>> From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-
>> bounces at lists.tdwg.org] On Behalf Of John Wieczorek
>> Sent: Monday, October 07, 2013 3:47 AM
>> To: Steve Baskauf
>> Cc: TDWG Content Mailing List
>> Subject: Re: [tdwg-content] Delimiters for Darwin Core list-type terms
>> 
>> I second the motion. Steve, is there any documentation on the logic behind the
>> choice of delimiter for AC that will help people here?
>> 
>> On Mon, Oct 7, 2013 at 3:45 PM, Steve Baskauf
>> <steve.baskauf at vanderbilt.edu> wrote:
>>> I don't have an opinion about what the recommended delimiter should
>>> be, but I think it would be beneficial for there to be consistency
>>> between Darwin Core and Audubon Core.  You can see what the
>>> recommendation is for Audubon Core at
>>> http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_text_values
>>> - it's the pipe "|".  Either Darwin Core should go with this, or if
>>> there is a consensus reached here that is different, then AC should be
>>> changed before it is ratified, which potentially could happen in a
>>> matter of weeks.  It is highly likely that there will be records that
>>> are a mixture of AC and DwC, so it would not be a good thing for the
>>> recommendations to differ.
>>> 
>>> Steve
>>> 
>>> Markus Döring wrote:
>>> 
>>> Hi John et al.,
>>> 
>>> I would like to see a single recommended default delimiter,
>>> preferably the semicolon, as it's natural and hardly used in values.
>>> For dwc archives there is a multiValueDelimiter attribute for every
>>> term mapping that allows declaring other delimiters if needed.
>>> 
>>> Currently it is hardly possible to detect multiple values in a field.
>>> You can only test for some often-used delimiters, and even then you
>>> never know whether they were meant to be delimiters.
>>> Having a single default delimiter helps to get the idea of multiple
>>> values across and makes it a bit more accessible, I believe.
>>> 
>>> dwc:vernacularName I would personally prefer to see as a single-value
>>> term, as it is mostly useful in combination with a locale and is rarely
>>> shared on its own.
>>> Seeing dwc:typeStatus as a multi-value term also feels wrong, as the
>>> name is in the singular while the others already carry the multi-value
>>> nature in the name.
>>> 
>>> 
>>> Markus
>>> 
>>> 
>>> 
>>> On 07.10.2013, at 12:28, John Wieczorek wrote:
>>> 
>>> 
>>> 
>>> Dear all,
>>> 
>>> On the list of pending Darwin Core issues is a topic of general
>>> concern about terms that could or do recommend the concatenation and
>>> delimiting of a list of values. The specific issue was submitted on
>>> the Darwin Core Project site at
>>> https://code.google.com/p/darwincore/issues/detail?id=168. Right now
>>> there is variation in the recommendations of distinct terms.
>>> 
>>> The Darwin Core terms that could be used to hold lists include the
>>> following (use the index at
>>> http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the
>>> details of each of these):
>>> 
>>> informationWithheld
>>> dataGeneralizations
>>> dynamicProperties
>>> recordedBy
>>> preparations
>>> otherCatalogNumbers
>>> previousIdentifications
>>> associatedMedia
>>> associatedReferences
>>> associatedOccurrences
>>> associatedSequences
>>> associatedTaxa
>>> higherGeography
>>> georeferenceSources
>>> typeStatus
>>> higherClassification
>>> vernacularName
>>> 
>>> There are some issues. Many terms do not show examples. Most of those
>>> that do show examples recommend semi-colon (';') -
>>> associatedOccurrences, recordedBy, preparations, otherCatalogNumbers,
>>> previousIdentifications, higherGeography, georeferenceSources, and
>>> higherClassification. The example for higherClassification does not
>>> have spaces after the semi-colon while all others do.
>>> 
>>> Terms that could hold a list of URLs would require a delimiter that
>>> would be an invalid part of a URL unless it was escaped. This
>>> precludes comma (','), semi-colon (';'), and colon (':'), among
>>> others. One possibility here might be the vertical bar or "pipe"
>>> ('|').
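
To illustrate the URL point above, here is a small sketch that joins and re-splits a list of URLs with two candidate delimiters. The example.org URLs and the helper names are hypothetical; the only assumption is that real URLs may legitimately contain semicolons and commas.

```python
def join_values(values, delimiter):
    """Concatenate values into one flattened field."""
    return delimiter.join(values)


def split_values(value, delimiter):
    """Split a flattened field back into trimmed, non-empty values."""
    return [v.strip() for v in value.split(delimiter) if v.strip()]


# Hypothetical URLs that contain ';' and ',' as ordinary URL characters.
urls = [
    "http://example.org/media/1.jpg;size=large",
    "http://example.org/search?q=a,b",
]

# The semicolon inside the first URL is mistaken for a list delimiter.
semicolon_round_trip = split_values(join_values(urls, ";"), ";")

# The pipe does not occur unescaped in these URLs, so the list survives.
pipe_round_trip = split_values(join_values(urls, " | "), "|")
```

With the semicolon the two URLs come back as three fragments, while the pipe round trip returns the original list.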
>>> 
>>> The term dynamicProperties is meant to take key-value pairs. The
>>> examples suggest the format key=value, with any list delimited by a
>>> semi-colon, for example, "tragusLengthInMeters=0.014;
>>> weightInGrams=120". The example for associatedTaxa also shows a
>>> key-value pair ("host: Quercus alba"), but it is formatted differently
>>> from the examples for dynamicProperties. There are other terms, such
>>> as vernacularName, which could potentially also take a key-value pair,
>>> though it is not currently recommended to be a list.
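
The two key-value formats mentioned above can be compared with a short sketch. The function parse_key_value_list is hypothetical; it simply takes the list delimiter and the key-value delimiter as parameters, so the same code reads both the dynamicProperties example and the associatedTaxa example.

```python
def parse_key_value_list(value, list_delimiter=";", kv_delimiter="="):
    """Parse a flattened list of key-value pairs into a dict.

    The defaults follow the dynamicProperties example (key=value pairs
    separated by semi-colons). Values are kept as strings.
    """
    props = {}
    for raw in value.split(list_delimiter):
        item = raw.strip()
        if not item:
            continue  # ignore empty segments
        key, sep, val = item.partition(kv_delimiter)
        if not sep:
            raise ValueError(f"no key-value delimiter in item: {item!r}")
        props[key.strip()] = val.strip()
    return props


# Example string from the dynamicProperties term definition.
props = parse_key_value_list("tragusLengthInMeters=0.014; weightInGrams=120")

# Example string from associatedTaxa, which uses ':' instead of '='.
taxa = parse_key_value_list("host: Quercus alba", kv_delimiter=":")
```

A consumer has to know in advance which key-value delimiter each term uses, which is the inconsistency a single recommendation would remove.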
>>> 
>>> Please ignore the issue of whether the idea of list-type terms is a
>>> good idea or not - that is not the issue we're trying to resolve here.
>>> Instead, the issue is whether a consistent recommendation can be made
>>> for how to delimit the values in a list. And if not a consistent
>>> recommendation, can we make specific recommendations for distinct
>>> terms? If specific recommendations can be made for a term, should that
>>> be reflected in examples within the term definitions, or should such
>>> recommendations reside only in Type 3 supplementary documentation such
>>> as that which can be found on the Darwin Core Project site at, for
>>> example,
>>> https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences?
>>> Should some of these terms have specific recommendations to contain
>>> only single values (e.g., vernacularName), in which case they are not
>>> really viable in Simple Darwin Core?
>>> 
>>> Cheers,
>>> 
>>> John
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>> 
>>> --
>>> Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept.
>>> of Biological Sciences
>>> 
>>> postal mail address:
>>> PMB 351634
>>> Nashville, TN  37235-1634,  U.S.A.
>>> 
>>> delivery address:
>>> 2125 Stevenson Center
>>> 1161 21st Ave., S.
>>> Nashville, TN 37235
>>> 
>>> office: 2128 Stevenson Center
>>> phone: (615) 343-4582,  fax: (615) 322-4942 If you fax, please phone
>>> or email so that I will know to look for it.
>>> http://bioimages.vanderbilt.edu
