[tdwg-content] Delimiters for Darwin Core list-type terms

Richard Pyle deepreef at bishopmuseum.org
Mon Oct 7 20:07:24 CEST 2013


I am very supportive of this being at least recommended (and used
consistently in documentation) for DWC.  I understand Tim's and Hilmar's and
Joel's and Bob's point(s), but we have HIGHLY normalized data, and we still
find it very useful in some cases to pass delimited arrays within a single
value (usually in the form of UUIDs, but also other things like full
taxonomies and such).  So if there are *ever* times when it's useful (and I
believe it is), then a recommended "best practice" (and consistency in
documentation) is a good thing.  Maybe in the future we will no longer have
any need to flatten data, but we're not at the future yet, so let's try to
make the present (and near-term future) a bit easier to deal with.

I am also in strong favor of the pipe ("|") -- for the reasons that John and
others mentioned, and also because to human eyeballs, it's more intuitive, I
think -- and it's easier to detect programmatically when a value is an array
of similar values separated by a rarely-used character (not that anyone
*should* be trying to detect it programmatically, but reality does
bite....).
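
To make that concrete, here is a minimal Python sketch of splitting (and, if you must, detecting) a pipe-delimited value. The function names and sample values are mine, invented for illustration; nothing here is part of any DwC or AC recommendation, and it assumes the pipe never appears unescaped inside an individual value.

```python
def split_list_value(value, delimiter="|"):
    """Split a delimited value into its trimmed, non-empty parts.

    Hypothetical helper: assumes the delimiter never occurs inside
    an individual value.
    """
    return [part.strip() for part in value.split(delimiter) if part.strip()]


def looks_like_list(value, delimiter="|"):
    """Heuristic only: True if the value splits into more than one part."""
    return len(split_list_value(value, delimiter)) > 1


# A semicolon inside a URL is left alone because only the pipe is a delimiter.
media = "http://example.org/a?x=1;y=2 | http://example.org/b"
parts = split_list_value(media)
```

With a semicolon delimiter the first URL above would have been cut in two, which is the problem John describes for terms that hold lists of URLs.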

I would like to go a step further and see a secondary standard delimiter
defined (for key-value pairs).  The obvious one would be "=", but it may
appear in fields not as a delimiter.  Thus, we have been using the tilde "~"
for this secondary delimiter.  However, I don't have a strong opinion -- I
just would like to see consistency.  The "uniqueness" of the secondary
delimiter is less critical, because "=" seems less likely to show up
*within* a delimited value.  In other words, once you know
it's a delimited list (by the presence of the primary delimiter), the risk
of confusion on the secondary delimiter goes way down. But I'd still vote
for tilde.
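
As a rough illustration of the two-level scheme, the following Python sketch parses a value that uses the pipe as the primary delimiter and the tilde as the secondary (key-value) delimiter. The function name is hypothetical and the sample keys are borrowed from John's dynamicProperties example; it assumes neither delimiter occurs inside a key, and that the pipe does not occur inside a value.

```python
def parse_key_value_list(value, primary="|", secondary="~"):
    """Parse 'key~value|key~value' into a dict of trimmed strings.

    Hypothetical helper: each item is split on the FIRST secondary
    delimiter only, so later tildes stay in the value.
    """
    pairs = {}
    for item in value.split(primary):
        item = item.strip()
        if not item:
            continue  # skip empty items such as a trailing delimiter
        key, sep, val = item.partition(secondary)
        if not sep:
            raise ValueError(f"no {secondary!r} found in item {item!r}")
        pairs[key.strip()] = val.strip()
    return pairs


props = parse_key_value_list("tragusLengthInMeters~0.014 | weightInGrams~120")
```

Because "=" is not treated as a delimiter here, a value such as "note~pH=7" comes through with its "=" intact.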

Aloha,
Rich

> -----Original Message-----
> From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-
> bounces at lists.tdwg.org] On Behalf Of John Wieczorek
> Sent: Monday, October 07, 2013 3:47 AM
> To: Steve Baskauf
> Cc: TDWG Content Mailing List
> Subject: Re: [tdwg-content] Delimiters for Darwin Core list-type terms
> 
> I second the motion. Steve, is there any documentation on the logic behind
> the choice of delimiter for AC that will help people here?
> 
> On Mon, Oct 7, 2013 at 3:45 PM, Steve Baskauf
> <steve.baskauf at vanderbilt.edu> wrote:
> > I don't have an opinion about what the recommended delimiter should
> > be, but I think it would be beneficial for there to be consistency
> > between Darwin Core and Audubon Core.  You can see what the
> > recommendation is for Audubon Core at
> >
> > http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_text_values
> > - it's the pipe "|".  Either Darwin Core should go with this, or if
> > there is a consensus reached here that is different, then AC should be
> > changed before it is ratified, which potentially could happen in a
> > matter of weeks.  It is highly likely that there will be records that
> > are a mixture of AC and DwC, so it would not be a good thing for the
> > recommendations to differ.
> >
> > Steve
> >
> > Markus Döring wrote:
> >
> > Hi John et al.,
> >
> > I would like to see a single recommended default delimiter,
> > preferably the semicolon, as it's natural and hardly used in values.
> > For dwc archives there is a multiValueDelimiter attribute for every
> > term mapping that allows other delimiters to be declared if needed.
> >
> > Currently it is hardly possible to detect multiple values in a field.
> > You can only test for some commonly used delimiters, and even then you
> > never know whether they were meant to be delimiters.
> > Having a single default value helps to get the idea of multi values
> > across and make it a bit more accessible I believe.
> >
> > dwc:vernacularName I would personally prefer to see as a single value
> > term as it is mostly useful in combination with a locale and rarely is
> > shared on its own.
> > Seeing dwc:typeStatus being a multi value term also feels wrong as the
> > name is in singular while the others carry the multi value nature in
> > the name already.
> >
> >
> > Markus
> >
> >
> >
> > On 07.10.2013, at 12:28, John Wieczorek wrote:
> >
> >
> >
> > Dear all,
> >
> > On the list of pending Darwin Core issues is a topic of general
> > concern about terms that could or do recommend the concatenation and
> > delimiting of a list of values. The specific issue was submitted on
> > the Darwin Core Project site at
> > https://code.google.com/p/darwincore/issues/detail?id=168. Right now
> > there is variation in the recommendations of distinct terms.
> >
> > The Darwin Core terms that could be used to hold lists include the
> > following (use the index at
> > http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the
> > details of each of these):
> >
> > informationWithheld
> > dataGeneralizations
> > dynamicProperties
> > recordedBy
> > preparations
> > otherCatalogNumbers
> > previousIdentifications
> > associatedMedia
> > associatedReferences
> > associatedOccurrences
> > associatedSequences
> > associatedTaxa
> > higherGeography
> > georeferenceSources
> > typeStatus
> > higherClassification
> > vernacularName
> >
> > There are some issues. Many terms do not show examples. Most of those
> > that do show examples recommend semi-colon (';') -
> > associatedOccurrences, recordedBy, preparations, otherCatalogNumbers,
> > previousIdentifications, higherGeography, georeferenceSources, and
> > higherClassification. The example for higherClassification does not
> > have spaces after the semi-colon while all others do.
> >
> > Terms that could hold a list of URLs would require a delimiter that
> > would be an invalid part of a URL unless it was escaped. This
> > precludes comma (','), semi-colon (';'), and colon (':'), among
> > others. One possibility here might be the vertical bar or "pipe"
> > ('|').
> >
> > The term dynamicProperties is meant to take key-value pairs. The
> > examples suggest the format key=value, with any list delimited by a
> > semi-colon, for example, "tragusLengthInMeters=0.014;
> > weightInGrams=120". The example for associatedTaxa also shows a
> > key-value pair ("host: Quercus alba"), but it is formatted differently
> > from the examples for dynamicProperties. There are other terms, such
> > as vernacularName, which could potentially also take a key-value pair,
> > though it is not currently recommended to be a list.
> >
> > Please ignore the issue of whether the idea of list-type terms is a
> > good idea or not - that is not the issue we're trying to resolve here.
> > Instead, the issue is whether a consistent recommendation can be made
> > for how to delimit the values in a list. And if not a consistent
> > recommendation, can we make specific recommendations for distinct
> > terms? If specific recommendations can be made for a term, should that
> > be reflected in examples within the term definitions, or should such
> > recommendations reside only in Type 3 supplementary documentation
> > such as that which can be found on the Darwin Core Project site at, for
> > example,
> >
> > https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences?
> > Should some of these terms have specific recommendations to contain
> > only single values (e.g., vernacularName), in which case they are not
> > really viable in Simple Darwin Core?
> >
> > Cheers,
> >
> > John
> > _______________________________________________
> > tdwg-content mailing list
> > tdwg-content at lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-content
> >
> >
> >
> >
> >
> > --
> > Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept.
> > of Biological Sciences
> >
> > postal mail address:
> > PMB 351634
> > Nashville, TN  37235-1634,  U.S.A.
> >
> > delivery address:
> > 2125 Stevenson Center
> > 1161 21st Ave., S.
> > Nashville, TN 37235
> >
> > office: 2128 Stevenson Center
> > phone: (615) 343-4582,  fax: (615) 322-4942 If you fax, please phone
> > or email so that I will know to look for it.
> > http://bioimages.vanderbilt.edu


