[tdwg-content] Delimiters for Darwin Core list-type terms

Richard Pyle deepreef at bishopmuseum.org
Mon Oct 7 20:28:00 CEST 2013


One clarification on my comment RE: secondary delimiter.  Key-value pairs are
one reason why a secondary delimiter is helpful.  Another situation (which
we have from time to time -- even with our highly normalized data; indeed
*because* of our highly normalized data), is a nested array.  We've never
needed more than two tiers of nesting, but having one tier has been
extremely helpful in a couple of situations (which I'd be happy to explain,
but if I did so, Tim would first need to go get a cup of coffee....or was it
tea?)
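
To make the two cases above concrete, here is a minimal sketch in Python of how a primary and a secondary delimiter could work together. It assumes the pipe ("|") as the primary delimiter and the tilde ("~") as the secondary one; neither is an agreed standard, and the function names are hypothetical, made up only for illustration. It also assumes the delimiters never occur inside a value, so there is no escaping.

```python
PRIMARY = "|"    # assumed primary delimiter: separates items in the list
SECONDARY = "~"  # assumed secondary delimiter: key/value or inner-array separator


def split_list(value, delimiter=PRIMARY):
    """Split a delimited string into stripped, non-empty items."""
    return [item.strip() for item in value.split(delimiter) if item.strip()]


def parse_key_values(value):
    """Parse 'key~value|key~value' into a dict (hypothetical helper)."""
    pairs = {}
    for item in split_list(value):
        key, sep, val = item.partition(SECONDARY)
        if not sep:
            raise ValueError(f"missing secondary delimiter in {item!r}")
        pairs[key.strip()] = val.strip()
    return pairs


def parse_nested(value):
    """Parse 'a~b|c' into a list of lists (one tier of nesting)."""
    return [split_list(item, SECONDARY) for item in split_list(value)]


# Key-value pairs:
#   parse_key_values("tragusLengthInMeters~0.014|weightInGrams~120")
#   -> {'tragusLengthInMeters': '0.014', 'weightInGrams': '120'}
# Nested array:
#   parse_nested("uuid-a~uuid-b|uuid-c")
#   -> [['uuid-a', 'uuid-b'], ['uuid-c']]
```

Because the outer split happens first, the secondary delimiter only has to be unambiguous within a single item, not across the whole field.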

Aloha,
Rich

> -----Original Message-----
> From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-
> bounces at lists.tdwg.org] On Behalf Of Richard Pyle
> Sent: Monday, October 07, 2013 8:07 AM
> To: tuco at berkeley.edu; 'Steve Baskauf'
> Cc: 'TDWG Content Mailing List'
> Subject: Re: [tdwg-content] Delimiters for Darwin Core list-type terms
> 
> I am very supportive of this being at least recommended (and used
> consistently in documentation) for DWC.  I understand Tim's and Hilmar's
> and Joel's and Bob's point(s), but we have HIGHLY normalized data, and we
> still find it very useful in some cases to pass delimited arrays within a
> single value (usually in the form of UUIDs, but also other things like full
> taxonomies and such).  So if there are *ever* times when it's useful (and I
> believe it is), then a recommended "best practice" (and consistency in
> documentation) is a good thing.  Maybe in the future we will no longer have
> any need to flatten data, but we're not at the future yet, so let's try to
> make the present (and near-term future) a bit easier to deal with.
> 
> I am also in strong favor of the pipe ("|") -- for the reasons that John
> and others mentioned, and also because to human eyeballs, it's more
> intuitive, I think -- and it's easier to detect programmatically when a
> value is an array of similar values separated by a rarely-used character
> (not that anyone *should* be trying to detect it programmatically, but
> reality does bite....).
> 
> I would like to go a step further and see a secondary standard delimiter
> defined (for key-value pairs).  The obvious one would be "=", but it may
> appear in fields not as a delimiter.  Thus, we have been using the tilde
> "~" for this secondary delimiter.  However, I don't have a strong opinion
> -- I just would like to see consistency.  The "uniqueness" of the secondary
> delimiter is less critical, because the probability of "=" showing up
> *within* a delimited value seems less likely.  In other words, once you
> know it's a delimited list (by the presence of the primary delimiter), the
> risk of confusion on the secondary delimiter goes way down. But I'd still
> vote for tilde.
> 
> Aloha,
> Rich
> 
> > -----Original Message-----
> > From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-
> > bounces at lists.tdwg.org] On Behalf Of John Wieczorek
> > Sent: Monday, October 07, 2013 3:47 AM
> > To: Steve Baskauf
> > Cc: TDWG Content Mailing List
> > Subject: Re: [tdwg-content] Delimiters for Darwin Core list-type terms
> >
> > I second the motion. Steve, is there any documentation on the logic
> > behind the choice of delimiter for AC that will help people here?
> >
> > On Mon, Oct 7, 2013 at 3:45 PM, Steve Baskauf
> > <steve.baskauf at vanderbilt.edu> wrote:
> > > I don't have an opinion about what the recommended delimiter should
> > > be, but I think it would be beneficial for there to be consistency
> > > between Darwin Core and Audubon Core.  You can see what the
> > > recommendation is for Audubon Core at
> > > http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_text_values
> > > - it's the pipe "|".  Either Darwin Core should go with this, or if
> > > there is a consensus reached here that is different, then AC should
> > > be changed before it is ratified, which potentially could happen in
> > > a matter of weeks.  It is highly likely that there will be records
> > > that are a mixture of AC and DwC, so it would not be a good thing
> > > for the recommendations to differ.
> > >
> > > Steve
> > >
> > > Markus Döring wrote:
> > >
> > > Hi John et al.,
> > >
> > > I would like to see a single recommended default delimiter,
> > > preferably the semicolon, as it's natural and hardly used in values.
> > > For dwc archives there is a multiValueDelimiter attribute for every
> > > term mapping that allows declaring other delimiters if needed.
> > >
> > > Currently it is hardly possible to detect multiple values in a field.
> > > You can test for some often-used delimiters, but even then you
> > > never know if they were meant to be delimiters.
> > > Having a single default delimiter helps to get the idea of multiple
> > > values across and makes it a bit more accessible, I believe.
> > >
> > > dwc:vernacularName I would personally prefer to see as a single
> > > value term as it is mostly useful in combination with a locale and
> > > rarely is shared on its own.
> > > Seeing dwc:typeStatus being a multi value term also feels wrong as
> > > the name is in singular while the others carry the multi value
> > > nature in the name already.
> > >
> > >
> > > Markus
> > >
> > >
> > >
> > > On 07.10.2013, at 12:28, John Wieczorek wrote:
> > >
> > >
> > >
> > > Dear all,
> > >
> > > On the list of pending Darwin Core issues is a topic of general
> > > concern about terms that could or do recommend the concatenation and
> > > delimiting of a list of values. The specific issue was submitted on
> > > the Darwin Core Project site at
> > > https://code.google.com/p/darwincore/issues/detail?id=168. Right now
> > > there is variation in the recommendations of distinct terms.
> > >
> > > The Darwin Core terms that could be used to hold lists include the
> > > following (use the index at
> > > http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the
> > > details of each of these):
> > >
> > > informationWithheld
> > > dataGeneralizations
> > > dynamicProperties
> > > recordedBy
> > > preparations
> > > otherCatalogNumbers
> > > previousIdentifications
> > > associatedMedia
> > > associatedReferences
> > > associatedOccurrences
> > > associatedSequences
> > > associatedTaxa
> > > higherGeography
> > > georeferenceSources
> > > typeStatus
> > > higherClassification
> > > vernacularName
> > >
> > > There are some issues. Many terms do not show examples. Most of
> > > those that do show examples recommend semi-colon (';') -
> > > associatedOccurrences, recordedBy, preparations,
> > > otherCatalogNumbers, previousIdentifications, higherGeography,
> > > georeferenceSources, and higherClassification. The example for
> > > higherClassification does not have spaces after the semi-colon while
> > > all others do.
> > >
> > > Terms that could hold a list of URLs would require a delimiter that
> > > would be an invalid part of a URL unless it was escaped. This
> > > precludes comma (','), semi-colon (';'), and colon (':'), among
> > > others. One possibility here might be the vertical bar or "pipe"
> > > ('|').
> > >
> > > The term dynamicProperties is meant to take key-value pairs. The
> > > examples suggest the format key=value, with any list delimited by a
> > > semi-colon, for example, "tragusLengthInMeters=0.014;
> > > weightInGrams=120". The example for associatedTaxa also shows a
> > > key-value pair ("host: Quercus alba"), but it is formatted
> > > differently from the examples for dynamicProperties. There are other
> > > terms, such as vernacularName, which could potentially also take a
> > > key-value pair, though it is not currently recommended to be a list.
> > >
> > > Please ignore the issue of whether the idea of list-type terms is a
> > > good idea or not - that is not the issue we're trying to resolve here.
> > > Instead, the issue is whether a consistent recommendation can be
> > > made for how to delimit the values in a list. And if not a
> > > consistent recommendation, can we make specific recommendations for
> > > distinct terms? If specific recommendations can be made for a term,
> > > should that be reflected in examples within the term definitions, or
> > > should such recommendations reside only in Type 3 supplementary
> > > documentation such as that which can be found on the Darwin Core
> > > Project site at, for example,
> > > https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences?
> > > Should some of these terms have specific recommendations to contain
> > > only single values (e.g., vernacularName), in which case they are
> > > not really viable in Simple Darwin Core?
> > >
> > > Cheers,
> > >
> > > John
> > > _______________________________________________
> > > tdwg-content mailing list
> > > tdwg-content at lists.tdwg.org
> > > http://lists.tdwg.org/mailman/listinfo/tdwg-content
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept.
> > > of Biological Sciences
> > >
> > > postal mail address:
> > > PMB 351634
> > > Nashville, TN  37235-1634,  U.S.A.
> > >
> > > delivery address:
> > > 2125 Stevenson Center
> > > 1161 21st Ave., S.
> > > Nashville, TN 37235
> > >
> > > office: 2128 Stevenson Center
> > > phone: (615) 343-4582,  fax: (615) 322-4942 If you fax, please phone
> > > or email so that I will know to look for it.
> > > http://bioimages.vanderbilt.edu


