Well said. Totally agree. -hilmar

On Oct 7, 2013, at 10:02 AM, Tim Robertson [GBIF] wrote:

I suspect any attempt to find a universal delimiter will be flakey at best - see Unicode character U+001F (the unit separator) as an example [1].
I would urge DwC to stop at only defining the concept of each term and leave it to the serialization formats, schema definitions, data models etc. (e.g. DwC-A, XML, RDF, JSON, HTML, Excel templates) to define those kinds of things.

If you were to design an XML schema you would use things like:

<tim:identifications>
  <dwc:scientificName>A</dwc:scientificName>
  <dwc:scientificName>B</dwc:scientificName>
  <dwc:scientificName>C</dwc:scientificName>
</tim:identifications>

and not:

<tim:identifications>
  <dwc:scientificName>A|B|C</dwc:scientificName>
</tim:identifications>

I don't think it wise for the DwC standard to suggest anyone should.

I suspect this request stems from those working with denormalized data structures, and trying to shoe-horn all data into flat structures (e.g. DwC-A).  I think that is a dangerous path to go down, and makes things more difficult for both producers and consumers.  Very quickly you will get into the situation where you will want to also suggest "well the element at index [0] of field X should be interpreted as the index [0] for field Y" (e.g. identifications and identification dates).  
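
To make the index-alignment concern concrete, here is a minimal Python sketch. The function name pair_up, the pipe delimiter and the sample values are assumptions for illustration only, not part of any DwC recommendation. It shows that a consumer has to trust that two delimited fields have the same length and order, and can only detect the failure when the lengths differ.

```python
from typing import List, Tuple


def pair_up(names: str, dates: str, delimiter: str = "|") -> List[Tuple[str, str]]:
    """Pair element i of one delimited field with element i of another.

    Hypothetical helper: both fields are assumed to be flat strings whose
    positions are meant to correspond (e.g. identifications and their dates).
    """
    name_items = names.split(delimiter)
    date_items = dates.split(delimiter)
    # The only check a consumer can make is on length; a swapped order
    # or a missing middle value padded elsewhere goes unnoticed.
    if len(name_items) != len(date_items):
        raise ValueError(
            "fields are not aligned: %d names vs %d dates"
            % (len(name_items), len(date_items))
        )
    return list(zip(name_items, date_items))


pairs = pair_up("A|B|C", "2001|2005|2010")
```

A nested structure (as in the XML above) carries the pairing explicitly, so no such convention is needed.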

Cheers,
Tim

[1] http://www.fileformat.info/info/unicode/char/1f/index.htm




On Oct 7, 2013, at 3:45 PM, Steve Baskauf wrote:

I don't have an opinion about what the recommended delimiter should be, but I think it would be beneficial for there to be consistency between Darwin Core and Audubon Core.  You can see what the recommendation is for Audubon Core at http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_text_values - it's the pipe "|".  Either Darwin Core should go with this, or if there is a consensus reached here that is different, then AC should be changed before it is ratified, which potentially could happen in a matter of weeks.  It is highly likely that there will be records that are a mixture of AC and DwC, so it would not be a good thing for the recommendations to differ.

Steve

Markus Döring wrote:
Hi John et al.,

I would like to see a single recommended default delimiter, preferably the semicolon, as it's natural and hardly used in values.
For dwc archives there is a multiValueDelimiter attribute for every term mapping that allows other delimiters to be declared if needed.

Currently it is hardly possible to detect multiple values in a field: you can test for a few commonly used delimiters, but even then you never know whether they were meant to be delimiters.
Having a single default delimiter helps to get the idea of multiple values across and makes it a bit more accessible, I believe.
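
As a rough illustration of a default delimiter with per-term overrides, here is a minimal Python sketch. The names DEFAULT_DELIMITER, TERM_DELIMITERS and split_values are hypothetical, and the override table only mimics the idea of declaring a delimiter per term mapping; it does not read a real archive descriptor.

```python
from typing import Dict, List

# Assumed default, following the semicolon suggestion above.
DEFAULT_DELIMITER = ";"

# Hypothetical per-term overrides, standing in for a declaration
# made alongside each term mapping.
TERM_DELIMITERS: Dict[str, str] = {"associatedMedia": "|"}


def split_values(term: str, raw: str) -> List[str]:
    """Split a raw field value using the term's declared delimiter,
    falling back to the default when none is declared."""
    delimiter = TERM_DELIMITERS.get(term, DEFAULT_DELIMITER)
    # Trim surrounding whitespace and drop empty items such as a trailing ";".
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


collectors = split_values("recordedBy", "A. Smith; B. Jones")
```

With a declared default the consumer no longer has to guess which character was meant as a delimiter.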

I would personally prefer to see dwc:vernacularName as a single-value term, as it is mostly useful in combination with a locale and is rarely shared on its own.
Seeing dwc:typeStatus as a multi-value term also feels wrong, as the name is in the singular while the others already carry the multi-value nature in their names.


Markus



On 07.10.2013, at 12:28, John Wieczorek wrote:

  
Dear all,

On the list of pending Darwin Core issues is a topic of general
concern about terms that could or do recommend the concatenation and
delimiting of a list of values. The specific issue was submitted on
the Darwin Core Project site at
https://code.google.com/p/darwincore/issues/detail?id=168. Right now
there is variation in the recommendations of distinct terms.

The Darwin Core terms that could be used to hold lists include the
following (use the index at
http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the
details of each of these):

informationWithheld
dataGeneralizations
dynamicProperties
recordedBy
preparations
otherCatalogNumbers
previousIdentifications
associatedMedia
associatedReferences
associatedOccurrences
associatedSequences
associatedTaxa
higherGeography
georeferenceSources
typeStatus
higherClassification
vernacularName

There are some issues. Many terms do not show examples. Most of those
that do show examples recommend semi-colon (';') -
associatedOccurrences, recordedBy, preparations, otherCatalogNumbers,
previousIdentifications, higherGeography, georeferenceSources, and
higherClassification. The example for higherClassification does not
have spaces after the semi-colon while all others do.

Terms that could hold a list of URLs would require a delimiter that
would be an invalid part of a URL unless it was escaped. This
precludes comma (','), semi-colon (';'), and colon (':'), among
others. One possibility here might be the vertical bar or "pipe"
('|').
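
A minimal Python sketch of the pipe option for URL lists follows. The function names join_url_list and split_url_list and the example.org URLs are assumptions for illustration. A literal pipe inside a URL is percent-escaped as %7C before joining, so that splitting on the pipe stays unambiguous.

```python
from typing import List


def join_url_list(urls: List[str]) -> str:
    """Join URLs with a pipe, escaping any literal pipe inside a URL."""
    return "|".join(url.replace("|", "%7C") for url in urls)


def split_url_list(raw: str) -> List[str]:
    """Split a pipe-delimited list of URLs, ignoring empty items."""
    return [url.strip() for url in raw.split("|") if url.strip()]


# Made-up URLs containing a semicolon and a comma, which would break
# a list delimited by either of those characters.
media = [
    "http://example.org/media/1;type=jpg",
    "http://example.org/media/2?sizes=1,2",
]
joined = join_url_list(media)
restored = split_url_list(joined)
broken = joined.split(";")  # splits inside the first URL
```

Splitting the same string on the semicolon cuts the first URL in two, which is the failure the paragraph above describes.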

The term dynamicProperties is meant to take key-value pairs. The
examples suggest the format key=value, with any list delimited by a
semi-colon, for example, "tragusLengthInMeters=0.014;
weightInGrams=120". The example for associatedTaxa also shows a
key-value pair ("host: Quercus alba"), but it is formatted differently
from the examples for dynamicProperties. There are other terms, such
as vernacularName, which could potentially also take a key-value pair,
though it is not currently recommended to be a list.
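
For illustration, the key=value format shown in the dynamicProperties example could be read with something like the following Python sketch. The function name parse_dynamic_properties is hypothetical, values are kept as strings, and nothing here handles a delimiter or "=" occurring inside a value.

```python
from typing import Dict


def parse_dynamic_properties(raw: str, delimiter: str = ";") -> Dict[str, str]:
    """Parse 'key=value; key=value' into a dictionary of strings."""
    properties: Dict[str, str] = {}
    for item in raw.split(delimiter):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing delimiter
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError("not a key=value pair: %r" % item)
        properties[key.strip()] = value.strip()
    return properties


props = parse_dynamic_properties("tragusLengthInMeters=0.014; weightInGrams=120")
```

The associatedTaxa example ("host: Quercus alba") would be rejected by this parser, which reflects the inconsistency in formatting noted above.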

Please ignore the issue of whether the idea of list-type terms is a
good idea or not - that is not the issue we're trying to resolve here.
Instead, the issue is whether a consistent recommendation can be made
for how to delimit the values in a list. And if not a consistent
recommendation, can we make specific recommendations for distinct
terms? If specific recommendations can be made for a term, should that
be reflected in examples within the term definitions, or should such
recommendations reside only in Type 3 supplementary documentation such
as that which can be found on the Darwin Core Project site at, for
example, https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences?
Should some of these terms have specific recommendations to contain
only single values (e.g., vernacularName), in which case they are not
really viable in Simple Darwin Core?

Cheers,

John
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
    


-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu

-- 
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
===========================================================