Delimiters for Darwin Core list-type terms
Dear all,
On the list of pending Darwin Core issues is a topic of general concern: terms that could hold, or already recommend, a concatenated and delimited list of values. The specific issue was submitted on the Darwin Core Project site at https://code.google.com/p/darwincore/issues/detail?id=168. Right now, the recommendations vary from term to term.
The Darwin Core terms that could be used to hold lists include the following (use the index at http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the details of each of these):
informationWithheld, dataGeneralizations, dynamicProperties, recordedBy, preparations, otherCatalogNumbers, previousIdentifications, associatedMedia, associatedReferences, associatedOccurrences, associatedSequences, associatedTaxa, higherGeography, georeferenceSources, typeStatus, higherClassification, vernacularName
There are some issues. Many terms do not show examples. Most of those that do recommend the semi-colon (';'): associatedOccurrences, recordedBy, preparations, otherCatalogNumbers, previousIdentifications, higherGeography, georeferenceSources, and higherClassification. The example for higherClassification has no space after the semi-colon, while all the others do.
Terms that could hold a list of URLs would require a delimiter that cannot appear in a URL unless it is escaped. This precludes the comma (','), semi-colon (';'), and colon (':'), among others, since all of these may appear unescaped in a URL. One possibility here might be the vertical bar or "pipe" ('|').
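A minimal Python sketch of both problems, using hypothetical example.org URLs and a hypothetical helper `split_list`: trimming whitespace makes the spaced and unspaced semi-colon forms equivalent, but a URL containing ';' gets cut in two. RFC 3986 does not allow an unescaped pipe in a URL (it is percent-encoded as %7C), so splitting on '|' keeps each properly encoded URL whole.

```python
from urllib.parse import quote


def split_list(value, delimiter):
    """Split a concatenated list value and trim any space around each item."""
    return [item.strip() for item in value.split(delimiter) if item.strip()]


# Trimming makes "a;b" and "a; b" equivalent, so the spacing difference
# between the higherClassification example and the others stops mattering.
tight = split_list("Animalia;Chordata;Mammalia", ";")
spaced = split_list("Animalia; Chordata; Mammalia", ";")

# Hypothetical URLs: ';' is a legal, unescaped character in a URL,
# so splitting a semi-colon-delimited list of URLs can cut a URL in two.
urls = ["http://example.org/media/123;version=2", "http://example.org/media/456"]
by_semicolon = split_list("; ".join(urls), ";")  # 3 pieces, not 2

# A pipe may not appear unescaped in a URL; a conforming URL carries it
# as %7C, so '|' cannot split inside a properly encoded URL.
encoded = quote("http://example.org/a|b", safe=":/")  # 'http://example.org/a%7Cb'
by_pipe = split_list(" | ".join(urls), "|")  # the original 2 URLs
```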
The term dynamicProperties is meant to take key-value pairs. The examples suggest the format key=value, with any list delimited by a semi-colon, for example, "tragusLengthInMeters=0.014; weightInGrams=120". The example for associatedTaxa also shows a key-value pair ("host: Quercus alba"), but it is formatted differently from the examples for dynamicProperties. There are other terms, such as vernacularName, which could potentially also take a key-value pair, though it is not currently recommended to be a list.
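For illustration, a Python sketch (the function `parse_key_values` is hypothetical, not part of any Darwin Core tool) that parses the dynamicProperties example as written, and shows that the same parser rejects the associatedTaxa example unless the consumer already knows to switch from '=' to ':'.

```python
def parse_key_values(value, pair_delimiter=";", separator="="):
    """Parse 'key=value; key=value' text into a dict.

    Each pair is split on the first separator only, so a value may itself
    contain the separator. A later duplicate key overwrites an earlier one.
    """
    result = {}
    for pair in value.split(pair_delimiter):
        pair = pair.strip()
        if not pair:
            continue
        key, found, val = pair.partition(separator)
        if not found:
            raise ValueError(f"no {separator!r} in {pair!r}")
        result[key.strip()] = val.strip()
    return result


# The dynamicProperties example from the term definition.
props = parse_key_values("tragusLengthInMeters=0.014; weightInGrams=120")

# The associatedTaxa example uses ':' rather than '=', so the default
# parser rejects it; it only works once the caller switches separators.
try:
    parse_key_values("host: Quercus alba")
    rejected = False
except ValueError:
    rejected = True
taxa = parse_key_values("host: Quercus alba", separator=":")
```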
Please set aside the question of whether list-type terms are a good idea at all; that is not the issue we're trying to resolve here. Instead, the issue is whether a consistent recommendation can be made for how to delimit the values in a list. If not, can we make specific recommendations for individual terms? If specific recommendations can be made for a term, should they be reflected in examples within the term definitions, or should they reside only in Type 3 supplementary documentation, such as that found on the Darwin Core Project site (for example, https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences)? Should some of these terms be recommended to contain only single values (e.g., vernacularName), in which case they are not really viable in Simple Darwin Core?
Cheers,
John
Hi John et al.,
I would like to see a single recommended default delimiter, preferably the semicolon, as it is natural and rarely used in values. For DwC archives, there is a multiValueDelimiter attribute for every term mapping that allows other delimiters to be declared if needed.
Currently it is almost impossible to detect multiple values in a field: you can only test for a few commonly used delimiters, and even then you never know whether they were meant as delimiters. Having a single default delimiter would, I believe, help get the idea of multiple values across and make it a bit more accessible.
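A rough Python sketch of the proposal, assuming a consumer that applies one default delimiter unless the publisher declared another for that term. The `declared` dict and `split_term` are hypothetical stand-ins for the per-term declaration in an archive; they do not reproduce any real archive-reading API. The last call shows the detection problem: an undeclared '|' is silently treated as part of a single value.

```python
DEFAULT_DELIMITER = ";"  # the single default proposed here; later messages argue for "|"


def split_term(term, value, declared=None):
    """Split a field value into its list items.

    `declared` is a hypothetical mapping from term name to a delimiter the
    data publisher declared for that term; any term not in it falls back
    to the one recommended default.
    """
    delimiter = (declared or {}).get(term, DEFAULT_DELIMITER)
    return [item.strip() for item in value.split(delimiter) if item.strip()]


collectors = split_term("recordedBy", "J. Smith; K. Jones")  # hypothetical names
media = split_term(
    "associatedMedia",
    "http://example.org/1.jpg | http://example.org/2.jpg",
    declared={"associatedMedia": "|"},
)
# Nothing was declared for preparations, so the pipe goes undetected.
undetected = split_term("preparations", "skin | skull")
```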
I would personally prefer to see dwc:vernacularName as a single-value term, as it is mostly useful in combination with a locale and is rarely shared on its own. Seeing dwc:typeStatus as a multi-value term also feels wrong, as its name is singular, while the other terms already carry their multi-value nature in their names.
Markus
On 07.10.2013, at 12:28, John Wieczorek wrote:
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Markus,
I too would like to see a consistent recommendation, but the semi-colon is a problem, as it cannot unambiguously delimit URLs. If we want a consistent recommendation, we need a delimiter that can.
I agree with you about the singular semantics of vernacularName, and happily it is currently defined that way. The typeStatus term is currently defined as being a list, however, so that one would require a change in definition, and some guidelines if the term is used in a Simple Darwin Core context.
Cheers,
John
On Mon, Oct 7, 2013 at 3:32 PM, Markus Döring m.doering@mac.com wrote:
I don't have an opinion about what the recommended delimiter should be, but I think it would be beneficial for there to be consistency between Darwin Core and Audubon Core. You can see the recommendation for Audubon Core at http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_t... - it's the pipe "|". Either Darwin Core should go with this, or, if a different consensus is reached here, AC should be changed before it is ratified, which could happen in a matter of weeks. It is highly likely that there will be records that mix AC and DwC, so it would not be good for the recommendations to differ.
Steve
Markus Döring wrote:
I second the motion. Steve, is there any documentation on the logic behind the choice of delimiter for AC that would help people here?
On Mon, Oct 7, 2013 at 3:45 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address: PMB 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu
I am very supportive of this being at least recommended (and used consistently in documentation) for DwC. I understand Tim's, Hilmar's, Joel's, and Bob's point(s), but we have HIGHLY normalized data, and we still find it very useful in some cases to pass delimited arrays within a single value (usually in the form of UUIDs, but also other things like full taxonomies). So if there are *ever* times when it's useful (and I believe there are), then a recommended "best practice" (and consistency in documentation) is a good thing. Maybe in the future we will no longer have any need to flatten data, but we're not there yet, so let's try to make the present (and near-term future) a bit easier to deal with.
I am also in strong favor of the pipe ("|") -- for the reasons that John and others mentioned, and also because to human eyeballs, it's more intuitive, I think -- and it's easier to detect programmatically when a value is an array of similar values separated by a rarely-used character (not that anyone *should* be trying to detect it programmatically, but reality does bite....).
I would like to go a step further and see a secondary standard delimiter defined (for key-value pairs). The obvious one would be "=", but it may appear in field values where it is not meant as a delimiter. Thus, we have been using the tilde "~" for this secondary delimiter. However, I don't have a strong opinion -- I just would like to see consistency. The "uniqueness" of the secondary delimiter is less critical, because "=" seems less likely to show up *within* an individual delimited value. In other words, once you know it's a delimited list (by the presence of the primary delimiter), the risk of confusion on the secondary delimiter goes way down. But I'd still vote for tilde.
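A small Python sketch of the two-level scheme described above, with '|' between pairs and '~' between key and value. These choices follow this message only and are not an adopted standard; `format_pairs` and `parse_pairs` are hypothetical helpers, and the values are made up. Since '=' is not a delimiter in this scheme, a value such as 'ratio=3:1' passes through untouched.

```python
PRIMARY = "|"    # list delimiter (the Audubon Core recommendation)
SECONDARY = "~"  # key-value separator suggested here; not an adopted standard


def format_pairs(pairs):
    """Serialize a dict as 'key~value|key~value', refusing ambiguous input."""
    for key, val in pairs.items():
        for text in (key, val):
            if PRIMARY in text or SECONDARY in text:
                raise ValueError(f"delimiter inside {text!r}")
    return PRIMARY.join(f"{key}{SECONDARY}{val}" for key, val in pairs.items())


def parse_pairs(value):
    """Parse 'key~value|key~value' back into a dict."""
    pairs = {}
    for item in value.split(PRIMARY):
        if not item.strip():
            continue
        key, found, val = item.partition(SECONDARY)
        if not found:
            raise ValueError(f"no {SECONDARY!r} in {item!r}")
        pairs[key.strip()] = val.strip()
    return pairs


original = {"weightInGrams": "120", "note": "ratio=3:1"}  # hypothetical values
text = format_pairs(original)  # 'weightInGrams~120|note~ratio=3:1'
restored = parse_pairs(text)
```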
Aloha, Rich
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of John Wieczorek
Sent: Monday, October 07, 2013 3:47 AM
To: Steve Baskauf
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] Delimiters for Darwin Core list-type terms
One clarification on my comment RE: the secondary delimiter. Key-value pairs are one reason why a secondary delimiter is helpful. Another situation (which we have from time to time -- even with our highly normalized data; indeed *because* of our highly normalized data) is a nested array. We've never needed more than two tiers of nesting, but having one tier has been extremely helpful in a couple of situations (which I'd be happy to explain, but if I did so, Tim would first need to go get a cup of coffee....or was it tea?)
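A sketch of one tier of nesting in Python, reusing the same two delimiters ('|' for the outer list, '~' within an item); the helper names and example rows are hypothetical. The serializer refuses any element that already contains a delimiter, since such a value could not be split back unambiguously without an escaping rule.

```python
PRIMARY = "|"    # separates the items of the outer list
SECONDARY = "~"  # separates the elements within one item


def format_nested(rows):
    """Serialize a list of lists (one tier of nesting) into a single value."""
    for row in rows:
        for element in row:
            if PRIMARY in element or SECONDARY in element:
                raise ValueError(f"delimiter inside {element!r}")
    return PRIMARY.join(SECONDARY.join(row) for row in rows)


def parse_nested(value):
    """Inverse of format_nested."""
    return [
        [element.strip() for element in item.split(SECONDARY)]
        for item in value.split(PRIMARY)
        if item.strip()
    ]


# Hypothetical example: two full classifications packed into one field.
rows = [["Animalia", "Chordata", "Mammalia"], ["Plantae", "Tracheophyta"]]
packed = format_nested(rows)  # 'Animalia~Chordata~Mammalia|Plantae~Tracheophyta'
unpacked = parse_nested(packed)
```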
Aloha, Rich
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Monday, October 07, 2013 8:07 AM
To: tuco@berkeley.edu; 'Steve Baskauf'
Cc: 'TDWG Content Mailing List'
Subject: Re: [tdwg-content] Delimiters for Darwin Core list-type terms
I am very supportive of this being at least recommended (and used consistently in documentation) for DwC. I understand Tim's and Hilmar's and Joel's and Bob's point(s), but we have HIGHLY normalized data, and we still find it very useful in some cases to pass delimited arrays within a single value (usually in the form of UUIDs, but also other things like full taxonomies and such). So if there are *ever* times when it's useful (and I believe there are), then a recommended "best practice" (and consistency in documentation) is a good thing. Maybe in the future we will no longer have any need to flatten data, but we're not there yet, so let's try to make the present (and near-term future) a bit easier to deal with.
I am also strongly in favor of the pipe ("|") -- for the reasons that John and others mentioned, and also because to human eyeballs it's more intuitive, I think -- and it's easier to detect programmatically when a value is an array of similar values separated by a rarely-used character (not that anyone *should* be trying to detect it programmatically, but reality does bite....).
I would like to go a step further and see a secondary standard delimiter defined (for key-value pairs). The obvious one would be "=", but it may appear in field values where it is not acting as a delimiter. Thus, we have been using the tilde "~" for this secondary delimiter. However, I don't have a strong opinion -- I just would like to see consistency. The "uniqueness" of the secondary delimiter is less critical, because it seems less likely that "=" would show up *within* a delimited value. In other words, once you know it's a delimited list (by the presence of the primary delimiter), the risk of confusion on the secondary delimiter goes way down. But I'd still vote for tilde.
Aloha, Rich
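As a rough illustration of Rich's proposal, a field using "|" between pairs and "~" between key and value could be parsed as below. This is a sketch under those assumed delimiters only; the function name is hypothetical, and the sample keys are borrowed from the dynamicProperties example quoted in this thread.

```python
# Sketch assuming "|" separates pairs and "~" separates key from value,
# as proposed in this thread (not a ratified Darwin Core rule).
PAIR_DELIMITER = "|"
KEY_VALUE_DELIMITER = "~"


def parse_pairs(value: str) -> dict[str, str]:
    """Parse 'k1~v1|k2~v2' into a dict, rejecting pairs without a '~'."""
    result: dict[str, str] = {}
    if not value:
        return result
    for pair in value.split(PAIR_DELIMITER):
        # partition() splits on the first "~" only, so "=" (or a later "~")
        # inside the value is left untouched
        key, sep, val = pair.partition(KEY_VALUE_DELIMITER)
        if not sep or not key:
            raise ValueError(f"malformed key-value pair: {pair!r}")
        result[key] = val
    return result


if __name__ == "__main__":
    print(parse_pairs("tragusLengthInMeters~0.014|weightInGrams~120"))
    # {'tragusLengthInMeters': '0.014', 'weightInGrams': '120'}
    print(parse_pairs("note~a=b"))  # {'note': 'a=b'}: "=" is just data here
```

The second call shows the property Rich relies on: once "~" is the key-value separator, an "=" inside a value no longer causes confusion.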
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of John Wieczorek Sent: Monday, October 07, 2013 3:47 AM To: Steve Baskauf Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] Delimiters for Darwin Core list-type terms
I second the motion. Steve, is there any documentation on the logic behind the choice of delimiter for AC that will help people here?
On Mon, Oct 7, 2013 at 3:45 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
I don't have an opinion about what the recommended delimiter should be, but I think it would be beneficial for there to be consistency between Darwin Core and Audubon Core. You can see what the recommendation is for Audubon Core at http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_text_values - it's the pipe "|". Either Darwin Core should go with this, or if there is a consensus reached here that is different, then AC should be changed before it is ratified, which potentially could happen in a matter of weeks. It is highly likely that there will be records that are a mixture of AC and DwC, so it would not be a good thing for the recommendations to differ.
Steve
Markus Döring wrote:
Hi John et al.,
I would like to see a single recommended default delimiter, preferably the semicolon, as it's natural and hardly ever used in values. For DwC archives there is a multiValueDelimiter attribute for every term mapping that allows other delimiters to be declared if needed.
Currently it is hardly possible to detect multiple values in a field: you can only test for some commonly used delimiters, and even then you never know whether they were meant as delimiters. Having a single default delimiter would help get the idea of multiple values across and make them a bit more accessible, I believe.
I would personally prefer to see dwc:vernacularName as a single-value term, as it is mostly useful in combination with a locale and is rarely shared on its own. Treating dwc:typeStatus as a multi-value term also feels wrong, as its name is singular while the other terms already carry their multi-value nature in their names.
Markus
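Markus's suggestion of one recommended default, with per-term overrides declared where needed, might look like this on the consuming side. The override dictionary stands in for whatever a Darwin Core Archive declares for a term mapping; its name and contents are hypothetical, and the ";" default reflects Markus's preference rather than an agreed rule.

```python
# Hypothetical sketch: one default delimiter plus per-term overrides.
# DEFAULT_DELIMITER and TERM_OVERRIDES are illustrative names, not a real API.
DEFAULT_DELIMITER = ";"
TERM_OVERRIDES = {
    "associatedMedia": "|",  # e.g. a publisher declaring "|" for a URL-valued term
}


def split_values(term: str, value: str, overrides: dict[str, str] = TERM_OVERRIDES) -> list[str]:
    """Split a multi-value field with the term's declared delimiter, else the default.

    Surrounding whitespace is stripped and empty items are dropped.
    """
    delimiter = overrides.get(term, DEFAULT_DELIMITER)
    return [part.strip() for part in value.split(delimiter) if part.strip()]


if __name__ == "__main__":
    print(split_values("recordedBy", "J. Smith; M. Jones"))
    # ['J. Smith', 'M. Jones']
    print(split_values("associatedMedia", "http://example.org/1|http://example.org/2"))
    # ['http://example.org/1', 'http://example.org/2']
```

The design point is that a consumer only needs to guess when nothing is declared; with a shared default, even undeclared fields become predictable.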
On 07.10.2013, at 12:28, John Wieczorek wrote:
Dear all,
On the list of pending Darwin Core issues is a topic of general concern about terms that could or do recommend the concatenation and delimiting of a list of values. The specific issue was submitted on the Darwin Core Project site at https://code.google.com/p/darwincore/issues/detail?id=168. Right now there is variation in the recommendations of distinct terms.
The Darwin Core terms that could be used to hold lists include the following (use the index at http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the details of each of these):
informationWithheld dataGeneralizations dynamicProperties recordedBy preparations otherCatalogNumbers previousIdentifications associatedMedia associatedReferences associatedOccurrences associatedSequences associatedTaxa higherGeography georeferenceSources typeStatus higherClassification vernacularName
There are some issues. Many terms do not show examples. Most of those that do show examples recommend semi-colon (';') - associatedOccurrences, recordedBy, preparations, otherCatalogNumbers, previousIdentifications, higherGeography, georeferenceSources, and higherClassification. The example for higherClassification does not have spaces after the semi-colon, while all others do.
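The spacing inconsistency matters to software: a plain split on ";" keeps the leading space that most examples put after the semicolon, while the higherClassification example has none. A tolerant splitter, sketched below with a hypothetical function name and sample values, treats both forms the same.

```python
import re


def split_semicolon_list(value: str) -> list[str]:
    """Split on ';' with optional whitespace around it; drop empty items."""
    return [item for item in re.split(r"\s*;\s*", value.strip()) if item]


if __name__ == "__main__":
    print("Animalia; Chordata".split(";"))  # ['Animalia', ' Chordata'] (note the space)
    print(split_semicolon_list("Animalia; Chordata"))  # ['Animalia', 'Chordata']
    print(split_semicolon_list("Animalia;Chordata"))   # ['Animalia', 'Chordata']
```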
Terms that could hold a list of URLs would require a delimiter that would be an invalid part of a URL unless it was escaped. This precludes comma (','), semi-colon (';'), and colon (':'), among others. One possibility here might be the vertical bar or "pipe" ('|').
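The URL argument can be checked directly. Under RFC 3986, ";" and "," are permitted (reserved) characters inside a URL, whereas "|" must be percent-encoded as %7C, so a literal pipe in a list of well-formed URLs can only be a delimiter. The URLs below are hypothetical.

```python
from urllib.parse import quote

# Hypothetical URLs; the first legitimately contains ";" in its query.
urls = [
    "http://example.org/media?id=1;size=large",
    "http://example.org/media?id=2",
]

# Joining with ";" and splitting again cuts the first URL in two.
semicolon_field = ";".join(urls)
assert len(semicolon_field.split(";")) == 3

# quote() percent-encodes "|", so a raw "|" is unambiguous as a delimiter.
assert quote("|") == "%7C"
pipe_field = "|".join(urls)
assert pipe_field.split("|") == urls
print(pipe_field.split("|"))
```

This assumes publishers supply properly encoded URLs; a URL carrying a raw "|" would still break the split.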
The term dynamicProperties is meant to take key-value pairs. The examples suggest the format key=value, with any list delimited by a semi-colon, for example, "tragusLengthInMeters=0.014; weightInGrams=120". The example for associatedTaxa also shows a key-value pair ("host: Quercus alba"), but it is formatted differently from the examples for dynamicProperties. There are other terms, such as vernacularName, which could potentially also take a key-value pair, though it is not currently recommended to be a list.
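To make the inconsistency concrete, here is a minimal parser for the dynamicProperties style described above (key=value pairs separated by semicolons). It is a sketch assuming that style only, with a hypothetical function name; applied to the associatedTaxa example it fails, because that example uses "key: value" instead.

```python
import re


def parse_dynamic_properties(value: str) -> dict[str, str]:
    """Parse the 'key=value; key=value' style into a dict."""
    result: dict[str, str] = {}
    for pair in re.split(r"\s*;\s*", value.strip()):
        if not pair:
            continue  # tolerate empty input or a trailing ";"
        key, sep, val = pair.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {pair!r}")
        result[key.strip()] = val.strip()
    return result


if __name__ == "__main__":
    print(parse_dynamic_properties("tragusLengthInMeters=0.014; weightInGrams=120"))
    # {'tragusLengthInMeters': '0.014', 'weightInGrams': '120'}
    try:
        parse_dynamic_properties("host: Quercus alba")  # associatedTaxa style
    except ValueError as err:
        print(err)  # not a key=value pair: 'host: Quercus alba'
```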
Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here. Instead, the issue is whether a consistent recommendation can be made for how to delimit the values in a list. And if not a consistent recommendation, can we make specific recommendations for distinct terms? If specific recommendations can be made for a term, should that be reflected in examples within the term definitions, or should such recommendations reside only in Type 3 supplementary documentation such as that which can be found on the Darwin Core Project site at, for example, https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences?
Should some of these terms have specific recommendations to contain only single values (e.g., vernacularName), in which case they are not really viable in Simple Darwin Core?
Cheers,
John
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: PMB 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 322-4942 If you fax, please phone or email so that I will know to look for it. http://bioimages.vanderbilt.edu
Rich, is there any other term apart from dynamicProperties that would use the KVP delimiter?
Markus
Not that I am aware of, although, as Tim pointed out, typeStatus might be an example of a nested array (though the example does not show one).
And I agree with your description of where we *ought* to be headed on this.
But the current reality is that there are terms in dwc that call for an array of values, and people will likely continue to have a need for such going forward. I personally would like to see some statement about "best practice" for delimiters (when needed) in the context of dwc, but I'm mostly indifferent as to whether this should be part of the standard, or simply an external recommended "best practice". Maybe not associated with individual terms, but as a general practice.
If not a statement about best practice, at the very least I think there should be consistency in the documentation, and let people glean what they will from that. I don't see any value in maintaining deliberate inconsistency in the documentation as an overt statement that "do whatever is best for your implementation".
In keeping with the original issue as John posted it (i.e., explicitly separating the question of whether multiple delimited values are a good or bad thing), I will maintain that I support consistency in documentation for the terms that already exist, and that I support the pipe (|) -- without spaces -- as the primary delimiter used in the documentation & examples. Moreover, in cases where a secondary delimiter is useful (even if only for dynamicProperties), my vote for the secondary delimiter is the tilde.
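On the producing side, Rich's vote (pipe without spaces as the primary delimiter, tilde as the secondary one) could be applied as follows. The function names are hypothetical, and refusing values that already contain a delimiter is just one possible policy, since the thread does not define an escaping rule.

```python
PRIMARY = "|"    # assumed primary delimiter, written without spaces
SECONDARY = "~"  # assumed secondary delimiter for key-value pairs


def _checked(text: str) -> str:
    """Return text unchanged, or fail if it contains a reserved delimiter."""
    if PRIMARY in text or SECONDARY in text:
        raise ValueError(f"value contains a reserved delimiter: {text!r}")
    return text


def format_list(values: list[str]) -> str:
    """Join values with '|' and no surrounding spaces."""
    return PRIMARY.join(_checked(v) for v in values)


def format_pairs(pairs: dict[str, str]) -> str:
    """Write key~value pairs joined by '|', in dict insertion order."""
    return PRIMARY.join(_checked(k) + SECONDARY + _checked(v) for k, v in pairs.items())


if __name__ == "__main__":
    print(format_list(["urn:uuid:1", "urn:uuid:2"]))  # urn:uuid:1|urn:uuid:2
    print(format_pairs({"tragusLengthInMeters": "0.014", "weightInGrams": "120"}))
    # tragusLengthInMeters~0.014|weightInGrams~120
```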
Aloha, Rich
I suspect any attempt to find a universal delimiter will be flaky at best - see Unicode character U+001F (INFORMATION SEPARATOR ONE, also known as the unit separator) as an example [1]. I would urge DwC to stop at defining only the concept of each term and leave it to the serialization formats, schema definitions, data models, etc. (e.g. DwC-A, XML, RDF, JSON, HTML, Excel templates) to define those kinds of things.
If you were to design an XML schema you would use things like:
  <tim:identifications>
    <dwc:scientificName>A</dwc:scientificName>
    <dwc:scientificName>B</dwc:scientificName>
    <dwc:scientificName>C</dwc:scientificName>
  </tim:identifications>
and not:
  <tim:identifications>
    <dwc:scientificName>A|B|C</dwc:scientificName>
  </tim:identifications>
I don't think it wise for the DwC standard to suggest anyone should.
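Tim's contrast between repeating an element and packing values into one element can be seen by parsing both shapes with Python's standard xml.etree.ElementTree. The dwc prefix is bound to the Darwin Core terms namespace; the tim namespace URI is a placeholder invented for this sketch, since the original example does not give one.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
TIM = "http://example.org/tim/"  # hypothetical namespace for tim:identifications

structured = f"""
<tim:identifications xmlns:tim="{TIM}" xmlns:dwc="{DWC}">
  <dwc:scientificName>A</dwc:scientificName>
  <dwc:scientificName>B</dwc:scientificName>
  <dwc:scientificName>C</dwc:scientificName>
</tim:identifications>
"""

delimited = f"""
<tim:identifications xmlns:tim="{TIM}" xmlns:dwc="{DWC}">
  <dwc:scientificName>A|B|C</dwc:scientificName>
</tim:identifications>
"""


def names(xml_text: str) -> list[str]:
    """Return the text of each dwc:scientificName child of the root element."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.findall(f"{{{DWC}}}scientificName")]


if __name__ == "__main__":
    print(names(structured))  # ['A', 'B', 'C']
    print(names(delimited))   # ['A|B|C']
```

With repeated elements the parser already yields separate values; with the packed form every consumer still has to know, and agree on, the delimiter.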
I suspect this request stems from those working with denormalized data structures, and trying to shoe-horn all data into flat structures (e.g. DwC-A). I think that is a dangerous path to go down, and makes things more difficult for both producers and consumers. Very quickly you will get into the situation where you will want to also suggest "well the element at index [0] of field X should be interpreted as the index [0] for field Y" (e.g. identifications and identification dates).
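The index-alignment problem Tim describes appears as soon as two flattened fields have to be read in parallel. In this sketch the field names, values, and the pairing helper are all hypothetical; the point is that once the counts disagree, nothing in the flat record says which date belongs to which identification.

```python
# Hypothetical flattened record with two list-valued fields that must be read
# "in parallel": element [i] of one is meant to go with element [i] of the other.
record = {
    "identifications": "Quercus alba|Quercus rubra|Quercus velutina",
    "identificationDates": "1998-05-01|2004-11-12",  # one date is missing
}


def pair_by_index(rec: dict[str, str], x: str, y: str, delimiter: str = "|") -> list[tuple[str, str]]:
    """Pair the i-th value of field x with the i-th value of field y."""
    xs = rec[x].split(delimiter)
    ys = rec[y].split(delimiter)
    if len(xs) != len(ys):
        # zip() would silently drop the unmatched value, so fail loudly instead
        raise ValueError(f"{x} has {len(xs)} values but {y} has {len(ys)}")
    return list(zip(xs, ys))


if __name__ == "__main__":
    try:
        pair_by_index(record, "identifications", "identificationDates")
    except ValueError as err:
        print(err)  # identifications has 3 values but identificationDates has 2
```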
Cheers, Tim
[1] http://www.fileformat.info/info/unicode/char/1f/index.htm
Well said. Totally agree. -hilmar
I kind of expected it was futile to make the plea "Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here." I had to try.
There are definitely more rigorous ways to share the information than in concatenated lists. The "list" terms are just as Tim describes, an attempt to share in a flat data structure data that do not fit well in a flat structure, but are nevertheless of common interest. There probably shouldn't be an expectation that one could process the content of such fields and derive individual values, as we can't even get simple content under control yet (see http://soyouthinkyoucandigitize.wordpress.com/2013/07/18/data-diversity-of-t...).
Nevertheless, these terms do exist, and they expect lists, and people are using them in distinct ways that make them a challenge to process. It would be nice to give guidance. I have no problem if that guidance stays out of the term definitions, but we have a legacy problem of definitions that tell us that the content should consist of a delimited list.
On Mon, Oct 7, 2013 at 4:02 PM, Tim Robertson [GBIF] trobertson@gbif.org wrote:
I suspect any attempt to find a universal delimiter will be flakey at best - see Unicode character 1 as an example [1]. I would urge DwC to stop at only defining the concept of each term and leave it to the serialization formats, schema definitions, data models etc (e.g. DwC-A, XML, RDF, JSON, HTML, excel templates etc) to define those kind of things.
If you were to design an XML schema you would use things like:
tim:identifications dwc:scientificNameA</dwc:scientificName> dwc:scientificNameB</dwc:scientificName> dwc:scientificNameC</dwc:scientificName> </tim:identifications>
and not:
tim:identifications dwc:scientificNameA|B|C</dwc:scientificName> </tim:identifications>
I don't think it wise for the DwC standard to suggest anyone should.
I suspect this request stems from those working with denormalized data structures, and trying to shoe-horn all data into flat structures (e.g. DwC-A). I think that is a dangerous path to go down, and makes things more difficult for both producers and consumers. Very quickly you will get into the situation where you will want to also suggest "well the element at index [0] of field X should be interpreted as the index [0] for field Y" (e.g. identifications and identification dates).
Cheers, Tim
[1] http://www.fileformat.info/info/unicode/char/1f/index.htm
On Oct 7, 2013, at 3:45 PM, Steve Baskauf wrote:
I don't have an opinion about what the recommended delimiter should be, but I think it would be beneficial for there to be consistency between Darwin Core and Audubon Core. You can see what the recommendation is for Audubon Core at http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_t...
- it's the pipe "|". Either Darwin Core should go with this or, if a different consensus is reached here, AC should be changed before it is ratified, which could happen within a matter of weeks. It is highly likely that there will be records that mix AC and DwC, so it would not be a good thing for the recommendations to differ.
Steve
Markus Döring wrote:
Hi John et al.,
I would like to see a single recommended default delimiter, preferably the semicolon, as it's natural and hardly used in values. For DwC archives there is a multiValueDelimiter attribute for every term mapping that allows other delimiters to be declared if needed.
Currently it is hardly possible to detect multiple values in a field: you can only test for some commonly used delimiters, and even then you never know whether they were meant as delimiters. Having a single default value would, I believe, help to get the idea of multiple values across and make them a bit more accessible.
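A sketch of what a single default plus a per-term override might look like for a consumer, assuming `;` as the default Markus proposes. The `declared_delimiter` argument stands in for the multiValueDelimiter he mentions; how a reader obtains that value is not shown, and the function names are hypothetical.

```python
# The single default proposed in this thread; a proposal, not an adopted rule.
DEFAULT_DELIMITER = ";"


def split_multi_value(value, declared_delimiter=None):
    """Split a field on a declared delimiter, falling back to the default.

    declared_delimiter stands in for a per-term multiValueDelimiter
    declared in an archive mapping (assumption; not read from any file here).
    """
    delimiter = declared_delimiter or DEFAULT_DELIMITER
    return [part.strip() for part in value.split(delimiter) if part.strip()]


def candidate_delimiters(value, candidates=(";", "|", ",")):
    """Without a declaration, a consumer can only list characters that
    might be delimiters; it cannot know whether they were meant as such."""
    return [c for c in candidates if c in value]
```

For a value like `Smith, J.; Jones, K.` the guesser reports both `;` and `,`, even though the comma belongs to the names, which is exactly the ambiguity Markus describes.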
I would personally prefer to see dwc:vernacularName as a single-value term, as it is mostly useful in combination with a locale and is rarely shared on its own. Seeing dwc:typeStatus as a multi-value term also feels wrong, as its name is singular while the others already carry their multi-value nature in the name.
Markus
On 07.10.2013, at 12:28, John Wieczorek wrote:
Dear all,
On the list of pending Darwin Core issues is a topic of general concern about terms that could or do recommend the concatenation and delimiting of a list of values. The specific issue was submitted on the Darwin Core Project site at https://code.google.com/p/darwincore/issues/detail?id=168. Right now there is variation in the recommendations of distinct terms.
The Darwin Core terms that could be used to hold lists include the following (use the index at http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the details of each of these):
informationWithheld dataGeneralizations dynamicProperties recordedBy preparations otherCatalogNumbers previousIdentifications associatedMedia associatedReferences associatedOccurrences associatedSequences associatedTaxa higherGeography georeferenceSources typeStatus higherClassification vernacularName
There are some issues. Many terms do not show examples. Most of those that do show examples recommend semi-colon (';'): associatedOccurrences, recordedBy, preparations, otherCatalogNumbers, previousIdentifications, higherGeography, georeferenceSources, and higherClassification. The example for higherClassification does not have spaces after the semi-colon while all others do.
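The spacing inconsistency matters to naive parsers. A small sketch with illustrative values (not taken from the term definitions); the function names are hypothetical.

```python
def split_on_sequence(value, delimiter="; "):
    # Treats the two-character sequence "; " as the delimiter, as a reader
    # copying most of the term examples literally might.
    return value.split(delimiter)


def split_and_trim(value, delimiter=";"):
    # Splits on the bare character and trims whitespace around each value,
    # so "a; b" and "a;b" give the same result.
    return [part.strip() for part in value.split(delimiter)]


SPACED = "Animalia; Chordata; Mammalia"
UNSPACED = "Animalia;Chordata;Mammalia"
```

Splitting on `"; "` turns the unspaced value into a single item, while splitting on the bare `;` and trimming treats both forms alike.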
Terms that could hold a list of URLs would require a delimiter that would be an invalid part of a URL unless it was escaped. This precludes comma (','), semi-colon (';'), and colon (':'), among others. One possibility here might be the vertical bar or "pipe" ('|').
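A sketch of why `;`, `,` and `:` fail for URL lists while `|` can work: the first three may legitimately appear inside a URL, whereas RFC 3986 does not allow a bare `|`, so a conforming URL carries it percent-encoded as `%7C`. The URLs and function names are made up for illustration.

```python
# Two hypothetical media URLs; ';' and ',' are legal inside URL paths.
URLS = [
    "http://example.org/img;v=2.jpg",
    "http://example.org/a,b.jpg",
]


def round_trips(urls, delimiter):
    """True if joining then splitting recovers the original list."""
    return delimiter.join(urls).split(delimiter) == urls


def join_urls(urls, delimiter="|"):
    # A conforming URL carries '|' percent-encoded (%7C), so a bare one
    # signals a malformed URL that would break the list.
    for u in urls:
        if delimiter in u:
            raise ValueError(f"unescaped delimiter in {u!r}")
    return delimiter.join(urls)
```

Joining with `;`, `,` or `:` and splitting again yields the wrong pieces (`:` even splits every `http:`), while `|` round-trips.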
The term dynamicProperties is meant to take key-value pairs. The examples suggest the format key=value, with any list delimited by a semi-colon, for example, "tragusLengthInMeters=0.014; weightInGrams=120". The example for associatedTaxa also shows a key-value pair ("host: Quercus alba"), but it is formatted differently from the examples for dynamicProperties. There are other terms, such as vernacularName, which could potentially also take a key-value pair, though it is not currently recommended to be a list.
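A sketch showing that one parser can handle both key-value styles only if it is told the key/value separator for each term; the function name and its defaults are assumptions, while the sample strings come from the examples quoted above.

```python
def parse_pairs(value, pair_sep=";", kv_sep="="):
    """Parse 'k1=v1; k2=v2' into a dict, trimming whitespace around tokens.

    Only the first kv_sep in each item separates key from value.
    """
    result = {}
    for item in value.split(pair_sep):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition(kv_sep)
        if not sep:
            raise ValueError(f"no {kv_sep!r} in {item!r}")
        result[key.strip()] = val.strip()
    return result
```

The dynamicProperties example parses with the defaults, but the associatedTaxa example needs `kv_sep=":"`; with the default it is rejected, so the two formats cannot share one rule.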
Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here. Instead, the issue is whether a consistent recommendation can be made for how to delimit the values in a list. And if not a consistent recommendation, can we make specific recommendations for distinct terms? If specific recommendations can be made for a term, should that be reflected in examples within the term definitions, or should such recommendations reside only in Type 3 supplementary documentation such as that which can be found on the Darwin Core Project site at, for example, https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences? Should some of these terms have specific recommendations to contain only single values (e.g., vernacularName), in which case they are not really viable in Simple Darwin Core?
Cheers,
John
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: PMB 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 322-4942 If you fax, please phone or email so that I will know to look for it. http://bioimages.vanderbilt.edu
I kind of expected it was futile to make the plea "Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here." I had to try.
Sorry John, I fell into your trap.
Surely though, talking about solutions to problems rather than the problem itself is the _worst_ option available to us?
Consider the likes of this list term: http://rs.tdwg.org/dwc/terms/#typeStatus. The description suggests a separated and concatenated list, but the example (unless I misunderstand) shows only one list item, which is a triplet of "type + author + pub" in a human-readable form. This one field actually suggests a structure of a repeatable triplet, so it would need two delimiters if machines are to extract the scientific name for the typification. Perhaps these terms are really just verbatim text blocks intended for human consumption (which is fine with me, and then we don't need to define delimiters)? Or perhaps we should be discussing terms to atomize them further (e.g. introduce dwc:typeName and dwc:typePublication)?
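To make the "two delimiters" point concrete, a sketch of a hypothetical two-level encoding of typeStatus. The `|` and `;` choices, the field names and the sample names are invented, not Darwin Core recommendations.

```python
from typing import NamedTuple


class Typification(NamedTuple):
    # The three parts Tim identifies in the typeStatus example.
    status: str
    name: str
    publication: str


def parse_type_status(value, entry_sep="|", part_sep=";"):
    """Parse a hypothetical two-level encoding of typeStatus.

    entry_sep separates repeated typifications and part_sep separates the
    three parts of each; both characters are assumptions.
    """
    entries = []
    for raw in value.split(entry_sep):
        parts = [p.strip() for p in raw.split(part_sep)]
        if len(parts) != 3:
            raise ValueError(f"expected 3 parts, got {len(parts)}: {raw!r}")
        entries.append(Typification(*parts))
    return entries


EXAMPLE = (
    "holotype; Aus bus Smith, 1900; J. Hypoth. Zool. 1:1"
    " | paratype; Aus bus Smith, 1900; J. Hypoth. Zool. 1:1"
)
```

A free-text, human-readable value such as `holotype of Aus bus` is rejected, which is the trade-off Tim raises between verbatim text and machine-extractable structure.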
If we are heading this way though, can I also suggest we consider declaring the expected ordering on lists where omitted? The likes of http://rs.tdwg.org/dwc/terms/#higherGeography doesn't have one whereas http://rs.tdwg.org/dwc/terms/#higherGeography does.
There are definitely more rigorous ways to share the information than in concatenated lists. The "list" terms are just as Tim describes, an attempt to share in a flat data structure data that do not fit well in a flat structure, but are nevertheless of common interest. There probably shouldn't be an expectation that one could process the content of such fields and derive individual values, as we can't even get simple content under control yet (see http://soyouthinkyoucandigitize.wordpress.com/2013/07/18/data-diversity-of-t...).
Nevertheless, these terms do exist, and they expect lists, and people are using them in distinct ways that make them a challenge to process. It would be nice to give guidance. I have no problem if that guidance stays out of the term definitions, but we have a legacy problem of definitions that tell us that the content should consist of a delimited list.
Other than deprecating and redefining as new concepts (terms), I don't see any robust way I am afraid. Some things are just not meant to be denormalized.
Cheers, Tim
On Mon, Oct 7, 2013 at 4:54 PM, Tim Robertson [GBIF] trobertson@gbif.org wrote:
I kind of expected it was futile to make the plea "Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here." I had to try.
Sorry John, I fell into your trap.
:-) You were not alone.
Surely though, talking about solutions to problems rather than the problem itself is the _worst_ option available to us?
I just wanted to keep the issues separate and focus on the one submitted, for the very reason that it will otherwise get too broad and contentious to provide any solutions at all. I don't much like spending energy when that is the likely outcome. The process more often seems to yield results when it is kept simple. And in this case, having a better recommendation does not leave us in any worse position than we are now. No one presented an issue to the tracker recommending the deprecation of all "list" terms.
Consider the likes of this list term: http://rs.tdwg.org/dwc/terms/#typeStatus. The description suggests a separated and concatenated list, but the example (unless I misunderstand) shows only one list item, which is a triplet of "type + author + pub" in a human-readable form. This one field actually suggests a structure of a repeatable triplet, so it would need two delimiters if machines are to extract the scientific name for the typification. Perhaps these terms are really just verbatim text blocks intended for human consumption (which is fine with me, and then we don't need to define delimiters)? Or perhaps we should be discussing terms to atomize them further (e.g. introduce dwc:typeName and dwc:typePublication)?
Yes, the example gives only one typeStatus entry, not a list. Yes, one can argue that the content mixes concepts if those distinct concepts are of interest. A look at the history of typeStatus will reveal that it has its origins deep in the Darwin Core history, and no one has yet suggested that it should be other than what it is. Another item for the issue tracker if anyone wants to defend a change.
If we are heading this way though, can I also suggest we consider declaring the expected ordering on lists where omitted? The likes of http://rs.tdwg.org/dwc/terms/#higherGeography doesn't have one whereas http://rs.tdwg.org/dwc/terms/#higherGeography does.
Same example. Did you mean http://rs.tdwg.org/dwc/terms/#higherClassification? It would be a fine thing to amend the recommendation for higherGeography to suggest the ordering. If anyone seconds the motion I'll create an issue for it.
There are definitely more rigorous ways to share the information than in concatenated lists. The "list" terms are just as Tim describes, an attempt to share in a flat data structure data that do not fit well in a flat structure, but are nevertheless of common interest. There probably shouldn't be an expectation that one could process the content of such fields and derive individual values, as we can't even get simple content under control yet (see http://soyouthinkyoucandigitize.wordpress.com/2013/07/18/data-diversity-of-t...).
Nevertheless, these terms do exist, and they expect lists, and people are using them in distinct ways that make them a challenge to process. It would be nice to give guidance. I have no problem if that guidance stays out of the term definitions, but we have a legacy problem of definitions that tell us that the content should consist of a delimited list.
Other than deprecating and redefining as new concepts (terms), I don't see any robust way I am afraid. Some things are just not meant to be denormalized.
That would be a fine conclusion as well, if we can get consensus. I would then just add secondary documentation saying "Beware all ye who enter (data) here."
Cheers, Tim
On Mon, 7 Oct 2013, John Wieczorek wrote:
On Mon, Oct 7, 2013 at 4:54 PM, Tim Robertson [GBIF] trobertson@gbif.org wrote:
I kind of expected it was futile to make the plea "Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here." I had to try.
Sorry John, I fell into your trap.
:-) You were not alone.
Surely though, talking about solutions to problems rather than the problem itself is the _worst_ option available to us?
I just wanted to keep the issues separate and focus on the one submitted, for the very reason that it will otherwise get too broad and contentious to provide any solutions at all. I don't much like spending energy when that is the likely outcome. The process more often seems to yield results when it is kept simple. And in this case, having a better recommendation does not leave us in any worse position than we are now. No one presented an issue to the tracker recommending the deprecation of all "list" terms.
I do plan on submitting an issue to the tracker. It won't be to deprecate the terms but, as Tim suggests, to change the definitions so that recommendations on how to deal with multiple values are left to the various representation guides (text, XML, RDF). (If anyone else wants to submit this issue first, please go ahead.)
I'm glad that Tim fell into your trap, as it raises an important issue without precluding Darwin Core (through the representation guides) from providing consistent guidance on these terms.
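As a rough illustration of leaving multiplicity to the representation guides, here is one list rendered three ways. The wrapper element `record`, the text delimiter and the sample values are assumptions; no actual guide is being quoted.

```python
import json
import xml.etree.ElementTree as ET

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("dwc", DWC_NS)


def to_text(values, delimiter="|"):
    # A text guide would have to name a delimiter; '|' is only an assumption.
    return delimiter.join(values)


def to_xml(term, values):
    # An XML guide can repeat the element, one value per element.
    # "record" is a hypothetical wrapper element.
    root = ET.Element("record")
    for v in values:
        ET.SubElement(root, f"{{{DWC_NS}}}{term}").text = v
    return ET.tostring(root, encoding="unicode")


def to_json(term, values):
    # A JSON guide can use a native array.
    return json.dumps({term: values})


VALUES = ["J. Smith", "K. Jones"]
```

Only the text form needs a delimiter at all; the XML and JSON forms express the list with their own native structures.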
Joel.
On Mon, Oct 7, 2013 at 4:02 PM, Tim Robertson [GBIF] trobertson@gbif.org wrote:
I suspect any attempt to find a universal delimiter will be flakey at best -
see Unicode character 1 as an example [1].
I would urge DwC to stop at only defining the concept of each term and leave
it to the serialization formats, schema definitions, data models etc (e.g.
DwC-A, XML, RDF, JSON, HTML, excel templates etc) to define those kind of
things.
If you were to design an XML schema you would use things like:
dwc:scientificNameA</dwc:scientificName>
dwc:scientificNameB</dwc:scientificName>
dwc:scientificNameC</dwc:scientificName>
</tim:identifications>
and not:
dwc:scientificNameA|B|C</dwc:scientificName>
</tim:identifications>
I don't think it wise for the DwC standard to suggest anyone should.
I suspect this request stems from those working with denormalized data
structures, and trying to shoe-horn all data into flat structures (e.g.
DwC-A). I think that is a dangerous path to go down, and makes things more
difficult for both producers and consumers. Very quickly you will get into
the situation where you will want to also suggest "well the element at index
[0] of field X should be interpreted as the index [0] for field Y" (e.g.
identifications and identification dates).
Cheers,
Tim
[1] http://www.fileformat.info/info/unicode/char/1f/index.htm
On Oct 7, 2013, at 3:45 PM, Steve Baskauf wrote:
I don't have an opinion about what the recommended delimiter should be, but
I think it would be beneficial for there to be consistency between Darwin
Core and Audubon Core. You can see what the recommendation is for Audubon
Core at
http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_t...
- it's the pipe "|". Either Darwin Core should go with this, or if there is
a consensus reached here that is different, then AC should be changed before
it is ratified, which potentially could happen in a matter of weeks. It is
highly likely that there will be records that are a mixture of AC and DwC,
so it would not be a good thing for the recommendations to differ.
Steve
Markus Döring wrote:
Hi John et al.,
I would like to see a single recommended default delimiter, preferably the
semicolon, as it is natural and hardly used in values.
For dwc archives there is a multiValueDelimiter attribute for every term
mapping that allows other delimiters to be declared if needed.
Currently it is hardly possible to detect multi values in a field; you
can only test for some commonly used delimiters, and even then you never know
whether they were meant to be delimiters.
Having a single default value helps to get the idea of multi values across
and makes it a bit more accessible, I believe.
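A minimal sketch of Markus's proposal, assuming a single default of ';' plus a per-term override. The `DECLARED` dictionary is a hypothetical stand-in for the per-term delimiter declaration he describes in archive term mappings; it does not parse any real archive file. The `recordedBy` example also shows why guessing is risky: splitting on ',' would cut each personal name in half.

```python
DEFAULT_DELIMITER = ";"  # the single default proposed in the thread

# Hypothetical per-term overrides, standing in for a delimiter declared
# in an archive's term mapping.
DECLARED = {"associatedMedia": "|"}


def split_values(term, value, declared=DECLARED):
    """Split a multi-value field using the declared delimiter, else the default."""
    delimiter = declared.get(term, DEFAULT_DELIMITER)
    return [part.strip() for part in value.split(delimiter) if part.strip()]


print(split_values("recordedBy", "Smith, J.; Jones, K."))
# ['Smith, J.', 'Jones, K.']
print(split_values("associatedMedia", "http://example.org/1.jpg|http://example.org/2.jpg"))
# ['http://example.org/1.jpg', 'http://example.org/2.jpg']
```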
dwc:vernacularName I would personally prefer to see as a single value term
as it is mostly useful in combination with a locale and rarely is shared on
its own.
Seeing dwc:typeStatus being a multi value term also feels wrong, as the name
is singular while the others carry the multi value nature in the name
already.
Markus
On 07.10.2013, at 12:28, John Wieczorek wrote:
Dear all,
On the list of pending Darwin Core issues is a topic of general
concern about terms that could or do recommend the concatenation and
delimiting of a list of values. The specific issue was submitted on
the Darwin Core Project site at
https://code.google.com/p/darwincore/issues/detail?id=168. Right now
there is variation in the recommendations of distinct terms.
The Darwin Core terms that could be used to hold lists include the
following (use the index at
http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the
details of each of these):
informationWithheld
dataGeneralizations
dynamicProperties
recordedBy
preparations
otherCatalogNumbers
previousIdentifications
associatedMedia
associatedReferences
associatedOccurrences
associatedSequences
associatedTaxa
higherGeography
georeferenceSources
typeStatus
higherClassification
vernacularName
There are some issues. Many terms do not show examples. Most of those
that do show examples recommend semi-colon (';') -
associatedOccurrences, recordedBy, preparations, otherCatalogNumbers,
previousIdentifications, higherGeography, georeferenceSources, and
higherClassification. The example for higherClassification does not
have spaces after the semi-colon while all others do.
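The spacing inconsistency matters to anyone parsing these fields. In this sketch, which uses hypothetical classification values, a tolerant consumer splits on ';' and trims whitespace, so both styles give the same list. A consumer that splits on the exact string "; " silently returns a single value for the no-space form.

```python
def split_semicolon_list(value):
    """Split on ';' and trim whitespace, so 'a;b' and 'a; b' give the same list."""
    return [part.strip() for part in value.split(";") if part.strip()]


# Hypothetical values in the two spacing styles the term examples use.
with_spaces = "Animalia; Chordata; Mammalia"
without_spaces = "Animalia;Chordata;Mammalia"

print(split_semicolon_list(with_spaces) == split_semicolon_list(without_spaces))  # True

# Splitting on the exact string "; " fails on the second form.
print(without_spaces.split("; "))  # ['Animalia;Chordata;Mammalia']
```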
Terms that could hold a list of URLs would require a delimiter that
would be an invalid part of a URL unless it was escaped. This
precludes comma (','), semi-colon (';'), and colon (':'), among
others. One possibility here might be the vertical bar or "pipe"
('|').
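A short sketch of why ';' is unsafe for URL lists while '|' works, using hypothetical example URLs. The first URL legitimately contains ';' (a path parameter), so splitting on ';' cuts it in two. '|' is not among the characters RFC 3986 allows unescaped in a URI, so splitting on it is clean, provided producers percent-encode any literal '|' as %7C.

```python
# Hypothetical list of URLs: the first contains ';' and the second ','.
urls = [
    "http://example.org/media;version=2/img.jpg",
    "http://example.org/search?q=a,b",
]

joined_semicolon = "; ".join(urls)
joined_pipe = "|".join(urls)

# Splitting on ';' yields three pieces from two URLs.
print([u.strip() for u in joined_semicolon.split(";")])
# ['http://example.org/media', 'version=2/img.jpg', 'http://example.org/search?q=a,b']

# Splitting on '|' recovers the original list exactly.
print(joined_pipe.split("|") == urls)  # True
```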
The term dynamicProperties is meant to take key-value pairs. The
examples suggest the format key=value, with any list delimited by a
semi-colon, for example, "tragusLengthInMeters=0.014;
weightInGrams=120". The example for associatedTaxa also shows a
key-value pair ("host: Quercus alba"), but it is formatted differently
from the examples for dynamicProperties. There are other terms, such
as vernacularName, which could potentially also take a key-value pair,
though it is not currently recommended to be a list.
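The two key-value styles can be compared with a minimal parser, sketched here under the assumption that pairs are separated by ';' and each key is separated from its value at the first occurrence of a configurable separator. Parsed with the dynamicProperties convention ('='), the associatedTaxa example yields no key at all; only a caller who already knows to use ':' recovers it. Values are kept as strings; no type conversion is attempted.

```python
def parse_key_values(value, pair_sep=";", kv_sep="="):
    """Parse 'k1=v1; k2=v2' into a dict; items lacking kv_sep are returned separately."""
    pairs = {}
    unparsed = []
    for item in value.split(pair_sep):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition(kv_sep)  # split at the first kv_sep only
        if sep:
            pairs[key.strip()] = val.strip()
        else:
            unparsed.append(item)
    return pairs, unparsed


print(parse_key_values("tragusLengthInMeters=0.014; weightInGrams=120"))
# ({'tragusLengthInMeters': '0.014', 'weightInGrams': '120'}, [])
print(parse_key_values("host: Quercus alba"))
# ({}, ['host: Quercus alba'])
print(parse_key_values("host: Quercus alba", kv_sep=":"))
# ({'host': 'Quercus alba'}, [])
```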
Please ignore the issue of whether the idea of list-type terms is a
good idea or not - that is not the issue we're trying to resolve here.
Instead, the issue is whether a consistent recommendation can be made
for how to delimit the values in a list. And if not a consistent
recommendation, can we make specific recommendations for distinct
terms? If specific recommendations can be made for a term, should that
be reflected in examples within the term definitions, or should such
recommendations reside only in Type 3 supplementary documentation such
as that which can be found on the Darwin Core Project site at, for
example,
https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences?
Should some of these terms have specific recommendations to contain
only single values (e.g., vernacularName), in which case they are not
really viable in Simple Darwin Core?
Cheers,
John
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
PMB 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu
+1 to remove multi value recommendations from the main DwC definitions and leave it to the implementations to deal with lists if needed.
As many terms are currently plural, we can easily create new singular terms for single values. I am not entirely convinced, though, that these terms are very useful if they combine various properties into an unstructured string. In these cases it might be better to define several single-value terms instead. A quick attempt to create single-value terms:
Terms currently in plural form:
----------------------------------------
dataGeneralizations -> dataGeneralization
dynamicProperties -> dynamicProperty
preparations -> preparation
associatedSequences -> associatedSequence
georeferenceSources -> georeferenceSource
associatedReferences -> associatedReference (id and/or citation string ?)
otherCatalogNumbers -> otherCatalogNumber (needed at all if there is already catalogNumber ?)
previousIdentifications -> previousIdentification (combines various identification properties into human string. Redefine as just the previously identified scientificName or deprecate entirely ?)
associatedOccurrences -> associatedOccurrence (combines occurrenceID with relation type. Maybe just associatedOccurrenceID ? Or just deprecate it in favor of the ResourceRelationship terms ?)
associatedTaxa -> associatedTaxon (combines taxonID or scientificName with associationType)
Terms already in singular form:
------------------------------------
Can we redefine those terms or do we need to create new ones?
typeStatus # seems to combine any property about typification/type designation right now, although I have mostly seen a single status so far. The discussion page recommends values for the "status portion of the content". How about restricting the term to this status / kind of type?
vernacularName # I have not seen anyone using this as a list. Is anyone aware of such a case? Might not cause too much trouble to redefine.
recordedBy
associatedMedia
higherClassification # isn't a single classification always a list?
higherGeography
informationWithheld
Markus
On 07.10.2013, at 18:15, joel sachs wrote:
On Mon, 7 Oct 2013, John Wieczorek wrote:
On Mon, Oct 7, 2013 at 4:54 PM, Tim Robertson [GBIF] trobertson@gbif.org wrote:
I kind of expected it was futile to make the plea "Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here." I had to try.
Sorry John, I fell into your trap.
:-) You were not alone.
Surely though, talking about solutions to problems rather than the problem itself is the _worst_ option available to us?
I just wanted to keep the issues separate and focus on the one submitted, for the very reason that it will otherwise get too broad and contentious to provide any solutions at all. I don't much like spending energy when that is the likely outcome. The process more often seems to yield results when it is kept simple. And in this case, having a better recommendation does not set us in any worse position than we are now. No one presented an issue to the tracker recommending the deprecation of all "list" terms.
I do plan on submitting an issue to the tracker. It won't be to deprecate the terms, but, as Tim suggests, to change the definitions so that the recommendations on how to deal with multiple values are left to the various representation guides (text, xml, rdf). (If anyone else wants to submit this issue first, please go ahead.)
I'm glad that Tim fell into your trap, as it raises an important issue, without precluding Darwin Core (through the representation guides) from providing consistent guidance on these terms.
Joel.
Consider the likes of this list term http://rs.tdwg.org/dwc/terms/#typeStatus The description suggests a separated and concatenated list, but the example (unless I misunderstand) shows only one list item, which is a triplet of "type + author + pub" in a human-readable form. This one field is actually suggesting a structure of a repeatable triplet, so it would need two delimiters if machines are to extract the scientific name for the typification. Perhaps these terms are really just verbatim text blocks intended for human consumption (which is fine with me, and we don't need to define delimiters)? Or perhaps we should be discussing terms to atomize them further (e.g. introduce dwc:typeName and dwc:typePublication)?
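Tim's point about a repeatable triplet needing two delimiters can be sketched under a purely hypothetical convention: '|' separates typification entries and ';' separates the parts of one entry in a fixed order (status, scientific name, publication). The placeholder name "Aus bus" and the citation are invented. Note that a citation which itself contains ';' breaks the scheme one level down, which is the same problem again.

```python
# Hypothetical two-level convention, assumed only for illustration:
# '|' separates typification entries, ';' separates the parts of one entry.
FIELDS = ("status", "scientificName", "publication")


def parse_type_status(value, item_sep="|", part_sep=";"):
    """Parse a two-level delimited string into one dict per typification entry."""
    entries = []
    for item in value.split(item_sep):
        parts = [p.strip() for p in item.split(part_sep)]
        if len(parts) != len(FIELDS):
            # e.g. a citation that itself contains ';' ends up here
            raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}: {item.strip()!r}")
        entries.append(dict(zip(FIELDS, parts)))
    return entries


value = "holotype; Aus bus; Smith 1900 | paratype; Aus bus; Smith 1900"
print([e["scientificName"] for e in parse_type_status(value)])  # ['Aus bus', 'Aus bus']
```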
Yes, the example gives only one typeStatus entry, not a list. Yes, one can argue that the content mixes concepts if those distinct concepts are of interest. A look at the history of typeStatus will reveal that it has its origins deep in the Darwin Core history, and no one has yet suggested that it should be other than what it is. Another item for the issue tracker if anyone wants to defend a change.
If we are heading this way though, can I also suggest we consider declaring the expected ordering on lists where omitted? The likes of http://rs.tdwg.org/dwc/terms/#higherGeography doesn't have one whereas http://rs.tdwg.org/dwc/terms/#higherGeography does.
Same example. Did you mean http://rs.tdwg.org/dwc/terms/#higherClassification? It would be a fine thing to amend the recommendation for higherGeography to suggest the ordering. If anyone seconds the motion I'll create an issue for it.
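Why a declared ordering matters can be shown with a small sketch. It assumes a consumer wants the most specific name in a list; the same string gives opposite answers depending on whether the list runs from the highest level down or from the lowest level up, and without a declared ordering the consumer cannot tell which applies. The geographic values are hypothetical.

```python
def most_specific(value, order="highest_first", delimiter=";"):
    """Return the most specific name in a list, given its declared ordering."""
    names = [p.strip() for p in value.split(delimiter) if p.strip()]
    if order == "highest_first":
        return names[-1]
    if order == "lowest_first":
        return names[0]
    raise ValueError(f"unknown ordering: {order!r}")


geo = "North America; United States; Tennessee; Davidson County"
print(most_specific(geo))                        # Davidson County
print(most_specific(geo, order="lowest_first"))  # North America
```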
There are definitely more rigorous ways to share the information than in concatenated lists. The "list" terms are just as Tim describes, an attempt to share in a flat data structure data that do not fit well in a flat structure, but are nevertheless of common interest. There probably shouldn't be an expectation that one could process the content of such fields and derive individual values, as we can't even get simple content under control yet (see http://soyouthinkyoucandigitize.wordpress.com/2013/07/18/data-diversity-of-t...).
Nevertheless, these terms do exist, and they expect lists, and people are using them in distinct ways that make them a challenge to process. It would be nice to give guidance. I have no problem if that guidance stays out of the term definitions, but we have a legacy problem of definitions that tell us that the content should consist of a delimited list.
Other than deprecating and redefining as new concepts (terms), I don't see any robust way I am afraid. Some things are just not meant to be denormalized.
That would be a fine conclusion as well, if we can get consensus. I would then just add secondary documentation saying "Beware all ye who enter (data) here."
Cheers, Tim
I don't want to put the cart before the horse here because the Darwin Core RDF Guide has not been formally introduced as an addition to Darwin Core (it's waiting until John W. feels the time is right to do so in the context of dealing with the various issues he's been working through). But it is in the queue with the RDF Task Group's recommendation to adopt, and it can be viewed online. I mention this because it contains a specific recommendation for how to deal with terms that have multiple values in a concatenated list. See http://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposal#2.5.1_Definition_... for the details.
In a nutshell, the guide establishes the convention that the existing dwc: namespace terms be used with literals which are formatted as described in the existing standard (e.g. a delineated, concatenated list). It creates new versions of the terms (in a new namespace dwcuri:) which are intended to be repeatable and to have single values which are URI references.
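The convention Steve summarizes, keeping the concatenated literal under `dwc:` while a parallel namespace carries one URI reference per value, can be sketched as N-Triples output. `associatedMedia` is used only because its values are already URLs; whether the guide defines a `dwcuri:` counterpart for it is not established here, and the `DWCURI` namespace IRI below is a hypothetical placeholder rather than the one the guide defines.

```python
# DWC is the Darwin Core terms namespace. DWCURI is a hypothetical placeholder,
# not the namespace IRI defined by the proposed RDF guide.
DWC = "http://rs.tdwg.org/dwc/terms/"
DWCURI = "http://example.org/dwcuri/"


def nt_literal(text):
    """Escape a string for use as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def media_triples(subject, associated_media, delimiter="|"):
    """Keep the concatenated literal under dwc:, and emit one IRI-valued
    triple per list item under the (hypothetical) dwcuri: namespace."""
    lines = [f"<{subject}> <{DWC}associatedMedia> {nt_literal(associated_media)} ."]
    for url in associated_media.split(delimiter):
        url = url.strip()
        if url:
            lines.append(f"<{subject}> <{DWCURI}associatedMedia> <{url}> .")
    return lines


for line in media_triples(
    "http://example.org/occurrence/1",
    "http://example.org/1.jpg|http://example.org/2.jpg",
):
    print(line)
```

The repeated `dwcuri:` triples need no delimiter at all, which is the property Tim argued for in XML.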
I am hesitant to bring this up because the guide has not been formally introduced nor has a 30 day discussion period been declared. But I think in light of this discussion, it is important for people to know that the guide does address this issue in the context of RDF.
Steve
On Mon, 7 Oct 2013, John Wieczorek wrote:
I kind of expected it was futile to make the plea "Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here." I had to try.
There are definitely more rigorous ways to share the information than in concatenated lists. The "list" terms are just as Tim describes, an attempt to share in a flat data structure data that do not fit well in a flat structure, but are nevertheless of common interest. There probably shouldn't be an expectation that one could process the content of such fields and derive individual values, as we can't even get simple content under control yet (see http://soyouthinkyoucandigitize.wordpress.com/2013/07/18/data-diversity-of-t...).
Nevertheless, these terms do exist, and they expect lists, and people are using them in distinct ways that make them a challenge to process. It would be nice to give guidance. I have no problem if that guidance stays out of the term definitions, but we have a legacy problem of definitions that tell us that the content should consist of a delimited list.
This shouldn't be a problem. If we were to cut and paste the prefixing phrase "A list (concatenated and separated) of ..." from the term definitions to the text guide, then any currently compliant legacy spreadsheet data would remain compliant. (For an example of such a definition, see the "Discussion" page for http://terms.gbif.org/wiki/dwc:recordedBy)
As for legacy XML ... is there any? (If so, it's likely that, as Tim points out, it already complies with the not-yet-formally-proposed new definitions.)
As for legacy RDF ... as far as I know, Steve Baskauf is the only provider that this would affect, and he would likely welcome the change, as it would simplify the Darwin Core RDF Guide.
Best, Joel.
On Mon, Oct 7, 2013 at 4:02 PM, Tim Robertson [GBIF] trobertson@gbif.org wrote:
I suspect any attempt to find a universal delimiter will be flaky at best; see Unicode character U+001F as an example [1]. I would urge DwC to stop at defining only the concept of each term and leave it to the serialization formats, schema definitions, data models, etc. (e.g. DwC-A, XML, RDF, JSON, HTML, Excel templates) to define those kinds of things.
If you were to design an XML schema you would use things like:
    <tim:identifications>
      <dwc:scientificName>A</dwc:scientificName>
      <dwc:scientificName>B</dwc:scientificName>
      <dwc:scientificName>C</dwc:scientificName>
    </tim:identifications>
and not:
    <tim:identifications>
      <dwc:scientificName>A|B|C</dwc:scientificName>
    </tim:identifications>
I don't think it wise for the DwC standard to suggest anyone should.
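Tim's point can be illustrated with Python's standard xml.etree.ElementTree module. This is a minimal sketch. The "tim" namespace URI is a placeholder invented here, because Tim's message gives only the prefix. The helper name identifications_element is also hypothetical. Only the Darwin Core terms namespace URI is real.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace
TIM = "http://example.org/tim/"        # hypothetical namespace for Tim's example

ET.register_namespace("dwc", DWC)
ET.register_namespace("tim", TIM)


def identifications_element(names):
    """Hypothetical helper: one dwc:scientificName child per value,
    instead of a single element holding a delimited string."""
    parent = ET.Element(f"{{{TIM}}}identifications")
    for name in names:
        child = ET.SubElement(parent, f"{{{DWC}}}scientificName")
        child.text = name
    return parent


elem = identifications_element(["A", "B", "C"])
print(ET.tostring(elem, encoding="unicode"))

# Reading the values back needs no knowledge of any delimiter.
values = [child.text for child in elem.findall(f"{{{DWC}}}scientificName")]
print(values)  # ['A', 'B', 'C']
```

The consumer recovers the three values with findall and never has to know or guess a separator. This is the property that a delimited string gives up.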
I suspect this request stems from those working with denormalized data structures, and trying to shoe-horn all data into flat structures (e.g. DwC-A). I think that is a dangerous path to go down, and makes things more difficult for both producers and consumers. Very quickly you will get into the situation where you will want to also suggest "well the element at index [0] of field X should be interpreted as the index [0] for field Y" (e.g. identifications and identification dates).
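The positional convention Tim warns about can be sketched as follows. The sketch assumes two flat fields that each hold a '|'-delimited list. The helper pair_by_index and the example values are hypothetical. Pairing by index silently depends on both fields using the same delimiter and holding the same number of items, so when the counts differ, the most a consumer can safely do is refuse to guess.

```python
def pair_by_index(field_x, field_y, delimiter="|"):
    """Hypothetical helper: pair item i of one delimited field with item i
    of another. Refuses rather than guesses when the item counts differ."""
    xs = [item.strip() for item in field_x.split(delimiter)]
    ys = [item.strip() for item in field_y.split(delimiter)]
    if len(xs) != len(ys):
        raise ValueError(f"{len(xs)} items vs {len(ys)} items; cannot pair by index")
    return list(zip(xs, ys))


# Hypothetical values: two identifications and the dates they were made.
names = "Quercus alba|Quercus rubra"
dates = "2010-05-01|2012-08-14"
print(pair_by_index(names, dates))
# [('Quercus alba', '2010-05-01'), ('Quercus rubra', '2012-08-14')]
```

If the dates had been written as "2010-05-01; 2012-08-14", splitting on '|' would yield one item and the call would raise. This coupling between fields is what makes the approach fragile.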
Cheers, Tim
[1] http://www.fileformat.info/info/unicode/char/1f/index.htm
On Oct 7, 2013, at 3:45 PM, Steve Baskauf wrote:
I don't have an opinion about what the recommended delimiter should be, but I think it would be beneficial for there to be consistency between Darwin Core and Audubon Core. You can see the Audubon Core recommendation at http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_t... (it's the pipe "|"). Either Darwin Core should go with this, or, if a different consensus is reached here, AC should be changed before it is ratified, which could potentially happen in a matter of weeks. It is highly likely that there will be records that are a mixture of AC and DwC, so it would not be a good thing for the recommendations to differ.
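One reason the pipe suits lists of URLs, a point raised in John's original message, is that RFC 3986 does not allow '|' to appear unencoded in a URI, whereas ';', ',' and ':' are legal URI characters. The following Python sketch refuses to join a URL that still contains a raw '|'. The helper names and example URLs are hypothetical. Only urllib.parse.quote is a standard-library call.

```python
from urllib.parse import quote


def join_urls(urls, delimiter="|"):
    """Hypothetical helper: join URLs with '|', rejecting any URL that
    still contains a raw '|' (it should have been percent-encoded)."""
    for url in urls:
        if delimiter in url:
            raise ValueError(f"unencoded {delimiter!r} in {url!r}; percent-encode it first")
    return delimiter.join(urls)


def split_urls(value, delimiter="|"):
    """Hypothetical helper: split a joined list back into URLs, dropping empty items."""
    return [url for url in value.split(delimiter) if url]


# Hypothetical URLs; the second legitimately contains ';' and ','.
urls = ["http://example.org/media/1.jpg", "http://example.org/q;a=1,2"]
joined = join_urls(urls)
print(joined)        # http://example.org/media/1.jpg|http://example.org/q;a=1,2
print(quote("a|b"))  # a%7Cb  (a literal '|' once percent-encoded)
```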
Steve
Markus Döring wrote:
Hi John et al.,
I would like to see a single recommended default delimiter, preferably the semicolon, as it's natural and hardly used in values. For DwC archives there is a multiValueDelimiter attribute for every term mapping that allows other delimiters to be declared if needed.
Currently it is hardly possible to detect multiple values in a field. You can only test for some commonly used delimiters, and even then you never know whether they were meant as delimiters. Having a single default value helps get the idea of multiple values across and makes it a bit more accessible, I believe.
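The following Python sketch shows the "single default, declared override" idea, using ';' as the default as Markus suggests. The declared argument stands in for a per-term declaration such as the archive attribute he mentions. The function name and argument are hypothetical and do not reproduce any DwC-A tooling.

```python
from typing import Optional

DEFAULT_DELIMITER = ";"  # the default Markus suggests


def split_multi_value(value: str, declared: Optional[str] = None) -> list:
    """Hypothetical helper: split on the declared delimiter, else the default.

    'declared' stands in for a per-term declaration in the data's metadata.
    Empty items are dropped and whitespace around items is trimmed.
    """
    delimiter = declared or DEFAULT_DELIMITER
    return [item.strip() for item in value.split(delimiter) if item.strip()]


print(split_multi_value("Smith, J.; Jones, K."))       # ['Smith, J.', 'Jones, K.']
print(split_multi_value("Smith, J.|Jones, K.", "|"))  # ['Smith, J.', 'Jones, K.']
```

Without either a default or a declaration, a consumer is back to guessing. In a value like "Smith, J.", nothing distinguishes the comma inside a name from a separator.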
I would personally prefer to see dwc:vernacularName as a single-value term, as it is mostly useful in combination with a locale and is rarely shared on its own. Seeing dwc:typeStatus as a multi-value term also feels wrong, as the name is singular while the others already carry their multi-value nature in the name.
Markus
Your suspicion coincides with my experience.
I dispute that your arguments fall into the trap that was warned against. Independent of whether it's a good idea at all, there is some risk that flattening strategies in one context may not be consistent with those in another. That may not be critical, but it should at least be approached with eyes wide open. The example I argued about with Gregor concerned an early decision on how he would flatten the only place where Audubon Core is not flat. In that place there are, roughly, some sub-properties (I oversimplify), for which he wants to make new properties named "ab", where a is the name of the first property and b the name of the second. I resisted doing that before the AC adoption, in part because I thought it was part of a bigger problem.
In the current case, consider these possibly special(?) issues:
1. What is meant by two successive occurrences of the separator?
2. What is meant by two successive occurrences of the separator with only whitespace between them?
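Any parser has to decide these two cases one way or the other. The following Python sketch makes the choice explicit with a hypothetical keep_empty flag. By default, both 'A||B' and 'A| |B' collapse to two items. With keep_empty=True, the empty item is preserved so that a consumer can see something was there. Neither policy is taken from DwC or AC.

```python
def split_list(value, sep="|", keep_empty=False):
    """Hypothetical helper: split a delimited list with an explicit policy
    for empty items.

    Items are trimmed first, so 'A||B' and 'A| |B' are treated the same way:
    empty items are dropped by default, or kept as '' when keep_empty is True.
    """
    items = [item.strip() for item in value.split(sep)]
    if keep_empty:
        return items
    return [item for item in items if item]


print(split_list("A||B"))                    # ['A', 'B']
print(split_list("A| |B"))                   # ['A', 'B']
print(split_list("A| |B", keep_empty=True))  # ['A', '', 'B']
```

A stricter alternative would be to reject such values outright. Whichever policy is chosen, stating it is what lets producers and consumers agree.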
I'm amenable to anything and would probably prefer that it be DwC-wide (at least) and that AC follow that.
The present guidance is: 'Some AC terms permit values that are lists to be represented as plain text. The choice of how to separate list items is ultimately left to the implementers of AC. Typical usage is to choose a punctuation mark such as ",", ";", or "|". In these cases a special escape syntax needs to be defined for cases in which the separator is part of the metadata value. Unfortunately, even for standard list formats like CSV, different software packages choose different escape methods, hindering interchange. In the absence of an implementation-specific choice we recommend to use "|" as separator and "|" as an escaped vertical bar.'
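The escape syntax this guidance calls for can be illustrated with a backslash convention. The backslash is an assumption made for this sketch and is not the escape sequence AC specifies. In this convention, '\' escapes both itself and the separator, and the splitter reads any escaped character literally. The function names are hypothetical.

```python
SEP = "|"
ESC = "\\"  # assumed escape character for this sketch, not taken from AC


def join_escaped(items):
    """Hypothetical helper: join with SEP, escaping ESC first, then SEP, in each item."""
    return SEP.join(
        item.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) for item in items
    )


def split_escaped(value):
    """Hypothetical inverse of join_escaped: an escaped character is taken literally."""
    items, current = [], []
    chars = iter(value)
    for ch in chars:
        if ch == ESC:
            current.append(next(chars, ""))  # a trailing lone ESC is dropped
        elif ch == SEP:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items


values = ["a|b", "c\\d", "plain"]
encoded = join_escaped(values)
print(encoded)  # a\|b|c\\d|plain
```

The escape character must be escaped before the separator. Otherwise a value that ends in '\' would swallow the following separator when the list is split.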
As of 4 Nov 2011, the AC draft said: 'Some AC terms permit values that are lists to be represented as plain text. The choice of how to separate list items is necessarily left to the implementors of AC. For example, an XML implementation of AC might choose to use standard XML container methods whereas an implementer of a spreadsheet version, in which cells may contain lists, might specify a punctuation mark, e.g. comma, and supply some special escape syntax for use when the comma is part of the metadata value. An implementation might even make different such choices depending on the term involved, the languages supported, etc.'
The current text was placed there in the Revision as of 09:32, 10 October 2012, made by Gregor. I don't recall why, though it does specify default guidance, which the previous draft did not. Both took the position that the issue should be left to implementers of serializations, and I now doubt the wisdom of that, if only because AC includes so much DwC, some of it by reference (namely the geography terminology). The possibility that an AC implementer may choose their own separator for native AC terms but must then deal with the consequences of the mismatch in the borrowed terms seems daunting. All that said, there are \other/ borrowed terms in AC, and \those/ might have yet different conventions. (This is a little true for DwC also, but mainly for dc and possibly (alas) for dcterms. This point may have been made in the discussion, but if so, I wasn't following closely enough.)
Bob
For what it's worth, I seem to be arguing contradictory positions in alternate posts....
On Mon, Oct 7, 2013 at 11:08 PM, Bob Morris morris.bob@gmail.com wrote:
I'm amenable to anything and would probably prefer that it be DwC-wide (at least) and that AC follow that
The present guidance is: 'Some AC terms permit values that are lists to be represented as plain text. The choice of how to separate list items is ultimately left to the implementers of AC. Typical usage is to choose a punctuation mark such as ",", ";", or "|". In these cases a special escape syntax needs to be defined for cases in which the separator is part of the metadata value. Unfortunately, even for standard list formats like CSV, different software packages choose different escape methods, hindering interchange. In the absence of an implementation-specific choice we recommend to use "|" as separator and "|" as an escaped vertical bar.'
As of 4 Nov 2011, the AC draft said: 'Some AC terms permit values that are lists to be represented as plain text. The choice of how to separate list items is necessarily left to the implementors of AC. For example, an XML implementation of AC might choose to use standard XML container methods whereas an implementer of a spreadsheet version, in which cells may contain lists, might specify a punctuation mark, e.g. comma, and supply some special escape syntax for use when the comma is part of the metadata value. An implementation might even make different such choices depending on the term involved, the languages supported, etc.'
The current placed there in the Revision as of 09:32, 10 October 2012 made by Gregor. I don't recall why, though it does specify default guidance, which the previous draft did not. Both took the position that the issue should be left to implementers of serializations, and I doubt the wisdom of that now, if only because AC includes so much DwC, some of it by reference (namely the geography terminology). The possibility that an AC implementer is allowed to provide their own separator choice for native AC terms but deal with the consequences of this mismatch in the borrowed terms seems daunting. All that said, there are \other/ borrowed terms in AC and \those/ might have yet different conventions. (This is a little true for DwC also, but mainly for dc and possibly (alas) for dcterms. This point may have been made in the discussion, but if so, I wasn't following closely enough.)
Bob
On Mon, Oct 7, 2013 at 3:45 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
I don't have an opinion about what the recommended delimiter should be, but I think it would be beneficial for there to be consistency between Darwin Core and Audubon Core. You can see what the recommendation is for Audubon Core at http://terms.gbif.org/wiki/Audubon_Core_%281.0_normative%29#Lists_of_plain_t...
- it's the pipe "|". Either Darwin Core should go with this, or if there is
a consensus reached here that is different, then AC should be changed before it is ratified, which potentially could happen in a matter of weeks. It is highly likely that there will be records that are a mixture of AC and DwC, so it would not be a good thing for the recommendations to differ.
Steve
Markus Döring wrote:
Hi John et al.,
I would like to see a single recommended default delimiter, preferrably the semicolon as its natural and hardly used in values. For dwc archives there is a multiValueDelimiter attribute for every term mapping that allows to declare other delimiters if needed.
Currently it is hardly possible to detect multi values in a field and you can just test for some often used ones but even then you never know if they were meant to be delimiters. Having a single default value helps to get the idea of multi values across and make it a bit more accessible I believe.
dwc:vernacularName I would personally prefer to see as a single value term as it is mostly useful in combination with a locale and rarely is shared on its own. Seeing dwc:typeStatus being a multi value term also feels wrong as the name is in singluar while the others carry the multi value nature in the name already.
Markus
n 07.10.2013, at 12:28, John Wieczorek wrote:
Dear all,
On the list of pending Darwin Core issues is a topic of general concern about terms that could or do recommend the concatenation and delimiting of a list of values. The specific issue was submitted on the Darwin Core Project site at https://code.google.com/p/darwincore/issues/detail?id=168. Right now there is variation in the recommendations of distinct terms.
The Darwin Core terms that could be used to hold lists include the following (use the index at http://rs.tdwg.org/dwc/terms/index.htm#theterms to find and see the details of each of these):
informationWithheld dataGeneralizations dynamicProperties recordedBy preparations otherCatalogNumbers previousIdentifications associatedMedia associatedReferences associatedOccurrences associatedSequences associatedTaxa higherGeography georeferenceSources typeStatus higherClassification vernacularName
There are some issues. Many terms do not show examples. Most of those that do show examples recommend semi-colon (';') - associatedOccurrences, recordedBy, preparations, otherCatalogNumbers, previousIdentifications, higherGeography, georeferenceSources, and higherClassification, The example for higherClassification does not have spaces after the semi-colon while all others do.
Terms that could hold a list of URLs would require a delimiter that would be an invalid part of a URL unless it was escaped. This precludes comma (','), semi-colon (';'), and colon (':'), among others. One possibility here might be the vertical bar or "pipe" ('|').
The term dynamicProperties is meant to take key-value pairs. The examples suggest the format key=value, with any list delimited by a semi-colon, for example, "tragusLengthInMeters=0.014; weightInGrams=120". The example for associatedTaxa also shows a key-value pair ("host: Quercus alba"), but it is formatted differently from the examples for dynamicProperties. There are other terms, such as vernacularName, which could potentially also take a key-value pair, though it is not currently recommended to be a list.
Please ignore the issue of whether the idea of list-type terms is a good idea or not - that is not the issue we're trying to resolve here. Instead, the issue is whether a consistent recommendation can be made for how to delimit the values in a list. And if not a consistent recommendation, can we make specific recommendations for distinct terms? If specific recommendations can be made for a term, should that be reflected in examples within the term definitions, or should such recommendations reside only in Type 3 supplementary documentation such as that which can be found on the Darwin Core Project site at, for example, https://code.google.com/p/darwincore/wiki/Occurrence#associatedSequences? Should some of these terms have specific recommendations to contain only single values (e.g., vernacularName), in which case they are not really viable in Simple Darwin Core?
Cheers,
John
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: PMB 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 322-4942 If you fax, please phone or email so that I will know to look for it. http://bioimages.vanderbilt.edu
-- Robert A. Morris
Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390
Filtered Push Project Harvard University Herbaria Harvard University
email: morris.bob@gmail.com web: http://efg.cs.umb.edu/ web: http://wiki.filteredpush.org http://www.cs.umb.edu/~ram === The content of this communication is made entirely on my own behalf and in no way should be deemed to express official positions of The University of Massachusetts at Boston or Harvard University.
Dear all,
There has been some momentum toward solutions to the problems as originally presented. I would like to summarize what I perceive as the middle ground between them and ask anyone with dissenting opinions to please express them in reply.
a) Make all of the changes in the term definitions rather than in Type 3 documentation.
b) Retain unchanged the definitions of terms on the list I gave that are already described without reference to being a list (informationWithheld, dataGeneralizations, vernacularName).
c) For terms that are designed to contain lists (dynamicProperties, recordedBy, preparations, otherCatalogNumbers, previousIdentifications, associatedMedia, associatedReferences, associatedOccurrences, associatedSequences, associatedTaxa, higherGeography, georeferencedBy, georeferenceSources, identifiedBy, identificationReferences, typeStatus, higherClassification, measurementDeterminedBy), recommend in the term definition that the list be delimited with a vertical bar ('|') and no white space. Please note that I missed some terms from this category in my original list.
Here is an example of what such a definition might look like:
higherClassification: "A list (concatenated and separated) of taxa names terminating at the rank immediately superior to the taxon referenced in the taxon record. Recommended best practice is to order the list starting with the highest rank and separate the names for each rank with a vertical bar ('|')."
d) For terms in the list in c), above, include one example with a single value (where appropriate) and one example with a list showing the concatenation with the delimiter.
Here is an example of what such a comment might look like:
georeferencedBy: Examples: "Brad Millen (ROM)", "Kristina Yamamoto (MVZ)|Janet Fang (MVZ)".
e) For higherGeography, recommend ordering the values in the list from least specific to most specific.
f) For associatedTaxa, change the comment to use a format equivalent to that for dynamicProperties.
g) Defer other concerns with specific terms. If these persist beyond the current conversation, I encourage you to submit a Term Definition issue in the issue tracker (https://code.google.com/p/darwincore/issues/list).
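Proposals c), d), and e) can be sketched in Python as follows. This is illustrative only, assuming the '|' delimiter with no white space as proposed in c); the function names are hypothetical, and the higherGeography and higherClassification values are made-up examples rather than text from the term definitions.

```python
DELIMITER = "|"


def join_list(values):
    """Concatenate values with '|' and no white space, as proposed in c).

    Surrounding white space is stripped from each value. A value that itself
    contains '|' is rejected, since it could not be split back unambiguously.
    """
    cleaned = [v.strip() for v in values]
    for v in cleaned:
        if DELIMITER in v:
            raise ValueError(f"value contains the delimiter: {v!r}")
    return DELIMITER.join(cleaned)


def split_list(value):
    """Split a '|'-delimited value; a single value yields a one-item list.

    Stray white space around items is tolerated and stripped.
    """
    return [v.strip() for v in value.split(DELIMITER) if v.strip()]


if __name__ == "__main__":
    # The two georeferencedBy examples from d): a single value and a list.
    print(split_list("Brad Millen (ROM)"))
    print(join_list(["Kristina Yamamoto (MVZ)", "Janet Fang (MVZ)"]))
    # higherGeography ordered least to most specific, as in e) (hypothetical values).
    print(join_list(["North America", "United States", "California"]))
    # higherClassification ordered from the highest rank down (hypothetical values).
    print(join_list(["Animalia", "Chordata", "Mammalia", "Carnivora"]))
```

Rejecting values that contain '|' is a design choice of this sketch; a publisher could instead percent-encode them, as is natural for URL-valued terms.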
Cheers,
John
On Tue, 8 Oct 2013, John Wieczorek wrote:
c) For terms that are designed to contain lists (dynamicProperties, recordedBy, preparations, otherCatalogNumbers, previousIdentifications, associatedMedia, associatedReferences, associatedOccurrences, associatedSequences, associatedTaxa, higherGeography, georeferencedBy, georeferenceSources, identifiedBy, identificationReferences, typeStatus, higherClassification, measurementDeterminedBy), recommend in the term definition that the list be delimited with a vertical bar '|' and no white space.
John,
I didn't see any objection to the suggestion made by Tim that the recommendation to use a concatenated list be moved from the term definitions to the Text Guide. Did I miss it?
Thanks, Joel.
On Oct 8, 2013, at 1:19 PM, joel sachs wrote:
c) For terms that are designed to contain lists (dynamicProperties, recordedBy, preparations, otherCatalogNumbers, previousIdentifications, associatedMedia, associatedReferences, associatedOccurrences, associatedSequences, associatedTaxa, higherGeography, georeferencedBy, georeferenceSources, identifiedBy, identificationReferences, typeStatus, higherClassification, measurementDeterminedBy), recommend in the term definition that the list be delimited with a vertical bar '|' and no white space.
This should not be in the term definition, IMO. Tim laid out the arguments really well. This is a matter of how DwC is applied to concrete use cases. If it's a flattened-out spreadsheet, then the recommendation could be the above. But as part of the normative vocabulary term definitions, this would be bad: it would essentially put DwC at odds with common and best practices in XML, XSD, and RDF.
-hilmar
John, +1 from me for the changes you suggest.
If this does not become part of the term definitions (and I understand the arguments made by Joel and Hilmar), I think users of Simple Darwin Core would greatly benefit from a separate list of terms WITH those recommendations and examples. I always go to http://rs.tdwg.org/dwc/terms/index.htm as a source for guidance, but a more specific list might be in order, for example http://rs.gbif.org/core/dwc_occurrence.xml (ideally maintained/endorsed by TDWG).
Cheers,
Peter
________________________________________ From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] on behalf of joel sachs [jsachs@csee.umbc.edu] Sent: Tuesday, 8 October 2013 19:19 To: John Wieczorek CC: TDWG Content Mailing List Subject: Re: [tdwg-content] Delimiters for Darwin Core list-type terms
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
I concur with Peter, and as I am focused on implementation, I need one place to go to get the definitions and examples. It may even be possible to integrate AppleCore into a resource like http://rs.gbif.org/core/dwc_occurrence.xml with "tags" for botanical implementation, which would avoid having to reference yet another site and may lead to better adoption...
Best, James
On Thu, Oct 10, 2013 at 8:33 AM, DESMET, Peter Peter.DESMET@inbo.be wrote:
On Mon, 7 Oct 2013 12:28:44 +0200 John Wieczorek tuco@berkeley.edu wrote:
On the list of pending Darwin Core issues is a topic of general concern about terms that could or do recommend the concatenation and delimiting of a list of values. The specific issue was submitted on the Darwin Core Project site at https://code.google.com/p/darwincore/issues/detail?id=168. Right now there is variation in the recommendations of distinct terms.
I concur that standardization of possible delimiters in the documentation would be a good thing.
Point of information: The AppleCore guidance specifies, in a number of places, the following guidance for a delimiter when a field contains a list of values: 'Separate with "|" or ";"', and in one place (preparations) 'separated by ";" or "|"'.
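A consumer reading data prepared under the AppleCore guidance would have to accept either delimiter. A minimal Python sketch of such a tolerant parser, using the standard-library `re` module (the function name and the example preparations value are hypothetical):

```python
import re

# Either '|' or ';', together with any surrounding white space.
_DELIMITERS = re.compile(r"\s*[|;]\s*")


def split_applecore_list(value):
    """Split a list that may use '|' or ';' (or a mix of both) as delimiter."""
    return [item for item in _DELIMITERS.split(value.strip()) if item]


if __name__ == "__main__":
    # Hypothetical preparations value mixing both delimiters.
    print(split_applecore_list("sheet; duplicate|spirit"))
    # ['sheet', 'duplicate', 'spirit']
```

Allowing two delimiters makes this parser unsafe for URL-valued terms, since ';' can legitimately occur inside a URL; that is one argument for standardizing on a single delimiter.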
-Paul
participants (11)
- Bob Morris
- DESMET, Peter
- Hilmar Lapp
- James Macklin
- joel sachs
- John Wieczorek
- Markus Döring
- Paul J. Morris
- Richard Pyle
- Steve Baskauf
- Tim Robertson [GBIF]