Standards for date / time values?

Roger Hyam roger at TDWG.ORG
Mon Feb 27 17:56:41 CET 2006


Who is using data scored like this - actually searching on and analyzing
data that was scored to 'summer', 'autumn', 'early 19th century'?

This thread seems to be a great deal about how to capture data from
various historic sources but what are the *use cases* for exploitation
of it? What is the cost-benefit analysis of doing more than Stan
suggested above, /a priori/, in the hope that it will be more useful
than just capturing a string plus a year that can be analyzed later if it
ever needs to be - which it might not?

Just some thoughts,

Roger



Michael Lee wrote:
> Hello all,
>
> It seems to me that the VerbatimDate/EarliestDate/LatestDate are
> helpful, but combine two separate issues:
>
> 1) duration of the event spanning more than one date/time unit
> 2) error associated with the date.
>
> Is the distinction useful?  Probably depends on your application of the
> date.
>
> To handle these, VegBank uses three fields:
>
> startDateTime
> stopDateTime
> dateTimeAccuracy
> (note that there is NO verbatimDate)
>
> StartDateTime is the most precise known date/time for the beginning of an
> event (in our case an observation of a vegetation plot) and
> StopDateTime is the end date/time (which could have the same value).
> DateTimeAccuracy is a string drawn from a list of values ranging from
> 1 second to 1 day to 1 year to 5 or 10 years, etc.
>
> start and stop deal with issue #1 above, duration (potentially) exceeding
> usual date description precision.
>
> The accuracy field deals with issue #2 above, error or uncertainty
> regarding the dates.  It is this accuracy field that seems to be missing
> from this discussion thus far.
>
> As far as time zones go, error on the order of hours isn't TOO important
> to us, so we don't worry about it too much.  But I think what we're aiming
> to do is to store the date/time according to UTC.  This means that to
> reconstruct what time the person saw something, the original time zone and
> Daylight Savings Time would need to be known, but this isn't crucial to
> us, either.
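
A minimal sketch of the UTC point above, assuming the original offset from
UTC was recorded alongside the observation. The function names and the
fixed-offset approach are illustrative assumptions, not VegBank's actual
code; a real system would more likely store a named time zone so that
Daylight Savings Time is handled for it.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_naive: datetime, utc_offset_hours: float) -> datetime:
    """Attach the recorded offset to a local wall-clock time and convert to UTC."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return local_naive.replace(tzinfo=local_tz).astimezone(timezone.utc)


def to_local(utc_value: datetime, utc_offset_hours: float) -> datetime:
    """Reconstruct the observer's wall-clock time from the stored UTC value."""
    return utc_value.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Hypothetical observation: 09:30 local time, 4 hours behind UTC
# (the offset already includes any daylight saving adjustment).
observed = datetime(2001, 7, 15, 9, 30)
stored = to_utc(observed, -4)
recovered = to_local(stored, -4)
```

Without the offset, only the stored UTC value survives and the local
wall-clock time cannot be recovered, which is the trade-off Michael describes.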
>
> I like Lynn's list of weird dates, so I'll throw in what we'd do with them
> (this of course being "interpreted" but as Rich says, that's almost always
> the case):
>
> PRE-1975
>  Start: null; End: 01-JAN-1975; Accuracy: null or possibly a large value
> post-1992
>  Start: 01-JAN-1992; End: null; Accuracy: null or possibly a large value
> summer 2001
>  Start: 15-JUL-2001; End: 15-JUL-2001; Accuracy: 1.5 months
> Mid-1990s
>  Start: 01-JUL-1995; End: 01-JUL-1995; Accuracy: 2 years
> late 1950's
>  Start: 01-JAN-1958; End: 01-JAN-1958; Accuracy: 2.5 years
> Circa 1941
>  Start: 01-JUL-1941; End: 01-JUL-1941; Accuracy: 6 months
> Early 1990s
>  Start: 01-JAN-1992; End: 01-JAN-1992; Accuracy: 2 years
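
To make the examples above concrete, here is a rough sketch of how a
start/stop/accuracy record could be searched. The class name, the
could_include method, and the conversion of "1.5 months" to 46 days are
assumptions made for illustration only; VegBank stores accuracy as a string
from a list, not as a duration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class EventDate:
    start: Optional[date]          # None = open-ended, e.g. "pre-1975"
    stop: Optional[date]           # None = open-ended, e.g. "post-1992"
    accuracy: Optional[timedelta]  # None = unknown uncertainty

    def could_include(self, day: date) -> bool:
        """True if `day` falls within [start - accuracy, stop + accuracy]."""
        pad = self.accuracy if self.accuracy is not None else timedelta(0)
        if self.start is not None and day < self.start - pad:
            return False
        if self.stop is not None and day > self.stop + pad:
            return False
        return True


# "summer 2001": a point date with roughly 1.5 months of uncertainty.
summer_2001 = EventDate(date(2001, 7, 15), date(2001, 7, 15), timedelta(days=46))

# "PRE-1975": no start date, no stated accuracy.
pre_1975 = EventDate(None, date(1975, 1, 1), None)
```

A query for records that could fall on a given day then reduces to calling
could_include on each record, which is one possible answer to the "use case"
question for data scored this way.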
>
> Except for the "pre-Date" and "post-Date" I have assumed a relatively
> short duration.  But this is interpreted.  The problems with this approach
> I see are:
> 1) accuracy for start and stop dates could be different
> 2) accuracy is not a black and white idea.  It's often more of a
> gradient. Seems illogical to specify 10 years as accuracy in the first
> 2 examples, as the dates could be much more than 10 years away.
> 3) a closed list on accuracy seems to force you into decisions you don't
> want to make.  But an open field is problematic as you might not always be
> able to interpret it.  Perhaps 2 fields: DateAccuracyNumeric and
> DateAccuracyUnits would help, or you could require the units to be
> converted to years, with very tiny accuracies being really small decimal
> numbers.
> 4) Doesn't tackle issues of 2 dates (May or Sept of 1995) all that well.
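
Point 3 above could be sketched roughly as follows: a numeric value plus a
unit, normalized to years. The unit table, the 365.25-day year, and the
function name are assumptions chosen for illustration; only the
DateAccuracyNumeric / DateAccuracyUnits idea comes from the message.

```python
# Approximate length of each unit expressed in years (assumes a 365.25-day year).
UNIT_IN_YEARS = {
    "second": 1 / (365.25 * 24 * 3600),
    "day": 1 / 365.25,
    "month": 1 / 12,
    "year": 1.0,
}


def accuracy_in_years(numeric: float, units: str) -> float:
    """Convert a (DateAccuracyNumeric, DateAccuracyUnits) pair to years."""
    key = units.strip().lower().rstrip("s")  # accept "months" as well as "month"
    if key not in UNIT_IN_YEARS:
        raise ValueError(f"unknown accuracy unit: {units!r}")
    return numeric * UNIT_IN_YEARS[key]


circa_1941 = accuracy_in_years(6, "months")   # half a year
late_1950s = accuracy_in_years(2.5, "years")
one_second = accuracy_in_years(1, "second")   # a very small decimal number
```

Keeping the pair avoids the closed list, while the converted value gives a
single sortable number, at the cost of the tiny decimals mentioned above.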
>
>
> But it seems flexible enough for our purposes.  I wasn't part of the team
> that designed it, so I'm not sure what alternatives they considered and
> for what reasons they rejected any alternatives they considered.
>
> cheers,
> michael
>
>
> ----------------------------
> Michael Lee
> VegBank Project Manager
> http://www.vegbank.org
> ----------------------------
>
> On Mon, 27 Feb 2006, Hannu Saarenmaa wrote:
>
>> In my understanding the "Not interpreted" case would be just a text
>> string "DateTimeSourceText", repeating whatever is written in the label
>> about the dates and times.  This can be useful if questions arise of the
>> interpretation.
>>
>> Btw., isn't the name of the element "EarliestDateCollected" in Darwin
>> Core v.1.4 proposal a bit misleading as it really is
>> "EarliestDateTimeCollected"?
>>
>> Hannu
>>
>> Richard Pyle wrote:
>>
>>> RE: CollectingDatesInterpreted
>>> When would this field *not* be set to true?
>>>
>>
>


--

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger at tdwg.org
 +44 1578 722782
-------------------------------------

