Who is using data scored like this - actually searching on and analyzing data that was scored to 'summer', 'autumn', 'early 19th century'?

This thread seems to be largely about how to capture data from various historic sources, but what are the use cases for exploiting it? What is the cost-benefit analysis of doing more than Stan suggested above, a priori, in the hope that it will be more useful than just capturing a string plus a year that can be analyzed later if it ever needs to be - which it might not?

Just some thoughts,

Roger



Michael Lee wrote:
Hello all,

It seems to me that the VerbatimDate/EarliestDate/LatestDate are
helpful, but combine two separate issues:

1) duration of the event spanning more than one date/time unit
2) error associated with the date.

Is the distinction useful?  Probably depends on your application of the
date.

To handle these, VegBank uses three fields:

startDateTime
stopDateTime
dateTimeAccuracy
(note that there is NO verbatimDate)

startDateTime is the most precise known date/time for the beginning of an
event (in our case an observation of a vegetation plot) and
stopDateTime is the end date/time (which could have the same value).
dateTimeAccuracy is a string drawn from a closed list of values ranging
from 1 second to 1 day to 1 year to 5 or 10 years, etc.

Start and stop deal with issue #1 above, duration (potentially) exceeding
usual date description precision.

The accuracy field deals with issue #2 above, error or uncertainty
regarding the dates.  It is this accuracy field that seems to be missing
from this discussion thus far.
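
As a rough illustration of how the three fields keep duration and error
apart, here is a minimal Python sketch. The class name EventDate and its
attribute names are hypothetical, made up for this example; this is not
VegBank's actual schema or code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class EventDate:
    """Hypothetical record mirroring the three fields described above."""
    start_date_time: Optional[datetime]  # most precise known start; None if unknown
    stop_date_time: Optional[datetime]   # end of the event; may equal the start
    date_time_accuracy: Optional[str]    # e.g. "1 day", "1 year"; None if unknown

    def __post_init__(self):
        # Duration (issue 1) is carried by start/stop, so stop must not precede start.
        if (self.start_date_time is not None
                and self.stop_date_time is not None
                and self.stop_date_time < self.start_date_time):
            raise ValueError("stop_date_time precedes start_date_time")

    def duration(self):
        """Length of the event itself, independent of the accuracy field."""
        if self.start_date_time is None or self.stop_date_time is None:
            return None
        return self.stop_date_time - self.start_date_time


# A plot observed over two days, with the dates known to the day.
obs = EventDate(datetime(2001, 7, 14), datetime(2001, 7, 16), "1 day")
print(obs.duration())
```

The point of the sketch is only that duration() never looks at the accuracy
string: how long the event lasted and how well we know when it happened are
stored separately.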

As far as time zones go, error on the order of hours isn't TOO important
to us, so we don't worry about it too much.  But I think what we're aiming
to do is to store the date/time according to UTC.  This means that to
reconstruct what time the person saw something, the original time zone and
Daylight Saving Time would need to be known, but this isn't crucial to
us, either.
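
To make the UTC point concrete, here is a small Python sketch using only
the standard library. The helper names to_utc and to_local and the sample
offset are assumptions for illustration. It uses a fixed offset in hours,
which means the caller must already know whether Daylight Saving Time
applied on that date.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_naive, utc_offset_hours):
    """Attach a fixed UTC offset to a naive local time and convert it to UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return local_naive.replace(tzinfo=local_zone).astimezone(timezone.utc)


def to_local(utc_time, utc_offset_hours):
    """Reconstruct the observer's wall-clock time from the stored UTC value."""
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Hypothetical observation at 14:30 local time, where local time was UTC-4.
stored = to_utc(datetime(2001, 7, 15, 14, 30), -4)
print(stored.isoformat())      # 2001-07-15T18:30:00+00:00

# Without the original offset, the 14:30 wall-clock time cannot be recovered.
recovered = to_local(stored, -4)
print(recovered.isoformat())   # 2001-07-15T14:30:00-04:00
```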

I like Lynn's list of weird dates, so I'll throw in what we'd do with them
(this of course being "interpreted" but as Rich says, that's almost always
the case):

PRE-1975
 Start: null; End: 01-JAN-1975; Accuracy: null or possibly a large value
post-1992
 Start: 01-JAN-1992; End: null; Accuracy: null or possibly a large value
summer 2001
 Start: 15-JUL-2001; End: 15-JUL-2001; Accuracy: 1.5 months
Mid-1990s
 Start: 01-JUL-1995; End: 01-JUL-1995; Accuracy: 2 years
late 1950's
 Start: 01-JAN-1958; End: 01-JAN-1958; Accuracy: 2.5 years
Circa 1941
 Start: 01-JUL-1941; End: 01-JUL-1941; Accuracy: 6 months
Early 1990s
 Start: 01-JAN-1992; End: 01-JAN-1992; Accuracy: 2 years
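
One way to relate these values back to EarliestDate/LatestDate is to read
the accuracy as a plus-or-minus margin around start and end. That reading
is my assumption, as are the function name window and the 365.25-day year
used to turn years into days. A rough Python sketch:

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365.25  # assumption: average year length


def window(start, end, accuracy_years):
    """Return (earliest, latest), treating accuracy as a +/- margin in years."""
    if accuracy_years is None:
        # Unknown accuracy: leave the bounds as given (None means open-ended).
        return start, end
    margin = timedelta(days=round(accuracy_years * DAYS_PER_YEAR))
    earliest = start - margin if start is not None else None
    latest = end + margin if end is not None else None
    return earliest, latest


# A few of the examples above, with accuracy converted to years.
examples = {
    "PRE-1975": (None, date(1975, 1, 1), None),
    "summer 2001": (date(2001, 7, 15), date(2001, 7, 15), 1.5 / 12),
    "Mid-1990s": (date(1995, 7, 1), date(1995, 7, 1), 2.0),
    "Circa 1941": (date(1941, 7, 1), date(1941, 7, 1), 0.5),
}

for label, (start, end, accuracy) in examples.items():
    print(label, window(start, end, accuracy))
```

Under these assumptions "summer 2001" comes out as roughly the end of May
to the end of August, and "PRE-1975" stays open-ended at the start.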

Except for the "pre-Date" and "post-Date" I have assumed a relatively
short duration.  But this is interpreted.  The problems with this approach
I see are:
1) accuracy for start and stop dates could be different
2) accuracy is not a black-and-white idea; it's often more of a gradient. It seems
illogical to specify 10 years as accuracy in the first 2 examples, as the
dates could be much more than 10 years away.
3) a closed list on accuracy seems to force you into decisions you don't
want to make.  But an open field is problematic as you might not always be
able to interpret it.  Perhaps 2 fields: DateAccuracyNumeric and
DateAccuracyUnits would help, or you could require the units to be
converted to years, with very tiny accuracies being really small decimal
numbers.
4) It doesn't handle alternative dates (May or Sept of 1995) all that well.
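
For point 3, the two-field idea (DateAccuracyNumeric plus
DateAccuracyUnits) and the convert-everything-to-years idea can be
sketched together in Python. The unit list, the 365.25-day year and the
function name accuracy_in_years are all assumptions for illustration,
not an agreed vocabulary.

```python
# Assumed unit vocabulary, expressed in seconds (year = 365.25 days, month = year / 12).
SECONDS_PER_UNIT = {
    "second": 1,
    "minute": 60,
    "hour": 3_600,
    "day": 86_400,
    "month": 2_629_800,
    "year": 31_557_600,
}


def accuracy_in_years(numeric, units):
    """Convert a (DateAccuracyNumeric, DateAccuracyUnits) pair to decimal years."""
    unit = units.strip().lower()
    if unit.endswith("s"):
        unit = unit[:-1]  # accept plural forms such as "months"
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unrecognised accuracy unit: {units!r}")
    return numeric * SECONDS_PER_UNIT[unit] / SECONDS_PER_UNIT["year"]


print(accuracy_in_years(1.5, "months"))
print(accuracy_in_years(10, "years"))
print(accuracy_in_years(1, "second"))  # a very small decimal number of years
```

Keeping the numeric/units pair as entered and deriving years only when
needed avoids the closed list while still rejecting values that cannot be
interpreted.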


But it seems flexible enough for our purposes.  I wasn't part of the team
that designed it, so I'm not sure what alternatives they considered or
why they rejected them.

cheers,
michael


----------------------------
Michael Lee
VegBank Project Manager
http://www.vegbank.org
----------------------------

On Mon, 27 Feb 2006, Hannu Saarenmaa wrote:

In my understanding the "Not interpreted" case would be just a text
string "DateTimeSourceText", repeating whatever is written in the label
about the dates and times.  This can be useful if questions arise of the
interpretation.

Btw., isn't the name of the element "EarliestDateCollected" in Darwin
Core v.1.4 proposal a bit misleading as it really is
"EarliestDateTimeCollected"?

Hannu

Richard Pyle wrote:

RE: CollectingDatesInterpreted
When would this field *not* be set to true?





--

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger@tdwg.org
 +44 1578 722782
-------------------------------------