Creation/Modification times and Revision Numbers

Bryan Heidorn heidorn at ALEXIA.LIS.UIUC.EDU
Wed Aug 16 10:30:27 CEST 2000


At 12:25 AM 8/16/00 -0600, Stuart G. Poss wrote:
>Bryan Heidorn wrote:
>
>> What I am concerned about with the revision number is easing processing for
>> www spiders and for composite documents created from lower level
>> documents/treatments. www spiders search the network for new documents to
>> index. If the documents are dynamically created "on demand", they are in some
>> way different every time the spider looks, if nothing else because the
>> "creation date" will change to always be now. We need something more stable
>> like editorial date... the last time a qualified human said the record was OK.
>
>Yes.  Each "atomic element" must have a "date last modified [by authority]"
>attribute for the reason you suggest.
>
>However, would not the spec also need to require that the "spider", or
>specifically those processing instructions that define the "behavior" or result
>set produced by the spider or other collation [transformation] programs
>(methods), also leave a distinguishing revision number in the document so that
>one can determine what specific method was used?  One might expect different
>behaviors even for slight changes to the processing (compilation) code used to
>assemble the constituent [atomic] elements of the document to influence the
>document content.
I was thinking the document type definition would specify a method for
generating a field. Kevin had an example of calculating a mean or something
like that. That does mean there needs to be a set of predefined methods for
value calculation and collection, but I do not think we need to be exhaustive
in the spec. The spec can just say that you should identify your method. If it
is all object oriented, as Bob suggests, it might be real executable code. I
think that is too much to ask for the first rev of the spec, but it should be
an option. As Stuart suggests, the resulting value generated by a method at a
particular time needs to be tagged with information about when and how it was
generated. The attribution fields might do part of that.
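
To make this concrete, a generated value might carry its method and timestamp
as attributes, while the human editorial date lives with the attribution. All
element and attribute names below are invented for illustration; none come
from any draft spec:

```xml
<!-- Illustrative sketch only; names are hypothetical, not from a draft spec -->
<character name="forewing_length" units="mm">
  <!-- The value records what method produced it and when -->
  <value method="mean" method-version="1.2"
         generated="2000-08-16T10:30:00">14.7</value>
  <attribution>
    <editor>P. B. Heidorn</editor>
    <!-- The stable "editorial date": last time a qualified human said OK -->
    <date-last-modified>2000-08-14</date-last-modified>
  </attribution>
</character>
```

A spider could then index on `date-last-modified` and ignore `generated`,
which changes on every dynamic assembly.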
>
>Even though there may be transient lack of availability of some distributed
>resources (perhaps due to network failure, loss of a particular server, server
>load, etc.), the behavior of the "spider" also could influence the document
>content [atomic element inclusion], albeit independently.  This would be an issue
>for "higher level" associations (documents) that are created by automatically
>assembling other "documents" that are themselves constructed from "atomic
>elements".  Consequently, requiring the "spider" [read element inclusion
>processing instructions] to leave an identifying mark in the document would
>provide an ability to distinguish the source of the potential differences in the
>final dynamic document.
ditto
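
The "identifying mark" Stuart describes could be as simple as a provenance
block stamped into the assembled document by each processing step. This is a
sketch under the assumption of one stamp per assembly pass; all names are
hypothetical:

```xml
<!-- Hypothetical provenance stamps left by the assembly processes -->
<assembled-by>
  <process name="element-spider"  version="0.3" run="2000-08-16T02:00:00"/>
  <process name="summary-builder" version="1.0" run="2000-08-16T02:05:00"/>
</assembled-by>
```

Two documents with identical content but different stamps would then be
distinguishable as products of different methods or runs.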
>
>>
>>
>> The same problem exists for higher level taxon descriptions that are created
>> from lower level descriptions. There was a short thread about this a while ago
>> on this list. A good feature of dynamic creation of the descriptions is that
>> when a low level detail changes the higher level summary information also
>> automatically changes. The "oops, that specimen was actually a different
>> species" syndrome may cause a character state recording some value to change
>> (because the outlier was recognized as another species.)  We do not want to
>> have to recalculate and verify all atomic facts whenever someone looks at a
>> high level description.
>
>Yes, I fully agree that the lower [atomic] level of detail must force changes at
>higher more associative/inclusive levels precisely for the reason you suggest.
>
>However, within an environment where the data may be distributed, there may be
>sources of error that may be generated when "key" atomic elements are not found
>(server load too great, network failure, etc.) or perhaps only older versions can
>be located (only mirrors available, unflushed cache, etc.).  Consequently, to
>assess the logic needed to [possibly further] process or interpret a document's
>contents, would it not be useful to also have a means to pass knowledge of what
>methods were used to assemble the document?  Presumably, the behaviors of the
>processing instructions (document generating methods) will influence the
>circumstances under which some kinds of atomic element processing instructions
>(methods) will fail while others may succeed, say in making subsequent changes
>"automatically" or "recognizing" the existence of another species.
>
>Don't we need a spec that includes [requires?] a means to pass knowledge about
>the kinds of associations that were made to generate the document, as well as the
>lower level "atomic elements" and their attribution, if we are to avoid
>recalculation to the fundamental atomic level at each stage in the transformation
>(document building) process?  If documents were characterized in XML, this could
>make it much easier to chain XML transformations by facilitating the recognition
>of what order of transformations was permissible.  This could also greatly
>simplify the modification of processor (method) behavior and the
>development/maintenance of efficient interprocess communication.  A document that
>was validated by successfully being transformed at one lower level of processing
>to another by a given method would not require revalidation at these "lower
>levels" when being transformed to the next "higher" level by another method.
>Presumably, such tags could be scattered at widely different levels within the
>document, indicating the scope of the document over which specific
>transformations were relevant.

Right. This is complex. Any CS doctoral students out there looking for a
dissertation? Just kidding! It is not that bad. Well, if we make simplifying
assumptions it is not that bad. For rev 1.0 we should skip it, but I can think
of an easy case to test. We could include a common name list in the butterfly
records from itis*ca. That list could be manually generated at first. By
"manually", I mean have the contributor search itis and type the results by
hand into the treatment. Later we can have the system query itis for XML that
can be inserted directly (or transformed first) into the treatment, on-the-fly.
We'll see how the method tracking issue shakes out on the real example.
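
A dynamically inserted common-name list could carry its own source and method
tracking right on the inserted fragment. This is a sketch only; the element
names, the process name, and the example species are all invented for
illustration (itis*ca is the source named above):

```xml
<!-- Hypothetical fragment inserted by an automated itis query;
     source/query-date/inserted-by record where and how it was obtained -->
<common-names source="itis*ca" query-date="2000-08-16"
              inserted-by="itis-query-v0.1">
  <common-name lang="en">Monarch</common-name>
  <common-name lang="fr">Monarque</common-name>
</common-names>
```

The same fragment typed in by hand would simply omit `inserted-by` or name the
contributor instead, so manual and automatic insertion stay distinguishable.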
>
>These issues are separate from the kinds of errors that may result when a method
>encounters "atomic elements" that are internally inconsistent, within a
>distributed processing environment.  So far we've avoided how to handle this
>separate issue within the spec.  I don't have an answer except perhaps simply
>requiring everyone to agree :-) or specifying the kinds of errors that should be
>generated under such conditions.
Perhaps we should just report processing errors somewhere. All calculated
fields have comment fields already. Perhaps there should be exception fields
too, which in most cases will be empty. Again, this is a bell or whistle that
is not needed to get "something" running.
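
An exception field alongside the existing comment field might look like this;
again a sketch with invented names, not part of any draft:

```xml
<!-- Hypothetical: a calculated field with its comment and an exception
     recorded when a source could not be reached during calculation -->
<value method="mean" generated="2000-08-16T10:35:00">14.7</value>
<comment>Computed from 12 specimens</comment>
<exception code="source-unavailable">
  Remote server did not respond; older cached copy used instead.
</exception>
```

In the normal case the `exception` element is simply absent, so consumers that
ignore it lose nothing.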
>
>Perhaps others may suggest a better/alternate approach to address these
aspects
>of dynamic document generation/transformation within the spec.
>
>
Bryan
--
--------------------------------------------------------------------
  P. Bryan Heidorn    Graduate School of Library and Information Science
  pheidorn at uiuc.edu   University of Illinois at Urbana-Champaign
  (V)217/ 244-7792    501 East Daniel St., Champaign, IL  61820-6212
  (F)217/ 244-3302    http://alexia.lis.uiuc.edu/~heidorn



