Creation/Modification times and Revision Numbers

Wed Aug 16 00:25:49 CEST 2000

Bryan Heidorn wrote:

> What I am concerned about with the revision number is easing processing for
> WWW spiders and for composite documents created from lower-level
> documents/treatments. WWW spiders search the network for new documents to
> index. If the documents are dynamically created "on demand", they are in some
> way different every time the spider looks, if nothing else because the
> "creation date" will change to always be now. We need something more stable,
> like an editorial date: the last time a qualified human said the record was OK.

Yes.  Each "atomic element" must have a "date last modified [by authority]"
attribute for the reason you suggest.
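A minimal sketch of how a spider could rely on such an attribute instead of the volatile creation date. It assumes a hypothetical `lastModifiedByAuthority` attribute on hypothetical `<atom>` elements (names and date format are illustrative, not part of any agreed spec). The spider re-indexes only when an atomic element is added, removed, or re-approved.

```python
import xml.etree.ElementTree as ET

def authority_dates(doc_xml: str) -> dict:
    """Map each atomic element's id to its (hypothetical)
    lastModifiedByAuthority attribute; generation time is ignored."""
    root = ET.fromstring(doc_xml)
    return {
        el.get("id"): el.get("lastModifiedByAuthority")
        for el in root.iter("atom")
    }

def needs_reindex(old_xml: str, new_xml: str) -> bool:
    """True only if some atomic element was added, removed, or re-approved
    by an authority; a fresh 'generated' timestamp alone does not count."""
    return authority_dates(old_xml) != authority_dates(new_xml)
```

Two renderings of the same treatment that differ only in their generation time would therefore compare as unchanged.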

However, wouldn't the spec also need to require that the "spider" (or, more
specifically, the processing instructions that define the "behavior" or result
set produced by the spider or by other collation [transformation] programs,
i.e. methods) also leave a distinguishing revision number in the document, so
that one can determine which specific method was used?  One might expect even
slight changes to the processing (compilation) code used to assemble the
constituent [atomic] elements of the document to alter its behavior and hence
the document content.

Besides any transient unavailability of some distributed resources (perhaps due
to network failure, loss of a particular server, server load, etc.), the
behavior of the "spider" could also influence the document content [atomic
element inclusion], albeit independently.  This would be an issue for "higher
level" associations (documents) that are created by automatically assembling
other "documents" that are themselves constructed from "atomic elements".
Consequently, requiring the "spider" [read: element inclusion processing
instructions] to leave an identifying mark in the document would make it
possible to distinguish the sources of potential differences in the final
dynamic document.
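One way such an identifying mark might look, sketched under assumptions: the assembling method appends a hypothetical `<processedBy>` element carrying its own name and version, plus the ids of the atomic elements it included and of those it could not retrieve. The element and attribute names and the `fetch` callback are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

def assemble(doc_id: str, atom_ids: list[str],
             fetch: Callable[[str], Optional[ET.Element]],
             method: str, version: str) -> ET.Element:
    """Build a composite document and stamp it with the method that built it."""
    doc = ET.Element("document", id=doc_id)
    included, unavailable = [], []
    for atom_id in atom_ids:
        atom = fetch(atom_id)  # None stands in for a server/network failure
        if atom is None:
            unavailable.append(atom_id)
        else:
            doc.append(atom)
            included.append(atom_id)
    # The stamp lets a reader tell whether two outputs differ because of the
    # method version or because different elements happened to be reachable.
    ET.SubElement(doc, "processedBy", method=method, version=version,
                  included=" ".join(included),
                  unavailable=" ".join(unavailable))
    return doc
```

Recording both the method version and the unreachable elements keeps the two independent sources of variation separable.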

>
>
> The same problem exists for higher-level taxon descriptions that are created
> from lower-level descriptions. There was a short thread about this a while ago
> on this list. A good feature of dynamic creation of the descriptions is that
> when a low-level detail changes, the higher-level summary information also
> changes automatically. The "oops, that specimen was actually a different
> species" syndrome may cause a character state recording some value to change
> (because the outlier was recognized as another species).  We do not want to
> have to recalculate and verify all atomic facts whenever someone looks at a
> high-level description.

Yes, I fully agree that changes at the lower [atomic] level of detail must force
changes at the higher, more associative/inclusive levels, precisely for the
reason you suggest.
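A sketch of how revision numbers could let a higher-level description follow changes below it without re-verifying every atomic fact on each view: the summary is cached under the set of (element id, revision) pairs it was computed from, and only those revision numbers are compared on each request. The `Atom` record and the min/max range summary are hypothetical stand-ins for real character data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Atom:
    id: str
    revision: int
    species: str
    value: float  # e.g. one measured character state

class SpeciesSummary:
    """Cached range of a character for one species (illustrative only)."""

    def __init__(self, species: str):
        self.species = species
        self._key = None
        self._summary: Optional[tuple[float, float]] = None
        self.recomputations = 0

    def get(self, atoms: list[Atom]) -> Optional[tuple[float, float]]:
        members = [a for a in atoms if a.species == self.species]
        key = tuple(sorted((a.id, a.revision) for a in members))
        if key != self._key:  # an atom was added, dropped, or revised
            values = [a.value for a in members]
            self._summary = (min(values), max(values)) if values else None
            self._key = key
            self.recomputations += 1
        return self._summary
```

When the outlier specimen is reassigned (a new revision with a different species), the cached key no longer matches and the range is recomputed; otherwise repeated views reuse the cached summary.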

However, in an environment where the data may be distributed, errors may arise
when "key" atomic elements are not found (server load too great, network
failure, etc.) or when only older versions can be located (only mirrors
available, unflushed cache, etc.).  Consequently, to assess the logic needed to
[possibly further] process or interpret a document's contents, would it not be
useful also to have a means of passing along knowledge of which methods were
used to assemble the document?  Presumably, the behaviors of the processing
instructions (document-generating methods) will influence the circumstances
under which some kinds of atomic-element processing instructions (methods) will
fail while others succeed, say in making subsequent changes "automatically" or
in "recognizing" the existence of another species.

Don't we need a spec that includes [requires?] a means to pass along knowledge
about the kinds of associations that were made to generate the document, as well
as about the lower-level "atomic elements" and their attribution, if we are to
avoid recalculating down to the fundamental atomic level at each stage of the
transformation (document-building) process?  If documents were characterized in
XML, this could make it much easier to chain XML transformations by facilitating
recognition of which orders of transformations were permissible.  This could
also greatly simplify modifying processor (method) behavior and developing and
maintaining efficient interprocess communication. A document that had been
validated by being successfully transformed from one lower level of processing
to another by a given method would not require revalidation at these "lower
levels" when being transformed to the next "higher" level by another method.
Presumably, such tags could be scattered at widely different levels within the
document, indicating the scope of the document over which specific
transformations were relevant.
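A rough sketch of the chaining idea, assuming each method leaves a hypothetical `<transformed by=... version=...>` mark on the element (scope) it validated, and that a hypothetical `REQUIRES` table declares which earlier method must already have run. A later method checks the marks instead of revalidating the lower levels; the actual transformation work is omitted, and all names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: the transformation each method requires beforehand.
REQUIRES = {"collate": None, "summarize": "collate", "publish": "summarize"}

def applied_methods(scope: ET.Element) -> set[str]:
    """Methods whose marks sit directly on this scope element."""
    return {m.get("by") for m in scope.findall("transformed")}

def record_transformation(scope: ET.Element, method: str, version: str) -> None:
    """Mark `scope` as transformed by `method`, refusing an impermissible order.

    The lower level is trusted because its mark is present, so no
    revalidation of the atomic elements happens here."""
    needed = REQUIRES[method]
    if needed is not None and needed not in applied_methods(scope):
        raise ValueError(f"{method} requires prior {needed}")
    ET.SubElement(scope, "transformed", by=method, version=version)
```

Because a mark can sit on any element, a subsection could carry its own marks, which matches the idea of tags scattered at different levels of the document.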

These issues are separate from the kinds of errors that may result when a method,
operating within a distributed processing environment, encounters "atomic
elements" that are internally inconsistent.  So far we've avoided the question of
how to handle this separate issue within the spec.  I don't have an answer,
except perhaps simply requiring everyone to agree :-) or specifying the kinds of
errors that should be generated under such conditions.
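If the spec did enumerate error kinds, one possibility is to keep the distribution failures described earlier (missing elements, stale mirrors or caches) apart from genuine inconsistencies between atomic elements. The category names and the `classify` helper are hypothetical, a sketch rather than a proposal for the actual spec.

```python
from enum import Enum
from typing import Optional

class AssemblyError(Enum):
    UNAVAILABLE = "element could not be retrieved"       # network/server failure
    STALE = "only an older revision could be located"    # mirror or cache
    INCONSISTENT = "element contradicts another element"

def classify(found: bool, revision: int, expected_revision: int,
             consistent: bool) -> Optional[AssemblyError]:
    """Return the error kind for one atomic element, or None if it is usable.

    Availability problems are checked first: an element that cannot be
    fetched, or is out of date, cannot meaningfully be judged consistent."""
    if not found:
        return AssemblyError.UNAVAILABLE
    if revision < expected_revision:
        return AssemblyError.STALE
    if not consistent:
        return AssemblyError.INCONSISTENT
    return None
```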

Perhaps others may suggest a better/alternate approach to address these aspects
of dynamic document generation/transformation within the spec.

>
>
> Wow, what a pile of trouble for a little feature!
>
> I do not know what to do about defining "treatment", "document", "collection"
> and "project". I am willing to adopt anyone else's definitions. The key point
> is that there are actually different things that need to be treated
> differently somehow.

That's fine by me, as long as we can communicate which ones we're using.

>
>
> Bryan
> At 04:16 PM 8/15/00 -0600, Stuart G. Poss wrote:
> >Bryan Heidorn wrote:
> >
> >> Yes, perhaps there are really two different fields: treatment creation
> >> time and revision number. I think time alone is not enough, since one
> >> cannot tell from it whether the treatment has changed since it was last
> >> viewed or used (to create higher-level treatments).
> >
> >>
> >
> >> Do you instead mean time created and time last revised, as well as revision
> >> number?
> >
> >It is conceivable that different systems (servers) might have various,
> >slightly different versions of the constructional software running on them
> >that could, at least in principle, produce two different version numbers even
> >when "simultaneously" generating elements of the same document (treatment?).
> >
> >Don't we need to keep in mind that both "collections [attributable to a unique
> >source?]" and "treatments [virtual collections generated from multiple sources
> >with respect to specific <processing> instructions?]" [or vice versa?] may be
> >dynamic in distributed environments?
> >
> >I too remain unsure how the concepts and scope of the terms "treatment",
> >"document" and "collection" are being used (defined) as this discussion
> >emerges. It might be useful for us to maintain a glossary, perhaps with
> >qualifiers (i.e. sensu Bryan or sensu Kevin, etc.), as such issues arise.  We
> >can then at least know whether we agree/disagree with respect to which
> >definitions are required or with respect to how the definitions are used.
> >
> >
> >>
> >> >
> >> >| The current standard sets a relatively weak requirement that contributor
> >> >| codes be unique to the treatment. I think we need a broader
> >> >| definition. They should be unique to a collection at least.
> >> >
> >> >Again we have a definitional problem, and I think my treatment = your
> >> >collection.
> >> >
> >> >| Attribution:
> >> >| Must this be a contributor? If so, this information should be handled as
> >> >| a property of <CONTRIBUTOR ROLE=PRINCIPAL|COPRINCIPAL> or as another tag
> >> >| of <CONTRIBUTOR>
> >> >|     .... <ROLE>PRINCIPAL</ROLE>....
> >> >| Suggestions?
> >> >
> >> >Yes, it could be done like this, but what would be wrong with doing it the
> >> >other way? It seems somehow neater to me, and I can't see much inefficiency.
> >> It could
> >>
> >> --
> >> --------------------------------------------------------------------
> >>   P. Bryan Heidorn    Graduate School of Library and Information Science
> >>   pheidorn at uiuc.edu   University of Illinois at Urbana-Champaign
> >>   (V)217/ 244-7792    501 East Daniel St., Champaign, IL  61820-6212
> >>   (F)217/ 244-3302    http://alexia.lis.uiuc.edu/~heidorn
> >