Space shuttles and bicycles

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Mon Jul 24 09:41:32 CEST 2000


Dear Stuart,

I agree completely with everything you say, but it worries me all the same.
You point out the complexity of descriptive data and the enormity of the
task of completely capturing it. But we need to get something done, and I
think we need some incremental stages.

Your suggestion as to maintaining threads of discussion is not unlike the
way the list was running before it fell over. Some of the threads did
indeed morph into monsters, others got lost and I think many people with
them. I'd really like to try for a while keeping the discussion focused on
the document with the proposed list of elements, to glean suggestions from
people as to whether it's completely inadequate or what. At the same time,
of course, I don't want to constrain people to run with this or with my
suggested way of doing things. I may well be way off the track of what's
possible or achievable. Working up the document may provide us with an
incremental advance, or it may be that such incremental advances are not
worth achieving and your suggestion for a great leap forward is the way to
go. My way of looking at it is that if DELTA is a bicycle, I'm proposing a
motor bike, and you're sketching out plans for a space shuttle. Maybe I'm
not being visionary enough?

It seems to me that there's an old way of describing something, and many
possible new ways. The old way is with a set of characters with values
(states) applied to a set of taxa. This is the form of DELTA data, Lucid
data, textual descriptions (in a way). Updating our standard for this way
of describing is achievable now, I think.
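
To make the old way concrete, here is a minimal sketch of it: characters
with states, scored against taxa. It is an illustration only. The class
names, the two characters, and the two taxa are hypothetical, invented for
this example, and are not taken from DELTA, Lucid, or the draft element list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Character:
    """A descriptive character and the states it may take."""
    name: str
    states: List[str]


@dataclass
class Description:
    """Scores for one taxon: character name -> set of observed states."""
    taxon: str
    scores: Dict[str, Set[str]] = field(default_factory=dict)

    def score(self, character: Character, *states: str) -> None:
        # Reject any state that the character does not define.
        unknown = set(states) - set(character.states)
        if unknown:
            raise ValueError(
                f"unknown states for {character.name}: {sorted(unknown)}"
            )
        self.scores.setdefault(character.name, set()).update(states)


def taxa_with(descriptions, character, state):
    """Return the names of taxa scored with the given state."""
    return [
        d.taxon
        for d in descriptions
        if state in d.scores.get(character.name, set())
    ]


# Hypothetical characters and taxa, invented for this example.
leaf_shape = Character("leaf shape", ["linear", "ovate", "lanceolate"])
flower_colour = Character("flower colour", ["white", "yellow", "red"])

taxon_a = Description("Taxon A")
taxon_a.score(leaf_shape, "ovate", "lanceolate")  # polymorphic: two states
taxon_a.score(flower_colour, "yellow")

taxon_b = Description("Taxon B")
taxon_b.score(leaf_shape, "linear")

print(taxa_with([taxon_a, taxon_b], leaf_shape, "ovate"))  # ['Taxon A']
```

Even this toy version has to decide how to handle a taxon with more than one
state for a character, which is the kind of question the element list needs
to settle.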

New ways of describing something, such as with 3-D tomographic imaging etc,
may well be the way of the future. But I'm not sure that we can have one
descriptive standard that encompasses both old and new ways under one roof.
This is why we need extensibility. Can we take an incremental step along
the lines that I'm proposing while allowing for the future brave new world?
Or can we have a set of linked standards - one for describing in the boring
old characters/states/taxa way, and others for the more space-shuttle ways
that can be linked in as they develop?
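
One way to picture linked standards, purely as a sketch: a core record holds
the characters/states/taxa data plus links it does not need to understand,
and handlers for other standards are registered separately as they develop.
Every name here (Link, CoreDescription, HANDLERS, the link kinds, and the
locations) is hypothetical and not part of any existing or proposed standard.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Link:
    """Pointer from a core description to data governed by another standard."""
    kind: str      # hypothetical label for the linked standard
    location: str  # hypothetical location of the external data


@dataclass
class CoreDescription:
    """Characters/states/taxa data, plus links the core need not understand."""
    taxon: str
    scores: Dict[str, str] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)


# Registry of handlers for linked standards. New kinds are added here as
# they develop, without changing CoreDescription.
HANDLERS: Dict[str, Callable[[Link], str]] = {}


def register(kind: str):
    """Decorator that registers a handler for one kind of link."""
    def decorator(func: Callable[[Link], str]) -> Callable[[Link], str]:
        HANDLERS[kind] = func
        return func
    return decorator


@register("tomography")
def describe_tomography(link: Link) -> str:
    return f"tomographic scan at {link.location}"


def summarise(desc: CoreDescription) -> List[str]:
    """Summarise every link; unknown kinds are reported, not rejected."""
    out = []
    for link in desc.links:
        handler = HANDLERS.get(link.kind)
        if handler is not None:
            out.append(handler(link))
        else:
            out.append(f"unrecognised link kind: {link.kind}")
    return out


record = CoreDescription(
    taxon="Taxon A",
    scores={"leaf shape": "ovate"},
    links=[
        Link("tomography", "example/scan-001"),
        Link("some-future-standard", "example/item-17"),
    ],
)
print(summarise(record))
```

The point of the sketch is only that the core record survives unchanged when
a new kind of link turns up. Whether that is workable for real descriptive
data is exactly the open question.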

Looking forward to responses

Cheers - k

At 06:07 PM 19/7/00 -0600, you wrote:
>Kevin and colleagues,
>
>As per our discussions at the US-Australia Workshop, I would again reiterate
>a few general observations with respect to the list and express my agreement
>with specific comments made by you, Bryan Heidorn, and Stan Blum.  However,
>with your indulgence, I would also like to provide what may be a somewhat
>different perspective.
>
>The focus on "requirements" for a descriptive data standard for taxonomy, as
>you and Stan emphasize, is a critical one, even though, as Bryan points out,
>there remain a number of issues that need to be dealt with that may not be
>fully accounted for in the draft standard you have kindly provided.  I would
>agree that we need a mechanism (structure?) for subsequent discussion on the
>list to permit both general, theoretical issues to be addressed, while
>simultaneously breaking down the practical realities of dealing with the
>complexity of specific issues involved that at times dictate useful
>"digression" into jargon-laden specifics that might be relevant for
>particular implementation issues that require vigorous discussion.  I'm not
>sure at this stage whether it is possible, at least in my own mind, to
>distinguish structure from content, since dealing with existing structure and
>content may be necessary to define what we perceive are requirements.  My own
>sense of the previous discussion is that there are a variety of perspectives
>as to what constitutes "descriptive data" and "requirements" in this context,
>as well as what are the specific priorities (aspects of "standards") that are
>necessary for specific applications (eg DELTA and LUCID) to intercommunicate
>in an application-neutral manner.  However, mixing them into a common thread
>proved a bit overwhelming.
>
>My own bias is for a better understanding of how we can construct such a
>"draft standard" so that it is open to considerable extension for the
>incorporation of meta-language descriptors for more esoteric data structures,
>while maintaining a flexible general framework needed to associate existing
>"character" data, while also addressing the practical necessity of managing
>various "annotations" of qualitative characters.  I believe this is important
>primarily because we ultimately want machines to do most of the translating
>among formats, with minimal loss of information or human intervention.  I
>believe it is also important for the more difficult task that lies ahead of
>encoding means for machines to "feature extract" across a multiplicity of
>representations of character data.
>
>As a taxonomist/morphologist I am constantly confronted with new data formats
>and widely different data sources.  Virtually all are created in specific
>contexts and do not generally have a "web-wide" mechanism for associating
>their content.  For example, it is difficult for me to determine if there
>exist data sets that encompass different "encodings" of information
>pertaining to specific structures for specific taxa.  I need a dynamic
>mechanism that will permit me to become aware of data sets pertaining to, say,
>the pectoral fins of a particular scorpionfish, without having to know in
>advance that such data may exist in the form of 1) a collections record of a
>skeleton in a particular collection, 2) a published data set characterizing
>the measurements taken from a particular study, 3) a CAT scan of such a
>critter, 4) an archive containing the representation of specific character
>states used in a phylogenetic analysis, 5) numerous gif/jpg files of
>radiographs of specimens, 6) a text based description of the pectoral fins in
>a fossil, 7) the title of a paper describing the sensory innervation of the
>fin, or 8) a database of specific HOX genes involved in fin formation.
>
>Certainly, the Rich Attribution component of your document is a critical
>element for this, but I do not yet see how I can use this document to
>establish the "meta-data wrapper" needed to compile such a list, much less
>establish to what extent I can use such a text based "wrapper" to associate
>these disparate kinds of taxonomic data.  How do I deal with data that are
>largely numeric in content or purely graphic (pixel encoded)?  Nonetheless, I
>would agree that there is a need for a series of "collation rules" to
>establish scope at different hierarchical levels or for specific
>context-oriented activity.  I would, for example, add several lower levels
>still in this context (including parts of specimens as described at the
>organ, tissue, cellular, subcellular, and molecular levels).  Of course, the
>difficulty here is that resolution and context may create data structures
>that are not entirely hierarchical, particularly for objects of composite
>origin or study.  For the nervous system in chordates one can break down the
>system into units with respect to various elements that could in one sense be
>hierarchical (perhaps brain, spinal cord, ramus lateralis accessorius,
>neuron, motor unit, motor endplate, etc.).  However, with respect to a
>physiological classification dealing with action at the level of specific
>neurons this classification scheme would not work since the nerve is
>composite and composed of both sensory and motor elements.  Likewise, it
>would be difficult to place neuroendocrine components, specific
>neuropeptides, or developmental anlagen, such as placodes, however important,
>into a parallel hierarchy.  Likewise, usefully descriptive properties could
>not be easily restricted to specific components.  I found the discussion at
>the workshop regarding the use of directed acyclic graphs as a fundamental
>data structure most interesting, but I'm not sure that morphological
>descriptors, perhaps unlike gene products, are necessarily acyclic.  For
>example, a specific bone such as the mandible can be classified as an element
>of the visceral skeleton as well as a composite element containing both
>endochondral as well as dermal bone.  If one looks early enough in
>development, one can't even recognize these anatomical distinctions, although
>they may exist at a molecular level.  How should structures that change with
>development or function be tagged and associated?  Would this not depend upon
>context?  Nonetheless, following from your document, it might be a useful
>exercise to consider to what extent certain classes of morphological
>descriptors can be considered in such a graph-theoretical framework from
>which we might be able to establish certain constructs as useful in
>associating otherwise disparate, yet specific data (glossaries?).  Trees
>certainly are a useful data structure for description of many morphological
>features, but not the only ones.
>
>Consequently, it might be useful to break up the discussion into
>sub-discussions or threads for which specific requirements can be more
>readily circumscribed and for which the makings of a "meta-language" needed
>to search and assimilate alternate representations might be more quickly
>forthcoming.  This is important because the universe of potentially different
>data structures for encoding character data is very large.  There is no need
>for those interested primarily in DELTA - LUCID translations, or LUCID -
>PHYLIP, etc. transformations to be held up by more specific requirements
>concerning translations/annotations of more arcane data structures, even
>though some, like Bryan and me, may feel that transformations between "other
>kinds" of data structures must also be incorporated in a way that allows
>their potential richness to be exploited.  However, achieving such
>extensibility will require the "standard descriptors" to be quite general
>(but not ambiguous) in construction.
>
>Such an approach might permit us to generalize across a number of possibly
>highly specific topics and requirements that are not universally applicable
>and with which many of us are differentially fluent.  This approach would be
>especially useful should we at a later point begin to use them to construct
>XML schema or to outline what might be necessary using XSLT to transform them
>from one XML format to another.  Since XML is promising as a data neutral
>specification language, we might want to maintain a separate "XML thread",
>and perhaps even various XML (alternative)-implementation subthreads (Java
>XML APIs vs MS XML APIs vs "others"?, or DTDs vs Schemas, "elements" vs
>"attributes", etc.) that will influence how such a "standard" could currently
>be implemented.  Although certainly I would agree that we do not want
>implementations to drive the standards, it is important to have an
>understanding of how potential implementations might affect the utility of
>the standards.  It might be useful here to draw an analogy to the
>presentation made at the workshop by Sue Rhee in her discussion of the need
>for an "ontological" database for common annotation of gene function across
>molecular databases.  Likewise, we need a generalized means of characterizing
>the "language" used to describe the various entries in different "glossaries"
>used to describe character data.  The need for such "cross molecular"
>databases would not arise, except for specific implementation issues that are
>not presently adequately addressed.  Likewise, your "External lexica" might
>be usefully encompassed in the concept of XML namespaces.  Although I can't
>think of specific examples off the top of my head, some anatomical terms are
>used differently in different contexts ("veins" in animals and plants might
>be a simple example).  We need to be able to distinguish the contexts.
>Perhaps this is what you mean by global versus local characters?
>
>Hence, from my perspective it might be useful for the dialog to move forward
>along several separate, yet not entirely distinct threads, where folks with
>specific interests could provide input as they see fit, ignoring that which
>seems irrelevant.  A few may even want to keep their thumbs in all the pies.
>In glancing over what has come before, we might consider as possible threads:
>1) general theoretical perspectives on "taxonomic data", 2) one or more
>application specific threads (ie DELTA, LUCID, "phylogeny packages", NEXUS,
>others?, etc.), 3) issues pertaining to description and characterization of
>qualitative data, 4) issues pertaining to description and characterization of
>quantitative characters, 5) issues pertaining to text based description
>(semi-structured data), 6) issues pertaining to structured data (ie
>relational or object modeled data structures), and 7) meta-language
>requirements (headers, tagging architecture, XML etc.).  No doubt you or
>others might be able to amend these or to add a few others from within which
>we might eventually reach consensus on assembly of a few key requirements
>that are general to all and from which interoperable implementation could
>proceed so as to be able to assess the usefulness of our work.  Perhaps some
>threads could rule out discussion of "content" and others "structure".  In
>any event, it might be useful to let natural selection act to allow the most
>productive threads to survive and "establish focus", while the others die
>out, without letting the whole wither because of the complexity and
>interconnection of the fundamental issues.
>
>No doubt some of these threads might morph into monsters not anticipated.
>Consequently, to keep it all coordinated, there must be some general
>agreement/understanding to focus on common requirements (GOALS) that we are
>trying to achieve.  However, at this early stage, these might be largely
>implicit so as not to lock ourselves into unnecessarily narrow perspectives.
>For this to work, perhaps one or two "ring masters" or "virtual ushers
>(bouncers?)" are needed to keep the various performers and audience on cue,
>to summarize progress from time to time, and to remove, add, or combine
>threads at key moments (ie oversee and exert some "administrative" control
>over the various threads).  This is important so that a specific set of
>useful general requirements is forthcoming in a timely fashion.  I nominate
>you and Bryan (actually you guys nominated yourselves in Boston or was it
>unanimous proclamation?).  Subsequent to such general and specific
>discussion, I believe we would then be in a better position to respond to
>specific requests for comments on documents, such as that you have put forth
>outlining draft standards.
>
>Stuart
>
>
>Kevin Thiele wrote:
>
> > Dear Colleagues,
> >
> > you will all be aware that the SDD list fell over several months ago. My
> > interpretation of this is that many of the taxonomists on the list were
> > left behind, perhaps early on, by the energetic discussions over issues of
> > data structuring (XML, schemas, RDF etc). Most of this was certainly way
> > over my head. Things got too top-heavy, and attempts to structure the
> > discussion using message tags didn't seem to provide much focus.
> >
> > Recent discussions (at a meeting on US-Australian cooperation in
> > bioinformatics in Washington, July 2000, attended by several SDD
> > contributors) have again highlighted the great need for an SDD standard and
> > shown that the lack of a new, inclusive standard is holding back progress
> > on descriptive databasing and software design.
> >
> > We need to restart the list with better focus. I'd like to suggest that the
> > way forward is to entirely set aside (for the time being) any discussion of
> > data structure and focus entirely on content (the requirements analysis)
> > for a while. We should agree on an outline of the data that we need to
> > capture, then pass this on to the computerheads to provide a best-practice
> > structure for storing and managing this captured data.
> >
> > The attached document was put up to the list shortly before it fell over.
> > It's attached again here, slightly edited. ANY TAXONOMISTS STILL OUT THERE
> > - please look at this. What data that you need to capture aren't handled
> > here? Will this work? Is this the way to proceed?
> >
> > I think that the document subsumes the data requirements of the DELTA and
> > Lucid programs, plus a bit more particularly in the areas of data
> > attribution and hierarchical nesting of treatments. The intention is that
> > the elements in this list should provide a way of storing any data needed
> > to describe the morphology or anatomy of any organism or taxon.
> >
> > Note that this should be read merely as a list of data elements - the
> > structure of the list does not imply a structure for the data file (XML or
> > otherwise) used to store the data.
> >
> > It may be the case that this document can be jointly modified to produce a
> > final document, or we may need to start from scratch with another. Any
> > ideas?
> >
> > Cheers - k
> >
> >   ------------------------------------------------------------------------
> >   Attachment: DDST Specifications.doc
> >   Type: Microsoft Word Document (application/msword)
> >   Encoding: base64



