August 2000 - tdwg-content

Re: characters/states and measurements and other hoary problems
by Stuart G. Poss 03 Aug '00

A couple of points.

With regard to footnote 1:

> 1 Attribution and sources for an item datum override that for a character or taxon, which override that for the treatment as a whole. Attribution for characters and taxa are equivalent and additive.

Don't we also need to say that item data attribution at the specimen level may override that for a character or taxon as a whole? Otherwise, the system has no way of dealing with misidentifications, particularly if some but not all parts have been associated with the wrong ID, as might frequently be encountered in fossils, or when dealing with taxa whose character state definitions are later found applicable for only a specific size range (i.e. they differentially break down at small sizes). Specimen- or parts-level data/attributions would also presumably be additive, even if potentially contradictory (subject to differences of opinion), would they not?

Statement:

> One file will comprise one treatment, the basic unit of which is one or more characters describing one or more taxa or individuals.

If we are to presume nested levels of groupings of descriptor elements, then we need to be able to clearly distinguish data values that are regarded as referring to "individual units" (specimen, species, higher-level taxon) at one level, but are "collective" when evaluated at a different level. Not only will the "collation rules" differ between levels, but they may also differ depending on whether a given feature (possibly the "same feature but defined differently") is regarded as a data item referring to a specific "individual unit" or to a "collective unit". That is, the data item is a representation of data that might apply to a collection, rather than a measurable value whose scope is no larger than a specific measure of a specific specimen. Perhaps some treatments might include "collections of collections" that would imply a mixing of both situations.

Seems to me we need a means of distinguishing between collective representations and "unit values" (for lack of a better word), if for no other reason than to be able to track which data items apply to particular "features of general interest" (e.g. leaves, roots, head, foot, etc.) and which taxa are involved (higher-level taxon, species, individuals, parts of individuals). It's not clear to me how the current DDST can be used to associate "basic units" that might reasonably be defined differently at different "levels of composition" by different investigators. Perhaps we need some general means to assign the "scope" over which the definitions of "unit" apply.

Kevin Thiele wrote:

> Dear List'eners
>
> attached find DDST Specifications.htm. See if this works.
>
> Cheers - k
>
> ------------------------------------------------------------------------
> Name: DDST Specifications.htm
> Type: Hypertext Markup Language (text/html)
> Encoding: quoted-printable
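
As a rough illustration of the override order discussed in this message, here is a minimal Python sketch. It assumes the precedence stated in footnote 1 of the draft (item datum over character/taxon, character/taxon over treatment, with character and taxon attributions additive) and adds the specimen-level override suggested above as a hypothetical extra layer. All class, field and function names are invented for the example; none of them come from the DDST draft.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Treatment:
    attribution: str  # default contributor ID for the whole treatment


@dataclass
class Scored:
    """A character or a taxon; either may carry its own attribution."""
    name: str
    attribution: Optional[str] = None


@dataclass
class ItemDatum:
    taxon: Scored
    character: Scored
    state: str
    attribution: Optional[str] = None  # item-level attribution
    # Hypothetical specimen-level attributions (not in the draft); additive.
    specimen_attributions: Tuple[str, ...] = ()


def resolve_attribution(item: ItemDatum, treatment: Treatment) -> Set[str]:
    """Return the contributor IDs that apply to one item datum."""
    # Assumed extension: specimen-level attributions win over everything.
    if item.specimen_attributions:
        return set(item.specimen_attributions)
    # Footnote 1: an item datum overrides its character and taxon.
    if item.attribution is not None:
        return {item.attribution}
    # Footnote 1: character and taxon attributions are equivalent and additive.
    inherited = {
        a for a in (item.character.attribution, item.taxon.attribution)
        if a is not None
    }
    if inherited:
        return inherited
    # Otherwise fall back to the treatment-wide default.
    return {treatment.attribution}
```

Whether contradictory specimen-level attributions should simply accumulate, as they do here, is exactly the open question raised above; the sketch only makes the precedence explicit.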

Re: characters/states and measurements and other hoary problems
by Una Smith 03 Aug '00

Kevin Thiele <kevin.thiele(a)PI.CSIRO.AU> wrote:

> attached find DDST Specifications.htm.

Here is the plain text equivalent, courtesy of lynx.

Una Smith

-------------------------------- snip ------------------------------------

Draft Specifications for a Descriptive Data Standard for Taxonomy

Version History:
    Version 1.0 February 24, 2000, K.Thiele
    Version 1.1 revised July 18, 2000, K.Thiele

General Requirements

The DDST will be a data file structure that allows the capture and management of all types of data required for describing the morphology and anatomy of an organism or taxon. All data and metadata needed will be stored in one file, structured into several blocks (character lists, taxon lists, items data etc.). One file will comprise one treatment, the basic unit of which is one or more characters describing one or more taxa or individuals.

The DDST will support the following:

External lexica: these are externally-referenced lists of characters and states, or taxa, shared between several treatments. Lexica may be used without modification, or with one or more characters, states or taxa added internally (e.g. global vs local characters).

Collation of data: data in the DDST may be captured and managed at several levels. One treatment (see above for definition of treatment) may store descriptive data for individual specimens, another may store data for species-level taxa, while another may store data for higher-level taxa. These individual treatments may be linked into a nested hierarchy, with specified collation rules allowing collation of data up the hierarchy, and passing of data down the hierarchy. Thus, some characters in the species-level treatment may be scored directly in that treatment, while others will collate data (e.g. leaf measurements) from items in the specimen-level treatment. Conversely, some characters may be scored in a genus-level treatment, and these become implicitly true for all taxa in a linked species-level treatment.

Rich Attribution: all data elements in the DDST may be fully attributed to a source (e.g. contributor, published reference, specimen etc). Attribution will be optional at any level. Attribution will allow data-tracking and house-keeping, especially in circumstances when several contributors work on one treatment.

The list of data elements below is structured using tabbed levels. Items tabbed across one level and enclosed in square parentheses are replicable within the higher level. Items in bold are required within their level (although the higher-level structure to which they belong may not be required). Comments are in curly parentheses.

Note that this draft specification does not imply any particular structure for the data file used. It should be read as a list of required data elements for the final specification.
______________________________________________________________________

Treatment Name {Free-text title for the treatment}
Description {Free-text description of the treatment}
Treatment build/revision number {A real numeric e.g. 4.1 used for version control}
Treatment build/revision date {Date string (standardised format?)}
Contributors List {List of contributors to the treatment, including the principal builder}
    [
    ID {Unique (in the context of this treatment) number for the contributor}
    Name {contributor's name}
    Contact details {contributor's address, email etc}
    Private notes {internal notes on contributor} (note 7)
    ]
Attribution {ID of principal treatment builder - this is the default attribution unless a lower-level item is specifically attributed} (note 1)
List of sources
    [
    ID {Number for the source}
    Description {e.g. reference, description of specimen set etc}
    ]
Principal Source {ID of the principal (default) source for the data} (note 1)
Treatment attachments {General information topics applicable to the treatment as a whole}
    [
    Attachment name
    Attachment type {e.g. xml,html,txt,rtf,jpeg,gif}
    Attachment path/URL
    Public attachment notes (note 7)
    Private attachment notes (note 7)
    ]
Private treatment notes {internal freeform notes for treatment} (note 7)
Character list source {path to an external lexicon that defines the character list for the treatment}
Character set names list {list of set names for characters}
    [
    Name {name string for a character set}
    ]
Character List {required unless an external lexicon resource has been specified above}
    [
    Character Name (note 3)
    Character ID
    Set membership {list of sets to which the character belongs; a character must be able to belong to more than one set}
    Attribution (note 1) {reference to a contributor's ID from the Contributors list}
    Source (note 1) {reference to a source's ID from the Sources list}
    Collated Character source {path name for another treatment that contains lower-level data for this character}
    Collation rule name {Name of a collation rule as defined in the Collation Rules list} (note 2)
    Character type {ordered multistate, unordered multistate etc}
    Character dependencies (up) (note 4)
    Applies To list (or global/restricted type definition, then leave it to program to extract) (note 5)
    Character attachments
        [
        attachment name
        attachment type
        attachment path/URL
        Public notes (note 7)
        Private notes (note 7)
        ]
    Private notes {internal notes for character} (note 7)
    Character State List
        [
        Character state name | Character state ID
        Character dependencies (down)
        Character state attachments
            [
            attachment name
            attachment type
            attachment path/URL
            Public notes (note 7)
            Private notes (note 7)
            ]
        Private notes (note 7)
        ]
    ]
Taxon list source {path to an external resource that defines the taxon list for the treatment}
Taxon set names {defines a list of allowable names for taxon sets}
    [
    name
    ]
Taxon List
    [
    Name | Taxon ID
    Taxon set membership {list of sets to which the taxon belongs; a taxon must be able to belong to more than one set?}
    Taxon attribution (note 1)
    Taxon attachments
        [
        attachment name
        attachment type
        attachment path/URL
        Public notes (note 7)
        Private notes (note 7)
        ]
    Private notes (note 7)
    ]
Item Data {This will hold the "score matrix"}
    Taxon Name|ID/Character Name|ID (note 6)
    Character Name|ID/Taxon Name|ID (note 6)
    State Name|ID
    Score {normally present, rare, present by misinterpretation etc}
    Score Attribution (note 1)
    Public Notes (note 7)
    Private Notes (note 7)
______________________________________________________________________

1 Attribution and sources for an item datum override that for a character or taxon, which override that for the treatment as a whole. Attribution for characters and taxa are equivalent and additive.

2 Treatments are nestable. That is, one treatment may contain data on specimens, a higher-level treatment on taxa. The higher-level treatment gathers information for some characters from lower-level treatments, using a specified collation rule. Collation rules will be specified externally to the treatment, and will cover e.g. how to merge scores, calculate values, deal with conflicts in source data etc.

3 Character names may be hierarchically nested. Character properties (e.g. sets, dependencies, attachments) are only specified for the lowest level characters, e.g.

    Leaves
        margins
            teeth
                orientation   } only these have
                shape         } properties

4 Dependencies may be defined either up or down (but not both?). An up dependency lists the character states that make this character inapplicable; a down dependency lists characters that become inapplicable when this state is chosen.

5 The idea here is to specify a subset of taxa for which this character is scored, or to specify that the character is non-global, then leave it to the parsing program to determine the taxon list. This feature would be used by future identification programs that employ the Progressive Revelation model.

6 The item data may be stored as the equivalent of either a taxon-state matrix or a state-taxon matrix, depending upon whether taxa are nested within characters or characters are nested within taxa. There will need to be a way of specifying which of these is operative.

7 Public Notes are available for parsing; Private Notes are not, and are designed for private housekeeping within the treatment.
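
To make the collation idea in the draft (the "Collation of data" paragraph and footnote 2) a little more concrete, the following Python sketch collates specimen-level scores for one character up to taxon level using a named rule. It is only an illustration under stated assumptions: the rule names ("range", "mean", "union"), the tuple layout of the specimen items and the `collate` function are all hypothetical, and the draft itself says collation rules will be specified externally to the treatment without defining any.

```python
from statistics import mean

# Hypothetical collation rules keyed by name; the draft defines none of these.
COLLATION_RULES = {
    "range": lambda values: (min(values), max(values)),
    "mean": lambda values: mean(values),
    "union": lambda values: sorted(set(values)),
}


def collate(specimen_items, taxon_of, character, rule_name):
    """Collate specimen-level scores for one character up to taxon level.

    specimen_items: iterable of (specimen_id, character_name, value) tuples
    taxon_of:       mapping from specimen_id to the taxon it belongs to
    """
    rule = COLLATION_RULES[rule_name]
    grouped = {}
    for specimen, char, value in specimen_items:
        if char == character:
            grouped.setdefault(taxon_of[specimen], []).append(value)
    # Apply the named rule to the pooled values of each taxon.
    return {taxon: rule(values) for taxon, values in grouped.items()}


# Example: leaf lengths scored on specimens, collated to species level.
items = [
    ("s1", "leaf length", 40.0),
    ("s2", "leaf length", 55.0),
    ("s3", "leaf length", 30.0),
    ("s1", "flower colour", "red"),
]
taxon_of = {"s1": "sp. A", "s2": "sp. A", "s3": "sp. B"}
leaf_ranges = collate(items, taxon_of, "leaf length", "range")
```

Here `leaf_ranges` maps each species to the (minimum, maximum) of its specimens' leaf lengths. Handling conflicts in source data, which footnote 2 also assigns to collation rules, is not attempted in this sketch.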

Re: characters/states and measurements and other hoary problems
by Susan B. Farmer 03 Aug '00

>Dear List'eners
>
>attached find DDST Specifications.htm. See if this works.
>

Uh, not really. Same scenario as for Word Docs for me.

Susan

>
><!doctype html public "-//w3c//dtd html 4.0 transitional//en">
><html>
><head>
> <meta http-equiv=3D"Content-Type" content=3D"text/html;=
> charset=3Diso-8859-1">
>

Re: Where are we?
by Susan B. Farmer 02 Aug '00

>On Mon, 31 Jul 2000, Peter Rauch wrote:
>
>>(Can it be posted in plain-ascii? Maybe not everyone could view
>>it?)
>
>I second that request. I can view Word docs, but it is a hassle.
>
> Una Smith
>

Make that a third ...

Susan Farmer

Re: Where are we?
by Una Smith 02 Aug '00

On Mon, 31 Jul 2000, Peter Rauch wrote:

>(Can it be posted in plain-ascii? Maybe not everyone could view
>it?)

I second that request. I can view Word docs, but it is a hassle.

Una Smith

Where are we?
by Kevin Thiele 01 Aug '00

Dear List'eners

I don't want to push things, but it seems to me that the discussion's not gaining much more focus than we had last time. I think mostly we're reiterating and discussing the scope of the problem, and while this is important and valid, we also need to move beyond this to specific attempts at dealing with the problem. We're discussing the need for a standard rather than actually creating one.

I put up a while ago an initial attempt at a proposal (the document DDST Specifications.doc, you all will have this buried in your attachments folder somewhere). Some people have looked at this, I know, but I think we need to decide:

1. Is this a start, or is it the wrong way to proceed and should be ditched in favour of something better?

2. If it's an adequate start, what concrete changes are needed to incorporate the ideas people are discussing?

3. If it should be ditched, can anyone produce a better draft specification that we can build on?

I think at this stage we need a draft document around which the discussion can focus. Contributions that point out complications and problems can then make concrete suggestions as to changes to the draft proposal, rather than be mere discussion points. It seems to me we should be at this stage around now - we've discussed the problems for close on 12 months now.

Now it may be that the structure of this group is poorly suited for the task, and a recommendation should go to TDWG that this isn't the way to proceed. As it stands, it seems to me that we're functioning well as a discussion group but not well as a working group (actually working on the problem). This is undoubtedly partly because we're all busy - I know that I'm doing all this in my "spare" time and I'm sure most of us are in the same boat.

Perhaps the only way that this can happen properly is for funds to be sought to employ someone on a contract basis to turn the ideas raised here into concrete output, using the list as an expert group/listening post/scratch-pad resource or whatever.

We need to be honest as to whether we're going to achieve something.

Any ideas?

Cheers - k
