- tdwg-content - lists.tdwg.org

Progressive Revelation
by Kevin Thiele 28 Feb '00

At 15:30 24/02/00 +1100, Eric Zurcher wrote:
>6) I'm intrigued by the notion of a "Progressive Revelation model"
>(footnote 5). It sounds terribly theological - or perhaps that's
>Thiele-logical? (my apologies to Kevin, but I really can't resist bad puns).

I'm often accused of teleology, but rarely of theology.

Progressive Revelation is perhaps a new way of handling holes in data matrices for random-access keys. The background is this.

The simplest data structure for a random-access key is a fully populated matrix, i.e. all taxa are scored for all characters/states. This works well sometimes, especially if the taxa are highly comparable, e.g. the species of a genus or the genera of a family.

This structure is sometimes problematic, though, for two reasons. First and most simply, you may not have data for all taxa and need to leave holes in the matrix. The solution is simple: fill the holes with ?s and allow for this in the key program. But it also often happens that some characters are simply inapplicable to some taxa, or (worse) are unambiguous for some taxa but ambiguous for others. For instance, stipules don't occur in monocots, though stipule-like structures sometimes do; if you try scoring stipule characters as defined for dicots against monocots, you run into all sorts of strife because of ambiguity of context. LucID can handle this to some extent using the "present by misinterpretation" score, but the problem lies in the character definition, not the score.

Sometimes a better way to handle such circumstances in LucID is to break the key up into a hierarchically nested set of keys and subkeys. For instance, you want to create a key to the grass species of Australia, but many special characters needed for identifying Poa species are either inapplicable to or ambiguous with respect to the remaining grasses. So put Poa as a genus in the top-level key and attach to it a subkey to Poa species, in which you can optimise your character definitions for the Poas.
There are some disadvantages to this, but often the advantages (an optimised rather than a generalised and suboptimal character list) outweigh the disadvantages.

But there may be another solution - Progressive Revelation. As far as I know no-one's done this yet, but I think it has merit. It would work like this. Create a key to all grass species, so you're working with a list of all taxa at species level, including all the Poa species. The character list has two classes of characters: ones that are scored over all taxa (the easily generalised characters) and ones that are scored for only a subset of taxa (the characters that are highly specific and/or not easily generalisable). When the key program starts it splashes up only the generalised characters. But if, after answering some characters, you end up with only Poas, the program finds the Poa-specific characters and adds them to the character list. Characters are progressively revealed as you proceed through the key, with as much depth as necessary - e.g. you may come down to a species complex of alpine Poas and presto! some characters appear that are just the ticket to separate them. Might work.

This seems to me a more natural way of approaching both the building and the running of a key. In some ways it's a hybrid between a traditional random-access key and a traditional nested hard-copy key, but it has more flexibility than either. In the context of natural-language descriptions (and more controversially) it would also challenge what I regard as the Universalist furphy that all descriptions should be strictly comparable!

I've included these ideas in the SDD Specification because I think this may be one for the future.

Cheers - k

ps Eric - if you incorporate these ideas into DELTA we'll need to make an arrangement regarding due acknowledgement and royalties :-)
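A minimal sketch of how a key program might combine the two ideas in this message: "?" holes that never eliminate a taxon, and Progressive Revelation of characters that are scored for only a subset of taxa. The data layout, the `scope` field, and every character and taxon name below are hypothetical placeholders, not part of LucID, DELTA, or the SDD draft.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Union

UNKNOWN = "?"  # a hole in the matrix: never eliminates a taxon

# A score is the set of states recorded for a taxon, or UNKNOWN.
Score = Union[Set[str], str]


@dataclass(frozen=True)
class Character:
    name: str
    # None: a generalised character scored over all taxa.
    # Otherwise: the subset of taxa the character is defined for (hypothetical field).
    scope: Optional[FrozenSet[str]] = None


def remaining_taxa(matrix: Dict[str, Dict[str, Score]],
                   answers: Dict[str, str]) -> List[str]:
    """Taxa whose scores do not contradict any answer given so far.

    A character that is UNKNOWN or absent for a taxon (e.g. out of scope)
    is treated like a '?', so it never eliminates that taxon.
    """
    kept = []
    for taxon, scores in matrix.items():
        if all(scores.get(char, UNKNOWN) == UNKNOWN or state in scores[char]
               for char, state in answers.items()):
            kept.append(taxon)
    return kept


def revealed(characters: List[Character], remaining: List[str]) -> List[str]:
    """Characters to display: generalised ones always, scoped ones only once
    every taxon still in contention lies inside their scope."""
    left = set(remaining)
    return [c.name for c in characters
            if c.scope is None or (left and left <= c.scope)]


# Hypothetical example: three Poa species and one other grass.
POA = frozenset({"Poa sp. 1", "Poa sp. 2", "Poa sp. 3"})
ALPINE_POA = frozenset({"Poa sp. 2", "Poa sp. 3"})
CHARACTERS = [
    Character("culm habit"),
    Character("lemma hairs", POA),
    Character("ligule shape", ALPINE_POA),
]
MATRIX: Dict[str, Dict[str, Score]] = {
    "Poa sp. 1": {"culm habit": {"tufted"}, "lemma hairs": {"absent"}},
    "Poa sp. 2": {"culm habit": {"tufted"}, "lemma hairs": {"present"},
                  "ligule shape": {"truncate"}},
    "Poa sp. 3": {"culm habit": {"tufted"}, "lemma hairs": UNKNOWN,
                  "ligule shape": {"acute"}},
    "Other grass 1": {"culm habit": {"rhizomatous"}},
}
```

Stepping through the example: with no answers all four taxa remain and only `culm habit` is shown; answering `culm habit = tufted` leaves only the three Poas, which reveals `lemma hairs`; answering `lemma hairs = present` eliminates Poa sp. 1 but keeps Poa sp. 3 (scored "?"), and `ligule shape` appears.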

Re: SDD Specifications Document
by Kevin Thiele 28 Feb '00

At 15:30 24/02/00 +1100, Eric Zurcher wrote:
>1) One "pattern" that recurs in the document is the use of "attachments" to
>entities. These consist of a name, type, path, public notes, and private
>notes. Presumably one could define some sort of generic "attachment" object
>and avoid the multiple (4) redefinitions currently given in the outline.
>This would shorten the outline appreciably.

This sounds sensible. Should someone edit the document to reflect this, or should we just leave efficiency dividends like this to Leigh or whoever creates the final thing? Two comments, though. First, most of these attachment objects will be unique - presumably it won't often happen that one attachment is used for two taxa, or for a taxon and a character state. So do you gain much? Second, ease of reading the text treatment should be a consideration. If all the information about an attachment stays with its taxon/state entry, the treatment may be more navigable.

>2) ID numbers crop up in a number of places. But it's not obvious how
>"unique" these various IDs need to be. For example, must character IDs be
>unique only within the context of the set of all character IDs, or of all IDs
>used anywhere within the treatment? Or perhaps they are intended to be used
>across treatments (to facilitate merging, etc.), and must be unique (but
>consistent) in an even broader sense? (But perhaps this is an area best
>left deliberately ambiguous for now.)

I'd say that IDs need to be unique within their context, e.g. at the character level. Super-uniqueness beyond the treatment is the lexicon issue and doesn't need to be addressed here, I think. We should construct the standard so that lexica can grow, but without requiring them. For instance, there should be no requirement that character IDs form a contiguous integer set.
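A minimal Python sketch of the two suggestions just discussed, under my own assumptions: one generic `Attachment` type (with the five fields listed in the draft outline) reused by taxa and characters, and IDs checked for uniqueness only within their own context, with no requirement that they be contiguous. The class and method names are hypothetical, not part of the SDD draft.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Attachment:
    # The five fields the draft outline repeats for each entity.
    name: str
    type: str
    path: str
    public_notes: str = ""
    private_notes: str = ""


@dataclass
class Taxon:
    id: int
    name: str
    attachments: List[Attachment] = field(default_factory=list)


@dataclass
class Character:
    id: int
    name: str
    attachments: List[Attachment] = field(default_factory=list)


class IdRegistry:
    """Checks that IDs are unique within a context such as 'taxon' or
    'character'. The same number may be reused in another context, and
    gaps in the numbering are allowed."""

    def __init__(self) -> None:
        self._used: Dict[str, Set[int]] = {}

    def register(self, context: str, ident: int) -> None:
        used = self._used.setdefault(context, set())
        if ident in used:
            raise ValueError(f"duplicate {context} ID {ident}")
        used.add(ident)
```

Keeping `attachments` as a list on each entity, rather than in a separate shared table, also keeps each attachment next to its taxon or state, which matches the navigability point above; a shared table would only pay off if attachments were often reused.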
>3) Character (and taxon) sets - these should probably be defined
>hierarchically, so that sets would be able to contain other sets, as well as
>the base elements (characters or taxa). Note that there seems to be another
>"up vs. down" problem here - is it better to define a set by listing the
>members within it, or for each of the members to list the sets to which
>it belongs?

What do people think? Does allowing nested sets blur the boundary with the nested character structure, and perhaps become redundant? For instance, I envisage the characters being nested (as discussed earlier on the list):

  Plant
    leaves
      orientation
      venation
        prominence
        reticulation
    flowers
      petals
        colour
        number
  .........etc

Sets are perhaps a shortcut across all this. If they were hierarchically structured, would you just be repeating much of the above without the lowest level?

>4) I'm rather confused by footnote 3, regarding the nesting of character
>names, and the restriction of "properties" to only the lowest level. What
>is the reason for this restriction? But certainly there seems to be merit
>in separating the "properties" of a character from its textual
>representation. This is almost essential when attempting to generate
>natural-language descriptions in multiple languages. Similarly, different
>wordings may be appropriate in different application contexts
>(natural-language vs. interactive keys vs. conventional keys, or keys for
>the layman vs. the specialist).

There are many things in the document that I haven't thought through properly, but I decided to get it out for comment before labouring further. I think what I meant was that the higher-level structures in the character names list have no representation in the data "matrix"; only the lowest level does.
Consider some data represented the LucID way, as a taxon-state matrix (you can do the same thing for a DELTA representation, in which cells of the matrix hold taxon-character scores):

            123456789
    Taxon1  010101000
    Taxon2  111010101
    Taxon3  000101001

Columns 1-9 represent character states. Columns 1-3 may be the states of character 1, 4-5 the states of character 2, and 6-9 the states of character 3. Now characters 1 and 2 may both belong to a higher-level structure, but the properties of this higher-level thing are not equivalent to the properties of a state (for instance, it can't have a score). But perhaps there are properties in common?

>5) This draft allows for a "score" only within the context of a "state
>name". It is not obvious how characters with non-discrete values (e.g.
>numeric values) would be handled.

Just as for Leigh's original XDELTA.

>7) For purposes of natural-language generation (and perhaps other
>applications), it is desirable to have some sort of "connection operator"
>between states within a character (e.g., "flowers blue or violet" vs.
>"flowers blue and violet" vs. "flowers blue to violet" all carry slightly
>different meanings). This and other requirements of generating
>natural-language descriptions might be an argument for generally preferring
>a "characters within taxa" representation to "taxa within characters".

Again, I think Leigh's XDELTA covers this, using nested scores.
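A small Python sketch of the taxon-state matrix in this message, assuming the column grouping it gives (1-3, 4-5, 6-9). The character tree and node names are hypothetical; the point is that only leaf characters own columns, so a higher-level node grouping characters 1 and 2 never receives a score.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# The example matrix: one column per character state.
ROWS = {"Taxon1": "010101000", "Taxon2": "111010101", "Taxon3": "000101001"}


@dataclass
class Node:
    name: str
    # Only leaf characters own matrix columns (1-based, inclusive).
    columns: Optional[Tuple[int, int]] = None
    children: List["Node"] = field(default_factory=list)


# Hypothetical tree: characters 1 and 2 share a higher-level node that has
# no columns of its own; character 3 stands alone.
TREE = Node("root", children=[
    Node("group of 1 & 2", children=[
        Node("character 1", (1, 3)),
        Node("character 2", (4, 5)),
    ]),
    Node("character 3", (6, 9)),
])


def leaf_characters(node: Node) -> List[Node]:
    """Depth-first list of the nodes that own columns."""
    if node.columns is not None:
        return [node]
    return [leaf for child in node.children for leaf in leaf_characters(child)]


def scores_by_character(row: str, tree: Node) -> Dict[str, List[int]]:
    """Regroup a taxon-state row into character -> present states (1-based)."""
    result = {}
    for ch in leaf_characters(tree):
        first, last = ch.columns
        cells = row[first - 1:last]
        result[ch.name] = [i for i, c in enumerate(cells, start=1) if c == "1"]
    return result
```

For example, Taxon1 regroups to state 2 of character 1, state 1 of character 2 and state 1 of character 3, while Taxon3 has no state scored for character 1; the grouping node never appears in the result.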

Re: SDD Specifications Document
by Kevin Thiele 28 Feb '00

Bob Morris - surely something like this, working out data requirements, needs to be done whether the final product has a DTD, RDF or XML-Schema base. I'm not qualified to argue the implications or advantages, so how about those of us like me argue about the data, and you mob argue about the rest? Maybe we can reach resolutions at the same time!

Cheers - k

Re: SDD Specifications Document
by Neil Caithness 25 Feb '00

> From: "Mauro J. Cavalcanti" <maurobio(a)ACD.UFRJ.BR>
>
> I was also curious about this notion. Maybe it is one of the latest
> "revelations" of Religious Cladism - its practitioners are fond of such
> theological terminology.

Mauro, are you serious, or was this just a throw-away comment? This is the sort of pseudo-criticism that comes from people who understand nothing. "Religious Cladism" means what, and "its practitioners" refers to whom?

Neil Caithness

Re: SDD Specifications Document
by Mauro J. Cavalcanti 25 Feb '00

Eric Zurcher wrote:
> 6) I'm intrigued by the notion of a "Progressive Revelation model"
> (footnote 5). It sounds terribly theological - or perhaps that's
> Thiele-logical? (my apologies to Kevin, but I really can't resist bad puns).

I was also curious about this notion. Maybe it is one of the latest "revelations" of Religious Cladism - its practitioners are fond of such theological terminology.

Otherwise, I essentially agree with all the points raised by Eric Z. I was about to suggest that a graphic model, perhaps using the UML methodology, was in order, but Jean-Marc Vanel has already taken care of that. His model is interesting and deserves attention.

Cheers,
--
+ - - - - - - - - - - - - Mauro J. Cavalcanti - - - - - - - - - - - - +
| Setor de Paleovertebrados, Departamento de Geologia e Paleontologia |
| Museu Nacional do Rio de Janeiro |
| Quinta da Boa Vista, 20940-040, Rio de Janeiro, RJ, BRASIL |
| E-mail: maurobio(a)acd.ufrj.br |
| Home Page: http://read.at/digitax/personal.html |
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
"Life is complex. It consists of real and imaginary parts."

Re: SDD Specifications Document
by Eric Zurcher 24 Feb '00

At 16:02 23/02/2000 +1100, Kevin Thiele wrote:
>Dear list_eners
>
>attached find SDDspecs.rtf. This is my attempt to formalise the discussion
>so far into a set of specifications for a new descriptive data standard.

and requested comments on his proposal. I've not had time to think this through in detail, but I'd nonetheless like to offer a few scattered observations.

1) One "pattern" that recurs in the document is the use of "attachments" to entities. These consist of a name, type, path, public notes, and private notes. Presumably one could define some sort of generic "attachment" object and avoid the multiple (4) redefinitions currently given in the outline. This would shorten the outline appreciably.

2) ID numbers crop up in a number of places. But it's not obvious how "unique" these various IDs need to be. For example, must character IDs be unique only within the context of the set of all character IDs, or of all IDs used anywhere within the treatment? Or perhaps they are intended to be used across treatments (to facilitate merging, etc.), and must be unique (but consistent) in an even broader sense? (But perhaps this is an area best left deliberately ambiguous for now.)

3) Character (and taxon) sets - these should probably be defined hierarchically, so that sets would be able to contain other sets, as well as the base elements (characters or taxa). Note that there seems to be another "up vs. down" problem here - is it better to define a set by listing the members within it, or for each of the members to list the sets to which it belongs?

4) I'm rather confused by footnote 3, regarding the nesting of character names, and the restriction of "properties" to only the lowest level. What is the reason for this restriction? But certainly there seems to be merit in separating the "properties" of a character from its textual representation. This is almost essential when attempting to generate natural-language descriptions in multiple languages. Similarly, different wordings may be appropriate in different application contexts (natural-language vs. interactive keys vs. conventional keys, or keys for the layman vs. the specialist).

5) This draft allows for a "score" only within the context of a "state name". It is not obvious how characters with non-discrete values (e.g. numeric values) would be handled.

6) I'm intrigued by the notion of a "Progressive Revelation model" (footnote 5). It sounds terribly theological - or perhaps that's Thiele-logical? (my apologies to Kevin, but I really can't resist bad puns).

7) For purposes of natural-language generation (and perhaps other applications), it is desirable to have some sort of "connection operator" between states within a character (e.g., "flowers blue or violet" vs. "flowers blue and violet" vs. "flowers blue to violet" all carry slightly different meanings). This and other requirements of generating natural-language descriptions might be an argument for generally preferring a "characters within taxa" representation to "taxa within characters".

Cheers,
Eric Zurcher
CSIRO Division of Entomology
Canberra, Australia
E-mail: ericz(a)ento.csiro.au
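A minimal Python sketch of a "connection operator" stored with a character score, as point 7 describes. The `Connector` names and the one-line glosses in the comments are my own assumptions about what each wording conveys, not definitions from the SDD draft or from DELTA.

```python
from enum import Enum
from typing import List


class Connector(Enum):
    OR = "or"    # assumed gloss: alternative states
    AND = "and"  # assumed gloss: states occurring together
    TO = "to"    # assumed gloss: a range between two end states


def describe(feature: str, states: List[str], connector: Connector) -> str:
    """Render one character score as a natural-language fragment."""
    if not states:
        raise ValueError("at least one state is required")
    if len(states) == 1:
        return f"{feature} {states[0]}"
    if connector is Connector.TO:
        # A range is written using its end points only.
        return f"{feature} {states[0]} to {states[-1]}"
    head = ", ".join(states[:-1])
    return f"{feature} {head} {connector.value} {states[-1]}"
```

Because the connector is stored alongside the score rather than baked into the wording, the same data could in principle be rendered in another language or for another audience by swapping the templates, which ties in with point 4.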

1 0

Re: <XML> Abstract Data Model for Taxonomy
by Jean-Marc Vanel 24 Feb '00

Leigh Dodds wrote:
> 1. Firstly is it possible to express items like
> "Feature often present by mis-identification"

Sorry, my English is too weak. Does this mean a character value that is often reported through confusion with another taxon?

> I note you've already highlighted seasonal variations as a problem.

My solution: the default for a Property is to record the state for:
- the middle of the active season
- an adult specimen
Otherwise, create a new Property. Note that besides the specified sub-elements of tax:Property (appliesTo, appliesToClasses, name), it is possible to add extra information like, say, <bio:season>spring</bio:season>. This is a property (rdf:Property) about a property (tax:Property).

> To that I'd add geographical,

My (simple) solution: if anything is different in some region, create a lower-level tax:TaxonomicClass containing just the differing Features; according to the model, the other Features will be inherited.

> 2. It seems to presuppose a Linnean viewpoint i.e. Kingdom,
> Phylum, etc.

I didn't write that; in fact this model is so generic that it could almost be applied to car models!

> Gregor has previously pointed out that the
> reality is much more complex. Multiple hierarchies
> can be produced. For example, how would the model be used to express
> DNA/Protein data? Physiological versus Molecular hierarchies?

And membership of phytosociological associations, too. Although I didn't think of that, this is no problem. You just have to publish on the Web a Schema with, say, a ProteinClass and a ProteinSubClass, declaring both as rdfs:subClassOf tax:TaxonomicClass, and connecting them using tax:lowerClasses and tax:upperClass, etc. And hope that many will connect to your Schema... Well, this is the freedom of the Web, but I hope that a standard will be found. But I wanted to show that extensibility is perfectly possible, with the new semantics declared as such, so that tools can understand them.

> 3. I assume that the list of Feature/Properties will not be
> fixed, but can be extended at will?

Yes

> However I'm concerned that the presence of items like 'TaxonomicClass' in
> the model doesn't
> capture the range of flexibility that Kevin, Gregor and the others
> have stated as a requirement.

Is this requirement like the Molecular hierarchies, or something else?

Cheers
Jean-Marc
--
<person> <first_name>Jean-Marc</first_name> <name>Vanel</name> <project>Worldwide Botanical Knowledge Base - making botany available on Internet <a href="http://wwbota.free.fr/" >site</a> </project> <homePage>http://jmvanel.free.fr/</homePage> <a href="mailto:jmvanel@free.fr">mail (eventually put "wwbota" in subject to route your mail in relevant folder)</a> </person>
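A minimal Python sketch of the inheritance Jean-Marc describes for regional variants: a lower-level class records only the Features that differ, and lookups fall back to the upper class. This is plain Python standing in for the RDF model; `TaxonomicClass`, `upper` and `features` echo names in the message, but the lookup method and the example data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaxonomicClass:
    name: str
    upper: Optional["TaxonomicClass"] = None
    # Only the Features recorded at this level; others are inherited.
    features: Dict[str, str] = field(default_factory=dict)

    def feature(self, prop: str) -> Optional[str]:
        """Value of a Property, searching this class and then its upper classes."""
        node: Optional[TaxonomicClass] = self
        while node is not None:
            if prop in node.features:
                return node.features[prop]
            node = node.upper
        return None


# Hypothetical example: a regional form that differs only in habit.
species = TaxonomicClass("Species X", features={"flower colour": "blue", "habit": "shrub"})
coastal = TaxonomicClass("Species X, coastal form", upper=species,
                         features={"habit": "prostrate"})
```

Here `coastal.feature("habit")` gives the regional value "prostrate", while `coastal.feature("flower colour")` falls through to the species and returns "blue".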

Re: <XML> Abstract Data Model for Taxonomy
by Leigh Dodds 24 Feb '00

Hi,

> The spring is coming, the crocuses are in bloom, it's time to come up
> with a formal model.

Some questions:

1. Firstly, is it possible to express items like "Feature often present by mis-identification"? (I came across this somewhere, but can't recall the posting/webpage.) I note you've already highlighted seasonal variations as a problem. To that I'd add geographical and developmental (i.e. age of item) variations.

2. It seems to presuppose a Linnean viewpoint, i.e. Kingdom, Phylum, etc. Gregor has previously pointed out that the reality is much more complex. Multiple hierarchies can be produced. For example, how would the model be used to express DNA/Protein data? Physiological versus Molecular hierarchies? [See (RQT) Character and item hierarchy from Gregor posted on 1/12/99 - I'd include a URL, but the ListServ archive seems to require a username/password]

3. I assume that the list of Feature/Properties will not be fixed, but can be extended at will?

Apologies if this seems overly critical; I'm just trying to get my head around things. Fundamentally I think that an RDF (or even Groves) model of the data will be extremely useful. However, I'm concerned that the presence of items like 'TaxonomicClass' in the model doesn't capture the range of flexibility that Kevin, Gregor and the others have stated as a requirement.

I'd welcome some further discussion/information on your model.

Cheers,
L.

<XML> Abstract Data Model for Taxonomy
by Jean-Marc Vanel 24 Feb '00

Hello dear group

The spring is coming, the crocuses are in bloom, it's time to come up with a formal model. Look at http://wwbota.free.fr/UML_diagrams.htm

--
<person> <first_name>Jean-Marc</first_name> <name>Vanel</name> <project>Worldwide Botanical Knowledge Base - making botany available on Internet <a href="http://wwbota.free.fr/" >site</a> </project> <a href="mailto:jmvanel@free.fr">mail (eventually put "wwbota" in subject to route your mail in relevant folder)</a> </person>

Re: SDD Specifications Document
by Jean-Marc Vanel 24 Feb '00

Dear contributors

It's good to see a new debate beginning!

Undefined terms in the SDD Specifications Document:
* "Progressive Revelation model"
* Treatment
* score

It seems that this Specification mixes data with the specification of processing to be done on this data by an application. Processing is a very interesting subject (see transforms), but good design is layered design, and processing seems an upper and distinct layer.

Eric Zurcher wrote:
> 1) One "pattern" that recurs in the document is the use of "attachments" to
> entities. ....
> 2) ID numbers crop up in a number of places. But it's not obvious how
> "unique" these various IDs need to be.

I agree.

> some sort of "connection operator"
> between states within a character (e.g., "flowers blue or violet" vs.
> "flowers blue and violet" vs. "flowers blue to violet"

"flowers blue and violet" will have to be covered by my proposed "Abstract Data Model for Taxonomy".

<person> <first_name>Jean-Marc</first_name> <name>Vanel</name> <project>Worldwide Botanical Knowledge Base - making botany available on Internet <a href="http://wwbota.free.fr/" >site</a> </project> <a href="mailto:jmvanel@free.fr">mail (eventually put "wwbota" in subject to route your mail in relevant folder)</a> </person>
