- tdwg-content - lists.tdwg.org

Re: (XML) XML?
by Jean-Marc Vanel 05 Dec '99


Welcome to this list!

It seems that we are on the same wavelength; see my vision at http://jmvanel.free.fr/call.htm

I know little about the Palm Pilot. But we should avoid being tied to proprietary systems and APIs. That does not prevent us from working on the Palm Pilot, provided we have a compatibility API layer for the proprietary calls.

It seems that a lot can be done in a standard browser, using JavaScript, DOM and XSLT, with small applets to provide the rest of the functionality. Internet Explorer 5 will probably soon be XSLT-compliant, and Mozilla (www.mozilla.org) is advancing, with a modular, open-source architecture and support for XML and CSS.

Dave Vieglais wrote:

> Palm Pilots are eminently programmable.

With standard tools?

> Add a GPS... Add in a couple more data collection devices- and capture quantitative information at the same time- temperature, humidity, canopy gap fraction, even fish-eye photos (if you're into that sort of thing)...

Can you connect all those things to a Palm Pilot?

> Sounds like a fun project to me.

Do you have the time and/or sponsorship and/or skills to do that?

> The real key is datasets that are readily interoperable- that's what XML can provide.

Don't worry, datasets will come within a few months, using a simple linguistic approach to parse existing floristic texts (see http://jmvanel.free.fr/Samples/parsing.htm).

Cheers
Jean-Marc


Re: (XML) XML?
by Dave Vieglais 04 Dec '99


Palm Pilots are eminently programmable. But they only have a few (2-8) meg of RAM depending on the model. So, they'll never hold a very large data set. There are a few options: upload an appropriate portion of the dataset to the device, carry the data with you in a format that can be easily loaded into the Palm Pilot, or provide the device with a connection to another device that stores enough data. There are several wireless solutions available, with transmission ranges from a hundred or so feet to a few miles, to global (cellular or satellite).

A custom solution (custom software, off-the-shelf components) could probably be worked out with a laptop as base station, radio modem for connectivity, and a happy Palm Pilot/Windows CE user in the field. Add a GPS, a database back in the base station, and a bunch of field observations could be collected quickly and easily, with one point of data entry. Add in a couple more data collection devices- and capture quantitative information at the same time- temperature, humidity, canopy gap fraction, even fish-eye photos (if you're into that sort of thing)...

Sounds like a fun project to me.

The real key is datasets that are readily interoperable- that's what XML can provide.

Dave V.

> *snip* I've got to catch up on my back reading ...
>
> But I saw this and *had* to respond!
>
> >Example 2: Say I want to run NaviKey on my Palm Pilot*. The device has
> >limited resources and I would prefer to use very small files if at all
> >possible.
> >
> >As you say, these problems will all go away someday.
> >
> >Best,
> >
> >-Noel Cross
> >
> >*I only wish it were possible -- A few people have actually asked for this
> >capability, as it would be handy to use such a program in the field.
>
> It's probably not that far off. I found the program that Larry Morse
> first did for interactive specimen identification before I entered Grad
> School. It takes 64K to run. (At the time it probably had a limit of
> 64K for data too -- and that stretched the limits of the computer.) Most
> of your hand-held calculators have a memory of at least 64K now.
>
> How programmable *is* a Palm Pilot?
>
> Susan Farmer
> sfarmer(a)goldsword.com
> Botany Department, University of Tennessee
> http://www.goldsword.com/sfarmer/Trillium


(RQT) character/state/comment
by Noel Cross 04 Dec '99


Hi all,

Getting back to requirements, one thing that came up at the TDWG meeting was what I think of as the character/state/comment issue. Often, DELTA uses "characters", "states", and "comments" in the following manner:

(Using XDELTA here instead of DELTA-CLASSIC because to me it's a bit clearer)

    <character number="8">
      <description>outer edge of front wing</description>
      <comment>shape</comment>
      <multi type="unordered">
        <state number="1">convex</state>
        <state number="2">straight</state>
        <state number="3">concave</state>
        <state number="4">irregular</state>
      </multi>
    </character>

It was pointed out that "shape" is more than just a comment: it is a critical piece of data, a "property", which modifies the character and perhaps even restricts the states to various types.

Mike seemed to indicate that there would be a way to make this explicit in DELTA, though I don't recall what the mechanism was.

Should this be explored as a possible requirement? How might this issue best be resolved?

-Noel
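To make the character/state/comment question concrete, here is a minimal Python sketch that parses the XDELTA fragment quoted above and shows one way "shape" could be carried as data rather than as a free comment. The `<property>` element name and the `promote_comment_to_property` helper are assumptions made for illustration; neither is part of XDELTA or DELTA, and this is not the mechanism Mike had in mind.

```python
import xml.etree.ElementTree as ET

# Fragment as quoted in the message above (XDELTA-style markup).
XDELTA = """
<character number="8">
  <description>outer edge of front wing</description>
  <comment>shape</comment>
  <multi type="unordered">
    <state number="1">convex</state>
    <state number="2">straight</state>
    <state number="3">concave</state>
    <state number="4">irregular</state>
  </multi>
</character>
"""


def promote_comment_to_property(character: ET.Element) -> ET.Element:
    """Return a copy in which <comment> is renamed to a hypothetical <property>."""
    promoted = ET.fromstring(ET.tostring(character))  # copy via serialise/parse
    comment = promoted.find("comment")
    if comment is not None:
        comment.tag = "property"  # hypothetical element name, not part of XDELTA
    return promoted


character = ET.fromstring(XDELTA)
promoted = promote_comment_to_property(character)
prop = promoted.findtext("property")
states = [state.text for state in promoted.iter("state")]
print(f"{promoted.findtext('description')} / {prop}: {states}")
```

Once the property is an element of its own, a program can query it directly, for instance to check that all states of a "shape" character come from a shape vocabulary, instead of guessing at the meaning of a comment string.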


Re: (XML) XML?
by Jean-Marc Vanel 04 Dec '99


My synthetic answer to Kevin Thiele's question: Why XML?

* it is the ultimate standard for exchange of data of any kind: textual, database, knowledge representation, etc.; this standard is growing quickly and there is no doubt it will succeed, accompanying the growth of the Internet;
* it is extensible, allowing one to be very formal or very informal, as needed;
* lots of free tools and standards exist for parsing, viewing, transforming and querying;
* it is possible to apply the same techniques in the client browser or on the server, thus adding flexibility.

See the main links on XML at http://jmvanel.free.fr/XML.htm

And now the figures about the "verbosity" of XML:

My XML example (a single plant species from the flora of China) compresses from 2500 bytes to 1090 (57%):
http://jmvanel.free.fr/Samples/DisplayDescriptions/species_example.xml

An XDELTA example (Lepidoptera) on Leigh Dodds' site compresses from 50500 bytes to 3600 (93%), because it has a large number of repeated tags.

I used WinZip.

Now we see that we can have all plant species on a CD (650 MB), even without compression:
2500 bytes * 250 000 species = 625 MB

And on a DVD you can have 10 times more! Also, on the Internet, with the new HTTP 1.1 protocol, files can be compressed during transmission. Two more arguments: the verbose repeated tags are no longer repeated once the XML file has been parsed and is in memory; and the last argument is financial: today you can buy a 10 000 MB disk for 200 US$, and the price will keep decreasing.

So file size is NO problem for taxonomy with XML. You must understand that it's the small price to pay for extensibility and interoperability.

Noel Cross wrote:

> On Thu, 2 Dec 1999, Kevin Thiele wrote:
>
> > Over the past couple of days I've partially implemented an export function
> > to produce Leigh's XDELTA documents (as a simple example of a possible XML
> > format) for the data in Lucid keys. I have a key to families of flowering
> > plants of Australia (240 taxa, 166 characters, 600 states). The data I'm
> > using are simple - basically a score matrix, a list of taxa and a list of
> > characters. The file sizes in three formats for these data are:
> >
> > LucID   166 KB
> > DELTA   240 KB
> > XML     c. 3 MB
> >
> > And this is only the most basic XML!
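The size figures in Jean-Marc's message can be checked with a few lines of arithmetic. The sketch below is in Python and uses only the numbers given in the message; the `reduction` helper is a name introduced here, and "MB" is assumed to mean decimal megabytes (1 000 000 bytes). Computed this way, the first reduction comes out slightly under the quoted 57%.

```python
def reduction(original_bytes: int, compressed_bytes: int) -> float:
    """Fraction of the original size removed by compression."""
    return 1 - compressed_bytes / original_bytes


flora_example = reduction(2500, 1090)      # single species, flora of China
xdelta_example = reduction(50500, 3600)    # Lepidoptera XDELTA file

# Uncompressed size of 250 000 species at 2500 bytes each, in decimal megabytes.
all_species_mb = 2500 * 250_000 / 1_000_000

print(f"flora example:  {flora_example:.1%} smaller")
print(f"XDELTA example: {xdelta_example:.1%} smaller")
print(f"all species:    {all_species_mb:.0f} MB (CD capacity in the message: 650 MB)")
```

The estimate rests on the assumption that 2500 bytes is a typical record size; richer descriptions per species would scale the total accordingly.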


(XML) sharing files between programs without data loss
by Jean-Marc Vanel 04 Dec '99


Sharing files between programs without data loss

Several users have reported problems of data loss when passing files between programs.

With XML and the Document Object Model, this will never happen, because it's easy for a programmer to work on the document tree, just adding nodes where the user has made additions, and leaving the other parts unchanged.

So a round trip between programs will be feasible, leading to a composite document having a common core of elements understood by both programs, some elements understood by program A only, and some by program B only.

Cheers
JMV

"XML is not a problem, it's full of solutions"
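The round trip described above can be sketched with Python's standard DOM implementation, `xml.dom.minidom`. The document and its element names (`core`, `a-layout`, `b-note`) are invented for illustration: `core` stands for the shared elements, `a-layout` for something only program A understands, and `b-note` for what program B adds. This is a sketch of the idea, not a description of any existing program.

```python
from xml.dom import minidom

# Hypothetical document written by program A.
SOURCE = (
    "<description>"
    "<core>leaves ovate</core>"
    "<a-layout>two-column</a-layout>"
    "</description>"
)


def program_b_edit(xml_text: str) -> str:
    """Program B appends its own node and leaves every other node untouched."""
    doc = minidom.parseString(xml_text)
    note = doc.createElement("b-note")  # hypothetical element owned by program B
    note.appendChild(doc.createTextNode("checked by program B"))
    doc.documentElement.appendChild(note)
    return doc.documentElement.toxml()


result = program_b_edit(SOURCE)
print(result)
```

Program B never needs to know what `a-layout` means; because it edits the tree rather than rewriting the file from its own internal model, the element survives the trip. The guarantee holds only for programs that work this way.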


Re: (XML) XML?
by Susan B. Farmer 04 Dec '99


*snip* I've got to catch up on my back reading ...

But I saw this and *had* to respond!

>Example 2: Say I want to run NaviKey on my Palm Pilot*. The device has
>limited resources and I would prefer to use very small files if at all
>possible.
>
>As you say, these problems will all go away someday.
>
>Best,
>
>-Noel Cross
>
>*I only wish it were possible -- A few people have actually asked for this
>capability, as it would be handy to use such a program in the field.

It's probably not that far off. I found the program that Larry Morse first did for interactive specimen identification before I entered Grad School. It takes 64K to run. (At the time it probably had a limit of 64K for data too -- and that stretched the limits of the computer.) Most of your hand-held calculators have a memory of at least 64K now.

How programmable *is* a Palm Pilot?

Susan Farmer
sfarmer(a)goldsword.com
Botany Department, University of Tennessee
http://www.goldsword.com/sfarmer/Trillium


Re: (XML) XML?
by Noel Cross 04 Dec '99


On Sat, 4 Dec 1999, Jean-Marc Vanel wrote:

> And now the figures about the "verbosity" of XML:
>
> My XML example (a single plant species from the flora of China) compresses from
> 2500 bytes to 1090 (57%):
> http://jmvanel.free.fr/Samples/DisplayDescriptions/species_example.xml
>
> An XDELTA example (Lepidoptera) on Leigh Dodds' site
> compresses from 50500 bytes to 3600 (93%), because it has a large number of
> repeated tags.
>
> I used WinZip.
>
> Now we see that we can have all plant species on a CD (650 MB), even without
> compression:
> 2500 bytes * 250 000 species = 625 MB
>
> And on a DVD you can have 10 times more! Also, on the Internet, with the new HTTP
> 1.1 protocol, files can be compressed during transmission. Two more
> arguments: the verbose repeated tags are no longer repeated once the XML file has been
> parsed and is in memory; and the last argument is financial: today you can buy a
> 10 000 MB disk for 200 US$, and the price will keep decreasing.
>
> So file size is NO problem for taxonomy with XML. You must understand that it's
> the small price to pay for extensibility and interoperability.

Agreed. In some cases large file size does present a problem though.

Example 1: NaviKey is a Java applet that has to do the following:
- download data
- parse data
- interact with user

If the data are uncompressed XML files, then download time is obviously a major problem. If the XML is compressed, there is still the added time involved in compressing/uncompressing data, as well as new memory considerations.

Example 2: Say I want to run NaviKey on my Palm Pilot*. The device has limited resources and I would prefer to use very small files if at all possible.

As you say, these problems will all go away someday.

Best,

-Noel Cross

*I only wish it were possible -- A few people have actually asked for this capability, as it would be handy to use such a program in the field.
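The trade-off in Example 1 can be illustrated with a small Python sketch using the standard `zlib` module. The markup is synthetic: the element names and the count of 500 items are assumptions chosen only to mimic a file with many repeated tags, so the sizes it prints say nothing about real NaviKey data.

```python
import zlib

# Synthetic, highly repetitive markup standing in for an XDELTA-style file.
items = "".join(
    f'<item number="{i}"><state number="{i % 4 + 1}">present</state></item>'
    for i in range(500)
)
raw = f"<items>{items}</items>".encode("utf-8")

packed = zlib.compress(raw, level=9)   # what would travel over the network
unpacked = zlib.decompress(packed)     # the extra step the client must perform

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")
```

Both points in the exchange show up here: repeated tags compress very well, but after decompression the client holds the full-size document again, which is the memory consideration raised for small devices.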


Re: (RQT) Character and item hierarchy
by Stuart G. Poss 03 Dec '99


"Robert A. (Bob) Morris" wrote: > > The biggest advantage of XML for this group is its impending ubiquity, > not its formal structure, and (in my opinion) the discussion here > should return to discussing problems, not solutions because that > ubiquity will make the community want to use it whether it is the best > fit or not (just as has happened with HTML as a presentation vehicle). > > FYI I noticed this on the news lines: http://www.infoworld.com/articles/ic/xml/99/11/30/991130icxmlbiz.xml It seems we are NOT alone in the universe after all.


Re: (XML) XML?
by Noel Cross 03 Dec '99


On Thu, 2 Dec 1999, Kevin Thiele wrote:

> Over the past couple of days I've partially implemented an export function
> to produce Leigh's XDELTA documents (as a simple example of a possible XML
> format) for the data in Lucid keys. I have a key to families of flowering
> plants of Australia (240 taxa, 166 characters, 600 states). The data I'm
> using are simple - basically a score matrix, a list of taxa and a list of
> characters. The file sizes in three formats for these data are:
>
> LucID   166 KB
> DELTA   240 KB
> XML     c. 3 MB
>
> And this is only the most basic XML!

By comparing three different encodings of a particular data set, I think that Kevin has given us a very graphic demonstration of the potential verbosity of an XML specification for descriptive data. This is definitely a syntactic issue that we'll need to address, even as we try to characterize the semantics of what it is that we need to encode.

For this reason, I would suggest that examples given in XML should not imply that we'll end up actually using XML as the recommended syntactic encoding of descriptive data, and that indeed no decision has been made on this matter.

-Noel Cross


(XML) XML?
by Kevin Thiele 02 Dec '99


Will someone list for us exactly the benefits and disbenefits that will flow from using XML for a data interchange standard?

Gregor wrote:

>>These issues should be seen quite separately, and we need both
>>
>>1. A markup language for text documents. This includes:
>>   - existing descriptions, captured by OCR or other means, where the
>>     markup would be manual (or automated/data mining?)
>>   - computer generated natural language descriptions that are
>>     published as electronic documents. A new CSIRO package could have
>>     a ToNat command that automatically adds the necessary, hidden
>>     markup code.
>>2. A data language for new observations, including repeated
>>   measurements or repeated observations of categorical data, e.g.
>>   shape of multiple leaves in a single specimen. Further, many
>>   structures for knowledge management (data revision, annotation,
>>   quality control and assessment) need to be implemented here.
>>
>>The issues overlap, and it would be beneficial to use as much common
>>syntax as possible, but fundamentally I believe them to be quite
>>different.

To which Jim Croft replied:

>Can you explain in more detail why they should be fundamentally different?
>Aren't the differences just a matter of degree, different points
>on a continuum of data as it were? And aren't the basic principles
>applicable across the whole?

We need to explore this a lot more. I'm talking here about the data language, not the markup language, as it seems to me there are practical, even if not theoretical, differences.

I can well understand the benefit from marking up a document (a text description) in XML. But when it comes to straight data, I'm less sure. By straight data, I mean data that can be represented most concisely as a matrix, for instance:

Characters:
  Leaves   ovate elliptic
  Flowers  blue yellow
Taxa:
  taxon1
  taxon2
Data (taxa x states):
  0101
  1010

(...or a DELTA type disaggregated matrix)

With this type of data, the XML markup becomes considerably more verbose than the data. Is this a problem?

Over the past couple of days I've partially implemented an export function to produce Leigh's XDELTA documents (as a simple example of a possible XML format) for the data in Lucid keys. I have a key to families of flowering plants of Australia (240 taxa, 166 characters, 600 states). The data I'm using are simple - basically a score matrix, a list of taxa and a list of characters. The file sizes in three formats for these data are:

LucID   166 KB
DELTA   240 KB
XML     c. 3 MB

And this is only the most basic XML!

Now, perhaps in a few years we'll be carrying terabytes of data in our back pockets, but I would have thought that file size is important, particularly if I'm using this as an interchange format and want to email the data to someone (the XML file compresses very well into a zipfile, it's true).

The next question: what could we do with such data as XML that we couldn't do with the data as a simple structured file as above?

Leigh wrote:

>Secondly I might not be generating anything visible at all. I can
>well imagine an application that will take an XML document and
>from the data within produce (say) a taxonomic tree or trees of that
>data. Here I wouldn't use a stylesheet, I'd simply process the
>data directly.

Is direct processing of XML data any easier than direct processing of the data in a simpler format? Perhaps there will be off-the-shelf parsing tools, but how much of a benefit will this be?

The problem to my mind is that in current formats, e.g. DELTA and LucID, much information is implied by context. Thus, in

  1010
  0101

the taxa and character state numbers (identities) are implied by the position of the data bit in the matrix. In XML this information is verbosely explicit.

Is the following true? Once upon a time, computers could represent but not efficiently analyse or process textual data, hence documents were stored as text but "data" were stored as matrices etc. Now, XML has blurred the boundary between these types of information ("textual" and "data") and we're exploring the implications of that blurring. But are there now no differences, and no further need for a matrix?

A final point. It seems to me that the discussion so far has been dominated by computer nuts (no offence intended, we need you!), with relatively little input from the community of taxonomists whose needs this standard is supposed to serve. Most of these will be happy to use a well-crafted tool, but won't want to know the intricacies of XML, and will be put off if things are not quite straightforward.

This is a danger. We may end up with a wonderful, sophisticated, state-of-the-art, cutting-edge standard that nobody uses.

Cheers - k
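Kevin's point about position versus explicit markup can be made concrete with the two-taxon example from his message. The Python sketch below writes the same scores once as the compact matrix and once as XML in which every positive score names its character and state. The element and attribute names (`dataset`, `taxon`, `score`) are hypothetical and are not taken from XDELTA or any proposed standard.

```python
import xml.etree.ElementTree as ET

taxa = ["taxon1", "taxon2"]
# (character, state) pairs in column order, from the example in the message.
states = [("Leaves", "ovate"), ("Leaves", "elliptic"),
          ("Flowers", "blue"), ("Flowers", "yellow")]
matrix = ["0101", "1010"]  # one row per taxon, one column per state


def matrix_to_xml(taxa, states, matrix) -> str:
    """Spell out every positive score explicitly; element names are hypothetical."""
    root = ET.Element("dataset")
    for taxon, row in zip(taxa, matrix):
        item = ET.SubElement(root, "taxon", name=taxon)
        for (character, state), bit in zip(states, row):
            if bit == "1":
                ET.SubElement(item, "score", character=character, state=state)
    return ET.tostring(root, encoding="unicode")


xml_text = matrix_to_xml(taxa, states, matrix)
compact = "\n".join(matrix)
print(xml_text)
print(f"matrix: {len(compact)} bytes, XML: {len(xml_text)} bytes")
```

The comparison covers only the score part: the matrix still needs the character and taxon lists to be interpreted, whereas each XML `score` element can be read on its own. That self-description is what the extra bytes buy, and whether it is worth the cost is exactly the question the message raises.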
