Jim Croft writes:
Date: Mon, 30 Oct 2000 23:57:44 +1100
From: Jim Croft jrc@anbg.gov.au
To: TDWG-SDD@usobi.org
Subject: Re: TDWG-SDD XML proposals of Kevin Thiele
The material at http://www.cs.umb.edu/efg/ThieleDraft/thiele_0_3.html illustrates our efforts to draft an XML Schema from Thiele's draft proposal version 0.3 ("TDD0.3") and illustrate its utility with simple applications manipulating a taxonomic treatment.
No-one seems to be commenting on this, at least on the list... is it all too scary? With Halloween just around the corner, do these guys deserve to be tricked or treated?
Unless people are commenting directly to Kevin/Bob/Jun, I would imagine they would be feeling pretty disappointed so far...
I have been through this a number of times, still trying to come to grips with the various options that have been used from the draft specs of a few weeks ago... and why...
This is a great attempt at implementation and surely demonstrates that we are on the right track... doesn't it?
Yes, the format does seem a bit verbose, but from what I gather Bob and Jun are saying, let the software, applications and databases worry about that...
Personally I prefer something like:
    <taxonomy>
      <rank="species" value="alfari"/>
      <rank="genus" value="Azteca"/>
      <author value="Emery"/>
      <year value="1893"/>
    </taxonomy>
to something like:
    <DESCRIPTION name="Taxonomic Information">
      <FEATURE name="Species Name">
        <FEATURE_VALUE>alfari</FEATURE_VALUE>
      </FEATURE>
      <FEATURE name="Genus Name">
        <FEATURE_VALUE>Azteca</FEATURE_VALUE>
      </FEATURE>
      <FEATURE name="Namer">
        <FEATURE_VALUE>Emery</FEATURE_VALUE>
      </FEATURE>
      <FEATURE name="Name Year">
        <FEATURE_VALUE>1893</FEATURE_VALUE>
      </FEATURE>
    </DESCRIPTION>
but if we never have to read it in this format, I do not suppose it really matters...
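As a rough illustration of letting the software worry about the verbosity, here is a minimal Python sketch (standard library only) that reads the verbose form quoted above and reduces it to a plain name/value mapping. The function name `features_to_dict` is made up for this example and is not part of TDD0.3 or of the applications on the web page. It also assumes each FEATURE is flat and carries at most one FEATURE_VALUE.

```python
import xml.etree.ElementTree as ET

# The verbose DESCRIPTION quoted in the message above.
VERBOSE = """
<DESCRIPTION name="Taxonomic Information">
  <FEATURE name="Species Name"><FEATURE_VALUE>alfari</FEATURE_VALUE></FEATURE>
  <FEATURE name="Genus Name"><FEATURE_VALUE>Azteca</FEATURE_VALUE></FEATURE>
  <FEATURE name="Namer"><FEATURE_VALUE>Emery</FEATURE_VALUE></FEATURE>
  <FEATURE name="Name Year"><FEATURE_VALUE>1893</FEATURE_VALUE></FEATURE>
</DESCRIPTION>
"""


def features_to_dict(xml_text):
    """Map each top-level FEATURE name to the text of its FEATURE_VALUE.

    Hypothetical helper: assumes no nested FEATUREs and at most one
    FEATURE_VALUE per FEATURE.
    """
    root = ET.fromstring(xml_text.strip())
    result = {}
    for feature in root.findall("FEATURE"):
        # findtext returns None when the FEATURE has no FEATURE_VALUE child.
        result[feature.get("name")] = feature.findtext("FEATURE_VALUE")
    return result


summary = features_to_dict(VERBOSE)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A reader who only ever sees the printed mapping never has to look at the nested markup, which is the point being made about the verbosity.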
Anyway, it looks as though we are settling on an architecture that allows both discursive descriptions accommodated by 'narrative' elements, and for the compulsive obsessives, a nested set of 'features' that can have 'feature_values', 'qualifiers' and associated 'narrative'... Right?
And that the competing architecture of a list of 'feature_values' that can be present, absent, unknown, doubtful, present by misinterpretation, absent by misinterpretation, imaginary, etc. has been given the flick?
Not really. We believe that it is pretty straightforward to translate between this kind of format and the one more spiritually akin to XML and will illustrate that Real Soon Now. The biggest thing we need to think about is exactly how the XML id reference mechanisms should be used to implement things as in Kevin's examples.
My own view is that there should actually be a single standard representation with a bunch of ways to transform between it and other desired representations. The advantage of this is that you need fewer translators to go between representations A and B if every representation has a bidirectional translation to a common one, X. The disadvantage is that you change X at the risk of breaking X<->A, thereby breaking A<->B when neither A nor B changed. That said, I think that this issue should not have lots of philosophical discussion until we, or somebody else, demonstrate the conversion between list-like formats and tree-like formats.
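To make the list-like versus tree-like question a little more concrete, here is a minimal Python sketch of one possible round trip. It is not the promised demonstration and not anyone's actual converter. The nesting in `TREE`, the function names `tree_to_list` and `list_to_tree`, and the choice of a (path, value) row as the "list-like" form are all assumptions made for illustration. It also assumes sibling FEATUREs have distinct names.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested description in the FEATURE/FEATURE_VALUE style.
TREE = """
<DESCRIPTION name="Taxonomic Information">
  <FEATURE name="Name">
    <FEATURE name="Genus"><FEATURE_VALUE>Azteca</FEATURE_VALUE></FEATURE>
    <FEATURE name="Species"><FEATURE_VALUE>alfari</FEATURE_VALUE></FEATURE>
  </FEATURE>
  <FEATURE name="Name Year"><FEATURE_VALUE>1893</FEATURE_VALUE></FEATURE>
</DESCRIPTION>
"""


def tree_to_list(element, path=()):
    """Flatten nested FEATURE elements into (path, value) rows."""
    rows = []
    for feature in element.findall("FEATURE"):
        here = path + (feature.get("name"),)
        # A feature's own values come first, then its sub-features.
        for value in feature.findall("FEATURE_VALUE"):
            rows.append((here, value.text))
        rows.extend(tree_to_list(feature, here))
    return rows


def list_to_tree(rows, description_name):
    """Rebuild a DESCRIPTION tree from (path, value) rows."""
    root = ET.Element("DESCRIPTION", name=description_name)
    nodes = {(): root}  # path prefix -> element already created for it
    for path, value in rows:
        for depth in range(1, len(path) + 1):
            prefix = path[:depth]
            if prefix not in nodes:
                parent = nodes[prefix[:-1]]
                nodes[prefix] = ET.SubElement(parent, "FEATURE", name=prefix[-1])
        ET.SubElement(nodes[path], "FEATURE_VALUE").text = value
    return root


rows = tree_to_list(ET.fromstring(TREE.strip()))
rebuilt = list_to_tree(rows, "Taxonomic Information")
for path, value in rows:
    print(" / ".join(path), "=", value)
```

On the translator count: with n representations, translating every pair directly takes n(n-1) one-way translators, while going through a common X takes 2n, so the hub arrangement starts to need fewer translators once there are more than three representations.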
Or, having coded the data in this manner, is this distinction in data representation, as employed by different interactive identification products, not really relevant or meaningful?
Sorry about the rambling - just trying to come to grips with what has been done here... I am pretty sure it is good stuff once I work out what it all means...
One vague background concern I have is with things like:
    <FEATURE name="Page Author">
      <FEATURE name="name">
        <FEATURE_VALUE>John T. Longino</FEATURE_VALUE>
      </FEATURE>
      <FEATURE name="efg address">
        <FEATURE_VALUE>The Evergreen State College, Olympia WA 98505 USA</FEATURE_VALUE>
      </FEATURE>
      <FEATURE name="email">
        <FEATURE_VALUE>longinoj@elwha.evergreen.edu</FEATURE_VALUE>
      </FEATURE>
    </FEATURE>
which seems on the surface to correspond to ye olde library cataloguing metadata stuff. Is there a wheel here we do not need to reinvent?
Absolutely. Per our remarks in the web page, we mainly did this out of time constraints. For metadata, there is no doubt that the Dublin Core and its extension the Darwin Core should be looked at. For bibliographic things, the Association of American Publishers has an SGML DTD which is likely to be XML by now, etc. I'll write some references to those for people to consider.
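For a flavour of what borrowing from the Dublin Core might look like, here is a minimal Python sketch that maps the "name" sub-feature of the Page Author block quoted above onto a Dublin Core `creator` element. This is an assumption about one possible mapping, not a worked-out proposal. The helper name `page_author_to_dc` is hypothetical, and the address and email sub-features are deliberately left unmapped because it is not obvious which element they belong in.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The Page Author block quoted earlier in the message.
PAGE_AUTHOR = """
<FEATURE name="Page Author">
  <FEATURE name="name"><FEATURE_VALUE>John T. Longino</FEATURE_VALUE></FEATURE>
  <FEATURE name="efg address"><FEATURE_VALUE>The Evergreen State College, Olympia WA 98505 USA</FEATURE_VALUE></FEATURE>
  <FEATURE name="email"><FEATURE_VALUE>longinoj@elwha.evergreen.edu</FEATURE_VALUE></FEATURE>
</FEATURE>
"""


def page_author_to_dc(xml_text):
    """Return a dc:creator element built from the 'name' sub-feature.

    Hypothetical mapping: only the author's name is carried over.
    """
    feature = ET.fromstring(xml_text.strip())
    creator = ET.Element("{%s}creator" % DC_NS)
    for child in feature.findall("FEATURE"):
        if child.get("name") == "name":
            creator.text = child.findtext("FEATURE_VALUE")
    return creator


creator = page_author_to_dc(PAGE_AUTHOR)
print(ET.tostring(creator, encoding="unicode"))
```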
Related to this, there are a number of projects around the place using XML to mark up and display descriptive data and taxonomic treatments, and they seem to be inventing rafts of 'feature' names for what appear to be the same things in terms of blocks of information. At the higher and metadata level, is it too early to talk about rationalizing these? Being very wary here, not wanting to open the Pandora's character-list box of worms again...
Can we start assembling a list?
jim
Bob Morris