I am sorry - I had not realized the discussion had moved off list... :(
was just replying mindlessly to email as it came in... :)
jim
Date: Wed, 17 Sep 2003 08:45:00 +1000
To: "G. Hagedorn" G.Hagedorn@bba.de
From: Jim Croft jrc@anbg.gov.au
Subject: Re: simple instance
Bob very bravely defended our decision and I think it is more or less correct,
It was not meant to be an attack... just expressing a bit of concern that non-SDD people are going to look at it and freak out at the specification overhead...
but your reaction concerns me, because this is sociological and a question of acceptance.
that is why I made the comments... if our specification is so complex in its implementation that people can not understand or install it they will not become engaged in our noble cause...
The specification (= "schema" with comments) of html is not, it is huge. Doing <leaves><simple/></leaves> is very short. However, that means we have to put the specification (= "schema") elsewhere.
As a general rule, argument by analogy is not a good thing to do, but in this case I think the comparison is apt...
The reason HTML took off is that it was simple enough to understand the code and implied DTD, and it was not allowed to fail from the user's perspective - you could put in as many or as few tags as you liked, even leave out the all-important <html> and half the closing tags, and it would still not bomb out on the browser... I believe it was this level of fault tolerance that ensured its unquestionable success.
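A minimal sketch of that fault tolerance, using only Python's standard library: the lenient `html.parser` accepts markup with no <html> wrapper and no closing tags, while a strict XML parser rejects the same input. The sample markup and the helper names are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Illustrative tag soup: no <html>, no closing tags at all.
SLOPPY = "<p>Leaves <b>simple<p>Flowers red"


class TagCollector(HTMLParser):
    """Records every start tag and every piece of text it is handed."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse_leniently(markup):
    """Parse HTML-style markup without complaining about missing tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered
    return collector.tags, "".join(collector.text)


def parses_as_xml(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


tags, text = parse_leniently(SLOPPY)
print(tags, repr(text))
print(parses_as_xml(SLOPPY), parses_as_xml("<leaves><simple/></leaves>"))
```

The lenient parser carries on regardless of what is missing; the strict one stops at the first structural problem.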
SGML on the other hand was impenetrable to the average human, or scientist, and was almost as complex as SDD and as a result has almost vanished from the everyday information management lexicon. It was undeniably a better environment to do what we were trying to achieve, but there was no way we were going to be able to give it the level of social acceptance to enable it to take off like HTML did...
and then along came XML which offered the power of SGML with the simplicity of HTML... we had the opportunity to be as simple or as complex as we want...
and here we are... :)
The SDD design principle is to allow the user to define things. That means we not only have the leaves and simple, we also need to define that. That makes it a lot more complicated.
perhaps we should look a bit more closely at the word 'allow'. As it stands at the moment, are we allowing, or compelling people to define everything, multiple times?
Is there a compromise position? can we specify the standard to different backwardly compatible levels depending on the degree of compulsion we might want for a particular instance?
Audience definitions: a single audience definition will do according to our current definition. You must define one, however.
why? can't we have a general audience definition implied by default?
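As a rough sketch of what an implied default could look like to a consuming application: if an instance declares no audience at all, the reader falls back to one general audience instead of rejecting the file. The element names (`Audience`, `Label`), the `key` attribute and the default values are hypothetical, not taken from the SDD schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical implied default, used only when the instance defines nothing.
DEFAULT_AUDIENCE = {"key": "general", "label": "General audience"}


def audiences(xml_text):
    """Return the declared audience definitions, or the implied default."""
    root = ET.fromstring(xml_text)
    found = [
        {"key": a.get("key"), "label": a.findtext("Label", default="")}
        for a in root.iter("Audience")
    ]
    return found or [dict(DEFAULT_AUDIENCE)]


# An instance with no audience definition still yields a usable audience.
minimal = "<Dataset><leaves><simple/></leaves></Dataset>"
explicit = (
    "<Dataset><Audience key='expert'><Label>Specialist</Label></Audience>"
    "<leaves><simple/></leaves></Dataset>"
)
print(audiences(minimal))
print(audiences(explicit))
```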
You can also include a fairly complete set of audience definitions which should be readily available at the end of this effort. Both are fine.
Or we could leave it to be user definable? is it linear? perhaps a 3,5,7,10 level of audience sophistication from stupid to genius is ok? :)
If I receive a dataset from an editor that does not care about statistical parameters (unable to express numeric statements) and want to edit this with another application, adding numeric statements, I somehow have to have a mechanism to add the basic infrastructure to become able to do this. This is an extra step, which we could avoid by simply requiring, for SDD version 1, a fixed set of minimal statistical parameters to be present in each file. For processors not really supporting it, this would in fact be a single xinclude statement to some globally stored SDD template file.
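The single-xinclude idea can be sketched with Python's standard `xml.etree.ElementInclude` module. The template file name, its contents and the element names (`Dataset`, `StatisticalParameters`, `Parameter`) are assumptions for illustration only; a real template would live at some agreed global location, which is simulated here with an in-memory loader.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical name and content of the shared template file.
TEMPLATE_HREF = "sdd-minimal-statistics.xml"
TEMPLATE_XML = """<StatisticalParameters>
  <Parameter key="min"/>
  <Parameter key="max"/>
  <Parameter key="mean"/>
</StatisticalParameters>"""

# An instance whose author wrote one include line instead of the definitions.
DATASET_XML = """<Dataset xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="sdd-minimal-statistics.xml"/>
  <leaves><simple/></leaves>
</Dataset>"""


def loader(href, parse, encoding=None):
    """Stand-in for fetching the globally stored template."""
    if href == TEMPLATE_HREF and parse == "xml":
        return ET.fromstring(TEMPLATE_XML)
    return None  # ElementInclude treats None as a failed include


def expand(xml_text):
    """Parse an instance and resolve its xi:include elements in place."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=loader)
    return root


root = expand(DATASET_XML)
keys = [p.get("key") for p in root.iter("Parameter")]
print(root[0].tag, keys)
```

After expansion the instance carries the minimal parameter set, so a second application can add numeric statements without the author ever having typed the definitions.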
This is fine if we live in Bob's utopian machine to machine world, but until we get there we have these human things to deal with... we need to be able to present a series of initial and intermediate steps in using SDD that people can understand and implement or all we will have is a nice specification that suffers the same fate as SGMwho...
Even if it is going to be machine talking to machine, and ultimately that is what it needs to do, a human has to interpret what we have invented and instruct the machines accordingly. Unfortunately 99 out of 100 people working on descriptive data do not have the level of computing savvy of a Bob or a Gregor - for the most part they are taxonomists who know about organisms and we can't, and shouldn't, try to turn them into programmers. We need to provide something to the well meaning taxonomists who want to do the right thing. If we give them the schema and instance we are talking about, most will take one look and walk away...
Reason 1. silly but true: xmlspy does not display attributes in the schema view directly, you have to click on an element and then look in a separate window. We therefore overlooked things that were hidden in attributes.
yep, you are right - that is silly - I remember the discussion now... :)
Reason 2. We kept moving things between elements and attributes. Some things just have to be elements (because they contain further structure), many things we could choose. After some indecision I increasingly found that I have the best feeling of simply saying: key/keyref in attributes, the rest elements.
I think it was Guillaume who had a good definition - if it was data it belonged in an element, if it was metadata, data about the data, then it was an attribute... don't know if this is right, but it seemed to make a lot of sense at the time and stuck in my brain to the extent that I have not looked at a schema the same way since... :)
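For what it is worth, the "key/keyref in attributes, the rest elements" convention can be sketched as follows. The element names (`Character`, `Label`, `States`, `State`) are invented for the example and are not the SDD schema.

```python
import xml.etree.ElementTree as ET


def character_element(key, label, state_keyrefs):
    """Build one item: identifiers in attributes, content in child elements."""
    char = ET.Element("Character", {"key": key})  # identifier -> attribute
    ET.SubElement(char, "Label").text = label     # data -> element
    states = ET.SubElement(char, "States")
    for ref in state_keyrefs:
        # A reference to something defined elsewhere -> keyref attribute.
        ET.SubElement(states, "State", {"keyref": ref})
    return char


elem = character_element("c1", "Leaf shape", ["s1", "s2"])
print(ET.tostring(elem, encoding="unicode"))
```

Under this split a reader looking only at the elements sees the descriptive content, and the attributes carry nothing but the wiring between items.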
jim
~ Jim Croft ~ jrc@anbg.gov.au ~ 02-62465500 ~ www.anbg.gov.au/jrc/ ~