[Tdwg-tag] A very simple question stated again.

Mon Mar 27 19:51:00 CEST 2006

For producing tools we have been fans of Castor, but its handling of
key/keyref has some flaws that force us to do hand coding that
shouldn't be necessary. We have a new P2P-based collaborative object
editing project (aimed especially at image annotation) in which we are
recoding a hand-coded prototype C# SDD instance document editor into
Java, and our preliminary opinion is that Apache XMLBeans is good for
the problem of making schema-driven (==schema independent) tools. Our
generator that produces programs XXX2SDD and SDD2XXX (for data encoded
by any reasonable descriptive data XML schema XXX) is based on Castor
and needs hand configuration of some metadata not easily expressible in
XML Schema (especially regular expressions). In a system we built for
NatureServe to provide schema-driven, distributed, role-based access
control of sensitive data (e.g. geolocations of endangered species), we
had to hand-craft a schema-path hash enumeration to generate XSLT as
the control filter. That works mainly because XML Schema is itself
expressed in XML Schema (the schema for schemas), so a parser plus the
relationship of XPath to XML Schema is enough. [This effort is less
about SDD than it is about what will/would become a representation of
observations.] Similarly, a tool we built for adding human-readable
heuristics to the otherwise meaningless integer key/keyref pairs is
driven mostly by parsing the subject schema, which is supposed to
insulate it from changes to the SDD schema. We have yet to look at
whether these two apparently similar tasks in two different projects
are in fact a common task.
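A minimal sketch of the schema-path enumeration idea, assuming Python's
standard xml.etree.ElementTree and a single schema small enough to hold
in memory. It walks element declarations from a named root, follows
ref= to global elements and type= to named complex types, and records
an XPath-like path for each element. The function names are
hypothetical; this is not the NatureServe code. It ignores attributes,
type derivation, substitution groups and xs:import/xs:include, and a
recursive type would make it loop. The sample schema is the one Roger
posts further down the thread.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
  targetNamespace="http://example.org/specimens#"
  xmlns:specimens="http://example.org/specimens#">
  <xs:element name="ExampleDataSet">
    <xs:complexType><xs:sequence>
      <xs:element ref="specimens:Specimen"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="Specimen">
    <xs:complexType><xs:sequence>
      <xs:element name="Collector" type="specimens:personType"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:complexType name="personType">
    <xs:sequence><xs:element name="Name" type="xs:string"/></xs:sequence>
  </xs:complexType>
</xs:schema>"""


def local_name(qname):
    """Drop a prefix such as 'specimens:' from a QName-valued attribute."""
    return qname.split(":", 1)[-1]


def particles(group):
    """Yield element declarations in a content model, descending through
    nested sequence/choice/all compositors but not into child elements."""
    for child in group:
        if child.tag == XS + "element":
            yield child
        elif child.tag in (XS + "sequence", XS + "choice", XS + "all"):
            yield from particles(child)


def schema_paths(schema_text, root_name):
    """List the element paths permitted below the global element root_name.
    Assumes one non-recursive schema (a recursive type would loop forever)."""
    schema = ET.fromstring(schema_text)
    # Index the top-level element and complexType declarations by name.
    elements = {e.get("name"): e for e in schema.findall(XS + "element")}
    types = {t.get("name"): t for t in schema.findall(XS + "complexType")}
    paths = []

    def walk(decl, prefix):
        if decl.get("ref") is not None:
            decl = elements[local_name(decl.get("ref"))]
        path = prefix + "/" + decl.get("name")
        paths.append(path)
        # The content model is an anonymous complexType or a named global one.
        ctype = decl.find(XS + "complexType")
        if ctype is None and decl.get("type") is not None:
            ctype = types.get(local_name(decl.get("type")))
        if ctype is not None:
            for child in particles(ctype):
                walk(child, path)

    walk(elements[root_name], "")
    return paths


for p in schema_paths(SCHEMA, "ExampleDataSet"):
    print(p)
```

Each path can then serve as a key into whatever per-path rules (for
example, access-control decisions) a filter generator needs.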

We find these kinds of frameworks pretty productive, but I don't claim
that \they/ are the tools for biologists, only that they can \produce/
the biology-friendly tools. I worry that such frameworks are not mature
for RDF (else why has it proved so difficult to teach biologists how to 
use ontology tools, and why, for example, is the premier tool, Protege, 
unable to handle the complexity of ecological ontologies?). I'm going to 
guess that the purely parser-driven tools must be OK for RDFS. For 
example, I suppose there must be something around for generating SPARQL 
queries based on <something>. I have a lot to learn before I can offer
opinions based on arguments other than the one expressed in an old
U.S.(?) cultural idiom: "Where's the beef?" Or, as with SETI: just
because we haven't found it doesn't mean it isn't there; it only means
that we can't tell whether we are any closer to finding it than before
we started
looking. That's an enviable position for research projects---it keeps 
our money flowing---but maybe not for production architecture proposals.

For what it's worth, we have also acceded to the view of our
particular biologist clientele (mostly field naturalists) that
"biologist friendly" means "looks like Excel". We implemented a VBL
application for managing property lists on Excel cells but haven't
tried to exploit frameworks, and it is tied to a particular (simple)
schema [or maybe even none at all; I forget, since generating VBL is
not something I want my lab to aspire to...]. And, as it turns out,
Excel ain't so bad at managing triples on a small scale. In the heat of
a whirlwind meeting recently about invasive species information, Kevin
Thiele forgot that he had started learning about triples and reinvented
them, in Excel! See
http://wiki.cs.umb.edu/twiki/pub/IASPS/TerminologySummary/GISINSchemaworkgroup-Terminology.xls
and my biologist-friendly (I hope) commentary on it at 
http://wiki.cs.umb.edu/twiki/bin/view/IASPS/SampleDefinitions
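A spreadsheet with subject, predicate and object columns is already a
small triple store. Here is a minimal sketch, in Python, of reading such
a sheet (exported as CSV) and matching patterns against it. The column
names and rows are hypothetical, made up to echo Roger's
person-or-research-vessel example below; they are not taken from the
GISIN spreadsheet linked above.

```python
import csv
import io

# Hypothetical sheet exported as CSV: one triple per row.
SHEET = """subject,predicate,object
Specimen1,collectedBy,JohnDoe
JohnDoe,isA,Person
Specimen2,collectedBy,Vessel1
Vessel1,isA,ResearchVessel
"""


def load_triples(text):
    """Read subject/predicate/object rows into a list of 3-tuples."""
    return [(row["subject"], row["predicate"], row["object"])
            for row in csv.DictReader(io.StringIO(text))]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None is a wildcard,
    much like leaving a spreadsheet column unfiltered."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


triples = load_triples(SHEET)
# For each specimen, report who (or what) collected it and what kind of thing that is.
for specimen, _, collector in match(triples, p="collectedBy"):
    kinds = [o for _, _, o in match(triples, s=collector, p="isA")]
    print(specimen, collector, kinds)
```

Matching one pattern at a time is roughly what a single SPARQL triple
pattern does; joining patterns, as the loop does by hand, is where a
real query language starts to pay off.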

We are also getting our feet wet in aspect-oriented programming. The
Spring framework is proving to have a very big following, but at the 
moment I have no clue where it, or its relatives, might fit in the 
discussions at hand.

Bob

Roger Hyam wrote:

>
> Hi Bob,
>
> What tools are you using with SDD and the other UBIF-based schemas?
>
> Thanks,
>
> Roger
>
> Bob Morris wrote:
>
>> Ignoring the often-observed cumbersomeness of versioning in
>> XML Schema, your example seems a bit of a red herring and to me
>> serves more as support for a claim, with which I agree, that it is
>> \easier/ in RDF than in XML Schema to clarify relations, and easier in
>> XML Schema to fail to do so. For example, were it the \intent/ to
>> ensure that Collector isA Human, it is certainly possible to do so in
>> RDF. This then would require an extender to introduce a new
>> superclass, say PseudoCollector, along with all the baggage(?) of
>> enforcing the properties it shares with the subclass. I don't know
>> enough about DL to assert this with any confidence, but allowing 
>> arbitrary superclassing also sounds to me like it might cause pain to 
>> reasoners.
>>
>> I take the gist of the previously posted reference to the final pages 
>> of  http://www.omg.org/docs/ad/05-08-01.pdf to be that even in a 
>> modeling tool \more expressive/ than OWL, there is likely to remain 
>> the embedding of semantics in naming conventions, a cranky, but 
>> somewhat successful mechanism that is part of what Stan seems to be 
>> exploring.
>>
>> To me, the pain caused to reasoning engines is no small issue. In my
>> view, machine reasoning is the biggest motivation for considering
>> RDF-based representations, and it is well understood in the research
>> community that this utility can easily vanish into the thin air of
>> exponential time or other complexities. I am reminded of the Feb 15
>> posting by Steve Perry, which contained this (to me) scary sentence:
>> "We mostly use text editors for developing ontologies because we've
>> occasionally found Protege to be unstable with large complicated OWL
>> models." (*)
>>
>> Bob
>>
>> (*) In fairness to Steve, who is way more qualified than I in these
>> matters, on Feb 20 he posted a rather detailed analysis that makes
>> me question my belief about the main utility of RDF. He writes:
>>
>> "For the same reason I think the primary use case is not
>> inference over OWL-described RDF, but search over flexible RDF-Schema
>> described data models.  I personally think that RDF might make some use
>> cases, especially the merge case, easier to handle.  So I'd like to see
>> further discussion of the use cases above for both XML Schema and RDF."
>>
>> Alas, my finding his arguments convincing does nothing to assuage my 
>> terrors. If biologist-friendly OWL tools are lacking for non-toy 
>> ontology development, it is likely that biologist-friendly tools for 
>> RDF/RDFS are not even on the production-quality horizon.
>>
>> Roger Hyam wrote:
>>
>>> Hi Stan,
>>>
>>> Can I take that as a Yes then? Or is it a No?
>>>
>>> Concrete example:
>>>
>>> Take this instance document.
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <ExampleDataSet xmlns="http://example.org/specimens#">
>>>     <Specimen>
>>>         <Collector>
>>>             <Name>John Doe</Name>
>>>         </Collector>
>>>     </Specimen>
>>> </ExampleDataSet>
>>>
>>> Is John Doe a person or a research vessel?
>>>
>>> If the document validates against this schema:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
>>> elementFormDefault="qualified"
>>>   targetNamespace="http://example.org/specimens#" 
>>> xmlns:specimens="http://example.org/specimens#">
>>>   <xs:element name="ExampleDataSet">
>>>     <xs:complexType>
>>>       <xs:sequence>
>>>         <xs:element ref="specimens:Specimen"/>
>>>       </xs:sequence>
>>>     </xs:complexType>
>>>   </xs:element>
>>>   <xs:element name="Specimen">
>>>     <xs:complexType>
>>>       <xs:sequence>
>>>         <xs:element name="Collector" type="specimens:personType"/>
>>>       </xs:sequence>
>>>     </xs:complexType>
>>>   </xs:element>
>>>   <xs:complexType name="personType">
>>>     <xs:sequence>
>>>       <xs:element name="Name" type="xs:string"/>
>>>     </xs:sequence>
>>>   </xs:complexType>
>>> </xs:schema>
>>>
>>> then it looks like John Doe is a person.
>>>
>>> Without looking at the schema, John Doe could just as easily have been a
>>> team of people or an expedition or a sampling machine etc. We can 
>>> change the meaning of the instance document depending on the schema 
>>> it validates against - if we say meaning is in the schema - and 
>>> there are many schemas that this document could validate against.
>>>
>>> This is the reason for my question. If we interpret meaning from the 
>>> type hierarchy in XML Schema then we are stuck with single 
>>> inheritance (good in Java but maybe not so hot in data modeling). It 
>>> also means that we hard link structure to meaning. It is very 
>>> difficult for someone to come along and extend this schema, saying
>>> "I have an element and my element represents a collector that isn't 
>>> a person" because the notion of collector is hard coded to the 
>>> structure for representing a person. They can't abstractly say "This 
>>> machine is a type of collector".
>>>
>>> Another way to imply meaning is through namespaces. The element 
>>> http://example.org/specimens#Collector could resolve to a document 
>>> that tells us what we mean by collector. Then we wouldn't have to 
>>> worry so much about the 'structure' of the document but about the 
>>> individual meanings of elements. We could still use XML Schema to 
>>> encode useful structures but the meanings of elements would come 
>>> from the namespace. (And I didn't even mention that this is how RDF 
>>> does it - oh bother - now I have...).
>>>
>>> My central question is how we map between existing and future 
>>> schemas. If we can't say where the meaning is encoded in our current 
>>> schemas then we can't even start the process.
>>>
>>> All the best,
>>>
>>> Roger
>>>
>>>
>>> Blum, Stan wrote:
>>>
>>>> "Are the semantics encoded in the XML Schema or in the structure of
>>>> the XML instance documents that validate against that schema? Is it
>>>> possible to 'understand' an instance document without reference to
>>>> the schema?"
>>>>
>>>> Possible answers are:
>>>>
>>>>     1. Yes: you can understand an XML instance document in the
>>>>        absence of a schema it validates against, i.e. just from the
>>>>        structure of the elements and the namespaces used.
>>>>     2. No: you require the XML Schema to understand the document.
>>>>
>>>> Roger, if you postulate that the instance document is valid against
>>>> the schema, and that the element and attribute names are meaningful
>>>> to the reader (a human, or software written by a human who
>>>> understands their meaning), then the only additional semantics the
>>>> schema could provide would be in the annotations/documentation, if
>>>> any exist in the schema. I'm not entirely sure what you include in
>>>> [data] "structure", but if you only mean concepts such as tuples,
>>>> trees, sets, lists, bags, etc., then I would disagree that semantics
>>>> are encoded substantially in data structure (of the XML instance doc
>>>> or any other record). It is true that without proper structure,
>>>> semantics cannot be encoded, but I think semantics are encoded
>>>> predominantly in class/element-attribute names and any referenced
>>>> documentation (i.e., natural language). If you replace meaningful
>>>> names with surrogate keys (e.g., integers) and thereby obscure any
>>>> meaning conveyed by the names, then the instance document would lose
>>>> a lot of its meaning.
>>>>
>>>> I'm not exactly sure how this relates to the earlier discussion
>>>> about XML Schema, RDF, and more powerful modeling methodologies like
>>>> UML, but I hope it helps.
>>>>
>>>> Cheers,
>>>>
>>>> -Stan
>>>>  
>>>
>>>
>>>
>>>
>>
>
>
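Roger's John Doe example can be checked mechanically. A minimal sketch,
assuming Python's standard xml.etree.ElementTree: parsing his instance
document yields only namespace-qualified names such as
{http://example.org/specimens#}Collector, while the reading "John Doe
is a person" appears only once the declared type is looked up in his
schema. Renaming personType to a hypothetical vesselType gives a schema
with the same structure, against which the same instance is equally
valid but which suggests a different meaning. The helper names are
hypothetical, and the type lookup is a plain name match that ignores
ref= and declaration scoping.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Roger's instance document and schema from the thread (XML declarations omitted).
INSTANCE = """<ExampleDataSet xmlns="http://example.org/specimens#">
    <Specimen>
        <Collector>
            <Name>John Doe</Name>
        </Collector>
    </Specimen>
</ExampleDataSet>"""

PERSON_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
  targetNamespace="http://example.org/specimens#"
  xmlns:specimens="http://example.org/specimens#">
  <xs:element name="ExampleDataSet">
    <xs:complexType><xs:sequence>
      <xs:element ref="specimens:Specimen"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="Specimen">
    <xs:complexType><xs:sequence>
      <xs:element name="Collector" type="specimens:personType"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:complexType name="personType">
    <xs:sequence><xs:element name="Name" type="xs:string"/></xs:sequence>
  </xs:complexType>
</xs:schema>"""

# Same structure, different (hypothetical) type name: the instance is valid against both.
VESSEL_SCHEMA = PERSON_SCHEMA.replace("personType", "vesselType")


def instance_names(instance_text):
    """All element names in document order, in ElementTree's {namespace}local form."""
    return [el.tag for el in ET.fromstring(instance_text).iter()]


def declared_type(schema_text, element_name):
    """Return the type= attribute of the first declaration with this name, or None."""
    for decl in ET.fromstring(schema_text).iter(XS + "element"):
        if decl.get("name") == element_name:
            return decl.get("type")
    return None


if __name__ == "__main__":
    print(instance_names(INSTANCE))
    print(declared_type(PERSON_SCHEMA, "Collector"))
    print(declared_type(VESSEL_SCHEMA, "Collector"))
```

Nothing in the parsed instance distinguishes the two readings; only the
schema consulted does, which is the dependency Roger's question is
about.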

-- 
Robert A. Morris
Professor of Computer Science
UMASS-Boston
ram at cs.umb.edu
http://www.cs.umb.edu/efg
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466