Hi Roger
TAG list url is here with the archive:
http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
thanks, I registered.
I need help understanding RDF. Whereas XML Schema has a conceptual mapping to database or OO-programming design, RDF seems to have none; I lack anything I can relate it to. I still have not seen any software that helps me understand what you produced.
RDF is no more complex than XML Schema. The RDFS way of doing things is far more object-oriented than Schema. It forces you to have classes and properties, whereas arbitrary XML document structures can be ambiguous as to whether they are defining objects or properties of objects so - I don't see your reasoning.
I never encountered atomizing every statement into subject-predicate-object in OO design...
"to whether they are defining objects or properties of objects so": XML Schema is about classes, not objects (instances). Can you give an example of what you find confusing in XML Schema? I don't see it.
Of course you do have the strange animal of mixed content in XML Schema, but ignoring this (none of the TDWG standards uses it) you have classes and each class has a type. The type can be simple or complex, just like in OO languages.
I already tried the primer, but it did not help me; it seemed to talk mainly about use cases from artificial intelligence, which are hard for me to follow.
The RDF primer is a good place to start reading:
http://www.w3.org/TR/rdf-primer/
It is less than 100 printed pages so can probably be read in an evening and understood in several evenings!
There is a tutorial here:
http://www.w3schools.com/rdf/default.asp
and loads of books and things
The key to understanding it I found was that it is about describing resources not validating documents. When using XML Schema we are trying to create a set of rules to validate a document that describes the resource. We are effectively designing forms. With RDF we are describing the attributes of the resource that we want to use to describe it. Thus the two things are not mutually exclusive - which I hoped to demonstrate with my code.
That may be a good pointer to the problems I have. Because I do not think we are describing resources. In my mind we are sharing scientific data. I want the data, not the resources.
That should be discussed jointly, but I have had no time at all in the three days since I started this reply, and plenty of messages have come in...
-------------
- Secondarily, on cursory reading I saw that you introduced the term "GenusEpithet". This does not exist in the codes and is illogical, see def. below obtained by Google:
An epithet (Greek and Latin epitheton; literally meaning 'imposed') is a descriptive word or lapidary phrase, often metaphoric, that is essentially a reduced or condensed appositive. Epithets are sometimes attached to a person's name, as what might be described as a glorified nickname. Not every adjective is an epithet, even worn clichés. An epithet is linked to its noun by long-established usage and some are not otherwise employed. (en.wikipedia.org/wiki/Epithet)
You are correct, and I agreed with you in my post, where I said that this is not normal English usage.
If I call the property 'genus' or 'genusName' then people would be tempted to use it when they are describing a TaxonName of rank genus. This would be wrong as a taxon name of rank genus is a uninomial (monomial) and the 'uninomial' property should be used.
The property 'genusEpithet' represents the first particle of a binomial or trinomial name (which happens to consist of the word used as the genus name). I am open to suggestions for other names, but I guess that genusEpithet is better than firstParticleOfBinomialTrinomialName. genusPart? genus...? Incidentally, under the ICBN there is a weird rule that this word is not actually the genus. If there were homonymic generic names (two identical) then the species 'belong' to the earlier homonym even if the author of the species intended them to be in the later homonym. This makes sense but it took me a long time to get it.
To me it is not about English usage, but about logic. English in general uses words in a logical way, and so do the Nomenclatural Codes. An "Epithet" can never be the first particle (as you describe), because it means "added to something in front of it". It is like calling the front page a TitleAppendix because it is secondary material to the main content.
I find plain "Genus" best - stick with the codes, but I see your reasoning. Perhaps someone finds something better than GenusPart or GenusNameParticle (I don't), but I would go for them.
I never found the rule weird, by the way, it is perfectly logical given the code considers a "name" the name-string, without the authors and publication details logically required in the case of homonyms. It basically safeguards against unnecessary name changes.
EDI booking is all ok. Many thanks!
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203
Hi Gregor,
I've placed some comments regarding your exchange with Roger in-line:
Gregor Hagedorn wrote:
Hi Roger
TAG list url is here with the archive:
http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
thanks, I registered.
I need help understanding RDF. Whereas XML Schema has a conceptual mapping to database or OO-programming design, RDF seems to have none; I lack anything I can relate it to. I still have not seen any software that helps me understand what you produced.
XML Schema was not designed to have a conceptual mapping to databases or object-oriented frameworks. There is a set of tools and a series of conventions for loading XML schema instances into objects and for mapping schemas into relational table structures, but most of these systems only work if you use a subset of XML Schema language features. For example, XML Schema features like substitution groups and xsd:any cause many of these tools to have problems.
RDF is no more complex than XML Schema. The RDFS way of doing things is far more object-oriented than Schema. It forces you to have classes and properties, whereas arbitrary XML document structures can be ambiguous as to whether they are defining objects or properties of objects so - I don't see your reasoning.
I never encountered atomizing every statement into subject-predicate-object in OO design...
That's very true and whether you think of RDF as a graph of 3-tuples or whether you envision it as a set of "objects" that are instances of classes depends on the type of problem you're trying to solve. Triples are the lowest level but thinking in terms of the abstraction of objects and classes can be helpful for some tasks.
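To make the two views concrete, here is a minimal Python sketch. It is only an illustration: every identifier in it (`ex:name1`, `ex:genusPart`, `ex:identifiedAs`, and so on) is invented for the example and is not taken from any TDWG vocabulary. The same data is first held as plain subject-predicate-object tuples and then regrouped by subject, which gives the "object with properties" view.

```python
from collections import defaultdict

# Hypothetical triples; every identifier is made up for this example.
triples = [
    ("ex:name1", "rdf:type", "ex:TaxonName"),
    ("ex:name1", "ex:genusPart", "Ipomoea"),
    ("ex:name1", "ex:specificEpithet", "violacea"),
    ("ex:spec1", "rdf:type", "ex:Specimen"),
    ("ex:spec1", "ex:identifiedAs", "ex:name1"),
]

def as_objects(triples):
    """Group triples by subject: each subject becomes an 'object' whose
    'fields' are the predicates (multi-valued, hence lists)."""
    objects = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        objects[s][p].append(o)
    return {s: dict(props) for s, props in objects.items()}

objects = as_objects(triples)
print(objects["ex:name1"])
```

Nothing is lost or added by the regrouping; the triple list and the object view carry the same information, and which one is more convenient depends on the task.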
As an aside, some XML databases reduce XML Schema instances to a low-level structure called a flattened tree that can be analogous to triples. It is possible to decompose any XML instance into an ordered list of XPath = value pairs where the XPaths are concrete and used to refer to any element or attribute in the document. This is one of two approaches for building an XML database from scratch.
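The flattening described above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not how any particular XML database does it: the `flatten` function and the sample `Specimen` document are assumptions made up for the example, and mixed content (text following a child element) is deliberately ignored.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def flatten(xml_text):
    """Return an ordered list of (concrete XPath, value) pairs for every
    attribute and every non-empty element text in the document."""
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        for name, value in elem.attrib.items():
            pairs.append((f"{path}/@{name}", value))
        text = (elem.text or "").strip()
        if text:
            pairs.append((path, text))
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            # The positional predicate makes the path concrete (unambiguous).
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return pairs

# Hypothetical instance document, invented for this example.
doc = """<Specimen id="s1">
  <Name>Ipomoea violacea</Name>
  <Collector>A</Collector>
  <Collector>B</Collector>
</Specimen>"""

for path, value in flatten(doc):
    print(path, "=", value)
```

Each resulting pair reads much like a triple with the document as the implicit subject, the XPath as the predicate, and the value as the object.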
"to whether they are defining objects or properties of objects so": xml schema is about classes, not objects (instances). Can you give an example what you find confusing in xml schema, I don't see it.
XML Schema is a grammar for accepting or rejecting documents. It does not define classes of objects and the relationships among those classes. With care and common agreement from the stakeholders of an XML Schema one can create a schema such that it describes classes of objects, but this is by agreement not by design. The only class of object described by the ABCD schema is an ABCD document, not a specimen or a publication or a name or what have you.
Of course you do have the strange animal of mixed content in XML Schema, but ignoring this (none of the TDWG standards uses it) you have classes and each class has a type. The type can be simple or complex, just like in OO languages.
I agree that you can define simple and complex types in an XML Schema; however, these are syntactic types. An XML Schema type (simple or complex) is simply a rule for accepting or rejecting an XML subtree. It does not define what a thing is and how it relates to other things; it merely describes the form a thing must have in order to be acceptable to a validating XML parser.
Because XML Schema was designed to be a grammar for the validation of XML trees and not a semantic typing system, using it to build a global collection of interrelated data objects introduces a variety of issues:
First, XML Schema is very limited in the relationships you can define between types. One of the most-used relationships in OOA/OOD is inheritance and XML Schema does not provide proper inheritance. In XML Schema, substitution groups can be used as an inheritance-like language function, but they only work within a single schema. To do more than that, one must start importing other schemas, which can cause some surprising problems.
Second, there is no global identity property in XML. One can use id's, but they are local to a single instance document. The use of GUIDs will enable us to build a large collection of interrelated data objects of different types. To accomplish this in XML we would have to agree on how to represent GUIDs in all of the TDWG schemas. Again, this is something we can accomplish, but it will be accomplished by agreement as opposed to being enforced by the technology stack.
Third, XML Schema introduces the problem of schema interoperability. If I have a TCS XML Schema that allows pointers to instances of a publication XML Schema and I want instances of my TCS schema to be able to represent publications either as GUIDs or as actual data, then I must design my TCS schema to import my publication schema. This is fine for taxon concepts and publications, but what about taxon concepts and specimens? The Specimen XML Schema would have to import the TCS schema (because a specimen can be identified as an instance of a particular taxon concept) and TCS would have to import the specimen schema. This is a circular import and it is not allowed. Furthermore, there is no sophisticated XML instance pre-processor system (as in C compilers) that supports conditional imports. In order to do this with XML we would have to change our requirements such that we only ever allow references to data objects defined under a foreign schema by GUID and never allow copies of those foreign data objects to be embedded in our XML instance. In plain English this means our TCS instance can't embed a publication data object; it can only refer to it by GUID. Once again, this builds greater dependency upon the GUID framework, which exists by agreement only, due to the second problem listed above.
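The dependency cycle described above can be sketched as a small graph check in Python. This is only an illustration under stated assumptions: the function `find_import_cycle` and both dictionaries are hypothetical, and the schema names stand in for whatever the real TCS, specimen, and publication schemas would be called.

```python
def find_import_cycle(imports):
    """Depth-first search over a {schema: [imported schemas]} mapping.
    Returns one cycle as a list of schema names, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {schema: WHITE for schema in imports}
    stack = []

    def visit(schema):
        colour[schema] = GREY
        stack.append(schema)
        for dep in imports.get(schema, []):
            if colour.get(dep, WHITE) == GREY:
                # dep is already on the current path: we have closed a loop.
                return stack[stack.index(dep):] + [dep]
            if colour.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        colour[schema] = BLACK
        return None

    for schema in imports:
        if colour[schema] == WHITE:
            cycle = visit(schema)
            if cycle:
                return cycle
    return None

# Hypothetical design 1: foreign data objects may be embedded, so each
# schema imports the schemas of the objects it can contain.
embedding = {"TCS": ["Publication", "Specimen"],
             "Specimen": ["TCS"],
             "Publication": []}

# Hypothetical design 2: foreign objects are referenced by GUID only,
# so no schema needs to import another.
by_guid = {"TCS": [], "Specimen": [], "Publication": []}

print(find_import_cycle(embedding))
print(find_import_cycle(by_guid))
```

In the first design the check reports the TCS/Specimen loop; in the GUID-only design there is nothing to import and therefore no loop, which is the trade-off described in the paragraph above.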
These are only three of a great many issues with using XML Schema to build a large collection of interrelated data objects. RDF (along with RDF-Schema and/or OWL) solves many of these problems. To be fair, RDF also has its drawbacks, including but not limited to the complexity of client APIs and the inefficiency of triple stores. I'd be happy to discuss problems on both sides of this ontological divide at more length if anyone else is interested.
I already tried the primer, but it did not help me; it seemed to talk mainly about use cases from artificial intelligence, which are hard for me to follow.
The RDF primer is a good place to start reading:
http://www.w3.org/TR/rdf-primer/
It is less than 100 printed pages so can probably be read in an evening and understood in several evenings!
There is a tutorial here:
http://www.w3schools.com/rdf/default.asp
and loads of books and things
The key to understanding it I found was that it is about describing resources not validating documents. When using XML Schema we are trying to create a set of rules to validate a document that describes the resource. We are effectively designing forms. With RDF we are describing the attributes of the resource that we want to use to describe it. Thus the two things are not mutually exclusive - which I hoped to demonstrate with my code.
That may be a good pointer to the problems I have. Because I do not think we are describing resources. In my mind we are sharing scientific data. I want the data, not the resources.
Resources only act as identifiers for things, for data objects of a particular type. What is important is the description of those things (the data). In the RDF universe I've been imagining, resources are GUIDs for things like names, specimens, observations, publications, people, institutions, sequences, etc.
One thing we haven't talked about is the fundamental unit of data exchange in an RDF universe. It's not a document (as in the XML universe), nor is it a statement (a triple); instead, it is a set of triples that form a concise description of a resource. See http://swdev.nokia.com/uriqa/CBD.html (a W3C proposal).
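As a rough sketch of what such a bounded set of triples looks like, the Python below collects every triple whose subject is the resource and then follows any blank-node objects recursively, stopping at named resources. It is a simplification, not the full proposal: the reification rules of the CBD document are left out, blank nodes are represented simply as strings starting with `_:`, and all identifiers (`ex:spec1`, `ex:collectedAt`, and so on) are invented for the example.

```python
def concise_bounded_description(triples, resource):
    """Collect all triples whose subject is `resource`, then recursively
    the triples of any blank node reached as an object.
    Blank nodes are strings starting with '_:' (a convention assumed here);
    the reification part of the CBD proposal is not implemented."""
    description = []
    pending = [resource]
    visited = set()
    while pending:
        subject = pending.pop()
        if subject in visited:
            continue
        visited.add(subject)
        for s, p, o in triples:
            if s == subject:
                description.append((s, p, o))
                if isinstance(o, str) and o.startswith("_:"):
                    pending.append(o)
    return description

# Hypothetical data; every identifier is made up for this example.
triples = [
    ("ex:spec1", "ex:identifiedAs", "ex:name1"),
    ("ex:spec1", "ex:collectedAt", "_:loc"),
    ("_:loc", "ex:country", "USA"),
    ("ex:name1", "ex:genusPart", "Ipomoea"),
]

cbd = concise_bounded_description(triples, "ex:spec1")
for triple in cbd:
    print(triple)
```

The description of `ex:spec1` includes the anonymous locality node, because that node cannot be asked for separately, but stops at `ex:name1`, which has its own identifier and therefore its own description.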
-Steve
Dear Steve,
Many thanks for your comments. You definitely pointed out many places where my language was inaccurate.
XML Schema is a grammar for accepting or rejecting documents. It does not define classes of objects and the relationships among those classes. With care and common agreement from the stakeholders of an XML Schema one can create a schema such that it describes classes of objects, but this is by agreement not by design. The only class of object described by the ABCD schema is an ABCD document, not a specimen or a publication or a name or what have you.
I agree, but then you can (and TCS and UBIF/SDD do) use Schema in a way that by design avoids global elements, substitution groups, etc. Instead, we use types, each of which is always intended to map to a class.
I agree that you can define simple and complex types in an XML Schema; however, these are syntactic types. An XML Schema type (simple or complex) is simply a rule for accepting or rejecting an XML subtree. It does not define what a thing is and how it relates to other things; it merely describes the form a thing must have in order to be acceptable to a validating XML parser.
Yes, but if you write these rules by means of class inheritance, extension and polymorphism, and you add a note that this is not meant to be random, you are surely entitled to interpret this as design. You can just as well claim that a Java (or any other) OO architecture is not about defining what a thing is and how it relates to other things. Strictly you are correct, but I believe that no strict separation is in place here.
First, XML Schema is very limited in the relationships you can define between types. One of the most-used relationships in OOA/OOD is inheritance and XML Schema does not provide proper inheritance.
I believe it does. W3C Schema has type derivation, extension, and even type polymorphism (all somewhat limited by parsing determinism optimisations in Schema). You can have extension on both simple and complex types. For polymorphism, it even has the special xsi:type attribute (strictly a separate schema, but documented in the W3C Schema documentation).
substitution groups can be used as an inheritance-like language function, but they only work within a single schema. To do more than that, one must start importing other schemas which can cause some surprising problems.
I believe what you say may be true for substitution groups, but not for extension.
Second, there is no global identity property in XML. One can use id's, but they are local to a single instance document. The use of GUIDs will enable us to build a large collection of interrelated data objects of different types. To accomplish this in XML we would have to agree on how to represent GUIDs in all of the TDWG schemas. Again, this is something we can accomplish, but it will be accomplished by agreement as opposed to being enforced by the technology stack.
I think this is erroneous. Whenever you define a data element as the simple type xs:anyURI you inform any parser that you mean this to be a GUID. Whether parsers use that information is another question; it certainly is not validated in current validators.
Also, there is no restriction that id attributes must be local. In fact, they can be freely typed, including as xs:anyURI.
=== By the way, conversely, in SDD we have identified a major problem in forcing people to use URIs for every internal reference. The problems are the learning curve (school children trying to develop their own LUCID key to backyard plants should NOT be bothered with defining their GUID scheme first - and as a biologist I may say that biologists often would like to be treated the same...) and legal constraints (e.g. my current base address is my employer's, but as soon as I leave or retire, I am legally barred from using bba.de under any circumstances).
I am not sure how to overcome this; perhaps someone should indeed register a urn:local scheme. ===
Third, XML Schema introduces the problem of schema interoperability. If I have a TCS XML Schema that allows pointers to instances of a publication XML Schema and I want instances of my TCS schema to be able to represent publications either as GUIDs or as actual data, then I must design my TCS schema to import my publication schema. This is fine for taxon concepts and publications, but what about taxon concepts and specimens? The Specimen XML Schema would have to import the TCS schema (because a specimen can be identified as an instance of a particular taxon concept) and TCS would have to import the specimen schema. This is a circular import and it is not allowed.
I fully agree with this being a serious problem.
In principle it is possible to overcome this with the use of type polymorphism. UBIF would define an abstract base type (and yes, if we need more base types we would need to extend the UBIF schema, creating a new version of it).
However, in testing in 2002/2003 it turned out that major XML tools did not handle multiple-namespace schemata correctly, so we never got very far down that road. So I cannot say how realistic the solution is with current software.
I agree this is an open problem with w3c schema.
These are only three of a great many issues with using XML Schema to build a large collection of interrelated data objects. RDF (along with RDF-Schema and/or OWL) solves many of these problems. To be fair, RDF also has its drawbacks, including but not limited to the complexity of client APIs and the inefficiency of triple stores. I'd be happy to discuss problems on both sides of this ontological divide at more length if anyone else is interested.
I am, and I believe we should be. Please do so, to help us get a clearer picture. Current ("mainstream") usage seems to point to XML Schema, but I think ontological approaches are exciting. I just feel we lose quite a bit as well, simply because RDF may be so general that it does not allow writing software for more constrained (and therefore easier to analyse) cases. Although an RDBMS can be used as a triple store, that is not what it is designed for, so my current impression is that we do lose the time-proven utility of ER models implemented in RDBMS. I may still be wrong; I am just starting to learn about RDF/S.
Resources only act as identifiers for things, for data objects of a particular type. What is important is the description of those things (the data). In the RDF universe I've been imagining, resources are GUIDs for things like names, specimens, observations, publications, people, institutions, sequences, etc.
What are the resources, and what are metadata and data, when expressing knowledge about a taxon or specimen by saying:
Ipomoea violacea in the USA: "Flowers frequently dark to light blue, sometimes bordering on violet (G. Hagedorn, 29.3.2006)" and then "Flowers dark or light blue to purplish (Much. Better, 30.3.2006)"
We have object parts, characters, states, frequency modifiers, IPR metadata, versions, etc.
SDD expresses this through XML Schema. I find it very hard to think how to express this in RDF tuples. Maybe attempting this will help us understand what we lose in RDF.
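As a starting point for such an attempt, here is one rough and purely illustrative decomposition of the first description into tuples, written in Python. Nothing in it is an agreed model: every property name (`ex:hasStatement`, `ex:objectPart`, `ex:frequency`, and so on) and every node identifier (`_:d1`, `_:s1`, `_:s2`) is hypothetical, and the date is normalised on the assumption that 29.3.2006 is day.month.year.

```python
# All property names ("ex:...") and node ids ("_:...") are hypothetical.
triples = [
    # The description as a whole, with its scope and provenance metadata.
    ("_:d1", "rdf:type", "ex:Description"),
    ("_:d1", "ex:aboutTaxon", "Ipomoea violacea"),
    ("_:d1", "ex:geographicScope", "USA"),
    ("_:d1", "ex:author", "G. Hagedorn"),
    ("_:d1", "ex:date", "2006-03-29"),
    ("_:d1", "ex:hasStatement", "_:s1"),
    ("_:d1", "ex:hasStatement", "_:s2"),
    # "Flowers frequently dark to light blue"
    ("_:s1", "ex:objectPart", "flower"),
    ("_:s1", "ex:character", "colour"),
    ("_:s1", "ex:stateFrom", "dark blue"),
    ("_:s1", "ex:stateTo", "light blue"),
    ("_:s1", "ex:frequency", "frequently"),
    # "sometimes bordering on violet"
    ("_:s2", "ex:objectPart", "flower"),
    ("_:s2", "ex:character", "colour"),
    ("_:s2", "ex:state", "violet"),
    ("_:s2", "ex:stateModifier", "bordering on"),
    ("_:s2", "ex:frequency", "sometimes"),
]

def properties_of(node):
    """Return {predicate: [objects]} for one subject node."""
    props = {}
    for s, p, o in triples:
        if s == node:
            props.setdefault(p, []).append(o)
    return props

description = properties_of("_:d1")
for statement in description["ex:hasStatement"]:
    print(statement, properties_of(statement))
```

The sketch needs an intermediate node for each statement, because frequency and modifiers qualify the pairing of character and state rather than the taxon itself; whether that is acceptable or too cumbersome is exactly the question raised above.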
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203
participants (2): Gregor Hagedorn, Steven Perry