Hi Roger,
I'm not a PyWrapper developer, so my opinion is certainly not authoritative, but my understanding is that PyWrapper only maps concepts that correspond to attributes or leaf XML elements (those elements that contain only text node children and do not contain other elements).
In other words, CanonicalName is not considered by PyWrapper to be a concept. Neither is TaxonNames, TaxonName, or DataSet. The concept mapping system in PyWrapper allows only a direct correspondence between a database column (or literal or concatenation of a combination of literals and columns) and an "atomic" XML element (or attribute).
My understanding is that the goal that led to this design decision was that one would want to query an abstract data model, and not an XML document. This abstract data model is equivalent to a list of hashtables that contain concept = value pairs. In my opinion this approach has both advantages and disadvantages.
On the one hand, using an abstract data model makes it possible to do views, which is considered a huge advantage.
On the other hand this definition of concept (essentially 1 XML element = 1 column) means that the abstract data model is limited to a simple hashtable. This is like programming without data structures, like C with no structs or Java with no classes or objects, and is therefore somewhat limiting.
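To make that concrete, here is a rough Python sketch of what I mean by "a list of hashtables". It is not PyWrapper code: the variable names, the related_concepts helper, and the sample values are all made up for illustration.

```python
# Hypothetical illustration only; none of these names come from PyWrapper.
# Each record is a flat hashtable: one concept path -> one text value.
flat_records = [
    {
        "/DataSet/TaxonNames/TaxonName/CanonicalName/Genus": "Quercus",  # sample value
        "/depth/measurement": "112",
        "/depth/unit": "fathoms",
    },
]


def related_concepts(record, parent_path):
    """Return the entries whose path lies under parent_path.

    The grouping exists only as a shared string prefix. The flat model
    itself has no entry for the parent element.
    """
    prefix = parent_path.rstrip("/") + "/"
    return {path: value for path, value in record.items() if path.startswith(prefix)}


depth_parts = related_concepts(flat_records[0], "/depth")
print(depth_parts)
print("/depth" in flat_records[0])  # the parent is not a concept in its own right
```

Any notion that two entries belong together has to be rebuilt by the caller from the path strings, which is the "no structs" problem in miniature.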
Another disadvantage is that this data model assumes that concepts are completely context-free. In other words, no concept can depend upon another to set the context for its semantic meaning. This is difficult to handle if you have a schema that allows a chunk of XML like the following:
<depth>
  <measurement>112</measurement>
  <unit>fathoms</unit>
</depth>
In this example, what we're really concerned with is the concept of depth (which cannot be expressed in PyWrapper/TAPIR). Depth's full semantic meaning is dependent upon both measurement and unit. If one substitutes measurement for depth (as you would do with PyWrapper), then depth/measurement is contextually constrained by the meaning of the independent concept depth/unit. However, there is no way within the concept mapping or view system to relate these two concepts automatically, so it's up to the developer of a particular view to understand the relationship between these concepts and to treat it appropriately.
The problem is that one can't even treat it properly for some view schemas using the view subsystem. The issue here is that PyWrapper cannot do transformations on concept values. So, if you wanted to map the above version of a depth concept into a Darwin Core MaNIS depth concept using PyWrapper, you could do one of two things. You could forgo translation and map depth/measurement directly into dwc:DepthInMeters (which would be wrong because 112 fathoms != 112 meters), or you could set dwc:DepthInMeters to the concatenation of depth/measurement and depth/unit, in this case forming the string "112 fathoms" (which is also wrong because this is not a "depth in meters" and would prevent the generated Darwin Core instance from validating).
The only proper way to handle this case with PyWrapper is to foresee that your user will want to use a Darwin Core view, and to create a new column in your underlying database that contains depth in meters, then map this concept into PyWrapper. In essence, this bypasses the view subsystem and relies on concept mapping and your own translation program that exists outside of PyWrapper to do the job properly.
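As a sketch of the kind of external translation step I have in mind, something like the following Python could populate the extra column before PyWrapper ever sees the data. The function name, the unit table, and the choice of supported units are my own assumptions; the only hard fact used is that one fathom is 6 feet, i.e. 1.8288 meters.

```python
# Hypothetical pre-processing step that runs outside PyWrapper.
# It computes the value for a new "depth in meters" column, which can then
# be mapped as an ordinary atomic concept.
METERS_PER_UNIT = {
    "meters": 1.0,
    "feet": 0.3048,
    "fathoms": 1.8288,  # 6 feet
}


def depth_in_meters(measurement, unit):
    """Convert a depth measurement and its unit name to meters."""
    key = unit.strip().lower()
    if key not in METERS_PER_UNIT:
        raise ValueError(f"unknown depth unit: {unit!r}")
    return float(measurement) * METERS_PER_UNIT[key]


# The example from the XML fragment above: 112 fathoms is about 204.83 meters.
print(round(depth_in_meters("112", "fathoms"), 2))
```

Rejecting unknown units rather than guessing is deliberate: a wrong number in dwc:DepthInMeters would still validate, so the error has to be caught here.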
This is a general problem shared by all designs that use independent atomic concepts.
Please don't think I'm denigrating PyWrapper. The problem of searching an arbitrary database structure and mapping the results into an arbitrary XML schema instance is difficult enough without having to worry about mapping non-atomic concepts. I just wanted to point out that using atomic concepts has some serious limitations.
-Steve
Roger Hyam wrote:
I am a little confused as to what is included in the capabilities response under capabilities/schemas/schema.
The annotation says:
"Each known and mapped concept of a schema listed with a boolean flag indicating if its searchable (default = true)."
I have a genus field in my database and I map it to the following concept:
<concept path="/DataSet/TaxonNames/TaxonName/CanonicalName/Genus" searchable="true" />
Should I also include the parent concepts:
<concept path="/DataSet/TaxonNames/TaxonName/CanonicalName" searchable="false" />
<concept path="/DataSet/TaxonNames/TaxonName" searchable="false" />
<concept path="/DataSet/TaxonNames" searchable="false" />
<concept path="/DataSet" searchable="false" />
or are they implied? I presume if they are included then they aren't searchable as they can't be used to build filters.
Does this:
<concept path="/DataSet/TaxonNames/TaxonName/CanonicalName/Genus" searchable="true" />
mean:
* This concept has been mapped to data and can be used in arbitrary response structures and filters in combination with concepts from any of the other conceptual schemas listed in here.
If so, is the entire schemas section of the capabilities response optional if arbitrary views are not supported? Both the schemas and concepts are given in views anyhow, so do we need to list them here?
Your thoughts most appreciated - I may just have the wrong end of the stick.
Roger
--
Roger Hyam Technical Architect Taxonomic Databases Working Group
http://www.tdwg.org roger@tdwg.org
+44 1578 722782