[tdwg-tapir] Tapir > Capabilities > Schemas > Concepts
Steven Perry
smperry at ku.edu
Thu Nov 3 16:48:10 CET 2005
Hi Roger,
I'm not a PyWrapper developer, so my opinion is certainly not
authoritative, but my understanding is that PyWrapper only maps concepts
that correspond to attributes or leaf XML elements (those elements that
contain only text node children and do not contain other elements).
In other words, CanonicalName is not considered by PyWrapper to be a
concept. Neither is TaxonNames, TaxonName, or DataSet. The concept
mapping system in PyWrapper allows only a direct correspondence between
a database column (or literal or concatenation of a combination of
literals and columns) and an "atomic" XML element (or attribute).
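
To make the "atomic" rule concrete, here is a minimal Python sketch of how one might list the mappable paths in a sample document. It is not PyWrapper code. The function name, the "/@name" notation for attributes, the id attribute, the SpecificEpithet element and the sample values are all my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instance; only the Genus path comes from this thread.
SAMPLE = """
<DataSet>
  <TaxonNames>
    <TaxonName id="t1">
      <CanonicalName>
        <Genus>Quercus</Genus>
        <SpecificEpithet>alba</SpecificEpithet>
      </CanonicalName>
    </TaxonName>
  </TaxonNames>
</DataSet>
"""


def leaf_concept_paths(xml_text):
    """Return paths of attributes and of elements that have no child elements."""
    paths = []

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        # Attributes count as atomic (assumed "/@name" notation).
        for name in elem.attrib:
            paths.append(path + "/@" + name)
        children = list(elem)
        if not children:
            # No child elements, so this element is a leaf.
            paths.append(path)
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


for p in leaf_concept_paths(SAMPLE):
    print(p)
```

Container elements such as CanonicalName, TaxonName, TaxonNames and DataSet never appear in the result, which matches my reading of what PyWrapper treats as a concept.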
My understanding is that the goal that led to this design decision was
that one would want to query an abstract data model, and not an XML
document. This abstract data model is equivalent to a list of
hashtables that contain concept = value pairs. In my opinion this
approach has both advantages and disadvantages.
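
As a rough illustration of that abstract model (again a sketch under my own assumptions, not anything taken from PyWrapper), each record is one flat hashtable keyed by concept path. The SpecificEpithet path, the sample values and the helper names select and project are hypothetical.

```python
GENUS = "/DataSet/TaxonNames/TaxonName/CanonicalName/Genus"
# Hypothetical second concept, used only to have more than one key per record.
EPITHET = "/DataSet/TaxonNames/TaxonName/CanonicalName/SpecificEpithet"

# The abstract data model: a list of flat concept -> value hashtables.
records = [
    {GENUS: "Quercus", EPITHET: "alba"},
    {GENUS: "Quercus", EPITHET: "rubra"},
    {GENUS: "Pinus", EPITHET: "strobus"},
]


def select(records, concept, value):
    """Filter: keep the records whose value for `concept` equals `value`."""
    return [r for r in records if r.get(concept) == value]


def project(records, concepts):
    """View: keep only the requested concepts in each record."""
    return [{c: r[c] for c in concepts if c in r} for r in records]


oaks = select(records, GENUS, "Quercus")
print(project(oaks, [EPITHET]))
```

Filtering and projecting work on any combination of keys, which is what makes views easy. Nothing in the structure says that two keys belong together, though.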
On the one hand, using an abstract data model makes it possible to do
views, which is considered a huge advantage.
On the other hand this definition of concept (essentially 1 XML element
= 1 column) means that the abstract data model is limited to a simple
hashtable. This is like programming without data structures, like C
with no structs or Java with no classes or objects, and is therefore
somewhat limiting.
Another disadvantage is that this data model assumes that concepts are
completely context-free. In other words, no concept can depend upon
another to set the context for its semantic meaning. This is difficult
to handle if you have a schema that allows a chunk of XML like the
following:
<depth>
  <measurement>112</measurement>
  <unit>fathoms</unit>
</depth>
In this example, what we're really concerned with is the concept of
depth (which cannot be expressed in PyWrapper/TAPIR). Depth's full
semantic meaning is dependent upon both measurement and unit. If one
substitutes measurement for depth (as you would do with PyWrapper), then
depth/measurement is contextually constrained by the meaning of the
independent concept depth/unit. However there is no way within the
concept mapping or view system to relate these two concepts
automatically, so it's up to the developer of a particular view to
understand the relationship between these concepts and to treat it
appropriately.
The problem is one can't even treat it properly for some view schema
using the view subsystem. The issue here is that PyWrapper cannot do
transformation on concept values. So, if you wanted to map the above
version of a depth concept into a Darwin Core MaNIS depth concept using
PyWrapper, you can do one of two things. You could forgo translation
and map depth/measurement directly into dwc:DepthInMeters (which would be wrong
because 112 fathoms != 112 meters) or you could set dwc:DepthInMeters to
the concatenation of depth/measurement and depth/unit, in this case
forming the string "112 fathoms" (which is also wrong because this is
not a "depth in meters" and would prevent the generated Darwin Core
instance from validating).
The only proper way to handle this case with PyWrapper is to foresee
that your user will want to use a Darwin Core view, and to create a new
column in your underlying database that contains depth in meters, then
map this concept into PyWrapper. In essence, this bypasses the view
subsystem and relies on concept mapping and your own translation program
that exists outside of PyWrapper to do the job properly.
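
For what it's worth, the external translation step could be as small as the following Python sketch. The row layout, the column names and the function name are hypothetical. The only facts assumed are the standard definitions 1 foot = 0.3048 m and 1 fathom = 6 feet = 1.8288 m.

```python
from decimal import Decimal

# Exact conversion factors to meters (1 fathom = 6 feet = 1.8288 m).
METERS_PER_UNIT = {
    "meters": Decimal("1"),
    "feet": Decimal("0.3048"),
    "fathoms": Decimal("1.8288"),
}


def depth_in_meters(measurement, unit):
    """Combine a measurement with its unit and return the depth in meters."""
    key = unit.strip().lower()
    if key not in METERS_PER_UNIT:
        raise ValueError(f"unknown depth unit: {unit!r}")
    return Decimal(str(measurement)) * METERS_PER_UNIT[key]


# Hypothetical rows standing in for the underlying database table.
rows = [{"measurement": "112", "unit": "fathoms"}]
for row in rows:
    # This precomputed column is what would then be mapped to dwc:DepthInMeters.
    row["depth_in_meters"] = depth_in_meters(row["measurement"], row["unit"])

print(rows[0]["depth_in_meters"])  # 204.8256
```

The point is that measurement and unit have to be read together to produce the value, which is exactly the step the concept mapping and view system has no place for.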
This is a general problem shared by all designs that use independent
atomic concepts.
Please don't think I'm denigrating PyWrapper. The problem of searching
an arbitrary database structure and mapping the results into an
arbitrary XML schema instance is difficult enough without having to
worry about mapping non-atomic concepts. I just wanted to point out
that using atomic concepts has some serious limitations.
-Steve
Roger Hyam wrote:
> I am a little confused as to what is included in the capabilities
> response under capabilities/schemas/schema.
>
> The annotation says:
>
> "Each known and mapped concept of a schema listed with a boolean flag
> indicating if its searchable (default = true)."
>
> I have a genus field in my database and I map it to the following concept:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName/Genus"
> searchable="true" />
>
> Should I also include the parent concepts:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName"
> searchable="false" />
> <concept path="/DataSet/TaxonNames/TaxonName" searchable="false" />
> <concept path="/DataSet/TaxonNames" searchable="false" />
> <concept path="/DataSet" searchable="false" />
>
> or are they implied? I presume if they are included then they aren't
> searchable as they can't be used to build filters.
>
> Does this:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName/Genus"
> searchable="true" />
>
> mean:
>
> * This concept has been mapped to data and can be used in
> arbitrary response structures and filters in combination with
> concepts from any of the other conceptual schemas listed in here.
>
> If so is the entire schemas section of capabilities response optional
> if arbitrary views are not supported? Both the schemas and concepts
> are given in views anyhow - so do we need to list them here?
>
> Your thoughts most appreciated - I may just have the wrong end of the
> stick.
>
> Roger
>
>
>
>--
>
>-------------------------------------
> Roger Hyam
> Technical Architect
> Taxonomic Databases Working Group
>-------------------------------------
> http://www.tdwg.org
> roger at tdwg.org
> +44 1578 722782
>-------------------------------------