Re: [tdwg-tapir] Mapping to CNS file

13 Mar 2007

Hi Renato,

Thanks for the reply and for TAPIRLink 0.2.

I have written a CNS handler for TAPIRLink and it seems to work; I am
just testing it some more. It is a pretty simple thing.

Why does the TAPIRLink configuration file use : instead of # in the  
data types?

<concept id="http://rs.tdwg.org/dwc/dwcore/Remarks" name="Remarks"  
type="http://www.w3.org/2001/XMLSchema:string" required="false"  
searchable="true"/>

Is this by accident or by design? Does it have to be followed?

I am continuing to explore the idea of thunking to a non-namespaced
XML document where the colons after namespace prefixes are converted
to underscores so that:

/rdf:RDF/tor:OccurrenceRecord/tor:hasVoucher/tsp:Specimen/tsp:procedure

becomes

/rdf_RDF/tor_OccurrenceRecord/tor_hasVoucher/tsp_Specimen/tsp_procedure

This makes creating output models very very easy and something that  
can be automated.  The resulting instance documents can be converted  
to RDF with a simple XSLT or other script (I have not actually done  
this yet - but it should be simple ;) ).
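A minimal sketch of the prefix-to-underscore conversion described above, in Python. The function names `flatten_path` and `restore_path` are hypothetical and not part of TAPIRLink. The reverse direction assumes the set of namespace prefixes is known in advance, since a local name may itself contain underscores.

```python
import re
from typing import Iterable

# Matches a namespace prefix and its colon at the start of a path step,
# e.g. the "tor:" in "/tor:OccurrenceRecord".
_PREFIXED_STEP = re.compile(r"(?<=/)([A-Za-z_][\w.-]*):(?=[A-Za-z_])")


def flatten_path(path: str) -> str:
    """Replace 'prefix:local' with 'prefix_local' in every step of a path."""
    return _PREFIXED_STEP.sub(r"\1_", path)


def restore_path(path: str, prefixes: Iterable[str]) -> str:
    """Reverse flatten_path for steps that start with a known prefix."""
    known = set(prefixes)
    steps = []
    for step in path.split("/"):
        # Split on the first underscore only; the local name keeps the rest.
        prefix, sep, local = step.partition("_")
        if sep and prefix in known:
            step = f"{prefix}:{local}"
        steps.append(step)
    return "/".join(steps)
```

Splitting on the first underscore only works if the prefixes themselves contain no underscores, which holds for rdf, tor and tsp.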

'Thunking' is a technical term that sounds better than 'hacking' :)

I think we need a degree of automation of output structure
generation. If not, any changes to the ontology will take a lot of
error-prone work to migrate into the TAPIR network infrastructure.

I'll try and get a complete demo of this working as it doesn't really  
make sense till it is demonstrated.

Off list, could you add me to the SourceForge project and initiate me
into what I need to do to add the CNS handler?

Many thanks,

Roger

On 12 Mar 2007, at 02:54, Renato De Giovanni wrote:
...
Hi Roger,
I'll try to answer your questions below...
...
How can we use such a CNS file to configure the configurators of the
wrappers and is this desirable?
For TapirLink, as you already know, it will first be necessary to write a
CNS handler to load concepts in this format. PyWrapper can deal with CNS.
...
I believe the current configurators assume that the full name of the
concepts is resolvable to the documentation for the concept. The
concept paths generated from the ontology are not resolvable in this
way. I could write a script that generated a documentation page when
it was passed one of these paths. Would this be a reasonable thing to
do?
There's a recommendation that concepts should try to be globally unique
and also resolvable to something that helps in understanding what they
are. But this should not be assumed.
In TapirLink it's up to the conceptual schema handler implementation to
know how to get documentation or not. The Darwin schema handler stores a
link to documentation only if it finds an
xs:annotation/xs:documentation/@source node inside the concept definition.
A generic CNS handler could probably try to check if the concept
identifier is resolvable, and in this case store the identifier itself as
a link to documentation.
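As a rough illustration of that last idea, here is a sketch in Python rather than in TapirLink's own code. The names `is_resolvable` and `documentation_link` are hypothetical, and treating "resolvable" as "an HTTP HEAD request succeeds" is an assumption; a real handler might prefer a GET request or cache the result.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def is_resolvable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP HEAD request to url succeeds."""
    # Identifiers that are not HTTP(S) URLs cannot be checked this way.
    if not url.startswith(("http://", "https://")):
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Covers HTTP errors, DNS failures, timeouts and malformed URLs.
        return False


def documentation_link(
    concept_id: str,
    check: Callable[[str], bool] = is_resolvable,
) -> Optional[str]:
    """Use the concept identifier as its documentation link if it resolves."""
    return concept_id if check(concept_id) else None
```

Passing the check in as a parameter keeps the handler logic testable without network access.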
...
There are lots of concepts in this file (138) and it will get a lot
bigger when I add in the general properties. We really need a way of
organizing the mapping process into a hierarchy browsing process.
Could we have a pre-mapping phase where someone browses the ontology
and creates a list of concepts that they want to map. They could then
use this much shorter list in the configurator. Would this be more
productive? I have started mocking something up along these lines but
will abandon it if there is another way forward.
I don't know if PyWrapper already has some kind of hierarchical mapping
facility. In the case of TapirLink, you need to know that the original
design was to work with simple DarwinCore-like schemas (one or more flat
schemas), and also to work with "almost" tabular data behind the scenes.
So before trying to implement such hierarchical mapping in TapirLink, it
would be important to know if it really makes sense to have underlying
tabular data being mapped to complex data models.
Anyway, even providers with tabular data could make use of an ontology
browser to select the concepts that they want to provide. I'm thinking
about how this could be used by TapirLink...
Perhaps the ontology browser could be a separate application that in the
end would generate a link to a conceptual schema (in some specific format)
whose concepts could then be loaded by a corresponding handler in the
configurator.
...
I was thinking of writing a script to automate the production of
output model structures in much the same way as creating the CNS but
I ran into an interesting problem with preventing recursion.
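The details of the recursion problem are elided here, so the following is only a guess at its general shape: if class A has a property whose range is class B, and B has a property pointing back to A, a naive path generator never terminates. A minimal Python sketch that cuts such cycles by tracking the classes already on the current path. The `ex:` ontology and the function name `expand_paths` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology: class -> {property: range class, or None for literals}.
Ontology = Dict[str, Dict[str, Optional[str]]]


def expand_paths(
    ontology: Ontology, cls: str, ancestors: Tuple[str, ...] = ()
) -> List[str]:
    """List all property paths under cls, stopping at cyclic class references."""
    ancestors = ancestors + (cls,)
    paths: List[str] = []
    for prop, target in sorted(ontology.get(cls, {}).items()):
        base = f"/{cls}/{prop}"
        if target is None or target in ancestors:
            # Literal value, or a class already on this path: do not descend.
            paths.append(base)
            continue
        sub_paths = expand_paths(ontology, target, ancestors)
        if sub_paths:
            paths.extend(base + sub for sub in sub_paths)
        else:
            # Range class with no known properties: keep it as a leaf.
            paths.append(f"{base}/{target}")
    return paths


EXAMPLE: Ontology = {
    "ex:Record": {"ex:hasVoucher": "ex:Specimen"},
    "ex:Specimen": {"ex:procedure": None, "ex:recordedIn": "ex:Record"},
}
```

With this example, `expand_paths(EXAMPLE, "ex:Record")` yields two paths, and the one ending in `ex:recordedIn` stops there instead of looping back into `ex:Record`.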
...
What do you think?
I'm not sure I understood the problem - it's late here too... :-)
I would ask you a more basic question first: are you sure you need to
automate response structure creation from parts of the ontology? What kind
of structures do you have in mind? RDF stuff for LSID resolution?
One different thing that would be interesting - especially considering the
amount of time that it took to build output models during the workshop -
is an output model designer application. It would receive as input a
response structure and one or more conceptual schemas. The application
would parse the response structure and display an interface to map all
nodes against the concepts. But I realize that in this case the response
structure would be an input, not an output as you want.
Best Regards,
--
Renato
_______________________________________________
tdwg-tapir mailing list
tdwg-tapir@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tapir

Roger Hyam