[tdwg-tapir] Mapping to CNS file
Hi Folks,
I have been doing some thinking about mapping to the ontology.
We have a way of defining TAPIR concepts as paths through the ontology. I have written a script to generate a CNS file from a particular view onto the ontology, i.e. starting at a root class and following all the links in the RDF until they run out, but not allowing recursion on any one path. It needs a little work to include the common and external properties but seems to run OK.
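Here is a minimal sketch of that traversal in Python. It assumes the view onto the ontology has already been reduced to a dictionary of classes and their outgoing properties. The class and property names are illustrative, and the real script may work differently:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, hand-made view onto the ontology: each class maps to its
# outgoing properties as (property, range). A range of None marks a literal
# (datatype) property, which ends a path.
Ontology = Dict[str, List[Tuple[str, Optional[str]]]]


def concept_paths(ontology: Ontology, root: str) -> List[str]:
    """List every path from `root`, never visiting the same class twice on one path."""
    paths: List[str] = []

    def walk(cls: str, prefix: str, on_path: frozenset) -> None:
        for prop, target in ontology.get(cls, []):
            step = f"{prefix}/{prop}"
            if target is None:
                paths.append(step)  # literal value: this path is finished
            elif target not in on_path:
                # Follow the link; the class is only "used up" for this one path.
                walk(target, f"{step}/{target}", on_path | {target})
            # Otherwise the class is already on this path: stop to avoid recursion.

    walk(root, root, frozenset({root}))
    return paths


view: Ontology = {
    "OccurrenceRecord": [("hasVoucher", "Specimen"), ("remarks", None)],
    "Specimen": [("procedure", None), ("derivedFrom", "OccurrenceRecord")],
}
for path in concept_paths(view, "OccurrenceRecord"):
    print(path)
```

The set of visited classes is tracked per path rather than globally, so the same class can still turn up on several different paths.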
I attach an example CNS if it will make it through the mail server.
How can we use such a CNS file to configure the configurators of the wrappers, and is this desirable?
I believe the current configurators assume that the full name of each concept is resolvable to the documentation for that concept. The concept paths generated from the ontology are not resolvable in this way. I could write a script that generated a documentation page when it was passed one of these paths. Would this be a reasonable thing to do?
There are lots of concepts in this file (138), and it will get a lot bigger when I add in the general properties. We really need a way of organizing the mapping process into a hierarchy-browsing process. Could we have a pre-mapping phase where someone browses the ontology and creates a list of the concepts they want to map? They could then use this much shorter list in the configurator. Would this be more productive? I have started mocking something up along these lines but will abandon it if there is another way forward.
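The pre-mapping phase could be little more than filtering the generated paths down to the branches someone ticks while browsing. Here is a Python sketch. The paths are made up and the function name is hypothetical:

```python
from typing import Iterable, List


def select_concepts(all_paths: Iterable[str], ticked: Iterable[str]) -> List[str]:
    """Keep only the concept paths at or below a branch ticked while browsing."""
    branches = [b.rstrip("/") for b in ticked]
    return [
        p for p in all_paths
        if any(p == b or p.startswith(b + "/") for b in branches)
    ]


generated = [  # illustrative paths, as a CNS generator might emit them
    "OccurrenceRecord/remarks",
    "OccurrenceRecord/hasVoucher/Specimen/procedure",
    "OccurrenceRecord/hasVoucher/Specimen/collector",
    "OccurrenceRecord/identifiedTo/TaxonConcept/name",
]
short_list = select_concepts(generated, ["OccurrenceRecord/hasVoucher"])
print(short_list)
```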
I was thinking of writing a script to automate the production of output model structures in much the same way as creating the CNS but I ran into an interesting problem with preventing recursion.
When building paths for the CNS, any one path can't include the same class twice. That is easy to handle when you are only thinking about paths, but when you are building a series of XML Schemas it becomes more complex.
A --includes--> B --includes--> C --includes--> A
This can be detected, and C can contain a link instead of the include:
A --includes--> B --includes--> C --linksTo--> A
So we avoid recursion when building the concepts in the CNS.
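Here is a minimal Python sketch of that rule. It assumes the ontology is reduced to a hypothetical map of which classes each class may include. Any include that would bring back a class already on the current path becomes a link instead:

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # class -> classes it may include


def build_structure(graph: Graph, cls: str, path: Tuple[str, ...] = ()) -> dict:
    """Nest includes, turning any include that would close a cycle into a link."""
    path = path + (cls,)
    children = []
    for target in graph.get(cls, []):
        if target in path:
            children.append({"linksTo": target})  # would recurse: refer instead
        else:
            children.append({"includes": build_structure(graph, target, path)})
    return {"class": cls, "children": children}


def chains(node: dict) -> List[str]:
    """Render the nested structure in the arrow notation used in this message."""
    if not node["children"]:
        return [node["class"]]
    out = []
    for child in node["children"]:
        if "linksTo" in child:
            out.append(f"{node['class']} --linksTo--> {child['linksTo']}")
        else:
            out.extend(f"{node['class']} --includes--> {rest}"
                       for rest in chains(child["includes"]))
    return out


print(chains(build_structure({"A": ["B"], "B": ["C"], "C": ["A"]}, "A")))
```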
But if we have to consider multiple paths and possible loops, then we end up in situations where we can't make the choices automatically: for example, when B can include C and C can include B.
A --includes--> B --includes--> C --linksTo--> B
A --includes--> C --includes--> B --linksTo--> C
There are two definitions of B here: one with a link to C and one which includes C. We can't do this in XML Schema because we would need two definitions of the same thing in the same namespace; when you referenced the element in the schema for A, it couldn't know which one you meant. Please correct me if I am wrong - I'd like to be.
This situation is likely to arise more often than you might think, as there are lots of circular links in the ontology: just think of synonymy, basionyms, hierarchies, etc.
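To make the problem concrete, here is a hypothetical Python helper. It takes paths written in the arrow notation above and reports every pair of classes that is joined by includes on one path and by linksTo on another. Each reported pair is a place where a single global definition has to pick one relation, and that is exactly the choice that can't be made automatically:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

STEP = re.compile(r"\s*--(\w+)-->\s*")


def parse_chain(chain: str) -> List[Tuple[str, str, str]]:
    """Turn 'A --includes--> B --linksTo--> C' into (subject, relation, object) triples."""
    parts = STEP.split(chain.strip())
    # With one capture group, split alternates: class, relation, class, ...
    classes, relations = parts[0::2], parts[1::2]
    return list(zip(classes, relations, classes[1:]))


def conflicts(chains: List[str]) -> Dict[Tuple[str, str], Set[str]]:
    """Pairs of classes that need different relations on different paths."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for chain in chains:
        for subject, relation, obj in parse_chain(chain):
            seen[(subject, obj)].add(relation)
    return {pair: rels for pair, rels in seen.items() if len(rels) > 1}


print(conflicts([
    "A --includes--> B --includes--> C --linksTo--> B",
    "A --includes--> C --includes--> B --linksTo--> C",
]))
```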
So a decision still has to be made - by a human I guess - about which is the optimum combination of links and includes for a useful output structure.
It makes me think a convention on namespace substitution is the easiest way forward for building output models - but that might be just because we are getting near the end of the day here....
What do you think?
Roger
Hi Roger,
I'll try to answer your questions below...
> How can we use such a CNS file to configure the configurators of the wrappers, and is this desirable?
For TapirLink, as you already know, it will first be necessary to write a CNS handler to load concepts in this format. PyWrapper can deal with CNS.
> I believe the current configurators assume that the full name of each concept is resolvable to the documentation for that concept. The concept paths generated from the ontology are not resolvable in this way. I could write a script that generated a documentation page when it was passed one of these paths. Would this be a reasonable thing to do?
There's a recommendation that concepts should try to be globally unique and also resolvable to something that helps understanding what they are. But this should not be assumed.
In TapirLink it's up to the conceptual schema handler implementation to decide whether and how to get documentation. The Darwin schema handler stores a link to documentation only if it finds an xs:annotation/xs:documentation/@source node inside the concept definition. A generic CNS handler could probably check whether the concept identifier is resolvable and, if so, store the identifier itself as the link to documentation.
> There are lots of concepts in this file (138), and it will get a lot bigger when I add in the general properties. We really need a way of organizing the mapping process into a hierarchy-browsing process. Could we have a pre-mapping phase where someone browses the ontology and creates a list of the concepts they want to map? They could then use this much shorter list in the configurator. Would this be more productive? I have started mocking something up along these lines but will abandon it if there is another way forward.
I don't know if PyWrapper already has some kind of hierarchical mapping facility. In the case of TapirLink, you need to know that the original design was to work with simple DarwinCore-like schemas (one or more flat schemas), and also to work with "almost" tabular data behind the scenes. So before trying to implement such hierarchical mapping in TapirLink, it would be important to know if it really makes sense to have underlying tabular data being mapped to complex data models.
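Here is a Python sketch of both strategies. It is not TapirLink's actual handler code, and the function names, sample schema and its documentation URL are made up. It uses xs:documentation/@source when that is present, and otherwise falls back to the identifier if an HTTP HEAD request on it succeeds:

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional
from urllib.request import Request, urlopen

XS = "{http://www.w3.org/2001/XMLSchema}"


def is_resolvable(uri: str, timeout: float = 5.0) -> bool:
    """Best-effort check: does an HTTP(S) HEAD request on the identifier succeed?"""
    if not uri.startswith(("http://", "https://")):
        return False
    try:
        with urlopen(Request(uri, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def documentation_links(xsd_text: str, namespace: str,
                        resolvable: Callable[[str], bool] = is_resolvable
                        ) -> Dict[str, Optional[str]]:
    """Map each concept id to a documentation URL, or None if there is none."""
    links: Dict[str, Optional[str]] = {}
    for element in ET.fromstring(xsd_text).iter(f"{XS}element"):
        name = element.get("name")
        if name is None:
            continue
        concept_id = namespace + name
        doc = element.find(f"{XS}annotation/{XS}documentation")
        if doc is not None and doc.get("source"):
            links[concept_id] = doc.get("source")  # Darwin-handler style
        elif resolvable(concept_id):
            links[concept_id] = concept_id  # generic-handler fallback
        else:
            links[concept_id] = None
    return links


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Remarks" type="xs:string">
    <xs:annotation>
      <xs:documentation source="http://example.org/docs/Remarks">Free text.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="Kingdom" type="xs:string"/>
</xs:schema>"""
```

The resolvability check is injectable, so a handler can skip network calls entirely (or cache them) when loading large files.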
Anyway, even providers with tabular data could make use of an ontology browser to select the concepts that they want to provide. I'm thinking about how this could be used by TapirLink...
Perhaps the ontology browser could be a separate application that in the end would generate a link to a conceptual schema (in some specific format) whose concepts could then be loaded by a corresponding handler in the configurator.
> I was thinking of writing a script to automate the production of output model structures in much the same way as creating the CNS but I ran into an interesting problem with preventing recursion. ... What do you think?
I'm not sure I understood the problem - it's late here too... :-)
I would ask you a more basic question first: are you sure you need to automate response structure creation from parts of the ontology? What kind of structures do you have in mind? RDF stuff for LSID resolution?
One different thing that would be interesting - especially considering the amount of time that it took to build output models during the workshop - is an output model designer application. It would receive as input a response structure and one or more conceptual schemas. The application would parse the response structure and display an interface to map all nodes against the concepts. But I realize that in this case the response structure would be an input, not an output as you want.
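Here is a rough Python sketch of the first half of such a designer: walking a response structure and listing the leaf nodes that need a concept. It assumes the response structure uses inline xs:complexType/xs:sequence declarations only (no ref= or named types). The schema and the mapping below are made-up examples:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional

XS = "{http://www.w3.org/2001/XMLSchema}"


def leaf_paths(node: ET.Element, prefix: str = "") -> Iterator[str]:
    """Yield /a/b/c paths for element declarations that contain no other elements.

    Only inline declarations (xs:complexType/xs:sequence) are followed; ref=
    and named types are ignored in this sketch.
    """
    for element in node.findall(f"{XS}element"):
        path = f"{prefix}/{element.get('name')}"
        sequences = element.findall(f"{XS}complexType/{XS}sequence")
        if not sequences:
            yield path  # a leaf: this is what gets mapped to a concept
        for sequence in sequences:
            yield from leaf_paths(sequence, path)


RESPONSE_STRUCTURE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="records">
    <xs:complexType><xs:sequence>
      <xs:element name="record">
        <xs:complexType><xs:sequence>
          <xs:element name="remarks" type="xs:string"/>
          <xs:element name="kingdom" type="xs:string"/>
        </xs:sequence></xs:complexType>
      </xs:element>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""

# One row per node; a designer UI would let the user fill in the concept.
mapping: Dict[str, Optional[str]] = {
    path: None for path in leaf_paths(ET.fromstring(RESPONSE_STRUCTURE))
}
mapping["/records/record/remarks"] = "http://rs.tdwg.org/dwc/dwcore/Remarks"
print(mapping)
```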
Best Regards, -- Renato
Hi Renato,
Thanks for the reply and for TAPIRLink 0.2
I have written a CNS handler for TAPIRLink and it seems to work... just testing it some more. It is a pretty simple thing.
Why does the TAPIRLink configuration file use : instead of # in the data types?
<concept id="http://rs.tdwg.org/dwc/dwcore/Remarks" name="Remarks" type="http://www.w3.org/2001/XMLSchema:string" required="false" searchable="true"/>
Is this accident or design? Does it have to be followed?
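Whichever way it is decided, a handler reading existing files could normalise the separator. The following Python sketch uses a hypothetical helper that is not part of TapirLink. It rewrites the ':' form to the '#' form, which is how XML Schema datatypes are usually identified in RDF:

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def normalise_type(type_uri: str) -> str:
    """Turn '...XMLSchema:string' into '...XMLSchema#string'; leave others alone."""
    if type_uri.startswith(XSD_NS + ":"):
        return f"{XSD_NS}#{type_uri[len(XSD_NS) + 1:]}"
    return type_uri


line = ('<concept id="http://rs.tdwg.org/dwc/dwcore/Remarks" name="Remarks" '
        'type="http://www.w3.org/2001/XMLSchema:string" required="false" searchable="true"/>')
concept = ET.fromstring(line)
concept.set("type", normalise_type(concept.get("type", "")))
print(concept.get("type"))  # http://www.w3.org/2001/XMLSchema#string
```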
I am continuing to explore the idea of thunking to a non-namespaced XML document, where the colon after each namespace prefix is replaced by an underscore, so that:
/rdf:RDF/tor:OccurrenceRecord/tor:hasVoucher/tsp:Specimen/tsp:procedure
becomes
/rdf_RDF/tor_OccurrenceRecord/tor_hasVoucher/tsp_Specimen/tsp_procedure
This makes creating output models very very easy and something that can be automated. The resulting instance documents can be converted to RDF with a simple XSLT or other script (I have not actually done this yet - but it should be simple ;) ).
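Here is a minimal sketch of both directions in Python, using ElementTree rather than XSLT. It assumes prefixes never contain an underscore (the local part may). The rdf namespace URI is the real one; the tor and tsp URIs are placeholders, since the real ontology namespaces aren't given here:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The rdf URI is the real RDF namespace; the tor and tsp URIs are placeholders.
PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "tor": "http://example.org/hypothetical/tor#",
    "tsp": "http://example.org/hypothetical/tsp#",
}


def flatten_path(path: str) -> str:
    """'/rdf:RDF/tor:OccurrenceRecord' -> '/rdf_RDF/tor_OccurrenceRecord'."""
    return path.replace(":", "_")


def _qualify(name: str) -> str:
    """'tor_hasVoucher' -> '{tor namespace}hasVoucher'; split on the first '_' only."""
    prefix, sep, local = name.partition("_")
    if sep and prefix in PREFIXES:
        return "{%s}%s" % (PREFIXES[prefix], local)
    return name  # not one of our prefixes: leave untouched


def to_namespaced(flat_xml: str) -> str:
    """Rebuild a namespaced document from one using prefix_local names."""
    root = ET.fromstring(flat_xml)
    for element in root.iter():
        element.tag = _qualify(element.tag)
        for key in list(element.attrib):
            element.attrib[_qualify(key)] = element.attrib.pop(key)
    for prefix, uri in PREFIXES.items():
        ET.register_namespace(prefix, uri)  # serialise with rdf:, tor:, tsp:
    return ET.tostring(root, encoding="unicode")


flat = ('<rdf_RDF><tor_OccurrenceRecord rdf_about="urn:example:1">'
        '<tor_hasVoucher><tsp_Specimen><tsp_procedure>pressed</tsp_procedure>'
        '</tsp_Specimen></tor_hasVoucher></tor_OccurrenceRecord></rdf_RDF>')
print(to_namespaced(flat))
```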
'Thunking' is a technical term that sounds better than 'hacking' :)
I think we need a degree of automation in output structure generation. Otherwise, any change to the ontology will take a lot of error-prone work to migrate into the TAPIR network infrastructure.
I'll try and get a complete demo of this working as it doesn't really make sense till it is demonstrated.
Off list, could you add me to the SourceForge project and initiate me into what I need to do to add the CNS handler?
Many thanks,
Roger
(In reply to Renato De Giovanni, 12 Mar 2007, 02:54.)
tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
Hi Roger,
Good to know about the new CNS handler. You just need to tell me your SourceForge login so that I can give you write permissions to the repository.
I'm curious to see how you're handling a CNS with multiple conceptual schemas. It may be interesting to allow users to choose which ones they want to map, but I'm not sure what would be the best way to do this. And it will also be necessary to include a new combo box beside the field "additional schema to load" to specify a handler.
About the concept types, I think they are not being used anywhere yet. I just thought it could be interesting to store them. Maybe they will never be used because what really matters when producing output are the XML types defined in the response structure - which can be completely different from the concept types. So... I can change the separator to "#" or maybe I should simply drop that attribute.
It's really nice to see more people getting involved - I should probably start to improve code documentation...
Best Regards, -- Renato
(In reply to Roger Hyam, 13 Mar 2007, 15:53.)
Hi Renato,
On 13 Mar 2007, at 20:17, Renato De Giovanni wrote:
> I'm curious to see how you're handling a CNS with multiple conceptual schemas. It may be interesting to allow users to choose which ones they want to map, but I'm not sure what would be the best way to do this. And it will also be necessary to include a new combo box beside the field "additional schema to load" to specify a handler.
I am probably cheating here by treating a single CNS as a conceptual schema. I create a CNS from a view onto the ontology: for example, all the paths into the ontology from the TaxonConcept class (preventing or limiting recursion).
This seems to be the way to do it from the point of view of the ontology. A CNS file with 200+ concepts in it is generated from the OccurrenceRecord view of the ontology alone. If we merged this with CNS files of similar sizes for the other basic types, we would end up with a CNS containing thousands of concepts, which would not be practical to present to the user.
Does this break anything in the ideas behind a TAPIR network? It is not likely to lead to a bad multiplication of CNS files as there are only likely to be around 10 maximum for the ontology. What do you think?
The alternative, as you say, is to change the interface and allow the user to choose between different [concept_source] blocks in one big file. I am not sure of the advantages of this though. Because of the nature of the CNS files I guess one could concatenate them together to get a single CNS for the whole ontology if needed.
I am not totally happy with presenting the user with even hundreds of things to map. It would make my heart sink to see such a list. I am looking at ways to improve the documentation and to target the most "important" or most needed properties to map first. This is the same problem that is faced by the ABCD guys, I guess. I don't rule out UI changes to help solve this, but I may be able to do it just with careful ordering of the concepts. Ideas and suggestions would be welcome.
All the best,
Roger
Hi Roger,
> Does this break anything in the ideas behind a TAPIR network? It is not likely to lead to a bad multiplication of CNS files as there are only likely to be around 10 maximum for the ontology. What do you think?
I see no problems in having separate CNS files for each conceptual schema. It doesn't break anything - actually TAPIR capabilities responses can even advertise more than one CNS.
In principle the new CNS handler for TapirLink could just fetch the first conceptual schema from the file. Otherwise we would need to either change the interface or create parameters to the handler.
Maybe the main CNS file that we have now for DarwinCore and ABCD versions could also be separated into different files... (Markus?) Perhaps there could be other advantages in doing that.
Best Regards, -- Renato
Hi guys, nice to see you are progressing!
An important idea behind having multiple schemas inside a CNS is that it acts as a single entry point. Remember that we intended to replace the CNS alias.txt file with a real web service. Having only one entry point gives you one really big advantage: as soon as you update this file, all providers are aware of the change! If you want to add a new CNS file instead, every service has to be reconfigured. That's easy to do, but someone has to do it, and it will take a long time until it spreads through our entire network. We could surely have another registry of CNS files, but that sounds a bit over the top to me. Can you explain to me again why having separate alias.txt files is important for TapirLink? Couldn't you just read the entire file and locally split it into separate cached alias.txt files? -- Markus
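For illustration, splitting locally takes only a few lines. This Python sketch assumes each section starts with a bare [concept_source] line and names itself with an alias=... line. The key names and sample content are assumptions, not quoted from the TAPIR specification:

```python
import re
from pathlib import Path
from typing import Dict

HEADER = "[concept_source]"


def split_cns(cns_text: str) -> Dict[str, str]:
    """Split a CNS file into {alias: section}, one entry per [concept_source] block."""
    sections: Dict[str, str] = {}
    # Zero-width split just before each header line (needs Python 3.7+).
    for block in re.split(r"(?m)^(?=\[concept_source\])", cns_text):
        if not block.startswith(HEADER):
            continue  # text before the first header
        match = re.search(r"(?m)^alias\s*=\s*(\S+)", block)
        if match:
            sections[match.group(1)] = block
    return sections


def cache_sections(cns_text: str, cache_dir: Path) -> Dict[str, Path]:
    """Write each concept source to its own local file, keyed by alias."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    written: Dict[str, Path] = {}
    for alias, block in split_cns(cns_text).items():
        safe = re.sub(r"[^\w.-]", "_", alias)  # keep the alias usable as a file name
        target = cache_dir / f"{safe}.txt"
        target.write_text(block, encoding="utf-8")
        written[alias] = target
    return written


SAMPLE_CNS = """[concept_source]
alias=dwc14
label=Darwin Core 1.4

[concept_source]
alias=tor
label=OccurrenceRecord view of the ontology
"""
```

The central file stays the single entry point; the per-alias files are only a local cache that can be rebuilt whenever the central file changes.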
(In reply to Renato De Giovanni, 14 Mar 2007, 13:16.)
Hi Markus,
The issue is that the configuration interface was developed in a way that it expects each conceptual schema to have its own individual address. Changing the interface can be a bit tricky, but I'm sure we can find some workaround. One way would be something similar to what you suggested: a CNS address with multiple schemas would make the configurator download them and display each one in the existing list of pre-defined schemas that can be selected by the user. Another way could be to include a fake parameter in the CNS address to specify the conceptual schema (this one doesn't look so nice but it somehow simulates a real service).
Regards, -- Renato
(In reply to Markus Döring, 19 Mar 2007, 16:29.)
I am trying to get my head round this and figure out if it matters or not.
When someone is running a configurator on a wrapper, they need to pick sets of concepts (concept_source) that they will map for a particular endpoint.
The configurator needs to get these sets of concepts from somewhere that is managed centrally for any one thematic network so that it can be kept up to date.
The configurator will probably know about some sets of concepts when it is installed but the user needs to be able to specify other sets.
In the case of the set of concepts being contained in an XML Schema there is a 1:1 relationship between the set and a URI.
In the case of the set of concepts being contained in a CNS file (as currently specified), there is potentially a one-to-many relationship, where the URI may refer to many sets of concepts in a single file, unless we adopt a convention of using a fragment identifier in the URI to specify a particular concept_source within the CNS.
The advantage to having multiple concept_sources in a single CNS is that the wrapper can be distributed with the URI of a CNS that can subsequently contain new concept_sources that weren't known about previously.
I suspect that (although it would be good to have a system where the configurators lead people through choosing which concept_sources they might want to map things against) it is actually much easier just to have a web page that describes them and gives the URI to enter into the configurator.
My preference at the moment is to adopt the convention of using the fragment identifier to point out which concept_source within a CNS is used. The URI fragment == alias of the concept_source. This keeps the 1:1 mapping of URI to concept_source and the implementation simple. The wrapper can simply not support CNS mapping where the fragment isn't specified or it can load the whole CNS and ask the user to pick which concept_source they want to use.
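Here is a Python sketch of how a configurator might apply that convention. Fetching and parsing the CNS are left to a caller-supplied loader, which is hypothetical. The function returns the selected section, or the list of aliases to offer the user when there is no fragment:

```python
from typing import Callable, Dict, List, Union
from urllib.parse import urldefrag

# location -> {alias: text of that [concept_source] section}; how the CNS is
# fetched and parsed is left to the caller.
Loader = Callable[[str], Dict[str, str]]


def pick_concept_source(uri: str, load: Loader) -> Union[str, List[str]]:
    """Return the section named by the URI fragment, or the aliases to choose from."""
    location, alias = urldefrag(uri)
    sources = load(location)  # the whole CNS is loaded either way
    if not alias:
        return sorted(sources)  # no fragment: ask the user to pick one
    if alias not in sources:
        raise KeyError(f"no concept_source with alias {alias!r} in {location}")
    return sources[alias]


fake_cns = {
    "http://somehost/somepath/alias.txt": {
        "some_concept_source": "[concept_source]\nalias=some_concept_source\n",
        "another_source": "[concept_source]\nalias=another_source\n",
    }
}
print(pick_concept_source("http://somehost/somepath/alias.txt", fake_cns.__getitem__))
```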
A possibility for the TAPIRLink implementation is to have the schemas.xml file loaded from a central location.
From the ontology point of view it makes sense to have a URI for each of the main object types that returns the CNS for that view onto the ontology, so I guess that is the reason I did it that way. I could always put together a URI that returned a concatenation of the CNS files for all the different entry points to the ontology if that was useful.
What do you think?
Roger
(In reply to Renato De Giovanni, 19 Mar 2007, 18:34.)
Hi Roger,
Can you give an example of the URI using a fragment identifier for a concept source? Are you thinking about something like this:
http://somehost/somepath?cs=darwincore1.4
It will probably be the simplest solution now. The configuration interface (and the CNS handler) can be changed later to support URIs that don't specify a conceptual schema.
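The two forms behave differently on the wire. A fragment is not sent to the server, so with #alias the client fetches the whole CNS and selects locally. A query parameter, by contrast, lets the server return a single schema. Here is a small Python sketch that tells them apart. The cs parameter name comes from the example above and is not defined by TAPIR:

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urldefrag, urlsplit


def concept_source_selector(uri: str) -> Tuple[str, Optional[str], str]:
    """Return (URL to fetch, concept_source alias or None, where the alias came from)."""
    location, fragment = urldefrag(uri)
    if fragment:
        # The fragment is dropped before the request, so the client must fetch
        # the whole CNS and pick the section out itself.
        return location, fragment, "fragment"
    values = parse_qs(urlsplit(uri).query).get("cs")
    if values:
        # A query parameter does reach the server, which could then return just
        # the requested conceptual schema.
        return uri, values[0], "query"
    return uri, None, "none"


for example in ("http://somehost/somepath?cs=darwincore1.4",
                "http://somehost/somepath/alias.txt#some_concept_source"):
    print(concept_source_selector(example))
```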
Best Regards, -- Renato
(In reply to Roger Hyam, 22 Mar 2007, 14:23.)
Hi Renato,
I suspect Roger was thinking more along the lines of:
http://somehost/somepath/schema#someconcept
At least that's what I read from "fragment identifier".
On an aside, kind of: can someone elaborate on the decision to use a CNS file format (as described in the 1.0 spec) that is not in some form of XML, preferably RDF?
thanks, Dave V.
(In reply to Renato De Giovanni, 22 Mar 2007, 12:28.)
Hi Dave,
Thanks! It looks a lot more elegant with "#", and still very simple.
About the CNS, I think there was always an expectation that it would soon become a real service, so we just wanted to start with a simple file that was easy to type. At that time there was an attempt to specify a more powerful service which would be based on ebXML, but as far as I know it was never implemented. So the original format which was supposed to be temporary remained until now.
I agree that using XML/RDF would be better, but I still think that sooner or later we will need a real service. TapirLink already needs something that maps equivalent concepts from different schemas.
Best Regards, -- Renato
On 22 Mar 2007 at 13:07, Dave Vieglais wrote:
Hi Renato,
I suspect Roger was thinking more along the lines of:
http://somehost/somepath/schema#someconcept
At least that's what I read from "fragment identifier".
On an aside, kind of: can someone elaborate on the decision to use a CNS file format (as described in the 1.0 spec) that is not in some form of XML, preferably RDF?
thanks, Dave V.
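For comparison with the current text format, here is what an XML rendering of a CNS might look like and how little code it takes to read it with Python's standard `xml.etree.ElementTree`. The element and attribute names (`cns`, `concept_source`, `alias`, `concept`, `id`) and the example URIs are invented for this sketch; they are not from the TAPIR 1.0 spec.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical XML layout for a CNS; all names here are invented.
EXAMPLE_CNS = """<cns>
  <concept_source alias="dwc14">
    <concept id="http://example.org/dwc14/ScientificName"/>
    <concept id="http://example.org/dwc14/Country"/>
  </concept_source>
  <concept_source alias="tcs">
    <concept id="http://example.org/tcs/NameString"/>
  </concept_source>
</cns>"""


def parse_cns_xml(text: str) -> Dict[str, List[str]]:
    """Return alias -> list of concept IDs for each concept_source."""
    root = ET.fromstring(text)
    return {
        source.get("alias"): [concept.get("id") for concept in source.findall("concept")]
        for source in root.findall("concept_source")
    }
```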
Hi Dave and all,
I actually meant more like:
http://somehost/somepath/alias.txt#some_concept_source
where it is identifying a complete section of a CNS file containing many TAPIR concepts.
My understanding is that the whole RDF + CNS + TAPIR Concepts + output model arrangement acts like a thunking layer to get from RDF to simple XML and back again.
There is a pretty picture here: http://wiki.tdwg.org/twiki/bin/view/TAG
At the moment I have a script that takes a view on the ontology that is defined in OWL and creates two things: a TAPIR output model and a CNS file that lists the concepts in the output model (paths through the ontology following the ObjectProperty relationships, not the subclassing relationships). It creates big schemas that are reminiscent of ABCD but that map onto the ontology and RDF. (It also creates some documentation.)
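The path generation can be sketched as a depth-first walk from a root class along ObjectProperty links, skipping any class that is already on the current path so that recursive relationships terminate. The toy ontology (a dictionary of class -> (property, range class) pairs) and the `Class/property/Class` path syntax are hypothetical; the actual script works from OWL and its concept IDs may look different.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy view of an ontology: class -> [(ObjectProperty, range class)].
Ontology = Dict[str, List[Tuple[str, str]]]


def concept_paths(ontology: Ontology, root: str) -> List[str]:
    """List every path from `root` following ObjectProperty links,
    never visiting the same class twice on one path."""
    paths: List[str] = []

    def walk(cls: str, prefix: str, on_path: FrozenSet[str]) -> None:
        for prop, target in ontology.get(cls, []):
            if target in on_path:
                continue  # would recurse (e.g. C -> A on A -> B -> C): stop here
            path = f"{prefix}/{prop}/{target}"
            paths.append(path)
            walk(target, path, on_path | {target})

    walk(root, root, frozenset({root}))
    return paths
```

For the cycle `{"A": [("p", "B")], "B": [("q", "C")], "C": [("r", "A")]}` starting at `A`, this yields `["A/p/B", "A/p/B/q/C"]`: the `C -> A` link is dropped because `A` is already on the path. A class may still appear on several different paths.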
I am actually working on an output model that uses no namespaces, only element naming conventions (e.g. rdf_RDF == rdf:RDF). A simple XSLT then turns the resulting instance documents into real RDF with all the namespaces and so on correctly in place. A couple of regular expressions would do the same job.
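The "couple of regular expressions" route might look like the sketch below. It assumes the prefix is everything before the first underscore in an element or attribute name, converts only prefixes listed in a caller-supplied map, and declares those namespaces on the root element. The function name and the `tc` namespace URI in the example are placeholders, and a real implementation would also need to guard against attribute-like text inside values or character data.

```python
import re
from typing import Dict


def to_namespaced(xml: str, namespaces: Dict[str, str]) -> str:
    """Turn prefix_Name conventions (e.g. rdf_RDF) into prefix:Name with xmlns declarations."""
    prefixes = "|".join(map(re.escape, namespaces))
    # Element names: <rdf_RDF ...> and </rdf_RDF> become <rdf:RDF ...> and </rdf:RDF>.
    xml = re.sub(rf"<(/?)({prefixes})_", r"<\1\2:", xml)
    # Attribute names: rdf_about="..." becomes rdf:about="...".
    xml = re.sub(rf"(\s)({prefixes})_(\w+)=", r"\1\2:\3=", xml)
    # Declare every namespace on the first (root) element tag.
    decls = "".join(f' xmlns:{p}="{uri}"' for p, uri in namespaces.items())
    return re.sub(r"<([\w:]+)", lambda m: f"<{m.group(1)}{decls}", xml, count=1)
```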
It sounds like a bit of a hack but as the XML Schemas and instance documents are really only used as part of the TAPIR configuration and protocol layer I feel it is justified. It gets around loads of problems like recursion of XSD complexTypes, confusion over imports of different complexTypes that represent the same object and having numerous schema imports to cope with the different namespaces.
I want to get the whole of this working and demo'd and then I'll put a wiki page together on it.
So the concepts already exist in RDF/OWL; we are just discussing a representation of them to map into TAPIR networks.
It should be possible for TAPIR providers to appear like semantic web applications - but not SPARQL servers.
All the best,
Roger
On Mar 22, 2007, at 12:28, Renato De Giovanni wrote:
Hi Roger,
Can you give an example of the URI using a fragment identifier for a concept source? Are you thinking about something like this:
http://somehost/somepath?cs=darwincore1.4
It will probably be the simplest solution now. The configuration interface (and the CNS handler) can be changed later to support URIs that don't specify a conceptual schema.
Best Regards,
Renato
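The two URI styles in this exchange differ in where the selection happens: a fragment identifier (`#...`) is not sent to the server in an HTTP request, so the configurator must fetch the whole CNS and pick the concept_source itself, whereas a query parameter such as `?cs=` does reach the server, which could then return just that concept_source. A sketch of a helper that recognises both, assuming `cs` is the parameter name (taken from Renato's example); the function name is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit


def locate_concept_source(uri: str) -> Tuple[str, Optional[str]]:
    """Return (URL to fetch, concept_source alias or None)."""
    parts = urlsplit(uri)
    if parts.fragment:
        # Fragment convention: the fragment never reaches the server,
        # so fetch the whole CNS and select the alias client-side.
        return urlunsplit(parts._replace(fragment="")), parts.fragment
    selected = parse_qs(parts.query).get("cs")
    if selected:
        # Query convention: the server could filter, so fetch the URI as given.
        return uri, selected[0]
    # Neither convention: the whole CNS, no concept_source chosen yet.
    return uri, None
```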
On 22 Mar 2007 at 14:23, Roger Hyam wrote:
I am trying to get my head round this and figure out if it matters or not.
When someone is running a configurator on a wrapper, they need to pick sets of concepts (concept_source) that they will map for a particular endpoint.
The configurator needs to get these sets of concepts from somewhere that is managed centrally for any one thematic network so that it can be kept up to date.
The configurator will probably know about some sets of concepts when it is installed but the user needs to be able to specify other sets.
In the case of the set of concepts being contained in an XML Schema there is a 1:1 relationship between the set and a URI.
In the case of the set of concepts being contained in a CNS file (as currently specified), there is potentially a one-to-many relationship, where the URI may refer to many sets of concepts in a single file, unless we adopt a convention of using a fragment identifier in the URI to specify a particular concept_source within the CNS.
Is there a plan to register TAPIR provider endpoints in the GBIF UDDI registry? Or is there some other mechanism that will be used to discover TAPIR providers?
thanks, Dave V.
We are ready to register them and start indexing them into the new data portal.
Thanks,
Donald
On Mar 23, 2007, at 6:16 PM, Vieglais, David A wrote:
Is there a plan to register TAPIR provider endpoints in the GBIF UDDI registry? Or is there some other mechanism that will be used to discover TAPIR providers?
thanks, Dave V.
------------------------------------------------------------ Donald Hobern (dhobern@gbif.org) Deputy Director for Informatics Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480 ------------------------------------------------------------
Roger, a nice simple diagram. That's good to have, great!
The simple temporary text format decision was pretty much mine, because I had an easy parser for it and it's very readable and therefore manageable. If it looks as if we are going to stay with it for a while, I don't really mind using some other RDF or XML format. Would this ease the development of the TapirLink configurator? It would mean some recoding of pywrapper and Roger's scripts, but I guess that's still a rather small amount of work.
sorry for being lazy lately -- Markus
On 22.03.2007, at 21:26, Roger Hyam wrote:
participants (6)
- Dave Vieglais
- Donald Hobern
- Markus Döring
- Renato De Giovanni
- Roger Hyam
- Vieglais, David A