Hi Everyone,
The TDWG Infrastructure Project team have a milestone/deliverable to produce a document to act as a road map for the TDWG Architecture.
I have prepared a document that I believe pulls together the ideas and consequences from the TAG-1 meeting and the two GUID meetings plus various discussions held at the OGC meeting in Edinburgh. This document introduces the notion of a 'Semantic Hub' which is a proposed mechanism to expose TDWG semantics through multiple technologies: XML Schema; GML; OWL; etc.
I would be most grateful if you could find time to have a look at this six page document:
http://wiki.tdwg.org/twiki/bin/view/TAG/ArchitectureOverview
and give any feedback on the wiki page, via this list or directly to me.
Many thanks,
Roger
Hi Roger,
Nice document. Just some comments:
-Tonto is a bit of a bad name in Spanish.
-Have you considered something about controlled vocabularies in the semantic hub? Or maybe what I prefer to call managed controlled vocabularies. For example, in ABCD there is a concept called "KindOfRecord"; it is not a controlled vocabulary, it is just free text. It would be too difficult to provide a fixed list of terms for it, so it would be great if the list could somehow be built by the community. Let's say that I want to map my database to this field: I could get a list of proposed terms already being used, and if none of them satisfies me I can create my own. It is a little bit like tagging in a controlled way. I love the del.icio.us example: it suggests tags, and most of the time I use them, and by doing this the data becomes much more accessible because the tags have not exploded. The opposite is what is happening now in ABCD: everybody uses a different term for the same thing and the unified data becomes useless.
-In the implementation section you say something like "Data Providers must map their data to these views", referring to views from the semantic hub. This is actually what we are trying to avoid. TAPIR was created at the beginning with the vision of data providers mapping their databases once and being accessible through different views that are explicitly declared in the request. We have changed the name now and we are calling them outputModels. On the other hand, you know that WASABI and PyWrapper are now becoming multi-protocol. That means that we want providers to map their databases once and make the data available in different protocols.
So, again, within TAPIR itself and across protocols we need something like the semantic hub you are proposing. And we are doing it right now, but in a very primitive way. I am working on the implementation of the BioMOBY protocol inside PyWrapper. I have created a mapping file between the TDWG schemas and the MOBY data types registry so that I can resolve questions like:
-Ok, if I have these TDWG concepts mapped, which MOBY services could I create?
-How can I create these MOBY types using TDWG concepts?
As I said, this is now being implemented in a simple flat file that will be available on the Internet for all data providers, but I am not accessing it as a service and I have to do all the handling on the client. The semantic hub you are proposing is exactly what we need and want in order to do this more properly.
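A sketch of the kind of question such a mapping file answers: given the TDWG concepts a provider has mapped, which MOBY services become possible? All concept names, MOBY type names and service names below are invented for illustration; this is not PyWrapper's actual file format.

```python
# Hypothetical mapping from TDWG concepts to the MOBY data types
# they can populate (names invented, not the real registry).
CONCEPT_TO_MOBY = {
    "DarwinCore/ScientificName": ["MobyTaxonName"],
    "DarwinCore/DecimalLatitude": ["MobyGeoPoint"],
    "DarwinCore/DecimalLongitude": ["MobyGeoPoint"],
}

# Which MOBY types does each (invented) service consume?
MOBY_SERVICE_INPUTS = {
    "getTaxonInfo": {"MobyTaxonName"},
    "plotOccurrence": {"MobyGeoPoint"},
}

def possible_services(mapped_concepts):
    """List the MOBY services whose input types can all be built
    from the TDWG concepts a provider has mapped."""
    available_types = set()
    for concept in mapped_concepts:
        available_types.update(CONCEPT_TO_MOBY.get(concept, []))
    return sorted(
        service
        for service, required in MOBY_SERVICE_INPUTS.items()
        if required <= available_types
    )
```

Served from a hub instead of a flat file, the same lookup would answer both of Javier's questions without any client-side handling.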
So... summarizing, from our side I can imagine now that we need:
-the semantic hub must expose the concepts in a way that we can use them in the configuration tools of data providers, to allow mapping a database there.
-the semantic hub must expose the different views, or outputModels as we call them in TAPIR, so that provider software can produce them.
There is a full list of other requirements I would love to see there; they can be found in Dave Thau's work on a schema repository, do you remember? http://ww3.bgbm.org/schemarepowiki
Especially, there were things like:
-Give an XSLT that transforms ABCD 1.2 into 2.06
-Give me labels for this concept
-Give me examples of values for this concept
...
I hope to be half as clear as your document is :)
Javier.
On 8/7/06, Roger Hyam roger@tdwg.org wrote:
Hi Everyone,
The TDWG Infrastructure Project team have a milestone/deliverable to produce a document to act as a road map for the TDWG Architecture.
I have prepared a document that I believe pulls together the ideas and consequences from the TAG-1 meeting and the two GUID meetings plus various discussions held at the OGC meeting in Edinburgh. This document introduces the notion of a 'Semantic Hub' which is a proposed mechanism to expose TDWG semantics through multiple technologies: XML Schema; GML; OWL; etc.
I would be most grateful if you could find time to have a look at this six page document:
http://wiki.tdwg.org/twiki/bin/view/TAG/ArchitectureOverview
and give any feedback on the wiki page, via this list or directly to me.
Many thanks,
Roger
--
Roger Hyam Technical Architect Taxonomic Databases Working Group
http://www.tdwg.org roger@tdwg.org
+44 1578 722782
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
Hi Javier,
Comments below.
Nice document. Just some comments:
-Tonto is a bit of a bad name in Spanish.
Yes, I know it means Fool. I thought it was appropriate for something that people might think of as a knowledge management tool - it keeps things in perspective. Ricardo thought it was OK, so I presume it is OK in Brazilian Portuguese, but I am not sure about Spanish in all the different countries. If anyone knows it to be very derogatory or a 'bad' word in any particular country, let me know and I'll change it. Fool is quite mild in English. Fool also means "a cold dessert consisting of fruit puree and whipped cream" in English.
-Have you considered something about controlled vocabularies in the semantic hub? Or maybe what I prefer to call managed controlled vocabularies. For example, in ABCD there is a concept called "KindOfRecord"; it is not a controlled vocabulary, it is just free text. It would be too difficult to provide a fixed list of terms for it, so it would be great if the list could somehow be built by the community. Let's say that I want to map my database to this field: I could get a list of proposed terms already being used, and if none of them satisfies me I can create my own. It is a little bit like tagging in a controlled way. I love the del.icio.us example: it suggests tags, and most of the time I use them, and by doing this the data becomes much more accessible because the tags have not exploded. The opposite is what is happening now in ABCD: everybody uses a different term for the same thing and the unified data becomes useless.
There will be instances of classes in Tonto - I should have mentioned that. Chatting to Rob Gales about it, it seems a good way of doing controlled vocabularies. They will be extensible because Tonto will always be capable of change. Also, in some languages - OWL etc. - you could always define your own instances outside of Tonto, but it would depend on how we do the coding for the XML Schema based renderings of the semantics.
Populating drop-down menus with information out of Tonto when someone is mapping a data source is the ultimate goal - the dream!
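That dream could be as simple as serving term lists from the hub. A toy sketch, assuming terms live as GUID-identified instances; the concept name, terms and LSIDs here are all invented:

```python
# Hypothetical store of approved terms for one ABCD-style concept.
# Each value is an instance with its own (invented) GUID.
VOCABULARY = {
    "KindOfRecord": [
        {"guid": "urn:lsid:example.org:term:1", "label": "specimen"},
        {"guid": "urn:lsid:example.org:term:2", "label": "observation"},
    ],
}

def dropdown_options(concept, user_terms=()):
    """Return (label, guid) pairs for a mapping tool's drop-down:
    community-approved terms first, then any user-supplied extras,
    which have no GUID yet because they are not blessed."""
    options = [(t["label"], t["guid"]) for t in VOCABULARY.get(concept, [])]
    options += [(label, None) for label in user_terms]
    return options
```

The user-supplied branch is the del.icio.us-style escape hatch Javier asks for: suggest the blessed terms, but never block a new one.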
-In the implementation section you say something like "Data Providers must map their data to these views", referring to views from the semantic hub. This is actually what we are trying to avoid. TAPIR was created at the beginning with the vision of data providers mapping their databases once and being accessible through different views that are explicitly declared in the request. We have changed the name now and we are calling them outputModels. On the other hand, you know that WASABI and PyWrapper are now becoming multi-protocol. That means that we want providers to map their databases once and make the data available in different protocols.
The plan is that data mapping only has to be done to one of the views that Tonto has onto its internal semantics then the other views/representations could be used as outputModels (or custom output models could be created by clients etc). The goal is definitely to only map once - a single set of semantics - but then represent in multiple ways. Tonto could provide a view of the semantics that a graphical tool could then pick up to help some one build a mapping file - the dream...
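A minimal sketch of the map-once/represent-many idea: one internal concept description rendered into two representations. The concept record and both renderings are deliberately simplified placeholders, not a proposed format.

```python
# One invented internal concept record (the "single set of semantics").
CONCEPT = {
    "name": "ScientificName",
    "type": "string",
    "definition": "The full name of the taxon.",
}

def render_xsd(concept):
    # Simplified XML Schema rendering: one element declaration,
    # with namespaces and structure omitted for brevity.
    return '<xs:element name="%s" type="xs:%s"/>' % (
        concept["name"], concept["type"])

def render_rdfs(concept):
    # The same concept as an RDF Schema property in Turtle-ish text.
    return ':%s a rdf:Property ; rdfs:comment "%s" .' % (
        concept["name"], concept["definition"])
```

The point is that both renderings are generated from the same record: a provider maps to the internal concept once, and each output model is just another `render_*` function.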
So, again, within TAPIR itself and across protocols we need something like the semantic hub you are proposing. And we are doing it right now, but in a very primitive way. I am working on the implementation of the BioMOBY protocol inside PyWrapper. I have created a mapping file between the TDWG schemas and the MOBY data types registry so that I can resolve questions like:
-Ok, if I have these TDWG concepts mapped, which MOBY services could I create?
-How can I create these MOBY types using TDWG concepts?
This is definitely where everyone is heading. The idea with the hub is to start by clearly defining the semantic constructs we are going to use (classes, properties, instances, literals, ranges) so that we can be sure that we can represent the semantics in different 'representations'. It is no good using some UML or OWL construct that doesn't have a good representation in XML Schema or GML for example.
This plan will not answer all the questions on the first day. The existing schemas will need to be mapped into Tonto so they are represented in a uniform way, and this will take time - but months, not years, I hope. It is not always clear in the existing schemas what the classes and properties are, for example, so this is not an automatic process but will need thinking through - and there are issues around cardinality and multiple inheritance that will need to be discussed.
As I said, this is now being implemented in a simple flat file that will be available on the Internet for all data providers, but I am not accessing it as a service and I have to do all the handling on the client. The semantic hub you are proposing is exactly what we need and want in order to do this more properly.
I am glad to hear that. I hope that we can generate the file you need to do the mapping automatically in future.
So... summarizing, from our side I can imagine now that we need:
-the semantic hub must expose the concepts in a way that we can use them in the configuration tools of data providers, to allow mapping a database there.
Just specify the way and we will make it do it.
-the semantic hub must expose the different views, or outputModels as we call them in TAPIR, so that provider software can produce them.
Just specify them and we will write a script to produce them from Tonto.
There is a full list of other requirements I would love to see there; they can be found in Dave Thau's work on a schema repository, do you remember? http://ww3.bgbm.org/schemarepowiki
Especially, there were things like:
-Give an XSLT that transforms ABCD 1.2 into 2.06
Ah ha! This is where I have to say no!
What I am proposing is that we have a central semantic model that can be presented in a multitude of ways. This is *very* different from a service that can automatically transform one existing schema to another. That is far more ambitious and may not be possible in what remains of human history. We should build the central semantic model out of the existing schemas, but that is a manual process where decisions will have to be made about what was meant by the different constructs in the existing schemas. Think about the different ways inclusion, adjacency, cardinality and type extension/restriction are used within XML Schema documents and what they 'mean' in terms of GML feature types or RDFS classes and properties. In general there just isn't a mapping.
When they built GML they started with the model of feature types and properties and then decided how they would represent this using XML and control it with XML Schema. We need to take a similar approach: decide on what our modeling technique is, then decide how we will represent the model in different technologies, including GML and semantic web stuff.
A crazy analogy would be to say it is like getting a machine to write a story in different languages based on the same plot. This might be achievable because the plot can be encoded in some machine readable way and the machine can just use rules to bolt together stored sentences. Adding another language is just a matter of new output rules and new stored sentences. It wouldn't produce great literature but it would work.
This is very different from asking a machine to read a book in one language, understand the plot and print out the story in a completely different language. For a start, there may be things expressed in the first language for which there is no direct equivalent. Just ask a babel fish!
I think this is the main way the Semantic Hub proposal differs from the Schema Repository approach. I am trying to make something happen by restricting the scope - if you remember the talk I gave on managing scope and resources.
I hope this is OK.
-Give me labels for this concept
This should be easy. Internationalization won't be in the first implementation but should be in subsequent versions.
-Give me examples of values for this concept
This should be easy but isn't in the highest priority.
...
I hope to be half as clear as your document is :)
Sounds good. Thank you again for your input.
All the best,
Roger
Javier.
Hi Roger,
Tonto is not a bad word in Spanish; it is the same as you described for English. I thought you might not have considered it, but if you like it and want to have presentations like:
"...well and then we have Roger Hyam who is the tonto ("fool") developer"
it is up to you ;)
-the semantic hub must expose the concepts in a way that we can use them in the configuration tools of data providers, to allow mapping a database there.
Just specify the way and we will make it do it.
Oops, there is no format that we have considered. I was thinking more that you were going to tell me that you would give me an OWL file and I was going to have to understand it. But if you ask, then I want it as a serialized Python object :D No, seriously, we have to think about this a little bit more.
-the semantic hub must expose the different views, or outputModels as we call them in TAPIR, so that provider software can produce them.
Just specify them and we will write a script to produce them from Tonto.
Well, there is already an outputModel format in TAPIR. I can give you some examples of an outputModel for BioMOBY that is created from Darwin Core concepts, or a GML application schema outputModel from ABCD.
Especially, there were things like:
-Give an XSLT that transforms ABCD 1.2 into 2.06
Ah ha! This is where I have to say no!
[and a lot of arguing...]
Roger, you got me wrong. There is already an XSLT, made by humans, that transforms ABCD 1.2 to ABCD 2.06. What I would like is a repository that just stores these documents, so that if a program asks "Hey Tonto! Do you have an XSLT to transform ABCD 1.2 to ABCD 2.06?" it could say "Hey, look! I am not such a fool, I have one!" and the program becomes happy.
Actually I think this is the easiest thing to implement in Tonto.
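What Javier describes really is small: essentially a lookup table keyed by source and target schema. A sketch, where the schema labels and stylesheet URL are invented placeholders:

```python
# Hypothetical transform repository: (from_schema, to_schema) -> XSLT location.
TRANSFORMS = {}

def register(from_schema, to_schema, xslt_location):
    """Store the location of a human-written transform."""
    TRANSFORMS[(from_schema, to_schema)] = xslt_location

def find_transform(from_schema, to_schema):
    """Answer "do you have an XSLT from X to Y?" - None means nobody wrote one."""
    return TRANSFORMS.get((from_schema, to_schema))

# The existing, human-made stylesheet gets registered once
# (the URL here is made up for illustration).
register("ABCD 1.2", "ABCD 2.06", "http://example.org/abcd12-to-206.xslt")
```

Note the repository never generates a transform; it only stores and serves ones people have already written, which is exactly the distinction Roger drew above.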
Best regards,
Javi.
Hi Roger & Javier,
Roger, thanks for sending the link to the technical roadmap. Reading that document followed by Javier's comments and your responses, made me think of a few questions and comments of my own:
-Have you considered something about controlled vocabularies in the semantic hub? Or maybe what I prefer to call managed controlled vocabularies. For example, in ABCD there is a concept called "KindOfRecord"; it is not a controlled vocabulary, it is just free text. It would be too difficult to provide a fixed list of terms for it, so it would be great if the list could somehow be built by the community. Let's say that I want to map my database to this field: I could get a list of proposed terms already being used, and if none of them satisfies me I can create my own. It is a little bit like tagging in a controlled way. I love the del.icio.us example: it suggests tags, and most of the time I use them, and by doing this the data becomes much more accessible because the tags have not exploded. The opposite is what is happening now in ABCD: everybody uses a different term for the same thing and the unified data becomes useless.
There will be instances of classes in Tonto - I should have mentioned that. Chatting to Rob Gales about it, it seems a good way of doing controlled vocabularies. They will be extensible because Tonto will always be capable of change. Also, in some languages - OWL etc. - you could always define your own instances outside of Tonto, but it would depend on how we do the coding for the XML Schema based renderings of the semantics.
Populating drop-down menus with information out of Tonto when someone is mapping a data source is the ultimate goal - the dream!
First, a silly point. The term "controlled vocabulary" is used by some as a synonym for "ontology". Can we find or coin a new term for concepts with a range of enumerated values that isn't overloaded?
I agree with Rob that any "concept" with enumerated values ought to have those values represented as instances with assigned GUIDs. Some values will be blessed by the fact that they are stored in the core model but people will be free to invent their own without breaking existing software or mapping rules (at the expense of losing interoperability with software that only understands the approved values).
-In the implementation section you say something like "Data Providers must map their data to these views", referring to views from the semantic hub. This is actually what we are trying to avoid. TAPIR was created at the beginning with the vision of data providers mapping their databases once and being accessible through different views that are explicitly declared in the request. We have changed the name now and we are calling them outputModels. On the other hand, you know that WASABI and PyWrapper are now becoming multi-protocol. That means that we want providers to map their databases once and make the data available in different protocols.
The plan is that data mapping only has to be done to one of the views that Tonto has onto its internal semantics then the other views/representations could be used as outputModels (or custom output models could be created by clients etc). The goal is definitely to only map once - a single set of semantics - but then represent in multiple ways. Tonto could provide a view of the semantics that a graphical tool could then pick up to help some one build a mapping file - the dream...
This ties in with a few questions I have about the semantic hub and Tonto.
First, since some of us are also involved in working to create data models that one day may be added to the semantic hub, is there a defined list of the common subset of modeling constructs (between UML, XSD, OWL, and RDFS) and suggestions about how to implement them? Are there discussions about what constructs will be dropped and the trade-offs of different implementations?
For example, it could be argued that N-ary associations could be implemented in RDFS and OWL (and perhaps in XML-Schemas that can describe directed labeled graphs through the use of GUIDs), but from the research community, the recommended implementation for N-ary associations in RDF-based systems is reification. As implementors of systems that work with RDF-based data, we feel that reification is not the way to go and that it may be better to drop support for N-ary associations rather than put in place a "flawed work-around" like reification. Just off the top of my head there are also issues with modeling arbitrary cardinality, cardinality on one or both sides of an association, primitive type mapping and data type promotion, modeling aggregation and/or composition (sequences and bags mean anonymous nodes which don't play nice in a system that uses GUIDs to name resources), and how to implement many of these modeling constructs in XSD which was designed to describe trees (not graphs) and has no built-in notion of global identifiers.
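To make the contrast concrete, here is one reading of the two encodings as plain subject/predicate/object triples. The property names, blank node label and LSIDs are all invented for illustration; only the rdf: vocabulary terms are real RDF.

```python
# RDF reification: four bookkeeping triples (type, subject, predicate,
# object) just to let a fifth triple say who asserted a statement.
reified = [
    ("_:stmt1", "rdf:type", "rdf:Statement"),
    ("_:stmt1", "rdf:subject", "lsid:specimen42"),
    ("_:stmt1", "rdf:predicate", "identifiedAs"),
    ("_:stmt1", "rdf:object", "lsid:taxon7"),
    ("_:stmt1", "assertedBy", "lsid:agent3"),
]

# The alternative Steve alludes to: promote the association itself to a
# first-class, GUID-named resource (here an invented "Identification"
# node) instead of reifying - no anonymous statement node required.
nary = [
    ("lsid:ident99", "rdf:type", "Identification"),
    ("lsid:ident99", "ofSpecimen", "lsid:specimen42"),
    ("lsid:ident99", "toTaxon", "lsid:taxon7"),
    ("lsid:ident99", "assertedBy", "lsid:agent3"),
]
```

The second form is why anonymous nodes "don't play nice" with GUIDs: the reified statement has no stable identifier to resolve, while `lsid:ident99` does.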
I don't mean to get bogged down in detail or to make trouble, but creation of the concrete data models (in XSD, RDFS, OWL, etc.) from the abstract semantic core will depend on sorting all these issues out. Is there a place where these discussions are happening and is there some way that implementors can feed back into the decisions made on these issues by the technical infrastructure group?
Once they have been formalized, I think it may be important that these modeling recommendations be made available to the community in a document. One idea behind the semantic core is that it can grow over time. As the community models new areas of biodiversity informatics there has to be a way for the new data models to be incorporated into the semantic core (after being blessed by some TDWG body). This will be easier if the people creating new data models understand which modeling constructs they have available to them.
So, again, within TAPIR itself and across protocols we need something like the semantic hub you are proposing. And we are doing it right now, but in a very primitive way. I am working on the implementation of the BioMOBY protocol inside PyWrapper. I have created a mapping file between the TDWG schemas and the MOBY data types registry so that I can resolve questions like:
-Ok, if I have these TDWG concepts mapped, which MOBY services could I create?
-How can I create these MOBY types using TDWG concepts?
This is definitely where everyone is heading. The idea with the hub is to start by clearly defining the semantic constructs we are going to use (classes, properties, instances, literals, ranges) so that we can be sure that we can represent the semantics in different 'representations'. It is no good using some UML or OWL construct that doesn't have a good representation in XML Schema or GML for example.
This plan will not answer all the questions on the first day. The existing schemas will need to be mapped into Tonto so they are represented in a uniform way, and this will take time - but months, not years, I hope. It is not always clear in the existing schemas what the classes and properties are, for example, so this is not an automatic process but will need thinking through - and there are issues around cardinality and multiple inheritance that will need to be discussed.
Is there a more complete description of Tonto functionality? I'm still not clear about what kind of interface it will provide to the network of services (if any). Is it correct to think of Tonto as a schema repository, a collaborative schema editor, or a tool that will be used by technical infrastructure group to generate concrete implementations of the semantic core in various typing systems (XSD, RDFS, OWL, etc)? It may be lack of sleep, but I took the wording in section 5.2 to suggest all three at different points.
What I am proposing is that we have a central semantic model that can be presented in a multitude of ways. This is *very* different from a service that can automatically transform one existing schema to another. That is far more ambitious and may not be possible in what remains of human history. We should build the central semantic model out of the existing schemas, but that is a manual process where decisions will have to be made about what was meant by the different constructs in the existing schemas. Think about the different ways inclusion, adjacency, cardinality and type extension/restriction are used within XML Schema documents and what they 'mean' in terms of GML feature types or RDFS classes and properties. In general there just isn't a mapping.
When they built GML they started with the model of feature types and properties and then decided how they would represent this using XML and control it with XML Schema. We need to take a similar approach: decide on what our modeling technique is, then decide how we will represent the model in different technologies, including GML and semantic web stuff.
I agree completely.
On the idea that we can generate GML feature types directly from the semantic hub, there is a tangential but related point that hasn't received much discussion. It centers around how GUIDs will work within GML application schemas. GML app schemas probably shouldn't contain many LSIDs because GIS apps can't resolve them to get at the underlying data (and LSIDs aren't very informative when they appear as labels for features on maps). So any direct translation of semantic hub classes into GML app schemas may be of limited value. Instead GML app schemas will probably have to be composed by selecting a set of properties with literals values, even if that means dereferencing LSIDs into associated objects to do so. But that's a discussion that can be held later.
-Give me examples of values for this concept
This should be easy but isn't in the highest priority.
This could be difficult when the value of a concept is not a single string value, but one of several different types of objects.
-Steve
Hi Steve,
Good to have your input.
This thread is getting a little difficult to read so I'll just pick out a few bits and then spin out some new threads.
Steve Perry wrote:
First, a silly point. The term "controlled vocabulary" is used by some as a synonym for "ontology". Can we find or coin a new term for concepts with a range of enumerated values that isn't overloaded?
I agree with Rob that any "concept" with enumerated values ought to have those values represented as instances with assigned GUIDs. Some values will be blessed by the fact that they are stored in the core model but people will be free to invent their own without breaking existing software or mapping rules (at the expense of losing interoperability with software that only understands the approved values).
Sounds good. A terminology to talk about terminology would be good. I think 'enumerated values' is quite a good one, but I am open to suggestions. Any property whose range is the instances of a class is really an enumeration, though.
First, since some of us are also involved in working to create data models that one day may be added to the semantic hub, is there a defined list of the common subset of modeling constructs (between UML, XSD, OWL, and RDFS) and suggestions about how to implement them? Are there discussions about what constructs will be dropped and the trade-offs of different implementations?
I am hoping to kick the discussion off. Donald made the point on an earlier version of the road map document that I should expand on the basic lists of constructs (I use the word construct as I can't think of another term for the things we use to construct our ontology) but I decided to leave it 'enigmatic' so it would kick off the debate.
For example, it could be argued that N-ary associations could be implemented in RDFS and OWL (and perhaps in XML-Schemas that can describe directed labeled graphs through the use of GUIDs), but from the research community, the recommended implementation for N-ary associations in RDF-based systems is reification. As implementors of systems that work with RDF-based data, we feel that reification is not the way to go and that it may be better to drop support for N-ary associations rather than put in place a "flawed work-around" like reification. Just off the top of my head there are also issues with modeling arbitrary cardinality, cardinality on one or both sides of an association, primitive type mapping and data type promotion, modeling aggregation and/or composition (sequences and bags mean anonymous nodes which don't play nice in a system that uses GUIDs to name resources), and how to implement many of these modeling constructs in XSD which was designed to describe trees (not graphs) and has no built-in notion of global identifiers.
What I would like to do is start from the position of zero constructs and then add them in one at a time (stopping when we have the bare minimum we need to communicate) rather than look for an intersection of known target languages. This is because less means less implementation/documentation etc.; it also means that when another language appears that we have to support, we are less likely to have used something that it can't do. Rather contentiously, I think it will also make it easier for us to agree on models - easier for people to understand and appreciate the alternative ways of modeling something before settling on one.
I am just thinking about how to kick off the debate on constructs and have started a wiki page here:
http://wiki.tdwg.org/twiki/bin/view/TAG/TDWGOntologyConstructs
but running out of time today so will have to work on it tomorrow.
The big ones I'd like to start discussion with are cardinality and multiple inheritance. I have good arguments for not having either (just not sure how to state the inheritance ones). I'll maybe kick off with an email on cardinality tomorrow. I tried it out on Jessie, Rob and Robert last week and they seemed OK about it - but they may have just been bored!
The big question is "What is business/application logic and what is semantics?"
I don't mean to get bogged down in detail or to make trouble, but creation of the concrete data models (in XSD, RDFS, OWL, etc.) from the abstract semantic core will depend on sorting all these issues out. Is there a place where these discussions are happening and is there some way that implementors can feed back into the decisions made on these issues by the technical infrastructure group? Once they have been formalized, I think it may be important that these modeling recommendations be made available to the community in a document. One idea behind the semantic core is that it can grow over time. As the community models new areas of biodiversity informatics there has to be a way for the new data models to be incorporated into the semantic core (after being blessed by some TDWG body). This will be easier if the people creating new data models understand which modeling constructs they have available to them.
What I would like to think is that, for the basic agreed semantics, we will be able to model from within Tonto. Application logic modeling can happen outside, but the basic agreed semantics have to be modeled collaboratively, extending what is already there - or it isn't agreed.
Is there a more complete description of Tonto functionality? I'm still not clear about what kind of interface it will provide to the network of services (if any). Is it correct to think of Tonto as a schema repository, a collaborative schema editor, or a tool that will be used by technical infrastructure group to generate concrete implementations of the semantic core in various typing systems (XSD, RDFS, OWL, etc)? It may be lack of sleep, but I took the wording in section 5.2 to suggest all three at different points.
The Semantic Hub as a notion will be a place where semantic stuff is stored. Tonto will be an application to do 'ontology governance': effectively a simple web application that allows the building of a single, agreed ontology according to a set of rules (the constructs that are permitted) and then exposes the ontology (writes files out somewhere) in different technologies. The semantics are internal to Tonto (controlled by its business rules) and exposed in the flavours that are required.
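In other words, the governance rule could amount to a membership test against the agreed construct list (the five constructs named earlier in the thread: classes, properties, instances, literals, ranges). A sketch with an invented model format; the example model deliberately includes cardinality, which Roger argues against:

```python
# The agreed construct list (taken from the thread; the exact set
# is still up for debate, so treat this as illustrative).
PERMITTED_CONSTRUCTS = {"class", "property", "instance", "literal", "range"}

def validate_model(model):
    """Return the constructs used by a model that are not permitted.
    `model` is a list of (construct, name) pairs - an invented format."""
    used = {construct for construct, _name in model}
    return sorted(used - PERMITTED_CONSTRUCTS)

# A submission that would be rejected under these rules:
submission = [
    ("class", "Specimen"),
    ("property", "collectedBy"),
    ("cardinality", "1..*"),
]
```

A real Tonto would of course check far more than construct names, but keeping the permitted set this small is what makes every rendering (XSD, RDFS, OWL, GML) guaranteed to be expressible.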
All this is only possible by managing the scope of what is done. If we extend the scope of what Tonto is supposed to do too far then it will never work. The lessons of the Gene Ontology echo in my mind. They had far too few constructs and they ended up with something that people complain about. People complain about it because they are using it! There are plenty of OWL ontologies out there that no one complains about because no one is using them. I'd like to hit the happy medium.
Currently I am hacking together a first iteration of Tonto as I believe it is easier (and quicker) to demonstrate the thing as a proof of concept than try and formally describe it, justify it and then engineer it properly. It is probably quicker to discover what is not possible this way. I hope to have a version up for comment in a few weeks and certainly for demo by October.
This should be easy but isn't in the highest priority.
This could be difficult when the value of a concept is not a single string value, but one of several different types of objects.
You are right, it could be difficult. In which case it should only be done if we can't achieve our business goals without it - the same applies to choosing constructs.
Hope this helps.
I'll kick off a thread on why we shouldn't have cardinality tomorrow morning my time - then you can pick it to pieces ;)
All the best,
Roger
participants (3)
- Javier de la Torre
- Roger Hyam
- Steve Perry