I got sidetracked on this days ago, but feel in the light of recent star schema discussions on the original caching thread that the time is again right to submit this new discussion.
Tradition has DwC discussions on this Tapir mailing list. I'm starting this new thread based on Markus' recent posting (below) about an Identification extension to DwC. I'm motivated to pull together the time and energy to finally push the pending DwC through the standards process, with a goal of having that whole process finished by the TDWG Meeting this year. I've been thinking about how to conduct the Request for Comment required to move the standard forward. I propose to put together a survey with Survey Monkey or something similar to actually test for reasonable consensus. Any comments or suggestions about this idea are welcome. However, I see benefits to having further discussion about some key issues before doing that, as I believe we now have enough accumulated experience to make some good decisions that will affect the design and guidelines for further development of the Darwin Core and extensions.
In the past, most Darwin Core discussions have revolved around whether to include a particular concept, and where. I think it will be much more useful to concentrate on a few key issues at a higher level, resolve them at that level, then make any necessary changes to the schemas based on the consensus guiding principles. It should be easy and fast to accomplish this if the principles are clear and simple, and possible to complete the work soon if we can reach a consensus without much trouble. Here are some seed questions and recommendations to facilitate the resolution of this next step in the process.
1) Is species occurrence in nature and in collections the right scope for the Core?
2) Should the general philosophy of the Core be inclusive or minimalist? What are the characteristics of a concept that allow it to be in the Core? What are the characteristics of a concept that allow it to be added to an existing extension?
3) What are the defining characteristics of a group of related concepts that justify the creation of a new extension? Should extensions be based on abstract conceptual groupings/objects (events, identifications/determinations, places)? Or on special interests (paleo, curation, interaction)? Or on the stability of the concepts (core contains the proven stable concepts, extensions are more volatile)?
4) Should there be elements in the Core and extensions to hold GUIDs linking them to instances of related classes of objects, such as an occurrence to a TaxonConceptGUID, or an occurrence to a CoreGatheringGUID? Should every extension have a non-mandatory GUID allowing for the external resolution of the object?
5) What should the Darwin Tapir application schema look like?
6) Is it the right approach to have restrictions on content at the concept definition level? Where should the line be drawn? Arguments have been raised in the past about the DwC and extensions' content with respect to being restrictive versus open to incorrect content. For example, DayOfYear in the current DwC 1.4 (http://rs.tdwg.org/dwc/tdwg_dw_core.xsd) is typed as a dwc:dayOfYearDataType, which is defined in http://rs.tdwg.org/dwc/tdwg_basetypes.xsd as:
    <xs:simpleType name="dayOfYearDataType">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1" />
        <xs:maxInclusive value="366" />
      </xs:restriction>
    </xs:simpleType>
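To make the effect of that restriction concrete, here is a minimal Python sketch of the check a validator would apply to a DayOfYear value. It only mirrors the two facets shown above (an xs:integer between 1 and 366); the function name `is_valid_day_of_year` is made up for illustration and is not part of any DwC or TAPIR tooling.

```python
import re

# Lexical form of xs:integer: an optional sign followed by one or more digits.
_XS_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_valid_day_of_year(text: str) -> bool:
    """Return True if text is an integer in the range 1..366 (inclusive)."""
    candidate = text.strip()  # tolerate surrounding whitespace
    if not _XS_INTEGER.fullmatch(candidate):
        return False
    return 1 <= int(candidate) <= 366


if __name__ == "__main__":
    for sample in ["1", "366", "0", "367", "12.5", "spring"]:
        print(sample, is_valid_day_of_year(sample))
```

With the restriction in the concept definition, a value such as "367" or "spring" makes the whole XML instance invalid; without it, the same check could run later in a separate data-cleaning step.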
John's two-cent opinions:
1) Yes.
2) Tough question. Inclusive, but with a well-defined path from testing to inclusion. I see no merit in minimalism for its own sake. Candidate concepts should be in scope (the discovery or retrieval of occurrence information), have a demonstrated audience, and be stable following testing. New and untested concepts can go into test extensions and application schemas that import the core, other extensions and the test extension.
3) Alignment with the Core Ontology (http://wiki.tdwg.org/twiki/bin/view/TAG/CoreOntology) is a good guiding principle for the design of extensions. To me this suggests, for example, that an Identification Extension is appropriate as a model for the CoreIdentification. The tough bit is to decide which objects constitute extensions. All of them? Higher-level ones? Ones that are likely to have services built around them? For example, should there be an extension for a CorePlace (Geospatial extension) or a CoreGathering (Geospatial extension with event information)? Another tough bit is to decide if objects that can have a one-to-many relationship with the Core should have Status concepts. For example, given an Identification Extension to the Core, should that extension have an IdentificationStatus concept in which to label a uniquely "accepted" identification? Sorry, more questions than answers here. The same recommendations for inclusion of concepts in the core (above) apply to extensions - that they should be tested and stable.
4) Tough question. We don't seem to be completely prepared from the implementation perspective to apply GUIDs to occurrences, let alone other objects whose nature may change over time. Is there a convincing argument one way or another? In the absence of an argument in favor, I guess the default response is "No new GUID concepts".
5) The existing Darwin Record Application Schema for Tapir (http://rs.tdwg.org/dwc/tdwg_dw_record_tapir.xsd) is a good model. It is working well in practice so far. The concepts will have to change to accommodate any changes in the Core or extensions, but the structure and method of composition of the application schema seem sound.
6) No, it isn't the right approach to overly restrict content at the concept definition level for the simple reason that if we do that, we will remove the need and value of applications or services built on top of the distributed networks (or caches built from them) to help collections validate or do error detection on their data. That would be a great loss as an incentive to participate. Besides, application schemas can be built from the existing concept definitions and may further restrict them for specialized purposes.
Anything short of a flamethrower in response is welcome.
---------- Forwarded message ----------
From: Markus Döring mdoering@gbif.org
Date: Fri, May 16, 2008 at 1:29 AM
Subject: Re: [tdwg-tapir] Fwd: Tapir protocol - Harvest methods?
To: Renato De Giovanni renato@cria.org.br
Cc: tdwg-tapir@lists.tdwg.org
Renato,
<snip>
I have created an identification extension for darwin core that holds the historical list of identification events and their outcome. This is a YAML section of the metafile describing the columns for this extension through fully qualified concepts ala TAPIR:
    identification:
      - http://rs.tdwg.org/dwc/dwcore/ScientificName
      - http://rs.tdwg.org/dwc/dwcore/AuthorYearOfScientificName
      - http://rs.tdwg.org/dwc/dwcore/Family
      - http://rs.tdwg.org/dwc/dwcore/IdentificationQualifier
      - http://rs.tdwg.org/dwc/curatorial/DateIdentified
      - http://rs.tdwg.org/dwc/curatorial/IdentifiedBy
When creating this I realised that pretty much all concepts I was interested in already existed in darwin core or the curatorial extension. Wouldn't it be wise to reuse those concepts? Or are they strictly tied to the idea of a current identification and therefore can't be used for historical ones? This is probably more of a darwin core question than TAPIR, but we are all on this list anyway ...
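As a rough illustration of the reuse described above, the following Python sketch groups the concept URIs from the metafile section by namespace. The URIs are copied from the listing; the helper names `split_concept` and `group_by_namespace` are invented for this example and do not belong to any TAPIR library.

```python
from __future__ import annotations

from collections import defaultdict

# Concept URIs copied from the "identification" section of the metafile.
IDENTIFICATION_COLUMNS = [
    "http://rs.tdwg.org/dwc/dwcore/ScientificName",
    "http://rs.tdwg.org/dwc/dwcore/AuthorYearOfScientificName",
    "http://rs.tdwg.org/dwc/dwcore/Family",
    "http://rs.tdwg.org/dwc/dwcore/IdentificationQualifier",
    "http://rs.tdwg.org/dwc/curatorial/DateIdentified",
    "http://rs.tdwg.org/dwc/curatorial/IdentifiedBy",
]


def split_concept(uri: str) -> tuple[str, str]:
    """Split a fully qualified concept URI into (namespace, local name)."""
    namespace, _, local_name = uri.rpartition("/")
    return namespace + "/", local_name


def group_by_namespace(uris: list[str]) -> dict[str, list[str]]:
    """Group concept local names under the namespace they are defined in."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for uri in uris:
        namespace, local_name = split_concept(uri)
        grouped[namespace].append(local_name)
    return dict(grouped)


if __name__ == "__main__":
    for namespace, names in group_by_namespace(IDENTIFICATION_COLUMNS).items():
        print(namespace, names)
```

Every column resolves to either the dwcore or the curatorial namespace, so the extension as listed introduces no concepts of its own.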
The XML in that case would look something like this:
    <record uri="http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44">
      <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
      ...
      <ident:record>
        <dwc:ScientificName>Aster alpinus</dwc:ScientificName>
        <dwc:AuthorYearOfScientificName>L.</dwc:AuthorYearOfScientificName>
        <dwc:Family>Asteraceae</dwc:Family>
        <cur:DateIdentified>1913-03-12</cur:DateIdentified>
        <cur:IdentifiedBy>Karl Marx</cur:IdentifiedBy>
      </ident:record>
      <ident:record>
        <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
        <dwc:AuthorYearOfScientificName>Novopokr.</dwc:AuthorYearOfScientificName>
        <dwc:Family>Asteraceae</dwc:Family>
        <cur:DateIdentified>2003-09-07</cur:DateIdentified>
        <cur:IdentifiedBy>Keith Richards</cur:IdentifiedBy>
      </ident:record>
    </record>
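For anyone wanting to see how a consumer might read such a record, here is a small Python sketch that pulls the identification history out of the XML with the standard library. The dwc and cur namespace URIs are taken from the concept URIs in the metafile; the ident namespace URI is a placeholder I made up, since the message does not give one, and the function name `identification_history` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "dwc": "http://rs.tdwg.org/dwc/dwcore/",
    "cur": "http://rs.tdwg.org/dwc/curatorial/",
    "ident": "http://example.org/ident/",  # placeholder, not a real TDWG namespace
}

SAMPLE = """<record xmlns:dwc="http://rs.tdwg.org/dwc/dwcore/"
        xmlns:cur="http://rs.tdwg.org/dwc/curatorial/"
        xmlns:ident="http://example.org/ident/"
        uri="http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44">
  <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
  <ident:record>
    <dwc:ScientificName>Aster alpinus</dwc:ScientificName>
    <dwc:AuthorYearOfScientificName>L.</dwc:AuthorYearOfScientificName>
    <dwc:Family>Asteraceae</dwc:Family>
    <cur:DateIdentified>1913-03-12</cur:DateIdentified>
    <cur:IdentifiedBy>Karl Marx</cur:IdentifiedBy>
  </ident:record>
  <ident:record>
    <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
    <dwc:AuthorYearOfScientificName>Novopokr.</dwc:AuthorYearOfScientificName>
    <dwc:Family>Asteraceae</dwc:Family>
    <cur:DateIdentified>2003-09-07</cur:DateIdentified>
    <cur:IdentifiedBy>Keith Richards</cur:IdentifiedBy>
  </ident:record>
</record>"""


def identification_history(record_xml: str) -> list:
    """Return the nested identification events, oldest first."""
    root = ET.fromstring(record_xml)
    history = []
    for event in root.findall("ident:record", NS):
        history.append({
            "name": event.findtext("dwc:ScientificName", default="", namespaces=NS),
            "author": event.findtext("dwc:AuthorYearOfScientificName", default="", namespaces=NS),
            "date": event.findtext("cur:DateIdentified", default="", namespaces=NS),
            "identified_by": event.findtext("cur:IdentifiedBy", default="", namespaces=NS),
        })
    # ISO 8601 dates sort correctly as plain strings.
    history.sort(key=lambda entry: entry["date"])
    return history


if __name__ == "__main__":
    for entry in identification_history(SAMPLE):
        print(entry["date"], entry["name"], entry["author"], "-", entry["identified_by"])
```

The same dwc:ScientificName concept appears both at the top level (the current identification) and inside each ident:record (a historical one), and only its position in the record tells the two roles apart.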
Markus
</snip>
Hi John, 0. Is the DarwinCore a) a data model or b) a set of common terms used for searching species related data (i.e. indexes)? The original intent of the DwC is (b). Its use as a data model was secondary. Both are required but are not necessarily the same thing and much confusion seems to arise when the two intents are mixed.
1. Specimens and observations are within scope of the DwC, but from an IR point of view, it would be good if DwC elements / concepts can be used to index other content (e.g. search for documents by dwc:ScientificName), thus providing a common mechanism for discovery of species related information.
2. Minimalist with appropriate mechanisms for extension. The current situation is ridiculous with 40 or so versions floating around, all with overlapping concepts, but no mapping between definitions. There is a (crude) extension mechanism that, as far as I know, has only been used by OBIS. All other versions of the DwC have completely ignored extension, thus leading to the current incompatibility between many data providers. There needs to be a robust core that ideally is never changed, and mechanisms for extension so that specialist groups can modify the model to their needs without losing interoperability.
For search terms (0.b), this can be pretty much a simple vocabulary (list of terms). For a data model (0.a) there are many mechanisms. The semantic web offers many examples of content definitions that can be reused, embedded, and extended.
3. Extension is necessary when the semantics of the available definitions are insufficient for an application. A group (any group) should be able to create an extension without fear of breaking the DarwinCore. Ideally, any creator of an extension should carefully evaluate existing extensions and use those where appropriate.
4. There should be a search term "GUID" in the core. Content models do not need to contain a GUID (though they should), but must be identifiable and resolvable by a GUID. Relationships between objects should be through GUIDs. Sounds a lot like RDF.
5. The DRAST (Darwin Record Application Schema for Tapir) looks ok, but is really orthogonal to the approach taken by a much broader community (e.g. examine FOAF, DC, etc), though in line with the OGC models. Who knows which is "better"? Both work, though the latter approach is arguably more difficult to utilize in "mashup" approaches / applications.
6. (0.a) Defining a data model for interoperability and integration of content implies restrictive constraints on element definitions. (0.b) the search engine (data provider) should be able to figure it out (lax definitions).
Dave V.
On Thu, May 22, 2008 at 1:15 PM, John R. WIECZOREK tuco@berkeley.edu wrote:
I got sidetracked on this days ago, but feel in the light of recent star
...
John & Co.
It is good to read that DwC would now be finalised as a TDWG standard. When we are promoting biodiversity data sharing, it helps to convince new followers when we can unambiguously state that we are using real agreed standards.
My comments on some of the points:
1) Is species occurrence in nature and in collections the right scope for the Core?
Occurrence of an organism in nature is the core of the core.
Collection specimens are already a step more specialised, and those elements could go to a specimen extension or curatorial extension.
This is more or less already taken care of, and what is left is mainly a matter of language. I always have trouble explaining to field observers that they need to use Collector for Observer or Reporter. It is confusing to speak of Collector when no specimen was collected. The same goes for CollectionCode, which really is something like CatalogCode or DatasetName.
2) Should the general philosophy of the Core be inclusive or minimalist? What are the characteristics of a concept that allow it to be in the Core? What are the characteristics of a concept that allow it to be added to an existing extension?
Minimalist core but inclusive extensions.
3) What are the defining characteristics of a group of related concepts that justify the creation of a new extension? Should extensions be based on abstract conceptual groupings/objects (events, identifications/determinations, places)? Or on special interests (paleo, curation, interaction)? Or on the stability of the concepts (core contains the proven stable concepts, extensions are more volatile)?
I favour an approach where the extensions are created by communities of users, such as invasive species, agriculture, forestry, observers, botanic gardens, museum collections, etc. Each of these groups already have their own databases where the necessary elements can be found.
4) Should there be elements in the Core and extensions to hold GUIDs linking them to instances of related classes of objects, such as an occurrence to a TaxonConceptGUID, or an occurrence to a CoreGatheringGUID? Should every extension have a non-mandatory GUID allowing for the external resolution of the object?
The core now has GlobalUniqueIdentifier, which can be used to resolve to the other identifiers for gathering, taxon etc. But that is tricky, and I doubt many people can really deal with it. I would probably want to add direct GUIDs for the most important linking elements, but package them all into a Linking Extension. In particular we should include a GUID for the taxonomic concept. That is because LSIDs are now available from SP2000, and probably from some others as well. Let's start using them!
The above also concerns the References Elements that need to be removed from the core, IMHO.
6) Is it the right approach to have restrictions on content at the concept definition level?
Restrict only in cases where the restriction stems from a mathematical or logical rule. For agreed sets of values I would refer to community practice. Communities need to keep such lists, which can be changed dynamically as agreed, without the need to update standard versions.
Regards, Hannu
John et al.,
I think it's a good idea to use a survey to test for consensus, but it should be as simple as possible. I would suggest perhaps two questions:
--------------
1- Would you be satisfied if the current DarwinCore and its two extensions (curatorial and geospatial) become an official standard? (even if you don't fully agree with everything).
yes/no
2- If not, what do you think should still be improved/fixed?
[text]
---------------
Regarding your questions, maybe it would be interesting to move the discussion to the Wiki so that more people can participate and all answers can be visualized together.
I agree with Dave that it's important to define the nature of DwC. Personally, I see it as an XML vocabulary, not a real data model like the TDWG ontology.
Now my answers:
1) Yes, I agree that species occurrence would be the right scope for the core. However, I agree with Hannu that some concepts in the core can easily cause confusion when interpreted by field observers. A few adjustments in concept names and perhaps a new observation extension could be worthwhile.
As a side note, if we decide to make any changes in the existing schemas it's important to change the namespace because they are already being used by providers. It's possible to map concepts from two schemas if they have different namespaces (see http://rs.tdwg.org/tapir/cs/mappings/) allowing providers to easily upgrade their configuration when necessary.
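A rough Python sketch of what such a concept mapping does when a provider upgrades: each concept URI in the old namespace is looked up and rewritten to its counterpart in the new one. The new namespace URI, the Collector-to-Observer rename and the function name `upgrade_record` are all invented for illustration; this is not the format used at the mappings URL above.

```python
OLD_NS = "http://rs.tdwg.org/dwc/dwcore/"
NEW_NS = "http://example.org/dwc/revised/"  # hypothetical namespace for a revised schema

# Hypothetical mapping from old concept URIs to their revised counterparts.
CONCEPT_MAPPING = {
    OLD_NS + "ScientificName": NEW_NS + "ScientificName",
    OLD_NS + "Collector": NEW_NS + "Observer",  # illustrative rename only
}


def upgrade_record(record: dict, mapping: dict) -> tuple:
    """Rewrite a record keyed by old concept URIs; also report unmapped concepts."""
    upgraded = {}
    unmapped = []
    for concept, value in record.items():
        target = mapping.get(concept)
        if target is None:
            unmapped.append(concept)  # needs a decision before the upgrade is complete
        else:
            upgraded[target] = value
    return upgraded, unmapped


if __name__ == "__main__":
    old_record = {
        OLD_NS + "ScientificName": "Aster alpinus",
        OLD_NS + "Collector": "Karl Marx",
        OLD_NS + "Family": "Asteraceae",
    }
    new_record, missing = upgrade_record(old_record, CONCEPT_MAPPING)
    print(new_record)
    print("unmapped:", missing)
```

Because the two schemas live in different namespaces, old and new concept URIs can never collide, which is what makes a one-way lookup like this safe.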
2) Fully agree with Dave.
3) Fully agree with Dave and Hannu.
4) I tend to agree with John, unless we expect that a few observation/specimen data providers will be able to offer additional GUIDs (such as taxon concept GUIDs) in the next few years. I doubt this will happen soon, but I may be wrong.
5) I see the current application schema more as an example. There can be as many application schemas as necessary from TAPIR's perspective. This one works well for flat data structures. ABCD or something along the lines of Markus' new schema would be better options when there can be repeatable elements.
6) Fully agree with John's answer. I think it's important to allow data cleaning through valid XML instances.
Best Regards, -- Renato
My only wish is that each 'concept' within DwC consists of a normative description bound to a URI. We can then use these definitions across technologies.
I would even suggest that the standard takes the form of a human readable table of URIs and definitions and any talk of XML Schema, RDF, CSV etc is left either to later sections of the standard or even totally separate standards.
My belief is that the pain we suffer now is because the 'things' were not defined separately from the technology so it is not possible to conceptually map between the transfer formats.
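One way to picture this separation is the Python sketch below: the concepts live in a plain table of URI and definition, and each transfer format is produced from that table rather than defining the concepts itself. The two URIs come from earlier in the thread; the definition strings are placeholders rather than normative text, and the function names and the XML layout are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Technology-neutral concept table: URI -> definition.
# The definitions are placeholders, not the normative wording.
CONCEPTS = {
    "http://rs.tdwg.org/dwc/dwcore/ScientificName": "<normative definition goes here>",
    "http://rs.tdwg.org/dwc/dwcore/Family": "<normative definition goes here>",
}


def local_name(uri: str) -> str:
    """Return the last path segment of a concept URI."""
    return uri.rsplit("/", 1)[-1]


def check_concepts(record: dict) -> None:
    """Refuse records that use a concept missing from the table."""
    unknown = [uri for uri in record if uri not in CONCEPTS]
    if unknown:
        raise KeyError(f"undefined concepts: {unknown}")


def to_csv(record: dict) -> str:
    """Serialise a record (concept URI -> value) as a header row plus one data row."""
    check_concepts(record)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([local_name(uri) for uri in record])
    writer.writerow(list(record.values()))
    return buffer.getvalue()


def to_xml(record: dict) -> str:
    """Serialise the same record as XML, keeping the concept URI on each value."""
    check_concepts(record)
    root = ET.Element("record")
    for uri, value in record.items():
        ET.SubElement(root, "value", concept=uri).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    occurrence = {
        "http://rs.tdwg.org/dwc/dwcore/ScientificName": "Aster alpinus",
        "http://rs.tdwg.org/dwc/dwcore/Family": "Asteraceae",
    }
    print(to_csv(occurrence))
    print(to_xml(occurrence))
```

Since both serialisations are driven by the same URIs, mapping between them is a lookup rather than a comparison of two independent schemas.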
Just my thoughts,
Roger
On 29 May 2008, at 21:43, Renato De Giovanni wrote:
John et al.,
...
tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
I completely agree - back to "model it first using a serialisation-independent method", e.g. UML, then work towards specific schemas for specific use cases which are "solution oriented" rather than a modeling technique.
I always wondered whether schemas such as DwC are more like "suggested implementations" or "best practices" for specific models and specific use cases. E.g. DwC could be represented as the best practice transfer standard for simple specimen/observation records (using xml or rdf?), referencing a standard set of "concepts" (by URI), and use a format like:
    <xs:schema targetNamespace="http://rs.tdwg.org/dwc/dwcore/"
        xmlns:dwe="http://rs.tdwg.org/dwc/dwelement"
        xmlns:dwc="http://rs.tdwg.org/dwc/dwcore/"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified" version="0.7"
        xmlns:tto="http://rs.tdwg.org/ontology/voc/TaxonOccurrence#" >
      ...
      <xs:element ref="tto:basisOfRecordString ( http://rs.tdwg.org/ontology/voc/TaxonOccurrence#basisOfRecordString )" minOccurs="1" />
      <xs:element ref="tto:identifiedToString ( 'http://rs.tdwg.org/ontology/voc/TaxonOccurrence#identifiedTo%22/' )" minOccurs="1" />
      ...
    </xs:schema>
Let me know if this is complete rubbish?? Kevin
"Roger Hyam (TDWG)" rogerhyam@mac.com 4/06/2008 8:33 p.m. >>>
My only wish is that each 'concept' within DwC consists of a normative description bound to a URI. We can then use these definitions across technologies.
I would even suggest that the standard takes the form of a human readable table of URIs and definitions and any talk of XML Schema, RDF, CSV etc is left either to later sections of the standard or even totally separate standards.
My belief is that the pain we suffer now is because the 'things' were not defined separately from the technology so it is not possible to conceptually map between the transfer formats.
Just my thoughts,
Roger
On 29 May 2008, at 21:43, Renato De Giovanni wrote:
John et al.,
I think it's a good idea to use a survey to test for consensus, but it should be as simple as possible. I would suggest perhaps two questions:
1- Would you be satisfied if the current DarwinCore and its two extensions (curatorial and geospatial) become an official standard? (even if you don't fully agree with everything).
yes/no
2- If not, what do you think should still be improved/fixed?
[text]
Regarding your questions, maybe it would be interesting to move the discussion to the Wiki so that more people can participate and all answers can be visualized together.
I agree with Dave that it's important to define the nature of DwC. Personally, I see it as an XML vocabulary, not a real data model as the TDWG ontology.
Now my answers:
- Yes, I agree that species occurrence would be the right scope for
the core. However, I agree with Hannu that some concepts in the core can easily cause confusion when interpreted by field observers. A few adjustments in concept names and perhaps a new observation extension could be worthwhile.
As a side note, if we decide to make any changes in the existing schemas it's important to change the namespace because they are already being used by providers. It's possible to map concepts from two schemas if they have different namespaces (see http://rs.tdwg.org/tapir/cs/mappings/) allowing providers to easily upgrade their configuration when necessary.
Fully agree with Dave.
Fully agree with Dave and Hannu.
I tend to agree with John, unless we expect that a few
observation/specimen data providers will be able to offer additional GUIDs (such as taxon concept GUIDs) in the next years. I doubt this will happen soon, but I may be wrong.
- I see the current application schema more as an example. There
can be as many application schemas as necessary from TAPIR's perspective. This one works well for flat data structures. ABCD or something along the lines of Markus' new schema would be better options when there can be repeatable elements.
- Fully agree with John's answer. I think it's important to allow
data cleaning through valid XML instances.
Best Regards,
Renato
tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
participants (7)
- Dave Vieglais
- Hannu Saarenmaa
- John R. WIECZOREK
- Kevin Richards
- Markus Döring
- Renato De Giovanni
- Roger Hyam (TDWG)