John et al.,
I think it's a good idea to use a survey to test for consensus, but it should be as simple as possible. I would suggest perhaps two questions:
--------------
1- Would you be satisfied if the current DarwinCore and its two extensions (curatorial and geospatial) became an official standard (even if you don't fully agree with everything)?
yes/no
2- If not, what do you think should still be improved/fixed?
[text]
---------------
Regarding your questions, maybe it would be interesting to move the discussion to the Wiki so that more people can participate and all answers can be visualized together.
I agree with Dave that it's important to define the nature of DwC. Personally, I see it as an XML vocabulary, not a real data model like the TDWG ontology.
Now my answers:
1) Yes, I agree that species occurrence would be the right scope for the core. However, I agree with Hannu that some concepts in the core can easily cause confusion when interpreted by field observers. A few adjustments in concept names and perhaps a new observation extension could be worthwhile.
As a side note, if we decide to make any changes in the existing schemas it's important to change the namespace because they are already being used by providers. It's possible to map concepts from two schemas if they have different namespaces (see http://rs.tdwg.org/tapir/cs/mappings/) allowing providers to easily upgrade their configuration when necessary.
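To illustrate why distinct namespaces make an upgrade easy, here is a minimal sketch. It assumes hypothetical namespace URIs and a hypothetical concept rename. It does not reproduce the TAPIR mapping format published at the URL above. It only shows the idea: because old and new concepts have different qualified names, a provider's existing bindings can be translated one concept at a time, and anything not yet mapped stays valid under the old namespace.

```python
# Hypothetical namespace URIs for illustration only; NOT the real DwC namespaces.
OLD_NS = "http://example.org/dwc/1.4/"
NEW_NS = "http://example.org/dwc/2.0/"

# Hypothetical concept-level mapping from the old schema to a revised one.
# Because the namespaces differ, both qualified names can coexist unambiguously.
CONCEPT_MAP = {
    OLD_NS + "ScientificName": NEW_NS + "ScientificName",  # unchanged name, new namespace
    OLD_NS + "Collector": NEW_NS + "RecordedBy",           # hypothetical rename
}


def upgrade_provider_config(config: dict) -> dict:
    """Rewrite a provider's concept -> local column bindings using CONCEPT_MAP.

    Concepts with no entry in CONCEPT_MAP keep their old qualified name,
    so a partially upgraded configuration remains usable.
    """
    return {CONCEPT_MAP.get(concept, concept): column for concept, column in config.items()}


if __name__ == "__main__":
    old_config = {OLD_NS + "Collector": "coll_name", OLD_NS + "Locality": "loc"}
    print(upgrade_provider_config(old_config))
```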
2) Fully agree with Dave.
3) Fully agree with Dave and Hannu.
4) I tend to agree with John, unless we expect that a few observation/specimen data providers will be able to offer additional GUIDs (such as taxon concept GUIDs) in the next few years. I doubt this will happen soon, but I may be wrong.
5) I see the current application schema more as an example. There can be as many application schemas as necessary from TAPIR's perspective. This one works well for flat data structures. ABCD or something along the lines of Markus' new schema would be better options when there can be repeatable elements.
6) Fully agree with John's answer. I think it's important to allow data cleaning through valid XML instances.
Best Regards, -- Renato
I got sidetracked from this days ago, but in light of the recent star schema discussions on the original caching thread, I feel the time is right again to submit this new discussion.
By tradition, DwC discussions take place on this TAPIR mailing list. I'm starting this new thread based on Markus' recent posting (below) about an Identification extension to DwC. I'm motivated to pull together the time and energy to finally push the pending DwC through the standards process, with the goal of having that whole process finished by the TDWG Meeting this year. I've been thinking about how to conduct the Request for Comment required to move the standard forward. I propose to put together a survey with Survey Monkey or something similar to actually test for reasonable consensus. Any comments or suggestions about this idea are welcome. However, I see benefits to having further discussion about some key issues before doing that, as I believe we now have enough accumulated experience to make some good decisions that will affect the design and guidelines for further development of the Darwin Core and extensions.
In the past, most Darwin Core discussions have revolved around whether to include a particular concept, and where. I think it will be much more useful to concentrate on a few key issues at a higher level, resolve them at that level, then make any necessary changes to the schemas based on the consensus guiding principles. If the principles are clear and simple, this should be easy and fast to accomplish, and it should be possible to complete this work soon if we can reach a consensus. Here are some seed questions and recommendations to facilitate the resolution of this next step in the process.
1) Is species occurrence in nature and in collections the right scope for the Core?
2) Should the general philosophy of the Core be inclusive or minimalist? What are the characteristics of a concept that allow it to be in the Core? What are the characteristics of a concept that allow it to be added to an existing extension?
3) What are the defining characteristics of a group of related concepts that justify the creation of a new extension? Should extensions be based on abstract conceptual groupings/objects (events, identifications/determinations, places)? Or on special interests (paleo, curation, interaction)? Or on the stability of the concepts (core contains the proven stable concepts, extensions are more volatile)?
4) Should there be elements in the Core and extensions to hold GUIDs linking them to instances of related classes of objects, such as an occurrence to a TaxonConceptGUID, or an occurrence to a CoreGatheringGUID? Should every extension have a non-mandatory GUID allowing for the external resolution of the object?
5) What should the Darwin Tapir application schema look like?
6) Is it the right approach to have restrictions on content at the concept definition level? Where should the line be drawn? Arguments have been raised in the past about the DwC and extensions' content with respect to being restrictive versus open to incorrect content. For example, DayOfYear in the current DwC 1.4 (http://rs.tdwg.org/dwc/tdwg_dw_core.xsd) is typed as a dwc:dayOfYearDataType, which is defined in http://rs.tdwg.org/dwc/tdwg_basetypes.xsd as:
    <xs:simpleType name="dayOfYearDataType">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1" />
        <xs:maxInclusive value="366" />
      </xs:restriction>
    </xs:simpleType>
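To make the restrictive option concrete, here is a rough sketch of what the type above accepts and rejects. It assumes only the facets shown (an xs:integer lexical form bounded by 1 and 366). It is a hand-written check, not a schema validator, and the function name is hypothetical. Under the restrictive approach, a record with a value such as "0" or "367" would make the XML instance invalid. Under the open approach, the same check could run later as a separate data-cleaning step that flags the value instead of rejecting it.

```python
import re

# Lexical form of xs:integer: an optional sign followed by ASCII digits.
# xs:integer collapses whitespace, so leading/trailing blanks are stripped first.
_XS_INTEGER = re.compile(r"[+-]?[0-9]+")

# Bounds taken from the minInclusive/maxInclusive facets of dayOfYearDataType.
MIN_DAY, MAX_DAY = 1, 366


def is_valid_day_of_year(text: str) -> bool:
    """Hypothetical helper: True if `text` satisfies dayOfYearDataType as quoted."""
    value = text.strip()
    if not _XS_INTEGER.fullmatch(value):
        return False  # not an xs:integer at all, e.g. "12th" or "1.5"
    return MIN_DAY <= int(value) <= MAX_DAY


if __name__ == "__main__":
    for raw in ["1", " 366 ", "0", "367", "+45", "12th", ""]:
        print(repr(raw), is_valid_day_of_year(raw))
```

Note that 366 is accepted for every year, so a value of 366 in a non-leap year would still pass this type. That is one example of where the line between schema restrictions and data cleaning has to be drawn.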
Anything short of a flamethrower in response is welcome.