I got sidetracked on this a few days ago, but in light of the recent star schema discussions on the original caching thread, I feel the time is right again to start this new discussion.
By tradition, DwC discussions take place on this Tapir mailing list. I'm starting this new thread based on Markus' recent posting (below) about an Identification extension to DwC. I'm motivated to pull together the time and energy to finally push the pending DwC through the standards process, with the goal of finishing that whole process by the TDWG Meeting this year. I've been thinking about how to conduct the Request for Comment required to move the standard forward. I propose to put together a survey, with Survey Monkey or something similar, to actually test for reasonable consensus. Any comments or suggestions about this idea are welcome. However, I see benefits in discussing some key issues further before doing that, as I believe we now have enough accumulated experience to make good decisions that will affect the design and guidelines for further development of the Darwin Core and its extensions.
In the past, most Darwin Core discussions have revolved around whether to include a particular concept, and where. I think it will be much more useful to concentrate on a few key issues at a higher level, resolve them at that level, then make any necessary changes to the schemas based on the agreed guiding principles. If the principles are clear and simple and we can reach consensus easily, this work should be quick to complete. Here are some seed questions and recommendations to facilitate the resolution of this next step in the process.
1) Is species occurrence in nature and in collections the right scope for the Core?

2) Should the general philosophy of the Core be inclusive or minimalist? What are the characteristics of a concept that allow it to be in the Core? What are the characteristics of a concept that allow it to be added to an existing extension?

3) What are the defining characteristics of a group of related concepts that justify the creation of a new extension? Should extensions be based on abstract conceptual groupings/objects (events, identifications/determinations, places)? Or on special interests (paleo, curation, interaction)? Or on the stability of the concepts (core contains the proven stable concepts, extensions are more volatile)?

4) Should there be elements in the Core and extensions to hold GUIDs linking them to instances of related classes of objects, such as an occurrence to a TaxonConceptGUID, or an occurrence to a CoreGatheringGUID? Should every extension have a non-mandatory GUID allowing for the external resolution of the object?

5) What should the Darwin Tapir application schema look like?

6) Is it the right approach to have restrictions on content at the concept definition level? Where should the line be drawn? Arguments have been raised in the past about the DwC and extensions' content with respect to being restrictive versus open to incorrect content. For example, DayOfYear in the current DwC 1.4 (http://rs.tdwg.org/dwc/tdwg_dw_core.xsd) is typed as a dwc:dayOfYearDataType, which is defined in http://rs.tdwg.org/dwc/tdwg_basetypes.xsd as:
<xs:simpleType name="dayOfYearDataType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="1" />
    <xs:maxInclusive value="366" />
  </xs:restriction>
</xs:simpleType>
John's two-cent opinions:

1) Yes.

2) Tough question. Inclusive, but with a well-defined path from testing to inclusion. I see no merit in minimalism for its own sake. Candidate concepts should be in scope (the discovery or retrieval of occurrence information), have a demonstrated audience, and be stable following testing. New and untested concepts can go into test extensions and application schemas that import the core, other extensions and the test extension.

3) Alignment with the Core Ontology (http://wiki.tdwg.org/twiki/bin/view/TAG/CoreOntology) is a good guiding principle for the design of extensions. To me this suggests, for example, that an Identification Extension is appropriate as a model for the CoreIdentification. The tough bit is to decide which objects constitute extensions. All of them? Higher-level ones? Ones that are likely to have services built around them? For example, should there be an extension for a CorePlace (Geospatial extension) or a CoreGathering (Geospatial extension with event information)? Another tough bit is to decide if objects that can have a one-to-many relationship with the Core should have Status concepts. For example, given an Identification Extension to the Core, should that extension have an IdentificationStatus concept in which to label a uniquely "accepted" identification? Sorry, more questions than answers here. The same recommendations for inclusion of concepts in the core (above) apply to extensions: they should be tested and stable.

4) Tough question. We don't seem to be completely prepared from the implementation perspective to apply GUIDs to occurrences, let alone other objects whose nature may change over time. Is there a convincing argument one way or another? In the absence of an argument in favor, I guess the default response is "No new GUID concepts".

5) The existing Darwin Record Application Schema for Tapir (http://rs.tdwg.org/dwc/tdwg_dw_record_tapir.xsd) is a good model. It is working well in practice so far. The concepts will have to change to accommodate any changes in the Core or extensions, but the structure and method of composition of the application schema seem sound.

6) No, it isn't the right approach to overly restrict content at the concept definition level, for the simple reason that doing so would remove the need for, and the value of, applications or services built on top of the distributed networks (or caches built from them) that help collections validate their data or detect errors in it. That would be a great loss as an incentive to participate. Besides, application schemas can be built from the existing concept definitions and may further restrict them for specialized purposes.
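To make the last point in (6) concrete, here is a minimal sketch of how an application schema might tighten a loosely typed concept. Everything in it is an assumption for illustration only: the http://example.org/app-schema namespace and the looseDayOfYear and strictDayOfYear type names are hypothetical and are not part of DwC or any existing extension. The loose type is also deliberately more permissive than the real dayOfYearDataType quoted above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch only: namespace and type names are invented. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:app="http://example.org/app-schema"
           targetNamespace="http://example.org/app-schema"
           elementFormDefault="qualified">

  <!-- Stand-in for a permissive concept-level type: any integer validates,
       so values such as 0 or 400 pass and are left for services to flag. -->
  <xs:simpleType name="looseDayOfYear">
    <xs:restriction base="xs:integer" />
  </xs:simpleType>

  <!-- Application-level type derived by restriction from the loose type:
       only integers from 1 to 366 validate. -->
  <xs:simpleType name="strictDayOfYear">
    <xs:restriction base="app:looseDayOfYear">
      <xs:minInclusive value="1" />
      <xs:maxInclusive value="366" />
    </xs:restriction>
  </xs:simpleType>

  <!-- The application schema exposes the concept with the stricter type. -->
  <xs:element name="DayOfYear" type="app:strictDayOfYear" />
</xs:schema>
```

With this layering, a provider publishing against the loose definition is not rejected outright, while a specialized application that needs clean values can still enforce the narrower range on its own side.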
Anything short of a flamethrower in response is welcome.
---------- Forwarded message ----------
From: Markus Döring <mdoering@gbif.org>
Date: Fri, May 16, 2008 at 1:29 AM
Subject: Re: [tdwg-tapir] Fwd: Tapir protocol - Harvest methods?
To: Renato De Giovanni <renato@cria.org.br>
Cc: tdwg-tapir@lists.tdwg.org
Renato,
<snip>
I have created an identification extension for Darwin Core that holds the historical list of identification events and their outcomes. This is a YAML section of the metafile describing the columns for this extension through fully qualified concepts à la TAPIR:
identification:
  - http://rs.tdwg.org/dwc/dwcore/ScientificName
  - http://rs.tdwg.org/dwc/dwcore/AuthorYearOfScientificName
  - http://rs.tdwg.org/dwc/dwcore/Family
  - http://rs.tdwg.org/dwc/dwcore/IdentificationQualifier
  - http://rs.tdwg.org/dwc/curatorial/DateIdentified
  - http://rs.tdwg.org/dwc/curatorial/IdentifiedBy
When creating this I realised that pretty much all the concepts I was interested in already existed in Darwin Core or the curatorial extension. Wouldn't it be wise to reuse those concepts? Or are they strictly tied to the idea of a current identification, and therefore can't be used for historical ones? This is probably more of a Darwin Core question than a TAPIR one, but we are all on this list anyway ...
The XML in that case would look something like this:
<record uri="http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44">
  <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
  ...
  <ident:record>
    <dwc:ScientificName>Aster alpinus</dwc:ScientificName>
    <dwc:AuthorYearOfScientificName>L.</dwc:AuthorYearOfScientificName>
    <dwc:Family>Asteraceae</dwc:Family>
    <cur:DateIdentified>1913-03-12</cur:DateIdentified>
    <cur:IdentifiedBy>Karl Marx</cur:IdentifiedBy>
  </ident:record>
  <ident:record>
    <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
    <dwc:AuthorYearOfScientificName>Novopokr.</dwc:AuthorYearOfScientificName>
    <dwc:Family>Asteraceae</dwc:Family>
    <cur:DateIdentified>2003-09-07</cur:DateIdentified>
    <cur:IdentifiedBy>Keith Richards</cur:IdentifiedBy>
  </ident:record>
</record>
Markus
</snip>