Advance of DarwinCore postponed.
A surprise at the recent TDWG annual meeting was the withdrawal of DarwinCore2 from the ballot of standards up for recommendation. This was done because recent changes need more time for review and explanation; it does NOT mean that support for the DarwinCore has been withdrawn, nor does it mean that we have to wait until next year to finalize the schema. With some diligent work over the next two to three months, we can finalize the DarwinCore2, test it, and begin broader deployment. If you are a current or potential user of the DarwinCore, please note that additional work needs to be done and your participation would be appreciated. The longer version of what transpired in St. Petersburg follows below.
On July 10th, more than 60 days before the annual meeting as the TDWG bylaws require, John Wieczorek and I posted a revised version of the DarwinCore2 on the review web site (http://darwincore.calacademy.org/Documentation/DarwinCore2Draft_v1-4_HTML). This draft incorporated the advisable changes proposed since the October 2004 draft. We then informed the TDWG Executive Committee that the DarwinCore2 was ready to be considered for recommendation as a full TDWG standard.
On September 12th at the annual meeting, the conveners of the respective subgroups presented their proposed standards in a plenary session. I summarized the current state of the DarwinCore2, the changes made since last year, and the rationale behind them. The most significant change was to remove the geospatial elements from the core and place them in a geospatial extension. We did this for two reasons: 1) to follow the emerging best practice of constructing schemas to be interoperable across domains; and 2) to improve the stability of the core by making it smaller. Unfortunately, this change caught many people by surprise, and serious questions were raised in the discussion period. These questions and my responses to them are summarized here.
1) Geospatial elements are critical to many users of the DarwinCore, so why remove them?
Response: There is broad agreement among data architects, within TDWG and beyond, that the best way to achieve interoperability across domains is to import external schemas (for example, GML) and thereby the elements they contain, rather than redundantly defining conceptually equivalent elements in our own namespace, as we have done in the past. An alternative we did not discuss, but will consider in the coming weeks, is to import the GML elements directly into the core rather than into an extension. In either case, we are certain that the DarwinCore, or application schemas based on the DarwinCore, will import their geospatial elements from GML.
2) These critical geospatial elements are now relegated to a specification that is not at the same level of maturity as the DarwinCore. Is this a good thing to do?
Response: This will be inevitable if we develop our information domain by solidifying the areas with the greatest commonality and deferring more specialized elements to subsequent work by the appropriate stakeholders.
[A point I didn't make at the meeting, but would like to make now, is that GML is actually more mature, or at least more broadly deployed, than any TDWG standard. So the elements in the geospatial extension are more stable than the DarwinCore, not less stable.]
3) Can the geospatial extension include the GML elements for lines and polygons as well as those for points?
Response: This will need further investigation, but would certainly be desirable in the long run.
Another cautionary marker appeared the next day among the contributed papers. Roger Hyam gave a presentation about managing change among interdependent schemas. He described a situation in which version dependencies among schemas could require them to be upgraded simultaneously, which effectively eliminates the main benefit of separating them in the first place. Gregor Hagedorn challenged the generality of Roger's point, arguing that proper referencing of schemas could ensure the required flexibility. The issue was left open and clearly requires further study and resolution, preferably in the form of a recommendation from a group of technical architects to the groups constructing references among schemas.
As the deadline approached to open the vote on standards up for recommendation, Adrian Rissone and Walter Berendsohn each approached me (Stan) separately and asked me to withdraw the DarwinCore from the ballot. Both of them said that people had expressed to them the opinion that the recent changes to the DarwinCore were too large relative to the earlier draft, and too recent to warrant putting the schema up for recommendation as a standard. Although I believe the DarwinCore would have received at least the required simple majority of votes, in the interest of broader consensus I agreed to withdraw it from the ballot. Therefore, at the beginning of the final session on Tuesday, I "announced" that we had agreed to take the DarwinCore off the ballot while the larger architectural issues were settled. I went on to explain that in my judgement the best course of action would include the following tasks:
1) convene technical experts to develop an explicit recommendation about how to incorporate elements from one schema into another (i.e., to address the issue Roger Hyam raised);
2) work with the various DarwinCore user communities to determine the most appropriate allocation of elements to the core and its extensions, producing explicit lists of elements that would be available to each specialist community by constructing their schema from the core, GML, and their own extensions;
3) resolve these issues as quickly as possible (2-3 months), and fix the resulting draft as version 2.0;
4) develop explicit instructions for upgrading providers to the DarwinCore2 and begin testing.
At the end of the discussion I asked for spoken expressions of agreement or disagreement with withdrawing the DarwinCore from the ballot. About 10 people voiced support for withdrawing it, no one expressed disagreement, and most were silent (perhaps stunned). It was done, and now the work goes on.
An architecture interest group is being established and will be announced on the main TDWG mailing list. Roger Hyam will lead that group until more formal arrangements are made.
Finally, I want to point out some problems we have had with group dynamics. I was too busy this last year to devote sufficient time to promoting discussion and developing consensus. Things are only going to get worse for me with the need to revise the general TDWG standards development process. Therefore, John Wieczorek and Renato de Giovanni are going to take over managing the DarwinCore, though I will do my best to continue to participate. I think passive publishing (e.g., via a web site or wiki) is ineffective if the tempo of contributions is episodic. People stop visiting the web site when activity drops off, and they don't come back unless something is pushed into their inbox. Therefore, I would like to encourage stakeholders in the DarwinCore to subscribe to the email list. Instructions can be found at http://circa.gbif.net/tdwg.
We will do our best to respond to your concerns and keep the discussion moving.
Sincerely,
Stan Blum