Hi again,
I think Steve's point about aggregating data is a critical one. When you have a single data source, you have a lot more control over issues such as protocol, effort, data quality, etc. In a distributed data environment, the data analyst needs enough information to make an assessment of the data being used.
This always comes down to what questions you want to ask of your data. The goal is not necessarily to describe all data, but to share data for particular purposes. If you are talking about population monitoring, anything that affects your chances of detecting a particular organism during an event is potentially of interest to the analyst. I think we'll need to take on one piece at a time, or we risk being overwhelmed. Here's an attempt to summarize the main issues at hand:
- data quality/uncertainty (what are the parameters needed to describe it, and how do they differ among communities). Some of it is a property of the protocol (e.g., citizen-science vs. professional biologists), and some is a property of individual observations. Both are important.
- protocol (what are the parameters needed to describe it, and how do they differ among communities). Protocol issues include how to measure effort (for how long did you observe, how big an area did you cover), taxonomic focus (what are the taxa of interest? all birds? all aquatic invertebrates? all waterfowl? all raptors?), and more.
- sampling framework (is this a stratified survey, what are your strata, etc.). This is somewhat related to protocol, but you may also need information about individual strata (size, etc.).
- ancillary data (e.g., environmental data such as weather or water chemistry, or community-specific information about the organisms, such as fat score for birds). Which ones are important and, most importantly, to answer what specific questions? This is probably the most complex aspect of having multiple communities involved. A lot of this will probably require specific extensions.
- detectability. I'm not sure how we can describe it in a data schema. It may be more something you want to be sure you can estimate from the data itself. If so, what are the components needed to do so?
- are there other topics?
I think we've already made some progress within the bird community on many of these fronts with the BMDE schema, but there's still some work to do, especially on the protocol description standards. I'm not sure what the best way is to make progress on these things for the larger monitoring community.
What about building a dictionary of monitoring protocols that we can refer to from the data schema? If each protocol had a unique ID, and this information was available somewhere outside of the data schema, it would probably simplify things immensely. The data schema would still need to carry some protocol information (e.g., if the duration varies among individual events following a given protocol). Were you thinking about something along those lines, Steve?
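To make the dictionary idea a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the class names (Protocol, SamplingEvent), the field names, and the ID "PROTO-001" are assumptions, not part of BMDE or any existing schema. It only shows the split I have in mind: protocol-level defaults live in an external dictionary keyed by a unique ID, and each event carries the ID plus any values that vary from one event to the next.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Protocol:
    """One entry in a hypothetical external protocol dictionary."""
    protocol_id: str                               # unique ID referenced from the data schema
    name: str
    taxonomic_focus: str                           # e.g. "all birds", "all waterfowl"
    default_duration_min: Optional[float] = None   # None if duration is not fixed


@dataclass
class SamplingEvent:
    """One event record in the data schema (illustrative fields only)."""
    event_id: str
    protocol_id: str                               # pointer into the protocol dictionary
    duration_min: Optional[float] = None           # set only when it differs per event


# Hypothetical dictionary content, kept outside the data schema itself.
PROTOCOLS: Dict[str, Protocol] = {
    "PROTO-001": Protocol(
        protocol_id="PROTO-001",
        name="Example point count",
        taxonomic_focus="all birds",
        default_duration_min=10.0,
    ),
}


def effective_duration(event: SamplingEvent,
                       protocols: Dict[str, Protocol]) -> Optional[float]:
    """Return the event's own duration if recorded, else the protocol default.

    Raises KeyError if the event refers to an unknown protocol ID.
    """
    protocol = protocols[event.protocol_id]
    if event.duration_min is not None:
        return event.duration_min
    return protocol.default_duration_min


standard_event = SamplingEvent(event_id="E1", protocol_id="PROTO-001")
shortened_event = SamplingEvent(event_id="E2", protocol_id="PROTO-001",
                                duration_min=7.5)
```

With this layout an analyst can recover effort for any event from the ID alone, and the event record only needs extra fields where a protocol allows variation.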
Good night
Denis
Denis Lepage, Senior Scientist/Chercheur sénior
National Data Center/Centre national des données
Bird Studies Canada/Études d'Oiseaux Canada
PO Box/B.P. 160, Port Rowan, ON N0E 1M0
519-586-3531 ext. 225, fax/téléc. 519-586-3532