What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons-learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz Google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and RDF) in the bioblitz data profile [1]. I think the lessons-learned document should include an online catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well-used and well-supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion Table so that they could have expressed their coordinates in DwC, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user-friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for Flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up and see that any scientific name is acceptable, at any taxonomic rank or with no rank at all. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was needed to capture the data on a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
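To make points 1 and 4 above concrete, here is a minimal sketch in Python of a plain CSV file whose column headers are simply Darwin Core terms. It is only an illustration under stated assumptions: the sample rows, the helper names (to_dwc, write_dwc_csv) and the choice to rewrite geo:lat/geo:long as decimalLatitude/decimalLongitude with an explicit geodeticDatum are hypothetical and are not taken from the bioblitz profile.

```python
import csv
import io

# Hypothetical field-sheet rows; the values are made up for illustration.
observations = [
    {"scientificName": "Danaus plexippus", "geo:lat": "41.5",
     "geo:long": "-70.7", "obs:observedBy": "A. Student"},
    # A transcriber needed basisOfRecord for this sheet and just added it.
    {"scientificName": "Quercus", "geo:lat": "41.6",
     "geo:long": "-70.6", "basisOfRecord": "HumanObservation"},
]


def to_dwc(row, datum="WGS84"):
    """Return a copy of row with geo:lat/geo:long expressed as DwC terms."""
    out = {k: v for k, v in row.items() if k not in ("geo:lat", "geo:long")}
    out["decimalLatitude"] = row["geo:lat"]
    out["decimalLongitude"] = row["geo:long"]
    out["geodeticDatum"] = datum  # assumed datum, stated explicitly
    return out


def write_dwc_csv(rows, handle):
    """Write rows as CSV; the header is the union of all terms seen."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in terms that a given row does not use.
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


buffer = io.StringIO()
columns = write_dwc_csv([to_dwc(r) for r in observations], buffer)
print(buffer.getvalue())
```

Because the header is built from whatever terms the rows actually use, adding a new column such as basisOfRecord needs no schema change; rows that lack the term simply get an empty cell.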
Happy Thanksgiving to all in Canada - Joel.
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
Joel, thanks for this nice summary. Darwin Core definitely needs far more examples to illustrate its use. We should probably provide examples for all the different use cases we already know DwC can be used for.
Darwin Core has a recordedBy term. Did that not fit your obs:observedBy property, and if not, what exactly is the difference? See http://rs.tdwg.org/dwc/terms/index.htm#recordedBy
Thanks, Markus
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Markus,
The issue is that the observer and the recorder are not always the same person. A student or a team member might report the observation to a teacher/team leader, who then creates the record. As you and Tim point out, the definition of recordedBy does allow us to use the term for both observers and record creators, but then we lose the distinction between the two.
Joel.
I see. From what I understand (please correct me if I'm wrong, John) dwc:recordedBy is rather the primary observer in the field than the person keying in the DwC metadata record about it. If that's the missing bit I would use dcterms:creator instead: http://dublincore.org/documents/dcmi-terms/#elements-creator and we could think about including this in the official record-level terms in Darwin Core.
dc:source is another record-level term I personally include very often, although it is not explicitly listed in the DwC guide. For species checklists I often want to include a link back to the often richer source webpage.
Markus
On Mon, Oct 11, 2010 at 6:38 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
> I see. From what I understand (please correct me if I'm wrong, John) dwc:recordedBy is rather the primary observer in the field than the person keying in the dwc metadata record about it.
Yes. DwC doesn't have a term for the person creating the record - recordedBy is meant to be the collector(s) or observer(s) in the field who made the determination that a species occurred there.
> If that's the missing bit I would use dcterms:creator instead: http://dublincore.org/documents/dcmi-terms/#elements-creator and we could think about including this into the official record level terms in darwin core.
> dc:source is another record level term I personally include very often although not explicitly listed in the dwc guide. For species checklists I often want to include a link back to the often richer source webpage.
dcterms:source seems like a reasonable addition to the record-level terms borrowed from Dublin Core. There is no evidence that we ever attempted to include it, which means at least that we didn't have a reason to reject it.
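As a rough illustration of the distinction discussed in this exchange, the sketch below keeps the field observer and the record creator in separate terms. It assumes a record is held as a plain Python dictionary; the names, the example.org URL and the roles helper are hypothetical, and using dcterms:creator and dcterms:source as record-level terms is only the suggestion made in this thread, not an adopted part of Darwin Core.

```python
# Hypothetical record: a student reports a sighting to a team leader,
# who then keys in the record.
record = {
    "dwc:scientificName": "Danaus plexippus",
    "dwc:recordedBy": "A. Student",        # observer in the field
    "dcterms:creator": "B. Teamleader",    # person who created the record
    "dcterms:source": "http://example.org/checklist/123",  # made-up URL
}


def roles(rec):
    """Return (observer, record_creator); either is None when not stated."""
    return rec.get("dwc:recordedBy"), rec.get("dcterms:creator")


observer, creator = roles(record)
print(f"observed by {observer}, record created by {creator}")
```

Keeping the two roles in separate terms means that a record where only the observer is known still loads cleanly, and nothing forces the two people to be treated as one.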
Hello, I was not involved with TDWG this year for a variety of reasons. But we have organized a large volume of bird data (about 80 million records) using an extension of Darwin Core. This work may be relevant to encoding bioblitz-type data.
What we were particularly focused on adding was information about the observer, collection event, protocol, and location. You can view the additions at http://www.avianknowledge.net/content/about/bmde-variable-descriptions
The information that we have organized with this extension of Darwin Core has been used successfully in numerous publications, and it includes sufficient information on effort, etc., to allow assumptions of detectability and bias, and the incorporation of negative data.
Denis Lepage spearheaded the development of the Bird Monitoring Data Exchange schema, but was assisted by numerous members of the bird monitoring and informatics community.
Steve Kelling
On Oct 11, 2010, at 14:53, joel sachs wrote:
Markus,
The issue is that the observer and the the recorder are not always the same person. A student or a team member might report the observation to a teacher/team leader, who then creates the record. As you and Tim point out, the definition of recordedBy does allow us to use the term for both observers and record creators, but then we lose the distinction between the two.
Joel.
On Mon, 11 Oct 2010, "Markus Döring (GBIF)" wrote:
Joel, thanks for this nice summary. Darwin Core definitely needs far more examples to illustrate its use. We should probably provide various examples for all the different use cases we already know dwc can be used for.
Darwin Core has a recordedBy term. Did that not fit your obs:observedBy property and if not what exactly is the difference? See http://rs.tdwg.org/dwc/terms/index.htm#recordedBy
Thanks, Markus
On Oct 11, 2010, at 13:46, joel sachs wrote:
One of the goals of the recent bioblitz was to think about the suitability and appropriatness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
- Darwin Core is almost exactly right for citizen science. However,
there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple desciption of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augemented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordiantes in DwC, as, e.g.: DwC:decimalLatitude=41.5 DwC:decimalLongitude=-70.7 DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.
- DwC:scientificName might be more user-friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for Flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or with no rank at all. And once we have a scientific name, higher ranks can be inferred.
- Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
- We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was needed to capture the data on a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
- There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Joel,
Thanks for taking the time to summarise this. A few comments inline:
On Oct 11, 2010, at 1:46 PM, joel sachs wrote:
> i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
Is this not the intention of recordedBy?
http://rs.tdwg.org/dwc/terms/#recordedBy
"A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first."
> ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications.
Keeping an inventory of applications somewhere might be worthwhile to help promote or decide on this.
> - We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
For citizen science, would it not make more sense to apply some easy guideline to select one of:
- HumanObservation
- PreservedSpecimen
- LivingSpecimen
(http://code.google.com/p/darwincore/wiki/RecordLevelTerms)
Basis of record is one of the fundamental fields to know when consuming content, so I think any effort to capture that at source will be worthwhile in the long run.
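One way such an "easy guideline" could be enforced at data-entry time is sketched below in Python. This is only an illustration: the function name check_basis_of_record and the idea of a per-event default are assumptions, and the allowed set contains just the three values named above (the Darwin Core documentation lists further values).

```python
# Only the three values suggested in this message; not the full DwC list.
ALLOWED_BASIS = {"HumanObservation", "PreservedSpecimen", "LivingSpecimen"}


def check_basis_of_record(row, default=None):
    """Return a copy of row whose basisOfRecord is one of ALLOWED_BASIS.

    If the field is missing or blank and a default is given (for example
    "HumanObservation" for a purely observational bioblitz), the default is
    filled in. Any other value raises ValueError so that it can be corrected
    at the source rather than downstream.
    """
    value = (row.get("basisOfRecord") or "").strip()
    if not value and default is not None:
        value = default
    if value not in ALLOWED_BASIS:
        raise ValueError(f"unexpected basisOfRecord: {value!r}")
    return {**row, "basisOfRecord": value}


# Hypothetical record with no basisOfRecord column filled in.
row = check_basis_of_record({"scientificName": "Quercus"}, default="HumanObservation")
print(row["basisOfRecord"])
```

This keeps the schema flexible (the column can still be added late, as the transcriber did) while constraining what goes into it.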
Please could you unsubscribe me from this list? I've tried following the unsubscribe instructions from the website but that hasn't worked.
Thank you, Katie
---------------------------------------- This email and any attachments might contain information that is confidential or protected by legal privilege. We advise that you carry out your own virus checks before opening any attachment. Eton College is a charity registered with HMRC under number X6839. Eton College, Windsor, Berkshire SL4 6DJ
Katie,
I entered your email at the bottom of http://lists.tdwg.org/mailman/listinfo/tdwg-content and clicked "Unsubscribe or edit options". I then clicked "Unsubscribe" on the following page. You won't be unsubscribed until you follow the instructions in your confirmation email.
If you follow these instructions and remain subscribed, send mail directly to the list owners listed at the bottom of the http://lists.tdwg.org/mailman/listinfo/tdwg-content page.
Joel.
Another element is a link between images, sound or whatever and the observation. This is useful since, at least for a time, these media items are independent of the main observation. All information about the same object at the same time and place should be linked by an event identifier. In collections work this might be the collector's sequence number, but any number/string such as Heidorn-20100915-215 that is unique to the item would be fine. It would be associated with the main entry, images, sound or other records. In Flickr this could be put into a tag. Some cameras allow a prefix and auto-increment of a trailing number, which would work. Field observation tools could do the same, and on paper it could be Observer+Date+SequenceNumber or anything else.
-- Bryan
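The Observer+Date+SequenceNumber scheme described above can be sketched in a few lines of Python. The generator name event_keys, the field name eventKey, and the file names are hypothetical; only the identifier pattern (as in Heidorn-20100915-215) comes from the message.

```python
from datetime import date
from itertools import count


def event_keys(observer, day, start=1):
    """Yield keys of the form Observer-YYYYMMDD-SequenceNumber."""
    stamp = day.strftime("%Y%m%d")
    for number in count(start):
        yield f"{observer}-{stamp}-{number}"


keys = event_keys("Heidorn", date(2010, 9, 15), start=215)
first = next(keys)   # "Heidorn-20100915-215"
second = next(keys)  # next key in the sequence, for the next item observed

# Hypothetical records for one observed item: the main entry plus two media
# files. Sharing the same key is what links them together later.
observation = {"DwC:scientificName": "Quercus"}
photo = {"file": "IMG_0215.jpg"}
sound = {"file": "REC_0215.wav"}
linked = [dict(item, eventKey=first) for item in (observation, photo, sound)]

for item in linked:
    print(item)
```

The same string could be written on a paper field sheet or placed in a Flickr tag, so that records captured by different tools can be joined on it afterwards.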
This seems to match the definition of the collector's field number: http://rs.tdwg.org/dwc/terms/index.htm#fieldNumber
A slightly broader one is the eventID that usually groups several observations, e.g. when recording a plot: http://rs.tdwg.org/dwc/terms/index.htm#eventID
Markus
On Oct 11, 2010, at 16:22, Bryan wrote:
Another element is a link between images, sound or whatever and the observation. This is useful since, at least for a time these media items are independent of the main observation. All information about the same object at the same time and place should be linked by an event identifier. In collections work this might be the collectors sequence number but any number/string such as Heidorn-20100915-215 that is unique to the item would be fine. It would be associated with the main entry, images, sound or other records. In flickr this could be put into a tag. Some cameras allow prefix and auto increment of a trailing number which would work. Field observation tools could do the same and on paper it could be Observer+Date+SequenceNumber or anything else. -- Bryan On Mon, Oct 11, 2010 at 6:25 AM, joel sachs jsachs@csee.umbc.edu wrote:
Katie,
I entered your email at the bottom of http://lists.tdwg.org/mailman/listinfo/tdwg-content and clicked "Unsubscribe or edit options". I then clicked "Unsubscribe" on the following page. You won't be unsubsribed until you follow the instructions in your confirmation email.
If you follow these instruction, and you remained subsrcibed, send mail directly to the list owners listed at the bottom of the http://lists.tdwg.org/mailman/listinfo/tdwg-content page.
Joel.
On Mon, 11 Oct 2010, k.flanagan@etoncollege.org.uk wrote:
Please could you tell unsubscribe me from this list? I've tried following the unsubscribe instructions from the website but that hasn't worked.
Thank you, Katie
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Tim Robertson (GBIF) Sent: 11 October 2010 13:00 To: joel sachs Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Joel,
Thanks for taking the time to summarise this. A few comments inline:
On Oct 11, 2010, at 1:46 PM, joel sachs wrote:
One of the goals of the recent bioblitz was to think about the suitability and appropriatness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
- Darwin Core is almost exactly right for citizen science. However, there
is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple desciption of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augemented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
Is this not the intention of recordedBy?
http://rs.tdwg.org/dwc/terms/#recordedBy A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications.
Keeping an inventory of applications somewhere might be worthwhile to help promote or decide on this.
Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.: DwC:decimalLatitude=41.5 DwC:decimalLongitude=-70.7 DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
- DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
- Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
- We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
For citizen science, would it not make more sense to apply some easy guideline to select one of:
- HumanObservation
- PreservedSpecimen
- LivingSpecimen
(http://code.google.com/p/darwincore/wiki/RecordLevelTerms)
Basis of record is one of the fundamental fields to know when consuming content, so I think any effort to capture that at source will be worthwhile in the long run.
- There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
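The remark that a simple csv file is already a legitimate representation of Darwin Core can be illustrated with a short sketch. It is only an illustration under stated assumptions: the column selection, the helper names `geo_to_dwc` and `to_dwc_csv`, and the sample values (apart from the 41.5 / -70.7 coordinates used in the message) are made up here and are not taken from the bioblitz data profile. It also shows one possible way of renaming geo:lat / geo:long to the DwC coordinate terms when WGS-84 can be assumed.

```python
import csv
import io

# Column headers are Darwin Core term names; this selection is illustrative only.
DWC_COLUMNS = [
    "scientificName",
    "recordedBy",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "basisOfRecord",
]


def geo_to_dwc(record, datum="WGS84"):
    """Return a copy of record with geo:lat/geo:long renamed to DwC terms.

    The default datum mirrors the assumption in the thread that coordinates
    read from a GPS are WGS-84.
    """
    out = {k: v for k, v in record.items() if k not in ("geo:lat", "geo:long")}
    out["decimalLatitude"] = record["geo:lat"]
    out["decimalLongitude"] = record["geo:long"]
    out.setdefault("geodeticDatum", datum)
    return out


def to_dwc_csv(records):
    """Serialise records as CSV text whose header row is made of DwC terms."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(geo_to_dwc(record))
    return buffer.getvalue()


# Hypothetical observation; the values are invented for illustration.
sample = {
    "scientificName": "Danaus plexippus",
    "recordedBy": "A. Observer",
    "eventDate": "2010-09-26",
    "geo:lat": "41.5",
    "geo:long": "-70.7",
    "basisOfRecord": "HumanObservation",
}

csv_text = to_dwc_csv([sample])
print(csv_text)
```

The result is an ordinary CSV file: one header row of term names and one row per observation, which is the kind of cut-and-pasteable template the message asks for.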
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- Bryan Heidorn, University of Arizona http://www.sirls.arizona.edu/heidorn
Sure does look like that would do it. We just need to include it as a core element for a bioblitz.
On Mon, Oct 11, 2010 at 7:35 AM, Markus Döring m.doering@mac.com wrote:
This seems to match the definition of the collectors field number: http://rs.tdwg.org/dwc/terms/index.htm#fieldNumber
A slightly broader one is the eventID that usually groups several observations, e.g. when recording a plot: http://rs.tdwg.org/dwc/terms/index.htm#eventID
Markus
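A rough sketch of the Observer+Date+SequenceNumber identifier from Bryan's message in this thread, and of using it to link an observation with its media. The function names and the choice of an `eventID` key are assumptions made for this example (fieldNumber would serve equally well as the key); nothing here is prescribed by Darwin Core.

```python
from datetime import date


def make_event_id(observer, observed_on, sequence):
    """Build an identifier shaped like Observer-YYYYMMDD-Sequence."""
    if not observer or "-" in observer:
        raise ValueError("observer must be non-empty and contain no hyphen")
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    return f"{observer}-{observed_on:%Y%m%d}-{sequence}"


def group_by_event(items):
    """Group records (dicts with an 'eventID' key) by their identifier."""
    groups = {}
    for item in items:
        groups.setdefault(item["eventID"], []).append(item)
    return groups


event_id = make_event_id("Heidorn", date(2010, 9, 15), 215)

# Hypothetical records: an observation and a photo of the same object,
# plus an unrelated sound clip with the next sequence number.
records = [
    {"eventID": event_id, "kind": "observation"},
    {"eventID": event_id, "kind": "image"},
    {"eventID": make_event_id("Heidorn", date(2010, 9, 15), 216), "kind": "sound"},
]
linked = group_by_event(records)
print(event_id, [r["kind"] for r in linked[event_id]])
```

The same string could be written on a paper field sheet or placed in a photo tag, so that the pieces can be joined later.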
On Oct 11, 2010, at 16:22, Bryan wrote:
Another element is a link between images, sound or whatever and the observation. This is useful since, at least for a time, these media items are independent of the main observation. All information about the same object at the same time and place should be linked by an event identifier. In collections work this might be the collector's sequence number, but any number or string that is unique to the item, such as Heidorn-20100915-215, would be fine. It would be associated with the main entry, images, sound or other records. In Flickr this could be put into a tag. Some cameras allow a prefix and auto-increment of a trailing number, which would work. Field observation tools could do the same, and on paper it could be Observer+Date+SequenceNumber or anything else.
-- Bryan
On Mon, Oct 11, 2010 at 6:25 AM, joel sachs jsachs@csee.umbc.edu wrote:
Katie,
I entered your email at the bottom of http://lists.tdwg.org/mailman/listinfo/tdwg-content and clicked "Unsubscribe or edit options". I then clicked "Unsubscribe" on the following page. You won't be unsubscribed until you follow the instructions in your confirmation email.
If you follow these instructions and you remain subscribed, send mail directly to the list owners listed at the bottom of the http://lists.tdwg.org/mailman/listinfo/tdwg-content page.
Joel.
On Mon, 11 Oct 2010, k.flanagan@etoncollege.org.uk wrote:
Please could you unsubscribe me from this list? I've tried following the unsubscribe instructions from the website but that hasn't worked.
Thank you, Katie
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Tim Robertson (GBIF) Sent: 11 October 2010 13:00 To: joel sachs Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Joel,
Thanks for taking the time to summarise this. A few comments inline:
On Oct 11, 2010, at 1:46 PM, joel sachs wrote:
Tim,
Thanks - responses below ...
On Mon, 11 Oct 2010, Tim Robertson (GBIF) wrote:
Hi Joel,
Thanks for taking the time to summarise this. A few comments inline:
On Oct 11, 2010, at 1:46 PM, joel sachs wrote:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
Is this not the intention of recordedBy?
http://rs.tdwg.org/dwc/terms/#recordedBy A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.
I think it's useful to preserve the distinction between the primary observer and the record creator, and this distinction is lost in a concatenated list.
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications.
Keeping an inventory of applications somewhere might be worthwhile to help promote or decide on this.
An inventory of consumers of DwC, geo, and other namespaces would be great.
- We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
For citizen science, would it not make more sense to apply some easy guideline to select one of:
- HumanObservation
- PreservedSpecimen
- LivingSpecimen
(http://code.google.com/p/darwincore/wiki/RecordLevelTerms)
Basis of record is one of the fundamental fields to know when consuming content, so I think any effort to capture that at source will be worthwhile in the long run.
I agree. We did include an "Evidence" field in the paper field sheet, although we neglected to modify the Fusion table and the data documentation. My point is that it's nice to have an architecture that doesn't punish you too much for thinking of additional terms at the last minute.
Cheers - Joel.
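Both points in this exchange (pick basisOfRecord from a short list at source, but don't punish people for adding columns late) can be combined in one check. The sketch below is an assumption-laden illustration: the function name is invented, and the allowed values are just the three options listed in Tim's comment, not a complete vocabulary.

```python
# The three values suggested in the thread; a real deployment may allow more.
BASIS_OF_RECORD_VALUES = {"HumanObservation", "PreservedSpecimen", "LivingSpecimen"}


def check_basis_of_record(row):
    """Return a list of problems; an empty list means the row passes.

    Only basisOfRecord is checked. Any other column, including ones added
    at the last minute, is deliberately left alone.
    """
    value = row.get("basisOfRecord", "").strip()
    if not value:
        return ["basisOfRecord is missing"]
    if value not in BASIS_OF_RECORD_VALUES:
        allowed = ", ".join(sorted(BASIS_OF_RECORD_VALUES))
        return [f"basisOfRecord {value!r} is not one of: {allowed}"]
    return []


# Hypothetical rows; "Evidence" stands in for a column added after the fact.
ok_row = {"scientificName": "Danaus plexippus", "basisOfRecord": "HumanObservation", "Evidence": "photo"}
bad_row = {"scientificName": "Danaus plexippus", "basisOfRecord": "Sighting"}

print(check_basis_of_record(ok_row))
print(check_basis_of_record(bad_row))
```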
Joel, Thanks for the summary. Note that several services are already planned as part of the improvements for the CoL in the 4d4life and i4life projects, including annotation services. I'll try to include the ideas from Rod Page in the discussions about the new CoL services to implement.
Wouter
____________________________________________________________ Ir Wouter Addink Deputy Director ETI & Head Informatics Department ETI BioInformatics, University of Amsterdam Mauritskade 61,1092 AD Amsterdam, The Netherlands Phone: +31 20 5257239, Fax: +31 20 5257238 Web: http://www.eti.uva.nl LinkedIn: http://www.linkedin.com/in/wouteraddink new: Heukels' Flora NL iPhone app! "Biodiversity is the biological life-insurance of mankind" -J. Pronk
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday 11 October 2010 13:47 To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto: tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriatness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
- Darwin Core is almost exactly right for citizen science. However, there
is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple desciption of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augemented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordiantes in DwC, as, e.g.: DwC:decimalLatitude=41.5 DwC:decimalLongitude=-70.7 DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.
- DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for Flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up and see that any scientific name is acceptable, at any taxonomic rank or with no rank at all. And once we have a scientific name, higher ranks can be inferred.
- Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
- We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary in order to capture data from a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
- There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Thanks, John.
This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Is this hypothetical "weeding out" something that couldn't be done with controlled vocabularies? The recommended best practice is to use one, and that's as controlled as we ever get with Darwin Core terms outside of implementations.
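A minimal sketch of what such weeding out might look like if providers did supply establishmentMeans from a controlled vocabulary. The vocabulary values below are hypothetical, chosen only for illustration; Darwin Core does not mandate them, and the function name classify is likewise an assumption.

```python
# Hypothetical vocabulary values, assumed for illustration only.
WILD_VALUES = {"native", "introduced", "naturalised"}
CAPTIVE_VALUES = {"captive", "cultivated"}


def classify(record):
    """Return 'keep', 'exclude', or 'review' for one occurrence record."""
    value = (record.get("establishmentMeans") or "").strip().lower()
    if value in CAPTIVE_VALUES:
        return "exclude"
    if value in WILD_VALUES:
        return "keep"
    # Empty or unrecognised values cannot be decided by software alone.
    return "review"


# Invented sample records.
records = [
    {"scientificName": "Panthera leo", "establishmentMeans": "captive"},
    {"scientificName": "Danaus plexippus", "establishmentMeans": "native"},
    {"scientificName": "Sturnus vulgaris", "establishmentMeans": "escaped from aviary?"},
]
results = [classify(r) for r in records]
print(results)
```

Free-text values fall into the "review" bucket, which is where the grey areas Rich describes would end up; only values drawn from the agreed vocabulary can be filtered automatically.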
If a recommended controlled vocabulary was provided, rather than examples, that would help.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
Sent: Tuesday, 12 October 2010 12:47 PM
To: Hobern, Donald (CES, Black Mountain)
Cc: deepreef@bishopmuseum.org; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Is this hypothetical "weeding out" something that couldn't be done with controlled vocabularies? The recommended best practice is to use one, and that's as controlled as we ever get with Darwin Core terms outside of implementations.
On Mon, Oct 11, 2010 at 6:43 PM, Donald.Hobern@csiro.au wrote:
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 12:33 PM
To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au
Sent: Monday, October 11, 2010 2:59 PM
To: tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John.
This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
Sent: Tuesday, 12 October 2010 11:34 AM
To: Hobern, Donald (CES, Black Mountain)
Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
I agree fully that recommended vocabularies would help immensely. As with many Darwin Core terms, the recommended vocabulary for establishmentMeans doesn't actually exist yet. The examples in the Darwin Core term commentary are there to set the stage and give people an idea of the intent. Controlled vocabulary is an area ripe for development by whoever is ready to actually use the controlled terms meaningfully. One way to begin vocabulary development is to discuss options here on tdwg-content. Conclusions can be taken forward first as recommendations on the associated Darwin Core secondary (non-normative) documentation pages, such as http://code.google.com/p/darwincore/wiki/Occurrence, then more formally in a shared community vocabulary management system, which is the subject of another currently active thread on this discussion forum.
Cheers,
John
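As a possible starting point for that discussion, here is a minimal sketch that tallies the free-text values providers actually put in an establishmentMeans column, so candidate vocabulary terms can be judged against real usage. The function name `tally_values`, the sample rows, and the "<empty>" placeholder are all hypothetical and for illustration only; nothing here is part of Darwin Core itself.

```python
import csv
import io
from collections import Counter


def tally_values(csv_text, column="establishmentMeans"):
    """Count the values of one column in a Darwin Core CSV export.

    Values are trimmed and lower-cased so that "Introduced" and
    "INTRODUCED" are counted together; blank cells are counted
    under the hypothetical placeholder "<empty>".
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip().lower()
        counts[raw or "<empty>"] += 1
    return counts


# Invented sample data, not real bioblitz records.
sample = """scientificName,establishmentMeans
Danaus plexippus,native
Panthera leo,Zoo animal
Acer platanoides,Introduced
Acer platanoides,INTRODUCED
Quercus rubra,
"""

counts = tally_values(sample)
for value, n in counts.most_common():
    print(f"{value}: {n}")
```

A tally like this only shows which spellings and concepts are in circulation; deciding which of them become recommended terms is still a matter for the list.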
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysts, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced and the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities from things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
Aloha, Rich
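The four broad categories above, and the distinction between "Cryptogenic" and null, could be sketched roughly as follows. This is an illustration under stated assumptions, not an agreed vocabulary: the class name `Establishment`, the helper functions, the sample records, and the policy of excluding only captive records are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Establishment(Enum):
    """Hypothetical controlled terms, taken from the categories above."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"  # an explicit assertion that the category is unknown


def parse_establishment(value: Optional[str]) -> Optional[Establishment]:
    """Map a supplied string to a term.

    Returns None when nothing was supplied (null: we don't know whether
    anyone knows). Raises ValueError for a string outside the vocabulary.
    """
    if value is None or not value.strip():
        return None
    return Establishment(value.strip().lower())


def usable_for_distribution(term: Optional[Establishment]) -> bool:
    """Assumed policy: weed out only records known to be captive."""
    return term is not Establishment.CAPTIVE


# Invented sample records using Darwin Core style column names.
records = [
    {"scientificName": "Danaus plexippus", "establishmentMeans": "Native"},
    {"scientificName": "Panthera leo", "establishmentMeans": "captive"},
    {"scientificName": "Carcinus maenas", "establishmentMeans": "cryptogenic"},
    {"scientificName": "Quercus rubra", "establishmentMeans": ""},
]

kept = [
    r["scientificName"]
    for r in records
    if usable_for_distribution(parse_establishment(r["establishmentMeans"]))
]
print(kept)
```

Whether cryptogenic or null records should pass such a filter is exactly the kind of decision that would need input from the people doing distribution analysis and niche modelling.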
Rich,
Let's not confuse terms that are best applied to a taxon concept with terms that apply to a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon; you cannot infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further, and the obvious terms are 'in cultivation', 'in captivity' and 'border intercept'. Our botanical collection management system would hold more data on the provenance of a specific collection and on linkages between events: from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2, etc. But then we often have that data because we are generating it.
Jerry
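The occurrence-level terms and the linked events described above could be modelled roughly like this. It is a sketch only: the class `CollectionEvent`, its field names, the identifiers, dates, and localities are hypothetical and are not taken from any collection management system.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three occurrence-level terms suggested above.
OCCURRENCE_STATUSES = {"in cultivation", "in captivity", "border intercept"}


@dataclass
class CollectionEvent:
    event_id: str
    when: str                           # date as recorded, e.g. ISO 8601
    where: str                          # locality as recorded
    status: Optional[str] = None        # None: recorder noted no such status
    derived_from: Optional[str] = None  # event_id of the source collection

    def __post_init__(self):
        if self.status is not None and self.status not in OCCURRENCE_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def provenance_chain(events: List[CollectionEvent], event_id: str) -> List[CollectionEvent]:
    """Follow derived_from links back to the earliest known event.

    Returns the chain oldest first; stops if a link is missing or repeats.
    """
    by_id = {e.event_id: e for e in events}
    chain, seen = [], set()
    current = by_id.get(event_id)
    while current is not None and current.event_id not in seen:
        seen.add(current.event_id)
        chain.append(current)
        current = by_id.get(current.derived_from) if current.derived_from else None
    chain.reverse()
    return chain


# Invented example: a wild collection later grown in botanic garden Y.
events = [
    CollectionEvent("wild-1", "2009-03-01", "hillside site x=1"),
    CollectionEvent("garden-2", "2010-03-01", "botanic garden Y",
                    status="in cultivation", derived_from="wild-1"),
]

chain = provenance_chain(events, "garden-2")
print([e.event_id for e in chain])
```

Only what the recorder knew at each event is stored; nothing about taxon-level status such as 'introduced' is inferred from the chain.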
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans) Introduced (got there with the assistance of humans, but is inhabiting the natural environment) Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced and the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities from things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
Aloha,
Rich
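To make the controlled-vocabulary idea concrete, here is a minimal Python sketch, assuming the four broad terms suggested above are stored in a dwc:establishmentMeans column of a simple CSV file. The term list is the one proposed in this message, not an official Darwin Core vocabulary; the sample records, the helper names (parse_establishment_means, usable_for_distribution, load_rows) and the rule "drop only rows explicitly marked captive" are assumptions made for illustration.

```python
import csv
import io
from enum import Enum
from typing import Optional


class EstablishmentMeans(Enum):
    """The four broad categories proposed above (not an official DwC list)."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"


def parse_establishment_means(value: str) -> Optional[EstablishmentMeans]:
    """Return the matching term, or None when the field is empty (null)."""
    cleaned = value.strip().lower()
    if not cleaned:
        return None
    return EstablishmentMeans(cleaned)  # raises ValueError for unknown terms


# Made-up records in a simple CSV that uses Darwin Core column names.
SAMPLE_CSV = """scientificName,decimalLatitude,decimalLongitude,establishmentMeans
Danaus plexippus,41.5,-70.7,native
Panthera leo,41.5,-70.7,captive
Passer domesticus,41.5,-70.7,introduced
Carcinus maenas,41.5,-70.7,
"""


def load_rows(text: str):
    """Read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def usable_for_distribution(rows):
    """Keep every row except those explicitly marked as captive."""
    kept = []
    for row in rows:
        term = parse_establishment_means(row["establishmentMeans"])
        if term is not EstablishmentMeans.CAPTIVE:
            kept.append(row)
    return kept


if __name__ == "__main__":
    for row in usable_for_distribution(load_rows(SAMPLE_CSV)):
        print(row["scientificName"], row["establishmentMeans"] or "(null)")
```

Note that an empty field (null) and "cryptogenic" stay distinct here, and that an uncontrolled value such as "zoo" is rejected rather than silently accepted. Whether null rows should be kept for niche modelling is a judgment call this sketch does not settle.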
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John.
This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. A more controlled version, or a simple machine-processable flag, would be useful in those cases where providers can supply it.
Donald
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
________________________________ Please consider the environment before printing this email Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're properties of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
We will have to agree to disagree.
For me, at least, 'Native', 'Invasive', etc. are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
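As a rough illustration of the triplet described above, the following Python sketch models a status statement as taxon / occurrence statement / geographical extent, linked to a publication. It does not reproduce GISIN's actual schema or the NZ model; the class name, field names, helper function and placeholder data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonStatusAssertion:
    """One taxon-level status statement, scoped to a region and a source."""
    taxon: str                 # scientific name or taxon concept identifier
    occurrence_statement: str  # e.g. "native", "invasive"
    geographic_extent: str     # the region the statement applies to
    publication: str           # citation backing the statement


def statuses_for(assertions, taxon, extent):
    """Return the sorted statements recorded for a taxon within one region."""
    return sorted(
        a.occurrence_statement
        for a in assertions
        if a.taxon == taxon and a.geographic_extent == extent
    )


# Placeholder data only; names and sources are invented for illustration.
ASSERTIONS = [
    TaxonStatusAssertion("Taxon A", "native", "Region 1", "Hypothetical source 1"),
    TaxonStatusAssertion("Taxon A", "invasive", "Region 2", "Hypothetical source 2"),
]

if __name__ == "__main__":
    print(statuses_for(ASSERTIONS, "Taxon A", "Region 1"))
    print(statuses_for(ASSERTIONS, "Taxon A", "Region 2"))
```

The point of the sketch is that the same taxon can carry different statements in different regions, and that none of these statements is attached to an individual collection event.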
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know)
Of course, each of these can be further subdivded, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an intoduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha,
Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
untitled
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au mailto:Donald.Hobern@csiro.au
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John.
This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
untitled
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au mailto:Donald.Hobern@csiro.au
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriatness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple desciption of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augemented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordiantes in DwC, as, e.g.: DwC:decimalLatitude=41.5 DwC:decimalLongitude=-70.7 DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.
2. DwC:scientificName might be more user-friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for Flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or with no rank at all. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
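As a rough illustration of points 1 and 4, here is a minimal Python sketch of a "simple csv file" whose column headers are Darwin Core (and geo/obs) terms. The record values, the event date, the helper name to_dwc_coordinates and the "WGS84" datum string are assumptions made up for this example; they are not taken from the bioblitz profile or the Fusion Table.

```python
import csv
import io

# A hypothetical one-row CSV; the column headers are the terms themselves.
# All values below are invented for illustration.
CSV_TEXT = """dwc:scientificName,obs:observedBy,geo:lat,geo:long,dwc:eventDate
Danaus plexippus,A. Observer,41.5,-70.7,2010-09-25
"""


def to_dwc_coordinates(row, datum="WGS84"):
    """Return a copy of row with geo:lat/geo:long re-expressed as DwC terms.

    The datum has to be stated explicitly on the DwC side; here it defaults
    to WGS84 on the assumption that the coordinates came from a GPS.
    """
    out = {k: v for k, v in row.items() if k not in ("geo:lat", "geo:long")}
    out["dwc:decimalLatitude"] = row["geo:lat"]
    out["dwc:decimalLongitude"] = row["geo:long"]
    out["dwc:geodeticDatum"] = datum
    return out


def read_records(text):
    """Parse CSV text into a list of dicts keyed by term name."""
    return [to_dwc_coordinates(r) for r in csv.DictReader(io.StringIO(text))]


records = read_records(CSV_TEXT)

# A term missing from the original profile can be added later without
# touching the existing columns, much as basisOfRecord was added to the table.
records[0]["dwc:basisOfRecord"] = "HumanObservation"
```

Because each record is just a mapping from term names to values, adding a column changes nothing for consumers that only read the terms they already know.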
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich,
Let's not confuse terms that are best applied to a taxon concept with those that apply to a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and cannot infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
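To make the contrast between a controlled vocabulary and a free-text field concrete, here is a minimal Python sketch. It assumes the four broad categories named in this message as the vocabulary; the enum, the function names and the sample records are hypothetical, and this is not an agreed TDWG vocabulary. Keeping records with an unknown or missing status is a design choice of the sketch, not a recommendation from the thread.

```python
from enum import Enum
from typing import Optional


class EstablishmentMeans(Enum):
    """Hypothetical controlled vocabulary built from the four broad categories."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"


def parse_establishment_means(value: Optional[str]) -> Optional[EstablishmentMeans]:
    """Map free text onto the vocabulary; None means missing or unrecognised."""
    if value is None or not value.strip():
        return None
    try:
        return EstablishmentMeans(value.strip().lower())
    except ValueError:
        return None


def exclude_captive(records):
    """Drop records explicitly marked captive; keep everything else."""
    kept = []
    for record in records:
        status = parse_establishment_means(record.get("dwc:establishmentMeans"))
        if status is not EstablishmentMeans.CAPTIVE:
            kept.append(record)
    return kept


# Invented sample records for illustration only.
sample = [
    {"dwc:scientificName": "Panthera leo", "dwc:establishmentMeans": "Captive"},
    {"dwc:scientificName": "Danaus plexippus", "dwc:establishmentMeans": "native"},
    {"dwc:scientificName": "Passer domesticus", "dwc:establishmentMeans": ""},
]
wild = exclude_captive(sample)
```

With a fixed set of terms, software can weed out captive occurrences before distribution analysis without having to interpret free text, and an empty value stays distinct from an explicit "cryptogenic".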
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Our approach within the Darwin Core Archive context has been to add dwc:establishmentMeans to the Distribution extension (see http://rs.gbif.org/extension/gbif/1.0/distribution.xml), which would support the publishing of specific and multiple geospatial areas with distinct establishmentMeans values. This would make sense primarily where the basis of record is a taxon and not an occurrence. The term "Nativeness" as a label for the vocabulary is arbitrary in the sense that there is no specific linkage to the dwc:establishmentMeans concept. But we could reference it specifically in an extension's definition as a recommended vocabulary for use with that concept.
Best, David
On Oct 12, 2010, at 1:41 PM, Richard Pyle wrote:
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is what terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans
Rich
From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied
to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related
provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only
record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and
the obvious ones are 'in cultivation', 'in captivity', 'border intercept' . Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying
that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most
useful) would be something like:
Native (was there without any assistance from humans) Introduced (got there with the assistance of humans,
but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity) You might also throw in "Cryptogenic", which is an
assertion that we do not know which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know)
Of course, each of these can be further subdivded,
but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an intoduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic
(too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha, Rich ________________________________ From: Donald.Hobern@csiro.au
[mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define
many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald untitled Donald Hobern, Director, Atlas of
Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700,
Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/ From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain);
tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We
have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich ________________________________ From:
tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely
uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald untitled Donald Hobern, Director,
Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box
1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/ From: gtuco.btuco@gmail.com
[mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be
captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM,
Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we
do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald Donald Hobern, Director, Atlas of
Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent
bioblitz was to think about the suitability and appropriatness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations: 1. Darwin Core is almost exactly
right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC
records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple desciption of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augemented or
diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since
there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3]
instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum,
say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordiantes in DwC, as, e.g.: DwC:decimalLatitude=41.5 DwC:decimalLongitude=-70.7 DwC:geodeticDatum=XYZ
(I would argue that it should be
kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up and see that any scientific name is acceptable, at any taxonomic rank, or with no rank at all. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
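To illustrate observation 1 (a simple csv file is a legitimate representation of Darwin Core) and observation 4 (adding a column when a field sheet needs it), here is a minimal Python sketch. The column names follow the terms mentioned above; the species, observer name and helper functions (to_csv, add_column) are made up for illustration and are not taken from the bioblitz data profile.

```python
import csv
import io

# Column headers are term names used in this message; the row values are invented.
FIELDS = ["dwc:scientificName", "obs:observedBy", "geo:lat", "geo:long"]

records = [
    {
        "dwc:scientificName": "Danaus plexippus",
        "obs:observedBy": "A. Volunteer",  # hypothetical observer
        "geo:lat": "41.5",
        "geo:long": "-70.7",
    },
]


def to_csv(rows, fields):
    """Serialise the records as plain CSV with the term names as the header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def add_column(rows, fields, name, default=""):
    """Return new (rows, fields) with one extra column; the inputs are left unchanged."""
    new_fields = fields + [name]
    new_rows = [{**row, name: row.get(name, default)} for row in rows]
    return new_rows, new_fields


csv_text = to_csv(records, FIELDS)
print(csv_text)

# A transcriber needs basisOfRecord for one field sheet, so the column is simply added.
records2, fields2 = add_column(records, FIELDS, "dwc:basisOfRecord", "HumanObservation")
print(to_csv(records2, fields2))
```

The point of the sketch is only that nothing beyond a header row of term names is needed; a real profile would also say which columns are required.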
Happy Thanksgiving to all in Canada -
Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
I believe the observation part should record the "nativeness" of the habitat, not of the species. An animal in a zoo or a plant in a botanical garden is likely to be non-native, but it could be native. A lichen or a pathogenic fungus is much more likely to be native. But knowing the habitat in this sense would be very useful for pathogenic fungi as well.
As a non-native speaker of English, "nativeness" applied to habitat does not seem to me to be the correct term, and neither does "establishment means".
--------------- Any additional proposals for a term recording the circumstances of the occurrence with respect to natural or man-made? ---------------
Proposals for terms to be covered could be:
Inside a building, including greenhouses (= square kilometers of these in agriculture!)
Open air garden or zoo
Open agricultural habitat
High volume traffic lines (railways, highways, waterways)
Habitats with limited human influence.
to be emended and improved....
This list is informed by, among other things, questions in quarantine, but also by escaping biologically modified organisms.
Gregor
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view), then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'". I think what is meant is more that "I have found a specimen of Poa anceps in Christchurch and, from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area".
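The equivalence argued above (Occurrence = Taxon at Location, so a status keyed by taxon and location can be read off any occurrence) can be sketched in a few lines of Python. This is only an illustration: the class name, the lookup table and the single entry in it are assumptions built from the Poa anceps example sentence, not data from any real biostatus source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """An occurrence in the basic sense used above: a taxon at a location."""
    taxon: str
    location: str


# "Nativeness is a property of a Taxon that is restricted by Location":
# the status is keyed by the (taxon, location) pair. Illustrative entry only.
STATUS_BY_TAXON_AND_LOCATION = {
    ("Poa anceps", "Christchurch"): "native",
}


def nativeness(occ: Occurrence) -> Optional[str]:
    """Read the status off an occurrence; None means nothing has been recorded."""
    return STATUS_BY_TAXON_AND_LOCATION.get((occ.taxon, occ.location))


specimen = Occurrence(taxon="Poa anceps", location="Christchurch")
print(nativeness(specimen))
```

Note that the status is never stored on the individual specimen; it comes from prior knowledge about the pair, which is the distinction the paragraph above is drawing.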
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg-content list, and worked out that "nativeness", or what we call "biostatus", is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
________________________________________ From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich,
Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and cannot infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on the provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know into which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced and the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities from things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
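As a rough illustration of the four broad categories above, and of the difference between "Cryptogenic" and null, here is a minimal Python sketch. The enum name, the lower-case string values and the parse_establishment helper are assumptions made for this example; they are not an agreed TDWG vocabulary.

```python
from enum import Enum
from typing import Optional


class Establishment(Enum):
    NATIVE = "native"            # was there without any assistance from humans
    INTRODUCED = "introduced"    # got there with human assistance, living in the natural environment
    CAPTIVE = "captive"          # brought by humans and still maintained in captivity
    CRYPTOGENIC = "cryptogenic"  # an explicit assertion that the category is not known


def parse_establishment(value: Optional[str]) -> Optional[Establishment]:
    """Map a free-text cell to the vocabulary.

    An empty or missing cell returns None (nothing was asserted), which is kept
    distinct from CRYPTOGENIC (someone asserted that the category is unknown).
    Any other unrecognised text raises ValueError rather than being guessed at.
    """
    if value is None or not value.strip():
        return None
    return Establishment(value.strip().lower())


print(parse_establishment(" Native "))
print(parse_establishment("cryptogenic"))
print(parse_establishment(""))
```

Rejecting unknown strings is a design choice in this sketch; an aggregator might prefer to keep the verbatim text alongside the controlled value.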
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
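The kind of filtering described above ("weed out captive occurrences" before distribution analysis) might look like the following Python sketch. It assumes records are dictionaries with a dwc:establishmentMeans key and that the labels "captive" and "cultivated" mark records to drop; both the labels and the function name are hypothetical, and the sample rows are invented.

```python
# Labels treated as "not a natural occurrence" in this sketch (an assumption).
EXCLUDED = {"captive", "cultivated"}


def usable_for_distribution(record, excluded=EXCLUDED):
    """Keep a record unless its establishmentMeans is one of the excluded labels.

    Records with no value are kept, since most providers cannot supply one.
    """
    value = (record.get("dwc:establishmentMeans") or "").strip().lower()
    return value not in excluded


sample = [
    {"id": "a", "dwc:establishmentMeans": "captive"},
    {"id": "b", "dwc:establishmentMeans": ""},
    {"id": "c", "dwc:establishmentMeans": "native"},
    {"id": "d", "dwc:establishmentMeans": "Cultivated"},
    {"id": "e"},
]

kept = [r["id"] for r in sample if usable_for_distribution(r)]
print(kept)
```

Whether blank values should pass the filter is exactly the kind of grey area mentioned above; the default here errs on the side of keeping data.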
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. A more controlled version, or a simple machine-processable flag for those cases where providers can supply it, would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (as distinct from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading, and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock, but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
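The distinction drawn in this message, between primary occurrence records and a regional summary derived by querying them, can be sketched as follows. This is a minimal Python illustration: the "managed" boolean stands in for the single flag proposed above, and the field names, function name and rows are all assumptions based on the tiger and Scots Pine examples.

```python
from collections import defaultdict

# Primary observations: each row only says what was seen where, plus one flag.
occurrences = [
    {"taxon": "Panthera tigris", "region": "London", "managed": True},    # in the zoo
    {"taxon": "Panthera tigris", "region": "London", "managed": False},   # roaming "free"
    {"taxon": "Pinus sylvestris", "region": "Scottish Highlands", "managed": False},
]


def taxa_by_region(records, include_managed=False):
    """Derive a region -> set of taxa summary; this is an analysis result, not an observation."""
    summary = defaultdict(set)
    for rec in records:
        if rec["managed"] and not include_managed:
            continue  # skip garden/zoo/farm records unless the question asks for them
        summary[rec["region"]].add(rec["taxon"])
    return dict(summary)


print(taxa_by_region(occurrences))
print(taxa_by_region(occurrences[:1]))  # only the zoo tiger: nothing to report
```

The include_managed parameter reflects the remark that even the flag will sometimes be ignored in analysis: the answer depends on the question asked.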
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view), then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'". I think what is meant is more that "I have found a specimen of Poa anceps in Christchurch and, from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area".
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg-content list, and worked out that "nativeness", or what we call "biostatus", is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is what terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans
Rich
From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied
to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related
provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only
record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and
the obvious ones are 'in cultivation', 'in captivity', 'border intercept' . Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying
that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most
useful) would be something like:
Native (was there without any assistance from humans) Introduced (got there with the assistance of humans,
but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity) You might also throw in "Cryptogenic", which is an
assertion that we do not know which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know)
Of course, each of these can be further subdivded,
but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an intoduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic
(too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processable in those cases where providers can supply it would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include it in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
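As an illustration of observation 1(ii), the sketch below reads a small CSV table and normalises coordinates to DwC terms: geo:lat/geo:long values are treated as WGS84 (the datum that vocabulary is defined for), and otherwise the explicit DwC columns are used. The sample rows, the prefixed column headers and the function name are hypothetical; XYZ is the placeholder datum from the text above.

```python
import csv
import io

# Hypothetical table: the first row uses geo:lat/geo:long, the second uses
# explicit DwC terms with a non-WGS84 datum (the placeholder "XYZ").
SAMPLE_CSV = """\
dwc:scientificName,geo:lat,geo:long,dwc:decimalLatitude,dwc:decimalLongitude,dwc:geodeticDatum
Danaus plexippus,41.5,-70.7,,,
Poa anceps,,,41.5,-70.7,XYZ
"""


def to_dwc_coordinates(row: dict) -> dict:
    """Return a row's coordinates as DwC terms, assuming WGS84 for geo:lat/geo:long."""
    if row.get("geo:lat") and row.get("geo:long"):
        return {
            "dwc:decimalLatitude": float(row["geo:lat"]),
            "dwc:decimalLongitude": float(row["geo:long"]),
            "dwc:geodeticDatum": "WGS84",
        }
    # Fall back to the explicit DwC columns, keeping whatever datum was recorded.
    return {
        "dwc:decimalLatitude": float(row["dwc:decimalLatitude"]),
        "dwc:decimalLongitude": float(row["dwc:decimalLongitude"]),
        "dwc:geodeticDatum": row["dwc:geodeticDatum"],
    }


records = [to_dwc_coordinates(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

The point of the sketch is only that a plain CSV file with term names as column headers is enough to carry both styles of record side by side.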
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Would we have the energy to compile example DwC records showing how to use Darwin Core for certain use cases? The lack of guidance on how to use Darwin Core was mentioned earlier. An additional examples page on the DwC website would surely be really helpful, and not only for newbies: a DwC record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue samples, DNA sequences, marine fishing net catches, etc.
I'd volunteer to do the HTML page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading, and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
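The idea that a regional summary should be the result of a query on occurrence records, with managed records flagged on the record itself, can be sketched roughly as follows. The record layout, the "managed" key and the function name are assumptions made for illustration, not existing DwC terms, and the two records are invented.

```python
from collections import defaultdict

# Hypothetical primary observations; "managed" stands in for the single flag
# described above (maintained in a garden/zoo/farm etc.).
occurrences = [
    {"taxon": "Panthera tigris", "region": "London", "managed": True},
    {"taxon": "Pinus sylvestris", "region": "Scottish Highlands", "managed": False},
]


def taxa_by_region(records, include_managed=False):
    """Derive a region -> set-of-taxa summary by querying occurrence records."""
    summary = defaultdict(set)
    for rec in records:
        if rec["managed"] and not include_managed:
            continue  # managed records are skipped unless the analysis asks for them
        summary[rec["region"]].add(rec["taxon"])
    return dict(summary)
```

Called with the defaults, the zoo tiger drops out of the summary; an analysis that wants managed records can pass include_managed=True, which matches the remark that the flag will sometimes be ignored.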
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view) then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area"
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich,
Let's not confuse terms which are best applied to a taxon concept with those that apply to a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2 etc. But then we often have that data because we are generating it.
Jerry
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Shall we create an "Examples" entry in the Table of Contents and then have all of the examples on one page, with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Would we have the energy to compile example DwC records showing how to use Darwin Core for certain use cases? The lack of guidance on how to use Darwin Core was mentioned earlier. An additional examples page on the DwC website would surely be really helpful, and not only for newbies: a DwC record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue samples, DNA sequences, marine fishing net catches, etc.
I'd volunteer to do the HTML page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions, but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the result of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London, living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of its being there. These are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading, and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the Highlands may be part of a 150-year-old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock, but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view)
then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and, from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area".
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg-content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 5:41 p.m.
To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, let's not confuse terms which are best applied to a taxon concept with those that apply to a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know into which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced and the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities from things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
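As a rough illustration of how the broad categories above could be made machine-processable, here is a minimal Python sketch. The class name EstablishmentCategory, the helper functions, the lowercase string values, and the sample records are all assumptions made up for this example; they are not an agreed TDWG vocabulary. The only Darwin Core term used is establishmentMeans.

```python
from enum import Enum


class EstablishmentCategory(Enum):
    """Hypothetical controlled vocabulary for the four broad categories."""
    NATIVE = "native"            # was there without any assistance from humans
    INTRODUCED = "introduced"    # arrived with human help, lives in the natural environment
    CAPTIVE = "captive"          # brought by humans and still maintained in captivity
    CRYPTOGENIC = "cryptogenic"  # explicit assertion that the category is unknown


def parse_category(value):
    """Map a free-text value to a category; None means no assertion was made."""
    if value is None or not value.strip():
        return None
    # Raises ValueError for any value outside the controlled vocabulary.
    return EstablishmentCategory(value.strip().lower())


def usable_for_distribution(records):
    """Keep every record that is not flagged as captive (one possible policy)."""
    return [
        r for r in records
        if parse_category(r.get("establishmentMeans")) is not EstablishmentCategory.CAPTIVE
    ]


# Made-up records for illustration only.
records = [
    {"scientificName": "Panthera tigris", "establishmentMeans": "captive"},
    {"scientificName": "Pinus sylvestris", "establishmentMeans": "native"},
    {"scientificName": "Poa anceps", "establishmentMeans": ""},
]
kept = usable_for_distribution(records)
```

Note that an empty value is kept distinct from "cryptogenic": the former maps to None (nothing asserted), the latter is a positive statement that the category is unknown. Whether records with no assertion should pass the filter is a policy choice; this sketch lets them through.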
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version, or a simple flag which could be machine-processable in those cases where providers can supply it, would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include it in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
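To make point 1 concrete (a plain CSV file is already a legitimate Darwin Core representation), here is a minimal Python sketch that writes one record whose column headers are Darwin Core term names. The terms themselves (scientificName, basisOfRecord, decimalLatitude, decimalLongitude, geodeticDatum) are real DwC terms; the function name to_dwc_csv and the species in the row are made up for illustration, and the coordinates are simply the ones used in the datum example above, here paired with WGS84 rather than the placeholder XYZ.

```python
import csv
import io

# Column headers are Darwin Core term names; nothing else is needed
# for the file to count as a simple Darwin Core text representation.
FIELDS = [
    "scientificName",
    "basisOfRecord",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
]


def to_dwc_csv(rows):
    """Serialize a list of dicts keyed by DwC term names to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical observation, for illustration only.
rows = [
    {
        "scientificName": "Danaus plexippus",
        "basisOfRecord": "HumanObservation",
        "decimalLatitude": "41.5",
        "decimalLongitude": "-70.7",
        "geodeticDatum": "WGS84",
    }
]
dwc_csv = to_dwc_csv(rows)
print(dwc_csv)
```

Adding a column later, as the transcriber did with basisOfRecord in point 4, only means appending another term name to FIELDS.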
Happy Thanksgiving to all in Canada -
Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
YES! I like. I'd be happy to contribute use-cases.
_____
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of John Wieczorek Sent: Tuesday, October 12, 2010 10:32 AM To: Markus Döring (GBIF) Cc: tdwg-content@lists.tdwg.org; Roger Hyam; tdwg-bioblitz@googlegroups.com; Jerry Cooper Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Should we create an "Examples" page in the Table of Contents and then have all of the examples on one page, with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Would we have the energy to compile example dwc records on how to use darwin core for certain use cases? The lack of guidance on how to use darwin core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful for not only newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc
Id volunteer to do the html page if Im given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are
conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down
the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa
in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and
tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage a London Zoo is "managed" in that it is being
maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do
not observe their "introducedness" or their "nativeness" this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence
record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on a occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots
Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as
we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there. i.e. it used to be there but is presumed to be irradiated.
Does the problem occur because we are using the same term "occurrence" to
mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an
occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a propety of both, and require both, in a way - and that these different perspectives are actually the same thing.
Eg, if we describe (in a basic way) : Ocurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is
restricted by Location (jerry's view)
then this is equivalent to saying that Nativeness is a property of an
Ocurrence ! (Rich's view)
As Rich points out, it doesnt make a whole lot of sense to apply
Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded ocurrences, I know that this occurence/taxon is Native in this area"
Also I tend to feel that a lot of biodiversity properties are properties
of ocurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I wont go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list,
having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org
[tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org;
tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property
of
a taxon in a class of habitat, but few people actually frame it this
way).
The reason I think that "Nativeness" is best represented as a property of
an
Occurrence, rather than of a taxon, is that a taxon is a circumscribed
set
of organisms, usually based on evolutionary relatedness or morphological
or
genetic similarity. By contrast, an Occurrence is about the presence of
a
member or multiple members of a taxon concept in space and time (i.e., at
a
particular place and time).
We often think of Occurrence records in terms of individual organisms
(e.g.,
specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual
organism.
However, my understanding is that Occurrence instances can also apply to populations -- which is what terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at
a
particular locality, the way that this intersection is usually manifest
in
DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans
Rich
From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied
to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related
provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only
record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and
the obvious ones are 'in cultivation', 'in captivity', 'border intercept' . Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced and the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities from things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
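As a rough illustration of the controlled vocabulary suggested above, here is a minimal Python sketch. The four terms come from the message; the class name, the function name, and the choice to reject unknown terms outright are assumptions made for the example, not anything agreed on the list or defined by Darwin Core.

```python
from enum import Enum
from typing import Optional


class EstablishmentMeans(Enum):
    """The four broad categories proposed in the message (illustrative only)."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"  # an explicit "we do not know which category"


def parse_establishment_means(raw: Optional[str]) -> Optional[EstablishmentMeans]:
    """Map a free-text value onto the vocabulary.

    None or an empty string means no assertion was made (the "null" case),
    which is kept distinct from CRYPTOGENIC. Unknown terms raise ValueError.
    """
    if raw is None or not raw.strip():
        return None
    return EstablishmentMeans(raw.strip().lower())


print(parse_establishment_means(" Native "))
print(parse_establishment_means(""))
```

Keeping null separate from "cryptogenic" lets software tell "nobody said" apart from "somebody said we cannot tell".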
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
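To make the "a simple csv file is a legitimate representation of Darwin Core" point concrete, here is a minimal Python sketch that writes one record as CSV, using Darwin Core term names as column headers. The coordinate and datum values are the illustrative ones from the message above ("XYZ" is a placeholder, not a real datum); the species name and the function name are made up for the example.

```python
import csv
import io

# Column headers are Darwin Core term names.
FIELDS = ["scientificName", "decimalLatitude", "decimalLongitude", "geodeticDatum"]


def to_dwc_csv(records):
    """Serialize a list of dicts keyed by Darwin Core term names as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = [
    {
        "scientificName": "Danaus plexippus",  # hypothetical observation
        "decimalLatitude": "41.5",
        "decimalLongitude": "-70.7",
        "geodeticDatum": "XYZ",  # placeholder datum from the message
    }
]

dwc_csv = to_dwc_csv(records)
print(dwc_csv)
```

Adding a column such as basisOfRecord later only means appending another term name to FIELDS, which mirrors the schema flexibility described in point 4.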
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Sorry -- I meant I'd be happy to contribute examples (and use-cases, if those are also desired).
Rich
_____
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, October 12, 2010 10:35 AM To: tuco@berkeley.edu; 'Markus Döring (GBIF)' Cc: tdwg-content@lists.tdwg.org; 'Roger Hyam'; tdwg-bioblitz@googlegroups.com; 'Jerry Cooper' Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
YES! I like. I'd be happy to contribute use-cases.
_____
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of John Wieczorek Sent: Tuesday, October 12, 2010 10:32 AM To: Markus Döring (GBIF) Cc: tdwg-content@lists.tdwg.org; Roger Hyam; tdwg-bioblitz@googlegroups.com; Jerry Cooper Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Would we have the energy to compile example dwc records on how to use darwin core for certain use cases? The lack of guidance on how to use darwin core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful, and not only for newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc.
I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said, going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. Such explanations are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
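A small Python sketch of how software might use such a flag to weed out managed records before a distribution analysis. The field name "managed" is hypothetical (it is not a Darwin Core term), the three records are made up from the examples above, and how to treat unflagged records is left as a parameter because that is exactly the kind of grey area described.

```python
# "managed" is a hypothetical field name for the flag discussed above.
occurrences = [
    {"scientificName": "Panthera tigris", "locality": "London Zoo", "managed": True},
    {"scientificName": "Panthera tigris", "locality": "London", "managed": False},
    {"scientificName": "Pinus sylvestris", "locality": "Scottish Highlands"},  # flag not supplied
]


def for_distribution_analysis(records, keep_unflagged=True):
    """Drop records flagged as managed.

    Records that carry no flag are kept or dropped according to
    keep_unflagged, since that decision depends on the application.
    """
    kept = []
    for rec in records:
        flag = rec.get("managed")
        if flag is None:
            if keep_unflagged:
                kept.append(rec)
        elif not flag:
            kept.append(rec)
    return kept


wild = for_distribution_analysis(occurrences)
print([rec["locality"] for rec in wild])
```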
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view)
then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area"
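The equivalence can be sketched in a few lines of Python, assuming a deliberately simplified model in which an occurrence is just a taxon at a location. All names here are hypothetical, and the single lookup entry simply restates the Poa anceps example from the paragraph above.

```python
from typing import NamedTuple, Optional


class Occurrence(NamedTuple):
    """Occurrence = Taxon at Location (time and everything else left out)."""
    taxon: str
    location: str


# Status asserted for a taxon within a location, built up from prior records.
STATUS_BY_TAXON_AND_LOCATION = {
    ("Poa anceps", "Christchurch"): "native",
}


def nativeness(occurrence: Occurrence) -> Optional[str]:
    """Derive the status of an occurrence from what is known about its taxon
    at its location; None if nothing has been asserted."""
    return STATUS_BY_TAXON_AND_LOCATION.get((occurrence.taxon, occurrence.location))


specimen = Occurrence("Poa anceps", "Christchurch")
print(nativeness(specimen))
```

The status is stored against the taxon and location pair but read off the occurrence, which is one way of seeing why the two views describe the same thing.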
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org
[tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org;
tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism.
However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied
to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related
provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only
record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and
the obvious ones are 'in cultivation', 'in captivity', 'border intercept' . Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying
that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most
useful) would be something like:
Native (was there without any assistance from humans) Introduced (got there with the assistance of humans,
but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity) You might also throw in "Cryptogenic", which is an
assertion that we do not know which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know)
Of course, each of these can be further subdivded,
but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an intoduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic
(too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha, Rich ________________________________ From: Donald.Hobern@csiro.au
[mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define
many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald untitled Donald Hobern, Director, Atlas of
Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700,
Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/ From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain);
tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We
have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich ________________________________ From:
tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely
uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald untitled Donald Hobern, Director,
Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box
1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/ From: gtuco.btuco@gmail.com
[mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be
captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM,
Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we
do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald Donald Hobern, Director, Atlas of
Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent
bioblitz was to think about the suitability and appropriatness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations: 1. Darwin Core is almost exactly
right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC
records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple desciption of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augemented or
diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since
there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3]
instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum,
say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordiantes in DwC, as, e.g.: DwC:decimalLatitude=41.5 DwC:decimalLongitude=-70.7 DwC:geodeticDatum=XYZ
(I would argue that it should be
kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.
2. DwC:scientificName might be more
user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important
part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord"
in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for
another field event at next year's TDWG. This could be an opportunity to gather other types of data (eg. character data) and thereby i) expose meeting particpants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel. ---- 1.
http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz
-profile-v1-1
2. Slightly bastardizing our old
observation ontology - http://spire.umbc.edu/ontologies/Observation.owl 3. http://www.w3.org/2003/01/geo/ 4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-dat
a-in-2010.html
5.
http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Rich, what about a nomenclatural example, maybe separate ones for each code? http://code.google.com/p/darwincore/wiki/Examples
Markus
On Oct 12, 2010, at 22:36, Richard Pyle wrote:
Sorry -- I meant I'd be happy to contribute examples (and use-cases, if those are also desired).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, October 12, 2010 10:35 AM To: tuco@berkeley.edu; 'Markus Döring (GBIF)' Cc: tdwg-content@lists.tdwg.org; 'Roger Hyam'; tdwg-bioblitz@googlegroups.com; 'Jerry Cooper' Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
YES! I like. I'd be happy to contribute use-cases.
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of John Wieczorek Sent: Tuesday, October 12, 2010 10:32 AM To: Markus Döring (GBIF) Cc: tdwg-content@lists.tdwg.org; Roger Hyam; tdwg-bioblitz@googlegroups.com; Jerry Cooper Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Would we have the energy to compile example DwC records showing how to use Darwin Core for certain use cases? The lack of guidance on how to use Darwin Core was mentioned earlier. An additional example webpage for the DwC website would surely be really helpful, and not only for newbies. A DwC record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, DNA sequence, marine fishing net catches, etc.
I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading, and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock, but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
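The idea that the status of a taxon in a region should be the result of a query over occurrence records, rather than a property stored on each record, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `region` and `managed` keys are hypothetical field names (not Darwin Core terms), and the sample records are invented to mirror the two tiger examples.

```python
from collections import Counter


def taxa_in_region(occurrences, region, include_managed=False):
    """Count occurrence records per taxon in a region.

    Managed records (zoo, garden, farm) are skipped unless the caller
    asks for them, since the answer depends on the question asked.
    """
    counts = Counter()
    for occ in occurrences:
        if occ["region"] != region:
            continue
        if occ["managed"] and not include_managed:
            continue
        counts[occ["scientificName"]] += 1
    return dict(counts)


# Invented sample records: one free-roaming tiger, one zoo tiger.
occurrences = [
    {"scientificName": "Panthera tigris", "region": "London", "managed": False},
    {"scientificName": "Panthera tigris", "region": "London", "managed": True},
    {"scientificName": "Pinus sylvestris", "region": "Highlands", "managed": False},
]

print(taxa_in_region(occurrences, "London"))
print(taxa_in_region(occurrences, "London", include_managed=True))
```

The individual records carry only the "managed" flag; anything like native or invasive would be a judgement layered on top of query results such as these.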
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view), then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area".
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know into which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
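As a rough illustration of the difference between a controlled vocabulary and a free-text field, the four broad categories above could be sketched like this. The class and function names are hypothetical, and the lower-case string values are an assumption rather than any agreed TDWG vocabulary.

```python
from enum import Enum
from typing import Optional


class EstablishmentCategory(Enum):
    """The four broad categories proposed above (values are assumptions)."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"  # asserted unknown: we know that we don't know


def parse_category(value: Optional[str]) -> Optional[EstablishmentCategory]:
    """Map a raw field value onto the vocabulary.

    None or an empty string stays None (no assertion was made at all);
    anything outside the vocabulary raises ValueError instead of being
    silently accepted as free text.
    """
    if value is None or not value.strip():
        return None
    return EstablishmentCategory(value.strip().lower())


print(parse_category(" Native "))
print(parse_category("cryptogenic"))
print(parse_category(""))
```

Keeping None separate from CRYPTOGENIC preserves the distinction drawn above between "we don't know" and "we don't know whether or not we know".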
Aloha, Rich
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version, or a simple flag which could be machine-processable in those cases where providers can supply it, would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary in order to capture data from a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
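The coordinate handling discussed under point 1.ii can be sketched as a small conversion from the geo:lat / geo:long columns to the Darwin Core terms. This is a minimal sketch, assuming records are dictionaries keyed by term name; the function name `geo_to_dwc` and the default datum string "WGS84" are assumptions for illustration, and the coordinate values are the ones used in the example above.

```python
def geo_to_dwc(record, datum="WGS84"):
    """Return a copy of record with geo:lat/geo:long re-expressed as DwC terms.

    The datum must be stated explicitly in DwC; it defaults to WGS84 here
    because GPS-derived geo:lat/geo:long values were assumed to use it.
    """
    converted = {
        key: value
        for key, value in record.items()
        if key not in ("geo:lat", "geo:long")
    }
    converted["decimalLatitude"] = float(record["geo:lat"])
    converted["decimalLongitude"] = float(record["geo:long"])
    converted["geodeticDatum"] = datum
    return converted


# Hypothetical record using the coordinates from the example above.
observation = {
    "scientificName": "Poa anceps",
    "geo:lat": "41.5",
    "geo:long": "-70.7",
}
print(geo_to_dwc(observation))
print(geo_to_dwc(observation, datum="XYZ"))
```

The second call corresponds to the "another Datum, say XYZ" case, where the datum column carries the information that the geo namespace leaves implicit.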
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples .
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" <mdoering@gbif.org> wrote:
Would we have the energy to compile example DwC records showing how to use Darwin Core for certain use cases? The lack of guidance on how to use Darwin Core was mentioned earlier. An additional example webpage for the DwC website would surely be really helpful, and not only for newbies. A DwC record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, DNA sequence, marine fishing net catches, etc.
I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are
conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said.
Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence
of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels
and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage a London Zoo is "managed" in that it is
being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of
individuals) I do not observe their "introducedness" or their "nativeness" this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an
occurrence record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on a occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A
Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As
soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/ naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there. i.e. it used to be there but is presumed to be irradiated.
Does the problem occur because we are using the same term
"occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an
occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a propety of both, and require both, in a way - and that these different perspectives are actually the same thing.
Eg, if we describe (in a basic way) : Ocurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is
restricted by Location (jerry's view)
then this is equivalent to saying that Nativeness is a property
of an Ocurrence ! (Rich's view)
As Rich points out, it doesnt make a whole lot of sense to apply
Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded ocurrences, I know that this occurence/taxon is Native in this area"
Also I tend to feel that a lot of biodiversity properties are
properties of ocurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I wont go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content
list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org
] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
Sent: Monday, October 11, 2010 6:02 PM
To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message-----
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 4:23 p.m.
To: Jerry Cooper
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
Sent: Monday, October 11, 2010 4:38 PM
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich,
Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Tuesday, 12 October 2010 3:27 p.m.
To: Donald.Hobern@csiro.au; tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au]
Sent: Monday, October 11, 2010 3:44 PM
To: Richard Pyle; tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
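As a rough illustration of how software might "weed out captive occurrences", here is a minimal Python sketch. It assumes the four broad categories Rich lists earlier in this thread (Native, Introduced, Captive, Cryptogenic) were adopted as a controlled vocabulary for dwc:establishmentMeans; no such vocabulary had been agreed at the time of this discussion. The function names, the dictionary record layout, and the choice to keep records with a missing or unrecognised value are all assumptions made for the example.

```python
from enum import Enum
from typing import Dict, Iterable, List, Optional


class EstablishmentMeans(Enum):
    """Hypothetical controlled vocabulary, taken from the categories proposed in the thread."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"  # asserted unknown, which is different from a missing value


def parse_establishment_means(value: Optional[str]) -> Optional[EstablishmentMeans]:
    """Map a free-text value onto the vocabulary; return None if absent or unrecognised."""
    if value is None:
        return None
    try:
        return EstablishmentMeans(value.strip().lower())
    except ValueError:
        return None


def usable_for_distribution(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop records explicitly marked captive; keep everything else (an assumed policy)."""
    kept = []
    for record in records:
        status = parse_establishment_means(record.get("establishmentMeans"))
        if status is not EstablishmentMeans.CAPTIVE:
            kept.append(record)
    return kept
```

Whether records with no value at all should be kept or dropped is exactly the kind of grey area discussed above; the sketch keeps them, but a stricter consumer could just as easily exclude them.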
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 12:33 PM
To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au
Sent: Monday, October 11, 2010 2:59 PM
To: tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John.
This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
Sent: Tuesday, 12 October 2010 11:34 AM
To: Hobern, Donald (CES, Black Mountain)
Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs
Sent: Monday, 11 October 2010 10:47 PM
To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
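To make points 1, ii and 4 above concrete, here is a small Python sketch of "a simple csv file" whose column headers are term names. It is only an illustration under stated assumptions: the column set is a guess at a minimal profile (the real one is in [1]), the sample values and observer name are invented, and the helper names `to_dwc_coordinates` and `write_csv` are hypothetical. It shows a table using geo:lat / geo:long, a column being added later by a transcriber, and the same coordinates re-expressed with the DwC terms and an explicit datum.

```python
import csv
import io

# Assumed minimal column set for illustration; see the data profile [1] for the real one.
BASE_COLUMNS = ["DwC:scientificName", "geo:lat", "geo:long", "obs:observedBy"]


def to_dwc_coordinates(row, datum="WGS84"):
    """Re-express geo:lat / geo:long using DwC coordinate terms plus an explicit datum."""
    converted = {k: v for k, v in row.items() if k not in ("geo:lat", "geo:long")}
    converted["DwC:decimalLatitude"] = row["geo:lat"]
    converted["DwC:decimalLongitude"] = row["geo:long"]
    converted["DwC:geodeticDatum"] = datum
    return converted


def write_csv(rows, columns):
    """Serialise rows to CSV text; columns missing from a row are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample values, not real bioblitz data.
rows = [
    {"DwC:scientificName": "Quercus alba", "geo:lat": "41.5",
     "geo:long": "-70.7", "obs:observedBy": "Observer A"},
]

# A transcriber later needs basisOfRecord: the column is appended,
# and earlier rows simply leave it blank.
columns = BASE_COLUMNS + ["DwC:basisOfRecord"]
rows.append({"DwC:scientificName": "Quercus", "geo:lat": "41.6",
             "geo:long": "-70.6", "DwC:basisOfRecord": "HumanObservation"})

csv_text = write_csv(rows, columns)
```

The second row uses a genus-only name, since any rank is acceptable for a scientific name, and omits the observer, which the text notes is often not available.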
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
OK, because of a momentary heavy work load I'm still in the process of getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it?
Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" <mdoering@gbif.org> wrote:
Would we have the energy to compile example dwc records on how to use darwin core for certain use cases? The lack of guidance on how to use darwin core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful, and not only for newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc.
I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
> Wow - what a thread to come back to.
> I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
> This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
> The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
> Take two examples.
> A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
> A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
> As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
> I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
> There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
> The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
> Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
> Sorry to be long winded.
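Roger's question of whether scoring taxa to regions should "always be the results of a query on occurrence records" can be sketched in a few lines of Python. This is only an illustration, not anything proposed in the thread: the record keys `taxon`, `region` and `managed` are hypothetical stand-ins (`managed` being the inverse of the "intended for distribution" flag), and the sample records are invented to mirror the tiger and Scots Pine examples.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def regions_by_taxon(occurrences: Iterable[dict], include_managed: bool = False) -> Dict[str, Set[str]]:
    """Derive taxon -> regions from primary occurrence records.

    The regional picture is computed at query time rather than stored on each
    record, and the caller decides whether managed occurrences count.
    """
    result = defaultdict(set)
    for occ in occurrences:
        if occ["managed"] and not include_managed:
            continue
        result[occ["taxon"]].add(occ["region"])
    return dict(result)


# Invented records mirroring the examples in the message above.
occurrences = [
    {"taxon": "Panthera tigris", "region": "London", "managed": False},   # roaming free
    {"taxon": "Panthera tigris", "region": "London", "managed": True},    # zoo cage
    {"taxon": "Pinus sylvestris", "region": "Highlands", "managed": True},  # old plantation
]

wild_only = regions_by_taxon(occurrences)
everything = regions_by_taxon(occurrences, include_managed=True)
```

The `include_managed` switch reflects the remark that for some applications the plantation pine counts as managed and for others it does not, so the answer depends on the question asked.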
> Roger
>
> On 12 Oct 2010, at 09:36, Kevin Richards wrote:
>> I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
>> Eg, if we describe (in a basic way):
>> Occurrence = Taxon at Location
>> then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view)
>> then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
>> As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area"
>> Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
>> Also, we discussed this topic a while ago on the tdwg-content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
>> Kevin
To illustrate this >>> need: one of the developers spoke of the design choice >>> between "a simple csv file and a Darwin Core record". But a >>> simple csv file is a legitimate representation of Darwin >>> Core! To be fair to the developer, such a sentence might not >>> have struck me as absurd a year ago, before Remsen said >>> "let's use DwC for the bioblitz". >>> >>> We provided a couple of example DwC >>> records (text and rdf) in the bioblitz data profile [1]. I >>> think the lessons learned document should include an on-line >>> catalog of cut-and-pasteable examples covering a variety of >>> use cases, together with a dead simple desciption of DwC, >>> something like "Darwin Core is a collection of terms, >>> together with definitions." >>> >>> Here are areas where we augemented or >>> diverged from DwC in the bioblitz: >>> >>> i. We added obs:observedBy [2], since >>> there is no equivalent property in DwC, and it's important in >>> Citizen Science (though often not available). >>> >>> ii. We used geo:lat and geo:long [3] >>> instead of DwC terms for latitude and longitude. The geo >>> namespace is a well used and supported standard, and records >>> with geo coordinates are automatically mapped by several >>> applications. Since everyone was using GPS to retrieve their >>> coordinates, we were able to assume WGS-84 as the datum. >>> >>> If someone had used another Datum, >>> say XYZ, we would have added columns to the Fusion table so >>> that they could have expressed their coordiantes in DwC, as, e.g.: >>> DwC:decimalLatitude=41.5 >>> DwC:decimalLongitude=-70.7 >>> DwC:geodeticDatum=XYZ >>> >>> (I would argue that it should be >>> kosher DwC to express the above as simply XYZ:lat and >>> XYZ:long. DwC already incorporates terms from other >>> namespaces, such as Dublin Core, so there is precedent for this. >>> >>> 2. DwC:scientificName might be more >>> user friendly than taxonomy:binomial and the other taxonomy >>> machine tags EOL uses for flickr images. 
If >>> DwC:scientificName isn't self-explanatory enough, a user can >>> look it up, and see that any scientific name is acceptable, >>> at any taxonomic rank, or not having any rank. And once we >>> have a scientific name, higher ranks can be inferred. >>> >>> 3. Catalogue of Life was an important >>> part of the workflow, but we had some problems with it. >>> Future bioblitzes might consider using something like a CoL >>> fork, as recently described by Rod Page [4]. >>> >>> 4. We didn't include "basisOfRecord" >>> in the original data profile, and so it wasn't a column in >>> the Fusion Table [5]. But when a transcriber felt it was >>> necessary to include in order to capture data in a particular >>> field sheet, she just added the column to the table. This >>> flexibility of schema is important, and is in harmony with >>> the semantic web. >>> >>> 5. There seemed to be enthusiasm for >>> another field event at next year's TDWG. This could be an >>> opportunity to gather other types of data (eg. >>> character data) and thereby >>> i) expose meeting particpants to >>> another set of everyday problems from the world of >>> biodiversity workflows, and ii) try other TDWG technology on >>> for size, e.g. the observation exchange format, annotation >>> framework, etc. >>> >>> >>> Happy Thanksgiving to all in Canada - >>> Joel. >>> ---- >>> >>> >>> 1. >>> http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz >> -profile-v1-1 >>> 2. Slightly bastardizing our old >>> observation ontology - >>> http://spire.umbc.edu/ontologies/Observation.owl >>> 3. http://www.w3.org/2003/01/geo/ >>> 4. >>> http://iphylo.blogspot.com/2010/10/replicating-and-forking-dat >> a-in-2010.html >>> 5. 
>>> http://tables.googlelabs.com/DataSource?dsrcid=248798 >>> >>> >>> _______________________________________________ >>> tdwg-content mailing list >>> tdwg-content@lists.tdwg.org <mailto:tdwg-content@lists.tdwg.org> >>> >>> http://lists.tdwg.org/mailman/listinfo/tdwg-content >>> >>> _______________________________________________ >>> tdwg-content mailing list >>> tdwg-content@lists.tdwg.org <mailto:tdwg-content@lists.tdwg.org> >>> >>> http://lists.tdwg.org/mailman/listinfo/tdwg-content >>> >>> >>> >>> >>> ________________________________ >>> >>> Please consider the environment before printing this email >>> Warning: This electronic message together with any >>> attachments is confidential. If you receive it in error: (i) >>> you must not read, use, disclose, copy or retain it; (ii) >>> please contact the sender immediately by reply email and then >>> delete the emails. >>> The views expressed in this email may not be those of >>> Landcare Research New Zealand Limited. >>> http://www.landcareresearch.co.nz >>> >>> >>> >>> >>> >>> Please consider the environment before printing this email >>> Warning: This electronic message together with any >>> attachments is confidential. If you receive it in error: (i) >>> you must not read, use, disclose, copy or retain it; (ii) >>> please contact the sender immediately by reply email and then >>> delete the emails. >>> The views expressed in this email may not be those of >>> Landcare Research New Zealand Limited. 
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Steve,
The TDWG process for creating standards is described at http://www.tdwg.org/about-tdwg/process/ and is worth reading if you haven't done so already.
Another document worth reading is the standards format specification (http://www.tdwg.org/standards/147/). I never pushed this "standard" through public review, but it still functions as a guideline for formatting and reflects our view of what is and isn't within the scope of a "standard". In other words, we are doing our best to follow the basic ideas laid out there about the kinds of specifications:
Type 1 -- normative specification, versioned
Type 2 -- versioned, supplementary documentation
Type 3 -- uncontrolled supplementary documentation
The page of examples John and others have put up on the DarwinCore site is non-normative, uncontrolled documentation.
The thing you were proposing sounded like an applicability statement: guidance about how another standard, RDF, should be used in biodiversity informatics. Applicability statements can also be treated as standards and receive TDWG ratification as such, but they don't create a de novo standard.
Interest groups and task groups are explained in the Process. If you want to create an applicability statement for RDF and DarwinCore, you could prepare a task group charter and submit it to the executive for approval. Approval would make it a formal Task Group. See other task group charters for examples.
-Stan
On 10/13/10 6:33 AM, "Steve Baskauf" steve.baskauf@vanderbilt.edu wrote:
OK, because of a momentary heavy work load I'm still in the process of getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it? Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Would we have the energy to compile example DwC records showing how to use Darwin Core for certain use cases? The lack of guidance on how to use Darwin Core was mentioned earlier. An additional examples webpage for the DwC website would surely be really helpful, and not only for newbies. A DwC record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, DNA sequence, marine fishing net catches, etc.
I'd volunteer to do the HTML page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading, and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view) then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area"
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
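Kevin's equivalence above can be made concrete with a small sketch. This is only an illustration under assumptions: the `Occurrence` class, the `STATUS_BY_TAXON_AND_LOCATION` table and the `nativeness` function are hypothetical names, not Darwin Core terms, and the single table entry simply reuses the Poa anceps / Christchurch example from the message. If a status is stored per (taxon, location) pair, then any occurrence, being a taxon at a location, can be given that status by lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical minimal occurrence: a taxon at a location."""
    taxon: str
    location: str


# Hypothetical status table keyed by (taxon, location), i.e. "a property of a
# Taxon that is restricted by Location". The entry is illustrative only.
STATUS_BY_TAXON_AND_LOCATION = {
    ("Poa anceps", "Christchurch"): "native",
}


def nativeness(occ: Occurrence) -> Optional[str]:
    """Derive a status for an occurrence from its taxon and location.

    Returns None when no status has been recorded for that pair.
    """
    return STATUS_BY_TAXON_AND_LOCATION.get((occ.taxon, occ.location))
```

The status is never observed on the individual record; it is derived from prior knowledge about the taxon in that area, which is the point both Kevin and Roger make.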
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich,
Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and cannot infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
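To show how the four broad categories proposed above could serve Donald's goal of letting software weed out captive occurrences, here is a minimal sketch. It rests on assumptions: the lowercase string values, the `Establishment` enum, and the helper names are hypothetical and are not an adopted Darwin Core vocabulary; the records are plain dictionaries keyed by the existing term name establishmentMeans.

```python
from enum import Enum


class Establishment(Enum):
    """Hypothetical controlled values for the four categories in the message."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"  # explicit assertion that the category is unknown


def parse_establishment(value):
    """Map free text to a controlled value; None means nothing usable was supplied."""
    if value is None:
        return None
    try:
        return Establishment(value.strip().lower())
    except ValueError:
        return None


def usable_for_distribution(records):
    """Drop only records explicitly marked captive; keep all others."""
    return [
        r for r in records
        if parse_establishment(r.get("establishmentMeans")) is not Establishment.CAPTIVE
    ]
```

Keeping `None` separate from `CRYPTOGENIC` preserves the distinction drawn in the message: a missing value says nothing, while "cryptogenic" is a deliberate statement that the category is not known.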
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John.
This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well-used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
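As an illustration of the point that a simple csv file can carry Darwin Core, the sketch below writes the coordinate example above as one CSV row whose column headers are Darwin Core term names. It is a sketch under assumptions: the `records_to_csv` helper and the `FIELDS` list are hypothetical, the scientific name is an invented sample value, and "XYZ" is kept as the placeholder datum from the message rather than a real datum code.

```python
import csv
import io

# Column headers are Darwin Core term names; the selection is illustrative.
FIELDS = ["scientificName", "decimalLatitude", "decimalLongitude", "geodeticDatum"]


def records_to_csv(records):
    """Serialise a list of dict records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical record: coordinates and datum placeholder come from the message,
# the scientific name is only a sample value.
example = [{
    "scientificName": "Danaus plexippus",
    "decimalLatitude": "41.5",
    "decimalLongitude": "-70.7",
    "geodeticDatum": "XYZ",
}]

csv_text = records_to_csv(example)
```

Nothing here is specific to a bioblitz; the only thing that makes the file "Darwin Core" is that the column names are terms with published definitions.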
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
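The schema flexibility described in point 4 can be sketched for a table held as a list of dictionaries. This is an assumption-laden illustration, not how the Fusion Table worked internally: `add_column` is a hypothetical helper, and existing rows simply receive an empty default for the new term.

```python
def add_column(fieldnames, rows, new_field, default=""):
    """Return (fieldnames, rows) with new_field appended.

    Existing rows get `default` for the new column; inputs are not modified.
    If the column already exists, copies are returned unchanged.
    """
    if new_field in fieldnames:
        return list(fieldnames), [dict(r) for r in rows]
    new_fields = list(fieldnames) + [new_field]
    new_rows = [{**r, new_field: r.get(new_field, default)} for r in rows]
    return new_fields, new_rows
```

A transcriber who needs basisOfRecord for one field sheet could then add the column once and fill it in only where it applies, leaving earlier rows blank.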
5. There seemed to be enthusiasm for
another field event at next year's TDWG. This could be an opportunity to gather other types of data (eg. character data) and thereby i) expose meeting particpants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Stan, John, what exactly is part of the ratified dwc standard? The download at http://www.tdwg.org/standards/450/ seems to include all files under rs.tdwg.org/dwc apart from the legacy ones. Is this zip the standard? In particular I wonder about the xml and text guidelines and their supplementary schemas/files. Are these part of the "core" dwc standard?
Intuitively I always thought that the plain (html) list of dwc terms is the normative part of the dwc standard while the application guidelines are kept separate and up to now are not formally endorsed. John mentioned in a past email though that http://rs.tdwg.org/dwc/rdf/dwcterms.rdf is a normative file. If we need to change any of the guidelines or their associated files, does this need to go through the formal process then?
Markus
On Oct 13, 2010, at 21:00, Blum, Stan wrote:
Steve,
The TDWG process for creating standards is here: http://www.tdwg.org/about-tdwg/process/ This is worth reading if you haven’t done so already.
Another document worth reading is the standards format specification http://www.tdwg.org/standards/147/ I never pushed this “standard” through public review, but it still functions as a guideline for formatting and for our view of what is and isn’t within scope of a “standard”. In other words, we are doing our best to follow the basic ideas laid out there about the kinds of specifications:
Type 1 -- normative specification, versioned;
Type 2 -- versioned, supplementary documentation;
Type 3 -- uncontrolled supplementary documentation.
The page of examples John and others have put up on the DarwinCore site is non-normative, uncontrolled documentation.
The thing you were proposing sounded like an applicability statement — offering guidance about how another standard, RDF, should be used in biodiversity informatics. These can also be treated as standards, and get TDWG ratification as standard, but don’t create a de-novo standard.
Interest groups and task groups are explained in the Process. If you want to create an applicability statement for RDF and DarwinCore, you could prepare a task group charter and submit it to the executive for approval. Approval would make it a formal Task Group. See other task group charters for examples.
-Stan
On 10/13/10 6:33 AM, "Steve Baskauf" steve.baskauf@vanderbilt.edu wrote:
OK, because of a momentary heavy work load I'm still in the process of getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it?
Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Would we have the energy to compile example dwc records on how to use Darwin Core for certain use cases? The lack of guidance on how to use Darwin Core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful, and not only for newbies: a dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue samples, DNA sequences, marine fishing net catches, etc.
I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness". This is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading, and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
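As a rough illustration of how software might use such a flag to weed out managed occurrences before a distribution analysis, here is a small Python sketch. It assumes the information is carried in DwC:establishmentMeans and that values such as "captive" and "cultivated" mark managed organisms. That value list is a hypothetical controlled vocabulary for the sake of the example, not anything ratified, and the records are invented.

```python
# Hypothetical values marking organisms maintained by human effort.
MANAGED_VALUES = {"captive", "cultivated"}


def usable_for_distribution(record):
    """Return True unless the record is explicitly marked as managed.

    A missing or empty value is treated as "unknown" and kept, so the
    filter only removes records whose provider asserted management.
    """
    value = record.get("DwC:establishmentMeans", "").strip().lower()
    return value not in MANAGED_VALUES


# Invented example records echoing the two tigers and the Scots Pine.
records = [
    {"DwC:scientificName": "Panthera tigris",
     "DwC:locality": "London Zoo",
     "DwC:establishmentMeans": "captive"},
    {"DwC:scientificName": "Panthera tigris",
     "DwC:locality": "London",
     "DwC:establishmentMeans": ""},
    {"DwC:scientificName": "Pinus sylvestris",
     "DwC:locality": "Scottish Highlands",
     "DwC:establishmentMeans": "native"},
]

kept = [r for r in records if usable_for_distribution(r)]
for r in kept:
    print(r["DwC:scientificName"], "-", r["DwC:locality"])
```

Keeping records with no value is a design choice for this sketch; an analysis that needs stricter input could instead require an explicit assertion.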
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
> I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
>
> E.g., if we describe (in a basic way):
> Occurrence = Taxon at Location
>
> then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view)
> then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
>
> As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area".
>
> Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
>
> Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
>
> Kevin
>
> ________________________________________
> From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
> Sent: Tuesday, 12 October 2010 5:41 p.m.
> To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com > Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz > > Hi Jerry, > > Before we agree to disagree, let me try to elaborate a bit more: > > I think we both agree that "Nativeness" (to borrow Dave's term) is a > property of a taxon at a geographic locality (it could also be a property of > a taxon in a class of habitat, but few people actually frame it this way). > > The reason I think that "Nativeness" is best represented as a property of an > Occurrence, rather than of a taxon, is that a taxon is a circumscribed set > of organisms, usually based on evolutionary relatedness or morphological or > genetic similarity. By contrast, an Occurrence is about the presence of a > member or multiple members of a taxon concept in space and time (i.e., at a > particular place and time). > > We often think of Occurrence records in terms of individual organisms (e.g., > specimens, or specific observed or photographed organisms), and I agree, > it's weird to think of "Nativeness" as it applies to an individual organism. > However, my understanding is that Occurrence instances can also apply to > populations -- which is what terms such as establishmentMeans and > occurrenceStatus fit into this class. > > More generally, if we agree that "Nativeness" is a property of a taxon at a > particular locality, the way that this intersection is usually manifest in > DwC is via Occurrence and Event instances. > > How else would you represent "Nativeness" within DwC? > > Aloha, > Rich > >> -----Original Message----- >> From: tdwg-content-bounces@lists.tdwg.org >> [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper >> Sent: Monday, October 11, 2010 6:02 PM >> To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com >> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz >> >> We will have to agree to disagree. 
>> >> For me at least 'Native', 'Invasive' etc are clearly not >> properties associated with a collection event. They are >> collective statements, not necessarily about properties of >> the taxon as a whole, but about the properties of a taxon in >> some restricted sense - usually geographically restricted. >> >> GISIN, like our model here in NZ, pulls together such items >> under a triplet of taxon/occurrence statement/geographical >> extent linked to a publication. >> >> >> Jerry >> >> >> -----Original Message----- >> From: Richard Pyle [mailto:deepreef@bishopmuseum.org] >> Sent: Tuesday, 12 October 2010 4:23 p.m. >> To: Jerry Cooper >> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com >> Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz >> >> Hi Jerry, >> >> Yes, this is a road I've been down before. Intuitively, >> these terms seem like they should apply to taxon concepts, >> but it turns out that's not the right way to do it. Things >> like "native" and "invasive" are not properties of taxon >> concepts; they're the property of an occurrence (which, I >> suspect, is why establishmentMeans is included in the >> Occurrence class in DwC; e.g., see the examples at >> http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans >> >> Rich >> >> ________________________________ >> >> From: tdwg-content-bounces@lists.tdwg.org >> [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper >> Sent: Monday, October 11, 2010 4:38 PM >> Cc: tdwg-content@lists.tdwg.org; >> tdwg-bioblitz@googlegroups.com >> Subject: Re: [tdwg-content] What I learned at the >> TechnoBioBlitz >> >> >> >> Rich, >> >> >> >> Let's not confuse those terms which are best applied >> to a taxon concept rather than a specific >> collection/observation of a taxon at a location. >> >> >> >> There are existing vocabularies for taxon-related >> provenance, like those in GISIN, or the vocabulary Roger >> mentioned in his PESI talk at TDWG. 
>> >> >> >> However, against a specific collection you can only >> record what the recorder actually knows at that location for >> that specific collected taxon, and not to infer a status like >> 'introduced' etc. >> >> >> >> So, to me, the vocabulary reduces even further - and >> the obvious ones are 'in cultivation', 'in captivity', >> 'border intercept' . Our botanical collection management >> system would hold more data on provenance of a specific >> collection and linkages between events - from the wild at t=1, >> x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But >> then we often have that data because we are generating it. >> >> >> >> Jerry >> >> >> >> >> >> From: tdwg-content-bounces@lists.tdwg.org >> [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle >> Sent: Tuesday, 12 October 2010 3:27 p.m. >> To: Donald.Hobern@csiro.au; tuco@berkeley.edu >> Cc: tdwg-content@lists.tdwg.org; >> tdwg-bioblitz@googlegroups.com >> Subject: Re: [tdwg-content] What I learned at the >> TechnoBioBlitz >> >> >> >> I certainly agree it's important! I was just saying >> that a simple flag probably wouldn't be enough. I like the >> idea of a controlled vocabulary (as you and John both allude >> to), and I can imagine about a half-dozen terms that our >> community will no-doubt adopt with almost no debate..... 
:-) >> >> >> >> In my mind, the broadest categories (and likely most >> useful) would be something like: >> >> >> >> Native (was there without any assistance from humans) >> >> Introduced (got there with the assistance of humans, >> but is inhabiting the natural environment) >> >> Captive (brought by humans and still maintained in captivity) >> >> >> >> You might also throw in "Cryptogenic", which is an >> assertion that we do not know which of these categories a >> particular organism falls (not the same as null, which means >> we don't know whether or not we know) >> >> >> >> Of course, each of these can be further subdivded, >> but the more we subdivide, the greater the ratio of >> fuzzy:clean distinctions. I would say that the terms should >> be established in consultation with those most likely to use >> them (e.g., as you suggest, distribution analysis, niche modellers, >> etc.) For example, it might be useful to distinguish between >> an organism that was itself introduced, compared to the >> progeny (or a well-established >> population) of an intoduced organism. This information can be >> useful for separating things likely to become established in >> new localities, vs. things that do not seem to "take" in a >> novel environment. >> >> Anyway...I didn't want to say a lot on this topic >> (too late?); I just wanted to steer more towards controlled >> vocabulary, than simple flag field. >> >> >> >> Aloha, >> >> Rich >> >> >> >> ________________________________ >> >> From: Donald.Hobern@csiro.au >> [mailto:Donald.Hobern@csiro.au] >> Sent: Monday, October 11, 2010 3:44 PM >> To: Richard Pyle; tuco@berkeley.edu >> Cc: tdwg-content@lists.tdwg.org; >> tdwg-bioblitz@googlegroups.com >> Subject: RE: [tdwg-content] What I learned at >> the TechnoBioBlitz >> >> Hi Rich. >> >> >> >> I recognise this (and could probably define >> many different useful flags). 
The bottom line is really >> whether or not the location is one which should be used for >> distribution analysis, niche modelling and similar >> activities. There will certainly be many grey areas, but it >> would be good if software could weed out captive occurrences. >> >> >> >> Donald >> >> >> >> >> >> untitled >> >> >> >> Donald Hobern, Director, Atlas of >> Living Australia >> >> CSIRO Ecosystem Sciences, GPO Box 1700, >> Canberra, ACT 2601 >> >> Phone: (02) 62464352 Mobile: 0437990208 >> >> Email: Donald.Hobern@csiro.au >> mailto:Donald.Hobern@csiro.au >> >> Web: http://www.ala.org.au/ >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> From: Richard Pyle [mailto:deepreef@bishopmuseum.org] >> Sent: Tuesday, 12 October 2010 12:33 PM >> To: Hobern, Donald (CES, Black Mountain); >> tuco@berkeley.edu >> Cc: tdwg-content@lists.tdwg.org; >> tdwg-bioblitz@googlegroups.com >> Subject: RE: [tdwg-content] What I learned at >> the TechnoBioBlitz >> >> >> >> I'm not so sure a simple flag will do it. We >> have examples ranging from animals in zoos, to escaped >> animals, to intentionally and unintentionally introduced >> populations, to naturalized populations -- and just about >> everything in-between. Where on this spectrum would you draw >> the line for flagging something as "naturally occurring"? >> >> >> >> Rich >> >> >> >> ________________________________ >> >> From: >> tdwg-content-bounces@lists.tdwg.org >> [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of >> Donald.Hobern@csiro.au >> Sent: Monday, October 11, 2010 2:59 PM >> To: tuco@berkeley.edu >> Cc: tdwg-content@lists.tdwg.org; >> tdwg-bioblitz@googlegroups.com >> Subject: Re: [tdwg-content] What I >> learned at the TechnoBioBlitz >> >> Thanks, John. >> >> >> >> This is useful, but completely >> uncontrolled - effectively a verbatimEstablishmentMeans. 
>> Having a more controlled version or a simple flag which could >> be machine-processible in those cases where providers can >> supply it would be useful. >> >> >> >> Donald >> >> >> >> >> >> untitled >> >> >> >> Donald Hobern, Director, >> Atlas of Living Australia >> >> CSIRO Ecosystem Sciences, GPO Box >> 1700, Canberra, ACT 2601 >> >> Phone: (02) 62464352 Mobile: 0437990208 >> >> Email: Donald.Hobern@csiro.au >> mailto:Donald.Hobern@csiro.au >> >> Web: http://www.ala.org.au/ >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> From: gtuco.btuco@gmail.com >> [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek >> Sent: Tuesday, 12 October 2010 11:34 AM >> To: Hobern, Donald (CES, Black Mountain) >> Cc: jsachs@csee.umbc.edu; >> tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org >> Subject: Re: [tdwg-content] What I >> learned at the TechnoBioBlitz >> >> >> >> Natural occurrence is meant to be >> captured through the term dwc:establishmentMeans >> (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans). >> >> On Mon, Oct 11, 2010 at 5:16 PM, >> Donald.Hobern@csiro.au wrote: >> >> Thanks, Joel. >> >> Nice summary. One addition which we >> do need to resolve (and which has been suggested in recent >> months) is to have a flag to indicate whether a record should >> be considered to show a "natural" >> occurrence (in distinction from cultivation, botanic gardens, >> zoos, etc.). >> This is not so much an issue in a BioBlitz, but is certainly >> a factor with citizen science recording in general - see the >> number of zoo animals in the Flickr EOL group. 
>> >> Donald >> >> Donald Hobern, Director, Atlas of Living Australia >> CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 >> Phone: (02) 62464352 Mobile: 0437990208 >> Email: Donald.Hobern@csiro.au >> Web: http://www.ala.org.au/
On Wed, Oct 13, 2010 at 12:58 PM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Stan, John, what exactly is part of the ratified dwc standard? The download at http://www.tdwg.org/standards/450/ seems to include all files under rs.tdwg.org/dwc apart from the legacy ones. Is this zip the standard? In particular I wonder about the xml and text guidelines and their supplementary schemas/files. Are these part of the "core" dwc standard?
Everything in the zip file is a part of the standard. I call the rdf files (one with history, one without, for convenience) the normative standard for the terms because it contains all of the information for the term definitions and no more. Basically, the HTML representation can be created from the RDF, and the HTML version is just one way to look at it. To me that makes the RDF fundamental.
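The idea that the HTML term list is just one view derived from the RDF can be sketched as follows. This is only an illustration: the inline RDF/XML sample is made up (its definition texts are placeholders, not the ratified wording), and it assumes each term is an rdf:Description carrying rdfs:label and rdfs:comment, which may not match the real dwcterms.rdf in every detail.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace prefixes in ElementTree's "{uri}localname" notation.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

# Made-up sample, standing in for the normative RDF file.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://rs.tdwg.org/dwc/terms/scientificName">
    <rdfs:label>Scientific Name</rdfs:label>
    <rdfs:comment>Placeholder definition text.</rdfs:comment>
  </rdf:Description>
  <rdf:Description rdf:about="http://rs.tdwg.org/dwc/terms/establishmentMeans">
    <rdfs:label>Establishment Means</rdfs:label>
    <rdfs:comment>Placeholder definition text.</rdfs:comment>
  </rdf:Description>
</rdf:RDF>
"""


def terms_from_rdf(rdf_xml):
    """Extract (uri, label, comment) for each described term."""
    root = ET.fromstring(rdf_xml)
    terms = []
    for desc in root.findall(RDF + "Description"):
        uri = desc.get(RDF + "about", "")
        label = desc.findtext(RDFS + "label", default="")
        comment = desc.findtext(RDFS + "comment", default="")
        terms.append((uri, label, comment))
    return terms


def render_html(terms):
    """Render the terms as a simple HTML definition list."""
    items = []
    for uri, label, comment in terms:
        items.append(
            "<dt>{} ({})</dt>\n<dd>{}</dd>".format(
                escape(label), escape(uri), escape(comment)
            )
        )
    return "<dl>\n" + "\n".join(items) + "\n</dl>"


terms = terms_from_rdf(SAMPLE_RDF)
html_page = render_html(terms)
print(html_page)
```

Since nothing in the HTML is not already in the RDF, regenerating the page after a term change is mechanical, which is the sense in which the RDF is the more fundamental artifact.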
Intuitively I always thought that the plain (html) list of dwc terms is the normative part of the dwc standard while the application guidelines are kept separate and up to now are not formally endorsed. John mentioned in a past email though that http://rs.tdwg.org/dwc/rdf/dwcterms.rdf is a normative file. If we need to change any of the guidelines or their associated files, does this need to go through the formal process then?
Changes to the schemas and guidelines that have been ratified do need to go through the process defined in the Darwin Core Namespace document (also a part of the ratified standard). It is easy if they are corrections, and a little harder if they actually have backward compatibility implications.
Markus
On Oct 13, 2010, at 21:00, Blum, Stan wrote:
Steve,
The TDWG process for creating standards is here:
http://www.tdwg.org/about-tdwg/process/ This is worth reading if you haven’t done so already.
Another document worth reading is the standards format specifiation
http://www.tdwg.org/standards/147/ I never pushed this “standard” through public review, but it still functions a guideline for formatting and our view of what is isn’t within scope of a “standard”. In other words, we are doing our best to follow the basic ideas laid out there about the kinds of specifications:
Type 1 -- normative specification, versioned; Type 2 -- versioned, supplementary documentation; Type 3 — uncontrolled supplementary documentation.
The page of examples John and others have put up on the DarwinCore site
is non-normative, uncontrolled documentation.
The thing you were proposing sounded like an applicability statement —
offering guidance about how another standard, RDF, should be used in biodiversity informatics. These can also be treated as standards, and get TDWG ratification as a standard, but don’t create a de-novo standard.
Interest groups and task groups are explained in the Process. If you
want to create an applicability statement for RDF and DarwinCore, you could prepare a task group charter and submit it to the executive for approval. Approval would make it a formal Task Group. See other task group charters for examples.
-Stan
On 10/13/10 6:33 AM, "Steve Baskauf" steve.baskauf@vanderbilt.edu
wrote:
OK, because of a momentary heavy work load I'm still in the process of
getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it?
Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats,
let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have
XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" <
mdoering@gbif.org> wrote:
Would we have the energy to compile example dwc records on how to use
darwin core for certain use cases?
The lack of guidance on how to use darwin core was mentioned earlier.
An additional example webpage for the dwc website would surely be really helpful, and not only for newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc.
I'd volunteer to do the html page if I'm given example records with a
short use case description...
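As one possible illustration of the kind of example record asked for here, the following is a minimal Python sketch of a bird-watching observation written as a plain CSV file whose column headers are Darwin Core term names. The species, date, and helper names (FIELDS, to_csv, from_csv) are invented for illustration, and the coordinates reuse the 41.5, -70.7 pair quoted elsewhere in this thread. It is a sketch under those assumptions, not an endorsed template.

```python
import csv
import io

# Column headers are Darwin Core term names (written without a namespace prefix).
FIELDS = [
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
]

# One made-up bird-watching observation; every value here is illustrative.
records = [
    {
        "basisOfRecord": "HumanObservation",
        "scientificName": "Larus argentatus",
        "eventDate": "2010-09-25",
        "decimalLatitude": "41.5",
        "decimalLongitude": "-70.7",
        "geodeticDatum": "WGS84",
    }
]


def to_csv(rows):
    """Serialize a list of record dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of plain dicts keyed by term name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    print(to_csv(records), end="")
```

Because the header row carries the term names, a spreadsheet or Fusion Table export with these columns can be read back with from_csv without any extra schema.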
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
> Wow - what a thread to come back to. > > I saw my name mentioned so I ought to chip in. I also think we are
conflating two distinct things under the name "occurrence".
> > This point is largely just expanding on what Kevin just said. Going
down the road he was wise enough not to go down!
> > The vocabulary I briefly presented at TDWG was aimed at occurrence
of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
> > Take two examples. > > A tiger roaming "free" in London living off a diet of squirrels and
tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
> > A tiger sitting in a cage at London Zoo is "managed" in that it is
being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
> > As Kevin says, when I observe an individual (or flock of
individuals) I do not observe their "introducedness" or their "nativeness" this is something that is derived from combining multiple observations of occurrence of individuals.
> > I would therefore advocate that we just have a flag on an
occurrence record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
> > There are of course grey areas (biology always has grey areas). A
Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
> > The status of taxa in regions is a completely different thing. As
soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there. i.e. it used to be there but is presumed to be eradicated.
> > Does the problem occur because we are using the same term
"occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
>
> Sorry to be long winded.
>
> Roger
>
> On 12 Oct 2010, at 09:36, Kevin Richards wrote:
>
>> I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
>>
>> Eg, if we describe (in a basic way):
>> Occurrence = Taxon at Location
>>
>> then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view)
>> then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
>>
>> As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area"
>>
>> Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
>>
>> Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
>>
>> Kevin
>>
>> ________________________________________
>> From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
>> Sent: Tuesday, 12 October 2010 5:41 p.m.
>> To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>
>> Hi Jerry,
>>
>> Before we agree to disagree, let me try to elaborate a bit more:
>>
>> I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
>>
>> The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
>>
>> We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
>>
>> More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
>>
>> How else would you represent "Nativeness" within DwC?
>>
>> Aloha,
>> Rich
>>
>>> -----Original Message-----
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
>>> Sent: Monday, October 11, 2010 6:02 PM
>>> To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> We will have to agree to disagree.
>>>
>>> For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
>>>
>>> GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
>>>
>>> Jerry
>>>
>>> -----Original Message-----
>>> From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
>>> Sent: Tuesday, 12 October 2010 4:23 p.m.
>>> To: Jerry Cooper
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Hi Jerry,
>>>
>>> Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans
>>>
>>> Rich
>>>
>>> ________________________________
>>>
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
>>> Sent: Monday, October 11, 2010 4:38 PM
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Rich,
>>>
>>> Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
>>>
>>> There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
>>>
>>> However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not infer a status like 'introduced' etc.
>>>
>>> So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
>>>
>>> Jerry
>>>
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
>>> Sent: Tuesday, 12 October 2010 3:27 p.m.
>>> To: Donald.Hobern@csiro.au; tuco@berkeley.edu
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
>>>
>>> In my mind, the broadest categories (and likely most useful) would be something like:
>>>
>>> Native (was there without any assistance from humans)
>>>
>>> Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
>>>
>>> Captive (brought by humans and still maintained in captivity)
>>>
>>> You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know)
>>>
>>> Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
>>>
>>> Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
>>>
>>> Aloha,
>>>
>>> Rich
>>>
>>> ________________________________
>>>
>>> From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au]
>>> Sent: Monday, October 11, 2010 3:44 PM
>>> To: Richard Pyle; tuco@berkeley.edu
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Hi Rich.
>>>
>>> I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
>>>
>>> Donald
>>>
>>> Donald Hobern, Director, Atlas of Living Australia
>>> CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
>>> Phone: (02) 62464352 Mobile: 0437990208
>>> Email: Donald.Hobern@csiro.au
>>> Web: http://www.ala.org.au/
>>>
>>> From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
>>> Sent: Tuesday, 12 October 2010 12:33 PM
>>> To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
>>>
>>> Rich
>>>
>>> ________________________________
>>>
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au
>>> Sent: Monday, October 11, 2010 2:59 PM
>>> To: tuco@berkeley.edu
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Thanks, John.
>>>
>>> This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processable in those cases where providers can supply it would be useful.
>>>
>>> Donald
>>>
>>> From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
>>> Sent: Tuesday, 12 October 2010 11:34 AM
>>> To: Hobern, Donald (CES, Black Mountain)
>>> Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
>>>
>>> On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
>>>
>>> Thanks, Joel.
>>>
>>> Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
>>>
>>> Donald
>>>
>>> -----Original Message-----
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs
>>> Sent: Monday, 11 October 2010 10:47 PM
>>> To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
>>> Subject: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
>>>
>>> Here are some of my immediate observations:
>>>
>>> 1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
>>>
>>> We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
>>>
>>> Here are areas where we augmented or diverged from DwC in the bioblitz:
>>>
>>> i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
>>>
>>> ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
>>>
>>> If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
>>> DwC:decimalLatitude=41.5
>>> DwC:decimalLongitude=-70.7
>>> DwC:geodeticDatum=XYZ
>>>
>>> (I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
>>>
>>> 2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
>>>
>>> 3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
>>>
>>> 4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
>>>
>>> 5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (eg. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
>>>
>>> Happy Thanksgiving to all in Canada -
>>> Joel.
>>> ----
>>>
>>> 1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
>>> 2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
>>> 3. http://www.w3.org/2003/01/geo/
>>> 4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
>>> 5. http://tables.googlelabs.com/DataSource?dsrcid=248798
>>>
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content@lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>
>>> Please consider the environment before printing this email
>>> Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails.
>>> The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Stan,
Thanks for the clarification. My concern here is that, standard or not, if examples are posted on the Darwin Core Google Code site, they will have an implied "stamp of approval" and will be used by others as a template (despite that site being labeled as "for discussion and development", not everyone can post to it, and that implies some authority).
In the case of straight XML, that isn't really that big of an issue. XML can mean whatever one wants as long as there is an agreement between the sender and the receiver (perhaps in the form of a formal XML schema) as to what the elements represent. I believe that RDF is a different beast. When one exposes RDF, the receiver is unknown. Therefore, the RDF has to actually "mean" something to the receiver without a pre-arranged agreement. In a generic XML document, the elements can simply be a list of string values of terms with no implied "meaning" except what might be inferred by grouping them in a container element. In RDF, the elements represent properties of particular resources.
I believe strongly that although there may be several "right" ways to express properties of members of DwC classes, there are many more "wrong" ways that should not be used. By "right" I mean that they make sense semantically in that the properties logically are ones that should actually belong to the described resource. I do not believe that the discussion of these issues has progressed to the point where there is a consensus on the "right" use of DwC terms for some types of resources, and therefore I am opposed to the posting of RDF examples on any official Darwin Core sites without a lot more discussion UNLESS the examples are clearly labeled as examples intended for discussion and not for use as templates.
If you want such examples, I can provide
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf as an example for an Individual,
http://bioimages.vanderbilt.edu/baskauf/79695.rdf as an example of an Occurrence that is a live plant image, and
http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf as an example of an Occurrence that is an herbarium specimen.
I would be happy to discuss the reasons why I structured the RDF as I did (although mostly those examples are already rationalized in https://journals.ku.edu/index.php/jbi/article/view/3664), but I would not go so far as to say that they are "right" without some discussion.
What I intended when I suggested that I might write some kind of guide for Darwin Core represented in RDF/XML was really a document that explained to beginners what the point of RDF was, the basics of how one can structure properties in RDF using examples that are Darwin Core terms, and options for creating URIs that refer to resources that are described in separate files or within the same file. I wasn't really suggesting that it be a full-blown recommendation with specific guidelines for the use of particular terms or structuring of files for particular classes of resources, although that would be a good thing ultimately. I guess I was seeing some kind of a beginner's guide as a way to involve more people (who aren't up on RDF) in the discussion. I don't think that it should be necessary to complete the full "standards" process before such a document is made available. It would probably be better to have some kind of road map where that document would be the first segment but would later be followed by guidelines for specific classes of resources with examples. I think that such a modular approach would be the most beneficial because pieces of it could actually get done in a timely fashion rather than requiring the whole thing to be complete before any of it would be accepted.
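To make the beginner-level points above concrete, here is a minimal Python sketch (standard library only) that writes a few subject/property/value statements in N-Triples syntax. It is offered for discussion only and not as a template: the example.org URIs, the occurrenceOf linking property, and the helper names are all invented for this sketch, and whether these are the "right" properties to attach to an Occurrence is exactly the open question. It only shows that each statement assigns a property to an identified resource, and that a resource URI can point into the same file (a fragment identifier) or into a separate file.

```python
# Namespace for Darwin Core terms, and the standard rdf:type property URI.
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers, invented for this sketch only.
DOC = "http://example.org/bioblitz/records.rdf"
OTHER_DOC = "http://example.org/bioblitz/individuals/42.rdf"
EX_OCCURRENCE_OF = "http://example.org/terms/occurrenceOf"  # not a DwC term


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_triples():
    """Return (subject, property, value) statements about one occurrence."""
    occurrence = DOC + "#occ1"       # resource described within the same file
    individual = OTHER_DOC + "#ind"  # resource described in a separate file
    return [
        (uri(occurrence), uri(RDF_TYPE), uri(DWC + "Occurrence")),
        (uri(occurrence), uri(DWC + "establishmentMeans"), literal("native")),
        (uri(occurrence), uri(EX_OCCURRENCE_OF), uri(individual)),
    ]


def to_ntriples(triples):
    """Serialize the statements, one per line, each terminated by ' .'."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(build_triples()), end="")
```

Unlike a bare XML element list, each line here names the resource that the property belongs to, which is why the choice of subject matters to a receiver who has no prior agreement with the sender.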
I do think a task group for Darwin Core RDF would be a good idea. If nobody is in a huge hurry, I don't mind trying to charter such a group, although I'd be just as happy if somebody else wanted to do it and I would just try to be an active participant. I will look at the links you suggested, thanks.
Steve
Blum, Stan wrote:
Steve,
The TDWG process for creating standards is here: http://www.tdwg.org/about-tdwg/process/ This is worth reading if you haven’t done so already.
Another document worth reading is the standards format specification http://www.tdwg.org/standards/147/ I never pushed this “standard” through public review, but it still functions as a guideline for formatting and for our view of what is and isn’t within the scope of a “standard”. In other words, we are doing our best to follow the basic ideas laid out there about the kinds of specifications:
Type 1 -- normative specification, versioned; Type 2 -- versioned, supplementary documentation; Type 3 — uncontrolled supplementary documentation.
The page of examples John and others have put up on the DarwinCore site is non-normative, uncontrolled documentation.
The thing you were proposing sounded like an applicability statement — offering guidance about how another standard, RDF, should be used in biodiversity informatics. These can also be treated as standards, and get TDWG ratification as a standard, but don’t create a de-novo standard.
Interest groups and task groups are explained in the Process. If you want to create an applicability statement for RDF and DarwinCore, you could prepare a task group charter and submit it to the executive for approval. Approval would make it a formal Task Group. See other task group charters for examples.
-Stan
On 10/13/10 6:33 AM, "Steve Baskauf" steve.baskauf@vanderbilt.edu wrote:
OK, because of a momentary heavy workload I'm still in the process of getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it?
Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" <mdoering@gbif.org> wrote:
Would we have the energy to compile example dwc records on how to use darwin core for certain use cases? The lack of guidance on how to use darwin core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful, and not only for newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc.
I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock, but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
Eg, if we describe (in a basic way):
Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view), then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area".
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
On Tuesday, 12 October 2010 5:41 p.m., Richard Pyle wrote:
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations, which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha,
Rich
On Monday, October 11, 2010 6:02 PM, Jerry Cooper wrote:
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
On Tuesday, 12 October 2010 4:23 p.m., Richard Pyle wrote:
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
On Monday, October 11, 2010 4:38 PM, Jerry Cooper wrote:
Rich,
Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry
On Tuesday, 12 October 2010 3:27 p.m., Richard Pyle wrote:
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha,
Rich
On Monday, October 11, 2010 3:44 PM, Donald Hobern wrote:
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
On Tuesday, 12 October 2010 12:33 PM, Richard Pyle wrote:
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
On Monday, October 11, 2010 2:59 PM, Donald Hobern wrote:
Thanks, John.
This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
On Tuesday, 12 October 2010 11:34 AM, John Wieczorek wrote:
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, <Donald.Hobern@csiro.au> wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Please consider the environment before printing this email
Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails.
The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
I was just ready to leave work when I wrote this and since then I'm feeling like I should clarify just what I mean by "wrong" ways of using RDF. I recognize that TDWG encourages flexibility in the ways that standards such as DwC are used. As such, it doesn't usually define "right" and "wrong" ways of using the standards. What I mean by calling some uses "wrong" is not intended to discourage the creative use of DwC terms in RDF. What I mean is that one must be careful to make sure that RDF statements mean what is intended. Here is an example. The Dublin Core term dcterms:language means "the language of the resource". On multiple occasions, I've seen this term used in RDF as a property of a resource whose metadata is written in a certain language. This is "wrong" because the subject of the statement is the resource itself, not the resource's metadata. The need for this kind of clarity is apparent in the case of media. For example, if we are providing metadata in English that describes a nature film which has audio in German, the correct statement is that [film] dcterms:language "de", NOT [film] dcterms:language "en". This problem is handled appropriately in the MRTG schema by creating the (required) term mrtg:metadataLanguage. The correct statement would be [film] mrtg:metadataLanguage "en" . (I'm using "[film]" in lieu of a URI identifier for the film.) If, however, we were writing RDF to describe the metadata itself rather than the film, then it would be appropriate to say [film's metadata] dcterms:language "en" . In straight XML, we might get away with semantic sloppiness if the senders and receivers of the XML "understand" what the intended subject is of the term dcterms:language. But in RDF, we have to assume that the receiver of the RDF is a "stupid" computer which only infers exactly what is said and not what we MEANT to say.
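To make the film example concrete, here is a minimal sketch in Python that treats each RDF statement as a plain (subject, predicate, object) tuple. It is only an illustration, not anything from MRTG or Dublin Core documentation: the example.org URIs and the objects() helper are hypothetical, and the prefixed names are left as strings rather than expanded to full URIs.

```python
# Each statement is a (subject, predicate, object) tuple.
# The example.org identifiers below are hypothetical placeholders.
FILM = "http://example.org/film/123"
FILM_METADATA = "http://example.org/film/123/metadata"

DCTERMS_LANGUAGE = "dcterms:language"
MRTG_METADATA_LANGUAGE = "mrtg:metadataLanguage"

triples = {
    # The film's audio is in German, so this is a statement about the film.
    (FILM, DCTERMS_LANGUAGE, "de"),
    # The metadata describing the film is written in English.
    (FILM, MRTG_METADATA_LANGUAGE, "en"),
    # If the metadata record is itself the subject, dcterms:language applies to it.
    (FILM_METADATA, DCTERMS_LANGUAGE, "en"),
}


def objects(subject, predicate):
    """Return every object asserted for this subject and predicate (hypothetical helper)."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# A consumer that asks for the film's language gets exactly what was stated.
print(objects(FILM, DCTERMS_LANGUAGE))           # ['de']
print(objects(FILM_METADATA, DCTERMS_LANGUAGE))  # ['en']
```

The point of the sketch is that a query can only return what was asserted about a given subject, so attaching "en" to the film via dcterms:language would simply be read as a claim about the film.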
I believe that this is a very important point that all parties need to keep in mind before we happily march off creating RDF templates for the general public to use. In particular, I have some serious problems with the way that people are associating properties with instances of the dwc:Occurrence class. I believe that these "wrong" ways originate with the historical roots of Darwin Core as a means to describe specimens. I will illustrate what I mean. In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense. However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences, which include specimens but also observations and probably all kinds of other things like images and DNA samples. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that the act of observing has a scientific name. That is an incorrect statement. 
So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
In all of the examples above, the correct statement is [individual organism] dwc:scientificName "Drosophila melanogaster". The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism. These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really. The image is not an image of the Drosophila species concept nor is it an image of the string "Drosophila melanogaster". The image is an image of an individual fruit fly. The individual fruit fly is a representative of the taxon, the image and the observation are not.
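As a rough illustration of that structure, here is a small Python sketch with statements as plain tuples. The linking property ex:occurrenceOf and all example.org identifiers are hypothetical stand-ins, since Darwin Core does not currently define such a link; only dwc:scientificName is a real term.

```python
# Statements as (subject, predicate, object) tuples.
SCIENTIFIC_NAME = "dwc:scientificName"
OCCURRENCE_OF = "ex:occurrenceOf"  # hypothetical property, not a Darwin Core term

FLY = "http://example.org/individual/fly-1"          # hypothetical identifiers
SPECIMEN = "http://example.org/specimen/1"
IMAGE = "http://example.org/image/1"
OBSERVATION = "http://example.org/observation/1"

triples = [
    # The name is asserted once, on the individual organism.
    (FLY, SCIENTIFIC_NAME, "Drosophila melanogaster"),
    # Each Occurrence points at the individual it documents.
    (SPECIMEN, OCCURRENCE_OF, FLY),
    (IMAGE, OCCURRENCE_OF, FLY),
    (OBSERVATION, OCCURRENCE_OF, FLY),
]


def scientific_name_for(occurrence):
    """Follow occurrence -> individual -> name; return None if no path exists."""
    for subject, predicate, individual in triples:
        if subject == occurrence and predicate == OCCURRENCE_OF:
            for s2, p2, name in triples:
                if s2 == individual and p2 == SCIENTIFIC_NAME:
                    return name
    return None


print(scientific_name_for(IMAGE))
```

Under this assumption the image never carries dwc:scientificName itself, yet a consumer can still recover the name by following the link to the individual.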
This point becomes more clear if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample and band it and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things? Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it? That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird. We didn't get the DNA sample from the feather, we got it via a blood sample from the bird. The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird, which was never any kind of specimen, living or dead. The bird is an individual organism and that's what we need to call it. Right now we don't have anything in Darwin Core that can be used as the rdf:type of the bird, which is why I proposed Individual as a Darwin Core class.
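The bird scenario can be sketched the same way. In the Python below, every identifier under example.org and the property ex:occurrenceOf are hypothetical; the sketch only shows that once each record points at the bird, the records can be related to one another without pretending that any of them was derived from the feather.

```python
from collections import defaultdict

OCCURRENCE_OF = "ex:occurrenceOf"  # hypothetical property, not a Darwin Core term
BIRD = "http://example.org/individual/bird-42"  # hypothetical identifier

# Four different kinds of Occurrence, all documenting the same banded bird.
records = [
    ("http://example.org/image/bird-42-photo", OCCURRENCE_OF, BIRD),
    ("http://example.org/specimen/bird-42-feather", OCCURRENCE_OF, BIRD),
    ("http://example.org/sample/bird-42-dna", OCCURRENCE_OF, BIRD),
    ("http://example.org/observation/band-resighting", OCCURRENCE_OF, BIRD),
]


def occurrences_by_individual(triples):
    """Group occurrence identifiers under the individual they document."""
    grouped = defaultdict(list)
    for occurrence, predicate, individual in triples:
        if predicate == OCCURRENCE_OF:
            grouped[individual].append(occurrence)
    return dict(grouped)


for individual, occurrences in occurrences_by_individual(records).items():
    print(individual, len(occurrences))
```

Here the feather is just one record among four; the later band resighting joins the same group because it refers to the bird, not to any earlier record.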
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost. The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject. It explains the rationale better than this email, has nice diagrams, and gives RDF examples to illustrate everything (https://journals.ku.edu/index.php/jbi/article/view/3664). If somebody has a better idea of how to develop an internally consistent system that can handle the problems I've raised here that DOESN'T involve Individuals (i.e. other "right"[=semantically accurate] ways to express properties and relationships among Identifications, Taxa, diverse types of Occurrences, etc.) I'd like to hear what it is. Or perhaps as Stan has suggested, there needs to be a task group that can hash out alternative views. But let's have the discussion before we post models and suggest people use them.
Steve
Steve Baskauf wrote:
Stan, Thanks for the clarification. My concern here is that standard or not, if examples are posted on the Google Darwin Core site, they will have an implied "stamp of approval" and will be used by others as a template (despite that site being labeled as "for discussion and development", not everyone can post to it and that implies some authority). In the case of straight XML, that isn't really that big of an issue. XML can mean whatever one wants as long as there is an agreement between the sender and the receiver (perhaps in the form of a formal XML schema) as to what the elements represent. I believe that RDF is a different beast. When one exposes RDF, the receiver is unknown. Therefore, the RDF has to actually "mean" something to the receiver without a pre-arranged agreement. In a generic XML document, the elements can simply be a list of string values of terms with no implied "meaning" except what might be inferred by grouping them in a container element. In RDF, the elements represent properties of particular resources. I believe strongly that although there may be several "right" ways to express properties of members of DwC classes, there are many more "wrong" ways that should not be used. By "right" I mean that they make sense semantically in that the properties logically are ones that should actually belong to the described resource. I do not believe that the discussion of these issues has progressed to the point where there is a consensus on the "right" use of DwC terms for some types of resources and therefore I am opposed to the posting of RDF examples on any official Darwin Core sites without a lot more discussion UNLESS the examples are clearly labeled as examples intended for discussion and not for use as templates.
If you want such examples, I can provide http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf as an example of an Individual, http://bioimages.vanderbilt.edu/baskauf/79695.rdf as an example of an Occurrence that is a live plant image, and http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf as an example of an Occurrence that is an herbarium specimen. I would be happy to discuss the reasons why I structured the RDF as I did (although mostly those examples are already rationalized in https://journals.ku.edu/index.php/jbi/article/view/3664), but I would not go so far as to say that they are "right" without some discussion.
What I intended when I suggested that I might write some kind of guide for Darwin Core represented in RDF/XML was really a document that explained to beginners what the point of RDF was, the basics of how one can structure properties in RDF using examples that are Darwin Core terms, and options for creating URIs that refer to resources that are described in separate files or within the same file. I wasn't really suggesting that it be a full-blown recommendation with specific guidelines for the use of particular terms or structuring of files for particular classes of resources, although that would be a good thing ultimately. I guess I was seeing some kind of a beginner's guide as a way to involve more people (who aren't up on RDF) in the discussion. I don't think that it should be necessary to complete a full "standards" process before such a document is made available. It would probably be better to have some kind of road map where that document would be the first segment but would later be followed by guidelines for specific classes of resources with examples. I think that such a modular approach would be the most beneficial because pieces of it could actually get done in a timely fashion rather than requiring the whole thing to be complete before any of it would be accepted.
I do think a task group for Darwin Core RDF would be a good idea. If nobody is in a huge hurry, I don't mind trying to charter such a group, although I'd be just as happy if somebody else wanted to do it and I would just try to be an active participant. I will look at the links you suggested, thanks.
Steve
Blum, Stan wrote:
Steve,
The TDWG process for creating standards is here: http://www.tdwg.org/about-tdwg/process/ This is worth reading if you haven’t done so already.
Another document worth reading is the standards format specification: http://www.tdwg.org/standards/147/ I never pushed this “standard” through public review, but it still functions as a guideline for formatting and our view of what is and isn’t within scope of a “standard”. In other words, we are doing our best to follow the basic ideas laid out there about the kinds of specifications:
Type 1 -- normative specification, versioned; Type 2 -- versioned, supplementary documentation; Type 3 — uncontrolled supplementary documentation.
The page of examples John and others have put up on the DarwinCore site is non-normative, uncontrolled documentation.
The thing you were proposing sounded like an applicability statement — offering guidance about how another standard, RDF, should be used in biodiversity informatics. These can also be treated as standards, and get TDWG ratification as standard, but don’t create a de-novo standard.
Interest groups and task groups are explained in the Process. If you want to create an applicability statement for RDF and DarwinCore, you could prepare a task group charter and submit it to the executive for approval. Approval would make it a formal Task Group. See other task group charters for examples.
-Stan
On 10/13/10 6:33 AM, "Steve Baskauf" steve.baskauf@vanderbilt.edu wrote:
OK, because of a momentary heavy work load I'm still in the process of getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it?
Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" <mdoering@gbif.org> wrote:
Would we have the energy to compile example dwc records on how to use darwin core for certain use cases? The lack of guidance on how to use darwin core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful for not only newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc. I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
> Wow - what a thread to come back to.
>
> I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
>
> This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
>
> The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
>
> Take two examples.
>
> A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
>
> A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
>
> As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
>
> I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
>
> There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation.
> It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
>
> The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there. i.e. it used to be there but is presumed to be eradicated.
>
> Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
>
> Sorry to be long winded.
>
> Roger
>
> On 12 Oct 2010, at 09:36, Kevin Richards wrote:
>
>> I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
>>
>> Eg, if we describe (in a basic way) :
>> Occurrence = Taxon at Location
>>
>> then if we say that Nativeness is a property of a Taxon that is restricted by Location (jerry's view)
>> then this is equivalent to saying that Nativeness is a property of an Occurrence !
>> (Rich's view)
>>
>> As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area"
>>
>> Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
>>
>> Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
>>
>> Kevin
>>
>> ________________________________________
>> From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
>> Sent: Tuesday, 12 October 2010 5:41 p.m.
>> To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>
>> Hi Jerry,
>>
>> Before we agree to disagree, let me try to elaborate a bit more:
>>
>> I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
>>
>> The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity.
>> By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
>>
>> We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
>>
>> More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
>>
>> How else would you represent "Nativeness" within DwC?
>>
>> Aloha,
>> Rich
>>
>>> -----Original Message-----
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
>>> Sent: Monday, October 11, 2010 6:02 PM
>>> To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> We will have to agree to disagree.
>>>
>>> For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
>>>
>>> GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
>>>
>>> Jerry
>>>
>>> -----Original Message-----
>>> From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
>>> Sent: Tuesday, 12 October 2010 4:23 p.m.
>>> To: Jerry Cooper
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Hi Jerry,
>>>
>>> Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
>>>
>>> Rich
>>>
>>> ________________________________
>>>
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
>>> Sent: Monday, October 11, 2010 4:38 PM
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Rich,
>>>
>>> Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
>>>
>>> There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
>>>
>>> However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
>>>
>>> So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept' . Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc.
>>> But then we often have that data because we are generating it.
>>>
>>> Jerry
>>>
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
>>> Sent: Tuesday, 12 October 2010 3:27 p.m.
>>> To: Donald.Hobern@csiro.au; tuco@berkeley.edu
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
>>>
>>> In my mind, the broadest categories (and likely most useful) would be something like:
>>>
>>> Native (was there without any assistance from humans)
>>> Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
>>> Captive (brought by humans and still maintained in captivity)
>>>
>>> You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know)
>>>
>>> Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism.
>>> This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
>>>
>>> Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
>>>
>>> Aloha,
>>> Rich
>>>
>>> ________________________________
>>>
>>> From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au]
>>> Sent: Monday, October 11, 2010 3:44 PM
>>> To: Richard Pyle; tuco@berkeley.edu
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Hi Rich.
>>>
>>> I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
>>>
>>> Donald
>>>
>>> Donald Hobern, Director, Atlas of Living Australia
>>> CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
>>> Phone: (02) 62464352 Mobile: 0437990208
>>> Email: Donald.Hobern@csiro.au <mailto:Donald.Hobern@csiro.au>
>>> Web: http://www.ala.org.au/
>>>
>>> From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
>>> Sent: Tuesday, 12 October 2010 12:33 PM
>>> To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> I'm not so sure a simple flag will do it.
>>> We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
>>>
>>> Rich
>>>
>>> ________________________________
>>>
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au
>>> Sent: Monday, October 11, 2010 2:59 PM
>>> To: tuco@berkeley.edu
>>> Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Thanks, John.
>>>
>>> This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
>>>
>>> Donald
>>>
>>> From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
>>> Sent: Tuesday, 12 October 2010 11:34 AM
>>> To: Hobern, Donald (CES, Black Mountain)
>>> Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
>>> On Mon, Oct 11, 2010 at 5:16 PM, <Donald.Hobern@csiro.au> wrote:
>>>
>>> Thanks, Joel.
>>>
>>> Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
>>>
>>> Donald
>>>
>>> -----Original Message-----
>>> From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs
>>> Sent: Monday, 11 October 2010 10:47 PM
>>> To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
>>> Subject: [tdwg-content] What I learned at the TechnoBioBlitz
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
Hi Steve,
It is not as if there are no examples of things that seem to work in this space; it is that alternatives that could be incorporated into Darwin Core are largely ignored unless they come from the "right people".
How these "right people" are defined is not quite clear to me, but what seems strange is that I already have live, working examples for many of the issues that are being rehashed over and over again.
If there are any flaws or missing features, it is because I have spent far too much time, as you have, trying to get people on the list to see some of the problems with the system they have proposed.
I believe that you are right and that we need to represent individuals in the RDF version, but I have implemented them in a slightly different way.
http://lod.taxonconcept.org/ses/iuCXz#Species txn:speciesConceptHasSpeciesIndividualTag http://lod.taxonconcept.org/ses/iuCXz#Individual
What this creates is a usable type for an "individual of that species concept." It is of type txn:SpeciesIndividualTag.
Now you can easily query for all the "individuals" or all the "individuals that are of a particular species concept".
We could be using the species concept URIs that I have set up, but instead we see a perfect example of the folly of LSIDs.
Those listed above do not resolve to anything that tells me what they mean.
(I tried, not because I wanted to show the folly of LSIDs, but to see if the LSID was for something that I had a URI for, a URI that could be used in the interim.)
What I got was an LSID that, despite being resolved through a proxy, returned nothing.
I can always add the ZooBank LSID to the metadata in the description for that concept, which would allow some tracking and use of LSIDs.
Frankly, I don't know what is really going on, but there is something very strange about how this entire process is operating.
Some of these issues are better handled by people who already understand RDF, but instead are being rehashed here.
Why do I keep seeing this "pull" to create a specific subsection of the semantic web, or even informatics, rather than benefit from all the related work happening a few email lists away?
One thing that you might be missing is that in RDF something can have many types, so you can have something that is both a depiction of a species concept and a depiction of an individual.
Just like you can have a depiction of a "Firehouse" that is also a depiction of the "West Washington Firehouse".
I have thought that there may be a need for some additional "tag"-like identifiers such as Image, or Media, which has Image as a subclass.
Also, since DwC is not really "live" and working like it should, we can't really test it in the ways it needs to be tested.
In summary, I feel your pain.
Respectfully,
- Pete
On Wed, Oct 13, 2010 at 9:07 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
I was just ready to leave work when I wrote this and since then I'm feeling like I should clarify just what I mean by "wrong" ways of using RDF. I recognize that TDWG encourages flexibility in the ways that standards such as DwC are used. As such, it doesn't usually define "right" and "wrong" ways of using the standards. What I mean by calling some uses "wrong" is not intended to discourage the creative use of DwC terms in RDF. What I mean is that one must be careful to make sure that RDF statements mean what is intended. Here is an example. The Dublin Core term dcterms:language means "the language of the resource". On multiple occasions, I've seen this term used in RDF as a property of a resource whose metadata is written in a certain language. This is "wrong" because the subject of the statement is the resource itself, not the resource's metadata. The need for this kind of clarity is apparent in the case of media. For example, if we are providing metadata in English that describes a nature film which has audio in German, the correct statement is that [film] dcterms:language "de", NOT [film] dcterms:language "en". This problem is handled appropriately in the MRTG schema by creating the (required) term mrtg:metadataLanguage. The correct statement would be [film] mrtg:metadataLanguage "en" . (I'm using "[film]" in lieu of a URI identifier for the film.) If, however, we were writing RDF to describe the metadata itself rather than the film, then it would be appropriate to say [film's metadata] dcterms:language "en" . In straight XML, we might get away with semantic sloppiness if the senders and receivers of the XML "understand" what the intended subject is of the term dcterms:language. But in RDF, we have to assume that the receiver of the RDF is a "stupid" computer which only infers exactly what is said and not what we MEANT to say.
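To make the subject distinction above concrete, here is a minimal sketch in Python that models RDF statements as plain (subject, predicate, object) tuples. The film and metadata URIs are hypothetical, and the MRTG namespace below is a placeholder, not the real MRTG namespace; only the Dublin Core terms namespace is the published one.

```python
# RDF statements modelled as plain (subject, predicate, object) tuples.
DCTERMS = "http://purl.org/dc/terms/"   # published Dublin Core terms namespace
MRTG = "http://example.org/mrtg/"       # placeholder, NOT the real MRTG namespace

FILM = "http://example.org/film/1"               # hypothetical URI for the film
FILM_METADATA = "http://example.org/film/1.rdf"  # hypothetical URI for its metadata

triples = [
    # The film's audio is in German, so the film itself has language "de".
    (FILM, DCTERMS + "language", "de"),
    # The metadata describing the film is written in English.
    (FILM, MRTG + "metadataLanguage", "en"),
    # If the metadata record is itself the subject, dcterms:language is "en".
    (FILM_METADATA, DCTERMS + "language", "en"),
]


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(FILM, DCTERMS + "language"))
print(objects(FILM_METADATA, DCTERMS + "language"))
```

A consumer that only sees the triples has no way to recover what the author "meant"; it can only look up what was actually asserted about each subject.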
I believe that this is a very important point that all parties need to keep in mind before we happily march off creating RDF templates for the general public to use. In particular, I have some serious problems with the way that people are associating properties with instances of the dwc:Occurrence class. I believe that these "wrong" ways originate with the historical roots of Darwin Core as a means to describe specimens. I will illustrate what I mean. In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense. However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences which include specimens but which also includes observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement. 
So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
In all of the examples above, the correct statement is [individual organism] dwc:scientificName "Drosophila melanogaster". The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism. These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really. The image is not an image of the Drosophila species concept nor is it an image of the string "Drosophila melanogaster". The image is an image of an individual fruit fly. The individual fruit fly is a representative of the taxon, the image and the observation are not.
This point becomes clearer if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample, band it, and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things? Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it? That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird. We didn't get the DNA sample from the feather, we got it via a blood sample from the bird. The band observation is not an observation of the feather, or the image, or the DNA sample. It's an observation of the bird, which was never any kind of specimen, living or dead. The bird is an individual organism and that's what we need to call it. Right now we don't have anything in Darwin Core that can be used as the rdf:type of the bird, which is why I proposed Individual as a Darwin Core class.
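The bird scenario can be sketched roughly as follows. This is only an illustration of the idea that every occurrence points at one individual, which carries the scientific name; the class names, identifiers, and the species chosen are all made up for the example and are not part of Darwin Core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """The organism itself; the scientific name belongs here."""
    identifier: str
    scientific_name: str


@dataclass(frozen=True)
class Occurrence:
    """A record documenting an individual (image, specimen, observation...)."""
    identifier: str
    kind: str
    individual: Individual


# Hypothetical identifiers and an arbitrarily chosen species.
bird = Individual("urn:example:bird-1", "Turdus migratorius")

occurrences = [
    Occurrence("urn:example:image-1", "image", bird),
    Occurrence("urn:example:feather-1", "specimen", bird),
    Occurrence("urn:example:dna-1", "DNA sample", bird),
    Occurrence("urn:example:sighting-1", "observation", bird),
]


def occurrences_of(individual):
    """List the identifiers of all occurrences documenting one individual."""
    return [o.identifier for o in occurrences if o.individual == individual]


print(occurrences_of(bird))
```

None of the four occurrences needs to refer to any of the others; the feather, the image, the DNA sample, and the band sighting are related only through the bird.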
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost. The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject. It explains the rationale better than this email, has nice diagrams, and gives RDF examples to illustrate everything ( https://journals.ku.edu/index.php/jbi/article/view/3664). If somebody has a better idea of how to develop an internally consistent system that can handle the problems I've raised here that DOESN'T involve Individuals (i.e. other "right"[=semantically accurate] ways to express properties and relationships among Identifications, Taxa, diverse types of Occurrences, etc.) I'd like to hear what it is. Or perhaps as Stan has suggested, there needs to be a task group that can hash out alternative views. But let's have the discussion before we post models and suggest people use them.
Steve
Steve Baskauf wrote:
Stan, Thanks for the clarification. My concern here is that standard or not, if examples are posted on the Darwin Core Google Code site, they will have an implied "stamp of approval" and will be used by others as a template (despite that site being labeled as "for discussion and development", not everyone can post to it and that implies some authority). In the case of straight XML, that isn't really that big of an issue. XML can mean whatever one wants as long as there is an agreement between the sender and the receiver (perhaps in the form of a formal XML schema) as to what the elements represent. I believe that RDF is a different beast. When one exposes RDF, the receiver is unknown. Therefore, the RDF has to actually "mean" something to the receiver without a pre-arranged agreement. In a generic XML document, the elements can simply be a list of string values of terms with no implied "meaning" except what might be inferred by grouping them in a container element. In RDF, the elements represent properties of particular resources. I believe strongly that although there may be several "right" ways to express properties of members of DwC classes, there are many more "wrong" ways that should not be used. By "right" I mean that they make sense semantically in that the properties logically are ones that should actually belong to the described resource. I do not believe that the discussion of these issues has progressed to the point where there is a consensus on the "right" use of DwC terms for some types of resources and therefore I am opposed to the posting of RDF examples on any official Darwin Core sites without a lot more discussion UNLESS the examples are clearly labeled as examples intended for discussion and not for use as templates.
If you want such examples, I can provide
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf as an example of an Individual,
http://bioimages.vanderbilt.edu/baskauf/79695.rdf as an example of an Occurrence that is a live plant image, and
http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf as an example of an Occurrence that is an herbarium specimen.
I would be happy to discuss the reasons why I structured the RDF as I did (although mostly those examples are already rationalized in https://journals.ku.edu/index.php/jbi/article/view/3664), but I would not go so far as to say that they are "right" without some discussion.
What I intended when I suggested that I might write some kind of guide for Darwin Core represented in RDF/XML was really a document that explained to beginners what the point was of RDF, the basics of how one can structure properties in RDF using examples that are Darwin Core terms, and options for creating URIs that refer to resources that are described in separate files or within the same file. I wasn't really suggesting that it be a full-blown recommendation with specific guidelines for the use of particular terms or structuring of files for particular classes of resources, although that would be a good thing ultimately. I guess I was seeing some kind of a beginner's guide as a way to involve more people (who aren't up on RDF) in the discussion. I don't think that it should be necessary to complete a full "standards" process before such a document is made available. It would probably be better to have some kind of road map where that document would be the first segment but would later be followed by guidelines for specific classes of resources with examples. I think that such a modular approach would be the most beneficial because pieces of it could actually get done in a timely fashion rather than requiring the whole thing to be complete before any of it would be accepted.
I do think a task group for Darwin Core RDF would be a good idea. If nobody is in a huge hurry, I don't mind trying to charter such a group, although I'd be just as happy if somebody else wanted to do it and I would just try to be an active participant. I will look at the links you suggested, thanks.
Steve
Blum, Stan wrote:
Steve,
The TDWG process for creating standards is here: http://www.tdwg.org/about-tdwg/process/ This is worth reading if you haven’t done so already.
Another document worth reading is the standards format specification http://www.tdwg.org/standards/147/ I never pushed this “standard” through public review, but it still functions as a guideline for formatting and our view of what is and isn’t within scope of a “standard”. In other words, we are doing our best to follow the basic ideas laid out there about the kinds of specifications:
Type 1: normative specification, versioned;
Type 2: versioned, supplementary documentation;
Type 3: uncontrolled supplementary documentation.
The page of examples John and others have put up on the DarwinCore site is non-normative, uncontrolled documentation.
The thing you were proposing sounded like an applicability statement, offering guidance about how another standard, RDF, should be used in biodiversity informatics. These can also be treated as standards, and get TDWG ratification as a standard, but don’t create a de novo standard.
Interest groups and task groups are explained in the Process. If you want to create an applicability statement for RDF and DarwinCore, you could prepare a task group charter and submit it to the executive for approval. Approval would make it a formal Task Group. See other task group charters for examples.
-Stan
On 10/13/10 6:33 AM, "Steve Baskauf" steve.baskauf@vanderbilt.edu wrote:
OK, because of a momentary heavy work load I'm still in the process of getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it? Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" < mdoering@gbif.org> wrote:
Would we have the energy to compile example DwC records showing how to use Darwin Core for certain use cases? The lack of guidance on how to use Darwin Core was mentioned earlier. An additional example webpage for the DwC website would surely be really helpful, and not only for newbies. A DwC record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue samples, DNA sequences, marine fishing net catches, etc.
I'd volunteer to do the HTML page if I'm given example records with a short use case description...
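As a rough idea of what one such example record could look like, here is a sketch of a bird-watching observation written as a simple CSV file whose column headers are Darwin Core term names. The column selection is only one possible choice, and every value in the row (identifier, species, date, coordinates, observer) is invented for illustration.

```python
import csv
import io

# Column headers are Darwin Core term names; the selection is illustrative.
fieldnames = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "recordedBy",
]

# A single made-up bird-watching record.
rows = [
    {
        "occurrenceID": "urn:example:obs-1",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Turdus migratorius",
        "eventDate": "2010-10-02",
        "decimalLatitude": "41.52",
        "decimalLongitude": "-70.67",
        "geodeticDatum": "WGS84",
        "recordedBy": "A. Observer",
    }
]

# Write the record to an in-memory CSV document.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back to confirm it round-trips as ordinary CSV.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Nothing beyond the standard library is needed, which is part of the appeal of a plain CSV representation for newcomers.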
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. Those are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading, and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the Highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock, but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
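The two senses of "occurrence" can be illustrated with a small sketch: primary records carry only what was observed plus a managed flag, and the taxa-in-regions view is computed from them by a query. The record layout and the function name are assumptions made for this example, and the records themselves are invented to mirror the tiger and Scots Pine cases above.

```python
from collections import defaultdict

# Primary occurrence records: (taxon, region, managed).
# managed=True means the organism is maintained there (zoo, garden, farm).
records = [
    ("Panthera tigris", "London", True),     # the tiger in the zoo cage
    ("Panthera tigris", "London", False),    # the free-roaming tiger
    ("Pinus sylvestris", "Highlands", False),
]


def taxa_by_region(occurrence_records, include_managed=False):
    """Derive a region -> set of taxa view from primary occurrence records."""
    result = defaultdict(set)
    for taxon, region, managed in occurrence_records:
        if managed and not include_managed:
            continue  # skip zoo/garden records unless explicitly requested
        result[region].add(taxon)
    return dict(result)


print(taxa_by_region(records))
```

The derived view changes with the question asked (for instance, whether managed records count), which is why it is kept out of the primary records here.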
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view)
then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area".
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg-content list, having worked out that "nativeness", or what we call "biostatus", is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 5:41 p.m.
To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations, which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
Sent: Monday, October 11, 2010 6:02 PM
To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message-----
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 4:23 p.m.
To: Jerry Cooper
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
Sent: Monday, October 11, 2010 4:38 PM
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich,
Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Tuesday, 12 October 2010 3:27 p.m.
To: Donald.Hobern@csiro.au; tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know into which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
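A controlled vocabulary along these lines could be sketched as a small enumeration, with null kept distinct from "Cryptogenic". The class name, the parsing helper, and the filtering policy (exclude only explicitly captive records, keep unknowns) are assumptions for this example and are not part of any agreed standard.

```python
from enum import Enum
from typing import Optional


class Establishment(Enum):
    """The four broad categories suggested above."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"  # known to be unknown


def parse_establishment(text: Optional[str]) -> Optional[Establishment]:
    """Map free text onto the vocabulary; None means no assertion was made."""
    if text is None or not text.strip():
        return None
    return Establishment(text.strip().lower())


def usable_for_distribution(value: Optional[Establishment]) -> bool:
    """Assumed policy: drop only records explicitly marked as captive."""
    return value is not Establishment.CAPTIVE


for raw in ["Native", "captive", "", None]:
    value = parse_establishment(raw)
    print(repr(raw), value, usable_for_distribution(value))
```

Software doing distribution analysis or niche modelling could apply a different policy, for example also dropping null values, without any change to the vocabulary itself.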
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au]
Sent: Monday, October 11, 2010 3:44 PM
To: Richard Pyle; tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 12:33 PM
To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au
Sent: Monday, October 11, 2010 2:59 PM
To: tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely
uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald

Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/

From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
Sent: Tuesday, 12 October 2010 11:34 AM
To: Hobern, Donald (CES, Black Mountain)
Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be
captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM,
Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we
do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald

Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/

-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs
Sent: Monday, 11 October 2010 10:47 PM
To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since
there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3]
instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more
user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important
part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord"
in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
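Point 1 of the message above says that a simple csv file is a legitimate representation of Darwin Core, and point ii describes how geo:lat / geo:long could be re-expressed with DwC terms plus a datum. A minimal sketch of both ideas follows. The column names are the terms mentioned in the message; the observation values (species, date, observer) and the helper names are made up for illustration, and only the coordinates 41.5 / -70.7 come from the message.

```python
import csv
import io

# Column headers are term names used in the bioblitz profile as described
# in the message. The row values below are hypothetical.
FIELDS = [
    "DwC:scientificName",
    "DwC:basisOfRecord",
    "DwC:eventDate",
    "geo:lat",
    "geo:long",
    "obs:observedBy",
]

records = [
    {
        "DwC:scientificName": "Danaus plexippus",  # hypothetical sighting
        "DwC:basisOfRecord": "HumanObservation",
        "DwC:eventDate": "2010-09-25",             # hypothetical date
        "geo:lat": "41.5",
        "geo:long": "-70.7",
        "obs:observedBy": "A. Volunteer",          # hypothetical observer
    }
]


def to_csv(rows, fields):
    """Serialize the records as plain CSV with term names as headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_dwc_coordinates(row, datum="WGS84"):
    """Re-express geo:lat / geo:long with DwC terms and an explicit datum."""
    converted = {k: v for k, v in row.items() if k not in ("geo:lat", "geo:long")}
    converted["DwC:decimalLatitude"] = row["geo:lat"]
    converted["DwC:decimalLongitude"] = row["geo:long"]
    converted["DwC:geodeticDatum"] = datum
    return converted


csv_text = to_csv(records, FIELDS)
print(csv_text)
print(to_dwc_coordinates(records[0], datum="XYZ"))
```

The second helper only renames columns; it does not transform coordinates between datums, which would need a geodesy library.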
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
I think the problem is much simpler and less sinister than this. I think the problem is that we have many people who understand biodiversity data very well, but go glassy-eyed on technical discussions about RDF (e.g., me). And there are also people who understand RDF (and other related protocols and technologies), who go glassy-eyed on discussions about subtle distinctions between taxon names and taxon concepts (and other such details).
There are only a very few people who seem to have a foot firmly in both camps; and half the time those people will go over the heads of both other groups simultaneously (and hence not be understood by either).
And then you have extreme examples of people who understand taxon names and concepts very well (like me), but are put in the awkward position of trying to develop core web services (like ZooBank).
Like I said in an earlier post, a little knowledge is a dangerous thing. Sometimes a VERY dangerous thing.
Aloha, Rich
_____
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Peter DeVries Sent: Wednesday, October 13, 2010 10:54 PM To: Steve Baskauf Cc: tdwg-content@lists.tdwg.org; Roger Hyam; tdwg-bioblitz@googlegroups.com; Blum, Stan; Jerry Cooper Subject: Re: [tdwg-content] "Wrong" RDF, was Re: What I learned at the TechnoBioBlitz
Hi Steve,
It is not as if there are no examples of things that seem to work in this space; it is that those alternatives that could be incorporated into the DarwinCore are largely ignored unless they come from the "right people".
How these "right people" are defined is not quite clear to me, but what seems strange is that many of the issues that are being rehashed over and over again are ones I already have live working examples of.
If there are any flaws or features lacking, it is because I have spent far too much time, as you have, trying to get people on the list to see some of the problems with the system they have proposed.
I believe that you are right and that we need to represent individuals in the RDF version, but I have implemented them in a slightly different way.
http://lod.taxonconcept.org/ses/iuCXz#Species txn:speciesConceptHasSpeciesIndividualTag http://lod.taxonconcept.org/ses/iuCXz#Individual
What this creates is a usable type for an "individual of that species concept." It is of type txn:SpeciesIndividualTag
Now you can easily query for all the "individuals" or all the "individuals that are of a particular species concept".
We could be using the species concept URIs that I have set up, but instead we see a perfect example of the folly of LSIDs.
Those listed above do not resolve to anything that tells me what they mean.
(I tried, not because I wanted to show the folly of LSIDs but to see if the LSID was for something that I had a URI for,
a URI that could be used in the interim.)
What I got was an LSID that, despite being resolved through a proxy, returned nothing.
I can always add the ZooBank LSID to the metadata in the description for that concept, which would allow some tracking and use of LSIDs.
Frankly, I don't know what is really going on, but there is something very strange about how this entire process is operating.
Some of these issues are better handled by people who already understand RDF, but instead are being rehashed here.
Why do I keep seeing this "pull" to create a specific subsection of the semantic web or even informatics rather than benefit from all the related work happening a few email lists away?
One thing that you might be missing is that in RDF something can have many types, so you can have something that is both a depiction of a species concept and a depiction of an individual.
Just like you can have a depiction of a "Firehouse" that is also a depiction of the "West Washington Firehouse".
I have thought that there may need to be some additional "tag"-like identifiers such as Image or Media, which has Image as a subclass.
Also, since the dwc is not really "live" and working like it should, we can't really test it in the ways it needs to be tested.
In summary, I feel your pain.
Respectfully,
- Pete
On Wed, Oct 13, 2010 at 9:07 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
I was just ready to leave work when I wrote this and since then I'm feeling like I should clarify just what I mean by "wrong" ways of using RDF. I recognize that TDWG encourages flexibility in the ways that standards such as DwC are used. As such, it doesn't usually define "right" and "wrong" ways of using the standards. What I mean by calling some uses "wrong" is not intended to discourage the creative use of DwC terms in RDF. What I mean is that one must be careful to make sure that RDF statements mean what is intended. Here is an example. The Dublin Core term dcterms:language means "the language of the resource". On multiple occasions, I've seen this term used in RDF as a property of a resource whose metadata is written in a certain language. This is "wrong" because the subject of the statement is the resource itself, not the resource's metadata. The need for this kind of clarity is apparent in the case of media. For example, if we are providing metadata in English that describes a nature film which has audio in German, the correct statement is that [film] dcterms:language "de", NOT [film] dcterms:language "en". This problem is handled appropriately in the MRTG schema by creating the (required) term mrtg:metadataLanguage. The correct statement would be [film] mrtg:metadataLanguage "en" . (I'm using "[film]" in lieu of a URI identifier for the film.) If, however, we were writing RDF to describe the metadata itself rather than the film, then it would be appropriate to say [film's metadata] dcterms:language "en" . In straight XML, we might get away with semantic sloppiness if the senders and receivers of the XML "understand" what the intended subject is of the term dcterms:language. But in RDF, we have to assume that the receiver of the RDF is a "stupid" computer which only infers exactly what is said and not what we MEANT to say.
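The dcterms:language example above can be restated as a handful of triples. This is only a sketch using plain Python tuples rather than an RDF library; "ex:film" and "ex:film-metadata" are placeholder identifiers standing in for real URIs, and the helper function is hypothetical.

```python
# Each triple is (subject, predicate, object).
# "ex:film" and "ex:film-metadata" are placeholders, not real URIs.
FILM = "ex:film"
FILM_METADATA = "ex:film-metadata"

triples = [
    (FILM, "dcterms:language", "de"),           # the film's audio is German
    (FILM, "mrtg:metadataLanguage", "en"),      # its metadata is in English
    (FILM_METADATA, "dcterms:language", "en"),  # a statement about the metadata itself
]


def objects(graph, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# A "stupid" consumer only sees what was actually asserted about each subject.
print(objects(triples, FILM, "dcterms:language"))           # ['de']
print(objects(triples, FILM, "mrtg:metadataLanguage"))      # ['en']
print(objects(triples, FILM_METADATA, "dcterms:language"))  # ['en']
```

The "wrong" usage described above would amount to adding (FILM, "dcterms:language", "en"), after which a consumer would conclude that the film itself is in English.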
I believe that this is a very important point that all parties need to keep in mind before we happily march off creating RDF templates for the general public to use. In particular, I have some serious problems with the way that people are associating properties with instances of the dwc:Occurrence class. I believe that these "wrong" ways originate with the historical roots of Darwin Core as a means to describe specimens. I will illustrate what I mean. In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense. However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences, which includes specimens but also observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement.
So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
In all of the examples above, the correct statement is [individual organism] dwc:scientificName "Drosophila melanogaster". The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism. These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really. The image is not an image of the Drosophila species concept nor is it an image of the string "Drosophila melanogaster". The image is an image of an individual fruit fly. The individual fruit fly is a representative of the taxon, the image and the observation are not.
This point becomes more clear if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample and band it and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things? Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it? That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird. We didn't get the DNA sample from the feather, we got it via a blood sample from the bird. The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird which was never any kind of specimen living or dead. The bird is an individual organism and that's what we need to call it. Right now we don't have anything in Darwin Core that can be used to rdfs:type the bird, which is why I proposed Individual as a Darwin Core class.
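The bird example above can be sketched as a small data model in which the Individual is the hub and every Occurrence points back to it. This is an illustration only: the class and field names, the identifiers, and the species name are all hypothetical, and it is not the RDF from the paper referred to in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    individual_id: str
    scientific_name: str  # the name belongs to the organism, not to the records


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    kind: str             # e.g. "image", "specimen", "DNA sample", "observation"
    individual_id: str    # every occurrence documents one individual


# Hypothetical bird and the four records collected from it.
bird = Individual("ex:bird-1", "Cardinalis cardinalis")
occurrences = [
    Occurrence("ex:image-1", "image", bird.individual_id),
    Occurrence("ex:feather-1", "specimen", bird.individual_id),
    Occurrence("ex:dna-1", "DNA sample", bird.individual_id),
    Occurrence("ex:band-sighting-1", "observation", bird.individual_id),
]


def occurrences_of(individual, records):
    """All records that document the given individual."""
    return [r for r in records if r.individual_id == individual.individual_id]


def scientific_name_for(occurrence, individuals_by_id):
    """Look the name up via the individual; occurrences carry no name themselves."""
    return individuals_by_id[occurrence.individual_id].scientific_name


index = {bird.individual_id: bird}
print([o.kind for o in occurrences_of(bird, occurrences)])
print(scientific_name_for(occurrences[3], index))
```

Because none of the records refers to another record, the band sighting is linked to the bird directly rather than to the feather, the image, or the DNA sample.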
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost. The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject. It explains the rationale better than this email, has nice diagrams, and gives RDF examples to illustrate everything (https://journals.ku.edu/index.php/jbi/article/view/3664). If somebody has a better idea of how to develop an internally consistent system that can handle the problems I've raised here that DOESN'T involve Individuals (i.e. other "right"[=semantically accurate] ways to express properties and relationships among Identifications, Taxa, diverse types of Occurrences, etc.) I'd like to hear what it is. Or perhaps as Stan has suggested, there needs to be a task group that can hash out alternative views. But let's have the discussion before we post models and suggest people use them.
Steve
Steve Baskauf wrote:
Stan, Thanks for the clarification. My concern here is that standard or not, if examples are posted on the Google Darwin Code site, they will have an implied "stamp of approval" and will be used by others as a template (despite that site being labeled as "for discussion and development" not everyone can post to it and that implies some authority). In the case of straight XML, that isn't really that big of an issue. XML can mean whatever one wants as long as there is an agreement between the sender and the receiver (perhaps in the form of a formal XML schema) as to what the elements represent. I believe that RDF is a different beast. When one exposes RDF, the receiver is unknown. Therefore, the RDF has to actually "mean" something to the receiver without a pre-arranged agreement. In a generic XML document, the elements can simply be a list of string values of terms with no implied "meaning" except what might be inferred by grouping them in a container element. In RDF, the elements represent properties of particular resources. I believe strongly that although there may be several "right" ways to express properties of members of DwC classes, there are many more "wrong" ways that should not be used. By "right" I mean that they make sense semantically in that the properties logically are ones that should actually belong to the described resource. I do not believe that the discussion of these issues has progressed to the point where there is a consensus on the "right" use of DwC terms for some types of resources and therefore I am opposed to the posting of RDF examples on any official Darwin Core sites without a lot more discussion UNLESS the examples are clearly labeled as examples intended for discussion and not for use as templates. 
If you want such examples, I can provide
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf as an example for an Individual,
http://bioimages.vanderbilt.edu/baskauf/79695.rdf as an example of an Occurrence that is a live plant image, and
http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf as an example of an Occurrence that is an herbarium specimen.
I would be happy to discuss the reasons why I structured the RDF as I did (although mostly those examples are already rationalized in https://journals.ku.edu/index.php/jbi/article/view/3664), but I would not go so far as to say that they are "right" without some discussion.
What I intended when I suggested that I might write some kind of guide for Darwin Core represented in RDF/XML was really a document that explained to beginners what the point was of RDF, the basics of how one can structure properties in RDF using examples that are Darwin Core terms, and options for creating URIs that refer to resources that are described in separate files or within the same file. I wasn't really suggesting that it be a full-blown recommendation with specific guidelines for the use of particular terms or structuring of files for particular classes of resources, although that would be a good thing ultimately. I guess I was seeing some kind of a beginner's guide as a way to involve more people (who aren't up on RDF) in the discussion. I don't think that it should be necessary to complete a full "standards" process before such a document is made available. It would probably be better to have some kind of road map where that document would be the first segment but would later be followed by guidelines for specific classes of resources with examples. I think that such a modular approach would be the most beneficial because pieces of it could actually get done in a timely fashion rather than requiring the whole thing to be complete before any of it would be accepted.
I do think a task group for Darwin Core RDF would be a good idea. If nobody is in a huge hurry, I don't mind trying to charter such a group, although I'd be just as happy if somebody else wanted to do it and I would just try to be an active participant. I will look at the links you suggested, thanks.
Steve
Blum, Stan wrote:
Steve,
The TDWG process for creating standards is here: http://www.tdwg.org/about-tdwg/process/ This is worth reading if you haven't done so already.
Another document worth reading is the standards format specification http://www.tdwg.org/standards/147/ I never pushed this standard through public review, but it still functions as a guideline for formatting and for our view of what is and isn't within scope of a standard. In other words, we are doing our best to follow the basic ideas laid out there about the kinds of specifications:
Type 1 -- normative specification, versioned; Type 2 -- versioned, supplementary documentation; Type 3 -- uncontrolled supplementary documentation.
The page of examples John and others have put up on the DarwinCore site is non-normative, uncontrolled documentation.
The thing you were proposing sounded like an applicability statement offering guidance about how another standard, RDF, should be used in biodiversity informatics. These can also be treated as standards, and get TDWG ratification as a standard, but don't create a de-novo standard.
Interest groups and task groups are explained in the Process. If you want to create an applicability statement for RDF and DarwinCore, you could prepare a task group charter and submit it to the executive for approval. Approval would make it a formal Task Group. See other task group charters for examples.
-Stan
On 10/13/10 6:33 AM, "Steve Baskauf" steve.baskauf@vanderbilt.edu wrote:
OK, because of a momentary heavy work load I'm still in the process of getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it? Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
Would we have the energy to compile example dwc records on how to use darwin core for certain use cases? The lack of guidance on how to use darwin core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful, and not only for newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc.
I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are
conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down
the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa
in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and
tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being
maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do
not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence
record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots
Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as
we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to
mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
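The distinction drawn above, between primary occurrence records that carry a single flag and taxon-in-region statements obtained by querying those records, can be sketched as follows. The records, the "managed" key, and the function name are hypothetical; "managed" simply stands in for the proposed flag and is not an existing Darwin Core term.

```python
from collections import Counter

# Hypothetical primary records. "managed" means the organism is maintained
# in place by human effort (zoo, garden, farm).
occurrences = [
    {"taxon": "Panthera tigris", "region": "London", "managed": True},    # caged at the zoo
    {"taxon": "Panthera tigris", "region": "London", "managed": False},   # roaming free
    {"taxon": "Pinus sylvestris", "region": "Highlands", "managed": False},
]


def records_per_taxon_region(records, include_managed=False):
    """Aggregate primary records into counts per (taxon, region).

    The result is an analysis product, not a property of any single record;
    whether managed records count depends on the question being asked.
    """
    counts = Counter()
    for rec in records:
        if rec["managed"] and not include_managed:
            continue
        counts[(rec["taxon"], rec["region"])] += 1
    return dict(counts)


print(records_per_taxon_region(occurrences))
print(records_per_taxon_region(occurrences, include_managed=True))
```

Status values such as native or invasive would be a further judgement layered on top of an aggregate like this one, which is why they are left out of the record structure here.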
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an
occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is
restricted by Location (Jerry's view)
then this is equivalent to saying that Nativeness is a property of an
Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply
Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area".
Also I tend to feel that a lot of biodiversity properties are properties
of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list,
having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org
[tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org;
tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied
to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related
provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and
the obvious ones are 'in cultivation', 'in captivity', 'border intercept' . Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Tuesday, 12 October 2010 3:27 p.m.
To: Donald.Hobern@csiro.au; tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
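To make the difference between a simple flag and a controlled vocabulary concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class name `EstablishmentCategory`, the helper `usable_for_distribution`, and the sample records (including the zoo lion) are made up for this example and are not Darwin Core terms or an agreed TDWG vocabulary. It shows the four broad categories listed above, keeps "no assertion" (None) distinct from "Cryptogenic", and weeds out captive occurrences before a distribution analysis.

```python
from enum import Enum
from typing import Optional


class EstablishmentCategory(Enum):
    """Hypothetical vocabulary built from the four broad categories above."""
    NATIVE = "native"            # was there without any assistance from humans
    INTRODUCED = "introduced"    # arrived with human help, lives in the natural environment
    CAPTIVE = "captive"          # brought by humans and still maintained in captivity
    CRYPTOGENIC = "cryptogenic"  # explicit assertion that the category is unknown


def usable_for_distribution(category: Optional[EstablishmentCategory]) -> bool:
    """Return True unless the record is explicitly flagged as captive.

    None means nobody made any assertion, which is not the same as
    CRYPTOGENIC. Keeping both kinds of record is a policy choice made
    for this sketch only.
    """
    return category is not EstablishmentCategory.CAPTIVE


# Made-up sample records for illustration.
records = [
    {"scientificName": "Poa anceps", "category": EstablishmentCategory.NATIVE},
    {"scientificName": "Panthera leo", "category": EstablishmentCategory.CAPTIVE},
    {"scientificName": "Pinus sylvestris", "category": None},
]

kept = [r["scientificName"] for r in records if usable_for_distribution(r["category"])]
print(kept)
```

A single boolean flag could express only the captive/not-captive split; the enumeration leaves room for the finer distinctions (and the explicit "we don't know") without changing the filtering code.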
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au]
Sent: Monday, October 11, 2010 3:44 PM
To: Richard Pyle; tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define
many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain);
tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We
have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au
Sent: Monday, October 11, 2010 2:59 PM
To: tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely
uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
From: gtuco.btuco@gmail.com
[mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be
captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM,
Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we
do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org
[mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
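As one possible "cut-and-pasteable" illustration of point 1, the sketch below writes and re-reads a plain CSV file whose column headers are Darwin Core term names. It is a minimal sketch, not the bioblitz data profile: the choice of columns and the sample row are assumptions made for this example (the coordinates are the ones used above, and the datum value is written as "WGS84" here purely for illustration).

```python
import csv
import io

# Column headers are Darwin Core term names mentioned in this message.
FIELDS = [
    "scientificName",
    "basisOfRecord",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
]

# Made-up sample record for illustration only.
rows = [
    {
        "scientificName": "Poa anceps",
        "basisOfRecord": "HumanObservation",
        "decimalLatitude": "41.5",
        "decimalLongitude": "-70.7",
        "geodeticDatum": "WGS84",
    },
]

# Write the records to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading it back gives one dict per record, keyed by term name.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Adding a column later, as the transcriber did with basisOfRecord, only means appending another term name to `FIELDS`; existing rows are unaffected.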
Happy Thanksgiving to all in Canada -
Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Note to all,
If you notice a large muscid fly buzzing around, pull your dark-bottled beer closer to yourself.
However, do not get so engrossed in responding in tdwg-tag that you fail to notice the fly is not buzzing anymore.
For those who may not have liked what I wrote, you could perhaps argue this is evidence of the existence of God? :-)
Nough Said,
If appropriate, I will respond to Rich's comments after some sleep.
- Pete
P.S. I think I know a little too much about Diptera, but hope that EtOH is a pretty good disinfectant
On Thu, Oct 14, 2010 at 3:53 AM, Peter DeVries pete.devries@gmail.com wrote:
Hi Steve,
It is not as if there are no examples of things that seem to work in this space; it is that those alternatives that could be incorporated into the DarwinCore are largely ignored unless they come from the "right people".
How these "right people" are defined is not quite clear to me, but what seems strange is that I already have live, working examples for many of the issues that are being rehashed over and over again.
If there are any flaws, or features lacking, it is because I have spent far too much time, as you have, trying to get people on the list to see some of the problems with the system they have proposed.
I believe that you are right and that we need to represent individuals in the RDF version, but I have implemented them in a slightly different way.
http://lod.taxonconcept.org/ses/iuCXz#Species txn:speciesConceptHasSpeciesIndividualTag http://lod.taxonconcept.org/ses/iuCXz#Individual
What this creates is a usable type for an "individual of that species concept." It is of type txn:SpeciesIndividualTag.
Now you can easily query for all the "individuals" or all the "individuals that are of a particular species concept".
We could be using the species concept URIs that I have set up, but instead we see a perfect example of the folly of LSIDs.
Those listed above do not resolve to anything that tells me what they mean.
(I tried, not because I wanted to show the folly of LSIDs but to see if the LSID was for something that I had a URI for, a URI that could be used in the interim.)
What I got was an LSID that, despite being resolved through a proxy, returned nothing.
I can always add the ZooBank LSID to the metadata in the description for that concept, which would allow some tracking and use of LSIDs.
Frankly, I don't know what is really going on, but there is something very strange about how this entire process is operating.
Some of these issues are better handled by people who already understand RDF, but instead are being rehashed here.
Why do I keep seeing this "pull" to create a specific subsection of the semantic web or even informatics rather than benefit from all the related work happening a few email lists away?
One thing that you might be missing is that in RDF something can have many types, so you can have something that is both a depiction of a species concept and a depiction of an individual.
Just like you can have a depiction of a "Firehouse" that is also a depiction of the "West Washington Firehouse".
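The point about multiple types can be sketched without any RDF tooling by treating statements as plain (subject, predicate, object) triples. This is only an illustration: every identifier with the `ex:` prefix is a hypothetical placeholder invented for this example, not a term from the txn ontology or from Darwin Core.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
RDF_TYPE = "rdf:type"

# Hypothetical resources and classes; ex:photo1 carries two types at once.
typed_resources = {
    ("ex:photo1", RDF_TYPE, "ex:DepictionOfSpeciesConcept"),
    ("ex:photo1", RDF_TYPE, "ex:DepictionOfIndividual"),
    ("ex:photo2", RDF_TYPE, "ex:DepictionOfSpeciesConcept"),
}


def instances_of(type_name):
    """Return every subject that is declared to have the given type."""
    return sorted(s for (s, p, o) in typed_resources if p == RDF_TYPE and o == type_name)


def types_of(subject):
    """Return every type declared for the given subject."""
    return sorted(o for (s, p, o) in typed_resources if s == subject and p == RDF_TYPE)


print(instances_of("ex:DepictionOfIndividual"))
print(types_of("ex:photo1"))
```

Because a resource is not limited to one type, a query by either type finds `ex:photo1`, which is the same pattern as asking for all "individuals" or all "individuals of a particular species concept".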
I have thought that there may need to be some additional "tag"-like identifiers, like Image or Media, which has Image as a subclass.
Also, since the dwc is not really "live" and working like it should, we can't really test it in the ways it needs to be tested.
In summary, I feel your pain.
Respectfully,
- Pete
On Wed, Oct 13, 2010 at 9:07 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
I was just ready to leave work when I wrote this and since then I'm feeling like I should clarify just what I mean by "wrong" ways of using RDF. I recognize that TDWG encourages flexibility in the ways that standards such as DwC are used. As such, it doesn't usually define "right" and "wrong" ways of using the standards. What I mean by calling some uses "wrong" is not intended to discourage the creative use of DwC terms in RDF. What I mean is that one must be careful to make sure that RDF statements mean what is intended. Here is an example. The Dublin Core term dcterms:language means "the language of the resource". On multiple occasions, I've seen this term used in RDF as a property of a resource whose metadata is written in a certain language. This is "wrong" because the subject of the statement is the resource itself, not the resource's metadata. The need for this kind of clarity is apparent in the case of media. For example, if we are providing metadata in English that describes a nature film which has audio in German, the correct statement is that [film] dcterms:language "de", NOT [film] dcterms:language "en". This problem is handled appropriately in the MRTG schema by creating the (required) term mrtg:metadataLanguage. The correct statement would be [film] mrtg:metadataLanguage "en" . (I'm using "[film]" in lieu of a URI identifier for the film.) If, however, we were writing RDF to describe the metadata itself rather than the film, then it would be appropriate to say [film's metadata] dcterms:language "en" . In straight XML, we might get away with semantic sloppiness if the senders and receivers of the XML "understand" what the intended subject is of the term dcterms:language. But in RDF, we have to assume that the receiver of the RDF is a "stupid" computer which only infers exactly what is said and not what we MEANT to say.
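The film example can be written out as explicit triples to show why the subject matters. This is a rough sketch rather than real RDF: `ex:film` and `ex:film-metadata` are placeholder identifiers (standing in for "[film]" and "[film's metadata]" above), and `mrtg:metadataLanguage` is kept as a prefixed name because the MRTG namespace URI is not given in this message. Only the Dublin Core property URI is a real one.

```python
# Dublin Core "language" property (real URI).
DCTERMS_LANGUAGE = "http://purl.org/dc/terms/language"
# Assumption: prefixed placeholder, the actual MRTG namespace is not stated here.
MRTG_METADATA_LANGUAGE = "mrtg:metadataLanguage"

# Hypothetical identifiers for the two distinct resources.
FILM = "ex:film"
FILM_METADATA = "ex:film-metadata"

statements = [
    (FILM, DCTERMS_LANGUAGE, "de"),           # the film's audio is in German
    (FILM, MRTG_METADATA_LANGUAGE, "en"),     # its metadata is written in English
    (FILM_METADATA, DCTERMS_LANGUAGE, "en"),  # a statement about the metadata itself
]


def value_of(subject, predicate):
    """Return the first object stated for (subject, predicate), or None."""
    matches = [o for (s, p, o) in statements if s == subject and p == predicate]
    return matches[0] if matches else None


# A consumer that only reads what is stated gets different answers
# depending on which resource it asks about.
print(value_of(FILM, DCTERMS_LANGUAGE))
print(value_of(FILM_METADATA, DCTERMS_LANGUAGE))
```

Stating `(FILM, DCTERMS_LANGUAGE, "en")` would not raise any error here either, which is exactly the problem: nothing but care on the publisher's side keeps the statement true.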
I believe that this is a very important point that all parties need to keep in mind before we happily march off creating RDF templates for the general public to use. In particular, I have some serious problems with the way that people are associating properties with instances of the dwc:Occurrence class. I believe that these "wrong" ways originate with the historical roots of Darwin Core as a means to describe specimens. I will illustrate what I mean. In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense. However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences which includes specimens but which also includes observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement.
So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
In all of the examples above, the correct statement is [individual organism] dwc:scientificName "Drosophila melanogaster". The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism. These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really. The image is not an image of the Drosophila species concept nor is it an image of the string "Drosophila melanogaster". The image is an image of an individual fruit fly. The individual fruit fly is a representative of the taxon, the image and the observation are not.
This point becomes more clear if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample and band it and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things? Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it? That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird. We didn't get the DNA sample from the feather, we got it via a blood sample from the bird. The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird which was never any kind of specimen living or dead. The bird is an individual organism and that's what we need to call it. Right now we don't have anything in Darwin Core that can be used to rdfs:type the bird, which is why I proposed Individual as a Darwin Core class.
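The bird scenario can be sketched as a small set of triples in which every Occurrence points at the one individual, and the scientific name hangs on the individual only. This is an illustration under stated assumptions: `ex:Individual` and the linking property `ex:occurrenceOf` are hypothetical names (Individual is only a proposed class at this point), the identifiers are made up, and the species name is an arbitrary sample value.

```python
SCIENTIFIC_NAME = "dwc:scientificName"
# Assumption: made-up linking property, not an existing Darwin Core term.
OCCURRENCE_OF = "ex:occurrenceOf"

BIRD = "ex:bird-0001"  # hypothetical identifier for the banded bird

triples = [
    (BIRD, "rdf:type", "ex:Individual"),
    (BIRD, SCIENTIFIC_NAME, "Turdus migratorius"),  # sample value only
    ("ex:image-1", OCCURRENCE_OF, BIRD),         # photograph of the bird
    ("ex:feather-1", OCCURRENCE_OF, BIRD),       # the feather specimen
    ("ex:dna-sample-1", OCCURRENCE_OF, BIRD),    # DNA from a blood sample
    ("ex:band-sighting-1", OCCURRENCE_OF, BIRD), # later observation of the band
]


def occurrences_of(individual):
    """List every occurrence record linked to the given individual."""
    return [s for (s, p, o) in triples if p == OCCURRENCE_OF and o == individual]


def scientific_name_for(occurrence):
    """Follow the occurrence to its individual and read the name there."""
    for s, p, o in triples:
        if s == occurrence and p == OCCURRENCE_OF:
            for s2, p2, o2 in triples:
                if s2 == o and p2 == SCIENTIFIC_NAME:
                    return o2
    return None


print(occurrences_of(BIRD))
print(scientific_name_for("ex:band-sighting-1"))
```

None of the four occurrence records needs to reference another one: the feather, the image, the DNA sample and the band sighting are related only through the individual they share.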
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost. The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject. It explains the rationale better than this email, has nice diagrams, and gives RDF examples to illustrate everything ( https://journals.ku.edu/index.php/jbi/article/view/3664). If somebody has a better idea of how to develop an internally consistent system that can handle the problems I've raised here that DOESN'T involve Individuals (i.e. other "right"[=semantically accurate] ways to express properties and relationships among Identifications, Taxa, diverse types of Occurrences, etc.) I'd like to hear what it is. Or perhaps as Stan has suggested, there needs to be a task group that can hash out alternative views. But let's have the discussion before we post models and suggest people use them.
Steve
Steve Baskauf wrote:
Stan, Thanks for the clarification. My concern here is that standard or not, if examples are posted on the Google Darwin Code site, they will have an implied "stamp of approval" and will be used by others as a template (despite that site being labeled as "for discussion and development" not everyone can post to it and that implies some authority). In the case of straight XML, that isn't really that big of an issue. XML can mean whatever one wants as long as there is an agreement between the sender and the receiver (perhaps in the form of a formal XML schema) as to what the elements represent. I believe that RDF is a different beast. When one exposes RDF, the receiver is unknown. Therefore, the RDF has to actually "mean" something to the receiver without a pre-arranged agreement. In a generic XML document, the elements can simply be a list of string values of terms with no implied "meaning" except what might be inferred by grouping them in a container element. In RDF, the elements represent properties of particular resources. I believe strongly that although there may be several "right" ways to express properties of members of DwC classes, there are many more "wrong" ways that should not be used. By "right" I mean that they make sense semantically in that the properties logically are ones that should actually belong to the described resource. I do not believe that the discussion of these issues has progressed to the point where there is a consensus on the "right" use of DwC terms for some types of resources and therefore I am opposed to the posting of RDF examples on any official Darwin Core sites without a lot more discussion UNLESS the examples are clearly labeled as examples intended for discussion and not for use as templates. 
If you want such examples, I can provide
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf as an example for an Individual,
http://bioimages.vanderbilt.edu/baskauf/79695.rdf as an example of an Occurrence that is a live plant image, and
http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf as an example of an Occurrence that is an herbarium specimen.
I would be happy to discuss the reasons why I structured the RDF as I did (although mostly those examples are already rationalized in https://journals.ku.edu/index.php/jbi/article/view/3664), but I would not go so far as to say that they are "right" without some discussion.
What I intended when I suggested that I might write some kind of guide for Darwin Core represented in RDF/XML was really a document that explained to beginners what the point was of RDF, the basics of how one can structure properties in RDF using examples that are Darwin Core terms, and options for creating URIs that refer to resources that are described in separate files or within the same file. I wasn't really suggesting that it be a full-blown recommendation with specific guidelines for the use of particular terms or structuring of files for particular classes of resources, although that would be a good thing ultimately. I guess I was seeing some kind of a beginner's guide as a way to involve more people (who aren't up on RDF) in the discussion. I don't think that it should be necessary to complete full "standards" process before such a document were made available. It would probably be better to have some kind of road map where that document would be the first segment but would later be followed by guidelines for specific classes of resources with examples. I think that such a modular approach would be the most beneficial because pieces of it could actually get done in a timely fashion rather than requiring the whole thing to be complete before any of it would be accepted.
I do think a task group for Darwin Core RDF would be a good idea. If nobody is in a huge hurry, I don't mind trying to charter such a group, although I'd be just as happy if somebody else wanted to do it and I would just try to be an active participant. I will look at the links you suggested, thanks.
Steve
Blum, Stan wrote:
Steve,
The TDWG process for creating standards is here: http://www.tdwg.org/about-tdwg/process/ This is worth reading if you haven’t done so already.
Another document worth reading is the standards format specification http://www.tdwg.org/standards/147/ I never pushed this “standard” through public review, but it still functions as a guideline for formatting and our view of what is and isn’t within scope of a “standard”. In other words, we are doing our best to follow the basic ideas laid out there about the kinds of specifications:
Type 1 -- normative specification, versioned; Type 2 -- versioned, supplementary documentation; Type 3 — uncontrolled supplementary documentation.
The page of examples John and others have put up on the DarwinCore site is non-normative, uncontrolled documentation.
The thing you were proposing sounded like an applicability statement — offering guidance about how another standard, RDF, should be used in biodiversity informatics. These can also be treated as standards, and get TDWG ratification as standard, but don’t create a de-novo standard.
Interest groups and task groups are explained in the Process. If you want to create an applicability statement for RDF and DarwinCore, you could prepare a task group charter and submit it to the executive for approval. Approval would make it a formal Task Group. See other task group charters for examples.
-Stan
On 10/13/10 6:33 AM, "Steve Baskauf" steve.baskauf@vanderbilt.edu wrote:
OK, because of a momentary heavy work load I'm still in the process of getting caught up on this thread, but this is moving so fast I feel like I'm being left in the dust. Last week I offered to help facilitate creating some guidelines and examples for RDF/XML in Darwin Core. I was told that we should follow the community process of forming an interest group, getting participants, etc. and have been waiting for some guidelines on how that process is supposed to work. Now we are surging ahead with examples and help pages again. Are we following a process or not and if so, what is it?
Steve
Tim Robertson (GBIF) wrote:
I will also help with examples. If we are doing XML / RDF formats, let's get an example record conforming to the Text guidelines in there as well for completeness (most useful when dealing with checklists).
On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" < mdoering@gbif.org> wrote:
Would we have the energy to compile example dwc records on how to use darwin core for certain use cases? The lack of guidance on how to use darwin core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful for not only newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc
I'd volunteer to do the html page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
Eg, if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view)
then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area"
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 5:41 p.m.
To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
Sent: Monday, October 11, 2010 6:02 PM
To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message-----
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 4:23 p.m.
To: Jerry Cooper
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans ).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper
Sent: Monday, October 11, 2010 4:38 PM
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, x=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Tuesday, 12 October 2010 3:27 p.m.
To: Donald.Hobern@csiro.au; tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know into which of these categories a particular organism falls (not the same as null, which means we don't know whether or not we know)
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
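To make the flag-versus-vocabulary point concrete, here is a minimal Python sketch, assuming the four broad categories listed above were adopted as a controlled vocabulary. The enum name, the record layout, the function name and the occurrenceID values are hypothetical and are not part of Darwin Core; None stands in for null, which is kept distinct from Cryptogenic.

```python
from enum import Enum


class EstablishmentCategory(Enum):
    """Hypothetical controlled vocabulary based on the categories above."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"  # we know that we do not know the category


def exclude_captive(records):
    """Drop records explicitly marked as captive; keep everything else.

    A value of None models null (no assertion was made at all), which is
    deliberately treated differently from CRYPTOGENIC.
    """
    return [
        record for record in records
        if record.get("establishmentMeans") is not EstablishmentCategory.CAPTIVE
    ]


# Illustrative records only; the identifiers are made up.
records = [
    {"occurrenceID": "occ-1", "establishmentMeans": EstablishmentCategory.NATIVE},
    {"occurrenceID": "occ-2", "establishmentMeans": EstablishmentCategory.CAPTIVE},
    {"occurrenceID": "occ-3", "establishmentMeans": EstablishmentCategory.CRYPTOGENIC},
    {"occurrenceID": "occ-4", "establishmentMeans": None},
]

kept_ids = [record["occurrenceID"] for record in exclude_captive(records)]
print(kept_ids)
```

Whether null and Cryptogenic records should also be dropped before a distribution analysis is a choice for the analyst; the sketch only shows that a vocabulary lets software make that choice, which a single flag would not.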
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au]
Sent: Monday, October 11, 2010 3:44 PM
To: Richard Pyle; tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org]
Sent: Tuesday, 12 October 2010 12:33 PM
To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au
Sent: Monday, October 11, 2010 2:59 PM
To: tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
Sent: Tuesday, 12 October 2010 11:34 AM
To: Hobern, Donald (CES, Black Mountain)
Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs
Sent: Monday, 11 October 2010 10:47 PM
To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org
Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
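As a concrete illustration of point 1 (a simple CSV file is a legitimate representation of Darwin Core), here is a minimal Python sketch that writes one record whose column headers are Darwin Core term names. The coordinates are the ones from the datum example above; the scientific name, the datum and basisOfRecord values, the column selection and the helper name to_dwc_csv are assumptions made for illustration, not the bioblitz data profile itself.

```python
import csv
import io

# Column headers are Darwin Core term names (an illustrative subset).
FIELDS = [
    "scientificName",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "basisOfRecord",
]


def to_dwc_csv(records):
    """Serialize dicts keyed by Darwin Core term names into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# A made-up observation; only the coordinates come from the message above.
record = {
    "scientificName": "Drosophila melanogaster",
    "decimalLatitude": 41.5,
    "decimalLongitude": -70.7,
    "geodeticDatum": "WGS84",
    "basisOfRecord": "HumanObservation",
}

csv_text = to_dwc_csv([record])
print(csv_text)
```

Adding a column later, as the transcriber did with basisOfRecord, would only mean appending another term name to FIELDS.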
Happy Thanksgiving to all in Canada -
Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies Knowledge Base http://lod.geospecies.org/ About the GeoSpecies Knowledge Base http://about.geospecies.org/
In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense.
However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences which include specimens but which also include observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement. So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
OK, I admit that I have not been following this list as closely as I should have -- especially during the latter half of 2009. But I have to ask....seriously....is this the level of misunderstanding that still exists in our community?
Perhaps I'm the idiot here, but it has *always* been my understanding that the "thing" (I hesitate to use the word "basis") of an Occurrence instance is *always* the organism (or set of organisms, or impression of an organism in the case of fossils). If the organisms were captured and preserved in a Museum, then we call it a specimen. If the organisms were only witnessed and not captured, we call it an observation. Everything else (including the physical specimen) is just layers of evidence to support the existence and taxonomic identification of the organism within the Occurrence. When photons reflected off the outer surface of an organism find their way through a lens and onto some mechanism for recording said photons (either a human retina and neurons in the brain, or sheet of celluloid, or digital image sensor and memory stick), it's still the organism that the photons reflected off of, which represents the "thing" of the Occurrence to which metadata apply. Same goes for vocalizations transmitted through pressure waves in the air onto some recording device (ear/brain, or microphone/tape).
So while it's certainly true that a media object such as a 35mm slide or digital image file does not itself have a scientificName (then again, some of my old Kodachromes have enough mold on them that they might....), said media objects are *not* the Occurrence itself -- they merely represent evidence of the occurrence. Even a specimen in a jar is not the Occurrence itself. The Occurrence occurred when the specimen was captured (e.g., 400 feet deep on a coral reef). A specimen in a jar on a shelf in a Museum is no longer the "Occurrence"; it is the evidence of the Occurrence.
When I assign a GUID to an Occurrence record that lacks a voucher (i.e., an "Observation"), I'm certainly not trying to identify the act of observation; I'm identifying the organism that was observed, at the time and place that it was observed.
For what it's worth, if I only have a still or video image of an organism (e.g., http://www.youtube.com/watch?v=GVTd11q3Ppc; taken by Rob Whitton, who some of you met at TDWG this year), and didn't collect the specimen, I create an Observation record, and link the image to it as associatedMedia. I would never assign a taxon name to the video clip -- only to the "content item" of the video that represents an organism, serving as the basis of an Occurrence record.
The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism.
I would say in all three cases that the presence of an organism at a place and time was the Occurrence. Specimens, images, and reported observations are merely the evidence that the occurrence existed (and to varying degrees, can also allow for subsequent interpretations of taxonomic identification).
These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really.
I completely agree. The occurrence was the organism at a place and time. The "species" is merely the taxon concept that someone identified the organism as belonging to. The scientificName is merely the label that someone applied to the taxon concept. In other words, the scientificName is really a property of the Taxon Concept, and the Taxon Concept is the subject of an identification event, and the identification event was applied to the organism, which itself represents the basis of an Occurrence. But very few people go to the trouble of creating that full chain of relationships, so as a short-hand, the scientificName is often treated as a direct property of the occurrence (collected or observed organism). I think this short-hand is perfectly fine in the context of DwC, but only as long as people understand the implied chain of linked entities. If we start to forget what's really going on, then we run into trouble.
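The implied chain described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names, field names and example values are hypothetical, Darwin Core does not define these structures, and a real model would allow several identifications per organism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    scientific_name: str  # the label applied to the concept


@dataclass(frozen=True)
class Identification:
    taxon_concept: TaxonConcept  # what the organism was identified as
    identified_by: str


@dataclass(frozen=True)
class Organism:
    identification: Identification  # simplification: exactly one identification


@dataclass(frozen=True)
class Occurrence:
    organism: Organism  # the organism at a place and time
    locality: str
    event_date: str

    @property
    def scientific_name(self) -> str:
        """Shorthand that follows the chain instead of storing the name."""
        return self.organism.identification.taxon_concept.scientific_name


# All values below are made up for illustration.
concept = TaxonConcept(scientific_name="Drosophila melanogaster")
identification = Identification(taxon_concept=concept, identified_by="A. Observer")
occurrence = Occurrence(
    organism=Organism(identification=identification),
    locality="hypothetical locality",
    event_date="2010-10-12",
)

print(occurrence.scientific_name)
```

Treating scientificName as a direct property of the Occurrence corresponds to the property at the end: convenient, as long as the chain behind it is not forgotten.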
Which, I guess, was the whole point of Steve's post.
What concerns me, though, is that we're not (yet?) already beyond this.
This point becomes more clear if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample and band it and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things?
Two Occurrences: the first when it was captured, photographed, and relieved of a feather; the second when it was observed at a later date.
Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it?
We create an identifier for the first Occurrence, capture the specimen-relevant metadata of the preserved feather, and track the DNA sample via associatedSequences.
That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird.
It's certainly different from an image of the whole Bird, but that doesn't preclude us from including both bird and feather images among associatedMedia for the first Occurrence.
We didn't get the DNA sample from the feather, we got it via a blood sample from the bird.
I don't see that as a problem, because the feather is only the evidence of the bird at the place and time (i.e., the first Occurrence). Thus, the sequence can still be included as part of the associatedSequences for the first Occurrence.
The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird which was never any kind of specimen living or dead. The bird is an individual organism and that's what we need to call it.
Agreed -- it forms the basis for the second Occurrence record (later date). The two Occurrence records can be cross referenced, either via a shared individualID, or via associatedOccurrences.
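As a rough illustration of the two-Occurrence reading of the banded-bird case, here is a minimal Python sketch. The Darwin Core term names (occurrenceID, individualID, basisOfRecord, eventDate, associatedMedia, associatedSequences) are real; every identifier, date, and file name is hypothetical and invented purely for the example.

```python
# Two flat Darwin Core-style records for the banded bird.
# All values below are hypothetical; only the term names come from DwC.
capture = {
    "occurrenceID": "urn:example:occ:1",
    "individualID": "urn:example:bird:42",
    "basisOfRecord": "PreservedSpecimen",   # the feather is the voucher
    "eventDate": "2010-05-01",
    "associatedMedia": "bird.jpg | feather.jpg",
    "associatedSequences": "urn:example:seq:7",  # from the blood sample
}
resighting = {
    "occurrenceID": "urn:example:occ:2",
    "individualID": "urn:example:bird:42",  # same bird, identified by its band
    "basisOfRecord": "HumanObservation",
    "eventDate": "2010-09-15",
}


def same_individual(a, b):
    """True when both records carry the same non-empty individualID."""
    ind = a.get("individualID")
    return bool(ind) and ind == b.get("individualID")


print(same_individual(capture, resighting))
```

The shared individualID is the only thing tying the two records together here; associatedOccurrences could carry the same link in the other direction.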
Right now we don't have anything in Darwin Core that can be used to rdfs:type the bird, which is why I proposed Individual as a Darwin Core class.
As someone else alluded to earlier in this thread, there are near-infinite ways that we can slice & cluster biodiversity data. I think there are some cases where "individual" makes a lot of sense as a class (banded birds, managed organisms in zoos and curated gardens, whale and shark observation datasets, plant monitoring projects, etc.). But I think the notion of "Occurrence" makes more sense at this point in biodiversity informatics history, because the vast majority of datasets can be organized in this way relatively painlessly, and because the majority of questions being asked of these data revolve around presence of organisms identified to taxon concepts occurring at place and time.
I could say these things more clearly in RDF, but since many members of the audience of this message aren't familiar with RDF/XML, they would probably zone out and the point would be lost.
Myself among them. Thank you for presenting it in the less-efficient English Prose form.
The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
Well...I certainly agree with you that we need *clear* documentation on what these classes are intended to represent. I had *thought* it was clear that an Occurrence was as I have outlined above. But like I said, I'm perfectly willing to accept that I'm the idiot in this case, and am completely out of phase with the rest of the community.
As to whether or not we need to define a class for Individual, I'm not so sure that's entirely necessary. I guess DwC is already primed for it (http://rs.tdwg.org/dwc/terms/index.htm#individualID) -- but I'm not sure what properties would apply to such a class that are not already covered in DwC. Probably the next iteration of DwC would move some of the properties of the Occurrence class (catalogNumber, individualCount, preparations, disposition, associatedSequences, previousIdentifications) over to the Individual Class, at which point the Occurrence becomes the intersection of an Individual and an Event.
But let me ask: how would you scope "Individual"? (see my previous rants on this list in recent days) Would it be restricted to a particular individual organism? Or, would it be extended to include specified groups of organisms (as dwc:individualID already does)? What about populations? Taxon Concepts?
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject.
I've gotten through the first few pages, and intend to finish soon. But it's much more fun to write emails about this stuff..... :-)
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
Steve, just a quick mail while I'm still reading your longer post - it might be outdated already...
It feels like we are trapped in the old debate about simple vs complex models. Darwin Core was not meant to be a full, exact model. It's the "core" of the data we are dealing with and often contains shortcuts, as Rich explained. There are much richer models around, but you will find it hard to exchange data based on them between very heterogeneous databases.
You might be interested in looking at how the EDIT CDM implements the idea of derived specimens/observations: http://wp5.e-taxonomy.eu/cdm/v22/EARoot/EA8/EA246.png (diagram taken from the complete model at http://wp5.e-taxonomy.eu/cdm/v22/ )
That is actually based on the CDEFD model published by Walter in 1997: http://www.bgbm.org/CDEFD/CollectionModel/units.htm
Some quick remarks to increase the confusion:
- asserting that several occurrences are talking about the same individual can be done via dwc:individualID already. How this knowledge is established is rather difficult I would think, but for some occurrences at least banding or DNA fingerprints might be a way
- the scientific name shortcut in dwc is most often rejected by people that need to track identification histories
Markus
On Oct 14, 2010, at 10:53, Richard Pyle wrote:
In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense.
However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences which include specimens but which also includes observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement. So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
OK, I admit that I have not been following this list as closely as I should have -- especially during the latter half of 2009. But I have to ask....seriously....is this the level of misunderstanding that still exists in our community?
Perhaps I'm the idiot here, but it has *always* been my understanding that the "thing" (I hesitate to use the word "basis") of an Occurrence instance is *always* the organism (or set of organisms, or impression of an organism in the case of fossils). If the organisms were captured and preserved in a Museum, then we call it a specimen. If the organisms were only witnessed and not captured, we call it an observation. Everything else (including the physical specimen) is just layers of evidence to support the existence and taxonomic identification of the organism within the Occurrence. When photons reflected off the outer surface of an organism find their way through a lens and onto some mechanism for recording said photons (either a human retina and neurons in the brain, or sheet of celluloid, or digital image sensor and memory stick), it's still the organism that the photons reflected off of, which represents the "thing" of the Occurrence to which metadata apply. Same goes for vocalizations transmitted through pressure waves in the air onto some recording device (ear/brain, or microphone/tape).
So while it's certainly true that a media object such as a 35mm slide or digital image file does not itself have a scientificName (then again, some of my old Kodachromes have enough mold on them that they might....), said media objects are *not* the Occurrence itself -- they merely represent evidence of the occurrence. Even a specimen in a jar is not the Occurrence itself. The Occurrence occurred when the specimen was captured (e.g., 400 feet deep on a coral reef). A specimen in a jar on a shelf in a Museum is no longer the "Occurrence"; it is the evidence of the Occurrence.
When I assign a GUID to an Occurrence record that lacks a voucher (i.e., an "Observation"), I'm certainly not trying to identify the act of observation; I'm identifying the organism that was observed, at the time and place that it was observed.
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Thanks for the various replies. I'm going to try to respond to several of them in this one. I realize that these lengthy replies may overwhelm some readers. However, I will beg your collective indulgence because I've got a proposal on the table for adding Individual as a Darwin Core class. It appears that the submission process is moving forward and you can consider this as the "pleading of my case" for why that addition is desirable (and, in my opinion, necessary).
One point which I think has permeated the Darwin Core discussions since I've started following them is that DwC is designed to facilitate many uses. Although somebody might use Occurrence records to make dots on a distribution map, somebody else might be using the same records to track the movement of the individual organism as it swims around the sea. Somebody else may just be using the location and time metadata to demonstrate that the photo that they took places the organism in a reasonable location for the species they assert they have photographed. Another person may be using the location and time metadata to indicate that two species co-occurred at the same location at the same time. Darwin Core will be functioning well when it allows occurrence records to do any of these things or possibly all of these things at the same time. The case that I'll try to make here is that Darwin Core mostly allows these things, but lack of an Individual class is making it difficult to do some of them. I will illustrate with a couple examples.
The first one is the problem of tracking an individual over time. As Rich correctly points out, the "new" Darwin Core standard has the term dwc:individualID which is designed to facilitate exactly this kind of thing. In a previous thread when we discussed the appropriate use of the xxxxID terms, I believe that there was a consensus that using them as "idrefs" (I can't remember the technical database term for this, I mean when an item in a record points to the identifier of another record) was appropriate. In a flat "table-based" database system, you would just have a table of records (i.e. rows) for some kind of "thing" with a column heading of "xxxxID". You would place the identifier for the related other thing in that column. In the case of dwc:individualID, the rows would be occurrence records and the entry in the individualID column would be the identifier for the individual. In RDF, you would make statements asserting the relationship between the thing and the other thing. For example, if you wanted to say that a dwc:Identification asserted that something was a particular dwc:Taxon, you could make the statement in RDF that [identification] dwc:taxonID [taxon], where [identification] and [taxon] are instances of those two classes that have been assigned some kind of (hopefully globally unique) identifiers. In the case of asserting that a number of occurrence records track the same individual over time, in RDF, I would for each occurrence make the statement [occurrence] dwc:individualID [individual]. That's great and I can (and do) do that with Darwin Core as it exists. The problem that I face is that in RDF any time that one makes a statement about a resource (I'm switching to that term because "thing" is too vague) using an identifier for it (in the form of a URI), the identifier must dereference (resolve? sorry Bob!) to produce metadata about the resource.
So when I assign a URI to an individual organism a semantic client should be able to retrieve information about the individual. One of the fundamental pieces of information that a client should (according to the TDWG GUID applicability statement) be given about a resource is what type of thing the resource is. This is called the "rdfs:type" of the resource. The TDWG Applicability statement (recommendation 11) says that resources identified by a GUID "should be typed using the TDWG ontology or other well-known vocabularies". I hate to be cynical about this, but I don't have confidence that the TDWG ontology will be ready to use in my lifetime. The only "well-known" vocabulary that I know of that will work for this purpose at the moment is Darwin Core and the Darwin Core classes are just right for typing all of the kinds of resources I want to talk about (occurrences, taxon, identifications, etc.) EXCEPT for Individuals. I think that dwc:individualID is the only one of the xxxxID terms that refers to a type of thing that doesn't have a class defined for it, hence my request to add Individual as a class. At the TDWG meeting, somebody (Roger maybe?) commented that there isn't anything that would stop me from creating my own URI for an Individual class. That is absolutely true and I already did that (http://bioimages.vanderbilt.edu/rdf/terms#Individual), but that doesn't make my term "well known". I want Individual to be a class in Darwin Core so that people other than me know what it means. There is no way that I can currently follow the "rules" for GUIDs and RDF on this, and anybody in the future who uses dwc:individualID in RDF is going to face this same problem (i.e. anyone who wants to track individuals over time).
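To make the statements above concrete without requiring RDF/XML, here is a minimal Python sketch that holds the triples as plain tuples. The Darwin Core namespace and the author's Individual class URI are taken from this thread; the individual and occurrence URIs under example.org are hypothetical, and the typing property is written with its full RDF URI (rdf:type, which this thread refers to as "rdfs:type").

```python
# Triples as (subject, predicate, object) tuples; no RDF library needed.
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Class URI minted by the author because Darwin Core has no Individual class.
INDIVIDUAL_CLASS = "http://bioimages.vanderbilt.edu/rdf/terms#Individual"

# Hypothetical resource URIs, for illustration only.
individual = "http://example.org/ind/1"
occurrences = ["http://example.org/occ/1", "http://example.org/occ/2"]

triples = [(individual, RDF_TYPE, INDIVIDUAL_CLASS)]
for occ in occurrences:
    triples.append((occ, RDF_TYPE, DWC + "Occurrence"))
    # [occurrence] dwc:individualID [individual]
    triples.append((occ, DWC + "individualID", individual))


def types_of(resource, graph):
    """What a client learns about the type when it looks up a resource."""
    return [o for s, p, o in graph if s == resource and p == RDF_TYPE]


def occurrences_of(ind, graph):
    """All occurrences that point at the given individual."""
    return [s for s, p, o in graph if p == DWC + "individualID" and o == ind]


print(types_of(individual, triples))
print(occurrences_of(individual, triples))
```

The occurrences can be typed with a Darwin Core class, while the individual can only be typed with a privately minted URI, which is the gap being described.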
In the case of putting "dots on a map" to show the distribution of a species, the case is simple if the occurrences are specimens where the whole dead organism is collected. It is not so simple with other types of occurrences. Let me illustrate with an example. There is currently precisely one known individual of Crataegus harbisonii in nature. I have given this individual the URI http://bioimages.vanderbilt.edu/ind-baskauf/70905 . I have approximately 62 images of that individual at http://bioimages.vanderbilt.edu/ind-baskauf/70905.htm and http://www.cas.vanderbilt.edu/bioimages/species/crha2.htm . Each one of these images represents an occurrence in that I pressed the shutter on my camera at different times for each one. Ron Lance has collected tissue from this tree for grafting purposes and now has an occurrence with basisOfRecord="LivingSpecimen" in his arboretum in North Carolina. Andrea Bishop of the Tennessee Dept of Environment and Conservation has seeds collected from the tree - I'd call the collection of those seeds an occurrence record. I'm pretty sure that there are one or more specimens from this tree in herbaria (although I'm not sure where). So my question to Markus and others at GBIF is: how many dots will you put on your map for this tree? 65 (one for each occurrence) or 1 (one for each individual)? I think the answer should be one, but it isn't clear to me how a data aggregator is going to achieve the goal of having one dot per individual if the basic unit of "dot creation" is an occurrence rather than an individual. At the present moment, this question seems like a moot point because most records in big databases like GBIF are based on one specimen (or observation) per record of an individual, but that won't necessarily be the case in the future if people take multiple live organism images, perhaps also at the same time they collect a physical specimen.
I anticipate that one response to this question will be to call each imaging bout one "observation" having a number of dwc:associatedMedia references. That collapses the number of occurrence records considerably, but not down to one. I took images of that tree on at least three separate instances over the course of a year and Ron collected his graft tissue years before that. There is simply no way to reduce the number of occurrences for this tree to one, nor should we want to. A possible use of multiple occurrence records (i.e. my first point above) of this sort might be to establish how long individuals of Crataegus harbisonii live and each occurrence record (whether separated by years or by the seconds between shutter clicks) is a part of the record that we should be able to (and want to) preserve. Another use would be to track a non-sessile organism (e.g. a whale) in both time and space. In that case, the record on a map for an individual would be some kind of curve rather than a dot. But in any case, recognizing the existence of an entity that I'm calling an Individual facilitates these broader uses of occurrence data and it's really hard for me to see how that is going to happen if we ONLY have occurrences as separate entities. Response Markus? How does GBIF deal with whale tracks or multiple banded bird observations for a single bird?
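The "one dot per individual" question can be sketched in a few lines of Python. This is only an illustration under assumed data: the rows, identifiers, and dates below are hypothetical, and how a real aggregator would treat records with no individualID is an open choice (here each such record simply stands alone).

```python
from collections import defaultdict

# Hypothetical occurrence rows: (occurrenceID, individualID, eventDate).
rows = [
    ("occ-1", "ind-A", "2009-04-02"),
    ("occ-2", "ind-A", "2009-04-02"),
    ("occ-3", "ind-A", "2009-10-17"),
    ("occ-4", "ind-B", "2009-06-30"),
    ("occ-5", None, "2009-07-01"),  # no individualID recorded
]


def group_by_individual(rows):
    """Group occurrences by individual, keeping each group in date order."""
    groups = defaultdict(list)
    for occ_id, ind_id, date in rows:
        # An occurrence without an individualID is treated as its own group.
        key = ind_id if ind_id is not None else ("unlinked", occ_id)
        groups[key].append((date, occ_id))
    return {key: sorted(items) for key, items in groups.items()}


groups = group_by_individual(rows)
print(len(rows), "occurrences ->", len(groups), "dots")
```

Each group keeps its full, dated list of occurrences, so the same structure serves both uses: one dot (or one track) per individual on the map, and the complete occurrence history for anyone who wants it.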
The third compelling reason for recognizing the existence of Individuals as a resource type is that it is the best way to maintain the linkage between multiple occurrences of the same individual and identifications. (In the oversimplified examples I gave earlier, I applied a scientific name directly to an individual. In actual practice, I relate individuals to identifications and then relate the identifications to taxa.) Again, to illustrate with a real-life example, when Bruce Kirchoff was developing his Woody Plants of the Southeastern US learning software, he asked a taxonomist to go through the images of mine that he was using for the project to verify that they were identified correctly. My old website just threw together all images of a particular species onto one page without regard to the individuals from which they originated (e.g. http://www.cas.vanderbilt.edu/bioimages/species/sarar3.htm and http://www.cas.vanderbilt.edu/bioimages/species/soam3.htm). It turns out that I had carelessly misidentified a vegetative Sambucus racemosa ssp. racemosa individual as Sorbus americana. The taxonomist asked me which of the various bark, twig, leaf, etc. images were from the same plant and the only way I could find out was through the laborious process of looking for images with similar time/date values and my hand written field notes. It was a nightmare finding all of the particular image records that needed to have their identifications fixed and then correcting them. On my new website (e.g. http://bioimages.vanderbilt.edu/metadata.htm, then click on Quercus chrysolepis), the images are connected to the individual from which they originated. If I discover by looking at a particularly informative image that I have misidentified the individual, I only need to add an updated determination (i.e. identification) to that individual's record and automatically all images from that individual are displayed with the correct name and are placed on the correct species page. 
Now imagine a situation that is larger and even more complicated than this (think a Bioblitz). Herbarium curators and live plant photographers are working together to document the flora of an area. Multiple images and multiple specimens may be collected from the same individual. The images may go one place and the specimens may go to several herbaria (if "duplicates" are distributed). It's possible that people might come back to the same individual later to photograph or collect fruit having initially seen flowers. Suppose on down the line a taxonomist looks at one of the specimen duplicates and realizes that the initial identification was wrong (or maybe just wants to assert an alternative opinion about the identity). If the record is based on that individual, then all that is required is for the annotating taxonomist to add a determination (i.e. dwc:Identification) to the Individual's record and poof! all images and duplicate specimens have that opinion associated with them. In contrast, if all of these separate occurrence records are not tied together via the Individual, and if each individual occurrence record has its own determination, nobody is possibly going to ever track down and correct every one. Granted, the scenario that I've suggested is contingent on the existence of a large scale database that can connect metadata across institutions, but exactly that kind of thing is what projects like the US Virtual Herbarium and our Live Plants Imaging group are trying to create. Let's enable this by making it possible within Darwin Core to have a record structure that is Individual-based.
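Here is a minimal Python sketch of that propagation, assuming identifications hang off the individual rather than off each occurrence. The record identifiers and dates are hypothetical; the taxon names are the ones from the misidentification described above.

```python
# Hypothetical links from occurrences (images, specimens) to individuals.
occurrence_to_individual = {
    "image-1": "ind-7",
    "image-2": "ind-7",
    "specimen-1": "ind-7",
    "image-3": "ind-8",
}

# individualID -> list of (dateIdentified, scientificName); dates are invented.
identifications = {
    "ind-7": [("2008-05-10", "Sorbus americana")],
    "ind-8": [("2008-06-01", "Quercus chrysolepis")],
}


def current_name(occurrence_id):
    """Name from the most recent identification of the occurrence's individual."""
    individual = occurrence_to_individual[occurrence_id]
    # ISO dates sort chronologically as strings, so max() picks the latest.
    return max(identifications[individual])[1]


# One new determination on the individual...
identifications["ind-7"].append(("2010-10-01", "Sambucus racemosa ssp. racemosa"))

# ...and every linked image and specimen reports the updated name.
for occ in ("image-1", "image-2", "specimen-1", "image-3"):
    print(occ, "->", current_name(occ))
```

The earlier determination is kept rather than overwritten, so the identification history stays available; picking the latest by date is just one possible policy for choosing which opinion to display.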
I recognize that many "specimen-based" organizations aren't really going to care one whit about this. That's fine. In their databases and personal XML schemas they can ignore Individuals as it is their prerogative. But when we build RDF templates, I believe strongly that for the benefit of those of us who care about the broader applications of occurrences those templates should use individuals to connect (one or more) occurrences and (one or more) identifications. For those with a technical bent, you can see how I have done this for an herbarium specimen by looking at the page source RDF of the example http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf . For those of a non-technical bent, just look at the webpage that shows up when you click on the link. It looks just like any other web page for a specimen and you don't even have to know that the underlying RDF supports using Individuals as a grouping mechanism.
In summary, I think we need Individual as a DwC class to enable understandable typing (rdf:type) of records of individuals and to create a context in which instances of individuals can be placed (i.e. people would assign and use identifiers for individuals when they document occurrences). These instances (and their assigned URI GUIDs) would allow for "connecting" identifications and occurrences in a more meaningful way. I am not suggesting that the occurrence be dethroned as the center of biodiversity records. Assuming that the xxxxID terms end up being moved out of the various classes and into the record-level terms area as was suggested recently, I think that there are really only about two terms that should be put into a new Individual class: the other new term I have proposed (individualRemarks) and establishmentMeans (but that is the topic of another email). It may seem odd to suggest adding a class that has very few terms in it, but if you follow my reasoning above you will hopefully understand why I have done so.
I hope that the discussion (and criticism!) will continue. Again, I'm interested in hearing alternatives. Steve
Richard Pyle wrote:
In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense.
However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences which include specimens but which also includes observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement. So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
OK, I admit that I have not been following this list as closely as I should have -- especially during the latter half of 2009. But I have to ask....seriously....is this the level of misunderstanding that still exists in our community?
Perhaps I'm the idiot here, but it has *always* been my understanding that the "thing" (I hesitate to use the word "basis") of an Occurrence instance is *always* the organism (or set of organisms, or impression of an organism in the case of fossils). If the organisms were captured and preserved in a Museum, then we call it a specimen. If the organisms were only witnessed and not captured, we call it an observation. Everything else (including the physical specimen) is just layers of evidence to support the existence and taxonomic identification of the organism within the Occurrence. When photons reflected off the outer surface of an organism find their way through a lens and onto some mechanism for recording said photons (either a human retina and neurons in the brain, or sheet of celluloid, or digital image sensor and memory stick), it's still the organism that the photons reflected off of, which represents the "thing" of the Occurrence to which metadata apply. Same goes for vocalizations transmitted through pressure waves in the air onto some recording device (ear/brain, or microphone/tape).
So while it's certainly true that a media object such as a 35mm slide or digital image file does not itself have a scientificName (then again, some of my old Kodachromes have enough mold on them that they might....), said media objects are *not* the Occurrence itself -- they merely represent evidence of the occurrence. Even a specimen in a jar is not the Occurrence itself. The Occurrence occurred when the specimen was captured (e.g., 400 feet deep on a coral reef). A specimen in a jar on a shelf in a Museum is no longer the "Occurrence"; it is the evidence of the Occurrence.
When I assign a GUID to an Occurrence record that lacks a voucher (i.e., an "Observation"), I'm certainly not trying to identify the act of observation; I'm identifying the organism that was observed, at the time and place that it was observed.
For what it's worth, if I only have a still or video image of an organism (e.g., http://www.youtube.com/watch?v=GVTd11q3Ppc; taken by Rob Whitton, who some of you met at TDWG this year), and didn't collect the specimen, I create an Observation record, and link the image to it as associatedMedia. I would never assign a taxon name to the video clip -- only to the "content item" of the video that represents an organism, serving as the basis of an Occurrence record.
The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism.
I would say in all three cases that the presence of an organism at a place and time was the Occurrence. Specimens, images, and reported observations are merely the evidence that the occurrence existed (and to varying degrees, can also allow for subsequent interpretations of taxonomic identification).
These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really.
I completely agree. The occurrence was the organism at a place and time. The "species" is merely the taxon concept that someone identified the organism as belonging to. The scientificName is merely the label that someone applied to the taxon concept. In other words, the scientificName is really a property of the Taxon Concept, and the Taxon Concept is the subject of an identification event, and the identification event was applied to the organism, which itself represents the basis of an Occurrence. But very few people go to the trouble of creating that full chain of relationships, so as a short-hand, the scientificName is often treated as a direct property of the occurrence (collected or observed organism). I think this short-hand is perfectly fine in the context of DwC, but only as long as people understand the implied chain of linked entities. If we start to forget what's really going on, then we run into trouble.
Which, I guess, was the whole point of Steve's post.
What concerns me, though, is that we're not (yet?) already beyond this.
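The implied chain described above (a name labels a taxon concept, an identification points at that concept, and the identification is applied to the organism behind an Occurrence) can be written out as a small sketch. This is only an illustration in Python: the class and field names are invented for the example and are not Darwin Core terms, and the collector name, date, and locality are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    # The scientific name is treated as a label on the concept.
    scientific_name: str


@dataclass(frozen=True)
class Identification:
    # An identification event that points at a taxon concept.
    taxon: TaxonConcept
    identified_by: str


@dataclass(frozen=True)
class Occurrence:
    # An organism at a place and time, plus the identification applied to it.
    organism_id: str
    event_date: str
    locality: str
    identification: Identification


def scientific_name(occ: Occurrence) -> str:
    """Shorthand lookup that walks the chain instead of storing the name twice."""
    return occ.identification.taxon.scientific_name


# Hypothetical record, invented for illustration only.
concept = TaxonConcept("Drosophila melanogaster")
occ = Occurrence(
    organism_id="organism-1",
    event_date="2009-11-20",
    locality="kitchen fruit bowl",
    identification=Identification(concept, identified_by="A. Collector"),
)
print(scientific_name(occ))
```

Treating scientificName as a direct property of the occurrence corresponds to calling scientific_name(occ) without looking at the intermediate objects, which is harmless as long as the chain behind it is understood.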
This point becomes more clear if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample and band it and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things?
Two Occurrences: The first one when it was captured, photographed, and relieved of a feather. The second when it was observed at a later date.
Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it?
We create an identifier for the first Occurrence, capture the specimen-relevant metadata of the preserved feather, and track the DNA sample via associatedSequences.
That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird.
It's certainly different from an image of the whole Bird, but that doesn't preclude us from including both bird and feather images among associatedMedia for the first Occurrence.
We didn't get the DNA sample from the feather, we got it via a blood sample from the bird.
I don't see that as a problem, because the feather is only the evidence of the bird at the place and time (i.e., the first Occurrence). Thus, the sequence can still be included as part of the associatedSequences for the first Occurrence.
The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird which was never any kind of specimen living or dead. The bird is an individual organism and that's what we need to call it.
Agreed -- it forms the basis for the second Occurrence record (later date). The two Occurrence records can be cross referenced, either via a shared individualID, or via associatedOccurrences.
Right now we don't have anything in Darwin Core that can be used to rdf:type the bird, which is why I proposed Individual as a Darwin Core class.
As someone else alluded to earlier in this thread, there are near-infinite ways that we can slice & cluster biodiversity data. I think there are some cases where "individual" makes a lot of sense as a class (banded birds, managed organisms in zoos and curated gardens, whale and shark observation datasets, plant monitoring projects, etc.). But I think the notion of "Occurrence" makes more sense at this point in biodiversity informatics history, because the vast majority of datasets can be organized in this way relatively painlessly, and because the majority of questions being asked of these data revolve around presence of organisms identified to taxon concepts occurring at place and time.
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost.
Myself among them. Thank you for presenting it in the less-efficient English Prose form.
The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
Well...I certainly agree with you that we need *clear* documentation on what these classes are intended to represent. I had *thought* it was clear that an Occurrence was as I have outlined above. But like I said, I'm perfectly willing to accept that I'm the idiot in this case, and am completely out of phase with the rest of the community.
As to whether or not we need to define a class for Individual, I'm not so sure that's entirely necessary. I guess DwC is already primed for it (http://rs.tdwg.org/dwc/terms/index.htm#individualID) -- but I'm not sure what properties would apply to such a class that are not already covered in DwC. Probably the next iteration of DwC would move some of the properties of the Occurrence class (catalogNumber, individualCount, preparations, disposition, associatedSequences, previousIdentifications) over to the Individual Class, at which point the Occurrence becomes the intersection of an Individual and an Event.
But let me ask: how would you scope "Individual"? (see my previous rants on this list in recent days) Would it be restricted to a particular individual organism? Or, would it be extended to include specified groups of organisms (as dwc:individualID already does)? What about populations? Taxon Concepts?
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject.
I've gotten through the first few pages, and intend to finish soon. But it's much more fun to write emails about this stuff..... :-)
Aloha, Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Associate Zoologist in Ichthyology
Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html
Steve,
I think that it might be useful to point out that there are actually two different "kinds" of DarwinCore, at least as I see it:
1) The first is the current more XML-ish version that people use for their records, a format largely intended to be consumed by GBIF. This is a representation that allows users to map their Excel or RDBMS tables to a standard set of fields. This appears to be pretty close to workable. Two areas that I am not sure about are:
Does this form need an "Individual" class as you propose?
Should this form allow the use of the geo vocabulary?
In this version, something like TaxonConceptID is probably fine because the consuming application will know to look at the contents of that field, determine if it is an LSID or a URI and handle it appropriately. Generic semantic web tools will not be able to handle these but if this is understood ahead of time, it is not a problem. These are not made for generic tools but for specific tools and use cases.
So what I propose is we address the issues of the current XMLish DarwinCore so that it handles some of the issues you mention.
I think Markus would probably have some valuable insight into how this can be done in a way that will work with GBIF.
This should result in a clear form 1 version of the Darwin core fairly quickly.
This allows the submission process to continue without confusion while the issues of the other version get worked out.
For most users, form 1 is all they will need to understand.
2) The second version is a more fully semanticized version which I think will require a lot more discussion. This version should be understandable by the generic semantic web tools and should ideally also work well in the LOD cloud. But it will take some time for us to agree how this should be done and even longer for the general community to get familiar with it.
Once we have actually figured out a more semantic version that makes sense, GBIF can process and express the submitted data in this version. (I am over simplifying here but I will be more specific below)
This process might take a while and I think a number of groups will probably want to stick to submitting their data in DWC form 1.
I think the issues that you and I have run into arise where we are trying to use form 1 as if it were more like form 2.
Some of these issues are relatively easy to solve, others are very tricky and are only visible when you try to run specific kinds of SPARQL queries on the knowledge base.
Others are even more complicated. For instance, I see that there might be a need for several forms of "species concepts".
Those that I have created have specific use cases in mind. A lot of the use cases that Steve describes overlap with my use cases.
Most of my concepts are structured to provide basic information and a map to other potentially related information sources.
What I eventually intend is for them to serve as a form of "key element" that allows different people to repeatably map the same specimens to the same concepts.
I think these will eventually be useful as concepts for DWC form 1 because they basically represent a "key element".
As currently proposed, do they cover some of the issues and relationships handled by Rich Pyle's TNC's?
No they don't.
(We do think that we can make some interesting and useful interlinkages between the TNC's and the TXN's)
There are probably other groups with use cases for species concepts which may necessitate different underlying assumptions.
So, for just the aspects relating to species concepts, there still needs to be a lot more discussion.
As to the related things like occurrences, and individuals, it is clear to me there are a lot of similarities and overlap.
The differences seem to mainly involve the specifics of the RDF representations.
There are other aspects where the meaning or mental constructs are different for what - may at first - seem similar. For instance, I would see the specimens derived from one individual plant but now existing as two separate plants in different locations as separate individuals. In the same way as I see identical twins as different individuals.
So we have a slightly different conceptualization of what an "individual" is.
To solve this we will need to have some sort of meeting or videoconference between those groups that have a pretty good understanding of RDF. This will allow us to hash these issues out. Here we could explain our different conceptualizations and use cases and see if we can come up with some common standards that will allow people to do what they need to do while following a common standard.
This will need some sort of whiteboard enabled discussion, sample data sets, and example SPARQL queries.
We should also have use cases in the form of SPARQL queries, that allow the resulting knowledge bases to be successfully queried in the ways that people need, and return the kinds of results they expect.
It appears to me that the more semanticized version of the DarwinCore will require RDF that differentiates between:
1) entries that are literals,
2) entries that are LOD-compatible URIs,
3) entries that are LSIDs dereferenceable via a proxy.
There are a number of basic issues involving RDF about which the members will need to have a common understanding.
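As a rough illustration of the three-way distinction above, here is a minimal Python sketch that sorts a raw field value into one of those categories. The function name is made up for this example, the rules are deliberately simple (a "urn:lsid:" prefix for LSIDs, an http or https scheme for URIs), and it does not check whether anything actually dereferences.

```python
from urllib.parse import urlparse


def classify_entry(value: str) -> str:
    """Return 'lsid', 'uri', or 'literal' for a raw field value.

    Hypothetical helper: the rules only look at the shape of the string.
    """
    text = value.strip()
    # LSIDs are URNs of the form urn:lsid:authority:namespace:object
    if text.lower().startswith("urn:lsid:"):
        return "lsid"
    parsed = urlparse(text)
    # Treat only http(s) identifiers with a host as LOD-style URIs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    return "literal"


# The LSID below is an invented example, not a registered identifier.
for entry in (
    "urn:lsid:example.org:taxa:123",
    "http://bioimages.vanderbilt.edu/ind-baskauf/70905",
    "Drosophila melanogaster",
):
    print(entry, "->", classify_entry(entry))
```

A consuming application for form 1 could run a check like this on a field such as TaxonConceptID, whereas a generic semantic web tool would need the distinction to be explicit in the RDF itself.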
Does the entire TDWG community have to understand these nuances?
No, but the group working on a more semantic version will need to.
When we think we have something workable, I would suggest that we have others in the semantic web and LOD community look at it to see if we missed something.
If this all goes well, we will have an example data set, and documentation that explains how it works and why specific things are done the way they are.
This then goes to GBIF (if they want it) where they can work out exposing their current records in this new format.
My guess is that this might take a few passes to get right - simply because that is the nature of the beast.
GBIF will have to come up with some A.I type error and interpretation system to process data. (Which is a pretty cool problem in itself)
They seem to already have a lot of this in place.
The initial example set and a GBIF set can be included in some of the gigantic billion triple challenge and inferencing studies that are going on.
I suspect that these studies will expose some issues and insights that we would have never been able to see ourselves.
These studies will also feed back into the standards process, where we can figure out how to deal with the strange edge-cases they expose.
In the end, we should have a well-vetted format, that is also well understood.
So what I propose is that we do the following to work out a future more semantic version of the DarwinCore:
We break this into a smaller group to work on pulling together the similar semantic representations, and come up with a list of use cases and a test set.
The discussion of this new version should probably be moved to one of the separate lists (as, I think, Markus suggested).
This will avoid confusion between issues in the current DarwinCore and those issues relating to some more fully "semantic" future representation.
While this potential standard gets worked out groups can use form one of the DarwinCore to submit their records.
As I said previously, I think this might be the only version they really need to be familiar with.
Does this plan seem to make sense to everyone?
Respectfully,
- Pete
On Thu, Oct 14, 2010 at 10:43 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
Thanks for the various replies. I'm going to try to respond to several of them in this one. I realize that these lengthy replies may overwhelm some readers. However, I will beg your collective indulgence because I've got a proposal on the table for adding Individual as a Darwin Core class. It appears that the submission process is moving forward and you can consider this as the "pleading of my case" for why that addition is desirable (and in my opinion necessary).
One point which I think has permeated the Darwin Core discussions since I've started following them is that DwC is designed to facilitate many uses. Although somebody might use Occurrence records to make dots on a distribution map, somebody else might be using the same records to track the movement of the individual organism as it swims around the sea. Somebody else may just be using the location and time metadata to demonstrate that the photo that they took places the organism in a reasonable location for the species they assert they have photographed. Another person may be using the location and time metadata to indicate that two species co-occurred at the same location at the same time. Darwin Core will be functioning well when it allows occurrence records to do any of these things or possibly all of these things at the same time. The case that I'll try to make here is that Darwin Core mostly allows these things, but lack of an Individual class is making it difficult to do some of them. I will illustrate with a couple examples.
The first one is the problem of tracking an individual over time. As Rich correctly points out, the "new" Darwin Core standard has the term dwc:individualID which is designed to facilitate exactly this kind of thing. In a previous thread when we discussed the appropriate use of the xxxxID terms, I believe that there was a consensus that using them as "idrefs" (I can't remember the technical database term for this, I mean when an item in a record points to the identifier of another record) was appropriate. In a flat "table-based" database system, you would just have a table of records (i.e. rows) for some kind of "thing" with a column heading of "xxxxID". You would place the identifier for the related other thing in that column. In the case of dwc:individualID, the rows would be occurrence records and the entry in the individualID column would be the identifier for the individual. In RDF, you would make statements asserting the relationship between the thing and the other thing. For example, if you wanted to say that a dwc:Identification asserted that something was a particular dwc:Taxon, you could make the statement in RDF that [identification] dwc:taxonID [taxon], where [identification] and [taxon] are instances of those two classes that have been assigned some kind of (hopefully globally unique) identifiers. In the case of asserting that a number of occurrence records track the same individual over time, in RDF, I would for each occurrence make the statement [occurrence] dwc:individualID [individual]. That's great and I can (and do) do that with Darwin Core as it exists. The problem that I face is that in RDF any time that one makes a statement about a resource (I'm switching to that term because "thing" is too vague) using an identifier for it (in the form of a URI), the identifier must dereference (resolve? sorry Bob!) to produce metadata about the resource.
So when I assign a URI to an individual organism a semantic client should be able to retrieve information about the individual. One of the fundamental pieces of information that a client should (according to the TDWG GUID applicability statement) be given about a resource is what type of thing the resource is. This is called the "rdf:type" of the resource. The TDWG Applicability statement (recommendation 11) says that resources identified by a GUID "should be typed using the TDWG ontology or other well-known vocabularies". I hate to be cynical about this, but I don't have confidence that the TDWG ontology will be ready to use in my lifetime. The only "well-known" vocabulary that I know of that will work for this purpose at the moment is Darwin Core and the Darwin Core classes are just right for typing all of the kinds of resources I want to talk about (occurrences, taxon, identifications, etc.) EXCEPT for Individuals. I think that dwc:individualID is the only one of the xxxxID terms that refers to a type of thing that doesn't have a class defined for it, hence my request to add Individual as a class. At the TDWG meeting, somebody (Roger maybe?) commented that there isn't anything that would stop me from creating my own URI for an Individual class. That is absolutely true and I already did that (http://bioimages.vanderbilt.edu/rdf/terms#Individual), but that doesn't make my term "well known". I want Individual to be a class in Darwin Core so that people other than me know what it means. There is no way that I can currently follow the "rules" for GUIDs and RDF on this, and anybody in the future who uses dwc:individualID in RDF is going to face this same problem (i.e. anyone who wants to track individuals over time).
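The statements described above can be sketched as plain subject/predicate/object tuples. The following Python is a toy illustration rather than real RDF tooling: the two occurrence URIs are invented, the individual URI and the Individual class URI are the ones mentioned in this message, and the type predicate is the standard W3C rdf:type URI.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# URIs taken from this message.
INDIVIDUAL_CLASS = "http://bioimages.vanderbilt.edu/rdf/terms#Individual"
individual = "http://bioimages.vanderbilt.edu/ind-baskauf/70905"

# Hypothetical occurrence identifiers, invented for illustration.
occ_a = "http://example.org/occurrence/1"
occ_b = "http://example.org/occurrence/2"

triples = [
    (occ_a, RDF_TYPE, DWC + "Occurrence"),
    (occ_b, RDF_TYPE, DWC + "Occurrence"),
    # [occurrence] dwc:individualID [individual]
    (occ_a, DWC + "individualID", individual),
    (occ_b, DWC + "individualID", individual),
    # Typing the individual needs a class URI; here a non-Darwin-Core one.
    (individual, RDF_TYPE, INDIVIDUAL_CLASS),
]


def objects(subject, predicate):
    """All objects asserted for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects that assert a given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


print(objects(individual, RDF_TYPE))
print(subjects(DWC + "individualID", individual))
```

The last triple is the one at issue: a client that dereferences the individual's URI gets a type, but only from a privately minted term rather than a well-known vocabulary.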
In the case of putting "dots on a map" to show the distribution of a species, the case is simple if the occurrences are specimens where the whole dead organism is collected. It is not so simple with other types of occurrences. Let me illustrate with an example. There is currently precisely one known individual of Crataegus harbisonii in nature. I have given this individual the URI http://bioimages.vanderbilt.edu/ind-baskauf/70905 . I have approximately 62 images of that individual at http://bioimages.vanderbilt.edu/ind-baskauf/70905.htm and http://www.cas.vanderbilt.edu/bioimages/species/crha2.htm . Each one of these images represents an occurrence in that I pressed the shutter on my camera at different times for each one. Ron Lance has collected tissue from this tree for grafting purposes and now has an occurrence with basisOfRecord="LivingSpecimen" in his arboretum in North Carolina. Andrea Bishop of the Tennessee Dept of Environment and Conservation has seeds collected from the tree - I'd call the collection of those seeds an occurrence record. I'm pretty sure that there are one or more specimens from this tree in herbaria (although I'm not sure where). So my question to Markus and others at GBIF is: how many dots will you put on your map for this tree? 65 (one for each occurrence) or 1 (one for each individual)? I think the answer should be one, but it isn't clear to me how a data aggregator is going to achieve the goal of having one dot per individual if the basic unit of "dot creation" is an occurrence rather than an individual. At the present moment, this question seems like a moot point because most records in big databases like GBIF are based on one specimen (or observation) per record of an individual, but that won't necessarily be the case in the future if people take multiple live organism images, perhaps also at the same time they collect a physical specimen.
I anticipate that one response to this question will be to call each imaging bout one "observation" having a number of dwc:associatedMedia references. That collapses the number of occurrence records considerably, but not down to one. I took images of that tree on at least three separate instances over the course of a year and Ron collected his graft tissue years before that. There is simply no way to reduce the number of occurrences for this tree to one, nor should we want to. A possible use of multiple occurrence records (i.e. my first point above) of this sort might be to establish how long individuals of Crataegus harbisonii live and each occurrence record (whether separated by years or by the seconds between shutter clicks) is a part of the record that we should be able to (and want to) preserve. Another use would be to track a non-sessile organism (e.g. a whale) in both time and space. In that case, the record on a map for an individual would be some kind of curve rather than a dot. But in any case, recognizing the existence of an entity that I'm calling an Individual facilitates these broader uses of occurrence data and it's really hard for me to see how that is going to happen if we ONLY have occurrences as separate entities. Response Markus? How does GBIF deal with whale tracks or multiple banded bird observations for a single bird?
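To make the "how many dots" question concrete, here is a small Python sketch of an aggregator that can plot either one point per occurrence or one point per individual. It is only an assumption about how such a step might look: the records, identifiers, and coordinates are invented, and the field names simply reuse Darwin Core term names.

```python
# Invented records: four occurrences of one tree; coordinates are made up.
occurrences = [
    {"occurrenceID": "img-1", "individualID": "70905",
     "decimalLatitude": 35.0, "decimalLongitude": -85.0},
    {"occurrenceID": "img-2", "individualID": "70905",
     "decimalLatitude": 35.0, "decimalLongitude": -85.0},
    {"occurrenceID": "graft-1", "individualID": "70905",
     "decimalLatitude": 35.0, "decimalLongitude": -85.0},
    {"occurrenceID": "seeds-1", "individualID": "70905",
     "decimalLatitude": 35.0, "decimalLongitude": -85.0},
]


def map_points(records, per_individual=True):
    """Return (lat, lon) points: one per individual, or one per occurrence."""
    if not per_individual:
        return [(r["decimalLatitude"], r["decimalLongitude"]) for r in records]
    first_point = {}
    for r in records:
        # Keep the first point seen for each individual. This suits a sessile
        # organism; a whale would need an ordered track instead of one point.
        first_point.setdefault(
            r["individualID"], (r["decimalLatitude"], r["decimalLongitude"])
        )
    return list(first_point.values())


print(len(map_points(occurrences, per_individual=False)))
print(len(map_points(occurrences)))
```

The grouping only works if every occurrence carries an identifier for the individual, which is the role dwc:individualID plays in the discussion above.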
The third compelling reason for recognizing the existence of Individuals as a resource type is that it is the best way to maintain the linkage between multiple occurrences of the same individual and identifications. (In the oversimplified examples I gave earlier, I applied a scientific name directly to an individual. In actual practice, I relate individuals to identifications and then relate the identifications to taxa.) Again, to illustrate with a real-life example, when Bruce Kirchoff was developing his Woody Plants of the Southeastern US learning software, he asked a taxonomist to go through the images of mine that he was using for the project to verify that they were identified correctly. My old website just threw together all images of a particular species onto one page without regard to the individuals from which they originated (e.g. http://www.cas.vanderbilt.edu/bioimages/species/sarar3.htm and http://www.cas.vanderbilt.edu/bioimages/species/soam3.htm). It turns out that I had carelessly misidentified a vegetative Sambucus racemosa ssp. racemosa individual as Sorbus americana. The taxonomist asked me which of the various bark, twig, leaf, etc. images were from the same plant and the only way I could find out was through the laborious process of looking for images with similar time/date values and my hand written field notes. It was a nightmare finding all of the particular image records that needed to have their identifications fixed and then correcting them. On my new website (e.g. http://bioimages.vanderbilt.edu/metadata.htm, then click on Quercus chrysolepis), the images are connected to the individual from which they originated. If I discover by looking at a particularly informative image that I have misidentified the individual, I only need to add an updated determination (i.e. identification) to that individual's record and automatically all images from that individual are displayed with the correct name and are placed on the correct species page. 
Now imagine a situation that is larger and even more complicated than this (think a Bioblitz). Herbarium curators and live plant photographers are working together to document the flora of an area. Multiple images and multiple specimens may be collected from the same individual. The images may go one place and the specimens may go to several herbaria (if "duplicates" are distributed). It's possible that people might come back to the same individual later to photograph or collect fruit having initially seen flowers. Suppose on down the line a taxonomist looks at one of the specimen duplicates and realizes that the initial identification was wrong (or maybe just wants to assert an alternative opinion about the identity). If the record is based on that individual, then all that is required is for the annotating taxonomist to add a determination (i.e. dwc:Identification) to the Individual's record and poof! all images and duplicate specimens have that opinion associated with them. In contrast, if all of these separate occurrence records are not tied together via the Individual, and if each individual occurrence record has its own determination, nobody is possibly going to ever track down and correct every one. Granted, the scenario that I've suggested is contingent on the existence of a large scale database that can connect metadata across institutions, but exactly that kind of thing is what projects like the US Virtual Herbarium and our Live Plants Imaging group are trying to create. Let's enable this by making it possible within Darwin Core to have a record structure that is Individual-based.
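The "add one determination and every image follows" behaviour can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the Bioimages site is implemented: the identifiers and dates are invented, the two species names come from the misidentification example above, and the rule that the most recent determination is the one displayed is a simplification (alternative opinions could equally be shown side by side).

```python
# Determinations hang off the individual, not off each image.
determinations = {
    "ind-1": [
        {"dateIdentified": "2005-06-01", "scientificName": "Sorbus americana"},
    ],
}

# Each image only points at the individual it came from.
images = [
    {"imageID": "bark.jpg", "individualID": "ind-1"},
    {"imageID": "twig.jpg", "individualID": "ind-1"},
    {"imageID": "leaf.jpg", "individualID": "ind-1"},
]


def latest_name(dets):
    """Most recent determination wins; ISO dates sort correctly as strings."""
    return max(dets, key=lambda d: d["dateIdentified"])["scientificName"]


def displayed_names(imgs, dets_by_individual):
    """Name shown for each image, looked up through its individual."""
    return {
        img["imageID"]: latest_name(dets_by_individual[img["individualID"]])
        for img in imgs
    }


before = displayed_names(images, determinations)

# A single new determination on the individual's record...
determinations["ind-1"].append(
    {"dateIdentified": "2010-03-15",
     "scientificName": "Sambucus racemosa ssp. racemosa"}
)

# ...changes the name shown for every image of that individual.
after = displayed_names(images, determinations)
print(before)
print(after)
```

If each image carried its own determination instead, the correction would have to be repeated once per image, which is the nightmare described in the previous paragraph.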
I recognize that many "specimen-based" organizations aren't really going to care one whit about this. That's fine. In their databases and personal XML schemas they can ignore Individuals as it is their prerogative. But when we build RDF templates, I believe strongly that for the benefit of those of us who care about the broader applications of occurrences those templates should use individuals to connect (one or more) occurrences and (one or more) identifications. For those with a technical bent, you can see how I have done this for an herbarium specimen by looking at the page source RDF of the example http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0429.rdf . For those of a non-technical bent, just look at the webpage that shows up when you click on the link. It looks just like any other web page for a specimen and you don't even have to know that the underlying RDF supports using Individuals as a grouping mechanism.
In summary, I think we need Individual as a DwC class to enable understandable rdfs:typing of records of individuals and to create a context in which instances of individuals can be placed (i.e. people would assign and use identifiers for individuals when they document occurrences). These instances (and their assigned URI GUIDs) would allow for "connecting" identifications and occurrences in a more meaningful way. I am not suggesting that the occurrence be dethroned as the center of biodiversity records. Assuming that the xxxxID terms end up being moved out of the various classes and into the record-level terms area as was suggested recently, I think that there are really only about two terms that should be put into a new Individual class: the other new term I have proposed (individualRemarks) and establishmentMeans (but that is the topic of another email). It may seem odd to suggest adding a class that has very few terms in it, but if you follow my reasoning above you will hopefully understand why I have done so.
I hope that the discussion (and criticism!) will continue. Again, I'm interested in hearing alternatives. Steve
Richard Pyle wrote:
In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense. However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences which include specimens but which also includes observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement.
So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
OK, I admit that I have not been following this list as closely as I should have -- especially during the latter half of 2009. But I have to ask....seriously....is this the level of misunderstanding that still exists in our community?
Perhaps I'm the idiot here, but it has *always* been my understanding that the "thing" (I hesitate to use the word "basis") of an Occurrence instance is *always* the organism (or set of organisms, or impression of an organism in the case of fossils). If the organisms were captured and preserved in a Museum, then we call it a specimen. If the organisms were only witnessed and not captured, we call it an observation. Everything else (including the physical specimen) is just layers of evidence to support the existence and taxonomic identification of the organism within the Occurrence. When photons reflected off the outer surface of an organism find their way through a lens and onto some mechanism for recording said photons (either a human retina and neurons in the brain, or sheet of celluloid, or digital image sensor and memory stick), it's still the organism that the photons reflected off of, which represents the "thing" of the Occurrence to which metadata apply. Same goes for vocalizations transmitted through pressure waves in the air onto some recording device (ear/brain, or microphone/tape).
So while it's certainly true that a media object such as a 35mm slide or digital image file does not itself have a scientificName (then again, some of my old Kodachromes have enough mold on them that they might....), said media objects are *not* the Occurrence itself -- they merely represent evidence of the occurrence. Even a specimen in a jar is not the Occurrence itself. The Occurrence occurred when the specimen was captured (e.g., 400 feet deep on a coral reef). A specimen in a jar on a shelf in a Museum is no longer the "Occurrence"; it is the evidence of the Occurrence.
When I assign a GUID to an Occurrence record that lacks a voucher (i.e., an "Observation"), I'm certainly not trying to identify the act of observation; I'm identifying the organism that was observed, at the time and place that it was observed.
For what it's worth, if I only have a still or video image of an organism (e.g., http://www.youtube.com/watch?v=GVTd11q3Ppc; taken by Rob Whitton, who some of you met at TDWG this year), and didn't collect the specimen, I create an Observation record, and link the image to it as associatedMedia. I would never assign a taxon name to the video clip -- only to the "content item" of the video that represents an organism, serving as the basis of an Occurrence record.
The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism.
I would say in all three cases that the presence of an organism at a place and time was the Occurrence. Specimens, images, and reported observations are merely the evidence that the occurrence existed (and to varying degrees, can also allow for subsequent interpretations of taxonomic identification).
These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really.
I completely agree. The occurrence was the organism at a place and time. The "species" is merely the taxon concept that someone identified the organism as belonging to. The scientificName is merely the label that someone applied to the taxon concept. In other words, the scientificName is really a property of the Taxon Concept, and the Taxon Concept is the subject of an identification event, and the identification event was applied to the organism, which itself represents the basis of an Occurrence. But very few people go to the trouble of creating that full chain of relationships, so as a short-hand, the scientificName is often treated as a direct property of the occurrence (collected or observed organism). I think this short-hand is perfectly fine in the context of DwC, but only as long as people understand the implied chain of linked entities. If we start to forget what's really going on, then we run into trouble.
Which, I guess, was the whole point of Steve's post.
What concerns me, though, is that we're not (yet?) already beyond this.
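The implied chain of linked entities described above (Occurrence, organism, identification event, Taxon Concept, scientificName) could be sketched in Python roughly as follows. All identifiers and the dictionary layout are hypothetical, and taking the last identification in the list as the current one is an assumption made only to keep the example short.

```python
# Hypothetical records; only the key names scientificName, identifiedBy,
# eventDate and occurrenceID echo Darwin Core terms.
taxon_concepts = {"tc-1": {"scientificName": "Drosophila melanogaster"}}
identifications = {"id-1": {"taxonConcept": "tc-1", "identifiedBy": "some taxonomist"}}
organisms = {"org-1": {"identifications": ["id-1"]}}
occurrences = {"occ-1": {"organism": "org-1", "eventDate": "2010-10-14"}}


def scientific_name(occurrence_id):
    """Follow Occurrence -> organism -> identification -> taxon concept -> name."""
    organism = organisms[occurrences[occurrence_id]["organism"]]
    # Assumption: the last identification listed is the one currently accepted.
    latest = identifications[organism["identifications"][-1]]
    return taxon_concepts[latest["taxonConcept"]]["scientificName"]


# The familiar short-hand record is the same chain collapsed into one flat row.
flat_record = {
    "occurrenceID": "occ-1",
    "scientificName": scientific_name("occ-1"),
}
print(flat_record)
```

The flat record is convenient, but it only stays meaningful as long as it can be read as a collapsed form of the longer chain.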
This point becomes more clear if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample and band it and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things?
Two Occurrences: The first one when it was captured, photographed, and relieved of a feather. The second when it was observed at a later date.
Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it?
We create an identifier for the first Occurrence, capture the specimen-relevant metadata of the preserved feather, and track the DNA sample via associatedSequences.
That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird.
It's certainly different from an image of the whole Bird, but that doesn't preclude us from including both bird and feather images among associatedMedia for the first Occurrence.
We didn't get the DNA sample from the feather, we got it via a blood sample from the bird.
I don't see that as a problem, because the feather is only the evidence of the bird at the place and time (i.e., the first Occurrence). Thus, the sequence can still be included as part of the associatedSequences for the first Occurrence.
The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird which was never any kind of specimen living or dead. The bird is an individual organism and that's what we need to call it.
Agreed -- it forms the basis for the second Occurrence record (later date). The two Occurrence records can be cross referenced, either via a shared individualID, or via associatedOccurrences.
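As a rough illustration of cross-referencing Occurrence records through a shared individualID, here is a small Python sketch. The key names are Darwin Core terms mentioned in this thread or in the standard; every value (occ-1, bird-42, the dates, the file names) is invented for the example.

```python
from collections import defaultdict

# Two Occurrences of the same banded bird, plus one unrelated record.
occurrences = [
    {
        "occurrenceID": "occ-1",
        "individualID": "bird-42",
        "eventDate": "2010-05-01",
        "basisOfRecord": "PreservedSpecimen",  # the feather is the voucher
        "associatedMedia": ["bird.jpg", "feather.jpg"],
        "associatedSequences": ["seq-001"],  # from the blood sample
    },
    {
        "occurrenceID": "occ-2",
        "individualID": "bird-42",
        "eventDate": "2010-09-15",
        "basisOfRecord": "HumanObservation",  # the later band sighting
    },
    {
        "occurrenceID": "occ-3",
        "individualID": "bird-77",
        "eventDate": "2010-09-15",
        "basisOfRecord": "HumanObservation",
    },
]

# Group occurrence identifiers under the individual they document.
by_individual = defaultdict(list)
for occ in occurrences:
    by_individual[occ["individualID"]].append(occ["occurrenceID"])

print(dict(by_individual))
```

The same grouping could instead be expressed by listing occ-2 under associatedOccurrences on occ-1; the shared identifier simply avoids maintaining pairwise links.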
Right now we don't have anything in Darwin Core that can be used to rdfs:type the bird, which is why I proposed Individual as a Darwin Core class.
As someone else alluded to earlier in this thread, there are near-infinite ways that we can slice & cluster biodiversity data. I think there are some cases where "individual" makes a lot of sense as a class (banded birds, managed organisms in zoos and curated gardens, whale and shark observation datasets, plant monitoring projects, etc.). But I think the notion of "Occurrence" makes more sense at this point in biodiversity informatics history, because the vast majority of datasets can be organized in this way relatively painlessly, and because the majority of questions being asked of these data revolve around presence of organisms identified to taxon concepts occurring at place and time.
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost.
Myself among them. Thank you for presenting it in the less-efficient English Prose form.
The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
Well...I certainly agree with you that we need *clear* documentation on what these classes are intended to represent. I had *thought* it was clear that an Occurrence was as I have outlined above. But like I said, I'm perfectly willing to accept that I'm the idiot in this case, and am completely out of phase with the rest of the community.
As to whether or not we need to define a class for Individual, I'm not so sure that's entirely necessary. I guess DwC is already primed for it (http://rs.tdwg.org/dwc/terms/index.htm#individualID) -- but I'm not sure what properties would apply to such a class that are not already covered in DwC. Probably the next iteration of DwC would move some of the properties of the Occurrence class (catalogNumber, individualCount, preparations, disposition, associatedSequences, previousIdentifications) over to the Individual Class, at which point the Occurrence becomes the intersection of an Individual and an Event.
But let me ask: how would you scope "Individual"? (see my previous rants on this list in recent days) Would it be restricted to a particular individual organism? Or, would it be extended to include specified groups of organisms (as dwc:individualID already does)? What about populations? Taxon Concepts?
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject.
I've gotten through the first few pages, and intend to finish soon. But it's much more fun to write emails about this stuff..... :-)
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
Speaking strictly from ignorance rather than wisdom here, I don't believe there is one right way to use the standard, though I agree that there are innumerable wrong ways to do so. It's this basic unease that makes me intuitively shy of expressing "A [single] TDWG Ontology".
What if we try a slightly different world view from the one you propose centered on the Individual? Namely, let the Occurrence stand as "evidence that a taxon occurred at a place and time." That is to say, we may or may not care about the concept of an individual in our thinking and our data capture. In this view, the Occurrence remains the central concept, and the rest of the data highlights the evidence. Hence, a skull in a collection (and the information gathered about the collection event) is the evidence that a taxon occurred at a place and time. Similarly, a digital image of an identifiable individual from a camera trap is the evidence that a taxon occurred at a place and time. A fossil having myriad individuals is evidence that taxa occurred at a place and time based on a GeologicalContext. In plain English, which we could express as RDF with an appropriate set of predicates, we would always have the same pattern to describe Occurrences from the Occurrence-centric world view, namely
the Occurrence O gives evidence that Taxon T determined based on Identification criteria I occurred at Location L within GeologicalContext G during the Event E based on evidence captured in properties of the Occurrence and distinguishable in the type of evidence as recorded in the dcterms:type and or the dwc:basisOfRecord.
I don't see anything "wrong" with this formulation, as all of the predicates appropriately associate subjects and objects.
In other words, what is special about the Individual-centric view (or any other view) except the way one wants to think about and express the relationships (predicates) or formulates the questions?
On Wed, Oct 13, 2010 at 7:07 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
I was just ready to leave work when I wrote this and since then I'm feeling like I should clarify just what I mean by "wrong" ways of using RDF. I recognize that TDWG encourages flexibility in the ways that standards such as DwC are used. As such, it doesn't usually define "right" and "wrong" ways of using the standards. What I mean by calling some uses "wrong" is not intended to discourage the creative use of DwC terms in RDF. What I mean is that one must be careful to make sure that RDF statements mean what is intended. Here is an example. The Dublin Core term dcterms:language means "the language of the resource". On multiple occasions, I've seen this term used in RDF as a property of a resource whose metadata is written in a certain language. This is "wrong" because the subject of the statement is the resource itself, not the resource's metadata. The need for this kind of clarity is apparent in the case of media. For example, if we are providing metadata in English that describes a nature film which has audio in German, the correct statement is that [film] dcterms:language "de", NOT [film] dcterms:language "en". This problem is handled appropriately in the MRTG schema by creating the (required) term mrtg:metadataLanguage. The correct statement would be [film] mrtg:metadataLanguage "en" . (I'm using "[film]" in lieu of a URI identifier for the film.) If, however, we were writing RDF to describe the metadata itself rather than the film, then it would be appropriate to say [film's metadata] dcterms:language "en" . In straight XML, we might get away with semantic sloppiness if the senders and receivers of the XML "understand" what the intended subject is of the term dcterms:language. But in RDF, we have to assume that the receiver of the RDF is a "stupid" computer which only infers exactly what is said and not what we MEANT to say.
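The distinction drawn above between the language of the film and the language of its metadata can be shown with three statements. This is a minimal sketch that models triples as plain Python tuples instead of using an RDF library; the "ex:" subject identifiers are hypothetical stand-ins for real URIs, and the prefixed term names are kept as simple strings.

```python
# Each statement is a (subject, predicate, object) tuple.
triples = [
    ("ex:film", "dcterms:language", "de"),           # the film's audio is in German
    ("ex:film", "mrtg:metadataLanguage", "en"),      # its metadata is written in English
    ("ex:film-metadata", "dcterms:language", "en"),  # a statement about the metadata itself
]


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# A consumer asking for the language of the film gets exactly what was stated.
print(objects("ex:film", "dcterms:language"))  # ['de']
```

A machine consumer has nothing but the subject of each statement to go on, so asserting "en" for the film's dcterms:language would simply yield a wrong answer to the same query.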
I believe that this is a very important point that all parties need to keep in mind before we happily march off creating RDF templates for the general public to use. In particular, I have some serious problems with the way that people are associating properties with instances of the dwc:Occurrence class. I believe that these "wrong" ways originate with the historical roots of Darwin Core as a means to describe specimens. I will illustrate what I mean. In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense. However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences which include specimens but which also includes observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement. 
So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
In all of the examples above, the correct statement is [individual organism] dwc:scientificName "Drosophila melanogaster". The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism. These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really. The image is not an image of the Drosophila species concept nor is it an image of the string "Drosophila melanogaster". The image is an image of an individual fruit fly. The individual fruit fly is a representative of the taxon, the image and the observation are not.
This point becomes more clear if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample and band it and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things? Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it? That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird. We didn't get the DNA sample from the feather, we got it via a blood sample from the bird. The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird which was never any kind of specimen living or dead. The bird is an individual organism and that's what we need to call it. Right now we don't have anything in Darwin Core that can be used to rdfs:type the bird, which is why I proposed Individual as a Darwin Core class.
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost. The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject. It explains the rationale better than this email, has nice diagrams, and gives RDF examples to illustrate everything (https://journals.ku.edu/index.php/jbi/article/view/3664). If somebody has a better idea of how to develop an internally consistent system that can handle the problems I've raised here that DOESN'T involve Individuals (i.e. other "right" [= semantically accurate] ways to express properties and relationships among Identifications, Taxa, diverse types of Occurrences, etc.) I'd like to hear what it is. Or perhaps as Stan has suggested, there needs to be a task group that can hash out alternative views. But let's have the discussion before we post models and suggest people use them.
Steve
On Oct 14, 2010, at 10:05 AM, John Wieczorek wrote:
What if we try a slightly different world view from the one you propose centered on the Individual? Namely, let the Occurrence stand as "evidence that a taxon occurred at a place and time." That is to say, we may or may not care about the concept of an individual in our thinking and our data capture. In this view, the Occurrence remains the central concept, and the rest of the data highlights the evidence. Hence, a skull in a collection (and the information gathered about the collection event) is the evidence that a taxon occurred at a place and time. Similarly, a digital image of an identifiable individual from a camera trap is the evidence that a taxon occurred at a place and time. A fossil having myriad individuals is evidence that taxa occurred at a place and time based on a GeologicalContext.
If users try to pack a lot of context-dependent significance and meaning into their annotations (what the user "cares about" in the example), and present a fundamental observation only through layers of inference, this makes it more difficult to re-use or re-purpose the results, because the ultimate consumer of the information may not share the same motivations and perspectives.
Arlin
On Wed, Oct 13, 2010 at 7:07 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu
wrote:
I was just ready to leave work when I wrote this and since then I'm feeling like I should clarify just what I mean by "wrong" ways of using RDF. I recognize that TDWG encourages flexibility in the ways that standards such as DwC are used. As such, it doesn't usually define "right" and "wrong" ways of using the standards. What I mean by calling some uses "wrong" is not intended to discourage the creative use of DwC terms in RDF. What I mean is that one must be careful to make sure that RDF statements mean what is intended. Here is an example. The Dublin Core term dcterms:language means "the language of the resource". On multiple occasions, I've seen this term used in RDF as a property of a resource whose metadata is written in a certain language. This is "wrong" because the subject of the statement is the resource itself, not the resource's metadata. The need for this kind of clarity is apparent in the case of media. For example, if we are providing metadata in English that describes a nature film which has audio in German, the correct statement is that [film] dcterms:language "de", NOT [film] dcterms:language "en". This problem is handled appropriately in the MRTG schema by creating the (required) term mrtg:metadataLanguage. The correct statement would be [film] mrtg:metadataLanguage "en" . (I'm using "[film]" in lieu of a URI identifier for the film.) If, however, we were writing RDF to describe the metadata itself rather than the film, then it would be appropriate to say [film's metadata] dcterms:language "en" . In straight XML, we might get away with semantic sloppiness if the senders and receivers of the XML "understand" what the intended subject is of the term dcterms:language. But in RDF, we have to assume that the receiver of the RDF is a "stupid" computer which only infers exactly what is said and not what we MEANT to say.
I believe that this is a very important point that all parties need to keep in mind before we happily march off creating RDF templates for the general public to use. In particular, I have some serious problems with the way that people are associating properties with instances of the dwc:Occurrence class. I believe that these "wrong" ways originate with the historical roots of Darwin Core as a means to describe specimens. I will illustrate what I mean. In many cases, a specimen is created by killing an organism and gluing it to a piece of paper (if it's a plant) or putting it in a jar (if it's an animal). It is natural to ask the question "what kind of species is the specimen?". We can look at the specimen and make a statement like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty much makes sense. However, in the new Darwin Core standard, we have a broader category of "things" (a.k.a. resources) that we call Occurrences which include specimens but which also includes observations and probably all kinds of things like images, DNA samples, and a whole lot of other things. If we try to apply the same kind of statement to other kinds of Occurrences besides specimens we immediately run into problems. If we say that [digital image] dwc:scientificName "Drosophila melanogaster" we are making a nonsensical statement. The digital image can have properties like its photographer, its format, its pixel dimensions, etc. but the image itself does not have a scientific name. The scientific name is a property of the thing that was photographed. It makes even less sense if we are talking about observations. An observation is a situation where somebody observes an organism. The observation can have properties like the observer, the location, etc. However, if we say [observation] dwc:scientificName "Drosophila melanogaster" we are saying that that act of observing has a scientific name. That is an incorrect statement. 
So the general statement [Occurrence] dwc:scientificName "Drosophila melanogaster" does not make sense when applied to all possible types of Occurrences. Rather, the organism that we are observing is the thing that has a scientific name.
In all of the examples above, the correct statement is [individual organism] dwc:scientificName "Drosophila melanogaster". The specimen is an occurrence of the individual organism. The image is an occurrence of the individual organism. The observation is an occurrence of the individual organism. These statements may seem odd because we are used to thinking of an Occurrence being an occurrence of the "species" but it's not really. The image is not an image of the Drosophila species concept nor is it an image of the string "Drosophila melanogaster". The image is an image of an individual fruit fly. The individual fruit fly is a representative of the taxon, the image and the observation are not.
This point becomes more clear if we look at a situation where several types of occurrence records are collected from the same individual. Let's say that we capture a bird, photograph it, collect a feather from it, collect a DNA sample and band it and let it go. Later somebody sees the band and reports that as an observation. How do we connect all of these things? Do we create an identifier for the specimen (the feather) and then say that the image and the DNA sample came from it? That would be wrong. We could take an image of the feather, but that would be a different thing from an image of the bird. We didn't get the DNA sample from the feather, we got it via a blood sample from the bird. The band observation is not an observation of the feather, or the image or the DNA sample. It's an observation of the bird which was never any kind of specimen living or dead. The bird is an individual organism and that's what we need to call it. Right now we don't have anything in Darwin Core that can be used to rdfs:type the bird, which is why I proposed Individual as a Darwin Core class.
I could say these things more clearly in RDF, but because many members of the audience of this message aren't familiar with RDF/XML they would probably zone out and the point would be lost. The point is that we need to have identifiable classes of "resources" (the technical name for "things" like physical artifacts, concepts, and electronic representations) for all of the things that we need to describe and inter-relate in the Darwin Core world. Right now, we are missing one of the important pieces that we need, which is a class for the Individual. If we are satisfied with creating an RDF model that only works for specimens and one-time observations, then we probably don't need Individual as a Darwin Core class. On the other hand, if TDWG and GBIF are really serious about creating a system (Darwin Core and RDF based on it) that can handle other types of Occurrences like multiple images of live organisms, observations of the same organism over time, and multiple types of Occurrences collected from the same organism, then this capability should be built into the system from the start. When I got back from the TDWG meeting, I was all excited about trying to use Darwin Core Archives with my live plant image collection. However, it quickly became evident that it could not work because Occurrences were at the center of the diagram rather than Individuals. So unless something changes, we are already embarking on the process of locking out these other Occurrence types.
I hate to sound like a broken record (do we have those any more?), but read my paper on this subject. It explains the rationale better than this email, has nice diagrams, and gives RDF examples to illustrate everything (https://journals.ku.edu/index.php/jbi/article/view/3664 ). If somebody has a better idea of how to develop an internally consistent system that can handle the problems I've raised here that DOESN'T involve Individuals (i.e. other "right"[=semantically accurate] ways to express properties and relationships among Identifications, Taxa, diverse types of Occurrences, etc.) I'd like to hear what it is. Or perhaps as Stan has suggested, there needs to be a task group that can hash out alternative views. But let's have the discussion before we post models and suggest people use them.
Steve
------- Arlin Stoltzfus (arlin@umd.edu) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD tel: 240 314 6208; web: www.molevol.org
Nice. I have added 2 taxonomic checklist examples to that page and a few empty todos: http://code.google.com/p/darwincore/wiki/Examples#Taxonomic_Treatment,_norma...
I am wondering if we need to distinguish between technologies at this stage. They surely have their differences in how to finally encode the information, but that could be done in the guidelines. Having examples with just term - value (literal) pairs would be sufficient, I think, to illustrate the use of the terms regardless of technologies.
Markus
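The idea of technology-neutral term - value pairs can be sketched in Python as follows. This is a minimal sketch, not a proposed format: the two records and their values are made up for illustration, and the helper name to_csv is hypothetical. The same dictionaries are written out as a plain CSV file whose header row is simply the list of Darwin Core terms used.

```python
import csv
import io

# Illustrative records as plain term-value pairs (values are made up).
records = [
    {
        "dwc:basisOfRecord": "HumanObservation",
        "dwc:scientificName": "Drosophila melanogaster",
        "dwc:decimalLatitude": "41.5",
        "dwc:decimalLongitude": "-70.7",
        "dwc:geodeticDatum": "WGS84",
    },
    {
        "dwc:basisOfRecord": "PreservedSpecimen",
        "dwc:scientificName": "Poa anceps",
        "dwc:locality": "Christchurch",
    },
]


def to_csv(rows):
    """Render term-value records as CSV; the header is the union of terms."""
    columns = []
    for row in rows:
        for term in row:
            if term not in columns:
                columns.append(term)
    buffer = io.StringIO()
    # restval fills in an empty cell when a record lacks a term.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
```

The same dictionaries could later be rendered as XML or RDF without changing the examples themselves.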
On Oct 12, 2010, at 22:31, John Wieczorek wrote:
I am interested in helping with an examples page. The page could have XML and RDF examples illustrating particular use cases, as you have recommended. Create an "Examples" page on the Table of Contents and then have all of the examples on one page with an index of links to specific examples at the top? I made a straw man page to show what I am thinking at http://code.google.com/p/darwincore/wiki/Examples.
On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote: Would we have the energy to compile example dwc records on how to use darwin core for certain use cases? The lack of guidance on how to use darwin core was mentioned earlier. An additional example webpage for the dwc website would surely be really helpful for not only newbies. A dwc record for bird watching, vegetation plot surveys, insect specimen collection, herbarium sheets, zoological garden visits, tissue sample, dna sequence, marine fishing net catches, etc
I'd volunteer to do the HTML page if I'm given example records with a short use case description...
Markus
On Oct 12, 2010, at 13:14, Roger Hyam wrote:
Wow - what a thread to come back to.
I saw my name mentioned so I ought to chip in. I also think we are conflating two distinct things under the name "occurrence".
This point is largely just expanding on what Kevin just said. Going down the road he was wise enough not to go down!
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records? The answer will always depend on the question asked.
Take two examples.
A tiger roaming "free" in London living off a diet of squirrels and tourists. Occurrence records for this organism are just occurrence records. Why the tiger is in London (climate change, introduction, invasion, escape) is not a quality of it being there. They are value judgements added later.
A tiger sitting in a cage at London Zoo is "managed" in that it is being maintained there by a human effort. We are recording the fact that someone has placed it there and held it in that position for our edification.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution", i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
The status of taxa in regions is a completely different thing. As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations. Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
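The distinction between primary records and derived status can be sketched in Python. This is a rough sketch under assumptions: the record keys (taxon, region, managed) and the function name are hypothetical, and the records are invented from the examples in this message. The taxon-by-region list is computed as the result of a query over occurrence records, with the "managed" flag deciding whether a record takes part.

```python
from collections import defaultdict

# Invented primary records; "managed" plays the role of the single flag.
occurrences = [
    {"taxon": "Panthera tigris", "region": "London", "managed": True},   # zoo
    {"taxon": "Panthera tigris", "region": "London", "managed": False},  # roaming
    {"taxon": "Sciurus carolinensis", "region": "London", "managed": False},
    {"taxon": "Pinus sylvestris", "region": "Highlands", "managed": True},
]


def taxa_by_region(records, include_managed=False):
    """Derive a region -> taxa listing from primary occurrence records."""
    listing = defaultdict(set)
    for record in records:
        if record["managed"] and not include_managed:
            continue  # skip garden/zoo/farm records unless asked for
        listing[record["region"]].add(record["taxon"])
    return {region: sorted(taxa) for region, taxa in listing.items()}


wild_only = taxa_by_region(occurrences)
everything = taxa_by_region(occurrences, include_managed=True)
```

The answer depends on the question asked: the plantation pine appears only when managed records are included.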
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Sorry to be long winded.
Roger
On 12 Oct 2010, at 09:36, Kevin Richards wrote:
I also have always felt that "nativeness" should apply more to an occurrence than a taxon, but have swayed from one opinion to the other on a regular basis. So my conclusion is that "nativeness" is a property of both, and requires both, in a way - and that these different perspectives are actually the same thing.
E.g., if we describe (in a basic way): Occurrence = Taxon at Location
then if we say that Nativeness is a property of a Taxon that is restricted by Location (Jerry's view) then this is equivalent to saying that Nativeness is a property of an Occurrence! (Rich's view)
As Rich points out, it doesn't make a whole lot of sense to apply Nativeness to a single occurrence, but I'm not sure this is what is meant by stating that "this specimen of Poa anceps that I collected from Christchurch is 'Native'" - but more that "I have found a specimen of Poa anceps in Christchurch and from knowledge of other previously recorded occurrences, I know that this occurrence/taxon is Native in this area"
Also I tend to feel that a lot of biodiversity properties are properties of occurrences - EVEN taxon names are a property of an occurrence and not of this 'concept' of a species - but I won't go down that road right now :-)
Also, we discussed this topic a while ago on the tdwg content list, having worked out that "nativeness" or what we call "biostatus" is a fairly complicated topic, involving taxon names, locations, time, and aspects like 'origin' and 'presence', ...
Kevin
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 5:41 p.m. To: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Before we agree to disagree, let me try to elaborate a bit more:
I think we both agree that "Nativeness" (to borrow Dave's term) is a property of a taxon at a geographic locality (it could also be a property of a taxon in a class of habitat, but few people actually frame it this way).
The reason I think that "Nativeness" is best represented as a property of an Occurrence, rather than of a taxon, is that a taxon is a circumscribed set of organisms, usually based on evolutionary relatedness or morphological or genetic similarity. By contrast, an Occurrence is about the presence of a member or multiple members of a taxon concept in space and time (i.e., at a particular place and time).
We often think of Occurrence records in terms of individual organisms (e.g., specimens, or specific observed or photographed organisms), and I agree, it's weird to think of "Nativeness" as it applies to an individual organism. However, my understanding is that Occurrence instances can also apply to populations -- which is why terms such as establishmentMeans and occurrenceStatus fit into this class.
More generally, if we agree that "Nativeness" is a property of a taxon at a particular locality, the way that this intersection is usually manifest in DwC is via Occurrence and Event instances.
How else would you represent "Nativeness" within DwC?
Aloha, Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 6:02 PM To: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
We will have to agree to disagree.
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
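A small Python sketch shows how such a controlled vocabulary might look to software. It is only a sketch under assumptions: the names Establishment, parse_establishment and weed_out_captive are hypothetical, the four values are the categories listed above rather than an agreed vocabulary, and keeping null and cryptogenic records in the filter is a policy choice made here purely for illustration. Note that "Cryptogenic" is an explicit value, while a missing value stays None.

```python
from enum import Enum
from typing import Optional


class Establishment(Enum):
    """The four broad categories suggested above (not an agreed vocabulary)."""
    NATIVE = "native"
    INTRODUCED = "introduced"
    CAPTIVE = "captive"
    CRYPTOGENIC = "cryptogenic"  # an assertion that the category is unknown


def parse_establishment(raw: Optional[str]) -> Optional[Establishment]:
    """Map free text onto the vocabulary; empty input stays None (null)."""
    if raw is None or not raw.strip():
        return None
    # Raises ValueError for any term outside the vocabulary.
    return Establishment(raw.strip().lower())


def weed_out_captive(records):
    """Drop records explicitly marked captive; keep everything else.

    Keeping null and cryptogenic records is an assumption of this sketch.
    """
    return [
        record for record in records
        if parse_establishment(record.get("establishmentMeans"))
        is not Establishment.CAPTIVE
    ]


sample = [
    {"id": 1, "establishmentMeans": "Native"},
    {"id": 2, "establishmentMeans": "captive"},
    {"id": 3, "establishmentMeans": "Cryptogenic"},
    {"id": 4},  # nothing recorded at all
]
kept_ids = [record["id"] for record in weed_out_captive(sample)]
```

An uncontrolled value such as "escaped from zoo" would raise an error here instead of passing silently, which is the practical difference between a controlled term and a verbatim field.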
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
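The relationship between the geo terms and the DwC terms can be sketched in Python. This is a minimal sketch, not something used at the bioblitz: the function name geo_to_dwc and the dictionary-based record are hypothetical, and the default datum string "WGS84" reflects the assumption stated above that GPS coordinates were in WGS-84.

```python
def geo_to_dwc(record, datum="WGS84"):
    """Return a copy of the record with geo:lat/geo:long as DwC terms.

    The datum is made explicit because geo:lat and geo:long leave it implied.
    """
    converted = {
        term: value for term, value in record.items()
        if term not in ("geo:lat", "geo:long")
    }
    if "geo:lat" in record and "geo:long" in record:
        converted["dwc:decimalLatitude"] = record["geo:lat"]
        converted["dwc:decimalLongitude"] = record["geo:long"]
        converted["dwc:geodeticDatum"] = datum
    return converted


# Coordinates taken from the example above; the name is illustrative.
observation = {
    "dwc:scientificName": "Drosophila melanogaster",
    "geo:lat": 41.5,
    "geo:long": -70.7,
}
as_dwc = geo_to_dwc(observation)
```

A record captured with some other datum would pass that datum explicitly instead of relying on the default.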
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email Warning: This electronic message together with any
attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Nice.
Agreed!
I am wondering if we need to distinguish between technologies at this stage. They surely have their differences in how to finally encode the information, but that could be done in the guidelines. Having examples with just term - value (literal) pairs would be sufficient, I think, to illustrate the use of the terms regardless of technologies.
Maybe we should start with term-value pairs, but I think it would also be useful to render each in XML and RDF as well.
What about a nomenclatural example, maybe separate ones for each code?
Yes, I'll get on it tomorrow (today is a full schedule).
I notice the existing examples don't use GUIDs. Is that intentional, or should I use actual examples with resolvable GUIDs?
Rich
Maybe we should start with term-value pairs, but I think it would also be useful to render each in XML and RDF as well.
Technology-specific examples are surely needed at some point. Once we have a set of key-value pair examples we could translate them into XML, RDF and CSV so we have the same data in various formats.
The trouble with RDF though is that we don't yet have any guidelines. Once Steve et al have settled on something we can then take the RDF examples forward.
What about a nomenclatural example, maybe separate ones for each code?
Yes, I'll get on it tomorrow (today is a full schedule).
I notice the existing examples don't use GUIDs. Is that intentional, or should I use actual examples with resolvable GUIDs?
Do whatever you have as real data examples. If you happen to have GUIDs in GNUB or alike, that's brilliant. There is a recommendation to use GUIDs for ids, but it's no requirement. Most datasources don't have GUIDs and there is no need to force them. The example I used for mammal species of the world is real data - and they don't have GUIDs.
Rich
OK, thanks. I'll start by generating the zoological Nomenclatural example, using real ZooBank records. Following your lead on the Species Checklist example, I'll create one normalized version, and one denormalized. If no one else on this list can generate a real example for the botanical and/or bacteriological Code, then I'll see if I can make one. I just can't give it resolvable GUIDs (yet). By early 2011, we should have a functional instance of GNUB, at which time I'll swap out the examples to that.
As the subject line indicates, I also have a question:
At the moment, the ZooBank HTTP proxy only resolves to human-readable HTML: http://zoobank.org/urn:lsid:zoobank.org:pub:68376390-7809-46FF-9EC4-1371B4AAD0FF
However, later today I want to modify the site to redirect appropriate requests to RDF: http://zoobank.org/authority/metadata/?lsid=urn:lsid:zoobank.org:pub:68376390-7809-46FF-9EC4-1371B4AAD0FF
My understanding is that several people have opted for a solution that uses HTTP_USER_AGENT to determine whether the client is a browser vs. some other app, and if a browser then return HTML, and if another app then return RDF.
My question is: is this an acceptable (interim) solution? Or is there a better way to deal with this? Yes, I know what we should probably do is return XML with a stylesheet, but that will take me more than this afternoon to implement.
Please forgive me if I've just revealed and/or emphasized how uneducated I am as a web developer....
Aloha, Rich
-----Original Message----- From: "Markus Döring (GBIF)" [mailto:mdoering@gbif.org] Sent: Wednesday, October 13, 2010 9:17 AM To: Richard Pyle Cc: tuco@berkeley.edu; tdwg-content@lists.tdwg.org; 'Roger Hyam'; tdwg-bioblitz@googlegroups.com; 'Jerry Cooper' Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Maybe we should start with term-value pairs, but I think it would also be useful to render each in XML and RDF as well.
Technology-specific examples are surely needed at some point. Once we have a set of key-value pair examples we could translate them into XML, RDF and CSV so we have the same data in various formats.
The trouble with RDF though is that we don't yet have any guidelines. Once Steve et al have settled on something we can then take the RDF examples forward.
What about a nomenclatural example, maybe separate ones for each code?
Yes, I'll get on it tomorrow (today is a full schedule).
I notice the existing examples don't use GUIDs. Is that intentional, or should I use actual examples with resolvable GUIDs?
Do whatever you have as real data examples. If you happen to have GUIDs in GNUB or alike, that's brilliant. There is a recommendation to use GUIDs for ids, but it's no requirement. Most datasources don't have GUIDs and there is no need to force them. The example I used for mammal species of the world is real data - and they don't have GUIDs.
Rich
Note, you'll have to fix the truncated URLs if you want to click on them.
Rich
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Wednesday, October 13, 2010 9:45 AM To: '"Markus Döring (GBIF)"' Cc: tdwg-content@lists.tdwg.org; 'Roger Hyam' Subject: [tdwg-content] DwC Examples, and Question
OK, thanks. I'll start by generating the zoological Nomenclatural example, using real ZooBank records. Following your lead on the Species Checklist example, I'll create one normalized version, and one denormalized. If no one else on this list can generate a real example for the botanical and/or bacteriological Code, then I'll see if I can make one. I just can't give it resolvable GUIDs (yet). By early 2011, we should have a functional instance of GNUB, at which time I'll swap out the examples to that.
As the subject line indicates, I also have a question:
At the moment, the ZooBank HTTP proxy only resolves to human-readable HTML: http://zoobank.org/urn:lsid:zoobank.org:pub:68376390-7809-46FF-9EC4-1371B4AAD0FF
However, later today I want to modify the site to redirect appropriate requests to RDF: http://zoobank.org/authority/metadata/?lsid=urn:lsid:zoobank.org:pub:68376390-7809-46FF-9EC4-1371B4AAD0FF
My understanding is that several people have opted for a solution that uses HTTP_USER_AGENT to determine whether the client is a browser vs. some other app, and if a browser then return HTML, and if another app then return RDF.
My question is: is this an acceptable (interim) solution? Or is there a better way to deal with this? Yes, I know what we should probably do is return XML with a stylesheet, but that will take me more than this afternoon to implement.
Please forgive me if I've just revealed and/or emphasized how uneducated I am as a web developer....
Aloha, Rich
-----Original Message----- From: "Markus Döring (GBIF)" [mailto:mdoering@gbif.org] Sent: Wednesday, October 13, 2010 9:17 AM To: Richard Pyle Cc: tuco@berkeley.edu; tdwg-content@lists.tdwg.org; 'Roger Hyam'; tdwg-bioblitz@googlegroups.com; 'Jerry Cooper' Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Maybe we should start with term-value pairs, but I think it would also be useful to render each in XML and RDF as well.
Technology-specific examples are surely needed at some point. Once we have a set of key-value pair examples we could translate them into XML, RDF and CSV so we have the same data in various formats.
The trouble with RDF, though, is that we don't yet have any guidelines. Once Steve et al. have settled on something we can then take the RDF examples forward.
What about a nomenclatural example, maybe separate ones for each code?
Yes, I'll get on it tomorrow (today is a full schedule).
I notice the existing examples don't use GUIDs. Is that intentional, or should I use actual examples with resolvable GUIDs?
Do whatever you have as real data examples. If you happen to have GUIDs in GNUB or the like, that's brilliant. There is a recommendation to use GUIDs for IDs, but it's not a requirement. Most data sources don't have GUIDs and there is no need to force them. The example I used for Mammal Species of the World is real data, and they don't have GUIDs.
Rich
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Any reason you don't want to use the Accept: header, which is what content negotiation is normally based on?
-hilmar
On Oct 13, 2010, at 2:45 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
My understanding is that several people have opted for a solution that uses HTTP_USER_AGENT to determine whether the client is a browser vs. some other app, and if a browser then return HTML, and if another app then return RDF.
Content negotiation is the W3C-recommended best practice; see:
Best Practice Recipes for Publishing RDF Vocabularies http://www.w3.org/TR/swbp-vocab-pub/
Gregor
Rich, as Hilmar and Gregor already said, content negotiation is the preferred solution. It's also more "RESTful", if you like that. For Apache, setting up content negotiation is really not hard. The easiest is to provide different URLs/scripts for each content type, usually with the appropriate suffix:
for content negotiation: http://abc.org/xyz
same resource for rdf http://abc.org/xyz.rdf
for xml http://abc.org/xyz.xml
for json http://abc.org/xyz.json
for html http://abc.org/xyz.html
Then you can let Apache do the content negotiation work for you and do 303 redirects to the format-specific URL. See http://www.w3.org/TR/swbp-vocab-pub/#apache
Apparently there is also a mod_negotiation module for Apache which is recommended over mod_rewrite, but I haven't used that so far: http://httpd.apache.org/docs/current/mod/mod_negotiation.html
Note that for RDF, in the linked data context at least, it's highly recommended to do the 303 redirect and NOT use the same URL for the different content. That is, don't try to use some conditionals inside your code/template to respond differently based on the HTTP header.
Markus
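The approach described above (inspect the Accept header, then answer with a 303 redirect to a format-specific URL) can be sketched in a few lines of Python. This is only an illustration, not how ZooBank or Apache implement it: the media-type table, the `.html` fallback, and the function names `parse_accept` and `negotiate` are assumptions made for the example, and wildcard types such as `*/*` are simply ignored.

```python
# Sketch only: the media types, suffixes, and fallback are illustrative assumptions.
SUFFIX_BY_MEDIA_TYPE = {
    "application/rdf+xml": ".rdf",
    "application/xml": ".xml",
    "application/json": ".json",
    "text/html": ".html",
}
DEFAULT_SUFFIX = ".html"  # assumed fallback when nothing in the header matches


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def negotiate(resource_url, accept_header):
    """Return (status, location): a 303 redirect to the format-specific URL."""
    for media_type in parse_accept(accept_header or ""):
        suffix = SUFFIX_BY_MEDIA_TYPE.get(media_type)
        if suffix:
            return 303, resource_url + suffix
    return 303, resource_url + DEFAULT_SUFFIX
```

With this sketch, a client sending `Accept: application/rdf+xml` for http://abc.org/xyz would be redirected to http://abc.org/xyz.rdf, while a typical browser header that lists `text/html` first would be sent to http://abc.org/xyz.html.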
On Oct 14, 2010, at 2:04 AM, Markus Döring (GBIF) wrote:
That is, don't try to use some conditionals inside your code/template to respond differently based on the HTTP header.
Actually, the code that executes and the URL that calls it are two different things. The same code can be called by multiple URLs - how many URLs map to how many pieces of code is an implementation question.
-hilmar
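To illustrate the distinction between URLs and the code behind them, here is a minimal sketch in which several format-specific URL paths are all served by one rendering function. The names `render_record` and `handle`, and the choice of JSON and HTML as the two formats, are hypothetical and chosen only for this example.

```python
import html
import json


def render_record(record, fmt):
    """Render one record in the requested format (the single shared piece of code)."""
    if fmt == "json":
        return json.dumps(record, sort_keys=True)
    if fmt == "html":
        rows = "".join(
            f"<tr><th>{html.escape(str(key))}</th><td>{html.escape(str(value))}</td></tr>"
            for key, value in sorted(record.items())
        )
        return f"<table>{rows}</table>"
    raise ValueError(f"unsupported format: {fmt}")


def handle(path, record):
    """Map any format-specific URL path onto the single renderer above."""
    _, dot, suffix = path.rpartition(".")
    if not dot:
        # A suffix-less URL would be negotiated and redirected before reaching here.
        raise ValueError("no format suffix; negotiate and redirect first")
    return render_record(record, suffix)
```

Here `/xyz.json` and `/xyz.html` are distinct URLs, as the linked data recommendation asks, yet both are answered by the same function.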
ARGHH!!
I had *promised* myself (and others) that I would not get dragged into a debate of this sort. But Roger just gave me too many opportunities to comment (I changed the subject line to protect the innocent).
(Tim -- now would be a good time to get yourself a cup of tea/coffee...)
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records?
In an ideal world, yes it should. There are two answers to the question "Why should we score taxa to regions at all?". At one level, the reason is because many, many people conceptualize space in terms of named regions (in much the same way that we conceptualize the diversity of organisms as named taxon concepts), and it turns out that many end-users are interested in answering the question, "What lives here?". But I think Roger's point in asking that question was more along the lines of, "Why do we want to model our data in such a way that taxa are linked directly to regions, rather than derive such distributional information from occurrence records?" My answer to that is: "We don't!" (a conclusion I came to over a decade ago). For a long time I have been a firm believer that all Taxon-Locality statements should pass through (be derived from) Occurrence records.
But now we come back to the real world. If I were to compile the list of organisms that are known to have occurred in Hawaii, then my list would probably only be about 70% complete if I relied only on documented observations and collected specimens. The other 30% of the list would come from statements published in historical literature, which often do not record specific details about individual collected specimens or observations. All we have in such cases are statements along the lines of "Jones (1950) reports that the organism he calls 'Aus bus' occurs in Hawaii".
How does such information get recorded in our databases and shared via DwC terms? The easy answer is to establish the link directly between a taxon and a location, as is done in Dave's DwCA Species Distribution extension (http://rs.gbif.org/extension/gbif/1.0/distribution.xml). This is probably fine for "abstracting" distribution information, and is perhaps appropriate for a DwCA extension. But in my opinion, it's a suboptimal approach to structuring the original information. Another approach is to re-frame the statement above as:
"We infer from Jones (1950) that at least one organism that he called 'Aus bus' was observed or collected in Hawaii"
... which allows us to represent this information in the form of an Occurrence record (albeit a somewhat skeletal one).
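A skeletal Occurrence record of this kind might look like the following sketch, written as Darwin Core term-value pairs and then as a one-row CSV file. The selection of terms (scientificName, nameAccordingTo, stateProvince, associatedReferences, occurrenceRemarks) is an assumption made for illustration, not an agreed profile, and the helper names are hypothetical.

```python
import csv
import io


def literature_occurrence(name, source, place):
    """Build a skeletal occurrence record from a published statement.

    The choice of Darwin Core terms here is an illustrative assumption.
    """
    return {
        "scientificName": name,
        "nameAccordingTo": source,
        "stateProvince": place,
        "associatedReferences": source,
        "occurrenceRemarks": (
            f"Inferred from {source}: at least one organism called "
            f"'{name}' was observed or collected in {place}."
        ),
    }


def to_csv(records):
    """Serialise records as CSV, one column per Darwin Core term."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


record = literature_occurrence("Aus bus", "Jones (1950)", "Hawaii")
print(to_csv([record]))
```

Nothing about date, collector, or coordinates is asserted, which is what makes the record skeletal: only what the publication actually supports is filled in.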
But this leads us into what I think is the real crux of the issue, which is to identify what the scope of an "Occurrence" is or should be.
These are the obvious ones:
In-Scope:
- Captured one individual of an organism I identify as "Aus bus" at place and time.
- Captured a thousand individuals of an organism I identify as "Aus bus" at place and time [think plankton tow].
- Observed one individual of an organism I identify as "Aus bus" at place and time.
- Observed a thousand individuals of an organism I identify as "Aus bus" at place and time [think large school of fish, herd of wildebeest, flock of birds].
Not In-Scope:
- Hawaiian Islands; Oahu; Kaneohe (21.410458, -157.774881)
- Aus bus (Linnaeus 1758) sec. Jones 1950
But the question is whether the following statement falls within scope of an "Occurrence":
- A population of an organism identified as "Aus bus" occurs at place and time
Perhaps the differences in perspectives we're seeing on this thread are a result of differences on how we would answer that question.
If the answer is "yes, it's within scope", then I would argue that "nativeness" is a property of an Occurrence (as it already is in DwC, in the form of establishmentMeans).
If the answer is "no, it's not in scope", then OK -- but in that case, how does one represent "nativeness" within DwC? That is, what class of object would establishmentMeans be a property of?
The answer will always depend on the question asked.
Yes! Exactly the point of my previous post.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness" this is something that is derived from combining multiple observations of occurrence of individuals.
If the scope of "Occurrence" is limited to specific individuals at specific place and time, then I would agree. But if the scope of "Occurrence" includes statements about populations of organisms in less-precise places and times, then I think it does.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150 year old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
"Managed" is only one metric of consideration for how to score "intended for distribution" (and a relatively-straightforward one at that). A bit more subtle is the issue that Gail was driving at (i.e., Ross's gull in Massachusetts). For some use-cases, you would want to score that one as "intended for distribution" (if your question was about the potential for the species to disperse without the aid of humans); in other use-cases, you'd want to filter it out (if your question was about where the sustained breeding populations were). There are many other metrics of this sort which, I'm certain, would get hopelessly lost in a simple "intended for distribution" flag.
The status of taxa in regions is a completely different thing.
Well....not completely different. We're talking shades of grey; not black and white.
As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations.
Hmmmm....I get where you're coming from on the analysis thing -- but our databases are absolutely loaded with instances of aggregated multiple observations (and even aggregated specimens). And getting back to the "population of Aus bus occurs at place and time" example, clearly this is an aggregation, and probably an interpolation, but I'm not so sure it's merely the result of an analysis.
Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Since when do we limit our "proper" occurrence records to extant only?
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Again, I think this is the crux of the issue. I wasn't even going to reply at all until I got to this paragraph; which triggered the enormous tome above. The distinction between "primary unit of data gathering" and "result of an analysis" is not as stark as you make it out to be. There's a lot in-between, which unfortunately includes things to which we would logically apply the notion of "nativeness".
Sorry to be long winded.
Likewise!
Aloha, Rich
On Oct 12, 2010, at 12:02 AM, Jerry Cooper wrote:
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
And furthermore they are judgments or inferences about things that are not observed.
Can someone please explain what is the issue here? We can't stop people from noticing that pandas live in Washington, DC. Why should we? If an organism is observed in a location, it's observed there. That's reality. Why go further and speculate about the significance of this observation? What exactly are people afraid of? A fancy scheme for encoding inferences about "native", "introduced", etc. is not going to prevent organisms from popping up in unexpected places, due both to errors and to real events. Data consumers will find ways to deal with that, but probably not by using a fancy scheme of judgments that is not uniformly implemented (which, at this point, seems likely to me).
Arlin
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.) For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha, Rich
________________________________
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion Table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user-friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
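As a rough illustration of point 1.ii in the message above, a record captured with geo:lat and geo:long can be rewritten into the DwC coordinate terms with the datum made explicit. This is a sketch only: the function name `geo_to_dwc`, the `DwC:` key prefix (copied from the message), and the "WGS84" default string are assumptions for the example.

```python
def geo_to_dwc(record, datum="WGS84"):
    """Rewrite geo:lat / geo:long into DwC coordinate terms plus an explicit datum.

    The default datum reflects the GPS assumption described in the message;
    pass another value when the coordinates were taken against a different datum.
    """
    latitude = float(record["geo:lat"])
    longitude = float(record["geo:long"])
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")

    # Keep every other term untouched and replace only the coordinate pair.
    converted = {
        key: value
        for key, value in record.items()
        if key not in ("geo:lat", "geo:long")
    }
    converted["DwC:decimalLatitude"] = latitude
    converted["DwC:decimalLongitude"] = longitude
    converted["DwC:geodeticDatum"] = datum
    return converted
```

Calling `geo_to_dwc({"geo:lat": "41.5", "geo:long": "-70.7"}, datum="XYZ")` yields the three DwC terms from the example in the message, with the datum carried alongside the coordinates rather than implied by the namespace.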
Please consider the environment before printing this email
Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
------- Arlin Stoltzfus (arlin@umd.edu) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD tel: 240 314 6208; web: www.molevol.org
Hi Arlin,
In principle, I agree with you (at least I desperately want to). But I would just point out two things:
1) Many, many end-users want to filter results to include only "natural" occurrences of organisms. For small datasets/analyses, this can easily be done manually by the end-user. For large datasets and analyses, it would be very helpful to have this information embedded within the source dataset (and also the custodian of the dataset will often be in a better position to make such a judgement).
2) Although I agree that "not uniformly implemented" is a very serious risk at this point (and hence why I spent so much time writing emails on this thread), I'm not so sure the solution necessarily needs to be fancy. I'm still confident it can be resolved with a simple controlled vocabulary. The hard part will be figuring out the scope of that vocabulary.
Aloha, Rich
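To make the filtering use case in point 1 concrete, here is a minimal sketch of how a consumer might use a small controlled vocabulary in establishmentMeans. The four terms are the broad categories floated earlier in this thread (native, introduced, captive, cryptogenic); they are not an adopted TDWG vocabulary, and the function names and the default choice of what counts as "natural" are assumptions for illustration.

```python
# Hypothetical vocabulary, taken from the broad categories suggested in this thread.
VOCABULARY = {"native", "introduced", "captive", "cryptogenic"}


def normalise(value):
    """Return the controlled term for a raw value, or None if it is not in the vocabulary."""
    if value is None:
        return None
    term = value.strip().lower()
    return term if term in VOCABULARY else None


def natural_occurrences(records, accepted=("native", "introduced")):
    """Keep only records whose establishmentMeans is one of the accepted terms.

    Which terms count as "natural" depends on the question being asked,
    so the accepted set is a parameter rather than a fixed rule.
    """
    return [
        record
        for record in records
        if normalise(record.get("establishmentMeans")) in accepted
    ]
```

Records with a missing or uncontrolled value are dropped by this filter, which mirrors the distinction drawn earlier between "cryptogenic" (we know that we do not know) and null (nothing was recorded); a dispersal study might instead pass `accepted=("native",)`.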
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Arlin Stoltzfus Sent: Tuesday, October 12, 2010 3:48 AM To: tdwg-content@lists.tdwg.org List; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
On Oct 12, 2010, at 12:02 AM, Jerry Cooper wrote:
For me at least 'Native', 'Invasive' etc are clearly not properties associated with a collection event. They are collective statements, not necessarily about properties of the taxon as a whole, but about the properties of a taxon in some restricted sense - usually geographically restricted.
And furthermore they are judgments or inferences about things that are not observed.
Can someone please explain what is the issue here? We can't stop people from noticing that pandas live in Washington, DC. Why should we? If an organism is observed in a location, it's observed there. That's reality. Why go further and speculate about the significance of this observation? What exactly are people afraid of? A fancy scheme for encoding inferences about "native", "introduced", etc. is not going to prevent organisms from popping up in unexpected places, due both to errors and to real events. Data consumers will find ways to deal with that, but probably not by using a fancy scheme of judgments that is not uniformly implemented (which, at this point, seems likely to me).
Arlin
GISIN, like our model here in NZ, pulls together such items under a triplet of taxon/occurrence statement/geographical extent linked to a publication.
Jerry
-----Original Message----- From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 4:23 p.m. To: Jerry Cooper Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Jerry,
Yes, this is a road I've been down before. Intuitively, these terms seem like they should apply to taxon concepts, but it turns out that's not the right way to do it. Things like "native" and "invasive" are not properties of taxon concepts; they're the property of an occurrence (which, I suspect, is why establishmentMeans is included in the Occurrence class in DwC; e.g., see the examples at http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Jerry Cooper Sent: Monday, October 11, 2010 4:38 PM Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Rich, Let's not confuse those terms which are best applied to a taxon concept rather than a specific collection/observation of a taxon at a location.
There are existing vocabularies for taxon-related provenance, like those in GISIN, or the vocabulary Roger mentioned in his PESI talk at TDWG.
However, against a specific collection you can only record what the recorder actually knows at that location for that specific collected taxon, and not to infer a status like 'introduced' etc.
So, to me, the vocabulary reduces even further - and the obvious ones are 'in cultivation', 'in captivity', 'border intercept'. Our botanical collection management system would hold more data on provenance of a specific collection and linkages between events - from the wild at t=1, x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But then we often have that data because we are generating it.
Jerry
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle Sent: Tuesday, 12 October 2010 3:27 p.m. To: Donald.Hobern@csiro.au; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no-doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced, compared to the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities, vs. things that do not seem to "take" in a novel environment.
Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards controlled vocabulary, than simple flag field.
Aloha, Rich
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich. I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John. This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel. Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another Datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Please consider the environment before printing this email
Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Arlin Stoltzfus (arlin@umd.edu) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD tel: 240 314 6208; web: www.molevol.org
Example 1: in the ecological niche modelling of the predicted potential range of invasive species it is essential to know the difference between the native range of the species, and the introduced range of the species (and to certainly exclude those records for species in greenhouses!).
Example 2: The existence of an occurrence record in a country can, and has, brought down trade barriers costing many millions of dollars, when in fact a record might be due to a quarantine border intercept of that species, that was never 'in the wild'.
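To make the two examples concrete, here is a minimal Python sketch of how a data consumer might split records before niche modelling. It assumes each record carries a dwc:establishmentMeans value drawn from a small controlled list; the list used here (native, introduced, captive, in cultivation, border intercept) is only a guess pieced together from terms mentioned in this thread, not an agreed vocabulary, and the records and the function name are hypothetical.

```python
# Hypothetical occurrence records keyed by Darwin Core term names.
# The establishmentMeans values are illustrative, not an agreed vocabulary.
RECORDS = [
    {"scientificName": "Ailuropoda melanoleuca", "countryCode": "CN", "establishmentMeans": "native"},
    {"scientificName": "Ailuropoda melanoleuca", "countryCode": "US", "establishmentMeans": "captive"},
    {"scientificName": "Examplea prima", "countryCode": "NZ", "establishmentMeans": "introduced"},
    {"scientificName": "Examplea secunda", "countryCode": "NZ", "establishmentMeans": "border intercept"},
    {"scientificName": "Examplea prima", "countryCode": "GB", "establishmentMeans": ""},
]

# Values that never describe a free-living population (assumed list).
NOT_IN_THE_WILD = {"captive", "in cultivation", "border intercept"}


def split_for_niche_model(records):
    """Return (native, introduced, unknown) lists; not-in-the-wild records are dropped."""
    native, introduced, unknown = [], [], []
    for rec in records:
        value = rec.get("establishmentMeans", "").strip().lower()
        if value in NOT_IN_THE_WILD:
            continue  # zoo, greenhouse, or quarantine records belong to no range
        if value == "native":
            native.append(rec)
        elif value == "introduced":
            introduced.append(rec)
        else:
            unknown.append(rec)  # blank or unrecognised: left for a human to review
    return native, introduced, unknown


native, introduced, unknown = split_for_niche_model(RECORDS)
print(len(native), len(introduced), len(unknown))
```

Blank values are kept apart rather than discarded, since a missing value says nothing either way about whether the organism was in the wild.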
Just trying to confirm whether my understanding from this thread so far is correct. Is there agreement that dwc:establishmentMeans is a) not meaningful for a point or region in space, or a point in time, or a taxon concept, but only for a taxon concept at a spatial location at a point in time, and b) is a judgment made by someone on the basis of certain evidence?
If yes, doesn't that mean that the property needs to be applied to a tuple of (taxon concept, location, time), and needs to be linked to a source and at least the kind of evidence (such as a vocabulary term) on which the judgement was made? I.e., how much worth is a dwc:establishmentMeans value that fails either of those requirements?
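One way to picture the two requirements is the Python sketch below. It is only an illustration of the question, not a proposed DwC structure: the class name, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EstablishmentAssertion:
    """A judgment about a (taxon concept, location, time) tuple. Hypothetical names."""
    taxon_concept: str
    location: str
    time: str
    establishment_means: str
    source: Optional[str] = None         # who or what made the judgment
    evidence_kind: Optional[str] = None  # ideally a vocabulary term

    def is_supported(self) -> bool:
        # Both the source and the kind of evidence must be present.
        return bool(self.source) and bool(self.evidence_kind)


bare = EstablishmentAssertion(
    "Ailuropoda melanoleuca", "Washington, DC", "2010-10-12", "captive")
cited = EstablishmentAssertion(
    "Ailuropoda melanoleuca", "Washington, DC", "2010-10-12", "captive",
    source="hypothetical zoo inventory", evidence_kind="institutional record")

print(bare.is_supported(), cited.is_supported())
```

A consumer could then decide for itself how much weight to give an assertion for which `is_supported()` is false.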
Apologies if that's all there already.
-hilmar
Sent with a tap.
Hi Hilmar,
I'm not yet convinced that dwc:establishmentMeans is not meaningful for a point (which I interpret as a single organism or small set of organisms collected or observed at a particular place/time). I think a legitimate piece of metadata (i.e., something that data consumers will often want to know) for such a "standard" occurrence record is: "Was this organism born in the general vicinity of its documented capture/observation, or did it arrive from a far-distant locality during its lifetime?" And, if the answer is the latter, then many data consumers may also want to know "Was it brought by humans to its documented place & time, or did it manage to get there by so-called 'natural' means?" Also, if the organism itself was born in the vicinity of its documented capture/observation, it may be useful to know whether its recent ancestors arrived in the vicinity with or without the help of humans.
These seem like esoteric (or even ridiculous) questions, but I think they are, in essence, what people want to know in terms of why establishmentMeans is part of DwC.
Also, as I tried to explain in one of my previous posts, it seems to me that an "Occurrence" ultimately represents "a tuple of (taxon concept, location, time)", where location and time are generally properties of Events, and an Occurrence is essentially the intersection of a Taxon Concept and an Event.
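As a rough illustration of that last point, the Python sketch below models an Event as a (location, time) pair and an Occurrence as one taxon concept paired with one Event. The class and function names are hypothetical and are not DwC class definitions.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    location: str
    time: str


class Occurrence(NamedTuple):
    taxon_concept: str
    event: Event

    def as_tuple(self) -> Tuple[str, str, str]:
        # Flatten to the (taxon concept, location, time) tuple.
        return (self.taxon_concept, self.event.location, self.event.time)


def occurrences_at(taxon_concepts: List[str], event: Event) -> List[Occurrence]:
    """One Event intersected with several taxon concepts gives several Occurrences."""
    return [Occurrence(name, event) for name in taxon_concepts]


blitz = Event(location="41.5, -70.7", time="2010-09-25")
found = occurrences_at(["Examplea prima", "Examplea secunda"], blitz)
for occ in found:
    print(occ.as_tuple())
```

Location and time are stored once on the Event, so every Occurrence derived from it shares them.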
Aloha, Rich
-----Original Message----- From: Hilmar Lapp [mailto:hlapp@nescent.org] Sent: Tuesday, October 12, 2010 1:26 PM To: Richard Pyle Cc: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Just trying to confirm whether my understanding from this thread so far is correct. Is there agreement that dwc:establishmentMeans is a) not meaningful for a point or region in space, or a point in time, or a taxon concept, but only for a taxon concept at a spatial location at a point in time, and b) is a judgment made by someone on the basis of certain evidence?
If yes, doesn't that mean that the property needs to be applied to a tuple of (taxon concept, location, time), and needs to be linked to a source and at least the kind of evidence (such as a vocabulary term) on which the judgement was made? I.e., how much worth is a dwc:establishmentMeans value that fails either of those requirements?
Apologies if that's all there already.
-hilmar
There are several DwC terms that need the company of others to make sense. But this "company" effectively means defining proper classes, something that DwC did not dare to do, as it opens up a can of worms. There are so many ways to model our world, and they all make sense. Nevertheless, these terms are useful when actually defining such classes in application schemas.
The distribution "class" we have defined as a dwc archive extension for taxa makes use of establishment means and puts it into the taxonomic, spatial and temporal context as being discussed:
http://rs.gbif.org/extension/gbif/1.0/distribution.xml
It also adds a life stage qualifier to distinguish between juveniles and adults, for example. And it allows for either a seasonal or a date-range temporal context.
Markus
On Oct 13, 2010, at 4:29, Richard Pyle wrote:
Hi Hilmar,
I'm not yet convinced that dwc:establishmentMeans is not meaningful for a point (which I interpret as a single organism or small set of organisms collected or observed at a particular place/time). I think a legitimate piece of metadata (i.e., something that data consumers will often want to know) for such a "standard" occurrence record is: "Was this organism born in the general vicinity of its documented capture/observation, or did it arrive from a far-distant locality during its lifetime?" And, if the answer is the latter, then many data consumers may also want to know "Was it brought by humans to its documented place & time, or did it manage to get there by so-called 'natural' means?" Also, if the organism itself was born in the vicinity of its documented capture/observation, it may be useful to know whether its recent ancestors arrived in the vicinity with or without the help of humans.
These seem like esoteric (or even ridiculous) questions, but I think they are, in essence, what people want to know in terms of why establishmentMeans is part of DwC.
Also, as I tried to explain in one of my previous posts, it seems to me that an "Occurrence" ultimately represents "a tuple of (taxon concept, location, time)", where location and time are generally properties of Events, and an Occurrence is essentially the intersection of a Taxon Concept and an Event.
Aloha, Rich
-----Original Message----- From: Hilmar Lapp [mailto:hlapp@nescent.org] Sent: Tuesday, October 12, 2010 1:26 PM To: Richard Pyle Cc: Jerry Cooper; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Just trying to confirm whether my understanding from this thread so far is correct. Is there agreement that dwc:establishmentMeans is a) not meaningful for a point or region in space, or a point in time, or a taxon concept, but only for a taxon concept at a spatial location at a point in time, and b) a judgment made by someone on the basis of certain evidence?
If yes, doesn't that mean that the property needs to be applied to a tuple of (taxon concept, location, time), and needs to be linked to a source and at least the kind of evidence (such as a vocabulary term) on which the judgment was made? I.e., how much is a dwc:establishmentMeans value worth if it fails either of those requirements?
Apologies if that's all there already.
-hilmar
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Your human-centered distinctions will likely prove to be useful in certain contexts (quarantine; invasive species watches), but think of birds or insects blown or carried into areas by meteorological events that survive and become established for a season or more, or as curiosities of observation (Ross's gull in Massachusetts). Range extension is important in climate change research and may have nothing to do with direct human activities outlined below.
Cheers! Gail
On Oct 11, 2010, at 9:26 PM, Richard Pyle wrote:
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced and the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities from things that do not seem to "take" in a novel environment. Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
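A minimal sketch of those four categories as a machine-checkable vocabulary follows. The enum, the lowercase labels, and the function names are assumptions made for illustration; Darwin Core did not define such a vocabulary at the time of this thread. A missing value (`None`) is kept distinct from the explicit "Cryptogenic" assertion:

```python
from enum import Enum
from typing import Optional


class EstablishmentMeans(Enum):
    """The four broad categories suggested above (labels are illustrative)."""
    NATIVE = "native"            # there without any assistance from humans
    INTRODUCED = "introduced"    # assisted by humans, living in the natural environment
    CAPTIVE = "captive"          # brought by humans and still held in captivity
    CRYPTOGENIC = "cryptogenic"  # asserted: we do not know which category applies


def parse_establishment_means(raw: Optional[str]) -> Optional[EstablishmentMeans]:
    """Map a free-text value onto the vocabulary.

    Returns None when nothing was supplied, which is deliberately different
    from CRYPTOGENIC. Raises ValueError for text outside the vocabulary.
    """
    if raw is None or not raw.strip():
        return None
    return EstablishmentMeans(raw.strip().lower())


def usable_for_distribution(value: Optional[EstablishmentMeans]) -> bool:
    """One possible filter: drop only records explicitly marked as captive."""
    return value is not EstablishmentMeans.CAPTIVE


print(parse_establishment_means(" Captive "))
print(usable_for_distribution(parse_establishment_means("")))
```

Whether unknown and cryptogenic records should pass the filter is a policy choice; this sketch keeps them, on the view that only an explicit "captive" assertion justifies exclusion.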
Aloha, Rich
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John.
This is useful, but completely uncontrolled - effectively a verbatimEstablishmentMeans. Having a more controlled version, or a simple flag that could be machine-processable in those cases where providers can supply it, would be useful.
Donald
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote:
Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
1. Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well-used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC, as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
2. DwC:scientificName might be more user-friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for Flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or with no rank at all. And once we have a scientific name, higher ranks can be inferred.
3. Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
4. We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include it in order to capture data from a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
5. There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
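To make the "simple csv file" point concrete, here is a minimal sketch of a CSV whose column headers are Darwin Core term names, with geo:lat and geo:long copied into the DwC coordinate terms and an extra column added on demand. The helper names `to_dwc_row` and `write_dwc_csv`, the default datum string "WGS84", and the sample record are all assumptions made for illustration; only the term names come from DwC and the geo namespace.

```python
import csv
import io

# Column headers are Darwin Core term names; a plain CSV with these headers
# is a legitimate Darwin Core representation.
DWC_FIELDS = ["scientificName", "decimalLatitude", "decimalLongitude", "geodeticDatum"]


def to_dwc_row(record: dict, datum: str = "WGS84") -> dict:
    """Copy geo:lat / geo:long into the DwC coordinate terms.

    The default datum is an assumption that holds only when every coordinate
    came from a GPS reading, as it did at the bioblitz.
    """
    return {
        "scientificName": record["scientificName"],
        "decimalLatitude": record["geo:lat"],
        "decimalLongitude": record["geo:long"],
        "geodeticDatum": datum,
    }


def write_dwc_csv(records: list, extra_fields: tuple = ()) -> str:
    """Serialize records as CSV text; extra_fields adds columns on demand."""
    fieldnames = DWC_FIELDS + list(extra_fields)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = to_dwc_row(record)
        # Carry over any requested extra column, such as basisOfRecord.
        for field in extra_fields:
            row[field] = record.get(field, "")
        writer.writerow(row)
    return buffer.getvalue()


# Made-up observation for illustration only.
sample = [{"scientificName": "Larus", "geo:lat": 41.5, "geo:long": -70.7,
           "basisOfRecord": "HumanObservation"}]
print(write_dwc_csv(sample, extra_fields=("basisOfRecord",)))
```

Passing `extra_fields` mirrors what the transcriber did with basisOfRecord: the schema grows by one column without any change to the existing records.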
Happy Thanksgiving to all in Canada - Joel.
----
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
Hi Gail,
Yes, and this gets back to Donald's original point, and one I've wrestled with for a long time -- which boils down to, "what, exactly, are we interested in knowing"?
In my experience, most people seem most concerned with whether an organism got to where it was captured/observed/photographed/surveyed/monitored/whatever with the aid of humans, or without. I think the reason for this is that most people distinguish between "natural" and "artificial" on the basis of how much control humans had over it.
After that, the next most important question tends to be whether it's an anomalous occurrence, or a locally reproducing population. This is where we get into waifs & vagrants (I see the examples you gave as subcategories of "Native", in my hyper-simplified vocabulary in my earlier post); and from there, it gets into shades of grey on how "established" the population is (e.g., does it achieve the threshold of "naturalized"?)
Unfortunately, these two things (how it got there, vs. whether it's locally reproducing) are really different metrics, yet people often try to combine them into the same category.
From my read of http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans, both metrics are evident. In a sense, the presence of the word "established" might seem to restrict it to only locally-reproducing populations. However, one could also interpret the word "establish" in the context of an individual organism (e.g., how did this individual Ross's Gull get established in Massachusetts?)
The GBIF vocabulary that Dave sent the link for also seems to mix the two metrics.
And there are other considerations I've seen in databases as well (including our own): such as the distinction between "adventive" and "intentional" introductions. Also, the word "invasive" (one of the examples on the DwC page for establishmentMeans) suggests another angle, implying some sort of "harm" done to the ecosystems or to human interests such as agriculture.
So again, I ask: "What, exactly, are we interested in knowing?" If I read Donald correctly, he simply wants to filter out occurrence records of Pandas in Washington DC. But Niche Modellers might come from a different angle, and seek evidence of whether or not an organism is capable of sustaining a locally reproducing population in some particular place (regardless of whether the founders of the population were brought by humans, or got there on their own).
If we can answer that question, then the next question will be whether we can pack all the answers into a single controlled vocabulary for "establishmentMeans", or whether we may need to think about another DwC term, with a slightly different meaning.
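One way to picture the second option is a sketch that keeps the two metrics in separate fields, so that each consumer filters on the one it cares about. Every name and label below is hypothetical; none of them is a Darwin Core term, and the category lists are deliberately minimal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ArrivalMeans(Enum):
    """How the organism (or the founders of its population) got there."""
    UNAIDED = "unaided"          # without the aid of humans
    HUMAN_AIDED = "human-aided"  # brought by humans, now living in the wild
    CAPTIVE = "captive"          # brought by humans and still held in captivity


class PopulationStatus(Enum):
    """Whether the record reflects a locally reproducing population."""
    VAGRANT = "vagrant"          # anomalous occurrence (waif or vagrant)
    REPRODUCING = "reproducing"  # locally reproducing population


@dataclass
class OccurrenceStatus:
    arrival: Optional[ArrivalMeans] = None          # None: not recorded
    population: Optional[PopulationStatus] = None   # None: not recorded


def keep_for_natural_occurrence(status: OccurrenceStatus) -> bool:
    """Filter out captive records, such as zoo animals."""
    return status.arrival is not ArrivalMeans.CAPTIVE


def keep_for_niche_model(status: OccurrenceStatus) -> bool:
    """Keep wild, locally reproducing populations, however they were founded."""
    return (status.population is PopulationStatus.REPRODUCING
            and status.arrival is not ArrivalMeans.CAPTIVE)


zoo_animal = OccurrenceStatus(ArrivalMeans.CAPTIVE, None)
storm_blown_bird = OccurrenceStatus(ArrivalMeans.UNAIDED, PopulationStatus.VAGRANT)
naturalized = OccurrenceStatus(ArrivalMeans.HUMAN_AIDED, PopulationStatus.REPRODUCING)
for status in (zoo_animal, storm_blown_bird, naturalized):
    print(keep_for_natural_occurrence(status), keep_for_niche_model(status))
```

With a single combined vocabulary, the storm-blown bird and the naturalized population would each need a term that encodes both answers at once; two fields avoid that multiplication of terms.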
Aloha, Rich
An earlier version of the draft DwC had a simple flag, ValidDistribution (true/false). This is perhaps the flag that Donald was looking for; it morphed into the more complex establishmentMeans to give various stakeholders the chance to figure out more precisely, as you say, "what, exactly, are we interested in knowing?" John's current version is more flexible and suggests a controlled vocabulary, though one has not yet been worked out. ValidDistribution = false would capture the pandas in DC and the Ross's gull (http://www.allaboutbirds.org/guide/Rosss_Gull/id) in Massachusetts at a particular point in time. It would not, however, adequately capture the extensive escapes and the accidental and purposeful introductions of non-native species that become established and are later considered to be ValidDistribution = true (apparently this might also now be the case for Ross's gull, but I remember the big hooha in 1975!). When is this line considered to be crossed? This is likely why John, in his wisdom, decided to deprecate ValidDistribution in favor of something with a broader possible interpretation.
Thanks for your discussion below.
Gail
On Oct 11, 2010, at 11:00 PM, Richard Pyle wrote:
Hi Gail,
Yes, and this gets back to Donald's original point, and one I've wrestled with for a long time -- which boils down to, "what, exactly, are we interested in knowing"?
In my experience, most people seem most concerned with whether an organism got to where it was captured/observed/photographed/surveyed/monitored/whatever with the aid of humans, or without. I think the reason for this is that most people distinguish between "natural" and "artificial" on the basis of how much control humans had over it.
After that, the next most important question tends to be whether it's an anomalous occurrence, or a locally reproducing population. This is where we get into waifs & vagrants (I see the examples you gave as subcategories of "Native", in my hyper-simplified vocabulary in my earlier post); and from there, it gets into shades of grey on how "established" the population is (e.g., does it achieve the threshold of "naturalized"?)
Unfortunately, these two things (how it got there, vs. whether it's locally reproducing) are really different metrics, yet people often try to combine them into the same category.
From my read of http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans, both metrics are evident. In a sense, the presence of the word "established" might seem to restrict it to only locally-reproducing populations. However, one could also interpret the word "establish" in the context of an individual organism (e.g., how did this individual Ross's Gull get established in Massachusetts?)
The GBIF vocabulary that Dave sent the link for also seems to mix the two metrics.
And there are other considerations I've seen in databases as well (including our own): such as the distinction between "adventive" and "intentional" introductions. Also, the word "invasive" (one of the examples on the DwC page for establishmentMeans) suggests another angle, implying some sort of "harm" done to the ecosystems or to human interests such as agriculture.
So again, I ask: "What, exactly, are we interested in knowing?" If I read Donald correctly, he simply wants to filter out occurrence records of Pandas in Washington DC. But Niche Modellers might come from a different angle, and seek evidence of whether or not an organism is capable of sustaining a locally reproducing population in some particular place (regardless of whether the founders of the population were brought by humans, or got there on their own).
If we can answer that question, then the next question will be whether we can pack all the answers into a single controlled vocabulary for "establishmentMeans", or whether we may need to think about another DwC term, with a slightly different meaning.
Aloha, Rich
From: Gail Kampmeier [mailto:gkamp@illinois.edu] Sent: Monday, October 11, 2010 4:41 PM To: Richard Pyle Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Your human-centered distinctions will likely prove to be useful in certain contexts (quarantine; invasive species watches), but think of birds or insects blown or carried into areas by meteorological events that survive and become established for a season or more, or as curiosities of observation (Ross's gull in Massachusetts). Range extension is important in climate change research and may have nothing to do with direct human activities outlined below.
Cheers! Gail
tdwg-content mailing list tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
The GBIF vocabulary server has a draft "Nativeness" vocabulary that contains a list of these and other terms representing a useful controlled list for dwc:establishmentMeans.
http://vocabularies.gbif.org/vocabularies/nativeness
This might serve as a platform for follow-on development. I could assist in providing administrative access to this vocabulary. This vocabulary development platform is under evaluation and we will be moving forward on vocabulary development and integration within the GBIF network.
David Remsen
On Oct 12, 2010, at 11:26 AM, Richard Pyle wrote:
I certainly agree it's important! I was just saying that a simple flag probably wouldn't be enough. I like the idea of a controlled vocabulary (as you and John both allude to), and I can imagine about a half-dozen terms that our community will no doubt adopt with almost no debate..... :-)
In my mind, the broadest categories (and likely most useful) would be something like:
Native (was there without any assistance from humans)
Introduced (got there with the assistance of humans, but is inhabiting the natural environment)
Captive (brought by humans and still maintained in captivity)
You might also throw in "Cryptogenic", which is an assertion that we do not know which of these categories a particular organism falls into (not the same as null, which means we don't know whether or not we know).
Of course, each of these can be further subdivided, but the more we subdivide, the greater the ratio of fuzzy:clean distinctions. I would say that the terms should be established in consultation with those most likely to use them (e.g., as you suggest, distribution analysis, niche modellers, etc.). For example, it might be useful to distinguish between an organism that was itself introduced and the progeny (or a well-established population) of an introduced organism. This information can be useful for separating things likely to become established in new localities from things that do not seem to "take" in a novel environment. Anyway...I didn't want to say a lot on this topic (too late?); I just wanted to steer more towards a controlled vocabulary than a simple flag field.
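As a rough illustration of how such a controlled vocabulary could be consumed by software, here is a minimal Python sketch. The four values are the broad categories listed above; the class name, the helper functions, the record layout, and the sample records are all assumptions made for the example, not part of Darwin Core or of the GBIF vocabulary.

```python
from enum import Enum
from typing import Optional


class EstablishmentMeans(Enum):
    """Hypothetical controlled values for dwc:establishmentMeans."""
    NATIVE = "native"            # there without human assistance
    INTRODUCED = "introduced"    # human-assisted, living in the natural environment
    CAPTIVE = "captive"          # brought by humans, still held in captivity
    CRYPTOGENIC = "cryptogenic"  # explicit assertion that the category is unknown


def parse_establishment_means(raw: Optional[str]) -> Optional[EstablishmentMeans]:
    """Return the matching term, or None when the field is empty (null).

    None means nobody has made an assertion; CRYPTOGENIC is an assertion
    that the category is not known. Unrecognised text raises ValueError.
    """
    if raw is None or not raw.strip():
        return None
    return EstablishmentMeans(raw.strip().lower())


def usable_for_distribution(record: dict) -> bool:
    """One possible filter: drop only records asserted to be captive."""
    term = parse_establishment_means(record.get("establishmentMeans"))
    return term is not EstablishmentMeans.CAPTIVE


# Made-up records for illustration only.
records = [
    {"scientificName": "Ailuropoda melanoleuca", "establishmentMeans": "Captive"},
    {"scientificName": "Rhodostethia rosea", "establishmentMeans": "native"},
    {"scientificName": "Passer domesticus", "establishmentMeans": ""},
]
kept = [r["scientificName"] for r in records if usable_for_distribution(r)]
print(kept)
```

Whether null and "Cryptogenic" records should pass such a filter is exactly the kind of decision that would need input from the people doing distribution analysis; the sketch keeps both.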
Aloha, Rich
From: Donald.Hobern@csiro.au [mailto:Donald.Hobern@csiro.au] Sent: Monday, October 11, 2010 3:44 PM To: Richard Pyle; tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
Hi Rich.
I recognise this (and could probably define many different useful flags). The bottom line is really whether or not the location is one which should be used for distribution analysis, niche modelling and similar activities. There will certainly be many grey areas, but it would be good if software could weed out captive occurrences.
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
From: Richard Pyle [mailto:deepreef@bishopmuseum.org] Sent: Tuesday, 12 October 2010 12:33 PM To: Hobern, Donald (CES, Black Mountain); tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
I'm not so sure a simple flag will do it. We have examples ranging from animals in zoos, to escaped animals, to intentionally and unintentionally introduced populations, to naturalized populations -- and just about everything in-between. Where on this spectrum would you draw the line for flagging something as "naturally occurring"?
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Donald.Hobern@csiro.au Sent: Monday, October 11, 2010 2:59 PM To: tuco@berkeley.edu Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Thanks, John.
This is useful, but completely uncontrolled – effectively a verbatimEstablishmentMeans. Having a more controlled version or a simple flag which could be machine-processible in those cases where providers can supply it would be useful.
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek Sent: Tuesday, 12 October 2010 11:34 AM To: Hobern, Donald (CES, Black Mountain) Cc: jsachs@csee.umbc.edu; tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
Natural occurrence is meant to be captured through the term dwc:establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
On Mon, Oct 11, 2010 at 5:16 PM, Donald.Hobern@csiro.au wrote: Thanks, Joel.
Nice summary. One addition which we do need to resolve (and which has been suggested in recent months) is to have a flag to indicate whether a record should be considered to show a "natural" occurrence (in distinction from cultivation, botanic gardens, zoos, etc.). This is not so much an issue in a BioBlitz, but is certainly a factor with citizen science recording in general - see the number of zoo animals in the Flickr EOL group.
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
-----Original Message----- From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of joel sachs Sent: Monday, 11 October 2010 10:47 PM To: tdwg-bioblitz@googlegroups.com; tdwg-content@lists.tdwg.org Subject: [tdwg-content] What I learned at the TechnoBioBlitz
One of the goals of the recent bioblitz was to think about the suitability and appropriateness of TDWG standards for citizen science. Robert Stevenson has volunteered to take the lead on preparing a technobioblitz lessons learned document, and though the scope of this document is not yet determined, I think the audience will include bioblitz organizers, software developers, and TDWG as a whole. I hope no one is shy about sharing lessons they think they learned, or suggestions that they have. We can use the bioblitz google group for this discussion, and copy in tdwg-content when our discussion is standards-specific.
Here are some of my immediate observations:
- Darwin Core is almost exactly right for citizen science. However, there is a desperate need for examples and templates of its use. To illustrate this need: one of the developers spoke of the design choice between "a simple csv file and a Darwin Core record". But a simple csv file is a legitimate representation of Darwin Core! To be fair to the developer, such a sentence might not have struck me as absurd a year ago, before Remsen said "let's use DwC for the bioblitz".
We provided a couple of example DwC records (text and rdf) in the bioblitz data profile [1]. I think the lessons learned document should include an on-line catalog of cut-and-pasteable examples covering a variety of use cases, together with a dead simple description of DwC, something like "Darwin Core is a collection of terms, together with definitions."
Here are areas where we augmented or diverged from DwC in the bioblitz:
i. We added obs:observedBy [2], since there is no equivalent property in DwC, and it's important in Citizen Science (though often not available).
ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude and longitude. The geo namespace is a well used and supported standard, and records with geo coordinates are automatically mapped by several applications. Since everyone was using GPS to retrieve their coordinates, we were able to assume WGS-84 as the datum.
If someone had used another datum, say XYZ, we would have added columns to the Fusion table so that they could have expressed their coordinates in DwC as, e.g.:
DwC:decimalLatitude=41.5
DwC:decimalLongitude=-70.7
DwC:geodeticDatum=XYZ
(I would argue that it should be kosher DwC to express the above as simply XYZ:lat and XYZ:long. DwC already incorporates terms from other namespaces, such as Dublin Core, so there is precedent for this.)
- DwC:scientificName might be more user friendly than taxonomy:binomial and the other taxonomy machine tags EOL uses for flickr images. If DwC:scientificName isn't self-explanatory enough, a user can look it up, and see that any scientific name is acceptable, at any taxonomic rank, or not having any rank. And once we have a scientific name, higher ranks can be inferred.
- Catalogue of Life was an important part of the workflow, but we had some problems with it. Future bioblitzes might consider using something like a CoL fork, as recently described by Rod Page [4].
- We didn't include "basisOfRecord" in the original data profile, and so it wasn't a column in the Fusion Table [5]. But when a transcriber felt it was necessary to include it in order to capture data in a particular field sheet, she just added the column to the table. This flexibility of schema is important, and is in harmony with the semantic web.
- There seemed to be enthusiasm for another field event at next year's TDWG. This could be an opportunity to gather other types of data (e.g. character data) and thereby i) expose meeting participants to another set of everyday problems from the world of biodiversity workflows, and ii) try other TDWG technology on for size, e.g. the observation exchange format, annotation framework, etc.
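The point that a simple csv file is a legitimate representation of Darwin Core can be sketched in a few lines of Python. This is only an illustration: the column headers are Darwin Core term names and the coordinates are the ones from the datum example above, but the species, the choice of columns, and the variable names are assumptions made for the example rather than the bioblitz profile itself.

```python
import csv
import io

# Column headers are Darwin Core term names; the row values are made up.
FIELDS = [
    "scientificName",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "basisOfRecord",
]

rows = [
    {
        "scientificName": "Danaus plexippus",
        "decimalLatitude": "41.5",
        "decimalLongitude": "-70.7",
        "geodeticDatum": "WGS84",
        "basisOfRecord": "HumanObservation",
    },
]

# Write the records as plain CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back: each row is a dict keyed by Darwin Core term name.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Adding a column later, as the transcriber did with basisOfRecord, amounts to adding one more name to the header row.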
Happy Thanksgiving to all in Canada - Joel.
1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
2. Slightly bastardizing our old observation ontology - http://spire.umbc.edu/ontologies/Observation.owl
3. http://www.w3.org/2003/01/geo/
4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
5. http://tables.googlelabs.com/DataSource?dsrcid=248798
This conversation about values for basisOfRecord, establishmentMeans, and the nature of what actually constitutes a dwc:Occurrence is very important. We have sitting on the table before us several official requests for additions and modifications to Darwin Core (http://code.google.com/p/darwincore/issues/detail?id=68, http://code.google.com/p/darwincore/issues/detail?id=69, http://code.google.com/p/darwincore/issues/detail?id=80, and http://code.google.com/p/darwincore/issues/detail?id=81) that cannot and should not be decided until this discussion occurs. In particular, a discussion of what exactly a dwc:Occurrence is lies at the heart of much of what we are discussing in this thread and is critical to other processes that are moving forward, such as guidelines for how we represent things in RDF. On this list I requested discussion of this suite of topics when I proposed the Darwin Core modifications, and I asked members of the TAG to have this discussion at the TDWG meeting. It didn't happen in either place, so I'm glad it's happening here now.
Roger has correctly noted that we colloquially talk about Occurrences in two ways that are fundamentally different. We use Occurrence (1) to mean that a species occurs generically at a particular locality (the "checklist" use), and (2) we talk about particular instances of particular individual organisms being noticed at a particular place at a particular time. Based on the clarification that John Wieczorek gave in the thread that surrounds http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000280.html, an Occurrence record simply asserts that an organism was someplace at a certain time (and doesn't imply any fitness of use such as for documenting distributions). This is consistent with meaning (2). I think that the "checklist" use (meaning 1) really should be called something else because it is conceptually something very different.
Assuming that when we talk about a dwc:Occurrence we intend meaning (2), it is important to clarify what aspect of an organism occurring somewhere at some time we intend for dwc:Occurrence to mean. When people talk about Occurrences, the conversation often goes awry because people are considering an Occurrence to include more or fewer conceptual entities. I don't know if images can be embedded in messages sent to the list, so look at this image: http://bioimages.vanderbilt.edu/pages/resource-diagram.gif before reading further. In that diagram, I'm trying to be as generic as possible. I think it is the intention of both TDWG and GBIF to go beyond thinking that Occurrences can only be specimens. So consider that this generic Occurrence could be a PreservedSpecimen, but could also be an image of an organism, DNA sample, or any other token of the presence of the Organism at a particular time and place (or a HumanObservation that has no token at all). I have heard people say that an Occurrence is a dctype:Event. That recognizes the arrow on the left side of the diagram which represents the time and place of the Occurrence. I have heard people say that if we photograph an organism, that is an "observation" with associated media. That recognizes the collected metadata (i.e. the "observation") part and the representation of the organism part (the photograph). When we talk about a PreservedSpecimen being an Occurrence, we probably intend the metadata as well as the physical thing in a jar or glued to a sheet of paper (the representation of the organism) and may or may not include the arrow on the left. I have taken the position that an Occurrence includes all of the components shown in the diagram. I'm not saying that this is the correct or only view on this subject, but if somebody intends for an Occurrence to mean something else, then they need to be clear about which component(s) of the diagram they are talking about.
Being conceptually clear about these things is important because that clarity informs the decision-making process about the pending issues that I mentioned, such as whether DigitalStillImage should be added as a DwC type (and hence have a URI and be an accepted value for dwc:basisOfRecord) and how we should structure RDF when we try to describe the properties of an Occurrence. If by "basisOfRecord" we mean a representation or token on which the Occurrence is based (or lack of token in the case of observations), then we should add as DwC types any type of physical or digital artifact that will be used by several people to document that an Occurrence existed at some point. It would not make logical sense to say that sometimes the basisOfRecord can be an artifact like a specimen, but other supporting artifacts such as digital images cannot and must be relegated to being associatedMedia.
I am not going to say more on this topic right now, partly because I have mid-semester progress reports to finish by the end of the day, but mostly because I wrote a paper discussing these issues and it lays out the conceptual framework I'm talking about better than I can in an email. I have cited that paper both in my requests for the Darwin Core changes and in previous emails to this list. However, based on the various emails that have been flying around, I don't think many people on the list have read it. That paper isn't a spur of the moment rant. I spent over a year writing it, solicited and received comments about it from a number of people including several people on the TAG, and went through the peer review process for several months before it was finally published this spring. It does not necessarily represent "the correct" view on the topics that we are discussing, but I believe that it does represent a logically consistent way of conceptualizing Occurrences and how a broad range of types of Occurrences can be described and related to other resources. If others can present clear and consistent alternatives to the framework that I've suggested, I would like to hear what they are. The article, Biodiversity Informatics 7:14-44, can be accessed at https://journals.ku.edu/index.php/jbi/article/view/3664 . In particular, take note of the discussion on p. 27-28 regarding the criterion for determining whether an Occurrence documents a species' distribution, p. 28 where I discuss the difference between the use of dwc:recordedBy and dcterms:created, and p. 29 where I suggest controlled values for dwc:establishmentMeans that can be used for differentiating the extent to which an individual documented by an Occurrence occurs "naturally" at its location (native, naturalized, adventive, or cultivated; these are intended to apply to either plants or animals, so a farm or zoo animal would be considered "cultivated"). I would be happy to define and propose these as a controlled vocabulary. These are all things that have come up in this thread. I also should note that I have been successfully applying this framework to live plant images at http://bioimages.vanderbilt.edu where I serve RDF that is consistent with the design discussed in the paper.
I would like to say more about the relationship between LivingSpecimens, Individuals, establishmentMeans, and indicating whether an Occurrence documents a species' distribution, but that will have to wait until later.
Steve Baskauf
Damn! I wish I'd read this before writing that massive reply to Roger (note: I'm trying to move this into the new thread with the new subject line).
I agree with most of what Steve wrote, but I still disagree (as I did with Roger) that the distinction between Steve's two meanings of "Occurrence" is so stark. I agree there is a fundamental distinction from the perspective of data management between:
TaxonConcept<-->Location
And
TaxonConcept<-->Occurrence<-->Event<-->Location
My contention, however, is that "TaxonConcept<-->Location" is often (usually? always?) just a short-hand (scant metadata) way of representing "TaxonConcept<-->Occurrence<-->Event<-->Location". Our domain (biodiversity information) is full of these overloaded short-hand terms, and they're often not easy to detect as such (e.g., so many databases simply represent implied taxon concepts as text-string scientific names).
In my mind, the "essence" of an Occurrence is, ultimately, "organism(s) at place and time". The "place and time" part is represented as a dwc:Event class linked to a dcterms:Location class. The tricky part is what we mean by "organism(s)". I suspect most would agree that an individual bird falls within scope of "organism(s)" in the case of dwc Occurrence. I further suspect that most would agree that a flock of birds also falls within scope.
But what about a population of birds? No? What is a population, other than a set of individual organisms? How is this different from a "flock" (a smaller set of individual organisms)?
And what about a taxon concept? No? What is a taxon concept, other than a (larger) set of individual organisms?
The fact is, there is a smooth continuum spanning:
IndividualOrganism<-->Event
GroupOfIndividualOrganisms<-->Event
PopulationOfIndividualOrganisms<-->Event
TaxonConcept<-->Event
Each of the four items above has overlapping scope with adjacent items in the list (the overlap between the first two is evident in colonial organisms).
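One way to picture the continuum is as a single Occurrence shape in which only the scope of "organism(s)" varies. The Python sketch below is purely illustrative: the class names, the OrganismScope values, and the example bird records are assumptions made up here, not Darwin Core classes or anything proposed in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class OrganismScope(Enum):
    """Hypothetical labels for the four points on the continuum."""
    INDIVIDUAL = 1
    GROUP = 2
    POPULATION = 3
    TAXON_CONCEPT = 4


@dataclass(frozen=True)
class Location:
    name: str


@dataclass(frozen=True)
class Event:
    location: Location
    when: str  # free-text time window, kept deliberately loose


@dataclass(frozen=True)
class Occurrence:
    organisms: str        # what was present, named at whatever scope applies
    scope: OrganismScope
    event: Event


def shorthand(occ):
    """Collapse organisms<-->Occurrence<-->Event<-->Location to (organisms, place)."""
    return (occ.organisms, occ.event.location.name)


# Made-up examples: the same structure holds a single bird or a whole taxon.
site = Location("Example Pond")
one_bird = Occurrence("Sterna hirundo", OrganismScope.INDIVIDUAL, Event(site, "2010-09-25"))
whole_taxon = Occurrence("Sterna hirundo", OrganismScope.TAXON_CONCEPT, Event(site, "2000/2010"))

print(shorthand(one_bird))
print(shorthand(whole_taxon))
```

Both records collapse to the same short-hand pair, which is the sense in which a "TaxonConcept<-->Location" statement can hide a fuller Occurrence with scant metadata.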
Steve, many thanks for sending the link to your paper. I apologize that I have not read it yet (I often don't have time to stay on top of this list, so I missed your earlier reference to it), but I will. Just be aware that while my contributions to this thread are not in published/peer-reviewed form, they are nevertheless the result of more than two decades of dealing with biodiversity datasets, and very careful thinking and reasoning (i.e., as much as it may seem otherwise, these are much more than spur-of-the-moment rants).
Aloha, Rich
-----Original Message-----
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Steve Baskauf
Sent: Tuesday, October 12, 2010 6:37 AM
To: joel sachs
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
This conversation about values for basisOfRecord, establishmentMeans, and the nature of what actually constitutes a dwc:Occurrence is very important. We have sitting on the table before us several official requests for additions and modifications to Darwin Core: http://code.google.com/p/darwincore/issues/detail?id=68 http://code.google.com/p/darwincore/issues/detail?id=69 http://code.google.com/p/darwincore/issues/detail?id=80 and http://code.google.com/p/darwincore/issues/detail?id=81 that cannot and should not be decided until this discussion occurs. In particular, a discussion of what exactly a dwc:Occurrence is lies at the heart of much of what we are discussing in this thread and is critical to other processes that are moving forward, such as guidelines for how we represent things in RDF. On this list I requested discussion on this suite of topics when I proposed the Darwin Core modifications, and I requested to members of the TAG that this discussion happen at the TDWG meeting. It didn't happen either place, so I'm glad it's happening here now.
Roger has correctly noted that we colloquially talk about Occurrences in two ways that are fundamentally different. We use Occurrence (1) to mean that a species occurs generically at a particular locality (the "checklist" use), and (2) we talk about particular instances of particular individual organisms being noticed at a particular place at a particular time. Based on the clarification that John Wieczorek gave in the thread that surrounds http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000280.html, an Occurrence record simply asserts that an organism was someplace at a certain time (and doesn't imply any fitness of use such as for documenting distributions). This is consistent with meaning (2). I think that the "checklist" use (meaning 1) really should be called something else because it is conceptually something very different.
Assuming that when we talk about a dwc:Occurrence we intend meaning (2), it is important to clarify what aspect of an organism occurring somewhere at some time we intend for dwc:Occurrence to mean. When people talk about Occurrences, the conversation often goes awry because people are considering an Occurrence to include more or fewer conceptual entities. I don't know if images can be embedded in messages sent to the list, so look at this image: http://bioimages.vanderbilt.edu/pages/resource-diagram.gif before reading further. In that diagram, I'm trying to be as generic as possible. I think it is the intention of both TDWG and GBIF to go beyond thinking that Occurrences can only be specimens. So consider that this generic Occurrence could be a PreservedSpecimen, but could also be an image of an organism, DNA sample, or any other token of the presence of the Organism at a particular time and place (or a HumanObservation that has no token at all). I have heard people say that an Occurrence is a dctype:Event. That recognizes the arrow on the left side of the diagram which represents the time and place of the Occurrence. I have heard people say that if we photograph an organism, that is an "observation" with associated media. That recognizes the collected metadata (i.e. the "observation") part and the representation of the organism part (the photograph). When we talk about a PreservedSpecimen being an Occurrence, we probably intend the metadata as well as the physical thing in a jar or glued to a sheet of paper (the representation of the organism) and may or may not include the arrow on the left. I have taken the position that an Occurrence includes all of the components shown in the diagram. I'm not saying that this is the correct or only view on this subject, but if somebody intends for an Occurrence to mean something else, then they need to be clear about which component(s) of the diagram they are talking about.
Being conceptually clear about these things is important because that clarity informs the decision-making process about the pending issues that I mentioned, such as whether DigitalStillImage should be added as a DwC type (and hence have a URI and be an accepted value for dwc:basisOfRecord) and how we should structure RDF when we try to describe the properties of an Occurrence. If by "basisOfRecord" we mean a representation or token on which the Occurrence is based (or lack of token in the case of observations), then we should add as DwC types any type of physical or digital artifact that will be used by several people to document that an Occurrence existed at some point. It would not make logical sense to say that sometimes the basisOfRecord can be an artifact like a specimen, but other supporting artifacts such as digital images cannot and must be relegated to being associatedMedia.
I am not going to say more on this topic right now, partly because I have mid-semester progress reports to finish by the end of the day, but mostly because I wrote a paper discussing these issues and it lays out the conceptual framework I'm talking about better than I can in an email. I have cited that paper both in my requests for the Darwin Core changes and in previous emails to this list. However, based on the various emails that have been flying around, I don't think many people on the list have read it. That paper isn't a spur of the moment rant. I spent over a year writing it, solicited and received comments about it from a number of people including several people on the TAG, and went through the peer review process for several months before it was finally published this spring. It does not necessarily represent "the correct" view on the topics that we are discussing, but I believe that it does represent a logically consistent way of conceptualizing Occurrences and how a broad range of types of Occurrences can be described and related to other resources. If others can present clear and consistent alternatives to the framework that I've suggested, I would like to hear what they are. The article, Biodiversity Informatics 7:14-44, can be accessed at https://journals.ku.edu/index.php/jbi/article/view/3664 . In particular, take note of the discussion on p. 27-28 regarding the criterion for determining whether an Occurrence documents a species' distribution, p. 28 where I discuss the difference between the use of dwc:recordedBy and dcterms:created, and p. 29 where I suggest controlled values for dwc:establishmentMeans that can be used for differentiating the extent to which an individual documented by an Occurrence occurs "naturally" at its location (native, naturalized, adventive, or cultivated - intended to apply to either plants or animals; a farm or zoo animal would be considered "cultivated" - I would be happy to define and propose these as a controlled vocabulary). These are all things that have come up in this thread. I also should note that I have been successfully applying this framework to live plant images at http://bioimages.vanderbilt.edu where I serve RDF that is consistent with the design discussed in the paper.
I would like to say more about the relationship between LivingSpecimens, Individuals, establishmentMeans, and indicating whether an Occurrence documents a species' distribution, but that will have to wait until later.
Steve Baskauf
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
Occurrence is admittedly a problematic term. Its current definition is vague, following in the grand tradition of Dublin Core term definitions. Rich's interpretation echoes what Steve wrote and comes closest in my mind to what an occurrence really is meant to be, namely "evidence of one or more organisms occurring at a place and time." This leaves open all of the vast continuum of scales - geographic, temporal, and taxonomic - at which occurrences can be described. I'm not sure exactly what is solved by trying to make named distinctions between different scales or levels of detail (on any of the three axes) of Occurrence. The core of the issue really boils down to fitness-for-use of records and a potential user's capacity to accurately determine that. These should be characteristics that can be determined from the content of the records.
Thanks, John. I agree that there will be little value in trying to define and name distinct units of space and time, but there may be value in defining units along the taxonomic axis. However, we should first come to a community consensus on what the maximum scope of each axis is.
My sense is that the maximum scope of space is "Earth" (at least until we begin documenting populations of extraterrestrial life).
My sense is that the maximum scope of time is effectively "any window of time during the past 4 billion years or so".
But I don't have a clear sense for what the maximum scope of "one or more organisms" ought to be. I'm content with extending it to "populations" as a unit of "organisms", because I see a smooth transition from two individual organisms all the way up to a population of organisms. But should we accept taxonConcept (which can be thought of as an implied set of populations) as an extension of "organisms"? If so, then "Animalia Occurred on Earth sometime during the past 2 billion years" is a legitimate Occurrence record (pretty damn useless...but still legitimate).
I think it matters, and is relevant to this exchange -- both because of Steve's point about more clearly defining what an "Occurrence" can be, and because we still don't have a good idea of how and where to score "nativeness" (for which there is clearly an expressed need).
I agree that fitness-for-use should be determined from the content of the records, but coming back to Donald's (and others') point about filtering "non-native" records, there needs to be a way to include this information in the content of the records in order to determine fitness-for-use. I believe that a controlled vocabulary for establishmentMeans will probably be all we need to satisfy 95% of the user need. But before we can nail down what that controlled vocabulary would encompass, I think we need to come to some sort of consensus on the issues that Steve has articulated.
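As a rough illustration of how a controlled vocabulary for establishmentMeans could support this kind of filtering, here is a minimal Python sketch. The vocabulary values, the function name, and the sample records are all assumptions made up for the example; nothing here reflects an agreed list of terms.

```python
# Hypothetical controlled vocabulary; these values are placeholders,
# not an agreed TDWG list.
ESTABLISHMENT_MEANS = {"native", "introduced", "cultivated", "uncertain"}


def is_fit_for_native_range_use(record):
    """Return True only if the record explicitly says it is native.

    Records with a missing or unrecognised establishmentMeans value are
    rejected, so fitness-for-use is decided from record content alone.
    """
    value = (record.get("establishmentMeans") or "").strip().lower()
    if value not in ESTABLISHMENT_MEANS:
        return False
    return value == "native"


# Made-up records using Darwin Core term names as dictionary keys.
records = [
    {"scientificName": "Quercus alba", "establishmentMeans": "native"},
    {"scientificName": "Ailanthus altissima", "establishmentMeans": "Introduced"},
    {"scientificName": "Acer rubrum"},  # no establishmentMeans given
]

native_only = [r for r in records if is_fit_for_native_range_use(r)]
print([r["scientificName"] for r in native_only])
```

Whether a record with no establishmentMeans should be dropped or kept is itself a fitness-for-use decision; the sketch drops it only to keep the example simple.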
Aloha, Rich
_____
From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
Sent: Tuesday, October 12, 2010 11:29 AM
To: Richard Pyle
Cc: Steve Baskauf; joel sachs; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] How to record "Nativeness"?
Occurrence is admittedly a problematic term. Its current definition is vague, following in the grand tradition of Dublin Core term definitions. Rich's interpretation echoes what Steve wrote and comes closest in my mind to what an occurrence really is meant to be, namely "evidence of one or more organisms occurring at a place and time." This leaves open the whole vast continuum of scales - geographic, temporal, and taxonomic - at which occurrences can be described. I'm not sure exactly what is solved by trying to make named distinctions between different scales or levels of detail (on any of the three axes) of Occurrence. The core of the issue really boils down to fitness-for-use of records and a potential user's capacity to accurately determine that. These should be characteristics that can be determined from the content of the records.
Dear All,
Completely agree with Rich's analysis.
May I coin two new terms? But first, I think we should separate the definition of occurrences from their use, which removes many of the questions raised in previous messages:
Occurrence:
At least a triplet (a taxon name, a location, a time), whatever the precision of each member of the triplet.
The difference lies only in the use we can make of an occurrence, depending on the respective precision of each member of the triplet. Ontologies by definition should reflect the patterns, not the processes (although technically I suppose that processes can be described by ontologies ... but that is an extension of the meaning of the word).
Name:
From "living organism" (extant or fossil) down to infrasubspecific rank if needed. Can it be a common name? Yes; it may decrease the precision or even the accuracy, but that is all.
Location:
From Earth/continent/ocean/catchment down to precise geocoordinates.
Earth is always implicit and by default until we find life out in space.
Time:
From 4.5 billion year range / geological era down to a precise date/time stamp.
Here are the two new terms I propose (and more could be coined in the same way):
Geoccurrence: an occurrence with geocoordinates.
Loccurrence: an occurrence with only a locality/geographic name.
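To make the triplet and the two proposed terms concrete, here is a minimal Python sketch. The class, field, and function names are hypothetical, and the sample values are invented; it only shows that the distinction between a geoccurrence and a loccurrence can be read off the content of the record.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Occurrence:
    """A (name, location, time) triplet; each member may be imprecise."""
    name: str                        # any rank, or even a common name
    time: str                        # from a geological era to a timestamp
    locality: Optional[str] = None   # geographic name, if any
    coordinates: Optional[Tuple[float, float]] = None  # (lat, long), if any


def kind(occ):
    """Classify an occurrence by how its location is expressed."""
    if occ.coordinates is not None:
        return "geoccurrence"        # has geocoordinates
    if occ.locality:
        return "loccurrence"         # only a locality/geographic name
    return "occurrence"              # location is implicitly "Earth"


# Invented example records.
examples = [
    Occurrence("Danaus plexippus", "2010-09-26T10:15", coordinates=(41.53, -70.67)),
    Occurrence("monarch butterfly", "2010-09", locality="Woods Hole"),
    Occurrence("Animalia", "past 2 billion years"),
]
for occ in examples:
    print(occ.name, "->", kind(occ))
```

A record carrying both coordinates and a locality name is treated as a geoccurrence here, since a loccurrence is defined as having only a locality name.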
Should we coin terms for occurrences resulting from modeling?
Another consideration: species distribution modeling is a rationalization of the production of distribution maps, just like cladistics is a rationalization of the production of phylogenies.
For cladistics, in essence we sample individuals in the real genealogical tree (= the tokogenetic tree of Hennig): but can we say that all characters used in cladistics actually lead back to a given individual? That may be true for molecular data, but the statement needs more thought; I don't think it is true for morphology, and the same goes for synthetic descriptions and older works, as Rich described regarding the use of imprecise old records.
Likewise for distribution, we sample individuals, and also make the best use of loccurrences based on imprecise locations (cf. Jeremy Jackson's work on historical records and trends).
As for recording nativeness, I would suggest that it is a general issue for all controlled vocabularies that try to establish categories over a continuum. The only way to get rid of all these problematic definitions, most probably including that of occurrences, is to express them with fuzzy logic: we can say that a species is more or less native, especially if its abundance follows a gradient from a center to peripheral areas. The degree of nativeness could then be derived from species distribution modeling based on geoccurrences and loccurrences expressed as fuzzy functions.
So the next step is to include fuzzy logic in ontologies ;-). And TDWG becoming a fuzzy think tank ;-).
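As a toy illustration of what "more or less native" could look like as a fuzzy function, here is a minimal Python sketch. The function name, the linear shape, and the two distance thresholds are arbitrary assumptions; a real membership function would presumably come out of a species distribution model rather than a fixed distance rule.

```python
def nativeness(distance_km, core_km=100.0, edge_km=500.0):
    """Fuzzy membership in 'native' by distance from a range centre.

    Returns 1.0 inside the core, 0.0 beyond the edge, and a value that
    falls off linearly in between. Thresholds are illustrative only.
    """
    if distance_km <= core_km:
        return 1.0
    if distance_km >= edge_km:
        return 0.0
    return (edge_km - distance_km) / (edge_km - core_km)


for d in (50, 300, 600):
    print(d, "km ->", nativeness(d))
```

With the default thresholds, an occurrence 300 km from the centre gets a membership of 0.5, that is, neither clearly native nor clearly non-native.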
BW
Nicolas.
________________________________
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Wednesday 13 October 2010 07:08
To: tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] How to record "Nativeness"?
participants (23)
- "Markus Döring (GBIF)"
- Arlin Stoltzfus
- Bailly, Nicolas (WorldFish)
- Blum, Stan
- Bryan
- David Remsen (GBIF)
- Donald.Hobern@csiro.au
- Gail Kampmeier
- Gregor Hagedorn
- Hilmar Lapp
- Jerry Cooper
- joel sachs
- John Wieczorek
- k.flanagan@etoncollege.org.uk
- Kevin Richards
- Markus Döring
- Peter DeVries
- Richard Pyle
- Roger Hyam
- Steve Baskauf
- Steve Kelling
- Tim Robertson (GBIF)
- Wouter Addink