Differences in thinking between TDWG and LinkedData groups about data sharing / integration
The paragraph below seems to encapsulate the differences in thinking between the Linked Data community and some of the TDWG people on how best to share biodiversity data.
*"The notion of a fabric of resources that are individually described, queried, and resolved may seem unmanageable or like science fiction. For organizations that are used to large, manual, centralized efforts to standardize on everything, it may seem anarchic to allow resources to grow organically and be described by anyone. The same people would probably not believe the Web possible in the first place if there were not already ample proof of its success."*
REST for Java developers, Part 4: The future is RESTful From http://www.javaworld.com/javaworld/jw-04-2009/jw-04-rest-series-4.html?page=...
I think that some people may have lost sight of the goal of making data available to improve the understanding of our natural world and hopefully better manage our natural resources.
It does not seem that creating a distributed network of fiefdoms will help us achieve this goal.
- Pete
I was led to this article by @janzemanek on Twitter.
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706
"described by anyone" is not the same as "described by anyone in any way convenient to the describer", so I find this quotation somewhat disingenuous. More precisely, I wonder what TDWG standard or proposed standard you find enables fiefdoms *in ways that are impossible under some other solution to the problem the standard addresses*.
Bob Morris
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
Respectfully,
1) Only certain classes of organizations will be able to contribute, since the standard requires special skills: only those groups that can pay for hardware, and for a person dedicated to this standard in perpetuity. I look at this and think that a number of groups that could be providers cannot be, because of the way the system is implemented. Why not have a simple RDF tar or zip file format that GBIF checks with a crawler every night?
2) There is very little reuse of existing vocabularies, geo for instance. This is similar to the "not invented here" mentality.
3) Discussions and decisions seem to be too much about making sure that providers keep their "brand" on the data even if they disappear.
4) Suggestions or alternative ways of thinking are rejected until an insider restates them without attribution.
5) It is not at all clear how some of these decisions are made. It appears that some people disagree and there is discussion; then years later there is the same discussion. It seems that some smaller group keeps pulling everyone back to the same architectural decision.
6) Where are the example data sets? We should have some example data sets available to see if the standard can be used to answer real questions. Either they don't exist or they are only available to a few.
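As a rough illustration of the nightly-crawler idea in point 1, here is a minimal Python sketch. It assumes each provider publishes a single zip archive of RDF files. The function names, the file suffixes, and the use of a SHA-256 digest to detect a changed archive are all assumptions made for illustration, not part of any TDWG or GBIF specification. Downloading the archive is deliberately left out; the sample archive is built in memory.

```python
import hashlib
import io
import zipfile
from typing import Dict, Optional, Tuple

# Hypothetical suffixes treated as RDF serializations in this sketch.
RDF_SUFFIXES = (".rdf", ".ttl", ".nt")


def digest(payload: bytes) -> str:
    """Return a SHA-256 hex digest used to detect a changed archive."""
    return hashlib.sha256(payload).hexdigest()


def extract_rdf(payload: bytes) -> Dict[str, bytes]:
    """Return {member name: contents} for the RDF files inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if name.lower().endswith(RDF_SUFFIXES)
        }


def nightly_check(
    payload: bytes, last_digest: Optional[str]
) -> Tuple[str, Dict[str, bytes]]:
    """Re-read the archive only if it changed since the previous crawl."""
    current = digest(payload)
    if current == last_digest:
        return current, {}  # unchanged: nothing to re-harvest
    return current, extract_rdf(payload)


def make_sample_archive() -> bytes:
    """Build a made-up provider dump in memory (stands in for a download)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("occurrences.rdf", "<rdf:RDF/>")
        archive.writestr("README.txt", "made-up provider dump")
    return buffer.getvalue()
```

The point of the design is that the provider only has to drop a file on a web server; all the change detection and parsing happens on the crawler's side.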
I actually have nothing but praise for GBIF and uBio (except for the minor encoding thing); this is more about trying to work within TDWG and getting stonewalled. I am having the same feelings about it that I had a few years ago, after which I left to try to make something that worked so I could proceed with my project.
It probably was unfair to imply that the fiefdoms are by design, rather than a side effect of the implementation standards, and for that I apologize.
- Pete
Can't let this great opportunity pass... :)
It is not just us. It is the regrettable human condition. See also: http://www.informationweek.com/news/infrastructure/management/showArticle.jh...
<diatribe_alert/>
One of the intriguing and enigmatic things about TDWG is that it does not seem to respect its own standards, preferring to invent another set rather than fix or enhance what is already there. More than the 'not invented here' syndrome, we have to deal with 'not invented here this week' and the user is left with the impression of a happy self-absorbed group chasing a flock of flitting butterflies - 'oh, that's pretty, I will try and catch that one'. Yep, just the human condition. In another context, Roger H, http://www.hyam.net/blog/archives/346, has provided the best description of TDWG and its standards I think I have ever heard: "It is kind of like the guy selling seagulls on the beach. You give him £5 and he points into the sky and says 'That one is yours'. Yes he is providing a service but the relationship that counts is the one between you and the gull." This is both profound and frightening in how close it is to reality.
On the complexity and costs of implementation, you are absolutely right. Taxonomic database standards have moved out of the domain of taxonomists; they no longer understand what we are talking about (hell, I no longer understand what we are talking about. And the rest of you? c'mon now, be honest... :) and we are left with this relationship of 'trust us' paternalism that I do not think is all that healthy - we all trust and respect Microsoft, right? It is getting increasingly difficult to give advice to a bod with a beat up PC in a developing country on what they should do - the alphabet soup surrounding GUIDs and the like just does not cut it when the mindset is an Excel spreadsheet with a bunch of intuitive headings. Oh, I'm sorry - you mean it is not just in developing countries? There is a widening gap between 'Joe the Taxonomist' and the business of taxonomy data standards that we do not seem to be able to address (do we even care?). We used to be able to give our staff in the herbarium a TDWG standard and say 'this is what we have to do.' Not anymore... Ah, the good ole days... Maybe the answer lies in the hegemony of a new and benevolent 'MSOffice for Taxonomy'. Could work, but I do not think it is going to be particularly satisfying.
On the reuse issue, Greg W argues that we do not do nearly enough of this and I agree with him. He argues that TDWG should focus on the standard and not the application and implementation of the standard, and has proposed that in our vocabularies we should adopt the principles of nomenclatural priority, that is, going back to *Dublin* Core, adding stuff chronologically from other standards, including our own, until there is no option other than to invent another one, or there is nothing in our domain left to standardize. For taxonomists there is something inherently attractive in this approach - don't describe a taxon where it already exists, don't invent a standard where one already exists. To retrofit this and untangle all the synonymy and homonymy in our existing standards and implementations is going to take a lot of work though. But the vocabularies and ontologies are a good start.
On the 'branding' issue, it is not so much branding but attribution. Apart from the moral and legal issues, it is unscientific not to attribute, source and provide lineage for data. There is no optionality. We have to do it. Even if the initial supplier 'disappears'. *Especially* if the initial supplier disappears. Attribution (branding if you like) is absolutely essential for credibility. If someone is not going to do it, they can not have our data, and we will not use theirs.
On the architectural issue, I can not really get all that hung up on it. If a standard is good, it should be able to be implemented in a number of architectures (isn't that almost a definition of interoperability?). Where things get 'interesting' is when architecture (and the continuum towards application) becomes the standard or part of the standard. TDWG needs to constantly ask itself to what extent it needs to get involved with implementation of the standards it promotes. I would argue 'not at all', but this is another discussion.
And the 'insider' cabalistic nature of TDWG? What can I say - it has always been this way. A standard attracts a champion and the champion establishes a fiefdom of acolytes around it. Yep, the human condition. And it sort of works. Some of the time. (btw - another artifact of the human condition - your brilliant ideas are never perceived as such until someone else has them - just ask poor old Wallace how he is feeling this year). A downside of this approach is that the various TDWG standards are very poorly coordinated between each other - this is something we should be able to do something about.
We could piss on the TDWG tent from the outside, but you have to agree, it is much more satisfying to get inside the tent and piss on and piss off everyone in it... :)
<disclaimer>None of the above ideas are mine. I am following the TDWG standard practice of restating them without attribution :) </disclaimer>
Ah... that was fun...
jim
Hi Jim, Thanks for keeping in good humor. :-)
I was trying to get my head around the TaxonOccurrence standard so I could rewrite my observation records, and was hoping to find some examples. That experience, and some related issues, made me a little irritable; for that I am sorry.
It might be useful to make up a test set. It does not even need to be real. In my experience, this is where you really start to see if the standard does what you want it to do. Many times I have thought that I had everything figured out, only to discover after loading data into my triple store that something will not work or something that used to work was now broken by the addition of a new feature.
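To make the idea of a made-up test set concrete, here is a minimal Python sketch: a handful of invented observation records held as triples, plus one "real question" asked of them. Everything in it is an assumption for illustration. The `ex:` terms are hypothetical placeholders rather than TaxonOccurrence vocabulary terms, and the list-based pattern matcher merely stands in for a triple store.

```python
# Invented observation records as (subject, predicate, object) triples.
# The "ex:" terms are hypothetical placeholders, not TaxonOccurrence terms.
TRIPLES = [
    ("ex:obs1", "ex:taxonName", "Danaus plexippus"),
    ("ex:obs1", "ex:country", "US"),
    ("ex:obs2", "ex:taxonName", "Danaus plexippus"),
    ("ex:obs2", "ex:country", "MX"),
    ("ex:obs3", "ex:taxonName", "Apis mellifera"),
    ("ex:obs3", "ex:country", "US"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def countries_for(triples, taxon_name):
    """Example question: in which countries was this taxon observed?"""
    subjects = {s for s, _, _ in match(triples, p="ex:taxonName", o=taxon_name)}
    return sorted(o for s, _, o in match(triples, p="ex:country") if s in subjects)


print(countries_for(TRIPLES, "Danaus plexippus"))  # prints ['MX', 'US']
```

Swapping the invented predicates for a candidate standard's terms is where gaps tend to show up: if the question can no longer be answered, the modelling needs another look.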
I am not bothered by the idea of branding; it just seemed that it was being confused with the goal of persistence in what seemed to be an unproductive thread.
Thanks again for being amused, tolerant and insightful. :-)
- Pete
Pete
You should have sent something around on the mailing list, I could have given you an example of a TaxonOccurrence. Or perhaps you did and I missed it???
Anyway, with Herb IMI, Paul Kirk and I have set up a resolver to provide TaxonOccurrence RDF data, see for example urn:lsid:herbimi.info:specimens:100069 (or http://lsid.herbimi.info/authority/metadata/?lsid=urn:lsid:herbimi.info:spec... in your browser). It also has an example of using Interaction data - i.e. in this case a host plant (IPNI ID) of a fungus (Herb IMI specimen) with an identification to a taxon concept and name (Index Fungorum name).
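For readers unfamiliar with LSIDs, the identifier above splits into an authority, a namespace, and an object id. The following is a minimal Python sketch, assuming the usual `urn:lsid:authority:namespace:object[:revision]` layout. `Lsid` and `parse_lsid` are hypothetical names used only for illustration, and the code does not contact the resolver.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The parts of an LSID (names chosen for this sketch)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("expected 5 or 6 colon-separated parts: %r" % text)
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID URN: %r" % text)
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError("empty LSID component in %r" % text)
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


specimen = parse_lsid("urn:lsid:herbimi.info:specimens:100069")
print(specimen.authority, specimen.namespace, specimen.object_id)
# prints: herbimi.info specimens 100069
```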
Jim - feel free to help improve the ideas and processes of TDWG if you find them that bad. :-)
Kevin
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Peter DeVries Sent: Friday, 24 April 2009 1:46 p.m. To: Jim Croft Cc: Bob Morris; tdwg-tag@lists.tdwg.org Subject: Re: [tdwg-tag] Differences in thinking between TDWG and LinkedData groups about data sharing / integration
Hi Jim,
Thanks for keeping in good humor. :-)
I was trying to get my head around the TaxonOccurrence standard so I could rewrite my observation records, and was hoping on finding some examples. That experience, and some related issues, made me a little irritable, for that I am sorry.
It might be useful to make up a test set. It does not even need to be real. In my experience, this is where you really start to see if the standard does what you want it to do. Many times I have thought that I had everything figured out, only to discover after loading data into my triple store that something will not work or something that used to work was now broken by the addition of a new feature.
I am not bothered by the idea of branding, it just seemed that the it was being confused with the goal of persistence in what seemed to be a unproductive thread.
Thanks again for being amused, tolerant and insightful. :-)
- Pete
On Thu, Apr 23, 2009 at 7:50 PM, Jim Croft <jim.croft@gmail.commailto:jim.croft@gmail.com> wrote: Can't let this great opportunity pass... :)
It is not just us. It is the regrettable human condition. See also: http://www.informationweek.com/news/infrastructure/management/showArticle.jh...
<diatribe_alert/>
One of the intriguing and enigmatic things about TDWG is that it does not seem to respect its own standards, preferring to invent another set rather than fix or enhance what is already there. More than the 'not invented here' syndrome, we have to deal with 'not invented here this week' and the user is left with the impression of a happy self-absorbed group chasing a flock of flitting butterflies - 'oh, that's pretty, I will try and catch that one'. Yep, just the human condition. In another context, Roger H, http://www.hyam.net/blog/archives/346, has provided the best description of TDWG and its standards I think I have ever heard: "It is kind of like the guy selling seagulls on the beach. You give him £5 and he points into the sky and says 'That one is yours'. Yes he is providing a service but the relationship that counts is the one between you and the gull." This is both profound and frightening in how close it is to reality.
On the complexity and costs of implementation, you are absolutely right. Taxonomic database standards have moved out of the domain of taxonomists, they no longer understand what we are talking about (hell, I no longer understand what we are talking about. And the rest of you? c'mon now, be honest... :) and we are left with this relationship of 'trust us' paternalism that I do not think is all that healthy - we all trust and respect Microsoft, right? It is getting increasingly difficult to give advice to a bod with a beat-up PC in a developing country on what they should do - the alphabet soup surrounding GUIDs and the like just does not cut it when the mindset is an Excel spreadsheet with a bunch of intuitive headings. Oh, I'm sorry - you mean it is not just in developing countries? There is a widening gap between 'Joe the Taxonomist' and the business of taxonomy data standards that we do not seem to be able to address (do we even care?). We used to be able to give our staff in the herbarium a TDWG standard and say 'this is what we have to do.' Not anymore... Ah, the good ole days... Maybe the answer lies in the hegemony of a new and benevolent 'MSOffice for Taxonomy'. Could work, but I do not think it is going to be particularly satisfying.
On the reuse issue, Greg W argues that we do not do nearly enough of this and I agree with him. He argues that TDWG should focus on the standard and not the application and implementation of the standard and has proposed that in our vocabularies we should adopt the principles of nomenclatural priority, that is, going back to *Dublin* Core, adding stuff chronologically from other standards, including our own, until there is no option other than to invent another one, or there is nothing in our domain left to standardize. For taxonomists there is something inherently attractive in this approach - don't describe a taxon where it already exists, don't invent a standard where one already exists. To retrofit this and untangle all the synonymy and homonymy in our existing standards and implementations is going to take a lot of work though. But the vocabularies and ontologies are a good start.
On the 'branding' issue, it is not so much branding but attribution. Apart from the moral and legal issues, it is unscientific not to attribute, source and provide lineage for data. There is no optionality. We have to do it. Even if the initial supplier 'disappears'. *Especially* if the initial supplier disappears. Attribution (branding if you like) is absolutely essential for credibility. If someone is not going to do it, they can not have our data, and we will not use theirs.
On the architectural issue, I can not really get all that hung up on it. If a standard is good, it should be able to be implemented in a number of architectures (isn't that almost a definition of interoperability?). Where things get 'interesting' is when architecture (and the continuum towards application) becomes the standard or part of the standard. TDWG needs to constantly ask itself to what extent it needs to get involved with implementation of the standards it promotes. I would argue 'not at all', but this is another discussion.
And the 'insider' cabalistic nature of TDWG? What can I say - it has always been this way. A standard attracts a champion and the champion establishes a fiefdom of acolytes around it. Yep, the human condition. And it sort of works. Some of the time. (btw - another artifact of the human condition - your brilliant ideas are never perceived as such until someone else has them - just ask poor old Wallace how he is feeling this year). A downside of this approach is that the various TDWG standards are very poorly coordinated between each other - this is something we should be able to do something about.
We could piss on the TDWG tent from the outside, but you have to agree, it is much more satisfying to get inside the tent and piss on and piss off everyone in it... :)
<disclaimer>None of the above ideas are mine. I am following the TDWG standard practice of restating them without attribution :) </disclaimer>
Ah... that was fun...
jim
On Fri, Apr 24, 2009 at 5:09 AM, Peter DeVries <pete.devries@gmail.com> wrote:
Respectfully,
1) Only certain classes of organizations will be able to contribute since the standard requires special skills: those groups that can pay for hardware and a person specific to this standard for perpetuity. I look at this and think that a number of groups that could be providers cannot because of the way the system is implemented. Why not have a simple RDF tar or zip file format that GBIF checks with a crawler every night?
2) There is very little reuse of existing vocabularies, geo for instance. Similar to the "not invented here" mentality.
3) Discussions and decisions seem to be too much about making sure that providers keep their "brand" on the data even if they disappear.
4) Suggestions or alternative ways of thinking are rejected until an insider restates them without attribution.
5) It is not at all clear how some of these decisions are made. It appears as if some people disagree, there is discussion. Then years later there is the same discussion. It seems that some smaller group keeps pulling everyone back to the same architectural decision.
6) Where are the example data sets? We should have some example data sets available to see if the standard can be used to answer real questions. Either they don't exist or they are only available to a few.
I actually have nothing but praise for GBIF and uBio (except for the minor encoding thing); this is more about trying to work within TDWG and getting stonewalled. I am having the same feelings about it that I had a few years ago, after which I left to try to make something that worked so I could proceed with my project.
It probably was unfair to imply that the fiefdoms are by design, rather than a side effect of the implementation standards, and for that I apologize.
- Pete
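To make the "simple RDF tar or zip file that a crawler checks every night" suggestion in point 1 concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the archive layout, the function names, and the use of SHA-256 digests to spot changed files. It is not a description of how GBIF actually harvests data.

```python
import hashlib
import io
import zipfile


def build_archive(files):
    """Pack {member_name: rdf_text} into an in-memory zip (hypothetical provider side)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(files):
            zf.writestr(name, files[name])
    return buf.getvalue()


def fingerprint(archive_bytes):
    """Map each member name in the zip to the SHA-256 digest of its content."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {name: hashlib.sha256(zf.read(name)).hexdigest()
                for name in zf.namelist()}


def changed_members(previous, current):
    """Names that are new, or whose content differs, since the previous crawl."""
    return sorted(name for name, digest in current.items()
                  if previous.get(name) != digest)


# Two invented nightly snapshots of the same provider archive.
night1 = fingerprint(build_archive({"a.rdf": "<rdf:RDF/>", "b.rdf": "<rdf:RDF/>"}))
night2 = fingerprint(build_archive({
    "a.rdf": "<rdf:RDF/>",
    "b.rdf": "<rdf:RDF><!-- edited --></rdf:RDF>",
    "c.rdf": "<rdf:RDF/>",
}))
print(changed_members(night1, night2))
```

The point of the sketch is only that the provider side reduces to publishing a static file, while change detection lives entirely in the crawler.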
On Thu, Apr 23, 2009 at 10:56 AM, Bob Morris <morris.bob@gmail.com> wrote:
"described by anyone" is not the same as "described by anyone in any way convenient to the describer", so I find this quotation somewhat disingenuous. More precisely, I wonder what TDWG standard or proposed standard you find enables fiefdoms in ways that are impossible under some other solution to the problem the standard addresses.
Bob Morris
On Thu, Apr 23, 2009 at 10:47 AM, Peter DeVries <pete.devries@gmail.com> wrote:
This paragraph below seems to encapsulate the differences in thinking between the linkeddata community and some of the TDWG people on how to best share biodiversity data.
"The notion of a fabric of resources that are individually described, queried, and resolved may seem unmanageable or like science fiction. For organizations that are used to large, manual, centralized efforts to standardize on everything, it may seem anarchic to allow resources to grow organically and be described by anyone. The same people would probably not believe the Web possible in the first place if there were not already ample proof of its success."
REST for Java developers, Part 4: The future is RESTful
From http://www.javaworld.com/javaworld/jw-04-2009/jw-04-rest-series-4.html?page=...
I think that some people may have lost sight of the goal of making data available to improve the understanding of our natural world and hopefully better manage our natural resources.
It does not seem that creating a distributed network of fiefdoms will help us achieve this goal.
- Pete
I was led to this article by @janzemanek on twitter.
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
-- Robert A. Morris Professor of Computer Science UMASS-Boston ram@cs.umb.edu http://bdei.cs.umb.edu/ http://www.cs.umb.edu/~ram http://www.cs.umb.edu/~ram/calendar.html phone (+1)617 287 6466
-- Jim Croft ~ jim.croft@gmail.com ~ +61-2-62509499
"Words, as is well known, are the great foes of reality." - Joseph Conrad, author (1857-1924)
"I know that you believe that you understood what you think I said, but I am not sure you realize that what you heard is not what I meant." - attributed to Robert McCloskey, US State Department spokesman
________________________________ Please consider the environment before printing this email Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Also, the Plazi EoL REST service for descriptions extracted from legacy literature has particularly minimal use of TaxonOccurrence objects. See sample, including restful service spec, at http://wiki.tdwg.org/twiki/bin/view/SPM/PlaziEOLProject
Bob Morris
On Thu, Apr 23, 2009 at 10:16 PM, Kevin Richards <RichardsK@landcareresearch.co.nz> wrote:
Pete
You should have sent something around on the mailing list, I could have given you an example of a TaxonOccurrence. Or perhaps you did and I missed it???
Anyway, with Herb IMI, Paul Kirk and I have set up a resolver to provide TaxonOccurrence RDF data,
see for example urn:lsid:herbimi.info:specimens:100069 (or http://lsid.herbimi.info/authority/metadata/?lsid=urn:lsid:herbimi.info:spec... in your browser). It also has an example of using Interaction data - i.e. in this case a host plant (IPNI ID) of a fungus (Herb IMI specimen) with an identification to a taxon concept and name (Index Fungorum name).
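For readers unfamiliar with the identifier form used above: an LSID is a URN of the shape urn:lsid:authority:namespace:object, with an optional revision as a sixth field. The sketch below only splits such a string into its parts. The names `Lsid` and `parse_lsid` are hypothetical, and the code does not contact any resolver, since the resolver URL in the message is truncated.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The colon-separated components of an LSID (illustrative container)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    has_prefix = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if len(parts) not in (5, 6) or not has_prefix or not all(parts[2:5]):
        raise ValueError(f"not an LSID: {text!r}")
    # An empty or absent sixth field means no revision was given.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:lsid:herbimi.info:specimens:100069"))
```

Here the authority is herbimi.info, the namespace is specimens, and the object identifier is 100069.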
Jim - feel free to help improve the ideas and processes of TDWG if you find them that bad. :-)
Kevin
*From:* tdwg-tag-bounces@lists.tdwg.org [mailto: tdwg-tag-bounces@lists.tdwg.org] *On Behalf Of *Peter DeVries *Sent:* Friday, 24 April 2009 1:46 p.m. *To:* Jim Croft *Cc:* Bob Morris; tdwg-tag@lists.tdwg.org *Subject:* Re: [tdwg-tag] Differences in thinking between TDWG and LinkedData groups about data sharing / integration
Hi Jim,
Thanks for keeping in good humor. :-)
I was trying to get my head around the TaxonOccurrence standard so I could rewrite my observation records,
and was hoping to find some examples. That experience, and some related issues, made me a little irritable;
for that I am sorry.
It might be useful to make up a test set. It does not even need to be real. In my experience, this is where
you really start to see if the standard does what you want it to do. Many times I have thought that I had
everything figured out, only to discover after loading data into my triple store that something will not work
or something that used to work was now broken by the addition of a new feature.
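A made-up test set of the kind described here can be very small. The sketch below is an illustration only: it stands in for a triple store with a plain Python list, the `ex:` predicates are invented placeholders rather than TaxonOccurrence terms, and the records are not real observations. It shows how a couple of "real questions" quickly expose a gap in the data.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Invented observation records; obs3 deliberately lacks a date.
TEST_SET = [
    ("ex:obs1", "ex:taxonName", "Aedes vexans"),
    ("ex:obs1", "ex:observedOn", "2009-04-20"),
    ("ex:obs2", "ex:taxonName", "Aedes vexans"),
    ("ex:obs2", "ex:observedOn", "2009-04-22"),
    ("ex:obs3", "ex:taxonName", "Culex pipiens"),
]


def observations_of(triples, name):
    """Question 1: which observations record this taxon name?"""
    return sorted(t[0] for t in match(triples, p="ex:taxonName", o=name))


def missing_dates(triples):
    """Question 2: which observations could not be placed in time?"""
    subjects = {t[0] for t in triples}
    dated = {t[0] for t in match(triples, p="ex:observedOn")}
    return sorted(subjects - dated)


print(observations_of(TEST_SET, "Aedes vexans"))
print(missing_dates(TEST_SET))
```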
I am not bothered by the idea of branding; it just seemed that it was being confused with the goal of
persistence in what seemed to be an unproductive thread.
Thanks again for being amused, tolerant and insightful. :-)
- Pete
On Thu, Apr 23, 2009 at 7:50 PM, Jim Croft jim.croft@gmail.com wrote:
Can't let this great opportunity pass... :)
It is not just us. It is the regrettable human condition. See also:
http://www.informationweek.com/news/infrastructure/management/showArticle.jh...
<diatribe_alert/>
One of the intriguing and enigmatic things about TDWG is that it does not seem to respect its own standards, preferring to invent another set rather than fix or enhance what is already there. More than the 'not invented here' syndrome, we have to deal with 'not invented here this week' and the user is left with the impression of a happy self-absorbed group chasing a flock of flitting butterflies - 'oh, that's pretty, I will try and catch that one'. Yep, just the human condition. In another context, Roger H, http://www.hyam.net/blog/archives/346, has provided the best description of TDWG and its standards I think I have ever heard: "It is kind of like the guy selling seagulls on the beach. You give him £5 and he points into the sky and says 'That one is yours'. Yes he is providing a service but the relationship that counts is the one between you and the gull." This is both profound and frightening in how close it is to reality.
Pete, another custom, non-RDF application of it is the GBIF REST services, e.g.: http://data.gbif.org/ws/rest/occurrence/get/1003321
Regarding interoperable standards that apply across different architectures, I would like to point out the revised Darwin Core terms, which are not tied to a technical implementation per se. Just like Dublin Core, the Darwin Core group has preferred to provide several best-practice guidelines on how to use the same dwc terms in the context of XML, RDF or simple text, our (GBIF) latest favorite encoding. Because the terms are rather simple, you can encode and transform the exact same data for a variety of technical architectures.
http://darwincore.googlecode.com/svn/trunk/terms/index.htm
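As a small illustration of "the same terms, several encodings", the sketch below writes one record both as simple delimited text and as XML. The term names and the namespace http://rs.tdwg.org/dwc/terms/ are Darwin Core's; everything else is an assumption for the example: the record values are invented, and the `records`/`record` wrapper elements are placeholders, not the layout from the Darwin Core XML guide.

```python
import csv
import io
import xml.etree.ElementTree as ET

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
TERMS = ["scientificName", "eventDate", "decimalLatitude", "decimalLongitude"]

# Invented example record keyed by Darwin Core term names.
RECORD = {
    "scientificName": "Aedes vexans",
    "eventDate": "2009-04-20",
    "decimalLatitude": "43.0766",
    "decimalLongitude": "-89.4125",
}


def to_text(records):
    """Encode records as comma-separated text with the terms as column headers."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TERMS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records):
    """Encode the same records as XML, one namespaced element per term."""
    ET.register_namespace("dwc", DWC_NS)
    root = ET.Element("records")  # hypothetical wrapper element
    for rec in records:
        node = ET.SubElement(root, "record")  # hypothetical wrapper element
        for term in TERMS:
            ET.SubElement(node, f"{{{DWC_NS}}}{term}").text = rec[term]
    return ET.tostring(root, encoding="unicode")


print(to_text([RECORD]))
print(to_xml([RECORD]))
```

Both functions read from the same dictionary, so the choice of encoding stays separate from the choice of terms.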
Wouldn't it be great to have this minimal set of biodiversity concepts available for RDF(a), XML, text, HTML, RSS, microformats and tagging? There is a discrepancy with the current LSID vocabularies, of course, as with all TDWG standards. But shouldn't we try to get this fixed and have a core set of terms available everywhere?
Personally I think this is a far more important issue than having globally unique ids. They would have to resolve to something meaningful anyway to be useful.
Markus
On Apr 24, 2009, at 4:24, Bob Morris wrote:
Also, the Plazi EoL REST service for descriptions extracted from legacy literature has particularly minimal use of TaxonOccurrence objects. See sample, including restful service spec, at http://wiki.tdwg.org/twiki/bin/view/SPM/PlaziEOLProject
Bob Morris
On Thu, Apr 23, 2009 at 10:16 PM, Kevin Richards <RichardsK@landcareresearch.co.nz
wrote:
Pete
You should have sent something around on the mailing list, I could have given you an example of a TaxonOccurrence. Or perhaps you did and I missed it???
Anyway, with Herb IMI, Paul Kirk and I have set up an resolver to provider TaxonOccurrence RDF data,
see for example urn:lsid:herbimi.info:specimens:100069 (or http://lsid.herbimi.info/authority/metadata/?lsid=urn:lsid:herbimi.info:spec... in your browser). It also has an example of using Interaction data
- ie in this case a host plant (IPNI ID) of a fungus (Herb IMI
specimen) with an identification to a taxon concept and name (Index Fungorum name).
Jim - feel free to help improve the ideas and processes of TDWG if you find them that bad. :-)
Kevin
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org ] On Behalf Of Peter DeVries Sent: Friday, 24 April 2009 1:46 p.m. To: Jim Croft Cc: Bob Morris; tdwg-tag@lists.tdwg.org Subject: Re: [tdwg-tag] Differences in thinking between TDWG and LinkedData groups about data sharing / integration
Hi Jim,
Thanks for keeping in good humor. :-)
I was trying to get my head around the TaxonOccurrence standard so I could rewrite my observation records,
and was hoping on finding some examples. That experience, and some related issues, made me a little irritable,
for that I am sorry.
It might be useful to make up a test set. It does not even need to be real. In my experience, this is where
you really start to see if the standard does what you want it to do. Many times I have thought that I had
everything figured out, only to discover after loading data into my triple store that something will not work
or something that used to work was now broken by the addition of a new feature.
I am not bothered by the idea of branding, it just seemed that the it was being confused with the goal of
persistence in what seemed to be a unproductive thread.
Thanks again for being amused, tolerant and insightful. :-)
- Pete
On Thu, Apr 23, 2009 at 7:50 PM, Jim Croft jim.croft@gmail.com wrote:
Can't let this great opportunity pass... :)
It is not just us. It is the regrettable human condition. See also: http://www.informationweek.com/news/infrastructure/management/showArticle.jh...
<diatribe_alert/>
One of the intriguing and enigmatic things about TDWG is that it does not seem to respect its own standards, preferring to invent another set rather than fix or enhance what is already there. More than the 'not invented here' syndrome, we have to deal with 'not invented here this week' and the user is left with the impression of a happy self-absorbed group chasing a flock of flitting butterflies - 'oh, that's pretty, I will try and catch that one'. Yep, just the human condition. In another context, Roger H, http://www.hyam.net/blog/archives/346, has provided the best description of TDWG and its standards I think I have ever heard: "It is kind of like the guy selling seagulls on the beach. You give him £5 and he points into the sky and says 'That one is yours'. Yes he is providing a service but the relationship that counts is the one between you and the gull." This is both profound and frightening in how close it is to reality.
On the complexity and costs of implementation, you are absolutely right. Taxonomic database standards have moved out of the domain of taxonomists, they no longer understand what we are talking about (hell, I no longer understand what we are talking about. And the rest of you? c'mon now, be honest... :) and we are left with this relationship of 'trust us' paternalism that I do not think is all that healthy - we all trust and respect Microsoft, right?. It is getting increasingly difficult give advice to a bod with a beat up PC in a developing country on what they should do - the alphabet soup surrounding GUIDS and the like just does not cut it when the mind set is an Excel spreadsheet with a bunch of intuitive headings. Oh, I'm sorry - you mean it is not just in developing countries? There is a widening gap between 'Joe the Taxonomist' and business of taxonomy data standards that we do not seem to be able to address (do we even care?). We used to be able to give our staff in the herbarium a TDWG standard and say 'this is what we have to do.' Not anymore... Ah, the good ole days... Maybe the answer lies in the hegemony of a new and benevolent 'MSOffice for Taxonomy'. Could work, but I do not think it is going to be particularly satisfying.
On the reuse issue, Greg W argues that we do not do nearly enough of this and I agree with him. He argues that TDWG should focus on the standard and not the application and implementation of the standard and has proposed that in our vocabularies we should adopt the principles of nomenclatural priority, that is, going back to *Dublin* Core, adding stuff chronologically from other other standards, including our own, until there is no option other than to invent another one, or there is nothing in our domain left to standardize. For taxonomists there is something inherently attractive in this approach - don't describe a taxon where it already exists, don't invent a standard where one already exists. To retrofit this and untangle all the synonymy and homonymy in our existing standards and implementations is going to take a lot of work though. But the vocabularies and ontologies are a good start.
On the 'branding, issue, it is not so much branding but attribution. Apart from the moral and legal issues, it is unscientific not to attribute, source and provide lineage for data. The is no optionality. We have to do it. Even if the initial supplier 'disappears'. *Especially* if the initial supplier disappears. Attribution (branding if you like) is absolutely essential for credibility. If someone is not going to do it, they can not have our data, and we will not use theirs.
On the architectural issue, I can not really get all that hung up on it. If a standard is good, it should be able to be implemented in a number of architectures (isn't that almost a definition of interoperability?). Where things get 'interesting' is when architecture (and the continuum towards application) becomes the standard or part of the standard. TDWG needs to constantly ask itself to what extent it needs to get involved with implementation of the standards it promotes. I would argue 'not at all', but this is another discussion.
And the 'insider' cabalistic nature of TDWG? What can I say - it has always been this way. A standard attracts a champion and the champion establishes a fiefdom of acolytes around it. Yep, the human condition. And it sort of works. Some of the time. (btw - another artifact of the human condition - your brilliant ideas are never perceived as such until someone else has them - just ask poor old Wallace how he is feeling this year). A downside of this approach is that the various TDWG standards are very poorly coordinated between each other - this is something we should be able to do something about.
We could piss on the TDWG tent from the outside, but you have to agree, it is much more satisfying to get inside the tent and piss on and piss off everyone in it... :)
<disclaimer>None of the above ideas are mine. I am following the TDWG standard practice of restating them without attribution :)
</disclaimer>
Ah... that was fun...
jim
On Fri, Apr 24, 2009 at 5:09 AM, Peter DeVries pete.devries@gmail.com wrote:
Respectfully,
- Only certain classes of organizations will be able to
contribute since
the standard is requires special skills. Those groups that can pay
for
hardware and a person specific to this standard for perpetuity. I look at
this and
think that a number of groups that could be providers cannot
because of the
way the system is implemented. Why not have a simple RDF tar or zip
file format
that GBIF checks with a crawler every night? 2) There is very little reuse of existing vocabularies, geo for
instance.
Similar to the "not invented here mentality". 3) Discussions and decisions seem to be too much about making sure
that
providers keep their "brand" on the data even if they disappear. 4) Suggestions or alternative ways of thinking are rejected until
an insider
restates them without attribution 5) It is not at all clear how some of these decisions are made. It
appears
as if some people disagree, there is discussion. Then years later
there is
the same discussion. It seems that some smaller group keeps
pulling
everyone back to the same architectural decision. 6) Where are the example data sets? We should have some example
data sets
available to see if the standard can be used to answer real
questions?
Either they don't exist or they are only available to a few.
I actually have nothing but praise for GBIF and uBio (except for
the minor
encoding thing), this more about trying to work within TDWG and
getting
stonewalled. I am having the same feelings about it that I had a
few years
ago, after which I left to try to make something that worked so I
could
proceed with my project. It probably was unfair to imply that the fiefdoms are by design,
rather than
a side effect of the implementation standards, and for that I
apologize.
- Pete
On Thu, Apr 23, 2009 at 10:56 AM, Bob Morris
morris.bob@gmail.com wrote:
"described by anyone" is not the same as "described by anyone in
any way
convenient to the describer", so I find this quotation somewhat disingenuous. More precisely, I wonder what TDWG standard or
proposed
standard you find enables fiefdoms \in ways that are impossible
under some
other solution to the problem the standard addresses/.
Bob Morris
On Thu, Apr 23, 2009 at 10:47 AM, Peter DeVries <pete.devries@gmail.com> wrote:
This paragraph below seems to encapsulate the differences in thinking between the linkeddata community and some of the TDWG people on how to best share biodiversity data.
"The notion of a fabric of resources that are individually described, queried, and resolved may seem unmanageable or like science fiction. For organizations that are used to large, manual, centralized efforts to standardize on everything, it may seem anarchic to allow resources to grow organically and be described by anyone. The same people would probably not believe the Web possible in the first place if there were not already ample proof of its success."
REST for Java developers, Part 4: The future is RESTful
From http://www.javaworld.com/javaworld/jw-04-2009/jw-04-rest-series-4.html?page=...
I think that some people may have lost sight of the goal of making data available to improve the understanding of our natural world and hopefully better manage our natural resources.
It does not seem that creating a distributed network of fiefdoms will help us achieve this goal.
- Pete
I was led to this article by @janzemanek on twitter.
-- Robert A. Morris Professor of Computer Science UMASS-Boston ram@cs.umb.edu http://bdei.cs.umb.edu/ http://www.cs.umb.edu/~ram http://www.cs.umb.edu/~ram/calendar.html phone (+1)617 287 6466
--
Jim Croft ~ jim.croft@gmail.com ~ +61-2-62509499
"Words, as is well known, are the great foes of reality."
- Joseph Conrad, author (1857-1924)
"I know that you believe that you understood what you think I said, but I am not sure you realize that what you heard is not what I meant."
- attributed to Robert McCloskey, US State Department spokesman
Thank you everyone :-)
On Fri, Apr 24, 2009 at 4:39 AM, "Markus Döring (GBIF)" mdoering@gbif.org wrote:
This I like :-)
I will look through this and the examples and try to get my stuff to follow it.
Personally I think this is a far more important issue than having globally unique ids.
Yes, I agree. Also I think it is something that could move forward quickly with less controversy.
Thanks again,
- Pete
Pete,
Trying to maintain Jim's light hearted take on this:
(-: we have c. one million occurrence records deliverable in TDWG standard form (HISPID3) from http://anbg.gov.au/cgi-bin/anhsir. Available since March 1993 (when there were < 200 web servers on the Internet) this URL provides access to a web form interface and an API for REST style services (even then): http://anbg.gov.au/cgi-bin/anhsir?format=HISPID3&taxon_name=Doodia+asper... for example will return HISPID data extract from all occurrence records with taxon names starting with Doodia aspera. The "ontology" on which this is based is available at http://plantnet.rbgsyd.nsw.gov.au/HISCOM/HISPID/HISPID3/H3.html. HISPID was first published in 1989, accepted as a TDWG standard in 1993 and revised again in print in 1996.
The current HISPID ( http://hiscom.chah.org.au/wiki/HISPID_5 ) extends ABCD. Sooner or later we all have to let go and go with the flow. Darwin Core does not yet support Herbarium data interchange. But before it does, I would like to see another HISPID, designed explicitly for Herbarium Data interchange, based on a standardized TDWG vocabulary. :-)
We are just a few weeks away from exposing these data as both linked data and TAPIR XML with resolvable LSIDs.
The Australian Faunal Directory (AFD) and Australian Plant Name Index (APNI) were to be made available at the same time under each of these protocols (incl. TAPIR TCS) applying as much of the TDWG vocabulary that we have been able to use ... but if there are plans to change this stuff do we have to rethink - again? From our efforts we too would like to have input into how the vocabularies evolve. Free Data (apologies to L.Lessig) is the essence of the semantic web and we know from experience that we have to live with how we let it go.
If I was at home in bed last Thursday the <diatribe/> may not have been so light hearted. So I am still hopeful it can all work out for the best.
greg
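As a rough illustration of the REST-style query greg describes, the sketch below only builds the request URL and does not contact the server. The base address and the `format` and `taxon_name` parameters come from the example URL in the message; the function name `build_hispid_query` is hypothetical, and whether the service still answers at this address or accepts other parameters is an assumption that has not been checked.

```python
from urllib.parse import urlencode

# Base URL taken from the message above; availability is not verified here.
ANHSIR_BASE = "http://anbg.gov.au/cgi-bin/anhsir"


def build_hispid_query(taxon_name: str, fmt: str = "HISPID3") -> str:
    """Return a query URL using the two parameters visible in the example.

    `build_hispid_query` is a hypothetical helper, not part of any
    published client library.
    """
    # urlencode escapes spaces as '+', matching the form of the quoted URL.
    params = {"format": fmt, "taxon_name": taxon_name}
    return f"{ANHSIR_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Taxon names starting with "Doodia aspera", as in the message.
    print(build_hispid_query("Doodia aspera"))
```

Keeping the query in plain URL parameters is what makes this kind of interface usable from a browser, a script, or a crawler alike.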
Thanks to both of you for the examples. :-) I had asked some people via Twitter.
- Pete
On Thu, Apr 23, 2009 at 9:16 PM, Kevin Richards < RichardsK@landcareresearch.co.nz> wrote:
Pete
You should have sent something around on the mailing list, I could have given you an example of a TaxonOccurrence. Or perhaps you did and I missed it???
Anyway, with Herb IMI, Paul Kirk and I have set up a resolver to provide TaxonOccurrence RDF data;
see for example urn:lsid:herbimi.info:specimens:100069 (or http://lsid.herbimi.info/authority/metadata/?lsid=urn:lsid:herbimi.info:spec... in your browser). It also has an example of using Interaction data, i.e. in this case a host plant (IPNI ID) of a fungus (Herb IMI specimen) with an identification to a taxon concept and name (Index Fungorum name).
Jim - feel free to help improve the ideas and processes of TDWG if you find them that bad. :-)
Kevin
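For readers unfamiliar with the identifier Kevin quotes, the following is a minimal sketch of how an LSID splits into its parts, assuming the usual `urn:lsid:authority:namespace:object[:revision]` layout. The `Lsid` type and `parse_lsid` function are hypothetical names introduced for illustration; the sketch does not resolve the identifier or fetch any RDF.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID (hypothetical helper type for this sketch)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into authority, namespace, object and revision."""
    parts = text.split(":")
    # Expect urn:lsid:<authority>:<namespace>:<object> with optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:herbimi.info:specimens:100069"))
```

Here the authority (`herbimi.info`) is what identifies the resolver that serves the metadata, which is why the question of who runs that authority matters for persistence.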
*From:* tdwg-tag-bounces@lists.tdwg.org [mailto: tdwg-tag-bounces@lists.tdwg.org] *On Behalf Of *Peter DeVries *Sent:* Friday, 24 April 2009 1:46 p.m. *To:* Jim Croft *Cc:* Bob Morris; tdwg-tag@lists.tdwg.org *Subject:* Re: [tdwg-tag] Differences in thinking between TDWG and LinkedData groups about data sharing / integration
Hi Jim,
Thanks for keeping in good humor. :-)
I was trying to get my head around the TaxonOccurrence standard so I could rewrite my observation records, and was hoping to find some examples. That experience, and some related issues, made me a little irritable; for that I am sorry.
It might be useful to make up a test set. It does not even need to be real. In my experience, this is where you really start to see if the standard does what you want it to do. Many times I have thought that I had everything figured out, only to discover after loading data into my triple store that something will not work or something that used to work was now broken by the addition of a new feature.
I am not bothered by the idea of branding; it just seemed that it was being confused with the goal of persistence in what seemed to be an unproductive thread.
Thanks again for being amused, tolerant and insightful. :-)
- Pete
On Thu, Apr 23, 2009 at 7:50 PM, Jim Croft jim.croft@gmail.com wrote:
Can't let this great opportunity pass... :)
It is not just us. It is the regrettable human condition. See also:
http://www.informationweek.com/news/infrastructure/management/showArticle.jh...
<diatribe_alert/>
One of the intriguing and enigmatic things about TDWG is that it does not seem to respect its own standards, preferring to invent another set rather than fix or enhance what is already there. More than the 'not invented here' syndrome, we have to deal with 'not invented here this week' and the user is left with the impression of a happy self-absorbed group chasing a flock of flitting butterflies - 'oh, that's pretty, I will try and catch that one'. Yep, just the human condition. In another context, Roger H, http://www.hyam.net/blog/archives/346, has provided the best description of TDWG and its standards I think I have ever heard: "It is kind of like the guy selling seagulls on the beach. You give him £5 and he points into the sky and says 'That one is yours'. Yes he is providing a service but the relationship that counts is the one between you and the gull." This is both profound and frightening in how close it is to reality.
On the complexity and costs of implementation, you are absolutely right. Taxonomic database standards have moved out of the domain of taxonomists, they no longer understand what we are talking about (hell, I no longer understand what we are talking about. And the rest of you? c'mon now, be honest... :) and we are left with this relationship of 'trust us' paternalism that I do not think is all that healthy - we all trust and respect Microsoft, right? It is getting increasingly difficult to give advice to a bod with a beat up PC in a developing country on what they should do - the alphabet soup surrounding GUIDs and the like just does not cut it when the mind set is an Excel spreadsheet with a bunch of intuitive headings. Oh, I'm sorry - you mean it is not just in developing countries? There is a widening gap between 'Joe the Taxonomist' and the business of taxonomy data standards that we do not seem to be able to address (do we even care?). We used to be able to give our staff in the herbarium a TDWG standard and say 'this is what we have to do.' Not anymore... Ah, the good ole days... Maybe the answer lies in the hegemony of a new and benevolent 'MSOffice for Taxonomy'. Could work, but I do not think it is going to be particularly satisfying.
On the reuse issue, Greg W argues that we do not do nearly enough of this and I agree with him. He argues that TDWG should focus on the standard and not the application and implementation of the standard and has proposed that in our vocabularies we should adopt the principles of nomenclatural priority, that is, going back to *Dublin* Core, adding stuff chronologically from other standards, including our own, until there is no option other than to invent another one, or there is nothing in our domain left to standardize. For taxonomists there is something inherently attractive in this approach - don't describe a taxon where it already exists, don't invent a standard where one already exists. To retrofit this and untangle all the synonymy and homonymy in our existing standards and implementations is going to take a lot of work though. But the vocabularies and ontologies are a good start.
On the 'branding' issue, it is not so much branding but attribution. Apart from the moral and legal issues, it is unscientific not to attribute, source and provide lineage for data. There is no optionality. We have to do it. Even if the initial supplier 'disappears'. *Especially* if the initial supplier disappears. Attribution (branding if you like) is absolutely essential for credibility. If someone is not going to do it, they can not have our data, and we will not use theirs.
On the architectural issue, I can not really get all that hung up on it. If a standard is good, it should be able to be implemented in a number of architectures (isn't that almost a definition of interoperability?). Where things get 'interesting' is when architecture (and the continuum towards application) becomes the standard or part of the standard. TDWG needs to constantly ask itself to what extent it needs to get involved with implementation of the standards it promotes. I would argue 'not at all', but this is another discussion.
And the 'insider' cabalistic nature of TDWG? What can I say - it has always been this way. A standard attracts a champion and the champion establishes a fiefdom of acolytes around it. Yep, the human condition. And it sort of works. Some of the time. (btw - another artifact of the human condition - your brilliant ideas are never perceived as such until someone else has them - just ask poor old Wallace how he is feeling this year). A downside of this approach is that the various TDWG standards are very poorly coordinated between each other - this is something we should be able to do something about.
We could piss on the TDWG tent from the outside, but you have to agree, it is much more satisfying to get inside the tent and piss on and piss off everyone in it... :)
<disclaimer>None of the above ideas are mine. I am following the TDWG standard practice of restating them without attribution :)
</disclaimer>
Ah... that was fun...
jim
On Fri, Apr 24, 2009 at 5:09 AM, Peter DeVries pete.devries@gmail.com wrote:
Respectfully,
1) Only certain classes of organizations will be able to contribute since the standard requires special skills. Those groups that can pay for hardware and a person specific to this standard in perpetuity. I look at this and think that a number of groups that could be providers cannot because of the way the system is implemented. Why not have a simple RDF tar or zip file format that GBIF checks with a crawler every night?
2) There is very little reuse of existing vocabularies, geo for instance. Similar to the "not invented here" mentality.
3) Discussions and decisions seem to be too much about making sure that providers keep their "brand" on the data even if they disappear.
4) Suggestions or alternative ways of thinking are rejected until an insider restates them without attribution.
5) It is not at all clear how some of these decisions are made. It appears as if some people disagree, there is discussion. Then years later there is the same discussion. It seems that some smaller group keeps pulling everyone back to the same architectural decision.
6) Where are the example data sets? We should have some example data sets available to see if the standard can be used to answer real questions. Either they don't exist or they are only available to a few.
I actually have nothing but praise for GBIF and uBio (except for the minor encoding thing); this is more about trying to work within TDWG and getting stonewalled. I am having the same feelings about it that I had a few years ago, after which I left to try to make something that worked so I could proceed with my project.
It probably was unfair to imply that the fiefdoms are by design, rather than a side effect of the implementation standards, and for that I apologize.
- Pete
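The nightly-crawler suggestion in point 1 could look roughly like the sketch below. It is only one possible reading of the idea, not an existing GBIF mechanism: a provider publishes a zip of RDF files, and the harvester re-reads it only when the archive's checksum has changed. The function names, the provider key, and the list of file suffixes are all assumptions made for illustration, and downloading the archive is left out.

```python
import hashlib
import io
import zipfile
from typing import Dict, List, Tuple

# File suffixes treated as RDF serializations; an assumption for this sketch.
RDF_SUFFIXES: Tuple[str, ...] = (".rdf", ".ttl", ".nt")


def archive_digest(data: bytes) -> str:
    """Checksum used to tell whether a provider's archive has changed."""
    return hashlib.sha256(data).hexdigest()


def rdf_members(data: bytes) -> List[str]:
    """List the RDF files inside a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(RDF_SUFFIXES))


def needs_harvest(provider: str, data: bytes, seen: Dict[str, str]) -> bool:
    """Return True (and remember the new digest) if the archive is new or changed."""
    digest = archive_digest(data)
    if seen.get(provider) == digest:
        return False
    seen[provider] = digest
    return True


if __name__ == "__main__":
    # Build a tiny archive in memory to stand in for a provider's download.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("occurrences.rdf", "<rdf/>")
    seen: Dict[str, str] = {}
    print(needs_harvest("example-provider", buf.getvalue(), seen))
    print(rdf_members(buf.getvalue()))
```

The point of the design is that the provider needs nothing more than a static file on a web server, which is the low barrier to entry the message is asking for.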
Thanks Kevin - I did not say the activities were bad - just enigmatic. A decade or less ago we could take TDWG standards into the herbarium and botanists and technicians alike could understand them and what they were trying to achieve. We can not do that anymore, at least not in our place, and I suspect this is a pattern that is replicated almost everywhere. (Importantly, we used to be able to talk to bureaucrats about the standards; now we just mumble the words 'TDWG standard' in passing and hope they do not care too much - it has worked so far - just hope they do not ask for any details.)
It is almost as though there was a conscious branch in the TDWG evolutionary tree, a fork in the programming, but I can not put my finger on when, where, how or why.
While this is not in itself a bad thing (progress, like something else, happens), one of the consequences is an intellectual disconnect between the managers and bods in the collection (who usually have control of the money) and the geeks at the computers. If those with the money can not see or understand what is being done with it, they are not very disposed to parting with it. And if they could see the churn that is happening around uids, persistence and resolvability... How to rebuild and strengthen this connection is what keeps me awake at night - I suspect it may not be possible now.
I look upon TDWG as a clearing house for all information standards applicable to biological collections and taxonomy. In this respect I am now a consumer more than a contributor and I have to admit that as a clearing house, TDWG, even though it offers so much more, is now pretty opaque to the average taxonomist.
Maybe that's the problem I am wrestling with. It is not so much that sometime in the last decade TDWG switched from being a demand driven to a supply side activity, but that the demand forked, taxonomists went back to doing what they do best and the data integrators took over the running (and a number of taxonomists in the process). Maybe...
Maybe I am wrestling with something that does not need to be wrestled with. Maybe it is ok for the geeks to say to the taxonomists, here is a new black box gizmo we have made for your stuff, use it. But I would be disappointed if it was. Collectors and taxonomists manage 'things', and TDWG is the *only* mechanism we have to help them to know and be precise and unambiguous about those 'things' they manage. So, for every 'thing' they are likely to encounter, TDWG needs to have a rule about what they can or should do with it. Not sure we need to have a pronouncement on how, but, maybe we do...
And the technical architecture group is the only instrument I can see in TDWG to identify and address TDWG's own internal competing (ok, alternative) standards - and deprecating them as prior standards without providing a replacement is not really the answer I want to hear.
As for trying to improve the process, what part of the combination of the words 'herd' and 'cats' is causing difficulty? :)
jim
On Fri, Apr 24, 2009 at 12:27 AM, Jim Croft jim.croft@gmail.com wrote:
... It is almost as though there was a conscious branch in the TDWG evolutionary tree, a fork in the programming, but I can not put my finger on when, where, how or why.
jim
I would say it was the very salutary--but all too short--Moore foundation stimulus package (sic). Three very energetic and competent professionals put a huge amount of really good stuff on the table. Just at the point where that has to be carried forward, now we are back to an organization of qualified volunteers who---surprise, surprise---have no more time on their hands than they did before. This is pretty typical of volunteer standards bodies. W3C, OMG, and even ISO take years to hammer out standards, which then take more years until wide adoption (unless there is economic incentive).
Bob
ah yes, that is almost certainly a, if not the, watershed...
wonder if the looming international year of biodiversity can be parlayed to significant economic incentive?
also wonder if the soon to be constituted GBIF tech group will be able to cut to the chase and put this all to bed?
jim
participants (6)
- "Markus Döring (GBIF)"
- Bob Morris
- greg whitbread
- Jim Croft
- Kevin Richards
- Peter DeVries