I have been pondering taxon name-matching services lately...
I wonder if the OGC WPS (Web Processing Service) would make a good platform for integrating the various name-matching algorithms that are currently being worked on.
I was imagining a web interface where you can view a list of the available algorithms and chain selected algorithms in whatever order gives the best set of match results for your own list of name strings/data. If everyone set up their algorithms as a WPS, this interface would call each WPS in the configured order until the end of the workflow path.
UI something like (in diagram):
[Diagram attached as image002.png]
The bottom part is configurable by the user, and each box represents a WPS service that does the matching.
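A minimal sketch of the chaining idea, assuming each WPS service can be stood in for by a local function with a common signature. The names `Matcher`, `exact_match`, `case_insensitive_match` and `run_workflow` are hypothetical and are not part of any WPS implementation; a real workflow would make a remote call at each step.

```python
from typing import Callable, Dict, List, Set, Tuple

# A matcher receives the still-unmatched input names plus the reference list
# and returns {input_name: matched_reference_name} for what it could resolve.
Matcher = Callable[[List[str], Set[str]], Dict[str, str]]


def exact_match(names: List[str], reference: Set[str]) -> Dict[str, str]:
    """Resolve names that appear verbatim in the reference list."""
    return {n: n for n in names if n in reference}


def case_insensitive_match(names: List[str], reference: Set[str]) -> Dict[str, str]:
    """Resolve names that differ from a reference name only in letter case."""
    by_lower = {r.lower(): r for r in reference}
    return {n: by_lower[n.lower()] for n in names if n.lower() in by_lower}


def run_workflow(
    names: List[str],
    reference: Set[str],
    steps: List[Tuple[str, Matcher]],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Call each step in the configured order, passing on only unmatched names."""
    matched: Dict[str, Tuple[str, str]] = {}
    remaining = list(names)
    for label, step in steps:
        found = step(remaining, reference)
        for name, hit in found.items():
            matched[name] = (hit, label)  # remember which step resolved it
        remaining = [n for n in remaining if n not in found]
        if not remaining:
            break  # nothing left for later steps to do
    return matched, remaining
```

Each box in the diagram would correspond to one `(label, matcher)` pair in `steps`, and reordering the list is the user-side configuration.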
Any thoughts? Perhaps something that could be discussed at TDWG?
One issue would be how to define/specify the list of names to match against: when you hand processing over to a match routine, how would it access the names list? Perhaps it could all be based on one server, and people could submit algorithm/WPS services to it?
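One possible answer to the names-list question is that WPS allows an input to be passed by reference, i.e. as a URL that the service fetches itself. The sketch below builds a WPS 1.0.0 key-value Execute request along those lines. The endpoint, the process identifier, the input names `names` and `referenceList`, and the pipe separator are all assumptions for illustration, and the exact `DataInputs` encoding should be checked against the WPS specification and the target server.

```python
from typing import List
from urllib.parse import quote


def build_execute_url(
    endpoint: str,
    process_id: str,
    names: List[str],
    reference_list_url: str,
) -> str:
    """Build a WPS 1.0.0 KVP Execute URL with the reference list passed by URL."""
    # Literal input: the names to resolve, pipe-separated (an assumed convention).
    names_value = quote("|".join(names), safe="")
    # Reference input: the service is asked to fetch the list from this URL.
    reference_value = "@xlink:href=" + quote(reference_list_url, safe="")
    data_inputs = f"names={names_value};referenceList={reference_value}"
    query = "&".join(
        [
            "service=WPS",
            "version=1.0.0",
            "request=Execute",
            "identifier=" + quote(process_id, safe=""),
            "DataInputs=" + data_inputs,
        ]
    )
    return f"{endpoint}?{query}"
```

With this arrangement every matching service could read the same hosted names list, whether the services live on one server or on many.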
Hmmmm, will keep dreaming ...
Kevin
________________________________ Please consider the environment before printing this email Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
Dear all (actually: I'm not sure who are the recipients currently on tdwg-tag, maybe I will now find out!!)
As some on this list may be aware, this is an area that has been of interest to me for quite some time (for example see: http://taxacom.markmail.org/message/ywq7ijiaeks7heiv ), so happy to see what can be done in this space.
Currently my algorithm is web accessible. It tests designated genus names against the genus names held, and genus+species combinations against both genus-only and genus+species combinations as held in my "IRMNG" reference database (search entry point is at http://www.cmar.csiro.au/datacentre/irmng/ if interested). I am also planning to implement a degree of cross-rank matching shortly, e.g. if a subgenus is supplied, test this as a possible genus against genus+species combinations (as this often turns out to be the reason for a direct mismatch in practice), and the same with infraspecies vs. subspecies. My current interface does not yet handle infraspecies, and just detects and then "parks" apparent subgenera, but the intention is to handle these as testable components in due course.
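The cross-rank idea can be sketched as follows. This is an illustration under simplifying assumptions (exact comparison only, a toy regular expression, no authorship or infraspecies handling), not the algorithm behind the IRMNG interface; `ParsedName`, `parse_name` and `match_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Set


class ParsedName(NamedTuple):
    genus: str
    subgenus: Optional[str]
    species: Optional[str]


# Genus, optional "(Subgenus)" in brackets, optional species epithet.
NAME_RE = re.compile(
    r"^\s*([A-Z][a-z-]+)(?:\s+\(([A-Z][a-z-]+)\))?(?:\s+([a-z-]+))?\s*$"
)


def parse_name(text: str) -> Optional[ParsedName]:
    """Split a name string into genus, subgenus and species parts."""
    m = NAME_RE.match(text)
    if m is None:
        return None
    return ParsedName(m.group(1), m.group(2), m.group(3))


def match_name(text: str, genera: Set[str], binomials: Set[str]) -> Optional[str]:
    """Try genus+species first, then subgenus-as-genus, then genus only."""
    parsed = parse_name(text)
    if parsed is None:
        return None
    if parsed.species is None:
        return parsed.genus if parsed.genus in genera else None
    direct = f"{parsed.genus} {parsed.species}"
    if direct in binomials:
        return direct
    if parsed.subgenus:
        # Cross-rank test: treat the supplied subgenus as a possible genus.
        promoted = f"{parsed.subgenus} {parsed.species}"
        if promoted in binomials:
            return promoted
    # Fall back to a genus-only match for the genus+species input.
    return parsed.genus if parsed.genus in genera else None
```

For example, with `binomials = {"Erronea errones"}`, the input `"Cypraea (Erronea) errones"` fails the direct test but resolves once the subgenus is promoted to genus.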
Maybe I will set up the above options and let you know when they are available for testing. I may also look for concatenated genus+species (think Homosapiens), genus+subgenus+species with missing brackets around the subgenus, and maybe other things, as per my somewhat extensive exposure to otherwise non-resolved namestrings floating around in OBIS/GBIF data provider space. Of course it is a slippery slope; other examples are family in the genus field and vice versa, or common name similar; genus and species reversed; truncated names not flagged as such; abbreviated genera (which I already handle as exact, but not fuzzy, matches at this time, at least as "H. sapiens" etc.); and more.
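Three of the cases listed above (concatenated genus+species, reversed genus and species, abbreviated genus) can each be expressed as a small candidate generator checked against a reference set of binomials. This is a rough sketch using exact comparison only; the function names are hypothetical.

```python
from typing import List, Set


def split_concatenated(text: str, binomials: Set[str]) -> List[str]:
    """'Homosapiens' -> ['Homo sapiens'] if some split point gives a known binomial."""
    hits = []
    for i in range(2, len(text) - 1):
        candidate = f"{text[:i]} {text[i:].lower()}"
        if candidate in binomials:
            hits.append(candidate)
    return hits


def swap_reversed(text: str, binomials: Set[str]) -> List[str]:
    """'sapiens Homo' -> ['Homo sapiens'] if the swapped form is a known binomial."""
    parts = text.split()
    if len(parts) != 2:
        return []
    candidate = f"{parts[1].capitalize()} {parts[0].lower()}"
    return [candidate] if candidate in binomials else []


def expand_abbreviated(text: str, binomials: Set[str]) -> List[str]:
    """'H. sapiens' -> every known binomial with that initial and exact epithet."""
    parts = text.split()
    if len(parts) != 2 or len(parts[0]) != 2 or not parts[0].endswith("."):
        return []
    initial, species = parts[0][0], parts[1]
    return sorted(
        b for b in binomials
        if b.startswith(initial) and b.split(" ", 1)[-1] == species
    )
```

An abbreviated genus can legitimately expand to more than one candidate, which is why each generator returns a list rather than a single name.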
Any comments on the above welcome,
Regards - Tony
Tony Rees
Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia
Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000)
e-mail: Tony.Rees@csiro.au
Manager, OBIS Australia regional node, http://www.obis.org.au/
Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm
Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
In reply to: Kevin Richards (via tdwg-tag-bounces@lists.tdwg.org), sent Wednesday, 7 July 2010 11:55 AM, To: tdwg-tag@lists.tdwg.org, Cc: tdwg-content@lists.tdwg.org, Subject: [tdwg-tag] WPS for Names
Thanks Tony. I would be interested in any collaboration we can do in this area (assuming I find 5 minutes to work on it :-))
There seem to be several approaches to name matching/integration: one working with the name strings, and one working with more structured data. It would be good to clarify and perhaps standardise these approaches. (The second approach is discussed in a recent paper of mine.)
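To make the distinction concrete, here is a rough sketch of the two styles of comparison. It is not taken from the paper mentioned above: the field names, the weights and the use of Python's `difflib` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict


def string_similarity(a: str, b: str) -> float:
    """Name-string approach: compare the whole strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structured_similarity(
    a: Dict[str, str],
    b: Dict[str, str],
    weights: Dict[str, int],
) -> float:
    """Structured approach: compare field by field and weight each field."""
    total = sum(weights.values())
    score = 0.0
    for field, weight in weights.items():
        left, right = a.get(field, ""), b.get(field, "")
        if left and right:  # a field missing on either side contributes nothing
            score += weight * string_similarity(left, right)
    return score / total
```

With weights such as `{"genus": 2, "species": 2, "authorship": 1}`, two records that agree on genus and species but cite the author differently still score at least 0.8, whereas a whole-string comparison gives the authorship difference as much influence as any other characters.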
It does indeed become a slippery slope, and this is one of the reasons I am keen to promote some sort of infrastructure/configurability of a matching system, so that end users can configure a matching algorithm/workflow to suit their particular data.
Kevin
In reply to: Tony.Rees@csiro.au, sent Wednesday, 7 July 2010 7:06 p.m., To: Kevin Richards; tdwg-tag@lists.tdwg.org, Cc: tdwg-content@lists.tdwg.org, Subject: RE: WPS for Names
Hi Kevin, Tony, and all
As one of the very likely consumers of such a service, I'll be following this with great interest.
A quick question at this point: is anything like this planned for GNA (www.globalnames.org) or GBIF?
Evgeniy
In reply to: Kevin Richards (via tdwg-tag-bounces@lists.tdwg.org), sent 8 July 2010 0:32, To: Tony.Rees@csiro.au; tdwg-tag@lists.tdwg.org, Cc: tdwg-content@lists.tdwg.org, Subject: Re: [tdwg-tag] WPS for Names
Kevin,
Also of relevance would be the work that iPlant is sponsoring on a Taxonomic Name Resolution Service (TNRS) -- see:
https://pods.iplantcollaborative.org/wiki/display/iptol/TNRS+Workshop
for background on a recent workshop for this activity. I'm sure they'd be interested in collaborating. I think Brian Enquist would have a good idea of progress on the project and would be able to provide more details.
Matt
In reply to: Evgeniy Meyke (evgeniy@earthcape.com), sent Wed, Jul 7, 2010 at 1:49 PM
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
You may recall that at the name processing meeting at TDWG last year there were many presentations on name recognition, discovery, and matching (terms we distinguished at a 2009 workshop and documented at http://code.google.com/p/taxon-name-processing/) and one outcome was the need to polish and operationalise some of these services. At GBIF, we undertook the following:
Supported the refinement of uBio's TaxonFinder, a name-recognition service. This upgrade uncoupled the service from client functions that were hard-coded into the original, abstracted the uBio name-service lookups from the service, and conforms to an API that returns found names as dwc:scientificName.
Our version is here (http://code.google.com/p/taxon-name-processing/wiki/NameFindingAPI) and is nearly ready for release.
Markus has experimented with a Lucene-based name-matching algorithm, which has a lot of promise and conforms to the same API, but needs more dedicated time to become operational.
We also supported the port of TaxaMatch to PHP, and I have been discussing some refinements of this with Mike Giddens to turn this port into a simpler, localised web application that allows a user to provide one or more name indexes that serve as target lexicons. These could be derived from simple lists or database links. I'd like to see the response conform to the same output format as the name-finding API, or an extension of it. What we really want are simple solutions that do small things well.
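As a small illustration of matching against user-supplied target lexicons, the sketch below loads one or more name lists and returns close matches keyed by dwc:scientificName. It uses Python's generic `difflib` similarity, not the TaxaMatch algorithm, and the `source` key, the function names and the cutoff value are assumptions rather than part of the name-finding API.

```python
from difflib import get_close_matches
from typing import Dict, Iterable, List


def load_lexicon(lines: Iterable[str]) -> List[str]:
    """Build a target lexicon from a simple list of names, dropping blank lines."""
    return sorted({line.strip() for line in lines if line.strip()})


def match_against_lexicons(
    name: str,
    lexicons: Dict[str, List[str]],
    cutoff: float = 0.8,
) -> List[Dict[str, str]]:
    """Return close matches for `name` from every lexicon the user supplied."""
    results = []
    for source, lexicon in lexicons.items():
        # Up to three candidates per lexicon, best match first.
        for hit in get_close_matches(name, lexicon, n=3, cutoff=cutoff):
            results.append({"dwc:scientificName": hit, "source": source})
    return results
```

A lexicon here is just a list of strings, so it could equally be filled from a text file or from a database query.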
Best,
David Remsen
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472 Fax: +45-35321480 Mobile +45 28751472 Skype: dremsen
In reply to: Matt Jones, sent Jul 7, 2010, at 7:15 PM
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Excellent. This sounds like a really good start towards what I was envisioning. Thanks Kevin
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of David Remsen (GBIF) Sent: Friday, 9 July 2010 8:27 a.m. To: tdwg-content@lists.tdwg.org Mailing List Cc: Technical Architecture Group mailing list Subject: Re: [tdwg-content] [tdwg-tag] WPS for Names
You may recall that at the name processing meeting at TDWG last year there were many presentations on name recognition, discovery, and matching (terms we distinguished at a 2009 workshop and documented at http://code.google.com/p/taxon-name-processing/) and one outcome was the need to polish and operationalise some of these services. At GBIF, we undertook the following:
Supported the refinement of uBio's TaxonFinder, a name-recognition service. This upgrade uncoupled the service from client functions that were hard-coded into the original, abstracted uBio name-service lookups from the service, and made it conform to an API that returns found names as dwc:scientificName.
Our version is here (http://code.google.com/p/taxon-name-processing/wiki/NameFindingAPI) and is nearly ready for release.
Markus has experimented with a Lucene-based name-matching algorithm that shows a lot of promise and conforms to the same API, but it needs more dedicated time to become operational.
We also supported the port of TaxaMatch to PHP, and I have been discussing some refinements with Mike Giddens to turn this port into a simpler, localised web application that allows a user to provide one or more name indexes to serve as target lexicons. These could be derived from simple lists or database links. I'd like to see the response conform to the same output format as the name-finding API, or an extension of it. What we really want are simple solutions that do small things well.
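As a rough illustration of matching against a user-supplied target lexicon, here is a minimal sketch. It is not the TaxaMatch algorithm: it swaps in the generic similarity ratio from Python's standard `difflib` module, and the function names, cutoff and sample names are assumptions made up for the example.

```python
from difflib import get_close_matches


def build_lexicon(names):
    """Index names case-insensitively: lowercased key -> original spelling."""
    return {n.strip().lower(): n.strip() for n in names if n.strip()}


def match_name(query, lexicon, cutoff=0.85, limit=3):
    """Return up to `limit` lexicon names similar to `query`, best first.

    An exact (case-insensitive) hit short-circuits the fuzzy search.
    The cutoff of 0.85 is an arbitrary illustrative value.
    """
    key = query.strip().lower()
    if key in lexicon:
        return [lexicon[key]]
    candidates = get_close_matches(key, list(lexicon), n=limit, cutoff=cutoff)
    return [lexicon[c] for c in candidates]


# Hypothetical target lexicon, e.g. loaded from a simple list or a database.
lexicon = build_lexicon(["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"])

print(match_name("Homo sapiens", lexicon))   # exact hit
print(match_name("Homo sapeins", lexicon))   # transposed letters, fuzzy hit
print(match_name("Felis catus", lexicon))    # nothing close enough
```

Keeping the lexicon as a plain mapping means the same matcher can be pointed at any list of names without changes.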
Best,
David Remsen
---------------------------------------------------------------------------- David Remsen, Senior Programme Officer Electronic Catalog of Names of Known Organisms Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321472 Fax: +45-35321480 Mobile +45 28751472 Skype: dremsen ----------------------------------------------------------------------------
On Jul 7, 2010, at 7:15 PM, Matt Jones wrote:
Kevin,
Also of relevance would be the work that iPlant is sponsoring on a Taxonomic Name Resolution Service (TNRS) -- see:
https://pods.iplantcollaborative.org/wiki/display/iptol/TNRS+Workshop
for background on a recent workshop for this activity. I'm sure they'd be interested in collaborating. I think Brian Enquist would have a good idea of progress on the project and would be able to provide more details.
Matt
On Wed, Jul 7, 2010 at 1:49 PM, Evgeniy Meyke <evgeniy@earthcape.com> wrote:
Hi Kevin, Tony, and all
As one of the very likely consumers of such a service I'll be following this with great interest.
A quick question at this point. Is anything like this planned for GNA (http://www.globalnames.org) or GBIF?
Evgeniy
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Kevin Richards Sent: 8 July 2010 0:32 To: Tony.Rees@csiro.au; tdwg-tag@lists.tdwg.org
Cc: tdwg-content@lists.tdwg.org Subject: Re: [tdwg-tag] WPS for Names
Thanks Tony. I would be interested in any collaboration we can do in this area (assuming I find 5 minutes to work on it :-))
There seem to be several approaches to name matching/integration: one working with the name strings, and one working with more structured data. It would be good to clarify and perhaps standardise these approaches (the second approach is discussed in a recent paper of mine).
It does indeed become a slippery slope, and this is one of the reasons I am keen to promote some sort of infrastructure/configurability of a matching system, so that end users can configure a matching algorithm/workflow to suit their particular data.
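To make the idea of a user-configurable workflow concrete, here is a minimal sketch in which each step is a local function standing in for a remote matching service. The `run_workflow` function, the two example matchers and the step labels are all hypothetical; a real system would call out to each service rather than to an in-process function.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A matcher takes (query, reference names) and returns the matched name or None.
Matcher = Callable[[str, List[str]], Optional[str]]


def exact_match(query: str, reference: List[str]) -> Optional[str]:
    return query if query in reference else None


def case_insensitive_match(query: str, reference: List[str]) -> Optional[str]:
    lowered = {name.lower(): name for name in reference}
    return lowered.get(query.lower())


def run_workflow(
    queries: List[str],
    reference: List[str],
    steps: List[Tuple[str, Matcher]],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Apply each (label, matcher) step in order to the names still unmatched."""
    results: Dict[str, Tuple[str, Optional[str]]] = {}
    pending = list(queries)
    for label, matcher in steps:
        still_pending = []
        for query in pending:
            hit = matcher(query, reference)
            if hit is None:
                still_pending.append(query)   # pass on to the next step
            else:
                results[query] = (label, hit)
        pending = still_pending
        if not pending:
            break                             # nothing left to match
    for query in pending:
        results[query] = ("unmatched", None)
    return results


reference = ["Homo sapiens", "Pan troglodytes"]
steps = [("exact", exact_match), ("case-insensitive", case_insensitive_match)]
results = run_workflow(["Homo sapiens", "pan troglodytes", "Felis catus"], reference, steps)
for name, outcome in results.items():
    print(name, outcome)
```

The order of `steps` is the part the end user would configure; swapping or adding steps changes the workflow without touching the matchers.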
Kevin
From: Tony.Rees@csiro.au [mailto:Tony.Rees@csiro.au] Sent: Wednesday, 7 July 2010 7:06 p.m. To: Kevin Richards; tdwg-tag@lists.tdwg.org Cc: tdwg-content@lists.tdwg.org Subject: RE: WPS for Names
Dear all (actually: I'm not sure who are the recipients currently on tdwg-tag, maybe I will now find out!!)
As some on this list may be aware, this is an area that has been of interest to me for quite some time (for example see: http://taxacom.markmail.org/message/ywq7ijiaeks7heiv ), so happy to see what can be done in this space.
Currently my algorithm is web accessible and tests designated genus names against the genus names held, and genus+species combinations against both the genus-only names and the genus+species combinations held in my "IRMNG" reference database (search entry point is at http://www.cmar.csiro.au/datacentre/irmng/ if interested). I am also planning to implement a degree of cross-rank matching shortly: e.g. if a subgenus is supplied, test it as a possible genus against genus+species combinations (as this often turns out to be the reason for a direct mismatch in practice), and likewise for infraspecies vs. subspecies. My current interface does not yet handle infraspecies, and just detects then "parks" apparent subgenera, but the intention is to handle these as testable components in due course.
Maybe I will set up the above options and let you know when they are available for testing. I may also look for concatenated genus+species (think Homosapiens), genus+subgenus+species with missing brackets around the subgenus, and maybe other things, based on my somewhat extensive exposure to otherwise unresolved name strings floating around in OBIS/GBIF data provider space. Of course it is a slippery slope; other examples are family in the genus field and vice versa, or common name similar; genus and species reversed; truncated names not flagged as such; abbreviated genera (which I already handle as exact but not fuzzy matches at this time, at least in the form "H. sapiens" etc.); and more.
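Three of the cases above (concatenated genus+species, abbreviated genera, and genus and species reversed) can be sketched as small pre-processing checks. This is only an illustration under simple assumptions, not the algorithm described in this message: the function names and the small sets of known genera and binomials are hypothetical.

```python
import re


def split_concatenated(name, known_genera):
    """Split 'Homosapiens' into 'Homo sapiens' if a known genus is a prefix."""
    if " " in name:
        return name
    # Try longer genera first so the longest matching prefix wins.
    for genus in sorted(known_genera, key=len, reverse=True):
        if name.lower().startswith(genus.lower()) and len(name) > len(genus):
            return f"{genus} {name[len(genus):].lower()}"
    return name


def expand_abbreviated_genus(name, known_binomials):
    """Return known binomials that fit an abbreviated genus such as 'H. sapiens'."""
    m = re.fullmatch(r"([A-Z])\.\s*([a-z-]+)", name.strip())
    if not m:
        return []
    initial, epithet = m.groups()
    return [b for b in known_binomials
            if b.startswith(initial) and b.split()[1:2] == [epithet]]


def swap_if_reversed(name, known_genera):
    """Turn 'sapiens Homo' into 'Homo sapiens' when only the second word is a genus."""
    parts = name.split()
    if (len(parts) == 2
            and parts[1].capitalize() in known_genera
            and parts[0].capitalize() not in known_genera):
        return f"{parts[1].capitalize()} {parts[0].lower()}"
    return name


genera = {"Homo", "Pan"}
binomials = ["Homo sapiens", "Hylobates lar", "Pan troglodytes"]

print(split_concatenated("Homosapiens", genera))
print(expand_abbreviated_genus("H. sapiens", binomials))
print(swap_if_reversed("sapiens Homo", genera))
```

Each check returns its input unchanged (or an empty list) when it does not apply, so the checks can be run in any order before a fuzzy match is attempted.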
Any comments on the above welcome,
Regards - Tony
Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony.Rees@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
Hi David,
I took a look at the page http://code.google.com/p/taxon-name-processing/wiki/TdwgSession2009 which has an agenda and some documentation prepared in advance of the workshop you mention, but no summary of the discussion or action items. Does such a summary exist somewhere else, and/or could it perhaps be added to that page?
Cheers - Tony
Dear Kevin,
During the last year I was involved in a project which used some of the OGC SWE (Sensor Web Enablement) standards. Based on this experience I would say that using OGC standards is not easy at all. The main reason is that OGC specifications are generally quite 'generic' and require you to specify what they call application profiles, which takes considerable effort. What I found worst about OGC standards was that they let you describe any conceivable metadata in XML, but when it comes, for example, to delivering DATA via O&M you are left alone: you can deliver anything you like as long as it fits in a result element of type "any". As a result, clients will be able to parse requests and metadata perfectly well but will probably fail to handle the 'real' output (the data) when it comes from outside your community, even if the same 'standard' is used.
From what I have seen in the WPS specification, this is also true for this standard. You can nicely encode your requests etc., but when it comes to delivering the output of such a service the specs say:
10.3.1 Execute response parameters The form of the response to an Execute operation request depends on the value of the ResponseForm parameter in the execute request. In the most primitive case, [...] RawDataOutput
So it seems that WPS also focuses on metadata, and you would have to specify (standardise ;) !!) the output format yourself.
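To show what "specifying the output format yourself" might involve, here is a minimal sketch of a shared result record that every matching service could return inside its raw output. Nothing here comes from the WPS specification: apart from `scientificName`, which borrows the Darwin Core term already used elsewhere in this thread, the field names, the match-type vocabulary and the choice of JSON are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class MatchResult:
    verbatimName: str               # name string exactly as submitted
    scientificName: Optional[str]   # matched name, or None if nothing matched
    matchType: str                  # hypothetical vocabulary: "exact", "fuzzy", "none"
    score: float                    # similarity between 0.0 and 1.0


def encode_results(results: List[MatchResult]) -> str:
    """Serialise results to the JSON payload a service would return."""
    return json.dumps([asdict(r) for r in results], sort_keys=True)


def decode_results(payload: str) -> List[MatchResult]:
    """Parse a payload produced by any service that follows the same schema."""
    return [MatchResult(**item) for item in json.loads(payload)]


sent = [
    MatchResult("Homo sapeins", "Homo sapiens", "fuzzy", 0.92),
    MatchResult("Felis catus", None, "none", 0.0),
]
payload = encode_results(sent)
received = decode_results(payload)
print(payload)
```

If every service agreed on one such record, a client could chain services from different providers without knowing anything about each one's internals.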
best regards, Robert
participants (6)
- David Remsen (GBIF)
- Evgeniy Meyke
- Kevin Richards
- Matt Jones
- Robert Huber
- Tony.Rees@csiro.au