Dear all (actually, I'm not sure who the recipients on tdwg-tag
currently are; maybe I will now find out!)
As some on this list may be aware, this is an area that has been of interest
to me for quite some time (for example, see
http://taxacom.markmail.org/message/ywq7ijiaeks7heiv), so I am happy to see
what can be done in this space.
Currently my algorithm is web accessible. It tests designated genus names
against the genus names held, and genus+species combinations against both
genus-only and genus+species combinations, as held in my “IRMNG” reference
database (the search entry point is at http://www.cmar.csiro.au/datacentre/irmng/
if interested). I am also planning to implement a degree of cross-rank matching
shortly, e.g. if a subgenus is supplied, test this as a possible genus against
genus+species combinations (as this often turns out to be the reason for a
direct mismatch in practice), and the same with infraspecies vs. subspecies. My
current interface does not yet handle infraspecies, and just detects then
“parks” apparent subgenera, but the intention is to handle these as testable
components in due course.
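A minimal sketch of the cross-rank idea described above, not the actual algorithm: it uses exact lookups only, and the reference sets `GENERA` and `BINOMIALS` and the function `match_name` are hypothetical stand-ins for a names database such as IRMNG.

```python
# Hypothetical reference data standing in for a names database.
GENERA = {"Homo", "Cypraea", "Erosaria"}
BINOMIALS = {("Homo", "sapiens"), ("Erosaria", "spurca")}


def match_name(genus, species=None, subgenus=None):
    """Return a list of (match_type, matched_name) tuples, exact matches only."""
    results = []
    if species is not None:
        if (genus, species) in BINOMIALS:
            # Direct genus+species hit.
            results.append(("genus+species", f"{genus} {species}"))
        elif subgenus and (subgenus, species) in BINOMIALS:
            # Cross-rank step: test the supplied subgenus as a possible genus.
            results.append(("subgenus-as-genus+species", f"{subgenus} {species}"))
    # The genus is tested against held genus names in every case.
    if genus in GENERA:
        results.append(("genus", genus))
    return results


# "Cypraea (Erosaria) spurca" fails as Cypraea+spurca but succeeds
# once the subgenus is tried in the genus position.
print(match_name("Cypraea", "spurca", subgenus="Erosaria"))
```

A real service would replace the set lookups with fuzzy comparisons; the point here is only the order in which the rank combinations are tried.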
Maybe I will set up the above options and let you know when they are
available for testing. I may also look for concatenated genus+species
(think “Homosapiens”), genus+subgenus+species with missing brackets
around the subgenus, and maybe other things, drawing on my somewhat extensive
exposure to otherwise unresolved namestrings floating around in the OBIS/GBIF
data provider space. Of course it is a slippery slope; other examples are
family in the genus field and vice versa, or a common name similarly; genus and
species reversed; truncated names not flagged as such; abbreviated genera (which
I already handle as exact, but not fuzzy, matches at this time, at least as
“H. sapiens” etc.); and more.
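To make three of the cases above concrete, here is a rough sketch of possible pre-processing steps for concatenated names, missing subgenus brackets, and abbreviated genera. The function names and the tiny reference sets are assumptions for illustration only, and the rules are deliberately naive (plain ASCII epithets, exact comparisons).

```python
import re

# Hypothetical reference data; a real service would query a names database.
BINOMIALS = {("Homo", "sapiens"), ("Erosaria", "spurca")}
GENERA = {g for g, _ in BINOMIALS} | {"Cypraea"}


def split_concatenated(token):
    """Split e.g. 'Homosapiens' into ('Homo', 'sapiens') using known genera."""
    for genus in sorted(GENERA, key=len, reverse=True):
        rest = token[len(genus):]
        if token.lower().startswith(genus.lower()) and rest:
            return genus, rest.lower()
    return None


def restore_subgenus_brackets(name):
    """Turn 'Cypraea Erosaria spurca' into 'Cypraea (Erosaria) spurca'."""
    m = re.fullmatch(r"([A-Z][a-z]+) ([A-Z][a-z]+) ([a-z]+)", name)
    return f"{m.group(1)} ({m.group(2)}) {m.group(3)}" if m else name


def expand_abbreviated(name):
    """List held binomials consistent with an abbreviated genus ('H. sapiens')."""
    m = re.fullmatch(r"([A-Z])\. ?([a-z]+)", name)
    if not m:
        return []
    initial, species = m.groups()
    return sorted(
        f"{g} {s}" for g, s in BINOMIALS if g.startswith(initial) and s == species
    )


print(split_concatenated("Homosapiens"))
print(restore_subgenus_brackets("Cypraea Erosaria spurca"))
print(expand_abbreviated("H. sapiens"))
```

Each function returns the input unchanged, `None`, or an empty list when its pattern does not apply, so the steps can be tried one after another without interfering with names that are already well formed.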
Any comments on the above welcome,
Regards - Tony
Tony Rees
Manager, Divisional Data Centre,
CSIRO Marine and Atmospheric Research,
GPO
Ph: 0362 325318 (Int: +61 362 325318)
Fax: 0362 325000 (Int: +61 362 325000)
e-mail: Tony.Rees@csiro.au
Manager, OBIS Australia regional node, http://www.obis.org.au/
Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm
Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Kevin Richards
Sent: Wednesday, 7 July 2010 11:55 AM
To: tdwg-tag@lists.tdwg.org
Cc: tdwg-content@lists.tdwg.org
Subject: [tdwg-tag] WPS for Names
I have been pondering taxon name-matching services lately…
I wonder if the OGC WPS (Web Processing Service) would make a good
platform for integrating the various name-matching algorithms that are
currently being worked on.
I was imagining something like a web interface where you can view a list of
the available algorithms and select different algorithms in different orders
to get the best set of match results for your own list of name strings/data.
If everyone set up their algorithms as a WPS, then this interface would
call each WPS in the appropriate order until the end of the configured
workflow path.
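As a rough illustration of the chaining idea, the sketch below runs a user-configured list of matchers in order, passing only the still-unmatched names to each later step. Plain Python functions stand in for remote WPS processes here (no actual OGC WPS calls are made), and `REFERENCE`, `exact_match`, `case_insensitive_match`, and `run_workflow` are all hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A matcher takes a name string and returns a matched name, or None.
Matcher = Callable[[str], Optional[str]]

REFERENCE = {"Homo sapiens", "Erosaria spurca"}  # hypothetical reference list


def exact_match(name: str) -> Optional[str]:
    return name if name in REFERENCE else None


def case_insensitive_match(name: str) -> Optional[str]:
    lookup = {ref.lower(): ref for ref in REFERENCE}
    return lookup.get(name.strip().lower())


def run_workflow(names: Iterable[str], steps: List[Matcher]) -> Dict[str, Optional[str]]:
    """Apply each step in order; later steps only see names still unmatched."""
    results: Dict[str, Optional[str]] = {name: None for name in names}
    for step in steps:
        for name, matched in results.items():
            if matched is None:
                results[name] = step(name)
    return results


# The order of the steps is the user-configurable part of the workflow.
print(run_workflow(
    ["Homo sapiens", "homo SAPIENS ", "Pan troglodytes"],
    [exact_match, case_insensitive_match],
))
```

In a WPS-based version each matcher would wrap a request to a separate service, but the control flow of trying steps in a configured order would look much the same.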
The UI would be something like a diagram (not reproduced here) in which the
bottom part is configurable by the user, each box being a representation of
a WPS service for doing the match.
Any thoughts?
Perhaps something that could be discussed at TDWG?
One issue would be how to define/specify the list of names to match
against: when you pass the processing to a match routine, how would it
access the names list to match against? Perhaps it could all be based on one
server and people could submit algorithm/WPS services to it?
Hmmmm, will keep dreaming …
Kevin