LSID conformance test tool

Thu Mar 9 08:32:31 CET 2006

I'm not sure what you mean by "hack one of the existing clients", but it is
always good to have two independent testbeds, preferably coded in different
programming languages. Otherwise, you run the risk of memorializing as
correct whatever a single one accepts. Further, don't assume that two
randomly chosen clients selected as a base are independent. They may be
using common assumptions, e.g., common RDF parsers. It was quite a while
before the SDD team realized that XML Spy was accepting as correct some
schema syntax that wasn't. We didn't discover this until we began building
tools that depended on having valid schemas. That's a smaller set of tools
than one might imagine (which could be at the root of some justly deserved
criticism of XML-Schema: its use is sometimes more advice than consent). We
now tend to run everything through at least two parsers. (Three actually,
because XML Spy, amazingly, uses different parsers in its Text view and in
its Schema view!). As a small aid to parties with no ability or inclination
to invoke standalone XML validating parsers, we wrote a simple wrapper
around Apache Xerces that let it be invoked by anybody, even those who would
rather push mouse buttons and use GUI file browsers than invoke a single
keystroke in a shell command window...
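
A minimal sketch of the "at least two parsers" practice, applied to LSID syntax checking rather than XML Schema. Python is chosen for illustration, and `disagreements`, `regex_checker`, and `split_checker` are hypothetical names, not part of any existing LSID client. The harness runs every input through every checker and reports the inputs on which they disagree. It assumes the LSID shape urn:lsid:<authority>:<namespace>:<object>[:<revision>] and, as with URNs generally, a case-insensitive "urn:lsid" prefix. Agreement proves little if the checkers share assumptions. Here the second checker deliberately makes the prefix case-sensitive, which is the kind of silent assumption a second testbed is meant to expose.

```python
import re
from typing import Callable, Dict, Iterable, List

# A checker returns True when it accepts its input as valid.
Checker = Callable[[str], bool]


def disagreements(inputs: Iterable[str], checkers: Dict[str, Checker]) -> List[dict]:
    """Return the inputs on which the checkers do not all agree.

    Agreement is not proof of correctness (the checkers may share
    assumptions), but every disagreement deserves a human look.
    """
    report = []
    for text in inputs:
        verdicts = {name: check(text) for name, check in checkers.items()}
        if len(set(verdicts.values())) > 1:
            report.append({"input": text, "verdicts": verdicts})
    return report


# Hypothetical checker 1: a regular expression for
# urn:lsid:<authority>:<namespace>:<object>[:<revision>],
# treating the "urn:lsid" prefix as case-insensitive (assumption).
LSID_RE = re.compile(r"(?i:urn:lsid):[^:]+:[^:]+:[^:]+(?::[^:]+)?")


def regex_checker(lsid: str) -> bool:
    return LSID_RE.fullmatch(lsid) is not None


# Hypothetical checker 2: a naive split on ":". It deliberately compares
# the prefix case-sensitively.
def split_checker(lsid: str) -> bool:
    parts = lsid.split(":")
    return parts[:2] == ["urn", "lsid"] and len(parts) in (5, 6) and all(parts)


if __name__ == "__main__":
    samples = [
        "urn:lsid:example.org:names:123",
        "URN:LSID:example.org:names:123",
        "urn:lsid:example.org:names",
    ]
    checkers = {"regex": regex_checker, "split": split_checker}
    for row in disagreements(samples, checkers):
        print(row)
```

When run as a script, this should report only the upper-case URN:LSID: sample. The other two samples are accepted or rejected by both checkers alike.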

The only specific a priori concern about built-in assumptions that comes to
mind is the one that pervaded early TDWG LSID discussion, namely, that LSID
Resolution Services might be conflated with LSID Resolution Discovery
Services, and that the former might get DNS notions inappropriately
ingrained in them merely because such notions are ingrained in the only
current Resolution Discovery Service scheme ever mentioned by anybody, the
DDDS/DNS scheme of Section 8.3 of the LSID spec. [In particular, a
standalone resolution client like Launchpad must, ipso facto, have some
Resolution Discovery Service embedded in it.]
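
In a test tool, one way to keep the two apart is to make discovery a pluggable step whose only output is an endpoint. The resolution-service tests would then take that endpoint as input and never see DNS at all. The sketch below assumes Python 3.8+ (for `typing.Protocol`). `DiscoveryService`, `StaticDiscovery`, `authority_of`, and `discover` are hypothetical names, and the endpoint URL is a placeholder. No DNS lookup is implemented. A DDDS/DNS-based discovery class would simply be another implementation of the same interface.

```python
from typing import Dict, Optional, Protocol


class DiscoveryService(Protocol):
    """Anything that maps an LSID authority to a resolution service endpoint."""

    def find_resolver(self, authority: str) -> Optional[str]:
        ...


class StaticDiscovery:
    """Discovery from a fixed table, e.g. for resolvers with no DNS presence.

    A DNS-based implementation would satisfy the same interface; it is not
    sketched here.
    """

    def __init__(self, table: Dict[str, str]) -> None:
        # Assumption: authorities are DNS-like names, compared case-insensitively.
        self._table = {name.lower(): url for name, url in table.items()}

    def find_resolver(self, authority: str) -> Optional[str]:
        return self._table.get(authority.lower())


def authority_of(lsid: str) -> str:
    """Extract <authority> from urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[2]


def discover(lsid: str, discovery: DiscoveryService) -> str:
    """Discovery step only. Tests of the resolution service itself should be
    handed the returned endpoint and never consult DNS on their own."""
    endpoint = discovery.find_resolver(authority_of(lsid))
    if endpoint is None:
        raise LookupError(f"no resolution service known for {lsid!r}")
    return endpoint


if __name__ == "__main__":
    table = StaticDiscovery({"example.org": "http://resolver.example.org/"})
    print(discover("urn:lsid:example.org:names:123", table))
```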

As in any enterprise, it's somewhat likely that the cultural norms of TDWG
membership will also creep in, but it is very difficult to characterize those
norms in ways that would inform conformance software. Operative words are likely
to include "systematist", "taxonomist", "museum", and "kingdom".(*) To me,
this means, ultimately, having a clear separation between tests of
\intended/ special things about TDWG-blessed LSID resolvers and tests of
things that are identified as applying to all LSIDs, in case one can find such
things. All this argues for tools that are at least in part meaningful when
applied against resolution services that have nothing to do with TDWG.
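
A minimal sketch of that separation, assuming each test is tagged with a profile when it is written, so that a non-TDWG resolver can be run against the generic suite alone. The profile names, `conformance_test`, `run`, both test functions, and the layout of the `observed` dictionary are all hypothetical. The TDWG test in particular checks a placeholder string, not any real TDWG vocabulary.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A test receives whatever the harness observed from one resolver (a dict
# whose layout is an assumption of this sketch) and returns pass/fail.
Test = Callable[[dict], bool]
REGISTRY: List[Tuple[str, str, Test]] = []


def conformance_test(profile: str) -> Callable[[Test], Test]:
    """Decorator that files a test under a profile, e.g. "generic" or "tdwg"."""
    def register(fn: Test) -> Test:
        REGISTRY.append((profile, fn.__name__, fn))
        return fn
    return register


@conformance_test("generic")
def returns_metadata(observed: dict) -> bool:
    # Meaningful for any LSID resolution service, TDWG or not.
    return bool(observed.get("metadata"))


@conformance_test("tdwg")
def mentions_tdwg_vocabulary(observed: dict) -> bool:
    # Hypothetical TDWG-only expectation; the marker string is a placeholder.
    return "TDWG_VOCABULARY_PLACEHOLDER" in observed.get("metadata", "")


def run(observed: dict, profiles: Iterable[str]) -> Dict[str, bool]:
    """Run only the tests whose profile was requested."""
    wanted = set(profiles)
    return {name: test(observed) for profile, name, test in REGISTRY if profile in wanted}


if __name__ == "__main__":
    observed = {"metadata": "<rdf:RDF/>"}      # stand-in for a real response
    print(run(observed, ["generic"]))          # a non-TDWG resolver
    print(run(observed, ["generic", "tdwg"]))  # a TDWG resolver
```

The design choice is simply that TDWG-specific expectations live in their own bucket. A failure there then says nothing about general LSID conformance.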

Ideally, conformance test software would cleverly sneak in functionality
that tests whether the resolution service author has actually read the LSID
spec. Hah, hah, just serious.

Bob
(*) Probably because of TDWG's history, those four words seem to me to enter
into a lot of conversations that also include the phrase "Oh, we didn't think
of that case."
I don't mean to denigrate any practitioners in any of these fields and
intend no offense. I mean only to point out that any system architect comes
with baggage from the environment in which they usually work, and that it is
quite difficult to recognize if and when that baggage is in the way. For
conformance software, though, recognizing it is critical.
Of course I have the advantage of knowing close to no biology, so naturally
there are never hidden assumptions in the biodiversity information systems I
design. [That sentence was sarcastic].

On 3/9/06, Ricardo Scachetti Pereira <ricardo at tdwg.org> wrote:
>
>     Hi all,
>
>     There have been a number of new LSID resolvers (prototypes) popping up
> here and there due to the recent adoption of the LSID specification by the
> Biodiversity Informatics community (see http://wiki.gbif.org/guidwiki
> and
> http://wiki.gbif.org/guidwiki/wikka.php?wakka=PrototypingWorkingGroup
> for more details).
>
>     Since then I found myself spending quite some time trying to explore
> the details of each implementation and also troubleshooting some of the
> resolvers. Although there are a number of tools available for testing,
> such as both IE and Mozilla Launchpads and the Biopathways web resolver,
> I often find myself trying to get more info out of the resolvers using
> ad-hoc techniques, such as hacking URLs in the web browser address box.
>
>     I thought it would be nice to have a more automated tool that would
> tell us all about each new resolver that pops up out there. I think
> developers setting up their own resolvers would benefit from such a tool as
> well.
>
>     Initially, the idea was to develop a kind of LSID resolver debug
> tool, but it quickly evolved into some kind of LSID standard conformance
> test.

> [...]

> So before I start doing this on my own, I would be grateful if you
> could share your thoughts regarding the development of an automated
> conformance testing tool for LSID resolvers, such as requirements, past
> and ongoing related activities, or any ideas on the matter.
> [...]
>
>     The other issue I'm struggling with at the moment is the question of how
> to implement it. The first thing that came to my mind was to hack one of
> the existing clients. Now I'm not sure whether to use IE Launchpad and
> thus make this available as a desktop client or to use the Perl or Java
> client stacks to implement the tool as a web application.

>     Anyway, thoughts are really appreciated.
>
>     Best regards,
>
> Ricardo
>
>
> PS. Sorry about the cross-post. This message is being sent to both the
> LSID developers and the TDWG/GBIF GUID mailing lists, as I believe it
> will interest both communities.
>
