Hi Kevin,
you're good:
---
python responsetests.py -v 20 -p "http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI?op=inventory&count=true&concept=http://rs.tdwg.org/ontology/voc/TaxonOccurrence#/rdf:RDF/to:TaxonOccurrence/to:collector"
Testing: http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI?op=inventory&count=true&concept=http://rs.tdwg.org/ontology/voc/TaxonOccurrence#/rdf:RDF/to:TaxonOccurrence/to:collector
INFO:root:[HTTP HEADER] content-length: 12019
INFO:root:[HTTP HEADER] x-powered-by: ASP.NET
INFO:root:[HTTP HEADER] set-cookie: ASP.NET_SessionId=uxfg0l55colhnjmxzinwxp55; path=/; HttpOnly
INFO:root:[HTTP HEADER] x-aspnet-version: 2.0.50727
INFO:root:[HTTP HEADER] server: Microsoft-IIS/6.0
INFO:root:[HTTP HEADER] connection: close
INFO:root:[HTTP HEADER] pragma: no-cache:
INFO:root:[HTTP HEADER] cache-control: private
INFO:root:[HTTP HEADER] date: Thu, 08 Nov 2007 21:29:37 GMT
INFO:root:[HTTP HEADER] content-type: text/xml; charset="utf-8"
== Results ==
Test: "HTTP Status" [OK]: No Worries
Test: "Response Encoding" [OK]: No Worries
Test: "Document Encoding" [OK]: No Worries
---
The GBIF REST service doesn't do quite so well though:
---
python responsetests.py -v 20 -p "http://newportal.gbif.org/ws/rest/provider/list?stylesheet=&maxresults=10"
Testing: http://newportal.gbif.org/ws/rest/provider/list?stylesheet=&maxresults=10
INFO:root:[HTTP HEADER] date: Thu, 08 Nov 2007 21:37:01 GMT
INFO:root:[HTTP HEADER] transfer-encoding: chunked
INFO:root:[HTTP HEADER] connection: close
INFO:root:[HTTP HEADER] content-type: text/xml
INFO:root:[HTTP HEADER] server: Apache/2.0.52 (Red Hat)
== Results ==
Test: "HTTP Status" [OK]: No Worries
Test: "Response Encoding" [WARNING]: No character encoding was specified for the text/[*+]xml content type | "Content-type: text/xml"
Test: "Response Encoding" [WARNING]: Falling back to the RFC 3023 default of us-ascii character encoding.
Test: "Document Encoding" [ERROR]: Bozo exception. | Document declared as us-ascii, but parsed as utf-8
---
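For anyone curious, the "Response Encoding" check above boils down to
something like the following. This is only a rough sketch and not the
actual code in responsetests.py: the function name effective_charset is
made up for illustration, and it only covers the RFC 3023 defaults
discussed in this thread.

```python
from email.message import Message


def effective_charset(content_type):
    """Return (charset, explicit) for a Content-Type header value.

    Hypothetical helper, for illustration only. `explicit` is True when
    the header carried a charset parameter. A charset of None means the
    header alone does not decide the encoding.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()   # lower-cased, e.g. "text/xml"
    charset = msg.get_param("charset")    # None when the parameter is absent
    if charset:
        return charset.lower(), True
    # text/xml (or text/*+xml) without a charset: RFC 3023 says us-ascii.
    if media_type == "text/xml" or (
        media_type.startswith("text/") and media_type.endswith("+xml")
    ):
        return "us-ascii", False
    # application/xml and anything else: not decided by the header here.
    return None, False


print(effective_charset('text/xml; charset="utf-8"'))  # ('utf-8', True)
print(effective_charset("text/xml"))                   # ('us-ascii', False)
```

The first call matches the HerbIMI header above, the second matches the
GBIF one.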
These tests (plus a whole bunch more) will be available as an online
service real soon, with the intent being to help data providers test
their services fairly rigorously before exposing them to the real
world.
regards,
Dave V.
On Nov 8, 2007 1:46 PM, Kevin Richards <RichardsK@landcareresearch.co.nz> wrote:
>
>
> Good spotting Dave.
>
> This is fixed now in the TapirDotNET implementation of Tapir.
> I have updated the HerbIMI TapirDotNET implementation at
> http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI
> So if you would like to check this provider, I would be interested in
> your results.
>
> Kevin
>
> >>> "Dave Vieglais" <vieglais@ku.edu> 9/11/2007 7:59 a.m. >>>
>
>
> Hi Everyone,
> I've come across a minor issue with some existing TAPIR installations
> that should be easily fixed and will likely save some frustrations
> down the road.
>
> The TAPIR spec
> (http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-07-18.html#toc16)
> indicates a response Content-type of "text/xml". RFC 3023
> (http://www.ietf.org/rfc/rfc3023.txt) indicates that in this case,
> when no "charset" parameter is specified in the HTTP response header,
> the implied character encoding of the response document is "us-ascii"
> (see s8.5).
>
> so for example:
>
> Good:
> response header = Content-type: text/xml; charset="utf-8"
>
> response document signature = <?xml version="1.0" encoding="utf-8"?>
>
> result = document is assumed to be UTF-8
>
> Not so good:
> response header = Content-type: text/xml
>
> response document signature = <?xml version="1.0" encoding="utf-8"?>
>
> result = document is assumed to be us-ascii
>
>
> None of the TAPIR installations that I've examined so far sets a
> charset value, so consumer applications will assume a character
> encoding of "us-ascii", which is likely to cause problems with any
> non-ASCII data. This was also a significant issue for DiGIR
> provider installations.
>
> The solution is likely to be quite simple, and there seem to be two
> basic options:
>
> 1. Configure the webserver / application to insert a charset value of
> "UTF-8" to avoid the consumer falling back to the default of us-ascii.
>
> or
>
> 2. Return a Content-type of "application/xml" or one of its subtypes.
> In this case RFC 3023 leaves the encoding to the document itself (BOM
> or XML declaration), with UTF-8 as the default when neither is present.
>
> Note that simply specifying the content type does not automatically
> make the response properly encoded - it is still up to the web
> application (TAPIR in this case) to ensure that the output stream is
> actually UTF-8 encoded.
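
On that last point, a provider can sanity-check the bytes it is about to
send against the charset it advertises. A minimal sketch, assuming a
made-up helper name (decodes_as) and a made-up sample document; it only
checks that the bytes decode, not that the XML is well formed:

```python
def decodes_as(body, charset):
    """Return True if the raw response bytes decode cleanly as `charset`.

    Hypothetical helper, for illustration only.
    """
    try:
        body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Bytes are invalid for this charset, or the charset name is unknown.
        return False
    return True


# A UTF-8 document containing one non-ASCII character (u-umlaut).
sample = (
    '<?xml version="1.0" encoding="utf-8"?><name>M\u00fcller</name>'
).encode("utf-8")

print(decodes_as(sample, "utf-8"))     # True
print(decodes_as(sample, "us-ascii"))  # False
```

The second result is what a consumer runs into when it falls back to
us-ascii for a document that is really UTF-8.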
>
> regards,
> Dave V.
> _______________________________________________
> tdwg-tapir mailing list
> tdwg-tapir@lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
>