[tdwg-tag] Recent DiGIR service disruption

8 Oct 2008

Apologies for cross-posting.

Hopefully, most of you did not notice this, but if you did, here's  
what happened.  A problem with resolving the digir.net domain was  
recently identified by John Wieczorek and possibly others.  This
problem would have affected most, if not all, DiGIR providers currently
operating.  Once identified, the issue was promptly resolved, and
ongoing problems are unlikely.

This message provides further information about the problem, how it  
was resolved, and how recurrence will be avoided to ensure continued  
operation of the DiGIR infrastructure.

= Problem =

The DNS entry "digir.net" is an A record that must point to an IP
address.  For several years digir.net resolved to the IP address
66.35.250.210, a Source Forge machine configured as a VHOST for
digir.net.  At some point in the last 48 hours, 66.35.250.210
ceased to respond as a VHOST for digir.net.  As a result, any web
service that expected to retrieve content from digir.net (e.g.
http://digir.net/schema/protocol/2003/1.0/digir.xsd) failed, with
significant operational repercussions.  For technical reasons, most
other digir.net DNS entries (e.g. www.digir.net) were unaffected.
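
For anyone wanting to check this kind of failure themselves, here is a
minimal sketch in Python.  With name-based virtual hosting, the server
picks the site to serve from the HTTP Host header, so a machine can be
reachable at its IP address and still not answer for digir.net.  The
function name probe_vhost and its parameters are my own invention for
illustration; this is not a tool that was used in the diagnosis.

```python
import http.client


def probe_vhost(ip, host, path="/", port=80, timeout=10):
    """Request `path` from the server at `ip`, presenting `host` as the Host header.

    Returns the HTTP status code, or None if the connection fails.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Supplying Host explicitly stops http.client from sending the IP instead.
        conn.request("GET", path, headers={"Host": host})
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


# Example call (address and path are taken from this message and may no
# longer be current):
# probe_vhost("216.34.181.97", "digir.net", "/schema/protocol/2003/1.0/digir.xsd")
```

A status other than 200, or None, from a machine that used to serve
the schema would suggest the VHOST configuration has changed.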

= Resolution =

The new Source Forge VHOST servicing www.digir.net was identified as  
"216.34.181.97", and the DNS entry for digir.net was updated  
appropriately.  Since the TTL on that DNS record was set relatively  
short (600 seconds), the DNS infrastructure picked up the new  
information fairly quickly, and testing indicates that digir.net URLs  
are now being correctly resolved from several locations around the  
world.

= What You Need To Do =

Nothing.  You should not notice any ongoing disruption.

= Ensuring Continued Operation =

There are three major issues that led to this malfunction (#2 and #3
are informational only, for developers of new services and tools):

  1. We have no control over Source Forge, nor should the digir.net
infrastructure be dependent on it.

  2. Much of the existing installed DiGIR infrastructure references
"digir.net" directly, and so DNS-level indirection is not practical.

  3. HTTP redirects need to be supported in service implementations  
and their clients.  I suspect that a lot of DiGIR software services  
and clients do not follow redirects properly.

== Issue #1 ==

A DNS monitoring service is now in place that will update the DNS
entries if the Source Forge machine changes again.
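
To illustrate the idea only (this is not the monitoring service that
is actually deployed, whose details are not covered here), a check of
this kind might look like the following Python sketch.  It assumes
that www.digir.net keeps tracking the correct machine, which is what
was observed during this incident.  The names resolve_ipv4 and
check_drift are hypothetical.

```python
import socket


def resolve_ipv4(name):
    """Return the sorted list of distinct IPv4 addresses that `name` resolves to."""
    infos = socket.getaddrinfo(name, 80, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_drift(apex, reference, resolve=resolve_ipv4):
    """Compare the addresses of `apex` with those of `reference`.

    Returns the reference addresses if they differ from the apex's
    (meaning the apex record needs updating), or None if they agree.
    """
    wanted = resolve(reference)
    current = resolve(apex)
    return wanted if wanted != current else None


# Example call; what to do with the result (e.g. calling a DNS provider's
# update interface) is deliberately left out of this sketch:
# new_addresses = check_drift("digir.net", "www.digir.net")
```

Passing the resolver in as a parameter keeps the comparison logic
testable without touching the network.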

Also, a replica of the DiGIR web material currently hosted by Source  
Forge is being created on a machine that we have complete control over  
and is located on a fast internet backbone.  The replica machine will  
be monitored with a system that will provide automatic notification to  
appropriate contacts in the event of failure.  This machine will be  
located at the University of Kansas at least initially.  Once  
operational, the DNS entry for digir.net will be modified to point to  
this machine.

DNS services are handled by dyndns.org.  Since their inception they
have provided 100% uptime for their clients, which include domains
such as CNN, Mozilla, and Twitter.  Since digir.net was an early
adopter of their services, the DNS service provided by DynDNS does not
expire.  The domain registration does expire, however, and this
relatively minor expense is covered by the University of Kansas with
assistance from the National Science Foundation.

== Issue #2 ==

In retrospect, something like "http://schema.digir.net" rather than
"http://digir.net" should have been used.  This would have provided a
low-level (DNS) mechanism for indirection that would have been
unaffected by this
unanticipated hardware change.  Altering these entries in DiGIR  
providers and tools is really not practical at this stage, so it is  
unlikely that this issue can be directly addressed.

== Issue #3 ==

HTTP offers a mechanism for temporary and permanent redirection.  In  
order to take advantage of this functionality, it is necessary for  
HTTP clients to properly interpret the HTTP 301 and 307 status codes  
and the respective response headers to determine the new location of  
the resource being resolved.  This is the mechanism employed by PURLs  
and TinyURL, for example.  Unfortunately, many developers take
shortcuts when building tools to retrieve content from URLs and avoid
the small amount of additional work necessary to handle redirect
messages.  Wherever possible, developers of services and clients
should use libraries that at least handle HTTP redirects properly,
which will allow the use of PURLs and other mechanisms for providing
an additional level of indirection in URL resolution where necessary.
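
As a rough illustration of how little work is involved, here is a
sketch in Python that follows redirects by hand: it reads the status
code, and if it is a redirect, takes the new location from the
Location response header.  The function name
fetch_following_redirects and the hop limit are my own choices for the
example.  In practice, Python's standard urllib.request.urlopen
already follows 301 and 307 responses for GET requests, so using it is
usually simpler than writing this yourself.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_following_redirects(url, max_hops=5, timeout=10):
    """GET `url`, following HTTP redirects; return (final_url, status, body)."""
    for _ in range(max_hops + 1):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
        try:
            target = parts.path or "/"
            if parts.query:
                target += "?" + parts.query
            conn.request("GET", target)
            resp = conn.getresponse()
            body = resp.read()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        return url, status, body
    # Guard against redirect loops.
    raise RuntimeError("too many redirects")
```

The hop limit matters: without it, two URLs that redirect to each
other would keep a client busy forever.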

regards,
   Dave Vieglais

---
Biodiversity Research Center
University of Kansas