On Thu, Aug 27, 2009 at 3:57 AM, Jim Croft <jim.croft@gmail.com> wrote:
> Building mission-critical stuff around a single point of failure
> without distributed replicated redundancy is amateurish, ultimately doomed to failure, and I am amazed that everybody does it. The drive for an easy solution with smart response times wins every time. IMO, Australia's Virtual Herbarium took a step backward when it moved from a distributed to a cached solution without building in fail-over redundancy. Yes, the new version is quicker, but when it does not work you are clean out of luck. If we are going to build anything that is going to become mission-critical and expect people to use it, then I want more than one of them.
The recent LSID SourceForge failure was not a technical failure; it was an administrative one. Many people had the password required to fix the site, but no one was taking care of the problem. Having multiple A records pointing to redundant servers would not have helped if none of those servers had the correct content.
Assuming the administrative powers are willing to make the necessary resolution configuration changes (such as establishing A records or reconfiguring a server), the technical setup can be improved over time as needed. That is, one can start out with a single technical point of failure and switch to a redundant system as resources and motivation improve. But all the technical prowess in the world can't make a URI or handle resolve properly if those who control the namespace are not willing to make it do so.
This is, of course, an argument in favor of systems (such as the Linnaean one) that don't rely on a central resolution authority, but rather allow consumer choice by encouraging a market for name resolvers (Harvard's library doesn't have that genus revision? Try Yale's). Instituting something like this in any general way is probably beyond the abilities and interests of today's purveyors of web infrastructure. The fact that ICANN and DNS work as well as they do prevents anyone from working on an administratively decentralized alternative.
That said, I agree completely that redundancy is a great thing for both technical and administrative reasons, and should be encouraged even if the replica has to be accessed via different URIs from the original. Ad hoc client (or server, or proxy) software can convert published (unresolvable) URIs to replica URIs for the time being, and in the future there might even be standards for configuring resolver choice - perhaps even inside web browsers.
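For what it's worth, here is a minimal sketch of what that kind of ad hoc client-side rewriting could look like, in Python, assuming nothing more than a hand-maintained table of replica prefixes. The base URIs below are made-up placeholders, not real services:

# Hypothetical sketch: rewrite a published URI to candidate replica URIs
# and return the first copy that actually resolves. The mirror prefixes
# here are placeholders, not real services.
import urllib.request
import urllib.error

# Map a published (possibly dead) base URI to known replica base URIs.
MIRRORS = {
    "http://lsid.example.org/": [
        "http://replica-a.example.net/lsid/",
        "http://replica-b.example.edu/lsid/",
    ],
}

def resolve_with_replicas(published_uri, timeout=5):
    """Try the published URI first, then each replica rewrite in turn."""
    candidates = [published_uri]
    for prefix, replicas in MIRRORS.items():
        if published_uri.startswith(prefix):
            suffix = published_uri[len(prefix):]
            candidates.extend(base + suffix for base in replicas)
    for uri in candidates:
        try:
            with urllib.request.urlopen(uri, timeout=timeout) as resp:
                if resp.status == 200:
                    return uri, resp.read()  # first working copy wins
        except (urllib.error.URLError, OSError):
            continue  # this copy is down; try the next one
    raise LookupError("no resolver had a copy of " + published_uri)

The same mapping table could just as easily live in a proxy, or, if resolver-choice configuration ever gets standardized, in the browser itself.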
Jonathan