Opened 16 years ago

Last modified 15 years ago

#20 closed defect

scripts LVS design issues — at Version 2

Reported by: andersk
Owned by:
Priority: minor
Milestone:
Component: web
Keywords:
Cc:

Description (last modified by andersk)

(Imported from help.mit.edu #431727.)

Now that Nagios doesn't suck, we can actually see the scripts outage caused by the AFS server restart every Sunday morning. This made me realize a few things:

  • Our fallback to hodge-podge isn't just an exceptional condition; it happens every week. Thus it's an even worse idea than I thought it was. Viewers will get confused, and search engines may remove pages from their indexes, if they happen to get a 404 error from hodge-podge at the wrong moment.

  • Since the heartbeat script is in the scripts locker, the AFS server that serves it (aegisthus) is a single point of failure. Ideally LVS would check multiple heartbeat scripts in lockers on several different AFS servers, and continue routing connections as long as at least one of them responds.
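
A rough sketch of the "any heartbeat responds" check from the second point, in Python. Everything here is illustrative rather than what scripts actually runs: the function names, the assumption that a heartbeat counts as alive when it returns HTTP 200, and the idea of handing the result back to the LVS monitor as an exit code are all assumptions.

```python
import http.client
import urllib.request
from typing import Callable, Iterable


def probe_http(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout.

    Assumption: a heartbeat script signals "alive" with a plain 200.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed URL, etc.
        return False


def backend_is_healthy(
    heartbeat_urls: Iterable[str],
    probe: Callable[[str], bool] = probe_http,
) -> bool:
    """Healthy if at least one heartbeat responds.

    Each URL is assumed to point at a heartbeat script in a locker on a
    different AFS server, so one server restarting does not fail the check.
    any() stops probing at the first success.
    """
    return any(probe(url) for url in heartbeat_urls)


def exit_code(
    heartbeat_urls: Iterable[str],
    probe: Callable[[str], bool] = probe_http,
) -> int:
    """Map the result to a shell-style exit status (0 = keep routing)."""
    return 0 if backend_is_healthy(heartbeat_urls, probe) else 1
```

With a check like this, the weekly restart of a single AFS server would only take a backend out of rotation if every listed heartbeat happened to live on that server. The trade-off is that a backend with a partially broken AFS client would still be considered healthy as long as one locker is reachable.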

Change History (2)

comment:1 Changed 16 years ago by andersk

  • Description modified (diff)

comment:2 Changed 16 years ago by andersk

  • Description modified (diff)