Resent-Date: Mon, 9 Mar 1998 17:04:31 -0800 (PST)
To: listowners-announce(a)rootsweb.com, rootsweb-help(a)rootsweb.com,
STATE-COORD-L(a)rootsweb.com
cc: "Dr. Brian Leverich" <leverich(a)rootsweb.com>
Reply-to: "Dr. Brian Leverich" <leverich(a)rootsweb.com>
Subject: Serious Problems Out on the Net (and at home)
Date: Mon, 09 Mar 1998 16:58:50 -0800
From: Brian Leverich <leverich(a)rootsweb.com>
Resent-From: listowners-announce(a)rootsweb.com
X-Mailing-List: <listowners-announce(a)rootsweb.com> archive/latest/37
X-Loop: listowners-announce(a)rootsweb.com
Precedence: list
Resent-Sender: listowners-announce-request(a)rootsweb.com
Resent-Bcc:
As many of you have undoubtedly noticed, the last week has been a
mess out on The Net.
Starting one week ago, some clown has been strobing all the hosts
and routers on the Internet with an attack that is fatal (locks the
machine such that it has to be power cycled to restart) to Win 95
and Win NT boxen.
While this didn't directly affect RootsWeb (we're wall-to-wall Unix
servers), it did kill an NT server belonging to one of our network
neighbors.
Unfortunately that NT server was providing reverse DNS for some of
RootsWeb's boxes, so the NT server's repeated crashes caused major
problems for us.
It especially slowed mail deliveries from our list servers.
To deal with this problem, the NT server has been "patched" to
reduce its vulnerability to attack and we have taken various other
technical measures to make us safer.
Last night at 2am Sprintlink, with no warning to any of its
customers, attempted to upgrade the operating systems on its
backbone routers throughout the country. Things went badly south,
and the Net is still very much crippled right now.
The Sprintlink disaster essentially disconnected pieces of RootsWeb
from The Net for several hours until we could adjust our routers to
work around the down pieces of Sprintlink's backbone.
As the Sprintlink disaster rippled through The Net, it also took
down the routers at our feed from CRL. That happened while Karen
and I were away from our keyboards, and the CRL outage wounded our
main list server.
Some mail was lost, and we suspect some digests were damaged. We'll
be fixing things for a few days. :(
Ultimately there is no way that an individual site can insulate
itself from a mess like last night's, but we will be able to better
protect ourselves as RootsWeb grows and operates more T1 connections
to more Internet backbone carriers.
Finally, this morning at 10am the textbase hard drive in the search
engine box glitched and wedged, killing that server. Karen and I
were working at that machine's console to revive it when the CRL
link went down, which was why we couldn't save the mail server.
Disk problems like this should go away as we upgrade all our servers
to use redundant RAID-5 disk arrays, rather than depending on
single drives.
*sigh*
Thank goodness weeks like this don't happen too often. As noted
above, we'll be doing what we can to insulate ourselves from these
sorts of problems in the future.
One other thought: Karen and Brian will be away from our consoles all
day tomorrow traveling on business that is critical to RootsWeb's
future. It worries us a lot to leave the servers without someone
physically at the consoles, but there will be some great sysadmins
monitoring the site remotely and we think the potential benefits of
this trip outweigh the risks of leaving the servers for a day.
We apologize in advance if anything goes wrong, and we will be back
late tomorrow night in any case.
Cheers, B.
--
Dr. Brian Leverich Co-moderator, soc.genealogy.methods/GENMTD-L
RootsWeb Genealogical Data Cooperative
http://www.rootsweb.com/
P.O. Box 6798, Frazier Park, CA 93222-6798 leverich(a)rootsweb.com