Copyright © 2002 Aaron Hill, Paul Hoadley
2003-03-31
Abstract
This article describes a simple but effective method for
monitoring the state of an ADSL link (or, in
fact, any PPP or PPPoE
link using /usr/sbin/ppp). It has been
observed that in at least one mode of operation
(-ddial), the /usr/sbin/ppp
program can become irretrievably wedged when the link goes down.
The only solution seems to be killing and restarting the
process. This article presents a script that periodically
checks the state of the link and kills and restarts ppp if
required, and describes how to use it.
The script obtains the default gateway from the
tun0 interface and attempts to ping it. If
the ping fails, it pings a secondary IP address that you
hard-code in the script. If both tests fail a set number of
times (2 by default), it restarts the ppp
daemon. It will keep bouncing ppp until the
ping test works again.
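The counting behaviour described above can be sketched in isolation. This is a minimal illustration, not the script itself: the function name record_result and the words it prints are hypothetical, and the caller is assumed to pass the path of a count file and the combined ping status (0 for success, non-zero for failure).

```shell
#!/bin/sh
# Illustrative sketch of the failure-count logic; record_result and its
# output words are hypothetical names, not part of pingmonitor.sh.
failedtrigger=2

# record_result COUNTFILE PING_ERROR
# PING_ERROR is 0 if a ping test succeeded, non-zero if both failed.
# Prints "waiting" or "restart" after a failure, and "recovered" or "ok"
# after a success.
record_result() {
    countfile=$1
    ping_error=$2
    count=0
    if [ -f "$countfile" ]; then
        count=$(head -n 1 "$countfile")
    fi

    if [ "$ping_error" -ne 0 ]; then
        # Another failed run: bump the stored count.
        count=$((count + 1))
        echo "$count" > "$countfile"
        if [ "$count" -ge "$failedtrigger" ]; then
            echo restart
        else
            echo waiting
        fi
    else
        # Success: report a recovery if the trigger had been reached.
        if [ "$count" -ge "$failedtrigger" ]; then
            echo recovered
        else
            echo ok
        fi
        rm -f "$countfile"
    fi
}
```

Calling record_result with a failing status on successive runs prints "waiting" once and then "restart" on every run until a ping succeeds, which mirrors the way the script keeps bouncing ppp.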
The script keeps a count of how many times the ping test
failed, so that when the test succeeds again it can send you an
email saying what happened and how many times the test failed. By
default this email goes to root. Because cron runs the script at a
fixed interval, the failure count in this email indicates roughly
how long the outage lasted.
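As a rough worked example of that estimate, the reported failure count can be multiplied by the cron interval. The helper name estimate_outage_minutes is hypothetical, and the result is only accurate to within about one interval, since the link may have failed at any point between two runs.

```shell
#!/bin/sh
# estimate_outage_minutes FAILED_COUNT INTERVAL_MINUTES
# Hypothetical helper: approximate outage length, assuming the script ran
# once per interval for the whole outage.
estimate_outage_minutes() {
    echo $(( $1 * $2 ))
}

# Seven failed tests with the script run every two minutes:
estimate_outage_minutes 7 2   # prints 14
```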
The following script should be installed as
pingmonitor.sh in the
/usr/local/etc/ directory.
#!/bin/sh
#
# This script tests for network connectivity and restarts ppp if it is found
# to be down.
#
# --- User-modifiable variables ---
# Fallback IP address to ping if the default gateway does not respond
secondaryaddr="139.134.2.2"
# Number of failed pings required to signify link failure
failedtrigger=2
# File to keep track of total number of failed pings
failedcountfile="/usr/local/etc/pingmonitor.missedping.count"
# Email address to send reports to
emailaccount="root"
# Set to the appropriate label in /etc/ppp/ppp.conf
isp="bigpond"
# These options will be given to /usr/sbin/ppp
ppp_opts="-quiet -ddial -nat"
# --- End of user-modifiable variables ---
# Load in system configuration.
if [ -f /etc/defaults/rc.conf ]; then
    . /etc/defaults/rc.conf
    source_rc_confs
elif [ -f /etc/rc.conf ]; then
    . /etc/rc.conf
fi
# only continue if the ppp link should be up
if [ -z "$ppp_enable" ]; then
    # PPP is not configured in the rc files.
    exit 0
elif [ "$ppp_enable" = "NO" ] || [ "$ppp_enable" = "no" ]; then
    # PPP is not wanted
    exit 0
fi
# Set umask
umask 137
# Determine the default gateway for the ADSL link
primaryaddr=`ifconfig tun0 | grep 'inet ' | grep -v 255.255.255.255 | tail -1 | cut -f 2 -d '>' | cut -f 2 -d ' '`
if [ "$primaryaddr" = "" ]; then
    primaryaddr="0.0.0.0"
    secondaryaddr="0.0.0.0"
fi
# Check if we've had any previous failures
if [ -f $failedcountfile ]; then
    pingfailed=`head -n 1 $failedcountfile`
else
    pingfailed=0
fi
# Run the ping for the primary address - our default gateway.
/sbin/ping -c 5 -t 5 -q -m 2 $primaryaddr > /dev/null 2> /dev/null
# If the ping failed, check whether the gateway is filtering echo requests.
if [ $? -ne 0 ]; then
    # Try to pull a TTL EXCEEDED message from the gateway.
    if [ `/sbin/ping -c 1 -m 0 -n -t 1 $primaryaddr 2> /dev/null | grep -i "time to live exceeded" | grep $primaryaddr | wc -l` -eq 1 ]; then
        ping_error=0
    else
        ping_error=1
    fi
else
    # No filtering. The default gateway responded to the initial ping.
    ping_error=0
fi
#
# Ping returns a non-zero exit status only if none of the ECHO_REQUEST
# packets drew an ECHO_REPLY. If we received just one answer then ping
# returns zero, which is perfect for our tests.
#
# Check the ping status and try pinging the secondary address if it failed
if [ $ping_error -ne 0 ]; then
    /sbin/ping -c 5 -t 5 -q -m 7 $secondaryaddr > /dev/null 2> /dev/null
    if [ $? -eq 0 ]; then
        ping_error=0
    else
        ping_error=1
    fi
fi
# Test the error condition
if [ $ping_error -ne 0 ]; then
    # Update and record the failure count
    pingfailed=$(($pingfailed + 1))
    echo $pingfailed > $failedcountfile
    # Test if we've hit our failure trigger
    if [ $pingfailed -ge $failedtrigger ]; then
        # time to restart ppp so kill it first
        /usr/bin/killall ppp > /dev/null 2> /dev/null
        # wait for it to die
        sleep 5
        # really ensure ppp is dead - we can't risk two running
        /usr/bin/killall -9 ppp > /dev/null 2> /dev/null
        # wait again
        sleep 5
        # start up ppp again
        /usr/sbin/ppp $ppp_opts $isp > /dev/null 2> /dev/null
        # our work here is done
    fi
else
    # the ping worked so check if we've just recovered from a failure
    if [ $pingfailed -ge $failedtrigger ]; then
        # we have just recovered so let the admin know
        echo "PING test failed $pingfailed times" | /usr/bin/mail -s "PPP restart on `/bin/hostname -s` at `date '+%H:%M %d/%m/%y'`" $emailaccount > /dev/null 2> /dev/null
    fi
    # all's well now so remove the failure count
    rm -f $failedcountfile > /dev/null 2> /dev/null
fi
# that's it
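The pipeline that the script uses to find the default gateway can be tried on its own against captured ifconfig output. The sketch below wraps the same pipeline in a function named extract_gateway (a hypothetical name) and feeds it a made-up sample. The addresses are documentation placeholders, and the sample assumes the usual point-to-point layout of `inet LOCAL --> REMOTE netmask ...`.

```shell
#!/bin/sh
# extract_gateway: read ifconfig-style output on stdin and print the
# remote (gateway) address, using the same pipeline as pingmonitor.sh.
extract_gateway() {
    grep 'inet ' | grep -v 255.255.255.255 | tail -1 | cut -f 2 -d '>' | cut -f 2 -d ' '
}

# Made-up sample output for a point-to-point interface (an assumption).
sample='tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1492
        inet 203.0.113.10 --> 198.51.100.1 netmask 0xffffffff'

printf '%s\n' "$sample" | extract_gateway   # prints 198.51.100.1
```

If the interface has no inet line, the function prints nothing, which corresponds to the case where the script sets both addresses to 0.0.0.0.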
Follow these steps to adapt the script to your system:
Run traceroute to any host over your ADSL connection to find the IP address of the second hop.
Add this second-hop IP address to the script in the
variable secondaryaddr. Alternatively, you
can use any reliable external IP address, such as the Telstra DNS
server at 139.130.4.4.
Set the failedtrigger variable to a
value indicating the threshold for link failure. The script
will accept that number of consecutive ping test failures
before concluding that the link is down. A good default value
is probably 2.
Change the filename and path in the variable
failedcountfile if there is a more
appropriate place on your system.
Change the variable emailaccount to
an email address you'd like the error reports to go to.
The variable isp should be set to
correspond to a label in the file
/etc/ppp/ppp.conf. This label name will
be passed to /usr/sbin/ppp.
Set the variable ppp_opts to contain
any options that should be passed to
/usr/sbin/ppp.
Make sure you can manually ping the secondary IP address. You might have to modify your firewall rules, depending on your setup.
Make sure you can manually ping your default gateway,
or that your firewall allows the gateway to send TTL
EXCEEDED messages to the
ADSL interface. This is ICMP type 11 (icmptypes 11 in an
ipfw rule).
The script should be run periodically using
cron. Add the following entry to
/etc/crontab:
# PING Monitor - check the ADSL connection every two minutes
0-59/2 * * * * root /usr/local/etc/pingmonitor.sh
Restart cron by running: kill -HUP `cat /var/run/cron.pid`.
Test the script, for example by unplugging the phone line from the back of the modem for long enough to trigger a restart.
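For the firewall steps above, the following sketch prints ipfw commands that would admit the ICMP traffic the script relies on. It only prints the commands so they can be reviewed first. The function name print_monitor_icmp_rules is hypothetical, and the rules assume a simple ruleset with no rule numbers. Where they belong relative to your existing rules depends on your setup.

```shell
#!/bin/sh
# print_monitor_icmp_rules IFACE
# Hypothetical helper: print (do not apply) ipfw commands for the ICMP
# traffic used by the ping tests on the given interface.
print_monitor_icmp_rules() {
    iface=$1
    # Outgoing echo requests (ICMP type 8) sent by the ping tests.
    echo "ipfw add allow icmp from me to any out via $iface icmptypes 8"
    # Incoming echo replies (type 0) and TTL exceeded messages (type 11).
    echo "ipfw add allow icmp from any to me in via $iface icmptypes 0,11"
}

print_monitor_icmp_rules tun0
```

Once the printed rules look right for your ruleset, they can be added to your firewall script or run by hand as root.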
The author of this document is Paul A. Hoadley. The
author of the pingmonitor.sh script is Aaron Hill. Feel
free to send details of any errors in this document by
email.