[Voyage-linux] Broken watchdog?
Robert Rawlins
(spam-protected)
Thu Sep 18 02:21:39 HKT 2008
Ok Guys,
I'm starting to think that this might not be a configuration thing. I
have several boxes running identical configurations, by which I mean that
/etc/watchdog.conf
/etc/default/watchdog
/etc/init.d/watchdog
are all the same on each system when I look at them. However, for some
reason one of them is not rebooting when the tests fail.
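Comparing the files by eye is easy to get wrong, so one option is to print a checksum of each file on every box and diff the output. The following is only a minimal sketch, assuming Python 3.6 or later is available on the boxes; the helper names `file_digest` and `digest_report` are made up for this example.

```python
import hashlib
from pathlib import Path

# The three files named in the message above; adjust if your layout differs.
WATCHDOG_FILES = [
    "/etc/watchdog.conf",
    "/etc/default/watchdog",
    "/etc/init.d/watchdog",
]


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it cannot be read."""
    try:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()
    except OSError:
        return None


def digest_report(paths=WATCHDOG_FILES):
    """Map each path to its digest so the output can be diffed between boxes."""
    return {path: file_digest(path) for path in paths}


if __name__ == "__main__":
    for path, digest in digest_report().items():
        print(f"{digest or 'MISSING-OR-UNREADABLE'}  {path}")
```

Run it on a working box and on the failing one; any line that differs points at a file that is not really identical (trailing whitespace and line endings included).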
I'm starting to wonder if this is down to something else, perhaps the
drivers, or the watchdog itself. Is there any way to manually ping the
watchdog and ask it to reboot the system? Presumably that'll help narrow
down where the problem lies :)
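One way to exercise the device without the daemon is to open /dev/watchdog directly, write to it a few times, and then stop writing while keeping it open. Under the standard Linux watchdog device interface, any write resets the countdown, and on drivers that support "magic close" a final `V` written before close disarms the timer. The sketch below assumes that interface, assumes the watchdog daemon has been stopped first (the device can usually be opened by only one process), and uses a made-up function name `run_test`. With `disarm=False` and a `hold` longer than the hardware timeout, a working watchdog should reset the box, so only run it when a reboot is acceptable.

```python
import os
import time


def run_test(device="/dev/watchdog", kicks=3, interval=1.0, hold=0.0, disarm=False):
    """Kick the watchdog a few times, then either disarm it or let it expire.

    Returns the number of kicks written. All parameter names are illustrative.
    """
    fd = os.open(device, os.O_WRONLY)
    written = 0
    try:
        for _ in range(kicks):
            os.write(fd, b"\0")  # any write resets the countdown
            written += 1
            time.sleep(interval)
        if disarm:
            # "Magic close": drivers that support it stop the timer when 'V'
            # is written before the device is closed.
            os.write(fd, b"V")
        else:
            # Keep the device open without kicking it. If the hardware and
            # driver work, the board should reset before this sleep returns.
            time.sleep(hold)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    # Expect a reboot within the hardware timeout if the watchdog works.
    run_test(kicks=3, interval=1.0, hold=120.0, disarm=False)
    print("Still running: the watchdog did not fire.")
```

If the box survives the hold period, the problem is more likely in the driver or hardware than in watchdog.conf.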
Any suggestions? This unit is out in the field, so doing a rebuild isn't an
option, and I really need this watchdog alive and barking.
Cheers guys,
Rob
From: voyage-linux-bounces+robert.rawlins=thinkbluemedia.co.uk at list.voyage.hk
[mailto:voyage-linux-bounces+robert.rawlins=thinkbluemedia.co.uk at list.voyage.hk]
On Behalf Of Robert Rawlins
Sent: 15 September 2008 14:18
To: voyage-linux at voyage.hk
Subject: RE: [Voyage-linux] Broken watchdog?
Hi Dario,
Thanks for the reply on this. I'm glad you can confirm that it's not my
watchdog.conf which is causing the problem. The watchdog defaults file
(/etc/default/watchdog) on my box looks like this:
# Start watchdog at boot time? 0 or 1
run_watchdog=1
#
# Specify additional watchdog options here (see manpage).
Does that seem normal to you? Also, as a little additional information, the
syslog entry when the box first starts up for watchdog looks like this:
Sep 15 12:03:56 voyage watchdog[3290]: starting daemon (5.2):
Sep 15 12:03:56 voyage watchdog[3290]: int=15s realtime=yes sync=no soft=no mla=0 mem=0
Sep 15 12:03:56 voyage watchdog[3290]: ping: no machine to check
Sep 15 12:03:56 voyage watchdog[3290]: file: no file to check
Sep 15 12:03:56 voyage watchdog[3290]: pidfile: /var/run/myapp.pid
Sep 15 12:03:56 voyage watchdog[3290]: interface: no interface to check
Sep 15 12:03:56 voyage watchdog[3290]: test=none(0) repair=none alive=/dev/watchdog heartbeat=none temp=none to=root no_act=no
Sep 15 12:03:56 voyage watchdog[3290]: was able to ping process 3249 (/var/run/myapp.pid).
Again, is this what you would expect to see on a normal system
configuration?
Many thanks,
Robert
From: Dario Finardi [mailto:d.finardi at gear.it]
Sent: 15 September 2008 09:04
To: Robert Rawlins
Subject: R: [Voyage-linux] Broken watchdog?
With such a configuration my boards are working correctly.
Have you turned on the watchdog script by setting the status variable
run_watchdog=1?
_____
From: voyage-linux-bounces+d.finardi=gear.it at list.voyage.hk
[mailto:voyage-linux-bounces+d.finardi=gear.it at list.voyage.hk] On behalf of
Robert Rawlins
Sent: Thursday, 11 September 2008 16:03
To: voyage-linux at voyage.hk
Subject: RE: [Voyage-linux] Broken watchdog?
Dario,
Thanks for your reply and for taking the time to help out. Sorry for my late
reply; I've had my head buried in code the past couple of days.
My watchdog configuration file looks like this:
realtime = yes
priority = 1
pidfile = /var/run/myapp.pid
watchdog-device = /dev/watchdog
interval = 15
That's all there is to it. In syslog it logs all the checks as I detailed
in my original post, but after the process crashes and watchdog cannot find
the process, I get no more log entries from watchdog and the system is not
rebooted.
Let me know if you need anything else.
Thanks,
Robert
From: Dario Finardi [mailto:d.finardi at gear.it]
Sent: 10 September 2008 14:32
To: Robert Rawlins; voyage-linux at voyage.hk; voyage-linux at voyage.hk
Subject: R: [Voyage-linux] Broken watchdog?
Could you post your watchdog configuration?
_____
From: voyage-linux-bounces+d.finardi=gear.it at list.voyage.hk
[mailto:voyage-linux-bounces+d.finardi=gear.it at list.voyage.hk] On behalf of
Robert Rawlins
Sent: Wednesday, 10 September 2008 12:12
To: voyage-linux at voyage.hk; voyage-linux at voyage.hk
Subject: [Voyage-linux] Broken watchdog?
Guys,
My watchdog doesn't appear to be working quite correctly and I'm hoping you
can help me out. I have it watching a process for me to ensure it's still
alive, and I can see it logging this check in syslog like so:
Sep 10 08:50:11 voyage watchdog[3351]: still alive after 5661 interval(s)
Sep 10 08:50:11 voyage watchdog[3351]: was able to ping process 3250 (/var/run/myapp.pid).
Sep 10 08:50:26 voyage watchdog[3351]: still alive after 5662 interval(s)
Sep 10 08:50:26 voyage watchdog[3351]: pinging process 3250 (/var/run/myapp.pid) gave errno = 3 = 'No such process'
As you can see, it knows it cannot ping my process because it has crashed, yet
the system doesn't appear to reboot itself. It just sits there like a dead
duck :)
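The "errno = 3 = 'No such process'" in the log is what sending signal 0 to a PID that no longer exists returns (ESRCH). To reproduce that check by hand, independently of the daemon, something like the following can be used. It is only an illustration of the kind of check the log message suggests, not the daemon's actual source; `read_pid` and `ping_process` are names invented for this sketch, and the pidfile path is the one from the configuration in this thread.

```python
import errno
import os


def read_pid(pidfile):
    """Read the integer PID stored in a pidfile."""
    with open(pidfile) as f:
        return int(f.read().strip())


def ping_process(pid):
    """Send signal 0 to pid. Return 0 if it can be signalled, else the errno.

    Signal 0 delivers nothing; it only checks existence and permissions.
    errno.ESRCH means no such process; errno.EPERM means the process
    exists but belongs to another user.
    """
    try:
        os.kill(pid, 0)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    pid = read_pid("/var/run/myapp.pid")
    result = ping_process(pid)
    if result == 0:
        print(f"process {pid} is alive")
    else:
        print(f"pinging process {pid} gave errno = {result} = {os.strerror(result)!r}")
```

If this reports ESRCH while the daemon is running and the box still does not reboot, the failure is somewhere after detection (the reboot path or the device), not in the pidfile test.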
I believe this was working in the past, but I can't be certain. This is using
Voyage 0.5 on an ALIX board.
I'd really appreciate some advice on this and on how to debug it if this is an
issue.
Robert