
waja · 2013-10-01T07:01:15Z

Just turning the patch attached to GitHub issue #867 into a pull request.

"Hi,
I created a patch for check_disk (v2025/1.4.13) that can handle hanging NFS filesystems. Imagine you mounted a share from a NAS at the mountpoint /mnt. If the storage device, or whatever acts as the NFS server, dies or encounters a network problem, you will see messages like "NFS server nas.naprax.de not responding still trying", and every process accessing files inside the /mnt directory will be blocked, maybe forever. Depending on the mount options, the hanging processes may even be immune to kill -9. This also applies to check_disk. If you have a service monitoring the usage of /mnt with "check_disk ... -p /mnt", it will be blocked as well, and Nagios will report a timeout. The bad thing is that every minute another check_disk is started, which then hangs too. Sooner or later your process list fills up with unkillable check_disk processes.
The critical piece of code inside check_disk is the stat system call, which is currently needed to find out whether a path exists at all. If that stat call hits a directory mounted from a dead NFS server, it does not return an error code; it does not return at all.
I found out that although processes cannot be killed in such a situation, threads can. So I rewrote the stat_path subroutine so that the critical stat is executed in its own thread. If this thread does not terminate within the --timeout interval, it is considered to be blocked by a dead NFS filesystem, and the thread is detached.
I tested it on Linux 2.6.18 (gcc 4.1.2) and Solaris 10/x86 (gcc 3.4.3)."
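
For readers who want to see the shape of the idea, here is a minimal sketch of running stat in a worker thread and giving up after a timeout. It is not the attached patch: the names `stat_with_timeout`, `stat_job`, `stat_worker` and `job_free` are made up for illustration, and the sketch detaches the worker right away and simply abandons it on timeout, which is an assumption about how the detaching could be done. Build with `-pthread`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical job record shared by the caller and the worker thread. */
struct stat_job {
    pthread_mutex_t lock;
    pthread_cond_t  finished;
    int             done;       /* worker: stat() has returned */
    int             abandoned;  /* caller: gave up waiting */
    int             rc;         /* return value of stat() */
    int             err;        /* errno captured right after stat() */
    struct stat     sb;
    char           *path;       /* private copy, owned by the job */
};

static void job_free(struct stat_job *job)
{
    pthread_cond_destroy(&job->finished);
    pthread_mutex_destroy(&job->lock);
    free(job->path);
    free(job);
}

static void *stat_worker(void *arg)
{
    struct stat_job *job = arg;
    int rc = stat(job->path, &job->sb);  /* may never return on a dead NFS mount */
    int err = errno;
    int abandoned;

    pthread_mutex_lock(&job->lock);
    job->rc = rc;
    job->err = err;
    job->done = 1;
    abandoned = job->abandoned;
    pthread_cond_signal(&job->finished);
    pthread_mutex_unlock(&job->lock);

    if (abandoned)
        job_free(job);  /* the caller gave up, so the worker cleans up */
    return NULL;
}

/* Returns 0 on success, -1 on failure with errno set.
 * errno is ETIMEDOUT if stat() did not return within timeout_sec. */
int stat_with_timeout(const char *path, struct stat *out, unsigned timeout_sec)
{
    struct stat_job *job = calloc(1, sizeof *job);
    struct timespec deadline;
    pthread_t tid;
    size_t len = strlen(path) + 1;
    int rc, err;

    if (job == NULL || (job->path = malloc(len)) == NULL) {
        free(job);
        errno = ENOMEM;
        return -1;
    }
    memcpy(job->path, path, len);
    pthread_mutex_init(&job->lock, NULL);
    pthread_cond_init(&job->finished, NULL);

    rc = pthread_create(&tid, NULL, stat_worker, job);
    if (rc != 0) {
        job_free(job);
        errno = rc;
        return -1;
    }
    pthread_detach(tid);  /* never joined: it may stay blocked forever */

    /* pthread_cond_timedwait() takes an absolute wall-clock deadline. */
    deadline.tv_sec = time(NULL) + (time_t)timeout_sec;
    deadline.tv_nsec = 0;

    pthread_mutex_lock(&job->lock);
    while (!job->done &&
           pthread_cond_timedwait(&job->finished, &job->lock, &deadline) != ETIMEDOUT)
        ;  /* loop again on spurious wakeups */
    if (!job->done) {
        job->abandoned = 1;  /* ownership of the job passes to the worker */
        pthread_mutex_unlock(&job->lock);
        errno = ETIMEDOUT;
        return -1;
    }
    rc = job->rc;
    err = job->err;
    if (rc == 0)
        *out = job->sb;
    pthread_mutex_unlock(&job->lock);

    job_free(job);
    if (rc != 0)
        errno = err;
    return rc;
}
```

A caller would treat `errno == ETIMEDOUT` as "path is on a hanging filesystem" and report that instead of blocking. The job record is heap-allocated on purpose: after a timeout the worker may still write to it whenever stat finally returns, so it must outlive the caller's stack frame.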


waja · 2015-10-12T09:11:01Z

waja · 2015-10-12T09:35:04Z

reverted by 11c5796

weiss · 2015-10-13T09:12:21Z

Our idea is to fork(2) child processes for checking remote file systems, instead.
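
To make the fork(2) idea concrete, here is a rough sketch under stated assumptions; nothing in this thread says how it would actually be implemented. The helper name `probe_path_in_child`, its return codes, and the polling interval are hypothetical. The child performs the potentially blocking stat, and the parent polls with waitpid and gives up once the timeout has passed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper. Returns:
 *    0  stat() succeeded in the child
 *    1  stat() failed in the child (or the child died abnormally)
 *    2  the child did not finish within timeout_sec
 *   -1  fork() or waitpid() failed in the parent */
int probe_path_in_child(const char *path, unsigned timeout_sec)
{
    const struct timespec tick = {0, 50 * 1000 * 1000};  /* 50 ms poll interval */
    time_t deadline;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the only process that can get stuck on a dead mount. */
        struct stat sb;
        _exit(stat(path, &sb) == 0 ? 0 : 1);
    }

    deadline = time(NULL) + (time_t)timeout_sec;
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r < 0 && errno != EINTR)
            return -1;
        if (time(NULL) >= deadline) {
            /* Best effort: a child in uninterruptible sleep may ignore this.
             * The parent does not wait for it and returns right away. */
            kill(pid, SIGKILL);
            return 2;
        }
        nanosleep(&tick, NULL);
    }
}
```

The parent never blocks on the filesystem itself, so the plugin can still exit with a meaningful status. Note that this sketch does not by itself prevent stuck children from accumulating if the mount stays dead.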

thatsafunnyname · 2018-04-20T10:54:35Z

check_disk: no longer hangs on hanging filesystems

a1f790d

Just turning the patch attached to GitHub issue monitoring-plugins#867 into a pull request.

waja added the enhancement label Jul 29, 2014

waja modified the milestones: 2.2, 2.1 Oct 6, 2014

waja force-pushed the master branch 2 times, most recently from 441913d to 40c870e October 19, 2014 21:31

waja added the squash label Nov 28, 2014

weiss self-assigned this Nov 28, 2014

weiss closed this in 14d306f Dec 2, 2014

weiss reopened this Oct 13, 2015

weiss modified the milestones: 2.3, 2.2 Oct 13, 2015

weiss removed the squash label Oct 13, 2015

waja closed this Nov 20, 2016

waja deleted the github867 branch November 20, 2016 21:17

waja restored the github867 branch November 20, 2016 21:31

waja reopened this Nov 20, 2016

waja modified the milestones: 2.3, 2.4 Dec 15, 2020

RincewindsHat added the check_disk label Nov 18, 2021

RincewindsHat unassigned weiss Mar 9, 2024

waja modified the milestones: 2.4, 2.5 Jul 23, 2024

RincewindsHat modified the milestones: 2.5, 3.1.0 Jan 9, 2026

check_disk: no longer hangs on hanging filesystems #1186

waja wants to merge 1 commit into monitoring-plugins:master from waja:github867
