Alpha software. Interfaces, config schema, and state file format may change without notice between releases. Not recommended for production use without testing in your environment first.
Linux filesystem read-only monitor for iSCSI SAN environments. Detects when a VM's filesystem is remounted read-only (due to SAN failover or network interruption), waits a configurable delay, then reboots the VM. Implements exponential backoff to prevent reboot storms when the underlying issue persists.
Designed to run on Linux VMs under XenServer or XCP-ng with iSCSI SAN storage.
- Reads `/proc/mounts` on every check interval
- Detects watched mounts with the `ro` option flag
- Waits a backoff-calculated delay (default 5 min for first occurrence)
- Runs pre-reboot hooks (notifications, log flush, etc.)
- Executes `systemctl reboot`
- Persists reboot history so subsequent incidents have longer delays
If a mount returns to rw before the delay expires, the reboot is cancelled and a recovery notification is sent.
- Linux with systemd ≥ 229 (see Supported Distributions)
- For building from source: Go 1.22+
- Optional: XenServer / XCP-ng, for the `xenstore` state backend
- Optional: polkit with JavaScript rules, which enables `systemctl reboot` over D-Bus; when polkit JS rules are unavailable, `reboot -f` via `CAP_SYS_BOOT` is used as a fallback
| Distribution | Min version | systemd | Package | polkit JS rules |
|---|---|---|---|---|
| Ubuntu | 18.04 LTS | 237 | .deb | Ubuntu 24.04+ only |
| Debian | 9 (Stretch) | 231 | .deb | Debian 12+ only |
| RHEL | 8 | 239 | .rpm | Yes (0.115) |
| CentOS Stream | 8 | 239 | .rpm | Yes |
| Rocky Linux | 8 | 239 | .rpm | Yes |
| AlmaLinux | 8 | 239 | .rpm | Yes |
| openSUSE Leap | 15.0 | 237 | manual | Yes |
| SLES | 15 | 239 | manual | Yes |
| Alpine | ✗ | — | — | Uses OpenRC by default |
RHEL/CentOS 7 is not supported: its systemd 219 predates `AmbientCapabilities` (systemd ≥ 229 required).
Ubuntu ≤ 22.04 and Debian ≤ 11 ship polkit 0.105, which does not process `.rules` files. The polkit rule is not installed on these systems; reboots use `reboot -f` via `CAP_SYS_BOOT` instead.
Alpine Linux: the `.apk` package can be built, but Alpine uses OpenRC by default and does not ship systemd, so it is not supported in standard Alpine deployments.
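The distribution notes above come down to two version checks:

- systemd ≥ 229, for `AmbientCapabilities`.
- polkit ≥ 0.106, for JavaScript `.rules` files. Polkit 0.105 predates them.

A minimal Go sketch of such checks follows. The helper names are hypothetical. It assumes the inputs are the text printed by `systemctl --version` and a bare polkit version string. How `scripts/install.sh` and the package postinstall actually detect support is not documented here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// systemdVersion extracts the version number from the first line of
// `systemctl --version`, e.g. "systemd 237" or "systemd 249 (249.11-0ubuntu3)".
func systemdVersion(out string) (int, error) {
	first := strings.SplitN(out, "\n", 2)[0]
	f := strings.Fields(first)
	if len(f) < 2 || f[0] != "systemd" {
		return 0, fmt.Errorf("unexpected systemctl output: %q", first)
	}
	return strconv.Atoi(f[1])
}

// polkitSupportsJSRules reports whether a polkit version string such as
// "0.105", "0.115" or "124" supports JavaScript .rules files. JS rules arrived
// in 0.106; releases after 0.120 dropped the "0." prefix (121, 122, ...).
func polkitSupportsJSRules(version string) (bool, error) {
	v := strings.TrimPrefix(strings.TrimSpace(version), "0.")
	n, err := strconv.Atoi(strings.SplitN(v, ".", 2)[0])
	if err != nil {
		return false, err
	}
	return n >= 106, nil
}

func main() {
	sv, err := systemdVersion("systemd 237\n+PAM +AUDIT +SELINUX")
	fmt.Println(sv, err, sv >= 229) // 237 <nil> true

	for _, v := range []string{"0.105", "0.115", "124"} {
		ok, _ := polkitSupportsJSRules(v)
		fmt.Println(v, ok)
	}
}
```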
# Download the .deb from the releases page, then:
sudo dpkg -i mountsentinel_<version>_amd64.deb
# Edit config
sudo nano /etc/mountsentinel.yml
sudo systemctl start mountsentinel
On upgrade:
sudo dpkg -i mountsentinel_<new_version>_amd64.deb # restarts daemon automatically
On removal:
sudo apt remove mountsentinel # stops service, keeps config + state
sudo apt purge mountsentinel # also removes user and /var/lib/mountsentinel
On RHEL, CentOS Stream, Rocky Linux and AlmaLinux (.rpm):
sudo rpm -ivh mountsentinel-<version>.x86_64.rpm
# Edit config
sudo nano /etc/mountsentinel.yml
sudo systemctl start mountsentinel
On upgrade:
sudo rpm -Uvh mountsentinel-<new_version>.x86_64.rpm
On removal:
sudo rpm -e mountsentinel # stops service, keeps config + state
To build from source:
git clone <repo>
go mod tidy
make build-static
# Install (as root)
sudo bash scripts/install.sh ./mountsentinel
The install script:
- Creates the `mountsentinel` system user
- Creates the `/var/lib/mountsentinel/` state directory
- Installs the binary to `/usr/local/bin/mountsentinel`
- Installs the systemd unit to `/etc/systemd/system/mountsentinel.service`
- Installs an example config to `/etc/mountsentinel.yml` (if not present)
- Installs the polkit rule to `/etc/polkit-1/rules.d/` (if polkit JS rules are supported)
- Installs the Zabbix UserParameter config (if `/etc/zabbix/zabbix_agentd.d/` exists)
# Edit config first
sudo nano /etc/mountsentinel.yml
sudo systemctl enable --now mountsentinel
sudo journalctl -fu mountsentinel
Packages are built with nFPM. Requires Go 1.22+.
# Install nfpm (once)
make nfpm-install
# Both .deb and .rpm
make packages
# Individual formats
make deb # → dist/mountsentinel_<version>_amd64.deb
make rpm # → dist/mountsentinel-<version>.x86_64.rpm
make apk # → dist/mountsentinel_<version>_x86_64.apk (Alpine)
Version is taken from `git describe --tags`. Tag a release before building:
git tag v1.0.0
make packages
To build with an explicit version:
VERSION=1.0.0 make packages
Package contents:
| Path | Notes |
|---|---|
| `/usr/local/bin/mountsentinel` | Static binary |
| `/lib/systemd/system/mountsentinel.service` | systemd unit |
| `/etc/mountsentinel.yml` | Default config (`config\|noreplace`; not overwritten on upgrade) |
| `/etc/zabbix/zabbix_agentd.d/mountsentinel.conf` | Zabbix UserParameter config |
| `/usr/share/mountsentinel/50-mountsentinel.rules` | polkit rule source; postinstall deploys it conditionally |
| `/etc/polkit-1/rules.d/50-mountsentinel.rules` | Deployed by postinstall only when polkit JS rules are supported |
| `/var/lib/mountsentinel/` | State directory (owned by `mountsentinel` user) |

Package lifecycle:
| Event | Behaviour |
|---|---|
| Fresh install | Creates mountsentinel user, enables service, installs polkit rule if supported |
| Upgrade | Restarts daemon if running; does not touch config or polkit rule |
| Remove | Stops and disables service; keeps config and state |
| Purge / `rpm -e` | Also removes polkit rule and `/var/lib/mountsentinel` |
Config file defaults to /etc/mountsentinel.yml. Override with --config.
daemon:
check_interval: "30s" # how often to poll /proc/mounts
dry_run: false # log reboot decisions without executing
log_level: "info" # info | debug | verbose
watch_mounts:
- mountpoint: "/data"
device: "/dev/sdb1"
label: "iscsi-data"
# wildcard: watch all mounts (minus exclusions)
# - mountpoint: "*"
# exclude: ["/proc", "/sys", "/dev", "/run"]
reboot:
delay: "5m" # wait before rebooting after detection
pre_reboot_hooks: [] # commands to run before reboot
backoff:
window: "24h" # rolling window for reboot history
base_delay: "5m" # first-incident delay
multiplier: 2.0 # each repeat doubles the delay
max_delay: "4h" # cap; when reached, auto-reboot stops
jitter: "30s" # random jitter to prevent thundering herd
state:
backend: "file" # file | tmpfs | xenstore | memory | remote
file_path: "/var/lib/mountsentinel/state.json"
fallback_backends: ["tmpfs", "memory"]
notify:
webhook:
url: "https://hooks.slack.com/..."
body_template: |
{"text": "{{.Hostname}} mount {{.Mountpoint}} → {{.Event}}"}
zabbix:
enabled: false
state_file: "/run/mountsentinel/zabbix.json"
metrics:
enabled: false
  addr: ":9101"
See `dist/mountsentinel.yml.example` for the full annotated reference.
| Backend | Survives Reboot | Notes |
|---|---|---|
| `file` | Yes | Default. Risk: writes fail if `/var/lib` is on a watched mount that goes read-only |
| `tmpfs` | No | Recommended for iSCSI. Always writable (RAM); resets cleanly after reboot |
| `xenstore` | No | XCP-ng/XenServer only. Uses the `xenstore-write` CLI |
| `memory` | No | In-process only. Lost on daemon crash |
| `remote` | Yes | HTTP PUT/GET to a configurable URL |
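The backends in the table can be chained: a primary backend is tried first, then each entry in `fallback_backends` in turn. The sketch below shows one way that could work, written against a hypothetical `Backend` interface, since the real backend API is not documented here. The `failingBackend` type stands in for a `file` backend whose directory has gone read-only.

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is a hypothetical minimal interface for a state store.
type Backend interface {
	Name() string
	Write(data []byte) error
}

// memBackend keeps state in process memory, like the `memory` backend.
type memBackend struct{ data []byte }

func (m *memBackend) Name() string { return "memory" }

func (m *memBackend) Write(d []byte) error {
	m.data = append([]byte(nil), d...)
	return nil
}

// failingBackend simulates a backend whose storage has gone read-only.
type failingBackend struct{ name string }

func (f failingBackend) Name() string { return f.name }

func (f failingBackend) Write([]byte) error {
	return errors.New("read-only file system")
}

// writeWithFallback tries each backend in order (primary first) and returns
// the name of the first one that accepts the write, or all errors joined.
func writeWithFallback(data []byte, backends ...Backend) (string, error) {
	if len(backends) == 0 {
		return "", errors.New("no state backends configured")
	}
	var errs []error
	for _, b := range backends {
		if err := b.Write(data); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", b.Name(), err))
			continue
		}
		return b.Name(), nil
	}
	return "", errors.Join(errs...)
}

func main() {
	mem := &memBackend{}
	// Primary "file" and fallback "tmpfs" both fail here; "memory" succeeds.
	used, err := writeWithFallback([]byte(`{"reboots":1}`),
		failingBackend{"file"}, failingBackend{"tmpfs"}, mem)
	fmt.Println(used, err, string(mem.data)) // memory <nil> {"reboots":1}
}
```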
fallback_backends tries alternatives if the primary backend write fails:
state:
backend: "file"
  fallback_backends: ["tmpfs", "memory"]
Usage:
mountsentinel [--config /etc/mountsentinel.yml] [--verbose] [--debug]
# Table output (exits 2 if any mount is degraded; scriptable)
mountsentinel status
# Filter by mount
mountsentinel status --mount /data
# Zabbix LLD discovery JSON
mountsentinel status --format=zabbix-discovery
# Single item value (for Zabbix UserParameter)
mountsentinel status --mount /data --key state --format=value
mountsentinel status --mount /data --key reboot_count --format=value
Exit codes for `status`:
- `0`: all mounts healthy
- `2`: one or more mounts degraded (DETECTED / SUPPRESSED)
`mountsentinel reset` clears the SUPPRESSED state that a mount enters once its backoff delay reaches `max_delay`. This requires operator action.
mountsentinel reset --mount /data
# or by device
mountsentinel reset --mount /dev/sdb1
To reload the daemon:
systemctl reload mountsentinel
# or
kill -HUP $(pidof mountsentinel)
Delays between reboots grow exponentially over a rolling window:
delay = base_delay × multiplier^(reboots_in_window)
delay = min(delay, max_delay)
delay += rand(0, jitter)
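Here `reboots_in_window` is the number of reboots already recorded within `window`. With defaults (`base_delay=5m`, `multiplier=2`, `max_delay=4h`) and before jitter: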
| Reboot # in window | Delay |
|---|---|
| 1st | 5 min |
| 2nd | 10 min |
| 3rd | 20 min |
| 4th | 40 min |
| 5th | 80 min |
| 6th | 160 min |
| 7th+ | 240 min (capped) → SUPPRESSED |
When SUPPRESSED: operator must run mountsentinel reset --mount <mp> to re-enable auto-reboot.
mountsentinel writes /run/mountsentinel/zabbix.json (tmpfs, always writable) on every state change. The local Zabbix agent reads this via UserParameter scripts and forwards to the Zabbix server.
- Enable in config by setting `zabbix.enabled: true` and `zabbix.state_file: "/run/mountsentinel/zabbix.json"`.
- Add the Zabbix agent user to the `mountsentinel` group so it can read `/etc/mountsentinel.yml` (mode 640). The Ansible role does this automatically when `mountsentinel_zabbix_enabled: true`. Otherwise run `sudo usermod -aG mountsentinel zabbix`, then `sudo systemctl restart zabbix-agent` (a restart is required; reload is not enough).
- Install the agent config (done automatically by `scripts/install.sh` and the Ansible role): `sudo cp dist/zabbix/mountsentinel.conf /etc/zabbix/zabbix_agentd.d/`, then `sudo systemctl restart zabbix-agent`.
- Import the Zabbix template:
  - In the Zabbix UI: Configuration → Templates → Import
  - Select `dist/zabbix/mountsentinel_template.xml` (requires Zabbix 6.4+)
  - Apply the template to hosts running mountsentinel
`dist/zabbix/mountsentinel_template.xml` is a Zabbix 6.4 template with full LLD discovery, triggers, and a graph.
Host-level items (passive Zabbix agent, 60s interval):
| Item | Key | Description |
|---|---|---|
| Service: active state | `systemd.unit.info[mountsentinel.service,ActiveState]` | Expected: `active` |
| Service: sub state | `systemd.unit.info[mountsentinel.service,SubState]` | Expected: `running` |
LLD discovery (key `mountsentinel.discovery`, 5 min interval) auto-creates per-mount items:
| Item prototype | Key | Value |
|---|---|---|
| Mount {#MOUNT}: state | `mountsentinel.state[{#MOUNT}]` | HEALTHY / DETECTED / SUPPRESSED / REBOOTING |
| Mount {#MOUNT}: reboot count | `mountsentinel.reboot_count[{#MOUNT}]` | Integer counter within backoff window |
| Mount {#MOUNT}: last event | `mountsentinel.last_event[{#MOUNT}]` | ISO 8601 timestamp of last reboot |
| Mount {#MOUNT}: suppressed | `mountsentinel.suppressed[{#MOUNT}]` | 0 = active, 1 = suppressed |
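The exact JSON printed by `mountsentinel status --format=zabbix-discovery` isn't shown here. With that caveat, LLD output keyed on `{#MOUNT}` could be produced as in this sketch. It uses the top-level JSON array form that Zabbix accepts for low-level discovery since version 4.2; the older `{"data": [...]}` wrapper is also accepted. `discoveryJSON` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// discoveryJSON renders one LLD row per watched mountpoint, keyed by the
// {#MOUNT} macro that the item prototypes use. Output is sorted for stability.
func discoveryJSON(mounts []string) (string, error) {
	sorted := append([]string(nil), mounts...)
	sort.Strings(sorted)
	rows := make([]map[string]string, 0, len(sorted))
	for _, mp := range sorted {
		rows = append(rows, map[string]string{"{#MOUNT}": mp})
	}
	b, err := json.Marshal(rows)
	return string(b), err
}

func main() {
	out, err := discoveryJSON([]string{"/data", "/backup"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [{"{#MOUNT}":"/backup"},{"{#MOUNT}":"/data"}]
}
```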
Triggers:
| Trigger | Severity | Condition |
|---|---|---|
| Service is not running | High | ActiveState ≠ active |
| Service has failed | Disaster | SubState = failed |
| Mount is read-only (reboot pending) | Average | state = DETECTED |
| Mount suppressed — operator action required | High | state = SUPPRESSED or suppressed = 1 |
| Mount triggered a reboot | Warning | state = REBOOTING |
| Reboot count is high | Warning | reboot_count ≥ {$MOUNTSENTINEL.REBOOT.WARN} |
The SUPPRESSED trigger has manual close enabled — acknowledge after running mountsentinel reset.
Macros:
| Macro | Default | Description |
|---|---|---|
| `{$MOUNTSENTINEL.REBOOT.WARN}` | 3 | Reboot count threshold for the high-reboot-count trigger |
mountsentinel daemon
│ writes on each state change (atomic rename)
▼
/run/mountsentinel/zabbix.json (tmpfs — always writable)
▲
│ UserParameter reads on Zabbix server poll
zabbix_agentd
│ forwards
▼
Zabbix Server
There is no direct connection from mountsentinel to the Zabbix server; the agent handles transport.
Enable in config:
metrics:
enabled: true
  addr: ":9101"
Metrics are exposed at `http://localhost:9101/metrics`.
Health endpoint: http://localhost:9101/healthz
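All logs are written as structured JSON to stdout and captured by journald. Use `-o cat` so journalctl prints only the message (without its timestamp/host prefix), which `jq` can then parse:
journalctl -fu mountsentinel -o cat | jq .
Fields: `ts`, `level`, `event`, `mount`, `device`, `backoff_delay`, `reboot_at`, `dry_run`.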
# Run tests
make test
# Run with verbose logging against a test config
./mountsentinel --config testdata/mountsentinel-test.yml --verbose
# Dry run (safe for staging)
./mountsentinel --config /etc/mountsentinel.yml
# with dry_run: true in config, no actual reboots will occur
To test without a real SAN failure, temporarily remount a filesystem read-only:
sudo mount -o remount,ro /data
# mountsentinel will detect within check_interval
sudo mount -o remount,rw /data
# mountsentinel logs "mount_recovered"
Security:
- Runs as the unprivileged `mountsentinel` system user
- Only the `CAP_SYS_BOOT` capability is granted, via `AmbientCapabilities`; no other root privileges
- Reboot path: `systemctl reboot` via D-Bus (requires the polkit JS rule, see below), falling back to `reboot -f` using `CAP_SYS_BOOT` directly
- Polkit rule (`/etc/polkit-1/rules.d/50-mountsentinel.rules`) installed automatically on RHEL/Rocky/AlmaLinux 8+, Debian 12+ and Ubuntu 24.04+
- On Ubuntu ≤ 22.04 and Debian ≤ 11, polkit JS rules are not supported and the `reboot -f` fallback is used
- Full systemd sandboxing: `ProtectSystem=strict`, `PrivateTmp`, `MemoryDenyWriteExecute`, `NoNewPrivileges`
- Config file: mode 640, `root:mountsentinel`; readable by the service user, not world-readable
- State directory: `/var/lib/mountsentinel/`, mode 750, owned by `mountsentinel`
- Michael Moscovitch — developer
- Claude (Anthropic) — AI assistant
