Conquering Ceph MGR Crashes on Proxmox VE 9

An AppArmor + Podman Odyssey

If you’re running Proxmox’s own packaged version of Ceph, you might not have this problem. I, however, am foolish enough to run a mixed environment with Proxmox hosts and a Kubernetes cluster in my home lab. I run Ceph Squid (v19.2.3) with cephadm across both the Proxmox (Debian) and Kubernetes (Ubuntu 24.04) hosts. If you’re unfamiliar with it, cephadm is a containerized way of managing the cluster’s daemons across hosts over SSH. I hadn’t had any issues with Ceph or cephadm in this mixed cluster until I upgraded Proxmox from 8.4 to 9.1.

The setup

I cleaned up my Proxmox cluster and got everything into a great state of preparedness to upgrade from 8.4 to 9.1. I was certain I had everything ready to go. I upgraded each box in turn, and everything looked like it was working just fine, right up until I hit restart on the last node acting as an MGR in my Ceph cluster.

The Problem: Ceph mgr instances refusing to start

It started innocently enough: after the Proxmox upgrade, everything appeared to be running. Even the mgr processes appeared to be up according to systemd, but ceph -s told a different story.

  cluster:
    id:     <FSID>
    health: HEALTH_ERR
            Module 'cephadm' has failed: [Errno 13] Permission denied

Huh? What exactly does Permission denied mean here?

Digging into the logs

As it turned out, all of my mgr daemons were down. I took a closer look at the journald output and found errors like these:

Assertion details: socket creation failed: Permission denied
ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable)
 1: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0x133) [0x71e01613414d]
 2: (ceph::logging::detail::JournaldClient::JournaldClient()+0xd5) [0x71e0162bbca5]
 3: (ceph::logging::JournaldLogger::JournaldLogger(ceph::logging::SubsystemMap const*)+0x31) [0x71e0162bd751]
 4: (ceph::logging::Log::start_journald_logger()+0x5c) [0x71e01645140c]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a291c) [0x71e01625291c]
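
(For reference, these entries came from the mgr daemon’s journald output. On a cephadm host you can pull the same thing with something along these lines, where <fsid>, <host>, and <id> are placeholders for your own cluster.)

# via cephadm, using the daemon name it manages
cephadm logs --name mgr.<host>.<id>

# or via the systemd unit that cephadm creates for the daemon
journalctl -u ceph-<fsid>@mgr.<host>.<id>.service -b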

This looked like a logging issue: Ceph tries to connect to /run/systemd/journal/socket for native journald logging. But disabling journald logging via a config setting doesn’t help; the crash happens too early, before the config is even loaded.

The cluster was stable. My mon instances had quorum across 3 Proxmox hosts and, confusingly, they were running just fine on the same hosts where the mgr instances were failing with cryptic errors. But no functional mgr instances means ceph orch doesn’t work: no maintenance, no deployment updates, no host checks, no bueno.

The dmesg clue

Digging deeper, I looked at dmesg and found these lines:

[   18.091857] audit: type=1400 audit(1765480799.041:145): apparmor="DENIED" operation="create" class="net" info="failed protocol match" error=-13 profile="containers-default-0.62.2" pid=2282 comm="ceph-mgr" family="unix" sock_type="stream" protocol=0 requested="create" denied="create" addr=none
[   18.093033] audit: type=1400 audit(1765480799.042:146): apparmor="DENIED" operation="create" class="net" info="failed protocol match" error=-13 profile="containers-default-0.62.2" pid=2285 comm="ceph-exporter" family="unix" sock_type="stream" protocol=0 requested="create" denied="create" addr=none
[   24.020526] audit: type=1400 audit(1765480805.015:147): apparmor="DENIED" operation="create" class="net" info="failed protocol match" error=-13 profile="containers-default-0.62.2" pid=2282 comm="dashboard" family="unix" sock_type="stream" protocol=0 requested="create" denied="create" addr=none
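
If you want to spot these denials on your own hosts, filtering the kernel log is enough:

dmesg | grep 'apparmor="DENIED"'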

So dmesg pinpointed the real culprit: AppArmor is denying unix stream socket creation for ceph-mgr, ceph-exporter, and the mgr’s dashboard module. This is where the Permission denied error actually comes from.

Begin the attempts to address the issue

While it obviously would have worked to disable apparmor entirely, that was a non-starter. So I started with these attempts.

Podman Overrides

I created /etc/containers/systemd/ceph.container with SecurityOpt=apparmor=unconfined and LogDriver=journald, reloaded systemd, and restarted the mgr unit. The overrides should have affected containers spawned by Podman, but the unit files that actually get started are just bash shims calling a script, so the override never took effect. It was the right idea, but I didn’t see why it was failing until further investigation (see what follows).

Considered “Complain Mode” for Podman

Had I targeted the correct thing (hint: not /usr/bin/podman), this probably would have worked, but I’d still have had all of the logs and dmesg entries. You’ll see why I hit the wrong target hereafter.

Podman Profile Tweaks

I wanted to add network unix stream and network unix dgram to Podman’s apparmor profile, ostensibly at /etc/apparmor.d/usr.bin.podman. It turns out that file doesn’t exist. Instead, Podman generates its profile and loads it into the kernel at runtime. We saw the name in the dmesg output: profile="containers-default-0.62.2".

Can’t tweak the profile by editing a file. Okay, then.

What Finally Worked

The final clue

Ran aa-status to see what apparmor had to say for itself:

# Filtered for brevity
10 processes are in enforce mode.
   ...
   /usr/bin/ceph-mgr (2400) containers-default-0.62.2
   /usr/bin/ceph-exporter (2449) containers-default-0.62.2
   /bin/node_exporter (2611) containers-default-0.62.2
12 processes are unconfined but have a profile defined.
   ...
   /usr/bin/ceph-mon (2269) crun
   /usr/bin/ceph-osd (8249) crun

So, why are ceph-mgr, ceph-exporter, and node_exporter running a different profile than ceph-mon and ceph-osd? More on that below.

What we need to do is add --security-opt apparmor=unconfined to our ceph-mgr instances.

What I wanted to do

You can patch a service definition file with:

... <the rest of the content>
extra_container_args:
- "--security-opt apparmor=unconfined"

Then reapply with ceph orch apply -i filename.yaml

But my mgr instances aren’t talking right now because of this bug! ceph orch apply goes through the mgr, so I can’t use it until they’re back up.

Interim: Manually edit unit.run files

So, digging deeper into why the systemd overrides didn’t work, it turns out the cephadm-managed unit just calls identically named scripts in every one of the cephadm-managed daemon directories:

ExecStart=/bin/bash /var/lib/ceph/<fsid>/%i/unit.run
ExecStop=-/bin/bash -c 'bash /var/lib/ceph/<fsid>/%i/unit.stop'
ExecStopPost=-/bin/bash /var/lib/ceph/<fsid>/%i/unit.poststop

So we have unit.run, unit.stop, and unit.poststop. No wonder a systemd-level override didn’t work: the podman command line is built inside these scripts, not in the unit file. These files are at /var/lib/ceph/<fsid>/*, and in particular, the unit.run file for a mgr instance will be at /var/lib/ceph/<fsid>/mgr.<host>.<id>/unit.run.

Within the unit.run file you’ll find the actual podman run line:

/usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mgr --init --name ...

Inject --security-opt apparmor=unconfined into the podman run line, between --net=host and --entrypoint, like this:

/usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --security-opt apparmor=unconfined --entrypoint /usr/bin/ceph-mgr --init --name ...

With that in place, I restarted the mgr instance… it worked!
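
If you’d rather script the interim edit and restart, here’s a rough sketch (same placeholders as above; note the sed is not idempotent, so don’t run it twice):

# inject the flag right after --net=host in the mgr's unit.run
sed -i 's|--net=host |--net=host --security-opt apparmor=unconfined |' /var/lib/ceph/<fsid>/mgr.<host>.<id>/unit.run

# restart the cephadm-managed unit for that mgr
systemctl restart ceph-<fsid>@mgr.<host>.<id>.service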

Caveat

Getting this up and running was well and good. But cephadm will overwrite unit.run on a redeploy, so manually editing the unit.run files is not a permanent solution.

Why it happened

I had Grok help me with this explanation:

The root cause: Proxmox VE 9’s Podman 5.4.2 + AppArmor 4.1.1 carry back-ported CVE-2025-52881 fixes (critical LXC container file-descriptor leaks). The fix uses detached mounts for /proc/sys/net/... to prevent race conditions, but apparmor can’t resolve full paths in detached mounts; it sees relative /sys/net/... paths instead of /proc/sys/net/..., triggering the errors seen, e.g. "failed protocol match" on unix stream creates (error=-13 EACCES)...

Why were mgr and ceph-exporter hit but not mon or osd? It turns out that mon and osd have the --privileged flag in the podman run line in their unit.run file because they need low-level device access, and --privileged disables the AppArmor confinement entirely. Proxmox VE 8 uses Podman 4.3.x, which is not affected, so no issue there. Ubuntu 24.04’s Podman 4.9.3 + AppArmor 4.0.1 (my Kubernetes infra) is not affected either.
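
You can see which daemons on a host run privileged (and are therefore untouched by the containers-default profile) straight from the unit.run files, for example (with <fsid> as a placeholder for your cluster):

grep -l -e '--privileged' /var/lib/ceph/<fsid>/*/unit.run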

The Final Fix: extra_container_args

So cephadm orchestrator services have the ability to pass extra args to the containers they launch. You would need to run this flow for each affected service.

Export the existing service spec

ceph orch ls mgr --export > filename.yaml

Edit the service file

In this case, the service is mgr. Add the extra_container_args as follows:

service_type: mgr
service_name: mgr
placement:
  count: 3
  hosts:
  - host1
  - host2
  - host3
extra_container_args:
  - "--security-opt apparmor=unconfined"

Redeploy the service

It really is as simple as this:

ceph orch apply -i filename.yaml

This appends --security-opt apparmor=unconfined as one of the podman run args in unit.run in a permanent fashion that survives redeploys and upgrades. After applying, you can even check the contents of the unit.run file to verify.
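
A couple of quick checks after the redeploy, for example:

# confirm the flag landed in the regenerated unit.run
grep -e '--security-opt apparmor=unconfined' /var/lib/ceph/<fsid>/mgr.*/unit.run

# confirm the cluster is healthy again
ceph -s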

Summary

This outlines how to get past this apparmor + podman bug in a way that survives redeployments and upgrades, which is absolutely necessary for the mgr service.

The choice is yours

You absolutely do need to run the mgr service with apparmor=unconfined right now if your mgr lives on a Proxmox VE 9 box with Podman 5.4.2 + AppArmor 4.1.1 (the state of things as of December 2025); future updates may change this. You may also want to apply the same patch to your node-exporter and ceph-exporter instances, since they are blocked in exactly the same way (per the DENIED lines in dmesg).

However, those exporters are not mission critical, and if you’re like me and have a mixed deployment, you don’t need to make every container in the cluster run unconfined. In that case you can just edit the unit.run file as outlined above and apply the fix only to the instances that need it; anything running Podman 4.9 is unaffected. The node-exporter and ceph-exporter instances on my Proxmox VE 9 hosts will get the manual treatment (the relevant paths are sketched below). Those instances don’t change often, and perhaps a proper fix between apparmor and podman will land before I need to worry about it again.
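
For the exporters, the manual treatment is the same unit.run edit. Assuming the usual cephadm daemon-directory naming (which may differ on your cluster), the relevant files and restart commands look roughly like this:

/var/lib/ceph/<fsid>/node-exporter.<host>/unit.run
/var/lib/ceph/<fsid>/ceph-exporter.<host>/unit.run

# after editing, restart the matching units
systemctl restart ceph-<fsid>@node-exporter.<host>.service ceph-<fsid>@ceph-exporter.<host>.service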

Good luck!
