Conquering Ceph MGR Crashes on Proxmox VE 9

An AppArmor + Podman Odyssey

If you’re running Proxmox’s version of Ceph, you might not have this problem. I, however, am foolish enough to have a mixed environment with Proxmox hosts and a Kubernetes cluster in my home lab. I run Ceph Squid, v19.2.3 using cephadm across both Proxmox (Debian) and my Kubernetes (Ubuntu 24.04) hosts. If you’re unaware, cephadm is a containerized means of managing hosts across the cluster using SSH keys. I haven’t had any issues with Ceph or cephadm in my mixed clusters until I upgraded Proxmox from 8.4 to 9.1.

The setup

I cleaned up my Proxmox cluster and got everything in a great state of preparedness to upgrade from 8.4 to 9.1. I was certain I had everything ready to go. I upgraded each box in turn, and everything looked like it was working just fine–right up until I hit restart on the last node that was acting as an MGR in my Ceph cluster.

The Problem: Ceph mgr instances refusing to start

It started innocently enough: after the Proxmox upgrade, everything appeared to be running. Even the mgr processes appeared to be up according to systemd status, but ceph -s told a different story.

  cluster:
    id:     <FSID>
    health: HEALTH_ERR
            Module 'cephadm' has failed: [Errno 13] Permission denied

Huh? What exactly does Permission denied here mean?

Digging into the logs

As it turned out, all of my mgr instances were down. I took a closer look at the journald output and found errors like these:

Assertion details: socket creation failed: Permission denied
ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable)
 1: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0x133) [0x71e01613414d]
 2: (ceph::logging::detail::JournaldClient::JournaldClient()+0xd5) [0x71e0162bbca5]
 3: (ceph::logging::JournaldLogger::JournaldLogger(ceph::logging::SubsystemMap const*)+0x31) [0x71e0162bd751]
 4: (ceph::logging::Log::start_journald_logger()+0x5c) [0x71e01645140c]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a291c) [0x71e01625291c]

This looked like a logging issue (Ceph trying to connect to /run/systemd/journal/socket for native journald logs). But disabling journald logging via a config setting didn’t work: the crash happened too early, before the config was loaded.

The cluster was stable. My mon instances had quorum across 3 Proxmox hosts—which confusingly were running just fine on the same hosts where the mgr instances were failing with cryptic errors. But no functional mgr instances means that ceph orch doesn’t work: no maintenance, no deployment updates, no host checks, no bueno.

The dmesg clue

Digging deeper, I looked at dmesg and found these lines:

[   18.091857] audit: type=1400 audit(1765480799.041:145): apparmor="DENIED" operation="create" class="net" info="failed protocol match" error=-13 profile="containers-default-0.62.2" pid=2282 comm="ceph-mgr" family="unix" sock_type="stream" protocol=0 requested="create" denied="create" addr=none
[   18.093033] audit: type=1400 audit(1765480799.042:146): apparmor="DENIED" operation="create" class="net" info="failed protocol match" error=-13 profile="containers-default-0.62.2" pid=2285 comm="ceph-exporter" family="unix" sock_type="stream" protocol=0 requested="create" denied="create" addr=none
[   24.020526] audit: type=1400 audit(1765480805.015:147): apparmor="DENIED" operation="create" class="net" info="failed protocol match" error=-13 profile="containers-default-0.62.2" pid=2282 comm="dashboard" family="unix" sock_type="stream" protocol=0 requested="create" denied="create" addr=none

So dmesg pinpointed the cause: AppArmor was denying unix stream socket creation for the ceph-mgr, ceph-exporter, and dashboard processes. This is where the Permission denied error actually comes from.
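If you have a lot of these lines to wade through, a small script can summarize them. This is only a sketch: it assumes the audit records look like the ones above (key=value pairs, some quoted), and the function names parse_denial and summarize are mine, not part of any tool.

```python
import re
from collections import Counter

# Matches key=value and key="value" pairs in a kernel audit record.
PAIR_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_denial(line):
    """Return the fields of an AppArmor DENIED record as a dict, or None."""
    if 'apparmor="DENIED"' not in line:
        return None
    fields = {}
    for key, raw, quoted in PAIR_RE.findall(line):
        # Quoted values are captured without quotes; bare values as-is.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


def summarize(lines):
    """Count denials per (profile, comm, family, sock_type)."""
    counts = Counter()
    for line in lines:
        fields = parse_denial(line)
        if fields:
            key = (fields.get("profile"), fields.get("comm"),
                   fields.get("family"), fields.get("sock_type"))
            counts[key] += 1
    return counts
```

Feeding it the lines of dmesg output (for example, read from a saved file) gives one counter entry per confined process, which makes it easy to see which daemons are affected and under which profile.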

Begin the attempts to address the issue

While it obviously would have worked to disable apparmor entirely, that was a non-starter. So I started with these attempts.

Podman Overrides

I created /etc/containers/systemd/ceph.container with SecurityOpt=apparmor=unconfined and LogDriver=journald, reloaded systemd, and restarted the mgr unit. The overrides should’ve affected Podman spawns, but it turns out that the unit files that get started are just bash shims calling a script. It was the right idea, but I didn’t see why it was failing until further investigation (see what follows).

Considered “Complain Mode” for Podman

Had I targeted the correct thing (hint: not /usr/bin/podman), this probably would have worked, but I’d still have had all of the logs and dmesg entries. You’ll see why I hit the wrong target hereafter.

Podman Profile Tweaks

I wanted to add network unix stream and network unix dgram to Podman’s apparmor profile, ostensibly at /etc/apparmor.d/usr.bin.podman. It turns out that file doesn’t exist. Instead, podman has a dynamically-loaded-by-the-kernel profile. We saw the name in the dmesg output: profile="containers-default-0.62.2".

Can’t tweak the profile by editing a file. Okay, then.

What Finally Worked

The final clue

Ran aa-status to see what apparmor had to say for itself:

# Filtered for brevity
10 processes are in enforce mode.
   ...
   /usr/bin/ceph-mgr (2400) containers-default-0.62.2
   /usr/bin/ceph-exporter (2449) containers-default-0.62.2
   /bin/node_exporter (2611) containers-default-0.62.2
12 processes are unconfined but have a profile defined.
   ...
   /usr/bin/ceph-mon (2269) crun
   /usr/bin/ceph-osd (8249) crun

So, why are ceph-mgr, ceph-exporter, and node_exporter running a different profile than ceph-mon and ceph-osd? More on that below.

What we need to do is add --security-opt apparmor=unconfined to our ceph-mgr instances.

What I wanted to do

You can patch a service definition file with:

... <the rest of the content>
extra_container_args:
- "--security-opt apparmor=unconfined"

Then reapply with ceph orch apply -i filename.yaml

But my mgr instances aren’t talking right now because of this bug! I can’t use apply until they do.

Interim: Manually edit unit.run files

So, digging deeper into why the systemd overrides didn’t work, it turns out that the systemd unit just calls identically named scripts in each of the cephadm-managed paths.

ExecStart=/bin/bash /var/lib/ceph/<fsid>/%i/unit.run
ExecStop=-/bin/bash -c 'bash /var/lib/ceph/<fsid>/%i/unit.stop'
ExecStopPost=-/bin/bash /var/lib/ceph/<fsid>/%i/unit.poststop

So we have unit.run, unit.stop, and unit.poststop. No wonder just trying to set a systemd-level override didn’t work. These files are at /var/lib/ceph/<fsid>/*, and in particular, the unit.run file for a mgr instance will be at /var/lib/ceph/<fsid>/mgr.<host>.<id>/unit.run.

Within the unit.run file you’ll find the actual podman run line:

/usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mgr --init --name ...

Inject --security-opt apparmor=unconfined into the podman run line in the gap between --net=host and --entrypoint like this:

/usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --security-opt apparmor=unconfined --entrypoint /usr/bin/ceph-mgr --init --name ...

With that in place, I restarted the mgr instance… it worked!
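If you have several instances to patch, the same edit can be scripted. Here is a minimal sketch, assuming the podman run line starts with the podman path and contains --entrypoint as shown above. The helper names patch_run_line and patch_text are hypothetical, and you should back up unit.run before writing anything back.

```python
FLAG = "--security-opt apparmor=unconfined"


def patch_run_line(line):
    """Insert FLAG before --entrypoint in a `podman run` line.

    Lines that are not `podman run` commands, or that already carry
    the flag, are returned unchanged.
    """
    first_two = line.split(None, 2)[:2]
    is_podman_run = (
        len(first_two) == 2
        and first_two[0].endswith("podman")
        and first_two[1] == "run"
    )
    if not is_podman_run or FLAG in line:
        return line
    if " --entrypoint " in line:
        return line.replace(" --entrypoint ", f" {FLAG} --entrypoint ", 1)
    # Fallback (assumption): put the flag right after `run`.
    return line.replace(" run ", f" run {FLAG} ", 1)


def patch_text(text):
    """Apply patch_run_line to every line, preserving line endings."""
    return "".join(patch_run_line(l) for l in text.splitlines(keepends=True))
```

The function is idempotent, so running it twice over the same file does not add the flag twice. Lines such as the `! /usr/bin/podman rm -f ...` cleanup commands are left alone because they do not start with the podman binary.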

Caveat

Getting this up and running was well and good. But cephadm will overwrite unit.run on a redeploy, so manually editing the unit.run files is not a permanent solution.

Why it happened

I had Grok help me with this explanation:

The root: Proxmox VE 9’s Podman 5.4.2 + AppArmor 4.1.1 back-ported CVE-2025-52881 fixes (critical LXC container file-descriptor leaks). This uses detached mounts for /proc/sys/net/... to prevent race conditions, but apparmor can’t resolve full paths in detached mounts—it sees relative /sys/net/... paths instead of /proc/sys/net/..., triggering the errors seen, e.g. "failed protocol match" on unix stream creates (error=-13 EACCES)...

Why were mgr and ceph-exporter hit but not mon or osd? It turns out that mon and osd have the --privileged flag in the podman run line in their unit.run file because they need low-level device access. Proxmox VE 8 uses Podman 4.3.x which is not affected, so no issue. Ubuntu 24.04’s Podman 4.9.3 + AppArmor 4.0.1 (my Kubernetes infra) skips it too.

The Final Fix: extra_container_args

Service specifications managed through ceph orch can pass extra arguments to their containers. You would need to run this flow for any affected service.

Download the existing service

ceph orch ls mgr --export > filename.yaml

Edit the service file

In this case, the service is mgr. Add the extra_container_args as follows:

service_type: mgr
service_name: mgr
placement:
  count: 3
  hosts:
  - host1
  - host2
  - host3
extra_container_args:
  - "--security-opt apparmor=unconfined"

Redeploy the service

It really is as simple as this:

ceph orch apply -i filename.yaml

This appends --security-opt apparmor=unconfined as one of the podman run args in unit.run in a permanent fashion that survives redeploys and upgrades. After applying, you can even check the contents of the unit.run file to verify.
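To check several daemons at once rather than opening each unit.run by hand, something like the following works. It is a sketch under assumptions: it expects the /var/lib/ceph/<fsid>/<daemon>/unit.run layout described earlier, only looks for the text apparmor=unconfined, and the function name daemons_missing_flag is made up for this example.

```python
from pathlib import Path

MARKER = "apparmor=unconfined"


def daemons_missing_flag(fsid_dir, prefix="mgr."):
    """Return the daemon directory names whose unit.run lacks MARKER.

    fsid_dir is the cluster directory, e.g. /var/lib/ceph/<fsid>.
    prefix selects which daemons to check (mgr. by default).
    """
    missing = []
    for unit_run in sorted(Path(fsid_dir).glob(f"{prefix}*/unit.run")):
        if MARKER not in unit_run.read_text():
            missing.append(unit_run.parent.name)
    return missing
```

An empty list means every matching daemon on that host has picked up the argument. Passing a different prefix (for example, the one your exporter daemons use) checks other services the same way.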

Summary

This outlines how to get past being stuck by this apparmor + podman bug in a way that survives redeployments and upgrades—absolutely necessary for the mgr service.

The choice is yours

You absolutely do need to run the mgr service with apparmor=unconfined right now if you’re running on a Proxmox VE 9 box as of December 2025. Future updates may change this, but for now you need to do this if your mgr is on a box with Podman 5.4.2 + AppArmor 4.1.1.

You may also want to repeat the patch for your node-exporter and ceph-exporter instances, as they too are effectively blocked (as per the dmesg lines showing DENIED). However, those are not mission critical, and if you’re like me and have a mixed deployment, you don’t need to make every container run unconfined. In such cases you can probably just edit the unit.run file as outlined and apply the fix only to the instances that need it. Hosts running Podman 4.9.3, like my Ubuntu 24.04 nodes, did not show the problem.

The node-exporter and ceph-exporter instances on my Proxmox VE 9 hosts will get the manual treatment. Those instances don’t change often, and perhaps a better fix between apparmor and podman will come out before I need to worry about it again.

Good luck!

Time for some new content

Hi all. It’s been a while. 4 years is a long pause.

I think it’s about time I started writing about my personal projects just to have somewhere to be creative, and share. I expect to write a bunch about my adventures with home automation, using Home Assistant, MQTT, Zigbee & zigbee2mqtt, Zwave & zwave2mqtt, Tasmota, ESPHome, ESP8266 & the newer ESP32 devices, and all of the associated technologies, including Elasticsearch, Kibana, Logstash, Beats, Machine Learning and all of the Elastic Stack coolness I can include.

Hopefully it’s something you find enjoyable. I know I do. Maybe it will be of some assistance to you, if you’re interested in similar projects. That’s all for now, though. Stay tuned for some updates soon!

Curator 4 is now in beta

It’s been so long since I posted here that people are likely to think I’ve abandoned my blog. It’s partially true. It’s been nearly a year since my last post, and it’s been exactly a year since the post before that.

I figure this is the perfect place to make the announcement for Curator 4. I’ve been very busy with my Logstash work, so Curator 4 is coming out considerably later than I would have liked. Better late than never, right?

So why Curator 4?

Because I can’t stop improving things, and Curator 3 was decent, but had some limitations which would not permit me to add some of the most requested features. “What new features are those?” you might ask. All in good time!

Limitations in Curator 3

Curator 3 and each of its predecessors were designed to be run from cron, so that periodic maintenance could be performed easily. All of the other features added to Curator since the very beginning (which was only index deletion) have been bolted on, resulting in a very complex command-line structure. This was still navigable, but not what I would have called ideal. One of the most requested features was snapshot restore. Did you know it would have required 9+ flags to accommodate only most of the options available? I just couldn’t add something like that to the bloated command-line structure and still consider it a tool I’m proud to point to and say, “I made that!”

Another of the limitations was atomic alias actions. I puzzled over how to do that with the command-line structure for a long time and realized that it would have resulted in huge, complicated and hard to read command lines. Nobody wants that. So what was the solution? Configuration files.

Configure all the things!

One of the design decisions for Curator 4 was to use YAML configuration files. There are two of them, to be precise: one for the client configuration (and logging options), and one for the actions to be performed. Having a default client configuration means that multiple, different action configuration files do not need to repeat the client information in each of them. If you store the client configuration file as $HOME/.curator/curator.yml, then you won’t even have to reference it at the command-line!

The action file allows for filter stacking and command pipelining.

Filter Stacking

If you used Curator before version 4, then you know that Curator had a limited number of ways you could combine filters before performing the desired action. Generally, that was limited to regular expression filtering combined with age-based filtering. With Curator 4, you can chain multiple filters together–as many as you like–to restrict which indices to act on. How might this help you? Let’s say you want to delete indices in excess of 30G of total space consumed. This might represent 30 days worth of data with your normal logging. What if some event caused a torrent of log lines to be produced? You might accidentally delete weeks worth of logs. With filter stacking, you could first filter by pattern, to only count Logstash indices. The next filter would be disk space, 30G worth. The third filter, however, is the magic one: Only delete indices older than 30 days. The total stack would mean, “delete Logstash indices in excess of 30G of storage, but only if they’re also older than 30 days.” Neat, eh?
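To make the stacking logic concrete, here is a small illustration in plain Python. It is not Curator’s code or its configuration syntax: the Index class and the by_pattern, by_space, and by_age functions are stand-ins I made up to show how each filter narrows the list produced by the one before it.

```python
import re
from dataclasses import dataclass

GIB = 1024 ** 3
DAY = 24 * 3600


@dataclass
class Index:
    name: str
    size_bytes: int
    created_epoch: int  # seconds since the epoch


def by_pattern(indices, pattern):
    """Keep only indices whose name matches the regular expression."""
    rx = re.compile(pattern)
    return [i for i in indices if rx.search(i.name)]


def by_space(indices, limit_bytes):
    """Walk newest to oldest; return indices beyond limit_bytes of total."""
    excess, used = [], 0
    for idx in sorted(indices, key=lambda i: i.created_epoch, reverse=True):
        used += idx.size_bytes
        if used > limit_bytes:
            excess.append(idx)
    return excess


def by_age(indices, now_epoch, older_than_seconds):
    """Keep only indices created before the cutoff."""
    cutoff = now_epoch - older_than_seconds
    return [i for i in indices if i.created_epoch < cutoff]


def deletion_candidates(indices, now_epoch):
    """Logstash indices beyond 30G, but only if older than 30 days."""
    stacked = by_pattern(indices, r"^logstash-")
    stacked = by_space(stacked, 30 * GIB)
    return by_age(stacked, now_epoch, 30 * DAY)
```

With this arrangement, a sudden flood of logs can push recent indices past the 30G mark, but the final age filter removes them from the deletion list because they are not yet 30 days old.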

Command Pipelining

Command pipelining means that you don’t have to execute a different Curator command for each action you want to perform. You can use the YAML action file to have multiple commands, one after the other, in the same file. It is a configurable option to have execution halt if an action fails with an exception, or continue even if there is an exception.
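As a rough illustration of the halt-or-continue behavior, here is a generic sketch. It is not how Curator implements it, and the continue_on_error key is a name I picked for the example rather than the actual action file option.

```python
def run_pipeline(actions):
    """Run actions in order and return a list of (name, status) pairs.

    Each action is a dict with a name, a zero-argument callable under
    "run", and an optional "continue_on_error" flag (default False).
    """
    results = []
    for action in actions:
        try:
            action["run"]()
            results.append((action["name"], "ok"))
        except Exception as exc:
            results.append((action["name"], f"failed: {exc}"))
            if not action.get("continue_on_error", False):
                # Halt the whole pipeline on the first unforgiven failure.
                break
    return results
```

Because the flag lives on each action, a failure in a low-stakes step can be tolerated while a failure in a step that later actions depend on stops everything after it.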

New Actions

There are some new tools in the Curator stable:

One that should almost be considered new since it’s so improved over previous versions is Alias, which now supports simultaneous, atomic add & remove.

Optimize has been renamed to forceMerge, in accordance with Elastic’s API changes.

New Filters

Well, mostly just improved filters. The space filter can now also take age into account as an extra step within the filter itself (not as a stacked filter), instead of filtering exclusively by space. Why might this be important? So you delete the oldest indices first, of course!

Speaking of deleting the oldest indices first, filtering by age now offers 3 different ways to determine index age:

  • name (which is what all previous versions of Curator used), which requires a time or datestamp in the index name
  • creation_date which derives the age from the time that Elasticsearch created the index, as stored in the index metadata
  • field_stats which calculates the age from the greatest and least values in a specified field. For Curator 4, since this is age calculations, the field type must be mapped as a date.

Also, with regard to age, Curator now converts the name-derived timestamps to epoch time for comparisons, since creation_date and field_stats are already in epoch time. This is important, as it means that comparisons do not follow the conventions used in Curator 3. If a timestamp is older than a date, it’s older. If it’s younger, it’s younger. Curator no longer tries to calculate and compensate for a full unit count. Test with the --dry-run flag before using this to ensure you don’t delete something you want kept.
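Here is a rough sketch of what a name-derived epoch comparison looks like. It is an illustration only, not Curator’s implementation: it assumes a YYYY.MM.DD datestamp in the index name, treats it as midnight UTC, and the function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumes the default Logstash-style datestamp, e.g. logstash-2016.05.01
DATE_RE = re.compile(r"\d{4}\.\d{2}\.\d{2}")


def epoch_from_name(index_name):
    """Return the index's datestamp as epoch seconds (UTC), or None."""
    match = DATE_RE.search(index_name)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(0), "%Y.%m.%d")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def is_older_than(index_name, now_epoch, seconds):
    """True if the name-derived timestamp falls before now - seconds."""
    stamp = epoch_from_name(index_name)
    return stamp is not None and stamp < now_epoch - seconds
```

The comparison is a plain less-than on two epoch values, which is the point of the change: there is no rounding to whole days or other units before deciding whether an index is older.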

Also, since all time calculations are relative to epoch time, and are therefore in seconds, time units have been revamped as multiples of seconds:

    if unit == 'seconds':
        multiplier = 1
    elif unit == 'minutes':
        multiplier = 60
    elif unit == 'hours':
        multiplier = 3600
    elif unit == 'days':
        multiplier = 3600*24
    elif unit == 'weeks':
        multiplier = 3600*24*7
    elif unit == 'months':
        multiplier = 3600*24*30
    elif unit == 'years':
        multiplier = 3600*24*365

This means you can use seconds, minutes, hours, days, weeks, months, or even years as valid units. Just remember that Curator 4 doesn’t care that February only has 28 days. If you use months, it is counting 30 days worth of seconds.
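For illustration, the same multipliers can be written as a lookup table and used to compute a cutoff. This is my own restatement of the snippet above, not Curator’s code; UNIT_SECONDS and age_cutoff are names invented for the example.

```python
# Seconds per unit, mirroring the multipliers listed above.
UNIT_SECONDS = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 3600 * 24,
    "weeks": 3600 * 24 * 7,
    "months": 3600 * 24 * 30,   # always 30 days, whatever the calendar says
    "years": 3600 * 24 * 365,   # leap years are ignored as well
}


def age_cutoff(now_epoch, unit, unit_count):
    """Return the epoch time that is unit_count units before now_epoch."""
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit: {unit!r}")
    return now_epoch - unit_count * UNIT_SECONDS[unit]
```

So a setting of 2 months subtracts exactly 5,184,000 seconds (60 days) from the current time, regardless of which months are actually involved.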

More posts to follow!

There’s too much for me to describe in a single blog post. I’ll continue to write about the new changes in Curator 4 over the coming days. In the meantime, please read the release notes and the online documentation for more information.

Happy Curating!

My current Logstash template — 2015-08-31

I figured it was time to share my current template again, as much has changed since Logstash 1.2. The changes include:

  1. doc_values everywhere applicable
  2. Defaults for all numeric types, using doc_values
  3. Proper mapping for the raw sub-field
  4. Leaving the message field analyzed, and with no raw sub-field
  5. Added ip, latitude, and longitude fields to the geoip mapping, using doc_values

If you couldn’t tell, I’m crazy about doc_values. Using doc_values (where permitted) prevents your Elasticsearch Java heap usage from growing out of control when performing large aggregations (for example, a month’s worth of data with Kibana), with very little upfront cost in additional storage.

This is mostly generic, but it does have a few things which are specific to my use case (like the Nginx entry).  Feel free to adapt to your needs.

{
  "template" : "logstash-*",
  "settings" : {
    "index.refresh_interval" : "5s"
  },
  "mappings" : {
    "_default_" : {
       "_all" : {"enabled" : true, "omit_norms" : true},
       "dynamic_templates" : [ {
         "message_field" : {
           "match" : "message",
           "match_mapping_type" : "string",
           "mapping" : {
             "type" : "string", "index" : "analyzed", "omit_norms" : true
           }
         }
       }, {
         "string_fields" : {
           "match" : "*",
           "match_mapping_type" : "string",
           "mapping" : {
             "type" : "string", "index" : "analyzed", "omit_norms" : true,
               "fields" : {
                 "raw" : {"type": "string", "index" : "not_analyzed", "doc_values" : true, "ignore_above" : 256}
               }
           }
         }
       }, {
         "float_fields" : {
           "match" : "*",
           "match_mapping_type" : "float",
           "mapping" : { "type" : "float", "doc_values" : true }
         }
       }, {
         "double_fields" : {
           "match" : "*",
           "match_mapping_type" : "double",
           "mapping" : { "type" : "double", "doc_values" : true }
         }
       }, {
         "byte_fields" : {
           "match" : "*",
           "match_mapping_type" : "byte",
           "mapping" : { "type" : "byte", "doc_values" : true }
         }
       }, {
         "short_fields" : {
           "match" : "*",
           "match_mapping_type" : "short",
           "mapping" : { "type" : "short", "doc_values" : true }
         }
       }, {
         "integer_fields" : {
           "match" : "*",
           "match_mapping_type" : "integer",
           "mapping" : { "type" : "integer", "doc_values" : true }
         }
       }, {
         "long_fields" : {
           "match" : "*",
           "match_mapping_type" : "long",
           "mapping" : { "type" : "long", "doc_values" : true }
         }
       }, {
         "date_fields" : {
           "match" : "*",
           "match_mapping_type" : "date",
           "mapping" : { "type" : "date", "doc_values" : true }
         }
       } ],
       "properties" : {
         "@timestamp": { "type": "date", "doc_values" : true },
         "@version": { "type": "string", "index": "not_analyzed", "doc_values" : true },
         "clientip": { "type": "ip", "doc_values" : true },
         "geoip"  : {
           "type" : "object",
           "dynamic": true,
           "properties" : {
             "ip": { "type": "ip", "doc_values" : true },
             "location" : { "type" : "geo_point", "doc_values" : true },
             "latitude" : { "type" : "float", "doc_values" : true },
             "longitude" : { "type" : "float", "doc_values" : true }
           }
         }
       }
    },
    "nginx_json" : {
      "properties" : {
        "duration" : { "type" : "float", "doc_values" : true },
        "status" : { "type" : "short", "doc_values" : true }
      }
    }
  }
}

 
You can also find this in a GitHub gist.

Feel free to add any suggestions, or adaptations you may have used, in the comments below!

Curator 1.1.0 Released

Hi all!

I have been busy working on Curator 1.1.0 since Elasticsearch released version 1.0, with Snapshot/Restore capability. It’s taken some time to get things to work the way I wanted them, but the results are good!

I wrote a full blog post about it on elasticsearch.com.

I did a huge workup of Curator version 1.0.0 in a previous blog post, but the commands are different now, so I went to the trouble of creating a documentation wiki to make things easier.

No, really:

READ THE DOCUMENTATION WIKI

Important: A Brand New Command-Line Structure

Changes to the command-line structure means that your older cron entries will not work with Curator 1.1.0. Please remember to update your commands when upgrading to Curator 1.1.0.

So if you missed my cue to read the new documentation wiki, here are some of the highlights.

Add/Remove indices from an Alias

Add indices older than 30 days to alias last_month:

 curator alias --alias-older-than 30 --alias last_month

Remove indices older than 60 days from alias last_month:

 curator alias --unalias-older-than 60 --alias last_month

Delete indices

Delete indices older than 30 days:

curator delete --older-than 30

Delete by space. Keep 1024GB (1TB) of data in elasticsearch:

curator delete --disk-space 1024

Note that when using size to determine which indices to keep, having closed indices will cause inaccuracies since they cannot be added to the overall size. This is only an issue if you have closed some indices that are not your oldest ones.

Close indices

Close indices older than 14 days:

curator close --older-than 14

Disable bloom filter for indices

Disable bloom filter for indices older than 1 day:

curator bloom --older-than 1

Optimize (Lucene forceMerge) indices

Optimize is a bit of a misnomer. It is in actuality a Lucene forceMerge operation. With time-series data in a per-day index, Lucene does a good job of keeping the number of segments low. However, if no new data is being ingested, no further segment merging will happen. There are some minor performance benefits from merging segments down to a smaller count, but a greater benefit when it comes to restarts [e.g. version upgrades, etc.] after a shutdown: with fewer segments to have to validate, the cluster comes back up sooner.

Optimize (Lucene forceMerge) indices older than 2 days to 2 segments per shard (the default is 2):

curator optimize --older-than 2

Optimize (Lucene forceMerge) indices older than 2 days to 1 segment per shard:

curator optimize --older-than 2 --max_num_segments 1

Please note that --timeout is no longer required, as it was in versions older than 1.1.0. A default of 6 hours (21600 seconds) will be applied for optimize and snapshot. Since the optimize operation can take a long time, curator may disconnect and fail to continue with further operations if the timeout is not set high enough. This number may need to be higher, or could be reduced depending on your scenario. The log file will tell you how long it took to perform previous operations, which you could use as a guideline.

Shard/index allocation

You can use curator to apply routing tags to your indices. This is useful for migrating stale indices from your heavy-duty indexing boxes to slower-hardware search boxes. Read more here about the index.routing.allocation.require.* settings. In order for the index-level settings to work, you must also have corresponding node-level settings.

Apply setting index.routing.allocation.require.tag=done_indexing to indices older than 2 days:

curator allocation --older-than 2 --rule tag=done_indexing

Snapshots

You can use curator to capture snapshots to a pre-defined repository. To create a repository you can use the API, or the es_repo_mgr script (included with curator 1.1.0). There are other tools available.

One snapshot will be created per index, and it will take its name from the index, e.g. an index named logstash-2014.06.10 will yield a snapshot named logstash-2014.06.10. The only index in each snapshot will be that matching index. This means if you are trying to snapshot multiple indices, it will loop through them one at a time, and it could take a while. You may need to set the initial timeout to something ridiculously large if you’re only just starting to capture snapshots. Another potential solution would be to snap them incrementally by changing the --older-than setting. Snapshots can also be captured by --most-recent count, and can be deleted with --delete-older-than:

Snapshot indices older than 20 days to REPOSITORY_NAME:

curator snapshot --older-than 20 --repository REPOSITORY_NAME     

Snapshot most recent 3 indices matching prefix .marvel- to REPOSITORY_NAME:

 curator snapshot --most-recent 3 --prefix .marvel- --repository REPOSITORY_NAME

Delete snapshots older than 365 days from REPOSITORY_NAME:

 curator snapshot --delete-older-than 365 --repository REPOSITORY_NAME

A default of 6 hours (21600 seconds) will be applied for optimize and snapshot. Since a snapshot can take a long time, curator may disconnect and fail to continue with further operations if the timeout is not set high enough. This number may need to be higher, or could be reduced depending on your scenario. The log file will tell you how long it took to perform previous operations, which you could use as a guideline.

Display indices and snapshots matching prefix

Display a list of all indices matching prefix (logstash- by default):

curator show --show-indices

Display a list of all snapshots in REPOSITORY_NAME matching prefix (logstash- by default):

curator show --show-snapshots --repository REPOSITORY_NAME

Conclusion

Curator 1.1.0 has awesome new features!  Please, go forth and make your time-series index management more awesome!  Happy Curating!

New collectd codec (Logstash 1.4.1+) configuration

With the advent of Logstash 1.4.1, I wanted to make sure everyone knows about the new collectd codec.

In Logstash 1.3.x, we introduced the collectd input plugin.  It was awesome!  We could process metrics in Logstash, store them in Elasticsearch and view them with Kibana.  The only downside was that you could only get around 3100 events per second through the plugin.  With Logstash 1.4.0 we introduced a newly revamped UDP input plugin which was multi-threaded and had a queue.  I refactored the collectd input plugin to be a codec (with some help from my co-workers and the community) to take advantage of this huge performance increase.  Now with only 3 threads on my dual-core Macbook Air I can get over 45,000 events per second through the collectd codec!

So, I wanted to provide some quick examples you could use to change your plugin configuration to use the codec instead.

The old way:

input {
  collectd {}
}

The new way:

input {
  udp {
    port => 25826         # Must be specified. 25826 is the default for collectd
    buffer_size => 1452   # Should be specified. 1452 is the default for recent versions of collectd
    codec => collectd { } # This will invoke the default options for the codec
    type => "collectd"
  }
}

This new configuration will use 2 threads and a queue size of 2000 by default for the UDP input plugin. With this you should easily be able to break 30,000 events per second!

I have provided a gist with some other configuration examples. For more information, please check out the Logstash documentation for the collectd codec.

Happy Logstashing!

Curator: Managing your Logstash and other time-series indices in Elasticsearch — beyond delete and optimize

Deprecated: See https://untergeek.com/2014/06/13/curator-1-1-0-released/

In my last post I mentioned curator, an update to the logstash_index_cleaner.py script I’d written so very long ago (okay, is 2.5 years a long time? It sure seems like it in sysops time…). I even linked to my blog post at elasticsearch.org about it. It hasn’t been quite a month, yet, but there have been some changes since then so I thought I’d write another blog post about it.

Installation

Curator is now in PyPI! Yay! This makes it so much easier to install:

pip install elasticsearch-curator

However, if you are using a version of Elasticsearch less than 1.0.0, you will need to use a different version of curator:

pip install elasticsearch-curator==0.6.2

Why specify a specific version? We’re branching curator to be able to accommodate changes in the Elasticsearch API for version 1.0 (which have correlating changes in the python elasticsearch module).

Upgrading

Already using a version of curator?  Upgrading is easy!

pip install -U elasticsearch-curator

The same pattern applies if you need to upgrade to a specific version (==X.Y.Z).

Usage

If you’ve installed via pip then you’re all ready to go.  You don’t even need to specify .py afterwards, as before, and it installs to /usr/local/bin so if that’s in your path, you don’t have to change a thing to use it:

curator -h

This will show you the help output, which is rather long.  I will touch on a few of the features and configuration options.

Delete

This is by far the most common use for curator.  But did you know you can delete by space or by date?

Date

Deleting by date is simple! To delete indices older than 30 days,

curator --host my-host -d 30

You can even delete by date + hour! If you have indices defined like logstash-%{+YYYY.MM.dd.HH} you can delete indices older than 48 hours like this:

curator --host my-host --time-unit hours -d 48

Space

You can delete by space if you need to, but with some provisos and warnings.

  1. If you close indices you will not get an accurate count of bytes.  Elasticsearch cannot report on the space consumed by closed indices.
  2. If you choose this method (and keep a large number of daily indices as a result) you may eventually exhaust the portion of your Elasticsearch heap space reserved for indexing.  I’ll revisit this later (in another blog post), but the short answer is you could wind up with too many simultaneously open indices.  One way to fix that is to close indices you’re not actively using, but then you get looped back to #1.
  3. Deleting by space across a cluster is more complicated because the index size reported will be divided among all of your nodes.  Since Elasticsearch tries to balance storage equally across all nodes, you’ll need to calculate accordingly.

To delete indices beyond 1TB (1024GB) of space:

curator --host my-host --curation-style space -g 1024

Optimize (or rather forceMerge)

The term “optimize” will not die, unfortunately.  Optimizing a hard drive is something you used to have to do every so often to defragment and re-order things for the best performance.  Businesses optimize constantly to improve efficiency and save money.  But in the Elasticsearch world, optimizing is something you never have to do.  It is completely optional.  Truthfully it does have a measurable but nearly negligible performance impact on searches (1% – 2%).  So why do it?  I’m glad you asked!

In technical terms, when you perform an optimize API call in Elasticsearch you’re asking it to do what’s known as a forceMerge.  This takes a bit of background, so google that if you want to know the deep down details.  The short version is that an index in Elasticsearch is made up of shards, each of which is a Lucene index.  Each shard is composed of segments.  While indexing, Elasticsearch will merge segments to keep from having too many open simultaneously (which can have an impact on availability of file handles, etc.).  Once you are no longer indexing to a given index, it won’t need all of those segments any more.

One of the best reasons to optimize is that recovery time during rolling restarts and outages can be dramatically reduced, as far fewer segments have to be verified.  One of the worst reasons to optimize is the slight performance boost to searches: as stated, a mere 1% – 2% increase.  The cost in terms of disk I/O is tremendous.  It is ill advised to optimize indices on busy clusters, as both search and indexing can take a performance hit.  It is absolutely unnecessary to optimize an index that is still being written to.  Don’t do it.

With these caveats out of the way, you can optimize indices older than 1 day down to 1 segment per shard (fully optimized) like this:

curator --host my-host --timeout 7200 --max_num_segments 1 -o 1

If unspecified, --max_num_segments defaults to 2 segments per shard.  Notice that the --timeout directive is set to 7200 seconds (2 hours).  Even small indices can take a long time to optimize.  My personal 2-node cluster on spinning disks, with around 2.5GB of data per index, takes 45 minutes to optimize a single index.  I run it in the middle of the night at 2am via cron.

Disable bloom filter cache

This is one of the best new features in curator!  It only works, however, if your Elasticsearch version is 0.90.9 or higher.  After you learn what it does you’ll hopefully find this an incentive to upgrade if you haven’t already.

The bloom filter cache speeds the indexing process.  With it disabled, indexing can continue, but at a roughly 40% – 50% speed penalty.  But what about time-series indices, like those from Logstash?  Today is 2014.02.18 and I’m currently writing to an index called logstash-2014.02.18.  But I am not writing to logstash-2014.02.16 any more, so why should I have the bloom filter cache open there?  By disabling the bloom filter cache on “cold” indices I can reclaim valuable resources to benefit the whole cluster.

You can disable the bloom filter cache for indices older than 1 day like this:

curator --host my-host -b 1

Simple, no?  The creator of Elasticsearch, Shay Banon, was very keen to get this into curator as soon as possible as it is one of the easiest ways for Logstash users to get a lot of benefit, very quickly.

Close

One of the earliest requests for curator was for staged expiration of indices.  That is to say, close indices older than 15 days and delete them after 30 days.  This is a big deal because an open index is consuming resources, whether you’re searching through it or not.  A closed index only consumes disk space.  If you typically aren’t searching past 1 week, then having the indices closed is a fantastic way to free up valuable resources for your cluster.  Also, if you’re obliged to keep 30 days of data, but rarely—if ever—search past 2 weeks, you can also meet that requirement easily with this setting.

To close indices older than 15 days:

curator --host my-host -c 15

So simple!

Combining flags

Of course, you don’t need to run one command at a time.  To close indices older than 15 days, delete those older than 30, and disable bloom filters on indices older than 1 day, all in one run:

curator --host my-host -b 1 -c 15 -d 30

One important limit is that you can’t combine deleting by space with any other operation.  That one needs to fly solo.

Order of operations

When combining flags it’s important to know that the script forces a particular order of operations to prevent unneeded API calls.  Why optimize an index that’s closed? (hint: it’ll fail if you try anyway) Why close an index that’s slated for deletion?

The order is as follows:

  1. Delete (by space or time)
  2. Close
  3. Disable bloom filters
  4. Optimize
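To make the ordering concrete, here is a small Python sketch of how the thresholds might be resolved for a single index.  It is only an illustration under stated assumptions: plan_actions is a hypothetical helper, not part of curator, and skipping the bloom filter step for an index about to be closed is my assumption (the text above only says that optimizing a closed index fails).

```python
def plan_actions(age_days, delete=None, close=None, bloom=None, optimize=None):
    """Return the actions that apply to one index, in the order listed above.

    Each keyword is an "older than N days" threshold, or None if the
    corresponding flag was not given.
    """
    # 1. Delete: nothing else is worth doing to an index that is going away.
    if delete is not None and age_days > delete:
        return ["delete"]
    # 2. Close: assumed here to rule out the remaining steps for this index.
    if close is not None and age_days > close:
        return ["close"]
    actions = []
    # 3. Disable bloom filters, then 4. optimize, for indices that stay open.
    if bloom is not None and age_days > bloom:
        actions.append("disable_bloom")
    if optimize is not None and age_days > optimize:
        actions.append("optimize")
    return actions
```

With the flags from the combined example (-b 1 -c 15 -d 30), a 45-day-old index is only deleted, a 20-day-old index is only closed, and a 3-day-old index only has its bloom filter cache disabled.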

Other flags

--prefix

With the recent release of Elasticsearch Marvel the --prefix flag will get some frequent usage!  Marvel stores its data in indices with a similar naming pattern to Logstash: .marvel-%{+YYYY.MM.dd}, so if you’re using Marvel and want to prune those older indices, curator will be happy to oblige!

To perform operations on indices with a different prefix (the default is logstash-), specify it with the --prefix flag:

curator --host my-host -d 30 --prefix .marvel-

The prefix should be everything right up to the date, including the hyphen in the example above.

--separator

If you format your date differently for some reason, e.g. %{+YYYY-MM-dd} (with hyphens instead of periods), then you can specify the separator like this:

curator --host my-host -d 30 --separator -
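The way a prefix and a separator combine to locate the date in an index name can be sketched in a few lines of Python.  This is a hedged illustration, not curator’s own parsing code: parse_index_date is a hypothetical function, and it assumes daily indices named as prefix, year, month and day.

```python
from datetime import datetime


def parse_index_date(name, prefix="logstash-", separator="."):
    """Return the date embedded in an index name, or None if it doesn't match."""
    if not name.startswith(prefix):
        return None
    # Build a format such as "%Y.%m.%d" or "%Y-%m-%d" from the separator.
    fmt = separator.join(["%Y", "%m", "%d"])
    try:
        return datetime.strptime(name[len(prefix):], fmt)
    except ValueError:
        return None  # the remainder of the name is not a date in this format
```

For example, parse_index_date(".marvel-2014.02.18", prefix=".marvel-") and parse_index_date("logstash-2014-02-18", separator="-") both resolve to 18 February 2014, while a name such as kibana-int yields None and would be left alone.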

--ssl

If you are accessing Elasticsearch through a proxy which is protected by SSL, you can specify the --ssl flag in your command-line.

--url_prefix

If you use a proxy to isolate your Elasticsearch cluster and it routes requests through a path, you might need this feature.

For example, if your Elasticsearch cluster were behind host foo.bar and had a url prefix of backend, your API call to check settings would look like this:

http://foo.bar/backend/_settings

Your curator command-line would then include these options:

curator --host foo.bar --url_prefix "backend"

Combining the --ssl and --url_prefix options would allow you to access a proxied, SSL protected Elasticsearch instance like this:

https://foo.bar/backend/_settings

with these command-line options:

curator --host foo.bar --port 443 --ssl --url_prefix "backend"
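For the curious, here is a rough Python sketch of how these options could map onto the URLs shown above.  The function build_url is hypothetical and written only for illustration.  It assumes the port is left out of the URL when it is the default for the scheme (80 for http, 443 for https), which is how the example URLs are written.

```python
def build_url(host, port, path, use_ssl=False, url_prefix=""):
    """Assemble a request URL from host, port, SSL setting and URL prefix."""
    scheme = "https" if use_ssl else "http"
    default_port = 443 if use_ssl else 80
    # Leave the port out when it is the default for the scheme.
    netloc = host if port == default_port else f"{host}:{port}"
    # Drop empty pieces and stray slashes so the path joins cleanly.
    parts = [p.strip("/") for p in (url_prefix, path) if p.strip("/")]
    return f"{scheme}://{netloc}/" + "/".join(parts)
```

Calling build_url("foo.bar", 443, "_settings", use_ssl=True, url_prefix="backend") gives https://foo.bar/backend/_settings, matching the proxied example above.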

--dry-run

Adding the --dry-run flag to your curator command line will show you what actions would have been taken without actually performing them.

--debug

This should be self-explanatory:  Increased log verbosity 🙂

--logfile

If you do not specify a log file with this flag, all log messages will be directed to stdout.  If you put curator into a crontab without specifying this (or redirecting stdout and stderr) you run the risk of noisy emails every time curator runs.

Conclusion (and future features!)

Curator has come a long way from its humble beginnings, but the best is yet to come!  Some of the feature requests we’re examining right now include:

Do you have an idea of something you’d like to see included in curator?  Please submit a feature request, or better yet, fork the repository and add it yourself!  We accept pull requests!

Curator

Deprecated:

See this post instead.

I had a lot of fun writing what was once logstash_index_cleaner.py.  That script was then rolled into Logstash’s expire-logs GitHub repository.  It’s been through several revisions and iterations since then, but the newest incarnation is the best.  Check out my blog post at elasticsearch.org on the new-and-improved Curator.