Hi all!
I have been busy working on Curator 1.1.0 ever since Elasticsearch released version 1.0 with its Snapshot/Restore capability. It’s taken some time to get things to work the way I wanted, but the results are good!
I wrote a full blog post about it on elasticsearch.com.
I did a huge workup of Curator version 1.0.0 in a previous blog post, but the commands are different now, so I went to the trouble of creating a documentation wiki to make things easier.
No, really:
READ THE DOCUMENTATION WIKI
Important: A Brand New Command-Line Structure
Changes to the command-line structure mean that your older cron entries will not work with Curator 1.1.0. Please remember to update your commands when you upgrade.
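For example, a nightly cron entry under the new structure might look like the line below; the schedule and the path to the curator executable are illustrative only (the delete command itself is covered further down):
20 0 * * * /usr/local/bin/curator delete --older-than 30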
So if you missed my cue to read the new documentation wiki, here are some of the highlights.
Add/remove indices to or from an alias
Add indices older than 30 days to alias last_month:
curator alias --alias-older-than 30 --alias last_month
Remove indices older than 60 days from alias last_month:
curator alias --unalias-older-than 60 --alias last_month
Delete indices
Delete indices older than 30 days:
curator delete --older-than 30
Delete by space. Keep 1024GB (1TB) of data in elasticsearch:
curator delete --disk-space 1024
Note that when using disk space to determine which indices to keep, closed indices will cause inaccuracies, since their size cannot be included in the total. This is only an issue if the indices you have closed are not your oldest ones.
Close indices
Close indices older than 14 days:
curator close --older-than 14
Disable bloom filter for indices
Bloom filters speed up indexing but are held in memory; once an index is no longer being written to, its bloom filter can safely be disabled to free that memory. Disable the bloom filter for indices older than 1 day:
curator bloom --older-than 1
Optimize (Lucene forceMerge) indices
Optimize is a bit of a misnomer; it is actually a Lucene forceMerge operation. With time-series data in a per-day index, Lucene does a good job of keeping the number of segments low. However, if no new data is being ingested, no further segment merging will happen. There are some minor performance benefits from merging segments down to a smaller count, but the greater benefit comes at restart time (e.g. after a version upgrade): with fewer segments to validate, the cluster comes back up sooner.
Optimize (Lucene forceMerge) indices older than 2 days to 2 segments per shard (the default is 2):
curator optimize --older-than 2
Optimize (Lucene forceMerge) indices older than 2 days to 1 segment per shard:
curator optimize --older-than 2 --max_num_segments 1
Please note that --timeout is no longer required, as it was in versions older than 1.1.0. A default of 6 hours (21600 seconds) will be applied for optimize and snapshot. Since the optimize operation can take a long time, curator may disconnect and fail to continue with further operations if the timeout is not set high enough. This number may need to be raised, or could be reduced, depending on your scenario. The log file will tell you how long previous operations took, which you can use as a guideline.
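As an illustration only, you could set the timeout explicitly for a long-running forceMerge; the value and the placement of --timeout before the subcommand are assumptions here, so check curator --help for your version:
curator --timeout 21600 optimize --older-than 2 --max_num_segments 1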
Shard/index allocation
You can use curator to apply routing tags to your indices. This is useful for migrating stale indices from your heavy-duty indexing boxes to slower-hardware search boxes. Read more here about the index.routing.allocation.require.* settings. In order for the index-level settings to work, you must also have corresponding node-level settings (see the sketch after the example command below).
Apply the setting index.routing.allocation.require.tag=done_indexing to indices older than 2 days:
curator allocation --older-than 2 --rule tag=done_indexing
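For the rule above to take effect, the receiving nodes need a matching node attribute. A minimal sketch for Elasticsearch 1.x, assuming the attribute name tag from the example; set this in elasticsearch.yml on the slower search boxes and restart those nodes:
# elasticsearch.yml on the nodes that should hold the done_indexing indices
node.tag: done_indexing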
Snapshots
You can use curator to capture snapshots to a pre-defined repository. To create a repository, you can use the API or the es_repo_mgr script (included with curator 1.1.0). There are other tools available as well.
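For example, a shared-filesystem repository can be registered with the snapshot API; the repository name and the location below are placeholders, and the location must be reachable from every node in the cluster:
curl -XPUT 'http://localhost:9200/_snapshot/REPOSITORY_NAME' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/REPOSITORY_NAME"
  }
}'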
One snapshot will be created per index, and it will take its name from the index, e.g. an index named logstash-2014.06.10 will yield a snapshot named logstash-2014.06.10. The only index in each snapshot will be that matching index. This means that if you are snapshotting multiple indices, curator will loop through them one at a time, which could take a while. You may need to set the initial timeout to something very large if you are just starting to capture snapshots. Another option is to snapshot them incrementally by changing the --older-than setting. Snapshots can also be captured by --most-recent count, and can be deleted with --delete-older-than:
Snapshot indices older than 20 days to REPOSITORY_NAME:
curator snapshot --older-than 20 --repository REPOSITORY_NAME
Snapshot the most recent 3 indices matching prefix .marvel- to REPOSITORY_NAME:
curator snapshot --most-recent 3 --prefix .marvel- --repository REPOSITORY_NAME
Delete snapshots older than 365 days from REPOSITORY_NAME:
curator snapshot --delete-older-than 365 --repository REPOSITORY_NAME
A default timeout of 6 hours (21600 seconds) will be applied for optimize and snapshot. Since a snapshot can take a long time, curator may disconnect and fail to continue with further operations if the timeout is not set high enough. This number may need to be raised, or could be reduced, depending on your scenario. The log file will tell you how long previous operations took, which you can use as a guideline.
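As with optimize, the following is only an illustration for a first, large snapshot run; the timeout value and its placement before the subcommand are assumptions, so check curator --help for your version:
curator --timeout 86400 snapshot --older-than 20 --repository REPOSITORY_NAME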
Display indices and snapshots matching prefix
Display a list of all indices matching the prefix (logstash- by default):
curator show --show-indices
Display a list of all snapshots in REPOSITORY_NAME matching the prefix (logstash- by default):
curator show --show-snapshots --repository REPOSITORY_NAME
Conclusion
Curator 1.1.0 has awesome new features! Please, go forth and make your time-series index management more awesome! Happy Curating!
Curator requires a version of urllib3 that I cannot use (because it has a bad interaction with python-requests, which in turn breaks cloud-init). Is there a version of curator without the urllib3 dependency, or is there a non-curator way to trim ES indices?
Thank you.