Hi all!
I have been busy working on Curator 1.1.0 ever since Elasticsearch released version 1.0 with its Snapshot/Restore capability. It’s taken some time to get things to work the way I wanted, but the results are good!
I wrote a full blog post about it on elasticsearch.com.
I did a huge workup of Curator version 1.0.0 in a previous blog post, but the commands are different now, so I went to the trouble of creating a documentation wiki to make things easier.
No, really:
READ THE DOCUMENTATION WIKI
Important: A Brand New Command-Line Structure
Changes to the command-line structure mean that your older cron entries will not work with Curator 1.1.0. Please remember to update your commands when you upgrade.
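For example, a nightly cron entry under the new structure might look like the line below; the schedule and the path to the curator executable are illustrative only (the delete command itself is covered further down):
20 0 * * * /usr/local/bin/curator delete --older-than 30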
So if you missed my cue to read the new documentation wiki, here are some of the highlights.
Add/remove indices to or from an alias
Add indices older than 30 days to alias last_month:
curator alias --alias-older-than 30 --alias last_month
Remove indices older than 60 days from alias last_month:
curator alias --unalias-older-than 60 --alias last_month
Delete indices
Delete indices older than 30 days:
curator delete --older-than 30
Delete by space. Keep 1024GB (1TB) of data in elasticsearch:
curator delete --disk-space 1024
Note that when using disk space to determine which indices to keep, closed indices will cause inaccuracies, since their size cannot be included in the total. This is only an issue if the indices you have closed are not your oldest ones.
Close indices
Close indices older than 14 days:
curator close --older-than 14
Disable bloom filter for indices
Bloom filters speed up indexing but are held in memory; once an index is no longer being written to, its bloom filter can safely be disabled to free that memory. Disable the bloom filter for indices older than 1 day:
curator bloom --older-than 1
Optimize (Lucene forceMerge) indices
Optimize is a bit of a misnomer; it is actually a Lucene forceMerge operation. With time-series data in a per-day index, Lucene does a good job of keeping the number of segments low. However, if no new data is being ingested, no further segment merging will happen. There are some minor performance benefits from merging segments down to a smaller count, but the greater benefit comes at restart time (e.g. after a version upgrade): with fewer segments to validate, the cluster comes back up sooner.
Optimize (Lucene forceMerge) indices older than 2 days to 2 segments per shard (the default is 2):
curator optimize --older-than 2
Optimize (Lucene forceMerge) indices older than 2 days to 1 segment per shard:
curator optimize --older-than 2 --max_num_segments 1
Please note that --timeout is no longer required, as it was in versions older than 1.1.0. A default of 6 hours (21600 seconds) will be applied for optimize and snapshot. Since the optimize operation can take a long time, curator may disconnect and fail to continue with further operations if the timeout is not set high enough. This number may need to be raised, or could be reduced, depending on your scenario. The log file will tell you how long previous operations took, which you can use as a guideline.
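As an illustration only, you could set the timeout explicitly for a long-running forceMerge; the value and the placement of --timeout before the subcommand are assumptions here, so check curator --help for your version:
curator --timeout 21600 optimize --older-than 2 --max_num_segments 1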
Shard/index allocation
You can use curator to apply routing tags to your indices. This is useful for migrating stale indices from your heavy-duty indexing boxes to slower-hardware search boxes. Read more here about the index.routing.allocation.require.* settings. In order for the index-level settings to work, you must also have corresponding node-level settings (see the sketch after the example command below).
Apply the setting index.routing.allocation.require.tag=done_indexing to indices older than 2 days:
curator allocation --older-than 2 --rule tag=done_indexing
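For the rule above to take effect, the receiving nodes need a matching node attribute. A minimal sketch for Elasticsearch 1.x, assuming the attribute name tag from the example; set this in elasticsearch.yml on the slower search boxes and restart those nodes:
# elasticsearch.yml on the nodes that should hold the done_indexing indices
node.tag: done_indexing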
Snapshots
You can use curator to capture snapshots to a pre-defined repository. To create a repository, you can use the API or the es_repo_mgr script (included with curator 1.1.0). There are other tools available as well.
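For example, a shared-filesystem repository can be registered with the snapshot API; the repository name and the location below are placeholders, and the location must be reachable from every node in the cluster:
curl -XPUT 'http://localhost:9200/_snapshot/REPOSITORY_NAME' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/REPOSITORY_NAME"
  }
}'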
One snapshot will be created per index, and it will take its name from the index, e.g. an index named logstash-2014.06.10 will yield a snapshot named logstash-2014.06.10. The only index in each snapshot will be that matching index. This means that if you are snapshotting multiple indices, curator will loop through them one at a time, which could take a while. You may need to set the initial timeout to something very large if you are just starting to capture snapshots. Another option is to snapshot them incrementally by changing the --older-than setting. Snapshots can also be captured by --most-recent count, and can be deleted with --delete-older-than:
Snapshot indices older than 20 days to REPOSITORY_NAME:
curator snapshot --older-than 20 --repository REPOSITORY_NAME
Snapshot the most recent 3 indices matching prefix .marvel- to REPOSITORY_NAME:
curator snapshot --most-recent 3 --prefix .marvel- --repository REPOSITORY_NAME
Delete snapshots older than 365 days from REPOSITORY_NAME:
curator snapshot --delete-older-than 365 --repository REPOSITORY_NAME
A default timeout of 6 hours (21600 seconds) will be applied for optimize and snapshot. Since a snapshot can take a long time, curator may disconnect and fail to continue with further operations if the timeout is not set high enough. This number may need to be raised, or could be reduced, depending on your scenario. The log file will tell you how long previous operations took, which you can use as a guideline.
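As with optimize, the following is only an illustration for a first, large snapshot run; the timeout value and its placement before the subcommand are assumptions, so check curator --help for your version:
curator --timeout 86400 snapshot --older-than 20 --repository REPOSITORY_NAME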
Display indices and snapshots matching prefix
Display a list of all indices matching the prefix (logstash- by default):
curator show --show-indices
Display a list of all snapshots in REPOSITORY_NAME matching the prefix (logstash- by default):
curator show --show-snapshots --repository REPOSITORY_NAME
Conclusion
Curator 1.1.0 has awesome new features! Please, go forth and make your time-series index management more awesome! Happy Curating!
Curator requires a version of urllib3 that I cannot use (because it has a bad interaction with python-requests, which in turn breaks cloud-init). Is there a version of curator without the urllib3 dependency, or is there a non-curator way to trim ES indices?
Thank you.