It’s been so long since I posted here that people are likely to think I’ve abandoned my blog. It’s partially true. It’s been nearly a year since my last post, and it’s been exactly a year since the post before that.

I figure this is the perfect place to make the announcement for Curator 4. I’ve been very busy with my Logstash work, so Curator 4 is coming out considerably later than I would have liked. Better late than never, right?

So why Curator 4?

Because I can’t stop improving things, and Curator 3 was decent, but had some limitations which would not permit me to add some of the most requested features. “What new features are those?” you might ask. All in good time!

Limitations in Curator 3

Curator 3 and each of its predecessors were designed to be run from cron, so that periodic maintenance could be performed easily. All of the other features added to Curator since the very beginning (which was only index deletion) have been bolted on, resulting in a very complex command-line structure. This was still navigable, but not what I would have called ideal. One of the most requested features was snapshot restore. Did you know it would have required 9+ flags to accommodate only most of the options available? I just couldn’t add something like that to the bloated command-line structure and still consider it a tool I’m proud to point to and say, “I made that!”

Another of the limitations was atomic alias actions. I puzzled over how to do that with the command-line structure for a long time and realized that it would have resulted in huge, complicated, and hard-to-read command lines. Nobody wants that. So what was the solution? Configuration files.

Configure all the things!
One of the design decisions for Curator 4 was to use YAML configuration files–two of them, to be precise: one for the client configuration (and logging options), and one for the actions to be performed. Having a default client configuration allows for multiple, different action configuration files to not need to repeat the client information in each of them. If you store the client configuration file as $HOME/.curator/curator.yml, then you won’t even have to reference it at the command-line!
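As a rough illustration of that default lookup, here is a minimal sketch. The function name `resolve_client_config` and its parameters are hypothetical, not Curator’s actual code; the only detail taken from above is the default location, `$HOME/.curator/curator.yml`.

```python
import os


def resolve_client_config(cli_path=None, home=None):
    """Pick the client configuration file to use (hypothetical helper).

    An explicitly supplied path wins; otherwise fall back to
    $HOME/.curator/curator.yml, the default location described above.
    """
    if cli_path:
        return cli_path
    home = home or os.path.expanduser("~")
    return os.path.join(home, ".curator", "curator.yml")


# With no explicit path, the default under the home directory is used.
default_path = resolve_client_config(home="/home/alice")
```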

The action file allows for filter stacking and command pipelining.

Filter Stacking

If you used Curator before version 4, then you know that Curator had a limited number of ways you could combine filters before performing the desired action. Generally, that was limited to regular expression filtering combined with age-based filtering. With Curator 4, you can chain multiple filters together–as many as you like–to restrict which indices to act on. How might this help you? Let’s say you want to delete indices in excess of 30G of total space consumed. This might represent 30 days’ worth of data with your normal logging. What if some event caused a torrent of log lines to be produced? You might accidentally delete weeks’ worth of logs. With filter stacking, you could first filter by pattern, to only count Logstash indices. The next filter would be disk space, 30G worth. The third filter, however, is the magic one: only delete indices older than 30 days. The total stack would mean, “delete Logstash indices in excess of 30G of storage, but only if they’re also older than 30 days.” Neat, eh?
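To make the idea concrete, here is a small, self-contained sketch of that three-filter stack. It is not Curator’s implementation or its API: the helper names (`by_pattern`, `by_space`, `by_age`, `apply_stack`) and the dictionary layout for an index are assumptions made up for illustration, and the index data is invented.

```python
import re

DAY = 3600 * 24
GB = 1024 ** 3


def by_pattern(regex):
    # Keep only indices whose name matches the regular expression.
    return lambda indices: [i for i in indices if re.search(regex, i["name"])]


def by_space(limit_bytes):
    # Walk from newest to oldest; anything past the size limit is "excess".
    def apply(indices):
        used, excess = 0, []
        for idx in sorted(indices, key=lambda i: i["created"], reverse=True):
            used += idx["size"]
            if used > limit_bytes:
                excess.append(idx)
        return excess
    return apply


def by_age(now, max_age_seconds):
    # Keep only indices strictly older than max_age_seconds.
    return lambda indices: [i for i in indices if now - i["created"] > max_age_seconds]


def apply_stack(indices, filters):
    # Each filter narrows the list handed on to the next one.
    for f in filters:
        indices = f(indices)
    return indices


now = 100 * DAY  # made-up "current time", in epoch seconds
indices = [
    {"name": "logstash-old1", "created": 10 * DAY, "size": 10 * GB},
    {"name": "logstash-old2", "created": 60 * DAY, "size": 10 * GB},
    {"name": "logstash-recent", "created": 80 * DAY, "size": 10 * GB},
    {"name": "logstash-newer", "created": 90 * DAY, "size": 10 * GB},
    {"name": "logstash-burst", "created": 99 * DAY, "size": 20 * GB},  # the log torrent
    {"name": "other-big", "created": 5 * DAY, "size": 50 * GB},
]

space_only = apply_stack(indices, [by_pattern(r"^logstash-"), by_space(30 * GB)])
stacked = apply_stack(
    indices,
    [by_pattern(r"^logstash-"), by_space(30 * GB), by_age(now, 30 * DAY)],
)
names = [i["name"] for i in stacked]
```

In this made-up data, the space filter alone would also select `logstash-recent`, which is only 20 days old. Adding the age filter to the stack spares it.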

Command Pipelining

Command pipelining means that you don’t have to execute a different Curator command for each action you want to perform. You can use the YAML action file to have multiple commands, one after the other, in the same file. It is a configurable option to have execution halt if an action fails with an exception, or continue even if there is an exception.
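The halt-or-continue behavior can be sketched in a few lines. This is only an illustration of the control flow, not Curator’s code: the `run_actions` function, the `keep_going` flag, and the way actions are represented as `(name, callable)` pairs are all assumptions.

```python
def run_actions(actions, keep_going=False):
    """Run (name, callable) pairs in order and record what happened.

    If an action raises and keep_going is False, stop right there;
    otherwise note the failure and carry on with the next action.
    """
    results = []
    for name, func in actions:
        try:
            func()
            results.append((name, "ok"))
        except Exception as exc:
            results.append((name, "failed: %s" % exc))
            if not keep_going:
                break
    return results


def succeed():
    pass


def explode():
    raise RuntimeError("snapshot repository unavailable")


pipeline = [("delete", succeed), ("snapshot", explode), ("alias", succeed)]

halted = run_actions(pipeline)                      # stops after "snapshot"
continued = run_actions(pipeline, keep_going=True)  # still runs "alias"
```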

New Actions

There are some new tools in the Curator stable:

One that should almost be considered new since it’s so improved over previous versions is Alias, which now supports simultaneous, atomic add & remove.

Optimize has been renamed to forceMerge, in accordance with Elastic’s API changes.

New Filters

Well, mostly just improved filters. Filter by space now lets you also filter by age, as an extra step within the space filter itself (not as a stacked filter), rather than filtering exclusively by space. Why might this be important? So you delete the oldest indices first, of course!

Speaking of deleting the oldest indices first, filtering by age now offers 3 different ways to determine index age:

  • name (which is what all previous versions of Curator used), which requires a time or datestamp in the index name
  • creation_date which derives the age from the time that Elasticsearch created the index, as stored in the index metadata
  • field_stats which calculates the age from the greatest and least values in a specified field. For Curator 4, since this is an age calculation, the field type must be mapped as a date.

Also, with regard to age, Curator now converts the name-derived timestamps to epoch time for comparisons, since creation_date and field_stats are already in epoch time. This is important, as it means that comparisons do not follow the conventions used in Curator 3. If a timestamp is older than a date, it’s older. If it’s younger, it’s younger. Curator no longer tries to calculate and compensate for a full unit count. Test with the --dry-run flag before using this to ensure you don’t delete something you want kept.
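Here is a minimal sketch of what that kind of comparison looks like. It is not taken from Curator’s source: the function names, the hard-coded `%Y.%m.%d` date pattern, and the choice to treat the timestamp as UTC midnight are all assumptions for illustration.

```python
import calendar
import re
import time


def epoch_from_name(index_name, timestring="%Y.%m.%d"):
    """Turn a date embedded in an index name into epoch seconds.

    Assumes a YYYY.MM.DD date somewhere in the name and treats it as
    midnight UTC. Returns None when no date is found.
    """
    match = re.search(r"\d{4}\.\d{2}\.\d{2}", index_name)
    if match is None:
        return None
    return calendar.timegm(time.strptime(match.group(0), timestring))


def is_older_than(index_epoch, now_epoch, age_seconds):
    # A plain comparison against a cutoff: older is older, younger is younger.
    return index_epoch < now_epoch - age_seconds


DAY = 3600 * 24
index_epoch = epoch_from_name("logstash-2016.01.01")  # 1451606400

old_enough = is_older_than(index_epoch, index_epoch + 31 * DAY, 30 * DAY)  # True
too_young = is_older_than(index_epoch, index_epoch + 29 * DAY, 30 * DAY)   # False
```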

Also, since all time calculations are relative to epoch time, and are therefore in seconds, time units have been revamped as multiples of seconds:

    if unit == 'seconds':
        multiplier = 1
    elif unit == 'minutes':
        multiplier = 60
    elif unit == 'hours':
        multiplier = 3600
    elif unit == 'days':
        multiplier = 3600*24
    elif unit == 'weeks':
        multiplier = 3600*24*7
    elif unit == 'months':
        multiplier = 3600*24*30
    elif unit == 'years':
        multiplier = 3600*24*365

This means you can use seconds, minutes, hours, days, weeks, months, or even years as valid units. Just remember that Curator 4 doesn’t care that February only has 28 days. If you use months, it is counting 30 days’ worth of seconds.
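A quick worked example shows what that means in practice. The multipliers below are the ones listed above; the `MULTIPLIERS` table and the `cutoff_epoch` helper are hypothetical names of my own for this sketch, not part of Curator.

```python
import calendar
import time

# Same multipliers as in the snippet above, expressed as a lookup table.
MULTIPLIERS = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 3600 * 24,
    "weeks": 3600 * 24 * 7,
    "months": 3600 * 24 * 30,
    "years": 3600 * 24 * 365,
}


def cutoff_epoch(now_epoch, unit, unit_count):
    # Anything with a timestamp before this value counts as "older than".
    return now_epoch - MULTIPLIERS[unit] * unit_count


# "1 month" before 2016-03-01 00:00 UTC is 30 days earlier, not February 1st.
now = calendar.timegm((2016, 3, 1, 0, 0, 0))
cutoff = cutoff_epoch(now, "months", 1)
cutoff_date = time.strftime("%Y-%m-%d", time.gmtime(cutoff))  # "2016-01-31"
```

So a “1 month” cutoff measured from March 1, 2016 lands on January 31, not February 1.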

More posts to follow!

There’s too much for me to describe in a single blog post. I’ll continue to write about the new changes in Curator 4 over the coming days. In the meantime, please read the release notes and the online documentation for more information.

Happy Curating!
