Using templates to improve elasticsearch caching (with logstash)

This page is deprecated!

Use the default Logstash template instead.

Update 2014-05-16: Logstash now comes with a default template (which descended from these efforts); it’s available here.

Update 2012-11-05: My most recent template/mapping can be found here.

I find that logstash does a great job with the default index mapping behavior in elasticsearch if you are not sending a ton of log events. Once that volume begins to grow, however, you need to manage how the data is indexed. One great way to maximize your elasticsearch performance is to use an index template.


curl -XPUT http://localhost:9200/_template/logstash_per_index -d '
{
    "template" : "logstash*",
    "settings" : {
        "number_of_shards" : 4,
        "index.cache.filter.expire" : "5m",
        "index.cache.field.expire" : "5m",
        "index.refresh_interval" : "5s",
        "index.store.compress.stored" : true,
        "index.query.default_field" : "@message",
        "index.routing.allocation.total_shards_per_node" : 2
    },
    "mappings" : {
        "_default_" : {
             "_all" : {"enabled" : false}
        }
    }
}
'
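Once the template is in place, you can confirm that elasticsearch registered it. Depending on your elasticsearch version you can query the template endpoint directly (newer releases), or look under metadata.templates in the cluster state (older releases). A quick check, assuming the same host and port as above:

curl -XGET 'http://localhost:9200/_template/logstash_per_index?pretty=true'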

This template was awesome! I let the field cache expire after 5 minutes to keep it from overfilling. I turned on compression to save space. I have 4 nodes and 4 shards per index (plus the default of 1 replica per shard), and with total_shards_per_node capped at 2, each node typically ends up holding one primary shard and one replica for each day's index. I used @message as my default search field and dropped the _all field for space considerations. And then I learned about the soft cache type.
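If you want to see where those 8 shard copies (4 primaries plus their 4 replicas) actually landed for a given day, the cluster state includes the routing table. This is just a quick sanity check, assuming the same host and port as above; the output is verbose, so pipe it through a pager:

curl -XGET 'http://localhost:9200/_cluster/state?pretty=true' | less

The routing_table section lists each logstash index and which node holds each primary and replica, which lets you confirm the total_shards_per_node cap of 2 is being respected.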

The field cache is awesome in elasticsearch, and you want that data to stay cached for as long as possible. It makes searches lightning fast! A 5-minute expiration time doesn’t help with that at all, and the tendency to re-cache the same data was painful. I was hoping for another solution, but could never find one. Then, on the elasticsearch LinkedIn group, I found this article: http://blog.sematext.com/2012/05/17/elasticsearch-cache-usage/. After learning about this change and how it had benefited others, I had to make the change myself. Now my template looks like this:

curl -XPUT http://localhost:9200/_template/logstash_per_index -d '{
    "template" : "logstash*",
    "settings" : {
        "number_of_shards" : 4,
        "index.cache.field.type" : "soft",
        "index.refresh_interval" : "5s",
        "index.store.compress.stored" : true,
        "index.query.default_field" : "@message",
        "index.routing.allocation.total_shards_per_node" : 2
    },
    "mappings" : {
        "_default_" : {
           "_all" : {"enabled" : false},
           "properties" : {
              "@fields" : {
                   "type" : "object",
                   "dynamic": true,
                   "path": "full",
                   "properties" : {
                       "clientip" : { "type": "ip"}
                   }
              },
              "@message": { "type": "string", "index": "analyzed" },
              "@source": { "type": "string", "index": "not_analyzed" },
              "@source_host": { "type": "string", "index": "not_analyzed" },
              "@source_path": { "type": "string", "index": "not_analyzed" },
              "@tags": { "type": "string", "index": "not_analyzed" },
              "@timestamp": { "type": "date", "index": "not_analyzed" },
               "@type": { "type": "string", "index": "not_analyzed" }    
           }   
        }
   }
}
'

The change means I don’t need cache expiry any more: the JVM’s garbage collector will clear the soft-referenced cache entries for me when memory gets tight! The downside is that it will take a full 30 days for this solution to be fully in effect, since the index.cache.field.type setting can’t be applied to existing indexes and only newly created daily indexes will pick it up.
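In the meantime, you can at least check which of your daily indexes have picked up the new setting. The index name below is just an example of logstash's daily naming; substitute one of your own:

curl -XGET 'http://localhost:9200/logstash-2012.11.05/_settings?pretty=true'

Indexes created after the template change will show index.cache.field.type as soft in their settings; the older ones won't until they age out.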

7 thoughts on “Using templates to improve elasticsearch caching (with logstash)”

  1. Is there no way to update index.cache.field.type without recreating indices? If not, any clue why? I’d imagine caching is quite a decoupled function.
    Nice article btw. And the sematext article on cache usage has been really useful for me too.

  2. Update: I think adding
    index.cache.field.type: ‘soft’
    to elasticsearch.yml and restarting the cluster does the trick. Not sure how to verify if this change has been applied though, without recreating a relevant scenario.

    • Aaron says:

      I think it does work the way you suggest. I just prefer to keep directives like these in my templates. That way I don’t have to worry about missing the setup on one of my nodes.

    • David Warden says:

      index.cache.field.type can be changed after the index has been created, but you must close it first. The “head” plugin for ElasticSearch is great for doing that kind of operation (managing indices and making HTTP requests to ES).

      Anyway, thanks for this very useful template, I was pulling out my hair trying to figure out why ES was constantly running out of memory.

      • Aaron says:

        You can do that with the index closed? I didn’t even see that the option would work in the Update API. That’s pretty cool, but it does come at the cost of a rebalance when you close/open the index.

        Let me know what your results are, too. It may not be perfect, still, but it’s much less likely to kill ES now.
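For anyone following along, the close / update-settings / reopen sequence discussed in this thread looks roughly like the following. The index name is hypothetical, and whether this particular setting actually takes effect on reopen depends on your elasticsearch version, as noted above:

curl -XPOST 'http://localhost:9200/logstash-2012.11.04/_close'

curl -XPUT 'http://localhost:9200/logstash-2012.11.04/_settings' -d '
{
    "index.cache.field.type" : "soft"
}
'

curl -XPOST 'http://localhost:9200/logstash-2012.11.04/_open'

Closing and reopening an index does trigger recovery of its shards, so it's best done one index at a time during a quiet period.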
