This page is deprecated!
Use the default Logstash template instead.
Update 2014-05-16: Logstash now ships with a default template (which descended from these efforts); it's available here.
Update 2012-11-05: My most recent template/mapping can be found here.
I find that logstash does a great job with elasticsearch's default index mapping behavior if you are not sending a ton of log events. Once that volume begins to grow, however, you need to manage how it is indexed. One great way to maximize your elasticsearch performance is to use an index template.
curl -XPUT http://localhost:9200/_template/logstash_per_index -d '
{
  "template" : "logstash*",
  "settings" : {
    "number_of_shards" : 4,
    "index.cache.filter.expire" : "5m",
    "index.cache.field.expire" : "5m",
    "index.refresh_interval" : "5s",
    "index.store.compress.stored" : true,
    "index.query.default_field" : "@message",
    "index.routing.allocation.total_shards_per_node" : 2
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : false }
    }
  }
}'
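If you want to confirm the template actually registered, you can read it back. Depending on your elasticsearch version the dedicated GET template endpoint may or may not exist (it arrived in later releases), so treat this as a sketch against a node on localhost:9200:

```shell
# Read back the stored template by name; an empty {} response
# means no template by that name is registered.
curl -XGET 'http://localhost:9200/_template/logstash_per_index?pretty=true'
```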
This template was awesome! I let my field cache expire after 5 minutes to keep it from overfilling. I had compression turned on to save space. I had 4 nodes and 4 shards (plus the default of 1 replica per shard), so each daily index carried 8 shards in total. With total_shards_per_node fixed at 2, each node typically held one primary shard and one replica per daily index. I was using @message as my default search field and dropping the _all field to save space. And then I learned about the soft cache type.
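You can sanity-check that the allocation actually worked out to one primary and one replica per node with the cluster health API, which accepts a level parameter for per-shard detail (command assumes a node on localhost:9200; the exact output shape varies by version):

```shell
# Report cluster health down to the shard level so you can verify
# how primaries and replicas are distributed across the nodes.
curl -XGET 'http://localhost:9200/_cluster/health?level=shards&pretty=true'
```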
The field cache is awesome in elasticsearch, and you want that data to stay resident for as long as possible, because it makes searches lightning fast! A 5-minute expiration time doesn't help with that at all, and re-caching the same data over and over was painful. I was hoping for another solution but could never find one. Then, on the elasticsearch LinkedIn group, I found this article: http://blog.sematext.com/2012/05/17/elasticsearch-cache-usage/ After learning about this change and how it had benefited others, I had to make it myself. Now my template looks like this:
curl -XPUT http://localhost:9200/_template/logstash_per_index -d '
{
  "template" : "logstash*",
  "settings" : {
    "number_of_shards" : 4,
    "index.cache.field.type" : "soft",
    "index.refresh_interval" : "5s",
    "index.store.compress.stored" : true,
    "index.query.default_field" : "@message",
    "index.routing.allocation.total_shards_per_node" : 2
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : false },
      "properties" : {
        "@fields" : {
          "type" : "object",
          "dynamic" : true,
          "path" : "full",
          "properties" : {
            "clientip" : { "type" : "ip" }
          }
        },
        "@message" : { "type" : "string", "index" : "analyzed" },
        "@source" : { "type" : "string", "index" : "not_analyzed" },
        "@source_host" : { "type" : "string", "index" : "not_analyzed" },
        "@source_path" : { "type" : "string", "index" : "not_analyzed" },
        "@tags" : { "type" : "string", "index" : "not_analyzed" },
        "@timestamp" : { "type" : "date", "index" : "not_analyzed" },
        "@type" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'
The change means that I don't need cache expiry any more: soft references let the JVM's garbage collector evict field cache entries when memory runs low, so it covers that for me! The downside is that it will take a full 30 days for this change to be fully in effect, since the index.cache.field.type setting can't be applied to existing indexes; only newly created daily indexes pick it up as the old ones age out.
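Because templates only apply at index-creation time, one way to watch the rollover happen is to read back the live settings of a given day's index and look for index.cache.field.type (the index name here is an example; substitute one of your own daily indexes):

```shell
# Fetch the effective settings for one daily index; indexes created
# after the template was installed should show "index.cache.field.type" : "soft".
curl -XGET 'http://localhost:9200/logstash-2012.11.05/_settings?pretty=true'
```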