Update: This page is for the now deprecated Logstash 1.1.x and older. Look for the updated version of this here: http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/

Last time we looked at ways to improve logstash/elasticsearch with elasticsearch templates. Today we’ll save ourselves a lot of grok parsing pain with Apache’s custom log format feature.

Disclaimer: This only works with versions of logstash supporting the UDP input. You can adapt this to send or log in another way, if you like, e.g. send the json to a file and have logstash tail it.

Let’s look first and explain later. If you are using an Include line in your Apache config (e.g. Include conf.d/*.conf), all you need to do is put this in a standalone file or a vhost. On a single-host Apache, I create logstash.conf and put this in it:

LogFormat "{ \
            \"@vips\":[\"vip.example.com\",\"customer.example.net\"], \
            \"@source\":\"file://host.example.com//usr/local/apache2/logs/access_log\", \
            \"@source_host\": \"host.example.com\", \
            \"@source_path\": \"/usr/local/apache2/logs/access_log\", \
            \"@tags\":[\"Application\",\"Customer\"], \
            \"@message\": \"%h %l %u %t \\\"%r\\\" %>s %b\", \
            \"@fields\": { \
                \"timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
                \"clientip\": \"%a\", \
                \"duration\": %D, \
                \"status\": %>s, \
                \"request\": \"%U%q\", \
                \"urlpath\": \"%U\", \
                \"urlquery\": \"%q\", \
                \"method\": \"%m\", \
                \"bytes\": %B \
                }  \
           }" ls_apache_json

CustomLog "|/usr/local/bin/udpclient.pl 127.0.0.1 57080" ls_apache_json
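If you’d rather not pipe to a helper script at all, the file-based alternative mentioned earlier is just a normal CustomLog pointed at a file that logstash tails (the path here is only an example):

```
# Alternative: write the JSON lines straight to a file,
# then have logstash read them with a file input instead of udp
CustomLog "/usr/local/apache2/logs/access_json.log" ls_apache_json
```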

Some of this should look straightforward, but let me point out some pitfalls I had to dig myself out of.

  • bytes: in the @message, I use %b, but in @fields, I use %B. The reason is summed up nicely on http://httpd.apache.org/docs/2.2/mod/mod_log_config.html :

    %B Size of response in bytes, excluding HTTP headers.
    %b Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a ‘-‘ rather than a 0 when no bytes are sent.

    In other words, since I’m trying to send an integer value, if I don’t choose %B, I may send a – (dash/hyphen) when there is no value to send, causing the field to be mapped as a string. Jordan says that sending the JSON value as an integer (i.e. no quotes) should make it into ES as an integer. This may yet require a mapping/template.
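    To see the difference in practice, here is a quick illustration (Python used only for demonstration) of how a JSON parser treats the two variants:

    ```python
    import json

    # %B emits a bare 0 when no bytes are sent, so the field parses as an integer
    doc = json.loads('{"bytes": 0}')
    print(type(doc["bytes"]).__name__)  # int

    # %b emits "-" instead; left unquoted, the line is not even valid JSON,
    # and quoted it arrives as a string rather than a number
    try:
        json.loads('{"bytes": -}')
    except json.JSONDecodeError:
        print("unquoted dash: invalid JSON")
    ```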

  • @message: This is the Apache common log format. You could easily substitute the fields that make up the Apache combined format. Or, you could keep the common format as @message and simply add fields for user-agent and referrer if you want to collect those (note these are plain JSON string values, so they take single-escaped quotes, unlike the doubly-escaped quotes inside @message):
                    \"referer\": \"%{Referer}i\", \
                    \"useragent\": \"%{User-agent}i\" \
  • timestamp: Jordan has had no problems with passing @timestamp directly, but I have had nothing but problems. Perhaps I can get a solution linked here some time, but in the meantime I simply emit the timestamp in ISO8601 here, then use date and mutate in logstash.conf:
    input {
       udp {
          port => 57080
          type => "apache"
          buffer_size => 8192
          format => "json_event"
       }
    }
    
    filter {
       date {
           type => "apache"
           timestamp => "ISO8601"
       }
       mutate {
            type   => "apache"
            remove => [ "timestamp" ]
       }
    }
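    For reference, the %{%Y-%m-%dT%H:%M:%S%z}t format in the LogFormat above produces timestamps like 2013-09-11T10:15:30-0600, which is what the ISO8601 setting in the date filter matches. A quick sanity check of that strftime pattern (Python here purely for illustration):

    ```python
    from datetime import datetime

    # The same strftime pattern Apache uses inside %{...}t
    ts = "2013-09-11T10:15:30-0600"
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")
    print(parsed.isoformat())  # 2013-09-11T10:15:30-06:00
    ```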

What comes out is ready (except for the date munging) for feeding into elasticsearch, and even has an @message field for searching. This method also makes it trivial to add extra fields (get them from http://httpd.apache.org/docs/2.2/mod/mod_log_config.html) without any extra work or re-working your grok patterns. As I mentioned previously, I keep the common log format for @message, then add the other fields (like duration, user-agent and referer) as needed. An apachectl restart is all it takes to get the new values into elasticsearch.

And for the sake of a complete solution, the udpclient.pl script:

#!/usr/bin/perl
# udpclient.pl -- read log lines on stdin and send each one over UDP

use strict;
use warnings;
use IO::Socket::INET;

my $host = $ARGV[0];
my $port = $ARGV[1];

# flush after every write
$| = 1;

# Create the UDP socket; PeerAddr sets the destination used by send()
my $socket = IO::Socket::INET->new(
   PeerAddr => "$host:$port",
   Proto    => 'udp'
) or die "ERROR in Socket Creation : $!\n";

while (my $logdata = <STDIN>) {
    $socket->send($logdata);
}

$socket->close();
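If Perl isn’t to your taste, the same helper is only a few lines of Python. This is a sketch equivalent in spirit to udpclient.pl, not the script referenced in the CustomLog line above:

```python
#!/usr/bin/env python3
# udpclient.py -- read log lines on stdin, send each one over UDP
import socket
import sys

def send_lines(lines, host, port):
    # UDP is connectionless; connect() just fixes the default destination
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect((host, port))
    for line in lines:
        sock.send(line.encode("utf-8"))
    sock.close()

if __name__ == "__main__" and len(sys.argv) >= 3:
    send_lines(sys.stdin, sys.argv[1], int(sys.argv[2]))
```

If you use this instead, point the pipe in the CustomLog directive at it, e.g. "|/usr/local/bin/udpclient.py 127.0.0.1 57080".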

I also tend to think that one of the best things about this solution is that it does not interfere with your current logging setup in any way. It simply captures an extra copy and sends it, pre-formatted, over local UDP to logstash, and from there to whatever output(s) you have defined.

9 Responses to Getting Apache to output JSON (for logstash)

  1. Aaron says:

    Apparently, even a <pre> tag (and <code> tag) can’t make WordPress display the perl code properly. If you cut/pasted this and it said “while ($logdata = )” instead of including <STDIN>, you have my apologies. It seems to work now, having used an &lt; entity for the less-than sign.

  2. If you define your piped command in the CustomLog directive with a double pipe:

    CustomLog "||/usr/local/bin/udpclient.pl 127.0.0.1 57080" ls_apache_json

    Apache won’t start a shell to launch your executable in.

    While this might not seem like a big change, it has proven more reliable in my situation: the single-pipe variant left me with tons of zombie processes after a couple of days, presumably because apache’s child processes were killed while the associated piped processes were not.

  3. Lucas says:

    Thanks for sharing this — it is working, but somehow the Apache events are logged twice. I have no such problems with forwarded Rsyslog events.

    I only have a single Logstash output defined in the indexer. Perhaps you have a suggestion what I am doing wrong?

    • Aaron says:

      The IRC channel is a better place to discuss this. Without seeing any of your configuration, I would have no way of knowing. I’m untergeek in the #logstash channel on freenode.

  4. Allyson says:

    I had some problems with apache 2.2.4 (long story…) and getting the escapes to work properly in httpd.conf / ssl.conf. I was able to get the JSON example in the logstash cookbook to work, but was not able to incorporate the @message field with that. Per untergeek’s suggestion, I instead used the example in the logstash cookbook, then used a ‘mutate’ filter to set the @message field with the standard Combined Log Format data:


    mutate {
        type    => "apache-logs"
        replace => [ "@message", "%{fields.client} %{fields.duration_usec} %{fields.request} %{fields.status}" ]
    }

    I hope this is helpful for anyone else looking to do something similar.

    • Allyson says:

      Followup: your replacement fields should look like %{fields.fieldname} instead of just %{fieldname}. The logstash stdout output will look wrong using this method, but it will be correct in Kibana.

  5. […] travelers, who may have come to this page by way of my other page on this subject, dealing with the same subject matter, but with logstash version […]

  6. […] Untergeek’s blog posts: Old & New, this is how you would have done it […]
