It had been suggested that the development team had a log monitoring system based around the log4j library. However, it turned out that the system was mostly reactive, and a few undocumented configurations didn't make the migration, which caused some problems. After some hasty "diff -r" and rsync everything seemed to be well.
So I decided that this weekend I am putting together a central log management system using logstash and chef, one that I can deploy zero-config style to hosts via a cookbook recipe, to catch 404 and 503 errors and other alerts coming from remote systems in a timely manner.
(this post is still mostly a brain dump of the docs I used to get the centralized logging working, it's not ready for step-by-step use yet)
I have previously used syslog-ng and rsyslog for centralized log aggregation, and I am still surprised that it isn't considered a standard part of a deployment. (maybe at your company it is, but in my experience it is something that gets left aside, or disabled for performance reasons or disk space or whatever, and when I come back and look at these systems it's not the first place the teams go for troubleshooting)
Maybe it was that the old syslog-ng log searching interface was a bit crufty, or that the log backfill seemed to take a bit of managing. I think the problem was that it lacked AJAX and ruby ;-)
As research for this little project I took a look at the php-syslog-ng demo site, and since I last had cause to look it has been renamed to logzilla, has had a fit of the magpies (SHINY!! WANT SHINY!! SHINY GOOD!!!), and has reimplemented its UI using the jquery library.
(my new favourite phrase of the week is "having a fit of the magpies", meaning the uncontrollable urge to rewrite some tool or web site in a new and cool library, which I learned off uknot)
Logging agents and central aggregation host (with searchable UI)
Anyway, the plan is to set up a bunch of agents that run on the load balancer, the tomcat servers and the mysql database, which tail the logs there and ship them to a central host acting as an aggregator, with a database, an indexer, and a web interface for searching.
Logstash is the cool tool of the moment, and seems to have quite a lot of momentum in the Devops blogosphere. It also seems possible to run it on top of (or next to) the basic day to day shit of an operating system, so I am happy to give it a shot on some production systems.
I like the idea that it doesn't require too much fucking about with system config files such as /etc/syslog.conf and can just run out of a single process.
After a brief reading of the logstash website I now understand that logstash is written in jruby, and has a bunch of plugins allowing it to take log inputs from many sources and ship them via a message broker into a searchable backend. The default backend is elasticsearch, an embedded version of which is bundled in the self-described "monolithic" jar file that is the main download.
There is a fairly chunky negative here, which is that there are no packages to install yet, and the defaults rely on things like non-standard packages and multicast, which I am not entirely familiar with. However I have the power of chef, so hopefully I should be able to wrap this fucker up into an agent recipe and an indexer recipe and push it out to my nodes in short order. The clock is ticking. Well, I have already wasted a few hours procrastinating, so the clock has been ticking for some time.
So each component runs as a java executable class on each node. The agent is configured to process inputs and pass the logs to a message broker listening on the centralized logging host.
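As a rough sketch of what an agent config looks like (the paths, hostname and exchange name here are made up, and the plugin options have shifted between logstash versions, so treat this as a shape rather than something to paste verbatim):

```
# logstash agent config sketch -- paths and broker hostname are invented
input {
  file {
    type => "apache-access"
    path => "/var/log/httpd/access_log"
  }
}
output {
  # ship everything to the rabbitmq broker on the central logging host
  amqp {
    host => "loghost.example.com"
    exchange_type => "fanout"
    name => "rawlogs"
  }
}
```

You then point the monolithic jar at it with something like "java -jar logstash-monolithic.jar agent -f agent.conf" (the exact jar name depends on the version you downloaded).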
There is an indexing agent configured on the central host which subscribes to the message broker and processes the logs into the elasticsearch backend.
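The indexer config is roughly the mirror image. Again this is a sketch based on the getting-started docs rather than a verified config, and the amqp input options in particular vary between releases, so check the plugin docs for your version:

```
# logstash indexer config sketch -- consumes from the local broker,
# writes into the elasticsearch backend on the same host
input {
  amqp {
    host => "127.0.0.1"
    exchange => "rawlogs"
    name => "rawlogs_consumer"
    type => "all"
  }
}
output {
  elasticsearch {
    host => "127.0.0.1"
  }
}
```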
There is a getting started tutorial that suggests a configuration which includes rabbitmq for message delivery, and elasticsearch as a persistence and search engine.
1. configure the logstash indexer and web agent to process alerts from the message broker
2. configure the message broker
3. configure the agent to send logs to the message broker
4. configure an agent process to ship historical logs to the message broker.
5. wrap this shit up in some chef cookbook, and distribute it to the nodes
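Step 5 is the part I actually care about. A minimal recipe fragment might look something like this — everything here (attribute names, paths, template names) is hypothetical, just to show the shape of fetching the jar and dropping a config:

```
# hypothetical chef recipe fragment -- attribute names and paths are made up
directory "/opt/logstash"

# fetch the monolithic jar from wherever you mirror it
remote_file "/opt/logstash/logstash-monolithic.jar" do
  source node["logstash"]["jar_url"]
  mode "0644"
end

# render the agent config for this node
template "/etc/logstash-agent.conf" do
  source "agent.conf.erb"
end
```

Wiring the jar up as a supervised service (runit, an init script, whatever) is left for the actual cookbook.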
Now this is probably as much my naivety as it is a product of the kind of start-up web projects that have mostly been my history, where unicast is the norm; all my troubleshooting chops expect packets with source and destination addresses, arriving on some known port.
Anyway, once you get used to the idea of multicast, and calm the anxiety associated with opening massive chunks of ports, it makes a bit more sense.
But if you are working through the logstash "getting started" guides, be prepared to make a lot of references to the iptables docs on multicast rules. I also ended up adding a rule like so:
-A RH-Firewall-1-INPUT -j LOG --log-prefix LOGSTASH --log-ip-options --log-tcp-options
just before the REJECT rule, so I could tail the kern.log file to see what packets were actually being rejected.
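For accepting the multicast traffic itself there are a couple of usual iptables idioms — either matching the packet type or matching the multicast destination range. These are standard forms rather than the exact rules from my setup, so check them against the iptables docs before relying on them:

```
# accept all multicast packets via the pkttype match
-A RH-Firewall-1-INPUT -m pkttype --pkt-type multicast -j ACCEPT
# or, equivalently, match on the multicast destination range
-A RH-Firewall-1-INPUT -d 224.0.0.0/4 -j ACCEPT
```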