This article is over in the new (to me) Policy DevCenter at O’Reilly: O’Reilly Network: Protect Your OSP with logfinder
It references a new white paper put out by the EFF to aid system admins in finding log files on their systems with the idea that you can’t be bothered by the Feds serving you a PATRIOT Act secret warrant if you don’t keep logs. What has the world come to?
I read a story by Jamie Zawinski that is buried in his rants section, so I’m not going to go find it (if you’ve seen it, you know that once you go in you are stuck for an hour just marveling at the scope of it all), but the gist was that when MS sued Netscape, they subpoenaed an internal newsgroup he started where people complained about how much it sucked there. There is some mention in there about not keeping anything longer than seven days.
Seven days. That is a good, round number. As I think about it, I’ve never needed a log – system, network, server, or access – older than seven days for anything that I cared about, but I religiously let logrotate and logwatch scour those things, firing off logdigest emails to root and piling up the .tar.gz files. Huh.
Ok, so here is my take on how to do this, from a network (Cisco and Linux) perspective:
- Get a dual 64-bit 1U server, put a single HDD in it and a ridiculous amount of RAM. Install Linux and create two RAM disks that mount on start-up – one really big, the other a few hundred MB. Put it on a GigE switch, and set up a VLAN for logging and network administration.
- Configure all the routers, switches, firewalls, and network-services Linux boxes to send their syslog info to this box and not store it locally. Configure all the service logs – Apache, LDAP, RADIUS, SMTP, etc. – to use this central box as their only log point as well. If you are stuck with Windows servers, install a syslog service and configure them to log only to the remote system too.
- Set up syslog on the 1U to use the bigger RAM disk for logging, and constrain the size of the log to whatever is right for the RAM disk using logrotate.
- Set up the analysis tools – whatever floats your boat – to crunch the logs in near real time. Abstract out the trends and generalities, but don’t visualize specifics for ‘normal’ activity. Produce static analysis reports that neither contain log information about specific hosts nor rely upon the logs. Essentially you are taking snapshots of the system.
- Use a Bayesian filtering mechanism to monitor the flow of “problems” and once those “problems” cross some threshold, start writing the related log entries to the smaller RAM disk. (This is the magic I-haven’t-got-a-clue-how-to-actually-do-it step, just in case you were wondering. Take a look at the ‘Predator versus Prey’ entry from earlier this month for a better idea of what I’m talking about.)
- Use the Homeland Security threat-level schema for an alert system driven by the log analysis, and set up an RSS feed that syndicates the status. If it exceeds ‘Yellow’: during working hours, a Jabber bot IMs the entire IT staff until someone at the console clears the alarm; outside working hours, a pager/voice-call mechanism walks through an escalation tree until someone acknowledges it by snoozing the alarm via phone or pager and then clearing it at the console. The key here is that you are alerting the people without sending logs.
- Using the snapshots, make a ‘motion picture’ representation of the network health once a week, once a month, once a quarter, and once a year – each time using the same number of slides, each slide representing the same proportional interval of time. None of them cites host- or user-specific information or depends upon log archives.
- Set up a cron job to reboot the box every seven days.
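For the first step, the two RAM disks can just be tmpfs mounts in /etc/fstab. A sketch – the sizes and mount points are placeholders, size them to your RAM:

```
# /etc/fstab — tmpfs-backed "RAM disks" (sizes and paths are placeholders)
tmpfs  /var/log/ram  tmpfs  size=8g,mode=0750    0 0  # the big one: live syslog spool
tmpfs  /var/log/hot  tmpfs  size=256m,mode=0750  0 0  # the small one: flagged "problem" entries
```

Both vanish on reboot, which is exactly the point.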
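Pointing everything at the 1U is plain syslog configuration. A sketch, assuming the log box lives at 10.0.99.10 on the logging VLAN (that address is made up):

```
# On each Linux box: /etc/syslog.conf — forward everything, keep nothing local
*.*    @10.0.99.10

# On each Cisco box (IOS), roughly:
#   logging host 10.0.99.10
#   no logging buffered
```

UDP syslog is lossy, of course, but given the goal here that may be a feature.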
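Constraining the log to fit the RAM disk is a one-stanza logrotate job. A sketch – the path and size cap are placeholders:

```
# /etc/logrotate.d/ramlog — cap the tmpfs-backed log (path and size are placeholders)
/var/log/ram/messages {
    size 512M
    rotate 0        # keep no archives: rotated-out entries are simply discarded
    missingok
    notifempty
}
```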
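And the scorched-earth step is one cron.d line. A sketch – the day and hour are placeholders:

```
# /etc/cron.d/weekly-reboot — reboot (and thus wipe the RAM disks) every seven days
# m h dom mon dow user command
0 4 * * 0 root /sbin/shutdown -r now
```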
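The Bayesian step is the one I admitted I can’t actually do, but a toy illustration of the shape of it might help: a naive-Bayes-ish log-odds scorer over log-line tokens, where lines that score past a threshold are the ones that would get copied to the small RAM disk. The training lines, smoothing, and threshold are all invented for the sketch:

```python
import math
from collections import Counter

# Toy hand-labeled training lines (entirely invented for illustration).
PROBLEM = ["auth failure for root from 10.1.2.3",
           "kernel oops in eth0 driver",
           "repeated auth failure for admin"]
NORMAL = ["session opened for user bob",
          "cron job completed ok",
          "dhcp lease renewed for host printer"]

def token_counts(lines):
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

P_COUNTS, N_COUNTS = token_counts(PROBLEM), token_counts(NORMAL)
P_TOTAL, N_TOTAL = sum(P_COUNTS.values()), sum(N_COUNTS.values())

def problem_score(line):
    """Summed log-odds that a line looks like a 'problem', with +1 smoothing."""
    score = 0.0
    for tok in line.lower().split():
        p = (P_COUNTS[tok] + 1) / (P_TOTAL + 2)
        n = (N_COUNTS[tok] + 1) / (N_TOTAL + 2)
        score += math.log(p / n)
    return score

def filter_stream(lines, threshold=1.0):
    """Return the lines that cross the threshold — the ones that
    would be written to the smaller RAM disk."""
    return [line for line in lines if problem_score(line) > threshold]
```

A real version would need continuous retraining and a rolling threshold, which is where my hand-waving starts.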
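The alert feed itself carries nothing but the status, which is the whole trick. A minimal sketch – the level ordering and the feed wording are my own:

```python
from xml.sax.saxutils import escape

# DHS-style levels, ordered calm to panic (the mapping is my own).
LEVELS = ["Green", "Blue", "Yellow", "Orange", "Red"]

def status_feed(level, detail=""):
    """Render a minimal RSS 2.0 document carrying only the alert level —
    no log lines, no hostnames, just the status."""
    assert level in LEVELS
    return (
        '<?xml version="1.0"?>\n'
        '<rss version="2.0"><channel>\n'
        '<title>Network status</title>\n'
        '<item><title>Alert level: %s</title>'
        '<description>%s</description></item>\n'
        '</channel></rss>' % (escape(level), escape(detail))
    )

def page_people(level):
    """True when the level exceeds Yellow and the Jabber-bot/pager
    escalation tree should start walking."""
    return LEVELS.index(level) > LEVELS.index("Yellow")
```

The Jabber bot and the pager tree would poll `page_people` off the same analysis loop; the feed is what everyone else watches.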