$Id: TODO,v 1.186 2004/04/07 20:43:17 flacoste Exp $

 _________________________________
'                                 `
|  this file is largely obsolete  |
`---------------------------------'

* proxy
  - Update lr_config, the user manual, the website, address.cf, the online
    responder etc. for the new service and superservice.
  - Integrate squid2dlf, part of welf, and socks2dlf.
  - Write a ms_proxy2dlf.
  - Write more reports.

* database
  - Add a PostgreSQL plugin.
  - Add more reports.
  - Add a "query-type" extension. Possible types are "create table",
    "insert", "select", "show", etc.

* socks
  - Finish or purge the socks stuff.

* ftp
  - Look at Pure FTPD. Its log formats: CLF!!! (Apache!) and a stats format.
    See:
      http://pureftpd.sourceforge.net/README   Pure FTPD
      http://www.shagged.org/ftpstats/         Stats format

* www
  - Report on specific web pages: a regexp in the config file, matched
    against the interesting dlf fields. Make the trackfilter stuff more
    generic, and merge the nonpics stuff into this new scheme.
  - Lire::WWW::Filename::Attack: add more attacks. See e.g.
    http://www.securityfocus.com/.

* email
  - Make the email convertors use the error messages and status messages in
    the log. Redefine the dlf format for this. (Requested by schr)
  - In the email convertors, use timestamps in a smarter way: do not use the
    same timestamp on all dlf lines about the same message.

* responder
  - Make it easy to publish HTML reports on a website automatically, and make
    the responder use this. E.g., have the responder reply with a URL like
    http://logreport.net/reports/0914616612616663161/, where an HTML report
    with graphics is published for, say, a week.
  - Add an HTTP file upload interface to the responder. (Josh is working on
    this.)

* all
  - Sanitize debug output, and use it. (Requested by, among others, schr)
  - Clean up the stuff which goes to stderr. Document a policy on this, so
    that we can build a lr2dlf thingie.
  - Decide on a filesystem layout for a .dlf archive. This should be used to
    combine old reports / logs into reports over longer periods. It should
    enable dealing sanely with logs which don't span a 00:00 - 23:59 period.
    A possible implementation:
    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    The archive should store files in .xml and .dlf format. It should reside
    somewhere under /var/lib/lire/data. (The current Lire .deb creates
    /var/lib/lire.)

    For each stored file, we want to be able to find out:
     - filename
     - service
     - superservice
     - timerange
     - subject/hostname/fromaddress (maybe even the complete mail headers of
       the email message which contained the log file)
     - some external id (e.g. hostname, to be able to merge different reports
       which report on the same thing)
     - format (xml, log, report, or maybe even something else)

    We use an 'LR_ID' to identify a job for the Lire system, i.e. a received
    email message or a local log file. We use a 'REPORT_ID' to identify a
    report. One log file could get split into parts about e.g. different
    days. For each day, a separate report could get generated. Other ways to
    split are possible (e.g. for log files which carry lines about different
    hosts or even services). Perhaps it's wise to include an LR_ID in the
    generated report.

    We could store meta information in an index file (e.g.
    /var/lib/lire/data/meta/index), which could look like:

      LR_ID-9871614364-1456 subject gelfand test
      LR_ID-9871614364-1456 service email
      LR_ID-9871614364-1456 time_span 2001050427
      LR_ID-9871614364-1456 type rcvdemail
      LR_ID-987161443426-234 time_span 20010527-20010528
      LR_ID-98716144999-234 time_span 200105270104-200105282359
      LR_ID-98716144988-261 time_span 200105
      LR_ID-98716144988-261 type report
      LR_ID-98716144988-261 dlflines 45
      $LR_ID time_rfc "$RFC_TIME"
      $LR_ID time_begin "$TIME_BEGIN"
      $LR_ID time_end "$TIME_END"
      $LR_ID time_span "$LR_TIME"
      LR_ID-98716144988-261 extid gelfand_20010513

    That is: idtag space key space value-with-possibly-embedded-spaces.
    type can be: rcvdemail, sntemail, report, log, dlf

    Perhaps we should think of some relational database model, and implement
    it accordingly.
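As a sketch of how a script might read such an index, here is a minimal Python version. parse_index and split_time_span are hypothetical helpers, not existing Lire code. The sketch assumes one record per line, splits only on the first two spaces so that values keep their embedded spaces, does not strip shell-style quotes (so the "$RFC_TIME"-style template lines would keep them), and checks time_span values against the yyyymm[dd[hh[mm[ss]]]][-yyyymm[dd[hh[mm[ss]]]]] notation used in the time_span examples.

```python
import re
from collections import defaultdict

# Hypothetical helpers for the proposed index file; not part of Lire.

# One timestamp: yyyymm followed by zero to four two-digit groups
# (dd, hh, mm, ss). Plain concatenation is used on purpose: an f-string
# would treat the {6} and {0,4} quantifiers as replacement fields.
_STAMP = r"\d{6}(?:\d{2}){0,4}"
_TIME_SPAN_RE = re.compile(r"(" + _STAMP + r")(?:-(" + _STAMP + r"))?")


def parse_index(lines):
    """Group 'idtag key value' records by idtag.

    Only the first two spaces separate fields, so the value may contain
    embedded spaces. Blank lines are skipped. If a key repeats for the same
    idtag, the last value wins (an assumption; the proposal does not say).
    """
    records = defaultdict(dict)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        parts = line.split(" ", 2)
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'idtag key value': {line!r}")
        idtag, key, value = parts
        records[idtag][key] = value
    return dict(records)


def split_time_span(span):
    """Split a time_span value into (begin, end); end is None for one period."""
    match = _TIME_SPAN_RE.fullmatch(span)
    if match is None:
        raise ValueError(f"malformed time_span: {span!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    sample = [
        "LR_ID-9871614364-1456 subject gelfand test",
        "LR_ID-9871614364-1456 service email",
        "LR_ID-987161443426-234 time_span 20010527-20010528",
        "LR_ID-98716144988-261 dlflines 45",
    ]
    index = parse_index(sample)
    print(index["LR_ID-9871614364-1456"]["subject"])
    print(split_time_span(index["LR_ID-987161443426-234"]["time_span"]))
```

Because the format is just "idtag key value", a single bounded split is enough to read it; a relational model would replace this flat file rather than extend the parser.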
    Time ranges should be in UTC, in an "almost human readable" format:

      yyyymm[dd[hh[mm[ss]]]][-yyyymm[dd[hh[mm[ss]]]]]

    The directory layout could be:

                                  service.subservice
                                  |             (sub)reporttype
                                  |             |
    /var/lib/lire/data/report/xml/email/postfix/complete/extid/20010527-20010528
    /var/lib/lire/data/report/html/
    /var/lib/lire/data/report/ascii/
    /var/lib/lire/data/email/raw/
    /var/lib/lire/data/email/plain/
    /var/lib/lire/data/log/dlf/www[/apachecommon?]/viewtype/extid/200105
                                  ^^^^^^^^^^^^^^^^

    We should get rid of "subservices" like apache's common.

    Where should different 'views' go? And filtered logs? E.g., currently we
    have 'filter' and 'filter_messages' for email. These are filters from dlf
    to dlf.

    Currently (Fri Jun 22 00:17:54 CEST 2001) these fields are used by the
    various scripts:

      field       set by                    read by
      extid       lr_processmail (if ARCH),
                  lr_log2mail (if ARCH)
      time_span   lr_dlf2xml (if ARCH)      lr_processmail (if ARCHIVE set, to
                                            construct the name of the stored
                                            file), lr_log2mail (if ARCHIVE
                                            set), lr_log2report (if ARCH),
                                            lr_log2xml (if ARCH)
      time_rfc    lr_dlf2xml (if ARCH)
      time_begin  lr_dlf2xml (if ARCH)
      time_end    lr_dlf2xml (if ARCH)
      loglines    lr_log2report (if ARCH)
      dlflines    lr_log2xml                lr_log2report (and purged if ARCH
                                            unset)

    After the system runs for a while, /var/lib/lire/data could be holding
    files like these:

      data/email/raw/email/exim/exim_anon_from_hibou/20001202121106-20011130081041
      data/log/dlf/email/complete/exim_anon_from_hibou/20001202121106-20011130081041
      data/log/dlf/email/complete/localhost/20001202121106-20011130081041
      data/log/dlf/email/filter/exim_anon_from_hibou/20001202121106-20011130081041
      data/log/dlf/email/filter/localhost/20001202121106-20011130081041
      data/log/dlf/email/filter_messages/exim_anon_from_hibou/20001202121106-20011130081041
      data/log/dlf/email/filter_messages/localhost/20001202121106-20011130081041
      data/log/dlf/www/complete/localhost/20010626053604-20010626142307
      data/log/dlf/www/filter_pics/localhost/20010626053604-20010626142307
      data/log/dlf/www/filter_trackpage/localhost/20010626053604-20010626142307
      data/log/raw/email/complete/exim_anon_from_hibou/20001202121106-20011130081041
      data/log/raw/www/complete/localhost/20010626053604-20010626142307
      data/meta/index
      data/report/ascii/email/exim/complete/exim_anon_from_hibou/20001202121106-20011130081041
      data/report/xml/email/complete/exim_anon_from_hibou/20001202121106-20011130081041
      data/report/xml/www/complete/localhost/20010626053604-20010626142307
    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  - Extract the name of the loghost from the submitted log file, and use it
    in the generated report.
  - Maybe it's useful to store the complete header of incoming mail in a
    control file.

* Documentation
  - Include doc/images/reportgen.png and dlf2email.png in the developer's (or
    user's) manual.
  - Use pod2html or dwww to generate manpages in HTML format.
  - Deal with obsolete docs: merge ./doc/blurb with the current README file.
  - Add a pointer to cvs-hibou ... copyright.xml to the developer's
    reference.

* Other projects
  - Take a look at others:
    - an exhaustive list is being maintained by Tina Bird, from
      securityfocus's LogAnalysis mailing list
    - syslog-summary, a python script by Lars Wirzenius:
        This program summarizes the contents of a log file written by
        syslog, by displaying each unique (except for the time) line once,
        and also the number of times such a line occurs in the input. The
        lines are displayed in the order they occur in the input.
      License is GPL.
      http://packages.debian.org/stable/admin/syslog-summary.html
    - logtools, a bunch of tools, written in C++, for managing log files, by
      Russell Coker. GPL-ed.
        fmerge - merge common-log-format web logs in order without sorting
          (good for when you have a gig of logs).
        logprn - like "tail -f" but after a specified time period of
          inactivity will run a program (such as lpr) and pipe the new data
          to it.
        funnel - pipe one stream of data to several files or processes.
        clfsplit - split CLF format web logs by client IP address.
        clfdomainsplit - split CLF logs by server domain.
      http://www.coker.com.au/logtools/
    - xlogmaster, a monitoring program, soon to be replaced by GNU AWACS, by
      the GNU project.
        The Xlogmaster is a program that lets you monitor everything that's
        going on on your system in a very quick and comfortable way. It
        allows reading log files, devices or running status-gathering
        programs, translating all data (if wished) and displaying it with
        filters for highlighting / lowlighting / hiding lines or taking
        actions upon user-defined events.
      http://www.gnu.org/software/xlogmaster/
    - Log Tool, a syslog parser and convertor, by A.L.Lambert, sponsored by
      ManISec Inc.
        Logtool is a command line program that will parse syslog (and
        syslog-like) log files into a more palatable format. It will take
        anything resembling a standard syslog file (this includes syslog-ng,
        and probably most of the other variants out there), and crunch it
        into one of the following formats for your viewing pleasure:
          * ANSI (colorized for easy "at a glance" viewing)
          * ASCII (for e-mail'ed reports, and terms that don't support color)
          * CSV (for importing into your favorite spreadsheet/database)
          * HTML (for generating web pages)
          * RAW (for no good reason)
      http://www.xjack.org/logtool/
    - Analog, shows the usage patterns on your web server
      http://www.analog.cx/
    - acidlab -- Analysis Console for Intrusion Databases
        The Analysis Console for Intrusion Databases (ACID) is a PHP-based
        analysis engine to search and process a database of incidents
        generated by security-related software such as IDSes and firewalls
        (e.g. Snort, ipchains).
      http://acidlab.sourceforge.net/
    - pflogsumm, a Postfix log file analyser, written in perl, GPL-ed
        pflogsumm.pl is designed to provide an over-view of postfix
        activity, with just enough detail to give the administrator a
        "heads up" for potential trouble spots
      http://jimsun.linxnet.com/postfix_contrib.html
    - fwanalog, a shell script to analyze firewall log files, by Balázs
      Bárány, GPL-ed
        fwanalog is a shell script that parses and summarizes firewall log
        files. It currently (version 0.1) understands logs from ipf (tested
        with OpenBSD 2.8's ipf) and Linux 2.4 iptables
      http://tud.at/programm/fwanalog/
    - The Webalizer, a web server log file analysis program, by Bradford L.
      Barrett
        The Webalizer is a fast, free web server log file analysis program.
        It produces highly detailed, easily configurable usage reports in
        HTML format, for viewing with a standard web browser.
      http://www.mrunix.net/webalizer/
    - Bug#98702: wnpp: ITP: libunix-syslog-perl
    - Take a look at the CPAN perl module Log::LogLite.
    - logcheck, http://www.psionic.com/abacus/logcheck
    - Take a look at what Nedstat does. E.g. on
      http://v1.nedstatbasic.net/stats?AAsIQwGx7TwVU48pu4qo/jS/exEw
    - NISCA, an MRTG replacement
        NISCA is a replacement for MRTG. It stands for "Network Interface
        Statistics Collection Agent". It gives traffic statistics on the
        network interfaces on routers and switches and whatnot. It doesn't
        require SNMP to do its job.
      http://www.isthisthingon.org/nisca
    - swatch -- log file viewer with regexp matching
      http://www.oit.ucsb.edu/~eta/swatch
    - fwlogwatch -- Firewall log analyzer: ipchains, netfilter/iptables,
      ipfilter, Cisco IOS and Cisco PIX log summary reports in text and HTML
      form, http://cert.uni-stuttgart.de/projects/fwlogwatch/
    - News reporting, see vanbaal@gelfand email message
      200103240017.f2O0H1N03210@nerys.ehv.lx
    - webreport: http://www.inter7.com/webreport/
        web report is a web log statistics reporting program especially
        designed for virtual hosting sites.
        It is also very useful for single hosting sites. The main difference
        between web report and other statistics programs is a configuration
        file which allows for easy manipulation of the features.
    - log2mail: log2mail is a small daemon watching log files and sending
      mail to a specified address if a regular expression is matched.
    - modlogan: they're doing very similar stuff to what we do.
    - LogTrend: http://www.logtrend.org/english/

* Various
  - secondary.com ns records.
  - debian package: Make sure the service/all/etc/lr_spoold caller gets
    installed in /etc/init.d/ when doing a make, in case we're on a
    GNU/Linux platform.
  - Make lr_rawmail2mail deal with more than one log file. See wytze's mail
    to development: we should parse the subject on the client side.
  - Make the scripts in bin/ behave sanely when run as "script --help" and
    "script --version".
  - Add new services:
      [23-Mar:13:52 jama] joostvb: how high are auth.log and kern.log on
                          your todo list?
      [23-Mar:13:54 jama] these could be useful in case of irregularities.
      [23-Mar:13:55 joostvb] not that high, yet
      [23-Mar:13:55 jama] and maybe some scanlog reporting.
      [23-Mar:13:55 jama] okay.
      [23-Mar:13:56 joostvb] they're on it now, thanks :)
    I.e., auth.log and kern.log could be useful in case "irregularities"
    occur.
  - Document the list of supported services / superservices in _one_ place.
    Refer to this place in the manpages.
  - Think about using mktemp or tempfile. (Currently we use our own tmpdir.)

* Configuration
  - Decide whether we need AC_CHECK_PROG(HASPDFXMLTEX, pdfxmltex, yes, no),
    AC_CHECK_PROG(HASJAVA, java, yes, no) and DBKXSLXHTML in configure.in.
    Currently (Sat Jun 23 11:08:06 CEST 2001) these are not being used.

* Images
  - Sanitize the layout of graphics: the Y-range is too large, the X's are
    too wide (rotate 90 degrees?), or label?

* Packaging
  - FreeBSD port? Openpackages package for *BSD's
    ( http://openpackages.org/ )?
* Other
  - cvs-sourceforge/logreport/service/doc/BUGS for stuff about Lire
  - cvs-sourceforge/logreport/docs/website/new_content.txt for stuff about
    the website (logreport/docs/devel/notes.txt should get merged with this,
    actually)