Lire Developer's Manual Joost van Baal Egon L. Willighagen Francis J. Lacoste Copyright © 2000, 2001, 2002, 2003, 2004 Stichting LogReport Foundation This manual is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this manual (see COPYING); if not, check with http://www.gnu.org/copyleft/gpl.html or write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111, USA. Revision History Revision 2.0.3 $Date: 2008/03/03 06:30:02 $ $Id: dev-manual.dbx,v 1.90 2008/03/03 06:30:02 vanbaal Exp $ __________________________________________________________________ Table of Contents Preface What This Book Contains How Is This Book Organized? Conventions Used If You Don't Find Something In This Manual I. Lire Architecture 1. Architecture Overview Lire's Design Patterns Log File Normalisation Log Analysis Report Generation Report Formatting and Other Post-Processing Going Further II. Using the Lire Framework 2. Writing a New DLF Converter Prerequisites The common_syslog Log Format Creating the DLF Converter Skeleton Adding a Constructor The Meta-Data Methods The DLF Converter Name Providing Information To Users Providing Information to the Framework The Conversion Methods Registering Your DLF Converter with the Lire Framework DLF Converter API 3. Writing a DLF Schema Designing the ftpproto schema Creating The Schema File Adding the Schema's Description Defining the Schema's Fields Installing The Schema 4. 
Writing a New DLF Analyser Writing a Categoriser Defining The Extended Schema Defining the Categoriser Categoriser Configuration Categoriser Implementation Writing an Analyser DLF Analyser API 5. Writing a New Report Filter Specification III. Developer's Reference 6. Lire Data Types Lire Textual Elements title element DocBook Elements description element 7. Common Textual Elements to All XML Formats Lire Data Types Parameter Entities Boolean Type Integer Type Number Type String Type Timestamp type Time Type Date Type Duration Type IP Type Port Type Hostname Type URL Type Email Type Bytes Type Filename Type Field Type Superservice Type Related Types 8. The Lire Report Configuration Specification Markup Language The Lire Report Configuration Specification Markup Language config-spec element summary element Parameter Specifiations Elements 9. The Lire Report Configuration Markup Language The Lire Report Configuration Markup Language config element global element param element 10. The Lire DLF Schema Markup Language The Lire DLF Schema Markup Language The dlf-schema element extended-schema element derived-schema element field element 11. The Lire Report Specification Markup Language The Lire Report Specification Markup Language report-spec element global-filter-spec element display-spec element param-spec element param element chart-configs element Filter expression elements Report Calculation Elements 12. The Lire Report Markup Language The Report Markup Language report element Meta-information elements section element subreport element missing-subreport element table element table-info element group-info element column-info element group-summary element group element entry element name element value element chart-configs element IV. Lire Developers' Conventions 13. Contributing Code to Lire 14. Developers' Toolbox Required Tools To Build From CVS Accessing Lire's CVS CVS primer SourceForge Mailing Lists 15. 
Coding Standards Shell Coding Standards Perl Coding Standards 16. Making Lire "Test-infected" Unit Tests in Lire PerlUnit Writing Tests Running Tests Some "Best Practices" on Unit Testing 17. Commit Policy CVS Branches Hands-on example Naming, what it looks like Creating a Branch Accessing a Branch Merging Branches on the Trunk 18. Testing and debugging Test before releasing Test-installations and test-runs Using the Perl debugger on Lire code 19. Making a Release Setting version in NEWS file, checking ChangeLog Tagging the CVS Building The Tarball Building The Debian Package Building The RPM Package Making sure the FreeBSD port gets updated Uploading The Release The LogReport Webserver Advertising The Release SourceForge Freshmeat.net 20. Website Maintenance Documentation on the LogReport Website Publishing the DTD's 21. Writing Documentation Plain Text Perl's Plain Old Documentation: maintaining manpages Docbook XML: Reference Books and Extensive User Manuals V. Implementation Details 22. Adding a New Superservice in Lire's Distribution 23. Issues with Report Merging 24. Overview of Lire scripts 25. Source Tree Layout Glossary List of Figures 1.1. Log Processing in the Lire's Framework 1.2. The Log Normalisation Process 1.3. The Log Analysis Process 1.4. Report Generation Process 1.5. Processing of the XML Report Using The APIs List of Tables 11.1. weekly overview List of Examples 11.1. timeslot with 1d unit 11.2. timeslot with 2m unit 3. DNS DLF Excerpts Preface Table of Contents What This Book Contains How Is This Book Organized? Conventions Used If You Don't Find Something In This Manual Log file analysis is both an essential and tedious part of system administration. It is essential because it's the best way of profiling the usage of the service installed on the network. It's tedious because programs generate a lot of data and tools to report on this data are often unavailable or incomplete. 
When such tools exist, they are generally specific to one product, which means that you can't compare e.g. your Qmail(TM) and Exim(TM) mail servers.

Lire is a software package developed by the Stichting LogReport Foundation to generate useful reports from raw log files of various network programs. Multiple programs are supported for various types of network services. Lire also supports various output formats for the generated reports.

What This Book Contains

This book is the Lire Developer's Manual. Its purpose is to present Lire as a log analysis framework. To this end, it describes the architecture and design of Lire and contains comprehensive instructions on how to use it. Its intended audience is system administrators or programmers who want to extend Lire or want to understand its internals.

There is another book, the Lire User's Manual, which describes how to install, configure and use Lire as an "off-the-shelf" log analyzer. Its intended audience is system administrators who want to install and use Lire to gather information about the services operating on their network.

How Is This Book Organized?

This book is divided into five parts. Part I gives an overview of the architecture and design of Lire.

Part II contains information on extending Lire. In this part, you will learn how to add a new DLF format to Lire, write log file converters and add reports for a superservice.

Part III is a reference section which gives comprehensive details about the various XML formats used by Lire and gives in-depth descriptions of its various APIs.

Part IV is targeted at developers who want to participate in Lire's development. It contains information about CVS access, coding conventions, tools needed to build from CVS, release management and other aspects important to those who are part of the Lire development team. Furthermore, it gives some information on how to contribute code to Lire as an external party.
Finally, Part V contains various implementation details that may be interesting to people wanting to learn more about Lire internals.

Conventions Used

If You Don't Find Something In This Manual

You can report typos, incorrect grammar or any other editorial problems to . We welcome readers' feedback. If you feel that certain parts of this manual aren't clear, are missing information or are lacking in any other respect, please tell us. Of course, if you feel like writing the missing information yourself, we'll very happily accept your patch. We will make our best effort to improve this manual.

Remember that there is another manual, the Lire User's Manual, which contains comprehensive information on how to install, use and configure Lire. It also contains reference information about all of Lire's standard reports and supported services.

There are various public mailing lists for Lire's users. There is a general users' discussion list where you can find help on how to install and use Lire. You can subscribe to this list by sending an empty email with a subject of subscribe to . Email for the list should be sent to .

You can keep track of Lire's new releases by subscribing to the announcement mailing list. You can subscribe yourself by sending an empty email with a subject of subscribe to .

Finally, if you're interested in Lire's development, there is a development mailing list to which you can subscribe by sending an empty email with a subject of subscribe to . Email to the list should be sent to .

All posts on these lists are archived on a public website.

Part I. Lire Architecture

Chapter 1.
Architecture Overview

From a developer's point of view, Lire intends to be the universal log analysis framework. To this end, it provides a reliable and complete framework upon which to build log analysis and reporting solutions. Lire, the tool, is a proof of the versatility and extensibility of the framework, as it is able to produce reports for many of the services that run in today's heterogeneous networks, in a variety of output formats.

As a framework, Lire is meant to replace all those home-grown scripts developed to produce reports from the log files of the little-known products or custom-developed programs that run on your system. Leveraging the Lire framework will make those scripts a lot more versatile without making them much more complicated to develop. It will be easier to add new reports or to support multiple report formats.

Figure 1.1. Log Processing in the Lire's Framework

Lire's framework divides log analysis into four different processes. Figure 1.1, "Log Processing in the Lire's Framework" shows those four processes:

1. Log Normalisation. The first process normalises logs from different products into a generic format that can be shared by all products that have similar functionality. For example, log files from products as different as Apache(TM) and Microsoft Internet Information Server(TM) will be transformed into an identical format.

2. Log Analysis. In the analysis process, other information is created, inferred or extracted from the normalised data. For example, an analyser in the www superservice infers the browser used by the client from the referrer information.

3. Report Generation. The third process generates a report from the normalised and analysed data.
This process is done by a generic report engine that computes the report based on specifications describing what information should appear in the report and how. The report is generated in a generic XML format.

4. Report Post-processing and Formatting. The last process converts the generic report into a specific format such as ASCII, PDF or HTML. Other kinds of post-processing (like chart generation) can also be accomplished in this stage.

Before going into a more detailed description of each of these processes, we'll introduce some of the common design patterns that you'll find throughout Lire's framework.

Lire's Design Patterns

At the center of each of these processes is an XML-based file format. Having things specified in data files makes the framework easier to extend. For example, the reports are built using a generic report builder which finds the instructions on how to build the reports in XML files. This makes it easy to add new information to a report: you just have to write an XML file.

The fact that there are a lot of tools to process XML files is also an interesting aspect. For example, emacs lovers will appreciate the help that its psgml module gives them in writing report specifications.

Another important aspect is that we tried to interoperate with and build upon other standards while defining our XML formats. The best illustration of this is that in all the XML file formats that Lire uses, a DocBook subset is used for all elements related to narrative descriptions.

Another common aspect you'll encounter is that each of these processes and XML file formats comes with an API to manipulate it, making it easy to add functionality at each processing stage. APIs are also a good thing because, even if in theory an open file format somewhat constitutes an API, having libraries that provide convenient access to the file formats makes it a lot easier to write new components providing new functionality.

Log File Normalisation

Figure 1.2.
The Log Normalisation Process

The first process of the Lire log analysis framework is the log file normalisation process. That process is summarised in Figure 1.2, "The Log Normalisation Process".

This process is centered around the DLF concept, which is a kind of universal log format. DLF stands for Distilled Log Format. The concept is that each product-specific log file is transformed into a log format that can be common to all the products providing similar functionality.

In Lire's terminology, a class of applications providing similar functionality (e.g. MTAs supplying email) is called a superservice. Still in Lire's terminology, the service from which the superservice is derived (e.g. postfix or sendmail) refers to the native log format that is converted into the superservice's DLF.

One can view the DLF as a table where the rows are the logged events and the fields are the logged information related to each event. Since the information logged by an email server is totally different from that logged by a web server, each superservice should have its own data model. In Lire, the data model is called a DLF schema. The DLF schemas are defined in XML files using the DLF Schema Markup Language. The schema describes what fields are available for each logged event.

One interesting aspect of Lire is that although the email DLF is used by all email servers, the email DLF data model isn't restricted to the lowest common denominator across the log formats supported by each email server. In Lire's architecture, the superservice's schema can represent the information logged by the most sophisticated product. When some part of the information isn't available in one log format, the DLF log file won't contain this information, and the reports that need it won't be included.

This architecture means that to support a new service, i.e. a new log format, in Lire you just need to write a plugin, called a DLF converter.
This is just a simple perl module that parses the native log format and maps the information according to the schema.

Log Analysis

After normalisation comes the analysis process. The analysis process's responsibility is to extract, infer or derive other information from the logged data. Since the superservice's logged data is in a standard format, the analysers are generic in the sense that they can operate on all the superservice's supported log formats, provided the product logs the information required by the analyser. The analysis process is shown in Figure 1.3, "The Log Analysis Process".

Figure 1.3. The Log Analysis Process

Since each analyser can add information to a DLF or create a new one, each analyser generates data according to a special kind of schema. Lire's framework includes two kinds of analysers. The difference between the two resides in the mapping between the source data and the new data they generate. Extended analysers generate new data for each DLF record, whereas derived analysers are used when the new data doesn't have a one-to-one mapping with the source data.

The analysers produce data according to a data model which is specified in other DLF schemas. There are extended schemas and derived schemas. An extended schema simply adds new fields to the base superservice's schema. For example, in the web superservice's schema, a lot of information can be obtained from the referer field. From this information, it is possible to guess the user's browser, language or operating system. Those fields are specified in the www-referer extended schema; one analyser is responsible for extracting this information from the referer field.

But sometimes the analysis cannot simply add information to each event record; an altogether different schema is then needed. For those cases, there is the derived schema.
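The difference between the two kinds of analysers can be sketched in plain Perl. This is only an illustration of the one-to-one versus not-one-to-one mapping; it does not use Lire's analyser API. The records, the field names and the computed is_image and request_count fields are all assumptions made up for the example.

```perl
use strict;
use warnings;

# Toy records; the field names are illustrative, not the real www schema.
my @records = (
    { client_host => '192.168.250.10', requested_page => '/' },
    { client_host => '192.168.250.10', requested_page => '/images/logo.png' },
    { client_host => '192.168.250.11', requested_page => '/' },
);

# Extended style: exactly one output record per input record,
# carrying the original fields plus a computed one.
my @extended = map {
    +{ %$_, is_image => ( $_->{requested_page} =~ /\.png$/ ? 1 : 0 ) }
} @records;

# Derived style: the output follows a different data model (here one
# record per client), so there is no one-to-one mapping with the input.
my %by_client;
$by_client{ $_->{client_host} }++ for @records;
my @derived = map { +{ client_host => $_, request_count => $by_client{$_} } }
              sort keys %by_client;

printf "%d input, %d extended, %d derived\n",
    scalar @records, scalar @extended, scalar @derived;
```

The extended list always has as many records as the input, while the size of the derived list depends on the data itself.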
An example of the use of such a schema in the current Lire distribution is the analyser which creates user sessions based on the logged client IP address and user agent. This analyser defines the www-session derived schema.

Report Generation

Once you have all this data, it's time to generate some useful reports out of it. Lire's framework includes a generic report builder. What Lire calls a report is actually a collection of what one may understand as reports; Lire calls these subreports. For example, the proxy superservice's report will contain a subreport about the top visited sites, another subreport on the cache hit ratio, as well as several others.

The subreports are defined using the Report Specification Markup Language. This markup language contains elements for several things: information regarding the schema on which it operates; descriptions that should be included in the generated report to help in the interpretation of the data; parameters that can be used to modify the generated report (for example, to generate a top 20 subreport instead of a top 10); a filter that selects the records that will be used for the subreport; and finally the operations that make up the subreport: grouping, summing, counting, etc.

The report markup language covers most simple needs, and there is an extension element as well as an API that can be used to hook in fancier computations. However, no subreport specification in the current distribution makes use of this feature yet.

You can see an overview of this process in Figure 1.4, "Report Generation Process".

Figure 1.4. Report Generation Process

The generated report is another XML file that uses another markup language, this time called the Lire Report Markup Language.
An actual report contains the help descriptions from the report specifications, information on the subreport specifications used, as well as the actual subreport's data. Using another intermediary XML file as output format makes all sorts of things possible in the formatting and post-processing stage.

Report Formatting and Other Post-Processing

The last process works with the generic XML report. Using a domain-specific XML format for the generated report makes it easy for the framework to support multiple output formats. Supporting a new output format is just a matter of writing a new module that processes the XML report file.

Figure 1.5. Processing of the XML Report Using The APIs

As shown in Figure 1.5, "Processing of the XML Report Using The APIs", you can also process the XML files using the APIs to the XML report format.

Going Further

As you can see from this overview, the Lire framework provides a powerful architecture to use for your log analysis needs. The architecture provides extensibility from log normalisation to post-processing of the reports. Exactly how to use the framework is the topic of the next part.

Part II. Using the Lire Framework

In this part, you'll learn how to leverage Lire's framework for your own log analysis needs. The most common use cases are developing a converter for a new log format and developing new reports.

The first chapter, Chapter 2, Writing a New DLF Converter, explains how to write a converter for a new log format. The responsibility of the converter is to map the information contained in a log file to the data model of a specific DLF schema.

When developing a converter for a log format which doesn't fall in the domain of one of the existing DLF schemas, you'll need to write a new schema. This is the topic of the following chapter, Chapter 3, Writing a DLF Schema.
Chapter 4, Writing a New DLF Analyser, gives information on how to write DLF analysers that can add data to the base log information.

Chapter 5, Writing a New Report, gives some notes on how to develop new reports.

Chapter 2. Writing a New DLF Converter

Before Lire can do various analyses and generate reports on the data contained in your various log files, that data must first be converted to a common data model. This is specifically the job of the DLF converter. So if you want to generate the same reports for your RealServer(TM) log files (currently unsupported) as for your web server, you only need to develop a DLF converter which maps the RealServer content to the www DLF schema.

Note

If no existing DLF schema correctly represents the domain of your application log file, it is easy to develop a new one.
Consult Chapter 3, Writing a DLF Schema, for the whole story.

This chapter will show you, through an example, how to develop a new DLF converter for a rather useless log format: the common log format encapsulated in syslog. (It is useless because there aren't many reasons to make your web server log its requests through syslog. And it would probably be simpler to just use the cut command to remove the syslog header.)

Note

The doc/examples directory in the source distribution contains another commented example which could serve as a starting point for your converters.

Prerequisites

Developing a new DLF converter requires some basic programming skills in perl. Although it is not strictly necessary, you should be familiar with perl's object-oriented programming model. If you aren't, you should read perltoot(1) before continuing.

The common_syslog Log Format

The log format supported by our DLF converter is simply the standard Common Log Format supported by most web servers, with a syslog header prepended to each line. Here is an example of what such a log file might contain:

    May 10 11:13:10 hibou httpd[12344]: Apache/1.3.26 (Unix) Debian GNU/Linux Embperl/1.3.3 PHP/4.1.2 mod_perl/1.26 configured -- resuming normal operations
    May 10 11:13:11 hibou httpd[12345]: 192.168.250.10 - - \
        [10/May/2003:11:13:11 +0200] "GET /" HTTP/1.1 200 1523
    May 10 11:13:12 hibou httpd[12346]: 192.168.250.10 - - \
        [10/May/2003:11:13:11 +0200] "GET /images/logo.png" HTTP/1.1 200 1201
    May 10 11:13:12 hibou httpd[12348]: 192.168.250.10 - - \
        [10/May/2003:11:13:11 +0200] "GET /images/corner.png" HTTP/1.1 200 1021

Remember that the outer layer is a syslog log file and could contain other things than the web server's requests. The first line in the example isn't a request record but what usually ends up in the "error_log": a message about the server starting.
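To make the two layers of this format concrete, here is a minimal sketch in plain Perl that separates the syslog header from the payload. It is an illustration under stated assumptions, not the code Lire uses: split_syslog_line and its regular expression are hypothetical, and only the process and content key names mirror the keys used later in this chapter.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Lire: split a syslog line into its
# header fields and the payload that follows "program[pid]: ".
sub split_syslog_line {
    my ($line) = @_;

    # Returns undef when the line doesn't look like a syslog record.
    return unless $line =~ m/^(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\s([^\s\[:]+)(?:\[(\d+)\])?:\s(.*)$/;

    return {
        timestamp => $1,
        hostname  => $2,
        process   => $3,
        pid       => $4,
        content   => $5,
    };
}

my $rec = split_syslog_line(
    'May 10 11:13:11 hibou httpd[12345]: 192.168.250.10 - - '
    . '[10/May/2003:11:13:11 +0200] "GET /" HTTP/1.1 200 1523'
);
print "$rec->{process} on $rec->{hostname}: $rec->{content}\n";
```

Once the header is split off, the content part is an ordinary Common Log Format line (or a free-form server message), which is why the converter can hand it to a second parser.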
Creating the DLF Converter Skeleton

Put simply, a DLF converter is a perl object which implements a set of predefined methods (an "interface" in object-oriented jargon).

Since a DLF converter is a perl object, it must be instantiated from a class. Classes in perl are defined in packages. We'll name the package which implements our converter MyConverters::SyslogCommonConverter. To create such a package, you need to create a file named MyConverters/SyslogCommonConverter.pm in a directory searched by perl.

Note

* You can obtain perl's default search list by running the command $ perl -V
* This search list can be modified by setting the PERL5LIB environment variable.

Here is a first cut of our DLF converter:

    package MyConverters::SyslogCommonConverter;

    use base qw/Lire::DlfConverter/;

    1;

The first line declares that the code is in the MyConverters::SyslogCommonConverter package. The second one specifies that objects in this package are subclasses of the Lire::DlfConverter package. The last line fulfills perl's requirement that a package return a true value once it is initialized.

This looks like a complete, although useless, DLF converter. In fact, it isn't complete, because if you tried to register an instance of that class you would get "unimplemented method" errors. Besides, we don't even have a formal way to create an instance of our converter yet. This is our next task.

Adding a Constructor

The Lire framework doesn't place any restrictions on your DLF converter's constructor. In fact, the constructor isn't used by the framework at all; it will only be used by your DLF converter registration script (see the section called "Registering Your DLF Converter with the Lire Framework").

We will follow perl's convention of using a method named new for our constructor and of using a hash reference to hold our object's data.
Here is our complete constructor:

    use Lire::Syslog;

    sub new {
        my $pkg = shift;

        my $self = bless {}, $pkg;
        # Reuse Lire's syslog parser for the outer layer of each line.
        $self->{syslog_parser} = Lire::Syslog->new();

        return $self;
    }

Since our log format is based on syslog, we will reuse the syslog parsing code included in Lire. This is the reason we instantiate a Lire::Syslog object and save a reference to it in our constructor.

The Meta-Data Methods

The Lire::DlfConverter interface requires two kinds of methods. First, it requires methods which provide information to the framework about your converter. Second, it requires methods which actually implement the conversion process. It is the former that this section documents.

The DLF Converter Name

The name() method should return the name of our DLF converter. It is this name that is passed to the lr_log2report command. This name must be unique among all the registered converters and it should be restricted to alphanumerical characters (hyphens, periods and underscores can also be used). We will name our converter common_syslog:

    sub name {
        return "common_syslog";
    }

Providing Information To Users

The next two required methods are used to give more verbose information about your converter to the users. The converter's title() and description() can be used to display information about your converter in the user interface or to generate documentation.

The title() method should simply return a string:

    sub title {
        return "Common Log Format embedded in Syslog DLF Converter";
    }

The description() method should return a DocBook fragment describing your converter and the log formats it supports. If you don't know DocBook, just restrict yourself to using para elements to make paragraphs:

    sub description {
        return <<EOF;
    <para>This DLF Converter extracts web server's requests and error information from a syslog file. The requests and errors should be logged under the httpd program name. The errors are mapped to the syslog schema, the requests are mapped to the www schema.</para>
    <para>Syslog records from another program than httpd are ignored.</para>
    EOF
    }

Providing Information to the Framework

Two other meta-data methods are used by the framework itself. The first one specifies which DLF schemas your DLF converter converts to:

    sub schemas {
        return ( "www", "syslog" );
    }

In our case, we are converting to the syslog and www schemas. As stated in our converter's description, we will map the web server's error messages to the syslog schema and the request logs to the www schema. Other alternatives would have been to map only the request information to the www schema, or to map all the non-request records to the syslog schema. The rationale behind the current choice (besides this being an example) is that it makes it convenient to process one log file to obtain a report containing the requests and errors from our web server. For that use case, it is best to ignore everything unrelated to the web server.

The other method affects how the conversion process will be handled. Lire offers two modes of conversion: the line-oriented one and the file-oriented one. (Both will be described in the next section.) If your log file is line-oriented (each line is one log record), as most log files are, you should use the line-oriented conversion mode:

    sub handle_log_lines {
        return 1;
    }

The Conversion Methods

The actual conversion process is handled through three methods: init_dlf_converter(), finish_conversion() and either process_log_file() or process_log_line(), depending on the conversion mode (as determined by handle_log_lines()'s return value).

Conversion Initialization

The method init_dlf_converter() will be called once before the log file is processed. It should be used to initialize the state of your converter.
Since our DLF converter doesn't need any initialization or configuration, the method is simply empty:

    sub init_dlf_converter {
        my ( $self, $process ) = @_;

        return;
    }

The $process parameter, which is passed to all the processing methods, is an instance of Lire::DlfConverterProcess. This is the object which drives the conversion process, and it defines several methods which you will use in the actual conversion process.

Conversion Finalization

The method finish_conversion() will be called once after the log file has been completely processed. This method is mostly of use to stateful converters, that is, DLF converters which generate DLF records from more than one line. Since this is not our case, we simply leave the method empty:

    sub finish_conversion {
        my ( $self, $process ) = @_;

        return;
    }

The DLF Conversion Process

Whether you are using the file-oriented or line-oriented conversion mode, the principles are the same. You extract information from the log file and create DLF records from it. Your DLF converter communicates with the framework by calling methods on the Lire::DlfConverterProcess object which is passed as a parameter to your methods.

Here is the complete code of our conversion method:

    use Lire::Apache qw/parse_common/;

    sub process_log_line {
        my ( $self, $process, $line ) = @_;

        # eval traps the error raised when the line isn't valid syslog.
        my $sys_rec = eval { $self->{syslog_parser}->parse( $line ) };
        if ( $@ ) {
            $process->error( $@, $line );
            return;
        } elsif ( $sys_rec->{process} ne 'httpd' ) {
            $process->ignore_log_line( $line, "not an httpd record" );
            return;
        } else {
            my $common_dlf = {};
            eval { parse_common( $sys_rec->{content}, $common_dlf ) };
            if ( $@ ) {
                # Not a request line: treat it as a server message.
                $sys_rec->{message} = $sys_rec->{content};
                $process->write_dlf( "syslog", $sys_rec );
            } else {
                $process->write_dlf( "www", $common_dlf );
            }
        }
    }

The first thing that should be noted is that in the line-oriented conversion mode, the method process_log_line() will be called once for each line in the log file.
Secondly, the actual parsing of the line is done using two functions: parse_common() and the parse() method of Lire::Syslog. These simply use regular expressions to extract the appropriate information from the line and put it in a hash reference. What is important is that they already use the schema's field names as key names.

Finally, the $process object provides four different methods to report different kinds of information. Three of them appear in the example:

Reporting Error

The example uses the eval statement to trap errors during the syslog record parsing. If the line cannot be parsed as a valid syslog record, it is an error and it is reported through the error() method. The first parameter is the error message and the second one is the line to which the error is associated. This last parameter is optional.

Ignoring Information

When the syslog event doesn't come from the httpd process, we ignore the line. Ignored lines are reported to the framework by using the ignore_log_line() method. The first parameter is the line which is ignored. The second, optional parameter gives the reason why the line was ignored.

Creating DLF Records

Finally, DLF records are created by using the write_dlf() method. Its first parameter is the schema to which the DLF record complies. This schema must be one that is listed by your converter's schemas() method. The second parameter is the DLF data contained in a hash reference. The DLF record is created by taking, for each field in the schema, the value stored under the same name in the hash. (In the syslog schema, the field which contains the actual log message is called message; this is why we assign the content value to the message key.) Missing fields, or fields whose value is undef, will contain the special LR_NA missing value marker. Keys in the hash that don't map to a schema field are simply ignored.
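The field-mapping rule described above can be sketched in a few lines of standalone Perl. This is only an illustration, not Lire's implementation: build_record() is a hypothetical helper, the field list is made up for the example (the real syslog schema defines its own fields), and representing the missing value marker as the string 'LR_NA' is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lire: builds one record (an array
# reference, in schema field order) from a hash of parsed values.
sub build_record {
    my ( $field_names, $data ) = @_;

    # Missing keys and undef values both become the LR_NA marker.
    # Keys of %$data that are not schema fields are never looked at.
    return [ map { defined $data->{$_} ? $data->{$_} : 'LR_NA' } @$field_names ];
}

# Illustrative field list only (assumption).
my @fields = qw/time hostname process message/;

my $record = build_record(
    \@fields,
    {
        time     => 1006992092,
        process  => 'httpd',
        message  => 'File does not exist',
        content  => 'File does not exist',    # not a schema field: ignored
        hostname => undef,                    # undef: becomes LR_NA
    },
);

# Prints: 1006992092 LR_NA httpd File does not exist
print join( ' ', @$record ), "\n";
```

The practical consequence for a converter is that it can hand the parser's hash to write_dlf() unchanged, as long as the keys it cares about carry the schema's field names.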
In our example, we distinguish between the server's error messages (mapped to the syslog schema) and the request information (mapped to the www schema) based on whether parse_common() succeeded in parsing the line.

Saving Log Line

Another possibility, not shown in our example, is to ask that the line be saved for later processing. This is mostly of use to converters which maintain state between lines. In such cases, the related lines needed to complete a record are often missing from the end of the log file. You can then save the line, and it will automatically be seen by the next run of your converter on the same DLF store. This option is only available in the line-oriented mode of conversion.

File-Oriented Conversion

The same principles apply when you are using the file-oriented mode of conversion. This mode will usually be used for binary log formats or formats which aren't line-oriented, like XML. For demonstration purposes, the following code could be added to transform our line-oriented converter into a file-oriented one:

    sub handle_log_lines { return 0; }

    sub process_log_file {
        my ( $self, $process, $fh ) = @_;

        my $line;
        while ( defined( $line = <$fh> ) ) {
            chomp $line;
            $self->process_log_line( $process, $line );
        }
    }

The difference between the above code and using the line-oriented mode is that the framework won't be aware of the number of log lines processed, and your converter might have trouble processing log files which use a different line-ending convention than the host you are running on. The bottom line is that you should use the line-oriented conversion mode when your log format is line-oriented.

Registering Your DLF Converter with the Lire Framework

We first said that DLF converters are Perl objects which implement the Lire::DlfConverter interface. What we did is write a class which implements that interface. Creating the object from that class is the responsibility of the DLF converter registration script.
This is simply a snippet of Perl code which instantiates your object and registers it with the Lire::PluginManager:

    use Lire::PluginManager;
    use MyConverters::SyslogCommonConverter;

    Lire::PluginManager->register_plugin( MyConverters::SyslogCommonConverter->new() );

That's all there is to it, really. You put this snippet in a file named syslog_common_init in one of the directories listed in the plugins_init_path configuration variable.

Note: Some other notes on this topic:

1. The file can actually be named anything you want; a name of the form service_init just makes the purpose of the file clear.
2. The initial value of plugins_init_path contains the directories sysconfdir/lire/plugins and HOME/.lire/plugins. You can change this list by using the lire tool.
3. Your registration script can create and register more than one object.

You can now generate a www report for log files in that format using the command lr_log2report common_syslog < file.log.

DLF Converter API

The complete DLF Converter API documentation is included in POD format in the Lire distribution. It is usually formatted as man pages. You can always read it using the perldoc command. The documentation of the following packages should be consulted: Lire::DlfConverter(3), Lire::DlfConverterProcess(3) and Lire::PluginManager(3).

Chapter 3. Writing a DLF Schema

Table of Contents Designing the ftpproto schema Creating The Schema File Adding the Schema's Description Defining the Schema's Fields Installing The Schema

If you want to develop a DLF converter for an application whose logging data model isn't adequately represented by one of the existing DLF schemas, you'll need to develop a new one. If you are familiar with SQL, a DLF schema is similar to a table schema description. A DLF file can be seen as a table, where each log record is represented by a table row. All log records in the same DLF schema share the same fields.
Designing the ftpproto schema

In this chapter, we will create a new schema for the logging of FTP sessions. That DLF schema could serve for an improved DLF converter for log files generated by Microsoft Internet Information Server(TM). Lire currently has a DLF converter for these log files, but the current ftp DLF schema is modelled after the xferlog log file, which only represents file transfers, whereas the log generated by Microsoft Internet Information Server(TM) contains more detailed information on the FTP session. Here is an example of such a log file:

    #Software: Microsoft Internet Information Server 4.0
    #Version: 1.0
    #Date: 2001-11-29 00:01:32
    #Fields: time c-ip cs-method cs-uri-stem sc-status
    00:01:32 10.0.0.1 [56]created spacedat/091001092951LGW_Data.zip 226
    00:01:32 10.0.0.1 [56]created spacedat/html/bx01g01.gif 226
    00:01:32 10.0.0.1 [56]created spacedat/html/catlogo.gif 226
    00:01:32 10.0.0.1 [56]QUIT - 226
    00:03:32 10.0.0.1 [58]USER badm 331
    00:03:32 10.0.0.1 [58]PASS - 230

As you can see, this log file contains other information beyond the simple upload/download represented in the standard FTP schema. It contains a session identifier, the command executed, as well as the result code of the action. Our new schema should be able to represent these things.

Creating The Schema File

To create a DLF schema, you have to create an XML file named after your schema identifier: ftpproto.xml. The schema identifier should be made of alphanumeric characters and is case sensitive. Your schema identifier shouldn't contain hyphens (-) or underscore characters (_). (The hyphen is used for a special purpose.)

All DLF schemas start and end the same way:

The first lines contain the usual XML declaration and DOCTYPE declaration that you'll find in many XML documents. The real stuff starts at the lire:dlf-schema element. What is important for your schema are the values of the superservice and timestamp attributes. The first one contains your schema identifier.
It is called "superservice" for historical reasons. The other one should contain the name of the field which orders the records by their event time. (See the section called "The Field Types" for more information.) The last line in the above excerpt would be the last thing in the file and closes the lire:dlf-schema element.

Adding the Schema's Description

The next things that go into the schema file are the schema's title and description. Both are intended for developers to read and should be informative of the scope of the schema:

    DLF Schema for FTP Protocol

    This DLF schema should be used for FTP servers that have detailed information on the FTP connection in their log files. Each record represents a command done by the client during the FTP session.

The content of the lire:description element is made of DocBook elements. If you don't know DocBook, you just need to know that paragraphs are delimited using para elements.

Defining the Schema's Fields

The only remaining things in the schema definition are the field specifications. Here is the definition of the first one:

    This field contains the timestamp at which the command was issued.

As you can see, the fields are defined using the lire:field element, which has three attributes:

name
    This attribute contains the name of the field. This name should contain only alphanumeric characters. It can also make use of the underscore character.

type
    This attribute contains the type of the field. The available types are described shortly.

label
    This should contain the column label that should be used by default in your report for data coming from this field. This label should be short but descriptive.

The field's description is held in the lire:description element, which contains DocBook markup. The field's description should be descriptive enough so that someone implementing a DLF converter for this schema knows what goes where.
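The naming rules given in this chapter for schema identifiers and field names are easy to check mechanically before installing a schema. The following is a minimal sketch in standalone Perl; is_valid_schema_id() and is_valid_field_name() are hypothetical helpers written for this illustration, not functions provided by Lire, and "alphanumeric" is assumed to mean ASCII letters and digits.

```perl
use strict;
use warnings;

# Hypothetical checks, not part of Lire, for the naming rules above.

# Schema identifiers: alphanumeric characters only. Hyphens and
# underscores are rejected, as the text recommends.
sub is_valid_schema_id {
    my ($id) = @_;
    return $id =~ /\A[A-Za-z0-9]+\z/ ? 1 : 0;
}

# Field names: alphanumeric characters plus the underscore.
sub is_valid_field_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z0-9_]+\z/ ? 1 : 0;
}

for my $id (qw/ftpproto ftp-proto ftp_proto/) {
    printf "%-10s %s\n", $id, is_valid_schema_id($id) ? 'ok' : 'rejected';
}
```

Since schema identifiers are case sensitive, such a check deliberately does not normalise case: ftpproto and FTPproto would name two different schemas.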
The Field Types

The main types available for fields are:

timestamp
    This should be used for fields which contain a value indicating a particular point in time. All timestamp values are represented in the usual UNIX convention: number of seconds since January 1st 1970. Each DLF schema must contain at least one field of this kind, and its name should be in the timestamp attribute of lire:dlf-schema.

hostname
    This type should be used for fields which contain a hostname or IP address. It is important to mark such fields, because it will eventually be possible to resolve IP addresses to hostnames automatically.

bool
    Type for boolean values.

number
    Type for numeric values.

    Important: You shouldn't use this type when the values are limited in number and are semantically related to an enumeration, like a result code. You should use the string type for this. You should only use the number type for values which you'll want to report in classes instead of as individual values.

bytes
    This type should be used for numeric values which are quantities in bytes. The more specific typing is useful for display purposes.

duration
    This type should be used for numeric values which are quantities of time. The more specific typing is useful for display purposes.

string
    This is the type which can be used for all other purposes.

Note: If you read the specifications, you'll find other types which are used. These additional types don't bring anything over the basic ones defined above and you shouldn't use them.

In addition to the time field defined above, here are the remaining field definitions which make up our complete ftpproto schema:

    This field should contain an identifier that can be used to relate the commands done in the same FTP session. This identifier can be reused, but shouldn't be while the FTP session isn't closed.

    This field contains the FTP command executed. The FTP protocol command names (STOR, RETR, APPE, USER, etc.) should be used.
    This should contain the FTP result code after executing the command.

    This field should contain the parameters to the FTP command.

    When the command involves a transfer, like the RETR or STOR command, it should contain the number of bytes transferred.

    This field contains the number of seconds executing the command took.

Installing The Schema

Making the new schema available to the Lire framework is pretty easy: just copy the file to one of the directories set in the lr_schemas_path configuration variable. By default, this variable contains the directories datadir/lire/schemas and HOME/.lire/schemas. Like all other configuration variables, its value can be changed using the lire tool.

Since we want our schema to be available to other users as well, we will install it in the system directory:

    # install -m 644 ftpproto.xml /usr/local/share/lire/schemas

(In this case, Lire was installed under /usr/local.)

Chapter 4. Writing a New DLF Analyser

Table of Contents Writing a Categoriser Defining The Extended Schema Defining the Categoriser Categoriser Configuration Categoriser Implementation Writing an Analyser DLF Analyser API

In Lire, a DLF analyser is a plugin that can extract or derive data from other DLF data. The idea is that these analyses do not depend on the underlying log format, but can be carried out simply by using the data normalised in the DLF schema. For example, an analyser could assign a category based on the URL that was visited (like assigning the 'Public' or 'Private' category). This categorising operation doesn't depend on the log format but only on the presence of the requested_page field in the schema. This would be an example of a special kind of analyser, a Lire DLF categoriser. This is a simpler analyser that can create new fields based on one DLF record.

Note: The doc/examples directory in the source distribution contains the complete code for this categoriser.
There is a more generic kind of analyser that creates data in another DLF stream based on arbitrary queries on the source DLF schema. An example of this kind is an analyser that constructs session summaries from the www requests. It reads the DLF records of the www DLF schema and creates www-user_session DLF records from them.

Writing an analyser is similar to writing a DLF converter, so consult Chapter 2, Writing a New DLF Converter for the details concerning registration and using configuration.

Writing a Categoriser

The simplest form of analyser is the categoriser. In this section, we will show an example of how to write a categoriser that uses regular expressions to assign a category to each www requested page.

Defining The Extended Schema

A categoriser writes DLF in an extended schema. An extended schema is an extension of a base schema. If you are familiar with SQL, you can see it as an inner join with the main schema. That is, each record in the main schema will have the extension fields of the extended schema. In our case, the extended schema is very simple: it only adds one category field to the www schema.

Defining an extended schema is identical to writing a DLF schema, with the exception that we use a different top-level element. You should consult Chapter 3, Writing a DLF Schema for all the details. Here is the extended schema that our categoriser will use:

    Category Extended Schema for WWW service

    This is an extended schema for the WWW service which adds a category field based on the regexp matched by the requested_page.

    This field contains the page category.

The difference with a regular DLF schema is that it starts with the extended-schema tag, which has a base-schema attribute that should contain the DLF schema or derived DLF schema that is extended.

Defining the Categoriser

Like a DLF converter, the categoriser is an object deriving from a base class which defines the categoriser interface. In the categoriser case, that interface is Lire::DlfCategoriser.
The categoriser also has to provide some meta-information to the framework. Here is the code for all of this:

    package MyAnalysers::PageCategoriser;

    use base qw/Lire::DlfCategoriser/;

    sub new { return bless {}, shift; }

    sub name { return 'page-categoriser'; }

    sub title { return "A page categoriser"; }

    sub description {
        return "A categoriser that assigns categories based on a map of regular expressions to categories.";
    }

    sub src_schema { return "www"; }

    sub dst_schema { return "www-category"; }

The methods that differ from the DLF converter case are src_schema(), which specifies the schema to which fields are added, and dst_schema(), which gives the schema specifying the fields that will be added.

Categoriser Configuration

Our categoriser will assign categories based on a mapping from regular expressions to category names. To be useful, this mapping should be configurable. Like all plugins in Lire, DLF categorisers can use the Lire Configuration Specification Markup Language to define the configuration data they use (see Chapter 8, The Lire Report Configuration Specification Markup Language for the full details). The convention is that if there is a parameter named yourname_properties, it is considered the configuration specification for the plugin yourname. This means that a little button will appear in the lire user interface so that the user can configure your plugin's data.

In our categoriser's case, we will define a list of records which enables the user to define many pairs of regular expression and category name:

    Page Categoriser Configuration

    This is a list of regexps that will be applied in this order, along with the category that should be assigned when the regexp matches.

    The Regexp-Category Association

    Regex
    The regular expression to test.

    Category
    This field contains the category that should be assigned.

    p .* Unknown

This specification also sets a default list containing one catch-all regex with the category 'Unknown'. The user could add other values before that.
An alternative implementation could define a field specifying the default category to assign when no regular expression matches.

Categoriser Implementation

Two methods are needed to implement the categoriser. The first is an initialisation method called initialise(). This method receives as a parameter the configuration data entered by the user. In our case, we compile the regular expressions for faster processing later on:

    sub initialise {
        my ( $self, $config ) = @_;

        # Compile each regular expression once.
        foreach my $map ( @$config ) {
            $map->[0] = qr/$map->[0]/;
        }
        $self->{'categories'} = $config;
        return;
    }

The categorising is done in the categorise() method. This method receives as a parameter the DLF record to which the extended fields should be added. This DLF record is a hash reference containing one key for each of the fields defined in the source DLF schema. We simply assign the extended fields by adding new keys to the hash reference:

    sub categorise {
        my ( $self, $dlf ) = @_;

        # The first matching regular expression wins.
        foreach my $map ( @{$self->{'categories'}} ) {
            if ( $dlf->{'requested_page'} =~ /$map->[0]/ ) {
                $dlf->{'category'} = $map->[1];
                return;
            }
        }
        return;
    }

That's all. As for the DLF converter, you'll need to register this analyser with the Lire::PluginManager (see the section called "Registering Your DLF Converter with the Lire Framework" for more information).

Writing an Analyser

When a categoriser isn't sufficient for your needs, you can write a Lire::DlfAnalyser, which gets complete control of the analysis process. The main difference with a categoriser is that the dst_schema() method will refer to a derived schema instead of an extended schema.

The core of the analyser is the analyse() method, which takes a reference to the store whose data will be analysed and a Lire::DlfAnalyserProcess callback object which should be used to write new DLF records and report errors. The method also receives the plugin configuration data.
The analyser should create a Lire::DlfQuery to select the records necessary for its analysis. The doc/examples directory in the source distribution contains a boilerplate for writing an analyser.

DLF Analyser API

The complete DLF Analyser API documentation is included in POD format in the Lire distribution. It is usually formatted as man pages. You can always read it using the perldoc command. The documentation of the following packages should be consulted: Lire::DlfAnalyser(3), Lire::DlfAnalyserProcess(3), Lire::DlfCategoriser(3), Lire::DlfQuery(3) and Lire::PluginManager(3).

Chapter 5. Writing a New Report

Table of Contents Filter Specification

Writing a new report involves writing a report specification, e.g. /service//reports/top-foo-by-bar.xml, and adding this report along with possible configuration parameters to .cfg.

E.g., to create a new report, based upon email/from-domain.xml: copy the file /usr/local/etc/lire/email.cfg to ~/.lire/etc/email.cfg. Copy the file /usr/local/share/lire/reports/email/top-from-domain.xml to e.g. ~/.lire/reports/reports/email/from-domain.xml. Edit the last file to your needs, and enable it by listing it in your ~/.lire/etc/email.cfg.

Beware! The name of a report generally consists of alphanumerics and '-', but the names of parameters may not contain any '-' characters. Parameter names generally consist of alphanumerics and '_' characters.

Filter Specification

For now, you'll have to refer to the example filters found in the current report specification files. We'll give one other example here: specifying a time range. Suppose you want to be able to report on only a specific time range. You could build a (possibly global and reused) filter like:

When trying your new filter, you could install it in ~/.lire/filters/your-filter-name.xml. When lr_dlf2xml looks up a filter which was mentioned in the report configuration file, it looks first in ~/.lire/filters/, and then in .../share/lire/filters/.

Part III. Developer's Reference

Table of Contents 6.
Lire Data Types Lire Textual Elements title element DocBook Elements description element 7. Common Textual Elements to All XML Formats Lire Data Types Parameter Entities Boolean Type Integer Type Number Type String Type Timestamp type Time Type Date Type Duration Type IP Type Port Type Hostname Type URL Type Email Type Bytes Type Filename Type Field Type Superservice Type Related Types 8. The Lire Report Configuration Specification Markup Language The Lire Report Configuration Specification Markup Language config-spec element summary element Parameter Specifications Elements 9. The Lire Report Configuration Markup Language The Lire Report Configuration Markup Language config element global element param element 10. The Lire DLF Schema Markup Language The Lire DLF Schema Markup Language The dlf-schema element extended-schema element derived-schema element field element 11. The Lire Report Specification Markup Language The Lire Report Specification Markup Language report-spec element global-filter-spec element display-spec element param-spec element param element chart-configs element Filter expression elements Report Calculation Elements 12. The Lire Report Markup Language The Report Markup Language report element Meta-information elements section element subreport element missing-subreport element table element table-info element group-info element column-info element group-summary element group element entry element name element value element chart-configs element

Chapter 6. Lire Data Types

Table of Contents Lire Textual Elements title element DocBook Elements description element

Lire Textual Elements

This DTD module defines the elements that contain human-readable content in all the Lire DTDs. This module also imports some DocBook XML V4.1.2 elements for richer semantic tagging.
This module is also namespace aware and will honor the setting of LIRE.pfx to scope its elements.

The latest version of that module is 2.0 and its public identifier is -//LogReport.ORG//ELEMENTS Lire Textual Elements V2.0//EN.

title element

The title element contains a descriptive title. This element represents some title in Lire. It can be used to give a title to a report specification or to specify the title of a report or subreport. The content of this element should be localized. This element doesn't have any attributes.

DocBook Elements

The standard para, formalpara and admonition elements (note, tip, warning, important and caution) may be used, as well as their content.

%DocBookDTD;

description element

The description element is used to describe an element. It can be used to describe DLF fields, describe a report specification or include descriptions in the generated reports. This element can contain one or more of the block-level DocBook elements we use. The content of this element should be localized. This element doesn't have any attributes.

Chapter 7. Common Textual Elements to All XML Formats

Table of Contents Lire Data Types Parameter Entities Boolean Type Integer Type Number Type String Type Timestamp type Time Type Date Type Duration Type IP Type Port Type Hostname Type URL Type Email Type Bytes Type Filename Type Field Type Superservice Type Related Types

Lire Data Types Parameter Entities

This module contains the parameter entity declarations for the data types used by all Lire DTDs. All defined data types have a .type parameter entity, which defines their type as an XML type valid in an attribute declaration, and a .name parameter entity that declares their name. Additionally, this module declares .types parameter entities that group related types together.

The latest version of that module is 1.0 and its public identifier is -//LogReport.ORG//ENTITIES Lire Data Types V1.0//EN.

Boolean Type

The bool type.
It contains a boolean value, either 0, 1, f, t, false or true.

Integer Type

The int type can contain a positive or negative 32-bit integer.

Number Type

The number type can contain any number, either integral or floating point.

String Type

The string type contains any displayable text string.

Timestamp type

The timestamp type contains a time representation which holds both date and time information. It can be represented in UNIX epoch time.

Time Type

The time type contains a time representation which holds only the time of the day, not the date. For example, this data type can represent 12h00, 15:13:10, etc.

Date Type

The date type contains a time representation which holds only a date.

Duration Type

The duration type contains a quantity of time. For example: 5s, 30h, 2days, 3w, 2M, 1y. (The authoritative list of supported duration units is coded in Lire::DataTypes::duration2sec.)

IP Type

The ip type contains an IPv4 address.

Port Type

The port type contains a port as used in TCP to name the ends of logical connections. See also RFC 1700 and http://www.iana.org/numbers.htm. Port names are commonly found in /etc/services on Unix systems.

Hostname Type

The hostname type contains a DNS hostname. (It can also contain the IPv4 address of the host.)

URL Type

The url type represents a URL.

Email Type

The email type can be used to represent an email address.

Bytes Type

The bytes type can be used to represent a quantity of data (5m, 1.2g, 300bytes, etc.).

Filename Type

The filename type can be used to represent the name of a file or directory.

Field Type

Important: This type should be considered internal to Lire and shouldn't be used as a parameter or DLF field type.

The field type can contain a DLF field name. It is used in the parameter specification to represent a choice of sort field, for example.

Superservice Type

Important: This type should be considered internal to Lire and shouldn't be used as a parameter or DLF field type.

Related Types

Chapter 8.
The Lire Report Configuration Specification Markup Language

Table of Contents The Lire Report Configuration Specification Markup Language config-spec element summary element Parameter Specifications Elements

The Lire Report Configuration Specification Markup Language

Document Type Definition for the Lire Report Configuration Specification Markup Language.

This DTD defines a grammar that is used to specify the configuration parameters used by the Lire framework. Besides the framework parameters, this DTD can be used by extension writers to register their parameters with the framework. The configuration specifications are usually stored in prefix/share/lire/config-spec. Currently, Lire's configuration namespace is flat, which means that two different specification documents cannot define parameters with the same name.

Elements of this DTD use the http://www.logreport.org/LRCSML/ namespace, which is usually mapped to the lrcsml prefix.

The latest version of that DTD is 1.1 and its public identifier is -//LogReport.ORG//DTD Lire Report Specification Markup Language V1.1//EN. Its canonical system identifier is http://www.logreport.org/LRCSML/1.1/lrcsml.dtd.

This DTD uses the common lire-desc.mod module, which is used to include a subset of DocBook in description and text elements.

%lire-desc.mod;

Each configuration specification is an XML document which has one config-spec as its root element.

config-spec element

Root element of a configuration specification document. It contains a list of parameter specifications. This element doesn't have any attributes.

summary element

This element is used for a short one-line description of the parameter's purpose. Use the description element for longer help text. This element doesn't have any attributes.

Parameter Specifications Elements

Common Attributes

These attributes are common to all parameter specification elements:

name
    Contains the name of the parameter to which this specification applies.
required
    Determines if a valid value is required to make the container validate. Defaults to true.

section
    This attribute can be used to set a menu section which can be used by configuration frontends to group parameters together.

summary
    This attribute is equivalent to the summary element.

obsolete
    This attribute can be used to mark a parameter as obsolete. Obsolete parameters will be removed from the specification in a future Lire release.

boolean element

This element is used to define a boolean parameter which can take a yes or no value. This element doesn't have any specific attributes.

integer element

This element is used to define an integer parameter. This element doesn't have any specific attributes.

string element

This element is used to define a string parameter. These parameters can contain any value. This element can have a valid-re attribute, which specifies a regular expression that the value must match.

dlf-converter element

This element is used to select a registered DlfConverter. This element doesn't have any specific attributes.

dlf-schema element

This element is used to select an available DlfSchema. If this element has the superservices attribute set, only superservices can be selected.

dlf-streams element

This element is used to configure Lire::DlfStream in Lire::DlfStore. This element doesn't have any attributes.

command element

This element is used to define a command parameter. To be accepted as valid, the parameter's value must point to an executable file, or an executable file with the specified name must exist in a directory of the PATH environment variable. This element doesn't have any specific attributes.

file element

This element is used to define a file parameter. To be accepted as valid, the parameter's value must point to an existing file. This element doesn't have any specific attributes.

directory element

This element is used to define a directory parameter. To be accepted as valid, the parameter's value must point to an existing directory.
This element doesn't have any specific attributes.

executable element

This element is used to define an executable parameter. To be accepted as valid, the parameter's value must point to an existing executable file. This element doesn't have any specific attributes.

select element

This element is used to define a parameter for which the value is selected among a set of options. The allowed set of options is specified using option elements. This element doesn't have any specific attributes.

option element

This element is used to define the valid values for a select parameter. This element doesn't have any specific attributes.

list element

This element is used to define a parameter that can contain an ordered set of values. The type of values which can be contained is specified using other parameter elements. Any number of parameters of the types specified by the child elements can be contained by the defined parameter. This element doesn't have any specific attributes.

object element

This element is used to define a parameter that will instantiate an object. The object will be instantiated by calling the "new_from_config()" class method defined in the package specified by the element's class attribute. The constructor will receive as a parameter the hash instantiated from the parameter's components. The label attribute can be used to specify the contained element that should be used to represent this object in lists.

output-format element

This element is used to select an available OutputFormat. This element doesn't have any specific attributes.

record element

This element is used to define a parameter that holds record-like data. The label attribute can be used to specify the contained element that should be used to represent this record in lists.
reference element This element is used to select from an index. The index from which the available values are taken is specified in the index attribute. report-config element This element is used to define a report configuration. This element doesn't have any attributes. Each superservice can define a default report configuration using this element with a name of superservice_default. plugin element This element is used to define a parameter whose value is selected from a set of options. The allowed set of options is specified using option elements. The element will also contain additional parameters based on the selected value. The available parameters should be defined in a record or similar specification named name_properties. For example, the additional parameters when the option_1 option is selected will be found in the specification named option_1_properties. This element doesn't have any specific attributes. Chapter 9. The Lire Report Configuration Markup Language Table of Contents The Lire Report Configuration Markup Language config element global element param element The Lire Report Configuration Markup Language Document Type Definition for the Lire Report Configuration Markup Language. This DTD defines a grammar that is used to store the Lire configuration. The configuration is stored in one or more XML files. Parameters set in later configuration files override those set in earlier ones. The valid parameter names, as well as their descriptions and types, are specified using configuration specification documents. Elements of this DTD use the http://www.logreport.org/LRCML/ namespace, which is usually mapped to the lrcml prefix. The latest version of the DTD is 1.0 and its public identifier is -//LogReport.ORG//DTD Lire Report Specification Markup Language V1.0//EN(TM). Its canonical system identifier is http://www.logreport.org/LRCML/1.0/lrcml.dtd. Each configuration document is an XML document which has one config as its root element.
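A minimal configuration document could look like the following sketch. The namespace and prefix are the ones given above, and the config, global and param elements and the name and value attributes are those described in this chapter. The parameter names and values (example_scalar, example_list, example_item) are hypothetical and would have to match parameters declared in a configuration specification.

```xml
<?xml version="1.0"?>
<!-- Sketch only: parameter names and values are hypothetical. -->
<lrcml:config xmlns:lrcml="http://www.logreport.org/LRCML/">
  <lrcml:global>

    <!-- A scalar parameter stored in the value attribute. -->
    <lrcml:param name="example_scalar" value="yes"/>

    <!-- A list parameter: each value is a child param element. -->
    <lrcml:param name="example_list">
      <lrcml:param name="example_item" value="first"/>
      <lrcml:param name="example_item" value="second"/>
    </lrcml:param>

  </lrcml:global>
</lrcml:config>
```

If a later configuration file set example_scalar again, that later value would take precedence over this one.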
config element Root element of a configuration document. It currently contains a single global element, which holds the global configuration parameters. This element doesn't have any attributes. global element This element starts the global configuration data. (This is the only scope currently defined.) It contains a list of param elements. param element This element contains the parameter's value. The parameter's name is defined in the name attribute. The value attribute can be used to store a scalar value. When the parameter's type is a list, the values are stored in child param elements. Warning This element has a mixed content type. We should probably use a value attribute to hold scalar values. Chapter 10. The Lire DLF Schema Markup Language Table of Contents The Lire DLF Schema Markup Language The dlf-schema element extended-schema element derived-schema element field element The Lire DLF Schema Markup Language The Lire DLF Schema Markup Language (LDSML) is used to describe the fields used by DLF records of a specific schema like www, email or msgstore. DLF schemas are defined in one XML document that should be installed in one of the directories included in the schema path (usually HOME/.lire/schemas and prefix/share/lire/schemas). This document must conform to the LDSML DTD, which is described here. Elements of that DTD are defined in the namespace http://www.logreport.org/LDSML/, which will usually be mapped to the lire prefix (although other prefixes may be used). The latest version of that DTD is 1.1 and its public identifier is -//LogReport.ORG//DTD Lire DLF Schema Markup Language V1.1//EN(TM). Its canonical system identifier is http://www.logreport.org/LDSML/1.1/ldsml.dtd. This DTD uses the common modules lire-types.mod, which defines the data types recognized by Lire, and lire-desc.mod, which is used to include a subset of DocBook in description and text elements.
%lire-types.mod; %lire-desc.mod; The top-level element in XML documents describing a DLF schema will be either a dlf-schema, extended-schema or derived-schema, depending on the schema's type. DLF schemas are used as the base schema for one superservice. For example, the DLF schema of the www superservice is named www. An extended schema is used to define additional fields whose values are to be computed by an analyser. Extended schemas are named after the schema which they extend. For example, the www-attack extended schema adds an attack field which contains the "attack", if any, that was attempted in that request. Derived schemas are used by another type of analyser, which defines an entirely different schema. Whereas in the extended schema the new fields will be added to all the DLF records of the base schema, the derived schema will create new DLF records based on the DLF records of the base schema. An example of this is the www-session schema, which computes users' session information based on the web requests contained in the www schema. As in the extended-schema case, derived schemas are named after the base schema from which they are derived. The fields that make up each schema are defined using field elements. The dlf-schema element The dlf-schema element is used to define the base schema of a superservice. It should contain optional title and description elements followed by field elements describing the schema structure. The title is an optional text string that will be used in the documentation that can be generated automatically from the schema definition. The description element should describe what is represented by each DLF record (one web request, one email delivery, one firewall event, etc.). dlf-schema's attributes superservice This required attribute contains the name of the superservice described by this schema. This will also be used as the base schema's identifier.
timestamp This required attribute contains the name of the field which contains the official event's timestamp. This field will be used to sort the DLF records for timegroup and timeslot report operations. extended-schema element This is the root element of an extended DLF schema. An extended schema defines additional fields that will be added to the base schema. It contains an optional title, an optional description and one or more field specifications. extended-schema's attributes id This required attribute contains the identifier of that schema. This identifier should be composed of the superservice's name followed by a hyphen (-) and then a word describing the extended schema. base-schema This required attribute contains the identifier of the schema that is extended. required-fields This optional attribute contains a space-delimited list of field names that must be available in the base schema for the analyser to do its job. If any of the listed fields is missing in the DLF, extended fields for the base schema cannot be computed. module This required attribute contains the name of the analyser that is used to compute the extended fields. This is a Perl module that should be installed in Perl's library path. derived-schema element This is the root element of a derived DLF schema. The difference between a normal schema and a derived schema is that the data is generated from another DLF instead of a log file. derived-schema's attributes id This required attribute contains the identifier of that schema. This identifier should be composed of the superservice's name followed by a hyphen (-) and then a word describing the derived schema. base-schema This required attribute contains the identifier of the schema from which this derived schema's data is derived. required-fields This optional attribute contains a space-delimited list of field names that must be available in the base schema for the analyser to do its job.
If any of the listed fields is missing in the DLF, the derived records cannot be computed. module This required attribute contains the name of the analyser that is used to compute the derived records. This is a Perl module that should be installed in Perl's library path. timestamp This required attribute contains the name of the field which contains the official event's timestamp. This field will be used to sort the DLF records for timegroup and timeslot report operations. field element The field element is used to describe the fields of the schema. Each field is specified by its name and type. The field element may contain an optional description element, which gives more information on the data contained in the field. The description should be used to tell DLF converter implementors what should appear in that field. field's attributes name This required attribute contains the name of the field. type This required attribute contains the field's type. default Warning This attribute is obsolete and will be removed in a future Lire release. label This optional attribute gives the label that should be used to display this field in reports. Defaults to the field's name when omitted. Chapter 11. The Lire Report Specification Markup Language Table of Contents The Lire Report Specification Markup Language report-spec element global-filter-spec element display-spec element param-spec element param element chart-configs element Filter expression elements Report Calculation Elements The Lire Report Specification Markup Language Document Type Definition for the Lire Report Specification Markup Language. This DTD defines a grammar that is used to specify reports that can be generated by Lire. Elements of this DTD use the http://www.logreport.org/LRSML/ namespace, which is usually mapped to the lire prefix. The latest version of that DTD is 2.1 and its public identifier is -//LogReport.ORG//DTD Lire Report Specification Markup Language V2.1//EN(TM).
Its canonical system identifier is http://www.logreport.org/LRSML/2.1/lrsml.dtd. This DTD uses the common modules lire-types.mod, which defines the data types recognized by Lire, and lire-desc.mod, which is used to include a subset of DocBook in description and text elements. %lire-types.mod; %lire-desc.mod; Each report specification is an XML document which has one report-spec as its root element. This DTD can also be used for filter specifications, which have one global-filter-spec as their root element. report-spec element Root element of a report specification. It contains descriptive elements about the report specification (title, description). It contains the display elements that will be in the generated report (display-spec). It contains the specification of the parameters that can be used to customize the report generated from this specification (param-spec). Finally, it contains elements to specify a filter expression which can be used to select a subset of the records (filter-spec) and the expression to build the report (report-calc-spec). report-spec's attributes superservice The name of the superservice for which this report is available, e.g. email, www, dns, etc. schema The DLF schema used by the report. This defaults to the superservice's schema, but can be one of its derived or extended schemas. joined-schemas A whitespace-delimited list of additional schemas that will be joined for this report. This will make all fields defined in these schemas available to the operators. The schemas that can be joined depend on the specification's schema. id A unique identifier for the report specification. global-filter-spec element Root element of a filter specification. It contains descriptive elements about the filter specification (title, description). It contains the display elements that will be used when that filter is used in a generated report (display-spec).
It contains the specification of the parameters that can be used to customize the filter generated from this specification (param-spec). Finally, it contains an element to specify the filter expression which can be used to select a subset of the records (filter-spec). global-filter-spec's attributes superservice The name of the superservice for which this filter is available, e.g. email, www, dns, etc. schema The DLF schema used by the filter. This defaults to the superservice's schema, but can be one of its derived or extended schemas. joined-schemas A whitespace-delimited list of additional schemas that will be joined for this filter. This will make all fields defined in these schemas available to the operators. The schemas that can be joined depend on the specification's schema. id A unique identifier for the filter specification. display-spec element This element contains the descriptive elements that will appear in the generated report. It contains one title and may contain one description, which will be used as a help message. This element has no attributes. param-spec element This element contains the parameters that can be customized in this report specification. This element doesn't have any attributes. param element This element contains the specification of a parameter that can be used to customize this report. This element can contain a description element, which can be used to explain the parameter's purpose. It is an error to define a parameter with the same name as one of the superservice's fields. param's attributes name The name of the parameter. type The parameter's data type. default The parameter's default value. chart-configs element This element contains one or more chart configurations that should be copied to the generated subreport. These chart configurations are specified using the Lire Report Configuration Markup Language. This element has no attributes.
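The skeleton below sketches how the descriptive, display-spec and param-spec elements described above might be combined in a report specification. It is an illustration under stated assumptions, not a validated document: the order of the child elements simply follows the order in which they are introduced in the prose and should be checked against the DTD, the unprefixed para element stands for the DocBook subset allowed in descriptions, and the identifier example-report and the parameter example_limit are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Sketch only: child order, id and parameter name are assumptions. -->
<lire:report-spec xmlns:lire="http://www.logreport.org/LRSML/"
                  superservice="www" id="example-report">

  <lire:title>Example report</lire:title>
  <lire:description>
    <para>Hypothetical report used to illustrate the markup.</para>
  </lire:description>

  <!-- Text shown in the generated report. -->
  <lire:display-spec>
    <lire:title>Example report title</lire:title>
  </lire:display-spec>

  <!-- Parameters a user may customize; the name must not clash
       with a field of the superservice's schema. -->
  <lire:param-spec>
    <lire:param name="example_limit" type="int" default="10">
      <lire:description>
        <para>Hypothetical parameter.</para>
      </lire:description>
    </lire:param>
  </lire:param-spec>

  <!-- The optional filter-spec and the report-calc-spec would go here. -->

</lire:report-spec>
```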
Filter expression elements filter-spec element This element is used to select the subset of the records that will be used to generate the report. If this element is missing, all records will be used to generate the report. The content of this element is an expression element, which defines an expression that evaluates to true or false for each record. The subset used to generate the report consists of all records for which the expression evaluates to true. The values used to evaluate the expressions are either literals, parameter values or values of one of the record's fields. Parameter and field references start with a $ followed by the name of the parameter or field. All other values are interpreted as literals. This element doesn't have any attributes. value element This expression element evaluates to false if the 'value' attribute is undefined, the empty string or 0. It evaluates to true otherwise. value's attributes value The value that should be evaluated in a boolean context. eq element ne element gt element ge element lt element le element match element The match expression element tries to match a POSIX 1003.2 extended regular expression against a value and returns true if there is a match and false otherwise. match's attributes value The value which should be matched. re A POSIX 1003.2 extended regular expression. case-sensitive Whether the regular expression is case-sensitive. Defaults to true. not element and element or element Report Calculation Elements report-calc-spec element This element describes the computation needed to generate the report. It contains one aggregator element. This element doesn't have any attributes. Common Attributes All elements which create a column in the resulting report have a label attribute that will be used as the column label. When this attribute is omitted, the content of the name attribute will be used as the column label. All operation elements may have a name attribute which can be used to reference that column. (It is required in the case of aggregate functions).
The primary use is for controlling the sort order of the rows in the generated report. group element The group element generates a report where records are grouped by some field values and aggregate statistics are computed on those groups of records. It contains the fields that should be used for grouping and the statistics that should be computed. The sort order in the report is controlled by the 'sort' attribute. group's attributes name An identifier that can be used to reference this operation from other elements. This name will most often be used in the parent's sort attribute. If omitted, a default name will be generated. sort A whitespace-delimited list of field names that should be used to sort the records. Field names can be prefixed by - to specify reverse sort order; otherwise ascending sort order is used. A name can also refer to the name attribute of a statistics element. limit Limits the number of records that will be in the generated report. It can be either a positive integer or the name of a user-supplied param. timegroup element The timegroup element generates a report where records are grouped by time range (hour, day, etc.). Statistics are then computed on these records grouped by period. timegroup's attributes name An identifier that can be used to reference this operation from other elements. This name will most often be used in the parent's sort attribute. If omitted, a default name will be generated. label Sets the column label that will be used for the column generated by this element. If omitted, a default label will be generated. field The name of the field which is used to group records. This should be a field of one of the time types (timestamp, date, time). It defaults to the default timestamp field if unspecified. period This is the time period over which records should be grouped. Valid periods look like hour, day, 1h, 30m, etc. It can also be the name of a user-supplied param.
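The two fragments below sketch how the group and timegroup elements described above might appear inside a report-calc-spec. They use only the attributes listed in this chapter, together with the field and sum elements that are documented further on. The field names client_host and bytes, and the operation name bytes_total, are hypothetical; real names depend on the DLF schema in use.

```xml
<!-- Sketch only: field and operation names are hypothetical. -->
<lire:report-calc-spec xmlns:lire="http://www.logreport.org/LRSML/">
  <!-- Group records by client_host, keep the 10 largest totals.
       The leading - in sort requests reverse (descending) order. -->
  <lire:group sort="-bytes_total" limit="10">
    <lire:field name="client_host"/>
    <lire:sum name="bytes_total" field="bytes"/>
  </lire:group>
</lire:report-calc-spec>
```

```xml
<!-- Sketch only: field and operation names are hypothetical. -->
<lire:report-calc-spec xmlns:lire="http://www.logreport.org/LRSML/">
  <!-- Group records by day on the schema's default timestamp field. -->
  <lire:timegroup period="1d" label="Day">
    <lire:sum name="bytes_total" field="bytes" label="Bytes"/>
  </lire:timegroup>
</lire:report-calc-spec>
```

In the first fragment the sort attribute refers to the name of the sum operation, which is why that operation is given an explicit name.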
timeslot element The timeslot element generates a report where records are grouped according to a cyclic unit of time. The duration unit used won't roll over to the next higher unit. For example, this means that using a unit of 1d will generate a report where the statistics will be by day of the week, 8h will generate a report by third of a day, etc. The statistics are then computed over the records in the same timeslot. Example 11.1. timeslot with 1d unit Using a specification like: ... would generate a report like: Table 11.1. weekly overview Sunday ... Monday ... Tuesday ... ... ... Saturday ... where data will be summed over all Sundays, Mondays, ..., and Saturdays found in the log. Example 11.2. timeslot with 2m unit Specifying unit="2m" would generate a line for every two months, giving a yearly view. timeslot's attributes name An identifier that can be used to reference this operation from other elements. This name will most often be used in the parent's sort attribute. If omitted, a default name will be generated. label Sets the column label that will be used for the column generated by this element. If omitted, a default label will be generated. field The name of the field which is used to group records. This should be a field of one of the time types (timestamp, date, time). It defaults to the default 'timestamp' field if unspecified. unit This is the cyclic unit of time over which the records are aggregated. It can be any duration value (hour, day, 1h, 30m, etc.). It can also be the name of a user-supplied param. rangegroup element The rangegroup element generates a report where records are grouped into distinct classes delimited by a range. This element can be used to aggregate continuous numeric values like duration or bytes. Statistics are then computed on these records grouped by range class. rangegroup's attributes name An identifier that can be used to reference this operation from other elements.
This name will most often be used in the parent's sort attribute. If omitted, a default name will be generated. label Sets the column label that will be used for the column generated by this element. If omitted, a default label will be generated. field The name of the field which is used to group records. This should be a field of a continuous numeric type (bytes, duration, int, number). Aggregation on time types should use the timegroup or timeslot element. range-start The starting index of the first class. Defaults to 0. This won't be used as the lower limit of the classes. It is only used to specify the value relative to which the class boundaries are placed. For example, if the range-start is 1 and the range-size is 5, a class ranging from -4 to 0 will be created if values fall in that range. It can be supplied in any continuous unit (e.g. 10k, 5m, etc.). This can also be the name of a user-supplied param. range-size This is the size of each class. It can be supplied in any continuous unit (e.g. 10k, 5m, etc.). It can also be the name of a user-supplied param. min-value All values lower than this boundary value will be considered to be equal to this value. If this parameter isn't set, the ranges won't be bounded on the left side. max-value All values greater than this boundary value will be considered to be equal to this value. If this parameter isn't set, the ranges won't be bounded on the right side. size-scale The rate at which the size scales from one class to the next. If it is different from 1, this will create a logarithmic distribution. For example, when this is set to 2, each successive class will be twice as large as the previous one: 0-9, 10-29, 30-69, etc. field element This element references a DLF field whose value will be displayed in a separate column in the resulting report. It is used to specify the grouping fields in the group element and to specify the fields to output in the records element.
field's attributes name The name of the DLF field that will be used as key for grouping. label Sets the column label that will be used for the column generated by this element. If omitted, a default label will be generated. sum element The sum element sums the values of a field in the group. sum's attributes name An identifier that can be used to reference this operation from other elements. This name will most often be used in the parent's sort attribute. label Sets the column label that will be used for the column generated by this element. If omitted, a default label will be generated. field The field that should be summed. ratio This attribute can be used to display the sum as a ratio of the group or table total. If the attribute is set to group, the resulting value will be expressed as a ratio of the group's total sum. If the attribute is set to table, it will be expressed as a ratio of the total sum of the table. The default is none, which will not convert the sum to a ratio. weight