Lire Developer's Manual
Joost van Baal, Egon L. Willighagen, Francis J. Lacoste
Copyright 2000, 2001, 2002, 2003, 2004 Stichting LogReport Foundation
This manual is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this manual (see COPYING); if not, check with http://www.gnu.org/copyleft/gpl.html or write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111, USA.
&lire-version; $Date: 2008/03/03 06:30:02 $ $Id: dev-manual.dbx,v 1.90 2008/03/03 06:30:02 vanbaal Exp $
Preface
Log file analysis is both an essential and a tedious part of system administration. It is essential because it is the best way of profiling the usage of the services installed on the network. It is tedious because programs generate a lot of data, and tools to report on this data are often unavailable or incomplete. When such tools exist, they are generally specific to one product, which means that you can't compare, for example, your &Qmail; and &Exim; mail servers. &Lire; is a software package developed by the &LogReport; to generate useful reports from raw log files of various network programs. Multiple programs are supported for various types of network services. &Lire; also supports various output formats for the generated reports.
What This Book Contains
This book is the &dev-manual;. Its purpose is to present &Lire; as a log analysis framework. To this end, it describes the architecture and design of &Lire; and contains comprehensive instructions on how to use it. Its intended audience is system administrators or programmers who want to extend &Lire; or want to understand its internals. There is another book, the &user-manual;, which describes how to install, configure and use &Lire; as an off-the-shelf log analyzer. Its intended audience is system administrators who want to install and use &Lire; to gather information about the services operating on their network.
How Is This Book Organized?
This book is divided into five parts. The first part gives an overview of the architecture and design of &Lire;. The second part contains information on extending &Lire;: you will learn how to add a new DLF format to &Lire;, write log file converters and add reports for a superservice. The third part is a reference section which gives comprehensive details about the various XML formats used by &Lire; and in-depth descriptions of its various APIs. The fourth part is targeted at developers who want to participate in &Lire;'s development. It contains information about CVS access, coding conventions, tools needed to build from CVS, release management and other aspects important to members of the &Lire; development team. It also explains how to contribute code to &Lire; as an external party. Finally, the last part contains various implementation details that may interest people who want to learn more about &Lire; internals.
Conventions Used
If You Don't Find Something In This Manual
You can report typos, incorrect grammar or any other editorial problems to bugs@logreport.org. We welcome readers' feedback. If you feel that certain parts of this manual aren't clear, are missing information or are lacking in any other respect, please tell us. Of course, if you feel like writing the missing information yourself, we'll very happily accept your patch. We will make our best effort to improve this manual.
Remember that there is another manual, the &user-manual;, which contains comprehensive information on how to install, use and configure &Lire;. It also contains reference information about all of &Lire;'s standard reports and supported services.
There are various public mailing lists for &Lire;'s users. There is a general users' discussion list where you can find help on how to install and use &Lire;. You can subscribe to this list by sending an empty email with a subject of subscribe to questions-request@lists.logreport.org. Email for the list should be sent to questions@lists.logreport.org. You can keep track of &Lire;'s new releases by subscribing to the announcement mailing list. You can subscribe by sending an empty email with a subject of subscribe to announcement-request@lists.logreport.org. Finally, if you're interested in &Lire;'s development, there is a development mailing list to which you can subscribe by sending an empty email with a subject of subscribe to development-request@lists.logreport.org. Email to the list should be sent to development@lists.logreport.org. All posts on these lists are archived on a public website.
&Lire; Architecture
Architecture Overview
From a developer's point of view, &Lire; intends to be the universal log analysis framework. To this end, it provides a reliable and complete framework upon which to build log analysis and reporting solutions. &Lire;, the tool, is a proof of the versatility and extensibility of the framework, as it is able to produce reports in a variety of output formats for many of the services that run in today's heterogeneous networks. As a framework, &Lire; is a good choice to replace the home-grown scripts developed to produce reports from the log files of little-known products or custom-developed programs that run on your system. Building on the &Lire; framework will make those scripts a lot more versatile without making them much more complicated to develop. It will be easier to add new reports or to support multiple report formats.
Log Processing in the &Lire; Framework
The &Lire; framework divides log analysis into four different processes. The figure shows those four processes:
Log Normalisation
The first process normalises logs from different products into a generic format that can be shared by all products that have similar functionality. For example, log files from products as different as &Apache; and &IIS; will be transformed into an identical format.
Log Analysis
In the analysis process, other information is created, inferred or extracted from the normalised data. For example, an analyser in the www superservice infers the browser used by the client from the referrer information.
Report Generation
The third process generates a report from the normalised and analysed data. This process is done by a generic report engine that computes the report based on specifications describing what information should appear in the report and how. The report is generated in a generic XML format.
Report Post-processing and Formatting
The last process converts the generic report into a specific format like ASCII, PDF or HTML. Other kinds of post-processing (like chart generation) can also be accomplished in this stage.
Before going into a more detailed description of each of these processes, we'll introduce some of the common design patterns that you'll find throughout the &Lire; framework.
Lire's Design Patterns
At the center of each of these processes is an XML-based file format. Having things specified in data files makes the framework easier to extend. For example, the reports are built using a generic report builder which finds the instructions on how to build the reports in XML files. This makes it easy to add new information to a report: you just have to write an XML file. The fact that there are a lot of tools to process XML files is also an interesting aspect. For example, Emacs lovers will appreciate the help that its psgml module gives them in writing report specifications.
Another important aspect is that we tried to interoperate with and to build upon other standards while defining our XML formats. The best illustration of this is that in all the XML file formats that &Lire; uses, a DocBook subset is used for all elements related to narrative descriptions.
Another common aspect you'll encounter is that each of these processes and XML file formats comes with an API to manipulate it, making it easy to add functionality at each processing stage. APIs are also a good thing because, even if in theory an open file format somewhat constitutes an API, having libraries that provide convenient access to the file formats makes it a lot easier to write new components providing new functionality.
Log File Normalisation
The Log Normalisation Process
The first process of the Lire log analysis framework is the log file normalisation process. That process is summarized in the figure. This process is centered around the DLF concept, which is a kind of universal log format. DLF stands for Distilled Log Format. The concept is that each product-specific log file is transformed into a log format that can be common to all the products providing similar functionality. In Lire's terminology, a class of applications providing similar functionality (e.g. MTAs handling email) is called a superservice. Still in Lire's terminology, the service (e.g. postfix or sendmail) refers to the native log format that is converted into the superservice's DLF.
One can view the DLF as a table where the rows are the logged events and the fields are the logged information related to each event. Since the information logged by an email server is totally different from that logged by a web server, each superservice should have its own data model. In Lire, the data model is called a DLF schema. The DLF schemas are defined in XML files using the DLF Schema Markup Language. The schema describes what fields are available for each logged event.
One interesting aspect of &Lire; is that although the email DLF is used by all email servers, the email DLF data model isn't restricted to the lowest common denominator across the log formats supported by each email server. In Lire's architecture, the superservice's schema can represent the information logged by the most sophisticated product. When some part of the information isn't available in one log format, the DLF log file won't contain this information and the reports that need this information won't be included.
This architecture means that to support a new service, i.e. a new log format, in Lire you just need to write a plugin, called a DLF converter. This is just a simple perl module that parses the native log format and maps the information according to the schema.
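The table view of a DLF can be sketched in a few lines of perl. This is only an illustration of the idea, not Lire code: the field names, the to_dlf_row function and the use of the plain string LR_NA as the missing-value marker are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical field list standing in for a superservice schema.
my @schema_fields = qw( time client_host requested_page user_agent );

# Project one parsed record onto the schema: every schema field becomes a
# column, missing or undefined values get the LR_NA marker, and keys that
# are not schema fields are dropped.
sub to_dlf_row {
    my ( $fields, $record ) = @_;
    return [ map { defined $record->{$_} ? $record->{$_} : 'LR_NA' } @$fields ];
}

# A record from a product that does not log the user agent.
my $row = to_dlf_row(
    \@schema_fields,
    {
        time           => 1006992092,
        client_host    => '10.0.0.1',
        requested_page => '/index.html',
        internal_id    => 42,            # not in the schema, so ignored
    }
);

print join( ' ', @$row ), "\n";
```

A product that logs less simply leaves more columns marked as missing; the schema itself does not have to shrink.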
Log Analysis
After normalisation comes the analysis process. The responsibility of the analysis process is to extract, infer or derive other information from the logged data. Since the superservice's logged data is in a standard format, the analysers are generic in the sense that they can operate on all the superservice's supported log formats, provided the product logs the information required by the analyser. The analysis process is shown in the figure.
The Log Analysis Process
Since each analyser can add information to a DLF or create a new one, each analyser generates data according to a special kind of schema. The &Lire; framework includes two kinds of analysers. The difference between the two resides in the mapping between the source data and the new data they generate. Extended analysers generate new data for each DLF record, whereas derived analysers are used when the new data doesn't have a one-to-one mapping with the source data.
The analysers produce data according to a data model which is specified in other DLF schemas. There are extended schemas and derived schemas. An extended schema simply adds new fields to the base superservice's schema. For example, in the web superservice's schema, a lot of information can be obtained from the referer field. From this information, it is possible to guess the user's browser, language or operating system. Those fields are specified in the www-referer extended schema; one analyser is responsible for extracting this information from the referer field.
Sometimes, however, the analysis cannot simply add information to each event record, and an altogether different schema is needed. For those cases, there is the derived schema. An example of the use of such a schema in the current Lire distribution is the analyser which creates user sessions based on the logged client IP address and user agent. This analyser defines the www-session derived schema.
Report Generation
Once you have all this data, it's time to generate some useful reports out of it. Lire's framework includes a generic report builder. What &Lire; calls a report is actually a collection of what one may understand as reports; &Lire; calls these subreports. For example, the proxy superservice's report will contain a subreport about the top visited sites, another subreport on the cache hit ratio, as well as several others.
The subreports are defined using the Report Specification Markup Language. This markup language contains elements for several things: information regarding the schema on which it operates; descriptions that should be included in the generated report to help in the interpretation of the data; parameters that can be used to modify the generated report (for example, to generate a top 20 subreport instead of a top 10); a filter that selects the records that will be used for the subreport; and finally the operations that make up the subreport: grouping, summing, counting, etc.
The report markup language covers most simple needs, and there is an extension element as well as an API that can be used to hook in fancier computations. There are no subreport specifications in the current distribution that make use of this feature yet, however. You can see an overview of this process in the figure.
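To make the ingredients of a subreport more concrete (a parameter, a filter and grouping/counting operations), here is a minimal perl sketch of a "top N" computation. It is not how the report builder is implemented, and it does not use the Report Specification Markup Language; the top_values function and the record field names are invented for the illustration.

```perl
use strict;
use warnings;

# Count records per value of $field, after an optional filter, and keep the
# $limit most frequent values (ties are broken alphabetically).
sub top_values {
    my ( $records, $field, $limit, $filter ) = @_;
    my %count;
    for my $rec (@$records) {
        next if $filter && !$filter->($rec);
        $count{ $rec->{$field} }++;
    }
    my @keys = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    $#keys = $limit - 1 if @keys > $limit;
    return map { [ $_, $count{$_} ] } @keys;
}

# Hypothetical proxy records; the field names are illustrative only.
my @records = (
    { site => 'example.org', cache_hit => 1 },
    { site => 'example.com', cache_hit => 0 },
    { site => 'example.org', cache_hit => 0 },
    { site => 'example.net', cache_hit => 1 },
    { site => 'example.org', cache_hit => 1 },
);

# "Top 2 sites", then the same subreport restricted to cache hits.
my @top      = top_values( \@records, 'site', 2 );
my @top_hits = top_values( \@records, 'site', 2, sub { $_[0]{cache_hit} } );

printf "%-12s %d\n", @$_ for @top;
```

Changing the limit from 2 to 20 plays the role of the parameter mentioned above, and the anonymous sub plays the role of the filter.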
Report Generation Process
The generated report is another XML file that uses another markup language, this time called the Lire Report Markup Language. An actual report contains the help descriptions from the report specifications, information on the subreport specifications used, as well as the actual subreport's data. Using another intermediary XML file as output format makes all sorts of things possible in the formatting and post-processing stage.
Report Formatting and Other Post-Processing
The last process works with the generic XML report. Using a domain-specific XML format for the generated report makes it easy for the framework to support multiple output formats. Supporting a new output format is just a matter of writing a new module that processes the XML report file.
Processing of the XML Report Using The APIs
As shown in the figure, you can also process the XML files using the APIs to the XML report format.
Going Further
As you can see from this overview, the &Lire; framework provides a powerful architecture to use for your log analysis needs. The architecture provides extensibility from log normalisation to post-processing of the reports. Exactly how to use the framework is the topic of the next part.
Using the &Lire; Framework
In this part, you'll learn how to leverage the &Lire; framework for your own log analysis needs. The most common use cases are developing a converter for a new log format and developing new reports. The first chapter explains how to write a converter for a new log format. The responsibility of the converter is to map the information contained in a log file to the data model of a specific DLF schema. When developing a converter for a log format which doesn't fall in the domain of one of the existing DLF schemas, you'll need to write a new schema. This is the topic of the following chapter. The next chapter gives information on how to write DLF analysers that can add data to the base log information. The last chapter of this part gives some notes on how to develop new reports.
Writing a New DLF Converter
Before &Lire; can analyse and generate reports on the data contained in your various log files, that data must first be converted to a common data model. This is specifically the job of the DLF converter. So if you want to generate the same reports for your RealServer log files (currently unsupported) as for your web server, you only need to develop a DLF converter which maps the RealServer content to the www DLF schema. If no existing DLF schema correctly represents the domain of your application's log file, it is easy to develop a new one. Consult the chapter on writing a DLF schema for the whole story.
This chapter will show you, through an example, how to develop a new DLF converter for a rather useless log format: the common log format encapsulated in syslog. (It is useless because there are not many reasons to make your web server log its requests through syslog, and it would probably be simpler to just use the cut command to remove the syslog header.) The doc/examples directory in the source distribution contains another commented example which could serve as a starting point for your converters.
Prerequisites
Developing a new DLF converter requires some basic programming skills in perl. Although not strictly necessary, you should be familiar with perl's object-oriented programming model. If you aren't, you should read perltoot(1) before continuing.
The <type>common_syslog</type> Log Format
The log format supported by our DLF converter is simply the standard Common Log Format supported by most web servers, with a syslog header prepended to each line. Remember that the outer layer is a syslog log file and could contain other things than only the web server's requests. For instance, a message about the server starting isn't a request record but what usually ends up in the error_log.
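To give a feel for the two layers of this format, the following perl sketch separates the syslog header from the message it carries. The sample lines are invented for the illustration (they are not taken from a real log), and split_syslog_line is a hypothetical helper; a real converter would rely on the syslog parsing code shipped with &Lire; instead.

```perl
use strict;
use warnings;

# Split a BSD-style syslog line into header fields and message content.
# Returns a hash reference, or undef when the line does not look like syslog.
sub split_syslog_line {
    my ($line) = @_;
    my ( $month, $day, $time, $host, $process, $pid, $content ) = $line =~ m{
        ^(\w{3}) \s+ (\d{1,2}) \s+ (\d\d:\d\d:\d\d) \s+   # timestamp
        (\S+) \s+                                         # hostname
        ([^\s:\[]+) (?:\[(\d+)\])? : \s*                  # process, optional pid
        (.*)$                                             # message
    }x or return undef;
    return {
        month   => $month,   day => $day,  time    => $time, hostname => $host,
        process => $process, pid => $pid,  content => $content,
    };
}

# Invented sample: a server start message, a request record, an unrelated line.
my @sample = (
    'Mar  3 06:30:02 www1 httpd[1234]: [notice] server started',
    'Mar  3 06:31:10 www1 httpd[1234]: 10.0.0.1 - - [03/Mar/2008:06:31:10 +0000] "GET /index.html HTTP/1.0" 200 1043',
    'Mar  3 06:31:11 www1 sshd[99]: session opened',
);

for my $line (@sample) {
    my $rec = split_syslog_line($line) or next;
    print "$rec->{process}: $rec->{content}\n";
}
```

Only the content of the second sample line is a Common Log Format record; the first would be treated as an error message and the third belongs to another program.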
Creating the DLF Converter Skeleton
Put simply, a DLF converter is a perl object which implements a set of predefined methods (an interface, in object-oriented jargon). Since a DLF converter is a perl object, it must be instantiated from a class. Classes in perl are defined in packages. We'll name the package which implements our converter MyConverters::SyslogCommonConverter. To create such a package, you need to create a file named MyConverters/SyslogCommonConverter.pm in a directory searched by perl. You can obtain perl's default search list by running the command $ perl -V. This search list can be modified by setting the PERL5LIB environment variable.
A first cut of our DLF converter needs only a few lines. The first line declares that the code is in the MyConverters::SyslogCommonConverter package. The second one specifies that objects in this package are subclasses of the Lire::DlfConverter package. The last line fulfills perl's requirement that packages return a true value once they are initialized.
This is a complete, although useless, DLF converter. In fact, it isn't really complete: if you tried to register an instance of that class, you would get unimplemented method errors. Besides, we don't even have a formal way to create an instance of our converter yet. This is our next task.
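The skeleton described above can be tried out with the following sketch. So that it runs without &Lire; installed, it defines a tiny stand-in for Lire::DlfConverter whose only job is to mimic the "unimplemented method" behaviour; that stand-in and its error message are assumptions, not the real base class. In a real MyConverters/SyslogCommonConverter.pm you would load the real base class (for example with use base) and end the file with 1;.

```perl
use strict;
use warnings;

# Stand-in for Lire::DlfConverter so the sketch runs without Lire installed.
# It only mimics the "unimplemented method" behaviour described above.
{
    package Lire::DlfConverter;
    sub name { die "name() is not implemented by " . ref( $_[0] ) . "\n" }
}

# The skeleton itself: a package declaration and a parent class.
{
    package MyConverters::SyslogCommonConverter;
    our @ISA = ('Lire::DlfConverter');
}

# There is no constructor yet, so bless an empty hash by hand to try it out.
my $converter = bless {}, 'MyConverters::SyslogCommonConverter';
my $ok = eval { $converter->name(); 1 };
print $ok ? "name() works\n" : "error: $@";
```

Running it shows why the skeleton is not usable yet: the object is a Lire::DlfConverter, but every interface method still has to be written.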
Adding a Constructor
The &Lire; framework doesn't place any restrictions on your DLF converter's constructor. In fact, the constructor isn't used by the framework at all; it will only be used by your DLF converter registration script. We will follow perl's convention of using a method named new for our constructor and of using a hash reference to hold our object's data. Here is our complete constructor:

    use Lire::Syslog;

    sub new {
        my $class = shift;
        my $self = bless {}, $class;
        $self->{syslog_parser} = new Lire::Syslog();
        return $self;
    }

Since our log format is based on syslog, we will reuse the syslog parsing code included in &Lire;. This is the reason we instantiate a Lire::Syslog object and save a reference to it in our constructor.
The Meta-Data Methods
The Lire::DlfConverter interface requires two kinds of methods. First, it requires methods which provide information about your converter to the framework. Second, it requires methods which actually implement the conversion process. It is the former that this section documents.
The DLF Converter Name
The name() method should return the name of our DLF converter. It is this name that is passed to the lr_log2report command. This name must be unique among all the registered converters, and it should be restricted to alphanumeric characters (hyphens, periods and underscores can also be used). We will name our converter common_syslog.
Providing Information To Users
The next two required methods are used to give more verbose information about your converter to the users. The converter's title() and description() can be used to display information about your converter in the user interface or to generate documentation. The title() method should simply return a string. The description() method should return a DocBook fragment describing your converter and the log formats it supports. If you don't know DocBook, just restrict yourself to using para elements to make paragraphs:

    sub description {
        return <<EOF;
    <para>This DLF Converter extracts web server's requests and error information from a syslog file. The requests and errors should be logged under the httpd program name. The errors are mapped to the syslog schema, the requests are mapped to the www schema. Syslog records from another program than httpd are ignored.</para>
    EOF
    }
Providing Information to the Framework
Two other meta-data methods are used by the framework itself. The first one, schemas(), specifies which DLF schemas your DLF converter converts to. In our case, we are converting to the syslog and www schemas. As stated in our converter's description, we will map the web server's error messages to the syslog schema and the request logs to the www schema. Other alternatives would have been to map only the request information to the www schema, or to map all the non-request records to the syslog schema. The rationale behind the current choice (besides this being an example) is that it makes it convenient to process one log file to obtain a report containing the requests and errors from our web server. For that use case, it is best to ignore the records that are not related to the web server.
The other method, handle_log_lines(), affects how the conversion process will be handled. &Lire; offers two modes of conversion: the line-oriented one and the file-oriented one. (Both will be described in the next section.) If your log file is line-oriented (each line is one log record), like most log files are, you should use the line-oriented conversion mode.
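Put together, the meta-data methods of our converter might look like the sketch below. The method names come from this chapter; the title wording, the list return value of schemas() and the empty stand-in base class are assumptions made so that the example runs on its own. Check the Lire::DlfConverter documentation for the exact contract of each method.

```perl
use strict;
use warnings;

{
    package Lire::DlfConverter;    # empty stand-in, not the real base class
}

{
    package MyConverters::SyslogCommonConverter;
    our @ISA = ('Lire::DlfConverter');

    sub new { return bless {}, shift }

    # Unique converter name, as passed to lr_log2report.
    sub name { return 'common_syslog' }

    # Short human-readable title (the wording is illustrative).
    sub title { return 'Common Log Format wrapped in syslog' }

    # Schemas this converter writes to (returning a list is an assumption).
    sub schemas { return qw( syslog www ) }

    # A true value selects the line-oriented conversion mode.
    sub handle_log_lines { return 1 }
}

my $converter = MyConverters::SyslogCommonConverter->new();

# The name is restricted to alphanumerics, hyphens, periods and underscores.
die "invalid converter name\n"
    unless $converter->name() =~ /\A[A-Za-z0-9._-]+\z/;

printf "%s -> %s (%s mode)\n", $converter->name(),
    join( ', ', $converter->schemas() ),
    $converter->handle_log_lines() ? 'line' : 'file';
```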
The Conversion Methods
The actual conversion process is handled through three methods: init_dlf_converter(), finish_conversion() and either process_log_file() or process_log_line(), depending on the conversion mode (as determined by the return value of handle_log_lines()).
Conversion Initialization
The init_dlf_converter() method will be called once before the log file is processed. It should be used to initialize the state of your converter. Since our DLF converter doesn't need any initialization or configuration, the method is simply empty. The $process parameter which is passed to all the processing methods is an instance of Lire::DlfConverterProcess. This is the object which drives the conversion process, and it defines several methods which you will use in the actual conversion process.
Conversion Finalization
The finish_conversion() method will be called once after the log file has been completely processed. This method is mostly of use to stateful converters, that is, DLF converters which generate DLF records from more than one line. Since this is not our case, we simply leave the method empty.
The DLF Conversion Process
Whether you are using the file-oriented or the line-oriented conversion mode, the principles are the same. You extract information from the log file and create DLF records from it. Your DLF converter communicates with the framework by calling methods on the Lire::DlfConverterProcess object which is passed as a parameter to your methods. Here is the complete code of our conversion method:

    sub process_log_line {
        my ( $self, $process, $line ) = @_;

        my $sys_rec = eval { $self->{syslog_parser}->parse( $line ) };
        if ( $@ ) {
            $process->error( $@, $line );
            return;
        } elsif ( $sys_rec->{process} ne 'httpd' ) {
            $process->ignore_log_line( $line, "not an httpd record" );
            return;
        } else {
            my $common_dlf = {};
            eval { parse_common( $sys_rec->{content}, $common_dlf ) };
            if ( $@ ) {
                $sys_rec->{message} = $sys_rec->{content};
                $process->write_dlf( "syslog", $sys_rec );
            } else {
                $process->write_dlf( "www", $common_dlf );
            }
        }
    }

The first thing that should be noted is that in the line-oriented conversion mode, the process_log_line() method will be called once for each line in the log file. Secondly, the actual parsing of the line is done using two functions: parse_common and Lire::Syslog's parse. These functions simply use regular expressions to extract the appropriate information from the line and put it in a hash reference. What is important is that they already use the schema's field names as key names. Finally, there are four different methods available on the $process object to report different kinds of information; the example uses three of them:
Reporting Error
The example uses the eval statement to trap errors during the syslog record parsing. If the line cannot be parsed as a valid syslog record, it is an error and it is reported through the error() method. The first parameter is the error message and the second one is the line with which the error is associated. This last parameter is optional.
Ignoring Information
When the syslog event doesn't come from the httpd process, we ignore the line.
Ignored lines are reported to the framework by using the ignore_log_line() method. The first parameter is the line which is ignored. The second, optional parameter gives the reason why the line was ignored.
Creating DLF Records
Finally, DLF records are created by using the write_dlf() method. Its first parameter is the schema with which the DLF record complies. This schema must be one that is listed by your converter's schemas() method. The second parameter is the DLF data contained in a hash reference. The DLF record will be created by taking, for each field in the schema, the value stored under the same name in the hash. (In the syslog schema, the field which contains the actual log message is called message; this is the reason we assign the content value to the message key.) Missing fields, or fields whose value is undef, will contain the special LR_NA missing value marker. Keys in the hash that don't map to a schema field are simply ignored. In our example, we distinguish between the server's error messages (mapped to the syslog schema) and the request information (mapped to the www schema) based on whether parse_common succeeded in parsing the line.
Saving Log Line
Another possibility, not shown in our example, is to ask that the line be saved for later processing. This is mostly of use to converters which maintain state between lines. In such cases, it often happens that related lines are missing from the end of the log file. You can then save the line, and it will automatically be seen by the next run of your converter on the same DLF store. This option is only available in the line-oriented mode of conversion.
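The control flow of the conversion method can be exercised without &Lire; by replacing the framework objects with stand-ins, as in the sketch below. Everything framework-like here is an assumption made for the illustration: MockProcess only records which reporting method was called, parse_syslog and parse_common are simplified hypothetical parsers, the DLF field names are invented, and the sample lines are made up.

```perl
use strict;
use warnings;

# Recording stand-in for Lire::DlfConverterProcess: it only logs the calls.
{
    package MockProcess;
    sub new             { return bless { calls => [] }, shift }
    sub error           { push @{ $_[0]{calls} }, 'error' }
    sub ignore_log_line { push @{ $_[0]{calls} }, 'ignored' }
    sub write_dlf       { push @{ $_[0]{calls} }, "dlf:$_[1]" }
}

# Hypothetical parsers; both die on lines they cannot parse.
sub parse_syslog {
    my ($line) = @_;
    $line =~ /^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+([^\s:\[]+)(?:\[\d+\])?:\s*(.*)$/
        or die "not a syslog line\n";
    return { process => $1, content => $2 };
}

sub parse_common {
    my ( $content, $dlf ) = @_;
    $content =~ /^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)$/
        or die "not a common log format record\n";
    # Illustrative field names, not necessarily those of the www schema.
    @$dlf{qw( client_host http_action requested_page http_result )} =
        ( $1, $2, $3, $4 );
    return $dlf;
}

# Same decision logic as the conversion method shown above.
sub process_log_line {
    my ( $process, $line ) = @_;
    my $sys_rec = eval { parse_syslog($line) };
    if ($@) { $process->error( $@, $line ); return; }
    if ( $sys_rec->{process} ne 'httpd' ) {
        $process->ignore_log_line( $line, 'not an httpd record' );
        return;
    }
    my $common_dlf = {};
    eval { parse_common( $sys_rec->{content}, $common_dlf ) };
    if ($@) {
        $sys_rec->{message} = $sys_rec->{content};
        $process->write_dlf( 'syslog', $sys_rec );
    }
    else {
        $process->write_dlf( 'www', $common_dlf );
    }
}

my $process = MockProcess->new();
process_log_line( $process, $_ ) for (
    'Mar  3 06:30:02 www1 httpd[1234]: [notice] server started',
    'Mar  3 06:31:10 www1 httpd[1234]: 10.0.0.1 - - [03/Mar/2008:06:31:10 +0000] "GET /index.html HTTP/1.0" 200 1043',
    'Mar  3 06:31:11 www1 sshd[99]: session opened',
    'garbage',
);
print join( ' ', @{ $process->{calls} } ), "\n";
```

Each sample line takes a different branch: an httpd message that is not a request goes to the syslog schema, a request goes to the www schema, a line from another program is ignored, and an unparsable line is reported as an error.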
File-Oriented Conversion
The same principles apply when you are using the file-oriented mode of conversion. This mode will usually be used for binary log formats or for formats which aren't line-oriented, like XML. For demonstration purposes, the following code could be added to transform our line-oriented converter into a file-oriented one:

    sub process_log_file {
        my ( $self, $process, $fh ) = @_;

        while ( defined( my $line = <$fh> ) ) {
            chomp $line;
            $self->process_log_line( $process, $line );
        }
    }

The difference between the above code and the line-oriented mode is that the framework won't be aware of the number of log lines processed, and your converter might have trouble processing log files which use a different line-ending convention than the host you are running on. The bottom line is that you should use the line-oriented conversion mode when your log format is line-oriented.
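The line-ending caveat is easy to reproduce. The sketch below reads from an in-memory file handle containing both CRLF and LF terminated records and strips either ending explicitly; read_log_lines is a hypothetical helper, not part of the &Lire; API.

```perl
use strict;
use warnings;

# Read records from a file handle, accepting both LF and CRLF line endings.
sub read_log_lines {
    my ( $fh, $callback ) = @_;
    my $count = 0;
    while ( defined( my $line = <$fh> ) ) {
        $line =~ s/\r?\n\z//;    # chomp alone would leave a trailing \r
        $callback->($line);
        $count++;
    }
    return $count;
}

# In-memory file handle with mixed line endings, standing in for a log file.
my $data = "first record\r\nsecond record\nthird record";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @lines;
my $count = read_log_lines( $fh, sub { push @lines, $_[0] } );
close $fh;

print "$count lines read\n";
```

A file-oriented converter that wants to report how many lines it handled has to keep such a counter itself, since the framework no longer sees individual lines.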
Registering Your DLF Converter with the &Lire; Framework
We first said that DLF converters are perl objects which implement the Lire::DlfConverter interface. What we did is write a class which implements that interface. Creating the object from that class is the responsibility of the DLF converter registration script. This is simply a snippet of perl code which instantiates your object and registers it with the Lire::PluginManager:

    use Lire::PluginManager;
    use MyConverters::SyslogCommonConverter;

    Lire::PluginManager->register_plugin( MyConverters::SyslogCommonConverter->new() );

That's all there is to it, really. You put this snippet in a file named syslog_common_init in one of the directories listed in the plugins_init_path configuration variable. Some other notes on this topic:
The file can actually be named anything you want; the name service_init just makes the purpose of the file clear.
The initial value of plugins_init_path contains the directories sysconfdir/lire/plugins and HOME/.lire/plugins. You can change this list by using the lire tool.
Your registration script can create and register more than one object.
You can now generate a www report for log files in that format using the command lr_log2report common_syslog < file.log.
DLF Converter API
The complete DLF Converter API documentation is included in POD format in the &Lire; distribution. It is usually formatted as man pages. You can always read it using the perldoc command. The documentation of the following packages should be consulted: Lire::DlfConverter(3), Lire::DlfConverterProcess(3) and Lire::PluginManager(3).
Writing a DLF Schema
If you want to develop a DLF converter for an application whose logging data model isn't adequately represented by one of the existing DLF schemas, you'll need to develop a new one. If you are familiar with SQL, a DLF schema is similar to a table schema description. A DLF file can be seen as a table, where each log record is represented by a table row. All log records in the same DLF schema share the same fields.
Designing the <type>ftpproto</type> schema
In this chapter, we will create a new schema for logging FTP sessions. That DLF schema could serve for an improved DLF converter for log files generated by &IIS;. &Lire; currently has a DLF converter for these log files, but the current ftp DLF schema is modelled after the xferlog log file, which only represents file transfers, whereas the log generated by &IIS; contains more detailed information on the FTP session. Here is an example of such a log file:

    #Software: Microsoft Internet Information Server 4.0
    #Version: 1.0
    #Date: 2001-11-29 00:01:32
    #Fields: time c-ip cs-method cs-uri-stem sc-status
    00:01:32 10.0.0.1 [56]created spacedat/091001092951LGW_Data.zip 226
    00:01:32 10.0.0.1 [56]created spacedat/html/bx01g01.gif 226
    00:01:32 10.0.0.1 [56]created spacedat/html/catlogo.gif 226
    00:01:32 10.0.0.1 [56]QUIT - 226
    00:03:32 10.0.0.1 [58]USER badm 331
    00:03:32 10.0.0.1 [58]PASS - 230

As you can see, this log file contains other information beyond the simple upload/download represented in the standard FTP schema. It contains a session identifier, the command executed, as well as the result code of the action. Our new schema should be able to represent these things.
Creating The Schema File
To create a DLF schema, you have to create an XML file named after your schema identifier: ftpproto.xml. The schema name should be made of alphanumeric characters. This schema identifier is case sensitive. Your schema identifier shouldn't contain hyphens (-) or underscore characters (_). (The hyphen is used for a special purpose.) All DLF schemas start and end the same way. The opening lire:dlf-schema tag of our schema carries these attributes:

    superservice="ftpproto" timestamp="time"

The first lines of the file contain the usual XML declaration and DOCTYPE declaration that you'll find in many XML documents. The real stuff starts at the lire:dlf-schema element. What is important for your schema are the values of the superservice and timestamp attributes. The first one contains your schema identifier. It is called superservice for historical reasons. The other one should contain the name of the field which orders the records by their event time. (See the reference part of this manual for more information.) The last line of the file closes the lire:dlf-schema element.
Adding the Schema's Description
The next things that go into the schema file are the schema's title and description. Both are intended for developers to read and should describe the scope of the schema:
    DLF Schema for FTP Protocol
    This DLF schema should be used for FTP servers that have detailed information on the FTP connection in their log files. Each record represents a command done by the client during the FTP session.
The content of the lire:description element is DocBook markup. If you don't know DocBook, you just need to know that paragraphs are delimited using para elements.
Defining the Schema's Fields
The only remaining things in the schema definition are the field specifications. Here is the definition of the first one: This field contains the timestamp at which the command was issued. As you can see, fields are defined using the lire:field element, which has three attributes:
name: This attribute contains the name of the field. The name should contain only alphanumeric characters and may also use the underscore character.
type: This attribute contains the type of the field. The available types are described shortly.
label: This should contain the column label that should be used by default in your report for data coming from this field. The label should be short but descriptive.
The field's description is held in the lire:description element, which contains DocBook markup. It should be detailed enough that someone implementing a DLF converter for this schema knows what goes where.
The Field Types
The main types available for fields are:
timestamp: This should be used for fields which contain a value indicating a particular point in time. All timestamp values are represented in the usual UNIX convention: the number of seconds since January 1st 1970. Each DLF schema must contain at least one field of this kind, and its name should be in the lire:dlf-schema's timestamp attribute.
hostname: This type should be used for fields which contain a hostname or IP address. It is important to mark such fields, because it will eventually be possible to resolve IP addresses to hostnames automatically.
bool: Type for boolean values.
number: Type for numeric values. You shouldn't use this type when the values are limited in number and are semantically related to an enumeration, like result codes. You should use the string type for these. You should only use the number type for values which you'll want to report in classes rather than as individual values.
bytes: This type should be used for numeric values which are quantities in bytes. The more specific typing is useful for display purposes.
duration: This type should be used for numeric values which are quantities of time. The more specific typing is useful for display purposes.
string: This is the type which can be used for all other purposes.
If you read the specifications, you'll find other types which are used. These additional types don't bring anything over the basic ones defined above and you shouldn't use them. In addition to the time field defined above, here are the remaining field definitions which make up our complete ftpproto schema:
    This field should contain an identifier that can be used to relate the commands done in the same FTP session. This identifier can be reused, but shouldn't be while the FTP session isn't closed.
    This field contains the FTP command executed. The FTP protocol command names (STOR, RETR, APPE, USER, etc.) should be used.
    This should contain the FTP result code after executing the command.
    This field should contain the parameters to the FTP command.
    When the command involves a transfer, as for the RETR or STOR commands, it should contain the number of bytes transferred.
    This field contains the number of seconds executing the command took.
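To make the field definitions more concrete, here is a hedged Perl sketch of how a converter might turn one line of the &IIS; log shown earlier into a record for this schema. It is an illustration only, not the converter API. Only the field name time comes from the schema text; the other keys (client_host, session_id, command, params, result) and the function name are hypothetical. The sketch also assumes that the log times are in UTC, and it leaves out the mapping of &IIS; verbs such as created onto FTP protocol command names.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Turn one IIS FTP log line into a hash shaped like an ftpproto record.
# Only the field name "time" comes from the schema text; the other keys
# are hypothetical. $date is the YYYY-MM-DD value of the "#Date:" header;
# the log is assumed to be in UTC.
sub parse_iis_ftp_line {
    my ( $date, $line ) = @_;

    my ( $clock, $ip, $method, $uri, $status ) = split ' ', $line;
    return unless defined $status;

    my ( $year, $mon, $mday ) = $date  =~ /\A(\d{4})-(\d\d)-(\d\d)\z/ or return;
    my ( $hour, $min, $sec )  = $clock =~ /\A(\d\d):(\d\d):(\d\d)\z/  or return;

    # The cs-method column carries "[session]command".
    my ( $session, $command ) = $method =~ /\A\[(\d+)\](\S+)\z/ or return;

    return {
        # timestamp type: seconds since January 1st 1970
        time        => timegm( $sec, $min, $hour, $mday, $mon - 1, $year ),
        client_host => $ip,
        session_id  => $session,
        command     => $command,
        params      => ( $uri eq '-' ? undef : $uri ),
        result      => $status,
    };
}

my $dlf = parse_iis_ftp_line( '2001-11-29',
    '00:01:32 10.0.0.1 [56]created spacedat/html/catlogo.gif 226' );
print "$dlf->{time} $dlf->{session_id} $dlf->{command} $dlf->{result}\n";
```

Note that the result code is kept as a string rather than a number, following the advice above about values that belong to an enumeration.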
Installing The Schema
Making the new schema available to the &Lire; framework is easy: just copy the file to one of the directories set in the lr_schemas_path configuration variable. By default, this variable contains the directories datadir/lire/schemas and HOME/.lire/schemas. Like all other configuration variables, its value can be changed using the lire tool. Since we want our schema to be available to other users as well, we will install it in the system directory: (In this case, &Lire; was installed under /usr/local.)
Writing a New DLF Analyser
In &Lire;, a DLF analyser is a plugin that can extract or derive data from other DLF data. The idea is that such an analysis does not depend on the underlying log format and can be carried out using only the data normalised in the DLF schema. For example, an analyser could assign a category based on the URL that was visited (like assigning the 'Public' or 'Private' category). This categorising operation doesn't depend on the log format but only on the presence of the requested_page field in the schema. This would be an example of a special kind of analyser, a Lire DLF Categoriser. This is a simpler analyser that can create new fields based on one DLF record. The doc/examples directory in the source distribution contains the complete code for this categoriser. There is a more generic kind of analyser that creates data in another DLF stream based on arbitrary queries on the source DLF schema. An example of this kind is an analyser that constructs session summaries from the www requests. It reads the DLF records of the www DLF schema and creates www-user_session DLF records from them. Writing an analyser is similar to writing a DLF converter, so consult for the details concerning registration and using configuration.
Writing a Categoriser
Categorisers are the simplest form of analyser. In this section, we will show how to write a categoriser that uses regular expressions to assign a category to each requested www page.
Defining The Extended Schema
A categoriser writes DLF in an extended schema. An extended schema is an extension of a base schema. If you are familiar with SQL, you can see it as an inner join with the main schema. That is, each record in the main schema will have the extension fields of the extended schema. In our case the extended schema is very simple: it only adds one category field to the www schema. Defining an extended schema is identical to writing a DLF schema, with the exception that we use a different top-level element. You should consult for all the details. Here is the extended schema that our categoriser will use:
    Category Extended Schema for WWW service
    This is an extended schema for the WWW service which adds a category field based on the regexp matched by the requested_page.
    This field contains the page category.
The difference from a regular DLF schema is that it starts with the extended-schema tag, which has a base-schema attribute that should contain the DLF schema or derived DLF schema that is extended.
Defining the Categoriser
Like a DLF converter, the categoriser is an object deriving from a base class which defines the categoriser interface. In the categoriser case, that interface is Lire::DlfCategoriser. The categoriser also has to provide some meta-information to the framework. Here is the code for all of this:
    A categoriser that assigns categories based on a map of regular expressions to categories."; }
    sub src_schema { return "www"; }
    sub dst_schema { return "www-category"; }
The methods that differ from the DLF converter case are src_schema, which specifies the schema to which fields are added, and dst_schema, which gives the schema specifying the fields that will be added.
Categoriser Configuration
Our categoriser will assign categories based on a mapping from regular expressions to category names. To be useful, this mapping should be configurable. Like all plugins in &Lire;, DLF categorisers can use the Lire Configuration Specification Markup Language to define the configuration data they use (see for the full details). The convention is that if there is a parameter named yourname_properties, it is considered the configuration specification for the plugin yourname. This means that a little button will appear in the lire user interface so that the user can configure your plugin's data. In our categoriser case, we will define a list of records which enables the user to define many pairs of regular expression and category name:
    Page Categoriser Configuration
    This is a list of regexps that will be applied in order, along with the category that should be assigned when the regexp matches.
    The Regexp-Category Association
    Regex The regular expression to test.
    Category This field contains the category that should be assigned.
    p .* Unknown
This specification also sets a list containing one catch-all regex with the category 'Unknown'. The user could add other values before that. An alternative implementation could define a field specifying the default category to assign when no regular expression matches.
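The alternative mentioned at the end of this section, an explicit default category instead of a trailing catch-all entry, could look roughly like the following Perl sketch. It works on plain Perl data rather than on real &Lire; configuration objects, and the names compile_rules and category_for are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the configuration described above: an ordered
# list of [ regex, category ] pairs plus an explicit default category,
# instead of a trailing catch-all '.*' entry.
sub compile_rules {
    my ($pairs) = @_;
    return [ map { [ qr/$_->[0]/, $_->[1] ] } @{$pairs} ];
}

# Return the category of the first matching rule, or the default.
sub category_for {
    my ( $rules, $default, $page ) = @_;
    foreach my $rule ( @{$rules} ) {
        return $rule->[1] if $page =~ $rule->[0];
    }
    return $default;
}

my $rules = compile_rules( [
    [ '^/private/', 'Private' ],
    [ '^/',         'Public' ],
] );

print category_for( $rules, 'Unknown', '/private/report.html' ), "\n";
print category_for( $rules, 'Unknown', '/index.html' ), "\n";
print category_for( $rules, 'Unknown', 'ftp://odd' ), "\n";
```

Because the first matching rule wins, the order of the list matters: the more specific ^/private/ pattern has to come before the general ^/ pattern.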
Categoriser Implementation
Two methods are needed to implement the categoriser. The first is an initialisation method called initialise. This method receives as its parameter the configuration data entered by the user. In our case, we will compile the regular expressions for faster processing later on:
    sub initialise {
        my ( $self, $config ) = @_;
        foreach my $map ( @{$config} ) {
            $map->[0] = qr/$map->[0]/;
        }
        $self->{'categories'} = $config;
        return;
    }
The categorising is done in the categorise method. This method receives as its parameter the DLF record to which the extended fields should be added. This DLF record is a hash reference containing one key for each of the fields defined in the source DLF schema. We simply assign the extended fields by adding new keys to the hash reference:
    sub categorise {
        my ( $self, $dlf ) = @_;
        foreach my $map ( @{$self->{'categories'}} ) {
            if ( $dlf->{'requested_page'} =~ /$map->[0]/ ) {
                $dlf->{'category'} = $map->[1];
                return;
            }
        }
        return;
    }
That's all. As with the DLF converter, you'll need to register this analyser with the Lire::PluginManager (see for more information).
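To experiment with this logic outside &Lire;, the two methods can be wrapped in a small standalone package. The following is a sketch under stated assumptions: the package name Sketch::PageCategoriser is hypothetical, the class deliberately does not derive from Lire::DlfCategoriser so that it runs without &Lire; installed, and it copies the configuration instead of compiling the regular expressions in place.

```perl
use strict;
use warnings;

# Standalone stand-in for the categoriser: it mirrors the two methods
# described above but does NOT derive from Lire::DlfCategoriser.
# The package name is hypothetical.
package Sketch::PageCategoriser;

sub new { return bless {}, shift }

# $config is an array reference of [ regex, category ] pairs.
sub initialise {
    my ( $self, $config ) = @_;
    $self->{'categories'} = [ map { [ qr/$_->[0]/, $_->[1] ] } @{$config} ];
    return;
}

# Adds the 'category' key to the DLF record (a hash reference) in place.
sub categorise {
    my ( $self, $dlf ) = @_;
    foreach my $map ( @{ $self->{'categories'} } ) {
        if ( $dlf->{'requested_page'} =~ $map->[0] ) {
            $dlf->{'category'} = $map->[1];
            return;
        }
    }
    return;
}

package main;

my $categoriser = Sketch::PageCategoriser->new();
$categoriser->initialise( [ [ '^/docs/', 'Documentation' ], [ '.*', 'Unknown' ] ] );

my @records = ( { requested_page => '/docs/index.html' },
                { requested_page => '/cgi-bin/search' } );
$categoriser->categorise($_) foreach @records;
print "$_->{'requested_page'} => $_->{'category'}\n" foreach @records;
```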
Writing an Analyser
When a categoriser isn't sufficient for your needs, you can write a Lire::DlfAnalyser, which gets complete control over the analysis process. The main difference from a categoriser is that the dst_schema method will refer to a derived schema instead of an extended schema. The core of the analyser is the analyse method, which receives a reference to the store whose data will be analysed and a Lire::DlfAnalyserProcess callback object, which should be used to write new DLF records and report errors. The method also receives the plugin configuration data. The analyser should create a Lire::DlfQuery to select the records necessary for its analysis. The doc/examples directory in the source distribution contains boilerplate for writing an analyser.
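The kind of work an analyser does can be illustrated without the &Lire; API. The Perl sketch below derives session summaries from request records, in the spirit of the www-user_session example mentioned earlier. It is only an illustration of the derivation step: the function name, the record keys (time, client_host), the summary keys and the idle timeout are all assumptions, and a real analyser would obtain its records through a Lire::DlfQuery and write its results through the Lire::DlfAnalyserProcess object.

```perl
use strict;
use warnings;

# Group request records into per-client sessions: a new session starts
# when a client has been idle for more than $timeout seconds. The record
# keys (time, client_host) and the summary keys are hypothetical.
sub summarise_sessions {
    my ( $records, $timeout ) = @_;
    my ( %open, @sessions );

    foreach my $rec ( sort { $a->{time} <=> $b->{time} } @{$records} ) {
        my $host    = $rec->{client_host};
        my $session = $open{$host};
        if ( !$session || $rec->{time} - $session->{end} > $timeout ) {
            $session = { client_host => $host, start => $rec->{time},
                         end => $rec->{time}, requests => 0 };
            $open{$host} = $session;
            push @sessions, $session;
        }
        $session->{end} = $rec->{time};
        $session->{requests}++;
    }
    return \@sessions;
}

my $sessions = summarise_sessions( [
    { time => 1000, client_host => '10.0.0.1' },
    { time => 1060, client_host => '10.0.0.1' },
    { time => 1100, client_host => '10.0.0.2' },
    { time => 5000, client_host => '10.0.0.1' },
], 1800 );

printf "%s %d-%d (%d requests)\n", @{$_}{qw(client_host start end requests)}
    foreach @{$sessions};
```

Unlike a categoriser, which adds fields to each source record, this produces a different number of records than it reads, which is why such an analyser needs a derived schema rather than an extended one.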
DLF Analyser API
The complete DLF Analyser API documentation is included in POD format in the &Lire; distribution. It is usually formatted as man pages. You can always read it using the perldoc command. The documentation of the following packages should be consulted: Lire::DlfAnalyser(3), Lire::DlfAnalyserProcess(3), Lire::DlfCategoriser(3), Lire::DlfQuery(3) and Lire::PluginManager(3).
Writing a New Report
Writing a new report involves writing a report specification, e.g. /service/<superservice>/reports/top-foo-by-bar.xml, and adding this report, along with any configuration parameters, to <service>.cfg. For example, to create a new report based upon email/from-domain.xml: Copy the file /usr/local/etc/lire/email.cfg to ~/.lire/etc/email.cfg. Copy the file /usr/local/share/lire/reports/email/top-from-domain.xml to e.g. ~/.lire/reports/reports/email/from-domain.xml. Edit the last file to suit your needs, and enable it by listing it in your ~/.lire/etc/email.cfg. Beware! The name of a report generally consists of alphanumerics and '-', but parameter names may not contain any '-' characters; they generally consist of alphanumerics and '_' characters.
Filter Specification
For now, you'll have to refer to the example filters found in the current report specification files. We'll give one other example here: specifying a time range. Suppose you want to be able to report on only a specific time range. You could build a (possibly global and reused) filter like: When trying out your new filter, you could install it in ~/.lire/filters/your-filter-name.xml. When lr_dlf2xml looks up a filter which was mentioned in the report configuration file, it looks first in ~/.lire/filters/, and then in .../share/lire/filters/.
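The lookup order described above (per-user directory first, system directory second) amounts to a first-match search over an ordered list of directories. Here is a minimal Perl sketch of that idea. It is not the code lr_dlf2xml uses: the function name find_filter_file is hypothetical, and two temporary directories stand in for ~/.lire/filters/ and .../share/lire/filters/.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return the first "<name>.xml" found in an ordered list of directories,
# or nothing. The function name is hypothetical; only the search order
# (per-user directory first, system directory second) comes from the text.
sub find_filter_file {
    my ( $name, @dirs ) = @_;
    foreach my $dir (@dirs) {
        my $path = File::Spec->catfile( $dir, "$name.xml" );
        return $path if -f $path;
    }
    return;
}

# Demonstration with two temporary directories standing in for
# ~/.lire/filters/ and .../share/lire/filters/.
my $user_dir   = tempdir( CLEANUP => 1 );
my $system_dir = tempdir( CLEANUP => 1 );
foreach my $dir ( $user_dir, $system_dir ) {
    my $file = File::Spec->catfile( $dir, 'your-filter-name.xml' );
    open my $fh, '>', $file or die "cannot write $file: $!";
    close $fh;
}

my $found = find_filter_file( 'your-filter-name', $user_dir, $system_dir );
print "using $found\n";
```

A consequence of this order is that a filter installed in your own directory shadows a system filter of the same name, which is what makes it convenient for testing.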
Developer's Reference
Lire Data Types
&lire-types-doc;
Common Textual Elements to All XML Formats
&lire-desc-doc;
The Lire Report Configuration Specification Markup Language
&lrcsml-doc;
The Lire Report Configuration Markup Language
&lrcml-doc;
The Lire DLF Schema Markup Language
&ldsml-doc;
The Lire Report Specification Markup Language
&lrsml-doc;
The Lire Report Markup Language
&lrml-doc;
&Lire; Developers' Conventions
Contributing Code to &Lire;
The LogReport team invites you to contribute code to &Lire;. We're very happy with any code contributions which work for you: they will very likely make life easier for other people too! We ask you to consider some points when writing code to be distributed with &Lire;. When adding new scripts, or extending and improving current &Lire; code, make sure you're working with the current &Lire; code. (When working with old code, the bug you're working on might already have been fixed by somebody else.) You can get the current code by fetching our CVS from SourceForge, using the anonymously accessible pserver:
    cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/logreport login
When prompted for a password for anonymous, simply press the Enter key.
    cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/logreport co service
See also the instructions on the SourceForge website. Alternatively, you can peek at the Lire CVS using your web browser. When you'd like to change e.g. /usr/local/bin/lr_log2report, you'll have to hack on cvs/sourceforge/logreport/service/all/script/lr_log2report.in. This file is converted to lr_log2report by running ./configure. Of course, when adding or extending scripts, be sure to update the scripts' manpages too. If you'd like the LogReport team to distribute your contribution, be sure to offer it to the team under a suitable software license. Refer to the Licensing section in the &faq; for details.
Once you've tested your script, you can send it to the LogReport development list at development@lists.logreport.org. The LogReport team will be happy to ship your contribution with the next Lire release. Developers' Toolbox
Required Tools To Build From CVS
In order to build the program from the CVS tree and make a tarball distribution, the following tools are needed: DocBook XML 4.1.2, DocBook DSSSL stylesheets, autotools, &Jade; or &OpenJade;, lynx, GNU make, Perl's XML::Parser module, dia, epsffit, epstopdf, xsltproc and xmllint. For Debian woody the packages are: docbook-utils, docbook-xml-stylesheets, autoconf, automake1.4, autotools-dev, jade, lynx, make and libxml-parser-perl. You need automake version 1.4. Building with automake 1.7 will very likely not work.
Accessing &Lire;'s CVS Make sure you've got an account on SourceForge. Get yourself added to the logreport project. (Joost van Baal joostvb@logreport.org can do this for you.) Make sure your ssh public key is on the sourceforge server. A full backup of the complete LogReport CVS as hosted on SourceForge is made weekly and written to hibou:/data/backup/cvs/.
CVS primer
If you have a Unix-like system, make sure you have this in your shell environment:
    CVSROOT=:ext:cvs.sourceforge.net:/cvsroot/logreport
    CVS_RSH=ssh
Of course, you could do something like
    &user-prompt;eval `ssh-agent`
    &user-prompt;ssh-add
to get a nice ssh-agent running. Now do something like
    &user-prompt;cd ~/cvs-sourceforge/logreport
    &user-prompt;cvs co service
There are also repositories called 'docs' and 'package'. The former holds the web pages and the latter the package files for &Debian; and other distributions. Files can then be edited and committed:
    &user-prompt;vi somefile
    &user-prompt;cvs commit somefile
and get flamed ;) Subscribe yourself to the commit list (commit-request@lists.logreport.org) to get all commit messages, along with unified diffs.
SourceForge
Mailing Lists
Coding Standards Indentation should be four spaces. No tabs please. See also Message-Id: <1028238571.1085.185.camel@Arendt.Contre.COM> on the development mailing list for some rationale on coding standards.
Shell Coding Standards
Shell scripts should run with -e. Shell scripts should be portable. Refer to http://doc.mdcc.cx/doc/autobook/html/autobook_208.html .
Perl Coding Standards
Perl scripts should use strict and run with -w. Documentation should come in .pod format; documentation about script internals should be in Perl comments. No & in function calls unless necessary. Split long lines using a hard return; try to respect the 72nd column margin (this is kind of a soft limit). Refer to the Lire::Program manpage for more details.
Making Lire <quote>Test-infected</quote>
Soon after the release of &Lire; 1.2.1, unit tests were introduced in the source tree. Unit tests help development in several ways, the most important one being that you can make changes to code and run the unit tests to make sure that nothing was broken by those changes. You can find helpful resources on unit testing on the PerlUnit home page as well as on the home page of JUnit, which inspired it.
Unit Tests in Lire
PerlUnit
Unit tests are written using the PerlUnit framework. You need to install version 0.24 or later of Test::Unit to run the unit tests.
Writing Tests
General information on using the PerlUnit framework can be found in the Test::Unit man page. Information on writing individual test cases can be found in the Test::Unit::TestCase man page. Tests for individual modules should be defined in a tests::moduleTest package. You can omit the Lire:: prefix and you can inline intermediary package names. For example, the unit tests of the Lire::ExtendedDlfSchema module are in the tests::ExtendedDlfSchemaTest package and the tests of the Lire::Timegroup module are in the tests::TimegroupTest package. The Lire::Tests namespace is reserved for extensions to the PerlUnit framework that will be used to provide fixtures and assertions of general use for common Lire extensions. This section will be expanded as common patterns for writing unit tests for DLF converters, analysers and other common Lire extensions are developed.
Running Tests
To run tests, you use the TestRunner.pl script included with the PerlUnit distribution. You'll need to add the directory containing the Lire libraries to Perl's library path. For example, if you have TestRunner.pl in your ~/bin directory, you can run a test case from the top level source directory like this:
    $ perl -Iall/lib ~/bin/TestRunner.pl tests::ExtendedDlfSchemaTest
tests::ExtendedDlfSchemaTest can be replaced by your TestCase module.
Some <quote>Best Practices</quote> on Unit Testing
This section lists some tips on how to make effective use of unit tests in common development situations on Lire.
Changing interface/implementation: Before changing a module's interface or implementation, make sure that the module has test cases and that it passes them. This way you will know that your changes didn't break anything.
Debugging: A good opportunity for writing tests is when bugs are reported. Before trying to chase the bug using the debugger or by adding print statements, write a test case that will fail as long as the bug isn't fixed. This achieves two purposes: first, you'll know the bug is fixed as soon as the test passes; secondly, we now have a test case that will warn us if we regress and the bug reappears.
Commit Policy Make sure your changes run on your own platform before committing. Try not to break things for other platforms though. Currently, &Lire; supported platforms are GNU/Linux (&Debian;, &RH;, &Mandrake;), &FreeBSD;, &OpenBSD; and &Solaris;. Documentation should be updated ASAP, in case it's obsolete or incomplete by new commits.
CVS Branches When doing major architectural changes to &Lire;, branches in CVS are created to make it possible to continue to fix bugs and to add small enhancements to the stable version while development continues on the unstable version. This applies mainly to the service repository. The doc and package repositories generally don't need branching. BTW: A nice CVS tutorial is available in the Debian cvsbook package.
Hands-on example
A branching gets announced. Be sure to have all your pending changes committed before the branching occurs. After a branch has been made, one can do this:
    &user-prompt;cd ~/cvs-sourceforge/logreport
    &user-prompt;mv service service-HEAD
    &user-prompt;cvs co -r lire-20010924 service
    &user-prompt;mv service service-lire-20010924
or (with the same result)
    &user-prompt;mv service service-HEAD
    &user-prompt;cvs co -r lire-20010924 -d service-lire-20010924 service
Now, when working on stuff which should be shipped in the coming release, one should work in service-lire-20010924. When working on stuff which is rather fancy and experimental, and which needs a lot of work to get stabilized, one should work in service-HEAD.
Naming, what it looks like
Here is what branches schematically look like:
    release-20010629_1 ---> lire-unstable-20010703 ---> HEAD
             \
              \
               lire-20010630 ---> lire-stable-20010701
In this diagram a branch named lire-20010630 was created from the release-20010629_1 tag. lire-unstable-20010703 is another tag on the trunk (the trunk is the main branch). HEAD isn't a real tag; it always points to the latest version on the trunk.
Creating a Branch
To create a branch, one runs the command cvs rtag -b -r release-tag branch-name module. Note that this command doesn't need a checked-out copy of the repository. For example, to create the release-20010629_1-bugfixes branch in the service repository, e.g. to backport bugfixes to version 20010629_1, one would use cvs rtag -b -r release-20010629_1 release-20010629_1-bugfixes service. When ready for release, this could get tagged as release-20010629_2. The release-tag should exist before creating the branch. In case you want to branch from HEAD, use -r HEAD, e.g. cvs rtag -b -r HEAD release_1_1-branch service. Once Lire 1.1 gets released, tag it as release_1_1.
Accessing a Branch To start working on a particular branch, you do cvs update -r branch-name. For example, to work on the release_1_1-branch branch, you do in your checked out version, cvs update -r release_1_1-branch. This will update your copy to the version release_1_1-branch and will commit all future changes on that branch. Alternatively, you can also specify a branch when checking out a module using cvs co -r branch-name module. For example, you could checkout the stable version of &Lire; by using cvs co -r release_1_1-branch service. To see if you are working on a particular branch, you can use the cvs status file command. For example, running cvs status NEWS could show: The branch is indicated by the Sticky Tag: keyword. If its value is (none) you are working on the HEAD branch. To work on the HEAD, you remove the sticky tag by using the command cvs update -A.
Merging Branches on the Trunk You can bring bug fixes and small enhancements that were made on a branch into the unstable version on the trunk by doing a merge. You do a merge by using the command cvs update -j branch-to-merge in your working directory of the trunk. Conflicts are resolved in the usual CVS way. For example, to merge the changes of the stable branch in the development branch, you would use cvs update -j lire-stable. You should tag the branch after each successful merge so that future changes can be easily merged. For example, after merging, you do in a checked out copy of the lire-stable branch: cvs tag lire-stable-merged-20010715. In this way, one week later we can merge the week's changes of the stable branch into the unstable branch by doing cvs update -j lire-stable-merged-20010715 -j lire-stable.
Testing and debugging
Test before releasing One week before release the software should be tested on all supported platforms. In between releases the system gets tested on various platforms on an ad hoc basis. When testing, use the to-be-released tarball. Run make distcheck to generate such a tarball. Especially when changes to the Lire core have been made, the "test" superservice can be handy, for easy setting up of tests of your code. See also the section on Unit Testing in this document.
Test-installations and test-runs
We give some hints on various ways to debug the &Lire; code. One can make a test-install by extracting a tarball and running e.g.
    &user-prompt; ./configure --prefix=$HOME/local && make && make install
    &user-prompt;PATH=$HOME/local/bin:$PATH; export PATH
    &user-prompt;MANPATH=$HOME/local/share/man; export MANPATH
One can do a test-run by executing:
    &user-prompt;echo 'some bug-triggering log line' | lr_log2report -o xml <converter> > /tmp/report.xml
    &user-prompt;lr_xml2report -o txt /tmp/report.xml > /tmp/report.txt
    &user-prompt;$HOME/local/libexec/lire/convertors/combined2dlf < /tmp/combined.log > /tmp/dlf
Using the Perl debugger on Lire code
Please use the Perl debugger: investing some time to learn it pays back really quickly. Here's a very tiny howto. Start the debugger as e.g.
    perl -d `which lr_log2report` -o xml combined < tmp/log > /dev/null
After starting the debugger, run "v" and "c lineno" to make sure all modules are loaded. Once that's done, you can fast-forward to a relevant routine using e.g. "c Lire::DlfAnalysers::ReferrerCategoriser::categorise". Now you can inspect variables and evaluate expressions by running e.g.
    DB<12> x $parsed_url->{'query'}
Also, be sure to try the commands "s" and "r". Just these 4 commands are very likely enough to get your job done. (The "y" command might be useful too, though.) See perldebug(1) and perldebtut(1) for more information.
Making a Release
Before making an official &Lire; release, it should have been tested on all supported platforms. A release shouldn't be made unless &Lire; builds, installs and generates an ASCII report from all supported log files on all supported platforms. If this is not the case, the release should be delayed until this is fixed. Making a new release of &Lire; involves many steps:
Writing the final version number in NEWS.
Tagging the CVS tree.
Building the "Standard" &Lire; tarball.
Building the &Debian; package.
Building the RPM package.
Making sure the FreeBSD package gets updated.
Uploading the tarballs and making packages available.
Advertising the release.
Setting version in NEWS file, checking ChangeLog
In between releases, the NEWS file generally reads "version in cvs". This should of course be changed to e.g. "version 20011205". We maintain a ChangeLog file. Make sure the ChangeLog in the toplevel directory is not too big. If needed, split off a chunk and move it to doc/. The ChangeLog is autogenerated from the CVS commits, using the cvs2cl tool. One could e.g. run cvs2cl --prune --stdout -l "-d \>yesterday" -U ../CVSROOT/users. Beware! It might take SourceForge about a day to make the cvs log available. So you might have to wait a day between your last commit and running cvs2cl.
Tagging the CVS Run e.g. cvs tag release-2_0_2_99_1.
Building The Tarball
Start from a fresh copy by running the command make maintainer-clean-recursive in the directory where you checked out &Lire;'s source code. Make sure that there are no tarballs in the extras subdirectory.
Set the version and prepare the source tree by running the command ./bootstrap. (You can overwrite the pre-cooked version by doing e.g. echo `date +%Y%m%d`-R-f-jvb-1 > VERSION . Make sure your version hasn't got too many characters. Non-GNU tar chokes if pathnames in the archive are too long.)
Generate the Makefiles by running ./configure.
Build &Lire; and create the tarball by running the command make distcheck. This will build a tarball lire-version.tar.gz and then make sure that the content of this tarball can be built and installed. If that command fails, &Lire; isn't ready to be released. Fix the errors before making the release.
Sign &Lire;'s tarball with your private key. To do this with &GnuPG;, run gpg --detach-sign --armor lire-version.tar.gz. A file lire-version.tar.gz.asc will be created. Publish this file together with the tarball. Now, people downloading the tarball can verify its integrity by downloading the .asc as well as your public key, and running gpg --verify lire-version.tar.gz.asc .
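The warning about long pathnames can be checked before building. The Perl sketch below lists archive member names that might be too long once the lire-version/ prefix is added. It is a hypothetical helper, not part of the &Lire; build system. The limit of 99 characters is a conservative assumption based on the 100-byte name field of the classic tar header, not a figure from this manual.

```perl
use strict;
use warnings;

# Report archive member names that may be too long for old tar
# implementations. $limit defaults to 99 characters, a conservative
# assumption based on the 100-byte name field of the classic tar header.
sub overlong_members {
    my ( $version, $paths, $limit ) = @_;
    $limit = 99 unless defined $limit;
    return grep { length($_) > $limit }
           map  { "lire-$version/$_" } @{$paths};
}

my @paths = (
    'NEWS',
    'all/lib/Lire/' . ( 'x' x 90 ) . '.pm',    # artificially long name
);
print "$_\n" foreach overlong_members( '20011205', \@paths );
```

In practice the list of paths could come from the files that make distcheck would package; a longer version string lengthens every member name by the same amount.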
Building The Debian Package
This is a raw dump of what we did to build and upload the &Lire; .deb.
    &user-prompt;cd ~/cvs-sourceforge/logreport/package/debian
    &user-prompt;vi changelog
    :r !date --rfc
    &user-prompt;cd /usr/local/src/debian/lire/debian/20010219
Run something like 'DIB_V=20020214 DIB_P=lire DIB_TARDIR=../archive/ ./debian-install-build'. This does:
    &user-prompt;cd /usr/local/src/debian/lire/debian/20010219
    &user-prompt;cp \
        ~/cvs-sourceforge/logreport/service/lire-20010219.tar.gz .
    &user-prompt;tar zxf lire-20010219.tar.gz
    &user-prompt;cd lire/20010418
    &user-prompt;mv lire-20010418 lire-20010418.orig
    &user-prompt;tar zxf lire-20010418.tar.gz
    &user-prompt;cd lire-20010418
    &user-prompt;mkdir debian
    &user-prompt;cp \
        ~/cvs-sourceforge/logreport/package/debian/[^C]* debian/
Export the shell environment variable EMAIL; it should hold your email address as it is to appear in the maintainer field of the package. (One could use 'dh_make --copyright gpl -s' when debianizing for the first time.) Build the .deb by running:
    &user-prompt;debuild 2>&1 | tee /tmp/build
Check the .deb:
    &user-prompt;debc | less
You might also want to test whether the Debianized sources build fine on other machines: copy diff.gz, orig.tar.gz and .dsc.
Then do
    &user-prompt;dpkg-source -x lire_*.dsc
    &user-prompt;cd lire-version
    &user-prompt;dpkg-buildpackage -rfakeroot
After having really tested it (dpkg -i, purge, etc.), optionally install it on any local apt-able websites you might have (Joost has one on http://mdcc.cx/debian/) and upload it to hibou's apt-able archive:
    &user-prompt;scp lire_20010418-1_all.deb \
        hibou.logreport.org:/var/www/logreport.org/pub/debian/dists/local/contrib/binary-all/admin/
    &user-prompt;scp lire_20010418*.gz \
        hibou.logreport.org:/var/www/logreport.org/pub/debian/dists/local/contrib/source/admin/
    &user-prompt;scp lire_20010418*.*s* \
        hibou.logreport.org:/var/www/logreport.org/pub/debian/dists/local/contrib/source/admin/
Move the old debian stuff on hibou to hibou:/pub/archive/debian/ . Update the Packages file by running
    &user-prompt;cd /var/www/logreport.org/pub/debian
    &user-prompt;make
To upload it to the official debian mirrors:
    vanbaal@gelfand:/usr...src/debian/lire/20010418% date; \
        dupload lire_20010418-1_i386.changes
    Thu Apr 19 14:27:38 CEST 2001
    Uploading (ftp) to ftp.uk.debian.org:debian/UploadQueue/
    [ job lire_20010418-1_i386 from lire_20010418-1_i386.changes
     New dpkg-dev, announcement will NOT be sent
     lire_20010418.orig.tar.gz, md5sum ok
     lire_20010418-1.diff.gz, md5sum ok
     lire_20010418-1_all.deb, md5sum ok
     lire_20010418-1.dsc, md5sum ok
     lire_20010418-1_i386.changes ok ]
    Uploading (ftp) to uk (ftp.uk.debian.org)
     lire_20010418.orig.tar.gz 163.1 kB , ok (12 s, 13.59 kB/s)
     lire_20010418-1.diff.gz 32.6 kB , ok (3 s, 10.88 kB/s)
     lire_20010418-1_all.deb 222.4 kB , ok (16 s, 13.90 kB/s)
     lire_20010418-1.dsc 0.6 kB , ok (0 s, 0.60 kB/s)
     lire_20010418-1_i386.changes 1.2 kB , ok (1 s, 1.22 kB/s) ]
Check ftp://ftp.uk.debian.org/debian/UploadQueue/
Building The RPM Package
Making sure the FreeBSD port gets updated
Since August 21, 2002, Lire has been in the FreeBSD ports collection. Edwin Groothuis has built a FreeBSD port. Ask him if he's available for updating his port. Alternatively, Cédric Gross might be able to help. If not, the LogReport team should take care of it, and submit a Problem Report to the FreeBSD system, asking for inclusion of the updated port.
Uploading The Release To release a new distribution, publish the tarball in various places and send an announcement to the announcement@lists.logreport.org mailing list, stating the most interesting new features. Furthermore, add a news item to the news list of the website. We'll describe how to upload the tarball to various places.
The LogReport Webserver Upload the tarball to the pub area on the LogReport server. The area is mirrored automagically by the download.logreport.org servers; updates are done every 6 hours. Upload like this:
&user-prompt;scp lire-20001211.tar.gz hibou.logreport.org:/var/www/logreport.org/pub/
On hibou, do:
&user-prompt;cd /var/www/logreport.org/pub
&user-prompt;chown .www lire-20001211.tar.gz
&user-prompt;chmod g+w lire-20001211.tar.gz
&user-prompt;tar zxf lire-20001211.tar.gz
&user-prompt;rm current && ln -s lire-20001211 current
&user-prompt;rm current.tar.gz && ln -s lire-20001211.tar.gz current.tar.gz
&user-prompt;rm -rf lire-20001205
&user-prompt;mv lire-20001205.tar.gz archive
Update the README.txt file by running:
&user-prompt;cd /var/www/logreport.org/pub
&user-prompt;( echo 'current is the latest official release'; echo; ls -lF c* ) > README.txt
Check the symlink to the documentation in the tarball, and check whether the files in http://logreport.org/pub/docs are still up to date.
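The steps run on hibou can be collected into one small shell function so that the release date stamps are typed only once. The following is a minimal sketch, not a script shipped with Lire: the name publish_release and its three arguments are hypothetical, the chown and chmod steps are left out because they depend on local privileges, and it assumes an ln that supports the -n option (GNU and BSD do).

```shell
#!/bin/sh
# Sketch only: rotate the "current" symlinks in the pub directory.
# publish_release and its arguments are hypothetical, not part of Lire.
set -eu

publish_release() {
    pub_dir=$1   # e.g. /var/www/logreport.org/pub
    new=$2       # date stamp of the new release, e.g. 20001211
    old=$3       # date stamp of the previous release, e.g. 20001205

    (
        cd "$pub_dir"
        tar zxf "lire-$new.tar.gz"
        # -f replaces an existing link; -n keeps ln from following the
        # old "current" link into the directory it points to.
        ln -sfn "lire-$new" current
        ln -sfn "lire-$new.tar.gz" current.tar.gz
        # Remove the unpacked old tree and archive the old tarball.
        rm -rf "lire-$old"
        mkdir -p archive
        if [ -f "lire-$old.tar.gz" ]; then
            mv "lire-$old.tar.gz" archive/
        fi
        # Regenerate README.txt as described above.
        { echo 'current is the latest official release'; echo; ls -lF c*; } > README.txt
    )
}
```

A call such as publish_release /var/www/logreport.org/pub 20001211 20001205 would then mirror the manual sequence. Trying it on a scratch directory first is advisable, since it removes the old unpacked tree.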
Advertising The Release
SourceForge In order to release a distribution on SourceForge (SF), log in with your SF account on the SF website. Once logged in, go to the project webpage and choose Admin. At the bottom of that page is an [Edit/Add File Releases] link; click it. You are able to edit packages, like the &Lire; package in the LogReport project. To add a new release, choose [Add Release]. As the release name, use the date, like 20010407, assign it to the &Lire; package and then use the Create This Release button to make it effective. The next page shows 4 steps, of which only one (step 2) is not straightforward. In that step you assign files to a release (.tar.gz, .deb, .rpm). These files should be uploaded to SF's anonymous upload FTP site at ftp://upload.sourceforge.net/incoming/. Make sure the file is placed in the /incoming directory. Click Refresh View in step 2 to add the files you uploaded to the FTP site. Check the files belonging to the release and click Add Files. In step 3, set Processor to any. Set file type to .deb and source.gz. Click update/refresh. Step 4: send notice. Done.
Freshmeat.net On Freshmeat.net, releases are only announced, not distributed. These announcements attract a lot of attention. The webpage for the &Lire; package can be found at http://freshmeat.net/projects/lire/. To announce a new release, go to the "Lire - development branch" webpage. Choose Add Release from the Project pull-down menu in the light blue area. The rest is very straightforward.
Website Maintenance We give hints on how to upgrade the website: installing stuff from current CVS on http://logreport.org. Commits to the CVS tree of the website are automatically propagated to hibou. For more information on the markup language of the website, see the WJML documentation.
Documentation on the LogReport Website Be sure the links to stuff under /pub/current are still alive. E.g. the files TODO, dev-manual.html and user-manual.html are linked to.
Publishing the DTDs The DTDs are published as HTML on the website by using hibou:/usr/local/src/dtdparse/dtdparse-2.0b2-LogReportPatched.tar.gz, which is a patched version of Norman Walsh's dtdparse utility. Before the utility is run, make sure that the DocBook DTD is not included in the parsing process, because the DocBook DTD should not be published. This is done by changing the line: ]]> into: ]]> The webpages are then generated with:
perl ~/dtdparse-2.0b2-patched/dtdparse.pl --title "XML Lire Report Markup Language" --output lire.xml lire.dtd
perl ~/dtdparse-2.0b2-patched/dtdformat.pl --html lire.xml
The resulting lire directory can be tarred, gzipped and unpacked again on hibou in the directory /var/www/logreport.org/pub/docs/dtd/. The other two DTDs are HTML-ized similarly, but remember to change the title when running dtdparse.pl.
Writing Documentation Documentation which comes with the &Lire; tarball is maintained in four formats: plain text, Perl POD, DocBook XML and UML diagrams. We'll talk about all four of these here.
Plain Text Small files like README, NEWS, AUTHORS, doc/BUGS, and doc/TODO are traditionally maintained in plain text format. We adhere to this common practice.
Perl's Plain Old Documentation: maintaining manpages We use Perl's pod (plain old documentation) for manpages. Every file installed with &Lire; in /usr/bin/ must have a manpage. Every file installed in /usr/share/perl5/Lire/ and /usr/lib/lire/ should have a manpage. It would be nice if the files in /etc/lire/ were documented in manpages too. Manpages could perhaps also be useful for some files in /usr/share/lire/xml/, /usr/share/lire/reports/, /usr/share/lire/filters/ and /usr/share/lire/schemas/. Since the files in /usr/bin/ are commands run by &Lire; users, the manpages describing these should focus on the user perspective. Describing the inner workings and implementations of the commands is less important than describing why someone would want to run the specific command. If there is a need to make some remarks on the internals of these scripts, a section called DEVELOPERS could be added to the manpage. The perl modules installed in /usr/share/perl5/Lire/ and the commands in /usr/lib/lire/ are not intended as interfaces for the user. Only people wanting to change or study the operation of &Lire; itself will interact with these files; therefore, the manpages should explain the inner workings and implementations of these files. The configuration files in /etc/lire/ might be changed by users. These should be properly documented: in manpages or in the &user-manual;.
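The rule that every command must have a manpage lends itself to a quick check before a release. The sketch below assumes that the pod is embedded in each script itself and starts with the conventional "=head1 NAME" section; the function name missing_pod is hypothetical and not part of &Lire;.

```shell
#!/bin/sh
# Sketch only: list the files in a directory that carry no pod NAME section.
# missing_pod is a hypothetical helper, not part of Lire.
set -eu

missing_pod() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        if ! grep -q '^=head1 NAME' "$f"; then
            # Print only the file name, without the directory part.
            echo "${f##*/}"
        fi
    done
}
```

Running missing_pod on a directory of scripts prints one name per undocumented file and prints nothing when every file has a NAME section.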
DocBook XML: Reference Books and Extensive User Manuals The main documentation of the &Lire; project is written in DocBook XML 4.1.2. For example, this document is maintained in DocBook XML, as is the &user-manual;. The &user-manual; has more information about DocBook. After editing the &dev-manual; or the &user-manual;, you should run make check-xml to make sure the document is still a valid DocBook document. You should fix any errors before committing your changes. If everything went right, documentation is built in txt, tex, html and pdf format by running make dist, or just make in doc/. We give some hints which might be helpful in case you have to build the documentation manually. To generate PDF:
&user-prompt;jade -t tex -d /path/to/DSSSL/docbook/print/docbook.dsl roadmap.xml
&user-prompt;pdfjadetex roadmap.tex
The last step is actually done two or three times to resolve page numbers. To generate HTML:
&user-prompt;jade -t sgml -d html.dsl roadmap.xml
Here you can use the html.dsl in the doc/source directory. (If necessary, adjust it to reflect the location of your DSSSL stylesheets.) Use lynx to generate TXT output from HTML with:
&user-prompt;lynx -nolist -dump roadmap.html > roadmap.txt
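The repeated pdfjadetex runs mentioned above can be automated by rerunning the command until its auxiliary file stops changing. This is a minimal sketch under two assumptions: that pdfjadetex leaves a roadmap.aux file next to roadmap.tex, as LaTeX-based tools usually do, and that an unchanged .aux file means the page numbers have settled. The helper name rerun_until_stable is hypothetical.

```shell
#!/bin/sh
# Sketch only: rerun a command until the given .aux file stops changing.
# rerun_until_stable is a hypothetical helper, not part of Lire's build.
set -eu

rerun_until_stable() {
    aux_file=$1    # file whose checksum is compared between runs
    max_runs=$2    # upper bound on the number of runs
    shift 2        # the remaining arguments are the command to run
    runs=0
    previous=
    while [ "$runs" -lt "$max_runs" ]; do
        # Send the command's own output to stderr so that only the
        # run count ends up on stdout.
        "$@" >&2
        runs=$((runs + 1))
        current=$(cksum < "$aux_file")
        if [ "$current" = "$previous" ]; then
            break
        fi
        previous=$current
    done
    echo "$runs"
}
```

For the PDF build this could be called as rerun_until_stable roadmap.aux 3 pdfjadetex roadmap.tex; the number printed is how many runs were needed.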
Implementation Details
Adding a New Superservice in &Lire;'s Distribution Integrating a new superservice in the &Lire; distribution involves several things:
Making new directories in CVS:
/service/<superservice>/
/service/<superservice>/script/
/service/<superservice>/reports/
Adding several files:
/service/<superservice>/Makefile.am
/service/<superservice>/reports/Makefile.am
/service/<superservice>/script/Makefile.am
/service/<superservice>/<superservice>.cfg
/service/<superservice>/<superservice>.xml This file specifies the DLF format of the superservice. Ideally, it should offer a place for each and every snippet of information which will ever be found in a logfile from a program which offers functionality defined by the superservice. This file should have documentation embedded; this will show up in this manual.
Writing service plugins (2dlf scripts):
/service/<superservice>/script/<service>2dlf.in
Adapting several files:
/service/configure.in (add the Makefiles and 2dlf script to AC_OUTPUT, to get them converted from <service>2dlf.in to <service>2dlf.)
/service/Makefile.am (add the superservice directory to SUBDIRS, so that make gets run there too, when called from the root source directory.)
/service/all/etc/address.cf (to make the new service known as a member of a superservice.)
Updating the documentation:
User Manual: Chapter "Supported Applications".
Add manpages for scripts.
Updating the configuration by writing a custom config spec or extending the current one, as well as by adding default values to the defaults configuration files.
Issues with Report Merging In some cases, a merged report doesn't display the right information. We outline some worst-case scenarios, and justify our implementation.
Suppose log file 1 (requests with sizes) looks like:
request  size
A        12
B        11
C        10
while log file 2 looks like:
request  size
D        3
E        2
F        1
We report on the top 2 biggest requests, so the report from log 1 looks like:
request  size
A        12
B        11
while the report from log 2 would look like:
request  size
D        3
E        2
Now we change the