
How-to written in April 2005

Check your configuration

To run JXLA, you need:
  1. A JDK (1.3 or above).
  2. A log file from your web server to parse.
  3. The latest archive from sourceforge.net (do not download the version without Xerces). If you download version 1.1, also download org.nioto.browser-v0.2.1.jar so that you have the latest version of that library.


Unpack the archive; it creates a jxla directory containing everything you need. If it is version 1.1, replace the library jxla/lib/org.nioto.browser.jar with org.nioto.browser-v0.2.1.jar.

nioto@serveur1:~$ tar -zxf jxla-1.1.1.tar.gz
nioto@serveur1:~$ cd jxla/
nioto@serveur1:~/jxla$ ls -al
total 44
-rwxr-xr-x  1 nioto nioto  574 Mar 19 18:24 Changelog
-rwxr-xr-x  1 nioto nioto 1657 Mar 19 18:16 README.txt
-rwxr-xr-x  1 nioto nioto  267 Jan 21  2002 ant.bat
-rwxr-xr-x  1 nioto nioto  210 Jan  6 14:18 ant.sh
drwxr-xr-x  2 nioto nioto 4096 Mar 19 18:17 bin
-rwxr-xr-x  1 nioto nioto 1350 Mar 19 17:33 build.xml
drwxr-xr-x  2 nioto nioto 4096 Jan  6 19:16 conf
drwxr-xr-x  3 nioto nioto 4096 Mar 19 17:29 doc
drwxr-xr-x  2 nioto nioto 4096 Mar 19 17:29 lib
-rwxr-xr-x  1 nioto nioto 2302 Jan  6 14:18 license.txt
drwxr-xr-x  3 nioto nioto 4096 Jan  6 19:17 src

Set the JAVA_HOME environment variable to the correct location.
On my Linux box the JDK is located in /usr/local/jdk, so:
nioto@serveur1:~/jxla$ export JAVA_HOME=/usr/local/jdk
nioto@serveur1:~/jxla$  $JAVA_HOME/bin/java -version
java version "1.4.2_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_02-b03)
Java HotSpot(TM) Client VM (build 1.4.2_02-b03, mixed mode)
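
Before going further, it can help to confirm that JAVA_HOME really points at a JDK. The following is a minimal sketch and not part of JXLA; the function name check_java_home is made up for this example.

```shell
# Hypothetical helper (not shipped with JXLA): succeeds only if the given
# directory contains an executable bin/java.
check_java_home() {
  if [ -z "$1" ]; then
    echo "JAVA_HOME is empty" >&2
    return 1
  fi
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable java found in $1/bin" >&2
    return 1
  fi
  return 0
}

# Usage: check the current value and report the result.
if check_java_home "${JAVA_HOME:-}" 2>/dev/null; then
  echo "JAVA_HOME looks usable: $JAVA_HOME"
else
  echo "JAVA_HOME needs fixing"
fi
```

The check only looks for the java executable; it does not verify the JDK version, which the java -version command above already reports.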


The first thing to do is to locate the log files and identify their format.
For Apache on a Unix box, look at your httpd.conf.
On my box:

nioto@serveur1:~/jxla$ more /etc/apache/httpd.conf | grep Log | grep -v ^#
ErrorLog /var/log/apache/error.log
LogLevel debug
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v" full
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %P %T" debug
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
CustomLog /var/log/apache/access.log combined

So my log files are stored in the /var/log/apache/ directory, with names
beginning with access.log (log rotation moves old logs to access.log.1, access.log.2.gz, etc.).
The format of their lines is "combined":
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
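
To check that a log line really follows the combined format before configuring JXLA, the format can be approximated with an extended regular expression. This is a rough sketch using grep, not JXLA's own parser: the names COMBINED_RE and is_combined_line are hypothetical, the sample line is made up, and the expression does not handle escaped quotes inside a field.

```shell
# Extended regular expression for one line in "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE='^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$'

# Hypothetical helper: succeeds if the line given as $1 matches the format.
is_combined_line() {
  printf '%s\n' "$1" | grep -Eq "$COMBINED_RE"
}

# Made-up sample line, for illustration only.
sample='192.0.2.10 - - [28/Mar/2005:20:32:57 +0200] "GET /index.html HTTP/1.0" 200 1043 "-" "Mozilla/5.0"'

if is_combined_line "$sample"; then
  echo "sample line is in combined format"
fi
```

The same expression can be run over a whole file with grep -Evc to count the lines that do not fit the format.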

Edit the conf/conf.xml file:
  1. dns : allow or disallow reverse DNS requests. If you allow them, set the path of the file that stores the requests already made ( something like $JXLA_HOME/work/dnsrequest.dump ). I disallow reverse DNS to speed up the analysis.
  2. logfiles : set the location of the files to parse and their format.
    On my box, directory is /var/log/apache/ and filenameregexp (whose value is a regular expression) is access\.log.*.
    The format node is the least intuitive part. Here is how the elements of the Apache format convert ( see the Apache documentation for more information ):
     %h : Remote host => $remote_host
     %l : Remote logname (from identd, if supplied) => not used, so * (any string without spaces)
     %u : Remote user => $user
     %t : Time, in common log format (standard English format,
          e.g. [28/Mar/2005:20:32:57 +0200]) => [d/m/y:h *]
     \"%r\" : First line of the request, quoted (e.g. "GET /index.html HTTP/1.0") => "* $uri *"
     %>s : Status => $status
     %b : Bytes sent => $size
     \"%{Referer}i\" : Quoted referer => $referer
     \"%{User-Agent}i\" : Quoted user-agent => $agent
    If you have changed the log format over time, you can add more than one regexp node to the file. For the combined format, the node is:
        <regexp>$remote_host * $user [d/m/y:h *] "* $uri *" $status $size $referer $agent*</regexp>
  3. pages : my pages have .htm, .html and .php extensions,
    and the default index page is index.html ( the page served when the URL ends with / )
      <extensions>.htm, .html, .php</extensions>
  4. max-values: the default values are good
  5. localconfigclass: this parameter lets JXLA retrieve per-host data when your server does not write one log per host but a single file containing all the data. On my box there is only one website, so I made no change.
    Note :
    if you want to analyze a log file containing data from different hostnames (as a hosting company would),
    you will need to code a class implementing AbstractSiteConfig to tell
  6. summary-name: name of the file to which the simple month-by-month report will be written
  7. history-filepath : path of the file that stores data already parsed ( something like $JXLA_HOME/work/history.dump )
  8. searchengines : configures the search engines, useful for parsing referers; the default values are good.
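
To see which files a filenameregexp value such as access\.log.* would select, the expression can be tried against a directory listing. This sketch uses grep rather than JXLA itself and assumes that the expression has to match the whole file name; the function name list_matching_logs is hypothetical.

```shell
# Hypothetical helper: list the files in directory $1 whose names match the
# extended regular expression $2 (anchored to the whole name, an assumption
# about how JXLA applies filenameregexp).
list_matching_logs() {
  ls "$1" | grep -E "^($2)\$"
}

# Usage with the values from this how-to:
#   list_matching_logs /var/log/apache 'access\.log.*'
```

Comparing the result with the "List of files to parse" printed by viewConfig shows whether the assumption holds on your installation.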

Edit the bin/runSimple(.sh|.bat) script

Set the OUTPUTDIR, JAVA_HOME, CONF and HOSTNAME environment variables:
OUTPUTDIR : directory where the report is written
JAVA_HOME : the location of the JDK
CONF : path to your configuration file
HOSTNAME : name of the host that owns the log files
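
As an illustration, the top of bin/runSimple.sh might then look like the sketch below. Only the JAVA_HOME value comes from this how-to; the output directory, the configuration path and the host name are hypothetical and must be replaced with your own. The loop at the end is an optional guard that is not in the original script.

```shell
OUTPUTDIR=/home/nioto/jxla/output     # hypothetical report directory
JAVA_HOME=/usr/local/jdk              # JDK location used earlier in this how-to
CONF=/home/nioto/jxla/conf/conf.xml   # assumed absolute path to conf/conf.xml
HOSTNAME=www.example.org              # hypothetical host name

# Optional guard: stop early if one of the four variables is empty.
for name in OUTPUTDIR JAVA_HOME CONF HOSTNAME; do
  eval "value=\${$name}"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    exit 1
  fi
done
```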

Check your configuration

Copy your bin/runSimple.(sh|bat) to bin/viewConfig.(sh|bat) and change the last line from

$JAVA_HOME/bin/java -Dhostname=$HOSTNAME -Doutputdir=$OUTPUTDIR org.novadeck.jxla.Main $CONF

to

$JAVA_HOME/bin/java -Dhostname=$HOSTNAME -Doutputdir=$OUTPUTDIR org.novadeck.jxla.Main $CONF viewConfig

Then run it. You will see your configuration, like this:
nioto@serveur1:~/jxla$ bin/viewConfig.sh
Your configuration is :

Class to get infos from hostnames : org.novadeck.jxla.config.SimpleSiteConfig
Requests with extensions in  [.htm, .html, .php ]
Reverse dns is disable,
List of files to parse : [ /var/log/apache/access.log, /var/log/apache/access.log.1,
/var/log/apache/access.log.2.gz, /var/log/apache/access.log.3.gz, /var/log/apache/access.log.4.gz,
/var/log/apache/access.log.5.gz, /var/log/apache/access.log.6.gz, /var/log/apache/access.log.7.gz,
/var/log/apache/access.log.8.gz, /var/log/apache/access.log.9.gz, /var/log/apache/access.log.10.gz,
/var/log/apache/access.log.11.gz, /var/log/apache/access.log.12.gz, /var/log/apache/access.log.13.gz,
/var/log/apache/access.log.14.gz, /var/log/apache/access.log.15.gz, /var/log/apache/access.log.16.gz,
/var/log/apache/access.log.17.gz, /var/log/apache/access.log.18.gz, /var/log/apache/access.log.19.gz ]

Max referers to output : 20
Max search engine keywords to output : 50
Max remote hosts to output : 25
Max uris to output : 30
Max file not found error to output : 10
Max referers to output : 20
Max Countries to output : 1000
Max Browsers to output : 50
Max Operating system to output : 50
The summary file of the log analysis will be write to 'summary.xml'
The history will be write to '/home/nioto/jxla/work/history.dump'
Default web page for your web server is 'index.html'
Available regexp for parsing logs are :
$remote_host * $user [d/m/y:h *] "* $uri *" $status $size $referer $agent*
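
On Unix, the copy-and-edit step at the start of this section can also be scripted. This is a small sketch, assuming the java command is the very last line of the script; the function name make_view_config is hypothetical.

```shell
# Hypothetical helper: copy script $1 to $2, appending " viewConfig" to the
# last line, then make the copy executable.
# Assumes the java command is the last line of the source script.
make_view_config() {
  sed '$ s/$/ viewConfig/' "$1" > "$2" && chmod +x "$2"
}

# Usage from the jxla directory:
#   make_view_config bin/runSimple.sh bin/viewConfig.sh
```

If your script ends with a blank line or another command, edit the copy by hand instead.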

Launch jxla

nioto@serveur1:~/jxla$ bin/runSimple.sh
file doesn't exist, will be created at end of process
22.0 seconds to parse 36055 lines
re  = $remote_host * $user [d/m/y:h *] "* $uri *" $status $size $referer $agent*
match 35598

The output means :
1> the history file is missing, so a new one will be created at the end
2> it took 22.0 seconds to analyze 36055 lines ( my Linux box is an old one ;-)
3> 35598 lines match the regexp
Remark : the 457 (36055 - 35598) lines that do not match are requests from script kiddies
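
The number of unmatched lines follows directly from the two figures in the output. A quick sketch of the arithmetic in the shell, using the numbers from the run above:

```shell
total=36055     # lines parsed, from the output above
matched=35598   # lines matching the regexp, from the output above

unmatched=$((total - matched))
# Integer percentage of matching lines (rounded down).
percent=$((matched * 100 / total))

echo "$unmatched lines did not match ($percent% matched)"
# prints: 457 lines did not match (98% matched)
```

A much lower percentage would suggest that the regexp node does not correspond to the log format actually in use.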


You can get some debug output while JXLA runs by setting the system property DEBUG to true ( for example, by adding -DDEBUG=true to the java command line ).
