How to send Nginx logs to a SQL database

15th April, 2020

A huge number of websites on the internet use Nginx as their web server. If you are a data analyst or a DevOps engineer, you will often want the Nginx logs in a database, where you can query and analyse them according to your requirements.

This blog post shows how to send Nginx access logs to a PostgreSQL database. To achieve this we only need the following simple tools:

  • Nginx: the web server (used here as a reverse proxy)
  • Rsyslog: for forwarding log messages
  • PostgreSQL: the database that stores the log messages (a later section extends the setup to any database)

The steps are as follows (for Ubuntu):

  1. Nginx set up
      
    1. sudo apt-get update && sudo apt-get install nginx

    2. Create a new configuration file for your website (e.g. mywebsite.com)
      sudo vim /etc/nginx/sites-available/mywebsite

    3. Put in the following and configure it according to your needs. The important parts here are the format of the Nginx log output and the location of the log file for mywebsite.com. Note that this format does not escape field values; on Nginx 1.11.8 or newer you can add escape=json to the log_format directive so that characters such as quotes in the user agent cannot break the JSON.
      log_format json_output '{"time_local": "$time_local", '
         '"path": "$request_uri", '
         '"ip": "$remote_addr", '
         '"time": "$time_iso8601", '
         '"user_agent": "$http_user_agent", '
         '"user_id_got": "$uid_got", '
         '"user_id_set": "$uid_set", '
         '"remote_user": "$remote_user", '
         '"request": "$request", '
         '"status": "$status", '
         '"body_bytes_sent": "$body_bytes_sent", '
         '"request_time": "$request_time", '
         '"http_referrer": "$http_referer" }';
      
      server {
          listen 80;
          server_name <domain_name>;
          client_max_body_size 75M;
          location / {
              proxy_read_timeout 300;
              proxy_connect_timeout 300;
              proxy_pass http://0.0.0.0:5000;
              include uwsgi_params; # the uwsgi_params in nginx
      
              # This line is important. The logs will now be stored in /var/log/nginx/mywebsite/access.log as json_output format defined above.
              access_log /var/log/nginx/mywebsite/access.log json_output;
          }
      }

    4. Symlink this configuration into /etc/nginx/sites-enabled/
      sudo ln -s /etc/nginx/sites-available/mywebsite /etc/nginx/sites-enabled/

    5. Create the log directory (Nginx will not create it on its own) and check that the configuration is valid.
      sudo mkdir -p /var/log/nginx/mywebsite
      sudo nginx -t

    6. Restart Nginx
      sudo systemctl restart nginx

    7. Load mywebsite.com in a browser; the access logs for mywebsite.com can now be viewed in /var/log/nginx/mywebsite/access.log.
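
      A logged request will then look roughly like the following (a single line; all values are illustrative):
      {"time_local": "15/Apr/2020:10:00:00 +0000", "path": "/", "ip": "203.0.113.7", "time": "2020-04-15T10:00:00+00:00", "user_agent": "Mozilla/5.0 ...", "user_id_got": "", "user_id_set": "", "remote_user": "", "request": "GET / HTTP/1.1", "status": "200", "body_bytes_sent": "612", "request_time": "0.004", "http_referrer": ""}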

  2. PostgreSQL set up

    1. sudo apt install postgresql postgresql-contrib

    2. sudo su postgres

    3. Start the psql client
      psql

    4. Create a role
      CREATE ROLE <user> WITH LOGIN ENCRYPTED PASSWORD '<password>';

    5. Create a database
      CREATE DATABASE <dbname>;

    6. Connect to the database and create a table for logs
      \c <dbname>;
      CREATE TABLE accesslog (id serial NOT NULL PRIMARY KEY, log_line json NOT NULL, created_at TIMESTAMP NOT NULL);

    7. Grant access to the role 
      GRANT ALL PRIVILEGES ON DATABASE <dbname> TO <user>;
      GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO <user>;
      GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO <user>;

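    8. Optionally, verify that the new role can connect and write to the table (a quick sanity check; the inserted row is just a placeholder):
      psql -h localhost -U <user> -d <dbname>
      INSERT INTO accesslog (log_line, created_at) VALUES ('{"test": true}', now());
      SELECT * FROM accesslog;
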
  3. Rsyslog set up

    1. sudo apt-get update && sudo apt-get install rsyslog-pgsql

    2. Create a configuration file /etc/rsyslog.d/51-mywebsite.conf (rsyslog reads its configuration files alphabetically, so the two-digit prefix controls the order in which they are loaded).

    3. Add the following to the file.
      # imfile module is needed to capture file events
      module(load="imfile") # Load the imfile input module
      
      # The imfile module will wait for new log lines in the given file.
      # A tag is recommended, especially when hosting multiple websites on one server.
      input(type="imfile"
            File="/var/log/nginx/mywebsite/access.log"
            Tag="mywebsite:")
      
      # Load the ompgsql output module for sending logs to PostgreSQL server
      module(load="ompgsql")
      
      # To insert each log line as one table row, a template is needed.
      # The template below builds the INSERT statement for the table accesslog.
      # The JSON log output, as defined in the Nginx configuration, goes into the log_line column.
      # The creation timestamp goes into the created_at column.
      template(name="pgsql-template" type="list" option.sql="on") {
        constant(value="INSERT INTO accesslog (log_line, created_at) values ('")
        property(name="msg")
        constant(value="','")
        property(name="timereported" dateformat="pgsql" date.inUTC="on")
        constant(value="')")
      }
      
      # The following action is triggered whenever the imfile module detects a new log line.
      # Based on the tag defined above, it inserts the log into the PostgreSQL database.
      # The database credentials are needed here.
      # The template name must match the one defined above.
      if( $syslogtag == 'mywebsite:')  then {
        action(type="ompgsql" server="localhost"
              user="<user>"
              pass="<password>"
              db="<dbname>"
              template="pgsql-template"
              queue.type="linkedList")
      }
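
      With this template, every log line is turned into an INSERT statement roughly like the one below (values illustrative; option.sql="on" escapes quote characters inside the message):
      INSERT INTO accesslog (log_line, created_at) values ('{"time_local": "15/Apr/2020:10:00:00 +0000", ...}','2020-04-15 10:00:00')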


    4. Restart rsyslog
      sudo service rsyslog restart
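
    5. After requesting mywebsite.com a few times, verify in psql that rows are arriving:
      SELECT count(*) FROM accesslog;
      SELECT log_line FROM accesslog ORDER BY id DESC LIMIT 1;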


Troubleshooting:

  • It might happen that rsyslog is unable to add logs to the database. In that case, check the rsyslog log at /var/log/syslog for errors, and try to insert a sample log into the database manually using the database role and its credentials (see the verification step at the end of the PostgreSQL section).

That's it. All access logs from mywebsite.com will now also be sent to the PostgreSQL database server.
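
Since log_line is a json column, the stored logs can be queried directly with PostgreSQL's JSON operators. For example, two starter queries (adapt them to your own analysis needs):

  -- Requests per status code
  SELECT log_line->>'status' AS status, count(*) FROM accesslog GROUP BY 1 ORDER BY 2 DESC;
  -- Slowest paths by average request time
  SELECT log_line->>'path' AS path, avg((log_line->>'request_time')::numeric) AS avg_time
  FROM accesslog GROUP BY 1 ORDER BY 2 DESC LIMIT 10;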


Send Nginx logs to any database after processing the logs

The above implementation relies entirely on the rsyslog configuration. At times, however, one needs to process the logs further before inserting them into a database. For more flexibility, the best option is to hand each incoming log to a script, which can then apply whatever processing steps your requirements demand. With such a setup, the script can also be customised to insert the incoming messages into any database, using the respective database client.

To achieve this, we need a script that keeps listening for incoming logs from rsyslog, and we need to change the rsyslog configuration to forward the logs to that script. A solution based on Python 3 is described below; the same approach works with any programming language of your choice.

  1. First, choose a port for listening, e.g. 6000 on localhost.
  2. Second, run a Python script that listens on this port at all times.
  3. Create a Python script with the following content.

    import asyncio
    import configparser
    import json

    import psycopg2  # PostgreSQL adapter for Python
    
    # config.ini holds the database credentials and the port number (an example is shown below the script).
    CONFIG = configparser.ConfigParser()
    CONFIG.read("config.ini")
    
    TCP_PORT = int(CONFIG["RSYSLOG"]["FORWARD_PORT"])
    
    try:
        credentials = {
            "host": CONFIG["POSTGRESQL"]["DB_HOST"],
            "port": int(CONFIG["POSTGRESQL"]["DB_PORT"]),
            "database": CONFIG["POSTGRESQL"]["DB_NAME"],
            "user": CONFIG["POSTGRESQL"]["DB_USER"],
            "password": CONFIG["POSTGRESQL"]["DB_PASSWORD"]
        }
        CONN = psycopg2.connect(**credentials)
        CURSOR = CONN.cursor()
    except psycopg2.Error as e:
        print("Unable to connect to the database:", e)
        quit()
    
    INSERT_QUERY = "INSERT INTO accesslog (log_line, created_at) VALUES (%s, %s);"
    
    
    def execute_sql(params):
        try:
            # Reuse the module-level cursor; it is closed once at shutdown.
            CURSOR.execute(INSERT_QUERY, params)
            CONN.commit()
        except Exception as e:
            CONN.rollback()
            raise e
    
    
    class LogAnalyser:
    
        def __init__(self):
            pass
    
        def process(self, str_input):
            str_input = str_input.decode("utf-8", errors="ignore")
            # Add your processing steps here
            # ...
            try:
                # rsyslog prepends a syslog header, so keep everything from
                # the first "{" onwards and parse it as JSON.
                str_splits = str_input.split("{", 1)
                json_text = "{" + str_splits[1]
                data = json.loads(json_text)
                created_at = data["time"]
                log_line = json_text
                return log_line, created_at # The order is relevant for INSERT query params
            except Exception as e:
                print(e)
            return None
    
    
    async def handle_echo(reader, writer):
        log_filter = LogAnalyser()
        while True:
            line = await reader.readline()
            if not line:
                break
            params = log_filter.process(line)
            if params:
                execute_sql(params=params)
    
    
    loop = asyncio.get_event_loop()
    coro = asyncio.start_server(handle_echo, '127.0.0.1', TCP_PORT)
    server = loop.run_until_complete(coro)
    
    # Serve requests until Ctrl+C is pressed
    print('Serving on {}'.format(server.sockets[0].getsockname()))
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        pass
    
    # Close the server
    print("Closing the server.")
    server.close()
    loop.run_until_complete(server.wait_closed())
    loop.close()
    CURSOR.close()
    CONN.close()

    The script above creates a TCP server listening on the port number provided in your config.ini file. As new messages are received, the server processes each log line to extract the relevant information, such as created_at, and then executes the database command to insert it.
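
    For reference, the script expects a config.ini next to it containing the sections and keys read above. A minimal example (placeholder values):

    [RSYSLOG]
    FORWARD_PORT = 6000

    [POSTGRESQL]
    DB_HOST = localhost
    DB_PORT = 5432
    DB_NAME = <dbname>
    DB_USER = <user>
    DB_PASSWORD = <password>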

    The execute_sql function can be replaced with any function that stores the logs in a database of your choice; a SQLite sketch is shown below.
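
    For instance, here is a minimal sketch of a SQLite variant using only the Python standard library (the database file name and the SQLite table definition are assumptions, not part of the setup above):

    import sqlite3

    # Hypothetical SQLite storage layer; created once at start-up.
    SQLITE_CONN = sqlite3.connect("accesslog.db")
    SQLITE_CONN.execute(
        "CREATE TABLE IF NOT EXISTS accesslog "
        "(id INTEGER PRIMARY KEY, log_line TEXT NOT NULL, created_at TEXT NOT NULL)"
    )

    def execute_sql(params):
        # Same (log_line, created_at) tuple as returned by LogAnalyser.process
        SQLITE_CONN.execute("INSERT INTO accesslog (log_line, created_at) VALUES (?, ?)", params)
        SQLITE_CONN.commit()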

  4. Start the Python script.
  5. Change the rsyslog configuration in /etc/rsyslog.d/51-mywebsite.conf .

    input(type="imfile"
          File="/var/log/nginx/mywebsite/access.log"
          Tag="mywebsite:")
    
    # The omfwd module forwards the logs to another TCP server.
    if( $syslogtag == 'mywebsite:')  then {
      action(type="omfwd" target="127.0.0.1" port="6000" protocol="tcp"
                action.resumeRetryCount="100"
                queue.type="linkedList" queue.size="10000")
    }
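
    Each forwarded message arrives at the TCP server as one line: a syslog header followed by the original log line, roughly like this (the exact header depends on the rsyslog forward template):

    <133>Apr 15 10:00:00 myhost mywebsite: {"time_local": "15/Apr/2020:10:00:00 +0000", ...}

    This is why the Python script above keeps only the part of each message from the first "{" onwards.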


  6. Restart rsyslog to reload the new configuration.
    sudo service rsyslog restart

    Now rsyslog forwards all logs to the TCP server running on localhost port 6000, where the Python script is listening (on the port defined in config.ini). The script processes each message and inserts it into the configured database.

    This way, one has much more control over how the logs are processed, and the script can be customised at will. The key step was to forward the incoming logs to another TCP server that contains all the log-processing logic.