Over the last two weekends, I spent a bit of time revamping the backbone of this website, since I felt I needed more control over it: not just over the presentation layer, but also over the architecture (while I was there, I also added a sprinkle of darkness to the CSS, but that's not what I want to focus on in this post).

Everything started from my desire to build a custom and ethical analytics system for this website. I'm always curious to know from what part of the globe people are landing here and which pages generate the most traffic, so my main goals were:

  1. I don't want/need to collect a huge amount of data, but I want to have precise control over that data

  2. I really don't want to directly or indirectly support Google by feeding it my visitors' data. I try to use DuckDuckGo as much as I can, but I still rely on Google from time to time, so I'm already giving it plenty of personal data and I want to limit that as much as I can

  3. I can use this occasion as a way to learn Docker and brush up on my backend skills

I thought I could use a bit of PostgreSQL, Flask and NGINX, and tie them all together using Docker and docker-compose. I had already played with NGINX and NodeJS in the past, but it's been a while since I last hacked on backend things, so I was ready to immerse myself in a few days of pure geeky fun!

How this site used to work

When I rebooted my website from the old WordPress blog, I was looking for something:
  1. With more control..

  2. ..but fewer frills than a CMS

  3. Cheap or potentially free

So I picked GitHub Pages, which uses Jekyll, a static site generator.

Basically, I have a local git repo containing my whole blog. I can write the pages in Markdown or HTML, then run

bundle exec jekyll serve

to generate the actual static site and preview it locally, then git push everything to GitHub, which handles serving the pages. That's it.

The site is totally static: there are no cookies and no server-side querying going on, so it's very performant and lightweight, perfect for serving pure content. GitHub Pages is free and lets you attach your own domain, which is what I did.

Plugging analytics into it

GitHub Pages worked great for a number of years: it's very easy to maintain and I love that both the css/html templates and the actual content are versioned through git.

When I was thinking about different approaches for making my own analytics tool, I asked a virtual hacker friend of mine, who suggested that I could simply scrape the logs of the server and get enough generic data about the visitors, which is a very cool idea that I hadn't thought of. The thing is that with GitHub Pages you don't really have access to the server where the site is running..
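Just to give an idea of what that log-scraping approach could look like, here's a minimal sketch in Python, assuming the default nginx "combined" access log format (the regex and the log path are just illustrative, not something I actually run):

import re
from collections import Counter

# Matches the default nginx "combined" access log format
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

hits = Counter()
with open("/var/log/nginx/access.log") as log:
    for line in log:
        match = LOG_PATTERN.match(line)
        if match and match["status"] == "200":
            hits[match["path"]] += 1

# Top 10 most requested paths
for path, count in hits.most_common(10):
    print(f"{count:6} {path}")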

Which made me think that it was about time to also move the hosting of the blog itself to a place I had full control over and could easily ssh into.

Since I had a bit of free time in these weekends, and I wanted to explore Docker, I thought I could build a simple PostgreSQL + Flask app that just receives POST requests and saves the data in a structured way in the PSQL database. I also initially thought of using a cloud provider for the orchestration part, but for the sake of KISS, I went down the route of a custom Virtual Private Server with DigitalOcean: for 5 quid a month I get ssh access and enough disk space and RAM for my current needs.

It took a bit (2 weekends) to figure out how to plug everything together with docker-compose, but in the end I got exactly what I wanted: a simple system for collecting user data that I have 100% control over, that isn't evil or invasive, and that doesn't use cookies at all (yet).

How this site works now

Everything is handled by running a

docker-compose up

which takes care of spinning up all of the containers:

  • A Flask API that handles the POST requests for the analytics. It also sends the right CORS headers back in the response, so that cross-origin requests are possible only from the handpicked origins that I decide :) (a sketch of this app follows the list)

  • A PostgreSQL database that stores the data in classic relational tables (and uses a persistent volume). I'm using SQLAlchemy in my Flask app because I find the ORM approach particularly appealing. My blog is not a performance-critical application, so I'm happy to stay away from having a bunch of scripts and functions that handle the actual SQL calls, and to instead have nice (Data) Classes that are easy to read and improve the clarity of the code by a huge factor (and I can tell you that for sure because I've worked in code bases with no ORM approach.. things get very messy very soon, unless you're disciplined).

  • An NGINX server that acts as the reverse proxy. In my case, it forwards any request on the /api route to Flask, and for the more generic root (/) requests, it tries to find a matching static file/dir in a specific directory of my VPS, which is where the jekyll site is deployed (where deploying really just means.. rsyncing the files over to that directory). Having NGINX as the only "door" where people can knock makes it easy to set up SSL at the NGINX configuration level alone, instead of having to deal with HTTPS inside Flask, which then only handles plain HTTP calls. Once again, KISS.
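To make the Flask part more concrete, here's a minimal sketch of what such an app could look like. To be clear, this is not my actual code: the PageView model, the /view route and the ALLOWED_ORIGINS list are hypothetical stand-ins, and I'm assuming Flask-SQLAlchemy as the ORM layer:

import os
from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Connection string built from the same env vars used in docker-compose;
# 'db' is the hostname of the database service on the compose network
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://{}:{}@db:{}/{}".format(
    os.environ["POSTGRES_USER"],
    os.environ["POSTGRES_PASSWORD"],
    os.environ["POSTGRES_PORT"],
    os.environ["POSTGRES_DB"],
)
db = SQLAlchemy(app)

# Hypothetical model: one row per page view
class PageView(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    path = db.Column(db.String(256), nullable=False)
    country = db.Column(db.String(64))

ALLOWED_ORIGINS = {"https://valerioviperino.me", "https://www.valerioviperino.me"}

@app.after_request
def add_cors_headers(response):
    # Only hand out CORS permissions to the handpicked origins
    origin = request.headers.get("Origin")
    if origin in ALLOWED_ORIGINS:
        response.headers["Access-Control-Allow-Origin"] = origin
        response.headers["Access-Control-Allow-Headers"] = "Content-Type"
        response.headers["Access-Control-Allow-Methods"] = "POST, OPTIONS"
    return response

@app.route("/view", methods=["POST"])
def track_view():
    payload = request.get_json(force=True)
    db.session.add(PageView(path=payload["path"], country=payload.get("country")))
    db.session.commit()
    return jsonify(status="ok"), 201

if __name__ == "__main__":
    with app.app_context():
        db.create_all()
    app.run(host="0.0.0.0", port=int(os.environ.get("FLASK_PORT", "8081")))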

Talk is cheap, show us the code!

Since I know that it all sounds easy when somebody explains it, but you run into a million unadvertised issues whenever you actually try to do it yourself, here are my commented NGINX and docker-compose configurations (keep in mind that I'm definitely not a pro nginx/docker user, I just happen to have put together something that works, after looking here and there in different docs):

server {

    listen 80;
    listen 443 ssl;

    ssl_certificate /etc/ssl/valerioviperino_me_bundle.crt;
    ssl_certificate_key /etc/ssl/valerioviperino_me.key;
    server_name valerioviperino.me www.valerioviperino.me;

    # Redirect HTTP TO HTTPS (there's probably a better way)
    if ($scheme = http){
        return 301 https://$server_name$request_uri;
    }

    # Try to serve static files coming from jekyll
    location / {
        root /var/www/;
        index index.html;
        try_files $uri $uri/ $uri.html =404;
    }

    # Reroute all api calls to my flask server
    # (the trailing slashes mean /api/foo reaches Flask as /foo)
    location /api/ {
        proxy_pass http://webapp:8081/;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_redirect off;
    }

    # Custom 404 page
    error_page 404 /404.html;
    location = /404.html {
            root /usr/share/nginx/html;
            internal;
    }
}

docker-compose file:


version: "3.7"

services:
    webapp:
        build: ./services/webapp
        depends_on:
            - db
        environment:
            - PYTHONDONTWRITEBYTECODE=1
            # see https://stackoverflow.com/questions/59812009/what-is-the-use-of-pythonunbuffered-in-docker-file
            - PYTHONUNBUFFERED=1
            # Flask
            - "FLASK_PORT=${FLASK_PORT}"
            - FLASK_DEBUG=1
            # PSQL Connection
            - "POSTGRES_PORT=${PSQL_PORT}"
            - "POSTGRES_USER=${PSQL_USERNAME}"
            - "POSTGRES_PASSWORD=${PSQL_PASSWORD}"
            - "POSTGRES_DB=${PSQL_DB}"
        links:
            - db:db
        # 'expose' only exposes the ports to other docker services,
        # 'ports' instead, exposes them to the outside
        expose:
            - "${FLASK_PORT}"
        volumes:
            # to achieve hot reloading while coding
            - ./services/webapp/src:/app
    db:
        image: postgres:alpine
        environment:
            - "POSTGRES_USER=${PSQL_USERNAME}"
            - "POSTGRES_PASSWORD=${PSQL_PASSWORD}"
            - "POSTGRES_DB=${PSQL_DB}"
        ports:
            - "${PSQL_PORT}:${PSQL_PORT}"
        volumes:
            - postgres_data:/var/lib/postgresql/data/
    nginx:
        build: ./services/nginx
        ports:
            - 80:80
            - 443:443
        depends_on:
            - webapp
        volumes:
            - ./services/nginx/var/www:/var/www

# To make the data persistent beyond the life of the container
volumes:
    postgres_data:
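
And to close the loop: once the containers are up, the whole chain (NGINX → Flask → PostgreSQL) can be smoke-tested with a quick POST. A sketch, assuming the hypothetical /view endpoint from the Flask snippet above, reached through the /api prefix:

import json
import urllib.request

# Hypothetical payload matching the PageView sketch from earlier
payload = json.dumps({"path": "/some-post", "country": "IT"}).encode()
req = urllib.request.Request(
    "https://valerioviperino.me/api/view",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as response:
    print(response.status, response.read().decode())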