
Deploying a scalable Django app on Kubernetes

In our company, we decided to move from the AWS world with docker/docker-compose orchestration to Kubernetes. I thought it would be an easy task, hoping to leverage my (admittedly limited) DevOps experience from the past few years. Surprisingly, I hit the ground hard.

Don't get me wrong, Kubernetes (k8s) is really a beautiful piece of software and Google Cloud Platform is much nicer and more intuitive to work with than AWS. But the beautiful abstractions of k8s come with their costs.

I assume that the reader has basic knowledge of how Kubernetes works and how deployments et cetera are created. Most of the inspiration for this came from the following:

  1. Scalable and resilient Django
  2. Django-Redis-Postgresql + the second part (git dir)

What I missed in the above though was a concrete example.

Our old solution had everything in quite simple docker-compose.yml files with external environment files for the different environments. So this was the baseline.

Here I want to summarize how I deployed (IMO) a pretty standard, highly scalable application in Django. These were my requirements:

  1. stateless django-workers - the logic, the core, the heart of the solution. The main Django application must be stateless to allow...
  2. horizontal scalability for django-workers - being able to scale the app simply by adding additional workers
  3. scalable cache - redis in a master-slave configuration, enabling adding additional slaves on the fly.
  4. managed db - in Google Cloud SQL (postgresql)
  5. decoupled - db, cache and django-workers are on their own
  6. static files hosted on a Google CDN (google cloud storage)
  7. cheap - leveraging k8s' autoscaling to save money during low load while being able to spike during high seasons
  8. secure - following up-to-date security recommendations
  9. multienvironment - minimizing the distance between the testing (staging) and production environments (and their maintenance)
  10. maintainable - ideally using CI for new releases and leveraging helm; development must still be possible without any layer of containerization, a simple python manage.py runserver must bring up the server without any issues.

Here is my solution, described below. You can find a settings.py file in the repository as well, so you can see which values map from the Kubernetes elements into the Django app.

Database

Using GC SQL required three steps:

  1. creating new DB
  2. migrating the existing AWS RDS database
  3. adding Google Cloud Proxy container to my deployments

Creating new SQL instance and database

That's easy. Go to the SQL overview and create a new SQL instance. Using the Web UI, create a new user proxyuser (choose whatever name you like) as well as a new DB djangoappdb next to the default postgres. On the overview page, grab the Instance connection name, which will be used later. The db user, the password and the db name must later be specified in the configs - django obviously needs them.
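
If you prefer the CLI over the Web UI, the same can be done roughly like this (a sketch; the instance name, tier and region are my assumptions, pick your own):

gcloud sql instances create djangoapp-sql \
    --database-version=POSTGRES_9_6 --tier=db-g1-small --region=europe-west1
gcloud sql users create proxyuser --instance=djangoapp-sql --password=<PASSWORD>
gcloud sql databases create djangoappdb --instance=djangoapp-sql
# prints the Instance connection name needed later by the proxy
gcloud sql instances describe djangoapp-sql --format='value(connectionName)'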

Migrating existing database

I needed to dump the existing psql instance in Amazon RDS. The easiest way for me was to go to the docker machine with the production instance and do:

$ docker exec -ti djangoapp sh

# had to install the postgresql client on my alpine base image
$ apk update
$ apk add postgresql

# now the dump, where I could reuse the container's env vars
$ pg_dump -h $DJANGO_APP_DB_HOST -U $DJANGO_APP_DB_USER --no-owner --format=plain $DJANGO_APP_DB_NAME > dump.sql
# enter $DJANGO_APP_DB_PASSWORD when prompted

This gave me a copy of my production DB. To be able to import it into Google SQL, I had to remove the following lines, starting around the 20th line of the dump:

CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;  
--
-- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -
--
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';  

It's probably something available by default on AWS RDS but not in Google SQL. I didn't really care. After the removal, I could proceed further.
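
Since it's only a handful of lines, this can also be done non-interactively (a sketch, assuming nothing else in the dump mentions plpgsql):

# drops the CREATE EXTENSION / COMMENT ON EXTENSION lines; the leftover bare `--` lines are harmless SQL comments
sed -i.bak '/plpgsql/d' dump.sql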

First, I had to go to the Google Cloud Storage settings and create a new bucket to be able to load the dump there. Name it however you want and upload the dump. I used:

gsutil cp dump.sql gs://dump123/dump.sql  

Now go to GC SQL again and click on Import. Put in the path to the dump on GCS and select the database we created earlier (djangoappdb). Set the import user to the previously created proxyuser.
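
The import can also be triggered from the CLI (a sketch, using the instance and bucket names from above; the instance's service account may first need read access to the bucket):

gcloud sql import sql djangoapp-sql gs://dump123/dump.sql \
    --database=djangoappdb --user=proxyuser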

Adding Google Cloud Proxy

Google SQL doesn't allow direct access to the database from the containers (or rather, pods) - you have to talk to it through the Google Cloud SQL proxy. This is what you need to follow. The most important part which hasn't been covered yet is that you need to create a new service account, which the Cloud SQL proxy is going to use to communicate with the SQL instance. The name of this account can be arbitrary. I also added the credentials file to the deployment, but that is maybe a bit impractical.
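
For context, the proxy runs as a sidecar container in the same pod as the django-worker. A minimal sketch of that container, closely following the GCP documentation (the image tag, mount path and secret name are taken from those docs and may differ in your setup):

      # the Cloud SQL proxy sidecar, next to the django-worker container
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy",
                  "-instances=<INSTANCE_CONNECTION_NAME>=tcp:5432",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
          - name: cloudsql-instance-credentials
            mountPath: /secrets/cloudsql
            readOnly: true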

What does this mean for your app? Well, the deployment for the django app will probably look like this:

      - env:
        - name: DJANGO_APP_DB_USER
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: username
        - name: DJANGO_APP_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: password

where the referenced secrets and keys come from the credentials you create following the steps in the guide linked above. I deliberately left it like this because it corresponds to the GCP documentation, but in my own deployment I simply added these variables to my configmap and secrets and fetch them using envFrom, as shown a few lines below.
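
If you go the same way, the two secrets can be created roughly like this (a sketch; the key file path and the password are whatever you chose when creating the service account and the proxyuser):

kubectl create secret generic cloudsql-instance-credentials \
    --from-file=credentials.json=<PATH_TO_SERVICE_ACCOUNT_KEY>.json
kubectl create secret generic cloudsql-db-credentials \
    --from-literal=username=proxyuser \
    --from-literal=password=<PROXYUSER_PASSWORD>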

The proxy container is what your django app talks to your DB through. Hence, the django app only sees the proxy container and should communicate with it through something like:

DJANGO_APP_DB_HOST: "localhost"  # the proxy sidecar listens on localhost in the same pod
DJANGO_APP_DB_PORT: "5432"  
DJANGO_APP_DB_NAME: "djangoappdb"  # DB name where you import the data  

which is, in my case, all consumed together in settings.py as env vars:

DATABASES = {  
    'default': {
        'ENGINE': 'django.db.backends.' + os.environ.get('DJANGO_APP_DB_ENGINE', 'sqlite3'),  # this is in case of PSQL: `postgresql_psycopg2`
        'NAME': os.environ.get('DJANGO_APP_DB_NAME', os.path.join(BASE_DIR, 'db.sqlite3')),
        'USER': os.environ.get('DJANGO_APP_DB_USER'),
        'HOST': os.environ.get('DJANGO_APP_DB_HOST'),
        'PORT': os.environ.get('DJANGO_APP_DB_PORT'),
        'PASSWORD': os.environ.get('DJANGO_APP_DB_PASSWORD'),
    }
}

The connection string referenced in the document for SQL proxy is the one I mentioned in the DB creation: Instance connection name.

Static files

To make the django-worker stateless, I needed to move the static files out of the local storage. That way, all replicas of the django-worker serve the same static files.

One solution would be creating some shared volume, but why not take advantage of the cheap and fast Google Cloud Storage instead of managing a volume?

Hence, just go to GCS and create a bucket where we'll copy the static files during deployment.
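
This can also be done from the CLI, and the objects need to be publicly readable so browsers can fetch them (a sketch; the bucket name and location are assumptions):

gsutil mb -l europe-west1 gs://my-bucket
# make newly uploaded objects publicly readable by default
gsutil defacl set public-read gs://my-bucket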

The only change I had to do in my settings.py was:

STATIC_URL = os.getenv('DJANGO_APP_STATIC_URL', '/static/')  

and when deploying, setting the env var:

DJANGO_APP_STATIC_URL = "https://storage.googleapis.com/my-bucket/static/"

To copy the files over there, I do:

python manage.py collectstatic --no-input --verbosity 1  
gsutil rsync -R ./static gs://my-bucket/static  

CORS

To be able to fetch data from GCS, you need to set up CORS on the bucket.

[
    {
        "origin": [
            "https://your-domain",
            "http://you-can-specify-even-more-than-one"
        ],
        "responseHeader": ["Content-Type"],
        "method": ["GET", "HEAD"],
        "maxAgeSeconds": 3600
    }
]

Save the JSON above as cors.json and execute:

gsutil cors set cors.json gs://{your-bucket}  

Cache

This was again actually quite easy. Basically, we deploy one redis-master and multiple redis-slaves as needed. See the charts for more information; there are no gotchas.
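
For orientation, the DNS names used in the cache configuration below (redis-master, redis-slave) simply come from two Services along these lines (a rough sketch; the names and labels are assumptions, the actual charts may differ):

apiVersion: v1
kind: Service
metadata:
  name: redis-master
spec:
  selector:
    app: redis
    role: master
  ports:
    - port: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-slave
spec:
  selector:
    app: redis
    role: slave
  ports:
    - port: 6379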

To keep development easy - i.e. not having to run a redis cache locally just to start the app - you want the LocMemCache backend available as a fallback. But to leverage the replicas with the django_redis.cache.RedisCache backend, you need to specify multiple locations (hence the split). I solved this as follows:

CACHE_LOCATION = os.getenv('DJANGO_APP_CACHE_LOCATION', 'wonderland').split()  
CACHES = {  
    "default": {
        "BACKEND": os.getenv('DJANGO_APP_CACHE_BACKEND',
                             'django.core.cache.backends.locmem.LocMemCache'),
        # e.g. redis backend supports multiple locations in list (for replicas), 
        # while e.g. LocMemCache needs only a string
        "LOCATION": CACHE_LOCATION[0] if len(CACHE_LOCATION) == 1 else CACHE_LOCATION, 
    }
}

This consumes the following:

  DJANGO_APP_CACHE_BACKEND: "django_redis.cache.RedisCache"
  # first element here is considered master
  DJANGO_APP_CACHE_LOCATION: "redis://redis-master:6379/1 redis://redis-slave:6379/1"

Networking

This is actually easier than I thought, and easier than it is in docker-compose. First of all, I didn't need nginx anymore as a proxy/load balancer (yay! One thing less to care about).

I went to Networking → External IP addresses and reserved a new static IP address. This is what we need to set on our load balancer:

apiVersion: v1  
kind: Service  
spec:  
  type: LoadBalancer
  loadBalancerIP: <THE_STATIC_IP>
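
If you prefer the CLI, the static IP can be reserved like this (a sketch; the address name and region are assumptions):

gcloud compute addresses create djangoapp-ip --region=europe-west1
# prints the reserved address to put into loadBalancerIP
gcloud compute addresses describe djangoapp-ip --region=europe-west1 --format='value(address)'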

Release and deploy

It's really just:

  1. build and push a new image: docker build -t $IMAGE . and docker push $IMAGE (see the sketch after this list)
  2. manually make a (optionally backward-compatible) migration
  3. collect static files and upload them to GCS (as above)
  4. use helm to deploy new release (see below)
  5. optionally create a new migration to remove obsolete tables/columns... after all old pods are replaced by the new code
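
Put together, a release looks roughly like this (a sketch; the registry path, the tag and the --set image=... parameter are assumptions and depend on how your chart is parametrized):

# hypothetical release flow
IMAGE=eu.gcr.io/my-project/djangoapp:$(git rev-parse --short HEAD)
docker build -t $IMAGE .
docker push $IMAGE

python manage.py collectstatic --no-input --verbosity 1
gsutil rsync -R ./static gs://my-bucket/static

helm upgrade $RELEASE_NAME my-chart -f my-chart/values.staging.yaml --set image=$IMAGE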

Helm

Using helm is quite a no-brainer and there is really not much I can tell here.

To install the chart, I did (once):

helm install my-chart -f my-chart/values.staging.yaml  
# this returned RELEASE_NAME

Then, to make a new release of the chart, all I have to do is:

helm upgrade $RELEASE_NAME my-chart -f my-chart/values.staging.yaml  

Environment variables and secrets

I tried to parametrize the chart as much as possible via values.yaml. The environment-specific values are then placed into the appropriate values.<env>.yaml, hence the -f my-chart/values.staging.yaml switch above. Secrets are still part of the chart, except the ones for the database.
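
To tie this back to the envFrom approach mentioned earlier, the pattern looks roughly like this (a sketch; the ConfigMap and Secret names are assumptions):

apiVersion: v1
kind: ConfigMap
metadata:
  name: djangoapp-config
data:
  DJANGO_APP_DB_HOST: "localhost"
  DJANGO_APP_DB_PORT: "5432"
  DJANGO_APP_DB_NAME: "djangoappdb"
  DJANGO_APP_CACHE_BACKEND: "django_redis.cache.RedisCache"
  DJANGO_APP_CACHE_LOCATION: "redis://redis-master:6379/1 redis://redis-slave:6379/1"

and in the django-worker container spec:

      envFrom:
        - configMapRef:
            name: djangoapp-config
        - secretRef:
            name: djangoapp-secrets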