Deploying a scalable Django app on Kubernetes
In our company, we decided to move from the AWS world with docker/docker-compose orchestration to Kubernetes. I thought it would be an easy task, hoping to leverage my (definitely limited) devops experience from the past few years. Surprisingly, I hit the ground hard.
Don't get me wrong, Kubernetes (k8s) is really a beautiful piece of software and Google Cloud Platform is much nicer and more intuitive to work with than AWS. But the beautiful abstraction of k8s comes at a cost.
I assume that the reader has basic knowledge of how Kubernetes works and how deployments et cetera are created. Most of the inspiration for this came from the following:
- Scalable and resilient Django
- Django-Redis-PostgreSQL + the second part (git dir)
What I missed in the above, though, was a concrete example.
Our old solution had everything in a fairly simple docker-compose.yml, with external environment files for the different environments. So this was the baseline.
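For reference, the old setup looked roughly like this (a simplified, illustrative sketch - service names, images and the env file are placeholders, not the real config):
version: "3"
services:
  djangoapp:
    build: .
    env_file: .env.production   # DB credentials, cache location, etc.
    ports:
      - "8000:8000"
  redis:
    image: redis:alpine
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    depends_on:
      - djangoapp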
Here I want to summarize how I deployed (IMO) a pretty standard, highly scalable Django application. These were my requirements:
- stateless django-workers - the logic, the core, the heart of the solution. The main Django application must be stateless to allow...
- horizontal scalability for django-workers - being able to scale the app simply by adding additional workers
- scalable cache - redis in a master-slave configuration, enabling adding additional slaves on the fly
- managed db - in Google Cloud SQL (PostgreSQL)
- decoupled - db, cache and django-workers are on their own
- static files hosted on a Google CDN (Google Cloud Storage)
- cheap - leveraging k8s autoscaling to save money during low load but being able to spike during high seasons (see the HPA sketch after this list)
- secure - following up-to-date security recommendations
- multi-environment - minimizing the distance between the testing (staging) and production environments (and their maintenance)
- maintainable - ideally using CI for new releases and leveraging helm; development must still be possible without any containerization layer, a simple python manage.py runserver must bring up the server without any issues.
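To illustrate the autoscaling point above: the k8s piece doing the work is a HorizontalPodAutoscaler targeting the django-worker deployment. A minimal sketch, assuming a deployment called django-worker and purely illustrative limits:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-worker              # assumed name of the django deployment
  minReplicas: 2                     # baseline during low load
  maxReplicas: 10                    # ceiling for high-season spikes
  targetCPUUtilizationPercentage: 70 # scale out when average CPU goes above 70%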
Here is my solution, described below. You can find the settings.py file in the repository as well, so you can see which values map from the Kubernetes objects into the Django app.
Database
Using GCP SQL required the following steps:
- creating a new GCP DB
- migrating the existing AWS RDS database
Creating a new SQL instance and database
That's easy. Go to the SQL overview and create a new SQL instance, ideally provisioning an internal IP address on the same network as your cluster (by default you have only one, called default, so if you don't know what this means, just ignore it). Create a new user myappuser (choose whatever username you like, of course) using the web UI, and create a new database djangoappdb next to the default postgres one. The DB user, the password, the DB name and the IP address must later be specified in the configs - the Django app obviously needs them.
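The user and password should not live in the chart in plain text (see the secrets section at the end); one way to get them into the cluster is a plain Secret. A minimal sketch, with an illustrative name and placeholder values:
apiVersion: v1
kind: Secret
metadata:
  name: djangoapp-db          # illustrative name, referenced later from the deployment
type: Opaque
stringData:
  DJANGO_APP_DB_USER: myappuser
  DJANGO_APP_DB_PASSWORD: change-me   # placeholder, not a real password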
Migrating existing database
I needed to dump the existing PostgreSQL instance in Amazon RDS. The easiest way for me was to go to the Docker machine running the production instance and do:
$ docker exec -ti djangoapp sh
# had to install the postgresql client on my alpine base image first:
# apk update && apk add postgresql
# now the dump, where I could reuse the container's env vars
# (pg_dump will prompt for $DJANGO_APP_DB_PASSWORD):
pg_dump -h $DJANGO_APP_DB_HOST -U $DJANGO_APP_DB_USER --no-owner --format=plain $DJANGO_APP_DB_NAME > dump.sql
This gave me a copy of my production DB. To be able to import it into Google Cloud SQL, I had to remove the following lines, starting at line 20 of the dump:
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
--
-- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -
--
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
It's probably something available by default on AWS RDS but not in Google Cloud SQL. I really didn't care. After the removal, I could proceed further.
First, I had to go to Google Cloud Storage and create a new bucket to upload the dump to. Name it however you want and upload the dump there. I used:
gsutil cp dump.sql gs://dump123/dump.sql
Now go to GCP SQL again and click on Import. Put the path to the GCS dump and select the database we created earlier (djangoappdb). Set the import user to the previously created myappuser.
After the import, the connection details go into the deployment config as environment variables:
DJANGO_APP_DB_HOST: "DB IP address from the previous step"
DJANGO_APP_DB_PORT: "5432"
DJANGO_APP_DB_NAME: "djangoappdb" # DB name where you import the data
which is in my case all consumed in settings.py as env vars:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.' + os.environ.get('DJANGO_APP_DB_ENGINE', 'sqlite3'), # this is in case of PSQL: `postgresql_psycopg2`
'NAME': os.environ.get('DJANGO_APP_DB_NAME', os.path.join(BASE_DIR, 'db.sqlite3')),
'USER': os.environ.get('DJANGO_APP_DB_USER'),
'HOST': os.environ.get('DJANGO_APP_DB_HOST'),
'PORT': os.environ.get('DJANGO_APP_DB_PORT'),
'PASSWORD': os.environ.get('DJANGO_APP_DB_PASSWORD'),
}
}
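For completeness, a minimal sketch of how these env vars can reach the container in the django-worker deployment - the ConfigMap/Secret names here are illustrative; in the real chart they are templated from the Helm values:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: djangoapp
  template:
    metadata:
      labels:
        app: djangoapp
    spec:
      containers:
        - name: django-worker
          image: gcr.io/my-project/djangoapp:latest   # illustrative image path
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: djangoapp-config   # DJANGO_APP_DB_HOST/_PORT/_NAME, cache and static settings
            - secretRef:
                name: djangoapp-db       # DJANGO_APP_DB_USER / _PASSWORD from the Secret above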
Static files
To make the django-worker stateless, I needed to move the static files out of local storage, so that all replicas of the django-worker see the same statics.
One solution would be creating a shared volume, but why not take advantage of the cheap and fast Google Cloud Storage instead of managing a volume?
Hence, just go to GCS and create a bucket where we'll copy the static files during deployment.
The only change I had to make in my settings.py was:
STATIC_URL = os.getenv('DJANGO_APP_STATIC_URL', '/static/')
and when deploying, setting the env var to the bucket's public URL:
DJANGO_APP_STATIC_URL = "https://storage.googleapis.com/my-bucket/static/"
To copy the files over there, I do:
python manage.py collectstatic --no-input --verbosity 1
gsutil rsync -R ./static gs://my-bucket/static
CORS
To be able to fetch data from GCS, you need to set up CORS on the bucket.
[
{
"origin": [
"https://your-domain",
"http://you-can-specify-even-more-than-one",
],
"responseHeader": ["Content-Type"],
"method": ["GET", "HEAD"],
"maxAgeSeconds": 3600
}
]
save the JSON above as cors.json
and execute:
gsutil cors set cors.json gs://{your-bucket}
Cache
This was actually again quite easy. Basically, we deploy one redis-master and multiple redis-slaves as needed. See the charts for more information; there are no gotchas.
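For orientation, the cache locations used below are plain Services sitting in front of the redis pods. A minimal sketch - the Service names match the redis:// URLs used later, the labels are illustrative and must match your redis deployments:
apiVersion: v1
kind: Service
metadata:
  name: redis-master
spec:
  selector:
    app: redis
    role: master      # illustrative label on the single master pod
  ports:
    - port: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-slave
spec:
  selector:
    app: redis
    role: slave       # illustrative label shared by all slave replicas
  ports:
    - port: 6379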
To keep development easy - not having to run a redis cache locally - you want locmemcache available as a fallback. But to leverage replicas with the django_redis.cache.RedisCache backend, you need to specify multiple locations (hence the split below). I solved this as follows:
CACHE_LOCATION = os.getenv('DJANGO_APP_CACHE_LOCATION', 'wonderland').split()
CACHES = {
"default": {
"BACKEND": os.getenv('DJANGO_APP_CACHE_BACKEND',
'django.core.cache.backends.locmem.LocMemCache'),
# e.g. redis backend supports multiple locations in list (for replicas),
# while e.g. LocMemCache needs only a string
"LOCATION": CACHE_LOCATION[0] if len(CACHE_LOCATION) == 1 else CACHE_LOCATION,
}
}
this consumes the following in the deployment:
DJANGO_APP_CACHE_BACKEND: "django_redis.cache.RedisCache"
# first element here is considered master
DJANGO_APP_CACHE_LOCATION: "redis://redis-master:6379/1 redis://redis-slave:6379/1"
Networking
This is actually easier than I thought, and easier than it is in docker-compose. First of all, I didn't need nginx anymore as a proxy/load balancer (yay! One thing less to care about).
I went to Networking - External IP addresses and created a new static IP address. This is what we need to set on our LoadBalancer Service:
apiVersion: v1
kind: Service
metadata:
  name: djangoapp                  # illustrative name
spec:
  type: LoadBalancer
  loadBalancerIP: <THE_STATIC_IP>
  selector: {app: djangoapp}       # must match your django-worker pod labels
  ports:
    - {port: 80, targetPort: 8000} # targetPort = the port the workers listen on
For HTTPS, you'll need an ingress, which is not covered in this tutorial, but there are plenty of articles on that already.
Release and deploy
It's really just:
- build and push a new image: docker build -t $IMAGE .
- manually run an (optionally backward-compatible) migration (see the Job sketch after this list)
- collect the static files and upload them to GCS (as above)
- use helm to deploy the new release (see below)
- optionally create a new migration to remove obsolete tables/columns... after all old pods are replaced by the new code
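I run the migration step by hand; one possible way to do it inside the cluster is a one-off Job. A sketch, assuming the image and the ConfigMap/Secret names used earlier, all of which are illustrative:
apiVersion: batch/v1
kind: Job
metadata:
  name: djangoapp-migrate
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: gcr.io/my-project/djangoapp:latest   # the new release image
          command: ["python", "manage.py", "migrate", "--no-input"]
          envFrom:
            - configMapRef:
                name: djangoapp-config
            - secretRef:
                name: djangoapp-db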
Helm
Using helm is quite a no-brainer and there is really not much to tell here.
To install the chart, I did (once):
helm install my-chart -f my-chart/values.staging.yaml
# this returned RELEASE_NAME
Then, to make a new release of the chart, all I have to do is:
helm upgrade $RELEASE_NAME my-chart -f my-chart/values.staging.yaml
Environment variables and secrets
I tried to parametrize the chart as much as possible into values.yaml. The environment-specific values are then placed into the appropriate values.<env>.yaml. Hence the -f my-chart/values.staging.yaml switch above. Secrets are still part of the chart except for the ones for the database.
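To give an idea, an environment-specific values file might override things like the following - the keys are illustrative and depend entirely on how you template the chart:
# values.staging.yaml (illustrative keys)
image:
  repository: gcr.io/my-project/djangoapp
  tag: "1.2.3"
replicaCount: 2
env:
  DJANGO_APP_DB_HOST: "10.0.0.3"
  DJANGO_APP_DB_NAME: "djangoappdb"
  DJANGO_APP_CACHE_LOCATION: "redis://redis-master:6379/1 redis://redis-slave:6379/1"
  DJANGO_APP_STATIC_URL: "https://storage.googleapis.com/my-bucket/static/"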
To encrypt secrets, I used GCP KMS with sops. Create a keyring and a key, and then you can encrypt YAML secrets as:
sops --gcp-kms <your-key-here> -i -e secrets.<env>.yaml
and before the deployment you can just decrypt it into a temporary file
sops -d secrets.<env>.yaml > secrets.<env>.yaml.decoded
and then simply reference it as an additional values file:
helm upgrade $RELEASE_NAME my-chart -f my-chart/values.staging.yaml -f my-chart/secrets.staging.yaml.decoded
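The encrypted file is just another values file; before encryption it might look roughly like this (keys and values are purely illustrative placeholders):
# secrets.staging.yaml (before running sops -e)
secrets:
  djangoSecretKey: "not-a-real-secret-key"
  dbPassword: "not-a-real-password"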