Increase gunicorn concurrency performance with multiprocessing, multithreading or gevent

If you run gunicorn without any configuration, it can only handle one request at a time.

# running without special options
gunicorn example:app
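
The command above assumes a module named example that exposes a WSGI callable named app. This post does not show that file, so the following is only a hypothetical minimal example.py to make the commands runnable, not the application I actually used:

```python
# example.py (hypothetical contents; any WSGI callable exposed as `app` works)

def app(environ, start_response):
    """Minimal WSGI application that answers every request with plain text."""
    body = b"Hello, World!\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```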

I recommend two ways to increase gunicorn's concurrency.

multiprocessing + multithreading

First of all, we should increase the number of gunicorn worker processes. By default, only one worker process is used to handle requests.

In the following example, we use -w 8 to set the number of worker processes. In general, the more CPU cores you have, the more processes you can run. In my experience, the number of processes can be about twice the number of CPU cores, which is close to the (2 x cores) + 1 starting point suggested in the Gunicorn documentation.

# on my 4-core computer, I let gunicorn run with 8 worker processes
gunicorn -w 8 example:app
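
Instead of hard-coding the number, the worker count can be derived from the CPU count in a Gunicorn config file, which is plain Python. This is a minimal sketch assuming the "twice the cores" rule of thumb from above; the file name gunicorn.conf.py and the explicit bind address are illustrative choices, not something this setup requires:

```python
# gunicorn.conf.py (illustrative; `workers` and `bind` are standard Gunicorn settings)
import multiprocessing

# Rule of thumb from this post: two worker processes per CPU core.
workers = multiprocessing.cpu_count() * 2

# Address to listen on. This is Gunicorn's default, spelled out for clarity.
bind = "127.0.0.1:8000"
```

It can then be loaded with gunicorn -c gunicorn.conf.py example:app. On a 4-core machine this gives the same 8 workers as the command above.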

Now we can handle 8 requests at a time. That's not enough, so we can go further with multithreading.

The default number of threads per process is 1. In general, the more I/O work your application does, the more threads you should use. I/O work includes things like querying a database and reading or writing files. If you are not sure how many to use, start with a number like 50 and tune it later.
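
To see why threads help with I/O-heavy work, here is a small standalone sketch that has nothing gunicorn-specific in it. The function fake_io_job is a made-up stand-in that only sleeps, the way a request would wait on a database or a file; the sleep duration and job count are arbitrary:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_job(job_id):
    """Stand-in for a database query or file read: it only waits."""
    time.sleep(0.1)
    return job_id


def run_jobs(num_jobs, num_threads):
    """Run the jobs on a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(fake_io_job, range(num_jobs)))
    return results, time.perf_counter() - start


# Same 8 jobs, first on a single thread, then on 8 threads.
_, one_thread = run_jobs(num_jobs=8, num_threads=1)
results, eight_threads = run_jobs(num_jobs=8, num_threads=8)
print(f"1 thread:  {one_thread:.2f}s")
print(f"8 threads: {eight_threads:.2f}s")
```

With one thread the waits add up one after another; with eight threads they overlap, so the second run should finish in a fraction of the time. CPU-bound work would not benefit in the same way, which is why the thread count should follow how much waiting your requests do.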

In the following example, we use --threads 50 to set the number of threads per process:

# 8 worker processes, 50 threads per process.
gunicorn -w 8 --threads 50 example:app

Finally, we can handle 400 requests at a time (8 * 50 = 400).

multiprocessing + gevent

As in the previous section, multiprocessing is still used, but instead of threads we use gevent here. Gevent is a coroutine-based Python networking library. In theory, the gunicorn gevent worker can reach higher concurrency than the multithreaded worker. You can test both of them on your server and choose the one with better performance.

Install gevent if you do not already have it:

pip install gevent

In the following example, we use -k gevent to select the gevent worker and --worker-connections 1000 to set the maximum number of simultaneous connections per worker process:

# 8 worker processes, up to 1000 concurrent connections per process.
gunicorn -w 8 -k gevent --worker-connections 1000 example:app

Now we can handle 8000 requests at a time (8 * 1000 = 8000).
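
The same options can also live in a config file rather than on the command line. A minimal sketch, assuming gevent is installed and the file is saved as gunicorn.conf.py (an illustrative name); each setting mirrors one flag from the command above:

```python
# gunicorn.conf.py (illustrative; requires gevent to be installed)
workers = 8                # same as -w 8
worker_class = "gevent"    # same as -k gevent
worker_connections = 1000  # same as --worker-connections 1000

# Theoretical ceiling: 8 workers * 1000 connections = 8000 simultaneous requests.
```

Start it with gunicorn -c gunicorn.conf.py example:app.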

8000 is a theoretical value; whether you can reach it depends on your workload. As with multithreading, the more I/O work you do, the better gevent works. Test it in your own environment.
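
For a quick comparison between the threaded and gevent setups, a rough load test can be written with the standard library alone. This is only a sketch: load_test and fetch_status are names invented here, the request counts are arbitrary, and a dedicated benchmarking tool will give more reliable numbers:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url):
    """Send one GET request; return the status code, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return None


def load_test(url, total_requests=200, concurrency=50):
    """Fire requests from a thread pool; return (successes, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch_status, [url] * total_requests))
    elapsed = time.perf_counter() - start
    successes = sum(1 for status in statuses if status == 200)
    return successes, elapsed


# Example (assumes gunicorn is already listening on its default address):
# successes, elapsed = load_test("http://127.0.0.1:8000/")
# print(f"{successes} OK in {elapsed:.2f}s")
```

Run it once against each configuration with the same arguments and compare the elapsed times. A trivial app will show little difference, so point it at an endpoint that does real I/O.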

Posted on 2022-03-01