Python Concurrency with Asyncio: An Intro to Python's Concurrency Model
Python's approach to concurrency is tied to its Global Interpreter Lock (GIL).
This is the first article in a series exploring concurrency in Python using asyncio. Each article is largely based on my summary of a chapter from the book Python Concurrency with Asyncio by Matthew Fowler.
Here, I present an introduction to concurrency in general with a focus on Python’s concurrency model. After the article is an appendix of important definitions. These are perhaps more informal definitions based on my understanding of the terms (I strive to be both simple and accurate). You can read the definitions first and keep them in mind, or go back to them if you encounter a term in the article that you are unfamiliar with.
TL;DR
Python programs are limited by the Global Interpreter Lock (GIL), which ensures that only one thread executes Python bytecode at any point in time.
To enable concurrency, Python works around the GIL limitation by implementing a single-threaded event loop. This model leverages non-blocking sockets at the OS level to handle I/O operations concurrently while Python code executes on a single thread. Using coroutines (via async/await syntax), we can write code that pauses during I/O operations, allowing other tasks to run.
This approach excels for I/O-bound applications like web servers and data pipelines, offering efficiency without the complexities of thread synchronization. While this model provides concurrency, achieving true parallelism in Python requires using multiple processes, each with its own GIL.
Python’s Concurrency Model: The Single-Threaded Event Loop
The Global Interpreter Lock and its limitations
At the heart of Python’s concurrency model is the Global Interpreter Lock (GIL). The GIL is a mutex (or lock) that protects access to Python’s objects, preventing multiple threads from executing Python bytecode simultaneously. This limitation exists because CPython’s memory management (which is handled via object reference counts) is not thread-safe. As a result, a Python process can effectively utilize only one CPU core at a time for Python code execution, regardless of how many threads it creates. This presents a challenge for CPU-bound tasks that would typically benefit from parallel execution across multiple cores.
Achieving concurrency within GIL constraints
Despite this ‘apparent’ limitation, Python can still achieve concurrency, particularly for I/O-bound operations. This is possible because at the OS level, I/O operations are implemented concurrently. Thus, the GIL can be released during those operations to allow other tasks to execute while waiting for I/O to complete.
Building on this, Python developed a solution for concurrency despite the GIL: the single-threaded event loop. This approach allows Python programs to perform concurrent operations without the complexity and overhead of multiple threads, and crucially, without violating the GIL’s constraints.
Key Components of Python’s Concurrency Model
Event Loop
At the core of Python’s concurrency model is the event loop: an infinitely running queue that orchestrates task execution. Think of it as a conductor in an orchestra, directing which section plays at what time, ensuring that no musician (task) remains idle when they could be performing. The event loop follows a simple yet powerful pattern:
Check for tasks that are ready to run
Run those tasks until they either complete or reach a point where they need to wait
While waiting, switch to other tasks that are ready
Repeat this process indefinitely
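The four-step pattern above can be sketched with a toy scheduler. This is an illustrative simplification (using generators as stand-ins for tasks), not how asyncio is actually implemented:

```python
import collections

# A toy event loop: a queue of generator-based "tasks". Each task runs
# until it yields (simulating "waiting"), then the loop switches to the
# next ready task, repeating until every task has completed.
def toy_event_loop(tasks):
    ready = collections.deque(tasks)
    results = []
    while ready:                      # repeat until nothing is left
        task = ready.popleft()        # check for a task that is ready
        try:
            step = next(task)         # run it until it needs to wait
            results.append(step)
            ready.append(task)        # it will be ready again later
        except StopIteration:
            pass                      # task completed; drop it
    return results

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"           # pause here, yielding control

print(toy_event_loop([worker("a", 2), worker("b", 2)]))
# tasks interleave: ['a:0', 'b:0', 'a:1', 'b:1']
```

Notice how the two workers interleave even though everything runs on one thread: the loop simply alternates between whichever tasks are ready.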
Non-blocking Sockets
To understand how the event loop enables concurrency, we need to understand a bit about how I/O operations work at the OS level. Sockets–the OS abstraction for data transfer–are the primary means by which programs interact with external systems.
By default, sockets operate in blocking mode, meaning that when a program requests data from a socket, it must wait until that data is available before continuing execution (which is what we experience when writing sequential/synchronous programs). This is inefficient when dealing with multiple I/O operations, as the program spends most of its time waiting.
Non-blocking sockets solve this problem by allowing a program to submit a request and immediately move on to other tasks. When the requested data becomes available, the OS notifies the program, which can then process the result. The notification is handled using the OS’s built-in event notification system.
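Python exposes both pieces directly: sockets can be put in non-blocking mode, and the selectors module wraps the OS’s event notification system. Here is a minimal, self-contained sketch using a local socketpair (so no network setup is needed):

```python
import selectors
import socket

# A local socketpair keeps the example self-contained: bytes sent on
# one end become readable on the other.
left, right = socket.socketpair()
left.setblocking(False)               # put our end in non-blocking mode

sel = selectors.DefaultSelector()
sel.register(left, selectors.EVENT_READ)

# No data yet: a non-blocking recv does not wait, it raises immediately,
# leaving the program free to do other work in the meantime.
try:
    left.recv(1024)
except BlockingIOError:
    print("no data yet, doing other work")

right.sendall(b"hello")               # data arrives on the other end

# The selector reports which sockets are ready instead of blocking on one.
received = None
for key, _ in sel.select(timeout=1):
    received = key.fileobj.recv(1024)

print(received)                       # b'hello'

sel.close()
left.close()
right.close()
```

This is exactly the mechanism the event loop builds on: register many sockets, ask the OS which ones are ready, and only then read from them.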
Coroutines
The final component is coroutines–specialized functions that can pause their execution at specific points and later resume where they left off. Unlike regular functions that run to completion once called, coroutines can yield control back to the event loop when they reach an I/O operation, allowing other tasks to execute.
In Python, coroutines are implemented using the async and await syntax. When a coroutine awaits an I/O operation, it essentially tells the event loop: “I’m going to be waiting for a while, so feel free to run something else in the meantime.”
Coroutines provide an elegant way to write concurrent code that reads much like synchronous code, hiding much of the complexity of the underlying event loop and non-blocking I/O operations.
How Python’s Concurrency Model Works
Putting all this together, let’s trace how Python executes concurrent operations:
A Python program runs on a single thread, with the event loop coordinating execution.
When the program encounters an I/O operation, it creates a coroutine that awaits the result.
Instead of blocking, the coroutine yields control back to the event loop
The event loop submits the I/O request to the OS using non-blocking sockets.
While waiting for the I/O to complete, the event loop executes other ready coroutines.
When the OS signals that the I/O operation is complete, the event loop schedules the waiting coroutine to resume.
The coroutine continues execution from where it left off.
This approach allows a single-threaded Python program to handle multiple concurrent operations efficiently, particularly when those operations involve waiting for external resources.
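The sequence above can be observed directly. In this small sketch (the fetch coroutine and its delays are made up for illustration), each coroutine logs when it yields control to the loop and when the loop resumes it:

```python
import asyncio

log = []

async def fetch(name, delay):
    log.append(f"{name} waiting")     # about to await: yield to the loop
    await asyncio.sleep(delay)        # loop runs other tasks meanwhile
    log.append(f"{name} resumed")     # the loop scheduled us back

async def main():
    # Two coroutines run concurrently on the single-threaded loop.
    await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))

asyncio.run(main())
print(log)
# ['slow waiting', 'fast waiting', 'fast resumed', 'slow resumed']
```

"fast" resumes before "slow" even though it was started second: the loop wakes each coroutine as soon as its wait completes, independent of start order.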
Advantages and Use Cases
The single-threaded event loop model offers several advantages:
Simplified Programming Model: Without the complexities of thread synchronization, deadlocks, and race conditions, concurrent code becomes easier to write and reason about. Note that, for more advanced use cases, asyncio does provide ways to manage these complexities and bridge asynchronous and synchronous parts of a program.
Efficiency for I/O-bound Tasks: For applications that spend most of their time waiting for external resources (databases, network services, file systems), the event loop model can achieve performance comparable to or better than multi-threaded approaches, with less overhead.
Scalability: A single event loop can efficiently manage thousands of concurrent connections, making it ideal for high-concurrency scenarios like web servers and API gateways.
Common use cases in which Python’s concurrency model shines include:
- Web servers and microservices
- Data processing pipelines with significant I/O components
- GUI applications that need to remain responsive
- Network clients and scrapers
- Real-time communication systems
Concurrency vs. Parallelism in Python
While Python’s event loop provides efficient concurrency, it doesn’t address the need for true parallelism in CPU-bound tasks. To achieve parallelism in Python, we must use multiple processes instead of multiple threads.
Each Python process has its own GIL, so by spawning multiple processes, we can truly execute Python code in parallel across multiple CPU cores. Libraries like multiprocessing make this approach relatively straightforward, although it comes with its own set of challenges, such as higher memory overhead and more complex inter-process communication.
This highlights an important distinction: while parallelism always implies concurrency, concurrency does not always imply parallelism. Python’s event loop model is concurrent but not parallel–multiple tasks make progress in overlapping time periods, but they are not executing simultaneously at the CPU level.
Conclusion
Python’s concurrency model, built around the single-threaded event loop, provides an elegant solution to the challenge of concurrent programming within the constraints of the GIL. By leveraging non-blocking I/O operations at the OS level and using coroutines to manage task switching, Python enables efficient concurrent execution without the complexities of multi-threaded programming.
This approach is particularly well-suited for I/O-bound applications, where the ability to make progress on multiple tasks while waiting for external resources can dramatically improve performance and responsiveness.
In the coming articles, we’ll explore how to implement this concurrency model using Python’s asyncio library, dive deeper into coroutines, tasks, futures, and more advanced patterns for concurrent programming in Python.
Appendix: Key Definitions
processes: for simplicity, think of a process as a single unit of execution for a program. That is, anytime a program is executed and run, that program creates a process: a (often self-contained) instance of execution for that program managed by the operating system. For a more detailed dive into processes, I recommend the chapter on processes (chapter 4) of Operating Systems: Three Easy Pieces
threads: threads are often referred to as ‘light-weight processes’, which essentially means that they share some of the characteristics of processes and function similarly, but with some major differences. A process is composed of one or more threads, and perhaps the major difference between them is that different threads (of the same process) share the same memory address space, while different processes (often) have separate address spaces.

concurrency: concurrency is a property of systems where several tasks happen at the ‘same time’. The notion of ‘same time’ is technically a misnomer, because concurrency often involves a lot of switching between tasks, giving each task micro time slices to run. This is usually managed by the operating system, which gives the illusion that the tasks are executing at the same time.

parallelism: parallelism means that not only are the tasks (processes) happening at the same time conceptually, they are actually being executed simultaneously (for example, on different CPU cores).

Concurrency and parallelism are easily conflated. A good resource to better understand the difference is a talk by Rob Pike (co-creator of Go), titled Concurrency is not Parallelism.
From The Go Blog, “concurrency is the composition of independently executing processes, while parallelism is the simultaneous execution of (possibly related) computations. Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.”
Global Interpreter Lock (GIL): The Global Interpreter Lock (GIL) is a Python (more precisely, CPython) construct that allows only one thread to hold control of the Python interpreter at a time. It was introduced to keep CPython’s memory management (reference counting) safe when multiple threads run within the same process. For a more comprehensive understanding, I recommend What is the Python Global Interpreter Lock (GIL) from Real Python.
Event Loop: The event loop is a construct in concurrent programming (or event-based systems) that manages the execution of multiple tasks. Think of it like an infinitely running queue that monitors and manages the tasks that we want to run concurrently.
Coroutines: Coroutines are similar to functions, with the key difference being that they allow us to pause and restart their execution at specific points (usually during I/O operations).
Tasks: Tasks are wrappers around coroutines that enable us to await the result of the coroutine. The relationship between tasks and coroutines in Python and how they relate to the concept of futures will be explained in greater detail in the next article in this series. I will also cite code snippets from Python’s asyncio source code to further illustrate this relationship.
Sockets: Sockets are low-level abstractions that allow us to read and write data at the OS level. You usually do not need to interact with them directly unless you’re doing some form of network programming. asyncio abstracts this detail away from you, although we will dive into how it works in a future article in this series.