Python Threading Guide

Introduction

Threading is a technique for running multiple operations concurrently within a single process. In Python, threading is particularly useful for I/O-bound tasks where operations spend significant time waiting for external resources (network requests, file operations, etc.). This guide covers Python's threading capabilities, from basic thread creation to advanced thread management with the concurrent.futures module.

Threading vs Multiprocessing: When to Use Each

  • Threading: Use for I/O-bound tasks (network requests, file operations, database queries)
  • Multiprocessing: Use for CPU-bound tasks (complex calculations, data processing)

Why the difference?
Due to Python's Global Interpreter Lock (GIL), only one thread can execute Python bytecode at a time. This means threading doesn't provide true parallelism for CPU-bound tasks, but it's very effective for I/O-bound tasks where threads spend most of their time waiting.
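
To see this in practice, here is a minimal sketch (the countdown workload is illustrative) that times a CPU-bound loop run sequentially and then across two threads. On a standard CPython build the threaded run takes about as long as the sequential one, because the GIL serializes the bytecode:

import threading
import time

def count_down(n):
    # Pure-Python CPU work; the running thread holds the GIL throughout
    while n > 0:
        n -= 1

N = 10_000_000

# Sequential: two calls back to back
start = time.perf_counter()
count_down(N)
count_down(N)
print(f'Sequential: {time.perf_counter() - start:.2f} second(s)')

# Threaded: two threads, but only one executes bytecode at a time
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print(f'Threaded: {time.perf_counter() - start:.2f} second(s)')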

Basic Threading: Using the threading Module

Creating and Starting Threads

The most basic way to use threads is with the threading.Thread class:

import threading
import time

def do_something():
    print('Sleeping 5 seconds')
    time.sleep(5)
    print('Done sleeping')

# Create thread objects
thread1 = threading.Thread(target=do_something)
thread2 = threading.Thread(target=do_something)

# Start threads
thread1.start()
thread2.start()

Waiting for Threads with join()

The join() method allows the main thread to wait for the completion of other threads:

# Start threads
thread1.start()
thread2.start()

# Wait for threads to complete
thread1.join()
thread2.join()

print('All threads completed')

Passing Arguments to Thread Functions

You can pass arguments to the target function using the args parameter (for positional arguments) or kwargs (for keyword arguments):

def do_something(seconds):
    print(f'Sleeping {seconds} seconds')
    time.sleep(seconds)
    print('Done sleeping')

# Pass arguments as a list
thread = threading.Thread(target=do_something, args=[2])
# Or as a tuple
thread = threading.Thread(target=do_something, args=(2,))
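
Keyword arguments are passed the same way, as a dict:

# Or as keyword arguments
thread = threading.Thread(target=do_something, kwargs={'seconds': 2})
thread.start()
thread.join()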

Creating Multiple Threads

You can create and manage multiple threads with a list:

threads = []

# Create and start 10 threads
for _ in range(10):
    thread = threading.Thread(target=do_something, args=[2])
    thread.start()
    threads.append(thread)

# Wait for all threads to complete
for thread in threads:
    thread.join()

Advanced Threading: concurrent.futures

Introduced in Python 3.2, the concurrent.futures module provides a high-level interface for asynchronously executing callables.

ThreadPoolExecutor

The ThreadPoolExecutor manages a pool of worker threads, handing tasks to workers as they become free:

import concurrent.futures
import time

def do_something(seconds):
    print(f'Sleeping {seconds} second(s)')
    time.sleep(seconds)
    return f'Done sleeping {seconds}'

# Using context manager to ensure cleanup
with concurrent.futures.ThreadPoolExecutor() as executor:
    seconds = [5, 4, 3, 2, 1]
    results = executor.map(do_something, seconds)

    for result in results:
        print(result)

The submit() Method and Futures

The submit() method schedules a function to be executed and returns a Future object representing the execution:

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Schedule execution and get Future objects
    f1 = executor.submit(do_something, 5)
    f2 = executor.submit(do_something, 4)

    # Future.result() blocks until the result is available
    print(f1.result())
    print(f2.result())

Processing Completed Futures with as_completed()

The as_completed() function yields futures as they complete, which is useful when you want to process results as soon as they're ready:

with concurrent.futures.ThreadPoolExecutor() as executor:
    seconds = [5, 4, 3, 2, 1]
    futures = [executor.submit(do_something, sec) for sec in seconds]

    for future in concurrent.futures.as_completed(futures):
        print(future.result())

Comparing map() and submit()

  • map(): Simpler syntax, results are returned in the same order as inputs
  • submit(): More flexible, allows using as_completed() to process results as they finish

Thread Execution Visualization

Sequential Execution

In sequential execution, each function call must complete before the next one begins, resulting in the total execution time being the sum of all individual execution times.

Concurrent Execution

With concurrent execution using threads, multiple functions can run simultaneously. For I/O-bound tasks, this significantly reduces the total execution time.

Timing Thread Execution

To measure execution time, use time.perf_counter():

import time
import concurrent.futures

start = time.perf_counter()

# Thread execution code here...

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
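
Putting the pieces together, here is a small end-to-end sketch (reusing do_something from earlier, with a fixed pool size so the timing claim holds) that contrasts ten 1-second sleeps run sequentially and in a thread pool; the pooled version finishes in roughly one second instead of ten:

import time
import concurrent.futures

def do_something(seconds):
    time.sleep(seconds)
    return f'Done sleeping {seconds}'

# Sequential: roughly 10 seconds
start = time.perf_counter()
for _ in range(10):
    do_something(1)
print(f'Sequential: {round(time.perf_counter() - start, 2)} second(s)')

# Concurrent: roughly 1 second, since the sleeps overlap
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    list(executor.map(do_something, [1] * 10))
print(f'Concurrent: {round(time.perf_counter() - start, 2)} second(s)')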

Practical Example: Downloading Images with Threads

For I/O-bound tasks like downloading multiple images, threads can significantly improve performance:

import requests
import time
import concurrent.futures

img_urls = [
    'https://images.unsplash.com/photo-1516117172878-fd2c41f4a759',
    'https://images.unsplash.com/photo-1532009324734-20a7a5813719',
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e'
]

def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')

start = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, img_urls)

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')

Thread Synchronization

When threads access shared resources, synchronization is necessary to prevent race conditions.

Using Locks

import threading
import time

counter = 0
lock = threading.Lock()

def increment_counter():
    global counter
    with lock:  # Automatically acquires and releases the lock
        local_counter = counter
        time.sleep(0.1)  # Simulate some processing
        counter = local_counter + 1
        print(f'Counter: {counter}')

threads = []
for _ in range(10):
    thread = threading.Thread(target=increment_counter)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(f'Final counter: {counter}')
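
To see what the lock is preventing, here is the same counter with the lock removed. Every thread reads the stale value of counter before any other thread writes back, so the final count is typically 1 instead of 10 (a classic lost update):

import threading
import time

counter = 0

def increment_unsafe():
    global counter
    local_counter = counter        # Each thread reads 0 ...
    time.sleep(0.1)                # ... while the others also read 0 ...
    counter = local_counter + 1    # ... and each writes back 1

threads = [threading.Thread(target=increment_unsafe) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f'Final counter: {counter}')  # Typically 1, not 10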

Other Synchronization Primitives

Python's threading module provides several synchronization primitives:

  1. Semaphore: Limits access to a shared resource to a fixed number of threads

    semaphore = threading.Semaphore(value=2)  # Allow 2 threads at once
  2. Event: Signals between threads

    event = threading.Event()
    # In one thread
    event.set() # Signal the event
    # In another thread
    event.wait() # Wait for the event
  3. Condition: More complex signaling between threads

    condition = threading.Condition()
    # Notify waiting threads
    with condition:
        condition.notify_all()
  4. Queue: Thread-safe queue for producer-consumer patterns (see the sketch below)

    from queue import Queue
    task_queue = Queue()
    task_queue.put(item) # Producer
    item = task_queue.get() # Consumer
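
Expanding on item 4, here is a minimal producer-consumer sketch (the worker functions and sentinel value are illustrative); queue.Queue does all the locking internally, so no explicit Lock is needed:

import threading
from queue import Queue

task_queue = Queue()
SENTINEL = None  # Tells the consumer to stop

def producer():
    for i in range(5):
        task_queue.put(i)        # Thread-safe put
    task_queue.put(SENTINEL)

def consumer():
    while True:
        item = task_queue.get()  # Blocks until an item is available
        if item is SENTINEL:
            break
        print(f'Processed {item}')

threading.Thread(target=producer).start()
consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()
consumer_thread.join()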

The Global Interpreter Lock (GIL)

The Global Interpreter Lock (GIL) is a fundamental and often debated feature of CPython, the default and most widely used implementation of Python. It's a mutex (mutual exclusion lock) that restricts the execution of Python bytecode to only one thread at a time within a single CPython interpreter process.

What is the GIL?

  • The GIL functions like a "traffic light for threads" in your program
  • Only one thread can execute Python bytecode at any given moment
  • When one thread holds the GIL, all other Python threads must wait, even if they're ready to run
  • The GIL is specific to CPython (the standard Python implementation) and isn't present in all Python implementations (like Jython or IronPython)

Why the GIL Exists

The GIL was introduced in Python's early days primarily to make CPython thread-safe:

  • Memory Management: CPython uses reference counting for memory management, which tracks how many references point to an object and automatically frees memory when the count drops to zero
  • Thread Safety: Without a lock like the GIL, multiple threads could simultaneously update these reference counts, leading to race conditions, crashes, or corrupted data
  • Implementation Simplicity: The GIL provides a simple way to prevent such issues, making the implementation of the interpreter and C extensions simpler

How the GIL Works in Practice

  • Thread Execution: When a Python program uses multiple threads, the GIL ensures that only one thread can be actively running Python code
  • Automatic Release: The running thread periodically releases the GIL, every 5 milliseconds by default (the switch interval, tunable via sys.setswitchinterval()), giving other threads a chance to acquire it and run; see the snippet after this list
  • GIL During I/O Operations: Most I/O operations release the GIL, which is why threading is beneficial for I/O-bound tasks
  • Impact of Long-Running Operations:
    • Blocking Extension Code: If a thread calls a long-running, "blocking" extension code (e.g., C library calls) that doesn't explicitly release the GIL, other Python threads remain blocked
    • Explicit GIL Release: Non-Python code, particularly C extensions, can explicitly release the GIL during long-running operations, allowing other Python threads to run concurrently
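
The switch interval mentioned above can be inspected and tuned at runtime through the standard sys API:

import sys

print(sys.getswitchinterval())  # 0.005 seconds (5 ms) by default
sys.setswitchinterval(0.01)     # Let a thread hold the GIL for up to 10 ms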

Impact on Different Concurrency Models

Multithreading

  • I/O-Bound Tasks: Threading is highly beneficial for tasks that spend most of their time waiting for external resources
  • CPU-Bound Tasks: Threading offers little to no performance improvement for tasks that require heavy computation, as only one thread can execute Python bytecode at a time

Multiprocessing

  • Creates separate processes, each with its own Python interpreter and GIL
  • Allows programs to fully leverage multiple CPU cores and achieve true parallelism
  • Best choice for CPU-bound tasks

Asyncio

  • Runs on a single thread with its own event loop
  • Coroutines explicitly cooperate by awaiting I/O operations, giving control back to the event loop
  • Avoids the overhead of context switching between threads

Python 3.13: The Game-Changing "No-GIL" Mode

Python 3.13 introduces a significant innovation: an experimental "no-GIL" or "free-threaded" version of Python, resulting from the acceptance of PEP 703.

Purpose and Impact on Parallelism

  • Enables Python programs to use threading for CPU-based parallelism
  • Allows CPU-bound work to run at full speed on different threads
  • For I/O-bound activities, the no-GIL build shows no performance difference compared to builds with the GIL

How to Access

  • During installation of Python 3.13, users can choose to download the free-threaded binaries
  • On Windows, this provides two Python 3.13 instances:
    • Regular build with the GIL (launched with py 3.13)
    • No-GIL build (launched with py 3.13t)
  • Developers can explicitly build CPython without the GIL using the --disable-gil flag
  • In code, check if the GIL is enabled with: import sys; print(sys._is_gil_enabled())

Technical Implementation

  • Reference counters are updated using atomic operations to ensure thread safety
  • The cyclic garbage collector has been updated to handle threads safely
  • These changes enable thread-safe memory management without the need for the GIL

Important Considerations

  • The free-threaded build is delivered as a separate binary for testing purposes
  • Code that wasn't thread-safe before won't automatically become thread-safe in the no-GIL build
  • Developers still need to implement mechanisms like locks or queues for thread safety where shared state is involved
  • The atomic operations introduce some overhead, potentially slowing down single-threaded programs marginally
  • Currently intended for experimentation, not production use

Performance Implications

  • In the free-threaded build, threading can achieve performance comparable to, or even slightly better than, multiprocessing for CPU-bound work, since it avoids process startup and inter-process communication overhead
  • High-level abstractions such as ThreadPoolExecutor remain the recommended way to structure the work and realize these gains

Bypassing and Avoiding the GIL

To overcome the GIL's limitations for CPU-bound tasks, developers can:

  1. Use multiprocessing: The primary way to achieve true parallelism in Python by running code in separate processes
  2. Leverage C Extensions: Libraries like NumPy or TensorFlow perform heavy computations outside of the GIL's control
  3. Consider the no-GIL mode: In Python 3.13+, experiment with the free-threaded build for CPU-bound tasks
  4. Alternative Python Implementations: Consider Jython, IronPython (no GIL), or PyPy (optimized performance)
  5. Use asyncio for I/O-bound tasks: Often preferred over threading due to efficiency and lower overhead

With these strategies, Python developers can work effectively with or around the GIL to create responsive, scalable applications.

Best Practices

  1. Use thread pools (ThreadPoolExecutor) instead of manually creating threads
  2. Prefer with statement for context management
  3. Use thread synchronization when accessing shared resources
  4. Keep the GIL in mind - threading is best for I/O-bound tasks
  5. Consider using asyncio for newer code if appropriate
  6. Be careful with thread termination - threads should exit cleanly
  7. Benchmark your code - measure the actual performance improvement from threading

Common Pitfalls

  1. Race conditions - Use proper synchronization for shared resources
  2. Deadlocks - Be careful when acquiring multiple locks (see the sketch after this list)
  3. Thread leakage - Always join threads or use context managers
  4. Over-threading - Too many threads can cause overhead
  5. Using threading for CPU-bound tasks - Consider multiprocessing instead
  6. Thread safety - Be careful with non-thread-safe libraries
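
For pitfall 2, the classic deadlock occurs when two threads acquire the same two locks in opposite orders, each ending up waiting for the lock the other holds. A minimal sketch of the standard remedy, acquiring locks in one globally consistent order:

import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer():
    # Deadlock-prone pattern: one thread takes lock_a then lock_b while
    # another takes lock_b then lock_a. The fix: every thread acquires
    # the locks in the same order.
    with lock_a:
        with lock_b:
            pass  # Critical section touching both shared resources

threads = [threading.Thread(target=transfer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()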

Multiprocessing in Python

Multiprocessing involves running multiple processes, each with its own Python interpreter and memory space. This allows for true parallelism, making multiprocessing ideal for CPU-bound tasks that need to bypass the Global Interpreter Lock (GIL).

Basic Multiprocessing

import multiprocessing
import time

def worker(name):
    print(f"Worker {name} starting")
    time.sleep(2)  # Simulate some work
    print(f"Worker {name} finished")

if __name__ == '__main__':  # This guard is important for Windows
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()  # Wait for all processes to finish

Process Pools with concurrent.futures

Similar to thread pools, you can use ProcessPoolExecutor for managing a pool of worker processes:

import concurrent.futures
import time

def cpu_bound_task(number):
    return sum(i * i for i in range(number))

if __name__ == '__main__':
    numbers = [10000000 + x for x in range(8)]

    start = time.perf_counter()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(cpu_bound_task, numbers)

    finish = time.perf_counter()
    print(f'Finished in {round(finish-start, 2)} second(s)')

Process vs Thread Visualization

The key distinctions between processes and threads:

  • Each process has its own memory space, Python interpreter, and system resources
  • Threads exist within a process
  • Threads within the same process share memory
  • Different processes have separate memory spaces
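
A small sketch makes the memory-sharing point concrete: a list appended to by a thread is visible to the main thread, while a child process mutates only its own copy of the list:

import threading
import multiprocessing

def append_item(target_list):
    target_list.append('added')

if __name__ == '__main__':
    data = []

    # Threads share the parent's memory: the change is visible
    t = threading.Thread(target=append_item, args=(data,))
    t.start()
    t.join()
    print(data)  # ['added']

    data = []

    # Processes get their own memory: the parent's list is untouched
    p = multiprocessing.Process(target=append_item, args=(data,))
    p.start()
    p.join()
    print(data)  # []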

Asynchronous Programming with Asyncio

Asyncio is a Python library for writing concurrent code using the async/await syntax. It's designed for I/O-bound tasks and uses an event loop to manage and schedule tasks.

Key Concepts in Asyncio

  1. Coroutines: Functions defined with async def. These are the building blocks of asyncio.
  2. Event Loop: The core of asyncio that manages the execution of tasks.
  3. Tasks: Wrappers around coroutines that are scheduled on the event loop.
  4. await: Pauses the execution of a coroutine, yielding control back to the event loop.

Basic Asyncio Example

import asyncio

async def task(name):
    print(f"Task {name} starting")
    await asyncio.sleep(2)  # Simulate an I/O-bound operation
    print(f"Task {name} finished")
    return f"Result from {name}"

async def main():
    # Run multiple coroutines concurrently
    results = await asyncio.gather(
        task("A"),
        task("B"),
        task("C"),
    )
    print(results)

# In Python 3.7+
asyncio.run(main())

Creating and Managing Tasks

import asyncio

async def my_coroutine():
    await asyncio.sleep(2)
    return "Done"

async def main():
    # Create a task and start it immediately
    task = asyncio.create_task(my_coroutine())

    # Do other work while the task runs
    print("Doing something else while waiting...")

    # Wait for the task to complete
    result = await task
    print(f"Task result: {result}")

asyncio.run(main())

Handling CPU-Bound Tasks in Asyncio

Asyncio is not well-suited for CPU-bound tasks because they can block the event loop. However, you can offload CPU-bound tasks to a separate thread or process:

import asyncio
import time

def cpu_bound_task(n):
    time.sleep(n)  # Stand-in for a long, blocking CPU-bound computation
    return n * n

async def main():
    # Offload to a separate thread (asyncio.to_thread requires Python 3.9+)
    result = await asyncio.to_thread(cpu_bound_task, 2)
    print(f"Result: {result}")

asyncio.run(main())

Comparing Concurrency Models

When to Use Each Approach

  1. Multithreading:

    • Best for I/O-bound tasks like network operations or file I/O
    • Use when you need shared state between threads
    • Not ideal for CPU-bound tasks due to the GIL
  2. Multiprocessing:

    • Ideal for CPU-bound tasks that need true parallelism
    • Use when you need to bypass the GIL
    • Best for heavy computational workloads
  3. Asyncio:

    • Excellent for I/O-bound tasks with many concurrent operations
    • Ideal for building high-performance network servers
    • More readable and maintainable than callback-based approaches
    • Not suitable for CPU-bound tasks without offloading

Next Steps

After mastering these concurrency models, consider exploring:

  • Hybrid approaches combining asyncio with threads or processes
  • Distributed computing frameworks like Dask or Celery
  • Thread-safe data structures such as queue.Queue and collections.deque
  • Advanced synchronization patterns like producer-consumer and thread pools