Python Threading: Run Code Concurrently & Speed Up Your Scripts
Threading lets you run multiple tasks at the same time — or at least give that illusion. In this guide, you’ll learn how Python’s threading module works, when to use it, and how ThreadPoolExecutor makes concurrent code dramatically simpler. By the end, we’ll take a real image-download script from 23 seconds down to 5.
Why Use Threading?
The core reason to use threading is speed. When your program runs tasks one after the other — synchronously — it just sits and waits between each step. Threading lets you kick off multiple tasks and have them overlap in time, so you’re not sitting idle.
Here’s what synchronous execution looks like. Two calls to a do_something() function that sleeps for 1 second each:
import time

def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping")

start = time.perf_counter()

do_something()
do_something()

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 2.0 seconds
Each call waits for the last one to finish before starting. If you ran this 10 times, you’d sit around for 10 seconds — even though your CPU is barely doing any work.
“Running tasks one after another like this is called running synchronously. It’s often the right default — but not when your tasks spend most of their time waiting.”
I/O-Bound vs CPU-Bound Tasks
Before reaching for threads, it helps to understand why they work for some problems and not others. Tasks generally fall into two categories:
- I/O-bound: the bottleneck is waiting — network requests, downloading files, reading from disk, database queries. While one thread waits, others can run. Threading helps a lot here.
- CPU-bound: the bottleneck is computation — resizing images, number crunching, data processing. Threads won't help because of Python's GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time. Use multiprocessing instead.
Threading doesn’t actually run code at the same time — it gives the illusion of doing so. When a thread is waiting (sleeping, waiting for a server response), Python can switch to another thread and make progress there. This is called concurrency, not parallelism.
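To see why the distinction matters, here's a minimal sketch that times a purely CPU-bound function run twice sequentially and twice in threads. The function count_squares is a made-up stand-in for any pure-computation task; on CPython, the threaded version is typically no faster (and often slightly slower) because the GIL serializes the work:

```python
import time
import threading

def count_squares(n):
    # CPU-bound: pure computation, no waiting
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 1_000_000
results = []

# Run twice, one after the other
start = time.perf_counter()
for _ in range(2):
    results.append(count_squares(N))
sequential = time.perf_counter() - start

# Run twice in threads — the GIL means only one can compute at a time
threads = [threading.Thread(target=count_squares, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"Sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Compare this with the sleep-based examples below, where threading does pay off: time.sleep releases the GIL while waiting, so other threads can run.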
Basic Threading Example
Let’s take that 2-second synchronous script and make it concurrent using the threading module — which is in the standard library, so no install needed.
import time
import threading

def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping")

start = time.perf_counter()

t1 = threading.Thread(target=do_something)
t2 = threading.Thread(target=do_something)

t1.start()
t2.start()

t1.join()
t2.join()

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 1.0 seconds
target=do_something — pass the function reference, not the call. No parentheses.
.join() — tells the main script to wait for that thread to finish before moving on. Without it, your script may print “finished” before the threads are actually done.
Both threads started at nearly the same time. Each slept for one second. But since they overlapped, the whole script finished in ~1 second instead of 2. That’s the power of threading for I/O-bound work.
Loops, Thread Lists & Joins
Manually creating t1, t2, t3… doesn’t scale. Here’s how to spin up 10 threads with a loop:
import time
import threading

def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping")

start = time.perf_counter()

threads = []
for _ in range(10):
    t = threading.Thread(target=do_something)
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 1.0 seconds (not 10!)
If you called t.join() inside the same loop where you start threads, each thread would finish completely before the next one starts. That makes it synchronous again — exactly what we’re trying to avoid.
Always start all threads first, collect them in a list, then join them in a second loop.
Passing Arguments to Threads
Use the args parameter when creating your thread — it takes a list (or tuple) of positional arguments to pass to your target function.
import time
import threading

def do_something(seconds):
    print(f"Sleeping {seconds} second(s)...")
    time.sleep(seconds)
    print("Done sleeping")

start = time.perf_counter()

threads = []
for _ in range(10):
    t = threading.Thread(target=do_something, args=[1.5])
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 1.5 seconds (not 15!)
Ten calls that each sleep 1.5 seconds — completing together in 1.5 seconds instead of 15. That’s a 10× speedup just from threading.
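threading.Thread also accepts a kwargs parameter, a dict of keyword arguments for the target function. Here's a small sketch; the message argument is a hypothetical addition to do_something, just to show keyword passing:

```python
import time
import threading

def do_something(seconds, message="Done sleeping"):
    print(f"Sleeping {seconds} second(s)...")
    time.sleep(seconds)
    print(message)

threads = []
for i in range(3):
    # kwargs takes a dict of keyword arguments for the target function
    t = threading.Thread(target=do_something,
                         kwargs={"seconds": 0.5, "message": f"Thread {i} done"})
    t.start()
    threads.append(t)

for t in threads:
    t.join()
```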
ThreadPoolExecutor — The Modern Way
Python 3.2 introduced concurrent.futures.ThreadPoolExecutor. It’s easier to use, handles thread management for you, and makes it trivial to switch between threading and multiprocessing if needed.
Using submit()
submit() schedules a function to run in a thread and returns a Future object — a reference to the pending result that you can inspect or wait on.
import time
import concurrent.futures

def do_something(seconds):
    print(f"Sleeping {seconds} second(s)...")
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

start = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:
    f1 = executor.submit(do_something, 1)
    f2 = executor.submit(do_something, 1)
    print(f1.result())
    print(f2.result())

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 1.0 seconds
Using the executor as a context manager (with) calls shutdown(wait=True) when the block exits, which waits for all submitted tasks to finish — so even if you don't retrieve results inside the block, your script won't continue until the threads are done.
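A Future is more than a handle for .result(). A brief sketch of inspecting one (the False/True prints are what you'd normally expect, though strictly the first one is a race with the worker thread):

```python
import time
import concurrent.futures

def do_something(seconds):
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(do_something, 1)
    print(future.done())      # almost certainly False: the task is still sleeping
    result = future.result()  # blocks until the thread finishes
    print(future.done())      # True: the result is available

print(result)
# → Done sleeping...1
```

You can also pass a timeout, as in future.result(timeout=5), to raise concurrent.futures.TimeoutError instead of blocking forever.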
Submitting Many Tasks with a List Comprehension
To run a function many times, combine submit() with a list comprehension and the as_completed() helper to get results as each thread finishes:
import time
import concurrent.futures

def do_something(seconds):
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

start = time.perf_counter()

seconds_list = [5, 4, 3, 2, 1]

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = [executor.submit(do_something, s) for s in seconds_list]
    for f in concurrent.futures.as_completed(results):
        print(f.result())

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# Results print in order of completion: 1, 2, 3, 4, 5
# → Finished in 5.0 seconds
as_completed() yields results as each thread finishes — so shorter tasks print first, even if they were submitted last. This is useful when you want to process results as they arrive rather than waiting for everything to complete.
The map() Method
If you’re familiar with Python’s built-in map(), executor.map() works the same way — except it runs each call in a separate thread. Unlike as_completed(), it returns results in the order they were submitted (not the order they finished).
import time
import concurrent.futures

def do_something(seconds):
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

start = time.perf_counter()

seconds_list = [5, 4, 3, 2, 1]

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(do_something, seconds_list)
    for result in results:
        print(result)

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# Results print in submission order: 5, 4, 3, 2, 1
# → Finished in 5.0 seconds
| Method | Returns | Result Order | Best For |
|---|---|---|---|
| submit() + as_completed() | Future objects | Completion order | Processing results as they arrive (recommended) |
| map() | Results iterator | Submission order | Simple batch runs, familiar map-style API (simpler) |
Real-World Example: Downloading Images
Let’s apply everything to a realistic problem — downloading 15 high-resolution photos from the web. Here’s the synchronous version first:
import time
import requests

img_urls = [
    'https://unsplash.com/photos/abc/download',
    'https://unsplash.com/photos/def/download',
    # ... 13 more URLs
]

start = time.perf_counter()

for url in img_urls:
    img_bytes = requests.get(url).content
    img_name = url.split('/')[4]
    with open(f'{img_name}.jpg', 'wb') as f:
        f.write(img_bytes)
    print(f'{img_name} was downloaded...')

finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} seconds')
# → Finished in 23.0 seconds
23 seconds for 15 images. Each download waits for the last to finish — classic I/O-bound bottleneck. Now let’s add threading with just a few changes:
import time
import requests
import concurrent.futures

img_urls = [
    'https://unsplash.com/photos/abc/download',
    'https://unsplash.com/photos/def/download',
    # ... 13 more URLs
]

def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[4]
    with open(f'{img_name}.jpg', 'wb') as f:
        f.write(img_bytes)
    print(f'{img_name} was downloaded...')

start = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, img_urls)

finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} seconds')
# → Finished in 5.0 seconds
The key changes: wrap the download logic in a function, import concurrent.futures, and replace the for loop with executor.map(). That’s it.
Downloading is almost entirely waiting — for DNS lookups, TCP connections, and server responses. While one thread waits for a response, others are making new requests. The threads aren’t doing anything CPU-heavy; they’re just queued up on different network operations simultaneously.
Quick-Reference: Threading Step-by-Step
1. Identify whether your task is I/O-bound (network, disk, database). If yes, threading will help.
2. Wrap your work in a function that accepts any varying inputs as arguments.
3. Import concurrent.futures and open a ThreadPoolExecutor context manager.
4. Use executor.map(fn, iterable) for simple batch runs, or executor.submit(fn, *args) for fine-grained control.
5. Use as_completed() if you want results as they finish, or iterate map() results in submission order.
6. Handle exceptions — they're raised when you call .result(), not when the thread runs.
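That last point trips people up: a thread can fail silently until you ask for its result. A short sketch, using a made-up risky_task function that fails for one input, showing where to put the try/except:

```python
import concurrent.futures

def risky_task(n):
    if n == 3:
        raise ValueError(f"Bad input: {n}")
    return n * n

results, errors = [], []

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(risky_task, n) for n in range(5)]
    for f in concurrent.futures.as_completed(futures):
        try:
            results.append(f.result())  # the exception is re-raised here
        except ValueError as e:
            errors.append(str(e))

print(sorted(results))  # [0, 1, 4, 16]
print(errors)           # ['Bad input: 3']
```

One failed task doesn't stop the others; each Future carries its own result or exception.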
Threading vs Multiprocessing — When to Use Which
Threading — use for I/O-bound tasks: downloading files, API calls, reading/writing to disk, database queries. Threads share memory and have low overhead, but they don't truly run in parallel due to Python's GIL.
Multiprocessing — use for CPU-bound tasks: image resizing, number crunching, data transformation. Separate processes bypass the GIL and run in true parallel. Higher memory cost, but real CPU gains.
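Because threads share memory, two threads updating the same variable can interleave and lose updates. When threads write to shared state, guard it with threading.Lock. A minimal sketch, with a made-up shared counter:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:  # only one thread may update the counter at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
# → 400000
```

Without the lock, the final count can come up short, because `counter += 1` is a read-modify-write that another thread can interrupt.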
For CPU-bound tasks, adding threads can actually slow your script down due to the overhead of creating and destroying threads, plus GIL contention. If your bottleneck is computation, not waiting — reach for multiprocessing instead.
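Because concurrent.futures gives threads and processes the same interface, the switch is small: swap ThreadPoolExecutor for ProcessPoolExecutor. A sketch using a made-up cpu_heavy function (the `__main__` guard is required on platforms that spawn worker processes, such as Windows and macOS):

```python
import time
import concurrent.futures

def cpu_heavy(n):
    # CPU-bound: pure computation, no waiting
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    numbers = [200_000] * 4

    start = time.perf_counter()
    # Same API as ThreadPoolExecutor, but real parallelism across CPU cores
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(cpu_heavy, numbers))

    finish = time.perf_counter()
    print(f"Finished in {round(finish - start, 2)} seconds")
```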
ThreadPoolExecutor is the modern, recommended way to use threads in Python — fewer lines, automatic cleanup, and easy to scale. For CPU-heavy work, multiprocessing is the right tool. Start with threads, measure, and switch if needed.
