Python Threading: Run Code Concurrently & Speed Up Your Scripts
Threading lets you run multiple tasks at the same time — or at least give that illusion. In this guide, you’ll learn how Python’s threading module works, when to use it, and how ThreadPoolExecutor makes concurrent code dramatically simpler. By the end, we’ll take a real image-download script from 23 seconds down to 5.
Why Use Threading?
The core reason to use threading is speed. When your program runs tasks one after the other — synchronously — it just sits and waits between each step. Threading lets you kick off multiple tasks and have them overlap in time, so you’re not sitting idle.
Here’s what synchronous execution looks like. Two calls to a do_something() function that sleeps for 1 second each:
import time

def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping")

start = time.perf_counter()

do_something()
do_something()

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 2.0 seconds
Each call waits for the last one to finish before starting. If you ran this 10 times, you’d sit around for 10 seconds — even though your CPU is barely doing any work.
“Running tasks one after another like this is called running synchronously. It’s often the right default — but not when your tasks spend most of their time waiting.”
I/O-Bound vs CPU-Bound Tasks
Before reaching for threads, it helps to understand why they work for some problems and not others. Tasks generally fall into two categories:
- I/O-bound: the bottleneck is waiting — network requests, downloading files, reading from disk, database queries. While one thread waits, others can run. Threading helps a lot here.
- CPU-bound: the bottleneck is computation — resizing images, number crunching, data processing. Threads won't help because of Python's GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time. Use multiprocessing instead.
Threading doesn’t actually run code at the same time — it gives the illusion of doing so. When a thread is waiting (sleeping, waiting for a server response), Python can switch to another thread and make progress there. This is called concurrency, not parallelism.
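To see why the distinction matters, here's a minimal sketch that times a purely CPU-bound function run twice sequentially and twice in threads. The function count_squares is a made-up stand-in for any pure-computation task; on CPython, the threaded version is typically no faster (and often slightly slower) because the GIL serializes the work:

```python
import time
import threading

def count_squares(n):
    # CPU-bound: pure computation, no waiting
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 1_000_000
results = []

# Run twice, one after the other
start = time.perf_counter()
for _ in range(2):
    results.append(count_squares(N))
sequential = time.perf_counter() - start

# Run twice in threads — the GIL means only one can compute at a time
threads = [threading.Thread(target=count_squares, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"Sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Compare this with the sleep-based examples below, where threading does pay off: time.sleep releases the GIL while waiting, so other threads can run.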
Basic Threading Example
Let’s take that 2-second synchronous script and make it concurrent using the threading module — which is in the standard library, so no install needed.
import time
import threading

def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping")

start = time.perf_counter()

t1 = threading.Thread(target=do_something)
t2 = threading.Thread(target=do_something)

t1.start()
t2.start()

t1.join()
t2.join()

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 1.0 seconds
target=do_something — pass the function reference, not the call. No parentheses.
.join() — tells the main script to wait for that thread to finish before moving on. Without it, your script may print “finished” before the threads are actually done.
Both threads started at nearly the same time. Each slept for one second. But since they overlapped, the whole script finished in ~1 second instead of 2. That’s the power of threading for I/O-bound work.
Loops, Thread Lists & Joins
Manually creating t1, t2, t3… doesn’t scale. Here’s how to spin up 10 threads with a loop:
import time
import threading

def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping")

start = time.perf_counter()

threads = []
for _ in range(10):
    t = threading.Thread(target=do_something)
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 1.0 seconds (not 10!)
If you called t.join() inside the same loop where you start threads, each thread would finish completely before the next one starts. That makes it synchronous again — exactly what we’re trying to avoid.
Always start all threads first, collect them in a list, then join them in a second loop.
Passing Arguments to Threads
Use the args parameter when creating your thread — it takes a list (or tuple) of positional arguments to pass to your target function.
import time
import threading

def do_something(seconds):
    print(f"Sleeping {seconds} second(s)...")
    time.sleep(seconds)
    print("Done sleeping")

start = time.perf_counter()

threads = []
for _ in range(10):
    t = threading.Thread(target=do_something, args=[1.5])
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 1.5 seconds (not 15!)
Ten calls that each sleep 1.5 seconds — completing together in 1.5 seconds instead of 15. That’s a 10× speedup just from threading.
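threading.Thread also accepts a kwargs parameter, a dict of keyword arguments for the target function. Here's a small sketch; the message argument is a hypothetical addition to do_something, just to show keyword passing:

```python
import time
import threading

def do_something(seconds, message="Done sleeping"):
    print(f"Sleeping {seconds} second(s)...")
    time.sleep(seconds)
    print(message)

threads = []
for i in range(3):
    # kwargs takes a dict of keyword arguments for the target function
    t = threading.Thread(target=do_something,
                         kwargs={"seconds": 0.5, "message": f"Thread {i} done"})
    t.start()
    threads.append(t)

for t in threads:
    t.join()
```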
ThreadPoolExecutor — The Modern Way
Python 3.2 introduced concurrent.futures.ThreadPoolExecutor. It’s easier to use, handles thread management for you, and makes it trivial to switch between threading and multiprocessing if needed.
Using submit()
submit() schedules a function to run in a thread and returns a Future object — a reference to the pending result that you can inspect or wait on.
import time
import concurrent.futures

def do_something(seconds):
    print(f"Sleeping {seconds} second(s)...")
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

start = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:
    f1 = executor.submit(do_something, 1)
    f2 = executor.submit(do_something, 1)
    print(f1.result())
    print(f2.result())

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# → Finished in 1.0 seconds
Using the executor as a context manager (with) calls shutdown(wait=True) when the block exits, which waits for all submitted tasks to finish — so even if you don't retrieve results inside the block, your script won't continue until the threads are done.
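A Future is more than a handle for .result(). A brief sketch of inspecting one (the False/True prints are what you'd normally expect, though strictly the first one is a race with the worker thread):

```python
import time
import concurrent.futures

def do_something(seconds):
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(do_something, 1)
    print(future.done())      # almost certainly False: the task is still sleeping
    result = future.result()  # blocks until the thread finishes
    print(future.done())      # True: the result is available

print(result)
# → Done sleeping...1
```

You can also pass a timeout, as in future.result(timeout=5), to raise concurrent.futures.TimeoutError instead of blocking forever.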
Submitting Many Tasks with a List Comprehension
To run a function many times, combine submit() with a list comprehension and the as_completed() helper to get results as each thread finishes:
import time
import concurrent.futures

def do_something(seconds):
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

start = time.perf_counter()

seconds_list = [5, 4, 3, 2, 1]

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = [executor.submit(do_something, s) for s in seconds_list]
    for f in concurrent.futures.as_completed(results):
        print(f.result())

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# Results print in order of completion: 1, 2, 3, 4, 5
# → Finished in 5.0 seconds
as_completed() yields results as each thread finishes — so shorter tasks print first, even if they were submitted last. This is useful when you want to process results as they arrive rather than waiting for everything to complete.
The map() Method
If you’re familiar with Python’s built-in map(), executor.map() works the same way — except it runs each call in a separate thread. Unlike as_completed(), it returns results in the order they were submitted (not the order they finished).
import time
import concurrent.futures

def do_something(seconds):
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

start = time.perf_counter()

seconds_list = [5, 4, 3, 2, 1]

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(do_something, seconds_list)
    for result in results:
        print(result)

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} seconds")
# Results print in submission order: 5, 4, 3, 2, 1
# → Finished in 5.0 seconds
| Method | Returns | Result Order | Best For |
|---|---|---|---|
| submit() + as_completed() | Future objects | Completion order | Processing results as they arrive (recommended) |
| map() | Results iterator | Submission order | Simple batch runs, familiar map-style API (simpler) |
Real-World Example: Downloading Images
Let’s apply everything to a realistic problem — downloading 15 high-resolution photos from the web. Here’s the synchronous version first:
import time
import requests

img_urls = [
    'https://unsplash.com/photos/abc/download',
    'https://unsplash.com/photos/def/download',
    # ... 13 more URLs
]

start = time.perf_counter()

for url in img_urls:
    img_bytes = requests.get(url).content
    img_name = url.split('/')[4]
    with open(f'{img_name}.jpg', 'wb') as f:
        f.write(img_bytes)
    print(f'{img_name} was downloaded...')

finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} seconds')
# → Finished in 23.0 seconds
23 seconds for 15 images. Each download waits for the last to finish — classic I/O-bound bottleneck. Now let’s add threading with just a few changes:
import time
import requests
import concurrent.futures

img_urls = [
    'https://unsplash.com/photos/abc/download',
    'https://unsplash.com/photos/def/download',
    # ... 13 more URLs
]

def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[4]
    with open(f'{img_name}.jpg', 'wb') as f:
        f.write(img_bytes)
    print(f'{img_name} was downloaded...')

start = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, img_urls)

finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} seconds')
# → Finished in 5.0 seconds
The key changes: wrap the download logic in a function, import concurrent.futures, and replace the for loop with executor.map(). That’s it.
Downloading is almost entirely waiting — for DNS lookups, TCP connections, and server responses. While one thread waits for a response, others are making new requests. The threads aren’t doing anything CPU-heavy; they’re just queued up on different network operations simultaneously.
Quick-Reference: Threading Step-by-Step
1. Identify whether your task is I/O-bound (network, disk, database). If yes, threading will help.
2. Wrap your work in a function that accepts any varying inputs as arguments.
3. Import concurrent.futures and open a ThreadPoolExecutor context manager.
4. Use executor.map(fn, iterable) for simple batch runs, or executor.submit(fn, *args) for fine-grained control.
5. Use as_completed() if you want results as they finish, or iterate map() results in submission order.
6. Handle exceptions — they're raised when you call .result(), not when the thread runs.
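That last point trips people up: a thread can fail silently until you ask for its result. A short sketch, using a made-up risky_task function that fails for one input, showing where to put the try/except:

```python
import concurrent.futures

def risky_task(n):
    if n == 3:
        raise ValueError(f"Bad input: {n}")
    return n * n

results, errors = [], []

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(risky_task, n) for n in range(5)]
    for f in concurrent.futures.as_completed(futures):
        try:
            results.append(f.result())  # the exception is re-raised here
        except ValueError as e:
            errors.append(str(e))

print(sorted(results))  # [0, 1, 4, 16]
print(errors)           # ['Bad input: 3']
```

One failed task doesn't stop the others; each Future carries its own result or exception.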
Threading vs Multiprocessing — When to Use Which
Threading — use for I/O-bound tasks: downloading files, API calls, reading/writing to disk, database queries. Threads share memory and have low overhead, but they don't truly run in parallel due to Python's GIL.
Multiprocessing — use for CPU-bound tasks: image resizing, number crunching, data transformation. Separate processes bypass the GIL and run in true parallel. Higher memory cost, but real CPU gains.
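Because threads share memory, two threads updating the same variable can interleave and lose updates. When threads write to shared state, guard it with threading.Lock. A minimal sketch, with a made-up shared counter:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:  # only one thread may update the counter at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
# → 400000
```

Without the lock, the final count can come up short, because `counter += 1` is a read-modify-write that another thread can interrupt.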
For CPU-bound tasks, adding threads can actually slow your script down due to the overhead of creating and destroying threads, plus GIL contention. If your bottleneck is computation, not waiting — reach for multiprocessing instead.
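Because concurrent.futures gives threads and processes the same interface, the switch is small: swap ThreadPoolExecutor for ProcessPoolExecutor. A sketch using a made-up cpu_heavy function (the `__main__` guard is required on platforms that spawn worker processes, such as Windows and macOS):

```python
import time
import concurrent.futures

def cpu_heavy(n):
    # CPU-bound: pure computation, no waiting
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    numbers = [200_000] * 4

    start = time.perf_counter()
    # Same API as ThreadPoolExecutor, but real parallelism across CPU cores
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(cpu_heavy, numbers))

    finish = time.perf_counter()
    print(f"Finished in {round(finish - start, 2)} seconds")
```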
ThreadPoolExecutor is the modern, recommended way to use threads in Python — fewer lines, automatic cleanup, and easy to scale. For CPU-heavy work, multiprocessing is the right tool. Start with threads, measure, and switch if needed.
