Python Example: Multithreading with Concurrent Futures

Before we start, you should understand the difference between multiprocessing and multithreading. To keep things simple, here is a quick comparison.

Multiprocessing

+ Great for CPU-bound applications

+ Takes advantage of multiple CPUs & cores

+ Separate memory space

+ Code is usually easier to read and understand

+ Child processes can be interrupted or killed

- Higher memory overhead

Multithreading

+ Good for I/O-bound applications, such as web applications

+ Lightweight

+ Shared memory access

- Not interruptible or killable

- Code is usually harder to understand and hard to get right
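
To make the "good for I/O-bound applications" point a bit more concrete, here is a minimal sketch. It is only an illustration: fake_io_task is a hypothetical stand-in that uses time.sleep to imitate waiting on a network or disk, and the helper names run_sequential and run_threaded are my own. Exact timings will differ from machine to machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay):
    # Hypothetical stand-in for an I/O call: the thread only waits
    time.sleep(delay)
    return delay


def run_sequential(delays):
    # Run the tasks one after another and measure the total time
    start = time.perf_counter()
    results = [fake_io_task(d) for d in delays]
    return results, time.perf_counter() - start


def run_threaded(delays):
    # Run the same tasks in a thread pool, one worker per task
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        results = list(executor.map(fake_io_task, delays))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    delays = [0.2] * 4
    _, sequential_time = run_sequential(delays)
    _, threaded_time = run_threaded(delays)
    print('sequential: %.2fs, threaded: %.2fs' % (sequential_time, threaded_time))
```

Since the waits overlap in the threaded version, it should finish in roughly the time of a single task, while the sequential version takes about the sum of all delays. For CPU-bound work the picture is usually different, which is where multiprocessing comes in.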

In this post I’ll show how to use the Python 3 concurrent.futures library for a multithreading scenario.

Python 3 concurrent.futures reminds me of the CompletableFuture that we got in Java 8. Similar to Java’s CompletableFuture, concurrent.futures provides us with a relatively comfortable mechanism for working with threads.
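
Here is a small sketch of what working with a Future object looks like. The functions square, fail and demo are hypothetical names used only for this illustration. It shows submit, result, exception and add_done_callback from the standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def fail():
    raise ValueError('something went wrong')


def demo():
    seen = []
    with ThreadPoolExecutor(max_workers=2) as executor:
        # submit() returns a Future immediately; the work runs in a pool thread
        ok_future = executor.submit(square, 7)
        bad_future = executor.submit(fail)
        # The callback is invoked with the future once it has finished
        ok_future.add_done_callback(lambda f: seen.append(f.result()))
        # result() blocks until the value is ready
        value = ok_future.result()
        # exception() blocks too, but returns the error instead of raising it
        error = bad_future.exception()
    # Leaving the with block waits for all worker threads to finish
    return value, error, seen


if __name__ == '__main__':
    print(demo())
```

Calling result() on bad_future would re-raise the ValueError in the calling thread, which is why wrapping result() in a try/except is a common pattern.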

In the following code sample we have two URLs that are processed in different threads. I use ThreadPoolExecutor together with the as_completed function, which lets us print each result as soon as the URL has been read.

import logging
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

URLS = ['https://codeflex.co/configuring-redis-cluster-on-linux/',
        'https://codeflex.co/python-s3-multipart-file-upload-with-metadata-and-progress-indicator/']


def read_website(url, timeout):
    # Download the page at `url` and return its body as bytes
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        logger.info('Reading data from %s ...', url)
        data = conn.read()
        logger.info('Finished to read data for %s', url)
        return data


def main():
    with ThreadPoolExecutor(max_workers=6) as executor:
        # Map each future back to the URL it was submitted for
        future_data = {executor.submit(read_website, url, 30): url for url in URLS}
        # as_completed yields futures as they finish, not in submission order
        for future in as_completed(future_data):
            url = future_data[future]
            try:
                website_data = future.result()
                logger.info('%r page is %d bytes', url, len(website_data))
            except Exception as exc:
                # result() re-raises whatever the worker thread raised
                logger.error('%r generated an exception: %s', url, exc)


if __name__ == '__main__':
    main()

The program output:

2021-03-16 19:27:00,668 - __main__ - INFO - Reading data from https://codeflex.co/python-s3-multipart-file-upload-with-metadata-and-progress-indicator/ ...
2021-03-16 19:27:00,678 - __main__ - INFO - Reading data from https://codeflex.co/configuring-redis-cluster-on-linux/ ...
2021-03-16 19:27:01,050 - __main__ - INFO - Finished to read data for https://codeflex.co/python-s3-multipart-file-upload-with-metadata-and-progress-indicator/
2021-03-16 19:27:01,051 - __main__ - INFO - 'https://codeflex.co/python-s3-multipart-file-upload-with-metadata-and-progress-indicator/' page is 89366 bytes
2021-03-16 19:27:01,253 - __main__ - INFO - Finished to read data for https://codeflex.co/configuring-redis-cluster-on-linux/
2021-03-16 19:27:01,253 - __main__ - INFO - 'https://codeflex.co/configuring-redis-cluster-on-linux/' page is 105557 bytes
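
Notice that the results arrive in completion order, not in the order the URLs were listed. The sketch below shows the difference between as_completed and executor.map without touching the network. The slow_echo function, the TASKS list and its delays are assumptions made up for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical tasks: (name, seconds to sleep)
TASKS = [('slow', 0.6), ('fast', 0.2), ('medium', 0.4)]


def slow_echo(name, delay):
    # Pretend to do some I/O, then return the task name
    time.sleep(delay)
    return name


def completion_order():
    # as_completed yields each future as soon as it finishes
    with ThreadPoolExecutor(max_workers=len(TASKS)) as executor:
        futures = [executor.submit(slow_echo, name, delay) for name, delay in TASKS]
        return [future.result() for future in as_completed(futures)]


def submission_order():
    # executor.map returns results in the order of the inputs
    with ThreadPoolExecutor(max_workers=len(TASKS)) as executor:
        names, delays = zip(*TASKS)
        return list(executor.map(slow_echo, names, delays))


if __name__ == '__main__':
    print('as_completed:', completion_order())
    print('map:         ', submission_order())
```

With these delays, as_completed should hand back the fastest task first, while map keeps the original order. I would pick as_completed when each result should be handled right away, and map when the order of results has to match the order of inputs.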

In the next post we’ll see a code sample for Python 3 multiprocessing.
