Before we start, you should understand the difference between multiprocessing and multithreading. To keep things simple, here is a quick comparison.
Multiprocessing
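+ Great for CPU-bound applications
+ Takes advantage of multiple CPUs and cores (each process runs its own Python interpreter, so the GIL does not limit parallelism)
+ Separate memory space
+ Code is usually easier to read and understand
+ Child processes can be interrupted or killed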
– Higher memory overhead
Multithreading
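+ Good for I/O-bound applications such as web applications (in CPython only one thread executes Python bytecode at a time, but the GIL is released while a thread waits on I/O)
+ Lightweight
+ Shared memory access
– Threads cannot be interrupted or killed from the outside
– Code is usually harder to understand and harder to get right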
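The "shared memory access" and "harder to get right" points are easiest to see in a small example. The following is a minimal sketch, not taken from the original post (the names `counter`, `counter_lock` and `increment` are illustrative): four threads update the same global variable, and a `threading.Lock` is needed so that concurrent `+=` updates are not lost. With separate processes, each child would work on its own copy of `counter` instead.

```python
import threading

# Shared state: every thread sees the same object because threads share memory.
counter = 0
counter_lock = threading.Lock()


def increment(times):
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        # Without the lock, the read-modify-write behind `counter += 1` can
        # interleave between threads and silently lose updates.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread before reading the result

print(counter)  # 40000
```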
In this post I’ll show how to use the Python3 concurrent.futures library for a multithreading scenario.
Python3 concurrent.futures reminds me of CompletableFuture, which was introduced in Java 8. Like Java’s CompletableFuture, concurrent.futures provides a relatively comfortable mechanism for working with threads.
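The core of that mechanism is the `Future` object returned by `Executor.submit()`. A minimal sketch, assuming a hypothetical `square()` task in place of real work: `result()` blocks until the value is ready, and `add_done_callback()` registers a function to run on completion, which is similar in spirit (though not identical) to chaining `thenAccept()` on a Java `CompletableFuture`.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Hypothetical stand-in for real work, used only for illustration."""
    return n * n


results = []


def on_done(future):
    # Runs once the future has finished: in the worker thread, or immediately
    # in the caller if the future was already done when the callback was added.
    results.append(future.result())


with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(square, 7)  # returns a Future immediately
    future.add_done_callback(on_done)
    print(future.result())  # blocks until the task completes, prints 49

# Leaving the `with` block waits for all submitted work (and its callbacks).
print(results)  # [49]
```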
In the following code sample, two URLs are processed in different threads. I use ThreadPoolExecutor together with the as_completed function, which yields each future as soon as it finishes, so each result is printed as soon as reading that URL is done.
import logging
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

URLS = ['https://codeflex.co/configuring-redis-cluster-on-linux/',
        'https://codeflex.co/python-s3-multipart-file-upload-with-metadata-and-progress-indicator/']


def read_website(url, timeout):
    """Download a page and return its raw bytes (runs inside a worker thread)."""
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        logger.info('Reading data from %s ...', url)
        data = conn.read()
        logger.info('Finished to read data for %s', url)
        return data


def main():
    with ThreadPoolExecutor(max_workers=6) as executor:
        # submit() schedules read_website() on a pool thread and returns a Future;
        # the dict maps each Future back to its URL for reporting.
        future_data = {executor.submit(read_website, url, 30): url for url in URLS}
        # as_completed() yields each Future as soon as it finishes (completion order).
        for future in as_completed(future_data):
            url = future_data[future]
            try:
                website_data = future.result()
            except Exception as exc:
                # An exception raised in the worker thread is re-raised by result().
                logger.error('%r generated an exception: %s', url, exc)
            else:
                logger.info('%r page is %d bytes', url, len(website_data))


if __name__ == '__main__':
    main()
The program output:
2021-03-16 19:27:00,668 - __main__ - INFO - Reading data from https://codeflex.co/python-s3-multipart-file-upload-with-metadata-and-progress-indicator/ ...
2021-03-16 19:27:00,678 - __main__ - INFO - Reading data from https://codeflex.co/configuring-redis-cluster-on-linux/ ...
2021-03-16 19:27:01,050 - __main__ - INFO - Finished to read data for https://codeflex.co/python-s3-multipart-file-upload-with-metadata-and-progress-indicator/
2021-03-16 19:27:01,051 - __main__ - INFO - 'https://codeflex.co/python-s3-multipart-file-upload-with-metadata-and-progress-indicator/' page is 89366 bytes
2021-03-16 19:27:01,253 - __main__ - INFO - Finished to read data for https://codeflex.co/configuring-redis-cluster-on-linux/
2021-03-16 19:27:01,253 - __main__ - INFO - 'https://codeflex.co/configuring-redis-cluster-on-linux/' page is 105557 bytes
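Because as_completed yields futures in completion order, the second URL in the list was reported first in this run. If you need results in the same order as the input, `Executor.map()` is an alternative. The sketch below is offline and illustrative only: the hypothetical `fake_download()` replaces `read_website()` and sleeps for made-up delays instead of doing network I/O, so the two orderings can be compared.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(name, delay):
    """Hypothetical stand-in for read_website(): sleeps instead of doing network I/O."""
    time.sleep(delay)
    return name


# (name, simulated latency in seconds); the slow task is submitted first.
TASKS = [('slow', 0.5), ('fast', 0.05)]
names = [name for name, _ in TASKS]
delays = [delay for _, delay in TASKS]

with ThreadPoolExecutor(max_workers=2) as executor:
    # as_completed() yields futures in the order they finish.
    futures = [executor.submit(fake_download, name, delay) for name, delay in TASKS]
    completion_order = [f.result() for f in as_completed(futures)]

    # map() yields results in submission order, waiting for each one in turn.
    submission_order = list(executor.map(fake_download, names, delays))

print(completion_order)  # ['fast', 'slow']
print(submission_order)  # ['slow', 'fast']
```

as_completed suits the original example, where each page can be logged independently; map is simpler when the caller needs results aligned with its inputs.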
In the next post we’ll see a code sample for Python3 Multiprocessing.