In this article, we will learn how to work with a specific Python class from the multiprocessing module, the Process class. I will give you a quick overview with examples.
What is the Python multiprocessing module?
What better way of describing what the module does than to pull from the official documentation? Multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
The threading module is not the focus of this article, but in summary: a thread runs a small segment of code inside a process (lightweight, and sharing memory with the other threads), while a process created by the multiprocessing module runs as a program of its own (heavier, and totally isolated).
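To make the shared-memory versus isolation point concrete, here is a minimal sketch. The names `shared_items`, `append_item`, `run_in_thread` and `run_in_process` are made up for this illustration. The same function appends to a module-level list, once from a thread and once from a child process, and the parent then checks what it can see.

```python
import multiprocessing
import threading

# Module-level list used to observe whether a worker's change is visible
# to the code that started the worker.
shared_items = []


def append_item():
    # Appends to the module-level list of whichever thread/process runs this.
    shared_items.append("added")


def run_in_thread():
    """Run append_item in a thread; return the list length seen afterwards."""
    shared_items.clear()
    worker = threading.Thread(target=append_item)
    worker.start()
    worker.join()
    return len(shared_items)


def run_in_process():
    """Run append_item in a child process; return the length the parent sees."""
    shared_items.clear()
    worker = multiprocessing.Process(target=append_item)
    worker.start()
    worker.join()
    return len(shared_items)


if __name__ == "__main__":
    print(f"Items visible after the thread: {run_in_thread()}")
    print(f"Items visible after the process: {run_in_process()}")
```

The thread changes the very list the parent is looking at, while the child process changes only its own copy, so the parent's list stays empty.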
If you want to learn more about the difference between a process and a thread, read this amazing article by Jong Hyuck Won, Process vs Thread: What’s the difference?
In general, the multiprocessing module offers a variety of other classes, functions and utilities that you can use to handle multiple processes running during your program's execution. The module is specially designed to be the main point of interaction if a program needs to apply parallelism in its workflow. We won't go over all classes and utilities from the multiprocessing module; rather, we will focus on a very specific class, the Process class.
What is the Process class?
In this section, we will try to give a better scope of what a process is, and how you can identify, use and manage processes within Python. As explained in the GNU C Library: "Processes are the primitive units for allocation of system resources. Each process has its own address space and (usually) one thread of control. A process executes a program; you can have multiple processes executing the same program, but each process has its own copy of the program within its own address space and executes it independently of the other copies."
But what does that look like in Python? So far, we have given some descriptions of and references to what a process is and how it differs from a thread, but we haven't touched any code. Well, let's change that with a very simple example of a process in Python:
#!/usr/bin/env python
import os

# A very, very simple process.
if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
Which will produce the following output:
[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 144112
As you can see, any running Python script or program is a process of its own.
Creating a child process from your parent
And what about spawning child processes inside your parent process? Well, to do that, we have the aid of the Process class from the multiprocessing module, and it looks like this:
#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)
    # We then start the process.
    process.start()
    # And finally, we join the process. This makes our script block and
    # wait until the child process is done.
    process.join()
Which will produce the following output:
[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 144078
Hi! I'm a child process 144079
A very important note about the previous script: if you don't call process.join() to wait for your child process to execute and finish, any code after that point runs immediately, without waiting for the child, and it may become a bit harder to synchronize your workflow.
Consider the following example:
#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)
    # We then start the process.
    process.start()
    # This time the join is commented out, so the parent does not wait for
    # the child process and carries on straight away.
    # process.join()
    print("AFTER CHILD EXECUTION! RIGHT?!")
This snippet will produce the following output:
[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 145489
AFTER CHILD EXECUTION! RIGHT?!
Hi! I'm a child process 145490
Of course, the snippet above is not necessarily wrong. It all depends on how you want to use the module and how your child processes execute. So use it wisely.
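If you skip or delay the join, you can still ask a Process object what state the child is in. The following is a small sketch, with the hypothetical helper names `slow_child` and `observe_child`, that uses the `is_alive()` method and the `exitcode` attribute of the Process class to look at a child before and after `join()`.

```python
import multiprocessing
import time


def slow_child(seconds):
    # Stand-in for real work: just sleep for the given duration.
    time.sleep(seconds)


def observe_child(seconds=0.5):
    """Start a child and report its state before and after join()."""
    process = multiprocessing.Process(target=slow_child, args=(seconds,))
    process.start()
    alive_before = process.is_alive()  # the child is still sleeping here
    process.join()                     # block until the child exits
    alive_after = process.is_alive()
    # exitcode is None while the child runs and 0 after a clean exit.
    return alive_before, alive_after, process.exitcode


if __name__ == "__main__":
    print(observe_child())
```

`join()` also accepts an optional timeout in seconds, which is useful when the parent should not wait forever.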
Creating various child processes from a parent process
If you want to spawn multiple processes, you can take advantage of for-loops (or any other type of loop). They let you create as many references to processes as you need and, at a later stage, start/join them.
#!/usr/bin/env python
import os
import multiprocessing

def child_process(child_id):
    print(f"Hi! I'm a child process {os.getpid()} with id#{child_id}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    list_of_processes = []
    # Loop through the numbers 0 to 9 and create a process for each one of
    # them.
    for i in range(0, 10):
        # Here we create a new instance of the Process class and assign our
        # `child_process` function to be executed. Note that we are now
        # using the `args` parameter, which lets us pass parameters down to
        # the function being executed as a child process.
        process = multiprocessing.Process(target=child_process, args=(i,))
        list_of_processes.append(process)
    for process in list_of_processes:
        # We then start the process.
        process.start()
        # And finally, we join the process. This makes our script block
        # and wait until the child process is done before the next one
        # is started.
        process.join()
That will produce the following output:
[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 146056
Hi! I'm a child process 146057 with id#0
Hi! I'm a child process 146058 with id#1
Hi! I'm a child process 146059 with id#2
Hi! I'm a child process 146060 with id#3
Hi! I'm a child process 146061 with id#4
Hi! I'm a child process 146062 with id#5
Hi! I'm a child process 146063 with id#6
Hi! I'm a child process 146064 with id#7
Hi! I'm a child process 146065 with id#8
Hi! I'm a child process 146066 with id#9
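Note that the loop above starts and joins each child in the same iteration, so the children run one after another, which is why the output is perfectly ordered. If you want the children to run at the same time, one common approach is to start all of them first and join them afterwards. The sketch below compares both approaches, using a sleep as a stand-in for real work; the function names and the timings are illustrative assumptions only.

```python
import multiprocessing
import time


def child_process(seconds):
    # Stand-in for real work.
    time.sleep(seconds)


def run_sequentially(count, seconds):
    """Start and join each child in turn; return the elapsed time."""
    started = time.perf_counter()
    for _ in range(count):
        process = multiprocessing.Process(target=child_process, args=(seconds,))
        process.start()
        process.join()
    return time.perf_counter() - started


def run_concurrently(count, seconds):
    """Start every child first, then join them all; return the elapsed time."""
    started = time.perf_counter()
    processes = [
        multiprocessing.Process(target=child_process, args=(seconds,))
        for _ in range(count)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    return time.perf_counter() - started


if __name__ == "__main__":
    print(f"Sequential: {run_sequentially(4, 0.3):.2f}s")
    print(f"Concurrent: {run_concurrently(4, 0.3):.2f}s")
```

With the second approach, the order in which the children print their messages is no longer guaranteed.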
Communicating data between child process and parent process
In the previous section, I described the addition of a new parameter to the multiprocessing.Process class constructor, args. This parameter allows you to pass values down to your child process to be used inside the function. But do you know how to return data from the child process?
You may be thinking that, to return data from the child, you just use a return statement inside the function and retrieve the value in the parent. A process is wonderful for executing functions in an isolated way, without interfering with shared resources, but that isolation also means the usual way of returning data from functions does not work here: the return value stays in the child process and never reaches the parent.
Instead, we can use the Queue class, which provides an interface to communicate data between the parent process and its child processes. A queue, in this context, is a normal FIFO (First In, First Out) structure with a built-in mechanism for working with multiprocessing.
Consider the following example:
#!/usr/bin/env python
import os
import multiprocessing

def child_process(queue, number1, number2):
    print(f"Hi! I'm a child process {os.getpid()}. I do calculations.")
    total = number1 + number2
    # Putting data into the queue
    queue.put(total)

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    # Defining a new Queue()
    queue = multiprocessing.Queue()
    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed. The `args` parameter passes
    # the queue and the two numbers down to the function being executed as
    # a child process.
    process = multiprocessing.Process(target=child_process, args=(queue, 1, 2))
    # We then start the process.
    process.start()
    # And finally, we join the process. This makes our script block and
    # wait until the child process is done.
    process.join()
    # Accessing the result from the queue.
    print(f"Got the result from child process as {queue.get()}")
It will give the following output:
[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 149002
Hi! I'm a child process 149003. I do calculations.
Got the result from child process as 3
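The same queue can be shared by several children. The sketch below (the helper name `collect_sums` is made up for this example) starts one child per pair of numbers and reads one result per child. It calls `queue.get()` before `join()`, since the Python documentation advises draining a queue before joining the processes that feed it. For a single small value, as in the example above, joining first works fine.

```python
import multiprocessing


def child_process(queue, number1, number2):
    # Send the inputs along with the sum so the parent can match them up.
    queue.put((number1, number2, number1 + number2))


def collect_sums(pairs):
    """Run one child per pair and return the sorted list of results."""
    queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=child_process, args=(queue, a, b))
        for a, b in pairs
    ]
    for process in processes:
        process.start()
    # One get() per child. Results arrive in completion order, which is
    # not necessarily the order in which the children were started.
    results = [queue.get() for _ in processes]
    for process in processes:
        process.join()
    return sorted(results)


if __name__ == "__main__":
    for number1, number2, total in collect_sums([(1, 2), (3, 4), (5, 6)]):
        print(f"{number1} + {number2} = {total}")
```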
Exception handling for the Process class
Handling exceptions is a special and somewhat difficult task that we have to go through from time to time while working with the multiprocessing module. The reason is that, by default, an exception raised inside a child process is not propagated to the parent that spawned it: the Process class catches it inside the child, prints the traceback, and the child exits with a non-zero exit code.
The code below raises an Exception with a message:
#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}.")
    raise Exception("Oh no! :(")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)
    try:
        # We then start the process.
        process.start()
        # And finally, we join the process. This makes our script block and
        # wait until the child process is done.
        process.join()
        print("AFTER CHILD EXECUTION! RIGHT?!")
    except Exception:
        print("Uhhh... It failed?")
This results in:
[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 149505
Hi! I'm a child process 149506.
Process Process-1:
Traceback (most recent call last):
File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/tmp/tmp.iuW2VAurGG/scratch.py", line 7, in child_process
raise Exception("Oh no! :(")
Exception: Oh no! :(
AFTER CHILD EXECUTION! RIGHT?!
If you follow the code, you will notice a print statement carefully placed after the process.join() call to show that the parent process keeps running even after an unhandled exception is raised in its child. The except block in the parent never runs, because the exception never reaches the parent.
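Even though the exception does not reach the parent, the parent can still find out that something went wrong by checking the `exitcode` attribute of the Process object after joining it. This is a minimal sketch with made-up function names; note that the failing child still prints its traceback to stderr.

```python
import multiprocessing


def failing_child():
    raise Exception("Oh no! :(")


def healthy_child():
    pass


def run_and_report(target):
    """Run target in a child process and return the child's exit code."""
    process = multiprocessing.Process(target=target)
    process.start()
    process.join()
    # 0 means the child finished cleanly; an unhandled exception in the
    # target results in an exit code of 1.
    return process.exitcode


if __name__ == "__main__":
    for target in (healthy_child, failing_child):
        print(f"{target.__name__} exited with {run_and_report(target)}")
```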
One way of overcoming this situation is to actually handle the exception inside your child process as follows:
#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    try:
        print(f"Hi! I'm a child process {os.getpid()}.")
        raise Exception("Oh no! :(")
    except Exception:
        print("Uh, I think it's fine now...")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)
    # We then start the process.
    process.start()
    # And finally, we join the process. This makes our script block and
    # wait until the child process is done.
    process.join()
    print("AFTER CHILD EXECUTION! RIGHT?!")
Now your exceptions will be handled inside your child process, meaning you can control what will happen to it and what should be done in such cases.
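If the parent also needs to know what happened, one possible approach is to combine this with the Queue class from the previous section and send a short status message back. The sketch below is only an illustration: the `should_fail` flag, the `("ok", None)` / `("error", text)` message shape and the `run_child` helper are all assumptions made for this example, not part of the multiprocessing API.

```python
import multiprocessing


def child_process(queue, should_fail):
    try:
        if should_fail:
            raise ValueError("Oh no! :(")
        queue.put(("ok", None))
    except Exception as error:
        # Send a plain description back; the parent decides what to do.
        queue.put(("error", f"{type(error).__name__}: {error}"))


def run_child(should_fail):
    """Run the child and return the (status, detail) tuple it reported."""
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(
        target=child_process, args=(queue, should_fail)
    )
    process.start()
    status, detail = queue.get()
    process.join()
    return status, detail


if __name__ == "__main__":
    print(run_child(should_fail=False))
    print(run_child(should_fail=True))
```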
Final thoughts
The multiprocessing module is very powerful when implementing solutions that depend on parallel execution, especially when used with the Process class, which adds the amazing possibility of executing any function in its own isolated process.