Mehmet Ali Tilgen

Process Management in Python: Fundamentals of Parallel Programming

Parallel programming is a programming model that allows a program to run multiple tasks simultaneously on multiple processors or cores. This model aims to use processor resources more efficiently, reduce processing time and increase performance.

To picture parallel programming, imagine that we have a problem to solve. Before processing starts, we divide the problem into smaller sub-parts that we assume are independent of each other and share no state. Each sub-problem is translated into smaller tasks or instructions, organized so that they can run in parallel; for example, many instructions can perform the same operation on different parts of a dataset. These tasks are then distributed to different processors, and each processor works through its assigned instructions independently and at the same time as the others. When the sub-parts really are independent, this can reduce the total processing time significantly and make better use of the available resources.
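
The divide-and-distribute idea above can be sketched with the standard library's concurrent.futures.ProcessPoolExecutor. This is only a minimal illustration: the helper names split_into_chunks, chunk_sum and parallel_sum_of_squares, and the choice of four workers, are assumptions made for this sketch and are not part of the original article.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split data into n_chunks contiguous, nearly equal parts."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def chunk_sum(chunk):
    """Independent sub-task: sum the squares of one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Distribute the chunks to worker processes and combine the partial results."""
    chunks = split_into_chunks(data, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(chunk_sum, chunks))


if __name__ == "__main__":
    numbers = list(range(1_000))
    print(parallel_sum_of_squares(numbers))
```

Each chunk is processed without any knowledge of the others, which is what makes it safe to hand the chunks to separate processes.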

Python offers several tools and modules for parallel programming.

Multiprocessing
The multiprocessing module enables true parallelism by running multiple processes at the same time. Because each process has its own Python interpreter and its own GIL (Global Interpreter Lock), the module sidesteps the GIL's limitation and can use all the cores of a multi-core processor.

The Global Interpreter Lock (GIL) is a mechanism in CPython, the most widely used implementation of Python. The GIL allows only one thread to execute Python bytecode at a time, which limits true parallelism when multithreading is used in Python.
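
One way to see the effect of the GIL is to run the same CPU-bound job with a thread pool and with a process pool and compare the elapsed times. The sketch below does that; count_primes, timed_run and the job size are assumptions chosen for illustration. The actual timings depend on the machine, the number of cores and the Python build, so no specific numbers are claimed here.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_run(executor_cls, jobs):
    """Run count_primes over jobs with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=len(jobs)) as pool:
        results = list(pool.map(count_primes, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [50_000] * 4  # four equal CPU-bound jobs (size chosen arbitrarily)
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, seconds = timed_run(executor_cls, jobs)
        print(f"{executor_cls.__name__}: {seconds:.2f}s, results={results}")
```

On a standard CPython build with several cores, the process pool would typically finish sooner, because the threads take turns holding the GIL while the processes run side by side.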

Example: Square and Cube Calculation

from multiprocessing import Process

def print_square(numbers):
    for n in numbers:
        print(f"Square of {n} is {n * n}")

def print_cube(numbers):
    for n in numbers:
        print(f"Cube of {n} is {n * n * n}")

if __name__ == "__main__":
    numbers = [2, 3, 4, 5]

    # Create the processes
    process1 = Process(target=print_square, args=(numbers,))
    process2 = Process(target=print_cube, args=(numbers,))

    # Start the processes
    process1.start()
    process2.start()

    # Wait for the processes to finish
    process1.join()
    process2.join()


Why We Need Multiprocessing

We can explain the need for multiprocessing with the analogy of cooks in a kitchen. A single cook working alone in the kitchen is like a single-process program; several cooks working together in the same kitchen are like multiprocessing.

Single Process - Single Cook

There is only one cook in the kitchen. This cook has to make three different dishes: a starter, a main course and a dessert. The dishes are made one after another:
He prepares and completes the starter.
He moves on to the main course and finishes it.
Finally, he makes the dessert.

The problem:

No matter how fast the cook is, he can only work on one dish at a time, so the dishes queue up behind each other.
If all three dishes are needed at the same time, the total time is the sum of the three preparation times.

Multiprocessing - Many Cooks

Now imagine that there are three cooks in the same kitchen. Each is preparing a different dish:
One cook makes the starter.
The second cook prepares the main course.
The third cook makes the dessert.

Advantage:

The three dishes are made at the same time, so the total time is roughly that of the slowest dish instead of the sum of all three.
Each cook works independently and is not affected by the others.

Sharing Data Between Processes in Python
In Python, it is possible to share data between different processes using the multiprocessing module. However, each process uses its own memory space. Therefore, special mechanisms are used to share data between processes.

import multiprocessing

result = []

def square_of_list(mylist):
    for num in mylist:
        result.append(num**2)  # modifies the child process's own copy of result
    return result

if __name__ == "__main__":
    mylist = [1, 3, 4, 5]

    p1 = multiprocessing.Process(target=square_of_list, args=(mylist,))
    p1.start()
    p1.join()

    print(result)  # [] (empty list: the parent's copy is unchanged)

When we examine the code sample, we see that the result list is empty. The main reason for this is that the processes created with multiprocessing work in their own memory space, independent of the main process. Because of this independence, changes made in the child process are not directly reflected in the variables in the main process.

Python provides the following methods for sharing data:

1. Shared Memory
Value and Array objects are used to share data between processes.
Value: Shares a single value of a fixed type (for example, an integer).
Array: Shares a fixed-size array of values of one type.

from multiprocessing import Process, Value

def increment(shared_value):
    for _ in range(1000):
        # += is a read-modify-write, so hold the Value's lock to avoid lost updates
        with shared_value.get_lock():
            shared_value.value += 1

if __name__ == "__main__":
    shared_value = Value('i', 0)  # 'i' = C signed int, initial value 0
    processes = [Process(target=increment, args=(shared_value,)) for _ in range(5)]

    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print(f"Result: {shared_value.value}")  # 5000
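
Array works in a similar way for a sequence of values. The following sketch revisits the squaring example from earlier, this time writing the results into a shared Array so that the parent process can see them. The helper name square_slice and the split into two halves are assumptions made for illustration.

```python
from multiprocessing import Array, Process


def square_slice(shared_array, start, stop):
    """Square the elements in [start, stop) in place inside the shared array."""
    for i in range(start, stop):
        shared_array[i] = shared_array[i] ** 2


if __name__ == "__main__":
    # 'i' is the type code for a C signed int; the array lives in shared memory.
    shared_array = Array('i', [1, 3, 4, 5])
    middle = len(shared_array) // 2

    # Each worker updates a disjoint half, so the two never touch the same element.
    workers = [
        Process(target=square_slice, args=(shared_array, 0, middle)),
        Process(target=square_slice, args=(shared_array, middle, len(shared_array))),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    print(shared_array[:])  # [1, 9, 16, 25]
```

Unlike the plain Python list, the Array is backed by shared memory, so the changes made in the child processes are visible to the parent.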

2. Queue
multiprocessing.Queue transfers data between processes in FIFO (First In, First Out) order.
Multiple processes can put data on the same queue and get data from it.

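from multiprocessing import Process, Queue

def producer(queue):
    for i in range(5):
        queue.put(i)  # add an item to the queue
        print(f"Produced: {i}")
    queue.put(None)  # sentinel: tells the consumer there is no more data

def consumer(queue):
    while True:
        item = queue.get()  # blocks until an item is available
        if item is None:
            break
        print(f"Consumed: {item}")

if __name__ == "__main__":
    queue = Queue()

    producer_process = Process(target=producer, args=(queue,))
    consumer_process = Process(target=consumer, args=(queue,))

    # queue.empty() is not reliable across processes, so both run concurrently
    # and the consumer stops on the sentinel instead.
    producer_process.start()
    consumer_process.start()

    producer_process.join()
    consumer_process.join()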

3. Pipe
multiprocessing.Pipe() returns a pair of connection objects for data transfer between two processes.
By default the pipe is two-way (duplex): each end can both send and receive data.

from multiprocessing import Process, Pipe

def send_data(conn):
    conn.send([1, 2, 3, 4])  # send a list through the child's end of the pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # create the two connected ends

    process = Process(target=send_data, args=(child_conn,))
    process.start()

    print(f"Received data: {parent_conn.recv()}")  # receive the data
    process.join()
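
The example above sends data in one direction only. To illustrate the two-way nature of a pipe, here is a minimal sketch in which the parent sends numbers and the child replies to each one. The function name double_worker and the use of None as a stop signal are assumptions made for this sketch.

```python
from multiprocessing import Pipe, Process


def double_worker(conn):
    """Receive numbers, send back each one doubled; stop when None arrives."""
    while True:
        item = conn.recv()
        if item is None:  # sentinel chosen for this sketch
            break
        conn.send(item * 2)
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # duplex by default: both ends can send and receive

    worker = Process(target=double_worker, args=(child_conn,))
    worker.start()

    for number in [1, 2, 3]:
        parent_conn.send(number)                    # parent -> child
        print(f"{number} -> {parent_conn.recv()}")  # child -> parent

    parent_conn.send(None)  # ask the worker to stop
    worker.join()
```

Both ends call send() and recv() on the same connection object, which is the main difference from a Queue, where any number of processes can share one channel.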

**Padding Between Processes**

“Padding between processes” means inserting unused bytes between pieces of data in shared memory so that values updated by different processes are aligned and do not end up next to each other.

This matters most for cache-line false sharing. Modern processors move memory between cores in whole cache lines (commonly 64 bytes), so when two processes repeatedly write to different variables that sit on the same cache line, the cores keep invalidating each other's copy of that line and performance drops, even though no data is logically shared. Padding each variable out to its own cache line avoids this.
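
As a rough sketch of what padding looks like in practice, the code below gives each worker process its own counter in a shared array and leaves unused slots between the counters so that each one sits on a separate cache line. The 64-byte cache-line size, the names STRIDE and bump, and the worker count are assumptions made for illustration. In pure Python the interpreter overhead usually dominates, so treat this as an illustration of the memory layout rather than a measurable optimization.

```python
import ctypes
from multiprocessing import Process, RawArray

CACHE_LINE_BYTES = 64  # assumption: typical on x86-64; other CPUs may differ
STRIDE = CACHE_LINE_BYTES // ctypes.sizeof(ctypes.c_int64)  # 8 slots per cache line


def bump(counters, worker_id, times):
    """Increment only this worker's slot; the slots in between are unused padding."""
    index = worker_id * STRIDE
    for _ in range(times):
        counters[index] += 1


if __name__ == "__main__":
    n_workers = 4
    # One 8-byte counter per worker, each followed by 56 bytes of padding.
    counters = RawArray(ctypes.c_int64, n_workers * STRIDE)

    workers = [Process(target=bump, args=(counters, i, 100_000)) for i in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    print([counters[i * STRIDE] for i in range(n_workers)])
```

No lock is needed here because every worker writes only to its own slot; the padding changes where the slots live in memory, not which process owns them.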

**Synchronization Between Processes**

With the multiprocessing module in Python, multiple processes can run simultaneously. However, when multiple processes access the same data, they need to be synchronized. This keeps the data consistent and avoids issues such as race conditions.

from multiprocessing import Process, Lock

def print_numbers(lock, name):
    with lock:  # acquire the lock; it is released when the block exits
        for i in range(5):
            print(f"{name}: {i}")

if __name__ == "__main__":
    lock = Lock()  # create the lock
    processes = [
        Process(target=print_numbers, args=(lock, f"Process {i}")) for i in range(3)
    ]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

A Lock lets only one process at a time run the protected block; here that keeps each process's five lines of output together.
Other processes that try to acquire the lock wait until the process holding it releases it.

**Multithreading**

Multithreading is a concurrency model in which a program runs multiple threads at the same time. A thread is a sequence of instructions that the program executes; threads run within the same process and share its memory and other resources, which makes them lighter than processes.
In Python, the threading module is used to develop multithreaded applications. However, because of the Global Interpreter Lock (GIL) in CPython, threads give little or no speedup on CPU-bound tasks. Multithreading is therefore generally preferred for I/O-bound tasks, where threads spend most of their time waiting.

import threading

def print_numbers(name):
    for i in range(5):
        print(f"{name}: {i}")

# Create the threads
thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))

# Start the threads
thread1.start()
thread2.start()

# Wait for the threads to finish
thread1.join()
thread2.join()

print("All threads finished")
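
To show why threads suit I/O-bound work, the sketch below simulates three slow I/O calls with time.sleep and runs them in a thread pool. The names fake_download and download_all, the file names and the 0.2 second delay are all hypothetical stand-ins; a real program would make network or disk calls instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay=0.2):
    """Stand-in for an I/O-bound call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return f"{name} done"


def download_all(names):
    """Run one simulated download per thread and return the results in order."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(fake_download, names))


if __name__ == "__main__":
    start = time.perf_counter()
    results = download_all(["a.txt", "b.txt", "c.txt"])
    elapsed = time.perf_counter() - start
    print(results)
    print(f"Elapsed: {elapsed:.2f}s")  # the waits overlap instead of adding up
```

Because a waiting thread does not hold the GIL, the three waits overlap, so the elapsed time should be close to a single delay rather than three.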

**Thread Synchronization**

Thread synchronization is a technique used to ensure data consistency and order when multiple threads access the same resources simultaneously. In Python, the threading module provides several tools for synchronization.

**Why Do We Need Thread Synchronization?**

*Race Conditions:*

When two or more threads access a shared resource at the same time, data inconsistencies can occur.
For example, one thread may read data while another thread is updating the same data.

*Data Consistency:*

Coordination between threads is required to ensure that shared resources are updated correctly.
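
The race condition described above can be reproduced in a small sketch. The Account class, its method names and the run_deposits helper are hypothetical, and the time.sleep call is an artificial pause added only to make the interleaving likely. The unsafe result varies from run to run, so no exact value is claimed for it.

```python
import threading
import time


class Account:
    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()

    def unsafe_deposit(self, amount):
        current = self.balance            # read
        time.sleep(0.001)                 # widen the window so other threads can interleave
        self.balance = current + amount   # write back a possibly stale value

    def safe_deposit(self, amount):
        with self.lock:                   # read-modify-write happens as one unit
            current = self.balance
            time.sleep(0.001)
            self.balance = current + amount


def run_deposits(method_name, n_threads=10):
    """Call the named deposit method once from each of n_threads threads."""
    account = Account()
    threads = [
        threading.Thread(target=getattr(account, method_name), args=(1,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


if __name__ == "__main__":
    print("without lock:", run_deposits("unsafe_deposit"))  # usually less than 10
    print("with lock:   ", run_deposits("safe_deposit"))    # 10
```

Without the lock, several threads read the same old balance and overwrite each other's update; with the lock, every deposit is counted.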

Synchronization Tool Examples in Python

**1. Lock**

While one thread holds the lock, any other thread that tries to acquire it waits until the lock is released, so only one thread at a time can access the protected resource.

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with lock:  # acquire the lock
            counter += 1  # increment safely while holding it

threads = [threading.Thread(target=increment) for _ in range(5)]

for t in threads:
    t.start()

for t in threads:
    t.join()

print(f"Final Counter Value: {counter}")  # expected result: 500000

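**2. Event**

An Event lets one thread signal others: threads that call wait() block until another thread calls set().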

import threading
import time

event = threading.Event()

def worker():
    print("Worker waiting for event to be set")
    event.wait()  # block until the event is set
    print("Event is set, worker proceeds")

thread = threading.Thread(target=worker)
thread.start()

time.sleep(2)  # simulate some work before signalling
print("Setting the event")
event.set()  # trigger the event
thread.join()
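
Semaphore is another tool in the threading module. Where a Lock admits one thread at a time, a Semaphore admits up to a fixed number. The sketch below is only an illustration: the run_with_limit helper, the limit of two and the sleep that stands in for real work are assumptions, not part of the original article.

```python
import threading
import time


def run_with_limit(n_threads=5, limit=2):
    """Start n_threads workers but let at most `limit` run the guarded section at once."""
    semaphore = threading.Semaphore(limit)
    state_lock = threading.Lock()
    state = {"active": 0, "max_active": 0}

    def worker():
        with semaphore:  # blocks while `limit` workers are already inside
            with state_lock:
                state["active"] += 1
                state["max_active"] = max(state["max_active"], state["active"])
            time.sleep(0.05)  # simulated use of the limited resource
            with state_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["max_active"]


if __name__ == "__main__":
    print("Max concurrent workers:", run_with_limit())
```

The reported maximum never exceeds the limit, which makes a Semaphore a natural fit for capping access to a resource such as a connection pool.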

**Conclusion**

Thread synchronization is critical for preventing data inconsistencies when threads access shared resources. In Python, tools such as Lock, RLock, Semaphore, Event, and Condition cover different synchronization needs; which one to use depends on the requirements of the application.
