【Parallel processing in Python】Joblib explained

1. What is Joblib

Joblib is a Python library that provides simple tools for lightweight pipelining: easy parallel processing, efficient saving and loading of Python objects, and transparent disk caching of function results. It is particularly useful in machine learning workflows.

・Install

pip install joblib

2. Key Features

2.1 Parallel Processing

Joblib provides easy-to-use parallel processing capabilities through its Parallel and delayed functions. This is useful for tasks that can be parallelized, such as parameter grid searches or data preprocessing.

from joblib import Parallel, delayed

def process_data(data):
    # Simulate a time-consuming data processing step
    import time
    time.sleep(1)
    return data ** 2

data = [1, 2, 3, 4, 5]

# Usage: Parallel(n_jobs=num_workers)(delayed(func)(arg) for arg in iterable)
results = Parallel(n_jobs=2)(delayed(process_data)(d) for d in data)
print(results)

We can use it simply with a generator expression, as shown above. If you specify n_jobs=-1, all available CPU cores are used for the parallel computation. This can significantly speed up CPU-bound tasks that can be effectively parallelized.
However, using every core may slow down other applications that are competing for CPU or memory, so be careful with this setting.
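If you want to leave some headroom for other processes, joblib also accepts n_jobs values below -1: n_jobs=-2 means "all CPUs but one". Here is a minimal sketch of that idea; the worker function and data are just placeholders.

from joblib import Parallel, delayed
import os

def work(x):
    # Placeholder for a CPU-heavy task
    return x * x

data = range(100)

# n_jobs=-2 uses all CPU cores except one, leaving a core free for other applications
results = Parallel(n_jobs=-2)(delayed(work)(x) for x in data)
print(len(results), "items processed,", os.cpu_count(), "cores available")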

:::details Speed test
・Test

from joblib import Parallel, delayed
import time

def process_data(data):
    # Simulate a time-consuming data processing step
    time.sleep(1)
    return data ** 2

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Normal calculation
start_time = time.time()
results_normal = [process_data(d) for d in data]
end_time = time.time()
normal_duration = end_time - start_time
print("Normal Calculation Results:", results_normal)
print("Normal Calculation Duration:", normal_duration, "seconds")

# Parallel calculation  with n_jobs=2
start_time = time.time()
results_parallel = Parallel(n_jobs=2)(delayed(process_data)(d) for d in data)
end_time = time.time()
parallel_duration = end_time - start_time
print("Parallel Calculation Results:", results_parallel)
print("Parallel Calculation Duration:", parallel_duration, "seconds")

# Parallel calculation with n_jobs=-1
start_time = time.time()
results_parallel = Parallel(n_jobs=-1)(delayed(process_data)(d) for d in data)
end_time = time.time()
parallel_duration = end_time - start_time
print("Parallel Calculation Results:", results_parallel)
print("Parallel Calculation Duration:", parallel_duration, "seconds")

・Result

# Normal Calculation Results: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
# Normal Calculation Duration: 10.011737823486328 seconds
# Parallel Calculation Results: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
# Parallel Calculation Duration: 5.565693616867065 seconds
# Parallel Calculation Results: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
# Parallel Calculation Duration: 3.627182722091675 seconds

:::

As the test results show, parallel processing makes the computation two or more times faster.
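One note on this test: the simulated workload is time.sleep, which releases the GIL, so it behaves more like an I/O-bound task than a CPU-bound one. For such cases joblib lets you hint a thread-based backend via the prefer argument, which avoids the overhead of starting worker processes. This is a small sketch, not part of the original test:

from joblib import Parallel, delayed
import time

def io_like_task(x):
    time.sleep(1)  # I/O-like wait; releases the GIL
    return x ** 2

# prefer="threads" asks joblib to run the tasks in threads instead of processes
results = Parallel(n_jobs=4, prefer="threads")(delayed(io_like_task)(x) for x in range(8))
print(results)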

2.2 Serialization/Compression

Joblib uses a binary format when saving and loading Python objects to disk, which makes these operations efficient and fast.
It also supports various compression methods such as zlib, gzip, bz2, and xz, allowing you to reduce the storage size of saved objects.

・Serialization

import joblib

data = [i for i in range(1000000)]

compression = False
if compression:
    joblib.dump(data, 'data.pkl', compress=('gzip', 3))
else:
    joblib.dump(data, 'data.pkl')

data = joblib.load('data.pkl')
print(len(data))
# 1000000

The '3' specifies the compression level (typically from 1 to 9, where higher numbers mean more compression but slower speeds).
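To see the effect of compression, you can compare the resulting file sizes on disk. A minimal sketch, with arbitrary file names:

import os
import joblib

data = list(range(1000000))

joblib.dump(data, 'data_raw.pkl')                       # no compression
joblib.dump(data, 'data_gz.pkl', compress=('gzip', 3))  # gzip, level 3

print("raw: ", os.path.getsize('data_raw.pkl'), "bytes")
print("gzip:", os.path.getsize('data_gz.pkl'), "bytes")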

2.3 Memory Mapping and Caching

For large NumPy arrays, Joblib can use memory mapping to save memory by keeping a reference to the data on disk instead of loading it all into memory.

When you memory-map a file, parts of the file are loaded into RAM as needed, which can result in slower access times compared to having the entire dataset in RAM. However, it allows you to handle datasets larger than your available RAM.

Whenever you access the data, the needed parts are read from the disk on demand.
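Here is a minimal sketch of memory-mapped loading with joblib; the array size and file name are arbitrary:

import numpy as np
import joblib

large_array = np.random.rand(1000, 1000)
joblib.dump(large_array, 'large_array.pkl')

# mmap_mode='r' returns a read-only, memory-mapped view of the array:
# data is fetched from disk lazily instead of being loaded into RAM at once
mapped = joblib.load('large_array.pkl', mmap_mode='r')
print(mapped[0, :5])

Joblib also provides the Memory class, which caches function results on disk so that repeated calls with the same arguments are not recomputed. The following example demonstrates this.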

from joblib import Memory
import math

cachedir = "./memory_cache"
memory = Memory(cachedir, verbose=0)

@memory.cache
def calc(x):
    print("RUNNING......")
    return math.sqrt(x)

print(calc(2))
print(calc(2))
print(calc(5))

# RUNNING......
# 1.4142135623730951
# 1.4142135623730951
# RUNNING......
# 2.23606797749979

As shown, the second call with the same argument returns the cached result without executing the function body again ("RUNNING......" is not printed).
This is useful when the same expensive computation is repeated, for example in a recursive Fibonacci calculation; see the sketch below.
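As an illustration of that point, here is a sketch of a disk-cached recursive Fibonacci function; the cache directory is arbitrary:

from joblib import Memory

memory = Memory("./memory_cache", verbose=0)

@memory.cache
def fib(n):
    # Each distinct n is computed only once; later calls are served from the disk cache
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # computed recursively and cached
print(fib(30))  # returned from the cache without recomputation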

3. Summary

Joblib is a very useful library in Python. In particular, its parallel processing can have a big impact on workloads such as data preprocessing, where speed matters.

