Patrice Ferlet

How to resolve the dlopen problem with Nvidia and PyTorch or Tensorflow inside a virtual env

If you install PyTorch or Tensorflow with CUDA dependencies, you have probably hit the same problem as I did: the GPU is not detected and an error about dlopen appears. This article explains why, and how to fix it.

TL;DR

# before launching your python, poetry run, pipenv run commands
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)

Using Nvidia pip packages

I really don't like having CUDA installed system-wide on Linux. It's not open source, it's huge, and I often need to fix symlinks across several versions. It's not convenient, and it requires adding an extra package repository.

Fortunately, Tensorflow and PyTorch can work with the pip packages that Nvidia publishes.

You can, for example, use nvidia-cudnn-cu11 to install cuDNN for CUDA v11 inside your virtual environment. This works with poetry, pipenv, or a plain pip install, as sketched below.
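
For illustration, a minimal manual install could look like this; the exact set of nvidia-* packages and the CUDA suffix (cu11, cu12, ...) depend on the framework build you target:

# inside your virtual environment, example packages for CUDA v11
pip install nvidia-cudnn-cu11 nvidia-cublas-cu11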

Tensorflow offers a nice subpackage that installs everything needed:

# or poetry add, or pipenv install ...
pip install "tensorflow[and-cuda]"
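
For PyTorch, recent Linux wheels from PyPI already pull in the same kind of nvidia-* pip packages as dependencies, so a plain install is usually enough. A quick way to check whether CUDA is usable:

# or poetry add torch, pipenv install torch ...
pip install torch
python -c "import torch; print(torch.cuda.is_available())"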

OK, that's fun. But here is the problem...

The problem

Trying to use my GPU gives me an error. The CUDA libraries are not found, so the framework cannot use my GPU.

Let me show you a simple case:

# you can remove this directory after your tests
mkdir -p Projects/ML/testgpu
cd Projects/ML/testgpu

# install with poetry
mkdir .venv
poetry init --no-interaction --python "^3.12"
poetry env use 3.12

# could be long
poetry add "tensorflow[and-cuda]"

poetry run python -c "import tensorflow;tensorflow.keras.Sequential().compile()"

# in the output:
#...
W0000 00:00:1737318822.063024   15978 gpu_device.cc:2344] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
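
To confirm that Tensorflow really sees no GPU at this point, you can list the physical devices with the standard tf.config API; on my setup the list comes back empty:

poetry run python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# output: []  -> no GPU visible to Tensorflow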

OK, what's the problem?

You installed the CUDA libraries in ".venv", but the Nvidia packages put their shared libraries in subdirectories that the dynamic linker does not search by default.

We need to force the LD_LIBRARY_PATH environment variable to include every directory where the Nvidia packages installed their shared libraries...

But there are a lot of libraries: many ".so" files with version suffixes, spread across many directories.
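
You can see this for yourself by listing one of the package directories (the exact file names depend on the versions that were installed):

# list the cuDNN shared libraries shipped in the venv
ls .venv/lib/python3.12/site-packages/nvidia/cudnn/lib
# shows several versioned libcudnn*.so.* files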

The solution

Luckily, we are using Linux!

In our terminal, we have some very powerful tools:

  • find can "find" (surprise...) files in a directory matching a pattern
  • grep can filter the output to keep only what we need
  • xargs can apply a command to each output line, and we can use it with dirname to get the directory names
  • sort can sort the output and also remove duplicates with the "-u" (unique) option
  • paste is a very nice tool to join lines together with a separator

So, let's try:

find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u
# output:
.venv/lib/python3.12/site-packages/nvidia/cublas/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_cupti/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_nvcc/nvvm/lib64
.venv/lib/python3.12/site-packages/nvidia/cuda_nvrtc/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib
.venv/lib/python3.12/site-packages/nvidia/cudnn/lib
.venv/lib/python3.12/site-packages/nvidia/cufft/lib
.venv/lib/python3.12/site-packages/nvidia/curand/lib
.venv/lib/python3.12/site-packages/nvidia/cusolver/lib
.venv/lib/python3.12/site-packages/nvidia/cusparse/lib
.venv/lib/python3.12/site-packages/nvidia/nccl/lib
.venv/lib/python3.12/site-packages/nvidia/nvjitlink/lib

# let's concatenate
find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -
# output:
.venv/lib/python3.12/site-packages/nvidia/cublas/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_cupti/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_nvcc/nvvm/lib64:.venv/lib/python3.12/site-packages/nvidia/cuda_nvrtc/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:.venv/lib/python3.12/site-packages/nvidia/cudnn/lib:.venv/lib/python3.12/site-packages/nvidia/cufft/lib:.venv/lib/python3.12/site-packages/nvidia/curand/lib:.venv/lib/python3.12/site-packages/nvidia/cusolver/lib:.venv/lib/python3.12/site-packages/nvidia/cusparse/lib:.venv/lib/python3.12/site-packages/nvidia/nccl/lib:.venv/lib/python3.12/site-packages/nvidia/nvjitlink/lib

# OK, so make the LD_LIBRARY_PATH variable:
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)
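
If you don't want to retype the export in every new shell, a small wrapper script can set the variable only for the command you launch. This is just a sketch; the file name run-with-cuda.sh is my own choice:

#!/usr/bin/env bash
# run-with-cuda.sh: build LD_LIBRARY_PATH from the venv's Nvidia libs, then run the given command
# usage: ./run-with-cuda.sh poetry run python train.py
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)
exec "$@"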

And so, in the same terminal:

poetry run python -c "import tensorflow;tensorflow.keras.Sequential().compile()"
# output
#...
I0000 00:00:1737319126.393498   16936 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4169 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6

Hooray, my GPU is now detected!
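
And the same device listing as before now returns the GPU (output from my machine):

poetry run python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]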
