Patrice Ferlet

How to resolve the dlopen problem with Nvidia and PyTorch or Tensorflow inside a virtual env

If you install PyTorch or Tensorflow with CUDA dependencies, you have probably hit the same problem as I did: the GPU is not detected and an error about dlopen appears. This article explains why, and how to fix it.

TL;DR

# before launching your python, poetry run, pipenv run commands
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)

Using Nvidia pip packages

I really don't like having CUDA installed system-wide on Linux. It's not open source, it's huge, and I often need to fix symlinks across several versions. It's not convenient, and it requires adding an extra package repository.

Fortunately, Tensorflow and PyTorch can work with the pip packages that Nvidia publishes.

You can, for example, use nvidia-cudnn-cu11 to install cuDNN for CUDA v11 inside your virtual environment. This works with poetry, pipenv, or a plain pip install, as sketched below.
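
For illustration, a minimal manual install could look like this; the exact set of nvidia-* packages and the CUDA suffix (cu11, cu12, ...) depend on the framework build you target:

# inside your virtual environment, example packages for CUDA v11
pip install nvidia-cudnn-cu11 nvidia-cublas-cu11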

Tensorflow offers a nice subpackage that installs everything needed:

# or poetry add, or pipenv install ...
pip install "tensorflow[and-cuda]"
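
For PyTorch, recent Linux wheels from PyPI already pull in the same kind of nvidia-* pip packages as dependencies, so a plain install is usually enough. A quick way to check whether CUDA is usable:

# or poetry add torch, pipenv install torch ...
pip install torch
python -c "import torch; print(torch.cuda.is_available())"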

OK, that's fun. But here is the problem...

The problem

Trying to use my GPU gives me an error. The CUDA libraries are not found, so the framework cannot use my GPU.

Let me show you a simple case:

# you can remove this directory after your tests
mkdir -p Projects/ML/testgpu
cd Projects/ML/testgpu

# install with poetry
mkdir .venv
poetry init --no-interaction --python "^3.12"
poetry env use 3.12

# could be long
poetry add "tensorflow[and-cuda]"

poetry run python -c "import tensorflow;tensorflow.keras.Sequential().compile()"

# in the output:
#...
W0000 00:00:1737318822.063024   15978 gpu_device.cc:2344] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
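
To confirm that Tensorflow really sees no GPU at this point, you can list the physical devices with the standard tf.config API; on my setup the list comes back empty:

poetry run python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# output: []  -> no GPU visible to Tensorflow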

OK, what's the problem?

You installed the CUDA libraries in ".venv", but the Nvidia packages put their shared libraries in subdirectories that the dynamic linker does not search by default.

We need to force the LD_LIBRARY_PATH environment variable to include every directory where the Nvidia packages installed their shared libraries...

But there are a lot of libraries: many ".so" files with version suffixes, spread across many directories.
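
You can see this for yourself by listing one of the package directories (the exact file names depend on the versions that were installed):

# list the cuDNN shared libraries shipped in the venv
ls .venv/lib/python3.12/site-packages/nvidia/cudnn/lib
# shows several versioned libcudnn*.so.* files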

The solution

Luckily, we are using Linux!

In our terminal, we have some very powerful tools:

  • find can "find" (surprise...) files in a directory matching a pattern
  • grep can filter the output to keep only what we need
  • xargs can apply a command to each output line, and we can use it with dirname to get the directory names
  • sort can sort the output and also remove duplicates with the "-u" (unique) option
  • paste is a very nice tool to join lines together with a separator

So, let's try:

find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u
# output:
.venv/lib/python3.12/site-packages/nvidia/cublas/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_cupti/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_nvcc/nvvm/lib64
.venv/lib/python3.12/site-packages/nvidia/cuda_nvrtc/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib
.venv/lib/python3.12/site-packages/nvidia/cudnn/lib
.venv/lib/python3.12/site-packages/nvidia/cufft/lib
.venv/lib/python3.12/site-packages/nvidia/curand/lib
.venv/lib/python3.12/site-packages/nvidia/cusolver/lib
.venv/lib/python3.12/site-packages/nvidia/cusparse/lib
.venv/lib/python3.12/site-packages/nvidia/nccl/lib
.venv/lib/python3.12/site-packages/nvidia/nvjitlink/lib

# let's concatenate
find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -
# output:
.venv/lib/python3.12/site-packages/nvidia/cublas/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_cupti/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_nvcc/nvvm/lib64:.venv/lib/python3.12/site-packages/nvidia/cuda_nvrtc/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:.venv/lib/python3.12/site-packages/nvidia/cudnn/lib:.venv/lib/python3.12/site-packages/nvidia/cufft/lib:.venv/lib/python3.12/site-packages/nvidia/curand/lib:.venv/lib/python3.12/site-packages/nvidia/cusolver/lib:.venv/lib/python3.12/site-packages/nvidia/cusparse/lib:.venv/lib/python3.12/site-packages/nvidia/nccl/lib:.venv/lib/python3.12/site-packages/nvidia/nvjitlink/lib

# OK, so make the LD_LIBRARY_PATH variable:
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)
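
If you don't want to retype the export in every new shell, a small wrapper script can set the variable only for the command you launch. This is just a sketch; the file name run-with-cuda.sh is my own choice:

#!/usr/bin/env bash
# run-with-cuda.sh: build LD_LIBRARY_PATH from the venv's Nvidia libs, then run the given command
# usage: ./run-with-cuda.sh poetry run python train.py
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)
exec "$@"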

And so, in the same terminal:

poetry run python -c "import tensorflow;tensorflow.keras.Sequential().compile()"
# output
#...
I0000 00:00:1737319126.393498   16936 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4169 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6

Hooray, my GPU is now detected!
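
And the same device listing as before now returns the GPU (output from my machine):

poetry run python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]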
