If you install PyTorch or TensorFlow with CUDA dependencies, you have probably hit the same problem I had: the GPU is not detected and an error about dlopen appears. This article explains why, and how to fix it.
TL;DR
# before launching your python, poetry run, pipenv run commands
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)
Using Nvidia pip packages
I really don't like having CUDA installed system-wide on Linux. It's not open source, it's huge, I often need to fix symlinks between several versions, and it requires adding an extra package repository. Not comfortable.
Luckily, TensorFlow and PyTorch can work with the pip packages published by Nvidia.
You can, for example, use nvidia-cudnn-cu11 to install cuDNN for CUDA v11 inside your virtual environment. This works with poetry, pipenv, or a plain pip install.
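For example, a manual installation inside the virtual environment could look like this (these package names exist on PyPI; pick the -cu11 or -cu12 variants matching your CUDA major version):
# cuDNN and cuBLAS for CUDA v11, installed straight into the virtual environment
pip install nvidia-cudnn-cu11 nvidia-cublas-cu11
# or with poetry
poetry add nvidia-cudnn-cu11 nvidia-cublas-cu11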
TensorFlow offers a nice extra that installs everything needed:
# or poetry add, or pipenv install ...
pip install "tensorflow[and-cuda]"
OK, that's fun. But here is the problem...
The problem
Trying to use my GPU gives me an error. The CUDA libraries are not found, so TensorFlow isn't able to use my GPU.
Let me show you a simple case:
# you can remove this directory after your tests
mkdir -p Projects/ML/testgpu
cd Projects/ML/testgpu
# install with poetry
mkdir .venv
poetry init --no-interaction --python "^3.12"
poetry env use 3.12
# this can take a while
poetry add "tensorflow[and-cuda]"
poetry run python -c "import tensorflow;tensorflow.keras.Sequential().compile()"
# in the output:
#...
W0000 00:00:1737318822.063024 15978 gpu_device.cc:2344] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
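If you want a more direct check than the log message, you can ask TensorFlow which GPUs it sees; at this point it gives me an empty list:
poetry run python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# output: no GPU visible
# []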
OK, what's the problem?
You installed "CUDA" in ".venv
" but Nvidia uses subdirectories and paths that are not activated by default.
We need to force the
LD_LIBRARY_PATH
environment to include every directories where Nvidia installed their shared libraries...
But there are a lot of libraries, a lot of ".so
" files with suffix.
The solution
Luckily, we are using Linux!
In our terminal, we have some very powerful tools:
- find can "find" (surprise...) files in a directory with a pattern
- grep can filter the output to keep only what we need
- xargs can apply a command to each output line, and we can use dirname to get the directory name
- sort can sort the output and also remove duplicates with the "-u" (unique) option
- paste is a very nice tool to concatenate lines (see the tiny example below)
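If you have never used paste in this mode, here is a tiny illustration of the -s (serial) and -d (delimiter) options:
printf "a\nb\nc\n" | paste -d ":" -s -
# output:
a:b:c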
So, let's try:
find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u
# output:
.venv/lib/python3.12/site-packages/nvidia/cublas/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_cupti/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_nvcc/nvvm/lib64
.venv/lib/python3.12/site-packages/nvidia/cuda_nvrtc/lib
.venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib
.venv/lib/python3.12/site-packages/nvidia/cudnn/lib
.venv/lib/python3.12/site-packages/nvidia/cufft/lib
.venv/lib/python3.12/site-packages/nvidia/curand/lib
.venv/lib/python3.12/site-packages/nvidia/cusolver/lib
.venv/lib/python3.12/site-packages/nvidia/cusparse/lib
.venv/lib/python3.12/site-packages/nvidia/nccl/lib
.venv/lib/python3.12/site-packages/nvidia/nvjitlink/lib
# let's concatenate
find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -
# output:
.venv/lib/python3.12/site-packages/nvidia/cublas/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_cupti/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_nvcc/nvvm/lib64:.venv/lib/python3.12/site-packages/nvidia/cuda_nvrtc/lib:.venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:.venv/lib/python3.12/site-packages/nvidia/cudnn/lib:.venv/lib/python3.12/site-packages/nvidia/cufft/lib:.venv/lib/python3.12/site-packages/nvidia/curand/lib:.venv/lib/python3.12/site-packages/nvidia/cusolver/lib:.venv/lib/python3.12/site-packages/nvidia/cusparse/lib:.venv/lib/python3.12/site-packages/nvidia/nccl/lib:.venv/lib/python3.12/site-packages/nvidia/nvjitlink/lib
# OK, so make the LD_LIBRARY_PATH variable:
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)
And so, in the same terminal:
poetry run python -c "import tensorflow;tensorflow.keras.Sequential().compile()"
# output
#...
I0000 00:00:1737319126.393498 16936 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4169 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Hooray, my GPU is now detected!
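To avoid retyping the export in every new terminal, you could wrap it in a small helper script (run-gpu.sh is just a name I made up, and train.py is a placeholder for your own entry point):
#!/usr/bin/env bash
# run-gpu.sh (hypothetical): build LD_LIBRARY_PATH from the Nvidia pip packages,
# then run whatever command you pass to it
export LD_LIBRARY_PATH=$(find .venv -name "*.so*" | grep nvidia | xargs dirname | sort -u | paste -d ":" -s -)
exec "$@"
# usage, from the project root:
chmod +x run-gpu.sh
./run-gpu.sh poetry run python train.py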