walkingmask’s development log

Casual notes on IT topics

Differences between the nvidia-docker and docker commands

This may be a trivial observation, but it came as quite a shock to me, so here is a note.

Conclusion

In a container started with nvidia-docker, /usr/local/nvidia/lib and /usr/local/nvidia/lib64 exist. In a container started with plain docker, both are missing entirely.

$ sudo docker run --name temp -it nvidia/cuda:cudnn /bin/bash
ls /usr/local/nvidia/lib
ls: cannot access /usr/local/nvidia/lib: No such file or directory

$ sudo nvidia-docker run --name temp -it nvidia/cuda:cudnn /bin/bash
root@53d372f5a1e0:/# ls /usr/local/nvidia/lib
libEGL.so.1                    libGLX_indirect.so.0                 libnvidia-fbc.so.1
...
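The same check can be scripted so that it runs before anything GPU-related starts. The following is only a minimal sketch: the function name `missing_driver_dirs` and the `root` parameter are hypothetical (the latter exists just to make the function easy to try against a scratch directory), and the two directory names are taken from the listing above.

```python
import os

# The two directories that were present under nvidia-docker and absent
# under plain docker in the transcripts above (relative to the root).
DRIVER_DIRS = ("usr/local/nvidia/lib", "usr/local/nvidia/lib64")


def missing_driver_dirs(root="/"):
    """Return the driver directories that do not exist under `root`.

    `root` is a hypothetical parameter for illustration; inside a real
    container the default "/" is what you would use.
    """
    missing = []
    for rel in DRIVER_DIRS:
        path = os.path.join(root, rel)
        if not os.path.isdir(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    missing = missing_driver_dirs()
    if missing:
        print("missing driver directories:", missing)
    else:
        print("driver directories are present")
```

An empty list suggests the container has the driver directories; a non-empty one is a hint that it may have been started with plain docker.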

When I tried to set up tensorflow in a container created with the plain docker command, this is what I got:

2017-01-26T07:37:23.139264630Z I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
2017-01-26T07:37:23.356126761Z I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
2017-01-26T07:37:23.448104369Z I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
2017-01-26T07:37:23.448596050Z I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2017-01-26T07:37:23.448658973Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 5d740957f6c8
2017-01-26T07:37:23.448668843Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2017-01-26T07:37:23.448676431Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016
2017-01-26T07:37:23.448684466Z GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 
2017-01-26T07:37:23.448691355Z """
2017-01-26T07:37:23.448700503Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
2017-01-26T07:37:23.448707809Z I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2017-01-26T07:37:23.448715019Z I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2017-01-26T07:37:23.496708922Z I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
2017-01-26T07:37:25.040346969Z E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2017-01-26T07:37:25.040379825Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:152] no NVIDIA GPU device is present: /dev/nvidia0 does not exist

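The log shows that libcuda.so.1 could not be found under the LD_LIBRARY_PATH directories and that /dev/nvidia0 does not exist in the container. That is how I got stuck and ended up feeling foolish.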
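The two failures in the log (libcuda.so.1 not found under LD_LIBRARY_PATH, and /dev/nvidia0 missing) can be checked up front before importing tensorflow. This is a rough sketch, not what tensorflow itself does: `find_in_library_path` and `gpu_preflight` are hypothetical helper names, and the check only looks at LD_LIBRARY_PATH, whereas the real dynamic loader also consults other locations, so a "not found" here is a hint rather than proof.

```python
import os


def find_in_library_path(name, library_path):
    """Return the first existing `name` in a colon-separated path list, else None."""
    for directory in library_path.split(":"):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


def gpu_preflight(library_path=None, device="/dev/nvidia0"):
    """Return a list of human-readable problems; an empty list means both checks passed.

    Only LD_LIBRARY_PATH is searched (an assumption made for simplicity).
    """
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    problems = []
    if find_in_library_path("libcuda.so.1", library_path) is None:
        problems.append("libcuda.so.1 not found in: " + library_path)
    if not os.path.exists(device):
        problems.append(device + " does not exist")
    return problems


if __name__ == "__main__":
    for problem in gpu_preflight():
        print("WARNING:", problem)
```

In a container like the one that produced the log above, both warnings would be expected to appear, which points at how the container was launched rather than at the tensorflow installation.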