The difference between the nvidia-docker and docker commands
This may be a trivial thing to notice, but it came as quite a shock to me, so I'm writing it down.
Conclusion
A container started with nvidia-docker has /usr/local/nvidia/lib and /usr/local/nvidia/lib64; one started with plain docker is missing both entirely.
$ sudo docker run --name temp -it nvidia/cuda:cudnn /bin/bash
ls /usr/local/nvidia/lib
ls: cannot access /usr/local/nvidia/lib: No such file or directory
$ sudo nvidia-docker run --name temp -it nvidia/cuda:cudnn /bin/bash
root@53d372f5a1e0:/# ls /usr/local/nvidia/lib
libEGL.so.1 libGLX_indirect.so.0 libnvidia-fbc.so.1 ...
When I tried to set up tensorflow in a container created with the plain docker command, this is what I got:
2017-01-26T07:37:23.139264630Z I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
2017-01-26T07:37:23.356126761Z I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
2017-01-26T07:37:23.448104369Z I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
2017-01-26T07:37:23.448596050Z I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2017-01-26T07:37:23.448658973Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 5d740957f6c8
2017-01-26T07:37:23.448668843Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2017-01-26T07:37:23.448676431Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
2017-01-26T07:37:23.448684466Z GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
2017-01-26T07:37:23.448691355Z """
2017-01-26T07:37:23.448700503Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
2017-01-26T07:37:23.448707809Z I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2017-01-26T07:37:23.448715019Z I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2017-01-26T07:37:23.496708922Z I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
2017-01-26T07:37:25.040346969Z E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2017-01-26T07:37:25.040379825Z I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:152] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
So that is how I got stuck and ended up feeling rather foolish.
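For what it's worth, a quick check run inside the container before starting tensorflow would probably have caught this sooner. The following is only a rough sketch, not anything nvidia-docker or tensorflow provides: the function name check_gpu_runtime is made up for illustration, and the default paths and library name are simply the ones that appear in the output above (the two /usr/local/nvidia directories, /dev/nvidia0, and libcuda.so.1). Other driver setups may use different locations.

```python
import ctypes
import os


def check_gpu_runtime(
    lib_dirs=("/usr/local/nvidia/lib", "/usr/local/nvidia/lib64"),
    device="/dev/nvidia0",
    driver_lib="libcuda.so.1",
):
    """Report which pieces of the NVIDIA runtime are visible in this container.

    Hypothetical helper: the defaults are taken from the session and log in
    this post, not from any official specification.
    """
    # Driver library directories that were present only under nvidia-docker.
    report = {d: os.path.isdir(d) for d in lib_dirs}

    # Device node that the tensorflow log complained about.
    report[device] = os.path.exists(device)

    # Try to dlopen the driver library the same way a CUDA program would.
    try:
        ctypes.CDLL(driver_lib)
        report[driver_lib] = True
    except OSError:
        report[driver_lib] = False

    return report


if __name__ == "__main__":
    for name, ok in check_gpu_runtime().items():
        print(("ok      " if ok else "MISSING ") + name)
```

If the entries come back as MISSING, the container was most likely started with plain docker rather than nvidia-docker.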