Installing CUDA, cuDNN, PyTorch, TensorFlow, and MXNet on Ubuntu | by 林塔恩


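Requirements: Ubuntu 20.04 or 18.04 is already installed.

Goals:

1. Install the NVIDIA driver
2. Install CUDA
3. Install cuDNN
4. Set up a virtual environment
5. Install PyTorch
6. Install TensorFlow
7. Install TensorRT
8. Install MXNet and NCCL

1. Installing the NVIDIA driver

Remove any existing NVIDIA driver (if you have one):

    sudo apt-get purge 'nvidia*'

Add the graphics driver PPA:

    sudo add-apt-repository ppa:graphics-drivers/ppa

Update the packages:

    sudo apt-get update
    sudo apt upgrade

If you see "Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?", see Bug 1 at the end of this article.

List the GPU driver versions that are currently supported:

    ubuntu-drivers list

Install the NVIDIA driver:

    sudo apt install nvidia-driver-VERSION_NUMBER_HERE

At the time of writing (2021/7) I use version 460:

    sudo apt install nvidia-driver-460

Reboot after the installation:

    sudo reboot

Check that the driver was installed successfully:

    nvidia-smi

2. Installing CUDA 11.2.2

https://developer.nvidia.com/cuda-toolkit-archive

Download the specific version from the official site, choosing Linux, Ubuntu, 18.04, runfile (local). If you want to install a TensorFlow version below 2.5, choose CUDA Toolkit 11.0 Update 1 instead.

    wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
    sudo sh cuda_11.2.2_460.32.03_linux.run

If the graphics driver is already installed, deselect the driver bundled with CUDA when the installer runs. Because the bundled driver was not installed, the installer prints the following warning, which can be ignored:

    ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 460.00 is required for CUDA 11.2 functionality to work.

Append the following lines to the end of the .bashrc file in your home directory (under /home/) and save it:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
    export PATH=$PATH:/usr/local/cuda/bin

Open a new terminal, or run source ~/.bashrc to refresh the current one, then check the CUDA version:

    nvcc -V

A correct installation prints:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Sun_Feb_14_21:12:58_PST_2021
    Cuda compilation tools, release 11.2, V11.2.152
    Build cuda_11.2.r11.2/compiler.29618528_0

If CUDA is already installed but something went wrong, see Bug 2 at the end of this article.

3. Installing cuDNN

Sign in to the NVIDIA cuDNN download page, register, accept the terms of use, and download the cuDNN build for Ubuntu 18.04.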

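https://developer.nvidia.com/rdp/cudnn-download

There are two ways to install it.

(1) Installing from the tar file

Download "cuDNN Library for Linux". It is a tar archive whose contents must be extracted and copied into the CUDA installation directory.

Extract it:

    tar -xzvf cudnn-11.3-linux-x64-v8.2.1.32.tgz

Copy the files into the CUDA installation directory. cuDNN 8 ships several cudnn*.h headers, and cp -P keeps the library symlinks intact:

    sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
    sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
    sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

(2) Installing from the deb files

Download the following three files:

    cuDNN Runtime Library for Ubuntu18.04 (Deb)
    cuDNN Developer Library for Ubuntu18.04 (Deb)
    cuDNN Code Samples and User Guide for Ubuntu18.04 (Deb)

Install them with dpkg -i, using the exact file names you downloaded.

Install the cuDNN runtime library:

    sudo dpkg -i libcudnn8_8.2.1.32-1+cuda11.3_amd64.deb

You may see this warning:

    /sbin/ldconfig.real: /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn.so.8 is not a symbolic link

To resolve it, see the note on this warning after the references at the end of this article.

Install the cuDNN developer library:

    sudo dpkg -i libcudnn8-dev_8.2.1.32-1+cuda11.3_amd64.deb

Install the cuDNN samples and user guide:

    sudo dpkg -i libcudnn8-doc_8.2.1.32-1+cuda11.3_amd64.deb

To test that cuDNN is installed correctly, run the following from the directory where you downloaded the cuDNN files:

    cp -r /usr/src/cudnn_samples_v8/ $HOME
    cd $HOME/cudnn_samples_v8/mnistCUDNN
    make clean && make
    ./mnistCUDNN

If the run ends with "Test passed!", the installation succeeded. If you get "FreeImage.h: No such file or directory", see Bug 4; if you get "Cuda failure ... Aborting", see Bug 3.

4. Setting up a virtual environment

You can use either virtualenv or Anaconda. On Ubuntu I recommend the more lightweight virtualenv.

(1) virtualenv

Install virtualenv:

    sudo apt-get install python3-pip
    sudo pip3 install virtualenv
    sudo pip3 install virtualenvwrapper

Append the following lines to the end of the .bashrc file in your home directory (under /home/) and save it. PATH_TO_PYTHON3 is a placeholder for the location of your python3 executable:

    export WORKON_HOME=~/.virtualenvs
    export VIRTUALENVWRAPPER_PYTHON=PATH_TO_PYTHON3
    source /usr/local/bin/virtualenvwrapper.sh

You can find your python3 executable with which python3; it is usually /usr/bin/python3.

When you are done, open a new terminal or run source ~/.bashrc to refresh the current one.

Create a new virtual environment (ENV_NAME is a placeholder for a name of your choice):

    mkvirtualenv ENV_NAME

Later, use workon to activate an existing virtual environment:

    workon ENV_NAME

(2) Anaconda

Download the Individual Edition of Anaconda from www.anaconda.com ("Anaconda | Individual Edition").

Make sure you install Anaconda as a user with sudo privileges:

    bash Anaconda3-2021.05-Linux-x86_64.sh

Press Enter until you reach the end of the license text, then answer yes:

    Do you approve the license terms? [yes|no]

Also answer yes to let the installer set up conda in your shell:

    Do you wish the installer to prepend the Anaconda3 install location to PATH in your /home/fishworm/.bashrc ? [yes|no]

After the installation finishes, check what was installed:

    conda list anaconda

Create a virtual environment:

    conda create --name python3 numpy

Activate the conda environment: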
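    conda activate ENV_NAME

Here ENV_NAME is the name of your environment.

5. Installing PyTorch

Activate the virtual environment, then go to pytorch.org, find "Stable" for Linux, and pick the matching CUDA version. The command below is for an Anaconda environment:

    conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia

The packages are downloaded and installed automatically. To test the installation, start python and enter:

    >>> import torch
    >>> torch.cuda.is_available()
    True

An output of True means the installation is correct.

6. Installing TensorFlow

Activate the virtual environment and install TensorFlow:

    pip install tensorflow

The TensorFlow 2.5 that pip installs at the time of writing (2021/7) already includes GPU support. To test the installation, start python and enter:

    >>> import tensorflow as tf
    >>> physical_devices = tf.config.list_physical_devices('GPU')
    >>> print("Num GPUs:", len(physical_devices))
    Num GPUs: 1

If everything is correct, this prints the number of GPUs you have.

7. Installing TensorRT

At the time of writing (2022/3), TensorRT 8.2 GA supports CUDA 10.2, 11.0 update 1, 11.1 update 1, 11.2 update 2, 11.3 update 1, 11.4 update 3, 11.5 update 1, and 11.6.

The TensorRT Python API requires pycuda. Note that pycuda must be installed after CUDA and numpy:

    python3 -m pip install 'pycuda<2021.1'

If you get an "nvcc not in path" error, add the following to ~/.bashrc:

    export PATH=$PATH:/usr/local/cuda/bin
    export CUDA_ROOT=/usr/local/cuda

Download TensorRT from https://developer.nvidia.com/nvidia-tensorrt-download. You can install it from either a deb or a tar package; this guide uses the tar package. After downloading, extract it:

    version="8.x.x.x"
    arch=$(uname -m)
    cuda="cuda-x.x"
    cudnn="cudnn8.x"
    tar -xzvf TensorRT-${version}.Linux.${arch}-gnu.${cuda}.${cudnn}.tar.gz

In my case this is:

    tar -zxvf TensorRT-8.2.3.0.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz

Add the lib directory of the extracted TensorRT to LD_LIBRARY_PATH by opening ~/.bashrc and appending an export line at the end:

    gedit ~/.bashrc

I extracted the archive into $HOME, so my line is:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/TensorRT-8.2.3.0/lib

Activate the virtual environment before installing the Python TensorRT wheel file. All of the following installs are done inside the virtual environment, and each cd is relative to the directory where you extracted the archive:

    workon ENV_NAME
    cd TensorRT-${version}/python

With python3.x, install the wheel whose cp3x tag matches your Python 3 version (the x in cp3x is your minor version):

    pip3 install tensorrt-*-cp3x-none-linux_x86_64.whl

Install the Python UFF wheel file. This is only needed if you use TensorFlow together with TensorRT:

    cd TensorRT-${version}/uff
    pip3 install uff-0.6.9-py2.py3-none-any.whl

Check the installation:

    which convert-to-uff

Install the Python graphsurgeon wheel file:

    cd TensorRT-${version}/graphsurgeon
    pip3 install graphsurgeon-0.4.5-py2.py3-none-any.whl

Install the Python onnx-graphsurgeon wheel file:

    cd TensorRT-${version}/onnx_graphsurgeon
    pip3 install onnx_graphsurgeon-0.3.12-py2.py3-none-any.whl

To confirm the installation, start python and enter:

    import pycuda
    import tensorrt
    import tensorflow
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

You can also run sampleMNIST as a further test.

8. Installing MXNet and NCCL

Starting with mxnet 1.8+cu112, MXNet needs the NCCL library in order to run. Without NCCL, import mxnet fails with:

    OSError: libnccl.so.2: cannot open shared object file: No such file or directory

There are two ways to set up the NCCL package source.

(1) Network repository

    wget https://developer.download.nvidia.com/compute/cuda/repos/DISTRO/ARCH/cuda-DISTRO.pin
    sudo mv cuda-DISTRO.pin /etc/apt/preferences.d/cuda-repository-pin-600

DISTRO is the OS version, for example ubuntu1604, ubuntu1804, or ubuntu2004. ARCH is the CPU architecture, for example x86_64, ppc64le, or sbsa.

Add the apt key and update:

    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/DISTRO/ARCH/7fa2af80.pub
    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/DISTRO/ARCH/ /"
    sudo apt-get update

(2) Local deb file

This requires registration at https://developer.nvidia.com/nccl/nccl-download. Choose the version you need, download it, and install it (VERSION is a placeholder for the version in the downloaded file name):

    sudo dpkg -i nccl-repo-VERSION.deb

After completing either of the two methods above, install NCCL. NCCL_VERSION and CUDA_TAG are placeholders for the NCCL package version and the CUDA tag of the package you want:

    sudo apt install libnccl2=NCCL_VERSION+CUDA_TAG libnccl-dev=NCCL_VERSION+CUDA_TAG

Then install the MXNet build for CUDA 11.2:

    pip install mxnet-cu112

Test the installation:

    >>> import mxnet as mx
    >>> a = mx.nd.ones((2, 3), mx.gpu())
    >>> b = a * 2 + 1
    >>> b.asnumpy()
    array([[3., 3., 3.],
           [3., 3., 3.]], dtype=float32)

References:

    https://www.mvps.net/docs/install-nvidia-drivers-ubuntu-18-04-lts-bionic-beaver-linux/
    https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart
    https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
    https://docs.nvidia.com/deeplearning/nccl/install-guide/

Fixing the "libcudnn.so.8 is not a symbolic link" warning

This warning appears when the symlinks are lost while copying the cuDNN files, for example when cp is used without -P.

    cd /usr/local/cuda/lib64
    ls -lha libcudnn*

In theory the listing should look like this:

    lrwxrwxrwx 1 root root   13 Mar 13 15:37 libcudnn.so -> libcudnn.so.8
    lrwxrwxrwx 1 root root   17 Mar 13 15:37 libcudnn.so.8 -> libcudnn.so.8.2.1
    -rwxr-xr-x 1 root root 439M Mar 13 10:26 libcudnn.so.8.2.1
    -rw-r--r-- 1 root root 413M Mar 13 10:26 libcudnn_static.a

What I actually had was:

    -rwxr-xr-x 1 root root 439M Mar 13 10:26 libcudnn.so
    -rwxr-xr-x 1 root root 439M Mar 13 10:26 libcudnn.so.8
    -rwxr-xr-x 1 root root 439M Mar 13 10:26 libcudnn.so.8.2.1
    -rw-r--r-- 1 root root 413M Mar 13 10:26 libcudnn_static.a

The links are gone and have to be recreated:

    sudo ln -sf libcudnn.so.8.2.1 libcudnn.so.8
    sudo ln -sf libcudnn.so.8 libcudnn.so

Do the same for every other libcudnn file, then refresh the linker cache:

    sudo ldconfig

If "... is not a symbolic link" no longer appears, the problem is solved.

The following are the bugs I ran into.

Bug 1: Unable to update

    E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
    E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?

These errors mean that a file is locked, so check what is currently running.

Method 1

Check whether a graphical APT front end such as Ubuntu Software Center or Synaptic Package Manager is running. Alternatively, find the processes that are using apt-get from the command line and stop them:

    ps aux | grep -i apt

The output shows the process IDs. Kill the one you found (PID is that number):

    sudo kill -9 PID

Or kill them all:

    sudo killall apt apt-get

Method 2

If the network dropped or the machine shut down in the middle of an update, the update can stay locked. In that case no process is running, but the error still occurs because the lock file is left behind. A lock file prevents several programs from accessing the same files at once. apt-get creates it while running, and if apt-get does not exit cleanly the lock file remains and blocks further installs.

Use lsof to check whether any process holds the lock files:

    sudo lsof /var/lib/dpkg/lock
    sudo lsof /var/lib/apt/lists/lock
    sudo lsof /var/cache/apt/archives/lock

The PID in the output is the process that holds the lock file. Kill it (PID is that number):

    sudo kill -9 PID

Then delete the lock files:

    sudo rm /var/lib/apt/lists/lock
    sudo rm /var/cache/apt/archives/lock
    sudo rm /var/lib/dpkg/lock

Reconfigure the packages:

    sudo dpkg --configure -a

If there is still a problem, you will see "dpkg: error: dpkg frontend is locked by another process". In that case, find the process that holds the front-end lock file:

    sudo lsof /var/lib/dpkg/lock-frontend

Kill it:

    sudo kill -9 PID

Remove the lock file and try again:

    sudo rm /var/lib/dpkg/lock-frontend
    sudo dpkg --configure -a

Bug 2: CUDA installation failed, how to remove it

The commands below apply to a toolkit installed from the Ubuntu nvidia-cuda-toolkit package. A runfile installation, as in section 2, ships its own uninstaller at /usr/local/cuda-11.2/bin/cuda-uninstaller.

Remove CUDA:

    sudo apt-get remove nvidia-cuda-toolkit

Remove CUDA and its dependencies:

    sudo apt-get remove --auto-remove nvidia-cuda-toolkit

Purge all of its data, with either of:

    sudo apt-get purge nvidia-cuda-toolkit
    sudo apt-get purge --auto-remove nvidia-cuda-toolkit

Bug 3: cuDNN verification failed

    ubuntu@ubuntu:$ ./mnistCUDNN
    cudnnGetVersion() : 7003 , CUDNN_VERSION from cudnn.h : 7003 (7.0.3)
    Cuda failurer version : GCC 5.4.0
    Error: CUDA driver version is insufficient for CUDA runtime version
    error_util.h:93
    Aborting...

According to https://devtalk.nvidia.com/default/topic/1025828/cudnn/failed-cudnn-test-mnistcudnn-/1, rebooting once makes the test succeed.

Bug 4: FreeImage.h: No such file or directory

The sample code used to verify cuDNN needs FreeImage. Install it with:

    sudo apt-get install libfreeimage3 libfreeimage-dev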
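
The import checks from sections 5 to 8 can be gathered into one small script. What follows is a minimal sketch and not part of the original walkthrough: the helper names check_module and summarize are hypothetical, and the script only reports whether each package imports. It also catches non-ImportError failures, such as the OSError about libnccl.so.2 described in section 8.

```python
import importlib


def check_module(module_name):
    """Try to import a module; return (ok, version string or error text)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Covers a missing package (ModuleNotFoundError) as well as load-time
        # failures such as an OSError for a missing shared library.
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(getattr(module, "__version__", "unknown version"))


def summarize(module_names):
    """Return one human-readable status line per module name."""
    lines = []
    for name in module_names:
        ok, info = check_module(name)
        status = "OK" if ok else "FAILED"
        lines.append(f"{name}: {status} ({info})")
    return lines
```

Inside the activated environment, `print("\n".join(summarize(["torch", "tensorflow", "mxnet", "tensorrt", "pycuda"])))` prints one line per package. This only confirms that the imports work; whether a GPU is visible still has to be checked with the framework-specific calls shown in each section.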
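
The manual ls check for lost cuDNN symlinks can also be scripted. The sketch below is an illustration under stated assumptions, not an NVIDIA tool: the function name find_unlinked_libs is hypothetical, and it assumes that the real library files carry a three-part version (as in libcudnn.so.8.2.1 above) while shorter names such as libcudnn.so and libcudnn.so.8 are expected to be symlinks.

```python
import os


def find_unlinked_libs(lib_dir, prefix="libcudnn"):
    """Return shared-library names that are expected to be symlinks but are not."""
    suspects = []
    for name in sorted(os.listdir(lib_dir)):
        # Skip unrelated files and static archives such as libcudnn_static.a.
        if not name.startswith(prefix) or ".so" not in name:
            continue
        # Text after ".so": "" for libcudnn.so, ".8" for libcudnn.so.8,
        # ".8.2.1" for the fully versioned real library.
        suffix = name.split(".so", 1)[1]
        is_real_library = suffix.count(".") >= 3  # assumption: three-part version
        path = os.path.join(lib_dir, name)
        if not is_real_library and not os.path.islink(path):
            suspects.append(name)
    return suspects
```

Calling `find_unlinked_libs("/usr/local/cuda/lib64")` returns the names that still need to be recreated with ln -sf; an empty list means every short name is already a symlink.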