Setting Up a GPU Server on Ubuntu for Azure N-Series VMs: A Step-by-Step Guide
Setting up a GPU server on Ubuntu can be challenging because several pieces have to match: the NVIDIA driver, the CUDA toolkit, and the PyTorch version. This guide aims to simplify the process, particularly for Azure N-series VMs with NVIDIA Tesla T4 GPUs.
Steps to Set Up the GPU Server:
1. Set Up the Virtual Machine:
Start by creating a virtual machine with an N-series instance. Azure accounts often have no quota for N-series GPUs by default. If that is the case, request a quota increase; it is usually approved within 1-2 business days.
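To see whether a quota request is needed, you can compare current usage with the limit for the GPU family. The sketch below assumes the Azure CLI is installed and logged in. The region `eastus` and the helper name `fits_quota` are illustrative, and the family label in the comment may differ on your subscription.

```shell
# True when `needed` more vCPUs fit under the family limit.
# Arguments: current usage, limit, vCPUs needed (all integers).
fits_quota() {
  [ $(( $2 - $1 )) -ge "$3" ]
}

# Usage (not run automatically; assumes `az login` has been done):
#   az vm list-usage --location eastus --output table
# Read CurrentValue and Limit from the T4 family row (often labelled
# "Standard NCASv3_T4 Family vCPUs"), then check a 4-vCPU size:
#   fits_quota 0 8 4 && echo "enough quota" || echo "request an increase"
```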
2. Choose the Right OS Image:
Use the Ubuntu 20.04 image. Avoid images labeled as “Data Science” since they might lack proper security credentials and consume around 30-40 GB of space unnecessarily.
3. Configure SSD Storage:
Select SSD storage with at least 128 GB. You can attach an existing disk or create a new one. If you attach an existing disk, remember to mount it manually on your VM. Keep in mind that the /mnt space provided by Azure VMs is a temporary disk: its contents can be lost when the VM is deallocated, redeployed, or moved to another host, so do not keep anything there that you need to persist.
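Mounting an attached disk by hand might look like the sketch below. It assumes the disk already carries an ext4 filesystem. The device `/dev/sdc1`, the mount point `/data`, and both helper names are hypothetical, so confirm the real device with `lsblk` before running anything.

```shell
# Print an /etc/fstab line for a filesystem UUID and a mount point.
# "nofail" lets the VM boot even if the disk is missing.
fstab_entry() {
  printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$1" "$2"
}

# Mount an already-formatted disk and register it in /etc/fstab.
mount_data_disk() {
  device="$1"        # e.g. /dev/sdc1 (hypothetical)
  mount_point="$2"   # e.g. /data (hypothetical)
  sudo mkdir -p "$mount_point"
  sudo mount "$device" "$mount_point"
  uuid="$(sudo blkid -s UUID -o value "$device")"
  fstab_entry "$uuid" "$mount_point" | sudo tee -a /etc/fstab >/dev/null
}

# Usage (not run automatically):
#   lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT   # find the data disk first
#   mount_data_disk /dev/sdc1 /data
```

Referencing the disk by UUID rather than by device name keeps the entry valid if device names change between boots.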
Installing CUDA and NVIDIA Drivers:
After setting up the machine, you can install the necessary drivers and CUDA toolkit. Here’s how to proceed:
1. Update the Machine:
sudo apt update
2. Install Ubuntu Drivers:
sudo apt install -y ubuntu-drivers-common
3. Install the Latest NVIDIA Drivers:
sudo ubuntu-drivers install
4. Reboot the VM:
Restart your virtual machine to apply the changes.
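After the reboot, it is worth confirming that the driver loaded before moving on. The sketch below wraps `nvidia-smi` in a few small helpers. The function names are illustrative, and the version comparison relies on GNU `sort -V`, which Ubuntu ships. The 12.2 in the usage lines is only an example target; substitute the toolkit version you plan to install.

```shell
# Print GPU name and driver version, or fail if the driver tools are missing.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: the driver is not installed or not loaded" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

# Read `nvidia-smi` output on stdin and print the highest CUDA version
# the installed driver supports (shown in the banner as "CUDA Version: X.Y").
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# True when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage (on the VM):
#   check_gpu
#   supported="$(nvidia-smi | driver_cuda_version)"
#   version_ge "$supported" 12.2 && echo "driver can run CUDA 12.2"
```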
5. Install CUDA Toolkit:
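Follow these steps to install CUDA 12.2. Note that compatibility runs in one direction: a driver that supports CUDA 12.2 can also run applications built against older CUDA versions (e.g., models needing CUDA 11.7), but a driver that only supports an older CUDA version cannot run applications built against a newer one.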
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2004-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
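Once the install finishes, the toolkit usually still has to be added to the shell's search paths before `nvcc` is found. The sketch below assumes the installer created the `/usr/local/cuda` symlink, so adjust the path if your install differs. The helper name `cuda_release` is illustrative, and the last usage line assumes PyTorch is already installed.

```shell
# Make the toolkit visible in the current shell.
# /usr/local/cuda is an assumed install location.
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Read `nvcc --version` output on stdin and print the toolkit release (X.Y).
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage (on the VM):
#   nvcc --version | cuda_release
#   python3 -c "import torch; print(torch.cuda.is_available())"
```

To keep the settings across logins, the two `export` lines can be appended to `~/.bashrc`.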