Nvidia GPU Exporter Service
Introduction
This guide explains how to install nvidia_gpu_exporter as a systemd service on Ubuntu 24.04. There are two main options for monitoring your GPU with Prometheus and Grafana. This guide covers nvidia_gpu_exporter, but there is also the official nvidia_dcgm_exporter from Nvidia, which runs in Docker.
- GitHub: nvidia_gpu_exporter
Prerequisites
- The latest Nvidia drivers should be installed.
- Prometheus should be set up.
- Grafana should be connected to your Prometheus instance.
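The exporter gathers its data by calling the nvidia-smi binary that ships with the Nvidia driver, so a quick way to confirm the driver prerequisite is to check that nvidia-smi is on the PATH. The following is a minimal sketch; the helper name have_cmd is made up for this example.

```shell
# Report whether a required command is available on PATH.
# have_cmd is a hypothetical helper name used only in this sketch.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# nvidia-smi is installed together with the Nvidia driver.
if have_cmd nvidia-smi; then
  echo "nvidia-smi found"
else
  echo "nvidia-smi not found: install the Nvidia driver first"
fi
```

If nvidia-smi is found, running it with no arguments should list the GPUs that the driver can see.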
Download and Install the nvidia_gpu_exporter
Navigate to the nvidia_gpu_exporter releases page and locate the release with the "Latest" tag. At the time of writing, that is nvidia_gpu_exporter_1.2.1_linux_x86_64.tar.gz, which is used as the example in the commands that follow.
Download the latest version to the home directory
wget -P ~/ https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v1.2.1/nvidia_gpu_exporter_1.2.1_linux_x86_64.tar.gz
Extract the downloaded file into the home directory
tar -xvf ~/nvidia_gpu_exporter_1.2.1_linux_x86_64.tar.gz -C ~/
Move the extracted binary
sudo mv ~/nvidia_gpu_exporter /usr/local/bin/nvidia_gpu_exporter
Make the binary executable
sudo chmod +x /usr/local/bin/nvidia_gpu_exporter
Remove the unneeded files
rm -f ~/nvidia_gpu_exporter_1.2.1_linux_x86_64*
rm ~/LICENSE
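Because the version number appears in several of the commands above, it can help to keep it in one variable when a newer release comes out. The sketch below builds the archive name and download URL from a version string. The pattern is copied from the v1.2.1 example in this guide, and other releases are only assumed to follow the same naming scheme. The function names archive_name and release_url are made up for this example.

```shell
# Build the release archive name from a version string such as "1.2.1".
# Assumption: other releases use the same naming scheme as v1.2.1.
archive_name() {
  printf 'nvidia_gpu_exporter_%s_linux_x86_64.tar.gz\n' "$1"
}

# Build the full GitHub download URL for that version.
release_url() {
  printf 'https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v%s/%s\n' \
    "$1" "$(archive_name "$1")"
}

VERSION="1.2.1"
release_url "$VERSION"
```

The result can be passed straight to the download step, for example wget -P ~/ "$(release_url "$VERSION")".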
Create the nvidia_gpu_exporter user
sudo useradd -r -s /bin/false -c "nvidia_gpu_exporter service account" -d /nonexistent nvidia_gpu_exporter
Create the service
sudo nano /etc/systemd/system/nvidia_gpu_exporter.service
Add the following configuration, then press ctrl + x, then y, then enter to save.
[Unit]
Description=Nvidia GPU Exporter
After=network-online.target
[Service]
Type=simple
User=nvidia_gpu_exporter
Group=nvidia_gpu_exporter
ExecStart=/usr/local/bin/nvidia_gpu_exporter
SyslogIdentifier=nvidia_gpu_exporter
Restart=always
RestartSec=1
[Install]
WantedBy=multi-user.target
Reload the daemon
sudo systemctl daemon-reload
Start the service
sudo systemctl start nvidia_gpu_exporter
Enable service to start on boot
sudo systemctl enable nvidia_gpu_exporter
Verify the status
sudo systemctl status nvidia_gpu_exporter
You should see that it is enabled and active.
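Beyond the systemd status, it can be reassuring to confirm that the exporter is actually serving GPU metrics. The sketch below counts the metric samples in the exporter's output. It assumes the exporter listens on port 9835 (the port used in the scrape config in this guide), serves metrics at the conventional /metrics path, and prefixes its GPU metrics with nvidia_smi_. The path and prefix are assumptions about the exporter's defaults, and count_metrics is a made-up helper name.

```shell
# Count sample lines on stdin whose metric name starts with the given prefix.
# "# HELP" and "# TYPE" comment lines start with '#', so they are not counted.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_metrics() {
  grep -c "^$1" || true
}

# Example (assumed port, path, and prefix):
#   curl -s http://localhost:9835/metrics | count_metrics nvidia_smi_
```

A count of 0 suggests the exporter is running but cannot read data from the GPU, in which case the service logs are the next place to look.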
Update the Config
Open the Prometheus config
sudo nano /etc/prometheus/prometheus.yml
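Then add a new job for nvidia_gpu_exporter. If the file already has a scrape_configs section, add the job under it rather than creating a second one.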
scrape_configs:
  - job_name: 'nvidia_gpu_exporter'
    static_configs:
      - targets: ['localhost:9835']
        labels:
          instance: '<hostname>'
If nvidia_gpu_exporter is installed on a different host from the one running Prometheus, use the IP address of the exporter host instead of localhost.
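Since a YAML indentation mistake can keep Prometheus from starting, it may be worth validating the edited file before restarting the service. The sketch below wraps promtool check config, which ships with Prometheus. The wrapper name validate_prometheus_config is made up for this example, and the default path is the one used in this guide.

```shell
# Validate a Prometheus config file with promtool.
# Returns 2 if promtool is not installed, otherwise promtool's own exit status.
# validate_prometheus_config is a hypothetical wrapper name.
validate_prometheus_config() {
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found; skipping validation" >&2
    return 2
  fi
  promtool check config "${1:-/etc/prometheus/prometheus.yml}"
}

# Example:
#   validate_prometheus_config /etc/prometheus/prometheus.yml
```

If the check reports an error, fix the config first; restarting with a broken file would take Prometheus down.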
Restart the Prometheus service
sudo systemctl restart prometheus
Verify Target is Up
To confirm that the nvidia_gpu_exporter target was loaded, check the Prometheus web frontend. Click the "Status" menu item in the top navbar and then the "Targets" menu item to see all configured targets. Look for the one labeled "nvidia_gpu_exporter" and confirm that its state is "UP".
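The same check can be done from the command line by asking the Prometheus HTTP API for the up metric of the job. The sketch below extracts the sample value from an instant-query response. It assumes Prometheus listens on its default port 9090 and returns compact JSON (no spaces between tokens). The helper name up_value is made up for this example.

```shell
# Extract the sample value from a Prometheus instant-query JSON response on stdin.
# Prints "1" for a target that is up and "0" for one that is down.
# Assumption: the response is compact JSON, e.g. "value":[1700000000.123,"1"].
up_value() {
  grep -o '"value":\[[^]]*\]' | sed -E 's/.*,"([^"]*)"\]$/\1/'
}

# Example (assumed Prometheus address):
#   curl -sG http://localhost:9090/api/v1/query \
#     --data-urlencode 'query=up{job="nvidia_gpu_exporter"}' | up_value
```

No output at all means the query matched no series, which usually indicates that the job was not loaded from the config.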
Importing the Dashboard
There is a default dashboard that you can import to get started. Open Grafana, go to "Dashboards", click the "New" button, and select "Import" from the dropdown. Enter 14574 where it says "URL or ID", then click "Load".
I usually make the name a bit more descriptive and change the UID slightly as well. Then select the Prometheus data source you set up. Finally, click "Import".
And that's it! You now have a nice dashboard to monitor your GPU.