Nvidia GPU Exporter Service

Introduction

This guide explains how to install nvidia_gpu_exporter as a systemd service on Ubuntu 24.04. There are two main options for monitoring an Nvidia GPU with Prometheus and Grafana. This guide covers nvidia_gpu_exporter, but there is also the official dcgm-exporter from Nvidia, which runs in Docker.

Prerequisites

The latest Nvidia drivers should be installed

Prometheus should be set up

Grafana should be connected to your Prometheus instance
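
Before continuing, it can help to confirm that the tools the later steps rely on are on the PATH. The sketch below is one way to do that. The function name check_commands is made up for this example, and it assumes nvidia-smi was installed along with the Nvidia driver, which is the usual case.

```shell
# Hypothetical helper: print "missing: <name>" for each command not found on PATH.
# Returns 0 when every command is present and 1 otherwise.
check_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}

# nvidia-smi normally ships with the driver; wget and tar are used in the install steps.
check_commands nvidia-smi wget tar || echo "Install the missing tools before continuing."
```

If nothing is printed, all three commands were found.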

Download and Install the nvidia_gpu_exporter

Navigate to the nvidia_gpu_exporter releases and locate the release with the "Latest" tag. At the time of writing, that is nvidia_gpu_exporter_1.2.1_linux_x86_64.tar.gz, which is used as the example in the commands throughout this guide.

Download the latest version to the home directory

wget -P ~/ https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v1.2.1/nvidia_gpu_exporter_1.2.1_linux_x86_64.tar.gz

Extract the downloaded file into the home directory

tar -xvf ~/nvidia_gpu_exporter_1.2.1_linux_x86_64.tar.gz -C ~/

Move the extracted binary to /usr/local/bin

sudo mv ~/nvidia_gpu_exporter /usr/local/bin/nvidia_gpu_exporter

Make the binary executable

sudo chmod +x /usr/local/bin/nvidia_gpu_exporter

Remove the unneeded files

rm -rf ~/nvidia_gpu_exporter_1.2.1_linux_x86_64*
rm ~/LICENSE
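
Because the version number appears in several of the commands above, one option is to keep it in a variable so that a newer release needs only a single change. The sketch below only builds and prints the file name and URL. The variable names are chosen for this example, and it assumes other releases follow the same URL pattern as the 1.2.1 link.

```shell
# Version to install; 1.2.1 is the example release used in this guide.
VERSION="1.2.1"

# File name and download URL, following the pattern of the 1.2.1 release link.
ARCHIVE="nvidia_gpu_exporter_${VERSION}_linux_x86_64.tar.gz"
URL="https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v${VERSION}/${ARCHIVE}"

echo "$ARCHIVE"
echo "$URL"
```

With these set, the download step can be written as wget -P ~/ "$URL".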

Create the nvidia_gpu_exporter user

sudo useradd -r -s /bin/false -c "nvidia_gpu_exporter service account" -d /nonexistent nvidia_gpu_exporter

Create the service

sudo nano /etc/systemd/system/nvidia_gpu_exporter.service

Add the following configuration, then press Ctrl + X, then Y, then Enter to save.

[Unit]
Description=Nvidia GPU Exporter
After=network-online.target

[Service]
Type=simple

User=nvidia_gpu_exporter
Group=nvidia_gpu_exporter

ExecStart=/usr/local/bin/nvidia_gpu_exporter

SyslogIdentifier=nvidia_gpu_exporter

Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target

Reload the daemon

sudo systemctl daemon-reload

Start the service

sudo systemctl start nvidia_gpu_exporter

Enable the service to start on boot

sudo systemctl enable nvidia_gpu_exporter

Verify the status

sudo systemctl status nvidia_gpu_exporter

The output should show the service as enabled and active (running).
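
Beyond the systemd status, it is worth checking that the exporter actually serves GPU metrics. The sketch below assumes the exporter listens on port 9835 (the port used in the Prometheus job in this guide) and that its metric names start with nvidia_smi_. The name count_gpu_metrics is a helper invented for this example.

```shell
# Hypothetical helper: count metric lines on stdin whose name starts with "nvidia_smi_".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the pipeline from failing.
count_gpu_metrics() {
  grep -c '^nvidia_smi_' || true
}

# On the host running the exporter:
#   curl -s http://localhost:9835/metrics | count_gpu_metrics
```

A count of 0 means no GPU metrics came back. In that case the service logs (journalctl -u nvidia_gpu_exporter) are the next place to look.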

Update the Config

Open the Prometheus config

sudo nano /etc/prometheus/prometheus.yml

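Then add a new job for nvidia_gpu_exporter. If the file already has a scrape_configs section, add the job under it rather than creating a second one.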

scrape_configs:
  - job_name: 'nvidia_gpu_exporter'
    static_configs:
      - targets: ['localhost:9835']
        labels:
          instance: '<hostname>'

If nvidia_gpu_exporter is installed on a different host from the one running Prometheus, use the IP address of the exporter host instead of localhost.

Restart the Prometheus service
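
An indentation mistake in prometheus.yml can stop Prometheus from starting, so it can be worth validating the file before restarting the service. The sketch below uses promtool, which is distributed with Prometheus but may not be on the PATH in every install. The function name check_prometheus_config is invented for this example.

```shell
# Hypothetical wrapper: validate a Prometheus config file with promtool if it is available.
# $1 is the config path; it defaults to the path used in this guide.
check_prometheus_config() {
  config="${1:-/etc/prometheus/prometheus.yml}"
  if command -v promtool >/dev/null 2>&1; then
    promtool check config "$config"
  else
    echo "promtool not found; skipping check of $config"
  fi
}

# Usage:
#   check_prometheus_config
```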

sudo systemctl restart prometheus

Verify Target is Up

To confirm that the nvidia_gpu_exporter target was loaded, check the Prometheus web frontend. Click the "Status" menu item in the top navbar and then the "Targets" menu item to see all connected targets. Look for the one labeled "nvidia_gpu_exporter".
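
The same check can be made from the command line through the Prometheus HTTP API by querying the up metric for the job. The sketch below assumes Prometheus listens on its default port 9090 on localhost. The name target_is_up is a helper invented for this example; it looks for a sample with the value "1" in the JSON response.

```shell
# Hypothetical helper: succeed if the query response on stdin contains a sample with value "1".
# An instant-query result carries each sample as "value":[<timestamp>,"<value>"].
target_is_up() {
  grep -q '"value":\[[0-9.]*,"1"]'
}

# On the Prometheus host (default port 9090 assumed):
#   curl -s -G http://localhost:9090/api/v1/query \
#     --data-urlencode 'query=up{job="nvidia_gpu_exporter"}' | target_is_up && echo "target is up"
```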

Importing the Dashboard

There is a default dashboard that you can import to get started. Open Grafana, go to "Dashboards", click the "New" button, and select "Import" from the dropdown. Enter 14574 where it says "URL or ID", then click "Load".

I usually make the name a bit more descriptive and change the UID slightly as well. Then select the Prometheus data source you set up. Finally, click "Import".

And that's it! You now have a nice dashboard to monitor your GPU.