
How to Install Prometheus and Grafana on Ubuntu?

  • You can find the source code for this video in my GitHub Repo.

Install Prometheus on Ubuntu 20.04

  • First of all, let's create a dedicated Linux user, sometimes called a system account, for Prometheus. Having individual users for each service serves two main purposes:
    • It is a security measure to reduce the impact in case of an incident with the service.
    • It simplifies administration as it becomes easier to track down what resources belong to which service.
  • To create a system user or system account, run the following command:

    sudo useradd \
        --system \
        --no-create-home \
        --shell /bin/false prometheus
    
    --system - Will create a system account.
    --no-create-home - We don't need a home directory for Prometheus or any other system accounts in our case.
    --shell /bin/false - It prevents logging in as a Prometheus user.
    prometheus - Will create Prometheus user and a group with the exact same name.
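
To double-check the result, here is a minimal Python sketch that looks the account up in the local user database. The helper name describe_account is my own and not part of any Prometheus tooling; it only reads back what useradd recorded.

```python
import pwd


def describe_account(name):
    """Return uid, shell, and home for a local account, or None if it is missing."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # getpwnam raises KeyError when no such user exists.
        return None
    return {"uid": entry.pw_uid, "shell": entry.pw_shell, "home": entry.pw_dir}


if __name__ == "__main__":
    # After the useradd command above, the shell field should be /bin/false.
    print(describe_account("prometheus"))
```

If the function returns None, the useradd command did not complete. The home field may still name a path even though no directory was created for it.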

  • Let's check the latest version of Prometheus from the download page.

  • You can use the curl or wget command to download Prometheus.
    wget https://github.com/prometheus/prometheus/releases/download/v2.32.1/prometheus-2.32.1.linux-amd64.tar.gz
    
  • Then, we need to extract all Prometheus files from the archive.
    tar -xvf prometheus-2.32.1.linux-amd64.tar.gz
    
  • Usually, you would have a disk mounted to the data directory. For this tutorial, I will simply create a /data directory. You also need a folder for Prometheus configuration files.
    sudo mkdir -p /data /etc/prometheus
    
  • Now, let's change the directory to Prometheus and move some files.
    cd prometheus-2.32.1.linux-amd64
    
  • First of all, let's move the prometheus binary and promtool to /usr/local/bin/. promtool is used to check configuration files and Prometheus rules.
    sudo mv prometheus promtool /usr/local/bin/
    
  • Optionally, we can move console libraries to the Prometheus configuration directory. Console templates allow for the creation of arbitrary consoles using the Go templating language. You don't need to worry about it if you're just getting started.
    sudo mv consoles/ console_libraries/ /etc/prometheus/
    
  • Finally, let's move the example of the main prometheus configuration file.
    sudo mv prometheus.yml /etc/prometheus/prometheus.yml
    
  • To avoid permission issues, you need to set correct ownership for the /etc/prometheus/ and data directory.
    sudo chown -R prometheus:prometheus /etc/prometheus/ /data/
    
  • You can delete the archive and a Prometheus folder when you are done.
    cd
    rm -rf prometheus*
    
  • Verify that you can execute the Prometheus binary by running the following command:
    prometheus --version
    
  • To get more information and configuration options, run Prometheus help.
    prometheus --help
    
    We're going to use some of these options in the service definition.
  • We're going to use systemd, which is a system and service manager for Linux operating systems. For that, we need to create a systemd unit configuration file.
    sudo vim /etc/systemd/system/prometheus.service
    

prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=500
StartLimitBurst=5

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/data \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle

[Install]
WantedBy=multi-user.target
Let's go over a few of the most important options related to systemd and Prometheus.
Restart - Configures whether the service shall be restarted when the service process exits, is killed, or a timeout is reached.
RestartSec - Configures the time to sleep before restarting a service.
User and Group - The Linux user and group that run the Prometheus process.
--config.file=/etc/prometheus/prometheus.yml - Path to the main Prometheus configuration file.
--storage.tsdb.path=/data - Location to store Prometheus data.
--web.listen-address=0.0.0.0:9090 - Configure to listen on all network interfaces. In some situations, you may have a proxy such as nginx to redirect requests to Prometheus. In that case, you would configure Prometheus to listen only on localhost.
--web.enable-lifecycle - Allows you to manage Prometheus over HTTP, for example, to reload the configuration without restarting the service.
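
If you want to see exactly which flags a unit file passes to Prometheus, a small Python sketch can join the backslash-continued ExecStart line and split it into flags. The function name exec_start_flags and the trimmed UNIT_TEXT sample are my own assumptions for illustration; this is not how systemd itself parses unit files.

```python
import shlex

# Trimmed copy of the unit above, just enough for the example.
UNIT_TEXT = """\
[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \\
  --config.file=/etc/prometheus/prometheus.yml \\
  --storage.tsdb.path=/data \\
  --web.listen-address=0.0.0.0:9090 \\
  --web.enable-lifecycle
"""


def exec_start_flags(unit_text):
    """Return (binary, {flag: value}) for the first ExecStart= line."""
    # A trailing backslash continues the command on the next line.
    joined = unit_text.replace("\\\n", " ")
    for line in joined.splitlines():
        if line.startswith("ExecStart="):
            argv = shlex.split(line[len("ExecStart="):])
            flags = {}
            for arg in argv[1:]:
                name, _, value = arg.partition("=")
                flags[name] = value  # boolean flags get an empty value
            return argv[0], flags
    raise ValueError("no ExecStart= line found")


if __name__ == "__main__":
    binary, flags = exec_start_flags(UNIT_TEXT)
    print(binary)
    for name, value in flags.items():
        print(name, value)
```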

  • To start Prometheus automatically after a reboot, enable the service.
    sudo systemctl enable prometheus
    
  • Then start Prometheus.
    sudo systemctl start prometheus
    
  • To check the status of Prometheus, run the following command:
    sudo systemctl status prometheus
    
  • Suppose you encounter any issues with Prometheus or are unable to start it. The easiest way to find the problem is to use the journalctl command and search for errors.
    journalctl -u prometheus -f --no-pager
    
  • Now we can try to access it via browser. I'm going to be using the IP address of the Ubuntu server. You need to append port 9090 to the IP. For example http://<ip>:9090.
  • If you go to targets, you should see only one - Prometheus target. It scrapes itself every 15 seconds by default.
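
The same information shown on the targets page is also available from the HTTP API at /api/v1/targets. Below is a minimal Python sketch that summarizes such a response. The helper names and the SAMPLE payload are my own illustration, trimmed to the fields the code reads; fetch_targets assumes Prometheus is reachable at the URL you pass in.

```python
import json
import urllib.request


def fetch_targets(base_url):
    """GET /api/v1/targets from a running Prometheus and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as response:
        return json.load(response)


def summarize_targets(payload):
    """Map 'job/instance' to the reported health for every active target."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        summary[f"{labels['job']}/{labels['instance']}"] = target["health"]
    return summary


# Illustrative payload only, not captured from a real server.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "localhost:9090"}, "health": "up"}
        ]
    },
}

if __name__ == "__main__":
    print(summarize_targets(SAMPLE))
```

Against a live server you would call summarize_targets(fetch_targets("http://<ip>:9090")) instead of using the sample.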

Install Node Exporter on Ubuntu 20.04

  • Next, we're going to set up and configure Node Exporter to collect Linux system metrics like CPU load and disk I/O. Node Exporter will expose these as Prometheus-style metrics. Since the installation process is very similar, I'm not going to cover it in as much depth as Prometheus.
  • First, let's create a system user for Node Exporter by running the following command:
    sudo useradd \
        --system \
        --no-create-home \
        --shell /bin/false node_exporter
    
  • You can download Node Exporter from the same page.
  • Use wget command to download binary.
    wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
    
  • Extract node exporter from the archive.
    tar -xvf node_exporter-1.3.1.linux-amd64.tar.gz
    
  • Move binary to the /usr/local/bin.
    sudo mv \
      node_exporter-1.3.1.linux-amd64/node_exporter \
      /usr/local/bin/
    
  • Clean up, delete node_exporter archive and a folder.
    rm -rf node_exporter*
    
  • Verify that you can run the binary.
    node_exporter --version
    
  • Node Exporter has a lot of plugins that we can enable. If you run Node Exporter help you will get all the options.
    node_exporter --help
    
    --collector.logind - We're going to enable the logind collector, just for the demo.
  • Next, create a similar systemd unit file.
    sudo vim /etc/systemd/system/node_exporter.service
    

node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=500
StartLimitBurst=5

[Service]
User=node_exporter
Group=node_exporter
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/node_exporter \
    --collector.logind

[Install]
WantedBy=multi-user.target
Compared to the Prometheus unit, replace the user and group with node_exporter and update the ExecStart command.

  • To automatically start the Node Exporter after reboot, enable the service.
    sudo systemctl enable node_exporter
    
  • Then start the Node Exporter.
    sudo systemctl start node_exporter
    
  • Check the status of Node Exporter with the following command:
    sudo systemctl status node_exporter
    
  • If you have any issues, check the logs with journalctl.
    journalctl -u node_exporter -f --no-pager
    
  • At this point, we have only a single target in our Prometheus. There are many different service discovery mechanisms built into Prometheus. For example, Prometheus can dynamically discover targets in AWS, GCP, and other clouds based on the labels. In the following tutorials, I'll give you a few examples of deploying Prometheus in a cloud-specific environment. For this tutorial, let's keep it simple and keep adding static targets. Also, I have a lesson on how to deploy and manage Prometheus in the Kubernetes cluster.

  • To create a static target, you need to add job_name with static_configs.

    sudo vim /etc/prometheus/prometheus.yml
    

prometheus.yml
...
  - job_name: node_export
    static_configs:
      - targets: ["localhost:9100"]
By default, Node Exporter will be exposed on port 9100.

  • Since we enabled lifecycle management via API calls, we can reload the Prometheus config without restarting the service and causing downtime.
  • Before reloading, check that the config is valid.
    promtool check config /etc/prometheus/prometheus.yml
    
  • Then, you can use a POST request to reload the config.
    curl -X POST http://localhost:9090/-/reload
    
  • Check the targets section http://<ip>:9090/targets
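
The reload call can also be made from a script. Here is a minimal Python sketch of the same POST request using only the standard library. The function names are my own, and the sketch assumes Prometheus runs on localhost:9090 without authentication, as it does at this point in the tutorial.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build (but do not send) the POST request for the /-/reload endpoint."""
    return urllib.request.Request(f"{base_url}/-/reload", method="POST")


def reload_prometheus(base_url="http://localhost:9090"):
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url)) as response:
        return response.status


if __name__ == "__main__":
    request = build_reload_request()
    print(request.get_method(), request.full_url)
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before touching a running server.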

Install Grafana on Ubuntu 20.04

  • To visualize metrics we can use Grafana. There are many different data sources that Grafana supports, one of them is Prometheus.
  • First, let's make sure that all the dependencies are installed.
    sudo apt-get install -y apt-transport-https software-properties-common
    
  • Next, add GPG key.
    wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
    
  • Add this repository for stable releases.
    echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
    
  • After you add the repository, update and install Grafana.

    sudo apt-get update
    sudo apt-get -y install grafana
    

  • To automatically start the Grafana after reboot, enable the service.

    sudo systemctl enable grafana-server
    

  • Then start the Grafana.
    sudo systemctl start grafana-server
    
  • To check the status of Grafana, run the following command:
    sudo systemctl status grafana-server
    
  • Go to http://<ip>:3000 and log in to Grafana using the default credentials. The username is admin, and the password is admin as well.
  • When you log in for the first time, you get the option to change the password. Let's use devops123 for the new password.
  • To visualize metrics, you need to add a data source first. Click Add data source and select Prometheus. For the URL, enter http://localhost:9090 and click Save and test. You can see Data source is working.
  • Usually, in production environments, you would store all the configurations in Git. Let me show you another way to add a data source, as code. First, let's remove the data source from the UI.
  • Create a new datasources.yaml file.
    sudo vim /etc/grafana/provisioning/datasources/datasources.yaml
    

datasources.yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    isDefault: true
Optionally, you can make this data source a default one.

  • Restart Grafana to reload the config.
    sudo systemctl restart grafana-server
    
  • Go back to Grafana and refresh the page. You should see the Prometheus data source.
  • You can import existing Grafana dashboards or create your own. Let's create a simple graph.
  • Go back to the Prometheus, and let's explore what metrics we have. Start typing scrape_duration_seconds and click Execute. This metric will show you the duration of the scrape of each Prometheus target. At this point, we have node_exporter and prometheus targets. We're going to use this metric to create a simple graph in Grafana.
  • Go to Grafana and click create Dashboard and then add a new panel.
  • Give a title Scrape Duration and paste scrape_duration_seconds metric. You can also reduce the time interval to 1 hour.
  • For the legend, we can use the job label and for the unit - seconds. There are a lot of configuration parameters that you can use. Let's keep it simple and click apply and save dashboard as Prometheus.
  • Since we already have Node Exporter, we can import an open-source dashboard to visualize CPU, Memory, Network, and a bunch of other metrics. You can search for node exporter on the Grafana website https://grafana.com/grafana/dashboards/.
  • Copy 1860 ID to Clipboard.
  • Now, in Grafana, you can click Import and paste this ID. Then load the dashboard. Select Prometheus datasource and click import.
  • You have all sorts of metrics here that come from node exporter.
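
The scrape_duration_seconds expression used for the panel earlier can also be run against the Prometheus instant-query endpoint, /api/v1/query. The Python sketch below builds the query URL and reads the values per job from a response. The helper names and the SAMPLE payload are hypothetical and only illustrate the response shape; real values will differ.

```python
import urllib.parse


def instant_query_url(expr, base_url="http://localhost:9090"):
    """Build the URL for an instant query, URL-encoding the PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def values_by_job(payload):
    """Map each job label to the sampled value of an instant-vector result."""
    # Each sample is [timestamp, "value"]; the value arrives as a string.
    return {
        series["metric"]["job"]: float(series["value"][1])
        for series in payload["data"]["result"]
    }


# Made-up numbers, shaped like an instant-vector response.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "prometheus"}, "value": [1640000000, "0.004"]},
            {"metric": {"job": "node_export"}, "value": [1640000000, "0.012"]},
        ],
    },
}

if __name__ == "__main__":
    print(instant_query_url("scrape_duration_seconds"))
    print(values_by_job(SAMPLE))
```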

Install Pushgateway Prometheus on Ubuntu 20.04

  • The next component that I want to install is Pushgateway. The Pushgateway is a service that allows you to push metrics from jobs that cannot be scraped. For example, you can have Jenkins jobs or some kind of cron jobs. You can't scrape them since they run for a limited time only.
  • The installation process is very similar to Prometheus and Node exporter.
  • Create a dedicated user first.
    sudo useradd \
        --system \
        --no-create-home \
        --shell /bin/false pushgateway
    
  • Download archive with Pushgateway from https://prometheus.io/download/.
    wget https://github.com/prometheus/pushgateway/releases/download/v1.4.2/pushgateway-1.4.2.linux-amd64.tar.gz
    
  • Extract all the files.
    tar -xvf pushgateway-1.4.2.linux-amd64.tar.gz
    
  • Move the pushgateway binary to /usr/local/bin.
    sudo mv pushgateway-1.4.2.linux-amd64/pushgateway /usr/local/bin/
    
  • Clean up.
    rm -rf pushgateway*
    
  • Check if Pushgateway can be executed.
    pushgateway --version
    
  • Also, you can get configuration options by running help.
    pushgateway --help
    
  • Create a systemd service.
    sudo vim /etc/systemd/system/pushgateway.service
    
pushgateway.service
[Unit]
Description=Pushgateway
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=500
StartLimitBurst=5

[Service]
User=pushgateway
Group=pushgateway
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/pushgateway

[Install]
WantedBy=multi-user.target
  • Enable the service.
    sudo systemctl enable pushgateway
    
  • Start Pushgateway.
    sudo systemctl start pushgateway
    
  • Check the status.
    sudo systemctl status pushgateway
    
  • Pushgateway is reachable at http://<ip>:9091.

  • Let's add Pushgateway as a target to Prometheus.

    sudo vim /etc/prometheus/prometheus.yml
    

prometheus.yml
...
  - job_name: pushgateway
    honor_labels: true
    static_configs:
      - targets: ["localhost:9091"]
  • Check Prometheus configuration. If it's valid, reload the config.
    promtool check config /etc/prometheus/prometheus.yml
    curl -X POST http://localhost:9090/-/reload
    
  • Make sure that the target is up and healthy at http://<ip>:9090/targets.
  • To send metrics to the Pushgateway, you just need to send a POST request to the endpoint http://localhost:9091/metrics/job/backup, where backup is an arbitrary name that will show up as a label.
  • Use curl and pipe the string with echo to Pushgateway. Let's imagine that the Jenkins job that we named backup took almost 16 seconds to complete.
    echo "jenkins_job_duration_seconds 15.98" | curl --data-binary @- http://localhost:9091/metrics/job/backup
    
  • You can find this metric in Prometheus. Refresh the page and start typing jenkins_job_duration_seconds.
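
A batch job written in Python can push the same sample without shelling out to curl. This is a minimal sketch using only the standard library; the function name build_push_request is my own, and it assumes a simple job name without slashes and a Pushgateway on localhost:9091.

```python
import urllib.parse
import urllib.request


def build_push_request(job, metric, value, base_url="http://localhost:9091"):
    """Build a POST request carrying one sample in the text exposition format."""
    # Each sample line must end with a newline, just like the echo output.
    body = f"{metric} {value}\n".encode("utf-8")
    url = f"{base_url}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(url, data=body, method="POST")


def push(job, metric, value, base_url="http://localhost:9091"):
    """Send the sample to the Pushgateway and return the HTTP status code."""
    request = build_push_request(job, metric, value, base_url)
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    request = build_push_request("backup", "jenkins_job_duration_seconds", 15.98)
    print(request.full_url)
    print(request.data)
```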

Securing Prometheus with Basic Auth

  • When you install Prometheus, it is open to anyone who knows the endpoint. Fairly recently, Prometheus introduced a way to require basic authentication on each HTTP request. Previously, you had to put a proxy such as nginx in front of Prometheus and configure basic auth there. Now you can use the authentication mechanism built into Prometheus itself.

  • Let's install the bcrypt python module to create a hash of the password. Prometheus will not store your passwords; it will compute the hash and compare it with the existing one for the given user.

    sudo apt-get -y install python3-bcrypt
    

  • Now, create a simple script that will ask for input and return the hash for the password.
    vim generate_password.py
    
generate_password.py
import getpass
import bcrypt

password = getpass.getpass("password: ")
hashed_password = bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())
print(hashed_password.decode())
  • Run the script and enter devops123 for the password.
    python3 generate_password.py
    
  • Copy this hash and create an additional Prometheus configuration file.
    sudo vim /etc/prometheus/web.yml
    
web.yml
---
basic_auth_users:
    admin: $2b$12$CVcceMyfix1Qa7Kupisfe.JVHXG.U4PWFUculUnGlxPrTlBxfNGRe
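
A common mistake is to paste the plain-text password into web.yml instead of the hash. As a rough sanity check, the Python sketch below tests whether a string has the usual shape of a bcrypt hash. The pattern and the name looks_like_bcrypt are my own assumptions; the check only looks at the format and says nothing about whether the hash matches your password.

```python
import re

# Version prefix, two-digit cost, then 53 characters of salt plus digest.
BCRYPT_PATTERN = re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}")


def looks_like_bcrypt(value):
    """Return True when the whole string is shaped like a bcrypt hash."""
    return BCRYPT_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    print(looks_like_bcrypt("$2b$12$CVcceMyfix1Qa7Kupisfe.JVHXG.U4PWFUculUnGlxPrTlBxfNGRe"))
    print(looks_like_bcrypt("devops123"))
```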
  • Now, we need to provide this config to the Prometheus. Let's update the systemd service definition.
    sudo vim /etc/systemd/system/prometheus.service
    
prometheus.service
...
ExecStart=/usr/local/bin/prometheus \
  ...
  --web.config.file=/etc/prometheus/web.yml
  • Every time you update the systemd service, you need to reload it.
    sudo systemctl daemon-reload
    
  • You also need to restart Prometheus.
    sudo systemctl restart prometheus
    
  • And check the status in case of an error.
    sudo systemctl status prometheus
    
  • Now, we can test basic authentication. Go to Prometheus and reload the page.
  • We also need to update the Grafana datasource to provide a username and password.
    sudo vim /etc/grafana/provisioning/datasources/datasources.yaml
    
datasources.yaml
---
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    isDefault: true
    basicAuth: true
    basicAuthUser: admin
    secureJsonData:
      basicAuthPassword: devops123
  • Restart grafana.
    sudo systemctl restart grafana-server
    
  • If you go to the targets section, you will see that the Prometheus target is down. Prometheus requires a username and password to scrape itself as well. So let's update the Prometheus target.
    sudo vim /etc/prometheus/prometheus.yml
    
prometheus.yml
...
  - job_name: "prometheus"
    basic_auth:
      username: admin
      password: devops123
    static_configs:
      - targets: ["localhost:9090"]
  • Check the Prometheus config and reload the config.
    promtool check config /etc/prometheus/prometheus.yml
    curl -X POST -u admin:devops123 http://localhost:9090/-/reload
    
  • Verify that Prometheus target is up in the UI.
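
Scripts that talk to Prometheus now need to send credentials too, which is what the -u option does for curl. Here is a minimal Python sketch of building the same Authorization header by hand. The function names are my own, and the credentials are the demo values from this tutorial.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Encode 'username:password' as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_authed_request(url, username, password, method="GET"):
    """Build (but do not send) a request that carries Basic credentials."""
    request = urllib.request.Request(url, method=method)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    request = build_authed_request(
        "http://localhost:9090/-/reload", "admin", "devops123", method="POST"
    )
    print(request.get_method(), request.full_url)
```

Basic auth only encodes the credentials, it does not encrypt them, so over plain HTTP they travel in a recoverable form.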

Install Alertmanager on Ubuntu 20.04

  • To send alerts, we're going to use Alertmanager. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or in our case Slack. You can set up multiple Alertmanagers to achieve high availability. For this demo, I will install a single one.

  • First, let's create a system user for Alertmanager.

    sudo useradd \
        --system \
        --no-create-home \
        --shell /bin/false alertmanager
    

  • Then, download Alertmanager from the same downloads page.

    wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
    

  • Extract Alertmanager binary.

    tar -xvf alertmanager-0.23.0.linux-amd64.tar.gz
    

  • For Alertmanager, we need storage. It is mandatory (it defaults to "data/") and is used to store Alertmanager's notification states and silences. Without this state (or if you wipe it), Alertmanager would not know across restarts what silences were created or what notifications were already sent.

    sudo mkdir -p /alertmanager-data /etc/alertmanager
    

  • Now, let's move the Alertmanager binary to the local bin directory and move the sample config to /etc/alertmanager.

    sudo mv alertmanager-0.23.0.linux-amd64/alertmanager /usr/local/bin/
    sudo mv alertmanager-0.23.0.linux-amd64/alertmanager.yml /etc/alertmanager/
    

  • Remove downloaded archive and a folder.

    rm -rf alertmanager*
    

  • Check if we can run Alertmanager.

    alertmanager --version
    

  • You can also get help and all supported configuration options by running Alertmanager help.

    alertmanager --help
    

  • Next is the systemd service definition.

    sudo vim /etc/systemd/system/alertmanager.service
    

alertmanager.service
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=500
StartLimitBurst=5

[Service]
User=alertmanager
Group=alertmanager
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/alertmanager \
  --storage.path=/alertmanager-data \
  --config.file=/etc/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target
  • Enable alertmanager.
    sudo systemctl enable alertmanager
    
  • Start Alertmanager.
    sudo systemctl start alertmanager
    
  • Check the status.

    sudo systemctl status alertmanager
    

  • Alertmanager will be exposed on port 9093 http://<ip>:9093.

  • It's time to create a simple alert. In almost all Prometheus setups, you have an alert that is always active. It is used to validate the monitoring system itself. For example, it can be integrated with the DeadMansSnitch service. If something goes wrong with Prometheus or Alertmanager, you will get an emergency notification that your monitoring system is down. It's a very useful service, especially in production environments.

  • Let's create the alert, but without the DeadMansSnitch integration.

    sudo vim /etc/prometheus/dead-mans-snitch-rule.yml
    
    dead-mans-snitch-rule.yml
    ---
    groups:
    - name: dead-mans-snitch
      rules:
      - alert: DeadMansSnitch
        annotations:
          message: This alert is integrated with DeadMansSnitch.
        expr: vector(1)
    

  • You also need to update the Prometheus config to specify the location of Alertmanager and specify the path to the new rule.

    sudo vim /etc/prometheus/prometheus.yml
    

prometheus.yml
...
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093
rule_files:
  - dead-mans-snitch-rule.yml
  • It's always a good idea to check Prometheus config before restarting.
    promtool check config /etc/prometheus/prometheus.yml
    sudo systemctl restart prometheus
    sudo systemctl status prometheus
    

Alertmanager Slack Channel Integration

  • Alertmanager can be configured to send emails and can be integrated with PagerDuty and many other services. For this demo, I will integrate Alertmanager with Slack. We're going to create a Slack channel where all the alerts will be sent.

  • Let's create an alerts Slack channel.

  • Create a new Slack app from scratch. Give it a name Prometheus and select a workspace.
  • You can modify the app from the basic information. Let's upload the Prometheus icon.
  • Next, we need to enable incoming webhooks. Then add webhook to the workspace.
  • Last, we need to copy the Webhook URL and use it in the Alertmanager config.
  • Now, update the alertmanager.yml config to include a new route that sends alerts to Slack.
    sudo vim /etc/alertmanager/alertmanager.yml
    
alertmanager.yml
---
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
  routes:
  - receiver: slack-notifications
    match:
      severity: warning
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
- name: slack-notifications
  slack_configs:
  - channel: "#alerts"
    send_resolved: true
    api_url: "https://hooks.slack.com/services/<id>"
    title: "{{ .GroupLabels.alertname }}"
    text: "{{ range .Alerts }}{{ .Annotations.message }}\n{{ end }}"
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
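
To make the routing in this config easier to picture, here is a deliberately simplified Python model of it: an alert goes to the first child route whose match labels all agree, and otherwise falls back to the top-level receiver. The ROUTE dictionary mirrors the route section above; pick_receiver is my own name, and the sketch ignores grouping, timing, inhibition, and the other matcher types that the real Alertmanager supports.

```python
# Mirrors the route section of alertmanager.yml above.
ROUTE = {
    "receiver": "web.hook",
    "routes": [
        {"receiver": "slack-notifications", "match": {"severity": "warning"}},
    ],
}


def pick_receiver(labels, route=ROUTE):
    """Return the receiver name for an alert with the given labels."""
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        if all(labels.get(name) == value for name, value in wanted.items()):
            # Descend into the first matching child route.
            return pick_receiver(labels, child)
    # No child matched, so the route's own receiver handles the alert.
    return route["receiver"]


if __name__ == "__main__":
    print(pick_receiver({"alertname": "JenkinsJobExceededThreshold", "severity": "warning"}))
    print(pick_receiver({"alertname": "DeadMansSnitch"}))
```

In this model, the DeadMansSnitch alert carries no severity label, so it stays with the default web.hook receiver and never reaches Slack.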
  • Restart alertmanager.

    sudo systemctl restart alertmanager
    sudo systemctl status alertmanager
    

  • Create a new alert rule to test Slack integration.

    sudo vim /etc/prometheus/batch-job-rules.yml
    

batch-job-rules.yml
---
groups:
- name: batch-job-rules
  rules:
  - alert: JenkinsJobExceededThreshold
    annotations:
      message: Jenkins job exceeded a threshold of 30 seconds.
    expr: jenkins_job_duration_seconds{job="backup"} > 30
    for: 1m
    labels:
      severity: warning
  • Add a new rule to Prometheus.
    sudo vim /etc/prometheus/prometheus.yml
    
prometheus.yml
...
rule_files:
  - dead-mans-snitch-rule.yml
  - batch-job-rules.yml
  • Check the config and reload Prometheus.

    promtool check config /etc/prometheus/prometheus.yml
    curl -X POST -u admin:devops123 http://localhost:9090/-/reload
    

  • Trigger the alert by sending the new metric to Prometheus Pushgateway.

    echo "jenkins_job_duration_seconds 31.87" | curl --data-binary @- http://localhost:9091/metrics/job/backup
    

  • In a minute or so, you should get a message in Slack.
  • If we send a new metric with a duration of less than 30 seconds, Prometheus will resolve the alert.
    echo "jenkins_job_duration_seconds 11.87" | curl --data-binary @- http://localhost:9091/metrics/job/backup
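
The "in a minute or so" delay comes from the for: 1m clause in the rule: the expression has to stay true for a full minute before the alert fires. The Python sketch below is a simplified model of that behavior, not the actual Prometheus implementation. The function name alert_state and the evaluation timestamps are made up for illustration.

```python
def alert_state(evaluations, threshold=30.0, for_seconds=60.0):
    """Return the alert state after the last rule evaluation.

    evaluations is a list of (timestamp_seconds, value) pairs, oldest first.
    """
    active_since = None
    state = "inactive"
    for timestamp, value in evaluations:
        if value > threshold:
            if active_since is None:
                # First evaluation where the expression is true.
                active_since = timestamp
            held_for = timestamp - active_since
            state = "firing" if held_for >= for_seconds else "pending"
        else:
            # The expression is false again, so the alert resets.
            active_since = None
            state = "inactive"
    return state


if __name__ == "__main__":
    slow = [(t, 31.87) for t in range(0, 61, 15)]
    print(alert_state(slow[:2]))
    print(alert_state(slow))
    print(alert_state(slow + [(75, 11.87)]))
```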
    
Clean Up
  • Delete the alerts Slack channel.
  • Delete the Prometheus Slack app.