How to Install Prometheus and Grafana on Ubuntu?
- You can find the source code for this video in my GitHub Repo.
Install Prometheus on Ubuntu 20.04
- First of all, let's create a dedicated Linux user, sometimes called a system account, for Prometheus. Having individual users for each service serves two main purposes:
  - It is a security measure to reduce the impact in case of an incident with the service.
  - It simplifies administration, as it becomes easier to track down which resources belong to which service.
- To create a system user or system account, run the following command:
  - --system: creates a system account.
  - --no-create-home: we don't need a home directory for Prometheus or any other system account in our case.
  - --shell /bin/false: prevents logging in as the prometheus user.
  - prometheus: creates the prometheus user and a group with the same name.
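The tutorial runs this step as a shell command. To keep all examples here in one language, the sketch below builds the same useradd call in Python from the flags listed above and runs it only when a user name is passed on the command line. The script name create_user.py is hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def useradd_command(name: str) -> list[str]:
    """Build the useradd call for a locked-down service account."""
    return [
        "sudo", "useradd",
        "--system",               # create a system account
        "--no-create-home",       # no home directory needed
        "--shell", "/bin/false",  # prevent interactive logins
        name,                     # Ubuntu also creates a group with this name
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 create_user.py prometheus
    subprocess.run(useradd_command(sys.argv[1]), check=True)
```

The same helper works for the other system users created later in this tutorial.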
- Let's check the latest version of Prometheus on the download page.
- You can use the curl or wget command to download Prometheus.
- Then, we need to extract all Prometheus files from the archive.
- Usually, you would have a disk mounted to the data directory. For this tutorial, I will simply create a /data directory. You also need a folder for the Prometheus configuration files.
- Now, let's change into the extracted Prometheus directory and move some files.
- First, let's move the prometheus binary and promtool to /usr/local/bin/. promtool is used to check configuration files and Prometheus rules.
- Optionally, we can move the console libraries to the Prometheus configuration directory. Console templates allow for the creation of arbitrary consoles using the Go templating language. You don't need to worry about them if you're just getting started.
- Finally, let's move the example main Prometheus configuration file, prometheus.yml.
- To avoid permission issues, you need to set the correct ownership for the /etc/prometheus/ and /data directories.
- You can delete the archive and the Prometheus folder when you are done.
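The download, extract, move, chown, and clean-up steps above can be sketched in Python as follows. This assumes the linux-amd64 build, /tmp as a scratch directory, and that the prometheus user already exists; the version is whatever you picked on the download page, passed as an argument. The URL follows the GitHub release links that the download page points to. Newer releases may no longer ship the console directories, so that move is skipped when they are absent.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
import tarfile
import urllib.request
from pathlib import Path


def release_url(version: str, arch: str = "linux-amd64") -> str:
    """URL of a Prometheus release tarball on GitHub."""
    return (
        "https://github.com/prometheus/prometheus/releases/download/"
        f"v{version}/prometheus-{version}.{arch}.tar.gz"
    )


def install(version: str, workdir: Path = Path("/tmp")) -> None:
    """Download, extract and lay out Prometheus; must run as root."""
    name = f"prometheus-{version}.linux-amd64"
    archive = workdir / f"{name}.tar.gz"
    urllib.request.urlretrieve(release_url(version), archive)
    with tarfile.open(archive) as tar:
        tar.extractall(workdir)
    src = workdir / name

    for directory in ("/data", "/etc/prometheus"):
        Path(directory).mkdir(parents=True, exist_ok=True)
    for binary in ("prometheus", "promtool"):
        shutil.move(str(src / binary), f"/usr/local/bin/{binary}")
    for extra in ("consoles", "console_libraries"):  # optional console templates
        if (src / extra).exists():
            shutil.move(str(src / extra), f"/etc/prometheus/{extra}")
    shutil.move(str(src / "prometheus.yml"), "/etc/prometheus/prometheus.yml")

    # Give the service account ownership of config and data.
    subprocess.run(
        ["chown", "-R", "prometheus:prometheus", "/etc/prometheus/", "/data/"],
        check=True,
    )
    archive.unlink()    # remove the archive...
    shutil.rmtree(src)  # ...and the extracted folder


if __name__ == "__main__" and len(sys.argv) > 1:
    install(sys.argv[1])  # e.g. sudo python3 install_prometheus.py <version>
```

The script name install_prometheus.py is hypothetical; nothing runs unless a version is given.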
- Verify that you can execute the Prometheus binary by running the following command:
- To get more information and configuration options, run prometheus --help. We're going to use some of these options in the service definition.
- We're going to use systemd, which is a system and service manager for Linux operating systems. For that, we need to create a systemd unit configuration file.
  - Restart: configures whether the service shall be restarted when the service process exits, is killed, or a timeout is reached.
  - RestartSec: configures the time to sleep before restarting the service.
  - User and Group: the Linux user and group that run the Prometheus process.
  - --config.file=/etc/prometheus/prometheus.yml: path to the main Prometheus configuration file.
  - --storage.tsdb.path=/data: location where Prometheus stores its data.
  - --web.listen-address=0.0.0.0:9090: listen on all network interfaces. In some situations, you may have a proxy such as nginx in front of Prometheus; in that case, you would configure Prometheus to listen only on localhost.
  - --web.enable-lifecycle: allows you to manage Prometheus over HTTP, for example, to reload the configuration without restarting the service.
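The unit file itself did not survive in this copy, so here is a Python sketch that renders one from the options listed above. The option names and flags come from the list; the values Restart=on-failure, RestartSec=5s, Type=simple, and the network-online.target ordering are assumptions, not necessarily the author's settings.

```python
def prometheus_unit(data_dir: str = "/data", listen: str = "0.0.0.0:9090") -> str:
    """Render a systemd unit that runs Prometheus with the flags described above."""
    flags = [
        "--config.file=/etc/prometheus/prometheus.yml",
        f"--storage.tsdb.path={data_dir}",
        f"--web.listen-address={listen}",
        "--web.enable-lifecycle",  # enables POST /-/reload
    ]
    # systemd accepts a trailing backslash as a line continuation in ExecStart.
    exec_start = " \\\n  ".join(["/usr/local/bin/prometheus", *flags])
    # Restart/RestartSec/Type values below are assumed, not from the original.
    return f"""[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart={exec_start}

[Install]
WantedBy=multi-user.target
"""


if __name__ == "__main__":
    print(prometheus_unit(), end="")
```

One way to install it is python3 prometheus_unit.py | sudo tee /etc/systemd/system/prometheus.service, where the script name is hypothetical and /etc/systemd/system/ is the usual location for local unit files.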
- To start Prometheus automatically after a reboot, enable the service.
- Then start Prometheus.
- To check the status of Prometheus, run the following command:
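The enable, start, and status steps are plain systemctl calls. The Python sketch below lists them in order for any unit name. The daemon-reload step is an addition here: systemd needs it to pick up a newly created or edited unit file, which the tutorial mentions later for the basic-auth change.

```python
from __future__ import annotations

import subprocess
import sys


def service_commands(unit: str) -> list[list[str]]:
    """systemctl calls for a freshly written unit, in order."""
    return [
        ["sudo", "systemctl", "daemon-reload"],       # load the new unit file
        ["sudo", "systemctl", "enable", unit],        # start after reboot
        ["sudo", "systemctl", "start", unit],         # start now
        ["systemctl", "status", unit, "--no-pager"],  # show current state
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # "systemctl status" exits non-zero when the unit is not active,
    # so check=True surfaces a failed start as an exception.
    for cmd in service_commands(sys.argv[1]):
        subprocess.run(cmd, check=True)
```

If the service does not come up, journalctl -u prometheus shows its logs.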
- If you encounter any issues with Prometheus or are unable to start it, the easiest way to find the problem is to use the journalctl command and search for errors.
- Now we can try to access it via a browser. I'm going to use the IP address of the Ubuntu server; you need to append port 9090 to the IP, for example, http://<ip>:9090.
- If you go to Targets, you should see only one target: Prometheus itself. With the example configuration, it scrapes itself every 15 seconds.
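The Targets page is backed by the Prometheus HTTP API. As a sketch, the same information can be fetched from /api/v1/targets and reduced to one health value per job and instance. The script name check_targets.py is hypothetical, and this works as written only while Prometheus has no authentication.

```python
from __future__ import annotations

import json
import sys
import urllib.request


def target_health(payload: dict) -> dict[str, str]:
    """Map "job/instance" to health ("up", "down" or "unknown")."""
    return {
        f'{t["labels"]["job"]}/{t["labels"]["instance"]}': t["health"]
        for t in payload["data"]["activeTargets"]
    }


def fetch_targets(base_url: str) -> dict:
    """GET /api/v1/targets from a Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_targets.py http://<ip>:9090
    for target, health in target_health(fetch_targets(sys.argv[1])).items():
        print(f"{target}: {health}")
```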
Install Node Exporter on Ubuntu 20.04
- Next, we're going to set up and configure Node Exporter to collect Linux system metrics such as CPU load and disk I/O. Node Exporter exposes these as Prometheus-style metrics. Since the installation process is very similar, I won't cover it in as much depth as Prometheus.
- First, let's create a system user for Node Exporter by running the following command:
- You can download Node Exporter from the same page.
- Use the wget command to download the binary.
- Extract Node Exporter from the archive.
- Move the binary to /usr/local/bin.
- Clean up: delete the node_exporter archive and folder.
- Verify that you can run the binary.
- Node Exporter has a lot of plugins (collectors) that we can enable. If you run Node Exporter with --help, you will get all the options.
- We're going to enable the logind collector with --collector.logind, just for the demo.
- Next, create a similar systemd unit file, adding --collector.logind to its ExecStart command.
- To start Node Exporter automatically after a reboot, enable the service.
- Then start Node Exporter.
- Check the status of Node Exporter with the following command:
- If you have any issues, check the logs with journalctl.
- At this point, we have only a single target in our Prometheus. There are many different service discovery mechanisms built into Prometheus. For example, Prometheus can dynamically discover targets in AWS, GCP, and other clouds based on labels. In the following tutorials, I'll give you a few examples of deploying Prometheus in a cloud-specific environment. For this tutorial, let's keep it simple and keep adding static targets. I also have a lesson on how to deploy and manage Prometheus in a Kubernetes cluster.
- To create a static target, you need to add a job_name with static_configs. Node Exporter listens on port 9100 by default, so the new target is localhost:9100.
- Since we enabled lifecycle management via API calls, we can reload the Prometheus config without restarting the service and causing downtime.
- Before reloading, check that the config is valid.
- Then, you can use a POST request to reload the config.
- Check the targets section at http://<ip>:9090/targets.
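The check-then-reload sequence can be sketched in Python: run promtool check config and send the POST to /-/reload only if the check passes. The reload endpoint exists only because Prometheus runs with --web.enable-lifecycle. The script name is hypothetical.

```python
import subprocess
import sys
import urllib.request


def reload_request(base_url: str) -> urllib.request.Request:
    """POST /-/reload; only available with --web.enable-lifecycle."""
    return urllib.request.Request(f"{base_url}/-/reload", method="POST")


def check_and_reload(base_url: str, config: str = "/etc/prometheus/prometheus.yml") -> None:
    # promtool exits non-zero on an invalid config; check=True stops before reloading.
    subprocess.run(["promtool", "check", "config", config], check=True)
    with urllib.request.urlopen(reload_request(base_url), timeout=10) as resp:
        print("reload status:", resp.status)


if __name__ == "__main__" and len(sys.argv) > 1:
    check_and_reload(sys.argv[1])  # e.g. python3 reload.py http://localhost:9090
```

Once basic authentication is enabled later in this tutorial, the reload request also needs credentials.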
Install Grafana on Ubuntu 20.04
- To visualize metrics, we can use Grafana. Grafana supports many different data sources, one of which is Prometheus.
- First, let's make sure that all the dependencies are installed.
- Next, add the GPG key.
- Add this repository for stable releases.
- After you add the repository, update the package index and install Grafana.
- To start Grafana automatically after a reboot, enable the service.
- Then start Grafana.
- To check the status of Grafana, run the following command:
- Go to http://<ip>:3000 and log in to Grafana using the default credentials. The username is admin, and the password is admin as well.
- When you log in for the first time, you get the option to change the password. Let's use devops123 for the new password.
- To visualize metrics, you need to add a data source first. Click Add data source and select Prometheus. For the URL, enter http://localhost:9090 and click Save and test. You should see Data source is working.
- Usually, in production environments, you would store all the configuration in Git. Let me show you another way to add a data source, as code. Let's remove the data source from the UI.
- Create a new datasources.yaml file.
- Restart Grafana to reload the config.
- Go back to Grafana and refresh the page. You should see the Prometheus data source.
- We can import existing Grafana dashboards or create our own. Let's create a simple graph.
- Go back to Prometheus and let's explore what metrics we have. Start typing scrape_duration_seconds and click Execute. This metric shows how long each scrape of a Prometheus target took. At this point, we have node_exporter and prometheus targets. We're going to use this metric to create a simple graph in Grafana.
- Go to Grafana, click Create Dashboard, and then add a new panel.
- Give it the title Scrape Duration and paste the scrape_duration_seconds metric. You can also reduce the time range to 1 hour.
- For the legend, we can use the job label, and for the unit, seconds. There are a lot of configuration parameters you can use; let's keep it simple, click Apply, and save the dashboard as Prometheus.
- Since we already have Node Exporter, we can import an open-source dashboard to visualize CPU, memory, network, and many other metrics. You can search for node exporter on the Grafana website, https://grafana.com/grafana/dashboards/.
- Copy the dashboard ID 1860 to the clipboard.
- Now, in Grafana, click Import, paste this ID, and load the dashboard. Select the Prometheus data source and click Import.
- You now have all sorts of metrics here that come from Node Exporter.
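Grafana builds the Scrape Duration panel by sending PromQL to the Prometheus data source. To see the raw values behind it, the sketch below runs the same scrape_duration_seconds query against Prometheus's instant-query endpoint, /api/v1/query, and maps each job label to its latest value. It assumes one series per job, as in this setup, and that Prometheus has no authentication yet.

```python
from __future__ import annotations

import json
import sys
import urllib.parse
import urllib.request


def query_url(base_url: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def by_job(payload: dict) -> dict[str, float]:
    """Map the job label of each series in an instant vector to its value."""
    return {
        series["metric"].get("job", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    url = query_url(sys.argv[1], "scrape_duration_seconds")
    with urllib.request.urlopen(url, timeout=5) as resp:
        print(by_job(json.load(resp)))
```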
Install Prometheus Pushgateway on Ubuntu 20.04
- The next component I want to install is Pushgateway. Pushgateway is a service that allows you to push metrics from jobs that cannot be scraped. For example, you might have Jenkins jobs or cron jobs. You can't scrape them, since they run only for a limited time.
- The installation process is very similar to Prometheus and Node exporter.
- Create a dedicated user first.
- Download the Pushgateway archive from https://prometheus.io/download/.
- Extract all the files.
- Move the pushgateway binary to /usr/local/bin.
- Clean up.
- Check if Pushgateway can be executed.
- Also, you can get configuration options by running help.
- Create a systemd service, pushgateway.service.
- Enable the service.
- Start Pushgateway.
- Check the status.
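- Pushgateway can be reached at http://<ip>:9091.
- Let's add Pushgateway as a target to Prometheus. The honor_labels: true setting tells Prometheus to keep the job and instance labels attached to the pushed metrics instead of overwriting them with the Pushgateway target's own labels.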
...
  - job_name: pushgateway
    honor_labels: true
    static_configs:
      - targets: ["localhost:9091"]
- Check Prometheus configuration. If it's valid, reload the config.
- Make sure that the target is up and healthy at http://<ip>:9090/targets.
- To send metrics to Pushgateway, you just need to send a POST request to the following endpoint: http://localhost:9091/metrics/job/backup, where backup is an arbitrary name that shows up as the value of the job label.
- Use curl and pipe the string with echo to Pushgateway. Let's imagine that the Jenkins job we named backup took almost 16 seconds to complete.
- You can find this metric in Prometheus. Refresh the page and start typing jenkins_job_duration_seconds.
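The curl-and-echo push can also be sketched with Python's standard library. It builds a POST of one sample in the Prometheus text format to /metrics/job/backup; echo appends a newline, and the body here ends with one too, because every line in the text format must end with a line feed. The value 15.98 is a made-up stand-in for "almost 16 seconds", and the script name push_metric.py is hypothetical.

```python
import sys
import urllib.request


def push_request(gateway: str, job: str, metric: str, value: float) -> urllib.request.Request:
    """Build a POST of one text-format sample to /metrics/job/<job>."""
    # Every line, including the last, must end with "\n".
    body = f"{metric} {value}\n".encode("utf-8")
    return urllib.request.Request(
        f"{gateway}/metrics/job/{job}",
        data=body,
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="POST",
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 push_metric.py http://localhost:9091
    # 15.98 is an illustrative value, not taken from the original.
    req = push_request(sys.argv[1], "backup", "jenkins_job_duration_seconds", 15.98)
    with urllib.request.urlopen(req, timeout=10) as resp:
        print("Pushgateway status:", resp.status)
```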
Securing Prometheus with Basic Auth
- Out of the box, Prometheus is open to anyone who knows the endpoint. Since version 2.24, Prometheus supports basic authentication on its HTTP endpoints. Previously, you had to put a proxy such as nginx in front of Prometheus and configure basic auth there; now you can use the authentication mechanism built into Prometheus itself.
- Let's install the bcrypt Python module to create a hash of the password. Prometheus does not store your password; it hashes the password sent with each request and compares the result with the stored hash for the given user.
- Now, create a simple script, generate_password.py, that asks for input and returns the hash of the password.
- Run the script and enter devops123 for the password.
- Copy this hash and create an additional Prometheus configuration file, /etc/prometheus/web.yml.
- Now, we need to provide this config to Prometheus. Let's update the systemd service definition.
...
ExecStart=/usr/local/bin/prometheus \
  ...
  --web.config.file=/etc/prometheus/web.yml
- Every time you update the systemd service file, you need to reload the systemd configuration with systemctl daemon-reload.
- You also need to restart Prometheus.
- Then check the status in case of an error.
- Now, we can test basic authentication. Go to Prometheus and reload the page; the browser should now prompt for a username and password.
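The same test can be scripted. The sketch below requests /api/v1/targets twice, once without credentials (expected to fail with 401) and once with an HTTP basic Authorization header for the admin user and the devops123 password used in this tutorial. The script name check_auth.py is hypothetical.

```python
from __future__ import annotations

import base64
import sys
import urllib.error
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Value of the Authorization header for HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def http_status(url: str, authorization: str | None = None) -> int:
    """Return the HTTP status code, including error codes such as 401."""
    request = urllib.request.Request(url)
    if authorization:
        request.add_header("Authorization", authorization)
    try:
        with urllib.request.urlopen(request, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__" and len(sys.argv) > 1:
    url = f"{sys.argv[1]}/api/v1/targets"  # e.g. python3 check_auth.py http://<ip>:9090
    print("without credentials:", http_status(url))
    print("with credentials:", http_status(url, basic_auth_header("admin", "devops123")))
```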
- We also need to update the Grafana data source in datasources.yaml to provide a username and password.
- Restart Grafana.
- If you go to the targets section, you will see that the Prometheus target is down. Prometheus requires a username and password to scrape itself as well. So let's update the Prometheus target.
...
  - job_name: "prometheus"
    basic_auth:
      username: admin
      password: devops123
    static_configs:
      - targets: ["localhost:9090"]
- Check the Prometheus config and reload it. Since basic auth is now enabled, the reload request must include the username and password as well.
- Verify that Prometheus target is up in the UI.
Install Alertmanager on Ubuntu 20.04
- To send alerts, we're going to use Alertmanager. It takes care of deduplicating, grouping, and routing alerts to the correct receiver integration, such as email, PagerDuty, or, in our case, Slack. You can set up multiple Alertmanagers to achieve high availability. For this demo, I will install a single one.
- First, let's create a system user for Alertmanager.
- Then, download Alertmanager from the same downloads page.
- Extract the Alertmanager archive.
- Alertmanager needs a storage directory. It is mandatory (it defaults to data/) and is used to store Alertmanager's notification states and silences. Without this state (or if you wipe it), Alertmanager would not know across restarts which silences were created or which notifications were already sent.
- Now, let's move the Alertmanager binary to /usr/local/bin and copy the sample config.
- Remove the downloaded archive and folder.
- Check that we can run Alertmanager.
- You can also get help and all supported configuration options by running Alertmanager with --help.
- Next is the systemd service definition.
- Enable Alertmanager.
- Start Alertmanager.
- Check the status.
- Alertmanager is exposed on port 9093: http://<ip>:9093.
- It's time to create a simple alert. Almost all Prometheus setups have an alert that is always active; it is used to validate the monitoring system itself. For example, it can be integrated with the Dead Man's Snitch service, which expects to receive this alert continuously. If something goes wrong with Prometheus or Alertmanager, the notifications stop, and you get an emergency notification that your monitoring system is down. It's a very useful service, especially in production environments.
- Let's create this alert, but without the Dead Man's Snitch integration.
- You also need to update the Prometheus config to specify the location of Alertmanager and the path to the new rule file.
...
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - dead-mans-snitch-rule.yml
- It's always a good idea to check the Prometheus config before restarting.
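Once Prometheus picks up the new rule, the always-active alert should reach Alertmanager. As a sketch, you can confirm that through Alertmanager's v2 API: GET /api/v2/alerts returns a JSON list of alerts, each with labels and a status.state of active, suppressed, or unprocessed. The script name list_alerts.py is hypothetical, and the alert names in the test are sample data.

```python
from __future__ import annotations

import json
import sys
import urllib.request


def active_alert_names(alerts: list[dict]) -> list[str]:
    """Sorted alert names whose state is "active" in an /api/v2/alerts response."""
    return sorted(
        a["labels"]["alertname"]
        for a in alerts
        if a.get("status", {}).get("state") == "active"
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 list_alerts.py http://<ip>:9093
    with urllib.request.urlopen(f"{sys.argv[1]}/api/v2/alerts", timeout=5) as resp:
        print(active_alert_names(json.load(resp)))
```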
Alertmanager Slack Channel Integration
- Alertmanager can send emails and integrate with PagerDuty and many other services. For this demo, I will integrate Alertmanager with Slack. We're going to create a Slack channel where all the alerts will be sent.
- Let's create an alerts Slack channel.
- Create a new Slack app from scratch. Give it the name Prometheus and select a workspace.
- You can modify the app on its Basic Information page. Let's upload the Prometheus icon.
- Next, we need to enable incoming webhooks. Then add a new webhook to the workspace.
- Finally, copy the Webhook URL; we'll use it in the Alertmanager config.
- Now, update the alertmanager.yml config to include a new route that sends alerts to Slack.
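Before relying on Alertmanager, it can help to confirm that the webhook itself works. Slack incoming webhooks accept a POST with a JSON body such as {"text": "..."} and reply with ok on success. The sketch below builds that request; the script name test_webhook.py and the message text are hypothetical.

```python
import json
import sys
import urllib.request


def slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """POST a plain-text message to a Slack incoming webhook."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 test_webhook.py <Webhook URL copied from the Slack app>
    req = slack_request(sys.argv[1], "Test message before wiring up Alertmanager")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.read().decode())
```

If the test message shows up in the alerts channel, any later delivery problem is in the Alertmanager route rather than the webhook.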
- Restart Alertmanager.
- Create a new alert rule, batch-job-rules.yml, to test the Slack integration.
- Add the new rule file to the rule_files section of the Prometheus config.
- Check the config and reload Prometheus.
- Trigger the alert by pushing the new metric to Pushgateway.
- In a minute or so, you should get a message in Slack.
- If we send a new metric with a duration of less than 30 seconds, Prometheus will resolve the alert.
Clean Up
- Delete the alerts Slack channel.
- Delete the Prometheus Slack app.