How to Monitor Golang with Prometheus (Counter - Gauge - Histogram - Summary)

  • You can find the source code for this video in my GitHub Repo.

Create Minimal App

First of all, let's create a folder for the golang app. If you are on Linux or on mac, you can run the following command:

mkdir my-app

Then we need to switch to the my-app directory.

cd my-app

Start your module using go mod init. Replace the path with the path to your own source code repository.

go mod init github.com/antonputra/tutorials/lessons/137/my-app

Now, let's create the bare minimum application to expose default Golang metrics via the http://localhost:8081/metrics endpoint. For this tutorial, we're going to use the standard net/http package to create an HTTP API to manage hardware devices.

my-app/main.go
package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8081", nil)
}

Run go mod tidy to download the golang prometheus client.

go mod tidy

To start the app, you can run go run main.go.

go run main.go

To get default metrics exposed by the go app, you can use curl.

curl localhost:8081/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
...

Most of the time, these metrics are not very useful. It's good practice to expose and collect only the metrics that you actually need, especially if you use one of the many managed Prometheus services that charge by the number of metrics and the storage. Moving forward, we'll disable default metrics.

Gauge

We'll start with the gauge metric type. It represents a single numerical value that can arbitrarily go up and down. For example, you can use it to measure current memory usage or the number of concurrent requests.

We'll use it to keep track of the number of hardware devices connected to our app. Another typical use case for a gauge is to expose some metadata about the app, for example, the app version as a label.

Let's go ahead and create a struct to represent the hardware device. It's going to have an id, a mac address, and a firmware version.

my-app/main.go
type Device struct {
    ID       int    `json:"id"`
    Mac      string `json:"mac"`
    Firmware string `json:"firmware"`
}

Then declare the slice of devices as a global variable to hold all the connected devices.

my-app/main.go
var dvs []Device

Use the special init() function to create a couple of devices. The init() function will run before the main() and is used when you need to set up some form of state on the initial startup of your program.

my-app/main.go
func init() {
    dvs = []Device{
        {1, "5F-33-CC-1F-43-82", "2.1.6"},
        {2, "EF-2B-C4-F5-D6-34", "2.1.6"},
    }
}

Now, let's create an http handler function that returns all the connected devices to this instance.

  • Use json.Marshal function to convert go structs to the JSON string.
  • Check for the error and return the internal server error status code in case of a conversion error, since a failed conversion is a server-side problem rather than a bad request.
  • Then set the Content-Type header to application/json and use the 200 HTTP status code.
  • Finally, write the data to the connection.
my-app/main.go
func getDevices(w http.ResponseWriter, r *http.Request) {
    b, err := json.Marshal(dvs)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    w.Write(b)
}

In the main function, add a new /devices endpoint and use getDevices handler function that we just created.

my-app/main.go
func main() {
    http.Handle("/metrics", promhttp.Handler())
    http.HandleFunc("/devices", getDevices)
    http.ListenAndServe(":8081", nil)
}

Use curl to check if we can return connected devices.

curl -i localhost:8081/devices

You should get 2 devices.

HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 01 Dec 2022 12:11:30 GMT
Content-Length: 109

[{"id":1,"mac":"5F-33-CC-1F-43-82","firmware":"2.1.6"},{"id":2,"mac":"EF-2B-C4-F5-D6-34","firmware":"2.1.6"}]

It's time to declare our first metric. We're going to use a Gauge to track the number of devices connected to this app.

Create a metrics struct and set its devices property to the prometheus.Gauge type. You may notice that you can use either a Gauge or a GaugeVec. The difference is simple: a Gauge represents a single numerical value, whereas a GaugeVec is a collection that bundles a set of Gauges with the same name but different labels.

For example, if you want to count all connected devices and you don't care about the different types, use a Gauge. On the other hand, if you have different device types, such as routers, switches, and access points, and you want to count them separately, use GaugeVec with a type label. You'll see a bunch of examples during this tutorial.

my-app/main.go
type metrics struct {
    devices prometheus.Gauge
}

Then create a NewMetrics function that defines the metrics. It accepts a Prometheus registerer and returns a pointer to the metrics struct.

  • We need to create devices metric using the NewGauge function.
  • A Namespace is just a metric prefix; usually, you use a single word that matches the name of your app. In my case, it's myapp.
  • Then the metric Name. It's very important to follow the naming conventions provided by Prometheus. You can find it on the official website. Let's call it connected_devices.
  • You also need to include a metric description.
  • Then register it with the prometheus registry and return a pointer.
my-app/main.go
func NewMetrics(reg prometheus.Registerer) *metrics {
    m := &metrics{
        devices: prometheus.NewGauge(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "connected_devices",
            Help:      "Number of currently connected devices.",
        }),
    }
    reg.MustRegister(m.devices)
    return m
}
  • In the main() function, create a non-global registry without any pre-registered Collectors.
  • Then create metrics using the NewMetrics function.
  • Now we can use the devices property of the metrics struct and set it to the current number of connected devices. For that, we simply set it to the number of items in the devices slice.
  • Let's also create a custom prometheus handler with the newly created registry.
  • We also need to update the /metrics handler to promHandler.
my-app/main.go
func main() {
    reg := prometheus.NewRegistry()
    m := NewMetrics(reg)

    m.devices.Set(float64(len(dvs)))

    promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})

    http.Handle("/metrics", promHandler)
    http.HandleFunc("/devices", getDevices)
    http.ListenAndServe(":8081", nil)
}

If you try to scrape the /metrics endpoint right now, you should get a single connected_devices metric with the value of 2.

curl localhost:8081/metrics
# HELP myapp_connected_devices Number of currently connected devices.
# TYPE myapp_connected_devices gauge
myapp_connected_devices 2

Optionally, if you still want to keep all the golang default metrics, you can use the built-in collector from the collectors package (github.com/prometheus/client_golang/prometheus/collectors) and register it with the custom Prometheus registry.

Also, you can expose the Prometheus handler's own metric as well by setting the Registry field in HandlerOpts.

my-app/main.go
func main() {
    reg := prometheus.NewRegistry()
    reg.MustRegister(collectors.NewGoCollector())
    m := NewMetrics(reg)

    m.devices.Set(float64(len(dvs)))

    promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{Registry: reg})

    http.Handle("/metrics", promHandler)
    http.HandleFunc("/devices", getDevices)
    http.ListenAndServe(":8081", nil)
}

The next info metric will represent the metadata of the app. You can expose any number of arbitrary key-value pairs from your application. As an example, we'll expose the version of the currently running app.

This time this will be GaugeVec type since we need to set a version label with the actual version of the application.

my-app/main.go
type metrics struct {
    devices prometheus.Gauge
    info    *prometheus.GaugeVec
}
  • Let's declare info metric using NewGaugeVec function.
  • All the metrics will get the same Namespace with the name of the app.
  • Using the same naming convention, let's call it info as well and give it a description.
  • Don't forget to register it using MustRegister function.
my-app/main.go
func NewMetrics(reg prometheus.Registerer) *metrics {
    m := &metrics{
        devices: prometheus.NewGauge(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "connected_devices",
            Help:      "Number of currently connected devices.",
        }),
        info: prometheus.NewGaugeVec(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "info",
            Help:      "Information about the My App environment.",
        },
            []string{"version"}),
    }
    reg.MustRegister(m.devices, m.info)
    return m
}

Then declare a global version variable.

my-app/main.go
var version string

Typically this variable will be set using the environment variable or by your CI tool. For the demo, let's just hardcode it in the init() function.

my-app/main.go
func init() {
    version = "2.10.5"

    dvs = []Device{
        {1, "5F-33-CC-1F-43-82", "2.1.6"},
        {2, "EF-2B-C4-F5-D6-34", "2.1.6"},
    }
}
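
Outside of a demo, the version would usually come from the build or the environment rather than from init(). The following is only a sketch of that idea: the APP_VERSION variable name and the resolveVersion helper are assumptions made for illustration and are not part of the original app. The lookup function is passed in so the fallback logic can be checked without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// version can be overridden at build time, for example:
//
//	go build -ldflags "-X main.version=2.10.5"
var version = "dev"

// resolveVersion returns the value of the APP_VERSION environment variable
// (a hypothetical name chosen for this sketch) when it is set and non-empty,
// and falls back to the given build-time value otherwise.
func resolveVersion(lookup func(string) (string, bool), fallback string) string {
	if v, ok := lookup("APP_VERSION"); ok && v != "" {
		return v
	}
	return fallback
}

func main() {
	// os.LookupEnv reports whether the variable is present at all.
	fmt.Println("version:", resolveVersion(os.LookupEnv, version))
}
```

The resolved value could then be passed to the version label in place of the hardcoded string.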

Then in the main() function, we can use the version Prometheus label to set the application version and use a constant value of 1. If you check the default golang info metric, it uses the same convention.

my-app/main.go
func main() {
    reg := prometheus.NewRegistry()
    m := NewMetrics(reg)

    m.devices.Set(float64(len(dvs)))
    m.info.With(prometheus.Labels{"version": version}).Set(1)

    promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})

    http.Handle("/metrics", promHandler)
    http.HandleFunc("/devices", getDevices)
    http.ListenAndServe(":8081", nil)
}

If you check the /metrics endpoint, you should get the info metric with the version of your app. I'll show you later how to use it in Grafana.

curl localhost:8081/metrics
...
# HELP myapp_info Information about the My App environment.
# TYPE myapp_info gauge
myapp_info{version="2.10.5"} 1

It's more common than you may think to expose the prometheus metrics endpoint using a different port. It helps to keep it secure and private by configuring the firewall rules and access lists.

For example, in AWS, if you deploy your application using EC2 instances and an Elastic Load Balancer, you can easily expose your main port/endpoint to the internet and keep the prometheus metrics endpoint private and protected.

Since we don't use any 3rd party routers and frameworks, I'll show you a very simple way to run multiple servers using goroutines.

The key is to create separate http request multiplexers.

  • The first one replaces the default multiplexer and serves the main content.
  • Register the /devices handler on this custom multiplexer instead of on the http package.
  • Then the second one is for prometheus.
  • Spin up the first goroutine for the main server and the second one for the prometheus metrics endpoint.
  • Then to prevent the main() function from exiting, we can use an empty select statement, which blocks forever while the goroutines keep serving requests.
my-app/main.go
func main() {
    reg := prometheus.NewRegistry()
    m := NewMetrics(reg)

    m.devices.Set(float64(len(dvs)))
    m.info.With(prometheus.Labels{"version": version}).Set(1)

    dMux := http.NewServeMux()
    dMux.HandleFunc("/devices", getDevices)

    pMux := http.NewServeMux()
    promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})
    pMux.Handle("/metrics", promHandler)

    go func() {
        log.Fatal(http.ListenAndServe(":8080", dMux))
    }()

    go func() {
        log.Fatal(http.ListenAndServe(":8081", pMux))
    }()

    select {}
}

Now, if you access localhost:8080/devices, you get the connected devices.

curl localhost:8080/devices
[{"id":1,"mac":"5F-33-CC-1F-43-82","firmware":"2.1.6"},{"id":2,"mac":"EF-2B-C4-F5-D6-34","firmware":"2.1.6"}]%

But to get metrics, you would use localhost:8081/metrics endpoint.

curl localhost:8081/metrics
# HELP myapp_connected_devices Number of currently connected devices.
# TYPE myapp_connected_devices gauge
myapp_connected_devices 2
# HELP myapp_info Information about the My App environment.
# TYPE myapp_info gauge
myapp_info{version="2.10.5"} 1

The next step is to set up Prometheus and Grafana locally using docker and docker-compose. It's optional but can help to visualize your metrics. Let's create a Dockerfile. I'm not going to spend a lot of time on it.

  • In the first stage, we want to import files and build a golang binary.
  • In the second stage, we take a distroless image and copy our binary there.

For the local development, you can just use a single stage and perhaps find a way to reload and rebuild the image automatically on any change.

my-app/Dockerfile
FROM golang:1.19.3-buster AS build

WORKDIR /app

COPY go.mod ./
COPY go.sum ./

RUN go mod download && go mod verify

COPY main.go ./

RUN go build -o /my-app

FROM gcr.io/distroless/base-debian11

COPY --from=build /my-app /my-app

ENTRYPOINT ["/my-app"]

To run it locally, we'll use docker-compose. In that file, define the myapp service and specify the path to the application. Docker-compose will automatically build the image when we run the up command. We also want to expose port 8080 for the main API and port 8081 for the prometheus metrics.

By the way, the service name, in this case, myapp, also becomes the hostname that other services in this docker-compose file can use to reach it. For example, Prometheus will use it to find its targets.

docker-compose.yaml
---
version: "3.9"
services:
  myapp:
    build: ./my-app
    ports:
      - 8080:8080
      - 8081:8081

To start the app, run docker-compose up. For future runs, we also add the --build argument, because whenever we change the source code, we need to rebuild the docker image.

docker-compose up --build

Now, test with curl if you can access localhost:8080/devices and localhost:8081/metrics endpoints.

curl localhost:8080/devices
[{"id":1,"mac":"5F-33-CC-1F-43-82","firmware":"2.1.6"},{"id":2,"mac":"EF-2B-C4-F5-D6-34","firmware":"2.1.6"}]
curl localhost:8081/metrics
# HELP myapp_connected_devices Number of currently connected devices.
# TYPE myapp_connected_devices gauge
myapp_connected_devices 2
# HELP myapp_info Information about the My App environment.
# TYPE myapp_info gauge
myapp_info{version="2.10.5"} 1

The next step is to run Prometheus.

  • Create a folder prometheus and the corresponding configuration file.
  • For the scrape interval, a small value such as 5 seconds is fine for a demo, but in production, you want to increase it to at least 15 or 30 seconds.
  • We're not going to specify the alertmanager or any rules.
  • We'll use a static scrape config with the domain name of myapp. If you don't use docker-compose and run it locally, just specify localhost:8081.

We don't need to provide the /metrics path because it's the default. In case you use any different path, such as /prom-metrics, you'll need to specify it as well.

prometheus/prometheus.yml
---
global:
  scrape_interval: 5s
  evaluation_interval: 5s

alerting:

rule_files:

scrape_configs:
- job_name: myapp
  static_configs:
  - targets: ["myapp:8081"]

Now, in the docker-compose, add the prometheus service with the latest docker image.

  • Expose prometheus 9090 port to localhost.
  • Finally, provide the path to the local configuration file.
docker-compose.yaml
---
version: "3.9"
services:
  myapp:
    build: ./my-app
    ports:
      - 8080:8080
      - 8081:8081

  prometheus:
    image: prom/prometheus:v2.40.4
    ports:
      - 9090:9090
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml

Let's rerun docker-compose; you can omit the build if you want.

docker-compose up --build

If you navigate to localhost:9090, you should see Prometheus UI. Under the targets section, we can find a single myapp target.

From the Prometheus explorer tab, you can enter myapp_connected_devices metric and execute. You should get 2 connected devices.

There is also a Graph tab to visualize metrics, but most of the time, I use Grafana for that purpose.

Next is Grafana. We can add a new datasource using code.

  • Let's call it Main and use prometheus:9090 URL.
grafana/datasources.yaml
---
apiVersion: 1
datasources:
- name: Main
  type: prometheus
  url: http://prometheus:9090
  isDefault: true

Then add the grafana service to the docker-compose file.

  • Expose port 3000 to the local host.
  • Set the admin user and password using environment variables.
  • Mount the datasource file that we just created to the grafana container.
  • If you want to persist data such as dashboards between restarts, you must create a volume and mount it to the container.
docker-compose.yaml
---
version: "3.9"
services:
  myapp:
    build: ./my-app
    ports:
      - 8080:8080
      - 8081:8081

  prometheus:
    image: prom/prometheus:v2.40.4
    ports:
      - 9090:9090
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:9.3.0
    ports:
      - 3000:3000
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=devops123
    volumes:
      - ./grafana/datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
      - grafana:/var/lib/grafana

volumes:
  grafana:

Let's run docker-compose up again and visit Grafana on localhost:3000.

docker-compose up --build

The username is admin, and the password is devops123.

First of all, make sure that the datasource is properly configured.

Then create a new dashboard; let's call it My App.

  • Add a new panel.
  • For the Title, let's use Connected Devices (Gauge).
  • Make sure that the Main prometheus datasource is selected.
  • Use the same metric, myapp_connected_devices, to get all the connected devices to our app.
  • For the Legend, we can use {{ instance }} label, which is automatically assigned by the prometheus.
  • Change the time interval to Last 1 hour.

Now, let's start customizing our chart.

  • Let's shift the Legend to the Right and transform it into the Table.
  • Choose the Last non-null value.
  • Change Line interpolation to smooth.
  • Increase the Line width to 2 and set the Fill opacity to 50.
  • Also, change the Gradient mode from None to Opacity.
  • Optionally you can Connect null values.
  • For the Unit, use short.
  • Set Decimals to 0.
  • Lastly, let's change the Color scheme to light blue or any other color you want.

That's all; this is our first graph.

Next, let's create another panel to display the current version of the app.

  • Change the Title to App Version (Gauge).
  • Also, update the chart type to Stat.
  • For the metric, use myapp_info.
  • Change the query Type from Range to Instant.
  • For the legend, let's use {{ version }} label.
  • Set Text mode to Name and Graph mode to None.

It's not perfect; every time you upgrade your app, you'll see multiple versions for 5 minutes, and after that, the single version is displayed.

Let's add functionality to our app to register new devices.

  • Create a new createDevice function with the same signature as any other http handler.
  • Then declare the device variable.
  • This function will accept the device as a JSON object from the client and decode it to the golang struct.
  • Since we use a global variable to maintain connected devices, let's append this device to dvs slice.
  • Set the HTTP status code to 201 and return Device created! to the client.
my-app/main.go
func createDevice(w http.ResponseWriter, r *http.Request) {
    var dv Device

    err := json.NewDecoder(r.Body).Decode(&dv)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    dvs = append(dvs, dv)

    w.WriteHeader(http.StatusCreated)
    w.Write([]byte("Device created!"))
}

The standard http request multiplexer in the Go version used here (1.19) doesn't have functionality to route requests based on HTTP methods such as GET, POST, etc. Method-aware patterns were only added in Go 1.22. To implement this, we need to create another handler and use a switch statement.

  • Let's call it registerDevices.
  • Use the switch on the request method.
  • If the method is GET, we want to use our first getDevices function.
  • If the method is POST, use the new createDevice function.
  • In case we receive a request with the unsupported method, we want to indicate what methods are available and send the error to the client.
my-app/main.go
func registerDevices(w http.ResponseWriter, r *http.Request) {
    switch r.Method {
    case "GET":
        getDevices(w, r)
    case "POST":
        createDevice(w, r)
    default:
        w.Header().Set("Allow", "GET, POST")
        http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
    }
}
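
To check the method routing without starting a server, the standard net/http/httptest package can send in-memory requests to a handler. The sketch below uses a simplified stand-in named routeByMethod (an illustrative name) whose GET and POST branches only write a status code, so it stays self-contained. The statusFor helper is likewise an assumption for the demo.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// routeByMethod is a stand-in for registerDevices: same switch, but the
// GET and POST branches only write a status code.
func routeByMethod(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		w.WriteHeader(http.StatusOK)
	case http.MethodPost:
		w.WriteHeader(http.StatusCreated)
	default:
		w.Header().Set("Allow", "GET, POST")
		http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
	}
}

// statusFor sends an in-memory request with the given method and reports
// the response status code and the Allow header.
func statusFor(method string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/devices", nil)
	routeByMethod(rec, req)
	return rec.Code, rec.Header().Get("Allow")
}

func main() {
	for _, m := range []string{http.MethodGet, http.MethodPost, http.MethodDelete} {
		code, allow := statusFor(m)
		fmt.Printf("%s -> %d (Allow: %q)\n", m, code, allow)
	}
}
```

A DELETE request should come back as 405 with the Allow header listing the supported methods.
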
  • Then update getDevices http handler to registerDevices.
my-app/main.go
func main() {
    reg := prometheus.NewRegistry()
    m := NewMetrics(reg)

    m.devices.Set(float64(len(dvs)))
    m.info.With(prometheus.Labels{"version": version}).Set(1)

    dMux := http.NewServeMux()
    dMux.HandleFunc("/devices", registerDevices)

    pMux := http.NewServeMux()
    promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})
    pMux.Handle("/metrics", promHandler)

    go func() {
        log.Fatal(http.ListenAndServe(":8080", dMux))
    }()

    go func() {
        log.Fatal(http.ListenAndServe(":8081", pMux))
    }()

    select {}
}

This time when you run docker-compose, make sure to include the build flag since we updated the source code.

docker-compose up --build

To create a new device, use curl and provide the JSON object.

curl -d '{"id": 3, "mac": "96-40-D1-32-D7-1A", "firmware": "3.03.00"}' localhost:8080/devices

To check if the device was created, use the GET request.

curl localhost:8080/devices

Now, every time we create a new device, we want to update the corresponding metric. In this tutorial, I'll show you a few methods, from creating custom http handlers to middleware.

Let's start with the custom handler. In order to conform to the http.Handler interface, we need to implement a single ServeHTTP method with the same signature as an http handler function. That's the only requirement.

First of all, let's create a registerDevicesHandler struct and include a metrics property that we can use later to update the device count.

my-app/main.go
type registerDevicesHandler struct {
    metrics *metrics
}

We can turn the existing registerDevices function into a ServeHTTP method. Then pass metrics as an additional argument to the createDevice function.

my-app/main.go
func (rdh registerDevicesHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    switch r.Method {
    case "GET":
        getDevices(w, r)
    case "POST":
        createDevice(w, r, rdh.metrics)
    default:
        w.Header().Set("Allow", "GET, POST")
        http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
    }
}

In the createDevice function, add metrics as a third argument. To increment the device count, you can use the Inc() method.

For a Gauge that mirrors a value the app already tracks, Set() is generally the safer choice, because the metric is always recomputed from the real number of devices and cannot drift. A Counter, on the other hand, has no Set() method at all and can only go up through Inc() or Add(). Let's use Set() here and leave Inc() commented out for reference.

my-app/main.go
func createDevice(w http.ResponseWriter, r *http.Request, m *metrics) {
    var dv Device

    err := json.NewDecoder(r.Body).Decode(&dv)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    dvs = append(dvs, dv)

    // m.devices.Inc()
    m.devices.Set(float64(len(dvs)))

    w.WriteHeader(http.StatusCreated)
    w.Write([]byte("Device created!"))
}
  • In the main() function, we need to initialize registerDevicesHandler and pass a pointer to the metrics struct.
  • Then replace the handler on the /devices endpoint.
my-app/main.go
func main() {
    reg := prometheus.NewRegistry()
    m := NewMetrics(reg)

    m.devices.Set(float64(len(dvs)))
    m.info.With(prometheus.Labels{"version": version}).Set(1)

    dMux := http.NewServeMux()
    rdh := registerDevicesHandler{metrics: m}
    dMux.Handle("/devices", rdh)

    pMux := http.NewServeMux()
    promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})
    pMux.Handle("/metrics", promHandler)

    go func() {
        log.Fatal(http.ListenAndServe(":8080", dMux))
    }()

    go func() {
        log.Fatal(http.ListenAndServe(":8081", pMux))
    }()

    select {}
}

To test this functionality, rebuild the app.

docker-compose up --build

You can use the same command to create a new device.

curl -d '{"id": 3, "mac": "96-40-D1-32-D7-1A", "firmware": "3.03.00"}' localhost:8080/devices

In the Grafana dashboard, you can see that the metric went up.

Counter

The next metric type that we're going to implement is a counter. It is a cumulative metric that represents a single monotonically increasing value. It can only go up or be reset to zero on restart. Typically you would use it with the rate() function to measure the number of requests served, tasks completed, or errors. We're going to use it to count device upgrades.

Let's declare it as a CounterVec to add custom labels. We'll use a label to count upgrades of different device types. For example, the type can be a router, access point, modem, etc.

my-app/main.go
type metrics struct {
    devices  prometheus.Gauge
    info     *prometheus.GaugeVec
    upgrades *prometheus.CounterVec
}
  • Add it to the NewMetrics function.
  • Let's name it device_upgrade_total and give it a description Number of upgraded devices.
  • Provide a single type label, and don't forget to register it using MustRegister function.
my-app/main.go
func NewMetrics(reg prometheus.Registerer) *metrics {
    m := &metrics{
        devices: prometheus.NewGauge(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "connected_devices",
            Help:      "Number of currently connected devices.",
        }),
        info: prometheus.NewGaugeVec(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "info",
            Help:      "Information about the My App environment.",
        },
            []string{"version"}),
        upgrades: prometheus.NewCounterVec(prometheus.CounterOpts{
            Namespace: "myapp",
            Name:      "device_upgrade_total",
            Help:      "Number of upgraded devices.",
        }, []string{"type"}),
    }
    reg.MustRegister(m.devices, m.info, m.upgrades)
    return m
}
  • Next, create upgradeDevice function that accepts writer, request, and metrics.
  • To get the id of the device, let's trim the path.
  • Then try to convert the id to the integer and return an error if it fails.
  • To accept the firmware version, we'll reuse the same Device struct to decode JSON to the device object.
  • Then find the device by the provided id and update the firmware version.
  • To increment the counter, use a router type label and Inc() method.
  • Return 202 HTTP status code and send Upgrading... message to the client.
my-app/main.go
func upgradeDevice(w http.ResponseWriter, r *http.Request, m *metrics) {
    path := strings.TrimPrefix(r.URL.Path, "/devices/")

    id, err := strconv.Atoi(path)
    if err != nil || id < 1 {
        http.NotFound(w, r)
        return
    }

    var dv Device
    err = json.NewDecoder(r.Body).Decode(&dv)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    for i := range dvs {
        if dvs[i].ID == id {
            dvs[i].Firmware = dv.Firmware
        }
    }

    m.upgrades.With(prometheus.Labels{"type": "router"}).Inc()

    w.WriteHeader(http.StatusAccepted)
    w.Write([]byte("Upgrading..."))
}
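
The id handling at the top of upgradeDevice can also be pulled into a small function, which makes the trim-and-parse step easy to test on its own. This is only a sketch: deviceIDFromPath is a hypothetical helper name, and the error messages are made up for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// deviceIDFromPath extracts the numeric id from a path such as "/devices/3".
// It returns an error when the remainder is not a positive integer.
func deviceIDFromPath(path string) (int, error) {
	raw := strings.TrimPrefix(path, "/devices/")
	id, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid device id %q: %w", raw, err)
	}
	if id < 1 {
		return 0, fmt.Errorf("device id must be positive, got %d", id)
	}
	return id, nil
}

func main() {
	for _, p := range []string{"/devices/1", "/devices/abc", "/devices/0"} {
		id, err := deviceIDFromPath(p)
		fmt.Println(p, id, err)
	}
}
```

In the handler, any returned error would map to http.NotFound followed by a return, so the rest of the function never runs with an invalid id.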

We also need to create a custom http handler with metrics property.

For the upgrade of the device, we typically use PUT HTTP method. On any other request, send Method Not Allowed.

my-app/main.go
type manageDevicesHandler struct {
    metrics *metrics
}

func (mdh manageDevicesHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    switch r.Method {
    case "PUT":
        upgradeDevice(w, r, mdh.metrics)
    default:
        w.Header().Set("Allow", "PUT")
        http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
    }
}

In the main() function, we need to initialize the new manageDevicesHandler and add additional /devices/ path with a / at the end.

my-app/main.go
func main() {
    reg := prometheus.NewRegistry()
    m := NewMetrics(reg)

    m.devices.Set(float64(len(dvs)))
    m.info.With(prometheus.Labels{"version": version}).Set(1)

    dMux := http.NewServeMux()
    rdh := registerDevicesHandler{metrics: m}
    mdh := manageDevicesHandler{metrics: m}

    dMux.Handle("/devices", rdh)
    dMux.Handle("/devices/", mdh)

    pMux := http.NewServeMux()
    promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})
    pMux.Handle("/metrics", promHandler)

    go func() {
        log.Fatal(http.ListenAndServe(":8080", dMux))
    }()

    go func() {
        log.Fatal(http.ListenAndServe(":8081", pMux))
    }()

    select {}
}

Rebuild the app.

docker-compose up --build

Until you upgrade at least once, you won't see a new metric in the prometheus. Let's use curl to upgrade a device a couple of times.

curl -X PUT -d '{"firmware": "2.3.0"}' localhost:8080/devices/1

Now, in the prometheus, you can execute myapp_device_upgrade_total query to get the number of times your devices were upgraded.

Just a total number of upgrades may not be very useful. It would be more interesting to measure the load. We can apply the rate() function to get the number of upgrades per second.

  • Let's create a new Grafana chart and call it Upgrades (Counter).
  • Use rate(myapp_device_upgrade_total[1m]) query. In this expression, we measure the per-second rate of upgrades averaged over the last 1 minute. Keep in mind that the range should be at least 4 times larger than the scrape interval so that it always covers enough samples.
  • For the legend, use {{ type }} label.
  • Then pretty much the same customization as with the previous dashboard.
  • For the Unit type, use requests per second (rps).
  • Change the color to pink.
  • Set the refresh rate to 5 or 10 seconds.
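
To make the rate() query above more concrete, here is the underlying arithmetic as a rough sketch. It deliberately ignores what the real PromQL function also does (handling counter resets and extrapolating to the edges of the window), and both the perSecondRate name and the sample numbers are invented for illustration.

```go
package main

import "fmt"

// perSecondRate approximates rate() for a counter that did not reset inside
// the window: the increase divided by the elapsed seconds.
func perSecondRate(first, last, seconds float64) float64 {
	if seconds <= 0 {
		return 0
	}
	return (last - first) / seconds
}

func main() {
	// Hypothetical samples: the counter read 120 at the start of the
	// 1-minute window and 180 at the end.
	fmt.Println(perSecondRate(120, 180, 60)) // prints 1
}
```

So a graph value of 1 means roughly one upgrade per second over the selected window, regardless of how large the counter itself has grown.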

Before testing upgrades, let's introduce some artificial delay.

  • Create a new sleep function.
  • It will accept the maximum number of milliseconds and generate some random delay.

my-app/main.go
func sleep(ms int) {
    rand.Seed(time.Now().UnixNano())
    now := time.Now()
    n := rand.Intn(ms + now.Second())
    time.Sleep(time.Duration(n) * time.Millisecond)
}

Then include it in the upgradeDevice function.

my-app/main.go
func upgradeDevice(w http.ResponseWriter, r *http.Request, m *metrics) {
    path := strings.TrimPrefix(r.URL.Path, "/devices/")

    id, err := strconv.Atoi(path)
    if err != nil || id < 1 {
        http.NotFound(w, r)
        return
    }

    var dv Device
    err = json.NewDecoder(r.Body).Decode(&dv)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    for i := range dvs {
        if dvs[i].ID == id {
            dvs[i].Firmware = dv.Firmware
        }
    }
    sleep(1000)

    m.upgrades.With(prometheus.Labels{"type": "router"}).Inc()

    w.WriteHeader(http.StatusAccepted)
    w.Write([]byte("Upgrading..."))
}

Rebuild the app.

docker-compose up --build

Now to generate some load, you can try to run curl multiple times, or you can use one of the open-source load testers such as hey.

 hey -n 100000 -c 1 -q 2 -m PUT -d '{"firmware": "2.03.00"}' http://localhost:8080/devices/1

Histogram

To measure latency or response sizes, we typically use a histogram. It has a significant benefit over a summary, which I'll show you in the following example.

Let's create a new metric called duration of the HistogramVec type.

my-app/main.go
type metrics struct {
    devices  prometheus.Gauge
    info     *prometheus.GaugeVec
    upgrades *prometheus.CounterVec
    duration *prometheus.HistogramVec
}

When naming histograms, you should follow Prometheus naming conventions and use one of the base units. When measuring time, use seconds instead of minutes or milliseconds. Later you can convert the values using either simple math or Grafana's built-in units.

The key, and sometimes the challenge, when working with histograms is that you must come up with the buckets ahead of time. A summary does not require this, but it has an even bigger issue.

You can use built-in functions to generate buckets automatically, or you can hardcode them. Here, for example, I want to declare five buckets, from 100 ms to 300 ms. They are used to count requests, and the counts are cumulative: if the request duration is at most 150 ms, the count of the 150 ms bucket and of every larger bucket goes up.

my-app/main.go
func NewMetrics(reg prometheus.Registerer) *metrics {
    m := &metrics{
        devices: prometheus.NewGauge(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "connected_devices",
            Help:      "Number of currently connected devices.",
        }),
        info: prometheus.NewGaugeVec(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "info",
            Help:      "Information about the My App environment.",
        },
            []string{"version"}),
        upgrades: prometheus.NewCounterVec(prometheus.CounterOpts{
            Namespace: "myapp",
            Name:      "device_upgrade_total",
            Help:      "Number of upgraded devices.",
        }, []string{"type"}),
        duration: prometheus.NewHistogramVec(prometheus.HistogramOpts{
            Namespace: "myapp",
            Name:      "request_duration_seconds",
            Help:      "Duration of the request.",
            // 4 times larger for apdex score
            // Buckets: prometheus.ExponentialBuckets(0.1, 1.5, 5),
            // Buckets: prometheus.LinearBuckets(0.1, 5, 5),
            Buckets: []float64{0.1, 0.15, 0.2, 0.25, 0.3},
        }, []string{"status", "method"}),
    }
    reg.MustRegister(m.devices, m.info, m.upgrades, m.duration)
    return m
}

Let's use one of the custom handlers to pass the metrics variable to the getDevices function.

my-app/main.go
func (rdh registerDevicesHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    switch r.Method {
    case "GET":
        getDevices(w, r, rdh.metrics)
    case "POST":
        createDevice(w, r, rdh.metrics)
    default:
        w.Header().Set("Allow", "GET, POST")
        http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
    }
}

  • Add metrics as an argument to getDevices.
  • Then get the current time.
  • Use sleep to simulate some latency.
  • Finally, record the observation using the Observe function, passing the time elapsed since now.

Later, I'll show you how to measure duration by creating a custom middleware.

my-app/main.go
func getDevices(w http.ResponseWriter, r *http.Request, m *metrics) {
    now := time.Now()

    b, err := json.Marshal(dvs)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    sleep(200)

    m.duration.With(prometheus.Labels{"method": "GET", "status": "200"}).Observe(time.Since(now).Seconds())

    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    w.Write(b)
}

Rebuild the app.

docker-compose up --build

Until you invoke the /devices endpoint at least once, the new histogram metric won't show up.

curl localhost:8080/devices

Before creating a dashboard, let's scrape the /metrics endpoint with curl.

curl localhost:8081/metrics

The output contains every bucket, each with its inclusive upper bound in the le label, along with the sum and the count.

  • ..._duration_seconds_sum is the total sum of all observed values. Since I invoked the /devices endpoint just once, the duration of that first request was around 65 ms.
  • ..._duration_seconds_count is the number of events that have been observed. Since I made a single request, the count is 1.

...
# HELP myapp_request_duration_seconds Duration of the request.
# TYPE myapp_request_duration_seconds histogram
myapp_request_duration_seconds_bucket{method="GET",status="200",le="0.1"} 1
myapp_request_duration_seconds_bucket{method="GET",status="200",le="0.15"} 1
myapp_request_duration_seconds_bucket{method="GET",status="200",le="0.2"} 1
myapp_request_duration_seconds_bucket{method="GET",status="200",le="0.25"} 1
myapp_request_duration_seconds_bucket{method="GET",status="200",le="0.3"} 1
myapp_request_duration_seconds_bucket{method="GET",status="200",le="+Inf"} 1
myapp_request_duration_seconds_sum{method="GET",status="200"} 0.065365292
myapp_request_duration_seconds_count{method="GET",status="200"} 1

Now let's create a new Grafana chart for the request duration.

  • Let's call it Latency (Histogram).
  • For the query, let's calculate multiple percentiles, starting with P99, the duration under which 99% of all requests fall: histogram_quantile(0.99, rate(myapp_request_duration_seconds_bucket[1m])).

Here is the biggest difference, in my opinion, between histograms and summaries. You can easily aggregate histogram values across all the replicas of your service. Especially in the cloud, we usually run multiple replicas of an application, sometimes even hundreds of them. With a summary, you can only calculate the percentile for each individual replica. With a histogram, on the other hand, you can sum the bucket counts from all replicas and compute the percentile from the combined data.

Also, histograms cover almost all, or maybe even all, use cases for summaries. As I said, the downside is that you need to define the buckets ahead of time.

Even though we have a single application right now, let's make this query future-proof and update it to histogram_quantile(0.99, sum(rate(myapp_request_duration_seconds_bucket[1m])) by (le)).

  • For the legend, use P99.
  • Let's also repeat the same process for P90 and P50.
  • Then make the same customizations for the graph.
  • For the unit, use seconds.
  • Optionally you can override some variables.
  • Change the color for P99 to red.
  • Then for P90, use yellow.
  • Finally, for P50, you can use green.

Now let's generate some load.

hey -n 100000 -c 1 -q 2 http://localhost:8080/devices

Summary

The last metric type is summary. It's a little more convenient than a histogram in that you don't need to define buckets ahead of time. But its percentiles cannot be aggregated across multiple replicas of your application. I almost never use it.

As with any other metric, you can declare it as a summary vector. But for simplicity, let's just use a summary.

my-app/main.go
type metrics struct {
    devices       prometheus.Gauge
    info          *prometheus.GaugeVec
    upgrades      *prometheus.CounterVec
    duration      *prometheus.HistogramVec
    loginDuration prometheus.Summary
}

  • Let's call it login_request_duration_seconds.
  • When you declare the summary metric, you specify quantile objectives instead of buckets. Each key in Objectives is a quantile, and each value is the absolute error tolerated for it. Here we track the same P99, P90, and P50, where P50 is just the median.
  • Also, don't forget to register it using the MustRegister function.

my-app/main.go
func NewMetrics(reg prometheus.Registerer) *metrics {
    m := &metrics{
        devices: prometheus.NewGauge(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "connected_devices",
            Help:      "Number of currently connected devices.",
        }),
        info: prometheus.NewGaugeVec(prometheus.GaugeOpts{
            Namespace: "myapp",
            Name:      "info",
            Help:      "Information about the My App environment.",
        },
            []string{"version"}),
        upgrades: prometheus.NewCounterVec(prometheus.CounterOpts{
            Namespace: "myapp",
            Name:      "device_upgrade_total",
            Help:      "Number of upgraded devices.",
        }, []string{"type"}),
        duration: prometheus.NewHistogramVec(prometheus.HistogramOpts{
            Namespace: "myapp",
            Name:      "request_duration_seconds",
            Help:      "Duration of the request.",
            // 4 times larger for apdex score
            // Buckets: prometheus.ExponentialBuckets(0.1, 1.5, 5),
            // Buckets: prometheus.LinearBuckets(0.1, 5, 5),
            Buckets: []float64{0.1, 0.15, 0.2, 0.25, 0.3},
        }, []string{"status", "method"}),
        loginDuration: prometheus.NewSummary(prometheus.SummaryOpts{
            Namespace:  "myapp",
            Name:       "login_request_duration_seconds",
            Help:       "Duration of the login request.",
            Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
        }),
    }
    reg.MustRegister(m.devices, m.info, m.upgrades, m.duration, m.loginDuration)
    return m
}

For the summary, let's create a new login endpoint and a handler, but in this case, we'll use a middleware pattern.

First, let's create a similar custom http handler but without the metrics property.

The handler will only use the sleep function and return Welcome to the app! to the client.

my-app/main.go
type loginHandler struct{}

func (l loginHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    sleep(200)
    w.Write([]byte("Welcome to the app!"))
}

Now the middleware. It accepts the http handler and the metrics and returns another http handler. In this way, you can chain as many middleware functions as you want. For this use case, we only want to measure the duration of the request. Let's record time now and then use a similar Observe function right after the http handler.

my-app/main.go
func middleware(next http.Handler, m *metrics) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        now := time.Now()
        next.ServeHTTP(w, r)
        m.loginDuration.Observe(time.Since(now).Seconds())
    })
}

  • In the main() function, declare loginHandler.
  • Then use middleware to wrap it.
  • Finally, we can use it in the /login endpoint as a handler.

my-app/main.go
func main() {
    reg := prometheus.NewRegistry()
    m := NewMetrics(reg)

    m.devices.Set(float64(len(dvs)))
    m.info.With(prometheus.Labels{"version": version}).Set(1)

    dMux := http.NewServeMux()
    rdh := registerDevicesHandler{metrics: m}
    mdh := manageDevicesHandler{metrics: m}

    lh := loginHandler{}
    mlh := middleware(lh, m)

    dMux.Handle("/devices", rdh)
    dMux.Handle("/devices/", mdh)
    dMux.Handle("/login", mlh)

    pMux := http.NewServeMux()
    promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})
    pMux.Handle("/metrics", promHandler)

    go func() {
        log.Fatal(http.ListenAndServe(":8080", dMux))
    }()

    go func() {
        log.Fatal(http.ListenAndServe(":8081", pMux))
    }()

    select {}
}

Rebuild the app.

docker-compose up --build

Let's try to access the /login endpoint.

curl localhost:8080/login

If you scrape the /metrics endpoint again, you'll see output similar to the histogram. But instead of buckets, you get percentiles that were already computed inside the application.

curl localhost:8081/metrics
...
# HELP myapp_login_request_duration_seconds Duration of the login request.
# TYPE myapp_login_request_duration_seconds summary
myapp_login_request_duration_seconds{quantile="0.5"} 0.073964833
myapp_login_request_duration_seconds{quantile="0.9"} 0.073964833
myapp_login_request_duration_seconds{quantile="0.99"} 0.073964833
myapp_login_request_duration_seconds_sum 0.073964833
myapp_login_request_duration_seconds_count 1
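
  • Let's create the last graph for this tutorial. Call it Latency (Summary).
  • For the query, use the myapp_login_request_duration_seconds metric with a different quantile label value for each series. The first is P99: myapp_login_request_duration_seconds{quantile="0.99"}.
  • Then add P90 and P50 with quantile="0.9" and quantile="0.5".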
  • For the unit type, use seconds as well.
  • Also, if you want, you can override some variables to match the histogram graph.

For the final test, let's generate some load.

hey -n 100000 -c 1 -q 2 http://localhost:8080/login
