Need to Reboot? Learn How to Check With Node_exporter! | by Mirco | Jul, 2022


Leverage Prometheus and the node_exporter to capture arbitrary metrics with ease

You can use the node_exporter and its textfile collector to track arbitrary metrics from your machines.

It may be the outcome of a cron job, the number of logins, or anything else you can boil down to a number. The node_exporter already ships with a good number of built-in metrics, so check out the documentation first.

For everything else, you use the textfile collector. I'll show you how.

I assume you already have a Prometheus instance up and running. If this is not the case, head over to the official documentation or check my article series on this topic.

The Prometheus metric format is a simple text format. It comprises the metric name, optional labels, the numeric value, and an optional timestamp. A comment above the metric describes the metric type. Look at this example of a counter metric:

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000

The first line is just a human-readable description, while the second line tells Prometheus that this metric is a counter. Counters are one of the Prometheus metric types. A counter can only increase; if its value drops, Prometheus treats that as a counter reset.

The third line is the metric http_requests_total with the labels method and code. The current value of the counter is 1027.
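
To make the format concrete, here is a minimal shell sketch that prints a metric in this layout. The helper name emit_metric and its argument order are my own choices for illustration, not part of Prometheus or the node_exporter, and the optional timestamp is left out.

```shell
#!/bin/sh
# emit_metric NAME TYPE HELP VALUE [LABELS]
# Prints one metric in the Prometheus text format.
# emit_metric is a hypothetical helper, not a Prometheus tool.
emit_metric() {
    name=$1
    mtype=$2
    help=$3
    value=$4
    labels=${5:-}
    printf '# HELP %s %s\n' "$name" "$help"
    printf '# TYPE %s %s\n' "$name" "$mtype"
    if [ -n "$labels" ]; then
        # Labels go in curly braces right after the metric name.
        printf '%s{%s} %s\n' "$name" "$labels" "$value"
    else
        printf '%s %s\n' "$name" "$value"
    fi
}

# Reproduces the counter example from above, minus the timestamp.
emit_metric http_requests_total counter \
    "The total number of HTTP requests." 1027 'method="post",code="200"'
```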

You must write your metric in this format to a file with the .prom extension, for example your_metric.prom. The textfile collector will collect all the metrics and expose them to Prometheus.

While you may write multiple metrics to one file, I recommend giving each metric its own file. This makes it easier to keep an overview.

We start with a simple gauge metric. It will have the value of one if a reboot is required and zero otherwise. I tested it on Ubuntu 22.04 LTS. It should work on all Debian-based systems.

We need to check if the file /var/run/reboot-required exists. Create a new file called reboot-check.sh with the following content:
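
A minimal sketch of such a script could look like this. It prints the TYPE line in the canonical "# TYPE" form, and the label server_type="dev" is only an example value. The REBOOT_FLAG_FILE variable is a hypothetical override I added so you can try the check without touching /var/run.

```shell
#!/bin/sh
# reboot-check.sh: print a gauge that is 1 if a reboot is required, else 0.
# REBOOT_FLAG_FILE is a hypothetical override for testing; by default the
# script checks /var/run/reboot-required.
flag_file=${REBOOT_FLAG_FILE:-/var/run/reboot-required}

reboot_required_metric() {
    # $1 is the path of the file whose existence signals a pending reboot.
    if [ -f "$1" ]; then
        value=1
    else
        value=0
    fi
    printf '# TYPE reboot_required gauge\n'
    printf 'reboot_required{server_type="dev"} %s\n' "$value"
}

reboot_required_metric "$flag_file"
```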

Make it executable with chmod +x reboot-check.sh. If you run it, you get the following output:

# TYPE reboot_required gauge
reboot_required{server_type="dev"} 0

The metric is a gauge, meaning it can go up and down. Exactly what we need.

We create a cron job that runs the script periodically and writes its output to a .prom file. Open your crontab with crontab -e and append this to the bottom:

* * * * * sh path_to_your_script/reboot-check.sh > path_to_your_metric_folder/foo.prom

This will execute the script every minute. If you need help with the cron syntax, head over to crontab.guru.
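
One caveat with redirecting straight into the .prom file: the node_exporter may read the file while it is only half-written. A common workaround is to write to a temporary file in the same directory and rename it into place. Here is a minimal sketch; write_prom is a hypothetical helper name, and the paths in the example are the same placeholders as in the crontab line.

```shell
#!/bin/sh
# write_prom DIR NAME: read metric text from stdin and publish it as
# DIR/NAME.prom. The temporary file lives in the same directory, so the
# final mv is a rename within one filesystem.
# write_prom is a hypothetical helper name used for illustration.
write_prom() {
    dir=$1
    name=$2
    # The temporary name does not end in .prom, so the collector skips it.
    tmp="$dir/$name.prom.$$"
    cat > "$tmp" && mv "$tmp" "$dir/$name.prom"
}

# Example (placeholder paths):
# sh path_to_your_script/reboot-check.sh | write_prom path_to_your_metric_folder foo
```

For a cron job, you could put both steps into one small wrapper script and call that from the crontab instead.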

We have our metric now. Time to install the node_exporter!

You only need to download the node_exporter, unpack and run it:

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-1.3.1.linux-amd64.tar.gz
cd node_exporter-1.3.1.linux-amd64
./node_exporter

At the time of writing, 1.3.1 is the latest version. Head over to GitHub to check for newer versions.

Open your_server:9100/metrics in a browser to see all exported metrics. You will see a lot!

[...]
# HELP node_vmstat_oom_kill /proc/vmstat information field oom_kill.
# TYPE node_vmstat_oom_kill untyped
node_vmstat_oom_kill 0
# HELP node_vmstat_pgfault /proc/vmstat information field pgfault.
# TYPE node_vmstat_pgfault untyped
node_vmstat_pgfault 3.16685474e+08
# HELP node_vmstat_pgmajfault /proc/vmstat information field pgmajfault.
# TYPE node_vmstat_pgmajfault untyped
node_vmstat_pgmajfault 24606
# HELP node_vmstat_pgpgin /proc/vmstat information field pgpgin.
# TYPE node_vmstat_pgpgin untyped
node_vmstat_pgpgin 8.334653e+06
# HELP node_vmstat_pgpgout /proc/vmstat information field pgpgout.
# TYPE node_vmstat_pgpgout untyped
node_vmstat_pgpgout 3.1538757e+07
# HELP node_vmstat_pswpin /proc/vmstat information field pswpin.
# TYPE node_vmstat_pswpin untyped
node_vmstat_pswpin 0
# HELP node_vmstat_pswpout /proc/vmstat information field pswpout.
# TYPE node_vmstat_pswpout untyped
node_vmstat_pswpout 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9
[...]

But our custom metric is missing. We must tell the node_exporter where it can collect our metric. Stop the node_exporter and run it again with the following command:

./node_exporter \
--collector.textfile.directory=/absolute_path_to_your_metric_folder

You must enter the absolute path. A path starting with ~ does not work.
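
If you do not want to type the absolute path by hand, you can let the shell resolve it for you. A small sketch, assuming the metric folder already exists; textfile_dir_flag is a hypothetical helper name and $HOME/metrics is a placeholder.

```shell
#!/bin/sh
# textfile_dir_flag DIR: print the node_exporter flag with DIR resolved to an
# absolute path. Fails if DIR does not exist.
# textfile_dir_flag is a hypothetical helper for illustration.
textfile_dir_flag() {
    # CDPATH is cleared so that cd prints nothing and resolves DIR as given.
    abs=$(CDPATH= cd -- "$1" && pwd) || return 1
    printf '%s%s\n' '--collector.textfile.directory=' "$abs"
}

# Example (placeholder directory):
# ./node_exporter "$(textfile_dir_flag "$HOME/metrics")"
```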

And there it is!

[Screenshot: reboot_required metric.]
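
If you prefer the command line over a browser, you can filter the endpoint's output for the custom metric. A minimal sketch: reboot_required_value is a hypothetical helper that reads the metrics text from stdin, and your_server is the same placeholder as before.

```shell
#!/bin/sh
# reboot_required_value: read Prometheus text format from stdin and print the
# value of the first reboot_required sample. Comment lines (# HELP, # TYPE)
# are skipped because they do not start with the metric name.
# reboot_required_value is a hypothetical helper name.
reboot_required_value() {
    awk '/^reboot_required[{ ]/ { print $NF; exit }'
}

# Example (placeholder host):
# curl -s http://your_server:9100/metrics | reboot_required_value
```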

Now, create (or delete) the monitored file and refresh the page:

[Screenshot: updated reboot_required metric.]

Works as intended! Now, you can scrape the node_exporter endpoint with Prometheus and get an alert if you need to restart your server.

You can use node_exporter to monitor everything you want on your servers. The example was quite simple. You can find more sophisticated examples on GitHub.

To learn more about Prometheus and the node_exporter, I recommend the following pages:

Thank you for your time!

Want to connect? Subscribe to my newsletter so you never miss a new post: https://verbosemode.dev/subscribe


