An on-premises solution for monitoring isolated clusters has been established.
Metrics from every node are collected with the collectd package, which sends
the data to the Grafana monitoring server, where it is stored in InfluxDB and
displayed with Grafana.
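To make the stored data more concrete, the following minimal sketch formats a single collectd-style sample as InfluxDB line protocol. It assumes the naming used by the InfluxDB 1.x collectd listener (measurement built from the plugin and data-source name, one `value` field, `host` and `type` tags). The host name `node01`, the sample value, and the helper names are hypothetical and do not come from the deployment described here.

```python
def _escape(text: str) -> str:
    # Line protocol requires commas, equals signs, and spaces to be escaped
    # in tag keys and tag values.
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement: str, tags: dict, value: float, timestamp_ns: int) -> str:
    """Format one sample as an InfluxDB line protocol string (simplified)."""
    # Measurement names only need commas and spaces escaped.
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    tag_str = ",".join(f"{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    head = f"{name},{tag_str}" if tag_str else name
    return f"{head} value={float(value)} {timestamp_ns}"


# Hypothetical sample: short-term load average reported by node01.
line = to_line_protocol(
    "load_shortterm",
    {"host": "node01", "type": "load"},
    0.42,
    1700000000000000000,
)
print(line)
```

In the setup described above, collectd sends its own binary network protocol and InfluxDB does the conversion, so nothing like this has to run on the nodes. The sketch only illustrates the shape of the points that Grafana later queries.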
Firewall rules were enabled to allow static routing on UDP port 25826,
which makes it possible for the isolated clusters to communicate with the monitoring server.
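One way to check that the firewall rule passes traffic is to send a test datagram from a cluster node, as in the minimal sketch below. The function name and the address `monitoring.example.internal` are hypothetical. Because UDP is connectionless, a successful send does not prove delivery, and arrival has to be confirmed on the server side.

```python
import socket

COLLECTD_PORT = 25826  # UDP port named in the text


def send_udp_probe(host: str, port: int = COLLECTD_PORT, payload: bytes = b"probe") -> int:
    """Send one UDP datagram and return the number of bytes handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical address of the monitoring server; replace with the real one.
    sent = send_udp_probe("monitoring.example.internal")
    print(f"sent {sent} bytes")
```

On the monitoring server, a packet capture such as `tcpdump -n udp port 25826` should show the datagram arriving. The payload is not a valid collectd packet, so the listener is expected to discard it.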
Grafana was also configured to send real-time alert messages to a dedicated Slack
channel when potential node health issues are detected.
The messages can be configured to be delivered to different devices, such as a smartphone or a desktop.
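The alert channel can be tested independently of Grafana by posting a message to it by hand. The sketch below assumes the Slack channel is reached through an incoming webhook, which the text does not confirm. The webhook URL, the message wording, and the function names are hypothetical; Grafana composes its own alert messages.

```python
import json
import urllib.request


def build_alert_payload(node: str, metric: str, value: float, threshold: float) -> dict:
    """Build a Slack incoming-webhook payload; the wording is illustrative only."""
    text = f":warning: {node}: {metric} is {value} (threshold {threshold})"
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_alert_payload("node01", "load", 8.5, 4.0)
# To send it, pass the real webhook URL of the dedicated channel:
# post_to_slack("<webhook URL>", payload)
```

If the test message appears in the channel, delivery to a smartphone or desktop is then governed by each user's Slack notification settings rather than by the monitoring stack.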