Expose metrics from NGINX and Bind to Prometheus #38
Comments
This issue has been automatically marked as inactive because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is still awaiting review. There seems to be interest in getting these features, if not all of them, then at least some.
I have been following the project. Are new updates expected for the exporters and the addition of Prometheus?
Introduction to the problem
As HTX-LAN has been using this awesome piece of software for some years now,
we wanted more insight into how the cache is utilized live during the LAN parties.
The provided logs did show some HIT and MISS entries, but the actual throughput and performance of the tools were invisible.
Idea
We wanted to export metrics from Bind (DNS) and NGINX in a form that is easy to use in Grafana through Prometheus.
Solution
To implement this, we have updated the infrastructure in three services (repositories).
lancache-dns
The first repository we have updated is lancache-dns.
Here we have added statistics logging to the Bind configuration file, which is exposed on port 8053 (TCP).
The core Bind functionality is unchanged, but because statistics are now being logged, it does create some extra load. This load is very limited in scope.
This endpoint is not intended to be exposed publicly, but to be used by an exporter, which we have included in the docker-compose repository.
monolithic
The second repository we have updated is monolithic.
Here we have added an NGINX status endpoint that returns the current status of the NGINX instance used in the monolithic setup.
This status endpoint is set up as a standalone site configuration (`30_metrics.conf`). Through this site configuration, we expose the status endpoint on port 8080 (TCP), with the help of the `stub_status` functionality in NGINX. By doing it on a standalone site, the endpoint can easily be disabled by not opening the port in the container.
This endpoint is not intended to be exposed publicly, but to be used by an exporter, which we have included in the docker-compose repository.
docker-compose
The third repository we have updated is docker-compose, which is the repository in which this issue is created.
The update contains multiple parts:
Exporters
The first part is that we have added two new services that export the metrics from the lancache-dns and monolithic services.
The two exporters used for this are prometheuscommunity/bind-exporter, which converts the Bind statistics to a format that can be used by Prometheus, and nginx/nginx-prometheus-exporter, which exports the NGINX statistics.
These two exporters are, respectively, community-built and official NGINX software. We therefore expect them to be maintained and updated in the future, which makes them a better fit for this use case than building and maintaining custom exporters.
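As a rough illustration of the two exporter services described above, a Compose fragment might look like the following. This is a sketch and not the contents of the actual pull request: the service names (`dns`, `monolithic`, `bind-exporter`, `nginx-exporter`) and the `/nginx_status` path are assumptions made for the example. The `--bind.stats-url` and `--nginx.scrape-uri` flags are the exporters' own options for pointing them at the upstream endpoint.

```yaml
# Sketch only: service names and the status path are assumptions,
# not taken from the pull requests.
services:
  bind-exporter:
    image: prometheuscommunity/bind-exporter
    command:
      # Read the Bind statistics endpoint exposed on port 8053.
      - "--bind.stats-url=http://dns:8053/"

  nginx-exporter:
    image: nginx/nginx-prometheus-exporter
    command:
      # Read the NGINX stub_status endpoint exposed on port 8080.
      # "/nginx_status" is a hypothetical location path.
      - "--nginx.scrape-uri=http://monolithic:8080/nginx_status"
```

Unless configured otherwise, bind-exporter serves its metrics on port 9119 and nginx-prometheus-exporter on port 9113. Port publishing is left out of this sketch.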
Network segregation
To build an adequate Docker stack for this purpose, and to ensure security in the network, we have segregated the network into three parts.
The first part is the main `default` network. It binds the main services together (lancache-dns and monolithic).
The second network is the `dns-metrics` network. It handles the connection between lancache-dns (more specifically Bind, and the service exposed on port 8053) and the prometheuscommunity/bind-exporter service.
The third network is the `nginx-metrics` network. It handles the connection between the monolithic service, with its status service exposed on port 8080, and the nginx/nginx-prometheus-exporter.
With this segregated network, the exporters only have the access that they absolutely need, and no more, which follows the principle of least privilege.
This may be simplified by using just one network, at the cost of no longer following the principle of least privilege.
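The three-network layout could be expressed in Compose roughly as follows. The network names come from the description above, while the service names (`dns`, `monolithic`, `bind-exporter`, `nginx-exporter`) are assumptions for the sake of the example.

```yaml
# Sketch only: service names are assumptions.
services:
  dns:
    # Main network plus the network shared with the Bind exporter.
    networks: [default, dns-metrics]

  monolithic:
    # Main network plus the network shared with the NGINX exporter.
    networks: [default, nginx-metrics]

  bind-exporter:
    # Can reach only the dns service.
    networks: [dns-metrics]

  nginx-exporter:
    # Can reach only the monolithic service.
    networks: [nginx-metrics]

networks:
  dns-metrics:
  nginx-metrics:
```

Once a service lists its networks explicitly, it has to name `default` as well in order to stay on the main network, which is why `dns` and `monolithic` list two networks each.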
Healthcheck
To ensure the exporters start correctly, we have implemented health checks on 3 out of 4 of the services in the stack.
The health checks are implemented as a simple `curl` command (for the two Lancache services) and a `wget` command (for the prometheuscommunity/bind-exporter), which check that the service is available and running correctly.
The nginx/nginx-prometheus-exporter does not include a health check, because the container used for it does not support one.
Health checks in general allow for better knowledge of the system status and, in this case, allow us to configure the exporters to start only when the services they are exporting from are running correctly. This limits the need for restarts.
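A minimal sketch of how such health checks and start-up ordering can be written in Compose is shown below. The service names, the URLs being probed (including the hypothetical `/nginx_status` path and the exporter's default port 9119), and the timing values are all assumptions, not the values used in the pull request.

```yaml
# Sketch only: names, probed URLs and timings are assumptions.
services:
  dns:
    healthcheck:
      # Healthy once the Bind statistics endpoint answers.
      test: ["CMD", "curl", "-f", "http://localhost:8053/"]
      interval: 30s
      timeout: 5s
      retries: 3

  monolithic:
    healthcheck:
      # Healthy once the NGINX status endpoint answers.
      test: ["CMD", "curl", "-f", "http://localhost:8080/nginx_status"]
      interval: 30s
      timeout: 5s
      retries: 3

  bind-exporter:
    depends_on:
      dns:
        # Start only after the dns health check passes.
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9119/metrics"]
      interval: 30s
      timeout: 5s
      retries: 3

  nginx-exporter:
    depends_on:
      monolithic:
        # Start only after the monolithic health check passes.
        condition: service_healthy
```

The `condition: service_healthy` entries are what delay each exporter until the service it reads from reports healthy.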
METRIC_BIND_IP
To help segregate the metrics, and to some degree limit their availability, we have added a new environment variable to the stack.
`METRIC_BIND_IP` can be set to a specific IP address that the exporters will bind to. This separates the metrics onto a specific IP address instead of the DNS IP, which is the default IP of the metrics endpoints.
This is documented in both the README.md and the .env file.
In the example file, it is left empty, so that it falls back to `DNS_BIND_IP`.
Prometheus and Grafana
To then use the metrics, Prometheus and Grafana are recommended. Neither is included in any of the updates; this is only a recommendation for how to use the metrics.
This has been briefly documented in the README.md of the docker-compose repository.
Through the exporters, two new endpoints are available that can be scraped by Prometheus, one for each Lancache service.
The endpoints are:
As an example of how this data can be used, we have created a simple Grafana dashboard that shows the data from the two endpoints.
We are still working on a more complete dashboard that shows more data. If wanted, we can include it in a later update.
Note: The example dashboard also uses cAdvisor to show the resource usage of the containers. This is not included in the update and is only shown as an example of how the data can be used.
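For the Prometheus side, a scrape configuration along these lines should be enough to collect from both exporters. This is only a sketch: the job names are made up, `192.0.2.10` is a placeholder for whatever address the exporters are published on (the `METRIC_BIND_IP`, or the DNS IP when that is left empty), and the ports are the exporters' defaults rather than values confirmed by this issue.

```yaml
# prometheus.yml fragment. Sketch only: job names, address and ports
# are assumptions.
scrape_configs:
  - job_name: lancache-dns
    static_configs:
      # bind-exporter listens on port 9119 by default.
      - targets: ["192.0.2.10:9119"]

  - job_name: lancache-monolithic
    static_configs:
      # nginx-prometheus-exporter listens on port 9113 by default.
      - targets: ["192.0.2.10:9113"]
```

Both exporters serve their metrics under `/metrics`, which is also the path Prometheus scrapes by default, so no `metrics_path` override is needed here.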
Pull requests
We have created the following pull requests to the repositories mentioned above, from our fork at HTX-LAN.
Pull requests:
Further improvement
The system may not be perfect, and there may be some improvements that can be made. We have already thought of a few, and bring them up as part of this issue, as they may depend on the wishes of the maintainers.
List of further improvements:
This would allow for profile launching of the stack, and therefore allow for both a simpler setup and a more complex setup, depending on the needs of the user, through a simple command.
This is not a part of the update, but would be a good addition to the website, to show how the metrics can be used, and document this new feature of the stack.
We are open to any changes that may be needed in order to conform with the standards of the project and the needs of the maintainers. We do note, however, that the current setup is going to be used in some form at the next HTX-LAN, and we would therefore like to keep at least the core functionality this adds.
Special thanks go to William Børresen, for being one of the main contributors to this update.