blob: b18d2ec3cb420787484eae78b503ab979372336b (
plain) (
blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
|
# health
## Name
*health* - enables a health check endpoint.
## Description
Enabled process wide health endpoint. When CoreDNS is up and running this returns a 200 OK HTTP
status code. The health is exported, by default, on port 8080/health.
## Syntax
~~~
health [ADDRESS]
~~~
Optionally takes an address; the default is `:8080`. The health path is fixed to `/health`. The
health endpoint returns a 200 response code and the word "OK" when this server is healthy.
An extra option can be set with this extended syntax:
~~~
health [ADDRESS] {
lameduck DURATION
}
~~~
* Where `lameduck` will delay shutdown for **DURATION**. /health will still answer 200 OK.
Note: The *ready* plugin will not answer OK while CoreDNS is in lame duck mode prior to shutdown.
If you have multiple Server Blocks, *health* can only be enabled in one of them (as it is process
wide). If you really need multiple endpoints, you must run health endpoints on different ports:
~~~ corefile
com {
whoami
health :8080
}
net {
erratic
health :8081
}
~~~
Doing this is supported but both endpoints ":8080" and ":8081" will export the exact same health.
## Metrics
If monitoring is enabled (via the *prometheus* plugin) then the following metrics are exported:
* `coredns_health_request_duration_seconds{}` - The *health* plugin performs a self health check
once per second on the `/health` endpoint. This metric is the duration to process that request.
As this is a local operation it should be fast. A (large) increase in this
duration indicates the CoreDNS process is having trouble keeping up with its query load.
* `coredns_health_request_failures_total{}` - The number of times the self health check failed.
Note that these metrics *do not* have a `server` label, because being overloaded is a symptom of
the running process, *not* a specific server.
## Examples
Run another health endpoint on http://localhost:8091.
~~~ corefile
. {
health localhost:8091
}
~~~
Set a lame duck duration of 1 second:
~~~ corefile
. {
health localhost:8092 {
lameduck 1s
}
}
~~~
|