# forward

## Name

*forward* - facilitates proxying DNS messages to upstream resolvers.

## Description

The *forward* plugin re-uses already opened sockets to the upstreams. It supports UDP, TCP and
DNS-over-TLS and uses in-band health checking.

When it detects an error a health check is performed. This check runs in a loop,
starting with a *0.5s* interval and exponentially backing off with randomized intervals
up to *60s* for as long as the upstream reports unhealthy. The exponential backoff
will reset to *0.5s* after 15 minutes. Once healthy we stop health checking (until the next
error). The health checks use a recursive DNS query (`. IN NS`) to get upstream health. Any response
that is not a network error (e.g. REFUSED, NOTIMPL or SERVFAIL) is taken as a sign of a healthy upstream. The
health check uses the same protocol as specified in **TO**. If `max_fails` is set to 0, no checking
is performed and upstreams will always be considered healthy.
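
As a sketch of how this behaviour can be tuned, the fragment below uses the `max_fails` and
`health_check` options described under Syntax. The upstream addresses and the chosen values are
illustrative assumptions, not recommendations: an upstream is only considered down after 3 failed
health checks, and health checking uses a 2s duration instead of the 0.5s default.

~~~ corefile
. {
    forward . 10.0.0.10:53 10.0.0.11:53 {
        max_fails 3
        health_check 2s
    }
}
~~~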

When *all* upstreams are down it assumes health checking as a mechanism has failed and will try to
connect to a random upstream (which may or may not work).

This plugin can only be used once per Server Block.

## Syntax

In its most basic form, a simple forwarder uses this syntax:

~~~
forward FROM TO...
~~~

* **FROM** is the base domain to match for the request to be forwarded.
* **TO...** are the destination endpoints to forward to. The **TO** syntax allows you to specify
  a protocol, `tls://9.9.9.9` or `dns://` (or no protocol) for plain DNS. The number of upstreams is
  limited to 15.

Multiple upstreams are randomized (see `policy`) on first use. When a healthy proxy returns an error
during the exchange the next upstream in the list is tried.

Extra knobs are available with an expanded syntax:

~~~
forward FROM TO... {
    except IGNORED_NAMES...
    force_tcp
    prefer_udp
    expire DURATION
    max_fails INTEGER
    tls CERT KEY CA
    tls_servername NAME
    policy random|round_robin|sequential
    health_check DURATION
    max_concurrent MAX
}
~~~

* **FROM** and **TO...** as above.
* **IGNORED_NAMES** in `except` is a space-separated list of domains to exclude from forwarding.
  Requests that match none of these names will be passed through.
* `force_tcp`, use TCP even when the request comes in over UDP.
* `prefer_udp`, try first using UDP even when the request comes in over TCP. If the response is truncated
  (TC flag set in response) then do another attempt over TCP. If both `force_tcp` and
  `prefer_udp` are specified, `force_tcp` takes precedence.
* `max_fails` is the number of consecutive failed health checks that are needed before considering
  an upstream to be down. If 0, the upstream will never be marked as down (nor health checked).
  Default is 2.
* `expire` **DURATION**, expire (cached) connections after this time, the default is 10s.
* `tls` **CERT** **KEY** **CA** defines the TLS properties for TLS connections. From 0 to 3 arguments can be
  provided, with the meanings described below:

  * `tls` - no client authentication is used, and the system CAs are used to verify the server certificate
  * `tls` **CA** - no client authentication is used, and the file CA is used to verify the server certificate
  * `tls` **CERT** **KEY** - client authentication is used with the specified cert/key pair.
    The server certificate is verified with the system CAs
  * `tls` **CERT** **KEY** **CA** - client authentication is used with the specified cert/key pair.
    The server certificate is verified using the specified CA file

* `tls_servername` **NAME** allows you to set a server name in the TLS configuration; for instance 9.9.9.9
  needs this to be set to `dns.quad9.net`. Multiple upstreams are still allowed in this scenario,
  but they have to use the same `tls_servername`. E.g. mixing 9.9.9.9 (Quad9) with 1.1.1.1
  (Cloudflare) will not work.
* `policy` specifies the policy to use for selecting upstream servers. The default is `random`.
  * `random` is a policy that implements random upstream selection.
  * `round_robin` is a policy that selects hosts based on round robin ordering.
  * `sequential` is a policy that selects hosts based on sequential ordering.
* `health_check`, use a different **DURATION** for health checking, the default duration is 0.5s.
* `max_concurrent` **MAX** will limit the number of concurrent queries to **MAX**. Any new query that would
  raise the number of concurrent queries above **MAX** will result in a SERVFAIL response. This
  response does not count as a health failure. When choosing a value for **MAX**, pick a number
  greater than the expected *upstream query rate* multiplied by the *latency* of the upstream servers.
  As an upper bound for **MAX**, consider that each concurrent query will use about 2kb of memory.
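
As a worked illustration of the sizing rule for **MAX** (the figures here are assumptions chosen
for the arithmetic, not measurements): at an expected 2000 upstream queries per second and an
upstream latency of 100ms, about 2000 * 0.1 = 200 queries are in flight at any moment, so **MAX**
should be greater than 200. A value of 1000 leaves headroom and, at about 2kb per concurrent
query, would use roughly 2000kb of memory when fully loaded.

~~~ corefile
. {
    forward . 10.0.0.10:53 {
        max_concurrent 1000
    }
}
~~~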

Also note the TLS config is "global" for the whole forwarding proxy; if you need a different
`tls_servername` for different upstreams you're out of luck.

On each endpoint, the communication timeouts are set by default and automatically tuned depending on early results.

* dialTimeout by default is 30 sec, and can decrease automatically down to 100ms
* readTimeout by default is 2 sec, and can decrease automatically down to 200ms

## Metrics

If monitoring is enabled (via the *prometheus* plugin) then the following metrics are exported:

* `coredns_forward_request_duration_seconds{to}` - duration per upstream interaction.
* `coredns_forward_request_count_total{to}` - query count per upstream.
* `coredns_forward_response_rcode_count_total{to, rcode}` - count of RCODEs per upstream.
* `coredns_forward_healthcheck_failure_count_total{to}` - number of failed health checks per upstream.
* `coredns_forward_healthcheck_broken_count_total{}` - counter of when all upstreams are unhealthy,
  and we are randomly (this always uses the `random` policy) spraying to an upstream.
* `max_concurrent_reject_count_total{}` - counter of the number of queries rejected because the
  number of concurrent queries was at its maximum.

Where `to` is one of the upstream servers (**TO** from the config), `rcode` is the returned RCODE
from the upstream.

## Examples

Proxy all requests within `example.org.` to a nameserver running on a different port:

~~~ corefile
example.org {
    forward . 127.0.0.1:9005
}
~~~

Load balance all requests between three resolvers, one of which has an IPv6 address:

~~~ corefile
. {
    forward . 10.0.0.10:53 10.0.0.11:1053 [2003::1]:53
}
~~~
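
A variation on the same idea, sketched with the `sequential` policy from the expanded syntax so
that upstreams are selected in listed order rather than randomly. The addresses are placeholders
assumed for illustration:

~~~ corefile
. {
    forward . 10.0.0.10:53 10.0.0.11:53 {
        policy sequential
    }
}
~~~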

Forward everything except requests to `example.org`:

~~~ corefile
. {
    forward . 10.0.0.10:1234 {
        except example.org
    }
}
~~~

Proxy everything except `example.org` using the nameservers from the host's `resolv.conf`:

~~~ corefile
. {
    forward . /etc/resolv.conf {
        except example.org
    }
}
~~~

Proxy all requests to 9.9.9.9 using the DNS-over-TLS protocol, and cache every answer for up to 30
seconds. Note the `tls_servername` is mandatory if you want a working setup, as 9.9.9.9 can't be
used in the TLS negotiation. Also set the health check duration to 5s to not completely swamp the
service with health checks.

~~~ corefile
. {
    forward . tls://9.9.9.9 {
       tls_servername dns.quad9.net
       health_check 5s
    }
    cache 30
}
~~~

Or with multiple upstreams from the same provider:

~~~ corefile
. {
    forward . tls://1.1.1.1 tls://1.0.0.1 {
       tls_servername cloudflare-dns.com
       health_check 5s
    }
    cache 30
}
~~~
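
Forward over TCP even when the request arrives over UDP, and keep cached connections for 30
seconds instead of the default 10s. This is a minimal sketch; the upstream address and the
`expire` value are assumptions for illustration:

~~~ corefile
. {
    forward . 10.0.0.10:53 {
        force_tcp
        expire 30s
    }
}
~~~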

## Bugs

The TLS config is global for the whole forwarding proxy; if you need a different `tls_servername` for
different upstreams you're out of luck.

## Also See

[RFC 7858](https://tools.ietf.org/html/rfc7858) for DNS over TLS.