Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH? from Laurent Bercot on 2022-10-11 (skaware)

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Mon, 10 Oct 2022 22:37:04 +0000

>However, the OS would still deliver them to skadnsd in a recv() /
>recvfrom() call, right? If my reading of the truss outputs is correct,
>the HardenedBSD system isn't getting a response at all,

  That's right, which is why my RD bit filter hypothesis only applied
to OmniOS, which did get responses, but skadnsd ignored them. On
HardenedBSD, 18 queries getting no answers from the caches is a
different problem entirely.

> and whatever
>error happens with the program running on the OmniOS system, if any,
>does not involve the network

  It involves the relevance test:

https://github.com/skarnet/s6-dns/blob/master/src/libs6dns/s6dns_engine.c#L32
  This function is called on every incoming message that is a potential
response. If it returns 0, the message is deemed irrelevant to the
current query, and ignored. When you see a recv() (or recvfrom()) from
a UDP socket, but no answer is reported to the client and the socket is
still polled until it times out, it means that the relevant() test
failed.
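  To make the idea of a relevance test concrete, here is a minimal C
sketch of the kind of checks such a function can perform. It is not
the code at the URL above: the name is_relevant() and the exact set of
checks are assumptions. It also assumes the query is only the 12-byte
DNS header plus one question (no EDNS OPT record), and it compares the
echoed question byte for byte, whereas a real resolver may compare
names case-insensitively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical relevance check, NOT the s6dns_engine.c implementation.
   q/qlen: the query as sent; ASSUMED to be a 12-byte header followed by
   exactly one question, with no additional records (no EDNS OPT).
   r/rlen: a datagram received on the query's UDP socket.
   Returns 1 if r can be an answer to q, 0 if it should be ignored. */
int is_relevant(const unsigned char *q, size_t qlen,
                const unsigned char *r, size_t rlen)
{
  assert(q != NULL && r != NULL);
  if (qlen < 12 || rlen < qlen) return 0;        /* too short to echo the question */
  if (r[0] != q[0] || r[1] != q[1]) return 0;    /* 16-bit query id must match */
  if (!(r[2] & 0x80)) return 0;                  /* QR bit must mark a response */
  if ((r[2] & 0x78) != (q[2] & 0x78)) return 0;  /* opcode (bits 6..3) must match */
  if (r[4] != q[4] || r[5] != q[5]) return 0;    /* same QDCOUNT */
  /* the question section must be echoed byte for byte (a real resolver
     might compare names case-insensitively instead) */
  return memcmp(r + 12, q + 12, qlen - 12) == 0;
}
```

  A datagram that fails any of these checks is simply dropped, and the
socket keeps being polled, which matches the trace pattern described
above.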

  Until tonight, the "h.rd != (q[2] & 1)" test, i.e. "is the rd bit of
the response different from the rd bit of the query", was performed
outside of the "strict" guard. As a result, responses from caches that
do not follow the RFC (which says the RD bit is copied from the query
into the response) were ignored as malformed; it is quite possible that
this is what happened on OmniOS here.
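  The RD bit is bit 0 of byte 2 of the DNS header (byte 2 holds QR,
Opcode, AA, TC and RD, from most to least significant bit), which is
why the test reads q[2] & 1. The hypothetical sketch below,
rd_check_passes(), shows the intended behaviour once the comparison
sits under the strict guard: a mismatch only discards the response in
strict mode. The function name and its boolean parameter are
illustrative assumptions, not the s6-dns API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* In the DNS header, byte 2 is QR|Opcode(4 bits)|AA|TC|RD, so RD is bit 0. */
static unsigned rd_bit(const unsigned char *hdr)
{
  return hdr[2] & 1u;
}

/* Hypothetical helper illustrating the RD comparison (not the s6-dns API).
   RFC 1035 says RD is copied from the query into the response, so a
   mismatch means the cache does not follow the RFC. With strict set, such
   a response is rejected; otherwise the mismatch is tolerated. Before the
   fix described above, the check behaved as if strict were always true. */
bool rd_check_passes(const unsigned char *q, const unsigned char *r, bool strict)
{
  assert(q != NULL && r != NULL);
  return !strict || rd_bit(r) == rd_bit(q);
}
```

  In non-strict mode, a cache that answers with RD cleared to a query
that had RD set is no longer silently ignored.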

> (I can't tell if skadnsd is delivering
>all received answers to the client).

  After the first one, which is a connection/synchronization marker,
each write() to the async pipe to the client (fd 10 on HardenedBSD,
fd 9 on OmniOS) is an answer or a sequence of answers. (skadnsd buffers
the answers into a textmessage_sender, i.e. a bufalloc, which is
flushed at the next ppoll() invocation.) Writes of length 7 are
failures (4 bytes length, 2 bytes query id, 1 byte errno); writes of
length 14 are 2 failure reports, as you can see in the string. 28 is
4 failures; 95 and 140 are likely single successes (length, query id,
0 for success, then the response packet); 279 is likely two successes.

  At the end of the traces, we get EOF on fd 0 while a lot of sockets
are still being polled. That's the client exiting (or at least closing
the skadns connection) while some queries are in flight. The rough math
checks out: it definitely looks like all received answers, positive and
negative, have been delivered.

>I feel that packet capture tools like tcpdump(1) or OmniOS' snoop(8)
>would be better suited for answering the questions that have been
>raised so far (malformed packets, ignored responses, lack of
>responses, etc.).

  strace has an option to print full strings. truss should have a
similar option (if its display can be trusted...). You're right that
packet capture tools would be good to use in this situation, but since
I personally loathe using them, I don't want to ask other people to
use them, and I can work with what we have. On HardenedBSD at least,
the traces are readable.

> Also, aren't 18 outstanding queries in a short
>amount of time from one single host, like, a lot? Couldn't Shaw's
>caches think that they are being DoS'ed :P ?

  That's definitely possible, and I would say likely, but I don't want
to lay the blame on others before making sure we're in the clear. :)

--
  Laurent

Received on Tue Oct 11 2022 - 00:37:04 CEST
