When debugging network problems we usually have a few reflexes:
- ping the host
- traceroute the host
- connect to the port (with netcat, for example)
But when there are firewalls on the path, this gets a little annoying. Many firewalls block ICMP messages, which makes the good old ping and traceroute useless. That's why tcptraceroute (or a similar tool) is so useful, and it is really under-used in everyday debugging.
Let's imagine the following situation: my colleague John wants to understand why mail is not working. The mail server, called BRUMAIL, is placed in a DMZ, following best practice for perimeter services. Unfortunately John has no access to the firewall configuration. That device, and everything facing the internet, is managed by an external company.
Steps of debugging
John, on the advice of his colleagues, performed the steps above: ping, traceroute, netcat.
To test if the host was alive he first performed a ping. The ping gave this result:
$ ping -c 2 22.214.171.124
PING 126.96.36.199 (188.8.131.52) 56(84) bytes of data.
--- 184.108.40.206 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1014ms
The ping wasn't really productive, so the next thing John tried was a traceroute:
$ tracepath -n 220.127.116.11
 1:  18.104.22.168  0.200ms pmtu 1500
 1:  22.214.171.124  0.819ms
 2:  126.96.36.199  0.634ms
 3:  188.8.131.52  asymm 2  2.467ms
 4:  184.108.40.206  16.559ms
 5:  220.127.116.11  16.443ms
 6:  18.104.22.168  17.083ms
 7:  no reply
 8:  no reply
 9:  no reply
That's annoying... John sees that his probes get no replies from hop 7 onward. Since hop 7 is not his destination, John concludes that a firewall sits in between.
Suddenly a colleague of John remembers that BRUMAIL also runs a web server and suggests that he simply try browsing to that machine, or open a netcat connection:
$ nc -n -vv 22.214.171.124 25
(UNKNOWN) [126.96.36.199] 25 (smtp) : Connection timed out
 sent 0, rcvd 0
$ nc -n -vv 188.8.131.52 80
(UNKNOWN) [184.108.40.206] 80 (www) open
 sent 0, rcvd 0
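The same check can be scripted when several ports need testing. The following is a minimal sketch, not a replacement for netcat; the function name `probe_port` and its return strings are made up for illustration. It tells apart the three outcomes that matter here: a completed handshake, an active refusal, and silence (which is what a firewall that drops packets typically produces).

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and describe the outcome.

    Hypothetical helper: returns "open", "refused", "timeout",
    or "error: ..." for anything else.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"        # three-way handshake completed
    except (socket.timeout, TimeoutError):
        return "timeout"     # no answer at all, packets were probably dropped
    except ConnectionRefusedError:
        return "refused"     # something answered with a reset
    except OSError as exc:
        return f"error: {exc}"
    finally:
        sock.close()
```

Calling `probe_port(host, 25)` and `probe_port(host, 80)` against the mail server would, under the situation described here, be expected to give "timeout" and "open" respectively, matching the netcat output.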
From the netcat results, the conclusion is that the web server runs fine. (John also tested it using a browser.) As the SMTP service is working locally, John concludes this is a firewall problem. Unfortunately the company responsible for the firewall says their firewall is configured correctly. In the meantime John's boss is getting annoyed because the problem is still not solved. So John needs to find a way to prove that the firewall is the origin of the problem.
tcptraceroute (or one of its many variants) is the solution. Unlike the normal traceroute, which sends UDP or ICMP echo probes, tcptraceroute sends TCP SYN packets. It still uses the TTL principle to walk the path: each probe gets a TTL one higher than the last, and the router where the TTL runs out reports back. Because the probes look like ordinary connection attempts, they can pass through firewalls that block ICMP but allow traffic to that port. Let's ask John to do the trace again, this time with TCP packets to port 80, the working port.
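Before looking at John's result, here is a minimal sketch of the TTL principle using an ordinary socket. It is not how tcptraceroute itself works: the real tool crafts raw SYN packets and reads the ICMP "time exceeded" replies to learn each router's address, which needs elevated privileges. This sketch only finds the smallest TTL at which a TCP handshake completes, that is, how many hops away the listening service is. The function names are hypothetical, and how the operating system reports a probe whose TTL ran out (timeout or an error) is platform-dependent, so every non-"open" outcome is simply treated as "not reached yet".

```python
import socket


def connect_with_ttl(host: str, port: int, ttl: int, timeout: float = 2.0) -> str:
    """Attempt a TCP connection whose packets carry the given IP TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Limit how many routers the outgoing packets may cross.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except (socket.timeout, TimeoutError):
        return "no reply"
    except ConnectionRefusedError:
        return "closed"
    except OSError as exc:
        # Some systems report an expired TTL as an unreachable host.
        return f"error: {exc}"
    finally:
        sock.close()


def first_ttl_that_connects(host: str, port: int, max_ttl: int = 30,
                            timeout: float = 2.0):
    """Return the smallest TTL at which the handshake completes, or None."""
    for ttl in range(1, max_ttl + 1):
        if connect_with_ttl(host, port, ttl, timeout) == "open":
            return ttl
    return None
```

If the port is filtered somewhere on the path, `first_ttl_that_connects` returns None no matter how high the TTL goes, which is the scripted counterpart of a trace that ends in rows of asterisks. John's actual tcptraceroute run follows.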
$ tcptraceroute -i eth0 -n 220.127.116.11 80
Selected device eth0, address 18.104.22.168, port 53545 for outgoing packets
Tracing the path to 22.214.171.124 on TCP port 80 (www), 30 hops max
 1  126.96.36.199  0.522 ms  0.378 ms  0.544 ms
 2  188.8.131.52  0.475 ms  *  5.214 ms
 3  184.108.40.206  0.657 ms  *  1.252 ms
 4  220.127.116.11  15.986 ms  16.138 ms  15.946 ms
 5  18.104.22.168  15.824 ms  16.130 ms  16.017 ms
 6  22.214.171.124  16.204 ms  16.683 ms  16.917 ms
 7  126.96.36.199  17.168 ms  17.091 ms  17.676 ms
 8  188.8.131.52 [open]  16.953 ms  17.342 ms  17.429 ms
Wait, in the previous traceroute we didn't get answers starting from hop 7. Now John gets an answer from hop 7 and hop 8. He can now also assume the server with IP 184.108.40.206 is the firewall. He immediately tests it with port 25 (smtp).
$ tcptraceroute -i eth0 -n 220.127.116.11 25
Selected device eth0, address 18.104.22.168, port 57399 for outgoing packets
Tracing the path to 22.214.171.124 on TCP port 25, 30 hops max
 1  126.96.36.199  0.504 ms  0.388 ms  0.396 ms
 2  188.8.131.52  0.292 ms  *  6.403 ms
 3  184.108.40.206  0.977 ms  *  0.691 ms
 4  220.127.116.11  16.052 ms  15.803 ms  15.862 ms
 5  18.104.22.168  16.088 ms  16.021 ms  15.870 ms
 6  22.214.171.124  16.806 ms  16.505 ms  16.571 ms
 7  * * *
 8  * * *
 9  * * *
The conclusion is simple. John now has proof that the 7th hop, probably the firewall, blocks packets for port 25. With this information he can convince the firewall company that their device is indeed misconfigured.
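The comparison John did by eye can also be automated, which helps when many ports have to be checked. The sketch below assumes the hop-line layout shown in the tcptraceroute output above (hop number, then an address and timings, or asterisks); `parse_hops` and `first_divergence` are hypothetical names. It reports the first hop that answers in the working trace but stays silent in the failing one.

```python
import re

# A hop line starts with the hop number; header lines do not.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_hops(trace_text: str) -> dict:
    """Map hop number -> True if at least one probe got a reply."""
    hops = {}
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue
        number, rest = int(match.group(1)), match.group(2)
        # A hop is silent only when every field is an asterisk.
        hops[number] = any(token != "*" for token in rest.split())
    return hops


def first_divergence(working: dict, failing: dict):
    """First hop that replies in `working` but not in `failing`, else None."""
    for hop in sorted(working):
        if working[hop] and not failing.get(hop, False):
            return hop
    return None


# Last hops of the two traces from the text above.
PORT_80 = """
 6  22.214.171.124  16.204 ms  16.683 ms  16.917 ms
 7  126.96.36.199  17.168 ms  17.091 ms  17.676 ms
 8  188.8.131.52 [open]  16.953 ms  17.342 ms  17.429 ms
"""
PORT_25 = """
 6  22.214.171.124  16.806 ms  16.505 ms  16.571 ms
 7  * * *
 8  * * *
 9  * * *
"""

if __name__ == "__main__":
    hop = first_divergence(parse_hops(PORT_80), parse_hops(PORT_25))
    print(f"traces diverge at hop {hop}")
```

For the two traces in this story the divergence is at hop 7, the same hop John singled out.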
Did John also notice this method can be used to guess what's behind firewalls and what firewalls block? What if the packet arrived at hop 7, but not at hop 8? Is that a sign that the firewall lets the traffic pass, but not the server?