Sunday, November 4, 2007

Using traceroute with ICMP and TCP

When debugging network problems we usually have a few reflexes:

  • ping the host
  • traceroute the host
  • connect to the port (1)
  • ...

But firewalls on the path make this harder. Many firewalls block ICMP messages, which makes the good old pings and traces useless. That's why tcptraceroute (or a similar tool) is so useful, and yet it is really under-used in everyday debugging.

Situation

Let's imagine the following situation: my colleague John wants to understand why mail is not working. The mail server, called BRUMAIL, is placed in a DMZ, following best practices for perimeter services. Unfortunately John doesn't have access to the firewall configuration. That device, and everything towards the internet, is managed by an external company.

Steps of debugging

On the advice of his colleagues, John performed the steps above: ping, traceroute, netcat.

To test if the host was alive he first performed a ping. The ping gave this result:

$ ping -c 2 193.190.255.40
PING 193.190.255.40 (193.190.255.40) 56(84) bytes of data.

--- 193.190.255.40 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1014ms

The ping wasn't really productive, so the next thing John tried was a traceroute:

$ tracepath -n 193.190.255.40
 1:  213.251.167.209   0.200ms pmtu 1500
 1:  213.251.167.252   0.819ms 
 2:  213.186.32.1      0.634ms 
 3:  213.186.32.4    asymm  2   2.467ms 
 4:  194.68.129.183   16.559ms 
 5:  193.191.1.1      16.443ms 
 6:  193.191.1.174    17.083ms 
 7:  no reply
 8:  no reply
 9:  no reply

That's annoying... John sees that he gets no replies to the probes he sent from hop 7 onwards. Since hop 7 is not his destination, John concludes that there is a firewall in between.
Suddenly a colleague of John's remembers that BRUMAIL also runs a webserver and suggests that he simply try browsing to that machine, or open a netcat connection:

$ nc -n -vv 193.190.255.40 25
(UNKNOWN) [193.190.255.40] 25 (smtp) : Connection timed out
 sent 0, rcvd 0

$ nc -n -vv 193.190.255.40 80
(UNKNOWN) [193.190.255.40] 80 (www) open
 sent 0, rcvd 0
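
The same check can be scripted when several ports need testing. The following is only a minimal sketch in Python using the standard socket module; the function name check_tcp_port and the 3 second timeout are choices made for this illustration, not something John used. Like netcat, it tells apart a port that accepts the connection, one that actively refuses it, and one where the attempt simply times out (which is what a silently dropping firewall looks like).

```python
import errno
import socket


def check_tcp_port(host, port, timeout=3.0):
    """Classify a TCP port as 'open', 'refused', 'timeout' or 'error: ...'.

    Hypothetical helper for illustration; it does a full TCP handshake,
    just like `nc -n -vv host port` does.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: typical for a firewall that drops the packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host (or a firewall) answered with a TCP reset.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "no route to host".
        return "error: " + errno.errorcode.get(exc.errno, str(exc))


if __name__ == "__main__":
    # The address and ports are the ones from John's netcat session.
    for port in (25, 80):
        print(port, check_tcp_port("193.190.255.40", port))
```

A "timeout" on port 25 next to an "open" on port 80 matches what netcat reported above, but on its own it still doesn't say where on the path the packets are lost.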

The conclusion is that the webserver runs fine. (John also tested it using a browser.) As the smtp service is working locally, John concludes this is a firewall problem. Unfortunately the company responsible for the firewall says their firewall is configured correctly. In the meantime John's boss is getting annoyed that the problem is still not solved. So John needs to find a way to prove that the firewall is the origin of the problem.

The Solution

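tcptraceroute (or one of its many variants) is the solution. Unlike the normal traceroute, which sends UDP or ICMP probes, tcptraceroute sends ordinary TCP packets. It still uses the TTL principle to walk the path hop by hop: each probe is sent with a TTL one higher than the previous one, and the router where the TTL expires reveals itself with an ICMP Time Exceeded message. Because the probes look like a normal connection attempt, they can pass through firewalls that block ICMP. Let's ask John to do the trace again, but now with TCP packets on port 80, the port that works.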
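
To make the TTL principle concrete, here is a toy simulation in Python. It sends no real packets; it only models a path as a list of hops and shows which hop would answer a probe with a given TTL. The function names, the firewall_index parameter and the rule that the firewall drops a blocked packet silently before it looks at the TTL are all assumptions made for this sketch. The hop addresses are taken from the traces in this post, and placing the firewall at hop 7 is John's guess, not a known fact.

```python
# Path as seen in the traces of this post; the last entry is the destination.
PATH = [
    "213.251.167.252", "213.186.32.1", "213.186.32.4", "194.68.129.183",
    "193.191.1.1", "193.191.1.174", "193.191.9.29", "193.190.255.40",
]


def probe(path, ttl, port, firewall_index=None, blocked_ports=()):
    """Simulate one TCP probe; return (ip, kind) or None if nothing answers."""
    for hop_number, ip in enumerate(path, start=1):
        is_firewall = firewall_index is not None and hop_number - 1 == firewall_index
        if is_firewall and port in blocked_ports:
            return None  # assumption: blocked packets are dropped silently
        if hop_number == len(path):
            return (ip, "open")  # the destination answers the SYN
        if hop_number == ttl:
            return (ip, "time exceeded")  # TTL ran out at this router
    return None


def trace(path, port, firewall_index=None, blocked_ports=(), max_hops=30):
    """Probe with TTL 1, 2, 3, ... until the destination answers."""
    results = []
    for ttl in range(1, max_hops + 1):
        reply = probe(path, ttl, port, firewall_index, blocked_ports)
        results.append((ttl, reply))
        if reply is not None and reply[1] == "open":
            break
    return results


if __name__ == "__main__":
    for port in (80, 25):
        print("port", port)
        for ttl, reply in trace(PATH, port, firewall_index=6,
                                blocked_ports={25}, max_hops=9):
            print(" ", ttl, reply[0] if reply else "* * *")
```

In this model a trace on an allowed port walks all the way to the destination, while a trace on a blocked port goes silent from the firewall hop onwards.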

$ tcptraceroute -i eth0 -n 193.190.255.40 80
Selected device eth0, address 213.251.167.209, port 53545 for outgoing packets
Tracing the path to 193.190.255.40 on TCP port 80 (www), 30 hops max
 1  213.251.167.252  0.522 ms  0.378 ms  0.544 ms
 2  213.186.32.1  0.475 ms * 5.214 ms
 3  213.186.32.4  0.657 ms * 1.252 ms
 4  194.68.129.183  15.986 ms  16.138 ms  15.946 ms
 5  193.191.1.1  15.824 ms  16.130 ms  16.017 ms
 6  193.191.1.174  16.204 ms  16.683 ms  16.917 ms
 7  193.191.9.29  17.168 ms  17.091 ms  17.676 ms
 8  193.190.255.40 [open]  16.953 ms  17.342 ms  17.429 ms

Wait, in the previous traceroute there were no answers from hop 7 onwards. Now John gets an answer from both hop 7 and hop 8. He can now also assume that the device with IP 193.191.9.29 is the firewall. He immediately repeats the test with port 25 (smtp).

$ tcptraceroute -i eth0 -n 193.190.255.40 25
Selected device eth0, address 213.251.167.209, port 57399 for outgoing packets
Tracing the path to 193.190.255.40 on TCP port 25, 30 hops max
 1  213.251.167.252  0.504 ms  0.388 ms  0.396 ms
 2  213.186.32.1  0.292 ms * 6.403 ms
 3  213.186.32.4  0.977 ms * 0.691 ms
 4  194.68.129.183  16.052 ms  15.803 ms  15.862 ms
 5  193.191.1.1  16.088 ms  16.021 ms  15.870 ms
 6  193.191.1.174  16.806 ms  16.505 ms  16.571 ms
 7  * * *
 8  * * *
 9  * * *
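
The comparison John makes by eye between the two traces can also be scripted, which helps when many ports have to be checked. The sketch below is an illustration only: the helper names parse_trace and first_divergence are made up for this example, and the parser assumes the numeric output format shown above (tcptraceroute with -n). The sample data is an excerpt (hops 5 to 9) of the two traces in this post.

```python
import re

# Excerpts of the two traces above: hop number, then address or "*".
TRACE_PORT_80 = """\
 5  193.191.1.1  15.824 ms  16.130 ms  16.017 ms
 6  193.191.1.174  16.204 ms  16.683 ms  16.917 ms
 7  193.191.9.29  17.168 ms  17.091 ms  17.676 ms
 8  193.190.255.40 [open]  16.953 ms  17.342 ms  17.429 ms
"""

TRACE_PORT_25 = """\
 5  193.191.1.1  16.088 ms  16.021 ms  15.870 ms
 6  193.191.1.174  16.806 ms  16.505 ms  16.571 ms
 7  * * *
 8  * * *
 9  * * *
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_trace(output):
    """Return {hop_number: address or None} for each hop line."""
    hops = {}
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header lines such as "Tracing the path to ..."
        first_field = match.group(2)
        hops[int(match.group(1))] = None if first_field == "*" else first_field
    return hops


def first_divergence(good, bad):
    """First hop that answered in `good` but stayed silent in `bad`."""
    for hop in sorted(good):
        if good[hop] is not None and bad.get(hop) is None:
            return hop, good[hop]
    return None


if __name__ == "__main__":
    good = parse_trace(TRACE_PORT_80)
    bad = parse_trace(TRACE_PORT_25)
    print(first_divergence(good, bad))  # prints (7, '193.191.9.29')
```

The hop it reports is the first one that answers on the working port but not on the failing one, which makes it the prime suspect, not a proven culprit.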

Conclusion

The conclusion is simple. John now has proof that the 7th hop, probably the firewall, blocks the packets for port 25. With this information he can convince the firewall company that their device is indeed wrongly configured.
Did John also notice that this method can be used to guess what's behind firewalls and what firewalls block? What if the packet arrived at hop 7, but not at hop 8? Is that a sign that the firewall lets the traffic pass, but the server does not answer?

(1) Using netcat or netcat for Windows, for example, and please NOT telnet!