Tips


Debugging IPsec VPN Tunnels

Today I had to debug an IPsec VPN tunnel between Openswan and Cisco PIX.
As I was the third person to work on this 'problem', you could call it the last possible escalation. Time to share a few hints about what to look at and where the problem may be. Let's hope these tips & tricks will help others...

IPsec VPN problems usually boil down to five points (the order matters):

  1. Incorrect Phase 1 settings
  2. Incorrect Phase 2 settings
  3. Wrong routing
  4. NATting VPN traffic that should not be NATted
  5. Incompatible IPsec stacks

Phase 1 & Phase 2

The first two points seem very simple, but unfortunately they are usually where the problem lies. The key is to choose the settings clearly and pass them on to 'the other side'. Communication is the key to success here.
Once you and 'the other side' have configured the tunnel, check the logs. If you see messages saying that Phase 1 has completed, stop looking at the Phase 1 settings: they are correct! Check Phase 2 now.
If your tunnel doesn't complete Phase 1, check the settings again, and again. Ask the other party to read out what they see on their screen. The annoying part is that devices from different vendors usually have different configuration interfaces, which makes the settings harder to compare.
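On the Openswan side, a quick way to see how far a tunnel gets is to search the pluto log for the messages that mark the end of each phase. The function below is a minimal sketch: it assumes pluto's wording "ISAKMP SA established" for Phase 1 and "IPsec SA established" for Phase 2, and that each log line carries the connection name in double quotes. The function name and the log file you pass are hypothetical; check the exact wording in your own logs, and remember that other vendors (the PIX included) log differently.

```shell
#!/usr/bin/env bash
# Sketch: how far does an Openswan tunnel get? Assumes pluto's log wording:
#   Phase 1 done -> "ISAKMP SA established"
#   Phase 2 done -> "IPsec SA established"
# and that pluto prefixes each line with the connection name in double quotes.

# Hypothetical helper: ipsec_phase_status LOGFILE CONNECTION_NAME
ipsec_phase_status() {
  local log="$1" conn="$2" p1=no p2=no
  if grep -F -- "\"$conn\"" "$log" | grep -q -F 'ISAKMP SA established'; then
    p1=yes
  fi
  if grep -F -- "\"$conn\"" "$log" | grep -q -F 'IPsec SA established'; then
    p2=yes
  fi
  if [ "$p1" = no ]; then
    echo "phase1-incomplete"   # check the Phase 1 settings again (and again)
  elif [ "$p2" = no ]; then
    echo "phase2-incomplete"   # Phase 1 is fine, look at the Phase 2 settings
  else
    echo "both-phases-done"    # tunnel is up: look at routing and NAT next
  fi
}
```

For example: ipsec_phase_status /var/log/secure mytunnel, where mytunnel stands for your conn name; which file pluto logs to depends on your distribution and syslog setup.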

A one-million-euro rule: "If you can't see the configuration with your own eyes, don't trust what they say. Seeing is believing!"

Ok, now your IPsec implementation tells you the tunnel is up & running. But "it still doesn't work".

Wrong routing

A simple traceroute or tracert should tell you what is wrong. Nothing very difficult.
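On a Linux VPN gateway such as the Openswan box, you can also ask the kernel directly which route it would use for an address in the remote encryption domain. The sketch below assumes iproute2's ip route get; check_route and route_summary are hypothetical helper names. With Openswan's KLIPS stack you would expect the interface to be an ipsecN device; with the NETKEY stack the route normally points at the regular gateway and the IPsec policy takes care of the rest.

```shell
#!/usr/bin/env bash
# Sketch: which gateway/interface would Linux use for a given destination?

# Parse "ip route get" output from stdin and print "via=<gateway> dev=<interface>".
# "via=direct" means the destination is on a directly connected network.
route_summary() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "via") gw = $(i + 1)
      if ($i == "dev") dev = $(i + 1)
    }
  }
  END { printf "via=%s dev=%s\n", (gw == "" ? "direct" : gw), dev }'
}

# Hypothetical helper: check_route <address in the remote encryption domain>
check_route() {
  ip route get "$1" | route_summary
}
```

Running check_route on both VPN gateways, each time with an address from the other side's network, quickly shows whether the traffic even heads towards the tunnel.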

NATting

Traffic that enters the VPN tunnel shouldn't be NATted. Usually the firewall NATs all traffic going from inside to outside. As your VPN traffic also goes from inside to outside, it can get NATted too. Check these settings to keep your tunnel traffic from being NATted: once NATted, it no longer matches the encryption domain and will not enter the tunnel.
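Why NAT breaks the tunnel is easy to see with a little subnet arithmetic: the encryption domain is matched on addresses, and after NAT the source address is the firewall's public one. The bash sketch below checks whether an address falls inside a CIDR range; in_subnet and ip_to_int are hypothetical helper names, and the addresses in the usage note (10.0.0.0/24 inside, 203.0.113.10 as the public NAT address) are made-up examples.

```shell
#!/usr/bin/env bash
# Sketch: does an IPv4 address fall inside a CIDR range (e.g. an encryption domain)?

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_subnet ADDRESS NETWORK/PREFIX -> exit status 0 if ADDRESS is inside the range.
in_subnet() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}
```

in_subnet 10.0.0.15 10.0.0.0/24 succeeds, while in_subnet 203.0.113.10 10.0.0.0/24 fails: once the source address has been rewritten, the packet no longer matches the encryption domain, so it doesn't enter the tunnel.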

If the tunnel works for traffic in one direction but not in the other, the solution is usually a 'no NAT' (NAT exemption) rule.

With Cisco devices, debug icmp trace is a great help (don't forget terminal monitor when you are connected over SSH or telnet, or you won't see the debug output). Now start a ping from a host in your network and watch the output in your CLI: full information about the ICMP packet and the NAT that has been applied appears on the screen.

Incompatible IPsec stacks

This happens; not often, but it happens. It's the most difficult thing to detect, as you must be 100% sure that your settings are correct. Don't jump to this convenient conclusion too quickly.

To solve this, try upgrading the firmware, or use another device with a newer/older firmware and the same configuration.
Or try with two devices of the same brand. As long as it works, the customer will probably be happy.

A last thing

The last, somewhat controversial, thing I want to share is a difficult decision I once had to take in my career. I had to make a VPN tunnel work between a Fortigate and a Symantec thing (sorry, but out of respect for real firewalls I really can't call this thing a firewall). First, the Symantec firewall didn't show certain settings, so I had to reverse-engineer its default configuration. Another downside was that the device just wouldn't bring up the tunnel by itself when it saw 'interesting traffic'; the tunnel had to be started from the Fortigate. Once that stupid VPN tunnel was working (in both directions), it just couldn't stay stable. The Fortigate already had 6 stable VPN connections to other sites, all running FortiOS (based on Linux). But this connection to that crap dropped regularly for no apparent reason.

After many hours of troubleshooting I plucked up my courage, called the customer and told him: "I could continue debugging this crap, but I can't promise any result. Instead I ask you to throw that Symantec thing away and replace it with a Fortigate costing less than €500. I will then have a working VPN tunnel in at most 4 hours, for a total cost of €900. Compare this to the unknown cost of further troubleshooting without any certainty of results."

The customer thanked me for my honesty, a day later they ordered the device and once it was delivered I had a working, stable VPN tunnel in a few hours.

Lesson learned.

Side-effects of using SVN/CVS with your website

While reading security papers I realized we did something very stupid with the FOSDEM webserver configuration.
Like many websites, we use a versioning system (CVS/SVN) to keep track of changes and to synchronize our sandbox website with our public website. But I had totally forgotten one of the side-effects of doing this.

Subversion and CVS both create administrative directories inside the directories of a working copy: a hidden .svn directory for Subversion (one per directory up to Subversion 1.6; from 1.7 on, a single .svn at the top of the working copy), a CVS directory for CVS. Subversion uses the information in .svn to keep track of things like:

  • Which repository location(s) are represented by the files and subdirectories in the working copy directory.
  • What revision of each of those files and directories are currently present in the working copy.
  • Any user-defined properties that might be attached to those files and directories.
  • Pristine (un-edited) copies of the working copy files.

For more information about this .svn directory, the Subversion documentation is a good place to start.

So what's the problem?

What's exactly in that directory?

chri@sophos:/home/services/www/fosdem.org/.svn$ ls -aR
.:
.  ..  empty-file  entries  format  prop-base  props  README.txt  text-base  tmp  wcprops
./prop-base:
.                       .htaccess.svn-base          install.php.svn-base      .project.svn-base     xmlrpc.php.svn-base
..                      index.php.svn-base          INSTALL.txt.svn-base      robots.txt.svn-base
CHANGELOG.txt.svn-base  INSTALL.mysql.txt.svn-base  LICENSE.txt.svn-base      update.php.svn-base
cron.php.svn-base       INSTALL.pgsql.txt.svn-base  MAINTAINERS.txt.svn-base  UPGRADE.txt.svn-base
...some more...
./tmp:
.  ..  prop-base  props  text-base  wcprops
...still more...

That's scary!! What happens if I enter the exact path of one of these files? Let's say: http://fosdem.org/2008/.svn/entries

<?xml version="1.0" encoding="utf-8"?>
<wc-entries
   xmlns="svn:">
<entry
   committed-rev="288"
   name=""
   committed-date="2007-10-18T05:14:28.629800Z"
   url="file:///XXXX/svn/site/trunk/drupal5"
   last-author="loki"
   kind="dir"
   uuid="e968193e-8020-0410-a39f-a17aa7e0140e"
   repos="file:///XXXX/svn/site"
   revision="288"/>
<entry
   name="profiles"
   kind="dir"/>
...and much more...

If you are creative, and understand what can be in these directories you will understand the danger of publishing it.
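Before fixing anything, it helps to know how many of these directories your webserver publishes. A minimal sketch: the function below walks a document root, finds every .svn (and CVS) directory and prints the URL under which it would be reachable. list_exposed_svn, the document root and the base URL are hypothetical; it only maps filesystem paths to URLs and does not take aliases or rewrite rules into account.

```shell
#!/usr/bin/env bash
# Sketch: list the version-control metadata directories a webserver would publish.

# Hypothetical helper: list_exposed_svn DOCROOT BASE_URL
list_exposed_svn() {
  local docroot="${1%/}" baseurl="${2%/}" dir
  # -prune stops find from descending into the metadata directories themselves
  find "$docroot" -type d \( -name .svn -o -name CVS \) -prune -print |
    LC_ALL=C sort |
    while IFS= read -r dir; do
      # Strip the docroot prefix to turn the filesystem path into a URL path.
      printf '%s%s/\n' "$baseurl" "${dir#"$docroot"}"
    done
}
```

For example, list_exposed_svn /var/www/site http://www.example.org prints one URL per published metadata directory, each of which you can then try in a browser.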

How do I prevent this?

A few solutions exist; it all depends on your creativity and the tools you want to use. The easiest solution is to stop using versioning on your website. But that's probably not something you like ;-).

As we run mod_security on the FOSDEM server, I quickly added the following rule (ModSecurity 1.x syntax): SecFilter "\.svn/". This way the user gets a 406 error message.

Another solution is to configure Apache to refuse access to .svn files and directories, just as it already does for .htaccess files. You can do it like this:

<DirectoryMatch "\.svn">
    # Apache 2.2 syntax; on Apache 2.4 use "Require all denied" instead
    Order allow,deny
    Deny from all
</DirectoryMatch>

You can check by yourself: http://fosdem.org/2008/.svn/entries
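To check this from the outside, for several sites at once if needed, you can ask curl for nothing but the HTTP status code. A small sketch, assuming curl is installed; svn_blocked and check_svn_url are hypothetical names. It treats 403 (Apache's Deny), 406 (the mod_security rule) and 404 as "not exposed", and anything else, in particular 200, as a warning.

```shell
#!/usr/bin/env bash
# Sketch: is a site's .svn/entries file reachable from the outside?

# svn_blocked CODE -> exit status 0 if the HTTP status means "not served".
svn_blocked() {
  case "$1" in
    403|404|406) return 0 ;;  # denied by Apache, missing, or refused by mod_security
    *) return 1 ;;
  esac
}

# Hypothetical helper: check_svn_url BASE_URL
check_svn_url() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' "${1%/}/.svn/entries")
  if [ "$code" = "000" ]; then
    echo "no HTTP answer"          # curl reports 000 when it gets no response
  elif svn_blocked "$code"; then
    echo "blocked ($code)"
  else
    echo "possibly exposed ($code)"
  fi
}
```

For example: check_svn_url http://fosdem.org/2008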


Oh, btw if you want to harden your Apache webserver the following document is a good start: http://httpd.apache.org/docs/2.2/misc/security_tips.html


To telnet or not to telnet? To SSH !

This evening I was thinking about what I could write that could potentially interest fellow network and security people.
In my previous post I told a little story about tracerouting and connecting to a port using netcat and not Telnet. My plan is to write something about the difference between the two applications and when we should use one or the other. But not this evening.

Thinking about telnet... Last week I had a short discussion with my colleague about all these network people who still use telnet to manage their network devices. In the 21st century, and as a security person, I can't imagine creating such a security hole in my customers' networks.

Why should(n't) we use the telnet protocol?

  • + Telnet is widely used
  • + Telnet is a fairly simple protocol
  • + The Telnet client is installed by default on Microsoft Windows
  • - Telnet uses clear-text authentication
  • - Telnet uses clear-text data transfer
  • - Many Telnet daemons have had vulnerabilities
  • - Secure alternatives like SSH exist

So why don't we all use SSH then?

It's very clear to all of us that Telnet is simply insecure-by-design®. But why are we still using it? Why do many sysadmins still leave that thing enabled by default, or why do they even enable it when it's turned off?

I think I finally found the answer during a discussion with that colleague. The reason boils down to one very simple word:

laziness

It's just because we/they don't know by heart how to enable SSH. Enabling it is indeed a little more complex than enabling Telnet. And why don't they look it up on the internet or in the documentation? Simply laziness...

Ok guys, now you don't have an excuse anymore! Doh, this message is also published on the net, so they won't find it...

Enabling SSH on Cisco and HP devices

Even though I just realized it's completely useless to publish these commands on the net once again, I'll paste them here for my own future reference.

Cisco Switch with CatOS

set crypto key rsa 1024
set ip permit 10.0.0.0 255.255.255.0
set ip permit enable ssh
show ip permit
write memory

Cisco Router or Switch with IOS

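!--- a hostname and a domain name are needed to generate the RSA key
hostname myrouter
ip domain-name vandeplas.lab
!--- generate the RSA key pair (SSH v2 needs a modulus of at least 768 bits)
crypto key generate rsa modulus 1024
!--- create a local user for authentication
username chri secret myVerySecurePassword
!--- Use SSH v2 as v1 is insecure
ip ssh version 2
ip ssh time-out 60
ip ssh authentication-retries 2
!--- use the local user database on the vty lines and prevent non-SSH connections
line vty 0 4
 login local
 transport input ssh
end
write memory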

Cisco PIX/ASA

hostname mypix
domain-name vandeplas.lab

!--- generate the RSA key on PIX 6.x
ca generate rsa key 1024
!--- or on PIX 7.x/ASA
crypto key generate rsa modulus 1024
!--- PIX 6.x only: save the key into flash memory (7.x/ASA stores it automatically)
ca save all
!--- allow ssh from the network 10.0.0.0/24 on the inside interface
ssh 10.0.0.0 255.255.255.0 inside
!--- allow authentication using the local user database
username chri password myVerySecurePassword
aaa authentication ssh console LOCAL
!--- save the running-config
write memory

HP Procurve Switch

crypto key generate
ip ssh version 2
ip ssh
write memory
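After saving the configuration, it is worth checking from a management host that the device really speaks SSH version 2. An SSH server announces itself with an identification line such as "SSH-2.0-..." (RFC 4253); "SSH-1.99" means the server still accepts SSH v1 as well. The bash sketch below reads that line using bash's /dev/tcp and coreutils' timeout; fetch_ssh_banner and ssh_banner_version are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: check which SSH protocol version a device announces.

# Extract the protocol version from an identification line such as
# "SSH-2.0-Cisco-1.25"; prints nothing if the line is not an SSH banner.
ssh_banner_version() {
  case "$1" in
    SSH-*)
      local rest="${1#SSH-}"
      echo "${rest%%-*}"
      ;;
  esac
}

# Hypothetical helper: read the first line the server sends on TCP port 22.
# The server speaks first, so no client data is needed.
fetch_ssh_banner() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/22" && head -n 1 <&3' _ "$1" 2>/dev/null | tr -d '\r'
}
```

For example, ssh_banner_version "$(fetch_ssh_banner 10.0.0.1)" should print 2.0 once ip ssh version 2 is active; 10.0.0.1 is just an example management address.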

Useful links

http://www.cisco.com/en/US/customer/tech/tk583/tk617/technologies_tech_note09186a00800949e2.shtml
http://cisco.com/en/US/partner/products/ps6350/products_configuration_guide_chapter09186a00804831de.html
http://cisco.com/en/US/partner/products/hw/vpndevc/ps2030/products_configuration_example09186a008069bf1b.shtml
http://cisco.com/en/US/partner/docs/security/pix/pix63/command/reference/s.html
http://h10025.www1.hp.com/ewfrf/wc/genericDocument?docname=c01139356&cc=ca&dlc=en&lc=en&jumpid=reg_R1002_CAEN
http://www.dice.inf.ed.ac.uk/groups/infrastructure/network/docs/5308.html






Using traceroute with icmp and tcp

When debugging network problems we usually have a few reflexes:

  • ping the host
  • traceroute the host
  • connect to the port (1)
  • ...

But when there are firewalls on the path, this gets a little annoying. Many firewalls block ICMP messages, making the good old pings and traces useless. That's why TCP traceroute (tcptraceroute or similar tools) is a very useful technique that is really under-used in everyday debugging.

Situation

Let's imagine the following situation: my colleague John wants to understand why mail isn't working. Following best practices for perimeter services, this server, called BRUMAIL, is placed in a DMZ. Unfortunately John doesn't have access to the firewall configuration: this device, and everything towards the internet, is managed by an external company.

Steps of debugging

John, following his colleagues' advice, performed the steps above: ping, traceroute, netcat.

To test if the host was alive he first performed a ping. The ping gave this result:

$ ping -c 2 193.190.255.40
PING 193.190.255.40 (193.190.255.40) 56(84) bytes of data.

--- 193.190.255.40 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1014ms

The ping wasn't really productive, so the next thing John tried was a traceroute:

$ tracepath -n 193.190.255.40
 1:  213.251.167.209   0.200ms pmtu 1500
 1:  213.251.167.252   0.819ms 
 2:  213.186.32.1      0.634ms 
 3:  213.186.32.4    asymm  2   2.467ms 
 4:  194.68.129.183   16.559ms 
 5:  193.191.1.1      16.443ms 
 6:  193.191.1.174    17.083ms 
 7:  no reply
 8:  no reply
 9:  no reply

That's annoying... John sees that he doesn't get any replies from hop 7 onwards. As hop 7 is not his destination, John concludes there is a firewall in between.
Suddenly a colleague of John remembers that BRUMAIL also runs a webserver and suggests he simply browse to that machine, or open a netcat connection:

$ nc -n -vv 193.190.255.40 25
(UNKNOWN) [193.190.255.40] 25 (smtp) : Connection timed out
 sent 0, rcvd 0

$ nc -n -vv 193.190.255.40 80
(UNKNOWN) [193.190.255.40] 80 (www) open
 sent 0, rcvd 0

The conclusion is that the webserver runs fine (John also tested it with a browser). As the SMTP service works locally, John concludes this is a firewall problem. Unfortunately, the company responsible for the firewall insists their firewall is configured correctly. Meanwhile John's boss is getting annoyed because the problem still isn't solved. So John needs a way to prove that the firewall is the source of the problem.
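The difference between the two netcat results is itself a clue. "Connection timed out" means nothing came back at all, which is typical of a firewall silently dropping packets; a closed port on a reachable host would normally answer with a reset, giving "Connection refused" instead. The bash sketch below makes that distinction explicit using bash's /dev/tcp and coreutils' timeout; probe_port and classify_rc are hypothetical names, and the 3-second default timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: tell "open", "closed" (refused/rejected) and "filtered" (no answer) apart.
# Relies on bash's /dev/tcp pseudo-device and GNU coreutils' timeout.

# Map the exit status of the connect attempt to a verdict.
classify_rc() {
  case "$1" in
    0)   echo open ;;      # TCP handshake completed
    124) echo filtered ;;  # timeout killed the attempt: nothing came back
    *)   echo closed ;;    # connect failed fast, e.g. RST (refused) or ICMP unreachable
  esac
}

# Hypothetical helper: probe_port HOST PORT [SECONDS]
probe_port() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
  classify_rc "$?"
}
```

In John's situation, probe_port 193.190.255.40 25 should print filtered, and probe_port 193.190.255.40 80 should print open.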

The Solution

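TCP traceroute (tcptraceroute or one of its many variants) is the solution. Unlike the normal traceroute, tcptraceroute doesn't send ICMP (or UDP) probes; it sends TCP SYN packets but still uses the TTL principle for the hopping: each probe gets a TTL one higher than the previous one, and every router that drops a probe because its TTL expired answers with an ICMP time-exceeded message. Because the probes are ordinary TCP packets to a port the firewall allows, they get through where ICMP probes are blocked. Let's ask John to run the trace again, but now with TCP packets on port 80, the working port.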

$ tcptraceroute -i eth0 -n 193.190.255.40 80
Selected device eth0, address 213.251.167.209, port 53545 for outgoing packets
Tracing the path to 193.190.255.40 on TCP port 80 (www), 30 hops max
 1  213.251.167.252  0.522 ms  0.378 ms  0.544 ms
 2  213.186.32.1  0.475 ms * 5.214 ms
 3  213.186.32.4  0.657 ms * 1.252 ms
 4  194.68.129.183  15.986 ms  16.138 ms  15.946 ms
 5  193.191.1.1  15.824 ms  16.130 ms  16.017 ms
 6  193.191.1.174  16.204 ms  16.683 ms  16.917 ms
 7  193.191.9.29  17.168 ms  17.091 ms  17.676 ms
 8  193.190.255.40 [open]  16.953 ms  17.342 ms  17.429 ms

Wait: in the previous traceroute we didn't get answers from hop 7 onwards. Now John gets an answer from hop 7 and hop 8. He can now also assume that the host with IP 193.191.9.29 is the firewall. He immediately tests port 25 (SMTP).

$ tcptraceroute -i eth0 -n 193.190.255.40 25
Selected device eth0, address 213.251.167.209, port 57399 for outgoing packets
Tracing the path to 193.190.255.40 on TCP port 25, 30 hops max
 1  213.251.167.252  0.504 ms  0.388 ms  0.396 ms
 2  213.186.32.1  0.292 ms * 6.403 ms
 3  213.186.32.4  0.977 ms * 0.691 ms
 4  194.68.129.183  16.052 ms  15.803 ms  15.862 ms
 5  193.191.1.1  16.088 ms  16.021 ms  15.870 ms
 6  193.191.1.174  16.806 ms  16.505 ms  16.571 ms
 7  * * *
 8  * * *
 9  * * *

Conclusion

The conclusion is simple. John now has proof that the 7th hop, probably the firewall, blocks the packets for port 25. With this information he can convince the firewall company that their device is indeed misconfigured.
Did John also notice that this method can be used to guess what's behind firewalls and what they block? What if the packet reached hop 7, but not hop 8? Is that a sign that the firewall lets the traffic pass, but the server doesn't?
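John's comparison can also be scripted: run tcptraceroute for each port and look at the last hop that answered. The sketch below assumes tcptraceroute's output format shown above (hop number first, then the responding IP address, or "* * *" when nothing answered); last_replying_hop and compare_ports are hypothetical helper names. Run it as root, since tcptraceroute needs raw sockets.

```shell
#!/usr/bin/env bash
# Sketch: compare how far TCP probes to different ports get.

# Read tcptraceroute output on stdin and print the number of the last hop
# whose line contains an IPv4 address (i.e. the last hop that answered).
last_replying_hop() {
  awk '$1 ~ /^[0-9]+$/ && $0 ~ /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ { last = $1 }
       END { print last }'
}

# Hypothetical helper: compare_ports HOST PORT...
compare_ports() {
  local host="$1" port
  shift
  for port in "$@"; do
    printf 'port %s: last replying hop %s\n' "$port" \
      "$(tcptraceroute -n "$host" "$port" 2>/dev/null | last_replying_hop)"
  done
}
```

Fed the two traces above, last_replying_hop returns 8 for port 80 and 6 for port 25, which is exactly the difference John showed to the firewall company; compare_ports 193.190.255.40 80 25 does both traces in one go.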

(1) For example using netcat (or its Windows port), and please NOT telnet!

Migration to Google Apps for your domain

I took the decision to migrate my mails to Google Apps for your domain.
Some nice features are already activated, working fine and documented.
As expected, similar migrations always bring some issues with them, but this time I wanted to do everything just right.
That's why I started working on a Q & A section. While writing the questions I came across this one: 'I was only using webmail. How do I migrate the mails from the old webmail to the new system?'
I immediately started writing instructions on how to set up Gmail to download POP messages from another server into the Gmail account (you can find this in the 'Settings' > 'Account' page).
Unfortunately this option is not available with the 'apps for your domain' system. I thus had to find a way to transfer all the mails from server A to server B (Google).
Using IMAP to transfer mails was not possible, as Google does not support it. A hunch sent me on a quest through the fetchmail documentation. Browsing the website I became more and more excited when I came across this phrase in the FAQ: "Q: How can I forward mail to another host? A: To forward mail to a host other than the one you are running fetchmail on, use the smtphost or smtpname option. See the manual page for details."

After a few minutes I created this little script to do the work for me:
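#!/bin/bash
# Copy every mailbox from the old POP3 server to Google Apps over SMTP.
# fetchmail asks for each user's POP3 password interactively.
PROTO="POP3"
SMTPHOST="ASPMX.L.GOOGLE.COM"   # Google's inbound mail server
SERVER="vandeplas.com"          # old POP3 server, also the mail domain
USERS="christophe familymember2 familymember3"
for user in ${USERS}; do
  # --smtpname: deliver to the account's temporary test-google-a.com address
  fetchmail -v -p "${PROTO}" --smtphost "${SMTPHOST}" \
    --smtpname "${user}@${SERVER}.test-google-a.com" \
    -u "${user}@${SERVER}" "${SERVER}"
done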
Migration, here I come :-)

Edit: added links below.
Edit: It is now possible to configure 'Add POP account' in the GAFYD Gmail settings. Your GUI language should be English to see this.
Useful links: http://christophe.vandeplas.com/2007/03/17/migration-google-apps-finished