FortiGate 80F to Unifi Security Gateway Pro 4 IPSec Tunnel Issues

I recently replaced an older Cisco ASA 5550 with a FortiGate 80F. Firewalls are not exactly my area of expertise, so I bring in a networking consultant company to help with these types of things; they are actually the ones that recommended the FortiGate 80F. However, they are quite busy, so it’s up to me to learn the systems and do much of the troubleshooting, especially when the problem falls outside the FortiGate itself.

For years, the Cisco ASA 5550 had an IPsec tunnel that worked flawlessly with the UniFi Security Gateway Pro 4 (USG-PRO-4). But the Cisco did not play well with the more modern firewalls at other companies, or with AWS, which is one of the primary reasons I swapped it out.

Things seemed to go well at first after a weekend install, though we didn’t monitor that specific tunnel closely; we didn’t get any complaints, and it isn’t used that often under sustained traffic. The next weekend, however, I received a complaint that RDP sessions coming in over the FortiGate SSL-VPN and then across the USG tunnel were failing about every 1.5 minutes.

Being new to the FortiGate, and not having touched the UniFi interface in years, it was cold-turkey learning for me.

The network consultants and I went through a great deal of troubleshooting, including turning off DTLS on the SSL-VPN. They saw packet errors in the tunnel’s counters using the following command on the FortiGate:

get vpn ipsec tunnel summary

which returned

tx packets: 1992 bytes: 1092273 errors: 134
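
For a per-tunnel view of the same counters, FortiOS also has a more detailed listing. A sketch, assuming the phase 1 name is Site1 (the tunnel name that shows up in the debug output later in this post):

diagnose vpn tunnel list name Site1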

The USG side is connected to the Internet via a Cox business-plan cable modem. It isn’t exactly fiber and can be prone to oversubscription (as we saw during the pandemic). However, the Internet connection wasn’t showing any symptoms: long ping cycles to services such as Google DNS (8.8.8.8) and to the public side of the FortiGate showed little, if any, packet loss or errors. They saw around 300 ms ping times with some jitter coming from their network, but we didn’t feel that was out of the ordinary for a cable modem. We even opened a Tier 3 ticket with Cox, and they monitored the traffic for 24 hours without seeing any issues.

We did try some diagnostic tools such as iperf3 and WinMTR, but those didn’t really give us anything useful.
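
For reference, this is roughly what an iperf3 throughput test across the tunnel looks like; a sketch, assuming a host on each side of the tunnel and a placeholder address of 10.11.12.50 for the far-end host:

# on a host behind the USG
iperf3 -s

# from a host behind the FortiGate: a 60-second test with 4 parallel streams
iperf3 -c 10.11.12.50 -t 60 -P 4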

Dead Peer Detection (DPD) was disabled on the IPsec tunnel. DTLS was disabled on the SSL-VPN. And Perfect Forward Secrecy (PFS) was disabled.
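
For reference, a minimal sketch of where those knobs live in the FortiOS CLI. The tunnel name Site1 matches the debug output later in this post, the phase 2 name Site1-p2 is hypothetical, and exact option names can vary between FortiOS versions:

config vpn ipsec phase1-interface
    edit "Site1"
        set dpd disable
    next
end
config vpn ipsec phase2-interface
    edit "Site1-p2"
        set pfs disable
    next
end
config vpn ssl settings
    set dtls-tunnel disable
end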

On the FortiGate, I saw a lot of “phase 1 negotiate error w/ PAYLOAD-MALFORMED” and “phase 2 negotiate error w/ progress IPsec phase 2” errors in the VPN events section of the Fortinet UI.

After turning off DPD and PFS, I realized that had a big impact on the frequency of tunnel reconnections, so I did some more digging.

After disabling PFS, the “PAYLOAD-MALFORMED” errors went away and I started seeing “INVALID-ID-INFORMATION”.

I did some playing around with using IKEv2 for Phase 1, but that did not work, so I reverted back to IKEv1. I also experimented with the AES and SHA1 encryption and hashing options along with the DH groups. Changing these would often get the tunnel working again, but the issues just came back after a while. For now I settled on AES128-SHA1 with DH group 5.
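
As a sketch of where the phase 1 settings landed on the FortiGate side (same naming assumptions as above; syntax may differ slightly between FortiOS versions):

config vpn ipsec phase1-interface
    edit "Site1"
        set ike-version 1
        set proposal aes128-sha1
        set dhgrp 5
    next
end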

I disabled the option allowing the USG’s remote-access VPN to use the site-to-site tunnel; this was causing error noise because the USG VPN IP pool was not set up to be accepted on the FortiGate. I also corrected and narrowed the Phase 1 and Phase 2 proposal (encryption and hashing) options on the FortiGate. The FortiGate had copied over settings from the Cisco 5550, which was fairly broad in the options it allowed, and those extras were just causing error noise and negotiation issues. On the USG these proposals are shared across phase 1 and phase 2, while on the FortiGate they are separate and tucked away under the advanced settings. After doing this, the packet errors went away!
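
A sketch of the narrowed phase 2 proposal and selectors, using the subnets that appear in the debug output further down (the phase 2 name is again hypothetical):

config vpn ipsec phase2-interface
    edit "Site1-p2"
        set phase1name "Site1"
        set proposal aes128-sha1
        set src-subnet 10.13.14.0 255.255.255.0
        set dst-subnet 10.11.12.0 255.255.255.0
    next
end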

At this point I learned how to output IPsec debug logs on the FortiGate via SSH. The first four commands set the debug duration, timestamps, IKE filter, and verbosity, the flush commands force a fresh negotiation, “diagnose debug enable” starts the output, and the last two commands turn debugging back off when you’re done:

diagnose debug duration 480
diagnose debug console timestamp enable
diagnose vpn ike log-filter dst-addr4 1.2.3.4
diagnose debug application ike -1
diagnose vpn ike gateway flush name %Tunnel-Name%
diagnose vpn tunnel stat flush %Tunnel-Name%
diagnose debug enable
diagnose debug disable
diagnose debug reset

I started with -255 verbosity, which only gave me this:

ike 0:site1:3903: nat unavailable
ike 0:site1:3903:Site1-sslvpn:578090: quick-mode negotiation failed due to retry timeout
ike 0:site1:3903:Site1-sslvpn:578119: quick-mode negotiation failed due to retry timeout
ike 0:site1:3903:Site1-sslvpn:578154: quick-mode negotiation failed due to retry timeout

But once I started using -1, it gave me a great deal more useful information. It began showing that phase 2 proposals were not being received, which was a useful clue.

This round, the tunnel started failing at Phase 2 with that error about 60 hours in.

I then started doing some digging on the USG to see what I could find via SSH. VPN logs are stored on the USG at /var/log/charon.log:

tail -f /var/log/charon.log

However, these did not prove to be of much use. I then found that “swanctl” provides a real-time console output of the IPsec connection information. See https://docs.strongswan.org/docs/5.9/swanctl/swanctl.html.

sudo swanctl --log

at which point I saw the following whenever the cycle of constant tunnel reconnections occurred:

invalid HASH_V1 payload length, decryption failed?

It turns out the USG-PRO-4 runs strongSwan version 5.2.2 for IPsec, and the Ubuntu version it runs on is from 2019. From what I’m reading, this is the same server the Cisco ASA ran, and I would assume around the same version. The 5.2.2 release is from 2015, and there have been many gripes about issues in this version range.
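
For reference, the running strongSwan version can be confirmed from the USG shell with the standard strongSwan wrapper:

sudo ipsec version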

After some investigation, I found that people were resolving this error by shortening the pre-shared key and removing special characters from it. That report involved an IPsec connection from a mobile device: Android 7 worked fine, but the error started appearing with Android 9. Then again, it also seemed to show up in earlier minor versions, so I’m not confident this is related.

I’m also seeing suggestions to dumb down the encryption even further from AES128-SHA1 to 3DES-SHA1.

Granted, it takes about 4 days for this issue to arise, so it’s possible they didn’t wait long enough and never followed up in the forum. But simplifying the pre-shared key is worth a shot. It is currently 22 characters of mixed upper/lower case and numerics; I’m going to just 8 upper/lower case characters.

Interestingly enough, I found the config files on the USG that contain the strongSwan IPsec/VPN configs and pre-shared keys.
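
I won’t reproduce them here, but for reference these are the standard strongSwan locations; I’m assuming the USG build uses them, and keep in mind the USG regenerates its config on provisioning, so manual edits likely won’t stick:

sudo cat /etc/ipsec.conf       # connection definitions
sudo cat /etc/ipsec.secrets    # pre-shared keys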

The pre-shared key change didn’t work, although I’m not positive I got things into a state where that mattered. After a number of tries to low-key reset things (config reload, tunnel down/up, etc.), it seemed that only ‘sudo ipsec restart’ (restarting the ipsec/strongSwan service on the USG) took care of it. A provision of the USG didn’t do the trick, and I’m not sure why.

sudo ipsec restart

So if this problem crops up again (in 4 days?), I’ll create a cron job to restart ipsec at 3am each day.
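
A minimal sketch of that cron entry, assuming the USG keeps a standard root crontab (added via ‘sudo crontab -e’) and that the ipsec wrapper lives at /usr/sbin/ipsec; it may not survive a reprovision or firmware update:

# restart strongSwan every day at 3am
0 3 * * * /usr/sbin/ipsec restart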

In conclusion, the major issues and fixes appear to be:

  • A dated version of the USG’s strongSwan server, from 2015, even on the current firmware
    • Potentially reducing the pre-shared key complexity/size (TBD)
  • Removing encryption/hashing proposal options that are not explicitly configured on both ends of the IPsec tunnel
  • Removing the auto-generated tunnel selectors on the USG that have no matching policy on the FortiGate

Here is what you want to see in the diagnostic console output on the FortiGate for a successful phase 2 negotiation:

2023-05-25 17:55:47.927110 ike 0:Site1:5673:873531: peer proposal is: peer:0:10.11.12.0-10.11.12.255:0, me:0:10.13.14.0-10.13.14.255:0
2023-05-25 17:55:47.927138 ike 0:Site1:5673:Site1:873531: trying
2023-05-25 17:55:47.927480 ike 0:Site1:5673:Site1:873531: matched phase2
2023-05-25 17:55:47.927509 ike 0:Site1:5673:Site1:873531: autokey
2023-05-25 17:55:47.927551 ike 0:Site1:5673:Site1:873531: my proposal:
2023-05-25 17:55:47.927575 ike 0:Site1:5673:Site1:873531: proposal id = 1:
2023-05-25 17:55:47.927598 ike 0:Site1:5673:Site1:873531:   protocol id = IPSEC_ESP:
2023-05-25 17:55:47.927621 ike 0:Site1:5673:Site1:873531:      trans_id = ESP_AES_CBC (key_len = 128)
2023-05-25 17:55:47.927645 ike 0:Site1:5673:Site1:873531:      encapsulation = ENCAPSULATION_MODE_TUNNEL
2023-05-25 17:55:47.927668 ike 0:Site1:5673:Site1:873531:         type = AUTH_ALG, val=SHA1
2023-05-25 17:55:47.927696 ike 0:Site1:5673:Site1:873531: incoming proposal:
2023-05-25 17:55:47.927718 ike 0:Site1:5673:Site1:873531: proposal id = 0:
2023-05-25 17:55:47.927741 ike 0:Site1:5673:Site1:873531:   protocol id = IPSEC_ESP:
2023-05-25 17:55:47.927763 ike 0:Site1:5673:Site1:873531:      trans_id = ESP_AES_CBC (key_len = 128)
2023-05-25 17:55:47.927786 ike 0:Site1:5673:Site1:873531:      encapsulation = ENCAPSULATION_MODE_TUNNEL
2023-05-25 17:55:47.927813 ike 0:Site1:5673:Site1:873531:         type = AUTH_ALG, val=SHA1
2023-05-25 17:55:47.927844 ike 0:Site1:5673:Site1:873531: negotiation result
2023-05-25 17:55:47.927867 ike 0:Site1:5673:Site1:873531: proposal id = 0:
2023-05-25 17:55:47.927889 ike 0:Site1:5673:Site1:873531:   protocol id = IPSEC_ESP:
2023-05-25 17:55:47.927912 ike 0:Site1:5673:Site1:873531:      trans_id = ESP_AES_CBC (key_len = 128)
2023-05-25 17:55:47.927935 ike 0:Site1:5673:Site1:873531:      encapsulation = ENCAPSULATION_MODE_TUNNEL
2023-05-25 17:55:47.927957 ike 0:Site1:5673:Site1:873531:         type = AUTH_ALG, val=SHA1
2023-05-25 17:55:47.927979 ike 0:Site1:5673:Site1:873531: using tunnel mode.

When phase 2 expires, you want to see this:

2023-05-25 18:03:36.056951 ike 0:Site1: IPsec SA {id}/{id} hard expired 23 1.2.3.4->5.6.7.8:0 SA count 2 of 4
2023-05-25 18:03:36.057106 ike 0:Site1:5673: send IPsec SA delete, spi {id}
2023-05-25 18:03:36.057224 ike 0:Site1:5673: enc {id}
2023-05-25 18:03:36.057273 ike 0:Site1:5673: out {id}
2023-05-25 18:03:36.057361 ike 0:Site1:5673: sent IKE msg (IPsec SA_DELETE-NOTIFY): 1.2.3.4:500->5.6.7.8:500, len=76, vrf=0, id={id}/{id}:{id}

If all goes well, I’ll look into adding DPD, PFS, and DTLS back in, as well as increasing the encryption, hashing, and DH group levels.

Other UniFi USG Commands for IPSec:

sudo ipsec statusall
sudo ipsec up <connection_name>
sudo ipsec down <connection_name>

Now that I’m aware of the legacy server versions on the USG-PRO-4, the end game is to replace the device, perhaps with an EdgeRouter X. The current firmware is up to date, but I’ve seen Ubiquiti’s lack of updates and support for this product, even though it’s not marked end-of-life.

#80f, #fortigate, #ipsec, #networking, #security-gateway, #tunnel, #unifi, #usg-pro-4