It's been a while since I added anything new here; I've been busy trying to keep my head above water, I guess. In any case, I came across a situation at $dayjob$ where I had to separate two networks that were sharing the same VLAN into two distinct VLANs, since they were actually always supposed to be separate, and are geographically distinct as well. The router configuration was as follows:
```
!
interface Vlan6
 ip address 10.0.11.1 255.255.255.0 secondary
 ip address 10.0.1.250 255.255.255.0
 no ip redirects
 no ip proxy-arp
end
!
```
Networks 10.0.1.0/24 and 10.0.11.0/24 were essentially on a common VLAN.
Anyway, the task seemed simple: leave 10.0.11.1/24 on VLAN6, and create VLAN5 on this router to add a routed interface between 10.0.11.0/24 and 10.0.1.0/24, which was being moved to another device.
```
!
interface Vlan5
 ip address 10.0.249.6 255.255.255.248
end
!
```
Now, traffic from 10.0.1.0/24 would have to transit via 10.0.249.0/29, being routed by two different routers. Hosts in 10.0.1.0/24 would have to go via 10.0.249.1 (which is another interface on the new 10.0.1.1/24 router) and then 10.0.249.6 to reach 10.0.11.0/24. Simple, everyday stuff. Right?
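For completeness, the path above implies a static route on each side of the /29. The actual route statements weren't part of what I pasted, so treat this as an illustrative sketch of what each router would need:

```
! On the new 10.0.1.0/24 router (holds 10.0.249.1):
ip route 10.0.11.0 255.255.255.0 10.0.249.6
!
! On the original VLAN6 router (holds 10.0.249.6):
ip route 10.0.1.0 255.255.255.0 10.0.249.1
```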
Well, for a reason that at first escaped me, the one and only host in 10.0.11.0/24 that communicated with 10.0.1.0/24 could ping every other host in the network *except* the two I needed it to talk to, even though everything had been working perfectly until I added the routed interfaces. I was confounded until I tried a traceroute from one of the hosts in 10.0.1.0/24 back to 10.0.11.220:
```
[root@host1 ~]# traceroute 10.0.11.220
traceroute to 10.0.11.220 (10.0.11.220), 30 hops max, 60 byte packets
 1  (10.0.1.210)  849.491 ms !H  849.454 ms !H  849.426 ms !H
```
Now why would I get a HOST_UNREACHABLE?!? From MYSELF?! (10.0.1.210 is host1.) Here is my routing table:
```
[root@host1 ~]# ip -4 route
default via 10.0.1.251 dev eth0
10.0.1.0/24 dev eth0 proto kernel scope link src 10.0.1.210
```
Seems normal (?)
Other traceroutes towards 10.0.11.0/24 hosts were working:
```
[root@host1 ~]# traceroute 10.0.11.100
traceroute to 10.0.11.100 (10.0.11.100), 30 hops max, 60 byte packets
 1  10.0.1.251 (10.0.1.251)  0.225 ms  0.192 ms  0.180 ms
 2  10.0.249.6 (10.0.249.6)  5.591 ms
 3  10.0.11.100 (10.0.11.100)  6.524 ms
```
My gateway for 10.0.1.0/24 is 10.0.1.251, and the above makes sense. It was only then that I realized the host had cached an ICMP_REDIRECT for 10.0.11.220, which I confirmed with `ip route get`:
```
[root@host1 ~]# ip route get 10.0.11.220
10.0.11.220 via 10.0.1.250 dev eth0 src 10.0.1.210
    cache <redirected>
```
Bingo. A cached ICMP_REDIRECT from 10.0.1.250, which no longer exists. I didn't bother to check at the time how long the timeout is for these cached entries, but it was longer than I would have expected, especially since I troubleshot this for more than 15 minutes (*cough* *cough*).
In any case, I learned a new command to zap these, and thankfully, stuff started working again:
```
[root@host1 ~]# ip route flush cache
```
And with that, my troubles were gone:
```
[root@host1 ~]# ip route get 10.0.11.220
10.0.11.220 via 10.0.1.251 dev eth0 src 10.0.1.210
    cache
```
From my reading of the various Googles, `gc_timeout` is what defines the actual timeout:
```
[root@host1 proc]# cat ./sys/net/ipv4/route/gc_timeout
300
```
I can safely say my troubleshooting went way past this timeout (assuming it is in seconds and not minutes), so there is some wonkiness beyond this that I still need to check. In any case, knowing how to clear the cache is evidently useful as well!
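Relatedly, if you'd rather a host never install these redirected routes in the first place, the kernel exposes standard sysctls for that. A quick sketch (read-only here; writing requires root, and whether you actually want to ignore redirects depends on your network design):

```shell
# 1 = the kernel honors ICMP redirects and caches the redirected route;
# 0 = redirects are ignored entirely
cat /proc/sys/net/ipv4/conf/all/accept_redirects
cat /proc/sys/net/ipv4/conf/default/accept_redirects

# To stop accepting them (as root):
#   sysctl -w net.ipv4.conf.all.accept_redirects=0
#   sysctl -w net.ipv4.conf.default.accept_redirects=0
```

Note that the effective per-interface behavior combines the `all` and per-interface values, so check both if the result surprises you.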