
How My Network Broke Today (Part I of at least a billion)

So today I went to spin up a new VM for development use. It wouldn’t get an IP address. I saw the DHCP request arrive at the DHCP server and saw an offer go out, but the VM never received it. I dug around, and it seemed to be happening on just one VLAN, since everything else was OK.

Did I mention everything else was already running?

Did I mention that if I had a trap collector with an alarm board, I would have known what had happened almost immediately and been able to pinpoint the issue before I even saw the effects?

No? Well, now I have.

Let’s just say that I spent over an hour digging and running tcpdump on various interfaces before I finally hit the switches. I noticed there was only one port in the port channel on the Dell 5224 access switch, when there should have been two going down to the distribution switch. Odd, but I thought inconsequential (at the time).

I got into the Cisco switch and saw MAC flaps (TRAPPABLE) all over the place with Po2. Odd again. The Dell switch must be to blame, so I went back to it and shut the port that wasn’t in the LACP port channel but should have been. Things improved. Have I mentioned that I’d unplugged that fiber a week ago and only recently got a new one to plug back in?

I spent some time trying to get both ports into the port channel, to no avail. I finally looked at the config and noticed the allowed-VLAN config was slightly off (one VLAN was missing from eth 1/23). I shut both ports on the Cisco side, since Dell won’t let you change an interface’s config while it’s part of a port channel and this was just faster, then reset the eth 1/23 config to match eth 1/24, and voila, both ports came up.
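A mismatch like this is easy to miss by eye when the allowed-VLAN lists are long. As a rough illustration, here is a minimal Python sketch that expands two switch-style VLAN lists and reports what each side is missing. The VLAN lists in the example are hypothetical; the actual lists on eth 1/23 and eth 1/24 are not shown in this post.

```python
def parse_vlan_list(spec):
    """Expand a switch-style VLAN list such as '1,10,20-22' into a set of ints."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def allowed_vlan_mismatch(spec_a, spec_b):
    """Return (missing_from_a, missing_from_b) as sorted lists of VLAN IDs."""
    a, b = parse_vlan_list(spec_a), parse_vlan_list(spec_b)
    return sorted(b - a), sorted(a - b)


if __name__ == "__main__":
    # Hypothetical allowed-VLAN lists for the two port-channel members.
    eth_1_23 = "1,10,20-22"
    eth_1_24 = "1,10,20-22,30"
    missing_23, missing_24 = allowed_vlan_mismatch(eth_1_23, eth_1_24)
    print("missing from eth 1/23:", missing_23)
    print("missing from eth 1/24:", missing_24)
```

Both lists coming back empty means the two members agree on their allowed VLANs, which is what the port channel wants before it will bundle them.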

But things were even worse now: barely any MACs showed up from Po2 in ‘show mac address-table’ on my 3550-12, and they were all on VLAN 1. Ugh. I shut the interfaces again and reset some more of the configuration on the Dell switch. I prayed. (I don’t really pray.) I brought the interfaces back up and all was good. The VM got its IP address and everything was right in the world.

I really hate the Dell configurations. To say I hated this switch before would be an understatement, and today it’s only given me more of a reason to want to smash it with a hammer. It’s mainly due to me not being familiar with them, but their configs aren’t as intuitive as I’d like.

VMs, Linux Software Bridges and 802.1q — What I Learned This Time

When initially setting up the box, I had the idea in my head that I might create several bridges, one for each VLAN. That’s probably one of the best ways to tackle the issue, unless you really want the trunk to exist on the VM, which is also fine and valid. But by default a trunk gives every device the option of accessing any VLAN in it, which, since we’re in a lab environment, is not particularly an issue.

Still, I like to work reality into lab mockups as much as possible. I have plenty of NIC ports, so even creating a lot of trunks is not an issue. But our VMs will accept a large number of virtual NICs, so the bridge-per-VLAN option seemed semi-elegant.

The first issue I ran into was crosstalk between the VLANs. I had created a bunch of 802.1q sub-interfaces (which tag outgoing frames and strip the tag from incoming ones) via ‘vconfig’ or ‘ip link’. I attached p32p1.10 to br10 and p32p1.1 to br1, then attached tap0 to br10 and tap1 to br1. Everything appeared to be working on the very initial configuration until I saw the output of ‘sh cdp nei’ on the physical Cisco 3550. It saw itself, which meant its own frames were bouncing back to it. So I loaded up tcpdump, watched the bridge traffic and examined the MACs in the Linux software bridge. There was definitely crosstalk, and after a hunch and a little investigation it turned out that QEMU doesn’t do much to separate NIC traffic when the NICs are declared with the ‘old’ syntax, as I had done. After updating my QEMU launch options the problem disappeared and I was happy… until…
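The layout described above can be sketched as a dry run. The Python below only builds the iproute2 and QEMU argument lists and prints them, so nothing is changed on the host. Two assumptions to flag: the post does not show the exact commands that were run, and it does not say which QEMU options replaced the ‘old’ syntax. The sketch assumes the fix was to pair each tap with its own NIC via ‘-netdev’/‘-device’ rather than legacy ‘-net’ options, which by default drop every NIC and tap into one shared hub. The e1000 NIC model and the net0/net1 IDs are placeholders.

```python
def vlan_bridge_commands(parent, vlan_id, tap):
    """Build iproute2 commands for one VLAN: sub-interface + bridge + tap."""
    sub = f"{parent}.{vlan_id}"
    bridge = f"br{vlan_id}"
    return [
        # 802.1q sub-interface on the physical trunk port
        ["ip", "link", "add", "link", parent, "name", sub,
         "type", "vlan", "id", str(vlan_id)],
        # One Linux bridge per VLAN
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        ["ip", "link", "set", sub, "master", bridge],
        # Tap device that the VM's virtual NIC will attach to
        ["ip", "tuntap", "add", "dev", tap, "mode", "tap"],
        ["ip", "link", "set", tap, "master", bridge],
        # Bring everything up
        ["ip", "link", "set", parent, "up"],
        ["ip", "link", "set", sub, "up"],
        ["ip", "link", "set", tap, "up"],
        ["ip", "link", "set", bridge, "up"],
    ]


def qemu_nic_args(index, tap, model="e1000"):
    """Pair one tap backend with exactly one guest NIC (assumed fix)."""
    netdev_id = f"net{index}"  # placeholder ID
    return [
        "-netdev", f"tap,id={netdev_id},ifname={tap},script=no,downscript=no",
        "-device", f"{model},netdev={netdev_id}",
    ]


if __name__ == "__main__":
    # Interface and bridge names follow the post: p32p1.10 -> br10, p32p1.1 -> br1.
    for vlan_id, tap in ((10, "tap0"), (1, "tap1")):
        for cmd in vlan_bridge_commands("p32p1", vlan_id, tap):
            print(" ".join(cmd))
    qemu_args = []
    for index, tap in enumerate(("tap0", "tap1")):
        qemu_args += qemu_nic_args(index, tap)
    print(" ".join(qemu_args))
```

With one ‘-netdev’ per ‘-device’, a frame arriving on tap0 can only reach the guest NIC bound to net0, so the two VLANs no longer share a path inside QEMU.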

Neither OSPF nor EIGRP was forming neighborships. I loaded up tcpdump again and examined the traffic. I saw the packets hitting br1 and br10 from XRv, and from both the 3550 and the 2821 that I’m currently configuring. That looked good, but ‘debug ospf packet’ on XRv was not giving me anything aside from what it was sending out. So I aimed tcpdump at the tap interfaces instead, and saw that they were not receiving the HELLO packets on either VLAN. (Hint: here is where I went wrong in my diagnostic chase. I had filtered tcpdump down to only EIGRP/OSPF; had I not, the problem would’ve been almost immediately evident.)
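One way to keep a capture focused without hiding the evidence is to filter on the routing protocols and ICMP together. The sketch below builds such a tcpdump invocation as a dry run. The exact filter used in the post is not shown, so this is only an assumed equivalent: EIGRP is IP protocol 88, OSPF is IP protocol 89, and ‘icmp’ lets any unreachable messages show up alongside them.

```python
def routing_capture_filter(include_icmp=True):
    """Build a pcap filter for EIGRP (88) and OSPF (89), optionally with ICMP."""
    terms = ["ip proto 88", "ip proto 89"]
    if include_icmp:
        # ICMP errors are often the first visible sign of a firewall reject.
        terms.append("icmp")
    return " or ".join(terms)


def tcpdump_command(interface, capture_filter):
    """Return a tcpdump argument list: no name resolution, show link-level headers."""
    return ["tcpdump", "-n", "-e", "-i", interface, capture_filter]


if __name__ == "__main__":
    # tap0 is one of the tap interfaces named in the post.
    print(" ".join(tcpdump_command("tap0", routing_capture_filter())))
```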

Thinking, for some reason with no basis in fact, that it might be a multicast issue with Linux software bridges, I decided to configure neighbors manually. That didn’t help either: XRv still formed no neighborship with any other box. Other traffic (ICMP/TCP/UDP) was unaffected, which I thought was interesting. I started watching the interface again, this time with no filter, and I saw the VM host replying to XRv with an ICMP Unreachable. Pretty clearly a firewall rule problem. Ebtables (iptables for layer 2 stuff, if you haven’t seen it) was clear, and I didn’t see anything immediately in iptables. But it’s always faster to test fixes than to examine things (and in a lab, perfectly acceptable), so I ran a simple ‘iptables -I FORWARD -j ACCEPT’ and removed the manual neighborships so EIGRP and OSPF would both go back to using multicast. Everything worked. Great! Classic implicit deny caught me.
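Outside a lab, a blanket ACCEPT on FORWARD is probably wider than you want. The sketch below shows a narrower alternative as a dry run. It rests on an assumption the post does not confirm: that bridged frames were being passed through the iptables FORWARD chain because bridge netfilter was enabled (the ‘net.bridge.bridge-nf-call-iptables’ sysctl), where a default reject rule answered with the ICMP Unreachable. Under that assumption, accepting only bridged EIGRP (protocol 88) and OSPF (protocol 89) would be enough for the neighborships.

```python
def bridge_nf_call_iptables(path="/proc/sys/net/bridge/bridge-nf-call-iptables"):
    """Return 1 or 0 if the sysctl is readable, or None if it is not present."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        # Missing file usually means the bridge netfilter module is not loaded.
        return None


def forward_accept_rules(protocols=(88, 89)):
    """Build iptables commands that accept only bridged routing-protocol traffic."""
    return [
        ["iptables", "-I", "FORWARD",
         "-m", "physdev", "--physdev-is-bridged",
         "-p", str(proto), "-j", "ACCEPT"]
        for proto in protocols
    ]


if __name__ == "__main__":
    print("bridge-nf-call-iptables:", bridge_nf_call_iptables())
    for rule in forward_accept_rules():
        print(" ".join(rule))
```

If the sysctl reads 0 or is absent, bridged frames should not be hitting iptables at all, and the cause is likely elsewhere.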

This is where I usually get annoyed by pre-configured rules. Usually I load up Slackware, but I’ve been using Fedora lately for ease of getting some things up and running with real dependency management. Had I been using Slackware, with its default of no rules, everything would’ve been hunky-dory, and I would’ve configured some myself when I felt the time was right.

To continue: I tried to make a new bridge, trunk0, with p32p2 in it. I loaded up tcpdump and noticed that there was no traffic aside from STP on some VLANs that aren’t in active use. Apparently configuring those sub-interfaces whisks the frames away from the main interface, so I just deleted all of them off of p32p1, configured another trunk port and added that to trunk0 on the Linux box, and voila! Tagged packets from everywhere! I have yet to try a tap interface into trunk0 and a VM, but I have a feeling everything should be all right. Then again, that feeling usually shows up right when things are about to go terribly wrong.
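The trunk-bridge steps can be sketched the same way, as a dry run that only prints the iproute2 commands. The interface roles are taken literally from the text (sub-interfaces removed from p32p1, p32p2 placed in trunk0). The list of sub-interfaces to delete is an assumption based on the two named earlier, since the full set is not listed.

```python
def trunk_bridge_commands(trunk_port, bridge, stale_subinterfaces):
    """Build commands that remove VLAN sub-interfaces and bridge a whole trunk."""
    # Sub-interfaces claim their VLAN's tagged frames, so remove them first.
    commands = [["ip", "link", "delete", sub] for sub in stale_subinterfaces]
    commands += [
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        # The untouched trunk port goes into the bridge, tags and all.
        ["ip", "link", "set", trunk_port, "master", bridge],
        ["ip", "link", "set", trunk_port, "up"],
        ["ip", "link", "set", bridge, "up"],
    ]
    return commands


if __name__ == "__main__":
    # p32p1.1 and p32p1.10 are the sub-interfaces named earlier in the post.
    for cmd in trunk_bridge_commands("p32p2", "trunk0", ["p32p1.1", "p32p1.10"]):
        print(" ".join(cmd))
```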

© 2017 Musings
