Some time ago, I posted on the configuration of an IRF (Intelligent Resilient Framework) fabric with the HPE FlexFabric 5700 datacenter switches. During operation, a few things came to my attention that either have been corrected since or were not necessarily clear in the first place.
1.) Activate MAD
There needs to be a mechanism to detect multiple active members. This can be considered something like a quorum for the switch cluster, intended to prevent split-brain situations. In my case I preferred to do so with LACP. This brings MAD (Multi-Active Detection) down to layer two and is rather simple. It should be configured on an appropriate Bridge Aggregation Group, resulting in a configuration like:
interface Bridge-Aggregation1
description DOWNLINK_SOMESTRANGESWITCH_WITH_MAD_ENABLED_ON_THE_SAME_LAG
port link-type trunk
port trunk permit vlan all
link-aggregation mode dynamic
mad enable
Specifically, pay attention to the last command:
mad enable
It is important to activate MAD on both sides of the switch-to-switch connection and to spread the LAG across more than one cluster member. With that in place, this method is easy to use and detects a split quickly.
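Following that note, the other side of the link gets the mirror-image configuration. A sketch only – the aggregation number and description are assumptions:

interface Bridge-Aggregation1
description UPLINK_TO_COREIRF_WITH_MAD
port link-type trunk
port trunk permit vlan all
link-aggregation mode dynamic
mad enable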
Alternatively, there is a Layer 3 method available – so-called BFD-MAD, referring to Bidirectional Forwarding Detection. I decided against this option, since it often conflicts with STP and the resulting dependencies are hard to cover. I got the impression it is error-prone in a network whose topology changes continuously.
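For completeness, a BFD-MAD setup on Comware roughly follows this pattern – a sketch only, since I did not run it in production; the VLAN number and addresses are assumptions:

vlan 999
interface Vlan-interface999
mad bfd enable
mad ip address 192.168.200.1 255.255.255.0 member 1
mad ip address 192.168.200.2 255.255.255.0 member 2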
2.) Two Node Cluster Topology
It seems I was unclear on this one previously, as I got some feedback here. To be 100% precise: in contrast to any other IRF configuration, a two-node IRF is no ring. In my experience it cannot be, although the configuration seems to work at first.
Even worse: the configuration needs to connect IRF port 1/1 on member 1 to IRF port 2/2 on member 2. No other combination works reliably, although the configuration might be accepted. After the appropriate configuration, the IRF link status should look similar to this:
[COREIRF]display irf link
Member 1
 IRF Port   Interface                       Status
 1          Ten-GigabitEthernet1/0/1        UP
            Ten-GigabitEthernet1/0/2        UP
            Ten-GigabitEthernet1/0/3        UP
            Ten-GigabitEthernet1/0/4        UP
 2          disable                         --
Member 2
 IRF Port   Interface                       Status
 1          disable                         --
 2          Ten-GigabitEthernet2/0/1        UP
            Ten-GigabitEthernet2/0/2        UP
            Ten-GigabitEthernet2/0/3        UP
            Ten-GigabitEthernet2/0/4        UP
and the according topology shows:
[COREIRF]display irf topology
                          Topology Info
 -------------------------------------------------------------------------
              IRF-Port1                IRF-Port2
 MemberID    Link       neighbor    Link       neighbor    Belong To
 2           DIS        ---         UP         1           00e0-fc0f-8c02
 1           UP         2           DIS        ---         00e0-fc0f-8c02
You may notice that the corresponding ports, IRF port 1/2 and IRF port 2/1, are shown as disable or DIS.
This is definitely the working setup, although with four links per member one might expect to conveniently configure a ring with redundant links.
So far, these are my two major points to follow up on FF5700 IRF configurations.
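For reference, the port binding behind this link status roughly corresponds to the following Comware configuration – a sketch assuming four 10 GbE links per member, with the interface names taken from the output above:

irf-port 1/1
port group interface Ten-GigabitEthernet1/0/1
port group interface Ten-GigabitEthernet1/0/2
port group interface Ten-GigabitEthernet1/0/3
port group interface Ten-GigabitEthernet1/0/4
irf-port 2/2
port group interface Ten-GigabitEthernet2/0/1
port group interface Ten-GigabitEthernet2/0/2
port group interface Ten-GigabitEthernet2/0/3
port group interface Ten-GigabitEthernet2/0/4
irf-port-configuration active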
K.y.p. Frank
Hi,
thanks for the clarification. I just tried setting up an IRF stack with two FF5700s and somehow I am still struggling. As described here, I only have 10 GbE interfaces. First I tried to use just two links for IRF, which could not be activated. It seems that with 10 GbE interfaces, four links have to be configured for irf-port 1/1 on member 1 and four for irf-port 2/2 on member 2.
IRF is now configured, but as soon as I enable those ports, the switch reboots. After the reboot, the ports are again in ADM state and disabled. Neither switch displays the other member in the IRF topology. Is there something I am missing here?
How do I need to connect those ports? I tried 1/0/37 to 2/0/40, 1/0/38 to 2/0/39, and so on. Is there any rule?
I have a German thread running:
https://administrator.de/forum/hpe-comware-irf-stack-676578.html
(feel free to remove link if forbidden)
Hi,
First – that was 7 years ago, and the firmware is accordingly old and may behave differently. Unfortunately, I retired these switches a while ago and cannot double-check what might be working today.
As I reviewed on administrator.de, you do have IRF port 1/1 on member 1 and IRF port 2/2 on member 2. I remember that this had to be exactly right and that the systems reacted pickily if connected differently (1/2 to 2/1 should work, but neither 1/1 to 2/1 nor 1/2 to 2/2 does). It also looks like member renumbering and priorities are set properly.
After the reboot, definitely check that the member renumbering was persistent. Otherwise this will force the fallback you describe.
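A sketch of what I mean by renumbering and saving – the member numbers and priority value are examples, not your exact setup:

irf member 1 renumber 2
irf member 1 priority 32
save

The renumbering only takes effect after a reboot, and without a save the members fall back to their previous numbers.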
What I cannot remember is whether it would work with fewer than four links – but from everything I remember about Comware, I see no reason why this should be a hard requirement.
What I did not see on administrator.de is whether you set up MAD (multi-active detection). This is meant to prevent a split brain. The implementation in 2018 was quite rustic; nevertheless it is absolutely necessary, and skipping it can result in the behavior you describe.
You may want to research whether there is a more modern MAD implementation in newer firmware versions. I would hope for an LLDP-MAD implementation, as they are now used on the Aruba side of the world (and probably by many other brands too).
I have redone it all from scratch and now it is working (four links). My current theory is that I missed saving the config at some point after enabling the ports and connecting the cables. But it may also be something different.
The firmware was updated to the latest version on both devices before I even started with stacking.
I took a brief look at MAD. I'm somehow not convinced it is really needed. The stack consists of only two members in a small business setup. From what I understand, the stack should be able to recover from a stack disconnect/reconnect without active MAD due to the priority settings, shouldn't it?
Thanks for the quick response, and yes, networking equipment will survive for quite some time longer 🙂
The conditions that make meaningful use of MAD are hard to trigger and even harder to test. But I have seen split-brain situations with Comware-based switches out in the wild which MAD should have helped to prevent.
For this to be relevant, though, you need multi-chassis link aggregation and spanning tree or similar topologies in place. And maybe simpler topologies make more sense: since back then I have tended to simplify our network topologies, and I only run stacking in the VSFs on Aruba 2930F switches.