Juniper SRX Chassis Cluster RG0 Nagios Check

I needed to check (as this customer did not have a trap collector) which node was active for redundancy group 0 on an SRX cluster, so I thought I would check for an SNMP OID that is only presented by the active RG0 node. This script uses snmpwalk and is configured to use SNMP v2c (this can be easily changed). It has been tested on:

  • CentOS 5
  • Junos 11.4R2
  • SNMP v2c
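
You can test the idea by hand before wiring it into Nagios (the host name and community string below are placeholders):

[bash]
snmpwalk -v 2c -c public srx-node0 1.3.6.1.4.1.2636.3.1.14.1.7
[/bash]

The active RG0 node returns jnxRedundancyState rows (the script keys on "INTEGER: 2"); the inactive node answers "No Such Object available on this agent at this OID".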

Here is the little hacky shell script:

[bash]
#!/bin/bash

# Cooper Lees <me@cooperlees.com>
# Dirty Cluster RG0 checker
# Last Updated: 20120818

HOST=$1
COMMUNITY=$2

if [ -z "$HOST" ] || [ -z "$COMMUNITY" ]; then
    echo "ERROR: No host or SNMP community specified"
    exit 2
fi

# jnxRedundancyState - only answered by the active RG0 node
SNMPOUTPUT=$(snmpwalk -v 2c -c "$COMMUNITY" "$HOST" 1.3.6.1.4.1.2636.3.1.14.1.7)

if echo "$SNMPOUTPUT" | grep -q "INTEGER: 2"; then
    echo "Host $HOST is the Chassis cluster ACTIVE RE"
    exit 0
fi

if echo "$SNMPOUTPUT" | grep -q "No Such Object available on this agent at this OID"; then
    echo "Host $HOST is the INACTIVE RE"
    exit 2
fi

echo "WTF – Something is not right …"
exit 3
[/bash]

It checks for the “jnxRedundancyState” OID – this OID reports on RE states and is only accurate on Junos routers (e.g. the M and MX series); on a branch SRX cluster it is only presented by the active RG0 node, which is what this check relies on.
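
To wire this into Nagios, a command and service definition along these lines should work (a sketch – the script path, host name and community value are assumptions for illustration):

[plain]
define command{
        command_name    check_srx_rg0
        command_line    $USER1$/check_srx_rg0.sh $HOSTADDRESS$ $ARG1$
        }

define service{
        use                     generic-service
        host_name               srx-node0
        service_description     SRX Cluster RG0 Active
        check_command           check_srx_rg0!public
        }
[/plain]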

Enjoy …

VMware Guest Consoles over a WAN with Latency

Have you ever used the VMware console over a WAN with latency, where it enters multiple keystrokes into the console and makes using it super annoying? It makes me HATE VMware and want to smash it into 10000 pieces with a baseball bat.

Well the answer is to add a line to your VM's VMX file to allow it to be ‘laggier’. The value is in microseconds; for example, the following will give you two seconds between repeated keystrokes:

  • keyboard.typematicMinDelay = "2000000"

For more information: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=196

SRX Branch Chassis Cluster Ports

Here is a table of the ports used for the chassis cluster control link and management on branch SRX devices.

The quoted ports are the ‘standalone’ non-clustered port names (not node1’s port names once clustered). In an SRX cluster the PIM slots on node1 start at the last PIM slot of node0 + 1. For example, a SRX240 cluster’s node1 starts at PIM 5, so its control link port is effectively ge-5/0/1.

Model    fxp0 (Management)    fxp1 (Control Link)
SRX100   fe-0/0/6             fe-0/0/7
SRX210   fe-0/0/6             fe-0/0/7
SRX220   ge-0/0/6 (> 11.0)    ge-0/0/7
SRX240   ge-0/0/0             ge-0/0/1
SRX550   ge-0/0/0             ge-0/0/1
SRX650   ge-0/0/0             ge-0/0/1

fab0 and fab1 interfaces (data link) are always configurable, e.g.:

  • set interfaces fab0 fabric-options member-interfaces ge-0/0/2
  • set interfaces fab1 fabric-options member-interfaces ge-5/0/2
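
For completeness, clustering itself is enabled from operational mode on each node before any of the above port mappings apply (a quick sketch – cluster-id 1 is an arbitrary example):

[plain]
# On node0:
set chassis cluster cluster-id 1 node 0 reboot
# On node1:
set chassis cluster cluster-id 1 node 1 reboot
[/plain]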

Back up your Junos configs TODAY!

Cooper’s tip of the moment: ALWAYS back up your Junos configurations. I hate when a customer does not – your router does not have RAID (unless it has redundant REs, a VC, or is in a chassis cluster :)). It’s a built-in feature of Junos, so use it! It even allows multiple sites, so if you have a DR site with storage – push it there too!

Here is the conf:

[plain]
set system archival configuration transfer-on-commit
set system archival configuration archive-sites "scp://junos@x.x.x.x/data/configs/DEVICE" password "bla"
set system archival configuration archive-sites "scp://junos@y.y.y.y/data/configs/DEVICE" password "bla"
[/plain]
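
If you would rather push on a timer instead of on every commit, there is also an interval-based transfer knob (value in minutes – 1440 below is daily):

[plain]
set system archival configuration transfer-interval 1440
[/plain]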

More info: http://www.juniper.net/techpubs/en_US/junos9.5/information-products/topic-collections/swconfig-system-basics/junos-software-system-management-router-configuration-archiving.html

QFabric Part 2 – Let’s Get Down and Dirty Deploying and Configuring …

Juniper is selling QFabric as a bundle. Due to this, the install has been templated and will be similar for each target environment in regards to the control plane and getting the fabric up and ready to be configured. Every QFabric bundle today must include Juniper Professional Services. Hopefully in the future I (and other partner engineers) will be seen as smart enough to do a QFabric install without Juniper’s assistance – I think I could manage it :). Here is the procedure that you and your friendly Juniper Professional Services engineer will complete to get QFabric up and running.
Building the EX4200 VCs for Control Plane
Juniper have extensive documentation with example configurations for the QFabric control plane components. Please read Juniper’s article “Configuring the Virtual Chassis for the QFabric Switch Control Plane” for instructions on building the control plane infrastructure. I will not go into control plane specifics in this blog post.
Initial QFabric Components Deployment
These are my recollections and notes, taken during a demonstration and explanation by Juniper’s currently most experienced QFabric installer in APAC, of the basic process of getting your QFabric up and running.
  1. Check the BOM against what’s been received, then power on all equipment and test for hardware issues
    1. Also ensure the directors and interconnects are on the same version (this should not be a problem yet, but as ‘newer’ builds come, old stock might pop up)
  2. Build and power on the EX4200 VCs for the control plane
    1. I would recommend upgrading to the JTAC recommended version of Junos on your 4200s
  3. Patch the directors into the control plane VCs and boot the desired ‘master director’
  4. Complete the console initialisation, then after ~60 seconds boot the slave and complete its initial configuration
  5. Patch the directors into the correct control ports and boot
  6. Turn each node into ‘fabric mode’ (see the sketch after this list)
  7. Patch into each interconnect and boot each node
    1. The directors will adjust the version of Junos if required on the QFX3500 nodes
  8. You now have a functional QFabric and can begin to alias nodes and add them to network/server groups
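
For step 6, the mode change is done from the QFX3500’s operational mode – from memory it is the following (verify against the documentation for your release):

[plain]
request chassis device-mode node-device
[/plain]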

Configuration (all centrally from the Director)

To build a new fabric you need to:

Create aliases for nodes:

  • set fabric aliases node-device SERIAL ALIAS_NAME

Create node groups – always have 1 network domain group and 1 server group (max 2 nodes per server group, making it a redundant server group):

  • set fabric resources node-group NW-NG-0 network-domain
  • set fabric resources node-group NW-NG-0 node-device ALIAS_NAME_X
  • set fabric resources node-group PRON-NG node-device PRON_SW1

Further configuration is ‘like’ a normal EX style configuration, but using the new interface names, for example:

  • Interface: NODE_ALIAS:xe-0/0/1.0
  • Aggregated interface: NODE_GROUP:ae0.0
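
Putting that together, an access port on the aliased node above might be configured like this (a sketch – the port and VLAN name are made up):

[plain]
set vlans pr0nNet vlan-id 69
set interfaces PRON_SW1:xe-0/0/1 unit 0 family ethernet-switching port-mode access
set interfaces PRON_SW1:xe-0/0/1 unit 0 family ethernet-switching vlan members pr0nNet
[/plain]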
Handy Debug Commands
  • show fabric administration inventory director-group status all
    • See the directors status and who is master
  • show fabric administration inventory [terse]
    • Shows all the hardware the directors have found and are including in the QFabric
  • show chassis fabric connectivity
    • Shows the connectivity through the interconnects to each node
  • show fabric aliases
    • See the serial to alias mappings
  • show fabric inventory
Commands for checking VLANs, the ethernet-switching table etc. are all identical to the Juniper EX switch family.
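
For example, the familiar EX commands work unchanged from the director CLI:

[plain]
show vlans
show ethernet-switching table
[/plain]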
Power On Sequence
  1. EX4200 Control Plane VCs
  2. QFabric Interconnects
  3. Director Master
    1. Election of the master is based on uptime. Wait ~60 seconds before booting the secondary director node
  4. Nodes
    1. I have not tested this, but I would power on the network group first, with the members I would prefer to be the masters of the ‘VC’ first (remember each group with multiple members is an incarnation of a VC – the same rules apply)
Extra Functions

Node Replacement

Replacing a node and keeping the configuration is EXTREMELY easy due to the ‘replace pattern’ feature of Junos:

  • Repatch cables
  • replace pattern OLD_SERIAL with NEW_SERIAL
  • commit
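
In configuration mode that looks like this (serial numbers here are made up):

[plain]
configure
replace pattern P6969-C with P7171-D
commit
[/plain]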

QFabric Part 1 – Explained and Explored First Hand

I was lucky enough to be one of the first APAC partner engineers to get my hands on Juniper’s new QFabric gigantic scalable switch technology – I have even beaten some of Juniper’s own SEs to it. In general, it rocks, but it is missing some features and fine tuning; this will come. This post is an introduction to QFabric, with my likes, dislikes and feature wish-list.

I would like to thank Juniper APAC and Yeu Kuang Bin for this excellent opportunity and the very knowledgeable training.

Cooper with a working QFabric

What is QFabric?

The simplest explanation of QFabric I can give is that it is basically a Juniper EX Virtual Chassis on steroids. The internal workings of the switch have been broken apart to be MUCH MORE scalable, and Juniper have ensured that there are no single points of failure by only selling the design with fully redundant components.

The QFabric components are:

  • Director Group – 2 x QFX3100 (Control Plane)
  • Interconnects – 2 x QFX3008-I (Backplane / Fabric)
    • 2 REs per Interconnect
  • Nodes (Data Plane)
    • Server Groups – 1–2 nodes per group

The 40GE links use DAC cables (1 m, 3 m and 5 m lengths) or QSFP+ (quad small form-factor pluggable plus) optics – 40 gig uses an MTP connector.

QFabric Node Discovery

Control Plane

The control plane is discovered automatically: it depends on the control plane being configured with a pre-defined Juniper configuration so that the nodes are discovered via a pre-defined method when you turn a QFX3500 into fabric mode.

Data/Fabric Plane

The fabric plane is what makes QFabric as scalable as it is. Once again a pre-defined HA design is supplied, and the directors perform the following tasks:

  1. Discover, build & maintain the topology of the fabric
  2. Assemble the entire topology
  3. Propagate path information to all entities
NOTE: Interconnects DO NOT interconnect to each other
Node Aliasing
Node aliasing allows administrators to give nodes a meaningful name, and is used when talking about specific interfaces for specific nodes or node groups.
  • ID the nodes via beaconing (the LCD screen) or the serial number on the chassis.
  • e.g. set fabric aliases node-device P6969-C NODE-0
    • This name is used to reference ports and assign the node to a group (discussed next)
Logical Node Groups
Node groups are used to allow the infrastructure to be divided up and to let the director know what type of configuration to push to a node’s routing engine. The local routing engine still performs some tasks, predominantly to allow scale. A group can contain a maximum of 2 nodes; a group with 2 nodes is known as a redundant server group (it is a 2 node virtual chassis under the covers). Due to this, a redundant server group can have multi-chassis ae (aggregated ethernet) interfaces. There is one other type of group, known as the network node group. This group looks after all routing and L2 loop information, such as OSPF and spanning tree. All VLAN routing etc. is done by these nodes.
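For example, a redundant server group is simply two aliased nodes placed in the same group (using the syntax covered in Part 2 – the names are illustrative):

[plain]
set fabric resources node-group RACK-42-1 node-device NODE-0
set fabric resources node-group RACK-42-1 node-device NODE-1
[/plain]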
Group Summary
  1. Network Node Group (1 per QFabric – Max 8 nodes)
  2. Server Group (Redundant Server Group optional – 2 nodes)
    1. QFabric automatically creates a redundant server group if two nodes exist in a server group (via a form of virtual chassis).
Port Referencing
Now, because each node has an ‘alias’ (discussed above), to reference a port in configuration you use:
  • NODE_ALIAS:INT_TYPE-x/x/x.x
  • e.g. NODE-0:xe-0/0/1.0

Aggregated interfaces can be deployed – Across chassis in a redundant server group or on one chassis in a server group:

  • GROUP_NAME:ae0.0
  • e.g. RACK-42-1:ae0.0
QFabric can also function with ports in FC and FCoE mode. There are some limitations to this feature today, but it can provide an excellent mechanism to create redundant paths back through the fabric to the FC based SAN network. This will be discussed in a dedicated post in my QFabric series.
Summary
QFabric, for a data center, is ready today and works extremely well. It can provide a HUGE number of 10 Gb (and soon to be 40 Gb) ports to move huge amounts of data around a DC at low latency. It is also effectively one single point of management for all your nodes – unless something goes wrong, of course. For a campus, with end users, QFabric does not have many of the key features that we use today with either the MX or EX range. It could be used for large campuses as the aggregation or core (especially when more IPv4 and IPv6 routing is supported) and feed 10 Gb out to EX switches to provide the ‘edge’. The coming ‘micro’ fabric is also interesting, which will allow for a more compelling footprint within a smaller data center.
Key Likes
  • Single switch in regards to management and functionality
    • No TRILL or other L2 bridging redundancy protocols required
  • Ultra redundant design – Enforced by Juniper
    • No half way deployment – people can’t go in half-assed!
  • The simple well thought out HA deployment/design – Common install = easier to debug for JTAC / Engineers like myself
  • Scalability – Can see how big DCs could benefit from having 1 gigantic switch
  • Road map looks good – Key features and hardware are coming
Key Dislikes
  • AFL (Advanced Feature License) required for IPv6 (when it arrives)
    • PLEASE Juniper – Can we have IPv6 for free or I will never get customers to deploy it
    • This really frustrates me … You may be able to tell 🙂
  • Limitation of 1 unit per interface
    • No VLAN tagging with multiple units in network groups
    • Can be worked around by turning the port into a trunk and assigning multiple L3 interfaces (see the sketch after this list)
  • The need for legacy SAN infrastructure in order to use FC/FCoE (discussed in part 3)
  • No ability to have a full 48 copper SFP 1 Gb interfaces in a node for legacy non-10-gig equipment
    • The QFX3500 physically cannot fit SFPs in both the top and bottom rows
    • This could be handy to keep legacy equipment and, as it is replaced, change the SFP to a 10 Gb SFP+
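
The trunk workaround mentioned above would look something like this (EX-style syntax, which QFabric mirrors – VLAN names, IDs and the port are made up):

[plain]
set interfaces NODE-0:xe-0/0/2 unit 0 family ethernet-switching port-mode trunk
set interfaces NODE-0:xe-0/0/2 unit 0 family ethernet-switching vlan members [ blue green ]
set vlans blue vlan-id 100
set vlans blue l3-interface vlan.100
set vlans green vlan-id 200
set vlans green l3-interface vlan.200
[/plain]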
Wish List
  • The Micro Fabric – will allow more use cases
  • Full SNMP interface statistics for all nodes through the director
    • Currently testing this with Zenoss in the Juniper Lab – Has not worked so far
    • The ability to monitor nodes’ REs, PSUs etc. would also be a plus (I have not tested / read the MIBs yet – so it could be possible)
  • The ability to downgrade and perform a system-wide request system rollback from the director
  • Full Q-in-Q Support
  • Fully self contained FC/FCoE support
To Come in this series:
Part 2 – Deploying and Configuring
Part 3 – FCoE and SAN with QFabric
Part 4 – QFabric errata (possibly – not sure yet …)

Please note: the information presented here is my own point of view. It is in no way associated with the firm beliefs of Juniper Networks (TM) or ICT Networks (TM).

Junos Aggregated Ethernet w/LACP and Cisco Nexus Virtual Port Channel

So when I was googling around looking for working configurations of Junos (EX in this case) AE working with a Cisco vPC (Virtual Port Channel), I could not find any examples … so I thought I would post one. I will not be covering how to set up a vPC – if you’re interested in that side, visit Cisco’s guide here. I will also not discuss how to configure a Juniper Virtual Chassis (more info here). The devices used in this example are 2 x Cisco Nexus 7000s (running NX-OS 4) and 2 x Juniper EX4500 switches (running Junos 11.4R1) in a mixed mode virtual chassis with 2 x EX4200s.

The goal, as network engineers, is to use all bandwidth when it’s available (if feasible) and avoid legacy protocols that stop layer 2 loops, such as Spanning Tree. vPC from Cisco and VC technologies from Juniper allow LACP (Link Aggregation Control Protocol) links to span physical chassis, allowing the network engineer to avoid single points of failure and harness all available bandwidth. If a physical chassis was lost, you would still be operating in a degraded fashion, e.g. with half the available bandwidth, until the second chassis returned.

To configure the Cisco Nexus side you require the following configuration on each vPC configured chassis. I found that VLAN pruning can be happily done, and a native VLAN 1 is not needed if CDP is not mandatory (I did not test making CDP traverse the trunk through the Juniper – would love to hear if someone does!).

[plain]
conf t

interface port-channel69
description Good practice
switchport mode trunk
vpc 69
mtu 9216
switchport trunk allowed vlan 69

interface Ethernetx/x
channel-group 69 mode active
[/plain]

Handy Cisco Debug Commands:

  • show vpc
  • show run interface port-channel69 member
  • show vpc consistency-parameters int port-channel 69
  • show port-channel summary

The Juniper side only requires the following. This configuration is identical (you just choose different member interfaces) even if you don’t have a Virtual Chassis configuration.

[plain]
set interfaces xe-0/0/39 ether-options 802.3ad ae0
set interfaces xe-1/0/39 ether-options 802.3ad ae0
set interfaces ae0 description "Good Practice"
set interfaces ae0 mtu 9216
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family ethernet-switching port-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members pr0nNet

set vlans pr0nNet vlan-id 69
set vlans pr0nNet l3-interface vlan.69 #If a L3 RVI is required
[/plain]
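
If you do want the L3 RVI, the vlan.69 interface referenced above also needs to be defined – something like this (the address is a placeholder):

[plain]
set interfaces vlan unit 69 family inet address 192.0.2.1/24
[/plain]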

Handy Juniper Debug Commands:

  • show interface terse ae0
  • show lacp interfaces (you want your interfaces to be collecting and distributing)
  • show interface ae0 extensive

Please let me know if I have done anything that is not optimal – always eager to learn, I am definitely not (and proud of it) a Cisco expert.