Netflow can be a handy tool to use on the network when you need to see information such as total number of flows/sec, bps per source/destination pair, and top talkers. It can also be used by service providers for usage based billing or to help mitigate DDoS attacks. Here I will show you how to configure netflow on Cisco Nexus 7k and Catalyst 6500 based switches for both IPv4 and IPv6.
There are two main versions of netflow that I will talk about, along with their differences: V5 and V9. V5 has been around for a while and only supports IPv4. It is also a fixed format, in that the info you can get from a V5 netflow record is all that you will ever get. There is no expandability built into V5 to allow it to support future data types.
Netflow V9 on the other hand supports both IPv4 and IPv6. It is also template based. This means you can customize the data being sent to your collectors. You can pick and choose which data fields you want to see. V9 is also extensible in that new data types can be added to the template as required in the future.
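To make "template based" a little more concrete, here is a minimal Python sketch of how a collector might read the template definitions out of a V9 export packet. It assumes the header and template FlowSet layout from RFC 3954, knows only a handful of field type numbers, and does no real error handling. The function name parse_templates is my own and not part of any library.

```python
import struct

# A few NetFlow v9 field type numbers from RFC 3954 (not the full list).
FIELD_NAMES = {
    1: "IN_BYTES",
    2: "IN_PKTS",
    4: "PROTOCOL",
    8: "IPV4_SRC_ADDR",
    12: "IPV4_DST_ADDR",
    27: "IPV6_SRC_ADDR",
    28: "IPV6_DST_ADDR",
}

# version, count, sysUptime, unix_secs, sequence, source_id (20 bytes)
HEADER = struct.Struct("!HHIIII")


def parse_templates(packet: bytes) -> dict:
    """Return {template_id: [(field_name, length), ...]} for one export packet."""
    version = HEADER.unpack_from(packet, 0)[0]
    if version != 9:
        raise ValueError(f"not a NetFlow v9 packet (version {version})")

    templates = {}
    offset = HEADER.size
    while offset + 4 <= len(packet):
        flowset_id, flowset_len = struct.unpack_from("!HH", packet, offset)
        if flowset_len < 4:
            break  # malformed length; stop rather than loop forever
        if flowset_id == 0:  # FlowSet ID 0 carries template definitions
            pos, end = offset + 4, offset + flowset_len
            while pos + 4 <= end:
                template_id, field_count = struct.unpack_from("!HH", packet, pos)
                if template_id < 256:
                    break  # template IDs start at 256; anything lower is padding
                pos += 4
                fields = []
                for _ in range(field_count):
                    ftype, flen = struct.unpack_from("!HH", packet, pos)
                    fields.append((FIELD_NAMES.get(ftype, f"type_{ftype}"), flen))
                    pos += 4
                templates[template_id] = fields
        offset += flowset_len
    return templates


# Hand-built example packet: one template (ID 256) describing an IPv6 flow.
example = (
    HEADER.pack(9, 1, 0, 0, 0, 0)
    + struct.pack("!HHHH", 0, 20, 256, 3)
    + struct.pack("!HHHHHH", 27, 16, 28, 16, 1, 4)
)
print(parse_templates(example))
```

The point is that the exporter tells the collector which fields each record carries, so adding a new field type later does not require a new packet format.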
Let's get to the nuts and bolts of actually getting some netflow data off of our switches and into a collector so that we can search on it and visualize it if we so choose. For my examples below I will be using V9, as I am running a dual stacked network and like to collect both IPv4 and IPv6 statistics.
Cat 6500 Config
Here is the basic config to get netflow v9 running and exporting flow data to your collector of choice:
Here is what is going on in the first few lines of config. We are enabling per VLAN flow data to be collected, setting both the IPv4 and IPv6 flow masks to interface-full. Setting the mask to this exports all netflow data captured. There are several flow masks you can configure, each one sending less and less data. Setting the MLS aging timers is a good idea to get a more accurate view of flows as netflow data is not sent for a flow until it has timed out. For long lived flows, this is 32 mins by default. Setting the long timer to 300 means that long lived flows will be timed out in the netflow cache every 5 minutes thus allowing the collection of data for that flow to happen much sooner. I have set the normal timers to 120 just to age out inactive flows a little quicker than the default of 300 (5 minutes). Finally we enable the export of netflow data.
Here we are specifying that netflow exports should be sourced from VLAN 100 and sent as version 9 records, since I am exporting both IPv4 and IPv6 flows. My netflow collector is at 10.10.10.10, listening on port 9995.
Now we configure our interface to collect netflow data.
That is pretty much it. If you have tons of flows you could also set up sampled netflow, where 1 out of every x packets is sampled for the netflow cache. This will reduce CPU utilization on the switch and leave less flow data to store on the netflow collector as well.
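If you do go the sampled route, keep in mind that the collector only sees a fraction of the traffic and has to scale the counters back up. Here is a rough Python sketch of that idea, assuming simple deterministic 1-in-n packet sampling. The function names are made up for illustration, and real platforms may sample differently (for example time based or random), so treat the result as an estimate.

```python
from itertools import islice


def sample_one_in_n(packet_sizes, n):
    """Keep every n-th packet size (deterministic 1-in-n sampling)."""
    if n < 1:
        raise ValueError("sampling rate must be at least 1")
    return list(islice(packet_sizes, n - 1, None, n))


def estimate_totals(sampled_sizes, n):
    """Scale sampled counters back up to estimated (packets, bytes)."""
    return len(sampled_sizes) * n, sum(sampled_sizes) * n


# 1200 packets with a repeating mix of small, medium and large sizes.
sizes = [64, 512, 1500] * 400
sampled = sample_one_in_n(sizes, 100)
print(len(sampled), estimate_totals(sampled, 100))
```

With only 12 of the 1200 packets kept, the collector stores far less data, at the cost of accuracy for short or small flows that may never be sampled at all.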
Nexus 7k Config
Configuring netflow on a Nexus 7k is a bit different in that Nexus supports what is called Flexible Netflow. Here we configure netflow in a more modular approach. We configure Flow Records, which state what data we want collected; Flow Exporters, which specify where to send the flow data and what version of netflow to use; and Flow Monitors, which are placed on the interfaces where we wish to capture flow data. You can optionally specify a Flow Sampler, which does what its name implies: it captures sampled netflow data instead of looking at every packet.
Here I am configuring two separate flow records, one for IPv4 and one for IPv6 as you cannot have both IPv4 and IPv6 protocols in the same flow record. I am matching on specific keys and then collecting data based off of those keys.
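To illustrate the difference between the keys you match on and the data you collect, here is a small Python sketch (this is not NX-OS config). It groups packets into flows by a 5-tuple key and keeps packet and byte counters per flow. The Packet type and the build_flow_cache name are hypothetical, and a real flow record can match and collect many more fields.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    protocol: int
    src_port: int
    dst_port: int
    size: int


def build_flow_cache(packets):
    """Group packets by key fields ("match") and accumulate counters ("collect")."""
    cache = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # Key fields: two packets belong to the same flow only if all of these agree.
        key = (p.src, p.dst, p.protocol, p.src_port, p.dst_port)
        # Non-key fields: values gathered for each flow.
        cache[key]["packets"] += 1
        cache[key]["bytes"] += p.size
    return dict(cache)


packets = [
    Packet("2620:0:691:1001::a:1", "2620:0:691:1002::b:2", 6, 49152, 443, 1500),
    Packet("2620:0:691:1001::a:1", "2620:0:691:1002::b:2", 6, 49152, 443, 64),
    Packet("10.10.20.5", "10.10.10.10", 17, 5060, 5060, 200),
]
for key, counters in build_flow_cache(packets).items():
    print(key, counters)
```

The first two packets share every key field, so they end up in a single flow entry with two packets counted.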
Now let's create the flow exporter and flow monitors. Again, you must create a separate flow monitor for IPv4 and IPv6.
Finally we configure the flow monitors on the interfaces we wish to obtain netflow data from:
That’s it! You can verify you are collecting flow data by issuing the following commands. On a 6500 the commands are:
show mls netflow ip – Shows the flow cache table.
show mls netflow ipv6 – Shows the flow cache table for IPv6.
show mls nde – Shows netflow export info.
For the Nexus the verification commands are:
show hardware flow ip – Shows IPv4 flow info
show hardware flow ipv6 – Shows IPv6 flow info
show flow exporter – Shows flow export info
Finding a host in a CE (Classic Ethernet) switched datacenter is a simple matter of showing the MAC address table on a switch and following the port that MAC is seen on until you end up at the access layer switch the host you are looking for is connected to. The command “show mac address-table address aaaa.bbbb.cccc” results in nice, easy to read output like this:
# sh mac address-table address 90e2.ba5b.3f90
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False
   VLAN     MAC Address      Type      age     Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 800      90e2.ba5b.3f90    dynamic   30         F    F  Po1118
As you can see, we get the VLAN and the port the MAC shows up on. Now, when we do the same thing in a Fabricpath topology, we get slightly different results. I am running this command from one of my spine switches in the Fabricpath topology. (I’m also using a different MAC as I’m on a different switch running Fabricpath):
# sh mac address-table address 000c.29af.42a7
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False
   VLAN     MAC Address      Type      age     Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
  100      000c.29af.42a7    dynamic   0          F    F  1002.0.0
Now, instead of the port the MAC address is seen on, we get three numbers separated by decimal points. Let's go over what these are. The numbers correspond to the SWID (Switch ID), SSID (Sub Switch ID) and LID (Local ID). The SWID is a unique value assigned per switch that IS-IS uses to route traffic within the fabric. The SSID is used when VPC+ is configured and is locally significant to each switch. More specifically, it is used to specify exactly which VPC+ port-channel to forward traffic on to reach its destination. Finally, there is the LID, also known as the Port ID. This identifies the physical port that traffic was sourced from or is destined to and is also locally significant to each switch in the fabric.
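If you are scripting this kind of lookup, the Ports/SWID.SSID.LID column can be split apart with a few lines of Python. This is just a sketch: the FabricPathAddress type and parse_port_field name are my own, and it simply treats anything that is not three dot-separated numbers as a local port.

```python
from typing import NamedTuple, Optional


class FabricPathAddress(NamedTuple):
    swid: int  # Switch ID, routed by IS-IS within the fabric
    ssid: int  # Sub Switch ID, used with VPC+
    lid: int   # Local ID (Port ID), locally significant


def parse_port_field(value: str) -> Optional[FabricPathAddress]:
    """Return the SWID/SSID/LID triple, or None for a local port such as Po1118."""
    parts = value.strip().split(".")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        return FabricPathAddress(*map(int, parts))
    return None


print(parse_port_field("1002.0.0"))
print(parse_port_field("Po1118"))
```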
The entry we are most concerned with is the SWID, as this will be the destination switch traffic is being routed to and thus the switch that the host we are trying to find is connected to. In order to find that switch, let's find out what port (or ports) the switch is accessible by. To do this we show the Fabricpath route for the SWID in question. I am running this command from one of my spine switches in the Fabricpath topology.
#sh fabricpath route switchid 1002
FabricPath Unicast Route Table
'a/b/c' denotes ftag/switch-id/subswitch-id
'[x/y]' denotes [admin distance/metric]
ftag 0 is local ftag
subswitch-id 0 is default subswitch-id
FabricPath Unicast Route Table for Topology-Default
1/1002/0, number of next-hops: 2
        via Eth1/1, [115/10], 97 day/s 21:40:34, isis_fabricpath-default
        via Eth2/1, [115/10], 54 day/s 10:45:17, isis_fabricpath-default
Looks like SWID 1002 is accessible via Eth1/1 and Eth2/1. Usually, a switch ID will have only one next hop per spine switch, but in this case switch ID 1002 is actually a vPC Fabricpath switch ID associated with a vPC pair of switches, so it is reachable via either switch in the pair. No matter, as the next step is the same regardless. We need to figure out what switch that is and its management IP so we can continue on our way. I will do this by showing the CDP neighbor info of one of the ports listed.
# sh cdp neighbor int e1/1 detail
----------------------------------------
Device ID:5f-n6001-a(abcdef12345)
System Name: 5f-n6001-a
Interface address(es):
    IPv4 Address: 10.18.0.14
Platform: N6K-C6001-64P, Capabilities: Router Switch IGMP Filtering Supports-STP-Dispute
.....
MTU: 1500
Physical Location: snmplocation
Mgmt address(es):
    IPv4 Address: 10.18.252.14
I cut some of the output, but you can see we have the management IP of the device, so let's get on there and check the MAC address table to see if our host is connected to a port on this switch.
5f-n6001-a# sh mac address-table address 000c.29af.42a7
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 100      000c.29af.42a7    dynamic   0          F    F  Po200
Looks like our host is connected to this switch in the fabric on port channel 200. We can now show what ports are bundled in the port channel to get the physical ports our host is connected to.
5f-n6001-a# sh port-channel summary int po200
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        S - Switched    R - Routed
        U - Up (port-channel)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
200   Po200(SU)   Eth      LACP      Eth1/25(P)
So in summary, we looked up the MAC of the host we were searching for, which returned a SWID that is the source/destination of traffic to/from that MAC address. We then looked up the Fabricpath route to that SWID, which gave us the port(s) through which that switch ID is reachable. Then we obtained the management address of the switch on the other side of those ports and checked whether the MAC we were looking for was directly connected to that switch. If so, we get the port that the MAC is seen on. If not, we would have gotten another SWID.SSID.LID and gone from there.
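The same loop can be written down as a short Python sketch. It runs against toy in-memory tables rather than real switches: the dictionaries stand in for the output of the three show commands, the spine name "spine-1" is made up, and the locate_host function is hypothetical. It only illustrates the order of the steps.

```python
def locate_host(mac, start_switch, mac_tables, routes, neighbors, max_hops=8):
    """Follow a MAC through a toy FabricPath fabric to (switch, local port)."""
    switch = start_switch
    for _ in range(max_hops):
        # Step 1: "show mac address-table address <mac>" on the current switch.
        entry = mac_tables[switch][mac]
        parts = entry.split(".")
        if not (len(parts) == 3 and all(p.isdigit() for p in parts)):
            return switch, entry  # a local port, so the host is attached here
        swid = int(parts[0])
        # Step 2: "show fabricpath route switchid <swid>" gives the next-hop ports.
        next_hop_port = routes[switch][swid][0]
        # Step 3: "show cdp neighbor int <port> detail" gives the next switch.
        switch = neighbors[switch][next_hop_port]
    raise RuntimeError("gave up: too many hops")


# Toy data modelled on the outputs shown in this post.
mac_tables = {
    "spine-1": {"000c.29af.42a7": "1002.0.0"},
    "5f-n6001-a": {"000c.29af.42a7": "Po200"},
}
routes = {"spine-1": {1002: ["Eth1/1", "Eth2/1"]}}
neighbors = {"spine-1": {"Eth1/1": "5f-n6001-a"}}

print(locate_host("000c.29af.42a7", "spine-1", mac_tables, routes, neighbors))
```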
There has been much talk about needing to move to IPv6 for a while now. With the last IPv4 allocations being handed out, reality is now sinking in that moving forward, IPv6 will be the way we start addressing our networks. Yes, IPv4 will undoubtedly be around for a while as carriers use CGN to work around the limited address space they have, but moving forward, IPv6 will see more use.
I have had the good fortune to be able to turn up and test IPv6 on a new core network that sits alongside the current network. In this network I am running MPLS L3 VPNs so the actual core is IPv4 only due to LDP not having support for IPv6 currently but that has no impact on running IPv6 within each VRF. Running IPv6 within a L3 VPN will be for another blog post though. In the next few posts, I will talk about how we can easily get IPv6 up and running. Assuming the infrastructure supports it of course! First stop, addressing.
Once you have received your prefix from ARIN (most likely a /48), what in the world are you supposed to do with it? The most visible difference between IPv6 and IPv4 is the larger addresses: IPv4 addresses are 32 bits long and written in ‘dotted decimal’ format, e.g. 192.168.1.1. IPv6 uses 128 bit addresses, written as 8 blocks (words) of 16 bits each, in hex and separated by colons, e.g. 2620:0000:0691:1001:0000:0000:000a:0001.
When writing addresses, within each block of 4 hex digits the leading zeroes may be omitted. In addition, once per address (and no more) a contiguous set of one or more blocks which are all 0000 may be replaced by a double colon. So the above address can be rewritten as: 2620:0:691:1001::a:1.
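If you want to check your shorthand, Python's standard ipaddress module applies the same rules. A quick sketch:

```python
import ipaddress

addr = ipaddress.IPv6Address("2620:0000:0691:1001:0000:0000:000a:0001")

# Leading zeroes dropped and the longest run of all-zero blocks replaced by "::".
print(addr.compressed)  # 2620:0:691:1001::a:1

# And back to the full 8 x 4 hex digit form.
print(addr.exploded)    # 2620:0000:0691:1001:0000:0000:000a:0001
```

Note that the single 0000 block near the front is only shortened to 0, because the double colon has already been used for the longer run of zero blocks.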
Netmasks are written in the same mask length format as IPv4, with /length after the address. For example, a /126 subnet has 126 bits of network address, leaving 2 bits for host addresses (the bottom address is the subnet-router anycast address, although using it that way is not required, and there is no top address reserved for a directed broadcast).
When subnetting, always remember to subnet on nibble (4-bit) boundaries so that subnets do not end in weird places, e.g. a /61 that runs from 2620:0:691:0:: through 2620:0:691:7:: with the next one starting at 2620:0:691:8::, so the boundary falls in the middle of a hex digit.
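Here is a small Python sketch of what nibble-aligned subnetting looks like, using the standard ipaddress module and the 2620:0:691::/48 prefix from the examples above. The on_nibble_boundary helper is my own, and the /52 and /64 split is just one possible plan, not a recommendation for every network.

```python
import ipaddress


def on_nibble_boundary(network: ipaddress.IPv6Network) -> bool:
    """True when the prefix length falls on a 4-bit (one hex digit) boundary."""
    return network.prefixlen % 4 == 0


site = ipaddress.IPv6Network("2620:0:691::/48")

# Split the /48 into sixteen /52 blocks: one hex digit selects the block.
blocks = list(site.subnets(new_prefix=52))
print(len(blocks), blocks[0], blocks[1], blocks[-1])

# Each /52 in turn holds 4096 /64 end host networks.
lans = list(blocks[1].subnets(new_prefix=64))
print(len(lans), lans[0], lans[-1])

# A /50 is valid but splits a hex digit, which makes the ranges harder to read.
print(on_nibble_boundary(ipaddress.IPv6Network("2620:0:691::/50")))

# The /126 from the netmask example: 2 host bits, so 4 addresses in total.
print(ipaddress.IPv6Network("2620:0:691:1001::/126").num_addresses)
```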
Also, unless you have special circumstances, always use /64’s as end host networks. Not using /64 as your network will break things such as Neighbor discovery, privacy extensions etc. ipSpace.net has a good post on why using /64 networks is the way to go.
There are several different addressing methods for IPv6 as well:
Stateless Autoconfiguration (SLAAC)
In this scenario, a host will address itself based off of a network prefix learned from an advertising router via periodic router advertisements (RAs). The host will append its MAC address (with FF:FE inserted in the middle and the universal/local bit flipped) to the 64-bit prefix, and this will form the 128 bit host address. This could be a security concern, as the host's MAC address is easily obtained from this address, so most operating systems replace the MAC with a randomized unique identifier.
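The MAC-based address is easy to compute by hand, but here is a minimal Python sketch of it anyway. It follows the modified EUI-64 method: insert FF:FE in the middle of the MAC and flip the universal/local bit of the first octet. The eui64_address function name is my own, and the prefix and MAC are just the example values used elsewhere in these posts.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build the SLAAC address a host would derive from a /64 prefix and its MAC."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 needs a /64 prefix")
    # Accept aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff and Cisco aabb.ccdd.eeff formats.
    digits = mac.replace(":", "").replace("-", "").replace(".", "")
    octets = bytearray.fromhex(digits)
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network[int.from_bytes(interface_id, "big")]


print(eui64_address("2620:0:691:1001::/64", "000c.29af.42a7"))
```

You can read the MAC straight back out of the result, which is exactly why most operating systems prefer a randomized identifier.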
Let's talk about the RA for a second. If you only configure SLAAC on your router, you aren’t manually specifying a DHCP pool with a default gateway etc. for end hosts as you would with IPv4. So how does your PC know how to send traffic off net? The RA specifies several different parameters to end hosts including:
- IP prefix (or multiple prefixes)
- Flags (such as whether to use DHCP to obtain DNS info)
- Default gateway
In this scenario, no config flags are set in the RA messages. What about DNS? This is either handled manually on each end host or you can do SLAAC with DHCPv6.
SLAAC with Stateless DHCPv6
In this scenario, the Other-Config-Flag is set in the RA to tell the host to use SLAAC to get an address and use DHCPv6 to obtain other parameters such as DNS. The DHCPv6 server is stateless in that it does not keep track of what IPv6 address each end host has.
Stateful DHCPv6
In this scenario the Managed-Config-Flag is set in the RA and tells the end host to use DHCPv6 only to obtain an address, DNS etc.
This option is really only useful for server farms etc. where you don’t want to have these addresses randomly assigned. This can be accomplished by enabling the no-autoconfig flag, which sets the A bit to zero in RAs so that autoconfig is not used by the end host to assign itself an address.
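The three methods boil down to which flags the host sees in the RA. Here is a simplified Python sketch of that decision. The addressing_mode function is hypothetical and real host behavior varies by operating system: for example, a host that sees the Managed flag with the A bit still set will usually configure a SLAAC address in addition to its DHCPv6 one.

```python
def addressing_mode(m_flag: bool, o_flag: bool, a_flag: bool = True) -> str:
    """Map RA flags to the addressing method described above (simplified)."""
    if m_flag:
        # Managed-Config-Flag: address and other parameters come from DHCPv6.
        return "stateful DHCPv6"
    if a_flag and o_flag:
        # Other-Config-Flag: SLAAC for the address, DHCPv6 for DNS etc.
        return "SLAAC + stateless DHCPv6"
    if a_flag:
        # No config flags set: address from SLAAC, DNS configured by hand.
        return "SLAAC only"
    # A bit cleared and no DHCPv6 address on offer.
    return "manual configuration"


print(addressing_mode(m_flag=False, o_flag=False))
print(addressing_mode(m_flag=False, o_flag=True))
print(addressing_mode(m_flag=True, o_flag=True, a_flag=False))
```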
In my next post I’ll go over the configurations to make it work within a SLAAC and Stateless DHCPv6 environment.
Now that I have moved a significant portion of my enterprise network to our new core based on Nexus 7k switches, I need to start thinking about how to implement QoS as I am in a healthcare environment and some traffic MUST make it to its destination regardless of congestion. Currently, I don’t see any congestion on our network but traffic is always increasing and a good QoS policy will keep the business running.
Here are some important things to remember:
– QoS is on by default and cannot be turned off.
– There are default class-maps and policy-maps configured and applied to all interfaces by default.
– The default policy-map cannot be modified.
– Ports trust CoS by default and use it for inbound and outbound queue selection.
– The 7k assigns EGRESS traffic to queues based on CoS even if the egress interface does not carry CoS in the frame (i.e. it’s a L3 interface).
– Interface based QoS takes precedence over VLAN based.
– VLAN based QoS policy is configured in VLAN Database, no SVI is required.
There are four default rules to remember for CoS/DSCP and queue assignment. I copied these verbatim from the BRKDCT-3346 slide deck presented by Lukas Krattiger at CLUS 2013:
For Bridged Traffic:
If CoS and DSCP is present:
– CoS is used for ingress queue selection
– CoS is preserved
– DSCP is unmodified
– CoS is used for egress queue selection
If only DSCP is present:
– No CoS is treated as CoS 0 on ingress
– CoS 0 is used for ingress and egress queue selection
– DSCP is unmodified
For Routed Traffic
If CoS and DSCP is present:
– CoS is used for ingress queue selection
– DSCP is preserved and rewrites CoS (top 3 bits)
– Re-written CoS is used for egress queue selection
If only DSCP is present:
– No CoS gets treated as CoS 0 in ingress
– DSCP is preserved and rewrites CoS (top 3 bits)
– Re-written CoS is used for egress queue selection
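The four rules above can be condensed into a few lines of Python. This is only a sketch of the rules as listed, not of how the hardware implements them, and the queue_selection function name is my own. "Top 3 bits" means shifting the 6-bit DSCP value right by three, so EF (46) becomes CoS 5 and AF41 (34) becomes CoS 4.

```python
def queue_selection(routed: bool, cos=None, dscp=None):
    """Return (ingress CoS, egress CoS, DSCP) used for queue selection."""
    # A frame without CoS is treated as CoS 0 on ingress.
    ingress_cos = cos if cos is not None else 0
    if routed and dscp is not None:
        # Routed traffic: DSCP is preserved and its top 3 bits rewrite CoS.
        egress_cos = dscp >> 3
    else:
        # Bridged traffic: CoS is used as-is and DSCP is left unmodified.
        egress_cos = ingress_cos
    return ingress_cos, egress_cos, dscp


print(queue_selection(routed=False, cos=5, dscp=46))  # bridged, CoS and DSCP
print(queue_selection(routed=False, dscp=46))         # bridged, DSCP only
print(queue_selection(routed=True, cos=0, dscp=46))   # routed, CoS and DSCP
print(queue_selection(routed=True, dscp=34))          # routed, DSCP only
```

The second case is the one to watch: bridged traffic that arrives with DSCP EF but no CoS lands in the default queue on both ingress and egress.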
Nexus 7k’s do have a default QoS policy in place that is applied to all interfaces in all VDC’s which assigns COS 5-7 on the ingress side to queue 1 and everything else to the default queue. On the egress side COS 5-7 is assigned to the egress priority queue and everything else to the default queue as shown below:
The above queue structure is from an M2 10gig module, which has 8 ingress queues with 2 tail drop thresholds per queue, and on the egress side there is 1 priority queue and 7 queues with 4 tail drop thresholds each. Each module is different though. For example, a M148GS-11L module which has 48 1gig ports has a 2q4t ingress queue structure and a 1p3q4t egress queue structure. The class maps for it would resemble the one above except with fewer queues to map to.
So back to the defaults, there is also a default policy map applied to all ports using the above COS to queue mappings:
You can see on ingress, 50% of the queues are allocated to queue 1 and the other half to the default inbound queue. On the outbound side, pq1 is defined as the priority queue given a percentage of the queue space as are queue 3,4 and the default queue. Notice the names don’t match up with the actual class-map names. These are default class-map names that match on any queue type, for example:
Where do I configure these things?
– Class maps for queue selection are done in the admin/default VDC and apply to all interfaces across all VDC’s.
– Policy maps are configured per VDC and are assigned on a per-port or per-vlan basis.
If you configure a custom class map, be sure to assign queue-limits within your policy map for your queues. If you do not, no queue space will be allocated for that queue and you will start tail dropping traffic assigned to that queue immediately.
I will post more on this topic as I flesh out my QoS policy a little more and we can get into creating custom class-maps and policy-maps.
I recently needed to do some PIM debugging on the Nexus platform in an MPLS environment, and there are some nice features that make it pretty handy.
First off I did what I usually do and forgot to specify the VRF I wanted to actually debug PIM in so received no log messages. I looked for a bit to see where I could specify the VRF but there wasn’t any option under the debug command:
That VRF option is not the one to specify the VRF, it debugs just what the description says it does.
Then I found you can specify debug-filters! Here is where you can specify which VRF to actually apply the debugging command to, along with a lot of other filtering options:
As you can see, you can filter your debugs to a specific multicast group and/or interface as well. Now that we have applied the debug filter to the user VRF, we can turn on the specific PIM debugging we want to see:
Now this produced a lot of output so you could have applied another debug-filter to filter on a specific address, packet direction, interface, etc. It is pretty easy to extract exactly what you want to see in the debug output.
You can also send the output of a debug to a file like so (I named the file “test”):
Configuring QoS has been much improved in Cisco’s new 3850 line of switches thanks to its implementation of MQC (Modular QoS CLI) configuration instead of the old “mls qos” commands from the 3750 and 3560 lines of switches. Here I will show just how easy it is to generate a simple QoS config to handle voice and video traffic.
First we define what traffic we would like to work with. In this case it is voice traffic marked as dscp ef and video traffic marked as dscp af41.
Easy enough. Now let's configure the policy maps to actually do something with this traffic. I will configure two separate policy maps, one for user facing ports and one for the uplinks to my distribution layer.
I would like to prioritize both voice and video traffic over everything else but I want voice to be prioritized over everything. The 3850 actually has 2 priority queues that enable me to do this, a level 1 and level 2 priority queue.
Above I have separate policy maps based on whether it’s a user or uplink port. For user ports, voice traffic is put in priority queue 1 and guaranteed 1% or 10Mbps of throughput. I also capped it at 10Mbps with the police command to prevent this queue from hogging bandwidth. The same process is followed for video: it is placed in priority queue 2 and guaranteed 50Mbps of throughput. On the uplinks, the only change is that the percentages of guaranteed bandwidth have been increased. All other traffic gets 100% of the remaining bandwidth.
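The percentages only mean something relative to the port speed, so here is a tiny Python sketch of the arithmetic. It assumes the user ports are 1 Gbps, which is what makes 1% equal to 10Mbps. The helper names are made up.

```python
def percent_to_bps(percent: int, link_bps: int) -> int:
    """Bandwidth that a given percentage represents on a link."""
    return link_bps * percent // 100


def bps_to_percent(rate_bps: int, link_bps: int) -> float:
    """Share of the link that a given rate represents."""
    return rate_bps * 100 / link_bps


GIG = 1_000_000_000  # assumed 1 Gbps user port

print(percent_to_bps(1, GIG))           # voice: 1% of 1 Gbps in bps
print(bps_to_percent(50_000_000, GIG))  # video: 50 Mbps as a share of 1 Gbps
```

On a 10 Gbps uplink the same 1% would be 100Mbps, so rates and percentages need to be worked out per port speed.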
All that is left is to apply these policy maps to some interfaces.
Done! We can even get some meaningful stats now as well. I haven’t pushed any traffic through this switch yet but you get the idea.
As you can see, this is a pretty painless QoS implementation. Of course you can get much more complicated if needed with nested policy-maps, table maps etc. If you want to delve further into QoS, see the IOS XE QoS Config Guide.
While working on some QoS configuration, I hit a strange issue with some Cisco 3850 switches. I saved my config and shut down the switch, only to have it boot into rommon the next day when powered back on.
Hmmm, not what I expected. I looked in the directory to try and find the .bin file to boot the switch but was presented with some .pkg files instead.
Hmm, no .bin file here. After some digging, I found that you actually boot with the packages.conf file, which then invokes the necessary .pkg files, and off you go.
This behavior is actually a documented bug in the 3.2.0SE code. The bug ID is CSCue76684. The workaround for this is to manually specify the packages.conf file in the boot statement in the config.
This bug is fixed in 3.2.1SE. Hopefully this saves someone some time when deploying these switches.