I've been running nftables for years, but I've never sat down to get good telemetry out of it. For a firewall to have good telemetry coverage, I feel you need:

  • Rule Statistics
    • See what rules are getting hit
    • Overly hot rules
    • Rules no longer getting hit (cleanup)
  • Logs + Analysis
    • Summarized traffic reports (look for trends you don't expect)
    • More importantly, knowing what traffic is being dropped
  • External Scanning
    • Preemptively look for unexpectedly allowed traffic
    • Possibly due to overzealous rules

So I'm going to explain what I've done in early 2026 to improve this, and my overall security posture, using the tools currently available for nftables. Hopefully it helps you improve your own firewalling posture too.

Rule Statistics

To capture rule statistics, since I run a Prometheus / Grafana stack, I'm using nftables-exporter by metal-stack.

This has allowed me to graph nft -j list ruleset output as time series in the following ways:

  • Show packet counts for each "chain" per rule

nftables packet counts per rule
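nftables-exporter does the heavy lifting here, but the same JSON is easy to poke at directly. A quick Python sketch (run against a made-up two-rule sample shaped like nft -j list ruleset output, not a real capture) that flags rules with zero hits:

```python
import json

# Sample shaped like `nft -j list ruleset` output (illustrative, not a real capture)
ruleset = json.loads("""
{"nftables": [
  {"rule": {"family": "inet", "table": "filter", "chain": "INPUT",
            "handle": 7, "comment": "allow-ssh",
            "expr": [{"counter": {"packets": 1234, "bytes": 99999}}]}},
  {"rule": {"family": "inet", "table": "filter", "chain": "INPUT",
            "handle": 9, "comment": "allow-tftp",
            "expr": [{"counter": {"packets": 0, "bytes": 0}}]}}
]}
""")

def rule_counters(data):
    """Yield (chain, handle, comment, packets) for every rule with a counter."""
    for obj in data["nftables"]:
        rule = obj.get("rule")
        if not rule:
            continue
        for expr in rule.get("expr", []):
            counter = expr.get("counter")
            if counter:
                yield rule["chain"], rule["handle"], rule.get("comment", ""), counter["packets"]

for chain, handle, comment, packets in rule_counters(ruleset):
    flag = "  <- cleanup candidate?" if packets == 0 else ""
    print(f"{chain} handle {handle} ({comment}): {packets} packets{flag}")
```

Only rules that actually contain a counter statement show up, which is also how the exporter sees your ruleset.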

Logs + Analysis

For logs I am capturing the dropped packets via drop rules like:

chain LOGGING-VERBOSE {
        limit rate 1/second counter log flags all prefix "verbose-nft-dropped: "
        counter drop
}

chain LOGGING {
        limit rate 2/minute counter log prefix "nft-dropped: "
        counter drop
}

The dropped-packet logs get shipped by promtail into Grafana Loki, so I can use LogQL to query all my dropped traffic. Since the nftables log lines are key/value pairs, the built-in "logfmt" parser can turn many parts of a log line into "labels" to filter on.

Some things I can ask:

  • How many dropped flows per host am I collecting?
    • sum by (hostname) (count_over_time({job="varlogs"} |= "nft-dropped:" [$__interval]))
  • Use a regex to find flows to port 69
    • {job="varlogs"} |~ "nft-dropped:.*DPT=69"
  • Use logfmt to get counts of destination host, protocol and port:
    sum by (DST, PROTO, DPT) (
      count_over_time(
        {job="varlogs", hostname="home1.cooperlees.com"} |= "nft-dropped" | logfmt [$__interval]
      )
    )

This is insanely handy for slicing and dicing the data to work out rules to potentially add or clean up.

Jan  3 04:18:26 home1 kernel: verbose-nft-dropped: IN=att OUT=vlan69 MACSRC=0c:a4:02:c9:06:d5 MACDST=bc:9a:8e:88:56:c0 MACPROTO=86dd SRC=2a06:4880:6000:0000:0000:0000:0000:0089 DST=2600:1700:3040:13e1:0000:0000:0000:0069 LEN=64 TC=0 HOPLIMIT=241 FLOWLBL=270679 PROTO=TCP SPT=36289 DPT=55553 SEQ=3101165342 ACK=0 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405A0)
# Legacy IP
Jan  3 05:03:21 home2 kernel: nft-dropped: IN=astound OUT= MAC=66:69:69:69:69:69:6c:9c:ed:92:00:26:08:00 SRC=79.124.56.246 DST=207.229.169.25 LEN=40 TOS=0x00 PREC=0x00 TTL=239 ID=47457 PROTO=TCP SPT=56038 DPT=6961 WINDOW=1024 RES=0x00 SYN URGP=0 
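This logfmt friendliness is easy to see: the interesting part of each line is just KEY=VALUE tokens. A little Python sketch of the same parse, run on the legacy IP line above:

```python
line = (
    "Jan  3 05:03:21 home2 kernel: nft-dropped: IN=astound OUT= "
    "MAC=66:69:69:69:69:69:6c:9c:ed:92:00:26:08:00 SRC=79.124.56.246 "
    "DST=207.229.169.25 LEN=40 TOS=0x00 PREC=0x00 TTL=239 ID=47457 "
    "PROTO=TCP SPT=56038 DPT=6961 WINDOW=1024 RES=0x00 SYN URGP=0"
)

def parse_nft_drop(line: str) -> dict:
    """Pull the KEY=VALUE pairs out of an nftables kernel log line."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value  # flags like SYN (no "=") are skipped
    return fields

fields = parse_nft_drop(line)
print(fields["SRC"], "->", fields["DST"], fields["PROTO"], "dpt", fields["DPT"])
```

Loki's logfmt parser does essentially this, which is why SRC, DST, PROTO, DPT, etc. all become labels for free.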

I haven't yet, but I could add more logging to capture data on what traffic is allowed. I'm planning to do this for the new child VLAN I'll be creating as I add devices for my child.
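For reference, the logging chains above only fire once a base chain jumps to them; the wiring in a ruleset looks roughly like this (hook/chain details are illustrative, not my exact config):

```
table inet filter {
        chain INPUT {
                type filter hook input priority 0; policy drop;
                ct state established,related accept
                # ... accept rules for wanted traffic ...
                jump LOGGING
        }
}
```

Anything that falls through the accept rules hits the jump, gets rate-limited logging, and is then counted and dropped.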

Loki nftables query example

External Scanning

The next thing to do is to periodically port scan endpoints from various sources to verify you only get expected results. To do this I wrapped nmap to write scan results into a MySQL table; I imaginatively called it nmapscanner.

This tool takes lists of IPs to scan and writes the results to MySQL. I then use Grafana to run SQL queries to see what ports are open, etc.

NMAPScanner Dashboard

Please comment away with any better ways to do things, or concepts and analyses I've overlooked. I'm all ears for smarter and better ways to look at the systems I run on my infra.

Keep firewalling all.

systemd progressed the Linux PID 1 situation, adding many features and capabilities. But, as always, with features and capabilities comes complexity. One thing I and many co-workers have found difficult is getting startup order right, especially with custom targets. First, let's define some systemd concepts:

  • unit: a configuration file that describes a service, a socket, a device, a mount point, or other entities that systemd can manage.
  • target: a special kind of unit that groups together other units that need to be started or stopped together in a coordinated fashion.
    • Targets are similar to runlevels in traditional SysV init systems, but are more flexible and can be used to group together any kind of unit, not just services.
  • service: a type of systemd unit that describes a process that systemd should manage

In this blog post, we are going to focus on systemd service units, using After=, Before=, and the soft and hard dependency directives, Wants= and Requires= (respectively).

Note: Some AI was used to ask systemd questions to get answers, so please question any inaccuracies. I did try to lab all in a VM to ensure it somewhat seemed correct.

Requires (hard) vs. Wants (soft)

If you use Wants, the service is pulled in alongside other dependencies, but its failure will not cause other services to fail. When you combine Before or After with Wants, the services are started in the right order, but a failure won't prevent the other services from starting.

On the other hand, if you use Requires, the service is considered a "hard" dependency: if the required unit fails to start, units that Require it will not be started, and if it is explicitly stopped, they are stopped along with it. (Requires does not itself restart a failed unit; use Restart= in the required unit's [Service] section for that.)

So, Wants is best for dependencies that aren't critical, while Requires should be reserved for dependencies the system genuinely can't function without.

Do we always need After= or Before= with Requires= / Wants=?

Not necessarily!

It's common to use Before and After in conjunction with Requires or Wants, but they do different jobs. Before and After specify only ordering between units; they do not pull anything in. Requires and Wants pull units in, but say nothing about order: without an accompanying After=, the two units will simply be started in parallel. So for a real "start B, then start A" relationship you typically want both a dependency directive and an ordering directive.

For example, if your service depends on another service that is started early in the boot process and ordering doesn't actually matter, a Wants alone may be fine. However, if your service has real ordering constraints, or if other services depend on it, it's usually a good idea to add Before or After so that everything starts up in the correct order.

So, while it's possible to use Requires or Wants without Before or After, if the start order actually matters you should specify it explicitly.
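As a concrete sketch, imagine a hypothetical app.service that should come up after a hypothetical db.service (unit names are made up for illustration):

```ini
# app.service (illustrative)
[Unit]
Description=Example app that should start after the database
# Pull db.service into the transaction, but don't fail if it fails:
Wants=db.service
# Swap Wants= for Requires= to make this a hard dependency.
# Either way, start order is only guaranteed by an explicit After=:
After=db.service

[Service]
ExecStart=/usr/bin/app --config /etc/app.conf
```

Dropping the After= line keeps the dependency but lets both units start in parallel.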

Requires side effects

When you use Requires to declare a dependency in systemd, it means the required unit must be activated for your service to run. If the required unit fails to start, your unit will not be started; if the required unit is explicitly stopped, your unit is stopped along with it.

However, this can have some side effects. For example, if your service Requires another service that takes a long time to start, your service can sit waiting on it and may fail to start in a reasonable amount of time.

Moreover, since Requires creates a hard dependency, dependent services will not start until the required service has started successfully. This can slow down the overall boot process of the system.

So, it's important to carefully consider whether you actually need a hard dependency before using Requires, and if you do, make sure the required unit starts in a reasonable amount of time.

Amazon Link

  • Disclaimer: I get no royalties or anything here - Just had coworkers ask me about it

So, since I'm no systems guru and am now effectively working on a Linux distribution at work, I thought I'd read this book. The distro relies heavily on systemd components, and I wanted to dig deeper and understand more about its design and why things were done the way they were. I also promised this review to co-workers, so here it is.

systemd book

This book is mainly a walk through of:

  • systemd components
    • Configuration options
  • Differences in Red Hat's + Canonical's use of systemd in their respective distros (Fedora/CentOS + Ubuntu respectively)
    • Examples of how to dig in and play with each implementation
    • Some quirky decisions were made

This walkthrough was handy, and there are lots of "go read the man page for more info" pointers. But the biggest win was all the small pieces that filled little gaps in my systemd knowledge. I also like that the author is an old-school Linux admin, so you get a decent explanation of what each systemd component aims to replace: SysV init and Upstart for systemd itself, syslog for journald, etc.

Overall Rating: 7/10

Why?

Good

  • Explains main configuration in enough detail
  • Good labs to do yourself to understand more + play with more
    • It suggests a Rocky VM + Ubuntu
  • Explained why things were done the way they were
    • Also stated when the author could not find a good reason for a decision
  • Explaining cgroups v1 vs. v2 was super handy
  • Showing a distro that used systemd-boot was interesting
    • Whose name eludes me right now (I'm sure Arch can use it though)

Bad

  • Some components were hurried over
    • networkd ... but to be fair networkd is just a layered ini configuration system over netlink ...
    • Some components left me wanting more explanation - but I have a good starting point to go play now
  • The writing style at times pauses and makes quirky, irrelevant statements
    • But not a big deal
  • Complains about man pages / docs a lot ... not sure if the author went and fixed them / submitted a PR
    • If they did, I apologize - but this is something that grinds my gears

Summary

Anyways - if you're a sysadmin these days, I recommend reading this to fill in some gaps with systemd + its components. It's helped me with lots of small things and with understanding things I haven't spent enough time on - e.g. cgroups. It even nicely explains why cgroups v2 is so different and how it scales better (I didn't even understand this and I work @ Meta/Facebook).

Do you also write a lot of services that need a few CLI options (e.g. --config), and/or little CLI tools from time to time? Want a base CLI + logging-to-stderr template to start from?

I always do, so I have Python and Rust base CLI code templates shared on GitHub that I use all the time:

https://github.com/cooperlees/base_clis

I also have sample base GitHub Actions to steal so that, on commit, you can keep your:

  • Code tested
  • Code formatted
  • Type checked / compiled

Dependabot will also keep the dependencies up to date on these templates, so you can rely on them using the latest hotness.

Steal them today and contribute back handy enhancements as you see fit!

Python

Click based, using a base logging function with a logging format I love. Logging is usually info or debug.

Rust

clap based, logging with a more standard glog-like output.

  • -vvv to increase logging verbosity

IPv4 addressing on links is no longer required to route IPv4. What you say?? Yes, you can stop IPv4 addressing your point to point links with Legacy IP and route your IPv4 addressed packets via IPv6 next hops!

  • With this we can save Public IPv4 addressing!
  • We now only need a Public IPv4 loopback on Internet routers
  • No more wasting valuable IPv4 Public Space
    • with our /31s (or even /30s)
    • Gotta have those network and broadcast addresses ... So my CCNA training says!

MAC address is the same

How does this work??

Routers encapsulate Layer 3 IP packets into Layer 2 frames on Ethernet networks, and most of the world today uses Ethernet. An Ethernet router pulls the destination MAC address for the frame from one of two sources today:

  • ARP Table for IPv4
  • Neighbor Table for IPv6

Having an IPv4 prefix routed via an IPv6 next hop just tells routers to look in the neighbor table, rather than the ARP table, to get the MAC address during encapsulation. The IP packet itself is untouched, encapsulated just as it would be with an IPv4 next hop, and looks no different to the destination device. This is a truly simple, great solution that lets us continue routing legacy IP without adding the tech debt of addressing our devices with said legacy.
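On Linux this is directly expressible with iproute2 on reasonably modern kernels (the prefix and next hop below are illustrative documentation addresses, not from a real network):

```
# Route an IPv4 prefix via an IPv6 next hop (RFC 8950 style)
ip route add 192.0.2.0/24 via inet6 fe80::1 dev eth0

# Verify - the route shows the IPv6 next hop
ip route show 192.0.2.0/24
```

The "via inet6" keyword is what tells the kernel the next hop lives in the IPv6 neighbor table.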

IPv6 Link Local

You can use the auto-configured IPv6 Link Local addresses as your next hops too. IPv6 has the fe80::/10 prefix allocated as a Link Local scoped prefix; it cannot be routed. Every IPv6-enabled interface (by default) auto-configures a Link Local address, using its MAC address to generate a unique address. Because of this, a router doesn't even need Site or Global scope IPv6 addressing on its interfaces; using Link Local addresses it can still route both IPv4 and IPv6. The Terragraph project addresses its networks like this.
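That "uses its MAC address to generate a unique address" part is the classic modified EUI-64 scheme (note that some modern stacks use stable-privacy addresses instead). A small Python sketch of the math, fed the bridge MAC from the neighbor table output below:

```python
import ipaddress

def mac_to_link_local(mac: str) -> ipaddress.IPv6Address:
    """Build the modified EUI-64 link-local address for a MAC address."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the middle
    suffix = int.from_bytes(bytes(eui64), "big")
    return ipaddress.IPv6Address((0xFE80 << 112) | suffix)

print(mac_to_link_local("02:42:0a:fb:fe:17"))  # -> fe80::42:aff:fefb:fe17
```

Flip a bit, wedge ff:fe in the middle, prepend fe80:: - that's the whole trick.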

If you do deploy your network this way, I recommend having a routable loopback address so you can contact the box in-band. It isn't strictly required, but without a routable address, traceroutes and pings become ineffective for debugging and testing connectivity, since the routers would have no valid source address for ICMP responses.

Why keep an IPv4 Loopback?

If you'd like ICMP responses, for say traceroute, it's still recommended that you keep a loopback address. Most people add /32 addresses to a loopback device on their network devices; doing so here gives the device a source address for legacy ICMP responses.

Note: I realize that without IPv4 p2p addressing you lose seeing which ingress interface you hit on a router

  • I feel this is a small price to pay since I know you're busy IPv6'ing everything ... right?

How do you see an ARP table?

On Linux:

cooper@l33t:~$ arp -an
? (10.251.254.1) at 02:42:0a:fb:fe:14 [ether] on br-a40df68e252d
? (173.255.255.1) at 00:00:0c:9f:f0:05 [ether] on eth0

How do you see a Neighbor Table?

On Linux:

cooper@l33t:~$ ip -6 neighbor
fe80::42:aff:fefb:fe17 dev br-a40df68e252d lladdr 02:42:0a:fb:fe:17 STALE
2600:3c01::8678:acff:fe0d:a641 dev eth0 lladdr 84:78:ac:0d:a6:41 router DELAY

RFCs

  • RFC8950 - "Advertising IPv4 Network Layer Reachability Information (NLRI) with an IPv6 Next Hop"
  • RFC5549 - "Advertising IPv4 Network Layer Reachability Information with an IPv6 Next Hop" (obsolete)

Case Study: Simple Server to Rack

Using IPv4 Next Hops

  • Here we are using IPv4 Link Local in our rack as the next hops
  • The ARP table would be consulted here for looking up the next hop's MAC address
    • e.g. 169.254.6.9

Using IPv6 Next Hops

The same rack, but with IPv6 next hops for the IPv4 prefixes.

  • Here we are using IPv6 next hops
  • The Neighbor Table would be consulted to get the MAC address for the L2 Frame encapsulation

OSS Support

  • ExaBGP Support v4 via IPv6: Sample Config

I am sure there are more. Comment away other suggestions and I'll add.

Vendor Support

  • TODO: Finish + add config snippets

Cisco

TBA

EOS

  • We have EOS doing IPv4 via IPv6 @ Meta

TBA

JunOS

TBA

I found that jool has very good tutorials, but all the commands needed to get going are hidden in those large tutorials. Here are the steps I took to get it working on Ubuntu 20.04, on both a Raspberry Pi + a Protectli Vault.

Please pre-read and refer to Jool's Documentation for more information.

I have two Ubuntu 20.04 routers at home that run jool. Both routers/firewalls use nftables, so I'm just using jool in netfilter mode. When direct nftables support is implemented, I will move to that setup.

Quick Start/Setup jool

  • On Ubuntu 20.04 it's just an apt install
    • apt install jool-dkms jool-tool
  • sudo modprobe jool
    • Add jool to /etc/modules to make persistent
  • Add Stateful NAT64 pool
    • jool instance add --netfilter --pool6 64:ff9b::/96
    • Here I used a oneshot systemd service to add it on boot:
[Unit]
Description=Add NAT64 netfilter pool6 to jool

[Service]
Type=oneshot
ExecStart=/usr/bin/jool instance add --netfilter --pool6 64:ff9b::/96

[Install]
WantedBy=multi-user.target

Handy Commands

  • See instance
    • jool instance display
    • jool instance status
  • See sessions
    • jool session display
  • Global config
    • jool global display
  • Overall stats
    • jool stats display

Testing

Try to ping + traceroute Google's main IPv4 NS anycast address 8.8.8.8 via IPv6 as 64:ff9b::8.8.8.8:

                              My traceroute  [v0.93]
coopbuntu (fd00:1::10)                                   2020-12-14T04:37:45+0000
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                         Packets               Pings
 Host                                  Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. fd00:1::2                           0.0%    11    2.5   2.3   2.1   2.6   0.1
 2. (waiting for reply)
 3. 64:ff9b::6022:7b58                  0.0%    11    9.9  13.6   9.9  20.7   3.6
 4. 64:ff9b::6022:7576                  0.0%    11   18.6  14.0  11.3  18.6   2.7
 5. 64:ff9b::6022:7932                  0.0%    10   14.0  17.1  13.2  21.9   3.0
 6. 64:ff9b::6022:b5                    0.0%    10   20.6  22.3  18.1  28.2   3.5
 7. 64:ff9b::6022:301                   0.0%    10   22.1  18.9  17.0  22.1   1.5
 8. 64:ff9b::4a7d:3072                  0.0%    10   26.2  21.3  17.7  36.2   5.9
 9. 64:ff9b::6caa:e636                  0.0%    10   24.1  19.2  17.3  24.1   2.5
10. 64:ff9b::4a7d:fc97                  0.0%    10   19.6  21.1  18.0  27.3   3.1
11. 64:ff9b::808:808                    0.0%    10   17.1  18.2  16.5  24.5   2.4

Once ICMP works, move on to TCP.

  • ssh -v 64:ff9b::173.255.255.199
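The 64:ff9b::/96 well-known prefix embedding (RFC 6052) is just the 32-bit IPv4 address dropped into the low 32 bits of the prefix, which a couple of lines of Python can demonstrate:

```python
import ipaddress

def nat64(v4: str, prefix: str = "64:ff9b::") -> ipaddress.IPv6Address:
    """Embed an IPv4 address into a /96 NAT64 prefix (RFC 6052)."""
    base = int(ipaddress.IPv6Address(prefix))
    return ipaddress.IPv6Address(base | int(ipaddress.IPv4Address(v4)))

print(nat64("8.8.8.8"))          # -> 64:ff9b::808:808 (the last traceroute hop above)
print(nat64("173.255.255.199"))  # -> 64:ff9b::adff:ffc7 (the ssh target)
```

Handy for working out which v4 endpoint a 64:ff9b:: address in a traceroute or session table actually is.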

Session Table

  • jool session display is your friend to see current translations
    • --numeric disables the (slow, non-parallel) DNS resolution
Every 1.0s: sudo jool session display                                  home1.cooperlees.com: Mon Dec 14 04:42:17 2020

---------------------------------
(ESTABLISHED) Expires in 1:59:31.440
Remote: us.cooperlees.com#ssh   fd00:1::10#48656
Local: 66.214.99.163#61019      64:ff9b::adff:ffc7#22
---------------------------------
(V4_FIN_V6_FIN_RCV) Expires in 0:03:14.796
Remote: 5.85.222.35.bc.googleusercontent.com#http       fd00:1::10#43868
Local: 66.214.99.163#62581      64:ff9b::23de:5505#80
---------------------------------

Mr. Aijay Adams and I are back, making my fireplace Internet / smart-device controllable. Now, via a very sexy Web UI, when I'm heading back to Chateau Tahoe I can turn my fireplace on so it's ready as soon as I walk in the door. Sexy warmth controlled by a sexy custom-made API.

  • A goal was to keep the original switch working too, so we can be Cave people as well!

Web UI

Gorgeous Web 0.69 design.

Install Photos

Tech Specs

Hardware:

  • Raspberry Pi 4
    • Relay Hat on the GPIO

Software:

Firestarter API:

  • / - Status of the fireplace
  • /turn_off - Turn the fireplace off
  • /turn_on - Turn the fireplace on

Are you using the latest Linux kernel firewall? Here are some notes I've saved that I use and forget all the time.
I plan to add to this as I do more. Hopefully it helps you work something out one day.

Note: I am using inet tables combining my IPv4 and IPv6 rulesets.

List Tables

sudo nft list table inet filter -n -a
sudo nft list table inet nat -n -a

  • -n: numeric
  • -a: handle (object handles)

Add a rule

nft insert rule inet filter OUTPUT position 0 icmpv6 type {nd-router-advert} drop

Delete a rule

nft delete rule inet filter OUTPUT handle 41

ICMPv6 Types

Noting some handy IPv6 ICMP types. I use nftables to block RAs when my WAN is down.

  • nd-router-advert == 134

tcpdump expressions

  • tcpdump -v -i en0 'ip6[40] = 134'

Recently in the Terragraph project I work on, we changed from RPM to OPKG to remove some dependencies (e.g. perl) and make our overall image size smaller. I've never driven OPKG, but I know RPM, so I made this cheat sheet for my shit memory.

I'm cheap so I don't have a table plugin - so I used Python to generate one 🤠

+--------------------------+-------------------------------+
| RPM Cmd                  | OPKG Cmd                      |
+--------------------------+-------------------------------+
| rpm -qa                  | opkg list-installed           |
| rpm -qf <FILE>           | opkg search <FILE>            |
| rpm -i[vh] --force <PKG> | opkg install [--force*] <PKG> |
| rpm -e --force <PKG>     | opkg remove [--force*] <PKG>  |
+--------------------------+-------------------------------+
- opkg Force Examples:
  --force-depends|--force-downgrade|--force-remove
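For the curious, the Python in question needs nothing fancy; a sketch of the sort of table helper I mean (illustrative, not my exact script):

```python
def ascii_table(headers, rows):
    """Render rows into the +---+ style table above."""
    cols = [headers] + rows
    widths = [max(len(row[i]) for row in cols) for i in range(len(headers))]

    def line():
        return "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    out = [line(), fmt(headers), line()]
    out += [fmt(row) for row in rows] + [line()]
    return "\n".join(out)

print(ascii_table(
    ["RPM Cmd", "OPKG Cmd"],
    [["rpm -qa", "opkg list-installed"],
     ["rpm -qf <FILE>", "opkg search <FILE>"]],
))
```

Column widths are computed from the data, so adding rows never breaks the borders.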

I often use a lot of PyPI CLI tools. Here is an example of how to get them easily installed and kept up to date via Ansible on Ubuntu >= 18.04.

Install base pip via apt then run pip:

- name: Get Python3 pip
  package:
    name: python3-pip
    state: latest

- name: Add some handy Python PyPI Tools
  pip:
    name: "{{ item }}"
    extra_args: --upgrade
  with_items:
    - "black"
    - "coverage"
    - "mypy"
    - "pip"
    - "setuptools"

Enjoy up to date core Python tools + handy CLIs for dev work.

Please do NOT use on a Production host ...