site defense

defending my reverse proxy from very light, unintentional DoS

2025-10-24

Long story short: I blacklisted most of the world from accessing the HTML render of my nixpkgs fork on pub.npry.dev, because crawlers, mostly in Asia (India, China, and Singapore), wouldn't respect my robots.txt. They were continuously scraping the HTML pages listing the code for nixpkgs, which is famously a very large repo. These pages are dynamically rendered by pygments, which was sitting at 100% CPU (single core) syntax-highlighting thousands of large Nix derivations and stalling my own browsing of the site.

I had previously addressed this problem with an automated rdap tool, but it only emitted logs that I later integrated into my nixpkgs config. This required ongoing manual maintenance to keep things working. The problem was never really access to my sites in general, just this pathological scraping of large repos, so I figured a geo-IP solution would work. It was easier than I expected: MaxMind's GeoLite2 databases are free, there's a NixOS module for downloading and updating them (services.geoipupdate), and the MaxMind license key is straightforward to integrate through {sops,age}-nix.
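
For reference, a minimal sketch of what the services.geoipupdate side might look like, assuming sops-nix. This is not my exact config: the account ID and the secret name maxmind/license_key are placeholders, and sops itself (defaultSopsFile, decryption keys) is assumed to be set up elsewhere.

```nix
{ config, ... }:
{
  # Hypothetical secret name; any sops-nix (or agenix) secret works the same way.
  sops.secrets."maxmind/license_key" = { };

  services.geoipupdate = {
    enable = true;
    settings = {
      AccountID = 123456; # placeholder: your MaxMind account ID
      EditionIDs = [ "GeoLite2-Country" ];
      # Path to a file containing the license key, read at runtime,
      # so the key itself never lands in the Nix store.
      LicenseKey = config.sops.secrets."maxmind/license_key".path;
    };
  };
}
```

With the module's defaults, the databases should end up under /var/lib/GeoIP, which is the directory the nginx config needs to point at.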

Nginx gets:

services.nginx.additionalModules = with pkgs.nginxModules; [
    geoip2
];

And a config like:

geoip2 $PATH/GeoLite2-Country.mmdb {
    auto_reload 5m;
    $geoip_continent_code continent code;
}

map $geoip_continent_code $geo_continent_allow {
    default no;

    NA yes;
    EU yes;
    AN yes;
}

And you can geofilter anything (444 is nginx's non-standard code for closing the connection without sending a response):

location /nixpkgs {
    if ($geo_continent_allow = no) {
        return 444;
    }
}
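
Put together in NixOS terms, the pieces above could be wired up roughly like this. It's a sketch under assumptions rather than a copy of my config: the proxyPass upstream is hypothetical, and the database path is taken from the geoipupdate module's DatabaseDirectory setting instead of being hardcoded.

```nix
{ config, pkgs, ... }:
let
  # Directory services.geoipupdate writes to (/var/lib/GeoIP by default).
  dbDir = config.services.geoipupdate.settings.DatabaseDirectory;
in
{
  services.nginx = {
    enable = true;
    additionalModules = with pkgs.nginxModules; [ geoip2 ];

    # http-level config: continent lookup plus the allow map.
    # Only "${...}" is interpolated by Nix; bare $variables pass through to nginx.
    commonHttpConfig = ''
      geoip2 ${dbDir}/GeoLite2-Country.mmdb {
          auto_reload 5m;
          $geoip_continent_code continent code;
      }

      map $geoip_continent_code $geo_continent_allow {
          default no;

          NA yes;
          EU yes;
          AN yes;
      }
    '';

    virtualHosts."pub.npry.dev".locations."/nixpkgs" = {
      # Hypothetical upstream; substitute whatever actually serves the repo.
      proxyPass = "http://127.0.0.1:8080";
      extraConfig = ''
          if ($geo_continent_allow = no) {
              return 444;
          }
      '';
    };
  };
}
```

Keeping the geoip2 and map blocks in commonHttpConfig means any other location or virtual host can reuse $geo_continent_allow with the same three-line if.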