site defense
defending my reverse proxy from very light, unintentional DoS
Long story short: I blacklisted most of the world from accessing the HTML
render of my nixpkgs fork on pub.npry.dev, since crawlers, mostly in Asia
(India, China, and Singapore), wouldn't respect my robots.txt. They were
continuously scraping the HTML pages listing the code for nixpkgs, which is
famously a very large repo. These pages are dynamically rendered by pygments,
which was sitting at 100% CPU (single core) syntax highlighting thousands of
large Nix derivations. That was stalling out my browsing of my own website.
I had previously addressed this problem via an automated rdap tool,
but this only emitted logs that I later integrated into my nixpkgs config.
This required ongoing manual maintenance to keep things working, and the
problem was never really access to my sites, just this pathological scraping
behavior in large repos, so I figured a geo-IP solution would work. This
was easier to do than I expected: the MaxMind GeoLite2 databases are free,
and there's a NixOS module for downloading and updating them,
services.geoipupdate. The keys are straightforward to integrate through
{sops,age}-nix.
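For illustration, the geoipupdate side might look roughly like the sketch below. This is not my exact config: the secret name maxmind_license_key and the account ID are placeholders, and it assumes sops-nix is already imported and set up with a default sops file.

```nix
{ config, ... }:
{
  # Hypothetical secret name; the encrypted value lives in your sops file.
  sops.secrets.maxmind_license_key = { };

  services.geoipupdate = {
    enable = true;
    # systemd calendar expression controlling how often the databases refresh.
    interval = "weekly";
    settings = {
      # Placeholder: substitute your own MaxMind account ID.
      AccountID = 123456;
      # Only the country database is needed for continent-level filtering.
      EditionIDs = [ "GeoLite2-Country" ];
      # Path to a file holding the license key, so the key itself
      # stays out of the world-readable Nix store.
      LicenseKey = config.sops.secrets.maxmind_license_key.path;
    };
  };
}
```

With agenix the shape should be the same, swapping the secret declaration and path for config.age.secrets.<name>.path.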
Nginx gets:
services.nginx.additionalModules = with pkgs.nginxModules; [
  geoip2
];
And a config like:
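geoip2 $PATH/GeoLite2-Country.mmdb {
  # Check the database file for changes every 5 minutes and reload it.
  auto_reload 5m;
  # Expose the client's continent code as an nginx variable.
  $geoip_continent_code continent code;
}
# Translate the continent code into an allow/deny flag.
map $geoip_continent_code $geo_continent_allow {
  default no;
  NA yes;
  EU yes;
  AN yes;
}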
And you can geofilter anything:
location /nixpkgs {
  if ($geo_continent_allow = no) {
    # 444 is nginx-specific: close the connection without sending a response.
    return 444;
  }
}
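If you'd rather keep the whole thing in the NixOS config, one way to wire the pieces together is sketched below. It is only a sketch, and it makes a few assumptions: that the databases live in the geoipupdate module's DatabaseDirectory setting (standing in for $PATH above), that the site is defined as a virtualHosts entry named pub.npry.dev, and that the location's proxying config is defined elsewhere and gets merged in.

```nix
{ config, pkgs, ... }:
let
  # Assumed location of the .mmdb files, taken from the geoipupdate module.
  dbDir = config.services.geoipupdate.settings.DatabaseDirectory;
in
{
  services.nginx = {
    additionalModules = with pkgs.nginxModules; [ geoip2 ];

    # geoip2 and map blocks belong in the http context.
    commonHttpConfig = ''
      geoip2 ${dbDir}/GeoLite2-Country.mmdb {
        auto_reload 5m;
        $geoip_continent_code continent code;
      }
      map $geoip_continent_code $geo_continent_allow {
        default no;
        NA yes;
        EU yes;
        AN yes;
      }
    '';

    # Only the geofilter is shown here; proxy settings for this
    # location are assumed to be set elsewhere.
    virtualHosts."pub.npry.dev".locations."/nixpkgs".extraConfig = ''
      if ($geo_continent_allow = no) {
        return 444;
      }
    '';
  };
}
```

The nginx variables survive the Nix string untouched because only ${...} triggers interpolation; a bare $name is passed through as-is.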