"sanitize" is a code smell
Sun Sep 7 08:46:27 2025
This is something of a hobby horse of mine, so forgive the rant: when I see something has been "sanitized" I treat it as a code smell (per Martin Fowler, "... a surface indication that usually corresponds to a deeper problem in the system"), and often find it reveals sloppy thinking which may not even prevent the exploits it is supposed to guard against.
Each data item in your system is a value, which has a canonical representation inside your system but may be represented in multiple different external formats at the boundaries of your system.
When we say "sanitize" we imply that the input data was "insanitary" (or even "insane", same etymological root I think) but it probably wasn't - it just didn't conform to the rules of some particular representation you had in mind that you would later need to output. So why is that particular representation special? Should "sanitizing" strip out backticks (special in shell)? The semicolon (special in SQL)? The angle brackets (HTML)? The string "+++" (Hayes modem commands)? The sequence ".." (pathnames)? The dollar sign (bound to be used somewhere)? Non-ASCII Unicode characters (can't put those in a domain name)?
Don't "sanitize". Encode and decode between the canonical internal representation and the external representation you need to interface with. Mr O'Leary will be happy, Sigur Rós will appreciate you've spelled their name right, and Smith & Sons, Artisan Greengrocers won't have their ampersand dropped.
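As a rough illustration of encoding at the boundary, here is a minimal Python sketch using only the standard library. The helper names (to_html, to_shell_arg, to_url_component) and the customer table are made up for this example, and the choice of HTML, shell, URL and SQL as output formats is an assumption about what your system talks to. The canonical value is never altered; each boundary gets its own encoding.

```python
import html
import shlex
import sqlite3
from urllib.parse import quote, unquote

# Canonical internal values: plain Unicode strings, kept exactly as entered.
names = ["Mr O'Leary", "Sigur Rós", "Smith & Sons, Artisan Greengrocers"]


def to_html(value: str) -> str:
    """Encode a canonical string for HTML text or attribute content."""
    return html.escape(value, quote=True)


def to_shell_arg(value: str) -> str:
    """Encode a canonical string as a single POSIX shell word."""
    return shlex.quote(value)


def to_url_component(value: str) -> str:
    """Percent-encode a canonical string for one URL path or query component."""
    return quote(value, safe="")


# SQL: pass the value as a bound parameter rather than splicing it into the
# query text, so the driver deals with the external representation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT)")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [(n,) for n in names])
stored = [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY rowid")]

for name in names:
    print(to_html(name), to_shell_arg(name), to_url_component(name), sep=" | ")
```

Decoding each external form (html.unescape, shlex.split, urllib.parse.unquote, or simply reading the row back) recovers the original string unchanged, apostrophe, accent and ampersand included.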