That assembly program the author compares against is waay bloated. This guy managed it in 105 bytes: https://nathanotterness.com/2021/10/tiny_elf_modernized.html (that is by overlapping part of the code with the ELF header, plus other shenanigans on a similar level). ;)
All kidding aside, interesting article.
Another aspect is that calling a CLI command is (in general) way slower than calling a library function. This is most apparent with short-running commands, since the overhead is mostly a fixed cost per invocation rather than something that scales with the amount of work or data.
As such, I would at the very least keep those commands out of any hot/fast paths.
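
For a rough feel of the gap, here is a minimal sketch in Python that times the same trivial work done in-process and done by spawning a child process. The function names (`via_library`, `via_cli`, `compare`) are made up for illustration, and the child is a fresh Python interpreter only so that the sketch runs anywhere. Interpreter startup makes the spawn look worse than a small native binary would, so treat the numbers as an illustration of the fixed per-invocation cost, not as a proper benchmark.

```python
import subprocess
import sys
import time


def time_per_call(fn, repeats):
    """Return the average wall-clock seconds per call of fn over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def via_library():
    # In-process work: a tiny computation standing in for a library call.
    return len("hello world".upper())


def via_cli():
    # The same tiny work, but done by spawning a child process.
    # Assumption: sys.executable stands in for "some CLI tool" so the
    # sketch is portable; a real tool would have a different startup cost.
    subprocess.run(
        [sys.executable, "-c", "len('hello world'.upper())"],
        check=True,
        stdout=subprocess.DEVNULL,
    )


def compare(repeats_lib=1000, repeats_cli=5):
    """Return (seconds per library call, seconds per CLI call)."""
    lib = time_per_call(via_library, repeats_lib)
    cli = time_per_call(via_cli, repeats_cli)
    return lib, cli


if __name__ == "__main__":
    lib, cli = compare()
    print(f"library call: {lib * 1e6:.2f} us per call")
    print(f"CLI call:     {cli * 1e6:.2f} us per call")
```

The work itself is negligible in both cases, so whatever difference shows up is almost entirely process creation and startup, which is the part you pay on every invocation.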