this post was submitted on 08 Apr 2026

https://en.wikipedia.org/wiki/Private_Use_Areas

I came across a Python library that shifted the ASCII range into one of these non-printable character ranges and then wrote the result into a database. If someone were doing that manually with a hex table, how would it be detected and mitigated?

top 16 comments
[–] Arcanoloth@lemmy.ml 5 points 20 hours ago

Whatever you're trying to achieve: You're decidedly approaching it from the wrong angle.

[–] lurch@sh.itjust.works 4 points 21 hours ago

It's not detected or mitigated, because it's nothing bad. People can use whatever data formats or file formats they want. This is just another weird data format.

[–] vk6flab@lemmy.radio 6 points 23 hours ago (1 children)

Globally we've agreed that the ASCII code for a space is 32 and that the code for the letter A is 65.

Unicode characters are also globally defined, so when someone uses an agreed-upon code, everyone sees the same thing, like this grinning face 😁

A private use area is a range that we've all agreed is reserved for "private use". If a trademark owner wants to use their special character in their documentation, they can assign a code point in that range to it, but the only people who will see it the same way are those who have installed their particular font.

Anyone without that font would see whatever the font on their own machine displayed.
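To make the distinction concrete, here is a small Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration. Assigned characters carry a standard name, while a private-use code point reports the general category `Co` and has no name at all.

```python
import unicodedata

# Agreed-upon code points: everyone maps these to the same character.
assert ord(" ") == 32
assert ord("A") == 65

def describe(ch: str) -> str:
    """Return 'U+XXXX <category> <name or placeholder>' for one character."""
    name = unicodedata.name(ch, "<no standard name>")
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}"

print(describe("A"))       # an assigned letter
print(describe("\ue000"))  # first code point of the BMP Private Use Area
```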

Putting random stuff in such a place is no more than putting gobbledygook in a text and it might even be used as a way to fingerprint text.

I'm not sure what you want to "detect" or "mitigate".

[–] j4k3@lemmy.world 1 points 12 hours ago

I need to block a known threat actor already present on a system. Call it an exercise. I am interested in exploring and understanding it. This text obfuscation is a technique I discovered being used. This is a stage 3 threat model type of situation where every possible vector is in play.

This is not a theoretical, RTFM, or read-and-trust-the-source situation.

[–] trigg@lemmy.world 5 points 23 hours ago (1 children)

I can't work out what you're asking.

You use "mitigated" like this is some kind of exploit but it's just unicode text still.

What is the problem with private use areas of unicode?

[–] j4k3@lemmy.world -1 points 23 hours ago (2 children)

It is non printing. It cannot be seen or scanned or highlighted. It looks like nothing, except the file size is large with more hex than should be in the binary.
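The size difference is easy to check, assuming the text is stored as UTF-8 (the thread does not say which encoding was involved). An ASCII character takes one byte, while any code point in the Private Use Area of the Basic Multilingual Plane (U+E000 to U+F8FF) takes three. A rough Python sketch:

```python
ascii_text = "secret"
# Same number of characters, but every one is a private-use code point.
pua_text = "\ue000" * len(ascii_text)

ascii_size = len(ascii_text.encode("utf-8"))
pua_size = len(pua_text.encode("utf-8"))

print(len(ascii_text), ascii_size)  # characters vs. bytes for ASCII
print(len(pua_text), pua_size)      # characters vs. bytes for private-use text
```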

[–] trigg@lemmy.world 4 points 23 hours ago (1 children)

I'm still not seeing why that is a problem. The information remains even if it has no glyphs.

[–] j4k3@lemmy.world 0 points 14 hours ago (1 children)

It does not. It can be rendered as a control character.

[–] trigg@lemmy.world 2 points 11 hours ago (1 children)
[–] j4k3@lemmy.world 1 points 7 hours ago (1 children)

No one reads hex as strings IRL.

[–] trigg@lemmy.world 2 points 6 hours ago

But it means nothing. You can cypher in much more efficient or clever ways.

[–] MartianSands@sh.itjust.works 3 points 23 hours ago (1 children)

It ought to look like a bunch of □, which is the glyph generally used to indicate that the font has nothing to represent the character.

Specifically, you'd expect something like U+25A1 □ WHITE SQUARE, though the exact shape comes from the font's own "missing glyph" placeholder.

[–] MartianSands@sh.itjust.works 4 points 23 hours ago (2 children)

Also, the answer to your actual question is no. There's definitely no way to block people from using any particular characters at the kernel level.

What you seem to be asking for is a way to absolutely forbid all software from writing certain characters to files, and/or from reading those characters. That would require the kernel to inspect all data in detail before letting other software have it, which would slow everything way down. It would also prevent anyone from reading or writing binary data which happens to contain those sequences of bytes by coincidence. Binary data includes things like the programs which make the system work, so blocking those characters would be terminal.
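The coincidence problem can be shown with a short Python sketch. It assumes a hypothetical filter that scans for the UTF-8 bytes of U+E000; the number packed below is chosen for illustration. An ordinary 32-bit integer stored as binary data contains the same byte sequence, so a byte-level filter would reject it too.

```python
import struct

# UTF-8 encoding of U+E000: the pattern a naive byte-level filter would look for.
pattern = "\ue000".encode("utf-8")

# An unrelated number stored as a little-endian 32-bit unsigned integer.
blob = struct.pack("<I", 0x008080EE)

print(pattern.hex(), blob.hex())
print(pattern in blob)  # True means the filter would flag this binary data
```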

[–] j4k3@lemmy.world -1 points 11 hours ago (1 children)

Not necessarily. Turn this around. Let's say I am working at somewhere like a chip foundry with tons of IP. I have no access to encryption tools, but I can easily shift characters to a hex range in bash and send emails.

These characters can use the control glyph, and so do not print or show up in any physical way except in hex.

This technique must be obfuscated at every serious organization from governments to industry.
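For reference, a minimal sketch of the kind of shift being described, written in Python rather than bash. The offset `0xE000` and the function names are assumptions for illustration, not taken from the library mentioned in the post. It is a fixed substitution, so anyone who knows or guesses the offset can reverse it.

```python
PUA_BASE = 0xE000  # assumed offset: start of the BMP Private Use Area

def to_pua(text: str) -> str:
    """Shift each ASCII character into the Private Use Area."""
    if not text.isascii():
        raise ValueError("this sketch only handles ASCII input")
    return "".join(chr(PUA_BASE + ord(c)) for c in text)

def from_pua(text: str) -> str:
    """Undo to_pua by subtracting the same offset."""
    return "".join(chr(ord(c) - PUA_BASE) for c in text)

hidden = to_pua("wafer specs")
print([f"U+{ord(c):04X}" for c in hidden])
print(from_pua(hidden))
```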

[–] trigg@lemmy.world 2 points 11 hours ago

Encryption can be done by hand. This isn't the problem you appear to imagine it is.

[–] tal@lemmy.today 3 points 23 hours ago

Also, (a) userspace could have some higher-level encoding or encryption or compression that happens without the kernel seeing the non-encoded data, and (b) whatever particular Unicode encoding OP is probably thinking of isn't the only Unicode encoding out there.
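Point (b) can be illustrated in Python: the same private-use code point produces different bytes under UTF-8, UTF-16, and UTF-32, so a filter keyed to one byte pattern misses the others. The little-endian variants are used here only to keep a byte-order mark out of the output.

```python
ch = "\ue000"  # one private-use code point
encodings = ["utf-8", "utf-16-le", "utf-32-le"]

# Same character, three different byte sequences.
encoded = {enc: ch.encode(enc) for enc in encodings}

for enc, data in encoded.items():
    print(f"{enc:10} {data.hex(' ')}")
```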

That doesn't, strictly speaking, mean that it's impossible to have kernel-level blocking: you could create some kind of emulated system that inspects everything, but it does mean that you couldn't just inspect data at points where one normally enters the kernel.

The answer that is probably most useful to OP is that if it's a problem for his application, he should validate it in userspace.
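A minimal sketch of what that userspace validation might look like in Python, assuming the input has already been decoded to a `str`. The function names are hypothetical. The check relies on the standard library's `unicodedata.category`, which reports `Co` for private-use code points.

```python
import unicodedata

def find_private_use(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every private-use code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]

def validate(text: str) -> str:
    """Reject input containing private-use characters before storing it."""
    hits = find_private_use(text)
    if hits:
        raise ValueError(f"private-use characters found: {hits}")
    return text

print(find_private_use("ok\ue041"))
```

Whether to reject, strip, or just log such input is a policy choice for the application; the same category test could be extended to other categories, such as format characters (`Cf`), if those matter too.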