hedgehog

joined 2 years ago
[–] hedgehog@ttrpg.network 1 points 2 hours ago

It may be aware of them, but not in that context. If you asked it how to solve the problem rather than to solve the problem for you, there’s a chance it would suggest you use a reverse image search.

[–] hedgehog@ttrpg.network 3 points 10 hours ago (2 children)

LLM image processing doesn’t work the same way reverse image lookup does.

Tldr explanation: Multimodal LLMs turn pictures into a ~~thousand~~ 200-500 or so ~~words~~ tokens, but reverse image lookups create perceptual hashes of images and look the hash of your uploaded image up in a database.

Much longer explanation:

Multimodal LLMs (technically, LMMs - large multimodal models) use vision transformers to turn images into tokens. They use tokens for words, too, but image tokens don't correspond to words. There are multiple ways this could be implemented, but a common approach is to break the image down into a grid, then transform each "patch" of a specific size, e.g., 16x16 pixels, into a single token. The patches aren't transformed individually - the whole image is processed together, in context - but the model still comes out of it with roughly 200 tokens that allow it to respond to the image the same way it would respond to text.
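As a rough sketch of the arithmetic (the 224x224 input and 16x16 patch sizes are illustrative defaults from the original ViT paper, not a claim about any particular LMM):

```python
# Rough sketch of how a vision transformer's patch grid determines the
# number of image tokens. Input and patch sizes here are illustrative.

def image_token_count(width: int, height: int, patch: int = 16) -> int:
    """Number of patches (and thus tokens) for a patch-grid tokenizer."""
    assert width % patch == 0 and height % patch == 0, "dims must divide evenly"
    return (width // patch) * (height // patch)

# A 224x224 image with 16x16 patches yields a 14x14 grid:
print(image_token_count(224, 224))  # 196 tokens - i.e., "200 or so"
```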

Current vision transformers also struggle with spatial awareness. They embed basic positional data into the tokens, but that positional encoding is fragile and unsophisticated. Fortunately there's a lot to explore in that area, so I'm sure there will continue to be improvements.

One example improvement, beyond better spatial embeddings, would be a dynamic vision transformer that's dependent on the context, or that can re-evaluate an image based on new information. Outside the use of vision transformers, simply training LMMs to use other tools on images when appropriate could address many of the current shortcomings of LMM image processing.

Given all that, asking an LLM to find the album for you - assuming you've given it the ability and permission to search the web - is like showing the image to someone with no context and asking them to figure out which music video it's from. They've never seen the video, and they can only describe the artist's appearance in 10-20 generic words, none of which are the artist's name. You'd have to hope the image contained - and that they noticed and remembered - the specific details that would put the video in the top ten results of a Google search. That's a convoluted way to say it's a hard task.

By contrast, reverse image lookup basically uses a perceptual hash generated for each image. It’s the tool that should be used for your particular problem, because it’s well suited for it. LLMs were the hammer and this problem was a torx screw.
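To make the perceptual-hash idea concrete, here's a toy average-hash (aHash) in plain Python. Real reverse image search pipelines downscale the image first (e.g., to 8x8), use sturdier hashes, and query a large index - but the core idea of "reduce the image to a short fingerprint, then compare fingerprints by Hamming distance" looks like this:

```python
# Toy average-hash (aHash): threshold each pixel of a tiny grayscale
# "image" against the mean brightness, pack the bits into an integer
# fingerprint, and compare fingerprints by Hamming distance.

def average_hash(pixels: list[list[int]]) -> int:
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")  # number of differing bits

img = [[10, 200], [220, 30]]        # stand-in for a downscaled image
tweaked = [[12, 198], [221, 29]]    # slightly edited copy
mirrored = [[200, 10], [30, 220]]   # very different pixel layout

h = average_hash(img)
print(hamming(h, average_hash(tweaked)))   # 0 - near-duplicates collide
print(hamming(h, average_hash(mirrored)))  # 4 - every bit flipped
```

A lookup service stores the fingerprint of every indexed image, so matching your upload is a cheap nearest-fingerprint search rather than anything resembling "understanding" the picture.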

Suggesting a reverse image lookup tool - or better, using one itself - is what the LLM should do in this instance. But it would need to have been trained to suggest this, be capable of using a tool that could do the lookup, and have both access and permission to do the lookup.

Here’s a paper that might help you understand the gaps between LMMs and tools built for a specific purpose: https://arxiv.org/html/2305.07895v7

[–] hedgehog@ttrpg.network 2 points 12 hours ago

Thank you! That gives me a starting point that should be easy to look up!

[–] hedgehog@ttrpg.network 8 points 12 hours ago

From the blog post referenced:

We do not provide evidence that:

AI systems do not currently speed up many or most software developers

Seems the article should be titled “16 AI coders think they’re 20% faster — but they’re actually 19% slower” - though I guess making us think it was intended to be a statistically significant finding was the point.

That all said, this was genuinely interesting and is in line with my understanding of the human psychology at play. It would be nice to see this at a wider scale, broken down across different methodologies, toolsets, and models.

[–] hedgehog@ttrpg.network 1 points 12 hours ago (1 children)

Why is 255 off limits? What is 127.0.0.0 used for?

To clarify, I meant that specific address - if the range starts at 127.0.0.1 for local, then surely 127.0.0.0 does something (or is reserved to do something, even if it never does in practice), too.

Advanced setup would include a reverse proxy to forward the requests from the applications port to the internet

I use Traefik as my reverse proxy, but I have everything on subdomains for simplicity’s sake (no path mapping except when necessary, which it generally isn’t). I know 127.0.0.53 has special meaning when it comes to how the machine directs particular requests, but I never thought to look into whether Traefik or any other reverse proxy supported routing rules based on the IP address. But unless there’s some way to specify that IP and the IP of the machine, it would be limited to same device communications. Makes me wonder if that’s used for any container system (vs the use of the 10, 172.16-31, and 192.168 blocks that I’ve seen used by Docker).
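For what it's worth, the blocks Docker draws from are the RFC 1918 private ranges, which Python's stdlib `ipaddress` module can check directly (the sample addresses below are just illustrations):

```python
import ipaddress

# The three RFC 1918 private blocks mentioned above; Docker allocates
# its default bridge networks from within these ranges.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def private_block(addr: str):
    """Return the RFC 1918 block containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    return next((net for net in PRIVATE_BLOCKS if ip in net), None)

print(private_block("172.17.0.2"))  # 172.16.0.0/12 - Docker's default bridge lives here
print(private_block("127.0.0.53"))  # None - loopback, not an RFC 1918 range
```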

Well this is another advanced setup but if you wanted to segregate two application on different subnets you can. I’m not sure if there is a security benefit by adding the extra hop

Is there an extra hop when you’re still on the same machine? Like an extra resolution step?

I still don’t understand why .255 specifically is prohibited. 8 bits can go up to 255, so it seems weird to prohibit one specific value. I’ve seen router subnet configurations that explicitly cap the top of the range at .254, though - I feel like I’ve also seen some that capped at .255, but I don’t have that hardware available to check. So my assumption is that it’s implementation specific, but I can’t think of an implementation that would need to reserve all the .255 values. If it were just the last one, that would make sense - e.g., as a convention for where the DHCP server lives on each network.
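One data point on this: by IPv4 convention, the all-zeros host of a subnet is the network address (it identifies the subnet itself) and the all-ones host is the broadcast address, so in a /24 the usable hosts run .1 through .254 - which would explain routers capping at .254. Python's stdlib `ipaddress` module reflects the same convention:

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")
print(net.network_address)    # 192.168.1.0   - identifies the subnet itself
print(net.broadcast_address)  # 192.168.1.255 - all-ones host, not assignable
print(net.num_addresses - 2)  # 254 usable host addresses

# The same convention makes 127.0.0.0 the network address of the
# loopback block, while 127.0.0.1 through 127.255.255.254 are loopback:
lo = ipaddress.ip_network("127.0.0.0/8")
print(lo.network_address)                        # 127.0.0.0
print(ipaddress.ip_address("127.0.0.53") in lo)  # True
```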

[–] hedgehog@ttrpg.network 3 points 15 hours ago (5 children)

Why is 255 off limits? What is 127.0.0.0 used for?

[–] hedgehog@ttrpg.network 2 points 15 hours ago (1 children)

PSTN is wiretapped.

It’s a good thing that the website itself supports sending and receiving alerts, then.

[–] hedgehog@ttrpg.network 1 points 1 day ago

Current generation iPad Pros and Airs have the same processing power as Apple Silicon Macs. That’s more than enough for Blender. Even the base iPad and the iPad Mini likely have enough processing power - though I don’t think the base iPad has enough RAM.

[–] hedgehog@ttrpg.network 1 points 2 days ago (1 children)

Does mirroring a screen (or adding a screen) from a computer or connecting to a computer via remote desktop count?

[–] hedgehog@ttrpg.network 6 points 6 days ago

if everyone thought like you no one would create digital media

This is obviously incorrect.

[–] hedgehog@ttrpg.network 0 points 1 week ago (1 children)

When did Democrats have a supermajority in both the House and the Senate?

[–] hedgehog@ttrpg.network 1 points 1 week ago (1 children)

I thought Hue bulbs used Zigbee?

 

This only applies when the homophone is spoken or part of an audible phrase, so written text is safe.

It doesn’t change reality, just how people interpret something said aloud. You could change “Bare hands” to be interpreted as “Bear hands,” for example, but the person wouldn’t suddenly grow bear hands.

You can only change the meaning of the homophones.

It’s not all or nothing. You can change how a phrase is interpreted for everyone, or:

  • You can affect only a specific instance of a phrase - including all recordings of it, if you want - but you need to hear that instance - or a recording of it - to do so. If you hear it live, you can affect everyone else’s interpretation as it’s spoken.
  • You can choose not to affect how it is perceived by people when they say it aloud, and only when they hear it.
  • You can affect only the perception of particular people for a given phrase, but you must either point at them (pictures work) or be able to refer to them with five or fewer words, at least one of which is a homophone - for example, “my aunt.” Note that if you do this, both interpretations of the homophone are affected, if relevant (e.g., “my ant”).
  • You can make it so there’s a random chance (in 5% intervals, from 5% to 95%) that a phrase is misinterpreted.
 

cross-posted from: https://lemmy.world/post/19716272

Meta fed its AI on almost everything you’ve posted publicly since 2007

 

The video teaser yesterday about this was already DMCAed by Nintendo, so I don’t think this video will be up long.
