this post was submitted on 14 Feb 2025
483 points (96.9% liked)

No Stupid Questions

37414 readers
1475 users here now

No such thing. Ask away!

!nostupidquestions is a community dedicated to being helpful and answering each others' questions on various topics.

The rules for posting and commenting, besides the rules defined here for lemmy.world, are as follows:

Rules (interactive)


Rule 1- All posts must be legitimate questions. All post titles must include a question.

All posts must be legitimate questions, and all post titles must include a question. Questions that are joke or trolling questions, memes, song lyrics as title, etc. are not allowed here. See Rule 6 for all exceptions.



Rule 2- Your question subject cannot be illegal or NSFW material.

Your question subject cannot be illegal or NSFW material. You will be warned first, banned second.



Rule 3- Do not seek mental, medical and professional help here.

Do not seek mental, medical and professional help here. Breaking this rule will not get you or your post removed, but it will put you at risk, and possibly in danger.



Rule 4- No self promotion or upvote-farming of any kind.

That's it.



Rule 5- No baiting or sealioning or promoting an agenda.

Questions which, instead of being of an innocuous nature, are specifically intended (based on reports and in the opinion of our crack moderation team) to bait users into ideological wars on charged political topics will be removed and the authors warned - or banned - depending on severity.



Rule 6- Regarding META posts and joke questions.

Provided it is about the community itself, you may post non-question posts using the [META] tag on your post title.

On fridays, you are allowed to post meme and troll questions, on the condition that it's in text format only, and conforms with our other rules. These posts MUST include the [NSQ Friday] tag in their title.

If you post a serious question on friday and are looking only for legitimate answers, then please include the [Serious] tag on your post. Irrelevant replies will then be removed by moderators.



Rule 7- You can't intentionally annoy, mock, or harass other members.

If you intentionally annoy, mock, harass, or discriminate against any individual member, you will be removed.

Likewise, if you are a member, sympathiser or a resemblant of a movement that is known to largely hate, mock, discriminate against, and/or want to take lives of a group of people, and you were provably vocal about your hate, then you will be banned on sight.



Rule 8- All comments should try to stay relevant to their parent content.



Rule 9- Reposts from other platforms are not allowed.

Let everyone have their own content.



Rule 10- Majority of bots aren't allowed to participate here.



Credits

Our breathtaking icon was bestowed upon us by @Cevilia!

The greatest banner of all time: by @TheOneWithTheHair!

founded 2 years ago
MODERATORS
 

I'm a tech interested guy. I've touched SQL once or twice, but wasn't able to really make sense of it. That combined with not having a practical use leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing).

With that, I keep seeing [pic related] as proof that Elon Musk doesn't understand SQL.

Can someone give me a technical explanation for how one would come to that conclusion? I'd love if you could pass technical documentation for that.

you are viewing a single comment's thread
view the rest of the comments
[–] 9point6@lemmy.world 291 points 1 week ago (1 children)

The statement "this [guy] thinks the government uses SQL" demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.

The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

If he knew the domain, he would know this isn't an issue. If he knew the technology he would be able to see the constraint and following investigation, reach the conclusion that it's not an issue.

The man continues to be a malignant moron

[–] spankmonkey@lemmy.world 30 points 1 week ago* (last edited 1 week ago) (3 children)

The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn't supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.

Note: if duplicate SSNs are accidentally issued my understanding is that they issue a new one to one of the people and I don't know how to find the start of the thread on twitter since I only use it when I accidentally click on a link to it.

https://www.ssa.gov/history/hfaq.html

Q20: Are Social Security numbers reused after a person dies?

A: No. We do not reassign a Social Security number (SSN) after the number holder's death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.

[–] Lightor@lemmy.world 1 points 6 days ago

My guess would be around your note. If someone mistakenly has two SSNs (due to fraud, error, or name changes), combining DOB helps detect inconsistencies.

Some other possibilities, and I'm just throwing out ideas at this point:

  • Adding DOB could help with manual lookups and verification.
  • Using SSN + DOB ensures a standard key format across agencies, making it easier to link records.
  • Prevents accidental duplication if an SSN is mistyped.
  • Maybe the databases were optimized for fixed-length fields, and combining SSN + DOB fit within memory constraints.
  • It was easier to locate records with a “human-readable” key. Where as something like a UUID is harder for humans to read or sift through.
[–] halcyonloon@midwest.social 27 points 1 week ago (3 children)

Take this with a grain of salt as I'm not a dev, but do work on CMS reporting for a health information tech company. Depending on how the database is designed an SSN could appear in multiple tables.

In my experience reduplication happens as part of generating a report so that all relevant data related to a key and scope of the report can be gathered from the various tables.

[–] DahGangalang@infosec.pub 25 points 1 week ago* (last edited 1 week ago) (1 children)

A given SSN appearing in multiple tables actually makes sense. To someone not familiar with SQL (i.e. at about my level of understanding), I could see that being misinterpreted as having multiple SSN repeated "in the database".

Of all the comments ao far, I find yours the most compelling.

[–] Barbarian@sh.itjust.works 14 points 1 week ago* (last edited 1 week ago) (2 children)

Theoretically, yeah, that's one solution. The more reasonable thing to do would be to use the foreign key though. So, for example:

SSN_Table

ID | SSN | Other info

Other_Table

ID | SSN_ID | Other info

When you want to connect them to have both sets of info, it'd be the following:

SELECT * FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID

EDIT: Oh, just to clear up any confusion, the SSN_ID in this simple example is not the SSN itself. To access that in this example query, it'd by SSN_Table.SSN

[–] schteph@lemmy.world 21 points 1 week ago (3 children)

This is true, but there are many instances where denormalization makes sense and is frequently used.

A common example is a table that is frequently read. Instead of going to the "central" table the data is denormalized for faster access. This is completely standard practice for every large system.

There's nothing inherently wrong with it, but it can be easily misused. With SSN, I'd think the most stupid thing to do is to use it as the primary key. The second one would be to ignore the security risks that are ingrained in an SSN. The federal government, being large as it is, I'm sure has instances of both, however since Musky is using his possy of young, arrogant brogrammers, I'm positively certain they're completely ignoring the security aspect.

[–] syklemil@discuss.tchncs.de 8 points 1 week ago

To be a bit more generic here, when you're at government scale you're generally deep in trade-off territory. Time and space are frequently opposed values and you have to choose which one is most important, and consider the expenses of both.

E.g. caching is duplicating data to save time. Without it we'd have lower storage costs, but longer wait times and more network traffic.

[–] DahGangalang@infosec.pub 7 points 1 week ago

Yeah, no one appreciates security.

I probably overused that saying to explain it: 'if theres no break ins, why do we pay for security? Oh, there was a break in - what do we even pay security for?'

[–] Barbarian@sh.itjust.works 3 points 1 week ago

Yeah, I work daily with a database with a very important non-ID field that is denormalized throughout most of the database. It's not a common design pattern, but it is done from time to time.

[–] DahGangalang@infosec.pub 2 points 1 week ago (1 children)

Yeah, databases are complicated and make my head hurt. Glancing through resources from other comments, I'm realizing I know next to nothing about database optimization. Like, my gut reaction to your comment is that it seems like unnecessary overhead to have that data across two tables - but if one sub-dept didn't need access to the raw SSN, but did need access to less personal data, j could see those stored in separate tables.

But anyway, you're helping clear things up for me. I really appreciate the pseudo code level example.

[–] Barbarian@sh.itjust.works 6 points 1 week ago* (last edited 1 week ago) (1 children)

It's necessary to split it out into different tables if you have a one-to-many relationship. Let's say you have a list of driver licenses the person has had over the years, for example. Then you'd need the second table. So something like this:

SSN_Table

ID | SSN | Other info

Driver_License_Table

ID | SSN_ID | Issue_Date | Expiry_Date | Other_Info

Then you could do something like pull up a person's latest driver's license, or list all the ones they had, or pull up the SSN associated with that license.

[–] Arcka@midwest.social 2 points 5 days ago

I think a likely scenario would be for name changes, such as taking your partner's surname after marriage.

[–] Ephera@lemmy.ml 8 points 1 week ago

The SSN is likely to appear in multiple tables, because they will reference a central table that ties it all together. This central table will likely only contain the SSN, the birth date (from what others have been saying), as well as potentially first and last name. In this table, the entries have to be unique.
But then you might have another table, like a table listing all the physical exams, which has the SSN to be able to link it to the person's name, but ultimately just adds more information to this one person. It does not duplicate the SSN in a way that would be bad.

[–] spankmonkey@lemmy.world 6 points 1 week ago (1 children)

It is common for long lived databases with a rotating cast of devs to use different formats in different tables as well! One might have it as a string, one might have it as a number, and the other might have it with hyphens in the same database.

Hell, I work in a state agency and one of our older databases has a dozen tables with databases.

  • One has the whole thing as a long int: 222333444
  • One has the whole thing as a string: 2223334444 (which of course can't be directly compared to the one that is a long int...)
  • One has separate fields for area code and the rest with a hyphen: 222 and 333-4444
  • One has the whole thing with parenthesis, a space, and a hyphen as a string: (222) 333-4444

The main reason for the discrepancy is not looking at what was used before or not understanding that they can always change the formatting when displayed so they don't need to include the parenthesis or hyphens in the database itself.

[–] pixxelkick@lemmy.world 4 points 1 week ago (1 children)

Okay but if that happens, musk is right that that's a bit of a denormalization issue that mayne needs resolving.

SSNs should be stored as strings without any hyphen or additional markup, nothing else.

  • Storing as a number can cause issues if you ever wanna support trailing zeros
  • any "styling" like hyphens should be handled by a consuming front end system, you want only the important data in the DB to maximize query times

It's more likely though it's just a composite key...

[–] spankmonkey@lemmy.world 15 points 1 week ago (1 children)

This is not what he is actively doing though. He isn't trying to improve databases.

He is tearing down entire departments and agencies and using shit like this to justify it.

[–] pixxelkick@lemmy.world 2 points 1 week ago* (last edited 1 week ago) (1 children)

Sure but my point is, if it was the scenario you described, then Elon would be talking about the right kind of denormalization problem.

Denormalization due to multiple different tables storing their own copies of the same data, in different formats worse yet, would actually be the kind of problem he's tweeting about.

As opposed to a composite key on one table which means him being an ultracrepidarian, as usual.

[–] spankmonkey@lemmy.world 8 points 1 week ago* (last edited 1 week ago) (1 children)

Musk canceled the support for the long running Common Education Data Standards (CEDS) which is an initiative to promote better database standards and normalization for the states to address this kind of thing.

It does not fucking matter if he is technically correct about one tiny detail because he is only using to to destroy, not to improve efficiency.

[–] pixxelkick@lemmy.world 0 points 1 week ago (1 children)

I mean it matters here, as it's literally the topic being actively discussed by the person who literally asked, so obviously it matters to them lol

[–] spankmonkey@lemmy.world 3 points 1 week ago (1 children)

The thing is, there are a large number of different reasons to store an SSN as a long int or a string depending on how it is used with the rest of the data. For a phone number, there can be a valid reason to store the area code separately to speed up data queries that narrow down by area code instead of all in one field and peeling it apart. There are also reasons to have additional, seemingly redundant, columns that can be used for optimizing searches or simplifying how queries are written.

A common one is that using 1 and 0 instead of Y an N is often faster for massively large dataset optimization, but isn't as easily human readable.

There are complex reasons for choosing different approaches in a database, and the most important thing is generally consistency within the database. His point is meaningless without context beyond consistency, and the different government systems will have had different priorities, not to mention trying to update all of the databases to make them consistent is a MASSIVE fucking undertaking. And the systems can stay the way they are as long as they have APIs or other methods of transferring data that ARE normalized and consistent.

I have personally been working with reporting data to federal systems for 15 years as a semi knowledgeable technical person. This is what I do for a job. What he is saying is pointlessly small trivia used to justify tearing things down instead of improving them.

[–] Feathercrown@lemmy.world 0 points 1 week ago

They weren't justifying anything or making a moral statement, they were just discussing the technical question that was posed.

[–] DahGangalang@infosec.pub 3 points 1 week ago

Beat me to asking this follow up, though you linking additional resources is probably more effort that I would have done. Thanks for that!