this post was submitted on 04 Apr 2026
422 points (99.8% liked)
Science Memes
“Wait, why is this not a CSV file?”
God I hate csv with the fire of a thousand suns.
Contractors never seem to know how to write them correctly. Last year, one even provided “csv”s that were just Oracle error messages. lol. Another told me their system could not quote string columns, escape commas, or use any separator other than a comma, so rows had an unpredictable number of commas whenever the actual data contained commas. Total nightmare. And so much of my data has special character issues because somewhere in the pipeline a text encoding was wrong, and there is exactly one mangled character in 5 million lines for me to find.
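For the "one mangled character in 5 million lines" hunt, a minimal sketch of one way to narrow it down: decode line by line and report the lines that fail. The function name and the sample bytes are made up for illustration, and UTF-8 is an assumption about the intended encoding. This only catches bytes that are invalid in that encoding; mojibake that happens to decode cleanly will slip through.

```python
def find_undecodable_lines(raw: bytes, encoding: str = "utf-8"):
    """Yield (line_number, raw_line) for every line that fails to decode.

    Hypothetical helper; `encoding` is the encoding the file is *supposed*
    to be in, which is an assumption you have to supply.
    """
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            yield number, line


# Made-up sample: line 2 is valid UTF-8, line 3 contains a lone Latin-1 byte.
sample = b"id,name\n1,Zo\xc3\xab\n2,Ren\xe9\n3,Ann\n"
bad = list(find_undecodable_lines(sample))
print(bad)
```

For a real file you would read it in binary mode (`open(path, "rb")`) and iterate over the file object instead of loading everything into memory.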
Give me the data as close to the source data as you can. If it is a database, then a database dump or access to a clone of your database is the best option by far. I don’t care how obscure your shit is, I'll do the conversion myself.
For intermediate data, use something like Parquet or language-specific formats like RData or pickle files. Maaaaybe very carefully created CSV files for archival purposes, but even then, I think Parquet is safe for the long haul nowadays.
I can't tell you how many scripts I've written to format poorly made CSV files
The essence of data science
After many years of being a developer I've come to the conclusion that the single strongest indicator of a person's competence is how they handle CSV when asked to produce or consume it.
Reminds me of writing my own csv parser that implemented escapes properly. The one everyone else went with of course was written in regex, so it was faster... But broke if there were escaped newlines.
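A small sketch of the newline problem, using Python's standard `csv` module rather than the hand-written parser described above (which isn't shown here). The sample text is made up. Splitting on line breaks treats a quoted newline as a row boundary, while a parser that understands quoting keeps the field intact.

```python
import csv
import io

# Made-up sample: the second record has a newline inside a quoted field.
text = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Naive approach: split on line breaks first, then on commas.
naive_rows = [line.split(",") for line in text.splitlines()]

# Quote-aware approach: let the csv module find the record boundaries.
proper_rows = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive_rows), "rows from naive splitting")
print(len(proper_rows), "rows from csv.reader")
print(proper_rows[1])
```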
P.S. In the above quagmire, the only solution is to keep only the most important unclean column per CSV and make it the last column in the file, so the columns before it stay predictable. If you need more, then write separate CSVs. Computers are stupid.
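Roughly what that workaround looks like on the reading side, as a sketch: if only the last column can contain stray commas, split a fixed number of times from the left and let the remainder be the free-text field. The function name and sample row are invented, and this assumes the file has no quoting at all and no embedded newlines.

```python
def split_trailing_free_text(line: str, n_columns: int) -> list[str]:
    """Split an unquoted CSV line where only the LAST column may contain commas.

    Hypothetical helper: splits at most n_columns - 1 times from the left,
    so any extra commas stay inside the final field.
    """
    return line.rstrip("\r\n").split(",", n_columns - 1)


# Made-up row: three columns, the last one is free text with commas in it.
row = split_trailing_free_text("42,2024-01-05,Smith, John, Jr.\n", 3)
print(row)
```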
If you could choose the column order, you could choose a better format, or at least escape correctly.
It was some sort of weird database frontend the contractor used. It was very limited.
What delimiter should I be using instead of commas?
🤪 as a delimiter
🥦 for end of line
Use comma for delimiter, and escape any comma in the data by enclosing that entry in quotes.
Data: 225 | 2,500 | 450
CSV: 225,"2,500",450
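The same example through Python's standard `csv` module, as a quick sketch: with its default minimal quoting, the writer quotes only the fields that need it, and the reader undoes the quoting again.

```python
import csv
import io

# Write the row from the example above; lineterminator is set to "\n"
# only to keep the output easy to read (the module default is "\r\n").
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["225", "2,500", "450"])
line = buf.getvalue()
print(line, end="")

# Read it back: the quoted field comes out as a single value.
row = next(csv.reader(io.StringIO(line)))
print(row)
```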
Semicolons. Tabs. Something not in the actual strings. Or however the Python csv module formats it.
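A sketch of the tab-separated variant with the standard `csv` module; the sample values are made up. Commas in the data no longer need quoting, and if a field does happen to contain a tab, the writer quotes that field instead of breaking the row.

```python
import csv
import io

# Made-up row: two fields with commas, one field with an embedded tab.
original = ["Smith, John", "2,500", "note with\ttab"]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(original)
line = buf.getvalue()
print(repr(line))

# Reading with the same delimiter recovers the original fields.
round_trip = next(csv.reader(io.StringIO(line), delimiter="\t"))
print(round_trip)
```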
Alt-008