this post was submitted on 13 May 2026
Science Memes
I'm a data analyst at a medical nonprofit, primarily doing analyses on germline variants for rare forms of cancer. I'm new to this kind of work, but had a decent educational background in biology.
Something I've learned is that genetics is complicated as hell. A single gene can produce multiple different proteins, and proteins can change over time due to somatic variation. Only about 1% of the genome is protein coding; that part is called the exome. The exome can be affected by variations to start and stop codons, non-coding regions, and untranslated regions. There are entire fields dedicated to genome-wide studies, exomics, transcriptomics, proteomics, phenomics, and probably several others that I don't know about. The amount of data involved in these fields is in the tebibyte range. Have you ever seen a "small" 3GiB CSV? I have. The filtered and cleaned data frames created in genetics are over 100 columns wide and have nearly 5 million entries.
There are companies creating artificial life by generating custom chromosomes. There's a whole field of computer science dedicated to biological computing, using DNA as a storage medium. There are companies dedicated to simply classifying genes.
DNA is cool as hell.
If you really want to blow your mind, look into the theoretical alternatives to DNA. We are all taught about RNA and how it is a precursor to DNA, but what if it had gone another way? Look up PNA, PNA-O, or even GNA. If life exists on other worlds, there is a decent chance it follows an xNA structure, but not necessarily DNA.
Interesting, could you enlighten me on what types of data are in those 100 columns? I’m aware of ATGC and thought it would be just one column, but maybe the rest indicate intensity or activity, or what sequence they are part of.
Well, it varies depending on what the file is meant for. Usually there are columns like chromosome, variant position, reference nucleotide, observed nucleotide, type of variation, codon sequence, gene name, etc.
There are also columns that result from various analyses. In the file I've been working on lately, there are columns such as variant impact, level of confidence, pathogenicity, clinical significance, etc.
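For anyone who wants to picture one row of a file like that, here's a minimal sketch in Python. The field names, the column order, and the sample line are all made up for illustration (a real file would have 100+ columns and its own naming), so treat it as a rough shape rather than the actual format.

```python
from dataclasses import dataclass


@dataclass
class VariantRow:
    """One row of a hypothetical variant table (field names are assumptions)."""
    chromosome: str
    position: int
    ref: str                    # reference nucleotide
    alt: str                    # observed nucleotide
    gene: str
    variant_type: str
    impact: str                 # analysis-derived column
    clinical_significance: str  # analysis-derived column


def parse_row(line: str, sep: str = ",") -> VariantRow:
    """Split one delimited line into a typed VariantRow."""
    chrom, pos, ref, alt, gene, vtype, impact, clin = line.rstrip("\n").split(sep)
    return VariantRow(chrom, int(pos), ref, alt, gene, vtype, impact, clin)


# Invented example line, not real data.
row = parse_row("1,12345,G,A,GENE1,missense,MODERATE,uncertain")
print(row.gene, row.position, f"{row.ref}>{row.alt}")
```

The first few fields describe where the variant is and what changed; the last ones stand in for the columns that come out of downstream analyses.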
That sounds like a marker file. It's a bit different than a sequence file.
Molecular markers are linked to specific sequences in the DNA. These markers are generally close to or inside the gene of interest. All the extra columns describe a marker's characteristics and results. Any place in the entire genome where there is a one-nucleotide difference (a polymorphism) can be another marker. There are millions of these and they add up to massive files.
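A toy version of the "one nucleotide difference" idea, as a minimal Python sketch. It assumes the two sequences are already aligned and the same length, which is a big simplification compared with real variant calling, and the function name `snp_positions` is just something made up for this example.

```python
def snp_positions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Invented toy sequences: they differ at positions 5 and 8.
print(snp_positions("ACGTACGT", "ACGTTCGA"))
```

Each tuple that comes back is roughly what one row of a marker file starts from, before all the annotation columns get added.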
A sequence file is basically just a long, boring sequence of nucleotides, and they're not that large. The files you use to generate the sequence are another story. Let's just say they had to wait almost 20 years for computers to get fast enough to process those files in a reasonable time. Those make the marker files look like child's play.
I'm not familiar with the name of the file I'm currently working with tbh. It's used to create the annotation files for regenie analyses. It has every variant for every gene within the biobank. There's far more than just missense; there are stop/start gain/loss, splice donor/acceptor, frameshifts, and PTVs. It contains PrimateAI scores, SpliceAI scores, CAVA data, ClinVar data, and more.
Sweet, thanks for the reply. I didn’t expect to fully understand what they would contain but I got the idea.
There’s a Japanese artist, Ryoji Ikeda, who you might like; he has visualised DNA and all sorts of data. I find the style of his data.gram exhibition the most aesthetically appealing, and he has published some albums too.
https://www.taronasugallery.com/en/exhibitions/ryoji-ikeda%E3%80%8Cdata-gram%E3%80%8D/
My dude, not a fun thing to think about who might have control over that. Is it a musk, zuck, cook or epstein?
No, none of those guys are involved afaik. The company that made the first breakthrough in artificial life is run by the same dude who competed with the Human Genome Project to map 99% of the human genome. They modified an extremely simple bacterium that only had something like a few hundred genes.
We still don't know what type of person they are. Them being smart and focused on the research doesn't give them a pass. They might not even care who else has the info.
Yup. Many Nazi scientists only cared about the research. A lot of medical and physics breakthroughs last century directly resulted from those experiments.
I have no context/knowledge on the topic. Are you saying DNA has that much data that can be extracted from it? If so, that’s nuts.
Yes, all that data is extrapolated directly from DNA. It's a huge amount of information. One copy of the human genome translates directly to about 750MiB (and most cells carry two copies). Now, add in the fact that genomic studies use biobanks, like the UK Biobank, which contain the genetic info of hundreds of thousands of people. The data we can extrapolate from DNA is absolutely massive.
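As a rough check on the ~750MiB figure, here's a back-of-the-envelope sketch in Python. It assumes about 3.1 billion base pairs for one copy of the genome and 2 bits per base (enough to tell A, C, G, and T apart), with no compression and no metadata; both numbers are approximations chosen for the example.

```python
# Assumptions: ~3.1 billion base pairs in one copy of the human genome,
# and 2 bits per base (4 possible letters: A, C, G, T).
BASE_PAIRS = 3_100_000_000
BITS_PER_BASE = 2


def genome_size_mib(base_pairs: int = BASE_PAIRS,
                    bits_per_base: int = BITS_PER_BASE) -> float:
    """Raw size in MiB of a sequence stored at a fixed number of bits per base."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / 2**20


print(f"{genome_size_mib():.0f} MiB")  # prints "739 MiB"
```

That lands close to 750MiB for a single copy. Multiply by hundreds of thousands of people, then add all the annotation columns, and the tebibyte numbers stop being surprising.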
That’s too much science. We, as a people, need less sci- wait, no. No, no. Uh - We need bett-er? Science? Hmm.
Look just make it an animated cartoon with fun music for now and we’ll circle back.