this post was submitted on 13 May 2026
745 points (100.0% liked)

Science Memes

Welcome to c/science_memes @ Mander.xyz!

ptu@sopuli.xyz 3 points 9 hours ago

Interesting, could you explain what types of data are in those 100 columns? I'm aware of ATGC and thought it would be just one column, but maybe the rest indicate intensity or activity, or which sequence they are part of.

rockSlayer@lemmy.blahaj.zone 3 points 9 hours ago

Well, it varies depending on what the file is meant for. Usually there are columns like chromosome, variant position, reference nucleotide, observed nucleotide, type of variation, codon sequence, gene name, etc.

There are also columns that result from various analyses. In the file I've been working on lately, there are columns such as variant impact, level of confidence, pathogenicity, clinical significance, etc.
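
A minimal sketch of reading a table like that, assuming a tab-separated file with a header row. The column names (`chrom`, `pos`, `ref`, `alt`, `variant_type`, `gene`, `clinical_significance`) and both data rows are made up for illustration; real files each use their own headers and many more columns.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical header and toy rows (placeholder genes, not real data).
SAMPLE = (
    "chrom\tpos\tref\talt\tvariant_type\tgene\tclinical_significance\n"
    "1\t1000\tA\tG\tmissense\tGENE_A\tbenign\n"
    "2\t2000\tC\tT\tstop_gained\tGENE_B\tpathogenic\n"
)

@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str               # reference nucleotide
    alt: str               # observed nucleotide
    variant_type: str
    gene: str
    clinical_significance: str

def read_variants(handle):
    """Yield one Variant per row of a tab-separated table with a header line."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield Variant(
            chrom=row["chrom"],
            pos=int(row["pos"]),
            ref=row["ref"],
            alt=row["alt"],
            variant_type=row["variant_type"],
            gene=row["gene"],
            clinical_significance=row["clinical_significance"],
        )

variants = list(read_variants(io.StringIO(SAMPLE)))
pathogenic = [v for v in variants if v.clinical_significance == "pathogenic"]
print([(v.gene, v.pos) for v in pathogenic])  # prints [('GENE_B', 2000)]
```

Each row is one variant, so the file grows with the number of variants times the number of annotation columns.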

The_v@lemmy.world 2 points 2 hours ago

That sounds like a marker file. It's a bit different from a sequence file.

Molecular markers are linked to specific sequences in the DNA. These markers are generally close to or within the gene of interest. All the extra columns describe each marker's characteristics and results. Any place in the entire genome where there is a single-nucleotide difference (a polymorphism) can be another marker. There are millions of these, and they add up to massive files.
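
To illustrate why any single-nucleotide difference can become a marker, here is a toy sketch that lists every mismatch between a reference and a sample sequence. It assumes the two sequences are already aligned to the same length with no insertions or deletions, which real variant callers cannot assume; the function name `find_snps` and the placeholder chromosome label `chrN` are hypothetical.

```python
def find_snps(reference: str, sample: str, chrom: str = "chrN"):
    """Return (chrom, 1-based position, ref base, alt base) for every mismatch.

    Assumes pre-aligned, equal-length sequences (no indels). Positions where
    either base is 'N' (unknown) are skipped. 'chrN' is a placeholder label.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (chrom, i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != alt_base and "N" not in (ref_base, alt_base)
    ]

markers = find_snps("ACGTACGTAC", "ACGAACGTTC")
print(markers)  # prints [('chrN', 4, 'T', 'A'), ('chrN', 9, 'A', 'T')]
```

Scaled up to a whole genome and many samples, each of those tuples becomes a row with dozens of extra annotation columns, which is where the file size comes from.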

A sequence file is basically just a long, boring string of nucleotides and is not that large. The files used to generate the sequence are another story: let's just say they had to wait almost 20 years for computers to get fast enough to process them in a reasonable time. Those make the marker files look like child's play.
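
For comparison, a FASTA sequence file really is just header lines starting with `>` followed by lines of nucleotide letters. A minimal reader, run on a made-up record, looks like this. If the raw files meant above are sequencing-read files such as FASTQ (an assumption), they are much larger because they store many overlapping reads plus a quality score for every base.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:        # finish the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)           # sequence may wrap over many lines
    if header is not None:
        yield header, "".join(chunks)

# Made-up toy record, not real sequence data.
example = io.StringIO(">toy_contig made-up example\nACGTACGTAC\nGGCCAATT\n")
for name, seq in read_fasta(example):
    print(name, len(seq))  # prints: toy_contig made-up example 18
```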

rockSlayer@lemmy.blahaj.zone 1 point 2 hours ago

I'm not familiar with the name of the file I'm currently working with, tbh. It's used to create the annotation files for regenie analyses, and it has every variant for every gene within the biobank. There are far more than just missense variants: stop/start gain/loss, splice donor/acceptor, frameshifts, and PTVs (protein-truncating variants). It contains PrimateAI scores, SpliceAI scores, CAVA data, ClinVar data, and more.
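
A rough sketch of turning per-variant consequences into annotation lines, assuming regenie's three-column annotation format (variant ID, gene/set name, category). The category names, the `LOF_TERMS` grouping, the SpliceAI cutoff of 0.5 and the toy variants are all hypothetical choices for illustration, not the pipeline described above.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical grouping of consequence terms into a "LoF" (roughly PTV) class;
# real studies define their own lists.
LOF_TERMS = {
    "stop_gained", "start_lost", "stop_lost",
    "splice_donor", "splice_acceptor", "frameshift",
}

def categorize(consequence: str, spliceai_score: float,
               cutoff: float = 0.5) -> Optional[str]:
    """Map one variant to a mask category, or None to leave it out.

    The 0.5 SpliceAI cutoff is an illustrative assumption.
    """
    if consequence in LOF_TERMS:
        return "LoF"
    if consequence == "missense":
        return "missense"
    if spliceai_score >= cutoff:
        return "splice_predicted"
    return None

def annotation_lines(
    variants: Iterable[Tuple[str, str, str, float]],
) -> List[str]:
    """Build 'variant_id gene category' lines from (id, gene, consequence, score)."""
    lines = []
    for variant_id, gene, consequence, spliceai_score in variants:
        category = categorize(consequence, spliceai_score)
        if category is not None:
            lines.append(f"{variant_id} {gene} {category}")
    return lines

# Toy input with placeholder genes; not real data.
toy = [
    ("1:1000:A:G", "GENE_A", "missense", 0.01),
    ("1:1050:C:T", "GENE_A", "stop_gained", 0.02),
    ("2:2000:G:A", "GENE_B", "synonymous", 0.80),
    ("2:2100:T:C", "GENE_B", "synonymous", 0.05),
]
for line in annotation_lines(toy):
    print(line)
```

With these toy rows it prints three lines; the last variant is dropped because it is not in `LOF_TERMS`, not missense, and below the splice cutoff. regenie also takes a separate mask-definition file that says which categories each mask combines.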

ptu@sopuli.xyz 3 points 9 hours ago

Sweet, thanks for the reply. I didn't expect to fully understand what they would contain, but I got the idea.

There's a Japanese artist, Ryoji Ikeda, whom you might like; he has visualised DNA and all sorts of data. Aesthetically, I like the style of his data.gram exhibition the most, and he has published some albums too.

https://www.taronasugallery.com/en/exhibitions/ryoji-ikeda%E3%80%8Cdata-gram%E3%80%8D/