this post was submitted on 13 Aug 2024
52 points (93.3% liked)

[–] Nomecks@lemmy.ca 3 points 2 years ago (3 children)

Is there a benefit to doing CoW with Pandas vs. offloading it to the storage? Practically all modern storage systems support CoW snaps. The pattern I'm used to (Infra, not big data) is to leverage storage APIs to offload storage operations from client systems.

[–] sem@lemmy.ml 1 points 2 years ago (1 children)

If you are doing data processing in pandas, CoW lets you avoid a lot of redundant computation at intermediate steps. Before CoW, any data processing in pandas required careful, manual coding to avoid the case described in the blog post. To be honest, I cannot imagine offloading the result of every operation in the pipeline to storage…

[–] Nomecks@lemmy.ca 2 points 2 years ago (1 children)

So you would be using CoW in-memory in this case?

[–] sem@lemmy.ml 0 points 2 years ago

If I am already using pandas to process my data in-memory, CoW can significantly improve performance. That was my point.
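
To make the in-memory idea concrete, here is a toy sketch in plain Python. It is not how pandas implements CoW internally; `CowColumn` and `_Buffer` are made-up names for illustration only. The assumption is simply that a "copy" shares the underlying buffer and the data is duplicated only when one side writes to it:

```python
class _Buffer:
    """Hypothetical shared data buffer with a simple reference count."""

    def __init__(self, data):
        self.data = data
        self.refs = 1


class CowColumn:
    """Toy copy-on-write column (illustration only, not the pandas API)."""

    def __init__(self, values, _buffer=None):
        self._buffer = _buffer if _buffer is not None else _Buffer(list(values))

    def copy(self):
        # Cheap: no data is duplicated, both columns point at the same buffer.
        self._buffer.refs += 1
        return CowColumn((), _buffer=self._buffer)

    def shares_memory_with(self, other):
        return self._buffer is other._buffer

    def __setitem__(self, index, value):
        if self._buffer.refs > 1:
            # Another column still reads this buffer: detach before writing.
            self._buffer.refs -= 1
            self._buffer = _Buffer(list(self._buffer.data))
        self._buffer.data[index] = value

    def to_list(self):
        return list(self._buffer.data)


raw = CowColumn([1, 2, 3])
step = raw.copy()                        # intermediate pipeline step, no data copied yet
shared_before_write = step.shares_memory_with(raw)

step[0] = 99                             # first write triggers the real copy
shared_after_write = step.shares_memory_with(raw)

print(shared_before_write, shared_after_write)
print(raw.to_list(), step.to_list())
```

Intermediate steps that only read never pay for a copy, and a step that writes cannot silently change the data it was derived from. As far as I can tell that is the trade-off being discussed here, just at a much smaller scale.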

[–] LunarLoony@lemmy.sdf.org 1 points 2 years ago

I'm confused by all this talk of black-and-white animals. Can we instead use a Zebra node and put it behind a TuxedoCat cluster? I've also heard good things about barred-knifejaw as a data warehouse.

(Genuine question: what are Pandas and Cows in this context?)