this post was submitted on 09 Dec 2025
39 points (100.0% liked)

askchapo

How on planet Earth can I change this pdf to epub? I tried everything I could think of in Calibre but the problem is that the pdf has 2 columns of text per page, plus footnotes on each page. When it converts to epub it just prints each line of each text column as a line of text, which makes it totally lose its meaning. Footnotes are also just added as regular text, as part of a supremely incoherent story with aggressive punctuation.

Has anybody been able to solve this before?

[–] stupid_asshole69@hexbear.net 31 points 4 days ago (2 children)

First you have to accept epub into your heart

[–] fort_burp@feddit.nl 17 points 4 days ago (1 children)

OK I swallowed the Kobo, what next?

[–] miz@hexbear.net 10 points 4 days ago

rookie mistake. you have to liquefy the Kobo and inject it like the documentary Pulp Fiction

[–] thefunkycomitatus@hexbear.net 16 points 4 days ago (1 children)

I would just find the books in .epub to begin with. PDF is an evil format. Just in case anyone needs book sources:

https://fmhy.net/reading

[–] fort_burp@feddit.nl 3 points 3 days ago

woah, cool site!

[–] starkillerfish@hexbear.net 16 points 4 days ago (1 children)

pdf is a printing format. epub is a type of html essentially. you essentially want to turn a book into a webpage. it is practically impossible unless you do it manually or the pdf is basically blank and single column without footnotes.

TLDR: pdf and epub are very different formats. you cannot easily convert pdf to epub (but epub to pdf is much easier).

[–] fort_burp@feddit.nl 8 points 4 days ago

it is practically impossible unless you do it manually or the pdf is basically blank and single column without footnotes.

Yea, seems like it :/ thanks

[–] Edie@hexbear.net 14 points 4 days ago* (last edited 4 days ago) (1 children)

PDFs are styling with text. The footnotes are usually just plain text, with no connection, no different from the rest of the text—unlike in EPUBs where they are usually connected through anchors, bonus if they have epub:type, and the footnote text is usually away from the rest of the chapter text. AFAIK there is no good way of automatically converting from PDF to EPUB. So to answer the question in the title, manually.
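
To make the anchor point concrete, here's a rough Python sketch of the markup a hand-made EPUB 3 footnote tends to use. The helper names (`noteref`, `footnote`) and the `fn<number>` id scheme are made up for illustration; only the `epub:type` values `noteref` and `footnote` come from the EPUB 3 vocabulary.

```python
from html import escape


def noteref(number):
    """Return the in-text marker that links to footnote `number`."""
    return f'<a epub:type="noteref" href="#fn{number}">{number}</a>'


def footnote(number, text):
    """Return the footnote body, kept apart from the chapter text."""
    # escape() keeps characters like & and < from breaking the XHTML.
    return (
        f'<aside epub:type="footnote" id="fn{number}">'
        f"<p>{escape(text)}</p></aside>"
    )


if __name__ == "__main__":
    print("Some claim." + noteref(1))
    print(footnote(1, "Source for the claim."))
```

For `epub:type` to be valid, the XHTML file also has to declare `xmlns:epub="http://www.idpf.org/2007/ops"` on its `<html>` element. None of this linkage exists in a PDF, which is why the footnotes have to be reattached by hand or by a script.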


This user is suspected of being a cat. Please report any suspicious behavior.

[–] fort_burp@feddit.nl 7 points 4 days ago

Bah, thanks. It's so annoying bc highlighting the page and doing copy paste also mixes the text of the two columns.

[–] dead@hexbear.net 7 points 3 days ago

Epub is a zip file with html files inside of it. You can rename epub to zip and extract it with any archive tool.
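
If you'd rather not rename anything, a minimal Python sketch using only the standard library can look inside an epub directly. The function names and the `book.epub` path are placeholders, not anything from a real tool.

```python
import zipfile


def list_epub_contents(epub_path):
    """Return the names of all files stored inside an EPUB (a zip archive)."""
    with zipfile.ZipFile(epub_path) as archive:
        return archive.namelist()


def read_epub_member(epub_path, member):
    """Return one file from the EPUB as text, assuming it is UTF-8 encoded."""
    with zipfile.ZipFile(epub_path) as archive:
        return archive.read(member).decode("utf-8")


if __name__ == "__main__":
    # "book.epub" is a hypothetical path; point it at a real file to try this.
    for name in list_epub_contents("book.epub"):
        print(name)
```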

PDF is a document format.

Book PDFs can contain text or sometimes pictures of text if it is a scanned book. Images of text can be converted into text using OCR software.

If you have like some basic programming knowledge, you could write a script to convert your specific book to the epub style you want.

You could see if the book is already available in epub form on LibGen.

https://en.wikipedia.org/wiki/Library_Genesis

[–] oscardejarjayes@hexbear.net 8 points 4 days ago* (last edited 4 days ago)

ePUB is basically zipped HTML, so while it's easy to convert from, it's hard to convert to. You might just want to try to find your book in an alternative format from somewhere like Anna's Archive. I think azw3 and mobi files can be converted to ePUB more easily.

Really the only good way is to manually recreate the book, there's no good automatic pdf to epub converter. You might be able to hire a guy on fiverr or such to do it for you, that's the closest I can think of to automatic.

[–] techpeakedin1991@lemmy.ml 7 points 4 days ago

Like others have said, you probably have to do it manually. If the pdf has a lot of pages though, and they're all in a similar format, it might be easier to script it using something like https://github.com/jsvine/pdfplumber
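
A rough sketch of what that script could look like, assuming every page has two equal-width columns and the footnotes sit in the bottom slice of the page. The function names and the `footnote_fraction` default are guesses for illustration and would need tuning against the actual book; the pdfplumber calls used are `pdfplumber.open`, `page.bbox`, `page.crop` and `extract_text`.

```python
def column_boxes(page_bbox, footnote_fraction=0.15):
    """Split a page box into (left column, right column, footnote area).

    Boxes are (x0, top, x1, bottom) with the origin at the top-left,
    which is the convention pdfplumber uses. footnote_fraction is an
    assumed share of the page height taken up by footnotes.
    """
    x0, top, x1, bottom = page_bbox
    body_bottom = top + (bottom - top) * (1 - footnote_fraction)
    middle = (x0 + x1) / 2
    left = (x0, top, middle, body_bottom)
    right = (middle, top, x1, body_bottom)
    footnotes = (x0, body_bottom, x1, bottom)
    return left, right, footnotes


def extract_two_column_text(pdf_path, footnote_fraction=0.15):
    """Return one dict per page with the body text in reading order."""
    import pdfplumber  # third-party: pip install pdfplumber

    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            left, right, notes = column_boxes(page.bbox, footnote_fraction)
            # Read the left column fully before the right one.
            body = [page.crop(box).extract_text() or "" for box in (left, right)]
            pages.append({
                "body": "\n".join(body),
                "footnotes": page.crop(notes).extract_text() or "",
            })
    return pages
```

This only works when the PDF has a real text layer. For scanned pages the crops come back empty and OCR is needed first.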

[–] Edamamebean@hexbear.net 3 points 3 days ago* (last edited 3 days ago) (1 children)

Instead of doing any converting you could probably find the epub on Anna's Archive. I've never had any problems finding books on there, even pretty obscure stuff. They also seem to have everything in both epub and pdf. Good luck friend!

https://annas-archive.org/

[–] fort_burp@feddit.nl 1 points 3 days ago

Good advice, thanks! Actually I got the PDF from Anna, there was no epub available :/

[–] bobs_guns@lemmygrad.ml 5 points 3 days ago (1 children)

Use koreader in two column mode if you can. It's kinda funky but will let you read the text at a more appropriate size if that's your issue

[–] fort_burp@feddit.nl 2 points 3 days ago

lol yea, size is the issue and it's just so awkward to read

[–] Beaver@hexbear.net 4 points 4 days ago (2 children)

I haven't tried this tool, but it claims to be able to re-flow PDF text: https://www.willus.com/k2pdfopt/

[–] fort_burp@feddit.nl 4 points 4 days ago

Cool, thank you. I'll give it a try.

[–] Edie@hexbear.net 4 points 4 days ago* (last edited 4 days ago) (1 children)

The PDF Conversion Tips page is interesting.



[–] fort_burp@feddit.nl 2 points 3 days ago

From that link:

I've been on mobileread.com since 2011, regularly reading the PDF forum, and probably the most common question from new members regarding PDFs is about the best way to view them on e-readers such as the Kindle, Kobo, Nook, etc.

How can you be so helpful, Edie? Thanks!

[–] ClathrateG@hexbear.net 4 points 4 days ago (1 children)

https://cloudconvert.com/pdf-to-epub

First google result for 'pdf to epub online converter', just tried it myself on a random pdf and the converted epub opened fine in calibre

[–] fort_burp@feddit.nl 3 points 4 days ago (1 children)

Thanks I will give it a try when I get back. Did the text come out ok for you, like were all the words in the same order?

[–] ClathrateG@hexbear.net 2 points 4 days ago* (last edited 4 days ago) (2 children)

From my glance at the first paragraph yes, even the font was the same

Is that an issue you've encountered with other converters?

[–] Edie@hexbear.net 8 points 4 days ago* (last edited 3 days ago)

I tried https://redstarpublishers.org/adoratsky.pdf in the one you shared. It's good compared to all the PDF conversions I've seen. And if I had to read it without making any changes to it, it would certainly do. But it could use some manual intervention. There are random line breaks, blockquotes are not blockquotes, and footnotes are just... in the text. That's at least what I see at a glance.

Edit: Wait, hang on, cloudconvert is just using Calibre! It's the exact same output. Every css class is calibre[number]. And stuff like the OPF contain metadata with calibre: <dc:contributor opf:role="bkp">calibre (8.4.0) [https://calibre-ebook.com/]</dc:contributor>



[–] fort_burp@feddit.nl 2 points 3 days ago

Yes, the same column fuckery persists :/

[–] Monk3brain3@hexbear.net 1 points 3 days ago (1 children)
[–] Edie@hexbear.net 6 points 3 days ago

Please read the post before commenting /gen



[–] stupid_asshole69@hexbear.net 1 points 3 days ago (1 children)

PDFs can be set up in a lot of different ways.

One way is where text is encoded into the document like if text were aligned and sized just right for one of those typewriters with the white out ribbon. Text encoded into the pdf in this way can be selected, edited and copied just like any other kind of document.

Another way is where text is embedded into the document, like a picture of a newspaper article pasted onto a piece of paper. Text in the pdf like this can’t be manipulated or selected and is the kind you’re having problems with.

The way to get around that kind of text is optical character recognition. OCR software analyzes images of text and figures out what characters it corresponds to. Just chase down some free ocr package and input your pdf.

[–] fort_burp@feddit.nl 1 points 3 days ago (1 children)

Cool, thank you very much. I got k2pdf (courtesy of another dope-ass bear) to get the two columns + footnotes in the original pdf into a pdf that is just one column with footnotes clearly distinguishable. Now I need just what you're saying because the result of the k2pdf conversion is an image that I can't select text from (but the words are all in the right order, which is good).

Tesseract seems like a popular choice, I'll give that a try.

[–] Edie@hexbear.net 2 points 3 days ago* (last edited 3 days ago) (1 children)

Tesseract doesn't support PDF input, so you'll need some other program like ocrmypdf (which I have used; it uses Tesseract), or extract each page to its own image (which I have also done, but I forget how right now).
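
A minimal sketch of driving ocrmypdf from Python, assuming it's installed and on your PATH. The function names and file names are placeholders; the command itself is just `ocrmypdf -l <language> input.pdf output.pdf`.

```python
import shutil
import subprocess


def ocrmypdf_command(input_pdf, output_pdf, language="eng"):
    """Build the ocrmypdf command line; -l picks the Tesseract language."""
    return ["ocrmypdf", "-l", language, input_pdf, output_pdf]


def add_text_layer(input_pdf, output_pdf, language="eng"):
    """Run ocrmypdf to write a copy of the PDF with selectable text."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(ocrmypdf_command(input_pdf, output_pdf, language), check=True)


if __name__ == "__main__":
    # Hypothetical file names: the image-only PDF in, a searchable PDF out.
    add_text_layer("one_column.pdf", "one_column_ocr.pdf")
```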



[–] fort_burp@feddit.nl 2 points 3 days ago

Thanks again! You're the best :)

This looks like exactly what I need. After getting the formatting right with k2pdf I can then use ocrmypdf to get it back to text form and then just ctrl + a copy to writer and export as epub, since the pdf size is like 15x the epub size.