this post was submitted on 09 Dec 2025
39 points (100.0% liked)

askchapo


Ask Hexbear is the place to ask and answer ~~thought-provoking~~ questions.


How on planet Earth can I change this pdf to epub? I tried everything I could think of in Calibre, but the problem is that the pdf has 2 columns of text per page, plus footnotes on each page. When it converts to epub it just prints each line of each text column as a line of text, which makes it totally lose its meaning. Footnotes are also just added as regular text, as part of a supremely incoherent story with aggressive punctuation.

Has anybody been able to solve this before?

top 32 comments
[–] stupid_asshole69@hexbear.net 31 points 2 days ago (2 children)

First you have to accept epub into your heart

[–] fort_burp@feddit.nl 17 points 1 day ago (1 children)

OK I swallowed the Kobo, what next?

[–] miz@hexbear.net 10 points 1 day ago

rookie mistake. you have to liquefy the Kobo and inject it like the documentary Pulp Fiction

[–] Edamamebean@hexbear.net 3 points 1 day ago* (last edited 1 day ago) (1 children)

Instead of doing any converting you could probably find the epub on Anna's Archive. I've never had any problems finding books on there, even pretty obscure stuff. They also seem to have everything in both epub and pdf. Good luck friend!

https://annas-archive.org/

[–] fort_burp@feddit.nl 1 points 1 day ago

Good advice, thanks! Actually I got the PDF from Anna, there was no epub available :/

[–] thefunkycomitatus@hexbear.net 15 points 1 day ago (1 children)

I would just find the books in .epub to begin with. PDF is an evil format. Just in case anyone needs book sources:

https://fmhy.net/reading

[–] fort_burp@feddit.nl 2 points 1 day ago

woah, cool site!

[–] starkillerfish@hexbear.net 16 points 1 day ago (1 children)

pdf is a printing format. epub is a type of html essentially. you essentially want to turn a book into a webpage. it is practically impossible unless you do it manually or the pdf is basically blank and single column without footnotes.

TLDR: pdf and epub are very different formats. you cannot easily convert pdf to epub (but epub to pdf is much easier).

[–] fort_burp@feddit.nl 8 points 1 day ago

it is practically impossible unless you do it manually or the pdf is basically blank and single column without footnotes.

Yea, seems like it :/ thanks

[–] Edie@hexbear.net 14 points 2 days ago* (last edited 1 day ago) (1 children)

PDFs are styling with text. The footnotes are usually just plain text, with no connection, no different from the rest of the text—unlike in EPUBs where they are usually connected through anchors, bonus if they have epub:type, and the footnote text is usually away from the rest of the chapter text. AFAIK there is no good way of automatically converting from PDF to EPUB. So to answer the question in the title, manually.
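To make the "connected through anchors" part concrete, here is a minimal sketch of what one linked footnote could look like when you rebuild it by hand. The helper name `footnote_pair` and the id scheme are made up for illustration; `epub:type="noteref"` and `epub:type="footnote"` come from the EPUB 3 structural semantics vocabulary, and the XHTML file has to declare `xmlns:epub="http://www.idpf.org/2007/ops"` for them to be valid.

```python
from html import escape


def footnote_pair(note_id, marker, note_text, notes_file=""):
    """Return (reference_html, footnote_html) for one EPUB 3 style footnote.

    note_id, marker and notes_file are hypothetical names for this sketch.
    Leave notes_file empty when the note lives in the same XHTML file.
    """
    # The in-text marker: a link that points at the footnote's id.
    ref = (
        f'<a epub:type="noteref" id="{note_id}-ref" '
        f'href="{notes_file}#{note_id}">{escape(marker)}</a>'
    )
    # The footnote itself, kept apart from the chapter's running text.
    note = (
        f'<aside epub:type="footnote" id="{note_id}">'
        f"<p>{escape(note_text)}</p></aside>"
    )
    return ref, note


ref, note = footnote_pair("fn1", "1", "See Marx & Engels, Selected Works.")
print(ref)
print(note)
```

In a PDF none of this linkage exists, which is why a converter can only dump the footnote text wherever it happens to sit on the page.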


This user is suspected of being a cat. Please report any suspicious behavior.

[–] fort_burp@feddit.nl 7 points 1 day ago

Bah, thanks. It's so annoying bc highlighting the page and doing copy paste also mixes the text of the two columns.

[–] dead@hexbear.net 7 points 1 day ago

Epub is a zip file with html files inside of it. You can rename epub to zip and extract it with any archive tool.
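If you'd rather not rename anything, Python's standard `zipfile` module can open an epub directly. This is only a sketch: `build_toy_epub` makes a fake, three-file archive in memory so the example runs without a real book, and the file names inside it are placeholders.

```python
import io
import zipfile


def build_toy_epub():
    """Build a tiny EPUB-like zip in memory (not a complete, valid EPUB)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Real EPUBs store the mimetype entry first and uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html><body><p>Hello</p></body></html>")
    return buf.getvalue()


def list_epub(path_or_file):
    """Return the names of the files packed inside an epub."""
    with zipfile.ZipFile(path_or_file) as zf:
        return zf.namelist()


names = list_epub(io.BytesIO(build_toy_epub()))
print(names)
```

With a real book you would call `list_epub("book.epub")` instead, where `book.epub` is whatever your file is called.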

PDF is a document format.

Book PDFs can contain text or sometimes pictures of text if it is a scanned book. Images of text can be converted into text using OCR software.

If you have like some basic programming knowledge, you could write a script to convert your specific book to the epub style you want.

You could see if the book is already available in epub form on LibGen.

https://en.wikipedia.org/wiki/Library_Genesis

[–] oscardejarjayes@hexbear.net 8 points 1 day ago* (last edited 1 day ago)

ePUB is basically zipped HTML, so while it's easy to convert from, it's hard to convert to. You might just want to try to find your book in an alternative format from somewhere like Anna's Archive. I think azw3 and mobi files can be converted to ePUB more easily.

Really the only good way is to manually recreate the book, there's no good automatic pdf to epub converter. You might be able to hire a guy on fiverr or such to do it for you, that's the closest I can think of to automatic.

[–] bobs_guns@lemmygrad.ml 5 points 1 day ago (1 children)

Use koreader in two column mode if you can. It's kinda funky but will let you read the text at a more appropriate size if that's your issue

[–] fort_burp@feddit.nl 2 points 1 day ago

lol yea, size is the issue and it's just so awkward to read

[–] techpeakedin1991@lemmy.ml 7 points 1 day ago

Like others have said, you probably have to do it manually. If the pdf has a lot of pages though, and they're all in a similar format, it might be easier to script it using something like https://github.com/jsvine/pdfplumber
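A rough sketch of what such a script might look like for a two-column layout with footnotes at the bottom. It assumes the columns split exactly at the middle of the page and that footnotes start at a fixed height (`footnote_top`, in PDF points from the top), which you would have to measure for your own book. `column_boxes` and `extract_two_column_pdf` are names invented here; `pdfplumber.open`, `page.crop` and `page.extract_text` are the pdfplumber calls doing the actual work.

```python
def column_boxes(width, height, footnote_top=None, gutter=0.0):
    """Return (left, right, footnotes) boxes as (x0, top, x1, bottom).

    footnote_top and gutter are assumptions about the page layout;
    footnotes is None when no footnote_top is given.
    """
    body_bottom = height if footnote_top is None else footnote_top
    mid = width / 2
    left = (0, 0, mid - gutter / 2, body_bottom)
    right = (mid + gutter / 2, 0, width, body_bottom)
    notes = None if footnote_top is None else (0, footnote_top, width, height)
    return left, right, notes


def extract_two_column_pdf(path, footnote_top=None):
    """Read each page as left column, then right column, then footnotes."""
    import pdfplumber  # third-party: pip install pdfplumber

    pages = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            left, right, notes = column_boxes(
                page.width, page.height, footnote_top
            )
            # Crop each column separately so lines are not interleaved.
            body = "\n".join(
                page.crop(box).extract_text() or "" for box in (left, right)
            )
            note_text = page.crop(notes).extract_text() or "" if notes else ""
            pages.append({"body": body, "footnotes": note_text})
    return pages


left, right, notes = column_boxes(600, 800, footnote_top=700)
print(left, right, notes)
```

Pages where the footnote area changes height would need per-page handling, so treat this as a starting point rather than a finished converter.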

[–] Beaver@hexbear.net 4 points 2 days ago (2 children)

I haven't tried this tool, but it claims to be able to re-flow PDF text: https://www.willus.com/k2pdfopt/

[–] fort_burp@feddit.nl 4 points 1 day ago

Cool, thank you. I'll give it a try.

[–] Edie@hexbear.net 4 points 1 day ago* (last edited 1 day ago) (1 children)

The PDF Conversion Tips page is interesting.



[–] fort_burp@feddit.nl 2 points 1 day ago

From that link:

I've been on mobileread.com since 2011, regularly reading the PDF forum, and probably the most common question from new members regarding PDFs is about the best way to view them on e-readers such as the Kindle, Kobo, Nook, etc.

How can you be so helpful, Edie? Thanks!

[–] ClathrateG@hexbear.net 4 points 2 days ago (1 children)

https://cloudconvert.com/pdf-to-epub

First google result for 'pdf to epub online converter', just tried it myself on a random pdf and the converted epub opened fine in calibre

[–] fort_burp@feddit.nl 3 points 1 day ago (1 children)

Thanks I will give it a try when I get back. Did the text come out ok for you, like were all the words in the same order?

[–] ClathrateG@hexbear.net 2 points 1 day ago* (last edited 1 day ago) (2 children)

From my glance at the first paragraph yes, even the font was the same

Is that an issue you've encountered with other converters?

[–] fort_burp@feddit.nl 2 points 1 day ago

Yes, the same column fuckery persists :/

[–] Edie@hexbear.net 8 points 1 day ago* (last edited 1 day ago)

I tried https://redstarpublishers.org/adoratsky.pdf in the one you shared. It's good compared to all the PDF conversions I've seen. And if I had to read it without making any changes to it, it'll certainly do. But it could use some manual intervention. There are random line breaks, blockquotes are not blockquotes, and footnotes are just... in the text. That's at least what I see at a glance.

Edit: Wait, hang on, cloudconvert is just using Calibre! It's the exact same output. Every css class is calibre[number]. And stuff like the OPF contain metadata with calibre: <dc:contributor opf:role="bkp">calibre (8.4.0) [https://calibre-ebook.com/]</dc:contributor>
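For anyone who wants to check their own converted files the same way, here is a small sketch that reads the `dc:contributor` entries out of an epub's OPF using only the standard library. The function names and the toy archive are invented for the example; it assumes the OPF is located via `META-INF/container.xml`, as in a normal EPUB.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def epub_contributors(epub):
    """Return the text of every dc:contributor element in the OPF."""
    with zipfile.ZipFile(epub) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
    return [el.text or "" for el in opf.iterfind(".//dc:contributor", NS)]


def made_by_calibre(epub):
    """Guess whether Calibre produced the file, based on the metadata."""
    return any("calibre" in c.lower() for c in epub_contributors(epub))


def toy_epub(contributor):
    """Build a minimal in-memory archive, just enough for the demo."""
    container = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/" '
        'xmlns:opf="http://www.idpf.org/2007/opf" version="2.0">'
        f'<metadata><dc:contributor opf:role="bkp">{contributor}'
        "</dc:contributor></metadata></package>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("content.opf", opf)
    buf.seek(0)
    return buf


print(made_by_calibre(toy_epub("calibre (8.4.0) [https://calibre-ebook.com/]")))
```

A missing contributor entry doesn't prove the file wasn't made with Calibre, since metadata can be stripped or edited.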



[–] Monk3brain3@hexbear.net 1 points 1 day ago (1 children)
[–] Edie@hexbear.net 6 points 1 day ago

Please read the post before commenting /gen



[–] stupid_asshole69@hexbear.net 1 points 1 day ago (1 children)

PDFs can be set up in a lot of different ways.

One way is where text is encoded into the document like if text were aligned and sized just right for one of those typewriters with the white out ribbon. Text encoded into the pdf in this way can be selected, edited and copied just like any other kind of document.

Another way is where text is embedded into the document, like a picture of a newspaper article pasted onto a piece of paper. Text in the pdf like this can’t be manipulated or selected and is the kind you’re having problems with.

The way to get around that kind of text is optical character recognition. OCR software analyzes images of text and figures out what characters it corresponds to. Just chase down some free ocr package and input your pdf.
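One way to tell which kind of pdf you have is to try extracting text and see whether anything comes back. A minimal sketch, assuming the third-party pdfplumber library for the extraction step; the function names and the `min_chars` threshold of 20 are arbitrary choices for illustration.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 0-based indexes of pages with (almost) no extractable text.

    min_chars is an arbitrary cutoff: pages below it are probably images.
    """
    return [
        i for i, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(path):
    """Extract the text layer of every page (empty string if none)."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(path) as pdf:
        return [page.extract_text() or "" for page in pdf.pages]


# Demo with made-up extraction results instead of a real file.
sample = ["A full page of selectable text here.", "", None, "  \n"]
print(pages_needing_ocr(sample))
```

With a real file you would run `pages_needing_ocr(extract_page_texts("book.pdf"))`; if most pages show up in the result, the pdf is the picture-of-text kind and needs OCR.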

[–] fort_burp@feddit.nl 1 points 1 day ago (1 children)

Cool, thank you very much. I got k2pdf (courtesy of another dope-ass bear) to get the two columns + footnotes in the original pdf into a pdf that is just one column with footnotes clearly distinguishable. Now I need just what you're saying because the result of the k2pdf conversion is an image that I can't select text from (but the words are all in the right order, which is good).

Tesseract seems like a popular choice, I'll give that a try.

[–] Edie@hexbear.net 2 points 1 day ago* (last edited 1 day ago) (1 children)

Tesseract doesn't support PDF input, so you'll need some other program like ocrmypdf (which I have used; it uses tesseract), or you can extract each page to its own image (which I have also done, but I forget how right now).
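A small sketch of driving ocrmypdf from a script, assuming it is installed and on your PATH. The wrapper functions are made up for this example; `-l` (language) and `--sidecar` (also write the recognized text to a plain .txt file) are ocrmypdf command-line options.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, language="eng", sidecar=None):
    """Build the ocrmypdf command line as a list of arguments."""
    cmd = ["ocrmypdf", "-l", language]
    if sidecar:
        # Also dump the recognized text into a separate text file.
        cmd += ["--sidecar", sidecar]
    cmd += [src, dst]
    return cmd


def run_ocr(src, dst, **kwargs):
    """Run ocrmypdf, failing early if the program is not installed."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocr_command(src, dst, **kwargs), check=True)


print(build_ocr_command("in.pdf", "out.pdf", sidecar="out.txt"))
```

The sidecar text file may be handier than the pdf itself if the end goal is pasting the text into another program.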



[–] fort_burp@feddit.nl 2 points 1 day ago

Thanks again! You're the best :)

This looks like exactly what I need. After getting the formatting right with k2pdf I can then use ocrmypdf to get it back to text form and then just ctrl + a copy to writer and export as epub, since the pdf size is like 15x the epub size.