First you have to accept epub into your heart
askchapo
Ask Hexbear is the place to ask and answer ~~thought-provoking~~ questions.
Rules:
- Posts must ask a question.
- If the question asked is serious, answer seriously.
- Questions where you want to learn more about socialism are allowed, but questions in bad faith are not.
- Try !feedback@hexbear.net if you have questions about moderation, site policy, the site itself, development, volunteering or the mod team.
OK I swallowed the Kobo, what next?
rookie mistake. you have to liquefy the Kobo and inject it like the documentary Pulp Fiction

Instead of doing any converting you could probably find the epub on Anna's Archive. I've never had any problems finding books on there, even pretty obscure stuff. They also seem to have everything in both epub and pdf. Good luck friend!
Good advice, thanks! Actually I got the PDF from Anna, there was no epub available :/
I would just find the books in .epub to begin with. PDF is an evil format. Just in case anyone needs book sources:
woah, cool site!
pdf is a printing format. epub is essentially a type of html, so what you want is to turn a printed book into a webpage. it is practically impossible unless you do it manually or the pdf is basically blank and single column without footnotes.
TLDR: pdf and epub are very different formats. you cannot easily convert pdf to epub (but epub to pdf is much easier).
> it is practically impossible unless you do it manually or the pdf is basically blank and single column without footnotes.
Yea, seems like it :/ thanks
PDFs are layout first, with the text just placed on the page. The footnotes are usually plain text with no link to the place that references them, no different from the rest of the text. In EPUBs they are usually connected through anchors (bonus if they have epub:type), and the footnote text usually sits apart from the rest of the chapter text. AFAIK there is no good way of automatically converting from PDF to EPUB. So to answer the question in the title: manually.
ⓘ This user is suspected of being a cat. Please report any suspicious behavior.
Bah, thanks. It's so annoying bc highlighting the page and doing copy paste also mixes the text of the two columns.
Epub is a zip file with html files inside of it. You can rename epub to zip and extract it with any archive tool.
PDF is a document format.
Book PDFs can contain text or sometimes pictures of text if it is a scanned book. Images of text can be converted into text using OCR software.
If you have like some basic programming knowledge, you could write a script to convert your specific book to the epub style you want.
You could see if the book is already available in epub form on LibGen.
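To make the "EPUB is a zip file with HTML files inside" point concrete, here is a minimal Python sketch that lists what is inside one, using only the standard library. The demo archive it builds is hypothetical and far from a valid book; a real EPUB has more files, but it opens the same way.

```python
import io
import zipfile


def list_epub_contents(epub_file):
    """Return the names of the files stored inside an EPUB (a zip archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


def build_demo_epub():
    """Build a tiny, hypothetical EPUB-like archive in memory for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr(
            "OEBPS/chapter1.xhtml",
            "<html><body><p>Hello</p></body></html>",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Pass a path such as "book.epub" (hypothetical) instead of the demo buffer.
    for name in list_epub_contents(build_demo_epub()):
        print(name)
```

No renaming to .zip is needed here, since `zipfile` only looks at the file contents and not the extension.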
ePUB is basically zipped HTML, so while it's easy to convert from, it's hard to convert to. You might just want to try to find your book in an alternative format from somewhere like Anna's Archive. I think azw3 and mobi files can be converted to ePUB more easily.
Really the only good way is to manually recreate the book, there's no good automatic pdf to epub converter. You might be able to hire a guy on fiverr or such to do it for you, that's the closest I can think of to automatic.
Use koreader in two column mode if you can. It's kinda funky but will let you read the text at a more appropriate size if that's your issue
lol yea, size is the issue and it's just so awkward to read
Like others have said, you probably have to do it manually. If the pdf has a lot of pages though, and they're all in a similar format, it might be easier to script it using something like https://github.com/jsvine/pdfplumber
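For the two-column problem specifically, a script along these lines might be a starting point. It is a rough sketch, not tested on the book in question: it assumes every page has exactly two columns split down the middle (`split_ratio` is a made-up parameter), that the page's bounding box starts at (0, 0), and it does nothing about footnotes or headers. It uses pdfplumber's `crop` and `extract_text`.

```python
def two_column_text(page, split_ratio=0.5):
    """Extract text from a two-column page, left column first.

    `page` should behave like a pdfplumber Page: it has `width`, `height`,
    `crop((x0, top, x1, bottom))` and `extract_text()`.
    `split_ratio` is an assumption about where the gutter sits.
    """
    middle = page.width * split_ratio
    left = page.crop((0, 0, middle, page.height))
    right = page.crop((middle, 0, page.width, page.height))
    # extract_text() can return None or "" for an empty region.
    parts = [left.extract_text() or "", right.extract_text() or ""]
    return "\n".join(part for part in parts if part)


def pdf_to_text(path):
    """Read every page of the PDF at `path`, column by column."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(path) as pdf:
        return "\n\n".join(two_column_text(page) for page in pdf.pages)
```

Calling `pdf_to_text("book.pdf")` (hypothetical filename) returns one long string with the left column of each page before the right one. This only works when the PDF contains real text; scanned pages need OCR first.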
I haven't tried this tool, but it claims to be able to re-flow PDF text: https://www.willus.com/k2pdfopt/
Cool, thank you. I'll give it a try.
The PDF Conversion Tips page is interesting.
From that link:
I've been on mobileread.com since 2011, regularly reading the PDF forum, and probably the most common question from new members regarding PDFs is about the best way to view them on e-readers such as the Kindle, Kobo, Nook, etc.
How can you be so helpful, Edie? Thanks!
https://cloudconvert.com/pdf-to-epub
First google result for 'pdf to epub online converter', just tried it myself on a random pdf and the converted epub opened fine in calibre
Thanks I will give it a try when I get back. Did the text come out ok for you, like were all the words in the same order?
From my glance at the first paragraph yes, even the font was the same
Is that an issue you've encountered with other converters?
Yes, the same column fuckery persists :/
I tried https://redstarpublishers.org/adoratsky.pdf in the one you shared. It's good compared to all the PDF converts I've seen. And if I had to read it without making any changes to it, it'll certainly do. But it could use some manual intervention. There are random line breaks, blockquotes are not blockquotes, and footnotes are just... in the text. That's at least what I see at a glance.
Edit: Wait, hang on, cloudconvert is just using Calibre! It's the exact same output. Every CSS class is calibre[number]. And stuff like the OPF contains metadata naming calibre: <dc:contributor opf:role="bkp">calibre (8.4.0) [https://calibre-ebook.com/]</dc:contributor>
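If you want to check this on any EPUB without unzipping it by hand, something like the following sketch should work. It reads `META-INF/container.xml` to find the OPF file and then collects the `dc:contributor` entries. The function names are made up for illustration, and a converter could of course leave that metadata out, so a negative result proves nothing.

```python
import zipfile
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def epub_contributors(epub_file):
    """Return the text of every dc:contributor element in the EPUB's OPF."""
    with zipfile.ZipFile(epub_file) as archive:
        # container.xml points at the OPF package file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf = ET.fromstring(archive.read(rootfile.get("full-path")))
    return [el.text for el in opf.iter(f"{{{DC_NS}}}contributor") if el.text]


def made_by_calibre(epub_file):
    """Guess whether calibre produced the file, based on contributor metadata."""
    return any("calibre" in text.lower() for text in epub_contributors(epub_file))
```

`made_by_calibre("converted.epub")` (hypothetical filename) would return True for the output described above.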
Calibre is your friend
Please read the post before commenting /gen
PDFs can be set up in a lot of different ways.
One way is where text is encoded into the document like if text were aligned and sized just right for one of those typewriters with the white out ribbon. Text encoded into the pdf in this way can be selected, edited and copied just like any other kind of document.
Another way is where text is embedded into the document, like a picture of a newspaper article pasted onto a piece of paper. Text in the pdf like this can’t be manipulated or selected and is the kind you’re having problems with.
The way to get around that kind of text is optical character recognition. OCR software analyzes images of text and figures out what characters they correspond to. Just chase down some free OCR package and input your PDF.
Cool, thank you very much. I got k2pdf (courtesy of another dope-ass bear) to get the two columns + footnotes in the original pdf into a pdf that is just one column with footnotes clearly distinguishable. Now I need just what you're saying because the result of the k2pdf conversion is an image that I can't select text from (but the words are all in the right order, which is good).
Tesseract seems like a popular choice, I'll give that a try.
Tesseract doesn't support PDF input, so you'll need some other program like ocrmypdf (which I have used; it uses Tesseract), or you can extract each page to its own image (which I have also done, but I forget how right now).
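For the ocrmypdf route, the basic call is just input file, output file, and optionally a language. Below is a small Python wrapper around the command line as a sketch; it assumes `ocrmypdf` is installed and on your PATH, and the filenames and the `eng` default are placeholders.

```python
import subprocess


def build_ocr_command(source_pdf, output_pdf, language="eng"):
    """Build the ocrmypdf command line; -l picks the Tesseract language."""
    return ["ocrmypdf", "-l", language, source_pdf, output_pdf]


def add_text_layer(source_pdf, output_pdf, language="eng"):
    """Run ocrmypdf, which writes a copy of the PDF with selectable text."""
    subprocess.run(build_ocr_command(source_pdf, output_pdf, language), check=True)


if __name__ == "__main__":
    # Hypothetical filenames; replace them with your own.
    print(" ".join(build_ocr_command("k2pdf_output.pdf", "searchable.pdf")))
```

The output PDF keeps the page images and adds a text layer on top, so the text can then be selected and copied out.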
Thanks again! You're the best :)
This looks like exactly what I need. After getting the formatting right with k2pdf I can then use ocrmypdf to get it back to text form and then just ctrl + a copy to writer and export as epub, since the pdf size is like 15x the epub size.
