Converting a PDF to a plain text file on Linux doesn’t have to be complicated. Whether you’re dealing with reports, documents, or something else, there are a few free tools that can get the job done quickly. Here are four great options you can try, with simple steps to follow.
1. Convert PDFs with pdftotext
One of the easiest tools for converting PDFs to plain text on Linux is pdftotext. It’s part of the Poppler utilities and works right from the command line.
How to Use It:
- First, make sure Poppler is installed on your system. You can do this by running:
sudo apt install poppler-utils
- Once it’s installed, use this command to convert your PDF to a text file:
pdftotext input.pdf output.txt
- If you don’t need to save the output to a file and just want to see the text in your terminal, try this:
pdftotext input.pdf -
Why It’s Great:
- Fast and reliable for most PDFs.
- Works offline, so no need for an internet connection.
Any Downsides?
- It might struggle with PDFs that have complex layouts or lots of images.
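For PDFs with columns or tables, pdftotext has a -layout flag that tries to keep the original physical layout, and -f and -l limit the conversion to a page range. Here is a minimal sketch, assuming poppler-utils is installed; the helper name pdf_to_text_layout is made up for illustration and is not part of Poppler.

```shell
#!/bin/sh
# Hypothetical helper (not part of Poppler): convert a PDF while asking
# pdftotext to preserve the physical layout of the page.
# Usage: pdf_to_text_layout input.pdf [first_page] [last_page]
pdf_to_text_layout() {
    pdf="$1"
    first="${2:-}"
    last="${3:-}"

    # Build the option list in the positional parameters.
    set -- -layout
    if [ -n "$first" ]; then set -- "$@" -f "$first"; fi
    if [ -n "$last" ]; then set -- "$@" -l "$last"; fi

    # Writes report.txt next to report.pdf, and so on.
    pdftotext "$@" "$pdf" "${pdf%.pdf}.txt"
}
```

Calling pdf_to_text_layout report.pdf 2 5 would convert pages 2 through 5 of report.pdf into report.txt. Results still vary with how the PDF was produced, so it is worth comparing the output with and without -layout.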
2. Use LibreOffice to Extract Text
Did you know LibreOffice can handle PDFs too? It’s a great way to extract text if you already have LibreOffice installed.
How to Do It:
- Open the PDF in LibreOffice Draw.
- Highlight the text you want, copy it, and paste it into a plain text editor.
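- Alternatively, select File > Save As and choose a text format to save the file as .txt. Whether a text format is listed depends on which LibreOffice component opened the PDF; if it isn’t, fall back to copy and paste.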
Why Choose This?
- Easy to use, especially if you’re not comfortable with command-line tools.
- Lets you make edits to the PDF before saving it as text.
Things to Keep in Mind:
- This method works best for small jobs. It’s not ideal for batch processing or very large files.
3. Try pdftk for Advanced Options
If you’re up for a little scripting, pdftk (PDF Toolkit) can also help you extract text from PDFs. It’s especially useful for those who like to automate tasks.
Steps to Get Started:
- Install pdftk with this command:
sudo apt install pdftk
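- Extract the content with:
pdftk input.pdf output output.txt uncompress
- Keep in mind that uncompress writes an uncompressed PDF rather than clean plain text, so output.txt still contains PDF syntax around the page content.
- You can pair it with awk or other text-processing tools to clean up the output.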
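As one illustration of pairing pdftk with awk: pdftk’s dump_data operation prints document metadata, including a NumberOfPages line, which awk can pick out. The sketch below assumes pdftk is installed; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `pdftk <file> dump_data` output on stdin
# and print the value of the NumberOfPages line.
page_count_from_dump() {
    awk '$1 == "NumberOfPages:" { print $2; exit }'
}

# Hypothetical helper: print the page count of a single PDF.
# Usage: pdf_page_count input.pdf
pdf_page_count() {
    pdftk "$1" dump_data | page_count_from_dump
}
```

A script could use pdf_page_count input.pdf to decide how to split a long document before converting it page by page.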
What’s Good About It?
- Super customizable for advanced users.
- A great option for handling multiple files at once.
The Drawbacks:
- It can be a bit of a learning curve if you’re not familiar with scripting.
4. Use Online Tools with curl or wget
Don’t feel like installing software? Online tools can help. With wget or curl, you can upload your file to an online service and download the text.
Here’s How:
- Use a command like this to upload your PDF to a conversion API (the URL is a placeholder for whichever service you choose):
curl -F "file=@input.pdf" https://api.example.com/convert-to-text > output.txt
- If the service returns the text directly, it lands in output.txt. Otherwise, download the converted text to your system once the file is processed.
Why Use Online Tools?
- No installation needed.
- Perfect for one-off conversions when you’re in a pinch.
Why You Might Skip It:
- You’ll need an internet connection.
- Be cautious if your PDF contains sensitive information.
FAQs
Can These Methods Handle Scanned PDFs?
Not directly. For scanned PDFs, you’ll need OCR (Optical Character Recognition) software like tesseract-ocr.
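A rough sketch of what an OCR pass could look like: render each page to an image with pdftoppm (from poppler-utils), then run tesseract on every image and collect the text. This assumes both tools are installed. The helper name ocr_pdf and the 300 dpi setting are illustrative choices, not requirements.

```shell
#!/bin/sh
# Hypothetical helper: OCR a scanned PDF into one text file.
# Usage: ocr_pdf scan.pdf [output.txt]
ocr_pdf() {
    pdf="$1"
    out="${2:-${pdf%.pdf}.txt}"
    workdir=$(mktemp -d) || return 1

    # Render every page as a PNG named page-<n>.png in the work directory.
    pdftoppm -r 300 -png "$pdf" "$workdir/page" || { rm -rf "$workdir"; return 1; }

    : > "$out"   # start with an empty output file
    for img in "$workdir"/page-*.png; do
        [ -e "$img" ] || continue
        # Using "stdout" as the output base makes tesseract print the text.
        tesseract "$img" stdout >> "$out"
    done

    rm -rf "$workdir"
}
```

Running ocr_pdf scan.pdf would write the recognized text to scan.txt. For languages other than English, tesseract accepts a -l option with a language code, provided the matching language data is installed.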
What About PDFs in Different Languages?
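Tools like pdftotext generally support multiple languages, but you might need to install additional language packs on your system. On Debian and Ubuntu, for example, the poppler-data package supplies the encoding data Poppler uses for Chinese, Japanese, and Korean text.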
Which Option is Best for Batch Jobs?
pdftotext and pdftk are both great choices for batch processing, especially if you automate with scripts.
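A batch job with pdftotext can be as small as a loop over a directory. The sketch below assumes poppler-utils is installed; the function name convert_all_pdfs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert every .pdf in a directory to a .txt file
# with the same base name. Defaults to the current directory.
# Usage: convert_all_pdfs [directory]
convert_all_pdfs() {
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$pdf" ] || continue
        # Report failures but keep going with the remaining files.
        pdftotext "$pdf" "${pdf%.pdf}.txt" || echo "failed: $pdf" >&2
    done
}
```

For example, convert_all_pdfs ~/reports would leave a .txt file next to each PDF in that folder.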
Are There GUI Options for Casual Users?
Yes! PDF viewers like Okular or PDF Studio Viewer let you copy and paste text manually, which is a good option if you’re not comfortable with command-line tools.
No matter what kind of PDF you’re working with, there’s a free tool for Linux that can help you extract plain text quickly and easily. Start with pdftotext for its simplicity, and explore the others if you need something more advanced.