
Using PDFTOTEXT to convert a batch of PDFs to text and splitting them by page

I can’t believe how hard it was to find this (also, I know basically nothing about bash scripting), so maybe the next person who Googles this will find this post and save themselves a few minutes:

(replace ‘999’ with the number of pages in your longest document)

for f in *.PDF; do
    for i in {1..999}; do
        # -f and -l set the first and last page, so each call extracts only page $i
        pdftotext -f "$i" -l "$i" -layout "$f" "${f%.PDF}_$i.txt"
    done
done

Or, as a one-liner:
for f in *.PDF; do for i in {1..999}; do pdftotext -f "$i" -l "$i" -layout "$f" "${f%.PDF}_$i.txt"; done; done

The script above runs pdftotext on every .PDF file in the current directory (the match is case-sensitive, so use *.pdf for lowercase extensions) and writes each page to its own text file, named in the format original_file_name_pagenumber.txt.
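
If you'd rather not hard-code ‘999’, here is a rough sketch that looks up each file's page count first. It assumes pdfinfo is installed (it ships alongside pdftotext in poppler-utils) and that its output contains a "Pages:" line. The helper name pages_from_info is made up for this example. It uses a while loop because bash brace expansion like {1..$n} does not expand variables.

```shell
#!/usr/bin/env bash
# Sketch: split every .PDF in the current directory into one text file per page,
# asking pdfinfo for the page count instead of hard-coding it.
# Assumption: pdfinfo and pdftotext (poppler-utils) are both on the PATH.

# Hypothetical helper: read pdfinfo output on stdin and print the page count.
pages_from_info() {
    awk '/^Pages:/ {print $2}'
}

for f in *.PDF; do
    [ -e "$f" ] || continue              # no .PDF files: skip the unexpanded pattern
    n=$(pdfinfo "$f" | pages_from_info)
    [ -n "$n" ] || continue              # skip files whose page count could not be read
    i=1
    while [ "$i" -le "$n" ]; do
        # Extract only page $i and keep the original layout
        pdftotext -f "$i" -l "$i" -layout "$f" "${f%.PDF}_$i.txt"
        i=$((i + 1))
    done
done
```

Run it from the folder that holds the PDFs. Files that pdfinfo can't read are skipped rather than stopping the whole batch.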