<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>danwin.com &#187; pdf</title>
	<atom:link href="https://danwin.com/tag/pdf/feed/" rel="self" type="application/rss+xml" />
	<link>https://danwin.com</link>
	<description>Words, photos, and code by Dan Nguyen. The &#039;g&#039; is mostly silent.</description>
	<lastBuildDate>Thu, 21 Nov 2019 12:29:57 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.2.39</generator>
	<item>
		<title>Using PDFTOTEXT to convert a batch of PDFs to text and splitting them by page</title>
		<link>https://danwin.com/2009/12/using-pdftotext-to-convert-a-batch-of-pdfs-to-text-and-splitting-them-by-page/</link>
		<comments>https://danwin.com/2009/12/using-pdftotext-to-convert-a-batch-of-pdfs-to-text-and-splitting-them-by-page/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 23:49:30 +0000</pubDate>
		<dc:creator><![CDATA[Dan Nguyen]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[pdf]]></category>
		<category><![CDATA[scripting]]></category>

		<guid isPermaLink="false">https://danwin.com/uncategorized/using-pdftotext-to-convert-a-batch-of-pdfs-to-text-and-splitting-them-by-page/</guid>
		<description><![CDATA[<p>I can&#8217;t believe how hard it was to find this (also, I know basically nothing about bash scripting), so maybe the next person who Googles this will find this post and save themselves a few minutes: (replace &#8216;999&#8217; with the number of pages in a document) for f in *.PDF; do for i in {1..999}; [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://danwin.com/2009/12/using-pdftotext-to-convert-a-batch-of-pdfs-to-text-and-splitting-them-by-page/">Using PDFTOTEXT to convert a batch of PDFs to text and splitting them by page</a> appeared first on <a rel="nofollow" href="https://danwin.com">danwin.com</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>I can&#8217;t believe how hard it was to find this (also, I know basically nothing about bash scripting), so maybe the next person who Googles this will find this post and save themselves a few minutes:</p>
<p>(Replace &#8216;999&#8217; with the number of pages in your longest document.)</p>
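<pre>
# Loop over every .PDF file in the current directory
for f in *.PDF; do
    # Extract one page at a time: -f is the first page, -l the last
    for i in {1..999}; do
        pdftotext -f "$i" -l "$i" -layout "$f" "${f%.PDF}_$i.txt"
    done
done
</pre>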
<p>Or, as a one-liner:<br />
<code>for f in *.PDF; do for i in {1..999}; do pdftotext -f "$i" -l "$i" -layout "$f" "${f%.PDF}_$i.txt"; done; done</code></p>
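<p>The script above runs pdftotext on every .PDF file in the current directory and writes each page to its own text file, named original_file_name_<b>pagenumber</b>.txt. The <code>*.PDF</code> pattern is case-sensitive, so files ending in lowercase .pdf need the pattern (and the <code>%.PDF</code> suffix) changed to match.</p>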
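<p>If you would rather not guess the page count, here is a rough sketch of one way to look it up per file. It assumes pdfinfo (which ships in the same Xpdf/Poppler package as pdftotext) is installed and prints a &#8220;Pages:&#8221; line. The helper names <code>page_count</code> and <code>split_pdf_pages</code> are my own inventions for this example, not part of either tool.</p>

```shell
#!/usr/bin/env bash
# Sketch: split every .PDF in the current directory into one text file per page,
# asking pdfinfo for the page count instead of hard-coding 999.

# Print the page count that pdfinfo reports for the PDF given as $1.
# Assumption: pdfinfo output contains a line such as "Pages:   12".
page_count() {
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Write page N of name.PDF to name_N.txt, for every page in the document.
split_pdf_pages() {
    local f="$1" pages i
    pages="$(page_count "$f")"
    # Give up on this file if no page count could be read.
    [ -n "$pages" ] || return 1
    i=1
    while [ "$i" -le "$pages" ]; do
        pdftotext -f "$i" -l "$i" -layout "$f" "${f%.PDF}_$i.txt"
        i=$((i + 1))
    done
}

for f in *.PDF; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    split_pdf_pages "$f"
done
```

<p>The trade-off is one extra pdfinfo call per file, in exchange for running pdftotext only on pages that actually exist.</p>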
]]></content:encoded>
			<wfw:commentRss>https://danwin.com/2009/12/using-pdftotext-to-convert-a-batch-of-pdfs-to-text-and-splitting-them-by-page/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
