<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>danwin.com &#187; ocr</title>
	<atom:link href="https://danwin.com/tag/ocr/feed/" rel="self" type="application/rss+xml" />
	<link>https://danwin.com</link>
	<description>Words, photos, and code by Dan Nguyen. The &#039;g&#039; is mostly silent.</description>
	<lastBuildDate>Thu, 21 Nov 2019 12:29:57 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.2.39</generator>
	<item>
		<title>A HTML GUI for training Tesseract on character sets</title>
		<link>https://danwin.com/2012/05/a-html-gui-for-training-tesseract-on-character-sets/</link>
		<comments>https://danwin.com/2012/05/a-html-gui-for-training-tesseract-on-character-sets/#comments</comments>
		<pubDate>Thu, 10 May 2012 20:02:27 +0000</pubDate>
		<dc:creator><![CDATA[Dan Nguyen]]></dc:creator>
				<category><![CDATA[works]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[tesseract]]></category>

		<guid isPermaLink="false">https://danwin.com/?p=1971</guid>
		<description><![CDATA[<p>The Tesseract OCR Chopper, by data journalist Dino Beslagic. I&#8217;m making this short stub post because ever since I started using Tesseract to convert scanned documents into text, I&#8217;ve wondered why the hell it is so hard to train Tesseract (that is, to make it better at recognizing a particular font). As it turns out, Beslagic created a web app [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://danwin.com/2012/05/a-html-gui-for-training-tesseract-on-character-sets/">A HTML GUI for training Tesseract on character sets</a> appeared first on <a rel="nofollow" href="https://danwin.com">danwin.com</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>The <a href="http://pp19dd.com/tesseract-ocr-chopper/">Tesseract OCR Chopper</a>, by data journalist <a href="http://pp19dd.com/things/about/">Dino Beslagic</a>.</p>
<p>I&#8217;m making this short stub post because ever since I started using <a href="http://code.google.com/p/tesseract-ocr/">Tesseract</a> to convert scanned documents into text, I&#8217;ve wondered why the hell it is so hard to train Tesseract (that is, to make it better at recognizing a particular font). As it turns out, <a href="http://pp19dd.com/tesseract-ocr-chopper/">Beslagic created a web app that makes the task comparatively easy and platform-independent</a>.</p>
<p>He first posted it about two years ago and updated it recently. I can&#8217;t believe I didn&#8217;t find it until now. How did I find it? By stumbling upon the <a href="http://code.google.com/p/tesseract-ocr/wiki/AddOns">&#8220;AddOns&#8221; wiki</a> for the Tesseract project. I love Tesseract but am surprised that such a useful and popular utility has such scattered resources.</p>
]]></content:encoded>
			<wfw:commentRss>https://danwin.com/2012/05/a-html-gui-for-training-tesseract-on-character-sets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
