Personal Project
Dec 10, 2025
<p>This is an all-in-one text extraction tool built entirely with modern web technologies. It digitizes printed documents through OCR and extracts readable content from web pages.</p>
<ul style="margin-top: 10px; margin-left: 20px;">
<li><strong>AI-Powered OCR:</strong> Uses Tesseract.js v5 to recognize text in images with high accuracy.</li>
<li><strong>Multi-Language Support:</strong> Specialized support for <strong>Sinhala</strong> and <strong>Tamil</strong> languages, including mixed-language detection.</li>
<li><strong>Image Pre-processing:</strong> Automatically converts images to black &amp; white and boosts contrast before OCR to improve recognition.</li>
<li><strong>PDF Reading:</strong> Accepts multi-page PDF uploads and converts each page to text automatically.</li>
<li><strong>Smart Web Scraper:</strong> Fetches external websites via proxies, cleans up ads/sidebars, and organizes content by sections.</li>
<li><strong>Typing Effect UI:</strong> Results are displayed with an engaging, terminal-style typing animation.</li>
</ul>
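<p>The image pre-processing step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name <code>preprocessPixels</code>, the default contrast factor of 1.5, and the choice of Rec. 601 luma weights are all assumptions, and "black &amp; white" is interpreted here as grayscale conversion.</p>

```javascript
// Hypothetical helper (not taken from the project's source).
// Takes RGBA pixel data, converts each pixel to grayscale, and
// stretches contrast around mid-gray. Alpha is left unchanged.
function preprocessPixels(rgba, contrast = 1.5) {
  // Uint8ClampedArray rounds and clamps every write to the 0..255 range.
  const out = new Uint8ClampedArray(rgba.length);

  for (let i = 0; i < rgba.length; i += 4) {
    // Weighted sum of R, G, B (Rec. 601 luma weights, an assumed choice).
    const gray = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];

    // Push values away from mid-gray (128) to increase contrast.
    const boosted = (gray - 128) * contrast + 128;

    out[i] = boosted;     // R
    out[i + 1] = boosted; // G
    out[i + 2] = boosted; // B
    out[i + 3] = rgba[i + 3]; // keep original alpha
  }

  return out;
}

// Example: one white, one black, and one light-gray pixel.
const sample = new Uint8ClampedArray([
  255, 255, 255, 255,
  0, 0, 0, 255,
  200, 200, 200, 128,
]);
const processed = preprocessPixels(sample);
console.log(Array.from(processed));
```

<p>In a browser, the input would typically come from the <code>data</code> property of the object returned by a canvas 2D context's <code>getImageData()</code>, and the result could be written back with <code>putImageData()</code> before the canvas is passed to the OCR engine. Working on a plain pixel array keeps the function independent of the DOM, so it can be checked in isolation.</p>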