Tools Featured

Optical Character Recognition (OCR)

Client

Personal Project

Project Date

Dec 10, 2025

Tech Stack

HTML5, Tailwind CSS, JavaScript, Tesseract.js, PDF.js

Project Overview

<p>This is an all-in-one text extraction tool built entirely with web technologies (HTML5, Tailwind CSS, and JavaScript). It digitizes printed documents from images and multi-page PDFs, and extracts the main content from web pages.</p>

<ul style="margin-top: 10px; margin-left: 20px;">

<li><strong>AI-Powered OCR:</strong> Uses Tesseract.js v5, a WebAssembly build of the Tesseract engine with LSTM-based neural-network recognition, to read text from images.</li>

<li><strong>Multi-Language Support:</strong> Dedicated support for <strong>Sinhala</strong> and <strong>Tamil</strong>, including detection of mixed-language text.</li>

<li><strong>Image Pre-processing:</strong> Automatically converts images to black and white and boosts contrast before OCR, which helps the recognizer separate text from the background.</li>

<li><strong>PDF Reading:</strong> Accepts multi-page PDF uploads and converts every page to text automatically.</li>

<li><strong>Smart Web Scraper:</strong> Fetches external web pages through proxies, strips ads and sidebars, and organizes the remaining content into sections.</li>

<li><strong>Typing Effect UI:</strong> Results are displayed with an engaging, terminal-style typing animation.</li>

</ul>
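<p>The overview does not say how mixed-language detection works. As one possible approach, the sketch below classifies each character of OCR output by Unicode block (Sinhala is U+0D80 to U+0DFF, Tamil is U+0B80 to U+0BFF) and maps the scripts it finds to Tesseract's traineddata codes (<code>sin</code>, <code>tam</code>, <code>eng</code>). The helpers <code>detectScripts</code> and <code>toTesseractLangs</code> are hypothetical; the <code>+</code>-joined string they produce is the format Tesseract.js uses to load several languages at once.</p>

```javascript
// Hypothetical helpers (not the project's actual code): classify characters
// by Unicode block to see which scripts occur in a piece of OCR output.
const SCRIPT_BLOCKS = {
  sinhala: [0x0d80, 0x0dff], // Unicode "Sinhala" block
  tamil: [0x0b80, 0x0bff], // Unicode "Tamil" block
};

// Tesseract traineddata codes; treating Latin letters as English is a simplification.
const TESSERACT_CODES = { sinhala: "sin", tamil: "tam", latin: "eng" };

function detectScripts(text) {
  const counts = { sinhala: 0, tamil: 0, latin: 0 };
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp >= SCRIPT_BLOCKS.sinhala[0] && cp <= SCRIPT_BLOCKS.sinhala[1]) counts.sinhala++;
    else if (cp >= SCRIPT_BLOCKS.tamil[0] && cp <= SCRIPT_BLOCKS.tamil[1]) counts.tamil++;
    else if (/[A-Za-z]/.test(ch)) counts.latin++;
  }
  // Scripts that occur at least once, in a fixed order: sinhala, tamil, latin.
  const present = Object.keys(counts).filter((name) => counts[name] > 0);
  return { counts, present, mixed: present.length > 1 };
}

// Build a Tesseract language string such as "sin+tam".
function toTesseractLangs(scripts) {
  return scripts.map((name) => TESSERACT_CODES[name]).join("+");
}
```

<p>A pipeline built this way might run a first pass with all three languages and then re-run recognition with only the scripts actually detected.</p>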
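<p>A minimal sketch of the pre-processing step, assuming the image has been drawn to a canvas and its pixels read with <code>getImageData</code> (an RGBA byte array, 4 bytes per pixel). It converts each pixel to grayscale with BT.601 luma weights, stretches the gray range to the full 0 to 255 scale to boost contrast, and then applies a fixed black/white threshold. The function name and the threshold of 128 are illustrative assumptions, not values taken from the project.</p>

```javascript
// Sketch of OCR pre-processing on canvas-style RGBA bytes (4 bytes per pixel).
// Steps: grayscale -> min/max contrast stretch -> fixed black/white threshold.
// The function name and default threshold are illustrative assumptions.
function preprocessForOcr(rgba, threshold = 128) {
  const pixelCount = rgba.length / 4;
  const gray = new Float64Array(pixelCount);
  let min = 255;
  let max = 0;

  // 1. Grayscale using ITU-R BT.601 luma weights; track darkest and brightest values.
  for (let p = 0; p < pixelCount; p++) {
    const i = p * 4;
    const g = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    gray[p] = g;
    if (g < min) min = g;
    if (g > max) max = g;
  }

  const out = new Uint8ClampedArray(rgba.length);
  for (let p = 0; p < pixelCount; p++) {
    // 2. Stretch [min, max] to [0, 255]; a flat image is treated as blank (white).
    const stretched = max === min ? 255 : ((gray[p] - min) / (max - min)) * 255;
    // 3. Threshold to pure black or pure white.
    const bw = stretched >= threshold ? 255 : 0;
    const i = p * 4;
    out[i] = out[i + 1] = out[i + 2] = bw;
    out[i + 3] = 255; // fully opaque
  }
  return out;
}
```

<p>In the browser, the result can be wrapped with <code>new ImageData(out, width, height)</code> and drawn back with <code>putImageData</code> before the canvas is passed to Tesseract.js. Stretching before thresholding matters: a dim, low-contrast scan whose pixels all fall below 128 would otherwise come out entirely black.</p>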
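<p>The per-page PDF loop can be sketched as follows, assuming <code>pdf</code> is a PDF.js document obtained from <code>pdfjsLib.getDocument(...).promise</code>. PDF.js numbers pages from 1 and exposes the page count as <code>numPages</code>. The <code>pageToText</code> callback and the page-separator format are hypothetical: for scanned PDFs the callback might render the page to a canvas with <code>page.render({ canvasContext, viewport })</code> and run OCR on it, while text-based PDFs could use <code>page.getTextContent()</code> instead.</p>

```javascript
// Sketch: collect text from every page of a PDF.js document, in order.
// `pdf` is assumed to come from pdfjsLib.getDocument(...).promise.
// `pageToText(page, pageNumber)` is a hypothetical async callback (e.g. render + OCR).
async function extractPdfText(pdf, pageToText) {
  const pages = [];
  // PDF.js page numbers start at 1, not 0.
  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    const text = await pageToText(page, n);
    pages.push(`--- Page ${n} ---\n${text.trim()}`);
  }
  return pages.join("\n\n");
}
```

<p>Pages are processed one at a time rather than with <code>Promise.all</code>, which keeps memory use bounded (only one rendered page is held at once) and lets a single OCR worker handle the pages in sequence.</p>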
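<p>Browsers refuse to expose the response of a cross-origin <code>fetch</code> unless the target server sends CORS headers, which most sites do not; that is the usual reason a client-side scraper goes through a proxy. A minimal sketch of trying several proxies in turn is shown below. The two proxy URL formats use placeholder <code>.example</code> hosts and are hypothetical, as is the <code>fetchViaProxies</code> name; <code>fetch</code> is injectable so the fallback logic can be tested without a network.</p>

```javascript
// Hypothetical proxy URL builders; the .example hosts are placeholders,
// not the proxies the project actually uses.
const PROXIES = [
  (url) => `https://proxy-one.example/raw?url=${encodeURIComponent(url)}`,
  (url) => `https://proxy-two.example/?${encodeURIComponent(url)}`,
];

// Try each proxy in order and return the first successful response body.
// `fetchFn` defaults to the global fetch but can be swapped out for testing.
async function fetchViaProxies(url, fetchFn = fetch, proxies = PROXIES) {
  const errors = [];
  for (const buildUrl of proxies) {
    try {
      const res = await fetchFn(buildUrl(url));
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.text();
    } catch (err) {
      errors.push(err.message); // remember why this proxy failed, then try the next
    }
  }
  throw new Error(`All proxies failed: ${errors.join("; ")}`);
}
```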
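<p>Cleaning and sectioning can be sketched as a pure function, assuming the fetched HTML has already been parsed (for example with <code>new DOMParser().parseFromString(html, "text/html")</code>) and flattened into a list of blocks, each with a tag name, its class/id text, and its inner text. The noise tag list and the ad/sidebar class pattern are illustrative guesses, not the project's actual rules; headings start new sections and every other surviving block becomes a paragraph of the current section.</p>

```javascript
// Tags treated as page chrome rather than content (an illustrative, non-exhaustive list).
const NOISE_TAGS = new Set(["script", "style", "nav", "aside", "footer", "iframe", "form"]);
// Class/id words that often mark ads or sidebars (also an assumption).
const NOISE_PATTERN = /\b(ad|ads|advert|banner|sidebar|promo|sponsored)\b/i;

// `blocks` is a flattened page: [{ tag, classes, text }], in document order.
function groupIntoSections(blocks) {
  const sections = [];
  let current = { heading: "(untitled)", paragraphs: [] };
  for (const { tag, classes = "", text } of blocks) {
    if (NOISE_TAGS.has(tag) || NOISE_PATTERN.test(classes)) continue; // drop ads/sidebars
    const clean = text.replace(/\s+/g, " ").trim(); // collapse runs of whitespace
    if (!clean) continue;
    if (/^h[1-6]$/.test(tag)) {
      // A heading closes the current section (if it has content) and opens a new one.
      if (current.paragraphs.length > 0) sections.push(current);
      current = { heading: clean, paragraphs: [] };
    } else {
      current.paragraphs.push(clean);
    }
  }
  if (current.paragraphs.length > 0) sections.push(current);
  return sections;
}
```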
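<p>A terminal-style typing effect reduces to revealing the text a few units at a time on a timer. The sketch below splits the text into grapheme clusters with <code>Intl.Segmenter</code> rather than into code points, so a Sinhala or Tamil vowel sign is not shown detached from its consonant partway through the animation. The <code>typeOut</code> name, the <code>render</code> callback, and the default speed are assumptions.</p>

```javascript
// Sketch of a terminal-style typing animation (hypothetical helper, not the project's code).
// `render` receives the visible prefix on every tick, e.g. (s) => { out.textContent = s; }.
function typeOut(text, render, { unitsPerTick = 2, intervalMs = 15 } = {}) {
  // Split into user-perceived characters so combining vowel signs stay attached.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const units = Array.from(segmenter.segment(text), (s) => s.segment);

  return new Promise((resolve) => {
    let shown = 0;
    const timer = setInterval(() => {
      shown = Math.min(shown + unitsPerTick, units.length);
      render(units.slice(0, shown).join(""));
      if (shown === units.length) {
        clearInterval(timer);
        resolve(); // animation finished
      }
    }, intervalMs);
  });
}
```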