Show HN: Gogosseract, a Go Lib for CGo-Free Tesseract OCR via Wazero
dlock17 Saturday, November 04, 2023
Tesseract is one of the largest Open Source OCR (Optical Character Recognition) projects. There is already a Go library, Gosseract, for using Tesseract from Go via CGo.
However, if you want OCR from Go without C complicating builds and cross-compilation, there aren't any other options.
Wazero is a Go WASM runtime that doesn't have any CGo dependencies. Tesseract has been compiled to WASM with Emscripten and is run within Wazero.
Gogosseract provides a simple API on top of this. This project has been an interesting delve into the world of WASM.
Summary
The linked article is about a reimplementation of Gosseract without CGo, using Tesseract compiled to WASM and run with Wazero. Tesseract is an OCR library written in C++. The article also discusses using a pool of Tesseract workers for thread-safe concurrent image parsing, and strategies for dealing with Tesseract's requirement for training data.
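To make the worker-pool idea concrete, here is a minimal sketch in plain Go. It is not Gogosseract's API: `worker`, `pool`, `newPool`, and `parseAll` are hypothetical names, and `worker.parse` is a stand-in for a real OCR call. The assumption is only that each worker is not safe for concurrent use, so a buffered channel hands each one to a single goroutine at a time.

```go
package main

import (
	"fmt"
	"sync"
)

// worker stands in for one OCR instance that is NOT safe for concurrent use.
// Hypothetical type for illustration; not part of Gogosseract.
type worker struct {
	id    int
	calls int // mutated without locking, so access must be exclusive
}

// parse pretends to recognise the text in an image.
func (w *worker) parse(image string) string {
	w.calls++
	return "text of " + image
}

// pool hands out workers through a buffered channel, so each worker is
// held by at most one goroutine at any moment.
type pool struct {
	idle chan *worker
}

// newPool creates size workers and marks them all as idle.
func newPool(size int) *pool {
	p := &pool{idle: make(chan *worker, size)}
	for i := 0; i < size; i++ {
		p.idle <- &worker{id: i}
	}
	return p
}

// parse blocks until a worker is free, uses it, then returns it to the pool.
func (p *pool) parse(image string) string {
	w := <-p.idle
	defer func() { p.idle <- w }()
	return w.parse(image)
}

// parseAll parses every image concurrently and keeps results in input order.
func parseAll(p *pool, images []string) []string {
	results := make([]string, len(images))
	var wg sync.WaitGroup
	for i, img := range images {
		wg.Add(1)
		go func(i int, img string) {
			defer wg.Done()
			results[i] = p.parse(img)
		}(i, img)
	}
	wg.Wait()
	return results
}

func main() {
	p := newPool(2)
	for _, text := range parseAll(p, []string{"a.png", "b.png", "c.png"}) {
		fmt.Println(text)
	}
}
```

With two workers and three images, at most two parses run at once and the third waits for a worker to be returned. Callers never touch a worker directly, which is what keeps the non-thread-safe instance safe to use from many goroutines.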
github.com