Tesseract 3.0x.xx running in Docker Container based on Alpine Linux
Tesseract is installed from the Alpine Linux binary package repository with `apk`.
Tesseract is not built from the sources, so the latest 4.xx.xx development versions are not used.
A Dockerfile and scripts for building the latest Tesseract development version from the sources are planned.
Tesseract is an Open Source OCR Engine.
The latest stable Tesseract version is 3.05.01, released on 2017-06-01.
For the original Tesseract sources and the Tesseract Wiki, see the Tesseract GitHub repository.
Latest changes in this repository were made on 2017-11-27
- Current Alpine Linux version: `3.6`
- Current stable Tesseract `apk` binary version: `3.04.01-r1` from the Alpine Linux `main` repository
- Latest stable Tesseract `apk` binary version: `3.05.01-r2` from the Alpine Linux `edge` repository
- To build the latest Tesseract development versions `4.xx.xx` from the sources, see the Tesseract Wiki

- Test all bash scripts and Dockerfiles
- Add tessdata to the build scripts
- Write a Makefile for easier build options
- Add build options for the `main`/`edge` Alpine Linux branches
- Add build options for language data packages
- Write scripts and a Dockerfile for the source build
- Write scripts and a multistage Dockerfile for the source build
- Add a `python:3.6-alpine3.6` layer, the Python wrapper library `pytesseract`, and some sort of interface/API to the container
- Maybe add file triggers for automatic OCR transformation (?)

- This repo is under development. Scripts are not finished yet!
- I'm neither a Docker nor a Linux expert, so be patient...

- Image based on the Alpine Linux image `alpine:3.6`
- Installs Tesseract from the Alpine Linux repository `main` branch
- Installs the language data packages for ENG and DEU (English and German)
- Tesseract version `3.04.01-r1`

- Image based on the Alpine Linux image `alpine:3.6`
- Installs Tesseract from the Alpine Linux repository `edge` branch
- Installs the language data packages for ENG and DEU (English and German)
- Tesseract version `3.05.01-r2`
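
Both Alpine images above install Tesseract together with English and German language data through `apk`. The following is a minimal sketch of how the package list could be assembled. The helper name `tesseract_packages` and the `RUN_INSTALL` switch are hypothetical, and the package names (`tesseract-ocr`, `tesseract-ocr-data-<lang>`, with English data bundled in `tesseract-ocr`) are assumptions that should be checked against the Alpine package index.

```shell
#!/bin/sh
# Hypothetical helper: list the apk packages for Tesseract plus language data.
# Assumption: Alpine names the packages tesseract-ocr and tesseract-ocr-data-<lang>,
# and English data ships with tesseract-ocr itself.
tesseract_packages() {
    pkgs="tesseract-ocr"
    for lang in "$@"; do
        # English is assumed to be bundled, so it needs no extra package.
        if [ "$lang" != "eng" ]; then
            pkgs="$pkgs tesseract-ocr-data-$lang"
        fi
    done
    printf '%s\n' "$pkgs"
}

# Print the package list for English and German.
tesseract_packages eng deu

# Install only when explicitly requested, e.g. inside an Alpine container.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    # Word splitting of the package list is intentional here.
    apk add --no-cache $(tesseract_packages eng deu)
fi
```

Inside a container, `RUN_INSTALL=1 sh install.sh` would perform the installation (the file name `install.sh` is only an example). Without that variable the script just prints the package list.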

- Image based on the Debian 9 "Stretch" Docker image `debian:stretch`
- Installs the dependencies for Tesseract
- Clones the GitHub repository
- Builds Tesseract from the sources
- Installs Tesseract

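
The four steps of the Debian-based source build can be sketched as a shell function. This is not the script from this repository, which is not finished yet. The function name `build_tesseract_from_source`, the `RUN_BUILD` switch, the dependency list, and the clone directory are assumptions, and whether the current upstream sources still build against the library versions shipped in `debian:stretch` has not been verified here.

```shell
#!/bin/sh
# Sketch of a Tesseract source build on debian:stretch (run as root in the container).
# The function body runs in a subshell, so "set -eu" and "cd" do not leak to the caller.
build_tesseract_from_source() (
    set -eu

    # 1. Install build dependencies (assumed package set).
    apt-get update
    apt-get install -y --no-install-recommends \
        autoconf autoconf-archive automake ca-certificates g++ git \
        libtool libleptonica-dev make pkg-config

    # 2. Clone the upstream GitHub repository (target directory is an assumption).
    git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git /tmp/tesseract
    cd /tmp/tesseract

    # 3. Build Tesseract from the sources with the autotools workflow.
    ./autogen.sh
    ./configure
    make

    # 4. Install Tesseract and refresh the shared-library cache.
    make install
    ldconfig
)

# Run the build only when explicitly requested, so sourcing this file is harmless.
if [ "${RUN_BUILD:-0}" = "1" ]; then
    build_tesseract_from_source
fi
```

In a Dockerfile this script could be copied into the image and executed with `RUN_BUILD=1`. Language data (tessdata) is not part of this sketch and would still have to be added.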
From the original Tesseract Wiki Documentation.
Basic command line usage:
tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
For more information about the various command line options, use `tesseract --help` or `man tesseract`.
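
As a small illustration of the basic usage line above, the following sketch assembles a `tesseract` call for one image. The helper name `ocr_command` and the sample file `scan.png` are hypothetical. The sketch assumes that the image name has a file extension and that the output base should be the image name without it (Tesseract appends `.txt` to the output base itself).

```shell
#!/bin/sh
# Hypothetical helper: print the tesseract command line for one image.
ocr_command() {
    image="$1"
    lang="${2:-eng}"          # default language: English
    outputbase="${image%.*}"  # strip the last extension, e.g. scan.png -> scan
    printf 'tesseract %s %s -l %s\n' "$image" "$outputbase" "$lang"
}

# Show the command for a German-language scan.
ocr_command scan.png deu

# Run OCR only if tesseract is installed and the sample image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f scan.png ]; then
    tesseract scan.png scan -l deu
fi
```

With the German language data installed, the guarded call writes the recognized text to `scan.txt`.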