Skip to content
This repository has been archived by the owner on Jun 16, 2020. It is now read-only.
/ docker-tesseract Public archive

Tesseract in Docker Container running Alpine Linux

Notifications You must be signed in to change notification settings

Franky1/docker-tesseract

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

docker-tesseract

Tesseract 3.0x.xx running in Docker Container based on Alpine Linux
Tesseract ist being installed from Alpine Linux Binary Package Repository with apk
Tesseract ist not being built from the sources using the latest 4.xx.xx development versions.
There will be a Dockerfile and scripts for building latest tesseract development version from sources in the future...

Tesseract

Tesseract is an Open Source OCR Engine.
The latest stable Tesseract version is 3.05.01 released on 2017-06-01
Original Tesseract Sources and Tesseract Wiki see Tesseract GitHub Repository


Status of this Repository

Latest changes in this repository were made on 2017-11-27

Versions (2017-11-27)

  • Current Alpine Linux version: 3.6
  • Current stable Tesseract apk binary version: 3.04.01-r1 from Alpine-Linux main Repository
  • Latest stable Tesseract apk binary version: 3.05.01-r2 from Alpine-Linux edge Repository
  • To build the latest Tesseract development versions 4.xx.xx from the sources see Tesseract Wiki

ToDo (2017-11-27)

  • Test all bash scripts and Dockerfiles
  • Add tessdata to the build scripts
  • Write Makefile for easier build options
  • Add build options for the main/edge alpine linux branches
  • Add build options for language data packages
  • Write scripts and Dockerfile for source build
  • Write scripts and Multistage Dockerfile for source build
  • Add python:3.6-alpine3.6 layer and a python wrapper library pytesseract and some sort of Interface/API to the container
  • Maybe adding file triggers for automatic OCR transformation (?)

Disclaimer (2017-11-27)

  • This repo is under development. Scripts are not finished yet!
  • I'm neither a Docker nor a Linux expert, so be patient...

Docker Containers

Docker Containers built from binary packages

Tesseract Docker Container based on Alpine Linux and apk binary main repository

  • Image based on Alpine Linux Image alpine:3.6
  • Installs Tesseract from Alpine Linux Repository main branch
  • Installs Language Data package for ENG and DEU (English and German)
  • Tesseract Version 3.04.01-r1

Tesseract Docker Container based on Alpine Linux and apk binary edge repository

  • Image based on Alpine Linux Image alpine:3.6
  • Installs Tesseract from Alpine Linux Repository edge branch
  • Installs Language Data package for ENG and DEU (English and German)
  • Tesseract Version 3.05.01-r2

Docker Containers built from sources

Tesseract Docker Container based on Debian 9 and latest Tesseract git sources

  • Image based on Debian 9 "Stretch" Docker Image debian:stretch
  • Installs Dependencies for Tesseract
  • Clone GitHub repository
  • Build Tesseract from sources
  • Install Tesseract

Running Tesseract

From the original Tesseract Wiki Documentation.
Basic command line usage:

tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]

For more information about the various command line options use tesseract --help or man tesseract.

About

Tesseract in Docker Container running Alpine Linux

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published