Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Updated Sep 24, 2024 - Python
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Topic Modelling for Humans
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
Anomaly detection related books, papers, videos, and toolboxes
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
A unified framework for machine learning with time series
🍊 📊 💡 Orange: Interactive data analysis
A library of extension and helper modules for Python's data analysis and machine learning libraries.
A set of tools for extracting tables from PDF files, to support data mining on (OCR-processed) scanned documents.
Novel deep learning research works with PaddlePaddle
A curated list of data mining papers about fraud detection.
Comprehensive and timely academic information on federated learning (papers, frameworks, datasets, tutorials, workshops)
Multi-class confusion matrix library in Python
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.
AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the built-in csv module with improved dialect detection, and comes with a handy command line application for working with CSV files.
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values
pyclustering is a Python, C++ data mining library.