Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Updated Feb 26, 2025 - Java
Wrangler Transform: A DMD system for transforming Big Data
ETL processing tool with SQL-like language and GIS capabilities, built on core Spark. Extensible and modular. Rich CLI included
Preprocessing of data (e.g. filling missing values, normalization, etc.) in the field of Data Mining (Knowledge Discovery).
The project efficiently processes user data, demonstrating key components. Explore the code for a structured approach to large-scale data transformations.
🗓️ iCalendar proxy reshaping the data for your needs
DeltaFi is a flexible, code-light data transformation and normalization platform.
Pluggable framework that can be used to spider websites and extract data.
Apache Spark based 'Dist' utility to supplement Data Cooker ETL tool
API to receive IoT data from an end device
[👨‍🎓 BSc thesis] merGeo: Integration Platform For Linked Data Management Tools