Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University (updated Dec 3, 2024; language: HTML)
Reference Architectures for Datalakes on AWS
Tokyo-olympic-azure-data-engineering-end-to-end-project
This course will teach students to use popular tools for sourcing data, transforming it, building and optimizing models, communicating these as visual stories, and deploying them in production.
Course materials for CDS 101: Introduction to Computational and Data Sciences, offered at George Mason University
🎸 XML Guitars Project showcases the use of XML, XSLT, CSS, DTD, and JavaScript to create an interactive and visually appealing display of structured guitar data. This project transforms raw XML into dynamic web pages with filtering and search features.
A website that helps users view, verify, and modify data for preprocessing, and apply various classical ML algorithms.
Explores MOOC engagement using the PhD dataset and multiple iterations. It classifies learners into Completers, Disengaging Learners, Auditing Learners, and Bystanders based on their interactions. Using dplyr, it merges, transforms, and aggregates data to analyze engagement trends and retention, offering insights for improving online learning.
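The merge and aggregate steps can be illustrated outside dplyr. The following is a minimal Python sketch on made-up toy tables, not the PhD dataset, and the names LEARNERS, EVENTS and active_weeks_per_learner are hypothetical. It left-joins learners to an activity log and counts distinct active weeks per learner, roughly what left_join(), group_by() and summarise(n_distinct(week)) would do in dplyr.

```python
# Hypothetical toy tables (not the PhD dataset): learners plus an activity log.
LEARNERS = [{"id": 1}, {"id": 2}, {"id": 3}]
EVENTS = [
    {"learner_id": 1, "week": 1},
    {"learner_id": 1, "week": 1},
    {"learner_id": 1, "week": 2},
    {"learner_id": 2, "week": 1},
    {"learner_id": 4, "week": 1},  # no matching learner, dropped by the left join
]


def active_weeks_per_learner(learners, events):
    """Left-join learners to events on id, then count distinct active weeks."""
    # Merge step: every learner starts with an empty set, so learners with
    # no events still appear in the result (left-join semantics).
    weeks = {row["id"]: set() for row in learners}
    for event in events:
        learner_weeks = weeks.get(event["learner_id"])
        if learner_weeks is not None:
            learner_weeks.add(event["week"])
    # Aggregate step: one summary value per learner.
    return {learner_id: len(ws) for learner_id, ws in weeks.items()}
```

Assigning learners to categories such as Completers or Bystanders would then apply rules to per-learner summaries like these; the description does not state those rules, so they are not sketched here.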
XABN (XML Abbreviated Notation) - Combines a simplified format for representing XML data with a cross-platform object notation covering a comprehensive range of data structures and types. XABN allows the direct exchange of objects between applications, including XML generation and conversion.
Implementation of a traditional classifier of argumentative components (claims and premises), trained with features/metadata previously extracted from manually annotated argumentative sentences from the citizen proposals available in the Decide Madrid platform.
The website is now described as an educational resource for data management whose aim is to educate, engage, and guide users and to provide resources.
A web tool for converting between JSON and XML.
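Conversion between JSON and XML can be sketched with only the Python standard library (json and xml.etree.ElementTree). This is a minimal illustration, not the tool's actual code: the function names are hypothetical, it assumes every JSON key is a valid XML tag name, and it uses a hypothetical item tag for list entries.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, tag="root"):
    """Build an Element named `tag` from a parsed JSON value."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(json_to_xml(child, key))  # assumes key is a valid XML name
    elif isinstance(value, list):
        for child in value:
            elem.append(json_to_xml(child, "item"))  # hypothetical tag for list entries
    elif value is not None:
        elem.text = str(value)  # scalars become text; type information is lost
    return elem


def xml_to_json(elem):
    """Inverse of json_to_xml for the conventions used above."""
    children = list(elem)
    if not children:
        return elem.text
    if all(child.tag == "item" for child in children):
        return [xml_to_json(child) for child in children]
    return {child.tag: xml_to_json(child) for child in children}


doc = json.loads('{"name": "demo", "tags": ["a", "b"]}')
xml_text = ET.tostring(json_to_xml(doc), encoding="unicode")
```

Because XML text carries no types, numbers and booleans come back as strings, and an object with a single key named item would be read back as a list; a real converter has to choose conventions for such cases.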
Performs EDA on fertility rates and national income, fits a simple linear regression model, diagnoses its validity, and uses it to predict future fertility based on income.
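A simple linear regression with one predictor reduces to ordinary least squares: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The pure-Python sketch below is not the project's code; the function names are hypothetical, and any points you feed it in testing should be treated as toy values, not fertility or income data. R² is shown as one basic diagnostic; checking residuals for patterns or non-constant variance is the usual next step.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the line (assumes ys are not all equal)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def predict(x, intercept, slope):
    """Point prediction from the fitted line."""
    return intercept + slope * x
```

For example, fitting the made-up points (1, 5.5), (2, 5.0), (3, 4.5), (4, 4.0) gives intercept 6.0 and slope -0.5, so predict(6, 6.0, -0.5) returns 3.0.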
This repository provides an introduction to essential data analysis libraries, including NumPy and Pandas.
Slides for a presentation on how CDVS helps patrons with data transformations.
A Jupyter notebook documentation of an ETL (extract -> transform -> load) data pipeline
Skills: Python (Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, Statsmodels)
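An extract, transform, load sequence can be illustrated with the Python standard library alone. The sketch below is a stand-in under stated assumptions, not the notebook's pipeline: the sample CSV, the sales table and the function names are hypothetical, and it uses csv and an in-memory sqlite3 database in place of Pandas.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real pipeline would read from files or an API.
RAW_CSV = """id,name,amount
1,alpha,10.5
2,beta,
3,gamma,7
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast columns to types."""
    records = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete rows
        records.append((int(row["id"]), row["name"].strip(), float(row["amount"])))
    return records


def load(records, conn):
    """Load: write the cleaned records into a SQLite table (hypothetical name)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW_CSV)), conn)
```

Keeping each stage a separate function makes it possible to test the transform logic on its own before anything is written to the target store.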
Introduction to Diffusion Real-Time Event Stream through a simple application using Diffusion Cloud and Apache Kafka. A simple project illustrating real-time replication and fan-out of foreign exchange (FX) event streams from Kafka cluster A to Kafka cluster B through a Diffusion Cloud instance, using our Kafka Adapter.
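Replication plus fan-out can be pictured with a small in-memory model. This sketch does not use Apache Kafka, Diffusion Cloud, or the Kafka Adapter; the Topic class and replicate function are hypothetical stand-ins, and the networking, persistence, ordering and delivery guarantees of the real systems are ignored. It shows a message published on cluster A being forwarded to cluster B, where every subscriber receives its own copy.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, object]


class Topic:
    """In-memory stand-in for a topic on one cluster (not a Kafka or Diffusion API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[Message] = []
        self.subscribers: List[Callable[[Message], None]] = []

    def subscribe(self, callback: Callable[[Message], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, message: Message) -> None:
        self.log.append(message)  # retain the message, like an append-only log
        for callback in self.subscribers:
            callback(message)  # fan-out: every subscriber gets every message


def replicate(source: Topic, target: Topic) -> None:
    """One-way replication: re-publish everything from source onto target."""
    source.subscribe(target.publish)


cluster_a = Topic("fx@A")
cluster_b = Topic("fx@B")
replicate(cluster_a, cluster_b)

received: Dict[str, List[Message]] = defaultdict(list)
for client in ("client-1", "client-2"):
    # The default argument binds the current client name to each callback.
    cluster_b.subscribe(lambda msg, c=client: received[c].append(msg))

cluster_a.publish({"pair": "EUR/USD", "seq": 1})  # illustrative message only
```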