The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Easily convert JSON data into Parquet format for efficient storage and analysis. Simplify data processing and analysis pipelines by converting JSON objects into optimized Parquet files.
This repository contains the NYC Taxi Data Engineering Pipeline project, which aims to build a comprehensive data engineering pipeline using NYC taxi data from the years 2022 and 2023. The pipeline involves extracting, transforming and loading (ETL) data into a Snowflake database, followed by creating a dashboard for visualisation.
Application that captures messages from a Telegram group and stores them daily in files, using AWS S3 for cloud storage. The messages are then analyzed for sentiment, mentions of the company's products, and purchase-intent detection. Processing is automated in batch using AWS Lambda functions.
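The daily storage step described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: it assumes each message is a dict carrying a Unix-time `date` field (as in the Telegram Bot API's Message object), and the function names, the `telegram` prefix, and the `.jsonl` key layout are all hypothetical. The upload to S3 and the Lambda trigger are left out.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def group_messages_by_day(messages):
    """Group messages by the UTC calendar day of their Unix `date` field."""
    by_day = defaultdict(list)
    for msg in messages:
        day = datetime.fromtimestamp(msg["date"], tz=timezone.utc).strftime("%Y-%m-%d")
        by_day[day].append(msg)
    return dict(by_day)


def daily_objects(messages, prefix="telegram"):
    """Return {object_key: JSON Lines body}, one entry per day.

    The key layout "<prefix>/<YYYY-MM-DD>.jsonl" is an assumption made
    for this sketch, not something taken from the project.
    """
    return {
        f"{prefix}/{day}.jsonl": "\n".join(
            json.dumps(m, ensure_ascii=False) for m in msgs
        )
        for day, msgs in group_messages_by_day(messages).items()
    }
```

Each resulting key and body pair could then be written to a bucket by whatever storage client the pipeline uses, so a batch job only needs to read one object per day.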