Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google's documentation describes it as a "serverless architecture [that] lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Its client libraries support widely used languages such as Python, Java, JavaScript, and Go, and federated queries let it read data from external sources.
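As a quick illustration of the client libraries mentioned above, here is a minimal Python sketch (not part of the original page) that runs a standard SQL query against a BigQuery public dataset. It assumes the google-cloud-bigquery package is installed and that application-default credentials are available in the environment.

# Minimal sketch: query a BigQuery public dataset with the Python client library.
from google.cloud import bigquery

client = bigquery.Client()  # picks up the default project and credentials from the environment

# usa_names is a real BigQuery public dataset; the query counts name occurrences in Texas.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

# result() waits for the query job to finish and returns an iterator of rows.
for row in client.query(query).result():
    print(f"{row.name}: {row.total}")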
📖 A highly rated, canonical book on the subject is "Google BigQuery: The Definitive Guide", a comprehensive reference.
Another worthwhile read is the inside story of BigQuery, told by its founding product manager in an article celebrating the product's 10th anniversary.
This solution provides an automated, serverless way to redact sensitive data from PDF files using Google Cloud Services like Data Loss Prevention (DLP), Cloud Workflows, and Cloud Run.
A demo project that uses Terraform to manage BigQuery scheduled queries, with Cloud Build for CI/CD
...an automated data pipeline that retrieves cryptocurrency data from the CoinCap API, processes and transforms it for analysis, and presents key metrics on a near-real-time dashboard
Yelp Data Processing Pipeline on GCP
This project uses Terraform to deploy a BigQuery Data Clean Room on Google Cloud
Final project for DataTalks.Club Data Engineering bootcamp
Use GCP Datastream to incrementally load PostgreSQL to BigQuery
A Dataflow job that subscribes to a Pub/Sub subscription, takes each message from the subscription, and pushes it into a BigQuery table (a simplified sketch of this flow appears after this list).
terraform-bigquery-googlesheet
A Terraform module to copy BigQuery datasets across regions
Bauli repository for connecting SAP Datasphere and Google BigQuery
This repo contains the solution for an ETL pipeline on GCP, using Terraform for infrastructure and Airflow for orchestration.
Automatic Anomaly Detector
GenAI data pipeline that performs data preparation, management and performance evaluation tasks for RAG systems using SQL as the primary development language. Please feel free to use this as a starting point for your own projects.
An IaC script to ingest and process messages containing data about trips taken by vehicles.
Terraform module for a BigQuery sink connector on an Aiven Kafka Connect cluster
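The Dataflow subscriber entry above describes a common pattern: pull messages from a Pub/Sub subscription and stream them into a BigQuery table. The sketch below is a simplified illustration of that flow using the plain Python client libraries rather than Dataflow; the project, subscription, and table IDs are placeholders, and each message body is assumed to be a JSON object whose fields match the destination table's schema.

# Simplified sketch of a Pub/Sub-to-BigQuery flow; IDs below are placeholders.
import json

from google.cloud import bigquery, pubsub_v1

PROJECT_ID = "my-project"                    # placeholder
SUBSCRIPTION_ID = "trips-subscription"       # placeholder
TABLE_ID = "my-project.my_dataset.trips"     # placeholder

bq_client = bigquery.Client()
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def handle_message(message):
    # Message body is assumed to be a JSON object matching the table schema.
    row = json.loads(message.data.decode("utf-8"))
    errors = bq_client.insert_rows_json(TABLE_ID, [row])  # streaming insert
    if not errors:
        message.ack()   # acknowledge only after the row is stored
    else:
        message.nack()  # let Pub/Sub redeliver the message on failure

# Blocks until the streaming pull is cancelled or fails.
streaming_pull_future = subscriber.subscribe(subscription_path, callback=handle_message)
streaming_pull_future.result()

Acknowledging each message only after the streaming insert succeeds means failed rows are redelivered rather than silently dropped, which is the same at-least-once behavior the Dataflow-based version aims for.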
Released May 19, 2010