The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Updated Dec 14, 2024
ClickHouse® is a real-time analytics database management system
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
An open source cybersecurity protocol for syncing decentralized graph data.
The Data Engineering Cookbook
PredictionIO, a machine learning server for developers and ML engineers.
CMAK is a tool for managing Apache Kafka clusters
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Open-Source Web UI for Apache Kafka Management
The most widely used Python-to-C compiler
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs