big-data

database
sql
python
java
scala
data-science
machine-learning
hadoop
r
spark
analytics
flink
graph
data-engineering
olap
hacktoberfest
distributed-database
deep-learning
kaggle
hive
data
kafka
graphdb
distributed
graph-database
gbm
data-visualization
business-intelligence
dashboard
data-analytics
jdbc
mpp

apache/spark
502 days ago · 37.8k stars

Apache Spark - A unified analytics engine for large-scale data processing

ClickHouse/ClickHouse
502 days ago · 33.1k stars

ClickHouse® is a free analytics DBMS for big data

donnemartin/data-science-ipython-notebooks
503 days ago · 26.1k stars

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

apache/flink
502 days ago · 22.8k stars

Apache Flink

prestodb/presto
501 days ago · 15.4k stars

The official home of the Presto distributed SQL query engine for big data

andkret/Cookbook
502 days ago · 12.7k stars

The Data Engineering Cookbook

apache/predictionio
504 days ago · 12.6k stars

PredictionIO, a machine learning server for developers and ML engineers.

yahoo/CMAK
502 days ago · 11.6k stars

CMAK is a tool for managing Apache Kafka clusters

vesoft-inc/nebula
502 days ago · 9.9k stars

A distributed, fast open-source graph database featuring horizontal scalability and high availability

catboost/catboost
502 days ago · 7.6k stars

A fast, scalable, high-performance library for Gradient Boosting on Decision Trees, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.

h2oai/h2o-3
502 days ago · 6.6k stars

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

apache/zeppelin
502 days ago · 6.2k stars

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

risingwavelabs/risingwave
501 days ago · 5.9k stars

Scalable Postgres for stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.

tangbc/vue-virtual-scroll-list
507 days ago · 4.2k stars

⚡️A Vue component for rendering lists with large amounts of data, with high and efficient rendering performance.

TuiQiao/CBoard
512 days ago · 3.0k stars

An easy to use, self-service open BI reporting and BI dashboard platform.

apache/incubator-hugegraph
503 days ago · 2.5k stars

A graph database that supports more than 100 billion data items, with high performance and scalability (includes OLTP engine, REST API, and backends)

dremio/dremio-oss
502 days ago · 1.3k stars

Dremio - the missing link in modern data

traildb/traildb
511 days ago · 1.1k stars

TrailDB is an efficient tool for storing and querying series of events