Apache DataSketches API

Apache DataSketches is an open-source library that provides production-quality implementations of sketch algorithms, including Theta Sketches (set operations), Quantiles Sketches (percentile estimation), HLL (HyperLogLog, for cardinality estimation), CPC (Compressed Probabilistic Counting, also for cardinality estimation), Frequency sketches, and Tuple sketches. It is widely used in data warehouses and OLAP systems, including Apache Druid, Apache Spark, and Amazon Redshift. The library provides Java, C++, and Python APIs.
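
To make the idea of a sketch concrete, the following is a minimal, illustrative k-minimum-values (KMV) distinct-count estimator in plain Python. It is not the DataSketches API: the names `KMVSketch` and `hash_to_unit` are hypothetical, and the library's Theta Sketches are a more general and far more optimized relative of this technique. The example only shows how a bounded summary can approximate a cardinality and support a union.

```python
import hashlib
import heapq


def hash_to_unit(item: str) -> float:
    """Map an item to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Toy k-minimum-values sketch (hypothetical, for illustration only)."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []        # max-heap of retained hashes, stored negated
        self._members = set()  # retained hashes, for duplicate detection

    def _insert(self, h: float) -> None:
        if h in self._members:
            return  # duplicates hash to the same value and are ignored
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th minimum: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item: str) -> None:
        self._insert(hash_to_unit(item))

    def estimate(self) -> float:
        """Exact count while under capacity, otherwise (k - 1) / k-th minimum."""
        if len(self._heap) < self.k:
            return float(len(self._heap))
        kth_min = -self._heap[0]
        return (self.k - 1) / kth_min

    def union(self, other: "KMVSketch") -> "KMVSketch":
        """Merge two sketches by keeping the k smallest hashes of both."""
        merged = KMVSketch(min(self.k, other.k))
        for h in self._members | other._members:
            merged._insert(h)
        return merged


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(200_000):
        sketch.update(f"user-{i % 50_000}")  # 50,000 distinct items
    print("estimated distinct count:", round(sketch.estimate()))
```

The sketch keeps at most `k` hash values regardless of input size, and the relative error of the estimate shrinks roughly as 1/sqrt(k). The production library should be preferred for real workloads.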

API entry from apis.yml

aid: sketches:apache-datasketches-api
name: Apache DataSketches API
description: Apache DataSketches is an open-source library providing production-quality implementations
  of sketch algorithms including Theta Sketches (set operations), Quantiles Sketches (percentile estimation),
  HLL (HyperLogLog for cardinality), CPC, Frequency, and Tuple sketches. It is widely used in data warehouses
  and OLAP systems including Apache Druid, Apache Spark, and Amazon Redshift. The library provides Java,
  C++, and Python APIs.
humanURL: https://datasketches.apache.org
baseURL: https://datasketches.apache.org
tags:
- Open Source
- Apache
- Data Structures
- Probabilistic Algorithms
- Analytics
properties:
- url: https://datasketches.apache.org
  type: Documentation
- url: https://github.com/apache/datasketches-java
  type: GitHubOrg