Built for scale

Dagger, or Data Aggregator, is an easy-to-use, configuration-over-code, cloud-native framework built on top of Apache Flink for stateful processing of both real-time and historical streaming data. With Dagger, you don't need to write custom applications to process data as a stream. Instead, you can write SQL queries to process and analyze streaming data.

Reliable & consistent processing

Provides built-in support for fault-tolerant execution that is consistent and correct regardless of data size, cluster size, processing pattern or pipeline complexity.

Robust recovery mechanism

Checkpoints, savepoints and state backups ensure that even in unforeseen circumstances, clusters and jobs can be brought back within minutes.

SQL and more

Define business logic in a query and kick-start your streaming job. Beyond SQL, Dagger also supports user-defined functions and pre-defined transformations.

Scale

Dagger scales in an instant, both vertically and horizontally, for high-performance streaming sinks and zero data drops.

Extensibility

Add your own sink to Dagger through a clearly defined interface, or choose from those already provided. Use Kafka Source for processing real-time data, or opt for Parquet Source to stream historical data from Parquet files.

Flexibility

Add custom business logic in the form of plugins (UDFs, Transformers, Preprocessors and Post Processors) independent of the core logic.

Key Features

A stream processing platform for transforming, aggregating and enriching data in real time, with ease of operation and high reliability. Dagger can be deployed on VMs or in cloud-native environments, which makes resource provisioning and deployment simple and straightforward. The only limit to your data processing is your imagination.

Aggregations

Supports Tumble and Slide time windows. The Longbow feature supports large windows of up to 30 days.
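To make the difference between the two window types concrete, here is a minimal Python sketch of how an event timestamp maps to tumbling and sliding windows. It is an illustration of the general windowing idea only, not Dagger's or Flink's implementation; the function names and the millisecond timestamps are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def tumbling_window_start(ts_ms: int, size_ms: int) -> int:
    """Return the start of the single tumbling window containing ts_ms."""
    return ts_ms - (ts_ms % size_ms)


def sliding_window_starts(ts_ms: int, size_ms: int, slide_ms: int) -> List[int]:
    """Return the starts of every sliding window [start, start + size_ms) containing ts_ms."""
    # Latest window start that is aligned to the slide and not after the event.
    start = ts_ms - (ts_ms % slide_ms)
    starts = []
    # Walk backwards one slide at a time while the window still covers the event.
    while start > ts_ms - size_ms:
        starts.append(start)
        start -= slide_ms
    return starts


def count_per_tumbling_window(event_times_ms: Iterable[int], size_ms: int) -> Dict[int, int]:
    """Count events per tumbling window, keyed by window start."""
    counts: Dict[int, int] = defaultdict(int)
    for ts in event_times_ms:
        counts[tumbling_window_start(ts, size_ms)] += 1
    return dict(counts)
```

With a 10-second window, an event at 7000 ms belongs to exactly one tumbling window (starting at 0). With the same size and a 5-second slide, it belongs to two overlapping sliding windows (starting at 5000 and at 0), so it contributes to both aggregates.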

SQL Support

Query writing made easy through formatting, suggestions, auto-completes and template queries.

Stream Enrichment

Enrich streamed messages from HTTP endpoints or database sources to bring offline & reference data context to real-time processing.
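The enrichment idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Dagger's API: the `enrich` function, its parameters and the field names in the usage note are all assumptions. The `lookup` callable stands in for an HTTP endpoint or database query, and a small in-memory cache avoids repeating the lookup for keys already seen.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


def enrich(
    messages: Iterable[Dict[str, Any]],
    lookup: Callable[[Any], Any],
    key_field: str,
    target_field: str,
) -> Iterator[Dict[str, Any]]:
    """Yield a copy of each message with reference data attached under target_field."""
    cache: Dict[Any, Any] = {}
    for message in messages:
        key = message[key_field]
        # Call the external source only the first time a key is seen.
        if key not in cache:
            cache[key] = lookup(key)
        # Build a new dict so the incoming message is left unmodified.
        yield {**message, target_field: cache[key]}
```

For example, calling `enrich(bookings, fetch_customer_tier, "customer_id", "tier")` would attach a `tier` field to every booking message, where `bookings` and `fetch_customer_tier` are placeholders for a message stream and a lookup against a reference store. A production pipeline would also need timeouts, retries and cache expiry, which this sketch leaves out.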

Observability

Always know what’s going on with your deployment with built-in monitoring of throughput, response times, errors and more.

Analytics Ecosystem

Dagger can transform, aggregate, join and enrich data in real time for operational analytics using InfluxDB, Grafana and others.

Stream Transformations

Convert messages on the fly for a variety of use-cases such as feature engineering.

Support for Real-Time and Historical Data Streaming

Use Kafka Source for processing real-time data, or opt for Parquet Source to stream historical data from Parquet files.

Proud Users

Dagger was originally created for the Gojek data processing platform, and it has been used, adapted and improved by other teams internally and externally.