Removing duplicate records using Transformers
About this example
In this example, we will use the DeDuplication Transformer in Dagger to remove booking orders (arriving as Kafka records) that have a duplicate order_number. By the end of this example, we will understand how to use Dagger to remove duplicate data from a Kafka source.
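To make the goal concrete, here is a minimal shell sketch of what "removing records with a duplicate order_number" means: only the first record seen for each key is kept. The sample records, their "order_number,status" layout, and the function name dedupe_by_order_number are assumptions made up for illustration. This is not how Dagger implements the transformer; it only mimics the effect on a small batch of lines.

```shell
# Hypothetical helper: keep only the first line seen for each order_number.
# Assumes one record per line, with order_number as the first comma-separated field.
dedupe_by_order_number() {
  # seen[$1]++ evaluates to 0 (false) the first time a key appears,
  # so the negation prints a line only on the first occurrence of its key.
  awk -F',' '!seen[$1]++'
}

# Made-up sample bookings; ON-1 and ON-2 each appear twice.
printf '%s\n' \
  'ON-1,CREATED' \
  'ON-2,CREATED' \
  'ON-1,CREATED' \
  'ON-3,COMPLETED' \
  'ON-2,CANCELLED' | dedupe_by_order_number
# prints:
# ON-1,CREATED
# ON-2,CREATED
# ON-3,COMPLETED
```

Note that the later ON-2 record is dropped even though its status differs, because the comparison is on the key only. Unlike this batch sketch, a streaming job works on an unbounded stream, so it cannot simply read all input before deciding what to keep.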
Before Trying This Example
We must have Docker installed. We can follow this guide on how to install and set up Docker on our local machine.
Clone the Dagger repository to the local machine:
git clone https://github.com/raystack/dagger.git
Steps
Follow these steps to set up Dagger with Docker Compose:
- cd into the example directory:
cd dagger/quickstart/examples/aggregation/tumble_window
- Run this command to spin up Docker Compose:
docker compose up
Hang on for a while as it installs all the required dependencies and starts all the required services. After a while we should see the output of the Dagger SQL query in the terminal, which will be the booking logs without any duplicate order_number.
- Run this command to gracefully shut down Docker Compose:
docker compose down
This will stop and remove all the containers.
Congratulations, we are now able to use Dagger to remove duplicate data from a Kafka source!