# Triple Joins with Simulated Data
The simulation and join pipeline:

1. Cards are generated with an id and additional data.
2. Verifications of cards by users are generated.
3. Users are generated.
4. First, the verifications are joined to the cards.
5. Next, the users are joined to the verified cards.
6. Then an aggregation is performed on the users, where the cards verified by each user are determined.
7. Finally, a filter is applied so that only users with more than a threshold number (200) of verifications are output.
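The steps above can be sketched end to end in plain Python. The record shapes and field names (`card_id`, `user_id`, etc.) are assumptions for illustration, not the repository's actual schema:

```python
import random
from collections import Counter

random.seed(42)

# Simulated data (field names are illustrative)
cards = [{"card_id": c, "data": f"card-{c}"} for c in range(100)]
users = [{"user_id": u, "name": f"user-{u}"} for u in range(10)]
verifications = [
    {"card_id": random.randrange(100), "user_id": random.randrange(10)}
    for _ in range(5000)
]

# Join 1: verifications -> cards
card_ids = {c["card_id"] for c in cards}
verified = [v for v in verifications if v["card_id"] in card_ids]

# Join 2: users -> verified cards
user_ids = {u["user_id"] for u in users}
joined = [v for v in verified if v["user_id"] in user_ids]

# Aggregate: count cards verified per user, then filter on the threshold
counts = Counter(v["user_id"] for v in joined)
heavy_verifiers = {u: n for u, n in counts.items() if n > 200}
print(heavy_verifiers)
```

Every case below implements this same logic, differing only in the engine that executes the joins.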
## Case 1: SQL with SQLite3 in Python (no ORM)
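A minimal sketch of this case, using Python's built-in `sqlite3` module. Table and column names are assumptions; the actual schema in the repository may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE cards (card_id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE verifications (card_id INTEGER, user_id INTEGER);
""")
cur.executemany("INSERT INTO cards VALUES (?, ?)",
                [(c, f"card-{c}") for c in range(100)])
cur.executemany("INSERT INTO users VALUES (?, ?)",
                [(u, f"user-{u}") for u in range(3)])
cur.executemany("INSERT INTO verifications VALUES (?, ?)",
                [(c % 100, c % 3) for c in range(900)])

# Triple INNER JOIN, then aggregate per user and filter with HAVING
cur.execute("""
SELECT u.user_id, COUNT(*) AS n_verified
FROM verifications v
JOIN cards c ON v.card_id = c.card_id
JOIN users u ON v.user_id = u.user_id
GROUP BY u.user_id
HAVING COUNT(*) > 200
""")
rows = cur.fetchall()
print(rows)
```

The `HAVING` clause applies the threshold filter after aggregation, which is the SQL analogue of the final step in the pipeline.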
## Case 2: Pandas in Python with DataFrames
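The same triple join expressed with pandas `merge` calls. This is a sketch under assumed column names, and it assumes pandas is installed:

```python
import pandas as pd

cards = pd.DataFrame({"card_id": range(100),
                      "data": [f"card-{c}" for c in range(100)]})
users = pd.DataFrame({"user_id": range(3),
                      "name": [f"user-{u}" for u in range(3)]})
verifications = pd.DataFrame({"card_id": [c % 100 for c in range(900)],
                              "user_id": [c % 3 for c in range(900)]})

# Inner joins: verifications -> cards, then the result -> users
verified = verifications.merge(cards, on="card_id", how="inner")
joined = verified.merge(users, on="user_id", how="inner")

# Aggregate per user and keep only users above the threshold (> 200)
counts = joined.groupby("user_id").size().rename("n_verified").reset_index()
result = counts[counts["n_verified"] > 200]
print(result)
```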
## Case 3: Scala Spark Structured Streaming with Kafka-generated streams

- Program 1 generates three simulated data streams to three Kafka producer topics.
- Program 2 performs the triple join with aggregation.
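Program 1's output can be pictured as JSON records destined for three topics. This plain-Python sketch (stdlib only, no Kafka client) shows the shape of those records; the topic and field names are assumptions for illustration:

```python
import json
import random

random.seed(7)

def gen_card(card_id):
    # Record for a hypothetical "cards" topic
    return {"card_id": card_id, "data": f"card-{card_id}"}

def gen_user(user_id):
    # Record for a hypothetical "users" topic
    return {"user_id": user_id, "name": f"user-{user_id}"}

def gen_verification():
    # Record for a hypothetical "verifications" topic
    return {"card_id": random.randrange(100), "user_id": random.randrange(10)}

# In the real Program 1 these would be sent with a Kafka producer;
# here we only serialize them the way a producer would.
messages = {
    "cards": [json.dumps(gen_card(c)) for c in range(100)],
    "users": [json.dumps(gen_user(u)) for u in range(10)],
    "verifications": [json.dumps(gen_verification()) for _ in range(1000)],
}
print({topic: len(msgs) for topic, msgs in messages.items()})
```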
## Case 4: Scala Kafka Streams

- Uses the same Program 1 from Case 3 to generate data records to the producer topics.
- The program is a Scala Kafka Streams implementation of the triple join with aggregation.
## Case 5: Scala Flink with Kafka-generated streams

- Uses the same Program 1 from Case 3 to generate data records to the producer topics.
- The program is a Scala Flink implementation of the triple join with aggregation.
## About

3-way INNER JOINs with aggregation -- Python SQL, Pandas, Scala Kafka Streams, Scala Flink, and Scala Spark Structured Streaming.