I was looking for a highly scalable streaming framework in Python. Until now I had been using Spark Streaming to read data from high-throughput streams, but Spark felt a little heavy to me because its minimum system requirements are high.

The other day I was researching this and came across a framework called Faust. I started exploring it, and my initial impression is very good.

Faust can run in a distributed way: we can start the same program on multiple machines, and the topic's partitions are shared among the running workers, which increases throughput.
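To make the idea concrete, here is a simplified sketch of how a topic's partitions might be divided among workers. The helper `assign_partitions` and the machine names are hypothetical, and the plain round-robin rule is an assumption for illustration only; Faust's real assignment logic is more involved.

```python
# Simplified illustration (hypothetical helper, not Faust's real assignor):
# spread a topic's partitions across the workers that are currently running.
def assign_partitions(num_partitions: int, workers: list[str]) -> dict[str, list[int]]:
    """Round-robin partitions 0..num_partitions-1 over the given workers."""
    assignment: dict[str, list[int]] = {w: [] for w in workers}
    for partition in range(num_partitions):
        worker = workers[partition % len(workers)]
        assignment[worker].append(partition)
    return assignment


if __name__ == "__main__":
    # One machine: it owns every partition.
    print(assign_partitions(8, ["machine-1"]))
    # Two more machines join: the same 8 partitions are now shared three ways.
    print(assign_partitions(8, ["machine-1", "machine-2", "machine-3"]))
```

With three machines, `machine-1` gets partitions 0, 3 and 6, `machine-2` gets 1, 4 and 7, and `machine-3` gets 2 and 5. Since each partition is read by only one worker at a time, a topic with 8 partitions can keep at most 8 workers busy, so the partition count caps how far this scales.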

I tried the sample program from their website, and it worked seamlessly against CDH Kafka 4.1.0. The program is pasted below.

import faust

# The definition of the message
class Greeting(faust.Record):
    from_name: str
    to_name: str

# Here we initialize the application. The Kafka broker is specified in the broker argument.
app = faust.App('hello-app', broker='kafka://192.168.0.20')

# Here we define the topic and the record type of its messages.
topic = app.topic('hello-topic', value_type=Greeting)

# This is the Faust agent that reads data from the Kafka topic asynchronously.
@app.agent(topic)
async def hello(greetings):
    async for greeting in greetings:
        print(f'Hello from {greeting.from_name} to {greeting.to_name}')

# This function acts as the producer and sends messages to Kafka at the given time interval.
# Here the interval is 0.1 seconds. You can adjust it to test the speed of producing and consuming.
@app.timer(interval=0.1)
async def example_sender(app):
    await hello.send(
        value=Greeting(from_name='Amal', to_name='you'),
    )

if __name__ == '__main__':
    app.main()

To run the program (saved as sample_faust.py), I used the following command:

python sample_faust.py worker -l info
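If you want to try the timer/agent pattern before setting up a Kafka broker, the following minimal sketch mimics it with plain asyncio. It is a stand-in built on assumptions, not Faust: an `asyncio.Queue` plays the role of `hello-topic`, `Greeting` is a dataclass rather than a `faust.Record`, and the `None` sentinel that ends the loop exists only in this sketch (a real Faust agent keeps consuming indefinitely).

```python
import asyncio
from dataclasses import dataclass


# Stand-in for the Greeting record (a plain dataclass, not faust.Record).
@dataclass
class Greeting:
    from_name: str
    to_name: str


async def example_sender(topic: asyncio.Queue, count: int, interval: float) -> None:
    """Plays the role of the @app.timer: put one greeting on the 'topic' every interval."""
    for _ in range(count):
        await topic.put(Greeting(from_name='Amal', to_name='you'))
        await asyncio.sleep(interval)
    await topic.put(None)  # sketch-only sentinel: tells the consumer to stop


async def hello(topic: asyncio.Queue) -> list[str]:
    """Plays the role of the @app.agent: consume greetings asynchronously as they arrive."""
    seen = []
    while (greeting := await topic.get()) is not None:
        line = f'Hello from {greeting.from_name} to {greeting.to_name}'
        print(line)
        seen.append(line)
    return seen


async def main(count: int = 5, interval: float = 0.1) -> list[str]:
    topic: asyncio.Queue = asyncio.Queue()
    # Run producer and consumer concurrently, like the timer and agent in one worker.
    _, seen = await asyncio.gather(example_sender(topic, count, interval), hello(topic))
    return seen


if __name__ == '__main__':
    asyncio.run(main())
```

Run with a plain `python` interpreter, it prints five greetings, 0.1 seconds apart, and exits.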

The Faust program above sends a greeting to Kafka every 0.1 seconds, and the agent reads each message back and prints it. But Faust is not just about reading messages in parallel from streaming sources. It also integrates with RocksDB, an embedded key-value data store open-sourced by Facebook and written in C++.
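In Faust, this kind of state lives in tables: a table is created with `app.Table(...)`, and RocksDB can be selected as the local store through the app's `store` setting (for example `store='rocksdb://'`). Since RocksDB needs a separate install, the sketch below is only a stand-in that uses Python's built-in `dbm.dumb` as the embedded key-value store to show the idea: a per-sender greeting count that survives a restart. The `GreetingCounter` class is hypothetical and is not Faust's table API.

```python
import dbm.dumb
import os
import tempfile


class GreetingCounter:
    """Hypothetical stand-in for a Faust table: counts greetings per sender and
    persists the counts in an embedded key-value file (dbm.dumb here, RocksDB in Faust)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def add(self, from_name: str) -> int:
        # Open, update and close on every call so the new count is written to disk.
        with dbm.dumb.open(self.path, 'c') as db:
            current = int(db[from_name]) if from_name in db else 0
            db[from_name] = str(current + 1)
            return current + 1

    def get(self, from_name: str) -> int:
        with dbm.dumb.open(self.path, 'c') as db:
            return int(db[from_name]) if from_name in db else 0


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'greetings')
        counter = GreetingCounter(path)
        for _ in range(3):
            counter.add('Amal')
        # A fresh object (think: a restarted worker) reads the same persisted state.
        print(GreetingCounter(path).get('Amal'))
```

Because the counts live in a local file rather than in process memory, a new `GreetingCounter` pointed at the same path picks up where the previous one stopped.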