Hey, fellow programmers! After delving into the world of streaming and getting a glimpse of Flink in our previous articles, your curiosity has likely piqued, and you're itching to dive into Flink coding. Well, buckle up because we're about to embark on our first Flink coding adventure: a basic word-counting program. Fire up your favorite IDE and Docker to effortlessly set up and plunge into the realm of streaming with just a few keystrokes.
Word Count Streaming Program:
In this program, we'll tally up repeated words in a sentence. For instance, in the sentence "Hello, I am Akshit, and I will perform streaming," the word "I" is repeated, and our program will highlight that.
Prerequisites:
- Java 17
- Flink 1.18
- IntelliJ
- Socket terminal (For Mac, the command is: nc -lk 9999)
To start, create a Maven Java project with Flink dependencies installed:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.18.0
We'll use Flink's DataStream API, and the following line marks the beginning of our streaming journey:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Read data from a socket like this:
DataStream<String> text = env.socketTextStream("localhost", 9999);
Now, let's utilize the powerful flatMap function in Flink to transform our data streams. This function allows us to unnest elements, filter elements based on custom logic, and modify elements individually.
Create a custom flatMap function to split the words:
public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
String[] tokens = value.toLowerCase().split("\\W+");
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<>(token, 1));
}
}
}
}
Now, let's sum the repeated words and create a DataStream object:
DataStream<Tuple2<String, Integer>> counts = text
.flatMap(new Tokenizer())
.keyBy(0)
.sum(1);
Finally, print/sink the output in the terminal:
counts.print();
Execute the Flink streaming job with the following line:
env.execute("Word count streaming");
Find the complete program on GitHub. Explore the results of the output and witness the streaming magic unfold.
Input of the data:
Output from the streaming engine:
Conclusion:
In this hands-on session, we implemented the word count program to grasp the basics of streaming with Flink. The journey doesn't end here; stay tuned as we delve into more advanced Flink concepts in the upcoming articles. Keep up with me, show some love to the blog, and let the data streaming continue! 🚀💻
Top comments (0)