DEV Community

Streaming Audio: A Confluent podcast about Apache Kafka®

Collecting Data with a Custom SIEM System Built on Apache Kafka and Kafka Connect ft. Vitalii Rudenskyi

The best-informed business insights that support better decision-making begin with data collection, ahead of data processing and analytics. Enterprises today are flooded with data from sources ranging from cloud services and applications to thousands of internal servers. The massive volume of data that organizations must process presents data ingestion challenges for many large companies. In this episode, data security engineer Vitalii Rudenskyi discusses the decision to replace a vendor security information and event management (SIEM) system with a custom solution built on Apache Kafka® and Kafka Connect for a better data collection strategy.

Having a data collection infrastructure layer is mission-critical for Vitalii and the team in helping enterprises protect data and detect security events. Built on Kafka, their custom SIEM infrastructure is configurable and designed to ingest and analyze huge amounts of data, including personally identifiable information (PII) and healthcare data.

When it comes to collecting data, there are two fundamental choices: push or pull. But how about both? Vitalii shares that Kafka Connect API extensions are integral to data ingestion in Kafka. Three key components allow their SIEM system to collect and record data daily by both pushing and pulling:

  1. NettySource Connector: A connector developed to receive data from different network devices into Apache Kafka. It receives data over both the TCP and UDP transport protocols and can be adapted to receive any data, from Syslog to SNMP and NetFlow.
  2. PollableAPI Connector: A connector made to pull data from different remote APIs and services.
  3. Transformations Library: Useful extensions to the existing out-of-the-box transformations. They take a “tag and apply” approach that puts data in the right place and the right format after it is collected.
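
The “tag and apply” idea in the third item can be illustrated with a small sketch. The show notes do not describe how the actual Transformations Library works, so everything below is an assumption made for illustration: records are modelled as plain Python dicts rather than Kafka Connect records, and the rule names, field names, topic name, and helper functions (`TAG_RULES`, `ACTIONS`, `mask_email`, `route_to_syslog_topic`) are hypothetical. The sketch first tags a record based on simple predicates, then applies the transformation registered for each tag.

```python
# Illustrative sketch only: not the API of the library discussed in the episode.
# A record is modelled as a plain dict; real Kafka Connect records are richer.

# Hypothetical tagging rules: (tag name, predicate over the record).
TAG_RULES = [
    ("syslog", lambda record: record.get("port") == 514),
    ("has_pii", lambda record: "email" in record),
]


def route_to_syslog_topic(record):
    """Return a copy of the record routed to a hypothetical syslog topic."""
    routed = dict(record)
    routed["topic"] = "siem.syslog"
    return routed


def mask_email(record):
    """Return a copy of the record with the email field masked."""
    masked = dict(record)
    masked["email"] = "***"
    return masked


# Hypothetical mapping from tag to the transformation applied for that tag.
ACTIONS = {
    "syslog": route_to_syslog_topic,
    "has_pii": mask_email,
}


def tag(record):
    """Step 1: collect the names of every rule whose predicate matches."""
    return [name for name, predicate in TAG_RULES if predicate(record)]


def apply(record, tags):
    """Step 2: run the transformation registered for each tag, in order."""
    for name in tags:
        record = ACTIONS[name](record)
    return record


def tag_and_apply(record):
    """Tag the record, then apply the matching transformations."""
    return apply(record, tag(record))
```

For example, calling `tag_and_apply` on a record that arrived on port 514 and carries an `email` field would return a copy with the email masked and a `topic` field added, while a record that matches no rule passes through unchanged. Keeping tagging separate from applying means a rule can be added or changed without touching the transformations themselves.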

Listen to learn more as Vitalii shares the importance of data collection and the building of a custom solution to address multi-source data management requirements. 
