Apache SeaTunnel 2.3.8 will be released soon. At a recent community meeting, Apache SeaTunnel PMC Member Fan Jia shared insights on the new features and updates. Here’s a detailed overview of what to expect:
Introduction to SeaTunnel
SeaTunnel is a high-performance open-source distributed data integration system that supports real-time streaming and offline batch processing of various data sources, making it suitable for massive data integration. Key features include:
- Extensive Connectors: Supports over 100 data sources and storage systems.
- Multi-Engine Support: Compatible with various data processing engines, including SeaTunnel Zeta Engine, Spark, and Flink.
- HTTP Support: Enables data integration via HTTP interfaces.
- Stream and Batch Integration: Supports both stream processing and batch processing.
- Stream Rate Control: Capable of controlling the rate of data flow.
- Automatic Table Creation: Automatically creates tables based on data structure.
New Features and Updates in Version 2.3.8
In the upcoming 2.3.8 release, the community will introduce several new features and updates:
Docker Images
The new version will provide official Docker images that include nearly all connectors, so users can start SeaTunnel quickly and deploy it without downloading installation packages.
- Build Images via Command: Users with custom needs can build images locally using command-line instructions.
- Start Services via Command: Supports starting services for distributed deployment, submitting tasks, and querying task statuses via the command line. Users can also submit tasks through REST APIs.
- Submit tasks via the command:
Spark Multi-Table Support
Currently, SeaTunnel only supports multi-table tasks with the Zeta Engine. The new version will introduce Spark engine support for multi-table tasks, allowing for automatic recognition and execution of multi-table jobs. Additionally, Flink’s multi-table support is in progress, and interested contributors are welcome to join on GitHub.
Config Parameter Default Values
The current version allows variables in config parameters, but each variable must be set manually. The new version will let users define default values for these variables, so a job can still run when a value is not supplied.
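The default-value behavior can be illustrated with a minimal Python sketch. The `${name:default}` placeholder syntax, the `substitute` helper, and the variable names are assumptions made for this example, not SeaTunnel's confirmed syntax or implementation.

```python
import re

# Hypothetical placeholder syntax for illustration: ${name} or ${name:default}.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def substitute(text, variables):
    """Replace each placeholder with a supplied value, else with its default."""
    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in variables:
            return variables[name]
        if default is not None:
            return default
        # No supplied value and no default: fail loudly instead of guessing.
        raise KeyError(f"no value or default for variable '{name}'")

    return PLACEHOLDER.sub(resolve, text)


# jobName falls back to its default; parallelism is supplied explicitly.
template = 'job.name = "${jobName:nightly_sync}"\nparallelism = ${parallelism}'
rendered = substitute(template, {"parallelism": "4"})
print(rendered)
```

A variable with neither a supplied value nor a default raises an error in this sketch, which mirrors the older behavior where every variable had to be set by hand.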
Prometheus Integration for Cluster Monitoring
Previously, SeaTunnel provided interfaces for retrieving task run metrics. The new version will support integration with Prometheus for cluster monitoring. Prometheus will regularly pull the status of SeaTunnel cluster tasks and present this in a visual interface, making it easier to monitor cluster status and quickly identify issues.
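To make the pull model concrete, here is a small Python sketch that parses the kind of plain-text payload Prometheus scrapes from a metrics endpoint. The metric name `job_count` and its labels are hypothetical, and the parser handles only simple `name{labels} value` lines without timestamps.

```python
def parse_metrics(text):
    """Parse simple 'name{labels} value' lines from Prometheus text format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # The value is the last space-separated token (timestamps not handled).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical scrape payload; these metric names are not SeaTunnel's.
payload = """\
# HELP job_count Number of jobs by status
# TYPE job_count gauge
job_count{status="running"} 3
job_count{status="failed"} 1
"""

metrics = parse_metrics(payload)
print(metrics)
```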
Embedding Transform
The new Embedding transform will bring machine learning models into the data transformation process, converting raw fields into vector values that can be stored in vector databases. The model providers currently supported by SeaTunnel include Doubao, Qianfan, and OpenAI.
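The idea of turning a raw field into a vector column can be sketched in Python as follows. The `toy_embed` function is a stand-in that derives numbers from a hash; it is not a real model call and does not reflect any provider's API. The function and field names are hypothetical.

```python
import hashlib


def toy_embed(text, dims=4):
    """Stand-in for a model call: derive a fixed-length vector from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dims)]


def embedding_transform(rows, source_field, target_field, embed=toy_embed):
    """Return copies of the rows with a vector column added."""
    return [{**row, target_field: embed(row[source_field])} for row in rows]


rows = [
    {"id": 1, "title": "stream processing"},
    {"id": 2, "title": "batch processing"},
]
vectorized = embedding_transform(rows, "title", "title_vector")
for row in vectorized:
    print(row["id"], len(row["title_vector"]))
```

In a real pipeline the `embed` argument would call the configured model provider, and the resulting rows would be written to a sink that accepts vector fields.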
Job-Level Log Filtering
The new version will enhance log filtering and viewing at the job level, enabling users to filter logs in several ways, including:
- Job ID in Logs: Users can search for logs associated with a specific Job ID, making it easier to troubleshoot when multiple tasks are running concurrently.
- Splitting Logs by Job ID: By modifying the log configuration file, users can ensure that logs for the same Job ID are categorized into the same file, simplifying log management.
Example modification for the log4j2.properties configuration file:
...
rootLogger.appenderRef.file.ref = routingAppender
...
appender.file.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%-30.30c{1.}] [%t] - %m%n
...
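The effect of splitting logs by Job ID can be pictured with a short Python sketch that groups log lines by the job they mention. The `[job-<id>]` marker is a hypothetical format chosen for this example; the actual position and shape of the Job ID in SeaTunnel logs may differ.

```python
import re
from collections import defaultdict

# Hypothetical marker format; real SeaTunnel log lines may differ.
JOB_ID = re.compile(r"\[job-(\d+)\]")


def split_by_job(lines):
    """Group log lines by the Job ID found in each line."""
    groups = defaultdict(list)
    for line in lines:
        match = JOB_ID.search(line)
        # Lines without a Job ID go to a shared bucket.
        key = match.group(1) if match else "unassigned"
        groups[key].append(line)
    return dict(groups)


log_lines = [
    "2024-09-20 10:00:00,000 INFO  [job-101] source reader started",
    "2024-09-20 10:00:01,000 INFO  [job-202] source reader started",
    "2024-09-20 10:00:02,000 WARN  [job-101] retrying write",
    "2024-09-20 10:00:03,000 INFO  cluster heartbeat ok",
]

grouped = split_by_job(log_lines)
for job_id, entries in grouped.items():
    print(job_id, len(entries))
```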
Kafka Support for Protobuf Data
The Kafka connector has been enhanced to support the Protobuf data format, allowing for the definition of Protobuf data types for reading and writing.
File Support for Reading Compressed Files
The new version will introduce support for reading compressed file formats, eliminating the need for decompression steps.
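Reading a compressed file without a separate decompression step can be sketched in Python like this. The example covers only gzip and chooses the reader by file extension; both are assumptions for illustration and say nothing about which formats SeaTunnel supports or how it detects them.

```python
import gzip
import os
import tempfile


def open_text(path):
    """Open a file as text, decompressing on the fly when it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_lines(path):
    """Read all lines through the same code path, compressed or not."""
    with open_text(path) as handle:
        return [line.rstrip("\n") for line in handle]


# Write the same content plain and gzip-compressed, then read both back.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "orders.csv")
    packed = os.path.join(tmp, "orders.csv.gz")
    with open(plain, "w", encoding="utf-8") as f:
        f.write("id,amount\n1,9.50\n")
    with gzip.open(packed, "wt", encoding="utf-8") as f:
        f.write("id,amount\n1,9.50\n")
    plain_lines = read_lines(plain)
    packed_lines = read_lines(packed)

print(plain_lines == packed_lines)
```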
Other Features
Additionally, the new version will remove filters on system tables, allowing users to read system tables, and enhance support for Paimon’s stream reading and dynamic bucket writing.
How to Get the Latest Version and Contribute
Download
SeaTunnel 2.3.8 is expected to be released in early October. Watch the SeaTunnel official download page for the latest version.
Contributing
- Mailing List: Subscribe to the SeaTunnel development mailing list by emailing dev-subscribe@seatunnel.apache.org to participate in community discussions and release votes.
- GitHub: Visit the Apache SeaTunnel GitHub repository to keep up with community updates and submit bug reports and feature requests.
Conclusion
The release of SeaTunnel 2.3.8 will introduce a series of new features and improvements, making data integration more efficient and flexible. Thanks to all contributors for their efforts in making SeaTunnel a more powerful data integration tool.
For more information, please visit the SeaTunnel official website.