- Pre-processing actions, e.g. filtering and cleaning the inbound data in order to reject all irrelevant or corrupted data.
- Data aggregation, i.e. combining multiple data sources in order to prepare combined datasets for further processing (grouping data into topics, for instance).
- Data analytics, e.g. calculating statistics or specific functions (for example, Product Environmental Footprint).
- Checking business rules to trigger specific actions, e.g. creating an alert or calling a specific function.
- Publish-subscribe mechanism, i.e. AFarCloud stakeholders can publish data on specific topics so that other participants can observe and consume it.
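The functionalities above can be illustrated as plain stream transformations. Below is a minimal, broker-free Python sketch; the record fields and the temperature threshold are illustrative assumptions, not part of the SPE's actual API:

```python
# Minimal sketch of the listed functionalities on an in-memory stream.
# Field names and thresholds are illustrative assumptions.

def pre_process(readings):
    """Filtering/cleaning: reject corrupted or irrelevant records."""
    return [r for r in readings if r.get("value") is not None and r.get("sensor")]

def aggregate_by_topic(readings):
    """Aggregation: group records into topics by sensor type."""
    topics = {}
    for r in readings:
        topics.setdefault(r["sensor"], []).append(r)
    return topics

def analytics(topic_records):
    """Analytics: compute a simple statistic (mean) per topic."""
    return {t: sum(r["value"] for r in rs) / len(rs) for t, rs in topic_records.items()}

def check_rules(stats, limits):
    """Business rules: emit an alert when a statistic exceeds its limit."""
    return [f"ALERT {t}: {v:.1f} > {limits[t]}" for t, v in stats.items()
            if t in limits and v > limits[t]]

raw = [
    {"sensor": "temp", "value": 21.0},
    {"sensor": "temp", "value": 39.0},
    {"sensor": "humidity", "value": 55.0},
    {"sensor": "temp", "value": None},   # corrupted record, dropped
]
stats = analytics(aggregate_by_topic(pre_process(raw)))
alerts = check_rules(stats, {"temp": 25.0})
print(alerts)  # temp average is 30.0, above the 25.0 limit
```

In a real deployment each stage would be a Kafka streaming application subscribed to the relevant topic, but the data flow is the same.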
Stream Processing Engine (SPE)
System Prototype Demonstrated in Operational Environment.
The global streaming market is projected to reach USD 1.6 trillion by 2029, and the Stream Processing Engine developed at DAC.Digital is set to be an integral part of this growth. It was built by our engineers in cooperation with the University of West Bohemia and ICPS (Latvia). The Stream Processing Engine is a platform based on Kafka. The applied Lambda Architecture enables scalable integration of multiple sensor data sources and supports both real-time and batched data stream processing.
It has already been tested and deployed for stream processing (data processed as it is measured) and batched data processing (large volumes of data arriving in batches, e.g. once a day). One of the successful deployments was in the dairy industry, where incoming data from cows’ collars connected to ruminal probes and data from milking robots were processed.
A more advanced example of using the SPE is the practical implementation of Artificial Intelligence algorithms. Data from a topic can feed the training algorithms. Training might be implemented as a microservice, and by continuously processing the incoming data, the trained algorithm can be improved over time. All of these steps operate and are coordinated through the appropriate implementation within a data stream.
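The continuous-training pattern can be sketched as follows. Here the "topic" is a plain Python iterable standing in for a Kafka consumer, and the model is a trivial online mean estimator; both are illustrative assumptions that only show the update loop:

```python
# Sketch: a microservice-style loop that incrementally refines a model
# from successive batches read off a topic. The "topic" is an in-memory
# iterable standing in for a Kafka consumer.

class OnlineMean:
    """Trivial online model: a running mean updated one sample at a time."""
    def __init__(self):
        self.n, self.mean = 0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean update

def training_service(topic_batches):
    model = OnlineMean()
    for batch in topic_batches:    # each poll of the topic yields a batch
        for value in batch:
            model.update(value)    # the model improves as data arrives
    return model

model = training_service([[1.0, 2.0, 3.0], [4.0, 5.0]])
print(model.mean)  # 3.0
```

A production version would replace `OnlineMean` with a real incremental learner and the batch list with a consumer subscribed to the training topic.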
State of the art.
The evolution of network and sensor network technologies has enabled easy access to real-world information in real time. However, there is still wide scope to push the envelope in bridging the gap between the requirements of Industry 4.0 and the existing capabilities of stream processing and analytics.
Since the beginning of the 21st century, data production has increased exponentially. In recent years there has also been a steep increase in streaming data, which has created a need for its more efficient management and utilization. In the context of the Internet of Things (IoT), the multiplication of data stream sources (connected devices, sensor networks, etc.) has been trending, especially in the cloud.
Researchers have pointed out that although there are several frameworks for stream processing, there is a void in stream processing platforms/engines that businesses can readily use to take advantage of real-world, real-time big data. There is a high dependence on open-source stream processing platforms, with documentation and source code available freely. Using these open-source solutions requires high technical expertise, which is both scarce and expensive.
There is also a need for a solution such as the SPE that can integrate multiple streams. When every electronic device comes with its own software, integration with devices from other companies becomes impractical, and users face the major disadvantage of managing many different software applications to display and process data.
The SPE, developed by DAC.digital, addresses this gap in the state of the art and provides businesses with an easy-to-use emerging-technology tool that can boost their operations and productivity.
The Solution: How does it work?
The SPE provides real-time data analytics based on the Lambda Architecture, i.e. a generic, scalable, and fault-tolerant data processing architecture. This architecture is built on an append-only, immutable data source; thus the serving layer is decoupled from data (event) storage and processing. Figure 1 shows the SPE within the AFarCloud platform Semantic Middleware (High-Level Services layer).
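The Lambda pattern can be sketched in a few lines: an append-only event log feeds both a batch view (recomputed over the whole log) and a speed view (incremental, over events arriving after the last batch run), and the serving layer merges the two for queries. The event fields below are illustrative assumptions:

```python
# Sketch of the Lambda Architecture over an append-only event log.
# Event fields are illustrative assumptions.

event_log = []  # append-only, immutable source of truth

def append(event):
    event_log.append(dict(event))  # events are never updated in place

def batch_view(upto):
    """Batch layer: recompute totals from the full log up to an offset."""
    totals = {}
    for e in event_log[:upto]:
        totals[e["sensor"]] = totals.get(e["sensor"], 0) + e["value"]
    return totals

def speed_view(since):
    """Speed layer: incremental totals for events after the last batch run."""
    totals = {}
    for e in event_log[since:]:
        totals[e["sensor"]] = totals.get(e["sensor"], 0) + e["value"]
    return totals

def serving_layer(batch, speed):
    """Serving layer: merge batch and real-time views to answer queries."""
    merged = dict(batch)
    for k, v in speed.items():
        merged[k] = merged.get(k, 0) + v
    return merged

for e in [{"sensor": "temp", "value": 2}, {"sensor": "temp", "value": 3}]:
    append(e)
batch = batch_view(upto=len(event_log))   # e.g. a nightly recompute
append({"sensor": "temp", "value": 5})    # arrives after the batch run
print(serving_layer(batch, speed_view(since=2)))  # {'temp': 10}
```

Because the log is immutable, the serving layer never depends on how or where events are stored and processed, which is exactly the decoupling described above.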
Figure 1: Stream Processing Engine within the AFarCloud platform Semantic Middleware
The aim is to process data inbound from third-party artifacts (data sources, software systems, and devices such as sensors) within real-time constraints. Examples of the supported processing functionalities include pre-processing, data aggregation, analytics, business-rule checking, and publish-subscribe.
The SPE utilizes the Apache Kafka platform to implement the Data Broker, as shown in Figure 1 above. This is the core element that manages the inbound data. Kafka provides tools for managing real-time data pipelines and for creating producer, consumer, and streaming applications.
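The broker's topic/offset model can be mimicked in a few lines. The sketch below is an in-memory stand-in, not the Kafka client API; a real deployment would use a Kafka client library against a broker cluster:

```python
# In-memory stand-in for the Data Broker pattern: producers append to
# named topics, and each consumer group reads from its own offset.
# This only mimics Kafka's topic/offset model for illustration.

class MiniBroker:
    def __init__(self):
        self.topics = {}     # topic -> list of records (the log)
        self.offsets = {}    # (topic, group) -> next offset to read

    def produce(self, topic, record):
        self.topics.setdefault(topic, []).append(record)

    def consume(self, topic, group):
        """Return unread records for a consumer group and advance its offset."""
        log = self.topics.get(topic, [])
        start = self.offsets.get((topic, group), 0)
        self.offsets[(topic, group)] = len(log)
        return log[start:]

broker = MiniBroker()
broker.produce("measurements", {"sensor": "temp", "value": 21.5})
broker.produce("measurements", {"sensor": "temp", "value": 22.0})
print(broker.consume("measurements", group="analytics"))  # both records
print(broker.consume("measurements", group="analytics"))  # [] - already read
```

Note that each consumer group keeps its own offset, so the same stream can feed analytics, alerting, and archival consumers independently.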
Figure 2 presents the SPE within the AFarCloud architecture, where it is part of the Data Management layer. Data is provided directly from the AFarCloud Interfaces layer via the Data Access Manager; therefore, no additional middleware component (converter or adaptor) is needed for stakeholders that publish data within the AFarCloud ecosystem. To forward data to the SPE as well, the SPE Data Provider must be implemented.
Figure 2: Stream Processing Engine within the AFarCloud architecture
Key Features of Stream Processing Engine:
- software component
- integration “all in one” – all data aggregated and adequately processed in one place
- ease of introduction of new analytics
- on-demand scalability
- suited for both real-time and batch data processing (thanks to the lambda architecture)
A comparison of the SPE with other publish/subscribe-based technologies is shown in the table below:
| | SPE based on Kafka | REST | MQTT |
|---|---|---|---|
| Data filtering and processing | Allowed on the stream | Everything must be implemented from scratch | Not supported |
| Request servicing | Flexible scalability, high throughput | Can be overloaded by many requests | Low throughput |
| Complexity | A complete, standalone application; uses a binary TCP protocol for communication | An API exposing endpoints; uses HTTP, which is significantly slower than plain binary TCP | A lightweight transmission protocol optimized for sensor networks and M2M |
| Persistence and reliability | Ensures high reliability by adding a persistence layer and holding copies of streams | Everything must be implemented from scratch | No embedded persistence |
Who can make use of this technology?
- Individual farmers who want their own tailor-made software with the possibility of further extension
- Data analysts working in agriculture, for rapid development and testing of data processing algorithms within a single framework
- Integration/software enterprises, which can use the SPE as a core for integrating several separate agricultural apps into one software system
Detection of dangerous farm events
Sensors that monitor the farm ecosystem (plantation or animal breeding) produce data that can be used in the SPE to detect dangerous events.
The Stream Processor (see Figure 1) can be used to monitor specific types of events originating from the farm ecosystem and check the defined business rules against them. When a rule's condition is fulfilled, it triggers a specific action, for example appending an alert to a dedicated topic in the SPE stream. The alert can then be handled by dedicated software, for example a system that informs the end-user (farmer) about the dangerous situation, or a system that directs a dedicated vehicle (e.g. a drone) to start a mission.
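The rule-checking flow can be sketched as follows. The rule names, thresholds, and event fields are illustrative assumptions; the alert topic is a plain list standing in for a dedicated SPE topic:

```python
# Sketch of the rule-checking flow: a stream processor evaluates business
# rules on farm events and, when a rule fires, appends an alert to an
# alert topic that downstream handlers consume. All rules and fields are
# illustrative assumptions.

RULES = [
    # (rule name, predicate over an event, follow-up action)
    ("barn_fire_risk", lambda e: e["type"] == "temperature" and e["value"] > 50,
     "dispatch drone"),
    ("water_level_low", lambda e: e["type"] == "water_level" and e["value"] < 0.2,
     "notify farmer"),
]

alert_topic = []  # stand-in for a dedicated alert topic in the SPE stream

def stream_processor(event):
    """Check each business rule and publish an alert when one fires."""
    for name, predicate, action in RULES:
        if predicate(event):
            alert_topic.append({"rule": name, "action": action, "event": event})

for event in [{"type": "temperature", "value": 62.0},
              {"type": "water_level", "value": 0.8}]:
    stream_processor(event)

print(alert_topic)  # one alert: barn_fire_risk -> dispatch drone
```

Downstream handlers (farmer notification, drone mission control) would simply subscribe to the alert topic rather than to the raw sensor streams.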
Monitoring of cow breeding zone
Nowadays, a modern farm is equipped with sensors that monitor the cow farming ecosystem. Data from these devices should be fused for further processing by applications and algorithms that provide all the necessary information about breeding. This approach enables detecting abnormalities, planning future breeding, calculating the costs of infrastructure and herd maintenance, etc.
The Farm Management System layer (e.g. a Decision Support System) or third-party software can consume data from the SPE for use in breeding-domain algorithms and applications that monitor and manage the cow breeding zone.
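The fusion step described above can be sketched as a per-cow join of two sensor streams. The field names (`rumination_min`, `yield_l`) and the abnormality thresholds are illustrative assumptions, not actual SPE data schemas:

```python
# Sketch of per-cow data fusion: readings from collar probes and milking
# robots are joined so a Farm Management System (or third-party software)
# can consume one combined record. Field names and thresholds are
# illustrative assumptions.

collar = {"cow-17": {"rumination_min": 380}, "cow-21": {"rumination_min": 150}}
milking = {"cow-17": {"yield_l": 28.5}, "cow-21": {"yield_l": 12.0}}

def fuse(collar_data, milking_data):
    """Join collar and milking records on the cow identifier."""
    fused = {}
    for cow in set(collar_data) | set(milking_data):
        fused[cow] = {**collar_data.get(cow, {}), **milking_data.get(cow, {})}
    return fused

def flag_abnormal(fused, min_rumination=300, min_yield=15.0):
    """Detect cows whose rumination time or milk yield is below expectation."""
    return [cow for cow, r in fused.items()
            if r.get("rumination_min", 0) < min_rumination
            or r.get("yield_l", 0) < min_yield]

records = fuse(collar, milking)
print(flag_abnormal(records))  # ['cow-21']
```

In the SPE, each source would be a topic and the fused records would be published to a combined topic for the Decision Support System to consume.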