Next-Generation Release Provides Integration with Spark for Data and Stream Processing and Kafka for Data Ingestion in Real Time
From its PentahoWorld 2017 user conference, Hitachi Vantara, a wholly-owned subsidiary of Hitachi Ltd., today unveiled the next generation of its Pentaho data integration and analytics platform software. Pentaho 8.0 adds support for Spark and Kafka to improve data and stream processing, along with the ability to easily match compute resources with business demand in real time. The new release is designed to help Hitachi’s customers extract greater value from their data to gain a competitive advantage and accelerate their digital transformation journeys.
According to independent research firm IDC, the global data sphere will grow to 163 zettabytes by 2025, 10 times the amount of data generated in 2016. The firm also forecasts that more than a quarter of that data will be real-time in nature, with IoT data making up more than 95 percent of it.
With its Pentaho 8.0 release, Hitachi Vantara helps customers to better prepare their businesses to address this real-time data deluge by optimizing and modernizing their data analytics pipelines and improving the productivity of their existing teams. New enhancements to the Pentaho 8.0 platform allow users to:
Improve Connectivity to Streaming Data Sources: With data moving faster, it’s critical to process it as it happens and react immediately if necessary. New capabilities in Pentaho 8.0 include:
- Stream processing with Spark: Pentaho 8.0 now fully enables stream data ingestion and processing using its native engine or Spark. This adds to existing Spark integration with SQL, MLlib and Pentaho’s adaptive execution layer.
- Connect to Kafka Streams: Kafka is a popular publish/subscribe messaging system that handles the large data volumes common in today’s big data and IoT environments. Pentaho 8.0 now enables real-time processing with specialized steps that connect Pentaho Data Integration (PDI) to Kafka (see the sketch following this list).
- Big data security with Knox: Building on its existing enterprise-level security for Cloudera and Hortonworks, Pentaho 8.0 now adds support for the Knox Gateway used for authenticating users to Hadoop services.
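Pentaho exposes these streaming capabilities through visual PDI steps rather than code. Purely as a point of reference for the underlying technologies, the sketch below shows a minimal Spark Structured Streaming job consuming from a Kafka topic in Java. The broker address (localhost:9092), the topic name (sensor-events) and the console sink are illustrative assumptions, and the job requires the spark-sql-kafka connector on the classpath; this is not Pentaho’s own API.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaStreamIngest {
    public static void main(String[] args) throws Exception {
        // Local Spark session for illustration; a real deployment would run on a cluster.
        SparkSession spark = SparkSession.builder()
                .appName("kafka-stream-ingest")
                .master("local[*]")
                .getOrCreate();

        // Subscribe to a Kafka topic; broker address and topic name are placeholders.
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "sensor-events")
                .load();

        // Kafka delivers binary key/value pairs; cast the value to a readable string.
        Dataset<Row> payloads = events.selectExpr("CAST(value AS STRING) AS payload");

        // Write the running stream to the console, appending new records as they arrive.
        StreamingQuery query = payloads.writeStream()
                .outputMode("append")
                .format("console")
                .start();

        query.awaitTermination();
    }
}
```

In Pentaho 8.0 the equivalent pipeline is assembled visually in PDI, with the adaptive execution layer determining whether it runs on the native engine or on Spark.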
Optimize Processing Resources: Every organization has constrained data processing resources that it wants to use intelligently, guaranteeing high availability even when demand for computation resources is high. To support this, Pentaho 8.0 provides:
- Worker nodes to scale out enterprise workloads: IT managers can now easily bring up additional nodes and spread simultaneous workloads across all available computation resources to match capacity with demand. This matching provides elasticity and portability between cloud and on-premises environments, resulting in faster and more efficient processing for end users.
- Adaptive execution enhancements: First introduced in Pentaho 7.1, Pentaho’s adaptive execution allows users to match workloads with the most appropriate processing engine, without having to rewrite any data integration logic. Now, Pentaho 8.0 makes adaptive execution easier to set up, use and secure. The functionality is also now available on Hortonworks.
- Native support for Avro and Parquet: Pentaho 8.0 makes it easy to read from and write to these popular big data file formats and to process them with Spark using Pentaho’s visual editing tools (a brief example follows this list).
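As with streaming, Pentaho handles these formats through its visual tools; the sketch below simply illustrates, in Spark’s Java API, the kind of Parquet-in, Avro/Parquet-out conversion the feature targets. The input path, output paths and the event_date column are hypothetical, and writing Avro assumes the external spark-avro module is on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class FormatConversion {
    public static void main(String[] args) {
        // Local Spark session for illustration only.
        SparkSession spark = SparkSession.builder()
                .appName("avro-parquet-demo")
                .master("local[*]")
                .getOrCreate();

        // Read an existing Parquet dataset; the path is a placeholder.
        Dataset<Row> records = spark.read().parquet("/data/input/events.parquet");

        // Apply a simple transformation; the event_date column is assumed for illustration.
        Dataset<Row> recent = records.filter("event_date >= '2017-01-01'");

        // Write the result as Avro (requires the external spark-avro module) ...
        recent.write().mode(SaveMode.Overwrite).format("avro").save("/data/output/events_avro");

        // ... and as Parquet, Spark's columnar default.
        recent.write().mode(SaveMode.Overwrite).parquet("/data/output/events_parquet");

        spark.stop();
    }
}
```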
Boost Team Productivity: Pentaho 8.0 also comes with several new features to help increase productivity across the data pipeline. These include granular filters for preparing data, improved repository usability and easier application auditing.
“On the path to digital transformation, enterprises must fully exploit all the data available to them. This requires connecting traditional data silos and integrating their operational and information technologies to build modern analytics data pipelines that can accommodate a more connected, open and fluid world of data,” said Donna Prlich, chief product officer for Pentaho software at Hitachi Vantara. “Pentaho 8.0 provides enterprise scale and faster processing in anticipation of future data challenges to better support Hitachi’s customers on their digital journeys.”