Data Integration

Launching Qlik Open Lakehouse

A new paradigm for building Iceberg Lakehouses

Vijay Raja

The shift towards Iceberg-based Open Lakehouses 

Open lakehouses powered by Apache Iceberg are redefining the landscape of data management. By bringing together the scalability of data lakes with the performance and reliability of data warehouses, lakehouses give you unprecedented flexibility and control over your data, while embracing open standards.

Iceberg

More and more organizations are turning to open lakehouses with Apache Iceberg to break down data silos, boost interoperability, and achieve substantial cost savings in the cloud. According to recent industry surveys, over half of large organizations expect to cut their analytics costs by more than 50% by adopting a lakehouse architecture, with some anticipating savings as high as 75%. These dramatic reductions stem from eliminating data duplication, streamlining data movement, and scaling storage and compute independently, so you only pay for what you need.

Challenges abound in realizing the full value of Lakehouses 

However, this is still a rapidly evolving domain, and organizations have to navigate some murky waters to get maximum value out of lakehouses and realize the true promise of Apache Iceberg. Some of the key challenges include:

  • Lack of Native Ingestion: Apache Iceberg is a powerful table format, but it lacks built-in ingestion capabilities—forcing data teams to rely on external tools to move data from operational systems into Iceberg tables easily and efficiently. 

  • Complex and Fragile Pipelines: Building and maintaining robust data pipelines requires significant engineering effort to manage schema evolution, data quality, observability, and handle bursty workloads—all prone to errors and inefficiencies. 

  • Manual Optimization Bottlenecks: Unoptimized Iceberg tables can quickly drive up the storage footprint while dramatically slowing down queries. Achieving high-performance queries in Iceberg demands ongoing tuning of individual tables, compaction, and partitioning—often managed manually or through limited open-source tools that don’t scale.

  • Data Trust and Lineage Visibility: Without automated quality checks and end-to-end lineage, it’s difficult for organizations to ensure data is accurate, complete, and ready for AI, analytics, or compliance-driven workloads. 
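To make the optimization bottleneck above concrete: streaming ingestion leaves Iceberg tables littered with small data files, and a maintenance job has to decide which of them to rewrite. Here is a deliberately simplified sketch of that planning step; the 128 MB target and the greedy packing strategy are illustrative assumptions, not how any particular product behaves:

```python
# Simplified sketch of small-file compaction planning for an Iceberg-style
# table: group undersized data files into rewrite batches near a target size.
TARGET_FILE_SIZE = 128 * 1024 * 1024  # 128 MB, an illustrative default

def plan_compaction(file_sizes, target=TARGET_FILE_SIZE):
    """Return batches of file indices whose combined size approaches `target`.

    Files already at or above the target are left alone; the rest are
    greedily packed into rewrite groups.
    """
    small = [(i, s) for i, s in enumerate(file_sizes) if s < target]
    batches, current, current_size = [], [], 0
    for i, s in small:
        current.append(i)
        current_size += s
        if current_size >= target:
            batches.append(current)
            current, current_size = [], 0
    if len(current) > 1:  # rewriting a single small file gains nothing
        batches.append(current)
    return batches

# Ten 16 MB streaming files pack into one ~128 MB batch plus a remainder.
sizes = [16 * 1024 * 1024] * 10
print(plan_compaction(sizes))  # [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9]]
```

A real maintenance service has to run logic like this continuously, per table, which is exactly the operational burden the bullet above describes.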

Introducing Qlik Open Lakehouse 

Today we are excited to launch Qlik Open Lakehouse, a new capability in Qlik Talend Cloud that radically simplifies how organizations ingest, manage, and optimize data in Apache Iceberg-based lakehouses.

Just a few months ago we announced the acquisition of Upsolver to accelerate our innovation and commitment to Apache Iceberg. And today, by integrating the Upsolver platform into Qlik Talend Cloud, we are thrilled to announce the launch of Qlik Open Lakehouse, now available in private preview. With this integration, users can effortlessly deploy a lakehouse architecture on their Amazon S3 environment and start loading both batch and real-time data directly into Iceberg tables with just a few clicks—no complex, brittle pipelines, no bottlenecks, no manual configuration.

[Video: Qlik Open Lakehouse overview, via Vidyard]

You don’t have to get bogged down by all of the operational complexity that typically comes with Iceberg implementations. Qlik Open Lakehouse handles all the hard parts automatically. It maps source schemas to target structures, resolves data type conflicts, applies intelligent partitioning, performs automatic file compaction, manages schema evolution, and handles updates and deletes with precision – on an ongoing basis with no manual intervention. Behind the scenes, Qlik’s Adaptive Iceberg Optimizer continuously monitors and optimizes your Iceberg tables in real time to improve performance and minimize storage footprint.
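To illustrate one of those “hard parts”: when the same column arrives with different types from different sources, a schema merge has to widen to a common type before writing to Iceberg. The widening order below is a toy assumption for illustration, not Qlik’s or Iceberg’s actual resolution rules:

```python
# Illustrative type-widening for schema merges: when two sources disagree on
# a column's type, promote to the wider type. This promotion order is a
# simplified example, not the exact rule set of any real engine.
WIDENING_ORDER = ["int", "long", "float", "double", "string"]

def resolve_type(a, b):
    """Pick the wider of two primitive types; fall back to string."""
    if a == b:
        return a
    try:
        return WIDENING_ORDER[max(WIDENING_ORDER.index(a), WIDENING_ORDER.index(b))]
    except ValueError:
        return "string"  # incompatible types are stringified

def merge_schemas(target, incoming):
    """Evolve `target` (name -> type) with columns from `incoming`."""
    merged = dict(target)
    for name, typ in incoming.items():
        merged[name] = resolve_type(merged[name], typ) if name in merged else typ
    return merged

print(merge_schemas({"id": "int", "amount": "int"},
                    {"id": "long", "amount": "double", "note": "string"}))
# {'id': 'long', 'amount': 'double', 'note': 'string'}
```

Multiply this by thousands of columns evolving over time and the value of automating it becomes clear.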

And the results speak for themselves: 

  • Up to 5x faster query performance through adaptive optimizations

  • Up to 50% lower storage costs thanks to efficient compaction and clean-ups

  • A truly no-code experience for enterprise-scale Iceberg management 

Furthermore, all of these capabilities will be included in the Standard Edition and above of Qlik Talend Cloud, making it even easier and more affordable for you to adopt and work with Apache Iceberg. Whether you're working with batch or real-time data, Qlik Open Lakehouse lets you harness the full power of Apache Iceberg—without the operational burden.

Qlik Open Lakehouse is now available in private preview and will be generally available in July 2025. If you are interested in being part of an Early Access Program (EAP) for Qlik Open Lakehouse, please sign up using the link here.

Key Features of Qlik Open Lakehouse 

Real-time high throughput ingestion into Iceberg:  

Ingesting massive volumes of batch and real-time data into Apache Iceberg has never been easier. With Qlik Open Lakehouse, you can pull data from hundreds of sources—including operational databases, SaaS apps, SAP, mainframes, file sources and CDC streams—directly into Iceberg tables with just a few clicks. As part of the launch, customers will be able to perform low-latency data ingestion from all of the existing 200+ Qlik Talend Cloud sources directly into Iceberg tables in their Amazon S3 environment.

Whether it’s real-time updates from operational systems or historical snapshots, Qlik ensures your data is written, merged, and optimized quickly — supporting both Type 1 (latest value only) and Type 2 (full change history) handling to meet the strictest SLAs.
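For readers less familiar with the terminology: Type 1 keeps only the latest value of a changed row, while Type 2 preserves every historical version. A minimal Python sketch of the two behaviors follows; field names like `is_current` and `valid_to` are hypothetical, not Qlik’s actual schema:

```python
# Toy illustration of Type 1 vs Type 2 change handling for a table keyed by
# `id`. Type 1 overwrites in place; Type 2 closes the old version and
# appends a new one. Field names here are hypothetical.
def apply_type1(table, change):
    """Overwrite the existing row: only the latest value survives."""
    table[change["id"]] = {"value": change["value"]}
    return table

def apply_type2(history, change, ts):
    """Append a new version and mark the previous one as no longer current."""
    for row in history:
        if row["id"] == change["id"] and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = ts
    history.append({"id": change["id"], "value": change["value"],
                    "valid_from": ts, "valid_to": None, "is_current": True})
    return history

hist = apply_type2([], {"id": 1, "value": "bronze"}, ts=1)
hist = apply_type2(hist, {"id": 1, "value": "silver"}, ts=2)
print(len(hist), hist[-1]["value"])  # 2 silver — full change history retained
```

Doing this efficiently at scale on Iceberg means turning each change into correct merge, update, and delete operations against the table, which is what the ingestion layer automates.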

Qlik handles the complexity for you: 

✅ Automatic schema mapping 

✅ Type conflict resolution 

✅ Intelligent partitioning 

✅ Schema evolution 

✅ Update/delete handling  

More importantly, customers can now take advantage of cost-efficient Qlik compute, with AWS spot instances, to support ingestion and the bronze layer, further driving compute savings and efficiencies. 

The result? Fast, efficient, and cost-effective data ingestion—no code, no manual work. 

Qlik Adaptive Iceberg Optimizer:  

Qlik’s industry-leading Adaptive Iceberg Optimizer technology continuously monitors tables and determines the ideal optimizations, compactions and clean-ups to execute based on each table’s unique characteristics, delivering an unmatched performance boost (up to 2.5x–5x improvement in performance) and up to a 50% reduction in costs, all without writing a single line of code.

Other Iceberg optimization services require the user to tune and tweak optimization behavior by manually configuring each and every table. This process is tedious, complicated, and simply not scalable. Qlik’s Adaptive Iceberg Optimizer monitors each table and dynamically adapts to its unique characteristics, automatically delivering better query performance and lower costs compared to manually tuned tables. That’s performance without all of the engineering!
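The difference between static and adaptive tuning can be sketched in a few lines: rather than one fixed schedule for every table, an adaptive policy weighs each table’s own statistics, such as its small-file ratio and write rate, when deciding whether a rewrite will pay off. The thresholds below are invented for illustration and are not Qlik’s actual policy:

```python
# Hypothetical adaptive policy: trigger maintenance only when a table's own
# statistics suggest a rewrite will pay off. All thresholds are illustrative.
def should_optimize(stats, small_ratio_threshold=0.5, min_files=20):
    """Decide per table from observed characteristics, not a fixed schedule."""
    if stats["file_count"] < min_files:
        return False  # too few files for compaction to matter
    small_ratio = stats["small_files"] / stats["file_count"]
    # Hot, append-heavy tables fragment faster, so compact them sooner.
    if stats["writes_per_hour"] > 100:
        small_ratio_threshold /= 2
    return small_ratio >= small_ratio_threshold

cold = {"file_count": 40, "small_files": 12, "writes_per_hour": 5}
hot = {"file_count": 40, "small_files": 12, "writes_per_hour": 500}
print(should_optimize(cold), should_optimize(hot))  # False True
```

Two tables with identical layouts can thus get different maintenance decisions, which is the core idea behind adapting to each table rather than configuring all of them the same way.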

Learn more about how leading organizations including ironSource and Cox Automotive were able to drive significant cost savings and business impact using the Upsolver platform.

Data warehouse mirroring:  

Data warehouse mirroring lets users automatically replicate data from Qlik Open Lakehouse Iceberg tables into their cloud data warehouse (such as Snowflake) to enable querying or additional downstream transformations, without creating duplicate copies of the data. This ensures interoperability with your existing systems, minimizes data duplication, and delivers further cost savings. Users also have the option to run their ingestion and bronze layer on cost-effective Qlik compute, and then effortlessly mirror data into Snowflake for further downstream processing and transformations, giving them the best of both worlds.

Open, industry-leading integrations:  

Qlik Open Lakehouse is built to be open and platform-agnostic. That means you’re never locked in. It integrates with industry-leading catalogs such as AWS Glue, Apache Polaris (incubating) and Snowflake Open Catalog, enabling users to achieve high-performance queries on petabytes of data using any of the leading Iceberg-compatible platforms or query engines. Use your preferred query engine, catalog, or analytics platform. We will integrate natively with:

  • Catalogs: AWS Glue, Apache Polaris (incubating), Snowflake Open Catalog  

  • Platforms & Query engines: Snowflake, Amazon Athena, Trino, Presto, Dremio, Apache Spark and more... 

  • Cloud storage: Amazon S3 (with future support for more)  

You’re free to build the architecture that works for you—no compromises.  
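As a concrete, hedged example of that openness: an external client such as PyIceberg can read the same tables through an AWS Glue catalog with a small configuration file. The catalog name below is a placeholder, and region and credentials are assumed to come from the standard AWS environment:

```yaml
# Hypothetical ~/.pyiceberg.yaml entry: any PyIceberg client can resolve
# Iceberg tables through AWS Glue. "lakehouse" is a placeholder name;
# credentials and region come from the usual AWS environment settings.
catalog:
  lakehouse:
    type: glue
```

A client would then call `load_catalog("lakehouse").load_table("db.table")` to query the same data Qlik ingested; this assumes the Glue database and table already exist and are registered in the catalog.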

Unified, end-to-end solution for Lakehouses

Ultimately, using Qlik Talend Cloud with Open Lakehouse, customers can now deploy a single end-to-end solution to ingest, transform, govern, optimize, and manage data pipelines for their Iceberg-based lakehouses, while ensuring comprehensive data quality and trust, all without having to cobble together multiple tools to build and scale their open lakehouses. Other managed Iceberg options typically solve only for ingestion or for optimization, whereas Qlik Talend Cloud now provides a unified solution for managing the end-to-end pipeline for Iceberg-based open lakehouses, from data sources all the way through analytics and AI applications.

Learn more about Qlik Talend Cloud 

Secure and flexible data management:  

With Qlik Open Lakehouse, data never leaves your private cloud environment. Data ingestion and optimization utilize Qlik-managed compute resources running on Amazon EC2 Spot Instances inside your AWS VPC. You can configure preferred instance types, and Qlik automatically scales resources up and down to match demand, delivering low-latency queries at a fraction of the cost of self-managed open-source tools or a data warehouse. With configurable scaling strategies ranging from low cost to low latency, customers can choose the right mix to deliver the right data at the right latency with the right cost structure. That means your compute, your own cloud, your rules, giving you full control over performance, security, and cost.

High Throughput Ingestion into Iceberg  

Ingest data in real-time from hundreds of sources including databases, SaaS sources, and SAP directly into Iceberg   

Adaptive Iceberg Optimizer  

Optimize Iceberg tables with Adaptive Optimizer to drive 5x query performance with zero manual effort

Drive 50% Cost Savings with 5x Faster Queries  

Drive faster queries and unlock cost savings of >50% through optimizations and storage cost reductions 

Open and Interoperable 

Work with leading Iceberg catalogs, processing platforms and query engines 

Mirror Data to your Data Warehouse 

Seamlessly mirror data to your Snowflake warehouse without the need to copy data 

Proven Performance at Petabyte-scale  

A battle-tested, proven solution for ingesting and managing data at petabyte scale.

Unlock the full potential of Iceberg with Qlik Open Lakehouse  

Qlik Talend Cloud, with Open Lakehouse, offers an end-to-end solution to ingest, optimize, process and manage Apache Iceberg tables on your Amazon S3, while delivering unparalleled query performance and scalability at 50% lower cost. It brings together Qlik Talend Cloud’s low-latency ingestion and efficient compute resources with Upsolver’s adaptive Iceberg optimization capabilities to deliver a 5x boost in query performance and 50% cost savings.

Thousands of customers already trust Qlik for data ingestion, integration, transformation, data quality and analytics using data warehouses and data lakes – and now they can extend all the same capabilities to support Iceberg as an endpoint. Qlik Open Lakehouse integrates with your existing investments, including AWS, Snowflake and more, so that you can read and write data from Iceberg tables with the highest performance, regardless of the query or processing engines utilized, all with zero manual effort.

Start building with Open Lakehouse on Qlik Talend Cloud! Sign up to be a part of the Early Access Program for Qlik Open Lakehouse.  

Learn more 
