What is Apache Hop?
An open-source platform for visual data orchestration and pipeline development. Learn how it works, when to use it, and how to run it in production.
What is Apache Hop?
Apache Hop helps teams visually author, run, and monitor data pipelines and workflows. It handles complex integrations that require data transformation, workflow orchestration, and reliable execution across a wide range of systems — from relational databases and REST APIs to ERP platforms and cloud services.
At the heart of Apache Hop are two core concepts: pipelines and workflows. Pipelines perform the core data processing tasks — extracting, transforming, and loading data between systems. Workflows handle orchestration-level logic: running pipelines in sequence, managing errors, moving files, sending notifications, and coordinating execution across environments.
Unlike code-first orchestration tools, Apache Hop uses a visual, metadata-driven approach. Pipelines and workflows are built using a drag-and-drop IDE, making complex logic accessible without requiring deep programming expertise — while still offering the power and flexibility that experienced engineers expect.
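To make the pipeline/workflow split concrete, here is a minimal conceptual sketch in plain Python. It is not Hop's API: in Hop these objects are designed visually and stored as metadata, and every name below (`run_pipeline`, `run_workflow`, `drop_inactive`, `add_full_name`) is hypothetical. The point is only the division of labor: a pipeline streams rows through transforms, while a workflow runs actions in order and decides what happens on failure.

```python
from typing import Callable, Iterable, Iterator

Row = dict
Transform = Callable[[Iterator[Row]], Iterator[Row]]


def run_pipeline(rows: Iterable[Row], transforms: list[Transform]) -> list[Row]:
    """Pipeline: stream rows through a chain of transforms."""
    stream: Iterator[Row] = iter(rows)
    for transform in transforms:
        stream = transform(stream)
    return list(stream)


def drop_inactive(rows: Iterator[Row]) -> Iterator[Row]:
    # Filter step: keep only rows flagged as active.
    return (r for r in rows if r["active"])


def add_full_name(rows: Iterator[Row]) -> Iterator[Row]:
    # Transform step: derive a new field from existing ones.
    return ({**r, "full_name": f"{r['first']} {r['last']}"} for r in rows)


def run_workflow(actions, on_failure) -> bool:
    """Workflow: run actions in order; on the first failure, report it and stop."""
    for name, action in actions:
        try:
            action()
        except Exception as exc:
            on_failure(name, exc)
            return False
    return True


# Hypothetical sample data standing in for a source system.
source = [
    {"first": "Ada", "last": "Lovelace", "active": True},
    {"first": "Old", "last": "Account", "active": False},
]
loaded: list[Row] = []  # stands in for the target system
log: list[str] = []     # stands in for a notification channel

ok = run_workflow(
    actions=[
        ("load customers",
         lambda: loaded.extend(run_pipeline(source, [drop_inactive, add_full_name]))),
        ("notify", lambda: log.append(f"loaded {len(loaded)} rows")),
    ],
    on_failure=lambda name, exc: log.append(f"{name} failed: {exc}"),
)
```

The workflow never touches individual rows, and the pipeline knows nothing about sequencing or notifications. That separation is the idea the two concepts capture.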
History of Apache Hop
If you've worked in data integration for more than a few years, you've probably encountered Pentaho Data Integration — also known as Kettle. For over a decade, PDI was one of the most widely used ETL tools in the world. It was powerful, visual, and approachable. It was also showing its age.
In 2019, a group of engineers — several of whom had spent years building and contributing to PDI — decided to start over. Not from scratch, but with intention. The result was Apache Hop: a complete architectural rethink that kept what made PDI great and replaced everything that held it back. The project entered the Apache Software Foundation incubator in 2020 and graduated as a top-level project in 2021.
The rebuild introduced a fully plugin-based engine, a redesigned IDE, a clean separation between pipeline logic and environment configuration, and native support for multiple execution runtimes. For teams migrating from PDI, the transition is familiar. For teams starting fresh, there's no legacy baggage to navigate.
What is Apache Hop used for?
Apache Hop is aimed at data integration teams that need to connect, transform, and move data reliably across business systems. Data engineers use it to build pipelines that run locally, in Docker, on Kubernetes, or in the cloud without modifying the pipeline logic itself.
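Running the same pipeline on different infrastructure depends on keeping environment-specific values out of the pipeline definition. The sketch below illustrates that idea in plain Python, assuming a `${NAME}` placeholder style for variables. The pipeline fields, environment names, hosts, and the `resolve` helper are all hypothetical and are not Hop's file format or API.

```python
import re

# The pipeline definition is identical in every environment;
# it refers to environment-specific values only through variables.
PIPELINE = {
    "name": "load_customers",
    "source_url": "jdbc:postgresql://${DB_HOST}:${DB_PORT}/crm",
    "target_dir": "${OUTPUT_DIR}/customers",
}

# Hypothetical per-environment configuration, kept separate from the pipeline.
ENVIRONMENTS = {
    "dev": {"DB_HOST": "localhost", "DB_PORT": "5432", "OUTPUT_DIR": "/tmp/out"},
    "prod": {"DB_HOST": "db.internal.example", "DB_PORT": "5432", "OUTPUT_DIR": "/data/out"},
}

VARIABLE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(pipeline: dict, variables: dict) -> dict:
    """Return a copy of the pipeline with every ${NAME} replaced by its value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than run with a half-resolved setting.
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return {key: VARIABLE.sub(substitute, value) for key, value in pipeline.items()}


dev = resolve(PIPELINE, ENVIRONMENTS["dev"])
prod = resolve(PIPELINE, ENVIRONMENTS["prod"])
```

Here `dev` and `prod` differ only in their resolved connection and path values, while `PIPELINE` itself is never edited. Promoting a pipeline from one environment to another then means choosing a different set of variables, not changing the logic.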
Built on a strong and growing community
Apache Hop is an active Apache Software Foundation project with a growing community of contributors, users, and commercial adopters worldwide.
Key features of Apache Hop
Putki: Apache Hop for production
Putki is know.bi's production-ready distribution of Apache Hop. Running Apache Hop in production requires more than a download — teams need hardened builds, operational visibility, governance tooling, and someone accountable when something breaks. Putki provides all of that, built by the same team that contributes to the Apache Hop core.
Ready to run Apache Hop in production?
Putki is available across Basic, Professional, and Enterprise tiers — priced on observability, governance, and support, not data volume.
Getting started with Apache Hop
The fastest way to get started is to download the Hop IDE and follow the official documentation at hop.apache.org. The IDE runs on Windows, Linux, and macOS and requires only a supported Java runtime.
For teams that need a production-ready starting point — with security hardening, operational tooling, and commercial support already in place — Putki provides a distribution of Apache Hop that is ready to run from day one.
Start with Putki
Get the enterprise-ready Apache Hop distribution with security, observability, and governance built in.