// Apache Hop overview

What is Apache Hop?

The open-source standard for visual data orchestration and pipeline development. Learn how it works, when to use it, and how to run it in production.

Apache Software Foundation · Successor to Pentaho PDI · Visual · Metadata-driven · Open source
01 — Overview

What is Apache Hop?

Apache Hop is an open-source data orchestration platform built for teams who need to move, transform, and manage data across business systems — reliably, repeatably, and without being locked into a single infrastructure. It is a top-level Apache Software Foundation project and the modern successor to Pentaho Data Integration (PDI/Kettle).

Apache Hop helps teams visually author, run, and monitor data pipelines and workflows. It handles complex integrations that require data transformation, workflow orchestration, and reliable execution across a wide range of systems — from relational databases and REST APIs to ERP platforms and cloud services.

At the heart of Apache Hop are two core concepts: pipelines and workflows. Pipelines perform the core data processing tasks — extracting, transforming, and loading data between systems. Workflows handle orchestration-level logic: running pipelines in sequence, managing errors, moving files, sending notifications, and coordinating execution across environments.
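
The split between the two concepts can be sketched in plain Python. This is a conceptual model only, not Hop's API: the names `run_pipeline`, `run_workflow`, and the sample transforms are hypothetical, and real Hop pipelines are built visually and run their transforms in parallel rather than as a simple generator chain.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = dict
Transform = Callable[[Iterator[Row]], Iterator[Row]]


def run_pipeline(rows: Iterable[Row], transforms: List[Transform]) -> List[Row]:
    """Stream rows through each transform in order (a stand-in for a pipeline)."""
    stream: Iterator[Row] = iter(rows)
    for transform in transforms:
        stream = transform(stream)
    return list(stream)


def drop_inactive(rows):
    """Hypothetical transform: keep only rows flagged as active."""
    return (row for row in rows if row["active"])


def add_full_name(rows):
    """Hypothetical transform: derive a new field from two existing ones."""
    for row in rows:
        yield {**row, "full_name": f'{row["first"]} {row["last"]}'}


def run_workflow(actions: List[Tuple[str, Callable[[], None]]], on_error):
    """Run actions in sequence; on the first failure, call the handler and stop."""
    completed = []
    for name, action in actions:
        try:
            action()
        except Exception as exc:
            on_error(name, exc)
            return completed, False
        completed.append(name)
    return completed, True


# Usage: a workflow that runs one pipeline, then a step that fails.
source = [
    {"first": "Ada", "last": "Lovelace", "active": True},
    {"first": "Alan", "last": "Turing", "active": False},
]
loaded = []
errors = []


def load_customers():
    loaded.extend(run_pipeline(source, [drop_inactive, add_full_name]))


def failing_step():
    raise RuntimeError("target system unavailable")


completed, ok = run_workflow(
    [("load_customers", load_customers), ("notify", failing_step)],
    on_error=lambda name, exc: errors.append((name, str(exc))),
)
```

The pipeline function only knows about rows and transforms, while the workflow function only knows about ordered actions and failure handling. That division of responsibility is the point of the sketch.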

Unlike code-first orchestration tools, Apache Hop uses a visual, metadata-driven approach. Pipelines and workflows are built using a drag-and-drop IDE, making complex logic accessible without requiring deep programming expertise — while still offering the power and flexibility that experienced engineers expect.

02 — Background

History of Apache Hop

If you've worked in data integration for more than a few years, you've probably encountered Pentaho Data Integration — also known as Kettle. For over a decade, PDI was one of the most widely used ETL tools in the world. It was powerful, visual, and approachable. It was also showing its age.

In 2019, a group of engineers — several of whom had spent years building and contributing to PDI — decided to start over. Not from scratch, but with intention. The result was Apache Hop: a complete architectural rethink that kept what made PDI great and replaced everything that held it back. The project entered the Apache Software Foundation incubator in 2020 and graduated as a top-level project in 2021.

The rebuild introduced a fully plugin-based engine, a redesigned IDE, a clean separation between pipeline logic and environment configuration, and native support for multiple execution runtimes. For teams migrating from PDI, the transition is familiar. For teams starting fresh, there's no legacy baggage to navigate.

03 — Applications

What is Apache Hop used for?

Apache Hop is designed for data integration teams that need to connect, transform, and move data reliably across business systems. Data engineers use it to build pipelines that run on any infrastructure (locally, in Docker, on Kubernetes, or in the cloud) without modifying the pipeline logic itself.

ERP and CRM integrations — Connecting systems like Odoo, Salesforce, and HubSpot to data warehouses or reporting layers
ETL and ELT pipelines — Extracting, transforming, and loading data between relational databases, files, and APIs
Data migration projects — Moving data between legacy and modern systems with full transformation control
Operational data workflows — Automating file transfers, database maintenance, and system synchronisation tasks
04 — Community

Built on a strong and growing community

Apache Hop is an active Apache Software Foundation project with a growing community of contributors, users, and commercial adopters worldwide.

1.4K
GitHub Stars
View on GitHub →
1.4K
Downloads
View releases →
1.4K
Contributors
View contributors →
05 — Platform

Key features of Apache Hop

01
Visual, metadata-driven development
Build pipelines on a visual canvas. All logic is stored as metadata — easy to version, share, and deploy. Engineers can also work directly with the underlying XML or integrate with standard Git workflows.
02
Architecture-agnostic execution
Run the same pipeline locally, in Docker, on Kubernetes, or (through Apache Beam) on Apache Spark and Apache Flink, without changing a single transform. One pipeline definition, any runtime.
03
Pipelines and workflows
Pipelines stream data through transforms in parallel, optimised for throughput. Workflows orchestrate sequences with conditional logic, error handling, and notification support built in.
04
Plugin-based extensibility
Every component is a plugin — transforms, actions, database connections, execution engines. Extend the platform for specific use cases or build entirely new runtimes without modifying core code.
05
Environment and config management
Separate pipeline logic from environment configuration. The same pipeline runs in dev, test, and production by switching the active environment — no modifications required. Natural fit for CI/CD.
06
Native Git integration
All metadata stored as structured files — natively compatible with Git and any version control system. Apply standard software engineering practices to data pipeline development.
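
The separation of pipeline logic from environment configuration can be illustrated with a small Python sketch. Hop references variables with the `${NAME}` syntax; everything else here (the `ENVIRONMENTS` dictionary, the variable names, the example hosts, and the `resolve` helper) is hypothetical and only models the idea, not how Hop stores or resolves environment files.

```python
import re

# Hypothetical per-environment variables. In Hop these would live in
# environment configuration files, not in Python code.
ENVIRONMENTS = {
    "dev": {"DB_HOST": "localhost", "DB_NAME": "sales_dev"},
    "prod": {"DB_HOST": "db.internal.example", "DB_NAME": "sales"},
}

# A pipeline setting that refers to variables instead of hard-coded values.
CONNECTION_URL = "jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}"

_VARIABLE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(template: str, environment: str) -> str:
    """Replace every ${NAME} in the template with the active environment's value."""
    variables = ENVIRONMENTS[environment]

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"variable {name!r} is not defined in {environment!r}")
        return variables[name]

    return _VARIABLE.sub(lookup, template)


# The same definition resolves differently depending on the active environment.
dev_url = resolve(CONNECTION_URL, "dev")
prod_url = resolve(CONNECTION_URL, "prod")
```

Because the pipeline definition contains only variable references, promoting it from dev to production means changing which set of values is active, not editing the definition itself.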
06 — Enterprise

Putki: Apache Hop for production

Putki is know.bi's production-ready distribution of Apache Hop. Running Apache Hop in production requires more than a download — teams need hardened builds, operational visibility, governance tooling, and someone accountable when something breaks. Putki provides all of that, built by the same team that contributes to the Apache Hop core.

Security
Hardened Docker images, vulnerability scanning with every release using Trivy and Docker Scout, and patch releases for critical issues between Apache Hop major versions. The current release carries no CVEs with a CVSS score above 9.
Observability
Centralized execution logs, pipeline health dashboards, Grafana monitoring, and Slack alerting. Know what your pipelines are actually doing — before your business users do.
Governance
Autodoc generates technical documentation from pipeline metadata. SQL Parser extracts dependencies. RDBMS Impact Analysis shows which pipelines will break before you change a database schema — not after.
Connectivity
15+ maintained connectors for business systems including Odoo, Shopify, and HubSpot — with built-in handling for API versioning, pagination, and rate limiting.
Support
SLA-backed service desk, a 70+ article knowledge base, and advisory services from the team that built the platform. know.bi's founders are active contributors to the Apache Hop project.

Ready to run Apache Hop in production?

Putki is available across Basic, Professional, and Enterprise tiers — priced on observability, governance, and support, not data volume.

Talk to us
07 — Next steps

Getting started with Apache Hop

The fastest way to get started is to download the Hop IDE and follow the official documentation at hop.apache.org. The IDE runs on Windows, Linux, and macOS and requires only a supported Java runtime.

For teams that need a production-ready starting point — with security hardening, operational tooling, and commercial support already in place — Putki provides a distribution of Apache Hop that is ready to run from day one.

Start with Putki

Get the enterprise-ready Apache Hop distribution with security, observability, and governance built in.

Get started