
What is under the hood of Airflow?

Airflow has many contenders now, but it still holds strong when it comes to orchestrating and scheduling tasks in data engineering.
If you are running self-hosted Airflow in your organization, it helps to understand what the system is made of, so that when it breaks you can fix it fast. Let's look into its architecture.
Airflow is composed of several microservices that work together to run your workflows. Here are the components:
𝗦𝗰𝗵𝗲𝗱𝘂𝗹𝗲𝗿.
➡️ Central piece of the Airflow architecture.
➡️ Triggers scheduled workflows.
➡️ Submits tasks to the Executor.
𝗘𝘅𝗲𝗰𝘂𝘁𝗼𝗿.
➡️ Part of the Scheduler process.
➡️ Handles task execution.
➡️ In production setups, pushes tasks to Workers for execution.
➡️ Can be configured to execute against different systems (Celery, Kubernetes, etc.).
𝗪𝗼𝗿𝗸𝗲𝗿.
➡️ The unit that actually performs the work.
➡️ In production setups, it usually picks up tasks from a queue placed between the Workers and the Executor.
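To make the Scheduler → Executor → queue → Worker flow a bit more concrete, here is a toy model in plain Python. None of these names are Airflow APIs: `scheduler_loop`, `executor_submit`, `worker_loop` and the `metadata_db` dict are made up for illustration, and a standard-library `queue.Queue` stands in for a real broker.

```python
import queue
import threading

# Toy model only: none of these names are Airflow APIs.
task_queue = queue.Queue()   # stands in for the queue between Executor and Workers
metadata_db = {}             # stands in for the Metadata Database: task_id -> state
db_lock = threading.Lock()


def set_state(task_id, state):
    """Record the latest state of a task in the shared state store."""
    with db_lock:
        metadata_db[task_id] = state


def executor_submit(task_id, fn):
    """Executor: mark the task as queued and push it to the queue."""
    set_state(task_id, "queued")
    task_queue.put((task_id, fn))


def scheduler_loop(due_tasks):
    """Scheduler: hand every task that is due to the Executor."""
    for task_id, fn in due_tasks:
        executor_submit(task_id, fn)


def worker_loop():
    """Worker: pull tasks from the queue and run them until told to stop."""
    while True:
        item = task_queue.get()
        if item is None:     # sentinel value: shut this worker down
            task_queue.task_done()
            break
        task_id, fn = item
        set_state(task_id, "running")
        try:
            fn()
            set_state(task_id, "success")
        except Exception:
            set_state(task_id, "failed")
        finally:
            task_queue.task_done()


def run_demo():
    """Run three tasks on two workers and return the final task states."""
    def boom():
        raise ValueError("simulated task failure")

    due = [("extract", lambda: None), ("transform", lambda: None), ("load", boom)]
    workers = [threading.Thread(target=worker_loop) for _ in range(2)]
    for w in workers:
        w.start()
    scheduler_loop(due)
    task_queue.join()        # wait until every submitted task was processed
    for _ in workers:
        task_queue.put(None)
    for w in workers:
        w.join()
    return dict(metadata_db)


if __name__ == "__main__":
    print(run_demo())
```

Note that nothing in this sketch depends on a UI: the workers keep draining the queue as long as something keeps submitting tasks and the state store is reachable.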
𝗠𝗲𝘁𝗮𝗱𝗮𝘁𝗮 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲.
➡️ Database used by the Scheduler, Executor and Webserver to store state.
𝗗𝗔𝗚 𝗱𝗶𝗿𝗲𝗰𝘁𝗼𝗿𝗶𝗲𝘀.
➡️ Airflow DAGs are defined in Python code.
➡️ This is where you store the DAG code. You configure Airflow to look for DAGs in these directories.
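As a rough illustration of what "looking for DAGs" in a directory can mean, here is a simplified sketch in plain Python. It is not Airflow's actual loader. The helper name `find_candidate_dag_files` is made up, and the filter (keep only `.py` files that mention both "airflow" and "dag") is an assumption loosely modelled on Airflow's DAG discovery safe mode.

```python
from pathlib import Path


def find_candidate_dag_files(dags_folder):
    """Return .py files under dags_folder that mention both 'airflow' and 'dag'.

    Hypothetical helper for illustration; a real loader would also import
    each file and collect the DAG objects defined in it.
    """
    candidates = []
    for path in sorted(Path(dags_folder).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if "airflow" in text and "dag" in text:
            candidates.append(path)
    return candidates
```

Calling `find_candidate_dag_files("/path/to/dags")` (the path is a placeholder) would return the files worth parsing, and skip plain helper modules that never reference Airflow.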
𝗪𝗲𝗯𝘀𝗲𝗿𝘃𝗲𝗿.
➡️ This is a Flask application that allows users to explore, debug and partially manipulate Airflow DAGs, users and configuration.
❗️ The two most important parts are the Scheduler and the Metadata DB.
❗️ Even if the Webserver is down, tasks will still be executed as long as the Scheduler is healthy.
❗️ Metadata DB transaction locks can cause problems for other services.
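The last point is easier to picture with a small experiment. The sketch below uses SQLite only because it ships with Python. Production Airflow typically runs on PostgreSQL or MySQL, where locking is finer-grained and the symptoms look different (waiting queries, deadlocks). The table name and the `scheduler` / `webserver` connection names are illustrative assumptions, not Airflow's real schema or code.

```python
import os
import sqlite3
import tempfile


def demo_lock_contention():
    """Return (blocked, recovered).

    blocked:   a second writer failed while the first held a write transaction.
    recovered: the same write succeeded once the first writer committed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "metadata.db")
        # isolation_level=None: transactions are controlled explicitly below.
        # timeout=0.1: give up quickly instead of waiting for the lock.
        scheduler = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)
        webserver = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)
        try:
            scheduler.execute(
                "CREATE TABLE task_instance (task_id TEXT PRIMARY KEY, state TEXT)"
            )
            scheduler.execute("INSERT INTO task_instance VALUES ('load', 'queued')")

            # The "scheduler" opens a write transaction and keeps it open.
            scheduler.execute("BEGIN IMMEDIATE")
            scheduler.execute(
                "UPDATE task_instance SET state = 'running' WHERE task_id = 'load'"
            )

            # The "webserver" now tries to write and runs into the lock.
            blocked = False
            try:
                webserver.execute(
                    "UPDATE task_instance SET state = 'failed' WHERE task_id = 'load'"
                )
            except sqlite3.OperationalError:
                blocked = True

            # Once the long transaction ends, the other service can write again.
            scheduler.execute("COMMIT")
            webserver.execute(
                "UPDATE task_instance SET state = 'failed' WHERE task_id = 'load'"
            )
            state = webserver.execute(
                "SELECT state FROM task_instance WHERE task_id = 'load'"
            ).fetchone()[0]
            return blocked, state == "failed"
        finally:
            scheduler.close()
            webserver.close()


if __name__ == "__main__":
    print(demo_lock_contention())
```

The takeaway is the same regardless of the database engine: one service holding a transaction open for too long can stall the others that share the Metadata DB.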