
Airflow's parallelism and dag_concurrency

신수동탈곡기 2022. 5. 26. 11:06

parallelism

This defines the maximum number of task instances that can run concurrently in Airflow regardless of scheduler count and worker count. Generally, this value is reflective of the number of task instances with the running state in the metadata database.

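In short, this is the maximum number of task instances that can run concurrently across the whole Airflow installation, regardless of how many schedulers and workers there are.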
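To make the global cap concrete, here is a minimal sketch of the arithmetic it implies. It is a toy model, not Airflow's scheduler code: the function name schedulable_now and the value 32 are assumptions chosen for illustration.

```python
def schedulable_now(parallelism: int, running: int, queued: int) -> int:
    """Toy model: how many queued task instances could start right now
    under a single installation-wide cap (hypothetical helper, not an Airflow API)."""
    # Slots left under the global cap; never negative.
    free_slots = max(parallelism - running, 0)
    # We cannot start more tasks than are actually waiting.
    return min(free_slots, queued)


# Illustrative values: cap of 32, 30 tasks already in the running state,
# 10 more waiting. Only 2 can start, no matter how many workers exist.
print(schedulable_now(parallelism=32, running=30, queued=10))  # 2
```

The point of the sketch is that adding workers or schedulers does not change the result; only the parallelism value does.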

 

Source: Configuration Reference, Airflow Documentation (airflow.apache.org)

dag_concurrency

Deprecated since version 2.2.0: The option has been moved to core.max_active_tasks_per_dag

This option has been deprecated since version 2.2.0 and replaced by max_active_tasks_per_dag.
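If an existing airflow.cfg still uses the old name, the rename can be applied mechanically. The following is a rough sketch using only Python's standard configparser; the helper rename_dag_concurrency and the sample config text are hypothetical, and a real airflow.cfg contains many more options than shown here.

```python
import configparser

OLD_NAME = "dag_concurrency"           # deprecated since 2.2.0
NEW_NAME = "max_active_tasks_per_dag"  # replacement named in the deprecation note


def rename_dag_concurrency(cfg_text: str) -> configparser.ConfigParser:
    """Move [core] dag_concurrency to [core] max_active_tasks_per_dag
    (hypothetical helper, not part of Airflow)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if parser.has_option("core", OLD_NAME):
        value = parser.get("core", OLD_NAME)
        parser.remove_option("core", OLD_NAME)
        # If the new option is already set explicitly, leave it untouched.
        if not parser.has_option("core", NEW_NAME):
            parser.set("core", NEW_NAME, value)
    return parser


# Hypothetical config fragment that still uses the old option name.
sample = "[core]\nparallelism = 32\ndag_concurrency = 16\n"
cfg = rename_dag_concurrency(sample)
print(dict(cfg["core"]))
# {'parallelism': '32', 'max_active_tasks_per_dag': '16'}
```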

max_active_tasks_per_dag

The maximum number of task instances allowed to run concurrently in each DAG. To calculate the number of tasks that is running concurrently for a DAG, add up the number of running tasks for all DAG runs of the DAG. This is configurable at the DAG level with max_active_tasks, which is defaulted as max_active_tasks_per_dag. An example scenario when this would be useful is when you want to stop a new dag with an early start date from stealing all the executor slots in a cluster.

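This is the maximum number of task instances that can run concurrently within each DAG, counted across all of that DAG's runs. The new name seems to have been chosen to convey the meaning more intuitively than dag_concurrency.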
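The counting rule quoted above (sum the running tasks over all DAG runs of the DAG, then compare with the limit) can be sketched as follows. This is a toy model under stated assumptions, not Airflow's implementation: the function names and run IDs are hypothetical, and the only behavior taken from the documentation is that the DAG-level max_active_tasks falls back to max_active_tasks_per_dag when it is not set.

```python
from typing import Dict, Optional


def running_tasks_for_dag(running_per_run: Dict[str, int]) -> int:
    """Add up running task instances over every DAG run of one DAG."""
    return sum(running_per_run.values())


def can_start_task(
    running_per_run: Dict[str, int],
    max_active_tasks_per_dag: int,
    max_active_tasks: Optional[int] = None,
) -> bool:
    """Toy check: may this DAG start one more task instance?"""
    # The DAG-level setting wins; otherwise use the installation-wide default.
    limit = max_active_tasks if max_active_tasks is not None else max_active_tasks_per_dag
    return running_tasks_for_dag(running_per_run) < limit


# Hypothetical DAG with three concurrent runs (for example, during a backfill).
runs = {"run_2022_05_24": 6, "run_2022_05_25": 7, "run_2022_05_26": 3}

print(running_tasks_for_dag(runs))                   # 16
print(can_start_task(runs, 16))                      # False
print(can_start_task(runs, 16, max_active_tasks=20)) # True
```

The backfill case is the scenario the documentation mentions: without a per-DAG limit, a new DAG with an early start date could fill every executor slot with its own runs.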

 
