Roel Peters

Solve: “name ‘ds’ is not defined” in Airflow

I’ve been working with Airflow recently, and I needed to configure an operator by passing it the execution time. Or at least I thought so. I really felt like writing my findings down, because if you’re in the dark here, like I was today, there are some fundamental Airflow principles you still need to wrap your head around.

Let’s simulate what causes the error first.

with DAG(...) as dag:
    ...
    my_hive_task = HiveOperator(
        hql = '/my_hiveql_query.sql',
        params = {
            **args,
            # Dict entries use 'key': value syntax, not key = value.
            'end_date': '{{ ds }}'
        },
        task_id = '...',
    )
    ...

The file my_hiveql_query.sql contains the following query. It should be clear that I’m trying to filter a SELECT query based on the end_date param from my_hive_task.

SELECT
	*
FROM
	my_schema.my_table
WHERE
	timestamp < '{{ params.end_date }}'

However, that simply generated the following error in the Airflow GUI:

name 'ds' is not defined

To understand the issue at hand, it’s crucial to understand how Airflow templates work.

Conclusion: You can’t simply evaluate variables and expressions wherever you want in your DAG file. Templates are only rendered in operator parameters that support them. The params dictionary is not one of those, so the '{{ ds }}' string I put in there was never replaced by an actual date.
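To make that rule concrete, here is a minimal sketch of the mechanism in plain Python. It is a toy model and not Airflow’s actual code: ToyOperator, render and the fixed date are made up for illustration, and the regex only handles simple "{{ name }}" placeholders instead of full Jinja. It also does not reproduce the exact error message. It only shows that attributes listed in template_fields get rendered, while a template string tucked away in params comes out untouched.

```python
import re

# Matches simple "{{ name }}" or "{{ a.b }}" placeholders (a tiny subset of Jinja).
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(text, context):
    """Replace each placeholder once; inserted values are not scanned again."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return PLACEHOLDER.sub(lookup, text)


class ToyOperator:
    """Hypothetical stand-in for an operator; this is not Airflow code."""
    template_fields = ("hql",)  # only these attributes get rendered

    def __init__(self, hql, params=None):
        self.hql = hql
        self.params = params or {}

    def render_template_fields(self, context):
        # params is available to the templates, but is not rendered itself.
        context = {**context, "params": self.params}
        for name in self.template_fields:
            setattr(self, name, render(getattr(self, name), context))


context = {"ds": "2024-01-15"}  # assumed example date

# Template hidden inside params: it is copied into hql as a literal string.
broken = ToyOperator(
    hql="WHERE timestamp < '{{ params.end_date }}'",
    params={"end_date": "{{ ds }}"},
)
broken.render_template_fields(context)
print(broken.hql)  # WHERE timestamp < '{{ ds }}'

# Template used directly in a templated field: it is rendered.
fixed = ToyOperator(hql="WHERE timestamp < '{{ ds }}'")
fixed.render_template_fields(context)
print(fixed.hql)  # WHERE timestamp < '2024-01-15'
```

The rendering is a single pass: whatever gets substituted into the text is not rendered a second time, which is why the placeholder survives in the first case.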

For example, you can template the bash_command parameter of the BashOperator.

with DAG(...) as dag:
    ...
    BashOperator(
        task_id = "echo_date",
        bash_command = "echo {{ ds }}"
    )

In my case, that meant using the ds variable directly in the hql parameter, which here means inside the contents of my_hiveql_query.sql.

with DAG(...) as dag:
    ...
    my_hive_task = HiveOperator(
        hql = '/my_hiveql_query.sql',
        params = {
            **args
        },
        task_id = '...',
    )
    ...

The file my_hiveql_query.sql now references ds directly, so the SELECT query is filtered on the execution date without going through params.

SELECT
	*
FROM
	my_schema.my_table
WHERE
	timestamp < '{{ ds }}'

🤨 So how do I know if a parameter supports templates?

First of all, you can always look at the documentation of an operator. The Hive operator clearly states “(templated)” at the end of the hql documentation.

Second, you can check the source code of the operator. The source code for the Hive operator is accessible on GitHub. All you need to do is find the template_fields variable, which contains all the parameters that can be templated.

The template_fields variable should be somewhere at the top.
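If you’d rather not dig through the source, the same check can be done from Python, since template_fields is just a class attribute. The following is a small sketch under a few assumptions: is_templated is a helper name I made up, and FakeHiveOperator is a stand-in class so the snippet runs without Airflow installed. Its template_fields tuple only lists hql, which is not necessarily the full list of the real operator.

```python
def is_templated(operator_cls, param_name):
    """Return True if param_name is listed in the class's template_fields."""
    return param_name in getattr(operator_cls, "template_fields", ())


class FakeHiveOperator:
    """Hypothetical stand-in so the sketch runs without Airflow installed."""
    template_fields = ("hql",)  # illustrative only, not the real operator's list


print(is_templated(FakeHiveOperator, "hql"))     # True
print(is_templated(FakeHiveOperator, "params"))  # False

# With Airflow installed, the same helper should work on a real operator class.
# The import path below is an assumption and differs between Airflow versions:
#   from airflow.operators.bash import BashOperator
#   is_templated(BashOperator, "bash_command")
```

Printing the template_fields attribute of the real operator class in a Python shell gives you the complete list for the Airflow version you actually run, which can differ from what the online documentation shows.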

Great success!
