With the acquisition of Dataform by Google, it is steadily becoming the ELT data orchestration tool par préférence. In this blog post we explore how to automatically create a table in your data warehouse that will function as an incremental data destination in Dataform.
If you are interested in a JavaScript solution, please read this post.
In my personal opinion, one of the most prominent core features of Dataform is the incremental table. By comparing each record’s unique ID or timestamp to the row last added to the destination table, one can incrementally insert rows. However, if your destination table doesn’t exist, you need to create it; either manually in BigQuery, or automatically in Dataform.
In this blog post, we create a Dataform definition that creates the destination table if it doesn’t exists and incrementally inserts rows with every run.
-- (1) config { type: "incremental", schema: "YOUR_SCHEMA", name: "YOUR_DESTINATION_TABLE", -- [YOUR OTHER CONFIGURATION PARAMETERS GO HERE] } -- (2) SELECT YOUR_INCREMENTAL_FIELD INT, -- [YOUR FIELDS GO HERE] FROM ${ref(YOUR_SCHEMA.YOUR_SOURCE_TABLE)} WHERE ( -- (3) IF THERE ARE ALREADY ROWS IN THE DESTINATION TABLE, ONLY INSERT NEW ONES (YOUR_INCREMENTAL_FIELD > (SELECT MAX(YOUR_INCREMENTAL_FIELD) FROM ${self()})) OR -- (4) IF TABLE IS EMPTY, INSERT ALL ROWS (0 = (SELECT COUNT(<YOUR_INCREMENTAL_FIELD>) FROM ${self()})) ) -- (5) USE PRE_OPERATIONS FOR CREATING THE TABLE IF IT DOESN'T EXIST pre_operations { CREATE TABLE IF NOT EXISTS ${self()} ( ... ); }
Here’s how it works.
- In our configuration, we create an incremental dataset using the “incremental” type.
- The query that will insert the new rows in the incremental data set.
- This is where we check if the incremental field from the source table is higher than the highest value in that same field in the destination table. This ensures only new rows get added.
- This condition is added because we cant to add all rows if the created dataset is empty.
- In this pre_operation, we create the destination table in our data warehouse if it doesn’t exist.
If you are interested in a JavaScript solution, please read this post.
Your article helped me a lot, is there any more related content? Thanks!