Home » Dataform: Create if incremental table doesn’t exist (JS)

Dataform: Create if incremental table doesn’t exist (JS)

  • by
  • 2 min read

One of Google’s latest acquisitions, ELT data orchestration tool Dataform, supports JavaScript. This is a great way to chain Dataform’s actions. In this blog post we explore how to automatically create a destination table in your data warehouse, before an incremental definition is run.

If you are interested in a SQLX solution, please read this blog post.

One of the most prominent core features of Dataform is the incremental table. By comparing each record’s unique ID or timestamp to the row last added to the destination table, one can incrementally insert rows. However, if your destination table doesn’t exist, you need to create it; either manually in BigQuery, or automatically in Dataform.

In this blog post, we create a Dataform definition that creates the destination table if it doesn’t exist and incrementally inserts rows with every run. This is how you’d do it via JavaScript:

publish("YOUR_DESTINATION_TABLE", {
  type: "incremental",
  schema: "YOUR_SCHEMA"
  // [YOUR OTHER CONFIGURATION PARAMETERS GO HERE]
}).query(ctx => `
  SELECT
 	YOUR_INCREMENTAL_FIELD,
	[YOUR_FIELDS]
FROM
  ${ctx.ref('YOUR_SOURCE_TABLE')}
WHERE
  YOUR_INCREMENTAL_FIELD > (
    SELECT
      MAX(PROCESSING_TIMESTAMP)
    FROM
      ${ctx.self()})
      OR
      (0 = (SELECT COUNT(YOUR_INCREMENTAL_FIELD) FROM ${ctx.self()}))
  )
`).preOps(ctx => `
   CREATE TABLE IF NOT EXISTS ${ctx.self()} (
     INCREMENTAL_FIELD INT
  );
`)

Here’s how it works.

  1. Using the publish function, we create an incremental dataset using the “incremental” type.
  2. The query that will insert the new rows in the incremental data set.
  3. We check if the incremental field from the source table is higher than the highest value in that same field in the destination table. This ensures only new rows get added.
  4. A second OR condition is added because we cant to add all rows if the created dataset is empty.
  5. In a pre_operation, we create the destination table in our data warehouse if it doesn’t exist.

If you are interested in a SQLX solution, please read this blog post.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

Leave a Reply

Your email address will not be published. Required fields are marked *