Skip to content
Home » BigQuery: JSON_EXTRACT vs. JSON_QUERY

BigQuery: JSON_EXTRACT vs. JSON_QUERY

  • by
  • 2 min read
Tags:

In this blog post, I discuss the difference between several BigQuery functions that appear to do the same: JSON_EXTRACT and JSON_QUERY — and their value counterparts JSON_EXTRACT_SCALAR and JSON_VALUE.

MPP Data Warehouses are increasingly becoming data lakes. In this light, extracting strings/values from semistructured data (stored as a JSON STRING) is a prominent feature of Google BigQuery.

bigquery stringified json field
JSON stored as a STRING column
bigquery stringified json value
A JSON string in the STRING column ‘event’

Extracting strings and values from this “stringified JSON” can be done with the following functions:

  • JSON_EXTRACT & JSON_QUERY: returns a string “16” from the field “hit_properties.hardware_concurrency”
  • JSON_EXTRACT_SCALAR & JSON_VALUE: returns a scalar 16 from the field “hit_properties.hardware_concurrency”

So there are two functions for extracting a string and another two functions for extracting a scalar. Why? The reason is extremely boring: it’s an update in international standards. You can download the relevant publication here. The one you’re looking for is “ISO/IEC TR 19075-6:2017”.

JSON_QUERY and JSON_VALUE are the updated functions that follow the new standard. So what is the functional difference of these two functions with their respective duplicate functions JSON_EXTRACT and JSON_EXTRACT_SCALAR? It’s a matter of handling dots in field names (keys).

This is how you would handle dots using the JSON_EXTRACT function:

SELECT JSON_EXTRACT(json_text, "$['students.info']") AS json_text_string
FROM UNNEST([
  '{"class" : {"students.info" : [{"name" : "Jane"}]}}',
  '{"class" : {"students.info" : []}}',
  '{"class" : {"students.info" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

…and this is how you would handle dots using the JSON_QUERY function:

SELECT JSON_QUERY(json_text, '$."students.info"') AS json_text_string
FROM UNNEST([
  '{"class" : {"students.info" : [{"name" : "Jane"}]}}',
  '{"class" : {"students.info" : []}}',
  '{"class" : {"students.info" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text;

The difference is very subtle, yet manifests itself in the fact that you cannot wrap a key in double quotes when you use JSON_EXTRACT. This is not in line with the newest international standards.

Conclusion: use JSON_QUERY and JSON_VALUE.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

Leave a Reply

Your email address will not be published. Required fields are marked *