In this blog post, I discuss the difference between several BigQuery functions that appear to do the same: JSON_EXTRACT and JSON_QUERY — and their value counterparts JSON_EXTRACT_SCALAR and JSON_VALUE.
MPP Data Warehouses are increasingly becoming data lakes. In this light, extracting strings/values from semistructured data (stored as a JSON STRING) is a prominent feature of Google BigQuery.
![bigquery stringified json field](https://www.roelpeters.be/wp-content/uploads/2021/01/image-3.png)
![bigquery stringified json value](https://www.roelpeters.be/wp-content/uploads/2021/01/image-4.png)
Extracting strings and values from this “stringified JSON” can be done with the following functions:
- JSON_EXTRACT & JSON_QUERY: returns a string “16” from the field “hit_properties.hardware_concurrency”
- JSON_EXTRACT_SCALAR & JSON_VALUE: returns a scalar 16 from the field “hit_properties.hardware_concurrency”
So there are two functions for extracting a string and another two functions for extracting a scalar. Why? The reason is extremely boring: it’s an update in international standards. You can download the relevant publication here. The one you’re looking for is “ISO/IEC TR 19075-6:2017”.
JSON_QUERY and JSON_VALUE are the updated functions that follow the new standard. So what is the functional difference of these two functions with their respective duplicate functions JSON_EXTRACT and JSON_EXTRACT_SCALAR? It’s a matter of handling dots in field names (keys).
This is how you would handle dots using the JSON_EXTRACT function:
SELECT JSON_EXTRACT(json_text, "$['students.info']") AS json_text_string FROM UNNEST([ '{"class" : {"students.info" : [{"name" : "Jane"}]}}', '{"class" : {"students.info" : []}}', '{"class" : {"students.info" : [{"name" : "John"}, {"name": "Jamie"}]}}' ]) AS json_text;
…and this is how you would handle dots using the JSON_QUERY function:
SELECT JSON_QUERY(json_text, '$."students.info"') AS json_text_string FROM UNNEST([ '{"class" : {"students.info" : [{"name" : "Jane"}]}}', '{"class" : {"students.info" : []}}', '{"class" : {"students.info" : [{"name" : "John"}, {"name": "Jamie"}]}}' ]) AS json_text;
The difference is very subtle, yet manifests itself in the fact that you cannot wrap a key in double quotes when you use JSON_EXTRACT. This is not in line with the newest international standards.
Conclusion: use JSON_QUERY and JSON_VALUE.
Thank you for your sharing. I am worried that I lack creative ideas. It is your article that makes me full of hope. Thank you. But, I have a question, can you help me?
Thank you for your sharing. I am worried that I lack creative ideas. It is your article that makes me full of hope. Thank you. But, I have a question, can you help me?