In this blog post, I explain why a certain error in BigQuery arises and how you can get rid of it.
Although I have abandoned the comma join syntax a while ago, I do happen to use it within the context of arrays in Google BigQuery (all the cool kids do it). That’s why I was surprised to run into the following error.
“Syntax error: RIGHT JOIN must be parenthesized when following a comma join. Also, if the preceding comma join is a correlated CROSS JOIN that unnests an array, then CROSS JOIN syntax must be used in place of the comma join a [x:x]”
In the following piece of query, I tried to select all users responsible for a transaction that had their page speed tested at least once, from a Google Analytics export table. Since I needed access to the ‘hits’ array, which is a nested table, I first did a cross join.
[...]
WITH HASLOAD AS (
[...]
)
SELECT
DISTINCT fullVisitorId,
[...]
FROM
`project-name.9000000.ga_sessions_*` GA, UNNEST(GA.hits) hits
RIGHT JOIN HASLOAD H ON GA.fullVisitorId = H.fullVisitorId
AND hits.eCommerceAction.action_type = '6'
A small red sign with an exclamation mark appeared on the RIGHT JOIN line and it showed me the following tip.
I read the tip too fast and didn’t read the second part of the error. I added parentheses, yet it didn’t work. Of course, this is indeed a correlated cross join, as each ‘hits’ array is flattened and joined on the single row it belongs to. So the easiest way to fix it is by replacing the comma join with a CROSS JOIN.
[...]
WITH HASLOAD AS (
[...]
)
SELECT
DISTINCT fullVisitorId
FROM
`project-name.9000000.ga_sessions_*` GA
CROSS JOIN UNNEST(GA.hits) hits
RIGHT JOIN HASLOAD H ON GA.fullVisitorId = H.fullVisitorId
AND hits.eCommerceAction.action_type = '6'
Why doesn’t this work?
Comma joins are processed from left to right, so there doesn’t seem to be a problem at first glance. But truly, I don’t know. The BigQuery syntax documentation simply says:
“There cannot be a
RIGHT JOIN
orFULL JOIN
after a comma join.”
Let’s just leave it at that.