How to do a SUMIF in PySpark

  • 2 min read

One of the most frequently used Excel functions is probably SUMIF, along with its SUMIFS variant. In this article, you’ll learn how to do exactly the same in PySpark. What is the SUMIF function? In Excel, the SUMIF function is an aggregation function for summing values from a column, but only…

How to generate a date range in Azure Synapse without Spark notebooks

  • 4 min read

Ever since I started using Synapse for an assignment, I’ve preferred using Spark notebooks to get anything done. However, they take time to spin up, which I wanted to mitigate by using Synapse-native components. In this article, we’ll generate a date range without Spark notebooks. It’s unnecessarily complicated, but you…

Spark 3.0: Solving the “dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z” error

  • 2 min read

In the past couple of weeks, I’ve been working on a project that uses Spark pools in Azure Synapse. However, this appears to be a general Spark issue. I was unable to write to Delta Lake using Spark because I received the following error. You may get a different result…