python

How to do a SUMIF in PySpark

One of the most frequently used Excel functions is probably SUMIF, along with its SUMIFS variant. In this article, you’ll learn how to do exactly the same in PySpark. What is the SUMIF function? In Excel, the SUMIF function is an aggregation function for summing values from a column, but only…

Spark 3.0: Solving the “dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z” error

In the past couple of weeks, I’ve been working on a project that uses Spark pools in Azure Synapse. However, this appears to be a general Spark issue. I was unable to write to Delta Lake using Spark because I received the following error. You may get a different result…

Solve TypeError: ‘dict’ object does not support indexing when running SQL queries in Python

I ran into another silly error, and I want to share the solution to save you some time. It occurs when trying to run a query using Python’s SQLAlchemy library. Let’s dive right in. The problem: When you’re trying to run a query, either by using Pandas’…