Spark 3.0: Solving the “dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z” error

by roelpi
June 20, 2022

Over the past couple of weeks, I’ve been working on a project that uses Spark pools in Azure Synapse. I was unable to write to Delta Lake using Spark because I received the following error, which appears to be a general Spark issue rather than something specific to Synapse.

You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+’s Proleptic Gregorian calendar.
See more details in SPARK-31404. You can set spark.sql.legacy.parquet.datetimeRebaseModeInRead to ‘LEGACY’ to rebase the datetime values w.r.t. the calendar difference during reading. Or set spark.sql.legacy.parquet.datetimeRebaseModeInRead to ‘CORRECTED’ to read the datetime values as it is.

First of all, what causes this? Spark 3.0 uses the Proleptic Gregorian calendar, whereas Spark 2.x and legacy versions of Hive used a hybrid Julian/Gregorian calendar. For dates before 1582-10-15 and timestamps before 1900-01-01T00:00:00Z, the two calendars can disagree, so Spark 3.0 cannot tell how such values in a Parquet file were meant and raises this error instead of guessing. To solve this, there are two things you should do.
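To make the calendar difference concrete, here is a minimal pure-Python sketch (no Spark involved). It converts a Julian-calendar date to the proleptic Gregorian date of the same physical day, using the standard Julian Day Number formula. The function name is my own and is only meant as an illustration; it is not how Spark performs its rebasing internally.

```python
from datetime import date

# Difference between a Julian Day Number and Python's date ordinal
# (Python's datetime.date uses the proleptic Gregorian calendar).
JDN_TO_ORDINAL_OFFSET = 1721425


def julian_to_proleptic_gregorian(year, month, day):
    """Convert a Julian-calendar date to the proleptic Gregorian date of the same day."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Julian Day Number of a date expressed in the Julian calendar.
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return date.fromordinal(jdn - JDN_TO_ORDINAL_OFFSET)


# The last Julian day before the 1582 switch-over is one day before 1582-10-15.
print(julian_to_proleptic_gregorian(1582, 10, 4))  # 1582-10-14
# Further back in time, the two calendars drift apart by several days.
print(julian_to_proleptic_gregorian(1000, 1, 1))   # 1000-01-06
```

The same stored day can therefore carry two different labels depending on which calendar the writer assumed, which is why Spark asks you to choose between ‘LEGACY’ and ‘CORRECTED’.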

How to fix reading data

To be able to read the data into memory, update your Spark configuration as follows.

spark.conf.set('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'CORRECTED')

If you’re adjusting the setting in Synapse, you should set this specific setting in the Spark configuration file, and reload your Spark pool.
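As far as I know, this setting accepts three values: ‘EXCEPTION’ (the Spark 3.0 default, which raises the error above), ‘CORRECTED’ (read the values as they are) and ‘LEGACY’ (rebase the values from the hybrid calendar). Below is a small sketch of a helper that validates the value before applying it. The helper name and constants are my own, not part of Spark; it only assumes that the object passed in exposes `conf.set(key, value)` like a SparkSession does.

```python
# Configuration key taken from the error message above.
READ_REBASE_KEY = "spark.sql.legacy.parquet.datetimeRebaseModeInRead"
VALID_MODES = ("EXCEPTION", "CORRECTED", "LEGACY")


def set_read_rebase_mode(spark, mode):
    """Validate `mode` and apply it to the session's runtime configuration.

    Hypothetical helper: `spark` is expected to be a SparkSession (or any
    object with a compatible `conf.set` method).
    """
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    spark.conf.set(READ_REBASE_KEY, mode)
    return mode
```

With an active session you would call `set_read_rebase_mode(spark, 'CORRECTED')` before reading the Parquet files. Use ‘LEGACY’ instead if you know the files were written by Spark 2.x or a legacy version of Hive.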

How to fix writing data

Being able to read the data into memory doesn’t mean you’ll be able to write it. For example, I couldn’t write the data to Delta Lake as long as it contained these erroneous dates.

To fix this, you can run the following PySpark script. It loops over all date columns and replaces any date on or before 1900-01-01 with ‘1900-01-01’.

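    from pyspark.sql import functions as F

    # Collect the names of all date-typed columns in the DataFrame `sdf`.
    date_cols = [name for name, dtype in sdf.dtypes if dtype.startswith('date')]
    floor_date = F.to_date(F.lit('1900-01-01'), 'yyyy-MM-dd')
    for date_col in date_cols:
        # Replace any date on or before 1900-01-01 with 1900-01-01.
        sdf = sdf.withColumn(
            date_col,
            F.when(F.col(date_col) <= floor_date, floor_date)
            .otherwise(F.col(date_col)))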
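Because the loop above overwrites values, it can be worth checking first how many values it would touch. The sketch below counts, per date column, the values on or before the cutoff. Both helper names are my own, and the Spark import sits inside the function only so that the column-selection part can be used without a Spark installation.

```python
def date_columns(dtypes):
    """Return the names of date-typed columns from a DataFrame.dtypes-style list."""
    return [name for name, dtype in dtypes if dtype.startswith("date")]


def count_old_dates(sdf, cutoff="1900-01-01"):
    """Count, per date column, the values on or before `cutoff` (hypothetical helper)."""
    # Imported here so that date_columns() stays usable without a Spark install.
    from pyspark.sql import functions as F

    cols = date_columns(sdf.dtypes)
    if not cols:
        return {}
    cutoff_date = F.lit(cutoff).cast("date")
    # F.when() without otherwise() yields NULL for non-matching rows,
    # and F.count() skips NULLs, so this counts only the matching rows.
    counts = sdf.agg(
        *[F.count(F.when(F.col(c) <= cutoff_date, 1)).alias(c) for c in cols]
    ).first()
    return counts.asDict()
```

Calling `count_old_dates(sdf)` returns a dictionary that maps each date column to the number of values that would be replaced, so you can decide whether overwriting them with ‘1900-01-01’ is acceptable for your data.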

Good luck!

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

