As of this week, I can call myself a Google Cloud Certified Professional Data Engineer, after passing the assessment. In this blog post, I would like to share my experience of preparing for and taking the exam. I hope it’ll help you in one way or another.
The whole preparation took me 3 months, but with some more dedication I could’ve done it in 6 weeks. These are the 5 steps I took:
- assessing my personal baseline;
- grinding through the documentation;
- lots of quizzes;
- crawling through my notes;
- the assessment.
Step 1: assessing my personal baseline
D-day minus 12 weeks
First, I started by taking some online courses. I didn’t just play them in the background during work, because a lot of the details are important and I needed to actively process the information. Nevertheless, I watched most of them at 2x speed. The ‘Google Data Engineer Exam’ courses by Guy Hummel on Cloud Academy are an absolutely brilliant way to get started. I’ve taken over three dozen online courses on various topics and I can truly say they’re in my top three. The courses are concise, yet offer the necessary depth and breadth. They combine lectures with exercises, quizzes and reading material. The quizzes are okay, but they’re easier than and different from the real exam.
After that, I started solving some practice questions. The practice questions from Whizlabs are really good and are in line with what you can expect in the assessment. If you search really hard, you’ll be able to find more questions around the web. However, a lot of the questions I found are completely different from the assessment, or even have wrong answers.
By taking online courses and completing a quiz or two, I got an idea of my strengths and weaknesses.
These are the core GCP products that the assessment covers.
But there was some overlap with other Google tools too.
- Cloud Storage
- Cloud Spanner
- Compute Engine
- Data Loss Prevention
- Natural Language AI
- Video AI
Also, I encountered the following tools somewhere along my study journey.
I’ve worked as a web analytics consultant (~4y) and as a data scientist (~2y). I’ve also built some side projects, most of which used GCP in one way or another. I had a lot of experience with BigQuery and I’ve used AI Platform and Pub/Sub occasionally.
For me, Bigtable, Dataproc and Dataflow were the biggest bottlenecks.
Step 2: grinding through the documentation
D-day minus 9 weeks
I started reading the GCP documentation. Honestly, I kept putting it off, but in the end I think it was the most rewarding step.
Also, there’s no shortcut through this step. I found it important to keep an overview of what I covered and what I was still up against. To help you get started: I’ve listed a lot of important documentation pages on this Notion page (make sure to wait for the lazy load when scrolling, it’s a long list).
I also found taking notes helpful. I later turned my notes into flash cards, which you can find below.
Step 3: quiz time
D-day minus 7 days
I retook the first exams from Whizlabs and finished all of the others. I was always in the range of 70~85% (yup, failed some). I think Whizlabs’ questions are so good because:
- they don’t look at GCP components in isolation;
- they present a corporate situation and you have to choose the most suitable solution;
- the answers are contextualized.
I wrote down and looked up the questions I answered incorrectly. Sometimes, I didn’t agree with the answer, or I thought that multiple answers were possible. Nevertheless, it’s a great way to see the bigger picture and to grasp how all the GCP components relate to each other.
Step 4: crawling through my notes for the final sprint
D-day minus 1 day
Repeat. This is where your notes will come in handy. If you don’t have them: I’ve created very simple questions (~flash cards) that should help you remember a lot of the details.
After repeating, you should be able to parrot the topics below, because they will definitely be in the assessment and they are very easy to answer since they don’t require a lot of interpretation. You just have to know them.
- Make sure you memorize the requirements and limitations for the various data transfer techniques: Transfer Appliance, Storage Transfer Service and gsutil. (~2 questions)
- If you’re a data scientist, you’re in luck. Many of the questions cover basic machine learning topics such as performance metrics and generalization (2~5 questions). If you’re not a data scientist, take an introductory course on machine learning.
- You’ll get a question about whether to store data on HDFS or Cloud Storage when using Dataproc.
- Another important topic is row keys in Bigtable; expect 1 or 2 questions.
- BigQuery is a very prominent GCP component. These topics kept coming back in practice questions around the web, and eventually in the exam: authorized views, altering tables (what you can and cannot do) and (de)normalization.
- The AI APIs: Natural Language AI, Speech-to-Text and Video AI. Simply knowing what their use cases are should suffice. (1 question)
- Make sure you know how acknowledgement works in Pub/Sub. (1 question)
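To make the "performance metrics" topic from the list above a bit more concrete, here is a minimal sketch of precision, recall and F1 for a binary classifier, written in plain Python. The function name and the toy labels are my own assumptions for illustration, not something taken from the exam or from any GCP library.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels, made up for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Precision answers "of everything I flagged as positive, how much was right?", while recall answers "of all the real positives, how many did I catch?". Being able to tell which one matters more in a given business scenario is the kind of reasoning the questions tend to ask for.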
By memorizing these topics, you’ve basically covered 25%~30% of the exam.
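For the Bigtable row key topic mentioned above, a small sketch may help. Bigtable stores rows sorted lexicographically by row key, so a key that starts with a timestamp sends all new writes to the same part of the key space. A commonly recommended pattern is to lead with an identifier and, if you want the newest rows first, append a reversed timestamp. The function below illustrates that idea in plain Python; the `device_id` and `metric` fields and the `#` separator are hypothetical choices of mine, and the code only builds the key without talking to Bigtable.

```python
MAX_INT64 = 2**63 - 1  # largest signed 64-bit integer, 19 decimal digits


def make_row_key(device_id: str, metric: str, ts_millis: int) -> bytes:
    """Build a row key of the form <device_id>#<metric>#<reversed timestamp>.

    Leading with the device id spreads writes across the key space instead of
    piling them onto the most recent timestamp. Reversing the timestamp makes
    newer rows sort before older ones within the same device and metric.
    """
    reverse_ts = MAX_INT64 - ts_millis
    # Zero-pad to a fixed width so lexicographic order matches numeric order.
    return f"{device_id}#{metric}#{reverse_ts:019d}".encode("utf-8")


# Hypothetical readings from the same sensor, one minute apart.
older = make_row_key("sensor-42", "temp", 1_600_000_000_000)
newer = make_row_key("sensor-42", "temp", 1_600_000_060_000)
print(sorted([older, newer])[0] == newer)
```

With keys like these, a prefix scan on `sensor-42#temp#` would return the latest readings first. Whether that is the right design depends on the queries you need to serve, which is exactly what the exam questions on this topic probe.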
Step 5: the assessment
I took the assessment via Kryterion’s Webassessor. It’s an extremely shitty website, with an equally shitty assessment app. I had to show my surroundings via webcam, turn off my phone, empty my desk, etc. All this took about half an hour. I didn’t have to talk to the proctor behind the camera, but we exchanged a lot of chat messages. The proctor also observed me during the assessment.
I found the questions very hard. For most, I couldn’t just pick the right answer. They came down to eliminating what was surely not the right answer. I’ve done other exams like these (Google Analytics, Adobe Analytics, …) and they come nowhere near the level of difficulty I encountered here.
In the end, I got a Pass, but it still had to be confirmed by Google — for whatever reason. That confirmation came 4 days later. I could finally call myself a “Google Cloud Certified Professional Data Engineer” and an obligatory LinkedIn post soon followed.
Some final tips:
- As with many multiple-choice exams, eliminate. If you can’t narrow it down to one possible answer, make a guess.
- You can eliminate all the answers that recommend non-GCP solutions.
- If you’ve taken your time to go through all the documentation and you have no clue what one of the answers even means, it’s probably the wrong answer.