GOOGLE CERTIFICATION

Professional Data Engineer Practice Exam

Exam Number: 1007 | Last updated April 21, 2026 | 999+ questions across 5 vendor-aligned objectives

The Professional Data Engineer certification is aimed at working professionals who already have the practical knowledge Google expects on its platform. It is built for data engineers and analytics engineers who build pipelines, warehouses, and ML-ready datasets on Google Cloud, and scoring rewards candidates who can translate features into measurable results rather than simply recognize service names.

Heavy-weighted areas define where study time pays back fastest: 25% targets Ingesting and Processing the Data (Dataflow, Dataproc, Pub/Sub, Cloud Composer, Datastream); 22% targets Designing Data Processing Systems (batch versus streaming, schema design, pipeline reliability, data lake versus warehouse); 20% targets Storing the Data (BigQuery partitioning and clustering, Cloud Storage classes, Bigtable keys).

Supporting domains fill out the blueprint: 18% covers Maintaining and Automating Data Workloads (monitoring pipelines, cost optimization, IAM for data, catalog governance); 15% covers Preparing and Using Data for Analysis (BigQuery ML, Looker, Dataform, materialized views). Each still appears on the exam, so none can be safely skipped. Google updates exam guides regularly, so verify domain weights on the official certification page before you finalize a study plan.

BigQuery dominates this exam, and so do the cost controls around it. Know how partitioning, clustering, materialized views, and BI Engine reservations interact, and expect several questions where the wrong answer is the one that scans the entire table every time.

Every answer links to the source. Each explanation below includes a hyperlink to the exact Google documentation page the question was derived from. PowerKram is the only practice platform with source-verified explanations.

836 practice exam users
93.9% satisfied users
85.6% passed the exam
4.2/5 quality rating

Test your Data Engineer knowledge

10 of 999+ questions

Question #1 - Storing the Data

A finance team’s BigQuery table grew to 8 TB and daily queries that only need last month’s rows still scan the full table, driving up cost.

Which BigQuery feature most directly fixes this?

A) Remove the table and recreate it with no indexes
B) Enable partitioning by date and use pruning in queries
C) Disable the results cache
D) Convert the table to CSV in Cloud Storage

Correct answer: B. Explanation:
Partitioning the table by date and filtering on the partitioning column in the WHERE clause lets BigQuery prune partitions, so each query scans only the relevant month instead of all 8 TB. Recreating the table without indexes does nothing to reduce scanned bytes. Disabling the results cache can only increase cost. Moving the data to CSV in Cloud Storage gives up BigQuery’s columnar storage and partition pruning.
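The cost effect of pruning can be sketched with a small model. This is plain Python, not the BigQuery API: the partition sizes, dates, and the bytes_scanned helper are hypothetical, chosen so that the table totals roughly 8 TB as in the question.

```python
from datetime import date, timedelta

GB = 10**9


def bytes_scanned(partitions, start, end):
    """Sum the sizes of daily partitions whose date falls in [start, end]."""
    return sum(size for day, size in partitions.items() if start <= day <= end)


# Hypothetical table: 800 daily partitions of 10 GB each (8 TB in total).
first_day = date(2024, 1, 1)
partitions = {first_day + timedelta(days=i): 10 * GB for i in range(800)}
last_day = max(partitions)

# Without a filter on the partitioning column, every partition is read.
full_scan = bytes_scanned(partitions, first_day, last_day)

# With a filter covering the last 30 days, only those partitions are read.
pruned_scan = bytes_scanned(partitions, last_day - timedelta(days=29), last_day)

print(f"full scan:   {full_scan / 10**12:.1f} TB")
print(f"pruned scan: {pruned_scan / 10**12:.1f} TB")
```

On demand pricing is driven by bytes scanned, which is why the filter on the partitioning column matters more than any other change to the query.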

Question #2 - Ingesting and Processing the Data

A data team needs a serverless unified batch and streaming pipeline that scales automatically and uses Apache Beam.

Which Google Cloud service fits?

A) Dataproc with pre-provisioned workers
B) Cloud Functions chained together
C) Compute Engine with custom scripts
D) Dataflow

Correct answer: D. Explanation:
Dataflow is the serverless, autoscaling Apache Beam runner on Google Cloud and handles both batch and streaming with one programming model. Dataproc with pre-provisioned workers is a managed Spark and Hadoop cluster that you size yourself, not a serverless Beam service. Chained Cloud Functions and custom scripts on Compute Engine do not provide a unified pipeline model.

Question #3 - Designing Data Processing Systems

A marketing team needs dashboards to reflect events within one minute, and the source is a high-volume Pub/Sub topic.

Which architecture fits best?

A) Streaming Dataflow writing to BigQuery with streaming inserts
B) Batch Dataflow job running once per day
C) Manual daily CSV exports
D) FTP to an on-prem DB hourly

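Correct answer: A. Explanation:
A streaming Dataflow pipeline that reads from Pub/Sub and writes continuously to BigQuery delivers near real-time analytics. For new pipelines, Google’s documentation recommends the Storage Write API over the legacy streaming insert API, but either keeps the dashboard within the one-minute goal. A daily batch job, manual CSV exports, and hourly FTP transfers are all too slow for one-minute freshness.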
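A streaming pipeline usually meets a freshness goal like this by grouping events into short fixed windows. The sketch below models one-minute fixed windows in plain Python; it is not the Apache Beam API, and the event timestamps are made up for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 60  # one-minute fixed windows, matching the freshness goal


def window_start(event_ts):
    """Return the start (in seconds) of the fixed window containing event_ts."""
    return event_ts - (event_ts % WINDOW_SECONDS)


def count_per_window(event_timestamps):
    """Count events per window, keyed by the window start time."""
    return Counter(window_start(ts) for ts in event_timestamps)


# Hypothetical event times in seconds since the start of the stream.
events = [5, 42, 59, 60, 61, 130]
counts = count_per_window(events)

for start in sorted(counts):
    print(f"window [{start}, {start + WINDOW_SECONDS}): {counts[start]} events")
```

Each window can be emitted to the warehouse as soon as it closes, which is what keeps the dashboard roughly one window behind the source.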

Question #4 - Storing the Data

A time-series analytics project stores sensor readings in Bigtable and observes hot-spotting on a few tablet servers.

Which row-key design most directly mitigates hot-spotting?

A) Use a monotonically increasing timestamp prefix
B) Use a single constant as the key
C) Use a hashed or reversed field as the prefix to distribute writes
D) Use the device’s serial number only, unhashed, oldest first

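Correct answer: C. Explanation:
Leading the row key with a hashed or reversed field instead of a timestamp spreads writes across tablet servers. Note that Google’s schema design guidance also cautions that hashing gives up sorted range scans, so in practice a high-cardinality identifier such as a device ID placed ahead of the timestamp is often the first choice. A monotonically increasing prefix concentrates writes on one tablet. A constant key is the worst case. An unhashed serial on its own can still cluster writes.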
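The difference between the two key designs can be sketched as follows. The key layout, the separator, the bucket count, and both helper names are assumptions for illustration, not a Bigtable client API.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical number of salt buckets


def timestamp_first_key(device_id, ts):
    """Anti-pattern: keys sort by time, so new writes all land at the end."""
    return f"{ts:010d}#{device_id}"


def salted_key(device_id, ts):
    """Prefix a stable hash bucket so devices spread across the key space."""
    digest = hashlib.sha256(device_id.encode()).hexdigest()
    salt = int(digest, 16) % NUM_BUCKETS
    return f"{salt}#{device_id}#{ts:010d}"


devices = ["sensor-a", "sensor-b", "sensor-c"]

# Timestamp-first keys arrive in sorted order: every write targets one range.
hot_keys = [timestamp_first_key(d, ts) for ts in (100, 101, 102) for d in devices]
print(hot_keys == sorted(hot_keys))

# Salted keys keep each device's readings contiguous under a stable prefix.
for d in devices:
    print(salted_key(d, 100))
```

Because the salt is derived from the device ID, a reader can recompute it and still scan one device's readings in time order.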

Question #5 - Preparing and Using Data for Analysis

An analytics team wants to train a simple churn model directly on the data already in BigQuery without exporting it.

Which Google Cloud capability enables that?

A) Cloud Functions
B) Cloud DNS
C) Memorystore for Redis
D) BigQuery ML

Correct answer: D. Explanation:
BigQuery ML lets analysts train and serve models with SQL directly on BigQuery data, so nothing has to be exported. Cloud Functions, Cloud DNS, and Memorystore are not ML platforms.
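A training statement for this scenario might look like the one built below. The dataset, table, model, and label names are hypothetical; the helper only assembles the SQL text and does not call BigQuery, and it assumes the identifiers are trusted constants rather than user input.

```python
def churn_model_sql(dataset, table, label="churned"):
    """Build a BigQuery ML CREATE MODEL statement for a logistic regression."""
    return (
        f"CREATE OR REPLACE MODEL `{dataset}.churn_model`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{dataset}.{table}`"
    )


# Hypothetical dataset and feature table names.
sql = churn_model_sql("analytics", "customer_features")
print(sql)
```

The statement would be run like any other query, for example from the BigQuery console, and the trained model then lives in the same dataset as the data.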

Question #6 - Maintaining and Automating Data Workloads

A data platform team needs to orchestrate DAGs with complex branching and task dependencies across BigQuery, Dataflow, and external systems.

Which Google Cloud service is the best fit?

A) Cloud Composer (managed Apache Airflow)
B) Cloud Scheduler alone
C) A single Cloud Run service running cron
D) Cloud Storage lifecycle rules

Correct answer: A. Explanation:
Cloud Composer is managed Apache Airflow, built for complex DAGs with branching and cross-service dependencies. Cloud Scheduler triggers individual jobs but does not model dependencies between them. A single Cloud Run service running cron is too simple for this workload. Lifecycle rules are storage policies, not orchestration.
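What "modeling dependencies" means can be shown with the standard library alone. This sketch uses graphlib rather than Airflow, and the task names are hypothetical; it only illustrates that an orchestrator derives a valid run order from declared dependencies.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "run_dataflow_job": {"extract_from_source"},
    "load_to_bigquery": {"run_dataflow_job"},
    "refresh_dashboard": {"load_to_bigquery"},
    "notify_external_system": {"load_to_bigquery"},
}

# static_order() yields tasks so that every dependency comes first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

A plain scheduler only knows when to fire each job; it has no equivalent of this graph, so a late upstream task cannot hold back the tasks that depend on it.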

Question #7 - Ingesting and Processing the Data

A data engineer needs low-latency CDC from an Oracle database on-premises into BigQuery.

Which Google Cloud service is purpose-built for this?

A) Cloud Storage transfer jobs only
B) Datastream with BigQuery as a destination
C) Manual daily mysqldump
D) Pub/Sub push to Oracle directly

Correct answer: B. Explanation:
Datastream is Google’s serverless CDC service and streams changes from Oracle and other databases into BigQuery. Storage transfer jobs move files and do not capture row-level changes. Manual daily dumps are not low-latency. Pub/Sub does not push into Oracle.
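The idea behind CDC is that the target is kept current by replaying an ordered log of row changes rather than by reloading whole tables. The sketch below shows that replay in plain Python; the event shape (op, key, row) is an assumption for illustration and is not Datastream's actual record format.

```python
def apply_changes(target, changes):
    """Replay ordered change events against a dict keyed by primary key."""
    for change in changes:
        key = change["key"]
        if change["op"] == "DELETE":
            target.pop(key, None)
        else:  # INSERT and UPDATE both write the latest row image
            target[key] = change["row"]
    return target


# Hypothetical change log captured from the source database.
change_log = [
    {"op": "INSERT", "key": 1, "row": {"name": "Ada", "balance": 100}},
    {"op": "UPDATE", "key": 1, "row": {"name": "Ada", "balance": 80}},
    {"op": "INSERT", "key": 2, "row": {"name": "Lin", "balance": 50}},
    {"op": "DELETE", "key": 2},
]

replica = apply_changes({}, change_log)
print(replica)
```

Only the changed rows travel over the wire, which is why CDC stays low-latency where a daily dump cannot.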

Question #8 - Preparing and Using Data for Analysis

A dashboard re-runs the same expensive aggregation on a 10 TB table every hour, driving BigQuery cost up.

Which BigQuery feature most directly cuts cost for this pattern?

A) Disabling query results cache
B) Exporting the dataset to Cloud Storage daily
C) Materialized views refreshed incrementally
D) Switching to a Cloud SQL dashboard

Correct answer: C. Explanation:
Materialized views precompute and store the aggregation and refresh incrementally, so the hourly dashboard query reads the stored result instead of rescanning the 10 TB table. Disabling the results cache increases cost. Exporting the dataset gives up BigQuery’s query engine. Cloud SQL does not scale to 10 TB analytics.
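Incremental refresh can be sketched as an aggregate that only ever processes rows it has not seen before. The class below is a plain-Python model with hypothetical names, not how BigQuery implements materialized views internally.

```python
class IncrementalSum:
    """Keep per-category totals and update them from new rows only."""

    def __init__(self):
        self.totals = {}
        self.rows_processed = 0

    def refresh(self, new_rows):
        """Fold a list of newly arrived (category, amount) rows into the totals."""
        for category, amount in new_rows:
            self.totals[category] = self.totals.get(category, 0) + amount
        self.rows_processed += len(new_rows)


view = IncrementalSum()
view.refresh([("books", 10), ("games", 5)])  # initial load
view.refresh([("books", 1)])                 # later refresh touches one row

print(view.totals, view.rows_processed)
```

The second refresh reads one row rather than all three, which is the saving the dashboard gets every hour.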

Question #9 - Maintaining and Automating Data Workloads

A governance lead needs to ensure analysts can query a sensitive BigQuery dataset only through approved views that mask PII.

Which approach best enforces that?

A) Authorized views and column-level access policies in BigQuery
B) Grant dataset Owner to all analysts
C) Export sensitive columns to an open Cloud Storage bucket
D) Ask analysts to promise not to query PII

Correct answer: A. Explanation:
Authorized views plus column-level access policies let analysts query only the masked view without needing direct access to the underlying table. Granting Owner is the opposite of least privilege. An open Cloud Storage bucket leaks data. Promises are not controls.
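What analysts see through such a view can be modeled in a few lines. The column names and the masking value are hypothetical, and the function only imitates the result of a masking view; the enforcement itself comes from BigQuery's access controls, not from application code.

```python
PII_COLUMNS = {"email", "ssn"}  # hypothetical sensitive columns


def masked_view(rows, pii_columns=PII_COLUMNS):
    """Return copies of the rows with PII column values replaced by a mask."""
    return [
        {col: ("***" if col in pii_columns else val) for col, val in row.items()}
        for row in rows
    ]


base_table = [
    {"customer_id": 1, "email": "ada@example.com", "ssn": "123-45-6789", "plan": "pro"},
]

print(masked_view(base_table))
```

The base rows are left unchanged, mirroring the fact that analysts are granted access to the view while the underlying table stays restricted.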

Question #10 - Designing Data Processing Systems

A retailer keeps raw click-stream JSON in Cloud Storage and cleaned, modeled data in BigQuery.

Which phrase best characterizes this pattern?

A) A single monolithic warehouse with no raw zone
B) A data lake for raw plus a warehouse for modeled analytics
C) A transactional database used as a warehouse
D) An in-memory cache replacing both storage and analytics

Correct answer: B. Explanation:
Raw data in Cloud Storage plus modeled data in BigQuery is the classic two-tier lake-plus-warehouse pattern on Google Cloud. A monolithic warehouse has no raw zone. An OLTP database is not a warehouse. An in-memory cache cannot replace either tier.

Get 999+ more questions with source-linked explanations

Every answer traces to the exact Google documentation page — so you learn from the source, not just memorize answers.

Exam mode & learn mode · Score by objective · Updated April 21, 2026

What the Data Engineer exam measures

Designing Data Processing Systems (22%): Apply Google Cloud practices to batch versus streaming, schema design, pipeline reliability, data lake versus warehouse.
Ingesting and Processing the Data (25%): Apply Google Cloud practices to Dataflow, Dataproc, Pub/Sub, Cloud Composer, Datastream.
Storing the Data (20%): Apply Google Cloud practices to BigQuery partitioning and clustering, Cloud Storage classes, Bigtable keys.
Preparing and Using Data for Analysis (15%): Apply Google Cloud practices to BigQuery ML, Looker, Dataform, materialized views.
Maintaining and Automating Data Workloads (18%): Apply Google Cloud practices to monitoring pipelines, cost optimization, IAM for data, catalog governance.

How to prepare for this exam

Review the official Professional Data Engineer exam guide end to end before you commit to a study plan, so every later hour is spent against the published blueprint.
Complete the relevant Google Cloud Skills Boost learning path and treat its labs as non-optional rather than extra credit.
Get hands-on practice in a Qwiklabs sandbox, repeating the same tasks from memory until configuration feels routine.
Apply what you learn in real-world project experience — your day job, a volunteer project, or an open-source contribution — so the concepts stick.
Master one objective at a time, starting with the highest-weighted domain on the blueprint and moving down from there.
Use PowerKram learn mode with feedback and sourced links to close gaps while the answer rationale is still fresh.
Finish with PowerKram exam mode across all objectives under realistic time pressure before you book the real exam.
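One way to turn the domain weights into a schedule is to split a fixed study budget in proportion to them, highest weight first. The 40-hour budget and the allocate_hours helper are assumptions for illustration; the weights are the ones listed on this page and should be checked against the official exam guide.

```python
WEIGHTS = {
    "Designing Data Processing Systems": 22,
    "Ingesting and Processing the Data": 25,
    "Storing the Data": 20,
    "Preparing and Using Data for Analysis": 15,
    "Maintaining and Automating Data Workloads": 18,
}


def allocate_hours(total_hours, weights):
    """Split total_hours by percentage weight, highest-weighted domain first."""
    ordered = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(name, total_hours * pct / 100) for name, pct in ordered]


plan = allocate_hours(40, WEIGHTS)  # hypothetical 40-hour study budget
for name, hours in plan:
    print(f"{hours:5.1f} h  {name}")
```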

Career paths and salary outlook

Holding the Professional Data Engineer certification typically supports roles such as:

Data Engineer: roughly $115,000 to $165,000 USD per year in the US market (range varies by region, years of experience, and specialization). See current data on Glassdoor.
Analytics Engineer: roughly $100,000 to $150,000 USD per year in the US market (range varies by region, years of experience, and specialization). See current data on Levels.fyi.
BigQuery Specialist: roughly $130,000 to $180,000 USD per year in the US market (range varies by region, years of experience, and specialization). See current data on Payscale.

Official resources

Work directly from Google’s own preparation resources and treat third-party content as a supplement.
