AWS CERTIFICATION
DEA-C01 Certified Data Engineer Associate Practice Exam
Exam Number: 1208 | Last updated April 24, 2026 | 700+ questions across 4 vendor-aligned objectives
The AWS Certified Data Engineer Associate (DEA-C01) targets practitioners who build and operate data pipelines, lakehouses, and analytics platforms on AWS. Candidates typically have two or three years of data engineering experience and at least one year of hands-on AWS work, including SQL, Python or Spark, and familiarity with the AWS analytics service portfolio. The exam emphasizes choosing the right service for the right ingestion, storage, processing, and serving pattern.
Data Ingestion and Transformation and Data Store Management form the core of the blueprint. Data Ingestion and Transformation (34%) is the largest domain and covers AWS Glue, AWS Lambda, Amazon Kinesis Data Streams, Amazon Data Firehose, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and AWS Database Migration Service. Data Store Management (26%) covers Amazon S3 with Apache Iceberg, Amazon Redshift, Amazon DynamoDB, Amazon RDS, Amazon Aurora, AWS Lake Formation, and partition-strategy design.
The remaining domains stress operations and governance. Data Operations and Support (22%) covers Amazon CloudWatch monitoring of pipelines, AWS Step Functions orchestration, AWS Glue Data Quality, and incident remediation patterns. Data Security and Governance (18%) covers AWS Identity and Access Management for data, AWS Lake Formation row- and column-level security, AWS Key Management Service, and Amazon Macie for sensitive-data discovery.
Every answer links to the source. Each explanation below includes a hyperlink to the exact AWS documentation page the question was derived from. PowerKram is the only practice platform with source-verified explanations.
296 practice exam users · 90.2% satisfied users · 85% passed the exam · 4/5 quality rating
Test your AWS Certified Data Engineer Associate knowledge
10 of 700+ questions
Question #1 - Data Ingestion and Transformation
A pipeline must ingest streaming clickstream events at ~50,000 events/second, retain them 24 hours, and let two separate consumer apps replay events independently.
Which service best fits?
A) Amazon SQS standard queue
B) Amazon SNS
C) Amazon Kinesis Data Firehose only
D) Amazon Kinesis Data Streams
Correct answer: D – Explanation:
Kinesis Data Streams retains records (24 hours by default, extendable up to 365 days), lets multiple consumers read the same stream independently, and scales via shards or on-demand mode. SQS deletes a message once it is consumed, so separate consumers cannot replay it; Firehose is delivery-only with no replay; SNS is pub/sub with no retained stream to re-read. Source: [Amazon Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/introduction.html)
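As a rough sizing sketch for this scenario: in provisioned mode, each shard accepts up to 1,000 records per second or 1 MB per second of writes, whichever limit is reached first. The function name and the 1 KB average event size are assumptions for illustration; the question only states the event rate.

```python
import math

# Per-shard write limits for a provisioned-mode stream.
RECORDS_PER_SHARD_PER_SEC = 1_000
BYTES_PER_SHARD_PER_SEC = 1_000_000  # 1 MB/s, treated as 10^6 bytes for simplicity

def shards_needed(events_per_sec: int, avg_event_bytes: int) -> int:
    """Return the smallest shard count that satisfies both write limits."""
    by_records = math.ceil(events_per_sec / RECORDS_PER_SHARD_PER_SEC)
    by_bytes = math.ceil(events_per_sec * avg_event_bytes / BYTES_PER_SHARD_PER_SEC)
    return max(by_records, by_bytes)

# Assumed 1 KB average event size (not given in the question).
print(shards_needed(50_000, 1_000))
```

Larger events shift the bottleneck from the record limit to the byte limit, which is why both are checked. On-demand mode removes the need for this calculation.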
Question #2 - Data Store Management
An analytics team needs petabyte-scale columnar storage with concurrent SQL queries and automatic concurrency scaling.
Which service fits?
A) Amazon Redshift with Concurrency Scaling
B) Amazon RDS for PostgreSQL
C) Amazon DynamoDB
D) Amazon S3 with Athena only
Correct answer: A – Explanation:
Redshift is a columnar MPP warehouse; Concurrency Scaling adds transient capacity for query bursts. RDS is row-oriented OLTP; DynamoDB is key-value; Athena on S3 works, but Redshift is purpose-built for sustained, concurrent BI workloads. Source: [Redshift Concurrency Scaling](https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling.html)
Question #3 - Data Operations and Support
Glue ETL jobs are failing intermittently with ‘Resource unavailable’ errors during peak hours.
What is the best first remediation?
A) Disable monitoring
B) Switch to manual SSH tuning
C) Lower job concurrency or use job bookmarks and retries with exponential backoff
D) Move to EC2 cron
Correct answer: C – Explanation:
Reducing concurrent DPU demand and enabling retries handles transient resource pressure; bookmarks prevent reprocessing on retry. The other options either obscure problems or move off the managed service unnecessarily. Source: [AWS Glue job parameters](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html)
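The retry pattern named in the answer can be sketched generically as follows. This is a client-side illustration of exponential backoff with jitter, not a description of how Glue retries jobs internally; the function name, its defaults, and the injectable `sleep` and `rng` parameters are all assumptions made for the example.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn(); on failure wait up to base_delay * 2**attempt (capped) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the delay so that
            # many failing callers do not retry at the same instant.
            sleep(delay * rng())
```

A caller would wrap whatever starts the job, for example `retry_with_backoff(start_job)` where `start_job` is a hypothetical function that raises on a transient error.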
Question #4 - Data Ingestion and Transformation
A data engineer needs to land Kinesis stream data into S3 partitioned by event date with Parquet format and minimal code.
Which service does this most simply?
A) EMR cluster running 24/7
B) Manual Lambda writing CSV per record
C) Amazon Kinesis Data Firehose with dynamic partitioning and Parquet conversion
D) Glue job triggered every minute
Correct answer: C – Explanation:
Firehose natively converts to Parquet and supports dynamic partitioning by attributes like event date. Lambda CSV is hand-rolled and inefficient; 24/7 EMR is overkill; Glue every minute is more complex than Firehose for delivery. Source: [Firehose Parquet and dynamic partitioning](https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html)
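To make the partitioning idea concrete, the sketch below shows how a date-based, Hive-style S3 prefix can be derived from a record. It only models the resulting key layout in plain Python and is not Firehose configuration; the `event_time` field (epoch seconds), the `clickstream/` base prefix, and the function name are assumptions.

```python
from datetime import datetime, timezone

def partition_prefix(event: dict, base: str = "clickstream/") -> str:
    """Build a Hive-style S3 prefix from the event's epoch-seconds timestamp (UTC)."""
    ts = datetime.fromtimestamp(event["event_time"], tz=timezone.utc)
    return f"{base}event_date={ts:%Y-%m-%d}/"

# Hypothetical clickstream record.
print(partition_prefix({"event_time": 1777032000, "page": "/home"}))
```

Keying the prefix on the event's own timestamp rather than its arrival time keeps late-arriving records in the partition that queries expect.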
Question #5 - Data Security and Governance
A central data lake on S3 must let multiple teams query specific columns/rows of tables without copying data.
Which service governs fine-grained access for Athena/Redshift Spectrum/Glue?
A) IAM only with object-level allow
B) S3 bucket policies only
C) AWS Lake Formation with column- and row-level security
D) KMS only
Correct answer: C – Explanation:
Lake Formation centralizes table/column/row-level grants enforced by integrated services. Bucket policies and IAM operate on objects, not columns/rows; KMS handles encryption, not access governance. Source: [AWS Lake Formation](https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html)
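Conceptually, a row- and column-level grant behaves like a filter plus a projection applied before results reach the querying team. The sketch below models that effect in plain Python; it is not the Lake Formation API, and the function name, table contents, and column names are invented for illustration.

```python
def apply_data_filter(rows, allowed_columns, row_predicate):
    """Keep rows that pass row_predicate, projected onto allowed_columns."""
    return [
        {col: row[col] for col in allowed_columns}
        for row in rows
        if row_predicate(row)
    ]

# Hypothetical table shared by several teams.
orders = [
    {"order_id": 1, "region": "EU", "email": "a@example.com", "total": 40},
    {"order_id": 2, "region": "US", "email": "b@example.com", "total": 75},
]

# The EU analytics team sees only EU rows and never the email column.
eu_view = apply_data_filter(orders, ["order_id", "total"],
                            lambda r: r["region"] == "EU")
print(eu_view)
```

The underlying data is stored once; each team's grant determines which slice it can read.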
Question #6 - Data Store Management
A team needs an OLTP database that auto-scales storage from 10 GB to 128 TB and supports MySQL/PostgreSQL compatibility with low-latency replicas.
Which service fits?
A) Amazon Redshift
B) Amazon DynamoDB
C) Amazon RDS for SQL Server
D) Amazon Aurora
Correct answer: D – Explanation:
Aurora auto-scales storage to 128 TB and supports MySQL/PostgreSQL with low-latency replicas. DynamoDB is NoSQL; RDS for SQL Server is a different engine; Redshift is a warehouse. Source: [Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html)
Question #7 - Data Operations and Support
An engineer wants to orchestrate a multi-step ETL workflow with retries, branching, and human approval steps.
Which service is purpose-built?
A) AWS Step Functions
B) AWS Lambda alone
C) Amazon SQS
D) Amazon CloudWatch Events
Correct answer: A – Explanation:
Step Functions provides a state machine with retries, choice/parallel states, and human-approval patterns. Lambda alone has no orchestration; SQS is queueing; CloudWatch Events triggers but doesn’t orchestrate workflows. Source: [AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html)
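A minimal sketch of such a workflow, written as an Amazon States Language definition held in a Python dictionary, might look like the following. The state names, ARNs, account ID, queue URL, and the `$.row_count` field are all hypothetical; the small helper only checks that every transition points at a defined state.

```python
import json

state_machine = {
    "StartAt": "Extract",
    "States": {
        "Extract": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:extract",  # hypothetical
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,  # waits grow 5s, 10s, 20s
            }],
            "Next": "CheckRowCount",
        },
        "CheckRowCount": {  # branching
            "Type": "Choice",
            "Choices": [{"Variable": "$.row_count", "NumericGreaterThan": 0,
                         "Next": "WaitForApproval"}],
            "Default": "NothingToLoad",
        },
        "WaitForApproval": {  # pauses until a reviewer returns the task token
            "Type": "Task",
            "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
            "Parameters": {
                "QueueUrl": "https://sqs.us-east-1.amazonaws.com/111122223333/approvals",  # hypothetical
                "MessageBody": {"TaskToken.$": "$$.Task.Token", "Input.$": "$"},
            },
            "Next": "Load",
        },
        "Load": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:load",  # hypothetical
            "End": True,
        },
        "NothingToLoad": {"Type": "Succeed"},
    },
}

def undefined_targets(definition: dict) -> set:
    """Return transition targets that do not name a state in the definition."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        targets.update(state[k] for k in ("Next", "Default") if k in state)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return targets - set(states)

print(json.dumps(state_machine, indent=2))
```

The retry policy, the branch, and the approval pause all live in the definition itself, which is the difference from chaining Lambda functions by hand.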
Question #8 - Data Ingestion and Transformation
A pipeline needs to capture row-level changes from an on-prem Oracle database into S3 in near-real-time.
Which service is appropriate?
A) AWS Snowball Edge
B) AWS Database Migration Service (DMS) with CDC and an S3 target
C) Amazon SES
D) AWS Backup
Correct answer: B – Explanation:
DMS supports CDC from Oracle and writing to S3 (CSV/Parquet) for near-real-time replication. Snowball is for offline bulk; SES is email; Backup is for backups, not CDC. Source: [DMS S3 target with CDC](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html)
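Downstream of the S3 landing zone, change records usually have to be applied in order to reconstruct the current table state. The sketch below assumes each change carries an operation code of I (insert), U (update), or D (delete), following the convention DMS uses in its S3 change files; the tuple layout, function name, and sample rows are assumptions for illustration.

```python
def apply_changes(snapshot: dict, changes: list) -> dict:
    """Apply ordered (op, key, row) change records to a snapshot keyed by primary key."""
    table = dict(snapshot)  # leave the caller's snapshot untouched
    for op, key, row in changes:
        if op in ("I", "U"):
            table[key] = row       # insert or overwrite with the latest image
        elif op == "D":
            table.pop(key, None)   # tolerate deletes for keys already gone
        else:
            raise ValueError(f"unknown operation code: {op!r}")
    return table

# Hypothetical full-load snapshot followed by three captured changes.
snapshot = {101: {"name": "Ada"}, 102: {"name": "Linus"}}
changes = [
    ("U", 101, {"name": "Ada L."}),
    ("D", 102, None),
    ("I", 103, {"name": "Grace"}),
]
current = apply_changes(snapshot, changes)
print(current)
```

Order matters: applying the same records out of sequence can resurrect deleted rows or keep stale values.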
Question #9 - Data Security and Governance
PII data must be discovered automatically across S3 buckets and reported.
Which service does this?
A) Amazon Macie
B) Amazon GuardDuty
C) AWS WAF
D) AWS Config
Correct answer: A – Explanation:
Macie uses ML and pattern matching to discover sensitive data in S3 and reports findings. GuardDuty detects threats; WAF is web request filtering; Config tracks resource configuration. Source: [Amazon Macie](https://docs.aws.amazon.com/macie/latest/user/what-is-macie.html)
Question #10 - Data Store Management
A read-heavy DynamoDB table experiences hot-partition throttling because 80% of reads target a small set of items.
Which solution most directly fixes this?
A) Lower the table’s RCUs
B) Add DynamoDB Accelerator (DAX) for in-memory caching
C) Switch to provisioned mode without changing schema
D) Add a GSI on the same partition key
Correct answer: B – Explanation:
DAX caches hot items in front of DynamoDB, dramatically reducing hot-partition pressure for read-heavy workloads. Lowering RCUs makes throttling worse; mode change alone won’t fix hot keys; a GSI on the same key gives no spread. Source: [Amazon DAX](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.html)
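The effect of a read-through cache on a skewed workload can be illustrated with a toy model. This is not the DAX client and it omits TTL and eviction; the class name, the in-memory stand-in for the table, and the access pattern are assumptions chosen to mirror the 80% skew in the question.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; go to the backend only on a miss."""

    def __init__(self, backend_get):
        self._backend_get = backend_get
        self._items = {}
        self.backend_reads = 0

    def get(self, key):
        if key not in self._items:
            self.backend_reads += 1          # cache miss: one read against the table
            self._items[key] = self._backend_get(key)
        return self._items[key]

# Hypothetical table contents standing in for DynamoDB.
table = {f"item-{i}": {"views": i} for i in range(100)}
cache = ReadThroughCache(table.__getitem__)

# 80 of 100 reads target two hot items; the remaining 20 are spread out.
reads = ["item-1", "item-2"] * 40 + [f"item-{i}" for i in range(10, 30)]
for key in reads:
    cache.get(key)

print(cache.backend_reads, "table reads for", len(reads), "requests")
```

In this model the two hot keys reach the table once each, so the hot partition sees a small fraction of the original read traffic.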
Get 700+ more questions with source-linked explanations
Every answer traces to the exact AWS documentation page — so you learn from the source, not just memorize answers.
Exam mode & learn mode · Score by objective · Updated April 24, 2026
What the AWS Certified Data Engineer Associate exam measures
- Data Ingestion and Transformation (34%) — Build streaming and batch pipelines with AWS Glue, AWS Lambda, Amazon Kinesis Data Streams, Amazon Data Firehose, and Amazon MSK; transform with Apache Spark and SQL.
- Data Store Management (26%) — Design Amazon S3 lakehouse layouts with Apache Iceberg, model warehouses in Amazon Redshift, choose between Amazon RDS, Amazon Aurora, and Amazon DynamoDB, and govern with AWS Lake Formation.
- Data Operations and Support (22%) — Monitor pipelines with Amazon CloudWatch, orchestrate with AWS Step Functions, validate with AWS Glue Data Quality, and remediate failed jobs.
- Data Security and Governance (18%) — Apply AWS Identity and Access Management for data access, configure row- and column-level security in AWS Lake Formation, encrypt with AWS Key Management Service, and detect sensitive data with Amazon Macie.
How to prepare for this exam
- Review the official AWS exam guide and confirm the latest domain weights and content scope before scheduling.
- Complete the matching learning plan on AWS Skill Builder, including the digital courses and exam prep modules.
- Build hands-on muscle memory in an AWS Free Tier account by deploying the services that appear in the Data Ingestion and Transformation domain.
- Apply your skills to a real-world project — workplace assignments, volunteer work, or open-source contributions where AWS services solve a concrete problem.
- Master one objective at a time, beginning with the highest-weighted domain so the score impact of each study session is maximized.
- Run PowerKram in Learn mode to read the explanations and follow every sourced documentation link until you can predict the right answer before reading the choices.
- Switch to PowerKram Exam mode across all objectives once your accuracy in Learn mode passes 85%, simulating the timed exam experience.
Career paths and salary outlook
Data Engineer Associate is one of the fastest-growing AWS credentials by demand:
- Cloud Data Engineer — $130,000 to $200,000. Levels.fyi: Data Engineer Compensation
- Analytics Engineer — $115,000 to $180,000. Glassdoor: Analytics Engineer Salaries
- Data Platform Engineer — $135,000 to $210,000. BLS: Database Administrators and Architects Outlook
Official resources
This exam pulls heavily from the analytics and lakehouse documentation; budget time for both:
