IBM CERTIFICATION
C9007300 IBM Certified watsonx Data Lakehouse Engineer v1 – Associate Practice Exam
Exam Number: 4340 | Last updated April 17, 2026 | 342+ questions across 5 vendor-aligned objectives
Lakehouse engineers who build, load, and query watsonx.data deployments are the audience for the C9007300 credential. This associate-level exam validates your ability to design lakehouse storage and compute topologies, ingest and transform data, register tables in open formats like Apache Iceberg, and query across engines such as Presto and Spark. Candidates should be comfortable with SQL, object storage, and modern data-lake concepts like time travel and schema evolution.
Architecture and Storage Design carries the most weight at 26% of the exam and covers object storage, bucket organization, table formats, and engine placement. Ingestion and Transformation, at 22%, covers batch and streaming ingestion, DBT-style transformations, and the movement of data through bronze/silver/gold layers. Table Formats accounts for another 20% and covers Apache Iceberg in depth: partitioning, time travel, schema evolution, and compaction.
Query Engines accounts for 18% and spans Presto and Spark configuration, query optimization, and cost-based planning. Governance and Security makes up the remaining 14%, covering catalog registration, column- and row-level security, and lineage. Lakehouse questions often test whether a workload should run in Presto or Spark; decide based on latency, complexity, and concurrency, not on personal familiarity with either engine.
Every answer links to the source. Each explanation below includes a hyperlink to the exact IBM documentation page the question was derived from. PowerKram is the only practice platform with source-verified explanations.
752 practice exam users · 94% satisfied users · 91% passed the exam · 4.7/5 quality rating
Test your C9007300 watsonx lakehouse v1 knowledge
10 of 342+ questions
Question #1 - Architecture and Storage Design
A lakehouse engineer at Branwell Media is designing object-storage layout for a multi-team watsonx.data deployment.
Which watsonx.data storage-design approach organizes buckets for the multi-team deployment?
A) Organize buckets by data domain with consistent prefixes per layer (bronze/silver/gold), attach buckets to catalogs, and give each team least-privilege access
B) Dump every dataset into one bucket with no prefixes
C) Use a different storage vendor per team
D) Skip bucket design and let teams create buckets at random
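Correct answer: A – Explanation:
Domain-based buckets with consistent layer prefixes, attached to catalogs and opened to each team on a least-privilege basis, is watsonx.data's reference storage layout. A single flat bucket, a different storage vendor per team, and ad hoc bucket creation all undermine organization and access control. Source: Check Source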
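The layout in answer A can be sketched as a toy Python model of the naming and access rules. It assumes a hypothetical `<org>-<domain>` bucket name and a `<layer>/<dataset>/` key prefix; the team names are made up as well. It only illustrates the deny-by-default idea: watsonx.data enforces real bucket access through its own access-control settings, not through code like this.

```python
from dataclasses import dataclass, field

LAYERS = ("bronze", "silver", "gold")


def bucket_name(org: str, domain: str) -> str:
    """One bucket per data domain (hypothetical naming scheme, e.g. 'branwell-sales')."""
    return f"{org}-{domain}".lower()


def object_key(layer: str, dataset: str, filename: str) -> str:
    """Same layer prefix convention in every bucket: <layer>/<dataset>/<file>."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{layer}/{dataset}/{filename}"


@dataclass
class AccessPolicy:
    """Deny-by-default grants: team -> list of (bucket, allowed key prefixes)."""
    grants: dict = field(default_factory=dict)

    def allow(self, team: str, bucket: str, *layers: str) -> None:
        prefixes = tuple(f"{layer}/" for layer in layers)
        self.grants.setdefault(team, []).append((bucket, prefixes))

    def can_read(self, team: str, bucket: str, key: str) -> bool:
        # Access exists only if an explicit grant covers this bucket and key prefix.
        return any(
            b == bucket and key.startswith(prefixes)
            for b, prefixes in self.grants.get(team, [])
        )


sales = bucket_name("Branwell", "sales")
policy = AccessPolicy()
policy.allow("bi-team", sales, "gold")                        # analysts: curated data only
policy.allow("sales-eng", sales, "bronze", "silver", "gold")  # owning team: all layers

print(policy.can_read("bi-team", sales, object_key("gold", "daily_revenue", "part-0.parquet")))  # True
print(policy.can_read("bi-team", sales, object_key("bronze", "orders", "raw.csv")))              # False
```

Because every bucket uses the same layer prefixes, a grant can be expressed as "this team, this domain, these layers" instead of a list of individual paths.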
Question #2 - Architecture and Storage Design
A watsonx.data engineer at Redway Analytics must decide which engines to attach to a bucket that serves both interactive and batch queries.
Which approach fits?
A) Use only Presto for everything including heavy ETL
B) Attach both Presto (for interactive, low-latency SQL) and Spark (for heavy batch and ETL) to the bucket, choosing per workload
C) Use only Spark for everything including interactive dashboards
D) Avoid engines and query objects directly
Correct answer: B – Explanation:
Choosing the engine per workload, Presto for interactive SQL and Spark for ETL, is watsonx.data's engine-placement reference. All-Presto, all-Spark, and direct object access each ignore the trade-off between low latency and heavy batch throughput. Source: Check Source
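The per-workload decision can be written down as a small routing rule. This Python sketch is illustrative only: the `Workload` fields and every numeric threshold are hypothetical, not IBM guidance, and real placement also depends on cluster sizing and cost.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    latency_target_s: float  # how quickly results must come back
    scan_gb: float           # rough data volume read or written
    writes_tables: bool      # True for ETL that loads or rewrites tables
    concurrent_users: int


def choose_engine(w: Workload) -> str:
    """Route a workload to 'presto' or 'spark'.

    The thresholds are hypothetical and only illustrate the trade-off:
    Presto for low-latency, concurrent SQL; Spark for heavy batch and ETL.
    """
    if w.writes_tables or w.scan_gb > 500:
        return "spark"   # long-running transformations and very large scans
    if w.latency_target_s <= 5 or w.concurrent_users > 10:
        return "presto"  # interactive dashboards and ad hoc SQL
    return "presto" if w.scan_gb < 50 else "spark"


jobs = [
    Workload("exec dashboard", latency_target_s=1, scan_gb=2, writes_tables=False, concurrent_users=40),
    Workload("nightly silver build", latency_target_s=3600, scan_gb=800, writes_tables=True, concurrent_users=1),
]
for job in jobs:
    print(job.name, "->", choose_engine(job))
```

The ETL check runs first on purpose: a job that rewrites tables belongs on Spark even if someone is waiting on it.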
Question #3 - Architecture and Storage Design
A lakehouse at Hardwick Financial must separate compute from storage so engines can scale independently.
Which principle fits?
A) Keep data in object storage in open table formats (Iceberg) and attach query engines on demand, scaling compute without moving data
B) Copy data into the engine’s local disk
C) Tie compute and storage together on a single VM
D) Use a closed proprietary format that locks the compute choice
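Correct answer: A – Explanation:
Keeping data in object storage in an open table format such as Iceberg and attaching compute on demand is the reference lakehouse architecture. Local disk copies, single-VM bundling, and proprietary formats all tie compute to storage and prevent the two from scaling independently. Source: Check Source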
Question #4 - Ingestion and Transformation
A batch ingestion pipeline at Pemberfield Energy lands CSV files in bronze; the data then needs cleaning into silver and aggregation into gold.
Which watsonx.data transformation pattern moves CSVs through bronze, silver, and gold layers?
A) Ingest raw CSVs to bronze, apply cleaning and validation transformations to silver, then aggregate to gold — using Spark (or DBT-style transformations) and Iceberg tables at each layer
B) Write everything to a single table and call it done
C) Skip validation and aggregate raw CSVs
D) Maintain only gold and hope raw data is never needed again
Correct answer: A – Explanation:
Layered bronze/silver/gold processing with Iceberg tables at each layer is watsonx.data's transformation reference. A single table, skipping validation, and keeping only gold each sacrifice either data quality or the ability to reprocess raw data. Source: Check Source
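The layering can be sketched in plain Python on a hypothetical meter-readings CSV. In watsonx.data each layer would be an Iceberg table written by Spark or a DBT-style job; here ordinary lists and dicts stand in for the tables so the validation and aggregation steps are easy to follow. The column names are assumptions.

```python
import csv
import io
from collections import defaultdict

RAW_CSV = """meter_id,reading_date,kwh
M1,2026-04-01,12.5
M1,2026-04-01,
M2,2026-04-01,7.25
M2,2026-04-02,abc
M1,2026-04-02,10.0
"""


def to_bronze(text):
    """Bronze: land rows exactly as received (all strings, no cleaning)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_silver(bronze):
    """Silver: validate and type-cast; bad rows are quarantined, not silently dropped."""
    good, rejected = [], []
    for row in bronze:
        try:
            kwh = float(row["kwh"])
        except (TypeError, ValueError):
            rejected.append(row)
            continue
        good.append({"meter_id": row["meter_id"], "reading_date": row["reading_date"], "kwh": kwh})
    return good, rejected


def to_gold(silver):
    """Gold: business-level aggregate (total kWh per meter)."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["meter_id"]] += row["kwh"]
    return dict(totals)


bronze = to_bronze(RAW_CSV)
silver, rejected = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), len(rejected))  # 5 3 2
print(gold)                                     # {'M1': 22.5, 'M2': 7.25}
```

Bronze keeps the two bad rows, so if the validation rule changes later, silver and gold can be rebuilt from the raw layer.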
Question #5 - Ingestion and Transformation
A streaming use case at Haldane Retail must land events into the lakehouse with near-real-time availability.
Which watsonx.data ingestion pattern lands streaming events near real time into the lakehouse?
A) Batch-only ingestion at midnight
B) Stream events with a streaming framework (e.g., Spark Structured Streaming or Kafka Connect) into Iceberg tables, using commit cadences that balance latency and small-file overhead
C) Stream events into one giant append-only CSV
D) Skip streaming and ask users to refresh manually
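Correct answer: B – Explanation:
Streaming into Iceberg tables with a tuned commit cadence is the streaming-ingestion reference: frequent commits lower latency but produce more small files, so the cadence balances the two. Midnight batch, an append-only CSV, and manual refresh all miss the near-real-time requirement. Source: Check Source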
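The commit-cadence trade-off can be modeled with a small micro-batcher that commits when a batch is either large enough or old enough. This is a toy in plain Python, not Spark Structured Streaming or Kafka Connect; `MicroBatcher`, `max_records`, and `max_wait_s` are hypothetical names, and each "commit" simply stands in for one Iceberg commit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MicroBatcher:
    """Buffer streaming events and commit when a batch is big enough or old enough.

    Each commit stands in for one Iceberg commit (new data files plus a snapshot).
    max_wait_s bounds latency; max_records keeps files from being tiny.
    Both thresholds are hypothetical tuning knobs.
    """
    max_records: int
    max_wait_s: float
    buffer: list = field(default_factory=list)
    first_ts: Optional[float] = None
    commits: list = field(default_factory=list)

    def add(self, event, now: float) -> None:
        if not self.buffer:
            self.first_ts = now  # start the latency clock with the first buffered event
        self.buffer.append(event)
        self._maybe_commit(now)

    def tick(self, now: float) -> None:
        """Call periodically so a quiet stream still commits within max_wait_s."""
        self._maybe_commit(now)

    def _maybe_commit(self, now: float) -> None:
        if not self.buffer:
            return
        if len(self.buffer) >= self.max_records or now - self.first_ts >= self.max_wait_s:
            self.commits.append(list(self.buffer))
            self.buffer.clear()
            self.first_ts = None


mb = MicroBatcher(max_records=3, max_wait_s=10.0)
for t, event in [(0, "e1"), (1, "e2"), (2, "e3"), (5, "e4")]:
    mb.add(event, now=t)
mb.tick(now=12)  # e4 has waited 7 s: not yet
mb.tick(now=15)  # e4 has waited 10 s: commit a small batch to honor latency
print(mb.commits)  # [['e1', 'e2', 'e3'], ['e4']]
```

Raising `max_wait_s` yields fewer, larger commits at the cost of freshness; lowering it does the opposite and increases the need for compaction later.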
Question #6 - Table Formats
A data engineer at Greshley Insurance needs to query a table as it was two days ago.
Which Iceberg capability serves the as-of-two-days-ago query directly?
A) Reconstruct the state by subtracting recent changes manually
B) Restore a backup into a separate table
C) Use Iceberg time travel to query a snapshot or timestamp from two days ago directly
D) Skip the request because time travel is not possible in lakehouses
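Correct answer: C – Explanation:
Iceberg time travel is the reference: every commit creates a snapshot, so a query can read the table as of an earlier snapshot ID or timestamp, provided that snapshot has not been expired. Restoring backups, reconstructing state manually, and refusing the request all ignore this built-in capability. Source: Check Source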
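Engines expose time travel through SQL; in Spark SQL with Iceberg (Spark 3.3 or later), for example, a query can add `TIMESTAMP AS OF '<timestamp>'` or `VERSION AS OF <snapshot-id>`, and other engines use their own syntax. The lookup rule behind it, "latest snapshot committed at or before the requested time", is sketched below as a toy Python model. `ToyTable` is hypothetical and stores full row copies per snapshot, whereas real Iceberg snapshots reference data files.

```python
import bisect
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: datetime
    rows: tuple  # toy: full table contents; real Iceberg snapshots reference data files


class ToyTable:
    """Append-only snapshot log; commits are assumed to arrive in time order."""

    def __init__(self):
        self.snapshots = []

    def commit(self, rows, at: datetime) -> int:
        snap = Snapshot(len(self.snapshots) + 1, at, tuple(rows))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def as_of(self, ts: datetime) -> tuple:
        """Rows from the latest snapshot committed at or before ts."""
        times = [s.committed_at for s in self.snapshots]
        i = bisect.bisect_right(times, ts)
        if i == 0:
            raise LookupError("no snapshot at or before that time")
        return self.snapshots[i - 1].rows

    def version(self, snapshot_id: int) -> tuple:
        """Rows from a specific snapshot ID."""
        for s in self.snapshots:
            if s.snapshot_id == snapshot_id:
                return s.rows
        raise LookupError(f"unknown snapshot {snapshot_id}")


now = datetime(2026, 4, 17, 12, 0)
claims = ToyTable()
claims.commit(["claim-1"], at=now - timedelta(days=3))
claims.commit(["claim-1", "claim-2"], at=now - timedelta(days=1))
claims.commit(["claim-1", "claim-2", "claim-3"], at=now)

print(claims.as_of(now - timedelta(days=2)))  # ('claim-1',)
```

A request for "two days ago" falls between the first and second commits, so it resolves to the first snapshot.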
Question #7 - Table Formats
A table at Finmore Financial gains a new column that should not break existing queries.
Which Iceberg capability adds the new column without breaking existing queries?
A) Block all schema changes permanently
B) Rewrite the entire table to add the column
C) Create a new table and deprecate the old one
D) Use schema evolution to add the column (nullable or with a default) so existing queries continue to work and new queries can use the new column
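Correct answer: D – Explanation:
Iceberg schema evolution is the reference for non-breaking column additions. Because Iceberg tracks columns by ID, adding a column is a metadata change that does not rewrite existing data files. Rewriting the table, duplicating it, and blocking changes are all unnecessary. Source: Check Source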
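The ID-based mechanism can be illustrated with a toy Python model: data files store values keyed by field ID, and a read resolves columns that an older file lacks to a default. `ToyEvolvingTable` and the column names are hypothetical. The per-column default mirrors answer D's "nullable or with a default" wording; whether a non-null default is available in practice depends on the Iceberg format version and engine support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    field_id: int
    name: str
    default: object = None  # value read for rows written before the column existed


class ToyEvolvingTable:
    """Data files store values by field ID, so schema changes never rewrite them."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.data_files = []  # each file: list of {field_id: value}

    def add_column(self, name, default=None):
        # Metadata-only change: assign a fresh field ID; existing files are untouched.
        next_id = max(c.field_id for c in self.columns) + 1
        self.columns.append(Column(next_id, name, default))

    def write(self, rows):
        ids = {c.name: c.field_id for c in self.columns}
        self.data_files.append([{ids[k]: v for k, v in row.items()} for row in rows])

    def scan(self, names):
        """Project columns by ID; IDs missing from an older file resolve to the default."""
        wanted = [c for c in self.columns if c.name in names]
        return [
            {c.name: rec.get(c.field_id, c.default) for c in wanted}
            for file in self.data_files
            for rec in file
        ]


policies = ToyEvolvingTable([Column(1, "policy_id"), Column(2, "premium")])
policies.write([{"policy_id": "P1", "premium": 100}])
policies.add_column("region")
policies.write([{"policy_id": "P2", "premium": 200, "region": "EU"}])

print(policies.scan(["policy_id", "premium"]))  # existing query: same columns as before
print(policies.scan(["policy_id", "region"]))   # new query: P1 reads region as None
```

The existing query never mentions `region`, so its results keep the same shape; the new query sees `None` for rows written before the column was added.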
Question #8 - Table Formats
An Iceberg table at Turvey Retail has accumulated many small files from frequent streaming commits.
Which maintenance capability fits?
A) Turn off streaming to avoid small files
B) Delete small files at random
C) Ignore the small-file problem and accept slow queries
D) Schedule Iceberg compaction (rewrite_data_files) to combine small files into larger ones, improving query performance without changing data
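Correct answer: D – Explanation:
Iceberg compaction is the reference: it rewrites many small files into fewer, larger ones in a new snapshot, leaving the table's contents unchanged. Deleting files at random loses data, ignoring the problem leaves queries slow, and turning off streaming sacrifices the use case. Source: Check Source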
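Iceberg performs the rewrite itself through the `rewrite_data_files` procedure (in Spark SQL, for example, `CALL <catalog>.system.rewrite_data_files(table => 'db.events')`). The Python sketch below models only the planning step: which small files get merged together. It assumes a simple first-fit-decreasing bin-packing with a hypothetical 512 MB target and a 75% "small file" cutoff; Iceberg's default strategy is also bin-packing, but its planner has more options than this toy.

```python
def plan_compaction(file_sizes_mb, target_mb=512, small_threshold_mb=None):
    """Group small files into rewrite groups of at most target_mb (first-fit decreasing).

    Files at or above small_threshold_mb (default: 75% of target) are left alone.
    Each returned group would be rewritten into one larger file.
    """
    if small_threshold_mb is None:
        small_threshold_mb = 0.75 * target_mb
    small = sorted((s for s in file_sizes_mb if s < small_threshold_mb), reverse=True)
    groups, totals = [], []
    for size in small:
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                groups[i].append(size)
                totals[i] += size
                break
        else:  # no existing group has room: open a new one
            groups.append([size])
            totals.append(size)
    # Rewriting a lone file gains nothing, so keep only groups that merge 2+ files.
    return [g for g in groups if len(g) > 1]


streaming_files = [10] * 100 + [600]  # 100 tiny files from frequent commits, 1 large file
plan = plan_compaction(streaming_files)
print([len(g) for g in plan], [sum(g) for g in plan])  # [51, 49] [510, 490]
```

One hundred 10 MB files collapse into two rewrite groups, and the 600 MB file is left untouched because it is already large.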
Question #9 - Query Engines
A team at Harvingham Ltd must choose between Presto and Spark for an interactive dashboard that needs sub-second responses.
Which engine fits?
A) Spark, which is tuned for batch and ETL, not sub-second interactive SQL
B) Presto, whose low-latency distributed SQL engine is tuned for interactive queries
C) Neither — dashboards cannot use lakehouses
D) Both simultaneously for the same query
Correct answer: B – Explanation:
Presto for interactive SQL is the watsonx.data engine reference. Spark is oriented toward batch processing, dashboards can query lakehouse tables, and a single query runs on one engine rather than on two at once. Source: Check Source
Question #10 - Governance and Security
A sensitive table at Marshford Bank must restrict certain columns (SSN, email) so only the compliance group can read them.
Which watsonx.data capability fits?
A) Remove the columns entirely and lose the data
B) Store sensitive columns in a separate file and email them when requested
C) Configure column-level security on the table so only the compliance group can read the restricted columns, while other users see allowed columns only
D) Grant everyone access and add a disclaimer
Correct answer: C – Explanation:
Column-level security on the table is watsonx.data's governance reference. Emailing sensitive files, deleting the columns, and relying on a disclaimer either lose data or fail to enforce access control. Source: Check Source
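The effect of answer C can be shown with a toy Python filter that drops restricted columns unless the caller belongs to an allowed group. The policy dictionary, group names, and sample row are hypothetical; in watsonx.data the rule is configured in the platform's governance layer and enforced when the engine reads the table, not in application code like this.

```python
# Hypothetical policy: column name -> groups allowed to read it.
RESTRICTED = {"ssn": {"compliance"}, "email": {"compliance"}}


def visible_columns(columns, user_groups, restricted=RESTRICTED):
    """Unrestricted columns are visible to all; restricted ones need a matching group."""
    return [c for c in columns if c not in restricted or restricted[c] & user_groups]


def secure_select(rows, columns, user_groups, restricted=RESTRICTED):
    """Apply column-level security at read time by dropping columns the caller may not see."""
    cols = visible_columns(columns, user_groups, restricted)
    return [{c: row[c] for c in cols} for row in rows]


customers = [{"customer_id": 1, "name": "A. Lee", "ssn": "000-00-0000", "email": "a.lee@example.com"}]
all_cols = ["customer_id", "name", "ssn", "email"]

print(secure_select(customers, all_cols, {"analysts"}))    # only customer_id and name
print(secure_select(customers, all_cols, {"compliance"}))  # all four columns
```

The data itself stays in one table; only what each caller can see changes, which is why this option avoids both data loss and side-channel workflows like email.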
Get 342+ more questions with source-linked explanations
Every answer traces to the exact IBM documentation page — so you learn from the source, not just memorize answers.
Exam mode & learn mode · Score by objective · Updated April 17, 2026
What the C9007300 watsonx lakehouse v1 exam measures
- Design and provision object storage, bucket organization, table formats, and engine placement to deliver a lakehouse topology that scales with data growth without runaway cost
- Ingest and transform batch and streaming data with DBT-style pipelines, moving raw data through progressively refined bronze/silver/gold layers so analysts and models get trustworthy inputs
- Model and evolve Apache Iceberg tables through partitioning, time travel, schema evolution, and compaction to keep large datasets queryable, rewindable, and storage-efficient over months and years
- Configure and tune Presto and Spark, optimize queries, and apply cost-based planning to meet latency and concurrency targets across interactive and batch workloads
- Register catalogs, enforce column- and row-level security, and track data lineage to expose lakehouse assets responsibly while satisfying governance requirements
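Iceberg partitioning is "hidden": a table can be partitioned by a transform such as `days(event_ts)`, and queries that filter on `event_ts` itself are pruned to the matching partitions without referencing a separate partition column. A minimal Python sketch of that pruning step follows, with hypothetical file names and a simplified day transform that assumes UTC timestamps.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01, the same idea as Iceberg's days() transform (UTC assumed)."""
    return (ts.date() - EPOCH).days


def prune(files, start: datetime, end: datetime):
    """Keep only data files whose day partition can hold rows in [start, end]."""
    lo, hi = day_transform(start), day_transform(end)
    return [f for f in files if lo <= f["partition_day"] <= hi]


# One toy data file per day; the partition value is derived from the event timestamp.
files = [
    {"path": f"events-{day}.parquet", "partition_day": day_transform(datetime(2026, 4, day))}
    for day in (10, 11, 12, 13)
]
hits = prune(files, datetime(2026, 4, 11, 8, 0), datetime(2026, 4, 12, 23, 0))
print([f["path"] for f in hits])  # ['events-11.parquet', 'events-12.parquet']
```

The query filter mentions only timestamps; mapping them to partition values is the table's job, which is what lets the partition scheme change later without rewriting queries.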
How to prepare for this exam
- Review the official exam guide to understand every objective and domain weight before you begin studying
- Work through the IBM Training learning path for the IBM Certified watsonx Data Lakehouse Engineer v1 – Associate (C9007300) certification to cover vendor-authored material end to end
- Get hands-on inside IBM TechZone or a comparable sandbox so you can practice the console tasks, CLI commands, and APIs the exam expects
- Tackle a real-world project at your workplace, a volunteer role, or an open-source repository where the technology under test is actually in use
- Drill one exam objective at a time, starting with the highest-weighted domain and only moving on once you can teach it to someone else
- Study by objective in PowerKram learn mode, where every explanation links back to authoritative IBM documentation
- Switch to PowerKram exam mode to rehearse under timed conditions and confirm you consistently score above the pass mark
Career paths and salary outlook
Lakehouse skills apply to several well-paid data-platform roles:
- Data Lakehouse Engineer — $120,000–$165,000 per year, building and operating modern lakehouse platforms (Glassdoor salary data)
- Senior Data Engineer — $130,000–$175,000 per year, leading data-platform work across teams (Indeed salary data)
- Analytics Platform Architect — $140,000–$185,000 per year, designing enterprise analytics platforms end-to-end (Glassdoor salary data)
Official resources
Work through the official IBM Training learning path for this certification, which bundles videos, labs, and skill tasks aligned to every objective. The official exam page lists the full objective breakdown, prerequisite knowledge, and scheduling details.
