MICROSOFT CERTIFICATION

DP-100 Azure Data Scientist Associate Practice Exam

Exam Number: 3114 | Last updated 16-Apr-26 | 785+ questions across 5 vendor-aligned objectives

The DP-100 Azure Data Scientist Associate certification validates the skills of data scientists who apply Azure Machine Learning and related data science techniques to train, evaluate, and deploy models. This exam measures your ability to work with Azure Machine Learning, Azure Machine Learning Designer, Azure Automated ML, MLflow, and Azure Databricks, demonstrating both the conceptual understanding and the practical implementation skills required in today's enterprise environments.

The heaviest exam domains include Explore Data and Train Models (25–30%), Design and Prepare a Machine Learning Solution (20–25%), and Prepare a Model for Deployment (20–25%). These areas collectively represent the majority of exam content and require focused preparation across their respective subtopics.

Additional domains tested include Deploy and Retrain a Model (10–15%), and Manage and Review Models (10–15%). Together, these areas round out the full exam blueprint and ensure candidates possess well-rounded expertise across the certification scope.

Expect heavy coverage of the Azure Machine Learning SDK v2 and MLflow experiment tracking. Know how to configure compute clusters, design reproducible pipelines, and implement responsible AI dashboards.

Every answer links to the source. Each explanation below includes a hyperlink to the exact Microsoft documentation page the question was derived from. PowerKram is the only practice platform with source-verified explanations. Learn about our methodology →

619 practice exam users · 91.7% satisfied users · 88.4% passed the exam · 4.9/5 quality rating

Test your DP-100 Azure Data Scientist Associate knowledge

10 of 785+ questions

Question #1 - Design and Prepare a Machine Learning Solution

A data science team needs a collaborative workspace where they can share datasets, run experiments, and track model metrics. The environment must support both Python and R.

Which Azure service should they use?

A) Azure HDInsight
B) Azure Databricks only
C) Azure Machine Learning workspace
D) Azure Synapse Analytics

 

Correct answer: C – Explanation:
Azure ML workspace provides collaborative experiment tracking, dataset management, and compute targets supporting Python and R notebooks. Databricks is a Spark-based analytics platform. Synapse focuses on big data analytics. HDInsight is for open-source cluster computing. Source: Check Source

A data science team needs a collaborative workspace for sharing datasets, running experiments, and tracking metrics. The environment must support Python and R.

Which Azure service should they use?

A) Azure Machine Learning workspace offering integrated experiment tracking and compute management
B) Azure Databricks providing a Spark-based analytics platform with collaborative notebooks
C) Azure HDInsight deploying open-source cluster computing frameworks like Hadoop and Spark
D) Azure Synapse Analytics combining big data processing with enterprise data warehousing

 

Correct answer: A – Explanation:
Azure ML workspace provides collaborative experiment tracking, dataset versioning, compute target management, and model registry supporting both Python and R notebooks. Databricks is a Spark analytics platform optimized for big data processing. Synapse focuses on unified data analytics and warehousing. HDInsight deploys open-source cluster computing requiring manual framework management. Source: Check Source

A company runs ML training jobs needing 8 GPUs for 6 hours daily but wants to avoid paying for idle compute.

Which compute option should be configured?

A) A persistent compute instance running continuously with 8 GPUs available around the clock
B) Azure Container Instances provisioned with GPU support for on-demand container workloads
C) An Azure Reserved VM Instance purchased with a one-year commitment for consistent pricing
D) A compute cluster with auto-scale configured from zero minimum nodes to 8 GPU maximum

 

Correct answer: D – Explanation:
A compute cluster scaling from zero eliminates idle costs by deallocating all nodes when no jobs are queued, then scaling to 8 GPU nodes for training. Persistent instances incur 24/7 costs regardless of actual training activity. Reserved instances lock in a pricing commitment regardless of utilization patterns. Container Instances provide GPU access but lack the integrated ML pipeline orchestration and auto-scaling. Source: Check Source
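The savings from scaling to zero can be sketched with back-of-the-envelope arithmetic. The hourly rate below is a hypothetical placeholder, not a real Azure price:

```python
# Hypothetical cost comparison: persistent compute vs. a scale-to-zero cluster.
HOURLY_RATE_PER_NODE = 3.0   # assumed $/hour per GPU node (placeholder, not a real price)
NODES = 8
TRAINING_HOURS_PER_DAY = 6

def daily_cost(billed_hours: float) -> float:
    """Cost of running NODES nodes for the billed hours in one day."""
    return NODES * HOURLY_RATE_PER_NODE * billed_hours

persistent_cost = daily_cost(24)                     # instance never deallocates
autoscale_cost = daily_cost(TRAINING_HOURS_PER_DAY)  # nodes exist only while jobs run

print(persistent_cost)  # 576.0
print(autoscale_cost)   # 144.0
```

Whatever the real per-node rate, the persistent option bills 24 hours a day while the scale-to-zero cluster bills only the 6 training hours — a 4x difference at these assumed numbers.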

A healthcare company needs reproducible ML pipelines with versioned data transformations, feature engineering, and training steps.

Which Azure ML feature ensures reproducible pipelines?

A) Jupyter notebooks saved to a GitHub repository for manual version tracking of code changes
B) Azure ML pipelines with registered versioned components, datasets, and environment snapshots
C) Azure Data Factory pipelines orchestrating data movement between storage and compute resources
D) Manual documentation of each processing step maintained in a shared team knowledge base

 

Correct answer: B – Explanation:
Azure ML pipelines with registered components and datasets ensure full reproducibility with lineage tracking across data, code, and environment versions. GitHub notebooks version code but not data or compute environments holistically. Manual documentation is error-prone and cannot guarantee execution reproducibility. Data Factory handles data orchestration but lacks ML-specific pipeline, experiment, and model versioning. Source: Check Source
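One way to picture why data, environment, and code must all be versioned together is a combined fingerprint: change any one input and the run is no longer the same run. This is an illustrative stdlib sketch of the idea, not an Azure ML API:

```python
import hashlib
import json

def lineage_fingerprint(data: bytes, env_spec: dict, code_version: str) -> str:
    """Hash the dataset, environment spec, and code version together:
    a run is reproducible only when all three inputs are identical."""
    h = hashlib.sha256()
    h.update(data)
    h.update(json.dumps(env_spec, sort_keys=True).encode())
    h.update(code_version.encode())
    return h.hexdigest()

# Hypothetical inputs for illustration:
env_v1 = {"python": "3.10", "scikit-learn": "1.4"}
env_v2 = {"python": "3.10", "scikit-learn": "1.5"}

fp_a = lineage_fingerprint(b"patients_v1", env_v1, "commit-abc")
fp_b = lineage_fingerprint(b"patients_v1", env_v2, "commit-abc")
print(fp_a == fp_b)  # False: changing only the environment changes the lineage
```

Versioning notebooks in GitHub pins only the `code_version` input; Azure ML's registered components, datasets, and environment snapshots effectively pin all three.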

A retailer has 500K rows of sales data and needs to find the best forecasting algorithm without manually testing dozens of models.

Which Azure ML capability should they use?

A) Manually test linear regression, then random forest, then XGBoost in sequential experiments
B) Deploy all candidate algorithms to production and run live A/B tests with real customers
C) AutoML with forecasting task type that evaluates multiple algorithms and hyperparameters
D) Azure ML Designer with a fixed neural network architecture pre-configured in the pipeline

 

Correct answer: C – Explanation:
AutoML automatically evaluates multiple algorithms, preprocessing steps, and hyperparameter combinations for the forecasting task, selecting the best-performing model. Manual sequential testing is slow and may miss optimal algorithm-hyperparameter combinations. Designer with a fixed architecture skips the model selection search entirely. Production A/B testing is expensive, risky, and inappropriate for unvalidated model candidates. Source: Check Source
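The search AutoML performs can be pictured as a sweep over candidates that keeps the best score. The candidate names and scores below are made up for illustration; AutoML does this at far larger scale, including preprocessing and hyperparameter variants:

```python
# Toy model-selection sweep (a sketch of what AutoML automates).
# Each lambda stands in for "train this candidate and return its validation score".
candidates = {
    "linear_regression": lambda: 0.71,   # hypothetical scores
    "random_forest":     lambda: 0.78,
    "gradient_boosting": lambda: 0.83,
}

scores = {name: evaluate() for name, evaluate in candidates.items()}
best_model = max(scores, key=scores.get)
print(best_model)  # gradient_boosting
```

Manual sequential testing is this same loop run by hand, slowly — and without the breadth of combinations AutoML explores.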

A data scientist trains a classification model achieving 98% accuracy on training data but only 60% on the test set.

What problem does this indicate and how should it be addressed?

A) Insufficient features indicating the input variables are inadequate — add more data columns
B) Overfitting indicating the model memorized training data — apply regularization and cross-validation
C) Underfitting indicating the model is too simple — increase model complexity and feature count
D) Data leakage indicating test data leaked into training — the high training accuracy is expected

 

Correct answer: B – Explanation:
The large gap between training (98%) and test (60%) accuracy indicates overfitting where the model memorized training patterns without learning generalizable rules. Regularization, cross-validation, and more training data help generalize. Underfitting shows poor performance on both sets. Data leakage would produce artificially high test accuracy. Adding features without regularization may worsen overfitting further. Source: Check Source
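The diagnostic logic can be captured in a few lines. The thresholds below are illustrative heuristics, not fixed rules:

```python
def diagnose(train_acc: float, test_acc: float, max_gap: float = 0.10) -> str:
    """Heuristic triage of a train/test accuracy pair: a large gap suggests
    overfitting; weak scores on both sets suggest underfitting."""
    if train_acc - test_acc > max_gap:
        return "overfitting"
    if train_acc < 0.7 and test_acc < 0.7:
        return "underfitting"
    return "ok"

print(diagnose(0.98, 0.60))  # overfitting: the scenario in this question
print(diagnose(0.62, 0.60))  # underfitting: poor on both sets, small gap
print(diagnose(0.85, 0.82))  # ok
```

Cross-validation makes the same check more robust by averaging the gap over several train/validation splits instead of trusting a single one.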

A bank tracks multiple experiment runs comparing hyperparameters for a fraud detection model and needs visual comparison.

Which tool should the data scientist use?

A) MLflow experiment tracking in Azure ML logging metrics, parameters, and artifacts per run
B) Azure Monitor metric dashboards configured with custom charts displaying training statistics
C) Azure DevOps work items created for each experiment with results noted in descriptions
D) Excel spreadsheets with manually entered results from each experimental training run

 

Correct answer: A – Explanation:
MLflow in Azure ML logs metrics, parameters, and artifacts automatically for each run, enabling visual comparison across experiments with interactive charts and tables. Excel requires manual data entry and lacks automated logging integration. DevOps work items track project tasks, not ML experiment parameters and metrics. Azure Monitor dashboards track infrastructure performance, not experiment-specific ML training metrics. Source: Check Source
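The run-comparison pattern looks roughly like this. The snippet is a minimal in-memory stand-in; with MLflow in Azure ML you would log the same information via `mlflow.log_param()` and `mlflow.log_metric()` inside `mlflow.start_run()` and compare runs in the studio UI:

```python
# In-memory stand-in for MLflow-style run tracking (illustrative only).
runs = []

def log_run(params: dict, metrics: dict) -> None:
    """Record one experiment run's hyperparameters and resulting metrics."""
    runs.append({"params": params, "metrics": metrics})

# Hypothetical fraud-detection sweeps:
log_run({"learning_rate": 0.01, "max_depth": 6}, {"auc": 0.91})
log_run({"learning_rate": 0.10, "max_depth": 3}, {"auc": 0.87})

best_run = max(runs, key=lambda r: r["metrics"]["auc"])
print(best_run["params"])  # {'learning_rate': 0.01, 'max_depth': 6}
```

The point of MLflow is that this bookkeeping happens automatically per run, with charts and tables generated for you — no spreadsheets, no manual entry.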

A credit scoring model must be evaluated for bias across protected attributes like age and gender before deployment.

Which Azure ML tool provides this fairness assessment?

A) Azure Advisor providing optimization recommendations for Azure Machine Learning resource usage
B) Application Insights monitoring the runtime performance of deployed model inference endpoints
C) Responsible AI dashboard with the fairness assessment component evaluating group-level disparities
D) Azure Cost Management analyzing the compute spending associated with model training runs

 

Correct answer: C – Explanation:
The Responsible AI dashboard includes fairness assessment components that evaluate prediction disparities across demographic groups, identifying potential bias before deployment. Cost Management handles compute billing analysis. Advisor provides resource optimization tips. Application Insights monitors deployed service performance, not pre-deployment model fairness evaluation. Source: Check Source
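One of the simplest group-fairness signals the dashboard surfaces is the difference in selection rates across groups (the idea behind demographic parity). This pure-Python sketch uses hypothetical numbers to show the computation:

```python
def selection_rate(predictions: list[int]) -> float:
    """Fraction of positive (e.g., 'approve credit') predictions."""
    return sum(predictions) / len(predictions)

def parity_difference(groups: dict[str, list[int]]) -> float:
    """Largest gap in selection rate across groups (0 = perfectly equal)."""
    rates = [selection_rate(preds) for preds in groups.values()]
    return max(rates) - min(rates)

# Hypothetical model outputs split by a protected attribute:
by_group = {
    "age<40":  [1, 1, 0, 1, 0],   # 60% approved
    "age>=40": [1, 0, 0, 0, 0],   # 20% approved
}
gap = parity_difference(by_group)
print(round(gap, 2))  # 0.4 — a disparity worth investigating before deployment
```

The dashboard's fairness component computes metrics like this (and richer ones) across every protected attribute at once, before the model ships.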

A trained model needs conversion to a format optimized for low-latency inference on edge devices with limited compute resources.

Which model format and tool should be used?

A) Save the model as a Python pickle file packaged with the full training environment dependencies
B) Convert to ONNX format using the ONNX Runtime for cross-platform optimized model inference
C) Deploy the complete training environment including GPU drivers and frameworks to edge devices
D) Export the trained model weights as a CSV file and reload them at inference time on edge

 

Correct answer: B – Explanation:
ONNX provides an optimized, hardware-accelerated, interoperable format for inference across platforms including resource-constrained edge devices. Pickle files are Python-specific and include no cross-platform optimization. CSV files cannot represent complex model architectures and lose computational graph information. Deploying full training environments to edge devices is impractical due to their resource constraints. Source: Check Source

A deployed recommendation model’s accuracy has degraded over three months as user preferences shifted.

What should the team implement to address this?

A) Implement an automated retraining pipeline triggered by data drift detection monitoring
B) Retrain the model once with the latest data and redeploy it as a permanent final version
C) Ignore the accuracy degradation as a normal expected behavior of machine learning models
D) Switch entirely to a deterministic rule-based recommendation system without ML components

 

Correct answer: A – Explanation:
Automated retraining triggered by data drift detection ensures the model adapts to changing patterns continuously over time. One-time retraining will degrade again as preferences continue shifting. Ignoring degradation reduces recommendation quality and user engagement. Rule-based systems lack the personalization capability that ML models provide for diverse user preferences. Source: Check Source
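A drift trigger can be as simple as watching a summary statistic of incoming data move away from the training baseline. This is a deliberately crude stdlib sketch (Azure ML's data drift monitoring uses richer statistical distance measures); the scores and threshold are hypothetical:

```python
def relative_mean_shift(baseline: list[float], current: list[float]) -> float:
    """Crude drift signal: relative change in a feature's mean vs. the baseline."""
    mean_baseline = sum(baseline) / len(baseline)
    mean_current = sum(current) / len(current)
    return abs(mean_current - mean_baseline) / abs(mean_baseline)

def should_retrain(baseline: list[float], current: list[float],
                   threshold: float = 0.2) -> bool:
    """Fire the retraining pipeline when drift exceeds the threshold."""
    return relative_mean_shift(baseline, current) > threshold

baseline_scores = [10.0, 12.0, 11.0, 9.0]   # feature values at training time
recent_scores   = [15.0, 16.0, 14.0, 15.0]  # the same feature three months later
print(should_retrain(baseline_scores, recent_scores))  # True: preferences shifted
```

In production, a check like this runs on a schedule against fresh inference data, and a `True` result kicks off the retraining pipeline rather than a human doing a one-off refresh.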

Get 785+ more questions with source-linked explanations

Every answer traces to the exact Microsoft documentation page — so you learn from the source, not just memorize answers.

Exam mode & learn mode · Score by objective · Updated 16-Apr-26

Learn more...

What the DP-100 Azure Data Scientist Associate exam measures

  • Design and Prepare a Machine Learning Solution (20–25%) — Workspace setup, compute configuration, data access, and reproducible pipeline design.
  • Explore Data and Train Models (25–30%) — Experiment runs, Automated ML, hyperparameter comparison, and MLflow tracking.
  • Prepare a Model for Deployment (20–25%) — Responsible AI assessment and packaging models in deployment-ready formats such as ONNX.
  • Deploy and Retrain a Model (10–15%) — Serving models and implementing drift-triggered retraining pipelines.
  • Manage and Review Models (10–15%) — Monitoring, comparing, and reviewing registered models across their lifecycle.

  • Review the official exam guide to understand every objective and domain weight before you begin studying
  • Complete the relevant Microsoft Learn learning path to build a structured foundation across all exam topics
  • Get hands-on practice in an Azure free-tier sandbox or trial environment to reinforce what you have studied with real configurations
  • Apply your knowledge through real-world project experience — whether at work, in volunteer roles, or contributing to open-source initiatives
  • Master one objective at a time, starting with the highest-weighted domain to maximize your score potential early
  • Use PowerKram learn mode to study by individual objective and review detailed explanations for every question
  • Switch to PowerKram exam mode to simulate the real test experience with randomized questions and timed conditions

Earning this certification can open doors to several in-demand data science and machine learning roles.

Microsoft provides comprehensive free training to prepare for the DP-100 Azure Data Scientist Associate exam. Start with the official Microsoft Learn learning path for structured, self-paced modules covering every exam domain. Review the exam study guide for the complete skills outline and recent updates.

Related certifications to explore

Related reading from our Learning Hub