Machine Learning

by Synchronized Software L.L.C. / 3-23-2026

Machine Learning Fundamentals

A Cross-Vendor Training Guide

Certification Alignment: CompTIA AI+, Azure AI-900, AWS ML Specialty, Google Cloud ML Engineer, Salesforce AI Associate

Introduction

Machine learning (ML) is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. Rather than following rigid instructions, ML algorithms build mathematical models based on training data to make predictions or decisions.

This foundational knowledge is essential for anyone pursuing cloud AI certifications or working with enterprise AI platforms.

What Is Machine Learning?

Machine learning is the science of getting computers to act without being explicitly programmed. Instead of writing rules for every possible scenario, you provide data and let the algorithm discover patterns.

Traditional Programming vs. Machine Learning

Traditional Programming: Input Data + Rules → Output

You define explicit rules. For example, to identify spam emails, you might write: “If email contains ‘FREE MONEY’ and sender is unknown, mark as spam.”

Machine Learning: Input Data + Expected Output → Model (learned rules)

You provide examples of spam and non-spam emails. The algorithm learns what distinguishes them and creates its own rules.
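To make the contrast concrete, here is a minimal Python sketch. The hand-written rule mirrors the example above. The "learned" version builds its own keyword list from a handful of labeled emails by keeping the words that appear mostly in spam. The function names, the toy dataset, and the thresholds (`min_ratio=0.8`, `min_hits=2`) are hypothetical choices for illustration. Production spam filters do not work this way.

```python
from collections import Counter

# Traditional programming: a human writes the rule explicitly.
def rule_based_is_spam(email: str, sender_known: bool) -> bool:
    return "FREE MONEY" in email.upper() and not sender_known

# Machine learning (toy version): derive the "rule" from labeled examples.
def learn_spam_words(examples: list[tuple[str, bool]], min_ratio: float = 0.8) -> set[str]:
    """Keep words whose share of appearances in spam is at least min_ratio."""
    spam_counts, total_counts = Counter(), Counter()
    for text, is_spam in examples:
        for word in set(text.lower().split()):
            total_counts[word] += 1
            if is_spam:
                spam_counts[word] += 1
    return {w for w, n in total_counts.items() if spam_counts[w] / n >= min_ratio}

def learned_is_spam(email: str, spam_words: set[str], min_hits: int = 2) -> bool:
    """Flag an email containing at least min_hits of the learned spam words."""
    return len(set(email.lower().split()) & spam_words) >= min_hits

training_data = [
    ("win free money now", True),
    ("claim your free prize now", True),
    ("free money waiting claim now", True),
    ("meeting moved to friday", False),
    ("lunch on friday", False),
    ("project status update attached", False),
    ("your free trial of the project tool", False),
]
spam_words = learn_spam_words(training_data)
print(sorted(spam_words))  # ['claim', 'money', 'now', 'prize', 'waiting', 'win']
print(rule_based_is_spam("Claim your money now", sender_known=False))  # False: the fixed rule misses it
print(learned_is_spam("Claim your money now", spam_words))            # True
```

The word "free" is not learned as a spam word, because it also appears in one legitimate training email. The learned rule reflects whatever examples it was given.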

Why Machine Learning Matters

Machine learning excels when:

Rules are too complex to define manually – Recognizing faces involves millions of pixel patterns impossible to code by hand
Rules change frequently – Fraud patterns evolve constantly; ML models can be retrained on recent data to keep up
Data is abundant but patterns are hidden – Customer behavior data contains insights humans cannot easily see
Personalization is required – Recommendation systems must adapt to individual preferences

The Three Types of Machine Learning

1. Supervised Learning

Supervised learning uses labeled training data—input-output pairs where the correct answer is known. The algorithm learns to map inputs to outputs by studying these examples.

How It Works:

Collect labeled training data (features + labels)
Train the model to find patterns between features and labels
Use the trained model to predict labels for new, unseen data
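The three steps can be sketched in plain Python with one of the simplest supervised learners, a nearest-centroid classifier. It was chosen for brevity and is not one of the algorithms listed below. The two features (number of links, number of exclamation marks) and the six labeled emails are made-up illustrative data.

```python
import math
from collections import defaultdict

Features = tuple[float, ...]

def train_nearest_centroid(X: list[Features], y: list[str]) -> dict[str, Features]:
    """Step 2: learn one average feature vector (centroid) per label."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model: dict[str, Features], features: Features) -> str:
    """Step 3: return the label whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(features, model[label]))

# Step 1: labeled training data; features = (links, exclamation marks)
X_train = [(8, 5), (6, 7), (9, 4), (0, 0), (1, 1), (0, 2)]
y_train = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_nearest_centroid(X_train, y_train)
print(predict(model, (7, 6)))  # spam
print(predict(model, (1, 0)))  # ham
```

Swapping in a different algorithm changes how step 2 builds the model and how step 3 predicts. The overall collect, train, predict shape stays the same.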

Common Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), Neural Networks

Real-World Applications: Email spam detection, house price prediction, medical diagnosis, credit scoring

Vendor Implementations:

Vendor	Service	Documentation
Microsoft	Azure Machine Learning AutoML	learn.microsoft.com/azure/machine-learning/concept-automated-ml
AWS	Amazon SageMaker Autopilot	docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development.html
Google	Vertex AI AutoML	cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/overview
Salesforce	Einstein Prediction Builder	help.salesforce.com/s/articleView?id=sf.einstein_prediction_builder.htm

2. Unsupervised Learning

Unsupervised learning works with unlabeled data. The algorithm finds hidden patterns, structures, or relationships without being told what to look for.

Common Algorithms: K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA), t-SNE, Autoencoders

Real-World Applications: Customer segmentation, anomaly detection, topic modeling, data compression
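Here is a minimal k-means sketch in plain Python. It assumes 2-D points and a deliberately naive initialization that uses the first k points. Library implementations such as scikit-learn's `KMeans` default to k-means++ initialization, which is more robust. The customer coordinates are made-up data. No labels are supplied; the algorithm sees only the points.

```python
import math

Point = tuple[float, float]

def k_means(points: list[Point], k: int, max_iterations: int = 100) -> tuple[list[Point], list[int]]:
    """Plain k-means: alternate assignment and update steps until assignments stop changing."""
    centroids = list(points[:k])  # naive initialization: first k points
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignments

# Hypothetical customers: (visits per month, average spend in hundreds)
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, assignments = k_means(customers, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
print(centroids)    # roughly [(1.17, 1.5), (8.5, 8.5)]
```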

Vendor Implementations:

Vendor	Service	Documentation
Microsoft	Azure ML K-Means Clustering	learn.microsoft.com/azure/machine-learning/component-reference/k-means-clustering
AWS	Amazon SageMaker K-Means	docs.aws.amazon.com/sagemaker/latest/dg/k-means.html
Google	BigQuery ML Clustering	cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-kmeans
Salesforce	Einstein Discovery Clustering	help.salesforce.com/s/articleView?id=sf.bi_edd_wb_clustering.htm

3. Reinforcement Learning

Reinforcement learning trains agents to make sequential decisions by rewarding desired behaviors and penalizing undesired ones. The agent learns through trial and error in an environment.

Key Components:

Agent – The learner/decision maker
Environment – What the agent interacts with
State – Current situation of the agent
Action – Choices the agent can make
Reward – Feedback signal (positive or negative)
Policy – Strategy the agent follows
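These components map directly onto tabular Q-learning, one classic RL algorithm. It was chosen here for illustration; the vendor services below support many others. The sketch assumes a hypothetical five-state corridor. The agent starts at state 0 and earns a reward of +1 for reaching state 4, with a small -0.01 penalty for every other step. The hyperparameters (`alpha`, `gamma`, `epsilon`, episode count) are illustrative values, not tuned recommendations.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def env_step(state: int, action: int) -> tuple[int, float, bool]:
    """Environment: apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True       # reward for reaching the goal
    return next_state, -0.01, False        # small penalty for every other step

def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learns Q(state, action) by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])  # exploit, random tie-break
            next_state, reward, done = env_step(state, action)
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move Q toward reward + discounted best future value
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q_table = train_q_learning()
# Policy: in each non-goal state, take the action with the highest learned value
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # should print {0: 1, 1: 1, 2: 1, 3: 1}: always move right
```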

Vendor Implementations:

Vendor	Service	Documentation
Microsoft	Azure Personalizer	learn.microsoft.com/azure/ai-services/personalizer/
AWS	AWS DeepRacer, SageMaker RL	docs.aws.amazon.com/sagemaker/latest/dg/reinforcement-learning.html
Google	Vertex AI custom training (RL frameworks)	cloud.google.com/vertex-ai/docs/training/training-frameworks
NVIDIA	Isaac Sim RL	developer.nvidia.com/isaac-sim

Classification vs. Regression

The two most common supervised learning tasks are classification and regression. Understanding the difference is fundamental.

Classification

Classification predicts discrete categories or classes. The output is one of a predefined set of labels.

Binary Classification – Two possible outcomes: Spam or Not Spam, Fraud or Legitimate, Positive or Negative sentiment

Multi-class Classification – Three or more categories: Image recognition, Document categorization, Disease diagnosis

Evaluation Metrics for Classification:

Metric	Description	Use Case
Accuracy	(Correct predictions) / (Total predictions)	Balanced datasets
Precision	(True Positives) / (True Positives + False Positives)	False positives costly
Recall	(True Positives) / (True Positives + False Negatives)	False negatives costly
F1 Score	Harmonic mean of Precision and Recall	Imbalanced datasets
AUC-ROC	Area under the Receiver Operating Characteristic curve	Ranking quality across all thresholds
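The formulas in the table can be computed directly from confusion-matrix counts. This sketch makes three assumptions. Labels are binary with 1 as the positive class. A ratio returns 0.0 when it is undefined; this is a convention, and libraries handle the case differently. AUC-ROC is computed via its equivalent pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores are made-up data.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}

def auc_roc(y_true: list[int], scores: list[float]) -> float:
    """Share of (positive, negative) pairs where the positive scores higher; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.1, 0.2, 0.8, 0.3, 0.7]  # model's predicted probability of class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # 0.5 threshold: a common default, not a rule

print(classification_metrics(y_true, y_pred))
# accuracy 0.625, precision ≈ 0.667, recall 0.5, f1 ≈ 0.571
print(auc_roc(y_true, scores))  # 0.8125
```

Precision, recall, and F1 depend on the chosen threshold. AUC-ROC uses only the ranking of the scores, so it does not.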

Regression

Regression predicts continuous numerical values. The output is a number on a continuous scale.

Examples: House price prediction, temperature forecasting, sales revenue projection, customer lifetime value estimation

Evaluation Metrics for Regression:

Metric	Description	Interpretation
MAE	Mean Absolute Error – average absolute difference	Same unit as target
MSE	Mean Squared Error – average squared difference	Penalizes large errors
RMSE	Root Mean Squared Error – square root of MSE	Same unit as target
R-Squared	Proportion of variance explained by the model	Usually 0 to 1; higher is better (negative if worse than predicting the mean)
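Here are the same metrics in a short Python sketch, computed straight from their definitions. R-squared is 1 minus the residual sum of squares divided by the total sum of squares. The values are made-up. The second call shows R-squared falling below zero when a model does worse than always predicting the mean of the targets.

```python
import math

def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """MAE, MSE, RMSE and R-squared, computed directly from their definitions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # variation the model leaves unexplained
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation around the mean
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }

y_true = [3.0, 5.0, 2.5, 7.0]
good = regression_metrics(y_true, [2.5, 5.0, 3.0, 8.0])
print(good)  # MAE 0.5, MSE 0.375, RMSE ≈ 0.612, R2 ≈ 0.882
bad = regression_metrics(y_true, [10.0, 10.0, 10.0, 10.0])
print(bad["R2"])  # negative: worse than always predicting the mean
```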

The Machine Learning Workflow

Every ML project follows a similar lifecycle, regardless of vendor or platform.

Step 1: Problem Definition

Before writing any code, clearly define:

What business problem are you solving?
What would success look like?
Is ML the right approach? (vs. traditional programming or business rules)
What type of ML task is this? (classification, regression, clustering, etc.)

Step 2: Data Collection and Preparation

Data is the foundation of machine learning. This step is often estimated to consume 60-80% of project time.

Key Activities: Data Collection, Data Exploration, Data Cleaning, Feature Engineering, Data Splitting
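Data splitting can be sketched as a single seeded shuffle followed by cuts into disjoint training, validation, and test sets. The 70/15/15 proportions and the function name are illustrative assumptions, not a standard.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once with a fixed seed, then cut into disjoint train / validation / test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is training data
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

For time-ordered data you would normally split by time instead of shuffling. For imbalanced classes, a stratified split keeps class proportions similar in every set.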

Vendor Tools for Data Preparation:

Vendor	Service	Purpose
Microsoft	Azure Data Factory, Azure ML Data Prep	ETL, data transformation
AWS	AWS Glue, SageMaker Data Wrangler	Data integration, preparation
Google	Dataflow, Vertex AI Feature Store	Data pipelines, features
Salesforce	Data Cloud, Einstein Data Prep	CRM data integration

Step 3: Model Selection and Training

Choose an appropriate algorithm and train it on your data.

Considerations: Dataset size, feature types, interpretability requirements, computational constraints, accuracy requirements

Step 4: Model Evaluation

Assess model performance using held-out test data that the model has never seen.

Key Concepts:

Training Error – Error on data used to train the model
Validation Error – Error on data used to tune hyperparameters
Test Error – Error on completely unseen data (final evaluation)
Generalization – Model’s ability to perform well on new data

Step 5: Model Deployment

Move the trained model to production where it can make predictions on new data.

Vendor Deployment Services:

Vendor	Service	Documentation
Microsoft	Azure ML Endpoints	learn.microsoft.com/azure/machine-learning/concept-endpoints
AWS	SageMaker Endpoints	docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html
Google	Vertex AI Predictions	cloud.google.com/vertex-ai/docs/predictions/get-predictions
NVIDIA	Triton Inference Server	developer.nvidia.com/triton-inference-server

Step 6: Monitoring and Maintenance

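ML models often degrade over time as real-world data changes: data drift (the distribution of inputs shifts) and concept drift (the relationship between inputs and the target changes).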

What to Monitor: Prediction accuracy over time, data drift, model latency and throughput, resource utilization
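One common way to quantify data drift for a single feature is the Population Stability Index (PSI). It compares how values fall into bins in the training (baseline) data versus production data: PSI = Σ (current share − baseline share) × ln(current share / baseline share). The sketch below assumes fixed, hand-picked cut points and made-up age values. A widely quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a major shift. These thresholds are conventions rather than standards. PSI only detects changes in input distributions. Concept drift usually shows up only when predictions are compared with actual outcomes.

```python
import bisect
import math

def psi(baseline: list[float], current: list[float], cut_points: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index of one feature: baseline (training) vs current (production)."""
    def bin_shares(values):
        counts = [0] * (len(cut_points) + 1)  # cut points define len(cut_points) + 1 bins
        for v in values:
            counts[bisect.bisect_right(cut_points, v)] += 1
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0) for empty bins
    b, c = bin_shares(baseline), bin_shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

training_ages = [22, 25, 31, 35, 38, 44, 47, 52, 58, 63]
production_ages = [24, 33, 36, 41, 48, 50, 53, 57, 61, 66]
print(psi(training_ages, training_ages, [30, 45, 60]))    # 0.0: identical distributions
print(psi(training_ages, production_ages, [30, 45, 60]))  # about 0.196: a moderate shift
```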

Bias-Variance Tradeoff

One of the most important concepts in machine learning is the bias-variance tradeoff.

Understanding Bias

Bias is error from overly simplistic assumptions in the learning algorithm. High bias causes the model to miss relevant patterns (underfitting).

Symptoms of High Bias: Poor performance on training data, poor performance on test data, model is too simple for the problem

Understanding Variance

Variance is error from sensitivity to small fluctuations in training data. High variance causes the model to fit noise rather than signal (overfitting).

Symptoms of High Variance: Excellent performance on training data, poor performance on test data, model is too complex

The Tradeoff

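For a fixed amount of training data, you generally cannot minimize bias and variance at the same time. Making the model more flexible reduces bias but typically increases variance, and making it simpler does the reverse.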

Expected Squared Error = Bias² + Variance + Irreducible Error (for squared-error loss, averaged over possible training sets)
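The decomposition can be checked empirically. This sketch assumes a hypothetical true function f(x) = x², Gaussian noise with standard deviation 0.1, and k-nearest-neighbour regression evaluated at x0 = 0.5. It draws many independent training sets and measures the bias² and variance of the prediction at x0. It then compares their sum plus the noise variance with the directly measured squared error. With k = 1 the model follows individual points (low bias, high variance). With k = 20 it averages all 20 training points (high bias, low variance).

```python
import random
import statistics

def true_f(x):
    return x * x  # hypothetical "true" relationship

NOISE_SD = 0.1  # irreducible noise: y = f(x) + N(0, 0.1^2)

def knn_predict(train, x0, k):
    """k-nearest-neighbour regression: average y of the k training points closest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def decompose(k, x0=0.5, n_train=20, trials=2000, seed=0):
    """Estimate bias^2, variance, noise and the measured expected squared error at x0."""
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n_train)]
        train = [(x, true_f(x) + rng.gauss(0, NOISE_SD)) for x in xs]  # fresh training set
        pred = knn_predict(train, x0, k)
        y0 = true_f(x0) + rng.gauss(0, NOISE_SD)                       # fresh noisy target
        preds.append(pred)
        sq_errors.append((pred - y0) ** 2)
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance, NOISE_SD ** 2, statistics.fmean(sq_errors)

for k in (1, 20):
    bias_sq, variance, noise, measured = decompose(k)
    print(f"k={k:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}  noise={noise:.4f}  "
          f"sum={bias_sq + variance + noise:.4f}  measured={measured:.4f}")
```

Bias² should rise and variance should fall as k grows. In each row, "sum" and "measured" should agree up to simulation noise.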

Finding the Sweet Spot:

Start simple, increase complexity gradually
Use cross-validation to detect overfitting
Apply regularization to reduce variance
Gather more data to reduce variance
Add features to reduce bias
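Cross-validation can be sketched in plain Python. This example assumes synthetic data drawn from y = 2x + 1 plus noise. It compares three hypothetical models: a constant predictor (too simple), a least-squares line, and 1-nearest-neighbour (flexible enough to memorize the training set). Assigning folds by index modulo k is reasonable here only because the synthetic rows are already in random order. Real data should be shuffled first.

```python
import random
import statistics

def fit_mean(xs, ys):
    m = statistics.fmean(ys)
    return lambda x: m                      # ignores x entirely: too simple (high bias)

def fit_line(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)  # ordinary least-squares line

def fit_1nn(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]  # copies nearest label (high variance)

def mse(model, xs, ys):
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def k_fold_cv_mse(fit, xs, ys, k=5):
    """Mean validation MSE over k folds; every point is held out exactly once."""
    scores = []
    for fold in range(k):
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        val_idx = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(mse(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return statistics.fmean(scores)

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]  # hypothetical linear data with noise

models = {"mean": fit_mean, "line": fit_line, "1-NN": fit_1nn}
for name, fit in models.items():
    train_error = mse(fit(xs, ys), xs, ys)
    cv_error = k_fold_cv_mse(fit, xs, ys)
    print(f"{name:5s} training MSE={train_error:6.3f}  5-fold CV MSE={cv_error:6.3f}")
```

1-NN reaches zero training error because every point is its own nearest neighbour. Its cross-validated error should still be noticeably higher than the line's. That gap between training and validation error is the overfitting signal cross-validation is designed to expose.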

Feature Engineering

Feature engineering is the process of using domain knowledge to create features that make machine learning algorithms work better.

Common Feature Engineering Techniques

1. Handling Missing Values

Remove rows/columns with missing values
Impute with mean, median, or mode
Use advanced imputation (KNN, regression)
Create a “missing” indicator feature
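Here is median imputation plus a missing-value indicator, sketched in plain Python with `None` marking missing entries. The income values and function name are hypothetical. In practice, compute the fill value on the training data only, then reuse it for validation, test, and production data. This keeps information from the evaluation sets out of training.

```python
import statistics

def impute_with_indicator(values, strategy="median"):
    """Fill None entries and return (filled_values, was_missing_flags)."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed) if strategy == "median" else statistics.fmean(observed)
    filled = [fill if v is None else v for v in values]
    missing_flags = [1 if v is None else 0 for v in values]  # the "missing" indicator feature
    return filled, missing_flags

incomes = [52000, None, 61000, 48000, None, 250000]
filled, flags = impute_with_indicator(incomes)
print(filled)  # [52000, 56500.0, 61000, 48000, 56500.0, 250000]
print(flags)   # [0, 1, 0, 0, 1, 0]

mean_filled, _ = impute_with_indicator(incomes, strategy="mean")
print(mean_filled[1])  # 102750.0: pulled up by the 250000 outlier
```

The outlier pulls the mean fill far above the typical income, while the median fill stays close to it. This is why the median is often preferred for skewed features.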

2. Encoding Categorical Variables

Technique	Description	When to Use
One-Hot Encoding	Create binary column for each category	Nominal, <10 categories
Label Encoding	Assign integer to each category	Ordinal categories
Target Encoding	Replace each category with its mean target value	High cardinality
Embedding	Learn dense vector representation	Very high cardinality
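The first three techniques in the table can be sketched in a few lines of Python. Learned embeddings need a trained neural network, so they are omitted. The data and function names are hypothetical. The integer encoding follows an explicit category order so that the numbers reflect the ranking. Target encoding here is the plain per-category mean. Real pipelines usually compute it on training folds only and smooth rare categories to limit overfitting and target leakage.

```python
from collections import defaultdict

def one_hot(values):
    """One binary column per distinct category (columns sorted alphabetically)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def ordinal_encode(values, order):
    """Map each category to its position in an explicit order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

def target_encode(values, targets):
    """Replace each category with the mean target value observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]

colors = ["red", "green", "blue", "green"]
print(one_hot(colors))
# (['blue', 'green', 'red'], [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]])

sizes = ["small", "large", "medium", "small"]
print(ordinal_encode(sizes, order=["small", "medium", "large"]))  # [0, 2, 1, 0]

cities = ["Austin", "Boston", "Austin", "Denver", "Boston"]
churned = [1, 0, 0, 1, 0]
print(target_encode(cities, churned))  # [0.5, 0.0, 0.5, 1.0, 0.0]
```

Denver appears only once, so its encoded value of 1.0 rests on a single example. Smoothing is meant to address exactly this overfitting risk.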

3. Feature Scaling

Technique	Formula	Range	Use For
Min-Max Scaling	(x – min) / (max – min)	[0, 1]	Neural nets, KNN
Standardization	(x – mean) / std	Unbounded (mostly within ±3 for roughly normal data)	Linear models, SVM
Robust Scaling	(x – median) / IQR	varies	Data with outliers
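Here are the three scaling formulas in plain Python, using the standard library's `statistics` module. The data is made-up, with one outlier (100) to show how each method reacts. Quartiles come from `statistics.quantiles` with its default method. Other libraries may compute quartiles slightly differently, so robust-scaled values can vary a little between tools. Each function assumes the values are not all identical; otherwise the denominator is zero.

```python
import statistics

def min_max_scale(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]          # maps into [0, 1]

def standardize(xs):
    mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]               # mean 0, standard deviation 1

def robust_scale(xs):
    q1, _, q3 = statistics.quantiles(xs, n=4)          # first and third quartiles
    median = statistics.median(xs)
    return [(x - median) / (q3 - q1) for x in xs]      # centered on the median, scaled by IQR

data = [10, 11, 11, 12, 12, 12, 13, 13, 14, 100]
print([round(v, 3) for v in min_max_scale(data)])
print([round(v, 3) for v in standardize(data)])
print([round(v, 3) for v in robust_scale(data)])
```

Min-max scaling squeezes the nine typical values into roughly the bottom 5% of the range. The outlier also inflates the standard deviation used by standardization. Robust scaling keeps the typical values spread around zero and leaves the outlier far away.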

Key Takeaways

Machine Learning enables systems to learn from data rather than explicit programming
Three main types: Supervised (labeled data), Unsupervised (unlabeled data), Reinforcement (reward-based learning)
Classification predicts categories; Regression predicts continuous values
The ML workflow is consistent across platforms: Define → Prepare → Train → Evaluate → Deploy → Monitor
Bias-Variance tradeoff is fundamental: Too simple = underfitting; Too complex = overfitting
Feature engineering often has more impact than algorithm choice
All major cloud vendors offer similar ML capabilities with different interfaces

Additional Learning Resources

Official Vendor Documentation

Microsoft Learn: learn.microsoft.com/training/paths/create-machine-learning-models/
AWS Training: aws.amazon.com/training/learn-about/machine-learning/
Google ML Crash Course: developers.google.com/machine-learning/crash-course
NVIDIA Deep Learning Institute: nvidia.com/en-us/training/
CompTIA AI Fundamentals: comptia.org/certifications/ai-fundamentals
Salesforce Trailhead: trailhead.salesforce.com/content/learn/modules/ai-basics

Certification Study Guides

Azure AI-900: Microsoft Azure AI Fundamentals
AWS ML Specialty: aws.amazon.com/certification/certified-machine-learning-specialty/
Google ML Engineer: cloud.google.com/learn/certification/machine-learning-engineer
Salesforce Agentforce Specialist: Specializing in Agentforce Practice Exam

Article 1 of 5 | AI/ML Foundations Training Series

Level: Beginner | Estimated Reading Time: 25 minutes | Last Updated: February 2025
