Machine Learning
by Synchronized Software L.L.C. / 3-23-2026
Machine Learning Fundamentals
A Cross-Vendor Training Guide
Certification Alignment: CompTIA AI+, Azure AI-900, AWS ML Specialty, Google Cloud ML Engineer, Salesforce AI Associate
Introduction
Machine learning (ML) is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. Rather than following rigid instructions, ML algorithms build mathematical models based on training data to make predictions or decisions.
This foundational knowledge is essential for anyone pursuing cloud AI certifications or working with enterprise AI platforms.
What Is Machine Learning?
Machine learning is the science of getting computers to act without being explicitly programmed. Instead of writing rules for every possible scenario, you provide data and let the algorithm discover patterns.
Traditional Programming vs. Machine Learning
Traditional Programming: Input Data + Rules → Output
You define explicit rules. For example, to identify spam emails, you might write: “If email contains ‘FREE MONEY’ and sender is unknown, mark as spam.”
Machine Learning: Input Data + Expected Output → Model (learned rules)
You provide examples of spam and non-spam emails. The algorithm learns what distinguishes them and creates its own rules.
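As a rough illustration of the contrast, the sketch below puts a hand-written rule next to a toy learner that derives its own word list from labeled examples. The function names, the example emails, and the word-counting approach are all assumptions chosen for brevity; this is not how a production spam filter works.

```python
from collections import Counter

def rule_based_is_spam(text, sender_known):
    """Traditional programming: a human writes the rule explicitly."""
    return "FREE MONEY" in text.upper() and not sender_known

def learn_spam_words(examples):
    """Toy learner: collect words seen in more spam emails than legitimate ones."""
    spam_counts = Counter()
    ham_counts = Counter()
    for text, is_spam in examples:
        words = set(text.lower().split())
        (spam_counts if is_spam else ham_counts).update(words)
    return {word for word, n in spam_counts.items() if n > ham_counts[word]}

def learned_is_spam(text, spam_words):
    """Flag an email when at least two learned spam words appear in it."""
    return len(set(text.lower().split()) & spam_words) >= 2

# Hypothetical labeled examples (True = spam).
training_emails = [
    ("free money claim your prize now", True),
    ("win free prize money today", True),
    ("meeting agenda for today", False),
    ("your invoice for the project is attached", False),
]
spam_words = learn_spam_words(training_emails)
```

Calling `learned_is_spam("claim your free prize", spam_words)` flags the message even though no one wrote a rule mentioning "prize"; the word list came from the examples.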
Why Machine Learning Matters
Machine learning excels when:
- Rules are too complex to define manually – Recognizing faces involves millions of pixel patterns impossible to code by hand
- Rules change frequently – Fraud patterns evolve constantly; ML models can adapt
- Data is abundant but patterns are hidden – Customer behavior data contains insights humans cannot easily see
- Personalization is required – Recommendation systems must adapt to individual preferences
The Three Types of Machine Learning
1. Supervised Learning
Supervised learning uses labeled training data—input-output pairs where the correct answer is known. The algorithm learns to map inputs to outputs by studying these examples.
How It Works:
- Collect labeled training data (features + labels)
- Train the model to find patterns between features and labels
- Use the trained model to predict labels for new, unseen data
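The three steps above can be sketched with a one-feature linear regression fitted by ordinary least squares. The data (home size and price) and the function names are hypothetical, and a real project would normally use a library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Train: find the slope and intercept that minimize squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predict: apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Step 1: labeled data (feature = size in square meters, label = price in thousands).
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]

# Step 2: train the model on the labeled pairs.
model = fit_line(sizes, prices)

# Step 3: predict the label for an unseen input.
estimate = predict(model, 100.0)
```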
Common Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), Neural Networks
Real-World Applications: Email spam detection, house price prediction, medical diagnosis, credit scoring
Vendor Implementations:
| Vendor | Service | Documentation |
| Microsoft | Azure Machine Learning AutoML | learn.microsoft.com/azure/machine-learning/concept-automated-ml |
| AWS | Amazon SageMaker Autopilot | docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development.html |
| Google | Vertex AI AutoML | cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/overview |
| Salesforce | Einstein Prediction Builder | help.salesforce.com/s/articleView?id=sf.einstein_prediction_builder.htm |
2. Unsupervised Learning
Unsupervised learning works with unlabeled data. The algorithm finds hidden patterns, structures, or relationships without being told what to look for.
Common Algorithms: K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA), t-SNE, Autoencoders
Real-World Applications: Customer segmentation, anomaly detection, topic modeling, data compression
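To make the idea of finding structure without labels concrete, here is a minimal sketch of K-Means on one-dimensional data. The spending figures and starting centroids are hypothetical, and the initial centroids are passed in by hand to keep the example deterministic; library implementations choose them automatically.

```python
def assign(points, centroids):
    """Assignment step: attach each point to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters

def k_means_1d(points, initial_centroids, iterations=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical monthly spend per customer; no labels are provided.
monthly_spend = [12.0, 15.0, 14.0, 80.0, 85.0, 90.0]
centroids, clusters = k_means_1d(monthly_spend, initial_centroids=[12.0, 90.0])
```

The algorithm is never told which customers are "low" or "high" spenders; the two groups emerge from the distances alone.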
Vendor Implementations:
| Vendor | Service | Documentation |
| Microsoft | Azure ML K-Means Clustering | learn.microsoft.com/azure/machine-learning/component-reference/k-means-clustering |
| AWS | Amazon SageMaker K-Means | docs.aws.amazon.com/sagemaker/latest/dg/k-means.html |
| Google | BigQuery ML Clustering | cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-kmeans |
| Salesforce | Einstein Discovery Clustering | help.salesforce.com/s/articleView?id=sf.bi_edd_wb_clustering.htm |
3. Reinforcement Learning
Reinforcement learning trains agents to make sequential decisions by rewarding desired behaviors and penalizing undesired ones. The agent learns through trial and error in an environment.
Key Components:
- Agent – The learner/decision maker
- Environment – What the agent interacts with
- State – Current situation of the agent
- Action – Choices the agent can make
- Reward – Feedback signal (positive or negative)
- Policy – Strategy the agent follows
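The components above can be mapped onto a small tabular Q-learning sketch. The corridor environment, the reward of 1.0 at the goal, and the hyperparameter values are all assumptions made for illustration; Q-learning is one common reinforcement learning algorithm, not the only one.

```python
import random

N_STATES = 5          # State: positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)    # Action: index 0 steps left, index 1 steps right

def step(state, action_index):
    """Environment: apply the action and return (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0   # Reward signal
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Agent: learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.randrange(2)   # explore by acting uniformly at random
            next_state, reward = step(state, action)
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Policy: in each non-goal state, pick the action with the higher value."""
    return ["right" if right > left else "left" for left, right in q[:-1]]

q_table = train()
policy = greedy_policy(q_table)
```

After training, the greedy policy moves right in every state, because that is the shortest path to the only reward.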
Vendor Implementations:
| Vendor | Service | Documentation |
| Microsoft | Azure Personalizer | learn.microsoft.com/azure/ai-services/personalizer/ |
| AWS | AWS DeepRacer, SageMaker RL | docs.aws.amazon.com/sagemaker/latest/dg/reinforcement-learning.html |
| Google | Vertex AI RL | cloud.google.com/vertex-ai/docs/training/training-frameworks |
| NVIDIA | Isaac Sim RL | developer.nvidia.com/isaac-sim |
Classification vs. Regression
The two most common supervised learning tasks are classification and regression. Understanding the difference is fundamental.
Classification
Classification predicts discrete categories or classes. The output is one of a predefined set of labels.
Binary Classification – Two possible outcomes: Spam or Not Spam, Fraud or Legitimate, Positive or Negative sentiment
Multi-class Classification – Three or more categories: Image recognition, Document categorization, Disease diagnosis
Evaluation Metrics for Classification:
| Metric | Description | Use Case |
| Accuracy | (Correct predictions) / (Total predictions) | Balanced datasets |
| Precision | (True Positives) / (True Positives + False Positives) | False positives costly |
| Recall | (True Positives) / (True Positives + False Negatives) | False negatives costly |
| F1 Score | Harmonic mean of Precision and Recall | Imbalanced datasets |
| AUC-ROC | Area under Receiver Operating Characteristic curve | Overall performance |
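The first four metrics in the table can be computed directly from the counts of true and false positives and negatives, as in this sketch. The labels are hypothetical (1 = positive class), and AUC-ROC is left out because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground truth and predictions for ten emails (1 = spam).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Here accuracy is 0.8 while precision and recall are both 0.75, which shows how accuracy alone can look better than the per-class picture.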
Regression
Regression predicts continuous numerical values. The output is a number on a continuous scale.
Examples: House price prediction, temperature forecasting, sales revenue projection, customer lifetime value estimation
Evaluation Metrics for Regression:
| Metric | Description | Interpretation |
| MAE | Mean Absolute Error – average absolute difference | Same unit as target |
| MSE | Mean Squared Error – average squared difference | Penalizes large errors |
| RMSE | Root Mean Squared Error – square root of MSE | Same unit as target |
| R-Squared | Proportion of variance explained by model | Usually 0 to 1; higher is better (can be negative on held-out data when the model does worse than predicting the mean) |
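The regression metrics in the table follow directly from the prediction errors. The sketch below computes all four; the prices are hypothetical, and R-Squared is computed as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R-squared for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Hypothetical house prices (in thousands) and model predictions.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 380.0]
metrics = regression_metrics(y_true, y_pred)
```

The single 20-unit miss contributes 400 of the 700 total squared error, which is why MSE and RMSE react more strongly to large errors than MAE does.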
The Machine Learning Workflow
Most ML projects follow a similar lifecycle, regardless of vendor or platform.
Step 1: Problem Definition
Before writing any code, clearly define:
- What business problem are you solving?
- What would success look like?
- Is ML the right approach? (vs. traditional programming or business rules)
- What type of ML task is this? (classification, regression, clustering, etc.)
Step 2: Data Collection and Preparation
Data is the foundation of machine learning. This step is commonly estimated to consume 60-80% of project time.
Key Activities: Data Collection, Data Exploration, Data Cleaning, Feature Engineering, Data Splitting
Vendor Tools for Data Preparation:
| Vendor | Service | Purpose |
| Microsoft | Azure Data Factory, Azure ML Data Prep | ETL, data transformation |
| AWS | AWS Glue, SageMaker Data Wrangler | Data integration, preparation |
| Google | Dataflow, Vertex AI Feature Store | Data pipelines, features |
| Salesforce | Data Cloud, Einstein Data Prep | CRM data integration |
Step 3: Model Selection and Training
Choose an appropriate algorithm and train it on your data.
Considerations: Dataset size, feature types, interpretability requirements, computational constraints, accuracy requirements
Step 4: Model Evaluation
Assess model performance using held-out test data that the model has never seen.
Key Concepts:
- Training Error – Error on data used to train the model
- Validation Error – Error on data used to tune hyperparameters
- Test Error – Error on completely unseen data (final evaluation)
- Generalization – Model’s ability to perform well on new data
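Measuring these three errors separately requires three disjoint subsets of the data. One way to produce them is a shuffled split, sketched below; the 60/20/20 proportions, the seed, and the function name are assumptions, not a recommendation from any vendor.

```python
import random

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle the rows and cut them into disjoint train, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical dataset of 100 row identifiers.
rows = list(range(100))
train, val, test = train_val_test_split(rows)
```

The test set should be set aside and used only once, for the final evaluation; tuning against it turns test error into another form of validation error.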
Step 5: Model Deployment
Move the trained model to production where it can make predictions on new data.
Vendor Deployment Services:
| Vendor | Service | Documentation |
| Microsoft | Azure ML Endpoints | learn.microsoft.com/azure/machine-learning/concept-endpoints |
| AWS | SageMaker Endpoints | docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html |
| Google | Vertex AI Predictions | cloud.google.com/vertex-ai/docs/predictions/get-predictions |
| NVIDIA | Triton Inference Server | developer.nvidia.com/triton-inference-server |
Step 6: Monitoring and Maintenance
ML models degrade over time as real-world data changes (data drift, concept drift).
What to Monitor: Prediction accuracy over time, data drift, model latency and throughput, resource utilization
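A very simple data drift check compares a feature's recent mean with its mean at training time, measured in units of the training standard deviation. The sketch below shows the idea; the threshold of 2.0, the sample values, and the function names are assumptions, and production monitoring tools typically use richer statistical tests.

```python
import statistics

def mean_shift_in_std_units(baseline, recent):
    """Distance between the recent mean and the baseline mean, in baseline std units."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)   # assumes the baseline is not constant
    return abs(statistics.mean(recent) - base_mean) / base_std

def has_drifted(baseline, recent, threshold=2.0):
    """Flag drift when the mean has moved more than `threshold` standard deviations."""
    return mean_shift_in_std_units(baseline, recent) > threshold

# Hypothetical values of one input feature at training time and in production.
training_values = [48.0, 50.0, 52.0, 49.0, 51.0, 50.0]
stable_batch = [49.0, 51.0, 50.0, 50.0]
shifted_batch = [58.0, 60.0, 62.0, 59.0]

stable_flag = has_drifted(training_values, stable_batch)
shifted_flag = has_drifted(training_values, shifted_batch)
```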
Bias-Variance Tradeoff
One of the most important concepts in machine learning is the bias-variance tradeoff.
Understanding Bias
Bias is error from overly simplistic assumptions in the learning algorithm. High bias causes the model to miss relevant patterns (underfitting).
Symptoms of High Bias: Poor performance on training data, poor performance on test data, model is too simple for the problem
Understanding Variance
Variance is error from sensitivity to small fluctuations in the training data. High variance causes the model to fit noise rather than signal (overfitting).
Symptoms of High Variance: Excellent performance on training data, poor performance on test data, model is too complex
The Tradeoff
For a fixed dataset, you generally cannot minimize both bias and variance simultaneously. Reducing one typically increases the other.
Total Error = Bias² + Variance + Irreducible Error
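For squared-error loss, this decomposition can be written more formally. The notation below is an assumption introduced here, not taken from the guide: the target is y = f(x) + ε with zero-mean noise of variance σ², the trained model is \hat{f}, and the expectations are taken over random training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible Error}}
```

The last term depends only on the noise in the data, which is why no choice of model can remove it.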
Finding the Sweet Spot:
- Start simple, increase complexity gradually
- Use cross-validation to detect overfitting
- Apply regularization to reduce variance
- Gather more data to reduce variance
- Add features to reduce bias
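Cross-validation, mentioned in the list above, rotates which part of the data is held out so that every sample is used for validation exactly once. The sketch below only generates the index splits for k folds; the function name is hypothetical, and training and scoring a model on each split is left out.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs over contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        training = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((training, validation))
        start = stop
    return folds

folds = k_fold_indices(n_samples=10, k=3)
```

A large gap between the average training score and the average validation score across folds is a typical sign of high variance. If the rows are ordered (for example by date or by class), shuffle them first or use a splitting scheme suited to that ordering.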
Feature Engineering
Feature engineering is the process of using domain knowledge to create features that make machine learning algorithms work better.
Common Feature Engineering Techniques
1. Handling Missing Values
- Remove rows/columns with missing values
- Impute with mean, median, or mode
- Use advanced imputation (KNN, regression)
- Create a “missing” indicator feature
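Two of the options above, mean imputation and a "missing" indicator, are often combined so the model keeps the information that a value was absent. A minimal sketch, assuming missing entries are represented as `None` and using hypothetical age values:

```python
def impute_mean_with_indicator(values):
    """Fill missing entries (None) with the observed mean and record where they were."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)   # assumes at least one observed value
    filled = [mean if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return filled, was_missing

# Hypothetical customer ages with two missing entries.
ages = [30.0, None, 50.0, 40.0, None]
ages_filled, age_missing = impute_mean_with_indicator(ages)
```

In practice the mean should be computed on the training split only and then reused for validation and test data, so that no information leaks from held-out rows.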
2. Encoding Categorical Variables
| Technique | Description | When to Use |
| One-Hot Encoding | Create binary column for each category | Nominal, <10 categories |
| Label Encoding | Assign integer to each category | Ordinal categories |
| Target Encoding | Replace with mean of target variable | High cardinality |
| Embedding | Learn dense vector representation | Very high cardinality |
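The first two techniques in the table can be sketched in a few lines. The category values and function names are hypothetical; the label encoder takes the category order explicitly because the integers are only meaningful when the order is.

```python
def one_hot_encode(values):
    """Create one binary column per category (columns in sorted category order)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def label_encode(values, order):
    """Map each ordinal category to its position in the given order."""
    mapping = {category: index for index, category in enumerate(order)}
    return [mapping[v] for v in values]

# Nominal feature: no natural order, so use one-hot encoding.
colors = ["red", "green", "blue", "green"]
color_columns, color_rows = one_hot_encode(colors)

# Ordinal feature: the order carries meaning, so integers are appropriate.
sizes = ["small", "large", "medium"]
size_codes = label_encode(sizes, order=["small", "medium", "large"])
```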
3. Feature Scaling
| Technique | Formula | Range | Use For |
| Min-Max Scaling | (x - min) / (max - min) | [0, 1] | Neural nets, KNN |
| Standardization | (x - mean) / std | Unbounded; mean 0, std 1 (most values fall within about [-3, 3]) | Linear models, SVM |
| Robust Scaling | (x - median) / IQR | varies | Data with outliers |
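The three formulas in the table translate directly into code. In this sketch the sample values are hypothetical, standardization uses the population standard deviation, and the quartiles come from the standard library's "inclusive" method; other quartile conventions give slightly different IQR values.

```python
import statistics

def min_max_scale(xs):
    """(x - min) / (max - min): maps the data onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """(x - mean) / std: result has mean 0 and standard deviation 1."""
    mean = statistics.mean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]

def robust_scale(xs):
    """(x - median) / IQR: the median and quartiles resist outliers."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]

# Hypothetical feature with one large outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled_min_max = min_max_scale(values)
scaled_standard = standardize(values)
scaled_robust = robust_scale(values)
```

With the outlier present, min-max scaling squeezes the first four values into a narrow band near 0, while robust scaling keeps them evenly spaced around 0.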
Key Takeaways
- Machine Learning enables systems to learn from data rather than explicit programming
- Three main types: Supervised (labeled data), Unsupervised (unlabeled data), Reinforcement (reward-based learning)
- Classification predicts categories; Regression predicts continuous values
- The ML workflow is consistent across platforms: Define → Prepare → Train → Evaluate → Deploy → Monitor
- Bias-Variance tradeoff is fundamental: Too simple = underfitting; Too complex = overfitting
- Feature engineering often has more impact than algorithm choice
- All major cloud vendors offer similar ML capabilities with different interfaces
Additional Learning Resources
Official Vendor Documentation
- Microsoft Learn: learn.microsoft.com/training/paths/create-machine-learning-models/
- AWS Training: aws.amazon.com/training/learn-about/machine-learning/
- Google ML Crash Course: developers.google.com/machine-learning/crash-course
- NVIDIA Deep Learning Institute: nvidia.com/en-us/training/
- CompTIA AI Fundamentals: comptia.org/certifications/ai-fundamentals
- Salesforce Trailhead: trailhead.salesforce.com/content/learn/modules/ai-basics
Certification Study Guides
- Azure AI-900: Basic AI
- AWS ML Specialty: aws.amazon.com/certification/certified-machine-learning-specialty/
- Google ML Engineer: cloud.google.com/learn/certification/machine-learning-engineer
- Salesforce Agentforce Specialist: Specializing in Agentforce Practice Exam
Article 1 of 5 | AI/ML Foundations Training Series
Level: Beginner | Estimated Reading Time: 25 minutes | Last Updated: February 2025
