Generative AI and Large Language Models

A Cross-Vendor Training Guide

Certification Alignment: Azure AI-102, AWS ML Specialty, Google ML Engineer, Salesforce AI Specialist, NVIDIA DLI GenAI

Introduction

Generative AI represents a paradigm shift in artificial intelligence. Rather than simply classifying or predicting, these models create new content—text, images, code, audio, and video. Large Language Models (LLMs) like GPT-4, Claude, Gemini, and Llama have demonstrated remarkable capabilities in understanding and generating human-like text.

This guide provides comprehensive coverage of generative AI concepts, architectures, and practical implementation across all major cloud platforms.

What Is Generative AI?

Generative AI refers to AI systems that can create new content rather than just analyzing existing data. These models learn patterns from training data and use that knowledge to generate novel outputs.

Generative vs. Discriminative Models

| Aspect | Discriminative | Generative |
| --- | --- | --- |
| Goal | Classify or predict labels | Generate new data samples |
| Learns | P(y|x) – probability of label given input | P(x) or P(x|y) – data distribution |
| Examples | Logistic Regression, SVM, CNN classifiers | GPT, DALL-E, Stable Diffusion, VAEs |
| Output | Class labels, predictions | New text, images, audio, code |

Types of Generative AI

| Type | Description | Examples |
| --- | --- | --- |
| Text Generation | Create human-like text, answer questions, write code | GPT-4, Claude, Gemini, Llama |
| Image Generation | Create images from text descriptions | DALL-E 3, Midjourney, Stable Diffusion |
| Code Generation | Write and explain programming code | GitHub Copilot, Code Llama, StarCoder |
| Audio Generation | Create speech, music, sound effects | ElevenLabs, Suno, AudioCraft |
| Video Generation | Create video content from prompts | Sora, Runway, Pika |
| Multimodal | Process and generate multiple modalities | GPT-4V, Gemini, Claude 3 |

Foundation Models

Foundation models are large AI models trained on broad data that can be adapted to many downstream tasks. They represent a shift from task-specific models to general-purpose AI.

 

Characteristics of Foundation Models

  • Scale: Billions to trillions of parameters
  • Pre-training: Trained on massive, diverse datasets
  • Adaptability: Fine-tuned or prompted for specific tasks
  • Emergence: Capabilities that emerge only at scale
  • Transfer Learning: Knowledge transfers across domains

Major Foundation Model Providers

| Provider | Models | Access |
| --- | --- | --- |
| OpenAI | GPT-4, GPT-4o, DALL-E 3 | API, Azure OpenAI Service |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus | API, AWS Bedrock, Google Vertex |
| Google | Gemini Pro, Gemini Ultra, PaLM 2 | Vertex AI, Google AI Studio |
| Meta | Llama 3, Llama 2, Code Llama | Open source, cloud platforms |
| Mistral | Mistral Large, Mixtral 8x7B | API, AWS Bedrock, Azure |
| Cohere | Command R+, Embed | API, AWS Bedrock |

The Transformer Architecture

The Transformer architecture, introduced in the 2017 paper “Attention Is All You Need,” is the foundation of modern LLMs. It revolutionized NLP by enabling parallel processing of sequences through self-attention.

Key Components

1. Self-Attention Mechanism

Self-attention allows each position in a sequence to attend to all other positions, capturing long-range dependencies.

How it works:

  1. Each token is projected into Query (Q), Key (K), and Value (V) vectors
  2. Attention is computed as Attention(Q, K, V) = softmax(QK^T / √d_k) V
  3. The softmax weights determine how much each token attends to every other token
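The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the formula, not code from any real model; the shapes and random values are arbitrary.

```python
# Minimal scaled dot-product attention sketch (NumPy).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq, seq) raw scores
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.standard_normal((seq_len, d_k)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8) — one output vector per input position
```

Each row of the softmax weights sums to 1, so every output position is a convex combination of the value vectors.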

2. Multi-Head Attention

Instead of single attention, use multiple attention “heads” in parallel, each learning different relationship patterns.

Benefit: Captures different types of relationships (syntactic, semantic, positional)

3. Positional Encoding

Since attention has no inherent notion of order, positional information is added to embeddings.

Methods: Sinusoidal encoding (original), learned positional embeddings, Rotary Position Embedding (RoPE)

4. Feed-Forward Networks

After attention, each position passes through the same position-wise feed-forward network (typically a two-layer MLP).

5. Layer Normalization and Residual Connections

Stabilize training and enable very deep networks through skip connections and normalization.

Transformer Variants

| Type | Architecture | Examples |
| --- | --- | --- |
| Encoder-Only | Bidirectional attention, good for understanding | BERT, RoBERTa |
| Decoder-Only | Causal attention, good for generation | GPT, Llama, Claude |
| Encoder-Decoder | Both components, good for translation | T5, BART |

 

Vendor References:

  • NVIDIA: nvidia.com/blog/understanding-transformer-model-architectures/
  • Google: tensorflow.org/text/tutorials/transformer

Tokenization

Tokenization converts text into numerical tokens that models can process. The tokenization strategy significantly impacts model performance and efficiency.

Tokenization Methods

| Method | Description | Used By |
| --- | --- | --- |
| Word-level | Each word is a token. Large vocabulary, out-of-vocabulary (OOV) issues. | Older models |
| Character-level | Each character is a token. Small vocabulary, long sequences. | Specialized tasks |
| BPE | Byte-Pair Encoding. Iteratively merges frequent symbol pairs. | GPT, RoBERTa |
| WordPiece | Similar to BPE, with likelihood-based merging. | BERT |
| SentencePiece | Language-agnostic; treats text as a raw stream with no whitespace pre-tokenization. | T5, Llama |
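The core of BPE training is a simple loop: count adjacent symbol pairs across the corpus, then merge the most frequent pair into a new symbol. The sketch below is a toy illustration of one such step; production tokenizers operate on bytes and apply tens of thousands of merges.

```python
# Toy illustration of one Byte-Pair Encoding (BPE) training step.
from collections import Counter

def most_frequent_pair(corpus):
    """corpus: list of words, each a list of symbols. Returns the top pair."""
    pairs = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with its concatenation."""
    merged = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

corpus = [list("lower"), list("lowest"), list("low")]
pair = most_frequent_pair(corpus)   # ('l', 'o') occurs in all three words
corpus = merge_pair(corpus, pair)
print(corpus[2])  # ['lo', 'w']
```

Repeating this merge step builds the subword vocabulary: frequent sequences become single tokens while rare words stay decomposable.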

Context Windows

The context window defines how much text a model can process at once. Longer contexts enable richer understanding, but self-attention’s computational cost grows quadratically with sequence length.

 

| Model | Context Window | Approx. Words |
| --- | --- | --- |
| GPT-3.5 | 4K – 16K tokens | 3K – 12K words |
| GPT-4 Turbo | 128K tokens | ~96K words |
| Claude 3 | 200K tokens | ~150K words |
| Gemini 1.5 Pro | 1M+ tokens | ~750K words |

Pre-training and Fine-tuning

Pre-training

Foundation models are pre-trained on massive datasets using self-supervised objectives.

Common Pre-training Objectives:

  • Causal Language Modeling (CLM): Predict next token given previous tokens (GPT-style)
  • Masked Language Modeling (MLM): Predict masked tokens (BERT-style)
  • Span Corruption: Predict corrupted spans (T5-style)

Fine-tuning Methods

Adapt pre-trained models to specific tasks or domains.

 

Full Fine-tuning

Update all model parameters. Highest quality but most expensive.

Use when: You have sufficient data and compute, need maximum customization

Parameter-Efficient Fine-tuning (PEFT)

Update only a small subset of parameters, keeping most frozen.

LoRA (Low-Rank Adaptation):

  • Inject trainable low-rank matrices into attention layers
  • Typically <1% of original parameters trained
  • Can be merged back for inference efficiency
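The idea can be sketched numerically: the frozen weight W receives a trainable low-rank update B·A. The dimensions, rank, and scaling factor below are illustrative, not taken from any specific model.

```python
# LoRA sketch: effective weight is W + (alpha / r) * (B @ A), with r << d.
import numpy as np

d, r = 4096, 8                           # hidden size and LoRA rank
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pre-trained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                     # trainable, zero-init so W' == W at start
alpha = 16                               # scaling hyperparameter

W_effective = W + (alpha / r) * (B @ A)  # can be merged once for inference

full_params = d * d
lora_params = d * r + r * d
print(f"trainable fraction: {lora_params / full_params:.4%}")  # 0.3906%
```

Because B starts at zero, training begins exactly at the pre-trained behavior, and merging B·A back into W afterwards adds no inference latency.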

QLoRA:

  • Combines LoRA with 4-bit quantization
  • Enables fine-tuning large models on consumer GPUs

Adapter Layers:

  • Insert small trainable modules between frozen layers

Instruction Fine-tuning

Train model to follow instructions using instruction-response pairs.

Result: Models that are better at following user requests

 

RLHF (Reinforcement Learning from Human Feedback)

Align model outputs with human preferences using reinforcement learning.

Process:

  1. Collect human preferences on model outputs
  2. Train a reward model on these preferences
  3. Use PPO (or a similar RL algorithm) to optimize the LLM against the reward model

Vendor Fine-tuning Services

| Vendor | Service | Documentation |
| --- | --- | --- |
| Microsoft | Azure OpenAI Fine-tuning | learn.microsoft.com/azure/ai-services/openai/how-to/fine-tuning |
| AWS | Bedrock Custom Models | docs.aws.amazon.com/bedrock/latest/userguide/custom-models.html |
| Google | Vertex AI Model Tuning | cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models |
| NVIDIA | NeMo Framework | docs.nvidia.com/nemo-framework/user-guide/latest/ |

Prompt Engineering

Prompt engineering is the practice of designing inputs that elicit desired outputs from LLMs. It’s a critical skill for working effectively with generative AI.

Prompting Techniques

1. Zero-Shot Prompting

Ask the model to perform a task without examples.

Example: “Classify this review as positive or negative: ‘Great product!’”

2. Few-Shot Prompting

Provide examples to guide the model’s behavior.

Example: “Review: ‘Loved it!’ → Positive. Review: ‘Terrible.’ → Negative. Review: ‘Pretty good’ →”

3. Chain-of-Thought (CoT)

Ask the model to show its reasoning step-by-step.

Trigger: Add “Let’s think step by step” or provide reasoning examples

Benefit: Dramatically improves performance on complex reasoning tasks

4. ReAct (Reasoning + Acting)

Combine reasoning traces with actions (like tool use).

Pattern: Thought → Action → Observation → Thought → …

5. Self-Consistency

Generate multiple reasoning paths and take majority vote.

Benefit: Reduces errors by leveraging diverse reasoning
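Once the reasoning paths are sampled, self-consistency reduces to a majority vote over the final answers. A minimal sketch, with the model outputs mocked as a plain list:

```python
# Self-consistency sketch: sample several reasoning paths, keep the majority answer.
from collections import Counter

def majority_vote(answers):
    """Pick the most common final answer across sampled reasoning paths."""
    return Counter(answers).most_common(1)[0][0]

# Mocked final answers from, say, 5 temperature-sampled chain-of-thought runs:
sampled_answers = ["42", "42", "41", "42", "40"]
print(majority_vote(sampled_answers))  # 42
```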

Prompt Structure Best Practices

  • Role/Persona: “You are an expert data scientist…”
  • Context: Provide relevant background information
  • Task: Clear, specific instruction
  • Format: Specify desired output structure
  • Examples: Provide few-shot examples when helpful
  • Constraints: Define limitations and requirements
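These components can be assembled programmatically. The sketch below is our own convention for illustration, not any vendor’s API; adapt the fields to your model’s message format.

```python
# Build a prompt from the best-practice components: role, context, task,
# format, optional few-shot examples, and constraints.
def build_prompt(role, context, task, output_format, examples=None, constraints=None):
    parts = [f"You are {role}.", f"Context:\n{context}", f"Task: {task}"]
    if examples:
        shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
        parts.append(f"Examples:\n{shots}")
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    parts.append(f"Respond in this format: {output_format}")
    return "\n\n".join(parts)

prompt = build_prompt(
    role="an expert data scientist",
    context="Customer reviews for an e-commerce site.",
    task="Classify the sentiment of the review.",
    output_format="a single word, Positive or Negative",
    examples=[("Loved it!", "Positive"), ("Terrible.", "Negative")],
    constraints=["Do not explain your answer."],
)
print(prompt)
```

Keeping the sections explicit makes prompts easy to version, test, and A/B compare as templates rather than ad-hoc strings.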

Vendor Prompt Engineering Resources

| Vendor | Documentation |
| --- | --- |
| Microsoft | learn.microsoft.com/azure/ai-services/openai/concepts/prompt-engineering |
| Google | cloud.google.com/vertex-ai/docs/generative-ai/learn/prompts/introduction-prompt-design |
| AWS | docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html |
| Anthropic | docs.anthropic.com/en/docs/build-with-claude/prompt-engineering |

Retrieval-Augmented Generation (RAG)

RAG combines LLMs with external knowledge retrieval to provide more accurate, up-to-date, and verifiable responses.

Why RAG?

LLM Limitations:

  • Knowledge cutoff – can’t access recent information
  • Hallucinations – generate plausible but false information
  • No access to private/proprietary data

RAG Solutions:

  • Ground responses in retrieved documents
  • Access current and private information
  • Provide citations for verification

RAG Architecture

  1. Indexing: Documents → Chunking → Embedding → Vector Store
  2. Retrieval: Query → Embed → Similarity Search → Relevant Chunks
  3. Generation: Prompt + Context → LLM → Response
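The three stages can be sketched end to end. Here a bag-of-words count vector stands in for a learned embedding and the “vector store” is a plain list; a real system would use an embedding model and a vector database.

```python
# Minimal RAG retrieval sketch: index, retrieve, build the augmented prompt.
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: chunk documents and embed each chunk
chunks = [
    "The refund policy allows returns within 30 days.",
    "Shipping is free for orders over 50 dollars.",
    "Support is available by chat and email.",
]
index = [(c, embed(c)) for c in chunks]

# 2. Retrieval: embed the query and rank chunks by similarity
query = "How many days do I have to return an item?"
q = embed(query)
best_chunk, _ = max(index, key=lambda item: cosine(q, item[1]))

# 3. Generation: augment the prompt with the retrieved context
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {query}"
print(best_chunk)
```

The LLM then answers from the retrieved chunk, which is what allows the response to be grounded and cited.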

Key RAG Components

Chunking Strategies

| Strategy | Description | Best For |
| --- | --- | --- |
| Fixed-size | Split at fixed token/character count | Simple documents |
| Semantic | Split at natural boundaries (paragraphs) | Structured content |
| Recursive | Hierarchical splitting with overlap | General purpose |
| Document-aware | Respect document structure (headers) | Technical docs |
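Fixed-size chunking with overlap, the simplest of the strategies above, can be sketched as follows. Sizes are in characters for simplicity; real pipelines usually count tokens.

```python
# Fixed-size chunking with overlap so context is not lost at chunk boundaries.
def chunk_text(text, chunk_size=200, overlap=50):
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars
    return chunks

doc = "x" * 500
pieces = chunk_text(doc, chunk_size=200, overlap=50)
print([len(c) for c in pieces])  # [200, 200, 200, 50]
```

The overlap means a sentence cut at one boundary still appears whole in the neighboring chunk, at the cost of some index redundancy.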

Vector Embeddings

Convert text to dense numerical vectors that capture semantic meaning.

Popular Embedding Models:

  • OpenAI text-embedding-3-large
  • Cohere Embed v3
  • Google Vertex Embeddings
  • Open source: BGE, E5, GTE

Vector Databases

| Database | Type | Best For |
| --- | --- | --- |
| Pinecone | Managed cloud service | Production, scale |
| Weaviate | Open source, managed | Hybrid search |
| Chroma | Open source, embedded | Development, prototyping |
| pgvector | PostgreSQL extension | Existing Postgres infra |
| Qdrant | Open source, managed | Performance, filtering |

Vendor RAG Services

| Vendor | Service | Documentation |
| --- | --- | --- |
| AWS | Bedrock Knowledge Bases | docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html |
| Microsoft | Azure AI Search + OpenAI | learn.microsoft.com/azure/search/retrieval-augmented-generation-overview |
| Google | Vertex AI RAG Engine | cloud.google.com/vertex-ai/docs/generative-ai/rag-overview |
| NVIDIA | NeMo Retriever | developer.nvidia.com/blog/rag-101-retrieval-augmented-generation-questions-answered/ |

AI Agents and Tool Use

AI agents are LLM-powered systems that can take actions, use tools, and work autonomously toward goals.

Agent Capabilities

  • Tool Use: Call APIs, search web, execute code, query databases
  • Planning: Break complex tasks into steps
  • Memory: Maintain context across interactions
  • Reasoning: Decide which actions to take
  • Self-correction: Learn from errors and adjust

Function Calling / Tool Use

Modern LLMs can be trained to output structured function calls that applications can execute.

Pattern:

  1. Define available tools with schemas
  2. LLM decides when/which tool to use
  3. Application executes the function
  4. Results returned to LLM for continued reasoning
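The four-step pattern can be sketched with a mocked model decision. In a real system the structured tool call comes from the LLM’s output rather than a hard-coded string, and the tool registry would describe real APIs; everything below is a hypothetical stand-in.

```python
# Function-calling sketch: tool registry, mocked LLM decision, dispatch, result.
import json

# 1. Define available tools with schemas
TOOLS = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {"city": "string"},
        "fn": lambda city: {"city": city, "temp_c": 21},  # stub implementation
    }
}

# 2. The LLM decides when/which tool to use (mocked structured output here)
llm_output = json.dumps({"tool": "get_weather", "arguments": {"city": "Paris"}})

# 3. The application executes the function
call = json.loads(llm_output)
result = TOOLS[call["tool"]]["fn"](**call["arguments"])

# 4. The result is returned to the LLM for continued reasoning
followup_message = {"role": "tool", "content": json.dumps(result)}
print(followup_message["content"])  # {"city": "Paris", "temp_c": 21}
```

Note that the model never executes anything itself: it only emits the structured call, and the application stays in control of what actually runs.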

Agent Frameworks

| Framework | Description |
| --- | --- |
| LangChain | Popular framework for building LLM applications with agents, chains, and tools |
| LlamaIndex | Framework focused on data ingestion and RAG with agent capabilities |
| AutoGen | Microsoft’s multi-agent conversation framework |
| CrewAI | Framework for orchestrating role-playing AI agents |

Vendor Agent Services

| Vendor | Service | Documentation |
| --- | --- | --- |
| Microsoft | Azure AI Agent Service | learn.microsoft.com/azure/ai-services/agents/ |
| AWS | Bedrock Agents | docs.aws.amazon.com/bedrock/latest/userguide/agents.html |
| Google | Vertex AI Agent Builder | cloud.google.com/vertex-ai/docs/generative-ai/agent-builder/overview |
| Salesforce | Agentforce | salesforce.com/agentforce/ |

Generative AI Safety

Ensuring safe and responsible use of generative AI requires addressing unique risks.

Key Risks

  • Hallucinations: Model generates false but plausible information
  • Harmful Content: Generation of toxic, illegal, or dangerous content
  • Prompt Injection: Malicious inputs that hijack model behavior
  • Data Leakage: Model reveals training data or private information
  • Bias Amplification: Model amplifies biases present in training data

Mitigation Strategies

  • Input Validation: Filter and sanitize user inputs
  • Output Filtering: Detect and block harmful outputs
  • Grounding: Use RAG to ground responses in verified sources
  • Guardrails: Implement content policies and system prompts
  • Human Review: Human-in-the-loop for high-stakes outputs
  • Red Teaming: Proactively test for vulnerabilities

Key Takeaways

  1. Generative AI creates new content – text, images, code, audio, video
  2. Foundation models are general-purpose – adapt through fine-tuning or prompting
  3. Transformers power modern LLMs – self-attention enables parallel processing
  4. Prompt engineering is essential – technique choice significantly impacts results
  5. RAG grounds responses in facts – reduces hallucinations, enables private data
  6. Fine-tuning customizes models – PEFT methods enable efficient adaptation
  7. Agents extend LLM capabilities – tool use, planning, autonomous action
  8. Safety requires active measures – guardrails, filtering, human oversight

Additional Learning Resources

Official Vendor Documentation

  • Azure OpenAI: learn.microsoft.com/azure/ai-services/openai/
  • AWS Bedrock: aws.amazon.com/bedrock/
  • Google Vertex AI: cloud.google.com/vertex-ai/docs/generative-ai/learn/overview
  • Salesforce Einstein GPT: help.salesforce.com/s/articleView?id=sf.generative_ai_trust_layer.htm
  • NVIDIA NeMo: docs.nvidia.com/nemo-framework/user-guide/latest/

Certification Preparation

  • Azure AI-102: learn.microsoft.com/certifications/exams/ai-102
  • AWS ML Specialty: aws.amazon.com/certification/certified-machine-learning-specialty/
  • Google ML Engineer: cloud.google.com/learn/certification/machine-learning-engineer
  • Salesforce AI Specialist: trailhead.salesforce.com/credentials/aispecialist
  • NVIDIA DLI GenAI: nvidia.com

 

Article 6 | AI/ML Training Series – Generative AI Track

PowerKram Career Preparation Resources

Preparing for a certification exam aligned with this content? PowerKram offers objective-based practice exams built by industry experts, with detailed explanations for every question and scoring by vendor domain. Start with a free 24-hour trial.

Level: Intermediate to Advanced | Estimated Reading Time: 35 minutes | Last Updated: February 2025

Part of the Complete AI & Machine Learning Guide

This article is part of The Complete Guide to AI and Machine Learning, a comprehensive pillar guide covering every essential AI/ML discipline from foundations to production deployment. The pillar guide maps how this topic connects to the broader AI/ML ecosystem and provides business context, common misconceptions, and underutilized capabilities for each area.

Continue Your Learning

Explore these related articles in the AI/ML training series to deepen your expertise across the full stack:


Practice Questions

A data science team at a consumer lending company is building an AI model to approve or deny personal loan applications. The compliance officer insists the model must achieve Demographic Parity, Equalized Odds, AND Predictive Parity simultaneously to satisfy all stakeholders. The lead ML engineer pushes back, citing a fundamental limitation.

Why is the compliance officer’s requirement problematic?

A) These three metrics can only be satisfied simultaneously if the model uses protected attributes as direct input features.

B) Achieving all three metrics requires an interpretable model architecture such as logistic regression, which would sacrifice accuracy.

C) These metrics are designed for classification tasks only and cannot be applied to the continuous probability scores used in lending decisions.

D) It is mathematically proven that — except in trivial cases — Demographic Parity, Equalized Odds, and Predictive Parity cannot all be satisfied simultaneously, so the organization must choose which definition of fairness is most appropriate for their context.

Correct Answer: D

Explanation: This reflects the Impossibility Theorem described in the Fairness Metrics section. These three fairness definitions are mathematically incompatible in all but trivial cases (e.g., when base rates are identical across groups). Organizations must make a deliberate, documented choice about which fairness metric best fits their use case, regulatory requirements, and stakeholder values. The other options introduce incorrect preconditions — using protected attributes, requiring specific architectures, or limiting metric applicability — none of which are the actual constraint.

A consortium of five hospitals wants to collaboratively train a diagnostic AI model for a rare disease. Data privacy regulations such as HIPAA prohibit sharing patient records across institutions, and no single hospital has enough data to train an accurate model independently. The consortium needs a technique that enables collaborative model training while keeping all patient data within each hospital’s infrastructure.

Which privacy-preserving technique is BEST suited to this scenario?

A) Homomorphic encryption, which allows the hospitals to upload encrypted patient records to a shared cloud server where the model is trained on ciphertext without ever decrypting the data.

B) Federated learning, where a global model is sent to each hospital, trained locally on that hospital’s patient data, and only aggregated model updates — not raw data — are shared with a central server.

C) Differential privacy, which adds calibrated noise to each hospital’s patient records before they are combined into a single centralized training dataset.

D) Synthetic data generation, where each hospital creates artificial patient records that mimic statistical patterns and then shares the synthetic datasets for centralized model training.

Correct Answer: B

Explanation: Federated learning is specifically designed for this scenario — it enables collaborative model training across decentralized data sources without centralizing the raw data. The model travels to the data, not the other way around. Each hospital trains locally, and only model gradients (updates) are aggregated centrally. While homomorphic encryption is a valid privacy technique, it is computationally expensive and does not directly address the distributed training challenge. Differential privacy with centralized data still requires sharing records. Synthetic data loses fidelity for rare diseases where subtle clinical patterns matter most.

A corporate legal department has deployed an AI system to review vendor contracts and flag potentially risky clauses. After initial deployment as a fully automated system (human-out-of-the-loop), the tool missed several unusual liability clauses that fell outside its training patterns, exposing the company to significant financial risk. Leadership wants to redesign the system to balance efficiency with risk mitigation.

Which approach BEST addresses this situation while maintaining operational efficiency?

A) Retrain the model on a larger dataset of contracts that includes the unusual liability clauses it missed, then redeploy as a fully automated system with quarterly accuracy audits.

B) Replace the AI system entirely with a team of paralegals who manually review all contracts, since AI has proven unreliable for legal document analysis.

C) Implement a human-on-the-loop model with confidence-based routing, where high-confidence contract reviews are auto-approved with sampling, and low-confidence or high-value contracts are escalated to attorneys for review.

D) Switch to an interpretable rule-based system that uses keyword matching to flag risky clauses, since black-box AI models cannot be trusted for legal decisions.

Correct Answer: C

Explanation: The human-on-the-loop model with confidence-based routing directly addresses the core problem: fully automated systems miss edge cases, while fully manual review is inefficient. By routing decisions based on the model’s confidence level, the organization captures the efficiency benefits of automation for routine contracts while ensuring human expertise is applied to uncertain or high-value cases. This matches the document’s guidance that the appropriate level of human oversight should be calibrated to the risk, impact, and reversibility of decisions. Simply retraining doesn’t prevent future novel patterns from being missed. Abandoning AI entirely sacrifices the efficiency gains. Rule-based keyword matching is too rigid for complex legal language.

A fintech company uses a gradient-boosted ensemble model to evaluate personal loan applications. A financial regulator has issued an inquiry requiring the company to provide individual-level explanations for each applicant who was denied credit — specifically, they must cite the top contributing factors for every adverse decision and show applicants what changes would improve their outcome.

Which combination of explainability techniques BEST satisfies both regulatory requirements?

A) SHAP values to identify the top features contributing to each denial, combined with counterfactual explanations to show applicants the smallest changes that would produce a different outcome.

B) Global feature importance rankings to show which factors the model weighs most heavily across all decisions, combined with partial dependence plots to illustrate how each feature affects predictions on average.

C) A global surrogate model (decision tree) trained to approximate the ensemble’s behavior, which can then be presented to regulators as the actual decision logic.

D) Attention visualization to show which parts of the application the model focuses on, combined with LIME to fit a local linear model around each prediction.

Correct Answer: A

Explanation: The regulator requires two things: (1) individual-level factor attribution for each denial, and (2) actionable guidance for applicants. SHAP values provide mathematically rigorous, game-theoretic feature contributions for individual predictions — making them the gold standard for per-decision explanations. Counterfactual explanations identify the smallest input changes needed to flip the outcome, directly addressing the ‘what would need to change’ requirement. Global feature importance and PDP are aggregate techniques that do not explain individual decisions. A surrogate model is an approximation and misrepresents the actual decision process. Attention visualization applies to neural networks and transformers, not gradient-boosted ensembles.

A global consumer brand is deploying a generative AI system to create personalized marketing emails at scale across diverse international markets. During pilot testing, the system occasionally produces culturally insensitive content when targeting specific demographic segments, including stereotypical references and tone-deaf messaging that could damage the brand’s reputation.

Which set of safeguards is MOST comprehensive for responsible deployment of this generative AI system?

A) Translate all marketing content into English first, run it through a single toxicity filter, and then translate it back into the target language before sending.

B) Restrict the generative AI to producing content only in English for all markets, and hire local translators to manually adapt every email for cultural relevance.

C) Add a disclaimer to each email stating that the content was generated by AI, which satisfies transparency requirements and shifts responsibility away from the brand.

D) Implement a multi-layer pipeline: prompt engineering with cultural sensitivity guidelines, automated toxicity and bias detection on outputs, human review sampling with higher rates for diverse segments, and a recipient feedback mechanism to flag inappropriate content.

Correct Answer: D

Explanation: The multi-layer pipeline approach addresses the problem at every stage — from input (prompt engineering with cultural guidelines), through processing (automated toxicity and bias detection), to output (human review sampling and recipient feedback). This aligns with the document’s guidance on responsible generative AI deployment, which emphasizes content filtering, human review for high-stakes content, transparent disclosure, and red-team testing. Translating to English and back introduces translation artifacts and misses cultural nuance. Restricting to English ignores the reality of global marketing. A disclaimer alone does not prevent the harm — it merely attempts to deflect accountability, which contradicts the core principle of accountability in responsible AI.

Choose Your AI Certification Path

Whether you’re exploring AI on Google Cloud, Azure, Salesforce, AWS, or Databricks, PowerKram gives you vendor‑aligned practice exams built from real exam objectives — not dumps.

Start with a free 24‑hour trial for the vendor that matches your goals.
