IBM CERTIFICATION
F1005100 IBM Certified Professional SRE v2 PLUS IBM Power Virtual Server v1 Specialty Practice Exam
Exam Number: 4311 | Last updated April 17, 2026 | 329+ questions across 5 vendor-aligned objectives
Holders of the F1005100 composite credential pair cloud-native SRE expertise with deep Power Virtual Server skills. This track is built for reliability engineers who run mission-critical workloads spanning Linux containers, AIX, and IBM i on IBM Cloud. Candidates should be fluent with SLI/SLO design, error budget policy, Kubernetes operations, and the networking and storage patterns specific to Power Virtual Server estates.
Capturing 25% of the exam, Reliability Engineering Fundamentals covers service-level indicators, error budgets, incident command, and blameless postmortem practice. At 22%, Platform and Container Operations covers OpenShift, Kubernetes, Helm, and Argo CD patterns for resilient deployment. A further 20% targets Power Virtual Server Operations, covering workspace design, VPN and Direct Link connectivity, and storage tier management for AIX and IBM i workloads.
The closing domains round out the blueprint. Observability and Performance accounts for 18% and spans metrics, traces, logs, IBM Instana integration, and capacity management. Security and Compliance represents 15% and spans IAM, secrets management, and posture assessments across hybrid environments. Expect layered scenarios that ask you to apply SRE practice to Power workloads as well as to cloud-native services.
Every answer links to the source. Each explanation below includes a hyperlink to the exact IBM documentation page the question was derived from. PowerKram is the only practice platform with source-verified explanations.
721 practice exam users · 94% satisfied users · 91% passed the exam · 4.7/5 quality rating
Test your F1005100 SRE v2 and Power Virtual Server v1 knowledge
10 of 329+ questions
Question #1 - Reliability Engineering Fundamentals
Northbrook Mutual’s payments API is committed to a 99.9% monthly availability SLO. Last month it was available 99.87% of the time, burning more than the allotted error budget.
What is the appropriate SRE response for the release train the following month?
A) Freeze non-essential changes and only allow reliability-focused work until the budget recovers
B) Accept the overage and continue the normal release cadence to preserve velocity
C) Lower the SLO target to 99.85% so the team is no longer over budget
D) Escalate all future deploys for VP approval regardless of risk
Correct answer: A – Explanation:
Error-budget policy requires slowing feature releases once the budget is exhausted so the team can repair reliability rather than add risk. Accepting the overage ignores the purpose of the budget. Lowering the SLO hides the problem rather than fixing it and may violate customer commitments. Blanket VP approval is ceremony, not a reliability control, and does not restore budget. Source: Check Source
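The arithmetic behind this scenario can be checked with a short sketch. It assumes a 30-day month and treats availability as simple uptime over wall-clock time; the function names are illustrative and do not come from any IBM tool.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Downtime the SLO allows over the window, in minutes (30-day month assumed)."""
    return (1 - slo_percent / 100) * days * MINUTES_PER_DAY


def downtime_minutes(measured_percent: float, days: int = 30) -> float:
    """Downtime implied by the measured availability, in minutes."""
    return (1 - measured_percent / 100) * days * MINUTES_PER_DAY


def budget_consumed(slo_percent: float, measured_percent: float) -> float:
    """Fraction of the error budget spent; 1.0 means fully exhausted."""
    return (100 - measured_percent) / (100 - slo_percent)


budget = error_budget_minutes(99.9)    # about 43.2 minutes allowed
spent = downtime_minutes(99.87)        # about 56.2 minutes actually lost
ratio = budget_consumed(99.9, 99.87)   # about 1.3, i.e. roughly 130% of the budget
release_freeze = ratio >= 1.0          # policy trigger for option A

print(f"budget={budget:.1f} min, spent={spent:.1f} min, consumed={ratio:.0%}")
print("freeze non-essential changes:", release_freeze)
```

Under these assumptions the service overspent its budget by roughly 13 minutes, which is why the freeze in option A applies.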
Question #2 - Reliability Engineering Fundamentals
During a sev-1 incident at Pembroke Genomics, the on-call SRE, a database engineer, and a product manager all believe they are running the response.
Which incident-command practice resolves the ambiguity fastest?
A) Page a senior manager to decide who is in charge
B) Dissolve the call and restart it in 15 minutes
C) Let each participant continue to work in parallel until consensus emerges
D) Explicitly name and announce a single Incident Commander and document it in the incident channel
Correct answer: D – Explanation:
Incident command requires one named, announced commander so coordination and decisions flow through a single point; this is standard SRE doctrine. Paging a senior manager adds latency and typically hands command to someone without live context. Parallel work without a commander produces conflicting actions and duplicated effort. Restarting the call wastes critical recovery minutes. Source: Check Source
Question #3 - Reliability Engineering Fundamentals
After a two-hour outage at Harborview Logistics, the team wants to hold a postmortem. Two engineers involved are worried about being blamed.
Which postmortem practice addresses their concern and produces the most learning?
A) Skip the postmortem to protect team morale
B) Limit the postmortem attendance to leadership only
C) Require each contributor to submit a written apology before attending
D) Run a blameless postmortem focused on systemic contributors and corrective actions rather than individual fault
Correct answer: D – Explanation:
Blameless postmortems treat incidents as systems failures and focus on contributing factors and action items, which is both the canonical SRE practice and the one most likely to surface honest detail. Skipping the postmortem forfeits learning and lets the same incident recur. Restricting attendance hides detail from the people closest to the failure. Requiring apologies encourages defensive rather than honest accounts. Source: Check Source
Question #4 - Platform and Container Operations
A hotel-booking platform at Cedarcrest Hospitality runs on Red Hat OpenShift on IBM Cloud and needs consistent, auditable rollouts across dev, stage, and prod clusters.
Which deployment pattern best satisfies the consistency and audit requirements?
A) A shared Helm chart deployed through Argo CD with environment overlays committed to Git
B) Hand-crafted oc apply commands run by each environment team
C) Manual OpenShift web-console deployments with screenshots for audit
D) Standalone shell scripts executed from each engineer’s laptop
Correct answer: A – Explanation:
GitOps with Argo CD deploying a versioned Helm chart produces the same artifact across environments and keeps every change in Git for audit — the canonical OpenShift GitOps pattern. Hand-crafted oc apply and console clicks are not reproducible and leave weak audit trails. Local shell scripts add drift between operator laptops and break least-privilege. Source: Check Source
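As a rough illustration of the pattern in option A, the sketch below generates one Argo CD Application manifest per environment from a single chart, with the environment overlays modeled as per-environment Helm values files. The repository URL, chart path, cluster URLs, application names, and the `openshift-gitops` namespace are assumptions made for the example, not details from the question.

```python
import json

# Hypothetical values for illustration only.
REPO_URL = "https://git.example.com/platform/booking-deploy.git"
CHART_PATH = "charts/booking"
ENVIRONMENTS = {
    "dev": "https://api.dev.example.com:6443",
    "stage": "https://api.stage.example.com:6443",
    "prod": "https://api.prod.example.com:6443",
}


def argo_application(env: str, cluster_url: str) -> dict:
    """Build an Argo CD Application that deploys the shared chart to one cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"booking-{env}", "namespace": "openshift-gitops"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": REPO_URL,
                "path": CHART_PATH,
                "targetRevision": "main",
                # Shared defaults first, then the environment overlay from Git.
                "helm": {"valueFiles": ["values.yaml", f"values-{env}.yaml"]},
            },
            "destination": {"server": cluster_url, "namespace": "booking"},
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


applications = [argo_application(env, url) for env, url in ENVIRONMENTS.items()]
print(json.dumps(applications[0], indent=2))
```

Every environment points at the same chart and repository, so the only differences between dev, stage, and prod are the committed values files, and each change to them appears in Git history.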
Question #5 - Platform and Container Operations
The Willowpark Insurance platform team is rolling out a new microservice across 18 OpenShift clusters and must limit blast radius if the new version misbehaves.
Which progressive rollout pattern fits best?
A) Deploy to all 18 clusters at once and monitor alerts
B) Hold deployment until a full regression suite passes in one lab cluster, then deploy everywhere simultaneously
C) Use a canary rollout to a small subset of clusters, then promote to the rest based on SLI health
D) Roll out only to the smallest-traffic cluster permanently
Correct answer: C – Explanation:
Canary rollouts limit blast radius and use SLI health to gate promotion, which is the standard progressive-delivery pattern on OpenShift GitOps. Deploying everywhere at once defeats blast-radius control. A lab cluster cannot reproduce real traffic shapes, so a green suite followed by a full rollout is still risky. Holding the new version on the smallest cluster forever is a rollout that never completes, not progressive delivery. Source: Check Source
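The gating logic can be sketched in a few lines. This is a simplified model, not an OpenShift or Argo CD API: `deploy` and `sli_healthy` are hypothetical callbacks, the cluster names are invented, and the wave sizes are an arbitrary choice.

```python
from typing import Callable, Iterable


def plan_waves(clusters: list[str], wave_sizes: Iterable[int]) -> list[list[str]]:
    """Split clusters into canary waves; the final wave takes whatever remains."""
    waves, start = [], 0
    for size in wave_sizes:
        if start >= len(clusters):
            break
        waves.append(clusters[start:start + size])
        start += size
    if start < len(clusters):
        waves.append(clusters[start:])
    return waves


def rollout(
    waves: list[list[str]],
    deploy: Callable[[str], None],
    sli_healthy: Callable[[str], bool],
) -> tuple[list[str], bool]:
    """Deploy wave by wave and stop at the first wave with an unhealthy cluster."""
    deployed = []
    for wave in waves:
        for cluster in wave:
            deploy(cluster)
            deployed.append(cluster)
        if not all(sli_healthy(cluster) for cluster in wave):
            return deployed, False  # halt promotion; later waves stay untouched
    return deployed, True


clusters = [f"cluster-{i:02d}" for i in range(1, 19)]  # 18 clusters
waves = plan_waves(clusters, [1, 4])                    # waves of 1, 4, and 13
unhealthy = {"cluster-03"}                              # simulated SLI regression
deployed, completed = rollout(
    waves,
    deploy=lambda cluster: None,
    sli_healthy=lambda cluster: cluster not in unhealthy,
)
print(len(deployed), completed)  # prints: 5 False
```

In this simulated run the regression surfaces in the second wave, so 13 of the 18 clusters never receive the bad version.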
Question #6 - Power Virtual Server Operations
Meridian Shipping runs several AIX workloads on Power Virtual Server across two workspaces in different zones. They need a reliable low-latency connection from their on-premises data center to both workspaces.
Which connectivity option best meets the requirement?
A) A public-internet VPN from each site to each workspace
B) Direct Link to IBM Cloud with Transit Gateway fanning out to both Power workspaces
C) NAT gateways in each workspace pointing back to on-prem
D) A bastion host in each workspace with SSH tunnels
Correct answer: B – Explanation:
Direct Link plus Transit Gateway is the supported pattern for private, low-latency connectivity from on-premises into multiple Power Virtual Server workspaces. Public VPNs add latency and depend on internet quality. NAT gateways address egress IP masking, not private connectivity. Bastion hosts serve operator access, not production data paths. Source: Check Source
Question #7 - Power Virtual Server Operations
A manufacturing customer at Goldleaf Industrial has an AIX LPAR on PowerVS with an Oracle workload that must tolerate planned hypervisor maintenance without an outage.
Which capability is designed for this use case?
A) Rebuild the LPAR from an image during the maintenance window
B) Live Partition Mobility to migrate the running LPAR to another host without application outage
C) Stop the LPAR before maintenance and start it afterward
D) Run a second cold-standby LPAR and swap DNS manually
Correct answer: B – Explanation:
Live Partition Mobility relocates an active LPAR to another host with no application outage, which makes it the capability designed for planned maintenance on Power. Rebuilding from an image produces downtime and data risk. Stop/start is an outage by definition. A DNS-swap cold standby adds recovery time and human error compared with the hypervisor-transparent LPM option. Source: Check Source
Question #8 - Observability and Performance
Riverbend Retail’s cart service suffers intermittent latency spikes that its dashboards cannot explain. The team can see that p99 latency exceeds the SLO but cannot find the responsible dependency.
Which observability capability most directly answers that question?
A) Running load tests in staging against the cart service in isolation
B) Adding more CPU to the cart service
C) Distributed traces that follow each cart request across services, showing per-span latency
D) Increasing the log level to DEBUG globally
Correct answer: C – Explanation:
Distributed tracing attributes end-to-end latency to specific downstream spans and is the purpose-built observability tool for intermittent p99 regressions, provided by IBM Instana and the OpenTelemetry ecosystem. Adding CPU is blind tuning without a diagnosis. Isolated load tests miss the multi-service interaction. Raising log levels globally produces noise and can mask the issue by slowing the system further. Source: Check Source
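To show how per-span latency points at a dependency, here is a minimal sketch that computes each span's self time (its duration minus the time spent in its direct children) for one trace. The span layout and service names are invented for the example, and the calculation assumes child spans run sequentially without overlapping; it is not the Instana or OpenTelemetry data model.

```python
from collections import defaultdict


def self_times(spans: list[dict]) -> dict[str, float]:
    """Map span id to duration minus time in direct children (sequential children assumed)."""
    child_total = defaultdict(float)
    for span in spans:
        if span["parent"] is not None:
            child_total[span["parent"]] += span["end_ms"] - span["start_ms"]
    return {
        span["id"]: (span["end_ms"] - span["start_ms"]) - child_total[span["id"]]
        for span in spans
    }


def slowest_span(spans: list[dict]) -> tuple[str, float]:
    """Return the service with the largest self time and that time in milliseconds."""
    times = self_times(spans)
    worst = max(spans, key=lambda span: times[span["id"]])
    return worst["service"], times[worst["id"]]


# One hypothetical slow cart request, 900 ms end to end.
trace = [
    {"id": "a", "parent": None, "service": "cart", "start_ms": 0, "end_ms": 900},
    {"id": "b", "parent": "a", "service": "pricing", "start_ms": 10, "end_ms": 60},
    {"id": "c", "parent": "a", "service": "inventory", "start_ms": 70, "end_ms": 860},
    {"id": "d", "parent": "c", "service": "inventory-db", "start_ms": 80, "end_ms": 120},
]

culprit = slowest_span(trace)
print(culprit)  # prints: ('inventory', 750.0)
```

The cart span is the slowest end to end, but almost all of that time belongs to the inventory call, which is the kind of attribution an aggregate p99 dashboard cannot give.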
Question #9 - Observability and Performance
Before on-boarding a new payment processor, the Observability team at Oakhaven Bank wants to capture every metric that the existing processor already exposes, for parity comparison.
Which approach produces that comparison with the least manual work?
A) Import the processor’s existing Prometheus metrics into IBM Instana via the metrics pipeline and build a side-by-side dashboard
B) Have SREs screenshot each Grafana panel and paste them into a slide deck
C) Wait until an incident occurs and compare behaviors live
D) Delete the old processor before evaluating the new one
Correct answer: A – Explanation:
Ingesting existing Prometheus metrics into Instana and building a side-by-side dashboard is the supported way to drive parity comparisons without manual effort. Screenshots freeze in time and lose detail. Waiting for an incident removes controlled comparison. Deleting the old processor before evaluation eliminates the baseline you need. Source: Check Source
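Before building the dashboard, a quick parity check on metric names can flag gaps. The sketch below compares two scrape outputs in the Prometheus text exposition format using only the standard library; it does not call Instana, and the metric names and sample values are made up for the example.

```python
import re

# A sample line starts with the metric name, optionally followed by {labels}.
_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition: str) -> set[str]:
    """Collect metric names from Prometheus text-format output, skipping comments."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank, # HELP, or # TYPE lines
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def parity_gaps(existing: str, candidate: str) -> list[str]:
    """Metrics the existing processor exposes that the candidate does not."""
    return sorted(metric_names(existing) - metric_names(candidate))


# Hypothetical scrape output from each processor.
existing_scrape = """\
# HELP payment_requests_total Total payment requests.
# TYPE payment_requests_total counter
payment_requests_total{status="ok"} 1027
payment_requests_total{status="error"} 3
payment_latency_seconds_sum 53.2
payment_refunds_total 12
"""
candidate_scrape = """\
# TYPE payment_requests_total counter
payment_requests_total{status="ok"} 88
payment_latency_seconds_sum 4.1
"""

gaps = parity_gaps(existing_scrape, candidate_scrape)
print(gaps)  # prints: ['payment_refunds_total']
```

A non-empty result means the new processor is missing a series the side-by-side dashboard would need, so the gap can be raised before on-boarding rather than during it.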
Question #10 - Security and Compliance
A hybrid estate at Maplecreek Health stores database credentials used by OpenShift workloads and by an AIX application on PowerVS. The security team wants one place to rotate and audit the secrets.
Which approach best meets the requirement?
A) Keep secrets in each team’s Git repository and rotate by pull request
B) Store secrets in IBM Cloud Secrets Manager and pull them into both OpenShift and the AIX workload via the Secrets Manager API or operator
C) Hard-code the credentials in application configuration files on each host
D) Email rotated secrets to operators every 90 days
Correct answer: B – Explanation:
IBM Cloud Secrets Manager centralizes lifecycle, rotation, and audit for secrets, with supported integrations for OpenShift and custom API consumers such as AIX apps. Git storage exposes secrets in history. Hard-coding prevents rotation. Email rotation is auditable only informally and fails basic secret-handling controls. Source: Check Source
Get 329+ more questions with source-linked explanations
Every answer traces to the exact IBM documentation page — so you learn from the source, not just memorize answers.
Exam mode & learn mode · Score by objective · Updated April 17, 2026
What the F1005100 SRE v2 and Power Virtual Server v1 exam measures
- Define and enforce service-level indicators, error budgets, and incident-command practices to align engineering teams on what reliability means and when to slow releases
- Operate and scale OpenShift, Kubernetes, Helm, and Argo CD deployment patterns to ship containerized services on IBM Cloud with consistent, auditable rollouts
- Run and maintain Power Virtual Server workspaces, connectivity, and storage tiers to keep AIX and IBM i workloads highly available across maintenance and failure events
- Instrument and diagnose metrics, traces, logs, and IBM Instana dashboards to detect degradation before customers notice and drive faster mean-time-to-recovery
- Protect and verify IAM, secrets management, and posture assessments to enforce least-privilege access and maintain compliance across hybrid deployments
How to prepare for this exam
- Review the official exam guide to understand every objective and domain weight before you begin studying
- Work through the relevant IBM Training learning path (IBM Certified Professional SRE v2 PLUS IBM Power Virtual Server v1 Specialty, F1005100) to cover vendor-authored material end-to-end
- Get hands-on inside IBM TechZone or a comparable sandbox so you can practice the console tasks, CLI commands, and APIs the exam expects
- Tackle a real-world project at your workplace, a volunteer role, or an open-source repository where the technology under test is actually in use
- Drill one exam objective at a time, starting with the highest-weighted domain and only moving on once you can teach it to someone else
- Study by objective in PowerKram learn mode, where every explanation links back to authoritative IBM documentation
- Switch to PowerKram exam mode to rehearse under timed conditions and confirm you consistently score above the pass mark
Career paths and salary outlook
Reliability engineers who can span Power workloads and cloud-native services sit at a premium on the hiring market:
- Site Reliability Engineer — $125,000–$170,000 per year, owning reliability for hybrid IBM Power and containerized workloads (Glassdoor salary data)
- Principal SRE — $145,000–$195,000 per year, setting reliability strategy across multiple product lines (Indeed salary data)
- Platform Engineering Manager — $150,000–$200,000 per year, leading SRE and platform teams for regulated enterprises (Glassdoor salary data)
Official resources
Work through the official IBM Training learning path for this certification, which bundles videos, labs, and skill tasks aligned to every objective. The official exam page lists the full objective breakdown, prerequisite knowledge, and scheduling details.
