I Let an AI Agent Become My DevOps Engineer

Sarvar Nadaf on February 25, 2026

👋 Hey there, tech enthusiasts! I'm Sarvar, a Cloud Architect with a passion for transforming complex technological challenges into elegant soluti...
Luftie The Anonymous

Did you review the codebase of the product that the AI built? It's also important and crucial to mention that this way of coding is not advised for everyone. As you mentioned, you have 10 years of experience, so you basically have massive experience building pipelines. But a junior or mid-level dev who avoids writing code on their own because they don't understand it, and therefore uses AI to execute the task without any learning, is just setting themselves up badly imo.

Anyways, I'm glad your pipelines work as they should :D

Sarvar Nadaf AWS Community Builders

Yes, I reviewed every PR before merging. The agent generated the implementation, but validation, refactoring, and architectural decisions stayed with me.
And I completely agree: this approach assumes strong fundamentals. Without deep DevOps and cloud experience, using AI as a substitute for understanding is risky. It should amplify expertise, not replace learning.

The Seventeen

How do you handle secrets when using this workflow?

Sarvar Nadaf AWS Community Builders

For this demo implementation, due to time constraints, I kept everything in a single configuration file: IPs, ports, tokens, usernames, and passwords. It was purely for a controlled, non-production setup.

In a real-world environment, I would use AWS Secrets Manager to store all sensitive data. Instead of hardcoding credentials, the config would reference secret ARNs (e.g., Jenkins_Token_Arn = "arn:aws:secretsmanager:..."), and the AI agent would retrieve the secret dynamically at runtime using IAM-based access.

Hardcoding is acceptable for a quick demo, but for production-grade DevSecOps workflows, centralized secret management with proper IAM controls is non-negotiable.
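A minimal sketch of what that ARN-based lookup could look like, assuming the config convention above (the `Jenkins_Token_Arn` key follows the example in this reply; the helper itself is illustrative, not from the article):

```python
# Sketch: resolve config values that reference AWS Secrets Manager ARNs.
# Plain values (IPs, ports) pass through; ARN-shaped values are fetched
# at runtime using IAM-based access (e.g. an EC2 instance role).

SECRET_ARN_PREFIX = "arn:aws:secretsmanager:"

def resolve_config(config: dict, client=None) -> dict:
    """Return a copy of config with secret ARNs swapped for secret values."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str) and value.startswith(SECRET_ARN_PREFIX):
            if client is None:
                import boto3  # credentials come from the IAM role, not the config
                client = boto3.client("secretsmanager")
            resolved[key] = client.get_secret_value(SecretId=value)["SecretString"]
        else:
            resolved[key] = value
    return resolved
```

With this shape, the agent only ever sees a reference like `Jenkins_Token_Arn = "arn:aws:secretsmanager:..."` in the config, and the actual value is pulled dynamically at runtime.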

The Seventeen

That answers the question perfectly. I built a zero-knowledge secrets management approach for AI agents: the AI can make authenticated requests with your secrets without ever seeing their values.

Sarvar Nadaf AWS Community Builders • Edited

Could you please share all the details at simplynadaf@gmail.com?

Mustkhim Inamdar

Hi Sarvar, nice work. Just for my understanding, how did you handle the credentials, and what was the security strategy from your end?

Sarvar Nadaf AWS Community Builders

For this demo implementation, due to time constraints, I kept everything in a single configuration file — IPs, ports, tokens, usernames, and passwords. It was purely for a controlled, non-production setup.

In a real-world environment, I would use AWS Secrets Manager to store all sensitive data. Instead of hardcoding credentials, the config would reference secret ARNs (e.g., Jenkins_Token_Arn = "arn:aws:secretsmanager:..."), and the AI agent would retrieve the secret dynamically at runtime using IAM-based access.

Hardcoding is acceptable for a quick demo, but for production-grade DevSecOps workflows, centralized secret management with proper IAM controls is non-negotiable.

signalstack

The 45-minute pipeline story tracks with my experience — but the harder question is what happens at 3 AM when something breaks in production and the agent needs to diagnose it across a system it built.

I run agents autonomously 24/7 and the DevOps tasks are actually where the failure modes show up most clearly. Three patterns worth flagging:

State drift between sessions. The agent built the pipeline with full context. Six weeks later, when diagnosing an incident, it needs to reconstruct that context from logs and config. If you didn't invest in good audit trails during the build, the diagnostic session is slower than a human reading the code fresh. The 45-minute build time is real. The missing investment is logging what the agent decided and why.

Retry loops on ambiguous failures. Your OWASP / Docker permission examples are the easy case — the agent correctly identified the root cause and fixed it. The hard case is a flaky test that passes 70% of the time. An agent without a hard retry budget will keep spinning. Human engineers give up and page someone. The agent needs explicit circuit breakers.
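That circuit breaker can be sketched in a few lines (illustrative Python, not from any specific agent framework; `attempt_fix` stands in for whatever the agent does per attempt):

```python
class EscalateToHuman(Exception):
    """Raised when the retry budget is exhausted: time to page someone."""

def run_with_budget(attempt_fix, max_attempts=3):
    """Run an agent fix attempt at most max_attempts times.

    attempt_fix() returns True on success. A flaky failure that never
    stabilizes trips the breaker instead of letting the agent spin forever.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_fix():
            return attempt  # number of attempts it took to succeed
    raise EscalateToHuman(f"fix failed after {max_attempts} attempts")
```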

Scope creep in autonomous fix mode. When the agent fixes one issue, it sometimes "helpfully" refactors adjacent code. In a demo pipeline, that's fine. In production, you get a PR that fixed the failing test and also changed 400 lines of Terraform that weren't part of the original task. Strong scope constraints on fix mode matter more than strong scope constraints on build mode.

None of this negates the core point — the propose-approve loop is real leverage. But the operational overhead shifts from execution to observability.

Sarvar Nadaf AWS Community Builders

That’s a very sharp observation, and I agree: the real test isn’t the 45-minute build, it’s the 3 AM production incident. State drift becomes a serious issue if the agent’s decisions and rationale aren’t logged. Retry loops need explicit budgets and circuit breakers to prevent endless spinning on flaky failures, and fix mode must be tightly scoped to avoid unintended refactors beyond the original problem. None of this invalidates the leverage of the propose-approve loop, but it does mean the operational focus shifts from raw execution speed to strong observability, auditability, and governance.

syncchain2026-Helix

Great article on AI DevOps agents! The key challenge is documenting procedures so agents can execute them reliably. I built SkillForge (skillforge.expert, also on Product Hunt) to solve this - it turns screen recordings into structured SKILL.md files that agents can follow. Record yourself doing a DevOps workflow once, and the AI extracts every step into an executable skill file. Free recording + 20 free credits on signup. Would love your thoughts on using recorded workflows for agent training!

Sarvar Nadaf AWS Community Builders

Thank you!

klement Gunndu

Curious what the failure mode looked like when it got something wrong — the pipeline success story is compelling, but the interesting data point is how it handled the first broken state it encountered. Did it recover autonomously or did you have to step in?

Sarvar Nadaf AWS Community Builders

It mainly depends on how the prompt and guardrails were designed.

If the instructions explicitly allowed retries, log inspection, and iterative fixes until validation passed, the agent attempted autonomous recovery. But if the failure crossed defined constraints or wasn’t covered in the prompt logic, I stepped in. So recovery wasn’t magic; it was bounded autonomy based on how the workflow was engineered.
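That bounded autonomy can be as simple as an explicit action allowlist in the guardrail logic. A minimal sketch, with hypothetical action names (none of these come from the article):

```python
# Hypothetical guardrail: the agent may only execute actions the workflow
# explicitly permits; anything outside the allowlist is deferred to a human.
ALLOWED_ACTIONS = {"retry_stage", "inspect_logs", "patch_failing_test"}

def decide(action: str) -> str:
    """Map a proposed agent action to 'execute' or 'escalate'."""
    return "execute" if action in ALLOWED_ACTIONS else "escalate"
```

Anything the prompt didn’t anticipate, like a destructive infrastructure change, falls through to "escalate" by default.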

Gonzo Team

thanks

Sarvar Nadaf AWS Community Builders

You're welcome!

NOone

Excellent work. Is there any tutorial for this use case?

Sarvar Nadaf AWS Community Builders

Yes, stay tuned!

Wagner dos Santos Brito

First of all, thank you very much for sharing your experience. Which IDE + LLM did you use for this task? I'm using Cursor with very nice results.

Sarvar Nadaf AWS Community Builders

You're welcome. I'm not using an IDE; I've configured Amazon Q on an EC2 instance, and I'm using the Claude Sonnet 4.5 LLM.

Salmankhan

Kudos Sarvar, hats off to you, my man. It is a great achievement and insight indeed.

Sarvar Nadaf AWS Community Builders

Thank you so much, Salman bro!

Benjamin Nguyen

nice! great blog!

Sarvar Nadaf AWS Community Builders

Thank you!

Marcin Parśniak

Great job! :D

Sarvar Nadaf AWS Community Builders

Thanks