Context: This article makes the case that BDD is not dying and that the classic 3 Amigos, namely Business, Development, and Testing, are now joined by a 4th: yes, AI. To keep things concrete, I use one sticky example from a retail e-commerce application to explore how BDD evolves today with GenAI and Agentic AI in the world of testing and quality engineering. I recommend reading it start to finish to get the full picture, as the example builds across sections.
The Original BDD Promise – Clarity Through Collaboration
Behavior-Driven Development (BDD) was never just about writing test cases. It was about alignment. It brought together the three roles that most influence product quality: Business, Development, and Testing. This "3 Amigos" practice created a shared vocabulary and behavioral understanding before any code was written.
In theory, it worked. But in practice, teams often stopped at tooling:
- Gherkin became a syntax checkbox: used, but rarely understood for its purpose.
- Conversations were skipped.
- The behavior part of BDD quietly faded, overtaken by brittle step definitions and disconnected automation scripts.
In many teams, step definitions were cloned and reused across unrelated features. This created brittle dependencies, where a change in one domain broke scenarios in another.
Often, the .feature files lived in test repos while requirements were discussed in Jira or Slack; developers saw them as QA artifacts and rarely as design inputs.
Even when .feature files ran in CI/CD, teams rarely trusted them as a source of behavioral truth. The syntax lived on, but the design conversation moved elsewhere.
This disconnect turned BDD into an after-the-fact test format, not a shared contract.
Some say the story ended there, but I am among those who never believed it had to.
Why Traditional BDD Tools Are Fading
SpecFlow, once the standard for .NET BDD, was officially sunset by Tricentis in late 2024. Cucumber, the original poster child of Gherkin syntax, has lost active investment; SmartBear no longer allocates dedicated engineering to it. The tooling landscape for BDD is in maintenance (retiring?) mode. By choice? Let us not dwell on that.
But the core problem is not tool death; it is that BDD as a human process was never fully adopted. Teams rushed to automate before they aligned. AI now offers a second chance to get it right. I believe that deeply.
A Sticky Example – E-Commerce Cart Holds
We will anchor this discussion around a common e-commerce feature: “Cart Hold”. When a logged-in user adds limited-stock items to their cart, the system should hold those items for, say, 15 minutes. If they do not check out, the items are released.
Seems simple, right? But is the hold per item? Per cart? What happens if they open two tabs? Can a guest user get a hold? Is it reset if the user adds another item? This is where BDD shines, and where the 4th Amigo, AI, now steps in to help the original 3.
Let us dig in.
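To make the ambiguity concrete, here is a minimal Python sketch of one possible answer to those questions: a per-item hold with an injectable clock. The `CartHold` class, the per-item choice, and the expiry semantics are illustrative assumptions, not decisions the story has made yet; that is exactly the conversation BDD is meant to trigger.

```python
import time

HOLD_SECONDS = 15 * 60  # the 15-minute hold window from the story

class CartHold:
    """Minimal per-item hold model: each limited-stock item added by a
    logged-in user gets its own expiry timestamp."""

    def __init__(self, now=time.time):
        self._now = now    # injectable clock, handy for temporal tests
        self._holds = {}   # item_id -> expiry timestamp (seconds)

    def add_item(self, item_id):
        # Per-item hold: adding another item does NOT reset existing
        # holds (one of the open questions, answered here by assumption).
        self._holds[item_id] = self._now() + HOLD_SECONDS

    def held_items(self):
        # Items whose hold has expired are released back to inventory.
        now = self._now()
        self._holds = {i: t for i, t in self._holds.items() if t > now}
        return set(self._holds)
```

Even this tiny sketch forces a decision on "is the hold per item or per cart?", which is the point: behavior questions surface before code is written in earnest.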

AI as the 4th Amigo
With the rise of GenAI & test agents, we now have a 4th participant in the discovery process. An AI-powered agent can:
- Assist in refining the story using INVEST principles.
- Convert a vague feature like “cart hold” into multiple behavioral examples.
- Apply prompt engineering to explore timeouts, concurrency, and stock deduction.
- Detect missing scenarios, hidden business rules, or ambiguity in existing ones.
For example, take a vague user story like:
“As a customer, I want to hold my cart so I don’t lose items”.
A typical GenAI agent, the 4th Amigo, can instantly sharpen this, at the minimum, using INVEST principles:
“As a logged-in user adding limited-stock items, I want those items held in my cart for 15 minutes so I have time to complete checkout without losing availability”.
Yes.
The 4th Amigo does not stop there: it proposes edge-case scenarios, like concurrent sessions or partial cart expirations, that are often missed in grooming, with true respect to the "behavior" aspect of BDD and the conversations it triggers.
What used to take 30 minutes can now start with a short assistive prompt to the 4th Amigo, bringing the shared conversation among the 3 Amigos down to a sharper 15 minutes.
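That short assistive prompt can be as simple as a template. The wording below is entirely illustrative, an assumption rather than a prescribed or product-specific prompt, and the GenAI backend it would be sent to is out of scope:

```python
# Illustrative pre-session prompt template for the 4th Amigo.
REFINE_PROMPT = """You are the 4th Amigo in a BDD discovery session.
Rewrite this user story to be INVEST-compliant, then propose three to
five Given-When-Then scenarios. Flag timeouts, concurrency, and
stock-deduction edge cases, plus any ambiguity you detect.

Story: {story}"""

def build_refinement_prompt(story: str) -> str:
    # The returned text goes to whatever GenAI backend the team's
    # test agent wraps.
    return REFINE_PROMPT.format(story=story)

prompt = build_refinement_prompt(
    "As a customer, I want to hold my cart so I don't lose items."
)
```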
BDD, the old way vs. BDD with the 4th Amigo, AI
Reimagining the 3 Amigos Session
One of the root failures of traditional BDD was that developers rarely engaged with the .feature files. By the time they encountered them, the code was already written, or worse, the file was just a ready-to-break automation script. AI-assisted workflows give us a fresh opportunity to make behavior modeling collaborative again.
Unlike the traditional 3 Amigos whiteboard session, modern teams are adopting a hybrid loop:
Pre-Session: The 4th Amigo, an AI test agent, like the ones I build for my clients, refines the raw story into INVEST-compliant form with acceptance criteria, generates at least three to five Gherkin-style scenarios, and highlights areas of vagueness.
During Session: Human Amigos (PO, Developer, QE) vet the AI outputs. They edit or challenge the behavior assumptions. They use assistive prompts to query their domain-tuned test agent, for example, "What happens if the user opens two tabs?"
Post-Session: Finalized scenarios go into version control and become acceptance criteria and test drivers. AI is not replacing the human conversation. It is amplifying the starting point.
Behind the Scenes – Yes, The AI Test Agent Uses Gherkin
Even if the user interface shows “plain English”, most modern AI test agents internally translate to Gherkin-like behavioral structures. Why? Because the Given–When–Then structure is still the most robust foundation for modeling behavior. And of course, to seed the ideas for useful test cases.
Whether the output is structured JSON, YAML, or a DSL, the internal model benefits from the clarity of:
Given the cart has two limited-stock items
When the user leaves the page idle for 16 minutes
Then the items should be released to inventory
In test asset generation, prompt engineering works best when it produces strong behavioral scaffolding or structures, even if the 4th Amigo hides that structure from the 3 Amigos!
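As a sketch of that internal translation (the parsing rules and field names here are my assumptions, not any specific agent's format), a few lines of Python can map Given-When-Then lines to a JSON-ready structure:

```python
import json

def gherkin_to_struct(scenario_text):
    """Naive translation of Given/When/Then lines into a dict.
    'And'/'But' steps inherit the most recent primary keyword."""
    steps, current = {"given": [], "when": [], "then": []}, None
    for line in scenario_text.strip().splitlines():
        keyword, _, rest = line.strip().partition(" ")
        if keyword in ("Given", "When", "Then"):
            current = keyword.lower()
        elif keyword not in ("And", "But") or current is None:
            continue  # skip non-step lines
        steps[current].append(rest)
    return steps

scenario = """\
Given the cart has two limited-stock items
When the user leaves the page idle for 16 minutes
Then the items should be released to inventory"""

print(json.dumps(gherkin_to_struct(scenario), indent=2))
```

A real agent's model is richer, of course, but the payoff is the same: once behavior is structured, it can be diffed, reviewed, and regenerated.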
Bringing Developers Back In – Through the Pull Request
One of the most effective ways to enforce quality and collaboration is to treat behavioral scenarios as first-class citizens in version control. That means:
Step 1: Include AI-generated or AI-refined scenarios directly in pull requests.
Step 2: Require a review by at least one human Amigo, preferably the developer.
Why?
- Behavior becomes part of the DevOps loop instead of remaining a detached artifact.
- It forces alignment before test case authoring, automation, or execution begins.
- It filters out hallucinated behavior before it reaches the test suite.
- It reinforces learning in AI agents, especially when working with memory-backed or fine-tuned models.
Now imagine a pull request where the newest Amigo, the AI test agent, proposes three edge-case scenarios: concurrent sessions, guest user attempts, and item expiration timing. It also flags concerns like whether an unauthenticated session could abuse the hold logic through repeated requests, raising a security issue, or whether excessive parallel holds might degrade system performance. These scenarios, structured in Gherkin or a similar DSL, are included in the same Git branch as the application code change. The Developer Amigo reviews them, spots a logic gap in one, and updates both the scenario and the implementation. They may also update related documentation or annotations.
If needed, the Automation Amigo or SDET Amigo steps in to revise the underlying step definitions, ensuring they accurately reflect the evolving system behavior and stay executable in the CI/CD pipeline.
But the loop does not stop there. The Tester Amigo picks up the merged branch, validating the updated behavior against both automated assertions and exploratory insights. They may issue new assistive prompts to the test agent to explore missed business flows, concurrency overlaps, edge timing issues, or input abuse cases. Any gaps or insights are folded back into version control as enriched scenarios, updated steps, or new tests.
AI-powered test agents now detect step reuse risks, enforce scenario boundaries, and prevent silent breakage. Feature files evolve with requirements, not after them. We can now restore trust in them as living design anchors (provided we keep them live, needless to say).
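One way such an agent might flag step reuse risk is sketched below. The input shape and the domain labels are assumptions for illustration; real agents use richer static and semantic analysis, but the idea, catching step patterns shared across unrelated feature domains, is the same:

```python
from collections import defaultdict

def step_reuse_risks(bindings):
    """bindings: list of (step_pattern, feature_domain) pairs.
    Flags any step pattern bound from more than one domain, since a
    change made for one domain can silently break scenarios in another."""
    domains_by_step = defaultdict(set)
    for pattern, domain in bindings:
        domains_by_step[pattern].add(domain)
    return {p: sorted(d) for p, d in domains_by_step.items() if len(d) > 1}

bindings = [
    ("the user is logged in", "cart"),
    ("the user is logged in", "checkout"),
    ("the cart has {n} limited-stock items", "cart"),
]
risks = step_reuse_risks(bindings)  # flags the shared login step
```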
Again, this is not just tooling. It is the new BDD in motion. Seeded by AI, refined by Developer Amigo in Git, extended by Tester Amigo in test execution, and kept alive in your DevOps and DevSecOps pipelines. It brings clarity to CI/CD and relevance to Continuous Testing of the AUT and Continuous Learning for the AI, all while keeping behavior at the center of design.
Let SpecFlow and Cucumber rest in peace (or as a piece). But do not mistake that for the death of BDD. A new loop has begun, and this time, all four Amigos are in.
Do you see how BDD can come alive again? Not just as syntax, but as a modern collaboration practice with the 4th Amigo now fully in the loop.
The Future Is Agentic
In the near future, we will not just use AI to draft stories. We will deploy domain-specific test agents that:
- Trace stories to coverage gaps.
- Recommend exploratory charters.
- Trigger boundary condition tests.
- Evaluate time-based state changes (temporal testing, like the cart hold expiration in our ongoing example).
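Temporal testing of the cart-hold expiry can start from a sketch like this. The `is_held` helper and the boundary choice (the hold survives up to and including the 15-minute mark, expiring strictly after it) are assumptions for illustration; the exact boundary is precisely the kind of detail the Amigos should pin down in a scenario:

```python
HOLD_SECONDS = 15 * 60

def is_held(added_at, now, hold_seconds=HOLD_SECONDS):
    # Boundary choice (an assumption): held through exactly 15:00,
    # released strictly after it.
    return (now - added_at) <= hold_seconds

# Boundary-condition checks around the expiry instant
assert is_held(added_at=0, now=14 * 60 + 59)      # 14:59 -> still held
assert is_held(added_at=0, now=15 * 60)           # exactly 15:00 -> held
assert not is_held(added_at=0, now=15 * 60 + 1)   # 15:01 -> released
assert not is_held(added_at=0, now=16 * 60)       # the 16-minute Gherkin case
```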
The Story Does Not End at Scenario Modeling; This Article Does 🙂
Once a Cart Hold story is finalized:
- AI can generate both manual and automated test cases.
- Test execution can happen locally, remotely, or in parallel test clouds.
- Observability and test impact analysis can feed back into future scenario discovery.
The story continues: from prompt to plan to production. Repeat. Hail BDD.
Final Thought Before You Invite the 4th Amigo
BDD began as a way to align humans around behavior. In the AI era, that alignment still matters– only now, it comes with smarter scaffolding, faster iterations, and fewer blind spots.
If Gherkin feels like a limitation, invent your own DSL or simply use plain English. BDD remains untouched.
AI is not here to take over the Amigos table. It is here to fill the empty chair that was always waiting.
Ready?
Let the customer hold their cart, not their breath.
Let the business close the sale, not the tale.
Let the 4th Amigo move things like never before.
Call to Action
- Start small.
- Pick one user story.
- Approach it with a BDD mindset.
- Focus on shared understanding.
- Activate the 4th Amigo.
- Let it sharpen things.
- Align the 3 Amigos with the 4th.
The rest will follow naturally.
Wait. Interested in Reading More Industry Perspectives?
- Dawid Dylowicz – “Is BDD dying?” – Raises concerns and forward-looking perspectives following the deprioritization of Cucumber and the sunsetting of SpecFlow. https://www.linkedin.com/posts/dawid-dylowicz_softwaretestingweekly-softwaretesting-activity-7307877400036454400-XOhM
- testRigor – “Why Cucumber and SpecFlow Died?” – Analyzes key reasons behind declining usage of these tools. https://testrigor.com/blog/why-cucumber-and-specflow-died/
- Zhimin Zhan – “SpecFlow is Dying. Another Prediction of Mine Proven Correct.” – Comments on the timing of SpecFlow's deprecation and its implications. https://medium.com/@zhiminzhan/specflow-is-dying-another-prediction-of-mine-proven-correct-0795176e95a1
- Daniel Delimata – “Is Cucumber dying? Not so fast with this funeral!” – Argues that Cucumber remains relevant with continued community engagement. https://daniel-delimata.medium.com/is-cucumber-dying-not-so-fast-with-this-funeral-431014dc55cc
Always invite AI to the table, as Ethan Mollick emphasizes in "Co-Intelligence: Living and Working with AI". Why not welcome AI as the 4th Amigo in your BDD discovery meetings, and beyond?
Ashwin Palaparthi at Ai4Testers™