February 1, 2026
How to Make Tests with AI: From Prompt to Passing Test in Minutes
You can make tests with AI by describing the user flow you want to test in plain language, then letting an AI testing platform generate the automation code, selectors, and assertions. What used to take hours of Playwright or Selenium scripting now takes minutes—and the resulting tests are often more reliable because AI-generated locators adapt to UI changes automatically.
This guide walks through the practical process of making tests with AI, including prompt strategies that produce better tests, common pitfalls, and how to decide when AI-generated tests are the right choice.
The Old Way vs. Making Tests with AI
Here's a scenario that's become painfully common: Your team updates the onboarding flow. Just UI changes, nothing behavioral. But a bunch of Playwright tests (all written by AI coding assistants) break anyway. Flakes start piling up. Within a week, your CI is basically decorative—everything fails, so no one pays attention to failures anymore.
The real kicker? There was an actual onboarding bug in production. But it got lost in the noise. When everything's flaky, nothing feels urgent.
This is why making tests with AI matters: not just for speed, but for signal quality.
Consider a simple E2E test: verify that a user can add an item to their cart and complete checkout.
Traditional approach (Playwright):
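A hand-written version might look roughly like this (an illustrative sketch: the URL, selectors, and test data below are assumptions, not a real storefront):

import { test, expect } from '@playwright/test';

test('user can add an item to cart and complete checkout', async ({ page }) => {
  // Assumed demo storefront URL and CSS selectors, for illustration only
  await page.goto('https://shop.example.com');

  // Add the first product to the cart
  await page.click('.product-card:first-child .add-to-cart');

  // Brittle selector: breaks the moment checkout-btn is renamed to proceed-to-checkout
  await page.click('#checkout-btn');

  // Fill in shipping and payment details with test data
  await page.fill('#shipping-address', '123 Test St, San Francisco, CA 94102');
  await page.fill('#card-number', '4242424242424242');
  await page.click('#place-order');

  // Verify the order confirmation appears
  await expect(page.locator('.order-confirmation')).toBeVisible();
});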
This test might take 30-60 minutes to write, debug, and stabilize. And it's brittle—if someone renames checkout-btn to proceed-to-checkout, the test breaks.
AI approach:
"Add the first product to cart, go to checkout, complete purchase with test payment details, and verify the order confirmation appears."
The AI generates the test in under a minute. When the button name changes, the AI recognizes it by context and adjusts automatically. According to 2025 data, teams using AI for test creation report 50% faster scripting time compared to traditional tools.
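To make the contrast concrete, here is what a generated test might look like if exported as code. The sketch below is illustrative, not Decipher's actual output; it leans on role- and text-based locators instead of hard-coded ids, which is why a rename from checkout-btn to proceed-to-checkout doesn't break it:

import { test, expect } from '@playwright/test';

test('add first product to cart and complete purchase', async ({ page }) => {
  await page.goto('https://shop.example.com'); // assumed demo URL

  // Role- and text-based locators match on visible intent, not internal ids
  await page.getByRole('button', { name: /add to cart/i }).first().click();
  await page.getByRole('button', { name: /checkout/i }).click();

  await page.getByLabel(/card number/i).fill('4242424242424242');
  await page.getByRole('button', { name: /place order/i }).click();

  await expect(page.getByText(/order confirm/i)).toBeVisible();
});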
The AI Coding Agent Problem
Here's the context that makes this urgent: 82% of developers now use AI coding assistants daily or weekly. Tools like Claude Code, Cursor, and GitHub Copilot accelerate development dramatically—but they also accelerate bug creation.
In 2025, a Replit AI assistant deleted a company's production database during a code freeze, then tried to cover its tracks. The lesson wasn't that AI coding is bad—it's that AI-generated code needs AI-speed validation.
When your team ships multiple times a day with AI assistance, traditional test automation becomes the bottleneck. Making tests with AI creates the validation layer that matches your development velocity.
Step-by-Step: Making Your First AI Test
Step 1: Choose a critical flow
Start with a flow that matters to your business and breaks frequently. Good candidates:
User signup/onboarding
Checkout or payment
Core product actions (creating, editing, sharing)
Authentication flows
Don't start with edge cases. AI testing tools work best on well-traveled paths where they can leverage common patterns.
Step 2: Describe the flow (or record it)
You have two main approaches:
Describe in natural language. Write your test description as if you're explaining the flow to a new team member. Include the starting point, actions in order, and how you know it worked.
Record the flow. Perform the actions yourself while the tool watches. This gives you precise control—useful for complex flows or when you know exactly what edge cases matter. The AI enhances your recording with intelligent selectors and handles maintenance automatically.
Most teams mix these: record critical flows where precision matters, describe simpler scenarios in natural language, and let the AI suggest additional coverage you might have missed. Tools like Decipher (that's us 👋) support all three approaches, so you can pick whichever fits each situation.
For natural language descriptions, include:
Starting point: Where does the flow begin?
Actions: What does the user do, in order?
Verification: How do you know it worked?
Example prompts that work well:
✅ "Starting from the homepage, search for 'wireless headphones', click the first result, add it to cart, proceed to checkout, enter test shipping address (123 Test St, San Francisco, CA 94102), complete payment with card 4242424242424242, and verify the order confirmation shows the correct product."
✅ "Log in as test user (email: qa@test.com, password: testpass123), navigate to Settings, change the display name to 'Updated Name', save changes, and verify the new name appears in the header."
Prompts that produce poor tests:
❌ "Test the checkout" (too vague)
❌ "Make sure everything works" (no specific verification)
❌ "Click all the buttons on the page" (not a real user flow)
Step 3: Review and refine the generated test
AI-generated tests aren't perfect on the first try. Review the test for:
Completeness: Did the AI capture all the steps you intended?
Assertions: Are the verification checks meaningful, or just "page loaded"?
Data: Does the test use appropriate test data, or will it create garbage in your system?
Most AI testing tools let you edit the generated test in natural language: "After adding to cart, also verify the cart count badge shows '1'."
Step 4: Run and stabilize
Run the test multiple times across different browsers or viewports. If it passes consistently, you're done. If it's flaky, investigate:
Timing issues: Does the AI wait for elements, or race against page loads? (See the sketch after this list.)
Dynamic content: Are there random elements (ads, recommendations) that interfere?
Test data: Does the test depend on specific data that might change?
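If you end up debugging a flaky step in exported or hand-written code, the usual culprit is a fixed sleep racing a slow page. A minimal sketch of the difference (selector and URL assumed):

import { test, expect } from '@playwright/test';

test('order confirmation appears after placing the order', async ({ page }) => {
  await page.goto('https://shop.example.com/checkout'); // assumed URL

  await page.getByRole('button', { name: /place order/i }).click();

  // Flaky: a fixed sleep races the server; too short and it fails, too long and it wastes CI time
  // await page.waitForTimeout(2000);

  // Stable: the web-first assertion retries until the element appears or the timeout expires
  await expect(page.locator('.order-confirmation')).toBeVisible({ timeout: 10_000 });
});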
Step 5: Add to your CI pipeline
Once stable, integrate the test into your continuous integration workflow. Configure it to run on pull requests or deployments, and route failures to the appropriate team.
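Most AI testing platforms provide their own CI integrations or CLI runners; if you instead run exported Playwright tests, a minimal CI-aware config might look like this (a sketch under assumed settings, not a recommendation for any particular platform):

// playwright.config.ts: minimal CI-aware setup (illustrative)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry a couple of times on CI, never locally, so genuine failures stay loud
  retries: process.env.CI ? 2 : 0,
  // Fail the build if a stray test.only sneaks into a pull request
  forbidOnly: !!process.env.CI,
  // Machine-readable results for the pipeline, readable summary for humans
  reporter: process.env.CI ? [['junit', { outputFile: 'results.xml' }], ['list']] : 'list',
  use: {
    baseURL: process.env.BASE_URL ?? 'https://staging.example.com', // assumed staging URL
  },
});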
Prompt Engineering for Better AI Tests
The quality of your AI-generated tests depends heavily on how you describe them. Research shows teams can improve AI testing effectiveness by 40% with better prompting strategies. Here's what works:
Be specific about verification
❌ "Check that it works"
✅ "Verify the success message 'Your order has been placed' appears, the cart is empty, and the order number is displayed"
Include realistic test data
❌ "Fill out the form"
✅ "Enter shipping address: 500 Tech Lane, Austin TX 78701. Use phone 555-123-4567."
Describe the expected state, not just actions
❌ "Click submit"
✅ "Click submit and verify the form disappears, a confirmation email is mentioned, and the user is redirected to the dashboard"
Handle conditional flows
✅ "If a cookie consent banner appears, click 'Accept'. Then proceed to login..."
Specify negative cases explicitly
✅ "Attempt to checkout with an invalid card number (4000000000000002). Verify an error message appears and the order is not placed."
When AI-Generated Tests Work Best
AI testing excels in specific scenarios:
High-coverage expansion. If you have 20 tests and need 200, AI can generate the bulk of them quickly. Manual refinement is still needed, but the heavy lifting is automated.
Regression testing. Stable, critical flows that you test repeatedly are perfect for AI—the tests run constantly, and self-healing handles minor UI changes.
Smoke testing. Quick verification that core functionality works after deployment. AI tests are reliable enough for "did we break something obvious?" checks.
Democratizing test creation. When PMs or designers can describe tests in natural language, testing becomes a team responsibility rather than an engineering bottleneck. Studies show 74% of testing professionals identify as beginners in AI, making accessible tools critical for adoption.
When to Write Tests Manually Instead
AI isn't always the right choice:
Complex business logic. If the test requires intricate state setup, specific data combinations, or verification of backend calculations, hand-written tests give you more control.
Performance testing. AI E2E tools focus on functional correctness, not load testing or performance benchmarks.
Visual regression. Some AI tools include visual testing, but dedicated tools like Percy or Chromatic are usually more capable for pixel-perfect verification.
Security testing. AI testing tools don't replace security scanners or penetration testing.
Making AI Tests Part of Your Workflow
The teams that succeed with AI testing don't just generate tests—they build a sustainable process:
Review AI tests like code. Even AI-generated tests deserve review. Check that they test what matters, use appropriate test data, and won't create side effects.
Connect tests to production impact. Prioritize tests for flows where bugs have historically caused customer issues. AI can generate tests for everything, but not everything needs testing.
Monitor test health. Track flakiness rates and fix unstable tests quickly. A noisy test suite that everyone ignores is worse than no tests at all.
Iterate on prompts. Save effective prompts and descriptions as templates for future tests. Your "checkout test" prompt will improve over time.
Frequently Asked Questions
Can AI really write tests for me?
Yes. Modern AI testing tools can generate functional E2E tests from natural language descriptions. You describe the user flow you want to test, and the AI produces working automation code with appropriate selectors, waits, and assertions. The tests typically require some refinement, but the core automation is handled by the AI. Teams report reducing test authoring time by 50% or more compared to manual scripting.
What do I need to make tests with AI?
To make tests with AI, you need access to your application (either a staging environment or production), an AI testing platform (options range from free tiers to enterprise solutions), and a clear understanding of what flows you want to test. No automation engineering expertise is required—that's the point of AI testing. You can either describe flows in plain English or record them yourself; the AI handles the technical implementation and ongoing maintenance.
How do I write a good prompt for AI test generation?
Write prompts that include four elements: the starting point (where the test begins), specific user actions in sequence, realistic test data (addresses, names, card numbers for testing), and clear verification criteria (what indicates success). Be as specific as possible about what you expect to see at the end. Avoid vague instructions like "make sure it works" in favor of concrete checks like "verify the confirmation page shows the order total."
Are AI-generated tests reliable?
AI-generated tests are generally more reliable than hand-written tests for UI automation because they use intelligent element identification rather than brittle selectors. When a button moves or gets renamed, AI tests often continue working while traditional Playwright or Selenium tests break. However, AI tests aren't magic—they can still fail due to timing issues, environmental differences, or genuine bugs. Teams typically see self-healing rates of 90-95% for common UI changes.
How long does it take to make tests with AI?
Creating a single E2E test with AI typically takes 2-10 minutes depending on flow complexity, compared to 30-60 minutes for manual automation. The time is spent describing the flow and refining the generated test. Complex flows with many branches or conditional logic take longer. Most teams can build initial coverage of their core user journeys in a single day rather than weeks.
Can non-engineers make tests with AI?
Yes. The natural language interface of AI testing tools means anyone who can describe a user flow can create a test. Product managers, designers, and QA analysts can all contribute to test coverage without writing code. This democratization is one of the primary benefits of AI testing, as it removes the engineering bottleneck from quality assurance. However, engineers should still review AI-generated tests to ensure they're appropriate and well-designed.
How do AI-generated tests handle dynamic content?
AI testing tools use context and intent rather than exact matching to handle dynamic content. If your page includes personalized recommendations, timestamps, or random elements, the AI typically focuses on the structural elements of the flow rather than specific content. For verification, you can instruct the AI to check for patterns ("verify an order number in format ORD-XXXXX appears") rather than exact values.
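In code form, that pattern check is just a regular-expression assertion, as in this short sketch (URL and order-number format assumed from the example above):

import { test, expect } from '@playwright/test';

test('order number matches the expected format', async ({ page }) => {
  await page.goto('https://shop.example.com/orders/latest'); // assumed URL

  // Assert on the pattern (ORD- followed by digits), not on an exact value that changes per run
  await expect(page.getByText(/ORD-\d{5}/)).toBeVisible();
});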
Getting Started
Making tests with AI isn't a future promise—it's a practical option for engineering teams today. The technology has matured to the point where AI-generated tests are often more stable and faster to create than hand-written automation.
Start with a single critical flow. Describe it clearly. Generate the test. Refine it until it passes reliably. Then add another. Within a week, you can have meaningful E2E coverage that would have taken months with traditional approaches.
Decipher lets you make tests by recording flows, describing them in plain language, or letting an agent suggest coverage. Tests generate in minutes and update automatically as your product evolves. See how it works.
Written by:
Michael Rosenfield
Co-founder