Building an Automated Test Framework with AI in Record Time

Andrew Orange

Most of us in tech are fascinated by the relentless wave of AI tools landing on an almost weekly basis. The pace of change is staggering, and it’s hard not to be drawn in.

My last mandate was to help an AI company improve its software quality. The rough goal? Build a solid test automation framework that developers can simply drop tests into. Ambitious? Definitely, especially with just 7 weeks to get it done.

How I Approach Testing

I always start with exploratory testing and test charters. My charters follow the template from Elisabeth Hendrickson’s book Explore It!, which looks like this:

Explore [some target] With [resource] To discover [particular information]

Each note captured during a charter session gets one of these outcome tags:

  • #bug – A bug!
  • #finding – Something interesting, but not necessarily a bug
  • #idea – A thought for the team to consider
  • #question – Something that needs clarification
  • #answer – A response to a previous question
  • #risk – A potential issue we need to mitigate
  • #todo – A task to be tackled
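The charter template and outcome tags above map naturally onto a small data structure. The sketch below is a hypothetical Python representation (the `Charter` and `Tag` names are my own, not from Explore It!) that renders the mission sentence and counts tagged notes, which is roughly what a session debrief looks at. The example charter and notes are placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Tag(Enum):
    """Outcome tags for notes taken during a charter session."""
    BUG = "#bug"
    FINDING = "#finding"
    IDEA = "#idea"
    QUESTION = "#question"
    ANSWER = "#answer"
    RISK = "#risk"
    TODO = "#todo"


@dataclass
class Charter:
    """Hypothetical model of an 'Explore ... with ... to discover ...' charter."""
    target: str
    resource: str
    information: str
    notes: list[tuple[Tag, str]] = field(default_factory=list)

    def mission(self) -> str:
        # Render the charter using the three-part template.
        return (f"Explore {self.target} with {self.resource} "
                f"to discover {self.information}")

    def log(self, tag: Tag, text: str) -> None:
        # Record one observation from the session under its outcome tag.
        self.notes.append((tag, text))

    def summary(self) -> dict[str, int]:
        # Count notes per tag; tags with no notes are left out.
        counts = Counter(tag for tag, _ in self.notes)
        return {tag.value: counts[tag] for tag in Tag if counts[tag]}


# Placeholder session, for illustration only.
charter = Charter("the search page", "unusual inputs", "validation gaps")
charter.log(Tag.BUG, "example: emoji in the query causes an error page")
charter.log(Tag.QUESTION, "example: is an empty search allowed?")
print(charter.mission())
print(charter.summary())
```

Keeping notes as `(tag, text)` pairs keeps the session log in chronological order while still letting the summary show where bugs, risks, and open questions cluster.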

I spend about a week on this, raising bugs as I go and asking the team a lot of questions. This pinpoints the pain points: where safety nets should be placed and, crucially, where and how to automate.

Do we need more unit tests? Data journeys (DB ↔ API ↔ UI)? User journeys across multiple UIs or within a single UI? Structured exploratory sessions? E2E testing? Often, it’s a mix of all of the above.

Generative AI can suggest a wealth of new testing ideas, but they need to be structured properly. Mark Winteringham covers this brilliantly in his book Software Testing with Generative AI.

AI-Powered Test Automation

To build the automated test cases, I leaned heavily on AI. Given that I was working with an AI company, I asked them directly: What would you recommend to speed up my work while maintaining quality?

They introduced me to two game-changing tools:

  1. Windsurf – an AI-powered code editor. I'd been using GitHub Copilot and was considering Cursor, but they told me to skip both and go straight to Windsurf.
  2. Warp – an AI-native terminal. Previously, I was using mods, a command-line AI tool, but Warp has AI built in from the ground up.

Long story short: in 7 weeks, two of us from House of Test Switzerland built two testing frameworks and integrated them into the client’s pipeline, among other achievements we were particularly proud of.

How Fast Can AI Build a Test Framework?

Once the project was wrapped up, I started thinking: How quickly could I build an automated testing framework using AI tools?

So I gave it a go, using houseoftest.ch as the target.

  1. Created an empty project folder and opened it in Windsurf.
  2. Prompted: I need a new project. I have to test this website using Playwright in Python: https://www.houseoftest.ch/. Cover all menu items and languages. Tests should run via pytest.
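“Cover all menu items and languages” is essentially a test matrix: every menu item checked in every language. The post doesn’t show the tests Windsurf generated, so here is a minimal sketch of how such a matrix could be built. The language codes, menu labels, and URL scheme are hypothetical placeholders, not the site’s real structure.

```python
from itertools import product

# Hypothetical values: the post does not list the site's actual
# menu entries, language codes, or URL scheme.
BASE_URL = "https://www.houseoftest.ch"
LANGUAGES = ["en", "de"]
MENU_ITEMS = ["Home", "About", "Contact"]


def coverage_matrix(languages, menu_items):
    """Return every (language, menu item) pair, one per test case."""
    return list(product(languages, menu_items))


def case_id(language, menu_item):
    """Readable test ID such as 'de-Contact' (handy with pytest -k)."""
    return f"{language}-{menu_item}"


MATRIX = coverage_matrix(LANGUAGES, MENU_ITEMS)
IDS = [case_id(lang, item) for lang, item in MATRIX]

# How a pytest-playwright test might consume the matrix (not executed
# here; it needs pytest, pytest-playwright and installed browsers):
#
# @pytest.mark.parametrize("language,menu_item", MATRIX, ids=IDS)
# def test_menu_item(page, language, menu_item):
#     page.goto(f"{BASE_URL}/{language}/")  # hypothetical URL scheme
#     page.get_by_role("link", name=menu_item).click()
#     assert page.title()  # the target page rendered a title
```

Generating the pairs in one place means adding a language or menu entry automatically adds the corresponding test cases, and the IDs make a failing combination easy to spot in the report.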

Fifteen minutes later, I had a framework with tests. I told Windsurf to run them, and, of course, they all failed. So then I:

  1. Told it to fix the tests and add email reporting. It spent 45 minutes writing the email code, refactoring, debugging, stabilising the tests, and even generating analysis tools.
  2. Ended up, one hour later, with exactly what I needed.
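The post doesn’t show the email code Windsurf produced. As a rough idea of what email reporting involves, here is a minimal standard-library sketch, assuming the suite is run with pytest’s `--junitxml=report.xml` option and that an SMTP server supporting STARTTLS is available. The function names, addresses, and server details are placeholders.

```python
import smtplib
import xml.etree.ElementTree as ET
from collections import Counter
from email.message import EmailMessage


def outcomes_from_junit(xml_text: str) -> dict[str, str]:
    """Map test names to outcomes from a JUnit XML report.

    Simplification: assumes test names are unique across the suite.
    """
    outcomes = {}
    for case in ET.fromstring(xml_text).iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            outcome = "failed"
        elif case.find("skipped") is not None:
            outcome = "skipped"
        else:
            outcome = "passed"
        outcomes[case.get("name")] = outcome
    return outcomes


def build_report(outcomes: dict[str, str], sender: str, recipient: str) -> EmailMessage:
    """Turn {test name: outcome} into a plain-text summary email."""
    counts = Counter(outcomes.values())  # missing outcomes count as 0
    failed = sorted(name for name, o in outcomes.items() if o == "failed")
    msg = EmailMessage()
    msg["Subject"] = (f"Test run: {counts['passed']} passed, "
                      f"{counts['failed']} failed, {counts['skipped']} skipped")
    msg["From"] = sender
    msg["To"] = recipient
    lines = ["Failed tests:"] + ([f"- {name}" for name in failed] or ["(none)"])
    msg.set_content("\n".join(lines))
    return msg


def send_report(msg: EmailMessage, host: str, port: int,
                user: str, password: str) -> None:
    # Assumes an SMTP server that supports STARTTLS (commonly port 587).
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Splitting parsing, formatting, and sending keeps the report logic testable without a mail server; only `send_report` needs real credentials.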

Next, I asked it to generate a README.md so devs could install the framework. Specifically, I told it to include instructions for deploying on an Ubuntu cloud instance. Ten seconds later: done.

I deployed the code to my preferred cloud provider (lunanode), tweaked the README slightly (Windsurf fumbled some Ubuntu-specific install steps), and then moved on to scheduling the test runs.

I asked Warp: What’s the best scheduling tool for Linux?

It replied that cron is still a solid option and asked if it should create the crontab entry. Yeah, why not? Boom: tests scheduled to run every hour, with reports sent directly to my boss. 🙂
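The crontab entry Warp created isn’t shown in the post. To illustrate what an hourly schedule looks like, this hypothetical helper builds one; the project path, log path, and interpreter are placeholders.

```python
import shlex


def hourly_cron_entry(project_dir: str, log_file: str,
                      python: str = "python3", minute: int = 0) -> str:
    """Build a crontab line that runs the pytest suite once an hour."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    command = (f"cd {shlex.quote(project_dir)} && "
               f"{shlex.quote(python)} -m pytest >> {shlex.quote(log_file)} 2>&1")
    if "%" in command:
        # cron turns an unescaped % in the command into a newline.
        raise ValueError("escape % characters before using them in crontab")
    # Five schedule fields: minute, hour, day of month, month, day of week.
    return f"{minute} * * * * {command}"


print(hourly_cron_entry("/opt/site-tests", "/var/log/site-tests.log"))
```

The printed line can be pasted into the editor opened by `crontab -e`. Because cron runs jobs with a minimal environment, pointing `python` at the absolute path of the virtualenv’s interpreter is usually safer than relying on `PATH`.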

AI Is a Tool, Not a Replacement

AI is powerful, but using these tools blindly won’t get you far. As test engineers, we need to:

  • Identify the weak spots in the system
  • Determine what work needs doing (framework? pipeline? experimentation?)
  • Spot communication gaps
  • Ensure teams are aligned on test and release strategies

It’s all about finding what’s required and choosing the best tool for the job.

Let us know if you need any help! 🚀