How to Use OpenAI Codex for End-to-end Testing | by Júlio Almeida

Automating the automated


This week I was choosing which end-to-end testing tool to use for a certain project, even though my biased brain screams Cypress all the way. But OpenAI always screams louder these days, inserting itself into every situation. As you may have concluded, the question here is:

“Can we build an agnostic tool that converts testing behavior to multiple frameworks?”

Let’s see!

The market for end-to-end testing tools isn't evenly divided, in my opinion, but from what I have seen there are a few tools in the spotlight:

  • Selenium: Holds a great part of the market share, but it's not known as the best tool. It was the first to appear and therefore still sits in first place. In practice, most people mean Selenium WebDriver, since Selenium itself is a barebones base that extensions like WebDriver build on.
  • Cypress: A newer, much more user-friendly option, built on top of Node, which brings a lot of tooling and community along with it. It has some performance limitations compared with Selenium, but usually that's not the point: CI/CD pipelines are dirt cheap and speed is not the name of the game, so I wouldn't be concerned.
  • Puppeteer: A module useful for testing and web scraping, and also easy to use. It doesn't have Cypress's testing infrastructure, but that was never its main focus.
  • Playwright: A Microsoft tool, and the new kid on the block. Very similar to Cypress, but you are not stuck in its ecosystem, which is good, since Cypress gives you some nice tools but also some limitations that come along with them.

There are other tools on the market, but they come with some kind of vendor lock-in, which is not the point here.

As I just mentioned, the market offers two options: code-oriented testing (developer-friendly) or user-oriented testing (no-code). The first tends to be open source, or at least gives you freedom in how to run it, but it needs to be done by a developer. With the second you are limited by the ecosystem and know nothing about what happens backstage, but a manager without any coding knowledge could do the entire testing by themselves, which makes sense for more enterprise-grade companies. Neither option lets you have your cake and eat it too, like using a low-code tool that generates the code so you can then use it elsewhere. OpenAI can solve this problem, and could even become a product.

First, let’s be clear on what we want with an image:

Modern language compilation vs. the OpenAI conversion logic

The left side of this image is an oversimplification of how modern languages work, or rather how .NET and Java work. These languages are compiled to an intermediate language (IL/bytecode) that looks like pseudo-assembly, which is only converted to machine code once the program runs on a machine with the proper framework installed.

The idea here is the same. We don't want a static, one-off generation of tests; we want an easy interface (e.g. clicking on the page) that is used to build a text document for OpenAI. That document would then be run through the engine to create tests in Cypress or any other option.
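As a sketch of that idea, here is a minimal, hypothetical recorder that turns captured UI actions into the plain-English lines such a document would contain. The action shapes, ids, and phrasing below are all assumptions for illustration, not part of any real tool:

```javascript
// Hypothetical recorder: converts captured UI actions into plain-English
// test lines that will later be fed to the completion engine.
function describeAction(action) {
  switch (action.type) {
    case "click":
      return `Click on the element with id "${action.id}"`;
    case "type":
      return `Write "${action.text}" in the input with id "${action.id}"`;
    case "assertDisabled":
      return `Make sure the button with id "${action.id}" is disabled`;
    default:
      return `Do something with the element with id "${action.id}"`;
  }
}

// Example: a short recorded session becomes one test description.
const session = [
  { type: "type", id: "name", text: "Júlio" },
  { type: "click", id: "next" },
  { type: "assertDisabled", id: "submit" },
];
const testText = session.map(describeAction).join("\n");
```

The point is only the shape of the pipeline: clicks on one side, a human-readable intermediate document on the other.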

Completion mode and the latest davinci engine

The update that came out this week released a new version of davinci, as well as new features, but we are going to stick with completion for now.
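As a rough sketch of what calling completion mode looks like from code: the model name, parameter values, and stop sequence below are assumptions for illustration, not values taken from the article's screenshots:

```javascript
// Sketch: build the body of a request to the completions endpoint.
// "code-davinci-002" and the parameter values are assumptions here.
function buildCompletionRequest(document) {
  return {
    model: "code-davinci-002",
    prompt: document,          // the Codex document with the test descriptions
    max_tokens: 256,
    temperature: 0,            // deterministic output suits test generation
    stop: ["/* Command */"],   // hypothetical stop sequence between tests
  };
}

// In a real script you would POST this as JSON to the OpenAI
// completions endpoint, authenticated with your API key.
const body = buildCompletionRequest("Write Cypress tests for https://example.com");
```

Temperature 0 is a deliberate choice here: for generating tests you generally want the most likely completion, not a creative one.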

OpenAI Codex Document

This is the structure of the document, beautifully straightforward: a phrase telling the engine what to do, followed by the URL to test. Every test is then separated by a stop sequence and contains a simple phrase explaining what the client wants to do. In our idea, this phrase would be generated after the user clicks a button, for example (using a hypothetical interface). After all the tests are written, we can generate the code.
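A minimal sketch of assembling such a document in code. The instruction phrase, URL, and stop sequence are placeholders, not the exact ones from the screenshot:

```javascript
// Sketch: assemble the Codex document from an instruction, a URL,
// and a list of plain-English test descriptions.
const STOP = "/* New test */"; // hypothetical stop sequence

function buildDocument(url, tests) {
  const header = `Write Cypress end-to-end tests for the page at ${url}.`;
  // Header first, then each test separated by the stop sequence.
  return [header, ...tests].join(`\n${STOP}\n`);
}

const doc = buildDocument("https://example.com", [
  "Write in every question, step over each one, then clear the content.",
  "Change the theme and check that it changed.",
]);
```

The same stop sequence is passed to the completion request so the engine stops at the end of each generated test.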

Document with some completion help

It helps enormously to provide the first part of the completion yourself: it makes the job, and the quality of the result, something like 100x better.

I recently built a template app to test some things for future use. It's deployed and ready to be played with. Let's design two tests:

  • A test that writes in every question, steps over each question, and then clears the content. It also makes sure the button is disabled when the inputs are empty.
  • A test that changes the theme
The generated result of the tests above

As you can see, this works extremely well, but to make it production-ready you would need a pretty complex decision tree to cover all the possible phrasings; otherwise your code would be unreadable.

Notice that each line of text maps to a line of code, written in a way that makes sense to the AI. Sometimes I mention the id of the component, as well as the text it contains. But the best one is the stepper: I described it as an array, and the model got it. Otherwise it would have produced code with direct access, which would be pretty ugly.
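To illustrate why the array description matters (the step selectors below are hypothetical, and in a real Cypress spec each call would be `cy.get(sel).click()`):

```javascript
// Hypothetical step selectors for the stepper component.
const steps = ["#step-0", "#step-1", "#step-2"];

// Array-based access, the shape the model emitted with the hint:
// one readable loop instead of repeated lines.
function stepThrough(click) {
  steps.forEach((sel) => click(sel));
}

// Direct-access equivalent it tends to produce without the hint:
//   click("#step-0"); click("#step-1"); click("#step-2");

const clicked = [];
stepThrough((sel) => clicked.push(sel));
```

Describing the data as an array nudges the model toward the loop, which also keeps the generated test correct if the number of steps changes.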

So let’s see the result:

Running Cypress

It works nicely and looks clean. The same could be done for anything, for Jest or any other unit testing tool. I also converted xUnit tests to Jest and it worked flawlessly!

OpenAI is becoming something I didn't expect to see until the release of GPT-4. The first version was kind of floppy, basically a wrapper around your fine-tuning. It's looking increasingly more aware, correcting mistakes, and even doing what you want without that much trial and error.

This article is a teaser for something that might become a startup, since the business model makes sense and, crucially, I hold the skills to do it once the world gets better.

