AIs: the good, the bad, the ugly - the ugly
There are some things AI coding assistants are not good at assisting with at this time. But because AI assistants always want to predict, they will predict anyway. And they will sound very confident in the prediction. Sometimes the answer is completely misleading, just garbage. Which is much worse than no answer at all.
We recently upgraded our React version and some Cypress tests for the app then started failing. I did some investigation and could see an unexpected failing API call in the tests. Why?
- I asked the LLM for assistance, with some context such as the new version of React, some libraries, and details of the test failure.
- Its prediction was a reasonable guess given the prompt, but a wrong guess.
It suggested a bunch of code changes such as the following:
1. Adding timeouts to every Cypress element check, e.g.
`cy.contains('button', 'Save', { timeout: 10000 }).should('be.visible').click();`
2. New Cypress helper commands to handle a double action, plus changes to existing tests to use the new `waitForAction` command:
```js
// Add to your cypress/support/commands.js
Cypress.Commands.add('handleDoubleAction', (actionFn, expectedCalls = 1) => {
  // For React 18 Strict Mode, actions might be called twice
  // This command helps handle that gracefully
  let callCount = 0;
  return new Cypress.Promise((resolve) => {
    const originalFn = actionFn;
    actionFn = (...args) => {
      callCount++;
      if (callCount <= expectedCalls) {
        return originalFn(...args);
      }
      // Ignore subsequent calls in Strict Mode
      return Promise.resolve();
    };
    resolve(actionFn);
  });
});

// Command to wait for actions to complete (handles double-invocation)
Cypress.Commands.add('waitForAction', (alias, timeout = 10000) => {
  // In React 18 Strict Mode, the same action might be called twice
  // Wait for the first one to complete
  cy.wait(alias, { timeout });
  // If there's a second call, wait for it too but don't fail if it doesn't happen
  cy.wait(alias, { timeout: 5000 }).then(() => {
    // Success - both calls completed
  }).catch(() => {
    // Only one call happened - that's fine
  });
});
```
Accepting the prediction would add almost 500 new lines of code to our codebase!
The AI also created a new ~200-line script file to migrate existing Cypress tests to work in the new React version, i.e. a codemod to apply the new timeout pattern across the codebase. Crazy.
Wanna know what the fix actually was?
It's a 2-line fix in 2 places: adding `type="button"` to two HTML buttons!
Without an explicit type, the default type for an HTML button is `submit`, so clicking it inside a form submits the form. That default behavior was submitting a form unexpectedly.
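In JSX terms the change was roughly this (an illustrative sketch, not our actual component; `handleSave` is a made-up handler name):
```tsx
// Before: no type attribute. Inside a <form>, this button defaults to
// type="submit", so clicking it also submits the enclosing form.
<button onClick={handleSave}>Save</button>

// After: an explicit type="button" opts out of the default submit behavior.
<button type="button" onClick={handleSave}>Save</button>
```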
I added the `type` prop fix and the Cypress tests all pass once again.
I don't yet know why we only saw this failure in the Cypress tests (and not when running the app manually), but the root cause itself makes sense. I plan to do a follow-up refactor to improve on the current code.
How did I find the root cause? By studying the code and runtime behavior.
Adding console.logs and reading them in the Cypress runner, then reading the code in that part of the system, and from those learnings figuring out that a form must be getting submitted and that an HTML button must be doing it.
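A minimal sketch of that kind of instrumentation (hypothetical names, not our actual code): the native `SubmitEvent` has a `submitter` property that tells you exactly which element triggered the submission.
```tsx
import React from 'react';

// Hypothetical form component, instrumented so the Cypress runner's console
// shows which element triggered each submission.
export function SaveForm({ onSave }: { onSave: () => void }) {
  return (
    <form
      onSubmit={(event) => {
        // SubmitEvent.submitter is the button/input that caused the submit
        const native = event.nativeEvent as SubmitEvent;
        console.log('form submitted by:', native.submitter);
      }}
    >
      {/* No type attribute, so this defaults to type="submit" */}
      <button onClick={onSave}>Save</button>
    </form>
  );
}
```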
I try to explain to people that there are some things the code assistant AIs are good at and some things they are not. Sometimes I feel that message is not getting through; folks think: "AI can generate hundreds of lines of code, that's magic and amazing... so it must be amazing at everything".
Not true.
In general, the more deterministic the outcome, the better the AI will be, e.g. given an initial state example, write a typed reducer function. Standard, well-understood pattern. You'll usually get a good code result.
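For example, given a prompt like "write a typed reducer for this state", you'll usually get something correct along these lines (the state shape here is hypothetical):
```ts
// A standard, well-understood pattern: a typed reducer over a known state
// shape with a discriminated union of actions.
interface CartState {
  items: string[];
  total: number;
}

type CartAction =
  | { type: 'add_item'; item: string; price: number }
  | { type: 'clear' };

function cartReducer(state: CartState, action: CartAction): CartState {
  switch (action.type) {
    case 'add_item':
      return {
        items: [...state.items, action.item],
        total: state.total + action.price,
      };
    case 'clear':
      return { items: [], total: 0 };
  }
}
```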
The less deterministic the outcome, the more the prediction can be waaay off, e.g. finding the root cause of an unusual bug.
The AI will still predict. And its prediction will sound convincing. But in some use cases the prediction is garbage.
If you don't know better, if you're under pressure (deadlines etc.), if you're stressed, you may just accept the answer and boom! You've just added a bunch of garbage code which doesn't belong, may well cause other problems or bugs, and will only confuse other devs.
My bet is that over time this will happen more and more (in some cases it already is), such is the pressure to accept. Even devs who should know better will fall into that trap. Critical thinking and attention to detail are ever more important.