My learnings from Addy Osmani's article on how good AI is at coding React

These are my notes from the article "How Good Is AI at Coding React (Really)?" by Addy Osmani.

There's a lot of information packed into this article.

Addy says that AI is a force multiplier: "It amplifies everything: good requirements, good architecture, good taste".

AI is most useful for scenarios such as building isolated components, scaffolding, and implementing explicit requirements. It's less useful for scenarios such as multi-step integration, design taste, and complex state management.

We can generalize this to: the higher the complexity, the less useful (productive) the LLM is. I called out this same point in a presentation I gave to our tech leaders in October.

In fact, Addy says this explicitly later in the article: "If you remember nothing else from this article, remember this: AI handles simple tasks well and then falls off a cliff as complexity rises."

I like that Addy calls out "Objective benchmarks". We've seen by now that LLM evals can be biased to show a model in a better light. Wow! LLMs are PhD-level, so why do they struggle with complexity?
That's because some benchmark tests are limited: oversimplified problems, limited task complexity, solutions the models have seen before. These benchmarks/evals give a false impression of the models.

There are other evals and benchmarks that address these shortcomings and provide a more realistic picture of model performance:

  • Next.js evals is one such benchmark, showing a success rate of approx. 42% on its tests (https://nextjs.org/evals)
  • SWE-Bench Pro, where models score a success rate of <= 43% on tasks (https://scale.com/leaderboard/swe_bench_pro_public)


React apps are not just code; they're products with user experiences that also include reliability, security, performance, and accessibility. React devs should pay attention to user experience quality, code quality, aesthetics, and more.

Addy references these eval sites:

  • Design Arena (https://www.designarena.ai/), where users rank AI-generated content
  • Web Dev Arena (https://lmarena.ai/leaderboard/webdev), which measures how well models generate websites from prompts and follow-ups

