Posts

anthropic mythos

Anthropic's new model, Mythos, scores significantly higher than Opus on a number of benchmarks and has found security bugs which have been present for decades in some os software. How much is hype vs reality, idk, but Anthropic formed Project Glasswing with some big names to manage security vulnerabilities. Theo is worried (though you'll see some (funny) skepticism in the comments). Keeping software (browsers, devices etc.) up to date with the latest versions is now non-negotiable. btw I read the training cost for Mythos was $10 billion 😮

From Claude Code to Figma – and Back Again - my notes

From Claude Code to Figma – and Back Again presentation link
- presentation by Thariq Shihipar from Anthropic and Brett McMillin from Figma
- I (and others) had trouble connecting, so I missed the first ~10 mins, but I understand it was about installing the Figma MCP
- Brett: roles are blending, workflows are blending, ideas can start from anywhere
- Thariq: the Figma MCP allows Claude Code to go both ways; MCP allows you to get all of your data into agents
- Brett introduced useFigma (in beta), which lets you create or modify any design in Figma
  - recommended: load the "figma-use" skill in Claude Code
- Thariq jumped around a Figma design doc
- demoed building an HTML web app from a Figma design
  - then made changes in Figma and had Claude pick it up
- Thariq showed how he used Claude Code to generate a Figma design, "good for starting"
  - did they run the prompt in Figma? ...it looks like it
  - "figma canvas" - prompting playground, free tokens
- Brett showed e...

cursor composer 2 model is a lot cheaper than Claude Opus

"Tokens are the currency of LLMs", and as usage increases so do costs. These costs are no longer small change for organizations, and the messaging has definitely shifted to being cost conscious. Composer 2 is Cursor's own model, available in Cursor, and it is cheap: 10x cheaper than Claude Opus 4.6/4.7 per million output tokens, $2.5 versus $25 (and also 10x cheaper for input tokens), and 6x cheaper than Sonnet. Wow! That's really significant, and from the chart below, from Cursor's benchmarks, indications are it performs well compared to Opus 4.6. imo it makes total sense to use Cursor's Composer 2 model as the default. Why not "Auto"? I've not used Auto so I can't comment from experience, but I have read that it can be unpredictable and costly. Composer 2 works perfectly well for me. If you're in Claude Code, then use Opus for complex tasks and otherwise use Sonnet, e.g. for "write unit tests for Component Y" use Sonnet.

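To put the output-token price gap above in concrete terms, here is a minimal sketch of the arithmetic. The two prices ($2.5 and $25 per million output tokens) are the ones quoted in the post; the monthly token volume and all names in the code are hypothetical, chosen only for illustration.

```typescript
// Output-token prices in USD per million tokens, as quoted in the post.
const OUTPUT_PRICE_PER_MILLION = {
  composer2: 2.5,
  opus: 25,
} as const;

type ModelName = keyof typeof OUTPUT_PRICE_PER_MILLION;

// Cost in USD of generating `outputTokens` output tokens with the given model.
function outputCost(model: ModelName, outputTokens: number): number {
  return (outputTokens / 1e6) * OUTPUT_PRICE_PER_MILLION[model];
}

// Hypothetical workload: 40 million output tokens in a month.
const monthlyOutputTokens = 40e6;

const composerCost = outputCost("composer2", monthlyOutputTokens); // 100
const opusCost = outputCost("opus", monthlyOutputTokens); // 1000
const priceRatio = opusCost / composerCost; // 10

console.log(`Composer 2: $${composerCost.toFixed(2)}`);
console.log(`Opus: $${opusCost.toFixed(2)}`);
console.log(`Opus costs ${priceRatio}x more for the same output volume`);
```

This only covers output tokens; a real bill also includes input tokens, which the post says follow the same 10x ratio.
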
AI maturity levels - understanding the software engineering disruption

The last 2 to 3 years have been a whirlwind of disruption in software engineering due to the impact of AI. I was looking for a way to represent an "AI maturity" scale to make sense of the changes, and I think this video, based on this article from Dan Shapiro, is a good representation. From these I created a graphic which visually represents the key points as I understand them. Remember about 2 or 3 years ago when you first used Copilot in VS Code? The predictive typeahead...pretty cool eh. Then we all moved on to Cursor and its much better typeahead. Btw what a great acquisition by Cursor to buy Supermaven code completion in 2024. I believe it more than repaid their investment, given how Cursor's valuation has skyrocketed since. And then came AI chat for code gen. At the time codegen quality was low, e.g. "write tests for function xyz". It used to get maybe 30% and then you finished the rest. Then we added rules to make the codegen better (we added 1000s of lines of rules for diff...

cypress e2e tests: page objects vs application actions vs custom commands

Should you use Application Actions when writing Cypress tests? Or is the Page Objects pattern better? Or neither, and just write tests and use Cypress custom commands to share common selectors and user actions? Page Objects is a well-established pattern. Create classes for the pages in your app and put selectors in those classes. Then in your tests use (and reuse) the page object classes' methods to simulate user actions such as filling in a field or submitting a form. This centralizes and reuses selector code. The Application Actions pattern exposes the application's model as a property on window, which can be directly edited in Cypress test code. Now test setup to add todos so they can be toggled becomes basically like so: `window.appModel.addToDos([{}, {}])` The author provides examples of how to use Application Actions, including with async operations. Faster setup results in faster tests and, per the author, better organized app code. The Application Actions post link argues that Page Objects are a ...
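
To make the Application Actions idea above concrete, here is a minimal sketch that runs outside a browser. It is not the original author's code: `TodoModel`, `FakeWindow` and `toggle` are hypothetical names, and `fakeWindow` is a stand-in for the real `window` object. Only the `appModel.addToDos([{}, {}])` call shape is taken from the snippet in the post.

```typescript
// A todo item and a tiny in-memory model standing in for the app's real model.
interface Todo {
  title: string;
  completed: boolean;
}

class TodoModel {
  todos: Todo[] = [];

  // Method name follows the snippet in the post; the real app's API may differ.
  addToDos(items: Partial<Todo>[]): void {
    for (const item of items) {
      this.todos.push({
        title: item.title ?? "untitled",
        completed: item.completed ?? false,
      });
    }
  }

  // Flip the completed flag of the todo at `index`, if it exists.
  toggle(index: number): void {
    const todo = this.todos[index];
    if (todo) todo.completed = !todo.completed;
  }
}

// Stand-in for the browser's window object (assumption, so the sketch runs anywhere).
interface FakeWindow {
  appModel: TodoModel;
}

// App side: expose the model on the window-like object.
const fakeWindow: FakeWindow = { appModel: new TodoModel() };

// Test side: create state directly through the model instead of driving the UI.
fakeWindow.appModel.addToDos([{}, {}]);
fakeWindow.appModel.toggle(0);

// completedFlags is [true, false]
const completedFlags = fakeWindow.appModel.todos.map((t) => t.completed);
console.log(completedFlags);
```

In a real Cypress test the model would be reached through the browser window (for example via `cy.window()`) rather than a local object, and the app would usually expose it only when running under test.
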

My learnings from Addy Osmani's article on how good AI is at React coding

These are my notes from the article How Good Is AI at Coding React (Really)? by Addy Osmani. There's a lot of information packed into this presentation. Addy says that AI is a force multiplier: "It amplifies everything: good requirements, good architecture, good taste". AI is most useful for scenarios such as building isolated components, scaffolding, and implementing explicit requirements. It's less useful for scenarios such as multi-step integration, design taste, and complex state management. We can generalize this to: the higher the complexity, the less useful (productive) the LLM is. I called this same point out in a presentation I made in October to our tech leaders. And in fact Addy says this explicitly later in the article: "If you remember nothing else from this article, remember this: AI handles simple tasks well and then falls off a cliff as complexity rises." I like that Addy calls out "Objective benchmarks". We've seen by now that LLM model ...

AI product idea: your team's performance dashboard and continuous review

Ah yes, "Welcome Performance reviews my old friend", as the song goes. So I'm using AI to help me write my performance review. I use Glean to pull key data from conversations, documents, tickets and more. I use Gemini to help me write and summarize data. And my manager will use AI when writing their performance review of me. So why not skip all this and build a product around it: a performance dashboard and continuous review for your team. Imagine a dashboard of your team, viewable by timeframe (day, week, month etc.) for all activity, with the ability to zero in on specific projects. Imagine Agents with a specific focus:
- Performance Review Agent: reviews personnel performance based on custom criteria
- Career Growth Agent: a coach and mentor for personnel career growth, with feedback tailored to the person's role as well as their next role
- Career Change Agent: want to switch from an IC engineer to a Manager role? We have an agent for that
- Parental Leave Agent: guides you throu...