ai engineer roadmap - Matt Pocock

These are my notes on Matt Pocock's aihero.dev course "AI Engineer Roadmap".

https://www.aihero.dev/ai-engineer-roadmap


What is an AI Engineer?

Matt said his course is partly based on the Latent Space article "The Rise of the AI Engineer".

"An AI Engineer is a software developer who builds applications powered by AI"

An AI Engineer needs:

  • strong software engineering fundamentals
  • the ability to build reliable, scalable applications
  • knowledge of modern AI tools and frameworks
  • a focus on user experience and applications

AI Engineers use models, apply prompt engineering, build pipelines/chains, optimize for cost/speed/performance, manage infrastructure, and evaluate systems.

This is different from an ML Engineer, who builds and trains models - work that involves heavy math and PhD-level skills.

Yes, web developers can become AI Engineers. And TypeScript is growing fast and makes a great fit for AI systems (well, Matt would say that, wouldn't he!.. but I do think it's true).


What can you use LLMs for?

1. Converting unstructured data to structured data, e.g. emails, call transcripts, invoices etc. (see the sketch after this list)
2. Labelling and classification, i.e. given raw input, organize it
3. Answering questions, e.g. chatbots and search engines
4. Agents - take action and interact with other systems
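
For use case 1, here's a minimal sketch of extracting structured data from an email, assuming the Vercel AI SDK (the `ai` and `@ai-sdk/openai` packages) and Zod for the schema - the model name and the invoice fields are placeholders, not from the course:

```ts
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical invoice-extraction schema: turn a raw email into structured data.
const invoiceSchema = z.object({
  vendor: z.string(),
  totalAmount: z.number(),
  dueDate: z.string().describe("ISO 8601 date"),
});

export async function extractInvoice(rawEmail: string) {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"), // placeholder model
    schema: invoiceSchema,
    prompt: `Extract the invoice details from this email:\n\n${rawEmail}`,
  });
  return object; // typed as { vendor: string; totalAmount: number; dueDate: string }
}
```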

LLMs are not suitable for deterministic systems. If you can build a deterministic system instead, you probably should.


What is an LLM (Large Language Model)? 

  • An LLM is a massively compressed file (like a 1TB zip file) containing numbers, which are the parameters of the model (it compresses knowledge into numbers)
  • The parameters are the result of the model's pre-training and represent the model's understanding of the world
    • pre-training takes a large amount of text data and compresses it into the parameters
    • pre-training -> parameters -> model
  • The number of parameters represents the model's "brain". In general, models with more parameters perform better but are slower. Models with fewer parameters may hallucinate more.
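
To get a feel for what parameter count means for file size, here's a back-of-envelope sketch (my own illustrative numbers, not from the course): size is roughly parameter count × bytes per parameter.

```ts
// Rough model file size: parameters × bytes per parameter.
// The parameter counts and precisions below are illustrative.
function modelSizeGB(parameters: number, bytesPerParam: number): number {
  return (parameters * bytesPerParam) / 1024 ** 3;
}

console.log(modelSizeGB(7e9, 2).toFixed(0));   // 7B params at fp16 ≈ 13 GB
console.log(modelSizeGB(70e9, 2).toFixed(0));  // 70B params at fp16 ≈ 130 GB
console.log(modelSizeGB(7e9, 0.5).toFixed(1)); // 7B at 4-bit quantization ≈ 3.3 GB
```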

Inference

  • inference is the process of sending text to the model and getting a response back
  • it's done using an inference function: software which takes the parameters of the model and runs the algorithm to produce a response
  • inference is far cheaper than pre-training
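
A minimal sketch of inference against a hosted API, again assuming the Vercel AI SDK; the model name is a placeholder:

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Send text to the model, get a completion back - that round trip is inference.
const { text } = await generateText({
  model: openai("gpt-4o-mini"), // placeholder model
  prompt: "Summarize the difference between pre-training and inference in one sentence.",
});

console.log(text);
```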

Sampling strategy

  • in LLM inference, sampling means choosing the next token from the probability distribution the model predicts
  • the inference function uses a sampling strategy to decide which token to pick
  • sampling strategies include: greedy sampling (always pick the most likely token), temperature sampling (introduce randomness), and top-k and top-p sampling
    • you may have noticed you can adjust the temperature of requests to get different results
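
A toy sketch of greedy vs temperature sampling over a tiny next-token distribution (pure TypeScript, illustrative only - real inference engines work over vocabularies of ~100k tokens):

```ts
// Toy next-token distribution: logits for a tiny vocabulary.
const vocab = ["cat", "dog", "hamster", "dragon"];
const logits = [2.0, 1.8, 0.5, -1.0];

// Softmax with temperature: lower temperature sharpens the distribution,
// higher temperature flattens it (more randomness).
function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Greedy sampling: always pick the most likely token.
function greedy(probs: number[]): number {
  return probs.indexOf(Math.max(...probs));
}

// Temperature sampling: draw a token at random, weighted by probability.
function sample(probs: number[]): number {
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}

console.log(vocab[greedy(softmax(logits, 1.0))]); // always "cat"
console.log(vocab[sample(softmax(logits, 1.5))]); // varies run to run
```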

Input tokens

  • numbers representing pieces of text, which are passed to the inference engine
  • text sent to a model needs to be tokenized: broken up into chunks which are then mapped to numbers
    • see this tool for an example of converting words to numbers: https://tiktokenizer.vercel.app/?model=google%2Fgemma-7b
  • tokens are not 1:1 with words - there are usually more tokens than words, because words get broken up into pieces
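
You can also tokenize locally; a sketch assuming the js-tiktoken package (a JS port of OpenAI's tokenizer):

```ts
import { encodingForModel } from "js-tiktoken";

// Tokenize a sentence and compare token count to word count.
const enc = encodingForModel("gpt-4o");
const text = "Tokenization splits words into subword pieces.";

const tokens = enc.encode(text); // an array of token IDs (numbers)
console.log(tokens.length);          // typically more tokens than words
console.log(text.split(" ").length); // word count, for comparison
```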

Pre-training

As mentioned already, the steps to create a model are:
  1. take pre-training data
  2. feed it into the pre-training process (think tens of thousands of GPUs running for weeks/months)
  3. this generates the parameters
  4. there's also a post-training process, which shapes the model's personality and behavior through careful instruction and examples
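
To see why pre-training needs that much hardware, a common rule of thumb (my addition, not from the course) is training compute ≈ 6 × parameters × training tokens:

```ts
// Rule-of-thumb training FLOPs: ~6 * N (parameters) * D (training tokens).
// The numbers below are illustrative, not from the course.
function trainingFlops(parameters: number, trainingTokens: number): number {
  return 6 * parameters * trainingTokens;
}

const flops = trainingFlops(70e9, 2e12); // 70B params trained on 2T tokens
console.log(flops.toExponential(1));     // ~8.4e+23 FLOPs

// At a sustained ~1e14 FLOP/s per GPU, that's on the order of GPU-centuries,
// which is why training runs use thousands of GPUs for weeks or months:
const gpuSeconds = flops / 1e14;
console.log((gpuSeconds / (3600 * 24 * 365)).toFixed(1), "GPU-years"); // ≈ 266.4
```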

Thinking models

  • a class of models which use thinking strategies (backtracking, revisiting assumptions etc.) to solve problems
  • not all models are thinking models. OpenAI calls its thinking models "reasoning models" and used an o-prefix, e.g. o1 is a reasoning model but GPT-3.5 is not. They have since scrapped the o-prefix, and the newest GPT models (GPT-5) have reasoning built in.

References: 

Deep Dive into LLMs - Andrej Karpathy
How I Use LLMs - Andrej Karpathy
Claude Artifacts - idea: generate Mermaid diagrams of book chapters, and code


5 questions to ask when choosing an LLM

  1. Open vs closed?
    1. Open-source models are free to download and use.
    2. Closed models are controlled by companies and you pay to use them, e.g. OpenAI's, Google's, and Anthropic's models.
  2. How much will it cost?
    1. Most models charge by tokens, both input and output tokens - a "pay as you use" model (see the cost sketch after this list)
      1. see a pricing calculator
    2. If you host the model yourself, then you pay for the hosting
    3. AWS Bedrock provides a mix of both: token and hosting costs in one easy-to-use package
  3. How important is latency? (the time it takes for the model to respond)
    1. impacted by:
      1. the size of the model
      2. the hardware it's running on
      3. inference optimizations
  4. How do I assess model performance?
    1. How accurate are the model's responses? How well does it perform the task asked of it?
    2. There are benchmarks to evaluate this, but it's complicated - models that score highly on benchmarks have still been bad in other areas.
    3. see the lmarena.ai leaderboard, e.g. for webdev
  5. How big a context window do I need?
    1. the number of tokens the model can see at a time - the larger the context window, the more information the model has to predict with
      1. measured in tokens
    2. but it's complicated: studies have shown model performance deteriorating even with context windows under 10k tokens
    3. model providers are increasing context sizes all the time; the latest are up to 250k tokens
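
Here's the cost sketch mentioned under question 2: token billing is usually quoted per million tokens, so estimating a request's cost is simple multiplication (the prices below are placeholders - check the provider's current rates):

```ts
// Estimate a request's cost from token counts and per-million-token prices.
// Prices are placeholders - look up the provider's current rates.
interface Pricing {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

function requestCostUSD(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens / 1e6) * p.inputPerMillion + (outputTokens / 1e6) * p.outputPerMillion;
}

const hypotheticalPricing: Pricing = { inputPerMillion: 0.15, outputPerMillion: 0.6 };

// A request with 2,000 input tokens and 500 output tokens:
console.log(requestCostUSD(2_000, 500, hypotheticalPricing).toFixed(6)); // "0.000600"
```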

The best way to evaluate is to test with your own data and use cases - see the sketch below.
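
A minimal sketch of that kind of evaluation loop, assuming the same AI SDK setup as above; the test cases and the naive contains-check are hypothetical (real evals use richer scoring, e.g. exact match or LLM-as-judge):

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical test cases drawn from your own data.
const cases = [
  { prompt: "Extract the due date: 'Invoice due 2024-03-01'", expected: "2024-03-01" },
  { prompt: "Extract the due date: 'Pay by 5 March 2024' as ISO 8601", expected: "2024-03-05" },
];

// Naive check: does the output contain the expected string?
function scoreAnswer(output: string, expected: string): boolean {
  return output.includes(expected);
}

for (const c of cases) {
  const { text } = await generateText({
    model: openai("gpt-4o-mini"), // swap in each candidate model here
    prompt: c.prompt,
  });
  console.log(scoreAnswer(text, c.expected) ? "PASS" : "FAIL", "-", c.prompt);
}
```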


The AI Engineer Mindset






