AI Engineer Roadmap - Matt Pocock
These are my notes on Matt Pocock's aihero.dev course, "AI Engineer Roadmap"
https://www.aihero.dev/ai-engineer-roadmap
What is an AI Engineer?
Matt said his course is partly based on an article from Latent Space: "The Rise of the AI Engineer"
"An AI Engineer is a software developer who builds applications powered by AI"
An AI Engineer needs:
- strong software engineering fundamentals
- the ability to build reliable, scalable applications
- knowledge of modern AI tools and frameworks
- a focus on user experience and applications
AI Engineers use models, apply prompt engineering, build pipelines/chains, optimize cost/speed/performance, manage infrastructure, and evaluate systems.
This is different from an ML Engineer, who builds and trains models, does heavy math, and needs PhD-level skills.
Yes, web developers can become AI Engineers. And TypeScript is growing fast and makes a great fit for AI systems (well, Matt would say that, wouldn't he!.. but I do think it's true)
What can you use LLMs for?
What is an LLM (Large Language Model)?
- An LLM is a massively compressed file (like a 1TB zip file) containing numbers, which are the parameters of the model (it compresses knowledge into numbers)
- The parameters are the result of the model's pre-training and represent the model's understanding of the world
- pre-training takes a large amount of text data and compresses it into the parameters
- pre-training -> parameters -> model
- The number of parameters represents the model's "brain". In general, models with more parameters perform better but are slower. Models with fewer parameters may hallucinate more.
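To get a feel for the "compressed file" analogy, the download size of an open model is roughly parameter count × bytes per parameter. A quick back-of-the-envelope sketch of mine (not from the course; the sizes and quantization levels are illustrative assumptions):

```ts
// Rough model download size: parameters × bytes per parameter.
// fp16 stores each parameter in 2 bytes; 4-bit quantization in ~0.5 bytes.
function approxSizeGB(paramsBillions: number, bytesPerParam: number): number {
  return (paramsBillions * 1e9 * bytesPerParam) / 1e9;
}

console.log(approxSizeGB(7, 2));   // 7B model at fp16  -> ~14 GB
console.log(approxSizeGB(7, 0.5)); // 7B model at 4-bit -> ~3.5 GB
console.log(approxSizeGB(70, 2));  // 70B model at fp16 -> ~140 GB
```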
Inference
- inference is the process of sending text to the model and getting a response back
- done using an inference function: software that takes the parameters of the model and runs the algorithm to produce a response
- inference is far cheaper than pre-training
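From the application side, calling a hosted model's inference function is just an HTTP request. A minimal sketch, assuming an OpenAI-compatible chat completions endpoint (the model name and env var are my assumptions, not from the course):

```ts
// Minimal inference request against an OpenAI-compatible endpoint.
async function infer(prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

infer("What is inference?").then(console.log);
```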
Sampling strategy
- sampling is choosing a subset from a larger population so conclusions can be drawn from it
- the inference function uses a sampling strategy to decide how the model picks its next token from the predicted probabilities
- sampling strategies include: greedy sampling (always pick the most likely token), temperature sampling (introduce randomness), and top-k and top-p sampling
- you may have noticed you can adjust the temperature of requests to get different results (see the toy sketch below)
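To make greedy vs temperature sampling concrete, here's a toy sketch of mine over a made-up next-token distribution (the tokens and probabilities are invented for illustration):

```ts
// Toy next-token distribution: token -> probability.
const probs: Record<string, number> = { dog: 0.5, cat: 0.3, fish: 0.2 };

// Greedy sampling: always pick the most likely token.
function greedy(p: Record<string, number>): string {
  return Object.entries(p).sort((a, b) => b[1] - a[1])[0][0];
}

// Temperature sampling: rescale probabilities, then pick randomly.
// temperature < 1 sharpens the distribution; > 1 flattens it.
function temperatureSample(p: Record<string, number>, temperature: number): string {
  const scaled = Object.entries(p).map(
    ([tok, pr]) => [tok, Math.pow(pr, 1 / temperature)] as const
  );
  const total = scaled.reduce((sum, [, pr]) => sum + pr, 0);
  let r = Math.random() * total;
  for (const [tok, pr] of scaled) {
    if ((r -= pr) <= 0) return tok;
  }
  return scaled[scaled.length - 1][0];
}

console.log(greedy(probs));                 // always "dog"
console.log(temperatureSample(probs, 1.5)); // varies run to run
```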
Tokens
- tokens are numbers representing words (or pieces of words) which are passed to the inference engine
- text sent to a model needs to be tokenized: broken up into tokens (words and word pieces) and then mapped to numbers
- see this tool for an example of converting words to numbers https://tiktokenizer.vercel.app/?model=google%2Fgemma-7b
- tokens are not 1:1 with words; there are usually more tokens than words because words get broken up
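You can also tokenize locally in TypeScript, e.g. with the js-tiktoken package (my example, not from the course; the exact token IDs depend on the encoding):

```ts
import { getEncoding } from "js-tiktoken";

// cl100k_base is the encoding used by many recent OpenAI models.
const enc = getEncoding("cl100k_base");

const tokens = enc.encode("Tokenization breaks text into sub-word pieces");
console.log(tokens);        // an array of numbers, not words
console.log(tokens.length); // usually more tokens than words
```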
Pre-training
- take pre-training data
- feed it into pre-training process (think tens of thousands of GPUs for weeks/months)
- generates parameters
- there's also a post-training process which shapes the model's personality and behavior through careful instruction and examples
Thinking models
- a class of models which use thinking strategies (backtracking, revisiting assumptions, etc.) to solve problems
- not all models are thinking models. OpenAI calls its thinking models "reasoning models" and originally gave them an "o" prefix: o1 is a reasoning model but GPT-3.5 is not. They've since dropped that naming, and the newer GPT models (GPT-5, for example) are reasoning models.
5 questions to ask when choosing an LLM
- Open vs closed?
- Open-source models are free to download and use.
- Closed models are controlled by companies and you pay to use them, e.g. models from OpenAI, Google, Anthropic, etc.
- How much will it cost?
- Most models charge by tokens, both input and output tokens: a "pay as you use" model
- If you host yourself then you pay to host the model
- AWS Bedrock provides a mix of both: token and hosting costs in one easy-to-use package
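A quick way to sanity-check per-request cost is to multiply token counts by the per-million-token prices. A sketch of mine with placeholder prices (check the provider's current pricing page; these numbers are invented):

```ts
// Estimate request cost from token counts.
// Prices are illustrative placeholders in USD per million tokens.
const PRICE_PER_MILLION = { input: 0.5, output: 1.5 };

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * PRICE_PER_MILLION.input +
    (outputTokens / 1_000_000) * PRICE_PER_MILLION.output
  );
}

// e.g. a 2,000-token prompt with a 500-token reply:
console.log(estimateCostUSD(2_000, 500)); // 0.00175 USD
```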
- How important is latency? (the time it takes for the model to respond)
- impacted by
- size of model
- hardware it's running on
- inference optimizations
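For chat UIs, the latency users actually feel is often time-to-first-token rather than total response time. A sketch of measuring both with a streaming request (assumes the same OpenAI-compatible endpoint as my earlier example):

```ts
// Measure time-to-first-chunk and total latency for a streamed response.
async function measureLatency(prompt: string): Promise<void> {
  const start = performance.now();
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      stream: true, // send tokens back as they are generated
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const reader = res.body!.getReader();
  let firstChunkMs: number | null = null;
  while (true) {
    const { done } = await reader.read();
    if (done) break;
    // time-to-first-chunk is what a user perceives as "responsiveness"
    if (firstChunkMs === null) firstChunkMs = performance.now() - start;
  }
  console.log({ firstChunkMs, totalMs: performance.now() - start });
}
```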
- How do you assess model performance?
- How accurate are the model's responses? How well does it perform the task asked of it?
- There are benchmarks to evaluate this, but it's complicated: models that score highly on benchmarks have still been bad in other areas.
- see lmarena.ai leaderboard, e.g. for webdev
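Benchmarks aside, you can run a small eval on your own task. A minimal exact-match eval loop I sketched (the `infer` function is the one from my earlier sketch; the test cases are placeholders):

```ts
// A tiny eval: run the model over known cases and score exact matches.
// In practice you'd use fuzzier scoring or an LLM-as-judge.
const cases = [
  { input: "What is 2 + 2? Answer with just the number.", expected: "4" },
  { input: "Capital of France? Answer with just the city.", expected: "Paris" },
];

async function runEval(infer: (prompt: string) => Promise<string>) {
  let passed = 0;
  for (const c of cases) {
    const output = (await infer(c.input)).trim();
    if (output === c.expected) passed++;
    else console.log(`FAIL: ${c.input} -> ${output}`);
  }
  console.log(`${passed}/${cases.length} passed`);
}
```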
- How big a context window do I need?
- the number of tokens the model can see at a time; the larger the context window, the more information the model has to predict from
- measured in tokens
- but it's complicated: studies have shown model performance deteriorating even with context windows under 10k tokens
- model providers are increasing context sizes all the time; the latest are up to around 250k tokens
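One practical consequence: before sending a long chat history, check it fits your token budget and drop the oldest messages if not. A sketch using the js-tiktoken encoder from my earlier example (the budget number is arbitrary):

```ts
import { getEncoding } from "js-tiktoken";

const enc = getEncoding("cl100k_base");
const countTokens = (text: string): number => enc.encode(text).length;

// Keep the most recent messages that fit within the token budget.
function fitToContext(messages: string[], budget: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const msg of [...messages].reverse()) {
    const n = countTokens(msg);
    if (used + n > budget) break;
    kept.unshift(msg); // preserve original (oldest-first) order
    used += n;
  }
  return kept;
}

console.log(fitToContext(["old context...", "recent question"], 8_000));
```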