ai engineer roadmap - Matt Pocock

These are my notes on Matt Pocock's aihero.dev course "AI Engineer Roadmap".

https://www.aihero.dev/ai-engineer-roadmap


What is an AI Engineer?

Matt said his course is partly based on the Latent Space article "The Rise of the AI Engineer".

"An AI Engineer is a software developer who builds applications powered by AI"

An AI Engineer needs:

  • strong software engineering fundamentals
  • the ability to build reliable, scalable applications
  • knowledge of modern AI tools and frameworks
  • a focus on user experience and applications

AI Engineers use models, apply prompt engineering, build pipelines/chains, optimize for cost/speed/performance, manage infrastructure, and evaluate systems.

This is different from an ML Engineer, who builds and trains models - work that involves heavy math and PhD-level skills.

Yes, web developers can become AI Engineers. And TypeScript is growing fast and makes a great fit for AI systems (well, Matt would say that, wouldn't he!.. but I do think it's true).


What can you use LLMs for?

1. Converting unstructured data to structured data, e.g. emails, call transcripts, invoices etc. (see the sketch after this list)
2. Labelling and classification, i.e. given raw input, organize it
3. Answering questions, e.g. chatbots and search engines
4. Agents - take action and interact with other systems
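
For use case 1, here's a minimal sketch of extracting structured data from an email, assuming the Vercel AI SDK (the `ai` and `@ai-sdk/openai` packages) and Zod for the schema - the model name and the invoice fields are placeholders, not from the course:

```ts
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical invoice-extraction schema: turn a raw email into structured data.
const invoiceSchema = z.object({
  vendor: z.string(),
  totalAmount: z.number(),
  dueDate: z.string().describe("ISO 8601 date"),
});

export async function extractInvoice(rawEmail: string) {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"), // placeholder model
    schema: invoiceSchema,
    prompt: `Extract the invoice details from this email:\n\n${rawEmail}`,
  });
  return object; // typed as { vendor: string; totalAmount: number; dueDate: string }
}
```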

LLMs are not suitable for deterministic systems. If you can build a deterministic system instead, you probably should.


What is an LLM (Large Language Model)? 

  • An LLM is a massively compressed file (like a 1TB zip file) containing numbers, which are the parameters of the model (it compresses knowledge into numbers)
  • The parameters are the result of the model's pre-training and represent the model's understanding of the world
    • pre-training takes a large amount of text data and compresses it into the parameters
    • pre-training -> parameters -> model
  • The number of parameters represents the model's "brain". In general, models with more parameters perform better but are slower. Models with fewer parameters may hallucinate more.
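
To get a feel for what parameter count means for file size, here's a back-of-envelope sketch (my own illustrative numbers, not from the course): size is roughly parameter count × bytes per parameter.

```ts
// Rough model file size: parameters × bytes per parameter.
// The parameter counts and precisions below are illustrative.
function modelSizeGB(parameters: number, bytesPerParam: number): number {
  return (parameters * bytesPerParam) / 1024 ** 3;
}

console.log(modelSizeGB(7e9, 2).toFixed(0));   // 7B params at fp16 ≈ 13 GB
console.log(modelSizeGB(70e9, 2).toFixed(0));  // 70B params at fp16 ≈ 130 GB
console.log(modelSizeGB(7e9, 0.5).toFixed(1)); // 7B at 4-bit quantization ≈ 3.3 GB
```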

Inference

  • inference is the process of sending text to the model and getting a response back
  • it's done using an inference function: software which takes the parameters of the model and runs the algorithm to produce a response
  • inference is far cheaper than pre-training
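
A minimal sketch of inference against a hosted API, again assuming the Vercel AI SDK; the model name is a placeholder:

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Send text to the model, get a completion back - that round trip is inference.
const { text } = await generateText({
  model: openai("gpt-4o-mini"), // placeholder model
  prompt: "Summarize the difference between pre-training and inference in one sentence.",
});

console.log(text);
```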

Sampling strategy

  • in LLM inference, sampling means choosing the next token from the probability distribution the model predicts
  • the inference function uses a sampling strategy to decide which token to pick
  • sampling strategies include: greedy sampling (always pick the most likely token), temperature sampling (introduce randomness), and top-k and top-p sampling
    • you may have noticed you can adjust the temperature of requests to get different results
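
A toy sketch of greedy vs temperature sampling over a tiny next-token distribution (pure TypeScript, illustrative only - real inference engines work over vocabularies of ~100k tokens):

```ts
// Toy next-token distribution: logits for a tiny vocabulary.
const vocab = ["cat", "dog", "hamster", "dragon"];
const logits = [2.0, 1.8, 0.5, -1.0];

// Softmax with temperature: lower temperature sharpens the distribution,
// higher temperature flattens it (more randomness).
function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Greedy sampling: always pick the most likely token.
function greedy(probs: number[]): number {
  return probs.indexOf(Math.max(...probs));
}

// Temperature sampling: draw a token at random, weighted by probability.
function sample(probs: number[]): number {
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}

console.log(vocab[greedy(softmax(logits, 1.0))]); // always "cat"
console.log(vocab[sample(softmax(logits, 1.5))]); // varies run to run
```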

Input tokens

  • numbers representing pieces of text, which are passed to the inference engine
  • text sent to a model needs to be tokenized: broken up into chunks which are then mapped to numbers
    • see this tool for an example of converting words to numbers: https://tiktokenizer.vercel.app/?model=google%2Fgemma-7b
  • tokens are not 1:1 with words - there are usually more tokens than words, because words get broken up into pieces
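
You can also tokenize locally; a sketch assuming the js-tiktoken package (a JS port of OpenAI's tokenizer):

```ts
import { encodingForModel } from "js-tiktoken";

// Tokenize a sentence and compare token count to word count.
const enc = encodingForModel("gpt-4o");
const text = "Tokenization splits words into subword pieces.";

const tokens = enc.encode(text); // an array of token IDs (numbers)
console.log(tokens.length);          // typically more tokens than words
console.log(text.split(" ").length); // word count, for comparison
```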

Pre-training

As mentioned already, the steps to create a model are:
  1. take pre-training data
  2. feed it into the pre-training process (think tens of thousands of GPUs running for weeks/months)
  3. this generates the parameters
  4. there's also a post-training process, which shapes the model's personality and behavior through careful instruction and examples
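
To see why pre-training needs that much hardware, a common rule of thumb (my addition, not from the course) is training compute ≈ 6 × parameters × training tokens:

```ts
// Rule-of-thumb training FLOPs: ~6 * N (parameters) * D (training tokens).
// The numbers below are illustrative, not from the course.
function trainingFlops(parameters: number, trainingTokens: number): number {
  return 6 * parameters * trainingTokens;
}

const flops = trainingFlops(70e9, 2e12); // 70B params trained on 2T tokens
console.log(flops.toExponential(1));     // ~8.4e+23 FLOPs

// At a sustained ~1e14 FLOP/s per GPU, that's on the order of GPU-centuries,
// which is why training runs use thousands of GPUs for weeks or months:
const gpuSeconds = flops / 1e14;
console.log((gpuSeconds / (3600 * 24 * 365)).toFixed(1), "GPU-years"); // ≈ 266.4
```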

Thinking models

  • a class of models which use thinking strategies (backtracking, revisiting assumptions etc.) to solve problems
  • not all models are thinking models. OpenAI calls its thinking models "reasoning models" and used an o-prefix, e.g. o1 is a reasoning model but GPT-3.5 is not. They have since scrapped the o-prefix, and the newest GPT models (GPT-5) have reasoning built in.

References: 

Deep Dive into LLMs - Andrej Karpathy
How I Use LLMs - Andrej Karpathy
Claude Artifacts - idea: generate Mermaid diagrams of book chapters, and code


5 questions to ask when choosing an LLM

  1. Open vs closed?
    1. Open-source models are free to download and use.
    2. Closed models are controlled by companies and you pay to use them, e.g. OpenAI's, Google's, and Anthropic's models.
  2. How much will it cost?
    1. Most models charge by tokens, both input and output tokens - a "pay as you use" model (see the cost sketch after this list)
      1. see a pricing calculator
    2. If you host the model yourself, then you pay for the hosting
    3. AWS Bedrock provides a mix of both: token and hosting costs in one easy-to-use package
  3. How important is latency? (the time it takes for the model to respond)
    1. impacted by:
      1. the size of the model
      2. the hardware it's running on
      3. inference optimizations
  4. How do I assess model performance?
    1. How accurate are the model's responses? How well does it perform the task asked of it?
    2. There are benchmarks to evaluate this, but it's complicated - models that score highly on benchmarks have still been bad in other areas.
    3. see the lmarena.ai leaderboard, e.g. for webdev
  5. How big a context window do I need?
    1. the number of tokens the model can see at a time - the larger the context window, the more information the model has to predict with
      1. measured in tokens
    2. but it's complicated: studies have shown model performance deteriorating even with context windows under 10k tokens
    3. model providers are increasing context sizes all the time; the latest are up to 250k tokens
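
Here's the cost sketch mentioned under question 2: token billing is usually quoted per million tokens, so estimating a request's cost is simple multiplication (the prices below are placeholders - check the provider's current rates):

```ts
// Estimate a request's cost from token counts and per-million-token prices.
// Prices are placeholders - look up the provider's current rates.
interface Pricing {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

function requestCostUSD(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens / 1e6) * p.inputPerMillion + (outputTokens / 1e6) * p.outputPerMillion;
}

const hypotheticalPricing: Pricing = { inputPerMillion: 0.15, outputPerMillion: 0.6 };

// A request with 2,000 input tokens and 500 output tokens:
console.log(requestCostUSD(2_000, 500, hypotheticalPricing).toFixed(6)); // "0.000600"
```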

The best way to evaluate is to test with your own data and use cases - see the sketch below.
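
A minimal sketch of that kind of evaluation loop, assuming the same AI SDK setup as above; the test cases and the naive contains-check are hypothetical (real evals use richer scoring, e.g. exact match or LLM-as-judge):

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical test cases drawn from your own data.
const cases = [
  { prompt: "Extract the due date: 'Invoice due 2024-03-01'", expected: "2024-03-01" },
  { prompt: "Extract the due date: 'Pay by 5 March 2024' as ISO 8601", expected: "2024-03-05" },
];

// Naive check: does the output contain the expected string?
function scoreAnswer(output: string, expected: string): boolean {
  return output.includes(expected);
}

for (const c of cases) {
  const { text } = await generateText({
    model: openai("gpt-4o-mini"), // swap in each candidate model here
    prompt: c.prompt,
  });
  console.log(scoreAnswer(text, c.expected) ? "PASS" : "FAIL", "-", c.prompt);
}
```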


The AI Engineer Mindset






