Andrew Ng - Agentic AI talk
This is a good talk from Andrew on Agentic AI.
He packs a lot of info and ideas into this talk.
Andrew starts with his view that "AI is the new electricity", because it is a general-purpose technology that enables new applications to be built.
Andrew provides an overview of the current AI tech stack, but adds a new layer: an "agentic orchestration layer"
- applications: Credo, Woebot Health, Workera, Meeno etc.
- new agentic orchestration layer: LangChain, CrewAI etc.
- models: OpenAI, Anthropic, Llama
- infrastructure: AWS, Google, Azure
- chips: NVIDIA, AMD etc.
Most of the opportunity will be in building new AI applications (even though models are currently getting the most attention). Andrew said he is "most excited" by Agentic AI (AI agents). An agentic workflow is autonomous: it solves problems through reasoning, trial and error, and iteration. By contrast, "zero shot" is when you ask a question and get back a single answer, e.g. the original ChatGPT. An agentic workflow takes longer because it refines the answer by iterating on "drafts", but it can provide much better results.
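The difference between a zero-shot call and an iterative workflow can be sketched in a few lines of Python. This is only an illustration, not code from the talk: `LLM`, `zero_shot`, `iterative` and `CountingStub` are hypothetical names, and the stub stands in for a real model so the example runs without any API.

```python
from typing import Callable

# Assumption: an "LLM" here is any function that maps a prompt to a reply.
LLM = Callable[[str], str]


def zero_shot(llm: LLM, task: str) -> str:
    """Ask once and return whatever comes back."""
    return llm(task)


def iterative(llm: LLM, task: str, rounds: int = 2) -> str:
    """Write a first draft, then repeatedly critique and revise it."""
    draft = llm(task)
    for _ in range(rounds):
        feedback = llm(f"Critique this draft for the task {task!r}:\n{draft}")
        draft = llm(
            f"Task: {task}\nDraft:\n{draft}\nFeedback:\n{feedback}\n"
            "Rewrite the draft to address the feedback."
        )
    return draft


class CountingStub:
    """Stand-in for a real model: it counts calls and returns a numbered reply."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"reply #{self.calls}"


if __name__ == "__main__":
    stub = CountingStub()
    zero_shot(stub, "Write an essay")
    print("zero-shot model calls:", stub.calls)

    stub = CountingStub()
    iterative(stub, "Write an essay", rounds=2)
    print("iterative model calls:", stub.calls)
```

The call counts make the trade-off visible: each round of revision costs two extra model calls, which is why the iterative version takes longer.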
Andrew highlights 4 major design patterns in Agentic workflows:
- Reflection
  - take the answer from the LLM and feed it back in to get an improved answer, i.e. the LLM critiques its own output
  - Andrew talked about having 2 LLMs: one to write the code and the other to critique it, both collaborating and iterating
- Tool use (API calls)
  - LLMs can make API calls
- Planning (decide on the steps to take)
  - the LLM takes a complex request and picks a sequence of actions to execute to solve it
  - e.g. the request "generate an image where a girl is reading a book and her pose is the same as the boy in the image example.jpg then describe the image in your own voice"
  - Andrew explained this could be broken into multiple steps to deliver on the complex task
    - pose determination
    - pose to image
    - image to text
    - text to speech
  - "take a task and break into subtasks"
- Multi agent collaboration
  - can choose models which are specialized to a task, e.g. image to text may be best done by a "vit-gpt2 model" whereas text to speech may be best done by a "fastspeech model"
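The planning example above, combined with tool use, can be sketched as a small pipeline. This is a rough illustration under stated assumptions, not the system shown in the talk: every function below is a hypothetical stub that only tags its input, and `plan` returns a hard-coded list where a real planner would ask an LLM to choose the steps.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for specialized models; each one just wraps its input.
def pose_determination(image_path: str) -> str:
    return f"pose_of({image_path})"


def pose_to_image(pose: str) -> str:
    return f"image_from({pose})"


def image_to_text(image: str) -> str:
    return f"caption_of({image})"


def text_to_speech(text: str) -> str:
    return f"audio_of({text})"


# The "tools" the workflow is allowed to call, looked up by name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "pose_determination": pose_determination,
    "pose_to_image": pose_to_image,
    "image_to_text": image_to_text,
    "text_to_speech": text_to_speech,
}


def plan(request: str) -> List[str]:
    """Assumption: a real planner would have an LLM pick these steps from the request."""
    return ["pose_determination", "pose_to_image", "image_to_text", "text_to_speech"]


def execute(steps: List[str], first_input: str) -> str:
    """Run each step in order, feeding every output into the next tool."""
    value = first_input
    for name in steps:
        value = TOOLS[name](value)
    return value


if __name__ == "__main__":
    request = "generate an image where a girl is reading a book ..."
    print(execute(plan(request), "example.jpg"))
```

Keeping the tools in a dictionary means the plan is just data (a list of names), so a specialized model can be swapped in for any one step without touching the others.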
Andrew gave a demo of an app he developed, va.landing.ai, which uses agentic workflows for AI tasks. It is effectively an agentic orchestration layer for video processing. He showed how an image was uploaded and the user could then ask a question about it, e.g. "how many players in the image". The agentic workflow solved the problem and also provided Python code that the user can use to process many images.
- Andrew said this could be used to access a company's videos, process them and extract value
- Could process videos to extract metadata and build apps on top of that data, e.g. you could search a library of videos for matches such as "find all airborne skiers in the video"
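Once metadata has been extracted, the search itself can be very simple. The sketch below is an assumption about what such metadata might look like, not the format used by va.landing.ai: `Clip`, `SAMPLE_INDEX` and `find_clips` are hypothetical names, and the labels are made-up sample data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    video: str            # file the clip came from
    start_seconds: float  # where the clip starts in that file
    labels: List[str]     # assumption: labels produced by an earlier extraction step


# Made-up sample data standing in for an extracted metadata index.
SAMPLE_INDEX = [
    Clip("run1.mp4", 12.0, ["skier", "airborne"]),
    Clip("run1.mp4", 40.0, ["skier"]),
    Clip("run2.mp4", 3.5, ["Airborne", "Skier", "snow"]),
]


def find_clips(index: List[Clip], *wanted: str) -> List[Clip]:
    """Return the clips whose labels include every wanted term (case-insensitive)."""
    needed = {term.lower() for term in wanted}
    return [
        clip for clip in index
        if needed <= {label.lower() for label in clip.labels}
    ]


if __name__ == "__main__":
    for clip in find_clips(SAMPLE_INDEX, "airborne", "skier"):
        print(clip.video, clip.start_seconds)
```

The expensive part is producing the labels; once they exist, an app can answer queries like the skier example with an ordinary filter.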
Supervised learning, which could take months to 1. assemble the data 2. train the AI model on the data and 3. deploy the model, is now being replaced by much faster LLM-based development, which may take days or weeks to create a prototype and test it. Being able to build prototypes quickly allows teams to try lots of ideas quickly.
- The bottleneck is now moving to integration and DevOps.