Link to original video by Andrej Karpathy

How I use LLMs

Outline of the video "How I use LLMs"

Short Summary:

This video explores the practical applications and functionalities of Large Language Models (LLMs), focusing on how the speaker utilizes them in daily life and professional work. Key points include the evolution of LLMs from ChatGPT to various other models (Gemini, Claude, Grok), the importance of understanding model capabilities (thinking vs. non-thinking models), the use of tools like internet search and Python interpreters, and the integration of different modalities (audio, image, video). The speaker demonstrates various applications, including data analysis, code generation, language learning, and content creation. The video highlights the importance of understanding the limitations of LLMs and emphasizes the need for critical evaluation of their outputs. Detailed processes like tokenization, conversation formatting, and the use of custom instructions and GPTs are explained.

Detailed Summary:

The video is structured into several sections, each focusing on a different aspect of LLM usage:

1. Introduction to LLMs and ChatGPT: The speaker begins by introducing LLMs and their evolution since the release of ChatGPT in 2022. He highlights the expanding ecosystem of LLMs, mentioning various models from Big Tech companies (Google, Meta, Microsoft) and startups (Anthropic, xAI). Leaderboards like Chatbot Arena and the Scale AI leaderboard are presented as resources for tracking model performance. The speaker emphasizes that while ChatGPT is the most popular and feature-rich, other models offer unique capabilities. He introduces the concept of tokens and tokenization, using a tokenizer app to demonstrate how text is processed by the model.
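The tokenization idea can be sketched in a few lines. This is a toy illustration only: real LLM tokenizers use byte-pair encoding (BPE) over a vocabulary of roughly 100k entries (e.g. via OpenAI's tiktoken library), but the core idea — mapping text to a sequence of integer token IDs via greedy longest-match against a vocabulary — looks like this (the vocabulary below is hand-made for the example):

```python
# Toy illustration of tokenization. Real tokenizers use byte-pair
# encoding (BPE) over a ~100k-entry learned vocabulary; this hand-made
# vocab and greedy longest-match just show the text -> integer-IDs idea.
VOCAB = {"hello": 0, " world": 1, "hell": 2, "o": 3, " ": 4, "w": 5,
         "or": 6, "ld": 7, "h": 8, "e": 9, "l": 10, "r": 11, "d": 12}

def tokenize(text: str) -> list[int]:
    """Greedily match the longest vocabulary entry at each position."""
    ids = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest substring first
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return ids

def detokenize(ids: list[int]) -> str:
    inverse = {v: k for k, v in VOCAB.items()}
    return "".join(inverse[i] for i in ids)

tokens = tokenize("hello world")
print(tokens)  # "hello" and " world" each map to a single token ID
assert detokenize(tokens) == "hello world"
```

This also shows why token counts differ from word counts: common strings become single tokens, while rare text fragments split into many.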

2. Understanding the LLM Entity: The speaker explains the training process of LLMs (pre-training and post-training), emphasizing that the pre-trained model's knowledge is limited to its training data's cutoff point. He likens the LLM to a "one TB zip file" containing the compressed knowledge of the internet and a personality shaped by post-training. He humorously introduces ChatGPT as: "Hi, I'm ChatGPT. I am a one TB zip file. My knowledge comes from the internet, which I read in its entirety about six months ago, and I only remember vaguely... My winning personality was programmed by example by human labelers at OpenAI."

3. Knowledge-Based Queries and Context Windows: The speaker provides examples of knowledge-based queries he uses with ChatGPT, emphasizing the importance of considering the model's knowledge cutoff and the frequency of information on the internet. He stresses the importance of starting new chats when switching topics to avoid overloading the context window (working memory).
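The "working memory" point can be made concrete with a simplified sketch: when conversation history exceeds the model's token budget, something has to give — typically the oldest turns. The budget and the one-word-per-token approximation below are deliberate simplifications; real services count actual tokens and may summarize rather than drop:

```python
# Simplified sketch of context-window management: when history exceeds
# a token budget, keep only the most recent messages. Real services count
# real tokens; here we approximate 1 word ~ 1 token, with a tiny budget.
CONTEXT_BUDGET = 12  # hypothetical, small for demonstration purposes

def approx_tokens(message: str) -> int:
    return len(message.split())

def trim_history(history: list[str]) -> list[str]:
    """Keep the newest messages that fit within CONTEXT_BUDGET."""
    kept, used = [], 0
    for message in reversed(history):  # newest first
        cost = approx_tokens(message)
        if used + cost > CONTEXT_BUDGET:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [
    "tell me about the Roman empire",          # old topic
    "the Roman empire was vast",
    "now explain quicksort instead",           # topic switch
    "quicksort is a divide and conquer sort",
]
print(trim_history(history))  # only the recent quicksort turns survive
```

Note how the stale Roman-empire turns are squeezed out by the new topic — which is exactly why starting a fresh chat per topic keeps the context window focused and cheap.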

4. Thinking Models: This section introduces "thinking models," LLMs trained with reinforcement learning to improve reasoning capabilities, particularly in math and code. The speaker compares the performance of a standard GPT-4 model and a "thinking" model (OpenAI's o-series) on a programming problem, demonstrating the latter's superior problem-solving abilities. He notes that different providers have different approaches to implementing this functionality.

5. Tool Use (Internet Search): The speaker explains how LLMs can utilize tools like internet search to access up-to-date information. He demonstrates how ChatGPT and other models can perform searches, integrate the results into the context window, and provide answers based on the retrieved information. He provides numerous examples of queries best suited for this functionality, highlighting its usefulness for accessing recent information.
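The search-as-a-tool mechanism described above follows a simple control loop: the model emits a tool call instead of an answer, the application runs the search, the results are appended to the context window, and the model is invoked again with the results in view. The sketch below stubs out both the model and the search backend (all names and formats are hypothetical); only the loop structure matters:

```python
# Hedged sketch of the tool-use loop for internet search. Both functions
# are stand-ins: a real system calls an actual LLM API and a real search
# backend. The point is the control flow, not the components.

def model_step(context: list[str]) -> str:
    """Stand-in LLM: requests a search unless results are already present."""
    if not any(m.startswith("SEARCH_RESULTS:") for m in context):
        return "TOOL_CALL: search('current US president')"
    return "ANSWER: based on the search results above, ..."

def run_search(query: str) -> str:
    """Stand-in search backend returning canned text."""
    return f"SEARCH_RESULTS: top pages for {query!r} ..."

context = ["USER: who is the current US president?"]
while True:
    out = model_step(context)
    if out.startswith("TOOL_CALL:"):
        query = out.split("search(", 1)[1].strip("')")
        # Tool output lands in the context window, just like the
        # retrieved web pages in the video's demonstration.
        context.append(run_search(query))
    else:
        context.append(out)
        break
print(context[-1])  # the model's final, search-grounded answer
```

Deep research (section 6) can be seen as this same loop run many times over, interleaved with extended "thinking" between searches.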

6. Deep Research: This section focuses on the "deep research" capability, a combination of internet search and thinking, available in some higher-tier LLM services. The speaker demonstrates its use in researching health supplements, comparing its output to similar features in Perplexity and Grok. He emphasizes that while this feature provides comprehensive reports and citations, it's crucial to verify the information independently.

7. File Uploads and Document Reading: The speaker demonstrates how to upload documents (PDFs, etc.) to the LLM's context window, enabling interaction with specific content. He shows how this feature facilitates reading and understanding complex documents, particularly those outside the user's area of expertise, using examples of scientific papers and classic literature.

8. Python Interpreter and Advanced Data Analysis: This section covers the integration of LLMs with programming languages (Python, JavaScript), allowing them to write and execute code to solve problems. The speaker demonstrates ChatGPT's ability to use a Python interpreter for complex calculations and advanced data analysis, creating charts and visualizations from data. He cautions about the potential for errors and the need for careful scrutiny of the generated code.
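For a sense of what the interpreter tool actually runs, here is the kind of code an LLM might generate for a simple analysis request. The data is made up for illustration and only the standard library is used; as the video cautions, generated code and its outputs should always be checked by hand:

```python
# The sort of code an LLM's Python-interpreter tool might generate for a
# data-analysis request (hypothetical data; stdlib only). Per the video's
# caution, scrutinize both the generated code and its outputs.
import statistics

# Hypothetical monthly headcount figures pasted in by a user.
headcount = {"Jan": 20, "Feb": 24, "Mar": 30, "Apr": 36, "May": 45}

values = list(headcount.values())
growth = [(b - a) / a * 100 for a, b in zip(values, values[1:])]

print("mean headcount:", statistics.mean(values))                # 31
print("month-over-month growth %:", [round(g, 1) for g in growth])
```

A quick manual spot-check (e.g. (24 - 20) / 20 = 20%) is exactly the kind of scrutiny the speaker recommends before trusting a generated chart or figure.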

9. Claude Artifacts: The speaker introduces Claude Artifacts, a feature allowing Claude to generate custom web applications based on user prompts. He demonstrates the creation of a flashcard app and a conceptual diagram generator, highlighting the potential for creating tailored tools for various tasks.

10. Code Generation with Cursor: The speaker shifts to discussing code generation using dedicated apps like Cursor, which utilizes LLMs (in this case, Claude) to assist in coding tasks. He demonstrates "vibe coding," where the user provides high-level instructions, and the LLM handles the low-level code generation and editing, showcasing the creation of a simple Tic-Tac-Toe game.

11. Multimodality (Audio, Image, Video): The speaker explores the integration of different modalities beyond text. He demonstrates voice input and output, highlighting the difference between "fake audio" (speech-to-text/text-to-speech) and "true audio" (native audio processing within the LLM). He showcases advanced voice features in ChatGPT and Grok, including voice impersonation and storytelling. He also demonstrates image input and output, using examples of analyzing nutrition labels, blood test results, and memes. Finally, he shows video input capabilities in the ChatGPT mobile app and video generation tools.

12. Quality of Life Features: The final section covers features enhancing user experience, including ChatGPT's memory feature, custom instructions, and custom GPTs. The speaker demonstrates how these features personalize the interaction with the LLM, making it more efficient and tailored to individual needs. He uses examples of language learning to illustrate the benefits of custom GPTs.
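Conceptually, custom instructions (and the system-prompt portion of a custom GPT) amount to a standing message prepended to every new conversation. The sketch below uses the common `{"role", "content"}` chat-message format; the instruction text itself is a made-up example in the spirit of the video's language-learning use case:

```python
# Sketch of how custom instructions work conceptually: a standing system
# message is prepended to every new conversation. Messages follow the
# common {"role", "content"} chat format; the instruction text is made up.
CUSTOM_INSTRUCTIONS = (
    "Always answer concisely. When I paste Korean text, reply with a "
    "translation plus a grammar breakdown."  # hypothetical example
)

def new_conversation(user_message: str) -> list[dict]:
    """Start a chat with the custom instructions already in context."""
    return [
        {"role": "system", "content": CUSTOM_INSTRUCTIONS},
        {"role": "user", "content": user_message},
    ]

chat = new_conversation("How do I say 'thank you' politely?")
print(chat[0]["role"])  # "system" -- the instructions travel with every chat
```

This is why such instructions feel "sticky": they consume a small slice of the context window in every chat, so the model sees them before it sees anything the user types.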

13. Conclusion: The speaker summarizes the key features and capabilities of various LLMs, emphasizing the rapid evolution of the field and the need for users to stay informed about the latest developments and capabilities of different platforms. He encourages experimentation and exploration of the various tools and features available.