ARC Prize: A Guide to DSL, LLM-Guided & Test-time Training Approaches

Short Summary:
This video explores three programmatic approaches to solving the ARC AGI challenge, a set of abstract reasoning problems that are difficult for LLMs but relatively easy for humans. The approaches discussed are: a Domain Specific Language (DSL) with brute-force search, Large Language Model (LLM)-guided search, and neural network training with test-time training. Specific technologies such as GPT-3, Gemini, and Llama are used in demonstrations. The video aims to provide a practical guide for tackling ARC AGI problems, highlighting the strengths and weaknesses of each method. Detailed examples and code implementations are provided in a publicly available repository ("Trellis Research Minimal Arc"). The video also promotes a Trellis Research team being formed to compete in ARC AGI2.
Detailed Summary:
The video is structured into several sections:
- Introduction and Human Approach: The video introduces the ARC AGI challenge, emphasizing its difficulty for current LLMs. The speaker demonstrates a human problem-solving approach on an example from arcprize.org, highlighting the intuitive reasoning involved. This sets the stage for exploring programmatic alternatives.
- Three Main Approaches: Three primary approaches for solving ARC AGI problems are presented:
- DSL: This involves defining a set of basic grid operations (e.g., rotation, mirroring) and exhaustively searching for combinations that transform each input into its output. The speaker mentions Ice Cuber's successful use of this approach. A simple rotation example is demonstrated using ChatGPT to define primitives and a brute-force search.
- LLM-guided Search: This uses an LLM (like GPT-3 or Gemini) to generate Python code that solves the problem. The generated code is tested on training examples, and successful programs are used to predict the test output. The speaker notes that LLMs struggle with larger, more complex grids. A simple rotation example is shown using GPT-3 to generate the Python code.
- Neural Network Training with Test-Time Training: This involves training a neural network (e.g., using a Llama model) to predict the output grid. Test-time training is emphasized, where the model is further trained on the provided training examples during the competition to improve performance. Beam search and depth-first search are discussed as methods for selecting the best output from multiple predictions. A simple example using a small autoregressive neural network in NumPy is demonstrated.
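The DSL idea above can be sketched in a few lines: define a handful of grid primitives, then brute-force their compositions up to a fixed depth until one maps every training input to its output. This is a minimal illustration only; the primitive set, depth limit, and task are placeholders, not Ice Cuber's actual DSL.

```python
from itertools import product
import numpy as np

# A tiny set of unary grid primitives (placeholders, not the real DSL).
PRIMITIVES = {
    "rot90":  lambda g: np.rot90(g),
    "fliplr": lambda g: np.fliplr(g),
    "flipud": lambda g: np.flipud(g),
}

def search_program(train_pairs, max_depth=3):
    """Brute-force compositions of primitives that solve all training pairs."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            ok = True
            for inp, out in train_pairs:
                g = np.array(inp)
                for n in names:
                    g = PRIMITIVES[n](g)
                if not np.array_equal(g, np.array(out)):
                    ok = False
                    break
            if ok:
                return names  # first program consistent with all examples
    return None

# Example task: a 180-degree rotation, found as two 90-degree rotations.
pairs = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
print(search_program(pairs))  # ('rot90', 'rot90')
```

Depth-one programs fail on this pair, so the search only succeeds at depth two, which is why pruning and caching matter as the primitive set grows.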
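The beam search mentioned for the neural approach can be sketched as follows: at each cell, keep only the few most probable partial grids rather than all of them. The per-step probability vectors here are a toy stand-in for an autoregressive model's predictions, not output from an actual Llama model.

```python
import numpy as np

def beam_search(step_probs, beam_width=2):
    """Keep the `beam_width` most probable cell sequences at every step.

    `step_probs` is a list of per-cell probability vectors over colors,
    standing in for an autoregressive model's predictions (an assumption).
    """
    beams = [((), 0.0)]  # (sequence of colors, log-probability)
    for probs in step_probs:
        candidates = []
        for seq, logp in beams:
            for color, p in enumerate(probs):
                if p > 0:
                    candidates.append((seq + (color,), logp + np.log(p)))
        # Prune to the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Three cells, each with a distribution over three colors.
dist = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.5, 0.4, 0.1]]
best_seq, best_logp = beam_search(dist, beam_width=2)[0]
print(best_seq)  # (0, 1, 0): the most probable flattened grid
```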
- "Trellis Research Minimal Arc" Repository: The speaker introduces a public GitHub repository containing minimal examples for each of the three approaches, along with more detailed implementations and explanations. This repository serves as a practical resource for viewers to learn and experiment.
- Detailed Implementation Examples: The video dives into the details of each approach, including:
- DSL: Explains the importance of unary operations (operations that take only the grid itself, with no additional parameters), caching intermediate states to speed up the search, and the use of heuristics to reduce the search space. A practical demonstration on a subset of ARC AGI1 problems is shown, highlighting the challenges and successes of the approach.
- LLM-guided Search: Details the process of prompt engineering for LLM program generation, testing generated programs, and using majority voting to select the best output. A demonstration on a subset of ARC AGI1 problems is presented, showing the limitations of this approach with complex problems.
- Test-Time Training: This section focuses on pre-training a neural network on augmented ARC data, followed by task-specific fine-tuning during the competition. The use of parameter-efficient fine-tuning (using LoRA) is highlighted, along with techniques like depth-first search and the importance of choosing the right output from multiple predictions. A Jupyter Notebook demonstration is shown, illustrating the pre-training and test-time training process.
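The majority-voting step described for LLM-guided search can be sketched simply: run every generated program that passed the training examples, then return the test output predicted most often. This is a minimal sketch; the candidate grids below are made up for illustration.

```python
from collections import Counter

def majority_vote(candidate_grids):
    """Pick the output grid predicted most often by the candidate programs.

    Grids are converted to tuples of tuples so they can be counted hashably.
    """
    key = lambda g: tuple(tuple(row) for row in g)
    counts = Counter(key(g) for g in candidate_grids)
    winner, _ = counts.most_common(1)[0]
    return [list(row) for row in winner]

# Three generated programs agree, one disagrees; the majority wins.
preds = [[[1, 0]], [[1, 0]], [[0, 1]], [[1, 0]]]
print(majority_vote(preds))  # [[1, 0]]
```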
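The parameter-efficient fine-tuning idea behind LoRA can be illustrated without any deep-learning framework: the frozen weight W is augmented with a trainable low-rank update (alpha/r)·BA. The shapes and hyperparameter values here are assumptions for illustration, not the video's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4               # hidden size, LoRA rank, scaling (assumed)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x):
    """y = x W^T + (alpha / r) * x (BA)^T -- only A and B are trained."""
    return x @ W.T + (alpha / r) * (x @ (B @ A).T)

x = rng.normal(size=(1, d))
# With B initialized to zero, the LoRA branch contributes nothing at first,
# so test-time fine-tuning starts exactly from the pretrained behaviour.
assert np.allclose(lora_forward(x), x @ W.T)
```

Only the 2·d·r LoRA parameters are updated during test-time training, which is why the approach is cheap enough to run per task inside the competition.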
- Trellis Research ARC AGI2 Team: The video concludes by promoting a Trellis Research team being formed to compete in ARC AGI2, inviting applications from interested individuals. The prize structure and team selection process are briefly described.
The video consistently emphasizes the importance of practical implementation, providing code examples and detailed explanations of the techniques involved. The speaker also highlights the iterative nature of solving ARC AGI problems, emphasizing the need for experimentation and refinement of different approaches. The "Trellis Research Minimal Arc" repository is presented as a key resource for further exploration.