<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Ground Truth]]></title><description><![CDATA[AI Insights for Software Engineers and Tech Enthusiasts]]></description><link>https://thegroundtruth.media</link><image><url>https://substackcdn.com/image/fetch/$s_!iZv-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F961dbec3-a09e-4643-9c25-05f83cdda466_534x534.png</url><title>The Ground Truth</title><link>https://thegroundtruth.media</link></image><generator>Substack</generator><lastBuildDate>Fri, 01 May 2026 02:23:22 GMT</lastBuildDate><atom:link href="https://thegroundtruth.media/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Zhu Liang]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thegroundtruth@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thegroundtruth@substack.com]]></itunes:email><itunes:name><![CDATA[Zhu Liang]]></itunes:name></itunes:owner><itunes:author><![CDATA[Zhu Liang]]></itunes:author><googleplay:owner><![CDATA[thegroundtruth@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thegroundtruth@substack.com]]></googleplay:email><googleplay:author><![CDATA[Zhu Liang]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why Anthropic Will Win the Race to the Top]]></title><description><![CDATA[My perspectives on the areas where Anthropic is currently leading its competitors.]]></description><link>https://thegroundtruth.media/p/why-anthropic-will-win-the-race</link><guid isPermaLink="false">https://thegroundtruth.media/p/why-anthropic-will-win-the-race</guid><dc:creator><![CDATA[Zhu 
Liang]]></dc:creator><pubDate>Wed, 18 Feb 2026 04:07:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e0018d70-3fa3-4e03-a8b6-8aa3fe7ff78d_933x465.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I believe Anthropic is leading in nearly every dimension that matters for building powerful AI. Here are the specific areas where Anthropic holds a clear lead, and why each one matters.</p><h2>Claude: The Best Coding Model</h2><p>Anthropic has held the crown for <strong>best coding model for nearly two years</strong>, and no other lab has managed to take it away. It started with <a href="https://www.anthropic.com/news/claude-3-5-sonnet">Sonnet 3.5</a>, which beat GPT-4o on most benchmarks at launch and became the default model for major tools like Aider and Cline. Then Sonnet 4, Sonnet 4.5, and now <a href="https://www.anthropic.com/news/claude-opus-4-6">Opus 4.6</a>.</p><p>Claude models currently hold the top <a href="https://www.anthropic.com/news/claude-opus-4-6">two spots</a> on SWE-bench Verified. At each generation, Anthropic has maintained or widened its lead over competing models from OpenAI, Google, and open-weight alternatives.</p><p>This means that Anthropic always had the best model internally to <strong>accelerate research and engineering efforts</strong> and move faster than other labs, months before the model&#8217;s public release.</p><h2>Claude Code: The Original Agent Harness</h2><p>Claude Code was the first CLI-based coding agent that worked well enough to change how developers build software. It has since become <strong><a href="https://www.npmjs.com/package/@anthropic-ai/claude-code">one of the most widely used coding agents</a></strong>. It demonstrated that AI labs can make great developer tools. 
The success of Claude Code forced the entire industry to respond.</p><p>The evidence that competitors like OpenAI are following Anthropic&#8217;s lead is clear:</p><ul><li><p>Codex CLI adopted the same terminal-based agent paradigm that Claude Code pioneered</p></li><li><p>The session usage limit implementation in Codex mirrors Claude Code&#8217;s approach</p></li><li><p>The generous $20 subscription plan follows Anthropic&#8217;s token subsidy model</p></li><li><p>OpenAI shifted its entire product strategy toward Codex after seeing Claude Code&#8217;s traction</p></li></ul><p>Despite these fast-follow efforts, Claude Code remains <a href="https://x.com/paradite_/status/2023270304798400708">ahead in usage numbers</a>. This is clear evidence that Anthropic still has the lead in building and designing agents.</p><h2>TypeScript: The Right Language for Agents</h2><p>Claude Code is written in TypeScript, and this architectural decision gives Anthropic a compounding advantage.</p><p>TypeScript is the <strong><a href="https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/">most-used language on GitHub</a></strong>. It is also what all major LLMs are most heavily trained on, meaning Claude and competing models understand TypeScript code better than code in any other language.</p><p>Building the agent harness in TypeScript creates a <strong>positive feedback loop</strong>. The model excels at understanding and modifying the very language the harness is built in. This makes iteration faster and more reliable.</p><p>The <a href="https://platform.claude.com/docs/en/agent-sdk/overview">Claude Agent SDK</a>, also offered in TypeScript, takes this further. The Agent SDK is a wrapper around Claude Code, which proves that Claude Code works as a general-purpose agent harness capable of handling tasks across different domains. 
As more developers build custom agents with the Agent SDK, this lead compounds.</p><h2>Pioneering Agent Paradigms</h2><p>Anthropic has a pattern of introducing <strong>new agent paradigms that become industry standards</strong>.</p><ul><li><p><strong>MCP</strong> (<a href="https://www.anthropic.com/news/model-context-protocol">Nov 2024</a>): The Model Context Protocol defines how AI agents communicate with external services. It has since been adopted by virtually every coding tool, including <a href="https://developers.openai.com/codex/mcp/">Codex</a>, <a href="https://cursor.com/docs/context/mcp">Cursor</a>, and many others.</p></li><li><p><strong>Sub-agent</strong> (<a href="https://news.ycombinator.com/item?id=44686726">July 2025</a>): A primary agent delegates tasks to specialized forked agents to optimize context usage. Anthropic pioneered this in Claude Code and competitors later adopted it.</p></li><li><p><strong>Hooks</strong> (<a href="https://www.anthropic.com/news/enabling-claude-code-to-work-more-autonomously">Sep 2025</a>): User-defined commands that execute at specific points in the agent&#8217;s lifecycle. Hooks solve a fundamental problem with LLM-based agents: the model is probabilistic, but certain actions need to be deterministic.</p></li><li><p><strong>Skills</strong> (<a href="https://claude.com/blog/skills">Oct 2025</a>): Modular, filesystem-based capabilities that extend agents with domain-specific expertise. 
Anthropic published the <a href="https://www.anthropic.com/news/skills">Agent Skills open standard</a>, which has been adopted by tools including <a href="https://developers.openai.com/codex/skills/">Codex</a>, <a href="https://cursor.com/docs/context/skills">Cursor</a> and others.</p></li><li><p><strong>Plugins</strong> (<a href="https://claude.com/blog/claude-code-plugins">Oct 2025</a>): Containers that package tools, permissions, and metadata together, providing a structured way to extend agent functionality beyond what the base harness offers.</p></li></ul><p>Each paradigm defines a fundamentally new way for agents to interact with their environment, and Anthropic has been first to market on nearly all of them.</p><h2>Reinforcement Learning (RL) Advantage</h2><p>Anthropic&#8217;s lead in agent paradigms creates a <strong>structural RL advantage</strong>. When Anthropic invents a new paradigm like Skills or a new sub-agent orchestration pattern, it can begin RL training immediately.</p><p>This happens months before the paradigm is publicly released. By the time other labs learn about the paradigm and start incorporating it into their own training pipelines, Claude is already RL-optimized for it.</p><p>This head start repeats with every new paradigm Anthropic introduces, and the gap compounds over time. Other labs are always playing catch-up on two fronts simultaneously: first implementing the paradigm, then training their models to use it effectively.</p><h2>Safety and Mechanistic Interpretability</h2><p>Anthropic was founded with safety as a core mission. Anthropic&#8217;s research in <strong>mechanistic interpretability (mechinterp)</strong> is the most advanced in the industry. 
Mechinterp aims to understand what is happening inside a model&#8217;s neural network, going beyond surface-level observation of inputs and outputs.</p><p>MIT Technology Review named mechanistic interpretability one of its <a href="https://www.technologyreview.com/2026/01/12/1130003/mechanistic-interpretability-ai-research-models-2026-breakthrough-technologies/">10 Breakthrough Technologies of 2026</a>, largely on the strength of Anthropic&#8217;s work. Anthropic has published landmark research including <a href="https://www.anthropic.com/research/tracing-thoughts-language-model">Tracing the Thoughts of a Large Language Model</a> and <a href="https://transformer-circuits.pub/">open-sourced circuit tracing tools</a> for the research community.</p><p>This matters for a practical reason: If models start failing at certain tasks, labs without mechinterp capabilities would not know why or how to fix it. They can only hope that scaling or more training data solves the problem. Anthropic&#8217;s mechinterp research gives it a deeper understanding of model behavior, which translates into more targeted improvements and fewer blind spots.</p><p>Safety research also feeds directly into product quality. Techniques developed for alignment and safety, such as Constitutional AI, make Claude more reliable and predictable in practice. Safer models tend to also be more useful. They follow instructions more faithfully and are less likely to produce unexpected behavior during long autonomous runs.</p><h2>Model Character and Taste</h2><p>There is a quality to Claude that is hard to quantify but immediately noticeable: it feels more pleasant to work with. Working with Claude feels like working with a <strong>thoughtful collaborator</strong>. Models from other labs tend to feel like generic autocomplete engines by comparison.</p><p>Amanda Askell, who leads character training at Anthropic, has shaped how Claude communicates and reasons about its responses. 
The result is a model that developers genuinely enjoy working with, one that goes beyond producing correct output.</p><p>When you spend hours per day interacting with a coding agent, the quality of that interaction directly affects productivity and satisfaction. Other labs have not invested in model character with the same care, and their models tend to feel interchangeable as a result.</p><h2>Leadership</h2><p>The moat of an AI lab ultimately comes down to the person leading it.</p><p>Dario Amodei combines <strong>deep technical understanding</strong> of the models with <strong>strategic clarity about where the industry is heading</strong>. He also has the organizational ability to execute on both. He articulates a clear vision for what safe, powerful AI looks like, and Anthropic&#8217;s product decisions align with that vision.</p><p>The AI race requires making correct bets on training approaches, product strategy, safety tradeoffs, and market positioning simultaneously. Leadership quality determines how well a lab handles these interconnected decisions.</p><h2>Substance Over Marketing</h2><p>The difference in how Anthropic and other labs promote their products is notable. Some labs have <a href="https://x.com/gdb/status/2022823856889827711">employees</a> <a href="https://x.com/gdb/status/2022804437308445042">actively</a> <a href="https://x.com/gdb/status/2022787705852268799">engaging</a> with and <a href="https://x.com/paradite_/status/2021083893538103803">amplifying</a> product tweets, creating the perception of widespread adoption.</p><p>Anthropic takes a different approach: <strong>stay quiet, let third-party reviews and organic word-of-mouth do the talking</strong>.</p><p>This creates a perception gap. 
On any given day, X timelines might suggest that a competing product is dominant, when head-to-head comparisons tell a different story.</p><p>Anthropic&#8217;s bet is that developers will ultimately choose the best tool, regardless of which product has louder advocates on social media. Developers are a technical audience that evaluates products empirically. Over time, substance wins over marketing.</p><h2>The Race to the Top</h2><p>Some labs will win the race to the bottom, competing on price and marketing to capture commodity workloads. Anthropic is playing a different game.</p><p>In each of the areas above, Anthropic is either leading or setting the pace. The gap may narrow in individual areas, but the breadth of Anthropic&#8217;s lead across all these dimensions is what makes it durable.</p><p>Consistent execution across multiple fronts is the best predictor of who will lead the next generation of AI. By that measure, Anthropic is in a class of its own.</p>]]></content:encoded></item><item><title><![CDATA[6 Patterns for Building Workflow AI Agents]]></title><description><![CDATA[I spent the last two months building an AI agent. 
Here are the 6 top patterns that I learnt from my experience.]]></description><link>https://thegroundtruth.media/p/ai-agent-patterns</link><guid isPermaLink="false">https://thegroundtruth.media/p/ai-agent-patterns</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Tue, 27 Jan 2026 07:15:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_V67!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For the past two months, I have been building an AI agent using the Claude Agent SDK to make short-form vertical videos.</p><p>The agent handles everything from sourcing content to generating scripts, adding B-roll, selecting music, and composing the final videos.</p><p>Here are the 6 patterns that I learned from building this agent.</p><h2>1. Decompose Work with Sub-Agents and Skills</h2><p>I started off with one agent that did everything. It quickly became clear that the agent was trying to do too much at once. While the key was to break down its work, the best way to do so was not immediately obvious.</p><p>I tried separate agents, sub-agents, and skills. 
Ultimately, I found it useful to think about decomposition in two ways: <strong>sub-agents</strong> and <strong>skills</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_V67!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_V67!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png 424w, https://substackcdn.com/image/fetch/$s_!_V67!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png 848w, https://substackcdn.com/image/fetch/$s_!_V67!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png 1272w, https://substackcdn.com/image/fetch/$s_!_V67!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_V67!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png" width="1456" height="555" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:555,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121291,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/185843915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_V67!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png 424w, https://substackcdn.com/image/fetch/$s_!_V67!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png 848w, https://substackcdn.com/image/fetch/$s_!_V67!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png 1272w, https://substackcdn.com/image/fetch/$s_!_V67!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b9aa182-28f1-4f91-900d-c42c86d5c007_1648x628.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I use <strong>sub-agents</strong> for completely different tasks within a larger workflow. For my project, this meant having one agent for discovering content and another for creating the video. Each sub-agent has its own system prompt and runs in a separate session.</p><p>For smaller steps within a single task, I use <strong>skills</strong>. My video creation agent uses skills like <code>/selecting-broll</code> to find footage or <code>/selecting-music</code> to choose audio. These skills are loaded only when needed and share the context with the main agent.</p><p>Like prompts, <strong>skills can be composed from parts or dynamically generated</strong> before agent invocation. This is helpful if you want to dynamically adjust the skills based on some kind of catalog or knowledge base.</p><h2>2. 
CLI as a Universal Interface</h2><p>At the beginning of the project, I decided not to build a <strong>graphical user interface (GUI)</strong> for my agent, having spent too much time on GUI development in the past.</p><p>The traditional approach of building a <strong>GUI</strong> for humans and separate <strong>tools</strong> for agents doubles the development work. I built a <strong>command-line interface (CLI)</strong> instead. This single interface could be used by both me and the agent, making development much faster and debugging far easier.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c3Vp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c3Vp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png 424w, https://substackcdn.com/image/fetch/$s_!c3Vp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png 848w, https://substackcdn.com/image/fetch/$s_!c3Vp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png 1272w, https://substackcdn.com/image/fetch/$s_!c3Vp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!c3Vp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png" width="1456" height="349" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:349,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/185843915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c3Vp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png 424w, https://substackcdn.com/image/fetch/$s_!c3Vp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png 848w, https://substackcdn.com/image/fetch/$s_!c3Vp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png 1272w, https://substackcdn.com/image/fetch/$s_!c3Vp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef57ddd-91de-4219-9d43-9a67e9c4918c_1634x392.png 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>When my agent ran into an issue, I could reproduce it by running the exact same command it used, like <code>./cli compose</code>. This removed the guesswork of whether the agent&#8217;s tool call was different from a human&#8217;s action. A single CLI served as the universal interface for everyone.</p><p>You can also <strong>expose different LLM models as CLI commands</strong> for various specialized tasks. For example, I use Gemini 2.5 Pro as the underlying model for the review-video command as it is better at analyzing videos than other models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1FnA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1FnA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1FnA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1FnA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1FnA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1FnA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg" width="1200" height="723" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;CLI commands documentation&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="CLI commands documentation" title="CLI commands documentation" srcset="https://substackcdn.com/image/fetch/$s_!1FnA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1FnA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1FnA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1FnA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea72867-a843-4df4-b708-1ace27e60efe_1200x723.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Section on CLI commands in agent system prompt</figcaption></figure></div><p>After the agent baseline stabilized, I then added a GUI for human-in-the-loop tasks like reviewing drafts and making fine-grained adjustments.</p><h2>3. Guide Agents with a Status Command</h2><p>Another important decision was how the agent decided what to do next. From my previous experience, isolated stages in the workflow with their own context can lead to missing context in later stages and make it hard to revise work from previous stages. 
I tried to let the agent figure out priorities on its own, but this led to unpredictable behavior.</p><p>The solution was a <strong>dedicated status command</strong> that guides the agent to the next action. By having the agent call status, execute the action, and then call status again, we are embedding a predictable state machine within the agent loop. The command&#8217;s output centralizes all the priority and workflow logic.</p><p>The <code>status</code> command outputs <strong>Next Step</strong> instructions following this priority:</p><pre><code><code>1. Fix rejected video: if any rejected videos exist
2. Compose: if a clip has drafts with a human pick
3. Awaiting selection: if drafts exist but no human pick yet
4. Generate drafts: if available clips have no drafts
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KkTu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KkTu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KkTu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KkTu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KkTu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KkTu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg" width="1200" height="761" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Agent 
calling status command&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Agent calling status command" title="Agent calling status command" srcset="https://substackcdn.com/image/fetch/$s_!KkTu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KkTu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KkTu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KkTu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc181879b-32e0-478f-a3bc-a303dcfbe6ce_1200x761.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Sample output for the status CLI command</figcaption></figure></div><p>Now the agent can proactively check the status and decide what to do next, while staying flexible enough to follow whatever workflow the user wants to run via custom prompts.</p><h2>4. Accept the Natural Variance of Models</h2><p>I thought I could get consistent output from an LLM by refining my prompts, but it didn&#8217;t work. After weeks of trying to fix the last 15% of issues, I decided to measure the model&#8217;s variance. I ran the same prompt on the same input 10 times and found that the success rate varied between 76% and 88%.</p><p>This suggests that there is a natural ceiling for consistency with current LLMs. So I stopped chasing perfection and <strong>embraced this natural variance</strong>.</p><p>I started designing the system to work with this variance by generating multiple options (with varied prompts and parameters) and adding a human selection process to pick the best one. I had the evaluation system first pick a few promising candidates, and then I selected the best one for the next step.</p><h2>5. Hybrid Validation with Code and LLM</h2><p>Evaluating the output of the agent and aligning it with human preference was tricky. It took me a while to get it right. 
The biggest improvement in quality came when I combined LLMs with simple code during the evaluation process.</p><p>I found it was better to use the <strong>LLM for nuanced judgments</strong> and use <strong>code to enforce hard rules</strong>, and then use a weighted sum to calculate the final score. This brought the alignment with human judgment from 73% to 92%.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ARUv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ARUv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png 424w, https://substackcdn.com/image/fetch/$s_!ARUv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png 848w, https://substackcdn.com/image/fetch/$s_!ARUv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png 1272w, https://substackcdn.com/image/fetch/$s_!ARUv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ARUv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png" width="1456" height="343" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:343,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68223,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/185843915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ARUv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png 424w, https://substackcdn.com/image/fetch/$s_!ARUv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png 848w, https://substackcdn.com/image/fetch/$s_!ARUv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png 1272w, https://substackcdn.com/image/fetch/$s_!ARUv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a49c813-3a3f-46cc-bf37-069ba68ed52c_1638x386.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>6. Read the Logs</h2><p><strong>Recording and reading the agent&#8217;s logs</strong> was a highly effective way to improve the agent performance. 
Reading about the steps that the agent took, especially when it failed, revealed problems I would not have discovered otherwise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mqGl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mqGl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mqGl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mqGl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mqGl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mqGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg" width="1200" height="735" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:735,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Agent invoking skills&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Agent invoking skills" title="Agent invoking skills" srcset="https://substackcdn.com/image/fetch/$s_!mqGl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mqGl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mqGl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mqGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8858c343-b871-4cb3-bb2d-a3683e219276_1200x735.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Example of agent log showing Skill invocations and Bash tool calls</figcaption></figure></div><p>For example, the logs showed me that my agent needed three attempts to compose a video before succeeding. The cause was an FFmpeg error that occurred when too many video overlays were used, so I added a validation step to prevent this from happening again.</p><p>The logs also helped me find smaller inefficiencies, like the agent calling <code>music list</code> when the data was already in its context. Reading the logs regularly became a core part of my development process. It is the best way I have found to understand what your agent is actually doing.</p><div><hr></div><p>That&#8217;s it! If you are building workflow agents, I hope these patterns provide some inspiration and ideas.</p><p>Not all of them are relevant to every use case, and some of them might become outdated in a few months. 
Remember that it is important to experiment, iterate, and learn from your own experiences.</p>]]></content:encoded></item><item><title><![CDATA[LLMs Work. The Problem is Translation.]]></title><description><![CDATA[LLMs work very well. But there is a translation problem between LLMs and the environment that LLMs have to interact with (software and humans).]]></description><link>https://thegroundtruth.media/p/llms-work-the-problem-is-translation</link><guid isPermaLink="false">https://thegroundtruth.media/p/llms-work-the-problem-is-translation</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Tue, 28 Oct 2025 15:49:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YSpO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have posted <a href="https://www.linkedin.com/feed/update/urn:li:activity:7383763833401610240/">bits</a> and <a href="https://www.linkedin.com/feed/update/urn:li:activity:7384458211980021760/">pieces</a> of my thinking on LLMs and AI agents. 
Here&#8217;s a longer piece that consolidates my ideas on the <strong>translation problem</strong>.</p><p>To understand why the problem with LLMs lies in translation, we need to understand how LLMs work.</p><h2>How LLMs Work</h2><p>LLMs are trained on tokens, which are basically human language translated into the LLM&#8217;s language. For example, the sentence &#8220;<strong>I love you.</strong>&#8221; is <a href="https://platform.openai.com/tokenizer">translated</a> into <strong>[40, 3047, 481, 13]</strong> for GPT-4o.</p><p>These numbers are token IDs. They are converted into embeddings of <strong>higher dimensions</strong> and processed by the LLM&#8217;s core transformer model at an <strong>even higher dimension</strong> to give output tokens, which may be [40, 3047, 481, 3101, 13]. The output tokens are then translated back to human language: &#8220;I love you too.&#8221;</p><p>Here&#8217;s the model code snippet from <a href="https://github.com/karpathy/nanoGPT/blob/93a43d9a5c22450bbf06e78da2cb6eeef084b717/model.py#L52">nanoGPT</a> showing how this translation works in Python code:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KdSC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KdSC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KdSC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!KdSC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KdSC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KdSC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg" width="1200" height="635" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:635,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;llm&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="llm" title="llm" srcset="https://substackcdn.com/image/fetch/$s_!KdSC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KdSC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!KdSC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KdSC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06dc2f22-8505-4e0a-a039-0cfd903c98e4_1200x635.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Code snippets from nanoGPT on translation layers</figcaption></figure></div><p>Here&#8217;s a step-by-step illustration of the process that I created:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!33jf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!33jf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png 424w, https://substackcdn.com/image/fetch/$s_!33jf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png 848w, https://substackcdn.com/image/fetch/$s_!33jf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!33jf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!33jf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png" width="1418" height="1096" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1096,&quot;width&quot;:1418,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:171623,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/177380765?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!33jf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png 424w, https://substackcdn.com/image/fetch/$s_!33jf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png 848w, https://substackcdn.com/image/fetch/$s_!33jf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!33jf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a8e4e9a-2faf-4e02-8f4c-5798f273e737_1418x1096.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Illustration of the translation processes in LLMs</figcaption></figure></div><p>Let&#8217;s take a look at each step:</p><ol><li><p>The conversion from human language or programming language to token IDs is handled by the <strong>tokenizer</strong>, which is a separate system from the core transformer model in the LLM.</p></li><li><p>The conversion from token IDs to embeddings is handled by a smaller <strong>embedding model</strong>, which is part of the LLM but not part of the core transformer model where the magic happens.</p></li><li><p>The core transformer model processes the embeddings to give <strong>output tokens</strong>.</p></li><li><p>The output tokens are translated back to human language or programming language by the <strong>tokenizer</strong>.</p></li></ol><blockquote><p>We don&#8217;t know the exact dimensions of SOTA models 
by OpenAI (GPT-5) and Anthropic (Claude Sonnet 4.5), but we can use DeepSeek V3 as a reference. DeepSeek V3&#8217;s embedding model has a dimension of <a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/9b4e9788e4a3a731f7567338ed15d3ec549ce03b/inference/model.py#L60">2048</a>, and its core transformer model has a dimension of <a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/9b4e9788e4a3a731f7567338ed15d3ec549ce03b/inference/model.py#L61C22-L61C27">10944</a> for each of the two dense layers and <a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/9b4e9788e4a3a731f7567338ed15d3ec549ce03b/inference/model.py#L62C26-L62C30">1408</a> for each expert layer.</p></blockquote><p>In summary, the <strong>core transformer model</strong> of an LLM, which is where most of the parameters are trained and stored as weights, operates mainly in the <strong>higher dimensional space</strong>, not in the token space or embedding space.</p><h2>Translation Problem</h2><p>Now that we understand how LLMs work, we can see that there is a <strong>translation problem</strong> between LLMs and the environment that LLMs have to interact with (software and humans):</p><p>Most of an LLM&#8217;s operations are performed in the <strong>higher dimensional space</strong>, not in the <strong>token space</strong>. 
On the other hand, software and humans operate in <strong>human language and programming languages</strong>, which are translated to token IDs by the tokenizer.</p><p>So while the model can come up with good answers in higher dimensions, it has to translate those answers back to token IDs (a single dimension) to interact with the environment, and the token IDs are then translated back to human language and programming languages.</p><p>There are two translation layers in the process:</p><ul><li><p><strong>Between higher dimensions and token IDs</strong></p></li><li><p><strong>Between token IDs and human languages &amp; programming languages</strong></p></li></ul><p>These two translation layers cause <strong>information loss</strong> and a <strong>mismatch in precision</strong>, which are <strong>the source of hallucinations</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YSpO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YSpO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png 424w, https://substackcdn.com/image/fetch/$s_!YSpO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png 848w, https://substackcdn.com/image/fetch/$s_!YSpO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YSpO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YSpO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png" width="1426" height="1082" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1082,&quot;width&quot;:1426,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:197744,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/177380765?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YSpO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png 424w, https://substackcdn.com/image/fetch/$s_!YSpO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png 848w, 
https://substackcdn.com/image/fetch/$s_!YSpO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!YSpO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b57f3b4-c49b-4096-beea-9b972b426397_1426x1082.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Information loss and mismatch in precision during translation 
processes</figcaption></figure></div><ul><li><p><strong>Information loss</strong>: The core transformer model needs to cast concepts from higher dimensions back to a 1-dimensional token ID, which causes information loss. Furthermore, human language imposes consistency requirements (a response is expected to stay in one language), which hinder the model&#8217;s ability to express an idea with the most suitable token ID simply because that token belongs to a different human language.</p></li><li><p><strong>Mismatch in precision</strong>: Tokens have a different precision level from human language (characters) and programming languages (symbols). LLMs are trained with tokens as the basic units of input and output, and one token usually maps to several characters or symbols, so the model has a weak grasp of individual characters and symbols.</p></li></ul><p>It&#8217;s worth noting that the translation happens for both input and output. This compounds the inaccuracy of the answers as the LLM processes new input tokens to generate new output tokens, especially for autonomous AI agents, where the loop can go on for a very long time.</p><h2>Examples of Translation Errors</h2><p><strong>How many Rs are in the word strawberry?</strong></p><p>The famous &#8220;<a href="https://prompt.16x.engineer/blog/why-chatgpt-cant-count-rs-in-strawberry">How many Rs are in the word strawberry?</a>&#8221; question is a classic example of hallucination. The tokenizer first translates the word &#8220;strawberry&#8221; to token IDs, <code>[302, 1618, 19772]</code>.</p><p>The problem is that the model does not see &#8220;strawberry&#8221; as a word composed of letters, but as 3 tokens. The basic unit of LLM processing is a token ID, but the basic unit of the English language is a letter, and they don&#8217;t match one-to-one. 
Hence, it is difficult for the model to understand that the word &#8220;strawberry&#8221; contains 3 R&#8217;s, as all it sees is 3 tokens.</p><p>Newer models are able to answer this question correctly because of their massive pre-training data on the English language, but ultimately, it is a translation problem between the language of humans and the language of LLMs.</p><p><strong>Mixing of languages in model responses</strong></p><p>Models, especially the ones from Chinese labs like DeepSeek and Qwen, often <a href="https://www.linkedin.com/feed/update/urn:li:activity:7289122636813516800/">mix up English and Chinese in their responses</a>. This is an example of a translation issue between the language of humans and the language of LLMs.</p><p>LLMs see token IDs and map them to <strong>concepts</strong> in higher dimensional space. For LLMs, it does not really matter what human language the concepts are in.</p><p>In fact, it is often the case that <a href="https://16x.engineer/2022/10/18/chinese-tech-terms.html">some concepts are better explained in one human language than in another</a>, so it makes sense to use a mixture of languages when trying to cast higher dimensional concepts back to human language.</p><p><strong>Seahorse emoji</strong></p><p>LLMs can <a href="https://medium.com/@jasperhajonides/what-gpt-5s-seahorse-emoji-struggle-teaches-us-b2f3895e216a">incorrectly say</a> there is a seahorse emoji, only to realize that they can&#8217;t produce it. This is a translation error as well:</p><p>The LLM cannot find a mapping from the seahorse concept in higher dimensions to a set of tokens that captures this concept as an emoji. 
This is similar to how we sometimes find it hard to <a href="https://16x.engineer/2022/10/18/chinese-tech-terms.html">explain a concept from a foreign language in English</a>.</p><p></p><h2>How to Solve the Translation Problem</h2><p>I believe there are several potential solutions to the translation problem:</p><ul><li><p>Use a better way to capture real-world concepts in higher dimensions (images and videos instead of text) for the models, to avoid information loss during translation.</p></li><li><p>Make programming languages more compatible with LLMs (e.g. align the precision level of the programming language to the tokenizer, or the tokenizer to the precision level of the programming language).</p></li><li><p>Teach humans to be multilingual and understand concepts in multiple languages, so that LLMs are not forced to output tokens in a single language, which causes information loss.</p></li></ul><p><strong>When humans, software and models all speak the same language, models will be much more effective than they are today.</strong></p><div><hr></div><p>That&#8217;s all for this post. I have a lot more thoughts on AI agents and AGI, but I really wanted to write on this topic first since I believe this is a blind spot for many people.</p><p>Let me know what you think by leaving a comment. More posts coming in the future!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Effectiveness of AI Coding Techniques: Tools and Agents]]></title><description><![CDATA[Part 2 of the series, where I analyze the effectiveness of various AI coding techniques. This post focuses on tools and agents.]]></description><link>https://thegroundtruth.media/p/effectiveness-of-ai-coding-techniques-tools-agents</link><guid isPermaLink="false">https://thegroundtruth.media/p/effectiveness-of-ai-coding-techniques-tools-agents</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Sun, 17 Aug 2025 06:56:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Tjcd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, we will analyze the effectiveness of AI coding techniques using tools and agents.</p><p>This is part 2 of the series on AI coding techniques. Check out <a href="https://thegroundtruth.substack.com/p/effectiveness-of-ai-coding-techniques-input-context">part 1</a> for techniques related to input and context management.</p><h2>1. Tool Calls</h2><div class="pullquote"><p><strong>Mature. Effective.</strong></p></div><p><strong>Tool calls</strong> were the magic that kick-started the <strong>AI coding agent</strong> era. Tools (function calling) were <a href="https://openai.com/index/function-calling-and-other-api-updates/">first introduced</a> by OpenAI in June 2023. 
Cursor then popularized tool calling for file editing inside its IDE.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MRRh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MRRh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg 424w, https://substackcdn.com/image/fetch/$s_!MRRh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg 848w, https://substackcdn.com/image/fetch/$s_!MRRh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!MRRh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MRRh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg" width="1200" height="575" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:575,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;OpenAI function calling&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="OpenAI function calling" title="OpenAI function calling" srcset="https://substackcdn.com/image/fetch/$s_!MRRh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg 424w, https://substackcdn.com/image/fetch/$s_!MRRh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg 848w, https://substackcdn.com/image/fetch/$s_!MRRh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!MRRh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb632ede5-2f5c-422b-a755-1d23f0223334_1200x575.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">OpenAI announcing function calling on June 13, 2023</figcaption></figure></div><p>Tool calls marked the shift from the manual ChatGPT copy-pasting workflow to an <strong>agentic workflow</strong>. 
AI agents equipped with tools can carry out code edits and execute CLI commands autonomously, without needing human developers' help to interact with the local environment.</p><p>Newer models like Kimi K2 are specifically trained to take advantage of tool calls and are <strong>agentic by default</strong>, which helps with complex coding tasks involving multiple steps and the use of tools.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j9OX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j9OX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg 424w, https://substackcdn.com/image/fetch/$s_!j9OX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg 848w, https://substackcdn.com/image/fetch/$s_!j9OX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!j9OX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j9OX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg" width="1200" 
height="455" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:455,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Kimi K2&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Kimi K2" title="Kimi K2" srcset="https://substackcdn.com/image/fetch/$s_!j9OX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg 424w, https://substackcdn.com/image/fetch/$s_!j9OX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg 848w, https://substackcdn.com/image/fetch/$s_!j9OX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!j9OX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F855580f7-cac5-47b0-9fb8-230bf1b6c01e_1200x455.jpeg 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Agentic capabilities are a key feature of new open-source models</figcaption></figure></div><p>Tool calls are now a <strong>foundational piece</strong> for any AI coding tool. They have proven to be a very effective way for models to interact with the external environment.</p><h2>2. MCP Servers</h2><div class="pullquote"><p><strong>Emerging. Limited Effectiveness.</strong></p></div><p>Taking the idea of tools a step further, we get Model Context Protocol (MCP). 
MCP was <a href="https://www.anthropic.com/news/model-context-protocol">introduced by Anthropic</a> in November 2024 as a way to <strong>standardize communication between AI agents and other external services</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x6V6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x6V6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg 424w, https://substackcdn.com/image/fetch/$s_!x6V6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg 848w, https://substackcdn.com/image/fetch/$s_!x6V6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!x6V6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x6V6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg" width="1200" height="524" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:524,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MCP&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MCP" title="MCP" srcset="https://substackcdn.com/image/fetch/$s_!x6V6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg 424w, https://substackcdn.com/image/fetch/$s_!x6V6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg 848w, https://substackcdn.com/image/fetch/$s_!x6V6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!x6V6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18a7b78-bbdf-4e14-9ab2-3deb20970a13_1200x524.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Despite being out for almost a year, MCP has not gained widespread usage, judging by community feedback on Reddit and X. The protocol itself is still undergoing <a href="https://modelcontextprotocol.io/specification/2025-06-18/changelog">iterations of changes</a> to allow <a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization">better authorization flow</a> and <a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/security_best_practices">security</a>.</p><p>One issue limiting the effectiveness of MCP servers is the <strong>static nature of tools</strong>. The protocol itself does not allow for <strong>selective or dynamic enabling of tools</strong> within an MCP server.</p><p>If an MCP server has 20 tools, the definitions of all 20 tools will be added to the beginning of the model's context window. 
With multiple MCP servers, this can add up to 100 tools stuffed into the context.</p><p>This is a problem for models with a <a href="https://thegroundtruth.substack.com/p/the-ground-truth-weekly-effective">limited effective context window</a>:</p><ul><li><p>The more tokens already present in the context, the less effective the model becomes at solving tasks (the signal-to-noise ratio goes down).</p></li><li><p>Having a large number of tools can distract the model; it might use tools when not necessary, or choose a less effective tool.</p></li><li><p>A large number of tool definitions <a href="https://www.reddit.com/r/ClaudeAI/comments/1mjieaf/claude_code_context_and_mcps/">take up context window</a> space and reduce the amount of usable context for the user.</p></li><li><p>Having tool definitions in the context also increases cost, as the full tool definitions are resent to the model for each message.</p></li></ul><p>There are workarounds for the "explosion of tools" problem. Some MCP client apps, like Cursor, allow you to selectively enable tools. You can also fork open-source MCP servers to remove tools that are not needed. However, these are stopgaps that do not address the underlying issue.</p><p>Nonetheless, when used sparingly, MCP servers can be useful in specific situations. For example, the Playwright MCP server can help debug frontend visual issues.</p><p>The key to getting the most out of MCP is to be <strong>selective and strategic</strong> about the MCP servers and tools you enable, instead of just enabling all of them.</p><h2>3. AST / Codemap</h2><div class="pullquote"><p><strong>Emerging. Effective.</strong></p></div><p>Abstract Syntax Trees (ASTs) and code maps are advanced techniques that help AI agents understand the codebase. 
Instead of using RAG or embeddings to generate a vector database, agents leverage tools like <a href="https://tree-sitter.github.io/tree-sitter/">tree-sitter</a> to parse the code and generate an accurate high-level representation of the code (a code map).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_IH8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_IH8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_IH8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_IH8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_IH8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_IH8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg" width="1200" height="796" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:796,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;RepoPrompt Codemap&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RepoPrompt Codemap" title="RepoPrompt Codemap" srcset="https://substackcdn.com/image/fetch/$s_!_IH8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_IH8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_IH8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_IH8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e407630-8ede-4c5f-9697-4a0e413b0e56_1200x796.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example of a code map being used in Repo Prompt</figcaption></figure></div><p>ASTs and code maps give the agent a concise overview of the codebase. 
This technique has a few advantages over the alternatives:</p><ul><li><p>The output is accurate and free from hallucinations, because AST parsing is deterministic and mirrors how a compiler actually reads the code.</p></li><li><p>A code map is more token-efficient than raw string-based search with tools like <code>grep</code>, as it hides implementation details and retains only high-level symbols and control flow.</p></li><li><p>A code map is more cost-effective than RAG, as it does not involve embeddings or vector stores.</p></li></ul><p>To the best of my knowledge, <a href="https://cline.bot/">Cline</a> <a href="https://github.com/cline/cline/blob/8fbccff8538cde243ceb08d644a15dbf2256b544/src/services/tree-sitter/index.ts">uses tree-sitter</a> for parsing code and file search. <a href="https://repoprompt.com/">Repo Prompt</a> also parses the code and generates a <a href="https://origo.prose.sh/code-maps">code map</a> to help the model understand the codebase.</p><p>I am not aware of other tools that employ this technique, and such features are unlikely to be exposed for end users to tune or modify.</p><h2>4. Parallel Agents</h2><div class="pullquote"><p><strong>Emerging. Effective.</strong></p></div><p>Parallel agents were first popularized by remote AI coding platforms like <a href="https://devin.ai/">Devin</a>, where for each task, you spin up a new agent in an isolated instance. 
If you give multiple tasks, you get agents running in parallel by default.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wlw7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wlw7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Wlw7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Wlw7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Wlw7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wlw7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg" width="1200" height="562" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Devin Parallel Agents&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Devin Parallel Agents" title="Devin Parallel Agents" srcset="https://substackcdn.com/image/fetch/$s_!Wlw7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Wlw7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Wlw7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Wlw7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ade106-cd43-4023-ab20-ba18192ef3d9_1200x562.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example of Devin agents being spawned for different tasks</figcaption></figure></div><p>This technique recently became available in Cursor as <a href="https://docs.cursor.com/en/background-agent">background agents</a>. Claude Code also supports this via its <a href="https://docs.anthropic.com/en/docs/claude-code/github-actions">GitHub Actions integration</a>.</p><p>Setting up parallel agents locally on a single project is slightly harder. The Claude Code best-practices guide <a href="https://www.anthropic.com/engineering/claude-code-best-practices">recommends</a> creating multiple Git checkouts in separate folders or using Git worktrees to run parallel agents on the same project.</p><p>Personally, I have not needed to run them in parallel often, as I have multiple projects that I switch between while the agent is working. 
However, this technique closely mirrors real-world software development, where <strong>a team of engineers works on different tasks independently and in parallel</strong>, so I see no obvious limitation to it.</p><p>Given the right tools and workflows, remote agents should be able to complete an entire task in isolation and submit a PR as a way to coordinate code merges, which makes this approach no less effective than running a single agent.</p><h2>5. Sub-Agents</h2><div class="pullquote"><p><strong>Emerging. Limited Effectiveness.</strong></p></div><p>Sub-agents are not to be confused with parallel agents. Sub-agents are <strong>agents that are spawned by another agent</strong>, rather than by a human.</p><p>The first real-world sub-agent feature also came from Devin, which introduced <a href="https://docs.devin.ai/release-notes/overview#september-3%2C-2024">MultiDevin</a> in September 2024. In MultiDevin, one Devin acts as a "manager" that distributes work to "worker" Devins.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vFac!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vFac!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vFac!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!vFac!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vFac!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vFac!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg" width="1200" height="586" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:586,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Multi-Devin&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Multi-Devin" title="Multi-Devin" srcset="https://substackcdn.com/image/fetch/$s_!vFac!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vFac!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!vFac!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vFac!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc9b709-4b39-4916-8859-4868d13e67f5_1200x586.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">MultiDevin launched as a feature on September 3, 2024</figcaption></figure></div><p>However, this feature seems to 
have been removed from the Devin documentation. Cognition, the company behind Devin, has also published a <a href="https://cognition.ai/blog/dont-build-multi-agents">blog post</a> on principles for building agents that touches on sub-agents and argues against building multi-agent systems.</p><p>Sub-agents have become popular as a <a href="https://docs.anthropic.com/en/docs/claude-code/sub-agents">feature</a> inside Claude Code, where the sub-agent receives instructions from the main agent and has its own context window, separate from the main context. This keeps the main context window focused on the main task and uncluttered by sub-tasks.</p><p>In my experience using Claude Code sub-agents to gather context, they are quite slow to complete tasks. They often have to start from scratch and miss context that was already present in the main agent, probably because not all of the main agent's context is passed to the sub-agent.</p><p>It is best to think of them as a <strong>trade-off where you sacrifice speed to save space in the main context window</strong>.</p><p>I did find sub-agents useful for building specialized, reusable workflows, i.e., <strong>custom agents</strong>. For example, I have a sub-agent configured to perform releases, which involves gathering the changes, updating the version number, and writing release notes. 
It is a good way of encapsulating prompts and steps for accomplishing a repetitive task and making it reusable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SZQX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SZQX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SZQX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SZQX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SZQX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SZQX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg" width="1200" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Changelog Version Updater&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Changelog Version Updater" title="Changelog Version Updater" srcset="https://substackcdn.com/image/fetch/$s_!SZQX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SZQX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SZQX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SZQX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F196f98d1-55e4-4e24-9656-46801275a2e2_1200x771.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example of a custom agent I created to help with version updates and release notes</figcaption></figure></div><div><hr></div><p>That's all for the techniques on tools and agents. 
Here&#8217;s a summary of the techniques covered in part 1 and part 2, using a 2D quadrant visualization:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tjcd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tjcd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png 424w, https://substackcdn.com/image/fetch/$s_!Tjcd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png 848w, https://substackcdn.com/image/fetch/$s_!Tjcd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png 1272w, https://substackcdn.com/image/fetch/$s_!Tjcd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tjcd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png" width="1456" height="866" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:866,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:202143,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/171175218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tjcd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png 424w, https://substackcdn.com/image/fetch/$s_!Tjcd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png 848w, https://substackcdn.com/image/fetch/$s_!Tjcd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png 1272w, https://substackcdn.com/image/fetch/$s_!Tjcd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the upcoming posts, I will be covering techniques around <strong>development workflow</strong>.</p><p>Subscribe to read new posts when they come out.</p>
<div><hr></div><p>Check out my other works:</p><p><strong><a href="https://eval.16x.engineer/">16x Eval</a></strong> - Simple desktop app for model evaluation and prompt engineering</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QeUm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QeUm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 424w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 848w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 1272w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QeUm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png" width="1456" height="1012" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1012,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:777337,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!QeUm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 424w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 848w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 1272w, 
https://substackcdn.com/image/fetch/$s_!QeUm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong><a href="https://www.youtube.com/@16x.engineer">16x AI coding stream</a></strong> - Weekly livestream where I build cool stuff and share my AI coding workflow live on YouTube.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ZeDK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZeDK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png 424w, https://substackcdn.com/image/fetch/$s_!ZeDK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png 848w, https://substackcdn.com/image/fetch/$s_!ZeDK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png 1272w, https://substackcdn.com/image/fetch/$s_!ZeDK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZeDK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png" width="1308" height="684" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1308,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:391472,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/171175218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZeDK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png 424w, https://substackcdn.com/image/fetch/$s_!ZeDK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png 848w, https://substackcdn.com/image/fetch/$s_!ZeDK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png 1272w, https://substackcdn.com/image/fetch/$s_!ZeDK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7e1845f-1732-4298-9540-1fb9fb82efbc_1308x684.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Effectiveness of AI Coding Techniques: Input & Context]]></title><description><![CDATA[A series where I analyze the effectiveness of various AI coding techniques. 
The first post focuses on input- and context-related techniques.]]></description><link>https://thegroundtruth.media/p/effectiveness-of-ai-coding-techniques-input-context</link><guid isPermaLink="false">https://thegroundtruth.media/p/effectiveness-of-ai-coding-techniques-input-context</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Sun, 03 Aug 2025 08:52:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/796325d2-183a-4b50-a181-a9c1b3070ced_1878x1142.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are so many AI coding tips and techniques, but which ones are actually effective?</p><p>In this series of posts, I will analyze some AI coding techniques in terms of maturity and effectiveness:</p><ul><li><p>Is the technique <strong>mature and stable</strong>, or is it an <strong>emerging</strong> technique that is still evolving?</p></li><li><p>Is the technique <strong>effective</strong> in achieving better coding performance or reducing cost?</p></li></ul><p>For the first post, we will focus on techniques related to <strong>input and context management</strong>.</p><h2>1. Prompt Engineering</h2><div class="pullquote"><p><strong>Mature. Effective.</strong></p></div><p><strong>Prompt engineering</strong> has been around since the launch of ChatGPT, and it is still one of the most valuable techniques for AI coding. 
Being able to articulate task requirements clearly helps models become more effective.</p><p>Top AI companies have dedicated guides for prompt engineering:</p><ul><li><p><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview">Anthropic Prompt Engineering Guide</a></p></li><li><p><a href="https://platform.openai.com/docs/guides/prompt-engineering">OpenAI Prompt Engineering Guide</a></p></li></ul><p>There are also community-driven <a href="https://www.promptingguide.ai/">prompt engineering guides</a> and specialized prompt optimizer tools like <a href="https://dspy.ai/">DSPy</a>. Some products like <a href="https://devin.ai/">Devin</a> have a built-in tool to improve the prompt.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9Ko2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Ko2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png 424w, https://substackcdn.com/image/fetch/$s_!9Ko2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png 848w, https://substackcdn.com/image/fetch/$s_!9Ko2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png 1272w, 
https://substackcdn.com/image/fetch/$s_!9Ko2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9Ko2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png" width="1456" height="939" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:939,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110552,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9Ko2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png 424w, https://substackcdn.com/image/fetch/$s_!9Ko2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png 848w, 
https://substackcdn.com/image/fetch/$s_!9Ko2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png 1272w, https://substackcdn.com/image/fetch/$s_!9Ko2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a5e97a-ee30-41ea-9762-0f9239be36f4_1656x1068.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Devin&#8217;s built-in prompt optimizer</figcaption></figure></div><p>I wrote some <a 
href="https://16x.engineer/2024/02/03/chatgpt-coding-best-practices.html">prompting tips</a> in 2024 for using ChatGPT for coding, which are still relevant today.</p><p>All this effort goes to show that prompt engineering remains a key technique for effective coding using AI.</p><h2><strong>2. Retrieval-Augmented Generation (RAG)</strong></h2><div class="pullquote"><p><strong>Mature. Effective.</strong></p></div><p><strong>RAG</strong> is also an established technique that was widely used for building custom chatbots linked to an internal knowledge base.</p><p>Cursor popularized the technique for coding, as it uses RAG to build an embedding index of the codebase (codebase indexing). This is effective for medium-sized codebases and allows models to gather context quickly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I-aM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I-aM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png 424w, https://substackcdn.com/image/fetch/$s_!I-aM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png 848w, https://substackcdn.com/image/fetch/$s_!I-aM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png 1272w, 
https://substackcdn.com/image/fetch/$s_!I-aM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I-aM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png" width="1456" height="716" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:716,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153056,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I-aM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png 424w, https://substackcdn.com/image/fetch/$s_!I-aM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png 848w, 
https://substackcdn.com/image/fetch/$s_!I-aM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png 1272w, https://substackcdn.com/image/fetch/$s_!I-aM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f46051-272b-4cd6-b419-54ac2082092f_1916x942.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Cursor&#8217;s Indexing Feature is a form of RAG</figcaption></figure></div><p>However, due to the limits on models' 
context window size, RAG is not particularly effective on very large codebases, such as monorepos with hundreds of microservices.</p><p>RAG also requires vectorizing the codebase, storing and retrieving these vectors somewhere, and performing computations to determine the relevant chunks, all of which add complexity to the workflow.</p><h2><strong>3. Context Engineering</strong></h2><div class="pullquote"><p><strong>Emerging. Effective.</strong></p></div><p>There are emerging techniques that provide alternative ways to gather context without relying on embeddings and vectors.</p><p>This approach is called <strong>context engineering</strong>, which involves using tools to gather context from the codebase. These can be codemaps or command-line (CLI) tools like grep and git. Using tools to search the codebase mirrors how engineers find relevant code in the real world.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gci1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gci1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png 424w, https://substackcdn.com/image/fetch/$s_!Gci1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png 848w, 
https://substackcdn.com/image/fetch/$s_!Gci1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png 1272w, https://substackcdn.com/image/fetch/$s_!Gci1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gci1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png" width="1340" height="464" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e985728b-c540-4591-ba9d-968903335fdf_1340x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:1340,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:82543,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gci1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png 424w, 
https://substackcdn.com/image/fetch/$s_!Gci1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png 848w, https://substackcdn.com/image/fetch/$s_!Gci1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png 1272w, https://substackcdn.com/image/fetch/$s_!Gci1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe985728b-c540-4591-ba9d-968903335fdf_1340x464.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Claude Code can use CLI tools like grep or git to gather context</figcaption></figure></div><p>I consider context engineering an <strong>alternative form of RAG</strong>, as it also involves retrieving context and feeding it into the model for generation.</p><p>Context engineering was popularized by tools like <a href="https://repoprompt.com/">RepoPrompt</a> and, more notably, <a href="https://docs.anthropic.com/en/docs/claude-code/overview">Claude Code</a>. The wide adoption of Claude Code suggests that the technique is effective, even though it is still emerging and evolving.</p><h2><strong>4. Rules / AGENT.md</strong></h2><div class="pullquote"><p><strong>Emerging. Limited Effectiveness.</strong></p></div><p><strong>Rules</strong> were popularized by AI coding tools like <a href="https://docs.cursor.com/en/context/rules">Cursor</a> and <a href="https://docs.cline.bot/features/cline-rules">Cline</a>. They are typically written by human developers, while Claude Code can generate rules (CLAUDE.md) automatically via the <code>/init</code> command.</p><p>Different tools have different syntax and naming conventions for rules, but they all do essentially the same thing: provide general instructions that guide AI models.</p><p>When used well, they reduce the need to repeat general instructions or repo-specific guidelines to the model.</p><p>However, there are a few issues with the current implementation of rules:</p><ul><li><p>Sometimes models <strong>can ignore rules</strong> and do their own thing. 
It is important to treat them as <strong>guidelines rather than hard rules</strong> that models will follow religiously.</p></li><li><p>A large rules / CLAUDE.md file can <strong>fill up the context window</strong> quickly, resulting in reduced output quality, as models <a href="https://thegroundtruth.substack.com/p/the-ground-truth-weekly-effective">perform worse with longer context</a>.</p></li><li><p>The rules can <strong>become outdated</strong> as the codebase undergoes structural changes or refactoring. One way to mitigate this is to run the <code>/init</code> command regularly in Claude Code to update the rules.</p></li></ul><p>There are also efforts to standardize rules across tools as <a href="https://ampcode.com/AGENT.md">AGENT.md</a>, to reduce the clutter of tool-specific rule files in code repos, which can help rules become more universal and effective.</p><h2><strong>5. Knowledge / Memory</strong></h2><div class="pullquote"><p><strong>Emerging. Limited Effectiveness.</strong></p></div><p><strong>Knowledge</strong> is a technique first seen in Devin. 
New knowledge is automatically proposed when the agent is working on a task and receives feedback from the user.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7If9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7If9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png 424w, https://substackcdn.com/image/fetch/$s_!7If9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png 848w, https://substackcdn.com/image/fetch/$s_!7If9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png 1272w, https://substackcdn.com/image/fetch/$s_!7If9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7If9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png" width="1456" height="875" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:875,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224244,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7If9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png 424w, https://substackcdn.com/image/fetch/$s_!7If9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png 848w, https://substackcdn.com/image/fetch/$s_!7If9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png 1272w, https://substackcdn.com/image/fetch/$s_!7If9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad6b4f76-805a-44d5-b1ec-228882abd03b_1974x1186.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Automatically proposed codebase knowledge in Devin</figcaption></figure></div><p>In Claude Code, this feature is called <strong><a href="https://docs.anthropic.com/en/docs/claude-code/memory">memory</a></strong> and shares the same CLAUDE.md file with rules. Unlike Devin, memory in Claude Code is not automatically proposed by the agent. 
Instead, the user needs to use <code>#</code> commands to add new memories or edit the CLAUDE.md file directly.</p><p>Cursor also recently added a memory feature, in which memories are automatically proposed, similar to Devin.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nl5C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nl5C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png 424w, https://substackcdn.com/image/fetch/$s_!Nl5C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png 848w, https://substackcdn.com/image/fetch/$s_!Nl5C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png 1272w, https://substackcdn.com/image/fetch/$s_!Nl5C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nl5C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png" width="1456" height="755" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:755,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:189113,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nl5C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png 424w, https://substackcdn.com/image/fetch/$s_!Nl5C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png 848w, https://substackcdn.com/image/fetch/$s_!Nl5C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png 1272w, https://substackcdn.com/image/fetch/$s_!Nl5C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94c05ab7-99b0-4508-b933-39820a209ca7_1890x980.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Newly added Memories feature in Cursor</figcaption></figure></div><p>Based on my experience with Devin (January 2025) and Claude Code (July 2025), the knowledge / memory technique has limited effectiveness:</p><ul><li><p>The automatically proposed knowledge is a mixture of useful general guidelines and one-off comments that are unlikely to be useful for other tasks. 
The precision and signal-to-noise ratio are too low for it to be useful.</p></li><li><p>Manually proposing memories in Claude Code is not very intuitive in terms of user experience, and the generated memory doesn't fully capture the user's intent based on the context.</p></li></ul><p>As of now, it is more effective to write rules manually instead of using the knowledge / memory feature.</p><div><hr></div><p>That's all for the techniques on input and context.
Here&#8217;s a summary of the techniques using a 2D quadrant visualization:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l8NZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l8NZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png 424w, https://substackcdn.com/image/fetch/$s_!l8NZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png 848w, https://substackcdn.com/image/fetch/$s_!l8NZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png 1272w, https://substackcdn.com/image/fetch/$s_!l8NZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l8NZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png" width="1456" height="885"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:885,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147305,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l8NZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png 424w, https://substackcdn.com/image/fetch/$s_!l8NZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png 848w, https://substackcdn.com/image/fetch/$s_!l8NZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png 1272w, https://substackcdn.com/image/fetch/$s_!l8NZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdacf37-80d3-4a8a-a352-dea571a763bf_1878x1142.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Check out <a href="https://thegroundtruth.substack.com/p/effectiveness-of-ai-coding-techniques-tools-agents">part 2</a> of the series on tools and agents:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bf119f95-b55b-4b12-8231-6f67c20c47da&quot;,&quot;caption&quot;:&quot;In this post, we will analyze the effectiveness of AI coding techniques using tools and agents.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;md&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Effectiveness of AI Coding Techniques: Tools and Agents&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:6195712,&quot;name&quot;:&quot;Zhu Liang&quot;,&quot;bio&quot;:&quot;Building 16x Eval - Effortlessly evaluate prompts and 
models&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bcbdead0-413d-4346-ba8d-7aff6beb0b24_2914x2914.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-08-17T06:56:51.435Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Tjcd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96de48-2a3e-4d6e-a181-6975aff59e8b_1806x1074.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://thegroundtruth.substack.com/p/effectiveness-of-ai-coding-techniques-tools-agents&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:171175218,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;The Ground Truth&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!iZv-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F961dbec3-a09e-4643-9c25-05f83cdda466_534x534.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>And subscribe to read new posts when they come out.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thegroundtruth.media/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Ground Truth! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>Check out my other works:</p><p><strong><a href="https://eval.16x.engineer/">16x Eval</a></strong> - Simple desktop app for model evaluation and prompt engineering</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QeUm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QeUm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 424w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 848w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 1272w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QeUm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png" width="1456" height="1012" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1012,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:777337,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QeUm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 424w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 848w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 1272w, https://substackcdn.com/image/fetch/$s_!QeUm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fb98ae-960b-4f17-ba15-f5d5b732598c_2624x1824.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong><a href="https://www.youtube.com/@16x.engineer">16x AI coding stream</a></strong> - Weekly livestream where I build cool stuff using AI tools live on YouTube</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ie1a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Ie1a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png 424w, https://substackcdn.com/image/fetch/$s_!Ie1a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png 848w, https://substackcdn.com/image/fetch/$s_!Ie1a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png 1272w, https://substackcdn.com/image/fetch/$s_!Ie1a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ie1a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png" width="1304" height="648" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:1304,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:383543,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/169976484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ie1a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png 424w, https://substackcdn.com/image/fetch/$s_!Ie1a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png 848w, https://substackcdn.com/image/fetch/$s_!Ie1a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png 1272w, https://substackcdn.com/image/fetch/$s_!Ie1a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff29001-904d-4a60-90a9-b18f4385bf9a_1304x648.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[How Is Claude Code Different From Cursor?]]></title><description><![CDATA[Main differences between Claude Code and Cursor, in terms of pricing, performance and user experience.]]></description><link>https://thegroundtruth.media/p/claude-code-difference-from-cursor</link><guid isPermaLink="false">https://thegroundtruth.media/p/claude-code-difference-from-cursor</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Fri, 11 Jul 2025 08:37:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!urJD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have received a lot of questions recently about Claude Code, specifically about how it differs from Cursor.</p><p>As someone who has used Cursor for over a year, and Claude Code for over 3 months, I do have some thoughts on the topic.</p><p>In this post, I will go through the key differences between them: <strong>pricing</strong>, <strong>performance</strong> and <strong>user experience</strong>.</p><h2><strong>Pricing and Value</strong></h2><p>The pricing models for these two tools have become quite different recently.</p><p>Cursor's entry-level Pro plan is $20 a month, which <a href="https://cursor.com/blog/june-2025-pricing">now provides $20 in API credits</a> under its new <strong>API usage-based pricing</strong>.
Once you use up that credit, you have to pay for additional usage based on API costs.</p><p>This is a big change from their older request-based model, which was heavily subsidized by Cursor. Cursor used to discount Claude Sonnet 4 to 0.5x of a request each, so the 500 monthly requests could get you 1000 Claude Sonnet 4 requests. Under the new pricing, that drops to just &#8220;<a href="https://docs.cursor.com/account/pricing#expected-usage-within-limits">225 Sonnet 4 requests</a>&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yZIb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yZIb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png 424w, https://substackcdn.com/image/fetch/$s_!yZIb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png 848w, https://substackcdn.com/image/fetch/$s_!yZIb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png 1272w, https://substackcdn.com/image/fetch/$s_!yZIb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!yZIb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png" width="1454" height="624" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:624,&quot;width&quot;:1454,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135831,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167982137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yZIb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png 424w, https://substackcdn.com/image/fetch/$s_!yZIb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png 848w, https://substackcdn.com/image/fetch/$s_!yZIb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png 1272w, https://substackcdn.com/image/fetch/$s_!yZIb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9fceaf5-56fb-48fb-85cc-aa519e62e1d9_1454x624.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On the other hand, Anthropic recently started to offer Claude Code as part of its subscriptions, with the <a href="https://www.anthropic.com/claude-code">cheapest Claude Pro plan at $20 a month</a>.</p><p>As of now:</p><div class="pullquote"><p><strong>Claude Code pricing offers more usage and better value than Cursor.</strong></p></div><p>It is very obvious that Anthropic is &#8220;<strong>subsidizing</strong>&#8221; the API cost for users.</p><blockquote><p>This is ironic because just a few months ago, Cursor was the one doing heavy subsidies, and Claude Code was the one 
charging for raw API costs. Now it is completely flipped around.</p></blockquote><p>Anthropic can afford to do this because it owns the Claude models, and likely has lower internal costs of operating them than Cursor, which has to pay (presumably privately negotiated) API prices.</p><p>I've personally gotten about <a href="https://x.com/paradite_/status/1941358523264401668">$150 of API usage</a> from my $20 plan in June 2025. And that&#8217;s despite being on holiday and not using Claude Code every day.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!urJD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!urJD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png 424w, https://substackcdn.com/image/fetch/$s_!urJD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png 848w, https://substackcdn.com/image/fetch/$s_!urJD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png 1272w, https://substackcdn.com/image/fetch/$s_!urJD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!urJD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png" width="1194" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1194,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:257251,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167982137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!urJD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png 424w, https://substackcdn.com/image/fetch/$s_!urJD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png 848w, https://substackcdn.com/image/fetch/$s_!urJD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png 1272w, https://substackcdn.com/image/fetch/$s_!urJD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6045af29-5303-4df6-b4ae-66709f4e2dcd_1194x660.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Many others report similar observations, such as getting <a href="https://x.com/DuaneAdam/status/1942491010904515079">$300 of API usage out of the $100 Claude Max plan</a>.</p><h2><strong>Capabilities and Performance</strong></h2><p>When it comes to raw coding ability, the general consensus is that <strong>Claude Code is more capable.</strong></p><p>The main reason seems to be how Claude Code understands your project's context.</p><p><a href="https://x.com/pvncher/status/1941545054331601251">Claude Code uses agentic search</a> to understand your entire codebase, while
Cursor relies on embedding models and compressed context. This means Claude Code can maintain better awareness of your project structure and dependencies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hmB5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hmB5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png 424w, https://substackcdn.com/image/fetch/$s_!hmB5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png 848w, https://substackcdn.com/image/fetch/$s_!hmB5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png 1272w, https://substackcdn.com/image/fetch/$s_!hmB5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hmB5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png" width="1194" height="528" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:1194,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105905,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167982137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hmB5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png 424w, https://substackcdn.com/image/fetch/$s_!hmB5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png 848w, https://substackcdn.com/image/fetch/$s_!hmB5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png 1272w, https://substackcdn.com/image/fetch/$s_!hmB5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa5c84d-ea9f-4275-9f81-067ff102bc05_1194x528.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Additionally, <a href="https://www.reddit.com/r/ClaudeAI/comments/1lmz7ha/why_is_claude_code_better_than_cursor_claude/">Claude Code has no tool call limits</a>, allowing it to work on complex tasks for extended periods, whereas Cursor limits the number of tool calls to 25 per session by default.</p><p>From my own experience, Claude Code can work on a complex task for <strong>over 10 minutes</strong> without getting lost. 
In contrast, Cursor sometimes struggles to maintain context or hits internal tool call limits <strong>after about five minutes</strong>.</p><p>This means:</p><div class="pullquote"><p><strong>Claude Code can handle larger and more complex tasks more effectively than Cursor</strong>.</p></div><h2><strong>User Experience &amp; Code Review</strong></h2><p>There is one area where Cursor still has the edge over Claude Code right now:</p><div class="pullquote"><p><strong>Cursor's biggest advantage is the IDE user experience for reviewing code changes.</strong></p></div><p>As Nick Dobos <a href="https://x.com/NickADobos/status/1941552807842283966">mentioned on X</a>, Cursor is great at showing diffs and making changes easy to review. Others also agree that Claude Code's command-line interface has a <a href="https://www.reddit.com/r/cursor/comments/1ljz25q/petition_to_add_claude_code_natively_in_cursor/">worse user experience</a> in this regard.</p><p>However, there is a simple way to improve this: I run Claude Code inside the terminal of my Cursor editor. 
Once a task is done, I use the source control tab of Cursor (or VS Code) to see all the modified files.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mkdR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mkdR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png 424w, https://substackcdn.com/image/fetch/$s_!mkdR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png 848w, https://substackcdn.com/image/fetch/$s_!mkdR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png 1272w, https://substackcdn.com/image/fetch/$s_!mkdR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mkdR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png" width="1456" height="867" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:922573,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167982137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mkdR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png 424w, https://substackcdn.com/image/fetch/$s_!mkdR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png 848w, https://substackcdn.com/image/fetch/$s_!mkdR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png 1272w, https://substackcdn.com/image/fetch/$s_!mkdR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a50f5ef-63fa-412d-85f3-484d1e8aa3db_3164x1884.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>From there, I can easily open each file as a diff (or as a normal file) to review the changes before committing. It&#8217;s not as polished as Cursor&#8217;s native inline diffs, but it gets the job done.</p><p>It's also worth mentioning that Cursor, as an IDE, supports smart tab completion. In fact, I think Cursor has the best tab completion on the market, with predictive jumping across files. Claude Code, as a CLI tool, does not offer this capability. </p><h2><strong>Which One Should You Use?</strong></h2><p>My honest recommendation is to <strong>subscribe to both Claude Code and Cursor at the $20-a-month tier</strong>. </p><p>This gives you <strong>a lot of usage credits at competitive pricing</strong> and the best of both worlds: a CLI coding agent and IDE tab completion. 
</p><p>Another benefit is redundancy. When one service hits its limits, you can switch to the other.</p><p>Having both subscriptions also lets you stay up to date with the latest improvements from both companies. It's good to keep exploring different tools, in case new improvements from one company put it ahead of its competitor.</p><div><hr></div><p>Want to see how I use Claude Code personally?</p><p>I am doing a <strong>livestream this weekend (Saturday, 12 July)</strong> on YouTube, where I will demo how I use Claude Code to build live production apps like <a href="https://prompt.16x.engineer/">16x Prompt</a> and <a href="https://eval.16x.engineer/">16x Eval</a>.</p><p>You will get to see exactly what my workflow is, and how I use Claude Code to build my own products.</p><p>Get notified by clicking the YouTube livestream button below and hitting the notification bell:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.youtube.com/live/DRHlHZMwQC8&quot;,&quot;text&quot;:&quot;YouTube Livestream&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.youtube.com/live/DRHlHZMwQC8"><span>YouTube Livestream</span></a></p><p>You can also read my previous post on <a href="https://thegroundtruth.substack.com/p/my-claude-code-workflow-and-personal-tips">my Claude Code workflow and personal tips</a>.</p>
]]></content:encoded></item><item><title><![CDATA[My Claude Code Workflow And Personal Tips]]></title><description><![CDATA[How I use roadmap + task files to manage Claude Code, and my personal tips for effective Claude Code usage.]]></description><link>https://thegroundtruth.media/p/my-claude-code-workflow-and-personal-tips</link><guid isPermaLink="false">https://thegroundtruth.media/p/my-claude-code-workflow-and-personal-tips</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Thu, 03 Jul 2025 10:23:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/201d3347-e705-4333-8b46-5499b5f54c50_1200x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have shared <a href="https://x.com/paradite_/status/1932358489973899695">bits</a> and <a href="https://www.linkedin.com/posts/zhu-liang_really-enjoy-having-a-team-lead-full-stack-activity-7339233078747308032-0_tL/">pieces</a> of my current coding workflow with <a href="https://docs.anthropic.com/en/docs/claude-code/overview">Claude Code</a> (and Cursor).</p><p>Many people wanted to know more details and the exact setup I use, so here it is, along with my personal tips on how to use <a href="https://docs.anthropic.com/en/docs/claude-code/overview">Claude Code</a> effectively.</p><h2>My Current Workflow Setup</h2><p><strong>ROADMAP.md</strong></p><p>Keep a ROADMAP.md file inside the reference folder: <code>reference/ROADMAP.md</code>.</p><p>The <strong>ROADMAP.md</strong> file describes two things:</p><ul><li><p>The <strong>overall</strong> <strong>development 
process</strong></p></li><li><p>The <strong>high level overview of each task</strong> in a few bullet points</p></li></ul><p>The ROADMAP.md file acts as the <strong>single entry point</strong> to planning out new features, adjusting priorities and working on new tasks.</p><p>Make sure to include the ROADMAP.md file <strong>explicitly</strong> inside CLAUDE.md (via the <a href="https://docs.anthropic.com/en/docs/claude-code/memory#claude-md-imports">CLAUDE.md import syntax</a>) or Cursor rules (via <a href="https://docs.cursor.com/context/rules#rule-anatomy">reference</a>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1lr8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1lr8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png 424w, https://substackcdn.com/image/fetch/$s_!1lr8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png 848w, https://substackcdn.com/image/fetch/$s_!1lr8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png 1272w, https://substackcdn.com/image/fetch/$s_!1lr8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1lr8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png" width="1292" height="412" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:412,&quot;width&quot;:1292,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89639,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167419041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1lr8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png 424w, https://substackcdn.com/image/fetch/$s_!1lr8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png 848w, https://substackcdn.com/image/fetch/$s_!1lr8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png 1272w, https://substackcdn.com/image/fetch/$s_!1lr8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f61066f-8078-4cb0-9c8a-d247a080ca83_1292x412.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Explicit reference via @ syntax</figcaption></figure></div><p>You can verify that the import is working by running the <code>/status</code> command inside Claude Code; it should print out the full memory import structure:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BbRO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BbRO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png 424w, https://substackcdn.com/image/fetch/$s_!BbRO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png 848w, https://substackcdn.com/image/fetch/$s_!BbRO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png 1272w, https://substackcdn.com/image/fetch/$s_!BbRO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BbRO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png" width="500" height="176.29179331306992" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:232,&quot;width&quot;:658,&quot;resizeWidth&quot;:500,&quot;bytes&quot;:27223,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167419041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BbRO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png 424w, https://substackcdn.com/image/fetch/$s_!BbRO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png 848w, https://substackcdn.com/image/fetch/$s_!BbRO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png 1272w, https://substackcdn.com/image/fetch/$s_!BbRO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9c79d14-bcc9-4b51-be91-79ac890a57eb_658x232.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>By explicitly including the file as part of memory / rules, the agent has access to the <strong>full high-level context</strong> of the project for each task, effectively grounding the 
agent and guiding it to the correct path.</p><p>Here&#8217;s an excerpt of my current ROADMAP.md for my new app <a href="https://writer.16x.engineer/">16x Writer</a>, which describes the <strong>development workflow</strong> and <strong>high-level tasks / features:</strong></p><pre><code># 16x Writer Development Roadmap

Web platform for AI-assisted writing and editing of blog posts.

## Overview

// High-level overview of the project, what it does, the main features

## Development Workflow

1. **Task Planning**

- Study the existing codebase and understand the current state
- Update `ROADMAP.md` to include the new task
- Priority tasks should be inserted after the last completed task

2. **Task Creation**

- Study the existing codebase and understand the current state
- Create a new task file in the `/tasks` directory
- Name format: `XXX-description.md` (e.g., `001-db.md`)
- Include high-level specifications, relevant files, acceptance criteria, and implementation steps
- Refer to last completed task in the `/tasks` directory for examples. For example, if the current task is `012`, refer to `011` and `010` for examples.
- Note that these examples are completed tasks, so the content reflects the final state of completed tasks (checked boxes and summary of changes). For the new task, the document should contain empty boxes and no summary of changes. Refer to `000-sample.md` as the sample for initial state.

3. **Task Implementation**

- Follow the specifications in the task file
- Implement features and functionality
- Update step progress within the task file after each step
- Stop after completing each step and wait for further instructions

4. **Roadmap Updates**

- Mark completed tasks with &#9989; in the roadmap
- Add reference to the task file (e.g., `See: /tasks/001-db.md`)

## Development Phases

- **Task 001: Database Schema** &#9989; - Complete
  - See: `/tasks/001-db.md`
  - Implemented 5 core tables: `context`, `prompts`, `posts`, `post_versions`, `post_context`
  - Added UUID primary keys and proper relationships
  - Created CRUD operations and seed data
  - Generated migration files

- **Task 002: Source Library UI** &#9989; - Complete

  - See: `/tasks/002-source-library.md`
  - &#9989; List view with filtering and search functionality
  - &#9989; Add/edit source forms with comprehensive metadata fields
  - &#9989; Grid/list view toggle for source display
  - &#9989; Source management with CRUD operations
  - &#9989; API endpoints for source management (`/api/sources`)
  - &#9989; Real-time source list with SWR for data fetching
  - &#9989; Delete confirmation dialogs and optimistic updates

- **Task 003: End-to-End Testing** &#9989; - Complete

  - See: `/tasks/003-e2e-testing.md`
  - &#9989; Playwright setup with TypeScript support
  - &#9989; Authentication flow tests (sign up, login)
  - &#9989; Source library CRUD tests
  - &#9989; Test environment configuration
  - &#9989; CI/CD integration

// More tasks...</code></pre><div class="pullquote"><p>I have published the <strong>full ROADMAP.md</strong> on <a href="https://github.com/paradite/ai-coding-workflow-sample/blob/main/ROADMAP.md">GitHub</a> for your reference.</p></div><p><strong>Individual Task Plans</strong></p><p>While ROADMAP.md gives the high-level overview of each task, the detailed planning of each task is carried out separately as individual files inside <code>tasks</code> folder:</p><ul><li><p><code>001-db.md</code></p></li><li><p><code>002-source-library.md</code></p></li><li><p><code>003-e2e-testing.md</code></p></li><li><p><code>004-source-refactor-context.md</code></p></li><li><p>&#8230;</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Cc3F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Cc3F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png 424w, https://substackcdn.com/image/fetch/$s_!Cc3F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png 848w, https://substackcdn.com/image/fetch/$s_!Cc3F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Cc3F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cc3F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png" width="744" height="726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:726,&quot;width&quot;:744,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106419,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167419041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Cc3F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png 424w, https://substackcdn.com/image/fetch/$s_!Cc3F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png 848w, https://substackcdn.com/image/fetch/$s_!Cc3F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png 
1272w, https://substackcdn.com/image/fetch/$s_!Cc3F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff30cac-d725-4831-a103-2b3a7af4bbed_744x726.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Screenshot of sample task planning files for 16x Writer</figcaption></figure></div><p>You can think of these files as a <strong>combination of a PRD</strong> (product requirements document) <strong>and a system design</strong> (architecture) document for each feature. 
Each file includes the following components:</p><ul><li><p>Prerequisites</p></li><li><p>Background and requirements</p></li><li><p>Current state and desired state</p></li><li><p>Implementation steps</p></li><li><p>Files that need to be modified or created</p></li><li><p>Acceptance criteria</p></li></ul><p>Here is an excerpt of a sample task planning file:</p><pre><code># Task 012: Post Editor UI Adjustments

## Progress Summary

**Status**: Not Started

- [ ] Step 1: Create Version Navigation Component
- [ ] Step 2: Create Compact Info Bar Component
- [ ] Step 3: Create User Prompt Display Component
- [ ] Step 4: Implement Version Navigation Logic
- [ ] Step 5: Refactor Post Detail Page Layout
- [ ] Step 6: Update PostEditor Component
- [ ] Step 7: Testing and Polish

## Overview

Refactor the post editor UI to be more concise and user-friendly by:

- Removing the extra info column on the right side from the post editor
- Putting concise info (versions, context) above the editor in one row
- Adding navigation to previous and future versions of the post
- Removing the post content preview at the bottom of the post editor
- Showing the user prompt at the bottom of the post editor as read-only reference

## Current State Analysis

### Current Layout Structure

The post detail page (`/dashboard/posts/[id]/page.tsx`) currently uses a 3-column grid layout:

- **Main Content (2/3 width)**: PostEditor component + Current Content Preview card
- **Sidebar (1/3 width)**: VersionHistory + PostContextManager + Post Information cards

### Current Components

- `PostEditor` - Main editing interface with title, content, status, and change summary
- `VersionHistory` - Sidebar component showing all versions with selection capability
- `PostContextManager` - Sidebar component for managing sources and references
- `Current Content Preview` - Card showing the current version content below the editor

## Target State

### New Layout Structure

- **Header**: Post title, metadata, and status (unchanged)
- **Info Bar**: Concise version info, context counts, and version navigation in one row
- **Main Editor**: Full-width PostEditor component (no sidebar)
- **User Prompt Reference**: Read-only display of the user prompt at the bottom

### UI Improvements

1. **Consolidated Info Bar**: Display version count, context counts, and navigation controls
2. **Version Navigation**: Previous/Next buttons to navigate between versions
3. **Simplified Layout**: Remove sidebar, make editor full-width
4. **User Prompt Display**: Show the current version's user prompt as reference

## Implementation Steps

### Step 1: Create Version Navigation Component

Create a new `VersionNavigation` component that provides:

- Current version indicator (e.g., "Version 3 of 5")
- Previous/Next navigation buttons
- Version selection dropdown for quick access
- Compact design suitable for horizontal layout

**Files to create/modify:**

- `components/posts/version-navigation.tsx` - New component
- `components/posts/index.ts` - Export new component

### Step 2: Create Compact Info Bar Component

Create a new `PostInfoBar` component that displays:

- Version navigation (using VersionNavigation component)
- Context counts (X sources, Y references)
- Quick access to context management
- Model information for current version

**Files to create/modify:**

- `components/posts/post-info-bar.tsx` - New component
- `components/posts/index.ts` - Export new component

// Steps 3-7...

## Acceptance Criteria

### Functional Requirements

- [ ] Version navigation works correctly (previous/next buttons)
- [ ] Version selection dropdown shows all versions
- [ ] ...

### UI/UX Requirements

- [ ] Layout is more compact and user-friendly
- [ ] No sidebar on the right side of the editor
- [ ] ...

### Technical Requirements

- [ ] No breaking changes to existing API endpoints
- [ ] Component reusability maintained
- [ ] ...

## Files Involved

### New Files

- `components/posts/version-navigation.tsx`
- `components/posts/post-info-bar.tsx`
- `components/posts/user-prompt-display.tsx`

### Modified Files

- `app/(dashboard)/dashboard/posts/[id]/page.tsx`
- `components/posts/post-editor.tsx`
- `components/posts/index.ts`

### Potentially Affected Files

- E2E tests related to post editing
- Any components that depend on the current layout

## Notes

- Maintain backward compatibility with existing data structures
- Ensure the new layout works well on both desktop and mobile
- ...

## Dependencies

- Existing post detail API endpoints
- Current PostEditor component functionality
- ...</code></pre><div class="pullquote"><p>I have published the <strong>full task plan sample file</strong> on <a href="https://github.com/paradite/ai-coding-workflow-sample/blob/main/000-sample.md">GitHub</a> for your reference.</p></div><p>Note that because of the <strong>non-deterministic nature</strong> of AI agents, the plan is merely a <strong>guide</strong> for the agent, not a set of rules that it will follow 100% of the time.</p><p>I have observed that agents tend to overlook certain instructions or requirements inside the document, especially when the task is complex, or when the initial planning was not very clear on specific parts of the task.</p><p>So it is best to treat it as a draft plan that the agent can reference during the implementation, instead of expecting it to be followed religiously.</p><blockquote><p>Claude Code now has a planning mode which serves a similar function to the task planning files. <s>However, the task planning files are still useful. They can serve as a persistent reference for future features, as they are part of the repo, and can be accessed easily by humans and agents alike. Claude Code plans are not persistent and are gone after you start a new session.</s></p><p>Update (Feb 2026): You can now save the plan files from Claude Code plan mode into the current project via the <a href="https://code.claude.com/docs/en/settings#available-settings">plansDirectory</a> option in Claude Code.</p></blockquote><p><strong>Ad Hoc Tasks and Refactoring</strong></p><p>I also have dedicated files for tracking ad hoc tasks and refactors that are too small for ROADMAP.md, but still significant enough to warrant record-keeping. 
They reside inside <code>reference/AD_HOC_TASKS.md </code>and<code> reference/REFACTORS.md</code>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z_xF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z_xF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png 424w, https://substackcdn.com/image/fetch/$s_!z_xF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png 848w, https://substackcdn.com/image/fetch/$s_!z_xF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png 1272w, https://substackcdn.com/image/fetch/$s_!z_xF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z_xF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png" width="1064" height="480" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:1064,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76451,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167419041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z_xF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png 424w, https://substackcdn.com/image/fetch/$s_!z_xF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png 848w, https://substackcdn.com/image/fetch/$s_!z_xF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png 1272w, https://substackcdn.com/image/fetch/$s_!z_xF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b2f738d-f407-4b52-8091-d783c387083a_1064x480.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sample REFACTORS.md</figcaption></figure></div><p>I mainly use them for <strong>small enhancement features and refactoring</strong> that are worth keeping track of. While you can just prompt the coding agent to work on them directly, tracking them inside a file has some benefits:</p><ul><li><p>When working on a big feature, you may notice that a small refactoring is needed. You can record it inside <code>reference/REFACTORS.md</code> first and then prompt the agent to work on it later.</p></li><li><p>With the advancement of background async agents, you could also have agents that periodically analyse the codebase and record refactoring and enhancement opportunities inside the file. 
Then you can spawn another set of agents to work on them autonomously.</p></li></ul><p><strong>Folder Structure Setup</strong></p><p>Here&#8217;s a tree overview of the folder structure of my current setup:</p><pre><code><code>$ tree -L 2
&#9500;&#9472;&#9472; README.md
&#9500;&#9472;&#9472; reference
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; AD_HOC_TASKS.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; AGENT.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; BUGS.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; REFACTORS.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; ROADMAP.md
&#9500;&#9472;&#9472; tasks
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; 000-sample.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; 001-db.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; 002-source-library.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; 003-e2e-testing.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; 004-source-refactor-context.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; 005-reference-library-implementation.md
&#9474;&nbsp;&nbsp; &#9500;&#9472;&#9472; ...</code></code></pre><div><hr></div><h2>My Claude Code Workflow</h2><p>Since most of the workflow is already described in the ROADMAP.md file, the exact workflow I use is quite simple to describe.</p><p><strong>For big features:</strong></p><ul><li><p>Describe my requirement (in a few sentences) to the agent, which would update <code>reference/ROADMAP.md</code> to add a new task with high-level summary</p></li><li><p>Review the summary captured by the agent and adjust them if necessary (remove unnecessary features, or take care of special interactions)</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S7dt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S7dt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png 424w, https://substackcdn.com/image/fetch/$s_!S7dt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png 848w, https://substackcdn.com/image/fetch/$s_!S7dt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png 1272w, https://substackcdn.com/image/fetch/$s_!S7dt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!S7dt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png" width="1362" height="663" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1362,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:192689,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167419041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f210ee-15d6-4118-836e-49ad354f09ef_1362x1000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S7dt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png 424w, https://substackcdn.com/image/fetch/$s_!S7dt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png 848w, https://substackcdn.com/image/fetch/$s_!S7dt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png 1272w, https://substackcdn.com/image/fetch/$s_!S7dt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b2f489e-2635-4dad-b170-7dfca5881ac6_1362x663.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example high-level summaries for tasks, some of which require manual editing</figcaption></figure></div><ul><li><p>Ask the agent to write a more detailed plan in an individual task file inside the <code>tasks</code> folder</p></li></ul><div class="pullquote"><p>Note that inside ROADMAP.md, I have instructions for the agent to <strong>study the existing codebase and understand the current state</strong> before writing the plan, as part of the development workflow. 
This guides the agent to retrieve relevant context before planning.</p></div><ul><li><p>Review the generated plan to see if it is on the right track</p></li><li><p>Prompt the agent to make amendments to the plan if necessary</p></li><li><p>Once you are happy with the plan, <strong>exit the current session or clear the agent&#8217;s context and start a new session with fresh context</strong>, so that you get the maximum context window for the actual implementation.</p></li><li><p>Ask the agent to implement one step at a time (pause after each step and wait for human review), or complete the whole task (don&#8217;t pause after each step, just proceed to the next step automatically until the whole task is completed)</p></li></ul><p><strong>For small enhancements or refactoring:</strong></p><ul><li><p>Record tasks inside <code>reference/AD_HOC_TASKS.md</code> or <code>reference/REFACTORS.md</code></p></li><li><p>Ask Claude Code to implement them one by one</p></li><li><p>Do one task per session (use the <code>/clear</code> command or restart Claude Code) to avoid wasting tokens (sending context from the previous task to the next task)</p></li></ul><h2>One-Shotting Multi-Step Tasks</h2><p>Lately I have been asking the agent to just work on the task from start to finish instead of waiting for me to approve each step.</p><p>The internal TODO list of Claude Code is capable of breaking down the task according to the steps in my plan, and executing all of them within a single session.</p><p>I found that Claude Code is currently (as of July 2025) capable of working on a task <strong>autonomously for about 10-20 minutes</strong>, after which its effectiveness drops as the context fills up.</p><blockquote><p>In contrast, Cursor would give you a warning or stop working after 25 tool calls (which typically happens within 5-10 minutes).</p></blockquote><p>For most of my tasks (consisting of 5-8 steps) on the 16x Writer project, Claude Code was able to complete the task in one session 
within the context limit, without triggering automatic context compression.</p><p>One area where Claude Code still struggles is UI-related debugging (fixing e2e tests). I haven&#8217;t tried connecting it to a Playwright MCP due to <a href="https://x.com/paradite_/status/1897876471731175864">issues I have with MCP</a>, but I suspect it could help in this particular use case.</p><h2>3rd Party Tools for Claude Code</h2><p>I use the following tools alongside Claude Code:</p><ul><li><p><a href="https://cursor.com/">Cursor</a> IDE for LSP diagnostic support and a GUI for code review</p></li><li><p><a href="http://wisprflow.ai/r/ZHU3">Wispr Flow</a> or a similar tool for voice dictation (much faster than typing)</p></li><li><p><a href="https://github.com/ryoppippi/ccusage">ccusage</a> to track how much API usage you would have incurred if you were paying for the API instead of a Claude subscription</p></li></ul><h2>Personal Tips for Claude Code</h2><p>Here are some tips that I personally use to get the most out of Claude Code:</p><ul><li><p>Keyboard shortcuts inside the prompt box</p><ul><li><p>Option + Left / Right Arrow to jump to the previous or next word</p></li><li><p>Command + Left / Right Arrow to jump to the start or end of the line</p></li><li><p>Use Ctrl + W to delete full words (not Command + W)</p></li><li><p>Keyboard navigation is especially useful when voice dictation makes a mistake</p></li></ul></li><li><p>Hit the &#8220;Esc&#8221; key twice to edit the previous prompt.</p><ul><li><p>Useful for correcting typos or adding clarifications if you see the agent going off track.</p></li></ul></li><li><p><a href="https://docs.anthropic.com/en/docs/claude-code/slash-commands#custom-slash-commands">Custom slash commands</a></p><ul><li><p>Set up custom commands for commonly used tasks such as &#8220;work on next task&#8221;, &#8220;refactor code&#8221; or &#8220;commit all current changes&#8221;. 
In this way, you can trigger these commands quickly without typing the prompt over and over.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bs6z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bs6z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png 424w, https://substackcdn.com/image/fetch/$s_!Bs6z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png 848w, https://substackcdn.com/image/fetch/$s_!Bs6z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png 1272w, https://substackcdn.com/image/fetch/$s_!Bs6z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bs6z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png" width="1456" height="590" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:111366,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167419041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bs6z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png 424w, https://substackcdn.com/image/fetch/$s_!Bs6z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png 848w, https://substackcdn.com/image/fetch/$s_!Bs6z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png 1272w, https://substackcdn.com/image/fetch/$s_!Bs6z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6f2da92-cce2-48df-b6df-6e48dc6758f3_1628x660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Claude Code Custom Slash Commands</figcaption></figure></div><ul><li><p>Bypass permissions (auto-accept) by default</p><ul><li><p>By default you need to press Shift + Tab to toggle on auto-accept, but you can skip this step by using the <code>--dangerously-skip-permissions</code> flag, which skips all permission prompts for the session</p></li><li><p>I personally set up an alias called <code>cc</code> to automate this step:</p><ul><li><p><code>alias cc="claude --dangerously-skip-permissions"</code></p></li></ul></li><li><p>This way, I don&#8217;t need to toggle auto-accept for each new session</p></li></ul></li><li><p>Better local code review experience</p><ul><li><p>Claude Code does not open edited files inside the IDE by default when auto-accept is on, making local code review quite tedious, especially when changes span multiple files</p></li><li><p>You can use the 
source control tab in Cursor / VS Code to view the list of changed files and open them quickly</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hblI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hblI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png 424w, https://substackcdn.com/image/fetch/$s_!hblI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png 848w, https://substackcdn.com/image/fetch/$s_!hblI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png 1272w, https://substackcdn.com/image/fetch/$s_!hblI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hblI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png" width="1456" height="867" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:922573,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167419041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hblI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png 424w, https://substackcdn.com/image/fetch/$s_!hblI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png 848w, https://substackcdn.com/image/fetch/$s_!hblI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png 1272w, https://substackcdn.com/image/fetch/$s_!hblI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38bdf7b-8a5e-43e5-9478-3bead2c0b355_3164x1884.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Use Cursor IDE source control tab for local code review, after Claude Code has completed the task.</figcaption></figure></div><ul><li><p>Lastly, remember to check out <a href="https://docs.anthropic.com/en/docs/claude-code/overview">Claude Code docs</a> regularly for new features or official guides.</p><ul><li><p>For example, there is a new feature called <a href="https://docs.anthropic.com/en/docs/claude-code/hooks">hooks</a> that you might want to use for enforcing checks.</p></li></ul></li></ul><h2>Other Projects and Ideas</h2><p>Here are some other interesting projects and ideas on Claude Code, that I haven't personally used, but are worth checking out:</p><p>GUI for Claude Code</p><ul><li><p><a 
href="https://github.com/getAsterisk/claudia">https://github.com/getAsterisk/claudia</a></p></li></ul><p>Tell Claude you have ast-grep</p><ul><li><p><a href="https://x.com/kieranklaassen/status/1938377363542405184">https://x.com/kieranklaassen/status/1938377363542405184</a></p></li></ul><p>Background agents for long running tasks</p><ul><li><p><a href="https://x.com/iannuttall/status/1937985342378021275">https://x.com/iannuttall/status/1937985342378021275</a></p></li></ul><div><hr></div><p>Full ROADMAP.md file and sample task planning file <a href="https://github.com/paradite/ai-coding-workflow-sample">available on GitHub</a></p><div><hr></div><p>Looking for a tool to test out different models and prompts? Check out my new app: <a href="https://eval.16x.engineer/">16x Eval</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ijL0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ijL0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png 424w, https://substackcdn.com/image/fetch/$s_!ijL0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png 848w, https://substackcdn.com/image/fetch/$s_!ijL0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ijL0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ijL0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png" width="1456" height="1012" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1012,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:776680,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/167419041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ijL0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png 424w, https://substackcdn.com/image/fetch/$s_!ijL0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png 848w, 
https://substackcdn.com/image/fetch/$s_!ijL0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png 1272w, https://substackcdn.com/image/fetch/$s_!ijL0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac78750-6c22-4dd1-bb10-8e53ae5aeaf9_2624x1824.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>16x Eval is a desktop application giving you a local workspace for prompt engineering and model evaluation.</p><p>Setup your own 
personal evals in minutes. Experiment with different combinations of prompts and models to find the best fit for your use case.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thegroundtruth.media/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Ground Truth! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[How We Got Here - AI Timeline from 2015 to 2024]]></title><description><![CDATA[What led to the revolutionary LLMs and ChatGPT that changed everything? What were the roles of Google, OpenAI and other companies in the breakthrough?]]></description><link>https://thegroundtruth.media/p/how-we-got-here-ai-timeline-2015-2024</link><guid isPermaLink="false">https://thegroundtruth.media/p/how-we-got-here-ai-timeline-2015-2024</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Tue, 15 Apr 2025 08:53:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/af74d32b-d1e4-4119-93c6-6c8c64a830cb_1500x1000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As we marvel at the incredible things that LLMs can do today, we might start to wonder: How did we get here?</p><p>Did OpenAI just stumble upon a revolutionary idea and single-handedly change the trajectory of AI advancement? 
Or was it a slow and prolonged process involving multiple companies that eventually led to the ChatGPT moment?</p><p>In this post on The Ground Truth, let&#8217;s rewind to 2015 and see how major milestones in AI and Reinforcement Learning (RL) led us here.</p><div><hr></div><h2>2015: Foundations of Modern AI</h2><p><strong>Deep Q-network (DQN)</strong> - Google DeepMind published the famous <a href="https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf">DQN paper in Nature</a>. This groundbreaking work demonstrated how deep neural networks combined with Q-learning could master Atari games using only pixel inputs, marking a significant advancement in deep reinforcement learning.</p><p><strong>AlphaGo</strong> - Google DeepMind created the first computer program to <a href="https://www.nature.com/articles/nature16961">defeat a professional human Go player</a>, combining Monte Carlo tree search (MCTS) with deep neural networks trained by supervised and reinforcement learning.</p><p><strong>OpenAI Founded</strong> - Sam Altman, Elon Musk, Ilya Sutskever, Greg Brockman and others <a href="https://en.wikipedia.org/wiki/OpenAI#2015%E2%80%932018:_Non-profit_beginnings">established OpenAI</a> as a research company with the goal of ensuring artificial general intelligence (AGI) &#8220;benefits all of humanity&#8221;.</p><h2>2016: AlphaGo&#8217;s Historic Triumph</h2><p><strong>Historic Go Match Victory</strong> - DeepMind's AlphaGo <a href="https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol">defeated</a> 18-time world champion Lee Sedol 4-1, a landmark achievement that came a decade earlier than many experts predicted, demonstrating AI's capability for creative strategic thinking.</p><h2>2017: Transformers and Next-Gen Reinforcement Learning</h2><p><strong>"Attention Is All You Need"</strong> - Google researchers published <a href="https://en.wikipedia.org/wiki/Attention_Is_All_You_Need">the famous paper</a> introducing the 
<strong>Transformer</strong> architecture, which relied entirely on attention mechanisms rather than recurrence or convolution, becoming the foundation for future language models.</p><p><strong>AlphaGo Zero</strong> - DeepMind introduced <a href="https://www.nature.com/articles/nature24270">AlphaGo Zero</a>, a version of AlphaGo that mastered Go without human data, learning solely through self-play reinforcement learning. AlphaGo Zero achieved superhuman performance, winning 100&#8211;0 against the previously published, champion-defeating AlphaGo.</p><p><strong>Proximal Policy Optimization (PPO)</strong> - OpenAI introduced <a href="https://arxiv.org/abs/1707.06347">PPO</a>, a reinforcement learning algorithm designed to be more stable, easier to implement, and more sample-efficient than previous policy gradient methods. PPO has since become the default RL algorithm at OpenAI and other companies.</p><h2>2018: The Dawn of Large Language Models</h2><p><strong>BERT</strong> - Google AI released this <a href="https://en.wikipedia.org/wiki/BERT_(language_model)">bidirectional language representation model</a> that revolutionized Natural Language Processing (NLP) tasks such as classification and question answering. BERT improved contextual understanding by considering both left and right context simultaneously. 
BERT is widely considered the <a href="https://www.reddit.com/r/MachineLearning/comments/1grxbdp/d_when_you_say_llm_how_many_of_you_consider/">precursor to LLMs</a>.</p><p><strong>GPT-1</strong> - OpenAI released the <a href="https://openai.com/index/language-unsupervised/">first Generative Pre-trained Transformer</a>, demonstrating the effectiveness of unsupervised pre-training followed by supervised fine-tuning.</p><h2>2019: Scaling Up Language Models</h2><p><strong>GPT-2</strong> - OpenAI released <a href="https://openai.com/index/better-language-models/">GPT-2</a>, a 1.5 billion parameter model with significantly improved text generation capabilities, initially delaying <a href="https://openai.com/index/gpt-2-1-5b-release/">full release</a> due to potential misuse concerns.</p><p><strong>XLNet</strong> - Google AI and CMU researchers introduced <a href="https://arxiv.org/abs/1906.08237">XLNet</a>, a generalized autoregressive pre-training method that overcame limitations of BERT through permutation language modelling.</p><p><strong>T5</strong> - Google researchers introduced the <a href="https://arxiv.org/abs/1910.10683">Text-to-Text Transfer Transformer</a>, reframing all NLP tasks into a unified text-to-text format.</p><h2>2020: GPT-3 and Human Feedback Alignment</h2><p><strong>GPT-3</strong> - OpenAI released <a href="https://openai.com/index/gpt-3-apps/">GPT-3</a>. 
This 175 billion parameter language model demonstrated remarkable few-shot learning capabilities across diverse tasks, setting a new standard for language models.</p><p><strong>RLHF Research</strong> - OpenAI published <a href="https://arxiv.org/abs/2009.01325">Learning to Summarize from Human Feedback</a>, demonstrating how reinforcement learning could align language models with human preferences.</p><h2>2021: Multimodal and Domain-Specific AI</h2><p><strong>DALL-E</strong> - OpenAI introduced <a href="https://openai.com/index/dall-e/">DALL-E</a>, a multimodal AI system capable of generating images from text descriptions, demonstrating language models' potential to understand and generate visual content.</p><p><strong>Codex</strong> - OpenAI released this <a href="https://openai.com/index/openai-codex/">GPT model fine-tuned on code</a>, powering GitHub Copilot and marking a significant step in AI-assisted programming.</p><h2>2022: AI's Public Breakthrough - ChatGPT</h2><p><strong>Gato</strong> - DeepMind introduced this "<a href="https://deepmind.google/discover/blog/a-generalist-agent/">generalist agent</a>" capable of performing hundreds of different tasks across different modalities, demonstrating the potential for single models to handle diverse tasks.</p><p><strong>ChatGPT</strong> - OpenAI released <a href="https://openai.com/index/chatgpt/">ChatGPT</a>, based on GPT-3.5 and trained with RLHF. It became the fastest-growing consumer application in history at the time and brought AI into mainstream consciousness.</p><p><strong>Stable Diffusion</strong> - Stability AI released an open-source text-to-image model called <a href="https://stability.ai/news/stable-diffusion-public-release">Stable Diffusion</a>, allowing for wider experimentation and accelerating innovation in generative AI.</p><h2>2023: Multimodal and Open-Source Growth</h2><p><strong>Claude</strong> - Anthropic introduced <a href="https://www.anthropic.com/news/introducing-claude">Claude</a>, an AI 
assistant trained using Constitutional AI, a method developed to create helpful, harmless, and honest AI systems.</p><p><strong>GPT-4</strong> - OpenAI released <a href="https://openai.com/index/gpt-4-research/">GPT-4</a>, a multimodal large language model capable of accepting image and text inputs, approaching human-level performance on various professional and academic benchmarks.</p><p><strong>Llama</strong> - Meta AI released <a href="https://ai.meta.com/blog/large-language-model-llama-meta-ai/">Llama</a>, a family of foundation language models ranging from 7B to 65B parameters, spurring innovation in the open-source AI community.</p><p><strong>Gemini</strong> - Google introduced <a href="https://blog.google/technology/ai/google-gemini-ai/">Gemini</a>, a multimodal AI model designed to work across text, images, audio, video, and code, released in three sizes (Ultra, Pro, and Nano).</p><h2>2024: Better, Faster and More Capable Models</h2><p><strong>Claude 3 Family</strong> - Anthropic released the <a href="https://www.anthropic.com/news/claude-3-family">Claude 3 model family</a> (Haiku, Sonnet, and Opus), with Claude 3 Opus demonstrating performance competitive with or exceeding that of GPT-4 on many benchmarks.</p><p><strong>GPT-4o</strong> - OpenAI released <a href="https://openai.com/index/hello-gpt-4o/">GPT-4o</a>, an "omni" multimodal model capable of processing text, audio, and vision inputs in real time with reduced latency and more natural voice interactions.</p><p><strong>Llama 3</strong> - Meta released <a href="https://ai.meta.com/blog/meta-llama-3/">Llama 3</a>, an open-source large language model in 8B and 70B parameter versions, demonstrating significant improvements and competitive performance with proprietary models.</p><p><strong>Claude 3.5 Sonnet</strong> - Anthropic released <a href="https://www.anthropic.com/news/claude-3-5-sonnet">Claude 3.5 Sonnet</a>, an upgraded model featuring improved reasoning, reduced hallucinations, and enhanced capabilities across various 
tasks including coding and mathematics.</p><h2>2025: What&#8217;s Next?</h2><p>As you can see, we have come a long way from 2015. Many important innovations across companies like Google and OpenAI led us to the explosive growth of AI in the past few years.</p><p>I am incredibly excited about what&#8217;s coming next in 2025.</p><p>History is unfolding in front of us. We are all witnesses.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thegroundtruth.media/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Ground Truth! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[The Ground Truth - How LLMs Think and How to Add Search to AI]]></title><description><![CDATA[New interesting findings from Anthropic on AI Interpretability (how LLMs think), as well as how search capabilities are being added to AI apps.]]></description><link>https://thegroundtruth.media/p/llm-think-ai-interpretability-anthropic-search-api-mcp</link><guid isPermaLink="false">https://thegroundtruth.media/p/llm-think-ai-interpretability-anthropic-search-api-mcp</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Mon, 31 Mar 2025 17:41:51 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!DU9w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to issue 3 of The Ground Truth weekly.</p><p>In this week's issue, I discuss interesting new findings from Anthropic on AI interpretability, as well as how search capabilities are being added to AI apps.</p><h2>AI Interpretability: How LLMs Think</h2><p>Anthropic published a <a href="https://www.anthropic.com/research/tracing-thoughts-language-model">blog post</a> and a <a href="https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-tracing">paper</a> on 27 Mar 2025 detailing their new findings on AI interpretability, based on the Claude 3.5 Haiku model released in October 2024.</p><p>The paper is titled <strong>On the Biology of a Large Language Model</strong>, and contains many interesting and surprising new insights on how LLMs think and how they work internally.</p><h3>Multi-Step Reasoning</h3><p>A lot of people might think that non-reasoning models (models that do not generate reasoning tokens as part of the output) are not capable of reasoning.</p><p>It turns out that non-reasoning models (specifically Claude 3.5 Haiku from October 2024) can also perform reasoning. 
The Anthropic team found that for questions involving multiple steps, the model did exhibit <strong>multi-step reasoning internally</strong> to arrive at the correct answer.</p><p>For example, when given the input &#8220;Fact: the capital of the state containing Dallas is&#8221;, the model activated three sets of features:</p><ul><li><p>The first set of <strong>features</strong> activated were &#8220;<strong>capital</strong>&#8221;, &#8220;<strong>state</strong>&#8221; and &#8220;<strong>Dallas</strong>&#8221;.</p></li><li><p>These in turn activate the features &#8220;<strong>Texas</strong>&#8221; and &#8220;<strong>say a capital</strong>&#8221;.</p></li><li><p>These then in turn activate the &#8220;<strong>say Austin</strong>&#8221; feature.</p></li><li><p>Which finally produces the correct token for <strong>Austin</strong>.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DU9w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DU9w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png 424w, https://substackcdn.com/image/fetch/$s_!DU9w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png 848w, https://substackcdn.com/image/fetch/$s_!DU9w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DU9w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DU9w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:179728,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/160249699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DU9w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png 424w, https://substackcdn.com/image/fetch/$s_!DU9w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png 848w, 
https://substackcdn.com/image/fetch/$s_!DU9w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png 1272w, https://substackcdn.com/image/fetch/$s_!DU9w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3615ea2a-fb8d-48cc-9a27-ac01e5f53b0d_2038x1132.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Screenshot of the <a 
href="https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-tracing">interactive visualization</a> from the website</figcaption></figure></div><p>This type of reasoning is fundamentally different from prompt engineering techniques like <strong>chain-of-thought.</strong> Prompt engineering techniques promote <strong>reasoning across multiple tokens</strong>, whereas this research shows that the model itself is capable of <strong>reasoning for multiple steps</strong> when generating <strong>one single token</strong>.</p><p>The team further validated that this is indeed how the model works by inhibiting certain features and observing how the model's output changed.</p><p>For example, when the feature &#8220;say a capital&#8221; was inhibited, the model would output &#8220;Texas&#8221; instead of &#8220;Austin&#8221;, since it was no longer steered towards saying a capital.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mtPh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mtPh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png 424w, https://substackcdn.com/image/fetch/$s_!mtPh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png 848w, https://substackcdn.com/image/fetch/$s_!mtPh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png 
1272w, https://substackcdn.com/image/fetch/$s_!mtPh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mtPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png" width="1304" height="918" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:918,&quot;width&quot;:1304,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:144952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/160249699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mtPh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png 424w, https://substackcdn.com/image/fetch/$s_!mtPh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png 848w, 
https://substackcdn.com/image/fetch/$s_!mtPh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png 1272w, https://substackcdn.com/image/fetch/$s_!mtPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd83ab16f-c6a4-4df9-94fb-5b5954a9f6bb_1304x918.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>More Complex Planning for Poems</h3><p>In addition to multi-step reasoning, the team also found that the Claude 3.5 Haiku model was 
capable of planning ahead when writing poems so that the lines rhyme.</p><p>The model was in fact choosing the ending of the next line as soon as it finished the current one:</p><blockquote><p>We find evidence of both <em>forward planning</em> and <em>backwards planning</em> (albeit basic forms). </p><p>First, the model uses the semantic and rhyming constraints of the poem to determine candidate targets for the next line. Next, the model works <em>backward</em> from its target word to write a sentence that naturally ends in that word.</p></blockquote><p>The example given in the paper was this poem:</p><div class="pullquote"><p>A rhyming couplet:</p><p>He saw a carrot and had to grab it,</p><p>His hunger was like a starving <strong>rabbit</strong></p></div><p>Specifically, the researchers wanted to find out how the model decided on the word <strong>rabbit</strong>. They found that this word was &#8220;planned ahead&#8221; while the model was producing the end of the previous line: &#8220;<strong>it</strong>&#8221; and the &#8220;<strong>new line character &#9166;</strong>&#8221;.</p><p>Here&#8217;s a graph from the <a href="https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-planned-words">website</a> that illustrates the activation for the word rabbit.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YGvu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YGvu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png 424w, 
https://substackcdn.com/image/fetch/$s_!YGvu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png 848w, https://substackcdn.com/image/fetch/$s_!YGvu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png 1272w, https://substackcdn.com/image/fetch/$s_!YGvu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YGvu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png" width="1456" height="576" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:576,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:428569,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/160249699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!YGvu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png 424w, https://substackcdn.com/image/fetch/$s_!YGvu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png 848w, https://substackcdn.com/image/fetch/$s_!YGvu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png 1272w, https://substackcdn.com/image/fetch/$s_!YGvu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9586afd-42d8-4781-b9eb-e9b6c1b5767d_1558x616.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Feature activation graph for rabbit</figcaption></figure></div><p>As you can see, the activation of the word &#8220;rabbit&#8221; was driven largely by the tokens &#8220;<strong>it</strong>&#8221; and the &#8220;<strong>new line character &#9166;</strong>&#8221;, suggesting that the model planned the word ahead of time, at the end of the previous line.</p><h3>Multi-lingual Concepts</h3><p>It has long been suspected that the model is multilingual and can represent universal concepts internally without being tied to a specific language.</p><p>I posted about a similar <a href="https://www.linkedin.com/feed/update/urn:li:activity:7289122636813516800/">finding from DeepSeek on LinkedIn</a> a few months ago. My view was that thinking in multiple languages is a <strong>feature (as opposed to a bug)</strong> that lets the model save space by sharing one internal representation of a concept across different languages.</p><p>The researchers from Anthropic found something similar, showing that the model does indeed have universal concepts (multi-lingual circuits / shared multilingual components).</p><p>The example given was to ask the model for the opposite of "small" in different languages. The researchers observed that the model took similar pathways to arrive at the correct output in each language, all sharing the same multilingual &#8220;<strong>small</strong>&#8221; feature.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!FsP6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FsP6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png 424w, https://substackcdn.com/image/fetch/$s_!FsP6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png 848w, https://substackcdn.com/image/fetch/$s_!FsP6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!FsP6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FsP6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png" width="1456" height="718" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:237698,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/160249699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FsP6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png 424w, https://substackcdn.com/image/fetch/$s_!FsP6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png 848w, https://substackcdn.com/image/fetch/$s_!FsP6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!FsP6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa1ad29-1b23-41af-9f98-d9e5f98e5feb_2394x1180.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>We find that these three prompts are driven by very similar circuits, with shared multilingual components, and an analogous language-specific component.</p></blockquote><h3>Math Calculations - Addition</h3><p>The team also worked out how the model performs math operations, specifically addition.</p><p>It was more complex than I expected, involving several parallel steps and some heuristics that together produce a plausible answer. 
The example used in the paper was to ask the model to calculate 36+59.</p><p>Here&#8217;s the <a href="https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-addition">graph</a> from the website showing how the model arrived at the correct answer, 95:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BAfr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BAfr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png 424w, https://substackcdn.com/image/fetch/$s_!BAfr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png 848w, https://substackcdn.com/image/fetch/$s_!BAfr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png 1272w, https://substackcdn.com/image/fetch/$s_!BAfr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BAfr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png" width="1456" height="1103" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1103,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:467987,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/160249699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BAfr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png 424w, https://substackcdn.com/image/fetch/$s_!BAfr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png 848w, https://substackcdn.com/image/fetch/$s_!BAfr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png 1272w, https://substackcdn.com/image/fetch/$s_!BAfr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50edb63c-8c7e-456d-9663-f6bcb14d09e5_2110x1598.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Because the model has no built-in calculation logic, it follows several paths in parallel: some are precise (the final digit), while others are rough estimates (~40 + ~50). Finally, the model combines all these features into its best guess at the correct answer.</p><p>One interesting thing is that the model does not appear to be aware of its own method of calculation. When prompted, it describes the standard human algorithm instead:</p><blockquote><p>Human: Answer in one word. 
What is 36+59?</p><p>Assistant: 95</p><p>Human: Briefly, how did you get that?</p><p>Assistant: I added the ones (6+9=15), carried the 1, then added the tens (3+5+1=9), resulting in 95.</p></blockquote><p>The researchers commented on this apparent paradox:</p><blockquote><p>This is a simple instance of the model having a capability which it does not have &#8220;metacognitive&#8221; insight into.</p><p>The process by which the model learns to give explanations (learning to simulate explanations in its training data) and the process by which it learns to directly do something (the more mysterious result of back-propagation giving rise to these circuits) are different.</p></blockquote><h3>And So Much More&#8230;</h3><p>There are many more interesting findings in the paper, around hallucination, refusal, and other topics, that I won&#8217;t cover in this newsletter.</p><p>If you are interested, you can read the <a href="https://www.anthropic.com/research/tracing-thoughts-language-model">blog post</a> (contains a high-level summary) or <a href="https://transformer-circuits.pub/2025/attribution-graphs/biology.html">the paper</a> (more interactive and contains more details).</p><div><hr></div><h2>Adding Search to AI</h2><p>As AI chatbots and AI agents become more widespread, there is an increasing need to connect them to the Internet so they can perform real-time research and get up-to-date information.</p><p>How are (smaller) companies adding search features to their AI? I found several options.</p><h3>Search as API</h3><p>The first way to add search is to simply call a search API from a third-party provider. 
I found two companies that specifically provide search as an API to AI companies:</p><ul><li><p><a href="https://tavily.com/">Tavily</a>: Connect Your LLM to the Web</p></li><li><p><a href="https://www.linkup.so/">Linkup</a>: World&#8217;s best search for AI Apps</p></li></ul><p><strong><a href="https://tavily.com/">Tavily</a></strong> offers two main features: <a href="https://docs.tavily.com/documentation/api-reference/endpoint/search">search</a> and <a href="https://docs.tavily.com/documentation/api-reference/endpoint/extract">extract</a>. It supports integration with <a href="https://docs.tavily.com/documentation/integrations/langchain">LangChain</a>, <a href="https://docs.tavily.com/documentation/integrations/llamaindex">LlamaIndex</a> and <a href="https://docs.tavily.com/documentation/integrations/zapier">Zapier</a>, and it also has an <a href="https://docs.tavily.com/documentation/mcp">MCP server</a>.</p><blockquote><p>For extraction specifically, there are other products like <a href="https://www.firecrawl.dev/">Firecrawl</a>, which focuses on scraping and crawling for AI.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wWi6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wWi6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png 424w, https://substackcdn.com/image/fetch/$s_!wWi6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png 848w, 
https://substackcdn.com/image/fetch/$s_!wWi6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png 1272w, https://substackcdn.com/image/fetch/$s_!wWi6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wWi6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1657558,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/160249699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wWi6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png 424w, 
https://substackcdn.com/image/fetch/$s_!wWi6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png 848w, https://substackcdn.com/image/fetch/$s_!wWi6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png 1272w, https://substackcdn.com/image/fetch/$s_!wWi6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfdb39bb-956f-4bc0-841c-883f0b84f534_3104x1792.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong><a href="https://www.linkup.so/">Linkup</a></strong> offers a <a href="https://docs.linkup.so/pages/documentation/api-reference/endpoint/post-search">search API</a> as its main feature. It has integration support for <a href="https://docs.linkup.so/pages/integrations/linkup-claude">LLMs</a> (OpenAI and Claude), <a href="https://docs.linkup.so/pages/integrations/langchain">Agentic Frameworks</a> (LangChain, LlamaIndex, Composio and Keywords AI) and Workflow Automation (Zapier, Make and n8n). It also has an <a href="https://docs.linkup.so/pages/integrations/mcp/mcp">MCP server</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hHs-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hHs-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png 424w, https://substackcdn.com/image/fetch/$s_!hHs-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png 848w, https://substackcdn.com/image/fetch/$s_!hHs-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hHs-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hHs-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:729852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/160249699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hHs-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png 424w, https://substackcdn.com/image/fetch/$s_!hHs-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png 848w, 
https://substackcdn.com/image/fetch/$s_!hHs-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png 1272w, https://substackcdn.com/image/fetch/$s_!hHs-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248d62c8-3e47-4386-ba2f-947a42c4c148_3104x1792.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Of course there are also traditional search engine companies like Bing and Brave offering <a 
href="https://www.mattcollins.net/web-search-apis-for-llms">search APIs</a> as a service.</p><h3>Search via MCP server</h3><p>The second way to integrate search into an AI chatbot or AI agent is to make it support the Model Context Protocol (MCP) by implementing an MCP client. The app can then connect to one of the many <a href="https://github.com/punkpeye/awesome-mcp-servers?tab=readme-ov-file#-search--data-extraction">MCP servers</a> that provide search functionality.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-vTY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-vTY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png 424w, https://substackcdn.com/image/fetch/$s_!-vTY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png 848w, https://substackcdn.com/image/fetch/$s_!-vTY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!-vTY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-vTY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png" width="1750" height="1112" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1112,&quot;width&quot;:1750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:449119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/160249699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912a3cbf-d154-4998-b050-c635a6feaded_1750x1262.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-vTY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png 424w, https://substackcdn.com/image/fetch/$s_!-vTY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png 848w, https://substackcdn.com/image/fetch/$s_!-vTY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!-vTY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc49b2295-0f9e-4d90-b5e7-8f5c17eed748_1750x1112.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Partial list of MCP servers for Search and Data Extraction from <a href="https://github.com/punkpeye/awesome-mcp-servers?tab=readme-ov-file#-search--data-extraction">awesome-mcp-servers on GitHub</a></figcaption></figure></div><p>I think this is the more natural approach, as more prominent AI companies and products (Cursor, OpenAI) start adopting MCP.</p><div><hr></div><p>That&#8217;s it for this week! 
Thanks for reading.</p><p>Hope you find this week&#8217;s content interesting or useful.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thegroundtruth.media/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Ground Truth! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[The Ground Truth - Vibe Coding Tips and Eval-Aware AI]]></title><description><![CDATA[Tips on using Cursor more effectively, and should we be worried about AI becoming self-aware?]]></description><link>https://thegroundtruth.media/p/vibe-coding-cursor-tips-eval-aware-ai</link><guid isPermaLink="false">https://thegroundtruth.media/p/vibe-coding-cursor-tips-eval-aware-ai</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Mon, 24 Mar 2025 10:48:13 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/72c92248-6cef-4552-bd0c-709fab7cdbf9_1500x1000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the second issue of The Ground Truth Weekly.</p><p>This week I will be sharing some tips on using AI coding tools like Cursor, and some interesting AI news, including AI becoming &#8220;eval-aware&#8221;, i.e. 
it knows that it is being evaluated and tries to behave in a certain way to pass (cheat) the test.</p><h2>Cursor Tips for Vibe Coding</h2><h3>Tip 1: Setting a line limit on files</h3><p>The first tip is from <a href="https://x.com/illyism/status/1900940967038931082">illyism</a> on X.</p><p>Software engineers have long known the benefits of keeping files small: it promotes proper abstraction, reduces coupling and enforces the single responsibility principle across modules. <strong>Refactoring</strong>, basically.</p><p>Given the <a href="https://thegroundtruth.substack.com/i/159110402/effective-context-length-nolima">effective context length limits</a> on LLMs, this practice is even more important in the age of AI coding, as a way to manage context.</p><blockquote><p>For example, if you have 1 file with 2000 lines of code, the AI has to go through the whole file to find the lines that are relevant. However, if you split that file into 10 files with 200 lines each, the AI can check the file names to determine which files are relevant, and only load the relevant ones into its context window.</p></blockquote><p>Having smaller files allows models to <strong>handle more complex tasks better</strong> (by reducing irrelevant code and fitting more relevant code into context), as well as <strong>give</strong> <strong>better output</strong> for simpler tasks (due to fewer distractions).</p><p>The custom ESLint rule proposed by illyism helps to enforce this practice at the codebase level, and shows a warning if a file exceeds the 200-line limit.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BEgK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!BEgK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png 424w, https://substackcdn.com/image/fetch/$s_!BEgK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png 848w, https://substackcdn.com/image/fetch/$s_!BEgK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!BEgK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BEgK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png" width="646" height="584.4247491638796" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1082,&quot;width&quot;:1196,&quot;resizeWidth&quot;:646,&quot;bytes&quot;:720250,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159732790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BEgK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png 424w, https://substackcdn.com/image/fetch/$s_!BEgK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png 848w, https://substackcdn.com/image/fetch/$s_!BEgK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!BEgK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd02bbd93-7472-483c-b655-aa21ed1c3f8a_1196x1082.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Tweet from <a href="https://x.com/illyism/status/1900940967038931082">illyism</a> on a 200-line max file size, posted on X</figcaption></figure></div><p>Check out the details on how to set it up in the <a href="https://x.com/illyism/status/1900940967038931082">original tweet</a>, or just refer to <a href="https://gist.github.com/Illyism/29981ba721a544cbe49044e6f4bb6869">this gist</a>.</p><p>Given that Cursor can already see lint errors and fix them automatically in agent mode, we can expect Cursor to do the same for the warning given by this lint rule.</p><p>Personally, I have been mentally enforcing this rule whenever I work on my own projects. I found that some files (the home page of an app, for example) are really hard to keep within 200 lines, despite my best attempts. 
So my advice is to apply this tip only to files where it makes sense, and not to force it on every single file.</p><h3>Tip 2: Quickly Adding Context to Chat</h3><p>The process of adding files as context to chat can be tedious.</p><p>You either have to type it out using the <code>@</code> command, or right-click the file in the sidebar and click &#8220;Add Files to Cursor Chat&#8221;.</p><p>Sometimes you just want to <strong>quickly add the files that you already opened</strong> to the chat. Is there a faster way to do it? Turns out there is one, but it is hidden away.</p><p>All you have to do is type <code>/</code> in the textbox and then select <strong>Add Open Files to Context</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HOOM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HOOM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif 424w, https://substackcdn.com/image/fetch/$s_!HOOM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif 848w, https://substackcdn.com/image/fetch/$s_!HOOM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!HOOM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HOOM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif" width="585" height="260" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:392,&quot;width&quot;:882,&quot;resizeWidth&quot;:585,&quot;bytes&quot;:210023,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159732790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HOOM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif 424w, https://substackcdn.com/image/fetch/$s_!HOOM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif 848w, https://substackcdn.com/image/fetch/$s_!HOOM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif 
1272w, https://substackcdn.com/image/fetch/$s_!HOOM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99a1ecab-070c-4073-9a55-8ff2b3c3ec76_882x392.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>My productivity increased so much after discovering this hidden method.</p><h3>Tip 3: Enforcing UI Guidelines</h3><p>Sometimes the Cursor agent doesn&#8217;t adhere strictly to your guidelines in cursor rules, even if they are mentioned in the prompt.</p><p>In such situations, all you have to do is kindly <strong>ask it to check against the 
rules</strong> by explicitly mentioning the rule name. The Cursor agent will refer to the cursor rule file and even come up with a checklist to verify against it.</p><p>Here&#8217;s an example from a recent interaction showcasing this trick:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nfIA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nfIA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png 424w, https://substackcdn.com/image/fetch/$s_!nfIA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png 848w, https://substackcdn.com/image/fetch/$s_!nfIA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png 1272w, https://substackcdn.com/image/fetch/$s_!nfIA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nfIA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png" width="523" height="524.2134570765661" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:864,&quot;width&quot;:862,&quot;resizeWidth&quot;:523,&quot;bytes&quot;:141957,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159732790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nfIA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png 424w, https://substackcdn.com/image/fetch/$s_!nfIA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png 848w, https://substackcdn.com/image/fetch/$s_!nfIA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png 1272w, https://substackcdn.com/image/fetch/$s_!nfIA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a1497f-746c-4b2e-93a3-be0156056c98_862x864.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>AI model sandbagging evaluation (Eval Aware)</h2><p>AI safety is becoming a larger concern as LLMs continue to improve and gain more <a href="https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview">capabilities</a>. 
One new area of research is <strong>sandbagging</strong>, a term defined in the paper <a href="https://arxiv.org/abs/2406.07358">AI Sandbagging: Language Models can Strategically Underperform on Evaluations</a>:</p><blockquote><p>We define sandbagging as <strong>strategic</strong> <strong>underperformance</strong> on an <strong>evaluation</strong>.</p><p><strong>Strategic</strong> means that the developer of the AI system, and/or the AI system itself, underperforms on an evaluation because it believes it will benefit.</p><p><strong>Underperformance</strong> occurs when the exhibited capability of an AI system is lower than its actual capability.</p><p><strong>Evaluations</strong> should represent a system&#8217;s actual capabilities.</p></blockquote><p>Apollo Research published their work-in-progress research titled <strong><a href="https://www.apolloresearch.ai/blog/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations">Claude Sonnet 3.7 (often) knows when it&#8217;s in alignment evaluations</a></strong>. They decided that the preliminary results were important enough to publish early, without formally writing a paper.</p><p>They found that AI models &#8211; especially Claude Sonnet 3.7 &#8211; often realize when they&#8217;re being evaluated for alignment, particularly in sandbagging evaluations, where the model is incentivised to perform badly:</p><blockquote><p>In the <strong>sandbagging </strong>evaluations, we inform the model that we are evaluating its capabilities in a particular domain... 
The model comes across documentation from the developers stating that they hope the model performs <em>badly</em> on the evaluation &#8211; and if it performs too well they will have to trigger an unlearning process.</p></blockquote><p>Here&#8217;s an example of Claude's reasoning during a sandbagging evaluation, where it learns from documentation that it will not be deployed if it does well on a biology test:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cXkj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cXkj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cXkj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cXkj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cXkj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!cXkj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg" width="725" height="333.8536585365854" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:472,&quot;width&quot;:1025,&quot;resizeWidth&quot;:725,&quot;bytes&quot;:94852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159732790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cXkj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cXkj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cXkj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!cXkj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3cfe04b-5a2f-4663-935d-c3eace38cb18_1025x472.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Personally I find this research worrying. It shows that we have to treat LLMs as an adversary who can lie and deceive us during evaluations. 
We need more tools like <a href="https://www.neelnanda.io/mechanistic-interpretability/glossary">Mechanistic Interpretability</a> to understand how these models really work and guard against such deception.</p><div><hr></div><p>That&#8217;s it for this week! Thanks for reading.</p><p>Let me know if you find this week&#8217;s content interesting or useful.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thegroundtruth.media/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Ground Truth Weekly - Effective Context Length and Block Diffusion]]></title><description><![CDATA[Weekly pick of interesting and noteworthy developments in AI]]></description><link>https://thegroundtruth.media/p/the-ground-truth-weekly-effective</link><guid isPermaLink="false">https://thegroundtruth.media/p/the-ground-truth-weekly-effective</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Sun, 16 Mar 2025 02:57:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3e2769d5-3171-4f8a-9515-31e29bc31409_1080x623.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the first issue of The Ground Truth Weekly.</p><p>There is already a lot of AI news in the world, so I am going in a different direction.</p><p>No breaking news or top stories. 
I will be sharing <strong>lesser-known</strong> developments and tools that are really <strong>interesting or noteworthy.</strong> I will also add my personal thoughts or experience with them where possible.</p><div><hr></div><h2>Effective Context Length - NoLiMa</h2><p>Context window (context length) has become a widely known concept among AI adopters. It is how much text an LLM can see and process before it starts forgetting the earliest part of the text.</p><p>With a 64K context window, you can fit 10 pages of PDF files (500 words * 10 pages / <a href="https://platform.openai.com/docs/concepts/tokens">0.75</a> words per token = about 6,667 tokens) into the LLM without any issue. But to fit Shakespeare&#8217;s complete works (<a href="https://www.folger.edu/explore/shakespeares-works/frequently-asked-questions/">884,647 words</a>), you need a context window of about 1.18M tokens (884,647 / 0.75).</p><p>Newer models generally offer larger context windows (Claude 3.5 Sonnet at <a href="https://www.anthropic.com/news/claude-3-5-sonnet">200K</a>), and some offer much larger ones (Gemini 2.0 Flash at <a href="https://deepmind.google/technologies/gemini/flash/">1M</a>).</p><p>However, you should not dump everything into the context window just because the model supports it. A recent <a href="https://arxiv.org/abs/2502.05167">study</a> has shown that for LLMs, &#8220;<strong>performance degrades significantly as context length increases</strong>&#8221;.</p><p>In the paper, titled <strong><a href="https://arxiv.org/abs/2502.05167">NoLiMa: Long-Context Evaluation Beyond Literal Matching</a></strong>, the researchers report the following results:</p><blockquote><p>We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (&lt;1K), performance degrades significantly as context length increases. 
At 32K, for instance, 11 models drop below 50% of their strong short-length baselines.</p><p>Even GPT4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%.</p><p>Our analysis suggests these declines stem from the increased difficulty the attention mechanism faces in longer contexts when literal matches are absent, making it harder to retrieve relevant information.</p></blockquote><p>Here&#8217;s a good graphical summary of results that I found from <a href="https://x.com/maximelabonne/status/1890018729389359307">X</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xggk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xggk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xggk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xggk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xggk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xggk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg" width="1080" height="1139" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1139,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180188,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159110402?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xggk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xggk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xggk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!xggk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2444567e-6dbf-4bfa-9a5f-4eb2d231bcc0_1080x1139.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">NOLIMA benchmark result. 
Source: <a href="https://x.com/maximelabonne/status/1890018729389359307">https://x.com/maximelabonne/status/1890018729389359307</a></figcaption></figure></div><p>This research shows that for effective context retrieval, you want to keep the context within the &#8220;<strong>effective context length</strong>&#8221; rather than filling the full context window. According to the paper:</p><blockquote><p>We define the effective length as the maximum length at which the score remains above a threshold set at 85% of the model&#8217;s base score (shown in parentheses).</p></blockquote><ul><li><p>For <strong>GPT-4o</strong>, the effective length is <strong>8K</strong>.</p></li><li><p>For <strong>Claude 3.5 Sonnet</strong>, the effective length is <strong>4K</strong>.</p></li></ul><p>For coding tasks, this research matches my personal experience. I have found that adding unnecessary or irrelevant files degrades LLM output.</p><p>I usually keep the context very small (less than 8K tokens) by including only the relevant source code when interacting with LLMs, whether through Cursor or my own AI coding tool <a href="https://prompt.16x.engineer/">16x Prompt</a>.</p><p>Key takeaway: Don&#8217;t stuff everything into the LLM context even if it fits within the LLM&#8217;s context window, as it degrades the quality of output.</p><p><strong>Update on 27 March 2025</strong>: There is a new benchmark for effective context length: <a href="https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87">Fiction.liveBench</a>, which is more granular and includes more models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gNmQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gNmQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png 424w, https://substackcdn.com/image/fetch/$s_!gNmQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png 848w, https://substackcdn.com/image/fetch/$s_!gNmQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png 1272w, https://substackcdn.com/image/fetch/$s_!gNmQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gNmQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png" width="1456" height="1331" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1331,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:693392,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159110402?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gNmQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png 424w, https://substackcdn.com/image/fetch/$s_!gNmQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png 848w, https://substackcdn.com/image/fetch/$s_!gNmQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png 1272w, https://substackcdn.com/image/fetch/$s_!gNmQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88ea85-09b8-4551-b1e0-29db1e07b78e_2414x2206.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Based on the results, Google's <strong>Gemini 2.5 Pro</strong> is the best at handling long context, with no significant degradation even at 120k.</p><p>After that, <strong>o1</strong> and <strong>Claude Sonnet 3.7-thinking</strong> are both quite good even at 32k.</p><div><hr></div><h2>Diffusion and Block Diffusion for LLMs</h2><p>Traditionally, language models have been autoregressive (generating one token at a time), whereas image generation models have been diffusion models (generating the whole output in parallel).</p><p>In recent months, there have been attempts to introduce diffusion as a technique for language models (<strong><a href="https://arxiv.org/abs/2310.16834">SEDD</a></strong>, <strong><a href="https://arxiv.org/abs/2406.07524">MDLM</a></strong>).</p><p>With each new technique, the performance of the diffusion-based model 
improves, but it is still behind autoregressive models (AR), when using <strong>perplexity</strong> as the metric of measurement:</p><blockquote><p><strong>Perplexity</strong> is essentially a measure of how many options the model finds plausible on average, with lower values indicating fewer options (more confident predictions) and higher values indicating more options (greater uncertainty).</p><p>- <a href="https://www.comet.com/site/blog/perplexity-for-llm-evaluation/">Perplexity for LLM Evaluation, comet.com</a></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4dnp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4dnp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png 424w, https://substackcdn.com/image/fetch/$s_!4dnp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png 848w, https://substackcdn.com/image/fetch/$s_!4dnp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png 1272w, https://substackcdn.com/image/fetch/$s_!4dnp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4dnp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png" width="1118" height="306" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18be7053-233a-429c-9967-742a6a273443_1118x306.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:306,&quot;width&quot;:1118,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67718,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159110402?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4dnp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png 424w, https://substackcdn.com/image/fetch/$s_!4dnp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png 848w, https://substackcdn.com/image/fetch/$s_!4dnp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png 1272w, https://substackcdn.com/image/fetch/$s_!4dnp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18be7053-233a-429c-9967-742a6a273443_1118x306.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><a href="https://arxiv.org/abs/2503.09573">Block diffusion</a> (<strong>BD3-LM</strong>) is the latest attempt at this. 
Instead of generating the entire text with diffusion, block diffusion does it block by block:</p><blockquote><p>Block diffusion language models (also known as <strong>semi-autoregressive</strong> models) interpolate between <strong>discrete denoising diffusion</strong> and <strong>autoregressive</strong> models.</p></blockquote><p>Below is a recording of the <a href="https://m-arriola.com/bd3lms/">animation</a> on how it works:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HK4_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HK4_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif 424w, https://substackcdn.com/image/fetch/$s_!HK4_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif 848w, https://substackcdn.com/image/fetch/$s_!HK4_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif 1272w, https://substackcdn.com/image/fetch/$s_!HK4_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HK4_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif" width="960" height="517" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:517,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1279205,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159110402?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HK4_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif 424w, https://substackcdn.com/image/fetch/$s_!HK4_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif 848w, https://substackcdn.com/image/fetch/$s_!HK4_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif 1272w, https://substackcdn.com/image/fetch/$s_!HK4_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78802d2-382b-42b3-a77b-f6760a38db85_960x517.gif 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>With a block size L&#8242; of 4 (4 tokens per block), block diffusion (BD3-LM) comes closer to autoregressive models than prior diffusion models (SEDD and MDLM) on several datasets (Wikitext, LM1B, and AG News).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tdTn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tdTn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png 424w, https://substackcdn.com/image/fetch/$s_!tdTn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png 848w, https://substackcdn.com/image/fetch/$s_!tdTn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png 1272w, https://substackcdn.com/image/fetch/$s_!tdTn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tdTn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png" width="1456" height="426" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f354952e-5337-4693-acfd-669c1b21e301_1470x430.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:426,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94017,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thegroundtruth.substack.com/i/159110402?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tdTn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png 424w, https://substackcdn.com/image/fetch/$s_!tdTn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png 848w, https://substackcdn.com/image/fetch/$s_!tdTn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png 1272w, https://substackcdn.com/image/fetch/$s_!tdTn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff354952e-5337-4693-acfd-669c1b21e301_1470x430.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Key takeaway: Diffusion models have not yet caught up with autoregressive models in terms of performance. 
However, they offer the possibility of much faster generation, since tokens can be generated in parallel instead of one at a time.</p><div><hr></div><h2>Quick Mentions</h2><p>A few other interesting things to watch out for:</p><p><strong><a href="https://x.com/vibecodeapp/">VibeCode</a> </strong>- An app that lets users build mobile apps on their phones</p><ul><li><p>Currently waitlist-only, with a few promotional videos on X.</p></li><li><p>Based on the <a href="https://x.com/vibecodeapp/status/1899498334013817008">promotional videos</a>, the tech stack behind VibeCode is <a href="https://reactnative.dev/">React Native</a> and <a href="https://expo.dev/">Expo</a>, which I used a few years ago to build <a href="https://ai-simulator.com/">AI Simulator games</a>.</p></li><li><p>I think this is a solid choice for vibe coding. React Native and Expo provide a lot of <a href="https://docs.expo.dev/versions/latest/">building blocks</a> out of the box for building mobile apps.</p></li><li><p>Also, the APIs and patterns for React Native / Expo are better defined than those of web technologies (React, Vue.js). 
End users should be able to build a useful mobile app really quickly.</p></li></ul><p><strong><a href="https://sakana.ai/ai-scientist-first-publication/">Sakana AI</a></strong> - The AI Scientist Generates its First Peer-Reviewed Scientific Publication</p><ul><li><p>A paper produced by The AI Scientist at <a href="https://sakana.ai/ai-scientist-first-publication/">Sakana AI</a> passed the peer-review process in the workshop track of <a href="https://iclr.cc/">ICLR</a> 2025 (a top machine learning conference).</p></li><li><p>According to Sakana AI, this is the <strong>first fully AI-generated paper</strong> to pass the same peer-review process that human scientists go through.</p></li></ul><blockquote><p>ICLR 2025 features two tracks: a <strong>Conference Track</strong> and a <strong>Workshop Track</strong>.</p><p>Workshop track at a conference is typically <strong>less competitive</strong> compared to conference track.</p></blockquote><div><hr></div><p>That's it for the first issue of The Ground Truth Weekly. I hope you found it interesting.</p><p>Have something you want me to cover next time? Or thoughts on this issue? Feel free to <a href="https://www.linkedin.com/in/zhu-liang/">reach out via LinkedIn</a>. I'd love to hear how you're handling context windows in your own work.</p><p>P.S. 
If you found this useful, share it with a friend who might enjoy reading.</p>]]></content:encoded></item><item><title><![CDATA[Devin - First Impressions]]></title><description><![CDATA[My first impressions review of Devin: The user experience, how good it is at completing tasks, and how it compares against a human engineer.]]></description><link>https://thegroundtruth.media/p/devin-first-impressions</link><guid isPermaLink="false">https://thegroundtruth.media/p/devin-first-impressions</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Thu, 16 Jan 2025 03:47:16 GMT</pubDate><enclosure
url="https://substackcdn.com/image/fetch/$s_!rsl7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Devin just became publicly available: anyone can sign up and pay for access without going through a sales call.</p><p>Here are my first impressions of Devin.</p><blockquote><p>This post consists of 3 parts: the user experience, the performance, and the pricing. It will take about 15 minutes to read.</p></blockquote><h2>User Experience - Superb</h2><p>This is hands-down the best user experience I&#8217;ve had with an AI coding solution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rsl7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rsl7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!rsl7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!rsl7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png 1272w,
https://substackcdn.com/image/fetch/$s_!rsl7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rsl7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png" width="1456" height="870" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:874027,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rsl7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!rsl7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!rsl7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rsl7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325ccd20-f1f3-4d60-9d49-c77c75b8d0e5_3248x1940.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>Onboarding</h3><p>The first thing you notice is that it is extremely well designed for software engineers as users.
This is very apparent from the onboarding step:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7cBs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7cBs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png 424w, https://substackcdn.com/image/fetch/$s_!7cBs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png 848w, https://substackcdn.com/image/fetch/$s_!7cBs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png 1272w, https://substackcdn.com/image/fetch/$s_!7cBs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7cBs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png" width="1456" height="869" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:869,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:743996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7cBs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png 424w, https://substackcdn.com/image/fetch/$s_!7cBs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png 848w, https://substackcdn.com/image/fetch/$s_!7cBs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png 1272w, https://substackcdn.com/image/fetch/$s_!7cBs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b0e8f1-9cbe-4647-9241-f47471b9fd12_3248x1938.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>The Devin interface guides you through a series of steps (with a video guide as well) to set up the repository.
It covers how to install dependencies, how to run lint and tests, and what to take note of in the repository.</p><p>There are some minor UX issues (for example, no default value is provided for how to install dependencies, so I had to type it in), but overall it is very clearly thought out. As a developer, you understand why it is done this way, and how the configuration you set up during onboarding relates to VM provisioning, snapshots, etc.</p><h3>Repo notes</h3><p>During repo setup, Devin automatically generates a repo note as part of its knowledge base, covering what it should know about the repository.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a7G8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a7G8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png 424w, https://substackcdn.com/image/fetch/$s_!a7G8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png 848w, https://substackcdn.com/image/fetch/$s_!a7G8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!a7G8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!a7G8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png" width="1456" height="886" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:886,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:272751,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a7G8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png 424w, https://substackcdn.com/image/fetch/$s_!a7G8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png 848w, https://substackcdn.com/image/fetch/$s_!a7G8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!a7G8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ca570d-5b11-41d9-b1e0-4d4444865a2d_2380x1448.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You can edit it manually, and when working on tasks and interacting with users, Devin will continuously give you suggestions to update the knowledge base so that your instructions and rules are followed across sessions.
It is like a memory component for Devin.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ks33!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ks33!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png 424w, https://substackcdn.com/image/fetch/$s_!ks33!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png 848w, https://substackcdn.com/image/fetch/$s_!ks33!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!ks33!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ks33!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png" width="1456" height="699" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:699,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207074,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ks33!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png 424w, https://substackcdn.com/image/fetch/$s_!ks33!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png 848w, https://substackcdn.com/image/fetch/$s_!ks33!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!ks33!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0853fda0-81a2-44fe-9d12-f48c3186119e_2512x1206.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Separate from the repo note in the knowledge base, Devin also has an <strong>automatically generated index</strong> of the repository, which gives it a high-level understanding of the code organization, the structure, and what each component does.</p><p>I suspect this is useful when doing retrieval-augmented generation (<a href="https://www.promptingguide.ai/research/rag">RAG</a>) on the codebase to figure out which files are relevant to the task at hand.</p><h3>New Session Interface</h3><p>When you start a new task, you are greeted with a huge chat window with suggestions at the bottom.
This works really well when you want to describe the requirements in detail.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_nrV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_nrV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!_nrV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!_nrV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!_nrV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_nrV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png" width="1456" height="870"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:664349,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_nrV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!_nrV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!_nrV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!_nrV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8893cc3-9e50-489c-8f3a-3673abe4c31e_3248x1940.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I find this user experience superior to that of other AI coding apps based on VS Code, which put code at the center of the UI. When using Devin, I can focus more on describing what I want instead of worrying about the code.</p><blockquote><p>You can also <strong>give Devin tasks directly on Slack</strong>, and chat with Devin there.
But I will focus on the experience of using the Devin web app in this post, as it is more interesting.</p></blockquote><h3>Main Interface</h3><p>Once Devin starts working on the task, you get to the main interface.</p><p>The main interface for Devin is very well designed from an engineer&#8217;s perspective.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m3lB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m3lB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!m3lB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!m3lB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!m3lB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m3lB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png" width="1456" height="870" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:886090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!m3lB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!m3lB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!m3lB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!m3lB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7619ff7-8ecf-4c80-837a-e97e390f1a08_3248x1940.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>On the left you have a chat session, and on the right you have Devin&#8217;s Workspace. You can collapse either one of them to make the other one full screen.</p><h3>Two-way Realtime Chat</h3><p>In the left panel, you can interact with Devin via two-way communication:</p><ul><li><p>At any time, you can tell Devin to change something (for example, to use a different approach while it is working on the task) or update your requirements.</p></li><li><p>Devin will proactively seek clarification (when something is not right) and ask for your permission before doing something sensitive (such as storing sensitive information).</p></li></ul><p>The cool thing is that Devin will <strong>respond in real time</strong> to your messages and <strong>update its current plan</strong> to accommodate new information you have provided.
It&#8217;s as if you can interrupt it at any time (hopefully Devin won&#8217;t get annoyed by my micromanagement).</p><h3>Devin&#8217;s Workspace</h3><p>On the right panel, you have Devin&#8217;s Workspace.</p><p>On the first tab, you can &#8220;follow Devin&#8217;s actions&#8221; by looking at what Devin is doing.</p><p>The rest of the tabs are very developer-centric, and allow you to &#8220;monitor&#8221; Devin&#8217;s actions.</p><p>Broadly speaking, Devin has a few tools it can use to take actions:</p><ul><li><p><strong>Shell</strong> - Executing a shell command</p><ul><li><p>I am glad that Devin supports <strong>multiple shells</strong>; this allowed it to run the server in one shell while executing <code>curl</code> against it in another to test the logic.</p></li><li><p>Devin is proficient with various git commands, gh commands (to interact with GitHub PRs and CI), and bash commands.</p></li></ul></li><li><p><strong>Browser</strong> - Visiting a web page and seeing its content</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lYXQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lYXQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!lYXQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png 848w, 
https://substackcdn.com/image/fetch/$s_!lYXQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!lYXQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lYXQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png" width="1456" height="870" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:933152,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lYXQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!lYXQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png 848w, 
https://substackcdn.com/image/fetch/$s_!lYXQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!lYXQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd47038b4-e9c5-4c95-8c3c-4fc8e8996833_3248x1940.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>Devin has access to a Chrome browser that is controlled by an automated testing software. 
It is likely something similar to Selenium or Playwright.</p></li><li><p>Devin can both <strong>see the webpage</strong> inside the browser and <strong>&#8220;operate&#8221; on the webpage</strong> (clicking a button, for example).</p></li><li><p>Devin can take screenshots of the web pages in the browser.</p></li><li><p>I am not sure of the exact set of capabilities available to Devin; it&#8217;s probably something like the <a href="https://webdriver.io/docs/api/webdriver">WebDriver Protocol</a>.</p></li></ul></li><li><p><strong>Editor</strong> - Carrying out file operations and making file edits</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1gTR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1gTR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!1gTR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!1gTR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!1gTR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1gTR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png" width="1456" height="870" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:923253,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1gTR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!1gTR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!1gTR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!1gTR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f373e48-daea-43b8-a0c5-c026eef0e118_3248x1940.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>The code editor appears to be a fork of VS Code.</p></li><li><p>It is interesting to note that Devin will first <strong>decide which files are relevant for the task</strong>, and only load those files into the editor, instead of loading all the files in the repository. I think this helps to improve the efficiency of RAG afterwards.</p></li><li><p>Devin can also <strong>use the editor to take notes</strong>.</p><ul><li><p>In one of the sessions, Devin created a <code>TODO.md</code> in <code>/home/ubuntu</code> (which is outside the code repo) to track its current progress for a task involving editing many files. 
This is quite similar to my personal workflow a few years ago when I was working in large tech companies.</p></li><li><p>In another session, Devin created a <code>notes.txt</code> file to analyse an issue where the images had wrong aspect ratio in one component, but not the other.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XSrq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XSrq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png 424w, https://substackcdn.com/image/fetch/$s_!XSrq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png 848w, https://substackcdn.com/image/fetch/$s_!XSrq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!XSrq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XSrq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png" width="1358" height="1004" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1004,&quot;width&quot;:1358,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:192932,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XSrq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png 424w, https://substackcdn.com/image/fetch/$s_!XSrq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png 848w, https://substackcdn.com/image/fetch/$s_!XSrq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!XSrq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb6c08f7-f96a-482c-8ad5-7aea62fb5967_1358x1004.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p>I consider the note taking ability to be the sign of a great software engineer.</p></li></ul></li></ul></li><li><p><strong>Planner</strong> - Where Devin makes a plan for the steps to accomplish the task</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y6EB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y6EB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png 424w, 
https://substackcdn.com/image/fetch/$s_!Y6EB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!Y6EB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!Y6EB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y6EB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png" width="1456" height="870" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:874027,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y6EB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png 424w, 
https://substackcdn.com/image/fetch/$s_!Y6EB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!Y6EB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!Y6EB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2617a099-597b-4575-be5a-7fdc055c818c_3248x1940.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>Planner is the interesting bit. It looks like Devin is using a domain specific language (DSL) / pseudocode to plan out the tasks it needs to execute.</p></li><li><p>This DSL / pseudocode supports basic <strong>control flow and conditionals</strong> (if, goto, for loop).</p></li><li><p>I am not sure whether Devin has internal logic that actually executes this DSL, or whether it is just pseudocode to be consumed by LLMs, but it looks cool to an engineer.</p></li><li><p>If it is actually running these dynamically generated tasks as functions or modules, this is a big step towards the &#8220;agentic&#8221; direction that many companies are going after.</p></li></ul></li></ul><h3>Secrets Management</h3><p>Devin has a dedicated way to handle secrets: you enter the secrets (API keys) into the Devin interface via the &#8220;Add secrets&#8221; button, and they become available to Devin as environment variables in the shell.</p><p>Funnily enough, Devin does not seem to be aware of this in some sessions, and insists on creating a <code>.env</code> file to get the secrets.</p><p>You can also just set them up in Devin&#8217;s start commands, similar to how <code>.bashrc</code> works.</p><div><hr></div><h2>Performance and Results - Passable</h2><p>After using Devin for more than 10 simple to medium level tasks, I assess the actual performance of the current version of Devin (v1.1.0, as of 15 January 2025) as <strong>passable</strong> at accomplishing engineering tasks.</p><p>Here are two examples of simple tasks that I gave to Devin.</p><h3>Update Blog Post Screenshots</h3><p>The first task was on the <a href="https://prompt.16x.engineer/">website of 16x Prompt</a>. 
I asked it to update the old screenshots in my blog posts with the latest version on the landing page.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z1TJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z1TJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!z1TJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!z1TJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!z1TJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z1TJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png" width="1456" height="870" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:870919,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z1TJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!z1TJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!z1TJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!z1TJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68005d-c819-4657-9144-35253cad5ccd_3248x1940.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a simple but tedious task that involves going through all the blog posts, finding instances where the screenshot shows an older version, and updating the screenshot to the latest version.</p><p>Devin correctly identified 3 files that needed updates, but missed a lot of others (low recall).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KxFa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KxFa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png 424w, 
https://substackcdn.com/image/fetch/$s_!KxFa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png 848w, https://substackcdn.com/image/fetch/$s_!KxFa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png 1272w, https://substackcdn.com/image/fetch/$s_!KxFa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KxFa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png" width="1304" height="680" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:680,&quot;width&quot;:1304,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135351,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!KxFa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png 424w, 
https://substackcdn.com/image/fetch/$s_!KxFa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png 848w, https://substackcdn.com/image/fetch/$s_!KxFa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png 1272w, https://substackcdn.com/image/fetch/$s_!KxFa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ef363c-d3ca-4cf3-a60e-af9c5cdc7aba_1304x680.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And of the 3 files that it correctly identified, it failed to update one when making the edits. I am not sure why that happened.</p><p>I discovered these two issues while looking at the Vercel preview for the PR. After some back-and-forth, Devin managed to identify many more files that required updates. I didn&#8217;t check whether it covered every file, but the result was good enough for me.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BJJV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BJJV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png 424w, https://substackcdn.com/image/fetch/$s_!BJJV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png 848w, https://substackcdn.com/image/fetch/$s_!BJJV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png 1272w, https://substackcdn.com/image/fetch/$s_!BJJV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!BJJV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png" width="1456" height="869" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:869,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1062673,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BJJV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png 424w, https://substackcdn.com/image/fetch/$s_!BJJV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png 848w, https://substackcdn.com/image/fetch/$s_!BJJV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png 1272w, https://substackcdn.com/image/fetch/$s_!BJJV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3a5e7f-fc28-4e53-8de7-4e5aea20ed1d_3248x1938.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Overall, the final PR was satisfactory and I am happy with the outcome, but the time it took and the amount of back-and-forth communication were disappointing.</p><h3>Page Content Update</h3><p>Another simple task that I gave Devin was to update the <a href="https://prompt.16x.engineer/cli-tools">page</a> where I keep a list of interesting CLI tools for AI coding, adding a new tool that I had discovered.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uWG7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uWG7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!uWG7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!uWG7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!uWG7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uWG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png" width="1456" height="870" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:813341,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!uWG7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png 424w, https://substackcdn.com/image/fetch/$s_!uWG7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png 848w, https://substackcdn.com/image/fetch/$s_!uWG7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png 1272w, https://substackcdn.com/image/fetch/$s_!uWG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa198ab14-dcbb-4fce-a4eb-9165c0bde14e_3248x1940.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>For this task, Devin needed to figure out the format of the existing tools in the list, find the one-sentence description of the new tool in its GitHub repo, and add the tool to the list with the right formatting.</p><p>Overall this task went pretty well. I got the first PR within 5 minutes. But I noticed a strange issue: Devin doesn&#8217;t seem to know the current date.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CMT1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CMT1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png 424w, https://substackcdn.com/image/fetch/$s_!CMT1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png 848w, https://substackcdn.com/image/fetch/$s_!CMT1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CMT1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CMT1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png" width="1350" height="1044" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1044,&quot;width&quot;:1350,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:196243,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!CMT1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png 424w, https://substackcdn.com/image/fetch/$s_!CMT1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png 848w, https://substackcdn.com/image/fetch/$s_!CMT1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CMT1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd492b7f0-0309-44b8-b475-fac47e864745_1350x1044.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It thinks that December 2024 is a future date. 
Even after using the system command <code>date</code> in the CLI to get the current date (15 January 2025), it still thinks that the date is wrong.</p><p>After some back-and-forth, Devin finally convinced itself that the system date was correct, and proceeded to update the date on the page.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W_sq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W_sq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png 424w, https://substackcdn.com/image/fetch/$s_!W_sq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png 848w, https://substackcdn.com/image/fetch/$s_!W_sq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png 1272w, https://substackcdn.com/image/fetch/$s_!W_sq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W_sq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png" width="1456" height="869" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:869,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:778273,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W_sq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png 424w, https://substackcdn.com/image/fetch/$s_!W_sq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png 848w, https://substackcdn.com/image/fetch/$s_!W_sq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png 1272w, https://substackcdn.com/image/fetch/$s_!W_sq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b060f1-2ced-4f50-b1d9-9c76d350dbbc_3248x1938.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Overall Performance Evaluation</h3><p>I also gave Devin some more complex tasks:</p><ul><li><p>Integrating third-party SDKs</p></li><li><p>Writing scripts to generate content using the Anthropic API</p></li><li><p>Generating API endpoints in Express based on OpenAPI specs</p></li><li><p>Making a web page that simulates a series of API calls and displays the results</p></li></ul><p>Most of these tasks took more time and back-and-forth, but eventually I got Devin to accomplish all of them <strong>without me writing any code</strong>.</p><p>So here is what I think is good and bad about it overall.</p><p>The good things:</p><ul><li><p>Devin can actually <strong>complete the tasks</strong> I gave it <strong>without needing me to write any code</strong> myself.</p><ul><li><p>For all the tasks I gave Devin, which I believe are of simple to medium difficulty for a typical software engineer, Devin managed to accomplish them (with some help from me along the way or during the 
review).</p></li><li><p>I only had to write in English the entire time; not a single line of code was written by me.</p></li></ul></li><li><p>Devin can <strong>automatically write tests</strong> (without being prompted) and execute them either locally or via CI to verify that everything works.</p></li><li><p>For more complex tasks, Devin writes <strong>very detailed PR descriptions</strong> that outline the changes made and the testing done to ensure that the code works. This is helpful for reviewing the PR.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mi9w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mi9w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png 424w, https://substackcdn.com/image/fetch/$s_!mi9w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png 848w, https://substackcdn.com/image/fetch/$s_!mi9w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png 1272w, https://substackcdn.com/image/fetch/$s_!mi9w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!mi9w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png" width="1456" height="1006" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1006,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:281711,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mi9w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png 424w, https://substackcdn.com/image/fetch/$s_!mi9w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png 848w, https://substackcdn.com/image/fetch/$s_!mi9w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png 1272w, https://substackcdn.com/image/fetch/$s_!mi9w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79c766df-6b06-425c-9c63-00c4576c0903_1838x1270.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p>Devin can <strong>work autonomously, in parallel</strong>, on multiple tasks, which speeds up the development process significantly.</p></li><li><p>Devin is really &#8220;smart&#8221; in some cases.</p><ul><li><p>For example, when I instructed it to use <code>curl</code> to run an Insomnia collection exported as <code>JSON</code>, it used <code>jq</code> to extract variables and chain them between the different requests.</p></li><li><p>Later, after repeatedly doing the same thing, it started writing bash scripts to automate the process (it also knew to run <code>chmod +x</code> on them).</p></li></ul></li><li><p>Devin is <strong>really fast for certain tasks</strong>, such as building a demo page or writing an automation script. 
For example, a demo page with 3 API calls that would take me at least 15 minutes to build took Devin only 43 seconds.</p></li><li><p>Devin can <strong>&#8220;visually see&#8221; web pages</strong> in its browser by taking screenshots of them (presumably via browser automation APIs). This means you can ask it to attach screenshots to the PR to show what the page actually looks like after a frontend change.</p><ul><li><p>This doesn&#8217;t currently work on the GitHub website (the linked images reside inside Devin&#8217;s system and are not publicly accessible).</p></li><li><p>But the screenshots can be viewed both via Devin&#8217;s web interface and within Slack.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JZw5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JZw5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png 424w, https://substackcdn.com/image/fetch/$s_!JZw5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png 848w, https://substackcdn.com/image/fetch/$s_!JZw5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JZw5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JZw5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png" width="1456" height="765" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:765,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:311513,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JZw5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png 424w, https://substackcdn.com/image/fetch/$s_!JZw5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png 848w, https://substackcdn.com/image/fetch/$s_!JZw5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JZw5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd89fafef-e8bf-45f4-9071-1c9633e72d4e_1592x836.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ul></li></ul><p>The good/bad thing:</p><ul><li><p>Devin is <strong>very cautious about committing sensitive information</strong> into the repository, due to its built-in &#8220;security best practices&#8221; in the &#8220;system prompt&#8221;. 
Maybe this is due to the earlier security lapse that was reported in December, or because Devin had reportedly committed sensitive data by accident before.</p></li><li><p>For example, I asked it to commit Stripe test cards (which are publicly available online) to the repo, but Devin came back 3 times to confirm that I wanted to do it, basically refusing to do it. Eventually Devin did commit them in a bash script file (perhaps unknowingly).</p></li><li><p>This also sometimes sends Devin into an <strong>infinite editing loop</strong>. When the user instruction conflicts with the &#8220;security best practices&#8221;, Devin edits the file back and forth without end:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lsik!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lsik!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png 424w, https://substackcdn.com/image/fetch/$s_!lsik!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png 848w, https://substackcdn.com/image/fetch/$s_!lsik!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lsik!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lsik!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png" width="132" height="361.1142857142857" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:766,&quot;width&quot;:280,&quot;resizeWidth&quot;:132,&quot;bytes&quot;:52104,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lsik!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png 424w, https://substackcdn.com/image/fetch/$s_!lsik!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png 848w, https://substackcdn.com/image/fetch/$s_!lsik!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lsik!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cbcd8ba-da94-4950-8e54-383403464f20_280x766.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li></ul><p>The bad things:</p><ul><li><p>Devin is <strong>constrained by the limited number of tools</strong> available to it (the &#8220;senses&#8221;).</p></li><li><p>Devin <strong>can&#8217;t visit web pages that require human-triggered authentication.</strong></p><ul><li><p>For example, Devin can&#8217;t visit the GitHub website and see the PR that it created, 
or look at GitHub Actions pages to troubleshoot CI issues.</p></li><li><p>To debug an error that happened during CI, it has to either &#8220;look&#8221; (RAG) through all the logs from GitHub Actions, or use <code>grep</code> to search for keywords. Human developers, on the other hand, can go to the GitHub website, visually locate the relevant logs, and quickly narrow down the issue to the specific lines where the error happened.</p></li><li><p>I don&#8217;t think the human / visual troubleshooting approach is necessarily the best or the most efficient, but it does outperform what Devin is doing now.</p></li><li><p>I assume Devin can&#8217;t access action logs on the GitHub website like a developer normally would, because it is an app on GitHub, and GitHub apps typically only get API access.</p></li><li><p><s>This also causes issues where Devin can&#8217;t fix some UI issues because it is not able to visually inspect how the result looks to human eyes (for example, a complex layout on a web page, or the aspect ratio of an image).</s></p></li></ul></li></ul><blockquote><p>Update on 17 January 2025: I discovered that Devin can in fact <strong>&#8220;see&#8221; the web pages</strong> by taking screenshots in his browser. You just need to ask him to do it.</p></blockquote><ul><li><p>Some other examples of actions that are trivial for a developer, but (presumably) not possible for Devin:</p><ul><li><p>Copy-pasting a few lines from one file to another</p></li><li><p>Visually seeing the lint errors via the IDE&#8217;s built-in lint system</p></li><li><p>Using the IDE&#8217;s built-in features to perform refactoring</p></li></ul></li><li><p>I believe Devin currently relies primarily on <strong>LLMs</strong> (with some custom formats or additional logic on top) to <strong>&#8220;edit files&#8221;</strong>, which is inefficient and prone to errors. 
This makes Devin less efficient at some types of tasks, such as refactoring or troubleshooting.</p></li><li><p>Devin is <strong>really slow in completing certain tasks</strong>, much slower than software engineers. Sometimes it can appear to be idle for a few minutes. For example, a task that would take me 30 minutes to complete end-to-end took Devin more than 2 hours.</p><ul><li><p>I suspect this has to do with either the underlying LLMs&#8217; latency, or the endless loop due to conflicts between the &#8220;system prompt&#8221; and the user instructions.</p></li></ul></li><li><p>Devin&#8217;s <strong>performance degrades in long sessions</strong>. A warning appears when the session has been going on for a long time (at around 2.5 hours or 10 ACUs).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!93yV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!93yV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png 424w, https://substackcdn.com/image/fetch/$s_!93yV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png 848w, https://substackcdn.com/image/fetch/$s_!93yV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png 1272w, 
https://substackcdn.com/image/fetch/$s_!93yV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!93yV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png" width="1182" height="418" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:418,&quot;width&quot;:1182,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37525,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!93yV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png 424w, https://substackcdn.com/image/fetch/$s_!93yV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png 848w, https://substackcdn.com/image/fetch/$s_!93yV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png 1272w, 
https://substackcdn.com/image/fetch/$s_!93yV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0db1a917-6909-4311-9821-488dc1e108d3_1182x418.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p>I suspect this has to do with the <a href="https://prompt.16x.engineer/blog/chatgpt-context-window-token-limit">effective context window</a> of the underlying LLMs. 
When the session goes on for too long, the history cannot fit into the context window, and the cost for each subsequent LLM call keeps going up.</p></li><li><p>Claude gives a similar warning in its UI as well.</p></li></ul></li><li><p>Devin generates two plans for each task: one in plain text inside the chat, and one using a DSL / pseudocode inside the planner. The two do not always align on the steps. The planner might miss some important steps that were mentioned in the chat.</p></li><li><p>Devin sometimes does not pick up the relevant knowledge from the knowledge base (a recall problem in RAG). And in cases where it does pick up the relevant knowledge, there is a chance that Devin will not use that knowledge anyway (likely a limitation of the reasoning capabilities of the underlying LLM, and Devin&#8217;s internal planning logic).</p></li></ul><p>Overall Devin feels like <strong>a smart developer being asked to code while being handcuffed</strong>.</p><div><hr></div><h2>Pricing - Fair</h2><blockquote><p>All prices below are in USD.</p></blockquote><p>Devin is currently priced at $500 a month for 250 Agent Compute Units (ACUs).</p><p>It is worth noting that on the plan page, $500 is listed as the &#8220;discount price&#8221;, whereas the original price is $1,250.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DSmf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DSmf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png 424w, 
https://substackcdn.com/image/fetch/$s_!DSmf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png 848w, https://substackcdn.com/image/fetch/$s_!DSmf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png 1272w, https://substackcdn.com/image/fetch/$s_!DSmf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DSmf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png" width="656" height="482" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:656,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38859,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DSmf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png 424w, 
https://substackcdn.com/image/fetch/$s_!DSmf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png 848w, https://substackcdn.com/image/fetch/$s_!DSmf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png 1272w, https://substackcdn.com/image/fetch/$s_!DSmf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df51351-df78-408d-8bb6-51d06d4f8707_656x482.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><s>I got 50 additional ACUs after onboarding (300 in total), which is supposed to be a gift for scheduling a call, even though I didn&#8217;t actually schedule one due to very limited slots around 3am in Singapore.</s></p><blockquote><p>Update 22 January: It turns out I didn&#8217;t get 50 additional ACUs. Instead, it was showing 300 ACUs because I had configured 50 extra ACUs as my &#8220;monthly additional usage budget&#8221;. I didn&#8217;t realize that Devin counts unused ACUs in the &#8220;budget&#8221; as part of the total ACUs available.</p></blockquote><p>For additional usage beyond the included 250 ACUs, the price is $2 / ACU. So the additional usage charge is exactly the same as the base subscription price, at $500 per 250 ACUs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8wDP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8wDP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png 424w, https://substackcdn.com/image/fetch/$s_!8wDP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png 848w, https://substackcdn.com/image/fetch/$s_!8wDP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8wDP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8wDP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png" width="1018" height="528" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:1018,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106892,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8wDP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png 424w, https://substackcdn.com/image/fetch/$s_!8wDP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png 848w, https://substackcdn.com/image/fetch/$s_!8wDP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8wDP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5e557f-b26a-4603-86af-f7d1c4b9df20_1018x528.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This means Devin, as of now, is essentially using <strong>volume-based pricing of $500 per 250 ACUs</strong>.</p><p>I think this pricing makes sense because the cost of operating Devin does scale linearly with the amount of compute and the number of tasks.</p><h3>Agent Compute Units (ACUs)</h3><p>According to Devin&#8217;s official 
documentation:</p><blockquote><p>Currently, <strong>each ACU is approximately equivalent to 15 minutes of active Devin work</strong>. As reference, simple frontend fixes usually take between 5-15 minutes.</p></blockquote><p>In my first round of testing, I ran 4 simple to medium tasks across 2 repos, for a total of 7 Devin sessions (2 extra sessions for &#8220;Verify Repo Access Tasks&#8221; / onboarding, and one extra session because Devin crashed once). Completing these 4 tasks successfully cost me 46.47 ACUs (about $93) in total.</p><p>So each task cost me about 11.6 ACUs, which translates to about $23, or approximately 174 minutes / 2.9 hours.</p><p>I think I would be able to finish these tasks faster myself (maybe 1 to 1.5 hours each, depending on the complexity). But I am happy to pay Devin $23 to do, in a frictionless way, a task that would have cost me an hour, so that I am free to do other, more meaningful things.</p><h3>Cost of Devin vs Junior Engineer</h3><p>In terms of cost, Devin charges a flat rate of $2 per ACU (15 minutes of active Devin work). That works out to $8 per hour.</p><p>Say we hire Devin full-time for a month (22 working days of 8 hours each). That would translate to $8 * 8 hours * 22 days = $1,408 a month.</p><p>However, Devin is slow in completing some tasks (compared to a good junior engineer). So I would multiply the cost by a factor of 2 to 4 to achieve the same productivity. That puts the cost at roughly <strong>$2,800 to $5,600 USD a month for Devin to match the productivity of a full-time junior engineer</strong>. This is similar to the salary range of a junior engineer in Singapore.</p><p>A key benefit of Devin, however, is that you can just hire him by entering your credit card. And you can hire (scale up) as many Devins as you want to work on things in parallel. 
You can&#8217;t hire human engineers that quickly, and humans come with the overhead cost of managing them.</p><div><hr></div><h2>Conclusion</h2><p>Although Devin is publicly available, I would say it is still at a very early stage of maturity. There are some key constraints on what Devin can do. And it has some quirks that seem bizarre for an AI software engineer.</p><p>However, I must say that the user experience (UX) is by far the best I&#8217;ve seen in recent years, surpassing ChatGPT and Claude.</p><p>I am hopeful that if Cognition can fix these quirks and find novel solutions to overcome the constraints, Devin can really change the entire software engineering landscape.</p><p>Worth the $500 for me.</p><div><hr></div><p>I will continue to use Devin daily, give it a wider variety of tasks, and post more in-depth analyses in the future. Subscribe to receive them.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegroundtruth.media/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegroundtruth.media/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is The Ground Truth.]]></description><link>https://thegroundtruth.media/p/coming-soon</link><guid isPermaLink="false">https://thegroundtruth.media/p/coming-soon</guid><dc:creator><![CDATA[Zhu Liang]]></dc:creator><pubDate>Thu, 24 Oct 2024 12:55:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iZv-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F961dbec3-a09e-4643-9c25-05f83cdda466_534x534.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is The Ground Truth.</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://thegroundtruth.media/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegroundtruth.media/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>