I Cloned Buffett and Graham with AI and Had Them Team Up to Automate My Investment Research
Written by
Voxyz AI

I've been running multi-agent teams since February. Writing content, shipping code, doing research. The question I get most is: how do you get them to actually work together? The answer has nothing to do with the model. The biggest mistake I made these past few months was assuming a smarter model would produce better results. What actually makes the difference is how you seat them, split the work, and set up opposition. The upgrade is the org chart. I recently came across two open-source projects, TradingAgents (81k stars) and AI Hedge Fund (59k stars). On the surface they're both investment research frameworks, but the more I looked, the more interesting they got: they laid out the exact teamwork structure I'd been figuring out on my own. Analysts collect information in parallel, bull/bear researchers debate, a trader synthesizes, a risk team pokes holes, and a portfolio manager gives final approval. Both projects are upfront about it: this is an engineering pattern for research purposes, not investment advice. Investment research is the shell. What's worth studying is the agent team architecture underneath.
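To make that structure concrete, here is a minimal Python sketch of the flow. Every agent is a plain function or hard-coded string standing in for a model call. The names (`Finding`, `Trail`, `run_pipeline`, the analyst functions) are hypothetical, and this is not code from TradingAgents or AI Hedge Fund. What it shows is the ordering: parallel collection, a two-sided debate over the same evidence, synthesis, a risk check, and a final sign-off, with each stage leaving an entry in a reviewable trail.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Finding:
    source: str  # which analyst produced it
    text: str


@dataclass
class Trail:
    """Every stage appends here, so the whole run can be reviewed afterwards."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def log(self, stage: str, content: str) -> None:
        self.steps.append((stage, content))


# Hypothetical stand-ins for model calls. Each analyst sees only the question.
def news_analyst(q: str) -> Finding:
    return Finding("news", f"headlines relevant to: {q}")


def filings_analyst(q: str) -> Finding:
    return Finding("filings", f"filing notes for: {q}")


def technicals_analyst(q: str) -> Finding:
    return Finding("technicals", f"price-action summary for: {q}")


ANALYSTS: List[Callable[[str], Finding]] = [news_analyst, filings_analyst, technicals_analyst]


def run_pipeline(question: str) -> Trail:
    trail = Trail()

    # 1. Analysts collect information in parallel, independently of each other.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda analyst: analyst(question), ANALYSTS))
    for f in findings:
        trail.log(f"analyst:{f.source}", f.text)

    # 2. Bull and bear argue from the same evidence.
    sources = ", ".join(f.source for f in findings)
    bull_case = f"bull case built from {sources}"
    bear_case = f"bear case built from {sources}"
    trail.log("bull", bull_case)
    trail.log("bear", bear_case)

    # 3. The trader synthesizes both cases into one proposal.
    proposal = f"proposal weighing [{bull_case}] against [{bear_case}]"
    trail.log("trader", proposal)

    # 4. The risk team pokes holes (a hard-coded objection, for illustration).
    objections = ["position size vs. drawdown tolerance?"]
    trail.log("risk", "; ".join(objections) or "no objections")

    # 5. The portfolio manager approves only when no objection is left open.
    trail.log("pm", "approved" if not objections else "sent back")
    return trail


for stage, content in run_pipeline("EV sector, USD, 3-5 years").steps:
    print(f"{stage:>18} -> {content}")
```

Swapping a stand-in for a real model call doesn't change the structure, which is the point: the org chart lives in `run_pipeline`, not in any single prompt.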
This article breaks down that structure, with a team-building template you can take and use directly. What I care about is how the agents on a team divide labor, check each other, and leave a reviewable trail. Investment research just happens to be the noisiest, highest-stakes scenario, where collaboration quality shows up fastest. I built the whole thing on Bloome. The easiest way to picture it: a group chat where some of the members are AI agents instead of people. You add them like contacts, give each one a role, and they talk to each other and to you in the same thread.
A Usable Agent Team Has to Pass Five Gates
After months of building agent teams across different domains, I've boiled it down to five things:
[Image: Agent Team: Five-Gate Checklist]
TradingAgents took off because it broke a high-stakes judgment call into exactly this kind of organizational pipeline. Every step documented, every step traceable. It has nothing to do with "AI trading stocks."
These Five Gates Need a Room
These five gates are hard to run in a single chat window. Sure, you can write one prompt: "Please play Buffett, Graham, the bull, the bear, and the Lead." But that's still one model doing five voices in one breath. They don't have real seats. They can't really see each other. The most common result: the voices sound different, but the final judgment is still mush.
I needed a place where multiple agents could actually see each other. That's exactly what Bloome gives you: humans and agents in the same group chat, each agent with its own role card and boundaries. The Lead is a real member in the group with its own context, not a paragraph of instructions buried in a prompt. The bull and the bear are two independent agents that can push back face-to-face, not two paragraphs inside the same response. You @ everyone with a question, and they break it apart, push back, and debate in the same thread. Then the Lead collects it into a memo. It's like pulling a few colleagues into a group chat for a meeting, except these colleagues won't compromise out of politeness, though they will step on each other's toes if you don't write clear role boundaries. Bloome is a supporting character in this article. What it does is simple: it turns those five gates into a visible process.
[Image: How the gates map onto a group chat]
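The difference between one prompt doing five voices and five seated agents can be sketched in a few lines: each member keeps its own role card, and everyone reads the same shared thread, so a reply can respond to a specific earlier message. This is a toy model of a group chat, not Bloome's API. `Member`, `Thread`, and the `@all` convention are hypothetical names used for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Member:
    name: str
    role_card: str  # this agent's own instructions: responsibilities and boundaries

    def build_prompt(self, thread: "Thread") -> str:
        # Own role card + the FULL shared history: the agent can see, and push
        # back on, what every other member said.
        history = "\n".join(f"{who}: {text}" for who, text in thread.messages)
        return f"[role]\n{self.role_card}\n[thread]\n{history}"


@dataclass
class Thread:
    members: Dict[str, Member] = field(default_factory=dict)
    messages: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, member: Member) -> None:
        self.members[member.name] = member

    def post(self, author: str, text: str) -> List[Member]:
        """Append a message and return the members it addresses via @mentions."""
        self.messages.append((author, text))
        mentioned = re.findall(r"@(\w+)", text)
        if "all" in mentioned:
            return [m for name, m in self.members.items() if name != author]
        return [self.members[n] for n in mentioned if n in self.members]


chat = Thread()
for name, card in [("Lead", "Scope the question first; write the final memo."),
                   ("Bull", "Argue the strongest upside. Never write the memo."),
                   ("Bear", "Argue the strongest downside. Never write the memo.")]:
    chat.add(Member(name, card))

targets = chat.post("me", "@all 10 million into EV, what happens?")
print([m.name for m in targets])  # ['Lead', 'Bull', 'Bear']
print(chat.members["Bear"].build_prompt(chat))
```

In the single-prompt version, all five "voices" share one context and one output; here each member's prompt is built separately, which is what lets the bull and bear disagree without collapsing into one answer.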
Building the team itself isn't hard. The hard part is writing clear boundaries for each agent: what it's responsible for, what it can't touch, and when it must hand off to the Lead. Below, I'll run through all five gates using investment research as the scenario.
Gate 1: The Lead Rejects Bad Questions First
I set up the investment research team on Bloome, and the first thing I did was @ everyone: "10 million into EV, what happens?" I waited for someone to throw a stock at me. Nobody did. The Lead held the question and pushed back:
[Image: Lead Intake Gate. For any research question, the Lead asks five things first.]
I had asked a bad question. The Lead not rushing to answer was the first sign this system worked: a good Lead reshapes the question into something answerable before showing off. I replied: USD, three to five years. Only then did it give the green light and start assigning work.
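The intake gate is essentially a guard clause: the Lead won't dispatch work until the question carries its constraints. Only some of the Lead's constraints are visible in the exchange above (the amount in the question, then currency and horizon in my reply), so this sketch checks just those three; the real gate asks five things. `ResearchRequest`, `REQUIRED`, and `intake` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ResearchRequest:
    question: str
    amount: Optional[float] = None
    currency: Optional[str] = None
    horizon_years: Optional[Tuple[int, int]] = None


# Only the constraints visible in the exchange; the Lead's full gate asks five things.
REQUIRED = ("amount", "currency", "horizon_years")


def intake(req: ResearchRequest) -> Tuple[str, List[str]]:
    """Lead's guard clause: dispatch only when every required constraint is set."""
    missing = [name for name in REQUIRED if getattr(req, name) is None]
    if missing:
        return "ask", missing  # hold the question and push back
    return "dispatch", []


req = ResearchRequest("10 million into EV, what happens?", amount=10_000_000)
print(intake(req))  # ('ask', ['currency', 'horizon_years'])
req = replace(req, currency="USD", horizon_years=(3, 5))
print(intake(req))  # ('dispatch', [])
```

Returning the list of missing fields, rather than a bare refusal, is what turns "no" into a follow-up question the user can actually answer.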
Gate 2: Roles Aren't Personas. They're Filters.
I staffed the team with AI Buffett and AI Graham. But cloning investment legends is where most people go wrong. The bad approach: "You are Buffett. Please analyze this stock in Buffett's voice." That just gives you a cosplay bot. The right approach is translating the master into a set of judgment filters:
[Image: AI Buffett Role Card]
[Image: AI Graham Role Card]
What makes a master agent valuable is the stable filter behind it. Whether it sounds like the real person barely matters. The AI Hedge Fund project does the same thing: Graham is defined as "only buys hidden gems with margin of safety," Buffett as "looks for wonderful companies at fair prices." Master personas get translated into executable investment filters. Why these two specifically? Because Buffett and Graham naturally disagree. One always looks at the world ten years out. The other only cares about today's safety cushion. You don't need to engineer conflict. Their investment philosophies are inherently opposed. The key to casting: the tension between roles should be built-in, not forced. Cloning a role on Bloome takes seconds. The whole team is up in minutes. The team I built: Finance Lead, Buffett, Graham, The Bull, The Bear.
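Here is one way to express "roles are filters, not personas" in code: each role card is a predicate over the same company data, plus explicit boundaries and a hand-off target, instead of a voice to imitate. The metrics, thresholds, field names, and the `ExampleEV` company are all hypothetical, chosen only to echo the two philosophies quoted above (margin of safety for Graham, a wonderful company at a fair price for Buffett). They are not AI Hedge Fund's actual rules.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Company:
    name: str
    price: float
    intrinsic_value: float  # hypothetical estimate supplied by a collector
    roe_10y_avg: float      # hypothetical quality proxy: 10-year average return on equity
    pe_ratio: float


@dataclass
class RoleCard:
    role: str
    question: str                       # the one thing this role keeps asking
    verdict: Callable[[Company], bool]  # the filter itself
    cannot_touch: Tuple[str, ...]       # boundaries: topics outside this role's scope
    hand_off_to: str = "Lead"           # who receives the output


def margin_of_safety(c: Company) -> float:
    # Discount of the current price relative to the estimated value.
    return (c.intrinsic_value - c.price) / c.intrinsic_value


GRAHAM = RoleCard(
    role="Graham",
    question="How far can this fall, and is the price cushioned?",
    verdict=lambda c: margin_of_safety(c) >= 0.33,  # hypothetical threshold
    cannot_touch=("ten-year narratives", "final recommendation"),
)

BUFFETT = RoleCard(
    role="Buffett",
    question="Will this business still be here, and stronger, in ten years?",
    verdict=lambda c: c.roe_10y_avg >= 0.15 and c.pe_ratio <= 25,  # hypothetical
    cannot_touch=("short-term price moves", "final recommendation"),
)


ev = Company("ExampleEV", price=80.0, intrinsic_value=100.0, roe_10y_avg=0.18, pe_ratio=22)
for card in (BUFFETT, GRAHAM):
    print(card.role, "passes" if card.verdict(ev) else "rejects", "->", card.hand_off_to)
```

On the same made-up numbers, the Buffett filter passes and the Graham filter rejects: the built-in tension the casting relies on, with neither role allowed to write the final recommendation.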
Gate 3: Opposing Sides Aren't Theater. They're Anti-Confirmation Bias.
A single AI's biggest flaw is that it wants to please you too much. Whatever you say, it agrees. Swapping models won't fix this. You have to fix it in the architecture. Beyond Buffett and Graham, I added a die-hard bull and a die-hard bear to the team. Their job isn't to offer opinions. It's to push both sides to the extreme:
[Image: Bull / Bear Debate Protocol]
In practice, the Buffett agent kept pulling the conversation back to one point: will this company still be here in ten years? The Graham agent wouldn't engage with that. It only cared about one thing: whether the current price leaves enough safety cushion, and how far it could fall. One looks at the decade, the other looks at the downside. They went back and forth on the same stock, neither convincing the other. By the end, the disagreement had shifted from emotion to verifiable hypotheses. That's far more useful than picking a winner. The Lead synthesized both sides into a conclusion I could actually understand. Buffett judges long-term quality, Graham judges margin of safety, the Lead synthesizes in between.
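One way to enforce "disagreement becomes verifiable hypotheses" is to make the debate output structured: every claim must name a metric and an observable condition that would settle it, and claims that can't be checked are dropped before they reach the Lead. The `Claim` shape, the round rule, and the example claims below are hypothetical illustrations, not the protocol on the Bull / Bear card.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Claim:
    side: str                  # "bull" or "bear"
    statement: str
    metric: Optional[str]      # what to measure; None means it's only an opinion
    settles_if: Optional[str]  # observable condition that would confirm or refute it


def debate(rounds: List[Tuple[Optional[Claim], Optional[Claim]]]):
    """Each round is a (bull, bear) pair; both sides must answer every round."""
    transcript, hypotheses = [], []
    for i, (bull, bear) in enumerate(rounds, start=1):
        if bull is None or bear is None:
            raise ValueError(f"round {i}: both sides must respond")
        for claim in (bull, bear):
            transcript.append((i, claim.side, claim.statement))
            # Only checkable claims survive to the Lead.
            if claim.metric and claim.settles_if:
                hypotheses.append(claim)
    return transcript, hypotheses


rounds = [
    (Claim("bull", "Demand will keep growing", "annual unit deliveries", "keeps rising for two years"),
     Claim("bear", "This is all hype", None, None)),
    (Claim("bull", "Costs are falling", "battery cost per kWh", "keeps declining year over year"),
     Claim("bear", "Margins won't survive a price war", "gross margin", "falls for two straight years")),
]
transcript, hypotheses = debate(rounds)
for h in hypotheses:
    print(f"[{h.side}] {h.metric}: {h.settles_if}")
```

The transcript keeps everything for review, but only the three checkable claims reach the Lead; "This is all hype" stays on the record without steering the synthesis.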
Gate 4: Why Single-Assistant Mode Hits a Wall
For comparison, I asked a similar question to a single default assistant. It quickly went into compliance mode: "I can't give investment advice." That's not a bad thing. Financial questions really shouldn't get snap answers from a single assistant. The issue is that single-assistant mode has only one exit: it either answers directly or refuses. An agent team adds a middle layer: it turns "give me a stock pick" into "organize a research process." So the Lead asks about constraints, the Specialists break the question down from their own perspectives, Bull and Bear lay out their disagreements, and the final Memo only gives next research steps, never a buy/sell order. The underlying model didn't change. What changed is that the question got placed inside an organizational process.
Gate 5: The Output Isn't an Answer. It's a Decision Memo.
A good agent team should output a reviewable decision record:
[Image: Decision Memo Template]
When I asked about putting 10 million into EV, what I got back wasn't "buy" or "don't buy." It was a document laying out both sides' arguments, key assumptions, and the conditions under which the conclusion would be invalid.
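As a sketch, the memo can be a typed record that refuses to render if either side is missing, if it has no invalidation condition, or if its next steps contain a trade instruction. The fields follow what the memo is described as containing (both sides' arguments, key assumptions, invalidation conditions, next research steps). `DecisionMemo` is a hypothetical name, and the keyword check is a deliberately crude guard, not a complete one.

```python
import re
from dataclasses import dataclass
from typing import List

# Crude hypothetical guard: the memo may propose research, never a trade.
ORDER_WORDS = re.compile(r"\b(buy|sell)\b", re.IGNORECASE)


@dataclass
class DecisionMemo:
    question: str
    bull_case: List[str]
    bear_case: List[str]
    key_assumptions: List[str]
    invalid_if: List[str]  # conditions under which the conclusion no longer holds
    next_research_steps: List[str]

    def validate(self) -> List[str]:
        problems = []
        if not self.bull_case or not self.bear_case:
            problems.append("both sides must be represented")
        if not self.invalid_if:
            problems.append("state at least one invalidation condition")
        for step in self.next_research_steps:
            if ORDER_WORDS.search(step):
                problems.append(f"trade instruction in next steps: {step!r}")
        return problems

    def render(self) -> str:
        problems = self.validate()
        if problems:
            raise ValueError("; ".join(problems))
        sections = [
            ("Bull case", self.bull_case),
            ("Bear case", self.bear_case),
            ("Key assumptions", self.key_assumptions),
            ("Invalid if", self.invalid_if),
            ("Next research steps", self.next_research_steps),
        ]
        lines = [f"Question: {self.question}"]
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


memo = DecisionMemo(
    question="USD 10M into EV, 3-5 year horizon",
    bull_case=["cost curve still falling"],
    bear_case=["price war compresses margins"],
    key_assumptions=["no major subsidy rollback"],
    invalid_if=["gross margins fall for two straight years"],
    next_research_steps=["compare unit economics across the largest makers"],
)
print(memo.render())
```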
The latest version of TradingAgents added a persistent decision log. AI Hedge Fund also emphasizes that agent reasoning must be debuggable. They independently arrived at the same conclusion: whatever an agent team outputs, you have to be able to review it after the fact.
This Works Beyond Investment Research
Bloome was just the room this time, not the boundary of the method. I used investment research as a stress test because it's noisy, high-risk, and the fastest way to see whether team members are actually checking each other. Swap in content, code, product, or sales, and it's still the same five gates:
Investing. Collector: news, filings, technicals. Specialist: Buffett, Graham. Adversary: bull, bear, risk. Lead: PM.
Content. Collector: material gathering. Specialist: writer. Adversary: fact-check, pushback. Lead: editor.
Code. Collector: repo reader. Specialist: implementer. Adversary: reviewer, security. Lead: tech lead.
Product. Collector: user feedback. Specialist: PM agent. Adversary: skeptical user. Lead: founder.
Sales. Collector: lead research. Specialist: account strategist. Adversary: objection handler. Lead: sales lead.
The value of a multi-agent setup isn't having more agents. It's making a question pass through multiple judgment positions.
Come @ the Team Yourself
I didn't hide this team behind a screenshot. It's the exact group I built. Same Lead, same AI Buffett and Graham, same die-hard bull and bear, all sitting in one chat. Step inside, @ the team, and throw it a question of your own. I haven't put it into the Arena yet. This piece is a team-building notebook, not a competition recap. But if you want to see fully automated teams go head to head under the same tasks, the same data, and the same simulated constraints, go watch the Bloome Arena. The point isn't who made the most in one round. It's watching how different teams divide work, make mistakes, and synthesize.
[Image: The Bloome Live Trading Arena: agent teams competing on the same capital, in public.]
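Whether a team runs in a private notebook or in a public arena, reviewing it after the fact needs a record of every decision. The simplest version of a persistent decision log is an append-only JSON Lines file: one record per decision, never rewritten, replayable per team. This is a generic standard-library sketch. It is not the log format TradingAgents uses, and `log_decision`, `replay`, and the record fields are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def log_decision(path: Path, team: str, question: str, memo: dict) -> None:
    """Append one decision record; existing lines are never modified."""
    record = {"ts": time.time(), "team": team, "question": question, "memo": memo}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def replay(path: Path, team: Optional[str] = None) -> List[dict]:
    """Read the trail back, oldest first, optionally filtered to one team."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r for r in records if team is None or r["team"] == team]


log_path = Path(tempfile.mkdtemp()) / "decisions.jsonl"
log_decision(log_path, "investing", "EV, USD, 3-5 years", {"invalid_if": ["margins fall two years running"]})
log_decision(log_path, "code", "migrate the auth service?", {"invalid_if": ["p99 latency regresses"]})
for r in replay(log_path, team="investing"):
    print(r["question"], r["memo"]["invalid_if"])
```

The `team` field is what lets the same log serve the investing, content, code, product, and sales variants of the team.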
Investment research is just the stress test. What's really worth watching is the organizational capacity an agent team reveals in a public environment.
To Be Clear
This is an experiment running on a paper trading simulator. I'm not an investing expert, and this article isn't stock advice. What I cloned are AI roles built on publicly available investment philosophies. They have nothing to do with the real people. From start to finish, what I'm curious about is how the team collaborates. Which stock did it end up picking? I honestly didn't pay much attention. A single AI is like a well-spoken intern. A good agent team is more like a small meeting room. Your job isn't to make them all smarter. It's to arrange who gathers information, who plays devil's advocate, who flags risk, and who cleans up the table at the end.
References
TradingAgents: open-source multi-agent investment research framework, 81k stars. Breaks high-stakes judgment into an orchestratable, traceable collaboration pipeline (research purposes).
AI Hedge Fund: open-source investment-legend agent system, 59k stars. Demonstrates that master personas can be translated into executable filters (educational purposes).
Bloome (bloome.im): multi-agent messaging platform; agents join your group chat as teammates.
Alpaca Paper Trading: paper trading simulator for testing strategies without real money.
