How I’ve run major projects

My few most productive individual weeks at Anthropic have all been “crisis project management:” coordinating major, time-sensitive implementation or debugging efforts.

In a company like Anthropic, excellent project management is an extremely high-leverage skill, and not just during crises: our work has tons of moving parts with complex, non-obvious interdependencies and hard schedule constraints, which means organizing them is a huge job, and can save weeks of delays if done right. Although a lot of the examples here come from crisis projects, most of the principles here are also the way I try to run any project, just more-so.

I think excellent project management is also rarer than it needs to be. During the crisis projects I didn’t feel like I was doing anything particularly impressive; mostly it felt like I was putting in a lot of work but doing things that felt relatively straightforward. On the other hand, I often see other people miss chances to do those things, maybe for lack of having seen a good playbook.

So here’s an attempt to describe my playbook for when I’m being intense about project management.

(I’ve described what I did as “coordinating” above, but that’s underselling it a bit; it mattered a lot for this playbook that I had enough technical context, and organizational trust, to autonomously make most prioritization decisions about the project. Sometimes we instead try to have the trusted decisionmakers not be highly involved in managing execution, and instead farm that out to a lower-context or less-trusted project manager to save the trusted decisionmaker time, but IMO this is usually a false economy for projects where it’s critical that they be executed well.)

Focus

For each of the crisis management projects I completely cleared my schedule to focus on them, and ended up spending 6+ hours a day organizing them.

This is a bit unintuitive because I’m used to thinking of information processing as basically a free action. After all, you’re “just” moving info from place to place, not doing real work like coding, right? But if you add it all up—running meetings, pinging for updates, digesting Slack threads, pinging for updates again, thinking about what’s next, pinging for updates a third time, etc.—it’s surprisingly time-intensive.

Even more importantly than freeing up time, clearing my schedule made sure the project was the top idea in my mind. If I don’t do that, it’s easy for me to let projects “go on autopilot,” where I keep them running but don’t proactively make time to think through things like whether we should change goals, add or drop priorities, or do other “non-obvious” things.

For non-crisis projects, it’s often not tenable (or the right prioritization) to spend 6+ hours a day project-managing; but it’s still the case that you can improve execution a lot if you focus and make them a top priority, e.g. by carving out dedicated time every day to check statuses, contemplate priorities, broadcast updates, and so on.

Maintain a detailed plan for victory

A specific tool that I’ve found critical for staying oriented and updating quickly is a detailed plan for victory, i.e., a list of steps, as concrete as possible, that end with the goal being achieved.

The plan is important because whether or not we’re achieving the plan is the best way to figure out how well or badly things are going. Knowing how well or badly things are going is important because it tells me when to start asking for more support, cutting scope, escalating problems, and otherwise sounding more alarms. One of the most common megaproject failure modes is to not freak out soon enough, and having a concrete plan is the best antidote.

As a both positive and negative example of this, during a recent sprint to release a new implementation of a model, we took a detailed accounting of all the work we thought we had to do to launch.

As the above example shows, having a plan can’t completely save you if you underestimate how long all the steps in the plan will take. But it certainly helps! My sense is that the main things that would have helped even more in the above case were:

Run a fast OODA loop

OODA stands for “observe, orient, decide, act”—in other words, the process by which you update your plans and behavior based on new information.

Most of the large projects I’ve worked on have been characterized by incomplete information:

In fact, I’d make a stronger claim: usually getting complete information was the hard part of the project, and took up a substantial fraction of the overall critical-path timeline.

For example, let’s take a recent project to kick off a training run. The critical path probably looked something like:

  1. Chips for the training run are delivered
  2. We run some tests
  3. We discover one aspect of performance is unexpectedly poor
  4. We escalate the problem with our compute partner
  5. Compute partner staffs a large debugging effort
  6. We realize we had given our compute partner an outdated benchmark that is causing them to target the wrong improvements
  7. Compute partner switches benchmark and prioritizes different improvements
  8. We share our benchmarks with compute partner so they can run the exact same code as us
  9. Compute partner rolls out improvements
  10. We test the improvements
  11. Performance is still poor and we tell them that
  12. Repeat steps 8-10 until eventually it’s good enough

Practically all of these steps are about information-processing, not writing code! Even the step where the compute partner debugged the problems on their side was itself constrained by information processing speed, since there were tens of people working on the debugging effort and coordinating / sharing info between them was difficult. Overall, the project timeline was strongly constrained by how quickly information could round-trip from our compute partner’s large-scale debugging effort, through their tech lead, me, and Anthropic’s large-scale debugging effort.

This pattern generalizes to most projects I’ve been a part of, and as a result, one of my most productive project management habits is to try to run the fastest OODA loop that I can.

A few specific things that I’ve found help:

Overcommunicate

It’s not just enough for me personally to be running a fast OODA loop—in a large group, everyone needs to be autonomously making frequent, high-quality, local prioritization decisions, without needing a round-trip through me. To get there, they need to be ambiently aware of:

  1. what else is going on around them, so they can coordinate and update on new info quickly (“oh, we’re planning to kick off the next derisking run in three days, so I have to have my new RL environment ready and tested by then”)
  2. how their goal fits into the overall project, so they can make correct decisions about the details of their approach (“we’re trying to scale up as much as possible right now, so this direction isn’t valuable to pursue since it could never provide the scale of data we need”)

I’ve usually found that to create the right level of ambient awareness, I have to repeat the same things way more often than I intuitively expect. This is roughly the same “communicate uncomfortably much” principle above, but applied to broadcasts and not just 1:1 conversations with people.

For example, although the first team I managed at Anthropic started with a daily asynchronous standup, we found that synchronous meetings were much more effective for creating common knowledge and reorienting, so we moved to a twice-weekly synchronous standup, which probably qualified as “uncomfortably much” synchronous communication for Anthropic at the time.

Break off subprojects

Once a project gets over maybe 10 people, I can’t track everything myself in enough detail to project-manage the entire thing myself. At this point, it becomes critical to delegate.

Here I mean delegating the project management, not just the execution (that’s what I’d be delegating to the first 10 people). This is the point where I need other people to help split up the work, monitor and communicate progress, escalate blockers, etc.

A few things I try to keep in mind when delegating project management:

One of my favorite things to make delegation easier is to keep goals simple—if they can fit in a Slack message while still crisply describing a path to the desired end state, then the people working on the goal will be much more able to prioritize autonomously, and point their work at the real end goal rather than doing something that turns out to be useless for some reason they didn’t think about.

“Keep goals simple” doesn’t have to mean “do less”—the best way to keep goals simple is to find the latent structure that enables a clean recursive decomposition into subgoals. This often requires a deceptive amount of work—both cognitive and hands-on-keyboard—to identify the right intermediate goals, but I’ve found that it pays off immensely by clarifying what’s important to work on.

Have fun

Some of my favorite memories of Anthropic are of helping out with these big projects. While they can be intense, it’s also really inspiring to see how our team comes together, and the feeling of being part of a big team of truly excellent people cooking something ambitious together can be really magical! So I try to enjoy the chaos :)


Appendix: my project DRI starter kit

Here’s the internal doc I share with folks on my team who are getting into being responsible for large projects.

So you’re the DRI of a project (or part of one). Concretely, what do you do to “be DRI”?

This doc is my suggested “starter kit” answer to that question. The habits and rituals described here aren’t perfect for every situation, but they’re lightweight and broadly helpful. I suggest you use them as a starting point for iteration: try them out, then adjust as necessary. This is an SL init; the RL is your job :)

Goals of this playbook

The goal is to help you do your job as DRI—

—without adding too much overhead:

(Note: being DRI will still unavoidably add some overhead—e.g. you’ll have to track what other people are doing, delegate work, unblock people, set and communicate goals, etc. The goal is specifically for the process/paperwork to be minimal.)

Weekly meeting

You should schedule at least one 30-minute weekly meeting with everyone working on the project.

The goal of this meeting is to (1) be a backstop for any coordination that needs to happen and didn’t happen asynchronously; (2) be an efficient way to create common knowledge of goals, updates, etc.; (3) help you track whether things are going well.

Landing page / working doc

It’s really helpful for discoverability and wayfinding to have a single “master doc” with all the most important info about a project. As you loop more people in, they can read the doc to get up to speed. And anyone who thinks “I wonder how X is going” can stop by there to find out.

Create a doc for your workstream with:

If it’s part of a larger project, your doc should be nested within the larger project’s working doc.

If this ends up being too much for one doc, you can fork these out into sub-docs (esp. running notes and updates).

Plan / roadmap / milestones

Who’s working on what

Slack norms

Weekly broadcast updates

Retrospectives

Thanks to Kelley Rivoire for many thoughtful comments on a draft!

Comments

email me replies

format comments in markdown.

Your comment has been submitted! It should appear here within 30 minutes.
joe cubed

this was really insightful! i’m building a startup rn and i hope to incorporate your learnings as we scale :)

email me replies

format comments in markdown.

Your comment has been submitted! It should appear here within 30 minutes.

Jon Bo

Fascinating and pragmatic dive into how to run complex things. Thank you for sharing

email me replies

format comments in markdown.

Your comment has been submitted! It should appear here within 30 minutes.