Managing Distributed Code in the Enterprise: How Git Subtrees Saved Our Sanity

In almost every large enterprise I've encountered, one consistent struggle persists: managing internal dependencies spread across numerous polyrepos. Often, the solution revolves around layering tooling and orchestration to tame complexity. But here's the issue: this only adds more complexity, more code to manage, and more opportunities for problems to compound.

Let's talk briefly about some traditional approaches:

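  • Stacked PRs: A common attempt to orchestrate coordinated merges across multiple repos. It becomes cumbersome at scale, adding tooling to solve tooling problems, a classic anti-pattern.
  • Git Submodules: Just don't. Each submodule pins a specific commit of another repository, so every clone needs an extra init/update step, and a cross-repo change still means separate commits in separate repos. Ask anyone who's used them, and you'll quickly hear the horror stories.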

Enter the Monorepo

A monorepo simplifies the developer experience significantly:

  • A single PR can modify multiple components across the stack.
  • CI/CD pipelines become simpler.
  • Less orchestration between repositories is needed.

Yet even with a monorepo, code that also lives in other repositories, whether those are external or internal dependencies, still has to be synchronized in both directions.

Git Subtrees: The Middle Path

At TestifySec, when I joined as Director of Platform Engineering, we faced a daunting situation: polyrepos everywhere, unclear dependency boundaries, and a painful development cycle. With our seed funding on the horizon and a new engineering team incoming, it was imperative to streamline this immediately.

My solution? A monorepo powered by git subtrees.

Here's how I structured it:

  • Root-level directories for domain-specific code: /web, /judge-api, /infra, /cd
  • All external/internal dependency repos included under /subtrees/*.

This provided clarity—anything under /subtrees/ was explicitly recognized as distributed code, deserving extra attention.
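As a rough illustration of how a repository might land under /subtrees/, here is a minimal shell sketch. The helper names (run, subtree_add), the example-lib repository, and its URL are hypothetical placeholders, and the use of --squash is an assumption rather than something the original setup is known to have used. By default the script only prints the command; set RUN=1 to execute it inside a real repository.

```shell
#!/bin/sh
# Sketch only: vendoring a repository under subtrees/<name>.
# The name, URL, and branch below are placeholders (assumptions).

# Print the command by default; execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# subtree_add <name> <repo-url> <branch>
subtree_add() {
  name=$1 repo=$2 branch=$3
  # --squash (assumed here) collapses the imported history into one commit.
  run git subtree add --prefix="subtrees/$name" --squash "$repo" "$branch"
}

subtree_add example-lib https://example.com/org/example-lib.git main
```

Keeping every imported repository under one prefix means a reviewer can tell from the path alone that a change touches distributed code.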

Benefits of Git Subtrees

  1. Single PR Across Multiple Components: One PR could span infrastructure, API changes, frontend updates, and third-party patches.
  2. Clear Dependency Management: Subtrees merge an external repository's files into the monorepo as ordinary tracked content, so a plain git clone contains everything and the pitfalls of submodules never come up.
  3. Deferred Upstreaming/Downstreaming: We could temporarily ignore syncing changes upstream/downstream. Junior engineers could contribute immediately, and more senior members could handle integration later.

Handling External Dependencies

We identified three subtree management patterns:

  • Private internal repositories: Easy management—no external synchronization required.
  • Public open-source projects: Generally, you don't need to fork external repositories to manage them with subtrees. However, if you intend to upstream changes, introducing a fork can be beneficial. It creates a useful buffer zone, adds a layer of scrutiny, and provides an opportunity to keep certain changes private within your monorepo before selectively upstreaming.
  • Critical third-party dependencies: Incorporated into our monorepo directly, enabling rapid vulnerability patches without waiting on external maintainers.
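The fork-as-buffer pattern for open-source projects could look roughly like the sketch below. Everything named here is an assumption for illustration: the sync_via_fork helper, the example-lib project, both URLs, the branch names, and the choice of --squash on pull. The script prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: using a fork as a buffer between the monorepo and upstream.
# All names, URLs, and branches are hypothetical.

# Print the command by default; execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# sync_via_fork <name> <upstream-url> <fork-url> <fork-branch>
sync_via_fork() {
  name=$1 upstream=$2 fork=$3 branch=$4
  # Downstream: bring the project's changes into the monorepo.
  # Assumes the upstream default branch is called "main".
  run git subtree pull --prefix="subtrees/$name" --squash "$upstream" main
  # Upstream, first hop: publish the subtree's commits to a branch on the fork.
  run git subtree push --prefix="subtrees/$name" "$fork" "$branch"
}

sync_via_fork example-lib \
  https://example.com/upstream/example-lib.git \
  https://example.com/our-org/example-lib.git \
  monorepo-sync
```

The second hop, from the fork to the upstream project, would then be an ordinary pull request, which is where the extra scrutiny happens.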

Automating Subtree Management

To further streamline the workflow, we wrapped the git subtree commands in Makefile targets. For example:

make subtrees-pull-<subtree-name>
make subtrees-push-<subtree-name>

This simplified even the more intricate operations of syncing changes upstream and downstream.
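For a sense of what such targets might call underneath, here is a hedged shell sketch. It is not the actual TestifySec Makefile: the subtree_source registry, the subtree_sync helper, the example repositories, and the --squash flag are all assumptions. Commands are printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: a script that targets like subtrees-pull-<name> might invoke.
# Registry entries are hypothetical.

# Print the command by default; execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Map a subtree name to "<repo-url> <branch>".
subtree_source() {
  case "$1" in
    example-lib) echo "https://example.com/org/example-lib.git main" ;;
    example-cli) echo "https://example.com/org/example-cli.git main" ;;
    *) echo "unknown subtree: $1" >&2; return 1 ;;
  esac
}

# subtree_sync <pull|push> <name>
subtree_sync() {
  action=$1 name=$2
  src=$(subtree_source "$name") || return 1
  # Intentional word splitting: $src holds "<repo-url> <branch>".
  set -- $src
  case "$action" in
    pull) run git subtree pull --prefix="subtrees/$name" --squash "$1" "$2" ;;
    push) run git subtree push --prefix="subtrees/$name" "$1" "$2" ;;
    *) echo "usage: subtree_sync <pull|push> <name>" >&2; return 2 ;;
  esac
}

subtree_sync pull example-lib
subtree_sync push example-lib
```

Centralizing the name-to-remote mapping keeps the prefix, URL, and branch out of engineers' heads, which is most of what the make targets buy you.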

Challenges and Workarounds

Git subtrees aren't without quirks:

  • Merge Commit Necessity: git subtree add and pull record merge commits, which conflicted with our branch protection rules enforcing rebases.
  • Manual Rebase Simulation: This issue occurs only when someone merges to main before a downstream PR is completed. In that situation, a straightforward rebase can cause conflicts. Our workaround is to start a fresh branch from main and run the automated subtree pulls again. Coordinating PR merges generally avoids the scenario altogether, so manual rebasing remains an infrequent, minor inconvenience.
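The fresh-branch workaround can be sketched as a small script. The redo_downstream_pr helper, the branch name, and the example-lib subtree are hypothetical, and the sketch assumes the remote is called origin; the subtrees-pull-<name> target is the one mentioned earlier. Commands are printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: redo a downstream PR on top of the latest main
# instead of rebasing the old branch.

# Print the command by default; execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# redo_downstream_pr <new-branch> <subtree-name>...
redo_downstream_pr() {
  branch=$1
  shift
  # Start from the current tip of main (remote assumed to be "origin").
  run git fetch origin main
  run git switch -c "$branch" origin/main
  # Re-run the automated pull for each affected subtree.
  for name in "$@"; do
    run make "subtrees-pull-$name"
  done
}

redo_downstream_pr sync/example-lib-retry example-lib
```

Because the pulls are scripted, redoing them on a new branch costs little more than the time the commands take to run.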

Despite these challenges, the subtree model significantly enhanced our workflow.

Impact at TestifySec

Implementing git subtrees within a monorepo radically improved our developer experience, accelerated feature development, simplified onboarding, and shortened the time to value. Our entire CI/CD infrastructure became simpler, and our ability to rapidly deliver high-value features dramatically increased.

This architecture was crucial when we faced aggressive deadlines, such as launching a public SaaS and achieving an AWS Marketplace listing simultaneously within a four-week sprint. Without our monorepo powered by git subtrees, these ambitious targets might have been unattainable.

Closing Thoughts

Git subtrees, combined with a carefully organized monorepo, offer an elegant, scalable solution to managing distributed code. If your organization struggles with the complexity of polyrepos and dependency management, consider giving git subtrees a try—you might just regain your sanity.