When forwardRef becomes a saga: planning module extraction in a modular monolith

TL;DR

Cross-module atomic writes via NestJS forwardRef work because the modules share a process, a connection pool, and a transaction. The moment you extract one module into its own service, the shared transaction can't span the boundary — DB transactions live on connections, connections live in processes. The replacement patterns are saga (with compensation), transactional outbox + inbox (for at-least-once eventual consistency), and 2PC (which you almost never want). This post walks through each, the cost ledger, and the migration path. Plan for the extraction question from day one, even if the answer is "not yet."

Setting the stage: the patterns named here

Before the migration logic, brief grounding on the distributed-system patterns this post moves between. Skip if you've shipped these patterns before. Stay if your team is still treating "we'll figure it out at extraction time" as a strategy — because the figuring-out cost is much higher mid-extraction than upfront.

The core problem in one paragraph

Two services own different tables. A single user action needs to write to both atomically — community-management creates the Community row; user-management creates the BankAccount row; both must commit or both must roll back. In one process this is one DB transaction. Across two processes, there's no shared transaction context — and the question becomes which pattern you use to preserve the consistency property.

Saga

A long-lived transaction decomposed into local transactions, each with a compensating action. If step 4 fails, fire compensations 3-2-1 in reverse. The compensations make the system eventually consistent — there's a window where you've done part-of-the-thing and not the rest, but the saga is committed to either finishing or undoing.

Two flavors: orchestrated (one coordinator drives the sequence; clear control flow but the coordinator becomes a coupling point) and choreographed (services react to each other's events; looser coupling, harder to reason about past three or four steps).

Failure mode without it: "user clicks setup community" partially succeeds — Community created, BankAccount creation failed — the system is left in a half-state forever. Either you manually clean it up via support, or the user sees inconsistent data and contacts you anyway. Either way, you've discovered the lack of saga in production.

Transactional Outbox

The local write and the "we promise to publish this event" are made atomic by putting them in the same DB transaction. The transaction inserts the data row and a row in an outbox table. After commit, a relay process polls the outbox and publishes to the bus, then marks the row sent.

Why it works: if the COMMIT succeeds, the outbox row is durable. The event will be published, even if the relay crashes and restarts. If the COMMIT fails, neither the data nor the event exist. The atomicity is one-service-local; the cross-service consistency is eventually-via-events.

Failure mode without it: you write the row, then try to publish the event, and the publish fails — now you have data without a corresponding event downstream. Or you publish first and then the write fails — now you have an event for data that doesn't exist. Both lead to drift; both are painful to debug because the symptoms appear days later in a different module.

Inbox

The mirror image. The receiving service stores incoming event IDs in an inbox table, in the same transaction that processes the event. Re-delivery of the same event sees the ID already present and skips.

Why it works: message buses provide at-least-once delivery (you might get the event twice if the consumer crashed mid-processing). Without inbox, that means double-processing. With inbox, the second processing is a no-op.

Failure mode without it: the consumer crashes after processing but before acknowledging the message. Bus redelivers; consumer processes again; you have double payments, double user creations, double whatever-the-event-meant. The naive fix ("make every operation idempotent") works until it doesn't — some operations are naturally side-effecting in ways that aren't trivially idempotent.

Two-Phase Commit (2PC / XA)

The synchronous all-or-nothing answer. Each service prepares its transaction; a coordinator collects all the prepares; commits all if all succeeded, rolls back all otherwise.

Postgres supports it (PREPARE TRANSACTION). It's also the answer most modern systems reject: prepared transactions hold their locks until the coordinator decides, the coordinator is a single point of failure, and a crashed coordinator leaves participants blocked on in-doubt transactions.

Why it's worth knowing about anyway: someone on your team will eventually suggest it. Having the rejection arguments ready saves an hour of meeting.

When you'd combine them

For most real cross-service writes, the answer is outbox + inbox as the delivery substrate plus saga as the coordination shape. The outbox guarantees the event gets published. The inbox guarantees the event gets processed exactly once. The saga decomposes the multi-step operation and provides the compensation story. 2PC stays in your back pocket for the rare case where the rest genuinely doesn't fit (and that case is much rarer than people think).

If you nodded along to all of that — skip to "The pattern that doesn't survive process boundaries" below. The rest of this post is about which of these patterns you reach for to migrate a specific in-process forwardRef site, and the honest cost of each.

The pattern that doesn't survive process boundaries

In a modular monolith, the cleanest way to handle a cross-module atomic write is something like this: the use case lives in one module, depends on repository interfaces rather than concrete classes, and the module wiring uses forwardRef(() => OtherModule) so the interfaces resolve against the data-owning module's implementations. Everything happens inside one runInTransaction call. If any step returns failure, the sentinel pattern throws inside the transaction and TypeORM rolls everything back.

This works beautifully. We use it in one place in our codebase: SetupCommunityUseCase atomically creates a Community (community-management), a Company, an Address, and a BankAccount (user-management). If the bank account insert fails, the community, company, and address all roll back. No saga, no compensation, no eventual consistency. One use case, one transaction, all-or-nothing.

It also pins those two modules to the same process, forever. Until you redesign.

Why shared transactions can't cross process boundaries

The mechanics are blunt: a database transaction lives on a single connection, and a connection lives inside a single process. There is no portable way to hand an open transaction to another process.

When the community-management service inserts a Community row, it does so on its connection, in its transaction. When the user-management service — now extracted — needs to insert the BankAccount row, it does so on its own connection, in its own transaction. There is no shared transaction. There is no "all-or-nothing" guarantee at the wire level.

Several intuitions that don't help:

  - "Just share the database." Two services pointed at one database still open two connections, and two connections mean two transactions.
  - "Pass the transaction over the wire." Transactions aren't addressable; there is no token you can send that lets another process's connection join yours.
  - "Make the call synchronous." Synchrony changes latency and ordering, not atomicity: the remote write can still succeed while the local one rolls back, or vice versa.

So the question is: what gives you cross-service consistency, and what does each option cost?

Option 1: Saga with compensation

Decompose the atomic operation into a sequence of local transactions, each in its own service. Each step has a compensating action that undoes its effect. If step 4 fails, you fire compensations 3-2-1 in reverse.

For our SetupCommunityUseCase if user-management were extracted:

  1. user-management service: create Company → on later failure, compensate by deleting it
  2. user-management service: create Address → on later failure, compensate by deleting it
  3. user-management service: create BankAccount → on later failure, compensate by deleting it
  4. community-management service: create Community → if successful, the saga commits; if it fails, fire compensations 3-2-1
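
The sequence-plus-compensation shape can be sketched as a minimal orchestrator (the `Step` type and `runSaga` are illustrative names, not a saga framework):

```typescript
// Each saga step pairs an action with a compensating action.
// On failure, completed steps are compensated in reverse order.
type Step = { name: string; act: () => void; compensate: () => void };

function runSaga(steps: Step[]): { status: "committed" | "rolled_back"; log: string[] } {
  const log: string[] = [];
  const done: Step[] = [];
  for (const step of steps) {
    try {
      step.act();
      log.push(`did:${step.name}`);
      done.push(step);
    } catch {
      // Fire compensations in reverse order of completion (3-2-1)
      for (const prev of done.reverse()) {
        prev.compensate();
        log.push(`undid:${prev.name}`);
      }
      return { status: "rolled_back", log };
    }
  }
  return { status: "committed", log };
}
```

A production orchestrator would also persist the saga state between steps, so a crashed coordinator can resume or compensate after restart.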

Two implementation styles, both real:

Orchestrated saga: one service (or a dedicated coordinator service) drives the sequence and tracks state. Easier to reason about, but the coordinator becomes a coupling point.

Choreographed saga: services react to each other's events, each one knowing what to do next. Less coupling, but the event topology becomes hard to keep in your head after about five steps.

The cost ledger:

  - Every step needs a compensating action, and the compensation paths need their own tests; they run exactly when things are already going wrong.
  - Consistency becomes eventual: there is a window where part of the work is visible and the rest isn't, and the UI has to present that honestly.
  - Saga state has to be persisted, survive restarts, and be observable when a run gets stuck.
  - Compensations can themselves fail, which drags in retries, idempotency, and ultimately a human escalation path.

Option 2: Transactional Outbox

The atomic write becomes local to one service. In the same transaction that writes your data, you also INSERT INTO an outbox table the event(s) that need to fan out to other services.

BEGIN;
  INSERT INTO communities (id, ...) VALUES (...);
  INSERT INTO outbox (event_type, payload, target)
    VALUES ('CommunityCreated', '{...}', 'user-management');
COMMIT;

A separate relay process polls the outbox and publishes events to the bus (Redis Streams, Kafka, whatever your transport is). Once published, the row is marked as sent (or moved to an archive table). Receiving services consume the events asynchronously.
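
A relay tick can be sketched against in-memory stand-ins (in production the store is the outbox table and `publish` targets your actual bus; `OutboxRow` and `OutboxRelay` are illustrative names):

```typescript
// One relay tick: publish every unsent outbox row, then mark it sent.
type OutboxRow = { id: number; eventType: string; payload: string; sentAt?: Date };

class OutboxRelay {
  constructor(
    private store: OutboxRow[],
    private publish: (row: OutboxRow) => void,
  ) {}

  // If the relay crashes between publish and mark-as-sent, the row is
  // re-published on the next tick, so delivery is at-least-once, never lost.
  tick(): number {
    const unsent = this.store.filter((r) => !r.sentAt);
    for (const row of unsent) {
      this.publish(row);
      row.sentAt = new Date();
    }
    return unsent.length;
  }
}
```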

Why this is good:

  - There is no dual-write problem: the data and the promise to publish commit or roll back together.
  - The relay can crash at any point; unsent rows are still in the table and get picked up on restart.
  - The outbox table doubles as an audit trail of what was published and when.

Why it's not magic:

  - Delivery is at-least-once: a crash between publish and mark-as-sent means the event goes out twice, so consumers need the inbox pattern or equivalent idempotency.
  - Consistency is eventual: there is a lag between the commit and downstream processing, bounded by the relay's polling interval.
  - The outbox table grows forever unless you give it a cleanup or archival strategy.

For our SetupCommunity case: the outbox alone is not enough, because the operation spans writes to both services' tables, not just one service writing and the others reading. You'd combine it with a saga: each saga step uses outbox to durably publish its "I did my part" event, and the coordinator drives the sequence.

Option 3: Inbox (the complement to outbox)

The receiving service stores the IDs of incoming events in an inbox table, in the same transaction that processes the event:

BEGIN;
  -- ON CONFLICT DO NOTHING: if we've seen this event before, the insert is a no-op
  INSERT INTO inbox (event_id) VALUES ('evt-12345') ON CONFLICT DO NOTHING;
  -- application code checks the affected-row count:
  --   0 rows inserted -> already processed; skip the handler and just COMMIT
  --   1 row inserted  -> first delivery; process the event:
  INSERT INTO bank_accounts (...);
COMMIT;

This catches the "consumer crashes after processing but before acknowledging the message" failure mode. Without it, your message bus redelivers the event, you process it again, and your domain has duplicates.

Outbox + inbox is the canonical combination. The outbox guarantees "the event will be published." The inbox guarantees "the event will be processed exactly once." Together they convert at-least-once delivery (which is all most message buses provide) into effectively-exactly-once processing.
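
The receiving half can be sketched the same way (a `Set` stands in for the inbox table here; in production the ID check and the processing share one DB transaction, as in the SQL above):

```typescript
// Inbox-guarded consumer: event IDs seen so far make redelivery a no-op.
class InboxConsumer {
  private seen = new Set<string>();
  public processed: string[] = [];

  // Returns true if the event was processed, false if it was a duplicate.
  handle(eventId: string, payload: string): boolean {
    if (this.seen.has(eventId)) return false; // redelivery: skip silently
    this.seen.add(eventId);
    this.processed.push(payload); // "process the event"
    return true;
  }
}
```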

Option 4: Two-Phase Commit (2PC / XA)

Postgres supports prepared transactions. The pattern: each service does its part of the work, then issues PREPARE TRANSACTION 'name' (instead of COMMIT). A coordinator collects all the prepares; only when all return successfully does it tell each service to COMMIT PREPARED. If any prepare fails, everyone runs ROLLBACK PREPARED.
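
The control flow can be modeled with in-memory participants (a toy sketch; real participants would issue PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED over their own Postgres connections):

```typescript
// Toy two-phase commit: phase 1 collects prepare votes; phase 2 commits
// everyone only if every vote was yes, otherwise rolls back the prepared set.
interface Participant {
  prepare(): boolean;      // durably stage the work; vote yes/no
  commitPrepared(): void;
  rollbackPrepared(): void;
}

function twoPhaseCommit(participants: Participant[]): "committed" | "rolled_back" {
  const prepared: Participant[] = [];
  // Phase 1: collect votes
  for (const p of participants) {
    if (p.prepare()) {
      prepared.push(p);
    } else {
      // Any "no" vote aborts everyone who already prepared
      for (const q of prepared) q.rollbackPrepared();
      return "rolled_back";
    }
  }
  // Phase 2: unanimous yes; commit all
  for (const p of prepared) p.commitPrepared();
  return "committed";
}
```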

This is the synchronous all-or-nothing answer. It's also the answer most modern systems reject. The reasons:

  - Prepared transactions hold their locks until the coordinator resolves them; a slow or crashed coordinator blocks every participant.
  - Recovery is manual: Postgres keeps an in-doubt prepared transaction, and its locks, around until someone issues COMMIT PREPARED or ROLLBACK PREPARED by hand.
  - Availability couples: the operation is only as available as the least available participant, which defeats a major reason to extract in the first place.
  - Tooling support is thin: Postgres ships with max_prepared_transactions = 0 by default, connection poolers in transaction mode don't play well with it, and most ORMs don't expose it.

I include 2PC for completeness. If you're seriously considering it, the better answer is usually "don't extract" or "extract differently so the atomicity stays local."

What I'd actually do (specific to ECM)

If we extracted user-management from community-management tomorrow, the migration for SetupCommunityUseCase would be:

  1. Orchestrated saga, driven from community-management (because that's where the use case originates). The saga state lives in community-management.
  2. Each saga step uses outbox at the sending side. The community-management service's outbox holds "PleaseCreateCompany", "PleaseCreateAddress", "PleaseCreateBankAccount", and the eventual "SagaCompletedSuccessfully" or "SagaRolledBack".
  3. user-management uses inbox on the receiving side. Each command is processed exactly once. Each response is published back via user-management's own outbox.
  4. Compensations are local to user-management: "DeleteCompany", "DeleteAddress", "DeleteBankAccount". All idempotent. None involve external side effects (no money moved, no emails sent — we deliberately ordered the saga so external side effects happen after community creation, which is the last step).
  5. User-visible behavior during partial failure: the API returns a 202 Accepted with a saga ID. The frontend polls or subscribes to a status endpoint. "Creating your community... done" or "Something went wrong; we've rolled back; please try again."
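
The idempotency requirement in step 4 boils down to compensations that treat "already undone" as success (`CompanyStore` and `deleteCompany` are illustrative names):

```typescript
// Idempotent compensation: deleting an absent row is a no-op, not an error,
// so a redelivered "DeleteCompany" command converges to the same state.
class CompanyStore {
  private rows = new Map<string, string>();

  create(id: string, name: string): void {
    this.rows.set(id, name);
  }

  // Compensation for "create Company": safe to retry any number of times.
  deleteCompany(id: string): void {
    this.rows.delete(id); // Map.delete on a missing key simply returns false
  }

  has(id: string): boolean {
    return this.rows.has(id);
  }
}
```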

The migration is non-trivial but bounded. ~2 weeks of work for a competent engineer, much of it in writing tests for the compensation paths. The hardest part is shifting the team's mental model from "this either fully succeeds or fully rolls back synchronously" to "this is eventually consistent within seconds, with defined failure modes and visible saga state."

The honest cost of extraction

Extraction is a trade. You exchange:

  Monolith gives you                          Extraction gives you
  One service to deploy                       Independent deployment cadence per service
  One process to monitor                      Independent scaling per service
  Synchronous transactions across modules     Failure isolation — one service down doesn't kill the other
  One stack trace per request                 Team autonomy at the service boundary
  One CI/CD pipeline                          Different SLAs per service

This is worth it when:

  - The module has a genuinely different scaling or failure profile (CPU-heavy work, crash-prone dependencies, different latency budgets).
  - A natural async boundary already exists, so extraction rides on a queue instead of forcing a saga.
  - A separate team owns the module and the shared deployment cadence is the bottleneck.

It's not worth it when:

  - The modules co-vary: most meaningful changes touch both sides of the would-be boundary.
  - They scale with the same factors, so independent scaling buys nothing.
  - The boundary cuts straight through your atomic writes, and the consistency machinery would cost more than the autonomy returns.

The position we've actually taken

For ECM specifically: we extracted PDF rendering because PDFs are CPU-heavy, fail differently from main app code (Playwright crashes, font issues, timeout patterns), and benefit from independent scaling. The communication boundary is BullMQ jobs — community-management enqueues "please render this PDF", the PDF service consumes the job, the result comes back as an event. No shared transaction. No saga. The natural async boundary made extraction cheap.

We have not extracted user-management, community-management, energy-management, or financial-management. They co-vary with each other, scale with the same factors (per-community user count, per-community billing cycles), and benefit from the operational simplicity of one deployment. The single forwardRef site between community-management and user-management is a documented, deliberate coupling. The day we decide to extract, we have the migration path laid out above. We don't pretend it's free; we don't ban ourselves from using forwardRef because "someday we might extract."

This is what modular monolith with extraction-readiness actually means: every coupling is a deliberate decision, every extraction is a foreseeable migration, neither is imposed by ideology. The forwardRef pattern is a tool. The saga + outbox + inbox combination is also a tool. Different tools, different costs, both legitimate. Choose based on the actual deployment boundary you actually have, not the one you might have someday.

The bottom line

If your team uses forwardRef across module boundaries:

  - Keep the list of sites short, documented, and deliberate; each one pins two modules to the same process.
  - Know which sites depend on a shared transaction; those are the expensive ones to unwind later.
  - Write down the extraction plan (saga shape, event contracts, compensations) even while the answer to "extract?" is "not yet."

If you are extracting:

  - Inventory every cross-module transaction first; each one becomes a saga or gets redesigned out of existence.
  - Put the outbox + inbox substrate in place before moving any code.
  - Budget most of the effort for compensation paths and their tests, not the happy path.
  - Change the API contract honestly: a synchronous all-or-nothing call becomes an accepted-and-in-progress operation with visible status.

The forwardRef pattern is durable in-process and structurally impossible cross-process. The replacement patterns are well-understood and well-documented across the industry. The migration is foreseeable. All of this can coexist in the same codebase, on the same team, in the same calendar year. That's what extraction-ready modular monolith actually means in practice — not "every module will eventually be a service," but "every module could be a service, with a clear plan and an honest cost, and we'll choose deliberately when the moment comes."