Multi-tenant RLS with AsyncLocalStorage in NestJS
Postgres Row-Level Security catches multi-tenant data leaks at the database layer, where they can't be bypassed by application bugs. To make RLS work in a NestJS app you need three things in concert: a tenant context that survives through async boundaries (AsyncLocalStorage), a transaction-bound session that sets SET LOCAL app.tenant_id exactly once per request, and a guard that refuses to start processing if tenant resolution fails. This post is the architecture, not the SQL.
The problem RLS actually solves
If you've built a multi-tenant SaaS, you've written this query a hundred times:
```sql
SELECT * FROM invoices WHERE tenant_id = $1 AND id = $2
```
It works. It also has a failure mode: one missing tenant_id filter is a data leak. One repository method that forgets the predicate, one ORM call that uses .findOne({ id }) instead of .findOne({ id, tenant_id }), one raw SQL string concatenation in a one-off script — and a customer can read another customer's invoices.
The mitigation most teams reach for is discipline. Write code review checklists. Add lint rules. Wrap every repository in a tenant-aware base class. These all help. They also all fail eventually, because the constraint is in the developer's head, not in the system.
Row-Level Security moves the constraint to where it belongs: into the database. You declare a policy on each tenant-scoped table that says "rows where tenant_id = current_setting('app.tenant_id') are visible." Now a query without a tenant_id filter doesn't return the wrong data — it returns only this tenant's data, because the database itself enforces the filter. A developer who forgets to filter cannot leak. A bug in the ORM cannot leak. A raw SQL script connecting as the regular app user cannot leak.
That's the value. The cost is making sure app.tenant_id is set correctly for every authenticated request, exactly once, on the same connection that runs the request's queries. That's where most "RLS is too complicated" stories come from.
The three problems to solve in concert
To make RLS work in NestJS you need to answer three questions:
- How does the tenant ID get from the HTTP request to the SQL session, given that NestJS's middleware → guard → interceptor → controller → service → repository chain is full of async boundaries?
- How do you ensure the `SET app.tenant_id` is on the same connection as the query that needs it?
- How do you make sure unauthenticated requests or admin queries either run with a sentinel tenant or fail loud, never silently fall back to "see everything"?
You can solve them separately, but the solution gets ugly. The clean version solves them with two pieces working together: AsyncLocalStorage for context, and transaction-bound sessions for connection guarantees.
AsyncLocalStorage: the context spine
AsyncLocalStorage is Node's standard library equivalent of thread-local storage. You enter a store, run a function inside it, and any code called during that function — synchronously or after any number of await hops — can read the same store. It is the right primitive for "thread tenant context through one request without passing it as an argument."
In a NestJS app you wrap each request in middleware that opens an ALS store on entry and closes it on exit (middleware rather than an interceptor, because Nest runs guards before interceptors, and the guard described below needs the store to already exist). The store holds tenant_id, user_id, correlation_id, and anything else that's request-scoped. Every layer of your app can read these by importing a small helper:
```ts
const ctx = requestContext.get();
if (!ctx?.tenant_id) {
  return failure('No tenant context');
}
```
This works through async boundaries because Node's ALS is async-resource-aware. A promise chained 12 levels deep, a setTimeout, a database callback — they all see the same store as long as they were initiated inside it.
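For concreteness, here is a minimal sketch of that helper. The names (`requestContext`, `RequestContext`) and store shape are illustrative; all the pattern requires is a single AsyncLocalStorage instance shared between the middleware that opens the store and everything that reads it:

```ts
// request-context.ts -- minimal sketch; names and store shape are illustrative.
import { AsyncLocalStorage } from 'node:async_hooks';

export interface RequestContext {
  tenant_id?: string; // populated by the guard after authentication
  user_id?: string;
  correlation_id?: string;
}

const als = new AsyncLocalStorage<RequestContext>();

export const requestContext = {
  // Run fn with ctx visible to everything it calls, across any await hops.
  run<T>(ctx: RequestContext, fn: () => T): T {
    return als.run(ctx, fn);
  },
  // Returns undefined outside a store -- callers must handle that case.
  get(): RequestContext | undefined {
    return als.getStore();
  },
};
```

Middleware calls `requestContext.run({ correlation_id }, () => next())` at the top of the chain; the guard later fills in `tenant_id` on the same store object.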
What you must not do is rely on AsyncLocalStorage alone for RLS. The store gives you tenant context in JavaScript. The database doesn't see the JavaScript runtime; it sees a connection, and that connection has session variables that persist until something resets them. Setting app.tenant_id via the ALS context only matters if the SQL that runs the query is on a connection where that SET happened.
Transaction-bound sessions: the connection guarantee
Two facts about Postgres sessions matter here:
- `SET app.tenant_id = '...'` persists for the lifetime of the connection. If you set it and return the connection to the pool, the next request that picks up that connection inherits your value.
- `SET LOCAL app.tenant_id = '...'` persists only until the end of the current transaction.
The first behavior is a horror story waiting to happen. The second is the one you want. SET LOCAL inside a transaction binds the tenant context to that transaction, and the transaction ends cleanly on commit or rollback — no leakage across requests, no pollution of pooled connections.
So the pattern becomes: every authenticated request runs inside a transaction, and the first statement in that transaction is SET LOCAL app.tenant_id. Every subsequent query in the same transaction sees the tenant filter applied by RLS policies. When the transaction commits, the session variable resets automatically.
In practice this means an interceptor that wraps the request handler in something like:
```ts
await runInTransaction(async () => {
  const ctx = requestContext.get();
  await dataSource.query(
    "SELECT set_config('app.tenant_id', $1, true)",
    [ctx.tenant_id],
  );
  return handler();
});
```
The true third argument is the local-to-transaction flag. The transaction either commits all the request's writes with the tenant binding intact, or it rolls back and the binding evaporates with it. There is no way to leak the tenant ID into the next request's connection because the session variable doesn't outlive the transaction.
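Here is how that wrapper can look as an actual NestJS interceptor, bridging Nest's rxjs-based handler chain to the promise-based transaction. The class name and wiring are illustrative, and error handling is deliberately minimal:

```ts
// tenant-transaction.interceptor.ts -- illustrative wiring, not a drop-in.
import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { from, lastValueFrom, Observable } from 'rxjs';
import { DataSource } from 'typeorm';
import { requestContext } from './request-context'; // hypothetical helper from earlier

@Injectable()
export class TenantTransactionInterceptor implements NestInterceptor {
  constructor(private readonly dataSource: DataSource) {}

  intercept(_context: ExecutionContext, next: CallHandler): Observable<unknown> {
    // Bridge rxjs -> promise so the whole handler runs inside one transaction.
    return from(this.runScoped(() => lastValueFrom(next.handle())));
  }

  private async runScoped<T>(handler: () => Promise<T>): Promise<T> {
    const ctx = requestContext.get();
    if (!ctx?.tenant_id) {
      // The guard should have rejected already; fail loud, never fall through.
      throw new Error('No tenant context');
    }
    return this.dataSource.transaction(async (manager) => {
      // set_config(..., true) is the SET LOCAL form: scoped to this transaction.
      await manager.query("SELECT set_config('app.tenant_id', $1, true)", [
        ctx.tenant_id,
      ]);
      // Caveat: queries inside handler() must run on this same connection --
      // see the bound-manager gotcha below.
      return handler();
    });
  }
}
```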
One implementation detail that took me too long to find: if your ORM (TypeORM in our case) uses a CLS-bound manager for transactions, you have to make sure repositories use the bound manager, not the default one. Otherwise a single repository call might end up on a different connection than the one where `SET LOCAL` ran. The `typeorm-transactional` package handles most of this; `dataSource.createQueryBuilder` participates correctly while `manager.createQueryBuilder` (on a freshly resolved manager) might not. I learned this debugging polymorphic foreign-key constraint violations — the inserted row was visible in one connection's uncommitted state but not in another's.
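If you go the `typeorm-transactional` route, the setup is roughly the following. I'm sketching against the package's documented surface (`initializeTransactionalContext`, `addTransactionalDataSource`, `@Transactional()`); verify the exact calls against your installed version:

```ts
// Sketch of typeorm-transactional setup; verify names against your version.
import { DataSource } from 'typeorm';
import {
  addTransactionalDataSource,
  initializeTransactionalContext,
  Transactional,
} from 'typeorm-transactional';

// 1. Once, at the very top of main.ts, before Nest bootstraps:
initializeTransactionalContext();

// 2. When the DataSource is created, register it so the package can hand
//    out the CLS-bound manager instead of the default one:
export const registerDataSource = (ds: DataSource) => addTransactionalDataSource(ds);

// 3. Methods under @Transactional() then share one connection; repositories
//    resolved inside them ride the bound manager, so SET LOCAL and the
//    queries that depend on it cannot end up on different connections.
export class InvoiceService {
  @Transactional()
  async approve(id: string): Promise<void> {
    // queries here run on the transaction's connection
  }
}
```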
The guard that refuses to start without a tenant
Middleware opens the ALS store; the guard populates it. (Nest runs guards before interceptors, which is why the store has to be opened in middleware rather than in the interceptor that starts the transaction.) The guard's job is to extract tenant_id from the JWT (or session, or however you authenticate), validate it, and put it in the store. If the request is anonymous or the JWT lacks a tenant claim, the guard must do one of three things, and you have to pick one explicitly:
- Reject the request. 401 or 403. This is the right default.
- Use a sentinel tenant. For health checks, public endpoints, and similar — bind to a tenant that owns no real data.
- Mark the request as cross-tenant admin. Some admin endpoints need to query across tenants. Bind a special "admin" role that RLS policies recognize, and audit-log the cross-tenant access.
What the guard must not do is silently fall through. "If no tenant context, the SQL won't have a tenant filter, but it's fine because we'll add WHERE clauses in code" is the bug class RLS exists to eliminate. The guard must either succeed with a tenant binding, succeed with an explicit non-tenant role, or fail.
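A sketch of that guard, with the reject path implemented and the other two outcomes marked where they would branch. The JWT extraction point (`request.user`) and all names are assumptions about your auth setup:

```ts
// tenant.guard.ts -- sketch of the three explicit outcomes; the reject path
// is implemented, the other two are marked.
import {
  CanActivate,
  ExecutionContext,
  Injectable,
  UnauthorizedException,
} from '@nestjs/common';
import { requestContext } from './request-context'; // hypothetical helper from earlier

@Injectable()
export class TenantGuard implements CanActivate {
  canActivate(context: ExecutionContext): boolean {
    const request = context.switchToHttp().getRequest();
    const tenantId: string | undefined = request.user?.tenant_id;

    const ctx = requestContext.get();
    if (!ctx) {
      // Middleware didn't open a store: a wiring bug, not an auth failure.
      throw new Error('Request context not initialized');
    }

    if (tenantId) {
      ctx.tenant_id = tenantId; // bind the tenant for everything downstream
      return true;
    }

    // Outcomes 2 and 3 (sentinel tenant, audited cross-tenant admin) would be
    // explicit branches here, driven by route metadata -- never a silent default.
    throw new UnauthorizedException('No tenant claim'); // outcome 1: fail loud
  }
}
```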
I make this enforceable by writing the policy on every tenant-scoped table to filter against `current_setting('app.tenant_id', true)`, where the second argument makes Postgres return NULL if the setting is missing instead of throwing. With no binding, every query returns zero rows. A test that runs without setting the binding gets empty results, which is loud enough to catch in development.
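To make the policy shape concrete, here is what that looks like as a TypeORM migration for one table. The table name, policy name, and uuid cast are illustrative:

```ts
import { MigrationInterface, QueryRunner } from 'typeorm';

export class InvoicesRls1700000000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`ALTER TABLE invoices ENABLE ROW LEVEL SECURITY`);
    // FORCE applies the policy to the table owner too; owners otherwise bypass RLS.
    await queryRunner.query(`ALTER TABLE invoices FORCE ROW LEVEL SECURITY`);
    // current_setting(..., true) yields NULL when unset, so an unbound
    // session sees zero rows instead of raising an error.
    await queryRunner.query(`
      CREATE POLICY tenant_isolation ON invoices
        USING (tenant_id = current_setting('app.tenant_id', true)::uuid)
    `);
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`DROP POLICY tenant_isolation ON invoices`);
    await queryRunner.query(`ALTER TABLE invoices NO FORCE ROW LEVEL SECURITY`);
    await queryRunner.query(`ALTER TABLE invoices DISABLE ROW LEVEL SECURITY`);
  }
}
```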
What this gets you
The architectural property you end up with is that tenant isolation is a structural guarantee, not an application invariant. Every authenticated request:
- Has its tenant context in ALS
- Runs inside a transaction
- Has `SET LOCAL app.tenant_id` as its first statement
- Sees only its own data through RLS policies
None of your application code has to remember to filter. None of your repositories need a tenant-aware base class. None of your raw SQL scripts can leak as long as they connect as the app role (RLS applies; superuser bypasses RLS, so don't connect as superuser).
You still need application-level authorization for "this user within this tenant can do X." RLS is about tenant boundaries, not user-within-tenant permissions. But the most catastrophic class of multi-tenant bug — data crossing the tenant boundary — becomes structurally impossible.
Trade-offs to know
Three honest costs:
- Migrations and admin scripts get harder. If you connect as the app user to run a one-off script, RLS applies. Either set the tenant context explicitly at the start of the script (see the sketch after this list), use a sentinel "admin" role with bypass privileges (audited), or connect as a different role for ops work.
- Cross-tenant features need an escape valve. Aggregate reporting across tenants, support tooling that views customer data, internal dashboards — these need a deliberate cross-tenant path. The right answer is a separate role/policy that grants cross-tenant access plus audit logging. Doing this carelessly recreates the original bug class.
- Connection-pool poisoning is impossible only if you stay disciplined. A bare `SET app.tenant_id` (without `LOCAL`) anywhere in the codebase is a foot-gun. I keep this enforceable with lint rules and code review, and by never exposing a "set tenant context" function that doesn't open a transaction first.
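For the first item, a minimal shape for "set the tenant context explicitly" in a one-off script. Names are illustrative; the point is that the `set_config` call and the work share one transaction:

```ts
import { DataSource, EntityManager } from 'typeorm';

// Runs `work` inside one transaction with the tenant binding set first.
export async function runAsTenant(
  dataSource: DataSource,
  tenantId: string,
  work: (manager: EntityManager) => Promise<void>,
): Promise<void> {
  await dataSource.transaction(async (manager) => {
    // is_local = true: the binding dies with this transaction, so a pooled
    // connection can never carry it into the next user of the pool.
    await manager.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    await work(manager);
  });
}

// Usage:
// await runAsTenant(dataSource, someTenantId, async (m) => {
//   await m.query('UPDATE invoices SET status = $1 WHERE id = $2', ['void', someId]);
// });
```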
Why this composes well with the rest of the architecture
This is the part I didn't fully appreciate until I'd been running it for a few months: RLS via ALS composes cleanly with the other patterns I lean on. The Result<T,E> pattern stays unchanged (the SQL just returns fewer rows when no tenant is bound). The workflow engine doesn't know about tenancy — it asks repositories for entities by ID, and RLS quietly enforces "by ID and by current tenant." Cross-module events carry tenant context in the envelope and re-establish the ALS store on the consumer side. Cron jobs run with an explicit "system" role that bypasses RLS for the operations that have to be cross-tenant (and only those operations).
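A sketch of that consumer-side re-establishment, reusing the hypothetical `requestContext` helper from earlier. The envelope shape and handler signature are illustrative, not from a real event bus:

```ts
import { requestContext } from './request-context'; // hypothetical helper from earlier

interface EventEnvelope<T> {
  tenant_id: string;
  correlation_id: string;
  payload: T;
}

// The consumer enters the store before invoking the domain handler, so
// everything downstream -- including the transaction wrapper -- sees the
// same tenant context it would on an HTTP request.
export function consume<T>(
  envelope: EventEnvelope<T>,
  handler: (payload: T) => Promise<void>,
): Promise<void> {
  return requestContext.run(
    { tenant_id: envelope.tenant_id, correlation_id: envelope.correlation_id },
    () => handler(envelope.payload),
  );
}
```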
The tenancy concern becomes a property of the infrastructure layer. Domain code is, for once, oblivious to multi-tenancy. That's the right place for it.