The Rails Schema Cache

This post was AI-generated. It covers something I find genuinely useful and have been meaning to write up. It has also been reviewed and edited by me.

ActiveRecord needs to know about your database schema (column names, types, defaults, nullability, primary keys, indexes) before it can build SQL queries. Without this metadata, it cannot cast values, construct INSERT statements, or validate attribute assignments.

The schema cache is how Rails avoids querying the database for this information on every request.

What Gets Cached

The SchemaCache class lives in activerecord/lib/active_record/connection_adapters/schema_cache.rb. It maintains five internal hashes, all keyed by table name:

@columns      = {}   # table_name => [Column, Column, ...]
@columns_hash = {}   # table_name => { "col_name" => Column }
@primary_keys = {}   # table_name => "id"
@data_sources = {}   # table_name => true/false
@indexes      = {}   # table_name => [Index, ...]

Each Column object stores the column's name, SQL type, Ruby type, default value, and whether it allows nulls. These objects are what Rails uses to type-cast values when building queries: knowing that age is an integer so it can cast "25" appropriately, or that email is a string so it can quote it.
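To make the casting idea concrete, here is a minimal plain-Ruby sketch. ToyColumn, USERS_COLUMNS, and cast_attribute are hypothetical names invented for illustration; they are not the real ActiveRecord Column class or type system, which are considerably richer.

```ruby
# Hypothetical stand-in for a schema-cache column entry; not the real
# ActiveRecord::ConnectionAdapters::Column class.
ToyColumn = Struct.new(:name, :sql_type, :default, :null, keyword_init: true) do
  # Cast a raw (often string) value based on the column's SQL type.
  def cast(value)
    return nil if value.nil?

    case sql_type
    when "integer" then Integer(value)
    when "string"  then value.to_s
    else value
    end
  end
end

# Shaped like the @columns_hash entry for a single table.
USERS_COLUMNS = {
  "age"   => ToyColumn.new(name: "age", sql_type: "integer", default: nil, null: true),
  "email" => ToyColumn.new(name: "email", sql_type: "string", default: nil, null: false)
}.freeze

# Look up the column by name, then let the column decide how to cast.
def cast_attribute(columns_hash, attr_name, raw_value)
  column = columns_hash.fetch(attr_name) # raises KeyError for an unknown column
  column.cast(raw_value)
end

p cast_attribute(USERS_COLUMNS, "age", "25")
```

The point of the sketch is only that casting is driven by per-column metadata: without the lookup table there is no way to know that "25" should become an integer.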

When the Cache Gets Populated

By default, the schema cache is populated lazily. The first time your code touches a model (User.first, User.where(name: "Alice"), even User.columns), ActiveRecord calls load_schema internally:

# Simplified from activerecord/lib/active_record/model_schema.rb
def columns_hash
  load_schema
  @columns_hash
end

load_schema checks an internal @schema_loaded flag. If false, it queries the database through the connection adapter:

MySQL: SHOW FULL FIELDS FROM users
PostgreSQL: queries the pg_attribute system catalog (joined with pg_attrdef and related catalogs)

The results are stored in the SchemaCache and, unless the cache is explicitly cleared, are not re-queried for the lifetime of the process. Once Rails knows what columns users has, it uses that cached metadata for every subsequent query against that table.

This is per-process. Each Puma worker, each Sidekiq process, each Rails console session has its own schema cache instance.
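The lazy, per-process behavior can be sketched with a toy cache in plain Ruby. FakeAdapter and ToySchemaCache are hypothetical classes made up for this example; the real SchemaCache has more state and a different interface.

```ruby
# Hypothetical fake adapter that counts how many schema queries it receives.
class FakeAdapter
  attr_reader :query_count

  def initialize(tables)
    @tables = tables
    @query_count = 0
  end

  def columns(table_name)
    @query_count += 1
    @tables.fetch(table_name)
  end
end

# Toy per-process cache: one adapter lookup per table, then memory only.
class ToySchemaCache
  def initialize(adapter)
    @adapter = adapter
    @columns = {}
  end

  def columns(table_name)
    @columns[table_name] ||= @adapter.columns(table_name)
  end
end

adapter = FakeAdapter.new({ "users" => %w[id name email] })
cache = ToySchemaCache.new(adapter)

cache.columns("users") # first access goes to the fake adapter
cache.columns("users") # second access is served from memory
puts adapter.query_count
```

Because each process would build its own ToySchemaCache, two processes would each pay the first-access cost once, which mirrors the per-process behavior described above.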

Why Lazy Loading

Loading schema metadata for every table at boot sounds appealing, but it couples application startup to database availability. If the database is under load or temporarily unreachable, eager schema loading would prevent pods from starting at all. In a degraded state, you'd be unable to add capacity at the exact moment you need it most.

Lazy loading means a pod can boot and begin accepting requests without touching the database. The schema gets loaded incrementally as models are first accessed. The tradeoff is that first requests to each model are slower, since they include the schema query overhead.

Even with config.eager_load = true (which eager-loads Ruby files), Rails does not preload schema information. Eager loading the code and eager loading the schema are independent. This was made explicit in this commit, which removed a previous behavior where Rails would opportunistically define attribute methods during boot if a connection happened to already be established. Under that old behavior, an initializer that inadvertently connected to the database would radically change what happened at boot.

The Schema Cache Dump

To avoid the lazy-loading penalty entirely, Rails can serialize the schema cache to a file:

rails db:schema:cache:dump

This writes a YAML file (default: db/schema_cache.yml) containing all the cached metadata. On the next boot, Rails loads this file instead of querying the database. The file extension determines the serialization format:

Extension	Format	Tradeoff
`.yml`	YAML	Human-readable, slower to load
`.dump`	Marshal	Binary, significantly faster to load

The dump includes everything: columns, column hashes, primary keys, data sources, indexes, and the schema version (latest migration timestamp).
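As a rough illustration of choosing a serializer by file extension, the sketch below round-trips a small hash through YAML or Marshal. The helper names dump_schema and load_schema_file and the sample data are assumptions for this example; Rails' own dump contains richer objects and its loading code differs.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helpers: pick a serializer from the file extension, loosely
# mirroring the .yml / .dump split described above.
def dump_schema(path, data)
  if File.extname(path) == ".dump"
    File.binwrite(path, Marshal.dump(data))
  else
    File.write(path, YAML.dump(data))
  end
end

def load_schema_file(path)
  if File.extname(path) == ".dump"
    # Marshal.load must only ever be used on files you produced yourself.
    Marshal.load(File.binread(path))
  else
    YAML.safe_load(File.read(path))
  end
end

# Sample data, made up for this sketch.
SAMPLE_SCHEMA = {
  "version" => 20240101000000,
  "columns" => { "users" => %w[id name email] },
  "primary_keys" => { "users" => "id" }
}.freeze

Dir.mktmpdir do |dir|
  %w[schema_cache.yml schema_cache.dump].each do |file_name|
    path = File.join(dir, file_name)
    dump_schema(path, SAMPLE_SCHEMA)
    puts "#{file_name}: round trip ok? #{load_schema_file(path) == SAMPLE_SCHEMA}"
  end
end
```

The sketch does not measure speed; the claim that Marshal loads faster comes from the table above, not from this code.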

Three configuration options control this behavior:

# Load the dump file at boot (default: true)
config.active_record.use_schema_cache_dump = true

# Defer loading the dump until the connection is first accessed (default: false)
config.active_record.lazily_load_schema_cache = false

# Verify that the dump's version matches the current migration version (default: true)
config.active_record.check_schema_cache_dump_version = true

With use_schema_cache_dump = true and a valid dump file present, Rails does not need to query the database for the metadata of any table covered by the dump. Every access to those models hits the pre-populated in-memory cache.
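The version check can be pictured as a simple guard. The sketch below is an assumption-laden model, not Rails' implementation: usable_dump is a hypothetical helper, and the dump is represented as a plain hash with a "version" key.

```ruby
# Hypothetical loader guard: return the dump when it is safe to use,
# otherwise nil so the caller falls back to lazy loading from the database.
def usable_dump(dump, current_version, check_version: true)
  return nil if dump.nil?          # no dump file was found
  return dump unless check_version # version checking switched off

  dump["version"] == current_version ? dump : nil
end

fresh = { "version" => 20240101000000, "columns" => { "users" => %w[id name] } }

p usable_dump(fresh, 20240101000000).nil? # matching version: dump is used
p usable_dump(fresh, 20240202000000).nil? # mismatch: dump is ignored
```

Turning the check off trades safety for one less dependency on the database at boot, which is why the option exists separately from use_schema_cache_dump.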

How Queries Use the Cache

When you write User.where(name: "Alice").first, here is the path through the schema cache:

1. Table name resolution: User.table_name resolves to "users" via ModelSchema (class name, pluralized, with optional prefix/suffix).
2. Schema loading: The where clause needs column type information. load_schema fires if the schema hasn't been loaded yet, populating @columns_hash from the cache (or from the database on a cache miss).
3. Type casting: Rails looks up name in columns_hash to find its type object (ActiveModel::Type::String). It uses this to cast "Alice" to the appropriate SQL representation.
4. Arel query building: Arel::Table constructs the SQL AST using the resolved table name and type-cast values. The column metadata tells Arel how to quote and bind parameters.
5. SQL generation: The connection adapter's visitor compiles the Arel AST to SQL, roughly: SELECT "users".* FROM "users" WHERE "users"."name" = 'Alice' ORDER BY "users"."id" ASC LIMIT 1 (in practice the values are usually sent as bind parameters).

Without the schema cache, step 2 would require a round-trip to the database before the actual query could even be constructed.
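The steps above can be compressed into a small plain-Ruby sketch. Everything here (TOY_SCHEMA, toy_table_name, toy_cast, toy_where_first) is hypothetical and heavily simplified: real Rails uses ActiveSupport inflections, type objects, Arel, and bind parameters rather than string building.

```ruby
# Hypothetical schema metadata, standing in for the schema cache.
TOY_SCHEMA = {
  "users" => { "id" => :integer, "name" => :string }
}.freeze

def toy_table_name(class_name)
  "#{class_name.downcase}s" # naive pluralization, an assumption for this sketch
end

# Turn a Ruby value into a SQL literal based on the column type.
def toy_cast(type, value)
  case type
  when :integer then Integer(value).to_s
  when :string  then "'#{value.to_s.gsub("'", "''")}'"
  else raise ArgumentError, "unsupported type: #{type}"
  end
end

def toy_where_first(class_name, conditions)
  table = toy_table_name(class_name)   # step 1: table name resolution
  columns = TOY_SCHEMA.fetch(table)    # step 2: schema lookup

  predicates = conditions.map do |column, value|
    type = columns.fetch(column.to_s)  # step 3: find the column type
    literal = toy_cast(type, value)
    %("#{table}"."#{column}" = #{literal})
  end
  where_sql = predicates.join(" AND ")

  # steps 4 and 5 collapsed into plain string assembly
  [
    %(SELECT "#{table}".* FROM "#{table}"),
    "WHERE #{where_sql}",
    %(ORDER BY "#{table}"."id" ASC LIMIT 1)
  ].join(" ")
end

puts toy_where_first("User", { name: "Alice" })
```

Note that step 2 is a pure in-memory lookup here, which is exactly what the schema cache buys the real framework.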

Other frameworks avoid this problem entirely. Django, for example, requires you to define every column explicitly in your model class. The schema metadata is right there in the code, so there's nothing to discover at runtime. Rails takes the opposite approach: models are deliberately thin, and the framework figures out the schema by inspecting the database. The schema cache is the cost of that convenience.

Cache Invalidation

The schema cache does not automatically detect schema changes. If you drop a column, add a table, or change a type, running processes will continue using stale metadata until explicitly told otherwise.

Within migrations, Rails calls clear_data_source_cache!(table_name) to evict a specific table's entries from all five internal hashes. The model-level equivalent is Model.reset_column_information, which sets @schema_loaded = false and clears all cached column data, attribute builders, and the Arel table reference. The next access re-triggers the full schema loading flow.

Neither of these affects other running processes. A migration run in a deploy script clears the cache in that process only. Every other running pod, worker, or console session retains its stale cache until restarted.

This can bite you in subtle ways. If you drop a table or column in a migration and deploy without restarting all processes, any running worker will error the moment it accesses something that no longer exists in the database. The stale metadata doesn't just produce wrong results; it produces exceptions.
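A toy model of the failure mode, in plain Ruby: ToyDatabase and ToyModel are hypothetical classes written for this sketch, and reset_column_information here only imitates the spirit of the real method.

```ruby
# Hypothetical in-memory "database" whose schema can change under a cache.
class ToyDatabase
  def initialize
    @tables = { "users" => %w[id name legacy_flag] }
  end

  def columns(table)
    @tables.fetch(table).dup
  end

  def drop_column(table, column)
    @tables.fetch(table).delete(column)
  end

  # Fails for unknown columns, as a real database would.
  def run_select(table, column_names)
    missing = column_names - @tables.fetch(table)
    raise ArgumentError, "unknown column: #{missing.first}" unless missing.empty?

    :ok
  end
end

class ToyModel
  def initialize(db, table)
    @db = db
    @table = table
    @columns = nil
  end

  # Loaded once, then cached for the life of this object.
  def columns
    @columns ||= @db.columns(@table)
  end

  # Analogous in spirit to Model.reset_column_information.
  def reset_column_information
    @columns = nil
  end

  def select_all
    @db.run_select(@table, columns)
  end
end

db = ToyDatabase.new
model = ToyModel.new(db, "users")
model.select_all                        # warms the cache
db.drop_column("users", "legacy_flag")  # "migration" in another process

begin
  model.select_all
rescue ArgumentError => e
  puts "stale cache: #{e.message}"
end

model.reset_column_information
puts model.select_all
```

In a real deployment the reset step is usually a process restart, since nothing tells the running worker that the schema changed.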

Thread Safety (Rails 7.1+)

Prior to Rails 7.1, SchemaCache held a direct reference to a database connection. Since all connections in a pool shared the same cache instance, threads would overwrite which connection the cache used. Thread A could end up accidentally using Thread B's connection.

Rails 7.1 introduced SchemaReflection as a wrapper (PR #48716). Methods now accept a connection pool parameter and use pool.with_connection to obtain a connection only when needed:

class SchemaReflection
  def columns(pool, table_name)
    cache(pool).columns(pool, table_name)
  end
end

The cache itself is still not internally synchronized. Concurrent threads can trigger duplicate schema queries on a cold cache. This is accepted as harmless, since each duplicate query returns the same metadata and the later write simply overwrites an identical entry.
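The shape of the change can be illustrated with a toy pool in plain Ruby. ToyPool, ToyConnection, and ToyReflection are hypothetical; only the idea of passing the pool in and borrowing a connection inside with_connection comes from the description above.

```ruby
# Hypothetical pool that lends out a connection only for the duration of a block.
class ToyPool
  def initialize(connection)
    @connection = connection
    @lock = Mutex.new
  end

  def with_connection
    @lock.synchronize { yield @connection }
  end
end

class ToyConnection
  def columns(_table_name)
    %w[id name] # pretend this came from the database
  end
end

# The cache holds no connection reference; callers pass the pool in.
class ToyReflection
  def initialize
    @columns = {}
  end

  def columns(pool, table_name)
    # Not synchronized: two threads may both miss and both query, but they
    # store identical values, so the duplicate work is harmless.
    @columns[table_name] ||= pool.with_connection { |conn| conn.columns(table_name) }
  end
end

pool = ToyPool.new(ToyConnection.new)
reflection = ToyReflection.new

threads = 4.times.map { Thread.new { reflection.columns(pool, "users") } }
results = threads.map(&:value)
puts results.uniq.length
```

Because the connection is borrowed per call, no thread can end up holding another thread's connection through the cache, which is the problem the earlier design had.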

Scaling the Schema Cache

For large Rails applications, the schema cache dump becomes part of the deployment artifact. A common pattern:

A pre-deploy job runs migrations and dumps the schema cache
The dump is stored in shared storage (S3, Redis, a ConfigMap)
New pods restore the dump before accepting traffic

This eliminates the lazy-loading penalty across all processes and ensures every pod starts with a consistent, current view of the schema, without requiring database access during boot.
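The restore step in that pattern might look like the following sketch. SeededCache is a hypothetical class, the dump is modeled as a plain hash, and the block passed to the constructor stands in for a real database lookup.

```ruby
# Hypothetical boot-time restore: seed the cache from a dump and fall back to
# the database only for tables the dump does not cover.
class SeededCache
  attr_reader :db_queries

  def initialize(dump, &fetch_from_db)
    @columns = dump.fetch("columns", {}).dup
    @fetch_from_db = fetch_from_db
    @db_queries = 0
  end

  def columns(table_name)
    @columns.fetch(table_name) do
      @db_queries += 1
      @columns[table_name] = @fetch_from_db.call(table_name)
    end
  end
end

restored = { "columns" => { "users" => %w[id name] } } # e.g. fetched from shared storage
cache = SeededCache.new(restored) { |_table| %w[id] }   # fallback lookup, made up here

cache.columns("users")
puts cache.db_queries
```

A table added after the dump was taken would still work through the fallback, at the cost of the lazy-loading penalty the dump was meant to remove.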