My preferred way of including optional where clauses in
Ecto queries is compact – but can
eventually trigger unexpected performance problems. Investigating this led me
down a rabbit hole of Postgrex, prepared
statements and PostgreSQL heuristics.
Concisely Including Optional Where Clauses in Ecto Queries
An Elixir application I’m working on involves managing a table of ‘medics’,
i.e. doctors, dentists, nurses etc.. The medics table is stored in a
PostgreSQL database and features columns like
| Column Name | Type | Examples |
|---|---|---|
first_name |
text | John |
last_name |
text | Doe |
birth_year |
integer | 1979 |
state |
text | active, retired, etc. |
| … | … | … |
A common operation is presenting a list of medics to the user, and thus the application defines a function for listing medics in that table. However, it’s usually not desirable to get all medics. Instead, only medics who are not retired should be returned.
To support this, I like to define a function which supports optional flags to influence the database query, akin to this:
defmodule Medics do
# ...
def list_medics(opts \\ []) do
opts = Keyword.validate!(opts, include_retired_medics: false)
from(medic in Medic,
where: ^opts[:include_retired_medics] or medic.state != "retired"
)
|> Repo.all()
end
# ...
end
Note how an (optional) :include_retired_medics flag is supported such that
callers can use
# Get only medics who are not retired
Medics.list_medics()
# Get all medics, including retired ones
Medics.list_medics(include_retired_medics: true)
This way of optionally including the medic.state != "retired" check is very
compacty. It relies on the query planner’s Boolean simplification: when
opts[:include_retired_medics] is false, the WHERE clause reduces to
state != 'retired'. When the flag is true, the clause reduces to just
true, so any medic is returned regardless of state.
To support this, the database features an index on the medics.state column
such that it’s efficient to query medics with just a specific state.
This worked great – until we noticed that PostgreSQL was sometimes ignoring
the index on the state column entirely, falling back to a sequential scan.
What was going on?
How Ecto Turns Queries Into Prepared Statements
Any Ecto query such as
from(medic in Medic,
where: ^opts[:include_retired_medics] or medic.state != "retired"
)
gets translated into a SQL statement similar to
SELECT ... FROM medics WHERE $1 OR state != 'retired'
Note the $1 here! This is a hint that
prepared statements
are at play:
When Ecto (via Postgrex) executes a statement for the first time, it uses
PostgreSQL’s extended query
protocol
to create a prepared statement. This is conceptually equivalent to issuing a
PREPARE: it
assigns a name to the statement (Ecto uses names of the form ecto_5122,
where the number is derived from System.unique_integer([:positive])) and
makes the database parse, analyze and rewrite the statement, storing the result
under that name.
A statement prepared like this can then be executed later via the equivalent of
an EXECUTE, which
runs the prepared statement directly, skipping the parsing/analyzing/rewriting
steps entirely.
Prepared statements can take parameters: values that are substituted into the
statement when it is executed. When creating the prepared statement, the
statement refers to parameters by position, using $1, $2, etc.. And that’s
where the $1 above is coming from.
That being said, the Most Significant Bit of all this is:
The “name” of the prepared statement (e.g. ecto_234413523) is solely based on
the “normalised” Ecto query. As part of a process called “query normalization”,
Ecto strips out the literal parameter values (the things you pin with ^) and
computes a statement name based on what’s left – the query’s shape: which
tables, which joins, which clauses, in what order, with placeholders instead of
values. The same query shape (even if parameters have different values!) will
use the same prepared statement name!
The blog post Under the Hood of Ecto has a great discussion of what happens inside Ecto as part of all this.
How PostgreSQL Deals With Prepared Statements
Executing any statement requires making a ‘query plan’ in the database, and
prepared statements are no exception. However, since prepared statements are
parameterized, it’s not immediately clear what the query plan should be! So
PostgreSQL applies some clever logic by default. The PREPARE Notes
section
explain:
A prepared statement can be executed with either a generic plan or a custom plan. A generic plan is the same across all executions, while a custom plan is generated for a specific execution using the parameter values given in that call. Use of a generic plan avoids planning overhead, but in some situations a custom plan will be much more efficient to execute because the planner can make use of knowledge of the parameter values.
[..]
The current rule for this is that the first five executions are done with custom plans and the average estimated cost of those plans is calculated. Then a generic plan is created and its estimated cost is compared to the average custom-plan cost. Subsequent executions use the generic plan if its cost is not so much higher than the average custom-plan cost as to make repeated replanning seem preferable.
I.e. if PostgreSQL finds that the generic query plan (which must produce correct results for any combination of parameters) is estimated to be no more expensive than the average cost of the five custom query plans, the database will use the generic query plan for all subsequent executions – making prepared statements even more efficient!
When Generic Plans Hurt
However, it’s this generic plan which may hurt performance. Coming back to our initial example, consider what happens:
- The first five executions invoke
list_medics(), i.e. the$1placeholder assumes the valuefalse. Each custom plan seesfalse OR state != 'retired', simplifies tostate != 'retired', and decides to use an index scan. The average custom-plan cost reflects this efficient index-based plan. - After the fifth execution, PostgreSQL creates a generic plan. This plan must
be valid for any value of
$1– it cannot assume$1isfalse. Since$1 OR state != 'retired'cannot be simplified when$1is unknown, the generic plan cannot rely on the index and instead uses a sequential scan. - PostgreSQL compares the generic plan’s estimated cost to the average custom-plan cost. If the table is small enough, the generic plan (sequential scan) may not look much more expensive than the average custom plan (index scan), so PostgreSQL adopts the generic plan for all subsequent executions.
From this point on, the prepared statement always uses the generic plan with a
sequential scan, even when called with $1 = false where an index scan would
be far more efficient. PostgreSQL does re-evaluate the plan if planner
statistics are updated (e.g. after
ANALYZE), but as
long as the statistics don’t change dramatically, the generic plan sticks
around. Thanks to connection pooling, a single database session – and its
cached generic plan – can persist for a long time.
As is often the case, there’s a configuration setting to influence this:
plan_cache_mode
can be set to influence the heuristics applied when deciding on query plans for
prepared statements. Instead of tinkering with that though, there’s a better
solution.
A Better Way To Include Optional Where Clauses in Ecto Queries
A simple fix to this problem is to change how the query is constructed client-side, in Elixir.
Stop building conditional where clauses like
from(medic in Medic,
where: ^opts[:include_retired_medics] or medic.state != "retired"
)
but rather only include the where clause if needed by doing e.g.
query = from(medic in Medic, as: :medic)
query =
if opts[:include_retired_medics] do
query
else
where(query, [medic: medic], medic.state != "retired")
end
…or use then/2 or a helper
function or something. Make sure that you produce two separate
Ecto.Query data structures
client-side; this ensures that two separate names for prepared statements are
generated, which in turn each get their own query plans.
For more complex cases where the set of filters is truly dynamic,
Ecto.Query.dynamic/2
is another useful tool – but make sure different filter combinations still
produce different query shapes. The Ecto documentation on composable
queries covers this topic in
more detail.
You can inspect which prepared statements are active in your PostgreSQL session
by querying the
pg_prepared_statements
view, and use
EXPLAIN to see
the query plan PostgreSQL chose for each one.