Postgres derailed indexes. Part 1

There seems to be something wrong with our bloody index today!

AlekseyL
3 min readNov 23, 2017

I will not give a speach about indexes in general or indexes special features in the PostgreSQL. Lets assume you know what indexes are, what is their value in delivering right amount of a needed data within a suitable amount of time. I just want to bring up a couple of interesting examples on how you can easily derail indexed behaviour in the PostgreSQL.

Yo-ho and bottle of GIN

Data structure

Lets start with a table named cards which have two columns a text title and a foreign key of collection_id.

CREATE TABLE cards (
title varchar NOT NULL,
title_tsv tsvector NOT NULL,
collection_id bigint NOT NULL,
id serial
);

Now we need to be able to do full text search (FTS) on a title, search for all cards for a given collection_id and their combination: FTS inside given collection. So lets create trigger and two indexes:

CREATE TRIGGER title_tsvupdate 
BEFORE INSERT OR UPDATE ON examples
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(title_tsv, 'pg_catalog.english', title );
CREATE INDEX index_cards_collection_id ON examples(collection_id);
CREATE INDEX index_examples_fts_with_collection ON examples USING
GIN( title_tsv, collection_id );

In a real world I’m using custom dictionaries and a custom text search configuration, a more complex trigger and much more. But this is already enough to mess with Posgres GIN-index, the way I want to.

Data set size: 1.1M of cards is ‘equally’ distributed over 1K collection.

Explain/Analyze/Surprise

So lets make some search with analyze, and review some results.

Lets search collection:

EXPLAIN ANALYZE SELECT id, title  FROM "cards"
WHERE collection_id = 624
LIMIT 10;

QUERY PLAN
--------------------------------------------------------------------Limit (cost=0.43..40.21 rows=10 width=57) (actual time=0.255..9.145 rows=10 loops=1)
-> Index Scan using index_cards_collection_id on cards (cost=0.43..4536.15 rows=1140 width=57) (actual time=0.254..9.133 rows=10 loops=1)
Index Cond: (collection_id = 624)
Planning time: 0.201 ms
Execution time: 9.183 ms

Lets search for all words starting with ‘o’ in same collection :

broken plan

What’s wrong with this bloody index, why postgres doesn’t use both condition on one index, the one we esspecially created for this purpose?

Look the next example to see whats really broke here:

fixed plan

The only difference is an exact data type of the given collection id! In the right one query I force the datatype of 624 to be a bigint. And this small change gives us 4 times boost in expectation estimations and real execution time!

As you can see in this three examples when we are using simple b-tree index, like in the first one example, the Postgres doesn’t care about strict type in search condition, but when it comes to GIN you must be more specific, or the Postgres will not go for it.

In this example even if you remove b-tree index on collection id, and start using GIN instead, Postgres will not use GIN unless you specify exact type!

Perfomance issue

To be honest the last two plans perfomance is highly dependent on data distribution, in some cases they may give pratically same expectations and execution time, in some rare cases you can get even more than 4x times boosts.

How to shoot yourself in leg with railsgun

When you can step into this trouble? Only when you operating raw SQL, usually your ORM will help you to avoid this problem.

Couple examples for rails users when you may get yourself into trouble:

ActiveRecord::Base.connection.execute( Relation.scope.to_sql )

Surprise! When you do to_sql you do it without any adapter specific knowledge, so the datatype cast doesn’t applies to substitutions if any.

Second way to harm yourself without knowing:

Card.by_text('find me').where("collection_id = #{624}")

Yes, I know about sql injections, yes, this is bad way, but there are cases when ORM cannot do sanitation for you. For example rails doesn’t sanitize selects, and doesn’t provide any nice built in way to do it, so when you need to do it manually, don’t forget to typecast.

Resume

I hope you learn something about GIN today.

See you in second part of Postgres derailed indexes, which I named Apple-pie ORDER BY!

May the GIN be with you.

--

--

AlekseyL

Chief Software Architect / CTO, Ruby and PostgresSQL fan.