Hey, as a follow-up to Part 2, I recently started reviewing indexes in one of my prod databases. It turned out to be very educational.
The database has been developed for a few years, with thousands of new features and migrations applied along the way. I was surprised at how many indexes had been created and never used, or had not been used for a long time.
Why is index management so important? Because indexes are costly! The costs of indexing:
- Indexes eat storage, and you need high-speed, highly available storage for that.
- Index maintenance takes CPU and extra IOPS on every INSERT/UPDATE/DELETE query, and adds latency to these queries.
- Indexes take time to rebuild when restoring the database during incident recovery.
Any index is a trade-off: on one side it accelerates queries and reduces IOPS, on the other it creates additional workload on every write.
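The write-side cost is easy to see on a toy table. Here is a minimal sketch (the table and index names are made up): run the same bulk INSERT before and after adding an index and compare the timings and buffer counts.
-- Toy example: observe the extra work an index adds to INSERTs
CREATE TABLE bench_events (id bigint, payload text);
-- Baseline: no extra indexes
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO bench_events
SELECT g, md5(g::text) FROM generate_series(1, 100000) AS g;
CREATE INDEX bench_events_payload_idx ON bench_events (payload);
-- Same INSERT again: more buffers touched and higher latency,
-- because every new row must also be added to the index
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO bench_events
SELECT g, md5(g::text) FROM generate_series(100001, 200000) AS g;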
During the last database review, I found 50 GB of unused indexes in a 600 GB database. How did I find that?
How to find unused indexes:
-- Unused indexes
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 -- No scans
ORDER BY pg_relation_size(indexrelid) DESC;
-- Total size of unused indexes
SELECT
    pg_size_pretty(SUM(pg_relation_size(indexrelid))) AS indexes_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
The unused indexes are sorted by size, so the biggest ones are the priority to fix first. The total size of unused indexes shows how bad the overall situation is.
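Two caveats worth keeping in mind: idx_scan only counts scans since the statistics were last reset, and indexes that back PRIMARY KEY or UNIQUE constraints should not be dropped even if they are never scanned. Here is a variant of the query above that excludes constraint-backed indexes (a sketch, adjust to your needs):
-- Unused indexes, excluding those that back PRIMARY KEY / UNIQUE constraints
SELECT
    s.relname AS table_name,
    s.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;
-- How long the statistics have been accumulating
SELECT stats_reset FROM pg_stat_database WHERE datname = current_database();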
How to find indexes that cover the same columns:
-- Duplicate indexes cover the same columns
SELECT
    indrelid::regclass AS table_name,
    array_agg(indexrelid::regclass) AS duplicate_indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;
The results can be quite revealing and are another good reason to review the indexes.
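Before dropping one of a pair, it is worth comparing the full definitions, because indexes grouped by indkey can still differ in other ways (partial WHERE clause, operator class, uniqueness). A quick check for a single table (orders is a hypothetical name):
-- Compare the full definitions of all indexes on one table
SELECT
    indexrelid::regclass AS index_name,
    pg_get_indexdef(indexrelid) AS definition
FROM pg_index
WHERE indrelid = 'orders'::regclass;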
The next query gives a sorted view of the idx_scan / (seq_scan + idx_scan) percentage for each table. It is a good occasion to review the index coverage of your tables.
-- Index usage vs seq scan
SELECT
    relname AS table_name,
    seq_scan, seq_tup_read,
    idx_scan, idx_tup_fetch,
    round(100 * idx_scan::numeric / NULLIF(seq_scan + idx_scan, 0), 2) AS index_usage_percent
FROM pg_stat_user_tables
ORDER BY index_usage_percent DESC NULLS LAST;
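Small tables are often scanned sequentially by design, so a low percentage there is not a problem. A possible refinement (the 10 MB cutoff is arbitrary) that looks only at larger tables and puts the worst coverage first:
-- Index usage vs seq scan, large tables only, worst coverage first
SELECT
    relname AS table_name,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    seq_scan, idx_scan,
    round(100 * idx_scan::numeric / NULLIF(seq_scan + idx_scan, 0), 2) AS index_usage_percent
FROM pg_stat_user_tables
WHERE pg_relation_size(relid) > 10 * 1024 * 1024 -- arbitrary 10 MB threshold
ORDER BY index_usage_percent ASC NULLS LAST;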
An example query for a migration that drops an unused index:
DROP INDEX IF EXISTS index_name;
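On a busy table it is usually safer to drop the index concurrently, so the DROP does not hold an exclusive lock on the table for its whole duration. Note that CONCURRENTLY cannot run inside a transaction block, which matters for migration tools that wrap each migration in one:
-- Does not block reads/writes on the table while the index is dropped
DROP INDEX CONCURRENTLY IF EXISTS index_name;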
The reasons why this index "crisis" happened:
- a lot of features were added to and later removed from the application; indexes were created for them, but we forgot to review and remove those indexes;
- a lot of indexes were created because it seemed obvious that developers would filter or join by these columns, but in real life the Postgres planner does not want to use them;
- the indexes had never been reviewed.
Conclusions:
- Indexes are needed to run queries fast, but they cost CPU on INSERTs/UPDATEs/DELETEs and take extra storage space.
- An index review should be included in regular database audit and maintenance procedures.
- The queries above are extremely useful for getting an overview of the current index situation in your Postgres database.
PS: I use AI a lot as a learning partner or an advisor, but not in production. No AI was used to write this article.