How to Use ProcessList to Debug and Optimize Performance
1) Quick commands
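- SHOW FULL PROCESSLIST; shows a snapshot of current threads with their complete query text (the PROCESS privilege is needed to see other users' threads).
- SELECT * FROM performance_schema.processlist; returns the same data from the Performance Schema (MySQL 8.0.22+) without the global mutex that the classic SHOW PROCESSLIST holds, so it is gentler on a busy server.
- mysqladmin processlist gives a quick view from the shell (add --verbose for full query text).
- SHOW ENGINE INNODB STATUS\G reports InnoDB lock waits, the latest detected deadlock, and open transactions.
- SHOW VARIABLES LIKE 'long_query_time'; and SHOW VARIABLES LIKE 'slow_query_log%'; show the slow-query threshold and whether (and where) the slow query log is written.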
2) What to inspect in the output
- Id — thread id (used with KILL).
- User / Host / db — source of queries.
- Command — Query, Sleep, Binlog Dump, etc.
- Time — seconds in current state (long Time → investigate).
- State — e.g., Locked, Waiting for table metadata lock, Sorting result, Copying to tmp table.
- Info — SQL text (use FULL to avoid truncation).
3) Immediate triage steps (ordered)
- Identify long-running non-Sleep queries: SELECT … FROM information_schema.processlist WHERE command = 'Query' AND time > 10 ORDER BY time DESC;
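- If one query is blocking many others, stop it with KILL QUERY <id>; (ends the statement but keeps the connection open) or KILL <id>; (closes the connection and rolls back its open transaction). Use carefully in production: killing a large write triggers a rollback that can itself take a long time.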
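- For metadata lock / Locked states: find the session holding the lock, then either commit/roll back its transaction or kill its thread. For InnoDB row locks, join information_schema.INNODB_TRX to the processlist on trx_mysql_thread_id = id (or query sys.innodb_lock_waits); for metadata locks, sys.schema_table_lock_waits shows the blocking thread.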
- For many Sleeping connections: audit connection pooling and configuration (lower wait_timeout, or fix the application so it closes or reuses its connections).
- For high connection counts: check max_connections and connection sources; identify clients by Host/User.
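Two of these checks, finding long-running statements and counting connections per client, are easy to repeat offline on a saved snapshot. The Python sketch below assumes each row is a dict keyed by the processlist column names (ID, USER, HOST, COMMAND, TIME, STATE, INFO); fetching the rows is not shown, and the function names, sample rows, and thresholds are hypothetical.

```python
from collections import Counter

# Hypothetical snapshot; real rows would come from information_schema.processlist.
SAMPLE_ROWS = [
    {"ID": 11, "USER": "app", "HOST": "10.0.0.5:53422", "COMMAND": "Query",
     "TIME": 42, "STATE": "Sending data", "INFO": "SELECT id FROM orders"},
    {"ID": 12, "USER": "app", "HOST": "10.0.0.5:53430", "COMMAND": "Sleep",
     "TIME": 900, "STATE": "", "INFO": None},
    {"ID": 13, "USER": "report", "HOST": "10.0.0.9:40110", "COMMAND": "Query",
     "TIME": 3, "STATE": "executing", "INFO": "SELECT COUNT(*) FROM orders"},
    {"ID": 14, "USER": "app", "HOST": "10.0.0.6:51000", "COMMAND": "Query",
     "TIME": 120, "STATE": "Waiting for table metadata lock",
     "INFO": "ALTER TABLE orders ADD COLUMN note TEXT"},
]


def long_running(rows, min_seconds=10):
    """Non-Sleep statements running longer than min_seconds, longest first.

    Same filter as: WHERE command = 'Query' AND time > min_seconds ORDER BY time DESC
    """
    hits = [r for r in rows if r["COMMAND"] == "Query" and r["TIME"] > min_seconds]
    return sorted(hits, key=lambda r: r["TIME"], reverse=True)


def idle_connections(rows, min_idle=300):
    """Connections that have been in Sleep for at least min_idle seconds."""
    return [r for r in rows if r["COMMAND"] == "Sleep" and r["TIME"] >= min_idle]


def connections_by_client(rows):
    """Count connections per (user, client host), dropping the ':port' suffix.

    Socket connections show HOST as plain 'localhost', which is kept as-is.
    """
    counts = Counter()
    for r in rows:
        host = (r["HOST"] or "").rsplit(":", 1)[0]
        counts[(r["USER"], host)] += 1
    return counts


if __name__ == "__main__":
    for r in long_running(SAMPLE_ROWS):
        print(r["ID"], r["TIME"], r["STATE"])
    print(connections_by_client(SAMPLE_ROWS).most_common(3))
```

Grouping by host without the port matters because every TCP connection from the same client gets its own port, so raw HOST values rarely repeat.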
4) Diagnose root cause
- Run EXPLAIN / EXPLAIN ANALYZE on offending queries — look for full table scans (type=ALL), missing indexes, large row estimates.
- Check slow query log and analyze with pt-query-digest to find top offenders.
- Use Performance Schema (the events_statements_* and events_stages_* tables) to see where queries spend time (I/O, sorting, locking); the stage consumers are disabled by default and must be enabled first.
- Check host resources (disk I/O, CPU, memory) and InnoDB buffer pool usage; the bottleneck is often outside the SQL itself (e.g., saturated storage).
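The EXPLAIN checks in this section can be turned into a small lint pass. This Python sketch assumes the plan rows are dicts keyed by the traditional tabular EXPLAIN columns (table, type, possible_keys, key, rows, Extra); the 10,000-row threshold, the warning messages, and the sample plan are illustrative assumptions, not MySQL rules.

```python
# Hypothetical plan for a query such as
#   SELECT id, total FROM orders WHERE status = 'open' ORDER BY created_at
SAMPLE_PLAN = [
    {"id": 1, "select_type": "SIMPLE", "table": "orders", "type": "ALL",
     "possible_keys": None, "key": None, "rows": 250000,
     "Extra": "Using where; Using filesort"},
]


def explain_warnings(plan, big_rows=10_000):
    """Return human-readable warnings for suspicious EXPLAIN rows."""
    warnings = []
    for step in plan:
        table = step.get("table")
        extra = step.get("Extra") or ""
        rows = step.get("rows") or 0
        if step.get("type") == "ALL":
            why = ("candidate index not used" if step.get("possible_keys")
                   else "no candidate index")
            warnings.append(f"{table}: full table scan (type=ALL, {why})")
        if rows >= big_rows:
            warnings.append(f"{table}: large row estimate ({rows})")
        if "Using temporary" in extra:
            warnings.append(f"{table}: internal temporary table")
        if "Using filesort" in extra:
            warnings.append(f"{table}: filesort (ORDER BY not resolved from an index)")
    return warnings


if __name__ == "__main__":
    for w in explain_warnings(SAMPLE_PLAN):
        print(w)
```

Rows from EXPLAIN FORMAT=JSON or EXPLAIN ANALYZE have a different shape and would need their own parsing.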
5) Common fixes
- Add or rewrite indexes (covering indexes for SELECT + ORDER BY).
- Rewrite queries (avoid SELECT *; avoid large OFFSET; use range or keyset pagination).
- Break large transactions into smaller ones; commit promptly.
- Avoid long-running DDL during peak times — use online schema change tools (pt-online-schema-change or native online DDL if supported).
- Tune MySQL config: innodb_buffer_pool_size, tmp_table_size, max_connections, thread_cache_size.
- Use connection pooling and tune wait_timeout to avoid idle connection buildup.
- Offload read traffic to replicas and cache frequent results (Redis, application cache).
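Two of the fixes above, keyset pagination and splitting a large change into short transactions, can be tried out locally. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL with a hypothetical orders table; the pagination and DELETE statements carry over to MySQL essentially unchanged (placeholder syntax varies by driver), but index behavior and costs on a real server will differ.

```python
import sqlite3


def make_db(n=1000):
    """In-memory stand-in for a hypothetical MySQL orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)",
                     ((i, i * 1.5) for i in range(1, n + 1)))
    conn.commit()
    return conn


def page_by_offset(conn, page, size):
    # The server reads and discards page * size rows before returning any.
    return conn.execute("SELECT id, total FROM orders ORDER BY id LIMIT ? OFFSET ?",
                        (size, page * size)).fetchall()


def page_by_keyset(conn, after_id, size):
    # Seeks straight past after_id through the primary key index.
    return conn.execute("SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
                        (after_id, size)).fetchall()


def iterate_keyset(conn, size=100):
    """Yield every page in order, carrying the last id seen as the cursor.

    Assumes ids are positive, so 0 works as the starting cursor.
    """
    last_id = 0
    while True:
        rows = page_by_keyset(conn, last_id, size)
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


def purge_in_batches(conn, max_total, chunk=200):
    """Delete rows with total < max_total, one short transaction per id range."""
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM orders").fetchone()
    deleted = 0
    for lo in range(0, max_id, chunk):
        cur = conn.execute(
            "DELETE FROM orders WHERE id > ? AND id <= ? AND total < ?",
            (lo, lo + chunk, max_total))
        conn.commit()  # end the transaction (releasing its locks) before the next range
        deleted += cur.rowcount
    return deleted


if __name__ == "__main__":
    conn = make_db()
    print(page_by_offset(conn, 5, 100) == page_by_keyset(conn, 500, 100))
    print(sum(len(p) for p in iterate_keyset(conn)), "rows paged")
    print(purge_in_batches(conn, 150.0), "rows purged")
```

OFFSET pages get slower as the page number grows because the skipped rows are still read, while each keyset page costs roughly the same; the batched delete keeps each transaction, and the locks it holds, short.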
6) Tools to automate and monitor
- pt-query-digest (analyze slow logs).
- pt-kill (kills queries that match criteria such as --busy-time; run it with --print alone first to see what it would kill).
- Percona Monitoring and Management (PMM), Grafana + Prometheus dashboards for real-time metrics.
- Performance Schema + sys schema queries for live diagnostics.
7) Emergency playbook (if DB is unresponsive)
- Run SHOW FULL PROCESSLIST; and quickly identify the longest-running queries and any pile-up behind a single blocker (Time is elapsed seconds, not CPU time).
- Generate a kill list of queries running longer than X seconds (e.g., 30 s) and run KILL QUERY on the non-critical ones; leave replication and other system threads alone.
- Check replication lag (if any) and I/O saturation.
- After stabilization, analyze slow queries and implement long-term fixes above.
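The kill-list step of the playbook can be scripted so that someone reviews the statements before running them. This Python sketch only builds KILL QUERY strings and never executes them. It assumes rows are dicts keyed by processlist column names (ID, USER, COMMAND, TIME); the protected-user set (MySQL's internal "system user" and "event_scheduler" threads plus a hypothetical replication account named repl), the sample rows, and the 30-second default are assumptions to adapt.

```python
# Users whose threads should never be killed automatically.
# "system user" and "event_scheduler" are MySQL internal threads;
# "repl" is a hypothetical replication account name.
PROTECTED_USERS = frozenset({"system user", "event_scheduler", "repl"})

# Hypothetical snapshot.
SAMPLE_ROWS = [
    {"ID": 101, "USER": "app", "COMMAND": "Query", "TIME": 95},
    {"ID": 102, "USER": "app", "COMMAND": "Query", "TIME": 12},
    {"ID": 103, "USER": "system user", "COMMAND": "Query", "TIME": 400},
    {"ID": 104, "USER": "report", "COMMAND": "Query", "TIME": 31},
    {"ID": 105, "USER": "app", "COMMAND": "Sleep", "TIME": 600},
]


def kill_statements(rows, min_seconds=30, protected=PROTECTED_USERS, own_id=None):
    """Build KILL QUERY statements for long-running, non-protected queries.

    Sleep, Binlog Dump, and other non-Query commands are skipped, as is
    own_id (the id of the session running this check, e.g. from CONNECTION_ID()).
    """
    stmts = []
    for row in sorted(rows, key=lambda x: x["TIME"], reverse=True):
        if row["COMMAND"] != "Query" or row["TIME"] <= min_seconds:
            continue
        if row["USER"] in protected or row["ID"] == own_id:
            continue
        stmts.append(f"KILL QUERY {int(row['ID'])};")
    return stmts


if __name__ == "__main__":
    print("\n".join(kill_statements(SAMPLE_ROWS)))
```

Printing rather than executing keeps a human in the loop, and KILL QUERY is used instead of KILL so that the client connections survive.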