Automating Workflows with ProcessList: Best Practices and Examples

How to Use ProcessList to Debug and Optimize Performance

1) Quick commands

  • SHOW FULL PROCESSLIST; — snapshot of active threads (full queries shown).
  • SELECT * FROM performance_schema.processlist; — processlist via the Performance Schema (less locking than SHOW PROCESSLIST).
  • mysqladmin processlist — quick CLI view.
  • SHOW ENGINE INNODB STATUS\G — InnoDB locks and transactions.
  • SHOW VARIABLES LIKE 'long_query_time'; and slow query log settings.

2) What to inspect in the output

  • Id — thread id (used with KILL).
  • User / Host / db — source of queries.
  • Command — Query, Sleep, Binlog Dump, etc.
  • Time — seconds in current state (long Time → investigate).
  • State — e.g., Locked, Waiting for table metadata lock, Sorting result, Copying to tmp table.
  • Info — SQL text (use FULL to avoid truncation).
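The fields above can be inspected programmatically as well as by eye. A minimal sketch in Python — the row dicts mirror the SHOW FULL PROCESSLIST columns, and the 10-second threshold is an assumption, not a MySQL default:

```python
# Flag suspicious threads from processlist rows.
# Each dict mirrors the columns of SHOW FULL PROCESSLIST.

def flag_suspects(rows, time_threshold=10):
    """Return actively running queries older than the threshold, longest first."""
    suspects = []
    for row in rows:
        if row["Command"] == "Sleep":
            continue  # idle connections are a pooling issue, not a query issue
        if row["Time"] > time_threshold:
            suspects.append(row)
    # longest-running first, the same ordering you'd use when triaging by hand
    return sorted(suspects, key=lambda r: r["Time"], reverse=True)

rows = [
    {"Id": 11, "Command": "Query", "Time": 120, "State": "Sending data", "Info": "SELECT ..."},
    {"Id": 12, "Command": "Sleep", "Time": 600, "State": "", "Info": None},
    {"Id": 13, "Command": "Query", "Time": 3, "State": "Sorting result", "Info": "SELECT ..."},
]
print([r["Id"] for r in flag_suspects(rows)])  # → [11]
```

Note that the long-sleeping thread 12 is deliberately skipped here; it belongs in the connection-pooling audit of step 4 below, not the kill list.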

3) Immediate triage steps (ordered)

  1. Identify long-running non-Sleep queries: SELECT … FROM information_schema.processlist WHERE command='Query' AND time>10 ORDER BY time DESC;
  2. If a query is blocking many others, consider KILL <id>; to terminate the connection, or KILL QUERY <id>; for a non-destructive stop that cancels only the statement. Use carefully in production.
  3. For metadata lock / Locked states: find the transaction holding the lock (INNODB_TRX + processlist joins) and either commit/rollback that transaction or kill its thread.
  4. For many Sleeping connections: audit connection pooling/config (increase wait_timeout or fix app to close connections).
  5. For high connection counts: check max_connections and connection sources; identify clients by Host/User.
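Step 2's kill decision can be scripted rather than typed by hand under pressure. A sketch, assuming processlist rows as dicts; the protected-user names are hypothetical placeholders for whatever your environment must never kill:

```python
PROTECTED_USERS = {"replication", "event_scheduler"}  # assumed names — never kill these

def build_kill_script(rows, max_time=30, query_only=True):
    """Emit KILL statements for long-running queries, skipping protected users."""
    # KILL QUERY cancels the statement but keeps the connection alive;
    # plain KILL terminates the whole connection.
    verb = "KILL QUERY" if query_only else "KILL"
    stmts = []
    for row in rows:
        if row["Command"] != "Query" or row["Time"] <= max_time:
            continue
        if row["User"] in PROTECTED_USERS:
            continue
        stmts.append(f"{verb} {row['Id']};")
    return "\n".join(stmts)

rows = [
    {"Id": 42, "User": "app", "Command": "Query", "Time": 95},
    {"Id": 43, "User": "replication", "Command": "Binlog Dump", "Time": 99999},
    {"Id": 44, "User": "app", "Command": "Query", "Time": 5},
]
print(build_kill_script(rows))  # → KILL QUERY 42;
```

The Binlog Dump thread's huge Time value is normal — replication threads stay connected for days — which is exactly why a naive "kill everything over N seconds" filter is dangerous.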

4) Diagnose root cause

  • Run EXPLAIN / EXPLAIN ANALYZE on offending queries — look for full table scans (type=ALL), missing indexes, large row estimates.
  • Check slow query log and analyze with pt-query-digest to find top offenders.
  • Use Performance Schema (events_statements_* / events_stages_* tables) to see where queries spend time (IO, sorting, locking).
  • Inspect disk, I/O, CPU, and buffer pool usage — performance problems often external to SQL (e.g., storage saturation).
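pt-query-digest works by normalizing literals out of queries and grouping the resulting fingerprints. A toy version of that idea, just to show what "top offenders" means (a sketch, far cruder than the real tool's fingerprinting):

```python
import re
from collections import Counter

def fingerprint(sql):
    """Crude query fingerprint: lowercase, strip literals, collapse whitespace."""
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", "?", s)   # string literals → ?
    s = re.sub(r"\b\d+\b", "?", s)   # numeric literals → ?
    s = re.sub(r"\s+", " ", s)       # collapse whitespace
    return s

def top_offenders(slow_log_entries, n=3):
    """Group (sql, seconds) entries by fingerprint and rank by total time."""
    totals = Counter()
    for sql, seconds in slow_log_entries:
        totals[fingerprint(sql)] += seconds
    return totals.most_common(n)

entries = [
    ("SELECT * FROM orders WHERE id = 10", 2.0),
    ("SELECT * FROM orders WHERE id = 99", 3.0),
    ("SELECT name FROM users WHERE email = 'a@b.c'", 0.5),
]
print(top_offenders(entries))
```

The two orders queries collapse into one fingerprint with 5.0 seconds of total time, which is the ranking signal the slow-log analysis gives you: fix the query *shape*, not one literal instance of it.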

5) Common fixes

  • Add or rewrite indexes (covering indexes for SELECT + ORDER BY).
  • Rewrite queries (avoid SELECT *; avoid large OFFSET; use range or keyset pagination).
  • Break large transactions into smaller ones; commit promptly.
  • Avoid long-running DDL during peak times — use online schema change tools (pt-online-schema-change or native online DDL if supported).
  • Tune MySQL config: innodb_buffer_pool_size, tmp_table_size, max_connections, thread_cache_size.
  • Use connection pooling and tune wait_timeout to avoid idle connection buildup.
  • Offload read traffic to replicas and cache frequent results (Redis, application cache).
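The keyset-pagination advice above is worth seeing concretely. A sketch that builds the two query shapes side by side — the table and column names are hypothetical:

```python
def offset_page(page, page_size=50):
    """Large OFFSET forces MySQL to read and discard page * page_size rows."""
    return (f"SELECT id, created_at FROM events ORDER BY id "
            f"LIMIT {page_size} OFFSET {page * page_size}")

def keyset_page(last_seen_id, page_size=50):
    """Keyset pagination seeks straight to the position via the index on id."""
    return (f"SELECT id, created_at FROM events WHERE id > {last_seen_id} "
            f"ORDER BY id LIMIT {page_size}")

print(offset_page(2000))    # deep page: scans ~100k rows just to return 50
print(keyset_page(100000))  # same page reached by an index seek
```

The trade-off: keyset pagination requires the client to carry the last-seen key forward and cannot jump to an arbitrary page number, but its cost stays constant no matter how deep the user scrolls.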

6) Tools to automate and monitor

  • pt-query-digest (analyze slow logs).
  • pt-kill (automated safe killing of runaway queries).
  • Percona Monitoring and Management (PMM), Grafana + Prometheus dashboards for real-time metrics.
  • Performance Schema + sys schema queries for live diagnostics.

7) Emergency playbook (if DB is unresponsive)

  1. Run SHOW FULL PROCESSLIST; and quickly identify the top CPU consumers and longest-running queries.
  2. Generate a kill list for queries running longer than X seconds (e.g., 30s) and KILL QUERY the non-critical ones.
  3. Check replication lag (if any) and I/O saturation.
  4. After stabilization, analyze slow queries and implement long-term fixes above.
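During an incident, steps 1–2 of the playbook can be driven from the shell. A hedged sketch that turns the tab-separated output of mysql --batch -e 'SHOW FULL PROCESSLIST' into KILL QUERY statements (the sample string mimics that batch format):

```python
def kill_list_from_batch(output, max_time=30):
    """Parse mysql --batch processlist output (tab-separated, header row first)
    and emit KILL QUERY statements for non-Sleep threads over max_time seconds."""
    lines = output.strip().splitlines()
    idx = {name: i for i, name in enumerate(lines[0].split("\t"))}
    stmts = []
    for line in lines[1:]:
        cols = line.split("\t")
        if cols[idx["Command"]] == "Sleep":
            continue  # idle connections aren't what's hurting you right now
        if int(cols[idx["Time"]]) > max_time:
            stmts.append(f"KILL QUERY {cols[idx['Id']]};")
    return stmts

sample = (
    "Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo\n"
    "7\tapp\tlocalhost\tshop\tQuery\t88\tSending data\tSELECT ...\n"
    "8\tapp\tlocalhost\tshop\tSleep\t300\t\tNULL\n"
)
print(kill_list_from_batch(sample))  # → ['KILL QUERY 7;']
```

In a real incident you would review the generated statements before piping them back into mysql, and combine this with a protected-user filter so replication and admin threads are never touched.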

If you want, I can produce: (A) a one-page checklist you can run during incidents, (B) sample queries to find killers and generate KILL scripts, or (C) a tuned Performance Schema query set — pick one.
