MySQL

Li

Li Wei

January 16, 202612 min read

Title: MySQL

Basic Introduction

Overview of Databases

Database: DataBase, abbreviated DB, a repository for storing and managing data.

Advantages of databases:

  • Can persistently store data
  • Convenient for storing and managing data
  • Operated using a unified language, SQL

Relationship between databases, tables, and data:

  • Database

    • A repository for storing and managing data
    • A single database can contain multiple tables
  • Table

    • One of the most important components of a database
    • Consists of vertical columns and horizontal rows (similar to an Excel sheet)
    • Columns can be defined with names, data types, constraints, etc.
    • A table can store many rows of data
  • Data: The information you want to store permanently

MySQL is one of the most popular relational database management systems. Relational databases store data in separate tables, and relationships can be defined between tables, increasing flexibility. MySQL uses SQL statements, the most common standardized language for accessing databases.

Drawbacks: Data is stored on disk, leading to poor read/write performance; complex data relationships make scalability difficult.

Architecture

Overall MySQL Architecture

Detailed architecture:

  • Layer 1: Network Connection Layer

    • Handles client connections, including local socket communication and most TCP/IP communication implemented by client/server tools; responsibilities include connection handling, authentication, and related security measures.
    • Introduces the concept of a Connection Pool to manage buffered user connections, thread handling, and other caching needs.
    • Implements SSL‑based secure connections; the server verifies each client’s permissions for secure access.
  • Layer 2: Core Service Layer

    • Query cache, parser, optimizer, executor, and most built‑in functions (date, math, encryption, etc.).
    • Management Services & Utilities: system administration tools such as backup, security, replication, clustering, etc.
    • SQL Interface: receives user SQL commands and returns query results.
    • Parser: analyzes SQL statements.
    • Optimizer: optimizes queries.
    • Caches & Buffers: query cache; if the cache is large enough, it can improve performance in read‑heavy environments.
    • All cross‑storage‑engine features (stored procedures, triggers, views, etc.) are implemented here.
    • The server parses queries, builds an internal parse tree, performs optimizations (e.g., determining table join order, index usage), and finally generates an execution plan.
    • Transactions are not managed by the server layer; they are implemented by the storage engine.
  • Layer 3: Storage Engine Layer

    • Pluggable Storage Engines: MySQL’s key differentiator is its plugin‑style storage‑engine architecture (engines are table‑level, not database‑level).
    • Storage engines are truly responsible for data storage and retrieval; the server communicates with them via an API.
    • Different engines provide different features; you can choose the appropriate engine based on development needs.
  • Layer 4: System File Layer

    • Data storage layer; stores data on the file system and interacts with storage engines.
    • File System: holds configuration files, data files, log files, error files, binary files, etc.

SQL Execution Flow

Overall execution flow:

Establishing a Connection

  • Connector & Pooling: Creating a connection is expensive because each connection maps to a thread. Frequent creation/termination wastes resources, so a connection pool is recommended to improve performance.

  • After the TCP connection is established, authentication occurs. Once authenticated, SQL can be executed. If an administrator changes a user’s privileges, existing connections retain the old privileges; only new connections see the changes.

  • MySQL can interact with many clients simultaneously, so each session must be isolated.

  • Privilege Information: GRANT statements modify both the privilege tables on disk and the in‑memory cache; permission checks use the in‑memory data. flush privileges rebuilds the in‑memory privilege cache from the on‑disk tables, so inconsistencies can arise if the system tables are modified directly with DML statements—avoid doing that.

Privilege Disk Storage Memory Storage Modification Strategy Scope
Global privileges Table mysql.user Array acl_users Existing connections unaffected; new connections use new settings All threads
DB privileges Table mysql.db Array acl_dbs Takes effect immediately for all connections Global
Table privileges Table mysql.tables_priv + column‑privilege structure hash column_priv_hash Takes effect immediately for all connections Global
Column privileges Table mysql.columns_priv + column‑privilege structure hash column_priv_hash Takes effect immediately for all connections Global
  • Connection State: If a client is idle for too long, the connector automatically disconnects it. The timeout is controlled by parameter wait_timeout (default 8 hours). After disconnection, a subsequent request receives error Lost connection to MySQL server during query.

In MySQL, a long connection stays open as long as the client continues to send requests; a short connection is closed after a few queries and reopened for the next request.

To reduce connection overhead, long connections are recommended, but excessive long connections can cause OOM. Solutions:

  • Periodically close long connections, or close them after a large memory‑intensive query and reopen when needed.
    • In MySQL 5.7, after a large operation you can execute mysql_reset_connection to reinitialize connection resources without reconnecting or re‑authenticating; the connection is reset to its freshly created state.
  • SHOW PROCESSLIST: view current MySQL threads to see real‑time SQL execution. Rows with Command = Sleep indicate idle connections.

Parameter Explanation

Parameter Meaning
ID Connection ID assigned by MySQL; view with SELECT CONNECTION_ID();
User Current user; non‑root users see only statements they are permitted to run
Host Source IP and port of the statement; useful for tracing problematic queries
db Database the thread is connected to
Command Current command (e.g., Sleep, Query, Connect)
Time Duration of the current state, in seconds
State Detailed status of the SQL (e.g., copying to tmp table, sorting result, sending data)
Info The actual SQL statement being executed; key for troubleshooting

Sending data state means the MySQL thread is reading rows and sending results to the client; it occurs at any point during execution. Because it involves heavy disk I/O, it is usually the most time‑consuming stage of a query.

Query Cache

Workflow: When an identical SQL statement is executed, the server can return the cached result. If the underlying data changes, the cache is invalidated; tables that are frequently updated are poor candidates for query caching.

Query Process:

  1. Client sends a query.
  2. Server checks the query cache; if a hit occurs, the cached result (typically a K‑V pair) is returned immediately; otherwise, continue.
  3. Parser analyzes the SQL.
  4. Optimizer generates an execution plan.
  5. Executor calls the storage‑engine API to run the query.
  6. Result is sent back to the client.

In most cases, query caching is discouraged because it often does more harm than good.

  • The cache invalidates frequently: any update to a table clears all cached queries for that table. Thus, cached results may be discarded before they are used, especially in write‑heavy workloads where hit rates are low.
  • Only static tables that change rarely (e.g., a system‑configuration table) benefit from query caching.

Cache Configuration

  • Check whether the MySQL instance supports query caching:
  • Verify whether query caching is enabled:
  • Parameter meanings:
    • OFF or 0: query cache disabled
    • ON or 1: cache enabled; results meeting cache criteria are stored; you can explicitly prevent caching with SQL_NO_CACHE
    • DEMAND or 2: cache only queries that explicitly use SQL_CACHE; others are not cached
  • View cache size:
  • View cache status variables:
Variable Meaning
Qcache_free_blocks Number of free memory blocks in the cache
Qcache_free_memory Amount of free memory in the cache
Qcache_hits Number of cache hits
Qcache_inserts Number of queries added to the cache
Qcache_lowmem_prunes Number of queries removed due to insufficient memory
Qcache_not_cached Number of queries not cached (because of query_cache_type or other reasons)
Qcache_queries_in_cache Number of queries currently cached
Qcache_total_blocks Total number of blocks in the cache
  • Configure my.cnf and restart the service. Verify with a time‑consuming query run multiple times; observe decreasing execution times and increasing Qcache_hits to confirm caching.

Cache Invalidation Scenarios

  • The SQL must be identical to hit the cache because the cache key is the query text.
    • Non‑deterministic functions (e.g., NOW(), CURRENT_DATE(), CURDATE(), CURTIME(), RAND(), UUID(), USER(), DATABASE()) prevent caching.
    • Queries that reference no tables are not cached.
    • System tables (mysql, information_schema, performance_schema) are never cached.
    • Queries inside stored procedures, triggers, or functions that span multiple storage engines invalidate the cache.
  • Any modification to a table (e.g., INSERT, UPDATE, DELETE, ALTER TABLE, DROP TABLE, DROP DATABASE) invalidates all cached queries that involve that table, including those accessed via a MERGE view.

Parser

If the query cache is missed, the SQL is actually executed. The parser performs lexical and syntactic analysis, building a parse tree.

  • Lexical analysis: tokenizes the input string (e.g., recognizing SELECT as a query keyword, t as a table name, id as a column).
  • Syntactic analysis: validates the statement against MySQL grammar and checks that referenced columns exist. Errors produce You have an error in your SQL syntax messages.

The preprocessor further validates the parse tree (e.g., confirming tables/columns exist, checking for ambiguous aliases).

Optimizer

Cost Analysis: When multiple indexes exist or a query involves several joins, the optimizer decides which index to use and the join order.

  • Identify all candidate indexes based on the WHERE clause.
  • Perform cost analysis: cost = I/O cost + CPU cost; compare full‑table scans vs. index scans.
  • Choose the plan with the lowest estimated cost.

Row count estimation is a key factor; fewer rows scanned means fewer disk accesses and less CPU usage. The optimizer also considers temporary table usage, sorting, etc.

Statistics: MySQL stores two kinds of statistics:

  • innodb_table_stats holds table‑level statistics (one record per table).
  • innodb_index_stats holds index‑level statistics (one record per index).

Before execution, MySQL can only estimate row counts using these statistics. Cardinality (the number of distinct values in an index) indicates selectivity; higher cardinality = better discrimination.

Cardinality is obtained via sample statistics: InnoDB samples N data pages, counts distinct values on those pages, averages, then multiplies by the total number of pages.

Two storage methods for statistics can be chosen via parameter innodb_stats_persistent:

  • ON: statistics are persisted (default). Sample size N defaults to 20, configurable via innodb_stats_persistent_sample_pages; larger N yields more accurate stats but uses more resources.
  • OFF: statistics are kept only in memory; default sample size is 8. Not recommended because stats must be recomputed each time.

Tables are continuously updated, so statistics must be refreshed:

  • Set innodb_stats_auto_recalc to 1: when changed rows exceed 10 % of the table size, statistics are recomputed asynchronously.
  • Call ANALYZE TABLE t manually to re‑analyze statistics (synchronous, acquires an MDL read lock and may temporarily block the system).

EXPLAIN plans are generated during the optimizer phase. If the estimated rows differ greatly from reality, run ANALYZE to refresh statistics.

Wrong Index Choice: Sampling errors or poorly written queries can cause the optimizer to pick a suboptimal index.

Solutions:

  • Use FORCE INDEX to force a specific index.
  • Rewrite the query to guide the optimizer.
  • Create a more appropriate index or drop a misleading one.

Executor

MySQL now knows what to do (parser) and how to do it (optimizer), so it proceeds to the executor stage.

When execution starts, the server first checks whether the current connection has SELECT privileges on the target table; if not, an error is returned. If the query cache was hit, permission checking still occurs before returning cached results.

Example:

  • If privileges are sufficient, the table is opened and the executor calls the storage‑engine API defined for that engine.

Suppose table T has no index on column ID. The executor workflow:

  1. Call the InnoDB engine to fetch the first row, check if ID = 10; if not, skip; if yes, add the row to the result set.
  2. Call the engine to fetch the “next row” and repeat until the last row is reached.
  3. Return the collected rows to the client.

For indexed tables, the logic is similar: the first call fetches the first matching row via the index, then subsequent calls fetch the next matching rows using engine‑provided interfaces.

Engine Layer

Interaction between the Server layer and the storage engine is record‑by‑record. The engine returns one record at a time to the Server for further processing, rather than dumping all rows at once.

Workflow:

  1. Use a secondary index to locate the first matching record, perform a “row‑lookup” (clustered index fetch), and return the record to the Server, which checks whether it satisfies the query.
  2. Continue scanning the secondary index for the next matching record.

Termination Process

Terminating Statements

Terminate a statement that is currently executing in a thread:

KILL does not instantly abort the statement; it signals the executing thread that the statement can be stopped, allowing graceful cleanup (similar to an interrupt). Since DML operations acquire MDL read locks, abruptly killing a thread could leave those locks unreleased.

Execution flow for command KILL QUERYthread_id_A:

  1. Change session A’s status to THD::KILL_QUERY (set variable killed to THD::KILL_QUERY).
  2. Send a signal to session A’s execution thread so it can handle the THD::KILL_QUERY state.

If the session is waiting (e.g., blocked on a lock), it must be in a wake‑up‑able state; otherwise the KILL will fail.

Typical scenario: innodb_thread_concurrency equals 2, meaning the maximum concurrent threads is set to 2.

  • Session A runs a transaction, session B runs a transaction, reaching the thread limit; session C’s transaction blocks waiting.

(content truncated)


Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.

Keep reading

More related articles from DriftSeas.