MySQL
Li Wei
Title: MySQL
Basic Introduction
Overview of Databases
Database: DataBase, abbreviated DB, a repository for storing and managing data.
Advantages of databases:
- Can persistently store data
- Convenient for storing and managing data
- Operated using a unified language, SQL
Relationship between databases, tables, and data:
Database
- A repository for storing and managing data
- A single database can contain multiple tables
Table
- One of the most important components of a database
- Consists of vertical columns and horizontal rows (similar to an Excel sheet)
- Columns can be defined with names, data types, constraints, etc.
- A table can store many rows of data
Data: The information you want to store permanently
MySQL is one of the most popular relational database management systems. Relational databases store data in separate tables, and relationships can be defined between tables, increasing flexibility. MySQL uses SQL statements, the most common standardized language for accessing databases.
Drawbacks: Data is stored on disk, leading to poor read/write performance; complex data relationships make scalability difficult.
Architecture
Overall MySQL Architecture
Detailed architecture:
Layer 1: Network Connection Layer
- Handles client connections, including local socket communication and most TCP/IP communication implemented by client/server tools; responsibilities include connection handling, authentication, and related security measures.
- Introduces the concept of a Connection Pool to manage buffered user connections, thread handling, and other caching needs.
- Implements SSL‑based secure connections; the server verifies each client’s permissions for secure access.
Layer 2: Core Service Layer
- Query cache, parser, optimizer, executor, and most built‑in functions (date, math, encryption, etc.).
- Management Services & Utilities: system administration tools such as backup, security, replication, clustering, etc.
- SQL Interface: receives user SQL commands and returns query results.
- Parser: analyzes SQL statements.
- Optimizer: optimizes queries.
- Caches & Buffers: query cache; if the cache is large enough, it can improve performance in read‑heavy environments.
- All cross‑storage‑engine features (stored procedures, triggers, views, etc.) are implemented here.
- The server parses queries, builds an internal parse tree, performs optimizations (e.g., determining table join order, index usage), and finally generates an execution plan.
- Transactions are not managed by the server layer; they are implemented by the storage engine.
Layer 3: Storage Engine Layer
- Pluggable Storage Engines: MySQL’s key differentiator is its plugin‑style storage‑engine architecture (engines are table‑level, not database‑level).
- Storage engines are truly responsible for data storage and retrieval; the server communicates with them via an API.
- Different engines provide different features; you can choose the appropriate engine based on development needs.
Layer 4: System File Layer
- Data storage layer; stores data on the file system and interacts with storage engines.
- File System: holds configuration files, data files, log files, error files, binary files, etc.
SQL Execution Flow
Overall execution flow:
Establishing a Connection
Connector & Pooling: Creating a connection is expensive because each connection maps to a thread. Frequent creation/termination wastes resources, so a connection pool is recommended to improve performance.
After the TCP connection is established, authentication occurs. Once authenticated, SQL can be executed. If an administrator changes a user’s privileges, existing connections retain the old privileges; only new connections see the changes.
MySQL can interact with many clients simultaneously, so each session must be isolated.
Privilege Information:
GRANTstatements modify both the privilege tables on disk and the in‑memory cache; permission checks use the in‑memory data.flush privilegesrebuilds the in‑memory privilege cache from the on‑disk tables, so inconsistencies can arise if the system tables are modified directly with DML statements—avoid doing that.
| Privilege | Disk Storage | Memory Storage | Modification Strategy | Scope |
|---|---|---|---|---|
| Global privileges | Table mysql.user |
Array acl_users |
Existing connections unaffected; new connections use new settings | All threads |
| DB privileges | Table mysql.db |
Array acl_dbs |
Takes effect immediately for all connections | Global |
| Table privileges | Table mysql.tables_priv + column‑privilege structure hash column_priv_hash |
– | Takes effect immediately for all connections | Global |
| Column privileges | Table mysql.columns_priv + column‑privilege structure hash column_priv_hash |
– | Takes effect immediately for all connections | Global |
- Connection State: If a client is idle for too long, the connector automatically disconnects it. The timeout is controlled by parameter
wait_timeout(default 8 hours). After disconnection, a subsequent request receives errorLost connection to MySQL server during query.
In MySQL, a long connection stays open as long as the client continues to send requests; a short connection is closed after a few queries and reopened for the next request.
To reduce connection overhead, long connections are recommended, but excessive long connections can cause OOM. Solutions:
- Periodically close long connections, or close them after a large memory‑intensive query and reopen when needed.
- In MySQL 5.7, after a large operation you can execute
mysql_reset_connectionto reinitialize connection resources without reconnecting or re‑authenticating; the connection is reset to its freshly created state.
- In MySQL 5.7, after a large operation you can execute
SHOW PROCESSLIST: view current MySQL threads to see real‑time SQL execution. Rows withCommand=Sleepindicate idle connections.
Parameter Explanation
| Parameter | Meaning |
|---|---|
| ID | Connection ID assigned by MySQL; view with SELECT CONNECTION_ID(); |
| User | Current user; non‑root users see only statements they are permitted to run |
| Host | Source IP and port of the statement; useful for tracing problematic queries |
| db | Database the thread is connected to |
| Command | Current command (e.g., Sleep, Query, Connect) |
| Time | Duration of the current state, in seconds |
| State | Detailed status of the SQL (e.g., copying to tmp table, sorting result, sending data) |
| Info | The actual SQL statement being executed; key for troubleshooting |
Sending data state means the MySQL thread is reading rows and sending results to the client; it occurs at any point during execution. Because it involves heavy disk I/O, it is usually the most time‑consuming stage of a query.
Query Cache
Workflow: When an identical SQL statement is executed, the server can return the cached result. If the underlying data changes, the cache is invalidated; tables that are frequently updated are poor candidates for query caching.
Query Process:
- Client sends a query.
- Server checks the query cache; if a hit occurs, the cached result (typically a K‑V pair) is returned immediately; otherwise, continue.
- Parser analyzes the SQL.
- Optimizer generates an execution plan.
- Executor calls the storage‑engine API to run the query.
- Result is sent back to the client.
In most cases, query caching is discouraged because it often does more harm than good.
- The cache invalidates frequently: any update to a table clears all cached queries for that table. Thus, cached results may be discarded before they are used, especially in write‑heavy workloads where hit rates are low.
- Only static tables that change rarely (e.g., a system‑configuration table) benefit from query caching.
Cache Configuration
- Check whether the MySQL instance supports query caching:
- Verify whether query caching is enabled:
- Parameter meanings:
OFFor0: query cache disabledONor1: cache enabled; results meeting cache criteria are stored; you can explicitly prevent caching withSQL_NO_CACHEDEMANDor2: cache only queries that explicitly useSQL_CACHE; others are not cached
- View cache size:
- View cache status variables:
| Variable | Meaning |
|---|---|
| Qcache_free_blocks | Number of free memory blocks in the cache |
| Qcache_free_memory | Amount of free memory in the cache |
| Qcache_hits | Number of cache hits |
| Qcache_inserts | Number of queries added to the cache |
| Qcache_lowmem_prunes | Number of queries removed due to insufficient memory |
| Qcache_not_cached | Number of queries not cached (because of query_cache_type or other reasons) |
| Qcache_queries_in_cache | Number of queries currently cached |
| Qcache_total_blocks | Total number of blocks in the cache |
- Configure
my.cnfand restart the service. Verify with a time‑consuming query run multiple times; observe decreasing execution times and increasingQcache_hitsto confirm caching.
Cache Invalidation Scenarios
- The SQL must be identical to hit the cache because the cache key is the query text.
- Non‑deterministic functions (e.g.,
NOW(),CURRENT_DATE(),CURDATE(),CURTIME(),RAND(),UUID(),USER(),DATABASE()) prevent caching. - Queries that reference no tables are not cached.
- System tables (
mysql,information_schema,performance_schema) are never cached. - Queries inside stored procedures, triggers, or functions that span multiple storage engines invalidate the cache.
- Non‑deterministic functions (e.g.,
- Any modification to a table (e.g.,
INSERT,UPDATE,DELETE,ALTER TABLE,DROP TABLE,DROP DATABASE) invalidates all cached queries that involve that table, including those accessed via a MERGE view.
Parser
If the query cache is missed, the SQL is actually executed. The parser performs lexical and syntactic analysis, building a parse tree.
- Lexical analysis: tokenizes the input string (e.g., recognizing
SELECTas a query keyword,tas a table name,idas a column). - Syntactic analysis: validates the statement against MySQL grammar and checks that referenced columns exist. Errors produce
You have an error in your SQL syntaxmessages.
The preprocessor further validates the parse tree (e.g., confirming tables/columns exist, checking for ambiguous aliases).
Optimizer
Cost Analysis: When multiple indexes exist or a query involves several joins, the optimizer decides which index to use and the join order.
- Identify all candidate indexes based on the WHERE clause.
- Perform cost analysis: cost = I/O cost + CPU cost; compare full‑table scans vs. index scans.
- Choose the plan with the lowest estimated cost.
Row count estimation is a key factor; fewer rows scanned means fewer disk accesses and less CPU usage. The optimizer also considers temporary table usage, sorting, etc.
Statistics: MySQL stores two kinds of statistics:
innodb_table_statsholds table‑level statistics (one record per table).innodb_index_statsholds index‑level statistics (one record per index).
Before execution, MySQL can only estimate row counts using these statistics. Cardinality (the number of distinct values in an index) indicates selectivity; higher cardinality = better discrimination.
Cardinality is obtained via sample statistics: InnoDB samples N data pages, counts distinct values on those pages, averages, then multiplies by the total number of pages.
Two storage methods for statistics can be chosen via parameter innodb_stats_persistent:
ON: statistics are persisted (default). Sample size N defaults to 20, configurable viainnodb_stats_persistent_sample_pages; larger N yields more accurate stats but uses more resources.OFF: statistics are kept only in memory; default sample size is 8. Not recommended because stats must be recomputed each time.
Tables are continuously updated, so statistics must be refreshed:
- Set
innodb_stats_auto_recalcto 1: when changed rows exceed 10 % of the table size, statistics are recomputed asynchronously. - Call
ANALYZE TABLE tmanually to re‑analyze statistics (synchronous, acquires an MDL read lock and may temporarily block the system).
EXPLAIN plans are generated during the optimizer phase. If the estimated rows differ greatly from reality, run ANALYZE to refresh statistics.
Wrong Index Choice: Sampling errors or poorly written queries can cause the optimizer to pick a suboptimal index.
Solutions:
- Use
FORCE INDEXto force a specific index. - Rewrite the query to guide the optimizer.
- Create a more appropriate index or drop a misleading one.
Executor
MySQL now knows what to do (parser) and how to do it (optimizer), so it proceeds to the executor stage.
When execution starts, the server first checks whether the current connection has SELECT privileges on the target table; if not, an error is returned. If the query cache was hit, permission checking still occurs before returning cached results.
Example:
- If privileges are sufficient, the table is opened and the executor calls the storage‑engine API defined for that engine.
Suppose table T has no index on column ID. The executor workflow:
- Call the InnoDB engine to fetch the first row, check if
ID = 10; if not, skip; if yes, add the row to the result set. - Call the engine to fetch the “next row” and repeat until the last row is reached.
- Return the collected rows to the client.
For indexed tables, the logic is similar: the first call fetches the first matching row via the index, then subsequent calls fetch the next matching rows using engine‑provided interfaces.
Engine Layer
Interaction between the Server layer and the storage engine is record‑by‑record. The engine returns one record at a time to the Server for further processing, rather than dumping all rows at once.
Workflow:
- Use a secondary index to locate the first matching record, perform a “row‑lookup” (clustered index fetch), and return the record to the Server, which checks whether it satisfies the query.
- Continue scanning the secondary index for the next matching record.
Termination Process
Terminating Statements
Terminate a statement that is currently executing in a thread:
KILL does not instantly abort the statement; it signals the executing thread that the statement can be stopped, allowing graceful cleanup (similar to an interrupt). Since DML operations acquire MDL read locks, abruptly killing a thread could leave those locks unreleased.
Execution flow for command KILL QUERYthread_id_A:
- Change session A’s status to
THD::KILL_QUERY(set variablekilledtoTHD::KILL_QUERY). - Send a signal to session A’s execution thread so it can handle the
THD::KILL_QUERYstate.
If the session is waiting (e.g., blocked on a lock), it must be in a wake‑up‑able state; otherwise the KILL will fail.
Typical scenario: innodb_thread_concurrency equals 2, meaning the maximum concurrent threads is set to 2.
- Session A runs a transaction, session B runs a transaction, reaching the thread limit; session C’s transaction blocks waiting.
(content truncated)
Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.