Here Are the MySQL Development Specifications My Team Is Using | by Dwen | Jun, 2022

To boost our structural design!

0*LM36si3 cybIFHt6
Photo by Arnel Hasanovic on Unsplash

1.1 Database object

Database objects are the components of the database, and the common ones are as follows: Tables, Indexes, Views, Charts, Default-values, Rules, Triggers, Stored-Procedures, Users, etc.

The naming convention refers to the naming convention of database objects such as SCHEMA, TABLE, INDEX, CONSTRAINTS, etc.

1.2 Database Object Global Naming Convention

  • Use meaningful English words for naming, and the words are separated by underscores.
  • The name can only use English letters, numbers underscores, and start with English letters.
  • Avoid using MySQL reserved words such as backup, call, group, etc.
  • All database objects use lowercase letters. In fact, it is possible to set case sensitivity in MySQL. In order to ensure uniformity, we standardize all lowercase representations here.

1.3 Database naming convention

  • The database name should not exceed 30 characters as much as possible.
  • The database naming is generally the project name + abbreviation representing the meaning of the library, such as the workflow database of the IM project, which can be im_flow.
  • The default character set and collation clause must be added when the database is created. The default character set is UTF8 (dumbo has been migrated to use utf8mb4).
  • The name should be in lowercase.

1.4 Table naming convention

  • The table name of a regular table starts with t_, and t represents the meaning of the table. The naming rule is t + module (including the abbreviation for the meaning of the module) + table (including the abbreviation for the meaning of the table), such as the education information table of the user module: t_user_eduinfo.
  • Temporary tables (tables used by RD, QA, or DBA for temporary data processing), naming rules: temp prefix + module + table + date suffix: temp_user_eduinfo_20220101 .
  • Backup table (used to save and archive historical data or data for disaster recovery) naming rules, bak prefix + module + table + date suffix: bak_user_eduinfo_20220101 .
  • The tables of the same module should use the same prefix as much as possible, and the table names should express the meaning as much as possible.
  • Multiple words separated by an underscore _ .
  • The table name of the regular table should not exceed 30 characters as much as possible. The temptable and the bak table should be as short as possible, and the name should be in lowercase.

1.5 Field naming convention

  • The field name requires English words or abbreviations that represent its actual meaning, and the words are connected with underscores _, such as service_ip, service_port.
  • Fields with the same meaning between the tables must have the same name. For example, A table and the B table has a creation time, which should be unified as create_time, and the inconsistency will be very confusing.
  • Multiple words separated by an underscore _ .
  • The field name should not exceed 30 characters as much as possible, and the name should be in lowercase.

1.6 Index naming convention

  • The unique index is named using the uni + field name: create unique index uni_uid on t_user_basic(uid) .
  • The non-unique index is named using idx + field name: create index_idx_uname_mobile on t_user_basic(uname,mobile).
  • Multiple words are separated by an underscore _.
  • The index name should not exceed 50 characters as much as possible. The name should be in lowercase. The combination of index fields should not be too many, otherwise, it will not be conducive to the improvement of query efficiency.
  • For the column name composed of multiple words, take the abbreviation that represents the meaning as much as possible, such as the combined index on the member_id and friend_id of the test_contact table: idx_mid_fid.
  • Understand the leftmost prefix principle of the combined index and avoid repeated index construction. If (a,b,c) is established, it is equivalent to establishing (a), (a,b), (a,b,c).

1.7 View naming convention

  • The view name starts with v, which means view, and the complete structure is the abbreviation of v+ view content meaning.
  • If the view only comes from a single table, it is v+table name. If the view is associated with several tables, use v+underscore _to connect several table names, and the view name should not exceed 30 characters as much as possible. Abbreviations are used for more than 30 characters.
  • If there is no special need, developers are strictly prohibited from creating views.
  • The name should be in lowercase.

1.8 Stored procedure naming convention

  • The name of the stored procedure starts with sp, which means the stored procedure (storage procedure). After that, multiple words are connected with an underscore _. The name of the stored procedure should reflect its function. Stored procedure names should not exceed 30 characters as much as possible.
  • The input parameters in the stored procedure start with i_, and the output parameters start with o_.
  • The name should be in lowercase.
create procedure sp_multi_param(in i_id bigint,in i_name varchar(32),out o_memo varchar(100))

1.9 Function naming convention

  • The function name starts with func, which means function. After that, multiple words are connected with an underscore _, and the function name should reflect its function. The function name should not exceed 30 characters as much as possible.
  • The name should be in lowercase.
create function func_format_date(ctime datetime)

1.10 Trigger Naming Convention

  • Triggers start with trig, which means a trigger.
  • The basic part describes the table added by the trigger. The trigger name should not exceed 30 characters as much as possible.
  • The suffix (_i, _u, _d) indicates the triggering method of the trigger condition (insert, update or delete).
  • The name should be in lowercase.
DROP TRIGGER IF EXISTS trig_attach_log_d;CREATE TRIGGER trig_attach_log_d AFTER DELETE ON t_dept FOR EACH ROW;

1.11 Constraint naming convention

  • Unique constraint: uk_tablename_field name. uk is the abbreviation of UNIQUE KEY. For example, add a unique constraint to the department name of a department to ensure no duplicate names, as follows: alter table t_dept add constraint un_name unique(name) .
  • Foreign key constraint: fk_table name, followed by the table name where the foreign key is located and the corresponding main table name (excluding t_). The child table name and parent table name are separated by an underscore _. As follows: alter table t_user add constraint fk_user_dept foreign key(depno) references t_dept(id) .
  • Non-null constraint: If there is no special need, it is recommended that all fields be non-null by default, and different data types must give default values.
  • For performance reasons, if there is no special need, it is recommended not to use foreign keys. Referential integrity is controlled by code. This is also our common practice, and integrity control is performed from a program perspective, but if we do not pay attention, dirty data will also be generated.
  • The name should be in lowercase.
`id` int(11) NOT NULL,
`name` varchar(30) DEFAULT '',
`deptId` int(11) DEFAULT 0,
`salary` float DEFAULT NULL,

1.12 User naming convention

  • The user naming format used in production is code_application.
  • The naming rule for read-only users is read_application.

1.1 How to choose a storage engine

If there are no special requirements, the InnoDB storage engine must be used.

The current default engine can be viewed with show variables like 'default_storage_engine. There are mainly MyISAM and InnoDB, and the InnoDB engine is used by default since version 5.5.

The basic difference is: that the MyISAM type does not support advanced processing such as transaction processing, while the InnoDB type does.

The MyISAM type table emphasizes performance, which performs faster than the InnoDB type but does not provide transaction support, while InnoDB provides transaction support and advanced database features such as foreign keys.

1.2 Choice of the character set

Unless otherwise specified, utf8 or utf8mb4 must be used.

In China, it is the best way to choose the utf8 format that supports Chinese and various languages. MySQL adds utf8mb4 encoding after 5.5. mb4 means most bytes 4, which is specially designed to be compatible with four-byte unicode.

So, utf8mb4 is a superset of utf8, and no other conversion is required except changing the encoding to utf8mb4. Of course, in order to save space, it is usually enough to use utf8.

You can use the following script to view the encoding format of the database.

SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';-- orSHOW VARIABLES Like '%char%';

1.3 Table Design Specifications

  • The association between database tables corresponding to different applications should be reduced as much as possible, foreign keys are not allowed to associate between tables, to ensure the independence between the tables corresponding to the components, and to reconstruct the system or table structure to offer possibilities. Current industry practice generally controls referential integrity by procedures.
  • From the perspective of table design, the database design should not be carried out for the entire system but should be divided according to the components in the system architecture, and the database design should be carried out according to the business handled by each component.
  • The table must have PK. The advantages of the primary key are unique identification, effective reference, and efficient retrieval, so in general, try to have a primary key field.
  • A field has only one meaning.
  • The table should not have duplicate columns.
  • The use of complex data types (arrays, custom, etc.) is prohibited. The use of JSON types depends on the situation.
  • The data type of the field (join key) that needs to be joined must be absolutely consistent to avoid implicit conversion. For example, the associated fields are all of the type Int.
  • The design should at least meet the third normal form and minimize data redundancy. Some special scenarios allow denormalized design, but the design of redundant fields needs to be explained during the project review.
  • The TEXT field is stored as a large volume of text, which must be placed in a separate table and associated with the main table by PK. If there is no special need, the use of TEXT and BLOB fields is prohibited.
  • Tables that need to delete (or transfer) expired data regularly can be solved by sub-tables. Our approach is to migrate the historical data with low operation frequency to the historical table according to the 2/8 rule and make the cut point according to time or Id.
  • The number of fields in a single table should not be too many, and it is recommended not to exceed 50 at most. Excessive wide tables also have a big impact on performance.
  • When MySQL processes large tables, the performance begins to decrease significantly. Therefore, it is recommended that the physical size of a single table be limited to 16GB, and the number of data rows in the table should be controlled within 2000W.
  • The rule in the industry is that beyond 2000W performance starts to degrade significantly. However, this value is flexible, and you can judge it according to the actual situation. For example, Ali’s standard is 500W, and Baidu’s is indeed 2000W. In fact, whether the table is wide or not, the space occupied by a single row of data plays a role.
  • If the amount of data or data growth is largely in the early planning, then the table splitting strategy should be added to the design review, and there will be special articles to analyze the data splitting method: vertical splitting (vertical splitting and Vertical sub-table), horizontal split (sub-library sub-table and internal sub-table).
  • The use of a partition table is strictly prohibited without special requirements.

1.4 Field Design Specifications

  • INT: If there is no special need, use UNSIGNED INT to store integer numbers. The number after the integer field represents the display length. For example, id int(11) NOT NULL
  • DATETIME: All fields that need to be accurate to the time (hours, minutes, and seconds) use DATETIME instead of the TIMESTAMP type.
  • For TIMESTAMP, it converts the written time from the current time zone to UTC (Universal Standard Time) for storage. When querying, it is converted into the current time zone of the client and returned. And for DATETIME, no changes are made, and the input and output are basically as-is.
  • In addition, the range of DATETIME storage is also relatively large:
  • The time range that the timestamp can store is: 1970–01–01 00:00:01.000000’ to ‘2038–01–19 03:14:07.999999.
  • The time range that DateTime can store is: 1000–01–01 00:00:00.000000’ to ‘9999–12–31 23:59:59.999999.
  • However, in special cases, TIMESTAMP is more suitable for businesses across time zones.
  • VARCHAR: All dynamic length strings use the VARCHAR type, similar to the limited categories of fields such as status, and also use strings that can clearly represent the actual meaning, and should not be replaced by numbers such as INT; VARCHAR( N)
  • N represents the number of characters, not the number of bytes. For example, VARCHAR(255) can store up to 255 characters (characters include English letters, Chinese characters, special characters, etc.). But N should be as small as possible, because the maximum length of all VARCHAR fields in a MySQL table is 65535 bytes, and the number of stored characters is determined by the selected character set.
  • If UTF8 stores a character with a maximum of three bytes, then varchar should not exceed 21845 characters when storing characters that occupy three bytes in length. At the same time, when performing memory operations such as sorting and creating temporary tables, the length of N will be used to apply for memory. (If there is no special need, in principle, a single varchar type field is not allowed to exceed 255 characters)
  • TEXT: Only when the number of characters may exceed 20,000, the TEXT type can be used to store character data, because all MySQL databases will use the UTF8 character set.
  • All fields using TEXT type must be split from the original table, and form another table with the original table’s primary key for storage, and the purpose of isolation from large text fields is. If there is no special need, do not use MEDIUMTEXT, TEXT, or LONGTEXT types
  • For accurate floating-point data storage, DECIMAL is required, and FLOAT and DOUBLE are strictly prohibited.
  • If there is no special need, try not to use the BLOB type
  • If there is no special need, it is recommended to use the NOT NULL attribute for the field, and the default value can be used instead of NULL
  • The type of the auto-increment field must be an integer and must be UNSIGNED. The recommended type is INT or BIGINT, and the auto-increment field must be a primary key or a part of the primary key.

1.5 Index Design Specifications

  • Index discrimination.
  • The index must be created on a column with high index selectivity (discrimination). The selectivity is calculated as selectivity = count(distinct c_name)/count(*); If the discrimination result is less than 0.2, it is not recommended to use this column Create an index on the top, otherwise, it will probably slow down SQL execution.
  • Follow the leftmost prefix.
  • For determining the multiple fields that need to form a composite index, it is recommended to put the fields with high selectivity first during design. When used, the first field of the composite index must be in the where condition, and it needs to be matched according to the leftmost prefix rule.
  • Foreign keys are prohibited, and integrity can be constrained at the program level.
  • If you need to create an index for a Text type field, you must use a prefix index.
  • The number of indexes in a single table should theoretically be controlled within 5. There are often large batches of insert and update operation tables, and indexes should be built as little as possible. In theory, the principle of index building is to read more and write less.
  • The fields of ORDER BY, GROUP BY, and DISTINCT need to be added after the index to form a covering index.
  • Correctly understand and calculate the degree of discrimination of index fields. There are calculation rules in the text. An index with a high degree of discrimination can quickly locate data. If the degree of discrimination is too low, the index cannot be used effectively. It may be necessary to scan a large number of data pages and not use it. Indexing makes no difference.
  • Correctly understand and calculate the field length of the prefix index. There are judgment rules in the text. The appropriate length must ensure a high degree of discrimination and the most appropriate index storage capacity. Only when the optimal state is achieved can an efficient index be guaranteed.
  • Pay attention to the leftmost matching principle for the joint index: it must be matched in the order from left to right. MySQL will always match the index to the right until it encounters a range query (>, <, between, like) and then stop matching.
  • For example, depno=1 and empname>" and job=1. If the index of (depno, empname, job) order is established, the job will not use the index.
  • The strategy of fetching on demand. When querying records, do not use * as soon as they come up, but only fetch the required data. If possible, try to use only index coverage, which can reduce table return operations and improve efficiency.
  • Correctly judge whether to use a joint index (the section on the use of joint index above describes the judgment rules), and can also further analyze the index pushdown (IPC), reducing the operation of returning to the table, and improving the efficiency.
  • The principle of avoiding index invalidation: It is forbidden to use functions and operators on the index field, which will make the index invalid. This is actually the need to ensure the “cleanness” of the field corresponding to the index.
  • Avoid unnecessary type conversion. When string fields are compared using numerical values, the index will be invalid.
  • The fuzzy query ‘%value%’ will invalidate the index and turn it into a full table scan, because the range of the scan cannot be determined, but ‘value%’ can effectively use the index.
  • The index covers the sorting field, which can reduce sorting steps and improve query efficiency
  • Expand the index as much as possible, and do not create an index unless necessary. For example, there is already an index of an in the table, and now you need to add an index of (a, b), then you only need to modify the original index.

For example: such as a brand table, the established indexes are as follows, a primary key index, and a unique index.

PRIMARY KEY (`id`), UNIQUE KEY `uni_brand_define` (`app_id`,`define_id`)

When the retrieval statement in your colleague’s business code is as follows, you should be warned immediately, that is, there is no covering index, nor does it follow the leftmost prefix principle:

select brand_id,brand_name from ds_brand_system where status = ? and define_id = ? and app_id = ?

It is recommended to change to:

select brand_id, brand_name from ds_brand_system where app_id = ? and define_id = ? and status = ?

1.6 Constrained Design Specifications

  • PK should be ordered and meaningless, customized by the developer, as short as possible, and in an auto-incrementing sequence.
  • If there is a unique constraint in the table besides PK, a unique constraint index prefixed with uk_ can be created in the database.
  • The PK field is not allowed to be updated.
  • It is forbidden to create foreign key constraints, which are controlled by the program.
  • If there is no special need, all fields must add a non-null constraint, that is, not null.
  • If there is no special need, all fields must have default values.

1.1 Normativeness of select retrievals

  • Try to avoid the use of select *. The use of select * in the join, the statement may cause queries that only need to access the index to be completed and need to return to the table to fetch data.
  • One is to retrieve a lot of unnecessary data, which is a disaster for wide tables;
  • It is strictly forbidden to use select * from t_name without anywhere condition. The reason is the same, it will become a full table full-field scan.
  • The text type field storage in MySQL: (a.) It is not stored with other ordinary fields, because the reading efficiency is low, and it will also affect the access efficiency of other lightweight fields. (b.) If you do not need text-type fields and use select *, the execution will consume a lot of IO, and the efficiency is very low.
  • Relevant functions can be used to retrieve fields but functions with uncertain results such as now(), rand(), sysdate() should be avoided as much as possible. It is strictly forbidden to use any function on the filter condition field in the Where condition, including Data type conversion functions. A large number of calculations and conversions can cause inefficiencies, which are also described in the index.
  • All paging query statements need to have sorting conditions, otherwise, it is easy to cause the disorder.
  • Replace or with in() / union(), the efficiency will be better and note that the number of ins is less than 300
  • It is strictly forbidden to use % prefix for fuzzy prefix query: such as: select a,b,c from t_name where a like '%name'; , you can use %fuzzy suffix query such as select a, b from t_name where a like 'name%'; .
  • Avoid using subqueries, you can optimize subqueries into join operations.

Usually, when the subquery is in the clause, and the subquery is simple SQL (excluding the clauses of union, group by, order by, limit), the subquery can be converted into an associated query for optimization.

Reasons for poor subquery performance:

  • The result set of the subquery cannot use the index. Usually, the result set of the subquery will be stored in the temporary table. There is no index in the memory temporary table or the disk temporary table, so the query performance will be affected to a certain extent.
  • Especially for subqueries that return a large result set, the greater the impact on query performance.
  • Since the subquery will generate a large number of temporary tables and no indexes, it will consume too many CPU and IO resources, resulting in a large number of slow queries.

1.2 Normative operation

Prohibit the use of INSERT statements without a field list.

For example: insert into values ('a', 'b', 'c'); should use insert into t_name(c1, c2, c3) values ('a', 'b', 'c');.

Bulk write operations (UPDATE, DELETE, INSERT) require multiple operations in batches.

Large batch operations may cause serious primary-secondary delay, especially in primary-secondary mode, large batch operations may cause serious primary-secondary delay because the secondary needs to read the log from the primary’s bin-log for data synchronization.

A large number of logs are generated when bin-log logs are in row format.

News Credit

%d bloggers like this: