What Is It?
What Are SQL String Functions?
String functions transform and extract information from text (VARCHAR, TEXT, CHAR) values. Real-world data is messy: names with extra spaces, inconsistent capitalization, phone numbers with random separators, emails in mixed case. Before you can report on it, you have to clean it — and clean SQL string functions are the way.
MySQL ships dozens of string functions. This chapter covers the 20 or so that you will actually use every week as a data analyst or backend developer.
Sample Table Used Throughout This Chapter
CREATE TABLE customers (
id INT PRIMARY KEY,
full_name VARCHAR(100),
email VARCHAR(100),
phone VARCHAR(20),
city VARCHAR(50)
);
INSERT INTO customers VALUES
(1, ' Aarav Kumar ', 'Aarav.Kumar@Gmail.COM', '+91-98765-43210', 'Bengaluru'),
(2, 'priya SHARMA', 'priya@MAIL.com', '+91 9876 543 211', 'mumbai'),
(3, 'Rohan Mehta', 'ROHAN@mail.com', '9876543212', 'Delhi'),
(4, 'meera-iyer', 'meera@MAIL.com', '091-9876-543213', 'chennai'),
(5, 'Vikram Singh Rajput','v.singh@MAIL.CO.IN', '+91.9876.543.214', 'Pune');This is the kind of real data you will see in actual databases: leading/trailing spaces, inconsistent casing, irregular phone formatting, mixed-case emails. Let's clean it.
Why Does It Matter?
Why String Functions Matter
1. Every Real Database Has Dirty Text
User-entered data is always messy. Even 'clean' data from APIs has quirks. Before you can match, group, or report, you need normalization: trim spaces, lowercase emails, format phone numbers consistently. String functions let you do this at the database level, often faster than in application code.
2. Searching and Matching Depends on It
Equality comparisons on VARCHAR are case-sensitive in many collations. Email validation, name matching, deduplication — all need LOWER() or TRIM() or both. If you do not normalize, 'Aarav Kumar' and 'aarav kumar' are different customers to SQL, which is wrong.
3. Reporting Requires Formatting
Concatenate first and last names for display. Mask credit card numbers showing only the last 4 digits. Format phone numbers for printing. Truncate long descriptions. All of this is string function work.
4. Interview Questions Use String Functions
'Extract the domain from an email,' 'Get the first name,' 'Format names in proper case,' 'Find all customers with a specific prefix,' 'Count occurrences of a character.' These appear in real SQL interviews regularly.
Detailed Explanation
Detailed Explanation
1. CONCAT and CONCAT_WS - Joining Strings
SELECT CONCAT('Hello', ' ', 'World'); -- 'Hello World'
SELECT CONCAT('User: ', id, ', Name: ', full_name) FROM customers;
-- CONCAT_WS: CONCAT With Separator
SELECT CONCAT_WS(', ', 'Aarav', 'Bengaluru', 'India'); -- 'Aarav, Bengaluru, India'
SELECT CONCAT_WS(' - ', first_name, middle_name, last_name) FROM people;Gotcha: CONCAT returns NULL if ANY argument is NULL. CONCAT_WS skips NULL arguments gracefully (but returns NULL only if the separator itself is NULL).
2. LENGTH vs CHAR_LENGTH
This matters for non-ASCII text:
- LENGTH(str): returns the number of bytes.
- CHAR_LENGTH(str) (or CHARACTER_LENGTH): returns the number of characters.
SELECT LENGTH('hello'); -- 5 bytes
SELECT CHAR_LENGTH('hello'); -- 5 chars
SELECT LENGTH('नमस्ते'); -- 18 bytes (UTF-8)
SELECT CHAR_LENGTH('नमस्ते'); -- 6 charsFor anything user-facing (validating max length, counting), use CHAR_LENGTH. LENGTH is for storage calculations.
3. UPPER, LOWER, INITCAP
SELECT UPPER('hello'); -- 'HELLO'
SELECT LOWER('HELLO'); -- 'hello'
-- INITCAP capitalizes the first letter of each word
-- Available in PostgreSQL and Oracle; NOT in MySQL
-- MySQL workaround using functions and string manipulation shown laterUse LOWER() for case-insensitive comparisons (on case-sensitive collations):
-- Find all variants of an email
SELECT * FROM customers WHERE LOWER(email) = 'priya@mail.com';4. SUBSTRING (SUBSTR) - Extract a Portion
Syntax: SUBSTRING(str, start, length) or SUBSTRING(str FROM start FOR length). Position is 1-based in MySQL.
SELECT SUBSTRING('Bengaluru', 1, 4); -- 'Beng'
SELECT SUBSTRING('Bengaluru', 4); -- 'galuru' (from position 4 to end)
SELECT SUBSTR('Bengaluru', -4); -- 'luru' (last 4 chars, negative = from end)5. LEFT and RIGHT
SELECT LEFT('Bengaluru', 4); -- 'Beng'
SELECT RIGHT('Bengaluru', 4); -- 'luru'
-- Get last 4 digits of a phone number
SELECT RIGHT(phone, 4) FROM customers;LEFT/RIGHT are more readable than SUBSTRING when you want a prefix or suffix.
6. TRIM, LTRIM, RTRIM
-- Remove whitespace from both ends
SELECT TRIM(' Aarav Kumar '); -- 'Aarav Kumar'
-- Left only
SELECT LTRIM(' Aarav'); -- 'Aarav'
-- Right only
SELECT RTRIM('Aarav '); -- 'Aarav'
-- Remove specific characters
SELECT TRIM(LEADING '0' FROM '00012345'); -- '12345'
SELECT TRIM(TRAILING '.' FROM 'hello...'); -- 'hello'
SELECT TRIM(BOTH '-' FROM '---abc---'); -- 'abc'For cleaning messy input: TRIM() before comparison and before storage.
7. REPLACE - Substitute Substrings
SELECT REPLACE('+91-98765-43210', '-', ''); -- '+919876543210'
SELECT REPLACE(email, '@gmail.com', '@GMAIL.COM') FROM customers;
SELECT REPLACE('Hello World', 'World', 'SQL'); -- 'Hello SQL'REPLACE is case-sensitive. To replace case-insensitively in MySQL, you need REGEXP_REPLACE (MySQL 8.0+).
8. LOCATE, INSTR, POSITION - Find a Substring
All three do the same thing with slightly different argument orders. Return the 1-based position of a substring, or 0 if not found.
SELECT LOCATE('@', 'aarav@mail.com'); -- 6
SELECT INSTR('aarav@mail.com', '@'); -- 6
SELECT POSITION('@' IN 'aarav@mail.com'); -- 6
-- LOCATE with start position
SELECT LOCATE('a', 'banana', 3); -- finds 'a' starting from position 3Combined with SUBSTRING to extract parts:
-- Extract the domain from an email
SELECT SUBSTRING(email, LOCATE('@', email) + 1)
FROM customers;9. LPAD, RPAD - Padding
SELECT LPAD('42', 5, '0'); -- '00042'
SELECT RPAD('abc', 6, '.'); -- 'abc...'
-- Format IDs as 6-digit codes
SELECT LPAD(id, 6, '0') FROM products;
-- 1 -> '000001', 42 -> '000042'10. REVERSE, REPEAT
SELECT REVERSE('hello'); -- 'olleh'
SELECT REPEAT('ab', 4); -- 'abababab'REVERSE is occasionally useful (palindrome checks, finding the last delimiter). REPEAT is common in generating test data or visual separators.
11. FORMAT - Numeric to Thousands-Separated String
SELECT FORMAT(1234567.891, 2); -- '1,234,567.89'
SELECT FORMAT(1234567.891, 0); -- '1,234,568'Handy for report output. Note: the result is a string.
12. Cleaning the Sample Customer Data
-- Clean the messy customer data into a presentable form
SELECT
id,
TRIM(REPLACE(REPLACE(full_name, ' ', ' '), '-', ' ')) AS clean_name,
LOWER(email) AS clean_email,
REPLACE(REPLACE(REPLACE(REPLACE(phone, '-', ''), ' ', ''), '.', ''), '+', '') AS clean_phone,
CONCAT(UPPER(LEFT(city, 1)), LOWER(SUBSTRING(city, 2))) AS proper_city
FROM customers;Expected output:
+----+--------------------+-----------------------+------------------+-----------+
| id | clean_name | clean_email | clean_phone | proper_city|
+----+--------------------+-----------------------+------------------+-----------+
| 1 | Aarav Kumar | aarav.kumar@gmail.com | 919876543210 | Bengaluru |
| 2 | priya SHARMA | priya@mail.com | 919876543211 | Mumbai |
| 3 | Rohan Mehta | rohan@mail.com | 9876543212 | Delhi |
| 4 | meera iyer | meera@mail.com | 0919876543213 | Chennai |
| 5 | Vikram Singh Rajput| v.singh@mail.co.in | 919876543214 | Pune |
+----+--------------------+-----------------------+------------------+-----------+Note: we did not fix capitalization of names here. MySQL lacks INITCAP, so proper-case names require substring manipulation or user-defined functions.
13. Extract First Name - A Real Pattern
-- Everything before the first space
SELECT
full_name,
TRIM(SUBSTRING(TRIM(full_name), 1, LOCATE(' ', TRIM(full_name)) - 1)) AS first_name
FROM customers;
-- Handle the case where there is no space (single-word name)
SELECT
CASE
WHEN LOCATE(' ', TRIM(full_name)) > 0
THEN SUBSTRING(TRIM(full_name), 1, LOCATE(' ', TRIM(full_name)) - 1)
ELSE TRIM(full_name)
END AS first_name
FROM customers;14. Mask Sensitive Data
-- Show only last 4 digits of phone
SELECT CONCAT(REPEAT('X', CHAR_LENGTH(phone) - 4), RIGHT(phone, 4)) AS masked
FROM customers;
-- Show first 2 letters of email + xxxxx + domain
SELECT CONCAT(
LEFT(email, 2),
REPEAT('x', 5),
SUBSTRING(email, LOCATE('@', email))
) AS masked_email
FROM customers;
Code Examples
SELECT
id,
CONCAT(full_name, ' <', email, '>') AS contact_card,
CONCAT_WS(' | ', full_name, email, phone, city) AS summary
FROM customers
LIMIT 3;SELECT
id,
full_name AS original,
-- Collapse multiple spaces, trim, replace dashes
TRIM(REPLACE(REPLACE(full_name, ' ', ' '), '-', ' ')) AS cleaned,
CHAR_LENGTH(TRIM(full_name)) AS clean_length
FROM customers;REPLACE(..., ' ', ' ') collapses double spaces (note: does not handle 3+ spaces in one pass). REPLACE(..., '-', ' ') turns dashes into spaces. TRIM removes leading/trailing whitespace. CHAR_LENGTH shows the character count after cleaning.SELECT
email,
SUBSTRING(email, 1, LOCATE('@', email) - 1) AS username,
SUBSTRING(email, LOCATE('@', email) + 1) AS domain
FROM customers;SELECT
full_name,
TRIM(SUBSTRING(
TRIM(full_name),
1,
CASE
WHEN LOCATE(' ', TRIM(full_name)) = 0 THEN CHAR_LENGTH(TRIM(full_name))
ELSE LOCATE(' ', TRIM(full_name)) - 1
END
)) AS first_name
FROM customers;SELECT
phone AS original,
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phone, '-', ''), ' ', ''), '.', ''), '(', ''), ')', '') AS digits_only,
RIGHT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phone, '-', ''), ' ', ''), '.', ''), '+', ''), '0', ''), 10) AS normalized_10_digit
FROM customers;REGEXP_REPLACE(phone, '[^0-9]', '').SELECT
'Hello' AS str,
LENGTH('Hello') AS byte_length,
CHAR_LENGTH('Hello') AS char_length
UNION ALL
SELECT
'नमस्ते',
LENGTH('नमस्ते'),
CHAR_LENGTH('नमस्ते')
UNION ALL
SELECT
' Aarav ',
LENGTH(' Aarav '),
CHAR_LENGTH(' Aarav ');CREATE TABLE products (id INT, name VARCHAR(30));
INSERT INTO products VALUES (1,'Pen'),(42,'Notebook'),(375,'Laptop'),(9999,'Desk');
SELECT
id,
LPAD(id, 6, '0') AS product_code,
CONCAT('PRD-', LPAD(id, 6, '0')) AS sku
FROM products;-- Find emails where the lowercase form appears more than once
SELECT
LOWER(email) AS normalized_email,
COUNT(*) AS occurrences,
GROUP_CONCAT(email) AS original_forms
FROM customers
GROUP BY LOWER(email)
HAVING COUNT(*) > 1;Common Mistakes
Using LENGTH When You Mean CHAR_LENGTH
-- Validate that a username is 5-15 characters
SELECT * FROM users WHERE LENGTH(username) BETWEEN 5 AND 15;SELECT * FROM users WHERE CHAR_LENGTH(username) BETWEEN 5 AND 15;CONCAT With NULL Returns NULL
-- middle_name is NULL for most people
SELECT CONCAT(first_name, ' ', middle_name, ' ', last_name) FROM people;
-- Returns NULL whenever middle_name IS NULL — full name is lost-- Use CONCAT_WS (skips NULLs automatically)
SELECT CONCAT_WS(' ', first_name, middle_name, last_name) FROM people;
-- Or use COALESCE to replace NULL with empty string
SELECT CONCAT(first_name, ' ', COALESCE(middle_name, ''), ' ', last_name) FROM people;SUBSTRING With Off-by-One Error
-- Intent: get the first 5 characters
SELECT SUBSTRING('Bengaluru', 0, 5); -- returns '' in MySQL!SELECT SUBSTRING('Bengaluru', 1, 5); -- 'Benga'
-- Or use LEFT for prefixes
SELECT LEFT('Bengaluru', 5);Forgetting REPLACE Is Case-Sensitive
-- Intent: mask 'Gmail' regardless of case
SELECT REPLACE(email, 'gmail', 'GMAIL') FROM customers;
-- 'Aarav.Kumar@Gmail.COM' unchanged!-- MySQL 8.0+: REGEXP_REPLACE with case-insensitive flag
SELECT REGEXP_REPLACE(email, 'gmail', 'GMAIL', 1, 0, 'i') FROM customers;
-- Older MySQL: uppercase first, then replace, or use LOWER to compare
SELECT REPLACE(LOWER(email), 'gmail', 'GMAIL') FROM customers;Summary
- CONCAT joins strings with no separator. CONCAT_WS takes a separator and safely skips NULL arguments. CONCAT returns NULL if any argument is NULL.
- LENGTH returns bytes; CHAR_LENGTH returns characters. For user-facing validation and Unicode text, use CHAR_LENGTH.
- UPPER and LOWER change case. MySQL has no INITCAP — proper case requires manual substring manipulation or a UDF.
- SUBSTRING(str, start, length) is 1-indexed. LEFT(str, n) and RIGHT(str, n) are more readable for prefixes and suffixes.
- TRIM removes whitespace; LTRIM and RTRIM remove from one side. TRIM with LEADING/TRAILING/BOTH and a specific character cleans other padding.
- REPLACE is case-sensitive. For case-insensitive replacement in MySQL 8.0+, use REGEXP_REPLACE with the 'i' flag.
- LOCATE, INSTR, POSITION all find a substring and return 1-based position (0 if not found). Combine with SUBSTRING to extract parts like email domains.
- LPAD and RPAD add padding to make strings a fixed length. Common use: formatting IDs like '000042' or SKUs like 'PRD-000042'.
- REVERSE and REPEAT are niche. FORMAT(n, d) formats a number with thousands separators and d decimals — returns a string.
- For cleaning data: TRIM then REPLACE to remove bad characters, LOWER for normalized comparison, CONCAT_WS for NULL-safe joining.