Source: Leetcode
In this blog post, we’ll explore how to capitalize words in a text column using PostgreSQL while considering special cases for hyphenated words. This process involves transforming a sentence such that the first letter of each word is capitalized, and hyphenated words are properly handled by capitalizing both parts of the word.
Problem Overview
Imagine you have a table user_content
with two columns: content_id
and content_text
. The content_text
column contains sentences or phrases, and you want to:
Capitalize the first letter of each word in content_text.
For words containing hyphens (like "top-rated" or "well-known"), both parts of the word should be capitalized (e.g., "Top-Rated" and "Well-Known").
Ensure that the rest of the letters in each word are in lowercase.
This transformation is often required for proper formatting in text processing, especially when dealing with user-generated content, titles, or descriptions.
Example:
Input:
user_content table:
+------------+---------------------------------+
| content_id | content_text |
+------------+---------------------------------+
| 1 | hello world of SQL |
| 2 | the QUICK-brown fox |
| 3 | modern-day DATA science |
| 4 | web-based FRONT-end development |
+------------+---------------------------------+
Output:
+------------+---------------------------------+---------------------------------+
| content_id | original_text | converted_text |
+------------+---------------------------------+---------------------------------+
| 1 | hello world of SQL | Hello World Of Sql |
| 2 | the QUICK-brown fox | The Quick-Brown Fox |
| 3 | modern-day DATA science | Modern-Day Data Science |
| 4 | web-based FRONT-end development | Web-Based Front-End Development |
+------------+---------------------------------+---------------------------------+
Solution Approach
We can solve this problem using a combination of string manipulation functions in PostgreSQL, including unnest, string_to_array, substring, and string_agg. Let’s break down the query:
Step-by-Step Explanation
Breaking Down content_text into Words:
First, we need to split the content_text into individual words. We use the string_to_array function to split the text by spaces.
We then apply the unnest function to convert the array of words into individual rows. This allows us to process each word separately.
SELECT
content_id,
content_text,
unnest(string_to_array(content_text, ' ')) AS word
FROM user_content
Here, the unnest(string_to_array(content_text, ' '))
part splits the content_text into individual words, and each word is returned in a separate row.
Handling Hyphenated Words:
Now, we need to address words that contain hyphens (e.g., "top-rated").
We use a CASE statement to check if a word contains a hyphen (LIKE '%-%').
For hyphenated words, we split the word by the hyphen, capitalize each part, and then join the parts back with a hyphen.
WHEN word LIKE '%-%' THEN
(
SELECT string_agg(
UPPER(substring(part, 1, 1)) || LOWER(substring(part, 2)), '-'
)
FROM unnest(string_to_array(word, '-')) AS part
)
The string_to_array(word, '-')
splits the hyphenated word into parts (e.g., "top-rated" becomes ['top', 'rated']).
unnest then converts the array into individual rows, and the UPPER(substring(part, 1, 1)) || LOWER(substring(part, 2))
part capitalizes each part by making the first letter uppercase and the rest lowercase.
Finally, string_agg(..., '-')
joins the capitalized parts back together with a hyphen.
Regular Word Capitalization:
For words that do not contain hyphens, we simply capitalize the first letter and make the rest of the letters lowercase.
ELSE
UPPER(substring(word, 1, 1)) || LOWER(substring(word, 2))
This ensures that for words like "hello" or "world", the first letter is capitalized, and the rest are in lowercase, turning "hello" into "Hello" and "world" into "World".
Aggregating Words Back Together:
After processing each word, we need to combine them back into a sentence. We use string_agg to aggregate the processed words into a single string, separated by spaces.
string_agg(
CASE
...
END, ' ') AS converted_text
This step ensures that all the processed words are joined together to form the final sentence.
Final Output:
Finally, we select the content_id, the original content_text, and the newly converted converted_text for each row. We also order the results by content_id to maintain the original order.
SELECT content_id, original_text, converted_text
FROM SplitWords
ORDER BY content_id;
Full Query
Putting everything together, the full query looks like this:
WITH SplitWords AS (
SELECT
content_id,
content_text AS original_text,
string_agg(
CASE
-- Check if the word contains a hyphen
WHEN word LIKE '%-%' THEN
-- Use a lateral join to split by hyphen and capitalize each part
(
SELECT string_agg(UPPER(substring(part, 1, 1)) || LOWER(substring(part, 2)), '-')
FROM unnest(string_to_array(word, '-')) AS part
)
ELSE
-- For regular words, just capitalize the first letter
UPPER(substring(word, 1, 1)) || LOWER(substring(word, 2))
END, ' ') AS converted_text
FROM (
SELECT
content_id,
content_text,
unnest(string_to_array(content_text, ' ')) AS word
FROM user_content
) AS Words
GROUP BY content_id, content_text
)
SELECT content_id, original_text, converted_text
FROM SplitWords
ORDER BY content_id;
Key Takeaways
This query uses PostgreSQL’s powerful string manipulation functions (substring, string_agg, unnest) to handle text transformations.
Hyphenated words are split, processed, and rejoined with proper capitalization.
The string_agg function is essential for rejoining words after processing.
This approach ensures that all words are correctly capitalized according to the specified rules.
Conclusion
By using PostgreSQL’s text manipulation capabilities, we can efficiently capitalize words in a sentence and handle special cases like hyphenated words. This query serves as a practical solution for text formatting in PostgreSQL, ensuring consistency and correct capitalization across various text inputs.
Top comments (0)