Accessing database collections and displaying the result is a peace of cake with various techniques. Users often need to sort the data according to their different need (for e.g. date). For that purpose PostgreSQL's default sort works as expected. However, when user attempts to sort the column that has numbers stored as string then things go wrong, the default sort gives unexpected results due to lexicographical sorting.
Lexicographical Order
The way data (like strings or numbers) is sorted in a "dictionary order," meaning characters are compared one by one, from left to right, just like how words are arranged in a dictionary, with the first character determining the order if the initial characters are the same.
Simply put PostgreSQL compares the words/values, stored as string (whether it is number or actual string) of the columns, from left to right based on their ASCII code or Unicode values. However it works on normal cases for strings but it won't work as expected in case of the number as strings.
It is a case-sensitive sorting algorithm. Uppercase letters are considered smaller than lowercase letters.
Example
Consider a persons table with a name and age column, where age is stored as a string:
name | age
---------|---------
'ram' | '16'
'hari' | '7'
'sita' | '20'
'rita' | '1'
'ashok' | '104'
'prabin' | '2'
SELECT name FROM persons ORDER BY name ASC;
Result
'ashok'
'hari'
'prabin'
'ram'
'rita'
'sita'
Excellent the output is as expected, now lets' see an example for sorting of age which is stored as string.
SELECT age FROM persons ORDER BY age ASC;
Result
'1'
'104'
'16'
'2'
'20'
'7'
This is not what user expected but the according to the default PostgreSQL's behavior this is correct.
Solutions for Sorting of Numerical String
- Type Casting: Casting the value to numbers before sorting makes sure it sorts the numbers as numbers rather than sorting string.
SELECT age FROM persons ORDER BY age::INTEGER ASC;
--or
SELECT age FROM persons ORDER BY CAST(age AS INTEGER) ASC;
Result
'1'
'2'
'7'
'16'
'20'
'104'
This is the correct result as user expected.
- ✅ Ensures correct numerical order, fast execution.
- ❌ Fails if non-numeric values exist.
- Zero Prefix Padding: As we know normally people lives for 100 to 125 years old at max so, what we need to do is make sure the age is always three digits. For that purpose we use 0s before the values if those are not 3digits. The values should be stored by padding already if not we can use following query to simulate padding.
-- Here LPAD means left padding
SELECT age FROM persons ORDER BY LPAD(age, 3, '0') ASC;
Padding Preprocessed columns
'1' → '001'
'2' → '002'
'7' → '007'
'16' → '016'
'20' → '020'
'104' → '104'
Result
'1'
'2'
'7'
'16'
'20'
'104'
- ✅ Works with strings, maintains consistency
- ❌ Needs proper length padding
However lexicographical ordering works for following purposes as normal without any thoughts.
- String sorting algorithms.
- Searching for words in dictionaries.
- Comparing filenames, usernames, or keys in databases.
- Competitive programming challenges requiring string comparisons.
Conclusion
Lexicographical ordering plays a crucial role in string sorting but can cause issues when dealing with numeric strings. While *PostgreSQL’s * default behavior follows lexicographical order, proper handling of numeric strings requires explicit type casting or padding techniques. By understanding and applying these solutions, developers can ensure accurate and expected sorting behavior in their applications.
Have you faced similar sorting challenges? Share your experiences in the comments!
Top comments (0)