When working on a multilingual website, you often need to deal with special and accented characters. In Québec (Canada), websites are usually bilingual because we speak both French and English. This can give developers a headache when sorting arrays alphabetically, because of “caractères spéciaux” (French for special characters).
I developed a few methods that can help overcome the difficulties of multidimensional array sorting.
The problem with sorting special characters
The best way to explain the problem is with an example. Let’s say we have a multidimensional array of category objects returned by an API, and we want it ordered alphabetically by category name for a specific language:
<?php
require 'StringHelper.php';
$helper = new StringHelper();
$categories = [];
$news = new stdClass();
$news->names = ["fr" => "Actualités", "en" => "News"];
$categories[] = $news;
$sports = new stdClass();
$sports->names = ["fr" => "Sports", "en" => "Sports"];
$categories[] = $sports;
$home = new stdClass();
$home->names = ["fr" => "Accueil", "en" => "Home"];
$categories[] = $home;
$events = new stdClass();
$events->names = ["fr" => "Événements", "en" => "Events"];
$categories[] = $events;
$special = new stdClass();
$special->names = ["fr" => "Spécial", "en" => "Special"];
$categories[] = $special;
// Alphabetically sorted result in French: Accueil, Actualités, Événements, Spécial, Sports
// Alphabetically sorted result in English: Events, Home, News, Special, Sports
// Original order
var_dump($categories);
$helper->alphabeticalCompareArrayByKey($categories, 'names', 'fr');
// Alphabetical order (French) by names
var_dump($categories);
Ordering this array correctly is tricky for two reasons: the accented characters, and the structure of the data (an array of objects, each holding its names in a nested array).
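To make the problem concrete, here is a minimal sketch (not part of the original helper) that sorts only the French names with PHP's default sort(). Non-numeric strings are compared byte by byte, and the UTF-8 bytes of É and é are greater than every ASCII letter, so accented words drift to the wrong place:

```php
<?php
// French category names from the example above.
$names = ['Actualités', 'Sports', 'Accueil', 'Événements', 'Spécial'];

// Default sort(): non-numeric strings are compared byte by byte,
// so "É" (bytes 0xC3 0x89) sorts after every ASCII letter.
$naive = $names;
sort($naive);

print_r($naive);
// Resulting order: Accueil, Actualités, Sports, Spécial, Événements
```

Both Événements and Spécial end up after Sports, which is exactly what the StringHelper approach avoids.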
The solution for sorting an array of objects
I solved this problem by creating two functions in a StringHelper class (StringHelper.php) that can be reused across the application:
<?php
class StringHelper
{
/**
* Sort an array of objects alphabetically by a value nested in each object
*
* @param array &$array Reference to the array to sort
* @param string $element Name of the object property holding the values to compare
* @param string $key Key within that property to sort by (e.g. a language code)
* @return void
*/
public static function alphabeticalCompareArrayByKey(&$array, string $element, string $key){
usort($array, function($a, $b) use ($element, $key) {
return strcasecmp(self::transliterateString($a->{$element}[$key]), self::transliterateString($b->{$element}[$key]));
});
}
/**
* Replace accented characters in a string and return it in lowercase
*
* Example :
* echo StringHelper::transliterateString('Événement'); // evenement
*
* @param string $string String with accented characters
* @return string Transliterated string
*/
public static function transliterateString($string)
{
$transliterationTable = ['á' => 'a', 'Á' => 'A', 'à' => 'a', 'À' => 'A', 'ă' => 'a', 'Ă' => 'A', 'â' => 'a', 'Â' => 'A', 'å' => 'a', 'Å' => 'A', 'ã' => 'a', 'Ã' => 'A', 'ą' => 'a', 'Ą' => 'A', 'ā' => 'a', 'Ā' => 'A', 'ä' => 'ae', 'Ä' => 'AE', 'æ' => 'ae', 'Æ' => 'AE', 'ḃ' => 'b', 'Ḃ' => 'B', 'ć' => 'c', 'Ć' => 'C', 'ĉ' => 'c', 'Ĉ' => 'C', 'č' => 'c', 'Č' => 'C', 'ċ' => 'c', 'Ċ' => 'C', 'ç' => 'c', 'Ç' => 'C', 'ď' => 'd', 'Ď' => 'D', 'ḋ' => 'd', 'Ḋ' => 'D', 'đ' => 'd', 'Đ' => 'D', 'ð' => 'dh', 'Ð' => 'Dh', 'é' => 'e', 'É' => 'E', 'è' => 'e', 'È' => 'E', 'ĕ' => 'e', 'Ĕ' => 'E', 'ê' => 'e', 'Ê' => 'E', 'ě' => 'e', 'Ě' => 'E', 'ë' => 'e', 'Ë' => 'E', 'ė' => 'e', 'Ė' => 'E', 'ę' => 'e', 'Ę' => 'E', 'ē' => 'e', 'Ē' => 'E', 'ḟ' => 'f', 'Ḟ' => 'F', 'ƒ' => 'f', 'Ƒ' => 'F', 'ğ' => 'g', 'Ğ' => 'G', 'ĝ' => 'g', 'Ĝ' => 'G', 'ġ' => 'g', 'Ġ' => 'G', 'ģ' => 'g', 'Ģ' => 'G', 'ĥ' => 'h', 'Ĥ' => 'H', 'ħ' => 'h', 'Ħ' => 'H', 'í' => 'i', 'Í' => 'I', 'ì' => 'i', 'Ì' => 'I', 'î' => 'i', 'Î' => 'I', 'ï' => 'i', 'Ï' => 'I', 'ĩ' => 'i', 'Ĩ' => 'I', 'į' => 'i', 'Į' => 'I', 'ī' => 'i', 'Ī' => 'I', 'ĵ' => 'j', 'Ĵ' => 'J', 'ķ' => 'k', 'Ķ' => 'K', 'ĺ' => 'l', 'Ĺ' => 'L', 'ľ' => 'l', 'Ľ' => 'L', 'ļ' => 'l', 'Ļ' => 'L', 'ł' => 'l', 'Ł' => 'L', 'ṁ' => 'm', 'Ṁ' => 'M', 'ń' => 'n', 'Ń' => 'N', 'ň' => 'n', 'Ň' => 'N', 'ñ' => 'n', 'Ñ' => 'N', 'ņ' => 'n', 'Ņ' => 'N', 'ó' => 'o', 'Ó' => 'O', 'ò' => 'o', 'Ò' => 'O', 'ô' => 'o', 'Ô' => 'O', 'ő' => 'o', 'Ő' => 'O', 'õ' => 'o', 'Õ' => 'O', 'ø' => 'oe', 'Ø' => 'OE', 'ō' => 'o', 'Ō' => 'O', 'ơ' => 'o', 'Ơ' => 'O', 'ö' => 'oe', 'Ö' => 'OE', 'ṗ' => 'p', 'Ṗ' => 'P', 'ŕ' => 'r', 'Ŕ' => 'R', 'ř' => 'r', 'Ř' => 'R', 'ŗ' => 'r', 'Ŗ' => 'R', 'ś' => 's', 'Ś' => 'S', 'ŝ' => 's', 'Ŝ' => 'S', 'š' => 's', 'Š' => 'S', 'ṡ' => 's', 'Ṡ' => 'S', 'ş' => 's', 'Ş' => 'S', 'ș' => 's', 'Ș' => 'S', 'ß' => 'SS', 'ť' => 't', 'Ť' => 'T', 'ṫ' => 't', 'Ṫ' => 'T', 'ţ' => 't', 'Ţ' => 'T', 'ț' => 't', 'Ț' => 'T', 'ŧ' => 't', 'Ŧ' => 'T', 'ú' => 'u', 'Ú' => 'U', 'ù' => 'u', 'Ù' => 'U', 'ŭ' => 
'u', 'Ŭ' => 'U', 'û' => 'u', 'Û' => 'U', 'ů' => 'u', 'Ů' => 'U', 'ű' => 'u', 'Ű' => 'U', 'ũ' => 'u', 'Ũ' => 'U', 'ų' => 'u', 'Ų' => 'U', 'ū' => 'u', 'Ū' => 'U', 'ư' => 'u', 'Ư' => 'U', 'ü' => 'ue', 'Ü' => 'UE', 'ẃ' => 'w', 'Ẃ' => 'W', 'ẁ' => 'w', 'Ẁ' => 'W', 'ŵ' => 'w', 'Ŵ' => 'W', 'ẅ' => 'w', 'Ẅ' => 'W', 'ý' => 'y', 'Ý' => 'Y', 'ỳ' => 'y', 'Ỳ' => 'Y', 'ŷ' => 'y', 'Ŷ' => 'Y', 'ÿ' => 'y', 'Ÿ' => 'Y', 'ź' => 'z', 'Ź' => 'Z', 'ž' => 'z', 'Ž' => 'Z', 'ż' => 'z', 'Ż' => 'Z', 'þ' => 'th', 'Þ' => 'Th', 'µ' => 'u', 'а' => 'a', 'А' => 'a', 'б' => 'b', 'Б' => 'b', 'в' => 'v', 'В' => 'v', 'г' => 'g', 'Г' => 'g', 'д' => 'd', 'Д' => 'd', 'е' => 'e', 'Е' => 'e', 'ё' => 'e', 'Ё' => 'e', 'ж' => 'zh', 'Ж' => 'zh', 'з' => 'z', 'З' => 'z', 'и' => 'i', 'И' => 'i', 'й' => 'j', 'Й' => 'j', 'к' => 'k', 'К' => 'k', 'л' => 'l', 'Л' => 'l', 'м' => 'm', 'М' => 'm', 'н' => 'n', 'Н' => 'n', 'о' => 'o', 'О' => 'o', 'п' => 'p', 'П' => 'p', 'р' => 'r', 'Р' => 'r', 'с' => 's', 'С' => 's', 'т' => 't', 'Т' => 't', 'у' => 'u', 'У' => 'u', 'ф' => 'f', 'Ф' => 'f', 'х' => 'h', 'Х' => 'h', 'ц' => 'c', 'Ц' => 'c', 'ч' => 'ch', 'Ч' => 'ch', 'ш' => 'sh', 'Ш' => 'sh', 'щ' => 'sch', 'Щ' => 'sch', 'ъ' => '', 'Ъ' => '', 'ы' => 'y', 'Ы' => 'y', 'ь' => '', 'Ь' => '', 'э' => 'e', 'Э' => 'e', 'ю' => 'ju', 'Ю' => 'ju', 'я' => 'ja', 'Я' => 'ja'];
$transliteratedString = str_replace(array_keys($transliterationTable), array_values($transliterationTable), $string);
return trim(strtolower($transliteratedString));
}
}
The two functions may look complicated at first sight, but they are actually quite simple.
Explanation and result (sorting the multidimensional array) 🧙♂️
The first function takes three parameters: a reference to the multidimensional array (or array of objects), and the element (object property) and key used to sort it.
Let’s say we want to order the previous array by French names. The call would be:
alphabeticalCompareArrayByKey($categories, 'names', 'fr');
Because the array is passed by reference, there is no need to reassign the result to a variable. The usort() function sorts an array by its values using a user-defined comparison function.
Our comparison function relies on strcasecmp(), a binary-safe, case-insensitive string comparison.
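A quick check, assuming PHP 8, of why strcasecmp() alone is not enough: it folds case for ASCII letters only, so an accented capital is still compared by its raw UTF-8 bytes.

```php
<?php
// strcasecmp() ignores case for ASCII letters.
var_dump(strcasecmp('Sports', 'sports') === 0);        // bool(true)

// "É" is the two-byte sequence 0xC3 0x89; strcasecmp() does not fold it
// to "e", and it compares greater than any ASCII letter.
var_dump(strcasecmp('Événements', 'evenements') === 0); // bool(false)
var_dump(strcasecmp('Événements', 'Sports') > 0);       // bool(true)
```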
Before comparing, each value is passed through transliterateString(), which replaces accented characters with their unaccented equivalents (e.g. é → e, â → a).
The comparison then returns the expected alphabetical order. 💪
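Putting the pieces together, here is a self-contained sketch of the same approach. The helper names transliterate() and sortByKey() are hypothetical, and the transliteration table is deliberately reduced to a few French accented letters; in practice you would use the full table from StringHelper. Note that the nested value is read with $a->{$element}[$key], since $element names a property that holds an array:

```php
<?php
// Hypothetical, reduced transliteration table (French letters only).
function transliterate(string $string): string
{
    $table = [
        'à' => 'a', 'â' => 'a', 'ç' => 'c', 'é' => 'e', 'è' => 'e',
        'ê' => 'e', 'ë' => 'e', 'î' => 'i', 'ï' => 'i', 'ô' => 'o',
        'ù' => 'u', 'û' => 'u',
        'À' => 'A', 'Â' => 'A', 'Ç' => 'C', 'É' => 'E', 'È' => 'E',
        'Ê' => 'E', 'Ë' => 'E', 'Î' => 'I', 'Ï' => 'I', 'Ô' => 'O',
        'Ù' => 'U', 'Û' => 'U',
    ];
    // Replace accents, then lowercase and trim, like transliterateString().
    return trim(strtolower(str_replace(array_keys($table), array_values($table), $string)));
}

// Hypothetical helper: sort objects in place by $object->{$element}[$key].
function sortByKey(array &$array, string $element, string $key): void
{
    usort($array, function ($a, $b) use ($element, $key) {
        return strcasecmp(
            transliterate($a->{$element}[$key]),
            transliterate($b->{$element}[$key])
        );
    });
}

// Same categories as in the article.
$categories = [];
foreach ([
    ['fr' => 'Actualités', 'en' => 'News'],
    ['fr' => 'Sports', 'en' => 'Sports'],
    ['fr' => 'Accueil', 'en' => 'Home'],
    ['fr' => 'Événements', 'en' => 'Events'],
    ['fr' => 'Spécial', 'en' => 'Special'],
] as $names) {
    $category = new stdClass();
    $category->names = $names;
    $categories[] = $category;
}

sortByKey($categories, 'names', 'fr');
$french = array_map(fn($c) => $c->names['fr'], $categories);
// Accueil, Actualités, Événements, Spécial, Sports

sortByKey($categories, 'names', 'en');
$english = array_map(fn($c) => $c->names['en'], $categories);
// Events, Home, News, Special, Sports
```

Calling sortByKey() again with 'en' reorders the same objects by their English names without touching the comparison logic.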
The previous example only works with an array of objects, but you can easily adapt it to sort an array of arrays by replacing the strcasecmp() line with:
return strcasecmp(self::transliterateString($a[$element][$key]), self::transliterateString($b[$element][$key]));
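For an array of arrays, a minimal sketch of the adapted comparison could look like this. As before, transliterateFr() is a hypothetical stand-in with a reduced table, not part of the article's StringHelper class:

```php
<?php
// Hypothetical stand-in for StringHelper::transliterateString().
function transliterateFr(string $string): string
{
    $table = ['é' => 'e', 'É' => 'E', 'è' => 'e', 'ê' => 'e', 'à' => 'a', 'â' => 'a', 'ç' => 'c'];
    return trim(strtolower(str_replace(array_keys($table), array_values($table), $string)));
}

// Categories as plain associative arrays instead of stdClass objects.
$categories = [
    ['names' => ['fr' => 'Spécial', 'en' => 'Special']],
    ['names' => ['fr' => 'Événements', 'en' => 'Events']],
    ['names' => ['fr' => 'Accueil', 'en' => 'Home']],
];

$element = 'names';
$key = 'fr';
usort($categories, function ($a, $b) use ($element, $key) {
    // Array access ($a[$element][$key]) replaces property access.
    return strcasecmp(transliterateFr($a[$element][$key]), transliterateFr($b[$element][$key]));
});

// Extract the French names in their new order.
$sorted = array_column(array_column($categories, 'names'), 'fr');
// Accueil, Événements, Spécial
```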
Let me know if this article helped you sort out your sorting problems!
Cover image: Edu Grande (@edgr) from Unsplash
Comments
Nice! One thing I’d change is to get the characters by char code instead of writing them out manually: if you use char code ranges for each language, you can’t accidentally miss any.
Thanks, that is a good idea. I found the characters array somewhere on the Internet, and it fit my needs for French accented characters. It also includes characters for several other languages. Getting those by char code would be optimal!
I think you mean 'accented'
Thanks! I was convinced I was writing it the correct way. In French, accented is written "accentué".