Strings are fundamental to any programming language, and Elixir's implementation offers unique features and capabilities that make string manipulation both powerful and efficient. From UTF-8 encoding to advanced interpolation techniques, understanding how Elixir handles strings is crucial for building robust applications.
Binary strings (double quotes) are the most common choice in modern Elixir applications, especially when working with text processing, user input, or web applications. Character lists (single quotes) are primarily used when interfacing with Erlang functions that haven't been updated to work with binary strings.
Note: The examples in this article use Elixir 1.17.3. Most string operations behave the same across versions, but some functionality may vary. Certain features, such as Mix.install/1, require Elixir >= 1.12.
Table of Contents
- Introduction
- String Types and Binary Representation
- String Creation and Basic Operations
- String Interpolation Deep Dive
- String Module Functions
- Pattern Matching with Strings
- Working with UTF8
- Regular Expressions
- Heredocs and Documentation
- Extending String Capabilities
- Conclusion
- Further Reading
- Next Steps
Introduction
In Elixir, strings are not just sequences of characters - they are binary sequences encoded in UTF-8. This design choice brings both power and sophistication to string handling. Understanding the nuances of string manipulation in Elixir is essential for writing efficient and maintainable code.
String Types and Binary Representation
Binary Strings vs Character Lists
One common source of confusion for newcomers to Elixir is the distinction between binary strings (double quotes) and character lists (single quotes). While both can represent text, they serve different purposes and have different performance characteristics:
The two representations look like this in IEx. Since Elixir 1.15, charlists are printed with the ~c sigil, which is why the output below differs from the single-quoted input:
# Binary strings (double quotes)
iex> string = "Hello, World!"
"Hello, World!"
iex> is_binary(string)
true
iex> byte_size(string)
13
# Character lists (single quotes)
iex> charlist = 'Hello, World!'
~c"Hello, World!"
iex> is_list(charlist)
true
iex> length(charlist)
13
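When an Erlang function hands back a charlist, it usually needs converting before it can be used with the String module. The following is a small sketch of the round trip using standard library functions only; the variable names are illustrative.

```elixir
# Converting between binary strings and charlists.
string = "Hello"
charlist = String.to_charlist(string)

# A charlist is a plain list of integer codepoints.
IO.inspect(charlist == [72, 101, 108, 108, 111])  # true

# Two ways back to a binary string.
IO.inspect(List.to_string(charlist))  # "Hello"
IO.inspect(to_string(charlist))       # "Hello"

# An Erlang function that returns a charlist rather than a binary.
hex = :erlang.integer_to_list(255, 16)
IO.inspect(is_list(hex))          # true
IO.inspect(List.to_string(hex))   # "FF"
```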
Understanding Binary Representation
In Elixir, strings are stored as UTF-8 encoded binaries, which means they can be viewed and manipulated at the byte level. Each ASCII character occupies exactly one byte, while non-ASCII characters occupy two to four bytes. Through pattern matching and binary operations, we can examine and process these strings efficiently, considering both their raw byte representation and their logical character length:
# Viewing raw binary representation of ASCII string
iex> inspect("hello", binaries: :as_binaries)
"<<104, 101, 108, 108, 111>>"
# Converting string to charlist using comprehension
iex> for <<byte <- "hello">>, do: byte
~c"hello"
# Demonstrating that each ASCII character takes 1 byte
iex> byte_size("hello")
5
# Converting binary string to charlist
iex> "hello" |> :binary.bin_to_list()
~c"hello"
# UTF-8 characters (variable bytes)
iex> string = "héllo"
"héllo"
iex> byte_size(string)
6 # 'é' takes 2 bytes
iex> String.length(string)
5 # But it's still 5 characters
# Viewing raw binary representation (appending a null byte makes the binary non-printable, so IEx shows the bytes)
iex> "héllo" <> <<0>>
<<104, 195, 169, 108, 108, 111, 0>>
# Pattern matching on bytes
iex> <<head::utf8, rest::binary>> = "héllo"
"héllo"
iex> head # Unicode codepoint for 'h'
104
iex> rest
"éllo"
String Creation and Basic Operations
String Literals and Special Characters
# Basic string literal
iex> "Hello, World!"
"Hello, World!"
# Escape sequences
iex> "Line 1\nLine 2\tTabbed"
"Line 1\nLine 2\tTabbed"
# Unicode escape sequences
iex> "\u0061" # Latin small letter 'a'
"a"
iex> "\u{1F600}" # Unicode emoji (😀)
"😀"
# Raw strings (no escape processing)
iex> ~S(Hello\nWorld)
"Hello\\nWorld"
String Concatenation
# Using the <> operator
iex> "Hello" <> " " <> "World"
"Hello World"
# Memory-efficient concatenation of multiple strings using IO lists
iex> IO.iodata_to_binary(["Hello", " ", "World"])
"Hello World"
# Using Enum.join
iex> ["Hello", "World"] |> Enum.join(" ")
"Hello World"
# Building strings efficiently in a loop
iex> words = ["the", "quick", "brown", "fox"]
["the", "quick", "brown", "fox"]
# intersperse - inserts a separator between each list element (useful for joining words)
iex> words |> Enum.intersperse(" ") |> IO.iodata_to_binary()
"the quick brown fox"
# Multi-line format for readability:
# words
# |> Enum.intersperse(" ")
# |> IO.iodata_to_binary()
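IO lists can also be nested, which lets you assemble larger outputs without concatenating intermediate binaries. Here is a minimal sketch that builds CSV-like text from hypothetical sample rows (no quoting or escaping is attempted):

```elixir
rows = [["id", "name"], ["1", "Ada"], ["2", "Linus"]]

# Each row becomes [fields-with-commas, newline]; nothing is copied or joined yet.
iodata =
  for row <- rows do
    [Enum.intersperse(row, ","), "\n"]
  end

# The total size can be computed without flattening the structure.
IO.inspect(IO.iodata_length(iodata))  # 22

# Flatten to a single binary only when one is actually needed.
csv = IO.iodata_to_binary(iodata)
IO.inspect(csv)  # "id,name\n1,Ada\n2,Linus\n"
```

Functions that write to a device or socket generally accept iodata directly, so the final conversion can often be skipped.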
String Interpolation Deep Dive
String interpolation in Elixir is powerful and flexible, offering various ways to embed expressions within strings.
Basic Interpolation
# Simple variable interpolation
iex> name = "World"
iex> "Hello #{name}!"
"Hello World!"
# Expression interpolation
iex> "2 + 2 = #{2 + 2}"
"2 + 2 = 4"
# Function call interpolation
iex> "Uppercase: #{String.upcase("hello")}"
"Uppercase: HELLO"
Advanced Interpolation Techniques
# Pattern matching in interpolation
iex> {:ok, value} = {:ok, 42}
iex> "The value is #{value}"
"The value is 42"
# Calling anonymous functions
iex> formatter = fn x -> String.pad_leading(Integer.to_string(x), 3, "0") end
iex> "Number: #{formatter.(42)}"
"Number: 042"
# Conditional interpolation
iex> show_details = true
iex> "Status: #{if show_details, do: "Active (since 2024)", else: "Active"}"
"Status: Active (since 2024)"
# Map access in interpolation
iex> user = %{name: "John", age: 30}
%{name: "John", age: 30}
iex> "#{user.name} is #{user.age} years old"
"John is 30 years old"
Interpolation with Custom Formatting
# Number formatting
iex> number = 1234.5678
iex> "#{:erlang.float_to_binary(number, decimals: 2)}"
"1234.57"
# Date formatting
iex> date = ~D[2024-03-21]
iex> "#{Calendar.strftime(date, "%B %d, %Y")}"
"March 21, 2024"
# Custom padding
iex> value = 42
iex> "#{String.pad_leading(Integer.to_string(value), 5, "0")}"
"00042"
Multiple Interpolations and Performance
# Multiple interpolations
iex> first = "Hello"
iex> last = "World"
iex> "#{first}, #{String.upcase(last)}!"
"Hello, WORLD!"
String Module Functions
Case Transformations
# Basic case transformations
iex> String.upcase("hello")
"HELLO"
iex> String.downcase("WORLD")
"world"
iex> String.capitalize("hello world")
"Hello world"
String Manipulation
# Trimming
iex> String.trim(" hello ")
"hello"
iex> String.trim_leading(" hello")
"hello"
iex> String.trim_trailing("hello ")
"hello"
# Padding
iex> String.pad_leading("123", 5, "0")
"00123"
iex> String.pad_trailing("hello", 10, ".")
"hello....."
# Splitting
iex> String.split("hello,world", ",")
["hello", "world"]
iex> String.split("hello big world", " ", parts: 2)
["hello", "big world"]
# Replacing
iex> String.replace("hello world", "world", "elixir")
"hello elixir"
iex> String.replace_leading("hello hello", "he", "je")
"jello hello"
iex> String.replace_trailing("hello hello", "lo", "p")
"hello help"
String Analysis
# Length and size
iex> String.length("hello")
5
iex> byte_size("hello")
5
iex> String.length("héllo") # UTF-8 aware
5
iex> byte_size("héllo") # Actual bytes
6
# Content checks
iex> String.contains?("hello world", "world")
true
iex> String.starts_with?("hello", "he")
true
iex> String.ends_with?("hello", "lo")
true
# String comparison
# Jaro Distance - calculates string similarity (0.0 to 1.0, where 1.0 means exact match)
iex> String.jaro_distance("hello", "hello")
1.0
# Myers Difference - shows differences between strings by identifying equal (eq), deleted (del) and inserted (ins) parts
iex> String.myers_difference("hello", "hallo")
[eq: "h", del: "e", ins: "a", eq: "llo"]
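One practical use of a similarity score is a "did you mean?" suggestion. The following is a rough sketch: Suggest is a hypothetical module, and the 0.8 threshold is an arbitrary assumption you would tune for your data.

```elixir
defmodule Suggest do
  # Returns the candidate most similar to `input`, or nil when no candidate
  # reaches the threshold. Raises Enum.EmptyError for an empty candidate list.
  def closest(input, candidates, threshold \\ 0.8) do
    {best, score} =
      candidates
      |> Enum.map(fn candidate -> {candidate, String.jaro_distance(input, candidate)} end)
      |> Enum.max_by(fn {_candidate, score} -> score end)

    if score >= threshold, do: best, else: nil
  end
end

commands = ["compile", "test", "deps"]

IO.inspect(Suggest.closest("complie", commands))  # "compile"
IO.inspect(Suggest.closest("xyz", commands))      # nil
```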
Pattern Matching with Strings
Pattern matching is one of Elixir's most powerful features, and it works exceptionally well with strings:
# Basic string pattern matching
iex> "Hello " <> rest = "Hello World"
iex> rest
"World"
# Multiple captures, using a fixed byte size for the first part
iex> <<head::binary-size(5)>> <> " " <> tail = "Hello World"
iex> head
"Hello"
iex> tail
"World"
# Binary pattern matching
iex> <<x, y, z>> = "abc"
iex> {x, y, z}
{97, 98, 99}
# UTF-8 aware pattern matching
iex> <<char::utf8, rest::binary>> = "über"
iex> char
252
iex> rest
"ber"
Working with UTF8
Elixir has excellent support for Unicode and UTF-8 encoding:
# Unicode escape sequences
iex> "\u0061" # Latin small letter a
"a"
iex> "\u0308" # Combining diaeresis
"̈"
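# Graphemes vs Codepoints
# A grapheme is what a reader perceives as one character; a codepoint is a single Unicode value
# With a precomposed "é" (U+00E9) the two views agree:
iex> String.graphemes("café")
["c", "a", "f", "é"]
iex> String.codepoints("café")
["c", "a", "f", "é"]
# With a decomposed "é" ("e" plus the combining acute accent U+0301) they differ:
iex> String.graphemes("cafe\u0301") |> length()
4
iex> String.codepoints("cafe\u0301") |> length()
5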
# Unicode operations
# next_grapheme - returns the next visible character and the rest of the string
iex> String.next_grapheme("é")
{"é", ""}
iex> String.first("élixir")
"é"
iex> String.last("café")
"é"
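# Unicode normalization - converts character combinations into standardized forms
# :nfd (decomposition) and :nfc (composition) are normalization forms
iex> string = "e\u0301" # "e" followed by a combining acute accent
"é"
# Both results render identically; only the underlying codepoints differ
iex> String.normalize(string, :nfd) # stays decomposed: 2 codepoints, 3 bytes
"é"
iex> String.normalize(string, :nfc) # composed into U+00E9: 1 codepoint, 2 bytes
"é"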
Regular Expressions
Elixir provides robust support for regular expressions through both the Regex module and the ~r sigil:
# Basic regex creation and matching
iex> regex = ~r/hello/i
iex> String.match?("Hello World", regex)
true
# Regex with named captures
iex> regex = ~r/(?<greeting>hello) (?<name>\w+)/i
iex> Regex.named_captures(regex, "Hello World")
%{"greeting" => "Hello", "name" => "World"}
# Complex pattern matching
iex> text = "Contact us at info@example.com or support@example.com"
iex> email_regex = ~r/[\w.+-]+@[a-z\d-]+(?:\.[a-z\d-]+)*/i
iex> Regex.scan(email_regex, text)
[["info@example.com"], ["support@example.com"]]
# String replacement with regex
iex> Regex.replace(~r/\d+/, "Date: 2024-03-21", "[REDACTED]")
"Date: [REDACTED]-[REDACTED]-[REDACTED]"
# Regex options
iex> regex = ~r/hello/iu # case-insensitive and Unicode
iex> String.match?("HELLO", regex)
true
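Two further Regex features are worth knowing: the capture option of Regex.run/3, and passing a function as the replacement in Regex.replace/3. The sketch below reuses the date string from above; the date_regex name and the target format are illustrative choices.

```elixir
date_regex = ~r/(\d{4})-(\d{2})-(\d{2})/

# Return only the capture groups, not the full match. Regex.run/3 gives nil on no match.
parts = Regex.run(date_regex, "Date: 2024-03-21", capture: :all_but_first)
IO.inspect(parts)  # ["2024", "03", "21"]
IO.inspect(Regex.run(date_regex, "no date here"))  # nil

# The replacement function receives the full match followed by each capture group.
reordered =
  Regex.replace(date_regex, "Date: 2024-03-21", fn _full, year, month, day ->
    "#{day}/#{month}/#{year}"
  end)

IO.inspect(reordered)  # "Date: 21/03/2024"
```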
Heredocs and Documentation
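Heredocs are particularly useful for multiline strings and documentation. The position of the closing """ sets the baseline: leading whitespace up to that column is stripped from every line, and any indentation beyond it is kept. Here are some examples:
# Basic heredoc example
message = """
This is a multiline
string using heredoc syntax.
Indentation relative to the closing delimiter is preserved.
"""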
# Module documentation example
defmodule StringUtils do
  @moduledoc """
  Provides utility functions for string manipulation.

  ## Examples

      assert StringUtils.titleize("hello world") == "Hello World"
  """

  @doc """
  Converts a string to title case.

  ## Parameters

    - input: The string to convert

  ## Examples

      assert StringUtils.titleize("hello world") == "Hello World"
  """
  def titleize(input) when is_binary(input) do
    input
    |> String.split()
    |> Enum.map(&String.capitalize/1)
    |> Enum.join(" ")
  end
end
Extending String Capabilities
While Elixir's built-in string functionality is comprehensive, additional libraries can extend these capabilities for specialized tasks.
URL Slugs with slugify
The slugify library is a convenient choice for generating URL-friendly strings and normalizing text. It's particularly useful for:
- Creating SEO-friendly URLs
- Converting titles to URL paths
- Handling multilingual URLs
# First, install the library in IEx
iex> Mix.install([{:slugify, "~> 1.3"}])
# Basic slugification
iex> Slug.slugify("Hello, World")
"hello-world"
# Custom separator
iex> Slug.slugify("Hello World", separator: "_")
"hello_world"
# Non-Latin input is transliterated (the exact output depends on the library's transliteration tables)
iex> Slug.slugify("こんにちは世界", separator: "-")
"kon-ni-chi-ha-shi-jie"
Conclusion
Throughout this guide to working with Elixir strings, we've covered essential concepts:
- Understanding Elixir's binary strings and their UTF-8 nature
- Leveraging pattern matching for efficient string operations
- Working effectively with Unicode and international text
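Remember that string operations can significantly impact your application's performance, especially at scale. For example, prefer IO lists over repeated <> concatenation when assembling large outputs, and reach for byte_size/1 (constant time) rather than String.length/1 (which traverses the whole string) when a byte count is all you need.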
Further Reading
- Elixir Documentation - Strings
- Unicode and UTF-8 in Elixir
- IO Documentation - IO Lists and Chardata
- Getting Started - Binaries, Strings and Charlists
- Regex Documentation
- EEx Documentation
The more you work with Elixir strings, the more you'll appreciate the elegance and power of its string handling capabilities. The key is to understand the underlying principles and choose the right tools for your specific use case.
Next Steps
In the upcoming article, we'll explore:
- Understanding Atoms in Elixir