João Paulo Abreu

Learning Elixir: Working with Strings

Strings are fundamental to any programming language, and Elixir's implementation offers unique features and capabilities that make string manipulation both powerful and efficient. From UTF-8 encoding to advanced interpolation techniques, understanding how Elixir handles strings is crucial for building robust applications.

Binary strings (double quotes) are the default choice in modern Elixir applications, whether for text processing, user input, or web applications. Character lists (single quotes, printed by recent Elixir versions with the ~c sigil) are mainly needed when calling Erlang functions that expect a list of codepoints instead of a binary.

Note: The examples in this article use Elixir 1.17.3. Most string operations behave the same across versions, but some output may differ. Certain features, such as Mix.install/2, require Elixir 1.12 or later.

Introduction

In Elixir, strings are not just sequences of characters: they are binaries (sequences of bytes) encoded in UTF-8. This design makes byte-level operations cheap while keeping full Unicode support, and understanding it is essential for writing efficient and maintainable string code.
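
To make the "binaries encoded in UTF-8" point concrete, here is a minimal sketch (the sample values and variable names are illustrative only). It checks that a string literal is the same term as its byte sequence, and that not every binary is a valid string:

```elixir
# A string literal and the equivalent byte sequence are the same term.
ascii_match = "hi" == <<104, 105>>
# => true

# "é" (U+00E9) is one character encoded as two bytes in UTF-8.
utf8_match = "\u00E9" == <<195, 169>>
# => true

# Every string is a binary, but not every binary is valid UTF-8.
valid = String.valid?("h\u00E9llo")
# => true
invalid = String.valid?(<<255, 254>>)
# => false

IO.inspect({ascii_match, utf8_match, valid, invalid})
```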

String Types and Binary Representation

Binary Strings vs Character Lists

One common source of confusion for newcomers to Elixir is the distinction between binary strings (double quotes) and character lists (single quotes). Both can represent text, but they are different data types with different use cases and performance characteristics:

# Binary strings (double quotes)
iex> string = "Hello, World!"
"Hello, World!"
iex> is_binary(string)
true
iex> byte_size(string)
13

# Character lists (single quotes)
# Since Elixir 1.15, IEx prints them using the ~c sigil
iex> charlist = 'Hello, World!'
~c"Hello, World!"
iex> is_list(charlist)
true
iex> length(charlist)
13
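
As a rough illustration of when charlists still matter, the sketch below converts between the two representations and calls an Erlang function that expects a charlist. The choice of :io_lib.format/2 and the sample values are only examples:

```elixir
# Convert between the two representations.
charlist = String.to_charlist("hello")
# => ~c"hello"
string = List.to_string(charlist)
# => "hello"

# A charlist is just a list of integer codepoints.
same_list = charlist == [104, 101, 108, 108, 111]
# => true

# Erlang's :io_lib.format/2 takes a charlist format and returns chardata,
# so the result is converted back to a binary before further use.
formatted =
  ~c"~.2f"
  |> :io_lib.format([3.14159])
  |> IO.chardata_to_string()
# => "3.14"

IO.inspect({charlist, string, same_list, formatted})
```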

Understanding Binary Representation

In Elixir, strings are stored as UTF-8 encoded binaries, which means they can be viewed and manipulated at the byte level. Each ASCII character occupies exactly one byte, while every other character takes two to four bytes. Pattern matching and binary operations let us examine and process strings efficiently, as long as we keep the difference between byte size and character count in mind:

# Viewing raw binary representation of ASCII string
iex> inspect("hello", binaries: :as_binaries)
"<<104, 101, 108, 108, 111>>"

# Collecting each byte with a comprehension (equals the charlist only for ASCII text)
iex> for <<byte <- "hello">>, do: byte
~c"hello"

# Demonstrating that each ASCII character takes 1 byte
iex> byte_size("hello")
5

# Converting a binary to a list of bytes (again equal to the charlist for ASCII text)
iex> "hello" |> :binary.bin_to_list()
~c"hello"

# UTF-8 characters (variable bytes)
iex> string = "héllo"
"héllo"
iex> byte_size(string)
6  # 'é' takes 2 bytes
iex> String.length(string)
5  # But it's still 5 characters

# Viewing raw binary representation
iex> "héllo" <> <<0>>
<<104, 195, 169, 108, 108, 111, 0>>

# Pattern matching on the first codepoint
iex> <<head::utf8, rest::binary>> = "héllo"
"héllo"
iex> head  # Unicode codepoint for 'h'
104
iex> rest
"éllo"
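
The gap between bytes and characters matters most when slicing. The following sketch (sample word and variable names are illustrative) contrasts a byte-based slice with a grapheme-based one:

```elixir
# "héllo", written with an escape to make the codepoint explicit.
word = "h\u00E9llo"

# Slicing by bytes can cut a multi-byte character in half...
broken = binary_part(word, 0, 2)
# => <<104, 195>> (not valid UTF-8)

# ...while String.slice/3 counts graphemes and keeps characters intact.
safe = String.slice(word, 0, 2)
# => "hé"

IO.inspect({broken, String.valid?(broken), safe})
```

Byte-based functions such as binary_part/3 are fast and fine for ASCII-only protocols; for user-facing text the String functions are the safer default.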

String Creation and Basic Operations

String Literals and Special Characters

# Basic string literal
iex> "Hello, World!"
"Hello, World!"

# Escape sequences
iex> "Line 1\nLine 2\tTabbed"
"Line 1\nLine 2\tTabbed"

# Unicode escape sequences
iex> "\u0061"  # Latin small letter 'a'
"a"
iex> "\u{1F600}"  # Unicode emoji (😀)
"😀"

# Raw strings (no escape processing)
iex> ~S(Hello\nWorld)
"Hello\\nWorld"
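
Alongside ~S, the lowercase ~s sigil is handy when a string contains double quotes. This small sketch (variable names are illustrative) compares the two:

```elixir
name = "World"

# ~s behaves like a double-quoted string (escapes and interpolation work),
# but embedded double quotes need no backslashes.
quoted = ~s(He said "Hello, #{name}!"\n)
# => "He said \"Hello, World!\"\n"

# ~S disables both escapes and interpolation.
raw = ~S(He said "Hello, #{name}!"\n)
# => "He said \"Hello, \#{name}!\"\\n"

IO.inspect({quoted, raw})
```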

String Concatenation

# Using the <> operator
iex> "Hello" <> " " <> "World"
"Hello World"

# Memory-efficient concatenation of multiple strings using IO lists
iex> IO.iodata_to_binary(["Hello", " ", "World"])
"Hello World"

# Using Enum.join
iex> ["Hello", "World"] |> Enum.join(" ")
"Hello World"

# Building a string efficiently from a list of parts
iex> words = ["the", "quick", "brown", "fox"]
["the", "quick", "brown", "fox"]
# intersperse - inserts a separator between each list element (useful for joining words)
iex> words |> Enum.intersperse(" ") |> IO.iodata_to_binary()
"the quick brown fox"

# Multi-line format for readability:
# words
# |> Enum.intersperse(" ")
# |> IO.iodata_to_binary()
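
To show why IO lists help, here is a minimal sketch that assembles CSV-like output as nested lists and flattens it only once at the end. CsvRow is a hypothetical module written for this example; it does no quoting or escaping:

```elixir
defmodule CsvRow do
  # Hypothetical helper for illustration: builds one line as iodata
  # (a nested list of binaries) without creating intermediate strings.
  def to_iodata(fields) do
    [Enum.intersperse(fields, ","), "\n"]
  end
end

rows = [["id", "name"], ["1", "Ada"], ["2", "Grace"]]

# The nested list could be passed directly to IO.write/1 or File.write/2;
# it is flattened here only to show the resulting binary.
csv = rows |> Enum.map(&CsvRow.to_iodata/1) |> IO.iodata_to_binary()
# => "id,name\n1,Ada\n2,Grace\n"

IO.write(csv)
```

Because functions like IO.write/1 accept iodata as-is, the final IO.iodata_to_binary/1 call can often be skipped entirely.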

String Interpolation Deep Dive

String interpolation in Elixir is powerful and flexible, offering various ways to embed expressions within strings.

Basic Interpolation

# Simple variable interpolation
iex> name = "World"
iex> "Hello #{name}!"
"Hello World!"

# Expression interpolation
iex> "2 + 2 = #{2 + 2}"
"2 + 2 = 4"

# Function call interpolation
iex> "Uppercase: #{String.upcase("hello")}"
"Uppercase: HELLO"

Advanced Interpolation Techniques

# Interpolating a value bound by pattern matching
iex> {:ok, value} = {:ok, 42}
iex> "The value is #{value}"
"The value is 42"

# Calling anonymous functions
iex> formatter = fn x -> String.pad_leading(Integer.to_string(x), 3, "0") end

iex> "Number: #{formatter.(42)}"
"Number: 042"

# Conditional interpolation
iex> show_details = true
iex> "Status: #{if show_details, do: "Active (since 2024)", else: "Active"}"
"Status: Active (since 2024)"

# Map access in interpolation
iex> user = %{name: "John", age: 30}
%{name: "John", age: 30}

iex> "#{user.name} is #{user.age} years old"
"John is 30 years old"
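
It helps to know what interpolation does with non-string values. The sketch below (values chosen only for illustration) shows how a few built-in types are converted, and how inspect/1 covers types that cannot be interpolated directly:

```elixir
# Interpolation calls to_string/1 (the String.Chars protocol) on each value.
atom_text = "#{:ok}"
# => "ok"
nil_text = "[#{nil}]"
# => "[]" because nil becomes an empty string
float_text = "#{1.5}"
# => "1.5"

# Maps and tuples do not implement String.Chars, so interpolating them
# raises Protocol.UndefinedError; use inspect/1 for such values instead.
map_text = "user: #{inspect(%{name: "John"})}"
# => "user: %{name: \"John\"}"

IO.inspect({atom_text, nil_text, float_text, map_text})
```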

Interpolation with Custom Formatting

# Number formatting
iex> number = 1234.5678
iex> "#{:erlang.float_to_binary(number, decimals: 2)}"
"1234.57"

# Date formatting
iex> date = ~D[2024-03-21]
iex> "#{Calendar.strftime(date, "%B %d, %Y")}"
"March 21, 2024"

# Custom padding
iex> value = 42
iex> "#{String.pad_leading(Integer.to_string(value), 5, "0")}"
"00042"

Multiple Interpolations and Performance

# Multiple interpolations
iex> first = "Hello"
iex> last = "World"
iex> "#{first}, #{String.upcase(last)}!"
"Hello, WORLD!"
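
Custom data types can take part in interpolation by implementing the String.Chars protocol. The sketch below uses a hypothetical Money struct; the struct, its fields, and the output format are assumptions made for this example:

```elixir
defmodule Money do
  # Hypothetical struct used only for illustration.
  defstruct [:amount, :currency]
end

defimpl String.Chars, for: Money do
  # Called by to_string/1, and therefore by "#{...}" interpolation.
  def to_string(%Money{amount: amount, currency: currency}) do
    "#{:erlang.float_to_binary(amount, decimals: 2)} #{currency}"
  end
end

# struct!/2 builds the struct at runtime, which keeps this sketch
# runnable as a single script.
price = struct!(Money, amount: 12.5, currency: "EUR")

label = "Total: #{price}"
# => "Total: 12.50 EUR"

IO.puts(label)
```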

String Module Functions

Case Transformations

# Basic case transformations
iex> String.upcase("hello")
"HELLO"

iex> String.downcase("WORLD")
"world"

iex> String.capitalize("hello world")
"Hello world"
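
Case functions follow Unicode rules by default, which can be surprising. A short sketch with illustrative inputs:

```elixir
# Unicode case mapping can change the length of a string:
# the German sharp s (U+00DF) upcases to "SS".
upper = String.upcase("stra\u00DFe")
# => "STRASSE"

# The :ascii mode only converts a-z and leaves other characters untouched.
ascii_upper = String.upcase("h\u00E9llo", :ascii)
# => "HéLLO"

# A simple case-insensitive comparison: downcase both sides first.
same = String.downcase("ELIXIR") == String.downcase("Elixir")
# => true

IO.inspect({upper, ascii_upper, same})
```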

String Manipulation

# Trimming
iex> String.trim("  hello  ")
"hello"
iex> String.trim_leading("  hello")
"hello"
iex> String.trim_trailing("hello  ")
"hello"

# Padding
iex> String.pad_leading("123", 5, "0")
"00123"
iex> String.pad_trailing("hello", 10, ".")
"hello....."

# Splitting
iex> String.split("hello,world", ",")
["hello", "world"]
# parts: 2 stops splitting after the first separator
iex> String.split("hello big world", " ", parts: 2)
["hello", "big world"]

# Replacing
iex> String.replace("hello world", "world", "elixir")
"hello elixir"
iex> String.replace_leading("hello hello", "he", "je")
"jello hello"
iex> String.replace_trailing("hello hello", "lo", "p")
"hello help"
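
These functions compose well. As a minimal sketch, the hypothetical ConfigLine module below (written for this example, not part of any library) combines String.split/3 and String.trim/1 to parse "key = value" lines:

```elixir
defmodule ConfigLine do
  # Hypothetical parser for "key = value" lines, for illustration only.
  def parse(line) do
    # parts: 2 stops after the first "=", so the value may contain "=" itself.
    case String.split(line, "=", parts: 2) do
      [key, value] -> {:ok, {String.trim(key), String.trim(value)}}
      _ -> :error
    end
  end
end

IO.inspect(ConfigLine.parse("url = https://example.com/?a=1"))
# => {:ok, {"url", "https://example.com/?a=1"}}

IO.inspect(ConfigLine.parse("no separator here"))
# => :error
```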

String Analysis

# Length and size
iex> String.length("hello")
5
iex> byte_size("hello")
5
iex> String.length("héllo")  # UTF-8 aware
5
iex> byte_size("héllo")      # Actual bytes
6

# Content checks
iex> String.contains?("hello world", "world")
true
iex> String.starts_with?("hello", "he")
true
iex> String.ends_with?("hello", "lo")
true

# String comparison
# Jaro Distance - calculates string similarity (0.0 to 1.0, where 1.0 means exact match)
iex> String.jaro_distance("hello", "hello")
1.0
# Myers Difference - shows differences between strings by identifying equal (eq), deleted (del) and inserted (ins) parts
iex> String.myers_difference("hello", "hallo")
[eq: "h", del: "e", ins: "a", eq: "llo"]
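
One practical use of String.jaro_distance/2 is a "did you mean?" suggestion. The Suggest module below is a hypothetical helper sketched for this purpose; it simply picks the candidate with the highest similarity:

```elixir
defmodule Suggest do
  # Hypothetical helper: returns the candidate most similar to the input
  # according to String.jaro_distance/2.
  def closest(input, candidates) do
    Enum.max_by(candidates, &String.jaro_distance(input, &1))
  end
end

IO.inspect(Suggest.closest("stirng", ["string", "strong", "integer"]))
# => "string"
```

A real implementation would probably also apply a minimum similarity threshold so that unrelated input produces no suggestion.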

Pattern Matching with Strings

Pattern matching is one of Elixir's most powerful features, and it works exceptionally well with strings:

# Basic string pattern matching
iex> "Hello " <> rest = "Hello World"
iex> rest
"World"

# Multiple captures: a fixed-size prefix, a literal separator, and the rest
iex> <<head::binary-size(5)>> <> " " <> tail = "Hello World"
iex> head
"Hello"
iex> tail
"World"

# Binary pattern matching
iex> <<x, y, z>> = "abc"
iex> {x, y, z}
{97, 98, 99}

# UTF-8 aware pattern matching
iex> <<char::utf8, rest::binary>> = "über"
iex> char
252
iex> rest
"ber"
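
Prefix matching is especially useful in function heads. The sketch below defines a hypothetical Command module (the command names are invented for this example) that dispatches on the start of a string:

```elixir
defmodule Command do
  # Hypothetical command parser: each clause matches a literal prefix
  # and binds the remainder of the string.
  def parse("GET " <> key), do: {:get, key}
  def parse("DEL " <> key), do: {:delete, key}

  def parse("SET " <> rest) do
    case String.split(rest, " ", parts: 2) do
      [key, value] -> {:set, key, value}
      _ -> {:error, :missing_value}
    end
  end

  def parse(_other), do: {:error, :unknown_command}
end

IO.inspect(Command.parse("GET user:1"))
# => {:get, "user:1"}
IO.inspect(Command.parse("SET greeting hi all"))
# => {:set, "greeting", "hi all"}
IO.inspect(Command.parse("PING"))
# => {:error, :unknown_command}
```

Note that the left side of <> in a pattern must be a literal (or a binary of known size), because the match needs to know where the prefix ends.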

Working with UTF-8

Elixir has excellent support for Unicode and UTF-8 encoding:

# Unicode escape sequences
iex> "\u0061" # Latin small letter a
"a"
iex> "\u0308" # Combining diaeresis
"̈"

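# Graphemes vs Codepoints
# Graphemes are user-perceived characters; codepoints are individual Unicode code points.
# With a precomposed "é" both functions return the same list:
iex> String.graphemes("café")
["c", "a", "f", "é"]
iex> String.codepoints("café")
["c", "a", "f", "é"]
# They differ once a combining mark is involved ("e" followed by U+0301):
iex> String.graphemes("cafe\u0301")
["c", "a", "f", "é"]
iex> String.codepoints("cafe\u0301")
["c", "a", "f", "e", "́"]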

# Unicode operations
# next_grapheme - returns the next visible character and the rest of the string
iex> String.next_grapheme("é")
{"é", ""}
iex> String.first("élixir")
"é"
iex> String.last("café")
"é"

# Unicode normalization
iex> string = "e\u0301" # "e" followed by a combining acute accent
# Unicode normalization converts equivalent sequences into a standard form:
# :nfd decomposes (base letter + combining mark), :nfc composes (single codepoint)
iex> String.normalize(string, :nfd)
"é"  # decomposed: 2 codepoints, 3 bytes
iex> String.normalize(string, :nfc)
"é"  # composed: 1 codepoint, 2 bytes
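
Normalization matters as soon as strings are compared. The following sketch (variable names are illustrative) shows two renderings of "é" that look identical but are not equal until they are normalized:

```elixir
composed = "\u00E9"      # "é" as a single codepoint
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two strings render identically but differ byte for byte.
equal = composed == decomposed
# => false
sizes = {byte_size(composed), byte_size(decomposed)}
# => {2, 3}

# Both count as a single grapheme.
lengths = {String.length(composed), String.length(decomposed)}
# => {1, 1}

# String.equivalent?/2 compares the canonically normalized forms.
equivalent = String.equivalent?(composed, decomposed)
# => true

# Normalizing to one form makes ordinary == comparisons work.
normalized_equal = String.normalize(decomposed, :nfc) == composed
# => true

IO.inspect({equal, sizes, lengths, equivalent, normalized_equal})
```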

Regular Expressions

Elixir provides robust support for regular expressions through both the Regex module and the ~r sigil:

# Basic regex creation and matching
iex> regex = ~r/hello/i
iex> String.match?("Hello World", regex)
true

# Regex with named captures
iex> regex = ~r/(?<greeting>hello) (?<name>\w+)/i
iex> Regex.named_captures(regex, "Hello World")
%{"greeting" => "Hello", "name" => "World"}

# Complex pattern matching
iex> text = "Contact us at info@example.com or support@example.com"
iex> email_regex = ~r/[\w.+-]+@[a-z\d-]+(?:\.[a-z\d-]+)*/i
iex> Regex.scan(email_regex, text)
[["info@example.com"], ["support@example.com"]]

# String replacement with regex
iex> Regex.replace(~r/\d+/, "Date: 2024-03-21", "[REDACTED]")
"Date: [REDACTED]-[REDACTED]-[REDACTED]"

# Regex options
iex> regex = ~r/hello/iu  # case-insensitive and Unicode
iex> String.match?("HELLO", regex)
true
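
Capture groups become more useful when combined with replacements. The sketch below (the date format and sample strings are chosen only for illustration) shows Regex.run/2, backreferences, a function replacement, and splitting on a regex:

```elixir
date_regex = ~r/(\d{4})-(\d{2})-(\d{2})/

# Regex.run/2 returns the full match followed by each capture group.
parts = Regex.run(date_regex, "Date: 2024-03-21")
# => ["2024-03-21", "2024", "03", "21"]

# Backreferences (\\1, \\2, ...) reuse captured groups in the replacement.
reordered = Regex.replace(date_regex, "Date: 2024-03-21", "\\3/\\2/\\1")
# => "Date: 21/03/2024"

# A one-argument function replacement receives the whole match.
doubled =
  Regex.replace(~r/\d+/, "3 apples and 12 pears", fn digits ->
    Integer.to_string(String.to_integer(digits) * 2)
  end)
# => "6 apples and 24 pears"

# String.split/2 also accepts a regex as the separator.
fields = String.split("a, b;c", ~r/[,;]\s*/)
# => ["a", "b", "c"]

IO.inspect({parts, reordered, doubled, fields})
```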

Heredocs and Documentation

Heredocs are particularly useful for multiline strings and documentation. Here are some examples:

# Basic heredoc example
message = """
This is a multiline
string using heredoc syntax.
Indentation relative to the closing delimiter is preserved.
"""

# Module documentation example
defmodule StringUtils do
  @moduledoc """
  Provides utility functions for string manipulation.

  ## Examples

      iex> StringUtils.titleize("hello world")
      "Hello World"
  """

  @doc """
  Converts a string to title case.

  ## Parameters
    - input: The string to convert

  ## Examples

      iex> StringUtils.titleize("hello world")
      "Hello World"
  """
  def titleize(input) when is_binary(input) do
    input
    |> String.split()
    |> Enum.map(&String.capitalize/1)
    |> Enum.join(" ")
  end
end
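
How much indentation a heredoc keeps depends on the position of the closing delimiter. A minimal sketch, using a hypothetical HeredocDemo module:

```elixir
defmodule HeredocDemo do
  def text do
    # The closing """ is indented by four spaces, so four spaces are
    # stripped from every line; only the extra indentation before
    # "second" survives.
    """
    first
      second
    """
  end
end

IO.inspect(HeredocDemo.text())
# => "first\n  second\n"
```

This is why heredocs inside indented modules and functions do not accumulate leading whitespace.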

Extending String Capabilities

While Elixir's built-in string functionality is comprehensive, additional libraries can extend these capabilities for specialized tasks.

URL Slugs with slugify

The slugify library generates URL-friendly strings and normalizes text. It's particularly useful for:

  • Creating SEO-friendly URLs
  • Converting titles to URL paths
  • Handling multilingual URLs

# First, install the library in IEx
iex> Mix.install([{:slugify, "~> 1.3"}])

# Basic slugification
iex> Slug.slugify("Hello, World")
"hello-world"

# Custom separator
iex> Slug.slugify("Hello World", separator: "_")
"hello_world"

# Transliteration of non-Latin text
iex> Slug.slugify("こんにちは世界", separator: "-")
"kon-ni-chi-ha-shi-jie"
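
For simple cases, a rough approximation can be built from the String functions covered earlier. The SimpleSlug module below is a hypothetical, dependency-free sketch: it is not how the slugify library works internally, and it only handles text that reduces to ASCII letters and digits after accents are stripped (no transliteration of other scripts):

```elixir
defmodule SimpleSlug do
  # Hypothetical minimal slug helper, for illustration only.
  def slugify(text) do
    text
    # Decompose accented letters into base letter + combining mark...
    |> String.normalize(:nfd)
    # ...then drop the combining marks (Unicode category Mn).
    |> String.replace(~r/\p{Mn}/u, "")
    |> String.downcase()
    # Collapse every run of other characters into a single hyphen.
    |> String.replace(~r/[^a-z0-9]+/, "-")
    |> String.trim("-")
  end
end

IO.inspect(SimpleSlug.slugify("H\u00E9llo, W\u00F6rld!"))
# => "hello-world"
```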

Conclusion

Throughout this guide to working with Elixir strings, we've covered essential concepts:

  • Understanding Elixir's binary strings and their UTF-8 nature
  • Leveraging pattern matching for efficient string operations
  • Working effectively with Unicode and international text

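Remember that string operations can significantly impact your application's performance, especially at scale: prefer IO lists over repeated <> concatenation when assembling large outputs, and keep in mind that String.length/1 has to walk the whole string while byte_size/1 runs in constant time.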

The more you work with Elixir strings, the more you'll appreciate the elegance and power of its string handling capabilities. The key is to understand the underlying principles and choose the right tools for your specific use case.

Next Steps

In the upcoming article, we'll explore:

  • Understanding Atoms in Elixir
