
Moe Katib


A Deep Dive Into Strings in Rust

In many programming languages, manipulating strings is a crucial aspect of writing applications. The Rust programming language, known for its performance and safety, is no different. This article provides an in-depth exploration of strings in Rust, including the special notations and "tricks" that could simplify your coding experience.

Understanding Basic Strings in Rust

At its most basic level, a string in Rust is a sequence of Unicode scalar values encoded as UTF-8 bytes. String literals are written between double quotes.

let s = "Hello, World!";

In this snippet, s is a string slice (&str) that holds the text "Hello, World!".
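
Because the text is stored as UTF-8, the length of a string in bytes can differ from its number of characters. The following is a minimal sketch of that difference; the helper name byte_and_char_counts is illustrative and not part of the original article.

```rust
/// Hypothetical helper: returns (UTF-8 byte length, number of Unicode scalar values).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // ASCII text uses exactly one byte per character.
    assert_eq!(byte_and_char_counts("Hello, World!"), (13, 13));

    // U+00E9 (e with acute accent) is one character but two UTF-8 bytes.
    let e_acute = "\u{E9}";
    assert_eq!(byte_and_char_counts(e_acute), (2, 1));
    assert_eq!(e_acute.as_bytes(), &[0xC3u8, 0xA9][..]);

    println!("{:?}", byte_and_char_counts("Hello, World!"));
}
```

Note that len() always counts bytes, so it should not be used to count characters in non-ASCII text.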

String Literals and String Slices

In Rust, a string literal is a string slice (&str) that points into a read-only section of the compiled binary, so it is immutable and stays valid for the entire run of the program. This is why the type of a string literal is &'static str, and why literals are sometimes called 'static strings'.

let s: &'static str = "Hello, World!";

Here, s is a string slice pointing to the string literal "Hello, World!".
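
One practical consequence of the 'static lifetime is that a function can return a string literal by reference without owning any data. A small sketch, assuming a made-up function name greeting:

```rust
/// Hypothetical function: the literal lives in the compiled binary,
/// so the returned reference is valid for the whole program.
fn greeting() -> &'static str {
    "Hello, World!"
}

fn main() {
    let s: &'static str = greeting();

    // Slicing a literal yields another &str that borrows the same data.
    let hello: &str = &s[..5];
    assert_eq!(hello, "Hello");

    println!("{s} / {hello}");
}
```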

Raw Strings

In Rust, an r prefix before a string literal denotes a raw string. Raw strings do not process escape sequences: a backslash is just a backslash, and the contents are taken exactly as written. This is helpful when you want to avoid escaping backslashes, for example in regular expressions or Windows file paths.

let s = r"C:\Users\YourUser\Documents";
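
To make the difference concrete, here is a short sketch comparing the raw form with the equivalent regular literal. The helper names raw_path and escaped_path are illustrative only.

```rust
/// Hypothetical helper: a Windows-style path written as a raw string.
fn raw_path() -> &'static str {
    r"C:\Users\YourUser\Documents"
}

/// The same path as a regular literal, with every backslash escaped.
fn escaped_path() -> &'static str {
    "C:\\Users\\YourUser\\Documents"
}

fn main() {
    // Both literals produce exactly the same string.
    assert_eq!(raw_path(), escaped_path());

    // In a raw string, \n is two characters: a backslash and the letter n.
    assert_eq!(r"\n".len(), 2);
    // In a regular string, \n is a single newline character.
    assert_eq!("\n".len(), 1);

    println!("{}", raw_path());
}
```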

Byte Strings

Rust also has the concept of byte strings. They look like text strings, but the literal produces a reference to a fixed-size array of bytes (&[u8; N]) instead of a &str. A byte string may contain only ASCII characters and escape sequences. You can create one by prefixing a string literal with a b.

let bs: &[u8; 4] = b"test"; // bs is a byte array: [116, 101, 115, 116]
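
Since a byte string is just bytes, it does not have to be valid UTF-8. The sketch below uses the standard std::str::from_utf8 function to check this; the helper is_valid_utf8 is a made-up name for illustration.

```rust
/// Hypothetical helper: reports whether `bytes` is well-formed UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // A byte string literal is a reference to a fixed-size byte array.
    let bs: &[u8; 4] = b"test";
    assert_eq!(bs, &[116, 101, 115, 116]);

    // ASCII bytes are valid UTF-8 and convert back to &str.
    assert!(is_valid_utf8(bs));
    assert_eq!(std::str::from_utf8(bs), Ok("test"));

    // \xNN escapes can express any byte value; 0xFF never occurs in UTF-8.
    assert!(!is_valid_utf8(b"\xFF\xFE"));
}
```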

Raw Byte Strings

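A raw byte string combines raw strings and byte strings: it produces bytes, and it does not process escape sequences. This is useful for ASCII byte sequences that contain many backslashes. Because escapes such as \xFF are not interpreted, bytes that are not valid UTF-8 require a regular byte string (b"\xFF") instead. A raw byte string is created by prefixing a string literal with br.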

let raw_bs = br"\xFF"; // raw_bs is a byte array: [92, 120, 70, 70]
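
The following sketch puts the raw and non-raw byte literals side by side; the function names are illustrative and not from the article.

```rust
/// Hypothetical helper: raw byte string, backslash kept literally.
fn raw_bytes() -> &'static [u8] {
    br"\xFF"
}

/// Hypothetical helper: regular byte string, \xFF is one byte.
fn escaped_bytes() -> &'static [u8] {
    b"\xFF"
}

fn main() {
    // Four bytes: '\', 'x', 'F', 'F'.
    assert_eq!(raw_bytes(), &[92u8, 120, 70, 70][..]);

    // One byte with the value 255.
    assert_eq!(escaped_bytes(), &[255u8][..]);

    println!("{:?} vs {:?}", raw_bytes(), escaped_bytes());
}
```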

Escaping in Raw Strings

A raw string cannot escape a quotation mark with a backslash. To include quotation marks, place the same number of # symbols before the opening quote and after the closing quote.

let s = r#"This string contains "quotes"."#;
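
If the text itself contains a quote followed by a #, one more # is needed in the delimiters. A brief sketch with made-up helper names:

```rust
/// Hypothetical helper: one # is enough when the text only contains quotes.
fn one_hash() -> &'static str {
    r#"This string contains "quotes"."#
}

/// Hypothetical helper: the text contains "# so two # are needed.
fn two_hashes() -> &'static str {
    r##"A "# sequence needs two hashes."##
}

fn main() {
    // Each raw string equals its escaped regular counterpart.
    assert_eq!(one_hash(), "This string contains \"quotes\".");
    assert_eq!(two_hashes(), "A \"# sequence needs two hashes.");

    println!("{}", one_hash());
    println!("{}", two_hashes());
}
```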

Multiline Raw Strings

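Raw strings can span multiple lines. The content starts immediately after the opening quote and ends at the closing quote that is followed by the matching number of # symbols, so every line break in between, including one right after the opening quote, is part of the string.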

let s = r####"
This string contains "quotes".
It also spans multiple lines.
"####;

Keep in mind that the number of hash symbols (#) before the opening quote and after the closing quote must be the same, and at least one is required whenever the string itself contains a double quote. Furthermore, a raw string can contain literal tabs and other whitespace without escape sequences. (Thanks to Nirmalya Sengupta for his kind suggestion in the comments below.)
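
The sketch below checks what a multiline raw string actually contains, in particular the leading and trailing line breaks. The function name multiline is an assumption made for this example.

```rust
/// Hypothetical helper: a multiline raw string delimited with one #.
fn multiline() -> &'static str {
    r#"
This string contains "quotes".
It also spans multiple lines.
"#
}

fn main() {
    let s = multiline();

    // The line breaks after the opening quote and before the closing
    // quote are part of the string.
    assert!(s.starts_with('\n'));
    assert!(s.ends_with('\n'));

    // trim() removes them, leaving the two lines of text.
    assert_eq!(s.trim().lines().count(), 2);

    println!("{}", s.trim());
}
```

Calling trim() is one simple way to drop the surrounding line breaks when they are not wanted.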

Unicode Strings

String literals in Rust can also contain any valid Unicode characters.

let s = "Hello, 世界!";
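
Non-ASCII characters occupy more than one byte, which matters when slicing by byte index. Here is a minimal sketch using the standard str::get and str::is_char_boundary methods; the wrapper safe_slice is a hypothetical name.

```rust
/// Hypothetical helper: slices by byte range, returning None instead of
/// panicking when the range does not fall on character boundaries.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = "Hello, 世界!";

    // 10 characters, but 14 bytes: each CJK character takes 3 bytes.
    assert_eq!(s.chars().count(), 10);
    assert_eq!(s.len(), 14);

    // Bytes 7..10 hold the first CJK character.
    assert_eq!(safe_slice(s, 7, 10), Some("世"));

    // Byte 8 is in the middle of that character, so this range is rejected.
    assert!(!s.is_char_boundary(8));
    assert_eq!(safe_slice(s, 7, 8), None);
}
```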

Character Escapes

Regular (non-raw) string literals support several escape sequences:

  • \\ Backslash

  • \" Double quote

  • \n Newline

  • \r Carriage return

  • \t Tab

  • \0 Null

There are also Unicode escapes:

  • \u{7FFF} Unicode character (variable length, up to 6 digits)

  • \u{1F600} Unicode emoji
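
The escapes listed above can be checked by looking at the code points they produce. This is a small sketch; the helper code_points is illustrative and not part of the article.

```rust
/// Hypothetical helper: returns the Unicode code point of every character in `s`.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // Each simple escape produces exactly one character.
    assert_eq!(
        code_points("\\\"\n\r\t\0"),
        vec![0x5C, 0x22, 0x0A, 0x0D, 0x09, 0x00]
    );

    // Unicode escapes name a code point directly.
    assert_eq!(code_points("\u{7FFF}"), vec![0x7FFF]);
    assert_eq!(code_points("\u{1F600}"), vec![0x1F600]);

    // The UTF-8 length depends on the code point: 3 bytes and 4 bytes here.
    assert_eq!("\u{7FFF}".len(), 3);
    assert_eq!("\u{1F600}".len(), 4);

    println!("{}", "\u{1F600}");
}
```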

Conclusion

In summary, Rust provides powerful and flexible tools for working with strings, from raw and byte strings to Unicode and escape sequences.

Top comments (2)

Nirmalya Sengupta

Very useful!

It may be beneficial to your readers if you mention, that in multiline raw strings, the number of '#' is important because:

  • the number of them at the beginning and at the end must be the same; at least 1.
  • the raw string may include other formatted bodies like 'tab' etc.

Playground

Just a suggestion.

Moe Katib

Great point. Updated! Thanks a lot.