DEV Community

Konstantin Grechishchev


6 things you can do with the Cow 🐄 in Rust 🦀

The Cow type is a mystery even for some intermediate-level Rust developers. It is defined as a simple two-variant enum:

pub enum Cow<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}

Despite this simplicity, it challenges developers to understand ownership and lifetimes, as well as the equally mysterious Borrow and ToOwned traits. As a result, programmers avoid using Cow, which often leads to extra memory allocations (which are not cheap) and less efficient software.

What are the situations when you might consider using Cow? Why does it have such a strange name? Let's try to find some answers today!

A function rarely modifying the data

Let's start with the most common and straightforward use case for the Cow type. It is a good illustration of the situation in which most developers (including me!) encounter Cow for the first time.

Consider the following function accepting and modifying the borrowed data (in this case &str):

fn remove_whitespaces(s: &str) -> String {
    s.to_string().replace(' ', "")
}

fn main() {
    let value = remove_whitespaces("Hello world!");
    println!("{}", value);
}

As you can see, it does nothing but remove all spaces from the string. What is wrong with it? What if in 99.9% of calls the string contains no spaces at all? Or consider a slight variation of the function in which spaces are removed only when some other condition holds.

In such cases, we could avoid the to_string() call and the creation of an unnecessary copy of the string. However, if we are to implement such logic, we can use neither String nor &str as the return type: the former forces a memory allocation and the latter is immutable.

This is the moment when Cow plays its role. We can return Cow::Owned when the string is modified and Cow::Borrowed(s) otherwise:

use std::borrow::Cow;

fn remove_whitespaces(s: &str) -> Cow<str> {
    if s.contains(' ') {
        Cow::Owned(s.to_string().replace(' ', ""))
    } else {
        Cow::Borrowed(s)
    }
}

fn main() {
    let value = remove_whitespaces("Hello world!");
    println!("{}", value);
}

The nice thing about Cow<str> is that it can always be dereferenced to &str later or converted into a String by calling into_owned. into_owned allocates memory only if the string was borrowed; an owned String is simply moved out.
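
As a quick check of the behavior described above, here is a minimal sketch that repeats remove_whitespaces and inspects which variant comes back. The free function is_borrowed is an illustrative helper introduced here, not something the original code defines.

```rust
use std::borrow::Cow;

fn remove_whitespaces(s: &str) -> Cow<'_, str> {
    if s.contains(' ') {
        Cow::Owned(s.replace(' ', ""))
    } else {
        Cow::Borrowed(s)
    }
}

// Illustrative helper: reports whether the Cow still borrows its input.
fn is_borrowed(value: &Cow<'_, str>) -> bool {
    matches!(value, Cow::Borrowed(_))
}

fn main() {
    let untouched = remove_whitespaces("Helloworld!");
    let cleaned = remove_whitespaces("Hello world!");

    // No spaces: the input is handed back without allocating.
    assert!(is_borrowed(&untouched));
    // Spaces present: a new String was built.
    assert!(!is_borrowed(&cleaned));

    // Both variants dereference to &str, so str methods work directly.
    assert_eq!(untouched.len() + cleaned.len(), 22);

    // into_owned moves the String out of an Owned value and clones a Borrowed one.
    let owned: String = untouched.into_owned();
    assert_eq!(owned, "Helloworld!");
}
```

The caller never has to match on the variant in normal use; the check is only there to make the allocation behavior visible.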

A struct optionally owning the data

We often need to store references inside structs. If we could not do that, we would likely end up cloning data unnecessarily.

Consider

struct User<'a> {
    first_name: &'a str,
    last_name: &'a str,
}

Wouldn't it be nice to be able to create a user with a static lifetime, User<'static>, that owns its data? This way we could implement a function do_something_with_user(user) accepting the same struct regardless of whether the data is cloned or borrowed. Unfortunately, with this definition the only way to create a User<'static> is by using &'static str.

But what if we have a String? We can solve the problem by storing not &'a str, but Cow<'a, str> inside the struct:

use std::borrow::Cow;

struct User<'a> {
    first_name: Cow<'a, str>,
    last_name: Cow<'a, str>,
}

This way, we can construct both owned and borrowed versions of the User struct:

impl<'a> User<'a> {

    pub fn new_owned(first_name: String, last_name: String) -> User<'static> {
        User {
            first_name: Cow::Owned(first_name),
            last_name: Cow::Owned(last_name),
        }
    }

    pub fn new_borrowed(first_name: &'a str, last_name: &'a str) -> Self {
        Self {
            first_name: Cow::Borrowed(first_name),
            last_name: Cow::Borrowed(last_name),
        }
    }


    pub fn first_name(&self) -> &str {
        &self.first_name
    }
    pub fn last_name(&self) -> &str {
        &self.last_name
    }
}


fn main() {
    // Static lifetime as it owns the data
    let user: User<'static> = User::new_owned("James".to_owned(), "Bond".to_owned());
    println!("Name: {} {}", user.first_name, user.last_name);

    // Static lifetime as it borrows 'static data
    let user: User<'static> = User::new_borrowed("Felix", "Leiter");
    println!("Name: {} {}", user.first_name, user.last_name);

    let first_name = "Eve".to_owned();
    let last_name = "Moneypenny".to_owned();

    // Non-static lifetime as it borrows the data
    let user = User::new_borrowed(&first_name, &last_name);
    println!("Name: {} {}", user.first_name, user.last_name);
}
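
One thing the Cow-based struct makes possible is detaching a borrowed User from its source. The sketch below is an assumption-laden illustration: into_owned, full_name and make_detached_user are hypothetical helpers that are not part of the code above.

```rust
use std::borrow::Cow;

struct User<'a> {
    first_name: Cow<'a, str>,
    last_name: Cow<'a, str>,
}

impl<'a> User<'a> {
    fn new_borrowed(first_name: &'a str, last_name: &'a str) -> Self {
        Self {
            first_name: Cow::Borrowed(first_name),
            last_name: Cow::Borrowed(last_name),
        }
    }

    // Hypothetical helper: clones any borrowed field and moves any field
    // that is already owned, producing a User that borrows nothing.
    fn into_owned(self) -> User<'static> {
        User {
            first_name: Cow::Owned(self.first_name.into_owned()),
            last_name: Cow::Owned(self.last_name.into_owned()),
        }
    }
}

// Accepts any User, whether its fields are borrowed or owned.
fn full_name(user: &User<'_>) -> String {
    format!("{} {}", user.first_name, user.last_name)
}

// Returns a User that outlives the local Strings it was built from.
fn make_detached_user() -> User<'static> {
    let first_name = "Eve".to_owned();
    let last_name = "Moneypenny".to_owned();
    let borrowed = User::new_borrowed(&first_name, &last_name);
    borrowed.into_owned()
}

fn main() {
    let user = make_detached_user();
    println!("{}", full_name(&user));
}
```

With plain &'a str fields, make_detached_user could not compile, because the returned struct would still point into the local Strings.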

A clone on write struct

The examples above illustrate only one side of the Cow: the ability to represent data whose borrowed or owned status is determined not at compile time, but at run time.

But why was it named Cow then? Cow stands for clone on write, the Rust flavor of the more familiar copy on write.

The true power of Cow comes with the to_mut method. If the Cow is owned, it simply returns a mutable reference to the underlying data; if it is borrowed, the data is first cloned into the owned form.

This allows you to build an interface on top of structures that lazily store references to the data and clone it only when a mutation is first required.
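
Before moving to a larger example, here is a minimal sketch of to_mut on a Cow<str>. The capitalize function is a hypothetical helper chosen for illustration; it clones the text only when a change is actually needed.

```rust
use std::borrow::Cow;

// Illustrative helper: upper-cases a leading ASCII lowercase letter in place.
fn capitalize(text: &mut Cow<'_, str>) {
    let needs_change = text
        .chars()
        .next()
        .map_or(false, |c| c.is_ascii_lowercase());
    if needs_change {
        // to_mut() clones borrowed data into a String on first use and
        // returns &mut String; on an owned Cow it returns the existing String.
        let owned: &mut String = text.to_mut();
        // The first character is ASCII, so it occupies exactly one byte.
        owned[..1].make_ascii_uppercase();
    }
}

fn main() {
    let mut already_ok = Cow::Borrowed("Rust");
    capitalize(&mut already_ok);
    // Nothing was written, so nothing was cloned.
    assert!(matches!(already_ok, Cow::Borrowed(_)));

    let mut lower = Cow::Borrowed("rust");
    capitalize(&mut lower);
    // The first write switched the Cow to the owned variant.
    assert!(matches!(lower, Cow::Owned(_)));
    assert_eq!(&*lower, "Rust");
}
```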

Consider code that receives a buffer of data in the form of &[u8]. We would like to pass it through some logic that conditionally modifies the data (e.g. appends a few bytes) and then consume the buffer as &[u8]. Similar to the example above, we can't keep the buffer as &[u8] because we won't be able to modify it, but converting it to Vec<u8> would mean a copy is made every time.

We can achieve the required behavior by representing the data as Cow<[u8]>:

use std::borrow::Cow;

struct LazyBuffer<'a> {
    data: Cow<'a, [u8]>,
}

impl<'a> LazyBuffer<'a> {

    pub fn new(data: &'a[u8]) -> Self {
        Self {
            data: Cow::Borrowed(data),
        }
    }

    pub fn data(&self) -> &[u8] {
        &self.data
    }

    pub fn append(&mut self, data: &[u8]) {
        self.data.to_mut().extend(data)
    }
}

This way we can pass borrowed data around without cloning up until the moment when (and if) we need to modify it:

fn main() {
    let data = vec![0u8; 10];

    // No memory copied yet
    let mut buffer = LazyBuffer::new(&data);
    println!("{:?}", buffer.data());

    // The data is cloned
    buffer.append(&[1, 2, 3]);
    println!("{:?}", buffer.data());

    // The data is not cloned on further attempts
    buffer.append(&[4, 5, 6]);
    println!("{:?}", buffer.data());
}

Keep your own type inside it

Most likely you would end up using Cow<str> or Cow<[u8]>, but there are cases when you might want to store your own type inside it.

In order to use Cow with a user-defined type, you need to implement an owned and a borrowed version of it. The two versions must be tied together by the following trait bounds:

  • The owned version should implement the Borrow trait to produce a reference to the borrowed type.
  • The borrowed version should implement the ToOwned trait to produce the owned type.

Implementing the Borrow trait is tricky and often requires unsafe code. Indeed, for the function fn borrow(&self) -> &Borrowed; to return a reference to the Borrowed type, the referenced value must either already be stored inside self or the reference must be produced unsafely.

In practice this often means that the borrowed type is an unsized type (also known as a dynamically sized type). The size of such a type is not known at compile time, so it can only exist behind a pointer or a reference.

Have you ever wondered why we use &str everywhere and nearly never use str? You can't find the definition of the str type in the standard library, it is a primitive type (part of the language). Since str is a dynamically sized type, it can only be instantiated through a pointer type, such as &str. Trait object dyn T is another example of the dynamically sized type.
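
The difference between sized and dynamically sized types can be observed through reference sizes. The sketch below reflects how current Rust compilers lay out references; reference_sizes is a hypothetical helper written for this illustration.

```rust
use std::mem::{size_of, size_of_val};

// Returns (size of a thin reference, size of &str, size of &[u8]) in bytes.
fn reference_sizes() -> (usize, usize, usize) {
    (size_of::<&u8>(), size_of::<&str>(), size_of::<&[u8]>())
}

fn main() {
    let (thin, str_ref, slice_ref) = reference_sizes();

    // A reference to a sized type is one pointer wide.
    assert_eq!(thin, size_of::<usize>());
    // References to str and [u8] also carry the length, so they are twice as wide.
    assert_eq!(str_ref, 2 * size_of::<usize>());
    assert_eq!(slice_ref, 2 * size_of::<usize>());

    // The size of the str value itself is only known at run time.
    let text: &str = "hello";
    assert_eq!(size_of_val(text), 5);

    // let s: str = *text; // rejected: the size of str is not known at compile time
}
```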

Imagine you would like to implement your own versions of the String and str types.

use std::borrow::{Borrow, Cow};
use std::ops::Deref;

#[derive(Debug)]
struct MyString {
    data: String
}

#[derive(Debug)]
#[repr(transparent)]
struct MyStr {
    data: str,
}

Since str is unsized, so is MyStr. You can then tie MyString and MyStr together the same way String and str are tied together:

impl Borrow<MyStr> for MyString {
    fn borrow(&self) -> &MyStr {
        unsafe { &*(self.data.as_str() as *const str as *const MyStr) }
    }
}

impl ToOwned for MyStr {
    type Owned = MyString;

    fn to_owned(&self) -> MyString {
        MyString {
            data: self.data.to_owned()
        }
    }
}

The unsafe pointer cast inside the borrow method has probably drawn your attention. While it looks scary, it is a usual pattern in the standard library (have a look at, e.g., the Path type implementation). Since MyStr is a single-field struct annotated with #[repr(transparent)], it is guaranteed to have the same layout as its str field. It means we can soundly cast a valid pointer to str into a pointer to MyStr and then convert it to a reference.
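
One way to keep the cast contained is to put it behind a single constructor, similar in spirit to how Path::new wraps an OsStr. The following is a sketch under that assumption; MyStr::new and MyStr::as_str are hypothetical additions that are not part of the code above.

```rust
use std::borrow::Borrow;

#[derive(Debug)]
struct MyString {
    data: String,
}

#[derive(Debug)]
#[repr(transparent)]
struct MyStr {
    data: str,
}

impl MyStr {
    // Hypothetical constructor: the only place where the cast happens.
    fn new(s: &str) -> &MyStr {
        // SAFETY: MyStr is #[repr(transparent)] over str, so both types have
        // the same layout and the pointer metadata (the length) carries over.
        unsafe { &*(s as *const str as *const MyStr) }
    }

    fn as_str(&self) -> &str {
        &self.data
    }
}

impl Borrow<MyStr> for MyString {
    fn borrow(&self) -> &MyStr {
        // No unsafe code needed here any more.
        MyStr::new(&self.data)
    }
}

fn main() {
    let owned = MyString { data: "Hello world".to_owned() };
    let borrowed: &MyStr = owned.borrow();
    assert_eq!(borrowed.as_str(), "Hello world");
}
```

Keeping the unsafe block in one audited function means every other conversion can be written in safe code.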

Optionally, we can also implement the Deref trait for convenience. MyString and MyStr can then be stored in a Cow, with all the advantages it provides.

impl Deref for MyString {
    type Target = MyStr;

    fn deref(&self) -> &Self::Target {
        self.borrow()
    }
}


fn main()  {
    let data = MyString { data: "Hello world".to_owned() };

    let borrowed_cow: Cow<'_, MyStr> = Cow::Borrowed(&data);
    println!("{:?}", borrowed_cow);

    let owned_cow: Cow<'_, MyStr> = Cow::Owned(data);
    println!("{:?}", owned_cow);
}

Borrow the type as dyn Trait

As mentioned above, a trait object is another example of a dynamically sized type. Somewhat surprisingly, we can use Cow to implement dynamic dispatch, similarly to Box<dyn Trait> and Arc<dyn Trait>.

Consider the following trait and struct implementations:

use std::borrow::{Borrow, Cow};
use std::fmt::Debug;
use std::ops::Deref;

trait MyTrait: Debug {
    fn data(&self) -> &str;
}

#[derive(Debug)]
struct MyString {
    data: String
}

impl MyTrait for MyString {
    fn data(&self) -> &str {
        &self.data
    }
}

As MyString implements MyTrait, we can borrow &MyString as &dyn MyTrait:

impl<'a> Borrow<dyn MyTrait + 'a> for MyString {
    fn borrow(&self) -> &(dyn MyTrait + 'a) {
        self
    }
}

We can also convert any MyTrait implementation to MyString:

impl ToOwned for dyn MyTrait {
    type Owned = MyString;

    fn to_owned(&self) -> MyString {
        MyString {
            data: self.data().to_owned()
        }
    }
}

Since we have defined Borrow and ToOwned, we can now put MyString into Cow<dyn MyTrait>:

fn main()  {
    let data = MyString { data: "Hello world".to_owned() };

    let borrowed_cow: Cow<'_, dyn MyTrait> = Cow::Borrowed(&data);
    println!("{:?}", borrowed_cow);

    let owned_cow: Cow<'_, dyn MyTrait> = Cow::Owned(data);
    println!("{:?}", owned_cow);
}

The above could be useful to implement, e.g., a mutable vector of trait objects:

fn main()  {
    let data = MyString { data: "Hello world".to_owned() };
    let cow1: Cow<'_, dyn MyTrait> = Cow::Borrowed(&data);

    let data = MyString { data: "Hello world".to_owned() };
    let cow2: Cow<'_, dyn MyTrait> = Cow::Owned(data);

    let mut vector: Vec<Cow<'_, dyn MyTrait>> = vec![cow1, cow2];
}
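
To show what such a vector is good for, the sketch below calls the trait method through each Cow and mutates one element via to_mut. It repeats the definitions so that it compiles on its own; the Item alias and the collect_data and demo functions are hypothetical names, and the object lifetime is spelled out as 'static because ToOwned is only implemented for dyn MyTrait + 'static.

```rust
use std::borrow::{Borrow, Cow};
use std::fmt::Debug;

trait MyTrait: Debug {
    fn data(&self) -> &str;
}

#[derive(Debug)]
struct MyString {
    data: String,
}

impl MyTrait for MyString {
    fn data(&self) -> &str {
        &self.data
    }
}

impl<'a> Borrow<dyn MyTrait + 'a> for MyString {
    fn borrow(&self) -> &(dyn MyTrait + 'a) {
        self
    }
}

impl ToOwned for dyn MyTrait {
    type Owned = MyString;

    fn to_owned(&self) -> MyString {
        MyString { data: self.data().to_owned() }
    }
}

// Illustrative alias with the trait object lifetime written explicitly.
type Item<'a> = Cow<'a, dyn MyTrait + 'static>;

// Dispatches data() dynamically on every element.
fn collect_data(items: &[Item<'_>]) -> Vec<String> {
    items.iter().map(|item| item.data().to_owned()).collect()
}

// Builds one borrowed and one owned element, then mutates the borrowed one.
fn demo() -> Vec<String> {
    let shared = MyString { data: "borrowed".to_owned() };
    let first: Item<'_> = Cow::Borrowed(&shared);
    let second: Item<'_> = Cow::Owned(MyString { data: "owned".to_owned() });

    let mut items = vec![first, second];
    // to_mut() converts the borrowed trait object into an owned MyString.
    items[0].to_mut().data.push_str(" then cloned");
    collect_data(&items)
}

fn main() {
    println!("{:?}", demo());
}
```

Note that to_mut always yields a MyString here, whatever concrete type was borrowed, because that is the Owned type chosen in the ToOwned implementation.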

Implement safe wrapper over FFI type

The above MyString example is exciting but somewhat artificial. Let's consider the real-life pattern when you would like to store your own type inside the Cow.

Imagine you are using a C library in your Rust project. Let's say you receive a buffer of data from the C code in the form of a pointer *const u8 and a length usize. Say you would like to pass the data through a layer of Rust logic, possibly modifying it (does it make you think about Cow?). Finally, you might want to access the data (modified or not) in Rust as &[u8] or pass it into another C function as a pointer *const u8 and a length usize. (Here we assume that this C function does not release the memory. If this assumption surprises you, consider reading the article 7 ways to pass a string between 🦀 Rust and C.)

As we would like to avoid cloning the data unnecessarily, we would represent the buffer as the following struct:

use std::borrow::{Borrow, Cow};
use std::fmt::{Debug, Formatter};
use std::ops::Deref;

struct NativeBuffer {
    pub ptr: *const u8,
    pub len: usize
}

This struct does not own its data; it borrows it from the C pointer with an unknown lifetime.

For convenience only, we can implement the traits to access the buffer as a &[u8] slice and to print it:

impl Borrow<[u8]> for NativeBuffer {
    fn borrow(&self) -> &[u8] {
        unsafe {
            std::slice::from_raw_parts(self.ptr, self.len)
        }
    }
}

impl Deref for NativeBuffer {
    type Target = [u8];

    fn deref(&self) -> &Self::Target {
        self.borrow()
    }
}

impl Debug for NativeBuffer {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        let data: &[u8] = self.borrow();
        write!(f, "NativeBuffer {{ data: {:?}, len: {} }}", data, self.len)
    }
}

In order to store the NativeBuffer in the Cow we first need to define the owning version of it:

#[derive(Debug)]
struct OwnedBuffer {
    owned_data: Vec<u8>,
    native_proxy: NativeBuffer,
}

impl ToOwned for NativeBuffer {
    type Owned = OwnedBuffer;

    fn to_owned(&self) -> OwnedBuffer {
        let slice: &[u8] = self.borrow();
        let owned_data = slice.to_vec();
        let native_proxy = NativeBuffer {
            ptr: owned_data.as_ptr(),
            len: owned_data.len()
        };
        OwnedBuffer {
            owned_data,
            native_proxy,
        }
    }
}

The trick is to borrow the data as a slice and convert it to Vec. We also need to store the NativeBuffer inside OwnedBuffer. It contains a pointer to the data inside the vector and the length of it, so we could implement the Borrow trait:

impl Borrow<NativeBuffer> for OwnedBuffer {
    fn borrow(&self) -> &NativeBuffer {
        &self.native_proxy
    }
}

We can now define the method to mutate the data:

impl OwnedBuffer {

    pub fn append(&mut self, data: &[u8]) {
        self.owned_data.extend(data);
        self.native_proxy = NativeBuffer {
            ptr: self.owned_data.as_ptr(),
            len: self.owned_data.len()
        };
    }
}

It is important to keep the native buffer pointer and length up to date: extending the vector may reallocate it, which would leave the old pointer dangling.

We can finally put our borrowed buffer in the Cow and implement the conditional mutation logic, for example:

fn main() {
    // Simulates the data coming across FFI (from C)
    let data = vec![1, 2, 3];
    let ptr = data.as_ptr();
    let len = data.len();

    let native_buffer = NativeBuffer { ptr, len};
    let mut buffer = Cow::Borrowed(&native_buffer);
    // NativeBuffer { data: [1, 2, 3], len: 3 }
    println!("{:?}", buffer);

    // No data cloned
    assert_eq!(buffer.ptr, ptr);
    assert_eq!(buffer.len, len);

    if buffer.len > 1 {
        buffer.to_mut().append(&[4, 5, 6]);
        // OwnedBuffer { owned_data: [1, 2, 3, 4, 5, 6], native_proxy: NativeBuffer { data: [1, 2, 3, 4, 5, 6], len: 6 } }
        println!("{:?}", buffer);

        // Data is cloned
        assert_ne!(buffer.ptr, ptr);
        assert_eq!(buffer.len, len + 3);
    }

    let slice: &[u8] = &buffer;
    // [1, 2, 3, 4, 5, 6]
    println!("{:?}", slice);
}

The buffer is only cloned if the length of it is bigger than 1.
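
The last step mentioned earlier, handing the possibly modified data back to C as a pointer and a length, can be sketched as follows. To stay short, this is a simplified illustration that uses Cow<[u8]> instead of NativeBuffer; checksum is a stand-in written in Rust for a real C function, and send_to_c is a hypothetical helper.

```rust
use std::borrow::Cow;

// Stand-in for a C function. In a real project it would be declared in an
// extern "C" block and implemented by the C library. It only reads the bytes.
extern "C" fn checksum(ptr: *const u8, len: usize) -> u32 {
    // SAFETY: the caller promises that ptr points to len readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}

// Illustrative helper: passes the buffer across the boundary without caring
// whether it was ever cloned.
fn send_to_c(buffer: &Cow<'_, [u8]>) -> u32 {
    // Deref gives &[u8] for both the borrowed and the owned variant.
    let bytes: &[u8] = buffer;
    checksum(bytes.as_ptr(), bytes.len())
}

fn main() {
    let data = vec![1u8, 2, 3];
    let mut buffer: Cow<'_, [u8]> = Cow::Borrowed(&data);
    assert_eq!(send_to_c(&buffer), 6);

    // The first mutation clones the data; the call site stays the same.
    buffer.to_mut().extend_from_slice(&[4, 5, 6]);
    assert_eq!(send_to_c(&buffer), 21);
}
```

The pointer is taken right before the call and not stored, so it cannot go stale if the buffer is reallocated between calls.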

Summary

I sincerely hope that this post helped to demystify the Cow type and will increase its adoption in the Rust community! If you liked the article, please leave a reaction and consider reading my other posts!

Top comments (7)

Luiz Ferraz

Since MyStr is a single field struct, it is guarantied to have zero cost compile-time representation. It means we can safely cast the valid pointer to str to the pointer to MyStr and then convert it to a reference.

That assumption is only safe if you add repr(transparent), otherwise the compiler is allowed to change the representation of the data. A reference to a str has a pointer and the length (aka, a fat pointer) but the compiler doesn't guarantee that those will be in the same order.

The same applies if there was a struct in there, the compiler is allowed to add padding bytes on either side of the inner field.

doc.rust-lang.org/nomicon/other-re...

Konstantin Grechishchev • Edited

Good point.

I am looking into the definition of the Path struct in standard library:

#[cfg_attr(not(test), rustc_diagnostic_item = "Path")]
#[stable(feature = "rust1", since = "1.0.0")]
// FIXME:
// `Path::new` current implementation relies
// on `Path` being layout-compatible with `OsStr`.
// When attribute privacy is implemented, `Path` should be annotated as `#[repr(transparent)]`.
// Anyway, `Path` representation and layout are considered implementation detail, are
// not documented and must not be relied upon.
pub struct Path {
    inner: OsStr,
}

and the comment on top of it is really confusing.

Jean-Michel 🕵🏻‍♂️ Fayard

Is there a French influence in Rust or something?
We love cows because half of the french cuisine is based on milk
and we have 42 great expressions using cows as a metaphor.
For example "It's amazing" can be said as "C'est vachement 🐄 bien"!

Stanislav

Good article! Especially last two sections were really insightful.

Getting the feeling there is still long way to go in becoming somewhat proficient on the Rust field.
Some topics are really confusing not in the conceptual way, but in the way they are implemented and used in Rust.

Programming in Rust is in some way really more about Rust, not about the programming itself :))
Though there is really good, nice guts feeling once you rein the horses.
I think it's worth it! No hurry.

Eduardo Casanova

Still a long way to master Rust but thank you for the effort you put into this. It has helped me a lot.

SegSFault • Edited

Hi! Thank you for the article.
Regarding the second use: A struct optionally owning the data.
What is a rationale/reason behind using User<'static> instead of simply User<'a>?
Even though not important, the associated functions

pub fn first_name(&self) -> &str {
    &self.first_name
}
pub fn last_name(&self) -> &str {
    &self.last_name
}

seem to be redundant, don't they?

Konstantin Grechishchev

When you say 'a, what are the lifetime 'a bounds? Is it lifetime of &self?

If so, the return value would be valid only as long as the reference to self is valid, while we would like it to be valid for the rest of the program's lifetime (or until we drop it).