DEV Community

Mungai Keren


Missing Values in R — remove na values

The first method — is.na()

is.na() tests for missing values (NA) in a data set. It checks every element of a vector, or every cell in every column of a data frame, and flags the NA values that might otherwise affect a calculation.

Example:

Given x <- c(1, 2, 3, 4, NA), calling is.na(x) returns a logical vector that is TRUE wherever an element of x is NA and FALSE everywhere else. The output in this case is FALSE FALSE FALSE FALSE TRUE.
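As a minimal sketch of how is.na() is commonly combined with other base R functions, the snippet below counts and locates the missing values. The data frame `df` and its column names are made up for illustration and do not come from the original dataset.

```r
# Vector from the example above
x <- c(1, 2, 3, 4, NA)

missing <- is.na(x)              # logical vector, TRUE where x is NA
n_missing <- sum(missing)        # TRUE counts as 1, so this counts the NAs
which_missing <- which(missing)  # positions of the NA values

# Hypothetical data frame: is.na() returns a logical matrix of the same
# shape, so colSums() gives the number of NAs in each column
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
na_per_column <- colSums(is.na(df))

print(missing)
print(n_missing)
print(which_missing)
print(na_per_column)
```

Counting the NA values per column first is a cheap way to see how much data is affected before deciding how to handle it.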


Second method — na.omit()
Here’s a sample dataset with missing values.

A dataset with missing values. Screenshot from RStudio.

na.omit() removes the rows that contain NA values from a data frame and returns the remaining rows. It is one of the quickest ways to remove NA values in R.
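The screenshot's dataset is not reproduced here, so the following sketch uses a small made-up data frame (`df`, with hypothetical columns `id`, `score` and `grade`) to show what na.omit() does.

```r
# Hypothetical data: rows 2 and 3 each contain one NA
df <- data.frame(
  id    = 1:4,
  score = c(90, NA, 75, 60),
  grade = c("A", "B", NA, "D")
)

clean <- na.omit(df)   # keeps only rows with no NA in any column

# na.omit() records which rows it dropped in the "na.action" attribute
dropped <- attr(clean, "na.action")

print(clean)
print(as.integer(dropped))
```

Note that a row is dropped if any of its columns is NA, even when the other columns hold valid data.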


Complete cases: complete.cases() returns a logical vector marking the rows without NA values.
The na.omit() function relies on the sweeping assumption that the dropped rows resemble the typical member of the dataset and are not outliers, whereas complete.cases() lets you review the incomplete rows in more detail before deciding what to do with them.

Removing the rows with NA values might not be the right decision, so we may want to inspect subsets of the original data to evaluate whether other factors are at work.

We accomplish this with the complete.cases() function. It examines a data frame and returns a logical vector with one entry per row: TRUE for rows with no missing values and FALSE for rows that contain at least one NA. Negating that vector selects the incomplete rows, so we can examine them and drop them if we wish.
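A minimal sketch of that workflow, again using a made-up data frame (`df` and its columns are hypothetical) rather than the dataset from the screenshots:

```r
# Hypothetical data: rows 2 and 3 each contain one NA
df <- data.frame(
  id    = 1:4,
  score = c(90, NA, 75, 60),
  grade = c("A", "B", NA, "D")
)

ok <- complete.cases(df)   # TRUE for rows with no NA at all

incomplete <- df[!ok, ]    # rows to inspect before deciding anything
complete   <- df[ok, ]     # rows that are safe to use as they are

print(ok)
print(incomplete)
print(complete)
```

Keeping `incomplete` as its own data frame makes it easy to check whether the missing values cluster in particular columns or groups before discarding anything.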

Missing values removed using the complete.cases() function.

A display of the removed missing values which can be examined.

Fix in place using na.rm
Another way of dealing with missing values is the na.rm logical argument. When na.rm is TRUE, the function skips over the NA values. When na.rm is FALSE, which is the default for functions such as sum() and mean(), the calculation returns NA for any row or column that contains an NA.

The rows with NA values are retained in the data frame but excluded from the relevant calculations. This is often the best method if the observations with NA values still show significant trends in their other columns. Support for na.rm varies by function and package, so check the documentation for the one you are using.

Notice the NA value in the rowSums() output: it appears because the dataset has an NA value in column 2, row 2.
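The effect of na.rm can be sketched with a small made-up data frame (`m`, with hypothetical columns `col1` and `col2`) that has an NA in column 2, row 2, as in the case described above.

```r
# Hypothetical data with a single NA in column 2, row 2
m <- data.frame(col1 = c(1, 2, 3), col2 = c(4, NA, 6))

mean_default <- mean(m$col2)                # NA: the default is na.rm = FALSE
mean_skipped <- mean(m$col2, na.rm = TRUE)  # mean of the non-missing values

sums_default <- rowSums(m)                  # row 2 comes back as NA
sums_skipped <- rowSums(m, na.rm = TRUE)    # row 2 sums only col1

print(mean_default)
print(mean_skipped)
print(sums_default)
print(sums_skipped)
```

The data frame itself is unchanged in every call; only the calculation ignores the missing value.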

Dealing with missing data is critical to proper data science. R makes handling missing data straightforward, which is one reason it is so often used in statistical analysis.
