DEV Community

maxwizard01
maxwizard01

Posted on • Edited on

How to read csv file using R

Reading csv file

so how do we open and read csv file in R. before we proceed in to that i would advise you to read introduction to R and some lesson under it if you are new here so that you can understand better. so let's go into the lesson.

What is CSV ?

CSV means Comma Separated Values.it is the commonest import and export format for spreadsheets and databases which are always used as exchange of data between applications and popular data format in Data Science.
A CSV file stores data in a tabular form i.e each data field is separated by a delimiter(comma in most cases). for all csv file, it must be saved with the .csv file extension.

how to Read from CSV file

Sometime we might want to work with big data which is already in form of csv. All we need is to read the data from the file. Before we read the csv file we need to called read.csv() functiion to read the file.

How to use read.csv() function

To use the csv function we make use of syntax that look like the following.

read.csv(fileName)
Enter fullscreen mode Exit fullscreen mode

Example

let's say I have a file named payroll.csv in my current folder. the file look like the following.
Alt Text
so I need to extract all data inside the file to my code now therefore I will read throgh the file with the following codes.
codes

myData=read.csv('payroll.csv')
myData
Enter fullscreen mode Exit fullscreen mode

Output:

> csvFile=read.csv('payroll.csv')
> print(csvFile)
Empl     EMPL.Name Hourly.Rate hourly.worked Gross.Pay X5.5.Tax  Net.Pay
1  E0001 Mr. Abdullahi         5.5            35     192.5  10.5875 181.9125
2  E0002   Mrs Khadija         7.0            30     210.0  11.5500 198.4500
3  E0003    Mrs Zainab         8.5            25     212.5  11.6875 200.8125
4  E0004   Mrs Fatimah        10.0            20     200.0  11.0000 189.0000
5  E0005    Mr Ibrahim        11.5            15     172.5   9.4875 163.0125
6  E0006          Ford        13.0            10     130.0   7.1500 122.8500
7  E0007          Bell        14.5             5      72.5   3.9875  68.5125
8  E0008          John        13.0            10     130.0   7.1500 122.8500
9  E0009       francis        14.5             5      72.5   3.9875  68.5125
10 E0010       Olamide        13.0             5      65.0   3.5750  61.4250
Enter fullscreen mode Exit fullscreen mode

can you see that we have all the data now.you can now manipulate the data the way you like.
However read.csv() function take some other parameter apart from filename they are:

  1. File: You have to specify the file name, or Full path along with file name. You can also use the URL of the external (online) csv files. For example, sample.csv or β€œC:/Users/ Suresh/ Documents/ R Programs/ sample.csv”
  2. header: If the csv contains Columns names as the First Row as we have in the example above (where first column is empl) then please specify TRUE otherwise, FALSE
  3. sep: It is a short form of separator. You have to specify the character that is separating the fields. ” , β€œ means data is separated by comma.
  4. nrows: It is an integer value. You can use this argument to restrict the number of rows to read. For example, if you want top 4 records, use nrows = 4
  5. skip: this specify the number of rows you want to skip from file before started reading the csv file. For example, if you want to skip top 3 records, use skip = 3. Now let's take a look at another example

Example2

extract some data from the payroll starting from the third person to the seventh person.
solution.
according to the question we need to skip the first and second, then printed 5 people informations(3-7 inclusive).
so here is our codes
Codes

myData=read.csv('payroll.csv',header=TRUE,skip=2,nrow=5)
myData
Enter fullscreen mode Exit fullscreen mode

Output:

> myData=read.csv('payroll.csv',header=TRUE,skip=2,nrow=5)
> myData
  E0002 Mrs.Khadija   X7 X30  X210  X11.55  X198.45
1 E0003  Mrs Zainab  8.5  25 212.5 11.6875 200.8125
2 E0004 Mrs Fatimah 10.0  20 200.0 11.0000 189.0000
3 E0005  Mr Ibrahim 11.5  15 172.5  9.4875 163.0125
4 E0006        Ford 13.0  10 130.0  7.1500 122.8500
5 E0007        Bell 14.5   5  72.5  3.9875  68.5125
> 
Enter fullscreen mode Exit fullscreen mode

can you see the result? it has ommited the first and the second Also it printed exactly 5rows.
Head and Tail function
In R Programming, Head and Tail functions are the very useful functions to work with external data(read csv files). let say your csv file is too big and you want to extract the top performing records (top 20 records) then you can use these functions.
head(Data, limit): This method will return top six elements if you don't provide the limit. If you specify the limit for example "5" then, it will return first five records. you can select 10,20,30 or as many as you want. nrows can be use in the read.csv() function as a substitution.
tail(Data, limit): by name Tail will be the opposite of head, therefore tail method will return last six elements if you failed to specify the limit otherwis it will return the number of limit provided. It is something like selecting bottom 10 records.

Example3

let say we have a csv file name schoolsheet that look like the following: print the top ten record of student scores.
Alt Text
solution
firstly note that we need to print just top 10rows from schoolsheet.csv. we have two method of solving this
Method1 codes.

 mySchool=read.csv('schoolsheet.csv',header=TRUE)
 head(mySchool,10)
Enter fullscreen mode Exit fullscreen mode

Result

>myschool=read.csv('payroll.csv',header=TRUE)
>head(csvFile,10)
   number         student.Name test1 test2 Exam total
1   E0001 Nimatallah Olayiwola    10    10   43    63
2   E0002     Zainab Olayiwola    10    11   65    86
3   E0003       Roofiah Amuzat    10    15   66    91
4   E0004        Oduola Hafsoh    10     9   40    59
5   E0005 Ishola Kech Khadijah    10     7   40    57
6   E0006        Aminu Fatimoh    10    12   46    68
7   E0007       Ogundele Ahmad    10     7   45    62
8   E0008         Yusuf Badmus     8     8   40    56
9   E0009        Aminu Ibrahim     6    10   45    61
10  E0010 AbduLlaah AbdulAzeez     3     9   41    53
> 
Enter fullscreen mode Exit fullscreen mode

Method2

 mySchool=read.csv('schoolsheet.csv',header=TRUE,nrows=10)
 mySchool
Enter fullscreen mode Exit fullscreen mode

Result

>myschool=read.csv('payroll.csv',header=TRUE)
>head(csvFile,10)
   number         student.Name test1 test2 Exam total
1   E0001 Nimatallah Olayiwola    10    10   43    63
2   E0002     Zainab Olayiwola    10    11   65    86
3   E0003       Roofiah Amuzat    10    15   66    91
4   E0004        Oduola Hafsoh    10     9   40    59
5   E0005 Ishola Kech Khadijah    10     7   40    57
6   E0006        Aminu Fatimoh    10    12   46    68
7   E0007       Ogundele Ahmad    10     7   45    62
8   E0008         Yusuf Badmus     8     8   40    56
9   E0009        Aminu Ibrahim     6    10   45    61
10  E0010 AbduLlaah AbdulAzeez     3     9   41    53
> 
Enter fullscreen mode Exit fullscreen mode

Now can you see the little different between the two method??
Yeah! there is no exactly the way to solve problem in programming use any ways and just make it done.

AN IMPORTANCE NOTICE.

Note that all the code in this article will not work on your laptop because I am using data on my system. so if you want wish to use the same code here as practice then follow the following process before you started.
for schoolsheet.csv
copy the following data and paste it to your notepad.

number,student Name,test1,test2,Exam,total
E0001,Nimatallah Olayiwola,10,10,43,63
E0002,Zainab Olayiwola,10,11,65,86
E0003,Roofiah Amuzat,10,15,66,91
E0004,Oduola Hafsoh,10,9,40,59
E0005,Ishola Kech Khadijah,10,7,40,57
E0006,Aminu Fatimoh,10,12,46,68
E0007,Ogundele Ahmad,10,7,45,62
E0008,Yusuf Badmus,8,8,40,56
E0009,Aminu Ibrahim,6,10,45,61
E0010,AbduLlaah AbdulAzeez,3,9,41,53
E0011,Adeniran AbdulQudus,10,13,40,63
E0012,Muhammad Olanrewaju,10,13,40,63
E0013,Ojo Abdulmujeeb,7,7,36,50
E0014,Abdulshakur AbdulHakeem,10,13,50,73
E0015,Adebayo Musharraf,10,12,47,69
E0016,Ojobare Fadheelah,5,8,37,50
E0017,Odukoya AbdurRaheem,10,11,40,61
E0018,Adeniyi Aadam,10,14,40,64
E0019,Aderinto Muadh,10,8,35,53
Enter fullscreen mode Exit fullscreen mode

save it as schoolsheet.csv
for payroll.csv
copy the following and paste it to your notepad

Empl,EMPL Name,Hourly Rate,hourly worked,Gross Pay,5.5 Tax,Net Pay
E0001,Mr. Abdullahi,5.5,35,192.5,10.5875,181.9125
E0002,Mrs Khadija,7,30,210,11.55,198.45
E0003,Mrs Zainab,8.5,25,212.5,11.6875,200.8125
E0004,Mrs Fatimah,10,20,200,11,189
E0005,Mr Ibrahim,11.5,15,172.5,9.4875,163.0125
E0006,Ford,13,10,130,7.15,122.85
E0007,Bell,14.5,5,72.5,3.9875,68.5125
E0008,John,13,10,130,7.15,122.85
E0009,francis,14.5,5,72.5,3.9875,68.5125
E0010,Olamide,13,5,65,3.575,61.425
Enter fullscreen mode Exit fullscreen mode

save it as payroll.csv
I hope you find this article interesting?? please consider to like or follow me you can alse chat me up on whatsapp (07045225718). we shall meet in the next lesson where we will be treating how to extract and manipulate csv data till then keep on coding! it's your guy Maxwizard.
You can also read Data structure to read the next articles.

πŸ”₯ Introduction to R (1)

πŸ”₯ Data Structure (2)

πŸ”₯ Statistical value (mean, median, mode etc) (3)

πŸ”₯ Tabular Presentation of Data (4)

πŸ”₯ Ploting graph with R

πŸ”₯ constructing frequency distribution with R (6)

Top comments (0)