Reading csv file
so how do we open and read csv file in R. before we proceed in to that i would advise you to read introduction to R and some lesson under it if you are new here so that you can understand better. so let's go into the lesson.
What is CSV ?
CSV means Comma Separated Values.it is the commonest import and export format for spreadsheets and databases which are always used as exchange of data between applications and popular data format in Data Science.
A CSV file stores data in a tabular form i.e each data field is separated by a delimiter(comma in most cases). for all csv file, it must be saved with the .csv
file extension.
how to Read from CSV file
Sometime we might want to work with big data which is already in form of csv. All we need is to read the data from the file. Before we read the csv file we need to called read.csv() functiion to read the file.
How to use read.csv() function
To use the csv function we make use of syntax that look like the following.
read.csv(fileName)
Example
let's say I have a file named payroll.csv
in my current folder. the file look like the following.
so I need to extract all data inside the file to my code now therefore I will read throgh the file with the following codes.
codes
myData=read.csv('payroll.csv')
myData
Output:
> csvFile=read.csv('payroll.csv')
> print(csvFile)
Empl EMPL.Name Hourly.Rate hourly.worked Gross.Pay X5.5.Tax Net.Pay
1 E0001 Mr. Abdullahi 5.5 35 192.5 10.5875 181.9125
2 E0002 Mrs Khadija 7.0 30 210.0 11.5500 198.4500
3 E0003 Mrs Zainab 8.5 25 212.5 11.6875 200.8125
4 E0004 Mrs Fatimah 10.0 20 200.0 11.0000 189.0000
5 E0005 Mr Ibrahim 11.5 15 172.5 9.4875 163.0125
6 E0006 Ford 13.0 10 130.0 7.1500 122.8500
7 E0007 Bell 14.5 5 72.5 3.9875 68.5125
8 E0008 John 13.0 10 130.0 7.1500 122.8500
9 E0009 francis 14.5 5 72.5 3.9875 68.5125
10 E0010 Olamide 13.0 5 65.0 3.5750 61.4250
can you see that we have all the data now.you can now manipulate the data the way you like.
However read.csv() function take some other parameter apart from filename
they are:
- File: You have to specify the file name, or Full path along with file name. You can also use the URL of the external (online) csv files. For example, sample.csv or βC:/Users/ Suresh/ Documents/ R Programs/ sample.csvβ
-
header: If the csv contains Columns names as the First Row as we have in the example above (where first column is
empl
) then please specify TRUE otherwise, FALSE - sep: It is a short form of separator. You have to specify the character that is separating the fields. β , β means data is separated by comma.
- nrows: It is an integer value. You can use this argument to restrict the number of rows to read. For example, if you want top 4 records, use nrows = 4
- skip: this specify the number of rows you want to skip from file before started reading the csv file. For example, if you want to skip top 3 records, use skip = 3. Now let's take a look at another example
Example2
extract some data from the payroll starting from the third person to the seventh person.
solution.
according to the question we need to skip the first and second, then printed 5 people informations(3-7 inclusive).
so here is our codes
Codes
myData=read.csv('payroll.csv',header=TRUE,skip=2,nrow=5)
myData
Output:
> myData=read.csv('payroll.csv',header=TRUE,skip=2,nrow=5)
> myData
E0002 Mrs.Khadija X7 X30 X210 X11.55 X198.45
1 E0003 Mrs Zainab 8.5 25 212.5 11.6875 200.8125
2 E0004 Mrs Fatimah 10.0 20 200.0 11.0000 189.0000
3 E0005 Mr Ibrahim 11.5 15 172.5 9.4875 163.0125
4 E0006 Ford 13.0 10 130.0 7.1500 122.8500
5 E0007 Bell 14.5 5 72.5 3.9875 68.5125
>
can you see the result? it has ommited the first and the second Also it printed exactly 5rows.
Head and Tail function
In R Programming, Head and Tail functions are the very useful functions to work with external data(read csv files). let say your csv file is too big and you want to extract the top performing records (top 20 records) then you can use these functions.
head(Data, limit): This method will return top six elements if you don't provide the limit. If you specify the limit for example "5" then, it will return first five records. you can select 10,20,30 or as many as you want. nrows
can be use in the read.csv()
function as a substitution.
tail(Data, limit): by name Tail will be the opposite of head, therefore tail method will return last six elements if you failed to specify the limit otherwis it will return the number of limit provided. It is something like selecting bottom 10 records.
Example3
let say we have a csv file name schoolsheet
that look like the following: print the top ten record of student scores.
solution
firstly note that we need to print just top 10rows from schoolsheet.csv. we have two method of solving this
Method1 codes.
mySchool=read.csv('schoolsheet.csv',header=TRUE)
head(mySchool,10)
Result
>myschool=read.csv('payroll.csv',header=TRUE)
>head(csvFile,10)
number student.Name test1 test2 Exam total
1 E0001 Nimatallah Olayiwola 10 10 43 63
2 E0002 Zainab Olayiwola 10 11 65 86
3 E0003 Roofiah Amuzat 10 15 66 91
4 E0004 Oduola Hafsoh 10 9 40 59
5 E0005 Ishola Kech Khadijah 10 7 40 57
6 E0006 Aminu Fatimoh 10 12 46 68
7 E0007 Ogundele Ahmad 10 7 45 62
8 E0008 Yusuf Badmus 8 8 40 56
9 E0009 Aminu Ibrahim 6 10 45 61
10 E0010 AbduLlaah AbdulAzeez 3 9 41 53
>
Method2
mySchool=read.csv('schoolsheet.csv',header=TRUE,nrows=10)
mySchool
Result
>myschool=read.csv('payroll.csv',header=TRUE)
>head(csvFile,10)
number student.Name test1 test2 Exam total
1 E0001 Nimatallah Olayiwola 10 10 43 63
2 E0002 Zainab Olayiwola 10 11 65 86
3 E0003 Roofiah Amuzat 10 15 66 91
4 E0004 Oduola Hafsoh 10 9 40 59
5 E0005 Ishola Kech Khadijah 10 7 40 57
6 E0006 Aminu Fatimoh 10 12 46 68
7 E0007 Ogundele Ahmad 10 7 45 62
8 E0008 Yusuf Badmus 8 8 40 56
9 E0009 Aminu Ibrahim 6 10 45 61
10 E0010 AbduLlaah AbdulAzeez 3 9 41 53
>
Now can you see the little different between the two method??
Yeah! there is no exactly the way to solve problem in programming use any ways and just make it done.
AN IMPORTANCE NOTICE.
Note that all the code in this article will not work on your laptop because I am using data on my system. so if you want wish to use the same code here as practice then follow the following process before you started.
for schoolsheet.csv
copy the following data and paste it to your notepad.
number,student Name,test1,test2,Exam,total
E0001,Nimatallah Olayiwola,10,10,43,63
E0002,Zainab Olayiwola,10,11,65,86
E0003,Roofiah Amuzat,10,15,66,91
E0004,Oduola Hafsoh,10,9,40,59
E0005,Ishola Kech Khadijah,10,7,40,57
E0006,Aminu Fatimoh,10,12,46,68
E0007,Ogundele Ahmad,10,7,45,62
E0008,Yusuf Badmus,8,8,40,56
E0009,Aminu Ibrahim,6,10,45,61
E0010,AbduLlaah AbdulAzeez,3,9,41,53
E0011,Adeniran AbdulQudus,10,13,40,63
E0012,Muhammad Olanrewaju,10,13,40,63
E0013,Ojo Abdulmujeeb,7,7,36,50
E0014,Abdulshakur AbdulHakeem,10,13,50,73
E0015,Adebayo Musharraf,10,12,47,69
E0016,Ojobare Fadheelah,5,8,37,50
E0017,Odukoya AbdurRaheem,10,11,40,61
E0018,Adeniyi Aadam,10,14,40,64
E0019,Aderinto Muadh,10,8,35,53
save it as schoolsheet.csv
for payroll.csv
copy the following and paste it to your notepad
Empl,EMPL Name,Hourly Rate,hourly worked,Gross Pay,5.5 Tax,Net Pay
E0001,Mr. Abdullahi,5.5,35,192.5,10.5875,181.9125
E0002,Mrs Khadija,7,30,210,11.55,198.45
E0003,Mrs Zainab,8.5,25,212.5,11.6875,200.8125
E0004,Mrs Fatimah,10,20,200,11,189
E0005,Mr Ibrahim,11.5,15,172.5,9.4875,163.0125
E0006,Ford,13,10,130,7.15,122.85
E0007,Bell,14.5,5,72.5,3.9875,68.5125
E0008,John,13,10,130,7.15,122.85
E0009,francis,14.5,5,72.5,3.9875,68.5125
E0010,Olamide,13,5,65,3.575,61.425
save it as payroll.csv
I hope you find this article interesting?? please consider to like or follow me you can alse chat me up on whatsapp (07045225718)
. we shall meet in the next lesson where we will be treating how to extract and manipulate csv data till then keep on coding! it's your guy Maxwizard.
You can also read Data structure to read the next articles.
Top comments (0)