TL;DR
Ruby comes out of the box with a library for reading and parsing CSV files.
```ruby
require 'csv'
two_dimensional_array = CSV.read('/path/to/file.csv')
=>
[["TWO"],
["DIMENSIONAL"],
["ARRAY"]]
```
This article covers the basics of working with CSVs in Ruby. I will be working on macOS, with its Unix-like file system and a Zsh shell, but Windows users should be able to follow along as well!
## What is a CSV File?
Popular applications like Excel and Numbers can read and write pure CSV, but their default file extensions are .xlsx and .numbers.
CSV means 'comma separated values'. A pure .csv file is really just a string with values separated by commas and newlines. The commas separate the columns, and the newlines separate the rows.
Do you want to see what CSV data looks like?
Navigate to a directory in your terminal where you have a pure CSV file saved.
```
$ pwd
/Users/jvon1904/csv
$ ls
contacts.csv
```
Then use the `cat` command in the terminal with the file name as the argument, and you will see what a pure CSV really is!
```
$ cat contacts.csv
ID,First Name,Last Name,Age,Gender
1,Victoria,Waite,38,F
2,Jamar,Hayes,37,M
3,Leonard,Brendle,39,M
4,Abby,Atchison,57,F
5,Marc ,Stockton,64,M
6,Geraldine,Roybal,52,F
7,James,Coles,57,M
8,Hiram,Spellman,58,M
9,Bradford,Vela,41,M
10,William,Haskell,74,M
11,Christopher,Mason,70,M
12,Thomas,Atkinson,68,M
13,Peggy,Underwood,37,F
14,Charles,Wilson,66,M
15,Joanne,Sanchez,42,F
16,Leo,Sanders,58,*
17,Robert,Castillo,39,M
18,Joan ,Traxler,82,F
19,Dana,Pitts,78,F
20,Susan,Dupont,34,F%
```
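Notice how entries #5 and #18 have spaces after the first name. CSV treats whitespace as part of the value, so spaces accidentally left in the file come through as-is. (The `%` after the last line is not part of the file; Zsh prints it when output does not end with a newline.)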
So there it is. CSVs are just values, commas, and newlines.
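One caveat to "just values, commas, and newlines": a value that itself contains a comma has to be wrapped in double quotes, which is why splitting on commas by hand is fragile. Here is a minimal sketch with made-up sample data; the `naive_parse` helper is hypothetical and only there for comparison.

```ruby
require 'csv'

# Hypothetical helper: split on newlines, then on commas.
def naive_parse(text)
  text.lines.map { |line| line.chomp.split(',') }
end

simple = "ID,Name\n1,Victoria\n"
quoted = "ID,Name\n1,\"Waite, Victoria\"\n"

# For simple data the two approaches agree.
naive_parse(simple) == CSV.parse(simple) # => true

# A quoted comma breaks the naive split but not the CSV library.
naive_parse(quoted).last # => ["1", "\"Waite", " Victoria\""]
CSV.parse(quoted).last   # => ["1", "Waite, Victoria"]
```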
## The Ruby CSV Module
Ruby ships with two libraries, the Core and the Std-lib (Standard Library). The Core contains the classes that make up the Ruby language, stuff like Strings, Arrays, Classes, Integers, Files, etc. Everything in Ruby is an object, and every class ultimately inherits from BasicObject.
```ruby
$ irb
> Array.class
=> Class
> Array.class.superclass
=> Module
> Array.class.superclass.superclass
=> Object
> Array.class.superclass.superclass.superclass
=> BasicObject
> Array.class.superclass.superclass.superclass.superclass
=> nil
```
Since the Core is the core of Ruby, everything is included whenever you are coding in Ruby.
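The inheritance chain from the IRB session above can also be walked in a loop, with no `require` needed since these are all Core classes. This is just a sketch; `superclass_chain` is a hypothetical helper name.

```ruby
# Hypothetical helper: collect a class and each of its superclasses in order.
def superclass_chain(klass)
  chain = []
  while klass
    chain << klass
    klass = klass.superclass # nil once we go past BasicObject
  end
  chain
end

superclass_chain(Array)       # => [Array, Object, BasicObject]
superclass_chain(Array.class) # => [Class, Module, Object, BasicObject]
```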
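The Std-lib contains extensions to Ruby. They are libraries that need to be required, just like gems, only they are already installed on your computer (unless you deleted them of course). They are worth checking out and contain some really cool and helpful modules. Note that since Ruby 3.4, csv is distributed as a bundled gem rather than a default gem: it still comes installed with Ruby, but projects that use Bundler need to add it to their Gemfile.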
You can inspect all the code by navigating to where they are stored.
Open up an IRB session and type the global variable `$:` (an alias for `$LOAD_PATH`). It returns an array of the paths that Ruby searches when a library is required. Your paths might be different, especially if you don't use RVM.
```ruby
$ irb
> $:
=>
["/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/site_ruby/3.0.0",
"/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/site_ruby/3.0.0/arm64-darwin20",
"/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/site_ruby",
"/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/vendor_ruby/3.0.0",
"/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/vendor_ruby/3.0.0/arm64-darwin20",
"/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/vendor_ruby",
"/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/3.0.0",
"/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/3.0.0/arm64-darwin20"]
```
That neat little variable helps me remember where they are located. The second to last path is where the Std-lib resides.
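If you would rather ask Ruby where a library was actually loaded from, `$LOADED_FEATURES` lists every file that has been required so far. A small sketch, assuming csv is installed; `loaded_files_named` is a hypothetical helper, and the path it prints will differ on your machine.

```ruby
require 'csv'

# $: and $LOAD_PATH are two names for the same array.
$:.equal?($LOAD_PATH) # => true

# Hypothetical helper: list the loaded files called "<name>.rb".
def loaded_files_named(name)
  $LOADED_FEATURES.select { |path| File.basename(path) == "#{name}.rb" }
end

puts loaded_files_named('csv')
```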
```
$ pwd
/Users/jvon1904/.rvm/rubies/ruby-3.0.0/lib/ruby/3.0.0
$ ls
English.rb expect.rb open-uri.rb ripper
abbrev.rb fiddle open3.rb ripper.rb
arm64-darwin20 fiddle.rb openssl rubygems
base64.rb fileutils.rb openssl.rb rubygems.rb
benchmark find.rb optionparser.rb securerandom.rb
benchmark.rb forwardable optparse set
bigdecimal forwardable.rb optparse.rb set.rb
bigdecimal.rb getoptlong.rb ostruct.rb shellwords.rb
bundler io pathname.rb singleton.rb
bundler.rb ipaddr.rb pp.rb socket.rb
cgi irb prettyprint.rb syslog
cgi.rb irb.rb prime.rb tempfile.rb
coverage.rb json pstore.rb time.rb
csv json.rb psych timeout.rb
csv.rb kconv.rb psych.rb tmpdir.rb
date.rb logger racc tracer.rb
debug.rb logger.rb racc.rb tsort.rb
delegate.rb matrix rdoc un.rb
did_you_mean matrix.rb rdoc.rb unicode_normalize
did_you_mean.rb mkmf.rb readline.rb uri
digest monitor.rb reline uri.rb
digest.rb mutex_m.rb reline.rb weakref.rb
drb net resolv-replace.rb yaml
drb.rb objspace.rb resolv.rb yaml.rb
erb.rb observer.rb rinda
```
Since they are plain .rb files, you can open them up to see their inner workings. You can even modify them, although don't do it unless you know what you're doing.
As was mentioned, each library in the Std-lib needs to be required. So if you want to use the CSV class, make sure you `require 'csv'`.
```ruby
# Otherwise you'll get this:
> CSV
(irb):15:in `<main>': uninitialized constant CSV (NameError)
```
```ruby
# Don't stress, just do this:
> require 'csv'
=> true
> CSV
=> CSV
> CSV.class
=> Class
```
It's always a great idea to hit up `CSV.methods.sort` to reference all its capabilities.
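`CSV.methods` also includes everything inherited from `Class` and `Object`, so the list is long. One way to narrow it down is `singleton_methods(false)`, which returns only the class-level methods defined on `CSV` itself. A quick sketch (`csv_class_methods` is a hypothetical helper name):

```ruby
require 'csv'

# Hypothetical helper: class-level methods defined directly on CSV.
def csv_class_methods
  CSV.singleton_methods(false).sort
end

puts csv_class_methods.inspect
csv_class_methods.include?(:read)  # => true
csv_class_methods.include?(:parse) # => true
```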
## Using the CSV Module to Read and Parse CSVs
There are two main methods for reading and parsing CSVs, `CSV.read` and `CSV.parse`! Use `CSV.read` to read an actual file, and `CSV.parse` to parse a properly formatted string. Let's compare the two.
```ruby
$ irb
> require 'csv'
=> true
> my_csv_string = "this,is,a,csv\ncan,you,believe,it?"
=> "this,is,a,csv\ncan,you,believe,it?"
> parsed_data = CSV.parse(my_csv_string)
=> [["this", "is", "a", "csv"], ["can", "you", "believe", "it?"]]
```
There it is! A two dimensional array from a CSV!
Just make sure to use double quotes when the string contains an escape sequence such as `\n`. Inside single quotes, `\n` is a literal backslash followed by an n, not a newline.
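To see the difference the quotes make, here is a small sketch with made-up strings. The single-quoted version never gets a real newline, so everything ends up in one row.

```ruby
require 'csv'

double_quoted = "this,is\na,csv" # \n is a real newline
single_quoted = 'this,is\na,csv' # \n is a backslash followed by "n"

CSV.parse(double_quoted) # => [["this", "is"], ["a", "csv"]]
CSV.parse(single_quoted) # => [["this", "is\\na", "csv"]]
```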
`CSV.parse` takes two arguments: a string to parse, and optional keyword options. Maybe for some odd reason we want to parse a CSV string that's separated by semicolons... so an SSV? We can pass in the `col_sep` option like so.
```ruby
> CSV.parse("this;is;an;ssv\ncan;you;believe;it?", col_sep: ';')
=> [["this", "is", "an", "ssv"], ["can", "you", "believe", "it?"]]
```
The `CSV.parse` method can also parse an actual file, but you have to open the file first. For instance, `CSV.parse(File.open('path/to/file.csv'))`. Thankfully, this is what `CSV.read` is for!
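As a rough comparison, the sketch below reads the same throwaway file three ways: with `CSV.read`, with `CSV.parse` on an opened file, and with `CSV.foreach`, which yields one row at a time. The `read_three_ways` helper and the file contents are made up for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: read the same file with three different CSV methods.
def read_three_ways(path)
  {
    read: CSV.read(path),
    # The block form of File.open closes the file handle for us.
    parse: File.open(path) { |io| CSV.parse(io) },
    # Without a block, CSV.foreach returns an enumerator over the rows.
    foreach: CSV.foreach(path).to_a
  }
end

Tempfile.create(['sample', '.csv']) do |file|
  file.write("ID,First Name\n1,Victoria\n2,Jamar\n")
  file.flush

  results = read_three_ways(file.path)
  puts results[:read].inspect
  puts results.values.uniq.length # prints 1, since all three agree
end
```

For large files, `CSV.foreach` with a block is usually the gentler choice, since it does not load every row into memory at once.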
## Extracting Data from CSV Files
I created a simple CSV shown in this screenshot:
![contact.csv image](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t2bfi6r1h979nyzwpy31.jpg)
Now let's find the path so we can use Ruby to extract those values with `CSV.read`!
```
$ pwd
/Users/jvon1904/csv
$ ls
contacts.csv
```
```ruby
$ irb
> require 'csv'
=> true
# Make sure you remember the first forward slash in your path
> contacts_csv = CSV.read('/Users/jvon1904/csv/contacts.csv')
=>
[["ID", "First Name", "Last Name", "Age", "Gender"],
...
> contacts_csv
=>
[["ID", "First Name", "Last Name", "Age", "Gender"],
["1", "Victoria", "Waite", "38", "F"],
["2", "Jamar", "Hayes", "37", "M"],
["3", "Leonard", "Brendle", "39", "M"],
["4", "Abby", "Atchison", "57", "F"],
["5", "Marc ", "Stockton", "64", "M"],
["6", "Geraldine", "Roybal", "52", "F"],
["7", "James", "Coles", "57", "M"],
["8", "Hiram", "Spellman", "58", "M"],
["9", "Bradford", "Vela", "41", "M"],
["10", "William", "Haskell", "74", "M"],
["11", "Christopher", "Mason", "70", "M"],
["12", "Thomas", "Atkinson", "68", "M"],
["13", "Peggy", "Underwood", "37", "F"],
["14", "Charles", "Wilson", "66", "M"],
["15", "Joanne", "Sanchez", "42", "F"],
["16", "Leo", "Sanders", "58", "M"],
["17", "Robert", "Castillo", "39", "M"],
["18", "Joan ", "Traxler", "82", "F"],
["19", "Dana", "Pitts", "78", "F"],
["20", "Susan", "Dupont", "34", "F"]]
```
Great! With this data, you now have the power to create class instances with each row, or save them to a database, or whatever you want! In a future article I will write about just that. For now, here are some ideas of how you can play around with this.
```ruby
# getting a record is easy now
> contacts_csv.last
=> ["20", "Susan", "Dupont", "34", "F"]
# retrieve all female contacts
> contacts_csv.select { |row| row[4] == 'F' }
=>
[["1", "Victoria", "Waite", "38", "F"],
["4", "Abby", "Atchison", "57", "F"],
["6", "Geraldine", "Roybal", "52", "F"],
["13", "Peggy", "Underwood", "37", "F"],
["15", "Joanne", "Sanchez", "42", "F"],
["18", "Joan ", "Traxler", "82", "F"],
["19", "Dana", "Pitts", "78", "F"],
["20", "Susan", "Dupont", "34", "F"]]
# retrieve the first names of contacts under 40
> contacts_csv.select { |row| row[3].to_i < 40 }.map { |row| row[1] }
=> ["First Name", "Victoria", "Jamar", "Leonard", "Peggy", "Robert", "Susan"]
```
Oops! See how we got the "First Name" there? That's a header, so it shouldn't be part of the records. There's a way to get around this, but instead of getting an array back, we'll get a `CSV::Table` object. Let's check it out!
```ruby
# we just need to pass in the headers option
> parsed_data = CSV.read('/Users/jvon1904/csv/contacts.csv', headers: true)
=> #<CSV::Table mode:col_or_row row_count:21>
> parsed_data.class
=> CSV::Table
```
Be aware that every time you pass in that `headers: true` option, it will return a `CSV::Table`.
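With headers in place, the "under 40" query from earlier no longer picks up the header row. The sketch below also passes `converters: :numeric`, which turns numeric-looking fields (including the IDs) into numbers so `to_i` is not needed. The sample data and the `first_names_under` helper are made up for illustration.

```ruby
require 'csv'

# Hypothetical helper: first names of everyone younger than max_age.
def first_names_under(csv_text, max_age)
  table = CSV.parse(csv_text, headers: true, converters: :numeric)
  table.select { |row| row['Age'] < max_age }.map { |row| row['First Name'] }
end

contacts = "ID,First Name,Age\n1,Victoria,38\n4,Abby,57\n20,Susan,34\n"
first_names_under(contacts, 40) # => ["Victoria", "Susan"]
```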
We can access indices the same way as arrays.
```ruby
# only it will return a CSV::Row object now
> parsed_data[0]
=> #<CSV::Row "ID":"1" "First Name":"Victoria" "Last Name":"Waite" "Age":"38" "Gender":"F">
> parsed_data[16][4]
=> "M"
> parsed_data[6].to_h
=>
{"ID"=>"7",
"First Name"=>"James",
"Last Name"=>"Coles",
"Age"=>"57",
"Gender"=>"M"}
```
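A `CSV::Row` can be read by header name as well as by position, which tends to be easier to follow than numeric indices. A short sketch with a made-up one-row table (`sample_row` is a hypothetical helper):

```ruby
require 'csv'

# Hypothetical one-row sample, parsed with headers so we get a CSV::Row back.
def sample_row
  CSV.parse("ID,First Name,Age\n7,James,57\n", headers: true)[0]
end

row = sample_row
row['First Name'] # => "James"
row[1]            # => "James"
row.headers       # => ["ID", "First Name", "Age"]
row.fields        # => ["7", "James", "57"]
row.to_h['Age']   # => "57"
```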
We can access columns by using the `#by_col` method.
```ruby
> parsed_data.by_col[2]
=>
["Waite",
"Hayes",
"Brendle",
"Atchison",
"Stockton",
"Roybal",
"Coles",
"Spellman",
"Vela",
"Haskell",
"Mason",
"Atkinson",
"Underwood",
"Wilson",
"Sanchez",
"Sanders",
"Castillo",
"Traxler",
"Pitts",
"Dupont"]
# use the bang sign `!` to change the orientation of the table
> parsed_data.by_col!
=> #<CSV::Table mode:col row_count:21>
# now switch it back
> parsed_data.by_row!
=> #<CSV::Table mode:row row_count:21>
> parsed_data[14]["First Name"]
=> "Joanne"
```
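The difference between `by_col` and `by_col!` is worth spelling out: the version without the bang returns a copy in column mode and leaves the original alone, while the bang version switches the table itself. A minimal sketch with made-up data (`sample_table` is a hypothetical helper):

```ruby
require 'csv'

# Hypothetical two-row sample table.
def sample_table
  CSV.parse("ID,Age\n1,38\n2,37\n", headers: true)
end

table = sample_table
table.mode        # => :col_or_row
table.by_col.mode # => :col (a copy in column mode)
table.mode        # => :col_or_row (the original is unchanged)

table.by_col!
table.mode        # => :col
table[1]          # => ["38", "37"]

table.by_row!
table[1].fields   # => ["2", "37"]
```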
Two more things. Let's see if we can convert the Age values (currently strings) into floats, as you might for a currency column, and then write the data back to a CSV file.
```ruby
> parsed_data.each do |row|
> row["Age"] = row["Age"].to_f
> end
=> #<CSV::Table mode:row row_count:21>
> parsed_data.by_col[3]
=>
[38.0,
37.0,
39.0,
57.0,
64.0,
52.0,
57.0,
58.0,
41.0,
74.0,
70.0,
68.0,
37.0,
66.0,
42.0,
58.0,
39.0,
82.0,
78.0,
34.0]
```
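The same conversion can also happen while parsing, by passing a custom converter. A converter that takes two arguments receives the field plus a field-info object whose `header` tells you which column you are in. This is only a sketch with made-up data, and `AGE_TO_FLOAT` is a hypothetical name.

```ruby
require 'csv'

# Hypothetical converter: turn only the "Age" column into floats.
AGE_TO_FLOAT = lambda do |field, info|
  info.header == 'Age' ? field.to_f : field
end

table = CSV.parse("ID,Age\n1,38\n2,37\n", headers: true, converters: [AGE_TO_FLOAT])
table['Age'] # => [38.0, 37.0]
table['ID']  # => ["1", "2"]
```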
Now we'll write to a new file. For this we'll use the `CSV.open` method with two arguments: the path, and 'w' for 'write'.
```ruby
> CSV.open('ruby_made_csv.csv', 'w') do |file|
# we start by pushing the headers into the file
> file << parsed_data.headers
# next we'll push each line in one by one
> parsed_data.each do |row|
> file << row
> end
> end
=> #<CSV::Table mode:col_or_row row_count:21>
# you can execute shell commands by using back-ticks!
> `ls`
=> "contacts.csv\nruby_made_csv.csv\n"
# there they are!
```
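To double-check a write like this without touching your real files, you can round-trip a small table through a temporary directory. The sketch below uses made-up data and a hypothetical `write_table` helper, then compares the file with `CSV::Table#to_csv`, which produces the same text (headers included) in one call.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: write a CSV::Table to path, headers first, then each row.
def write_table(table, path)
  CSV.open(path, 'w') do |csv|
    csv << table.headers
    table.each { |row| csv << row }
  end
end

table = CSV.parse("ID,First Name\n1,Victoria\n2,Jamar\n", headers: true)

Dir.mktmpdir do |dir|
  path = File.join(dir, 'ruby_made_csv.csv')
  write_table(table, path)

  puts File.read(path)
  File.read(path) == table.to_csv # => true
end
```

If you do not need to process rows one by one, `File.write(path, table.to_csv)` is a shorter way to get the same file.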
Hopefully this has given you a sample of all you can do with CSVs in Ruby!
To learn how to persist this data to Postgres, read my article [here](https://dev.to/jvon1904/insert-csv-rows-into-a-database-using-vanilla-ruby-5694).