Whew! That was VERY cool.
Just finished my first CLI project for Flatiron. SO satisfying to be able to use everything we've learned in the past few weeks on a single app!
I guess I'll walk you through some highlights?
We had a choice between scraping and working with an API. I know that in the future I'll probably have more work that involves APIs, but I really wanted to figure out the whole Nokogiri business. So I picked a project that I could actually use: an NJ hike finder. My wife's family lives out in NJ and has an extremely energetic Doberman named Bellatrix. Hikes are GOLD to that dog. And to that wife, come to think of it.
I found a great website (or so I thought...) called "NJHiking.com". A couple hundred hikes, lots of information. I was ready to roll.
The first thing I wanted to do was find out how far any of these hikes were from my in-laws' house. A little research led me to an awesome gem called Geocoder which, among other things, can take any two sets of coordinates and calculate the distance between them. But there were issues with the website I was scraping. Many, many issues.
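Quick aside before we get to those issues: the distance math itself is the easy part. The call looks roughly like this (the coordinates are placeholders I made up, not real hikes):

require "geocoder"

# two arbitrary [latitude, longitude] pairs (placeholder values)
point_a = [40.88787, -74.81405]
point_b = [40.73570, -74.17240]

# no API key needed for this part; Geocoder just does the math
Geocoder::Calculations.distance_between(point_a, point_b) # => distance in miles (a Float)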
Problem #1 - NOTHING was labeled! Virtually no classes, no IDs. Nothing. Just a bunch of h3s, hrefs, and p tags. But I was determined! The first set of coordinates was hidden in an href:
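Basically a link whose text was nothing but the raw coordinates, something like N40.88787° W74.81405°. Step one was just collecting every link on the page (the code below is a reconstruction; the URL and CSS selector are guesses, since nothing was labeled):

require "nokogiri"
require "open-uri"

# one hike's individual page (made-up URL)
doc = Nokogiri::HTML(URI.open("https://njhiking.com/some-hike"))

# no classes or IDs to hook into, so grab every link on the page
results1 = doc.css("a")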
Naturally there were other hrefs on the page, but none in which the SECOND character of .text was a number. Thus my first bit of search code:
results1.each do |r|
  # the coordinate link is the only one whose second character is a digit
  if r.text[1] != nil && r.text[1].match(/\d/)
    @new_hike.coordinates = r.text # save it on the hike I'm building
  end
end
First problem solved! Sort of.
Problem #2 - Geocoder doesn't like this format: N40.88787° W74.81405°. It likes THIS format: ["40.88787", "-74.81405"]. Which is a hassle. I needed to get rid of the N, get rid of the W, get rid of the degree symbols, convert the second number to a negative, AND put them both in an array. Which I did like so:
# location holds the two halves of the scraped text, e.g. ["N40.88787°", "W74.81405°"]
one = location[0].scan(/\d{2}[.]\d{5}/)           # => ["40.88787"]
two = location[1].scan(/\d{2}[.]\d{5}/)           # => ["74.81405"]
two_minus = (two.join.to_f * -1).to_s.split       # flip the longitude negative => ["-74.81405"]
final_location = [one, two_minus].join(" ").split # => ["40.88787", "-74.81405"]
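And that, at last, is something Geocoder will happily measure against another point (the "home" coordinates below are made up, not the in-laws' actual address):

# home base for the distance check (made-up coordinates)
home = [40.7357, -74.1724]

Geocoder::Calculations.distance_between(home, final_location.map(&:to_f)) # => miles from home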
And then...
Problem #3 - This was the worst. Unfortunately, the coordinates for all of these hikes were on each hike's PERSONAL page. Which meant that looking at the first page (where all the hikes were listed) was useless for anything except knowing what the hike was called. At first I tried a sort of "double" scrape where I got all of the hike names and their coordinates and put them into an array of hashes. Then I used Geocoder to see which of those hash values (the coordinates) were within a 50-mile radius of my in-laws. Totally worked, BUT...it took forever. Nokogiri had to scrape the main listing page, then follow a hike's link, scrape that page, then go BACK and do it all over again. 200 times. The first time I ran it I thought there was an infinite loop somewhere in my code because it took so long to finish.
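For the curious, the slow version looked something like this. Heavily condensed and from memory, so treat the selectors as approximations rather than the exact code:

require "nokogiri"
require "open-uri"
require "geocoder"

home = [40.7357, -74.1724] # stand-in for the in-laws' coordinates (made up)

# scrape the listing page once to get every hike's name and link
index = Nokogiri::HTML(URI.open("https://njhiking.com/"))

hikes = index.css("h3 a").map do |link|
  # ...then load and scrape EACH hike's own page just to find its coordinates
  page = Nokogiri::HTML(URI.open(link["href"]))
  coord = page.css("a").find { |a| a.text[1] && a.text[1].match(/\d/) }
  { name: link.text, coordinates: coord && coord.text }
end

# finally, keep only the hikes within 50 miles of home
nearby = hikes.select do |hike|
  lat, lon = hike[:coordinates].to_s.scan(/\d{2}\.\d{5}/).map(&:to_f)
  lat && lon && Geocoder::Calculations.distance_between(home, [lat, -lon]) <= 50
end

Roughly 200 extra page loads before the menu could even appear, which is why it crawled.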
Eventually I just scrapped the idea of figuring out the distance prior to picking the hike.
After that it was pretty smooth sailing! The lack of labeling was super annoying, and it limited my choice of hike attributes to some pretty simple things: the length of the hike, its website, and whether it had a restroom or not. But in the end I was pretty happy with how it turned out.
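For the record, a Hike object ended up pretty bare-bones. Something in this spirit (the attribute names are mine from memory; the real ones may have differed):

# a stripped-down version of the hike model
class Hike
  attr_accessor :name, :length, :url, :restroom, :coordinates

  @@all = []

  def initialize(name)
    @name = name
    @@all << self
  end

  # the rest of the app reads hikes from here instead of touching the scraper
  def self.all
    @@all
  end
end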
The real prize in the cereal box, though, was how it all fit together. Seeing the separation of classes. Calling instance methods of one class inside another class. It was like watching a game of Jenga in reverse. I know for you coders out there who have been doing it for years it must seem pretty boring, but me?
I was jazzed as hell. :)