DEV Community

Daniel Fitzpatrick

Playing with webdriver

Webdriver has been a persistently alluring technology since I discovered it a couple of years ago. However, regular HTTP clients have always been sufficient for my needs.

I recently wanted to pull some data off the local church website, but I could not log in with any HTTP client. So I threw Etaoin, a Clojure WebDriver library, at the problem, and it worked marvelously.

To follow along, you will need a username and password for a congregate site. I suspect the routes will be identical from one such site to the next.

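;; Getting started
(require '[etaoin.api :as api])

(def base-url "https://mysite.com") ; placeholder: your site's root URL
(def user "...")
(def pass "...")

;; Launches Firefox through WebDriver; geckodriver must be installed and on the PATH.
(def ff (api/firefox))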

Logging in is easy.

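(api/go ff (str base-url "/members/login/"))
;; Keyword keys are looked up as element ids, so these fill #username and #password.
(api/fill-multi ff {:username user :password pass})
;; submit sends Enter to the matched input, which posts the form.
(api/submit ff {:id "password"})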

After submitting the login form, I am unsure how to verify that the member landing page has loaded. So, for now, I advise just waiting a few seconds. If you're following along, you will have the browser in front of you and can "eyeball" it. I would appreciate any suggestions for improvement here.
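One possible way to replace the fixed pause, sketched under assumptions: Etaoin ships wait helpers, and `api/wait-predicate` polls a zero-argument function until it returns a truthy value or a timeout expires. The sketch below treats "the URL no longer contains /members/login" as the sign that the login went through. That criterion, and the helper names `left-login-page?` and `wait-for-login`, are my own guesses and have not been checked against the site.

```clojure
(require '[etaoin.api :as api]
         '[clojure.string :as s])

(defn left-login-page?
  "True when url no longer points at the login route.
  Assumption: a successful login redirects away from /members/login."
  [url]
  (not (s/includes? url "/members/login")))

(defn wait-for-login
  "Polls the browser's current URL until it leaves the login page.
  Throws if that does not happen within `timeout` seconds."
  [driver timeout]
  (api/wait-predicate #(left-login-page? (api/get-url driver))
                      {:timeout timeout}))

;; Usage, right after submitting the form:
;; (wait-for-login ff 15)
```

If the landing page has an element you can name reliably, something like `(api/wait-visible ff {:class "..."})` with that element's class may be a more direct check. A failed login stays on the login page, so the timeout doubles as an error signal.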

We're in, so what now?

Well, I have trouble remembering names and faces. What if I had a flashcard system to help me memorize them? We can build that from the directory.

Let's navigate to the directory page and inspect it before proceeding.

(api/go ff (str base-url "/members/directory"))

It looks like each directory element is identifiable by the album class.

[Image: snapshot of the directory HTML]

Let's dig into an album tag.

<div class="album">
  <a href="/members/directory/family/XXX">
    <span class="album-img">
      <img src="image-url" alt="...">
    </span>
    <span class="album-title">Doe, John</span>
  </a>
</div>

We'll need a couple of functions. One takes an album element and returns the text of its child with class album-title; the other returns the source URL of its image.

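(require '[clojure.string :as s])

(defn get-album-title
  "Returns the text of the album-title element inside album-entry."
  [album-entry]
  (->> {:class "album-title"}
       (api/child ff album-entry)
       (api/get-element-text-el ff)))

(defn get-album-image
  "Returns the image URL for album-entry, minus any query string."
  [album-entry]
  (as-> album-entry $
    (api/child ff $ {:tag "img"})
    (api/get-element-attr-el ff $ :src)
    (s/replace $ #"\?.*" "")   ; drop the query string
    (str base-url $)))         ; prefix the site root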

This code may look familiar because it is similar to the kind of web-scraping you would do with a regular HTTP client. If not, I've got you covered.

album-entry represents a DOM element like the <div class=album> tag we inspected earlier, children and all. Call the child function to get the sub-element we want, and then finally, a get-element-<thing> function returns the string we need.
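The string handling at the end of get-album-image can be tried at the REPL without a browser. This is a minimal sketch: `clean-image-url` is a hypothetical helper that repeats only the last two steps, and the sample `src` value is made up.

```clojure
(require '[clojure.string :as s])

(def base-url "https://mysite.com")

(defn clean-image-url
  "Drops the query string from src and prefixes base-url.
  Assumption: src is site-relative, i.e. it starts with a slash."
  [src]
  (str base-url (s/replace src #"\?.*" "")))

(clean-image-url "/media/photos/doe.jpg?width=200")
;=> "https://mysite.com/media/photos/doe.jpg"
```

If the site ever serves absolute image URLs, the prefix step would double the host, so it is worth checking one real `src` value first.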

Let's put it together.

(->> {:class "album"}
     (api/query-all ff)
     (mapv (juxt get-album-title get-album-image)))

;=>
[["Doe, John" "path/to/image.jpg"]
 ["Doe, Jane" "path/to/image2.jpg"] ...]

At this point, I am beginning to lose interest. But I like having options. So, let's convert this to JSON and print it to the console. You can see the project here
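The conversion itself is not shown in this post, so here is a rough sketch of what it could look like. It assumes the clojure.data.json library is on the classpath (the actual project may use something else), and the `entries->json` name and the `:name` / `:image` keys are placeholders of my own.

```clojure
(require '[clojure.data.json :as json])

(defn entries->json
  "Turns [title image-url] pairs into a JSON array of objects."
  [entries]
  (json/write-str
   (mapv (fn [[title image]] {:name title :image image}) entries)))

;; Sample data in the same shape the query-all pipeline returns.
(def sample [["Doe, John" "path/to/image.jpg"]
             ["Doe, Jane" "path/to/image2.jpg"]])

;; Print the JSON string to the console.
(println (entries->json sample))
```

Maps with named keys were chosen over raw pairs so that a flashcard front end would not have to rely on position to tell the name from the image.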

Perhaps I will revisit and finish the project in another post.
