Here's a quick 'n' dirty way to dump your new-fangled post analytics to a CSV using Rust. Y'know, for graphs and stuff. Who doesn't like graphs?
You have to save the page source to src/page.html first. The file is pulled in at compile time with include_str!, so rebuild whenever you save a fresh copy.
This ain't polished. It was my "one-hour-before-my-day-job-starts" project today. Snag the regex for your own real version, or improve this one and show me!
extern crate chrono;
extern crate csv;
#[macro_use]
extern crate lazy_static;
extern crate regex;
extern crate select;
extern crate serde;
#[macro_use]
extern crate serde_derive;

use chrono::prelude::*;
use regex::Regex;
use select::{
    document::Document,
    predicate::{Class, Name},
};
use std::{
    error::Error,
    fs::{File, OpenOptions},
};

lazy_static! {
    // One timestamp per run, shared by every row written in that run.
    static ref NOW: DateTime<Local> = Local::now();
    // Three numbers separated by "//": views, reactions, comments.
    static ref STAT_RE: Regex = Regex::new(".+?([0-9]+).+//.?([0-9]+).+//.?([0-9]+).+").unwrap();
}

#[derive(Debug, Serialize)]
struct Record {
    time: String,
    title: String,
    views: i32,
    reactions: i32,
    comments: i32,
}

impl Record {
    fn new(time: String, title: String, views: i32, reactions: i32, comments: i32) -> Self {
        Self {
            time,
            title,
            views,
            reactions,
            comments,
        }
    }
}
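
fn write_entries(rs: Vec<Record>, f: File) -> Result<(), Box<dyn Error>> {
    // The file is opened in append mode, so only emit the header row while it is still empty.
    let needs_header = f.metadata()?.len() == 0;
    let mut wtr = csv::WriterBuilder::new()
        .has_headers(needs_header)
        .from_writer(f);
    for r in rs {
        wtr.serialize(r)?;
    }
    wtr.flush()?;
    Ok(())
}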

fn scrape_page(doc: &Document) -> Result<Vec<Record>, Box<dyn Error>> {
    let mut ret = Vec::new();
    for node in doc.find(Class("dashboard-pageviews-indicator")) {
        let text = node.text();
        if STAT_RE.is_match(&text) {
            // Walk up two levels, then down to the post's <h2> inside its link.
            // These unwraps will panic if the page markup changes.
            let title = node
                .parent()
                .unwrap()
                .parent()
                .unwrap()
                .find(Name("a"))
                .next()
                .unwrap()
                .find(Name("h2"))
                .next()
                .unwrap()
                .text();
            for cap in STAT_RE.captures_iter(&text) {
                let r = Record::new(
                    NOW.to_rfc2822(),
                    title.clone(),
                    cap[1].parse::<i32>()?,
                    cap[2].parse::<i32>()?,
                    cap[3].parse::<i32>()?,
                );
                ret.push(r);
            }
        }
    }
    Ok(ret)
}

fn run() -> Result<(), Box<dyn Error>> {
    // include_str! resolves relative to this source file, hence src/page.html.
    let doc = Document::from(include_str!("page.html"));
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .append(true)
        .open("stats.csv")?;
    let entries = scrape_page(&doc)?;
    write_entries(entries, file)?;
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        eprintln!("Error: {}", e);
        ::std::process::exit(1);
    }
}
Edit: finished off the error handling.
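If you want to snag the extraction step without pulling in the regex crate, here's a rough std-only sketch of the same idea. It is not equivalent to the regex above (it's looser about what sits around the numbers), and it assumes the indicator text looks something like "123 views // 4 reactions // 5 comments", which is my guess at the shape rather than anything confirmed from the real markup. The `parse_stats` name is made up for this example.

```rust
/// Hypothetical helper, not part of the program above: takes the first run
/// of ASCII digits from each "//"-separated chunk and returns the first
/// three as (views, reactions, comments).
fn parse_stats(text: &str) -> Option<(i32, i32, i32)> {
    let mut nums = text.split("//").map(|chunk| {
        let digits: String = chunk
            .chars()
            .skip_while(|c| !c.is_ascii_digit())
            .take_while(|c| c.is_ascii_digit())
            .collect();
        // An empty string fails to parse, so a chunk without digits becomes None.
        digits.parse::<i32>().ok()
    });
    // The first `?` handles a missing chunk, the second a chunk without a number.
    let views = nums.next()??;
    let reactions = nums.next()??;
    let comments = nums.next()??;
    Some((views, reactions, comments))
}

fn main() {
    // Assumed sample text; the real dashboard text may differ.
    let sample = "  123 views // 4 reactions // 5 comments";
    match parse_stats(sample) {
        Some((v, r, c)) => println!("views={} reactions={} comments={}", v, r, c),
        None => eprintln!("could not parse {:?}", sample),
    }
}
```

Returning an Option here means a node whose text doesn't fit can simply be skipped, instead of bailing out of the whole scrape on the first parse failure.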
Top comments (11)
You may have given me an excuse to finally execute some Rust code! I’ve done lots of reading but haven’t actually tried using Rust yet.
Evangelizing: complete.
I'll make a post about my "hello DEV scrapity-scrape world" experience once I'm done. 🙂
Adding the view count to https://dev.to/deciduously/scrape-your-devto-pageviews-with-rust-2dgc.json?
That's a good idea!
Ben, do you have any interesting Rust open source projects on your radar? Something that you yourself learn from or contribute to?
I recently googled "Emacs written in Rust" and found this one: remacs.
Remacs is what I would have suggested! I haven't gotten too involved, but there are a lot of little bite-sized translations of all the Lisp functions from C to look at. There's also always the Servo project, and ripgrep.
Another one that's interesting is bat (as an alternative to cat).
Thanks, this is interesting.
Full docs for select.rs here - viable alternative to, say, Python in my opinion
This is really cool, Ben, thanks for sharing!