Everton Tenorio

A JavaScript scraper for the Wikipedia Academy Award List.

Scraping the Academy Award winners listed on Wikipedia with cheerio and saving them to a CSV file.

Today's post is a simple demonstration of how to scrape data using JavaScript with the cheerio library. For this, we'll use the list of Academy Award-winning films directly from Wikipedia.

First, install the necessary packages:

npm install cheerio axios

The URL used is:

const url = 'https://en.wikipedia.org/wiki/List_of_Academy_Award%E2%80%93winning_films';

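Next, we'll import the libraries (the imports go at the top of scraper.js), fetch the page with axios, load the HTML using cheerio's load function, and prepare two arrays to hold the column headers and the rows of the table. Because the code uses import and a top-level await, it has to run as an ES module, for example with "type": "module" in package.json:

import fs from 'node:fs'; // used later to write the CSV file
import axios from 'axios';
import * as cheerio from 'cheerio';

const { data: html } = await axios.get(url); // download the page
const $ = cheerio.load(html); // parse it into a queryable document

const theadData = []; // header cells, one array per table body
const tableData = []; // output rows: headers first, then data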

Now we'll select elements and traverse the DOM. Calling $ with a selector returns a Cheerio object, which offers jQuery-style methods such as find, each, and text:

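// Collect the header cells (th) found in each table body.
$('tbody').each((i, column) => {
  const columnData = [];
  $(column)
    .find('th')
    .each((j, cell) => {
      // trim() strips all surrounding whitespace, not just the first newline
      columnData.push($(cell).text().trim());
    });
  theadData.push(columnData);
});

// Use the headers of the first table as the first row of the output.
tableData.push(theadData[0]);

// Collect the data cells (td) of every table row.
$('table tr').each((i, row) => {
  const rowData = [];
  $(row)
    .find('td')
    .each((j, cell) => {
      rowData.push($(cell).text().trim());
    });

  // Skip rows that have no td cells, such as header rows.
  if (rowData.length) tableData.push(rowData);
});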

Glad you still know jQuery...
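
The cells above are stored exactly as they appear on the page, so footnote markers and stray whitespace can end up in the output. If you want tidier values, something like the following might help. This is only a sketch: cleanCell and cleanTable are hypothetical helpers that are not part of the scraper in this post, and the sample rows are made up for illustration.

```javascript
// Hypothetical helper (not in the original scraper): tidy one scraped cell.
function cleanCell(text) {
  return text
    .replace(/\[[^\]]*\]/g, '') // drop bracketed footnote markers such as [1] or [a]
    .replace(/\s+/g, ' ') // collapse newlines and repeated whitespace
    .trim();
}

// Apply the helper to every cell of every row.
function cleanTable(rows) {
  return rows.map((row) => row.map(cleanCell));
}

// Made-up sample rows, only to show the effect.
const sample = [
  ['Film', 'Year'],
  ['Example Film[1]', ' 2020\n'],
];
console.log(cleanTable(sample));
```

In the scraper, this could be applied as cleanTable(tableData) just before building the CSV. Whether bracketed text should be removed at all depends on the data, so treat the regular expression as an assumption to adjust.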

Finally, save the data as it is, without any further processing 😅, to a .csv file using fs.writeFileSync.

Note that I used ";" as the delimiter.

const csvContent = tableData
    .map((row) => row.join(';')) 
    .join('\n');

fs.writeFileSync('academy_awards.csv', csvContent, 'utf-8');
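
A plain join works as long as no cell contains the delimiter. If a cell might contain ";", a double quote, or a line break, the usual CSV convention is to wrap that field in double quotes and double any quotes inside it. Here is a minimal sketch of that idea, assuming the same ";" delimiter; escapeField and toCsv are hypothetical names, not part of the original script, and the sample rows are made up.

```javascript
// Hypothetical helper: quote a field only when it needs it.
function escapeField(value, delimiter = ';') {
  const text = String(value);
  const needsQuotes =
    text.includes(delimiter) || text.includes('"') || /[\r\n]/.test(text);
  // Inside a quoted field, a literal double quote is written as two double quotes.
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build the CSV text from an array of rows (each row is an array of cells).
function toCsv(rows, delimiter = ';') {
  return rows
    .map((row) => row.map((cell) => escapeField(cell, delimiter)).join(delimiter))
    .join('\n');
}

// Made-up sample rows, only to show the quoting.
const sampleRows = [
  ['Film', 'Year'],
  ['Title; with a semicolon', '2001'],
];
console.log(toCsv(sampleRows));
```

With this in place, toCsv(tableData) could stand in for the map and join calls before writing the file.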

Run the script:

node scraper.js

I’ve written other tutorials here on dev.to about scraping with Go and Python. If this article helped you or you enjoyed it, consider contributing: donate
