DEV Community

milktea02
milktea02

Posted on • Edited on

Google Sheets to track Dividends (How I pack and search the data after webscraping it)

In my last post I wrote about using Google Sheets to webscrape dividend data from dividendhistory.org.

Today I'm going to share with you my adventures to improve efficiency and how to store data using sheets TRANSPOSE() and FLATTEN() functions and search using MATCH().

TOC

The Improved Scrape

Before, to scrape the dividend history table I was using the IMPORTHTML() function but also noted that some times I may make up to two calls because the web page may only have one table instead of two causing essentially an index out of range error.

After reviewing the page source I found out the table has an id:



<table id='dividend_table' class='table table-striped table-bordered'>
  <thead>
    <tr>
      <th>Ex-Dividend Date</th>
      <th>Payout Date</th>
      <th>Cash Amount</th>
      <th>% Change</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><i>2022-10-29</i></td>
      <td><i>2022-11-26</i></td>
      <td><i>$1.33**</i></td>
      <td><i>unconfirmed/estimated</i></td>
    </tr>
    <tr>
      <td><i>2022-07-30</i></td>
      <td><i>2022-08-26</i></td>
      <td><i>$1.33**</i></td>
      <td><i>unconfirmed/estimated</i></td>
    </tr>   
    <tr>
      <td>2022-04-29</td>
      <td>2022-05-26</td>
      <td>$1.33</td>
      <td></td>
    </tr>
...
...
</table>


Enter fullscreen mode Exit fullscreen mode

d'oh!

I can use IMPORTXML() and just query for this table directly!



=IMPORTXML("https://dividendhistory.org/payout/TSX/BMO", "//table[@id='dividend_table']/*/*")


Enter fullscreen mode Exit fullscreen mode

And using that query looks like this in Sheets:
screenshot of how the query would look like in sheets

You can continue to use the INDEX(range, row, col) function to grab the exact cells you want but for myself I wanted to do a few fun things.

Packing the data

Now, I only really need some rows of data and I don't really care for the % Change column, so we use ARRAY_CONSTRAIN(range, rows, cols)



=ARRAY_CONSTRAIN(IMPORTXML("https://dividendhistory.org/payout/TSX/BMO", "//table[@id='dividend_table']/*/*"), 6, 3)


Enter fullscreen mode Exit fullscreen mode

after applying array constrain function we only have 6 rows and 3 columns of data

I want to further pack the data so that it occupies it's own row or column which can be done using TRANSPOSE() and FLATTEN() and you'll see that order will matter:

Examples of flattening and transposing cells on sheets

We want to apply TRANSPOSE() first then FLATTEN():



=FLATTEN(TRANSPOSE(ARRAY_CONSTRAIN(IMPORTXML("https://dividendhistory.org/payout/TSX/BMO", "//table[@id='dividend_table']/*/*"), 6, 3)))


Enter fullscreen mode Exit fullscreen mode

Which makes it look like the columns are combined to a stack (hehe stack):

screenshot of data after applying a transpose and flatten

Searching the Data

If you ever had to take a course on algorithms and data structures you might remember doing a lot of stuff related to pointers and offsets and this is why I stored the data to fit one column!

When am I going to get paid next?

I can make use of the MATCH(key, range, [search_type]) formula to search for a relative offset.

Screenshot of tooltip for using MATCH()

The search_type is useful because I want to find the next payout date. To find the date that is equal to or greater than today I can set the search_type to -1 and luckily for us the dates are already sorted in descending order:

screenshot of tooltip for using MATCH() search_type

Sample of what we're going to do with the data:
Screenshot of the sheet getting the next payout date and the amount as well!

As of writing, TODAY() is 2022-04-08 and we can see that the relative offset of the next payout date (2022-05-26) when searching from B14 to B18 is 3 from the top, so located in B16



=MATCH(TODAY(), B14:B18, -1)


Enter fullscreen mode Exit fullscreen mode

We can also get the exact row where Payout Date starts at using ROW() (useful if you drag things here and there instead of hardcoding):



=ROW(B13)


Enter fullscreen mode Exit fullscreen mode

So now we know that the next payout date is located in cell B(13 + 3 but how can we get the information from that cell address?

We can use the INDIRECT() function to reference a cell using an address that's in TEXT.

Basically we want a generic way to write:



=INDIRECT("B16")


Enter fullscreen mode Exit fullscreen mode

The cool thing is we can use COLUMN and INDIRECT() also accepts address in RyCx format if we set the optional is_A1_notation flag to FALSE. Again I like using & to build strings in sheets:



=INDIRECT("R" & SUM(B1 + B3) & "C" & COLUMN(), FALSE)


Enter fullscreen mode Exit fullscreen mode

And we can do the same for getting the next Cash Amount by using a different starting offset but same relative offset:



=INDIRECT("R" & SUM(B2 + B3) & "C" & COLUMN(), FALSE)

Enter fullscreen mode Exit fullscreen mode




What about the other data?

In my last post I also scraped the yield. I also did this and with a bit of data massaging also tacked it onto the column as you can see here in my raw data spreadsheet example:

poor resolution of my data spreadsheet

So ultimately I end up making only two calls to dividend history per ticker symbol which is much better than up-to two calls for the dividend amount, and one call for the yield. We also get all the useful data from the page.

Here is an example of what you can do with the raw data and only taking the relevant data:

screenshot of a nicer table that uses the raw data as a source

There's all sorts of stuff you can do for example, taking the frequency and calculating how much dividend you could expect from a single share over a year. You can use IF() to decide if the frequency would be 12 (monthly) or 4 (quarterly).

And here is a link to the example if you would like to make a copy and check out the stuff for yourself. It's been pre-populated with Canadian banks and some American companies that pay dividends.

https://docs.google.com/spreadsheets/d/1CNfWRi7pY5zWcQ1htC5m557oBYS8-YKnMtTuStbhTaQ/edit?usp=drivesdk

Top comments (2)

Collapse
 
aguyaz profile image
albert • Edited

Thank you. I cant believe no one has commented on this in 2 yrs. Excellent tutorial on how you used these little known (to me) functions to wrangle all that great stock and dividend information.

One glitch: NASDAQ listings seem to add one line of information to the table id="news-table"

"Next Earnings: ..."

which does not appear in TSX news table and as a result throws off your table at the bottom of Raw Dividend Data

Collapse
 
milktea02 profile image
milktea02

Thanks for your kind comment and catching this bug!