If you've been following along, you know I've been working on a Ruby CLI gem that automates adding WordPress blog data to AirTable. It uses the WordPress API to collect each post's title, publish date, ID, and URL, formats the data, and then uses the AirTable API to create new rows in a specified table.
Last week, I talked about how the gem can only add all the blog post data wholesale to AirTable. It doesn't take into account the rows of data already in the table, meaning we could have duplicate content.
At first, I considered having the gem's sync() method accept flags so that the user could specify how the data was synced: either adding only the new rows, or deleting all the table data first and re-adding the WordPress data wholesale. After some deliberation, I decided to scrap that idea.
New Plan
Instead of accepting flags for the sync method, the gem should get all the data from WordPress, get all the data from AirTable, and compare those two sets of data.
Why?
We could just check what the highest ID in AirTable is and retrieve any WordPress posts with an ID greater than that. Unfortunately, the WordPress API doesn't let us filter requests by a conditional ID (meaning we can't ask it for only the posts with IDs greater than 772), so we would have to request all of the posts anyway.
We could ask the WordPress API to send us only the posts published after a certain date. Unfortunately, WordPress authors can change a post's publish date, meaning we might end up with holes in our data.
Instead, I want to compare the WordPress post IDs from the two whole sets of data. Any IDs that exist in both sets will be updated in AirTable (because maybe we changed a blog post's title to improve its SEO). Any IDs that exist only in the WordPress dataset will be added to AirTable as new rows. Any IDs that exist only in the AirTable dataset will have their rows removed from AirTable (maybe we decided to delete a post that didn't make sense anymore).
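Ruby's array set operators make this three-way split easy to express. Here's a minimal sketch of the idea (partition_ids, wp_ids, and at_ids are hypothetical names, not actual code from the gem):

# Split the WordPress and AirTable ID sets into the three buckets described above.
# wp_ids and at_ids are plain arrays of WordPress post IDs.
def partition_ids(wp_ids, at_ids)
  {
    in_both: wp_ids & at_ids,  # update these rows in AirTable
    wp_only: wp_ids - at_ids,  # add these as new rows
    at_only: at_ids - wp_ids   # remove these rows
  }
end

partition_ids([770, 771, 772], [769, 770, 771])
# => {:in_both=>[770, 771], :wp_only=>[772], :at_only=>[769]}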
First Steps
The first step in accomplishing this new workflow is making sure all the data is retrieved from AirTable. AirTable (like WordPress) sends a maximum of 100 records per request. If more records are available, the response includes a top-level key of offset. To get the next page of data, you send that offset value along with your next request.
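For reference, a parsed paginated response looks roughly like this (the record IDs and field values here are made up for illustration):

{
  "records" => [
    { "id" => "recXXXXXXXXXXXXXX", "fields" => { "ID" => 771, "Title" => "Some Post" } },
    # ...up to 100 records...
  ],
  "offset" => "itrXXXXXXXXXXXXXX/recXXXXXXXXXXXXXX"
}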
How I accomplished it
def collect_row_data
  row_data = []
  offset = ""
  # Keep paging through AirTable until a response comes back without an offset.
  loop do
    at_response = call_at("", offset)
    # Map each record to a one-pair hash: AirTable row id => WordPress post ID.
    row_data.push(at_response.parsed_response["records"].collect { |post| { post["id"] => post["fields"]["ID"] } })
    offset = at_response.parsed_response["offset"]
    break unless offset
  end
  row_data.flatten
end
I decided to use loop with a trailing break (Ruby's version of a do-while loop) to call the AirTable API for as long as there is an offset value in the response. For every record in each response, we create a new hash, with the key being the AirTable row's id and the value being the WordPress post ID. If a response doesn't come back with an offset key, we break out of the loop. This way, we know we will always call the AirTable API at least once. Finally, we flatten our array of arrays and return a flat array of those one-pair hashes.
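The call_at helper isn't shown here, but since parsed_response points to HTTParty, a minimal sketch of it might look like the following. The base ID, table name, and environment variable are placeholders, not the gem's actual configuration:

require 'httparty'

def call_at(path, offset)
  # Only pass the offset back once AirTable has given us one.
  query = offset.to_s.empty? ? {} : { offset: offset }
  HTTParty.get(
    "https://api.airtable.com/v0/#{BASE_ID}/#{TABLE_NAME}#{path}",
    headers: { "Authorization" => "Bearer #{ENV['AIRTABLE_API_KEY']}" },
    query: query
  )
end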
Next Steps
Next, I'll work on a method for comparing our two datasets. Then we'll send each array of IDs ([exist in both], [only in AT], [only in WP]) to methods that will handle updating, deleting and adding AirTable rows.
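Tying it together, the eventual sync flow might look something like this rough sketch; collect_wp_ids, update_rows, delete_rows, and add_rows are hypothetical names for methods that don't exist yet:

def sync
  wp_ids = collect_wp_ids                       # hypothetical: IDs pulled from WordPress
  at_ids = collect_row_data.flat_map(&:values)  # just the WordPress post IDs from AirTable
  update_rows(wp_ids & at_ids)  # exists in both: update the AirTable row
  add_rows(wp_ids - at_ids)     # only in WordPress: create a new row
  delete_rows(at_ids - wp_ids)  # only in AirTable: remove the row
end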