So this year I am going into Transition Year, and we were advised to look for work experience, as it is part of the Transition Year program here in Ireland, which is great.
They provided us with websites that list work experience placements specifically for Transition Year, including CareersPortal (https://careersportal.ie) and Ty.ie (https://ty.ie/).
Then they told us to check these websites daily, as most work experience placements are "first come, first served", which means they are gonna take whoever applies first.
So... as a developer, I had the idea that I could probably detect changes to a website through fetch or cURL, so I wouldn't have to check every day and would still know when a new placement comes up.
Alright so how am I gonna do this? I decided to use my trusty and favorite Node.js and a method that is called "Web Scraping".
Web scraping is the process of using bots to extract content and data from a website.
So firstly I created a loop that would fetch careersportal.ie every minute and show me the HTML.
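// Fetch the search page every minute and print the returned HTML.
// fetch is global in Node.js 18+; older versions need a package such as node-fetch.
setInterval(function () {
  fetch("https://careersportal.ie/workx/student_search.php?s_sector1=8&s_ty=&s_cx_county=6&s_cx_address=", {
    method: "GET",
    headers: {}
  })
    .then(res => res.text())
    .then(text => console.log(text))
    .catch(err => console.error("Fetch failed:", err));
}, 60000);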
Which... Didn't work.
It kept responding with HTML that seemed to be for Google statistics and Drupal, which was quite unusual. After debugging and trying something like PhantomJS, a headless browser used for automating web page interaction, I still had no result. So I decided to dig inside the code of the website in case it had something that blocked the request.
I ended up finding that the whole https://careersportal.ie webpage is just for displaying buttons, and it actually uses an iframe to display all the work experience placements from a different URL, https://cc.careersportal.ie. So I used that URL to fetch everything and it seemed to work!
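A quick way to spot this kind of setup is to list the iframe sources in the HTML you get back. Here is a rough sketch using a regular expression, which is fine for a quick look but is not a full HTML parser. The `findIframeSources` name and the sample HTML are made up for illustration, and the exact iframe path on the real page may differ.

```javascript
// Hypothetical helper: list the src of every <iframe> in an HTML string.
// A regex is only a rough approach; a real parser such as JSDOM is more robust.
function findIframeSources(html) {
  const pattern = /<iframe\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  return [...html.matchAll(pattern)].map((match) => match[1]);
}

// Made-up HTML standing in for the page that was fetched.
const sampleHtml = `
  <div class="buttons"><a href="/search">Search</a></div>
  <iframe id="results" src="https://cc.careersportal.ie/" width="100%"></iframe>
`;

console.log(findIframeSources(sampleHtml));
```

If the list contains a URL on another subdomain, that is probably the page worth fetching instead.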
Now on to the problems with fetching https://ty.ie/
Thankfully, Ty.ie runs on WordPress and gets its work experiences from an API. So I was able to fetch that API to get the companies that had work experience.
While CareersPortal has work placements with descriptions and contact info, Ty.ie lists companies it is affiliated with for TY students to apply to.
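While the fetch of the ty.ie API worked, it only worked for around 1 hour, which is not ideal. I found out that ty.ie uses something called a nonce, which is added by WordPress.
WordPress describes a nonce as a "number used once": a security token it generates to help protect URLs and forms from misuse. In practice, WordPress nonces are valid for a limited time rather than for a single use, which is why my requests worked for a while and then stopped.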
So I had to think about how to get around this, as I realized that I needed to provide a valid nonce or else the API would return an error.
After testing and looking into the code of the website, it looks like when you open https://ty.ie/ it creates a nonce that is stored in a variable called "um_scripts", which the website then uses to access the API.
With that info, I am able to extract the nonce from the HTML using JSDOM, an npm package that parses HTML and lets you interact with it just like a browser would. That way I can use code to read that variable and then call the API myself with that nonce to get the companies!
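To give an idea of what that extraction looks like, here is a minimal sketch. It uses a plain regular expression instead of JSDOM to keep it dependency-free, so it is not the exact code from my project. It assumes the page contains an inline script like `var um_scripts = {...};` holding JSON, and that the token lives in a property called `nonce`; the helper name and the sample page are made up.

```javascript
// Hypothetical helper: pull the nonce out of an inline `um_scripts` variable.
// Assumes the page contains something like: var um_scripts = {"nonce":"..."};
function extractNonce(html) {
  const match = html.match(/var\s+um_scripts\s*=\s*(\{.*?\});/s);
  if (!match) return null; // variable not found in the page
  try {
    const settings = JSON.parse(match[1]);
    return settings.nonce ?? null; // "nonce" as the property name is an assumption
  } catch {
    return null; // the object was not valid JSON
  }
}

// Made-up page fragment for illustration.
const samplePage = `
  <script>
  var um_scripts = {"max_upload_size":"1048576","nonce":"a1b2c3d4e5"};
  </script>
`;

console.log(extractNonce(samplePage));
```

The value returned would then be sent along with the API request, in whatever form the endpoint expects. Since the nonce expires, it makes sense to grab a fresh one from the page before each round of requests.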
Which seems to work!
Now we need to check if there is any difference in the pages. I am able to detect that by saving the previous fetch response in a txt file and then comparing the two responses to see if anything changed, using the following code:
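// Remove every occurrence of diffBy from diffMe; identical pages leave an empty string.
const diff = (diffMe, diffBy) => diffMe.split(diffBy).join('')
// oldpage is the saved response, newpage is the latest fetch
const differences = diff(oldpage, newpage)
if (differences === "") {
  // No Change
} else {
  // Something changed
}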
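The saving and comparing step around that check could look roughly like the sketch below. It uses a plain string comparison instead of the split/join trick, and the `hasChanged` name, the temporary file, and the page contents are all made up for the example.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: compare a new response with the one saved last time,
// then save the new one so the next check compares against it.
function hasChanged(snapshotFile, newPage) {
  const oldPage = fs.existsSync(snapshotFile)
    ? fs.readFileSync(snapshotFile, "utf8")
    : null; // no snapshot yet, so this is the first run
  fs.writeFileSync(snapshotFile, newPage, "utf8");
  // Treat the first run as "no change" so it does not trigger an alert.
  return oldPage !== null && oldPage !== newPage;
}

// Example with a fresh temporary folder and made-up page contents.
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "snapshot-"));
const snapshotFile = path.join(tempDir, "careersportal.txt");
console.log(hasChanged(snapshotFile, "<li>Placement A</li>")); // first run
console.log(hasChanged(snapshotFile, "<li>Placement A</li>")); // same content
console.log(hasChanged(snapshotFile, "<li>Placement A</li><li>Placement B</li>")); // new item
fs.rmSync(tempDir, { recursive: true, force: true }); // clean up the temporary folder
```

Keeping one txt file per website means each site is compared only against its own previous response.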
If there is any difference, we want to alert the user. I decided to do that by using Discord.js to make a Discord bot that notifies me when there is a change, posting in a specific channel associated with that fetch/request!
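For anyone who wants the notification part without running a full bot, here is a small sketch that swaps in a Discord webhook instead of Discord.js, so it is a different technique from what I used. The function names and the `DISCORD_WEBHOOK_URL` environment variable are hypothetical, and the actual send is left commented out.

```javascript
// Build the JSON body for a Discord webhook message.
// Discord limits message content to 2000 characters, so trim to be safe.
function buildAlert(sourceName, pageUrl) {
  const text = `New change detected on ${sourceName}: ${pageUrl}`;
  return { content: text.slice(0, 2000) };
}

// Post the alert to a webhook URL (created in the Discord channel settings).
// Uses the global fetch available in Node.js 18+.
async function sendAlert(webhookUrl, payload) {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Webhook request failed: ${res.status}`);
}

const alert = buildAlert("CareersPortal", "https://careersportal.ie");
console.log(alert.content);
// sendAlert(process.env.DISCORD_WEBHOOK_URL, alert); // hypothetical env variable
```

Using one webhook per channel gives the same "one channel per website" layout as the bot.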
Here is a diagram of how it all works basically:
This was a pretty interesting project, as it had some interesting problems and I learned a lot, such as how to get information from a website, new packages that I hadn't used before such as JSDOM, and how to connect everything together using Node.js!
Originally Posted on: https://blog.arisamiga.rocks/post/tyexperience/