There's a tool for everything nowadays
A colleague showed me something today that I had never come across before, and I was impressed:
Google Dataset Search
A search engine (powered by Google, who aren't too bad at that search thing) that returns a semi-curated list of datasets available on the web, regardless of where they are hosted!
(It's been around for quite a while now too!)
So what?
One of the biggest problems with both learning and understanding topics like machine learning and big data analytics is getting access to large datasets.
Lots of sites (such as Kaggle) have made awesome inroads into making datasets more accessible, but they can't possibly host everything.
And that's where proper indexing and search can help.
Google has a good track record of building popular search engines. But it's the approach behind Dataset Search that I'm more interested in:
Standardisation - it's up to dataset owners to describe their dataset in a specific, machine-readable format so that it can be indexed, which means it can be found more easily and more precisely.
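To make that a bit more concrete: I haven't named the format above, but Google's guidance for publishers describes schema.org `Dataset` markup, typically embedded in the dataset's landing page as JSON-LD. Here is a minimal sketch of generating that kind of metadata. The helper names and all of the example values (dataset name, URLs, keywords) are made up for illustration, and a real dataset page would likely want more properties than this.

```python
import json


def dataset_jsonld(name, description, url, license_url=None, keywords=None):
    """Build schema.org Dataset metadata as a JSON-LD string.

    Hypothetical helper for illustration; only a handful of the
    properties defined by schema.org/Dataset are covered here.
    """
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Optional properties are only emitted when a value was supplied.
    if license_url:
        metadata["license"] = license_url
    if keywords:
        metadata["keywords"] = list(keywords)
    return json.dumps(metadata, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script tag that goes into the page's HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    # Placeholder values, not a real dataset.
    print(as_script_tag(dataset_jsonld(
        name="Example programming language survey",
        description="Made-up survey responses about programming language usage.",
        url="https://example.com/datasets/language-survey",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        keywords=["programming", "survey"],
    )))
```

The printed `<script>` block would be pasted into the HTML of the page that describes the dataset, which is what gives a crawler something structured to index.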
Give me an example!
Okay, try searching for "programming":
https://datasetsearch.research.google.com/search?query=Programming
What do we see?
- Three different datasets.
- Three potential project ideas.
- Three different data sources.
It's that last one that works for me - I don't need to go through curated lists of dataset sources or vet a dataset I found in a Reddit post. I can just search.
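The search link above is just a base address plus a `query` parameter, so you can build one for any topic. A tiny sketch, assuming the URL pattern stays the same as in the "programming" link (the function name is my own):

```python
from urllib.parse import urlencode

# Base address taken from the example search link above.
BASE_URL = "https://datasetsearch.research.google.com/search"


def dataset_search_url(query):
    """Return a Dataset Search link for the given search terms."""
    # urlencode escapes spaces and special characters in the query.
    return f"{BASE_URL}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(dataset_search_url("machine learning"))
```

This only builds a link to open in a browser; it doesn't fetch or parse any results.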
Let me know what you think!
- Have you used dataset search?
- Where do you get your datasets?
- What do you use public datasets for?
Tom Anderson
www.thomas-anderson.net