Kaggle is basically a Reddit for us data people.
Have a question? Ask it.
Made something? Share it.
Like someone else's work? Upvote it!
It's an information dreamland and the best place to share your data innovations.
And this is my story about my data innovation that was born out of a very niche requirement.
The Accidental Viral Dataset(s)
Now, let's get one thing straight - I never set out to make a viral dataset. I just wanted to solve a problem, and in doing so, I ended up creating something that resonated with more people than I expected.
Dataset #1: Dog Intelligence Comparison Based on Size
Yes, you read that right. A dataset that compares dogs' intelligence based on their size. Because why not? Who doesn't want to know if their tiny Pomeranian is secretly a genius or if their big, goofy Labrador is just… well, big and goofy?
For one of my college assignments, I wanted to do something different - something that would grab my classmates' attention and add a little humor to the daily grind of lectures. So, instead of a standard dataset, I decided to explore the world of dog intelligence.
I compiled data from various sources, structured it, and made it accessible. Before I knew it, the dataset gained traction, with people analyzing it for fun and even using it for machine learning models. I mean, imagine training an AI to predict if your future dog will outsmart you - absolutely worth it.
Currently at 16K views and a whopping 3.3K all-time downloads!
Dataset #2: Indian Names Corpus (NLTK Data)
This one came from a more practical need. While working on an NLP project during my internship at HSBC, I discovered a frustrating gap - there was no comprehensive Indian Names corpus available online. The ones I did find were either too small or lacked last names entirely.
So, I decided to build my own. Using manual data collection techniques and a bit of Python magic (a.k.a. BeautifulSoup), I wrote a script that scraped first names and last names from various baby name websites. The goal was simple: create a structured, accessible dataset for anyone working on NLP projects involving Indian names.
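For the curious, here is a minimal sketch of what that kind of scraping step can look like. It is not the original script: the sample HTML, the `li` / `name` markup, and the `extract_names` helper are all assumptions made up for illustration, and real baby name websites will be structured differently. It parses an inline HTML string so that it runs without touching the network.

```python
from bs4 import BeautifulSoup

# Hypothetical page markup, invented for this example.
# Real baby-name sites will use different tags and classes.
SAMPLE_HTML = """
<ul class="names">
  <li class="name">Aarav</li>
  <li class="name"> Diya </li>
  <li class="name">aarav</li>
  <li class="name">Kabir</li>
</ul>
"""


def extract_names(html: str) -> list[str]:
    """Return unique, title-cased names in the order they first appear."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    names = []
    # Assumption: every name sits in an <li class="name"> element.
    for tag in soup.find_all("li", class_="name"):
        name = tag.get_text(strip=True).title()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


first_names = extract_names(SAMPLE_HTML)
print(first_names)  # ['Aarav', 'Diya', 'Kabir']
```

Normalising the case and dropping duplicates at parse time is one reasonable choice when the same name shows up on several pages. Before pointing something like this at a live site, it is worth checking that site's terms of use.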
Turns out, I wasn't the only one who needed it. This dataset quickly became a go-to for people developing NLP models for chatbots, identity verification, and text processing. The demand for diverse datasets is real, and I was happy to contribute something meaningful.
Even after 2 years, this one was still downloaded 34 times in the last 30 days!
One day, I casually uploaded a dataset. The next, I was watching it take off like a rocket. Kaggle's algorithm pushed it to more users, people started using it in their projects, and suddenly, I had my first (and then second) viral dataset.
My datasets didn't just find a home on Kaggle - they sparked curiosity, inspired new analyses, and led to some fascinating notebooks created by other users exploring them in unexpected ways.
Lessons from the Kaggle Fame
1️⃣ The niche is your best friend - You don't have to create the biggest dataset on Kaggle. Sometimes, a hyper-specific dataset serves a bigger purpose.
2️⃣ Data science needs diversity - Whether it's dog intelligence or Indian names, there's always a gap to fill. Find it.
3️⃣ Kaggle is more than competitions - It's a space to share, collaborate, and innovate. Uploading my datasets wasn't just about sharing data - it was about contributing to a community that thrives on curiosity.
What's Next?
Now that I’ve experienced the excitement of sharing datasets on Kaggle, will there be more? Definitely. What’s coming next? Well, you’ll just have to wait and see. 😉
Till then, if you haven't already, check out my datasets here:
🐶 Dog Intelligence Comparison Based on Size
📜 Indian Names Corpus (NLTK Data)
👀 Hair Eye Color
And if you're on Kaggle, say hi! Let's geek out over data together. 🚀
Stay tuned for more content on how to build a Kaggle profile and thrive in the Data World!