DEV Community

gerry leo nugroho

Gemika’s Enchanted Guide to Iris Dataset with Magic and Machine Learning 🌟🧙‍♂️ (Part #6)

Greetings, my fearless young sorcerers! I’m Professor Gerry Leo Nugroho, your guide through the enchanted halls of data science at Hogwarts, and a trusted comrade of Albus Dumbledore. Last time, we ventured into Hagrid’s hut and tamed the wild Iris Dataset like a pack of rowdy dragons—cleaning gaps and scaling numbers until they purred like kittens! 🐉

Now, my little Gryffindor champion, Gemika Haziq Nugroho, and I are grasping the Elder Wand itself. It’s time to split our magical scroll into two mighty realms—ready for a wizardly challenge! ✨📜


Chapter 6: The Elder Wand’s Power: Splitting the Dataset 🪄⚡

Imagine standing atop the Astronomy Tower, the Elder Wand humming in your hand, its ancient power crackling like a storm over the Black Lake. With a single flick, we’ll divide the Iris Dataset into two enchanted groups: training and testing. 🌺

Think of it as splitting a dragon’s hoard: most of the treasure (80%) goes to train our spells, while a precious chunk (20%) is locked away to test their might. The flowers are shuffled at random before the cut, but this is no careless chop. It’s a sacred art: the test flowers stay hidden during training, so we can judge honestly whether our magic grows strong and true, like Dumbledore facing Grindelwald in a duel of legends! ⚡🪞 Together, we’ll wield this power to make our Irises bow to our command!


6.1 The Code & Algorithm: The Spell of Division

Let’s summon our spellbook (or Jupyter Lab) and cast the mighty train_test_split charm from sklearn. This spell slices our dataset with the precision of a phoenix feather! Here’s the magic, with a wink to my eager Gemika:

# Summoning the Elder Wand’s tools
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Loading our tamed Iris scroll
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Preparing the wand—features (X) and labels (y)
X = iris_df.drop('species', axis=1)  # The flower traits
y = iris_df['species']               # The flower names

# Spell: Split the dataset—like a flick of the Elder Wand!
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Peeking at our realms
print("Training Realm (80%):", X_train.shape)
print("Testing Realm (20%):", X_test.shape)

6.2 What’s Happening in the Spell?

  • X and y: We separate the flower traits (X) from their names (y)—like sorting wands from their owners!
  • train_test_split: This charm shuffles our 150 flowers and splits them: test_size=0.2 sends 20% (30 flowers) to testing and leaves 80% (120 flowers) for training. The random_state=42 is our magical seed, ensuring the shuffle, and so the split, is the same every time, like a spell etched in stone!
  • .shape: Shows the size of each realm—like counting knights for a battle!

Run this, and you’ll see:

Training Realm (80%): (120, 4)  
Testing Realm (20%): (30, 4)  

Four columns for traits, split into two groups—our Elder Wand has spoken! 🌟✨
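
Curious whether the magical seed really etches the split in stone? Here is a minimal sketch that casts the split twice with random_state=42 and once with a different seed, then compares which flowers landed in the testing realm. The seed 7 and the variable names are illustrative assumptions, not part of the original spell.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Rebuild the same features (X) and labels (y) as in the spell above
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target_names[iris.target], name="species")

# Cast the split twice with the same seed, and once with another seed
# (7 is an arbitrary choice for illustration)
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)
split_c = train_test_split(X, y, test_size=0.2, random_state=7)

# Position 1 of each result is X_test; its index records which flowers were picked
same_seed_matches = split_a[1].index.equals(split_b[1].index)
different_seed_matches = split_a[1].index.equals(split_c[1].index)

print("Same seed, same test flowers:", same_seed_matches)
print("Different seed, same test flowers:", different_seed_matches)
print("Sizes:", len(split_a[0]), "train /", len(split_a[1]), "test")
```

Keeping the seed fixed means anyone rerunning the notebook gets the same two realms, which makes later results comparable from one run to the next.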


6.3 🧙‍♂️ The Scrolls of Wisdom Speak: A Deep Dive into the Mystical Iris Realm

As the last echoes of our incantations fade, we turn our gaze upon the ancient scrolls—our dataset, now revealed in full splendor. What secrets does it whisper? What arcane truths have we uncovered? Let us don our enchanted spectacles and peer into the magical insights before us!

6.3.1 🔮 The Grand Reveal: A Dataset Full of Mystery

Behold! The first glance at our Iris scrolls—rows upon rows of sacred floral measurements. The wise botanists of old have meticulously recorded the spells of sepal length, sepal width, petal length, and petal width, weaving them together with the names of three magical species: Setosa, Versicolor, and Virginica.

But alas! Not all is as simple as it seems. We must look deeper, beyond the mere ink upon parchment, to find the true essence of each flower.


6.3.2 📜 The Grand Split: Knowledge Divided for Training and Testing

Like the sorting of first-years into their Hogwarts houses, we have split our data into two realms:

  • 1️⃣ The Training Realm (80%) – A vast, sprawling expanse where young models shall hone their magical abilities.
  • 2️⃣ The Testing Realm (20%) – A secretive, untamed land where knowledge will be tested against unseen challenges.

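One might ask: why this division? Ah, young wizard, just as students must study before their O.W.L. exams, our machine learning spells must first be trained before facing the great unknown! And just as an exam made only of questions already seen in class proves nothing, a model must be judged on flowers it never met during training.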
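
One thing a plain random split leaves to chance is how many flowers of each species land in each realm. As a hedged aside, train_test_split accepts a stratify argument that keeps the species proportions equal on both sides. The sketch below is an optional variation, not the split used in this chapter, and it rebuilds X and y so it can run on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target_names[iris.target], name="species")

# stratify=y asks the split to preserve each species' share in both realms
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count the flowers of each species on each side of the split
train_counts = y_train.value_counts().sort_index()
test_counts = y_test.value_counts().sort_index()
print("Training realm per species:")
print(train_counts)
print("Testing realm per species:")
print(test_counts)
```

With 50 flowers per species in the full scroll, a stratified 80/20 split gives every species 40 training flowers and 10 testing flowers, so no house is short-changed at exam time.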


6.3.3 🎨 The Grand Iris Visualization: A Spellbinding Revelation

With a flick of our wands (and the powers of Seaborn), a grand tapestry appears before us—a scatterplot of unparalleled beauty. Colors swirl like house banners in the Great Hall, each dot a flower, each cluster a tale of nature’s design.
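
The plotting spell itself is not shown in this chapter, so here is a minimal sketch of one way to conjure a similar tapestry. It is an assumption on two counts: it uses plain matplotlib as a stand-in for Seaborn, and it plots petal length against petal width, which may differ from the original figure. The output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["species"] = iris.target_names[iris.target]

fig, ax = plt.subplots(figsize=(6, 4))

# One scatter call per species, so each gets its own color and legend entry
for species, group in iris_df.groupby("species"):
    ax.scatter(
        group["petal length (cm)"],
        group["petal width (cm)"],
        label=species,
        alpha=0.8,
    )

ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")
ax.set_title("Iris petals by species")
ax.legend(title="species")

fig.savefig("iris_petals.png", dpi=100)  # hypothetical output file name
```

In Jupyter Lab you can drop the matplotlib.use line and the figure should appear inline instead of only being written to a file.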


6.3.4 🧐 Key Insights from the Arcane Patterns:

📍 The Setosa Revelation – Like a lone castle shrouded in mist, Setosa stands apart, its petal measurements in particular so distinct that even a first-year student could separate it from the others. No confusion, no hesitation: it is an island in a sea of data.

📍 The Versicolor-Virginica Duel – But behold! A battle of shadows unfolds! Versicolor and Virginica, though noble species in their own right, find themselves entangled, their features overlapping like rival spells clashing in midair. Their petal length and width dance too closely, blurring the lines between them. Could a finer incantation—a stronger classifier—separate them more clearly?

📍 The Hidden Doorway to the Future – Our visualization hints at a prophecy: the fate of classification spells yet to come. With the power of machine learning, we could forge a model strong enough to untangle the Versicolor-Virginica conundrum, turning our observations into predictive magic!
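
The insights above can also be checked with numbers rather than by eye. This is a rough sketch that compares the smallest and largest petal length of each species; the helper ranges_overlap is a hypothetical name introduced only for this illustration, and comparing one feature at a time is a simplification of what a real classifier does.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["species"] = iris.target_names[iris.target]

# Smallest and largest petal length recorded for each species
petal_ranges = iris_df.groupby("species")["petal length (cm)"].agg(["min", "max"])
print(petal_ranges)


def ranges_overlap(a, b):
    """Return True when the petal-length ranges of species a and b share any value."""
    return bool(
        petal_ranges.loc[a, "min"] <= petal_ranges.loc[b, "max"]
        and petal_ranges.loc[b, "min"] <= petal_ranges.loc[a, "max"]
    )


print("setosa vs versicolor overlap:", ranges_overlap("setosa", "versicolor"))
print("versicolor vs virginica overlap:", ranges_overlap("versicolor", "virginica"))
```

Setosa's petal lengths sit entirely below Versicolor's, while the Versicolor and Virginica ranges share a stretch in the middle, which is the duel described above.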


6.4 Hogwarts Application: Triwizard Tournament Trials

Picture the Triwizard Tournament—Harry, Cedric, Fleur, and Viktor lined up, wands sparking. Dumbledore booms, “Gerry, divide the Hogwarts students for the First Task!” With train_test_split, we’d split them—80% to practice dodging dragons (training), 20% to face the real roaring beasts (testing).

We’d train our champions on most of the clues—broom skills, spell speed—then test their mettle against the Hungarian Horntail. Our split ensures they’re ready, just like prepping our Irises for magical predictions! 🏆🐉✨


6.5 Gemika’s Quiz Time! 🧑‍🚀

My little Gemika, twirling a stick like it’s the Elder Wand, tilts his head. “Abi,” he wonders, “why do we split the Iris flowers into two groups?” I chuckle—he’s sharper than a Goblin’s blade! What do you think, young wizards?

  • A) To confuse the flowers so they grow upside down.
  • B) To train our magic on most, then test it on a few—like a duel!
  • C) To give half to Hagrid’s dragons as a snack.

Scribble your answer or shout it louder than a Quidditch cheer—Gemika’s all ears! 🗣️📝✨ (Hint: Think Tournament, not treats!)


6.6 Next Chapter: The First Spell Unleashed

Hold your broomsticks, because next we’re casting our first prediction spell—K-Nearest Neighbors! We’ll use our training realm to teach it, then test it on the rest—like finding lost first-years in the castle maze. It’ll be so thrilling, even Peeves might join the fun! Get ready for more magic and a dash of mischief!🌟✨
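
For the impatient, here is a small preview sketch of how the two realms might be used with K-Nearest Neighbors. It is only a guess at the shape of the next chapter: the choice of 5 neighbors and the use of the default accuracy score are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same 80/20 split as in this chapter, using plain arrays for brevity
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: 5 neighbors, chosen here purely as an example value
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)            # learn only from the training realm

accuracy = knn.score(X_test, y_test)  # fraction of test flowers named correctly
print(f"Accuracy on the testing realm: {accuracy:.2f}")
```

The model never sees the testing realm while it learns, so the score is a fair first estimate of how it would handle flowers it has never met.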
