DEV Community

Aniekpeno Thompson

GETTING STARTED WITH MACHINE LEARNING: A BEGINNER’S GUIDE USING SCIKIT-LEARN

Introduction
Machine learning is a subset of artificial intelligence (AI) that enables systems to learn and improve from data without being explicitly programmed. It plays a critical role in data science by providing tools and techniques to make predictions, uncover patterns, and automate decision-making processes.

With machine learning, you don’t have to gather your insights manually. You choose an algorithm, supply the data, and the machine finds the patterns for you. Scikit-learn is one of the libraries we can use to implement machine learning in Python.

It is a free machine learning library that contains simple and efficient tools for data analysis and data mining. In supervised machine learning, tasks are typically divided into two main categories: classification and regression.

Classification involves predicting discrete labels (e.g., spam vs. not spam), while regression predicts continuous values (e.g., house prices).
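
To make the distinction concrete, here is a minimal sketch that fits one classifier and one regressor on the same made-up, one-feature dataset. The data, the threshold of 5, and the relation y = 2x + 1 are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up toy data: a single feature with the values 0..9
X = np.arange(10, dtype=float).reshape(-1, 1)

# Classification target: a discrete label (0 for small values, 1 for large ones)
y_class = (X.ravel() >= 5).astype(int)

# Regression target: a continuous value (here exactly 2*x + 1)
y_reg = 2.0 * X.ravel() + 1.0

clf = LogisticRegression().fit(X, y_class)
reg = LinearRegression().fit(X, y_reg)

X_new = np.array([[1.5], [8.5]])
class_pred = clf.predict(X_new)  # discrete class labels
reg_pred = reg.predict(X_new)    # continuous values

print("Predicted labels:", class_pred)
print("Predicted values:", reg_pred)
```

The classifier can only answer with one of the labels it saw during training, while the regressor returns a number on a continuous scale.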

What is Machine Learning?
Machine learning revolves around algorithms that improve through experience and data. Depending on the type of data and problem, machine learning can be broadly classified into two types:

Supervised Learning: In supervised learning, the model learns from labeled data, where input-output pairs are provided. The algorithm uses the training dataset to learn a mapping function from the input variable (X) to the output variable (Y). Examples include:
Classification: Predicting categories, as in sentiment analysis (positive/negative).
Regression: Predicting continuous outcomes like stock prices.

Supervised learning is also known as predictive modeling, which is the process of making predictions from data. Common algorithms include Linear Regression, Logistic Regression, Decision Trees, Random Forests, and the Naive Bayes classifier.
Later in this article, we look at a use case of supervised learning in which we train a model using logistic regression.

Unsupervised Learning: This is a process where a model is trained on data that is not labeled. It can be used to cluster the input data into groups on the basis of their statistical properties.
A common form of unsupervised learning is cluster analysis, which means grouping objects based on the information found in the data describing the objects or their relationships.

The goal is that objects in one group should be similar to each other but different from objects in another group. Algorithms include K-means clustering and hierarchical clustering.
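
As a rough illustration of clustering, the sketch below runs K-means on the Iris measurements while ignoring the labels entirely. The choice of three clusters and the `n_init` and `random_state` values are assumptions made for this example, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Use only the features; the species labels are deliberately ignored
X = load_iris().data

# Assumption for this sketch: ask for 3 groups
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print("Cluster assigned to the first 10 flowers:", cluster_ids[:10])
print("Shape of the cluster centers:", kmeans.cluster_centers_.shape)
```

The cluster numbers are arbitrary identifiers. They group similar flowers together but do not correspond to species names.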

Introduction to Scikit-Learn
Scikit-Learn is a Python library designed for efficient and straightforward implementation of machine learning algorithms. It is highly regarded for its simplicity, consistency, and extensive functionality. Key features include:
Preprocessing tools for preparing data.
A wide range of algorithms for classification, regression, and clustering.
Model evaluation and validation techniques.
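
The three feature groups listed above can be combined in a few lines. The following is only a sketch: the choice of `StandardScaler`, KNN with 3 neighbors, and 5-fold cross-validation are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing (scaling) and an algorithm (KNN) chained into one estimator
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

# Validation: 5-fold cross-validation returns one accuracy score per fold
scores = cross_val_score(pipeline, X, y, cv=5)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Putting the scaler inside the pipeline means it is refit on the training part of each fold, so no information from the held-out part leaks into preprocessing.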

This article discusses Scikit-learn in Python. Before working with Scikit-learn, you should understand the concept of machine learning introduced above. I will take you through the following topics, which serve as fundamentals for the upcoming articles:

Overview of Scikit Learn
Scikit-learn is a library used to perform machine learning in Python. It is open source, licensed under the BSD license, and reusable in various contexts, which encourages both academic and commercial use. It provides a range of supervised and unsupervised learning algorithms in Python.
Scikit-learn implements many popular algorithms. It is built on NumPy and SciPy, and is commonly used alongside Matplotlib for plotting:
NumPy
Matplotlib
SciPy (Scientific Python)

To use Scikit-learn, the above packages first need to be installed. You can install them from the command line or, if you are using PyCharm, directly from the settings, in the same way you do for other packages. Next, in a similar manner, install Scikit-learn itself, which is imported in code as sklearn. Scikit-learn is built upon SciPy (Scientific Python), which must be installed before you can use Scikit-learn. You can refer to this website to download the same. If SciPy and the wheel package are not present, you can install SciPy with the command below:

pip install scipy
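
Once installation finishes, one quick way to confirm that everything is importable is to print the installed versions. This is a small sketch that assumes NumPy, SciPy, and Scikit-learn are already installed in the active environment; the `versions` dictionary is just a helper for this example.

```python
import numpy
import scipy
import sklearn  # the scikit-learn package is imported under the name "sklearn"

# Collect the version string reported by each package
versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any of the imports fails with a ModuleNotFoundError, that package is missing from the environment and needs to be installed first.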

After importing the above libraries, let’s dig deeper and understand how exactly Scikit learn is used.

Scikit-learn comes with sample datasets, such as iris and digits. You can import the datasets and play around with them. After that, you can import SVM, which stands for Support Vector Machine. SVM is a supervised learning algorithm used for classification and regression, and it is just one of the many popular algorithms Scikit-learn has to offer.

We have now covered the basics of the Scikit-learn library, so you can start practicing. The more you practice, the more you will learn.
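
A minimal sketch of that workflow, loading the digits dataset and training a support vector classifier on it, could look like this. The `gamma` and `C` values and the decision to hold out the last 10 images are assumptions made for illustration, not tuned settings.

```python
from sklearn import datasets, svm

# Load the bundled digits dataset (8x8 images flattened to 64 features)
digits = datasets.load_digits()

# Assumed hyperparameters, chosen only for this sketch
clf = svm.SVC(gamma=0.001, C=100.0)

# Train on everything except the last 10 images
clf.fit(digits.data[:-10], digits.target[:-10])

# Predict the digits shown in the 10 held-out images
predicted = clf.predict(digits.data[-10:])

print("Predicted:", predicted)
print("Actual:   ", digits.target[-10:])
```

Comparing the two printed rows gives a first impression of how well the classifier recognizes images it has not seen during training.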

Do look out for other articles in this series which will explain the various other aspects of Python and Data Science.  

How to Use Scikit-Learn in Python?

Here’s a small example of how Scikit-learn is used in Python for Logistic Regression:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression().fit(X_train, y_train)

Explanation:
from sklearn.linear_model import LogisticRegression: It imports the Logistic Regression model from scikit-learn’s linear_model module. 

model = LogisticRegression().fit(X_train, y_train): It creates a Logistic Regression classifier object and, because fit returns the fitted estimator, stores the trained classifier in model.

.fit(X_train, y_train): It trains the model using the features in X_train and the corresponding target labels in y_train. This essentially lets the model learn the relationship between the features and the classes they belong to (e.g., spam vs not spam emails).
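
The snippet above assumes that X_train and y_train already exist. A self-contained sketch, using the Iris dataset as a stand-in for your own data, might look like the following. The 70/30 split, `random_state=42`, and `max_iter=1000` are assumptions for this example; the larger iteration limit is there only to give the solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: Iris features (X) and species labels (y)
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create the classifier and train it in one chained call
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# score() reports the mean accuracy on the given data
test_accuracy = model.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

Evaluating on data the model did not see during training gives a more honest estimate than scoring it on X_train.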

By now, you should understand what Scikit-learn is and what it is used for. Scikit-learn is a versatile Python library that is widely used for various machine learning tasks. Its simplicity and efficiency make it a valuable tool for beginners and professionals.

Building Your First Model
Let’s walk through building a simple classification model using Scikit-Learn and the Iris dataset.

Step 1: Import Libraries and Load the Dataset

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Step 2: Train a Model
We will use the K-Nearest Neighbors (KNN) algorithm:

# Initialize the model
knn = KNeighborsClassifier(n_neighbors=3)
# Train the model
knn.fit(X_train, y_train)

Step 3: Make Predictions

# Make predictions
predictions = knn.predict(X_test)

Evaluating the Model
Evaluation is crucial to understand how well your model performs. Scikit-Learn provides several metrics:

Step 1: Import Metrics

from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report

Step 2: Calculate Metrics

# Accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")

# Detailed classification report
print(classification_report(y_test, predictions))

Accuracy: The proportion of all predictions that are correct.
Precision: Of the instances predicted as a given class, the fraction that actually belong to that class.
Recall: Of the instances that actually belong to a given class, the fraction that the model correctly identified.
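
To see how these three definitions translate into numbers, the sketch below computes them by hand from counts of true/false positives and negatives, then compares the results with Scikit-learn's own functions. The labels are made up for illustration, with 1 treated as the positive class.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up binary labels (1 = positive class), for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives

accuracy = (tp + tn) / len(y_true)  # correct predictions / all predictions
precision = tp / (tp + fp)          # correct positives / predicted positives
recall = tp / (tp + fn)             # correct positives / actual positives

print("By hand:", accuracy, precision, recall)
print(
    "sklearn:",
    accuracy_score(y_true, y_pred),
    precision_score(y_true, y_pred),
    recall_score(y_true, y_pred),
)
```

Both lines should report the same three values, which is a useful sanity check when learning what each metric measures.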

Conclusion

Understanding basic machine learning concepts and building simple models are the first steps toward mastering this exciting field. Scikit-Learn’s simplicity and robustness make it an ideal starting point. By experimenting with models like KNN and Logistic Regression, you build a strong foundation for tackling more complex algorithms and techniques in the future.

Useful Resources
Scikit-Learn Documentation: https://scikit-learn.org/stable/
Machine Learning Crash Course by Google: https://developers.google.com/machine-learning/crash-course
Kaggle’s Machine Learning Tutorials: https://www.kaggle.com/learn/machine-learning
