If you are reading this blog post, you are probably already aware of the importance of machine learning (ML). Applications of machine learning grew rapidly in 2023, driven by high demand in both business and academia. There is a wide variety of ML algorithms, which can be categorised into three broad groups:
- Supervised Learning algorithms model the relationship between features (independent variables) and a label (target) given a set of observations. Then the model is used to predict the label of new observations using the features.
- Unsupervised Learning algorithms try to find the structure in unlabelled data.
- Reinforcement Learning works based on an action-reward principle. An agent learns to reach a goal by iteratively calculating the reward of its actions.
What Is Data Labelling?
In ML, data labelling is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful and informative labels to provide context so that a machine learning model can learn from it. For example, labels might indicate whether a photo contains a cow or a cup, which words were articulated in an audio recording, or if an x-ray contains a tumour. Data labelling is required for a range of use cases including computer vision, natural language processing, and speech recognition.
In this blog post, we will give you an overview of supervised machine learning algorithms that are commonly used.
1. Introduction
1.1 Definition
Supervised learning is a machine learning approach where algorithms learn from labelled data to identify patterns and relationships. Labelled data provides the algorithm with the correct answers during training, which lets it recognise patterns, generalise, and make predictions on new, unseen data.
Without labelled data, supervised learning lacks the guidance for informed decisions. Leveraging labelled data has led to remarkable success in domains like spam detection and medical diagnosis.
1.2 Importance and Applications
Supervised learning has had a significant impact across industries. In healthcare, it supports medical diagnosis by analysing patient data, and it is used in drug discovery, where pattern recognition helps speed up the identification of candidate drugs. In finance, it assists in fraud detection by flagging suspicious patterns in real time, and in credit scoring, where it estimates creditworthiness to inform lending decisions.
In marketing, it can assign customers to predefined segments for targeted campaigns and predict customer churn, allowing proactive retention strategies. It also powers email spam filtering, image recognition tasks such as object classification and facial recognition, and sentiment analysis, where a model determines sentiment polarity to provide insight into public opinion. These applications show how supervised learning can improve accuracy, efficiency, and decision-making across industries.
2. Understanding Supervised Learning
2.1 Key Concepts
In supervised learning, the dataset includes input features (independent variables) and target labels (dependent variables). The algorithm analyses the labelled data to extract patterns and relationships between the input features and target labels. During training, the algorithm adjusts its parameters to minimise the difference between predicted and actual labels. This process enables the algorithm to accurately map input features to target labels.
Once trained, the algorithm can make predictions or classifications on new, unlabelled data by applying the learned patterns. Labelled data is crucial as it allows the algorithm to generalise and make informed decisions on unseen instances. Understanding these concepts is essential for implementing supervised learning algorithms and exploring their applications.
2.2 Types of Supervised Learning
Supervised learning can be categorised into two main types: classification and regression. Let's explore each type in more detail:
Classification:
Classification involves categorising data into predefined categories or classes based on their features. There are two primary types of classification:
- Binary Classification: In binary classification, the data is divided into two classes or categories. For example, classifying emails as spam or non-spam, or determining whether a transaction is fraudulent or legitimate.
- Multi-Class Classification: In multi-class classification, the data is classified into more than two classes. For instance, classifying images into different objects or recognizing handwritten digits from 0 to 9.
Real-world examples of classification problems include sentiment analysis in natural language processing, customer churn prediction, disease diagnosis, and image recognition.
Regression:
Regression involves predicting continuous numerical values based on input features. The goal is to establish a relationship between the independent variables and the dependent variable, enabling us to estimate or predict numeric outcomes accurately. Regression finds applications in various domains, such as predicting housing prices based on features like location, size, and amenities, estimating sales revenue based on marketing expenditure, and forecasting stock prices.
By understanding the distinction between classification and regression, you can comprehend the diverse range of supervised learning tasks and their applications in solving real-world problems.
2.3 The Role of Labelled Data in Supervised Learning
Labelled data plays a crucial role in supervised learning algorithms. Here's why it is so important:
- Providing Correct Answers: Labelled data provides the algorithm with target labels during the training process. This allows the algorithm to learn the relationship between the input features and the desired output. It helps the algorithm make informed decisions and learn from known outcomes.
- Guided Learning: Supervised learning relies on labelled examples to understand and generalise patterns, allowing accurate predictions or classifications on unseen instances.
- Learning Underlying Patterns: Labelled data helps the algorithm identify patterns, dependencies, and correlations, capturing the essence of the data.
- Training for Accuracy: The algorithm adjusts its internal parameters or model structure to minimise the discrepancy between its predicted output and the true labels. The labelled data serves as a training ground where the algorithm learns from its mistakes and makes increasingly accurate predictions or classifications.
Labelled data guides and trains supervised learning algorithms, enabling them to make accurate predictions in real-world scenarios.
3. Exploring Common Supervised Learning Algorithms
3.1 Linear Regression
Linear regression is a popular algorithm used for predicting continuous numerical values based on input features. It assumes a linear relationship between the input features and the target variable. The algorithm aims to find the best-fitting line, that is, the line that minimises the sum of squared differences between the predicted values and the actual target values.
In linear regression, the algorithm learns the coefficients (slopes) and the intercept of the line that represents the linear relationship between the input features and the target variable. It uses a mathematical technique called Ordinary Least Squares (OLS) to estimate the optimal values for these coefficients.
The algorithm's prediction is a weighted sum of the input features, where each feature is multiplied by its corresponding coefficient. The intercept represents the predicted value when all input features are zero.
Let's go through a simple code implementation of linear regression using Python and the Scikit-learn library:
<Import the necessary libraries>
In this example, we have a simple dataset with one input feature (X) and the corresponding target variable (y). We create a linear regression model and train it using the fit method. Then, we use the trained model to make predictions on new data (new_data) using the predict method.
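The original code listing is not reproduced here. As a minimal sketch of the example described above, assuming scikit-learn and NumPy are installed, the following uses a made-up five-point dataset. The values of X, y, and new_data are illustrative assumptions, not the post's original data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one input feature, targets follow y = 2x + 1 exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Create the model and estimate the coefficient and intercept with OLS.
model = LinearRegression()
model.fit(X, y)

# Predict the target for new, unseen feature values.
new_data = np.array([[6.0], [7.0]])
predictions = model.predict(new_data)

print("Coefficient:", model.coef_[0])
print("Intercept:", model.intercept_)
print("Predictions:", predictions)
```

Because the toy targets lie exactly on a line, the fitted coefficient and intercept should come out very close to 2 and 1.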
In linear regression, the coefficients represent the slopes of the line, indicating the relationship between each input feature and the target variable. A positive coefficient indicates a positive relationship, while a negative coefficient suggests a negative relationship. The magnitude of a coefficient gives the change in the predicted target per unit change in that feature, so magnitudes are only comparable across features when the features are on the same scale.
To evaluate the model's performance, various metrics can be used, such as the Mean Squared Error (MSE) or the coefficient of determination (R-squared). These metrics assess the accuracy and goodness-of-fit of the model's predictions to the actual target values. A lower MSE and a higher R-squared indicate better model performance.
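As a rough illustration of these two metrics, the sketch below fits a line to a small, made-up noisy dataset (an assumption for demonstration) and computes MSE and R-squared with scikit-learn's mean_squared_error and r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data: roughly linear, with a little noise added by hand.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# MSE: average squared difference between actual and predicted values.
mse = mean_squared_error(y, y_pred)
# R-squared: share of the variance in y explained by the model.
r2 = r2_score(y, y_pred)

print("MSE:", mse)
print("R-squared:", r2)
```

The metrics here are computed on the training data only to keep the sketch short. In practice they are usually reported on held-out data.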
3.2 Logistic Regression
Logistic regression is an algorithm used for binary classification problems, where the target variable has two possible outcomes (e.g., yes/no, true/false). The algorithm applies a sigmoid function to transform the linear combination of input features into a probability score between 0 and 1.
The algorithm learns the coefficients (weights) associated with each input feature to maximise the likelihood of the observed data. It uses a technique called Maximum Likelihood Estimation (MLE) to estimate the optimal values for these coefficients.
<Import the necessary libraries>
The same example dataset is used. As with linear regression, the model is trained using the fit method, and predictions are then made on new data (new_data).
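The original listing is not available, so here is a minimal sketch under stated assumptions: a single input feature paired with a made-up binary target (logistic regression needs class labels), fitted with scikit-learn's LogisticRegression. It also checks that the predicted probability is the sigmoid of the model's linear score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one input feature and a binary target.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

new_data = np.array([[1.5], [5.5]])
predicted_classes = model.predict(new_data)
# Column 1 of predict_proba holds the probability of the positive class.
predicted_probs = model.predict_proba(new_data)[:, 1]

# The same probabilities, computed by hand: sigmoid of the linear score.
linear_score = model.decision_function(new_data)
sigmoid_probs = 1.0 / (1.0 + np.exp(-linear_score))

print("Classes:", predicted_classes)
print("Probabilities:", predicted_probs)
print("Sigmoid of linear score:", sigmoid_probs)
```

Note that scikit-learn's LogisticRegression applies L2 regularisation by default, so the fitted coefficients are a penalised estimate rather than a pure maximum likelihood estimate.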
In logistic regression, the coefficients represent the impact of each input feature on the log-odds of the positive class. Positive coefficients suggest a positive relationship with the positive class, while negative coefficients suggest a negative relationship. To evaluate the model's performance, metrics such as accuracy, precision, recall, and F1-score can be used. These metrics assess the model's ability to correctly classify instances and its overall predictive performance.
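To make these metrics concrete, the sketch below computes them with scikit-learn on a small set of hypothetical true and predicted labels (invented for illustration, not output from a real model).

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical predictions: 2 true positives, 2 false negatives, 1 false positive.
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)    # correct predictions / all predictions
precision = precision_score(y_true, y_pred)  # true positives / predicted positives
recall = recall_score(y_true, y_pred)        # true positives / actual positives
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```

With these labels, accuracy is 7/10, precision is 2/3, recall is 1/2, and F1 is 4/7, which shows how the four metrics can disagree on the same predictions.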
3.3 Decision Trees
Decision trees are versatile algorithms used for both classification and regression tasks. They learn a series of hierarchical decision rules based on the input features to make predictions. Each internal node represents a decision based on a specific feature, while each leaf node represents a class label or a predicted value. The algorithm splits the data at each internal node based on the feature that maximises the separation of the classes or minimises the impurity of the target variable.
<Import the necessary libraries>
The splitting process continues recursively until a stopping criterion is met, such as reaching a maximum depth or having a minimum number of samples at a leaf node. This hierarchical structure allows decision trees to capture complex relationships and make predictions based on multiple features.
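The post does not name a specific impurity measure. As an illustration, the sketch below uses Gini impurity, which is the default criterion of scikit-learn's DecisionTreeClassifier, to compare two candidate splits on a made-up feature. The helper names gini and split_impurity are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / labels.size
    return 1.0 - np.sum(proportions ** 2)

def split_impurity(feature, labels, threshold):
    """Weighted Gini impurity of the two groups produced by feature <= threshold."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = labels.size
    return (left.size / n) * gini(left) + (right.size / n) * gini(right)

# Hypothetical single feature and binary labels.
feature = np.array([1, 2, 3, 6, 7, 8])
labels = np.array([0, 0, 0, 1, 1, 1])

print(gini(labels))                          # impurity before any split
print(split_impurity(feature, labels, 4.5))  # split that separates the classes
print(split_impurity(feature, labels, 2.5))  # split that leaves one group mixed
```

A tree-building algorithm would prefer the threshold with the lower weighted impurity, here 4.5, since both resulting groups are pure.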
In this example, we have a dataset with two input features (X) and a binary target variable (y). We create a decision tree classifier using the DecisionTreeClassifier class and then train and predict like the previous model.
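The original listing is not reproduced, so the following is a minimal sketch of the example described above, assuming scikit-learn is installed. The eight data points, new_data, and the max_depth and random_state settings are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: two input features and a binary target.
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2],
              [6, 7], [7, 6], [7, 8], [8, 7]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# max_depth is the stopping criterion; random_state makes tie-breaking repeatable.
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

new_data = np.array([[2, 2], [7, 7]])
predictions = model.predict(new_data)

print("Predictions:", predictions)
print("Tree depth:", model.get_depth())
```

Since the two classes in this toy dataset can be separated with a single threshold, the fitted tree needs only one split even though a depth of 2 is allowed.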
Interpreting the output requires examining the splits at each internal node and the corresponding class labels or predicted values at the leaf nodes. This allows us to understand the decision-making process and the rules followed by the algorithm. To evaluate the model's performance, metrics such as accuracy, precision, recall, and F1-score can be used. These metrics assess the model's ability to correctly classify instances and its overall predictive performance.
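One way to examine the learned splits is scikit-learn's export_text, which prints the decision rules as indented text. The sketch below assumes a small made-up dataset and hypothetical feature names.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: two input features and a binary target.
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2],
              [6, 7], [7, 6], [7, 8], [8, 7]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each line shows a threshold test on a feature or the class at a leaf.
rules = export_text(model, feature_names=["feature_0", "feature_1"])
print(rules)
```

Reading the printed rules from top to bottom follows the same path a sample takes from the root of the tree to a leaf.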
3.4 Support Vector Machine
Support Vector Machine (SVM) is a powerful algorithm used for both classification and regression tasks. It aims to find an optimal hyperplane that separates data points belonging to different classes with the maximum margin. In SVM, data points are represented as vectors in a high-dimensional feature space. The algorithm finds a hyperplane that best separates the classes by maximising the distance (margin) between the hyperplane and the nearest data points, known as support vectors.
SVM can handle both linearly separable and non-linearly separable data by using different kernel functions. For non-linear problems, a kernel function implicitly maps the data into a higher-dimensional space, where the classes can become linearly separable. The goal of SVM is to find the hyperplane that achieves the maximum margin while minimising classification errors. This creates a robust decision boundary that generalises well to unseen data.
The same example dataset is used as for the decision tree. We create an SVM classifier using the SVC class with a linear kernel, and then train and predict as with the previous model. After training the SVM model and making predictions, we need to interpret the results and evaluate the model's performance.
Interpreting the result involves understanding the position of the hyperplane and the support vectors in the feature space. The hyperplane separates the classes, and the support vectors are the data points closest to the hyperplane. To evaluate the model's performance, metrics such as accuracy, precision, recall, and F1-score can be used. These metrics assess the model's ability to correctly classify instances and its overall predictive performance.
<Import the necessary libraries>
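The SVM listing is not reproduced in this post. As a minimal sketch of the example described above, assuming scikit-learn is installed and using a made-up two-feature dataset, the following trains an SVC with a linear kernel and inspects the support vectors and hyperplane parameters.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: two input features and a binary target.
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2],
              [6, 7], [7, 6], [7, 8], [8, 7]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = SVC(kernel="linear")
model.fit(X, y)

new_data = np.array([[2, 2], [7, 7]])
predictions = model.predict(new_data)

print("Predictions:", predictions)
# Training points that lie closest to the decision boundary.
print("Support vectors:", model.support_vectors_)
# For a linear kernel, the hyperplane is coef_ . x + intercept_ = 0.
print("Hyperplane weights:", model.coef_, "intercept:", model.intercept_)
```

The coef_ attribute is only available with a linear kernel. With other kernels the boundary is defined in the implicit feature space, so it cannot be read off as a weight vector.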
3.5 Neural Networks
Neural networks, also known as Artificial Neural Networks (ANN), are powerful machine learning algorithms inspired by the structure and function of the human brain. They consist of interconnected layers of artificial neurons, called nodes or units, that work together to process and analyse data.
In a neural network, information flows through the layers from the input layer to the output layer. Each node in a layer receives input signals, performs a computation using weights and activation functions, and passes the output to the next layer. The weights represent the strength of connections between nodes, and the activation functions introduce non-linearities into the model.
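The flow of information described above can be sketched in plain NumPy. This is an illustration only: the layer sizes, the random weights, and the input values are arbitrary assumptions, and no training takes place.

```python
import numpy as np

def relu(z):
    # ReLU activation: negative values become zero.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# One sample with 2 input features (hypothetical values).
x = np.array([0.5, -1.2])

# Randomly initialised weights and zero biases for a 2 -> 3 -> 1 network.
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

# Each layer computes a weighted sum of its inputs, then applies an activation.
hidden = relu(x @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)

print("Hidden layer output:", hidden)
print("Network output:", output)
```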
Neural networks learn by adjusting the weights based on the input data and desired outputs. This process, known as training, involves minimising the difference between the predicted outputs and the actual targets using optimization algorithms such as gradient descent. The network learns to recognize patterns and relationships in the data through iterative training.
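To illustrate weight adjustment by gradient descent, the sketch below trains the simplest possible network, a single sigmoid neuron, on a made-up dataset. The data, the learning rate of 0.5, and the 200 steps are arbitrary assumptions chosen for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: one input feature and a binary target.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(1)      # weight, initialised to zero
b = 0.0              # bias
learning_rate = 0.5
losses = []

for step in range(200):
    # Forward pass: predicted probability of the positive class.
    p = sigmoid(X @ w + b)
    # Binary cross-entropy loss between predictions and targets.
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    losses.append(loss)
    # Gradients of the loss with respect to the weight and the bias.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # Move the parameters a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("First loss:", losses[0])
print("Last loss:", losses[-1])
```

Libraries such as Keras automate the gradient computation for networks with many layers, but the update rule follows the same idea.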
Neural networks can be shallow with only a few layers or deep with many hidden layers. Deep neural networks, also known as deep learning models, have achieved remarkable success in various domains, including image and speech recognition, and natural language processing.
<Import the necessary libraries like numpy and keras>
The same example dataset is used as for the decision tree. We create a neural network model using the Sequential class from Keras. The model consists of two fully connected layers (Dense layers) with ReLU and sigmoid activation functions. We compile the model with binary cross-entropy loss and the Adam optimizer, and then train and predict as with the previous model.
Interpreting the output involves understanding the learned weights and biases of the model and their impact on the computations within each node. It also involves analysing the activation patterns and the overall flow of information through the network. To evaluate the model's performance, metrics such as loss and accuracy can be used. Loss measures the discrepancy between the predicted outputs and the actual targets, while accuracy assesses the model's ability to correctly classify instances.
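The original listing is not reproduced here. The following is a minimal sketch of the model described above, assuming Keras is installed with a working backend such as TensorFlow. The dataset, the 4 hidden units, and the 50 epochs are illustrative assumptions.

```python
import numpy as np
import keras
from keras import layers

# Hypothetical data: two input features and a binary target.
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2],
              [6, 7], [7, 6], [7, 8], [8, 7]], dtype="float32")
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype="float32")

# Two fully connected layers: ReLU in the hidden layer, sigmoid at the output.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=50, verbose=0)

new_data = np.array([[2, 2], [7, 7]], dtype="float32")
# Each output is the predicted probability of the positive class.
probabilities = model.predict(new_data, verbose=0)
print(probabilities)
```

Because the weights are initialised randomly, the predicted probabilities vary from run to run, especially with so little data and so few epochs.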
4. Practical Example: Email Spam Classification with Neural Network
In this practical example, we will demonstrate how to build a neural network model for email spam classification using a basic fictitious dataset. The example will walk you through the steps of loading and preprocessing the dataset, feature extraction and transformation, training the model, and evaluating its performance.
4.1 Loading and Preprocessing the Email Dataset
We start by loading our fictitious email dataset, which consists of a collection of labelled emails classified as either spam or non-spam. Let's consider a small dataset with 100 emails, where 60 emails are labelled as spam (1) and 40 emails are labelled as non-spam (0). Each email is represented as a text string, and the corresponding label indicates its classification.
4.2 Feature Extraction and Transformation
To transform the email text into numerical features, we employ a simple bag-of-words approach. We create a vocabulary of unique words present in the dataset and represent each email as a binary vector indicating the presence or absence of these words. For example, if our vocabulary contains 1,000 unique words, each email will be represented as a binary vector of length 1,000.
To evaluate the performance of our model, we split the dataset into training and testing sets. We randomly divide the data, allocating 80% (80 emails) for training and 20% (20 emails) for testing. This ensures that the model is trained on a sufficient amount of data while having unseen examples for evaluation.
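The binary bag-of-words step and the 80/20 split can be sketched with scikit-learn's CountVectorizer and train_test_split. The ten emails below are invented stand-ins for the fictitious dataset, and random_state is an arbitrary choice for repeatability.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical emails: 6 spam (1) and 4 non-spam (0).
emails = [
    "win a free prize now",
    "claim your free cash offer",
    "cheap loans click here",
    "exclusive offer just for you",
    "you are a winner click now",
    "urgent claim your prize",
    "meeting moved to monday",
    "please review the attached report",
    "lunch tomorrow at noon",
    "project schedule for next week",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# binary=True records presence or absence of each word rather than counts.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(emails).toarray()

# Hold out 20% of the emails for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42
)

print("Vocabulary size:", len(vectorizer.vocabulary_))
print("Training emails:", X_train.shape[0], "Testing emails:", X_test.shape[0])
```

This sketch follows the order described above, building the vocabulary before splitting. A common alternative is to fit the vectorizer on the training emails only, so that no information from the test set influences the features.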
4.3 Training and Evaluation
Next, we build a neural network model using the Keras library. The model consists of an input layer, one or more hidden layers, and an output layer. Let's consider a simple architecture with one hidden layer containing 50 neurons. We choose the ReLU activation function for the hidden layer and the sigmoid activation function for the output layer, which outputs the probability that an email is spam. We compile the model with binary cross-entropy loss and the Adam optimizer, and then train it on the training data for 10 epochs. During training, the model learns to classify emails based on the patterns and relationships it discovers in the training data.
After training, we evaluate the model's performance using the testing data. We calculate evaluation metrics such as accuracy, precision, recall, and F1-score to assess how well the model generalises to unseen emails. For our example, let's assume our model achieves an accuracy of 85% on the testing set.
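The training and evaluation steps can be sketched end to end as follows, assuming Keras (with a backend such as TensorFlow) and scikit-learn are installed. The 100 emails are generated from two hypothetical word lists purely for demonstration, so the resulting metrics say nothing about real spam and will not match the 85% figure assumed above.

```python
import numpy as np
import keras
from keras import layers
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical word pools used to generate a fictitious dataset.
rng = np.random.default_rng(0)
spam_words = ["free", "winner", "prize", "offer", "click", "cash"]
ham_words = ["meeting", "report", "project", "schedule", "lunch", "invoice"]

def make_email(words):
    # Build a five-word email by sampling from a word pool.
    return " ".join(rng.choice(words, size=5))

# 60 spam emails (label 1) and 40 non-spam emails (label 0).
emails = [make_email(spam_words) for _ in range(60)]
emails += [make_email(ham_words) for _ in range(40)]
labels = np.array([1] * 60 + [0] * 40)

# Binary bag-of-words features, then an 80/20 train/test split.
X = CountVectorizer(binary=True).fit_transform(emails).toarray().astype("float32")
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42
)

# One hidden layer with 50 ReLU neurons and a sigmoid output.
model = keras.Sequential([
    keras.Input(shape=(X.shape[1],)),
    layers.Dense(50, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, verbose=0)

# Evaluate on the held-out emails using a 0.5 probability threshold.
probs = model.predict(X_test, verbose=0).ravel()
y_pred = (probs > 0.5).astype(int)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)
print(accuracy, precision, recall, f1)
```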
4.4 Prediction
Finally, we demonstrate how to use the trained model to make predictions on new, unseen email samples. Let's take a few example emails, preprocess them, and pass them through the trained model. The model will predict the probability of each email being spam, and we can apply a threshold (e.g., 0.5) to classify them as either spam or non-spam. For instance, if the predicted probability is above 0.5, we classify the email as spam; otherwise, we classify it as non-spam.
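The thresholding step can be sketched as below. The probabilities are hypothetical values standing in for the output of a trained model, not real predictions.

```python
import numpy as np

# Hypothetical spam probabilities for four new emails.
probabilities = np.array([0.92, 0.15, 0.50, 0.73])
threshold = 0.5

# Strictly above the threshold counts as spam; everything else is non-spam.
is_spam = probabilities > threshold
predicted_labels = np.where(is_spam, "spam", "non-spam")

for prob, label in zip(probabilities, predicted_labels):
    print(f"{prob:.2f} -> {label}")
```

Raising the threshold makes the filter more conservative about flagging spam, which trades recall for precision.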
By following this practical example, you can learn how to build a neural network model for email spam classification using a basic fictitious dataset. The example illustrates the steps involved in loading and preprocessing the data, feature extraction and transformation, training the model, and evaluating its performance. Additionally, it demonstrates how to make predictions on new email samples. With this example, you can gain hands-on experience in applying neural networks to solve the email spam classification problem.
5. Conclusion
In this beginner's guide to supervised learning, we explored the fundamental concepts and practical applications of this powerful machine learning approach. We discussed the importance of labelled data in supervised learning and its role in enabling accurate predictions and classifications.
Throughout this post, we covered the main types of supervised learning, classification and regression, and introduced popular algorithms such as linear regression, logistic regression, decision trees, support vector machines (SVM), and neural networks. We provided code examples and highlighted the interpretation and evaluation of results.
A practical example of email spam classification using a neural network was presented, showcasing the step-by-step process of data preprocessing, feature extraction, model training, evaluation, and prediction.
To further your journey in supervised learning, we encourage you to explore additional datasets, experiment with different algorithms, and participate in real-world projects. By continuously practising and expanding your knowledge, you can unlock the full potential of supervised learning and make meaningful contributions to the field of machine learning.
About the Company
E2E Cloud, a leading self-service cloud platform from India, is perfectly aligned with the principles of supervised learning discussed in this blog. Just as supervised learning algorithms rely on labelled data to make accurate predictions, E2E Cloud leverages cutting-edge technology to deliver high performance for web and mobile server-side applications. With our Compute Platform utilising high-frequency CPU cores from the latest generations of Intel Xeon or AMD EPYC CPUs, E2E Cloud ensures exceptional processing power for optimal performance.
At E2E Cloud, we understand the importance of a seamless user experience, which is why our compute plans include generous system memory and fast SSD or NVMe SSD storage. This strategic combination ensures a remarkable performance-to-price ratio, enabling businesses to run their applications efficiently and effectively. Additionally, E2E Cloud's commitment to providing a noise-free environment means that users do not have to worry about noisy-neighbour problems, allowing them to focus on maximising the potential of their applications.
With its robust infrastructure and dedication to user satisfaction, E2E Cloud complements the principles of supervised learning by offering a reliable and high-performing platform for businesses to harness the power of their web and mobile applications.