Supervised Machine Learning

by Maninder Pal Singh
What is Supervised Machine Learning?

In supervised learning, a labeled dataset is used to train the model, meaning that every input data point has an output label assigned to it. The objective is to learn a mapping function from input to output using the example input-output pairs supplied during training. Neural networks, support vector machines (SVM), decision trees, k-nearest neighbors, and random forests are a few of the most widely used learning techniques. R and Python are commonly used to implement these algorithms.

In this blog, we will cover:

  • How does Supervised Machine Learning operate?
  • Uses of Supervised Machine Learning
  • Types of Regression Techniques (linear, support vector, decision tree, and random forest regression)

How does Supervised Machine Learning operate?

  1. Data Collection: The first step is gathering a dataset of input-output pairs. If you’re developing a model to forecast housing prices, for instance, your dataset may include features such as house size and number of bedrooms, along with the associated sale prices.
  2. Data Preprocessing: This stage entails cleaning and formatting the data for training. It may involve handling missing values, scaling features, encoding categorical variables, and dividing the data into training and testing sets, among other things.
  3. Model Selection: Based on the nature of the problem, you select a model type. A variety of supervised learning methods are available, including neural networks, support vector machines, decision trees, and linear regression. The size of the dataset, the intended performance, and the nature of the problem all influence the model choice.
  4. Training the Model: The chosen model is trained on the labeled training data. Through an optimization process, the model adjusts its internal parameters to capture the link between the input features and the output labels. In this procedure, the difference between the predicted and actual outputs is measured by a preset loss function, which is minimized.
  5. Evaluation: After training, the model is assessed on the testing set, a separate subset of the dataset. Depending on the particular task (classification, regression, etc.), metrics like accuracy, precision, recall, or F1-score are used to evaluate the model’s performance.
  6. Fine-tuning and Validation: To enhance performance, you may adjust the model’s hyperparameters or select a different model architecture in light of the evaluation results. To make sure that the model’s performance does not depend on the specific split into training and testing sets, this stage frequently uses cross-validation techniques.
  7. Deployment: Lastly, once you are happy with the model’s output, you can use it to make predictions on fresh, unseen data. This could entail incorporating the model into a website, software program, or other system that allows real-time predictions. (A minimal end-to-end sketch of steps 1–6 follows this list.)
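To make the workflow concrete, here is a minimal sketch in Python with scikit-learn. The housing features and numbers are synthetic, invented purely for illustration:

```python
# Minimal supervised-learning workflow sketch (steps 1-6 above).
# The housing data here is synthetic, made up purely for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# 1. Data collection: features = [house size (sq ft), bedrooms]
X = rng.uniform([500, 1], [3500, 5], size=(200, 2))
# Price depends linearly on the features, plus noise (an assumed relationship).
y = 100 * X[:, 0] + 20_000 * X[:, 1] + rng.normal(0, 10_000, 200)

# 2. Preprocessing: train/test split and feature scaling
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3-4. Model selection and training
model = LinearRegression().fit(X_train, y_train)

# 5. Evaluation on the held-out test set
print("Test R^2:", r2_score(y_test, model.predict(X_test)))
```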

All things considered, supervised learning is an effective method for solving a wide variety of predictive modeling problems, ranging from image recognition and spam detection to stock price prediction and medical diagnostics.

Uses of Supervised Machine Learning

Supervised learning is, at its most basic, the process of teaching an algorithm to map an input to a specific output. If the mapping is accurate, the algorithm has learned correctly; if not, the algorithm is adjusted as needed so that it learns appropriately. Supervised learning algorithms can then forecast on previously unseen data that we may receive in the future. Several business applications, such as the following, are developed and advanced through the use of supervised learning models:

  • Image and object recognition: Supervised learning algorithms are used to locate, isolate, and classify objects in videos or images, which makes them useful for a variety of computer vision approaches and imagery analysis.
  • Predictive analytics: With the rise of cryptocurrency and stock-trading use cases, predictive analysis is currently expanding rapidly. It enables businesses to examine particular outcomes based on a specified output variable, helping company executives justify choices that will benefit the firm as a whole.
  • Analyzing consumer sentiment: Organizations are extracting crucial information, such as context, emotion, and intent, by using supervised machine learning algorithms. This data can help them gain a deeper understanding of customer interactions and enhance brand growth.
  • Spam detection: One of the most common applications of supervised learning models is spam detection. By using supervised classification algorithms to identify patterns or anomalies in fresh data, organizations can efficiently sort correspondence into spam and non-spam. (A toy sketch follows this list.)
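As a toy illustration of the spam-detection use case, here is a minimal sketch with scikit-learn, using a bag-of-words model and Naive Bayes; the example messages and labels are invented for illustration:

```python
# Toy spam classifier sketch: bag-of-words features + Naive Bayes.
# The training messages and labels below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now", "limited offer, claim your reward",
    "meeting moved to 3pm", "lunch tomorrow?",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

clf = MultinomialNB().fit(X, labels)
new = vectorizer.transform(["claim your free prize"])
print("spam" if clf.predict(new)[0] == 1 else "not spam")
```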

Types of Regression Techniques

Linear Regression

A straightforward yet effective supervised learning method is linear regression. The goal of linear regression is to find the relationship between the input (explanatory) and output (response) variables. The fundamental elements of a simple linear regression are:

  • A continuous input variable
  • A continuous response variable
  • Satisfying the linear regression assumptions

These are the assumptions of linear regression:

  • (1) A linear relationship between the input and output variables,
  • (2) errors that are normally distributed, and
  • (3) errors that are independent of the input.

Given an input set (x) and a response set (y) of data points, the goal of simple linear regression is to fit a line through the points that minimizes the sum of the squared distances between each point and the fitted line.

The form of the regression equation is:

y = b0 + b1x + e

The input variable is x, the error term is e, the intercept is b0, the slope of the regression line is b1, and y is the predicted value of the response variable. The slope b1 shows how changes in the input relate to changes in the output. For instance:

If y = 1.27 + 0.64x is the regression equation, then a unit increase in x increases y by 0.64. If the equation is changed to y = 1.27 – 0.64x, a unit increase in x decreases y by 0.64. The model’s quality is assessed with R^2 (R squared), the square of the correlation between x and y; the higher the R^2 value, the better the fit.
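As a quick sketch, the fit and its R^2 can be computed with NumPy; the data points below are made up for illustration:

```python
# Fit y = b0 + b1*x by least squares and compute R^2.
# The data points are invented for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 2.5, 3.2, 3.8, 4.5])

b1, b0 = np.polyfit(x, y, deg=1)          # slope and intercept
r_squared = np.corrcoef(x, y)[0, 1] ** 2  # square of the correlation

print(f"y = {b0:.2f} + {b1:.2f}x, R^2 = {r_squared:.3f}")
```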

After the regression, a hypothesis test is conducted to check whether the relationship between x and y is statistically significant. The hypotheses take the following form:

  • H0: b1 is equal to 0.
  • Ha: b1 does not equal zero.

Since b1 determines or reflects the relationship between x and y (that is, the idea that x and y are related and that changes in x result in changes in y), the null hypothesis assumes that b1 is equal to zero, i.e., that no relationship exists. A straightforward t-test then yields the p-value, which determines whether the predictor’s capacity to explain the response under the model is accepted or rejected. (A sketch of this test appears below.)
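If statsmodels is available, the t-test on the slope and its p-value fall out of an ordinary least squares fit; the data points are again invented for illustration:

```python
# OLS fit with a t-test on the slope b1 (H0: b1 = 0).
# The data points are invented for illustration.
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 2.5, 3.2, 3.8, 4.5])

X = sm.add_constant(x)         # adds the intercept column for b0
results = sm.OLS(y, X).fit()

print("b0, b1 =", results.params)
print("p-value for b1:", results.pvalues[1])
print("R^2:", results.rsquared)
```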

Support Vector Regression

Support vector regression (SVR) is a technique that can be applied to regression problems, even though support vector machines (SVMs) are more frequently employed for classification. In essence, it fits a line (or curve) through your data points together with a margin around it, chosen so that the data points fall within the margin.

The basic concept remains the same as in classification: you attempt to fit the line so that the points lie within a margin of width epsilon (the allowed error) on either side, while keeping that margin as tight as possible. (A minimal sketch follows.)
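Here is a minimal sketch with scikit-learn’s SVR, where the epsilon parameter is the margin width described above; the data is invented for illustration:

```python
# Support vector regression sketch: epsilon controls the no-penalty margin.
# The data points are invented for illustration.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, 50)

model = SVR(kernel="rbf", epsilon=0.1).fit(x, y)
print("Prediction at x = 5:", model.predict([[5.0]]))
```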

Decision Tree Regression

A decision tree is a collection of rules for categorizing data. It examines a data set’s variables, identifies the most significant ones, and then generates a tree of decisions that best divides the data. The tree is constructed by splitting the data into buckets based on those variables and counting the contents of each bucket after each split.

The main concept is that decision tree creation is a recursive process. Applied to an observation set (S), the method goes like this:

  1. If all observations in S belong to the same class, or if S is relatively tiny, S becomes an endpoint of the tree, labeled with its most common class.
  2. If S is too big and contains more than one class, find the best rule based on a single feature (such as “is weight > 150?”) to divide S into subsets.
  3. Apply step 1 to every new subset; if a subset has to move on to step 2, apply step 1 to its sub-subsets, and so on. Once everything is properly divided (into extremely small buckets or single classes), you have a tree-like collection of rules.

Here’s a simple illustration. Can we determine someone’s nationality based just on their sex and weight? Are they Japanese or American?

Name     Nationality  Weight (lbs.)  Sex
Larry    American     195            M
Jerry    American     190            M
Carrie   American     160            F
Cheri    American     165            F
Yoshi    Japanese     165            M
Yasuo    Japanese     160            M
Michiki  Japanese     130            F
Noriko   Japanese     140            F

The model first checks whether the subject weighs less than 150 pounds. If so, they are categorized as Japanese. If not, it checks whether they weigh more than 177 pounds; if so, they are American. If not, the final question asks about sex: in this middle weight range, females are classified as American and males as Japanese. These three questions allow you to infer someone’s nationality from their weight and sex. For example, if a man weighs 200 pounds, the model will predict that he is American.

This example is clean because the tree explains the data perfectly; in reality, there is overlap. There are thin Americans and overweight Japanese people, for example, so a real model won’t be perfect, but it would be tuned to produce the highest number of accurate predictions. (A sketch of this tree, fit with scikit-learn, follows.)
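To make this concrete, here is a minimal sketch that fits a decision tree to the table above with scikit-learn; encoding sex as 0/1 is an implementation choice made for illustration:

```python
# Decision tree sketch on the nationality table above.
# Sex is encoded as M=0, F=1; nationality is the class label.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: [weight (lbs.), sex (M=0, F=1)]
X = [[195, 0], [190, 0], [160, 1], [165, 1],   # Americans
     [165, 0], [160, 0], [130, 1], [140, 1]]   # Japanese
y = ["American"] * 4 + ["Japanese"] * 4

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["weight", "sex"]))
print(tree.predict([[200, 0]]))  # a 200 lb man -> likely "American"
```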

In the majority of tree-generating algorithms, the branching rule that yields the most information gain is considered the “best” one. Typically, trees are also “pruned” to prevent overfitting: the pruning method removes final nodes based on misclassification rates, which makes the model a little more general.

Random Forest Regression

The intuition behind random forests is easiest to build up step by step:

  1. Coin Tossing Example: Imagine a slightly biased coin that has a 51% chance of landing on heads (and 49% on tails). If you toss this coin 1,000 times, about 75% of the time you’ll get more heads than tails.
  2. Scaling Up: Now, if you toss the same biased coin 10,000 times, the probability of getting more heads than tails increases to about 97%.
  3. Application to Models: Each biased coin represents a model that is 51% accurate (slightly better than random guessing). If you use 1,000 of these models and take a majority vote, there’s about a 75% chance that the vote gives you the correct answer (assuming the models err independently).
  4. Combining Models: If you create 10,000 such models, the accuracy of the majority vote rises to about 97%. This means that even though each model is only slightly better than guessing, when you use a large number of them together, their collective accuracy becomes very high.

In simple terms, this concept illustrates how multiple slightly accurate models, when used together, can greatly improve overall accuracy. It’s like having many biased coins, where each one has a small advantage in predicting outcomes, but together they can be very reliable. This is precisely what random forest models do: they use random portions of the data to construct hundreds or even thousands of individual trees (the trees can be regressors or classifiers). They then forecast the result by averaging, voting, or otherwise combining the outputs of these trees. (A sketch follows.)
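A quick simulation, a sketch with parameters matching the coin example above, checks these numbers, and a RandomForestRegressor shows the same idea applied to trees; all data is invented for illustration:

```python
# Part 1: majority vote of many 51%-accurate "coins" (models).
import numpy as np

rng = np.random.default_rng(0)
for n_models in (1_000, 10_000):
    # Each trial: n_models independent predictions, each correct w.p. 0.51.
    correct = rng.binomial(n_models, 0.51, size=10_000)
    vote_accuracy = np.mean(correct > n_models / 2)
    print(f"{n_models} models -> majority vote correct "
          f"~{vote_accuracy:.0%} of the time")

# Part 2: the same ensemble idea with trees (data invented for illustration).
from sklearn.ensemble import RandomForestRegressor

X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)
forest = RandomForestRegressor(n_estimators=500).fit(X, y)
print("Prediction at x = 5:", forest.predict([[5.0]]))
```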

Conclusion

In this blog, we have developed a basic grasp of supervised learning and how it operates, and we have explored several regression approaches. When the outcome variable is known and is used explicitly in training, the learning process is known as supervised learning. One example: if I get data from patients and healthy controls and have to build a model that separates the two groups, the learning is supervised, because it is known whether each subject is a patient or a control. In recent years, machine learning has fundamentally altered the way we do business. The move away from rules-based programming is the radical breakthrough that differentiates machine learning from existing automation solutions.
