Supervised and Unsupervised Learning

What is machine learning?

Whenever a new technology emerges and is widely adopted, there tend to be different opinions on how it should be described and even implemented.

Machine learning has been no exception. In this article, however, I agree with most of my peers that it is a subset of artificial intelligence. It uses mathematical models, algorithms, and data sets known as training data sets to train computer systems (or programs, as some would prefer) to make decisions on their own without being explicitly instructed to do so.

Machine learning has been a buzzword for a while now, and it has proven to have far-reaching impacts in almost every sector. Autonomous cars, content-generating systems, automated surgery, recommendation systems and face recognition are only a few examples, and the list keeps growing as more innovations come to light.

Broad Classification of Machine Learning Algorithms

In this article, machine learning algorithms are grouped into three broad categories:

i. Supervised learning.

ii. Unsupervised learning.

iii. Semi-supervised learning.

1. Supervised Learning

You’ve probably come across a common term in data science, ‘labeled data’. Labeling, sometimes referred to as data annotation, is the process of marking each example in your data with the result or output that you want your model to produce.

Supervised learning is the process of using this labeled data to train a machine learning model until it produces output of the desired accuracy and quality.

Training typically stops once the model reaches the desired level of accuracy, ideally measured on data it did not see during training.
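To make this stopping rule concrete, here is a minimal sketch of supervised training on labeled data. It uses a perceptron, a simple linear classifier chosen purely for illustration, and the names `train_perceptron`, `target_accuracy` and `max_epochs` are hypothetical. For brevity it measures accuracy on the training data itself; in practice you would usually check a separate validation set.

```python
from typing import List, Tuple

# A labeled example: (feature vector, label), with labels +1 or -1.
Example = Tuple[List[float], int]


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    """Return +1 or -1 depending on which side of the linear boundary x falls."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


def accuracy(weights: List[float], bias: float, data: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for x, y in data if predict(weights, bias, x) == y)
    return correct / len(data)


def train_perceptron(data: List[Example], target_accuracy: float = 1.0,
                     max_epochs: int = 100,
                     learning_rate: float = 1.0) -> Tuple[List[float], float, int]:
    """Hypothetical helper: train until target_accuracy is reached or max_epochs run out."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        for x, y in data:
            if predict(weights, bias, x) != y:
                # Misclassified: shift the boundary towards the correct label.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
        # Stopping rule: the desired accuracy has been achieved.
        if accuracy(weights, bias, data) >= target_accuracy:
            return weights, bias, epoch
    return weights, bias, max_epochs
```

For example, with `data = [([2.0, 3.0], 1), ([3.0, 4.0], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]`, `train_perceptron(data)` reaches 100% accuracy after a single epoch and stops there.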

Because training relies heavily on the quality of the data, preprocessing (cleaning, labeling, etc.) can be a time-consuming and expensive process, especially when the volume of data is huge.
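As a rough illustration of what cleaning involves, the sketch below drops records with missing fields, normalizes text and removes duplicates. The record layout and the `clean_records` helper are hypothetical, and lower-casing every value is an assumption that may not suit every data set.

```python
from typing import Dict, List, Optional

# Hypothetical raw record: field name -> text value (None when missing).
Record = Dict[str, Optional[str]]


def clean_records(records: List[Record], required: List[str]) -> List[Record]:
    """Drop incomplete rows, normalize text fields and remove duplicates."""
    cleaned: List[Record] = []
    seen = set()
    for record in records:
        # 1. Drop rows where any required field is missing or blank.
        if any(not (record.get(field) or "").strip() for field in required):
            continue
        # 2. Normalize: trim whitespace and lower-case every non-missing value.
        normalized = {key: value.strip().lower()
                      for key, value in record.items() if value is not None}
        # 3. Skip rows identical to one we have already kept.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned
```

Given a raw spam/ham data set in which `" free prize!! "` labeled `"spam "` duplicates `"Free prize!!"` labeled `"Spam"`, the two rows collapse into one, and any row with a missing label is dropped before training.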

If you cannot afford the labour and money that labeling requires, you may have no choice but to opt for techniques other than supervised learning, as we will see below.

Supervised learning algorithms usually fall into one of the following categories:

- Regression algorithms: predict a continuous value, such as a house price.

- Classification algorithms: predict a category, such as whether an email is spam or not.
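A minimal sketch of the difference between the two categories: the regression part fits a straight line by ordinary least squares and returns a number, while the classification part uses a nearest-centroid rule (chosen here for brevity) and returns a category. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_centroids(points: List[List[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Classification (training): the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids: Dict[str, List[float]], point: List[float]) -> str:
    """Classification (prediction): the class whose centroid is closest."""
    return min(centroids,
               key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], point)))
```

For example, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` returns `(2.0, 1.0)`, so the model predicts the value 11.0 for x = 5, whereas `classify` only ever answers with one of the labels it was trained on.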

Examples of these specific algorithms include:

· Support vector machines

· Logistic regression

· Linear regression

· Naive Bayes

· Decision trees

· K-nearest neighbors (KNN)
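The KNN idea from the list above fits in a few lines: a new point receives the majority label of its k nearest labeled neighbors. This is a minimal sketch assuming Euclidean distance; `knn_predict` is a hypothetical helper, not a library function.

```python
from collections import Counter
from math import dist
from typing import List, Sequence


def knn_predict(train_points: List[Sequence[float]], train_labels: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest labeled neighbors."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # most_common(1) returns [(label, count)]; on a tie, the label of the
    # closer neighbor wins because Counter keeps first-encountered order.
    return Counter(nearest_labels).most_common(1)[0][0]
```

Note that KNN has no real training phase: it simply stores the labeled data and does all its work at prediction time, which is why it becomes slow on very large data sets.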

2. Unsupervised Learning


Having already explained what ‘labeled data’ means, unsupervised learning is easy to describe: the algorithms receive unlabeled input data and are left to discover features and patterns in it on their own.

Supervised learning algorithms are generally considered easier to learn than their unsupervised counterparts. However, if you want to avoid the trouble of labeling data, unsupervised learning algorithms come in handy.

Unsupervised learning algorithms are classified further into the following subfields:

Clustering: the most commonly used technique, which places data points with similar characteristics into the same groups (clusters).

The K-means clustering algorithm is a common example. It assigns each data point to the cluster whose centroid is nearest, recomputes each centroid as the mean of the points assigned to it, and repeats these two steps until the assignments stop changing. A common application is customer segmentation and analysis. Other clustering algorithms include mixture models, hierarchical clustering and DBSCAN.
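The two alternating steps of K-means can be sketched in plain Python. This is a minimal sketch assuming Euclidean distance and initial centroids drawn at random from the data; the `k_means` function and its parameters are hypothetical.

```python
import random
from math import dist
from typing import List, Tuple

Point = Tuple[float, ...]


def k_means(points: List[Point], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: cluster points into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(points, k)  # k distinct data points as a start
    assignments: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_assignments = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments
```

For customer segmentation, each point might hold hypothetical features such as annual spend and visit count. Because the result depends on the initial centroids, implementations commonly run the algorithm several times with different seeds and keep the best clustering.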

Association: a very important technique for finding items that frequently occur together. For instance, if a person purchases milk, he or she is likely to also purchase bread or sugar.

It is commonly applied in market basket analysis, in finding relationships between data in transactional databases, and in arranging products in hypermarkets, supermarkets and even online stores. A common example is the Apriori algorithm.
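The milk-and-bread example can be made concrete with a minimal sketch of Apriori. It keeps every itemset whose support (the share of transactions containing it) reaches a threshold, and it prunes candidates using the Apriori property: every subset of a frequent itemset must itself be frequent. The `apriori` and `confidence` helpers are hypothetical, and the transactions below are made up for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent: Dict[Itemset, float] = {s: support(s) for s in current}
    size = 2
    while current:
        # Join step: merge frequent (size-1)-itemsets into candidates of length `size`.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: drop candidates with any infrequent (size-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, size - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        size += 1
    return frequent


def confidence(frequent: Dict[Itemset, float], antecedent: Set[str], consequent: Set[str]) -> float:
    """confidence(A -> B) = support(A and B) / support(A); both must be frequent."""
    both = frozenset(antecedent | consequent)
    return frequent[both] / frequent[frozenset(antecedent)]
```

With five made-up baskets `{milk, bread}`, `{milk, bread, sugar}`, `{milk, sugar}`, `{bread, eggs}` and `{milk, bread}` and a minimum support of 0.4, `{milk, bread}` is frequent with support 0.6, and the rule milk → bread has a confidence of about 0.75.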

Although some consider dimensionality reduction to be a further subfield of unsupervised learning, I prefer to view it as a data preprocessing technique.

3. Semi-Supervised Learning

Semi-supervised learning uses both labeled and unlabeled data, although most of the data is unlabeled. The small labeled portion guides the model and is meant to increase its accuracy.

Because labeling data is expensive, semi-supervised learning is seen as a more cost-effective approach: only a small portion of the data needs to be labeled.
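One simple way to exploit a small labeled portion is self-training, also called pseudo-labeling. The sketch below, a hypothetical `self_train` helper assuming a 1-nearest-neighbor rule, repeatedly takes the unlabeled point closest to any labeled point and gives it that neighbor's label. Real semi-supervised methods are usually more careful about which pseudo-labels they trust.

```python
from math import dist
from typing import List, Optional, Sequence


def self_train(points: List[Sequence[float]], labels: List[Optional[str]]) -> List[str]:
    """Spread the few known labels to the unlabeled points (label None), one at a time."""
    labels = list(labels)  # work on a copy so the caller's list is untouched
    if all(label is None for label in labels):
        raise ValueError("at least one point must be labeled")
    while any(label is None for label in labels):
        labeled = [i for i, lab in enumerate(labels) if lab is not None]
        unlabeled = [i for i, lab in enumerate(labels) if lab is None]
        # Pick the (unlabeled, labeled) pair that is closest: the most confident guess.
        u, l = min(((u, l) for u in unlabeled for l in labeled),
                   key=lambda pair: dist(points[pair[0]], points[pair[1]]))
        # Give the unlabeled point its neighbor's label (a pseudo-label).
        labels[u] = labels[l]
    return labels
```

For example, with six points on a line, of which only the first is labeled `"a"` and the fourth `"b"`, the loop labels each group after its single labeled member. If the groups overlapped, early mistakes would propagate, which is the main risk of this approach.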

There is much more to learn about machine learning, including far more complex concepts and a lot of code to write, but getting the basic constructs right is equally important.


Thank you


Isaac Tonyloi
