What is Unsupervised Learning?
Unsupervised Learning means there’s no supervision or guidance. It’s often
thought of as having no correct answers, just acceptable ones.
For example, in Clustering (which falls under Unsupervised Learning) we’re
trying to discover where data points aggregate (e.g. are there natural clusters?).
No data point carries a label, so our model won’t be learning from labeled
examples. Instead, it learns to identify patterns without any external
guidance.
This seems to be the essence of true Artificial Intelligence, wherein the
computer can learn without human intervention. It’s about learning from the
data itself and finding relationships among the inputs (notice there’s no
expected output here, in contrast to Regression and Classification discussed
earlier). Perhaps there are natural clusters, or clear associations among the
inputs. It’s also possible that there’s no useful relationship at all.
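To make the clustering idea concrete, here is a minimal sketch of k-means, one
common clustering method (the text above does not name a specific algorithm, so
this choice is an assumption). The function name kmeans, the toy points, and
the choice of k = 2 are all invented for illustration. Note that the function
never sees a label; it only groups points that sit close together.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with plain k-means (no labels used)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters

# Toy, unlabeled data (made up): two visibly separate groups of points.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

On data without such clear groups, the same code still returns k clusters, which
is one way of seeing that there may be no useful relationship at all.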
What is Supervised Learning?
Supervised Learning is a lot like learning from examples. For instance,
suppose we have a huge collection of images correctly labeled as either dogs
or cats. Our computer will then learn from those examples and their correct
labels, perhaps by finding patterns and similarities among the images. When
we finally introduce new images, our model should be able to identify
whether each one shows a dog or a cat.
It’s a lot like learning with supervision. There are correct answers (e.g. cats or
dogs) and it’s the job of our model to adjust itself so that it still produces
correct answers on new data (at an acceptable level of accuracy, since 100% is
hard to reach).
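As a rough illustration of learning from labeled examples, the sketch below
uses a 1-nearest-neighbor rule: a new input gets the label of the most similar
labeled example. This is only one simple classifier, chosen here as an
assumption. Real image data would be far larger; the two numeric features
(weight and ear length) and all values are invented for illustration.

```python
def nearest_label(examples, new_point):
    """Return the label of the labeled example closest to new_point."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(examples, key=lambda ex: squared_distance(ex[0], new_point))
    return label

# Hypothetical labeled examples: ((weight in kg, ear length in cm), label).
labeled = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((20.0, 10.0), "dog"),
    ((30.0, 12.0), "dog"),
]

# A new, unseen input gets the label of its nearest labeled neighbor.
prediction = nearest_label(labeled, (25.0, 11.0))

# Held-out examples with known answers let us measure performance.
test_set = [((3.8, 6.2), "cat"), ((28.0, 11.5), "dog")]
accuracy = sum(nearest_label(labeled, x) == y for x, y in test_set) / len(test_set)
print(prediction, accuracy)
```

Measuring accuracy on examples the model has not seen is what "acceptable
performance level" refers to in practice.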
For example, Linear Regression falls under Supervised Learning. Remember
that in linear regression we’re trying to predict the value of y for a given x.
But first, we have to “fit” a line that best describes the relationship between
x and y in the examples we already have. We can then use that line to predict
y values for new x inputs.
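The line-fitting step can be sketched with ordinary least squares, assuming a
single input x and a straight-line relationship. The function name fit_line and
the sample values are made up for illustration; the data are chosen to lie
exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Labeled examples: each x comes with its known y (here y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Predict y for a new x that was not among the examples.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```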
How to Approach a Problem
Many data scientists approach a problem in a binary way: does the task fall
under Supervised or Unsupervised Learning?
The quickest way to figure it out is by determining the expected output. Are we
trying to predict y values based on new x values (Supervised Learning,
Regression)? Is a new input under category A or category B based on
previously labeled data (Supervised Learning, Classification)? Are we trying
to discover and reveal how data points aggregate and if there are natural
clusters (Unsupervised Learning, Clustering)? Do inputs have an interesting
relationship with one another (do they have a high probability of
co-occurrence)?
Many advanced data analysis problems fall under those general questions.
After all, the objective is always to predict something (based on previous
examples) or explore the data (find out if there are patterns).
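The co-occurrence question above is the exploratory kind: there is no expected
output, only inputs to examine. A minimal sketch, assuming the data arrive as
lists of items (the shopping baskets below are invented), counts how often each
pair of items appears together.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_rates(transactions):
    """Fraction of transactions in which each pair of items appears together."""
    pair_counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return {pair: count / len(transactions) for pair, count in pair_counts.items()}

# Hypothetical baskets, made up for illustration.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

rates = cooccurrence_rates(baskets)
for pair, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(pair, rate)
```

A pair with a high rate, such as bread and butter here, is a candidate for an
interesting relationship; whether it is actually useful still needs judgment.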