Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyse actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge. In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.
Machine learning (ML)
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use in order to perform a specific task effectively without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning, and focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.
Data Analysis vs Data Science vs Machine Learning
Data analysis and data science are almost the same because they share the
same goal, which is to derive insights from data and use them for better
decision-making.
Often, data analysis is associated with using Microsoft Excel and other tools
for summarizing data and finding patterns. On the other hand, data science is
often associated with using programming to deal with massive data sets. In
fact, data science became popular as a result of the generation of gigabytes of
data coming from online sources and activities (search engines, social media).
Being a data scientist sounds way cooler than being a data analyst, although
the job functions might be similar and overlapping. It all deals with
discovering patterns and generating insights from data. It’s also about asking
intelligent questions about the nature of the data (e.g. Do the data points form
organic clusters? Is there really a connection between age and cancer?).
What about machine learning? Often, the terms data science and machine
learning are used interchangeably. That’s because the latter is about “learning
from data.” When applying machine learning algorithms, the computer detects
patterns and uses “what it learned” on new data.
For instance, we want to know if a person will pay their debts. Luckily we have
a sizable dataset about different people who either paid their debts or not. We
have also collected other data (creating customer profiles) such as age, income
range, location, and occupation. When we apply the appropriate machine
learning algorithm, the computer will learn from the data. We can then input
new data (new info from a new applicant) and what the computer learned will
be applied to that new data.
We might then create a simple program that immediately evaluates whether a
person will pay their debts or not based on their information (age, income
range, location, and occupation). This is an example of using data to predict
someone’s likely behavior.
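To make the idea concrete, here is a minimal sketch in Python of the debt example above. It is only an illustration under stated assumptions: the eight customer records are made up, only age and income are used (location and occupation are left out to keep it short), and a simple k-nearest-neighbors vote stands in for "the appropriate machine learning algorithm", which could be many other things. The names TRAINING, fit, and predict are hypothetical.

```python
from collections import Counter
from math import dist

# Made-up training data (an assumption for illustration only):
# (age, annual income in thousands) -> did the person repay the debt?
TRAINING = [
    ((25, 22), False),
    ((31, 28), False),
    ((22, 18), False),
    ((41, 25), False),
    ((45, 80), True),
    ((52, 95), True),
    ((38, 60), True),
    ((29, 55), True),
]

def scale(features, mins, maxs):
    """Rescale each feature to the 0..1 range so age and income weigh equally."""
    return tuple((f - lo) / (hi - lo) for f, lo, hi in zip(features, mins, maxs))

def fit(training):
    """'Learning' here just means storing the scaled records and their ranges."""
    columns = list(zip(*(features for features, _ in training)))
    mins = [min(column) for column in columns]
    maxs = [max(column) for column in columns]
    scaled = [(scale(features, mins, maxs), label) for features, label in training]
    return scaled, mins, maxs

def predict(model, applicant, k=3):
    """Predict by majority vote among the k most similar known customers."""
    scaled, mins, maxs = model
    point = scale(applicant, mins, maxs)
    nearest = sorted(scaled, key=lambda row: dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

model = fit(TRAINING)
print("Applicant aged 48 earning 85k likely to repay:", predict(model, (48, 85)))
print("Applicant aged 24 earning 20k likely to repay:", predict(model, (24, 20)))
```

With so few records this is a toy. A real project would use far more data and would check the predictions against records the program has not seen before.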
Learning from data opens up a lot of possibilities, especially in prediction and
optimization. This has become a reality thanks to the availability of massive
datasets and superior computer processing power. We can now process data in
gigabytes within a day using computers or cloud capabilities.
Although data science and machine learning algorithms are still far from
perfect, these are already useful in many applications such as image
recognition, product recommendations, search engine rankings, and medical
diagnosis. And to this moment, scientists and engineers around the globe
continue to improve the accuracy and performance of their tools and models.
Limitations of Data Analysis & Machine Learning
You might have read in news and online articles that machine learning and
advanced data analysis can change the fabric of society (automation, loss of
jobs, universal basic income, artificial intelligence takeover).
In fact, society is being changed right now. Behind the scenes, machine
learning and continuous data analysis are at work, especially in search engines,
social media, and e-commerce. Machine learning now makes it easier and
faster to answer questions such as the following:
● Are there human faces in the picture?
● Will a user click an ad? (is it personalized and appealing to him/her?)
● How can accurate captions be created for YouTube videos? (recognise
speech and transcribe it into text)
● Will an engine or component fail? (preventive maintenance in
● Is a transaction fraudulent?
● Is an email spam or not?
These are made possible by the availability of massive datasets and great
processing power. However, advanced data analysis using Python (and
machine learning) is not magic. It’s not the solution to all problems. That’s
because the accuracy and performance of our tools and models heavily depend
on the integrity of data and our own skill and judgment.
Yes, computers and algorithms are great at providing answers. But it’s also
about asking the right questions. Those intelligent questions will come from us
humans. It also depends on us whether we’ll use the answers being provided by our
computers.
Accuracy & Performance
The most common use of data analysis is in successful predictions
(forecasting) and optimization. Will the demand for our product increase in the
next five years? What are the optimal routes for deliveries that lead to the
lowest operational costs?
That’s why an accuracy improvement of even just 1% can translate into
millions of dollars of additional revenue. For instance, big stores can stock up
on certain products in advance if the results of the analysis predict an
increasing demand. Shipping and logistics can also better plan the routes and
schedules for lower fuel usage and faster deliveries.
Aside from improving accuracy, another priority is ensuring reliable
performance. How will our analysis perform on new data sets? Should we
consider other factors when analyzing the data and making predictions? Our
work should always produce consistently accurate results. Otherwise, it’s not
scientific at all because the results are not reproducible. We might as well
shoot in the dark instead of exhausting ourselves with sophisticated data
analysis.
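One common way to ask how an analysis performs on new data is to hold back part of the data, fit on the rest, and measure the error on the part that was held back. The sketch below shows this in Python under stated assumptions: the monthly demand figures are synthetic (a made-up rising trend plus random noise), a straight-line fit stands in for whatever model is actually used, and the names make_demand_data, fit_line, mean_abs_error, and evaluate are hypothetical. A fixed random seed keeps the result reproducible from run to run.

```python
import random

def make_demand_data(n_months=60, seed=0):
    """Generate made-up monthly demand: a rising trend plus random noise."""
    rng = random.Random(seed)
    return [(m, 100 + 5 * m + rng.gauss(0, 10)) for m in range(n_months)]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(line, points):
    """Average absolute gap between predicted and actual values."""
    slope, intercept = line
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

def evaluate(seed=0, test_fraction=0.25):
    """Fit on one part of the data and report error on both parts."""
    data = make_demand_data(seed=seed)
    random.Random(seed).shuffle(data)      # seeded, so the split is repeatable
    cut = int(len(data) * (1 - test_fraction))
    train, test = data[:cut], data[cut:]
    line = fit_line(train)                 # the model never sees the test part
    return mean_abs_error(line, train), mean_abs_error(line, test)

train_error, test_error = evaluate()
print(f"Error on data used for fitting: {train_error:.1f}")
print(f"Error on held-out data:         {test_error:.1f}")
```

If the error on the held-out data is much larger than the error on the data used for fitting, that is a warning sign that the analysis will not perform reliably on new data.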
Apart from successful forecasting and optimization, proper data analysis can
also help us uncover opportunities. Later we may realize that what we did is
also applicable to other projects and fields. We can also detect outliers and
interesting patterns if we dig deep enough. For example, perhaps customers
congregate in clusters that are big enough for us to explore and tap into. Maybe
there are unusually high concentrations of customers that fall into a certain
income range or spending level.
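As a rough illustration of the two ideas above, the following Python sketch counts how many customers fall into each income range and flags unusual spending levels. Everything in it is an assumption made for the example: the customer records are invented, the income bands are 10k wide, and the common interquartile-range rule (values more than 1.5 times the IQR beyond the quartiles) is just one of several ways to define an outlier. The names CUSTOMERS, income_band, band_counts, and spend_outliers are hypothetical.

```python
from collections import Counter
from statistics import quantiles

# Made-up customer records: (annual income in thousands, monthly spend)
CUSTOMERS = [
    (32, 210), (35, 190), (38, 230), (36, 205), (34, 220),
    (37, 215), (61, 400), (58, 380), (90, 650), (33, 1900),
    (36, 200), (64, 410),
]

def income_band(income, width=10):
    """Label the income range a customer falls into, e.g. '30-39k'."""
    low = (income // width) * width
    return f"{low}-{low + width - 1}k"

def band_counts(customers):
    """Count how many customers fall into each income band."""
    return Counter(income_band(income) for income, _ in customers)

def spend_outliers(customers, k=1.5):
    """Flag customers whose spend lies more than k * IQR outside the quartiles."""
    spends = [spend for _, spend in customers]
    q1, _, q3 = quantiles(spends, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [c for c in customers if not low <= c[1] <= high]

print("Most crowded income band:", band_counts(CUSTOMERS).most_common(1)[0])
print("Unusual spenders:", spend_outliers(CUSTOMERS))
```

Whether a crowded income band is an opportunity, or an unusual spender is an error in the data, is still a judgment we have to make ourselves.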
Those are just typical examples of the applications of proper data analysis. In
the next chapter, let’s discuss one of the most used examples in illustrating the
promising potential of data analysis and machine learning. We’ll also discuss
its implications and the opportunities it presents.