Statistics | Population and Sample | Machine Learning Prerequisite

Neeraj Dana


In very simple words, the process of gathering, describing and analysing data is known as statistics.


A population is basically the collection of all items of some kind (humans, students, males, females, animals, plants, etc.) within a boundary or limit.

For example:

  1. Population of humans in India = all humans in India (so the limit here is the whole of India)
  2. Population of males in India = all males in India
  3. Population of kids whose age is between 4 and 15 = all kids between 4 and 15 (in this case the limit or boundary is the age range 4-15)
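
To make the idea of a boundary concrete, here is a minimal Python sketch. The records, the field names and the `population` helper are all made up for illustration; the only point is that a population is "everything that falls inside the limit I chose".

```python
# Hypothetical records; names, countries and ages are invented for this example.
people = [
    {"name": "Asha", "country": "India", "gender": "female", "age": 34},
    {"name": "Ravi", "country": "India", "gender": "male", "age": 12},
    {"name": "Tom", "country": "UK", "gender": "male", "age": 8},
    {"name": "Meena", "country": "India", "gender": "female", "age": 5},
    {"name": "Karan", "country": "India", "gender": "male", "age": 41},
]


def population(records, boundary):
    """Return every record that falls inside the boundary (a yes/no function)."""
    return [r for r in records if boundary(r)]


# Boundary: the whole of India
humans_in_india = population(people, lambda r: r["country"] == "India")

# Boundary: India, males only
males_in_india = population(
    people, lambda r: r["country"] == "India" and r["gender"] == "male"
)

# Boundary: age between 4 and 15 (inclusive), any country
kids_4_to_15 = population(people, lambda r: 4 <= r["age"] <= 15)

print(len(humans_in_india), len(males_in_india), len(kids_4_to_15))
```

Changing the boundary function changes the population, even though the underlying records stay the same.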


Let's say I have a website where users can leave their reviews or feedback, and every day I get 10,000+ reviews that I simply store in text files. This is data: unprocessed raw input that can be passed through statistical analysis to give helpful insights that can improve my business.

For example:

  1. I asked 100 people one question and stored the results in Excel; that is data
  2. I measured the height of 50 students in a school; those 50 records are data
  3. I did an election survey by calling 500 people; that is data

In short, the raw information collected for statistical analysis is data.
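
As a small illustration of going from raw data to a description of it, the sketch below uses Python's built-in `statistics` module. The heights are invented values (only 8 of them, to keep it short), and the `summary` dictionary is just one possible way to organise the result.

```python
import statistics

# Hypothetical raw data: heights of a few students in centimetres (made-up values).
heights_cm = [150, 152, 148, 160, 155, 158, 149, 151]

# "Describing" the data: a few basic summary numbers.
summary = {
    "count": len(heights_cm),
    "mean": statistics.mean(heights_cm),
    "median": statistics.median(heights_cm),
    "stdev": statistics.stdev(heights_cm),  # sample standard deviation
}

for name, value in summary.items():
    print(name, value)
```

The list of heights is the data; the numbers in `summary` are the start of the statistical analysis on top of it.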


A sample is the subset of the population from which the data is collected. Importantly, it should represent the whole population.

Let's say I want to open an e-commerce shop (this is just for demonstration purposes) and I have two options for what to sell: sweet dishes or spicy dishes. To decide, I did a survey in one city, say Abu Road (in Rajasthan), asked 10,000 people which food they like (sweet or spicy), and found that 90% of them like spicy food.

So does it make sense to conclude that 90% of Indians like spicy food? No, because my way of collecting the data was wrong.

Why it's not a good sample:

  1. I surveyed only a limited number of people compared to the population (135.26 crore), so the number of people in the survey should clearly be higher, at least 50 lakh.
  2. I took all the people from the same place. Maybe in Rajasthan the majority of people like spicy food; if I go to Gujarat, people may not like spicy food as much as in Rajasthan; if I go to Maharashtra, there will be some other result, and so on. So my sample (the people in the survey) should come from different parts of the country. In short, it should be drawn from the whole population: some from Rajasthan, some from Gujarat, some from Maharashtra, and so on.
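
The second point can be sketched with a small simulation. Everything here is an assumption for illustration only: the three states, the spicy-food rates per state, and the toy population size are invented numbers, not survey results.

```python
import random

random.seed(0)  # fixed seed so the sketch gives the same result every run

# Assumed, made-up chance that a person in each state prefers spicy food.
ASSUMED_SPICY_RATE = {"Rajasthan": 0.9, "Gujarat": 0.4, "Maharashtra": 0.6}
PEOPLE_PER_STATE = 100_000  # toy population size per state

# Build the toy population as (state, likes_spicy) pairs.
population = [
    (state, random.random() < rate)
    for state, rate in ASSUMED_SPICY_RATE.items()
    for _ in range(PEOPLE_PER_STATE)
]


def spicy_share(people):
    """Fraction of the given people who like spicy food."""
    return sum(likes for _, likes in people) / len(people)


true_share = spicy_share(population)

# Sample 1: 10,000 people, all from one state (like surveying only Abu Road).
one_state = [p for p in population if p[0] == "Rajasthan"]
biased_estimate = spicy_share(random.sample(one_state, 10_000))

# Sample 2: 10,000 people picked at random from the whole population.
random_estimate = spicy_share(random.sample(population, 10_000))

print("whole population :", round(true_share, 3))
print("one-state sample :", round(biased_estimate, 3))
print("random sample    :", round(random_estimate, 3))
```

Under these assumptions the one-state sample stays close to the 90% assumed for that state, while the sample drawn from the whole population lands close to the share in the whole toy population.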