
Introduction to Statistics For R Programming & Python

Jun 26, 2022


Statistics is the practice or science of collecting and analyzing large amounts of numerical data, particularly for the purpose of inferring proportions in the population as a whole from those in a representative sample.

Introduction to statistics

Statistics is a very broad subject, with applications in a vast number of different fields.

Population and sample are two terms you will hear very frequently in statistics.

For example: a census (population) vs a survey (sample), or the full production run (population) vs audit samples (sample).

Population: gathering data from the whole population of interest.

Sample: gathering data from a subset of the population in order to draw conclusions about the population.

Common types of sampling are simple random sampling, systematic sampling, stratified sampling and cluster sampling.

1. Simple Random Sampling:

Every member of the population has an equal chance of being chosen in a simple random sample. The population as a whole should be included in your sampling frame.

To conduct this type of sampling, tools such as random number generators or other techniques based entirely on chance can be used.
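As a minimal sketch in Python, the standard library's `random` module can draw a simple random sample without replacement. The function name `simple_random_sample` and the numbered population are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    rng = random.Random(seed)       # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

# Hypothetical sampling frame: 100 numbered members
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Passing a seed is optional; leaving it as `None` gives a different sample on each run.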

2. Systematic Sampling:

Systematic sampling is similar to simple random sampling, but it is generally easier to carry out. Every member of the population is listed with a number, but instead of generating the selected numbers at random, members are chosen at regular intervals (for example every 10th member, starting from a randomly chosen point).
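A rough Python sketch of the idea, assuming the population is already an ordered list: compute the interval k = N // n, pick a random start inside the first interval, then take every k-th member. The function name and data are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every k-th member of an ordered list, starting at a random offset."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n                 # sampling interval
    rng = random.Random(seed)                # seeded so the draw is reproducible
    start = rng.randrange(k)                 # random start within the first interval
    return population[start::k][:n]          # every k-th member, capped at n

# Hypothetical numbered population of 100 members
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=7)
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample follows from the interval.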

3. Stratified Sampling

Stratified sampling entails segmenting the population into subpopulations that may differ significantly. It enables you to draw more precise conclusions by ensuring that all subgroups are adequately represented in the sample.

To use this sampling method, divide the population into subgroups (called strata) based on the relevant characteristic (e.g. gender, age range, income bracket, job role).

You calculate how many people should be sampled from each subgroup based on the overall population proportions. Then, using random or systematic sampling, you select a sample from each subgroup.

The population is partitioned into non-overlapping groups, called strata, and a sample is selected by some design within each stratum.
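The steps above can be sketched in Python with proportional allocation and a simple random sample inside each stratum. This is one possible design under stated assumptions: the helper `stratified_sample`, the `stratum_of` key function and the staff list are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Proportional stratified sample: group members into non-overlapping strata,
    then draw a simple random sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)          # each member joins exactly one stratum
    sample = []
    for members in strata.values():
        size = round(n * len(members) / len(population))   # proportional allocation
        sample.extend(rng.sample(members, size))            # simple random sample within the stratum
    return sample

# Hypothetical staff list: 60 engineers, 30 sales staff, 10 managers
staff = ([(f"eng{i}", "engineer") for i in range(60)]
         + [(f"sales{i}", "sales") for i in range(30)]
         + [(f"mgr{i}", "manager") for i in range(10)])
sample = stratified_sample(staff, lambda person: person[1], 10, seed=1)
print(Counter(role for _, role in sample))
```

Because each stratum's share is rounded, the total can differ slightly from `n` when the proportions do not divide evenly; a production version would need a rule for distributing the leftover units.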

4. Cluster Sampling

Divide the population into groups (clusters).
Obtain a simple random sample of clusters from all possible clusters.
Obtain data on every sampling unit in each of the randomly selected clusters.
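The three steps can be sketched in Python as a one-stage cluster sample: the clusters are already formed, a simple random sample of cluster names is drawn, and every unit in the chosen clusters is kept. The schools and students below are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sample: randomly choose whole clusters, keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # simple random sample of cluster names
    return {name: clusters[name] for name in chosen}     # all sampling units in each chosen cluster

# Hypothetical clusters: schools, each listing its students
schools = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2"],
}
selected = cluster_sample(schools, 2, seed=3)
print(selected)
```

Unlike stratified sampling, the randomness here is applied to whole groups rather than to individuals within every group.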

Categories of Statistics:


Statistics falls into two broad categories: descriptive statistics and inferential statistics.

Statistics allows us to derive knowledge from large datasets, and this knowledge can then be used to make predictions, decisions, classifications, etc.

In a nutshell, descriptive statistics are concerned with describing the visible characteristics of a dataset (a population or sample). Inferential statistics, meanwhile, are concerned with making predictions or inferences about a larger dataset based on a subset of that data.
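The difference can be illustrated with Python's standard `statistics` module. The exam scores are made-up data. The descriptive part only summarises these ten values, while the inferential part uses them to estimate the population mean with an approximate 95% confidence interval based on the normal distribution. This is a simplified sketch: for a sample this small, a critical value from the t-distribution would normally be preferred over z.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of 10 exam scores
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]

# Descriptive statistics: summarise the data we actually have
mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.stdev(scores)              # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample
z = NormalDist().inv_cdf(0.975)            # two-sided 95% critical value, about 1.96
margin = z * sd / len(scores) ** 0.5       # standard error times critical value
ci = (mean - margin, mean + margin)

print(f"mean={mean}, median={median}, sd={sd:.2f}")
print(f"approx. 95% CI for population mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```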

Types of Data:

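Discrete vs continuous numeric variables

Values or observations that are counted as distinct and separate, and can only take particular values, are called discrete numeric variables (e.g. the number of children in a family). Continuous numeric variables, by contrast, can take any value within a range (e.g. height or temperature).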

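Categorical variables place each observation into a group or category. Nominal and ordinal are the two types of categorical variables: nominal categories have no natural order (e.g. eye colour), while ordinal categories have a meaningful order (e.g. low, medium, high).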
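A small Python sketch of why the distinction matters when summarising data: for nominal data the mode (most frequent category) is meaningful, while ordinal data can also be ranked, so a median category makes sense. The eye colours, survey levels and answers are hypothetical, and encoding ordinal levels by their list position is just one simple approach.

```python
import statistics
from collections import Counter

# Nominal: categories with no natural order (hypothetical eye colours)
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue"]
most_common = Counter(eye_colour).most_common(1)[0][0]   # mode of the categories

# Ordinal: categories with a meaningful order (hypothetical survey answers)
levels = ["low", "medium", "high"]                       # order matters here
answers = ["medium", "high", "low", "medium", "high"]
ranks = [levels.index(a) for a in answers]               # encode each answer by its position
median_level = levels[statistics.median_low(ranks)]      # median_low returns an actual rank, never an average

print(most_common, median_level)
```

Taking a median of the eye colours would be meaningless, because "blue" is neither above nor below "brown".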