**AIOU Solved Project Binomial Distribution**

**ACKNOWLEDGEMENTS**

This work is dedicated to Allah, my Creator and my Master, and to His Messenger, Muhammad (peace be upon him), who showed us the purpose of life; to my country Pakistan, the warmest womb; to Allama Iqbal Open University, Islamabad, my wonderful second home; to my great parents, who never stop giving of themselves in countless ways; to my dearest friend, who leads me through the valley of darkness with the light of hope and support; to my beloved brothers and sisters, especially my dearest brother, who stands by me when things look disheartening; and to my beloved parents, whom I cannot bring myself to stop loving. To all the people in my life who touch my heart, I dedicate this research.

**ABSTRACT**

The language of statistics identifies numerical data of two types: continuous data and categorical data. Continuous data describe a quantity measured on a scale, e.g., a comparison of apical debris extrusion with rotary versus reciprocating file motion, measured in micrograms of debris extruded from the root apex. Categorical data, on the other hand, describe a quality of the data and are expressed as proportions or percentages, e.g., the prevalence of white spot lesions (WSL) in patients undergoing fixed orthodontic therapy, expressed as the percentage of patients having WSL. The representation of data includes two parameters:

- The measure of central tendency and the measure of dispersion.
- The measure of central tendency points towards the central-most value of the data set, as given by the mean or median.
- The measure of dispersion includes the standard deviation (SD), standard error, and confidence interval.
- Sample size has a significant effect on the sample distribution.
- It is often observed that a small sample size results in a non-normal distribution: the dispersion of the data is inadequately estimated, and the frequency distribution does not produce a normal (bell-shaped) curve.

To understand the effect of sample size on distribution, let us consider the following research question: What is the shear bond strength of a self-etch adhesive to dentin?

**Introduction**

The binomial distribution is a probability distribution that summarizes the likelihood that a value will take one of two independent values under a given set of parameters or assumptions. The underlying assumptions of the binomial distribution are that there is only one outcome for each trial, that each trial has the same probability of success, and that each trial is mutually exclusive, or independent of each other.

**Understanding Binomial Distribution**

The binomial distribution is a common discrete distribution used in statistics, as opposed to a continuous distribution, such as the normal distribution. This is because the binomial distribution only counts two states, typically represented as 1 (for a success) or 0 (for a failure) given a number of trials in the data. The binomial distribution, therefore, represents the probability for x successes in n trials, given a success probability p for each trial.

The binomial distribution summarizes the number of trials, or observations, when each trial has the same probability of attaining one particular value. The binomial distribution determines the probability of observing a specified number of successful outcomes in a specified number of trials.

The binomial distribution is often used in social science statistics as a building block for models for dichotomous outcome variables, like whether a Republican or Democrat will win an upcoming election or whether an individual will die within a specified period of time, etc.

**Analyzing Binomial Distribution**

The expected value, or mean, of a binomial distribution is calculated by multiplying the number of trials by the probability of success. For example, the expected value of the number of heads in 100 flips of a fair coin is 50, or (100 * 0.5). Another common example of the binomial distribution is estimating the chances of success for a free-throw shooter in basketball, where 1 = a basket is made and 0 = a miss.

The mean of the binomial distribution is np, and the variance of the binomial distribution is np (1 − p). When p = 0.5, the distribution is symmetric around the mean. When p > 0.5, the distribution is skewed to the left. When p < 0.5, the distribution is skewed to the right.
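A quick sketch (the n and p values are chosen purely for illustration) can verify these formulas, using the standard skewness expression (1 − 2p)/√(np(1 − p)) for the binomial distribution:

```python
import math

def binomial_stats(n, p):
    """Return the mean, variance, and skewness of a Binomial(n, p) distribution."""
    mean = n * p
    variance = n * p * (1 - p)
    # Skewness is 0 when p = 0.5, positive when p < 0.5, negative when p > 0.5
    skewness = (1 - 2 * p) / math.sqrt(n * p * (1 - p))
    return mean, variance, skewness

print(binomial_stats(100, 0.5))  # (50.0, 25.0, 0.0) -- symmetric
print(binomial_stats(100, 0.3))  # positive skewness -> longer right tail
```

For p = 0.5 the skewness is exactly zero, matching the symmetry noted above.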

The binomial distribution is the sum of a series of multiple independent and identically distributed Bernoulli trials. In a Bernoulli trial, the experiment is said to be random and can only have two possible outcomes: success or failure.

For example, flipping a coin is considered to be a Bernoulli trial; each trial can only take one of two values (heads or tails), each success has the same probability (the probability of flipping a head is 0.5), and the results of one trial do not influence the results of another. The Bernoulli distribution is a special case of the binomial distribution where the number of trials n = 1.

**Example of Binomial Distribution**

The binomial probability is calculated by multiplying the probability of success raised to the power of the number of successes by the probability of failure raised to the power of the difference between the number of trials and the number of successes. Then, multiply the product by the number of combinations of the number of trials taken the number of successes at a time.

For example, assume that a casino created a new game in which participants are able to place bets on the number of heads or tails in a specified number of coin flips. Assume a participant wants to place a $10 bet that there will be exactly six heads in 20 coin flips. The participant wants to calculate the probability of this occurring, and therefore, they use the calculation for the binomial distribution.

The probability was calculated as: (20! / (6! * (20 – 6)!)) * (0.50)^(6) * (1 – 0.50)^(20 – 6). Consequently, the probability of exactly six heads occurring in 20 coin flips is 0.037, or 3.7%. The expected value was 10 heads in this case, so the participant made a poor bet.
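A short sketch using the standard library's `math.comb` reproduces this calculation:

```python
from math import comb

# Probability of exactly 6 heads in 20 fair coin flips:
# C(20, 6) * 0.5^6 * 0.5^14
prob = comb(20, 6) * (0.5 ** 6) * (0.5 ** 14)
print(round(prob, 3))  # 0.037
```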

A **binomial experiment** is a statistical experiment that has the following properties:

- The experiment consists of n repeated trials.
- Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
- The probability of success, denoted by P, is the same on every trial.
- The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.

Consider the following statistical experiment. You flip a coin 2 times and count the number of times the coin lands on heads. This is a binomial experiment because:

- The experiment consists of repeated trials. We flip a coin 2 times.
- Each trial can result in just two possible outcomes – heads or tails.
- The probability of success is constant – 0.5 on every trial.
- The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials.

**Notation**

The following notation is helpful when we talk about binomial probability.

- x: The number of successes that result from the binomial experiment.
- n: The number of trials in the binomial experiment.
- P: The probability of success on an individual trial.
- Q: The probability of failure on an individual trial. (This is equal to 1 – P.)
- n!: The factorial of n (also known as n factorial).
- b(x; n, P): Binomial probability – the probability that an n-trial binomial experiment results in exactly x successes, when the probability of success on an individual trial is P.
- _{n}C_{r}: The number of combinations of n things, taken r at a time.

**Binomial Distribution**

A **binomial random variable** is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a **binomial distribution**.

Suppose we flip a coin two times and count the number of heads (successes). The binomial random variable is the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is presented below.

| Number of heads | Probability |
| --- | --- |
| 0 | 0.25 |
| 1 | 0.50 |
| 2 | 0.25 |
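A minimal Monte Carlo sketch (the seed and trial count are arbitrary choices) approximates this distribution by simulating many two-flip experiments:

```python
import random

random.seed(42)  # fixed seed for reproducibility
trials = 100_000
counts = {0: 0, 1: 0, 2: 0}

for _ in range(trials):
    # Flip a fair coin twice and count the heads
    heads = sum(random.random() < 0.5 for _ in range(2))
    counts[heads] += 1

for heads in (0, 1, 2):
    print(heads, counts[heads] / trials)  # approximately 0.25, 0.50, 0.25
```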

The binomial distribution has the following properties:

- The mean of the distribution (μ_{x}) is equal to n * P.
- The variance (σ^{2}_{x}) is n * P * ( 1 – P ).
- The standard deviation (σ_{x}) is sqrt[ n * P * ( 1 – P ) ].

The **binomial probability** refers to the probability that a binomial experiment results in exactly x successes. For example, in the above table, we see that the binomial probability of getting exactly one head in two coin flips is 0.50.

Given x, n, and P, we can compute the binomial probability based on the binomial formula:

**Binomial Formula.** Suppose a binomial experiment consists of n trials and results in x successes. If the probability of success on an individual trial is P, then the binomial probability is:

b(x; n, P) = _{n}C_{x} * P^{x} * (1 – P)^{n – x}

or

b(x; n, P) = { n! / [ x! (n – x)! ] } * P^{x} * (1 – P)^{n – x}
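A direct translation of this formula into Python (a sketch using the standard library's `math.comb` for the combinations term) might look like:

```python
from math import comb

def b(x, n, P):
    """Binomial probability of exactly x successes in n trials,
    with success probability P on each trial."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

# One head in two fair coin flips -- matches the table above
print(b(1, 2, 0.5))  # 0.5
```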

**Cumulative Binomial Probability**

A **cumulative binomial probability** refers to the probability that the binomial random variable falls within a specified range (e.g., is greater than or equal to a stated lower limit and less than or equal to a stated upper limit).

For example, we might be interested in the cumulative binomial probability of obtaining 45 or fewer heads in 100 tosses of a coin (see Example 1 below). This would be the sum of all these individual binomial probabilities.

b(x ≤ 45; 100, 0.5) = b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + … + b(x = 44; 100, 0.5) + b(x = 45; 100, 0.5)
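A sketch of this cumulative sum in Python (the helper name is my own, not standard notation):

```python
from math import comb

def binomial_pmf(x, n, P):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

# Cumulative probability of 45 or fewer heads in 100 fair coin tosses
cumulative = sum(binomial_pmf(x, 100, 0.5) for x in range(46))
print(round(cumulative, 3))  # about 0.184
```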

**Practical study**

For a large number of trials, the binomial distribution is closely approximated by the normal distribution, which is the most common type of distribution assumed in technical stock market analysis and in other types of statistical analyses. The standard normal distribution has two parameters: the mean and the standard deviation. For a normal distribution, 68% of the observations fall within +/- one standard deviation of the mean, 95% fall within +/- two standard deviations, and 99.7% fall within +/- three standard deviations.

The central limit theorem states that averages calculated from independent, identically distributed random variables have approximately normal distributions, regardless of the type of distribution from which the variables are sampled (provided it has finite variance). A normal distribution is sometimes confused with a symmetrical distribution. A symmetrical distribution is one where a dividing line produces two mirror images, but the actual data could be two humps or a series of hills in addition to the bell curve that indicates a normal distribution. Real-life data rarely, if ever, follow a perfect normal distribution. The skewness and kurtosis coefficients measure how different a given distribution is from a normal distribution. Skewness measures the symmetry of a distribution. The normal distribution is symmetric and has a skewness of zero. If the distribution of a data set has a skewness less than zero (negative skewness), then the left tail of the distribution is longer than the right tail; positive skewness implies that the right tail of the distribution is longer than the left.

The kurtosis statistic measures the thickness of the tail ends of a distribution in relation to the tails of the normal distribution. Distributions with large kurtosis exhibit tail data exceeding the tails of the normal distribution (e.g., five or more standard deviations from the mean). Distributions with low kurtosis exhibit tail data that are generally less extreme than the tails of the normal distribution. The normal distribution has a kurtosis of three, which indicates the distribution has neither fat nor thin tails. Therefore, if an observed distribution has a kurtosis greater than three, it is said to have heavy tails compared to the normal distribution. If the distribution has a kurtosis of less than three, it is said to have thin tails compared to the normal distribution.

The assumption of a normal distribution is applied to asset prices as well as price action. Traders may plot price points over time to fit recent price action into a normal distribution. The further price action moves from the mean, in this case, the more likely it is that an asset is being over- or undervalued. Traders can use the standard deviations to suggest potential trades. This type of trading is generally done on very short time frames, as larger timescales make it much harder to pick entry and exit points.

Similarly, many statistical theories attempt to model asset prices under the assumption that they follow a normal distribution. In reality, price distributions tend to have fat tails and, therefore, kurtosis greater than three. Such assets have had price movements greater than three standard deviations beyond the mean more often than would be expected under the assumption of a normal distribution. Even if an asset has gone through a long period where it fits a normal distribution, there is no guarantee that past performance truly informs future prospects.

The normal distribution is widely used. Part of the appeal is that it is well behaved and mathematically tractable. However, the central limit theorem provides a theoretical basis for why it has wide applicability.

The central limit theorem basically states that as the sample size (N) becomes large, the following occur:

- The sampling distribution of the mean becomes approximately normal regardless of the distribution of the original variable.
- The sampling distribution of the mean is centered at the population mean, μ, of the original variable. In addition, the standard deviation of the sampling distribution of the mean approaches σ/√N.
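The two statements above can be illustrated with a minimal simulation sketch (the sample size, repetition count, and the uniform population are arbitrary illustrative choices); the population here is decidedly non-normal, yet the sample means center on μ with spread σ/√N:

```python
import math
import random

random.seed(0)  # fixed seed for reproducibility

# Population: Uniform(0, 1) -- mean 0.5, sd sqrt(1/12) ~ 0.2887, not normal
N = 50          # sample size
reps = 10_000   # number of sample means to draw

means = [sum(random.random() for _ in range(N)) / N for _ in range(reps)]

grand_mean = sum(means) / reps
sd_of_means = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / reps)

print(round(grand_mean, 3))   # close to the population mean, 0.5
print(round(sd_of_means, 3))  # close to sigma / sqrt(N) ~ 0.2887 / sqrt(50) ~ 0.041
```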

What is the probability that the World Series will last 4 games? 5 games? 6 games? 7 games? Assume that the teams are evenly matched.

Solution: This is a very tricky application of the binomial distribution. If you can follow the logic of this solution, you have a good understanding of the material covered up to this point.

In the World Series, there are two baseball teams. The series ends when the winning team wins 4 games. Therefore, we define a success as a win by the team that ultimately becomes the World Series champion.

For the purpose of this analysis, we assume that the teams are evenly matched. Therefore, the probability that a particular team wins a particular game is 0.5.

Let’s look first at the simplest case. What is the probability that the series lasts only 4 games? This can occur only if one team wins the first 4 games. The probability of the National League team winning 4 games in a row is:

b(4; 4, 0.5) = _{4}C_{4} * (0.5)^{4} * (0.5)^{0} = 0.0625

Similarly, when we compute the probability of the American League team winning 4 games in a row, we find that it is also 0.0625. Therefore, the probability that the series ends in four games is 0.0625 + 0.0625 = 0.125, since the series would end if either the American or National League team won 4 games in a row.

Now let’s tackle the question of finding the probability that the World Series ends in 5 games. The trick in finding this solution is to recognize that the series can only end in 5 games if one team has won exactly 3 of the first 4 games. So let’s first find the probability that the American League team wins exactly 3 of the first 4 games.

b(3; 4, 0.5) = _{4}C_{3} * (0.5)^{3} * (0.5)^{1} = 0.25

Okay, here comes some more tricky stuff, so listen up. Given that the American League team has won 3 of the first 4 games, the American League team has a 50/50 chance of winning the fifth game to end the series. Therefore, the probability of the American League team winning the series in 5 games is 0.25 * 0.50 = 0.125. Since the National League team could also win the series in 5 games, the probability that the series ends in 5 games would be 0.125 + 0.125 = 0.25.

The rest of the problem would be solved in the same way. You should find that the probability of the series ending in 6 games is 0.3125; and the probability of the series ending in 7 games is also 0.3125.
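The whole calculation above can be sketched in a few lines: the series ends in exactly g games when the eventual champion wins 3 of the first g − 1 games (a binomial probability) and then wins game g, doubled because either team can be that champion (the function name is my own):

```python
from math import comb

def series_length_prob(g, p=0.5):
    """Probability an evenly matched best-of-seven series ends in exactly g games."""
    # Champion wins 3 of the first g-1 games, then wins game g;
    # times 2 because either team can be the champion.
    return 2 * comb(g - 1, 3) * p**3 * (1 - p)**(g - 1 - 3) * p

for g in (4, 5, 6, 7):
    print(g, series_length_prob(g))
# 4 0.125
# 5 0.25
# 6 0.3125
# 7 0.3125
```

The four probabilities sum to 1, as they must, since a best-of-seven series always ends in 4, 5, 6, or 7 games.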

**SWOT analysis**

| Factors | Environment | Examples | Time frame |
| --- | --- | --- | --- |
| Strengths and Weaknesses | The internal environment – the situation inside the company or organization | Relating to products, pricing, costs, profitability, performance, quality, people, skills, adaptability, brands, services, reputation, processes, infrastructure, etc. | Factors tend to be in the present |
| Opportunities and Threats | The external environment – the situation outside the company or organization | Relating to markets, sectors, audience, fashion, seasonality, trends, competition, economics, politics, society, culture, technology, environment, media, law, etc. | Factors tend to be in the future |

**Conclusion**

So far, we’ve been talking about the bell-shaped normal curve, which the binomial distribution approaches as the number of trials grows, as if it were a static thing. However, it is more accurate to talk of normal curves, plural, as the curve can broaden or narrow depending on the **variance** of the random variable. No matter the shape of the curve, however, three things will always be true:

- a normal curve is always symmetrical about the mean;
- a normal curve is always thicker in the middle and tapers at its tails;
- the area under a normal curve always adds to 100%.

Now that we know what is common to all normal curves, let’s explore what causes them to broaden or narrow. Generally, if a variable has a higher **variance** (that is, if a wider spread of values is possible), then the curve will be broader and shorter. However, if the variance is small (where most values occur very close to the mean), the curve will be narrow and tall in the middle.

The normal distribution, or bell curve, is broad and dense in the middle, with shallow, tapering tails. Often, a random variable that tends to clump around a central mean and exhibits few extreme values (such as heights and weights) is normally distributed. Because of the sheer number of variables in nature that exhibit this behavior, the normal distribution is a commonly used distribution in inferential statistics.

- The parameters of the binomial distribution are the number of trials (n) and the probability of success (p); from these follow its mean (np) and SD (sqrt[np(1 – p)]).
- The spread of the distribution is a function of the SD.
- Sample size (the number of trials) plays a role in the shape of the binomial distribution.
- A skewed distribution can also be representative of the population under study.
- Whether data follow a binomial distribution can be ascertained by certain statistical tests.

**Recommendations**

The binomial distribution is not the only “ideal” distribution that is to be achieved. Data that do not follow a binomial distribution are called non-binomial data. In certain cases, a binomial distribution is not possible, especially when a large sample size is not available. In other cases, the distribution can be skewed to the left or right depending on the parameter measured. This is also a type of non-binomial data, which follows the Poisson distribution independent of the sample size. For example, data on DMFS scores would often be skewed to the right. This happens due to the nature of the data set: the best DMFS score is 0, and in a population of school children the mean DMFS value would be close to 0, tapering gradually towards the right. This kind of skewed data is also a true representative of the population.
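As an illustrative sketch (the mean λ = 1 is hypothetical, not taken from any real DMFS survey), a Poisson distribution with a small mean shows exactly this shape: most of the probability mass near 0, tapering towards the right:

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 1.0  # hypothetical mean score close to 0
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
# Mass is concentrated at low counts and tapers towards the right (positive skew).
```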
