Confidence Interval for a Proportion

On your USB drive, create a new directory, copy model.R into it, rename the copy, and double-click the renamed file to open RStudio. Then copy all of the text below the line and paste it into the RStudio editor pane.

  
# Line 1: a small demonstration of getting a
#   confidence interval for the proportion of
#   a characteristic in a population.
#
# First we will generate a population where each 
# member of the population has exactly one of
# five possible values, 1, 2, 3, 4, or 5.
#
source("../gnrnd5.R")
gnrnd5(75681625407,967845)
#let us look at the head and tail values
head(L1,40)
tail(L1,40)
#
# now we could find the exact proportion of each 
# of the characteristics in the population by 
# using the table(L1) command, but for this 
# demonstration we are to find a confidence
# interval for one of those proportions based
# on the results of taking a sample.

# ##########################
# ##  Problem: determine a 95% confidence interval
# ##      for the proportion of the
# ##      value  3 in the population based upon
# ##      the results of a sample of size 97.
# ##########################

# take a simple random sample of size 97
#
#  Be careful:  Every time we do this we get 
#               a different random sample
#
L2 <- as.integer( runif(97, 1, 4126) )
# L2  holds the index values of our random sample (drawn independently, so repeats are possible)
L2
L3 <- L1[ L2 ]   # L3 holds the simple random sample
L3
# we need to find the proportion of 3's in the
# sample
table( L3 )
num_3s <- sum( L3 == 3 )  # safer than table(L3)[3], which picks the third entry even if a value is missing
num_3s
n <- length( L3 )
n
phat <- num_3s / n
phat
#
# using the sample proportion phat
# we note that n*phat > 10 and n*(1-phat) > 10
# so we can use the normal approximation ...
#
# find the estimated standard deviation of sample proportions
sdsp <- sqrt( phat*(1-phat)/n)
sdsp
#
#  find the z value that has 2.5% of the area to its right
z <- qnorm(0.025, lower.tail=FALSE)
z
#  Then our confidence interval is
#     ( phat - z*sdsp, phat + z*sdsp)
phat - z * sdsp
phat + z * sdsp
#
#   Of course, we could use the function
#   ci_prop() to do this in one easy step.
#
source("../ci_prop.R")
ci_prop( n , num_3s,  0.95)
#
#
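#   For illustration only: a minimal sketch of what a
#   one-step function of this kind might do, following
#   the same steps used above. This is NOT the code in
#   ci_prop.R; the name wald_ci and its arguments are
#   hypothetical.
#
wald_ci <- function( n, x, conf = 0.95 ) {
  x  <- as.numeric( x )              # drop any table names
  p  <- x / n                        # sample proportion
  # same check as above for the normal approximation
  if ( n*p <= 10 || n*(1-p) <= 10 ) {
    warning("normal approximation may be poor")
  }
  se <- sqrt( p*(1-p)/n )            # est. std. dev. of sample proportions
  zc <- qnorm( (1-conf)/2, lower.tail=FALSE )
  c( low = p - zc*se, high = p + zc*se )
}
# this should agree with the two values computed by hand
wald_ci( n, num_3s )
#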
#################################
# go back and execute lines 34-69 many more times.
#    Each time you get a different random sample.
#    For each sample you get a different 95%
#    confidence interval for the proportion of 3's
#    in the population.  By the way, the 
#    true proportion of 3's is about  0.1998401    
#################################
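#
#  A sketch of automating that repetition, assuming L1
#  still holds the population generated above. The names
#  true_p, reps, zc, and covers are hypothetical, and the
#  index range uses length(L1)+1 on the assumption that
#  it matches the 4126 used earlier.
#
true_p <- mean( L1 == 3 )    # exact proportion of 3's in the population
reps   <- 1000               # number of repeated samples
zc     <- qnorm(0.025, lower.tail=FALSE)
covers <- replicate( reps, {
  s  <- L1[ as.integer( runif(97, 1, length(L1)+1) ) ]
  p  <- mean( s == 3 )       # sample proportion of 3's
  se <- sqrt( p*(1-p)/97 )
  ( p - zc*se <= true_p ) & ( true_p <= p + zc*se )
})
# fraction of the intervals that contain true_p;
# this typically lands near 0.95, though not exactly
mean( covers )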