# Here is a problem taken from the "real world", handled here
# with a simplified approach.
# In order to decide which of two colors to use on a
# company web site the design team calls together 100
# people. They divide the people into two 50 person groups.
# The first group is shown color one and the second group is
# shown color two. Each group is asked to rate the
# color they have been shown on a scale of 0 to 6 where
# 0 means they dislike the color and 6 means that they
# really like the color. The first group has a mean score
# of 4.8 while the second group has a mean score of 4.6.
# Some in the design team say "The first color is significantly
# better than the second color." Others in the design
# team say "The scores are too close to make that conclusion."
# What can we say about this, from a statistical perspective?
# First, the sampling is questionable. We do not know if
# the 100 people are in any way representative of people
# who will use the web site.
#
# After that, we could form a
# null hypothesis that the mean scores for the web site
# users, if we could find and query all of them, would
# be the same value for the two colors.
# Then our alternative is that the
# mean score for all web site users would be higher for
# the first color than it would be for the second color.
#
# Then we would turn to our t-test for the equality of
# two means, hypoth_2test_unknown(). However,
# to use that function we need to know the desired level
# of significance and we need to know the standard
# deviation of the two 50-person sample scores.
#
# Of these
# the former is easy. We will run the test at the 0.05
# level of significance.
#
# What about the latter issue:
# knowing the standard deviations? Let us see how those
# sample standard deviations might affect the analysis.
# First generate two samples, one with mean 4.8 and
# the other with mean 4.6. To get a large standard
# deviation in each, use almost all 5's and then
# include a few 1's and a 3.
first <- rep(5,50)
first[1]<-1
first[2]<-1
first[3]<-3
first #look at the first
mean( first)
sd(first)
#
# Then, create a second sample, starting from the first
# and then changing just 3 values so that we alter the mean
# value of that second sample
second <- first
second[4]<-1
second[5]<-1
second[6]<-3
second # look at the second
mean(second)
sd(second)
# Now do a two population test with the null
# hypothesis that the two means are the same versus
# the alternative that the first mean is greater
# than the second.
#
source("../hypo_2unknown.R")
hypoth_2test_unknown(sd(first), 50, mean(first),
                     sd(second), 50, mean(second),
                     1, 0.05)
# The full Attd value of 0.16 means that if the null
# hypothesis is true then we would see a difference in
# sample means this large, or larger, in about 1/6 of
# repeated samples of this size.
# Therefore, we do not have enough evidence
# in these two samples to reject the null hypothesis.
# Note that "full Attd" is the attained, or achieved,
# significance computed using the full degrees of
# freedom.
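#
# As a cross-check, the sketch below recomputes that first test
# by hand. It ASSUMES that hypoth_2test_unknown() performs the
# usual unequal-variance (Welch) two-sample t-test and that the
# "full" degrees of freedom are the Welch-Satterthwaite value;
# neither is confirmed here. The chk_ names are illustrative only.
chk_first  <- c(1, 1, 3, rep(5, 47))            # same values as first
chk_second <- c(1, 1, 3, 1, 1, 3, rep(5, 44))   # same values as second
chk_n1 <- length(chk_first)
chk_n2 <- length(chk_second)
# each sample's contribution to the variance of the difference in means
chk_v1 <- var(chk_first) / chk_n1
chk_v2 <- var(chk_second) / chk_n2
chk_se <- sqrt(chk_v1 + chk_v2)                 # standard error of the difference
chk_t  <- (mean(chk_first) - mean(chk_second)) / chk_se
# Welch-Satterthwaite degrees of freedom
chk_df <- (chk_v1 + chk_v2)^2 /
          (chk_v1^2 / (chk_n1 - 1) + chk_v2^2 / (chk_n2 - 1))
# one-sided attained significance (alternative: first mean is greater)
chk_p  <- pt(chk_t, chk_df, lower.tail = FALSE)
c(t = chk_t, df = chk_df, p = chk_p)
# base R's t.test() uses the Welch form by default, so it should agree
chk_builtin <- t.test(chk_first, chk_second, alternative = "greater")
chk_builtin$p.value
# Under those assumptions the test statistic comes out at 1 and the
# attained significance at about 0.16.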
# Now, do this again, but with samples called third and fourth.
# For these, however, we will
# make the standard deviation small by using only scores
# of 5 and 4.
third <- c(rep(5,40),rep(4,10))
third # look at third
mean( third )
sd(third)
fourth<- c(rep(5,30),rep(4,20))
fourth #look at fourth
mean(fourth)
sd(fourth)
# Now run the same test, but with the new samples
hypoth_2test_unknown(sd(third), 50, mean(third),
                     sd(fourth), 50, mean(fourth),
                     1, 0.05)
# The full Attd value of 0.0146 tells us that if the
# null hypothesis were true then we would get two samples
# showing this difference, or one more extreme than this,
# in about 1.46% of the samples. That is too rare!
# Therefore, we would reject the null hypothesis at
# the 0.05 level, in favor of the alternative which says
# that the mean of the third is higher than the mean
# of the fourth.
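#
# The two runs differ only in how spread out the scores are. The
# sketch below makes that dependence explicit: it holds the means
# at 4.8 and 4.6 and both sample sizes at 50, gives both groups the
# same HYPOTHETICAL standard deviation, and computes the one-sided
# attained significance for each. It assumes the unequal-variance
# (Welch) t-test; the sweep_ names and the sd values are illustrative.
sweep_sd   <- c(0.4, 0.6, 0.8, 1.0, 1.2)   # hypothetical common sd values
sweep_n    <- 50
sweep_diff <- 4.8 - 4.6
sweep_se   <- sweep_sd * sqrt(2 / sweep_n) # standard error of the difference
sweep_t    <- sweep_diff / sweep_se
# with equal sds and equal sizes the Welch df reduces to 2 * (n - 1)
sweep_df   <- 2 * (sweep_n - 1)
sweep_p    <- pt(sweep_t, sweep_df, lower.tail = FALSE)
data.frame(sd = sweep_sd, t = round(sweep_t, 3), p = round(sweep_p, 4))
# The attained significance grows as the standard deviation grows, so
# the same 0.2 difference in means is significant at the 0.05 level
# for the smallest sd in this list but not for the largest.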
# Now let us look at the real data. First we can read in
# all of the data.
clean_data <- read.csv("josh_real_clean.csv")
# then look at it.
clean_data
str( clean_data )
# The two groups that we are examining are the FB color
# (items 1-36 and 109-122) and the GB color (items 37-71
# and 123-137). Those values represent the responses
# from the first group (all 50 people) and the second
# group (another 50 people).
# However, looking at the data we have a real problem.
# The $Confidence score is the result of asking "How confident
# are you based on the color?" The $Unease score is the
# result of asking "How uneasy are you based on the color?"
# These are opposite readings. The more "confident" you are
# the less "uneasy" you should be. If you respond with a 6
# for both questions then it is clear that you are not being
# truthful. You are just marking down answers.
# The same is true for $Trust and $Untrust. Items 109-137
# are clearly responses that are so conflicted that they are
# meaningless. We should ignore them.
# We are only interested in the values where the useif.0
# value is 0, and then we only want to look at the
# Confidence scores.
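#
# One way to screen for contradictory answers of that kind is
# sketched below on a small made-up data frame. demo_resp and its
# values are HYPOTHETICAL, not taken from josh_real_clean.csv, and
# the cutoff of 5 on the 0 to 6 scale is an assumption as well.
demo_resp <- data.frame(
  Confidence = c(6, 5, 2, 6, 1),
  Unease     = c(6, 1, 5, 0, 6),
  Trust      = c(6, 4, 1, 5, 2),
  Untrust    = c(6, 2, 5, 6, 5)
)
demo_cut <- 5   # assumed threshold for a "high" score
# a row is conflicted if both scores in an opposing pair are high
demo_conflict <-
  (demo_resp$Confidence >= demo_cut & demo_resp$Unease  >= demo_cut) |
  (demo_resp$Trust      >= demo_cut & demo_resp$Untrust >= demo_cut)
demo_conflict
# keep only the rows that are not conflicted
demo_resp[!demo_conflict, ]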
# The first group we want is the FB items, 1 through 36
clean_1 <- clean_data$Confidence[1:36]
mean( clean_1 )
sd( clean_1 )
# the second group we want is the GB items, 37 through 71
clean_2 <-clean_data$Confidence[37:71]
mean(clean_2)
sd(clean_2)
# Now we can run the test
hypoth_2test_unknown(sd(clean_1), length(clean_1), mean(clean_1),
                     sd(clean_2), length(clean_2), mean(clean_2),
                     1, 0.05)
# Based on the sample that we have, excluding clearly
# bad data and using the resulting means and standard
# deviations, the result is that we do not have evidence to
# reject the null hypothesis of "no difference between colors"
# in favor of the alternative "the FB color is better than
# the GB color" at the 0.05 level of significance.
######### the real case
# It turns out there was a third color tested, the HB's.
# Let us pull out those good items, 72 through 108.
# What if we compare the FB and HB scores on confidence?
clean_3 <-clean_data$Confidence[72:108]
mean(clean_3)
sd(clean_3)
hypoth_2test_unknown(sd(clean_1), length(clean_1), mean( clean_1),
sd(clean_3), length(clean_3), mean(clean_3),
1, 0.05)
# In this comparison, with an attained level of 0.0115, we would
# have significant evidence, at the 0.05 level, to reject the
# idea that the two colors, FB and HB are viewed in the same
# way in favor of the hypothesis that the FB color
# engenders more confidence than does the HB color.
# We could go on to test GB and HB, but even the step above
# should not have been taken. What we really want to be able
# to ask is whether all the colors are the same or whether
# there is some difference. We should not be looking at
# different pairs, FB vs. GB, FB vs. HB, and then GB vs. HB.
# The appropriate test is called an ANalysis Of VAriance, or
# ANOVA, and at this time that is beyond the material of this
# course.
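#
# One reason to avoid a string of pairwise tests: each test run at
# the 0.05 level carries its own chance of a false rejection, and
# those chances accumulate. The sketch below ASSUMES the three
# pairwise tests are independent, which is only an approximation
# here because each color group appears in two of them. The fam_
# names are illustrative.
fam_alpha <- 0.05
fam_k     <- choose(3, 2)   # number of pairs among three colors
# chance of at least one false rejection across all the pairwise tests
fam_rate  <- 1 - (1 - fam_alpha)^fam_k
fam_rate
# Under that assumption the overall chance of a false rejection is
# about 0.14, well above the 0.05 level used for each single test.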