Doing Tests of Proportions Quiz
Qn1.
Download the file deviceprefs.csv from the course materials. This file describes a study in which people with and without disabilities indicated their preferences for touchpads or trackballs as computer input devices. You will use R to analyze this file to answer the questions in this quiz. With this and every quiz in this course, you can find what you need by understanding and mimicking coursera.R, the R code file used in the lecture. This first question gives you credit for getting R, RStudio, and deviceprefs.csv ready to go. Are you ready to proceed?

Qn2. How many subjects’ preferences were recorded?
Note: For this and every other quiz in this course, when you miss a question, code will be revealed to help you. However, you cannot usually just copy the code verbatim. For example, the variable “df” is used throughout these code snippets to refer to the “data frame”, the term R uses for the variable that holds the .csv file that you read in. If you read your .csv file into a variable with a name other than “df”, then you will need to use your variable name, not “df”. Similarly, the variable “m” is used in these code snippets to hold a fitted statistical model. If you use a variable name other than “m” for your model, you will need to change “m” to be your variable name.
Be sure that when you copy the code provided for missed questions, you understand that code by looking up the documentation for the R functions used. You can do that with the question mark operator after loading the function’s library. For example, if the function is “foo” then you would do:
?foo
As stated, this only works if the library that defines “foo” is loaded. You load libraries into memory with the library command:
library(foolib)
And this won’t work unless the “foolib” package is installed. You install packages using install.packages, which brings the package files onto your computer and should only have to be done once:
install.packages("foolib")
Does the data table indicate a one-sample proportion or a two-sample proportion?
As described, the data table shows input device preferences of certain people with and without disability. How many subjects have a disability?
Ignoring for a moment disability status, perform a one-sample chi-square test to see whether the proportion of subjects who preferred the trackball (or touchpad) differed significantly from chance. To the nearest hundredth (two digits), what is the chi-square statistic? Hint: Note that this question is not asking for the p-value!
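
As a rough illustration of the one-sample test described above, the sketch below runs chisq.test on made-up counts. The counts (30 and 10) are hypothetical and are not taken from deviceprefs.csv; with the real data you would tabulate the preference column of your own data frame instead.

```r
# Hypothetical counts for illustration only (NOT from deviceprefs.csv).
prefs <- c(touchpad = 30, trackball = 10)

# One-sample chi-square test against equal (chance) proportions.
# With no 'p' argument, chisq.test treats all categories as equally likely.
res <- chisq.test(prefs)

res$statistic  # the chi-square statistic (what the question asks for)
res$p.value    # the p-value (not what the question asks for)
```

Note that the statistic and the p-value are separate fields of the result, which is why the hint warns against reporting the p-value.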

For people without disabilities, perform a binomial test to see whether their preference for touchpads differed significantly from chance. To the nearest ten-thousandth (four digits), what is the p-value? Hint: Run a binomial test comparing the sum of rows of people without disabilities who prefer the touchpad against the number of all rows of people without disabilities. With two possible preferences, touchpad and trackball, the chance probability would be 1/2. Do not correct for multiple comparisons; consider this a single test on a subset of the data.

For people with disabilities, perform a binomial test to see whether their preference for touchpads differed significantly from chance. To the nearest ten-thousandth (four digits), what is the p-value? Hint: Run a binomial test comparing the sum of rows of people with disabilities who prefer the touchpad against the number of all rows of people with disabilities. With two possible preferences, touchpad and trackball, the chance probability would be 1/2. Do not correct for multiple comparisons; consider this a single test on a subset of the data.
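
The binomial-test hints above can be sketched roughly as follows. The two counts are hypothetical placeholders, not values from deviceprefs.csv; in practice you would compute them by counting rows of your data frame for the subgroup in question.

```r
# Hypothetical counts for illustration only (NOT from deviceprefs.csv).
n_touchpad <- 18   # rows in the subgroup that prefer the touchpad
n_total    <- 20   # all rows in the subgroup

# Exact two-sided binomial test against a chance probability of 1/2.
res <- binom.test(n_touchpad, n_total, p = 0.5)

round(res$p.value, 4)  # p-value to four digits, as the questions request
```

The same call is used for either subgroup; only the two counts change.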

Conduct a two-sample chi-square test of proportions on preferences by disability status. To the nearest hundredth (two digits), what is the chi-square statistic?

Perform a two-sample G-test on preferences by disability status. To the nearest hundredth (two digits), what is the G statistic? Hint: Use the RVAideMemoire library and its G.test function.

Perform Fisher’s exact test on preferences by disability status. To the nearest ten-thousandth (four digits), what is the p-value?
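
For the three two-sample tests above, here is a minimal base-R sketch on a made-up 2x2 table. The counts are hypothetical and do not come from deviceprefs.csv. The G statistic is computed by hand here so the example has no package dependency; the quiz hint points to G.test in RVAideMemoire for the real analysis.

```r
# Hypothetical 2x2 table of counts for illustration only (NOT from deviceprefs.csv).
# matrix() fills column by column: first the touchpad column, then trackball.
tab <- matrix(c(10, 20,
                15,  5),
              nrow = 2,
              dimnames = list(Disability = c("no", "yes"),
                              Pref = c("touchpad", "trackball")))

# Pearson chi-square test. For 2x2 tables R applies Yates' continuity
# correction by default; correct = FALSE gives the uncorrected statistic.
chi <- chisq.test(tab, correct = FALSE)

# G statistic (likelihood-ratio chi-square) from observed and expected counts:
# G = 2 * sum(O * log(O / E)).
G <- 2 * sum(tab * log(tab / chi$expected))

# Fisher's exact test on the same table.
fis <- fisher.test(tab)

chi$statistic
G
fis$p.value
```

Whether the continuity correction should be on or off depends on what the lecture code does, so check coursera.R before rounding your answer.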

PSY 223 Final Project Guidelines and Rubric

The final project for this course is the creation of a statistical analysis report. The two research courses (PSY 223 and PSY 224) will demystify statistics and research methods in order to show that they are based on simple principles that apply to situations in the social sciences. In psychology, we need to distinguish what is “real” from what is “not real but looks real.” Is this patient really depressed? Does this form of group treatment of adolescents work better than a different form of treatment?

In this summative assessment, you will choose a scenario from a given set to be the basis for your statistical analysis report. Within the scenario, you will be given a data set based on two groups. You will apply the statistical analysis skills you have learned in this course to interpret the data and write up a report of the results. You will be evaluated not only on your computations but also on your explanation of the interpretation of the data.

The project is divided into three milestones and a final product. The milestones will be submitted at various points throughout the course to scaffold learning and to ensure quality final submissions. These milestones will be submitted in Modules Two, Four, and Five. The final project will be submitted in Module Seven.
In this assignment, you will demonstrate your mastery of the following course outcomes:

Analyze descriptive and inferential statistics for preparing statistically accurate psychological research
Utilize appropriate statistical techniques for computing descriptive statistics and generating graphs regarding statistical analyses of psychological research
Select appropriate statistical procedures for use in statistical analyses regarding psychological research
Interpret the results of statistical analyses of psychological research data for drawing informed conclusions regarding the implications of psychological research
Assess scenarios involving statistical procedures for ensuring alignment with the expectations of the APA Ethical Principles of Psychologists

Find the remaining section of the final paper instructions in the attachment.
1 psy223_final_project_guidelines_and_rubric.pdf

Estimate the linear effect of dose

Background: This second part of the assignment requires some more theoretical work based on fitting a linear regression model to investigate the effect of three dosage levels on an outcome. Suppose a clinical investigator is interested in examining the effect of increasing doses of vitamin D supplement given to individuals who are vitamin D deficient. She performs a randomised trial in which she allocates (at random) volunteers to three groups, receiving 1000 IU (International Units), 2000 IU and 3000 IU of supplement per day for a period of three months, after which the serum levels of a key metabolite of vitamin D called 25(OH)D are measured in each participant. (N.B. This is a hypothetical scenario based on a real question that is current in epidemiology at the moment.)

Question 1
One possible analysis of the data described is to estimate the linear effect of dose, i.e. to assume a linear relationship of expected outcome (labelled Y, as usual) to dose level, which for simplicity we will represent as X = 1, 2, 3 representing doses 1000 IU, 2000 IU, and 3000 IU respectively. To estimate the average rate of change in Y with dose we would fit the simple linear regression model with the standard assumptions for the error term:

$Y_i = B_0 + B_1 x_i + e_i$

The objective is to show (algebraically) that if the sample size allocation between group 1, group 2 and group 3 is 1:1:4 (i.e. $n_1 = n$, $n_2 = n$, $n_3 = 4n$), then the least squares estimate of $B_1$ is

$\hat{B}_1 = (4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2)/7$
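
One possible route to this result is sketched below. It uses the standard least squares slope $S_{xy}/S_{xx}$; the symbols $\bar{x}$, $S_{xx}$ and $S_{xy}$ are notation introduced here for the sketch and are not part of the assignment text.

```latex
\begin{align*}
\bar{x} &= \frac{n\cdot 1 + n\cdot 2 + 4n\cdot 3}{6n} = \frac{5}{2},\\[4pt]
S_{xx} &= \sum_i (x_i-\bar{x})^2
        = n\left(-\tfrac{3}{2}\right)^2 + n\left(-\tfrac{1}{2}\right)^2 + 4n\left(\tfrac{1}{2}\right)^2
        = \frac{7n}{2},\\[4pt]
S_{xy} &= \sum_i (x_i-\bar{x})\,Y_i
        = -\tfrac{3}{2}\,n\bar{Y}_1 - \tfrac{1}{2}\,n\bar{Y}_2 + \tfrac{1}{2}\,(4n)\bar{Y}_3
        = \frac{n}{2}\left(4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2\right),\\[4pt]
\hat{B}_1 &= \frac{S_{xy}}{S_{xx}}
        = \frac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}.
\end{align*}
```

The third line uses the fact that $\sum_i (x_i-\bar{x}) = 0$, so subtracting the overall mean of Y from each $Y_i$ does not change the sum.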

Question 2
The dataset provided contains some simulated data that might have arisen from the study just described, with 15 participants in each of dose groups 1 and 2, and 60 participants in dose group 3. Fit the regression model discussed above and demonstrate that the expression for B1 obtained in Question 1 holds in this sample.
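
The kind of check Question 2 asks for can be sketched in R as follows. The data here are simulated inside the script purely for illustration; they are not the values in dosevd_reg_KA.xlsx, and the intercept, slope and noise level used to generate them are arbitrary assumptions.

```r
# Simulated data for illustration only; NOT the values in dosevd_reg_KA.xlsx.
set.seed(1)
dose <- rep(1:3, times = c(15, 15, 60))               # X = 1, 2, 3 with 1:1:4 allocation
y    <- 20 + 8 * dose + rnorm(length(dose), sd = 5)   # arbitrary linear trend plus noise

# Least squares slope from the fitted simple linear regression.
fit   <- lm(y ~ dose)
slope <- unname(coef(fit)["dose"])

# Slope from the closed-form expression in terms of the three group means.
ybar        <- tapply(y, dose, mean)                  # group means, named "1", "2", "3"
closed_form <- (4 * ybar[["3"]] - 3 * ybar[["1"]] - ybar[["2"]]) / 7

all.equal(slope, closed_form)   # TRUE, up to floating-point error
```

With the real dataset, only the two lines that create dose and y would change; the comparison itself stays the same.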

Dataset for use in this assignment:
dosevd_reg_KA.xlsx

Sample statistics exams from my Mathlab Homework Help

  1. At a hospital nursing station the following information is available about a patient.
    (a) Name: Jim Wood

(b) Age: 17

(c) Weight: 165 lb

(d) Height: 6’1”

(e) Blood type: A

(f) Temperature: 96.8 °F

(g) Condition: Fair

(h) Date of admission: January 21, 1998

(i) Response to treatment: Excellent

For each item of information (a) to (i), list the highest level of measurement as ratio, interval, ordinal, or nominal.

  1. What technique for gathering data (sampling, experiment, simulation or census) do you think was used in each of the following studies?
    (a) The manager of an automobile repair shop selects a random sample of service records and records the total amount of time each vehicle was in the facility.

(b) The same manager tests a computerized diagnostic machine by comparing its performance on a random sample of 20 vehicles with the evaluation of a professional mechanic for the same 20 vehicles.

(c) The same manager surveys every customer who has had a car serviced to determine the quality of the customer’s service and the customer’s level of satisfaction with the service.

(d) An automobile manufacturer uses a computer simulation program to test the aerodynamic properties of a proposed new automobile body design.

(e) A service manager uses computer software to simulate a new arrangement of automotive workstations to see if the arrangement will provide more efficient service.

  1. Describe how you could use a random number table to simulate the experiment of tossing one die 275 times. The results of tossing a die once can be any of the digits 1, 2, 3, 4, 5, or 6.
  2. What technique (observational study or experiment) for gathering data do you think was used in the following study?

In a national forest, 87 deer were caught, tagged, and then released back into the wild. Two weeks later, 62 deer were caught and 43 were found to have tags. From this, it was possible to estimate how many deer live in the forest.
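
As an aside on how such an estimate can be made, the sketch below applies the usual capture-recapture proportion (commonly called the Lincoln-Petersen estimate). The study description does not say which estimator was used, so treat the method as an assumption.

```r
tagged_first  <- 87   # deer caught, tagged, and released
caught_second <- 62   # deer caught two weeks later
tagged_second <- 43   # of those, deer already carrying tags

# Assume the tagged fraction in the second catch (43/62) matches the
# tagged fraction in the whole population (87/N), then solve for N.
n_hat <- tagged_first * caught_second / tagged_second

round(n_hat)   # about 125 deer
```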

  1. Ticket sales for cultural attractions in a metropolitan area were as follows (in thousands of tickets): Opera: 10; Theater: 45; Symphony: 30; Ballet: 8; Other: 7.

(a) Make a circle graph for this data.

(b) Why is a circle graph a good choice for this data? What other type of graph would be appropriate?

  1. Different types of cameras are available to record memories of family, vacations and special moments. A survey of 1,000 cameras purchased last year showed that 250 were 35 mm, 260 were disk, 450 were instant and 40 were other types.

(a) Make a bar graph showing the camera types and the volume of sales.

(b) Make a Pareto chart of the same data.

  1. The first-year students on one floor of a dorm were polled to determine how often they phoned home during the first 8 weeks of the fall term. The results are given below.
  2. 8 6 25 4 21 10 1 24 12 4 16
  3. 2 12 28 14 17 12 1 16 18 18 3
  4. 6 6 12 10 20 9 6 8 6 8 15

(a) Make a stem and leaf display of the data using 2 lines per stem.

(b) What can you say about the frequency of calls home for these first term students?

  1. Statistical Abstract of the United States (117th edition) reported the value of computers and peripherals produced in the United States. The data (in billions of dollars) are as follows: for 1990, 52.6; for 1991, 49.1; for 1992, 54.7; for 1993, 57.9; for 1994, 65.6; for 1995, 81.0.

(a) Organize the data in a table.

(b) Make a time plot of the data.

(c) What does your graph indicate about this data?

  1. A survey of students using a new automated telephone registration process identified the following complaints about the new system. There were 350 complaints about the line being busy. Lack of advising information produced 100 complaints, and difficulty in entering course selection codes correctly produced 35 complaints. Difficulty in changing a previous course selection produced 120 complaints.

(a) Make a Pareto chart of this information.

(b) Based on the chart, what suggestions would you make to improve the system and cut down on the number of complaints?

  1. If you are creating a frequency polygon based on some data in a frequency table, what information from the frequency table would you use to plot the point representing each class of data?

A. the lower class limit and the class frequency
B. the lower class boundary and the class frequency
C. the class midpoint and class width
D. the class frequency and the upper class boundary
E. the class frequency and the class midpoint

  1. Which of the five choices below describes a feature that is true about Pareto charts?

A. The bars in the graph are always displayed in descending order of height from left to right.
B. The bars in the graph can be vertical or horizontal.
C. The bars in the graph always touch.
D. The intervals on the horizontal axis represent equal units of time.
E. Each data value is broken into two parts.

  1. Statistical Abstracts (117th edition) reports gasoline excise taxes, in cents per gallon, in the west (mountain region) as follows:

28 26 9 22 19 18 19 24

Find the mean, the median, and the mode of these taxes.

  1. In the process of tuna fishing, porpoises are sometimes accidentally caught and killed. A U.S. oceanographic institute wants to study the number of porpoises killed in this way. Records from eight commercial tuna fishing fleets gave the following information about the number of porpoises killed in a three-month period:
  2. 6 18 9 0 15 3 10
    (a) Find the range.
    (b) Find the sample mean.
    (c) Find the sample standard deviation.
  3. According to data provided by the Statistical Abstract of the United States (117th edition), the number of daily newspapers in the five states in the Midwest has mean 30.8 with standard deviation 38.3. For five states in the Pacific region, the mean is 61.4 with standard deviation 19.07.

(a) Compute the coefficient of variation for each region.

(b) Which region has the greater variation in the number of newspapers?
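
The coefficient of variation expresses the standard deviation as a percentage of the mean. A minimal R sketch using the figures quoted in the question (the function name cv is just a label chosen here):

```r
# Coefficient of variation: standard deviation as a percentage of the mean.
cv <- function(mean, sd) 100 * sd / mean

cv_midwest <- cv(30.8, 38.3)
cv_pacific <- cv(61.4, 19.07)

round(c(midwest = cv_midwest, pacific = cv_pacific), 1)
```

Because the two regions have different means, comparing these percentages is more informative than comparing the raw standard deviations.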

  1. From years of experience fishing for trout in the Yellowstone River you know that the mean length of trout you catch is 14.7 inches with standard deviation 1.5 inches.

(a) Use Chebyshev’s Theorem to find an interval for the lengths of trout which will contain the lengths of at least 75% of the fish you catch.

(b) Use Chebyshev’s Theorem to find an interval for the lengths of trout which will contain the lengths of at least 93.8% of the fish you catch.
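
Chebyshev’s Theorem guarantees that at least 1 - 1/k^2 of the data lie within k standard deviations of the mean. The sketch below assumes that the 93.8% in part (b) is the rounded value of 93.75%, which corresponds to k = 4; the helper name chebyshev_interval is made up for this example.

```r
# Interval of mean +/- k standard deviations.
chebyshev_interval <- function(mean, sd, k) {
  c(lower = mean - k * sd, upper = mean + k * sd)
}

# k = 2 guarantees at least 1 - 1/4 = 75%.
chebyshev_interval(14.7, 1.5, k = 2)   # 11.7 to 17.7 inches

# k = 4 guarantees at least 1 - 1/16 = 93.75%, which rounds to 93.8%.
chebyshev_interval(14.7, 1.5, k = 4)   # 8.7 to 20.7 inches
```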

  1. A study was done showing the age distribution of people doing volunteer work for a random sample of 545 volunteers.

    Age         14–17   18–24   25–44   45–64   65–80
    Frequency   142     125     72      124     82

    (a) Estimate the sample mean age of volunteers.

(b) Estimate the sample standard deviation.
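
For grouped data like this, a common approach is to let each class midpoint stand in for the ages in that class. The sketch below follows that approach and uses the n - 1 divisor for the sample standard deviation; both choices are assumptions about what the exam expects.

```r
# Class limits and frequencies from the age table.
lower <- c(14, 18, 25, 45, 65)
upper <- c(17, 24, 44, 64, 80)
freq  <- c(142, 125, 72, 124, 82)

mid <- (lower + upper) / 2   # class midpoints stand in for the raw ages
n   <- sum(freq)             # total number of volunteers

mean_est <- sum(freq * mid) / n
sd_est   <- sqrt(sum(freq * (mid - mean_est)^2) / (n - 1))   # sample SD

round(mean_est, 2)
round(sd_est, 2)
```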

  1. In the French class at Eva College a standard weighting is given to the required activities in all sections. These weights are: Final exam: 40%; Midterm: 30%; Attendance: 10%; Language Lab: 20%. Each of the four activities is graded on a 100 point scale. George earned 93 points on the final, 82 points on the midterm, 75 points on attendance and 80 points on language lab. Compute his overall average in his French class.
  2. In one personality assessment test, a group of questions relate to self-acceptance. A random sample of 15 scores on the self-acceptance portion are
    5 20 22 27 30 17 12 15
  3. 9 18 13 12 28 19

(a) Compute the five-number summary and the interquartile range.

(b) Make a box-and-whisker plot.


Use SPSS to produce a scatterplot of maths scores
Multilevel modelling assignment question
This coursework accounts for 10% of the total mark for the portfolio. In addition to the combined marks for each of the portfolio tasks, you will also be graded on the structure, presentation and clarity of the portfolio as a whole. So your work should be professionally presented, with good use of English.
In the real world, you will be expected to communicate the results from a statistical analysis you perform to non-statisticians, so you should conclude each task with a brief explanation of your results, presented in terms a layperson would understand.

Assignment description
This task is in the form of a tutorial based on Heck, Thomas and Tabata (2010). It will take you, step-by-step, through the process of building a multilevel model to explore the effect of socioeconomic status and school attended on the maths scores for a sample of American school students.
The data are presented in the file Mathscores.sav. This task must be performed using SPSS.
The file contains data for 6871 students attending 419 schools.
schcode: School identification code, numbered 1 to 419.
Rid: Identification of each student within each school (non-unique).
id: Unique identifier for each student.
ses: Standardised score on socio-economic index. The scores have been standardised to a mean of zero and s.d. of 1, so zero represents the grand mean socio-economic status across all students represented, and a unit difference represents a difference of 1 standard deviation.
math: The overall percentage score of each student in a standard maths test.

The next three variables are indicators of difference between the schools, and so may be used to explain any random effects we observe.

ses_mean: The mean of the standardised socio-economic scores within the sample from each school.
per4yrc: The percentage of students within each school planning to take a four-year university course after leaving.
public: Whether the school is public (1) or private (0). Note that this is the American meaning of public school, so equivalent to a British state school.

Use SPSS to produce a scatterplot of maths scores against socio-economic status using only the first 80 observations. Modify the plot to add a regression line.

Hint: use Data > Select Cases > Based on time or case range.
What does this suggest about the nature of the relationship between these two variables? [2 marks]

Remove the cases selection and perform a simple regression analysis to show the effect of the socio-economic status on maths scores for all of the students in the sample.

What do the results indicate? How strong is this model?
Based on the standard regression assumptions, explain why the simple regression model may not be valid. [3 marks]
Reproduce the scatterplot (using the subset of 80 students), but this time, set markers by schcode, and add best fit lines for each school represented.

Hint: use the Add Fit Line at Subgroups option.
Use this plot to explain why multilevel modelling may be a better way of analysing this data. [3 marks]
8 marks total for Part 1
Remember to remove the case selection before moving on to the next part.

Null model: random intercepts, no predictors

In this part we will build a model to show how allowing random intercepts for the different schools allows us to build a more appropriate model.

Select Analyze > Mixed Models > Linear.
Add schcode to the Subjects window. Continue. Select math as your dependent variable but don’t add any predictors.
Click the Random… button. Check that Variance Components is selected (otherwise we will also have random slopes), and an intercept is included. Add schcode to the Combinations box. Continue.

Click the Estimation button and select Maximum Likelihood. This is necessary for comparing nested models – we cannot do this if we use the default restricted ML. Continue.
Click the Statistics button and select Parameter estimates, Tests for covariance parameters, and Covariances of random effects. Continue.

Click OK.
Note the deviance and number of parameters. [1 mark]
What effect has this had on the estimate of the fixed (overall) intercept in comparison with the regression model? [1 mark]

The Estimates of Covariance Parameters table details tests for the within-group effect (called Residual) and the between-groups effect (Intercept).

Given the null hypotheses of “no effect”, interpret these results in the context of the data. [2 marks]
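
The deviance and parameter count noted in this part are what a comparison of nested models fitted by maximum likelihood relies on. The task itself must be done in SPSS; the short R sketch below only illustrates the arithmetic of such a likelihood-ratio comparison, and every number in it is a hypothetical placeholder rather than output from Mathscores.sav.

```r
# Hypothetical values for illustration only; read the real ones from the SPSS output.
dev_small <- 48900; k_small <- 3   # simpler model: deviance (-2 log-likelihood), parameters
dev_large <- 48850; k_large <- 4   # larger model that contains the simpler one

# Likelihood-ratio statistic: drop in deviance, compared with a chi-square
# distribution whose degrees of freedom equal the number of added parameters.
lr_stat <- dev_small - dev_large
df      <- k_large - k_small
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

c(lr_stat = lr_stat, df = df, p_value = p_value)
```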