Posts tagged with sas statistics

Disaggregate Choice Model project
You have disaggregate data on the brand choices (brands denoted 1, 2 and 3) of 50 consumers in the cereal category. Let’s say brands 1, 2 and 3 are Kellogg’s, Post and General Mills, respectively.

The details of the dataset are as follows (a snapshot is provided below):

customer_id     price    brand    Brand_number    int1    int2    int3
          1    1.6481        0               1       1       0       0
          2    1.5123        0               1       1       0       0
          3    1.9469        0               1       1       0       0
          4    1.8847        0               1       1       0       0
          5    1.2578        0               1       1       0       0
          6    1.1513        1               1       1       0       0
          7    1.0651        1               1       1       0       0
          8    0.8359        1               1       1       0       0
          9    1.1679        1               1       1       0       0
         10    2.3237        0               1       1       0       0
         11    1.3236        0               1       1       0       0
         12    2.0052        0               1       1       0       0
         13    1.8917        0               1       1       0       0

Customer_id: This denotes/indexes the customer who is making the purchase.
Price: This indicates the price per pound of the brand.
Brand: If the particular brand is chosen by the customer, then this takes the value of 1; if the brand is not chosen then this takes the value of 0.
Brand_number: This number tells you which brand it is: brands 1, 2 and 3 are Kellogg’s, Post and General Mills respectively.
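Int1, int2 and int3: These are the brand-specific intercept terms (dummy variables) used in the model. In the snapshot, int1 equals 1 on the rows for brand 1, while int2 and int3 equal 0.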

The objective of this assignment is to model the consumer’s choice of cereal brands using brands’ pricing information. Estimate the model and interpret the results of the model.

You should use proc mdc to do this. See the class notes for the appropriate code.
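
As a rough illustration of the kind of model involved, the sketch below computes conditional-logit choice probabilities for three brands from brand intercepts and a single price coefficient. It is written in Python rather than SAS, and every number in it (prices, intercepts, price coefficient) is a made-up assumption, not an estimate from the assignment data. Fixing the intercept of brand 3 at 0 is likewise an assumed normalization.

```python
import math

def choice_probabilities(prices, intercepts, price_coef):
    """Conditional logit: P(j) = exp(V_j) / sum_k exp(V_k), with V_j = a_j + b * price_j."""
    utilities = [a + price_coef * p for a, p in zip(intercepts, prices)]
    # Subtract the largest utility before exponentiating for numerical stability.
    top = max(utilities)
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values, NOT estimates from the assignment data.
prices = [1.65, 1.20, 1.40]      # price per pound for brands 1, 2, 3
intercepts = [0.5, 0.2, 0.0]     # brand 3 is the reference (intercept fixed at 0)
price_coef = -2.0                # negative: a higher price lowers utility

probs = choice_probabilities(prices, intercepts, price_coef)
for brand, p in enumerate(probs, start=1):
    print(f"brand {brand}: P(choice) = {p:.3f}")
```

With a negative price coefficient, lowering one brand's price raises its probability and lowers the other two, which is the pattern to look for when interpreting the estimated model.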

STA 9700: Homework 2

Reading Assignment

Read STA 9700 Lecture Notes 2. Read them again and write questions in the margins.
(There is some related material in Kutner, pp. 2-27.)
STA 9708 LN 5 (Expectation and variance of random variables)

Questions based on STA 9700 Lecture Notes 2
2.1 Looking at Fig. 2.1 in Lecture Notes 2, we see that there is a general rise in the NetWt of the bags as the Count increases. While the phrase "general rise" is not clearly defined, it is certainly better than the following commonplace description, "Bags with more M&M's are heavier." That statement is far too simplistic!

(a) The data for Fig. 2.1 is shown on pages 15-17 of Lecture Notes 2. Using the data, give several examples of pairs of bags for which the statement "Bags with more M&M's are heavier" is false.

(b) Having shown that not all bags with more M&M's are heavier than all bags with fewer M&M's, consider this next vague description, "The average bag containing 18 M&M's weighs more than the average bag containing 17 M&M's." What is vague about that statement? Hint: which bag is the average bag? What is the definition of the average bag? (That is as hard as defining or locating the average American, which should be easy because we hear about that dude every day on the news.)

(c) Critique this statement: “Since on page 12 the sample slope is 1.276 when regressing net weight on count for the 192 bags, then the sample average for bags with Count=18 must be higher than for Count=17.” Then find a counterexample in the data set itself!

(d) What statement are we struggling to make here about the relationship between the sub-populations of Net Weights and their Count?

2.2 Putting together the BigMM SAS program and the following Proc Reg routine, we can create a SAS program that computes the sample slope, the sample intercept, and the root mean square error for each of the 8 groups of bags of M&M's (there are 24 bags per group), outputs those statistics to a SAS file, and prints the file.

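               /* Regress NetWt on Count separately within each Group.    */
               /* BY processing expects the data to be sorted by Group.   */
               proc reg outest=LTatum;
               model NetWt=Count;
               by Group;
               run;
               /* Print the saved estimates, one row per Group. */
               proc print data=LTatum; run;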

The Proc Reg option "outest=LTatum" instructs SAS to save the regression statistics (or "estimates") into a SAS file named "LTatum". The output is shown below due to difficulties with SAS, but I would be delighted if you are able to produce it yourself! The sample slopes are in the Count column.

Obs    Group    _MODEL_    _TYPE_    _DEPVAR_     _RMSE_    Intercept     Count   NetWt

 1        2     MODEL1     PARMS      NetWt      1.52202     25.2154     1.28176     -1
 2        3     MODEL1     PARMS      NetWt      0.94023     27.2769     1.16531     -1
 3        5     MODEL1     PARMS      NetWt      0.96081     17.6571     1.65238     -1
 4        6     MODEL1     PARMS      NetWt      1.01435     19.1121     1.59741     -1
 5        7     MODEL1     PARMS      NetWt      1.53226     26.1459     1.22875     -1
 6        8     MODEL1     PARMS      NetWt      1.09972     28.7744     1.11778     -1
 7        9     MODEL1     PARMS      NetWt      0.99709     22.1760     1.42708     -1
 8       10     MODEL1     PARMS      NetWt      1.10568     26.5912     1.18456     -1
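
For readers without SAS access, the per-group computation can be sketched in plain Python. The rows below are invented for illustration and are not the BigMM data; the group labels "A" and "B" and the helper name `simple_ols` are likewise assumptions. The sketch fits a least-squares line within each group and reports the intercept, the slope, and the root mean square error with n - 2 degrees of freedom.

```python
import math
from collections import defaultdict

def simple_ols(xs, ys):
    """Return (intercept, slope, rmse) for the least-squares line y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    rmse = math.sqrt(sse / (n - 2))  # error sum of squares divided by n - 2
    return b0, b1, rmse

# Made-up (group, count, net weight) rows; NOT the BigMM data.
rows = [
    ("A", 17, 47.0), ("A", 18, 48.5), ("A", 19, 49.0), ("A", 20, 51.5),
    ("B", 17, 46.0), ("B", 18, 49.0), ("B", 19, 50.0), ("B", 21, 52.0),
]

# Collect the counts and net weights separately for each group.
by_group = defaultdict(lambda: ([], []))
for group, count, net_wt in rows:
    by_group[group][0].append(count)
    by_group[group][1].append(net_wt)

estimates = {g: simple_ols(xs, ys) for g, (xs, ys) in by_group.items()}
for g, (b0, b1, rmse) in sorted(estimates.items()):
    print(f"Group {g}: Intercept={b0:.4f}  Count={b1:.4f}  RMSE={rmse:.4f}")
```

Each printed row plays the role of one row of the table above: one intercept, one slope, and one RMSE per group.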

(a) You now have 8 different sample slopes, or 8 different values for b1. These can be viewed as 8 values drawn from what population? (Hint: You need The Story of Many Possible Samples.)
(b) Imagine that, for our production run of 10,000 bags of Peanut M&M's, we regressed the 10,000 net weights on their respective 10,000 counts. What would we call the resulting intercept and slope? Show the answer in words and Greek letters.
(c) Using The Story of Many Possible Samples, explain what it would mean to say that b1 is an unbiased estimator.
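
One way to make the idea of many possible samples concrete is a small simulation. The sketch below builds a hypothetical production run of 10,000 bags (the intercept 20.0, slope 1.3, and error standard deviation 1.0 are assumptions chosen for illustration, not values from the lecture notes), then repeatedly draws samples of 24 bags and records the sample slope from each one.

```python
import random
import statistics

def sample_slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

rng = random.Random(9700)  # fixed seed so the run is repeatable

# Hypothetical production run; these numbers are assumptions, NOT the BigMM data.
TRUE_INTERCEPT, TRUE_SLOPE, ERROR_SD = 20.0, 1.3, 1.0
population = []
for _ in range(10_000):
    count = rng.randint(15, 23)
    net_wt = TRUE_INTERCEPT + TRUE_SLOPE * count + rng.gauss(0.0, ERROR_SD)
    population.append((count, net_wt))

# Slope from regressing all 10,000 net weights on their counts.
pop_slope = sample_slope(*zip(*population))

# Draw many possible samples of 24 bags and record each sample slope.
slopes = []
for _ in range(2_000):
    xs, ys = zip(*rng.sample(population, 24))
    slopes.append(sample_slope(xs, ys))

print(f"slope using all 10,000 bags: {pop_slope:.4f}")
print(f"mean of the sample slopes:   {statistics.fmean(slopes):.4f}")
print(f"std. dev. of sample slopes:  {statistics.stdev(slopes):.4f}")
```

Individual sample slopes scatter noticeably from sample to sample, much like the 8 values in the Count column, while their long-run average sits close to the slope computed from the whole production run.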

2.3 Refer to the SAS output on page 12, for the regression using all 192 bags.
(a) Compute the value for count=18.
(b) What is estimated by b1?
(c) How is the value related to ?

Expected Value and Variance Review Questions

2.4 For a roll of a fair die with 4 sides, numbered 1 to 4, find the expected value and the variance.
2.5 Find the probability distribution for the average of two rolls of a fair die with four sides. Then, compute the expected value and variance of the average from the distribution.
2.6 How are the answers to question 2.5 related to those of question 2.4?
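
The mechanics behind these review questions can be sketched with exact fractions. The example deliberately uses a six-sided die rather than the four-sided die in the questions, and the helper names (`fair_die`, `mean_of_two_rolls`, `expected_value`, `variance`) are illustrative assumptions, not anything from the lecture notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def fair_die(sides):
    """Distribution of one roll: each face 1..sides has probability 1/sides."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}

def mean_of_two_rolls(sides):
    """Exact distribution of the average of two independent rolls."""
    dist = Counter()
    for a, b in product(range(1, sides + 1), repeat=2):
        dist[Fraction(a + b, 2)] += Fraction(1, sides * sides)
    return dict(dist)

def expected_value(dist):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var[X] = sum of (x - E[X])^2 * P(X = x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

one = fair_die(6)
avg = mean_of_two_rolls(6)
print("one roll:        E =", expected_value(one), " Var =", variance(one))
print("average of two:  E =", expected_value(avg), " Var =", variance(avg))
```

Calling the same functions with `sides=4` applies the computation to the die in the questions, which makes it easy to check hand calculations.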

2.7 Generic Calculus Questions; warming up to least squares: Find the derivative with respect to x of the following functions:

(a) y = x^2
(b) y = (4x + 3)^2
(c) y = (-3x^2 + x)
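
To see why derivatives of squared expressions are a warm-up for least squares, here is a minimal sketch under a simplifying assumption: a line through the origin, y = b x, fitted to points (x_i, y_i) with at least one nonzero x_i. The symbols S, b, x_i and y_i are introduced here for illustration and may differ from the notation in the lecture notes.

```latex
\begin{align*}
S(b) &= \sum_{i=1}^{n} (y_i - b\,x_i)^2 \\
\frac{dS}{db} &= \sum_{i=1}^{n} 2\,(y_i - b\,x_i)\,(-x_i)
              = -2 \sum_{i=1}^{n} x_i\,(y_i - b\,x_i) \\
\frac{dS}{db} = 0 \;&\Longrightarrow\; b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
\end{align*}
```

Each term of the sum is differentiated with the chain rule, the same rule needed for the squared expressions above.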

2.8 The R function

            lm(y~x) 

will regress y on x, and the function

          summary(lm(y~x)) 

produces output similar to the SAS regression output. For BigMM, see if you can get output with values similar to those given by SAS on page 16. Locate the estimate of the variance of epsilon.