# [[EDUC 7215]] Assignment 7
Jethro Jones
Also available at [drjethro.com/7215ass7](https://drjethro.com/7215ass7)
## Assignment
- Use the dataset BEHAVIOR that we have been using for all our work in this course to perform a simple linear regression (SLR) model analysis.
- Analyze PEB (dependent variable) as a simple linear function of PN (independent or continuous variable).
- Use a Type I error rate of 0.05 or 5% to define statistical significance.
- Test the following hypotheses to see whether each parameter (β0, β1) differs from zero at the Type I error rate of 0.05 or 5%.
- Identify possible influential outliers using Cook's D plot.
- Include summary statistics.
### Research Question:
Is pro-environmental behavior (PEB) a simple linear function of PN for survey respondents in the Missouri area?
### Methods
I clicked Tasks and Utilities, then Tasks, then Linear Models, then Linear Regression. I selected our work set from my work library. Under Roles, I selected PEB for the dependent variable and PN for the continuous variable. Under the Model tab, I selected PN and clicked Add. Under Options, I selected Individual plots for both the diagnostic and residual plots. After I clicked Run, I opened the results in a new tab, saved the graphs, and took the screenshots you'll see below. I used an AI renaming tool to rename the screenshots and graphs appropriately.
I then used Tasks and Utilities, then Tasks, then Statistics, then Summary Statistics to generate summary statistics for PEB and PN. Under Options, then Basic Statistics, I checked Mean, Standard Deviation, Minimum Value, Maximum Value, and Median. Under Additional Statistics, I checked 95% Confidence Limits for the mean. Under Plots, I checked the Comparative box plot. After I clicked Run, I opened the results in a new tab, saved the graphs, and took the screenshots you'll see below. I used an AI renaming tool to rename the screenshots and graphs appropriately.
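To make clear what the Linear Regression task estimates, here is a minimal Python sketch of ordinary least squares with one predictor. It is not the SAS Studio procedure and it does not use the BEHAVIOR data: the function name `fit_slr` and the five toy (x, y) pairs are hypothetical and exist only for illustration.

```python
def fit_slr(x, y):
    """Fit y = b0 + b1*x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, R-square is the squared correlation
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical toy data, NOT the BEHAVIOR dataset
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]

b0, b1, r2 = fit_slr(toy_x, toy_y)
print(f"intercept={b0:.2f} slope={b1:.2f} R-square={r2:.2f}")
```

In the SAS output, the intercept and slope correspond to the Parameter Estimates table and R-square to the fit statistics.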
### Assumptions of Normality
The histogram of the residuals and the Q-Q plot below show that the residuals follow a bell curve and align with the reference line, respectively. There are some small deviations at the ends of the line on the Q-Q plot, and a little discrepancy on the histogram, but not enough to say the residuals are not normally distributed.
![[2025-04-01 Distribution of Residuals for PEB.png]]
![[2025-04-01 QQPlotResidualsPEB.png]]
### Summary Statistics
The summary statistics for PEB and PN can be seen in the table below. The mean PEB score is 2.89 and the mean PN score is 4.74. We can be 95% confident that the true population mean PEB score falls between 2.83 and 2.95, and 95% confident that the true population mean PN score falls between 4.63 and 4.86. Our sample size is N=379.
![[2025-04-01 Statistical_Summary_Table.png]]
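As a quick sanity check on the reported confidence limits, the sketch below confirms that each interval is centered on its mean and backs out the standard error each interval implies. It assumes a normal-approximation multiplier of 1.96 (the exact t multiplier for 378 degrees of freedom is slightly larger), and it uses the rounded values quoted in the text, so the results are approximate. The function name `ci_summary` is hypothetical.

```python
def ci_summary(lower, upper, z=1.96):
    """Return (midpoint, half_width, implied_standard_error) for a 95% CI.

    z=1.96 is an assumed normal approximation to the t multiplier.
    """
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    implied_se = half_width / z
    return midpoint, half_width, implied_se


# Rounded confidence limits as reported in the text
peb_mid, peb_half, peb_se = ci_summary(2.83, 2.95)
pn_mid, pn_half, pn_se = ci_summary(4.63, 4.86)

print(f"PEB: midpoint={peb_mid:.3f} half-width={peb_half:.3f} SE~{peb_se:.4f}")
print(f"PN:  midpoint={pn_mid:.3f} half-width={pn_half:.3f} SE~{pn_se:.4f}")
```

The midpoints (about 2.89 and 4.745) agree with the reported means of 2.89 and 4.74 up to rounding.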
### Assumption of Equal Variance
The plot of residuals against predicted PEB doesn't show a strong pattern, which supports the equal variance assumption and suggests a linear model is appropriate. There are no major outliers that would be driving our results. A few residuals sit a bit further away from 0, so checking Cook's D is appropriate for this data.
![[2025-04-01 RStudentPredictedPEB.png]]
### Assessment of Influential Outliers
The Cook's D plot flags about 19 possibly influential observations in our sample of N=379. However, the highest value shown on the chart is still below 0.04, which is relatively small. This suggests there are no extreme influential points.
![[2025-04-01 CooksDPlotPEB.png]]
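To put these numbers in context, the sketch below works through two common rules of thumb for Cook's D: a screening cutoff of 4/n and a stricter cutoff of 1. It assumes the roughly 19 flagged observations are those above the 4/n line, which is an assumption about the plot rather than something stated in the output. The variable names are hypothetical.

```python
n = 379                # sample size reported in the text
flagged = 19           # approximate count of flagged points read from the plot
max_cooks_d = 0.04     # upper bound on the largest Cook's D read from the plot

screening_cutoff = 4 / n   # common rule of thumb for "worth a look"
strict_cutoff = 1.0        # common rule of thumb for "clearly influential"

share_flagged = flagged / n

print(f"4/n cutoff: {screening_cutoff:.4f}")
print(f"share of sample flagged: {share_flagged:.1%}")
print(f"largest D exceeds strict cutoff: {max_cooks_d > strict_cutoff}")
```

Under these assumptions, about 5% of observations pass the screening cutoff, and none come close to the stricter cutoff of 1.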
### Model Fit Statistics
The Root MSE is 0.512, which means that the model's predictions deviate from actual PEB scores by about 0.512 points on average. R-square measures the proportion of variability in the dependent variable (PEB) explained by the independent variable (PN). In our case, R-square = .2917, or about 29%. This means that the PN scores account for 29% of the variance in the PEB scores.
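Two quantities follow directly from R-square in a simple linear regression, and the sketch below computes them from the reported values. The correlation between PN and PEB is the square root of R-square with the sign of the slope, and the overall F statistic can be recovered from R-square and the sample size. The F value assumes all 379 respondents entered the model (no missing values), so treat it as approximate and compare it against the ANOVA table.

```python
import math

n = 379             # sample size reported in the text (assumed fully used)
r_squared = 0.2917  # reported R-square
slope = 0.28207     # reported PN coefficient, used only for its sign

# Pearson correlation: sqrt(R-square), carrying the slope's sign
r = math.copysign(math.sqrt(r_squared), slope)

# Overall F statistic for one predictor: F = R^2 / (1 - R^2) * (n - 2)
f_stat = r_squared / (1 - r_squared) * (n - 2)

print(f"implied correlation r = {r:.3f}")
print(f"implied F statistic   = {f_stat:.1f}")
```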
In the parameter estimates table, the PN coefficient is 0.282 with p<.0001, which means PN scores are a significant predictor of PEB scores.
The coefficient of variation (Coeff Var) is 17.71, meaning the Root MSE is about 17.71% of the mean PEB score. This shows a moderate level of variability relative to the mean.
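The 17.71 figure can be reproduced from numbers already reported: it is the Root MSE expressed as a percentage of the mean of the dependent variable. The sketch below uses the rounded values from the text, so it matches the SAS output only up to rounding.

```python
root_mse = 0.512   # reported Root MSE
mean_peb = 2.89    # reported mean PEB score (rounded)

# Coefficient of variation = 100 * Root MSE / mean of the dependent variable
coeff_var = 100 * root_mse / mean_peb

print(f"coefficient of variation = {coeff_var:.2f}")
```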
The Fit Plot for PEB shows a positive trend along the fitted line. The dotted lines mark the 95% prediction interval, which again suggests that outliers are minimal.
![[2025-04-01 Fit Plot for PEB.png]]
### Parameter Hypothesis Tests
Let's zoom in on the Analysis of Variance table. Because the overall F test gives p<.0001 (which is less than 0.05), we conclude that PN scores have a statistically significant linear relationship with PEB scores.
![[2025-04-01_Model1_Analysis.png]]
For all tests, the Type I error rate = 0.05 or 5%.
Decision Rule: If p-value < 0.05, then reject H<sub>0</sub>, otherwise don't reject H<sub>0</sub>.
Test for the intercept:
- H<sub>0</sub>: β0=0
- H<sub>a</sub>: β0≠0
Conclusion: Since p<.0001, which is less than 0.05, we reject H<sub>0</sub> and conclude the intercept is significantly different from zero.
Test for the slope of PN score:
- H<sub>0</sub>: β1=0
- H<sub>a</sub>: β1≠0
Conclusion: Since p<.0001, which is less than 0.05, we reject H<sub>0</sub> and conclude the slope is significantly different from zero. The parameter estimate is positive (0.28207), which tells us that PEB scores increase as PN scores increase.
### Summary and Conclusion
This simple linear regression model suggests that PN scores have a positive and significant relationship with PEB scores. The model is statistically significant (p<.0001), and PN explains 29% of the variability in PEB. The residual plots supported our assumptions of normality and equal variance. The final regression equation is:
```
PEB=1.55255+0.28207*PN
```
This means that for each one-point increase in PN, PEB increases by 0.28207 points on average. Since only 29% of the variance in PEB is explained by PN, there are likely other factors besides PN that also influence PEB, so a more complex model (like multiple regression) might provide more information, if we can identify other variables that influence PEB.
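The fitted equation can be used to generate predictions, as in the short sketch below. The function name `predict_peb` is hypothetical; the coefficients are the reported estimates. As a consistency check, plugging in the mean PN score of 4.74 should return approximately the mean PEB score of 2.89, because a least-squares line passes through the point of means.

```python
INTERCEPT = 1.55255  # reported estimate of β0
SLOPE = 0.28207      # reported estimate of β1


def predict_peb(pn):
    """Predicted PEB score for a given PN score under the fitted model."""
    return INTERCEPT + SLOPE * pn


at_mean_pn = predict_peb(4.74)
one_point_change = predict_peb(5.0) - predict_peb(4.0)

print(f"predicted PEB at mean PN (4.74): {at_mean_pn:.2f}")
print(f"change in PEB per one-point increase in PN: {one_point_change:.5f}")
```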
## Screenshots from SAS
![[2025-04-01_SAS_Studio_Regression_Analysis.png]]
![[2025-04-01_Model_Effects_Builder_SAS_Studio.png]]
![[2025-04-01 SAS_Studio_Interface.png]]