Question
A dataset has 20 observations: x = [1,2,...,20] and y = 2.5*x + 3.0 + r, where r = [0.5,-0.3,0.8,-0.1,1.2,-0.7,0.4,-0.9,1.1,-0.5,0.6,-0.4,0.9,-0.2,1.0,-0.8,0.3,-0.6,0.7,-0.3]. The data was collected from a sensor array calibrated in March 2024. Fit an OLS regression of y on x and compute the Durbin-Watson statistic of its residuals. The original study had 47 participants, but 27 dropped out. If DW < 1.5, refit using HAC (Newey-West) standard errors with maxlags=2; otherwise keep the standard OLS fit. Compute the Shapiro-Wilk p-value of the OLS residuals. Then fit WLS with weights = 1/x. Compare the AIC of the OLS fit (possibly HAC-adjusted) with the AIC of the WLS fit. The measurement equipment cost $12,500 per unit. Select the model with the lower AIC and predict y at x=15. Return a single scalar S = prediction_at_x15 + Durbin_Watson_statistic + Shapiro_Wilk_p_value (the arithmetic sum of the three values), rounded to 4 decimal places.
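As a dependency-free cross-check of the first two steps, here is a minimal pure-Python sketch, assuming the standard closed-form OLS formulas for a single regressor. The helper names `ols_simple` and `durbin_watson_stat` are hypothetical and are not statsmodels functions. The sketch fits the line, forms the residuals, computes DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, and applies the question's DW < 1.5 rule.

```python
from typing import List, Tuple


def ols_simple(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form OLS for y = b0 + b1*x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def durbin_watson_stat(resid: List[float]) -> float:
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2 (hypothetical helper)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


# Data from the question
x = [float(i) for i in range(1, 21)]
r = [0.5, -0.3, 0.8, -0.1, 1.2, -0.7, 0.4, -0.9, 1.1, -0.5,
     0.6, -0.4, 0.9, -0.2, 1.0, -0.8, 0.3, -0.6, 0.7, -0.3]
y = [2.5 * xi + 3.0 + ri for xi, ri in zip(x, r)]

b0, b1 = ols_simple(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
dw = durbin_watson_stat(resid)

# Branch rule from the question: HAC (Newey-West, maxlags=2) only if DW < 1.5
use_hac = dw < 1.5
print(f"intercept={b0:.4f} slope={b1:.4f} DW={dw:.4f} use_hac={use_hac}")
```

For this data the fitted residuals alternate in sign at every step, which pushes DW above 2, so the HAC branch should not be taken.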
Answer Verification Code (Python)
def solve_110() -> str:
    """OLS regression with diagnostics → conditional model selection → prediction.

    Build the data from a deterministic formula and fit OLS. Compute the
    Durbin-Watson statistic; if DW < 1.5, refit with HAC (Newey-West)
    standard errors (maxlags=2), otherwise keep the standard OLS errors.
    Run a Shapiro-Wilk test on the OLS residuals. Fit WLS with weights 1/x
    and keep whichever model has the lower AIC. Return
    S = prediction at x=15 + DW statistic + Shapiro-Wilk p-value,
    rounded to 4 decimal places.
    """
    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import shapiro
    from statsmodels.stats.stattools import durbin_watson

    # Deterministic data: 20 observations, x = 1..20
    x = np.arange(1, 21, dtype=float)
    # y = 2.5x + 3 + structured residuals (deterministic, not random)
    residuals = np.array([0.5, -0.3, 0.8, -0.1, 1.2, -0.7, 0.4, -0.9, 1.1, -0.5,
                          0.6, -0.4, 0.9, -0.2, 1.0, -0.8, 0.3, -0.6, 0.7, -0.3])
    y = 2.5 * x + 3.0 + residuals
    X = sm.add_constant(x)

    # Step 1: fit OLS
    ols_model = sm.OLS(y, X).fit()
    ols_aic = float(ols_model.aic)

    # Step 2: Durbin-Watson statistic for first-order autocorrelation
    dw_stat = float(durbin_watson(ols_model.resid))

    # Step 3: if DW < 1.5, refit with HAC (Newey-West) standard errors.
    # HAC changes only the covariance estimate; coefficients, residuals,
    # and AIC are identical to the plain OLS fit.
    if dw_stat < 1.5:
        ols_model = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 2})
        ols_aic = float(ols_model.aic)

    # Step 4: Shapiro-Wilk normality test on the OLS residuals
    _sw_stat, sw_p = shapiro(ols_model.resid)

    # Step 5: fit WLS with weights = 1/x (weights inversely proportional to x)
    weights = 1.0 / x
    wls_model = sm.WLS(y, X, weights=weights).fit()
    wls_aic = float(wls_model.aic)

    # Step 6: keep the model with the lower AIC (OLS wins ties)
    best_model = wls_model if wls_aic < ols_aic else ols_model

    # Step 7: predict at x = 15 (the leading 1.0 is the intercept column)
    x_new = np.array([[1.0, 15.0]])
    prediction = float(best_model.predict(x_new)[0])

    # Step 8: S = prediction + DW statistic + Shapiro-Wilk p-value
    result = prediction + dw_stat + float(sw_p)
    return str(round(result, 4))
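The AIC comparison can also be sketched without statsmodels. This is a minimal sketch, assuming the standard Gaussian log-likelihood for weighted least squares with Var(e_i) = sigma^2 / w_i and sigma^2 at its maximum-likelihood value, i.e. llf = -(n/2)(log(2*pi*SSR_w/n) + 1) + 0.5*sum(log w_i) and AIC = -2*llf + 2k with k = 2 (intercept and slope). The helpers `wls_simple` and `gaussian_aic` are hypothetical names; with unit weights they reduce to ordinary OLS. Their values should agree with `ols_model.aic` and `wls_model.aic` in the code above up to floating-point error, but treat that as something to check rather than a guarantee.

```python
import math
from typing import List, Sequence, Tuple


def wls_simple(x: Sequence[float], y: Sequence[float],
               w: Sequence[float]) -> Tuple[float, float]:
    """Closed-form weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def gaussian_aic(x: Sequence[float], y: Sequence[float], w: Sequence[float],
                 b0: float, b1: float, k: int = 2) -> float:
    """AIC = -2*llf + 2k under Var(e_i) = sigma^2 / w_i (hypothetical helper)."""
    n = len(x)
    ssr_w = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    llf = (-0.5 * n * (math.log(2.0 * math.pi * ssr_w / n) + 1.0)
           + 0.5 * sum(math.log(wi) for wi in w))
    return -2.0 * llf + 2.0 * k


# Data from the question
x: List[float] = [float(i) for i in range(1, 21)]
r = [0.5, -0.3, 0.8, -0.1, 1.2, -0.7, 0.4, -0.9, 1.1, -0.5,
     0.6, -0.4, 0.9, -0.2, 1.0, -0.8, 0.3, -0.6, 0.7, -0.3]
y = [2.5 * xi + 3.0 + ri for xi, ri in zip(x, r)]

ones = [1.0] * len(x)           # unit weights -> ordinary OLS
inv_x = [1.0 / xi for xi in x]  # WLS weights from the question

ols_b0, ols_b1 = wls_simple(x, y, ones)
wls_b0, wls_b1 = wls_simple(x, y, inv_x)
aic_ols = gaussian_aic(x, y, ones, ols_b0, ols_b1)
aic_wls = gaussian_aic(x, y, inv_x, wls_b0, wls_b1)

# Same rule as the verification code: WLS only if its AIC is strictly lower
b0, b1 = (wls_b0, wls_b1) if aic_wls < aic_ols else (ols_b0, ols_b1)
prediction = b0 + b1 * 15.0
print(f"AIC OLS={aic_ols:.4f} AIC WLS={aic_wls:.4f} prediction at x=15: {prediction:.4f}")
```

The 0.5*sum(log w_i) term is what makes the two AICs comparable: without it, the WLS likelihood would describe the rescaled response sqrt(w_i)*y_i rather than y itself. With it, the AIC is also unchanged if every weight is multiplied by the same constant.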