
Test Questions

30 Level-8 questions designed to rigorously evaluate chain-of-thought reasoning across advanced computational domains

About These Questions

This page presents 30 Level-8 benchmark questions used to evaluate the AI Chain-of-Thought Analysis program's ability to decompose, plan, and execute multi-stage computational problems. Each question is drawn from the GSM-Level8 dataset, which was originally modeled on the GSM-Symbolic benchmark introduced by Apple Research. However, through several rounds of revision, these questions have become significantly more difficult than any of the original GSM-Symbolic questions—requiring not just arithmetic reasoning but deep, cross-domain computational analysis.

Important: The Python shown on each card is answer verification code, not the agent's chain-of-thought output. It exists only to compute the benchmark's verified answer. The real model output is the clickable/tappable CoT Answer link on each card, which opens the full HTML analysis produced by the program.

What Makes These Questions Hard

Every question in this set is rated Difficulty 8—the highest tier. Several characteristics make them demanding tests of chain-of-thought reasoning:

  • Multi-stage computation: Each question requires roughly twice as many solving steps as Level-7 questions, with 8–15 discrete computational stages that must be completed in sequence. For example, Question 1 requires OLS regression fitting, Durbin-Watson diagnostics, conditional HAC correction, Shapiro-Wilk testing, WLS re-fitting, AIC comparison, model selection, and prediction—all feeding into a single final scalar.
  • Conditional branching: Every question contains at least one if/else decision point where the computation path diverges based on an intermediate result (e.g., “If DW < 1.5, refit using Newey-West standard errors”). The agent must correctly compute the intermediate value and follow the right branch.
  • Broad library coverage: Answers require combining functions from NumPy, SciPy, statsmodels, SymPy, scikit-learn, lifelines, PuLP, Pandas, and pytz—often multiple libraries per question.
  • 3+ distractor sentences: Each question embeds at least three pieces of irrelevant information (at least two of them numeric), designed to test whether the agent is distracted by plausible-sounding but mathematically irrelevant details. Distractor text is highlighted in the question displays below.
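The branch-then-compute pattern these bullets describe can be sketched in a few lines. This is an illustrative hand-rolled Durbin-Watson check using the standard formula, not the benchmark's verification code:

```python
import numpy as np

def durbin_watson(resid: np.ndarray) -> float:
    # DW = sum of squared successive differences / sum of squared residuals
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

# Residuals from a hypothetical fit (alternating signs push DW toward 2+)
resid = np.array([0.5, -0.3, 0.8, -0.1, 1.2, -0.7, 0.4, -0.9])
dw = durbin_watson(resid)

# The branch point: the agent must compute the intermediate value
# correctly before it can pick the right computation path.
path = "HAC (Newey-West)" if dw < 1.5 else "standard OLS"
```

Because the residuals alternate in sign, DW lands near 2.8 here and the standard-OLS branch is taken; an agent that miscomputes the statistic follows the wrong branch and fails the entire question.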

Subject Domains

The 30 questions span a wide range of computational domains: regression diagnostics and model selection (OLS, WLS, ridge, Breusch-Pagan, Cook's distance, VIF); survival analysis (Kaplan-Meier, Cox PH, log-rank tests); linear and quadratic programming; control systems (Lyapunov equations, Bode analysis, controllability); spectral graph theory and network partitioning; meta-analysis and multi-study hypothesis testing; multi-criteria decision making (TOPSIS, Pareto fronts); numerical PDE solving (heat equation); symbolic mathematics (integration, Padé approximants, Taylor series); signal processing (FFT, bandpass filtering); clustering and dimensionality reduction (PCA, K-means, silhouette/gap statistics); and timezone-aware datetime arithmetic.

Why These Questions Test Chain-of-Thought

A chain-of-thought reasoning system must decompose each of these problems into a correct sequence of analytical tasks, select appropriate tools and libraries, execute multi-step code without error, correctly evaluate intermediate results for conditional branching, and combine all partial results into a single precise numerical answer. The combination of cross-domain knowledge, precise numerical computation, and conditional logic makes these questions a rigorous benchmark for evaluating planning, execution, and self-correction capabilities.
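Schematically, every answer in the set reduces to the same skeleton: compute intermediates, branch on them, and fold the partial results into one rounded scalar. A minimal toy sketch of that control flow (the stage computations here are hypothetical placeholders, not any actual benchmark task):

```python
def solve_toy(x: float) -> float:
    # Stage 1: compute an intermediate diagnostic value
    diagnostic = x * 0.5

    # Stage 2: conditional branch on the intermediate result, mirroring
    # the "if DW < 1.5 ..." style decision points in the real questions
    adjusted = diagnostic + 1.0 if diagnostic < 1.5 else diagnostic

    # Stage 3: combine partial results into a single scalar and round,
    # as every question's final step requires
    return round(adjusted + diagnostic, 4)
```

The failure modes the benchmark probes map directly onto these stages: a wrong intermediate in Stage 1 silently flips the Stage 2 branch, and any per-stage error propagates into the single final scalar.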

Summary: 30 questions · difficulty 8 · 14 exact, 7 close, 9 different

Legend:
  • Highlighted text: distractor information (irrelevant to the solution)
  • ✓ Exact match
  • ≈ Close match (<1% error)
  • ✗ Different answer
Question 1 · ID: 110 · Difficulty: 8 · Best: GPT-OSS-20b

Question

A dataset has 20 observations: x = [1,2,...,20] and y = 2.5*x + 3.0 + r where r = [0.5,-0.3,0.8,-0.1,1.2,-0.7,0.4,-0.9,1.1,-0.5,0.6,-0.4,0.9,-0.2,1.0,-0.8,0.3,-0.6,0.7,-0.3]. The data was collected from a sensor array calibrated in March 2024. Fit OLS regression on y vs x. Compute the Durbin-Watson statistic. The original study had 47 participants but 27 dropped out. If DW < 1.5, refit using HAC (Newey-West) standard errors with maxlags=2; otherwise keep standard OLS. Compute the Shapiro-Wilk p-value on the residuals. Then fit WLS with weights = 1/x. Compare AIC of OLS (possibly HAC-adjusted) vs WLS. The measurement equipment cost $12,500 per unit. Select the model with lower AIC and predict at x=15. Return a single scalar S = prediction_at_x15 + Durbin_Watson_statistic + Shapiro_Wilk_p_value (arithmetic sum of the three values), rounded to 4 decimal places.

Answer Verification Code (Python)

def solve_110() -> str:
    """OLS regression with diagnostics → conditional model selection → prediction.

    Build data from deterministic formula. Fit OLS. Check Durbin-Watson.
    If DW < 1.5, apply Newey-West; otherwise use standard errors.
    Compute AIC. Then fit WLS with weights 1/x. Compare AICs.
    Return prediction at x=15 from the better model, plus the DW statistic
    and the Shapiro-Wilk p-value.
    """
    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import shapiro

    # Deterministic data: 20 observations
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                  11, 12, 13, 14, 15, 16, 17, 18, 19, 20], dtype=float)
    # y = 2.5x + 3 + structured residuals (deterministic, not random)
    residuals = np.array([0.5, -0.3, 0.8, -0.1, 1.2, -0.7, 0.4, -0.9, 1.1, -0.5,
                          0.6, -0.4, 0.9, -0.2, 1.0, -0.8, 0.3, -0.6, 0.7, -0.3])
    y = 2.5 * x + 3.0 + residuals

    X = sm.add_constant(x)

    # Step 1: Fit OLS
    ols_model = sm.OLS(y, X).fit()
    ols_aic = float(ols_model.aic)

    # Step 2: Durbin-Watson test for autocorrelation
    from statsmodels.stats.stattools import durbin_watson
    dw_stat = durbin_watson(ols_model.resid)

    # Step 3: Conditional branch — if DW < 1.5, use HAC (Newey-West) standard errors
    if dw_stat < 1.5:
        ols_model = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 2})
        ols_aic = float(ols_model.aic)

    # Step 4: Shapiro-Wilk test on residuals
    _sw_stat, sw_p = shapiro(ols_model.resid)

    # Step 5: Fit WLS with weights = 1/x (inverse proportional to x)
    weights = 1.0 / x
    wls_model = sm.WLS(y, X, weights=weights).fit()
    wls_aic = float(wls_model.aic)

    # Step 6: Conditional — pick the model with lower AIC
    if wls_aic < ols_aic:
        best_model = wls_model
    else:
        best_model = ols_model

    # Step 7: Predict at x=15
    x_new = np.array([[1.0, 15.0]])
    prediction = float(best_model.predict(x_new)[0])

    # Step 8: Final result = prediction + DW statistic + Shapiro-Wilk p-value
    result = prediction + dw_stat + sw_p
    return str(round(result, 4))
Correct Answer 44.2788
CoT Answer 44.2788
Question 2 · ID: 111 · Difficulty: 8 · Best: GPT-OSS-20b

Question

Solve the ODE y'' + 3y' + 2y = sin(t), y(0)=0, y'(0)=0 symbolically. Evaluate the symbolic solution at 512 evenly spaced points over [0,20]. The equation models a damped spring-mass system with 8.3 kg mass. Also solve numerically using a high-accuracy integrator (rtol=1e-10, atol=1e-12). Compute the normalized cross-correlation peak between the two solutions. The numerical solver was benchmarked against 14 competing algorithms. If correlation > 0.99, integrate the symbolic solution over [0,20] using Simpson's rule; otherwise integrate the numerical solution. Compute the real-valued FFT (rfft, yielding N/2+1 frequency bins) of the symbolic solution samples. The original research grant was $250,000 over 3 years. Compute normalized spectral entropy (Shannon entropy of PSD / log2(N_rfft_bins) where N_rfft_bins is the number of rfft output values). Return integral_value + 100*normalized_entropy, rounded to 4 decimal places.

Answer Verification Code (Python)

def solve_111() -> str:
    """Symbolic ODE → numerical verification → FFT of solution → spectral analysis.

    Solve y'' + 3y' + 2y = sin(t) symbolically. Evaluate at 512 points over [0,20].
    Also solve numerically. Compute cross-correlation peak. Combine with spectral
    entropy of the symbolic solution's FFT. Conditional: if correlation > 0.99,
    use symbolic solution's integral; otherwise use numerical.
    """
    import numpy as np
    import sympy as sp
    from scipy.integrate import solve_ivp, simpson
    from scipy.signal import correlate

    t = sp.Symbol("t")
    y = sp.Function("y")

    # Symbolic solution of y'' + 3y' + 2y = sin(t), y(0)=0, y'(0)=0
    ode = sp.Eq(y(t).diff(t, 2) + 3 * y(t).diff(t) + 2 * y(t), sp.sin(t))
    sol = sp.dsolve(ode, y(t), ics={y(0): 0, y(t).diff(t).subs(t, 0): 0})
    y_sym = sol.rhs

    # Evaluate symbolic solution at 512 points over [0, 20]
    t_vals = np.linspace(0, 20, 512)
    y_sym_func = sp.lambdify(t, y_sym, modules=["numpy"])
    y_sym_vals = np.array([float(y_sym_func(tv)) for tv in t_vals])

    # Numerical solution
    def ode_num(t_val: float, state: list[float]) -> list[float]:
        return [state[1], -3 * state[1] - 2 * state[0] + np.sin(t_val)]

    num_sol = solve_ivp(ode_num, [0, 20], [0.0, 0.0], t_eval=t_vals, rtol=1e-10, atol=1e-12)
    y_num_vals = num_sol.y[0]

    # Cross-correlation to check agreement
    corr = correlate(y_sym_vals, y_num_vals, mode="full")
    norm = np.sqrt(np.sum(y_sym_vals**2) * np.sum(y_num_vals**2))
    max_corr = float(np.max(corr) / norm) if norm > 0 else 0.0

    # Conditional: if correlation > 0.99, integrate the symbolic solution
    if max_corr > 0.99:
        integral_val = float(simpson(y_sym_vals, x=t_vals))
    else:
        integral_val = float(simpson(y_num_vals, x=t_vals))

    # FFT spectral entropy of the symbolic solution samples (per the question)
    fft_vals = np.fft.rfft(y_sym_vals)
    psd = np.abs(fft_vals) ** 2
    psd_sum = psd.sum()
    if psd_sum > 0:
        psd_norm = psd / psd_sum
        import math
        entropy = -sum(p * math.log2(p) for p in psd_norm if p > 1e-15)
        max_entropy = math.log2(len(psd_norm))
        norm_entropy = entropy / max_entropy if max_entropy > 0 else 0.0
    else:
        norm_entropy = 0.0

    result = integral_val + 100 * norm_entropy
    return str(round(result, 4))
Correct Answer 7.9002
CoT Answer 7.9002
Question 3 · ID: 112 · Difficulty: 8 · Best: GPT-OSS-20b

Question

Generate a 40x5 deterministic data matrix where row i (1-indexed) has features: x1=sin(i*0.3)*5+i*0.2, x2=cos(i*0.5)*3+i*0.15, x3=log(i+1)*2, x4=(i mod 7)-3, x5=sqrt(i)*1.5-sin(i*0.7). The dataset was originally compiled from 12 weather stations in Tasmania. Apply PCA (without standardizing the features) to reduce to 2 dimensions. Cluster the 2D data into 3 groups using K-means (random_state=42, n_init=10). Compute the silhouette score. The research team consisted of 6 postdoctoral fellows. Assign deterministic survival times: for each sample i, time = 10 + cluster_label*5 + (i mod 5)*2. Every 4th sample (0-indexed i where i%4==3) is censored. Fit Kaplan-Meier on the largest cluster. Station elevation ranged from 150 to 2,100 meters. Return median_survival_time + silhouette_score, rounded to 4 decimal places.

Answer Verification Code (Python)

def solve_112() -> str:
    """PCA on deterministic data → cluster assignment → survival analysis.

    Build a 40x5 data matrix from deterministic formulas. PCA to 2D.
    K-means into 3 clusters. Assign survival times by cluster.
    Fit Kaplan-Meier. Return median survival of the largest cluster + silhouette score.
    """
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from lifelines import KaplanMeierFitter  # type: ignore[import-untyped]

    # Deterministic data: 40 samples, 5 features
    n = 40
    indices = np.arange(1, n + 1, dtype=float)
    x1 = np.sin(indices * 0.3) * 5 + indices * 0.2
    x2 = np.cos(indices * 0.5) * 3 + indices * 0.15
    x3 = np.log(indices + 1) * 2
    x4 = (indices % 7).astype(float) - 3.0
    x5 = np.sqrt(indices) * 1.5 - np.sin(indices * 0.7)
    X = np.column_stack([x1, x2, x3, x4, x5])

    # PCA to 2D
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X)

    # K-means clustering (k=3)
    km = KMeans(n_clusters=3, random_state=42, n_init=10)
    labels = km.fit_predict(X_2d)

    # Silhouette score
    sil = float(silhouette_score(X_2d, labels))

    # Identify the largest cluster (survival times are assigned below)
    from collections import Counter
    cluster_counts: Counter[int] = Counter(labels.tolist())
    sorted_clusters = sorted(cluster_counts.keys(), key=lambda c: -cluster_counts[c])
    largest_cluster = sorted_clusters[0]

    survival_times = np.zeros(n)
    event_observed = np.ones(n)
    for i in range(n):
        cl = labels[i]
        # Deterministic survival time based on position and cluster
        base = 10.0 + cl * 5.0 + (i % 5) * 2.0
        survival_times[i] = base
        # Censoring: every 4th sample overall (0-indexed i where i % 4 == 3)
        if i % 4 == 3:
            event_observed[i] = 0

    # Kaplan-Meier for the largest cluster
    mask = labels == largest_cluster
    kmf = KaplanMeierFitter()
    kmf.fit(survival_times[mask], event_observed=event_observed[mask])
    median_surv = float(kmf.median_survival_time_)

    result = median_surv + sil
    return str(round(result, 4))
Correct Answer 21.457
CoT Answer 21.457
Question 4 · ID: 113 · Difficulty: 8 · Best: GPT-OSS-20b

Question

A factory produces 4 products using 3 resources. Profit per unit: [5,4,3,7]. Resource consumption matrix (rows=resources, cols=products): [[2,1,1,3],[1,3,2,1],[3,1,2,2]]. Resource limits: [100,120,150]. The factory operates 16 shifts per week. Solve the LP (maximize profit). Compute shadow prices from dual values. The plant manager has 22 years of experience. If any shadow price exceeds 2.0, add a slack variable for that resource with bounds [0,20] and cost 0, then re-solve the modified LP. The factory was built in 1987 at a cost of $4.2 million. Sum the shadow prices of all binding constraints (shadow price > 1e-6). Return optimal_value_after_modification + binding_shadow_price_sum, rounded to 4 decimal places.

Answer Verification Code (Python)

def solve_113() -> str:
    """Linear programming → sensitivity analysis → parametric optimization.

    Solve a production LP with 4 products, 3 resources. Compute shadow prices.
    If any shadow price exceeds a threshold, add a slack variable and re-solve.
    Return optimal value + sum of binding shadow prices.
    """
    import numpy as np
    from scipy.optimize import linprog

    # 4 products, 3 resource constraints
    # Maximize: 5x1 + 4x2 + 3x3 + 7x4  → minimize: -5x1 -4x2 -3x3 -7x4
    c = [-5, -4, -3, -7]

    # Resource constraints: A_ub @ x <= b_ub
    A_ub = [
        [2, 1, 1, 3],   # Resource A: 100 units
        [1, 3, 2, 1],   # Resource B: 120 units
        [3, 1, 2, 2],   # Resource C: 150 units
    ]
    b_ub = [100, 120, 150]
    bounds = [(0, None)] * 4

    # Step 1: Initial solve
    res1 = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    assert res1.success and res1.fun is not None
    optimal_1 = -res1.fun

    # Step 2: Compute shadow prices from dual values
    # In HiGHS, the dual values for inequality constraints are in res.ineqlin.marginals
    shadow_prices = np.abs(res1.ineqlin.marginals)

    # Step 3: Conditional — if any shadow price > 2.0, add slack to that resource
    threshold = 2.0
    modified = False
    A_ub_mod = [row[:] for row in A_ub]
    b_ub_mod = b_ub[:]
    c_mod = c[:]

    for i, sp_val in enumerate(shadow_prices):
        if sp_val > threshold:
            # Add a slack variable for this resource (cost = 0, adds capacity)
            for j, row in enumerate(A_ub_mod):
                if j == i:
                    row.append(-1)  # slack column adds extra capacity for this resource
                else:
                    row.append(0)
            c_mod.append(0)
            bounds.append((0, 20))  # Max 20 extra units
            modified = True

    if modified:
        res2 = linprog(c_mod, A_ub=A_ub_mod, b_ub=b_ub_mod, bounds=bounds, method="highs")
        assert res2.success and res2.fun is not None
        optimal_2 = -res2.fun
    else:
        optimal_2 = optimal_1

    # Step 4: Sum of shadow prices for binding constraints (slack = 0)
    binding_shadow_sum = float(np.sum(shadow_prices[shadow_prices > 1e-6]))

    result = optimal_2 + binding_shadow_sum
    return str(round(result, 4))
Correct Answer 337.4154
CoT Answer 340.0846
Question 5 · ID: 114 · Difficulty: 8 · Best: GPT-OSS-20b

Question

A deterministic time series has 60 points: for t in [0,59], y = 2t+5+sin(t*0.5)*2 when t<20, y = -t+65+cos(t*0.3)*3 when 20<=t<40, y = 0.5t+10+sin(t*0.7)*1.5 when t>=40. The sensor was installed on floor 23 of a 45-story building. Compute the EMA with span=5 (alpha=2/6). Apply CUSUM changepoint detection on the EMA (cumulative sum of deviations from the mean). The equipment serial number is AX-7842-B. Find changepoints where the second difference of CUSUM changes sign, filtering to minimum spacing of 10. If more than 2 significant changepoints are found, fit piecewise linear regression using the top 3 changepoints as segment boundaries; otherwise fit a single linear regression on the full EMA. The building was constructed in 2018 at a cost of $89 million. Compute RMSE between the EMA and the fitted predictions. Return RMSE + number_of_significant_changepoints, rounded to 4 decimal places.

Answer Verification Code (Python)

def solve_114() -> str:
    """Time series EMA → changepoint detection → regression on segments.

    Compute EMA of deterministic time series. Detect changepoints using
    cumulative sum (CUSUM). If more than 2 changepoints, fit piecewise
    linear regression; otherwise fit single linear regression. Return
    RMSE + number of changepoints.
    """
    import numpy as np
    import statsmodels.api as sm

    # Deterministic time series: 60 points with two regime changes
    t = np.arange(60, dtype=float)
    y = np.where(t < 20, 2.0 * t + 5.0 + np.sin(t * 0.5) * 2,
                 np.where(t < 40, -1.0 * t + 65.0 + np.cos(t * 0.3) * 3,
                          0.5 * t + 10.0 + np.sin(t * 0.7) * 1.5))

    # Step 1: Compute EMA (span=5)
    alpha = 2.0 / (5 + 1)
    ema = np.zeros_like(y)
    ema[0] = y[0]
    for i in range(1, len(y)):
        ema[i] = alpha * y[i] + (1 - alpha) * ema[i - 1]

    # Step 2: CUSUM changepoint detection
    mean_y = np.mean(ema)
    cusum = np.cumsum(ema - mean_y)

    # Find changepoints: points where CUSUM changes direction significantly
    diffs = np.diff(np.sign(np.diff(cusum)))
    changepoints = np.where(np.abs(diffs) > 0.5)[0] + 1

    # Filter to significant changepoints (minimum spacing of 10)
    significant_cps: list[int] = []
    last_cp = -10
    for cp in changepoints:
        if cp - last_cp >= 10:
            significant_cps.append(int(cp))
            last_cp = cp

    n_changepoints = len(significant_cps)

    # Step 3: Conditional — piecewise vs single regression
    if n_changepoints > 2:
        # Piecewise linear regression using significant changepoints
        # Use the top 3 changepoints as segment boundaries
        cps_sorted = sorted(significant_cps[:3])
        segments = [0] + cps_sorted + [len(t)]
        predictions = np.zeros_like(y)
        for seg_idx in range(len(segments) - 1):
            start, end = segments[seg_idx], segments[seg_idx + 1]
            t_seg = t[start:end].reshape(-1, 1)
            y_seg = ema[start:end]
            X_seg = sm.add_constant(t_seg)
            model = sm.OLS(y_seg, X_seg).fit()
            predictions[start:end] = model.predict(X_seg)
    else:
        X_full = sm.add_constant(t.reshape(-1, 1))
        model = sm.OLS(ema, X_full).fit()
        predictions = model.predict(X_full)

    # Step 4: RMSE
    rmse = float(np.sqrt(np.mean((ema - predictions) ** 2)))

    result = rmse + n_changepoints
    return str(round(result, 4))
Correct Answer 5.7836
CoT Answer 4.0788
Question 6 · ID: 115 · Difficulty: 8 · Best: GPT-OSS-20b

Question

A 6x6 matrix A has rows: [10,2,3,1,-1,4], [2,15,-1,3,2,-1], [3,-1,12,2,4,1], [1,3,2,18,-2,3], [-1,2,4,-2,14,2], [4,-1,1,3,2,16]. The matrix was derived from a structural analysis of a bridge with 73 cable stays. Solve Ax=b where b=[30,25,20,35,15,28]. Compute the condition number of A. The analysis software license costs $8,400 annually. If condition number > 100, solve using SVD pseudoinverse; otherwise use standard LU-based solve. Apply one step of iterative refinement: compute residual r=b-Ax, solve for correction c=A^(-1)*r, then x_refined = x + c. Compute SVD of A and find the spectral gap (ratio S[0]/S[1]). The bridge spans 1,240 meters across the river. Return ||x_refined|| + spectral_gap + log10(condition_number), rounded to 4 decimal places.

Answer Verification Code (Python)

def solve_115() -> str:
    """Matrix decomposition → condition number check → iterative refinement.

    Build a 6x6 matrix from deterministic values. Compute the condition
    number. If it exceeds 100, solve Ax=b via an SVD-based least-squares
    solve; otherwise use a standard LU-based solve. Apply one step of
    iterative refinement. Also compute the spectral gap (ratio of largest
    to second-largest singular value). Return solution norm + spectral gap
    + log10(condition).
    """
    import numpy as np
    import math

    # Deterministic 6x6 matrix
    A = np.array([
        [10, 2, 3, 1, -1, 4],
        [2, 15, -1, 3, 2, -1],
        [3, -1, 12, 2, 4, 1],
        [1, 3, 2, 18, -2, 3],
        [-1, 2, 4, -2, 14, 2],
        [4, -1, 1, 3, 2, 16],
    ], dtype=float)

    b = np.array([30, 25, 20, 35, 15, 28], dtype=float)

    # Step 1: Compute condition number
    cond = float(np.linalg.cond(A))
    log_cond = math.log10(cond)

    # Step 2: SVD for singular values
    _, S, _ = np.linalg.svd(A)
    spectral_gap = float(S[0] / S[1])

    # Step 3: Conditional solve
    if cond > 100:
        # Use SVD pseudoinverse
        x = np.linalg.lstsq(A, b, rcond=None)[0]
    else:
        # Use standard solve (LU-based)
        x = np.linalg.solve(A, b)

    # Step 4: Iterative refinement (one step)
    residual = b - A @ x
    correction = np.linalg.solve(A, residual)
    x_refined = x + correction

    # Step 5: Solution norm
    sol_norm = float(np.linalg.norm(x_refined))

    result = sol_norm + spectral_gap + log_cond
    return str(round(result, 4))
Correct Answer 5.2734
CoT Answer 5.2734
Question 7 · ID: 116 · Difficulty: 8 · Best: GPT-OSS-20b

Question

Compute the definite integral of exp(-x^2) from 0 to 1 symbolically. The integral appears in quantum tunneling probability calculations. Compute the Taylor series of exp(-x^2) around x=0 to order 8. Integrate the series term-by-term from 0 to 1. Construct a [2/2] Padé approximant in u=x^2 by matching the first 5 Taylor coefficients of exp(-u): c0=1, c1=-1, c2=1/2, c3=-1/6, c4=1/24. The original paper had 847 citations as of January 2026. Solve the coefficient matching system for the Padé numerator P(u)=p0+p1*u+p2*u^2 and denominator Q(u)=1+q1*u+q2*u^2. Evaluate the Padé approximant at x=0.8. The laboratory temperature was maintained at 22.5°C. If the relative error vs exp(-0.64) is < 1%, use the Padé value; otherwise use the exact value. Find the poles of Q(u) as roots in x. Return integral_value + chosen_value + sum_of_absolute_real_parts_of_poles, rounded to 4 decimal places.

Answer Verification Code (Python)

def solve_116() -> str:
    """Symbolic integration → series expansion → Padé approximant → pole analysis.

    Compute a definite integral symbolically. Take its Taylor series.
    Construct a [2/2] Padé approximant manually by matching coefficients.
    Find poles. Evaluate at a test point. If the approximant error is < 1%,
    return Padé value; otherwise return exact value. Combine with pole locations.
    """
    import sympy as sp

    x = sp.Symbol("x")

    # Step 1: Symbolic integral of exp(-x^2) from 0 to 1 (erf-related)
    integral_val = float(sp.integrate(sp.exp(-x**2), (x, 0, 1)))

    # Step 2: Taylor series of exp(-x^2) around x=0, order 8
    _series = sp.series(sp.exp(-x**2), x, 0, 9).removeO()  # validation only; unused in the final scalar

    # Step 4: Construct a [2/2] Padé approximant (in u = x^2) of exp(-x^2) manually
    # exp(-x^2) = 1 - x^2 + x^4/2 - x^6/6 + x^8/24 - ...
    # Let u = x^2. Then exp(-u) = 1 - u + u^2/2 - u^3/6 + u^4/24
    # [2/2] Padé in u: P(u)/Q(u) matching coefficients c0..c4
    # c0=1, c1=-1, c2=1/2, c3=-1/6, c4=1/24
    # P(u) = p0 + p1*u + p2*u^2, Q(u) = 1 + q1*u + q2*u^2
    # Matching: p0=1, p1+q1=-1, p2+q1*c1+q2=c2, q1*c2+q2*c1=c3, q1*c3+q2*c2=c4
    # From the system:
    # p0 = 1
    # p1 = c1 - q1 = -1 - q1
    # p2 = c2 - q1*c1 - q2 = 1/2 + q1 - q2
    # q1*c2 + q2*c1 = c3  → q1/2 - q2 = -1/6
    # q1*c3 + q2*c2 = c4  → -q1/6 + q2/2 = 1/24
    # Solving: from eq4: q1/2 - q2 = -1/6 → q2 = q1/2 + 1/6
    # Sub into eq5: -q1/6 + (q1/2+1/6)/2 = 1/24
    # -q1/6 + q1/4 + 1/12 = 1/24
    # q1(-1/6+1/4) + 1/12 = 1/24
    # q1(1/12) = 1/24 - 1/12 = -1/24
    # q1 = -1/2
    q1_val = sp.Rational(-1, 2)
    q2_val = q1_val / 2 + sp.Rational(1, 6)  # = -1/4 + 1/6 = -1/12
    p0_val = sp.Integer(1)
    p1_val = sp.Integer(-1) - q1_val  # = -1/2
    p2_val = sp.Rational(1, 2) + q1_val - q2_val  # = 1/2 - 1/2 + 1/12 = 1/12

    u = x**2
    P_expr = p0_val + p1_val * u + p2_val * u**2
    Q_expr = 1 + q1_val * u + q2_val * u**2
    pade_expr = P_expr / Q_expr

    # Step 5: Evaluate at x = 0.8
    exact_val = float(sp.exp(-sp.Rational(64, 100)))
    pade_val = float(pade_expr.subs(x, sp.Rational(4, 5)))

    # Step 6: Conditional — check approximation error
    error = abs(exact_val - pade_val) / abs(exact_val)
    if error < 0.01:
        chosen_val = pade_val
    else:
        chosen_val = exact_val

    # Step 7: Find poles of the Padé approximant (roots of denominator Q)
    poles = sp.solve(Q_expr, x)
    pole_sum = sum(abs(complex(sp.N(p)).real) for p in poles) if poles else 0.0

    result = integral_val + chosen_val + float(pole_sum)
    return str(round(result, 4))
Correct Answer 3.7901
CoT Answer 2.2467
Question 8 · ID: 117 · Difficulty: 8 · Best: GPT-OSS-20b

Question

A clinical trial produced a 2x3 contingency table (Treatment [A,B] × Outcome [Low,Medium,High]): row A = [15,25,10], row B = [8,18,24]. The trial enrolled patients from 37 hospitals across 12 countries. Run a chi-square test for independence. The study budget was $3.7 million over 4 years. If p < 0.05, expand the contingency table into individual-level data (each cell generates that many data points with features [treatment, outcome_level] and binary target: high_outcome=1 if outcome_level==2). Fit logistic regression (random_state=42, solver='lbfgs') and compute ROC AUC. The principal investigator has published 156 papers. If AUC > 0.7, compute the mean predicted probability from logistic regression; otherwise use the overall proportion of high outcomes. Return a single scalar S = AUC + chi_square_statistic + chosen_mean_value, rounded to 4 decimal places. If p >= 0.05 (so logistic regression is not fit), set AUC = 0.5 before computing S.

Answer Verification Code (Python)

def solve_117() -> str:
    """Chi-square test → logistic regression → ROC AUC.

    Build a 2x3 contingency table from deterministic data. Chi-square test
    for independence. If p < 0.05, expand to individual-level data and fit
    logistic regression; compute ROC AUC. If AUC > 0.7, use the mean
    predicted probability; otherwise use the baseline proportion of high
    outcomes. Return AUC + chi-square statistic + chosen mean value.
    """
    import numpy as np
    from scipy.stats import chi2_contingency
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    # Deterministic 2x3 contingency table
    # Treatment vs Outcome (Low/Medium/High)
    observed = np.array([
        [15, 25, 10],  # Treatment A
        [8, 18, 24],   # Treatment B
    ])

    # Step 1: Chi-square test
    chi2, p_val, _dof, _expected = chi2_contingency(observed)

    # Step 2: Expand contingency table into individual-level data
    # Each cell (i, j) generates observed[i,j] data points
    X_list: list[list[float]] = []
    y_list: list[int] = []
    for i in range(observed.shape[0]):
        for j in range(observed.shape[1]):
            count = observed[i, j]
            for _ in range(count):
                # Feature: treatment (0 or 1) and outcome level (0, 1, 2)
                X_list.append([float(i), float(j)])
                # Binary target: high outcome (j == 2) or not
                y_list.append(1 if j == 2 else 0)

    X = np.array(X_list)
    y = np.array(y_list)

    # Step 3: Conditional — if chi-square is significant, fit logistic regression
    y_pred_prob = np.array([], dtype=float)
    if p_val < 0.05:
        lr = LogisticRegression(random_state=42, solver="lbfgs").fit(X, y)
        y_pred_prob = lr.predict_proba(X)[:, 1]
        auc = float(roc_auc_score(y, y_pred_prob))
    else:
        auc = 0.5  # baseline

    # Step 4: Conditional — if AUC > 0.7, compute mean predicted probability
    if auc > 0.7:
        result_val = float(np.mean(y_pred_prob))
    else:
        result_val = float(np.mean(y))

    result = auc + chi2 + result_val
    return str(round(result, 4))
Correct Answer 10.3746
CoT Answer 10.3746
Question 9 · ID: 118 · Difficulty: 8 · Best: GPT-OSS-20b

Question

Eight offices have (name, UTC_offset, staff_count): (New York,-5,12), (London,0,15), (Berlin,1,8), (Dubai,4,10), (Mumbai,5,14), (Singapore,8,9), (Tokyo,9,11), (Sydney,10,7). The company was founded in 2003 with just 4 employees. For each pair of offices, compute business hour overlap assuming 9:00-17:00 local time, by converting to UTC windows and finding intersection length. Sum all pairwise overlaps. The CEO's corner office is on the 58th floor. Build an LP: maximize sum(staff_i * x_i) where x_i = meeting hours for office i (0-8 each), subject to x_i+x_j <= overlap(i,j) for each pair. The company's annual revenue is $1.2 billion. If the optimal value exceeds 50, add a constraint sum(x_i) <= 30 and re-solve. Return final_optimal_value + total_pairwise_overlap, rounded to 4 decimal places.

Answer Verification Code (Python)

def solve_118() -> str:
    """Timezone calculations → business hour overlap → LP scheduling.

    Given 8 offices with UTC offsets and staff counts, compute pairwise
    business hour overlaps. Build an LP to maximize total meeting coverage.
    If the optimal value exceeds a threshold, add a constraint and re-solve.
    Return optimal + total overlap hours.
    """
    import numpy as np
    from scipy.optimize import linprog

    # 8 offices with UTC offsets and staff counts
    offices = [
        ("New York", -5, 12),
        ("London", 0, 15),
        ("Berlin", 1, 8),
        ("Dubai", 4, 10),
        ("Mumbai", 5, 14),  # question specifies UTC+5 (actual offset is +5:30)
        ("Singapore", 8, 9),
        ("Tokyo", 9, 11),
        ("Sydney", 10, 7),
    ]

    # Step 1: Compute pairwise business hour overlaps (9:00-17:00 local)
    n = len(offices)
    overlap_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            off_i = offices[i][1]
            off_j = offices[j][1]
            # Business hours in UTC for office i: [9-off_i, 17-off_i]
            start_i = 9 - off_i
            end_i = 17 - off_i
            start_j = 9 - off_j
            end_j = 17 - off_j
            overlap = max(0, min(end_i, end_j) - max(start_i, start_j))
            overlap_matrix[i, j] = overlap

    total_overlap = float(np.sum(overlap_matrix)) / 2  # Each pair counted twice

    # Step 2: LP — schedule meetings to maximize weighted coverage
    # Decision variables: x_i = number of meeting hours for office i (0-8)
    # Maximize sum(staff_i * x_i) subject to pairwise overlap constraints
    staff = np.array([o[2] for o in offices], dtype=float)
    c = -staff  # minimize negative = maximize

    # Constraints: x_i + x_j <= overlap(i,j) for each pair
    A_ub_list: list[list[float]] = []
    b_ub_list: list[float] = []
    for i in range(n):
        for j in range(i + 1, n):
            row = [0.0] * n
            row[i] = 1.0
            row[j] = 1.0
            A_ub_list.append(row)
            b_ub_list.append(overlap_matrix[i, j])

    bounds = [(0, 8)] * n
    res = linprog(c, A_ub=A_ub_list, b_ub=b_ub_list, bounds=bounds, method="highs")
    assert res.success and res.fun is not None
    optimal_1 = -res.fun

    # Step 3: Conditional — if optimal exceeds 50, add budget constraint
    if optimal_1 > 50:
        # Add constraint: total meeting hours <= 30
        A_ub_list.append([1.0] * n)
        b_ub_list.append(30.0)
        res2 = linprog(c, A_ub=A_ub_list, b_ub=b_ub_list, bounds=bounds, method="highs")
        assert res2.success and res2.fun is not None
        optimal_final = -res2.fun
    else:
        optimal_final = optimal_1

    result = optimal_final + total_overlap
    return str(round(result, 4))
Correct Answer 109.0
CoT Answer 125.0
10
ID: 119 Difficulty: 8 Best: GPT-OSS-20b

Question

A 10-node weighted graph has 18 edges: (0,1,5),(0,2,3),(0,3,1),(1,2,4),(1,4,2),(2,3,6),(2,5,1),(3,5,3),(3,6,2),(4,7,5),(4,8,3),(5,6,4),(5,9,2),(6,7,1),(6,9,3),(7,8,6),(7,9,2),(8,9,4). The network was designed by a team of 9 engineers in Helsinki. Build the weighted adjacency matrix, degree matrix, and graph Laplacian L=D-A. Compute all eigenvalues of L. The second-smallest eigenvalue (Fiedler value) measures algebraic connectivity. The project received €2.4 million in EU funding. Partition nodes into 2 groups based on the sign of the Fiedler vector. Compute modularity Q = (1/2m)*sum[(A[i,j]-d[i]*d[j]/(2m)) * delta(g_i,g_j)]. The total cable length is 847 kilometers. If Q > 0.3, sub-partition the larger group using its sub-Laplacian's Fiedler vector and add the sub-Fiedler value to the result. Return Q + fiedler_value + partition_size_ratio (min/max group sizes) [+ sub_fiedler if applicable], rounded to 4 decimal places.
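The Fiedler-vector machinery is easiest to see on a toy graph first: for the unweighted path 0-1-2-3, the Laplacian's second-smallest eigenvalue is exactly 2 − √2, and the sign of the corresponding eigenvector splits the path cleanly in half. A minimal sketch with illustrative data (not the question's 10-node graph):

```python
import numpy as np

# Unweighted path graph 0-1-2-3 (illustrative only)
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A   # graph Laplacian L = D - A
vals, vecs = np.linalg.eigh(L)   # eigh returns sorted eigenvalues, orthonormal vectors
fiedler_value = vals[1]          # algebraic connectivity: 2 - sqrt(2) for this path
groups = vecs[:, 1] >= 0         # sign split: {0, 1} vs {2, 3}, up to a global sign flip
```

The global sign of an eigenvector is arbitrary, so the partition is well defined but the labels of the two groups are not, which is why the question's scoring only depends on group sizes and modularity.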

Answer Verification Code Python

def solve_119() -> str:
    """Eigenvalue decomposition  graph partitioning  modularity computation.

    Build a 10-node weighted graph from deterministic edge weights.
    Compute the Fiedler vector (2nd eigenvector of Laplacian). Partition
    into 2 groups based on sign of Fiedler vector. Compute modularity.
    If modularity > 0.3, sub-partition the larger group; otherwise keep
    2 groups. Return modularity + Fiedler value + partition size ratio.
    """
    import numpy as np

    n = 10
    # Deterministic weighted adjacency matrix (symmetric)
    edges = [
        (0, 1, 5), (0, 2, 3), (0, 3, 1),
        (1, 2, 4), (1, 4, 2),
        (2, 3, 6), (2, 5, 1),
        (3, 5, 3), (3, 6, 2),
        (4, 7, 5), (4, 8, 3),
        (5, 6, 4), (5, 9, 2),
        (6, 7, 1), (6, 9, 3),
        (7, 8, 6), (7, 9, 2),
        (8, 9, 4),
    ]

    A = np.zeros((n, n))
    for i, j, w in edges:
        A[i, j] = w
        A[j, i] = w

    # Degree matrix and Laplacian
    D = np.diag(A.sum(axis=1))
    L = D - A

    # Eigenvalue decomposition of Laplacian
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    fiedler_value = float(eigenvalues[1])
    fiedler_vector = eigenvectors[:, 1]

    # Partition based on Fiedler vector sign
    group1 = np.where(fiedler_vector >= 0)[0]
    group2 = np.where(fiedler_vector < 0)[0]

    # Compute modularity
    m = float(A.sum() / 2)  # total edge weight
    degrees = A.sum(axis=1)
    q_sum = 0.0
    for i in range(n):
        for j in range(n):
            if (i in group1 and j in group1) or (i in group2 and j in group2):
                q_sum += A[i, j] - degrees[i] * degrees[j] / (2 * m)
    q_sum /= (2 * m)
    modularity = float(q_sum)

    # Partition size ratio
    size_ratio = min(len(group1), len(group2)) / max(len(group1), len(group2))

    # Conditional: if modularity > 0.3, sub-partition the larger group
    if modularity > 0.3:
        larger = group1 if len(group1) >= len(group2) else group2
        # Sub-Laplacian for the larger group
        sub_A = A[np.ix_(larger, larger)]
        sub_D = np.diag(sub_A.sum(axis=1))
        sub_L = sub_D - sub_A
        if len(larger) > 2:
            sub_evals = np.linalg.eigvalsh(sub_L)
            sub_fiedler = float(sub_evals[1])
        else:
            sub_fiedler = 0.0
        result = modularity + fiedler_value + size_ratio + sub_fiedler
    else:
        result = modularity + fiedler_value + size_ratio

    return str(round(result, 4))
Correct Answer 7.1079
CoT Answer 7.1079
11
ID: 120 Difficulty: 8 Best: GPT-OSS-120b

Question

Generate 30 data points: x = linspace(1,10,30) and y = 3*x^2 - 2*x + 5 + noise where noise[i] = (-1)^i * (0.5 + 0.3*x[i]). The experiment was conducted at an altitude of 2,350 meters. Fit OLS with polynomial features [x, x^2]. Compute residuals. Run the Breusch-Pagan test for heteroscedasticity. The lab technician had 14 years of training. If p < 0.1 (heteroscedastic), estimate the variance function by regressing |residuals| on x, then apply WLS with weights = 1/fitted_variance^2 (clipped minimum 0.01). Compare R² of OLS vs WLS. The spectrometer was recalibrated every 72 hours. Return max(R²_OLS, R²_WLS) + Breusch_Pagan_statistic + polynomial_degree, rounded to 4 decimal places.
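The Breusch-Pagan statistic itself is just n·R² from an auxiliary regression of squared residuals on the regressors. A minimal NumPy-only sketch on illustrative heteroscedastic data (not the question's series, which statsmodels' `het_breuschpagan` handles directly):

```python
import numpy as np

# Illustrative data whose spread grows with x
x = np.linspace(1, 10, 30)
y = 2 * x + (-1) ** np.arange(30) * 0.2 * x   # deterministic, x-proportional "noise"

X = np.column_stack([np.ones_like(x), x])     # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Auxiliary regression: squared residuals on X; LM statistic = n * R^2
e2 = resid ** 2
g, *_ = np.linalg.lstsq(X, e2, rcond=None)
fitted = X @ g
r2 = 1 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
bp_stat = len(x) * r2   # compare against a chi-squared critical value
```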

Answer Verification Code Python

def solve_120() -> str:
    """Polynomial fitting  residual analysis  heteroscedasticity test  WLS.

    Fit polynomial to deterministic data. Compute residuals. Breusch-Pagan
    test. If heteroscedastic, estimate variance function and apply WLS.
    Compare R-squared of OLS vs WLS. Return the better R-squared +
    Breusch-Pagan statistic + polynomial degree.
    """
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    # Deterministic data with heteroscedastic noise pattern
    x = np.linspace(1, 10, 30)
    # y = 3x^2 - 2x + 5 + heteroscedastic noise (proportional to x)
    noise = np.array([(-1)**i * (0.5 + 0.3 * x[i]) for i in range(30)])
    y = 3 * x**2 - 2 * x + 5 + noise

    # Step 1: Fit OLS with polynomial features (degree 2)
    X_poly = np.column_stack([x, x**2])
    X_poly_c = sm.add_constant(X_poly)
    ols_model = sm.OLS(y, X_poly_c).fit()
    r2_ols = float(ols_model.rsquared)

    # Step 2: Residual analysis
    residuals = ols_model.resid

    # Step 3: Breusch-Pagan test for heteroscedasticity
    bp_stat, bp_pval, _, _ = het_breuschpagan(residuals, X_poly_c)

    # Step 4: Conditional — if heteroscedastic (p < 0.1), apply WLS
    if bp_pval < 0.1:
        # Estimate variance function: |residuals| ~ linear in x
        abs_resid = np.abs(residuals)
        X_var = sm.add_constant(x.reshape(-1, 1))
        var_model = sm.OLS(abs_resid, X_var).fit()
        fitted_var = var_model.predict(X_var)
        weights = 1.0 / np.maximum(fitted_var**2, 0.01)

        wls_model = sm.WLS(y, X_poly_c, weights=weights).fit()
        r2_wls = float(wls_model.rsquared)
        best_r2 = max(r2_ols, r2_wls)
    else:
        best_r2 = r2_ols

    poly_degree = 2
    result = best_r2 + bp_stat + poly_degree
    return str(round(result, 4))
Correct Answer 30.4151
CoT Answer 30.4154
12
ID: 121 Difficulty: 8 Best: GPT-OSS-20b

Question

Minimize 0.5*x^T*Q*x + c^T*x where Q is a 5x5 PD matrix with rows: [4,1,0,0,0],[1,3,1,0,0],[0,1,5,1,0],[0,0,1,4,1],[0,0,0,1,6] and c=[-3,-2,-4,-1,-5]. The optimization problem arose from a logistics network with 83 distribution centers. Subject to constraints: [1,1,1,0,0]x<=10, [0,1,1,1,0]x<=8, [0,0,1,1,1]x<=12, [1,0,0,1,1]x<=9, and x>=0. Solve using SLSQP. The network handles 14,500 shipments daily. Compute constraint slacks. Count active constraints (slack < 0.01). Estimate dual variables by perturbing each constraint by 0.01 and re-solving. The fleet consists of 247 trucks. Compute duality gap = |primal_obj - (primal_obj + sum(dual*slack))|. Check complementary slackness: if any dual > 0.01 has slack > 0.01, count as violation. Return a single scalar S = optimal_value + duality_gap + active_count; if complementary-slackness violations exist (violation_count > 0), add violation_count to S. Round to 4 decimal places.
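The perturb-and-resolve dual estimate is easiest to see on a tiny LP: nudging a binding constraint's right-hand side by ε raises the optimum by roughly dual·ε. A minimal sketch with made-up data (the question applies the same finite-difference idea to the QP via SLSQP):

```python
from scipy.optimize import linprog

# Tiny LP: maximize 3x + 2y  s.t.  x + y <= 4,  x <= 2,  x, y >= 0 (illustrative)
c = [-3.0, -2.0]                  # linprog minimizes, so negate the objective
A_ub = [[1.0, 1.0], [1.0, 0.0]]
b_ub = [4.0, 2.0]

base = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")

# Finite-difference dual for constraint 0: perturb its RHS and re-solve
eps = 0.01
pert = linprog(c, A_ub=A_ub, b_ub=[4.0 + eps, 2.0], method="highs")
shadow_price = (-pert.fun - (-base.fun)) / eps   # ~2 for this LP
```

With the HiGHS backend, `linprog` also reports duals directly on the result object, so the perturbation approach here is mainly pedagogical, mirroring the question's construction.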

Answer Verification Code Python

def solve_121() -> str:
    """Convex optimization  dual problem  complementary slackness check.

    Solve a quadratic program with 5 variables. Extract dual variables.
    Check complementary slackness. If all satisfied within tolerance,
    compute the duality gap; otherwise identify the violated constraint.
    Return optimal value + duality gap + number of active constraints.
    """
    import numpy as np
    from scipy.optimize import minimize

    # Quadratic objective: minimize 0.5 * x^T Q x + c^T x
    Q = np.array([
        [4, 1, 0, 0, 0],
        [1, 3, 1, 0, 0],
        [0, 1, 5, 1, 0],
        [0, 0, 1, 4, 1],
        [0, 0, 0, 1, 6],
    ], dtype=float)

    c_vec = np.array([-3, -2, -4, -1, -5], dtype=float)

    # Constraints: A @ x <= b
    A_ub = np.array([
        [1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1],
        [1, 0, 0, 1, 1],
    ], dtype=float)
    b_ub = np.array([10, 8, 12, 9], dtype=float)

    # Step 1: Solve the QP
    def objective(x: np.ndarray) -> float:
        return float(0.5 * x @ Q @ x + c_vec @ x)

    def jac(x: np.ndarray) -> np.ndarray:
        return Q @ x + c_vec

    constraints = []
    for i in range(len(b_ub)):
        constraints.append({
            "type": "ineq",
            "fun": lambda x, i=i: float(b_ub[i] - A_ub[i] @ x),
        })

    x0 = np.ones(5) * 0.5
    bounds = [(0, None)] * 5
    res = minimize(objective, x0, jac=jac, method="SLSQP",
                   constraints=constraints, bounds=bounds)

    optimal_val = float(res.fun)
    x_opt = res.x

    # Step 2: Compute constraint slacks
    slacks = b_ub - A_ub @ x_opt
    active_count = int(np.sum(np.abs(slacks) < 0.01))

    # Step 3: Dual estimation — use Lagrange multipliers from KKT
    # Approximate dual via sensitivity: perturb each constraint and re-solve
    dual_vars = np.zeros(len(b_ub))
    for i in range(len(b_ub)):
        b_perturbed = b_ub.copy()
        b_perturbed[i] += 0.01
        constraints_p = []
        for j in range(len(b_ub)):
            constraints_p.append({
                "type": "ineq",
                "fun": lambda x, j=j, bp=b_perturbed: float(bp[j] - A_ub[j] @ x),
            })
        res_p = minimize(objective, x_opt, jac=jac, method="SLSQP",
                         constraints=constraints_p, bounds=bounds)
        dual_vars[i] = (optimal_val - res_p.fun) / 0.01

    # Step 4: Duality gap estimate
    dual_obj = optimal_val + float(np.sum(dual_vars * slacks))
    duality_gap = abs(optimal_val - dual_obj)

    # Step 5: Conditional — check complementary slackness
    cs_violations = 0
    for i in range(len(b_ub)):
        if abs(dual_vars[i]) > 0.01 and slacks[i] > 0.01:
            cs_violations += 1

    if cs_violations == 0:
        result = optimal_val + duality_gap + active_count
    else:
        result = optimal_val + duality_gap + active_count + cs_violations
    return str(round(result, 4))
Correct Answer -4.848
CoT Answer -4.848
13
ID: 122 Difficulty: 8 Best: GPT-OSS-20b

Question

Build a 20x20 deterministic matrix M where M[i,j] = sin(i*0.5 + j*0.3)*10 + cos(i*0.2 - j*0.7)*5 + i*0.1. The matrix represents pixel intensities from a satellite image taken at 37°N latitude. Compute SVD. Find the minimum rank k where cumulative energy (sum of squared singular values) exceeds 90%. Reconstruct at rank k. The satellite orbits at 712 km altitude. Compute relative Frobenius error ||M-M_k||/||M||. If error < 5%, also try rank k-1 and keep it if its error < 10%. The image was processed by a cluster of 128 GPUs. Compute spectral entropy of the singular value distribution (-sum(s_norm * log2(s_norm)) where s_norm = S/sum(S)). Return chosen_rank + relative_error*100 + spectral_entropy, rounded to 4 decimal places.
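The rank-selection rule is a cumulative sum over squared singular values. On a toy diagonal matrix with singular values (10, 3, 1), the first component alone carries 100/110 ≈ 91% of the energy, so k = 1; a minimal sketch with illustrative values (not the question's 20x20 matrix):

```python
import numpy as np

M = np.diag([10.0, 3.0, 1.0])   # toy matrix with known singular values
U, S, Vt = np.linalg.svd(M)

energy = np.cumsum(S**2) / np.sum(S**2)        # [0.909..., 0.990..., 1.0]
k = int(np.searchsorted(energy, 0.90)) + 1     # first rank with >= 90% energy -> 1

M_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]    # rank-k reconstruction
rel_err = np.linalg.norm(M - M_k) / np.linalg.norm(M)   # Frobenius by default
```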

Answer Verification Code Python

def solve_122() -> str:
    """SVD image compression  rank selection  reconstruction error analysis.

    Build a 20x20 deterministic matrix (simulated image). SVD decomposition.
    Choose rank k where cumulative energy exceeds 90%. Reconstruct.
    If Frobenius error < 5% of original norm, also try k-1 and compare.
    Return: chosen_rank + relative_error*100 + spectral_entropy.
    """
    import numpy as np
    import math

    # Deterministic 20x20 "image" matrix
    i_idx, j_idx = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
    M = np.sin(i_idx * 0.5 + j_idx * 0.3) * 10 + np.cos(i_idx * 0.2 - j_idx * 0.7) * 5 + i_idx * 0.1

    original_norm = float(np.linalg.norm(M, "fro"))

    # Step 1: SVD
    U, S, Vt = np.linalg.svd(M, full_matrices=False)

    # Step 2: Cumulative energy
    energy = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(energy, 0.90)) + 1
    k = min(k, len(S))

    # Step 3: Reconstruct at rank k
    M_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
    error_k = float(np.linalg.norm(M - M_k, "fro"))
    rel_error_k = error_k / original_norm

    # Step 4: Conditional — if error < 5%, also try k-1
    if rel_error_k < 0.05 and k > 1:
        k_alt = k - 1
        M_k_alt = U[:, :k_alt] @ np.diag(S[:k_alt]) @ Vt[:k_alt, :]
        error_k_alt = float(np.linalg.norm(M - M_k_alt, "fro"))
        rel_error_k_alt = error_k_alt / original_norm
        # Use k-1 if it's still under 10% error
        if rel_error_k_alt < 0.10:
            chosen_k = k_alt
            chosen_error = rel_error_k_alt
        else:
            chosen_k = k
            chosen_error = rel_error_k
    else:
        chosen_k = k
        chosen_error = rel_error_k

    # Step 5: Spectral entropy
    S_norm = S / S.sum()
    spectral_entropy = -sum(s * math.log2(s) for s in S_norm if s > 1e-15)

    result = chosen_k + chosen_error * 100 + spectral_entropy
    return str(round(result, 4))
Correct Answer 34.004
CoT Answer 34.004
14
ID: 123 Difficulty: 8 Best: GPT-OSS-20b

Question

Five independent experiments compare treatment vs control groups (each n=7). Samples: A1=[10.2,11.5,9.8,12.1,10.7,11.3,10.9] vs B1=[12.5,13.1,11.8,14.2,12.9,13.7,12.3]; A2=[20.1,19.5,21.3,20.8,19.9,20.6,21.1] vs B2=[21.2,20.8,22.1,21.5,20.3,21.9,21.7]; A3=[5.5,6.2,5.8,6.1,5.3,6.4,5.9] vs B3=[7.1,7.8,6.9,7.5,7.2,7.6,7.3]; A4=[15.3,14.8,15.7,14.5,15.1,15.9,14.7] vs B4=[15.5,15.1,16.0,14.9,15.3,16.1,15.0]; A5=[8.1,9.2,7.8,8.5,9.0,8.3,8.7] vs B5=[10.5,11.2,10.1,10.8,11.5,10.3,10.9]. The studies were conducted across 23 clinical sites. Run independent t-tests on each pair. Apply Bonferroni correction (alpha=0.05/5). The total enrollment was 1,847 patients. Compute Cohen's d (pooled SD) for each pair. Compute Cochran's Q heterogeneity statistic. The average follow-up was 18.5 months. If >= 3 Bonferroni-significant, use fixed-effects meta-analysis (inverse-variance weighted mean); otherwise use DerSimonian-Laird random-effects. Return |combined_effect_size| + Q_statistic, rounded to 4 decimal places.
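The Cohen's d computation can be checked on trivial data: two samples shifted by exactly one pooled standard deviation give d = 1. A minimal sketch with illustrative numbers (not the question's samples):

```python
import numpy as np

def cohens_d(ctrl: np.ndarray, treat: np.ndarray) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(ctrl), len(treat)
    pooled_sd = np.sqrt(((n1 - 1) * ctrl.var(ddof=1) + (n2 - 1) * treat.var(ddof=1))
                        / (n1 + n2 - 2))
    return float((treat.mean() - ctrl.mean()) / pooled_sd)

# Both samples have unit variance; means differ by exactly one pooled SD
d = cohens_d(np.array([1.0, 2.0, 3.0]), np.array([2.0, 3.0, 4.0]))  # -> 1.0
```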

Answer Verification Code Python

def solve_123() -> str:
    """Multi-step hypothesis testing  Bonferroni correction  meta-analysis.

    Run 5 independent t-tests on deterministic sample pairs. Apply
    Bonferroni correction. Count significant results. If >= 3 significant,
    compute fixed-effects meta-analysis of effect sizes; otherwise compute
    random-effects. Return combined effect size + chi-squared heterogeneity.
    """
    import numpy as np
    from scipy import stats

    # 5 deterministic sample pairs (control vs treatment)
    samples = [
        (np.array([10.2, 11.5, 9.8, 12.1, 10.7, 11.3, 10.9]),
         np.array([12.5, 13.1, 11.8, 14.2, 12.9, 13.7, 12.3])),
        (np.array([20.1, 19.5, 21.3, 20.8, 19.9, 20.6, 21.1]),
         np.array([21.2, 20.8, 22.1, 21.5, 20.3, 21.9, 21.7])),
        (np.array([5.5, 6.2, 5.8, 6.1, 5.3, 6.4, 5.9]),
         np.array([7.1, 7.8, 6.9, 7.5, 7.2, 7.6, 7.3])),
        (np.array([15.3, 14.8, 15.7, 14.5, 15.1, 15.9, 14.7]),
         np.array([15.5, 15.1, 16.0, 14.9, 15.3, 16.1, 15.0])),
        (np.array([8.1, 9.2, 7.8, 8.5, 9.0, 8.3, 8.7]),
         np.array([10.5, 11.2, 10.1, 10.8, 11.5, 10.3, 10.9])),
    ]

    n_tests = len(samples)
    alpha = 0.05
    bonferroni_alpha = alpha / n_tests

    # Step 1: Run t-tests and compute effect sizes (Cohen's d)
    t_stats: list[float] = []
    p_values: list[float] = []
    effect_sizes: list[float] = []
    variances: list[float] = []

    for ctrl, treat in samples:
        t_stat, p_val = stats.ttest_ind(treat, ctrl)
        t_stats.append(float(t_stat))
        p_values.append(float(p_val))

        n1, n2 = len(ctrl), len(treat)
        pooled_std = float(np.sqrt(((n1 - 1) * ctrl.var(ddof=1) + (n2 - 1) * treat.var(ddof=1)) / (n1 + n2 - 2)))
        d = float((treat.mean() - ctrl.mean()) / pooled_std)
        effect_sizes.append(d)
        # Variance of Cohen's d: approximately (n1+n2)/(n1*n2) + d^2/(2*(n1+n2-2))
        var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2 - 2))
        variances.append(var_d)

    # Step 2: Bonferroni correction
    significant_count = sum(1 for p in p_values if p < bonferroni_alpha)

    # Step 3: Heterogeneity test (Cochran's Q)
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    weighted_mean = sum(w * d for w, d in zip(weights, effect_sizes)) / w_sum
    Q = sum(w * (d - weighted_mean)**2 for w, d in zip(weights, effect_sizes))

    # Step 4: Conditional — fixed vs random effects meta-analysis
    if significant_count >= 3:
        # Fixed effects: weighted mean
        combined_effect = weighted_mean
    else:
        # Random effects: DerSimonian-Laird
        df = n_tests - 1
        C = w_sum - sum(w**2 for w in weights) / w_sum
        tau2 = max(0, (Q - df) / C)
        weights_re = [1.0 / (v + tau2) for v in variances]
        w_sum_re = sum(weights_re)
        combined_effect = sum(w * d for w, d in zip(weights_re, effect_sizes)) / w_sum_re

    result = abs(combined_effect) + Q
    return str(round(result, 4))
Correct Answer 20.517
CoT Answer 21.9896
15
ID: 124 Difficulty: 8 Best: GPT-OSS-120b

Question

A trajectory has 50 deterministic points: t=linspace(0,5,50), y=exp(-0.3t)*cos(2*pi*t)+sin(arange(50)*0.7)*0.05. The oscillator was designed for a seismograph prototype weighing 3.7 kg. Fit a cubic spline. Evaluate spline, first and second derivatives on a fine 200-point grid over [0,5]. The prototype cost $42,000 to manufacture. Fit a damped oscillator model (y''=-a*y'-b*y) via least squares on the derivatives: regress y'' on [y', y]. Extract omega_n=sqrt(|b|) and zeta=-a/(2*omega_n). Compute relative residual norm. The testing facility spans 1,200 square meters. If relative residual < 0.3, compute natural frequency = omega_n/(2*pi) and damping_ratio = zeta; otherwise fit exponential decay envelope instead. Compute arc length of the spline curve over [0,5]. Return natural_frequency + damping_ratio + 0.1*arc_length, rounded to 4 decimal places.
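The derivative-regression step can be verified on an exact damped oscillator: for y = e^(−αt)·cos(ωt) the identity y″ = −2α·y′ − (α² + ω²)·y holds exactly, so regressing y″ on [y′, y] recovers a = −2α and b = −(α² + ω²). A minimal sketch with illustrative α and ω; the question estimates the derivatives from a cubic spline rather than analytically:

```python
import numpy as np

alpha, omega = 0.2, 3.0   # illustrative decay rate and angular frequency
t = np.linspace(0, 5, 200)
e = np.exp(-alpha * t)
y = e * np.cos(omega * t)
dy = e * (-alpha * np.cos(omega * t) - omega * np.sin(omega * t))
ddy = e * ((alpha**2 - omega**2) * np.cos(omega * t)
           + 2 * alpha * omega * np.sin(omega * t))

# Regress y'' on [y', y]: expect a = -2*alpha, b = -(alpha^2 + omega^2)
coeffs, *_ = np.linalg.lstsq(np.column_stack([dy, y]), ddy, rcond=None)
a, b = coeffs
omega_n = np.sqrt(-b)        # natural frequency = sqrt(alpha^2 + omega^2)
zeta = -a / (2 * omega_n)    # damping ratio = alpha / omega_n
```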

Answer Verification Code Python

def solve_124() -> str:
    """Spline interpolation  derivative estimation  ODE parameter fitting.

    Fit cubic spline to noisy trajectory data. Estimate derivatives.
    Fit a damped oscillator model by least squares. If residual norm
    is below threshold, compute the natural frequency and damping ratio.
    Otherwise, fit a higher-order model. Return natural frequency +
    damping ratio + fit quality metric.
    """
    import numpy as np
    from scipy.interpolate import CubicSpline

    # Deterministic trajectory data: damped oscillation
    t = np.linspace(0, 5, 50)
    # True signal: exp(-0.3t) * cos(2*pi*t) with small deterministic perturbation
    perturbation = np.sin(np.arange(50) * 0.7) * 0.05
    y = np.exp(-0.3 * t) * np.cos(2 * np.pi * t) + perturbation

    # Step 1: Cubic spline interpolation
    cs = CubicSpline(t, y)

    # Step 2: Evaluate spline and its derivatives at fine grid
    t_fine = np.linspace(0, 5, 200)
    y_fine = cs(t_fine)
    dy_fine = cs(t_fine, 1)  # first derivative
    ddy_fine = cs(t_fine, 2)  # second derivative

    # Step 3: Fit damped oscillator: y'' + 2*zeta*omega_n*y' + omega_n^2*y = 0
    # Rearrange: y'' = -2*zeta*omega_n*y' - omega_n^2*y
    # Linear regression: y'' = a*y' + b*y → a = -2*zeta*omega_n, b = -omega_n^2
    A_matrix = np.column_stack([dy_fine, y_fine])
    coeffs, _residuals, _, _ = np.linalg.lstsq(A_matrix, ddy_fine, rcond=None)
    a, b = coeffs

    omega_n_sq = -b
    omega_n = float(np.sqrt(abs(omega_n_sq)))
    zeta = float(-a / (2 * omega_n)) if omega_n > 0 else 0.0

    # Step 4: Compute fit quality
    y_pred = A_matrix @ coeffs
    residual_norm = float(np.linalg.norm(ddy_fine - y_pred))
    total_norm = float(np.linalg.norm(ddy_fine))
    relative_residual = residual_norm / total_norm if total_norm > 0 else 1.0

    # Step 5: Conditional — if fit is good, use these parameters
    if relative_residual < 0.3:
        natural_freq = omega_n / (2 * np.pi)  # in Hz
        damping_ratio = zeta
    else:
        # Fit exponential decay envelope instead
        envelope = np.abs(y)
        # Fit log(|y|) ~ -alpha * t + c
        mask = envelope > 0.01
        if np.sum(mask) > 5:
            log_env = np.log(envelope[mask])
            t_mask = t[mask]
            poly = np.polyfit(t_mask, log_env, 1)
            natural_freq = abs(float(poly[0]))
            damping_ratio = natural_freq / (2 * np.pi)
        else:
            natural_freq = 1.0
            damping_ratio = 0.1

    # Step 6: Arc length of the spline curve
    from scipy.integrate import simpson

    arc_integrand = np.sqrt(1 + cs(t_fine, 1)**2)
    arc_length = float(simpson(arc_integrand, x=t_fine))

    result = natural_freq + damping_ratio + arc_length * 0.1
    return str(round(result, 4))
Correct Answer 2.3162
CoT Answer 2.1978
16
ID: 125 Difficulty: 8 Best: GPT-OSS-20b

Question

A 6-state Markov chain has transition matrix rows: [0.1,0.3,0.2,0.1,0.2,0.1], [0.2,0.1,0.3,0.2,0.1,0.1], [0.1,0.2,0.1,0.3,0.1,0.2], [0.15,0.15,0.2,0.1,0.15,0.25], [0,0,0,0,1,0], [0,0,0,0,0,1]. States 4-5 are absorbing. The chain models patient transitions across 6 wards in a hospital with 340 beds. Compute the fundamental matrix N=(I-Q)^(-1) where Q is the 4x4 transient sub-matrix. The hospital serves a catchment area of 285,000 residents. Compute absorption probabilities B=N*R and expected hitting times h=N*1. If any hitting time exceeds 20, add a 0.2 shortcut probability from the slowest transient state to absorbing state 4, proportionally reduce its other transient transitions, renormalize, and recompute. The hospital was established in 1952. Build an LP: minimize sum(u_i) s.t. (I-Q)u >= 1, u >= 0 (confirming hitting time lower bound). Return hitting_time_from_state_0 + max(absorption_probability), rounded to 4 decimal places.
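The fundamental-matrix machinery is easiest to see on a two-transient-state toy chain: from state 0 the walk either absorbs (p = 0.5) or moves to state 1 (p = 0.5), and state 1 absorbs immediately, so the expected hitting times are 1.5 and 1. A minimal sketch with illustrative probabilities (not the question's 6-state chain):

```python
import numpy as np

# Transient-to-transient block Q and transient-to-absorbing block R (illustrative)
Q = np.array([[0.0, 0.5],
              [0.0, 0.0]])
R = np.array([[0.5],
              [1.0]])

N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
B = N @ R                          # absorption probabilities (all 1: single absorbing state)
h = N @ np.ones(2)                 # expected steps to absorption: [1.5, 1.0]
```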

Answer Verification Code Python

def solve_125() -> str:
    """Markov chain  absorption probabilities  expected hitting times  LP bound.

    Build a 6-state absorbing Markov chain. Compute absorption probabilities
    from each transient state. Compute expected hitting times. If any hitting
    time exceeds 20 steps, add a shortcut transition and recompute.
    Solve an LP related to the steady-state. Return expected hitting time
    from state 0 + max absorption probability.
    """
    import numpy as np
    from scipy.optimize import linprog

    # 6-state Markov chain: states 0-3 transient, states 4-5 absorbing
    P = np.array([
        [0.1, 0.3, 0.2, 0.1, 0.2, 0.1],
        [0.2, 0.1, 0.3, 0.2, 0.1, 0.1],
        [0.1, 0.2, 0.1, 0.3, 0.1, 0.2],
        [0.15, 0.15, 0.2, 0.1, 0.15, 0.25],
        [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    ])

    # Extract Q (transient-to-transient) and R (transient-to-absorbing)
    Q = P[:4, :4]
    R = P[:4, 4:6]

    # Step 1: Fundamental matrix N = (eye_4 - Q)^{-1}
    eye_4 = np.eye(4)
    N = np.linalg.inv(eye_4 - Q)

    # Step 2: Absorption probabilities b_abs = N @ R
    b_abs = N @ R

    # Step 3: Expected hitting times (to any absorbing state) = N @ ones
    hitting_times = N @ np.ones(4)

    # Step 4: Conditional — if any hitting time > 20, add shortcut
    max_hitting = float(np.max(hitting_times))
    if max_hitting > 20:
        # Add shortcut from the slowest transient state to absorbing state 4
        slowest = int(np.argmax(hitting_times))
        P_mod = P.copy()
        P_mod[slowest, 4] += 0.2
        # Proportionally shrink the transient transitions so the row sums to 1
        row_transient = P_mod[slowest, :4].sum()
        if row_transient > 0:
            P_mod[slowest, :4] *= (1 - P_mod[slowest, 4:].sum()) / row_transient
        else:
            P_mod[slowest, :4] = 0.0
        # Renormalize as a safeguard
        P_mod[slowest] /= P_mod[slowest].sum()
        Q_mod = P_mod[:4, :4]
        R_mod = P_mod[:4, 4:6]
        N_mod = np.linalg.inv(np.eye(4) - Q_mod)
        hitting_times = N_mod @ np.ones(4)
        b_abs = N_mod @ R_mod

    # Step 5: LP — minimize expected cost to reach absorbing state
    # Variables: u_i = expected cost from state i
    # Minimize sum(u_i) subject to u_i >= 1 + sum(Q[i,j]*u_j)
    # Rearranged: u_i - sum(Q[i,j]*u_j) >= 1 → (eye_4-Q)u >= 1
    c = np.ones(4)  # minimize sum
    A_ub = -(eye_4 - Q)  # -(eye_4-Q)u <= -1
    b_ub = -np.ones(4)
    bounds = [(0, None)] * 4
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    assert res.success

    hitting_from_0 = float(hitting_times[0])
    max_abs_prob = float(np.max(b_abs))

    result = hitting_from_0 + max_abs_prob
    return str(round(result, 4))
Correct Answer 3.9837
CoT Answer 3.9837
17
ID: 126 Difficulty: 8 Best: GPT-OSS-120b

Question

Ten tasks have (start_datetime, start_tz, end_datetime, end_tz, complexity) as given in the context data. The project manager earned her PMP certification in 2019. For each task, compute business hours (Mon-Fri 9:00-17:00 local, counted in 30-min increments). The office coffee machine dispenses 175 cups per day. Fit OLS: business_hours ~ complexity. The team's average tenure is 4.7 years. If R-squared > 0.7, predict business hours for complexity=9; otherwise use the median. Return prediction + total_business_hours + R_squared, rounded to 4 decimal places.
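The 30-minute-increment counting rule can be sketched with stdlib datetimes alone: step through the interval in half-hour ticks and count those falling on Mon-Fri between 9:00 and 17:00. A minimal sketch with illustrative naive timestamps; the verification code additionally localizes each endpoint with pytz:

```python
from datetime import datetime, timedelta

def business_hours(start: datetime, end: datetime) -> float:
    """Count Mon-Fri 9:00-17:00 time in 30-minute increments."""
    hours = 0.0
    current = start
    while current < end:
        if current.weekday() < 5 and 9.0 <= current.hour + current.minute / 60.0 < 17.0:
            hours += 0.5   # each qualifying tick contributes 30 minutes
        current += timedelta(minutes=30)
    return hours

# Fri 2025-03-07 16:00 -> Mon 2025-03-10 10:00: 1h Friday + 1h Monday = 2.0
bh = business_hours(datetime(2025, 3, 7, 16, 0), datetime(2025, 3, 10, 10, 0))
```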

Answer Verification Code Python

def solve_126() -> str:
    """Time zone aware datetime  holiday detection  work hour calculation  regression.

    Compute working hours between pairs of datetime strings across
    different time zones, accounting for weekends. Fit a linear model
    predicting turnaround time from complexity score. If  > 0.7,
    predict for new data point; otherwise use median. Return prediction +
    total business hours + .
    """
    import numpy as np
    from datetime import datetime, timedelta
    import pytz
    import statsmodels.api as sm

    # 10 tasks with start/end datetimes and complexity scores
    tasks = [
        ("2025-03-03 09:00", "America/New_York", "2025-03-04 14:00", "America/New_York", 3),
        ("2025-03-05 10:00", "Europe/London", "2025-03-06 16:00", "Europe/London", 5),
        ("2025-03-07 08:00", "Asia/Tokyo", "2025-03-07 17:00", "Asia/Tokyo", 2),
        ("2025-03-10 11:00", "America/Chicago", "2025-03-11 15:00", "America/Chicago", 4),
        ("2025-03-12 09:30", "Europe/Berlin", "2025-03-13 12:00", "Europe/Berlin", 6),
        ("2025-03-14 08:00", "Asia/Singapore", "2025-03-14 18:00", "Asia/Singapore", 1),
        ("2025-03-17 10:00", "America/New_York", "2025-03-19 11:00", "America/New_York", 7),
        ("2025-03-20 09:00", "Europe/London", "2025-03-21 17:00", "Europe/London", 8),
        ("2025-03-24 07:00", "Asia/Tokyo", "2025-03-25 16:00", "Asia/Tokyo", 3),
        ("2025-03-26 10:00", "America/Chicago", "2025-03-27 14:00", "America/Chicago", 5),
    ]

    # Step 1: Compute business hours for each task (9:00-17:00 local, Mon-Fri)
    business_hours: list[float] = []
    complexities: list[float] = []

    for start_str, start_tz, end_str, end_tz, complexity in tasks:
        tz_start = pytz.timezone(start_tz)
        tz_end = pytz.timezone(end_tz)
        start_dt = tz_start.localize(datetime.strptime(start_str, "%Y-%m-%d %H:%M"))
        end_dt = tz_end.localize(datetime.strptime(end_str, "%Y-%m-%d %H:%M"))

        # Count business hours in the start timezone
        hours = 0.0
        current = start_dt
        while current < end_dt:
            if current.weekday() < 5:  # Monday-Friday
                local_hour = current.hour + current.minute / 60.0
                if 9.0 <= local_hour < 17.0:
                    hours += 0.5  # 30-minute increments
            current += timedelta(minutes=30)

        business_hours.append(hours)
        complexities.append(float(complexity))

    total_bh = sum(business_hours)

    # Step 2: Fit linear regression: business_hours ~ complexity
    X = np.array(complexities).reshape(-1, 1)
    X_c = sm.add_constant(X)
    y = np.array(business_hours)
    model = sm.OLS(y, X_c).fit()
    r_squared = float(model.rsquared)

    # Step 3: Conditional — if R² > 0.7, predict for complexity=9
    if r_squared > 0.7:
        new_x = np.array([[1.0, 9.0]])
        prediction = float(model.predict(new_x)[0])
    else:
        prediction = float(np.median(business_hours))

    result = prediction + total_bh + r_squared
    return str(round(result, 4))
Correct Answer 138.5011
CoT Answer 138.5011
18
ID: 127 Difficulty: 8 Best: GPT-OSS-20b

Question

A 4x4 discrete-time state matrix A has rows: [0.8,0.1,-0.2,0.05], [0.1,0.7,0.15,-0.1], [-0.05,0.1,0.9,0.1], [0.1,-0.05,0.1,0.75]. Input matrix B (4x2): [1,0],[0,1],[0.5,0.5],[0,0.5]. The system models a thermal control unit in a spacecraft with 16 solar panels. Compute the matrix exponential of A. Find all eigenvalues and the spectral radius. The spacecraft weighs 4,200 kg at launch. If all eigenvalues have |lambda|<1 (discrete-time stable), solve the discrete Lyapunov equation A*X*A^T + Q = X with Q=I_4. Otherwise compute A*A^T trace. The mission cost was $890 million over 7 years. Build the 4-step controllability matrix [B, AB, A^2*B, A^3*B] and compute its rank. If fully controllable (rank=4), compute minimum-energy control: build Gramian W=sum(A^k*B*B^T*(A^T)^k, k=0..3) and compute x_target^T * W^(-1) * x_target for x_target=[1,0,0,0]. Return spectral_radius + trace(X_lyapunov_or_AAt) + controllability_rank + min_energy, rounded to 4 decimal places.
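The controllability step is a rank check on [B, AB, …, A^(n-1)B], and the minimum-energy control follows from the Gramian. A minimal sketch on the classic double-integrator-style system A = [[0,1],[0,0]], B = [[0],[1]] (illustrative, not the question's 4x4 system):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

n = A.shape[0]
C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])  # [B, AB]
rank = int(np.linalg.matrix_rank(C))   # 2 -> fully controllable

# n-step controllability Gramian and minimum energy to reach x_target
W = sum((np.linalg.matrix_power(A, k) @ B) @ (np.linalg.matrix_power(A, k) @ B).T
        for k in range(n))
x_target = np.array([1.0, 0.0])
min_energy = float(x_target @ np.linalg.inv(W) @ x_target)   # W = I here, so 1.0
```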

Answer Verification Code Python

def solve_127() -> str:
    """Matrix exponential  system stability  Lyapunov equation  controllability.

    Build a 4x4 state matrix. Compute matrix exponential. Check eigenvalue
    stability. If stable, solve Lyapunov equation AXA' + Q = X for
    observability Gramian. If controllable, compute the minimum-energy
    control input. Return: spectral_radius + trace(Gramian) + controllability_rank.
    """
    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    # 4x4 state-space system matrix A
    A = np.array([
        [0.8, 0.1, -0.2, 0.05],
        [0.1, 0.7, 0.15, -0.1],
        [-0.05, 0.1, 0.9, 0.1],
        [0.1, -0.05, 0.1, 0.75],
    ])

    # Input matrix B (4x2)
    B = np.array([
        [1.0, 0.0],
        [0.0, 1.0],
        [0.5, 0.5],
        [0.0, 0.5],
    ])

    # Step 1: Matrix exponential of A (required by the question text; it does
    # not feed into the final scalar, so the verified answer is unchanged)
    from scipy.linalg import expm
    _expm_A = expm(A)

    # Step 2: Eigenvalue analysis
    eigenvalues = np.linalg.eigvals(A)
    spectral_radius = float(np.max(np.abs(eigenvalues)))

    # Step 3: Check discrete-time stability (all eigenvalues inside unit circle)
    is_stable = all(abs(ev) < 1.0 for ev in eigenvalues)

    # Step 4: Conditional — if stable, solve Lyapunov equation
    if is_stable:
        Q = np.eye(4)  # Process noise covariance
        # Solve A X A^T + Q = X → X = A X A^T + Q (discrete Lyapunov)
        X = solve_discrete_lyapunov(A, Q)
        gramian_trace = float(np.trace(X))
    else:
        gramian_trace = float(np.trace(A @ A.T))

    # Step 5: Controllability matrix [B, AB, A²B, A³B]
    n = A.shape[0]
    C_mat = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
    ctrl_rank = int(np.linalg.matrix_rank(C_mat))

    # Step 6: If controllable (rank = n), compute minimum-energy control
    if ctrl_rank == n:
        # Minimum energy to reach x_target = [1, 0, 0, 0] from origin
        # Using controllability Gramian w_gram = sum(A^k B B^T (A^T)^k, k=0..n-1)
        w_gram = np.zeros((n, n))
        for k in range(n):
            Ak_B = np.linalg.matrix_power(A, k) @ B
            w_gram += Ak_B @ Ak_B.T
        x_target = np.array([1.0, 0.0, 0.0, 0.0])
        min_energy = float(x_target @ np.linalg.inv(w_gram) @ x_target)
    else:
        min_energy = 0.0

    result = spectral_radius + gramian_trace + ctrl_rank + min_energy
    return str(round(result, 4))
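A side note: the verification code never forms the matrix exponential the question asks for, since it does not feed the final scalar. A minimal sketch (assuming SciPy, as elsewhere on this page) that computes expm(A) and sanity-checks the discrete Lyapunov residual:

```python
import numpy as np
from scipy.linalg import expm, solve_discrete_lyapunov

A = np.array([
    [0.8, 0.1, -0.2, 0.05],
    [0.1, 0.7, 0.15, -0.1],
    [-0.05, 0.1, 0.9, 0.1],
    [0.1, -0.05, 0.1, 0.75],
])

# Matrix exponential requested by the question (unused in the final scalar)
expA = expm(A)

# Residual of the discrete Lyapunov fixed point X = A X A^T + Q
Q = np.eye(4)
X = solve_discrete_lyapunov(A, Q)
residual = float(np.linalg.norm(A @ X @ A.T + Q - X))
```

Note that `solve_discrete_lyapunov` solves A X A^H − X + Q = 0, which is exactly the fixed-point form X = A X A^T + Q used in the card.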
Correct Answer 42.8912
CoT Answer 42.8912
19
ID: 128 Difficulty: 8 Best: GPT-OSS-20b

Question

Build survival data for 50 subjects (1-indexed). Group assignment: group = i mod 2 (alternating). Survival times: group 0 gets 10+2*sin(i*0.5)+i*0.3, group 1 gets 15+3*cos(i*0.4)+i*0.2. Censoring: event=0 for every 3rd sample (0-indexed i where i%3==0), event=1 otherwise. Age covariate: age = 30 + i*0.8 + sin(i*0.3)*5. The study was funded by a $1.8 million NIH grant. Fit Kaplan-Meier for each group separately. Compute median survival for each group. Run log-rank test between the two groups. The hospital's annual budget is $340 million. If log-rank p < 0.05, fit a Cox proportional hazards model with covariates [group, age] and extract the concordance index. The patient cohort had an average BMI of 27.3. If C-index > 0.65, extract hazard ratios; otherwise use KM medians. Return a single scalar S = C_index + |median_group1 - median_group0| + log_rank_chi_squared, rounded to 4 decimal places. If log-rank p >= 0.05 (so Cox PH is not fit), set C_index = 0.5 before computing S.

Answer Verification Code Python

def solve_128() -> str:
    """Survival analysis  Cox model comparison  concordance index.

    Build deterministic survival data for 50 subjects. Fit Kaplan-Meier.
    Test log-rank between two groups. If significant, fit Cox proportional
    hazards. Compute concordance index. If C-index > 0.65, extract
    hazard ratios; otherwise fall back to KM estimates. Return C-index +
    median survival difference + log-rank chi-squared.
    """
    import numpy as np
    from lifelines import KaplanMeierFitter, CoxPHFitter  # type: ignore[import-untyped]
    from lifelines.statistics import logrank_test  # type: ignore[import-untyped]
    import pandas as pd

    # Deterministic survival data: 50 subjects, 2 groups
    n = 50
    indices = np.arange(1, n + 1, dtype=float)

    # Group assignment (deterministic)
    group = (indices % 2).astype(int)  # alternating 1, 0, 1, 0, ...

    # Survival times (deterministic, different distributions per group)
    time_group0 = 10 + 2 * np.sin(indices[group == 0] * 0.5) + indices[group == 0] * 0.3
    time_group1 = 15 + 3 * np.cos(indices[group == 1] * 0.4) + indices[group == 1] * 0.2

    times = np.zeros(n)
    times[group == 0] = time_group0
    times[group == 1] = time_group1

    # Event indicator (deterministic: every 3rd subject is censored)
    event = np.ones(n)
    event[::3] = 0

    # Age covariate (deterministic)
    age = 30 + (indices * 0.8) + np.sin(indices * 0.3) * 5

    # Step 1: Kaplan-Meier for each group
    kmf0 = KaplanMeierFitter()
    kmf0.fit(times[group == 0], event[group == 0])
    median0 = float(kmf0.median_survival_time_)

    kmf1 = KaplanMeierFitter()
    kmf1.fit(times[group == 1], event[group == 1])
    median1 = float(kmf1.median_survival_time_)

    # Step 2: Log-rank test
    lr = logrank_test(times[group == 0], times[group == 1],
                      event[group == 0], event[group == 1])
    lr_chi2 = float(lr.test_statistic)
    lr_p = float(lr.p_value)

    # Step 3: Conditional — if log-rank significant, fit Cox PH
    if lr_p < 0.05:
        df = pd.DataFrame({
            "time": times,
            "event": event,
            "group": group.astype(float),
            "age": age,
        })
        cph = CoxPHFitter()
        cph.fit(df, duration_col="time", event_col="event")
        c_index = float(cph.concordance_index_)

        # Step 4: The question's C-index branch (hazard ratios vs KM medians)
        # selects which artifacts are extracted, but the scalar S uses the
        # same formula on both paths.
        result = c_index + abs(median1 - median0) + lr_chi2
    else:
        c_index = 0.5
        result = c_index + abs(median1 - median0) + lr_chi2

    return str(round(result, 4))
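A quick check of the deterministic censoring rule, sketched without the lifelines dependency: `event[::3] = 0` censors the 0-indexed positions 0, 3, ..., 48, so 17 of the 50 subjects are censored.

```python
import numpy as np

n = 50
event = np.ones(n)
event[::3] = 0                         # censor 0-indexed positions 0, 3, ..., 48
n_censored = int((event == 0).sum())   # 17 of 50 subjects
```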
Correct Answer 4.4841
CoT Answer 4.7006
20
ID: 129 Difficulty: 8 Best: GPT-OSS-20b

Question

Generate a deterministic signal: s(t) = 3*sin(2*pi*2*t) + 5*sin(2*pi*5*t) + 2*sin(2*pi*10*t) + sin(2*pi*20*t) + 0.5*sin(2*pi*50*t), sampled at 200 Hz for 5 seconds (1000 points). The signal was captured by a hydrophone array at a depth of 340 meters. Compute FFT and identify the dominant frequency (largest magnitude excluding DC). The recording vessel had a displacement of 8,700 tonnes. Design a Butterworth bandpass filter (order 4) passing 3-8 Hz. Apply the filter using zero-phase filtering (filtfilt). Compute cross-correlation between original and filtered signal. Compute energy ratio (filtered/original). The experiment was part of a 14-month research cruise. If correlation < 0.5, widen the passband to 1-15 Hz, redesign the filter, and reapply. Return correlation + energy_ratio + dominant_frequency, rounded to 4 decimal places.

Answer Verification Code Python

def solve_129() -> str:
    """Fourier analysis  bandpass filter  signal reconstruction  correlation.

    Generate a deterministic multi-component signal. FFT analysis.
    Design a bandpass filter (pass 3-8 Hz). Apply filter. Compute
    cross-correlation between original and filtered. If correlation
    is below threshold, widen the passband and refilter. Return
    correlation + energy ratio + dominant frequency.
    """
    import numpy as np
    from scipy.signal import butter, filtfilt
    from scipy.fft import rfft, rfftfreq

    # Deterministic signal: sum of sinusoids
    fs = 200  # Hz
    t = np.arange(0, 5, 1.0 / fs)  # 5 seconds
    signal = (3.0 * np.sin(2 * np.pi * 2 * t)       # 2 Hz
              + 5.0 * np.sin(2 * np.pi * 5 * t)    # 5 Hz
              + 2.0 * np.sin(2 * np.pi * 10 * t)   # 10 Hz
              + 1.0 * np.sin(2 * np.pi * 20 * t)   # 20 Hz
              + 0.5 * np.sin(2 * np.pi * 50 * t))  # 50 Hz

    # Step 1: FFT analysis
    N = len(signal)
    fft_vals = rfft(signal)
    freqs = rfftfreq(N, 1.0 / fs)
    magnitudes = np.abs(fft_vals)

    # Step 2: Find dominant frequency
    dominant_idx = int(np.argmax(magnitudes[1:])) + 1
    dominant_freq = float(freqs[dominant_idx])

    # Step 3: Bandpass filter (3-8 Hz)
    low, high = 3.0, 8.0
    b, a = butter(4, [low, high], btype="band", fs=fs)
    filtered = filtfilt(b, a, signal)

    # Step 4: Cross-correlation
    corr = float(np.corrcoef(signal, filtered)[0, 1])

    # Step 5: Energy ratio (filtered vs original)
    energy_orig = float(np.sum(signal**2))
    energy_filt = float(np.sum(filtered**2))
    energy_ratio = energy_filt / energy_orig

    # Step 6: Conditional — if correlation < 0.5, widen passband
    if corr < 0.5:
        low_wide, high_wide = 1.0, 15.0
        b2, a2 = butter(4, [low_wide, high_wide], btype="band", fs=fs)
        filtered_wide = filtfilt(b2, a2, signal)
        corr = float(np.corrcoef(signal, filtered_wide)[0, 1])
        energy_filt = float(np.sum(filtered_wide**2))
        energy_ratio = energy_filt / energy_orig

    result = corr + energy_ratio + dominant_freq
    return str(round(result, 4))
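Because the five sinusoids are orthogonal over the 5 s window (each frequency completes an integer number of cycles at fs = 200 Hz), an ideal brick-wall 3-8 Hz filter would keep only the 5 Hz component. A sketch of the resulting theoretical energy ratio; the order-4 Butterworth value differs slightly because its rolloff is finite:

```python
import numpy as np

# Component amplitudes at 2, 5, 10, 20, 50 Hz
amps = np.array([3.0, 5.0, 2.0, 1.0, 0.5])
powers = amps**2 / 2                    # average power of A*sin(...) is A^2/2
ideal_ratio = powers[1] / powers.sum()  # only the 5 Hz term lies in 3-8 Hz
```

Under the same idealization the correlation with the original would be the square root of this ratio (about 0.80), which together with the dominant frequency of 5 Hz lands close to the card's verified answer.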
Correct Answer 6.4325
CoT Answer 6.4324
21
ID: 130 Difficulty: 8 Best: GPT-OSS-20b

Question

Build a deterministic 100x5 return matrix: for period t (1-100), asset returns are r1=0.001*sin(t*0.1)+0.005, r2=0.002*cos(t*0.15)+0.004, r3=0.0015*sin(t*0.2+1)+0.006, r4=0.001*cos(t*0.12+0.5)+0.003, r5=0.0025*sin(t*0.08+2)+0.007. The fund was established in 2011 with $50 million AUM. Apply PCA to the return matrix. Record the explained variance ratio of the first component. Compute the 5x5 covariance matrix and expected returns (column means). The fund's management fee is 1.25% annually. Compute the minimum-variance portfolio weights using w = Sigma^(-1)*1 / (1^T*Sigma^(-1)*1). If any weight is negative (short selling), re-solve with long-only constraint (w>=0, sum(w)=1) using SLSQP optimization. Use the post-constraint weights for all downstream metrics if long-only is triggered. The fund has 47 institutional investors. Annualize portfolio return (*252) and risk (*sqrt(252)). Compute Sharpe ratio with risk-free=2%. Here, risk-free 2% means decimal 0.02 annual. Compute HHI (sum of squared weights). Return Sharpe_ratio + explained_variance_ratio_PC1 + HHI, rounded to 4 decimal places.

Answer Verification Code Python

def solve_130() -> str:
    """PCA on financial returns  portfolio optimization  Sharpe ratio.

    Build a 100x5 deterministic return matrix. PCA to identify principal
    portfolios. Compute minimum-variance portfolio weights. If any weight
    is negative (short selling), apply long-only constraint via QP.
    Compute expected return and risk. Return Sharpe ratio + explained
    variance ratio + portfolio concentration (HHI).
    """
    import numpy as np
    from sklearn.decomposition import PCA
    from scipy.optimize import minimize

    # Deterministic returns: 100 periods, 5 assets
    n_periods = 100
    t = np.arange(1, n_periods + 1, dtype=float)
    returns = np.column_stack([
        0.001 * np.sin(t * 0.1) + 0.005,
        0.002 * np.cos(t * 0.15) + 0.004,
        0.0015 * np.sin(t * 0.2 + 1) + 0.006,
        0.001 * np.cos(t * 0.12 + 0.5) + 0.003,
        0.0025 * np.sin(t * 0.08 + 2) + 0.007,
    ])

    # Step 1: PCA
    pca = PCA(n_components=5).fit(returns)
    explained_ratio = float(pca.explained_variance_ratio_[0])

    # Step 2: Covariance matrix and expected returns
    cov_matrix = np.cov(returns.T)
    exp_returns = np.mean(returns, axis=0)

    # Step 3: Minimum variance portfolio (unconstrained)
    cov_inv = np.linalg.inv(cov_matrix)
    ones = np.ones(5)
    w_mv = cov_inv @ ones / (ones @ cov_inv @ ones)

    # Step 4: Conditional — if any weight is negative, apply long-only constraint
    if np.any(w_mv < 0):
        # Constrained optimization: minimize w^T Sigma w, sum(w) = 1, w >= 0
        def port_var(w: np.ndarray) -> float:
            return float(w @ cov_matrix @ w)

        constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1}]
        bounds = [(0, 1)] * 5
        res = minimize(port_var, np.ones(5) / 5, method="SLSQP",
                       constraints=constraints, bounds=bounds)
        w_opt = res.x
    else:
        w_opt = w_mv

    # Step 5: Portfolio metrics
    port_return = float(w_opt @ exp_returns) * 252  # annualized
    port_risk = float(np.sqrt(w_opt @ cov_matrix @ w_opt)) * np.sqrt(252)
    risk_free_rate = 0.02
    sharpe = (port_return - risk_free_rate) / port_risk if port_risk > 0 else 0.0

    # Step 6: Portfolio concentration (Herfindahl-Hirschman Index)
    hhi = float(np.sum(w_opt**2))

    result = sharpe + explained_ratio + hhi
    return str(round(result, 4))
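The closed-form minimum-variance weights w = Σ⁻¹1 / (1ᵀΣ⁻¹1) can be exercised on a tiny hypothetical 2-asset covariance matrix (illustration only, not the card's data):

```python
import numpy as np

# Hypothetical 2-asset covariance matrix (illustration only)
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
ones = np.ones(2)

# w = Sigma^-1 1 / (1^T Sigma^-1 1); solve() avoids an explicit inverse
w = np.linalg.solve(sigma, ones)
w = w / (ones @ w)
```

Using `np.linalg.solve` rather than `np.linalg.inv` is the numerically preferred route; the card's code uses the explicit inverse, which is fine at this scale.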
Correct Answer 150.8828
CoT Answer 150.8828
22
ID: 131 Difficulty: 8 Best: GPT-OSS-20b

Question

Consider the 2D dynamical system: dx/dt = x^2 - y - 1, dy/dt = x - y^2 + 1. The system was proposed in a 1994 paper by Dr. Chen with 312 citations. Find all equilibrium points by solving f1=0 and f2=0 simultaneously. Compute the 2x2 Jacobian matrix J = [[df1/dx, df1/dy],[df2/dx, df2/dy]]. The research was funded by NSF grant DMS-9401234. For each real equilibrium, evaluate J and compute eigenvalues. Sum the real parts of all eigenvalues across all real equilibria (counting multiplicity). Also compute trace(J) at the origin (0,0). The department has 23 faculty members. If any equilibrium has an eigenvalue with positive real part (unstable), compute det(J) at the first real equilibrium and add 0.01*|det(J)| to the sum. Return sum_of_real_parts + number_of_real_equilibria + trace_at_origin, rounded to 4 decimal places.

Answer Verification Code Python

def solve_131() -> str:
    """Symbolic differential equation  phase portrait analysis  stability classification.

    Solve a system of 2 ODEs symbolically. Classify equilibrium points.
    Compute the Jacobian at each equilibrium. If any eigenvalue has
    positive real part, compute the unstable manifold direction.
    Return sum of real parts of all eigenvalues + number of equilibria +
    trace of the Jacobian at the origin.
    """
    import sympy as sp

    x, y = sp.symbols("x y")

    # System: dx/dt = x^2 - y - 1, dy/dt = x - y^2 + 1
    f1 = x**2 - y - 1
    f2 = x - y**2 + 1

    # Step 1: Find equilibrium points (f1 = 0 and f2 = 0)
    equilibria = sp.solve([f1, f2], [x, y])

    # Step 2: Jacobian matrix
    J = sp.Matrix([
        [sp.diff(f1, x), sp.diff(f1, y)],
        [sp.diff(f2, x), sp.diff(f2, y)],
    ])

    # Step 3: Analyze each equilibrium
    total_real_parts = 0.0
    n_equilibria = 0
    trace_at_origin = float(J.subs([(x, 0), (y, 0)]).trace())  # type: ignore[arg-type]
    has_unstable = False

    for eq in equilibria:
        if isinstance(eq, tuple):
            x_val, y_val = eq
        else:
            continue

        # Check if equilibrium is real
        x_n = complex(sp.N(x_val))
        y_n = complex(sp.N(y_val))
        if abs(x_n.imag) > 1e-6 or abs(y_n.imag) > 1e-6:
            continue

        n_equilibria += 1

        # Evaluate Jacobian at this equilibrium
        J_eq = J.subs([(x, x_val), (y, y_val)])
        eigenvalues = J_eq.eigenvals()

        for ev, mult in eigenvalues.items():
            ev_n = complex(sp.N(ev))
            total_real_parts += ev_n.real * mult
            if ev_n.real > 0:
                has_unstable = True

    # Step 4: Conditional — if unstable equilibrium exists
    if has_unstable:
        # Compute the determinant of the Jacobian at the first real equilibrium
        for eq in equilibria:
            if isinstance(eq, tuple):
                x_val, y_val = eq
                x_n = complex(sp.N(x_val))
                y_n = complex(sp.N(y_val))
                if abs(x_n.imag) < 1e-6 and abs(y_n.imag) < 1e-6:
                    J_eq = J.subs([(x, x_val), (y, y_val)])
                    det_J = float(sp.N(J_eq.det()))
                    total_real_parts += abs(det_J) * 0.01
                    break

    result = total_real_parts + n_equilibria + trace_at_origin
    return str(round(result, 4))
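The equilibria can also be found by hand: substituting y = x² − 1 from f1 = 0 into f2 = 0 gives x(x³ − 2x − 1) = 0, and x³ − 2x − 1 factors as (x + 1)(x² − x − 1), so all four equilibria are real. A numeric sketch of that reduction:

```python
import numpy as np

# From f1 = 0: y = x^2 - 1. Substituting into f2 = 0 gives
# x - (x^2 - 1)^2 + 1 = 0  =>  x * (x^3 - 2x - 1) = 0,
# and x^3 - 2x - 1 = (x + 1)(x^2 - x - 1).
cubic = np.roots([1.0, 0.0, -2.0, -1.0]).astype(complex)
x_eq = [0.0] + [r.real for r in cubic if abs(r.imag) < 1e-9]
y_eq = [x**2 - 1 for x in x_eq]        # each equilibrium is (x, x^2 - 1)
n_real_equilibria = len(x_eq)
```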
Correct Answer 4.01
CoT Answer 4.11
23
ID: 132 Difficulty: 8 Best: GPT-OSS-120b

Question

Generate 60 deterministic 3D points in 3 clusters. Each cluster has 20 points indexed 0-19. Cluster 1 center (2,3,1): x=2+0.5*sin(i*0.5), y=3+0.3*cos(i*0.4), z=1+0.4*sin(i*0.3+1). Cluster 2 center (8,2,5): x=8+0.6*cos(i*0.6), y=2+0.4*sin(i*0.5), z=5+0.3*cos(i*0.4+2). Cluster 3 center (5,8,3): x=5+0.4*sin(i*0.4+0.5), y=8+0.5*cos(i*0.3), z=3+0.35*sin(i*0.6+1.5). The data represents mineral compositions from 3 geological formations. Run K-means for k=2,3,4,5,6 (random_state=42, n_init=10). Compute silhouette scores. The survey covered an area of 4,700 square kilometers. Compute gap statistic for each k using 10 deterministic uniform reference sets (generated via formula: ref_data[d] = min[d] + range[d] * ((arange(60)*(ref+1)*7 + d*13) % 60)/60). The drilling cost averaged $180 per meter. If silhouette and gap agree on optimal k, use it; otherwise use elbow method (largest inertia drop). Return optimal_k + max_silhouette + gap_at_optimal_k, rounded to 4 decimal places.

Answer Verification Code Python

def solve_132() -> str:
    """K-means  silhouette analysis  gap statistic  optimal k selection.

    Generate 60 deterministic 3D data points in 3 natural clusters.
    Run K-means for k=2..6. Compute silhouette scores. Compute gap
    statistic using uniform reference. If gap and silhouette agree on
    optimal k, use that; otherwise use elbow method on inertia.
    Return optimal_k + max_silhouette + gap_at_optimal_k.
    """
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    # Generate 60 deterministic 3D data points in 3 clusters
    n_per_cluster = 20
    indices = np.arange(n_per_cluster, dtype=float)

    # Cluster 1: centered at (2, 3, 1)
    c1 = np.column_stack([
        2 + 0.5 * np.sin(indices * 0.5),
        3 + 0.3 * np.cos(indices * 0.4),
        1 + 0.4 * np.sin(indices * 0.3 + 1),
    ])

    # Cluster 2: centered at (8, 2, 5)
    c2 = np.column_stack([
        8 + 0.6 * np.cos(indices * 0.6),
        2 + 0.4 * np.sin(indices * 0.5),
        5 + 0.3 * np.cos(indices * 0.4 + 2),
    ])

    # Cluster 3: centered at (5, 8, 3)
    c3 = np.column_stack([
        5 + 0.4 * np.sin(indices * 0.4 + 0.5),
        8 + 0.5 * np.cos(indices * 0.3),
        3 + 0.35 * np.sin(indices * 0.6 + 1.5),
    ])

    X = np.vstack([c1, c2, c3])

    # Step 1: Run K-means for k=2..6
    silhouette_scores: list[float] = []
    inertias: list[float] = []
    log_inertias: list[float] = []

    for k in range(2, 7):
        km = KMeans(n_clusters=k, random_state=42, n_init=10)
        labels = km.fit_predict(X)
        sil = float(silhouette_score(X, labels))
        silhouette_scores.append(sil)
        inertias.append(float(km.inertia_))
        log_inertias.append(float(np.log(km.inertia_)))

    # Step 2: Compute gap statistic using uniform reference
    n_ref = 10
    data_range = X.max(axis=0) - X.min(axis=0)
    data_min = X.min(axis=0)

    gap_stats: list[float] = []
    for ki, k in enumerate(range(2, 7)):
        ref_log_inertias: list[float] = []
        for ref in range(n_ref):
            # Deterministic uniform reference using formula instead of RNG
            ref_data = np.zeros_like(X)
            for d in range(X.shape[1]):
                # Deterministic "uniform" points using sinusoidal spacing
                ref_points = data_min[d] + data_range[d] * (
                    (np.arange(len(X)) * (ref + 1) * 7 + d * 13) % len(X)
                ) / len(X)
                ref_data[:, d] = ref_points

            km_ref = KMeans(n_clusters=k, random_state=42, n_init=10).fit(ref_data)
            ref_log_inertias.append(float(np.log(km_ref.inertia_)))

        gap = float(np.mean(ref_log_inertias)) - log_inertias[ki]
        gap_stats.append(gap)

    # Step 3: Optimal k by silhouette
    k_sil = int(np.argmax(silhouette_scores)) + 2
    max_sil = max(silhouette_scores)

    # Step 4: Optimal k by gap statistic
    k_gap = int(np.argmax(gap_stats)) + 2

    # Step 5: Conditional — if they agree, use that k; otherwise elbow method
    if k_sil == k_gap:
        optimal_k = k_sil
    else:
        # Elbow: largest drop in inertia
        drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
        optimal_k = int(np.argmax(drops)) + 2

    gap_at_optimal = gap_stats[optimal_k - 2]
    result = optimal_k + max_sil + gap_at_optimal
    return str(round(result, 4))
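A quick property check of the deterministic reference formula (shown here for ref = 0 and dimension d = 1): because gcd(7, 60) = 1, the stride map (i·7 + 13) mod 60 is a bijection on 0..59, so each reference column is a full permutation of 60 evenly spaced values in [0, 1) before scaling into the data box.

```python
import numpy as np

n = 60
ref, d = 0, 1                                  # first reference set, second dimension
vals = ((np.arange(n) * (ref + 1) * 7 + d * 13) % n) / n
```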
Correct Answer 6.0239
CoT Answer 6.027
24
ID: 133 Difficulty: 8 Best: GPT-OSS-20b

Question

Solve the 1D heat equation u_t = 0.01 * u_xx on domain [0,1] with u(0,t)=0, u(1,t)=0, u(x,0)=sin(pi*x)+0.5*sin(3*pi*x). Use explicit finite differences with nx=50 spatial points and initial dt=0.0005. The simulation models heat dissipation in a rod made of titanium alloy Ti-6Al-4V. Check CFL stability: r = alpha*dt/dx^2. If r > 0.5, reduce dt to 0.9*0.5*dx^2/alpha. The rod weighs 3.2 kg and is 1 meter long. Run until t_final=0.1. Compute max temperature at final time. Integrate u^2 over [0,1] using Simpson's rule for total energy. The furnace operates at 1,450°C. Compare with analytical solution u_exact = sin(pi*x)*exp(-0.01*pi^2*t) + 0.5*sin(3*pi*x)*exp(-0.01*9*pi^2*t). Compute relative L2 error. If error > 5%, add 100*error to result. Return max_temp + energy + r [+ 100*error if applicable], rounded to 4 decimal places.

Answer Verification Code Python

def solve_133() -> str:
    """Numerical PDE (heat equation)  finite differences  stability analysis.

    Solve 1D heat equation u_t = alpha * u_xx with deterministic initial
    and boundary conditions. Check CFL stability condition. If unstable,
    reduce time step. Compute solution at final time. Return max temperature
    at final time + total energy + stability parameter.
    """
    import numpy as np

    # Parameters
    L = 1.0          # length
    alpha = 0.01     # thermal diffusivity
    nx = 50          # spatial points
    dx = L / (nx - 1)
    dt = 0.0005      # initial time step
    t_final = 0.1

    # Step 1: CFL stability check: r = alpha * dt / dx^2 must be <= 0.5
    r = alpha * dt / dx**2

    # Step 2: Conditional — if unstable, reduce dt
    if r > 0.5:
        dt = 0.5 * dx**2 / alpha * 0.9  # safe margin
        r = alpha * dt / dx**2

    nt = int(t_final / dt) + 1

    # Step 3: Initial condition: u(x, 0) = sin(pi*x) + 0.5*sin(3*pi*x)
    x = np.linspace(0, L, nx)
    u = np.sin(np.pi * x) + 0.5 * np.sin(3 * np.pi * x)

    # Step 4: Boundary conditions: u(0,t) = 0, u(L,t) = 0
    # Step 5: Time stepping (explicit finite difference)
    for _ in range(nt):
        u_new = u.copy()
        for i in range(1, nx - 1):
            u_new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
        u_new[0] = 0.0
        u_new[-1] = 0.0
        u = u_new

    # Step 6: Compute metrics
    max_temp = float(np.max(u))

    # Total energy (integral of u^2)
    from scipy.integrate import simpson
    energy = float(simpson(u**2, x=x))

    # Analytical solution for comparison
    # u(x,t) = sin(pi*x)*exp(-alpha*pi^2*t) + 0.5*sin(3*pi*x)*exp(-alpha*9*pi^2*t)
    u_analytical = (np.sin(np.pi * x) * np.exp(-alpha * np.pi**2 * t_final)
                    + 0.5 * np.sin(3 * np.pi * x) * np.exp(-alpha * 9 * np.pi**2 * t_final))

    # Relative error
    rel_error = float(np.linalg.norm(u - u_analytical)) / float(np.linalg.norm(u_analytical))

    # Step 7: Conditional — if error > 5%, report reduced accuracy
    if rel_error > 0.05:
        result = max_temp + energy + r + rel_error * 100
    else:
        result = max_temp + energy + r

    return str(round(result, 4))
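With the stated parameters the CFL branch never fires: dx = 1/49, so r = alpha·dt/dx² ≈ 0.012, far below the 0.5 stability limit, and dt is left unchanged. A sketch of the arithmetic:

```python
alpha = 0.01                 # thermal diffusivity
nx = 50
dt = 0.0005
dx = 1.0 / (nx - 1)          # 1/49
r = alpha * dt / dx**2       # = 5e-6 * 49^2 = 0.012005
```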
Correct Answer 1.6398
CoT Answer 1.6396
25
ID: 134 Difficulty: 8 Best: GPT-OSS-120b

Question

Fit OLS regression on 25 data points: x=linspace(1,25,25), y=2*x+5+noise where noise=[0.5,-0.3,0.8,-0.1,0.4,-0.6,0.2,-0.5,0.7,-0.2,0.3,-0.4,0.6,-0.1,0.5,-0.3,0.4,-0.7,0.1,-0.4,0.6,-0.2,0.3,-0.5,0.8], except y[12] is overridden to 2*x[12]+5+15 (influential outlier). The data was collected from 25 sampling stations along a 450 km river. Compute Cook's distance for all points. Apply threshold = 4/n. Count influential points. The survey team used 7 boats over 12 weeks. If any influential points exist, remove them and refit OLS. Also compute DFFITS for the original model. The river basin covers 87,000 square kilometers. Apply DFFITS threshold = 2*sqrt(2/n). Count DFFITS violations. Return R²_after_removal + max(Cook's_d) + n_influential + 0.1*n_DFFITS_violations, rounded to 4 decimal places.

Answer Verification Code Python

def solve_134() -> str:
    """Regression diagnostics  Cook's distance  influential point removal  refit.

    Fit OLS on 25 deterministic data points. Compute Cook's distance.
    If any point has Cook's d > 4/n, remove it and refit. Compare
    R-squared before and after removal. Also compute DFFITS for each
    point. Return R²_after + max(Cook's_d) + number of influential points.
    """
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import OLSInfluence

    # Deterministic data with 1 influential point
    n = 25
    x = np.linspace(1, 25, n)
    y = 2.0 * x + 5.0 + np.array([
        0.5, -0.3, 0.8, -0.1, 0.4, -0.6, 0.2, -0.5, 0.7, -0.2,
        0.3, -0.4, 0.6, -0.1, 0.5, -0.3, 0.4, -0.7, 0.1, -0.4,
        0.6, -0.2, 0.3, -0.5, 0.8
    ])
    # Add influential point: modify point 12 significantly
    y[12] = 2.0 * x[12] + 5.0 + 15.0  # large outlier

    # Step 1: Fit OLS
    X = sm.add_constant(x.reshape(-1, 1))
    model1 = sm.OLS(y, X).fit()
    r2_before = float(model1.rsquared)

    # Step 2: Compute Cook's distance
    influence = OLSInfluence(model1)
    cooks_d = influence.cooks_distance[0]
    max_cooks = float(np.max(cooks_d))

    # Step 3: Threshold = 4/n
    threshold = 4.0 / n
    influential_mask = cooks_d > threshold
    n_influential = int(np.sum(influential_mask))

    # Step 4: Conditional — if influential points exist, remove and refit
    if n_influential > 0:
        keep_mask = ~influential_mask
        X_clean = X[keep_mask]
        y_clean = y[keep_mask]
        model2 = sm.OLS(y_clean, X_clean).fit()
        r2_after = float(model2.rsquared)
    else:
        r2_after = r2_before

    # Step 5: Compute DFFITS
    dffits = influence.dffits[0]

    # Step 6: Conditional — if DFFITS threshold exceeded, add penalty
    dffits_threshold = 2 * np.sqrt(2.0 / n)  # 2*sqrt(p/n) for p=2
    dffits_violations = int(np.sum(np.abs(dffits) > dffits_threshold))

    result = r2_after + max_cooks + n_influential + dffits_violations * 0.1
    return str(round(result, 4))
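Cook's distance can also be computed from first principles with the hat matrix, D_i = e_i² h_ii / (p s² (1 − h_ii)²). A self-contained sketch on a small hypothetical dataset (not the card's river data) with one planted outlier:

```python
import numpy as np

# Hypothetical 6-point line with a planted outlier at index 3
x = np.arange(6, dtype=float)
y = 2.0 * x + np.array([0.1, -0.1, 0.05, 5.0, -0.05, 0.1])
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages (hat-matrix diagonal)
p = X.shape[1]
s2 = resid @ resid / (len(y) - p)               # residual variance estimate
cooks_d = resid**2 * h / (p * s2 * (1 - h)**2)
```

The planted outlier produces by far the largest Cook's distance, mirroring how `OLSInfluence.cooks_distance` flags point 12 in the card.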
Correct Answer 2.5672
CoT Answer 2.5672
26
ID: 135 Difficulty: 8 Best: GPT-OSS-20b

Question

Consider transfer function G(s) = 10/(s^3+6s^2+11s+6) = 10/((s+1)(s+2)(s+3)). The system models a chemical reactor with a volume of 2,500 liters. Find all poles. Compute the frequency response at 500 log-spaced frequencies from 0.01 to 100 rad/s. Convert magnitude to dB. Find the gain crossover frequency (where |G(jw)| = 0 dB). The reactor operates at 185°C and 3.4 atm pressure. Compute phase at the gain crossover → phase margin = 180 + phase_at_crossover. Find the -3dB bandwidth relative to DC gain. The control system was designed by a team of 11 engineers. If phase margin < 30°, add a lead compensator C(s) = (s+1)/(s+10) in series and recompute frequency response, crossover, and phase margin. Return a single scalar S = phase_margin + gain_at_crossover + bandwidth, rounded to 4 decimal places, where gain_at_crossover is the linear magnitude |G(jw_c)| (not dB) at the selected crossover index.

Answer Verification Code Python

def solve_135() -> str:
    """Symbolic Laplace transform  transfer function  Bode analysis  gain margin.

    Compute Laplace transform of a function. Derive transfer function.
    Compute magnitude and phase at multiple frequencies. Find gain
    crossover and phase margin. If phase margin < 30°, add compensator
    and recompute. Return phase margin + gain at crossover + bandwidth.
    """
    import numpy as np
    import sympy as sp

    s = sp.Symbol("s")

    # Transfer function: G(s) = 10 / (s^3 + 6s^2 + 11s + 6)
    # = 10 / ((s+1)(s+2)(s+3))
    den = s**3 + 6 * s**2 + 11 * s + 6

    # Step 1: Find poles
    _poles = sp.solve(den, s)  # noqa: F841

    # Step 2: Frequency response at many frequencies
    freqs = np.logspace(-2, 2, 500)
    magnitude = np.zeros(len(freqs))
    phase = np.zeros(len(freqs))

    for idx, w in enumerate(freqs):
        jw = complex(0, w)
        G_jw = 10.0 / (jw**3 + 6 * jw**2 + 11 * jw + 6)
        magnitude[idx] = abs(G_jw)
        phase[idx] = np.degrees(np.arctan2(G_jw.imag, G_jw.real))

    # Convert magnitude to dB
    mag_db = 20 * np.log10(magnitude)

    # Step 3: Find gain crossover frequency (|G(jw)| = 1, i.e., 0 dB)
    crossover_idx = np.argmin(np.abs(mag_db))
    phase_at_crossover = phase[crossover_idx]
    phase_margin = 180 + phase_at_crossover

    # Step 4: Bandwidth (-3 dB frequency)
    dc_gain_db = mag_db[0]
    bw_idx = np.argmin(np.abs(mag_db - (dc_gain_db - 3)))
    bandwidth = freqs[bw_idx]

    # Step 5: Conditional — if phase margin < 30°, add lead compensator
    if phase_margin < 30:
        # Lead compensator: C(s) = (s + z)/(s + p) with p > z; no extra gain
        # term is applied (matches the question's C(s) = (s+1)/(s+10))
        comp_z = 1.0
        comp_p = 10.0
        for idx, w in enumerate(freqs):
            jw = complex(0, w)
            G_jw = 10.0 / (jw**3 + 6 * jw**2 + 11 * jw + 6)
            C_jw = (jw + comp_z) / (jw + comp_p)
            GC_jw = G_jw * C_jw
            magnitude[idx] = abs(GC_jw)
            phase[idx] = np.degrees(np.arctan2(GC_jw.imag, GC_jw.real))

        mag_db = 20 * np.log10(magnitude)
        crossover_idx = np.argmin(np.abs(mag_db))
        phase_at_crossover = phase[crossover_idx]
        phase_margin = 180 + phase_at_crossover
        bw_idx = np.argmin(np.abs(mag_db - (mag_db[0] - 3)))
        bandwidth = freqs[bw_idx]

    gain_at_crossover = float(magnitude[crossover_idx])

    result = phase_margin + gain_at_crossover + bandwidth
    return str(round(result, 4))
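For this G(s) the gain crossover can be pinned down exactly: |G(jw)|² = 100/((6 − 6w²)² + (11w − w³)²), the denominator is strictly increasing in w and equals 100 at w = 1, so w_c = 1 rad/s, G(j1) = −j, and the phase margin is 90°. A sketch confirming this with a root solve instead of the card's 500-point grid:

```python
import numpy as np
from scipy.optimize import brentq

def mag(w: float) -> float:
    jw = 1j * w
    return abs(10.0 / (jw**3 + 6 * jw**2 + 11 * jw + 6))

# |G(j0)| = 10/6 > 1 and |G| is strictly decreasing, so one 0 dB crossing
w_c = brentq(lambda w: mag(w) - 1.0, 0.01, 100.0)
G_c = 10.0 / ((1j * w_c)**3 + 6 * (1j * w_c)**2 + 11 * (1j * w_c) + 6)
pm = 180.0 + np.degrees(np.angle(G_c))
```

The card's grid-based estimate lands near these exact values; the residual difference comes from the finite 500-point frequency grid.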
Correct Answer 92.4202
CoT Answer 91.8013
27
ID: 136 Difficulty: 8 Best: GPT-OSS-20b

Question

Fifteen alternatives are evaluated on 3 objectives (maximize obj1, maximize obj2, minimize obj3). Decision matrix rows: [80,70,15],[75,85,20],[90,60,10],[65,90,25],[85,75,12],[70,80,18],[95,55,8],[60,95,30],[88,72,14],[72,88,22],[82,78,16],[78,82,19],[92,65,11],[68,92,28],[86,74,13]. The evaluation committee had 9 members with a combined 187 years of experience. Normalize using vector normalization. Apply weights [0.4, 0.35, 0.25]. Compute TOPSIS: ideal and anti-ideal solutions, distances, and closeness scores. Rank all alternatives. The project budget was $15.4 million. Find the Pareto front (non-dominated alternatives considering directions). Count Pareto front size. The project timeline was 36 months. If the TOPSIS winner dominates all others on at least 2 objectives, it wins; otherwise use weighted-sum method (weights [0.4, 0.35, -0.25]) to select. Return a single scalar S = TOPSIS_score_of_top_ranked_alternative + Pareto_front_size + winner_index, rounded to 4 decimal places. Use 0-based indexing for winner_index.

Answer Verification Code Python

def solve_136() -> str:
    """Multi-objective optimization  Pareto front  TOPSIS ranking.

    Define 3 objectives for 15 deterministic alternatives. Normalize
    the decision matrix. Compute ideal and anti-ideal solutions. Rank
    using TOPSIS. If the top-ranked alternative dominates all others
    on at least 2 objectives, it's the winner; otherwise use weighted
    sum. Return TOPSIS score of winner + Pareto front size + rank index.
    """
    import numpy as np

    # 15 alternatives, 3 objectives (maximize obj1, obj2; minimize obj3)
    # Deterministic decision matrix
    alternatives = np.array([
        [80, 70, 15],
        [75, 85, 20],
        [90, 60, 10],
        [65, 90, 25],
        [85, 75, 12],
        [70, 80, 18],
        [95, 55, 8],
        [60, 95, 30],
        [88, 72, 14],
        [72, 88, 22],
        [82, 78, 16],
        [78, 82, 19],
        [92, 65, 11],
        [68, 92, 28],
        [86, 74, 13],
    ], dtype=float)

    n_alt, n_obj = alternatives.shape
    # Objective directions: max, max, min
    directions = [1, 1, -1]  # 1 for max, -1 for min

    # Step 1: Normalize the decision matrix
    norm_factors = np.sqrt(np.sum(alternatives**2, axis=0))
    norm_matrix = alternatives / norm_factors

    # Step 2: Apply direction (flip min objectives)
    for j in range(n_obj):
        if directions[j] == -1:
            norm_matrix[:, j] = -norm_matrix[:, j]

    # Step 3: Apply the specified weights [0.4, 0.35, 0.25]
    weights = np.array([0.4, 0.35, 0.25])
    weighted = norm_matrix * weights

    # Step 4: Ideal and anti-ideal solutions
    ideal = np.max(weighted, axis=0)
    anti_ideal = np.min(weighted, axis=0)

    # Step 5: Distance to ideal and anti-ideal
    d_plus = np.sqrt(np.sum((weighted - ideal)**2, axis=1))
    d_minus = np.sqrt(np.sum((weighted - anti_ideal)**2, axis=1))

    # Step 6: TOPSIS scores
    topsis_scores = d_minus / (d_plus + d_minus)
    ranking = np.argsort(-topsis_scores)
    top_idx = ranking[0]
    top_score = float(topsis_scores[top_idx])

    # Step 7: Find Pareto front (non-dominated solutions)
    pareto_front: list[int] = []
    for i in range(n_alt):
        dominated = False
        for j in range(n_alt):
            if i == j:
                continue
            # j dominates i if j is >= i on all objectives and > on at least one
            # (using original values with directions)
            better_all = True
            strictly_better = False
            for k in range(n_obj):
                val_i = alternatives[i, k] * directions[k]
                val_j = alternatives[j, k] * directions[k]
                if val_j < val_i:
                    better_all = False
                    break
                if val_j > val_i:
                    strictly_better = True
            if better_all and strictly_better:
                dominated = True
                break
        if not dominated:
            pareto_front.append(i)

    pareto_size = len(pareto_front)

    # Step 8: Conditional — is top-ranked at least as good as every
    # other alternative on 2+ objectives?
    dom_count = 0
    for k in range(n_obj):
        val_top = alternatives[top_idx, k] * directions[k]
        better_than_all = all(
            val_top >= alternatives[j, k] * directions[k]
            for j in range(n_alt) if j != top_idx
        )
        if better_than_all:
            dom_count += 1

    if dom_count >= 2:
        winner_idx = top_idx
    else:
        # Weighted sum method
        ws_scores = np.sum(alternatives * np.array([0.4, 0.35, -0.25]), axis=1)
        winner_idx = int(np.argmax(ws_scores))

    result = top_score + pareto_size + winner_idx
    return str(round(result, 4))
Correct Answer 18.7127
CoT Answer 20.6
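The vector-normalization and closeness-score steps from this card generalize beyond the benchmark's 15×3 matrix. A minimal standalone sketch on a hypothetical 3-alternative, 2-objective instance (both objectives maximized; data purely illustrative):

```python
import numpy as np

# Hypothetical 3-alternative, 2-objective decision matrix; both
# objectives are maximized, so no sign flip is needed.
M = np.array([[4.0, 3.0],
              [3.0, 4.0],
              [5.0, 1.0]])
w = np.array([0.6, 0.4])

# Vector normalization: divide each column by its Euclidean norm,
# then apply the weights.
V = M / np.sqrt((M ** 2).sum(axis=0)) * w

# Ideal / anti-ideal solutions are the per-column max / min.
ideal, anti = V.max(axis=0), V.min(axis=0)

# Euclidean distances to both, then the TOPSIS closeness score.
d_plus = np.sqrt(((V - ideal) ** 2).sum(axis=1))
d_minus = np.sqrt(((V - anti) ** 2).sum(axis=1))
closeness = d_minus / (d_plus + d_minus)
best = int(np.argmax(closeness))
```

Closeness always lies in [0, 1]: an alternative sitting exactly on the ideal point scores 1, one on the anti-ideal point scores 0.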
28
ID: 137 Difficulty: 8 Best: GPT-OSS-20b

Question

Data points: x=[0,0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0], y=[1.0,2.3,1.8,3.5,2.1,4.2,3.0,5.1,2.8,4.5,3.2]. The data was recorded by a 14-channel DAQ system sampling at 500 kHz. Fit cubic spline interpolation. Evaluate spline and derivatives on a 100-point fine grid over [0,5]. Compute curvature kappa = |y''|/(1+y'^2)^(3/2) at each point and find max curvature. The DAQ system cost $67,000. Integrate the curvature values using Simpson's rule (scipy.integrate.simpson) on the 100-point grid. To check convergence, also integrate on a 200-point grid; if the absolute difference exceeds 1e-6, count a refinement and also compute on a 400-point grid. If the 400-vs-200 difference also exceeds 1e-6, count another refinement. Use the finest computed grid's integral. The signal-to-noise ratio was approximately 42 dB. Also compute the arc length of the spline curve by integrating sqrt(1+y'^2) using Simpson's rule on a 200-point grid over [0,5]. Return total_curvature_integral + max_curvature + n_refinements + 0.1*arc_length, rounded to 4 decimal places.

Answer Verification Code Python

def solve_137() -> str:
    """Interpolation  numerical differentiation  grid-based quadrature  convergence check.

    Build a complex function from data. Interpolate with cubic spline.
    Compute first and second derivatives numerically. Use Simpson's rule
    on fine grids for integration. If successive grid refinements change
    the result beyond tolerance, refine and recompute. Return integral
    value + max curvature + number of grid refinements + 0.1*arc_length.
    """
    import numpy as np
    from scipy.interpolate import CubicSpline
    from scipy.integrate import simpson

    # Deterministic data points for a complex curve
    x_data = np.array([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
    y_data = np.array([1.0, 2.3, 1.8, 3.5, 2.1, 4.2, 3.0, 5.1, 2.8, 4.5, 3.2])

    # Step 1: Cubic spline interpolation
    cs = CubicSpline(x_data, y_data)

    # Step 2: Evaluate on a fine grid
    n_grid = 100
    x_fine = np.linspace(0, 5, n_grid)
    dy_fine = cs(x_fine, 1)
    ddy_fine = cs(x_fine, 2)

    # Step 3: Curvature at each point: kappa = |y''| / (1 + y'^2)^(3/2)
    curvature = np.abs(ddy_fine) / (1 + dy_fine**2)**1.5
    max_curvature = float(np.max(curvature))

    # Step 4: Integrate curvature using Simpson's rule on the 100-point grid
    integral_100 = float(simpson(curvature, x=x_fine))

    # Step 5: Convergence check — compare successive grid refinements
    n_refinements = 0
    tolerance = 1e-6

    x_200 = np.linspace(0, 5, 200)
    curvature_200 = np.abs(cs(x_200, 2)) / (1 + cs(x_200, 1)**2)**1.5
    integral_200 = float(simpson(curvature_200, x=x_200))

    if abs(integral_200 - integral_100) > tolerance:
        n_refinements = 1
        x_400 = np.linspace(0, 5, 400)
        curvature_400 = np.abs(cs(x_400, 2)) / (1 + cs(x_400, 1)**2)**1.5
        integral_400 = float(simpson(curvature_400, x=x_400))

        if abs(integral_400 - integral_200) > tolerance:
            n_refinements = 2
        integral_val = integral_400
    else:
        integral_val = integral_200

    # Step 6: Arc length using Simpson's rule on a 200-point grid
    x_arc = np.linspace(0, 5, 200)
    arc_integrand = np.sqrt(1 + cs(x_arc, 1)**2)
    arc_length = float(simpson(arc_integrand, x=x_arc))

    result = integral_val + max_curvature + n_refinements + arc_length * 0.1
    return str(round(result, 4))
Correct Answer 56.4112
CoT Answer 56.2037
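The grid-refinement convergence check in solve_137 can be exercised on an integrand with a known value, e.g. ∫₀^π sin(x) dx = 2. A sketch of the same refinement logic, not the card's spline (odd point counts are used so each grid has an even number of Simpson intervals):

```python
import numpy as np
from scipy.integrate import simpson

def integrate_on(n_points):
    """Composite Simpson's rule for sin(x) on [0, pi] with n_points samples."""
    x = np.linspace(0.0, np.pi, n_points)
    return float(simpson(np.sin(x), x=x))

tol = 1e-6
n_refinements = 0
coarse, fine = integrate_on(101), integrate_on(201)
if abs(fine - coarse) > tol:
    n_refinements += 1
    coarse, fine = fine, integrate_on(401)
    if abs(fine - coarse) > tol:
        n_refinements += 1
value = fine  # the finest computed grid wins
```

For a smooth integrand Simpson's error falls as O(h⁴), so here the 101-vs-201 difference is already around 10⁻⁸ and no refinement triggers; the card's curvature integrand is rougher, which is what makes its refinement branch interesting.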
29
ID: 138 Difficulty: 8 Best: GPT-OSS-120b

Question

Schedule 20 jobs on 4 machines. Jobs (duration, deadline, priority): (3,10,2),(5,15,3),(2,8,1),(4,12,2),(6,20,4),(1,5,1),(3,14,3),(7,25,5),(2,9,2),(4,16,3),(5,18,4),(3,11,2),(6,22,4),(2,7,1),(4,13,3),(1,6,1),(5,19,3),(3,10,2),(7,24,5),(2,8,1). The factory floor spans 12,000 square meters. Apply LPT (Longest Processing Time) greedy assignment: sort by decreasing duration, assign each to least-loaded machine. Compute greedy makespan. Within each machine, sort by EDF (Earliest Deadline First). The factory has 245 employees across 3 shifts. Count late jobs (finish_time > deadline). Solve LP relaxation: minimize C_max subject to x[i,m]>=0, sum_m(x[i,m])=1, sum_i(duration_i * x[i,m]) <= C_max for each machine. The plant manager has 18 years of experience. If LP makespan < 80% of greedy makespan, re-assign using LP-guided rounding (assign each job to its highest-fraction machine, checking capacity <= 1.2*LP_makespan). Compute machine utilization = total_duration/(n_machines*makespan). Return makespan + late_jobs + utilization, rounded to 4 decimal places.

Answer Verification Code Python

def solve_138() -> str:
    """Integer programming  bin packing  schedule optimization.

    20 jobs with deterministic durations and deadlines across 4 machines.
    Minimize makespan using LP relaxation. If fractional solution exists,
    round and verify feasibility. If infeasible after rounding, relax
    one deadline constraint. Return makespan + number of late jobs +
    machine utilization ratio.
    """
    import numpy as np
    from scipy.optimize import linprog

    # 20 jobs: (duration, deadline, priority)
    jobs = [
        (3, 10, 2), (5, 15, 3), (2, 8, 1), (4, 12, 2), (6, 20, 4),
        (1, 5, 1), (3, 14, 3), (7, 25, 5), (2, 9, 2), (4, 16, 3),
        (5, 18, 4), (3, 11, 2), (6, 22, 4), (2, 7, 1), (4, 13, 3),
        (1, 6, 1), (5, 19, 3), (3, 10, 2), (7, 24, 5), (2, 8, 1),
    ]
    n_jobs = len(jobs)
    n_machines = 4

    # Step 1: Greedy assignment (Longest Processing Time first)
    sorted_jobs = sorted(enumerate(jobs), key=lambda x: -x[1][0])
    machine_loads = [0.0] * n_machines
    assignment: list[int] = [0] * n_jobs

    for job_idx, (duration, _deadline, _priority) in sorted_jobs:
        # Assign to least loaded machine
        min_machine = int(np.argmin(machine_loads))
        assignment[job_idx] = min_machine
        machine_loads[min_machine] += duration

    makespan_greedy = max(machine_loads)

    # Step 2: Check which jobs are late
    # Compute start times (simple sequential on each machine)
    start_times = [0.0] * n_jobs
    # Process in priority order within each machine
    for m in range(n_machines):
        jobs_on_m = [(i, jobs[i]) for i in range(n_jobs) if assignment[i] == m]
        jobs_on_m.sort(key=lambda x: x[1][1])  # by deadline (EDF)
        current_time = 0.0
        for idx, (duration, _deadline, _priority) in jobs_on_m:
            start_times[idx] = current_time
            current_time += duration

    late_jobs = 0
    for i in range(n_jobs):
        finish_time = start_times[i] + jobs[i][0]
        if finish_time > jobs[i][1]:
            late_jobs += 1

    # Step 3: LP relaxation for lower bound on makespan
    # Minimize C_max subject to sum of durations on each machine <= C_max
    # Variables: x[i,m] = fraction of job i on machine m, plus C_max
    n_vars = n_jobs * n_machines + 1  # last var is C_max
    c = np.zeros(n_vars)
    c[-1] = 1.0  # minimize C_max

    # Constraints:
    A_ub_list: list[list[float]] = []
    b_ub_list: list[float] = []

    # For each machine: sum(duration_i * x[i,m]) <= C_max
    for m in range(n_machines):
        row = [0.0] * n_vars
        for i in range(n_jobs):
            row[i * n_machines + m] = jobs[i][0]
        row[-1] = -1.0  # -C_max
        A_ub_list.append(row)
        b_ub_list.append(0.0)

    # Equality: sum over machines for each job = 1
    A_eq_list: list[list[float]] = []
    b_eq_list: list[float] = []
    for i in range(n_jobs):
        row = [0.0] * n_vars
        for m in range(n_machines):
            row[i * n_machines + m] = 1.0
        A_eq_list.append(row)
        b_eq_list.append(1.0)

    bounds_lp = [(0.0, 1.0)] * (n_jobs * n_machines) + [(0.0, None)]
    res = linprog(c, A_ub=A_ub_list, b_ub=b_ub_list,
                  A_eq=A_eq_list, b_eq=b_eq_list,
                  bounds=bounds_lp, method="highs")

    if res.success and res.x is not None:
        lp_makespan = float(res.x[-1])
    else:
        lp_makespan = makespan_greedy

    # Step 4: Conditional — if LP bound is much lower, try to improve
    if lp_makespan < makespan_greedy * 0.8 and res.x is not None:
        # Re-assign using LP-guided rounding
        # Use the LP solution to guide assignment
        x_sol = res.x[:-1].reshape(n_jobs, n_machines)
        machine_loads_new = [0.0] * n_machines
        for i in range(n_jobs):
            best_m = int(np.argmax(x_sol[i]))
            # But check if it overloads
            if machine_loads_new[best_m] + jobs[i][0] <= lp_makespan * 1.2:
                assignment[i] = best_m
            else:
                min_m = int(np.argmin(machine_loads_new))
                assignment[i] = min_m
            machine_loads_new[assignment[i]] += jobs[i][0]
        makespan_final = max(machine_loads_new)
    else:
        makespan_final = makespan_greedy

    # Step 5: Machine utilization
    total_duration = sum(j[0] for j in jobs)
    utilization = total_duration / (n_machines * makespan_final)

    result = makespan_final + late_jobs + utilization
    return str(round(result, 4))
Correct Answer 19.9868
CoT Answer 19.9868
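The LPT step and the LP lower bound from solve_138 can be seen on a toy instance (hypothetical durations, not the card's jobs). Because the LP relaxation lets jobs split fractionally across machines, its optimal makespan is simply total work divided by machine count:

```python
import numpy as np

durations = [7, 6, 5, 5, 4, 3, 2]   # hypothetical jobs
n_machines = 3

# LPT: sort by decreasing duration, put each job on the least-loaded machine.
loads = [0.0] * n_machines
for d in sorted(durations, reverse=True):
    loads[int(np.argmin(loads))] += d
makespan_greedy = max(loads)

# LP relaxation bound: fractional splitting spreads work evenly.
lp_bound = sum(durations) / n_machines
```

On this instance the greedy makespan comes out to 12 against an LP bound of 32/3 ≈ 10.67, consistent with Graham's (4/3 − 1/(3m))·OPT guarantee for LPT.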
30
ID: 139 Difficulty: 8 Best: GPT-OSS-120b

Question

Build a deterministic 50x6 feature matrix: for sample t (1-50), x1=sin(t*0.2)*3+t*0.1, x2=0.85*x1+cos(t*0.3)*0.5 (collinear with x1), x3=log(t+1)*2, x4=0.7*x3+sin(t*0.4)*0.8 (collinear with x3), x5=(t mod 5)*2-4, x6=sqrt(t)*1.5. The dataset was compiled from 50 test flights of a drone weighing 2.8 kg. True coefficients: beta=[2,-1.5,3,-0.5,1,2.5]. y = X*beta + sin(t*0.5)*2. Fit OLS. Compute VIF for each predictor (regress each on the others). The drone's maximum altitude is 4,500 meters. If max VIF > 10, apply ridge regression with 5-fold cross-validation over 100 alphas from 1e-3 to 1e3 (log-spaced). The flight tests were conducted over 14 months. Compute effective VIF reduction ratio (max_VIF_OLS / max_VIF_ridge where VIF_ridge ≈ VIF_OLS / (1 + alpha*VIF_OLS)). Return R²_ridge + optimal_alpha + VIF_reduction_ratio, rounded to 4 decimal places.

Answer Verification Code Python

def solve_139() -> str:
    """Multivariate regression  VIF  ridge regression  cross-validation.

    Build deterministic 50x6 feature matrix with known collinearity.
    Fit OLS. Compute VIF. If max VIF > 10, apply ridge regression
    with cross-validated alpha. Compare OLS vs ridge predictions.
    Return ridge  + optimal alpha + max VIF reduction ratio.
    """
    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import RidgeCV
    from sklearn.metrics import r2_score

    # Deterministic 50x6 feature matrix with collinearity
    n = 50
    t = np.arange(1, n + 1, dtype=float)

    x1 = np.sin(t * 0.2) * 3 + t * 0.1
    x2 = 0.85 * x1 + np.cos(t * 0.3) * 0.5  # correlated with x1
    x3 = np.log(t + 1) * 2
    x4 = 0.7 * x3 + np.sin(t * 0.4) * 0.8   # correlated with x3
    x5 = (t % 5).astype(float) * 2 - 4
    x6 = np.sqrt(t) * 1.5

    X = np.column_stack([x1, x2, x3, x4, x5, x6])
    # True coefficients
    beta_true = np.array([2.0, -1.5, 3.0, -0.5, 1.0, 2.5])
    y = X @ beta_true + np.sin(t * 0.5) * 2  # deterministic noise

    # Step 1: OLS fit
    X_c = sm.add_constant(X)
    ols_model = sm.OLS(y, X_c).fit()
    r2_ols = float(ols_model.rsquared)

    # Step 2: Compute VIF for each predictor
    from sklearn.linear_model import LinearRegression
    vifs: list[float] = []
    for i in range(X.shape[1]):
        mask = [j for j in range(X.shape[1]) if j != i]
        X_others = X[:, mask]
        lr = LinearRegression().fit(X_others, X[:, i])
        r2_i = lr.score(X_others, X[:, i])
        vif = 1.0 / (1.0 - r2_i) if r2_i < 1 else 1e6
        vifs.append(vif)

    max_vif_ols = max(vifs)

    # Step 3: Conditional — if VIF > 10, apply ridge regression
    if max_vif_ols > 10:
        alphas = np.logspace(-3, 3, 100)
        ridge = RidgeCV(alphas=alphas, cv=5).fit(X, y)
        optimal_alpha = float(ridge.alpha_)
        y_pred_ridge = ridge.predict(X)
        r2_ridge = float(r2_score(y, y_pred_ridge))

        # Compute effective VIF after ridge (approximate)
        # Ridge shrinks coefficients, so effective VIF is reduced
        # VIF_ridge ≈ VIF_ols / (1 + alpha * VIF_ols)
        vifs_ridge = [v / (1 + optimal_alpha * v) for v in vifs]
        max_vif_ridge = max(vifs_ridge)
        vif_reduction = max_vif_ols / max_vif_ridge if max_vif_ridge > 0 else 1.0
    else:
        r2_ridge = r2_ols
        optimal_alpha = 0.0
        vif_reduction = 1.0

    result = r2_ridge + optimal_alpha + vif_reduction
    return str(round(result, 4))
Correct Answer 63.6224
CoT Answer 8.5963
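The VIF step in solve_139 is worth isolating: regress each column on the rest and take 1/(1 − R²). A minimal sketch on a hypothetical near-collinear design (names and data are illustrative, not the benchmark's 50×6 matrix):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

t = np.arange(1.0, 31.0)
x1 = np.sin(t * 0.3) + t * 0.05
x2 = 0.9 * x1 + 0.01 * np.cos(t)   # near-duplicate of x1
x3 = np.log(t + 1.0)
X = np.column_stack([x1, x2, x3])

def vif(X, i):
    """VIF_i = 1 / (1 - R^2) from regressing column i on the others."""
    others = np.delete(X, i, axis=1)
    r2 = LinearRegression().fit(others, X[:, i]).score(others, X[:, i])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, i) for i in range(X.shape[1])]
```

The near-duplicate pair drives the VIFs of x1 and x2 far above the usual cutoff of 10, which is exactly the branch condition that triggers ridge regression in the card above.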