.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/02_meta-analyses/16_plot_jackknife_vs_resampled_stability.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_02_meta-analyses_16_plot_jackknife_vs_resampled_stability.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_02_meta-analyses_16_plot_jackknife_vs_resampled_stability.py:

.. _metas_jackknife_vs_resampled_stability:

=======================================================
Stability diagnostics: Jackknife vs. ResampledStability
=======================================================

Once a meta-analysis has been run and a thresholded result is in hand, the
natural next question is: *how much should we trust it?* NiMARE provides two
post-hoc diagnostics that approach this question from complementary angles.

:class:`~nimare.diagnostics.Jackknife` asks *"which studies are responsible
for each significant cluster?"* It loops through every study, refits the
meta-analysis without that study, and reports how much the cluster-level
statistics drop. A high contribution score means the cluster depends heavily
on a single experiment.

:class:`~nimare.diagnostics.ResampledStability` asks *"how reproducibly does
each brain voxel survive thresholding when we perturb the study set?"* It
generates many resampled versions of the dataset, refits the full pipeline on
each, and averages the binary significant/not-significant outcome across
resamples, yielding a voxelwise stability map between 0 (never significant)
and 1 (always significant).

The two diagnostics measure different things and have different computational
costs:

* **Jackknife** is fast (N refits for N studies), cluster-level, and
  study-level: ideal as a first robustness check built into every workflow.
* **ResampledStability** is spatially explicit, voxelwise, and
  policy-flexible: better suited for publication-quality robustness figures
  and large-dataset analyses.

This example runs both on the same ALE result so you can compare their outputs
directly.

.. note::
    For real analyses use ``n_iters ≥ 5000`` for the FWE corrector and
    ``n_resamples ≥ 100`` for :class:`~nimare.diagnostics.ResampledStability`.
    We use small counts here purely for documentation-build speed.

.. GENERATED FROM PYTHON SOURCE LINES 41-43

Imports and constants
-----------------------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 43-63

.. code-block:: Python

    import copy
    import os
    import warnings

    import matplotlib.pyplot as plt
    import pandas as pd
    from nilearn.plotting import plot_stat_map

    from nimare.correct import FWECorrector
    from nimare.diagnostics import Jackknife, ResampledStability
    from nimare.meta.cbma.ale import ALE
    from nimare.nimads import Studyset
    from nimare.utils import get_resource_path

    warnings.filterwarnings("ignore")

    N_ITERS = 50  # increase to ≥5000 for real analyses
    N_RESAMPLES = 20  # increase to ≥100 for real analyses
    RANDOM_STATE = 42

.. GENERATED FROM PYTHON SOURCE LINES 64-74

Load data and fit the baseline ALE meta-analysis
-----------------------------------------------------------------------------

We use the NiMARE pain dataset (21 studies, MNI 2 mm) throughout this example.
Both diagnostics operate on an already-fitted
:class:`~nimare.results.MetaResult`, so we run ALE and apply a Monte Carlo FWE
corrector once and then reuse that result for each diagnostic.

The cluster-level corrected z-map is our primary target image, the one that
determines which voxels are "significant" and therefore which clusters the
diagnostics evaluate.

.. GENERATED FROM PYTHON SOURCE LINES 74-86

.. code-block:: Python

    studyset_file = os.path.join(get_resource_path(), "nidm_pain_studyset.json")
    studyset = Studyset(studyset_file, target="mni152_2mm")
    print(f"Number of studies: {len(studyset.studies)}")

    ale = ALE()
    result = ale.fit(studyset)

    corrector = FWECorrector(method="montecarlo", n_iters=N_ITERS, n_cores=1)
    result = corrector.transform(result)

    TARGET_IMAGE = "z_desc-size_level-cluster_corr-FWE_method-montecarlo"

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Number of studies: 21
    0%| | 0/50 [00:00


.. csv-table::
   :header: "Index", "Cluster ID", "X", "Y", "Z", "Peak Stat", "Cluster Size (mm3)"

   0, "PositiveTail 1", -58.0, -26.0, 22.0, 2.053749, 1584
   1, "PositiveTail 2", -32.0, -62.0, -38.0, 2.053749, 2632
   2, "PositiveTail 3", -34.0, 14.0, 0.0, 2.053749, 1160
   3, "PositiveTail 4", 0.0, 8.0, 46.0, 2.053749, 5320
   4, "PositiveTail 5", 38.0, 8.0, -2.0, 2.053749, 6488
   5, "PositiveTail 6", 54.0, -26.0, 20.0, 2.053749, 1608
   6, "PositiveTail 7", 18.0, -102.0, -4.0, 0.100434, 480
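
The cluster sizes in this table are volumes. As a small worked example, and assuming the 2 mm isotropic grid implied by the ``mni152_2mm`` target (so one voxel occupies 2 × 2 × 2 = 8 mm3), they convert to voxel counts as sketched below. The variable names are illustrative only.

```python
VOXEL_EDGE_MM = 2  # assumed isotropic voxel edge length for the mni152_2mm target
VOXEL_VOLUME_MM3 = VOXEL_EDGE_MM**3  # 8 mm3 per voxel

# Cluster sizes (mm3) copied from the table above.
cluster_sizes_mm3 = {
    "PositiveTail 1": 1584,
    "PositiveTail 2": 2632,
    "PositiveTail 3": 1160,
    "PositiveTail 4": 5320,
    "PositiveTail 5": 6488,
    "PositiveTail 6": 1608,
    "PositiveTail 7": 480,
}

# Integer division is exact here because every size is a multiple of 8.
cluster_sizes_vox = {
    name: size // VOXEL_VOLUME_MM3 for name, size in cluster_sizes_mm3.items()
}

for name, n_vox in cluster_sizes_vox.items():
    print(f"{name}: {n_vox} voxels")
```

Under that assumption the smallest cluster (``PositiveTail 7``) spans 60 voxels and the largest (``PositiveTail 5``) spans 811.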


.. GENERATED FROM PYTHON SOURCE LINES 136-143

Jackknife study-contribution table
``````````````````````````````````````````````````````````````````````````````

Each row is a study; each column is a cluster. Cell values are the mean
proportional contribution of that study to that cluster. High values
(e.g. > 0.8) flag studies whose removal would substantially weaken a cluster.
Such studies are worth inspecting for outlier coordinates, inflated sample
sizes, or duplicate peaks.

.. GENERATED FROM PYTHON SOURCE LINES 143-147

.. code-block:: Python

    contrib_key = f"{TARGET_IMAGE}_diag-Jackknife_tab-counts_tail-positive"
    contrib_df = result_jk.tables.get(contrib_key)
    contrib_df

.. csv-table::
   :header: "Index", "id", "PositiveTail 1", "PositiveTail 2", "PositiveTail 3", "PositiveTail 4", "PositiveTail 5", "PositiveTail 6", "PositiveTail 7"

   0, pain_01.nidm-1, 0.0, 0.073695, 0.0, 0.0, 0.032443, 0.0, 0.058134
   1, pain_02.nidm-1, 0.0, 0.0, 0.000024, 0.0, 0.000006, 0.142421, 0.000014
   2, pain_03.nidm-1, 0.0, 0.109131, 0.0, 0.134174, 0.023071, 0.0, 0.385138
   3, pain_04.nidm-1, 0.0, 0.076218, 0.184233, 0.113124, 0.079341, 0.231365, 0.395319
   4, pain_05.nidm-1, 0.0, 0.12414, 0.037756, 0.083693, 0.093972, 0.0, 0.0
   5, pain_06.nidm-1, 0.0, 0.0, 0.000057, 0.039228, 0.0, 0.0, 0.0
   6, pain_07.nidm-1, 0.0, 0.015453, 0.0, 0.0, 0.0, 0.0, 0.0
   7, pain_08.nidm-1, 0.0, 0.11783, 0.0, 0.042313, 0.000005, 0.14787, 0.0
   8, pain_09.nidm-1, 0.0, 0.058825, 0.0, 0.0, 0.001745, 0.0, 0.0
   9, pain_10.nidm-1, 0.0557, 0.11594, 0.0, 0.020533, 0.075745, 0.139293, 0.0
   10, pain_11.nidm-1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
   11, pain_12.nidm-1, 0.0, 0.0, 0.195963, 0.0, 0.053017, 0.120445, 0.0
   12, pain_13.nidm-1, 0.170124, 0.0, 0.173871, 0.0, 0.120761, 0.0, 0.0
   13, pain_14.nidm-1, 0.0, 0.118262, 0.0, 0.0, 0.069786, 0.0, 0.157469
   14, pain_15.nidm-1, 0.136875, 0.0, 0.0, 0.057199, 0.035299, 0.0, 0.0
   15, pain_16.nidm-1, 0.100759, 0.000326, 0.189962, 0.058444, 0.051459, 0.100128, 0.0
   16, pain_17.nidm-1, 0.214114, 0.080555, 0.0, 0.0, 0.0, 0.0, 0.0
   17, pain_18.nidm-1, 0.207505, 0.0, 0.000324, 0.0, 0.094171, 0.11225, 0.0
   18, pain_19.nidm-1, 0.0, 0.104601, 0.0, 0.155254, 0.131266, 0.0, 0.0
   19, pain_20.nidm-1, 0.0, 0.0, 0.0, 0.113222, 0.086047, 0.0, 0.0
   20, pain_21.nidm-1, 0.11048, 0.0, 0.211829, 0.17764, 0.045384, 0.0, 0.0
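
To make the idea of a contribution score concrete, here is a toy sketch of a leave-one-out drop. It is not NiMARE's implementation: ``cluster_stat`` is a made-up stand-in for refitting the meta-analysis (here just the mean of per-study values inside one cluster), and the contribution is defined, as an assumption, as the proportional drop in that statistic when a study is removed. All numbers are hypothetical.

```python
import numpy as np


def cluster_stat(values):
    """Hypothetical cluster statistic: the mean of per-study values."""
    return float(np.mean(values))


def jackknife_contributions(values):
    """Proportional drop in the statistic when each study is left out."""
    values = np.asarray(values, dtype=float)
    full = cluster_stat(values)
    contributions = []
    for i in range(len(values)):
        reduced = cluster_stat(np.delete(values, i))  # "refit" without study i
        # Clip at zero: removing a weak study can raise the statistic.
        contributions.append(max(0.0, (full - reduced) / full))
    return np.array(contributions)


# Made-up per-study values inside one cluster; the last study dominates.
toy_values = [1.0, 1.0, 1.0, 1.0, 6.0]
toy_contrib = jackknife_contributions(toy_values)
print(toy_contrib)
```

In this toy case only the last study has a non-zero score: dropping it halves the statistic, so its contribution is 0.5, while dropping any other study raises the mean and is clipped to 0.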


.. GENERATED FROM PYTHON SOURCE LINES 148-153

Visualise study contributions as a heatmap
``````````````````````````````````````````````````````````````````````````````

A heatmap makes it easy to spot studies that dominate one or more clusters
and studies that contribute little across the board. If a single row shows
consistently high values, the corresponding study warrants closer scrutiny.

.. GENERATED FROM PYTHON SOURCE LINES 153-185

.. code-block:: Python

    if contrib_df is not None and not contrib_df.empty:
        # Note: the non-numeric "id" column is coerced to NaN and then filled
        # with 0.0, so it is plotted as an all-zero column and is counted in
        # the row means printed below.
        contrib_values = contrib_df.apply(pd.to_numeric, errors="coerce").fillna(0.0)

        fig, ax = plt.subplots(
            figsize=(
                max(6, len(contrib_values.columns) * 1.2),
                max(5, len(contrib_values) * 0.35),
            )
        )
        im = ax.imshow(
            contrib_values.to_numpy(dtype=float),
            aspect="auto",
            cmap="YlOrRd",
            vmin=0,
            vmax=1,
        )
        ax.set_xticks(range(len(contrib_values.columns)))
        ax.set_xticklabels(contrib_values.columns, rotation=45, ha="right", fontsize=9)
        ax.set_yticks(range(len(contrib_values)))
        ax.set_yticklabels(contrib_values.index, fontsize=7)
        ax.set_xlabel("Cluster", fontsize=11)
        ax.set_ylabel("Study", fontsize=11)
        ax.set_title("Jackknife: proportional contribution per study × cluster", fontsize=12)
        plt.colorbar(im, ax=ax, label="Contribution (0 = none, 1 = complete)")
        plt.tight_layout()
        plt.show()

        # Rank studies by their mean contribution across columns.
        mean_contrib = contrib_values.mean(axis=1).sort_values(ascending=False)
        print("Mean contribution across all clusters (top 10):")
        print(mean_contrib.head(10).to_string())
    else:
        print("No clusters found — increase N_ITERS or lower the cluster threshold.")

.. image-sg:: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_002.png
   :alt: Jackknife: proportional contribution per study × cluster
   :srcset: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Mean contribution across all clusters (top 10):
    3     0.134950
    2     0.081439
    20    0.068167
    15    0.062635
    12    0.058094
    17    0.051781
    9     0.050901
    18    0.048890
    11    0.046178
    13    0.043190

.. GENERATED FROM PYTHON SOURCE LINES 186-217

ResampledStability: voxelwise reproducibility under resampling
-----------------------------------------------------------------------------

:class:`~nimare.diagnostics.ResampledStability` estimates how reliably each
voxel survives thresholding when the composition of the study set changes.
For each replicate the algorithm:

1. Draws a subset of studies according to the chosen ``resampling_policy``.
2. Refits the full estimator (and corrector, if the target image requires it)
   on the subset.
3. Thresholds the result and records a binary support map (1 = significant,
   0 = not).

The final stability map is the **mean binary support across all replicates**.
A stability of 1 means the voxel survived thresholding in every replicate;
a stability of 0 means it never survived.

Three resampling policies are available:

* ``"leave_1_out"``: omit exactly one study per replicate; deterministic;
  generates N replicates for N studies.
* ``"leave_k_out"``: omit k studies per replicate; useful for testing
  sensitivity to blocks of studies.
* ``"subsample"``: random subsamples of ``target_n`` studies; flexible and
  recommended for large datasets (> 30 studies).

Unlike Jackknife, ResampledStability does **not** identify which study is
responsible for instability; it only tells you *where* the result is stable.
The two diagnostics are therefore complementary: run Jackknife first to flag
influential studies, then add ResampledStability to document spatial
reliability for publication.

.. GENERATED FROM PYTHON SOURCE LINES 219-226

Leave-one-out stability
``````````````````````````````````````````````````````````````````````````````

``"leave_1_out"`` is the most conservative policy: it drops exactly one study per replicate, giving N deterministic replicates.
Because every study is omitted exactly once, the result is fully reproducible without a random seed. This policy is recommended for small datasets (< 25 studies) where removing a single study can substantially change the analysis. .. GENERATED FROM PYTHON SOURCE LINES 226-236 .. code-block:: Python rs_loo = ResampledStability( target_image=TARGET_IMAGE, resampling_policy="leave_1_out", n_cores=1, ) result_loo = rs_loo.transform(copy.deepcopy(result)) print("Leave-one-out summary:") print(result_loo.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"]) .. rst-class:: sphx-glr-script-out .. code-block:: none 0%| | 0/21 [00:00 0] ax.hist(nonzero, bins=20, range=(0, 1), color="steelblue", edgecolor="white") ax.set_title(title, fontsize=10) ax.set_xlabel("Stability") ax.set_xlim(0, 1) mean_val = nonzero.mean() if len(nonzero) > 0 else 0 ax.axvline(mean_val, color="red", linestyle="--", label=f"mean = {mean_val:.2f}") ax.legend(fontsize=8) axes[0].set_ylabel("Voxel count") fig.suptitle("Distribution of non-zero stability values across resampling policies", fontsize=13) fig.tight_layout() plt.show() .. image-sg:: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_004.png :alt: Distribution of non-zero stability values across resampling policies, Leave-one-out, Leave-3-out (20 resamples), Subsample n=15 (20 resamples) :srcset: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 338-346 Baseline, Jackknife clusters, and stability side by side ----------------------------------------------------------------------------- Placing the corrected z-map, the Jackknife cluster-label map, and the leave-one-out stability map on the same axial slices makes it easy to see whether the regions identified as clusters are also the regions with the highest voxelwise stability. When they agree, the result is doubly supported. 
When the stability map is patchy or low inside a cluster boundary, that cluster deserves more scrutiny.

.. GENERATED FROM PYTHON SOURCE LINES 346-397

.. code-block:: Python

    fig, axes = plt.subplots(3, 1, figsize=(14, 11))

    plot_stat_map(
        result.get_map(TARGET_IMAGE),
        cut_coords=5,
        display_mode="z",
        title="ALE — cluster-level FWE corrected z-map (baseline)",
        threshold=1.65,
        cmap="RdBu_r",
        symmetric_cbar=True,
        vmax=5,
        axes=axes[0],
        figure=fig,
    )

    if contrib_df is not None and not contrib_df.empty:
        label_key = f"label_{TARGET_IMAGE}_tail-positive"
        if label_key in result_jk.maps:
            plot_stat_map(
                result_jk.get_map(label_key),
                cut_coords=5,
                display_mode="z",
                title="Jackknife — cluster label map (colour = cluster ID)",
                threshold=0.5,
                cmap="Set1",
                symmetric_cbar=False,
                axes=axes[1],
                figure=fig,
            )
        else:
            axes[1].set_title("Jackknife label map not available")
    else:
        axes[1].set_title("No clusters found for Jackknife")

    plot_stat_map(
        result_loo.get_map(stability_key),
        cut_coords=5,
        display_mode="z",
        title="ResampledStability — leave-one-out voxelwise stability (0–1)",
        threshold=0.1,
        vmin=0,
        vmax=1,
        cmap="hot",
        symmetric_cbar=False,
        axes=axes[2],
        figure=fig,
    )

    fig.tight_layout()
    plt.show()

.. image-sg:: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_005.png
   :alt: Jackknife label map not available
   :srcset: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_005.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 398-404

Numerical stability summary across policies
-----------------------------------------------------------------------------

The table below shows how many voxels survive at three stability thresholds
(> 0, ≥ 0.5, ≥ 0.8) under each resampling policy. A strict threshold of 0.8
retains only the voxels that survived thresholding in at least 80 % of
resamples, a reasonable bar for high-confidence reporting.

.. GENERATED FROM PYTHON SOURCE LINES 404-427

.. code-block:: Python

    rows = []
    for res, label in configs:
        stab = res.get_map(stability_key, return_type="array")
        nonzero = stab[stab > 0]
        rows.append(
            {
                "Policy": label,
                "N replicates": int(
                    res.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"][
                        "n_resamples"
                    ].iloc[0]
                ),
                "Stable voxels (>0)": int(len(nonzero)),
                "Stable voxels (≥0.5)": int((stab >= 0.5).sum()),
                "Stable voxels (≥0.8)": int((stab >= 0.8).sum()),
                "Mean stability (nonzero)": (
                    round(float(nonzero.mean()), 3) if len(nonzero) > 0 else 0.0
                ),
            }
        )

    pd.DataFrame(rows).set_index("Policy")

.. csv-table::
   :header: "Policy", "N replicates", "Stable voxels (>0)", "Stable voxels (≥0.5)", "Stable voxels (≥0.8)", "Mean stability (nonzero)"

   "Leave-one-out", 21, 2344, 2343, 2083, 0.896
   "Leave-3-out (20 resamples)", 20, 2344, 1894, 958, 0.691
   "Subsample n=15 (20 resamples)", 20, 2305, 991, 353, 0.450
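
The replicate counts and threshold columns in this table can be reproduced in miniature. The following is a toy NumPy sketch, not NiMARE's code: ``kept_studies`` is a hypothetical helper, the random draws for ``"leave_k_out"`` and ``"subsample"`` are an assumption about how such policies could be realised, and the binary support array is made up.

```python
import numpy as np


def kept_studies(n_studies, policy, k=3, target_n=15, n_resamples=20, seed=42):
    """Return one array of kept study indices per replicate (toy version)."""
    rng = np.random.default_rng(seed)
    everyone = np.arange(n_studies)
    if policy == "leave_1_out":
        # Deterministic: each study is omitted exactly once.
        return [np.delete(everyone, i) for i in range(n_studies)]
    if policy == "leave_k_out":
        size = n_studies - k  # omit k randomly chosen studies
    elif policy == "subsample":
        size = target_n  # keep a random subset of target_n studies
    else:
        raise ValueError(f"unknown policy: {policy}")
    return [
        np.sort(rng.choice(everyone, size=size, replace=False))
        for _ in range(n_resamples)
    ]


# (number of replicates, studies kept per replicate) for a 21-study dataset.
shapes = {}
for policy in ("leave_1_out", "leave_k_out", "subsample"):
    reps = kept_studies(21, policy)
    shapes[policy] = (len(reps), len(reps[0]))
print(shapes)

# Hypothetical binary support: 4 replicates x 5 voxels (1 = significant).
support = np.array(
    [
        [1, 1, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [1, 1, 1, 1, 0],
        [1, 0, 1, 0, 0],
    ]
)

# Stability is the mean binary support across replicates, per voxel.
stability = support.mean(axis=0)
counts = {
    "Stable voxels (>0)": int((stability > 0).sum()),
    "Stable voxels (>=0.5)": int((stability >= 0.5).sum()),
    "Stable voxels (>=0.8)": int((stability >= 0.8).sum()),
}
print(stability, counts)
```

With these defaults the replicate counts are 21, 20, and 20, matching the "N replicates" column, and the toy stability values shrink from four voxels above 0 to a single voxel at or above 0.8.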


.. GENERATED FROM PYTHON SOURCE LINES 428-516

Key differences at a glance
-----------------------------------------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Feature
     - Jackknife
     - ResampledStability
   * - Question answered
     - Which studies drive each cluster?
     - How reliably does each voxel survive thresholding?
   * - Output granularity
     - Study × cluster (one scalar per pair)
     - Voxelwise map (one value per brain voxel)
   * - Output range
     - 0–1 (proportional contribution)
     - 0–1 (proportion of resamples surviving threshold)
   * - Number of estimator refits
     - N (one per study)
     - ``n_resamples``
   * - Resampling policy
     - Fixed: leave-one-out
     - Choice: leave_1_out / leave_k_out / subsample
   * - Works with pairwise estimators?
     - Yes (v 0.1.2+)
     - No
   * - Null distribution rebuilt per replicate?
     - No (fast path for CBMA)
     - Depends on policy (subsample rebuilds it)
   * - Typical compute cost
     - O(N) estimator fits
     - O(n_resamples) estimator fits
   * - Default in CBMAWorkflow / IBMAWorkflow?
     - Yes
     - No
   * - Primary use case
     - Influence and outlier detection
     - Spatial reliability for publication figures

**When to use Jackknife**

Use :class:`~nimare.diagnostics.Jackknife` whenever you want to know which
studies are responsible for a significant cluster. It is the right first
diagnostic in almost every meta-analysis:

* It is **fast**: N refits where N is the study count, without rebuilding the
  null distribution.
* It is **interpretable**: reviewers can cross-check high-contribution studies
  against the original publications.
* It works with **all single-sample and pairwise estimators** in NiMARE.
* It runs **automatically** inside :class:`~nimare.workflows.cbma.CBMAWorkflow`
  and :class:`~nimare.workflows.ibma.IBMAWorkflow`.

If any study shows a contribution > 0.8 in a cluster, inspect it carefully:
unusual coordinate densities, atypical sample sizes, or duplicate peaks from
the same laboratory are common culprits.
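
The 0.8 rule of thumb is easy to apply programmatically. The snippet below is a minimal sketch on a made-up contribution matrix (rows are studies, columns are clusters); the study names, values, and the ``THRESHOLD`` constant are hypothetical and only mirror the layout of the Jackknife table.

```python
import numpy as np

# Hypothetical study and cluster labels, in the layout of the Jackknife table.
study_ids = ["study_a", "study_b", "study_c"]
cluster_ids = ["PositiveTail 1", "PositiveTail 2"]

# Made-up proportional contributions: rows = studies, columns = clusters.
contrib = np.array(
    [
        [0.05, 0.30],
        [0.85, 0.35],
        [0.10, 0.35],
    ]
)

THRESHOLD = 0.8  # rule-of-thumb cutoff from the text

# Collect every (study, cluster) pair whose contribution exceeds the cutoff.
flagged = [
    (study_ids[i], cluster_ids[j], float(contrib[i, j]))
    for i, j in np.argwhere(contrib > THRESHOLD)
]
print(flagged)
```

Here only ``study_b`` is flagged, for ``PositiveTail 1``; a cluster whose contributions are spread evenly across studies, like ``PositiveTail 2``, produces no flags.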
**When to use ResampledStability**

Use :class:`~nimare.diagnostics.ResampledStability` when you need a spatially
explicit reliability map, for example to include in a supplementary figure, to
compare the robustness of two estimators, or to flag voxels at the edges of
clusters that may not be trustworthy.

* Choose ``"leave_1_out"`` for **small datasets** (< 25 studies) where each
  study carries substantial weight.
* Choose ``"leave_k_out"`` for **larger datasets** (> 30 studies) when you want
  to test sensitivity to the removal of blocks of studies.
* Choose ``"subsample"`` for **larger datasets** (> 30 studies) or when you
  want to quantify what fraction of the result survives at a reduced sample
  size (e.g. ``target_n = int(0.75 * n_studies)``).

A practical benchmark: report mean stability per cluster; flag clusters with
mean stability < 0.5 under ``"leave_1_out"`` as potentially unreliable even if
they survived FWE correction.

**Recommended workflow**

1. Run :class:`~nimare.diagnostics.Jackknife` (or use
   :class:`~nimare.workflows.cbma.CBMAWorkflow`, which runs it automatically)
   to identify influential studies.
2. For publication, add :class:`~nimare.diagnostics.ResampledStability` with
   ``"leave_1_out"`` (small datasets) or ``"subsample"`` (large datasets) and
   include the stability map as a supplementary figure.
3. Interpret the two diagnostics together: a cluster that is both driven by a
   single study (high Jackknife contribution) *and* spatially unstable (low
   ResampledStability) should be downgraded in confidence or omitted.

.. GENERATED FROM PYTHON SOURCE LINES 518-521

References
-----------------------------------------------------------------------------

.. footbibliography::

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 23.384 seconds)

.. _sphx_glr_download_auto_examples_02_meta-analyses_16_plot_jackknife_vs_resampled_stability.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 16_plot_jackknife_vs_resampled_stability.ipynb <16_plot_jackknife_vs_resampled_stability.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 16_plot_jackknife_vs_resampled_stability.py <16_plot_jackknife_vs_resampled_stability.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 16_plot_jackknife_vs_resampled_stability.zip <16_plot_jackknife_vs_resampled_stability.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_