Note

Go to the end to download the full example code.

Stability diagnostics: Jackknife vs. ResampledStability

Once a meta-analysis has been run and a thresholded result is in hand, the natural next question is: how much should we trust it? NiMARE provides two post-hoc diagnostics that approach this question from complementary angles.

Jackknife asks “which studies are responsible for each significant cluster?” It loops through every study, refits the meta-analysis without that study, and reports how much the cluster-level statistics drop — a high contribution score means the cluster depends heavily on a single experiment.

ResampledStability asks “how reproducibly does each brain voxel survive thresholding when we perturb the study set?” It generates many resampled versions of the dataset, refits the full pipeline on each, and averages the binary significant/not-significant outcome across resamples — yielding a voxelwise stability map between 0 (never significant) and 1 (always significant).

The two diagnostics measure different things and have different computational costs:

Jackknife is fast (N refits for N studies), cluster-level, and study-level — ideal as a first robustness check built into every workflow.
ResampledStability is spatially explicit, voxelwise, and policy-flexible — better suited for publication-quality robustness figures and large-dataset analyses.

This example runs both on the same ALE result so you can compare their outputs directly.

Note

For real analyses use n_iters ≥ 5000 for the FWE corrector and n_resamples ≥ 100 for ResampledStability. We use small counts here purely for documentation-build speed.

Imports and constants

import copy
import os
import warnings

import matplotlib.pyplot as plt
import pandas as pd
from nilearn.plotting import plot_stat_map

from nimare.correct import FWECorrector
from nimare.diagnostics import Jackknife, ResampledStability
from nimare.meta.cbma.ale import ALE
from nimare.nimads import Studyset
from nimare.utils import get_resource_path

warnings.filterwarnings("ignore")

N_ITERS = 50       # increase to ≥5000 for real analyses
N_RESAMPLES = 20   # increase to ≥100 for real analyses
RANDOM_STATE = 42

Load data and fit the baseline ALE meta-analysis

We use the NiMARE pain dataset (21 studies, MNI 2 mm) throughout this example. Both diagnostics operate on an already-fitted MetaResult, so we run ALE and apply a Monte Carlo FWE corrector once and then reuse that result for each diagnostic.

The cluster-level corrected z-map is our primary target image — the one that determines which voxels are “significant” and therefore which clusters the diagnostics evaluate.

studyset_file = os.path.join(get_resource_path(), "nidm_pain_studyset.json")
studyset = Studyset(studyset_file, target="mni152_2mm")
print(f"Number of studies: {len(studyset.studies)}")

ale = ALE()
result = ale.fit(studyset)

corrector = FWECorrector(method="montecarlo", n_iters=N_ITERS, n_cores=1)
result = corrector.transform(result)

TARGET_IMAGE = "z_desc-size_level-cluster_corr-FWE_method-montecarlo"

Number of studies: 21

  0%|          | 0/50 [00:00<?, ?it/s]
  4%|▍         | 2/50 [00:00<00:02, 17.94it/s]
  8%|▊         | 4/50 [00:00<00:02, 17.70it/s]
 12%|█▏        | 6/50 [00:00<00:02, 17.60it/s]
 16%|█▌        | 8/50 [00:00<00:02, 17.57it/s]
 20%|██        | 10/50 [00:00<00:02, 17.61it/s]
 24%|██▍       | 12/50 [00:00<00:02, 17.50it/s]
 28%|██▊       | 14/50 [00:00<00:02, 17.57it/s]
 32%|███▏      | 16/50 [00:00<00:01, 17.60it/s]
 36%|███▌      | 18/50 [00:01<00:01, 17.65it/s]
 40%|████      | 20/50 [00:01<00:01, 17.73it/s]
 44%|████▍     | 22/50 [00:01<00:01, 17.76it/s]
 48%|████▊     | 24/50 [00:01<00:01, 17.68it/s]
 52%|█████▏    | 26/50 [00:01<00:01, 17.76it/s]
 56%|█████▌    | 28/50 [00:01<00:01, 17.70it/s]
 60%|██████    | 30/50 [00:01<00:01, 17.72it/s]
 64%|██████▍   | 32/50 [00:01<00:01, 17.84it/s]
 68%|██████▊   | 34/50 [00:01<00:00, 17.90it/s]
 72%|███████▏  | 36/50 [00:02<00:00, 17.89it/s]
 76%|███████▌  | 38/50 [00:02<00:00, 17.92it/s]
 80%|████████  | 40/50 [00:02<00:00, 17.88it/s]
 84%|████████▍ | 42/50 [00:02<00:00, 17.92it/s]
 88%|████████▊ | 44/50 [00:02<00:00, 17.95it/s]
 92%|█████████▏| 46/50 [00:02<00:00, 17.95it/s]
 96%|█████████▌| 48/50 [00:02<00:00, 17.99it/s]
100%|██████████| 50/50 [00:02<00:00, 18.07it/s]
100%|██████████| 50/50 [00:02<00:00, 17.80it/s]

Baseline corrected result

Before running any diagnostics, we visualise the cluster-level FWE-corrected z-map. This is the map that the diagnostics will characterise.

plot_stat_map(
    result.get_map(TARGET_IMAGE),
    cut_coords=5,
    display_mode="z",
    title="ALE — cluster-level FWE corrected z-map (baseline)",
    threshold=1.65,
    cmap="RdBu_r",
    symmetric_cbar=True,
    vmax=5,
)
plt.show()

16 plot jackknife vs resampled stability

Jackknife: identifying influential studies

Jackknife characterises each study’s proportional contribution to every significant cluster.

For each study i the algorithm:

Refits the estimator on all remaining N − 1 studies.
Computes the proportional reduction in the summary statistic at each voxel: 1 − (stat_{−i} / stat_full).
Averages across all voxels in the cluster.

A contribution near 1 means removing study i substantially reduces the cluster statistic — that study is a strong driver. A contribution near 0 means the cluster survives almost unchanged without it.

Because Jackknife only recomputes the summary statistic (not the full Monte Carlo null distribution) for each N − 1 dataset, it is comparatively fast. It is the default diagnostic in CBMAWorkflow and IBMAWorkflow.

jackknife = Jackknife(target_image=TARGET_IMAGE, n_cores=1)
result_jk = jackknife.transform(copy.deepcopy(result))

  0%|          | 0/21 [00:00<?, ?it/s]
  5%|▍         | 1/21 [00:00<00:05,  3.62it/s]
 10%|▉         | 2/21 [00:00<00:05,  3.61it/s]
 14%|█▍        | 3/21 [00:00<00:04,  3.60it/s]
 19%|█▉        | 4/21 [00:01<00:04,  3.60it/s]
 24%|██▍       | 5/21 [00:01<00:04,  3.57it/s]
 29%|██▊       | 6/21 [00:01<00:04,  3.52it/s]
 33%|███▎      | 7/21 [00:01<00:03,  3.51it/s]
 38%|███▊      | 8/21 [00:02<00:03,  3.53it/s]
 43%|████▎     | 9/21 [00:02<00:03,  3.55it/s]
 48%|████▊     | 10/21 [00:02<00:03,  3.55it/s]
 52%|█████▏    | 11/21 [00:03<00:02,  3.54it/s]
 57%|█████▋    | 12/21 [00:03<00:02,  3.56it/s]
 62%|██████▏   | 13/21 [00:03<00:02,  3.56it/s]
 67%|██████▋   | 14/21 [00:03<00:01,  3.54it/s]
 71%|███████▏  | 15/21 [00:04<00:01,  3.55it/s]
 76%|███████▌  | 16/21 [00:04<00:01,  3.56it/s]
 81%|████████  | 17/21 [00:04<00:01,  3.57it/s]
 86%|████████▌ | 18/21 [00:05<00:00,  3.57it/s]
 90%|█████████ | 19/21 [00:05<00:00,  3.58it/s]
 95%|█████████▌| 20/21 [00:05<00:00,  3.58it/s]
100%|██████████| 21/21 [00:05<00:00,  3.57it/s]
100%|██████████| 21/21 [00:05<00:00,  3.56it/s]

Jackknife clusters table

The clusters table summarises the location and size of each significant cluster. Peak statistic and MNI centre of mass are reported for each.

clust_key = f"{TARGET_IMAGE}_tab-clust"
result_jk.tables[clust_key]

	Cluster ID	X	Y	Z	Peak Stat	Cluster Size (mm3)
0	PositiveTail 1	-58.0	-26.0	22.0	2.053749	1584
1	PositiveTail 2	-32.0	-62.0	-38.0	2.053749	2632
2	PositiveTail 3	-34.0	14.0	0.0	2.053749	1160
3	PositiveTail 4	0.0	8.0	46.0	2.053749	5320
4	PositiveTail 5	38.0	8.0	-2.0	2.053749	6488
5	PositiveTail 6	54.0	-26.0	20.0	2.053749	1608
6	PositiveTail 7	18.0	-102.0	-4.0	0.100434	480

Jackknife study-contribution table

Each row is a study; each column is a cluster. Cell values are the mean proportional contribution of that study to that cluster. High values (e.g. > 0.8) flag studies whose removal would substantially weaken a cluster — worth inspecting for outlier coordinates, inflated sample sizes, or duplicate peaks.

contrib_key = f"{TARGET_IMAGE}_diag-Jackknife_tab-counts_tail-positive"
contrib_df = result_jk.tables.get(contrib_key)
contrib_df

	id	PositiveTail 1	PositiveTail 2	PositiveTail 3	PositiveTail 4	PositiveTail 5	PositiveTail 6	PositiveTail 7
0	pain_01.nidm-1	0.0	0.073695	0.0	0.0	0.032443	0.0	0.058134
1	pain_02.nidm-1	0.0	0.0	0.000024	0.0	0.000006	0.142421	0.000014
2	pain_03.nidm-1	0.0	0.109131	0.0	0.134174	0.023071	0.0	0.385138
3	pain_04.nidm-1	0.0	0.076218	0.184233	0.113124	0.079341	0.231365	0.395319
4	pain_05.nidm-1	0.0	0.12414	0.037756	0.083693	0.093972	0.0	0.0
5	pain_06.nidm-1	0.0	0.0	0.000057	0.039228	0.0	0.0	0.0
6	pain_07.nidm-1	0.0	0.015453	0.0	0.0	0.0	0.0	0.0
7	pain_08.nidm-1	0.0	0.11783	0.0	0.042313	0.000005	0.14787	0.0
8	pain_09.nidm-1	0.0	0.058825	0.0	0.0	0.001745	0.0	0.0
9	pain_10.nidm-1	0.0557	0.11594	0.0	0.020533	0.075745	0.139293	0.0
10	pain_11.nidm-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0
11	pain_12.nidm-1	0.0	0.0	0.195963	0.0	0.053017	0.120445	0.0
12	pain_13.nidm-1	0.170124	0.0	0.173871	0.0	0.120761	0.0	0.0
13	pain_14.nidm-1	0.0	0.118262	0.0	0.0	0.069786	0.0	0.157469
14	pain_15.nidm-1	0.136875	0.0	0.0	0.057199	0.035299	0.0	0.0
15	pain_16.nidm-1	0.100759	0.000326	0.189962	0.058444	0.051459	0.100128	0.0
16	pain_17.nidm-1	0.214114	0.080555	0.0	0.0	0.0	0.0	0.0
17	pain_18.nidm-1	0.207505	0.0	0.000324	0.0	0.094171	0.11225	0.0
18	pain_19.nidm-1	0.0	0.104601	0.0	0.155254	0.131266	0.0	0.0
19	pain_20.nidm-1	0.0	0.0	0.0	0.113222	0.086047	0.0	0.0
20	pain_21.nidm-1	0.11048	0.0	0.211829	0.17764	0.045384	0.0	0.0

Visualise study contributions as a heatmap

A heatmap makes it easy to spot studies that dominate one or more clusters and studies that contribute little across the board. If a single row shows consistently high values, the corresponding study warrants closer scrutiny.

if contrib_df is not None and not contrib_df.empty:
    contrib_values = contrib_df.apply(pd.to_numeric, errors="coerce").fillna(0.0)
    fig, ax = plt.subplots(
        figsize=(
            max(6, len(contrib_values.columns) * 1.2),
            max(5, len(contrib_values) * 0.35),
        )
    )
    im = ax.imshow(
        contrib_values.to_numpy(dtype=float),
        aspect="auto",
        cmap="YlOrRd",
        vmin=0,
        vmax=1,
    )
    ax.set_xticks(range(len(contrib_values.columns)))
    ax.set_xticklabels(contrib_values.columns, rotation=45, ha="right", fontsize=9)
    ax.set_yticks(range(len(contrib_values)))
    ax.set_yticklabels(contrib_values.index, fontsize=7)
    ax.set_xlabel("Cluster", fontsize=11)
    ax.set_ylabel("Study", fontsize=11)
    ax.set_title("Jackknife: proportional contribution per study × cluster", fontsize=12)
    plt.colorbar(im, ax=ax, label="Contribution (0 = none, 1 = complete)")
    plt.tight_layout()
    plt.show()

    mean_contrib = contrib_values.mean(axis=1).sort_values(ascending=False)
    print("Mean contribution across all clusters (top 10):")
    print(mean_contrib.head(10).to_string())
else:
    print("No clusters found — increase N_ITERS or lower the cluster threshold.")

Jackknife: proportional contribution per study × cluster

Mean contribution across all clusters (top 10):
   0.134950
   0.081439
  0.068167
  0.062635
  0.058094
  0.051781
   0.050901
  0.048890
  0.046178
  0.043190

ResampledStability: voxelwise reproducibility under resampling

ResampledStability estimates how reliably each voxel survives thresholding when the composition of the study set changes.

For each replicate the algorithm:

Draws a subset of studies according to the chosen resampling_policy.
Refits the full estimator (and corrector, if the target image requires it) on the subset.
Thresholds the result and records a binary support map (1 = significant, 0 = not).

The final stability map is the mean binary support across all replicates. A stability of 1 means the voxel survived thresholding in every replicate; a stability of 0 means it never survived.

Three resampling policies are available:

"leave_1_out" — omit exactly one study per replicate; deterministic; generates N replicates for N studies.
"leave_k_out" — omit k studies per replicate; useful for testing sensitivity to blocks of studies.
"subsample" — random subsamples of target_n studies; flexible and recommended for large datasets (> 30 studies).

Unlike Jackknife, ResampledStability does not identify which study is responsible for instability — it only tells you where the result is stable. The two diagnostics are therefore complementary: run Jackknife first to flag influential studies, then add ResampledStability to document spatial reliability for publication.

Leave-one-out stability

"leave_1_out" is the most conservative policy: it drops exactly one study per replicate, giving N deterministic replicates. Because every study is omitted exactly once, the result is fully reproducible without a random seed. This policy is recommended for small datasets (< 25 studies) where removing a single study can substantially change the analysis.

rs_loo = ResampledStability(
    target_image=TARGET_IMAGE,
    resampling_policy="leave_1_out",
    n_cores=1,
)
result_loo = rs_loo.transform(copy.deepcopy(result))

print("Leave-one-out summary:")
print(result_loo.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"])

  0%|          | 0/21 [00:00<?, ?it/s]
 24%|██▍       | 5/21 [00:00<00:00, 45.08it/s]
 48%|████▊     | 10/21 [00:00<00:00, 44.71it/s]
 71%|███████▏  | 15/21 [00:00<00:00, 44.92it/s]
 95%|█████████▌| 20/21 [00:00<00:00, 45.12it/s]
100%|██████████| 21/21 [00:00<00:00, 45.06it/s]
Leave-one-out summary:
                                        target_image  ... random_state
0  z_desc-size_level-cluster_corr-FWE_method-mont...  ...         None

[1 rows x 6 columns]

Leave-k-out stability

"leave_k_out" omits k studies per replicate. This tests whether the result is stable under a more substantial perturbation than leaving one study out — useful when you suspect that a block of methodologically similar studies (e.g. the same lab, the same paradigm) may be jointly driving a cluster. Set n_resamples to control how many random subsets are drawn.

rs_lko = ResampledStability(
    target_image=TARGET_IMAGE,
    resampling_policy="leave_k_out",
    k=3,
    n_resamples=N_RESAMPLES,
    random_state=RANDOM_STATE,
    n_cores=1,
)
result_lko = rs_lko.transform(copy.deepcopy(result))

print(f"Leave-{rs_lko.k}-out summary:")
print(result_lko.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"])

  0%|          | 0/20 [00:00<?, ?it/s]
 25%|██▌       | 5/20 [00:00<00:00, 47.34it/s]
 50%|█████     | 10/20 [00:00<00:00, 47.40it/s]
 75%|███████▌  | 15/20 [00:00<00:00, 47.42it/s]
100%|██████████| 20/20 [00:00<00:00, 47.28it/s]
100%|██████████| 20/20 [00:00<00:00, 47.29it/s]
Leave-3-out summary:
                                        target_image  ... random_state
0  z_desc-size_level-cluster_corr-FWE_method-mont...  ...           42

[1 rows x 6 columns]

Subsample stability

"subsample" draws random subsets of target_n studies. The subsetting fraction controls the stringency: subsampling at 50 % is a harder test than 90 %. A common choice is 75–80 % of the available studies, which retains enough power to detect real effects while exposing sensitivity to individual studies. This policy is the most general and is recommended for larger datasets where exhaustive leave-k-out enumeration would be prohibitive.

n_studies = len(studyset.studies)
target_n = max(3, int(n_studies * 0.75))  # 75 % of available studies

rs_sub = ResampledStability(
    target_image=TARGET_IMAGE,
    resampling_policy="subsample",
    target_n=target_n,
    n_resamples=N_RESAMPLES,
    random_state=RANDOM_STATE,
    n_cores=1,
)
result_sub = rs_sub.transform(copy.deepcopy(result))

print(f"Subsample (n={target_n}) summary:")
print(result_sub.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"])

  0%|          | 0/20 [00:00<?, ?it/s]
 25%|██▌       | 5/20 [00:00<00:00, 46.86it/s]
 55%|█████▌    | 11/20 [00:00<00:00, 49.10it/s]
 85%|████████▌ | 17/20 [00:00<00:00, 50.06it/s]
100%|██████████| 20/20 [00:00<00:00, 49.63it/s]
Subsample (n=15) summary:
                                        target_image  ... random_state
0  z_desc-size_level-cluster_corr-FWE_method-mont...  ...           42

[1 rows x 6 columns]

Stability maps: three policies side by side

All three policies yield a voxelwise stability map on the same 0–1 scale. Comparing them reveals whether spatial reliability is preserved as the perturbation grows stronger (leave-1-out → leave-3-out → subsample 75 %). Voxels that remain stable across all three policies are the most trustworthy.

stability_key = f"{TARGET_IMAGE}_diag-ResampledStability"
configs = [
    (result_loo, "Leave-one-out"),
    (result_lko, f"Leave-{rs_lko.k}-out ({N_RESAMPLES} resamples)"),
    (result_sub, f"Subsample n={target_n} ({N_RESAMPLES} resamples)"),
]

fig, axes = plt.subplots(len(configs), 1, figsize=(14, 4 * len(configs)))
for ax, (res, title) in zip(axes, configs):
    plot_stat_map(
        res.get_map(stability_key),
        cut_coords=5,
        display_mode="z",
        title=f"Stability — {title}",
        threshold=0.1,
        vmin=0,
        vmax=1,
        cmap="hot",
        symmetric_cbar=False,
        axes=ax,
        figure=fig,
    )
fig.tight_layout()
plt.show()

Distribution of stability values

Plotting the distribution of non-zero stability values across all three policies shows how the overall robustness of the result changes as the perturbation grows. A distribution concentrated near 1 indicates a stable result; a broad or left-skewed distribution signals spatial fragility.

fig, axes = plt.subplots(1, len(configs), figsize=(14, 4), sharey=True)
for ax, (res, title) in zip(axes, configs):
    stab = res.get_map(stability_key, return_type="array")
    nonzero = stab[stab > 0]
    ax.hist(nonzero, bins=20, range=(0, 1), color="steelblue", edgecolor="white")
    ax.set_title(title, fontsize=10)
    ax.set_xlabel("Stability")
    ax.set_xlim(0, 1)
    mean_val = nonzero.mean() if len(nonzero) > 0 else 0
    ax.axvline(mean_val, color="red", linestyle="--", label=f"mean = {mean_val:.2f}")
    ax.legend(fontsize=8)
axes[0].set_ylabel("Voxel count")
fig.suptitle("Distribution of non-zero stability values across resampling policies", fontsize=13)
fig.tight_layout()
plt.show()

Distribution of non-zero stability values across resampling policies, Leave-one-out, Leave-3-out (20 resamples), Subsample n=15 (20 resamples)

Baseline, Jackknife clusters, and stability side by side

Placing the corrected z-map, the Jackknife cluster-label map, and the leave-one-out stability map on the same axial slices makes it easy to see whether the regions identified as clusters are also the regions with the highest voxelwise stability. When they agree, the result is doubly supported. When the stability map is patchy or low inside a cluster boundary, that cluster deserves more scrutiny.

fig, axes = plt.subplots(3, 1, figsize=(14, 11))

plot_stat_map(
    result.get_map(TARGET_IMAGE),
    cut_coords=5,
    display_mode="z",
    title="ALE — cluster-level FWE corrected z-map (baseline)",
    threshold=1.65,
    cmap="RdBu_r",
    symmetric_cbar=True,
    vmax=5,
    axes=axes[0],
    figure=fig,
)

if contrib_df is not None and not contrib_df.empty:
    label_key = f"label_{TARGET_IMAGE}_tail-positive"
    if label_key in result_jk.maps:
        plot_stat_map(
            result_jk.get_map(label_key),
            cut_coords=5,
            display_mode="z",
            title="Jackknife — cluster label map (colour = cluster ID)",
            threshold=0.5,
            cmap="Set1",
            symmetric_cbar=False,
            axes=axes[1],
            figure=fig,
        )
    else:
        axes[1].set_title("Jackknife label map not available")
else:
    axes[1].set_title("No clusters found for Jackknife")

plot_stat_map(
    result_loo.get_map(stability_key),
    cut_coords=5,
    display_mode="z",
    title="ResampledStability — leave-one-out voxelwise stability (0–1)",
    threshold=0.1,
    vmin=0,
    vmax=1,
    cmap="hot",
    symmetric_cbar=False,
    axes=axes[2],
    figure=fig,
)

fig.tight_layout()
plt.show()

Numerical stability summary across policies

The table below shows how many voxels survive at three stability thresholds (> 0, ≥ 0.5, ≥ 0.8) under each resampling policy. A strict threshold of 0.8 retains only the voxels that survived thresholding in at least 80 % of resamples — a reasonable bar for high-confidence reporting.

rows = []
for res, label in configs:
    stab = res.get_map(stability_key, return_type="array")
    nonzero = stab[stab > 0]
    rows.append(
        {
            "Policy": label,
            "N replicates": int(
                res.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"][
                    "n_resamples"
                ].iloc[0]
            ),
            "Stable voxels (>0)": int(len(nonzero)),
            "Stable voxels (≥0.5)": int((stab >= 0.5).sum()),
            "Stable voxels (≥0.8)": int((stab >= 0.8).sum()),
            "Mean stability (nonzero)": (
                round(float(nonzero.mean()), 3) if len(nonzero) > 0 else 0.0
            ),
        }
    )

pd.DataFrame(rows).set_index("Policy")

	N replicates	Stable voxels (>0)	Stable voxels (≥0.5)	Stable voxels (≥0.8)	Mean stability (nonzero)
Policy
Leave-one-out	21	2344	2343	2083	0.896
Leave-3-out (20 resamples)	20	2344	1894	958	0.691
Subsample n=15 (20 resamples)	20	2305	991	353	0.450

Key differences at a glance

Feature	Jackknife	ResampledStability
Question answered	Which studies drive each cluster?	How reliably does each voxel survive thresholding?
Output granularity	Study × cluster (one scalar per pair)	Voxelwise map (one value per brain voxel)
Output range	0–1 (proportional contribution)	0–1 (proportion of resamples surviving threshold)
Number of estimator refits	N (one per study)	`n_resamples`
Resampling policy	Fixed: leave-one-out	Choice: leave_1_out / leave_k_out / subsample
Works with pairwise estimators?	Yes (v 0.1.2+)	No
Null distribution rebuilt per replicate?	No (fast path for CBMA)	Depends on policy (subsample rebuilds it)
Typical compute cost	O(N) estimator fits	O(n_resamples) estimator fits
Default in CBMAWorkflow / IBMAWorkflow?	Yes	No
Primary use case	Influence and outlier detection	Spatial reliability for publication figures

When to use Jackknife

Use Jackknife whenever you want to know which studies are responsible for a significant cluster. It is the right first diagnostic in almost every meta-analysis:

It is fast — N refits where N is the study count, without rebuilding the null distribution.
It is interpretable — reviewers can cross-check high-contribution studies against the original publications.
It works with all single-sample and pairwise estimators in NiMARE.
It runs automatically inside CBMAWorkflow and IBMAWorkflow.

If any study shows a contribution > 0.8 in a cluster, inspect it carefully: unusual coordinate densities, atypical sample sizes, or duplicate peaks from the same laboratory are common culprits.

When to use ResampledStability

Use ResampledStability when you need a spatially explicit reliability map — for example to include in a supplementary figure, to compare the robustness of two estimators, or to flag voxels at the edges of clusters that may not be trustworthy.

Choose "leave_1_out" for small datasets (< 25 studies) where each study carries substantial weight.
Choose "leave_k_out" for larger datasets (> 30 studies).
Choose "subsample" for larger datasets (> 30 studies) or when you want to quantify what fraction of the result survives at a reduced sample size (e.g. target_n = int(0.75 * n_studies)).

A practical benchmark: report mean stability per cluster; flag clusters with mean stability < 0.5 under "leave_1_out" as potentially unreliable even if they survived FWE correction.

Recommended workflow

Run Jackknife (or use CBMAWorkflow which runs it automatically) to identify influential studies.
For publication, add ResampledStability with "leave_1_out" (small datasets) or "subsample" (large datasets) and include the stability map as a supplementary figure.
Interpret the two diagnostics together: a cluster that is both driven by a single study (high Jackknife contribution) and spatially unstable (low ResampledStability) should be downgraded in confidence or omitted.

References

Total running time of the script: (0 minutes 23.384 seconds)

Gallery generated by Sphinx-Gallery