.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/02_meta-analyses/16_plot_jackknife_vs_resampled_stability.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_02_meta-analyses_16_plot_jackknife_vs_resampled_stability.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_02_meta-analyses_16_plot_jackknife_vs_resampled_stability.py:


.. _metas_jackknife_vs_resampled_stability:

=======================================================
Stability diagnostics: Jackknife vs. ResampledStability
=======================================================

Once a meta-analysis has been run and a thresholded result is in hand, the natural next
question is: *how much should we trust it?*  NiMARE provides two post-hoc diagnostics
that approach this question from complementary angles.

:class:`~nimare.diagnostics.Jackknife` asks *"which studies are responsible for each
significant cluster?"*  It loops through every study, refits the meta-analysis without
that study, and reports how much the cluster-level statistics drop — a high contribution
score means the cluster depends heavily on a single experiment.

:class:`~nimare.diagnostics.ResampledStability` asks *"how reproducibly does each brain
voxel survive thresholding when we perturb the study set?"*  It generates many resampled
versions of the dataset, refits the full pipeline on each, and averages the binary
significant/not-significant outcome across resamples — yielding a voxelwise stability map
between 0 (never significant) and 1 (always significant).

The two diagnostics measure different things and have different computational costs:

* **Jackknife** is fast (N refits for N studies), cluster-level, and study-level — ideal
  as a first robustness check built into every workflow.
* **ResampledStability** is spatially explicit, voxelwise, and policy-flexible — better
  suited for publication-quality robustness figures and large-dataset analyses.

This example runs both on the same ALE result so you can compare their outputs directly.

.. note::

   For real analyses use ``n_iters ≥ 5000`` for the FWE corrector and
   ``n_resamples ≥ 100`` for :class:`~nimare.diagnostics.ResampledStability`.
   We use small counts here purely for documentation-build speed.

.. GENERATED FROM PYTHON SOURCE LINES 41-43

Imports and constants
-----------------------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 43-63

.. code-block:: Python

    import copy
    import os
    import warnings

    import matplotlib.pyplot as plt
    import pandas as pd
    from nilearn.plotting import plot_stat_map

    from nimare.correct import FWECorrector
    from nimare.diagnostics import Jackknife, ResampledStability
    from nimare.meta.cbma.ale import ALE
    from nimare.nimads import Studyset
    from nimare.utils import get_resource_path

    warnings.filterwarnings("ignore")

    N_ITERS = 50       # increase to ≥5000 for real analyses
    N_RESAMPLES = 20   # increase to ≥100 for real analyses
    RANDOM_STATE = 42


.. GENERATED FROM PYTHON SOURCE LINES 64-74

Load data and fit the baseline ALE meta-analysis
-----------------------------------------------------------------------------
We use the NiMARE pain dataset (21 studies, MNI 2 mm) throughout this
example.  Both diagnostics operate on an already-fitted
:class:`~nimare.results.MetaResult`, so we run ALE and apply a Monte Carlo
FWE corrector once and then reuse that result for each diagnostic.

The cluster-level corrected z-map is our primary target image — the one that
determines which voxels are "significant" and therefore which clusters the
diagnostics evaluate.

.. GENERATED FROM PYTHON SOURCE LINES 74-86

.. code-block:: Python

    studyset_file = os.path.join(get_resource_path(), "nidm_pain_studyset.json")
    studyset = Studyset(studyset_file, target="mni152_2mm")
    print(f"Number of studies: {len(studyset.studies)}")

    ale = ALE()
    result = ale.fit(studyset)

    corrector = FWECorrector(method="montecarlo", n_iters=N_ITERS, n_cores=1)
    result = corrector.transform(result)

    TARGET_IMAGE = "z_desc-size_level-cluster_corr-FWE_method-montecarlo"


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Number of studies: 21
      0%|          | 0/50 [00:00<?, ?it/s]      4%|▍         | 2/50 [00:00<00:02, 17.94it/s]      8%|▊         | 4/50 [00:00<00:02, 17.70it/s]     12%|█▏        | 6/50 [00:00<00:02, 17.60it/s]     16%|█▌        | 8/50 [00:00<00:02, 17.57it/s]     20%|██        | 10/50 [00:00<00:02, 17.61it/s]     24%|██▍       | 12/50 [00:00<00:02, 17.50it/s]     28%|██▊       | 14/50 [00:00<00:02, 17.57it/s]     32%|███▏      | 16/50 [00:00<00:01, 17.60it/s]     36%|███▌      | 18/50 [00:01<00:01, 17.65it/s]     40%|████      | 20/50 [00:01<00:01, 17.73it/s]     44%|████▍     | 22/50 [00:01<00:01, 17.76it/s]     48%|████▊     | 24/50 [00:01<00:01, 17.68it/s]     52%|█████▏    | 26/50 [00:01<00:01, 17.76it/s]     56%|█████▌    | 28/50 [00:01<00:01, 17.70it/s]     60%|██████    | 30/50 [00:01<00:01, 17.72it/s]     64%|██████▍   | 32/50 [00:01<00:01, 17.84it/s]     68%|██████▊   | 34/50 [00:01<00:00, 17.90it/s]     72%|███████▏  | 36/50 [00:02<00:00, 17.89it/s]     76%|███████▌  | 38/50 [00:02<00:00, 17.92it/s]     80%|████████  | 40/50 [00:02<00:00, 17.88it/s]     84%|████████▍ | 42/50 [00:02<00:00, 17.92it/s]     88%|████████▊ | 44/50 [00:02<00:00, 17.95it/s]     92%|█████████▏| 46/50 [00:02<00:00, 17.95it/s]     96%|█████████▌| 48/50 [00:02<00:00, 17.99it/s]    100%|██████████| 50/50 [00:02<00:00, 18.07it/s]    100%|██████████| 50/50 [00:02<00:00, 17.80it/s]


.. GENERATED FROM PYTHON SOURCE LINES 87-91

Baseline corrected result
-----------------------------------------------------------------------------
Before running any diagnostics, we visualise the cluster-level FWE-corrected
z-map.  This is the map that the diagnostics will characterise.

.. GENERATED FROM PYTHON SOURCE LINES 91-103

.. code-block:: Python

    plot_stat_map(
        result.get_map(TARGET_IMAGE),
        cut_coords=5,
        display_mode="z",
        title="ALE — cluster-level FWE corrected z-map (baseline)",
        threshold=1.65,
        cmap="RdBu_r",
        symmetric_cbar=True,
        vmax=5,
    )
    plt.show()


.. image-sg:: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_001.png
   :alt: 16 plot jackknife vs resampled stability
   :srcset: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 104-124

Jackknife: identifying influential studies
-----------------------------------------------------------------------------
:class:`~nimare.diagnostics.Jackknife` characterises each study's
**proportional contribution** to every significant cluster.

For each study *i* the algorithm:

1. Refits the estimator on all remaining N − 1 studies.
2. Computes the proportional reduction in the summary statistic at each
   voxel: ``1 − (stat_{−i} / stat_full)``.
3. Averages across all voxels in the cluster.

A contribution near **1** means removing study *i* substantially reduces the
cluster statistic — that study is a strong driver.  A contribution near **0**
means the cluster survives almost unchanged without it.

Because Jackknife only recomputes the summary statistic (not the full Monte
Carlo null distribution) for each N − 1 dataset, it is comparatively fast.
It is the default diagnostic in :class:`~nimare.workflows.cbma.CBMAWorkflow`
and :class:`~nimare.workflows.ibma.IBMAWorkflow`.

.. GENERATED FROM PYTHON SOURCE LINES 124-127

.. code-block:: Python

    jackknife = Jackknife(target_image=TARGET_IMAGE, n_cores=1)
    result_jk = jackknife.transform(copy.deepcopy(result))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

      0%|          | 0/21 [00:00<?, ?it/s]      5%|▍         | 1/21 [00:00<00:05,  3.62it/s]     10%|▉         | 2/21 [00:00<00:05,  3.61it/s]     14%|█▍        | 3/21 [00:00<00:04,  3.60it/s]     19%|█▉        | 4/21 [00:01<00:04,  3.60it/s]     24%|██▍       | 5/21 [00:01<00:04,  3.57it/s]     29%|██▊       | 6/21 [00:01<00:04,  3.52it/s]     33%|███▎      | 7/21 [00:01<00:03,  3.51it/s]     38%|███▊      | 8/21 [00:02<00:03,  3.53it/s]     43%|████▎     | 9/21 [00:02<00:03,  3.55it/s]     48%|████▊     | 10/21 [00:02<00:03,  3.55it/s]     52%|█████▏    | 11/21 [00:03<00:02,  3.54it/s]     57%|█████▋    | 12/21 [00:03<00:02,  3.56it/s]     62%|██████▏   | 13/21 [00:03<00:02,  3.56it/s]     67%|██████▋   | 14/21 [00:03<00:01,  3.54it/s]     71%|███████▏  | 15/21 [00:04<00:01,  3.55it/s]     76%|███████▌  | 16/21 [00:04<00:01,  3.56it/s]     81%|████████  | 17/21 [00:04<00:01,  3.57it/s]     86%|████████▌ | 18/21 [00:05<00:00,  3.57it/s]     90%|█████████ | 19/21 [00:05<00:00,  3.58it/s]     95%|█████████▌| 20/21 [00:05<00:00,  3.58it/s]    100%|██████████| 21/21 [00:05<00:00,  3.57it/s]    100%|██████████| 21/21 [00:05<00:00,  3.56it/s]


.. GENERATED FROM PYTHON SOURCE LINES 128-132

Jackknife clusters table
``````````````````````````````````````````````````````````````````````````````
The clusters table summarises the location and size of each significant
cluster.  Peak statistic and MNI centre of mass are reported for each.

.. GENERATED FROM PYTHON SOURCE LINES 132-135

.. code-block:: Python

    clust_key = f"{TARGET_IMAGE}_tab-clust"
    result_jk.tables[clust_key]


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Cluster ID</th>
          <th>X</th>
          <th>Y</th>
          <th>Z</th>
          <th>Peak Stat</th>
          <th>Cluster Size (mm3)</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>PositiveTail 1</td>
          <td>-58.0</td>
          <td>-26.0</td>
          <td>22.0</td>
          <td>2.053749</td>
          <td>1584</td>
        </tr>
        <tr>
          <th>1</th>
          <td>PositiveTail 2</td>
          <td>-32.0</td>
          <td>-62.0</td>
          <td>-38.0</td>
          <td>2.053749</td>
          <td>2632</td>
        </tr>
        <tr>
          <th>2</th>
          <td>PositiveTail 3</td>
          <td>-34.0</td>
          <td>14.0</td>
          <td>0.0</td>
          <td>2.053749</td>
          <td>1160</td>
        </tr>
        <tr>
          <th>3</th>
          <td>PositiveTail 4</td>
          <td>0.0</td>
          <td>8.0</td>
          <td>46.0</td>
          <td>2.053749</td>
          <td>5320</td>
        </tr>
        <tr>
          <th>4</th>
          <td>PositiveTail 5</td>
          <td>38.0</td>
          <td>8.0</td>
          <td>-2.0</td>
          <td>2.053749</td>
          <td>6488</td>
        </tr>
        <tr>
          <th>5</th>
          <td>PositiveTail 6</td>
          <td>54.0</td>
          <td>-26.0</td>
          <td>20.0</td>
          <td>2.053749</td>
          <td>1608</td>
        </tr>
        <tr>
          <th>6</th>
          <td>PositiveTail 7</td>
          <td>18.0</td>
          <td>-102.0</td>
          <td>-4.0</td>
          <td>0.100434</td>
          <td>480</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 136-143

Jackknife study-contribution table
``````````````````````````````````````````````````````````````````````````````
Each row is a study; each column is a cluster.  Cell values are the mean
proportional contribution of that study to that cluster.  High values
(e.g. > 0.8) flag studies whose removal would substantially weaken a cluster
— worth inspecting for outlier coordinates, inflated sample sizes, or
duplicate peaks.

.. GENERATED FROM PYTHON SOURCE LINES 143-147

.. code-block:: Python

    contrib_key = f"{TARGET_IMAGE}_diag-Jackknife_tab-counts_tail-positive"
    contrib_df = result_jk.tables.get(contrib_key)
    contrib_df


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>id</th>
          <th>PositiveTail 1</th>
          <th>PositiveTail 2</th>
          <th>PositiveTail 3</th>
          <th>PositiveTail 4</th>
          <th>PositiveTail 5</th>
          <th>PositiveTail 6</th>
          <th>PositiveTail 7</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>pain_01.nidm-1</td>
          <td>0.0</td>
          <td>0.073695</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.032443</td>
          <td>0.0</td>
          <td>0.058134</td>
        </tr>
        <tr>
          <th>1</th>
          <td>pain_02.nidm-1</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.000024</td>
          <td>0.0</td>
          <td>0.000006</td>
          <td>0.142421</td>
          <td>0.000014</td>
        </tr>
        <tr>
          <th>2</th>
          <td>pain_03.nidm-1</td>
          <td>0.0</td>
          <td>0.109131</td>
          <td>0.0</td>
          <td>0.134174</td>
          <td>0.023071</td>
          <td>0.0</td>
          <td>0.385138</td>
        </tr>
        <tr>
          <th>3</th>
          <td>pain_04.nidm-1</td>
          <td>0.0</td>
          <td>0.076218</td>
          <td>0.184233</td>
          <td>0.113124</td>
          <td>0.079341</td>
          <td>0.231365</td>
          <td>0.395319</td>
        </tr>
        <tr>
          <th>4</th>
          <td>pain_05.nidm-1</td>
          <td>0.0</td>
          <td>0.12414</td>
          <td>0.037756</td>
          <td>0.083693</td>
          <td>0.093972</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>5</th>
          <td>pain_06.nidm-1</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.000057</td>
          <td>0.039228</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>6</th>
          <td>pain_07.nidm-1</td>
          <td>0.0</td>
          <td>0.015453</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>7</th>
          <td>pain_08.nidm-1</td>
          <td>0.0</td>
          <td>0.11783</td>
          <td>0.0</td>
          <td>0.042313</td>
          <td>0.000005</td>
          <td>0.14787</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>8</th>
          <td>pain_09.nidm-1</td>
          <td>0.0</td>
          <td>0.058825</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.001745</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>9</th>
          <td>pain_10.nidm-1</td>
          <td>0.0557</td>
          <td>0.11594</td>
          <td>0.0</td>
          <td>0.020533</td>
          <td>0.075745</td>
          <td>0.139293</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>10</th>
          <td>pain_11.nidm-1</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>11</th>
          <td>pain_12.nidm-1</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.195963</td>
          <td>0.0</td>
          <td>0.053017</td>
          <td>0.120445</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>12</th>
          <td>pain_13.nidm-1</td>
          <td>0.170124</td>
          <td>0.0</td>
          <td>0.173871</td>
          <td>0.0</td>
          <td>0.120761</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>13</th>
          <td>pain_14.nidm-1</td>
          <td>0.0</td>
          <td>0.118262</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.069786</td>
          <td>0.0</td>
          <td>0.157469</td>
        </tr>
        <tr>
          <th>14</th>
          <td>pain_15.nidm-1</td>
          <td>0.136875</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.057199</td>
          <td>0.035299</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>15</th>
          <td>pain_16.nidm-1</td>
          <td>0.100759</td>
          <td>0.000326</td>
          <td>0.189962</td>
          <td>0.058444</td>
          <td>0.051459</td>
          <td>0.100128</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>16</th>
          <td>pain_17.nidm-1</td>
          <td>0.214114</td>
          <td>0.080555</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>17</th>
          <td>pain_18.nidm-1</td>
          <td>0.207505</td>
          <td>0.0</td>
          <td>0.000324</td>
          <td>0.0</td>
          <td>0.094171</td>
          <td>0.11225</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>18</th>
          <td>pain_19.nidm-1</td>
          <td>0.0</td>
          <td>0.104601</td>
          <td>0.0</td>
          <td>0.155254</td>
          <td>0.131266</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>19</th>
          <td>pain_20.nidm-1</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.0</td>
          <td>0.113222</td>
          <td>0.086047</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>20</th>
          <td>pain_21.nidm-1</td>
          <td>0.11048</td>
          <td>0.0</td>
          <td>0.211829</td>
          <td>0.17764</td>
          <td>0.045384</td>
          <td>0.0</td>
          <td>0.0</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 148-153

Visualise study contributions as a heatmap
``````````````````````````````````````````````````````````````````````````````
A heatmap makes it easy to spot studies that dominate one or more clusters
and studies that contribute little across the board.  If a single row shows
consistently high values, the corresponding study warrants closer scrutiny.

.. GENERATED FROM PYTHON SOURCE LINES 153-185

.. code-block:: Python

    if contrib_df is not None and not contrib_df.empty:
        contrib_values = contrib_df.apply(pd.to_numeric, errors="coerce").fillna(0.0)
        fig, ax = plt.subplots(
            figsize=(
                max(6, len(contrib_values.columns) * 1.2),
                max(5, len(contrib_values) * 0.35),
            )
        )
        im = ax.imshow(
            contrib_values.to_numpy(dtype=float),
            aspect="auto",
            cmap="YlOrRd",
            vmin=0,
            vmax=1,
        )
        ax.set_xticks(range(len(contrib_values.columns)))
        ax.set_xticklabels(contrib_values.columns, rotation=45, ha="right", fontsize=9)
        ax.set_yticks(range(len(contrib_values)))
        ax.set_yticklabels(contrib_values.index, fontsize=7)
        ax.set_xlabel("Cluster", fontsize=11)
        ax.set_ylabel("Study", fontsize=11)
        ax.set_title("Jackknife: proportional contribution per study × cluster", fontsize=12)
        plt.colorbar(im, ax=ax, label="Contribution (0 = none, 1 = complete)")
        plt.tight_layout()
        plt.show()

        mean_contrib = contrib_values.mean(axis=1).sort_values(ascending=False)
        print("Mean contribution across all clusters (top 10):")
        print(mean_contrib.head(10).to_string())
    else:
        print("No clusters found — increase N_ITERS or lower the cluster threshold.")


.. image-sg:: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_002.png
   :alt: Jackknife: proportional contribution per study × cluster
   :srcset: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Mean contribution across all clusters (top 10):
    3     0.134950
    2     0.081439
    20    0.068167
    15    0.062635
    12    0.058094
    17    0.051781
    9     0.050901
    18    0.048890
    11    0.046178
    13    0.043190


.. GENERATED FROM PYTHON SOURCE LINES 186-217

ResampledStability: voxelwise reproducibility under resampling
-----------------------------------------------------------------------------
:class:`~nimare.diagnostics.ResampledStability` estimates how reliably each
voxel survives thresholding when the composition of the study set changes.

For each replicate the algorithm:

1. Draws a subset of studies according to the chosen ``resampling_policy``.
2. Refits the full estimator (and corrector, if the target image requires it)
   on the subset.
3. Thresholds the result and records a binary support map
   (1 = significant, 0 = not).

The final stability map is the **mean binary support across all replicates**.
A stability of 1 means the voxel survived thresholding in every replicate;
a stability of 0 means it never survived.

Three resampling policies are available:

* ``"leave_1_out"`` — omit exactly one study per replicate; deterministic;
  generates N replicates for N studies.
* ``"leave_k_out"`` — omit k studies per replicate; useful for testing
  sensitivity to blocks of studies.
* ``"subsample"`` — random subsamples of ``target_n`` studies; flexible and
  recommended for large datasets (> 30 studies).

Unlike Jackknife, ResampledStability does **not** identify which study is
responsible for instability — it only tells you *where* the result is stable.
The two diagnostics are therefore complementary: run Jackknife first to flag
influential studies, then add ResampledStability to document spatial
reliability for publication.

.. GENERATED FROM PYTHON SOURCE LINES 219-226

Leave-one-out stability
``````````````````````````````````````````````````````````````````````````````
``"leave_1_out"`` is the most conservative policy: it drops exactly one study
per replicate, giving N deterministic replicates.  Because every study is
omitted exactly once, the result is fully reproducible without a random seed.
This policy is recommended for small datasets (< 25 studies) where removing a
single study can substantially change the analysis.

.. GENERATED FROM PYTHON SOURCE LINES 226-236

.. code-block:: Python

    rs_loo = ResampledStability(
        target_image=TARGET_IMAGE,
        resampling_policy="leave_1_out",
        n_cores=1,
    )
    result_loo = rs_loo.transform(copy.deepcopy(result))

    print("Leave-one-out summary:")
    print(result_loo.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

      0%|          | 0/21 [00:00<?, ?it/s]     24%|██▍       | 5/21 [00:00<00:00, 45.08it/s]     48%|████▊     | 10/21 [00:00<00:00, 44.71it/s]     71%|███████▏  | 15/21 [00:00<00:00, 44.92it/s]     95%|█████████▌| 20/21 [00:00<00:00, 45.12it/s]    100%|██████████| 21/21 [00:00<00:00, 45.06it/s]
    Leave-one-out summary:
                                            target_image  ... random_state
    0  z_desc-size_level-cluster_corr-FWE_method-mont...  ...         None

    [1 rows x 6 columns]


.. GENERATED FROM PYTHON SOURCE LINES 237-244

Leave-k-out stability
``````````````````````````````````````````````````````````````````````````````
``"leave_k_out"`` omits k studies per replicate.  This tests whether the
result is stable under a more substantial perturbation than leaving one study
out — useful when you suspect that a *block* of methodologically similar
studies (e.g. the same lab, the same paradigm) may be jointly driving a
cluster.  Set ``n_resamples`` to control how many random subsets are drawn.

.. GENERATED FROM PYTHON SOURCE LINES 244-257

.. code-block:: Python

    rs_lko = ResampledStability(
        target_image=TARGET_IMAGE,
        resampling_policy="leave_k_out",
        k=3,
        n_resamples=N_RESAMPLES,
        random_state=RANDOM_STATE,
        n_cores=1,
    )
    result_lko = rs_lko.transform(copy.deepcopy(result))

    print(f"Leave-{rs_lko.k}-out summary:")
    print(result_lko.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

      0%|          | 0/20 [00:00<?, ?it/s]     25%|██▌       | 5/20 [00:00<00:00, 47.34it/s]     50%|█████     | 10/20 [00:00<00:00, 47.40it/s]     75%|███████▌  | 15/20 [00:00<00:00, 47.42it/s]    100%|██████████| 20/20 [00:00<00:00, 47.28it/s]    100%|██████████| 20/20 [00:00<00:00, 47.29it/s]
    Leave-3-out summary:
                                            target_image  ... random_state
    0  z_desc-size_level-cluster_corr-FWE_method-mont...  ...           42

    [1 rows x 6 columns]


.. GENERATED FROM PYTHON SOURCE LINES 258-266

Subsample stability
``````````````````````````````````````````````````````````````````````````````
``"subsample"`` draws random subsets of ``target_n`` studies.  The subsetting
fraction controls the stringency: subsampling at 50 % is a harder test than
90 %.  A common choice is 75–80 % of the available studies, which retains
enough power to detect real effects while exposing sensitivity to individual
studies.  This policy is the most general and is recommended for larger
datasets where exhaustive leave-k-out enumeration would be prohibitive.

.. GENERATED FROM PYTHON SOURCE LINES 266-282

.. code-block:: Python

    n_studies = len(studyset.studies)
    target_n = max(3, int(n_studies * 0.75))  # 75 % of available studies

    rs_sub = ResampledStability(
        target_image=TARGET_IMAGE,
        resampling_policy="subsample",
        target_n=target_n,
        n_resamples=N_RESAMPLES,
        random_state=RANDOM_STATE,
        n_cores=1,
    )
    result_sub = rs_sub.transform(copy.deepcopy(result))

    print(f"Subsample (n={target_n}) summary:")
    print(result_sub.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

      0%|          | 0/20 [00:00<?, ?it/s]     25%|██▌       | 5/20 [00:00<00:00, 46.86it/s]     55%|█████▌    | 11/20 [00:00<00:00, 49.10it/s]     85%|████████▌ | 17/20 [00:00<00:00, 50.06it/s]    100%|██████████| 20/20 [00:00<00:00, 49.63it/s]
    Subsample (n=15) summary:
                                            target_image  ... random_state
    0  z_desc-size_level-cluster_corr-FWE_method-mont...  ...           42

    [1 rows x 6 columns]


.. GENERATED FROM PYTHON SOURCE LINES 283-289

Stability maps: three policies side by side
``````````````````````````````````````````````````````````````````````````````
All three policies yield a voxelwise stability map on the same 0–1 scale.
Comparing them reveals whether spatial reliability is preserved as the
perturbation grows stronger (leave-1-out → leave-3-out → subsample 75 %).
Voxels that remain stable across all three policies are the most trustworthy.

.. GENERATED FROM PYTHON SOURCE LINES 289-314

.. code-block:: Python

    stability_key = f"{TARGET_IMAGE}_diag-ResampledStability"
    configs = [
        (result_loo, "Leave-one-out"),
        (result_lko, f"Leave-{rs_lko.k}-out ({N_RESAMPLES} resamples)"),
        (result_sub, f"Subsample n={target_n} ({N_RESAMPLES} resamples)"),
    ]

    fig, axes = plt.subplots(len(configs), 1, figsize=(14, 4 * len(configs)))
    for ax, (res, title) in zip(axes, configs):
        plot_stat_map(
            res.get_map(stability_key),
            cut_coords=5,
            display_mode="z",
            title=f"Stability — {title}",
            threshold=0.1,
            vmin=0,
            vmax=1,
            cmap="hot",
            symmetric_cbar=False,
            axes=ax,
            figure=fig,
        )
    fig.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_003.png
   :alt: 16 plot jackknife vs resampled stability
   :srcset: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 315-321

Distribution of stability values
``````````````````````````````````````````````````````````````````````````````
Plotting the distribution of non-zero stability values across all three
policies shows how the overall robustness of the result changes as the
perturbation grows.  A distribution concentrated near 1 indicates a stable
result; a broad or left-skewed distribution signals spatial fragility.

.. GENERATED FROM PYTHON SOURCE LINES 321-337

.. code-block:: Python

    fig, axes = plt.subplots(1, len(configs), figsize=(14, 4), sharey=True)
    for ax, (res, title) in zip(axes, configs):
        stab = res.get_map(stability_key, return_type="array")
        nonzero = stab[stab > 0]
        ax.hist(nonzero, bins=20, range=(0, 1), color="steelblue", edgecolor="white")
        ax.set_title(title, fontsize=10)
        ax.set_xlabel("Stability")
        ax.set_xlim(0, 1)
        mean_val = nonzero.mean() if len(nonzero) > 0 else 0
        ax.axvline(mean_val, color="red", linestyle="--", label=f"mean = {mean_val:.2f}")
        ax.legend(fontsize=8)
    axes[0].set_ylabel("Voxel count")
    fig.suptitle("Distribution of non-zero stability values across resampling policies", fontsize=13)
    fig.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_004.png
   :alt: Distribution of non-zero stability values across resampling policies, Leave-one-out, Leave-3-out (20 resamples), Subsample n=15 (20 resamples)
   :srcset: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 338-346

Baseline, Jackknife clusters, and stability side by side
-----------------------------------------------------------------------------
Placing the corrected z-map, the Jackknife cluster-label map, and the
leave-one-out stability map on the same axial slices makes it easy to see
whether the regions identified as clusters are also the regions with the
highest voxelwise stability.  When they agree, the result is doubly
supported.  When the stability map is patchy or low inside a cluster boundary,
that cluster deserves more scrutiny.

.. GENERATED FROM PYTHON SOURCE LINES 346-397

.. code-block:: Python

    fig, axes = plt.subplots(3, 1, figsize=(14, 11))

    plot_stat_map(
        result.get_map(TARGET_IMAGE),
        cut_coords=5,
        display_mode="z",
        title="ALE — cluster-level FWE corrected z-map (baseline)",
        threshold=1.65,
        cmap="RdBu_r",
        symmetric_cbar=True,
        vmax=5,
        axes=axes[0],
        figure=fig,
    )

    if contrib_df is not None and not contrib_df.empty:
        label_key = f"label_{TARGET_IMAGE}_tail-positive"
        if label_key in result_jk.maps:
            plot_stat_map(
                result_jk.get_map(label_key),
                cut_coords=5,
                display_mode="z",
                title="Jackknife — cluster label map (colour = cluster ID)",
                threshold=0.5,
                cmap="Set1",
                symmetric_cbar=False,
                axes=axes[1],
                figure=fig,
            )
        else:
            axes[1].set_title("Jackknife label map not available")
    else:
        axes[1].set_title("No clusters found for Jackknife")

    plot_stat_map(
        result_loo.get_map(stability_key),
        cut_coords=5,
        display_mode="z",
        title="ResampledStability — leave-one-out voxelwise stability (0–1)",
        threshold=0.1,
        vmin=0,
        vmax=1,
        cmap="hot",
        symmetric_cbar=False,
        axes=axes[2],
        figure=fig,
    )

    fig.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_005.png
   :alt: Jackknife label map not available
   :srcset: /auto_examples/02_meta-analyses/images/sphx_glr_16_plot_jackknife_vs_resampled_stability_005.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 398-404

Numerical stability summary across policies
-----------------------------------------------------------------------------
The table below shows how many voxels survive at three stability thresholds
(> 0, ≥ 0.5, ≥ 0.8) under each resampling policy.  A strict threshold of 0.8
retains only the voxels that survived thresholding in at least 80 % of
resamples — a reasonable bar for high-confidence reporting.

.. GENERATED FROM PYTHON SOURCE LINES 404-427

.. code-block:: Python

    rows = []
    for res, label in configs:
        stab = res.get_map(stability_key, return_type="array")
        nonzero = stab[stab > 0]
        rows.append(
            {
                "Policy": label,
                "N replicates": int(
                    res.tables[f"{TARGET_IMAGE}_diag-ResampledStability_tab-summary"][
                        "n_resamples"
                    ].iloc[0]
                ),
                "Stable voxels (>0)": int(len(nonzero)),
                "Stable voxels (≥0.5)": int((stab >= 0.5).sum()),
                "Stable voxels (≥0.8)": int((stab >= 0.8).sum()),
                "Mean stability (nonzero)": (
                    round(float(nonzero.mean()), 3) if len(nonzero) > 0 else 0.0
                ),
            }
        )

    pd.DataFrame(rows).set_index("Policy")


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>N replicates</th>
          <th>Stable voxels (&gt;0)</th>
          <th>Stable voxels (≥0.5)</th>
          <th>Stable voxels (≥0.8)</th>
          <th>Mean stability (nonzero)</th>
        </tr>
        <tr>
          <th>Policy</th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Leave-one-out</th>
          <td>21</td>
          <td>2344</td>
          <td>2343</td>
          <td>2083</td>
          <td>0.896</td>
        </tr>
        <tr>
          <th>Leave-3-out (20 resamples)</th>
          <td>20</td>
          <td>2344</td>
          <td>1894</td>
          <td>958</td>
          <td>0.691</td>
        </tr>
        <tr>
          <th>Subsample n=15 (20 resamples)</th>
          <td>20</td>
          <td>2305</td>
          <td>991</td>
          <td>353</td>
          <td>0.450</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 428-516

Key differences at a glance
-----------------------------------------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Feature
     - Jackknife
     - ResampledStability
   * - Question answered
     - Which studies drive each cluster?
     - How reliably does each voxel survive thresholding?
   * - Output granularity
     - Study × cluster (one scalar per pair)
     - Voxelwise map (one value per brain voxel)
   * - Output range
     - 0–1 (proportional contribution)
     - 0–1 (proportion of resamples surviving threshold)
   * - Number of estimator refits
     - N (one per study)
     - ``n_resamples``
   * - Resampling policy
     - Fixed: leave-one-out
     - Choice: leave_1_out / leave_k_out / subsample
   * - Works with pairwise estimators?
     - Yes (v 0.1.2+)
     - No
   * - Null distribution rebuilt per replicate?
     - No (fast path for CBMA)
     - Depends on policy (subsample rebuilds it)
   * - Typical compute cost
     - O(N) estimator fits
     - O(n_resamples) estimator fits
   * - Default in CBMAWorkflow / IBMAWorkflow?
     - Yes
     - No
   * - Primary use case
     - Influence and outlier detection
     - Spatial reliability for publication figures

**When to use Jackknife**

Use :class:`~nimare.diagnostics.Jackknife` whenever you want to know which
studies are responsible for a significant cluster.  It is the right first
diagnostic in almost every meta-analysis:

* It is **fast** — N refits where N is the study count, without rebuilding
  the null distribution.
* It is **interpretable** — reviewers can cross-check high-contribution
  studies against the original publications.
* It works with **all single-sample and pairwise estimators** in NiMARE.
* It runs **automatically** inside :class:`~nimare.workflows.cbma.CBMAWorkflow`
  and :class:`~nimare.workflows.ibma.IBMAWorkflow`.

If any study shows a contribution > 0.8 in a cluster, inspect it carefully:
unusual coordinate densities, atypical sample sizes, or duplicate peaks from
the same laboratory are common culprits.

**When to use ResampledStability**

Use :class:`~nimare.diagnostics.ResampledStability` when you need a
spatially explicit reliability map — for example to include in a
supplementary figure, to compare the robustness of two estimators, or to
flag voxels at the edges of clusters that may not be trustworthy.

* Choose ``"leave_1_out"`` for **small datasets** (< 25 studies) where each
  study carries substantial weight.
* Choose ``"leave_k_out"`` for **larger datasets** (> 30 studies).
* Choose ``"subsample"`` for **larger datasets** (> 30 studies) or when you
  want to quantify what fraction of the result survives at a reduced sample
  size (e.g. ``target_n = int(0.75 * n_studies)``).

A practical benchmark: report mean stability per cluster; flag clusters with
mean stability < 0.5 under ``"leave_1_out"`` as potentially unreliable even
if they survived FWE correction.

**Recommended workflow**

1. Run :class:`~nimare.diagnostics.Jackknife` (or use
   :class:`~nimare.workflows.cbma.CBMAWorkflow` which runs it automatically)
   to identify influential studies.
2. For publication, add :class:`~nimare.diagnostics.ResampledStability` with
   ``"leave_1_out"`` (small datasets) or ``"subsample"`` (large datasets) and
   include the stability map as a supplementary figure.
3. Interpret the two diagnostics together: a cluster that is both driven by a
   single study (high Jackknife contribution) *and* spatially unstable
   (low ResampledStability) should be downgraded in confidence or omitted.

.. GENERATED FROM PYTHON SOURCE LINES 518-521

References
-----------------------------------------------------------------------------
.. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 23.384 seconds)


.. _sphx_glr_download_auto_examples_02_meta-analyses_16_plot_jackknife_vs_resampled_stability.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 16_plot_jackknife_vs_resampled_stability.ipynb <16_plot_jackknife_vs_resampled_stability.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 16_plot_jackknife_vs_resampled_stability.py <16_plot_jackknife_vs_resampled_stability.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 16_plot_jackknife_vs_resampled_stability.zip <16_plot_jackknife_vs_resampled_stability.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_