The NiMARE Dataset object

This is a brief walkthrough of the Dataset class and its methods.

Start with the necessary imports

import os

from nimare.dataset import Dataset
from nimare.extract import download_nidm_pain
from nimare.transforms import ImageTransformer
from nimare.utils import get_resource_path

Datasets are stored as JSON or pkl[.gz] files

JSON files are used to create Datasets, while generated Datasets are saved to, and loaded from, pkl[.gz] files. We use JSON because the files are easy to edit, and thus to build by hand if necessary. We then store the generated Datasets as pkl[.gz] files because an initialized Dataset is no longer a dictionary.
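
To make the distinction concrete, here is a minimal standard-library sketch (not NiMARE's own code). The `toy` object and the study names are hypothetical stand-ins: a hand-written JSON string is parsed into a dictionary, turned into a non-dictionary object, and then round-tripped through a gzipped pickle.

```python
import gzip
import json
import os
import pickle
import tempfile
from types import SimpleNamespace

# A JSON document is just nested dictionaries, so it is easy to write by hand.
raw = json.loads(
    '{"study-1": {"sample_sizes": [25]}, "study-2": {"sample_sizes": [20]}}'
)

# Hypothetical stand-in for an initialized Dataset: an object, not a dict.
toy = SimpleNamespace(
    ids=sorted(raw),
    sample_sizes={key: value["sample_sizes"] for key, value in raw.items()},
)

# The object can no longer be written back out with the json module...
try:
    json.dumps(toy)
    json_ok = True
except TypeError:
    json_ok = False

# ...but it survives a round trip through a gzipped pickle.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = os.path.join(tmp_dir, "toy_dset.pkl.gz")
    with gzip.open(out_file, "wb") as f:
        pickle.dump(toy, f)
    with gzip.open(out_file, "rb") as f:
        restored = pickle.load(f)

print(json_ok, restored.ids)
```

The same trade-off applies to any object-based container: JSON is convenient for authoring, while pickle preserves the initialized object as-is.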

# Let's start by downloading a dataset
dset_dir = download_nidm_pain()

# Now we can load and save the Dataset object
dset_file = os.path.join(get_resource_path(), "nidm_pain_dset.json")
dset = Dataset(dset_file, target="mni152_2mm", mask=None)
dset.save("pain_dset.pkl")
dset = Dataset.load("pain_dset.pkl")
os.remove("pain_dset.pkl")  # cleanup

Much of the data in Datasets is stored as DataFrames

The five DataFrames in Dataset are “coordinates” (reported peaks), “images” (statistical maps), “metadata”, “texts”, and “annotations” (labels).

Dataset.annotations contains labels describing studies

Columns include the standard identifiers and any labels. The labels may be grouped together based on label source, in which case they should be prefixed with some string followed by two underscores.

dset.annotations.head()

id study_id contrast_id
0 pain_01.nidm-1 pain_01.nidm 1
1 pain_02.nidm-1 pain_02.nidm 1
2 pain_03.nidm-1 pain_03.nidm 1
3 pain_04.nidm-1 pain_04.nidm 1
4 pain_05.nidm-1 pain_05.nidm 1
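
The double-underscore prefix convention described above can be illustrated with a short sketch. The column names and the `sourceA`/`sourceB` prefixes are made up for illustration, and `group_labels_by_source` is a hypothetical helper, not part of NiMARE.

```python
from collections import defaultdict

# Hypothetical annotation columns: three identifiers plus prefixed labels.
columns = [
    "id",
    "study_id",
    "contrast_id",
    "sourceA__pain",
    "sourceA__memory",
    "sourceB__pain",
]


def group_labels_by_source(column_names, sep="__"):
    """Group label columns by the prefix that precedes the separator."""
    groups = defaultdict(list)
    for name in column_names:
        if sep in name:
            prefix, label = name.split(sep, 1)
            groups[prefix].append(label)
    return dict(groups)


label_groups = group_labels_by_source(columns)
print(label_groups)
```

Identifier columns such as `study_id` contain only a single underscore, so they are left out of the grouping.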


Dataset.coordinates contains reported peaks

Columns include the standard identifiers, as well as mm coordinates (x, y, z) and voxel indices (i, j, k) specific to the Dataset’s masker’s space.

dset.coordinates.head()

id study_id contrast_id x y z space
0 pain_01.nidm-1 pain_01.nidm 1 48.0 -38.0 -24.0 mni152_2mm
14 pain_01.nidm-1 pain_01.nidm 1 -50.0 -42.0 -24.0 mni152_2mm
13 pain_01.nidm-1 pain_01.nidm 1 -56.0 -62.0 -6.0 mni152_2mm
12 pain_01.nidm-1 pain_01.nidm 1 -60.0 -52.0 -10.0 mni152_2mm
11 pain_01.nidm-1 pain_01.nidm 1 -2.0 -104.0 -2.0 mni152_2mm
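
The relationship between mm coordinates and voxel indices can be sketched with plain arithmetic. The scale and offset below are an assumption (values commonly seen for 2 mm MNI152 templates); in practice the affine of the Dataset's masker image defines the mapping, and the helper names are hypothetical.

```python
# Assumed diagonal affine for a 2 mm MNI152 grid: mm = index * scale + offset.
ASSUMED_SCALE = (-2.0, 2.0, 2.0)
ASSUMED_OFFSET = (90.0, -126.0, -72.0)


def mm_to_voxel(xyz):
    """Convert (x, y, z) in mm to integer (i, j, k) voxel indices."""
    return tuple(
        int(round((coord - offset) / scale))
        for coord, offset, scale in zip(xyz, ASSUMED_OFFSET, ASSUMED_SCALE)
    )


def voxel_to_mm(ijk):
    """Convert (i, j, k) voxel indices back to (x, y, z) in mm."""
    return tuple(
        index * scale + offset
        for index, scale, offset in zip(ijk, ASSUMED_SCALE, ASSUMED_OFFSET)
    )


ijk = mm_to_voxel((48.0, -38.0, -24.0))
print(ijk)  # (21, 44, 24) under the assumed affine
print(voxel_to_mm(ijk))
```

Because voxel indices depend on the grid, the same mm coordinate maps to different (i, j, k) values under a different masker.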


Dataset.images contains images from studies

Columns include the standard identifiers, as well as paths to images grouped by image type (e.g., z, beta, t).

# Here we'll only show a subset of these image types to fit in the window.
columns_to_show = ["id", "study_id", "contrast_id", "beta__relative", "z__relative"]
dset.images[columns_to_show].head()
id study_id contrast_id beta__relative z__relative
0 pain_01.nidm-1 pain_01.nidm 1 pain_01.nidm/Contrast.nii.gz None
1 pain_02.nidm-1 pain_02.nidm 1 pain_02.nidm/Contrast.nii.gz None
2 pain_03.nidm-1 pain_03.nidm 1 pain_03.nidm/Contrast.nii.gz None
3 pain_04.nidm-1 pain_04.nidm 1 pain_04.nidm/Contrast.nii.gz None
4 pain_05.nidm-1 pain_05.nidm 1 pain_05.nidm/Contrast.nii.gz None


Dataset.metadata contains metadata describing studies

Columns include the standard identifiers, as well as one column for each metadata field.

dset.metadata.head()

id study_id contrast_id sample_sizes
0 pain_01.nidm-1 pain_01.nidm 1 [25]
1 pain_02.nidm-1 pain_02.nidm 1 [25]
2 pain_03.nidm-1 pain_03.nidm 1 [20]
3 pain_04.nidm-1 pain_04.nidm 1 [20]
4 pain_05.nidm-1 pain_05.nidm 1 [9]


Dataset.texts contains texts associated with studies

Columns include the standard identifiers, as well as one for each text type.

dset.texts.head()

id study_id contrast_id
0 pain_01.nidm-1 pain_01.nidm 1
1 pain_02.nidm-1 pain_02.nidm 1
2 pain_03.nidm-1 pain_03.nidm 1
3 pain_04.nidm-1 pain_04.nidm 1
4 pain_05.nidm-1 pain_05.nidm 1


There are a handful of other important Dataset attributes

Dataset.ids contains study identifiers

dset.ids

Out:

array(['pain_01.nidm-1', 'pain_02.nidm-1', 'pain_03.nidm-1',
       'pain_04.nidm-1', 'pain_05.nidm-1', 'pain_06.nidm-1',
       'pain_07.nidm-1', 'pain_08.nidm-1', 'pain_09.nidm-1',
       'pain_10.nidm-1', 'pain_11.nidm-1', 'pain_12.nidm-1',
       'pain_13.nidm-1', 'pain_14.nidm-1', 'pain_15.nidm-1',
       'pain_16.nidm-1', 'pain_17.nidm-1', 'pain_18.nidm-1',
       'pain_19.nidm-1', 'pain_20.nidm-1', 'pain_21.nidm-1'], dtype=object)

Dataset.masker is a nilearn Masker object

dset.masker

Out:

NiftiMasker(mask_img=<nibabel.nifti1.Nifti1Image object at 0x7fba204b3410>)

Dataset.space is a string

print(f"Template space: {dset.space}")

Out:

Template space: mni152_2mm

Statistical images are not stored internally

Images are not stored within the Dataset. Instead, relative paths to image files are retained in the Dataset.images attribute. When loading a Dataset, you will likely need to specify the path to the images. To do this, you can use update_path().

dset.update_path(dset_dir)
columns_to_show = ["id", "study_id", "contrast_id", "beta", "beta__relative"]
dset.images[columns_to_show].head()
id study_id contrast_id beta beta__relative
0 pain_01.nidm-1 pain_01.nidm 1 /home/docs/.nimare/nidm_21pain/pain_01.nidm/Co... pain_01.nidm/Contrast.nii.gz
1 pain_02.nidm-1 pain_02.nidm 1 /home/docs/.nimare/nidm_21pain/pain_02.nidm/Co... pain_02.nidm/Contrast.nii.gz
2 pain_03.nidm-1 pain_03.nidm 1 /home/docs/.nimare/nidm_21pain/pain_03.nidm/Co... pain_03.nidm/Contrast.nii.gz
3 pain_04.nidm-1 pain_04.nidm 1 /home/docs/.nimare/nidm_21pain/pain_04.nidm/Co... pain_04.nidm/Contrast.nii.gz
4 pain_05.nidm-1 pain_05.nidm 1 /home/docs/.nimare/nidm_21pain/pain_05.nidm/Co... pain_05.nidm/Contrast.nii.gz
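
Conceptually, updating the path amounts to joining a new root directory onto each stored relative path. The sketch below shows that idea with the standard library only. `absolute_paths` and the `/data/nidm_21pain` root are hypothetical, and this is not how NiMARE implements `update_path()` internally.

```python
import posixpath


def absolute_paths(relative_paths, new_root):
    """Join a root directory onto each relative path, keeping missing entries."""
    return {
        study_id: None if rel is None else posixpath.join(new_root, rel)
        for study_id, rel in relative_paths.items()
    }


# Relative paths as they might appear in a "beta__relative" column.
beta_relative = {
    "pain_01.nidm-1": "pain_01.nidm/Contrast.nii.gz",
    "pain_02.nidm-1": "pain_02.nidm/Contrast.nii.gz",
    "pain_03.nidm-1": None,  # a study without this image type
}

beta_absolute = absolute_paths(beta_relative, "/data/nidm_21pain")
print(beta_absolute["pain_01.nidm-1"])
```

Storing only relative paths is what lets the same saved Dataset be reused on another machine, where the images live under a different root.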


Images can also be calculated based on available files

When some images are available, but others are not, sometimes required images can be calculated from the available ones.

For example, t = beta / sqrt(varcope), so varcope = (beta / t)^2; if you have t-statistic images and beta images, you can also calculate varcope (variance) images.
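
As a worked example of the arithmetic: a t-statistic is the effect estimate divided by its standard error, so the variance can be recovered as (beta / t)^2. The sketch below applies this to single values. The function name is hypothetical, and a real transformation would operate voxel-wise on whole images.

```python
import math


def varcope_from_t_and_beta(t, beta):
    """Recover the variance of an effect estimate from its t-statistic."""
    if t == 0:
        # The standard error is undefined when the t-statistic is zero.
        return float("nan")
    standard_error = beta / t
    return standard_error ** 2


varcope = varcope_from_t_and_beta(t=3.0, beta=1.5)
print(varcope)  # 0.25

# Sanity check: plugging the variance back in recovers the t-statistic.
print(1.5 / math.sqrt(varcope))  # 3.0
```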

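We use the transforms module (in particular, ImageTransformer) to perform these transformations.

varcope_transformer = ImageTransformer(target="varcope")
dset = varcope_transformer.transform(dset)
dset.images[["id", "varcope"]].head()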

Out:

/home/docs/checkouts/readthedocs.org/user_builds/nimare/envs/latest/lib/python3.7/site-packages/nilearn/image/resampling.py:616: RuntimeWarning: NaNs or infinite values are present in the data passed to resample. This is a bad thing as they make resampling ill-defined and much slower.
  fill_value=fill_value)
id varcope
0 pain_01.nidm-1 /home/docs/.nimare/nidm_21pain/pain_01.nidm-1_...
1 pain_02.nidm-1 /home/docs/.nimare/nidm_21pain/pain_02.nidm-1_...
2 pain_03.nidm-1 /home/docs/.nimare/nidm_21pain/pain_03.nidm-1_...
3 pain_04.nidm-1 /home/docs/.nimare/nidm_21pain/pain_04.nidm-1_...
4 pain_05.nidm-1 /home/docs/.nimare/nidm_21pain/pain_05.nidm-1_...


Datasets support many search methods

There are get_[X] and get_studies_by_[X] methods for a range of possible search criteria. The get_[X] methods allow you to search for specific metadata, while the get_studies_by_[X] methods let you search for study identifiers within the Dataset matching criteria.

Note that the get_[X] methods return a value for every study in the Dataset by default, and for every requested study if the ids argument is provided. If a study does not have the data requested, the returned list will have None for that study.
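
The return convention described above (one value per study, with None where data are missing) can be sketched as follows. `get_field` and the toy `records` are hypothetical and only mimic the behavior described in the text.

```python
def get_field(records, field, ids=None):
    """Return one value per requested study, using None for missing data."""
    if ids is None:
        ids = list(records)  # default: every study, in stored order
    return [records[study_id].get(field) for study_id in ids]


records = {
    "study-1": {"beta": "study-1/beta.nii.gz"},
    "study-2": {"beta": "study-2/beta.nii.gz", "z": "study-2/z.nii.gz"},
    "study-3": {"beta": "study-3/beta.nii.gz"},
}

all_z = get_field(records, "z")
some_z = get_field(records, "z", ids=["study-2"])
print(all_z)
print(some_z)
```

Keeping a placeholder for each study means the returned list always lines up with the list of requested identifiers.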

z_images = dset.get_images(imtype="z")
z_images = [str(z) for z in z_images]
print("\n".join(z_images))

Out:

None
None
None
None
None
None
None
None
None
None
/home/docs/.nimare/nidm_21pain/pain_11.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_12.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_13.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_14.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_15.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_16.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_17.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_18.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_19.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_20.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_21.nidm/ZStatistic_T001.nii.gz

Let’s try to fill in missing z images

z_transformer = ImageTransformer(target="z")
dset = z_transformer.transform(dset)
z_images = dset.get_images(imtype="z")
z_images = [str(z) for z in z_images]
print("\n".join(z_images))

Out:

/home/docs/.nimare/nidm_21pain/pain_01.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_02.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_03.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_04.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_05.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_06.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_07.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_08.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_09.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_10.nidm-1_2.0x2.0x2.0_z.nii.gz
/home/docs/.nimare/nidm_21pain/pain_11.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_12.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_13.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_14.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_15.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_16.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_17.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_18.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_19.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_20.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_21.nidm/ZStatistic_T001.nii.gz

Datasets can also search for studies matching criteria

get_studies_by_[X] methods return a list of study identifiers matching the criteria, such as reporting a peak coordinate near a search coordinate.

sel_studies = dset.get_studies_by_coordinate(xyz=[[0, 0, 0]], r=20)
print("\n".join(sel_studies))

Out:

pain_03.nidm-1
pain_10.nidm-1
pain_11.nidm-1
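
A coordinate search of this kind boils down to a distance test between each reported peak and the search point. Below is a minimal sketch using Euclidean distance. The `studies_near` helper and the peaks are illustrative, and whether NiMARE treats the radius boundary as inclusive is not something this sketch settles.

```python
import math


def studies_near(peaks, xyz, r):
    """Return sorted IDs of studies with at least one peak within r mm of xyz."""
    found = set()
    for study_id, peak in peaks:
        if math.dist(peak, xyz) <= r:
            found.add(study_id)
    return sorted(found)


# Hypothetical (study ID, peak in mm) pairs; a study may report several peaks.
peaks = [
    ("study-1", (48.0, -38.0, -24.0)),
    ("study-1", (-2.0, -104.0, -2.0)),
    ("study-2", (4.0, 10.0, -6.0)),
    ("study-3", (0.0, 18.0, 0.0)),
    ("study-3", (30.0, 30.0, 30.0)),
]

selected = studies_near(peaks, xyz=(0.0, 0.0, 0.0), r=20)
print(selected)  # ['study-2', 'study-3']
```

A study is selected as soon as any one of its peaks falls inside the sphere, which is why `study-3` matches despite its second, distant peak.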

Datasets are meant to be mostly immutable

While some elements of Datasets are designed to be changeable, like the paths to image files, most elements are not. NiMARE Estimators operate on Datasets and return new, updated Datasets. To reduce a Dataset to a subset of its studies, use slice(), which returns a new Dataset.

sub_dset = dset.slice(ids=sel_studies)
print("\n".join(sub_dset.ids))

Out:

pain_03.nidm-1
pain_10.nidm-1
pain_11.nidm-1
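
The pattern of returning a new object instead of modifying the original can be sketched with a toy class. `ToyDataset` is a hypothetical stand-in and not NiMARE's implementation; it only shows that slicing leaves the source object untouched.

```python
class ToyDataset:
    """Hypothetical container whose slice() returns a new, smaller copy."""

    def __init__(self, records):
        # Copy the mapping so that later changes by the caller do not leak in.
        self._records = dict(records)

    @property
    def ids(self):
        return sorted(self._records)

    def slice(self, ids):
        """Build a new ToyDataset restricted to the requested study IDs."""
        return ToyDataset({study_id: self._records[study_id] for study_id in ids})


full = ToyDataset({"study-1": [25], "study-2": [20], "study-3": [9]})
sub = full.slice(ids=["study-1", "study-3"])

print(sub.ids)   # ['study-1', 'study-3']
print(full.ids)  # ['study-1', 'study-2', 'study-3']
```

Because the original object is never modified, several subsets can be derived from the same starting point without interfering with one another.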
