Load and work with a Dataset
Start with the necessary imports
import os
import nimare
from nimare.tests.utils import get_test_data_path
Datasets are stored as JSON or pkl[.gz] files
JSON files are used to create Datasets, while generated Datasets are saved to, and loaded from, pkl[.gz] files. We use JSON because it is easy to edit, and thus to build by hand if necessary. We then store the generated Datasets as pkl[.gz] files because an initialized Dataset is an object, no longer a plain dictionary.
# Let's start by downloading a dataset
dset_dir = nimare.extract.download_nidm_pain()
# Now we can load and save the Dataset object
dset_file = os.path.join(get_test_data_path(), 'nidm_pain_dset.json')
dset = nimare.dataset.Dataset(dset_file, target='mni152_2mm', mask=None)
dset.save('pain_dset.pkl')
dset = nimare.dataset.Dataset.load('pain_dset.pkl')
os.remove('pain_dset.pkl') # cleanup
Out:
Dataset created in /home/docs/.nimare/nidm_21pain
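To make the JSON-versus-pickle distinction concrete, here is a minimal sketch that uses only the standard library, not NiMARE. The object named toy is a hypothetical stand-in for an initialized Dataset, and the study names and file name are made up for illustration. It shows that the raw dictionary is JSON-friendly, while the object built from it is not and is written to a gzipped pickle instead.

```python
import gzip
import json
import os
import pickle
import tempfile
from types import SimpleNamespace

# A hand-editable, JSON-style description of two made-up studies.
raw = {"study_a": {"sample_sizes": [25]}, "study_b": {"sample_sizes": [20]}}

# Hypothetical stand-in for an initialized Dataset: an object, not a dict.
toy = SimpleNamespace(ids=sorted(raw), source=raw)

# The plain dictionary serializes to JSON, but the object does not.
json_text = json.dumps(raw, indent=4)
try:
    json.dumps(toy)
    object_is_json_serializable = True
except TypeError:
    object_is_json_serializable = False

# The object can still be saved to, and loaded from, a gzipped pickle.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_file = os.path.join(tmp_dir, "toy_dset.pkl.gz")
    with gzip.open(pkl_file, "wb") as fo:
        pickle.dump(toy, fo)
    with gzip.open(pkl_file, "rb") as fo:
        restored = pickle.load(fo)

print(object_is_json_serializable)
print(restored.ids)
```

Whether a real Dataset is pickled in exactly this way is not shown in this example; the sketch only illustrates why an object needs a different file format than the dictionary it was built from.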
Much of the data in Datasets is stored as DataFrames
The five DataFrames in Dataset are “coordinates” (reported peaks), “images” (statistical maps), “metadata”, “texts”, and “annotations” (labels).
print('Coordinates:')
print(dset.coordinates.head())
print('Images:')
print(dset.images.head())
print('Metadata:')
print(dset.metadata.head())
print('Texts:')
print(dset.texts.head())
print('Annotations:')
print(dset.annotations.head())
Out:
Coordinates:
id study_id contrast_id x ... space i j k
0 pain_01.nidm-1 pain_01.nidm 1 48.0 ... mni152_2mm 21 44 24
1 pain_01.nidm-1 pain_01.nidm 1 54.0 ... mni152_2mm 18 40 23
2 pain_01.nidm-1 pain_01.nidm 1 60.0 ... mni152_2mm 15 48 22
3 pain_01.nidm-1 pain_01.nidm 1 60.0 ... mni152_2mm 15 34 31
4 pain_01.nidm-1 pain_01.nidm 1 38.0 ... mni152_2mm 26 86 39
[5 rows x 10 columns]
Images:
id study_id ... t__relative z__relative
0 pain_01.nidm-1 pain_01.nidm ... pain_01.nidm/TStatistic.nii.gz None
1 pain_02.nidm-1 pain_02.nidm ... pain_02.nidm/TStatistic.nii.gz None
2 pain_03.nidm-1 pain_03.nidm ... pain_03.nidm/TStatistic.nii.gz None
3 pain_04.nidm-1 pain_04.nidm ... pain_04.nidm/TStatistic.nii.gz None
4 pain_05.nidm-1 pain_05.nidm ... pain_05.nidm/TStatistic.nii.gz None
[5 rows x 8 columns]
Metadata:
id study_id contrast_id sample_sizes
0 pain_01.nidm-1 pain_01.nidm 1 [25]
1 pain_02.nidm-1 pain_02.nidm 1 [25]
2 pain_03.nidm-1 pain_03.nidm 1 [20]
3 pain_04.nidm-1 pain_04.nidm 1 [20]
4 pain_05.nidm-1 pain_05.nidm 1 [9]
Texts:
id study_id contrast_id
0 pain_01.nidm-1 pain_01.nidm 1
1 pain_02.nidm-1 pain_02.nidm 1
2 pain_03.nidm-1 pain_03.nidm 1
3 pain_04.nidm-1 pain_04.nidm 1
4 pain_05.nidm-1 pain_05.nidm 1
Annotations:
id study_id contrast_id
0 pain_01.nidm-1 pain_01.nidm 1
1 pain_02.nidm-1 pain_02.nidm 1
2 pain_03.nidm-1 pain_03.nidm 1
3 pain_04.nidm-1 pain_04.nidm 1
4 pain_05.nidm-1 pain_05.nidm 1
There are a handful of other important Dataset attributes
print('Study identifiers: {}'.format(dset.ids))
print('Masker: {}'.format(dset.masker))
print('Template space: {}'.format(dset.space))
Out:
Study identifiers: ['pain_01.nidm-1' 'pain_02.nidm-1' 'pain_03.nidm-1' 'pain_04.nidm-1'
'pain_05.nidm-1' 'pain_06.nidm-1' 'pain_07.nidm-1' 'pain_08.nidm-1'
'pain_09.nidm-1' 'pain_10.nidm-1' 'pain_11.nidm-1' 'pain_12.nidm-1'
'pain_13.nidm-1' 'pain_14.nidm-1' 'pain_15.nidm-1' 'pain_16.nidm-1'
'pain_17.nidm-1' 'pain_18.nidm-1' 'pain_19.nidm-1' 'pain_20.nidm-1'
'pain_21.nidm-1']
Masker: NiftiMasker(mask_img=<nibabel.nifti1.Nifti1Image object at 0x7f0f79937cf8>)
Template space: mni152_2mm
Statistical images are not stored internally
Images are not stored within the Dataset. Instead, relative paths to image files are retained in the Dataset.images attribute. When loading a Dataset, you will likely need to specify the path to the images. To do this, you can use Dataset.update_path.
dset.update_path(dset_dir)
print(dset.images.head())
Out:
id ... z
0 pain_01.nidm-1 ... None
1 pain_02.nidm-1 ... None
2 pain_03.nidm-1 ... None
3 pain_04.nidm-1 ... None
4 pain_05.nidm-1 ... None
[5 rows x 12 columns]
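The idea of keeping relative paths and resolving them against a base directory later can be sketched without NiMARE. The helper resolve_paths, the study names, and the base directory below are all hypothetical; this is an illustration of the concept, not a description of how Dataset.update_path is implemented.

```python
import os

# Made-up relative paths, in the style of the t__relative column above.
relative_paths = {
    "study_a": "study_a/TStatistic.nii.gz",
    "study_b": None,  # made-up gap: no image of this type for this study
}


def resolve_paths(relative, base_dir):
    """Join each relative path onto base_dir, leaving missing entries as None."""
    return {
        study_id: None if rel is None else os.path.join(base_dir, rel)
        for study_id, rel in relative.items()
    }


# Point the same relative paths at wherever the images live on this machine.
absolute_paths = resolve_paths(relative_paths, "/data/nidm_21pain")
for study_id, path in absolute_paths.items():
    print(study_id, path)
```

Because only the base directory changes, the same saved paths can be reused on another machine by resolving them against a different folder.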
Datasets support many search methods
There are get_[X] and get_studies_by_[X] methods for a range of possible search criteria. The get_[X] methods allow you to search for specific metadata, while the get_studies_by_[X] methods let you find the identifiers of studies in the Dataset that match given criteria.
Note that the get_[X] methods return a value for every study in the Dataset by default, and for every requested study if the ids argument is provided. If a study does not have the data requested, the returned list will have None for that study.
z_images = dset.get_images(imtype='z')
z_images = [str(z) for z in z_images]
print('\n'.join(z_images))
Out:
None
None
None
None
None
None
None
None
None
None
/home/docs/.nimare/nidm_21pain/pain_11.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_12.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_13.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_14.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_15.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_16.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_17.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_18.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_19.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_20.nidm/ZStatistic_T001.nii.gz
/home/docs/.nimare/nidm_21pain/pain_21.nidm/ZStatistic_T001.nii.gz
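Since a list like the one above holds None for studies without the requested data, a common next step is to separate the studies that have the image from those that do not. The sketch below uses made-up identifiers and paths, and it assumes that the returned list is in the same order as the list of study identifiers; that ordering is an assumption here, not something this example confirms.

```python
# Made-up data in the shape described above: one entry per study,
# with None where a study lacks the requested image.
ids = ["study_a", "study_b", "study_c"]
z_images = [None, "/data/study_b/ZStatistic.nii.gz", "/data/study_c/ZStatistic.nii.gz"]

# Keep only the studies that actually have a z-statistic image.
available = {
    study_id: path for study_id, path in zip(ids, z_images) if path is not None
}

# Track which studies would need another image type instead.
missing = [study_id for study_id, path in zip(ids, z_images) if path is None]

print(sorted(available))
print(missing)
```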
Datasets can also search for studies matching criteria
The get_studies_by_[X] methods return a list of study identifiers matching the criteria, such as reporting a peak coordinate near a search coordinate.
sel_studies = dset.get_studies_by_coordinate(xyz=[[0, 0, 0]], r=20)
print('\n'.join(sel_studies))
Out:
pain_03.nidm-1
pain_10.nidm-1
pain_11.nidm-1
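Conceptually, a coordinate search like the one above keeps every study that reports at least one peak within a given radius of the search coordinate. The following is a minimal sketch of that idea with made-up peaks. The function studies_near is hypothetical, and the choices of Euclidean distance in millimeters and an inclusive radius are assumptions, not a statement of how NiMARE implements get_studies_by_coordinate.

```python
import math

# Made-up peak coordinates (x, y, z), keyed by study identifier.
peaks = {
    "study_a": [(2.0, -4.0, 6.0), (40.0, 20.0, 10.0)],
    "study_b": [(30.0, 30.0, 30.0)],
    "study_c": [(0.0, 12.0, -15.0)],
}


def studies_near(peaks_by_study, xyz, r):
    """Return ids of studies with at least one peak within r of xyz."""
    found = []
    for study_id, coords in peaks_by_study.items():
        # Assumption: Euclidean distance, with the radius treated as inclusive.
        if any(math.dist(peak, xyz) <= r for peak in coords):
            found.append(study_id)
    return found


sel = studies_near(peaks, (0.0, 0.0, 0.0), 20)
print(sel)
```

Here study_a and study_c each have a peak closer than 20 to the origin, while the only peak of study_b lies about 52 away, so it is excluded.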
Datasets are meant to be mostly immutable
While some elements of Datasets are designed to be changeable, like the paths to image files, most elements are not. NiMARE Estimators operate on Datasets and return new, updated Datasets. If you want to reduce a Dataset based on a subset of the studies in the Dataset, you need to use Dataset.slice().
sub_dset = dset.slice(ids=sel_studies)
print('\n'.join(sub_dset.ids))
Out:
pain_03.nidm-1
pain_10.nidm-1
pain_11.nidm-1