nimare.dataset.Dataset

class Dataset(source, target='mni152_2mm', mask=None)

Bases: nimare.base.NiMAREBase

Storage container for a coordinate- and/or image-based meta-analytic dataset/database.

Changed in version 0.0.9:

[ENH] Add merge method to Dataset class

Changed in version 0.0.8:

[FIX] Set nimare.dataset.Dataset.basepath in update_path() using absolute path.

Parameters

source (str or dict) – JSON file containing dictionary with database information or the dict() object
target (str, optional) – Desired coordinate space for coordinates. Names follow NIDM convention. Default is ‘mni152_2mm’ (MNI space with 2x2x2 voxels). This parameter has no impact on images.
mask (str, nibabel.nifti1.Nifti1Image, nilearn.input_data.NiftiMasker or similar, or None, optional) – Mask(er) to use. If None, uses the target space image, with all non-zero voxels included in the mask.
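As a rough illustration of the source parameter, the sketch below writes a small dictionary to a JSON file that could then be passed as source. The nested layout (study, then "contrasts", then "coords" and "metadata") and every name and value in it are assumptions made for illustration; this page does not specify them, so check them against the NiMARE file-format documentation before relying on them.

```python
import json
import os
import tempfile

# Hypothetical database: one study with one contrast and two peaks.
# The key layout below is an assumption, not taken from this page.
database = {
    "study01": {
        "contrasts": {
            "1": {
                "coords": {
                    "space": "MNI",
                    "x": [-42.0, 38.0],
                    "y": [22.0, -60.0],
                    "z": [4.0, 48.0],
                },
                "metadata": {"sample_sizes": [24]},
            }
        }
    }
}

# Write the dictionary to disk so that either the dict itself or the
# file path can be handed to Dataset as `source`.
json_path = os.path.join(tempfile.mkdtemp(), "example_dset.json")
with open(json_path, "w") as fobj:
    json.dump(database, fobj, indent=2)

# Read it back to confirm the file holds the same structure.
with open(json_path) as fobj:
    reloaded = json.load(fobj)
```

With NiMARE installed, either the dictionary or json_path would be the value given to source.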

Variables

ids (1D numpy.ndarray) – Identifiers
masker (nilearn.input_data.NiftiMasker or similar) – Masker object defining the space and location of the area of interest (e.g., ‘brain’).
space (str) – Standard space. Same as target parameter.
annotations (pandas.DataFrame) – Labels describing studies
coordinates (pandas.DataFrame) – Peak coordinates from studies
images (pandas.DataFrame) – Images from studies
metadata (pandas.DataFrame) – Metadata describing studies
texts (pandas.DataFrame) – Texts associated with studies

Notes

Images loaded into a Dataset are assumed to be in the same space. If images have different resolutions or affines from the Dataset’s masker, then they will be resampled automatically, at the point where they’re used, by Dataset.masker.

property annotations

Labels describing studies in the dataset.

Each study/experiment has its own row. Columns correspond to individual labels (e.g., ‘emotion’), and may be prefixed with a feature group including two underscores (e.g., ‘Neurosynth_TFIDF__emotion’).

Type: pandas.DataFrame
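The double-underscore prefix described above can be separated from the label with plain string handling. This is a minimal sketch; split_label is a hypothetical helper, not part of NiMARE.

```python
def split_label(column):
    """Split an annotation column name into (feature_group, label)."""
    group, sep, label = column.rpartition("__")
    # Without a double underscore there is no feature group.
    return (group if sep else None, label)


columns = ["Neurosynth_TFIDF__emotion", "emotion"]
parsed = [split_label(col) for col in columns]
```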

property coordinates

Coordinates in the dataset.

Changed in version 0.0.10: The coordinates attribute no longer includes the associated matrix indices (columns ‘i’, ‘j’, and ‘k’). These columns are calculated as needed.

Each study has one row for each peak. Columns include [‘x’, ‘y’, ‘z’] (peak locations in mm) and ‘space’ (Dataset’s space).

Type: pandas.DataFrame
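Because the matrix indices are no longer stored, they have to be derived from the mm coordinates and the template's affine when needed. The sketch below shows that conversion in plain Python. The affine values are an assumption (a common 2 mm MNI152 grid), and mm2vox is a hypothetical helper that does not reproduce NiMARE's own routine.

```python
# Assumed voxel-to-mm affine for a 2 mm MNI152 template (first three rows).
# These values are an assumption, not read from a Dataset's masker.
AFFINE = [
    [-2.0, 0.0, 0.0, 90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
]


def mm2vox(xyz, affine=AFFINE):
    """Convert one (x, y, z) peak in mm to (i, j, k) matrix indices.

    Only valid for axis-aligned (diagonal) affines like the one above.
    """
    return tuple(
        int(round((coord - affine[axis][3]) / affine[axis][axis]))
        for axis, coord in enumerate(xyz)
    )


peaks = [(0.0, 0.0, 0.0), (-42.0, 22.0, 4.0)]
ijk = [mm2vox(peak) for peak in peaks]
```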

copy()

Create a copy of the Dataset.

get(dict_, drop_invalid=True)

Retrieve files and/or metadata from the current Dataset.

Parameters

dict_ (dict) – Dictionary specifying images or metadata to collect. Keys should be variables to be used as keys for results dictionary. Values should be tuples with two values: type (e.g., ‘image’ or ‘metadata’) and specific field corresponding to column of type-specific DataFrame (e.g., ‘z’ or ‘sample_sizes’).
drop_invalid (bool, optional) – Whether to automatically ignore any studies that lack the required data. Default is True.

Returns

results (dict) – A dictionary of lists of requested data. Keys correspond to the keys in dict_.

Examples

>>> dset.get({'z_maps': ('image', 'z'), 'sample_sizes': ('metadata', 'sample_sizes')})
>>> dset.get({'coordinates': ('coordinates', None)})
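To make the shape of the returned dictionary concrete, the following sketch mimics the documented behavior with plain dictionaries standing in for the type-specific DataFrames. The collect function, the study IDs, and the file path are hypothetical. How NiMARE actually decides that a study is invalid is not described here, so the rule used below (any requested value missing) is an assumption.

```python
def collect(tables, spec, ids, drop_invalid=True):
    """Gather requested fields into a dict of lists, one entry per study."""
    keep = list(ids)
    if drop_invalid:
        # Assumed rule: keep only studies that have every requested value.
        keep = [
            id_
            for id_ in ids
            if all(
                tables[kind][field].get(id_) is not None
                for kind, field in spec.values()
            )
        ]
    return {
        key: [tables[kind][field].get(id_) for id_ in keep]
        for key, (kind, field) in spec.items()
    }


# Stand-ins for Dataset.images and Dataset.metadata (hypothetical data).
tables = {
    "image": {"z": {"study01-1": "/data/z1.nii.gz", "study02-1": None}},
    "metadata": {"sample_sizes": {"study01-1": [24], "study02-1": [30]}},
}
spec = {"z_maps": ("image", "z"), "sample_sizes": ("metadata", "sample_sizes")}
ids = ["study01-1", "study02-1"]

results = collect(tables, spec, ids)
results_all = collect(tables, spec, ids, drop_invalid=False)
```

The keys of each result match the keys of the request, and every list has one entry per retained study.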

get_images(ids=None, imtype=None)

Get images of a certain type for a subset of studies in the dataset.

Parameters

ids (list, optional) – A list of IDs in the Dataset for which to find images. Default is None, in which case all images of requested type are returned.
imtype (str, optional) – Type of image to extract. Corresponds to column name in Dataset.images DataFrame. Default is None.

Returns

images (list) – List of images of requested type for selected IDs.

get_labels(ids=None)

Extract list of labels for which studies in Dataset have annotations.

Parameters: ids (list, optional) – A list of IDs in the Dataset for which to find labels. Default is None, in which case all labels are returned.
Returns: labels (list) – List of labels for which there are annotations in the Dataset.

get_metadata(ids=None, field=None)

Get metadata from Dataset.

Parameters

ids (list, optional) – A list of IDs in the Dataset for which to find metadata. Default is None, in which case all metadata of requested type are returned.
field (str, optional) – Metadata field to extract. Corresponds to column name in Dataset.metadata DataFrame. Default is None.

Returns

metadata (list) – List of values of requested type for selected IDs.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params (dict) – Parameter names mapped to their values.

get_studies_by_coordinate(xyz, r=20)

Extract list of studies with at least one focus within radius of requested coordinates.

Parameters

xyz ((X x 3) array_like) – List of coordinates against which to find studies.
r (float, optional) – Radius (in mm) within which to find studies. Default is 20mm.

Returns

found_ids (list) – A list of IDs from the Dataset with at least one focus within radius r of requested coordinates.
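The radius search can be pictured as a Euclidean distance test between every focus and every requested coordinate. The sketch below is an illustration in plain Python (3.8 or later for math.dist), not NiMARE's implementation. The function name and the data are hypothetical, and treating a focus exactly at distance r as a match is an assumption.

```python
import math


def studies_within_radius(foci, xyz, r=20):
    """Return IDs with at least one focus within r mm of any query point."""
    found = []
    for id_, peaks in foci.items():
        # Assumption: a focus exactly r mm away still counts as a match.
        if any(math.dist(peak, query) <= r for peak in peaks for query in xyz):
            found.append(id_)
    return found


# Hypothetical foci, keyed by study ID, in mm.
foci = {
    "study01-1": [(-42, 22, 4), (38, -60, 48)],
    "study02-1": [(0, -90, 0)],
}
found_ids = studies_within_radius(foci, xyz=[(-40, 20, 0)], r=20)
```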

get_studies_by_label(labels=None, label_threshold=0.001)

Extract list of studies with a given label.

Changed in version 0.0.10: Fix bug in which all IDs were returned when a label wasn’t present in the Dataset.

Changed in version 0.0.9: Default value for label_threshold changed to 0.001.

Parameters

labels (list, optional) – List of labels to use to search Dataset. If a contrast has all of the labels above the threshold, it will be returned. Default is None.
label_threshold (float, optional) – Value that each requested label must exceed for a study to be returned. Default is 0.001.

Returns

found_ids (list) – A list of IDs from the Dataset found by the search criteria.
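The selection rule (every requested label above the threshold) can be sketched as follows. Nested dictionaries stand in for Dataset.annotations, and the function name and values are hypothetical. Treating a label that is absent for a study as a non-match is an assumption about behavior that this page does not spell out.

```python
def studies_with_labels(annotations, labels, label_threshold=0.001):
    """Return IDs whose value for every requested label exceeds the threshold."""
    found = []
    for id_, values in annotations.items():
        # Assumption: a label missing for a study is treated as 0.
        if all(values.get(label, 0.0) > label_threshold for label in labels):
            found.append(id_)
    return found


# Hypothetical annotation values, keyed by study ID.
annotations = {
    "study01-1": {"emotion": 0.12, "pain": 0.0},
    "study02-1": {"emotion": 0.0005, "pain": 0.3},
}
emotion_ids = studies_with_labels(annotations, ["emotion"])
both_ids = studies_with_labels(annotations, ["emotion", "pain"])
```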

get_studies_by_mask(mask)

Extract list of studies with at least one coordinate in mask.

Parameters: mask (img_like) – Mask across which to search for coordinates.
Returns: found_ids (list) – A list of IDs from the Dataset with at least one focus in the mask.

get_texts(ids=None, text_type=None)

Extract list of texts of a given type for selected IDs.

Parameters

ids (list, optional) – A list of IDs in the Dataset for which to find texts. Default is None, in which case all texts of requested type are returned.
text_type (str, optional) – Type of text to extract. Corresponds to column name in Dataset.texts DataFrame. Default is None.

Returns

texts (list) – List of texts of requested type for selected IDs.

property ids

1D array of identifiers in Dataset.

The associated setter for this property is private, as Dataset.ids is immutable.

Type: numpy.ndarray

property images

Images in the dataset.

Each image type has its own column (e.g., ‘z’) with absolute paths to files and each study has its own row. Additionally, relative paths to image files are stored in columns with the suffix ‘__relative’ (e.g., ‘z__relative’).

Warning

Images are assumed to be in the same space, although they may have different resolutions and affines. Images will be resampled as needed at the point where they are used, via Dataset.masker.

Type: pandas.DataFrame

classmethod load(filename, compressed=True)

Load a pickled class instance from file.

Parameters

filename (str) – Name of file containing object.
compressed (bool, optional) – If True, the file is assumed to be compressed and gzip will be used to load it. Otherwise, it will assume that the file is not compressed. Default = True.

Returns

obj (class object) – Loaded class object.

property masker

Masker object.

Defines the space and location of the area of interest (e.g., ‘brain’).

Type: nilearn.input_data.NiftiMasker or similar

merge(right)

Merge two Datasets.

New in version 0.0.9.

Parameters: right (nimare.dataset.Dataset) – Dataset to merge with.
Returns: nimare.dataset.Dataset – A Dataset of the two merged Datasets.

property metadata

Metadata describing studies in the dataset.

Each metadata field has its own column (e.g., ‘sample_sizes’) and each study has its own row.

Type: pandas.DataFrame

save(filename, compress=True)

Pickle the class instance to the provided file.

Parameters

filename (str) – File to which object will be saved.
compress (bool, optional) – If True, the file will be compressed with gzip. Otherwise, the uncompressed version will be saved. Default = True.
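The save and load descriptions amount to a pickle round trip with optional gzip compression. The sketch below reproduces that pattern with the standard library on an ordinary dictionary. The free functions and the payload are hypothetical stand-ins, not the methods themselves.

```python
import gzip
import os
import pickle
import tempfile


def save(obj, filename, compress=True):
    """Pickle obj to filename, through gzip when compress is True."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as fobj:
        pickle.dump(obj, fobj)


def load(filename, compressed=True):
    """Load a pickled object, through gzip when compressed is True."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as fobj:
        return pickle.load(fobj)


# Hypothetical payload standing in for a Dataset instance.
payload = {"ids": ["study01-1", "study02-1"], "space": "mni152_2mm"}
path = os.path.join(tempfile.mkdtemp(), "dset.pkl.gz")
save(payload, path)
restored = load(path)
```

Note that the keyword is compress when saving and compressed when loading, as in the signatures above; the two flags need to agree for the file to be read back.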

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
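The <component>__<parameter> form can be read as "split at the first double underscore and forward the remainder to that component". Here is a minimal sketch of that convention. The Component class and the parameter names (kernel, fwhm, n_iters) are hypothetical, and this is not NiMARE's code.

```python
class Component:
    """Toy object whose attributes double as its parameters."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for name, value in params.items():
            # Split "<component>__<parameter>" at the first double underscore.
            head, sep, tail = name.partition("__")
            if sep:
                # Forward the remainder to the nested component.
                getattr(self, head).set_params(**{tail: value})
            else:
                setattr(self, head, value)
        return self


kernel = Component(fwhm=8)
estimator = Component(kernel=kernel, n_iters=100)
returned = estimator.set_params(n_iters=500, kernel__fwhm=10)
```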

slice(ids)

Create a new dataset with only requested IDs.

Parameters: ids (array_like) – List of study IDs to include in the new dataset.
Returns: new_dset (nimare.dataset.Dataset) – Reduced Dataset containing only requested studies.

property texts

Texts in the dataset.

Each text type has its own column (e.g., ‘abstract’) and each study has its own row.

Type: pandas.DataFrame

update_path(new_path)

Update paths to images.

Prepends new path to the relative path for files in Dataset.images.

Parameters: new_path (str) – Path to prepend to relative paths of files in Dataset.images.
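The path update can be pictured as rebuilding each absolute-path column from its "__relative" counterpart. The sketch below uses a dictionary of lists in place of Dataset.images. The helper name and the file names are hypothetical, and leaving missing entries as None is an assumption.

```python
import os


def update_paths(images, new_path):
    """Fill each absolute-path column from its '__relative' counterpart."""
    base = os.path.abspath(new_path)
    updated = dict(images)
    for column, rel_paths in images.items():
        if column.endswith("__relative"):
            target = column[: -len("__relative")]
            # Assumption: studies without a relative path stay empty.
            updated[target] = [
                None if rel is None else os.path.join(base, rel)
                for rel in rel_paths
            ]
    return updated


# Hypothetical image table with one image type ('z') and two studies.
images = {"z__relative": ["study01/z.nii.gz", None], "z": [None, None]}
updated = update_paths(images, "data")
```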
