Download the Neurosynth or NeuroQuery databases

Download and convert the Neurosynth database (with abstracts) for analysis with NiMARE.

Warning

In August 2021, the Neurosynth database was reorganized according to a new file format. As such, the fetch_neurosynth function for NiMARE versions before 0.0.10 will not work with its default parameters. In order to download the Neurosynth database in its older format using NiMARE <= 0.0.9, do the following:

nimare.extract.fetch_neurosynth(
    url=(
        "https://github.com/neurosynth/neurosynth-data/blob/"
        "e8f27c4a9a44dbfbc0750366166ad2ba34ac72d6/current_data.tar.gz?raw=true"
    ),
)

For information about where these files will be downloaded to on your machine, see Fetching resources from the internet.

Start with the necessary imports

import os
from pprint import pprint

import nimare

Download Neurosynth

Neurosynth’s data files are stored at https://github.com/neurosynth/neurosynth-data.

out_dir = os.path.abspath("../example_data/")
os.makedirs(out_dir, exist_ok=True)

files = nimare.extract.fetch_neurosynth(
    data_dir=out_dir,
    version="7",
    overwrite=False,
    source="abstract",
    vocab="terms",
)
# Note that the files are saved to a new folder within "out_dir" named "neurosynth".
pprint(files)
neurosynth_db = files[0]

Convert Neurosynth database to NiMARE dataset file

neurosynth_dset = nimare.io.convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db["coordinates"],
    metadata_file=neurosynth_db["metadata"],
    annotations_files=neurosynth_db["features"],
)
neurosynth_dset.save(os.path.join(out_dir, "neurosynth_dataset.pkl.gz"))
print(neurosynth_dset)

Add article abstracts to dataset

This is only possible because Neurosynth uses PMIDs as study IDs.

Make sure you replace the example email address with your own.

neurosynth_dset = nimare.extract.download_abstracts(neurosynth_dset, "example@example.edu")
neurosynth_dset.save(os.path.join(out_dir, "neurosynth_dataset_with_abstracts.pkl.gz"))

Do the same with NeuroQuery

NeuroQuery’s data files are stored at https://github.com/neuroquery/neuroquery_data.

files = nimare.extract.fetch_neuroquery(
    data_dir=out_dir,
    version="1",
    overwrite=False,
    source="combined",
    vocab="neuroquery7547",
    type="tfidf",
)
# Note that the files are saved to a new folder within "out_dir" named "neuroquery".
pprint(files)
neuroquery_db = files[0]

# Note that the conversion function says "neurosynth".
# This is just for backwards compatibility.
neuroquery_dset = nimare.io.convert_neurosynth_to_dataset(
    coordinates_file=neuroquery_db["coordinates"],
    metadata_file=neuroquery_db["metadata"],
    annotations_files=neuroquery_db["features"],
)
neuroquery_dset.save(os.path.join(out_dir, "neuroquery_dataset.pkl.gz"))
print(neuroquery_dset)

# NeuroQuery also uses PMIDs as study IDs.
neuroquery_dset = nimare.extract.download_abstracts(neuroquery_dset, "example@example.edu")
neuroquery_dset.save(os.path.join(out_dir, "neuroquery_dataset_with_abstracts.pkl.gz"))

Total running time of the script: ( 0 minutes 0.000 seconds)

Gallery generated by Sphinx-Gallery