Neurosynth and NeuroQuery

Neurosynth and NeuroQuery are the two largest publicly available coordinate-based databases. NiMARE includes functions for downloading releases of each database and converting them to NiMARE collections for analysis.

In this example, we download and convert the Neurosynth and NeuroQuery databases for analysis with NiMARE.

For most Neurosynth term-based workflows, including the decoding examples in NiMARE, you should download only the abstract-derived term annotations by passing source="abstract" and vocab="terms". Leaving these selectors unset downloads every available annotation set for the release.

The selector keywords determine which annotation files are downloaded:

Keyword	Meaning
`source`	Text source used to generate the annotations. Neurosynth currently provides `"abstract"`.
`vocab`	Annotation vocabulary. `"terms"` selects term-level tf-idf features, while `"LDA50"`, `"LDA100"`, `"LDA200"`, and `"LDA400"` select topic-model vocabularies.
`type`	Feature representation. `"tfidf"` is used for `"terms"`, while `"weight"` is used for the LDA vocabularies.

Only the combinations below are valid for Neurosynth:

version	source	vocab	type
`3`-`5`	abstract	terms	tfidf
`6`-`7`	abstract	terms	tfidf
`6`-`7`	abstract	LDA50	weight
`6`-`7`	abstract	LDA100	weight
`6`-`7`	abstract	LDA200	weight
`6`-`7`	abstract	LDA400	weight
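The table above can be encoded as a small lookup to check a selector combination before starting a download. This is only a minimal sketch: `VALID_NEUROSYNTH_SELECTIONS` and `is_valid_selection` are hypothetical names used for illustration and are not part of NiMARE, and the version ranges in the table are assumed to be inclusive.

```python
# Hypothetical helper, not part of NiMARE: encodes the table of valid
# Neurosynth selector combinations shown above.
LDA_VOCABS = ("LDA50", "LDA100", "LDA200", "LDA400")

# Assumption: the ranges "3-5" and "6-7" in the table are inclusive.
VALID_NEUROSYNTH_SELECTIONS = {}
for version in ("3", "4", "5", "6", "7"):
    combos = {("abstract", "terms", "tfidf")}
    if version in ("6", "7"):
        # Topic-model vocabularies are only listed for versions 6 and 7.
        combos.update(("abstract", vocab, "weight") for vocab in LDA_VOCABS)
    VALID_NEUROSYNTH_SELECTIONS[version] = combos


def is_valid_selection(version, source, vocab, type_):
    """Return True if the combination appears in the table above."""
    valid = VALID_NEUROSYNTH_SELECTIONS.get(str(version), set())
    return (source, vocab, type_) in valid


print(is_valid_selection("7", "abstract", "LDA50", "weight"))  # True
print(is_valid_selection("5", "abstract", "LDA50", "weight"))  # False
```

Checking the combination up front gives a clearer error than a failed download would.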

Warning

In August 2021, the Neurosynth database was reorganized into a new file format. As a result, the fetch_neurosynth function in NiMARE versions before 0.0.10 does not work with its default parameters. To download the Neurosynth database in its older format using NiMARE <= 0.0.9, do the following:

nimare.extract.fetch_neurosynth(
    url=(
        "https://github.com/neurosynth/neurosynth-data/blob/"
        "e8f27c4a9a44dbfbc0750366166ad2ba34ac72d6/current_data.tar.gz?raw=true"
    ),
)
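To decide programmatically whether the legacy URL is needed, the installed version can be compared against 0.0.10. The sketch below uses only the standard library. `parse_release` and `needs_legacy_url` are hypothetical helpers, not NiMARE functions, and the parsing is deliberately simple: it assumes a plain `X.Y.Z` prefix and ignores any pre-release or development suffix.

```python
def parse_release(version_string):
    """Parse the leading numeric 'X.Y.Z' part of a version string into a tuple of ints."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_legacy_url(nimare_version):
    """Return True for NiMARE versions before 0.0.10 (hypothetical helper)."""
    return parse_release(nimare_version) < (0, 0, 10)


print(needs_legacy_url("0.0.9"))   # True
print(needs_legacy_url("0.0.10"))  # False
```

In practice the argument would typically be the installed package's version string, for example `nimare.__version__`.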

For information about where these files will be downloaded to on your machine, see Fetching resources from the internet.

Start with the necessary imports

import os
from pprint import pprint

from nimare.extract import download_abstracts, fetch_neuroquery, fetch_neurosynth

# biopython is not used directly in this example, but download_abstracts requires it.
# We import it here only to document the dependency and to fail early if it is missing.
import Bio  # pip install biopython

Download Neurosynth

Neurosynth’s data files are stored at https://github.com/neurosynth/neurosynth-data. For term-based workflows, use the abstract-derived term annotations instead of downloading every annotation set in the release.

out_dir = os.path.abspath("../example_data/")
os.makedirs(out_dir, exist_ok=True)

files = fetch_neurosynth(
    data_dir=out_dir,
    version="7",
    overwrite=False,
    source="abstract",
    vocab="terms",
    return_type="files",
)
# Note that the files are saved to a new folder within "out_dir" named "neurosynth".
pprint(files)
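Before moving on, it can be useful to confirm that every file reported in the printed structure actually exists on disk. The sketch below walks an arbitrary nesting of dicts and lists and reports missing paths. `iter_paths` and `missing_files` are hypothetical helpers, and the sketch assumes that every string in the returned structure is a file path, which may not hold for every NiMARE version.

```python
import os


def iter_paths(obj):
    """Yield every string found in a nested structure of dicts, lists, and tuples."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for value in obj.values():
            yield from iter_paths(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from iter_paths(item)


def missing_files(obj):
    """Return the strings in obj that do not point to an existing file.

    Assumption: every string in the structure is meant to be a file path.
    """
    return [path for path in iter_paths(obj) if not os.path.isfile(path)]


# Illustrative stand-in for the structure printed above; the keys and
# paths here are made up and are not NiMARE's actual output.
example = [{"coordinates": "/no/such/coordinates.tsv.gz", "features": []}]
print(missing_files(example))  # ['/no/such/coordinates.tsv.gz']
```

With the real download, `missing_files(files)` would return an empty list if every reported file is present.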

Download Neurosynth directly as a Studyset

Studysets are now the default return type. For the legacy Dataset return type, pass return_type="dataset".

neurosynth_studyset = fetch_neurosynth(
    data_dir=out_dir,
    version="7",
    overwrite=False,
    source="abstract",
    vocab="terms",
)[0]

Add article abstracts to Studyset

Downloading abstracts is only possible because Neurosynth uses PMIDs as study IDs.

Make sure you replace the example email address with your own.

neurosynth_studyset = download_abstracts(neurosynth_studyset, "example@example.edu")
neurosynth_studyset.to_nimads(os.path.join(out_dir, "neurosynth_studyset.json"))
print(neurosynth_studyset)
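Rather than hard-coding the address passed to download_abstracts, one option is to read it from the environment and refuse the placeholder. This is only a sketch: the `NIMARE_EMAIL` variable name and the `get_contact_email` helper are assumptions made for illustration and are not something NiMARE defines or reads.

```python
import os

PLACEHOLDER_EMAIL = "example@example.edu"


def get_contact_email(env_var="NIMARE_EMAIL"):
    """Read a contact email from the environment (hypothetical helper).

    Raises ValueError if the variable is unset, is still the placeholder
    address, or does not contain an "@".
    """
    email = os.environ.get(env_var, "").strip()
    if not email or email == PLACEHOLDER_EMAIL or "@" not in email:
        raise ValueError(
            f"Set the {env_var} environment variable to your own email address."
        )
    return email


# Usage (not executed here because it depends on your environment):
# neurosynth_studyset = download_abstracts(neurosynth_studyset, get_contact_email())
```

Failing early this way avoids sending requests under the placeholder address.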

Do the same with NeuroQuery

NeuroQuery’s data files are stored at https://github.com/neuroquery/neuroquery_data.

files = fetch_neuroquery(
    data_dir=out_dir,
    version="1",
    overwrite=False,
    source="combined",
    vocab="neuroquery6308",
    type="tfidf",
    return_type="files",
)
# Note that the files are saved to a new folder within "out_dir" named "neuroquery".
pprint(files)

neuroquery_studyset = fetch_neuroquery(
    data_dir=out_dir,
    version="1",
    overwrite=False,
    source="combined",
    vocab="neuroquery6308",
    type="tfidf",
)[0]

# NeuroQuery also uses PMIDs as study IDs.
neuroquery_studyset = download_abstracts(neuroquery_studyset, "example@example.edu")
neuroquery_studyset.to_nimads(os.path.join(out_dir, "neuroquery_studyset.json"))
print(neuroquery_studyset)
