Note
Go to the end to download the full example code.
Neurosynth and NeuroQuery
Neurosynth and NeuroQuery are the two largest publicly-available coordinate-based databases. NiMARE includes functions for downloading releases of each database and converting the databases to NiMARE collections for analysis.
In this example, we download and convert the Neurosynth and NeuroQuery databases for analysis with NiMARE.
For most Neurosynth term-based workflows, including the decoding examples in NiMARE, you should
download only the abstract-derived term annotations by passing source="abstract" and
vocab="terms". Leaving these selectors unset downloads every available annotation set for the
release.
The selector keywords determine which annotation files are downloaded:
Keyword |
Meaning |
|---|---|
|
Text source used to generate the annotations. Neurosynth currently provides
|
|
Annotation vocabulary. |
|
Feature representation. |
Only the combinations below are valid for Neurosynth:
version |
source |
vocab |
type |
|---|---|---|---|
|
abstract |
terms |
tfidf |
|
abstract |
terms |
tfidf |
|
abstract |
LDA50 |
weight |
|
abstract |
LDA100 |
weight |
|
abstract |
LDA200 |
weight |
|
abstract |
LDA400 |
weight |
Warning
In August 2021, the Neurosynth database was reorganized according to a new file format.
As such, the fetch_neurosynth function for NiMARE versions before 0.0.10 will not work
with its default parameters.
In order to download the Neurosynth database in its older format using NiMARE <= 0.0.9,
do the following:
nimare.extract.fetch_neurosynth(
url=(
"https://github.com/neurosynth/neurosynth-data/blob/"
"e8f27c4a9a44dbfbc0750366166ad2ba34ac72d6/current_data.tar.gz?raw=true"
),
)
For information about where these files will be downloaded to on your machine, see Fetching resources from the internet.
Start with the necessary imports
import os
from pprint import pprint
from nimare.extract import download_abstracts, fetch_neuroquery, fetch_neurosynth
# biopython is unnecessary here, but is required by download_abstracts.
# We import it here only to document the dependency and cause an early failure if it's missing.
import Bio # pip install biopython
Download Neurosynth
Neurosynth’s data files are stored at https://github.com/neurosynth/neurosynth-data. For term-based workflows, use the abstract-derived term annotations instead of downloading every annotation set in the release.
out_dir = os.path.abspath("../example_data/")
os.makedirs(out_dir, exist_ok=True)
files = fetch_neurosynth(
data_dir=out_dir,
version="7",
overwrite=False,
source="abstract",
vocab="terms",
return_type="files",
)
# Note that the files are saved to a new folder within "out_dir" named "neurosynth".
pprint(files)
Download Neurosynth directly as a Studyset
Studysets are now the default return type. For the legacy Dataset return type,
pass return_type="dataset".
neurosynth_studyset = fetch_neurosynth(
data_dir=out_dir,
version="7",
overwrite=False,
source="abstract",
vocab="terms",
)[0]
Add article abstracts to Studyset
This is only possible because Neurosynth uses PMIDs as study IDs.
Make sure you replace the example email address with your own.
neurosynth_studyset = download_abstracts(neurosynth_studyset, "example@example.edu")
neurosynth_studyset.to_nimads(os.path.join(out_dir, "neurosynth_studyset.json"))
print(neurosynth_studyset)
Do the same with NeuroQuery
NeuroQuery’s data files are stored at https://github.com/neuroquery/neuroquery_data.
files = fetch_neuroquery(
data_dir=out_dir,
version="1",
overwrite=False,
source="combined",
vocab="neuroquery6308",
type="tfidf",
return_type="files",
)
# Note that the files are saved to a new folder within "out_dir" named "neuroquery".
pprint(files)
neuroquery_studyset = fetch_neuroquery(
data_dir=out_dir,
version="1",
overwrite=False,
source="combined",
vocab="neuroquery6308",
type="tfidf",
)[0]
# NeuroQuery also uses PMIDs as study IDs.
neuroquery_studyset = download_abstracts(neuroquery_studyset, "example@example.edu")
neuroquery_studyset.to_nimads(os.path.join(out_dir, "neuroquery_studyset.json"))
print(neuroquery_studyset)