LDA topic modeling

This example trains a latent Dirichlet allocation (LDA) model with scikit-learn on abstracts from Neurosynth.

import os

import pandas as pd

from nimare import annotate
from nimare.dataset import Dataset
from nimare.utils import get_resource_path

Load dataset with abstracts

dset = Dataset(os.path.join(get_resource_path(), "neurosynth_laird_studies.json"))

Initialize LDA model

model = annotate.lda.LDAModel(n_topics=5, max_iter=1000, text_column="abstract")

Run model

new_dset = model.fit(dset)

View results

The annotations DataFrame is very large, so we show only its first ten rows and columns.

           id  study_id  contrast_id  Neurosynth_TFIDF__001  Neurosynth_TFIDF__01  Neurosynth_TFIDF__05  Neurosynth_TFIDF__10  Neurosynth_TFIDF__100  Neurosynth_TFIDF__11  Neurosynth_TFIDF__12
0  17029760-1  17029760            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
1  18760263-1  18760263            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
2  19162389-1  19162389            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.176321                   0.0
3  19603407-1  19603407            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
4  20197097-1  20197097            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
5  22569543-1  22569543            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
6  22659444-1  22659444            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
7  23042731-1  23042731            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
8  23702412-1  23702412            1                    0.0                   0.0                   0.0              0.061006                    0.0              0.000000                   0.0
9  24681401-1  24681401            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
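
The call that produced this slice is not shown. One way to take such a slice with pandas is positional indexing with iloc. The sketch below uses a small stand-in frame built from values in the output above; the variable names are hypothetical and the frame is far smaller than the real one.

```python
import pandas as pd

# A few rows and columns copied from the output above.
annotations = pd.DataFrame(
    {
        "id": ["17029760-1", "18760263-1", "19162389-1"],
        "study_id": ["17029760", "18760263", "19162389"],
        "contrast_id": ["1", "1", "1"],
        "Neurosynth_TFIDF__10": [0.0, 0.0, 0.0],
        "Neurosynth_TFIDF__11": [0.0, 0.0, 0.176321],
    }
)

# Keep the first two rows and the first four columns by position.
preview = annotations.iloc[:2, :4]
print(preview)
```

With the full frame, the same pattern with larger bounds would give a ten-by-ten preview.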


This DataFrame is very wide (one column per term), so we transpose it and show only the first ten terms.

model.distributions_["p_topic_g_word_df"].T.head(10)
                     LDA5__1_motor_literature_published  LDA5__2_connectivity_functional_anterior  LDA5__3_connectivity_functional_cortex  LDA5__4_connectivity_functional_method  LDA5__5_human_functional_maps
10                                             0.001000                                  0.001000                                1.000914                                0.001000                       1.001086
abstract                                       0.001000                                  1.000831                                1.001169                                0.001000                       0.001000
action                                         0.001000                                  0.001000                                1.000962                                0.001000                       1.001038
active                                         1.001470                                  0.001000                                0.001000                                3.000530                       0.001000
addition                                       2.002122                                  0.999985                                2.000892                                0.001000                       0.001000
additionally                                   0.001000                                  0.001000                                0.001000                                1.001182                       1.000818
affective                                      0.001000                                  3.575280                                0.001000                                0.001000                       2.426720
affective processes                            0.001000                                  2.001000                                0.001000                                0.001000                       0.001000
ale                                            0.001000                                  0.001000                                0.001000                                1.000934                       1.001066
altered                                        0.001000                                  0.001000                                0.001000                                1.000634                       3.001366
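
The next table lists the ten highest-weighted tokens for each topic, but the code that built it is not shown. A minimal sketch of one way to derive such a table from a term-by-topic weight frame follows. It uses a few rows copied from the output above; the short column names and the variable names are placeholders, not the example's actual code.

```python
import pandas as pd

# Term-by-topic weights copied from a few rows of the output above;
# the short column names are placeholders for the full topic labels.
weights = pd.DataFrame(
    {
        "topic_1": [1.001470, 2.002122, 0.001000, 0.001000],
        "topic_2": [0.001000, 0.999985, 3.575280, 0.001000],
        "topic_5": [0.001000, 0.001000, 2.426720, 3.001366],
    },
    index=["active", "addition", "affective", "altered"],
)

n_top = 2  # number of tokens to keep per topic

# For each topic, sort terms by weight and keep the highest-weighted ones.
top_tokens = pd.DataFrame(
    {
        topic: weights[topic].sort_values(ascending=False).index[:n_top].tolist()
        for topic in weights.columns
    }
)
top_tokens.index.name = "Token"
print(top_tokens)
```

Each column is ranked independently, so the same token can appear under several topics, as "connectivity" and "functional" do in the table.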


       LDA5__1_motor_literature_published  LDA5__2_connectivity_functional_anterior  LDA5__3_connectivity_functional_cortex  LDA5__4_connectivity_functional_method  LDA5__5_human_functional_maps
Token
0                                   motor                              connectivity                            connectivity                            connectivity                          human
1                              literature                                functional                              functional                              functional                     functional
2                               published                                  anterior                                  cortex                                  method                           maps
3                                  cortex                                    social                                 methods                                   error                   connectivity
4                              identified                   functional connectivity                                    macm                                   using                       networks
5                             stimulation                                 posterior                              behavioral                                 network                       function
6                               talairach                                approaches                                     cbp                                 control                        lateral
7                                magnetic                                 processes                               cognition                                  active                         medial
8                              coordinate                                      seed                                  dorsal                            specifically                       involved
9                                     use                                    insula                            parcellation                                language                      cognition


