LDA topic modeling

This example trains a latent Dirichlet allocation (LDA) model with scikit-learn, using abstracts from Neurosynth.

import os

import pandas as pd

from nimare import annotate
from nimare.dataset import Dataset
from nimare.utils import get_resource_path

Load dataset with abstracts

dset = Dataset(os.path.join(get_resource_path(), "neurosynth_laird_studies.json"))

Initialize LDA model

model = annotate.lda.LDAModel(n_topics=5, max_iter=1000, text_column="abstract")

Run model

new_dset = model.fit(dset)

View results

The annotations DataFrame of the fitted dataset is very large, so we show only a slice of it.
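
One way to take such a slice is sketched below, assuming the table is a pandas DataFrame (in NiMARE, the `annotations` attribute of the dataset). The `preview` helper and the toy frame are hypothetical names introduced for illustration.

```python
import pandas as pd


def preview(df: pd.DataFrame, n_rows: int = 10, n_cols: int = 10) -> pd.DataFrame:
    """Return the top-left corner of a large DataFrame by position."""
    return df.iloc[:n_rows, :n_cols]


# Toy stand-in for a wide annotations table: 25 rows, 200 term columns.
wide = pd.DataFrame({f"term_{i:03d}": [0.0] * 25 for i in range(200)})

small = preview(wide)
# With the fitted dataset, the equivalent call would be preview(new_dset.annotations).
```

Applied to the annotations table, a slice like this gives output of the following shape.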

   id          study_id  contrast_id  Neurosynth_TFIDF__001  Neurosynth_TFIDF__01  Neurosynth_TFIDF__05  Neurosynth_TFIDF__10  Neurosynth_TFIDF__100  Neurosynth_TFIDF__11  Neurosynth_TFIDF__12
0  17029760-1  17029760  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.000000              0.0
1  18760263-1  18760263  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.000000              0.0
2  19162389-1  19162389  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.176321              0.0
3  19603407-1  19603407  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.000000              0.0
4  20197097-1  20197097  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.000000              0.0
5  22569543-1  22569543  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.000000              0.0
6  22659444-1  22659444  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.000000              0.0
7  23042731-1  23042731  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.000000              0.0
8  23702412-1  23702412  1            0.0                    0.0                   0.0                   0.061006              0.0                    0.000000              0.0
9  24681401-1  24681401  1            0.0                    0.0                   0.0                   0.000000              0.0                    0.000000              0.0


The topic-by-term DataFrame stored under "p_topic_g_word_df" is very wide (one column per term), so we transpose it before displaying the first 10 rows.

model.distributions_["p_topic_g_word_df"].T.head(10)
                     LDA5__1_cortex_prefrontal_prefrontal cortex  LDA5__2_connectivity_functional_anterior  LDA5__3_connectivity_posterior_functional connectivity  LDA5__4_functional_social_human  LDA5__5_10_neuropsychiatric_new
10                   0.001000                                     0.001000                                  0.001                                                   2.001000                         0.001
abstract             0.001000                                     0.001000                                  0.001                                                   2.001000                         0.001
action               0.001000                                     1.000876                                  0.001                                                   1.001124                         0.001
active               2.816191                                     1.185809                                  0.001                                                   0.001000                         0.001
addition             0.001000                                     4.001138                                  0.001                                                   1.000862                         0.001
additionally         0.001000                                     1.000883                                  0.001                                                   1.001117                         0.001
affective            0.001000                                     3.000612                                  0.001                                                   3.001388                         0.001
affective processes  0.001000                                     2.001000                                  0.001                                                   0.001000                         0.001
ale                  0.001000                                     2.001000                                  0.001                                                   0.001000                         0.001
altered              0.001000                                     3.001219                                  0.001                                                   1.000781                         0.001
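
A table of weights like this is easier to interpret once it is reduced to the highest-weighted terms per topic. The sketch below shows one way to do that with pandas, assuming a DataFrame with terms as rows and topics as columns. The `top_terms_per_topic` helper and the toy weights are hypothetical and are not part of NiMARE.

```python
import pandas as pd


def top_terms_per_topic(word_topic_df: pd.DataFrame, n_top_terms: int = 10) -> pd.DataFrame:
    """List the highest-weighted terms for each topic column, best first."""
    top = {
        topic: word_topic_df[topic].nlargest(n_top_terms).index.tolist()
        for topic in word_topic_df.columns
    }
    out = pd.DataFrame(top)
    out.index.name = "Token"  # the row index is the rank of the term within a topic
    return out


# Toy weights (made-up numbers): terms as rows, topics as columns.
weights = pd.DataFrame(
    {"topic_a": [0.001, 2.8, 1.2, 0.5], "topic_b": [3.0, 0.001, 1.0, 2.0]},
    index=["ale", "cortex", "prefrontal", "insula"],
)

top = top_terms_per_topic(weights, n_top_terms=2)
# With the fitted model, the input would be model.distributions_["p_topic_g_word_df"].T
```

Each column of the result lists terms in descending order of weight, so row 0 holds the single most characteristic term of every topic.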


Top 10 terms for each topic, ranked by weight:

Token  LDA5__1_cortex_prefrontal_prefrontal cortex  LDA5__2_connectivity_functional_anterior  LDA5__3_connectivity_posterior_functional connectivity  LDA5__4_functional_social_human  LDA5__5_10_neuropsychiatric_new
0      cortex                                       connectivity                              connectivity                                            functional                       10
1      prefrontal                                   functional                                posterior                                               social                           pattern
2      prefrontal cortex                            anterior                                  functional connectivity                                 human                            neuropsychiatric
3      cognition                                    macm                                      functional                                              cognitive                        new
4      parietal                                     structural                                human                                                   connectivity                     non
5      active                                       motor                                     cognitive                                               frontal                          nonhuman
6      lobe                                         insula                                    seed                                                    functions                        nonhuman primates
7      language                                     approaches                                method                                                  maps                             number
8      important                                    networks                                  anterior                                                cbp                              obtained
9      non                                          patterns                                  cluster                                                 network                          order

