LDA topic modeling

This example trains a latent Dirichlet allocation (LDA) topic model, which NiMARE fits with scikit-learn, on abstracts from Neurosynth studies.

import os

import pandas as pd

from nimare import annotate
from nimare.dataset import Dataset
from nimare.utils import get_resource_path

Load dataset with abstracts

dset = Dataset(os.path.join(get_resource_path(), "neurosynth_laird_studies.json"))

Initialize LDA model

model = annotate.lda.LDAModel(n_topics=5, max_iter=1000, text_column="abstract")

Run model

new_dset = model.fit(dset)

Out:

/home/docs/checkouts/readthedocs.org/user_builds/nimare/envs/0.0.11/lib/python3.7/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
  warnings.warn(msg, category=FutureWarning)

View results

The annotations DataFrame is very large, so we will only show a slice of it: the first ten rows and the first ten columns.

           id  study_id  contrast_id  Neurosynth_TFIDF__001  Neurosynth_TFIDF__01  Neurosynth_TFIDF__05  Neurosynth_TFIDF__10  Neurosynth_TFIDF__100  Neurosynth_TFIDF__11  Neurosynth_TFIDF__12
0  17029760-1  17029760            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
1  18760263-1  18760263            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
2  19162389-1  19162389            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.176321                   0.0
3  19603407-1  19603407            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
4  20197097-1  20197097            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
5  22569543-1  22569543            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
6  22659444-1  22659444            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
7  23042731-1  23042731            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
8  23702412-1  23702412            1                    0.0                   0.0                   0.0              0.061006                    0.0              0.000000                   0.0
9  24681401-1  24681401            1                    0.0                   0.0                   0.0              0.000000                    0.0              0.000000                   0.0
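
A preview like the one above can be produced with plain pandas indexing. The sketch below uses a randomly filled stand-in DataFrame with hypothetical column names; with the fitted dataset, the frame being sliced would presumably be `new_dset.annotations`, which is an assumption here since the original call is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a wide annotations table: 20 rows, 50 columns.
rng = np.random.default_rng(0)
wide = pd.DataFrame(
    rng.random((20, 50)),
    columns=[f"term_{i:02d}" for i in range(50)],
)

# Keep the first ten columns, then the first ten rows.
preview = wide[wide.columns[:10]].head(10)
print(preview.shape)
```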


The DataFrame of topic-by-term weights is very wide (one column per term), so we transpose it before presenting it. After transposing, each row is a term and each column is a topic.

model.distributions_["p_topic_g_word_df"].T.head(10)
                      LDA5__1   LDA5__2   LDA5__3   LDA5__4   LDA5__5
10                   0.001000  1.001110  1.000890  0.001000  0.001000
abstract             0.001000  0.001000  2.001000  0.001000  0.001000
action               0.001000  1.001501  0.001000  0.001000  1.000499
active               3.001772  0.001000  0.001000  0.001000  1.000228
addition             2.001377  0.001000  0.001000  1.002053  1.999570
additionally         0.001000  0.001000  1.001270  0.001000  1.000730
affective            0.001000  1.001370  2.001307  0.001000  3.000323
affective processes  0.001000  0.001000  0.001000  0.001000  2.001000
ale                  0.001000  0.001000  0.001000  2.001000  0.001000
altered              0.001000  0.001000  1.001019  0.001000  3.000981


Finally, we list the ten highest-weighted terms for each topic.

Token  LDA5__1            LDA5__2        LDA5__3       LDA5__4                LDA5__5
0      cortex             human          social        literature             connectivity
1      prefrontal         maps           functional    talairach              functional
2      identified         cortex         cbp           suggest                functional connectivity
3      motor              functional     frontal       coordinate             macm
4      lateral            probabilistic  network       estimation             anterior
5      published          cognition      parcellation  likelihood estimation  cognitive
6      stimulation        cognitive      analytic      likelihood             structural
7      prefrontal cortex  frontal        connectivity  reported               networks
8      active             medial         processes     ale                    posterior
9      parietal           new            hemisphere    estimation ale         approaches
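
A per-topic ranking of terms like the table above can be derived from a term-by-topic weight table by sorting each topic's column in descending order and keeping the leading index labels. The following is a minimal sketch of that idea on a small hypothetical table; the weights, the topic names, and the `n_top_terms` variable are invented for illustration and are not the example's own code.

```python
import pandas as pd

# Hypothetical term-by-topic weights (rows: terms, columns: topics).
weights = pd.DataFrame(
    {
        "LDA2__1": [3.0, 0.5, 2.0, 0.1],
        "LDA2__2": [0.2, 4.0, 0.3, 1.5],
    },
    index=["cortex", "connectivity", "motor", "social"],
)

n_top_terms = 3

# For each topic, sort terms by weight (largest first) and keep the top few.
top_terms = pd.DataFrame(
    {
        topic: weights[topic].sort_values(ascending=False).index[:n_top_terms].tolist()
        for topic in weights.columns
    }
)
top_terms.index.name = "Token"
print(top_terms)
```

With the fitted model, the transposed `model.distributions_["p_topic_g_word_df"]` shown earlier has the same layout as `weights` here, so the same ranking step would apply to it.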


Total running time of the script: ( 0 minutes 3.526 seconds)
