nimare.annotate.lda.LDAModel
- class LDAModel(text_df, text_column='abstract', n_topics=50, n_iters=1000, alpha='auto', beta=0.001)
Bases: nimare.base.NiMAREBase
Perform topic modeling using Latent Dirichlet Allocation (LDA).
Build an LDA [1] topic model with the Java toolbox MALLET [2], as performed in [3].
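For orientation, the generative model that LDA [1] assumes can be written as follows. This sketch assumes symmetric Dirichlet priors, so the scalar alpha and beta parameters apply equally to every topic and every word; K is n_topics and V is the vocabulary size. Reading θ_d as the per-document topic distribution (what p_topic_g_doc_ estimates) and φ_k as the per-topic word distribution (what p_word_g_topic_ estimates) is our interpretation of the attribute names, not something this page states.

```latex
\begin{aligned}
\phi_k &\sim \operatorname{Dirichlet}_V(\beta), && k = 1, \dots, K \\
\theta_d &\sim \operatorname{Dirichlet}_K(\alpha), && \text{for each document } d \\
z_{d,i} &\sim \operatorname{Categorical}(\theta_d), && \text{topic of token } i \text{ in document } d \\
w_{d,i} &\sim \operatorname{Categorical}(\phi_{z_{d,i}}), && \text{observed word of token } i
\end{aligned}
```

Smaller alpha favors documents dominated by a few topics, and smaller beta favors topics concentrated on a few words.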
- Parameters
  - text_df (pandas.DataFrame) – A pandas DataFrame with two columns (‘id’ and text_column) containing article text.
  - text_column (str, optional) – Name of the column in text_df that contains the text. Default is ‘abstract’.
  - n_topics (int, optional) – Number of topics to generate. Default is 50.
  - n_iters (int, optional) – Number of iterations to run when training the topic model. Default is 1000.
  - alpha (float or ‘auto’, optional) – The Dirichlet prior on the per-document topic distributions. Default is ‘auto’, which sets alpha to 50 / n_topics, following Poldrack et al. (2012) [3].
  - beta (float, optional) – The Dirichlet prior on the per-topic word distributions. Default is 0.001, following Poldrack et al. (2012) [3].
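The sketch below shows the input layout described above (an ‘id’ column plus a text column) and how the documented ‘auto’ default for alpha resolves to 50 / n_topics. resolve_alpha is a hypothetical helper written for this illustration, not part of NiMARE, and the ids and abstracts are made up. The commented lines use only the constructor signature documented on this page; running them requires NiMARE and MALLET.

```python
import pandas as pd


def resolve_alpha(alpha, n_topics):
    """Hypothetical helper: mirror the documented alpha default.

    alpha='auto' resolves to 50 / n_topics; any numeric value is used as-is.
    """
    if alpha == "auto":
        return 50.0 / n_topics
    return float(alpha)


# Minimal corpus in the documented layout: 'id' plus a text column, here the
# default text_column name, 'abstract'. Contents are invented for illustration.
text_df = pd.DataFrame(
    {
        "id": ["study-01", "study-02", "study-03"],
        "abstract": [
            "working memory load modulates dorsolateral prefrontal activity",
            "fear conditioning engages the amygdala and insula",
            "reward prediction errors are encoded in the ventral striatum",
        ],
    }
)

n_topics = 50
alpha = resolve_alpha("auto", n_topics)  # 50 / 50 = 1.0
beta = 0.001  # documented default for the per-topic word prior

# Usage with the documented signature (needs NiMARE and MALLET installed):
# from nimare.annotate.lda import LDAModel
# model = LDAModel(text_df, text_column="abstract", n_topics=n_topics, alpha="auto")
# model.fit()
```

With more topics, the ‘auto’ prior shrinks proportionally (for example, 100 topics gives alpha = 0.5), which keeps the total prior mass over topics fixed at 50.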
- Variables
  - commands_ (list of str) – List of MALLET commands called to fit the model.
References
- [1] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet allocation.” Journal of Machine Learning Research 3.Jan (2003): 993-1022.
- [2] McCallum, Andrew Kachites. “MALLET: A Machine Learning for Language Toolkit.” (2002).
- [3] Poldrack, Russell A., et al. “Discovering relations between mind, brain, and mental disorders using topic mapping.” PLoS Computational Biology 8.10 (2012): e1002707. https://doi.org/10.1371/journal.pcbi.1002707
See also
nimare.extract.download_mallet
This function will be called automatically to download MALLET.
- fit()
  Fit the LDA model to the corpus.
  - Variables
    - p_topic_g_doc_ (numpy.ndarray) – Probability of each topic given a document.
    - p_word_g_topic_ (numpy.ndarray) – Probability of each word given a topic.
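To make the two fitted quantities concrete, the sketch below runs a plain collapsed Gibbs sampler for LDA on a toy corpus of integer word ids and returns estimates of p(topic | document) and p(word | topic). This is a from-scratch illustration, not MALLET's implementation or NiMARE's post-processing. The function name gibbs_lda is hypothetical, and the array orientations, (n_docs, n_topics) and (n_words, n_topics), are assumptions of this sketch; check the shapes of the real p_topic_g_doc_ and p_word_g_topic_ before relying on them.

```python
import numpy as np


def gibbs_lda(docs, n_words, n_topics, alpha, beta, n_iters, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustration only, not MALLET).

    docs: list of lists of integer word ids in [0, n_words).
    Returns (p_topic_g_doc, p_word_g_topic) with assumed shapes
    (n_docs, n_topics) and (n_words, n_topics).
    """
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))  # token counts per (doc, topic)
    word_topic = np.zeros((n_words, n_topics))   # token counts per (word, topic)
    topic_total = np.zeros(n_topics)             # token counts per topic

    # Random initial topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = rng.integers(n_topics, size=len(doc))
        for w, k in zip(doc, z_doc):
            doc_topic[d, k] += 1
            word_topic[w, k] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            z_doc = assignments[d]
            for i, w in enumerate(doc):
                k = z_doc[i]
                # Remove the current token from all counts.
                doc_topic[d, k] -= 1
                word_topic[w, k] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = k | rest), up to a normalizing constant.
                weights = (doc_topic[d] + alpha) * (word_topic[w] + beta)
                weights /= topic_total + n_words * beta
                k = rng.choice(n_topics, p=weights / weights.sum())
                # Add the token back under its newly sampled topic.
                z_doc[i] = k
                doc_topic[d, k] += 1
                word_topic[w, k] += 1
                topic_total[k] += 1

    # Smoothed estimates from the final sample; rows / columns sum to 1.
    p_topic_g_doc = doc_topic + alpha
    p_topic_g_doc /= p_topic_g_doc.sum(axis=1, keepdims=True)
    p_word_g_topic = word_topic + beta
    p_word_g_topic /= p_word_g_topic.sum(axis=0, keepdims=True)
    return p_topic_g_doc, p_word_g_topic


docs = [[0, 1, 2, 0], [3, 4, 3, 5], [0, 2, 1]]
p_topic_g_doc, p_word_g_topic = gibbs_lda(
    docs, n_words=6, n_topics=2, alpha=50 / 2, beta=0.001, n_iters=50
)

# p(word | doc) is a mixture of topic-word distributions weighted by p(topic | doc).
p_word_g_doc = p_topic_g_doc @ p_word_g_topic.T  # shape (n_docs, n_words)

# Three most probable word ids for topic 0.
top_words_topic0 = np.argsort(-p_word_g_topic[:, 0])[:3]
```

The n_iters, alpha, and beta arguments play the roles of the constructor parameters of the same names; alpha here follows the documented ‘auto’ rule, 50 / n_topics.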