nimare.annotate.lda.LDAModel
- class LDAModel(n_topics, max_iter=1000, alpha=None, beta=0.001, text_column='abstract', n_cores=1)
Bases: NiMAREBase
Generate a latent Dirichlet allocation (LDA) topic model.
This class is a light wrapper around scikit-learn tools for tokenization and LDA.
- Parameters:
n_topics (int) – Number of topics for the topic model. This corresponds to the model’s n_components parameter. Must be an integer >= 1.
max_iter (int, optional) – Maximum number of iterations to use during model fitting. Default = 1000.
alpha (float or None, optional) – The alpha value for the model. This corresponds to the model’s doc_topic_prior parameter. Default is None, which evaluates to 1 / n_topics, as was used in Poldrack et al.[1].
beta (float or None, optional) – The beta value for the model. This corresponds to the model’s topic_word_prior parameter. If None, it evaluates to 1 / n_topics. Default is 0.001, which was used in Poldrack et al.[1].
text_column (str, optional) – The source of text to use for the model. This should correspond to an existing column in the texts attribute. Default is “abstract”.
n_cores (int, optional) – Number of cores to use for parallelization. If <=0, defaults to using all available cores. Default is 1.
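The default-resolution rules described in the parameter list can be sketched in plain Python. The helper name `resolve_lda_settings` is hypothetical and is not part of NiMARE; it only mirrors the documented behavior (a prior left as None becomes 1 / n_topics, and a non-positive n_cores means all available cores).

```python
import os


def resolve_lda_settings(n_topics, alpha=None, beta=0.001, n_cores=1):
    """Hypothetical helper mirroring the documented defaults; not NiMARE code."""
    if not isinstance(n_topics, int) or n_topics < 1:
        raise ValueError("n_topics must be an integer >= 1")
    # A prior left as None falls back to 1 / n_topics.
    doc_topic_prior = 1.0 / n_topics if alpha is None else alpha
    topic_word_prior = 1.0 / n_topics if beta is None else beta
    # A non-positive n_cores is read as "use every available core".
    if n_cores <= 0:
        n_cores = os.cpu_count() or 1
    return {
        "n_components": n_topics,
        "doc_topic_prior": doc_topic_prior,
        "topic_word_prior": topic_word_prior,
        "n_cores": n_cores,
    }


settings = resolve_lda_settings(n_topics=50)
print(settings)
```

With the defaults, the two priors differ: the document-topic prior scales with the number of topics, while the topic-word prior stays fixed at 0.001 unless beta is set to None.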
- Variables:
model (LatentDirichletAllocation)
Notes
Latent Dirichlet allocation was first developed in Blei et al.[2], and was first applied to neuroimaging articles in Poldrack et al.[1].
References
See also
CountVectorizer
Used to build a vocabulary of terms and their associated counts from texts in the self.text_column of the Dataset’s texts attribute.
LatentDirichletAllocation
Used to train the LDA model.
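As a rough illustration of how these two scikit-learn classes fit together, the sketch below counts terms in a handful of invented abstracts and fits an LDA model using the parameter mapping described above. It is not NiMARE's actual implementation: the toy texts, the small max_iter, and the fixed random_state are assumptions made for the example, and any tokenization options NiMARE may pass to CountVectorizer are not shown.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy abstracts standing in for a Dataset's text column (invented for illustration).
abstracts = [
    "working memory task activates prefrontal cortex",
    "reward prediction error in ventral striatum",
    "prefrontal cortex supports working memory maintenance",
    "striatum responds to reward and punishment",
]

n_topics = 2

# Build the vocabulary and the document-term count matrix.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(abstracts)

# Map the documented arguments onto scikit-learn's parameter names.
lda = LatentDirichletAllocation(
    n_components=n_topics,           # n_topics
    max_iter=100,                    # max_iter (kept small for the toy data)
    doc_topic_prior=1.0 / n_topics,  # alpha=None evaluates to 1 / n_topics
    topic_word_prior=0.001,          # beta default
    random_state=0,                  # assumption: fixed seed for repeatability
)
doc_topic = lda.fit_transform(counts)  # shape (n_documents, n_topics)
topic_word = lda.components_           # shape (n_topics, n_tokens)

print(doc_topic.shape, topic_word.shape)
```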
Methods
fit(dset) – Fit the LDA topic model to text from a Dataset.
get_params([deep]) – Get parameters for this estimator.
load(filename[, compressed]) – Load a pickled class instance from file.
save(filename[, compress]) – Pickle the class instance to the provided file.
set_params(**params) – Set the parameters of this estimator.
- fit(dset)
Fit the LDA topic model to text from a Dataset.
- Parameters:
dset (Dataset) – A Dataset with, at minimum, text available in the self.text_column column of its texts attribute.
- Returns:
dset – A new Dataset with an updated annotations attribute.
- Return type:
Dataset
- Variables:
distributions (dict) – A dictionary containing additional distributions produced by the model, including:
p_topic_g_word: numpy.ndarray of shape (n_topics, n_tokens) containing the topic-term weights for the model.
p_topic_g_word_df: pandas.DataFrame of shape (n_topics, n_tokens) containing the topic-term weights for the model.
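A common use of a (n_topics, n_tokens) weight matrix is to list the highest-weighted terms for each topic. The sketch below does this with NumPy on invented data; the token list and weights are made up for illustration, and how the token order is recovered from a fitted model is not shown here.

```python
import numpy as np

# Invented vocabulary and topic-term weights of shape (n_topics, n_tokens).
tokens = np.array(["memory", "reward", "cortex", "striatum"])
p_topic_g_word = np.array(
    [
        [0.60, 0.05, 0.30, 0.05],
        [0.05, 0.55, 0.05, 0.35],
    ]
)

# For each topic, sort token indices by descending weight and keep the top two.
top_idx = np.argsort(-p_topic_g_word, axis=1)[:, :2]
top_terms = tokens[top_idx]

print(top_terms.tolist())
```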
- classmethod load(filename, compressed=True)
Load a pickled class instance from file.
- Parameters:
- Returns:
obj – Loaded class object.
- Return type:
class object
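The save/load round trip can be sketched with the standard library alone. This is an illustration under stated assumptions, not NiMARE's code: the helper names `save_object` and `load_object` are hypothetical, gzip is assumed as the compression format, and a plain dict stands in for a fitted class instance.

```python
import gzip
import os
import pickle
import tempfile


def save_object(obj, filename, compress=True):
    """Hypothetical helper: pickle obj to filename, optionally gzip-compressed."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as fobj:
        pickle.dump(obj, fobj)


def load_object(filename, compressed=True):
    """Hypothetical helper: read back an object written by save_object."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as fobj:
        return pickle.load(fobj)


# A plain dict stands in for a fitted model instance.
state = {"n_topics": 10, "text_column": "abstract"}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.pkl.gz")
    save_object(state, path, compress=True)
    restored = load_object(path, compressed=True)

print(restored == state)
```

In this sketch, the flag passed when loading has to match the one used when saving; reading a gzip-compressed file as a plain pickle would fail.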
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Return type:
self
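The <component>__<parameter> naming convention can be illustrated with a toy class. `ToyEstimator` is hypothetical and far simpler than the real scikit-learn machinery; it only shows how a double-underscore key is split and routed to a nested object, and why returning self allows chained calls.

```python
class ToyEstimator:
    """Hypothetical estimator; not scikit-learn or NiMARE code."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "component__parameter" on the first double underscore.
            name, sep, sub_key = key.partition("__")
            if sep:
                # Delegate the remainder of the key to the nested component.
                self.params[name].set_params(**{sub_key: value})
            else:
                self.params[name] = value
        return self


vectorizer = ToyEstimator(min_df=1)
pipeline = ToyEstimator(vectorizer=vectorizer, n_topics=10)

# One call updates a top-level parameter and a parameter of the nested component.
pipeline.set_params(n_topics=20, vectorizer__min_df=5)

print(pipeline.params["n_topics"], vectorizer.params["min_df"])
```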