nimare.annotate.lda.LDAModel
- class LDAModel(n_topics, max_iter=1000, alpha=None, beta=0.001, text_column='abstract', n_cores=1)
Bases: NiMAREBase
Generate a latent Dirichlet allocation (LDA) topic model.
This class is a light wrapper around scikit-learn tools for tokenization and LDA.
- Parameters:
n_topics (int) – Number of topics for topic model. This corresponds to the model’s n_components parameter. Must be an integer >= 1.
max_iter (int, optional) – Maximum number of iterations to use during model fitting. Default = 1000.
alpha (float or None, optional) – The alpha value for the model. This corresponds to the model’s doc_topic_prior parameter. Default is None, which evaluates to 1 / n_topics, as was used in Poldrack et al. [1].
beta (float or None, optional) – The beta value for the model. This corresponds to the model’s topic_word_prior parameter. If None, it evaluates to 1 / n_topics. Default is 0.001, which was used in Poldrack et al. [1].
text_column (str, optional) – The source of text to use for the model. This should correspond to an existing column in the collection’s texts table. Default is “abstract”.
n_cores (int, optional) – Number of cores to use for parallelization. If <=0, defaults to using all available cores. Default is 1.
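The default rules above can be sketched in plain Python. This is a minimal illustration, not NiMARE's actual code: the helper name `resolve_lda_defaults` is hypothetical, and forwarding `n_cores` to scikit-learn's `n_jobs` argument is an assumption.

```python
import os


def resolve_lda_defaults(n_topics, alpha=None, beta=0.001, n_cores=1):
    """Hypothetical helper: apply the documented default rules."""
    if not isinstance(n_topics, int) or n_topics < 1:
        raise ValueError("n_topics must be an integer >= 1")
    # None falls back to 1 / n_topics for both priors, per the parameter docs.
    alpha = 1.0 / n_topics if alpha is None else alpha
    beta = 1.0 / n_topics if beta is None else beta
    # Values <= 0 mean "use every available core".
    if n_cores <= 0:
        n_cores = os.cpu_count() or 1
    # Keys are the scikit-learn parameter names named in the docs above;
    # "n_jobs" is an assumed target for n_cores.
    return {
        "n_components": n_topics,
        "doc_topic_prior": alpha,
        "topic_word_prior": beta,
        "n_jobs": n_cores,
    }


resolved = resolve_lda_defaults(n_topics=50)
```

With the class defaults and `n_topics=50`, the document-topic prior resolves to 1 / 50 while the topic-word prior stays at 0.001.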
- Variables:
model (LatentDirichletAllocation)
Notes
Latent Dirichlet allocation was first developed in Blei et al.[2], and was first applied to neuroimaging articles in Poldrack et al.[1].
References
See also
CountVectorizer – Used to build a vocabulary of terms and their associated counts from texts in the self.text_column of the input Studyset/Dataset collection’s texts attribute.
LatentDirichletAllocation – Used to train the LDA model.
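The two scikit-learn tools listed here can be combined directly, which may help clarify what the wrapper does. The sketch below is an assumption about the general flow (count tokens, then fit LDA), not a copy of NiMARE's implementation. The toy texts and the small `max_iter` are illustrative only.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy abstracts standing in for the text_column of a real collection.
texts = [
    "working memory load in prefrontal cortex",
    "visual cortex responses to moving stimuli",
    "reward prediction error in the striatum",
    "prefrontal cortex and reward based decision making",
]
n_topics = 2

# Step 1: build the vocabulary and the document-term count matrix.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)

# Step 2: train LDA with the priors described for alpha and beta.
lda = LatentDirichletAllocation(
    n_components=n_topics,
    max_iter=10,  # kept small for the toy example; the class default is 1000
    doc_topic_prior=1.0 / n_topics,
    topic_word_prior=0.001,
    random_state=0,
)
doc_topic = lda.fit_transform(counts)

n_tokens = len(vectorizer.vocabulary_)
print(doc_topic.shape)        # (number of documents, n_topics)
print(lda.components_.shape)  # (n_topics, n_tokens)
```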
Methods
fit(dset) – Fit the LDA topic model to text from a Studyset/Dataset collection.
get_params([deep]) – Get parameters for this estimator.
load(filename[, compressed]) – Load a pickled class instance from file.
save(filename[, compress]) – Pickle the class instance to the provided file.
set_params(**params) – Set the parameters of this estimator.
- fit(dset)
Fit the LDA topic model to text from a Studyset/Dataset collection.
- Parameters:
dset (Studyset or Dataset) – A Studyset-backed collection with, at minimum, text available in the self.text_column column of its texts table.
- Returns:
dset – A new object with updated analysis-level annotations.
- Return type:
same type as input when possible
- Variables:
distributions (dict) – A dictionary containing additional distributions produced by the model, including:
p_topic_g_word: numpy.ndarray of shape (n_topics, n_tokens) containing the topic-term weights for the model.
p_topic_g_word_df: pandas.DataFrame of shape (n_topics, n_tokens) containing the topic-term weights for the model.
Warning: Support for Dataset inputs is deprecated and will be removed in a future release. Prefer Studyset.
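To illustrate the relationship between the two entries of `distributions`, here is a small sketch that wraps a topic-term weight array in a labeled DataFrame. The weights, the token list, and the `topic_<i>` row labels are all made up for illustration. How the class actually labels rows and columns is not stated here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 2 topics over a 3-token vocabulary.
tokens = ["cortex", "memory", "reward"]
p_topic_g_word = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
])

# Same values, shape (n_topics, n_tokens), with labels attached.
p_topic_g_word_df = pd.DataFrame(
    p_topic_g_word,
    index=[f"topic_{i}" for i in range(p_topic_g_word.shape[0])],  # hypothetical labels
    columns=tokens,
)

# Highest-weighted token for each topic.
top_token = p_topic_g_word_df.idxmax(axis=1)
```

The labeled form makes lookups such as "which token dominates each topic" a one-liner, while the bare array is convenient for numerical work.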
- classmethod load(filename, compressed=True)
Load a pickled class instance from file.
- Parameters:
- Returns:
obj – Loaded class object.
- Return type:
class object
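As a rough picture of what a pickle-based save/load round trip with an optional compression flag can look like, consider the sketch below. It assumes gzip as the compression scheme, which this page does not state. `save_obj` and `load_obj` are hypothetical stand-ins, not the class's methods.

```python
import gzip
import os
import pickle
import tempfile


def save_obj(obj, filename, compress=True):
    """Hypothetical stand-in for save(): pickle obj, optionally gzipped."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as f:
        pickle.dump(obj, f)


def load_obj(filename, compressed=True):
    """Hypothetical stand-in for load(): the flag must match how the file was written."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as f:
        return pickle.load(f)


settings = {"n_topics": 50, "beta": 0.001}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl.gz")
    save_obj(settings, path, compress=True)
    restored = load_obj(path, compressed=True)
```

The practical point is that the `compressed` flag at load time should match the `compress` flag used when saving. As with any pickle, only load files from sources you trust.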
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Return type:
self
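The `<component>__<parameter>` form is easiest to see on a plain scikit-learn pipeline. The example below uses scikit-learn's own `Pipeline` rather than `LDAModel`, and the step names `vec` and `lda` are arbitrary choices for illustration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("vec", CountVectorizer()),
    ("lda", LatentDirichletAllocation(n_components=10)),
])

# "<component>__<parameter>" routes each value to the named step.
returned = pipe.set_params(lda__n_components=5, vec__lowercase=False)

# get_params() reports nested parameters under the same naming scheme.
params = pipe.get_params()
```

Because `set_params` returns the estimator itself, calls can be chained, for example `pipe.set_params(...).fit(texts)`.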