nimare.annotate.lda.LDAModel

class LDAModel(n_topics, max_iter=1000, alpha=None, beta=0.001, text_column='abstract')

Bases: nimare.base.NiMAREBase

Generate a latent Dirichlet allocation (LDA) topic model.

This class is a light wrapper around scikit-learn tools for tokenization and LDA.

Parameters

n_topics (int) – Number of topics for the topic model. This corresponds to the model’s n_components parameter. Must be an integer >= 1.
max_iter (int, optional) – Maximum number of iterations to use during model fitting. Default is 1000.
alpha (float or None, optional) – The alpha value for the model. This corresponds to the model’s doc_topic_prior parameter. Default is None, which evaluates to 1 / n_topics, the value used in [2].
beta (float or None, optional) – The beta value for the model. This corresponds to the model’s topic_word_prior parameter. If None, it evaluates to 1 / n_topics. Default is 0.001, the value used in [2].
text_column (str, optional) – The source of text to use for the model. This should correspond to an existing column in the texts attribute. Default is “abstract”.
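
The way these arguments resolve to scikit-learn keyword arguments can be sketched in plain Python. The helper name resolve_lda_kwargs is hypothetical and not part of NiMARE; it only makes the documented defaults explicit, including the 1 / n_topics fallback for a prior left as None.

```python
def resolve_lda_kwargs(n_topics, max_iter=1000, alpha=None, beta=0.001):
    """Translate LDAModel-style arguments into scikit-learn-style keywords.

    Hypothetical helper for illustration only; not part of NiMARE.
    """
    if not isinstance(n_topics, int) or n_topics < 1:
        raise ValueError("n_topics must be an integer >= 1")

    # A prior left as None falls back to 1 / n_topics, as documented above.
    default_prior = 1.0 / n_topics
    return {
        "n_components": n_topics,
        "max_iter": max_iter,
        "doc_topic_prior": default_prior if alpha is None else alpha,
        "topic_word_prior": default_prior if beta is None else beta,
    }


kwargs = resolve_lda_kwargs(n_topics=50)
print(kwargs)
```

With the defaults, alpha resolves to 1 / 50 while beta stays at the explicit 0.001.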

Variables

model (LatentDirichletAllocation) – The scikit-learn LDA model wrapped by this class.

Notes

Latent Dirichlet allocation was first developed in [1] and was first applied to neuroimaging articles in [2].

References

[1] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet allocation.” Journal of Machine Learning Research 3.Jan (2003): 993-1022.
[2] Poldrack, Russell A., et al. “Discovering relations between mind, brain, and mental disorders using topic mapping.” PLoS Computational Biology 8.10 (2012): e1002707. https://doi.org/10.1371/journal.pcbi.1002707

See also

CountVectorizer: Used to build a vocabulary of terms and their associated counts from texts in the self.text_column of the Dataset’s texts attribute.
LatentDirichletAllocation: Used to train the LDA model.
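
The two scikit-learn tools listed above can be combined roughly as follows. This is a minimal sketch on invented toy sentences, assuming scikit-learn is installed; it is not NiMARE's actual implementation, and the texts and the small max_iter value are illustrative only.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the abstracts stored in a Dataset's texts attribute.
texts = [
    "working memory load modulates prefrontal cortex activity",
    "reward prediction error signals in the ventral striatum",
    "prefrontal cortex and working memory maintenance",
    "striatum responses to monetary reward and punishment",
]

# Step 1: build the vocabulary and a document-by-term count matrix.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)
n_tokens = len(vectorizer.vocabulary_)

# Step 2: fit the topic model on the counts.
n_topics = 2
lda = LatentDirichletAllocation(
    n_components=n_topics,
    max_iter=50,
    doc_topic_prior=None,  # None resolves to 1 / n_components
    topic_word_prior=0.001,
    random_state=0,
)
doc_topic = lda.fit_transform(counts)

print(doc_topic.shape)        # (number of documents, n_topics)
print(lda.components_.shape)  # (n_topics, n_tokens)
```

Each row of doc_topic is a normalized topic distribution for one document, and components_ holds one row of term weights per topic.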

fit(dset)

Fit the LDA topic model to text from a Dataset.

Parameters

dset (Dataset) – A Dataset with, at minimum, text available in the self.text_column column of its texts attribute.

Returns

dset (Dataset) – A new Dataset with an updated annotations attribute.

Variables

distributions_ (dict) –

A dictionary containing additional distributions produced by the model, including:

p_topic_g_word: numpy.ndarray of shape (n_topics, n_tokens) containing the topic-term weights for the model.
p_topic_g_word_df: pandas.DataFrame of shape (n_topics, n_tokens) containing the topic-term weights for the model.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params (dict) – Parameter names mapped to their values.

classmethod load(filename, compressed=True)

Load a pickled class instance from file.

Parameters

filename (str) – Name of file containing object.
compressed (bool, optional) – If True, the file is assumed to be compressed and gzip will be used to load it. Otherwise, it will assume that the file is not compressed. Default = True.

Returns

obj (class object) – Loaded class object.

save(filename, compress=True)

Pickle the class instance to the provided file.

Parameters

filename (str) – File to which object will be saved.
compress (bool, optional) – If True, the file will be compressed with gzip. Otherwise, the uncompressed version will be saved. Default = True.
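
The save and load behavior described here amounts to pickling through an optional gzip layer. The sketch below reproduces that round trip with the standard library only; save_object and load_object are hypothetical helper names, not NiMARE functions, and a plain dictionary stands in for a fitted model.

```python
import gzip
import os
import pickle
import tempfile


def save_object(obj, filename, compress=True):
    """Pickle obj to filename, optionally through gzip (hypothetical helper)."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as file_obj:
        pickle.dump(obj, file_obj)


def load_object(filename, compressed=True):
    """Load a pickled object, reading through gzip when compressed is True."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as file_obj:
        return pickle.load(file_obj)


# A plain dictionary stands in for a fitted model instance.
settings = {"n_topics": 50, "beta": 0.001}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl.gz")
    save_object(settings, path, compress=True)
    restored = load_object(path, compressed=True)

print(restored == settings)
```

In this sketch, the compressed flag used when loading has to match the compress flag used when saving, since a gzip reader cannot open an uncompressed pickle.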

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
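
The <component>__<parameter> naming convention can be illustrated with a small stand-in class. This is a sketch of the convention only, not the scikit-learn or NiMARE implementation; the Component class and the attribute names used below are hypothetical.

```python
class Component:
    """Minimal stand-in for an estimator whose parameters are attributes."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for name, value in params.items():
            # Split "<component>__<parameter>" at the first double underscore.
            key, sep, sub_key = name.partition("__")
            if not hasattr(self, key):
                raise ValueError(f"Invalid parameter {key!r}")
            if sep:
                # Delegate the rest of the name to the nested component.
                getattr(self, key).set_params(**{sub_key: value})
            else:
                setattr(self, key, value)
        return self


pipeline = Component(
    vectorizer=Component(max_features=1000),
    lda=Component(n_components=10),
)
result = pipeline.set_params(lda__n_components=25, vectorizer__max_features=500)

print(pipeline.lda.n_components)
print(result is pipeline)
```

Returning self is what allows calls such as set_params(...) to be chained with other methods.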
