nimare.annotate.lda.LDAModel

class LDAModel(n_topics, max_iter=1000, alpha=None, beta=0.001, text_column='abstract', n_cores=1)[source]

Bases: NiMAREBase

Generate a latent Dirichlet allocation (LDA) topic model.

This class is a light wrapper around scikit-learn tools for tokenization and LDA.

Parameters:
  • n_topics (int) – Number of topics for the topic model. This corresponds to the model’s n_components parameter. Must be an integer >= 1.

  • max_iter (int, optional) – Maximum number of iterations to use during model fitting. Default = 1000.

  • alpha (float or None, optional) – The alpha value for the model. This corresponds to the model’s doc_topic_prior parameter. Default is None, which evaluates to 1 / n_topics, as was used in Poldrack et al.[1].

  • beta (float or None, optional) – The beta value for the model. This corresponds to the model’s topic_word_prior parameter. If None, it evaluates to 1 / n_topics. Default is 0.001, which was used in Poldrack et al.[1].

  • text_column (str, optional) – The source of text to use for the model. This should correspond to an existing column in the texts attribute. Default is “abstract”.

  • n_cores (int, optional) – Number of cores to use for parallelization. If <=0, defaults to using all available cores. Default is 1.
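
A minimal instantiation sketch, included here only to illustrate the parameters above; the topic count and other values are arbitrary examples rather than recommended settings:

>>> from nimare.annotate.lda import LDAModel
>>> # alpha=None falls back to 1 / n_topics; beta keeps the 0.001 default
>>> model = LDAModel(n_topics=50, max_iter=1000, text_column="abstract", n_cores=1)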

Variables:

model (LatentDirichletAllocation) – The trained scikit-learn LatentDirichletAllocation object.

Notes

Latent Dirichlet allocation was first developed in Blei et al.[2], and was first applied to neuroimaging articles in Poldrack et al.[1].

References

[1] Poldrack, R. A., Mumford, J. A., Schonberg, T., Kalar, D., Barman, B., & Yarkoni, T. (2012). Discovering relations between mind, brain, and mental disorders using topic mapping. PLoS Computational Biology, 8(10), e1002707.

[2] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

See also

CountVectorizer

Used to build a vocabulary of terms and their associated counts from the text in the self.text_column column of the Dataset’s texts attribute.

LatentDirichletAllocation

Used to train the LDA model.

Methods

  • fit(dset) – Fit the LDA topic model to text from a Dataset.

  • get_params([deep]) – Get parameters for this estimator.

  • load(filename[, compressed]) – Load a pickled class instance from file.

  • save(filename[, compress]) – Pickle the class instance to the provided file.

  • set_params(**params) – Set the parameters of this estimator.

fit(dset)[source]

Fit the LDA topic model to text from a Dataset.

Parameters:

dset (Dataset) – A Dataset with, at minimum, text available in the self.text_column column of its texts attribute.

Returns:

dset – A new Dataset with an updated annotations attribute.

Return type:

Dataset

Variables:

distributions (dict) –

A dictionary containing additional distributions produced by the model, including:

  • p_topic_g_word: numpy.ndarray of shape (n_topics, n_tokens) containing the topic-term weights for the model.

  • p_topic_g_word_df: pandas.DataFrame of shape (n_topics, n_tokens) containing the topic-term weights for the model.
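
A hedged sketch of a typical fit call; the Dataset file name is hypothetical, and the exact names of the appended annotation columns depend on the NiMARE version:

>>> from nimare.dataset import Dataset
>>> from nimare.annotate.lda import LDAModel
>>> dset = Dataset("my_dataset.json")  # hypothetical Dataset file with abstract text available
>>> model = LDAModel(n_topics=50, max_iter=1000)
>>> new_dset = model.fit(dset)
>>> new_dset.annotations.head()  # per-study topic weights appended as annotation columns
>>> model.distributions["p_topic_g_word_df"]  # topic-term weights, per the distributions dict listed above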

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict
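
A short illustration; it assumes the returned dictionary keys mirror the constructor arguments listed above:

>>> model = LDAModel(n_topics=50)
>>> params = model.get_params()
>>> params["n_topics"]
50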

classmethod load(filename, compressed=True)[source]

Load a pickled class instance from file.

Parameters:
  • filename (str) – Name of file containing object.

  • compressed (bool, optional) – If True, the file is assumed to be gzip-compressed and will be decompressed when loading. Otherwise, the file is assumed to be uncompressed. Default = True.

Returns:

obj – Loaded class object.

Return type:

class object
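
A sketch of reloading a previously saved instance; the file names are hypothetical:

>>> model = LDAModel.load("lda_model.pkl.gz")                 # gzip-compressed pickle (default)
>>> model = LDAModel.load("lda_model.pkl", compressed=False)  # uncompressed pickle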

save(filename, compress=True)[source]

Pickle the class instance to the provided file.

Parameters:
  • filename (str) – File to which object will be saved.

  • compress (bool, optional) – If True, the file will be compressed with gzip. Otherwise, the uncompressed version will be saved. Default = True.
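
A sketch of saving a fitted instance; the output paths are hypothetical:

>>> model.save("lda_model.pkl.gz")               # gzip-compressed by default
>>> model.save("lda_model.pkl", compress=False)  # plain, uncompressed pickle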

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Return type:

self
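
A sketch of updating parameters in place; for a flat estimator like this one, plain parameter names are used (the nested <component>__<parameter> form applies only to composite estimators):

>>> model = LDAModel(n_topics=50)
>>> model = model.set_params(n_topics=100, max_iter=500)  # returns self
>>> model.get_params()["n_topics"]
100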