nimare.annotate.lda.LDAModel

class LDAModel(text_df, text_column='abstract', n_topics=50, n_iters=1000, alpha='auto', beta=0.001)

Bases: nimare.base.NiMAREBase

Perform topic modeling using Latent Dirichlet Allocation (LDA).

Build an LDA [1] topic model with the Java toolbox MALLET [2], as performed in [3].

Parameters
  • text_df (pandas.DataFrame) – A pandas DataFrame with two columns: ‘id’ and the column named by text_column, which contains the article text.

  • text_column (str, optional) – Name of column in text_df that contains text. Default is ‘abstract’.

  • n_topics (int, optional) – Number of topics to generate. Default is 50.

  • n_iters (int, optional) – Number of iterations to run when training the topic model. Default is 1000.

  • alpha (float or ‘auto’, optional) – The Dirichlet prior on the per-document topic distributions. Default is ‘auto’, which sets alpha to 50 / n_topics, following Poldrack et al. (2012).

  • beta (float, optional) – The Dirichlet prior on the per-topic word distribution. Default is 0.001, following Poldrack et al. (2012).
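The ‘auto’ rule for alpha can be illustrated in plain Python. This is a minimal sketch: `resolve_alpha` is a hypothetical helper written for this example, not a NiMARE function, and it only mirrors the documented rule of 50 / n_topics.

```python
def resolve_alpha(alpha="auto", n_topics=50):
    """Return a numeric alpha following the documented 'auto' rule.

    Hypothetical helper for illustration only; not part of NiMARE.
    """
    if alpha == "auto":
        # Documented rule: 50 / n_topics, following Poldrack et al. (2012)
        return 50.0 / n_topics
    # Any explicit value is used as given
    return float(alpha)


print(resolve_alpha("auto", n_topics=50))   # 1.0
print(resolve_alpha("auto", n_topics=200))  # 0.25
print(resolve_alpha(0.1, n_topics=50))      # 0.1
```

Under this rule, raising n_topics lowers alpha, so the prior on each document's topic distribution becomes sparser as more topics are requested.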

Variables

commands_ (list of str) – List of MALLET commands called to fit the model.

References

[1] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet allocation.” Journal of Machine Learning Research 3.Jan (2003): 993-1022.

[2] McCallum, Andrew Kachites. “Mallet: A machine learning for language toolkit.” (2002).

[3] Poldrack, Russell A., et al. “Discovering relations between mind, brain, and mental disorders using topic mapping.” PLoS Computational Biology 8.10 (2012): e1002707. https://doi.org/10.1371/journal.pcbi.1002707

See also

nimare.extract.download_mallet

This function will be called automatically to download MALLET.

fit()

Fit LDA model to corpus.

Variables
  • p_topic_g_doc_ (numpy.ndarray) – Probability of each topic given a document

  • p_word_g_topic_ (numpy.ndarray) – Probability of each word given a topic

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

classmethod load(filename, compressed=True)

Load a pickled class instance from file.

Parameters
  • filename (str) – Name of file containing object.

  • compressed (bool, optional) – If True, the file is assumed to be gzip-compressed and gzip will be used to load it. Otherwise, the file is read as uncompressed. Default is True.

Returns

obj (class object) – Loaded class object.

save(filename, compress=True)

Pickle the class instance to the provided file.

Parameters
  • filename (str) – File to which object will be saved.

  • compress (bool, optional) – If True, the file will be compressed with gzip. Otherwise, the uncompressed version will be saved. Default is True.
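The behaviour described for save and load (pickling, with optional gzip compression) can be sketched with the standard library alone. The helpers `save_obj` and `load_obj` below are hypothetical stand-ins written for this example; they are an assumption about how such a round trip might look, not NiMARE's actual implementation.

```python
import gzip
import os
import pickle
import tempfile


def save_obj(obj, filename, compress=True):
    """Pickle obj to filename, using gzip when compress is True."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as f:
        pickle.dump(obj, f)


def load_obj(filename, compressed=True):
    """Load a pickled object, reading through gzip when compressed is True."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted model instance.
settings = {"n_topics": 50, "n_iters": 1000, "beta": 0.001}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.pkl.gz")
    save_obj(settings, path, compress=True)
    restored = load_obj(path, compressed=True)

print(restored == settings)  # True
```

Note that the two documented flags are spelled differently (compress for save, compressed for load), and a file written with compression should be read back with compressed=True.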

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
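The <component>__<parameter> naming convention can be illustrated with two toy classes. This is a minimal sketch under stated assumptions: `Component`, `Wrapper`, and their attributes are hypothetical names invented for this example, and the code shows only the general idea of routing a double-underscore key to a nested object, not how NiMARE implements set_params.

```python
class Component:
    """Toy nested object with a single parameter (hypothetical)."""

    def __init__(self, rate=0.1):
        self.rate = rate

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Wrapper:
    """Toy estimator that holds a nested Component (hypothetical)."""

    def __init__(self, n_iters=1000, step=None):
        self.n_iters = n_iters
        self.step = step if step is not None else Component()

    def set_params(self, **params):
        for key, value in params.items():
            # "step__rate" splits into component "step" and parameter "rate"
            name, sep, sub_key = key.partition("__")
            if sep:
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        # Returning self allows calls to be chained
        return self


w = Wrapper().set_params(n_iters=500, step__rate=0.5)
print(w.n_iters, w.step.rate)  # 500 0.5
```

Because set_params returns self, as documented above, it can be chained directly after construction as in the last lines of the sketch.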

Examples using nimare.annotate.lda.LDAModel