nimare.annotate.topic

Automated annotation with text-derived topic models.

class BoltzmannModel(text_df, coordinates_df)[source]

Generate a deep Boltzmann machine topic model [R3153eaf72258-1].

Warning

This method is not yet implemented.

References

[R3153eaf72258-1]Monti, Ricardo, et al. “Text-mining the NeuroSynth corpus using deep Boltzmann machines.” 2016 International Workshop on Pattern Recognition in NeuroImaging (PRNI). IEEE, 2016. https://doi.org/10.1109/PRNI.2016.7552329

Methods

get_params(self[, deep]) Get parameters for this estimator.
load(filename[, compressed]) Load a pickled class instance from file.
save(self, filename[, compress]) Pickle the class instance to the provided file.
set_params(self, \*\*params) Set the parameters of this estimator.
fit  
get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
classmethod load(filename, compressed=True)[source]

Load a pickled class instance from file.

Parameters:
  • filename (str) – Name of file containing object.
  • compressed (bool, optional) – If True, the file is assumed to be compressed and gzip will be used to load it. Otherwise, it will assume that the file is not compressed. Default = True.
Returns:

obj – Loaded class object.

Return type:

class object

save(self, filename, compress=True)[source]

Pickle the class instance to the provided file.

Parameters:
  • filename (str) – File to which object will be saved.
  • compress (bool, optional) – If True, the file will be compressed with gzip. Otherwise, the uncompressed version will be saved. Default = True.
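The save/load pair described above can be pictured as a gzip-wrapped pickle round trip. The following is a minimal sketch using only the standard library; `save_object` and `load_object` are hypothetical stand-ins written for illustration, not NiMARE's own implementation, and a plain dict stands in for a model instance.

```python
import gzip
import os
import pickle
import tempfile


def save_object(obj, filename, compress=True):
    """Pickle obj to filename, writing through gzip when compress is True."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as file_obj:
        pickle.dump(obj, file_obj)


def load_object(filename, compressed=True):
    """Unpickle an object, reading through gzip when compressed is True."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as file_obj:
        return pickle.load(file_obj)


# Round-trip a plain dict standing in for a model instance.
example = {"n_topics": 50, "alpha": 1.0}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl.gz")
    save_object(example, path, compress=True)
    restored = load_object(path, compressed=True)
```

Note that the keyword is spelled compress in save and compressed in load; a file written with one setting should be read back with the matching setting.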
set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
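To illustrate the <component>__<parameter> naming used by set_params, here is a minimal sketch of how such keys can be split into per-component groups. The helper `split_nested_params` and the component name `model` are hypothetical and exist only for this example; the actual dispatch inside the estimator may differ.

```python
def split_nested_params(params):
    """Group <component>__<parameter> keys by component.

    Keys without a double underscore are collected under the key None,
    meaning they belong to the outer estimator itself.
    """
    grouped = {}
    for key, value in params.items():
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            # Nested parameter: route it to the named component.
            grouped.setdefault(component, {})[sub_key] = value
        else:
            # Plain parameter: belongs to the outer estimator.
            grouped.setdefault(None, {})[key] = value
    return grouped


grouped = split_nested_params(
    {"n_topics": 20, "model__alpha": 0.5, "model__beta": 0.01}
)
```

Here `grouped[None]` holds the outer estimator's own parameters and `grouped["model"]` holds the parameters addressed to the nested component.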
class GCLDAModel(count_df, coordinates_df, mask='mni152_2mm', n_topics=100, n_regions=2, symmetric=True, alpha=0.1, beta=0.01, gamma=0.01, delta=1.0, dobs=25, roi_size=50.0, seed_init=1)[source]

Generate a generalized correspondence latent Dirichlet allocation (GCLDA) [R1dd767d5a89c-1] topic model.

Parameters:
  • count_df (pandas.DataFrame) – A DataFrame with feature counts for the model. The index is ‘id’, used for identifying studies. Other columns are features (e.g., unigrams and bigrams from Neurosynth), where each value is the number of times the feature is found in a given article.
  • coordinates_df (pandas.DataFrame) – A DataFrame with a list of foci in the dataset. The index is ‘id’, used for identifying studies. Additional columns include ‘x’, ‘y’ and ‘z’ (foci in standard space).
  • n_topics (int, optional) – Number of topics to generate in model. The default is 100.
  • n_regions (int, optional) – Number of subregions per topic (>=1). The default is 2.
  • alpha (float, optional) – Prior count on topics for each document. The default is 0.1.
  • beta (float, optional) – Prior count on word-types for each topic. The default is 0.01.
  • gamma (float, optional) – Prior count added to y-counts when sampling z assignments. The default is 0.01.
  • delta (float, optional) – Prior count on subregions for each topic. The default is 1.0.
  • dobs (int, optional) – Spatial region ‘default observations’ (the number of observations weighting Sigma estimates toward the default ‘roi_size’ value). The default is 25.
  • roi_size (float, optional) – Default spatial ‘region of interest’ size (default value of diagonals in covariance matrix for spatial distribution, which the distributions are biased towards). The default is 50.0.
  • symmetric (bool, optional) – Whether or not to use symmetry constraint on subregions. Symmetry requires n_regions = 2. The default is True.
  • seed_init (int, optional) – Initial value of random seed. The default is 1.
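The count_df layout described above (one row per study id, one column per feature, values are occurrence counts) can be sketched with the standard library alone. The study ids and abstracts below are made up for illustration, and whitespace tokenization is an assumption; real feature extraction (e.g., Neurosynth unigrams and bigrams) is more involved.

```python
from collections import Counter

# Hypothetical abstracts keyed by study id.
abstracts = {
    "study-01": "working memory load modulates prefrontal activity",
    "study-02": "pain anticipation and pain perception",
}

# One Counter per study; keys are unigram features, values are counts.
counts_by_id = {
    study_id: Counter(text.split()) for study_id, text in abstracts.items()
}

# Shared, sorted vocabulary so every study has the same feature columns.
vocabulary = sorted(set().union(*counts_by_id.values()))

# Dense rows: one row per id, one column per feature, zero when absent.
rows = {
    study_id: [counts[word] for word in vocabulary]
    for study_id, counts in counts_by_id.items()
}
```

A table like `rows` could then be turned into a DataFrame indexed by id, for example with `pandas.DataFrame.from_dict(rows, orient="index", columns=vocabulary)`.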
p_topic_g_voxel_

Probability of each topic (T) given a voxel (V)

Type:(V x T) numpy.ndarray
p_voxel_g_topic_

Probability of each voxel (V) given a topic (T)

Type:(V x T) numpy.ndarray
p_topic_g_word_

Probability of each topic (T) given a word (W)

Type:(W x T) numpy.ndarray
p_word_g_topic_

Probability of each word (W) given a topic (T)

Type:(W x T) numpy.ndarray

References

[R1dd767d5a89c-1]Rubin, Timothy N., et al. “Decoding brain activity using a large-scale probabilistic functional-anatomical atlas of human cognition.” PLoS computational biology 13.10 (2017): e1005649. https://doi.org/10.1371/journal.pcbi.1005649

See also

nimare.decode.continuous.gclda_decode_map
GCLDA map decoding
nimare.decode.discrete.gclda_decode_roi
GCLDA ROI decoding
nimare.decode.encode.encode_gclda
GCLDA text-to-map encoding

Methods

compute_log_likelihood(self[, model, …]) Compute log-likelihood [1] of a model object given current model.
fit(self[, n_iters, loglikely_freq, verbose]) Run multiple iterations.
get_params(self[, deep]) Get parameters for this estimator.
get_probs(self) Get conditional probability of selecting each voxel in the brain mask given each topic.
load(filename[, compressed]) Load a pickled class instance from file.
save(self, filename[, compress]) Pickle the class instance to the provided file.
set_params(self, \*\*params) Set the parameters of this estimator.
compute_log_likelihood(self, model=None, update_vectors=True)[source]

Compute log-likelihood [1] of a model object given current model.

Computes the log-likelihood of data in any model object (either train or test) given the posterior predictive distributions over peaks and word-types for the model. Note that this is not computing the joint log-likelihood of model parameters and data.

Parameters:
  • model (gclda.Model, optional) – The model for which log-likelihoods will be calculated. If not provided, log-likelihood will be calculated for the current model (self).
  • update_vectors (bool, optional) – Whether to update model’s log-likelihood vectors or not.
Returns:

  • x_loglikely (float) – Total log-likelihood of all peak tokens.
  • w_loglikely (float) – Total log-likelihood of all word tokens.
  • tot_loglikely (float) – Total log-likelihood of peak + word tokens.

References

[1] Newman, D., Asuncion, A., Smyth, P., & Welling, M. (2009). Distributed algorithms for topic models. Journal of Machine Learning Research, 10(Aug), 1801-1828.
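The three return values of compute_log_likelihood are related by simple addition: the total is the peak-token log-likelihood plus the word-token log-likelihood. A minimal sketch, assuming each token already has a posterior predictive probability (the probability values below are invented for illustration):

```python
import math

# Hypothetical posterior predictive probabilities, one per token.
peak_token_probs = [0.02, 0.005, 0.01]
word_token_probs = [0.1, 0.05]

# Each total is the sum of per-token log probabilities.
x_loglikely = sum(math.log(p) for p in peak_token_probs)
w_loglikely = sum(math.log(p) for p in word_token_probs)

# Combined log-likelihood of peak + word tokens.
tot_loglikely = x_loglikely + w_loglikely
```

Because every probability is below 1, all three values are negative; values closer to zero indicate a better fit to the tokens.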
fit(self, n_iters=10000, loglikely_freq=10, verbose=1)[source]

Run multiple iterations.

Parameters:
  • n_iters (int, optional) – Number of iterations to run. Default is 10000.
  • loglikely_freq (int, optional) – The frequency with which log-likelihood is updated. Default value is 10 (log-likelihood is updated every 10 iterations).
  • verbose ({0, 1, 2}, optional) – Determines how much info is printed to console. 0 = none, 1 = a little, 2 = a lot. Default value is 1.
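The interaction between n_iters and loglikely_freq can be sketched as a simple schedule. The helper below is hypothetical, and the exact iteration indexing used internally (for example, whether the first iteration is also logged) is an assumption.

```python
def iterations_with_loglikelihood(n_iters, loglikely_freq):
    """Return the 1-based iterations at which log-likelihood would be updated."""
    return [
        iteration
        for iteration in range(1, n_iters + 1)
        if iteration % loglikely_freq == 0
    ]


# With 50 iterations and a frequency of 10, five updates are scheduled.
logged = iterations_with_loglikelihood(n_iters=50, loglikely_freq=10)
```

Setting loglikely_freq to 1 would update the log-likelihood on every iteration, at the cost of extra computation per iteration.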
get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_probs(self)[source]

Get conditional probability of selecting each voxel in the brain mask given each topic.

Returns:
  • p_topic_g_voxel (numpy.ndarray of numpy.float64) – A voxel-by-topic array of conditional probabilities: p(topic|voxel). For cell ij, the value is the probability of topic j being selected given voxel i is active.
  • p_voxel_g_topic (numpy.ndarray of numpy.float64) – A voxel-by-topic array of conditional probabilities: p(voxel|topic). For cell ij, the value is the probability of voxel i being selected given topic j has already been selected.
  • p_topic_g_word (numpy.ndarray of numpy.float64) – A word-by-topic array of conditional probabilities: p(topic|word). For cell ij, the value is the probability of topic j being selected given word i is present.
  • p_word_g_topic (numpy.ndarray of numpy.float64) – A word-by-topic array of conditional probabilities: p(word|topic). For cell ij, the value is the probability of word i being selected given topic j has already been selected.
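The row and column conventions above can be illustrated with a small voxel-by-topic example. This is a sketch with plain Python lists rather than numpy arrays; the weights are invented, and deriving p(topic|voxel) by row-normalizing p(voxel|topic) assumes a uniform prior over topics, which may not match the model's actual computation.

```python
# Hypothetical non-negative weights: 3 voxels (rows) by 2 topics (columns).
weights = [
    [4.0, 1.0],
    [1.0, 1.0],
    [0.0, 3.0],
]
n_voxels, n_topics = len(weights), len(weights[0])

# p(voxel|topic): normalize each column so it sums to 1 over voxels.
column_sums = [
    sum(weights[i][j] for i in range(n_voxels)) for j in range(n_topics)
]
p_voxel_g_topic = [
    [weights[i][j] / column_sums[j] for j in range(n_topics)]
    for i in range(n_voxels)
]

# p(topic|voxel): normalize each row of p(voxel|topic) so it sums to 1
# over topics (assumes every topic is equally likely a priori).
p_topic_g_voxel = [
    [value / sum(row) for value in row] for row in p_voxel_g_topic
]
```

In both arrays, cell [i][j] refers to voxel i and topic j; only the axis along which the values sum to 1 differs.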
classmethod load(filename, compressed=True)[source]

Load a pickled class instance from file.

Parameters:
  • filename (str) – Name of file containing object.
  • compressed (bool, optional) – If True, the file is assumed to be compressed and gzip will be used to load it. Otherwise, it will assume that the file is not compressed. Default = True.
Returns:

obj – Loaded class object.

Return type:

class object

save(self, filename, compress=True)[source]

Pickle the class instance to the provided file.

Parameters:
  • filename (str) – File to which object will be saved.
  • compress (bool, optional) – If True, the file will be compressed with gzip. Otherwise, the uncompressed version will be saved. Default = True.
set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
class LDAModel(text_df, text_column='abstract', n_topics=50, n_iters=1000, alpha='auto', beta=0.001)[source]

Perform topic modeling using Latent Dirichlet Allocation [R6c0c7b2ff831-1] with the Java toolbox MALLET [R6c0c7b2ff831-2], as performed in [R6c0c7b2ff831-3].

Parameters:
  • text_df (pandas.DataFrame) – A pandas DataFrame with two columns (‘id’ and text_column) containing article text.
  • text_column (str, optional) – Name of column in text_df that contains text. Default is ‘abstract’.
  • n_topics (int, optional) – Number of topics to generate. Default=50.
  • n_iters (int, optional) – Number of iterations to run in training topic model. Default=1000.
  • alpha (float or ‘auto’, optional) – The Dirichlet prior on the per-document topic distributions. Default: auto, which calculates 50 / n_topics, based on Poldrack et al. (2012).
  • beta (float, optional) – The Dirichlet prior on the per-topic word distribution. Default: 0.001, based on Poldrack et al. (2012).
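The ‘auto’ setting for alpha described above reduces to a one-line rule. The helper name `resolve_alpha` is hypothetical and is used here only to make the rule concrete:

```python
def resolve_alpha(alpha, n_topics):
    """Return the numeric document-topic prior for a given alpha setting."""
    if alpha == "auto":
        # Rule stated in the parameter description: 50 / n_topics.
        return 50.0 / n_topics
    return float(alpha)


default_alpha = resolve_alpha("auto", n_topics=50)
```

With the default n_topics of 50, the ‘auto’ setting therefore yields an alpha of 1.0; doubling the number of topics halves the prior.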
commands_

List of MALLET commands called to fit model.

Type:list of str

References

[R6c0c7b2ff831-1]Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent dirichlet allocation.” Journal of machine Learning research 3.Jan (2003): 993-1022.
[R6c0c7b2ff831-2]McCallum, Andrew Kachites. “Mallet: A machine learning for language toolkit.” (2002).
[R6c0c7b2ff831-3]Poldrack, Russell A., et al. “Discovering relations between mind, brain, and mental disorders using topic mapping.” PLoS computational biology 8.10 (2012): e1002707. https://doi.org/10.1371/journal.pcbi.1002707

Methods

fit(self) Fit LDA model to corpus.
get_params(self[, deep]) Get parameters for this estimator.
load(filename[, compressed]) Load a pickled class instance from file.
save(self, filename[, compress]) Pickle the class instance to the provided file.
set_params(self, \*\*params) Set the parameters of this estimator.
fit(self)[source]

Fit LDA model to corpus.

p_topic_g_doc_

Probability of each topic given a document

Type:numpy.ndarray
p_word_g_topic_

Probability of each word given a topic

Type:numpy.ndarray
get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
classmethod load(filename, compressed=True)[source]

Load a pickled class instance from file.

Parameters:
  • filename (str) – Name of file containing object.
  • compressed (bool, optional) – If True, the file is assumed to be compressed and gzip will be used to load it. Otherwise, it will assume that the file is not compressed. Default = True.
Returns:

obj – Loaded class object.

Return type:

class object

save(self, filename, compress=True)[source]

Pickle the class instance to the provided file.

Parameters:
  • filename (str) – File to which object will be saved.
  • compress (bool, optional) – If True, the file will be compressed with gzip. Otherwise, the uncompressed version will be saved. Default = True.
set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self