nimare.annotate.gclda.GCLDAModel
- class GCLDAModel(count_df, coordinates_df, mask='mni152_2mm', n_topics=100, n_regions=2, symmetric=True, alpha=0.1, beta=0.01, gamma=0.01, delta=1.0, dobs=25, roi_size=50.0, seed_init=1)
Bases: nimare.base.NiMAREBase
Generate a generalized correspondence latent Dirichlet allocation (GCLDA) topic model.
Changed in version 0.0.8: [ENH] Support symmetric GC-LDA topics with more than two subregions.
- Parameters
count_df (pandas.DataFrame) – A DataFrame with feature counts for the model. The index is ‘id’, used for identifying studies. Other columns are features (e.g., unigrams and bigrams from Neurosynth), where each value is the number of times the feature is found in a given article.
coordinates_df (pandas.DataFrame) – A DataFrame with a list of foci in the dataset. The index is ‘id’, used for identifying studies. Additional columns include ‘x’, ‘y’ and ‘z’ (foci in standard space).
n_topics (int, optional) – Number of topics to generate in the model. As a rule of thumb, the number of topics should be less than the number of studies in the dataset. Otherwise, there can be errors during model training. The default is 100.
n_regions (int, optional) – Number of subregions per topic (>=1). The default is 2.
alpha (float, optional) – Prior count on topics for each document. The default is 0.1.
beta (float, optional) – Prior count on word-types for each topic. The default is 0.01.
gamma (float, optional) – Prior count added to y-counts when sampling z assignments. The default is 0.01.
delta (float, optional) – Prior count on subregions for each topic. The default is 1.0.
dobs (int, optional) – Spatial region ‘default observations’ (the number of observations weighting Sigma estimates in the direction of the default ‘roi_size’ value). The default is 25.
roi_size (float, optional) – Default spatial ‘region of interest’ size (the default value of the diagonals in the covariance matrix for the spatial distribution, towards which the distributions are biased). The default is 50.0.
symmetric (bool, optional) – Whether or not to use symmetry constraint on subregions. Symmetry requires n_regions = 2. The default is True.
seed_init (int, optional) – Initial value of random seed. The default is 1.
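The layout expected for count_df and coordinates_df can be sketched with plain Python containers. This is only an illustration of the documented shapes: the study ids, abstracts, and coordinates below are made up, and the variable names (abstracts, count_rows, coordinate_rows) are hypothetical rather than part of NiMARE.

```python
from collections import Counter

# Hypothetical toy abstracts keyed by study id (not from any real dataset).
abstracts = {
    "study-01": "pain pain anterior insula",
    "study-02": "memory hippocampus memory encoding",
}

# Vocabulary: every feature seen in any study, in sorted order.
vocabulary = sorted({word for text in abstracts.values() for word in text.split()})

# One row per study id, one column per feature; each value is the number
# of times the feature occurs in that study's text (0 if absent).
count_rows = {}
for study_id, text in abstracts.items():
    counts = Counter(text.split())
    count_rows[study_id] = {feature: counts[feature] for feature in vocabulary}

# Foci rows: a study id plus 'x', 'y', 'z' columns; coordinates are made up.
coordinate_rows = [
    {"id": "study-01", "x": 36.0, "y": 18.0, "z": 4.0},
    {"id": "study-01", "x": -36.0, "y": 18.0, "z": 4.0},
    {"id": "study-02", "x": -26.0, "y": -20.0, "z": -14.0},
]

# Rule of thumb from the parameter notes: fewer topics than studies.
n_topics = 1
assert n_topics < len(count_rows)
```

Assuming pandas is available, rows shaped like count_rows could be turned into a DataFrame with pandas.DataFrame.from_dict(count_rows, orient="index"), which uses the study ids as the index.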
- Variables
p_topic_g_voxel_ ((V x T) numpy.ndarray) – Probability of each topic (T) given a voxel (V)
p_voxel_g_topic_ ((V x T) numpy.ndarray) – Probability of each voxel (V) given a topic (T)
p_topic_g_word_ ((W x T) numpy.ndarray) – Probability of each topic (T) given a word (W)
p_word_g_topic_ ((W x T) numpy.ndarray) – Probability of each word (W) given a topic (T)
References
Rubin, Timothy N., et al. “Decoding brain activity using a large-scale probabilistic functional-anatomical atlas of human cognition.” PLoS computational biology 13.10 (2017): e1005649. https://doi.org/10.1371/journal.pcbi.1005649
See also
nimare.decode.continuous.gclda_decode_map
GCLDA map decoding
nimare.decode.discrete.gclda_decode_roi
GCLDA ROI decoding
nimare.decode.encode.encode_gclda
GCLDA text-to-map encoding
- compute_log_likelihood(model=None, update_vectors=True)
Compute log-likelihood of a model object given current model.
Computes the log-likelihood of data in any model object (either train or test) given the posterior predictive distributions over peaks and word-types for the model, using the method described in Newman et al. (2009) [1]. Note that this does not compute the joint log-likelihood of model parameters and data.
- Parameters
model (gclda.Model, optional) – The model for which log-likelihoods will be calculated. If not provided, log-likelihood will be calculated for the current model (self).
update_vectors (bool, optional) – Whether to update model’s log-likelihood vectors or not.
- Returns
References
[1] Newman, D., Asuncion, A., Smyth, P., & Welling, M. (2009). Distributed algorithms for topic models. Journal of Machine Learning Research, 10(Aug), 1801-1828.
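The idea of scoring observed tokens under a predictive distribution can be illustrated with a toy calculation for word tokens only. This is a generic topic-mixture sketch with made-up numbers, not NiMARE's implementation (which also scores peaks); the function token_log_likelihood and all variable names are hypothetical.

```python
import math

# Toy p(word|topic): rows are word types, columns are topics (made-up values).
p_word_g_topic = [
    [0.7, 0.1],  # word type 0
    [0.2, 0.3],  # word type 1
    [0.1, 0.6],  # word type 2
]
# Toy p(topic|document) for a single document.
p_topic_g_doc = [0.25, 0.75]
# Observed word tokens in that document, given as word-type indices.
word_tokens = [0, 2, 2, 1]


def token_log_likelihood(tokens, p_word_g_topic, p_topic_g_doc):
    """Sum of log p(token), where p(token) mixes p(word|topic) over topics."""
    total = 0.0
    for word in tokens:
        # Marginalize over topics: p(word) = sum_t p(word|t) * p(t|doc).
        p_token = sum(
            p_word_g_topic[word][topic] * p_topic_g_doc[topic]
            for topic in range(len(p_topic_g_doc))
        )
        total += math.log(p_token)
    return total


word_log_likelihood = token_log_likelihood(word_tokens, p_word_g_topic, p_topic_g_doc)
```

Because every token probability is below 1, the total is negative; values closer to zero indicate that the distributions explain the observed tokens better.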
- fit(n_iters=10000, loglikely_freq=10)
Run multiple iterations.
Changed in version 0.0.8: [ENH] Remove verbose parameter.
- get_probability_distributions()
Get conditional probability of selecting each voxel in the brain mask given each topic.
- Returns
p_topic_g_voxel (numpy.ndarray of numpy.float64) – A voxel-by-topic array of conditional probabilities: p(topic|voxel). For cell ij, the value is the probability of topic j being selected given voxel i is active.
p_voxel_g_topic (numpy.ndarray of numpy.float64) – A voxel-by-topic array of conditional probabilities: p(voxel|topic). For cell ij, the value is the probability of voxel i being selected given topic j has already been selected.
p_topic_g_word (numpy.ndarray of numpy.float64) – A word-by-topic array of conditional probabilities: p(topic|word). For cell ij, the value is the probability of topic j being selected given word i is present.
p_word_g_topic (numpy.ndarray of numpy.float64) – A word-by-topic array of conditional probabilities: p(word|topic). For cell ij, the value is the probability of word i being selected given topic j has already been selected.
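The indexing convention of the voxel-by-topic arrays can be illustrated with a small hand-made example. The numbers are invented, and the conversion from p(voxel|topic) to p(topic|voxel) below assumes a uniform prior over topics; that prior is an assumption of this sketch and is not stated in the documentation above.

```python
# Toy p(voxel|topic) for 3 voxels (rows) and 2 topics (columns); values are made up.
p_voxel_g_topic = [
    [0.5, 0.1],
    [0.3, 0.1],
    [0.2, 0.8],
]
n_voxels = len(p_voxel_g_topic)
n_topics = len(p_voxel_g_topic[0])

# Each column is a distribution over voxels for one topic, so it sums to 1.
column_sums = [
    sum(p_voxel_g_topic[i][j] for i in range(n_voxels)) for j in range(n_topics)
]

# Assumption: uniform prior over topics, p(topic) = 1 / n_topics.
# Bayes' rule then reduces to normalizing each row across topics.
p_topic_g_voxel = []
for row in p_voxel_g_topic:
    row_total = sum(row)
    p_topic_g_voxel.append([value / row_total for value in row])

# Cell [i][j] reads as: probability of topic j given that voxel i is active.
voxel_index, topic_index = 2, 1
print(p_topic_g_voxel[voxel_index][topic_index])
```

In this layout, columns of p(voxel|topic) sum to 1 while rows of p(topic|voxel) sum to 1, which is a quick sanity check on which array is which.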