nimare.annotate.text
Text extraction tools.
Functions
download_abstracts(dataset, email) | Download the abstracts for a list of PubMed IDs.
generate_cooccurrence(text_df[, …]) | Build co-occurrence matrix from documents.
generate_counts(text_df[, text_column, tfidf]) | Generate tf-idf weights for unigrams/bigrams derived from textual data.
uk_to_us(text) | Convert UK spellings to US based on a converter.

download_abstracts(dataset, email)
Download the abstracts for a list of PubMed IDs. Uses the Biopython package.
Parameters:
- dataset (nimare.dataset.Dataset or list of str) – A Dataset object where IDs are in the form PMID-EXPID, or a list of PubMed IDs.
- email (str) – Email address to use to call the PubMed API.
Returns: dataset – Dataset with abstracts added.
Return type: nimare.dataset.Dataset or list of str
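
The PMID-EXPID form described above means several experiments can share one PubMed ID, so the IDs presumably need to be reduced to unique PMIDs before any abstracts are requested. A minimal sketch of that step, assuming the PMID is everything before the first hyphen; `extract_pmids` and the sample IDs are hypothetical and not part of NiMARE:

```python
def extract_pmids(study_ids):
    """Return unique PubMed IDs, in first-seen order, from 'PMID-EXPID' strings.

    Hypothetical helper: assumes the PMID is the part before the first hyphen.
    An ID without a hyphen is treated as a bare PMID.
    """
    pmids = []
    seen = set()
    for study_id in study_ids:
        pmid = study_id.split("-", 1)[0]
        if pmid not in seen:
            seen.add(pmid)
            pmids.append(pmid)
    return pmids


# Made-up IDs for illustration: two experiments from one paper, one from another.
study_ids = ["12345678-1", "12345678-2", "23456789-1"]
pmids = extract_pmids(study_ids)
```

Deduplicating first keeps the number of PubMed requests proportional to the number of papers rather than the number of experiments.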

generate_cooccurrence(text_df, text_column='abstract', vocabulary=None, window=5)
Build co-occurrence matrix from documents. Not the same approach as used by the GloVe model.
Parameters:
- text_df ((D x 2) pandas.DataFrame) – A DataFrame with two columns (‘id’ and ‘text’). D = document.
- vocabulary (list, optional) – List of words in vocabulary to extract from text.
- window (int, optional) – Window size for co-occurrence. Words that appear within window words of one another co-occur.
Returns: df – One co-occurrence matrix per document in text_df.
Return type: (V, V, D) pandas.Panel
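
The window rule can be illustrated for a single document with the standard library. This is a sketch under stated assumptions, not NiMARE's implementation: `cooccurrence_counts` is a hypothetical name, tokens are split on whitespace, counts are symmetric, and a word is not counted as co-occurring with itself.

```python
from collections import Counter


def cooccurrence_counts(tokens, vocabulary, window=5):
    """Count symmetric co-occurrences of vocabulary words within `window` tokens."""
    vocab = set(vocabulary)
    counts = Counter()
    for i, word in enumerate(tokens):
        if word not in vocab:
            continue
        # Look ahead at the next `window` tokens (distances 1..window).
        for other in tokens[i + 1 : i + 1 + window]:
            if other in vocab and other != word:
                counts[(word, other)] += 1
                counts[(other, word)] += 1
    return counts


tokens = "visual cortex responds to visual motion in motion areas".split()
counts = cooccurrence_counts(tokens, ["visual", "cortex", "motion"], window=2)
```

With `window=2`, "visual" and "cortex" co-occur once, while "cortex" and "motion" never do because they are four tokens apart. Repeating this per document would give one V x V table per document, matching the shape of the documented return value.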

generate_counts(text_df, text_column='abstract', tfidf=True)
Generate tf-idf weights for unigrams/bigrams derived from textual data.
Parameters: text_df ((D x 2) pandas.DataFrame) – A DataFrame with two columns (‘id’ and ‘text’). D = document.
Returns: weights_df – A DataFrame where the index is ‘id’ and the columns are the unigrams/bigrams derived from the data. D = document. T = term.
Return type: (D x T) pandas.DataFrame
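
To make the weighting concrete, here is a small standard-library sketch of tf-idf over unigrams and bigrams. It assumes one common variant (raw term count times log(D / document frequency)) and whitespace tokenization; the function may use a different variant, and `ngrams`, `tfidf_weights`, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens):
    """Return unigrams plus bigrams (adjacent word pairs joined by a space)."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf_weights(docs):
    """Map each document ID to {term: tf * idf}.

    tf is the raw count of the term in the document; idf is log(D / df),
    where df is the number of documents containing the term.
    """
    term_counts = {
        doc_id: Counter(ngrams(text.lower().split()))
        for doc_id, text in docs.items()
    }
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(docs)
    return {
        doc_id: {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        for doc_id, counts in term_counts.items()
    }


# Made-up documents keyed by ID, standing in for the 'id' and text columns.
docs = {"study-1": "visual cortex activation", "study-2": "motor cortex activation"}
weights = tfidf_weights(docs)
```

Under this variant, terms that appear in every document (here "cortex" and "cortex activation") get a weight of zero, while terms unique to one document (such as "visual cortex") get the highest weight.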

uk_to_us(text)
Convert UK spellings to US based on a converter.
english_spellings.csv: From http://www.tysto.com/uk-us-spelling-list.html
Parameters: text (str)
Returns: text
Return type: str
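
A spelling converter of this kind can be sketched as a whole-word lookup against a UK-to-US table. The three-entry mapping below is a hypothetical stand-in for the spelling list, `convert_spelling` is not the library function, and the sketch assumes lowercase input:

```python
import re

# Hypothetical stand-in for a UK-to-US spelling table (not the real CSV contents).
UK_TO_US = {"colour": "color", "analyse": "analyze", "behaviour": "behavior"}


def convert_spelling(text, mapping=UK_TO_US):
    """Replace whole words found in `mapping` with their US spelling."""
    # \b anchors restrict matches to whole words, so "colourful" is left alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(0)], text)


converted = convert_spelling("we analyse colour and behaviour")
```

Matching whole words only means derived forms must be listed in the table explicitly, which trades coverage for predictability.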