
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/01_datasets/07_plot_parquet_studyset.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_01_datasets_07_plot_parquet_studyset.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_01_datasets_07_plot_parquet_studyset.py:


Loading Studysets from parquet.

.. _parquet_studyset:

===============================
Loading Studysets from parquet
===============================

NiMARE can load a :class:`~nimare.nimads.Studyset` from a directory of parquet
files. This table-backed format is useful for large NeuroStore Studysets because
it avoids parsing a single large nested JSON file and keeps the Studyset lazy
until nested Study/Analysis objects are explicitly needed.

The main use case for this format will be distributed Studyset releases from
https://www.neurostore.org/api/neurostore-studyset-releases/. Release archives
are expected to contain the same manifest and table layout demonstrated here.

.. GENERATED FROM PYTHON SOURCE LINES 18-26

.. code-block:: Python

    from pathlib import Path

    import pandas as pd

    from nimare.nimads import Studyset
    from nimare.utils import get_resource_path


.. GENERATED FROM PYTHON SOURCE LINES 27-32

Find the example parquet Studyset
-----------------------------------------------------------------------------
A parquet Studyset directory contains a ``studyset.json`` manifest and one
parquet file per canonical Studyset table. This example uses a small packaged
slice of a NeuroStore release.

.. GENERATED FROM PYTHON SOURCE LINES 32-45

.. code-block:: Python

    parquet_dir = Path(get_resource_path()) / "neurostore_parquet_studyset"
    if not parquet_dir.exists():
        # Support running this example directly from a source checkout before the
        # new packaged resource has been installed into the active environment.
        parquet_dir = (
            Path(__file__).resolve().parents[2]
            / "nimare"
            / "resources"
            / "neurostore_parquet_studyset"
        )

    print(sorted(path.name for path in parquet_dir.iterdir()))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ['analyses.parquet', 'annotations.parquet', 'coordinates.parquet', 'images.parquet', 'metadata.parquet', 'studies.parquet', 'studyset.json', 'texts.parquet']


.. GENERATED FROM PYTHON SOURCE LINES 46-50

Inspect the manifest
-----------------------------------------------------------------------------
The manifest records the Studyset id/name, schema version, annotation ids, and
table filenames.

.. GENERATED FROM PYTHON SOURCE LINES 50-53

.. code-block:: Python

    print((parquet_dir / "studyset.json").read_text())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {
      "annotations": [
        {
          "id": "test-neurostore-annotation"
        }
      ],
      "format": "nimare-studyset-parquet",
      "id": "test-neurostore-parquet-studyset",
      "name": "test-neurostore-parquet-studyset",
      "tables": {
        "analyses": "analyses.parquet",
        "annotations": "annotations.parquet",
        "coordinates": "coordinates.parquet",
        "images": "images.parquet",
        "metadata": "metadata.parquet",
        "studies": "studies.parquet",
        "texts": "texts.parquet"
      },
      "version": 1
    }


.. GENERATED FROM PYTHON SOURCE LINES 54-66

Inspect the parquet table shapes
-----------------------------------------------------------------------------
The table layout is:

- ``studies.parquet``: one row per study, with ``study_id``, ``name``,
  ``description``, ``authors``, and ``publication``.
- ``analyses.parquet``: one row per analysis, with the full analysis ``id``.
- ``coordinates.parquet``: coordinate rows keyed by analysis id.
- ``metadata.parquet``: one row per analysis with metadata descriptors.
- ``annotations.parquet``: one row per analysis with annotation feature columns.
- ``images.parquet``: image references keyed by analysis id.
- ``texts.parquet``: text fields keyed by analysis id.

.. GENERATED FROM PYTHON SOURCE LINES 66-71

.. code-block:: Python

    for table_file in sorted(parquet_dir.glob("*.parquet")):
        table = pd.read_parquet(table_file)
        print(f"{table_file.name}: {table.shape}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    analyses.parquet: (8, 4)
    annotations.parquet: (8, 502)
    coordinates.parquet: (47, 15)
    images.parquet: (8, 5)
    metadata.parquet: (8, 155)
    studies.parquet: (4, 5)
    texts.parquet: (8, 3)


.. GENERATED FROM PYTHON SOURCE LINES 72-77

Load the Studyset
-----------------------------------------------------------------------------
The constructor recognizes a parquet Studyset directory and returns a
table-backed Studyset. The nested Study/Analysis object graph is not
materialized during loading.

.. GENERATED FROM PYTHON SOURCE LINES 77-85

.. code-block:: Python

    studyset = Studyset(parquet_dir)

    print(studyset)
    print(f"Studyset ID: {studyset.id}")
    print(f"Number of studies: {len(studyset.study_ids)}")
    print(f"Number of analyses: {len(studyset.ids)}")
    print(f"Materialized nested objects? {studyset.is_materialized}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Studyset: test-neurostore-parquet-studyset :: studies: 4
    Studyset ID: test-neurostore-parquet-studyset
    Number of studies: 4
    Number of analyses: 8
    Materialized nested objects? False


.. GENERATED FROM PYTHON SOURCE LINES 86-89

Work with table-backed views
-----------------------------------------------------------------------------
The standard Studyset table views are available immediately.

.. GENERATED FROM PYTHON SOURCE LINES 89-101

.. code-block:: Python

    print(studyset.coordinates.head())
    print(studyset.metadata.head())

    annotation_columns = [
        column
        for column in studyset.annotations_df.columns
        if column not in {"id", "study_id", "contrast_id"}
    ]
    print(f"Annotation feature columns: {len(annotation_columns)}")
    print(studyset.annotations_df[["id"] + annotation_columns[:5]].head())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                              id      study_id   contrast_id  ...  value_f    p  value_r
    0  22XctM7fX2Dw-D68jH5p6HXSj  22XctM7fX2Dw  D68jH5p6HXSj  ...      NaN  NaN      NaN
    1  22XctM7fX2Dw-D68jH5p6HXSj  22XctM7fX2Dw  D68jH5p6HXSj  ...      NaN  NaN      NaN
    2  22XctM7fX2Dw-D68jH5p6HXSj  22XctM7fX2Dw  D68jH5p6HXSj  ...      NaN  NaN      NaN
    3  22XctM7fX2Dw-D68jH5p6HXSj  22XctM7fX2Dw  D68jH5p6HXSj  ...      NaN  NaN      NaN
    4  22XctM7fX2Dw-D68jH5p6HXSj  22XctM7fX2Dw  D68jH5p6HXSj  ...      NaN  NaN      NaN

    [5 rows x 15 columns]
                              id      study_id  ...  ADNI  als_diagnostic_criteria
    0  22XctM7fX2Dw-D68jH5p6HXSj  22XctM7fX2Dw  ...   NaN                      NaN
    1  22XctM7fX2Dw-Ghcw82nz5KLD  22XctM7fX2Dw  ...   NaN                      NaN
    2  22iyhNgni5Du-fhbM8khTqcVx  22iyhNgni5Du  ...   NaN                      NaN
    3  22iyhNgni5Du-hzpLkdj5mGWX  22iyhNgni5Du  ...   NaN                      NaN
    4  22iyhNgni5Du-nAc8LAP6RwTB  22iyhNgni5Du  ...   NaN                      NaN

    [5 rows x 155 columns]
    Annotation feature columns: 499
                              id  ...  ParticipantDemographicsExtractor.groups[0].age_median
    0  22XctM7fX2Dw-D68jH5p6HXSj  ...                                                    NaN
    1  22XctM7fX2Dw-Ghcw82nz5KLD  ...                                                    NaN
    2  22iyhNgni5Du-fhbM8khTqcVx  ...                                                    NaN
    3  22iyhNgni5Du-hzpLkdj5mGWX  ...                                                    NaN
    4  22iyhNgni5Du-nAc8LAP6RwTB  ...                                                    NaN

    [5 rows x 6 columns]


.. GENERATED FROM PYTHON SOURCE LINES 102-107

Materialize only when needed
-----------------------------------------------------------------------------
Accessing ``studyset.studies`` reconstructs nested Study, Analysis, and Point
objects from the parquet-backed tables. Most Studyset-aware NiMARE workflows can
use the table-backed views without this step.

.. GENERATED FROM PYTHON SOURCE LINES 107-112

.. code-block:: Python

    first_study = studyset.studies[0]
    print(f"First study: {first_study.id}")
    print(f"Analyses in first study: {len(first_study.analyses)}")
    print(f"Materialized nested objects? {studyset.is_materialized}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    First study: 22XctM7fX2Dw
    Analyses in first study: 2
    Materialized nested objects? True


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.340 seconds)


.. _sphx_glr_download_auto_examples_01_datasets_07_plot_parquet_studyset.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 07_plot_parquet_studyset.ipynb <07_plot_parquet_studyset.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 07_plot_parquet_studyset.py <07_plot_parquet_studyset.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 07_plot_parquet_studyset.zip <07_plot_parquet_studyset.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
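
Because the ``studyset.json`` manifest names every table file, a downloaded
release directory can be sanity-checked before it is handed to ``Studyset``.
The sketch below is one possible way to do that with the standard library only.
``missing_tables`` is a hypothetical helper written for this illustration, not
part of NiMARE, and it assumes the manifest carries a ``tables`` mapping of
table name to filename as in the manifest printed in this example.

.. code-block:: Python

    import json
    from pathlib import Path


    def missing_tables(parquet_dir):
        """Return manifest table files that are absent from ``parquet_dir``.

        Hypothetical helper for illustration; not part of NiMARE.
        """
        parquet_dir = Path(parquet_dir)
        manifest = json.loads((parquet_dir / "studyset.json").read_text())
        # Assumption: "tables" maps a table name to its filename, as in the
        # manifest printed in this example.
        return sorted(
            filename
            for filename in manifest.get("tables", {}).values()
            if not (parquet_dir / filename).is_file()
        )

For the packaged example directory, whose listing contains all seven table
files named in the manifest, ``missing_tables(parquet_dir)`` should return an
empty list. A non-empty result would point to an incomplete download or archive
extraction.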