{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial 01: Quickstart\n",
    "\n",
    "This notebook walks through a minimal Ageas workflow on synthetic data:\n",
    "1. Generate a synthetic AnnData with informative, redundant, and noise genes\n",
    "2. Wrap it in a `Multimodal_Corpus` (the dataset object Ageas consumes)\n",
    "3. Load a model hangar from a config folder\n",
    "4. Run `n_kfold_selection` to pick the best classifiers\n",
    "5. Predict cell-type probabilities on the corpus\n",
    "6. Call `Deck.debrief()` to obtain per-class feature importances\n",
    "\n",
    "The synthetic data has 2 classes and 20 features (2 informative, 2 redundant, and 16 noise),\n",
    "so the whole notebook runs in under a minute on CPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import shutil\n",
    "import warnings\n",
    "\n",
    "import pandas as pd\n",
    "from sklearn.metrics import accuracy_score, roc_auc_score\n",
    "\n",
    "from ageas import Hangar, n_kfold_selection\n",
    "from ageas.tool import Multimodal_Corpus, make_fake_adata\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Synthetic data\n",
    "\n",
    "`make_fake_adata` creates an AnnData with:\n",
    "- `obs['celltype']`: integer cell-type labels\n",
    "- `var['name']`: gene symbol strings\n",
    "- 20 genes by default: 16 noise, 2 informative, and 2 redundant"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Seed set to 42\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AnnData object with n_obs × n_vars = 100 × 20\n",
      "    obs: 'celltype'\n",
      "    var: 'name'\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>celltype</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>fake_cell_0</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_cell_1</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_cell_2</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_cell_3</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_cell_4</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            celltype\n",
       "fake_cell_0        0\n",
       "fake_cell_1        1\n",
       "fake_cell_2        0\n",
       "fake_cell_3        0\n",
       "fake_cell_4        0"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adata = make_fake_adata(n_class=2, n_clusters_per_class=1)\n",
    "print(adata)\n",
    "adata.obs.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save to disk — Multimodal_Corpus reads from a file path.\n",
    "# In a real workflow, point this at your own .h5ad file.\n",
    "adata_path = \"ageas_tut01.h5ad\"\n",
    "adata.write_h5ad(adata_path)"
   ]
  },
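  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below shows one way a generator like this could be built, using\n",
    "scikit-learn's `make_classification`. It is an illustration only: that\n",
    "`make_fake_adata` works this way is an assumption, and every `sketch_` name is\n",
    "hypothetical. With `shuffle=False` the informative columns come first, then the\n",
    "redundant ones, then the noise columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.datasets import make_classification\n",
    "\n",
    "# Assumed recipe (not the Ageas implementation): 2 informative + 2 redundant + 16 noise.\n",
    "sketch_X, sketch_y = make_classification(\n",
    "    n_samples=100,\n",
    "    n_features=20,\n",
    "    n_informative=2,\n",
    "    n_redundant=2,\n",
    "    n_classes=2,\n",
    "    n_clusters_per_class=1,\n",
    "    shuffle=False,      # keep column order: informative, redundant, noise\n",
    "    random_state=42,\n",
    ")\n",
    "\n",
    "# Hypothetical gene names that follow the column order above.\n",
    "sketch_genes = (\n",
    "    [f\"informative_gene_{i}\" for i in range(2)]\n",
    "    + [f\"redundant_gene_{i}\" for i in range(2)]\n",
    "    + [f\"fake_gene_{i}\" for i in range(16)]\n",
    ")\n",
    "sketch_df = pd.DataFrame(sketch_X, columns=sketch_genes)\n",
    "print(sketch_df.shape)"
   ]
  },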
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Build corpus\n",
    "\n",
    "`Multimodal_Corpus` wraps the AnnData file and tracks the label → integer\n",
    "mapping needed by the classifiers.  It also implements PyTorch's\n",
    "`Dataset` interface so Ageas can stream batches via DataLoaders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Label map: {0: 0, 1: 1}\n",
      "Cells: 100\n"
     ]
    }
   ],
   "source": [
    "corpus = Multimodal_Corpus(\n",
    "    adata_path,\n",
    "    label_key=\"celltype\",\n",
    "    backed=False,   # load fully into RAM (fine for small data)\n",
    ")\n",
    "print(\"Label map:\", corpus.label_dict)\n",
    "print(\"Cells:\", len(corpus))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Load hangar\n",
    "\n",
    "A `Hangar` reads a folder of JSON config files, one sub-folder per model\n",
    "family (`logreg`, `svc`, `xgb`, `mlp`, `resnet`, `rnn`).  The\n",
    "`data/configs/sample_panel` directory bundled with the repo is a good\n",
    "starting point for experimentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hangar loaded 8 units\n"
     ]
    }
   ],
   "source": [
    "# Path is relative to this notebook's directory; adjust it if you run from elsewhere.\n",
    "config_folder = \"../../data/configs/sample_panel\"\n",
    "\n",
    "hangar = Hangar(config_folder)\n",
    "print(f\"Hangar loaded {len(hangar.units)} units\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Model selection with `n_kfold_selection`\n",
    "\n",
    "`n_kfold_selection` runs k-fold cross-validation and keeps only the models\n",
    "whose `monitor_metric` score clears the `retention_point` threshold. The\n",
    "survivors are then retrained on the full dataset in a final \"last mission\"\n",
    "pass, which runs when `skip_final=False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.5 K  | train\n",
      "1 | blocks    | ModuleList         | 8.8 K  | train\n",
      "2 | dropout   | Dropout            | 0      | train\n",
      "3 | fc        | Linear             | 18     | train\n",
      "4 | criterion | CrossEntropyLoss   | 0      | train\n",
      "5 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "6 | f1        | MulticlassF1Score  | 0      | train\n",
      "7 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "10.3 K    Trainable params\n",
      "0         Non-trainable params\n",
      "10.3 K    Total params\n",
      "0.041     Total estimated model params size (MB)\n",
      "29        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.4 K  | train\n",
      "1 | maxpool   | MaxPool1d          | 0      | train\n",
      "2 | blocks    | ModuleList         | 2.8 K  | train\n",
      "3 | avgpool   | AdaptiveAvgPool1d  | 0      | train\n",
      "4 | dropout   | Dropout            | 0      | train\n",
      "5 | fc        | Linear             | 34     | train\n",
      "6 | criterion | CrossEntropyLoss   | 0      | train\n",
      "7 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "8 | f1        | MulticlassF1Score  | 0      | train\n",
      "9 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "4.2 K     Trainable params\n",
      "0         Non-trainable params\n",
      "4.2 K     Total params\n",
      "0.017     Total estimated model params size (MB)\n",
      "37        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.4 K  | train\n",
      "1 | maxpool   | MaxPool1d          | 0      | train\n",
      "2 | blocks    | ModuleList         | 2.5 K  | train\n",
      "3 | avgpool   | AdaptiveAvgPool1d  | 0      | train\n",
      "4 | dropout   | Dropout            | 0      | train\n",
      "5 | fc        | Linear             | 10     | train\n",
      "6 | criterion | CrossEntropyLoss   | 0      | train\n",
      "7 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "8 | f1        | MulticlassF1Score  | 0      | train\n",
      "9 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "3.9 K     Trainable params\n",
      "0         Non-trainable params\n",
      "3.9 K     Total params\n",
      "0.016     Total estimated model params size (MB)\n",
      "30        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.5 K  | train\n",
      "1 | blocks    | ModuleList         | 5.7 K  | train\n",
      "2 | dropout   | Dropout            | 0      | train\n",
      "3 | fc        | Linear             | 18     | train\n",
      "4 | criterion | CrossEntropyLoss   | 0      | train\n",
      "5 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "6 | f1        | MulticlassF1Score  | 0      | train\n",
      "7 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "7.2 K     Trainable params\n",
      "0         Non-trainable params\n",
      "7.2 K     Total params\n",
      "0.029     Total estimated model params size (MB)\n",
      "41        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=2` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.5 K  | train\n",
      "1 | blocks    | ModuleList         | 8.8 K  | train\n",
      "2 | dropout   | Dropout            | 0      | train\n",
      "3 | fc        | Linear             | 18     | train\n",
      "4 | criterion | CrossEntropyLoss   | 0      | train\n",
      "5 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "6 | f1        | MulticlassF1Score  | 0      | train\n",
      "7 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "10.3 K    Trainable params\n",
      "0         Non-trainable params\n",
      "10.3 K    Total params\n",
      "0.041     Total estimated model params size (MB)\n",
      "29        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.4 K  | train\n",
      "1 | maxpool   | MaxPool1d          | 0      | train\n",
      "2 | blocks    | ModuleList         | 2.8 K  | train\n",
      "3 | avgpool   | AdaptiveAvgPool1d  | 0      | train\n",
      "4 | dropout   | Dropout            | 0      | train\n",
      "5 | fc        | Linear             | 34     | train\n",
      "6 | criterion | CrossEntropyLoss   | 0      | train\n",
      "7 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "8 | f1        | MulticlassF1Score  | 0      | train\n",
      "9 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "4.2 K     Trainable params\n",
      "0         Non-trainable params\n",
      "4.2 K     Total params\n",
      "0.017     Total estimated model params size (MB)\n",
      "37        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.4 K  | train\n",
      "1 | maxpool   | MaxPool1d          | 0      | train\n",
      "2 | blocks    | ModuleList         | 2.5 K  | train\n",
      "3 | avgpool   | AdaptiveAvgPool1d  | 0      | train\n",
      "4 | dropout   | Dropout            | 0      | train\n",
      "5 | fc        | Linear             | 10     | train\n",
      "6 | criterion | CrossEntropyLoss   | 0      | train\n",
      "7 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "8 | f1        | MulticlassF1Score  | 0      | train\n",
      "9 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "3.9 K     Trainable params\n",
      "0         Non-trainable params\n",
      "3.9 K     Total params\n",
      "0.016     Total estimated model params size (MB)\n",
      "30        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.5 K  | train\n",
      "1 | blocks    | ModuleList         | 5.7 K  | train\n",
      "2 | dropout   | Dropout            | 0      | train\n",
      "3 | fc        | Linear             | 18     | train\n",
      "4 | criterion | CrossEntropyLoss   | 0      | train\n",
      "5 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "6 | f1        | MulticlassF1Score  | 0      | train\n",
      "7 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "7.2 K     Trainable params\n",
      "0         Non-trainable params\n",
      "7.2 K     Total params\n",
      "0.029     Total estimated model params size (MB)\n",
      "41        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=2` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.5 K  | train\n",
      "1 | blocks    | ModuleList         | 8.8 K  | train\n",
      "2 | dropout   | Dropout            | 0      | train\n",
      "3 | fc        | Linear             | 18     | train\n",
      "4 | criterion | CrossEntropyLoss   | 0      | train\n",
      "5 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "6 | f1        | MulticlassF1Score  | 0      | train\n",
      "7 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "10.3 K    Trainable params\n",
      "0         Non-trainable params\n",
      "10.3 K    Total params\n",
      "0.041     Total estimated model params size (MB)\n",
      "29        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.4 K  | train\n",
      "1 | maxpool   | MaxPool1d          | 0      | train\n",
      "2 | blocks    | ModuleList         | 2.8 K  | train\n",
      "3 | avgpool   | AdaptiveAvgPool1d  | 0      | train\n",
      "4 | dropout   | Dropout            | 0      | train\n",
      "5 | fc        | Linear             | 34     | train\n",
      "6 | criterion | CrossEntropyLoss   | 0      | train\n",
      "7 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "8 | f1        | MulticlassF1Score  | 0      | train\n",
      "9 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "4.2 K     Trainable params\n",
      "0         Non-trainable params\n",
      "4.2 K     Total params\n",
      "0.017     Total estimated model params size (MB)\n",
      "37        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.4 K  | train\n",
      "1 | maxpool   | MaxPool1d          | 0      | train\n",
      "2 | blocks    | ModuleList         | 2.5 K  | train\n",
      "3 | avgpool   | AdaptiveAvgPool1d  | 0      | train\n",
      "4 | dropout   | Dropout            | 0      | train\n",
      "5 | fc        | Linear             | 10     | train\n",
      "6 | criterion | CrossEntropyLoss   | 0      | train\n",
      "7 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "8 | f1        | MulticlassF1Score  | 0      | train\n",
      "9 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "3.9 K     Trainable params\n",
      "0         Non-trainable params\n",
      "3.9 K     Total params\n",
      "0.016     Total estimated model params size (MB)\n",
      "30        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=1` reached.\n",
      "GPU available: True (cuda), used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "\n",
      "  | Name      | Type               | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0 | embedder  | Sequential         | 1.5 K  | train\n",
      "1 | blocks    | ModuleList         | 5.7 K  | train\n",
      "2 | dropout   | Dropout            | 0      | train\n",
      "3 | fc        | Linear             | 18     | train\n",
      "4 | criterion | CrossEntropyLoss   | 0      | train\n",
      "5 | accuracy  | MulticlassAccuracy | 0      | train\n",
      "6 | f1        | MulticlassF1Score  | 0      | train\n",
      "7 | auroc     | MulticlassAUROC    | 0      | train\n",
      "---------------------------------------------------------\n",
      "7.2 K     Trainable params\n",
      "0         Non-trainable params\n",
      "7.2 K     Total params\n",
      "0.029     Total estimated model params size (MB)\n",
      "41        Modules in train mode\n",
      "0         Modules in eval mode\n",
      "`Trainer.fit` stopped: `max_epochs=2` reached.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Surviving units (8):\n",
      "  svc_linear_0\n",
      "  rnn_lstm_0\n",
      "  resnet_bottleneck_0\n",
      "  resnet_basic_0\n",
      "  logreg_sample_log_reg\n",
      "  mlp_resEnc_0\n",
      "  xgb_mn_0\n",
      "  xgb_mn_1\n"
     ]
    }
   ],
   "source": [
    "deck = n_kfold_selection(\n",
    "    hangar=hangar,\n",
    "    query_dataset=corpus,\n",
    "    test_dataset=corpus,     # using same data for demo; use a real holdout in practice\n",
    "    kfold_selection_list=[2],\n",
    "    valid_fraction=0.1,\n",
    "    monitor_metric=\"test.accuracy\",\n",
    "    n_dataloader_workers=1,\n",
    "    retention_point=0.5,     # keep any model with accuracy > 0.5\n",
    "    cutoff_point=0.0,\n",
    "    seed=42,\n",
    "    verbose=False,\n",
    ")\n",
    "shutil.rmtree(\"cache\", ignore_errors=True)\n",
    "\n",
    "print(f\"Surviving units ({len(deck.squad)}):\")\n",
    "for uid in deck.squad:\n",
    "    print(\" \", uid)"
   ]
  },
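  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The selection logic described above can be approximated with plain\n",
    "scikit-learn. This is a minimal sketch under stated assumptions, not the Ageas\n",
    "internals: the candidate pool, the toy data, and every `sketch_` name are\n",
    "hypothetical. Each candidate is scored with 2-fold cross-validation, kept only\n",
    "if its mean accuracy exceeds the threshold, and then refit on all the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_classification\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "# Toy data standing in for the corpus.\n",
    "sketch_X, sketch_y = make_classification(\n",
    "    n_samples=100, n_features=20, n_informative=2, n_redundant=2, random_state=42\n",
    ")\n",
    "\n",
    "# Hypothetical candidate pool standing in for the hangar's units.\n",
    "sketch_candidates = {\n",
    "    \"logreg\": LogisticRegression(max_iter=1000),\n",
    "    \"svc_linear\": SVC(kernel=\"linear\"),\n",
    "}\n",
    "sketch_retention_point = 0.5\n",
    "\n",
    "sketch_survivors = {}\n",
    "for name, model in sketch_candidates.items():\n",
    "    # Mean accuracy over 2 stratified folds.\n",
    "    score = cross_val_score(model, sketch_X, sketch_y, cv=2, scoring=\"accuracy\").mean()\n",
    "    if score > sketch_retention_point:\n",
    "        # Refit the survivor on the full dataset.\n",
    "        sketch_survivors[name] = model.fit(sketch_X, sketch_y)\n",
    "\n",
    "print(sorted(sketch_survivors))"
   ]
  },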
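  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Predict\n",
    "\n",
    "`Deck.predict` runs every surviving unit on the corpus and returns an\n",
    "ensemble probability matrix (shape `[n_cells, n_classes]`) together with\n",
    "the true labels. This demo predicts on the same corpus it trained on, so\n",
    "the metrics below are optimistic; evaluate on a held-out set in practice."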
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Ensemble accuracy : 1.0000\n",
      "Ensemble AUROC    : 1.0000\n"
     ]
    }
   ],
   "source": [
    "all_preds, all_labels = deck.predict(query_dataset=corpus)\n",
    "\n",
    "acc   = accuracy_score(all_labels, all_preds.argmax(axis=-1))\n",
    "auroc = roc_auc_score(all_labels, all_preds[:, 1])\n",
    "\n",
    "print(f\"Ensemble accuracy : {acc:.4f}\")\n",
    "print(f\"Ensemble AUROC    : {auroc:.4f}\")"
   ]
  },
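  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One common way to combine per-model probabilities into a single ensemble matrix\n",
    "is soft voting, that is, averaging the matrices. Whether `Deck.predict` uses\n",
    "exactly this rule is an assumption; the sketch below only illustrates the idea\n",
    "with two hypothetical models and hand-written probabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import accuracy_score, roc_auc_score\n",
    "\n",
    "# Hypothetical per-model class probabilities for 4 cells and 2 classes.\n",
    "sketch_probs = [\n",
    "    np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]),\n",
    "    np.array([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2], [0.1, 0.9]]),\n",
    "]\n",
    "sketch_labels = np.array([0, 1, 0, 1])\n",
    "\n",
    "# Soft voting: average the probability matrices across models.\n",
    "sketch_ensemble = np.mean(sketch_probs, axis=0)  # shape (n_cells, n_classes)\n",
    "\n",
    "sketch_acc = accuracy_score(sketch_labels, sketch_ensemble.argmax(axis=-1))\n",
    "sketch_auroc = roc_auc_score(sketch_labels, sketch_ensemble[:, 1])\n",
    "print(sketch_ensemble.shape, sketch_acc, sketch_auroc)"
   ]
  },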
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Feature importance with `Deck.debrief`\n",
    "\n",
    "`Deck.debrief` calls each unit's `explain()` method (coefficients for\n",
    "linear models, SHAP for XGBoost, Integrated Gradients for neural nets),\n",
    "aggregates the per-class scores into a single `DataFrame` indexed by\n",
    "feature name, and prints each unit's individual scores.\n",
    "\n",
    "A positive `Class_0_Scores` value means the feature pushes a cell toward class 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Aggregated importance table:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Class_0_Scores</th>\n",
       "      <th>Class_1_Scores</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>fake_gene_0</th>\n",
       "      <td>-1.103242</td>\n",
       "      <td>1.103242</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_1</th>\n",
       "      <td>14.013287</td>\n",
       "      <td>-14.013287</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_2</th>\n",
       "      <td>-0.503321</td>\n",
       "      <td>0.503321</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_3</th>\n",
       "      <td>1.012185</td>\n",
       "      <td>-1.012185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_4</th>\n",
       "      <td>-1.052482</td>\n",
       "      <td>1.052482</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_5</th>\n",
       "      <td>-0.549032</td>\n",
       "      <td>0.549032</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_6</th>\n",
       "      <td>0.704132</td>\n",
       "      <td>-0.704132</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_7</th>\n",
       "      <td>-1.332842</td>\n",
       "      <td>1.332842</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_8</th>\n",
       "      <td>0.825337</td>\n",
       "      <td>-0.825337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_9</th>\n",
       "      <td>1.004996</td>\n",
       "      <td>-1.004996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_10</th>\n",
       "      <td>-0.020199</td>\n",
       "      <td>0.020199</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_11</th>\n",
       "      <td>0.755457</td>\n",
       "      <td>-0.755457</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_12</th>\n",
       "      <td>10.399876</td>\n",
       "      <td>-10.399876</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_13</th>\n",
       "      <td>-0.500709</td>\n",
       "      <td>0.500709</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_14</th>\n",
       "      <td>-0.491509</td>\n",
       "      <td>0.491509</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fake_gene_15</th>\n",
       "      <td>-0.045541</td>\n",
       "      <td>0.045541</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>informative_gene_0</th>\n",
       "      <td>0.516871</td>\n",
       "      <td>-0.516871</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>informative_gene_1</th>\n",
       "      <td>0.312430</td>\n",
       "      <td>-0.312430</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>redundant_gene_0</th>\n",
       "      <td>-0.134174</td>\n",
       "      <td>0.134174</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>redundant_gene_1</th>\n",
       "      <td>-1.095171</td>\n",
       "      <td>1.095171</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    Class_0_Scores  Class_1_Scores\n",
       "fake_gene_0              -1.103242        1.103242\n",
       "fake_gene_1              14.013287      -14.013287\n",
       "fake_gene_2              -0.503321        0.503321\n",
       "fake_gene_3               1.012185       -1.012185\n",
       "fake_gene_4              -1.052482        1.052482\n",
       "fake_gene_5              -0.549032        0.549032\n",
       "fake_gene_6               0.704132       -0.704132\n",
       "fake_gene_7              -1.332842        1.332842\n",
       "fake_gene_8               0.825337       -0.825337\n",
       "fake_gene_9               1.004996       -1.004996\n",
       "fake_gene_10             -0.020199        0.020199\n",
       "fake_gene_11              0.755457       -0.755457\n",
       "fake_gene_12             10.399876      -10.399876\n",
       "fake_gene_13             -0.500709        0.500709\n",
       "fake_gene_14             -0.491509        0.491509\n",
       "fake_gene_15             -0.045541        0.045541\n",
       "informative_gene_0        0.516871       -0.516871\n",
       "informative_gene_1        0.312430       -0.312430\n",
       "redundant_gene_0         -0.134174        0.134174\n",
       "redundant_gene_1         -1.095171        1.095171"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "importance = deck.debrief(exp_dataset=corpus, verbose=True)\n",
    "print(\"\\nAggregated importance table:\")\n",
    "importance"
   ]
  },
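  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the two-class table above, `Class_1_Scores` is the exact negation of\n",
    "`Class_0_Scores`. One plausible source of that symmetry, shown here as an\n",
    "assumption rather than the actual `explain()` code, is a binary linear model:\n",
    "scikit-learn's `LogisticRegression` stores a single coefficient row describing\n",
    "class 1, and negating it gives the class-0 view. The toy data and all `sketch_`\n",
    "names are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Toy data standing in for the corpus.\n",
    "sketch_X, sketch_y = make_classification(\n",
    "    n_samples=100, n_features=20, n_informative=2, n_redundant=2, random_state=42\n",
    ")\n",
    "sketch_names = [f\"gene_{i}\" for i in range(sketch_X.shape[1])]\n",
    "\n",
    "sketch_model = LogisticRegression(max_iter=1000).fit(sketch_X, sketch_y)\n",
    "\n",
    "# For a binary problem, coef_ has shape (1, n_features) and describes class 1;\n",
    "# the class-0 scores are taken as its negation.\n",
    "sketch_coef = sketch_model.coef_[0]\n",
    "sketch_scores = pd.DataFrame(\n",
    "    {\"Class_0_Scores\": -sketch_coef, \"Class_1_Scores\": sketch_coef},\n",
    "    index=sketch_names,\n",
    ")\n",
    "sketch_scores.head()"
   ]
  },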
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "fake_gene_1     14.013287\n",
       "fake_gene_12    10.399876\n",
       "fake_gene_3      1.012185\n",
       "fake_gene_9      1.004996\n",
       "fake_gene_8      0.825337\n",
       "Name: Class_0_Scores, dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Top 5 features driving class 0\n",
    "importance[\"Class_0_Scores\"].sort_values(ascending=False).head(5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}