{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial 01: Quickstart\n",
"\n",
"This notebook walks through a minimal Ageas workflow on synthetic data:\n",
"\n",
"1. Generate a synthetic AnnData with informative, redundant, and noise genes\n",
"2. Wrap it in a `Multimodal_Corpus` (the dataset object Ageas consumes)\n",
"3. Load a model hangar from a config folder\n",
"4. Run `n_kfold_selection` to pick the best classifiers\n",
"5. Predict cell-type probabilities on the corpus\n",
"6. Call `Deck.debrief()` to obtain per-class feature importances\n",
"\n",
"The synthetic data has 2 classes and 20 features (2 informative, 2 redundant,\n",
"16 noise), so the whole notebook runs in under a minute on CPU."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"import warnings\n",
"\n",
"import pandas as pd\n",
"from sklearn.metrics import accuracy_score, roc_auc_score\n",
"\n",
"from ageas import Hangar, n_kfold_selection\n",
"from ageas.tool import Multimodal_Corpus, make_fake_adata\n",
"\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Synthetic data\n",
"\n",
"`make_fake_adata` creates an AnnData with:\n",
"- `obs['celltype']`: integer cell-type labels\n",
"- `var['name']`: gene symbol strings\n",
"- 20 genes by default: 16 noise (`fake_gene_*`), 2 informative (`informative_gene_*`), and 2 redundant (`redundant_gene_*`)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Seed set to 42\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"AnnData object with n_obs × n_vars = 100 × 20\n",
" obs: 'celltype'\n",
" var: 'name'\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>celltype</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>fake_cell_0</th><td>0</td></tr>\n",
"    <tr><th>fake_cell_1</th><td>1</td></tr>\n",
"    <tr><th>fake_cell_2</th><td>0</td></tr>\n",
"    <tr><th>fake_cell_3</th><td>0</td></tr>\n",
"    <tr><th>fake_cell_4</th><td>0</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" celltype\n",
"fake_cell_0 0\n",
"fake_cell_1 1\n",
"fake_cell_2 0\n",
"fake_cell_3 0\n",
"fake_cell_4 0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adata = make_fake_adata(n_class=2, n_clusters_per_class=1)\n",
"print(adata)\n",
"adata.obs.head()"
]
},
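{
"cell_type": "markdown",
"metadata": {},
"source": [
"The feature names (`informative_gene_*`, `redundant_gene_*`) and the `n_clusters_per_class` argument resemble scikit-learn's `make_classification`. Whether `make_fake_adata` uses it internally is an assumption. The sketch below only shows how a comparable matrix could be generated; `sketch_fake_matrix` is a hypothetical helper, not part of Ageas.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"\n",
"\n",
"def sketch_fake_matrix(n_cells=100, seed=42):\n",
"    # shuffle=False keeps the columns ordered as informative, redundant, noise.\n",
"    # It also leaves the rows grouped by class.\n",
"    X, y = make_classification(\n",
"        n_samples=n_cells,\n",
"        n_features=20,\n",
"        n_informative=2,\n",
"        n_redundant=2,\n",
"        n_classes=2,\n",
"        n_clusters_per_class=1,\n",
"        shuffle=False,\n",
"        random_state=seed,\n",
"    )\n",
"    # Names follow the column order produced above (an illustrative choice).\n",
"    names = (\n",
"        [f\"informative_gene_{i}\" for i in range(2)]\n",
"        + [f\"redundant_gene_{i}\" for i in range(2)]\n",
"        + [f\"fake_gene_{i}\" for i in range(16)]\n",
"    )\n",
"    return X, y, names\n",
"\n",
"\n",
"X, y, names = sketch_fake_matrix()\n",
"print(X.shape, sorted(set(y.tolist())), len(names))\n",
"```\n",
"\n",
"The column order here differs from the AnnData above, where the `fake_gene_*` columns come first."
]
},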
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save to disk — Multimodal_Corpus reads from a file path.\n",
"# In a real workflow, point this at your own .h5ad file.\n",
"adata_path = \"ageas_tut01.h5ad\"\n",
"adata.write_h5ad(adata_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Build corpus\n",
"\n",
"`Multimodal_Corpus` wraps the AnnData file and tracks the label → integer\n",
"mapping needed by the classifiers. It also implements PyTorch's\n",
"`Dataset` interface so Ageas can stream batches via DataLoaders."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Label map: {0: 0, 1: 1}\n",
"Cells: 100\n"
]
}
],
"source": [
"corpus = Multimodal_Corpus(\n",
"    adata_path,\n",
"    label_key=\"celltype\",\n",
"    backed=False,  # load fully into RAM (fine for small data)\n",
")\n",
"print(\"Label map:\", corpus.label_dict)\n",
"print(\"Cells:\", len(corpus))"
]
},
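{
"cell_type": "markdown",
"metadata": {},
"source": [
"With integer labels, as here, the printed label map is the identity `{0: 0, 1: 1}`. The mapping matters more for string labels. The sketch below illustrates the idea, assuming labels are indexed in sorted order; how `Multimodal_Corpus` actually orders them is not shown in this tutorial, and `build_label_map` and `encode` are hypothetical helpers.\n",
"\n",
"```python\n",
"def build_label_map(labels):\n",
"    # Map each distinct label to a stable integer index (sorted order).\n",
"    return {label: idx for idx, label in enumerate(sorted(set(labels)))}\n",
"\n",
"\n",
"def encode(labels, label_map):\n",
"    # Translate raw labels into the integer targets a classifier expects.\n",
"    return [label_map[label] for label in labels]\n",
"\n",
"\n",
"raw = [\"T cell\", \"B cell\", \"T cell\", \"NK cell\"]\n",
"label_map = build_label_map(raw)\n",
"print(label_map)\n",
"print(encode(raw, label_map))\n",
"```\n",
"\n",
"Sorting the labels first keeps the indices reproducible across runs, which matters when predictions are mapped back to class names."
]
},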
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Load hangar\n",
"\n",
"A `Hangar` reads a folder of JSON config files, one sub-folder per model\n",
"family (`logreg`, `svc`, `xgb`, `mlp`, `resnet`, `rnn`). The\n",
"`data/configs/sample_panel` directory bundled with the repo is a good\n",
"starting point for experimentation."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hangar loaded 8 units\n"
]
}
],
"source": [
"# This relative path assumes the notebook runs from its own directory, two\n",
"# levels below the repo root. Adjust it if your working directory differs.\n",
"config_folder = \"../../data/configs/sample_panel\"\n",
"\n",
"hangar = Hangar(config_folder)\n",
"print(f\"Hangar loaded {len(hangar.units)} units\")"
]
},
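{
"cell_type": "markdown",
"metadata": {},
"source": [
"The folder layout described in this section (one sub-folder per model family, each holding JSON files) can be read with the standard library alone. This is a rough sketch of that layout, not the `Hangar` implementation: `load_configs` is a hypothetical helper, and the config keys are made up because the real schema is not shown here.\n",
"\n",
"```python\n",
"import json\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def load_configs(root):\n",
"    # Group parsed JSON configs by sub-folder name (one sub-folder per family).\n",
"    configs = {}\n",
"    for family_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):\n",
"        files = sorted(family_dir.glob(\"*.json\"))\n",
"        configs[family_dir.name] = [json.loads(f.read_text()) for f in files]\n",
"    return configs\n",
"\n",
"\n",
"# Made-up config contents, written to a throwaway folder to exercise the loader.\n",
"fake_panel = {\"logreg\": {\"C\": 1.0}, \"svc\": {\"kernel\": \"linear\"}}\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    for family, params in fake_panel.items():\n",
"        folder = Path(tmp) / family\n",
"        folder.mkdir()\n",
"        (folder / \"0.json\").write_text(json.dumps(params))\n",
"    loaded = load_configs(tmp)\n",
"\n",
"print(loaded)\n",
"```"
]
},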
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Model selection with `n_kfold_selection`\n",
"\n",
"`n_kfold_selection` runs k-fold cross-validation and keeps only the models\n",
"whose `monitor_metric` score clears the `retention_point` threshold. With\n",
"`skip_final=False`, the survivors are then retrained on the full dataset in\n",
"a \"last mission\" pass. The call below does not pass `skip_final` explicitly."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.5 K | train\n",
"1 | blocks | ModuleList | 8.8 K | train\n",
"2 | dropout | Dropout | 0 | train\n",
"3 | fc | Linear | 18 | train\n",
"4 | criterion | CrossEntropyLoss | 0 | train\n",
"5 | accuracy | MulticlassAccuracy | 0 | train\n",
"6 | f1 | MulticlassF1Score | 0 | train\n",
"7 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"10.3 K Trainable params\n",
"0 Non-trainable params\n",
"10.3 K Total params\n",
"0.041 Total estimated model params size (MB)\n",
"29 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.4 K | train\n",
"1 | maxpool | MaxPool1d | 0 | train\n",
"2 | blocks | ModuleList | 2.8 K | train\n",
"3 | avgpool | AdaptiveAvgPool1d | 0 | train\n",
"4 | dropout | Dropout | 0 | train\n",
"5 | fc | Linear | 34 | train\n",
"6 | criterion | CrossEntropyLoss | 0 | train\n",
"7 | accuracy | MulticlassAccuracy | 0 | train\n",
"8 | f1 | MulticlassF1Score | 0 | train\n",
"9 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"4.2 K Trainable params\n",
"0 Non-trainable params\n",
"4.2 K Total params\n",
"0.017 Total estimated model params size (MB)\n",
"37 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.4 K | train\n",
"1 | maxpool | MaxPool1d | 0 | train\n",
"2 | blocks | ModuleList | 2.5 K | train\n",
"3 | avgpool | AdaptiveAvgPool1d | 0 | train\n",
"4 | dropout | Dropout | 0 | train\n",
"5 | fc | Linear | 10 | train\n",
"6 | criterion | CrossEntropyLoss | 0 | train\n",
"7 | accuracy | MulticlassAccuracy | 0 | train\n",
"8 | f1 | MulticlassF1Score | 0 | train\n",
"9 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"3.9 K Trainable params\n",
"0 Non-trainable params\n",
"3.9 K Total params\n",
"0.016 Total estimated model params size (MB)\n",
"30 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.5 K | train\n",
"1 | blocks | ModuleList | 5.7 K | train\n",
"2 | dropout | Dropout | 0 | train\n",
"3 | fc | Linear | 18 | train\n",
"4 | criterion | CrossEntropyLoss | 0 | train\n",
"5 | accuracy | MulticlassAccuracy | 0 | train\n",
"6 | f1 | MulticlassF1Score | 0 | train\n",
"7 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"7.2 K Trainable params\n",
"0 Non-trainable params\n",
"7.2 K Total params\n",
"0.029 Total estimated model params size (MB)\n",
"41 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=2` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.5 K | train\n",
"1 | blocks | ModuleList | 8.8 K | train\n",
"2 | dropout | Dropout | 0 | train\n",
"3 | fc | Linear | 18 | train\n",
"4 | criterion | CrossEntropyLoss | 0 | train\n",
"5 | accuracy | MulticlassAccuracy | 0 | train\n",
"6 | f1 | MulticlassF1Score | 0 | train\n",
"7 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"10.3 K Trainable params\n",
"0 Non-trainable params\n",
"10.3 K Total params\n",
"0.041 Total estimated model params size (MB)\n",
"29 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.4 K | train\n",
"1 | maxpool | MaxPool1d | 0 | train\n",
"2 | blocks | ModuleList | 2.8 K | train\n",
"3 | avgpool | AdaptiveAvgPool1d | 0 | train\n",
"4 | dropout | Dropout | 0 | train\n",
"5 | fc | Linear | 34 | train\n",
"6 | criterion | CrossEntropyLoss | 0 | train\n",
"7 | accuracy | MulticlassAccuracy | 0 | train\n",
"8 | f1 | MulticlassF1Score | 0 | train\n",
"9 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"4.2 K Trainable params\n",
"0 Non-trainable params\n",
"4.2 K Total params\n",
"0.017 Total estimated model params size (MB)\n",
"37 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.4 K | train\n",
"1 | maxpool | MaxPool1d | 0 | train\n",
"2 | blocks | ModuleList | 2.5 K | train\n",
"3 | avgpool | AdaptiveAvgPool1d | 0 | train\n",
"4 | dropout | Dropout | 0 | train\n",
"5 | fc | Linear | 10 | train\n",
"6 | criterion | CrossEntropyLoss | 0 | train\n",
"7 | accuracy | MulticlassAccuracy | 0 | train\n",
"8 | f1 | MulticlassF1Score | 0 | train\n",
"9 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"3.9 K Trainable params\n",
"0 Non-trainable params\n",
"3.9 K Total params\n",
"0.016 Total estimated model params size (MB)\n",
"30 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.5 K | train\n",
"1 | blocks | ModuleList | 5.7 K | train\n",
"2 | dropout | Dropout | 0 | train\n",
"3 | fc | Linear | 18 | train\n",
"4 | criterion | CrossEntropyLoss | 0 | train\n",
"5 | accuracy | MulticlassAccuracy | 0 | train\n",
"6 | f1 | MulticlassF1Score | 0 | train\n",
"7 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"7.2 K Trainable params\n",
"0 Non-trainable params\n",
"7.2 K Total params\n",
"0.029 Total estimated model params size (MB)\n",
"41 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=2` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.5 K | train\n",
"1 | blocks | ModuleList | 8.8 K | train\n",
"2 | dropout | Dropout | 0 | train\n",
"3 | fc | Linear | 18 | train\n",
"4 | criterion | CrossEntropyLoss | 0 | train\n",
"5 | accuracy | MulticlassAccuracy | 0 | train\n",
"6 | f1 | MulticlassF1Score | 0 | train\n",
"7 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"10.3 K Trainable params\n",
"0 Non-trainable params\n",
"10.3 K Total params\n",
"0.041 Total estimated model params size (MB)\n",
"29 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.4 K | train\n",
"1 | maxpool | MaxPool1d | 0 | train\n",
"2 | blocks | ModuleList | 2.8 K | train\n",
"3 | avgpool | AdaptiveAvgPool1d | 0 | train\n",
"4 | dropout | Dropout | 0 | train\n",
"5 | fc | Linear | 34 | train\n",
"6 | criterion | CrossEntropyLoss | 0 | train\n",
"7 | accuracy | MulticlassAccuracy | 0 | train\n",
"8 | f1 | MulticlassF1Score | 0 | train\n",
"9 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"4.2 K Trainable params\n",
"0 Non-trainable params\n",
"4.2 K Total params\n",
"0.017 Total estimated model params size (MB)\n",
"37 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.4 K | train\n",
"1 | maxpool | MaxPool1d | 0 | train\n",
"2 | blocks | ModuleList | 2.5 K | train\n",
"3 | avgpool | AdaptiveAvgPool1d | 0 | train\n",
"4 | dropout | Dropout | 0 | train\n",
"5 | fc | Linear | 10 | train\n",
"6 | criterion | CrossEntropyLoss | 0 | train\n",
"7 | accuracy | MulticlassAccuracy | 0 | train\n",
"8 | f1 | MulticlassF1Score | 0 | train\n",
"9 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"3.9 K Trainable params\n",
"0 Non-trainable params\n",
"3.9 K Total params\n",
"0.016 Total estimated model params size (MB)\n",
"30 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=1` reached.\n",
"GPU available: True (cuda), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n",
"\n",
" | Name | Type | Params | Mode \n",
"---------------------------------------------------------\n",
"0 | embedder | Sequential | 1.5 K | train\n",
"1 | blocks | ModuleList | 5.7 K | train\n",
"2 | dropout | Dropout | 0 | train\n",
"3 | fc | Linear | 18 | train\n",
"4 | criterion | CrossEntropyLoss | 0 | train\n",
"5 | accuracy | MulticlassAccuracy | 0 | train\n",
"6 | f1 | MulticlassF1Score | 0 | train\n",
"7 | auroc | MulticlassAUROC | 0 | train\n",
"---------------------------------------------------------\n",
"7.2 K Trainable params\n",
"0 Non-trainable params\n",
"7.2 K Total params\n",
"0.029 Total estimated model params size (MB)\n",
"41 Modules in train mode\n",
"0 Modules in eval mode\n",
"`Trainer.fit` stopped: `max_epochs=2` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Surviving units (8):\n",
" svc_linear_0\n",
" rnn_lstm_0\n",
" resnet_bottleneck_0\n",
" resnet_basic_0\n",
" logreg_sample_log_reg\n",
" mlp_resEnc_0\n",
" xgb_mn_0\n",
" xgb_mn_1\n"
]
}
],
"source": [
"deck = n_kfold_selection(\n",
"    hangar=hangar,\n",
"    query_dataset=corpus,\n",
"    test_dataset=corpus,  # same data for the demo; use a real holdout in practice\n",
"    kfold_selection_list=[2],\n",
"    valid_fraction=0.1,\n",
"    monitor_metric=\"test.accuracy\",\n",
"    n_dataloader_workers=1,\n",
"    retention_point=0.5,  # keep any model with accuracy > 0.5\n",
"    cutoff_point=0.0,\n",
"    seed=42,\n",
"    verbose=False,\n",
")\n",
"# Delete the local \"cache\" directory if one exists.\n",
"shutil.rmtree(\"cache\", ignore_errors=True)\n",
"\n",
"print(f\"Surviving units ({len(deck.squad)}):\")\n",
"for uid in deck.squad:\n",
"    print(\" \", uid)"
]
},
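{
"cell_type": "markdown",
"metadata": {},
"source": [
"The retention step described in this section can be sketched in plain Python. This is an illustration under stated assumptions, not the Ageas implementation: it assumes contiguous folds, a mean score across folds, and a strict greater-than comparison against `retention_point`. It does not model `cutoff_point` or `valid_fraction`, and `kfold_indices` and `select_survivors` are hypothetical helpers.\n",
"\n",
"```python\n",
"from statistics import mean\n",
"\n",
"\n",
"def kfold_indices(n_samples, k):\n",
"    # Split range(n_samples) into k contiguous folds and yield (train, valid) index lists.\n",
"    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]\n",
"    start = 0\n",
"    for size in fold_sizes:\n",
"        valid = list(range(start, start + size))\n",
"        train = [i for i in range(n_samples) if i < start or i >= start + size]\n",
"        yield train, valid\n",
"        start += size\n",
"\n",
"\n",
"def select_survivors(fold_scores, retention_point):\n",
"    # Keep the models whose mean score across folds exceeds the threshold.\n",
"    return [name for name, scores in fold_scores.items() if mean(scores) > retention_point]\n",
"\n",
"\n",
"folds = list(kfold_indices(n_samples=5, k=2))\n",
"# Made-up per-fold accuracies for two hypothetical models.\n",
"survivors = select_survivors({\"model_a\": [0.9, 0.8], \"model_b\": [0.4, 0.5]}, retention_point=0.5)\n",
"print(folds)\n",
"print(survivors)\n",
"```\n",
"\n",
"On a balanced two-class problem, a threshold of 0.5 only filters out models that do no better than chance, which fits the result above where all 8 units survive."
]
},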
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Predict\n",
"\n",
"`Deck.predict` runs every surviving unit on the corpus and returns an\n",
"ensemble probability matrix (shape `[n_cells, n_classes]`) and the true\n",
"labels. The corpus here is the same data the models were trained on, so the\n",
"metrics below are optimistic."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ensemble accuracy : 1.0000\n",
"Ensemble AUROC : 1.0000\n"
]
}
],
"source": [
"all_preds, all_labels = deck.predict(query_dataset=corpus)\n",
"\n",
"acc = accuracy_score(all_labels, all_preds.argmax(axis=-1))\n",
"# Binary task: roc_auc_score expects the positive-class probability (column 1).\n",
"auroc = roc_auc_score(all_labels, all_preds[:, 1])\n",
"\n",
"print(f\"Ensemble accuracy : {acc:.4f}\")\n",
"print(f\"Ensemble AUROC : {auroc:.4f}\")"
]
},
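{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial does not say how `Deck.predict` combines the per-unit outputs into one matrix. One common choice is unweighted soft voting, sketched below with made-up probabilities. Treat it as an assumption about the general technique rather than a description of Ageas; `soft_vote` is a hypothetical helper.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def soft_vote(prob_matrices):\n",
"    # Unweighted soft voting: average the [n_cells, n_classes] matrices across models.\n",
"    return np.mean(np.stack(prob_matrices), axis=0)\n",
"\n",
"\n",
"# Made-up probabilities from two hypothetical models for two cells.\n",
"model_a = np.array([[0.9, 0.1], [0.4, 0.6]])\n",
"model_b = np.array([[0.7, 0.3], [0.2, 0.8]])\n",
"ensemble = soft_vote([model_a, model_b])\n",
"predicted = ensemble.argmax(axis=-1)\n",
"print(ensemble)\n",
"print(predicted)\n",
"```\n",
"\n",
"Averaging probabilities instead of hard votes keeps each row a valid distribution, so the `argmax` and AUROC calls above apply unchanged."
]
},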
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Feature importance with `Deck.debrief`\n",
"\n",
"`Deck.debrief` calls each unit's `explain()` method (coefficients for\n",
"linear models, SHAP for XGBoost, Integrated Gradients for neural nets),\n",
"aggregates the per-class scores into a single `DataFrame` indexed by\n",
"feature name, and prints each unit's individual scores.\n",
"\n",
"A positive `Class_0_Scores` value means the feature pushes a cell toward class 0.\n",
"With two classes, as in the table below, `Class_1_Scores` mirrors it with the opposite sign."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Aggregated importance table:\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Class_0_Scores</th><th>Class_1_Scores</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>fake_gene_0</th><td>-1.103242</td><td>1.103242</td></tr>\n",
"    <tr><th>fake_gene_1</th><td>14.013287</td><td>-14.013287</td></tr>\n",
"    <tr><th>fake_gene_2</th><td>-0.503321</td><td>0.503321</td></tr>\n",
"    <tr><th>fake_gene_3</th><td>1.012185</td><td>-1.012185</td></tr>\n",
"    <tr><th>fake_gene_4</th><td>-1.052482</td><td>1.052482</td></tr>\n",
"    <tr><th>fake_gene_5</th><td>-0.549032</td><td>0.549032</td></tr>\n",
"    <tr><th>fake_gene_6</th><td>0.704132</td><td>-0.704132</td></tr>\n",
"    <tr><th>fake_gene_7</th><td>-1.332842</td><td>1.332842</td></tr>\n",
"    <tr><th>fake_gene_8</th><td>0.825337</td><td>-0.825337</td></tr>\n",
"    <tr><th>fake_gene_9</th><td>1.004996</td><td>-1.004996</td></tr>\n",
"    <tr><th>fake_gene_10</th><td>-0.020199</td><td>0.020199</td></tr>\n",
"    <tr><th>fake_gene_11</th><td>0.755457</td><td>-0.755457</td></tr>\n",
"    <tr><th>fake_gene_12</th><td>10.399876</td><td>-10.399876</td></tr>\n",
"    <tr><th>fake_gene_13</th><td>-0.500709</td><td>0.500709</td></tr>\n",
"    <tr><th>fake_gene_14</th><td>-0.491509</td><td>0.491509</td></tr>\n",
"    <tr><th>fake_gene_15</th><td>-0.045541</td><td>0.045541</td></tr>\n",
"    <tr><th>informative_gene_0</th><td>0.516871</td><td>-0.516871</td></tr>\n",
"    <tr><th>informative_gene_1</th><td>0.312430</td><td>-0.312430</td></tr>\n",
"    <tr><th>redundant_gene_0</th><td>-0.134174</td><td>0.134174</td></tr>\n",
"    <tr><th>redundant_gene_1</th><td>-1.095171</td><td>1.095171</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" Class_0_Scores Class_1_Scores\n",
"fake_gene_0 -1.103242 1.103242\n",
"fake_gene_1 14.013287 -14.013287\n",
"fake_gene_2 -0.503321 0.503321\n",
"fake_gene_3 1.012185 -1.012185\n",
"fake_gene_4 -1.052482 1.052482\n",
"fake_gene_5 -0.549032 0.549032\n",
"fake_gene_6 0.704132 -0.704132\n",
"fake_gene_7 -1.332842 1.332842\n",
"fake_gene_8 0.825337 -0.825337\n",
"fake_gene_9 1.004996 -1.004996\n",
"fake_gene_10 -0.020199 0.020199\n",
"fake_gene_11 0.755457 -0.755457\n",
"fake_gene_12 10.399876 -10.399876\n",
"fake_gene_13 -0.500709 0.500709\n",
"fake_gene_14 -0.491509 0.491509\n",
"fake_gene_15 -0.045541 0.045541\n",
"informative_gene_0 0.516871 -0.516871\n",
"informative_gene_1 0.312430 -0.312430\n",
"redundant_gene_0 -0.134174 0.134174\n",
"redundant_gene_1 -1.095171 1.095171"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"importance = deck.debrief(exp_dataset=corpus, verbose=True)\n",
"print(\"\\nAggregated importance table:\")\n",
"importance"
]
},
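{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sorting by signed score pushes strongly negative features to the bottom, even though they matter just as much for the other class. Ranking by absolute value is one way around that. The sketch below uses three rows copied from the aggregated table above; the small `scores` frame stands in for `importance` so the snippet runs on its own.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# A few rows copied from the aggregated importance table above.\n",
"scores = pd.DataFrame(\n",
"    {\n",
"        \"Class_0_Scores\": [14.013287, -1.332842, 0.516871],\n",
"        \"Class_1_Scores\": [-14.013287, 1.332842, -0.516871],\n",
"    },\n",
"    index=[\"fake_gene_1\", \"fake_gene_7\", \"informative_gene_0\"],\n",
")\n",
"\n",
"# Rank by magnitude so large negative scores are not buried.\n",
"ranked = scores[\"Class_0_Scores\"].abs().sort_values(ascending=False)\n",
"print(ranked)\n",
"\n",
"# In this two-class table the columns cancel out row by row.\n",
"print((scores[\"Class_0_Scores\"] + scores[\"Class_1_Scores\"]).abs().max())\n",
"```\n",
"\n",
"The same two lines should work on the full `importance` frame, assuming it keeps the column names shown in the table."
]
},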
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"fake_gene_1 14.013287\n",
"fake_gene_12 10.399876\n",
"fake_gene_3 1.012185\n",
"fake_gene_9 1.004996\n",
"fake_gene_8 0.825337\n",
"Name: Class_0_Scores, dtype: float64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Top 5 features driving class 0\n",
"importance[\"Class_0_Scores\"].sort_values(ascending=False).head(5)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}