Topic classification#

We can use the outputs of pycisTopic to train a model to predict topic probabilities for a given sequence.

Since we plan on adding detailed use cases describing topic classification later on, we will only provide a brief overview of the workflow here. Refer to the introductory notebook for a more detailed explanation of the CREsted workflow.


# Set package settings
import matplotlib
import os

## Set the font type to ensure text is saved as whole words
matplotlib.rcParams["pdf.fonttype"] = 42  # Use TrueType fonts instead of Type 3 fonts
matplotlib.rcParams["ps.fonttype"] = 42  # For PostScript as well, if needed

## Set the base directory for data retrieval with crested.get_dataset()/get_model()
os.environ['CRESTED_DATA_DIR'] = '/staging/leuven/stg_00002/lcb/cblaauw/'

Import data#

For this tutorial, we will use the mouse BICCN dataset. We will use the preprocessed, binarized outputs of pycisTopic as input data for the topic classification model.

To train a topic classification model, we need the following data:

  1. A folder containing BED files per topic (output of pycisTopic).

  2. A genome FASTA file and, optionally, a chromosome sizes file.

import crested
# Set the genome
genome = crested.Genome("mm10/genome.fa", "mm10/genome.chrom.sizes")
crested.register_genome(genome)  # Register the genome so that it's automatically used in every function
2026-02-16T15:04:11.542386+0100 INFO Genome genome registered.
# Download the tutorial data
beds_folder, regions_file = crested.get_dataset("mouse_cortex_bed")

We can import a folder of BED files using the crested.import_beds() function.
This will return an AnnData object with the regions as .var and the BED file names as .obs (here, our topics).
In this case, the adata.X values are binary, representing whether that region is associated with a topic or not.

# Import the beds into an AnnData object - the regions file is optional for import_beds
adata = crested.import_beds(beds_folder=beds_folder, regions_file=regions_file)
adata
2026-02-16T15:03:14.757905+0100 WARNING Chromsizes file not provided. Will not check if regions are within chromosomes
2026-02-16T15:03:15.642825+0100 INFO Reading bed files from /staging/leuven/stg_00002/lcb/cblaauw/data/mouse_biccn/beds.tar.gz.untar and using /staging/leuven/stg_00002/lcb/cblaauw/data/mouse_biccn/consensus_peaks_biccn.bed as var_names...
2026-02-16T15:03:29.218412+0100 WARNING 107610 consensus regions are not open in any class. Removing them from the AnnData object. Disable this behavior by setting 'remove_empty_regions=False'
AnnData object with n_obs × n_vars = 80 × 439383
    obs: 'file_path', 'n_open_regions'
    var: 'n_classes', 'chr', 'start', 'end'

We have 80 classes (topics) and 439383 regions in the dataset.
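To make this structure concrete, here is a minimal sketch (plain NumPy, not CREsted) of how a binary topic × region matrix relates to the `n_open_regions` and `n_classes` fields shown above; the toy values are made up:

```python
import numpy as np

# Toy binary matrix: 3 topics (rows, like .obs) x 5 regions (columns, like .var).
# X[i, j] == 1 means region j is associated with topic i.
X = np.array([
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],  # a topic with no regions
])

n_open_regions = X.sum(axis=1)  # per topic, like adata.obs["n_open_regions"]
n_classes = X.sum(axis=0)       # per region, like adata.var["n_classes"]

print(n_open_regions)  # [3 1 0]
print(n_classes)       # [1 0 2 1 0]

# Regions open in no class are what import_beds drops when remove_empty_regions=True
kept = X[:, n_classes > 0]
print(kept.shape)  # (3, 3)
```

The warning about 107610 removed consensus regions above corresponds to dropping exactly those all-zero columns.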

Preprocessing#

Compared to peak regression, topic classification requires little preprocessing.
The data does not need to be normalized, since the values are binary, and we don't filter regions on specificity: by the nature of topic modelling, the selected regions should already be 'meaningful'.
You could change the width of the regions, but we tend to keep them at 500bp for topic classification.
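If you do want a different region width, resizing boils down to re-centering each region on its midpoint. CREsted provides preprocessing helpers for this; the function below is only an illustrative sketch of the arithmetic, not the library implementation:

```python
def resize_region(chrom: str, start: int, end: int, width: int = 500) -> tuple[str, int, int]:
    """Re-center a region on its midpoint and give it a fixed width (illustrative only)."""
    center = (start + end) // 2
    new_start = max(0, center - width // 2)  # clamp at the chromosome start
    return chrom, new_start, new_start + width

print(resize_region("chr1", 1000, 1600, width=500))  # ('chr1', 1050, 1550)
```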

The only preprocessing step we need to perform is to split the data into training and testing sets.

# Standard train/val/test split
crested.pp.train_val_test_split(adata, strategy="chr", val_chroms=["chr8", "chr10"], test_chroms=["chr9", "chr18"])
print(adata.var["split"].value_counts())
2026-02-16T15:03:29.634609+0100 INFO Lazily importing module crested.pp. This could take a second...
split
train    354013
val       45113
test      40257
Name: count, dtype: int64
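The `strategy="chr"` split assigns every region to a fold based solely on its chromosome, so no region (or nearby sequence on the same chromosome) leaks between train, validation, and test. An illustrative stand-in for that assignment logic (not the CREsted implementation):

```python
def assign_split(chrom: str, val_chroms=("chr8", "chr10"), test_chroms=("chr9", "chr18")) -> str:
    """Chromosome-based fold assignment: val/test chromosomes are held out entirely."""
    if chrom in val_chroms:
        return "val"
    if chrom in test_chroms:
        return "test"
    return "train"

regions = ["chr1", "chr8", "chr9", "chr18", "chr2"]
print([assign_split(c) for c in regions])  # ['train', 'val', 'test', 'test', 'train']
```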

Model training#

Model training has the same workflow as peak regression. The only differences are:

  1. We select a different model architecture. Since we're training on 500bp regions, we don't need the dilated convolutions of the dilated CNN used for peak regression.

  2. We select a different config, since we’re monitoring other metrics and are using a different loss for classification.

# Datamodule
datamodule = crested.tl.data.AnnDataModule(
    adata,
    batch_size=128,  # lower this if you encounter OOM errors
    max_stochastic_shift=3,  # optional augmentation
    always_reverse_complement=True,  # default True. Will double the effective size of the training dataset.
)

# Architecture: we will use the DeepTopic CNN model
model_architecture = crested.tl.zoo.deeptopic_cnn(seq_len=500, num_classes=80)

# Config: we will use the default topic classification config (binary cross entropy loss and AUC/ROC metrics)
config = crested.tl.default_configs("topic_classification")
print(config)
2026-02-16T15:04:21.343827+0100 INFO Lazily importing module crested.tl. This could take a second...
TaskConfig(optimizer=<keras.src.optimizers.adam.Adam object at 0x14ba081b86e0>, loss=<LossFunctionWrapper(<function binary_crossentropy at 0x14ba02aa5080>, kwargs={'from_logits': False, 'label_smoothing': 0.0, 'axis': -1})>, metrics=[<AUC name=auROC>, <AUC name=auPR>, <CategoricalAccuracy name=categorical_accuracy>])
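The binary cross-entropy loss in this config treats each of the 80 topics as an independent yes/no label per region (multi-label, not softmax multi-class), which is why a region can score highly for several topics at once. A minimal NumPy version of the loss, for intuition only:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all (region, topic) entries (illustrative)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

# Two regions x two topics: confident-correct entries give low loss.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
print(round(binary_crossentropy(y_true, y_pred), 4))  # 0.1643
```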

Set up the trainer object and train the model:

trainer = crested.tl.Crested(
    data=datamodule,
    model=model_architecture,
    config=config,
    project_name="mouse_biccn",  # change to your liking
    run_name="topic_classification",
    logger='wandb',  # or 'tensorboard', None
)
trainer.fit(epochs=100)


Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)         Output Shape          Param #  Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ sequence            │ (None, 500, 4)    │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv1d (Conv1D)     │ (None, 500, 1024) │     69,632 │ sequence[0][0]    │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalization │ (None, 500, 1024) │      4,096 │ conv1d[0][0]      │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation          │ (None, 500, 1024) │          0 │ batch_normalizat… │
│ (Activation)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling1d       │ (None, 125, 1024) │          0 │ activation[0][0]  │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout (Dropout)   │ (None, 125, 1024) │          0 │ max_pooling1d[0]… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv1d_1 (Conv1D)   │ (None, 125, 512)  │  5,767,168 │ dropout[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 125, 512)  │      2,048 │ conv1d_1[0][0]    │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_1        │ (None, 125, 512)  │          0 │ batch_normalizat… │
│ (Activation)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling1d_1     │ (None, 32, 512)   │          0 │ activation_1[0][ │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_1 (Dropout) │ (None, 32, 512)   │          0 │ max_pooling1d_1[ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv1d_2 (Conv1D)   │ (None, 32, 512)   │  2,883,584 │ dropout_1[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 32, 512)   │      2,048 │ conv1d_2[0][0]    │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_2        │ (None, 32, 512)   │          0 │ batch_normalizat… │
│ (Activation)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling1d_2     │ (None, 8, 512)    │          0 │ activation_2[0][ │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_2 (Dropout) │ (None, 8, 512)    │          0 │ max_pooling1d_2[ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv1d_3 (Conv1D)   │ (None, 8, 512)    │  1,310,720 │ dropout_2[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 8, 512)    │      2,048 │ conv1d_3[0][0]    │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_3        │ (None, 8, 512)    │          0 │ batch_normalizat… │
│ (Activation)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add (Add)           │ (None, 8, 512)    │          0 │ activation_3[0][ │
│                     │                   │            │ dropout_2[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling1d_3     │ (None, 2, 512)    │          0 │ add[0][0]         │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_3 (Dropout) │ (None, 2, 512)    │          0 │ max_pooling1d_3[ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv1d_4 (Conv1D)   │ (None, 2, 512)    │    524,288 │ dropout_3[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 2, 512)    │      2,048 │ conv1d_4[0][0]    │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_4        │ (None, 2, 512)    │          0 │ batch_normalizat… │
│ (Activation)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_1 (Add)         │ (None, 2, 512)    │          0 │ activation_4[0][ │
│                     │                   │            │ dropout_3[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ flatten (Flatten)   │ (None, 1024)      │          0 │ add_1[0][0]       │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_4 (Dropout) │ (None, 1024)      │          0 │ flatten[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ denseblock_dense    │ (None, 1024)      │  1,048,576 │ dropout_4[0][0]   │
│ (Dense)             │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ denseblock_batchno… │ (None, 1024)      │      4,096 │ denseblock_dense… │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ denseblock_activat… │ (None, 1024)      │          0 │ denseblock_batch… │
│ (Activation)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ denseblock_dropout  │ (None, 1024)      │          0 │ denseblock_activ… │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense (Dense)       │ (None, 80)        │     82,000 │ denseblock_dropo… │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 11,702,352 (44.64 MB)
 Trainable params: 11,694,160 (44.61 MB)
 Non-trainable params: 8,192 (32.00 KB)
None
2026-02-16T15:05:49.654130+0100 INFO Loading sequences into memory...
2026-02-16T15:05:55.914284+0100 INFO Loading sequences into memory...
Epoch 1/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 128s 17ms/step - auPR: 0.0931 - auROC: 0.6814 - categorical_accuracy: 0.0411 - loss: 0.1609 - val_auPR: 0.1200 - val_auROC: 0.7129 - val_categorical_accuracy: 0.0336 - val_loss: 0.1599 - learning_rate: 0.0010
Epoch 2/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.1176 - auROC: 0.7174 - categorical_accuracy: 0.0476 - loss: 0.1559 - val_auPR: 0.1425 - val_auROC: 0.7447 - val_categorical_accuracy: 0.0503 - val_loss: 0.1573 - learning_rate: 0.0010
Epoch 3/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 72s 13ms/step - auPR: 0.1341 - auROC: 0.7440 - categorical_accuracy: 0.0584 - loss: 0.1552 - val_auPR: 0.1548 - val_auROC: 0.7579 - val_categorical_accuracy: 0.0579 - val_loss: 0.1573 - learning_rate: 0.0010
Epoch 4/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.1429 - auROC: 0.7535 - categorical_accuracy: 0.0663 - loss: 0.1546 - val_auPR: 0.1701 - val_auROC: 0.7731 - val_categorical_accuracy: 0.0704 - val_loss: 0.1558 - learning_rate: 0.0010
Epoch 5/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.1492 - auROC: 0.7589 - categorical_accuracy: 0.0708 - loss: 0.1537 - val_auPR: 0.1733 - val_auROC: 0.7771 - val_categorical_accuracy: 0.0598 - val_loss: 0.1536 - learning_rate: 0.0010
Epoch 6/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.1526 - auROC: 0.7620 - categorical_accuracy: 0.0734 - loss: 0.1526 - val_auPR: 0.1784 - val_auROC: 0.7787 - val_categorical_accuracy: 0.0780 - val_loss: 0.1520 - learning_rate: 0.0010
Epoch 7/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.1552 - auROC: 0.7638 - categorical_accuracy: 0.0757 - loss: 0.1516 - val_auPR: 0.1788 - val_auROC: 0.7786 - val_categorical_accuracy: 0.0808 - val_loss: 0.1526 - learning_rate: 0.0010
Epoch 8/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 80s 15ms/step - auPR: 0.1565 - auROC: 0.7650 - categorical_accuracy: 0.0763 - loss: 0.1510 - val_auPR: 0.1840 - val_auROC: 0.7833 - val_categorical_accuracy: 0.0874 - val_loss: 0.1500 - learning_rate: 0.0010
Epoch 9/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 82s 15ms/step - auPR: 0.1572 - auROC: 0.7659 - categorical_accuracy: 0.0773 - loss: 0.1505 - val_auPR: 0.1795 - val_auROC: 0.7831 - val_categorical_accuracy: 0.0747 - val_loss: 0.1497 - learning_rate: 0.0010
Epoch 10/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.1586 - auROC: 0.7665 - categorical_accuracy: 0.0774 - loss: 0.1501 - val_auPR: 0.1838 - val_auROC: 0.7881 - val_categorical_accuracy: 0.0829 - val_loss: 0.1480 - learning_rate: 0.0010
Epoch 11/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.1584 - auROC: 0.7673 - categorical_accuracy: 0.0788 - loss: 0.1498 - val_auPR: 0.1856 - val_auROC: 0.7873 - val_categorical_accuracy: 0.0909 - val_loss: 0.1475 - learning_rate: 0.0010
Epoch 12/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.1591 - auROC: 0.7673 - categorical_accuracy: 0.0790 - loss: 0.1496 - val_auPR: 0.1865 - val_auROC: 0.7880 - val_categorical_accuracy: 0.0928 - val_loss: 0.1477 - learning_rate: 0.0010
Epoch 13/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 82s 15ms/step - auPR: 0.1598 - auROC: 0.7679 - categorical_accuracy: 0.0787 - loss: 0.1494 - val_auPR: 0.1865 - val_auROC: 0.7887 - val_categorical_accuracy: 0.0909 - val_loss: 0.1470 - learning_rate: 0.0010
Epoch 14/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 79s 14ms/step - auPR: 0.1600 - auROC: 0.7680 - categorical_accuracy: 0.0786 - loss: 0.1493 - val_auPR: 0.1882 - val_auROC: 0.7892 - val_categorical_accuracy: 0.0878 - val_loss: 0.1473 - learning_rate: 0.0010
Epoch 15/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.1602 - auROC: 0.7681 - categorical_accuracy: 0.0788 - loss: 0.1492 - val_auPR: 0.1865 - val_auROC: 0.7868 - val_categorical_accuracy: 0.0810 - val_loss: 0.1471 - learning_rate: 0.0010
Epoch 16/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.1601 - auROC: 0.7684 - categorical_accuracy: 0.0793 - loss: 0.1490 - val_auPR: 0.1879 - val_auROC: 0.7892 - val_categorical_accuracy: 0.0839 - val_loss: 0.1473 - learning_rate: 0.0010
Epoch 17/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.1605 - auROC: 0.7685 - categorical_accuracy: 0.0794 - loss: 0.1490 - val_auPR: 0.1875 - val_auROC: 0.7898 - val_categorical_accuracy: 0.0880 - val_loss: 0.1473 - learning_rate: 0.0010
Epoch 18/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.1607 - auROC: 0.7685 - categorical_accuracy: 0.0796 - loss: 0.1489 - val_auPR: 0.1853 - val_auROC: 0.7874 - val_categorical_accuracy: 0.0876 - val_loss: 0.1472 - learning_rate: 0.0010
Epoch 19/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.1763 - auROC: 0.7826 - categorical_accuracy: 0.0897 - loss: 0.1437 - val_auPR: 0.2057 - val_auROC: 0.8041 - val_categorical_accuracy: 0.0999 - val_loss: 0.1400 - learning_rate: 2.5000e-04
Epoch 20/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 84s 15ms/step - auPR: 0.1804 - auROC: 0.7854 - categorical_accuracy: 0.0936 - loss: 0.1421 - val_auPR: 0.2087 - val_auROC: 0.8053 - val_categorical_accuracy: 0.1068 - val_loss: 0.1393 - learning_rate: 2.5000e-04
Epoch 21/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.1819 - auROC: 0.7864 - categorical_accuracy: 0.0943 - loss: 0.1417 - val_auPR: 0.2084 - val_auROC: 0.8061 - val_categorical_accuracy: 0.1063 - val_loss: 0.1392 - learning_rate: 2.5000e-04
Epoch 22/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 86s 15ms/step - auPR: 0.1830 - auROC: 0.7871 - categorical_accuracy: 0.0960 - loss: 0.1415 - val_auPR: 0.2106 - val_auROC: 0.8065 - val_categorical_accuracy: 0.1070 - val_loss: 0.1388 - learning_rate: 2.5000e-04
Epoch 23/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.1837 - auROC: 0.7875 - categorical_accuracy: 0.0961 - loss: 0.1415 - val_auPR: 0.2104 - val_auROC: 0.8062 - val_categorical_accuracy: 0.1035 - val_loss: 0.1393 - learning_rate: 2.5000e-04
Epoch 24/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.1842 - auROC: 0.7880 - categorical_accuracy: 0.0973 - loss: 0.1414 - val_auPR: 0.2116 - val_auROC: 0.8074 - val_categorical_accuracy: 0.1098 - val_loss: 0.1389 - learning_rate: 2.5000e-04
Epoch 25/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 79s 14ms/step - auPR: 0.1847 - auROC: 0.7882 - categorical_accuracy: 0.0976 - loss: 0.1414 - val_auPR: 0.2112 - val_auROC: 0.8064 - val_categorical_accuracy: 0.1128 - val_loss: 0.1392 - learning_rate: 2.5000e-04
Epoch 26/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.1851 - auROC: 0.7885 - categorical_accuracy: 0.0980 - loss: 0.1414 - val_auPR: 0.2129 - val_auROC: 0.8082 - val_categorical_accuracy: 0.1147 - val_loss: 0.1387 - learning_rate: 2.5000e-04
Epoch 27/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.1854 - auROC: 0.7888 - categorical_accuracy: 0.0982 - loss: 0.1413 - val_auPR: 0.2135 - val_auROC: 0.8075 - val_categorical_accuracy: 0.1167 - val_loss: 0.1387 - learning_rate: 2.5000e-04
Epoch 28/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.1861 - auROC: 0.7892 - categorical_accuracy: 0.0984 - loss: 0.1413 - val_auPR: 0.2134 - val_auROC: 0.8079 - val_categorical_accuracy: 0.1113 - val_loss: 0.1387 - learning_rate: 2.5000e-04
Epoch 29/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.1861 - auROC: 0.7891 - categorical_accuracy: 0.0992 - loss: 0.1413 - val_auPR: 0.2143 - val_auROC: 0.8087 - val_categorical_accuracy: 0.1125 - val_loss: 0.1387 - learning_rate: 2.5000e-04
Epoch 30/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.1865 - auROC: 0.7896 - categorical_accuracy: 0.0991 - loss: 0.1412 - val_auPR: 0.2148 - val_auROC: 0.8093 - val_categorical_accuracy: 0.1131 - val_loss: 0.1384 - learning_rate: 2.5000e-04
Epoch 31/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.1867 - auROC: 0.7895 - categorical_accuracy: 0.0994 - loss: 0.1413 - val_auPR: 0.2158 - val_auROC: 0.8099 - val_categorical_accuracy: 0.1162 - val_loss: 0.1382 - learning_rate: 2.5000e-04
Epoch 32/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.1874 - auROC: 0.7897 - categorical_accuracy: 0.0992 - loss: 0.1412 - val_auPR: 0.2143 - val_auROC: 0.8088 - val_categorical_accuracy: 0.1093 - val_loss: 0.1386 - learning_rate: 2.5000e-04
Epoch 33/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.1874 - auROC: 0.7898 - categorical_accuracy: 0.1001 - loss: 0.1412 - val_auPR: 0.2151 - val_auROC: 0.8088 - val_categorical_accuracy: 0.1133 - val_loss: 0.1385 - learning_rate: 2.5000e-04
Epoch 34/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 79s 14ms/step - auPR: 0.1878 - auROC: 0.7901 - categorical_accuracy: 0.0998 - loss: 0.1412 - val_auPR: 0.2149 - val_auROC: 0.8090 - val_categorical_accuracy: 0.1097 - val_loss: 0.1384 - learning_rate: 2.5000e-04
Epoch 35/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.1880 - auROC: 0.7901 - categorical_accuracy: 0.0999 - loss: 0.1412 - val_auPR: 0.2156 - val_auROC: 0.8097 - val_categorical_accuracy: 0.1137 - val_loss: 0.1384 - learning_rate: 2.5000e-04
Epoch 36/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 89s 16ms/step - auPR: 0.1881 - auROC: 0.7902 - categorical_accuracy: 0.1003 - loss: 0.1411 - val_auPR: 0.2158 - val_auROC: 0.8097 - val_categorical_accuracy: 0.1132 - val_loss: 0.1384 - learning_rate: 2.5000e-04
Epoch 37/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.1958 - auROC: 0.7961 - categorical_accuracy: 0.1049 - loss: 0.1396 - val_auPR: 0.2243 - val_auROC: 0.8150 - val_categorical_accuracy: 0.1224 - val_loss: 0.1363 - learning_rate: 6.2500e-05
Epoch 38/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 84s 15ms/step - auPR: 0.1987 - auROC: 0.7981 - categorical_accuracy: 0.1072 - loss: 0.1387 - val_auPR: 0.2259 - val_auROC: 0.8158 - val_categorical_accuracy: 0.1228 - val_loss: 0.1358 - learning_rate: 6.2500e-05
Epoch 39/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.1997 - auROC: 0.7990 - categorical_accuracy: 0.1075 - loss: 0.1383 - val_auPR: 0.2264 - val_auROC: 0.8160 - val_categorical_accuracy: 0.1224 - val_loss: 0.1356 - learning_rate: 6.2500e-05
Epoch 40/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.2007 - auROC: 0.7994 - categorical_accuracy: 0.1079 - loss: 0.1380 - val_auPR: 0.2261 - val_auROC: 0.8161 - val_categorical_accuracy: 0.1219 - val_loss: 0.1355 - learning_rate: 6.2500e-05
Epoch 41/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.2012 - auROC: 0.7998 - categorical_accuracy: 0.1079 - loss: 0.1378 - val_auPR: 0.2274 - val_auROC: 0.8163 - val_categorical_accuracy: 0.1213 - val_loss: 0.1353 - learning_rate: 6.2500e-05
Epoch 42/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.2017 - auROC: 0.8000 - categorical_accuracy: 0.1084 - loss: 0.1377 - val_auPR: 0.2273 - val_auROC: 0.8166 - val_categorical_accuracy: 0.1227 - val_loss: 0.1350 - learning_rate: 6.2500e-05
Epoch 43/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.2020 - auROC: 0.8003 - categorical_accuracy: 0.1088 - loss: 0.1376 - val_auPR: 0.2276 - val_auROC: 0.8169 - val_categorical_accuracy: 0.1223 - val_loss: 0.1350 - learning_rate: 6.2500e-05
Epoch 44/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.2021 - auROC: 0.8004 - categorical_accuracy: 0.1081 - loss: 0.1375 - val_auPR: 0.2275 - val_auROC: 0.8163 - val_categorical_accuracy: 0.1236 - val_loss: 0.1351 - learning_rate: 6.2500e-05
Epoch 45/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2025 - auROC: 0.8007 - categorical_accuracy: 0.1088 - loss: 0.1374 - val_auPR: 0.2274 - val_auROC: 0.8160 - val_categorical_accuracy: 0.1224 - val_loss: 0.1350 - learning_rate: 6.2500e-05
Epoch 46/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 79s 14ms/step - auPR: 0.2026 - auROC: 0.8007 - categorical_accuracy: 0.1091 - loss: 0.1374 - val_auPR: 0.2276 - val_auROC: 0.8169 - val_categorical_accuracy: 0.1233 - val_loss: 0.1350 - learning_rate: 6.2500e-05
Epoch 47/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 81s 15ms/step - auPR: 0.2025 - auROC: 0.8008 - categorical_accuracy: 0.1091 - loss: 0.1373 - val_auPR: 0.2273 - val_auROC: 0.8166 - val_categorical_accuracy: 0.1208 - val_loss: 0.1350 - learning_rate: 6.2500e-05
Epoch 48/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 82s 15ms/step - auPR: 0.2060 - auROC: 0.8031 - categorical_accuracy: 0.1107 - loss: 0.1368 - val_auPR: 0.2302 - val_auROC: 0.8180 - val_categorical_accuracy: 0.1244 - val_loss: 0.1345 - learning_rate: 1.5625e-05
Epoch 49/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.2070 - auROC: 0.8039 - categorical_accuracy: 0.1115 - loss: 0.1366 - val_auPR: 0.2306 - val_auROC: 0.8185 - val_categorical_accuracy: 0.1240 - val_loss: 0.1343 - learning_rate: 1.5625e-05
Epoch 50/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 86s 15ms/step - auPR: 0.2073 - auROC: 0.8042 - categorical_accuracy: 0.1118 - loss: 0.1365 - val_auPR: 0.2311 - val_auROC: 0.8188 - val_categorical_accuracy: 0.1236 - val_loss: 0.1342 - learning_rate: 1.5625e-05
Epoch 51/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.2076 - auROC: 0.8043 - categorical_accuracy: 0.1121 - loss: 0.1364 - val_auPR: 0.2313 - val_auROC: 0.8185 - val_categorical_accuracy: 0.1238 - val_loss: 0.1342 - learning_rate: 1.5625e-05
Epoch 52/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.2079 - auROC: 0.8047 - categorical_accuracy: 0.1126 - loss: 0.1363 - val_auPR: 0.2318 - val_auROC: 0.8189 - val_categorical_accuracy: 0.1253 - val_loss: 0.1341 - learning_rate: 1.5625e-05
Epoch 53/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 73s 13ms/step - auPR: 0.2083 - auROC: 0.8049 - categorical_accuracy: 0.1122 - loss: 0.1362 - val_auPR: 0.2313 - val_auROC: 0.8186 - val_categorical_accuracy: 0.1255 - val_loss: 0.1341 - learning_rate: 1.5625e-05
Epoch 54/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 79s 14ms/step - auPR: 0.2084 - auROC: 0.8050 - categorical_accuracy: 0.1121 - loss: 0.1361 - val_auPR: 0.2317 - val_auROC: 0.8188 - val_categorical_accuracy: 0.1260 - val_loss: 0.1341 - learning_rate: 1.5625e-05
Epoch 55/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 79s 14ms/step - auPR: 0.2088 - auROC: 0.8051 - categorical_accuracy: 0.1123 - loss: 0.1361 - val_auPR: 0.2316 - val_auROC: 0.8190 - val_categorical_accuracy: 0.1263 - val_loss: 0.1340 - learning_rate: 1.5625e-05
Epoch 56/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 73s 13ms/step - auPR: 0.2091 - auROC: 0.8051 - categorical_accuracy: 0.1124 - loss: 0.1360 - val_auPR: 0.2318 - val_auROC: 0.8192 - val_categorical_accuracy: 0.1251 - val_loss: 0.1340 - learning_rate: 1.5625e-05
Epoch 57/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2092 - auROC: 0.8054 - categorical_accuracy: 0.1126 - loss: 0.1360 - val_auPR: 0.2317 - val_auROC: 0.8188 - val_categorical_accuracy: 0.1256 - val_loss: 0.1339 - learning_rate: 1.5625e-05
Epoch 58/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.2092 - auROC: 0.8052 - categorical_accuracy: 0.1123 - loss: 0.1360 - val_auPR: 0.2318 - val_auROC: 0.8187 - val_categorical_accuracy: 0.1252 - val_loss: 0.1339 - learning_rate: 1.5625e-05
Epoch 59/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 79s 14ms/step - auPR: 0.2093 - auROC: 0.8055 - categorical_accuracy: 0.1123 - loss: 0.1359 - val_auPR: 0.2321 - val_auROC: 0.8188 - val_categorical_accuracy: 0.1246 - val_loss: 0.1339 - learning_rate: 1.5625e-05
Epoch 60/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.2096 - auROC: 0.8055 - categorical_accuracy: 0.1124 - loss: 0.1359 - val_auPR: 0.2322 - val_auROC: 0.8190 - val_categorical_accuracy: 0.1245 - val_loss: 0.1338 - learning_rate: 1.5625e-05
Epoch 61/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 13ms/step - auPR: 0.2097 - auROC: 0.8057 - categorical_accuracy: 0.1127 - loss: 0.1358 - val_auPR: 0.2321 - val_auROC: 0.8187 - val_categorical_accuracy: 0.1256 - val_loss: 0.1339 - learning_rate: 1.5625e-05
Epoch 62/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 73s 13ms/step - auPR: 0.2099 - auROC: 0.8056 - categorical_accuracy: 0.1126 - loss: 0.1358 - val_auPR: 0.2319 - val_auROC: 0.8189 - val_categorical_accuracy: 0.1242 - val_loss: 0.1339 - learning_rate: 1.5625e-05
Epoch 63/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.2107 - auROC: 0.8065 - categorical_accuracy: 0.1132 - loss: 0.1356 - val_auPR: 0.2326 - val_auROC: 0.8194 - val_categorical_accuracy: 0.1254 - val_loss: 0.1337 - learning_rate: 3.9063e-06
Epoch 64/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.2112 - auROC: 0.8065 - categorical_accuracy: 0.1132 - loss: 0.1356 - val_auPR: 0.2327 - val_auROC: 0.8191 - val_categorical_accuracy: 0.1258 - val_loss: 0.1337 - learning_rate: 3.9063e-06
Epoch 65/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 13ms/step - auPR: 0.2109 - auROC: 0.8067 - categorical_accuracy: 0.1139 - loss: 0.1356 - val_auPR: 0.2327 - val_auROC: 0.8193 - val_categorical_accuracy: 0.1247 - val_loss: 0.1337 - learning_rate: 3.9063e-06
Epoch 66/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.2114 - auROC: 0.8068 - categorical_accuracy: 0.1132 - loss: 0.1355 - val_auPR: 0.2325 - val_auROC: 0.8193 - val_categorical_accuracy: 0.1249 - val_loss: 0.1337 - learning_rate: 3.9063e-06
Epoch 67/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2111 - auROC: 0.8070 - categorical_accuracy: 0.1135 - loss: 0.1355 - val_auPR: 0.2327 - val_auROC: 0.8193 - val_categorical_accuracy: 0.1266 - val_loss: 0.1337 - learning_rate: 3.9063e-06
Epoch 68/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.2116 - auROC: 0.8069 - categorical_accuracy: 0.1138 - loss: 0.1355 - val_auPR: 0.2328 - val_auROC: 0.8194 - val_categorical_accuracy: 0.1268 - val_loss: 0.1336 - learning_rate: 3.9063e-06
Epoch 69/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 72s 13ms/step - auPR: 0.2120 - auROC: 0.8071 - categorical_accuracy: 0.1143 - loss: 0.1354 - val_auPR: 0.2327 - val_auROC: 0.8194 - val_categorical_accuracy: 0.1247 - val_loss: 0.1337 - learning_rate: 1.0000e-06
Epoch 70/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2118 - auROC: 0.8070 - categorical_accuracy: 0.1140 - loss: 0.1355 - val_auPR: 0.2331 - val_auROC: 0.8196 - val_categorical_accuracy: 0.1267 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 71/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.2116 - auROC: 0.8071 - categorical_accuracy: 0.1136 - loss: 0.1354 - val_auPR: 0.2328 - val_auROC: 0.8192 - val_categorical_accuracy: 0.1255 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 72/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2118 - auROC: 0.8072 - categorical_accuracy: 0.1140 - loss: 0.1354 - val_auPR: 0.2328 - val_auROC: 0.8194 - val_categorical_accuracy: 0.1254 - val_loss: 0.1337 - learning_rate: 1.0000e-06
Epoch 73/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 81s 15ms/step - auPR: 0.2118 - auROC: 0.8072 - categorical_accuracy: 0.1141 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8196 - val_categorical_accuracy: 0.1268 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 74/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.2120 - auROC: 0.8072 - categorical_accuracy: 0.1141 - loss: 0.1354 - val_auPR: 0.2328 - val_auROC: 0.8195 - val_categorical_accuracy: 0.1256 - val_loss: 0.1337 - learning_rate: 1.0000e-06
Epoch 75/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.2119 - auROC: 0.8073 - categorical_accuracy: 0.1140 - loss: 0.1354 - val_auPR: 0.2329 - val_auROC: 0.8193 - val_categorical_accuracy: 0.1261 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 76/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.2119 - auROC: 0.8072 - categorical_accuracy: 0.1141 - loss: 0.1354 - val_auPR: 0.2331 - val_auROC: 0.8196 - val_categorical_accuracy: 0.1262 - val_loss: 0.1335 - learning_rate: 1.0000e-06
Epoch 77/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 73s 13ms/step - auPR: 0.2118 - auROC: 0.8072 - categorical_accuracy: 0.1139 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8197 - val_categorical_accuracy: 0.1259 - val_loss: 0.1335 - learning_rate: 1.0000e-06
Epoch 78/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 74s 13ms/step - auPR: 0.2123 - auROC: 0.8074 - categorical_accuracy: 0.1142 - loss: 0.1354 - val_auPR: 0.2329 - val_auROC: 0.8196 - val_categorical_accuracy: 0.1259 - val_loss: 0.1337 - learning_rate: 1.0000e-06
Epoch 79/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 79s 14ms/step - auPR: 0.2120 - auROC: 0.8073 - categorical_accuracy: 0.1138 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8196 - val_categorical_accuracy: 0.1265 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 80/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2120 - auROC: 0.8073 - categorical_accuracy: 0.1140 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8195 - val_categorical_accuracy: 0.1265 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 81/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 77s 14ms/step - auPR: 0.2122 - auROC: 0.8075 - categorical_accuracy: 0.1141 - loss: 0.1354 - val_auPR: 0.2331 - val_auROC: 0.8197 - val_categorical_accuracy: 0.1253 - val_loss: 0.1335 - learning_rate: 1.0000e-06
Epoch 82/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 78s 14ms/step - auPR: 0.2118 - auROC: 0.8073 - categorical_accuracy: 0.1141 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8195 - val_categorical_accuracy: 0.1258 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 83/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 76s 14ms/step - auPR: 0.2120 - auROC: 0.8072 - categorical_accuracy: 0.1137 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8194 - val_categorical_accuracy: 0.1264 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 84/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2121 - auROC: 0.8074 - categorical_accuracy: 0.1138 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8197 - val_categorical_accuracy: 0.1262 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 85/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2121 - auROC: 0.8074 - categorical_accuracy: 0.1143 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8196 - val_categorical_accuracy: 0.1269 - val_loss: 0.1336 - learning_rate: 1.0000e-06
Epoch 86/100
5532/5532 ━━━━━━━━━━━━━━━━━━━━ 75s 14ms/step - auPR: 0.2121 - auROC: 0.8072 - categorical_accuracy: 0.1139 - loss: 0.1354 - val_auPR: 0.2330 - val_auROC: 0.8195 - val_categorical_accuracy: 0.1259 - val_loss: 0.1336 - learning_rate: 1.0000e-06

Evaluation and prediction#

Evaluation and prediction follow the same workflow as peak regression.

The next steps you could take are to:

  1. Evaluate the model on the test set.

  2. Predict topic probabilities for a given sequence or region.

  3. Run tfmodisco to find motifs associated with each topic.

  4. Generate synthetic sequences for each topic using in silico evolution.

  5. Plot contribution scores per topic for interesting regions or sequences.
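As a refresher on the headline metric from the training logs above: auROC for a topic is the probability that a randomly chosen positive region scores higher than a randomly chosen negative one. A tiny pairwise implementation (illustrative; in practice the Keras metric from the config computes this for you):

```python
def auroc(y_true, y_score):
    """Pairwise auROC: fraction of (positive, negative) pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```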

Refer to the introduction notebook for more details.