tl.modisco

TF-MoDISco (utility) functions.

Requires the modisco-lite and memelite packages to be installed. Install with: pip install crested[motif]

Functions

calculate_mean_expression_per_cell_type(...)

Read an AnnData object from an H5AD file and calculate the mean gene expression per cell type subclass.
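The averaging step can be illustrated without AnnData. The sketch below is not the crested implementation: `mean_expression_per_group`, the toy matrix, and the subclass labels are all hypothetical, and plain Python lists stand in for the expression matrix and the per-cell annotation column.

```python
from collections import defaultdict


def mean_expression_per_group(expression_rows, cell_labels):
    """Average each gene's expression over the cells that share a label.

    expression_rows: one list per cell, holding one value per gene.
    cell_labels: one group label per cell (hypothetical subclass names).
    """
    if len(expression_rows) != len(cell_labels):
        raise ValueError("need exactly one label per cell")

    sums = {}
    counts = defaultdict(int)
    for row, label in zip(expression_rows, cell_labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for gene_idx, value in enumerate(row):
            sums[label][gene_idx] += value
        counts[label] += 1

    # Divide the per-gene sums by the number of cells in each group.
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


# Toy data (assumption): 4 cells x 2 genes, two made-up subclasses.
toy_expression = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 4.0]]
toy_labels = ["Astro", "Astro", "Oligo", "Oligo"]
mean_by_subclass = mean_expression_per_group(toy_expression, toy_labels)
```

With real data the same grouping would normally be done on the AnnData expression matrix and one of its observation columns; which column holds the subclass is dataset-specific.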

calculate_similarity_matrix(all_patterns)

Calculate the similarity matrix for the given patterns.

calculate_tomtom_similarity_per_pattern(...)

Compute pairwise similarity between all trimmed patterns across matched HDF5 files using TOMTOM.

create_pattern_matrix(classes, all_patterns)

Create a pattern matrix from classes and patterns, with optional normalization.

create_pattern_tf_dict(pattern_match_dict, ...)

Create a dictionary mapping patterns to their associated transcription factors (TFs) and other metadata.

create_tf_ct_matrix(pattern_tf_dict, ...[, ...])

Create a tensor (matrix) of transcription factor (TF) expression and cell type contributions.

find_pattern(pattern_id, pattern_dict)

Find the index of a pattern by its ID.

find_pattern_matches(all_patterns, html_paths)

Find and filter matches between the modisco-lite patterns and the motif database, using the corresponding HTML paths.

generate_html_paths(all_patterns, classes, ...)

Generate HTML paths for each pattern in the filtered array.

generate_nucleotide_sequences(all_patterns)

Generate nucleotide sequences from pattern data.
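One common way to turn a pattern into a nucleotide string is to take the most probable base at each position of its probability matrix. The sketch below illustrates that idea only; it does not claim to reproduce how this function derives its sequences. The helper name `consensus_sequence` and the A, C, G, T column order are assumptions.

```python
NUCLEOTIDES = "ACGT"  # assumed column order of each matrix row


def consensus_sequence(ppm):
    """Return the most probable nucleotide at every position of a PPM.

    ppm: list of rows, each row holding four probabilities (A, C, G, T).
    Ties are resolved in favour of the first column with the maximum value.
    """
    letters = []
    for row in ppm:
        best_index = max(range(len(NUCLEOTIDES)), key=lambda i: row[i])
        letters.append(NUCLEOTIDES[best_index])
    return "".join(letters)


# Toy matrix (assumption): three positions favouring A, G and T.
toy_ppm = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
toy_consensus = consensus_sequence(toy_ppm)
```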

get_pwms_from_modisco_file(modisco_file[, ...])

Extract PPMs (Position Probability Matrices) from a Modisco HDF5 results file.

match_h5_files_to_classes(contribution_dir, ...)

Match .h5 files in a given directory with a list of class names and return a dictionary mapping.
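The matching idea can be sketched with the standard library alone. Everything below is an assumption made for illustration: the helper name `match_files_to_classes`, the rule that a file belongs to a class when its stem equals the class name or starts with the class name plus an underscore, and the example file names. The actual naming convention expected by crested is not stated in this summary.

```python
import tempfile
from pathlib import Path


def match_files_to_classes(directory, class_names):
    """Map each class name to the first matching .h5 file, or None.

    Assumed rule: the file stem equals the class name or starts with
    "<class name>_". Class names that are prefixes of one another
    (e.g. "L2" and "L2_3") would need a stricter rule.
    """
    h5_files = sorted(Path(directory).glob("*.h5"))
    matched = {}
    for name in class_names:
        hits = [
            path for path in h5_files
            if path.stem == name or path.stem.startswith(name + "_")
        ]
        matched[name] = str(hits[0]) if hits else None
    return matched


# Demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp_dir:
    for filename in ("Astro_modisco_results.h5", "Oligo_modisco_results.h5", "notes.txt"):
        (Path(tmp_dir) / filename).touch()
    demo_matches = match_files_to_classes(tmp_dir, ["Astro", "Oligo", "Micro"])
    # Keep only the file names so the result does not depend on the temp path.
    demo_names = {
        cls: (Path(path).name if path else None)
        for cls, path in demo_matches.items()
    }
```

Classes without a file map to `None` here, which lets downstream code skip them explicitly instead of failing on a missing key.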

pattern_similarity(all_patterns, idx1, idx2)

Compute the similarity between two patterns.

process_patterns(matched_files[, ...])

Process genomic patterns from matched HDF5 files, trim based on information content, and match to known patterns.
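Trimming on information content can be sketched as follows. This is a minimal illustration, not the crested implementation: it assumes a uniform background over four nucleotides, so the per-position information content is 2 + Σ p·log2(p) bits, and it removes low-information positions from the flanks only. The helper names and the 0.5-bit threshold are hypothetical.

```python
import math


def information_content(ppm_row):
    """Information content in bits of one PPM position (uniform background)."""
    # Terms with p == 0 contribute nothing, so they are skipped.
    return 2.0 + sum(p * math.log2(p) for p in ppm_row if p > 0)


def trim_by_information_content(ppm, min_ic=0.5):
    """Drop flanking positions whose information content is below min_ic.

    Positions between the first and last informative position are kept,
    even if they are individually below the threshold.
    """
    informative = [
        idx for idx, row in enumerate(ppm)
        if information_content(row) >= min_ic
    ]
    if not informative:
        return []
    return ppm[informative[0]: informative[-1] + 1]


# Toy pattern (assumption): uninformative flanks around a 3-position core.
uniform = [0.25, 0.25, 0.25, 0.25]
toy_pattern = [
    uniform,
    [0.97, 0.01, 0.01, 0.01],
    uniform,
    [0.01, 0.01, 0.97, 0.01],
    uniform,
]
trimmed_pattern = trim_by_information_content(toy_pattern)
```

In the toy pattern the two outer uniform positions are removed, while the uniform position inside the core is retained because only the flanks are trimmed.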

read_motif_to_tf_file(file_path)

Read a TSV file mapping motifs to transcription factors (TFs) into a DataFrame.

tfmodisco([contrib_dir, class_names, ...])

Run TF-MoDISco on one-hot encoded sequences and contribution scores stored in .npz files.