crested.tl.data.SequenceLoader

class crested.tl.data.SequenceLoader(genome, in_memory=False, always_reverse_complement=False, deterministic_shift=False, max_stochastic_shift=0, regions=None)

Load sequences from a genome file.

Options for reverse complementing and stochastic shifting are available.

Parameters:
  • genome (Genome) – Genome instance.

  • in_memory (bool (default: False)) – If True, the sequences of supplied regions will be loaded into memory.

  • always_reverse_complement (bool (default: False)) – If True, all sequences will be augmented with their reverse complement. Doubles the dataset size.

  • max_stochastic_shift (int (default: 0)) – Maximum number of base pairs by which each sequence may be randomly shifted.

  • regions (list[str] | None (default: None)) – List of regions to load into memory. Required if in_memory is True.
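
The two augmentation options can be illustrated with a small standalone sketch. This is not CREsted's implementation: the helper names (reverse_complement, augment_with_reverse_complement, draw_shift) are hypothetical, and both the ordering of the augmented sequences and the uniform draw from [-max, +max] are assumptions made for illustration.

```python
import random

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def augment_with_reverse_complement(seqs: list[str]) -> list[str]:
    """Append each sequence's reverse complement, doubling the list.

    Hypothetical helper: forward sequences first, then reverse complements.
    The ordering used by the real loader is not documented here.
    """
    return list(seqs) + [reverse_complement(s) for s in seqs]


def draw_shift(max_stochastic_shift: int, rng: random.Random) -> int:
    """Draw a shift in base pairs, assumed uniform over [-max, +max]."""
    return rng.randint(-max_stochastic_shift, max_stochastic_shift)


seqs = ["ACGTT", "GGCAN"]
augmented = augment_with_reverse_complement(seqs)

rng = random.Random(0)  # seeded so the draws are reproducible
shifts = [draw_shift(3, rng) for _ in range(100)]
```

With max_stochastic_shift left at 0, draw_shift always returns 0, which corresponds to no shifting.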

Methods table

get_sequence(region[, stranded, shift])

Get sequence for a region, strand, and shift from memory or fasta.

Methods

SequenceLoader.get_sequence(region, stranded=None, shift=0)

Get sequence for a region, strand, and shift from memory or fasta.

If the region does not specify a strand, the positive strand is assumed.

Parameters:
  • region (str) – Region to get the sequence for. Either (chr:start-end) or (chr:start-end:strand).

  • stranded (bool | None (default: None)) – Whether the input data is stranded. Default (None) infers from sequence (at a computational cost). If not stranded, positive strand is assumed.

  • shift (int (default: 0)) – Shift of the sequence within the extended sequence, for use with the stochastic shift mechanism.

Return type:

str

Returns:

The DNA sequence, as a string.
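
To make the region format and the strand and shift handling concrete, here is a minimal standalone sketch. It is not the library's code: parse_region and toy_get_sequence are hypothetical helpers, the genome is a plain dict of strings, and the 0-based half-open coordinates and the order of operations (shift first, then reverse complement) are assumptions.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_region(region: str) -> tuple[str, int, int, str]:
    """Split 'chr:start-end' or 'chr:start-end:strand' into its parts.

    A missing strand defaults to '+', mirroring the documented behavior.
    """
    parts = region.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"Unrecognized region format: {region!r}")
    chrom = parts[0]
    start_str, end_str = parts[1].split("-")
    strand = parts[2] if len(parts) == 3 else "+"
    if strand not in ("+", "-"):
        raise ValueError(f"Unrecognized strand: {strand!r}")
    return chrom, int(start_str), int(end_str), strand


def toy_get_sequence(genome: dict[str, str], region: str, shift: int = 0) -> str:
    """Fetch a (possibly shifted) window and reverse complement '-' regions.

    Assumes 0-based, half-open coordinates on an in-memory toy genome.
    """
    chrom, start, end, strand = parse_region(region)
    lo, hi = start + shift, end + shift
    if lo < 0 or hi > len(genome[chrom]):
        raise ValueError("Shifted window falls outside the chromosome")
    seq = genome[chrom][lo:hi]
    return reverse_complement(seq) if strand == "-" else seq


toy_genome = {"chr1": "ACGGTCATTG"}
forward = toy_get_sequence(toy_genome, "chr1:2-6")
reverse = toy_get_sequence(toy_genome, "chr1:2-6:-")
shifted = toy_get_sequence(toy_genome, "chr1:2-6", shift=1)
```

In this toy setup the unstranded and "+" forms of a region return the same string, and a nonzero shift moves the window along the chromosome without changing its length.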