crested.tl.data.SequenceLoader
- class crested.tl.data.SequenceLoader(genome, in_memory=False, always_reverse_complement=False, deterministic_shift=False, max_stochastic_shift=0, regions=None)
Load sequences from a genome file.
Options for reverse complementing and stochastic shifting are available.
- Parameters:
genome (Genome) – Genome instance.
in_memory (bool, default: False) – If True, the sequences of the supplied regions are loaded into memory.
always_reverse_complement (bool, default: False) – If True, every sequence is augmented with its reverse complement. Doubles the dataset size.
max_stochastic_shift (int, default: 0) – Maximum stochastic shift (in base pairs) to apply randomly to each sequence.
regions (list[str] | None, default: None) – List of regions to load into memory. Required if in_memory is True.
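To make the effect of always_reverse_complement concrete, here is a minimal standalone sketch of reverse-complement augmentation. It does not use or reproduce crested's implementation: the helper names `reverse_complement` and `augment_with_reverse_complement` are hypothetical, and the ordering of the augmented sequences is an assumption made for illustration only.

```python
# Hypothetical helpers for illustration; not part of crested's API.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def augment_with_reverse_complement(seqs: list[str]) -> list[str]:
    """Return each sequence followed by its reverse complement."""
    augmented = []
    for seq in seqs:
        augmented.append(seq)
        augmented.append(reverse_complement(seq))
    return augmented


augmented = augment_with_reverse_complement(["ACGTT", "GGCA"])
print(len(augmented))  # twice the number of input sequences
print(augmented)
```

Each input sequence contributes two entries, which is why the option doubles the dataset size.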
Methods table
| get_sequence(region, stranded=None, shift=0) | Get sequence for a region, strand, and shift from memory or fasta. |
Methods
- SequenceLoader.get_sequence(region, stranded=None, shift=0)
Get sequence for a region, strand, and shift from memory or FASTA.
If the region does not include a strand, the positive strand is assumed.
- Parameters:
region (str) – Region to get the sequence for. Either (chr:start-end) or (chr:start-end:strand).
stranded (bool | None, default: None) – Whether the input data is stranded. Default (None) infers from sequence (at a computational cost). If not stranded, positive strand is assumed.
shift (int, default: 0) – Shift of the sequence within the extended sequence, for use with the stochastic shift mechanism.
- Return type:
str
- Returns:
The DNA sequence, as a string.
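The region formats and the shift argument described above can be illustrated with a small standalone sketch. This is not crested's code: `TOY_GENOME`, `parse_region`, and `toy_get_sequence` are hypothetical names, the coordinates are assumed to be 0-based and half-open, and the shift is assumed to be applied in genomic coordinates before any reverse complementing. The real loader may differ on each of these points.

```python
import re

# Toy stand-in for a genome: chromosome name -> sequence (made-up data).
TOY_GENOME = {"chr1": "AAAACCCCGGGGTTTT"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
_REGION_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)


def parse_region(region: str) -> tuple[str, int, int, str]:
    """Split 'chr:start-end' or 'chr:start-end:strand' into its parts."""
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError(f"Cannot parse region: {region!r}")
    # A missing strand defaults to the positive strand.
    strand = match["strand"] or "+"
    return match["chrom"], int(match["start"]), int(match["end"]), strand


def toy_get_sequence(region: str, shift: int = 0) -> str:
    """Fetch a (possibly shifted) region from TOY_GENOME as a string."""
    chrom, start, end, strand = parse_region(region)
    start, end = start + shift, end + shift
    chrom_seq = TOY_GENOME[chrom]
    if start < 0 or end > len(chrom_seq):
        raise ValueError("Shifted region falls outside the chromosome")
    seq = chrom_seq[start:end]
    if strand == "-":
        # Negative strand: return the reverse complement.
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


print(toy_get_sequence("chr1:4-8"))           # unstranded, positive strand assumed
print(toy_get_sequence("chr1:4-8:-"))         # explicit negative strand
print(toy_get_sequence("chr1:4-8", shift=2))  # window moved 2 bp downstream
```

In this sketch a shift simply moves the window along the chromosome; a loader that preloads sequences would need to store each region extended by the maximum shift on both sides so that shifted windows stay within the stored string.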