crested.pp.normalize_peaks

crested.pp.normalize_peaks(adata, peak_threshold=0, gini_std_threshold=1.0, top_k_percent=0.01, inplace=True)

Normalize adata.X based on the variability of the top values per cell type.

This function applies a normalization factor to each cell type, focusing on regions with the most significant peaks above a defined threshold and considering the variability within those peaks. It is intended only for continuous .X data. If inplace=True, the input AnnData.X is modified in place.
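The idea of a per-cell-type normalization factor can be illustrated with a small NumPy sketch. This is not CREsted's implementation: the choice of summarizing each cell type by the mean of its top values, the rule of scaling every cell type up to the largest summary, and the name `top_n` are all assumptions made for illustration.

```python
import numpy as np

# Toy matrix in the (celltypes, regions) layout described above.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.5, 1.0, 1.5, 2.0],
])

# Assumption: summarize each cell type by the mean of its top_n values.
top_n = 2
top_means = np.sort(X, axis=1)[:, -top_n:].mean(axis=1)

# Assumption: one weight per cell type, scaling it to the largest summary.
weights = top_means.max() / top_means  # -> [2., 1., 4.]

# Apply each weight to its row (cell type).
X_normalized = X * weights[:, None]
```

In this toy case the three cell types differ only by a constant factor, so after scaling their rows become identical.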

Parameters:
  • adata (AnnData) – The AnnData object containing the matrix (celltypes, regions) to be normalized.

  • peak_threshold (int (default: 0)) – The minimum value for a peak to be considered significant for the Gini score calculation.

  • gini_std_threshold (float (default: 1.0)) – The number of standard deviations below the mean Gini score used to determine the threshold for low variability.

  • top_k_percent (float (default: 0.01)) – The percentage (expressed as a fraction) of top values to consider for Gini score calculation.

  • inplace (bool (default: True)) – Perform computation and modify adata in-place or return a resulting copy of the adata instead.
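To make the three numeric parameters more concrete, the sketch below computes the quantities they describe: a Gini score, a cutoff placed `gini_std_threshold` standard deviations below the mean score, and the top `top_k_percent` fraction of values above `peak_threshold`. It is an illustration under stated assumptions rather than the library's code; the helper names `gini`, `top_k_indices`, and `low_variability_cutoff` are hypothetical, and the way they are combined at the end is only one plausible reading of the parameter descriptions.

```python
import numpy as np


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def top_k_indices(column, peak_threshold=0, top_k_percent=0.01):
    """Indices of the top fraction of values that exceed peak_threshold."""
    column = np.asarray(column, dtype=float)
    candidates = np.flatnonzero(column > peak_threshold)
    k = int(len(candidates) * top_k_percent)
    # Sort the surviving candidates by value, largest first.
    ordered = candidates[np.argsort(column[candidates])[::-1]]
    return ordered[:k]


def low_variability_cutoff(gini_scores, gini_std_threshold=1.0):
    """Mean Gini score minus gini_std_threshold standard deviations."""
    scores = np.asarray(gini_scores, dtype=float)
    return float(scores.mean() - gini_std_threshold * scores.std())


# Toy (celltypes, regions) matrix with random positive values.
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(3, 200))

# Assumption: one Gini score per region, computed across cell types.
region_gini = np.array([gini(X[:, j]) for j in range(X.shape[1])])
cutoff = low_variability_cutoff(region_gini, gini_std_threshold=1.0)

# Top 5% of regions for the first cell type, kept only if their Gini is low.
top = top_k_indices(X[0], peak_threshold=0, top_k_percent=0.05)
low_variability_top = top[region_gini[top] < cutoff]
```

Raising `gini_std_threshold` lowers the cutoff, so fewer regions count as low-variability; raising `top_k_percent` widens the pool of top regions considered per cell type.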

Return type:

DataFrame | AnnData

Returns:

If inplace=True (default), modifies the AnnData in place, with the normalized matrix in .X and the normalization weights saved to adata.obsm['weights'], and returns the filtered .var of the significant peaks as a DataFrame. If inplace=False, returns (adata, filtered_df): a modified copy of the AnnData object, along with the filtered .var of the significant peaks as a DataFrame.
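The two return conventions can be handled as in the sketch below. The wrapper `normalize_both_ways` is a hypothetical helper written for illustration; it assumes CREsted is installed and that `adata` is an AnnData holding continuous values in .X, laid out as (celltypes, regions).

```python
def normalize_both_ways(adata):
    """Call crested.pp.normalize_peaks in copy mode, then in in-place mode."""
    import crested  # imported lazily; assumes CREsted is installed

    # inplace=False: a tuple of (modified copy, filtered DataFrame) is returned.
    normalized, filtered_df = crested.pp.normalize_peaks(adata, inplace=False)

    # inplace=True (default): adata is modified and only the DataFrame is returned.
    filtered_df_inplace = crested.pp.normalize_peaks(adata)

    # Per the Returns description, the weights are stored on the modified object.
    weights = adata.obsm["weights"]

    return normalized, filtered_df, filtered_df_inplace, weights
```

Unpacking the result of an in-place call as a tuple, or forgetting to unpack the result of a copy-mode call, is an easy mistake to make because the return shape depends on `inplace`.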

Example

>>> import crested
>>> crested.pp.normalize_peaks(
...     adata,
...     peak_threshold=0,
...     gini_std_threshold=2.0,
...     top_k_percent=0.05,
... )