API

treasmo.tl module

treasmo.tl.feature_sparsity(mudata, group_by={})

Function to add feature sparsity information to MuData.var.

The value is defined as the number of non-zero values divided by the number of cells, computed for each feature.

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

group_by: dict

If provided, calculate per-group feature sparsity for each modality.

Example: {'rna': 'cell_type', 'atac': 'cell_type'}

Returns:
MuData

with var['Frac.all'] added, plus var['Frac.GroupName'] for each group when group_by is provided
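
The definition above can be illustrated with a minimal sketch that computes the fraction of non-zero values per feature, overall and per group. It works on plain Python lists rather than a MuData object; the helper name nonzero_fraction and the toy data are hypothetical and are not part of treasmo.

```python
def nonzero_fraction(matrix, groups=None):
    """Fraction of cells with a non-zero value, per feature (column).

    matrix: list of rows (cells), each a list of feature values.
    groups: optional list of group labels, one per cell.
    Returns a dict with "Frac.all" and, if groups is given, "Frac.<group>".
    """
    n_features = len(matrix[0])

    def frac(rows):
        return [sum(1 for r in rows if r[j] != 0) / len(rows)
                for j in range(n_features)]

    result = {"Frac.all": frac(matrix)}
    if groups is not None:
        for g in sorted(set(groups)):
            rows = [r for r, label in zip(matrix, groups) if label == g]
            result[f"Frac.{g}"] = frac(rows)
    return result


# Toy data (hypothetical): 4 cells x 2 features, two cell types.
counts = [[0, 3], [2, 0], [0, 1], [5, 4]]
cell_type = ["B", "B", "T", "T"]
fractions = nonzero_fraction(counts, cell_type)
print(fractions)
```

Note that a higher value means a denser (less sparse) feature, since the quantity counts non-zero entries.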

treasmo.tl.get_gloc_from_atac_data(peaks, split_symbol)

Helper function to get the genomic locations (including the midpoint) of peaks.

Note: this function does not need to be called directly by users.

treasmo.tl.nearby_peaks(g_array, min_op=1)

Helper function to get regions that overlap the gene body.

Note: this function does not need to be called directly by users.

Parameters:
min_op: int

The peak and gene must overlap by at least min_op to be considered overlapping.

treasmo.tl.peaks_within_distance(genes, peaks, upstream, downstream, ref_gtf_fn, no_intersect=True, id_col='GeneSymbol', split_symbol=['-', '-'])

Function to annotate genes with nearby peaks

Parameters:
genes: List, numpy.array

gene list to be annotated

peaks: List, numpy.array

candidate peaks list

upstream: int

include peaks N bp upstream of the gene TSS

downstream: int

include peaks N bp downstream of the TES

ref_gtf_fn: str

GTF-format file containing gene location information; see the examples at ChaozhongLiu/scGREAT:

  • Homo_sapiens.GRCh38.104.GeneLoc.Tab.txt

  • Mus_musculus.GRCm38.100.GeneLoc.Tab.txt

no_intersect: bool

whether to remove a candidate peak that lies within another gene's body

id_col: str

column name in ref_gtf_fn that indicates gene ID

split_symbol: List[str, str]

how the peak location ID is formatted:

'chr1-12345-23456': split_symbol=['-', '-']

'chr1:12345-23456': split_symbol=[':', '-']

Returns:
DataFrame

contains gene annotation information
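
The split_symbol convention and the upstream/downstream window described above can be sketched as follows. This is only an illustration: the helper names parse_peak and peak_in_window are hypothetical, the gene is assumed to lie on the forward strand, and a peak is counted when it overlaps the window. Whether treasmo uses overlap or the peak midpoint is not stated here.

```python
def parse_peak(peak_id, split_symbol=("-", "-")):
    """Split a peak ID into (chrom, start, end) using the two separators.

    Splitting from the right keeps chromosome names intact even if they
    happen to contain the first separator.
    """
    rest, end = peak_id.rsplit(split_symbol[1], 1)
    chrom, start = rest.rsplit(split_symbol[0], 1)
    return chrom, int(start), int(end)


def peak_in_window(peak_id, gene_chrom, tss, tes, upstream, downstream,
                   split_symbol=("-", "-")):
    """True if the peak overlaps [tss - upstream, tes + downstream].

    Assumption: forward-strand gene, so tss <= tes.
    """
    chrom, start, end = parse_peak(peak_id, split_symbol)
    if chrom != gene_chrom:
        return False
    return start <= tes + downstream and end >= tss - upstream


print(parse_peak("chr1-12345-23456"))
print(parse_peak("chr1:12345-23456", (":", "-")))
print(peak_in_window("chr1-900-1100", "chr1", 2000, 5000, 1000, 500))
```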

treasmo.tl.TFBS_match(genes, peaks, ref_fn, min_overlap=1, split_symbol=['-', '-'])

Function to annotate TFs with their binding site regions

Parameters:
genes: List, numpy.array

gene list to be annotated

peaks: List, numpy.array

candidate peaks list

ref_fn: str

BED-format file containing gene binding site location information;

see the example below from the JASPAR TFBS genome tracks (https://jaspar.genereg.net/genome-tracks/):

chr1    280     298     AGL3    821     -
chr1    309     327     AGL3    823     +
chr1    309     327     AGL3    882     -
chr1    1577    1595    AGL3    823     +

min_overlap: int

the minimum number of overlapping base pairs between a candidate peak and a TFBS

split_symbol: List[str, str]

how the peak location ID is formatted:

'chr1-12345-23456': split_symbol=['-', '-']

'chr1:12345-23456': split_symbol=[':', '-']

Returns:
DataFrame

contains TF annotation information
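
To make the min_overlap rule concrete, here is a minimal sketch that matches candidate peaks against BED rows like the ones shown above. The function names overlap_bp and match_tfbs are hypothetical, intervals are treated as half-open as in the BED convention, and the peak IDs are assumed to use the default '-' separators. It is not treasmo's implementation.

```python
def overlap_bp(start_a, end_a, start_b, end_b):
    """Number of overlapping base pairs between two half-open intervals."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def match_tfbs(peaks, bed_lines, min_overlap=1):
    """Return sorted, de-duplicated (tf, peak) pairs with enough overlap."""
    pairs = set()
    for line in bed_lines:
        chrom, start, end, tf = line.split()[:4]
        for peak in peaks:
            p_chrom, p_start, p_end = peak.rsplit("-", 2)
            if p_chrom != chrom:
                continue
            shared = overlap_bp(int(start), int(end), int(p_start), int(p_end))
            if shared >= min_overlap:
                pairs.add((tf, peak))
    return sorted(pairs)


bed_lines = [
    "chr1    280     298     AGL3    821     -",
    "chr1    309     327     AGL3    823     +",
    "chr1    309     327     AGL3    882     -",
    "chr1    1577    1595    AGL3    823     +",
]
peaks = ["chr1-250-300", "chr1-1000-1500"]
print(match_tfbs(peaks, bed_lines))
```

A nested loop is used for readability; for genome-scale inputs an interval index would be the usual choice.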

treasmo.tl.PFM2Motif(file_name, out_file, detection_threshold=0)

Function to convert PFM (Position Frequency Matrix) into Homer Motif file

Parameters:
file_name: str

Path to PFM file, see example at https://jaspar.elixir.no/docs/#jaspar-matrix-formats

out_file: str

Path to output file

detection_threshold: int, float

See Homer explanation at http://homer.ucsd.edu/homer/motif/creatingCustomMotifs.html
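
As a rough sketch of the conversion, the snippet below turns a raw PFM (a header line followed by four rows of counts for A, C, G and T) into the Homer motif layout: a header line with the consensus, the motif name and the detection threshold, followed by one row of A/C/G/T probabilities per position. The function name pfm_to_homer and the toy matrix are hypothetical, and the bracketed JASPAR variant of the format is not handled.

```python
def pfm_to_homer(pfm_text, detection_threshold=0):
    """Convert a raw PFM string into Homer motif text."""
    lines = [ln.strip() for ln in pfm_text.strip().splitlines() if ln.strip()]
    name = lines[0].lstrip(">").strip()
    # Rows are assumed to be ordered A, C, G, T.
    rows = [[float(v) for v in ln.split()] for ln in lines[1:5]]

    consensus = ""
    body = []
    for column in zip(*rows):  # one column per motif position
        total = sum(column)
        probs = [v / total for v in column]
        consensus += "ACGT"[probs.index(max(probs))]
        body.append("\t".join(f"{p:.3f}" for p in probs))

    header = f">{consensus}\t{name}\t{detection_threshold}"
    return "\n".join([header] + body) + "\n"


# Toy matrix (hypothetical ID and name), six positions.
toy_pfm = """>MA0000.0 toy
4 19 0 0 0 0
16 0 20 0 0 0
0 1 0 20 0 20
0 0 0 0 20 0
"""
motif_text = pfm_to_homer(toy_pfm)
print(motif_text)
```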

treasmo.tl.peak2HomerInput(peaks, out_file, filetype='peaks', split_symbol=['-', '-'])

Helper function to convert a peak list to an input format accepted by Homer

Parameters:
peaks: List, numpy.array

peaks list to convert

out_file: str

Path to output file

filetype: str

file type to generate, either 'peaks' or 'bed'

split_symbol: List[str, str]

how the peak location ID is formatted:

'chr1-12345-23456': split_symbol=['-', '-']

'chr1:12345-23456': split_symbol=[':', '-']

treasmo.tl.run_HOMER_motif(peaks, out_dir, prefix, ref_genome, homer_path=None, split_symbol=['-', '-'], size=200)

Function to run Homer from a Python script. It prepares the input file that Homer requires and writes the results to the specified directory.

Parameters:
peaks: List, numpy.array

peaks of interest

out_dir: str

output directory to save the results

prefix: str

prefix for all files; a folder named out_dir/homer_[prefix] will be created to hold all the results

ref_genome: str

reference genome name, e.g., ‘hg19’, ‘hg38’

homer_path: str

path to the Homer software; can be omitted if Homer has already been added to the PATH

split_symbol: List[str, str]

how the peak location ID is formatted:

'chr1-12345-23456': split_symbol=['-', '-']

'chr1:12345-23456': split_symbol=[':', '-']

size: int

Homer parameter, the size of the region used for motif finding

Returns:
DataFrame

Homer results saved in out_dir/homer_[prefix]

treasmo.tl.motif_summary(peak_file, homer_dir, motif_index, ref_genome, homer_path=None, size=200)

Function to extract the peaks related to motifs of interest.

Parameters:
peak_file: str

the out_dir/prefix.peaks.bed file generated by run_HOMER_motif()

homer_dir: str

the Homer output folder, i.e. out_dir/homer_prefix from run_HOMER_motif()

motif_index: int

index of the motif of interest in homer_dir/knownResults.html

ref_genome: str

reference genome name, e.g., ‘hg19’, ‘hg38’

homer_path: str

path to the Homer software; can be omitted if Homer has already been added to the PATH

size: int

Homer parameter, the size of the region used for motif finding. Please keep it the same as in run_HOMER_motif().

Returns:
DataFrame

contains the peak list and motif matching quality information; the motif-related peak information is also saved in homer_dir/

treasmo.core module

treasmo.core.Morans_I(mudata, mods=['rna', 'atac'], seed=1, max_RAM=16)

Function to calculate Moran’s I for all the features in multiome data

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

seed: int

random seed to make the results reproducible

max_RAM: int

maximum limitation of memory usage (GB)

Returns:
MuData

with var['Morans.I'] added
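
For reference, the textbook form of Moran's I for a single feature is I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), where z is the mean-centered feature and S0 is the sum of all weights. The sketch below implements that formula in plain Python on a toy neighbor graph. The function name morans_i and the toy weights are hypothetical, and treasmo's vectorized implementation may differ in details such as weight transformation.

```python
def morans_i(values, weights):
    """Moran's I of one feature given an n x n weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in z))


# Toy graph: 4 cells in a chain, each linked to its immediate neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain))    # smooth gradient, positive I
print(morans_i([1, -1, 1, -1], chain))  # alternating pattern, negative I
```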

treasmo.core.Global_L(mudata, pairsDf, mods=['rna', 'atac'], permutations=0, percent=0.1, seed=1, max_RAM=16)

Function to calculate the global L index (mean of correlation strength index) for all the pairs in multiome data

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

pairsDf: pandas.DataFrame

gene-peak pair DataFrame containing pairs to be calculated

prepared by the user or generated by treasmo.tl.peaks_within_distance / TFBS_match

permutations: int

Number of permutations for the significance test.

Default is 0, meaning no significance test. 999 is a good choice for most cases, but it may take a long time (hours) to finish, depending on the number of pairs.

percent: float

percentage of cells to shuffle during permutation.

In most cases, the default of 0.1 is a good choice.

seed: int

random seed to make the results reproducible

max_RAM: int

maximum limitation of memory usage (GB)

Returns:
DataFrame

pairsDf added with extra columns:

Global L results

QC metrics (feature sparsity)

treasmo.core.Local_L(mudata, genes, peaks, mods=['rna', 'atac'], rm_dropout=False, seed=1, max_RAM=16)

Function to calculate the single-cell gene-peak correlation strength index for all the pairs in multiome data

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

genes, peaks: List, numpy.array

one-to-one lists containing gene and peak names

rm_dropout: bool

set the correlation strength index to 0 where the feature value is 0 (dropout)

seed: int

random seed to make the results reproducible

max_RAM: int

maximum limitation of memory usage (GB)

Returns:
MuData

with the Local L matrix and gene-peak pair names added to mudata.uns
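
The relationship between the per-cell index and the global L can be sketched with Lee's (2001) local statistic, which this package adapts from pysal/esda: each cell's value is the product of the spatially lagged z-scores of the two features, and with row-standardized weights the global L is the mean of the local values. The function names below are hypothetical, the weights are assumed to be row-standardized, and any extra scaling applied by treasmo is not covered.

```python
import math


def local_l(x, y, weights):
    """Per-cell Lee's L for one (x, y) pair; weights must be row-standardized."""
    n = len(x)

    def zscore(values):
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        return [(v - mean) / std for v in values]

    zx, zy = zscore(x), zscore(y)
    lag_x = [sum(weights[i][j] * zx[j] for j in range(n)) for i in range(n)]
    lag_y = [sum(weights[i][j] * zy[j] for j in range(n)) for i in range(n)]
    return [a * b for a, b in zip(lag_x, lag_y)]


def global_l(x, y, weights):
    """Global L as the mean of the local values."""
    local = local_l(x, y, weights)
    return sum(local) / len(local)


gene = [1.0, 2.0, 3.0, 4.0]
peak = [2.0, 4.0, 6.0, 8.0]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
uniform = [[0.25] * 4 for _ in range(4)]

print(global_l(gene, peak, identity))  # reduces to Pearson's r without smoothing
print(global_l(gene, peak, uniform))   # complete smoothing removes all signal
```

With identity weights the statistic reduces to the ordinary Pearson correlation, which is a convenient sanity check.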

treasmo.core.Pearsonr(mudata, genes, peaks, mods=['rna', 'atac'], p_value=False, seed=1)

Function to calculate the Pearson correlation between genes and peaks

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

genes, peaks: List, numpy.array

one-to-one lists containing gene and peak names

p_value: bool

whether to perform significance testing

seed: int

random seed to make the results reproducible

Returns:
DataFrame

containing Pearson correlation r, p-value and FDR for all gene-peak pairs
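
The "one-to-one lists" convention means that genes[i] is paired with peaks[i]. A minimal sketch of that pairing, using a plain dictionary of per-cell values in place of a MuData object, is shown below. The helper names and the toy features are hypothetical, and p-values and FDR are omitted because they need a t-distribution that the standard library does not provide.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_pearson(data, genes, peaks):
    """Correlate genes[i] with peaks[i]; data maps feature name -> values."""
    return [(g, p, pearson_r(data[g], data[p])) for g, p in zip(genes, peaks)]


# Toy per-cell values (hypothetical feature names).
data = {
    "GeneA": [1, 2, 3, 4],
    "GeneB": [1, 2, 3, 4],
    "chr1-100-200": [2, 4, 6, 8],
    "chr1-500-600": [4, 3, 2, 1],
}
results = pairwise_pearson(data, ["GeneA", "GeneB"],
                           ["chr1-100-200", "chr1-500-600"])
print(results)
```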

treasmo.ds module

treasmo.ds.FindAllMarkers(mudata, ident, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1)

Function to discover regulatory gene-peak markers in all groups. The correlation strength of the target group is compared with that of all remaining groups by t-test.

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

ident: str

column name in mudata.obs containing group labels

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

corrct_method: str

multiple-testing correction method, one of ['bonferroni', 'fdr']

seed: int

random seed to make the results reproducible

Returns:
DataFrame

Differentially regulated pairs statistical test results
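
The two correction options can be illustrated as follows. This sketch assumes that 'fdr' refers to the Benjamini-Hochberg procedure, which the documentation does not confirm; the function names are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.5]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two.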

treasmo.ds.FindMarkers(mudata, ident, group_1, group_2, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1, log=True)

Function to compare regulatory gene-peak pairs between two groups by t-test.

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

ident: str

column name in mudata.obs containing group labels

group_1: str, int

first group name in ident to compare with the second

group_2: str, int

second group name in ident to compare with the first

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

corrct_method: str

multiple-testing correction method, one of ['bonferroni', 'fdr']

seed: int

random seed to make the results reproducible

Returns:
DataFrame

Differentially regulated pairs statistical test results
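
As an illustration of a two-group comparison for a single gene-peak pair, the sketch below computes the mean difference and a t statistic from per-cell correlation strength values. It uses Welch's unequal-variance form as an assumption; the documentation only says "t-test", so the variant treasmo applies may differ. The function name welch_t and the toy values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_1, group_2):
    """Return (t statistic, mean difference) for two samples (Welch's form)."""
    diff = mean(group_1) - mean(group_2)
    se = math.sqrt(variance(group_1) / len(group_1)
                   + variance(group_2) / len(group_2))
    return diff / se, diff


# Toy per-cell correlation strength values for one pair in two groups.
t_stat, mean_diff = welch_t([2.0, 3.0, 4.0], [0.0, 1.0, 2.0])
print(t_stat, mean_diff)
```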

treasmo.ds.marker_test(local_L_df, group_1, group_2=None, corrct_method='bonferroni')

Function unit to perform the statistical test on the given data. It does not need to be called directly by users.

treasmo.ds.group_stat(local_L, group_1, group_2=None)

Function unit to calculate group statistics. It does not need to be called directly by users.

treasmo.ds.add_feature_sparsity(stat_df, mudata, group, mods=['rna', 'atac'])

Helper function to add feature sparsity information to the final group comparison results.

treasmo.ds.MarkerFilter(statDf, min_pct_rna=0.1, min_pct_atac=0.05, mean_diff=1.0, p_cutoff=1e-12, plot=False)

Function to filter markers from statistical test results by sparsity, correlation difference, and p-value

Parameters:
statDf: DataFrame

Differentially regulated pairs statistical test results

min_pct_rna: float

sparsity filter cutoff: percentage of cells that express the gene

min_pct_atac: float

sparsity filter cutoff: percentage of cells that have the peak

mean_diff: float

mean correlation strength difference between the group and background (all other groups)

p_cutoff: float

adjusted p-value cutoff

plot: bool

if True, draw a volcano plot

Returns:
DataFrame

Filtered marker list with the same columns as statDf

if plot is True, the volcano plot is also returned
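
The filtering logic can be sketched on a list of dictionaries. The column names used here (pct_rna, pct_atac, mean_diff, p_adj) are hypothetical placeholders, since the actual column names of statDf are not listed in this reference, and the rule that all four conditions must hold is likewise an assumption.

```python
def filter_markers(rows, min_pct_rna=0.1, min_pct_atac=0.05,
                   mean_diff=1.0, p_cutoff=1e-12):
    """Keep rows that pass the sparsity, difference and p-value cutoffs."""
    return [
        row for row in rows
        if row["pct_rna"] >= min_pct_rna
        and row["pct_atac"] >= min_pct_atac
        and row["mean_diff"] >= mean_diff
        and row["p_adj"] <= p_cutoff
    ]


# Toy test results (hypothetical pair names and values).
stats = [
    {"pair": "GeneA~peak1", "pct_rna": 0.4, "pct_atac": 0.2,
     "mean_diff": 1.8, "p_adj": 1e-20},
    {"pair": "GeneB~peak2", "pct_rna": 0.02, "pct_atac": 0.3,
     "mean_diff": 2.5, "p_adj": 1e-30},   # fails the RNA sparsity cutoff
    {"pair": "GeneC~peak3", "pct_rna": 0.5, "pct_atac": 0.1,
     "mean_diff": 1.2, "p_adj": 1e-5},    # fails the p-value cutoff
]
kept = filter_markers(stats)
print([row["pair"] for row in kept])
```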

treasmo.ds.FindPathMarkers(mudata, ident, path, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1)

One-to-one comparison of gene-peak correlation among groups in the trajectory path by t-test.

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

ident: str

column name in mudata.obs containing group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

corrct_method: str

multiple-testing correction method, one of ['bonferroni', 'fdr']

seed: int

random seed to make the results reproducible

Returns:
DataFrame

Differentially regulated pairs statistical test results

treasmo.ds.TimeBinData(mudata, ident, path, pseudotime, features, bins=100, rm_outlier=False, fitted=None)

Helper function to generate binned data along the trajectory.

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

It must have correlation strength index calculated.

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

pseudotime: str

column name in mudata.obs containing trajectory pseudotime labels

features: List, numpy.array

List of gene-peak pair names. Can be selected from mudata.uns['Local_L_names']

bins: int

number of bins to divide the trajectory into

rm_outlier: bool

whether to cap the strength index so that it stays within ±2 standard deviations

fitted: int, default is None

if an int, also return GaussianProcessRegressor-fitted data with that number of bins.

if None, return only the binned raw data

Returns:
DataFrame

binned raw data

Optional: GaussianProcessRegressor-fitted data
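
A minimal sketch of binning along pseudotime and of the outlier cap is given below. It assumes equal-width bins, the bin mean as the summary, and a cap centered on the mean; none of these choices is confirmed by this reference, and the Gaussian process fit is left out. The function names are hypothetical.

```python
from statistics import mean, pstdev


def clip_to_2std(values):
    """Cap values to the interval mean ± 2 standard deviations."""
    m, s = mean(values), pstdev(values)
    return [min(max(v, m - 2 * s), m + 2 * s) for v in values]


def bin_by_pseudotime(pseudotime, values, bins=100):
    """Average values in equal-width pseudotime bins.

    Returns (bin_center, mean_value) tuples for non-empty bins.
    Assumes the pseudotime values are not all identical.
    """
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / bins
    sums, counts = [0.0] * bins, [0] * bins
    for t, v in zip(pseudotime, values):
        idx = min(int((t - lo) / width), bins - 1)  # last edge goes in last bin
        sums[idx] += v
        counts[idx] += 1
    return [(lo + (i + 0.5) * width, sums[i] / counts[i])
            for i in range(bins) if counts[i]]


binned = bin_by_pseudotime([0.0, 0.1, 0.4, 0.6, 0.9, 1.0],
                           [1, 3, 5, 7, 9, 11], bins=2)
print(binned)
print(clip_to_2std([0] * 9 + [100]))
```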

treasmo.ds.TimeBinProportion(mudata, ident, path, pseudotime, bins=100)

Function to calculate the binned cell type proportion along the trajectory

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

pseudotime: str

column name in mudata.obs containing trajectory pseudotime labels

bins: int

number of bins to divide the trajectory into

Returns:
DataFrame

binned data with pseudotime and cell type proportion
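
The idea of a per-bin cell type proportion can be sketched as follows, again assuming equal-width pseudotime bins and restricting the cells to the groups listed in path. The function name bin_proportions and the toy labels are hypothetical.

```python
from collections import Counter


def bin_proportions(pseudotime, labels, path, bins=100):
    """Per-bin proportion of each group in `path` along pseudotime."""
    cells = [(t, g) for t, g in zip(pseudotime, labels) if g in path]
    lo = min(t for t, _ in cells)
    hi = max(t for t, _ in cells)
    width = (hi - lo) / bins
    per_bin = [Counter() for _ in range(bins)]
    for t, g in cells:
        per_bin[min(int((t - lo) / width), bins - 1)][g] += 1

    rows = []
    for i, counter in enumerate(per_bin):
        total = sum(counter.values())
        if total == 0:
            continue  # skip empty bins
        row = {"pseudotime": lo + (i + 0.5) * width}
        row.update({g: counter[g] / total for g in path})
        rows.append(row)
    return rows


proportions = bin_proportions(
    [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    ["A", "A", "B", "A", "B", "B"],
    path=["A", "B"],
    bins=2,
)
print(proportions)
```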

treasmo.ds.FindPathDynamics(mudata, ident, path, pseudotime, rm_outlier=True, var_cutoff=0.1, range_cutoff=1.0, bins=100, plot=False)

Detect highly variable gene-peak pairs along the trajectory by correlation strength range (max-min) and variance

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

pseudotime: str

column name in mudata.obs containing trajectory pseudotime labels

bins: int (argument passed to TimeBinData)

number of bins to divide the trajectory into

rm_outlier: bool (argument passed to TimeBinData)

whether to cap the strength index so that it stays within ±2 standard deviations

var_cutoff: float

minimum variance cutoff

range_cutoff: float

minimum range (max - min) cutoff

plot: bool

if True, plot volcano plot

Returns:
DataFrame

Dynamic gene-peak pairs along the trajectory
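
The selection by variance and range can be sketched on already-binned series. This assumes that a pair must pass both cutoffs and that the population variance is used; the documentation states neither, and the function name dynamic_pairs and the pair names are hypothetical.

```python
from statistics import pvariance


def dynamic_pairs(binned, var_cutoff=0.1, range_cutoff=1.0):
    """Select pairs whose binned strength varies enough along the path.

    binned: dict mapping pair name -> list of binned strength values.
    """
    selected = {}
    for name, series in binned.items():
        var = pvariance(series)
        value_range = max(series) - min(series)
        if var >= var_cutoff and value_range >= range_cutoff:
            selected[name] = {"variance": var, "range": value_range}
    return selected


binned_strength = {
    "GeneA~peak1": [0.0, 0.5, 1.5, 2.0],  # changes along the trajectory
    "GeneB~peak2": [0.1, 0.1, 0.2, 0.1],  # essentially flat
}
dynamic = dynamic_pairs(binned_strength)
print(dynamic)
```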

treasmo.ds.PathDynamics(mudata, gene, peaks, ident, path, pseudotime, bins=100)

Quantify regulatory dynamics along the trajectory for a single gene and its regulatory elements.

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

gene: str

a single gene name

peaks: List, numpy.array

a list of peaks correlated with the gene (each gene-peak pair should exist in mudata.uns['Local_L_names'])

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

pseudotime: str

pseudotime label for the trajectory saved in mudata.obs

bins: int

number of bins to divide the trajectory into

Returns:
MuData

DataFrame added in mudata.uns['pathDym'][path][gene] describing correlation and cluster proportion changes along the trajectory

treasmo.ds.DynamicModule(mudata, ident, path, pseudotime, features=None, bins=100, fitted=100, num_iteration=5000, som_shape=(2, 2), sigma=0.5, learning_rate=0.1, random_seed=1)

Function to cluster gene-peak modules by Self-Organizing Map along the trajectory

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

pseudotime: str

pseudotime label for the trajectory saved in mudata.obs

bins: int

number of bins to divide the trajectory into

fitted: int

number of bins to divide the trajectory into for GaussianProcessRegressor fitted data

bins sets the binned data for later plotting;

fitted sets the binned data for clustering;

It is recommended to keep the two the same.

num_iteration: int

maximum number of iterations to optimize the SOM

som_shape: Tuple[int, int]

shape of the map, defines number and similarity structure of modules

sigma: float

the radius of the different neighbors in the SOM

learning_rate: float

optimization speed, how much weights are adjusted during each iteration

random_seed: int

random seed to make the results reproducible

Returns:
Dict

key is module index, value is time bin data

treasmo.pl module

treasmo.pl.LocalCor_Heatmap(mudata, pairs, groupby, cluster=True, save=None, **kwds)

Function to visualize the local L matrix as a heatmap and to cluster features

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

pairs: List, numpy.array

List of gene-peak pair names. Can be selected from mudata.uns['Local_L_names']

cluster: bool

cluster features or not

groupby: str

group cells by the label saved in mudata.obs

save: str, default is None

if provided, heatmap will be saved in the file path provided

**kwds

other arguments for sc.pl.heatmap()

treasmo.pl.visualize_marker(mudata, gene, peak, mods=['rna', 'atac'], cmaps='plasma', basis='umap', vmins=None, vmaxs=None, figsize=None, save=None, **kwds)

Function to visualize the gene-peak pair correlation in a user-specified embedding, e.g., UMAP.

It returns 3 plots: gene expression, peak accessibility, and gene-peak correlation strength

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

gene: str

gene name

peak: str

peak name

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

cmaps: str, List

Color map to use for continuous variables. Can be either a single color map or a list

basis: str, List

the embeddings to plot. Could be either a single embedding space or a list for each of the feature

vmins: float, List

min value to color. Could be either a single value or a list for gene, peak, and correlation

vmaxs: float, List

max value to color. Same as vmins

figsize: Tuple(int, int)

figure size

save: str

if provided, the figure will be saved to the file path provided

**kwds

other arguments for sc.pl.embedding

Returns:
Embedding colored by the gene, peak, and the correlation between gene and peak

treasmo.pl.PathDynamics(mudata, ident, path, gene, peaks=None, xlim=None, ylim=None, title=None, title_fontsize=15, ticks_fontsize=12, x_label='Pseudotime', y_label='Correlation Strength', label_fontsize=12, curve_colors=None, dot_size=5, linewidth=3, ident_colors=None, show_legend=True, save=None)

Function to visualize the gene-peak pair correlation changes along pseudotime + cell type proportion visualization

Note

To visualize the results, treasmo.ds.PathDynamics() must be run first.

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object. It must have correlation strength index calculated.

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

gene: str

gene name

peaks: List, numpy.array

list of peak names to be paired with the gene

xlim: Tuple[float, float]

(min, max), the pseudotime range

ylim: Tuple[float, float]

(min, max), the correlation range limit, useful to remove outliers

title: str

Plot title

(title/ticks/label)_fontsize: int

fontsize of plot title, ticks and label

(x/y)_label: str

labels for x/y axis

curve_colors: List, numpy.array

curve colors for each of the gene-peak pair correlation;

if not specified, the default color palette will be applied.

dot_size: int, float

dot size in plot

linewidth: int, float

curve width

ident_colors: List, numpy.array

colors of each cluster to be plotted in the proportion bar;

If not specified, function will look for uns[IDENT_colors] first;

If not found, default color palette will be applied.

show_legend: bool

Show color legend or not

save: str

if provided, the figure will be saved to the file path

treasmo.pl.DynamicSumMtx(mudata, ident, path, gene, peaks=None, feature_colors=None, show_legend=True, save=None, **kwds)

Function to plot regulatory element relationships in heatmap by Spearman correlation

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

It must have correlation strength index calculated.

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

gene: str

gene name

peaks: List, numpy.array

list of peak names to be paired with the gene

feature_colors: List, numpy.array

list of colors for all the gene-peak pairs

show_legend: bool

whether or not to show figure legend

save: str

if provided, heatmap will be saved in the file path provided

**kwds

other arguments for sc.pl.embedding

treasmo.pl.DynamicModule(mudata, somDict, prpDfin, xlim=None, ylim=None, split=False, n_cols=3, title=None, title_fontsize=15, ticks_fontsize=12, x_label='Pseudotime', y_label='Correlation Strength', label_fontsize=12, curve_colors=None, dot_size=5, linewidth=3, ident_colors=None, show_legend=True, save=None)

Function to plot the gene-peak modules found in the trajectory by treasmo.ds.DynamicModule

Parameters:
mudata: MuData

single-cell multi-omics data saved as MuData object

Run treasmo.ds.DynamicModule beforehand.

somDict: dict

output from treasmo.ds.DynamicModule containing the modules found

prpDfin: DataFrame

output from treasmo.ds.TimeBinProportion containing the proportion changes

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

xlim: Tuple[float, float]

(min, max), the pseudotime range

ylim: Tuple[float, float]

(min, max), the correlation range limit, useful to remove outliers

title: str

Plot title

(title/ticks/label)_fontsize: int

fontsize of plot title, ticks and label

(x/y)_label: str

labels for x/y axis

curve_colors: List, numpy.array

Curve colors for each of the gene-peak pair correlation;

If not specified, the default color palette will be applied.

dot_size: int, float

Dot size in plot

linewidth: int, float

curve width

ident_colors: List, numpy.array

Colors of each cluster to be plotted in the proportion bar;

If not specified, function will look for uns[IDENT_colors] first;

If not found, default color palette will be applied.

show_legend: bool

Show color legend or not

save: str

If provided, the figure will be saved to the file path

treasmo.lee_vec module

class treasmo.lee_vec.Spatial_Pearson(connectivity=None, permutations=999)

Global Spatial Pearson Correlation Statistic; Mean of single-cell gene-peak correlation strength index.

Adapted and vectorized from pysal/esda

Methods

fit(X, Y[, percent, max_RAM, seed])

bivariate spatial Pearson's R based on Eq. 18 of Lee (2001).

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_fit_request(*[, max_RAM, percent, seed])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

fit(X, Y, percent=0.1, max_RAM=16, seed=1)

bivariate spatial Pearson's R based on Eq. 18 of Lee (2001).

Parameters:
X: numpy.ndarray [n x p]

array containing continuous data

Y: numpy.ndarray [n x p]

array containing continuous data

percent: float

percentage of cells to shuffle during permutation. In most cases, the default of 0.1 is a good choice.

seed: int

random seed to make the results reproducible

max_RAM: int, float

maximum limitation of memory (GB)

Returns:
the fitted estimator.

Notes

Technical details and derivations can be found in Lee (2001).

set_fit_request(*, max_RAM: bool | None | str = '$UNCHANGED$', percent: bool | None | str = '$UNCHANGED$', seed: bool | None | str = '$UNCHANGED$') → Spatial_Pearson

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
max_RAM: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for max_RAM parameter in fit.

percent: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for percent parameter in fit.

seed: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for seed parameter in fit.

Returns:
self: object

The updated object.

class treasmo.lee_vec.Spatial_Pearson_Local(connectivity=None, permutations=999)

Single-cell gene-peak correlation strength index.

Adapted and vectorized from esda library

Methods

fit(X, Y[, max_RAM, seed])

bivariate local Pearson's R based on Eq. 22 in Lee (2001)

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_fit_request(*[, max_RAM, seed])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

fit(X, Y, max_RAM=16, seed=1)

bivariate local Pearson's R based on Eq. 22 in Lee (2001)

Parameters:
X: numpy.ndarray [n x p]

array containing continuous data

Y: numpy.ndarray [n x p]

array containing continuous data

seed: int

random seed to make the results reproducible

max_RAM: int, float

maximum limitation of memory (GB)

Returns:
the fitted estimator.

set_fit_request(*, max_RAM: bool | None | str = '$UNCHANGED$', seed: bool | None | str = '$UNCHANGED$') → Spatial_Pearson_Local

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
max_RAM: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for max_RAM parameter in fit.

seed: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for seed parameter in fit.

Returns:
self: object

The updated object.

treasmo.moran_vec module

class treasmo.moran_vec.Moran(n, w, transformation='r', permutations=0)

Moran’s I Global Autocorrelation Statistic

Adapted from pysal/esda

Parameters:
n: int

number of observations

w: W

spatial weights instance

transformation: string

weights transformation, default is row-standardized “r”. Other options include “B”: binary, “D”: doubly-standardized, “U”: untransformed (general weights), “V”: variance-stabilizing.

permutations: int

number of random permutations for calculation of pseudo p-values

Notes

Technical details and derivations can be found in Cliff and Ord (1981).

Attributes:
w: W

original w object

permutations: int

number of permutations

I: array

value of Moran’s I

sim: array

(if permutations>0) vector of I values for permuted samples

p_sim: array

(if permutations>0) p-value based on permutations (one-tailed). Null: spatial randomness. Alternative: the observed I is extreme if it is either extremely greater or extremely lower than the values obtained based on permutations.
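
The permutation-based pseudo p-value described for p_sim can be sketched as below. The folding step follows the approach used in pysal/esda, from which this class is adapted; whether treasmo keeps it unchanged is an assumption, and the function name pseudo_p_value is hypothetical.

```python
def pseudo_p_value(observed, simulated):
    """One-tailed pseudo p-value of `observed` against permuted statistics.

    The count is folded so that an observation in either tail of the
    permutation distribution yields a small p-value.
    """
    permutations = len(simulated)
    larger = sum(1 for s in simulated if s >= observed)
    if permutations - larger < larger:
        larger = permutations - larger
    return (larger + 1) / (permutations + 1)


# Toy permutation results: nine simulated statistics from 0.0 to 0.8.
simulated_i = [0.1 * k for k in range(9)]
print(pseudo_p_value(0.9, simulated_i))   # above every simulated value
print(pseudo_p_value(-0.5, simulated_i))  # below every simulated value
print(pseudo_p_value(0.45, simulated_i))  # in the middle of the distribution
```

The "+ 1" terms count the observed statistic as one of the permutations, so the smallest attainable p-value is 1 / (permutations + 1), for example 0.001 with 999 permutations.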

Methods

calc_i(X[, seed, max_RAM])

Function to calculate Moran's I of all features

calc_i(X, seed=1, max_RAM=16)

Function to calculate Moran’s I of all features

Parameters:
X: n x p array

log-transformed feature matrix

seed: int

random seed to make the results reproducible

max_RAM: int, float

maximum limitation of memory (GB)

Returns:
the fitted estimator.