API#

treasmo.tl module#

treasmo.tl.feature_sparsity(mudata, group_by={})[source]#

Function to add feature sparsity information to MuData.var

The value is defined, for each feature, as the number of non-zero values divided by the number of cells.

Parameters:

mudata: MuData

single-cell multi-omics data saved as MuData object

group_by: dict

If provided, calculate per group feature sparsity for each modality

Example: {'rna': 'cell_type', 'atac': 'cell_type'}

Returns:

MuData: var[‘Frac.all’] var[‘Frac.GroupName’]
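
The definition above can be illustrated with a minimal sketch in plain Python. The helper names `nonzero_fraction` and `nonzero_fraction_by_group` are hypothetical and not part of treasmo; the toy matrix stands in for one modality of a MuData object.

```python
def nonzero_fraction(matrix):
    """Fraction of cells with a non-zero value, per feature (column)."""
    n_cells = len(matrix)
    n_features = len(matrix[0])
    return [
        sum(1 for row in matrix if row[j] != 0) / n_cells
        for j in range(n_features)
    ]


def nonzero_fraction_by_group(matrix, groups):
    """Same statistic, computed separately for each group label."""
    result = {}
    for label in sorted(set(groups)):
        rows = [row for row, g in zip(matrix, groups) if g == label]
        result[label] = nonzero_fraction(rows)
    return result


# Toy count matrix: 4 cells x 3 features (hypothetical data).
counts = [
    [0, 2, 1],
    [0, 0, 3],
    [5, 0, 1],
    [0, 1, 2],
]
cell_type = ["T", "T", "B", "B"]

frac_all = nonzero_fraction(counts)
frac_by_group = nonzero_fraction_by_group(counts, cell_type)
```

Here `frac_all` plays the role of var['Frac.all'] and each entry of `frac_by_group` the role of a var['Frac.GroupName'] column.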

treasmo.tl.get_gloc_from_atac_data(peaks, split_symbol)[source]#

Helper function to get the genomic locations (including the middle point) of peaks

*No need to call from user end

treasmo.tl.nearby_peaks(g_array, min_op=1)[source]#

Helper function to get regions overlapped with the gene body.

*No need to call from user end

Parameters:

min_op: int: minimum overlap between a peak and a gene for the two to be considered overlapping.

treasmo.tl.peaks_within_distance(genes, peaks, upstream, downstream, ref_gtf_fn, no_intersect=True, id_col='GeneSymbol', split_symbol=['-', '-'])[source]#

Function to annotate genes with nearby peaks

Parameters:

genes: List, numpy.array

gene list to be annotated

peaks: List, numpy.array

candidate peaks list

upstream: int

include peaks N bp upstream of the gene TSS

downstream: int

include peaks N bp downstream of the TES

ref_gtf_fn: str

GTF format file containing gene location information. See the examples at ChaozhongLiu/scGREAT:

Homo_sapiens.GRCh38.104.GeneLoc.Tab.txt
Mus_musculus.GRCm38.100.GeneLoc.Tab.txt

no_intersect: bool

whether to remove a candidate peak that lies in another gene’s body

id_col: str

column name in ref_gtf_fn that indicates gene ID

split_symbol: List[str, str]

how peak location ID is formatted

'chr1-12345-23456' - split_symbol=['-', '-']

'chr1:12345-23456' - split_symbol=[':', '-']

Returns:

DataFrame: contains gene annotation information
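
A rough sketch of the idea behind this annotation, in plain Python. It is not the treasmo implementation: `parse_peak` and `peaks_near_gene` are hypothetical helpers, the gene is assumed to lie on the plus strand (so its start acts as the TSS and its end as the TES), and the no_intersect filter is left out.

```python
def parse_peak(peak_id, split_symbol=("-", "-")):
    """Split 'chr1-12345-23456' or 'chr1:12345-23456' into (chrom, start, end)."""
    chrom, rest = peak_id.split(split_symbol[0], 1)
    start, end = rest.split(split_symbol[1], 1)
    return chrom, int(start), int(end)


def peaks_near_gene(gene, peaks, upstream, downstream, split_symbol=("-", "-")):
    """Return peaks overlapping [start - upstream, end + downstream] of a gene.

    Assumes a plus-strand gene given as (chrom, start, end).
    """
    g_chrom, g_start, g_end = gene
    lo, hi = g_start - upstream, g_end + downstream
    hits = []
    for peak_id in peaks:
        chrom, start, end = parse_peak(peak_id, split_symbol)
        if chrom == g_chrom and start <= hi and end >= lo:
            hits.append(peak_id)
    return hits


# Hypothetical gene and candidate peaks.
gene = ("chr1", 10000, 12000)
peaks = ["chr1-8000-8500", "chr1-9600-9900", "chr1-12400-12800", "chr2-10000-10500"]
near = peaks_near_gene(gene, peaks, upstream=500, downstream=500)
```

With a 500 bp window on each side, only the two chr1 peaks that touch the interval 9500 to 12500 are kept.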

treasmo.tl.TFBS_match(genes, peaks, ref_fn, min_overlap=1, split_symbol=['-', '-'])[source]#

Function to annotate TF with binding site regions

Parameters:

genes: List, numpy.array

gene list to be annotated

peaks: List, numpy.array

candidate peaks list

ref_fn: str

BED format file containing gene binding site location information

see example below from JASPAR TFBS genome track (https://jaspar.genereg.net/genome-tracks/)

chr1    280     298     AGL3    821     -

chr1    309     327     AGL3    823     +

chr1    309     327     AGL3    882     -

chr1    1577    1595    AGL3    823     +

min_overlap: int

the minimum number of overlapping base pairs between a candidate peak and a TFBS

split_symbol: List[str, str]

how peak location ID is formatted

'chr1-12345-23456' - split_symbol=['-', '-']

'chr1:12345-23456' - split_symbol=[':', '-']

Returns:

DataFrame: contains TF annotation information
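
The role of min_overlap can be sketched as follows, reusing BED rows from the example above. The helpers and the candidate peak are hypothetical, and both intervals are assumed to be half-open; the real function may count overlap differently.

```python
def overlap_bp(start_a, end_a, start_b, end_b):
    """Number of shared base pairs between two half-open intervals."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def parse_bed_line(line):
    """Read chrom, start, end and TF name from a whitespace-separated BED line."""
    fields = line.split()
    return fields[0], int(fields[1]), int(fields[2]), fields[3]


bed_lines = [
    "chr1    280     298     AGL3    821     -",
    "chr1    309     327     AGL3    823     +",
    "chr1    1577    1595    AGL3    823     +",
]
peak = ("chr1", 290, 320)  # hypothetical candidate peak
min_overlap = 10

matches = []
for line in bed_lines:
    chrom, start, end, tf = parse_bed_line(line)
    if chrom == peak[0] and overlap_bp(peak[1], peak[2], start, end) >= min_overlap:
        matches.append((tf, start, end))
```

The first site shares only 8 bp with the peak and is dropped; the second shares 11 bp and is kept.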

treasmo.tl.PFM2Motif(file_name, out_file, detection_threshold=0)[source]#

Function to convert PFM (Position Frequency Matrix) into Homer Motif file

Parameters:

file_name: str: Path to PFM file, see example at https://jaspar.elixir.no/docs/#jaspar-matrix-formats
out_file: str: Path to output file
detection_threshold: int, float: See Homer explanation at http://homer.ucsd.edu/homer/motif/creatingCustomMotifs.html
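
As a sketch of what such a conversion involves: a Homer motif file starts with a tab-separated header line (`>consensus`, name, detection threshold) followed by one row of A/C/G/T probabilities per position. The function below is hypothetical, takes already-parsed counts rather than a PFM file, and uses the most frequent base per position as a simplified consensus; the counts are toy values.

```python
def pfm_to_homer(name, counts, detection_threshold=0):
    """Convert per-base count rows into Homer motif file lines.

    counts maps each of 'A', 'C', 'G', 'T' to a list of counts, one per
    motif position.
    """
    bases = "ACGT"
    length = len(counts["A"])
    rows = []
    consensus = []
    for pos in range(length):
        column = [counts[b][pos] for b in bases]
        total = sum(column)
        probs = [c / total for c in column]  # normalize counts to probabilities
        rows.append("\t".join(f"{p:.3f}" for p in probs))
        consensus.append(bases[probs.index(max(probs))])
    consensus_str = "".join(consensus)
    header = f">{consensus_str}\t{name}\t{detection_threshold}"
    return [header] + rows


# Toy counts for a 4-position motif (not a real JASPAR entry).
toy_counts = {
    "A": [4, 19, 0, 0],
    "C": [16, 0, 20, 0],
    "G": [0, 1, 0, 20],
    "T": [0, 0, 0, 0],
}
lines = pfm_to_homer("toy_motif", toy_counts)
```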

treasmo.tl.peak2HomerInput(peaks, out_file, filetype='peaks', split_symbol=['-', '-'])[source]#

Helper function to convert peak list to Homer accepted input format

Parameters:

peaks: List, numpy.array

peaks list to convert

out_file: str

Path to output file

filetype: str

file type to generate, either peaks or bed

split_symbol: List[str, str]

how peak location ID is formatted

'chr1-12345-23456' - split_symbol=['-', '-']

'chr1:12345-23456' - split_symbol=[':', '-']
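
For reference, the two layouts Homer accepts differ only in column order. The sketch below is a hypothetical helper, not the treasmo code; it assumes a fixed "+" strand because ATAC peaks are unstranded.

```python
def peak_to_homer_row(peak_id, filetype="peaks", split_symbol=("-", "-")):
    """Format one peak ID as a tab-separated Homer input row."""
    chrom, rest = peak_id.split(split_symbol[0], 1)
    start, end = rest.split(split_symbol[1], 1)
    strand = "+"  # assumption: unstranded peaks get a fixed strand
    if filetype == "peaks":
        # Homer peak file: ID, chromosome, start, end, strand
        fields = [peak_id, chrom, start, end, strand]
    elif filetype == "bed":
        # BED: chromosome, start, end, ID, unused column, strand
        fields = [chrom, start, end, peak_id, ".", strand]
    else:
        raise ValueError("filetype must be 'peaks' or 'bed'")
    return "\t".join(fields)


peaks_row = peak_to_homer_row("chr1:12345-23456", "peaks", (":", "-"))
bed_row = peak_to_homer_row("chr1:12345-23456", "bed", (":", "-"))
```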

treasmo.tl.run_HOMER_motif(peaks, out_dir, prefix, ref_genome, homer_path=None, split_symbol=['-', '-'], size=200)[source]#

Function to run Homer from a Python script. It prepares the input file Homer requires and writes the results to the specified directory.

Parameters:

peaks: List, numpy.array

peaks of interest

out_dir: str

output directory to save the results

prefix: str

prefix of all files, folder called out_dir/homer_[prefix] will be created to save all the results

ref_genome: str

reference genome name, e.g., ‘hg19’, ‘hg38’

homer_path: str

path to the Homer software; can be omitted if Homer is already on the PATH

split_symbol: List[str, str]

how peak location ID is formatted

'chr1-12345-23456' - split_symbol=['-', '-']

'chr1:12345-23456' - split_symbol=[':', '-']

size: int

Homer parameter, the size of the region used for motif finding

Returns:

DataFrame: Homer results saved in out_dir/homer_[prefix]
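
The parameters above map naturally onto a call to Homer's `findMotifsGenome.pl` (peak file, genome, output directory, `-size`). The sketch below only assembles such a command and is an assumption about what the wrapper does, not its actual code; `build_homer_command` and the file paths are hypothetical.

```python
import os


def build_homer_command(peak_file, out_dir, prefix, ref_genome,
                        homer_path=None, size=200):
    """Assemble a findMotifsGenome.pl call as an argument list."""
    script = "findMotifsGenome.pl"
    if homer_path is not None:
        # Hypothetical layout: the script sits directly in homer_path.
        script = os.path.join(homer_path, script)
    result_dir = os.path.join(out_dir, f"homer_{prefix}")
    return [script, peak_file, ref_genome, result_dir, "-size", str(size)]


cmd = build_homer_command("results/run1.peaks.bed", "results", "run1", "hg38")
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a shell string avoids quoting problems with unusual file names.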

treasmo.tl.motif_summary(peak_file, homer_dir, motif_index, ref_genome, homer_path=None, size=200)[source]#

Function to extract the peaks related to a motif of interest.

Parameters:

peak_file: str: out_dir/prefix.peaks.bed file generated in run_HOMER_motif()
homer_dir: str: out_dir/homer_prefix in run_HOMER_motif() | Homer output folder
motif_index: int: index of the motif of interest in homer_dir/knownResults.html
ref_genome: str: reference genome name, e.g., ‘hg19’, ‘hg38’
homer_path: str: path to the Homer software; can be omitted if Homer is already on the PATH
size: int: Homer parameter, the size of the region used for motif finding. Keep it the same as in run_HOMER_motif()

Returns:

Motif related peaks information saved in homer_dir/
DataFrame: contains peak list and motif matching quality information

treasmo.core module#

treasmo.core.Morans_I(mudata, mods=['rna', 'atac'], seed=1, max_RAM=16)[source]#

Function to calculate Moran’s I for all the features in multiome data

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
mods: List[str, str]: scRNA-seq and scATAC-seq modality name in MuData object
seed: int: random seed to make the results reproducible
max_RAM: int: maximum limitation of memory usage (Gb)

Returns:

MuData: added with var[‘Morans.I’]
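
For readers unfamiliar with the statistic, a minimal single-feature sketch of Moran's I follows. It uses a small dense weight matrix exactly as given (no row standardization) and a hypothetical `morans_i` helper; treasmo's vectorized version and its cell-neighbor graph are not reproduced here.

```python
def morans_i(values, weights):
    """Moran's I for one feature given a dense n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Four cells on a chain: each cell is a neighbor of the next one.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], w)       # values change gradually
alternating = morans_i([1, 4, 1, 4], w)  # neighbors always differ
```

A feature that varies smoothly across neighboring cells gets a positive value, while one that flips between neighbors gets a negative value.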

treasmo.core.Global_L(mudata, pairsDf, mods=['rna', 'atac'], permutations=0, percent=0.1, seed=1, max_RAM=16)[source]#

Function to calculate the global L index (mean of correlation strength index) for all the pairs in multiome data

Parameters:

mudata: MuData

single-cell multi-omics data saved as MuData object

mods: List[str, str]

scRNA-seq and scATAC-seq modality name in MuData object

pairsDf: pandas.DataFrame

gene-peak pair DataFrame containing pairs to be calculated

either prepared by the user or generated by treasmo.tl.peaks_within_distance / TFBS_match

permutations: int

Number of permutations for significance test.

Default is 0, meaning no significance test. 999 is a good choice for most cases, but it may take hours to finish depending on the number of pairs.

percent: float

percentage of cells to shuffle during permutation.

The default of 0.1 is a good choice in most cases.

seed: int

random seed to make the results reproducible

max_RAM: int

maximum limitation of memory usage(Gb)

Returns:

DataFrame

pairsDf added with extra columns:

Global L results

QC metrics (feature sparsity)
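
For orientation, the global L can be written out as below. This is the bivariate spatial association measure of Lee (2001) as it is commonly stated, with $w_{ij}$ the cell-neighbor weights, $x$ a gene and $y$ a peak over $n$ cells. The symbols are chosen here for illustration, and the exact weighting treasmo applies is not specified in this reference.

```latex
L = \frac{n}{\sum_{i}\bigl(\sum_{j} w_{ij}\bigr)^{2}}
    \cdot
    \frac{\sum_{i}\Bigl[\bigl(\sum_{j} w_{ij}(x_j-\bar{x})\bigr)
                        \bigl(\sum_{j} w_{ij}(y_j-\bar{y})\bigr)\Bigr]}
         {\sqrt{\sum_{i}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i}(y_i-\bar{y})^{2}}}
```

With row-standardized weights ($\sum_j w_{ij} = 1$) the first factor equals 1, so L becomes the mean over cells of the per-cell terms, which agrees with the description of Global L as the mean of the correlation strength index.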

treasmo.core.Local_L(mudata, genes, peaks, mods=['rna', 'atac'], rm_dropout=False, seed=1, max_RAM=16)[source]#

Function to calculate the single-cell gene-peak correlation strength index for all the pairs in multiome data

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
mods: List[str, str]: scRNA-seq and scATAC-seq modality name in MuData object
genes, peaks: List, numpy.array: one-to-one lists containing gene and peak names
rm_dropout: bool: set the correlation strength index to 0 where the feature value is 0 (dropout)
seed: int: random seed to make the results reproducible
max_RAM: int: maximum limitation of memory usage (Gb)

Returns:

MuData: added with Local L matrix and gene-peak pair names in mudata.uns
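
A per-cell index of this kind can be sketched in plain Python as the product of the neighbor-averaged z-scores of the gene and the peak. The sketch assumes row-standardized weights and treats a cell as dropout when either feature is 0; both are assumptions, and the helper names are hypothetical.

```python
def zscores(values):
    """Standardize with the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]


def spatial_lag(weights, z):
    """Weighted average of z over each cell's neighbors."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def local_l(gene, peak, weights, rm_dropout=False):
    """Per-cell Lee's L for one gene-peak pair (row-standardized weights)."""
    lag_gene = spatial_lag(weights, zscores(gene))
    lag_peak = spatial_lag(weights, zscores(peak))
    scores = [a * b for a, b in zip(lag_gene, lag_peak)]
    if rm_dropout:
        # Assumption: a cell counts as dropout if either feature is 0.
        scores = [0.0 if g == 0 or p == 0 else s
                  for s, g, p in zip(scores, gene, peak)]
    return scores


# Sanity check: if every cell is its own only neighbor, the mean of the
# local scores equals Pearson's r.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
scores = local_l([1, 2, 3], [2, 4, 6], identity)
global_l = sum(scores) / len(scores)
```

In practice the weights would come from a cell-neighbor graph, so each score reflects agreement between gene and peak in a cell's neighborhood rather than in the cell alone.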

treasmo.core.Pearsonr(mudata, genes, peaks, mods=['rna', 'atac'], p_value=False, seed=1)[source]#

Function to calculate the Pearson correlation between genes and peaks

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
mods: List[str, str]: scRNA-seq and scATAC-seq modality name in MuData object
genes, peaks: List, numpy.array: one-to-one lists containing gene and peak names
p_value: bool: perform significance testing or not
seed: int: random seed to make the results reproducible

Returns:

DataFrame: containing Pearson correlation r, p-value and FDR for all gene-peak pairs

treasmo.ds module#

treasmo.ds.FindAllMarkers(mudata, ident, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1)[source]#

Function to discover regulatory gene-peak markers in all groups. The correlation strength in the target group is compared with that in all remaining groups by t-test.

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
ident: str: column name in mudata.obs containing group labels
mods: List[str, str]: scRNA-seq and scATAC-seq modality name in MuData object
corrct_method: str: multi-test correction method, one of [‘bonferroni’, ‘fdr’]
seed: int: random seed to make the results reproducible

Returns:

DataFrame: Differentially regulated pairs statistical test results

treasmo.ds.FindMarkers(mudata, ident, group_1, group_2, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1, log=True)[source]#

Function to compare regulatory gene-peak pairs between two groups by t-test.

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
ident: str: column name in mudata.obs containing group labels
group_1: str, int: first group name in ident to compare with the second
group_2: str, int: second group name in ident to compare with the first
mods: List[str, str]: scRNA-seq and scATAC-seq modality name in MuData object
corrct_method: str: multi-test correction method, one of [‘bonferroni’, ‘fdr’]
seed: int: random seed to make the results reproducible

Returns:

DataFrame: Differentially regulated pairs statistical test results
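
The two corrct_method options can be sketched as follows. This assumes that 'fdr' refers to the Benjamini-Hochberg procedure, which this reference does not state; `adjust_p_values` is a hypothetical helper, not a treasmo function.

```python
def adjust_p_values(p_values, corrct_method="bonferroni"):
    """Adjust raw p-values for multiple testing."""
    n = len(p_values)
    if corrct_method == "bonferroni":
        return [min(1.0, p * n) for p in p_values]
    if corrct_method == "fdr":
        # Benjamini-Hochberg step-up procedure (assumed meaning of 'fdr').
        order = sorted(range(n), key=lambda i: p_values[i])
        adjusted = [0.0] * n
        running_min = 1.0
        for rank in range(n, 0, -1):
            i = order[rank - 1]
            running_min = min(running_min, p_values[i] * n / rank)
            adjusted[i] = running_min
        return adjusted
    raise ValueError("corrct_method must be 'bonferroni' or 'fdr'")


raw = [0.01, 0.04, 0.03, 0.005]
bonf = adjust_p_values(raw, "bonferroni")
fdr = adjust_p_values(raw, "fdr")
```

Bonferroni multiplies every p-value by the number of tests, so it is the stricter of the two.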

treasmo.ds.marker_test(local_L_df, group_1, group_2=None, corrct_method='bonferroni')[source]#: Function unit to perform statistical test with given data. No need to be called from user end.

treasmo.ds.group_stat(local_L, group_1, group_2=None)[source]#: Function unit to calculate group statistics. No need to be called from user end.

treasmo.ds.add_feature_sparsity(stat_df, mudata, group, mods=['rna', 'atac'])[source]#: Helper function to add feature sparsity information in final group comparison results.

treasmo.ds.MarkerFilter(statDf, min_pct_rna=0.1, min_pct_atac=0.05, mean_diff=1.0, p_cutoff=1e-12, plot=False)[source]#

Function to filter markers from statistical test results by sparsity, correlation difference, and p-value

Parameters:

statDf: DataFrame: Differentially regulated pairs statistical test results
min_pct_rna: float: sparsity filter cutoff: percentage of cells that express the gene
min_pct_atac: float: sparsity filter cutoff: percentage of cells that have the peak
mean_diff: float: mean correlation strength difference between the group and background (all other groups)
p_cutoff: float: adjusted p-value cutoff
plot: bool: if True, plot volcano plot

Returns:

DataFrame

Filtered marker list with the same columns as stat_df

if plot==True, also return volcano plot
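
The filtering logic amounts to applying all cutoffs at once. In the sketch below the column names (`pct_rna`, `pct_atac`, `mean_diff`, `p_adj`) are hypothetical placeholders, since the actual column names of the results table are not listed here, and the rows are invented.

```python
def filter_markers(rows, min_pct_rna=0.1, min_pct_atac=0.05,
                   mean_diff=1.0, p_cutoff=1e-12):
    """Keep rows that pass the sparsity, difference, and p-value cutoffs."""
    kept = []
    for row in rows:
        if (row["pct_rna"] >= min_pct_rna
                and row["pct_atac"] >= min_pct_atac
                and row["mean_diff"] >= mean_diff
                and row["p_adj"] <= p_cutoff):
            kept.append(row)
    return kept


# Hypothetical test results for three gene-peak pairs.
stats = [
    {"pair": "pair_1", "pct_rna": 0.40, "pct_atac": 0.20, "mean_diff": 1.8, "p_adj": 1e-30},
    {"pair": "pair_2", "pct_rna": 0.02, "pct_atac": 0.20, "mean_diff": 2.5, "p_adj": 1e-40},
    {"pair": "pair_3", "pct_rna": 0.50, "pct_atac": 0.30, "mean_diff": 0.3, "p_adj": 1e-20},
]
markers = filter_markers(stats)
```

The second pair fails the RNA sparsity cutoff and the third fails the mean difference cutoff, so only the first remains.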

treasmo.ds.FindPathMarkers(mudata, ident, path, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1)[source]#

One-to-one comparison of gene-peak correlation among groups in the trajectory path by t-test.

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
ident: str: column name in mudata.obs containing group labels
path: List: list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
mods: List[str, str]: scRNA-seq and scATAC-seq modality name in MuData object
corrct_method: str: multi-test correction method, one of [‘bonferroni’, ‘fdr’]
seed: int: random seed to make the results reproducible

Returns:

DataFrame: Differentially regulated pairs statistical test results

treasmo.ds.TimeBinData(mudata, ident, path, pseudotime, features, bins=100, rm_outlier=False, fitted=None)[source]#

Helper function to generate binned data along the trajectory.

Parameters:

mudata: MuData

single-cell multi-omics data saved as MuData object

It must have correlation strength index calculated.

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

pseudotime: str

column name in mudata.obs containing trajectory pseudotime labels

features: List, numpy.array

List of gene-peak pair names. Can be selected from muData.uns['Local_L_names']

bins: int

number of bins to divide the trajectory into

rm_outlier: bool

whether to cap the strength index so that it stays within +/- 2*std

fitted: int, default is None

if an int, also return GaussianProcessRegressor fitted data with that number of bins.

if None, return only binned raw data

Returns:

DataFrame

binned raw data

Optional: GaussianProcessRegressor fitted data
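
The binning step can be pictured with the following sketch for a single gene-peak pair. It assumes equal-width bins and capping around the mean, which may differ from the actual implementation; the helper names are hypothetical and the Gaussian process fit is not shown.

```python
import statistics


def clip_outliers(values):
    """Cap values at mean +/- 2 standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    low, high = mean - 2 * std, mean + 2 * std
    return [min(max(v, low), high) for v in values]


def bin_along_pseudotime(pseudotime, values, bins=100, rm_outlier=False):
    """Average values inside equal-width pseudotime bins.

    Returns (bin_center, mean_value) tuples; empty bins are skipped.
    """
    if rm_outlier:
        values = clip_outliers(values)
    t_min, t_max = min(pseudotime), max(pseudotime)
    width = (t_max - t_min) / bins
    sums = [0.0] * bins
    counts = [0] * bins
    for t, v in zip(pseudotime, values):
        index = min(int((t - t_min) / width), bins - 1)  # last edge joins last bin
        sums[index] += v
        counts[index] += 1
    return [
        (t_min + (i + 0.5) * width, sums[i] / counts[i])
        for i in range(bins)
        if counts[i] > 0
    ]


binned = bin_along_pseudotime(
    [0.0, 0.1, 0.4, 0.6, 0.9, 1.0], [1, 3, 2, 4, 6, 8], bins=2
)
```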

treasmo.ds.TimeBinProportion(mudata, ident, path, pseudotime, bins=100)[source]#

Function to calculate binned cell type proportion along the trajectory

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
ident: str: column name in mudata.obs containing trajectory group labels
path: List: list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
pseudotime: str: column name in mudata.obs containing trajectory pseudotime labels
bins: int: number of bins to divide the trajectory into

Returns:

DataFrame: binned data with pseudotime and cell type proportion
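
A minimal sketch of the proportion calculation, again assuming equal-width pseudotime bins. `bin_proportions` and the cluster labels are hypothetical.

```python
from collections import Counter


def bin_proportions(pseudotime, labels, path, bins=100):
    """Share of each cluster in `path` within equal-width pseudotime bins."""
    t_min, t_max = min(pseudotime), max(pseudotime)
    width = (t_max - t_min) / bins
    per_bin = [Counter() for _ in range(bins)]
    for t, label in zip(pseudotime, labels):
        if label in path:  # cells outside the path are ignored
            index = min(int((t - t_min) / width), bins - 1)
            per_bin[index][label] += 1
    table = []
    for counter in per_bin:
        total = sum(counter.values())
        table.append({c: (counter[c] / total if total else 0.0) for c in path})
    return table


proportions = bin_proportions(
    [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    ["HSC", "HSC", "MPP", "MPP", "MPP", "Mono"],
    path=["HSC", "MPP", "Mono"],
    bins=2,
)
```

Each entry of `proportions` describes one bin, so the sequence shows how the cluster composition shifts along the path.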

treasmo.ds.FindPathDynamics(mudata, ident, path, pseudotime, rm_outlier=True, var_cutoff=0.1, range_cutoff=1.0, bins=100, plot=False)[source]#

Detect highly variable gene-peak pairs along the trajectory by correlation strength range (max-min) and variance

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
ident: str: column name in mudata.obs containing trajectory group labels
path: List: list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
pseudotime: str: column name in mudata.obs containing trajectory pseudotime labels
bins: int (argument passed to TimeBinData): number of bins to divide the trajectory into
rm_outlier: bool (argument passed to TimeBinData): whether to cap the strength index so that it stays within +/- 2*std
var_cutoff: float: minimum variance cutoff
range_cutoff: float: minimum range (max - min) cutoff
plot: bool: if True, plot volcano plot

Returns:

DataFrame: Dynamic gene-peak pairs along the trajectory
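
The selection rule can be sketched on already binned curves. The sketch assumes that a pair must pass both cutoffs and that the population variance is used; neither is confirmed by this reference, and the pair names are invented.

```python
import statistics


def dynamic_pairs(binned, var_cutoff=0.1, range_cutoff=1.0):
    """Keep pairs whose binned curve passes both the variance and range cutoffs."""
    selected = {}
    for pair, curve in binned.items():
        variance = statistics.pvariance(curve)
        value_range = max(curve) - min(curve)
        if variance >= var_cutoff and value_range >= range_cutoff:
            selected[pair] = {"variance": variance, "range": value_range}
    return selected


# Hypothetical binned correlation strength for two gene-peak pairs.
curves = {
    "GENE1~chr1-100-600": [0.0, 0.5, 1.0, 1.5, 2.0],
    "GENE2~chr2-100-600": [0.4, 0.5, 0.5, 0.6, 0.5],
}
dynamic = dynamic_pairs(curves)
```

The first curve rises steadily along the trajectory and is kept; the second stays nearly flat and is filtered out.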

treasmo.ds.PathDynamics(mudata, gene, peaks, ident, path, pseudotime, bins=100)[source]#

Quantify regulatory dynamics along the trajectory for a single gene and its regulatory elements.

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
gene: str: a single gene name
peaks: List, numpy.array: a list of peaks correlated with the gene (gene-peak pair should exist in mudata.uns[‘Local_L_names’])
ident: str: column name in mudata.obs containing trajectory group labels
path: List: list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
pseudotime: str: pseudotime label for the trajectory saved in mudata.obs
bins: int: number of bins to divide the trajectory into

Returns:

MuData: DataFrame added in mudata.uns[‘pathDym’][path][gene] describing correlation and cluster proportion changes along the trajectory

treasmo.ds.DynamicModule(mudata, ident, path, pseudotime, features=None, bins=100, fitted=100, num_iteration=5000, som_shape=(2, 2), sigma=0.5, learning_rate=0.1, random_seed=1)[source]#

Function to cluster gene-peak modules by Self-Organizing Map along the trajectory

Parameters:

mudata: MuData

single-cell multi-omics data saved as MuData object

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

pseudotime: str

pseudotime label for the trajectory saved in mudata.obs

bins: int

number of bins to divide the trajectory into

fitted: int

number of bins to divide the trajectory into for GaussianProcessRegressor fitted data

bins sets binned data for later plotting;

fitted sets binned data for clustering;

It is recommended to keep the two the same

num_iteration: int

maximum number of iteration to optimize the SOM

som_shape: Tuple[int, int]

shape of the map, defines number and similarity structure of modules

sigma: float

the radius of the different neighbors in the SOM

learning_rate: float

optimization speed, how much weights are adjusted during each iteration

random_seed: int

random seed to make the results reproducible

Returns:

Dict: key is module index, value is time bin data

treasmo.pl module#

treasmo.pl.LocalCor_Heatmap(mudata, pairs, groupby, cluster=True, save=None, **kwds)[source]#

Function to visualize the local L matrix by heatmap, and cluster features

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
pairs: List, numpy.array: List of gene-peak pair names. Can be selected from muData.uns['Local_L_names']
cluster: bool: cluster features or not
groupby: str: group cells by the label saved in mudata.obs
save: str, default is None: if provided, heatmap will be saved in the file path provided
**kwds: other arguments for sc.pl.heatmap()

treasmo.pl.visualize_marker(mudata, gene, peak, mods=['rna', 'atac'], cmaps='plasma', basis='umap', vmins=None, vmaxs=None, figsize=None, save=None, **kwds)[source]#

Function to visualize the gene-peak pair correlation in an embedding of the user's choice, e.g., UMAP.

It returns 3 plots: gene expression, peak accessibility, and gene-peak correlation strength

Parameters:

mudata: MuData: single-cell multi-omics data saved as MuData object
gene: str: gene name
peak: str: peak name
mods: List[str, str]: scRNA-seq and scATAC-seq modality name in MuData object
cmaps: str, List: Color map to use for continuous variables. Could be either a single color_map or a list
basis: str, List: the embeddings to plot. Could be either a single embedding space or a list for each of the feature
vmins: float, List: min value to color. Could be either a single value or a list for gene, peak, and correlation
vmaxs: float, List: max value to color. Same as vmins
figsize: Tuple(int, int): figure size
save: str: if provided, heatmap will be saved in the file path provided
**kwds: other arguments for sc.pl.embedding

Returns:

Embedding colored by the gene, peak, and the correlation between gene and peak

treasmo.pl.PathDynamics(mudata, ident, path, gene, peaks=None, xlim=None, ylim=None, title=None, title_fontsize=15, ticks_fontsize=12, x_label='Pseudotime', y_label='Correlation Strength', label_fontsize=12, curve_colors=None, dot_size=5, linewidth=3, ident_colors=None, show_legend=True, save=None)[source]#

Function to visualize the gene-peak pair correlation changes along pseudotime + cell type proportion visualization

Note

To visualize the results, treasmo.ds.PathDynamics() must be run first.

Parameters:

mudata: MuData

single-cell multi-omics data saved as MuData object. It must have correlation strength index calculated.

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

gene: str

gene name

peaks: List, numpy.array

list of peak names to be paired with the gene

xlim: Tuple[float, float]

(min, max), the pseudotime range

ylim: Tuple[float, float]

(min, max), the correlation range limit, useful to remove outliers

title: str

Plot title

(title/ticks/label)_fontsize: int

fontsize of plot title, ticks and label

(x/y)_label: str

labels for x/y axis

curve_colors: List, numpy.array

curve colors for each of the gene-peak pair correlation;

if not specified, default color palette will be applied.

dot_size: int, float

dot size in plot

linewidth: int, float

curve width

ident_colors: List, numpy.array

colors of each cluster to be plotted in the proportion bar;

If not specified, function will look for uns[IDENT_colors] first;

If not found, default color palette will be applied.

show_legend: bool

Show color legend or not

save: str

if provided, heatmap will be saved in the file path

treasmo.pl.DynamicSumMtx(mudata, ident, path, gene, peaks=None, feature_colors=None, show_legend=True, save=None, **kwds)[source]#

Function to plot regulatory element relationships in heatmap by Spearman correlation

Parameters:

mudata: MuData

single-cell multi-omics data saved as MuData object

It must have correlation strength index calculated.

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

gene: str

gene name

peaks: List, numpy.array

list of peak names to be paired with the gene

feature_colors: List, numpy.array

list of colors for all the gene-peak pairs

show_legend: bool

whether or not to show figure legend

save: str

if provided, heatmap will be saved in the file path provided

**kwds

other arguments for sc.pl.embedding

treasmo.pl.DynamicModule(mudata, somDict, prpDfin, xlim=None, ylim=None, split=False, n_cols=3, title=None, title_fontsize=15, ticks_fontsize=12, x_label='Pseudotime', y_label='Correlation Strength', label_fontsize=12, curve_colors=None, dot_size=5, linewidth=3, ident_colors=None, show_legend=True, save=None)[source]#

Function to plot the gene-peak modules found in the trajectory by treasmo.ds.DynamicModule

Parameters:

mudata: MuData

single-cell multi-omics data saved as MuData object

Run treasmo.ds.DynamicModule beforehand.

somDict: dict

output from treasmo.ds.DynamicModule containing modules found

prpDfin: DataFrame

output from treasmo.ds.TimeBinProportion containing the proportion changes

ident: str

column name in mudata.obs containing trajectory group labels

path: List

list of clusters ordered by their sequence on the trajectory. A path here should have no branch.

xlim: Tuple[float, float]

(min, max), the pseudotime range

ylim: Tuple[float, float]

(min, max), the correlation range limit, useful to remove outliers

title: str

Plot title

(title/ticks/label)_fontsize: int

fontsize of plot title, ticks and label

(x/y)_label: str

labels for x/y axis

curve_colors: List, numpy.array

Curve colors for each of the gene-peak pair correlation;

If not specified, default color palette will be applied.

dot_size: int, float

Dot size in plot

linewidth: int, float

curve width

ident_colors: List, numpy.array

Colors of each cluster to be plotted in the proportion bar;

If not specified, function will look for uns[IDENT_colors] first;

If not found, default color palette will be applied.

show_legend: bool

Show color legend or not

save: str

If provided, heatmap will be saved in the file path

treasmo.lee_vec module#

class treasmo.lee_vec.Spatial_Pearson(connectivity=None, permutations=999)[source]#

Global Spatial Pearson Correlation Statistic; Mean of single-cell gene-peak correlation strength index.

Adapted and vectorized from pysal/esda

Methods

`fit`(X, Y[, percent, max_RAM, seed])	bivariate spatial pearson's R based on Eq. 18 of Lee (2001).
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_fit_request`(*[, max_RAM, percent, seed])	Request metadata passed to the `fit` method.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, Y, percent=0.1, max_RAM=16, seed=1)[source]#

bivariate spatial pearson’s R based on Eq. 18 of Lee (2001).

Parameters:

X: numpy.ndarray [n x p]: array containing continuous data
Y: numpy.ndarray [n x p]: array containing continuous data
percent: float: percentage of cells to shuffle during permutation. The default of 0.1 is a good choice in most cases.
seed: int: random seed to make the results reproducible
max_RAM: int, float: maximum limitation of memory (Gb)

Returns:

the fitted estimator.

Notes

Technical details and derivations can be found in Lee (2001).

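set_fit_request(*, max_RAM: bool | None | str = '$UNCHANGED$', percent: bool | None | str = '$UNCHANGED$', seed: bool | None | str = '$UNCHANGED$') → Spatial_Pearson#

Request metadata passed to the fit method.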

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

max_RAM: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for max_RAM parameter in fit.
percent: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for percent parameter in fit.
seed: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for seed parameter in fit.

Returns:

self: object: The updated object.

class treasmo.lee_vec.Spatial_Pearson_Local(connectivity=None, permutations=999)[source]#

Single-cell gene-peak correlation strength index.

Adapted and vectorized from esda library

Methods

`fit`(X, Y[, max_RAM, seed])	bivariate local pearson's R based on Eq. 22 in Lee (2001)
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_fit_request`(*[, max_RAM, seed])	Request metadata passed to the `fit` method.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, Y, max_RAM=16, seed=1)[source]#

bivariate local pearson’s R based on Eq. 22 in Lee (2001)

Parameters:

X: numpy.ndarray [n x p]: array containing continuous data
Y: numpy.ndarray [n x p]: array containing continuous data
seed: int: random seed to make the results reproducible
max_RAM: int, float: maximum limitation of memory (Gb)

Returns:

the fitted estimator.

set_fit_request(*, max_RAM: bool | None | str = '$UNCHANGED$', seed: bool | None | str = '$UNCHANGED$') → Spatial_Pearson_Local#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

max_RAM: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for max_RAM parameter in fit.
seed: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for seed parameter in fit.

Returns:

self: object: The updated object.

treasmo.moran_vec module#

class treasmo.moran_vec.Moran(n, w, transformation='r', permutations=0)[source]#

Moran’s I Global Autocorrelation Statistic

Adapted from pysal/esda

Parameters:

n: int: number of observations
w: W: spatial weights instance
transformation: string: weights transformation, default is row-standardized “r”. Other options include “B”: binary, “D”: doubly-standardized, “U”: untransformed (general weights), “V”: variance-stabilizing.
permutations: int: number of random permutations for calculation of pseudo-p_values

Notes

Technical details and derivations can be found in Cliff and Ord (1981).

Attributes:

w: W: original w object
permutations: int: number of permutations
I: array: value of Moran’s I
sim: array: (if permutations>0) vector of I values for permuted samples
p_sim: array: (if permutations>0) p-value based on permutations (one-tailed). Null: spatial randomness. Alternative: the observed I is extreme if it is either extremely greater or extremely lower than the values obtained based on permutations.

Methods

calc_i(X[, seed, max_RAM])

Function to calculate Moran's I of all features

calc_i(X, seed=1, max_RAM=16)[source]#

Function to calculate Moran’s I of all features

Parameters:

X: n x p array: log-transformed feature matrix
seed: int: random seed to make the results reproducible
max_RAM: int, float: maximum limitation of memory (Gb)

Returns:

the fitted estimator.
