API#
treasmo.tl module#
- treasmo.tl.feature_sparsity(mudata, group_by={})[source]#
Function to add feature sparsity information to MuData.var
The value is defined as
#non-zero values / #cells in each feature.
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- group_by: dict
If provided, calculate per group feature sparsity for each modality
Example: {'rna': 'cell_type', 'atac': 'cell_type'}
- Returns:
- MuData
with var['Frac.all'] and var['Frac.GroupName'] added
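The definition above (non-zero values divided by number of cells, per feature) can be sketched in plain Python. This is an illustration of the statistic only, not treasmo's implementation; the cells-by-features matrix layout, the toy data, and the helper names `fraction_nonzero` and `fraction_nonzero_by_group` are assumptions.

```python
def fraction_nonzero(matrix):
    """For each feature (column), return the fraction of cells (rows) with a non-zero value."""
    n_cells = len(matrix)
    n_features = len(matrix[0])
    return [
        sum(1 for row in matrix if row[j] != 0) / n_cells
        for j in range(n_features)
    ]


def fraction_nonzero_by_group(matrix, labels):
    """Compute the same statistic separately for the cells of each group label."""
    groups = {}
    for row, label in zip(matrix, labels):
        groups.setdefault(label, []).append(row)
    return {label: fraction_nonzero(rows) for label, rows in groups.items()}


# Hypothetical toy count matrix: 4 cells x 3 features.
counts = [
    [0, 2, 1],
    [0, 0, 3],
    [5, 0, 1],
    [0, 0, 2],
]
cell_types = ["T", "T", "B", "B"]  # hypothetical group labels

frac_all = fraction_nonzero(counts)
frac_by_group = fraction_nonzero_by_group(counts, cell_types)
print(frac_all)
print(frac_by_group)
```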
- treasmo.tl.get_gloc_from_atac_data(peaks, split_symbol)[source]#
Helper function to get the genomic locations (including the middle point) of peaks
*No need to call from user end
- treasmo.tl.nearby_peaks(g_array, min_op=1)[source]#
Helper function to get regions overlapped with the gene body.
*No need to call from user end
- Parameters:
- min_op: int
The peak and gene must overlap by at least min_op to be considered overlapping.
- treasmo.tl.peaks_within_distance(genes, peaks, upstream, downstream, ref_gtf_fn, no_intersect=True, id_col='GeneSymbol', split_symbol=['-', '-'])[source]#
Function to annotate genes with nearby peaks
- Parameters:
- genes: List, numpy.array
gene list to be annotated
- peaks: List, numpy.array
candidate peaks list
- upstream: int
include peaks N bp upstream of the gene TSS
- downstream: int
include peaks N bp downstream of the TES
- ref_gtf_fn: str
GTF format file containing gene location information; see the examples at ChaozhongLiu/scGREAT:
Homo_sapiens.GRCh38.104.GeneLoc.Tab.txt
Mus_musculus.GRCm38.100.GeneLoc.Tab.txt
- no_intersect: bool
if the candidate peak lies in another gene’s body, remove the peak or not
- id_col: str
column name in ref_gtf_fn that indicates gene ID
- split_symbol: List[str, str]
how peak location ID is formatted
'chr1-12345-23456': split_symbol=['-', '-']
'chr1:12345-23456': split_symbol=[':', '-']
- Returns:
- DataFrame
contains gene annotation information
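To make the split_symbol convention concrete, here is a minimal sketch of how a peak ID could be split into chromosome, start, and end with the two separators. The helper name `parse_peak_id` is hypothetical and is not part of treasmo; the sketch also assumes the chromosome name itself does not contain the first separator.

```python
def parse_peak_id(peak_id, split_symbol):
    """Split a peak ID into (chromosome, start, end) using two separators.

    split_symbol[0] separates the chromosome from the start coordinate,
    split_symbol[1] separates the start from the end coordinate.
    """
    first, second = split_symbol
    chrom, rest = peak_id.split(first, 1)
    start, end = rest.split(second, 1)
    return chrom, int(start), int(end)


dash_style = parse_peak_id("chr1-12345-23456", ["-", "-"])
colon_style = parse_peak_id("chr1:12345-23456", [":", "-"])
print(dash_style, colon_style)
```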
- treasmo.tl.TFBS_match(genes, peaks, ref_fn, min_overlap=1, split_symbol=['-', '-'])[source]#
Function to annotate TF with binding site regions
- Parameters:
- genes: List, numpy.array
gene list to be annotated
- peaks: List, numpy.array
candidate peaks list
- ref_fn: str
BED format file containing gene binding site location information
see example below from JASPAR TFBS genome track (https://jaspar.genereg.net/genome-tracks/)
chr1 280 298 AGL3 821 -
chr1 309 327 AGL3 823 +
chr1 309 327 AGL3 882 -
chr1 1577 1595 AGL3 823 +
- min_overlap: int
the minimum number of overlapping base pairs between candidate peak and TFBS
- split_symbol: List[str, str]
how peak location ID is formatted
'chr1-12345-23456': split_symbol=['-', '-']
'chr1:12345-23456': split_symbol=[':', '-']
- Returns:
- DataFrame
contains TF annotation information
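The role of min_overlap can be illustrated with a small interval-overlap sketch. It treats coordinates as half-open intervals, which is an assumption, and the helper names `overlap_bp` and `match_tfbs` are hypothetical. The TFBS records reuse the coordinates from the BED example above; the peak is made up.

```python
def overlap_bp(start_a, end_a, start_b, end_b):
    """Number of base pairs shared by two half-open intervals (0 if they are disjoint)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def match_tfbs(peak, tfbs_records, min_overlap=1):
    """Keep TFBS records on the same chromosome that overlap the peak by >= min_overlap bp."""
    chrom, start, end = peak
    return [
        rec for rec in tfbs_records
        if rec[0] == chrom and overlap_bp(start, end, rec[1], rec[2]) >= min_overlap
    ]


tfbs = [
    ("chr1", 280, 298, "AGL3"),
    ("chr1", 309, 327, "AGL3"),
    ("chr1", 1577, 1595, "AGL3"),
]
peak = ("chr1", 300, 320)  # hypothetical candidate peak

hits_loose = match_tfbs(peak, tfbs, min_overlap=1)
hits_strict = match_tfbs(peak, tfbs, min_overlap=12)
print(hits_loose, hits_strict)
```

Raising min_overlap above the 11 bp shared with the second record drops it, so stricter thresholds yield fewer TF annotations.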
- treasmo.tl.PFM2Motif(file_name, out_file, detection_threshold=0)[source]#
Function to convert PFM (Position Frequency Matrix) into Homer Motif file
- Parameters:
- file_name: str
Path to PFM file, see example at https://jaspar.elixir.no/docs/#jaspar-matrix-formats
- out_file: str
Path to output file
- detection_threshold: int, float
See Homer explanation at http://homer.ucsd.edu/homer/motif/creatingCustomMotifs.html
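As a rough illustration of the conversion, the sketch below turns a raw PFM (a header line followed by four count rows for A, C, G, T) into the Homer motif layout (a tab-separated header with consensus, name, and detection threshold, then one row of A/C/G/T probabilities per position). It is not treasmo's code: the function name `pfm_to_homer`, the argmax consensus, the three-decimal formatting, and the toy matrix are all assumptions.

```python
def pfm_to_homer(pfm_text, detection_threshold=0):
    """Convert a raw PFM string into Homer motif text (sketch, A/C/G/T row order assumed)."""
    lines = [ln.strip() for ln in pfm_text.strip().splitlines() if ln.strip()]
    name = lines[0].lstrip(">").strip()
    # Four rows of counts, one per nucleotide in the order A, C, G, T.
    rows = [[float(v) for v in ln.split()] for ln in lines[1:5]]
    # Transpose to one column per motif position and normalize counts to probabilities.
    probs = [[c / sum(col) for c in col] for col in zip(*rows)]
    # Simple consensus: the most probable nucleotide at each position.
    consensus = "".join("ACGT"[p.index(max(p))] for p in probs)
    out = [f">{consensus}\t{name}\t{detection_threshold}"]
    out += ["\t".join(f"{p:.3f}" for p in row) for row in probs]
    return "\n".join(out)


# Hypothetical toy matrix, not a real JASPAR entry.
toy_pfm = """>M0001 toyTF
8 0 0 2
0 10 0 2
2 0 0 6
0 0 10 0
"""

motif = pfm_to_homer(toy_pfm)
print(motif)
```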
- treasmo.tl.peak2HomerInput(peaks, out_file, filetype='peaks', split_symbol=['-', '-'])[source]#
Helper function to convert peak list to Homer accepted input format
- Parameters:
- peaks: List, numpy.array
peaks list to convert
- out_file: str
Path to output file
- filetype: str
File type to generate: 'peaks' or 'bed'
- split_symbol: List[str, str]
how peak location ID is formatted
'chr1-12345-23456': split_symbol=['-', '-']
'chr1:12345-23456': split_symbol=[':', '-']
- treasmo.tl.run_HOMER_motif(peaks, out_dir, prefix, ref_genome, homer_path=None, split_symbol=['-', '-'], size=200)[source]#
Function to run Homer from Python script. It will prepare Homer required input file and output results in the directory specified
- Parameters:
- peaks: List, numpy.array
peaks of interest
- out_dir: str
output directory to save the results
- prefix: str
prefix of all files, folder called out_dir/homer_[prefix] will be created to save all the results
- ref_genome: str
reference genome name, e.g., ‘hg19’, ‘hg38’
- homer_path: str
path to Homer software, if Homer already added to the PATH, argument can be ignored
- split_symbol: List[str, str]
how peak location ID is formatted
'chr1-12345-23456': split_symbol=['-', '-']
'chr1:12345-23456': split_symbol=[':', '-']
- size: int
Homer parameter, the size of the region used for motif finding
- Returns:
- DataFrame
Homer results saved in out_dir/homer_[prefix]
- treasmo.tl.motif_summary(peak_file, homer_dir, motif_index, ref_genome, homer_path=None, size=200)[source]#
Function to extract the peaks related to a motif of interest.
- Parameters:
- peak_file: str
out_dir/prefix.peaks.bed file generated in run_HOMER_motif()
- homer_dir: str
out_dir/homer_prefix in run_HOMER_motif() | Homer output folder
- motif_index: int
index of the motif of interest in homer_dir/knownResults.html
- ref_genome: str
reference genome name, e.g., ‘hg19’, ‘hg38’
- homer_path: str
path to Homer software, if Homer already added to the PATH, argument can be ignored
- size: int
Homer parameter, the size of the region used for motif finding. Please keep it the same as in run_HOMER_motif()
- Returns:
- Motif related peaks information saved in homer_dir/
- DataFrame
contains peak list and motif matching quality information
treasmo.core module#
- treasmo.core.Morans_I(mudata, mods=['rna', 'atac'], seed=1, max_RAM=16)[source]#
Function to calculate Moran’s I for all the features in multiome data
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- mods: List[str, str]
scRNA-seq and scATAC-seq modality name in MuData object
- seed: int
random seed to make the results reproducible
- max_RAM: int
maximum limitation of memory usage (Gb)
- Returns:
- MuData
added with var[‘Morans.I’]
- treasmo.core.Global_L(mudata, pairsDf, mods=['rna', 'atac'], permutations=0, percent=0.1, seed=1, max_RAM=16)[source]#
Function to calculate the global L index (mean of correlation strength index) for all the pairs in multiome data
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- mods: List[str, str]
scRNA-seq and scATAC-seq modality name in MuData object
- pairsDf: pandas.DataFrame
gene-peak pair DataFrame containing pairs to be calculated
self-prepared or called from treasmo.tl.peaks_within_distance / TFBS_match
- permutations: int
Number of permutations for significance test.
Default is 0, meaning no significance test. 999 is a good choice for most cases, but it might take a long time (hours) to finish depending on the number of pairs.
- percent: float
percentage of cells to shuffle during permutation.
In most cases, the default of 0.1 is a good choice.
- seed: int
random seed to make the results reproducible
- max_RAM: int
maximum limitation of memory usage(Gb)
- Returns:
- DataFrame
pairsDf added with extra columns:
Global L results
QC metrics (feature sparsity)
- treasmo.core.Local_L(mudata, genes, peaks, mods=['rna', 'atac'], rm_dropout=False, seed=1, max_RAM=16)[source]#
Function to calculate the single-cell gene-peak correlation strength index for all the pairs in multiome data
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- mods: List[str, str]
scRNA-seq and scATAC-seq modality name in MuData object
- genes, peaks: List, numpy.array
one-to-one lists containing gene and peak names
- rm_dropout: bool
make correlation strength index to be 0 if feature value is 0 (dropout)
- seed: int
random seed to make the results reproducible
- max_RAM: int
maximum limitation of memory usage (Gb)
- Returns:
- MuData
added with Local L matrix and gene-peak pair names in
mudata.uns
- treasmo.core.Pearsonr(mudata, genes, peaks, mods=['rna', 'atac'], p_value=False, seed=1)[source]#
Function to calculate the Pearson correlation between genes and peaks
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- mods: List[str, str]
scRNA-seq and scATAC-seq modality name in MuData object
- genes, peaks: List, numpy.array
one-to-one lists containing gene and peak names
- p_value: bool
perform significance testing or not
- seed: int
random seed to make the results reproducible
- Returns:
- DataFrame
containing Pearson correlation r, p-value and FDR for all gene-peak pairs
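The "one-to-one lists" convention means the i-th gene is paired with the i-th peak. A minimal sketch of that pairing with a hand-written Pearson r follows; it does not call treasmo, and the feature names, the toy values, and the helper `pearson_r` are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-cell values for two genes and two peaks (4 cells each).
expression = {"GeneA": [1.0, 2.0, 3.0, 4.0], "GeneB": [4.0, 3.0, 2.0, 1.0]}
accessibility = {
    "chr1-100-200": [2.0, 4.0, 6.0, 8.0],
    "chr1-300-400": [1.0, 2.0, 3.0, 4.0],
}

# One-to-one lists: genes[i] is tested against peaks[i].
genes = ["GeneA", "GeneB"]
peaks = ["chr1-100-200", "chr1-300-400"]

results = [
    (g, p, pearson_r(expression[g], accessibility[p]))
    for g, p in zip(genes, peaks)
]
print(results)
```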
treasmo.ds module#
- treasmo.ds.FindAllMarkers(mudata, ident, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1)[source]#
Function to discover regulatory gene-peak markers in all groups. Target group correlation strength is compared with all remaining groups by t-test.
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- ident: str
column name in mudata.obs containing group labels
- mods: List[str, str]
scRNA-seq and scATAC-seq modality name in MuData object
- corrct_method: str
multi-test correction method, one of [‘bonferroni’, ‘fdr’]
- seed: int
random seed to make the results reproducible
- Returns:
- DataFrame
Differentially regulated pairs statistical test results
- treasmo.ds.FindMarkers(mudata, ident, group_1, group_2, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1, log=True)[source]#
Function to compare regulatory gene-peak pairs between two groups by t-test.
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- ident: str
column name in mudata.obs containing group labels
- group_1: str, int
first group name in ident to compare with the second
- group_2: str, int
second group name in ident to compare with the first
- mods: List[str, str]
scRNA-seq and scATAC-seq modality name in MuData object
- corrct_method: str
multi-test correction method, one of [‘bonferroni’, ‘fdr’]
- seed: int
random seed to make the results reproducible
- Returns:
- DataFrame
Differentially regulated pairs statistical test results
- treasmo.ds.marker_test(local_L_df, group_1, group_2=None, corrct_method='bonferroni')[source]#
Function unit to perform statistical test with given data. No need to be called from user end.
- treasmo.ds.group_stat(local_L, group_1, group_2=None)[source]#
Function unit to calculate group statistics. No need to be called from user end.
- treasmo.ds.add_feature_sparsity(stat_df, mudata, group, mods=['rna', 'atac'])[source]#
Helper function to add feature sparsity information in final group comparison results.
- treasmo.ds.MarkerFilter(statDf, min_pct_rna=0.1, min_pct_atac=0.05, mean_diff=1.0, p_cutoff=1e-12, plot=False)[source]#
Function to filter markers from statistical test results by sparsity, correlation difference, and p-value
- Parameters:
- statDf: DataFrame
Differentially regulated pairs statistical test results
- min_pct_rna: float
sparsity filter cutoff: percentage of cells that express the gene
- min_pct_atac: float
sparsity filter cutoff: percentage of cells that have the peak
- mean_diff: float
mean correlation strength difference between the group and background (all other groups)
- p_cutoff: float
adjusted p-value cutoff
- plot: bool
if True, plot volcano plot
- Returns:
- DataFrame
Filtered marker list with the same columns as statDf
if plot==True, also return volcano plot
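The three filters (sparsity, correlation difference, adjusted p-value) can be pictured as a row-wise predicate. The sketch below uses plain dictionaries instead of a DataFrame, and the column names `pct_rna`, `pct_atac`, `mean_diff`, and `p_adj` are hypothetical since the real column names are not given here. Whether the comparisons are strict or inclusive is also an assumption.

```python
def filter_markers(rows, min_pct_rna=0.1, min_pct_atac=0.05,
                   mean_diff=1.0, p_cutoff=1e-12):
    """Keep rows that pass all sparsity, mean-difference, and p-value cutoffs."""
    return [
        row for row in rows
        if row["pct_rna"] >= min_pct_rna        # enough cells express the gene
        and row["pct_atac"] >= min_pct_atac     # enough cells have the peak
        and row["mean_diff"] >= mean_diff       # group vs background difference
        and row["p_adj"] < p_cutoff             # adjusted p-value
    ]


# Hypothetical test results for four gene-peak pairs.
stats = [
    {"name": "pair1", "pct_rna": 0.50, "pct_atac": 0.20, "mean_diff": 1.5, "p_adj": 1e-20},
    {"name": "pair2", "pct_rna": 0.05, "pct_atac": 0.20, "mean_diff": 1.5, "p_adj": 1e-20},
    {"name": "pair3", "pct_rna": 0.50, "pct_atac": 0.20, "mean_diff": 0.4, "p_adj": 1e-20},
    {"name": "pair4", "pct_rna": 0.50, "pct_atac": 0.20, "mean_diff": 1.5, "p_adj": 1e-3},
]

kept_names = [row["name"] for row in filter_markers(stats)]
print(kept_names)
```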
- treasmo.ds.FindPathMarkers(mudata, ident, path, mods=['rna', 'atac'], corrct_method='bonferroni', seed=1)[source]#
One-to-one comparison of gene-peak correlation among groups in the trajectory path by t-test.
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- ident: str
column name in mudata.obs containing group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- mods: List[str, str]
scRNA-seq and scATAC-seq modality name in MuData object
- corrct_method: str
multi-test correction method, one of [‘bonferroni’, ‘fdr’]
- seed: int
random seed to make the results reproducible
- Returns:
- DataFrame
Differentially regulated pairs statistical test results
- treasmo.ds.TimeBinData(mudata, ident, path, pseudotime, features, bins=100, rm_outlier=False, fitted=None)[source]#
Helper function to generate binned data along trajectory.
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
It must have correlation strength index calculated.
- ident: str
column name in mudata.obs containing trajectory group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- pseudotime: str
column name in mudata.obs containing trajectory pseudotime labels
- features: List, numpy.array
List of gene-peak pair names. Can be selected from mudata.uns['Local_L_names']
- bins: int
number of bins to divide the trajectory into
- rm_outlier: bool
whether to cap the strength index within +- 2*std
- fitted: int, default is None
if an int, return GaussianProcessRegressor fitted data with fitted bins.
if None, return only binned raw data
- Returns:
- DataFrame
binned raw data
Optional: GaussianProcessRegressor fitted data
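A minimal sketch of what binning along pseudotime and the +- 2*std cap could look like. It assumes equal-width bins, a per-bin mean, and a population standard deviation; treasmo's actual binning strategy is not stated here, and the helper names and toy values are hypothetical.

```python
import statistics


def clip_two_std(values):
    """Cap values to the interval [mean - 2*std, mean + 2*std] (population std assumed)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    low, high = mean - 2 * std, mean + 2 * std
    return [min(max(v, low), high) for v in values]


def bin_along_pseudotime(pseudotime, values, bins):
    """Average values inside equal-width pseudotime bins; empty bins give None."""
    t_min, t_max = min(pseudotime), max(pseudotime)
    width = (t_max - t_min) / bins
    sums = [0.0] * bins
    counts = [0] * bins
    for t, v in zip(pseudotime, values):
        # The cell with the largest pseudotime falls into the last bin.
        idx = min(int((t - t_min) / width), bins - 1)
        sums[idx] += v
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical pseudotime and correlation strength values for six cells.
pseudotime = [0.0, 0.1, 0.4, 0.6, 0.9, 1.0]
strength = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]

binned = bin_along_pseudotime(pseudotime, strength, bins=2)
capped = clip_two_std([0.0] * 9 + [10.0])
print(binned, capped[-1])
```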
- treasmo.ds.TimeBinProportion(mudata, ident, path, pseudotime, bins=100)[source]#
Function to calculate binned cell type proportion along trajectory
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- ident: str
column name in mudata.obs containing trajectory group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- pseudotime: str
column name in mudata.obs containing trajectory pseudotime labels
- bins: int
number of bins to divide the trajectory into
- Returns:
- DataFrame
binned data with pseudotime and cell type proportion
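Per-bin cell type proportions can be sketched the same way: assign each cell to a pseudotime bin, then count labels within each bin. Equal-width bins, the helper name `bin_proportions`, and the cluster names are assumptions made for this illustration.

```python
from collections import Counter


def bin_proportions(pseudotime, labels, path, bins):
    """For each equal-width pseudotime bin, return the proportion of each cluster in path."""
    t_min, t_max = min(pseudotime), max(pseudotime)
    width = (t_max - t_min) / bins
    per_bin = [Counter() for _ in range(bins)]
    for t, label in zip(pseudotime, labels):
        if label not in path:
            continue  # ignore cells outside the trajectory path
        idx = min(int((t - t_min) / width), bins - 1)
        per_bin[idx][label] += 1
    table = []
    for counter in per_bin:
        total = sum(counter.values())
        table.append({c: (counter[c] / total if total else 0.0) for c in path})
    return table


# Hypothetical trajectory with three clusters and six cells.
pseudotime = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
labels = ["HSC", "HSC", "Prog", "Prog", "Mono", "Mono"]
path = ["HSC", "Prog", "Mono"]

proportions = bin_proportions(pseudotime, labels, path, bins=2)
print(proportions)
```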
- treasmo.ds.FindPathDynamics(mudata, ident, path, pseudotime, rm_outlier=True, var_cutoff=0.1, range_cutoff=1.0, bins=100, plot=False)[source]#
Detect highly variable gene-peak pairs along the trajectory by correlation strength range (max-min) and variance
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- ident: str
column name in mudata.obs containing trajectory group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- pseudotime: str
column name in mudata.obs containing trajectory pseudotime labels
- bins: int (argument passed to TimeBinData)
number of bins to divide the trajectory into
- rm_outlier: bool (argument passed to TimeBinData)
whether to cap the strength index within +- 2*std
- var_cutoff: float
minimum variance cutoff
- range_cutoff: float
minimum range (max - min) cutoff
- plot: bool
if True, plot volcano plot
- Returns:
- DataFrame
Dynamic gene-peak pairs along the trajectory
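The selection rule described above (range and variance of the binned correlation strength) can be sketched as follows. It assumes both cutoffs must be met and uses the population variance; neither detail is confirmed here. The pair names and the helper `dynamic_pairs` are hypothetical.

```python
import statistics


def dynamic_pairs(binned, var_cutoff=0.1, range_cutoff=1.0):
    """Select pairs whose binned values pass both the variance and the range cutoff."""
    selected = {}
    for name, values in binned.items():
        variance = statistics.pvariance(values)
        value_range = max(values) - min(values)
        if variance >= var_cutoff and value_range >= range_cutoff:
            selected[name] = (variance, value_range)
    return selected


# Hypothetical binned correlation strength along a trajectory (5 bins).
binned = {
    "GeneA~peak1": [0.0, 0.5, 1.0, 1.5, 2.0],   # changes steadily along the path
    "GeneB~peak2": [0.5, 0.5, 0.6, 0.5, 0.5],   # nearly constant
}

selected = dynamic_pairs(binned)
print(selected)
```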
- treasmo.ds.PathDynamics(mudata, gene, peaks, ident, path, pseudotime, bins=100)[source]#
Quantify regulatory dynamics along the trajectory for a single gene and its regulatory elements.
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- gene: str
a single gene name
- peaks: List, numpy.array
a list of peaks correlated with the gene (gene-peak pair should exist in mudata.uns[‘Local_L_names’])
- ident: str
column name in mudata.obs containing trajectory group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- pseudotime: str
pseudotime label for the trajectory saved in mudata.obs
- bins: int
number of bins to divide the trajectory into
- Returns:
- MuData
DataFrame added in mudata.uns[‘pathDym’][path][gene] describing correlation and cluster proportion changes along the trajectory
- treasmo.ds.DynamicModule(mudata, ident, path, pseudotime, features=None, bins=100, fitted=100, num_iteration=5000, som_shape=(2, 2), sigma=0.5, learning_rate=0.1, random_seed=1)[source]#
Function to cluster gene-peak modules by Self-Organizing Map along the trajectory
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- ident: str
column name in mudata.obs containing trajectory group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- pseudotime: str
pseudotime label for the trajectory saved in mudata.obs
- bins: int
number of bins to divide the trajectory into
- fitted: int
number of bins to divide the trajectory into for GaussianProcessRegressor fitted data
bins sets the binned data for later plotting; fitted sets the binned data for clustering. It is recommended to keep the two the same.
- num_iteration: int
maximum number of iteration to optimize the SOM
- som_shape: Tuple[int, int]
shape of the map, defines number and similarity structure of modules
- sigma: float
the radius of the different neighbors in the SOM
- learning_rate: float
optimization speed, how much weights are adjusted during each iteration
- random_seed: int
random seed to make the results reproducible
- Returns:
- Dict
key is module index, value is time bin data
treasmo.pl module#
- treasmo.pl.LocalCor_Heatmap(mudata, pairs, groupby, cluster=True, save=None, **kwds)[source]#
Function to visualize the local L matrix by heatmap, and cluster features
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- pairs: List, numpy.array
List of gene-peak pair names. Can be selected from mudata.uns['Local_L_names']
- cluster: bool
cluster features or not
- groupby: str
group cells by the label saved in mudata.obs
- save: str, default is None
if provided, heatmap will be saved in the file path provided
- **kwds
other arguments for sc.pl.heatmap()
- treasmo.pl.visualize_marker(mudata, gene, peak, mods=['rna', 'atac'], cmaps='plasma', basis='umap', vmins=None, vmaxs=None, figsize=None, save=None, **kwds)[source]#
Function to visualize the gene-peak pair correlation in user wanted embedding. e.g., UMAP.
It returns 3 plots: gene expression, peaks accessibility, and gene-peak correlation strength
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
- gene: str
gene name
- peak: str
peak name
- mods: List[str, str]
scRNA-seq and scATAC-seq modality name in MuData object
- cmaps: str, List
Color map to use for continuous variables. Could be either a single color_map or a list
- basis: str, List
the embeddings to plot. Could be either a single embedding space or a list for each of the feature
- vmins: float, List
min value to color. Could be either a single value or a list for gene, peak, and correlation
- vmaxs: float, List
max value to color. Same as vmins
- figsize: Tuple(int, int)
figure size
- save: str
if provided, heatmap will be saved in the file path provided
- **kwds
other arguments for sc.pl.embedding
- Returns:
- Embedding colored by the gene, peak, and the correlation between gene and peak
- treasmo.pl.PathDynamics(mudata, ident, path, gene, peaks=None, xlim=None, ylim=None, title=None, title_fontsize=15, ticks_fontsize=12, x_label='Pseudotime', y_label='Correlation Strength', label_fontsize=12, curve_colors=None, dot_size=5, linewidth=3, ident_colors=None, show_legend=True, save=None)[source]#
Function to visualize the gene-peak pair correlation changes along pseudotime + cell type proportion visualization
Note
To visualize the results, run treasmo.ds.PathDynamics() first.
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
It must have correlation strength index calculated.
- ident: str
column name in mudata.obs containing trajectory group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- gene: str
gene name
- peaks: List, numpy.array
list of peak names to be paired with the gene
- xlim: Tuple[float, float]
(min, max), the pseudotime range
- ylim: Tuple[float, float]
(min, max), the correlation range limit, useful to remove outliers
- title: str
Plot title
- (title/ticks/label)_fontsize: int
fontsize of plot title, ticks and label
- (x/y)_label: str
labels for x/y axis
- curve_colors: List, numpy.array
curve colors for each of the gene-peak pair correlation;
if not specified, default color palette will be applied.
- dot_size: int, float
dot size in plot
- linewidth: int, float
curve width
- ident_colors: List, numpy.array
colors of each cluster to be plotted in the proportion bar;
If not specified, function will look for uns[IDENT_colors] first;
If not found, default color palette will be applied.
- show_legend: bool
Show color legend or not
- save: str
if provided, heatmap will be saved in the file path
- treasmo.pl.DynamicSumMtx(mudata, ident, path, gene, peaks=None, feature_colors=None, show_legend=True, save=None, **kwds)[source]#
Function to plot regulatory element relationships in heatmap by Spearman correlation
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
It must have correlation strength index calculated.
- ident: str
column name in mudata.obs containing trajectory group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- gene: str
gene name
- peaks: List, numpy.array
list of peak names to be paired with the gene
- feature_colors: List, numpy.array
list of colors for all the gene-peak pairs
- show_legend: bool
whether or not to show figure legend
- save: str
if provided, heatmap will be saved in the file path provided
- **kwds
other arguments for sc.pl.embedding
- treasmo.pl.DynamicModule(mudata, somDict, prpDfin, xlim=None, ylim=None, split=False, n_cols=3, title=None, title_fontsize=15, ticks_fontsize=12, x_label='Pseudotime', y_label='Correlation Strength', label_fontsize=12, curve_colors=None, dot_size=5, linewidth=3, ident_colors=None, show_legend=True, save=None)[source]#
Function to plot the gene-peak modules found in the trajectory by treasmo.ds.DynamicModule
- Parameters:
- mudata: MuData
single-cell multi-omics data saved as MuData object
Run treasmo.ds.DynamicModule beforehand.
- somDict: dict
output from treasmo.ds.DynamicModule containing modules found
- prpDfin: DataFrame
output from treasmo.ds.TimeBinProportion containing proportion changes result
- ident: str
column name in mudata.obs containing trajectory group labels
- path: List
list of clusters ordered by their sequence on the trajectory. A path here should have no branch.
- xlim: Tuple[float, float]
(min, max), the pseudotime range
- ylim: Tuple[float, float]
(min, max), the correlation range limit, useful to remove outliers
- title: str
Plot title
- (title/ticks/label)_fontsize: int
fontsize of plot title, ticks and label
- (x/y)_label: str
labels for x/y axis
- curve_colors: List, numpy.array
Curve colors for each of the gene-peak pair correlation;
If not specified, default color palette will be applied.
- dot_size: int, float
Dot size in plot
- linewidth: int, float
curve width
- ident_colors: List, numpy.array
Colors of each cluster to be plotted in the proportion bar;
If not specified, function will look for uns[IDENT_colors] first;
If not found, default color palette will be applied.
- show_legend: bool
Show color legend or not
- save: str
If provided, heatmap will be saved in the file path
treasmo.lee_vec module#
- class treasmo.lee_vec.Spatial_Pearson(connectivity=None, permutations=999)[source]#
Global Spatial Pearson Correlation Statistic; Mean of single-cell gene-peak correlation strength index.
Adapted and vectorized from pysal/esda
Methods
fit(X, Y[, percent, max_RAM, seed]): bivariate spatial Pearson's R based on Eq. 18 of Lee (2001).
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
set_fit_request(*[, max_RAM, percent, seed]): Request metadata passed to the fit method.
set_params(**params): Set the parameters of this estimator.
- fit(X, Y, percent=0.1, max_RAM=16, seed=1)[source]#
bivariate spatial Pearson's R based on Eq. 18 of Lee (2001).
- Parameters:
- X: numpy.ndarray [n x p]
array containing continuous data
- Y: numpy.ndarray [n x p]
array containing continuous data
- percent: float
percentage of cells to shuffle during permutation. In most cases, the default of 0.1 is a good choice.
- seed: int
random seed to make the results reproducible
- max_RAM: int, float
maximum limitation of memory (Gb)
- Returns:
- the fitted estimator.
Notes
Technical details and derivations can be found in Lee (2001).
- set_fit_request(*, max_RAM: bool | None | str = '$UNCHANGED$', percent: bool | None | str = '$UNCHANGED$', seed: bool | None | str = '$UNCHANGED$') Spatial_Pearson#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- max_RAM: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for max_RAM parameter in fit.
- percent: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for percent parameter in fit.
- seed: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for seed parameter in fit.
- Returns:
- self: object
The updated object.
- class treasmo.lee_vec.Spatial_Pearson_Local(connectivity=None, permutations=999)[source]#
Single-cell gene-peak correlation strength index.
Adapted and vectorized from esda library
Methods
fit(X, Y[, max_RAM, seed]): bivariate local Pearson's R based on Eq. 22 in Lee (2001).
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
set_fit_request(*[, max_RAM, seed]): Request metadata passed to the fit method.
set_params(**params): Set the parameters of this estimator.
- fit(X, Y, max_RAM=16, seed=1)[source]#
bivariate local pearson’s R based on Eq. 22 in Lee (2001)
- Parameters:
- X: numpy.ndarray [n x p]
array containing continuous data
- Y: numpy.ndarray [n x p]
array containing continuous data
- seed: int
random seed to make the results reproducible
- max_RAM: int, float
maximum limitation of memory (Gb)
- Returns:
- the fitted estimator.
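For orientation, the local statistic of Lee (2001) multiplies the spatially lagged, standardized X by the spatially lagged, standardized Y at each observation. The plain-Python sketch below assumes a row-standardized weights matrix, in which case the global L is the mean of the local values. It is an illustration rather than treasmo's vectorized code; the chain-shaped neighbor graph, the data, and all helper names are made up.

```python
import math


def row_standardize(weights):
    """Scale each row of a weights matrix so that it sums to 1."""
    return [[w / sum(row) for w in row] for row in weights]


def standardize(values):
    """Center and scale values using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def spatial_lag(weights, z):
    """Weighted average of the neighbors' values for each observation."""
    return [sum(w * v for w, v in zip(row, z)) for row in weights]


def local_L(weights, x, y):
    """Local bivariate association: product of the two spatial lags per observation."""
    lag_x = spatial_lag(weights, standardize(x))
    lag_y = spatial_lag(weights, standardize(y))
    return [a * b for a, b in zip(lag_x, lag_y)]


# Hypothetical neighbor graph: four cells connected in a chain 0-1-2-3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
W = row_standardize(adjacency)
gene = [1.0, 2.0, 3.0, 4.0]   # hypothetical expression
peak = [2.0, 4.0, 6.0, 8.0]   # hypothetical accessibility

local_values = local_L(W, gene, peak)
global_value = sum(local_values) / len(local_values)
print(local_values, global_value)
```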
- set_fit_request(*, max_RAM: bool | None | str = '$UNCHANGED$', seed: bool | None | str = '$UNCHANGED$') Spatial_Pearson_Local#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- max_RAM: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for max_RAM parameter in fit.
- seed: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for seed parameter in fit.
- Returns:
- self: object
The updated object.
treasmo.moran_vec module#
- class treasmo.moran_vec.Moran(n, w, transformation='r', permutations=0)[source]#
Moran’s I Global Autocorrelation Statistic
Adapted from pysal/esda
- Parameters:
- n: int
number of observations
- w: W
spatial weights instance
- transformation: string
weights transformation, default is row-standardized "r". Other options include "B": binary, "D": doubly-standardized, "U": untransformed (general weights), "V": variance-stabilizing.
- permutations: int
number of random permutations for calculation of pseudo-p_values
Notes
Technical details and derivations can be found in Cliff and Ord (1981).
- Attributes:
- w: W
original w object
- permutations: int
number of permutations
- I: array
value of Moran's I
- sim: array
(if permutations>0) vector of I values for permuted samples
- p_sim: array
(if permutations>0) p-value based on permutations (one-tailed)
null: spatial randomness
alternative: the observed I is extreme if it is either extremely greater or extremely lower than the values obtained based on permutations
Methods
calc_i(X[, seed, max_RAM]): Function to calculate Moran's I of all features
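As a reference point for what the statistic measures, the textbook form of Moran's I for a single feature can be written in a few lines of plain Python. This sketch is not the vectorized calc_i implementation; the function name `morans_I`, the chain-shaped weights, and the data are assumptions.

```python
def morans_I(weights, x):
    """Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)


# Hypothetical row-standardized weights for four cells connected in a chain 0-1-2-3.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]

smooth = morans_I(W, [1.0, 2.0, 3.0, 4.0])         # neighbors have similar values
alternating = morans_I(W, [1.0, -1.0, 1.0, -1.0])  # neighbors have opposite values
print(smooth, alternating)
```

Positive values indicate that neighboring cells carry similar values of the feature, and negative values indicate that neighbors tend to differ.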