Gene Set Enrichment Analysis
TAPIR includes a number of functions for gene set enrichment analysis,
powered by gseapy. Standard GSEA (gsea), preranked
(prerank) and single-sample (ssgsea)
analyses can all be run with run_gsea, by selecting
the analysis with the type argument.
from tapir.gsets import run_gsea
gsmat = run_gsea(data, subsel=None, type='ssgsea', tmp_path=r'./tmp_gsea')
data needs to contain the expression counts, with samples as rows
and genes as columns. subsel selects which gene sets
should be included in the analysis, which can considerably reduce
the computation time. subsel can be a single string or a list
of strings; gene sets whose name contains any of the strings provided
will be included (e.g. HALLMARK_ will include all hallmark gene sets).
An exhaustive list of gene sets is provided, but a custom file
can be set with ref.
Further flags for gseapy can be provided as keyword arguments.
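The name matching described for subsel can be illustrated with a short sketch. This is not TAPIR's own code: select_gene_sets is a hypothetical helper, and it assumes that matching is case-sensitive and that subsel=None keeps every gene set.

```python
def select_gene_sets(names, subsel=None):
    """Keep the gene set names that contain any of the substrings in subsel.

    Hypothetical helper for illustration only; assumes case-sensitive
    matching and that subsel=None means "keep everything".
    """
    if subsel is None:
        return list(names)
    if isinstance(subsel, str):
        # A single string is treated as a one-element list
        subsel = [subsel]
    return [name for name in names if any(s in name for s in subsel)]


names = ["HALLMARK_HYPOXIA", "HALLMARK_APOPTOSIS", "KEGG_CELL_CYCLE"]
hallmarks = select_gene_sets(names, "HALLMARK_")
mixed = select_gene_sets(names, ["KEGG", "HYPOXIA"])
```

Here hallmarks keeps the two HALLMARK_ entries, while mixed keeps every name containing either KEGG or HYPOXIA.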
Gene set network plots
A plotting function is available to plot a single gene set
as a network. Circles (genes) are connected according to the relative
number of times they appear together in other gene sets (those matching the subselection subsel
within the provided ref file). If expression values are provided (with exp),
the circles can be colour coded by expression. A cutoff can filter out connections below
a certain fraction of the highest connection value observed.
from tapir.plotting import plot_genes_network
plot_genes_network(gset, subsel, exp=None, cutoff=.1, save_file='./net.png')
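As a rough illustration of how the connections and the cutoff relate, the sketch below counts how often each pair of genes appears together in a collection of other gene sets, scales the counts by the highest one, and drops pairs below the cutoff. It is not TAPIR's implementation: cooccurrence_edges and the toy gene sets are hypothetical, and it is an assumption that a connection exactly at the cutoff is kept.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(genes, gene_sets, cutoff=0.1):
    """Relative co-occurrence of gene pairs across gene_sets.

    Hypothetical helper for illustration only. gene_sets maps a gene set
    name to its member genes and should hold the *other* gene sets, not
    the one being plotted. Pairs below cutoff * (highest count) are dropped.
    """
    genes = sorted(set(genes))
    counts = Counter()
    for members in gene_sets.values():
        members = set(members)
        present = [g for g in genes if g in members]
        # Each pair of plotted genes found in this gene set gains one count
        for pair in combinations(present, 2):
            counts[pair] += 1
    if not counts:
        return {}
    top = max(counts.values())
    return {pair: n / top for pair, n in counts.items() if n / top >= cutoff}


# Toy data, made up for the example
other_sets = {
    "SET_A": {"TP53", "MDM2", "CDKN1A"},
    "SET_B": {"TP53", "MDM2"},
    "SET_C": {"TP53", "CDKN1A"},
    "SET_D": {"TP53", "MDM2"},
}
edges = cooccurrence_edges(["TP53", "MDM2", "CDKN1A"], other_sets, cutoff=0.5)
```

With these toy sets, the CDKN1A/MDM2 pair co-occurs once against a maximum of three, so it falls below the 0.5 cutoff and is left out of edges.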