Differential Expression Analysis
There are two main functions for differential expression analysis, both ported from edgeR.
build_dgelist takes as input a pandas DataFrame of expression counts, with samples as rows and genes as columns. It returns a DGEList (hence the function's name) and the log2-normalized TMM matrix. The DGEList is the input for diff_exp, which fits a quasi-likelihood GLM (glmQL) [Robinson2010] and returns the results of the differential expression analysis. diff_exp also needs a pandas DataFrame describing which group each sample belongs to, with samples as rows and a single column, group, holding the group number.
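To give some intuition for the normalized matrix returned by build_dgelist, here is a minimal NumPy/pandas sketch of TMM (trimmed mean of M-values) normalization followed by log2 counts-per-million. It follows the procedure of edgeR's calcNormFactors (30% trim on log-ratios, 5% trim on average expression, precision-weighted mean) and adds a library-size-scaled pseudocount similar to edgeR's cpm(log=TRUE, prior.count=2). The functions tmm_factor, tmm_norm_factors and log2_cpm are hypothetical illustrations, not part of tapir; tapir's own implementation and pseudocount may differ.

```python
import numpy as np
import pandas as pd


def tmm_factor(obs, ref, log_ratio_trim=0.3, sum_trim=0.05):
    """TMM scaling factor of one sample (obs) relative to a reference sample (ref)."""
    n_obs, n_ref = obs.sum(), ref.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        m = np.log2((obs / n_obs) / (ref / n_ref))               # gene-wise log-ratios (M)
        a = 0.5 * (np.log2(obs / n_obs) + np.log2(ref / n_ref))  # average log-expression (A)
        v = (n_obs - obs) / n_obs / obs + (n_ref - ref) / n_ref / ref  # approx. variance of M
    fin = np.isfinite(m) & np.isfinite(a)  # drops genes with a zero count in either sample
    m, a, v = m[fin], a[fin], v[fin]
    if m.size == 0 or np.max(np.abs(m)) < 1e-6:
        return 1.0
    n = m.size
    lo_m = np.floor(n * log_ratio_trim) + 1
    hi_m = n + 1 - lo_m
    lo_a = np.floor(n * sum_trim) + 1
    hi_a = n + 1 - lo_a
    rank_m = pd.Series(m).rank().to_numpy()  # average ranks for ties
    rank_a = pd.Series(a).rank().to_numpy()
    keep = (rank_m >= lo_m) & (rank_m <= hi_m) & (rank_a >= lo_a) & (rank_a <= hi_a)
    if not keep.any():
        return 1.0
    # precision-weighted mean of the retained log-ratios
    f = np.sum(m[keep] / v[keep]) / np.sum(1.0 / v[keep])
    return float(2.0 ** f)


def tmm_norm_factors(counts: pd.DataFrame) -> pd.Series:
    """TMM normalization factors; counts has samples as rows and genes as columns."""
    x = counts.to_numpy(dtype=float).T             # genes x samples, as in edgeR
    x = x[(x > 0).any(axis=1)]                     # ignore genes with zero counts everywhere
    lib_size = x.sum(axis=0)
    f75 = np.quantile(x, 0.75, axis=0) / lib_size  # upper-quartile fraction per sample
    if np.median(f75) < 1e-20:
        ref_col = int(np.argmax(np.sqrt(x).sum(axis=0)))
    else:
        # reference: sample whose upper quartile is closest to the mean upper quartile
        ref_col = int(np.argmin(np.abs(f75 - f75.mean())))
    f = np.array([tmm_factor(x[:, j], x[:, ref_col]) for j in range(x.shape[1])])
    f = f / np.exp(np.mean(np.log(f)))             # rescale so the factors multiply to 1
    return pd.Series(f, index=counts.index, name="norm_factor")


def log2_cpm(counts: pd.DataFrame, norm_factors: pd.Series, prior_count: float = 2.0) -> pd.DataFrame:
    """log2 counts-per-million using TMM-adjusted (effective) library sizes."""
    eff_lib = counts.sum(axis=1) * norm_factors     # effective library size per sample
    prior = prior_count * eff_lib / eff_lib.mean()  # pseudocount scaled by library size
    adj = counts.add(prior, axis=0).div(eff_lib + 2 * prior, axis=0)
    return np.log2(adj * 1e6)
```

Applied to the same samples-by-genes DataFrame that build_dgelist expects, log2_cpm(counts, tmm_norm_factors(counts)) returns a matrix of the same shape. Samples that differ only by sequencing depth get identical factors, so TMM corrects only for composition differences beyond library size.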
If enabled, the filter option removes genes whose expression falls below the thresholds set by min_count and min_total_count, which are equivalent to the min.count and min.total.count arguments of edgeR's filterByExpr.
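The filtering rule can be sketched as follows. This is a simplified pandas re-implementation of the logic described for edgeR's filterByExpr, using edgeR's defaults (min_count=10, min_total_count=15, large_n=10, min_prop=0.7); the function filter_by_expr is hypothetical, and whether tapir uses the same defaults or the same rule in every detail is an assumption. A gene is kept if its CPM reaches the CPM equivalent of min_count (relative to the median library size) in at least as many samples as the smallest group, and its total count reaches min_total_count.

```python
import numpy as np
import pandas as pd


def filter_by_expr(counts: pd.DataFrame, group: pd.Series = None, min_count: float = 10,
                   min_total_count: float = 15, large_n: int = 10,
                   min_prop: float = 0.7) -> pd.Series:
    """Boolean mask over genes (columns); counts has samples as rows."""
    lib_size = counts.sum(axis=1)                  # library size per sample
    if group is None:
        min_n = counts.shape[0]                    # all samples treated as one group
    else:
        min_n = int(group.value_counts().min())    # size of the smallest group
    if min_n > large_n:
        # for large groups, require only a proportion of the extra samples
        min_n = large_n + (min_n - large_n) * min_prop
    cpm_cutoff = min_count / np.median(lib_size) * 1e6
    cpm = counts.div(lib_size, axis=0) * 1e6
    tol = 1e-14
    keep_cpm = (cpm >= cpm_cutoff).sum(axis=0) >= min_n - tol
    keep_total = counts.sum(axis=0) >= min_total_count - tol
    return keep_cpm & keep_total
```

The resulting mask can subset the input table before analysis, e.g. counts.loc[:, filter_by_expr(counts, groups["group"])].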
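from tapir.edger import build_dgelist, diff_exp
# input_table: raw counts, samples as rows and genes as columns
dgelist, tmmlog = build_dgelist(input_table)
# groups: one row per sample, single column "group" holding the group number
de = diff_exp(dgelist, groups, filter=True)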
For each gene, the output includes the log2 fold change, the average log2 CPM across all samples, the quasi-likelihood F-statistic, the p-value and the FDR.
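edgeR's topTags reports Benjamini-Hochberg adjusted p-values as the FDR by default. Assuming the FDR column here is computed the same way (this documentation does not say), the adjustment from p-values can be sketched as below; bh_fdr is a hypothetical helper that mirrors R's p.adjust(method="BH").

```python
import numpy as np


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)[::-1]          # indices from largest to smallest p-value
    ranks = np.arange(n, 0, -1)          # ranks n, n-1, ..., 1 in that order
    # p * n / rank, made monotone by a running minimum from the largest p-value down
    adjusted = np.minimum.accumulate(p[order] * n / ranks)
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)  # cap at 1 and restore the input order
    return out
```

For example, bh_fdr([0.01, 0.04, 0.03, 0.5]) gives approximately [0.04, 0.0533, 0.0533, 0.5]; genes are then typically called differentially expressed when their FDR falls below a chosen cutoff such as 0.05.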
References
[Robinson2010] Robinson M.D., McCarthy D.J., Smyth G.K. (2010). “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data”, Bioinformatics, 26(1), 139-140.