Differential Expression Analysis

There are two main functions for differential expression, both ported from edgeR. build_dgelist takes a pandas dataframe of expression counts, with samples as rows and genes as columns. It returns a DGEList and the log2-normalized TMM matrix. The DGEList is then passed to diff_exp, which fits a quasi-likelihood GLM (glmQL) [Robinson2010] and returns the results of the differential expression analysis. diff_exp also requires a pandas dataframe describing the group membership of each sample, with samples as rows and a single column, group, holding the group number. When enabled, the filter option removes genes whose expression falls below the thresholds set by min_count and min_total_count, which are equivalent to the min.count and min.total.count arguments of edgeR's filterByExpr.
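As an illustration of the expected input layout, the following is a minimal sketch of the two dataframes. The sample names, gene names, and count values are made up for the example; only the orientation (samples as rows, genes as columns) and the single group column come from the description above.

```python
import pandas as pd

# Hypothetical raw counts: 4 samples (rows) x 3 genes (columns).
input_table = pd.DataFrame(
    {
        "geneA": [10, 12, 30, 28],
        "geneB": [0, 1, 0, 2],
        "geneC": [150, 160, 80, 75],
    },
    index=["s1", "s2", "s3", "s4"],
)

# One row per sample, sharing the index of input_table,
# with a single column named "group" holding the group number.
groups = pd.DataFrame({"group": [1, 1, 2, 2]}, index=input_table.index)
```

Keeping the same index on both dataframes makes it easy to check that every sample has a group assigned before running the analysis.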

from tapir.edger import build_dgelist, diff_exp

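# input_table: raw counts, samples as rows and genes as columns
dgelist, tmmlog = build_dgelist(input_table)
# groups: samples as rows, single "group" column with the group number
de = diff_exp(dgelist, groups, filter=True)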
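
For intuition about what filter=True does, here is a simplified sketch of the rule documented for edgeR's filterByExpr, written with plain pandas. It is not the tapir implementation: the function name filter_by_expr_sketch and the toy data are hypothetical, the defaults of 10 and 15 mirror edgeR's documented defaults, and the adjustment edgeR applies to large groups is left out.

```python
import pandas as pd

def filter_by_expr_sketch(counts, group, min_count=10, min_total_count=15):
    """Return a boolean Series (indexed by gene) marking genes to keep.

    counts: raw counts, samples as rows and genes as columns.
    group:  Series of group labels indexed like counts.
    """
    lib_size = counts.sum(axis=1)                 # total counts per sample
    cpm = counts.div(lib_size, axis=0) * 1e6      # counts per million
    # Express min_count as a CPM cutoff using the median library size.
    cpm_cutoff = min_count / lib_size.median() * 1e6
    min_samples = group.value_counts().min()      # size of the smallest group
    # A gene must reach the cutoff in at least min_samples samples...
    keep_cpm = (cpm >= cpm_cutoff).sum(axis=0) >= min_samples
    # ...and have enough reads summed over all samples.
    keep_total = counts.sum(axis=0) >= min_total_count
    return keep_cpm & keep_total

# Toy data (hypothetical): 4 samples x 3 genes, two groups of two samples.
counts = pd.DataFrame(
    {"geneA": [10, 12, 30, 28], "geneB": [0, 1, 0, 2], "geneC": [150, 160, 80, 75]},
    index=["s1", "s2", "s3", "s4"],
)
group = pd.Series([1, 1, 2, 2], index=counts.index)

keep = filter_by_expr_sketch(counts, group)
filtered = counts.loc[:, keep]   # geneB is dropped in this toy example
```

The CPM cutoff scales with library size, so the same min_count corresponds to a comparable expression level across datasets sequenced at different depths.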

For each gene, the output reports the log2 fold change, the average log2 CPM across all samples, the quasi-likelihood F-statistic, the p-value, and the FDR.

References

Robinson2010: Robinson M.D., McCarthy D.J., Smyth G.K. (2010). “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data”, Bioinformatics, 26(1), 139-140.