Introduction

TFactS predicts which transcription factors (TFs) are regulated in a biological condition, based on lists of differentially expressed genes (DEGs) obtained from transcriptome experiments. This package, TFactSR, builds on the TFactS concept and extends it, allowing users to perform a TFactS-like enrichment analysis. The package can import and use the original catalogue file from the TFactS website as well as user-defined catalogues for organisms that TFactS does not support (e.g., Arabidopsis).

Methods - Statistical significance

This vignette is largely based on the TFactS manual. For details about TFactS, see the original paper by Essaghir et al. (2010).

P-value

Briefly, the current package assumes a Sign-Less catalogue, i.e., one that contains no information on regulation type (up- or down-regulation). TFactSR compares the list of query DEGs (up and/or down) with a catalogue of target gene signatures. The core algorithm is Fisher’s exact test applied to the following contingency table:

TF                 | DEGs: Present | DEGs: Absent  | Total
Catalogue: Present | k             | m - k         | m
Catalogue: Absent  | n - k         | N + k - n - m | N - m
Total              | n             | N - n         | N

  • \(m\) is the number of target genes annotated for the TF under consideration
  • \(n\) is the number of query genes
  • \(N\) is the number of regulations in the catalogue
  • \(k\) is the number of query genes that are annotated as regulated by the TF (i.e., the intersection between the query and the TF signature)

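The p-value is the one-sided (upper-tail) probability of observing at least \(k\) annotated query genes, summed over \(i\) from \(k\) to \(\min(n, m)\):

\[ Pval = \sum_{i=k}^{\min(n,m)} \left( \begin{array}{c} m \\ i \end{array} \right) \left( \begin{array}{c} N-m \\ n-i \end{array} \right) / \left( \begin{array}{c} N \\ n \end{array} \right) \]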
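
The test on the contingency table above can be sketched in base R, without the package. The sketch computes the upper-tail probability P(X ≥ k) in two equivalent ways, with phyper() and with fisher.test(). The values of m, n, N and k are taken from the FOXO3 row of the example output in this vignette; whether the package calls either function internally is an assumption, but if it follows this definition the result should be close to the p-value reported for FOXO3.

```r
# Values taken from the FOXO3 row of the example output in this vignette.
m <- 78    # target genes annotated for the TF
n <- 18    # query genes
N <- 6838  # regulations in the catalogue
k <- 7     # query genes annotated as targets of the TF

# Upper tail of the hypergeometric distribution: P(X >= k).
p_tail <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

# The same one-sided test via fisher.test() on the 2 x 2 contingency table.
tab <- matrix(c(k, n - k, m - k, N + k - n - m), nrow = 2,
              dimnames = list(Catalogue = c("Present", "Absent"),
                              DEGs = c("Present", "Absent")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(phyper = p_tail, fisher = p_fisher))
```

Using k - 1 with lower.tail = FALSE is what makes the tail inclusive of k itself, so a TF with k = 0 gets a p-value of exactly 1.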

E-value

E-value is the number of tests done (\(T\)) times the p-value.

\(Eval = Pval \times T\)

BH-corrected P-value

The false discovery rate (FDR) is controlled with the method of Benjamini and Hochberg (1995), computed with the p.adjust() function. Note that the current TFactSR package does not use the Q-value (Storey 2003) under default settings.
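
Both corrections are easy to reproduce in base R. The sketch below takes the six p-values printed in the first results table of this vignette. The number of tests, T = 29, is not stated in the text; it is inferred here from the ratio of the reported E-values to the p-values. Passing n = 29 to p.adjust() further assumes that these six are the smallest of the 29 p-values.

```r
# p-values of the six TFs shown in the first results table of this vignette.
pval <- c(FOXO3 = 5.499085e-10, FOXO1 = 9.012711e-08, FOXO4 = 2.330683e-04,
          IRF9  = 7.877425e-03, STAT1 = 1.092615e-02, SMAD5 = 1.309644e-02)

# Assumption: T inferred from the reported e.value / p.value ratio.
n_tests <- 29

# E-value: p-value times the number of tests.
e_value <- pval * n_tests

# Benjamini-Hochberg adjustment; n tells p.adjust() the total number of
# tests, because only the top six p-values are available here.
fdr_bh <- p.adjust(pval, method = "BH", n = n_tests)

print(signif(cbind(p.value = pval, e.value = e_value, FDR.BH = fdr_bh), 4))
```

Unlike the E-value, the BH-adjusted value divides by the rank of each p-value and is then made monotone, which is why STAT1 and SMAD5 share the same FDR.BH in that table.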

Random Control (RC)

RC is the percentage of random simulations in which a TF is called significant at a given E-value threshold (\(\lambda\)). User lists are randomly simulated for a specified number of repetitions (\(rep\)):

\[ RC_{(TF)} = \frac{\#\left\{ Eval(TF) \leq \lambda \right\} \times 100} { \#\left\{rep\right\} } \]
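
The idea behind RC can be sketched with a toy simulation in base R. Everything in the sketch is an assumption made for illustration: the toy catalogue, the helper names tf_pvalues() and random_control(), the choice to draw random query lists from the unique target genes of the catalogue, and the defaults for the threshold and the number of repetitions. It is not the package's implementation.

```r
set.seed(1)  # fixed seed so the toy simulation is reproducible

# Hypothetical toy catalogue: one row per TF-target regulation.
toy_catalogue <- data.frame(
  TF     = rep(c("TF_A", "TF_B", "TF_C"), times = c(20, 50, 30)),
  target = paste0("gene", c(1:20, 11:60, 51:80))
)

# One-sided hypergeometric p-value for every TF, given a query gene list.
tf_pvalues <- function(query, catalogue) {
  N <- nrow(catalogue)                      # regulations in the catalogue
  n <- length(query)                        # query genes
  targets_by_tf <- split(catalogue$target, catalogue$TF)
  vapply(targets_by_tf, function(targets) {
    m <- length(targets)                    # targets annotated for this TF
    k <- length(intersect(query, targets))  # overlap with the query
    phyper(k - 1, m, N - m, n, lower.tail = FALSE)
  }, numeric(1))
}

# Percentage of random query lists in which each TF passes the E-value cutoff.
random_control <- function(catalogue, n_query, lambda = 0.05, n_rep = 100) {
  universe <- unique(catalogue$target)
  n_tests  <- length(unique(catalogue$TF))
  hits <- replicate(n_rep, {
    random_query <- sample(universe, n_query)
    tf_pvalues(random_query, catalogue) * n_tests <= lambda
  })
  rowSums(hits) * 100 / n_rep
}

rc <- random_control(toy_catalogue, n_query = 10)
print(rc)
```

A TF with a high RC is called significant even for random gene lists, so a small E-value for that TF in a real analysis deserves more caution.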

Prerequisites

The TFactSR package requires (1) a list of DEGs and (2) a catalogue of interest. For Arabidopsis, we prepared the catalogue based on AtRegNet and ATRM. For human data, the package can run the calculation with its default settings.

The organisms supported by the original TFactS are human, rat and mouse. As shown below, given a list of DEGs and a catalogue, you can run an enrichment analysis to find which TFs are regulated.

Getting started

For human/rat/mouse data, we can do the TFactS analysis as follows.

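# Load the package and the bundled example data
library(TFactSR)
data(DEGs)      # query list of differentially expressed genes
data(catalog)   # TF-target catalogue

# Extract the TFs and all target genes used for the test
tftg <- extractTFTG(DEGs, catalog)
TFs <- tftg$TFs
all.targets <- tftg$all.targets

# Run the enrichment calculation and show the first rows
res <- calculateTFactS(DEGs, catalog, TFs, all.targets)
head(res)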
##      TFs   m  n    N k      p.value      e.value       FDR.BH RC
## 8  FOXO3  78 18 6838 7 5.499085e-10 1.594735e-08 1.594735e-08  1
## 7  FOXO1 161 18 6838 7 9.012711e-08 2.613686e-06 1.306843e-06  2
## 9  FOXO4   9 18 6838 2 2.330683e-04 6.758980e-03 2.252993e-03  0
## 12  IRF9   3 18 6838 1 7.877425e-03 2.284453e-01 5.711133e-02  0
## 23 STAT1  61 18 6838 2 1.092615e-02 3.168585e-01 6.329948e-02  1
## 19 SMAD5   5 18 6838 1 1.309644e-02 3.797969e-01 6.329948e-02  0
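
One way to read this table is to keep only the TFs below an E-value cutoff. The sketch below rebuilds the six printed rows as a plain data frame, so it runs without the package, and filters them. The cutoff of 0.05 is an assumption chosen for illustration, not a documented package default.

```r
# The six rows printed above, re-entered by hand (columns m, n, N, k omitted).
res_head <- data.frame(
  TFs     = c("FOXO3", "FOXO1", "FOXO4", "IRF9", "STAT1", "SMAD5"),
  e.value = c(1.594735e-08, 2.613686e-06, 6.758980e-03,
              2.284453e-01, 3.168585e-01, 3.797969e-01),
  FDR.BH  = c(1.594735e-08, 1.306843e-06, 2.252993e-03,
              5.711133e-02, 6.329948e-02, 6.329948e-02),
  RC      = c(1, 2, 0, 0, 1, 0)
)

# Assumed cutoff for illustration only.
e_cutoff <- 0.05

# Keep the TFs whose E-value passes the cutoff.
significant <- subset(res_head, e.value <= e_cutoff)
print(significant)
```

With this cutoff, only the three FOXO factors remain, and their RC values are low, meaning they were rarely called significant for random lists.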

The “TF.col” and “TG.col” options specify which columns of your catalogue dataset hold the TFs and their target genes. Take care to select the columns that encode the TF-target relationships, as follows.

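# Load the Arabidopsis catalogue and an example query gene list
data(AtCatalog)
data(GenesUp_SH1H)

# Name the catalogue columns that hold the TFs and their target genes
d <- extractTFTG(GenesUp_SH1H, AtCatalog,
                     TF.col = "TF",
                     TG.col = "target.genes")

res <- calculateTFactS(GenesUp_SH1H, AtCatalog, d$TFs, d$all.targets, TF.col = "TF")
head(res)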
##          TFs    m  n     N k    p.value    e.value    FDR.BH RC
## 17 AT3G23250    3 74 18910 1 0.01169456  0.3742258 0.3720668  0
## 15 AT2G47460    6 74 18910 1 0.02325417  0.7441336 0.3720668  0
## 26 AT5G11260  280 74 18910 1 0.66914068 21.4125018 1.0000000  0
## 1  AT1G04370    2 74 18910 0 1.00000000 32.0000000 1.0000000  0
## 2  AT1G09530  649 74 18910 0 1.00000000 32.0000000 1.0000000  0
## 3  AT1G24260 4101 74 18910 0 1.00000000 32.0000000 1.0000000  0
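
For a user-defined catalogue, the call above suggests a table with one row per TF-target relationship, with the TF and target columns named through TF.col and TG.col. The sketch below builds such a table in base R and runs two quick sanity checks. The data-frame layout is an assumption inferred from that call, and all identifiers are hypothetical placeholders, not real annotations.

```r
# Hypothetical long-format catalogue: one row per TF-target relationship.
# Column names mirror the TF.col / TG.col values used above; the IDs are
# placeholders, not real gene annotations.
my_catalogue <- data.frame(
  TF           = c("TF_1", "TF_1", "TF_1", "TF_2", "TF_2"),
  target.genes = c("gene_a", "gene_b", "gene_c", "gene_b", "gene_d")
)

# Hypothetical query list of DEGs.
my_degs <- c("gene_b", "gene_c", "gene_x")

# Sanity check 1: number of annotated targets per TF (m in the Methods).
targets_per_tf <- table(my_catalogue$TF)
print(targets_per_tf)

# Sanity check 2: query genes that appear in the catalogue at all.
overlap <- intersect(my_degs, my_catalogue$target.genes)
print(overlap)
```

If the overlap is empty, the most likely cause is a mismatch in identifier type between the DEG list and the catalogue (for example, gene symbols versus locus IDs).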

Acknowledgments

We thank the Bio”Pack”thon community for helpful discussions. This work was supported by JSPS KAKENHI Grant Numbers 26850024 and 17K07663.

References

  1. Essaghir A, Toffalini F, Knoops L, Kallin A, van Helden J, Demoulin JB: Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data. Nucleic Acids Res. 2010;38(11):e120.
  2. Essaghir A, Demoulin JB: A minimal connected network of transcription factors regulated in human tumors and its application to the quest for universal cancer biomarkers. PLoS One. 2012;7(6):e39666.

Session info

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.6.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Asia/Tokyo
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] TFactSR_0.0.0.9000 BiocStyle_2.28.0  
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.3         cli_3.6.1           knitr_1.43         
##  [4] rlang_1.1.1         xfun_0.40           stringi_1.7.12     
##  [7] purrr_1.0.2         textshaping_0.3.6   jsonlite_1.8.7     
## [10] glue_1.6.2          rprojroot_2.0.3     htmltools_0.5.6    
## [13] ragg_1.2.5          sass_0.4.7          rmarkdown_2.24     
## [16] evaluate_0.21       jquerylib_0.1.4     fastmap_1.1.1      
## [19] lifecycle_1.0.3     yaml_2.3.7          memoise_2.0.1      
## [22] bookdown_0.35       BiocManager_1.30.22 stringr_1.5.0      
## [25] compiler_4.3.1      fs_1.6.3            rstudioapi_0.15.0  
## [28] systemfonts_1.0.4   digest_0.6.33       R6_2.5.1           
## [31] magrittr_2.0.3      bslib_0.5.1         tools_4.3.1        
## [34] pkgdown_2.0.7       cachem_1.0.8        desc_1.4.2