Introduction to the GOxploreR package

Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland

Citation

Kalifa Manjang, Shailesh Tripathi, Olli Yli-Harja, Matthias Dehmer, and Frank Emmert-Streib. 2020. Graph-based exploitation of gene ontology using GOxploreR for scrutinizing biological significance. Scientific Reports 10, 1: 16672. https://doi.org/10.1038/s41598-020-73326-3 [2]

Introduction

The GOxploreR package is an R package that provides a simple and efficient way to communicate with the gene ontology (GO) database. The gene ontology is a major bioinformatics initiative of the Gene Ontology Consortium, whose goal is to categorize the functions of genes and gene products. The ontology is structured into three distinct aspects of gene function: molecular function (MF), cellular component (CC), and biological process (BP). Together these comprise over 45,000 terms and 130,000 relations, and the majority of the information is centered around ten model organisms [1]. In addition, GO includes annotations that link specific gene products to GO-terms. Currently, GO is the most comprehensive and widely used knowledge base of functional information about genes.

This vignette gives an overview of the functionality provided by the GOxploreR package.

The package is freely available on CRAN and can be installed using the following command:

install.packages("GOxploreR")

The package functions can be loaded using:

library(GOxploreR)

Note that the package needs to be installed to be loaded.

Overview of the functionality of the package

The following is a brief description of the package functionality.

Gene2GOTermAndLevel

The Gene2GOTermAndLevel function provides information associated with a list of genes. Given a gene or a list of genes (Entrez gene IDs), an organism, and a domain (BP, MF or CC), the function returns the GO-terms associated with the genes and their respective levels in the DAG. The default domain is BP. For the values accepted by the ‘organism’ argument, see Table .

# Retrieve the cellular component GO-terms and their levels
Gene2GOTermAndLevel(genes = c(10212, 9833, 6713), organism = "Homo sapiens", domain = "CC") 
#>    Entrezgene ID      GO ID Domain Level
#> 1          10212 GO:0005634     CC     5
#> 2          10212 GO:0005737     CC     3
#> 3          10212 GO:0016607     CC     9
#> 4          10212 GO:0016020     CC     2
#> 5          10212 GO:0005654     CC     7
#> 6           9833 GO:0016020     CC     2
#> 7           9833 GO:0005886     CC     3
#> 8           9833 GO:0005938     CC     4
#> 9           9833 GO:0005634     CC     5
#> 10          9833 GO:0005737     CC     3
#> 11          6713 GO:0016020     CC     2
#> 12          6713 GO:0005783     CC     5
#> 13          6713 GO:0005789     CC     7
#> 14          6713 GO:0043231     CC     4
# Retrieve the biological process GO-terms (default domain) and their levels
Gene2GOTermAndLevel(genes = c(100000642, 30592, 58153, 794484), organism = "Danio rerio") 
#>    Entrezgene ID      GO ID Domain Level
#> 1      100000642 GO:0007186     BP     5
#> 2      100000642 GO:0050911     BP     7
#> 3          30592 GO:0045214     BP     9
#> 4          30592 GO:0060047     BP     5
#> 5          30592 GO:0060038     BP     9
#> 6          30592 GO:0048738     BP     7
#> 7          30592 GO:0055005     BP    11
#> 8          30592 GO:0055015     BP    10
#> 9          30592 GO:0055004     BP    11
#> 10         30592 GO:0045823     BP     7
#> 11        794484 GO:0008150     BP     0
# Retrieve the molecular function GO-terms and their levels
Gene2GOTermAndLevel(genes = c(100009600, 18131, 100017), organism = "Mouse", domain = "MF") 
#>    Entrezgene ID      GO ID Domain Level
#> 1      100009600 GO:0008270     MF     7
#> 2      100009600 GO:0043565     MF     5
#> 3      100009600 GO:0046872     MF     5
#> 4      100009600 GO:0003677     MF     4
#> 5          18131 GO:0005515     MF     2
#> 6          18131 GO:0005509     MF     6
#> 7          18131 GO:0038023     MF     2
#> 8          18131 GO:0042802     MF     3
#> 9          18131 GO:0019899     MF     3
#> 10        100017 GO:0005515     MF     2
#> 11        100017 GO:0035650     MF     3
#> 12        100017 GO:0001784     MF     5
#> 13        100017 GO:0005102     MF     3
#> 14        100017 GO:0005546     MF     7
#> 15        100017 GO:0001540     MF     3
#> 16        100017 GO:0050750     MF     5
#> 17        100017 GO:0030159     MF     4
#> 18        100017 GO:0030276     MF     3
#> 19        100017 GO:0035612     MF     3
#> 20        100017 GO:0035591     MF     3
#> 21        100017 GO:0035615     MF     4

Gene2GOTermAndLevel_ON

This function is similar to the Gene2GOTermAndLevel function. The only difference is that it queries the Ensembl database online (ON) for GO-terms, which makes it relatively slow. As a result, the output of Gene2GOTermAndLevel_ON is always up to date, but an internet connection is needed to run it. This function does not support Escherichia coli.

# Retrieve the cellular component GO-terms and their levels
Gene2GOTermAndLevel_ON(genes = c(10212, 9833, 6713), organism = "Homo sapiens", domain ="CC") 

# Retrieve the biological process GO-terms (default domain) and their levels
Gene2GOTermAndLevel_ON(genes = c(100000711, 100000710, 100000277), organism = "Danio rerio") 

# Retrieve the molecular function GO-terms and their levels
Gene2GOTermAndLevel_ON(genes = c(100009609, 100017, 100034361), organism = "Mouse", domain = "MF") 

GOTermXXOnLevel

This function gives the level of a GO-term in the GO-DAG. The results for organism-specific GO-DAGs are the same as for the general GO-DAG. The XX in the name above should be replaced by either BP, MF, or CC.
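The package computes levels internally. To illustrate what a GO-level is, the following base R sketch assigns levels on a small hypothetical DAG. It assumes that the level of a term is the length of the longest path from the root, with the root at level 0. This assumption is consistent with the jump-node definition given for Level2JumpNodeXX, but the package's exact definition and implementation may differ. The edge list has the same (parent, child) shape as the matrix returned by GetDAG. The term IDs ("R", "A", ...) and the helper term_levels() are hypothetical.

```r
# Hypothetical toy GO-DAG as a (parent, child) edge list, like GetDAG() output
edges <- rbind(
  c("R", "A"), c("R", "B"),
  c("A", "C"), c("B", "C"), c("A", "E"),
  c("C", "D"), c("B", "D")
)

# Level of every term, ASSUMED to be the longest path length from the root
term_levels <- function(edges) {
  terms <- unique(c(edges[, 1], edges[, 2]))
  lev <- setNames(rep(NA_integer_, length(terms)), terms)
  roots <- setdiff(edges[, 1], edges[, 2])  # terms that never appear as a child
  lev[roots] <- 0L
  # Relax edges until no level changes; this terminates because the graph is acyclic
  repeat {
    changed <- FALSE
    for (i in seq_len(nrow(edges))) {
      p <- edges[i, 1]
      ch <- edges[i, 2]
      if (!is.na(lev[p]) && (is.na(lev[ch]) || lev[ch] < lev[p] + 1L)) {
        lev[ch] <- lev[p] + 1L
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  lev
}

lev <- term_levels(edges)
lev[c("B", "C", "D")]  # B = 1, C = 2, D = 3
```

Term "D" is reachable from the root in two steps (via "B") and in three steps (via "A" and "C"). Under the longest-path assumption it therefore lands on level 3.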

# Retrieve the level of a GO biological process term
goterms <- c("GO:0009083","GO:0006631","GO:0006629","GO:0014811","GO:0021961")
GOTermBPOnLevel(goterm = goterms)
#>         Term Level
#> 1 GO:0009083     7
#> 2 GO:0006631     7
#> 3 GO:0006629     3
#> 4 GO:0014811    15
#> 5 GO:0021961    15
# Retrieve the level of a GO molecular function term
goterms <- c("GO:0005515","GO:0016835","GO:0046976","GO:0015425","GO:0005261")
GOTermMFOnLevel(goterm = goterms)
#>         Term Level
#> 1 GO:0005515     2
#> 2 GO:0016835     3
#> 3 GO:0046976     8
#> 4 GO:0015425     8
#> 5 GO:0005261     6
# Retrieve the level of a GO cellular component term 
goterms <- c("GO:0055044","GO:0030427","GO:0036436","GO:0034980","GO:0048226")
GOTermCCOnLevel(goterm = goterms)
#>         Term Level
#> 1 GO:0055044     2
#> 2 GO:0030427     2
#> 3 GO:0036436    10
#> 4 GO:0034980     7
#> 5 GO:0048226     7

Level2GOTermXX

This function gives all the GO-terms on a given GO-level. These GO-terms can come from the general GO-DAG or from an organism-specific GO-DAG. If the “organism” argument is given, the GO-terms are taken from that level of the organism’s DAG (the organism must be supported by the package). If no value is given for the “organism” parameter, the general GO-DAG is used (default). The XX in the name should be replaced by either BP, MF, or CC.

# Retrieve all the GO-terms from a particular GO BP level 
Level2GOTermBP(level = 1, organism = "Human")
#>  [1] "GO:0008152" "GO:0032502" "GO:0002376" "GO:0048511" "GO:0043473"
#>  [6] "GO:0040011" "GO:0023052" "GO:0009987" "GO:0000003" "GO:0007610"
#> [11] "GO:0050896"
# Retrieve all the GO-terms from a particular GO MF level
Level2GOTermMF(level = 14, organism = "Rat")
#> [1] "GO:0005391" "GO:0008553" "GO:0086039" "GO:0046961" "GO:1905056"
#> [6] "GO:1905059" "GO:0008900"
# Retrieve all the GO-terms from the general GO CC level
Level2GOTermCC(level = 14)
#> [1] "GO:0030085"

Level2LeafNodeXX

This function gives all the leaf nodes on a particular GO-level. Leaf nodes can also be obtained from an organism-specific GO-DAG. The “organism” parameter is optional; if it is supplied, the leaf nodes on the corresponding level of that organism’s DAG are returned. The default is the general GO-DAG. The XX should be substituted with either BP, MF, or CC.
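The selections behind Level2GOTermXX, Level2LeafNodeXX and Level2NoLeafNodeXX can be sketched in base R on a hypothetical toy DAG. This is a minimal illustration only, not the package's implementation. It assumes a (parent, child) edge list and precomputed levels, and it treats a leaf node as a term that never appears as a parent. All names in the block are hypothetical.

```r
# Hypothetical toy GO-DAG as a (parent, child) edge list, plus levels
# computed as the longest path from the root "R"
edges <- rbind(
  c("R", "A"), c("R", "B"),
  c("A", "C"), c("B", "C"), c("A", "E"),
  c("C", "D"), c("B", "D")
)
lev <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)

# All terms on level k (the idea behind Level2GOTermXX)
terms_on_level <- function(lev, k) names(lev)[lev == k]

# Leaf nodes never occur as a parent (the idea behind Level2LeafNodeXX)
leaf_nodes_on_level <- function(edges, lev, k) {
  setdiff(terms_on_level(lev, k), edges[, 1])
}

# Terms on level k that do have children (the idea behind Level2NoLeafNodeXX)
non_leaf_nodes_on_level <- function(edges, lev, k) {
  intersect(terms_on_level(lev, k), edges[, 1])
}

terms_on_level(lev, 2)                  # "C" "E"
leaf_nodes_on_level(edges, lev, 2)      # "E"
non_leaf_nodes_on_level(edges, lev, 2)  # "C"
```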

# Get all leaf nodes from a GO BP level 
Level2LeafNodeBP(level = 2, organism = "Danio rerio")
#>  [1] "GO:0006807" "GO:0007586" "GO:0032259" "GO:0030431" "GO:0035176"
#>  [6] "GO:0032504" "GO:1990845" "GO:0036268" "GO:0019835" "GO:0045730"
#> [11] "GO:0007624" "GO:0090618"
# Get all leaf nodes from a GO MF level 
Level2LeafNodeMF(level = 12)
#> [1] "GO:1905054" "GO:0098695"
# Get all leaf nodes from a GO CC level 
Level2LeafNodeCC(level = 10, organism = "Schizosaccharomyces pombe")
#>  [1] "GO:1902377" "GO:1990342" "GO:0071957" "GO:0031942" "GO:0000124"
#>  [6] "GO:0005662" "GO:0016586" "GO:0031618" "GO:0034991" "GO:0005751"
#> [11] "GO:1990707" "GO:0016514" "GO:0030126" "GO:0032221" "GO:0005671"
#> [16] "GO:0005749" "GO:0046695" "GO:0005658" "GO:0061499" "GO:0030875"
#> [21] "GO:0036266" "GO:0031262" "GO:0120104" "GO:0071339" "GO:0061496"
#> [26] "GO:0055031" "GO:0030998" "GO:1904834" "GO:0030958" "GO:0120105"
#> [31] "GO:0033698" "GO:0034990" "GO:0000136" "GO:0030127" "GO:0005742"
#> [36] "GO:0043599" "GO:0071627" "GO:0071958" "GO:0061493" "GO:0061497"
#> [41] "GO:0043625" "GO:0031307" "GO:0070692" "GO:0070985" "GO:1990574"
#> [46] "GO:0043505" "GO:0070691" "GO:1990612" "GO:0033551" "GO:0034044"
#> [51] "GO:0120106" "GO:0032865" "GO:0032585" "GO:0005750" "GO:1990537"
#> [56] "GO:1990941" "GO:0042720" "GO:0001401" "GO:0071986" "GO:0042721"
#> [61] "GO:0099616" "GO:0031303" "GO:0031309"

Level2JumpNodeXX

For a given GO-level, this function returns the GO-terms that are jump nodes (JNs). JNs are GO-terms with at least one child term that is not on the level directly below the parent term. If no organism is given, the general GO-DAG is used by default. The XX in the name should be substituted with BP, MF, or CC.

# All jump nodes from the GO BP level
head(Level2JumpNodeBP(level = 2, organism = "Homo sapiens"))
#> [1] "GO:0007155" "GO:0007568" "GO:0048856" "GO:0007154" "GO:0006955"
#> [6] "GO:0045730"
# All jump nodes from the GO MF level
head(Level2JumpNodeMF(level = 3, organism = "Homo sapiens"))
#> [1] "GO:0019239" "GO:0008233" "GO:0035591" "GO:0008047" "GO:0004888"
#> [6] "GO:0042393"
# All jump nodes from the GO CC level
head(Level2JumpNodeCC(level = 7, organism = "Homo sapiens"))
#> [1] "GO:0060205" "GO:0005882" "GO:0042641" "GO:0030139" "GO:0033017"
#> [6] "GO:0030659"

Level2RegularNodeXX

For a given GO-level, this function returns the GO-terms that are regular nodes (RNs). RNs are GO-terms whose child terms all lie on the level directly below the parent’s level. The XX in the name should be substituted with BP, MF, or CC.

# All regular nodes from the BP level
head(Level2RegularNodeBP(level = 9, organism = "Zebrafish"))
#> [1] "GO:0002088" "GO:0060396" "GO:0048663" "GO:0071688" "GO:0014866"
#> [6] "GO:0019229"
# All regular nodes from the MF level
head(Level2RegularNodeMF(level = 7, organism = "Homo sapiens"))
#> [1] "GO:0005244" "GO:0000976" "GO:0016531" "GO:0004725" "GO:0004983"
#> [6] "GO:0004722"
# All regular nodes from the CC level
head(Level2RegularNodeCC(level = 7))
#> [1] "GO:0002102" "GO:0015934" "GO:0015935" "GO:0090533" "GO:0032160"
#> [6] "GO:0032161"

Level2NoLeafNodeXX

This function gives all the GO-terms from a GO-level that are not leaf nodes. Similarly, all non-leaf GO-terms from an organism-specific DAG can be returned by providing the organism of interest. The default is the general GO-DAG. The XX in the name should be substituted with either BP, MF or CC.

# All GO-terms on a particular GO BP level that are not leaf nodes 
Level2NoLeafNodeBP(level = 16, organism = "Homo sapiens")
#>  [1] "GO:0072540" "GO:0014808" "GO:0060314" "GO:0045623" "GO:0051281"
#>  [6] "GO:0051280" "GO:0045625" "GO:0031585" "GO:0045624" "GO:0021966"
# All GO-terms on a particular GO MF level that are not leaf nodes 
Level2NoLeafNodeMF(level = 10, organism = "Caenorhabditis elegans")
#> [1] "GO:0004970" "GO:1904315" "GO:0005283" "GO:0010485" "GO:0015271"
# All GO-terms on a particular GO CC level that are not leaf nodes 
Level2NoLeafNodeCC(level = 12, organism = "Homo sapiens")
#> [1] "GO:0098675"

getGOcategory

Given a GO-term or a list of GO-terms, this function returns the category of the term. The categories are jump nodes (JN), regular nodes (RN) and leaf nodes (LN).
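The three categories follow directly from the definitions given for Level2JumpNodeXX and Level2RegularNodeXX. A leaf node has no children. A regular node has all of its children on the next level. A jump node has at least one child further down. The base R sketch below applies these rules to a hypothetical toy DAG with precomputed longest-path levels. It illustrates the definitions only and is not how getGOcategory is implemented. The helper node_category() is hypothetical.

```r
# Hypothetical toy GO-DAG (parent, child) and precomputed levels
edges <- rbind(
  c("R", "A"), c("R", "B"),
  c("A", "C"), c("B", "C"), c("A", "E"),
  c("C", "D"), c("B", "D")
)
lev <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)

# LN: no children; RN: every child sits on the level directly below;
# JN: at least one child skips a level
node_category <- function(edges, lev) {
  sapply(names(lev), function(term) {
    children <- edges[edges[, 1] == term, 2]
    if (length(children) == 0) return("LN")
    if (all(lev[children] == lev[[term]] + 1)) "RN" else "JN"
  })
}

categories <- node_category(edges, lev)
categories  # R: RN, A: RN, B: JN, C: RN, D: LN, E: LN
```

Term "B" is a jump node because its child "D" lies two levels below it, on level 3.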

goterm <- c("GO:0009083","GO:0006631","GO:0006629","GO:0016835","GO:0046976","GO:0048226")

# Returns the categories of the GO-terms in the list
getGOcategory(goterm = goterm)
#>         Term Category Domain
#> 1 GO:0009083       JN     BP
#> 2 GO:0006631       JN     BP
#> 3 GO:0006629       JN     BP
#> 4 GO:0016835       JN     MF
#> 5 GO:0046976       RN     MF
#> 6 GO:0048226       LN     CC

degreeDistXX

This function obtains the degree distribution of the GO-terms on a GO-level. A bar plot is obtained which shows how many nodes in the GO-level have a certain degree k. The XX in the name should be substituted with either BP, MF, or CC.
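The vignette does not state how the degree k is counted. The base R sketch below therefore assumes that k is the total number of edges a term takes part in, counting both its parents and its children. It tabulates these degrees for the terms on one level of a hypothetical toy DAG and draws the corresponding bar plot. It is an illustration only; degreeDistXX may define the degree differently.

```r
# Hypothetical toy GO-DAG (parent, child) and precomputed levels
edges <- rbind(
  c("R", "A"), c("R", "B"),
  c("A", "C"), c("B", "C"), c("A", "E"),
  c("C", "D"), c("B", "D")
)
lev <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)

# ASSUMED degree: number of edges a term takes part in (parents + children)
deg <- table(c(edges[, 1], edges[, 2]))

level <- 2
level_terms <- names(lev)[lev == level]
deg_on_level <- as.integer(deg[level_terms])  # C: 3, E: 1

# Number of terms on the level with degree k, drawn as a bar plot
deg_dist <- table(k = deg_on_level)
barplot(deg_dist, xlab = "degree k", ylab = "number of GO-terms",
        main = paste("Degree distribution on level", level))
```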

# Degree distribution of the GO-terms on a particular GO BP level
degreeDistBP(level = 4)
Degree distribution of the biological process GO-terms on level 4.

# Degree distribution of the GO-terms on a particular GO MF level
degreeDistMF(level = 2)
Degree distribution of the molecular function GO-terms on level 2.

# Degree distribution of the GO-terms on a particular GO CC level
degreeDistCC(level = 10)
Degree distribution of the cellular component GO-terms on level 10.

GOTermXX2ChildLevel

For a GO-term, this function returns its child terms and their levels. The XX in the name should be substituted with BP, MF, or CC.

# Get the level of a GO BP term's children 
GOTermBP2ChildLevel(goterm = "GO:0007635")
#> $Terms
#> [1] "GO:0007636" "GO:0007637" "GO:0042048" "GO:0061366"
#> 
#> $Level
#> [1] 6 5 4 6
# Get the level of a GO MF term's children 
GOTermMF2ChildLevel(goterm = "GO:0098632")
#> $Terms
#> [1] "GO:0086080" "GO:0098641"
#> 
#> $Level
#> [1] 6 6
# Get the level of a GO CC term's children 
GOTermCC2ChildLevel(goterm = "GO:0071735")
#> $Terms
#> [1] "GO:0071736" "GO:0071737"
#> 
#> $Level
#> [1] 5 7

GetLeafNodesXX

This function gives all the leaf nodes of a certain organism. If the input value is empty or is “BP”, “MF” or “CC”, the general GO-DAG is used by default. The XX in the name should be substituted with BP, MF or CC.

# All leaf nodes from the general GO BP DAG
GetLeafNodesBP("BP")

# All leaf nodes from the Caenorhabditis elegans GO CC DAG
GetLeafNodesCC(organism = "Caenorhabditis elegans")

GO2DecXX

This function returns all descendant nodes of a GO-term. It starts from the GO-term and collects its children, then their children, and so on, until the bottom of the DAG is reached. The XX in the name should be substituted with BP, MF or CC.
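The traversal described above is a breadth-first search over child edges. The following base R sketch runs it on a hypothetical (parent, child) edge list. It keeps track of terms already visited, so that a term reachable along several paths is reported only once. It returns NULL when a term has no descendants, mirroring the GO2DecCC output shown below. The helper descendants() is hypothetical and is not the package's implementation.

```r
# Hypothetical toy GO-DAG (parent, child)
edges <- rbind(
  c("R", "A"), c("R", "B"),
  c("A", "C"), c("B", "C"), c("A", "E"),
  c("C", "D"), c("B", "D")
)

# Breadth-first collection of all descendants of a term
descendants <- function(edges, term) {
  found <- character(0)
  frontier <- term
  while (length(frontier) > 0) {
    children <- unique(edges[edges[, 1] %in% frontier, 2])
    frontier <- setdiff(children, found)  # expand only terms not seen before
    found <- c(found, frontier)
  }
  if (length(found) == 0) NULL else found
}

descendants(edges, "A")  # "C" "E" "D"
descendants(edges, "D")  # NULL: "D" is a leaf
```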

# Biological process GO-term descendant terms
GO2DecBP(goterm = "GO:0044582")
#> [1] "GO:1900497" "GO:1900498" "GO:1900499"
# Molecular function GO-term descendant terms
GO2DecMF(goterm = "GO:0008553")
#> [1] "GO:0008900"
# Cellular component GO-term descendant terms
GO2DecCC(goterm = "GO:0031233")
#> NULL

GetDAG

This function gives the GO-terms in the Gene Ontology as an edgelist corresponding to a directed acyclic graph (DAG) for the GO-terms of a certain organism. This can also be obtained for the general GO-DAG (not organism-specific).

# Represent all the BP gene association GO-terms for human as an edgelist
head(GetDAG(organism = "Human", domain = "BP"))
#>      [,1]         [,2]        
#> [1,] "GO:0008150" "GO:0000003"
#> [2,] "GO:0008150" "GO:0002376"
#> [3,] "GO:0008150" "GO:0007610"
#> [4,] "GO:0008150" "GO:0008152"
#> [5,] "GO:0008150" "GO:0009758"
#> [6,] "GO:0008150" "GO:0009987"
# Represent all the MF gene association GO-terms for Mouse as an edgelist
head(GetDAG(organism = "Mouse", domain = "MF"))
#>      [,1]         [,2]        
#> [1,] "GO:0003674" "GO:0140110"
#> [2,] "GO:0003674" "GO:0003824"
#> [3,] "GO:0003674" "GO:0038024"
#> [4,] "GO:0003674" "GO:0045735"
#> [5,] "GO:0003674" "GO:0005198"
#> [6,] "GO:0003674" "GO:0005215"
# Represent all the CC gene association GO-terms for Caenorhabditis elegans as an edgelist
head(GetDAG(organism = "Caenorhabditis elegans", domain = "CC"))
#>      [,1]         [,2]        
#> [1,] "GO:0005575" "GO:0005622"
#> [2,] "GO:0005575" "GO:0032991"
#> [3,] "GO:0005575" "GO:0110165"
#> [4,] "GO:0005622" "GO:0000151"
#> [5,] "GO:0005622" "GO:0000159"
#> [6,] "GO:0005622" "GO:0000178"

visRDAGXX

The visualization of a GO-DAG is difficult, primarily because these graphs contain thousands of GO-terms. For this reason, we developed a simple method that groups GO-terms with similar characteristics together, which gives a global summary of all GO-terms in the DAG. Every node in the reduced DAG comprises one or more GO-terms. These GO-terms can be accessed via their level and their node category (“RN”, “JN” or “LN”). This is what we call the reduced GO-DAG of an organism.

The function returns a list that contains the GO-terms in each category together with the plot of the reduced DAG. To retrieve only the GO-terms in each category, set the “plot” argument to FALSE. The XX in the name should be substituted with BP, MF or CC. The node label gives the total number of GO-terms in each node. For instance, in Figure , Level 0 (i.e. “L0 RN”) has 1 GO-term in the node category.

The labels “J”, “R” and “L” on the right side of Figure give the number of connections between the regular nodes (RNs) on a level and the nodes directly below it. (RNs are nodes whose children are all on the next level.) For example, on L1 the labels J = 5, R = 9 and L = 6 mean that the RNs on that level have 5 children that are jump nodes (JNs) on L2, 9 children that are regular nodes (RNs), and 6 children that are leaf nodes (LNs).
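The grouping into reduced-DAG nodes and the J/R/L counts can be sketched in base R. The block below works on a hypothetical toy DAG with precomputed levels and categories. It first splits the terms into groups named like "L1 RN", mirroring the names of the list returned by visRDAGXX(plot = FALSE). It then counts the edges that leave the RNs of one level, by the category of the child they reach. This is an illustration of the description above under these assumptions, not the package's plotting code.

```r
# Hypothetical toy GO-DAG with precomputed levels and node categories
edges <- rbind(
  c("R", "A"), c("R", "B"),
  c("A", "C"), c("B", "C"), c("A", "E"),
  c("C", "D"), c("B", "D")
)
lev <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)
category <- c(R = "RN", A = "RN", B = "JN", C = "RN", D = "LN", E = "LN")

# Reduced-DAG nodes: one group per (level, category) pair, named like "L1 RN"
groups <- split(names(lev), paste0("L", lev, " ", category[names(lev)]))
names(groups)  # "L0 RN" "L1 JN" "L1 RN" "L2 LN" "L2 RN" "L3 LN"

# J/R/L labels for level 1: categories of the children reached by the
# edges that leave the RNs on that level
rn_l1 <- groups[["L1 RN"]]
children <- edges[edges[, 1] %in% rn_l1, 2]
jrl <- table(factor(category[children], levels = c("JN", "RN", "LN")))
jrl  # JN 0, RN 1, LN 1
```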

# The GO-terms in each node category of the reduced Caenorhabditis elegans GO-DAG
head(visRDAGMF(organism = "Caenorhabditis elegans", plot = FALSE))
#> $`L4 RN`
#>   [1] "GO:0016504" "GO:0016301" "GO:0000166" "GO:0046872" "GO:0008237"
#>   [6] "GO:0008146" "GO:0004930" "GO:0008168" "GO:0016779" "GO:0003723"
#>  [11] "GO:0004601" "GO:0004842" "GO:0003779" "GO:0008234" "GO:0000981"
#>  [16] "GO:0001664" "GO:0016833" "GO:0016715" "GO:0043022" "GO:0015631"
#>  [21] "GO:0004812" "GO:0004518" "GO:0008236" "GO:0005543" "GO:0008641"
#>  [26] "GO:0019902" "GO:0003713" "GO:0008483" "GO:0016780" "GO:0019789"
#>  [31] "GO:0004659" "GO:0019905" "GO:0032934" "GO:0017022" "GO:0003995"
#>  [36] "GO:0019888" "GO:0031593" "GO:0004175" "GO:0016758" "GO:0070063"
#>  [41] "GO:0016831" "GO:0019887" "GO:0042578" "GO:0016818" "GO:0000104"
#>  [46] "GO:0016747" "GO:0030414" "GO:0015036" "GO:0070025" "GO:0004553"
#>  [51] "GO:0016671" "GO:0001217" "GO:0048018" "GO:0016702" "GO:0016714"
#>  [56] "GO:0015459" "GO:0016298" "GO:0009982" "GO:0030695" "GO:0016620"
#>  [61] "GO:0016868" "GO:0016706" "GO:0016811" "GO:0044390" "GO:0015267"
#>  [66] "GO:0016763" "GO:0002020" "GO:0005246" "GO:0051020" "GO:0016783"
#>  [71] "GO:0008410" "GO:0071568" "GO:0035254" "GO:0051920" "GO:0019900"
#>  [76] "GO:0016653" "GO:0016624" "GO:0005048" "GO:0016884" "GO:0016639"
#>  [81] "GO:0016857" "GO:0016812" "GO:0033558" "GO:0016615" "GO:0019211"
#>  [86] "GO:0140297" "GO:0003909" "GO:0046912" "GO:0016709" "GO:0008484"
#>  [91] "GO:0035014" "GO:0016742" "GO:0016841" "GO:0016836" "GO:0016628"
#>  [96] "GO:0004984" "GO:0008452" "GO:0019212" "GO:0032452" "GO:0016421"
#> [101] "GO:0016814"
#> 
#> $`L5 RN`
#>  [1] "GO:0004672" "GO:0000030" "GO:0016791" "GO:0008528" "GO:0004519"
#>  [6] "GO:0050661" "GO:0008417" "GO:0043565" "GO:0003727" "GO:0016174"
#> [11] "GO:0016972" "GO:0050660" "GO:0061630" "GO:0042626" "GO:0004197"
#> [16] "GO:0004180" "GO:0008017" "GO:0035091" "GO:0019901" "GO:0030295"
#> [21] "GO:0003857" "GO:0051377" "GO:0019843" "GO:0008135" "GO:0004104"
#> [26] "GO:0003729" "GO:0016597" "GO:0004190" "GO:0004177" "GO:0016307"
#> [31] "GO:0000828" "GO:0035673" "GO:0015927" "GO:0004540" "GO:0070491"
#> [36] "GO:0030515" "GO:0051287" "GO:0008374" "GO:0004368" "GO:0004000"
#> [41] "GO:0003725" "GO:0004611" "GO:0017076" "GO:0004407" "GO:0004620"
#> [46] "GO:0019171" "GO:0004864" "GO:0008376" "GO:0008320" "GO:0004527"
#> [51] "GO:0008081" "GO:0008375" "GO:0003997" "GO:0005179" "GO:0004866"
#> [56] "GO:0015165" "GO:0031543" "GO:0036002" "GO:0017069" "GO:0019903"
#> [61] "GO:0031490" "GO:0008378" "GO:0015020" "GO:0003953" "GO:0005160"
#> [66] "GO:0019205" "GO:0070569" "GO:0016462" "GO:0043175" "GO:0008173"
#> [71] "GO:0015929" "GO:0004311" "GO:0004470" "GO:0052742" "GO:0004353"
#> [76] "GO:0008235" "GO:0005092" "GO:0016407" "GO:0003884" "GO:0035596"
#> [81] "GO:0001727" "GO:0018455" "GO:0004731" "GO:0008318" "GO:0004753"
#> [86] "GO:0004576" "GO:0004645" "GO:0072542" "GO:0061629" "GO:0033613"
#> [91] "GO:0034338" "GO:0004084" "GO:0005104" "GO:0046914" "GO:0034979"
#> [96] "GO:0051723" "GO:0008227" "GO:0033744" "GO:0016410"
#> 
#> $`L6 RN`
#>  [1] "GO:0004674" "GO:0000026" "GO:0004376" "GO:0004713" "GO:0005506"
#>  [6] "GO:0008175" "GO:0004652" "GO:0005507" "GO:0043539" "GO:0004521"
#> [11] "GO:0030983" "GO:0008080" "GO:0005243" "GO:0004721" "GO:0004520"
#> [16] "GO:0046873" "GO:0000009" "GO:0052590" "GO:0008251" "GO:0016297"
#> [21] "GO:0016418" "GO:0004396" "GO:0001640" "GO:0004623" "GO:1990837"
#> [26] "GO:0005184" "GO:0008188" "GO:0035250" "GO:0042577" "GO:0000033"
#> [31] "GO:0017136" "GO:0042162" "GO:0008187" "GO:1990782" "GO:0042625"
#> [36] "GO:0035257" "GO:0090599" "GO:0017110" "GO:0004661" "GO:0004647"
#> [41] "GO:0015116" "GO:0008106" "GO:0003988" "GO:0008649" "GO:0004563"
#> [46] "GO:0004022" "GO:0004557" "GO:0042171" "GO:0004712" "GO:0031545"
#> [51] "GO:1901981" "GO:0047429"
#> 
#> $`L7 RN`
#>  [1] "GO:0004714" "GO:0004725" "GO:0003774" "GO:0016279" "GO:0003899"
#>  [6] "GO:0004722" "GO:0030971" "GO:0004707" "GO:0015078" "GO:0015276"
#> [11] "GO:0008138" "GO:0015301" "GO:0000976" "GO:0016274" "GO:0032559"
#> [16] "GO:0004114" "GO:0015171" "GO:0016892" "GO:0004697" "GO:0005244"
#> [21] "GO:0004683" "GO:0009931" "GO:0004127" "GO:0009041" "GO:0016423"
#> [26] "GO:0016888" "GO:0015085" "GO:0051998" "GO:0004708" "GO:0005347"
#> 
#> $`L8 RN`
#>  [1] "GO:0004386" "GO:0009019" "GO:0001046" "GO:0000146" "GO:0000977"
#>  [6] "GO:0018024" "GO:0015179" "GO:0008094" "GO:0005272" "GO:0003777"
#> [11] "GO:0000987" "GO:0015252" "GO:0005262" "GO:0005267" "GO:0005003"
#> [16] "GO:0008556" "GO:0008988" "GO:0008308" "GO:0004596" "GO:0015377"
#> [21] "GO:0008296" "GO:0005384" "GO:0015295" "GO:0008297"
#> 
#> $`L9 RN`
#> [1] "GO:0000978" "GO:0003678" "GO:0004402" "GO:0005245" "GO:1990939"
#> [6] "GO:0015269" "GO:0005249" "GO:0005432"
# RN GO-terms on level 1 can be accessed as follows
visRDAGMF(organism = "Caenorhabditis elegans", plot = FALSE)$"L1 RN"
#> [1] "GO:0003824" "GO:0005198" "GO:0060090"
# JN GO-terms on level 9 can be accessed as follows
visRDAGMF(organism = "Caenorhabditis elegans", plot = FALSE)$"L9 JN"
#> [1] "GO:0015379"
# LN GO-terms on level 11 can be accessed as follows
visRDAGMF(organism = "Caenorhabditis elegans", plot = FALSE)$"L11 LN"
#>  [1] "GO:0005335" "GO:0022848" "GO:0008068" "GO:0043994" "GO:0046972"
#>  [6] "GO:0005332" "GO:0004972" "GO:0005219" "GO:0048763" "GO:0005250"
#> [11] "GO:0005330" "GO:0008511"
# Represent the molecular function GO-DAG for organism Caenorhabditis elegans
visRDAGMF(organism = "Caenorhabditis elegans", plot = TRUE)[["plot"]]
Visualization of a reduced GO-DAG for Caenorhabditis elegans.

visRsubDAGXX

The visRsubDAGXX function is similar to the visRDAGXX function; however, it visualizes an organism-specific sub-GO-DAG. The input of the function is a list of organism-specific GO-terms. If this list does not contain all GO-terms of the organism, the category nodes not covered by the list are faded out. The XX in the function name can be substituted with BP, MF or CC.

Terms <- c("GO:0022403","GO:0000278","GO:0006414","GO:0006415","GO:0006614",
           "GO:0045047","GO:0072599","GO:0006613","GO:0000279","GO:0000087",
           "GO:0070972","GO:0000184","GO:0000280","GO:0007067","GO:0006413",
           "GO:0048285","GO:0006412","GO:0000956","GO:0006612","GO:0019080",
           "GO:0019083","GO:0016071","GO:0006402","GO:0043624","GO:0043241",
           "GO:0006401","GO:0072594","GO:0022904","GO:0019058","GO:0032984",
           "GO:0045333","GO:0006259","GO:0051301","GO:0022900","GO:0006396",
           "GO:0060337","GO:0071357","GO:0034340","GO:0002682","GO:0051320",
           "GO:0045087","GO:0051325","GO:0022411","GO:0016032","GO:0044764",
           "GO:0022415","GO:0051329","GO:0050776","GO:0030198","GO:0043062")

# Visualize the DAG node categories of the given biological process GO-terms
visRsubDAGBP(goterm = Terms, organism = "Human")
Visualization of a reduced sub-GO-DAG of BPs for Human.

distRankingGO

Given a list of GO-terms, the function ranks the GO-terms according to the distance between each GO-term’s hierarchy level and the maximal depth of paths in the GO-DAG passing through that GO-term. The function supports the “BP”, “MF” and “CC” ontologies.

Terms <- c("GO:0000278","GO:0006414","GO:0022403","GO:0006415","GO:0006614",
           "GO:0045047","GO:0072599","GO:0006613","GO:0000184","GO:0070972",
           "GO:0006413","GO:0000087","GO:0000280","GO:0000279","GO:0006612",
           "GO:0000956","GO:0048285","GO:0019080","GO:0019083","GO:0043624",
           "GO:0006402","GO:0032984","GO:0006401","GO:0072594","GO:0019058",
           "GO:0051301","GO:0016071","GO:0006412","GO:0002682","GO:0022411",
           "GO:0001775","GO:0046649","GO:0045321","GO:0050776","GO:0007155",
           "GO:0022610","GO:0060337","GO:0071357","GO:0034340","GO:0016032",
           "GO:0044764","GO:0006396","GO:0010564","GO:0002684","GO:0006259",
           "GO:0051249","GO:0045087")

# Ordering of the GO-terms in the list
distRankingGO(goterm = Terms, domain = "BP", plot = TRUE)
#> Warning in GOTermBPOnLevel(goterm = goterm): Check that the term GO:0043624
#> GO:0022610 GO:0044764 are bp GO-terms and not obsolete
#> Warning in max(GOTermBPOnLevel(goterm = goterms)$Level): no non-missing
#> arguments to max; returning -Inf
#> Warning in max(GOTermBPOnLevel(goterm = goterms)$Level): no non-missing
#> arguments to max; returning -Inf
#> Warning in max(GOTermBPOnLevel(goterm = goterms)$Level): no non-missing
#> arguments to max; returning -Inf
#> Warning in y - cc1: longer object length is not a multiple of shorter object
#> length
#> Warning in y - cc1: longer object length is not a multiple of shorter object
#> length
#> $`GO-terms_ranking`
#>  [1] "GO:0050776" "GO:0045087" "GO:0002682" "GO:0046649" "GO:0000278"
#>  [6] "GO:0048285" "GO:0022411" "GO:0001775" "GO:0045321" "GO:0000280"
#> [11] "GO:0010564" "GO:0006259" "GO:0006402" "GO:0006412" "GO:0071357"
#> [16] "GO:0002684" "GO:0006401" "GO:0072594" "GO:0016071" "GO:0006396"
#> [21] "GO:0051249" "GO:0019080" "GO:0032984" "GO:0051301" "GO:0019058"
#> [26] "GO:0016032" "GO:0006413" "GO:0007155" "GO:0006414" "GO:0070972"
#> [31] "GO:0000956" "GO:0072599" "GO:0006612" "GO:0019083" "GO:0034340"
#> [36] "GO:0022403" "GO:0006415" "GO:0045047" "GO:0060337" "GO:0000279"
#> [41] "GO:0006613" "GO:0000184" "GO:0000087" "GO:0006614" "GO:0043624"
#> [46] "GO:0022610" "GO:0044764"
#> 
#> $indices_of_ranking
#>  [1] 34 47 29 32  1 17 30 31 33 13 43 45 21 28 38 44 23 24 27 42 46 18 22 26 25
#> [26] 40 11 35  2 10 16  7 15 19 39  3  4  6 37 14  8  9 12  5 20 36 41
#> 
#> $distance
#>  [1]   16   16   14   14   13   13   13   13   13   12   12   12   11   11   11
#> [16]   11   10   10   10   10   10    9    9    9    8    8    7    7    6    6
#> [31]    6    5    5    5    5    4    4    4    4    3    2    2    2    1 -Inf
#> [46] -Inf -Inf
#> 
#> $plot
#> Warning: Removed 3 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 3 rows containing missing values or values outside the scale range
#> (`geom_line()`).
The hierarchy levels for a list of GO-terms (y-axis) are shown in purple, and the hierarchy levels of the maximal depth of paths in the GO-DAG passing through these GO-terms are shown in red.

The function returns the ranked GO-terms, the indices of the ranking (relative to the original list), the distance between each GO-term’s hierarchy level and the maximal depth of paths in the GO-DAG passing through it, and a visualisation of the ranking. The GO-terms are ranked according to the distance between the two points (purple and red) shown in Figure . In the output above, the three GO-terms flagged in the warnings as possibly not BP terms or obsolete have no level. They receive a distance of -Inf and are ranked last.
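The ranking idea can be sketched in base R on a hypothetical toy DAG with longest-path levels. Two assumptions are made here: the maximal depth of the paths passing through a term is the deepest level among the term and its descendants, and terms are ordered by decreasing distance, as in the output above. The helper max_depth() and all term IDs are hypothetical. This is not distRankingGO's implementation.

```r
# Hypothetical toy GO-DAG (parent, child) and precomputed levels
edges <- rbind(
  c("R", "A"), c("R", "B"),
  c("A", "C"), c("B", "C"), c("A", "E"),
  c("C", "D"), c("B", "D")
)
lev <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)

# Deepest level reachable from a term, counting the term itself
# (ASSUMED to equal the maximal depth of paths through the term)
max_depth <- function(edges, lev, term) {
  found <- term
  frontier <- term
  while (length(frontier) > 0) {
    frontier <- setdiff(unique(edges[edges[, 1] %in% frontier, 2]), found)
    found <- c(found, frontier)
  }
  max(lev[found])
}

terms <- c("E", "C", "A")
# Distance between the maximal path depth and the term's own level
dist <- sapply(terms, function(t) max_depth(edges, lev, t) - lev[[t]])
ord <- order(dist, decreasing = TRUE)  # largest distance first

list(ranking = terms[ord], indices_of_ranking = ord, distance = unname(dist[ord]))
# ranking "A" "C" "E"; indices 3 2 1; distance 2 1 0
```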

scoreRankingGO

The function scoreRankingGO is similar to the distRankingGO function because both functions order a given list of GO-terms. The difference is that scoreRankingGO ranks the GO-terms according to a score computed using Equation .

The function returns the ranked GO-terms, the indices of the ranking (relative to the original list), the score of each GO-term, and a visualisation of the ranking.

Terms <- c("GO:0000278","GO:0006414","GO:0022403","GO:0006415","GO:0006614",
           "GO:0045047","GO:0072599","GO:0006613","GO:0000184","GO:0070972",
           "GO:0006413","GO:0000087","GO:0000280","GO:0000279","GO:0006612",
           "GO:0000956","GO:0048285","GO:0019080","GO:0019083","GO:0006402",
           "GO:0032984","GO:0006401","GO:0072594","GO:0019058","GO:0051301",
           "GO:0016071","GO:0006412","GO:0002682","GO:0022411","GO:0001775",
           "GO:0046649","GO:0045321","GO:0050776","GO:0007155","GO:0060337",
           "GO:0071357","GO:0034340","GO:0016032","GO:0006396","GO:0010564",
           "GO:0002684","GO:0006259","GO:0051249","GO:0045087")

# Ordering of the GO-terms in the list
scoreRankingGO(goterm = Terms, domain = "BP", plot = FALSE)
#> $GO_terms_ranking
#>  [1] "GO:0016032" "GO:0001775" "GO:0007155" "GO:0051301" "GO:0019080"
#>  [6] "GO:0019058" "GO:0002682" "GO:0045321" "GO:0000278" "GO:0022403"
#> [11] "GO:0050776" "GO:0002684" "GO:0046649" "GO:0022411" "GO:0019083"
#> [16] "GO:0048285" "GO:0010564" "GO:0000279" "GO:0032984" "GO:0006259"
#> [21] "GO:0000280" "GO:0051249" "GO:0006401" "GO:0016071" "GO:0006412"
#> [26] "GO:0000087" "GO:0045087" "GO:0070972" "GO:0072594" "GO:0006396"
#> [31] "GO:0006413" "GO:0006414" "GO:0072599" "GO:0006612" "GO:0006415"
#> [36] "GO:0006402" "GO:0045047" "GO:0034340" "GO:0000956" "GO:0006613"
#> [41] "GO:0071357" "GO:0006614" "GO:0060337" "GO:0000184"
#> 
#> $indices_of_ranking
#>  [1] 38 30 34 25 18 24 28 32  1  3 33 41 31 29 19 17 40 14 21 42 13 43 22 26 27
#> [26] 12 44 10 23 39 11  2  7 15  4 20  6 37 16  8 36  5 35  9
#> 
#> $score
#>  [1] 0.004048583 0.012383901 0.012383901 0.014035088 0.019138756 0.021052632
#>  [7] 0.026315789 0.027863777 0.029605263 0.035087719 0.046783626 0.046783626
#> [13] 0.049535604 0.056140351 0.059210526 0.073099415 0.073099415 0.078947368
#> [19] 0.087719298 0.087719298 0.105263158 0.111455108 0.118421053 0.118421053
#> [25] 0.135338346 0.140350877 0.143274854 0.157894737 0.157894737 0.171929825
#> [31] 0.184210526 0.198380567 0.214912281 0.214912281 0.234449761 0.266447368
#> [37] 0.280701754 0.280701754 0.328947368 0.336842105 0.355263158 0.426315789
#> [43] 0.438596491 0.489878543

prioritizedGOTerms

Given a vector of GO-terms, this function prioritizes the GO-terms by exploiting the structure of the DAG. Starting from the GO-term on the highest level, it iteratively searches the paths to the root node. If the argument “sp” is TRUE, only shortest paths are used; otherwise, all paths are used. Any GO-terms of the input vector that are found along these paths are removed, because the GO-term at the end of a path is more specific than the GO-terms along the path. If an organism is given, the GO-terms of that organism are used for the prioritization. If the organism argument is NULL, all (non-retired) GO-terms of the chosen ontology are used.
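The all-paths variant of the procedure described above (the behaviour described for sp = FALSE) can be sketched in base R. The block below repeatedly takes the remaining input term on the highest level. It then drops every input term that lies on any path from that term to the root. The toy DAG, its levels and the helpers ancestors() and prioritize() are hypothetical, and the shortest-path variant is not shown. This illustrates the description only and is not prioritizedGOTerms itself.

```r
# Hypothetical toy GO-DAG (parent, child) and precomputed levels
edges <- rbind(
  c("R", "A"), c("R", "B"),
  c("A", "C"), c("B", "C"), c("A", "E"),
  c("C", "D"), c("B", "D")
)
lev <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)

# All terms on any path from `term` up to the root
ancestors <- function(edges, term) {
  found <- character(0)
  frontier <- term
  while (length(frontier) > 0) {
    frontier <- setdiff(unique(edges[edges[, 2] %in% frontier, 1]), found)
    found <- c(found, frontier)
  }
  found
}

prioritize <- function(edges, lev, terms) {
  remaining <- terms[order(lev[terms], decreasing = TRUE)]  # most specific first
  kept <- character(0)
  while (length(remaining) > 0) {
    top <- remaining[1]
    kept <- c(kept, top)
    # Remove the chosen term and every input term on its paths to the root
    remaining <- setdiff(remaining[-1], ancestors(edges, top))
  }
  kept
}

prioritize(edges, lev, c("R", "A", "B", "E", "D"))  # "D" "E"
```

Here "R", "A" and "B" all lie on paths from "D" to the root and are removed. "E" is not an ancestor of "D", so it survives as a second, more specific term.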

Terms <- c("GO:0042254", "GO:0022613", "GO:0034470", "GO:0006364", "GO:0016072",
           "GO:0034660", "GO:0006412", "GO:0006396", "GO:0007005", "GO:0032543",
           "GO:0044085", "GO:0044281", "GO:0044257", "GO:0030163", "GO:0006082",
           "GO:0044248", "GO:0006519", "GO:0009056", "GO:0019752", "GO:0043436")

# Prioritize the given biological process GO-terms
prioritizedGOTerms(lst = Terms, organism = "Human", sp = TRUE, domain = "BP")
#> $HF
#> [1] "GO:0006364" "GO:0032543" "GO:0044257" "GO:0007005" "GO:0019752"
#> 
#> $rankHF
#> GO:0006364 GO:0032543 GO:0044257 GO:0007005 GO:0019752 
#>          8          7          5          4          4 
#> 
#> $HI
#>  [1] "GO:0042254" "GO:0022613" "GO:0034470" "GO:0006364" "GO:0016072"
#>  [6] "GO:0034660" "GO:0006412" "GO:0006396" "GO:0007005" "GO:0032543"
#> [11] "GO:0044085" "GO:0044257" "GO:0030163" "GO:0006082" "GO:0044248"
#> [16] "GO:0006519" "GO:0019752" "GO:0043436"
#> 
#> $rankHI
#> GO:0006364 GO:0034470 GO:0016072 GO:0032543 GO:0034660 GO:0006412 GO:0006396 
#>          8          7          7          7          6          6          6 
#> GO:0044257 GO:0042254 GO:0007005 GO:0030163 GO:0019752 GO:0022613 GO:0043436 
#>          5          4          4          4          4          3          3 
#> GO:0044085 GO:0006082 GO:0044248 
#>          2          2          2 
#> 
#> attr(,"class")
#> [1] "GOxploreR" "GPrior"

GO4Organism

This function gives all the GO-terms associated with an organism and their corresponding GO-levels.

# All the biological process gene association GO-terms for Human and their GO-level
head(GO4Organism(organism = "Human", domain = "BP"))
#>        GO ID Level
#> 1 GO:0008150     0
#> 2 GO:0043312     9
#> 3 GO:0002576     8
#> 4 GO:0006805     5
#> 5 GO:0009168     9
#> 6 GO:0007155     2
# All the molecular function gene association GO-terms for Mouse and their GO-level
head(GO4Organism(organism = "Mouse", domain = "MF"))
#>        GO ID Level
#> 1 GO:0005524     8
#> 2 GO:0004672     5
#> 3 GO:0004679     7
#> 4 GO:0016740     2
#> 5 GO:0000166     4
#> 6 GO:0016301     4
# All the cellular component gene association GO-terms for Rat and their GO-level
head(GO4Organism(organism = "Rat", domain = "CC"))
#>        GO ID Level
#> 1 GO:0016020     2
#> 2 GO:0005654     7
#> 3 GO:0005737     3
#> 4 GO:0005886     3
#> 5 GO:0030424     5
#> 6 GO:0031594     4

Conclusion

This vignette gave a brief overview of the functionality provided by the GOxploreR package. We showed all functions and how to use them.

References

1. Gene Ontology Consortium. 2018. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research 47, D1: D330–D338.
2. Kalifa Manjang, Shailesh Tripathi, Olli Yli-Harja, Matthias Dehmer, and Frank Emmert-Streib. 2020. Graph-based exploitation of gene ontology using GOxploreR for scrutinizing biological significance. Scientific Reports 10, 1: 16672. https://doi.org/10.1038/s41598-020-73326-3