Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
Kalifa Manjang, Shailesh Tripathi, Olli Yli-Harja, Matthias Dehmer, and Frank Emmert-Streib. 2020. Graph-based exploitation of gene ontology using GOxploreR for scrutinizing biological significance. Scientific Reports 10, 1: 16672. https://doi.org/10.1038/s41598-020-73326-3 [2]
GOxploreR is an R package that provides a simple and efficient way to work with the Gene Ontology (GO) database. The Gene Ontology is a major bioinformatics initiative of the Gene Ontology Consortium whose goal is to categorize the functions of genes and gene products. The ontology is structured into three distinct aspects of gene function: molecular function (MF), cellular component (CC), and biological process (BP). Together these comprise over 45,000 terms and 130,000 relations, with the majority of the information centered on ten model organisms [1]. In addition, GO includes annotations that link specific gene products to GO-terms. Currently, GO is the most comprehensive and widely used knowledgebase of functional information about genes.
This vignette gives an overview of the functionality provided by the GOxploreR package.
The package is freely available on CRAN and can be installed using the following command:
install.packages("GOxploreR")
The package can then be loaded using:
library(GOxploreR)
Note that the package must be installed before it can be loaded.
The following is a brief description of the package functionality.
The Gene2GOTermAndLevel function returns the GO annotation of a set of genes. Given a gene or a list of genes (Entrez gene IDs), an organism, and a domain (BP, MF, or CC), it returns the Gene Ontology terms (GO-terms) associated with the genes and the levels of these terms in the GO-DAG, the directed acyclic graph formed by the GO-terms. The default domain is BP. For the values accepted by the ‘organism’ argument, see the table of supported organisms.
# The cellular component gene ontology terms and their levels will be retrieved
Gene2GOTermAndLevel(genes = c(10212, 9833, 6713), organism = "Homo sapiens", domain = "CC")
#> Entrezgene ID GO ID Domain Level
#> 1 10212 GO:0005634 CC 5
#> 2 10212 GO:0005737 CC 3
#> 3 10212 GO:0016607 CC 9
#> 4 10212 GO:0016020 CC 2
#> 5 10212 GO:0005654 CC 7
#> 6 9833 GO:0016020 CC 2
#> 7 9833 GO:0005886 CC 3
#> 8 9833 GO:0005938 CC 4
#> 9 9833 GO:0005634 CC 5
#> 10 9833 GO:0005737 CC 3
#> 11 6713 GO:0016020 CC 2
#> 12 6713 GO:0005783 CC 5
#> 13 6713 GO:0005789 CC 7
#> 14 6713 GO:0043231 CC 4
# The biological process gene ontology terms and their levels will be retrieved
Gene2GOTermAndLevel(genes = c(100000642, 30592, 58153, 794484), organism = "Danio rerio")
#> Entrezgene ID GO ID Domain Level
#> 1 100000642 GO:0007186 BP 5
#> 2 100000642 GO:0050911 BP 7
#> 3 30592 GO:0045214 BP 9
#> 4 30592 GO:0060047 BP 5
#> 5 30592 GO:0060038 BP 9
#> 6 30592 GO:0048738 BP 7
#> 7 30592 GO:0055005 BP 11
#> 8 30592 GO:0055015 BP 10
#> 9 30592 GO:0055004 BP 11
#> 10 30592 GO:0045823 BP 7
#> 11 794484 GO:0008150 BP 0
# The molecular function gene ontology terms and their levels will be retrieved
Gene2GOTermAndLevel(genes = c(100009600, 18131, 100017), organism = "Mouse", domain = "MF")
#> Entrezgene ID GO ID Domain Level
#> 1 100009600 GO:0008270 MF 7
#> 2 100009600 GO:0043565 MF 5
#> 3 100009600 GO:0046872 MF 5
#> 4 100009600 GO:0003677 MF 4
#> 5 18131 GO:0005515 MF 2
#> 6 18131 GO:0005509 MF 6
#> 7 18131 GO:0038023 MF 2
#> 8 18131 GO:0042802 MF 3
#> 9 18131 GO:0019899 MF 3
#> 10 100017 GO:0005515 MF 2
#> 11 100017 GO:0035650 MF 3
#> 12 100017 GO:0001784 MF 5
#> 13 100017 GO:0005102 MF 3
#> 14 100017 GO:0005546 MF 7
#> 15 100017 GO:0001540 MF 3
#> 16 100017 GO:0050750 MF 5
#> 17 100017 GO:0030159 MF 4
#> 18 100017 GO:0030276 MF 3
#> 19 100017 GO:0035612 MF 3
#> 20 100017 GO:0035591 MF 3
#> 21 100017 GO:0035615 MF 4
The Gene2GOTermAndLevel_ON function is similar to the Gene2GOTermAndLevel function. The only difference is that it queries the Ensembl database online (ON) for GO-terms, which makes it relatively slow. As a result, the output of Gene2GOTermAndLevel_ON is always up to date, but an internet connection is needed to run it. This function does not support Escherichia coli.
# The cellular component gene ontology terms and their levels will be retrieved
Gene2GOTermAndLevel_ON(genes = c(10212, 9833, 6713), organism = "Homo sapiens", domain ="CC")
# The biological process gene ontology terms and their levels will be retrieved
Gene2GOTermAndLevel_ON(genes = c(100000711, 100000710, 100000277), organism = "Danio rerio")
# The molecular function gene ontology terms and their levels will be retrieved
Gene2GOTermAndLevel_ON(genes = c(100009609, 100017, 100034361), organism = "Mouse", domain = "MF")
The GOTermXXOnLevel function gives the level of a GO-term in the GO-DAG. The results for organism-specific GO-DAGs are the same as for the general GO-DAG. The XX in the function name should be replaced by BP, MF, or CC.
# Retrieve the level of a GO biological process term
goterms <- c("GO:0009083","GO:0006631","GO:0006629","GO:0014811","GO:0021961")
GOTermBPOnLevel(goterm = goterms)
#> Term Level
#> 1 GO:0009083 7
#> 2 GO:0006631 7
#> 3 GO:0006629 3
#> 4 GO:0014811 15
#> 5 GO:0021961 15
# Retrieve the level of a GO molecular function term
goterms <- c("GO:0005515","GO:0016835","GO:0046976","GO:0015425","GO:0005261")
GOTermMFOnLevel(goterm = goterms)
#> Term Level
#> 1 GO:0005515 2
#> 2 GO:0016835 3
#> 3 GO:0046976 8
#> 4 GO:0015425 8
#> 5 GO:0005261 6
# Retrieve the level of a GO cellular component term
goterms <- c("GO:0055044","GO:0030427","GO:0036436","GO:0034980","GO:0048226")
GOTermCCOnLevel(goterm = goterms)
#> Term Level
#> 1 GO:0055044 2
#> 2 GO:0030427 2
#> 3 GO:0036436 10
#> 4 GO:0034980 7
#> 5 GO:0048226 7
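To make the notion of a level concrete, here is a minimal sketch in base R on a made-up toy DAG. It assumes that the level of a term is the length of the longest path from the root to that term; this is consistent with the jump nodes described later in this vignette but is not taken from the package source. The node names and the helper `dag_levels` are hypothetical and not part of GOxploreR.

```r
# Toy DAG as a two-column edgelist (parent, child). Node names are made up.
edges <- matrix(c("R", "A",  "R", "B",  "A", "C",  "B", "C",
                  "C", "D",  "A", "D",  "B", "E"),
                ncol = 2, byrow = TRUE)

# Hypothetical helper: level = length of the longest path from the root
# (an assumption, see the text above).
dag_levels <- function(edges, root) {
  nodes <- unique(as.vector(edges))
  level <- setNames(rep(NA_integer_, length(nodes)), nodes)
  level[root] <- 0L
  # Relax every edge repeatedly; a DAG with n nodes needs at most n - 1 passes.
  for (pass in seq_len(length(nodes) - 1L)) {
    for (i in seq_len(nrow(edges))) {
      p  <- edges[i, 1]
      ch <- edges[i, 2]
      if (!is.na(level[p]) && (is.na(level[ch]) || level[ch] < level[p] + 1L)) {
        level[ch] <- level[p] + 1L
      }
    }
  }
  level
}

lev <- dag_levels(edges, root = "R")
lev

# All terms on one level of the toy DAG
terms_on_level_2 <- names(lev)[lev == 2]
```

In this toy graph, node D is reachable from A directly and also through C, so the longest-path rule places it one level below C rather than directly below A.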
The Level2GOTermXX function gives all the GO-terms on a given GO-level. These GO-terms can come from the general GO-DAG or from an organism-specific GO-DAG. If the “organism” argument is given (it must be an organism supported by the package), the GO-terms are taken from that level of the organism’s DAG. If no value is given for “organism”, the general GO-DAG is used (default). The XX in the function name should be replaced by BP, MF, or CC.
# Retrieve all the GO-terms from a particular GO BP level
Level2GOTermBP(level = 1, organism = "Human")
#> [1] "GO:0008152" "GO:0032502" "GO:0002376" "GO:0048511" "GO:0043473"
#> [6] "GO:0040011" "GO:0023052" "GO:0009987" "GO:0000003" "GO:0007610"
#> [11] "GO:0050896"
# Retrieve all the GO-terms from a particular GO MF level
Level2GOTermMF(level = 14, organism = "Rat")
#> [1] "GO:0005391" "GO:0008553" "GO:0086039" "GO:0046961" "GO:1905056"
#> [6] "GO:1905059" "GO:0008900"
#> [1] "GO:0030085"
The Level2LeafNodeXX function gives all the leaf nodes on a particular GO-level. Leaf nodes can also be obtained from an organism-specific GO-DAG. The “organism” parameter is optional; if supplied, the leaf nodes on that level of the respective organism’s DAG are returned. The default is the general GO-DAG. The XX should be substituted with BP, MF, or CC.
#> [1] "GO:0006807" "GO:0007586" "GO:0032259" "GO:0030431" "GO:0035176"
#> [6] "GO:0032504" "GO:1990845" "GO:0036268" "GO:0019835" "GO:0045730"
#> [11] "GO:0007624" "GO:0090618"
#> [1] "GO:1905054" "GO:0098695"
# Get all leaf nodes from a GO CC level
Level2LeafNodeCC(level = 10, organism = "Schizosaccharomyces pombe")
#> [1] "GO:1902377" "GO:1990342" "GO:0071957" "GO:0031942" "GO:0000124"
#> [6] "GO:0005662" "GO:0016586" "GO:0031618" "GO:0034991" "GO:0005751"
#> [11] "GO:1990707" "GO:0016514" "GO:0030126" "GO:0032221" "GO:0005671"
#> [16] "GO:0005749" "GO:0046695" "GO:0005658" "GO:0061499" "GO:0030875"
#> [21] "GO:0036266" "GO:0031262" "GO:0120104" "GO:0071339" "GO:0061496"
#> [26] "GO:0055031" "GO:0030998" "GO:1904834" "GO:0030958" "GO:0120105"
#> [31] "GO:0033698" "GO:0034990" "GO:0000136" "GO:0030127" "GO:0005742"
#> [36] "GO:0043599" "GO:0071627" "GO:0071958" "GO:0061493" "GO:0061497"
#> [41] "GO:0043625" "GO:0031307" "GO:0070692" "GO:0070985" "GO:1990574"
#> [46] "GO:0043505" "GO:0070691" "GO:1990612" "GO:0033551" "GO:0034044"
#> [51] "GO:0120106" "GO:0032865" "GO:0032585" "GO:0005750" "GO:1990537"
#> [56] "GO:1990941" "GO:0042720" "GO:0001401" "GO:0071986" "GO:0042721"
#> [61] "GO:0099616" "GO:0031303" "GO:0031309"
For a given GO-level, this function gives the GO-terms that are jump nodes (JNs). JNs are GO-terms that have at least one child term that is not on the level directly below the parent term. If no organism is given, the default is the general GO-DAG. The XX in the name should be substituted with BP, MF, or CC.
#> [1] "GO:0007155" "GO:0007568" "GO:0048856" "GO:0007154" "GO:0006955"
#> [6] "GO:0045730"
#> [1] "GO:0019239" "GO:0008233" "GO:0035591" "GO:0008047" "GO:0004888"
#> [6] "GO:0042393"
#> [1] "GO:0060205" "GO:0005882" "GO:0042641" "GO:0030139" "GO:0033017"
#> [6] "GO:0030659"
For a given GO-level, the Level2RegularNodeXX function gives the GO-terms that are regular nodes (RNs). RNs are GO-terms whose child terms are all on the level directly below the parent’s level. The XX in the name should be substituted with BP, MF, or CC.
#> [1] "GO:0002088" "GO:0060396" "GO:0048663" "GO:0071688" "GO:0014866"
#> [6] "GO:0019229"
# All regular nodes from the MF level
head(Level2RegularNodeMF(level = 7, organism = "Homo sapiens"))
#> [1] "GO:0005244" "GO:0000976" "GO:0016531" "GO:0004725" "GO:0004983"
#> [6] "GO:0004722"
#> [1] "GO:0002102" "GO:0015934" "GO:0015935" "GO:0090533" "GO:0032160"
#> [6] "GO:0032161"
The Level2NoLeafNodeXX function gives all the GO-terms on a GO-level that are not leaf nodes. All non-leaf GO-terms of an organism-specific DAG can be returned in the same way by providing the organism of interest. The default is the general GO-DAG. The XX in the name should be substituted with BP, MF, or CC.
# All GO-terms on a particular GO BP level that are not leaf nodes
Level2NoLeafNodeBP(level = 16, organism = "Homo sapiens")
#> [1] "GO:0072540" "GO:0014808" "GO:0060314" "GO:0045623" "GO:0051281"
#> [6] "GO:0051280" "GO:0045625" "GO:0031585" "GO:0045624" "GO:0021966"
# All GO-terms on a particular GO MF level that are not leaf nodes
Level2NoLeafNodeMF(level = 10, organism = "Caenorhabditis elegans")
#> [1] "GO:0004970" "GO:1904315" "GO:0005283" "GO:0010485" "GO:0015271"
# All GO-terms on a particular GO CC level that are not leaf nodes
Level2NoLeafNodeCC(level = 12, organism = "Homo sapiens")
#> [1] "GO:0098675"
Given a GO-term or a list of GO-terms, the getGOcategory function returns the category of each term. The categories are jump nodes (JN), regular nodes (RN) and leaf nodes (LN).
goterm <- c("GO:0009083","GO:0006631","GO:0006629","GO:0016835","GO:0046976","GO:0048226")
# Returns the categories of the GO-terms in the list
getGOcategory(goterm = goterm)
#> Term Category Domain
#> 1 GO:0009083 JN BP
#> 2 GO:0006631 JN BP
#> 3 GO:0006629 JN BP
#> 4 GO:0016835 JN MF
#> 5 GO:0046976 RN MF
#> 6 GO:0048226 LN CC
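The three categories can be illustrated with a short base R sketch on a made-up toy DAG. It applies the definitions given in this vignette literally: a leaf node has no children, a regular node has all of its children on the level directly below it, and a jump node has at least one child further down. The edgelist, the levels, and the helper `node_category` are hypothetical and are not part of GOxploreR.

```r
# Toy DAG as a (parent, child) edgelist with made-up node names
edges <- matrix(c("R", "A",  "R", "B",  "A", "C",  "B", "C",
                  "C", "D",  "A", "D",  "B", "E"),
                ncol = 2, byrow = TRUE)

# Levels of the toy nodes, assumed to be known already
level <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)

# Hypothetical helper: classify one node as LN, RN or JN
node_category <- function(node, edges, level) {
  children <- edges[edges[, 1] == node, 2]
  if (length(children) == 0) {
    return("LN")                      # no children: leaf node
  }
  if (all(level[children] == level[node] + 1)) {
    "RN"                              # every child sits on the next level
  } else {
    "JN"                              # at least one child skips a level
  }
}

categories <- vapply(names(level), node_category, character(1),
                     edges = edges, level = level)
categories
```

Here node A is a jump node because its child D lies two levels below it, while B and C are regular nodes.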
This function obtains the degree distribution of the GO-terms on a GO-level. A bar plot is obtained which shows how many nodes in the GO-level have a certain degree k. The XX in the name should be substituted with either BP, MF, or CC.
Degree distribution of the biological process GO-terms on level 4.
Degree distribution of the molecular function GO-terms on level 2.
Degree distribution of the cellular component GO-terms on level 10.
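The idea behind such a degree distribution can be sketched in base R on a made-up toy DAG. The sketch assumes that the degree of a node is the number of edges touching it (incoming plus outgoing); the package may count degrees differently. All names below are hypothetical.

```r
# Toy DAG as a (parent, child) edgelist with made-up node names
edges <- matrix(c("R", "A",  "R", "B",  "A", "C",  "B", "C",
                  "C", "D",  "A", "D",  "B", "E"),
                ncol = 2, byrow = TRUE)

# Levels of the toy nodes, assumed to be known already
level <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)

# Degree = number of times a node appears as an endpoint of an edge (assumption)
deg <- table(as.vector(edges))

# Restrict to the nodes on one level and count how many nodes have degree k
on_level <- names(level)[level == 2]
degree_dist <- table(as.integer(deg[on_level]))
degree_dist
```

Passing `degree_dist` to `barplot()` would give a bar plot of the kind described above, with the degree k on the x-axis and the number of nodes on the y-axis.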
For a GO-term, this function derives the levels of its children. The XX in the name should be substituted with BP, MF, or CC.
#> $Terms
#> [1] "GO:0007636" "GO:0007637" "GO:0042048" "GO:0061366"
#>
#> $Level
#> [1] 6 5 4 6
#> $Terms
#> [1] "GO:0086080" "GO:0098641"
#>
#> $Level
#> [1] 6 6
#> $Terms
#> [1] "GO:0071736" "GO:0071737"
#>
#> $Level
#> [1] 5 7
The GetLeafNodesXX function gives all the leaf nodes of a certain organism. If the input value is empty or is “BP”, “MF” or “CC”, the general GO-DAG is used by default. The XX should be substituted with BP, MF, or CC.
# All leaf nodes from the GO BP tree
GetLeafNodesBP("BP")
# All leaf nodes from the GO CC tree
GetLeafNodesCC(organism = "Caenorhabditis elegans")
This function returns all descendant nodes of a GO-term. That is, starting from a GO-term, we collect its children, then their children, and so on until we reach the bottom of the DAG. The XX in the name should be substituted with BP, MF, or CC.
#> [1] "GO:1900497" "GO:1900498" "GO:1900499"
#> [1] "GO:0008900"
#> NULL
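The traversal described above can be sketched in base R as a breadth-first search over a parent-child edgelist. The toy graph and the helper `descendants` are hypothetical; this is an illustration of the idea, not the package’s implementation. Unlike the package output shown above, the sketch returns an empty character vector rather than NULL for a term without children.

```r
# Toy DAG as a (parent, child) edgelist with made-up node names
edges <- matrix(c("R", "A",  "R", "B",  "A", "C",  "B", "C",
                  "C", "D",  "A", "D",  "B", "E"),
                ncol = 2, byrow = TRUE)

# Hypothetical helper: collect children, then their children, and so on
descendants <- function(node, edges) {
  found <- character(0)
  frontier <- node
  while (length(frontier) > 0) {
    children <- unique(edges[edges[, 1] %in% frontier, 2])
    frontier <- setdiff(children, found)   # only nodes not seen before
    found <- c(found, frontier)
  }
  found
}

desc_B <- descendants("B", edges)   # children C and E, then D below C
desc_E <- descendants("E", edges)   # E is a leaf, so nothing is found
```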
This function gives the GO-terms in the Gene Ontology as an edgelist corresponding to a directed acyclic graph (DAG) for the GO-terms of a certain organism. This can also be obtained for the general GO-DAG (not organism-specific).
# Represent all the BP gene association GO-terms for human as an edgelist
head(GetDAG(organism = "Human", domain = "BP"))
#> [,1] [,2]
#> [1,] "GO:0008150" "GO:0000003"
#> [2,] "GO:0008150" "GO:0002376"
#> [3,] "GO:0008150" "GO:0007610"
#> [4,] "GO:0008150" "GO:0008152"
#> [5,] "GO:0008150" "GO:0009758"
#> [6,] "GO:0008150" "GO:0009987"
# Represent all the MF gene association GO-terms for Mouse as an edgelist
head(GetDAG(organism = "Mouse", domain = "MF"))
#> [,1] [,2]
#> [1,] "GO:0003674" "GO:0140110"
#> [2,] "GO:0003674" "GO:0003824"
#> [3,] "GO:0003674" "GO:0038024"
#> [4,] "GO:0003674" "GO:0045735"
#> [5,] "GO:0003674" "GO:0005198"
#> [6,] "GO:0003674" "GO:0005215"
# Represent all the CC gene association GO-terms for Caenorhabditis elegans as an edgelist
head(GetDAG(organism = "Caenorhabditis elegans", domain = "CC"))
#> [,1] [,2]
#> [1,] "GO:0005575" "GO:0005622"
#> [2,] "GO:0005575" "GO:0032991"
#> [3,] "GO:0005575" "GO:0110165"
#> [4,] "GO:0005622" "GO:0000151"
#> [5,] "GO:0005622" "GO:0000159"
#> [6,] "GO:0005622" "GO:0000178"
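An edgelist of this shape is easy to post-process with base R. The following sketch hard-codes the six rows printed above for Caenorhabditis elegans (so it runs without the package) and turns them into a parent-to-children list. The variable names are illustrative only.

```r
# The six edges shown above (parent in column 1, child in column 2)
edges <- rbind(c("GO:0005575", "GO:0005622"),
               c("GO:0005575", "GO:0032991"),
               c("GO:0005575", "GO:0110165"),
               c("GO:0005622", "GO:0000151"),
               c("GO:0005622", "GO:0000159"),
               c("GO:0005622", "GO:0000178"))

# Group the child terms by their parent term
children <- split(edges[, 2], edges[, 1])
children[["GO:0005575"]]

# Terms that never appear as a child are roots of this fragment
roots <- setdiff(edges[, 1], edges[, 2])
roots
```

With the full matrix returned by GetDAG in place of the hard-coded rows, the same two lines would give the children of every term and the root of the chosen domain.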
The visualization of a GO-DAG is difficult, primarily because of the size of the graphs, which contain thousands of GO-terms. For this reason, we developed a simple method that groups GO-terms with similar characteristics together and thereby gives a global summary of all GO-terms in the DAG. Every node in the reduced DAG comprises one or more GO-terms, and these GO-terms can be accessed through the level and the node category they belong to (“RN”, “JN” or “LN”). This is what we call the reduced GO-DAG for an organism. The visRDAGXX function returns a list that contains the GO-terms in each category and the plot of the reduced DAG. To retrieve just the GO-terms in each category, the “plot” argument can be set to FALSE. The XX in the name should be substituted with BP, MF, or CC. The total number of GO-terms in each node is shown as the node label. For instance, in the figure, the Level 0 node (i.e. “L0 RN”) contains 1 GO-term.
The labels “J”, “R” and “L” on the right side of the figure give the number of connections between the regular nodes (RNs) on a level and the nodes directly below them (RNs are nodes that have all their child nodes on the next level). For example, on L1, the labels J = 5, R = 9 and L = 6 mean that, among the children of the RNs on that level, 5 are jump nodes (JN), 9 are regular nodes (RN), and 6 are leaf nodes (LN) on L2.
# The GO-terms in each node category of the reduced Caenorhabditis elegans GO-DAG
head(visRDAGMF(organism = "Caenorhabditis elegans", plot = FALSE))
#> $`L4 RN`
#> [1] "GO:0016504" "GO:0016301" "GO:0000166" "GO:0046872" "GO:0008237"
#> [6] "GO:0008146" "GO:0004930" "GO:0008168" "GO:0016779" "GO:0003723"
#> [11] "GO:0004601" "GO:0004842" "GO:0003779" "GO:0008234" "GO:0000981"
#> [16] "GO:0001664" "GO:0016833" "GO:0016715" "GO:0043022" "GO:0015631"
#> [21] "GO:0004812" "GO:0004518" "GO:0008236" "GO:0005543" "GO:0008641"
#> [26] "GO:0019902" "GO:0003713" "GO:0008483" "GO:0016780" "GO:0019789"
#> [31] "GO:0004659" "GO:0019905" "GO:0032934" "GO:0017022" "GO:0003995"
#> [36] "GO:0019888" "GO:0031593" "GO:0004175" "GO:0016758" "GO:0070063"
#> [41] "GO:0016831" "GO:0019887" "GO:0042578" "GO:0016818" "GO:0000104"
#> [46] "GO:0016747" "GO:0030414" "GO:0015036" "GO:0070025" "GO:0004553"
#> [51] "GO:0016671" "GO:0001217" "GO:0048018" "GO:0016702" "GO:0016714"
#> [56] "GO:0015459" "GO:0016298" "GO:0009982" "GO:0030695" "GO:0016620"
#> [61] "GO:0016868" "GO:0016706" "GO:0016811" "GO:0044390" "GO:0015267"
#> [66] "GO:0016763" "GO:0002020" "GO:0005246" "GO:0051020" "GO:0016783"
#> [71] "GO:0008410" "GO:0071568" "GO:0035254" "GO:0051920" "GO:0019900"
#> [76] "GO:0016653" "GO:0016624" "GO:0005048" "GO:0016884" "GO:0016639"
#> [81] "GO:0016857" "GO:0016812" "GO:0033558" "GO:0016615" "GO:0019211"
#> [86] "GO:0140297" "GO:0003909" "GO:0046912" "GO:0016709" "GO:0008484"
#> [91] "GO:0035014" "GO:0016742" "GO:0016841" "GO:0016836" "GO:0016628"
#> [96] "GO:0004984" "GO:0008452" "GO:0019212" "GO:0032452" "GO:0016421"
#> [101] "GO:0016814"
#>
#> $`L5 RN`
#> [1] "GO:0004672" "GO:0000030" "GO:0016791" "GO:0008528" "GO:0004519"
#> [6] "GO:0050661" "GO:0008417" "GO:0043565" "GO:0003727" "GO:0016174"
#> [11] "GO:0016972" "GO:0050660" "GO:0061630" "GO:0042626" "GO:0004197"
#> [16] "GO:0004180" "GO:0008017" "GO:0035091" "GO:0019901" "GO:0030295"
#> [21] "GO:0003857" "GO:0051377" "GO:0019843" "GO:0008135" "GO:0004104"
#> [26] "GO:0003729" "GO:0016597" "GO:0004190" "GO:0004177" "GO:0016307"
#> [31] "GO:0000828" "GO:0035673" "GO:0015927" "GO:0004540" "GO:0070491"
#> [36] "GO:0030515" "GO:0051287" "GO:0008374" "GO:0004368" "GO:0004000"
#> [41] "GO:0003725" "GO:0004611" "GO:0017076" "GO:0004407" "GO:0004620"
#> [46] "GO:0019171" "GO:0004864" "GO:0008376" "GO:0008320" "GO:0004527"
#> [51] "GO:0008081" "GO:0008375" "GO:0003997" "GO:0005179" "GO:0004866"
#> [56] "GO:0015165" "GO:0031543" "GO:0036002" "GO:0017069" "GO:0019903"
#> [61] "GO:0031490" "GO:0008378" "GO:0015020" "GO:0003953" "GO:0005160"
#> [66] "GO:0019205" "GO:0070569" "GO:0016462" "GO:0043175" "GO:0008173"
#> [71] "GO:0015929" "GO:0004311" "GO:0004470" "GO:0052742" "GO:0004353"
#> [76] "GO:0008235" "GO:0005092" "GO:0016407" "GO:0003884" "GO:0035596"
#> [81] "GO:0001727" "GO:0018455" "GO:0004731" "GO:0008318" "GO:0004753"
#> [86] "GO:0004576" "GO:0004645" "GO:0072542" "GO:0061629" "GO:0033613"
#> [91] "GO:0034338" "GO:0004084" "GO:0005104" "GO:0046914" "GO:0034979"
#> [96] "GO:0051723" "GO:0008227" "GO:0033744" "GO:0016410"
#>
#> $`L6 RN`
#> [1] "GO:0004674" "GO:0000026" "GO:0004376" "GO:0004713" "GO:0005506"
#> [6] "GO:0008175" "GO:0004652" "GO:0005507" "GO:0043539" "GO:0004521"
#> [11] "GO:0030983" "GO:0008080" "GO:0005243" "GO:0004721" "GO:0004520"
#> [16] "GO:0046873" "GO:0000009" "GO:0052590" "GO:0008251" "GO:0016297"
#> [21] "GO:0016418" "GO:0004396" "GO:0001640" "GO:0004623" "GO:1990837"
#> [26] "GO:0005184" "GO:0008188" "GO:0035250" "GO:0042577" "GO:0000033"
#> [31] "GO:0017136" "GO:0042162" "GO:0008187" "GO:1990782" "GO:0042625"
#> [36] "GO:0035257" "GO:0090599" "GO:0017110" "GO:0004661" "GO:0004647"
#> [41] "GO:0015116" "GO:0008106" "GO:0003988" "GO:0008649" "GO:0004563"
#> [46] "GO:0004022" "GO:0004557" "GO:0042171" "GO:0004712" "GO:0031545"
#> [51] "GO:1901981" "GO:0047429"
#>
#> $`L7 RN`
#> [1] "GO:0004714" "GO:0004725" "GO:0003774" "GO:0016279" "GO:0003899"
#> [6] "GO:0004722" "GO:0030971" "GO:0004707" "GO:0015078" "GO:0015276"
#> [11] "GO:0008138" "GO:0015301" "GO:0000976" "GO:0016274" "GO:0032559"
#> [16] "GO:0004114" "GO:0015171" "GO:0016892" "GO:0004697" "GO:0005244"
#> [21] "GO:0004683" "GO:0009931" "GO:0004127" "GO:0009041" "GO:0016423"
#> [26] "GO:0016888" "GO:0015085" "GO:0051998" "GO:0004708" "GO:0005347"
#>
#> $`L8 RN`
#> [1] "GO:0004386" "GO:0009019" "GO:0001046" "GO:0000146" "GO:0000977"
#> [6] "GO:0018024" "GO:0015179" "GO:0008094" "GO:0005272" "GO:0003777"
#> [11] "GO:0000987" "GO:0015252" "GO:0005262" "GO:0005267" "GO:0005003"
#> [16] "GO:0008556" "GO:0008988" "GO:0008308" "GO:0004596" "GO:0015377"
#> [21] "GO:0008296" "GO:0005384" "GO:0015295" "GO:0008297"
#>
#> $`L9 RN`
#> [1] "GO:0000978" "GO:0003678" "GO:0004402" "GO:0005245" "GO:1990939"
#> [6] "GO:0015269" "GO:0005249" "GO:0005432"
# RN GO-terms on level 1 can be accessed as follows
visRDAGMF(organism = "Caenorhabditis elegans", plot = FALSE)$"L1 RN"
#> [1] "GO:0003824" "GO:0005198" "GO:0060090"
# JN GO-terms on level 9 can be accessed as follows
visRDAGMF(organism = "Caenorhabditis elegans", plot = FALSE)$"L9 JN"
#> [1] "GO:0015379"
# LN GO-terms on level 11 can be accessed as follows
visRDAGMF(organism = "Caenorhabditis elegans", plot = FALSE)$"L11 LN"
#> [1] "GO:0005335" "GO:0022848" "GO:0008068" "GO:0043994" "GO:0046972"
#> [6] "GO:0005332" "GO:0004972" "GO:0005219" "GO:0048763" "GO:0005250"
#> [11] "GO:0005330" "GO:0008511"
# Represent the molecular function GO-DAG for organism Caenorhabditis elegans
visRDAGMF(organism = "Caenorhabditis elegans", plot = TRUE)[["plot"]]
Visualization of a reduced GO-DAG for Caenorhabditis elegans.
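The grouping behind the reduced DAG can be sketched in base R: terms that share a level and a node category are collected under one label such as “L1 RN”. The toy levels and categories below are made up, and the code illustrates the idea only; it is not the package’s implementation.

```r
# Made-up toy terms with an assumed level and node category each
level    <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)
category <- c(R = "RN", A = "JN", B = "RN", C = "RN", D = "LN", E = "LN")

# One group per (level, category) pair, labelled like the list names above
reduced <- split(names(level), paste0("L", level, " ", category))

# Terms in one node of the reduced DAG
reduced[["L1 RN"]]

# Number of terms per node, i.e. what the node labels in the plot report
node_sizes <- lengths(reduced)
node_sizes
```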
The visRsubDAGXX function is similar to the visRDAGXX function; however, it visualizes an organism-specific sub-GO-DAG. The input of the function is a list of organism-specific GO-terms. If this list does not contain all GO-terms of the organism, category nodes are faded out. The XX in the function name can be substituted with BP, MF, or CC.
Terms <- c("GO:0022403","GO:0000278","GO:0006414","GO:0006415","GO:0006614",
"GO:0045047","GO:0072599","GO:0006613","GO:0000279","GO:0000087",
"GO:0070972","GO:0000184","GO:0000280","GO:0007067","GO:0006413",
"GO:0048285","GO:0006412","GO:0000956","GO:0006612","GO:0019080",
"GO:0019083","GO:0016071","GO:0006402","GO:0043624","GO:0043241",
"GO:0006401","GO:0072594","GO:0022904","GO:0019058","GO:0032984",
"GO:0045333","GO:0006259","GO:0051301","GO:0022900","GO:0006396",
"GO:0060337","GO:0071357","GO:0034340","GO:0002682","GO:0051320",
"GO:0045087","GO:0051325","GO:0022411","GO:0016032","GO:0044764",
"GO:0022415","GO:0051329","GO:0050776","GO:0030198","GO:0043062")
# Visualize the DAG node categories of the given biological process GO-terms
visRsubDAGBP(goterm = Terms, organism = "Human")
Visualization of a reduced sub-GO-DAG of BPs for Human.
Given a list of GO-terms, the distRankingGO function ranks the GO-terms according to the distance between each GO-term’s hierarchy level and the maximal depth of the paths in the GO-DAG that pass through that GO-term. The function supports the “BP”, “MF” and “CC” ontologies.
Terms <- c("GO:0000278","GO:0006414","GO:0022403","GO:0006415","GO:0006614",
"GO:0045047","GO:0072599","GO:0006613","GO:0000184","GO:0070972",
"GO:0006413","GO:0000087","GO:0000280","GO:0000279","GO:0006612",
"GO:0000956","GO:0048285","GO:0019080","GO:0019083","GO:0043624",
"GO:0006402","GO:0032984","GO:0006401","GO:0072594","GO:0019058",
"GO:0051301","GO:0016071","GO:0006412","GO:0002682","GO:0022411",
"GO:0001775","GO:0046649","GO:0045321","GO:0050776","GO:0007155",
"GO:0022610","GO:0060337","GO:0071357","GO:0034340","GO:0016032",
"GO:0044764","GO:0006396","GO:0010564","GO:0002684","GO:0006259",
"GO:0051249","GO:0045087")
# Ordering of the GO-terms in the list
distRankingGO(goterm = Terms, domain = "BP", plot = TRUE)
#> Warning in GOTermBPOnLevel(goterm = goterm): Check that the term GO:0043624
#> GO:0022610 GO:0044764 are bp GO-terms and not obsolete
#> Warning in max(GOTermBPOnLevel(goterm = goterms)$Level): no non-missing
#> arguments to max; returning -Inf
#> Warning in max(GOTermBPOnLevel(goterm = goterms)$Level): no non-missing
#> arguments to max; returning -Inf
#> Warning in max(GOTermBPOnLevel(goterm = goterms)$Level): no non-missing
#> arguments to max; returning -Inf
#> Warning in y - cc1: longer object length is not a multiple of shorter object
#> length
#> Warning in y - cc1: longer object length is not a multiple of shorter object
#> length
#> $`GO-terms_ranking`
#> [1] "GO:0050776" "GO:0045087" "GO:0002682" "GO:0046649" "GO:0000278"
#> [6] "GO:0048285" "GO:0022411" "GO:0001775" "GO:0045321" "GO:0000280"
#> [11] "GO:0010564" "GO:0006259" "GO:0006402" "GO:0006412" "GO:0071357"
#> [16] "GO:0002684" "GO:0006401" "GO:0072594" "GO:0016071" "GO:0006396"
#> [21] "GO:0051249" "GO:0019080" "GO:0032984" "GO:0051301" "GO:0019058"
#> [26] "GO:0016032" "GO:0006413" "GO:0007155" "GO:0006414" "GO:0070972"
#> [31] "GO:0000956" "GO:0072599" "GO:0006612" "GO:0019083" "GO:0034340"
#> [36] "GO:0022403" "GO:0006415" "GO:0045047" "GO:0060337" "GO:0000279"
#> [41] "GO:0006613" "GO:0000184" "GO:0000087" "GO:0006614" "GO:0043624"
#> [46] "GO:0022610" "GO:0044764"
#>
#> $indices_of_ranking
#> [1] 34 47 29 32 1 17 30 31 33 13 43 45 21 28 38 44 23 24 27 42 46 18 22 26 25
#> [26] 40 11 35 2 10 16 7 15 19 39 3 4 6 37 14 8 9 12 5 20 36 41
#>
#> $distance
#> [1] 16 16 14 14 13 13 13 13 13 12 12 12 11 11 11
#> [16] 11 10 10 10 10 10 9 9 9 8 8 7 7 6 6
#> [31] 6 5 5 5 5 4 4 4 4 3 2 2 2 1 -Inf
#> [46] -Inf -Inf
#>
#> $plot
#> Warning: Removed 3 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 3 rows containing missing values or values outside the scale range
#> (`geom_line()`).
The hierarchy levels of a list of GO-terms (y-axis) are shown in purple, and the maximal depths of the paths in the GO-DAG passing through these GO-terms are shown in red.
The function returns the ranked GO-terms, the indices of the ranking (indices into the original list), the distance between each GO-term’s hierarchy level and the maximal depth of the paths in the GO-DAG passing through it, and a visualisation of the ranking. The GO-terms are ranked according to the distance between the two points (purple and red) shown in the figure.
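The distance-based ranking can be sketched in base R on a made-up toy DAG. The sketch assumes that the maximal depth of the paths through a term equals the deepest level found among the term and its descendants; the package may compute this quantity differently. The toy data and the helper `descendants` are hypothetical.

```r
# Toy DAG as a (parent, child) edgelist, plus assumed levels of its nodes
edges <- matrix(c("R", "A",  "R", "B",  "A", "C",  "B", "C",
                  "C", "D",  "A", "D",  "B", "E"),
                ncol = 2, byrow = TRUE)
level <- c(R = 0, A = 1, B = 1, C = 2, D = 3, E = 2)

# Hypothetical helper: all nodes reachable below a node
descendants <- function(node, edges) {
  found <- character(0)
  frontier <- node
  while (length(frontier) > 0) {
    children <- unique(edges[edges[, 1] %in% frontier, 2])
    frontier <- setdiff(children, found)
    found <- c(found, frontier)
  }
  found
}

# Deepest level reachable through each term (assumption, see the text above)
max_depth <- vapply(names(level),
                    function(n) max(level[c(n, descendants(n, edges))]),
                    numeric(1))
distance <- max_depth - level

# Rank an input list of terms by decreasing distance
terms <- c("C", "A", "E")
idx <- order(distance[terms], decreasing = TRUE)
ranking <- terms[idx]
```

The vectors `ranking`, `idx` and `distance[terms][idx]` play the roles of the ranked GO-terms, the indices of the ranking and the distances in the output shown above.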
The scoreRankingGO function is similar to the distRankingGO function in that both functions order a given list of GO-terms. The difference is that scoreRankingGO ranks the GO-terms according to a score.
The function returns the ranked GO-terms, the indices of the ranking (indices into the original list), the score of each GO-term, and a visualisation of the ranking.
Terms <- c("GO:0000278","GO:0006414","GO:0022403","GO:0006415","GO:0006614",
"GO:0045047","GO:0072599","GO:0006613","GO:0000184","GO:0070972",
"GO:0006413","GO:0000087","GO:0000280","GO:0000279","GO:0006612",
"GO:0000956","GO:0048285","GO:0019080","GO:0019083","GO:0006402",
"GO:0032984","GO:0006401","GO:0072594","GO:0019058","GO:0051301",
"GO:0016071","GO:0006412","GO:0002682","GO:0022411","GO:0001775",
"GO:0046649","GO:0045321","GO:0050776","GO:0007155","GO:0060337",
"GO:0071357","GO:0034340","GO:0016032","GO:0006396","GO:0010564",
"GO:0002684","GO:0006259","GO:0051249","GO:0045087")
# Ordering of the GO-terms in the list
scoreRankingGO(goterm = Terms, domain = "BP", plot = FALSE)
#> $GO_terms_ranking
#> [1] "GO:0016032" "GO:0001775" "GO:0007155" "GO:0051301" "GO:0019080"
#> [6] "GO:0019058" "GO:0002682" "GO:0045321" "GO:0000278" "GO:0022403"
#> [11] "GO:0050776" "GO:0002684" "GO:0046649" "GO:0022411" "GO:0019083"
#> [16] "GO:0048285" "GO:0010564" "GO:0000279" "GO:0032984" "GO:0006259"
#> [21] "GO:0000280" "GO:0051249" "GO:0006401" "GO:0016071" "GO:0006412"
#> [26] "GO:0000087" "GO:0045087" "GO:0070972" "GO:0072594" "GO:0006396"
#> [31] "GO:0006413" "GO:0006414" "GO:0072599" "GO:0006612" "GO:0006415"
#> [36] "GO:0006402" "GO:0045047" "GO:0034340" "GO:0000956" "GO:0006613"
#> [41] "GO:0071357" "GO:0006614" "GO:0060337" "GO:0000184"
#>
#> $indices_of_ranking
#> [1] 38 30 34 25 18 24 28 32 1 3 33 41 31 29 19 17 40 14 21 42 13 43 22 26 27
#> [26] 12 44 10 23 39 11 2 7 15 4 20 6 37 16 8 36 5 35 9
#>
#> $score
#> [1] 0.004048583 0.012383901 0.012383901 0.014035088 0.019138756 0.021052632
#> [7] 0.026315789 0.027863777 0.029605263 0.035087719 0.046783626 0.046783626
#> [13] 0.049535604 0.056140351 0.059210526 0.073099415 0.073099415 0.078947368
#> [19] 0.087719298 0.087719298 0.105263158 0.111455108 0.118421053 0.118421053
#> [25] 0.135338346 0.140350877 0.143274854 0.157894737 0.157894737 0.171929825
#> [31] 0.184210526 0.198380567 0.214912281 0.214912281 0.234449761 0.266447368
#> [37] 0.280701754 0.280701754 0.328947368 0.336842105 0.355263158 0.426315789
#> [43] 0.438596491 0.489878543
Given a vector of GO-terms, the prioritizedGOTerms function prioritizes the GO-terms by exploiting the structure of the DAG. It starts from the GO-term on the highest level and iteratively searches all the paths to the root node. If the argument “sp” is TRUE, only shortest paths are used; otherwise all paths are used. Any GO-terms of the input vector that are found along these paths are removed, because the GO-term at the end of a path is more specific than the GO-terms along the path. If an organism is given, the GO-terms of that organism are used for the prioritization. If the organism argument is NULL, all (non-retired) GO-terms of the chosen ontology are used in the ranking.
Terms <- c("GO:0042254", "GO:0022613", "GO:0034470", "GO:0006364", "GO:0016072",
"GO:0034660", "GO:0006412", "GO:0006396", "GO:0007005", "GO:0032543",
"GO:0044085", "GO:0044281", "GO:0044257", "GO:0030163", "GO:0006082",
"GO:0044248", "GO:0006519", "GO:0009056", "GO:0019752", "GO:0043436")
# Prioritize the given biological process GO-terms
prioritizedGOTerms(lst = Terms, organism = "Human", sp = TRUE, domain = "BP")
#> $HF
#> [1] "GO:0006364" "GO:0032543" "GO:0044257" "GO:0007005" "GO:0019752"
#>
#> $rankHF
#> GO:0006364 GO:0032543 GO:0044257 GO:0007005 GO:0019752
#> 8 7 5 4 4
#>
#> $HI
#> [1] "GO:0042254" "GO:0022613" "GO:0034470" "GO:0006364" "GO:0016072"
#> [6] "GO:0034660" "GO:0006412" "GO:0006396" "GO:0007005" "GO:0032543"
#> [11] "GO:0044085" "GO:0044257" "GO:0030163" "GO:0006082" "GO:0044248"
#> [16] "GO:0006519" "GO:0019752" "GO:0043436"
#>
#> $rankHI
#> GO:0006364 GO:0034470 GO:0016072 GO:0032543 GO:0034660 GO:0006412 GO:0006396
#> 8 7 7 7 6 6 6
#> GO:0044257 GO:0042254 GO:0007005 GO:0030163 GO:0019752 GO:0022613 GO:0043436
#> 5 4 4 4 4 3 3
#> GO:0044085 GO:0006082 GO:0044248
#> 2 2 2
#>
#> attr(,"class")
#> [1] "GOxploreR" "GPrior"
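The core idea of the prioritization, removing input terms that lie on a path from another input term to the root, can be sketched in base R. The sketch follows all paths to the root (roughly the case where “sp” is not TRUE) and ignores the ranking and the organism handling, so it is a simplification rather than the package’s algorithm. The toy DAG and the helpers `ancestors` and `prioritize` are hypothetical.

```r
# Toy DAG as a (parent, child) edgelist with made-up node names
edges <- matrix(c("R", "A",  "R", "B",  "A", "C",  "B", "C",
                  "C", "D",  "A", "D",  "B", "E"),
                ncol = 2, byrow = TRUE)

# Hypothetical helper: every node on some path from `node` up to the root
ancestors <- function(node, edges) {
  found <- character(0)
  frontier <- node
  while (length(frontier) > 0) {
    parents <- unique(edges[edges[, 2] %in% frontier, 1])
    frontier <- setdiff(parents, found)
    found <- c(found, frontier)
  }
  found
}

# Hypothetical helper: drop input terms that are ancestors of other input terms
prioritize <- function(terms, edges) {
  covered <- unique(unlist(lapply(terms, ancestors, edges = edges)))
  setdiff(terms, covered)
}

kept <- prioritize(c("A", "C", "D", "E"), edges)
kept
```

In the toy input, A and C lie on paths from D to the root, so only the more specific terms D and E are kept.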
The GO4Organism function gives all the GO-terms associated with an organism and their corresponding GO-levels.
# All the biological process gene association GO-terms for Human and their GO-level
head(GO4Organism(organism = "Human", domain = "BP"))
#> GO ID Level
#> 1 GO:0008150 0
#> 2 GO:0043312 9
#> 3 GO:0002576 8
#> 4 GO:0006805 5
#> 5 GO:0009168 9
#> 6 GO:0007155 2
# All the molecular function gene association GO-terms for Mouse and their GO-level
head(GO4Organism(organism = "Mouse", domain = "MF"))
#> GO ID Level
#> 1 GO:0005524 8
#> 2 GO:0004672 5
#> 3 GO:0004679 7
#> 4 GO:0016740 2
#> 5 GO:0000166 4
#> 6 GO:0016301 4
# All the cellular component gene association GO-terms for Rat and their GO-level
head(GO4Organism(organism = "Rat", domain = "CC"))
#> GO ID Level
#> 1 GO:0016020 2
#> 2 GO:0005654 7
#> 3 GO:0005737 3
#> 4 GO:0005886 3
#> 5 GO:0030424 5
#> 6 GO:0031594 4
This vignette gave a brief overview of the functionality provided by the GOxploreR package. We showed all functions and how to use them.