BiomeNet: Database of scored functional networks enabling network biology for every sequenced genome of Earth's biome

Gene network databases have been lagging far behind the genome projects.

Advances in DNA sequencing and genome assembly technology have driven a rapid increase in the number of species with sequenced genomes. By April 2019, the Genomes OnLine Database (GOLD) reported sequenced genomes for more than 134,000 cellular organisms, including over 5,000 eukaryotic species. The annotation of every sequenced genome should be followed by functional annotation of individual genes and pathways, and constructing biological networks greatly facilitates this functional analysis by revealing interactions among genes. However, the creation of network databases has lagged far behind the genome projects. Currently the largest network database, STRING, provides gene networks for no more than 500 eukaryotic species, which is less than 10% of all sequenced eukaryotic genomes. Given the recent launch of the Earth BioGenome Project, which aims to sequence the genomes of all eukaryotic species on Earth within the next 10 years, the knowledge gap between genomes and interactomes can be expected to keep widening. This problem may be solved by public computational pipelines that automatically construct gene network models for every sequenced genome.

BiomeNet can construct a genome-scale network for every sequenced species in a few minutes.

BiomeNet is a database of scored functional networks enabling network biology for every sequenced genome of Earth’s biome. Using the protein sequences of a target species submitted by the user, the BiomeNet server identifies orthologous protein pairs from 95 source networks (comprising ~8 million links), which were previously constructed and validated for 18 species (5 animals, 6 plants, 5 bacteria, and 2 fungi). The BiomeNet “Network Builder” tool returns a newly constructed gene network within a few minutes for most target species. For the constructed network, the BiomeNet “Network Analyzer” provides two network-based tools for functional analysis: (1) subnetwork analysis for inferred pathways (GO and KEGG), and (2) gene prioritization for complex traits by guilt-by-association.
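The ortholog-based transfer described above can be illustrated with a minimal Python sketch. It is not the BiomeNet implementation: the function name `transfer_links`, the input layout, and the rule of keeping the highest score when several source links map onto the same target pair are all assumptions made for illustration. The source does not describe how scores from the 95 source networks are actually combined.

```python
def transfer_links(source_links, orthologs):
    """Project scored links from a source species onto a target species.

    source_links: iterable of (gene_a, gene_b, score) in the source species.
    orthologs:    dict mapping a source gene to a set of target-species genes
                  (hypothetical layout, assumed for this sketch).
    Returns a dict {(t1, t2): score} with t1 < t2. When several source links
    map onto the same target pair, the highest score is kept (an assumption).
    """
    target_links = {}
    for gene_a, gene_b, score in source_links:
        for t_a in orthologs.get(gene_a, ()):
            for t_b in orthologs.get(gene_b, ()):
                if t_a == t_b:
                    continue  # skip self-links created by shared orthologs
                pair = (t_a, t_b) if t_a < t_b else (t_b, t_a)
                if score > target_links.get(pair, float("-inf")):
                    target_links[pair] = score
    return target_links


# Toy example: three source links and a small ortholog table.
example_links = [("A", "B", 2.0), ("B", "C", 1.0), ("A", "C", 3.0)]
example_orthologs = {"A": {"t1"}, "B": {"t2", "t3"}, "C": {"t3"}}
example_network = transfer_links(example_links, example_orthologs)
```

In this toy case the pair (t1, t3) is supported by two source links and keeps the larger of the two scores.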

Networks built by BiomeNet for various species were assessed using biological process annotations from agriGO and were found to perform comparably to those from the STRING database for cattle (Bos taurus), grape (Vitis vinifera), wild pig (Sus scrofa), and potato (Solanum tuberosum) (see Figure 1).

Figure 1. The precision of gene networks from BiomeNet (B), STRING (S), and randomization (R) for animals and crops (cattle, grape, wild pig, and potato), estimated by the probability of finding two genes within the same agriGO biological process terms. The ability of the networks to retrieve member genes for each agriGO term was also estimated by the area under the receiver operating characteristic curve (AUROC) up to a false positive rate of 1% (FPR<0.01). *: P<0.1, **: P<0.01, ***: P<0.001
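The two measures named in the caption can be made concrete with a short Python sketch. The definitions here are assumptions rather than the procedure used for the figure: `link_precision` ignores links with an unannotated gene, and `partial_auroc` integrates a step-wise ROC curve up to `max_fpr` without rescaling the result. Both function names are hypothetical.

```python
def link_precision(links, annotations):
    """Fraction of links whose two genes share at least one annotation term.

    links:       iterable of (gene_a, gene_b) pairs.
    annotations: dict mapping a gene to a set of term identifiers.
    Links with an unannotated gene are ignored (an assumption of this sketch).
    """
    shared = total = 0
    for gene_a, gene_b in links:
        terms_a, terms_b = annotations.get(gene_a), annotations.get(gene_b)
        if not terms_a or not terms_b:
            continue
        total += 1
        if terms_a & terms_b:
            shared += 1
    return shared / total if total else 0.0


def partial_auroc(ranked_labels, max_fpr=0.01):
    """Area under a step-wise ROC curve from FPR = 0 up to max_fpr.

    ranked_labels: booleans ordered by decreasing network score, True for
    genes that belong to the term. The area is not rescaled.
    """
    n_pos = sum(1 for label in ranked_labels if label)
    n_neg = len(ranked_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    true_pos = false_pos = 0
    area = 0.0
    for is_positive in ranked_labels:
        if is_positive:
            true_pos += 1
            continue
        # Each negative moves the curve right; add a rectangle whose height
        # is the current true positive rate, clipped at max_fpr.
        prev_fpr = false_pos / n_neg
        false_pos += 1
        new_fpr = min(false_pos / n_neg, max_fpr)
        area += (new_fpr - prev_fpr) * (true_pos / n_pos)
        if false_pos / n_neg >= max_fpr:
            break
    return area
```

With `max_fpr=1.0` the second function reduces to the ordinary AUROC for a ranking without ties, which gives a simple way to check it against a library implementation.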

In addition, we found that BiomeNet networks for tobacco (Nicotiana tabacum), green foxtail (Setaria viridis), sheep (Ovis aries), and Atlantic salmon (Salmo salar), none of which is available in STRING, can effectively retrieve genes associated with complex traits such as drought responses in green foxtail and diet-change responses in rainbow trout (Figure 2). We believe BiomeNet will significantly enhance the potential benefit of fully decoded genomes of Earth’s biome in understanding and utilizing the Earth’s biodiversity.
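Retrieving trait-associated genes from a network by guilt-by-association can be sketched as follows. This is one common scoring scheme (the sum of link weights to known trait genes), shown here as an assumption. The source does not state which scoring rule BiomeNet uses, and the function name `prioritize_genes` is hypothetical.

```python
from collections import defaultdict


def prioritize_genes(network, seed_genes):
    """Rank candidate genes by their total link weight to known trait genes.

    network:    dict {(gene_1, gene_2): weight} of undirected scored links.
    seed_genes: set of genes already associated with the trait.
    Returns a list of (gene, score) sorted by decreasing score, with ties
    broken alphabetically. Seed genes themselves are not ranked.
    """
    scores = defaultdict(float)
    for (gene_1, gene_2), weight in network.items():
        if gene_1 in seed_genes and gene_2 not in seed_genes:
            scores[gene_2] += weight
        elif gene_2 in seed_genes and gene_1 not in seed_genes:
            scores[gene_1] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy example: genes "a" and "b" are known trait genes.
toy_network = {
    ("a", "b"): 1.0,
    ("a", "c"): 2.0,
    ("b", "c"): 0.5,
    ("b", "e"): 1.0,
    ("c", "d"): 4.0,
}
ranking = prioritize_genes(toy_network, {"a", "b"})
```

Gene "d" is linked only to the non-seed gene "c" and therefore receives no score, however strong that link is. Only direct neighbours of the seed genes are ranked in this scheme.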

Figure 2. The precision of gene networks for tobacco, green foxtail, sheep, and Atlantic salmon, which are not available in the STRING database. The ability of the networks to retrieve member genes for each agriGO term was also estimated by the area under the receiver operating characteristic curve (AUROC) up to a false positive rate of 1% (FPR<0.01).