Lude Franke, Harm van Bakel, Like Fokkens, Edwin D. de Jong, Michael Egmont-Petersen, Cisca Wijmenga
Complex Genetics Section, Department of Biomedical Genetics-Department of Medical Genetics, University Medical Centre Utrecht, Utrecht, The Netherlands
Although the majority of common diseases are complex, resulting from many different genes with weak effects, it can be assumed that often only a limited number of molecular pathways contribute to disease etiology. Linkage studies have identified a considerable number of susceptibility loci, but lag behind in pinpointing the genes that contribute to disease because these regions usually span tens of megabases. To aid in the identification of causative genes, we propose a prioritization method for positional candidate genes that assumes the majority of causative genes are functionally closely related.
We used a Bayesian approach to generate a gene network based on data from Gene Ontology (GO), KEGG, BIND, HPRD, Reactome, a dataset of approximately 70,000 predicted protein-protein interactions (Lehner and Fraser, 2004), 3,000 predicted human protein-protein interactions (Stelzl et al., 2005), and co-expression data derived from approximately 10,000 human microarray experiments stored in the Gene Expression Omnibus and the Stanford Microarray Database.
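One common way to integrate heterogeneous evidence in a Bayesian fashion is to multiply per-source likelihood ratios into the prior odds (a naive Bayes combination). The sketch below illustrates that idea only; it assumes the sources are conditionally independent, and the function name, the prior and the example ratios are hypothetical rather than taken from Prioritizer.

```python
def posterior_interaction_probability(prior, likelihood_ratios):
    """Combine independent evidence sources into a posterior probability.

    prior: assumed prior probability that two genes are functionally related.
    likelihood_ratios: one ratio P(evidence | related) / P(evidence | unrelated)
        for each data source that has evidence on this gene pair.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)      # prior odds
    for ratio in likelihood_ratios:
        odds *= ratio                 # naive Bayes: multiply the ratios
    return odds / (1.0 + odds)        # posterior odds back to a probability


# Hypothetical gene pair supported by three sources (all values are made up).
example = posterior_interaction_probability(0.001, [20.0, 5.0, 3.0])
```

A gene pair with no evidence keeps its prior probability, and each additional supporting source raises the posterior, which can then serve as an edge weight in the network.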
We used the gene network to analyze 96 heritable disorders for which at least three contributing disease genes have been identified. We constructed artificial susceptibility loci around each disease gene, containing 50, 100, 150 or 200 genes, and used a graph-theoretic measure to relate positional candidate genes in different loci to each other.
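Constructing an artificial locus can be pictured as cutting a window of consecutive genes out of a chromosome. The sketch below is an assumption about how such a window might be chosen (centred on the disease gene and shifted inward at chromosome ends); the function name and the input format are hypothetical.

```python
def artificial_locus(ordered_genes, disease_gene, size):
    """Return a window of `size` consecutive genes that contains `disease_gene`.

    ordered_genes: gene identifiers of one chromosome in genomic order.
    The window is centred on the disease gene where possible and shifted
    inward at the chromosome ends (an assumption of this sketch).
    """
    if size > len(ordered_genes):
        raise ValueError("locus size exceeds the number of genes available")
    index = ordered_genes.index(disease_gene)
    start = index - size // 2
    # Clamp the window so it never runs off either end of the chromosome.
    start = max(0, min(start, len(ordered_genes) - size))
    return ordered_genes[start:start + size]


# Hypothetical chromosome with 300 genes; build a 100-gene locus around G150.
genes = [f"G{i}" for i in range(300)]
locus = artificial_locus(genes, "G150", 100)
```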
Finally, we determined an empirical p-value for each gene, which was used to rank the positional candidate genes within each locus for each disorder.
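An empirical p-value of this kind is typically the fraction of permutation scores that reach or exceed the observed score. The sketch below shows that calculation and the subsequent ranking; the function names are hypothetical, and the add-one correction is a common convention assumed here, not a confirmed detail of the method.

```python
def empirical_p_value(observed_score, null_scores):
    """Fraction of permutation scores at least as large as the observed one.

    Adding 1 to numerator and denominator avoids reporting p = 0 when no
    permutation reaches the observed score (assumed convention).
    """
    exceed = sum(1 for score in null_scores if score >= observed_score)
    return (exceed + 1) / (len(null_scores) + 1)


def rank_locus(p_values, locus_genes):
    """Order the genes of one locus from most to least significant."""
    return sorted(locus_genes, key=lambda gene: p_values[gene])
```

With 10,000 permutations the smallest p-value this estimator can report is 1/10,001.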
Overview | Basic principle of the positional candidate gene prioritization method using gene networks. The figure depicts three different gene-gene interaction data sources that are integrated in a Bayesian way. After integration of the data sources, the actual gene network is constructed. As an example, all genes are assigned an initial score of 0, and three different susceptibility loci, each containing a disease gene (P, Q or R) and two non-disease genes, are analyzed. Per locus, the three positional candidate genes increase the scores of genes that are functionally nearby within the gene network, using a kernel function that models the relationship between gene-gene distance and score effect. Once all loci have been processed, shuffling the three susceptibility loci 10,000 times across the genome allows an empirical p-value to be determined per gene, which is then used to rank the positional candidate genes per locus. Genes P, Q and R should then end up as the top-ranked genes, as they have the most significant p-values.
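The scoring step in this overview can be sketched as follows. This is an illustration under stated assumptions, not the Prioritizer implementation: distance is taken as the number of hops in an unweighted network, the kernel is assumed to be Gaussian, only candidates in a different locus receive score, and all names and the toy network are hypothetical.

```python
from collections import deque
import math


def shortest_distances(network, source):
    """Breadth-first search: hop distance from `source` to every reachable gene."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for neighbour in network.get(gene, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[gene] + 1
                queue.append(neighbour)
    return distances


def score_genes(network, loci, bandwidth=1.0):
    """Let the genes of each locus raise the scores of nearby genes in other loci."""
    locus_of = {gene: i for i, locus in enumerate(loci) for gene in locus}
    scores = {gene: 0.0 for gene in locus_of}  # every candidate starts at 0
    for i, locus in enumerate(loci):
        for gene in locus:
            for other, d in shortest_distances(network, gene).items():
                # Only candidates located in a different locus receive score.
                if other in locus_of and locus_of[other] != i:
                    # Gaussian kernel (assumed): closer genes contribute more.
                    scores[other] += math.exp(-(d / bandwidth) ** 2)
    return scores


# Toy network: P, Q and R are mutually connected; A and B share one edge.
network = {"P": ["Q", "R"], "Q": ["P", "R"], "R": ["P", "Q"],
           "A": ["B"], "B": ["A"]}
loci = [["P", "A", "X"], ["Q", "B", "Y"], ["R", "C", "Z"]]
scores = score_genes(network, loci)
```

In this toy example P, Q and R obtain the highest score within their respective loci, because each is one hop away from a candidate in both other loci. Repeating the scoring on shuffled loci would yield the null distribution from which the p-values are derived.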
Screenshot | Prioritizer showing the results of the analysis of Turcot syndrome, a malignant tumor of the central nervous system, in which three disease genes (PMS2, APC and MLH1) have been implicated.
For 43% of the disease genes, the analysis of the loci using the gene network performed well: the true disease genes were ranked within the top 10 per artificial linkage region when each region contained 100 genes. Compared to a previous method (Turner et al., 2003), in which only 12% of the disease genes were correctly returned, our method is somewhat less specific but performs much better in identifying the correct disease genes.
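A top-10 figure like the one above can be computed from per-locus rankings as the share of known disease genes that fall within the first k positions. The sketch below shows one way to do so; the function name and input format are hypothetical.

```python
def fraction_in_top_k(ranked_loci, disease_genes, k=10):
    """Share of disease genes ranked within the top `k` of their locus.

    ranked_loci: list of gene lists, each ordered from best to worst rank.
    disease_genes: set of known disease genes.
    """
    hits = total = 0
    for ranking in ranked_loci:
        for position, gene in enumerate(ranking, start=1):
            if gene in disease_genes:
                total += 1
                hits += position <= k  # True counts as 1
    return hits / total if total else 0.0
```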
We have shown that by assuming that disease genes in a specific disorder are usually functionally related, we are capable of substantially enriching for true disease genes when analyzing susceptibility loci. This method therefore could be valuable for analyzing common disease loci in which the contributing disease genes have not yet been identified.
The resulting program (Prioritizer) allows for the analysis and visualization of user-defined susceptibility loci, and will be available soon on this website.
Version 1.2, 2006/07/28 | This release can now also generate a text report file.
Version 1.1, 2006/05/04 | This release contains some small bug fixes.
Version 1.0, 2006/12/15 | Initial release
Click on the Windows / Linux or Mac OS X Prioritizer icon to start the download:
Download Mac OS X version (298 MB). Includes the GO + MA + PPI + TP Network.
Download Windows, Linux and Unix compatible version (258 MB). Includes the GO + MA + PPI + TP Network.
Please let us know!
If you download Prioritizer please drop an e-mail to let us know that you downloaded and used it. We can then use this information to assess whether it is worthwhile to keep on developing Prioritizer.
Installation is straightforward. When using Mac OS X, mount the disk image by double-clicking it, and then double-click the Prioritizer icon, after which the program should start automatically. When using Windows, Linux or Unix, unzip the downloaded file, e.g. using WinZip, after which you can run Prioritizer either through Prioritizer.bat (when using Windows) or through Prioritizer.sh (when using Linux or Unix; please ensure this file is executable).
Warning: Please be aware that the unzipped files take up roughly 700 MB of disk space. Additionally, when you want to determine accurate, topology-corrected p-values within Prioritizer, the program uses 610 MB of internal memory, which requires at least 768 MB of total internal memory when all other active programs are closed. It is recommended that you have at least 1 GB when performing this type of analysis. If you do not have this amount of memory, you can still prioritize positional candidate genes within Prioritizer.
Prioritizer requires Java 1.5 or higher. Listed here is how to ensure you have a proper Java version installed:
- Windows 98, 2000, XP and Linux: Please make sure you have a recent Java (JDK or JRE) installation by opening the ‘Command Prompt’ (or a terminal on Linux) and issuing the command ‘java -version’ (without the quotes). If the Java version is 1.5.0 or higher, your Java installation is appropriate for use with Prioritizer. If not, head to java.sun.com to download the most current version.
- Mac OS X 10.4 Tiger: Prioritizer works on Mac OS X 10.4 Tiger when Java 1.5 is installed. Please use ‘Software Update’ to get all updates; this will also give you the latest Java 1.5 release.
- Franke, L., H. van Bakel, L. Fokkens, E.D. de Jong, M. Egmont-Petersen, and C. Wijmenga. 2006. Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet 78: 1011-1025.
- Lehner, B. and A.G. Fraser. 2004. A first-draft human protein-interaction map. Genome Biol 5: R63.
- Turner, F.S., D.R. Clutterbuck, and C.A. Semple. 2003. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol 4: R75.
- Dudbridge, F. and B.P. Koeleman. 2004. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am J Hum Genet 75: 424-435.
- Stelzl, U., U. Worm, M. Lalowski, C. Haenig, F.H. Brembeck, H. Goehler, M. Stroedicke, M. Zenkner, A. Schoenherr, S. Koeppen, J. Timm, S. Mintzlaff, C. Abraham, N. Bock, S. Kietzmann, A. Goedde, E. Toksoz, A. Droege, S. Krobitsch, B. Korn, W. Birchmeier, H. Lehrach, and E.E. Wanker. 2005. A human protein-protein interaction network: a resource for annotating the proteome. Cell 122: 957-968.