Department of Physiology, Development and Neuroscience



Integration and exploitation of Big Data for human health

We are interested in the representation and exploitation of genotypic and phenotypic knowledge about disease and pathophysiology. A great deal of information is now available from public databases, patient electronic health records and formal models of physiology, which can be used to discover new knowledge and further our understanding of the etiology and management of disease. Mobilising this knowledge, which may be either symbolic or quantitative, is a critical challenge, as is combining knowledge from different model organisms with data from humans. For example, the ability to cross the species divide has long been a thorny problem, as human and model organism phenotypes are described using different formal ontologies and conceptual approaches. To address this, we are working with a series of ontologies and tools that use them, allowing the seamless integration of phenotypic data between species and knowledge sources. This involves the development and application of new tools in artificial intelligence, knowledge representation, and formal ontology. We are now applying these semantic approaches to integrate patient electronic health record data with external sources of knowledge, to produce very large knowledge graphs which can be used for graph convolutional neural network analysis, with the aim of supporting clinical decision making and diagnosis, patient management, the study of disease etiology and new approaches to therapy. This work is supported by King Abdullah University of Science and Technology, KSA and the Alan Turing Institute.

I am an adjunct Professor at the Jackson Laboratory in Bar Harbor, Maine, USA and Fellow of the Alan Turing Institute.

Data sharing initiatives

Data access and integration have become central to modern biology. Using our experience, we are developing, with the German Federal Radiation Protection agency (BfS) and the MELODI framework, a public database for primary experimental and epidemiological data from radiation biology: STORE. This database provides a platform for international data sharing and uses state-of-the-art informatics to maximise data discovery and recovery. The project is currently supported under the Radonorm project, which has received funding from the Euratom research and training programme 2019-2020 under grant agreement No 900009. An ontology supporting FAIR data in radiation science is being developed collaboratively with the GeneLab project at NASA and the University of Birmingham Centre for Computational Biology.


Prof Robert Hoehndorf, Computational Biology, King Abdullah University of Science and Technology, Saudi Arabia

Prof George Gkoutos and Dr Luke Slater, Centre for Computational Biology, College of Medical and Dental Sciences, University of Birmingham, UK

Prof Peter Robinson, Institut für Medizinische Genetik und Humangenetik Charité - Universitätsmedizin Berlin

Prof John Sundberg, The Jackson Laboratory, Bar Harbor, Maine, USA
Dr Ulrike Kulka, Bundesamt fuer Strahlenschutz, Neuherberg, Germany
Dr Jack Miller, Dr Dan Berrios, Dr Sylvain Costes, NASA Ames Laboratory, USA.


Key publications: 

Slater LT, Williams JA, Karwath A, Fanning H, Ball S, Schofield PN, Hoehndorf R, Gkoutos GV. (2021) Multi-faceted semantic clustering with text-derived phenotypes. Comput Biol Med. 2021 Sep 27;138:104904. doi: 10.1016/j.compbiomed.2021.104904. Epub ahead of print. PMID: 34600327.

Kafkas, S., Althubaiti, S., Gkoutos, G.V., Hoehndorf, R., Schofield, P.N. (2021) Linking common human diseases to their phenotypes; development of a resource for human phenomics. J Biomed Semantics. 2021 Aug 23;12(1):17. doi: 10.1186/s13326-021-00249-x. PMID: 34425897; PMCID: PMC8383460.

Abdelhakim M, McMurray E, Syed AR, Kafkas S, Kamau AA, Schofield PN, Hoehndorf R (2020) DDIEM: Drug database for inborn errors of metabolism. Orphanet J Rare Dis 15, 146.

Althubaiti, S., Karwath, A., Dallol, A., Noor, A., Alkhayyat, S.S., Alwassia, R., Mineta, K., Gojobori, T., Beggs, A., Schofield, P.N., Gkoutos, G.V., and Hoehndorf, R. (2019) Ontology-based prediction of cancer driver genes. Sci Rep 9, 17405.

Boudellioua, I., Kulmanov, M., Schofield, P.N., Gkoutos, G.V., and Hoehndorf, R. (2019). DeepPVP: phenotype-based prioritization of causative variants using deep learning. BMC Bioinformatics 20, 65.

Alghamdi, S.M., Sundberg, B.A., Sundberg, J.P., Schofield, P.N., and Hoehndorf, R. (2019). Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Sci Rep 9, 4025.

Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN, Hoehndorf R, (2017), Semantic prioritization of novel causative genomic variants, PLoS computational biology, 13:e1005500.

Gkoutos GV, Schofield PN, Hoehndorf R, (2017), The anatomy of phenotype ontologies: principles, properties and applications, Briefings in Bioinformatics, bbx035

Hoehndorf R, Schofield PN, Gkoutos GV, (2015), Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases, Scientific reports, 5, 10888

Groza T, Kohler S, Moldenhauer D et al., (2015), The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease, American Journal of Human Genetics, 97, 111-124

Hoehndorf R, Schofield PN, Gkoutos GV, (2015), The role of ontologies in biological and biomedical research: a functional perspective, Briefings in Bioinformatics, 10.1093/bib/bbv011

Hoehndorf R, Slater L, Schofield PN, Gkoutos GV, (2015), Aber-OWL: a framework for ontology-based data access in biology, BMC bioinformatics, 16, 26

Sundberg JP, Roopenian DC, Liu ET, Schofield PN, (2013), The Cinderella effect: searching for the best fit between mouse models and human diseases, The Journal of Investigative Dermatology, 133, 2509-2513

Hoehndorf R, Schofield PN, Gkoutos GV, (2011), PhenomeNET: a whole-phenome approach to disease gene discovery, Nucleic Acids Research, 39, e119

Teaching and Supervisions


Course Organiser: Human Reproduction
Lecturer: Functional Anatomy of the Body
Lecturer: MPhil in Computational Biology
Director of Studies in Veterinary Sciences, Robinson College.

Professor in Biomedical Informatics
Paul Schofield

Contact Details

+44 (0) 1223 333878, Fax: +44 (0) 1223 333840