|
University of Glasgow scientists have harnessed a powerful supercomputer to develop a new machine learning model that can help translate the language of proteins.
In a new study, published in Nature Communications, the cross-disciplinary team developed a large language model (LLM), called PLM-interact, to better understand protein interactions and even predict which mutations will affect how these crucial molecules ‘talk’ to one another.
Early tests of PLM-interact, a protein language model (PLM), show that it outperforms competing models in understanding and predicting how proteins interact with one another. The team’s research demonstrates that PLM-interact could help us better understand key areas of medical science, including the development of diseases such as cancer and viral infections.
The research team, led by Dr Ke Yuan from the University’s School of Cancer Sciences and the Cancer Research UK Scotland Institute, Prof Craig Macdonald from the School of Computing Science and Prof David L Robertson from the MRC-University of Glasgow Centre for Virus Research (CVR), is developing these types of AI model to add much-needed detail on how diseases arise.
PLM-interact could also provide new insight into how viruses interact with their host species. In the future, this approach could even be used to predict a virus’s pandemic potential and identify new drug targets.
Proteins are the main structural components of all cells and viruses and play a key role in biological processes by interacting with other proteins. Disruption of these protein-to-protein interactions (PPIs) is often linked with disease formation, including cancers and genetic diseases. Protein-to-protein interactions also play an important role in viral infections, with viruses relying on the proteins in our cells to help them replicate and continue the infection process.
A better understanding of protein interactions would offer scientists vital new insights into disease and infections, potentially paving the way for the development of new therapies or vaccines. However, identifying protein-to-protein interactions experimentally is currently both costly and time-consuming, and new ways to speed up the learning process are required.
Dr Ke Yuan, one of the paper’s corresponding authors, said: “It’s great to think that DiRAC, which was developed to help scientists understand the laws of nature from the smallest subatomic particles to the largest scales in the Universe, has helped us build this new model to explore the inner space of protein interactions instead.”
The research team also trained PLM-interact with a further 22,383 protein-to-protein interactions, this time from 5,882 human and 996 virus proteins. Once again, PLM-interact outperformed existing protein models in its ability to predict how human and virus proteins interact, demonstrating the model’s power as an accurate tool for predicting virus-host protein interactions.
Prof David L Robertson, head of CVR Bioinformatics, University of Glasgow, and the paper’s other corresponding author, said: “The urgency to understand virus-host interactions during the Covid-19 pandemic is a good illustration of why a tool like PLM-interact could be invaluable in the future. Being able to quickly and accurately gain insight into how viruses interact with our proteins could help us better understand virus emergence and disease risks, which in turn can help speed up the development of new treatments and therapies.”
The study, ‘PLM-interact: extending protein language models to predict protein-protein interactions’, is published in Nature Communications. The work was funded by the European Union’s Horizon 2020 research and innovation programme and the Medical Research Council, with support from Cancer Research UK, Prostate Cancer UK and the Biotechnology and Biological Sciences Research Council.
|