Proteins with a single transmembrane domain (TMD) make up ~30% of all membrane proteins. Their interaction and oligomerisation in the membrane are vital for many biological processes. Due to a lack of crystal structures, their interaction interfaces are poorly understood. Here we present THOIPA (Transmembrane Homodimer Interface Prediction Algorithm), an algorithm that predicts interfacial residues from sequence data alone.

How THOIPA works

1) The full-length sequence is used to obtain homologous sequences for the protein, using BLAST.

2) Based on the input TMD, the TMD region of each homologue is identified, extracted and combined into a multiple sequence alignment.

3) Parameters such as sequence conservation, hydrophobicity, and residue co-variation are extracted from the multiple sequence alignment.

4) The parameters are used as the input for a machine learning algorithm, previously trained against interfaces derived from crystal structures and NMR studies.

5) For each residue, the classifier model predicts the probability that it is an interface residue.
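
To make step 3 more concrete, the following is a minimal sketch of how two per-residue parameters could be computed from an aligned set of TMD sequences. It is an illustration only, not THOIPA's actual code: the function name `column_features`, the use of normalised Shannon entropy as the conservation measure, the Kyte-Doolittle hydropathy scale, and the toy alignment are all assumptions made for this example.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale (assumed here; THOIPA may use another scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_features(alignment):
    """Return one (conservation, mean hydrophobicity) pair per alignment column.

    Hypothetical helper: conservation is 1 minus the Shannon entropy of the
    column, normalised by log2(20), so 1.0 means a fully conserved position.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    features = []
    for i in range(length):
        # Ignore gaps and non-standard characters in this column.
        residues = [seq[i] for seq in alignment if seq[i] in KD]
        if not residues:
            features.append((0.0, 0.0))
            continue
        n = len(residues)
        counts = Counter(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        conservation = 1.0 - entropy / math.log2(20)
        hydrophobicity = sum(KD[r] for r in residues) / n
        features.append((conservation, hydrophobicity))
    return features


# Toy alignment of a 5-residue TMD fragment and three homologues (made up).
toy_alignment = ["LIVGA", "LIAGA", "LVVGS", "L-VGA"]
features = column_features(toy_alignment)
for position, (cons, hyd) in enumerate(features, start=1):
    print(f"position {position}: conservation={cons:.2f}, hydrophobicity={hyd:.2f}")
```

In a full pipeline, a table of such per-residue values (together with co-variation scores) would form the feature matrix passed to the trained classifier in steps 4 and 5.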