Linear schematic of an E4 protein

The E4 protein is the most abundant of the papillomavirus proteins. Although initially designated as an E (early) protein, it is expressed at high levels in differentiated cells and is primarily a late protein. E4 proteins from some papillomaviruses form inclusion bodies and can interact with and collapse keratin filaments. This collapse might aid virion release by producing fragile cornified envelopes. E4 expression also causes a G2 arrest that might provide an ideal environment for amplification of viral DNA in differentiating cells. E4 proteins are the most divergent papillomavirus gene products, and they range in length from 70 to over 300 residues. The E4 proteins are expressed primarily from a spliced message that joins the first few amino acids of the E1 protein to the E4 ORF. The E4 ORF overlaps the region of the genome that encodes the E2 hinge region. The N-terminal region of many E4 proteins contains a conserved leucine motif (LLXLL) and a proline cluster that are important for keratin association and cell cycle arrest, respectively. A C-terminal domain of E4 mediates multimerization. E4 proteins are progressively cleaved at the N-terminus, giving rise to several species of shorter E4 proteins.
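As an illustration of the leucine motif described above, the following is a minimal sketch of how LLXLL occurrences could be located in a protein sequence. The function name find_leucine_motifs is hypothetical and is not part of any PaVE tool. X is treated here as any single letter, which is a simplifying assumption.

```python
import re

# LLXLL: two leucines, any one residue, two leucines.
# The lookahead lets overlapping occurrences be reported as well.
# Assumption: X is matched as any single letter A-Z.
LLXLL = re.compile(r"(?=(LL[A-Z]LL))")


def find_leucine_motifs(protein):
    """Return (0-based offset, matched residues) for each LLXLL occurrence."""
    return [(m.start(), m.group(1)) for m in LLXLL.finditer(protein.upper())]
```

A hit near the N-terminus would be consistent with the motif placement described above, but a sequence match alone does not demonstrate keratin association.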

Prediction of the E1^E4 splice product

The viral E4 ORF is embedded within the E2 coding region and is usually encoded in the +1 frame. During annotation, PaVE uses published evidence (reviewed in Doorbar, 2013) as well as splice-site prediction algorithms, specifically a combination of the Spliceview (Rogozin and Milanesi, 1997) and ASSP (Wang and Marin, 2006) methods. Finally, homology-based approaches are used.
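Once a splice donor and acceptor have been chosen, the E1^E4 product can be assembled by joining the two exons and translating the result. The sketch below shows only that final step. It does not implement Spliceview, ASSP, or PaVE's annotation pipeline. The function names and the coordinate convention (1-based, inclusive, on a linearized genome with no wrap across the origin) are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def splice_e1_e4(genome, e1_start, donor_end, acceptor_start, e4_end):
    """Join the E1 5' exon to the E4 exon.

    Coordinates are 1-based and inclusive (an assumption of this sketch):
    e1_start..donor_end is the first exon, starting at the E1 ATG;
    acceptor_start..e4_end is the second exon, through the E4 stop codon.
    """
    exon1 = genome[e1_start - 1:donor_end]
    exon2 = genome[acceptor_start - 1:e4_end]
    return exon1 + exon2


def translate(cds):
    """Translate from the first base, stopping at the first stop codon.

    Codons containing anything other than A, C, G, T become 'X'.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For a real genome, the donor and acceptor positions would come from the published evidence or the predictions described above. A translation that starts with the first few E1 residues and continues in the E4 frame without a premature stop is a useful sanity check on the chosen sites.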