PaVE


The PapillomaVirus Episteme (PaVE) has been established to provide highly organized and curated papillomavirus genomics information and tools to the scientific community. The PaVE consists of a database and web applications that support the storage, annotation, analysis, and exchange of information. To the extent possible, the PaVE adopts an open source software approach and emphasizes integration and reuse of existing tools. The PaVE currently contains 853 annotated papillomavirus genomes (including 254 non-reference genomes), 11338 genes and regions, 6687 protein sequences, and 108 protein structures, which users can explore, analyze or download. In addition, because of recent advances in next-generation sequencing, several putative novel genomes have been described that do not meet all of the requirements for recognition as a novel type (see de Villiers et al., 2004 and Bernard et al., 2010), and will therefore not be recognized as novel viral types by the International Human Papillomavirus Reference Center. In order to reflect the known papillomavirus diversity, PaVE has chosen to include viruses that meet the "90% sequence identity" rule, even if they do not meet the other criteria. These viruses are identified by the suffix "nr". For more details refer to the Taxonomy Concept page. The seamless integration of the data and the analytical tools is designed to assist in accelerating scientific progress and ultimately in our understanding, detection, diagnosis, and treatment of diseases caused by papillomaviruses.

Many original papillomavirus sequences contained errors that are still present in the original Genbank sequence record and the RefSeq records. These records were rectified based on updated sequences submitted to the Los Alamos Papillomavirus resource. Some other genomes contain mutations or errors that disrupt major ORFs. In the interest of developing an accurate Reference Clone, if these mutations or errors are not present in multiple variants of the same viral type, they have been corrected. Revisions are noted in the Refclone table. The PaVE Reference Genomes (HPV_REF) have been re-annotated for uniformity.

If you use the PaVE website to assist in research publications or proposals, please cite both the URL (pave.niaid.nih.gov) and the following paper: "The Papillomavirus Episteme: a major update to the papillomavirus sequence database." Koenraad Van Doorslaer, Zhiwen Li, Sandhya Xirasagar, Piet Maes, David Kaminsky, David Liou, Qiang Sun, Ramandeep Kaur, Yentram Huyen and Alison A. McBride.

You can view papers that reference PaVE here.

Disclaimer: The U.S. Government does not warrant or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, data, documents or software available from this server.

Database Search

How does the Keyword search work?

DataTables has a built in search algorithm referred to as "smart" searching. A smart search in DataTables provides the following abilities:

Match words out of order. For example if you search for Allan Fife it would match a row containing the words Allan and Fife, regardless of the order or position that they appear in the table.
Partial word matching. As DataTables provides on-the-fly filtering with immediate feedback to the user, parts of words can be matched in the result set. For example All will match Allan.
Preserved text. DataTables 1.10 adds the ability to search for an exact phrase by enclosing the search text in double quotes. For example, "E1 " will match only text that contains E1 followed by a space or by nothing. It will not match E1^E4. However, "E1" will still match E1^E4.
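The three rules above can be approximated in a few lines of Python. This is only a sketch of the behavior described here, not the DataTables implementation; the function name smart_match is hypothetical, and matching is assumed to be case-insensitive.

```python
import re


def smart_match(query: str, cell_text: str) -> bool:
    """Approximate the smart-search rules described above (illustrative only)."""
    # A trailing space is appended so that a quoted phrase ending in a space
    # can also match at the very end of the text ("a space or nothing").
    haystack = cell_text.lower() + " "
    # Double-quoted phrases are kept intact; everything else is split on whitespace.
    tokens = re.findall(r'"([^"]*)"|(\S+)', query.lower())
    for phrase, word in tokens:
        term = phrase if phrase else word
        # Every term must occur somewhere in the text, in any order.
        if term and term not in haystack:
            return False
    return True


if __name__ == "__main__":
    print(smart_match("Allan Fife", "Fife, Allan"))  # words out of order
    print(smart_match("All", "Allan"))               # partial word
    print(smart_match('"E1 "', "E1^E4"))             # quoted phrase with trailing space
    print(smart_match('"E1"', "E1^E4"))              # quoted phrase without trailing space
```

Each search term is tested independently, which is why word order does not matter, and plain substring tests give partial word matching for free.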

How do I search the PaVE database by Taxonomy?

PaVE features a taxonomy search function. To search based on taxonomy, click on "Search Database" under the "Search" tab in the PaVE web site top navigation. In the "Select Filters" list, click on "Taxonomy" to open the taxonomy filter modal window. To select a taxonomic group including all of the subgroups below it, simply click on the check box next to the group name. You may select multiple taxonomic groups in this way. Scroll up and down to see all of the groups. To make finer selections specific to taxonomic subgroups, select or deselect subgroups. When all of your selections have been made, click the "Done" button at the top of the window. The search filter will instantly be applied to the search results. Each selection that you made will appear in the "Taxonomy" chip at the top of the search results table to indicate that your filter is applied.

How can I remove applied filters?

The fastest way to clear a specific filter is to close the filter chip listed at the top of the table. Filters can also be deselected in the filter lists if you don't want to remove all selections for a given filter.

How do I select sequences for download?

You can select rows from the table by clicking on them! If you want to select all of the displayed rows in the table, use the "Select All" link at the top of the table. The sequences can then be downloaded by selecting the desired format from the "Download" menu.

Phylogenetic Tree

How can I select subtrees?

You can select subtrees in the phylogenetic tree by:

Selecting a genus in the legend. Each genus selected will be highlighted in the tree. Multiple genera can be selected, but they will all be highlighted in the same color.
Clicking on a node (the little circles at the branch junctions) and selecting what to highlight.

Can I highlight multiple groups in different colors?

All selections made from the legend are highlighted in the same color. Selections made by clicking the nodes within the tree are made in a different color from the legend selections, but all node selections are highlighted in the same color.

Can I hide unwanted genera?

Hiding all branches for a given genus based on selections from the legend is not possible, as the genera are often interleaved on parental branches, but branches in the tree can be hidden by clicking on the node circles and selecting "Collapse subtree."

Why did I get an error when I tried to generate a tree?

There are several reasons that phylogenetic tree generation can fail. If the error says the "Jukes-Cantor-like Mk-model distances cannot be computed when only variable sites are present", the issue is that there is too much variability in the input sequences; they aren't related enough to form a phylogenetic tree. Additional FAQs related to PAUP*, the program used by PaVE to generate the phylogenetic tree, can be found in the official PAUP* documentation.
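As a rough pre-check before submitting sequences for tree generation, one can count how many alignment columns are identical across all sequences. The sketch below is an assumption-laden illustration: the function name count_constant_sites is hypothetical, the input is assumed to be already aligned (equal-length strings), and PAUP* may define constant sites differently (for example in how it treats gaps or ambiguity codes).

```python
def count_constant_sites(aligned: list[str]) -> int:
    """Count alignment columns in which every sequence has the same residue."""
    if not aligned:
        return 0
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    constant = 0
    # zip(*aligned) iterates over the alignment column by column.
    for column in zip(*aligned):
        residues = {ch.upper() for ch in column}
        # A column of gaps only is not counted as a constant site here.
        if len(residues) == 1 and "-" not in residues:
            constant += 1
    return constant


if __name__ == "__main__":
    alignment = ["ACGT", "ACGA", "ACTT"]
    print(count_constant_sites(alignment))
```

A result of zero means every column varies, which corresponds to the situation named in the error message.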

BLAST

How do I run a PV specific BLAST search?

All BLAST searches in PaVE are run against our curated set of genomes. There are two ways to run a BLAST search. Either click on the "Search" tab in the PaVE web site top navigation and select "PV Specific BLAST", or start from a feature in the Locus Viewer: go to the "Search Database" page under the "Search" tab, click on the name of a genome, select a feature, and then select BLAST. In the second case the form will be filled in with the feature's sequence.

What options does PaVE run BLAST with?

While the BLAST+ programs provide many options, PaVE only uses the '-evalue' and '-outfmt' flags to provide the user selected E-value to BLAST and request the output formats that we use for display and download. All other options are run with the default values for the selected program. This should work well for most searches, but is not ideal for all.
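For users who want to reproduce a comparable search locally, the following sketch assembles a BLAST+ command line with only those two tunable flags. The -query and -db options are standard BLAST+ options, but how PaVE supplies the query and database is not stated here, so they are assumptions; the file name query.fasta, the database name pave_genomes, the helper name build_blast_command, and the choice of tabular output (-outfmt 6) are all hypothetical.

```python
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_blast_command(program: str, query_path: str, db_name: str,
                        evalue: float, outfmt: int = 6) -> list[str]:
    """Build an argument list for a BLAST+ program; nothing is executed here."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unknown BLAST+ program: {program}")
    return [
        program,
        "-query", query_path,     # assumed: query read from a FASTA file
        "-db", db_name,           # assumed: a locally formatted database
        "-evalue", str(evalue),   # user-selected E-value cutoff
        "-outfmt", str(outfmt),   # output format (6 = tabular)
    ]


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    cmd = build_blast_command("blastn", "query.fasta", "pave_genomes", 1e-10)
    print(" ".join(cmd))
```

The list can be passed to subprocess.run on a machine where BLAST+ is installed; every option not listed keeps the program's default value.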

What format does the input data need to be in for BLAST?

PaVE's BLAST form only accepts FASTA formatted input. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line (defline) is distinguished from the sequence data by a greater-than (">") symbol at the beginning. Example:

>GENE X PROTEIN 
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP

Blank lines are not allowed in the middle of FASTA input, but a blank line is allowed between sequences in the input.

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters. Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
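The input rules above can be checked before submission with a small parser. This is a minimal sketch, not PaVE's own validation code: the function name parse_fasta is hypothetical, and it enforces only the rules stated here (description line first, no blank line inside a sequence, no digits, lower case mapped to upper case).

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (description, sequence) pairs, upper-casing residues."""
    records: list[tuple[str, list[str]]] = []
    blank_seen = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            blank_seen = True
            continue
        if line.startswith(">"):
            # A new description line starts a new record; a blank line
            # directly before it is allowed.
            records.append((line[1:].strip(), []))
            blank_seen = False
            continue
        if not records:
            raise ValueError("sequence data found before the first '>' line")
        if blank_seen:
            raise ValueError("blank line in the middle of a sequence")
        if any(ch.isdigit() for ch in line):
            raise ValueError("remove digits or replace them with N or X")
        records[-1][1].append(line.upper())
    return [(name, "".join(parts)) for name, parts in records]


if __name__ == "__main__":
    sample = ">GENE X PROTEIN\nqikdllvsss\nTDLDTTLVLV\n\n>second entry\nAC-GU*\n"
    for name, seq in parse_fasta(sample):
        print(name, seq)
```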

How do I specify a "homology cutoff value" for my BLAST results?

Use the E-value cutoff selection in PaVE's BLAST search window. Click on the "Search" tab in the PaVE web site top navigation, then select "PV Specific BLAST". Under the BLAST Search tab, you will see the E-value cutoff option. The E-value is the number of hits with a similar or better score that would be expected by chance alone; thus the lower the E-value, the more significant your BLAST hit. To set the filter to a higher stringency (and return fewer, higher quality results), lower the value. The E-value cutoff options are 10, 0.1, 1e-05, and 1e-10. By default, the value is set to "1e-10".
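The effect of the cutoff can be illustrated with a short filter over a list of hits. This is a sketch only: the hit names and E-values in the usage example are made up for illustration, the helper name filter_hits is hypothetical, and BLAST applies the cutoff itself during the search rather than as a separate step.

```python
ALLOWED_CUTOFFS = (10, 0.1, 1e-05, 1e-10)


def filter_hits(hits: list[tuple[str, float]],
                cutoff: float = 1e-10) -> list[tuple[str, float]]:
    """Keep hits whose E-value does not exceed the cutoff, best hits first."""
    if cutoff not in ALLOWED_CUTOFFS:
        raise ValueError(f"cutoff must be one of {ALLOWED_CUTOFFS}")
    kept = [hit for hit in hits if hit[1] <= cutoff]
    # Sort by ascending E-value so the most significant hits come first.
    return sorted(kept, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Made-up hits: (subject name, E-value).
    hits = [("hitB", 2e-07), ("hitC", 0.5), ("hitA", 3e-50)]
    for cutoff in ALLOWED_CUTOFFS:
        print(cutoff, [name for name, _ in filter_hits(hits, cutoff)])
```

Lowering the cutoff can only shrink the list, which is the sense in which a lower value is more stringent.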

I've submitted a BLAST query and I see a list of results, how do I view the BLAST sequence hits?

BLAST results on PaVE are returned in a table format sorted by E-value. To view the sequence query aligned to the BLAST hit, click on the plus icon on the left hand side of the results table to open the alignment results.

L1 Typing Tool

What is the L1 typing tool?

The L1 typing tool is designed to assist users in gauging the probability that their sequence belongs to a novel PV type based on the similarity of its L1 gene sequence to current PV types in the PaVE database. The tool was developed by Piet Maes, at the Katholieke Universiteit Leuven. The tool generates alignments with MAFFT and uses PAUP* to construct the phylogenetic tree.

Users should refer to the Submission page for details regarding the official type assignment process.

What format does the input data need to be in for the L1 typing tool?

PaVE's L1 typing tool only accepts FASTA formatted input of putative L1 genes. It is recommended that users not upload more than 10 sequences at a time. The more sequences the user submits, the longer the user will need to wait for a result.

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line (defline) is distinguished from the sequence data by a greater-than (">") symbol at the beginning. Example:

>L1 gene new type
atgactttgtggctgccaacgacgggtaaagtatacttgcctccaacaccaccagtagcccgggtgcaaagcacggatgattatgtggaaagaacaagtgtgttctatcatgctatgagtgatcgtctactaactgtaggacacccattttatgatgtgagatccagtgatggctcaacaattgaggttcctaaagtctcaggaaatcaatatagagcttttagggtccgtttaccggatccgaataaatttgctttagcagacatgtcagtctataacccagaaaaagaaagattagtttgggcttgtgcaggcttggagataggccgaggacagccacttggagtaggtacatcaggccatcccttatttaataaattaagggatacagaaaacaatagtaattatcagggtgggtcacgggacgacagacagaacacatcttttgatccaaaacaggttcaaatgtttgttgtaggatgtgttccatatatgggagaacattgggataaagcacctgtttgtgcatccgaaaaaaataatcaaagagggctatgtccaccactagaactaaaaaatacagtaatagaagatggggatatgtttgacatagggtttgggaatattaataataaagagctttccattaacaagtcagatgttagtttagatatagtaaatgaaatatgcaaataccctgactttttaacaatgtctaatgatgtttatggggacgcatgtttcttttttgccagaagagagcaatgttatgccagacattattttgtaaggggaggtaatgtaggtgatgctattcccgatggcactgttaatcaggaccacaaatattacttgcctgccaaatcagaccaacaacagtatattttaggcaattctacttattttcccactgttagtggttctttggtaacctcagatgcacaactttttaataggcctttttggttacgtagagcacaaggacacaataatggtatactgtggggaaaccagatatttattacagttgctgacaatacaagaaacaccaacttttccattagtgtttccactgaagatggaccagttacagaatacaatgctcagcaaataagagaatatttaagacatgttgaagaatatcaactatcatttattttacagctttgtaaagtatctttaaaggctgaggtcttaacgcaaattaatgcaatgaattctgatatattggaggattggcaattaggatttgtacctactccagataattcagtacatgatttgtataggtacattagttccaaggctactaaatgtcctgatgctgctgtagaaaaagaaagagaagatccctttggaaaatacacgttttggaatgtagatttaagtgaaaagttatccttagatttagatcaatatcctttaggaaggaaattcttatttcagtctggattgcaaactagacctagaattgtacgatcctctgtaaaagtgtccaaaggcacaaagcgtaaacggtcgtga

Blank lines are not allowed in the middle of FASTA input, but a blank line is allowed between sequences in the input.

Sequences are expected to be represented in the standard IUB/IUPAC nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length. Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue).
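A quick check of L1 typing input against these rules might look like the sketch below. It is not the tool's own validation: the function name check_l1_input is hypothetical, the accepted alphabet is assumed to be the standard IUPAC nucleotide codes plus the gap character, and the limit of 10 reflects the recommendation above rather than a hard limit.

```python
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
RECOMMENDED_MAX_SEQUENCES = 10


def check_l1_input(fasta_text: str) -> list[str]:
    """Return a list of problems found in the input; an empty list means none."""
    problems: list[str] = []
    n_records = 0
    for line_no, line in enumerate(fasta_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_records += 1
            continue
        # Lower case is accepted, so compare in upper case; "-" marks a gap.
        bad = sorted(set(line.upper()) - IUPAC_NUCLEOTIDES - {"-"})
        if bad:
            problems.append(f"line {line_no}: unexpected characters {''.join(bad)}")
    if n_records == 0:
        problems.append("no '>' description line found")
    if n_records > RECOMMENDED_MAX_SEQUENCES:
        problems.append(
            f"{n_records} sequences submitted; "
            f"{RECOMMENDED_MAX_SEQUENCES} or fewer are recommended"
        )
    return problems


if __name__ == "__main__":
    print(check_l1_input(">L1 gene new type\natgactttgtggctgccaacg\n"))
    print(check_l1_input(">numbered\n1 atgact 7 ttgtgg\n"))
```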

Locus Viewer

When I am observing a sequence record in Locus View, how can I display the sequence of a particular feature?

Click on your feature of interest (coding sequence, origin of replication, source, etc.). The pane "Selected Feature Details" will display detailed information about the feature that you have selected. The pane "Selected Sequence Details" will display the entire locus sequence with your selected feature highlighted.

Structure Viewer

What is displayed on the initial page load?

There are four (4) main components on the Structure Viewer page. The first component you see is the Mol* structure viewer. A detailed user guide for Mol* is available at https://molstar.org/viewer-docs/; be sure to check it out! The viewer will display the structure selected by the user (or the most similar structure to the protein the user selected), along with the sequence of the first chain listed in the PDB file. Use the drop down menus at the top of the viewer to change which chain is displayed.

The second component is the sequence alignment, which displays the alignment between the sequence of the displayed structure and a protein. The protein will either be the one the user requested to view the structure for, or the most similar protein to the structure sequence if no protein was selected. You can change which protein sequence is aligned to the structure sequence by selecting a new protein from the Homologous Sequences table, which is the third major component on the page.

The Homologous Sequences table lists all of the proteins in PaVE that are the same type as the protein the currently viewed structure was made from (e.g. L1). The listed sequences can be filtered by minimum percent identity, either by changing the number displayed in the text above the table, or by using the slider. The table lists the protein ID and percent identity, and displays what portion of the protein is covered by the structure (grey).

The final component visible on the page is the Related Structures list. These are structures that either come from the same type of protein (e.g. L1), or that represent a different chain in the current structure. Selecting an ID from this list will refresh the page to display information related to that structure.

Can I make the structure viewer bigger?

You can view the structure viewer in full screen mode by selecting the square corners icon (below the wrench) from the vertical tools list in the viewer window.

Can I show/hide the tools menu?

The tools menu on the right side of the viewer can be shown/hidden by selecting the wrench icon in the vertical tools list in the viewer.

How can I select a location/region of interest in the structure?

Selections can be made by clicking directly on the structure, by clicking on an amino acid in the sequence at the top of the structure viewer, or by selecting a location in the alignment below the viewer. To select more than one location at a time, you can click and drag along the sequence at the top of the viewer, or hold the shift key to make multiple selections on the structure or alignment.

BLAST+ - US Government License and Copyright
Citation: Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009 Dec 15;10:421. DOI: 10.1186/1471-2105-10-421. PubMed PMID: 20003500; PubMed Central PMCID: PMC2803857.

Celery - BSD 3-Clause license

DataTables - MIT License

Flask - BSD-3-Clause license

MAFFT - BSD license

MSAViewer - Boost Software License 1.0
Citation: Yachdav G, Wilzbach S, Rauscher B, Sheridan R, Sillitoe I, Procter J, Lewis S, Rost B, Goldberg T. MSAViewer: interactive JavaScript visualization of multiple sequence alignments. Bioinformatics. 2016 Nov 15;32(22):3501-3503. DOI: 10.1093/bioinformatics/btw474. PMID: 27412096 PMCID: PMC5181560

Mol* viewer - GNU Affero General Public License

The Newick Utilities - Citation: Junier T, Zdobnov EM. The Newick Utilities: high-throughput phylogenetic tree processing in the UNIX shell. Bioinformatics. 2010 Jul 1;26(13):1669-70. DOI: 10.1093/bioinformatics/btq243. PMID: 20472542 PMCID: PMC2887050

PAUP* - GPL

phylotree.js - BSD 3-Clause license

Swagger - Apache 2.0 License

trimAl - GNU GENERAL PUBLIC LICENSE
Citation: Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009 Aug 1;25(15):1972-3. DOI: 10.1093/bioinformatics/btp348. PubMed PMID: 19505945; PubMed Central PMCID: PMC2712344.

The PaVE team includes members from the laboratory of Dr. Alison McBride at the DNA Tumor Virus Section in the Laboratory of Viral Diseases in the Division of Intramural Research, and from the Bioinformatics and Computational Biosciences Branch (BCBB), headed by Dr. Darrell Hurt, at the Office of Cyber Infrastructure and Computational Biology (OCICB) in the NIAID/NIH.

Current Contributors

Lab of Viral Diseases, National Institute of Allergy and Infectious Diseases

Alison McBride

Alix Warburton

Bioinformatics and Computational Biosciences Branch

Jennifer Dommer

Cyrus Afrasiabi

Samuel Ezeji

Lewis Kim

Krishnaveni Kaladi

David Liou

Huy Nguyen

Yamil Boo Irizarry

Duc Doan

Kristen Browne

Mike Dolan

Operations and Engineering Branch

Wei Lu

Rega Institute, Katholieke Universiteit Leuven

Piet Maes

University of Arizona

Koenraad van Doorslaer

Josh Pace

Former Contributors

Vivek Gopalan

Sandya Bandaru

Yasmin Mahmoud

Qina Tan

Wei Liang

Qiang Sun

Sandhya Xirasagar

Zhiwen Li

Maarten Rudolph Leerkes

David Kaminsky

Richard Burke Squires

Vijayaraj Nagarajan

PaVE Scientific Advisors

Alison McBride, Ph.D.
National Institute of Allergy & Infectious Diseases (NIAID)

Koenraad Van Doorslaer, Ph.D.
University of Arizona

Hans-Ulrich Bernard, Ph.D.
University of California, Irvine

Thomas S. Brettin, Ph.D.
Los Alamos National Laboratory

Thomas R. Broker, Ph.D.
University of Alabama, Birmingham

Chris B. Buck, Ph.D.
National Cancer Institute (NCI)

Robert D. Burk, MD.
Albert Einstein College of Medicine, Yeshiva University

John Doorbar, Ph.D.
Department of Pathology, University of Cambridge

Marc van Ranst, MD/Ph.D.
Rega Institute for Medical Research, Leuven Belgium

Ethel-Michele de Villiers, Ph.D.
German Cancer Research Center, Heidelberg Germany

Zigui Chen, Ph.D.
The Chinese University of Hong Kong

Aare Abroi, Ph.D.
Estonian Biocentre, Tartu, Estonia

Ignacio Bravo, Ph.D.
Centre National de la Recherche Scientifique, France

The NIAID Office of Cyber Infrastructure and Computational Biology (OCICB) along with the Bioinformatics and Computational Biosciences Branch (BCBB) welcomes your feedback for improvement of the PaVE resource. Please report bugs, provide suggestions for development, and ask questions by contacting our support team.

Please note that your email address is saved with the express purpose of responding to your request. Please visit our Privacy Policy page for more details.

You can also connect with us on Facebook.