CPAP manual
Contents
  • Introduction

    Cancer Panel Analysis Pipeline (CPAP) is a free web-server package for processing Ion Ampliseq™ Cancer Panel data obtained with the Ion Torrent/PGM™ sequencer. CPAP requires variant calling files in Excel or VCF format, which are generated by the variant caller plugin of the Torrent Suite Software preinstalled with the Ion Torrent/PGM™ sequencer. The variant calling files can be downloaded from the Torrent Server through a web browser.

    The main features of CPAP include:

    • Importing genetic variants into an SQLite database
    • Annotating genetic variants using locally stored databases
      • NCBI dbSNP
      • 1000 Genomes Project
        • 1000G American
        • 1000G Asian
        • 1000G African
        • 1000G European
      • Catalogue of Somatic Mutations In Cancer (COSMIC)
        • COSMIC ID
        • Known mutations in diverse tissues
      • Functional annotation results from ANNOVAR
      • Protein domain from InterPro annotation
      • dbNSFP (an integrated database of functional predictions from various algorithms):
        • PhyloP
        • SIFT
        • Polyphen2
        • LRT
        • MutationTaster
    • Generating interactive annotation reports
      • Circos plot (cross-sample comparison or group-wise comparison)
      • Dynamic HTML table & Pie chart
      • Filter box: Filters & Filter History
      • Downloadable files:
        • Variant annotation table (XLS or tab-delimited text)
        • Filtered annotation table (XLS or tab-delimited text)
        • Filtered Circos plot (SVG or PNG)
        • Sequence retrieval for experimental validation (Fasta)
      • Compressed & indexed Variant Call Format (VCF) file for each sample (VCF+tabix track format supported by the UCSC Genome Browser)
    • Linking annotation reports to external resources
      • NCBI dbSNP
      • Catalogue of Somatic Mutations In Cancer (COSMIC)
      • UCSC Genome Browser
        • Variants from each sample (custom track)
        • Amplicons of Ampliseq™ Cancer Panel (custom track)
        • NCBI RefGene
        • OMIM
        • dbSNP
        • COSMIC
        • HapMap


  • CPAP analysis workflow 


  • How to use the CPAP web server

    • INPUT file preparation

      • How to obtain variant calling files (Retrieve variant calling files from Torrent Server)
        • Log in to the Torrent Browser (through a web browser)
        • Select a run report
        • Open the Variant Caller Report through the link (variantCaller_v3.html) provided in the Plugin Summary section
        • Open the Variant Caller Report of each sample by its barcode number
        • Download the variant calling file (variant.xls or TSVC_variants.vcf) of each sample through the link (Download all variant calls as a table file) provided in the “File Links” section
        • The content of variant calling files (A. variant.xls and B. TSVC_variants.vcf)
          A. variant.xls
          Chrom  Position   Gene Sym  Target ID   Type  Zygosity  Ref  Variant  Var Freq  P-value   Coverage  Ref Cov  Var Cov  HotSpot ID
          chr3   178936091  PIK3CA    AMPL35831   SNP   Het       G    A        54.82     1.26E-05  1795      811      984      COSM763;
          chr3   178938877  PIK3CA    AMPL339432  SNP   Het       G    A        22.4      6.31E-05  433       336      97       ---
          chr4   1806181    FGFR3     AMPL142574  SNP   Hom       C    G        61.2      1.00E-07  317       122      194      ---
          chr4   1807894    FGFR3     AMPL411633  SNP   Hom       G    A        99.41     2.00E-09  1015      6        1009     ---
          chr4   55141055   PDGFRA    AMPL43181   SNP   Hom       A    G        100       5.01E-09  797       0        797      ---
          chr4   55962451   KDR       AMPL52189   SNP   Het       A    G        2.13      7.94E-05  1031      1008     22       ---
          chr5   112175617  APC       AMPL145156  SNP   Het       T    A        67.47     5.01E-05  292       94       197      ---
          chr5   112175770  APC       AMPL59934   SNP   Het       G    A        98.85     1.00E-03  2693      30       2662
          chr7   55249063   EGFR      AMPL493236  SNP   Hom       G    A        99.53     1.58E-09  1070      3        1065     ---
          chr10  43613843   RET       AMPL78961   SNP   Hom       G    T        100       2.00E-08  576       0        576      ---
          chr14  105246407  AKT1      AMPL329410  SNP   Hom       G    A        99.76     2.00E-10  1696      4        1692     ---
          chr17  37881497   ERBB2     AMPL504796  SNP   Het       G    A        1.44      2.51E-03  975       961      14       ---

          B. TSVC_variants.vcf
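For readers who want to inspect a variant.xls file (item A) before uploading it, the following is a minimal sketch of reading its columns in Python. It assumes the file is tab-delimited text despite the .xls extension; the function name read_variants and the two embedded rows (copied from the listing in item A) are illustrative only and are not part of CPAP. The sketch only reads the data, so the file itself stays unchanged.

```python
import csv
import io

# Header plus two rows copied from the variant.xls listing (item A).
# Assumption: the real file is tab-delimited text despite its .xls extension.
SAMPLE = (
    "Chrom\tPosition\tGene Sym\tTarget ID\tType\tZygosity\tRef\tVariant\t"
    "Var Freq\tP-value\tCoverage\tRef Cov\tVar Cov\tHotSpot ID\n"
    "chr3\t178936091\tPIK3CA\tAMPL35831\tSNP\tHet\tG\tA\t"
    "54.82\t1.26E-05\t1795\t811\t984\tCOSM763;\n"
    "chr4\t55141055\tPDGFRA\tAMPL43181\tSNP\tHom\tA\tG\t"
    "100\t5.01E-09\t797\t0\t797\t---\n"
)


def read_variants(handle):
    """Yield one dict per variant row, converting the numeric columns."""
    for row in csv.DictReader(handle, delimiter="\t"):
        row["Position"] = int(row["Position"])
        row["Var Freq"] = float(row["Var Freq"])
        row["P-value"] = float(row["P-value"])
        for key in ("Coverage", "Ref Cov", "Var Cov"):
            row[key] = int(row[key])
        yield row


variants = list(read_variants(io.StringIO(SAMPLE)))
for v in variants:
    print(v["Chrom"], v["Position"], v["Gene Sym"], v["Var Freq"])
```

To read a real file, pass an open file handle (for example open("Sample01.xls", newline="")) instead of the in-memory sample.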

      • Rename variant calling files

        • Why should we rename these variant calling files?
          Torrent Suite Software writes the variant calling files into separate folders, one per sample barcode, but gives every file the same name (variant.xls). Renaming gives each sample a unique file name, so that the files can be placed together in one compressed file.
        • Nomenclature criteria:
          1. Only alphabetic characters [A to Z, a to z], numbers [0 to 9] and the underscore “_” are allowed in file names.
          2. The dot “.”, hyphen “-”, comma “,”, colon “:” and white space “ ” should be avoided in file names (apart from the dot before the file extension).
          3. A file name should always start with a letter [A to Z, a to z] (e.g. Sample01.xls, Sample02.xls, Sample_03.xls … or Sample01.vcf, Sample02.vcf, Sample_03.vcf …).
          4. The file name must not exceed 30 characters, excluding the file extension (.xls or .vcf).
          5. The content of each variant calling file should be left unchanged, because our script only accepts the default variantCaller output of Torrent Suite Software as valid input.
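The naming criteria above can be checked before uploading. The following is a minimal sketch in Python, not the check CPAP itself performs; the names NAME_RE and is_valid_name are hypothetical, and treating the extension as lowercase only is an assumption.

```python
import re

# One letter first, then up to 29 further letters, digits or underscores
# (30 characters at most), followed by one of the two accepted extensions.
# Assumption: the extension must be lowercase ".xls" or ".vcf".
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,29}\.(xls|vcf)")


def is_valid_name(file_name):
    """Return True if file_name satisfies naming criteria 1 to 4."""
    return NAME_RE.fullmatch(file_name) is not None


for candidate in ("Sample01.xls", "Sample_03.vcf", "variant-1.xls", "1Sample.xls"):
    print(candidate, is_valid_name(candidate))
```

The first two names pass; the third contains a hyphen and the fourth starts with a digit, so both are rejected.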

      • How to prepare the compressed file for uploading


        • Criteria:
          1. Variant calling files should be compressed into a single compressed file.
          2. Supported file formats: .tar.bz2, .tar.gz and .zip
          3. The compressed file should contain at least 2 variant calling files.
          4. Directory structure is not allowed in the compressed file.
          5. Only one file format (*.xls or *.vcf) is allowed in a single compressed file.
        • Unix commands (use any one; replace *.xls with *.vcf for VCF files):
          tar jcvf upload.tar.bz2 *.xls
          tar czvf upload.tar.gz *.xls
          zip -r upload.zip *.xls
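As an alternative to the Unix commands, the criteria above can be enforced while building the archive. The following is a minimal sketch using Python's standard tarfile module; the function name build_upload and the placeholder files are hypothetical, and the checks are one reading of the criteria rather than the server's own validation.

```python
import os
import tarfile
import tempfile


def build_upload(paths, archive_path):
    """Pack variant calling files into a flat .tar.gz archive."""
    if len(paths) < 2:
        raise ValueError("need at least 2 variant calling files")
    extensions = {os.path.splitext(p)[1] for p in paths}
    if len(extensions) != 1 or not extensions <= {".xls", ".vcf"}:
        raise ValueError("files must be all .xls or all .vcf")
    names = [os.path.basename(p) for p in paths]
    if len(set(names)) != len(names):
        # Happens when files were not renamed (all called variant.xls).
        raise ValueError("duplicate file names")
    with tarfile.open(archive_path, "w:gz") as archive:
        for path, name in zip(paths, names):
            # arcname drops the directory part so the archive stays flat.
            archive.add(path, arcname=name)
    return archive_path


# Demonstration with two empty placeholder files in a temporary directory.
workdir = tempfile.mkdtemp()
inputs = []
for name in ("Sample01.xls", "Sample02.xls"):
    path = os.path.join(workdir, name)
    with open(path, "w") as handle:
        handle.write("")
    inputs.append(path)

archive = build_upload(inputs, os.path.join(workdir, "upload.tar.gz"))
with tarfile.open(archive) as check:
    members = check.getnames()
print(members)
```

Storing each file under its base name keeps directory structure out of the archive even when the inputs come from separate barcode folders.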

      • Upload the compressed file & Retrieve finished job by job identifier


      • Group-wise Comparison


        Samples from two different finished jobs can be displayed in two distinct colors by using our Group-wise comparison module, which is especially useful for comparing normal and cancer samples.


  • Output


    The output page of CPAP consists of:


  • Example Uses


  • Database Update


  • Benchmarking

    To make the best use of our server, the scheduling system dynamically assigns CPU cores to each job based on the computing resources available at the time.

    Operating System:      Ubuntu 12.10 x86_64
    Job Queuing System:    HTCondor
    Hardware:              vCPUs 8 × 2.0 GHz; vRAM 8 GB; vHDD 500 GB
    Programming Languages: Shell script, Rscript, C, Perl, PHP, JavaScript