Evaluating alignment quality and stress-testing the aligner
When constructing a sequence alignment system, various design decisions can affect alignment quality. To assess the effect of each decision, we have created a method to score alignment accuracy across numerous test cases using genome sequences that have diverged to varying degrees. We refer to such test results as “accuracy profiles” because they show the aligner’s average accuracy over many evolutionary scenarios.
Overview of the methodology
Using the evolver software
The following components are necessary to compile the evolver software under Windows, Linux, or Mac OS X:
The following additional software supports automated aligner testing and generation of postscript accuracy plots:
All of these packages must be installed in order to perform aligner accuracy profiling. Some of these packages may already be installed on your system.
Genome alignment software is notoriously CPU intensive. Performing all but the most rudimentary set of alignment experiments will require significant computational resources. This software is designed to work on large compute clusters with the Condor job scheduler; however, writing submission scripts for other scheduling systems should be straightforward.
Follow the instructions to compile mauveAligner from source. Then download the sgEvolver source and build as follows:
How to simulate evolution and test an aligner’s accuracy:
Using the Condor High Throughput Computing environment to process many simulations rapidly
Follow steps 1 through 5 above. When editing simujobparams.pm, be sure to set the paths to the aligners and the scoring tools. These paths must be accessible from the compute node running the job, so they should be on shared storage.
The simujobrun.pl script executes the programs that carry out simulated evolution, alignment, and scoring of alignments. In particular, evolution requires dd, seq-gen, and sgEvolver; alignment requires an aligner; and scoring requires scoreAlignment and extractBackbone.
Step 5 will create a Condor DAGMan submission script called jobs.dag and a job submission script called mauveAlign.condor. Edit the mauveAlign.condor submission script to select the aligner you would like to test. Options are mauve, mavid, mlagan, slagan, and none. Optionally, the debug parameter may also be given to simujobrun.pl. When debug is used, none of the data files generated during the simulated evolution and alignment process are deleted, and all of them are sent back to the job submission host. It may be necessary to set other Condor-specific parameters as well.
Submit the Condor jobs with:
condor_submit_dag -maxjobs ## jobs.dag
Here ## is the maximum number of Condor jobs that will run simultaneously.
When all jobs have completed successfully, use scoregen.pl to extract scores from the alignjob directories and generate a heat plot in PostScript format. scoregen.pl depends on rgradientplot.R in your tools directory. When running multiple replicates of the simulation, scoregen.pl automatically averages the scores across replicates and plots a single average score for each combination of mutation rates.
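The replicate averaging described above can be illustrated with a minimal Python sketch. This is not the scoregen.pl implementation. It only shows the idea of grouping scores by mutation-rate combination and averaging within each group. The record layout (two rates and one score per replicate), the names rate_x and rate_y, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_replicates(records):
    """Average scores over replicates that share a mutation-rate combination.

    records: iterable of (rate_x, rate_y, score) tuples, one per completed
    replicate. This layout is hypothetical; the real score files may differ.
    """
    grouped = defaultdict(list)
    for rate_x, rate_y, score in records:
        grouped[(rate_x, rate_y)].append(score)
    # One averaged score per rate combination, in sorted order.
    return {combo: mean(scores) for combo, scores in sorted(grouped.items())}


# Made-up example scores: two replicates for the first two combinations,
# a single replicate for the third.
records = [
    (0.05, 0.01, 0.98),
    (0.05, 0.01, 0.96),
    (0.10, 0.01, 0.90),
    (0.10, 0.01, 0.86),
    (0.10, 0.02, 0.80),
]

averages = average_replicates(records)
for (rate_x, rate_y), avg in averages.items():
    print(f"{rate_x:.2f}\t{rate_y:.2f}\t{avg:.3f}")
```

Each key of the resulting dictionary corresponds to one cell of the heat plot, and the averaged value is what would be plotted for that cell.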