Using progressiveMauve from the command-line
progressiveMauve also offers many command-line options that give detailed control over alignment parameters. This section lists those options and gives examples of their use.
Before running any of the example commands it will be necessary to locate the progressiveMauve binary on your system, as described in the mauveAligner command-line page.
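A quick way to check whether the binary is reachable is sketched below. It assumes a POSIX shell. MAUVE_HOME is a hypothetical variable introduced for this illustration (Mauve itself does not read it), and the function name is likewise made up.

```shell
# Print the path of the progressiveMauve binary, or fail if none is found.
# MAUVE_HOME is a hypothetical variable naming the Mauve install directory.
find_progressive_mauve() {
  if command -v progressiveMauve >/dev/null 2>&1; then
    # Found on the PATH.
    command -v progressiveMauve
  elif [ -n "${MAUVE_HOME:-}" ] && [ -x "$MAUVE_HOME/progressiveMauve" ]; then
    # Found in the user-supplied install directory.
    printf '%s\n' "$MAUVE_HOME/progressiveMauve"
  else
    echo "progressiveMauve not found" >&2
    return 1
  fi
}

find_progressive_mauve || echo "set PATH or MAUVE_HOME and retry"
```

If the function prints a path, the example commands in this section can be run as written, or with that full path substituted for progressiveMauve.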
Command-line parameter reference
--apply-backbone=<file>
Read an existing sequence alignment in XMFA format and apply backbone statistics to it
--disable-backbone
Disable backbone detection
--mums
Find MUMs only, do not attempt to determine locally collinear blocks (LCBs)
--seed-weight=<number>
Use the specified seed weight for calculating initial anchors
--output=<file>
Output file name. Prints to screen by default
--backbone-output=<file>
Backbone output file name (optional).
--match-input=<file>
Use specified match file instead of searching for matches
--input-id-matrix=<file>
An identity matrix describing similarity among all pairs of input sequences/alignments
--max-gapped-aligner-length=<number>
Maximum number of base pairs to attempt aligning with the gapped aligner
--input-guide-tree=<file>
A phylogenetic guide tree in NEWICK format that describes the order in which sequences will be aligned
--output-guide-tree=<file>
Write out the guide tree used for alignment to a file
--version
Display software version information
--debug
Run in debug mode (performs internal consistency checks; very slow)
--scratch-path-1=<path>
Designate a path that can be used for temporary data storage. Two or more paths should be specified.
--scratch-path-2=<path>
Designate a path that can be used for temporary data storage. Two or more paths should be specified.
--collinear
Assume that input sequences are collinear, that is, they have no rearrangements
--scoring-scheme=<ancestral|sp_ancestral|sp>
Selects the anchoring score function. Default is extant sum-of-pairs (sp).
--no-weight-scaling
Don’t scale LCB weights by conservation distance and breakpoint distance
--max-breakpoint-distance-scale=<number [0,1]>
Set the maximum weight scaling by breakpoint distance. Defaults to 0.9
--conservation-distance-scale=<number [0,1]>
Scale conservation distances by this amount. Defaults to 1
--skip-refinement
Do not perform iterative refinement
--skip-gapped-alignment
Do not perform gapped alignment
--bp-dist-estimate-min-score=<number>
Minimum LCB score for estimating pairwise breakpoint distance
--mem-clean
Enable this option when debugging memory allocations
--gap-open=<number>
Gap open penalty
--gap-extend=<number>
Gap extend penalty
--substitution-matrix=<file>
Nucleotide substitution matrix in NCBI format
--weight=<number>
Minimum pairwise LCB score
--min-scaled-penalty=<number>
Minimum breakpoint penalty after scaling the penalty by expected divergence
--hmm-p-go-homologous=<number>
Probability of transitioning from the unrelated to the homologous state (default: 0.0001)
--hmm-p-go-unrelated=<number>
Probability of transitioning from the homologous to the unrelated state (default: 0.000001)
--seed-family
Use a family of spaced seeds to improve sensitivity
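As an illustration of the --scratch-path-1 and --scratch-path-2 options listed above, the following sketch creates two temporary directories and prints a command that uses them. The directory locations under TMPDIR and the genome file names are assumptions; substitute paths that suit your system. The command is only echoed, not executed.

```shell
# Create two distinct scratch directories (locations are an assumption).
scratch1=$(mktemp -d "${TMPDIR:-/tmp}/mauve_scratch1.XXXXXX") || exit 1
scratch2=$(mktemp -d "${TMPDIR:-/tmp}/mauve_scratch2.XXXXXX") || exit 1

# Print the alignment command; remove the leading "echo" to run it.
echo progressiveMauve \
  --scratch-path-1="$scratch1" \
  --scratch-path-2="$scratch2" \
  --output=threeway.xmfa genome_1.gbk genome_2.gbk genome_3.gbk
```

Remove the scratch directories once the alignment has finished.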
Examples of progressiveMauve usage
Example 1. Align three genomes from the input files genome_1.gbk, genome_2.gbk, and genome_3.gbk, saving the output to a file called threeway.xmfa
progressiveMauve --output=threeway.xmfa genome_1.gbk genome_2.gbk genome_3.gbk
Example 2. Align the same three genomes but also save the guide tree and produce a backbone file
progressiveMauve --output=threeway.xmfa --output-guide-tree=threeway.tree --backbone-output=threeway.backbone genome_1.gbk genome_2.gbk genome_3.gbk
Example 3. Align the same three genomes, but do not detect forced alignment of unrelated sequence and do not create a backbone file
progressiveMauve --output=threeway_no_backbone.xmfa --disable-backbone genome_1.gbk genome_2.gbk genome_3.gbk
Example 4. Detect forced alignment of unrelated sequence in the alignment produced in Example 3. Use custom Homology HMM transition parameters. Save a backbone file.
progressiveMauve --apply-backbone=threeway_no_backbone.xmfa --output=threeway.xmfa --backbone-output=threeway.backbone --hmm-p-go-homologous=0.001 --hmm-p-go-unrelated=0.000005
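Examples 3 and 4 form a two-step workflow, which could be wrapped in a small script along the following lines. This is a sketch rather than a recommended procedure: the pm helper, the DRY_RUN variable, and the function name are hypothetical, and by default the script only prints the commands it would run.

```shell
# pm prints the command by default; set DRY_RUN=0 to execute it instead.
# DRY_RUN and pm are hypothetical helpers for this sketch.
pm() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo progressiveMauve "$@"
  else
    progressiveMauve "$@"
  fi
}

# Step 1: align without backbone detection (Example 3).
# Step 2: apply backbone statistics to the saved alignment (Example 4).
align_then_backbone() {
  prefix=$1; shift
  pm --output="${prefix}_no_backbone.xmfa" --disable-backbone "$@" || return 1
  pm --apply-backbone="${prefix}_no_backbone.xmfa" \
     --output="${prefix}.xmfa" \
     --backbone-output="${prefix}.backbone" \
     --hmm-p-go-homologous=0.001 --hmm-p-go-unrelated=0.000005
}

align_then_backbone threeway genome_1.gbk genome_2.gbk genome_3.gbk
```

Separating the steps means the HMM transition parameters can be varied by repeating only the second command against the saved alignment.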
Example 5. Compute ungapped local-multiple alignments among the input sequences and save them to a file called threeway.mums
progressiveMauve --mums --output=threeway.mums genome_1.gbk genome_2.gbk genome_3.gbk
Example 6. Compute an alignment of the same three genomes, using previously computed local-multiple alignments
progressiveMauve --match-input=threeway.mums --output=threeway.xmfa genome_1.gbk genome_2.gbk genome_3.gbk
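Examples 5 and 6 can be combined so that the match file is computed once and reused on later runs. The sketch below assumes that a non-empty match file from an earlier run is still valid for the same inputs. The pm helper, the DRY_RUN variable, and the function name are hypothetical, and by default the commands are only printed.

```shell
# pm prints the command by default; set DRY_RUN=0 to execute it instead.
# DRY_RUN and pm are hypothetical helpers for this sketch.
pm() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo progressiveMauve "$@"
  else
    progressiveMauve "$@"
  fi
}

# Align the given genomes, computing the match file only if it is missing.
align_with_cached_mums() {
  prefix=$1; shift
  if [ ! -s "$prefix.mums" ]; then
    pm --mums --output="$prefix.mums" "$@" || return 1
  fi
  pm --match-input="$prefix.mums" --output="$prefix.xmfa" "$@"
}

align_with_cached_mums threeway genome_1.gbk genome_2.gbk genome_3.gbk
```

The same genome files must be given, in the same order, to both commands.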
Example 7. Set a custom breakpoint penalty to cope with genomes for which the default penalty does not work. The default penalty can be read from the program’s textual output; in this hypothetical example, the default penalty is 100000.
progressiveMauve --output=threeway.xmfa --weight=50000 genome_1.gbk genome_2.gbk genome_3.gbk
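The custom value in Example 7 is half of the hypothetical default. A sketch of deriving it in the shell follows. Halving is an assumption made only to match the numbers in the example, not a general recommendation, and the default has to be copied by hand from the program output.

```shell
# Hypothetical default penalty, copied by hand from the program output.
default_weight=100000

# Halve it, matching Example 7; the factor is an assumption.
weight=$((default_weight / 2))

# Print the resulting command; remove "echo" to run it.
echo progressiveMauve --output=threeway.xmfa --weight="$weight" \
  genome_1.gbk genome_2.gbk genome_3.gbk
```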
Example 8. Set a minimum scaled breakpoint penalty to cope with the case where most genomes are aligned correctly, but manual inspection reveals that a divergent genome has too many predicted rearrangements.
progressiveMauve --min-scaled-penalty=5000 --output=threeway.xmfa genome_1.gbk genome_2.gbk genome_3.gbk
Example 9. Globally align a set of collinear virus genomes that reside in a single FASTA file. Use seed families to improve anchoring sensitivity in regions below 70% sequence identity.
progressiveMauve --collinear --seed-family --disable-backbone --output=virus.xmfa all_virii.fasta