The usual way to demonstrate saturation in nucleotide sequences is to plot the fraction of observed differences between sequences against the evolutionary distance separating them. When the number of observed differences, for example at third codon positions, no longer increases with increasing evolutionary distance, the sequences are said to be saturated. The same technique can be applied to amino acid (aa) sequences. We have developed a Java application called ASaturA that discriminates between aa substitutions with high and low probabilities of occurrence. All aa replacements are classified as either 'frequent' or 'rare' depending on their mutation probabilities, which are inferred from substitution probability matrices such as the well-known PAM and BLOSUM matrices. These 20 by 20 matrices provide the empirically derived probabilities of one aa being replaced by another when sequences have diverged over a certain evolutionary distance. ASaturA sorts all substitutions according to these probabilities, and a probability 'cut-off' value can be chosen that separates frequent from rare substitutions. For each sequence pair, the program plots the numbers of observed frequent and rare aa replacements against the evolutionary distance between the two sequences. Modifying the 'cut-off' value changes the number of aa substitutions classified as frequent or rare. Ideally, careful selection of the 'cut-off' value splits the original data set into a saturated and an unsaturated one. Besides the most widely used substitution probability matrices, such as PAM, BLOSUM, mtREV24 and JTT, user-defined matrices can also be used.
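
The classification step described above can be illustrated with a minimal Java sketch. It is not ASaturA's actual source code: the class name SubstitutionClassifier, its methods, and the probability values used in main are hypothetical placeholders and are not taken from PAM, BLOSUM or any other published matrix. The sketch also assumes, for simplicity, that the probability of a replacement is the same in both directions, that the two sequences are already aligned, and that gap positions ('-') are skipped.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a frequent/rare classification; not ASaturA's real code. */
final class SubstitutionClassifier {
    // Substitution probability per unordered residue pair, e.g. "DE" -> 0.05.
    private final Map<String, Double> probabilities = new HashMap<>();
    // Replacements with a probability at or above this value count as frequent.
    private final double cutOff;

    SubstitutionClassifier(double cutOff) {
        this.cutOff = cutOff;
    }

    // Sort the two residues so that (a, b) and (b, a) share one entry
    // (assumption: a symmetric matrix).
    private static String key(char a, char b) {
        return a < b ? "" + a + b : "" + b + a;
    }

    void setProbability(char a, char b, double probability) {
        probabilities.put(key(a, b), probability);
    }

    boolean isFrequent(char a, char b) {
        Double p = probabilities.get(key(a, b));
        if (p == null) {
            throw new IllegalArgumentException("No probability for pair " + a + "/" + b);
        }
        return p >= cutOff;
    }

    /** Returns {frequent, rare} replacement counts for two aligned sequences. */
    int[] count(String seq1, String seq2) {
        if (seq1.length() != seq2.length()) {
            throw new IllegalArgumentException("Sequences must be aligned to equal length");
        }
        int frequent = 0;
        int rare = 0;
        for (int i = 0; i < seq1.length(); i++) {
            char a = seq1.charAt(i);
            char b = seq2.charAt(i);
            // Identical residues and gap positions are not replacements.
            if (a == b || a == '-' || b == '-') {
                continue;
            }
            if (isFrequent(a, b)) {
                frequent++;
            } else {
                rare++;
            }
        }
        return new int[] {frequent, rare};
    }

    public static void main(String[] args) {
        SubstitutionClassifier classifier = new SubstitutionClassifier(0.03);
        // Placeholder probabilities chosen for illustration only.
        classifier.setProbability('D', 'E', 0.05);
        classifier.setProbability('K', 'R', 0.04);
        classifier.setProbability('W', 'C', 0.001);
        int[] counts = classifier.count("ADKW", "AERC");
        System.out.println("frequent=" + counts[0] + " rare=" + counts[1]);
    }
}
```

With the placeholder values above and a cut-off of 0.03, the D/E and K/R replacements are counted as frequent and the W/C replacement as rare. Plotting the two counts for every sequence pair against the corresponding evolutionary distance, and repeating this for different cut-off values, gives the kind of comparison described above.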


Van de Peer, Y., Frickey, T., Taylor, J.S., Meyer, A. (2002) Dealing with saturation at the amino acid level: A case study involving anciently duplicated zebrafish genes. Gene 295, 205-11.