def Bio::Align::AlignInfo::SummaryInfo::replacement_dictionary(self, skip_chars = [])

Generate a replacement dictionary to plug into a substitution matrix.

This looks at an alignment and counts how often each residue is
substituted for each other residue across the aligned sequences.

It returns a dictionary with this information:
{('A', 'C') : 10, ('C', 'A') : 12, ('G', 'C') : 15 ....}

Sequence weights are taken into account. The following example shows
how the replacement dictionary is calculated. Given the following
multiple sequence alignment, with each sequence followed by its weight:

GTATC  0.5
AT--C  0.8
CTGTC  1.0

For the first column we have:
('A', 'G') : 0.5 * 0.8 = 0.4
('C', 'G') : 0.5 * 1.0 = 0.5
('A', 'C') : 0.8 * 1.0 = 0.8

We then continue this for all of the columns in the alignment, summing
the information for each substitution in each column, until we end
up with the replacement dictionary.
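
The first-column arithmetic above can be reproduced with a short standalone sketch. It does not use Biopython: the alignment is represented as plain (sequence, weight) tuples, and column_replacements is a hypothetical helper written for illustration. The key order (residue from the earlier record first) is an assumption of this sketch, whereas the example above writes each pair alphabetically.

```python
from itertools import combinations

# Toy alignment from the example above: (sequence, weight) pairs.
records = [("GTATC", 0.5), ("AT--C", 0.8), ("CTGTC", 1.0)]


def column_replacements(records, column):
    """Sum the weight products for every pair of records at one column.

    Hypothetical helper, not part of Biopython.
    """
    counts = {}
    for (seq1, weight1), (seq2, weight2) in combinations(records, 2):
        # Assumed key order: residue of the earlier record comes first.
        key = (seq1[column], seq2[column])
        counts[key] = counts.get(key, 0.0) + weight1 * weight2
    return counts


first_column = column_replacements(records, 0)
for pair, value in first_column.items():
    print(pair, value)
```

Each of the three record pairs contributes one weight product, matching the values 0.4, 0.5 and 0.8 listed for the first column.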

Arguments:
o skip_chars - A list of characters to skip when creating the dictionary.
For instance, the alignment might contain Xs (screened regions) or Ns,
and you may not want these ambiguity characters in the dictionary.
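
The effect of skip_chars can be illustrated with a minimal standalone sketch that sums over all columns. It is not the Biopython implementation: replacement_counts is a hypothetical stand-in that works on plain (sequence, weight) tuples, and it treats the gap character like any other residue unless it is listed in skip_chars. How the real method handles gaps by default is not shown on this page.

```python
from itertools import combinations


def replacement_counts(records, skip_chars=()):
    """Weighted replacement counts over all columns.

    Hypothetical stand-in for illustration, not part of Biopython.
    """
    skip = set(skip_chars)
    counts = {}
    for (seq1, weight1), (seq2, weight2) in combinations(records, 2):
        # zip stops at the shorter sequence, so ragged input is tolerated.
        for res1, res2 in zip(seq1, seq2):
            # Ignore the column for this pair if either residue is skipped.
            if res1 in skip or res2 in skip:
                continue
            key = (res1, res2)
            counts[key] = counts.get(key, 0.0) + weight1 * weight2
    return counts


records = [("GTATC", 0.5), ("AT--C", 0.8), ("CTGTC", 1.0)]
with_gaps = replacement_counts(records)
without_gaps = replacement_counts(records, skip_chars=["-"])
print(sorted(set(with_gaps) - set(without_gaps)))
```

The printed keys are exactly the pairs that involve the skipped character; all other entries are identical in both dictionaries.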

Definition at line 205 of file AlignInfo.py.

    def replacement_dictionary(self, skip_chars = []):
        """Generate a replacement dictionary to plug into a substitution matrix.

        This looks at an alignment and counts how often each residue is
        substituted for each other residue across the aligned sequences.

        It returns a dictionary with this information:
        {('A', 'C') : 10, ('C', 'A') : 12, ('G', 'C') : 15 ....}

        Sequence weights are taken into account. The following example shows
        how the replacement dictionary is calculated. Given the following
        multiple sequence alignment, with each sequence followed by its weight:

        GTATC  0.5
        AT--C  0.8
        CTGTC  1.0

        For the first column we have:
        ('A', 'G') : 0.5 * 0.8 = 0.4
        ('C', 'G') : 0.5 * 1.0 = 0.5
        ('A', 'C') : 0.8 * 1.0 = 0.8

        We then continue this for all of the columns in the alignment, summing
        the information for each substitution in each column, until we end
        up with the replacement dictionary.

        Arguments:
        o skip_chars - A list of characters to skip when creating the dictionary.
        For instance, you might have Xs (screened stuff) or Ns, and not want
        to include the ambiguity characters in the dictionary.
        """
        # get a starting dictionary based on the alphabet of the alignment
        rep_dict, skip_items = self._get_base_replacements(skip_chars)

        # compare every unordered pair of records exactly once
        records = self.alignment._records
        for rec_num1 in range(len(records)):
            # pair the current record with each record that follows it
            for rec_num2 in range(rec_num1 + 1, len(records)):
                # for each pair of records, compare the sequences and add
                # the weighted replacement counts to the dictionary
                rep_dict = self._pair_replacement(
                    records[rec_num1].seq,
                    records[rec_num2].seq,
                    records[rec_num1].annotations['weight'],
                    records[rec_num2].annotations['weight'],
                    rep_dict, skip_items)

        return rep_dict

    def _pair_replacement(self, seq1, seq2, weight1, weight2,