Sourcecode: python-biopython

def Bio::NeuralNetwork::Gene::Signature::SignatureCoder::representation(self, sequence)

Convert a sequence into a representation of its signatures.

Arguments:

o sequence - A Seq object we are going to convert into a set of
signatures.

Returns:
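A list of relative signature representations. Each item in the
list corresponds to a signature passed in to the initializer, in the
same order, and is the number of times that signature was found in the
sequence, minus the smallest count and divided by the largest count,
so every value lies between zero and one. If no signatures were
passed in to the initializer, an empty list is returned.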
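
The scan and normalization performed by this method can be sketched as a standalone function that follows the source listing on this page. This is a minimal sketch, not the Biopython method itself: it assumes plain Python strings instead of Seq objects, and the function name `signature_representation` and its `signatures` and `max_gap` parameters are hypothetical stand-ins for the coder's internal `self._signatures` and `self._max_gap`.

```python
def signature_representation(sequence, signatures, max_gap):
    """Hypothetical standalone version of the scan-and-normalize steps.

    sequence   - a plain string (assumption: the real method takes a Seq)
    signatures - list of (first_part, second_part) tuples of equal length
    max_gap    - largest number of positions allowed between the two parts
    """
    if not signatures:
        return []

    counts = {sig: 0 for sig in signatures}
    first_parts = {first for first, _ in signatures}
    sig_size = len(signatures[0][0])

    # slide over every position that can hold both parts with no gap
    for start in range(len(sequence) - (2 * sig_size - 1)):
        first = sequence[start:start + sig_size]
        if first not in first_parts:
            continue
        # try every allowed gap between the first and second part
        for second_start in range(start + sig_size,
                                  start + sig_size + 1 + max_gap):
            second = sequence[second_start:second_start + sig_size]
            if (first, second) in counts:
                counts[(first, second)] += 1

    # shift by the smallest count and divide by the largest count
    min_count = min(counts.values())
    max_count = max(counts.values())
    if max_count > 0:
        for sig in counts:
            counts[sig] = (counts[sig] - min_count) / max_count

    # report the values in the order the signatures were given
    return [counts[sig] for sig in signatures]


sigs = [("AA", "TG"), ("AA", "GT"), ("CC", "GG")]
print(signature_representation("AATGAATGAAGT", sigs, max_gap=1))
# prints [1.0, 0.5, 0.0]
```

In this example AA..TG occurs twice, AA..GT once, and CC..GG never, so the raw counts 2, 1, 0 become 1.0, 0.5, 0.0.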

Definition at line 155 of file Signature.py.

    def representation(self, sequence):
        """Convert a sequence into a representation of its signatures.

        Arguments:

        o sequence - A Seq object we are going to convert into a set of
        signatures.

        Returns:
        A list of relative signature representations. Each item in the
        list corresponds to a signature passed in to the initializer, in
        the same order, and is the number of times that signature was
        found in the sequence, minus the smallest count and divided by
        the largest count, so every value lies between zero and one.
        """
        # check to be sure we have signatures to deal with,
        # otherwise just return an empty list
        if len(self._signatures) == 0:
            return []
        
        # initialize a dictionary to hold the signature counts
        sequence_sigs = {}
        for sig in self._signatures:
            sequence_sigs[sig] = 0

        # get a list of all of the first parts of the signatures
        all_first_sigs = []
        for sig_start, sig_end in self._signatures:
            all_first_sigs.append(sig_start)
        
        # count all of the signatures we are looking for in the sequence
        sig_size = len(self._signatures[0][0])
        smallest_sig_size = sig_size * 2

        for start in range(len(sequence) - (smallest_sig_size - 1)):
            # if the first part matches any of the signatures we are looking
            # for, then expand out to look for the second part
            first_sig = str(sequence[start:start + sig_size])
            if first_sig in all_first_sigs:
                for second in range(start + sig_size,
                                    (start + sig_size + 1) + self._max_gap):
                    second_sig = str(sequence[second:second + sig_size])

                    # if we find the motif, increase the counts for it
                    if (first_sig, second_sig) in sequence_sigs:
                        sequence_sigs[(first_sig, second_sig)] += 1

        # -- normalize the signature info to go between zero and one
        min_count = min(sequence_sigs.values())
        max_count = max(sequence_sigs.values())

        # as long as we have some signatures present, normalize them
        # otherwise we'll just return 0 for everything 
        if max_count > 0:
            for sig in sequence_sigs.keys():
                sequence_sigs[sig] = (float(sequence_sigs[sig] - min_count)
                                      / float(max_count))

        # return the relative signature info in the specified order
        sig_amounts = []
        for sig in self._signatures:
            sig_amounts.append(sequence_sigs[sig])

        return sig_amounts
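
The normalization step in the listing subtracts the smallest count and divides by the largest count. A small worked example makes the arithmetic concrete. This is only a sketch: `normalize_counts` is a hypothetical helper that takes a plain list of counts rather than the `sequence_sigs` dictionary used above.

```python
def normalize_counts(counts):
    """Apply (count - min_count) / max_count to every count.

    Hypothetical helper mirroring the normalization in the listing.
    """
    min_count = min(counts)
    max_count = max(counts)
    # with no signatures found, the counts are returned unchanged (all 0)
    if max_count == 0:
        return list(counts)
    return [(count - min_count) / max_count for count in counts]


print(normalize_counts([2, 1, 0]))  # prints [1.0, 0.5, 0.0]
print(normalize_counts([4, 2, 1]))  # prints [0.75, 0.25, 0.0]
```

Because the divisor is the largest count and not the range, the largest value reaches 1 only when at least one signature has a count of zero, as the second call shows.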
        
        

