Sourcecode: python-biopython

def Bio::HMM::Trainer::BaumWelchTrainer::train (   self,
  training_seqs,
  stopping_criteria,
  dp_method = ScaledDPAlgorithms 
)

Estimate the parameters using training sequences.

The algorithm is taken from Durbin et al., p. 64, which is a good
reference for the details of what is going on.

Arguments:

o training_seqs -- A list of TrainingSequence objects to be used
for estimating the parameters.

o stopping_criteria -- A function that, when passed the change in
log likelihood and the number of iterations performed, indicates
whether the estimation iterations should stop.

o dp_method -- A class instance specifying the dynamic programming
implementation we should use to calculate the forward and
backward variables. By default, we use the scaling method.

Definition at line 168 of file Trainer.py.
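As a sketch of what a compatible stopping criterion might look like: train() calls stopping_criteria with the change in log likelihood and the iteration count (see the call inside the loop below), so any two-argument function works. The function name and threshold values here are illustrative assumptions, not part of Biopython:

```python
# Hypothetical stopping criterion for BaumWelchTrainer.train().
# The name, threshold, and max_iterations defaults are assumptions
# chosen for illustration; only the two-argument call signature is
# dictated by train() itself.

def stop_when_converged(log_likelihood_change, num_iterations,
                        threshold=0.01, max_iterations=100):
    """Stop when the log likelihood change drops below a threshold,
    or after a maximum number of iterations."""
    return (log_likelihood_change < threshold
            or num_iterations >= max_iterations)
```

train() would then invoke this each round as stopping_criteria(log_likelihood_change, num_iterations), relying on the defaults for the remaining parameters.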

    def train(self, training_seqs, stopping_criteria,
              dp_method = ScaledDPAlgorithms):
        """Estimate the parameters using training sequences.

        The algorithm is taken from Durbin et al., p. 64, which is a good
        reference for the details of what is going on.
        
        Arguments:

        o training_seqs -- A list of TrainingSequence objects to be used
        for estimating the parameters.
        
        o stopping_criteria -- A function that, when passed the change in
        log likelihood and the number of iterations performed, indicates
        whether the estimation iterations should stop.

        o dp_method -- A class instance specifying the dynamic programming
        implementation we should use to calculate the forward and
        backward variables. By default, we use the scaling method.
        """
        prev_log_likelihood = None
        num_iterations = 1
        
        while True:
            transition_count = self._markov_model.get_blank_transitions()
            emission_count = self._markov_model.get_blank_emissions()

            # remember all of the sequence probabilities
            all_probabilities = []
            
            for training_seq in training_seqs:
                # calculate the forward and backward variables
                DP = dp_method(self._markov_model, training_seq)
                forward_var, seq_prob = DP.forward_algorithm()
                backward_var = DP.backward_algorithm()
                
                all_probabilities.append(seq_prob)

                # update the counts for transitions and emissions
                transition_count = self.update_transitions(transition_count,
                                                           training_seq,
                                                           forward_var,
                                                           backward_var,
                                                           seq_prob)
                emission_count = self.update_emissions(emission_count,
                                                       training_seq,
                                                       forward_var,
                                                       backward_var,
                                                       seq_prob)

            # update the markov model with the new probabilities
            ml_transitions, ml_emissions = \
                self.estimate_params(transition_count, emission_count)
            self._markov_model.transition_prob = ml_transitions
            self._markov_model.emission_prob = ml_emissions

            cur_log_likelihood = self.log_likelihood(all_probabilities)

            # if we have previously calculated the log likelihood (ie.
            # not the first round), see if we can finish
            if prev_log_likelihood is not None:
                # XXX log likelihoods are negatives -- am I calculating
                # the change properly, or should I use the negatives...
                # I'm not sure at all if this is right.
                log_likelihood_change = abs(abs(cur_log_likelihood) -
                                            abs(prev_log_likelihood))

                # check whether we have completed enough iterations to have
                # a good estimation
                if stopping_criteria(log_likelihood_change, num_iterations):
                    break

            # set up for another round of iterations
            prev_log_likelihood = cur_log_likelihood
            num_iterations += 1

        return self._markov_model
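The convergence bookkeeping in the loop above (track the previous log likelihood, compute the absolute change, consult the stopping criterion) can be sketched in isolation. The helper name and the sequence of log likelihoods below are made up for illustration; only the change calculation and stop check mirror the code above:

```python
# Standalone sketch of train()'s convergence loop. A precomputed
# (fabricated) sequence of log likelihoods stands in for the real
# per-iteration estimation work.

def run_until_converged(log_likelihoods, stopping_criteria):
    """Walk a sequence of log likelihoods and report how many
    iterations pass before the stopping criterion fires."""
    prev_log_likelihood = None
    num_iterations = 1
    for cur_log_likelihood in log_likelihoods:
        if prev_log_likelihood is not None:
            # same change calculation as in train() above
            change = abs(abs(cur_log_likelihood) - abs(prev_log_likelihood))
            if stopping_criteria(change, num_iterations):
                break
        prev_log_likelihood = cur_log_likelihood
        num_iterations += 1
    return num_iterations

# Log likelihoods improve (become less negative), then level off.
fake_likelihoods = [-120.0, -80.0, -60.0, -59.5, -59.49]
stops = run_until_converged(fake_likelihoods,
                            lambda change, n: change < 0.1)
# stops == 5: the change finally drops below 0.1 on the last value
```

Note that the first round never triggers the stop check, exactly as in train(): prev_log_likelihood is None until one full iteration has produced a likelihood to compare against.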

    def update_transitions(self, transition_counts, training_seq,


Generated by Doxygen 1.6.0