
def Bio::MarkovModel::train_bw(states,
                               alphabet,
                               training_data,
                               pseudo_initial = None,
                               pseudo_transition = None,
                               pseudo_emission = None,
                               update_fn = None)

train_bw(states, alphabet, training_data[, pseudo_initial]
[, pseudo_transition][, pseudo_emission][, update_fn]) -> MarkovModel

Train a MarkovModel using the Baum-Welch algorithm.  states is a list
of strings that describe the names of each state.  alphabet is a
list of objects that indicate the allowed outputs.  training_data
is a list of observations.  Each observation is a list of objects
from the alphabet.
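
For example, a minimal call might look like this (the two-state coin
model below is illustrative only, not part of the library):

    from Bio import MarkovModel

    states = ["fair", "biased"]        # names of the hidden states
    alphabet = ["H", "T"]              # allowed output symbols
    training_data = [                  # each observation: a list of symbols
        ["H", "T", "H", "H", "T"],
        ["T", "T", "H", "T", "T"],
    ]
    mm = MarkovModel.train_bw(states, alphabet, training_data)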

pseudo_initial, pseudo_transition, and pseudo_emission are
optional parameters for assigning pseudo-counts.  Each should be a
matrix of the appropriate size, and its entries are added to the
corresponding parameter matrix before normalization: pseudo_initial
has shape (len(states),), pseudo_transition has shape (len(states),
len(states)), and pseudo_emission has shape (len(states),
len(alphabet)).
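
As a sketch, this adds a uniform pseudo-count of 1 to every parameter
(the value 1 is an arbitrary choice; states, alphabet, and
training_data are as in the example above):

    from numpy import ones

    N, M = len(states), len(alphabet)
    mm = MarkovModel.train_bw(states, alphabet, training_data,
                              pseudo_initial=ones(N),
                              pseudo_transition=ones((N, N)),
                              pseudo_emission=ones((N, M)))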

update_fn is an optional callback that takes parameters
(iteration, log_likelihood).  It is called once per iteration.
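
A simple progress-reporting callback might look like this (the
message format is illustrative):

    def report(iteration, log_likelihood):
        # Called once per Baum-Welch iteration.
        print("iteration %d: log likelihood %f"
              % (iteration, log_likelihood))

    mm = MarkovModel.train_bw(states, alphabet, training_data,
                              update_fn=report)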

Definition at line 111 of file MarkovModel.py.

def train_bw(states, alphabet, training_data, pseudo_initial=None,
             pseudo_transition=None, pseudo_emission=None, update_fn=None):
    """train_bw(states, alphabet, training_data[, pseudo_initial]
    [, pseudo_transition][, pseudo_emission][, update_fn]) -> MarkovModel

    Train a MarkovModel using the Baum-Welch algorithm.  states is a list
    of strings that describe the names of each state.  alphabet is a
    list of objects that indicate the allowed outputs.  training_data
    is a list of observations.  Each observation is a list of objects
    from the alphabet.

    pseudo_initial, pseudo_transition, and pseudo_emission are
    optional parameters for assigning pseudo-counts.  Each should be a
    matrix of the appropriate size, and its entries are added to the
    corresponding parameter matrix before normalization: pseudo_initial
    has shape (len(states),), pseudo_transition has shape (len(states),
    len(states)), and pseudo_emission has shape (len(states),
    len(alphabet)).

    update_fn is an optional callback that takes parameters
    (iteration, log_likelihood).  It is called once per iteration.

    """
    N, M = len(states), len(alphabet)
    if not training_data:
        raise ValueError("No training data given.")
    pseudo_initial, pseudo_emission, pseudo_transition = map(
        _safe_asarray, (pseudo_initial, pseudo_emission, pseudo_transition))
    # Compare against None explicitly: the truth value of an array with
    # more than one element is ambiguous.
    if pseudo_initial is not None and shape(pseudo_initial) != (N,):
        raise ValueError("pseudo_initial not shape len(states)")
    if pseudo_transition is not None and shape(pseudo_transition) != (N, N):
        raise ValueError("pseudo_transition not shape "
                         "len(states) X len(states)")
    if pseudo_emission is not None and shape(pseudo_emission) != (N, M):
        raise ValueError("pseudo_emission not shape "
                         "len(states) X len(alphabet)")

    # Training data is given as a list of members of the alphabet.
    # Replace those with indexes into the alphabet list for easier
    # computation.
    training_outputs = []
    indexes = listfns.itemindex(alphabet)
    for outputs in training_data:
        training_outputs.append([indexes[x] for x in outputs])

    # Do some sanity checking on the outputs.
    lengths = [len(x) for x in training_outputs]
    if min(lengths) == 0:
        raise ValueError("I got training data with outputs of length 0")

    # Do the training with baum welch.
    x = _baum_welch(N, M, training_outputs,
                    pseudo_initial=pseudo_initial,
                    pseudo_transition=pseudo_transition,
                    pseudo_emission=pseudo_emission,
                    update_fn=update_fn)
    p_initial, p_transition, p_emission = x
    return MarkovModel(states, alphabet, p_initial, p_transition, p_emission)

MAX_ITERATIONS = 1000   # upper bound on Baum-Welch iterations in _baum_welch

