common.math package¶
Submodules¶
common.math.least_squares module¶
Some mathematical utility functions

common.math.least_squares.
correlation_coefficient
(data_x, data_y)¶

common.math.least_squares.
least_squares_fit
(data_x, data_y, sig_y)¶ Linear least squares fit of y = a0+a1*x
data_x – the independent data
data_y – the dependent data
sig_y – the errors in the dependent data
data_x, data_y, sig_y should be sequences of equal length
Returns (a0, a1, sig_a0, sig_a1, chi2)
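As an illustration of what a fit of this kind computes, the following is a minimal sketch of a weighted straight-line fit using the standard closed-form solution. The function name `linear_fit_sketch` is hypothetical, and it is an assumption that `least_squares_fit` uses the same formulas and returns the unreduced chi-squared.

```python
import numpy as np

def linear_fit_sketch(data_x, data_y, sig_y):
    """Weighted straight-line fit y = a0 + a1*x (illustrative only)."""
    x = np.asarray(data_x, dtype=float)
    y = np.asarray(data_y, dtype=float)
    w = 1.0 / np.asarray(sig_y, dtype=float) ** 2

    # Weighted sums used by the closed-form solution.
    s, sx, sy = w.sum(), (w * x).sum(), (w * y).sum()
    sxx, sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = s * sxx - sx ** 2

    a0 = (sxx * sy - sx * sxy) / delta
    a1 = (s * sxy - sx * sy) / delta
    sig_a0 = np.sqrt(sxx / delta)
    sig_a1 = np.sqrt(s / delta)
    chi2 = (w * (y - a0 - a1 * x) ** 2).sum()
    return a0, a1, sig_a0, sig_a1, chi2

# Noise-free data on the line y = 1 + 2*x with unit errors.
a0, a1, sig_a0, sig_a1, chi2 = linear_fit_sketch(
    [0, 1, 2, 3], [1, 3, 5, 7], [1, 1, 1, 1])
```

For exact data such as this, the fitted intercept and slope reproduce the input line and chi2 is zero to rounding error.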
common.math.lfit module¶
General least squares fit (see Numerical Recipes)
This module contains Python versions of some of the procedures from “Numerical Recipes” dealing with least squares fits.

common.math.lfit.
covsrt
(covar, ma, ia, mfit)¶ Expand in storage the covariance matrix covar, so as to take into account parameters that are being held fixed. (For the latter return zero covariances.)

common.math.lfit.
gaussj
(a, b)¶ Linear equation solution by Gauss-Jordan elimination, equation (2.1.1) in Numerical Recipes. a[1..n][1..n] is the input matrix. b[1..n][1..m] is input containing m right-hand side vectors. On output, a is replaced by its matrix inverse, and b is replaced by the corresponding set of solution vectors.

common.math.lfit.
lfit
(x, y, sig, a, ia, funcs)¶ Given a set of data points x[1..ndat], y[1..ndat] with individual standard deviations sig[1..ndat], use chi-squared minimization to fit for some or all of the coefficients a[1..ma] of a function that depends linearly on a, y = Sum(a_i * afunc_i(x)). The input array ia[1..ma] indicates by nonzero entries those components of a that should be fitted for, and by zero entries those components that should be held fixed at their input values. The program returns values of a[1..ma], chisq, and the covariance matrix covar[1..ma][1..ma]. (Parameters held fixed will return zero covariances.) The user supplies a routine funcs(x, i) that returns the ma basis functions evaluated at x=x in the array afunc[1..ma].
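The idea of fitting only the flagged coefficients can be sketched with NumPy as follows. This is not the module's implementation: the name `linear_fit_with_fixed` and the `basis(x)` callable (which returns all basis values for one point) are assumptions made for the illustration, and the normal equations are solved with `numpy.linalg.inv` rather than with `gaussj`.

```python
import numpy as np

def linear_fit_with_fixed(x, y, sig, a, ia, basis):
    """Chi-squared fit of y = sum_k a[k]*basis(x)[k]; a[k] stays fixed where ia[k] == 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sig = np.asarray(sig, dtype=float)
    a = np.array(a, dtype=float)
    free = np.asarray(ia) != 0

    # Design matrix: one row per data point, one column per basis function.
    design = np.array([basis(xi) for xi in x])

    # Remove the contribution of the fixed parameters from the data.
    y_free = y - design[:, ~free] @ a[~free]

    # Weight by 1/sigma and solve the normal equations for the free part.
    aw = design[:, free] / sig[:, None]
    bw = y_free / sig
    cov_free = np.linalg.inv(aw.T @ aw)
    a[free] = cov_free @ (aw.T @ bw)

    # Expand the covariance; fixed parameters keep zero covariances.
    covar = np.zeros((a.size, a.size))
    covar[np.ix_(free, free)] = cov_free
    chisq = float(np.sum(((y - design @ a) / sig) ** 2))
    return a, chisq, covar

# Quadratic test data with the constant term held fixed at 2.0.
xs = np.linspace(0.0, 1.0, 6)
ys = 2.0 + 3.0 * xs + 0.5 * xs ** 2
coeffs, chisq, covar = linear_fit_with_fixed(
    xs, ys, 0.1 * np.ones(6), [2.0, 0.0, 0.0], [0, 1, 1],
    lambda t: [1.0, t, t * t])
```

The row and column of `covar` that belong to the fixed coefficient stay zero, which mirrors the behaviour described for `covsrt`.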

common.math.lfit.
main
()¶ This is an example of the use of the above functions.

common.math.lfit.
poly
(x, i, lvl=4)¶ This function can be used as input for lfit

common.math.lfit.
poly2d
(x, i)¶ This function can be used as input for lfit
x is a list of (x,y) tuples
common.math.mpfit module¶
Perform Levenberg-Marquardt least-squares minimization, based on MINPACK-1.
AUTHORS
The original version of this software, called LMFIT, was written in FORTRAN as part of the MINPACK-1 package by XXX.
Craig Markwardt converted the FORTRAN code to IDL. The information for the IDL version is:
Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770 craigm@lheamail.gsfc.nasa.gov UPDATED VERSIONs can be found on my WEB PAGE:
 Mark Rivers created this Python version from Craig’s IDL version.
Mark Rivers, University of Chicago Building 434A, Argonne National Laboratory 9700 South Cass Avenue, Argonne, IL 60439 rivers@cars.uchicago.edu Updated versions can be found at http://cars.uchicago.edu/software
DESCRIPTION
MPFIT uses the Levenberg-Marquardt technique to solve the least-squares problem. In its typical use, MPFIT will be used to fit a user-supplied function (the “model”) to user-supplied data points (the “data”) by adjusting a set of parameters. MPFIT is based upon MINPACK-1 (LMDIF.F) by Moré and collaborators.
For example, a researcher may think that a set of observed data points is best modelled with a Gaussian curve. A Gaussian curve is parameterized by its mean, standard deviation and normalization. MPFIT will, within certain constraints, find the set of parameters which best fits the data. The fit is “best” in the least-squares sense; that is, the sum of the weighted squared differences between the model and data is minimized.
The Levenberg-Marquardt technique is a particular strategy for iteratively searching for the best fit. This particular implementation is drawn from MINPACK-1 (see NETLIB), and is much faster and more accurate than the version provided in the Scientific Python package in Scientific.Functions.LeastSquares. This version allows upper and lower bounding constraints to be placed on each parameter, or the parameter can be held fixed.
The user-supplied Python function should return an array of weighted deviations between model and data. In a typical scientific problem the residuals should be weighted so that each deviate has a Gaussian sigma of 1.0. If X represents values of the independent variable, Y represents a measurement for each value of X, and ERR represents the error in the measurements, then the deviates could be calculated as follows:
    DEVIATES = (Y - F(X)) / ERR
where F is the analytical function representing the model. You are recommended to use the convenience functions MPFITFUN and MPFITEXPR, which are driver functions that calculate the deviates for you. If ERR are the 1-sigma uncertainties in Y, then
    TOTAL( DEVIATES^2 )
will be the total chi-squared value. MPFIT will minimize the chi-square value. The values of X, Y and ERR are passed through MPFIT to the user-supplied function via the FUNCTKW keyword.
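A short NumPy sketch of the deviates and the resulting chi-squared value is given here. The straight-line `model` and the data values are made up for the illustration.

```python
import numpy as np

def model(x, p):
    # Hypothetical model: a straight line p[0] + p[1]*x.
    return p[0] + p[1] * x

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
err = np.array([0.1, 0.1, 0.2, 0.2])
p = [1.0, 2.0]

# Weighted deviates: each one is the residual in units of its 1-sigma error.
deviates = (y - model(x, p)) / err

# The sum of squared deviates is the chi-squared value that MPFIT minimizes.
chi2 = np.sum(deviates ** 2)
```

Each residual here is exactly one sigma away from the model, so chi2 comes out as 4 (one unit per data point).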
Simple constraints can be placed on parameter values by using the PARINFO keyword to MPFIT. See below for a description of this keyword.
MPFIT does not perform more general optimization tasks. See TNMIN instead. MPFIT is customized, based on MINPACK-1, to the least-squares minimization problem.
USER FUNCTION
The user must define a function which returns the appropriate values as specified above. The function should return the weighted deviations between the model and the data. It should also return a status flag and an optional partial derivative array. For applications which use finite-difference derivatives (the default) the user function should be declared in the following way:
    def myfunct(p, fjac=None, x=None, y=None, err=None):
        # Parameter values are passed in "p"
        # If fjac is None then partial derivatives should not be
        # computed. It will always be None if MPFIT is called with default
        # flag.
        model = F(x, p)
        # Non-negative status value means MPFIT should continue, negative means
        # stop the calculation.
        status = 0
        return [status, (y - model)/err]
See below for applications with analytical derivatives.
The keyword parameters X, Y, and ERR in the example above are suggestive but not required. Any parameters can be passed to MYFUNCT by using the functkw keyword to MPFIT. Use MPFITFUN and MPFITEXPR if you need ideas on how to do that. The function must accept a parameter list, P.
In general there are no restrictions on the number of dimensions in X, Y or ERR. However the deviates must be returned in a one-dimensional numpy array of type float.
User functions may also indicate a fatal error condition using the status return described above. If status is set to a number between -15 and -1 then MPFIT will stop the calculation and return to the caller.
ANALYTIC DERIVATIVES
In the search for the best-fit solution, MPFIT by default calculates derivatives numerically via a finite difference approximation. The user-supplied function need not calculate the derivatives explicitly. However, if you desire to compute them analytically, then the AUTODERIVATIVE=0 keyword must be passed to MPFIT. As a practical matter, it is often sufficient and even faster to allow MPFIT to calculate the derivatives numerically, and so AUTODERIVATIVE=0 is not necessary.
If AUTODERIVATIVE=0 is used then the user function must check the parameter FJAC, and if FJAC!=None then return the partial derivative array in the return list.
    def myfunct(p, fjac=None, x=None, y=None, err=None):
        # Parameter values are passed in "p"
        # If fjac is not None then partial derivatives must be computed.
        # fjac contains an array of len(p), where each entry
        # is 1 if that parameter is free and 0 if it is fixed.
        model = F(x, p)
        # Non-negative status value means MPFIT should continue, negative means
        # stop the calculation.
        status = 0
        if fjac is not None:
            pderiv = numpy.zeros([len(x), len(p)], float)
            for j in range(len(p)):
                pderiv[:,j] = FGRAD(x, p, j)
        else:
            pderiv = None
        return [status, (y - model)/err, pderiv]
where FGRAD(x, p, i) is a user function which must compute the derivative of the model with respect to parameter P[i] at X. When finite differencing is used for computing derivatives (i.e., when AUTODERIVATIVE=1), or when MPFIT needs only the errors but not the derivatives, the parameter FJAC is None.
Derivatives should be returned in the PDERIV array. PDERIV should be an m x n array, where m is the number of data points and n is the number of parameters. PDERIV[i,j] is the derivative at the ith point with respect to the jth parameter.
The derivatives with respect to fixed parameters are ignored; zero is an appropriate value to insert for those derivatives. Upon input to the user function, FJAC is set to a vector with the same length as P, with a value of 1 for a parameter which is free, and a value of zero for a parameter which is fixed (and hence no derivative needs to be calculated).
If the data is higher than one dimensional, then the last dimension should be the parameter dimension. Example: fitting a 50x50 image, PDERIV should be 50x50xNPAR.
CONSTRAINING PARAMETER VALUES WITH THE PARINFO KEYWORD
The behavior of MPFIT can be modified with respect to each parameter to be fitted. A parameter value can be fixed; simple boundary constraints can be imposed; limitations on the parameter changes can be imposed; properties of the automatic derivative can be modified; and parameters can be tied to one another.
These properties are governed by the PARINFO structure, which is passed as a keyword parameter to MPFIT.
PARINFO should be a list of dictionaries, one list entry for each parameter. Each parameter is associated with one element of the array, in numerical order. The dictionary can have the following keys (none are required, keys are case insensitive):
 'value' - the starting parameter value (but see the START_PARAMS parameter for more information [not available for this environment - VOG 11-08-06]).
 'fixed' - a boolean value, whether the parameter is to be held fixed or not. Fixed parameters are not varied by MPFIT, but are passed on to MYFUNCT for evaluation.
 'limited' - a two-element boolean array. If the first/second element is set, then the parameter is bounded on the lower/upper side. A parameter can be bounded on both sides. Both LIMITED and LIMITS must be given together.
 'limits' - a two-element float array. Gives the parameter limits on the lower and upper sides, respectively. Zero, one or two of these values can be set, depending on the values of LIMITED. Both LIMITED and LIMITS must be given together.
 'parname' - a string, giving the name of the parameter. The fitting code of MPFIT does not use this tag in any way. However, the default iterfunct will print the parameter name if available.
 'step' - the step size to be used in calculating the numerical derivatives. If set to zero, then the step size is computed automatically. Ignored when AUTODERIVATIVE=0.
 'mpside' - the sidedness of the finite difference when computing numerical derivatives. This field can take four values:
     0 - one-sided derivative computed automatically
     1 - one-sided derivative (f(x+h) - f(x))/h
    -1 - one-sided derivative (f(x) - f(x-h))/h
     2 - two-sided derivative (f(x+h) - f(x-h))/(2*h)
   Where H is the STEP parameter described above. The “automatic” one-sided derivative method will choose a direction for the finite difference which does not violate any constraints. The other methods do not perform this check. The two-sided method is in principle more precise, but requires twice as many function evaluations. Default: 0.
 'mpmaxstep' - the maximum change to be made in the parameter value. During the fitting process, the parameter will never be changed by more than this value in one iteration. A value of 0 indicates no maximum. Default: 0.
 'tied' - a string expression which “ties” the parameter to other free or fixed parameters. Any expression involving constants and the parameter array P is permitted. Example: if parameter 2 is always to be twice parameter 1 then use the following: parinfo[2]['tied'] = '2 * p[1]'. Since they are totally constrained, tied parameters are considered to be fixed; no errors are computed for them. [ NOTE: the PARNAME can’t be used in expressions. ]
 'mpprint' - if set to 1, then the default iterfunct will print the parameter value. If set to 0, the parameter value will not be printed. This tag can be used to selectively print only a few parameter values out of many. Default: 1 (all parameters printed)
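The finite-difference formulas behind the 'mpside' settings can be sketched as below. The helper `finite_difference` is hypothetical and is not part of mpfit; in particular, the automatic mode (0) is simply treated as a forward difference here, without the constraint check that MPFIT performs.

```python
def finite_difference(f, x, h, side=0):
    """Derivative estimate of f at x for a given sidedness code."""
    if side == 2:
        # Two-sided (central) difference: two extra function evaluations.
        return (f(x + h) - f(x - h)) / (2.0 * h)
    if side == -1:
        # One-sided backward difference.
        return (f(x) - f(x - h)) / h
    # side == 1, or side == 0 treated as forward in this sketch.
    return (f(x + h) - f(x)) / h

def square(t):
    return t * t

forward = finite_difference(square, 1.0, 0.1, side=1)
backward = finite_difference(square, 1.0, 0.1, side=-1)
central = finite_difference(square, 1.0, 0.1, side=2)
```

For a quadratic, the central difference recovers the exact derivative (2 at x = 1), while the two one-sided estimates are off by +h and -h respectively, which is the precision trade-off noted for the two-sided method.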
Future modifications to the PARINFO structure, if any, will involve adding dictionary tags beginning with the two letters “MP”. Therefore programmers are urged to avoid using tags starting with the same letters; otherwise they are free to include their own fields within the PARINFO structure, and they will be ignored.
PARINFO Example:
    parinfo = [{'value':0., 'fixed':0, 'limited':[0,0], 'limits':[0.,0.]} for i in range(5)]
    parinfo[0]['fixed'] = 1
    parinfo[4]['limited'][0] = 1
    parinfo[4]['limits'][0] = 50.
    values = [5.7, 2.2, 500., 1.5, 2000.]
    for i in range(5): parinfo[i]['value']=values[i]
A total of 5 parameters, with starting values of 5.7, 2.2, 500, 1.5, and 2000 are given. The first parameter is fixed at a value of 5.7, and the last parameter is constrained to be above 50.
EXAMPLE
    import numpy
    import mpfit

    # myfunct is the user function declared as described under USER FUNCTION
    x = numpy.arange(1, 101, dtype=float)   # start at 1 so that log(x) is finite
    p0 = [5.7, 2.2, 500., 1.5, 2000.]
    y = (p0[0] + p0[1]*x + p0[2]*x**2 + p0[3]*numpy.sqrt(x) +
         p0[4]*numpy.log(x))
    err = numpy.ones(len(x))                # unit uncertainties, assumed here
    fa = {'x':x, 'y':y, 'err':err}
    m = mpfit.mpfit(myfunct, p0, functkw=fa)
    print('status = %s' % m.status)
    if m.status <= 0:
        print('error message = %s' % m.errmsg)
    print('parameters = %s' % m.params)
Minimizes sum of squares of MYFUNCT. MYFUNCT is called with the X, Y, and ERR keyword parameters that are given by FUNCTKW. The results can be obtained from the returned object m.
EXAMPLE II
Fit the parameters of a model with some noise added. A plot shows the results.
    #!/usr/bin/env python
    import numpy
    import mpfit
    import pylab

    def peval(x, p):
        # The model function with parameters p
        return p[0] * numpy.sin(2*numpy.pi*p[1]*x + p[2])

    def myfunct(p, fjac=None, x=None, y=None, err=None):
        # Function that returns the weighted deviates
        model = peval(x, p)
        status = 0
        return [status, (y - model)/err]

    # Generate model data
    x = numpy.arange(0, 6e-2, 6e-2/30)
    preal = [10, 1.0/3e-2, numpy.pi/6]
    y_true = peval(x, preal)
    y = y_true + 2 * numpy.random.randn(len(x))
    err = 1.0 + 0.15 * numpy.random.randn(len(x))

    # Initial estimates for MPFIT
    p0 = [8, 1/2.3e-2, numpy.pi/3]
    fa = {'x':x, 'y':y, 'err':err}

    # Call MPFIT with user defined function 'myfunct'
    m = mpfit.mpfit(myfunct, p0, functkw=fa)
    print('status: %s' % m.status)
    if m.status <= 0:
        print('error message = %s' % m.errmsg)
    else:
        print('Iterations: %s' % m.niter)
        print('Fitted pars: %s' % m.params)
        print('Uncertainties: %s' % m.perror)

    # Plot the result with Matplotlib
    pylab.errorbar(x, y, yerr=err, fmt='ro', label="Noisy data")
    pylab.plot(x, peval(x, m.params), label="Fit")
    pylab.plot(x, y_true, 'g', label="True data")
    pylab.xlabel("X")
    pylab.ylabel("Measurement data")
    pylab.title("Least-squares fit to noisy data using MPFIT")
    pylab.legend()
    pylab.show()
EXAMPLE III
If you want to have more control via the PARINFO structure, add the following lines after the assignment of 'p0'.
    numpars = len(p0)
    parinfo = [{'value':0., 'fixed':0, 'limited':[0,0], 'limits':[0.,0.]} for i in range(numpars)]
    for i in range(numpars):
        parinfo[i]['value'] = p0[i]
Change the instantiation of the mpfit object to:
    m = mpfit.mpfit(myfunct, parinfo=parinfo, functkw=fa)
THEORY OF OPERATION
There are many specific strategies for function minimization. One very popular technique is to use function gradient information to realize the local structure of the function. Near a local minimum the function value can be Taylor expanded about x0 as follows:
    f(x) = f(x0) + f'(x0) . (x-x0) + (1/2) (x-x0) . f''(x0) . (x-x0)    (1)
           -----   ---------------   -------------------------------
    Order   0th          1st                       2nd
Here f'(x) is the gradient vector of f at x, and f''(x) is the Hessian matrix of second derivatives of f at x. The vector x is the set of function parameters, not the measured data vector. One can find the minimum of f, f(xm), using Newton’s method, and arrives at the following linear equation:
    f''(x0) . (xm-x0) = - f'(x0)    (2)
If an inverse can be found for f''(x0) then one can solve for (xm-x0), the step vector from the current position x0 to the new projected minimum. Here the problem has been linearized (i.e., the gradient information is known to first order). f''(x0) is a symmetric n x n matrix, and should be positive definite.
The Levenberg-Marquardt technique is a variation on this theme. It adds an additional diagonal term to the equation which may aid the convergence properties:
    (f''(x0) + nu I) . (xm-x0) = - f'(x0)    (2a)
where I is the identity matrix. When nu is large, the overall matrix is diagonally dominant, and the iterations follow steepest descent. When nu is small, the iterations are quadratically convergent.
In principle, if f''(x0) and f'(x0) are known then xm-x0 can be determined. However the Hessian matrix is often difficult or impossible to compute. The gradient f'(x0) may be easier to compute, if even by finite difference techniques. So-called quasi-Newton techniques attempt to successively estimate f''(x0) by building up gradient information as the iterations proceed.
In the least squares problem there are further simplifications which assist in solving eqn (2). The function to be minimized is a sum of squares:
    f = Sum(hi^2)    (3)
where hi is the ith residual out of m residuals as described above. This can be substituted back into eqn (2) after computing the derivatives:
    f' = 2 Sum(hi hi')
    f'' = 2 Sum(hi' hj') + 2 Sum(hi hi'')    (4)
If one assumes that the parameters are already close enough to a minimum, then one typically finds that the second term in f'' is negligible [or, in any case, is too difficult to compute]. Thus, equation (2) can be solved, at least approximately, using only gradient information.
In matrix notation, the combination of eqns (2) and (4) becomes:
    hT' . h' . dx = - hT' . h    (5)
where h is the residual vector (length m), hT is its transpose, h' is the Jacobian matrix (dimensions m x n, for m residuals and n parameters), and dx is (xm-x0). The user function supplies the residual vector h, and in some cases h' when it is not found by finite differences (see MPFIT_FDJAC2, which finds h and hT'). Even if dx is not the best absolute step to take, it does provide a good estimate of the best direction, so often a line minimization will occur along the dx vector direction.
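The step equation can be illustrated with a few lines of NumPy. This sketch combines eqn (5) with the diagonal damping term of eqn (2a); the function `lm_step`, the (m, n) layout of `jac`, and the linear test problem are assumptions made for the example, and MPFIT itself does not form the normal equations explicitly.

```python
import numpy as np

def lm_step(jac, h, nu):
    """Solve (J^T J + nu I) dx = -J^T h for the step dx."""
    n = jac.shape[1]
    lhs = jac.T @ jac + nu * np.eye(n)
    rhs = -jac.T @ h
    return np.linalg.solve(lhs, rhs)

# Linear residuals h(x) = A x - b, so the Jacobian is A everywhere.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
x0 = np.zeros(2)
h0 = A @ x0 - b

# With nu = 0 a single step reaches the least-squares solution.
dx_undamped = lm_step(A, h0, 0.0)
# A large nu shortens the step and turns it toward steepest descent.
dx_damped = lm_step(A, h0, 100.0)
```

Because the residuals are linear in x, the undamped step lands exactly on the minimum, while the damped step is much shorter.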
The method of solution employed by MINPACK is to form the Q . R factorization of h', where Q is an orthogonal matrix such that QT . Q = I, and R is upper right triangular. Using h' = Q . R and the orthogonality of Q, eqn (5) becomes
    (RT . QT) . (Q . R) . dx = - (RT . QT) . h
                 RT . R . dx = - RT . QT . h    (6)
                      R . dx = - QT . h
where the last statement follows because R is upper triangular. Here, R, QT and h are known so this is a matter of solving for dx. The routine MPFIT_QRFAC provides the QR factorization of h', with pivoting, and MPFIT_QRSOLV provides the solution for dx.
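The equivalence between the QR route and the normal equations can be checked numerically. The sketch below uses `numpy.linalg.qr` (reduced form, no pivoting) in place of MPFIT_QRFAC and MPFIT_QRSOLV, and the Jacobian and residual values are made up for the illustration.

```python
import numpy as np

# Hypothetical Jacobian (m = 4 residuals, n = 2 parameters) and residual vector.
jac = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
h = np.array([0.5, -0.2, 0.1, -0.4])

# Reduced QR factorization: q is m x n with orthonormal columns,
# r is n x n and upper triangular.
q, r = np.linalg.qr(jac)

# Last line of eqn (6): R . dx = - QT . h
dx_qr = np.linalg.solve(r, -q.T @ h)

# Eqn (5) solved directly through the normal equations, for comparison.
dx_normal = np.linalg.solve(jac.T @ jac, -jac.T @ h)
```

Both routes give the same step; the QR form avoids squaring the condition number of the Jacobian, which is one reason a factorization is preferred over forming hT' . h' explicitly.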
REFERENCES
MINPACK-1, Jorge Moré, available from netlib (www.netlib.org).
“Optimization Software Guide,” Jorge Moré and Stephen Wright, SIAM, Frontiers in Applied Mathematics, Number 14.
Moré, Jorge J., “The Levenberg-Marquardt Algorithm: Implementation and Theory,” in Numerical Analysis, ed. Watson, G. A., Lecture Notes in Mathematics 630, Springer-Verlag, 1977.
MODIFICATION HISTORY
Translated from MINPACK-1 in FORTRAN, Apr-Jul 1998, CM
Copyright (C) 1997-2002, Craig Markwardt This software is provided as is without any warranty whatsoever. Permission to use, copy, modify, and distribute modified or unmodified copies is granted, provided this copyright and disclaimer are included unchanged.
Translated from MPFIT (Craig Markwardt’s IDL package) to Python, August, 2002. Mark Rivers
Converted to NumPy by Martin Vogelaar, 11-08-2006

class
common.math.mpfit.
machar
(double=1)¶ Bases:
object

class
common.math.mpfit.
mpfit
(fcn, xall=None, functkw={}, parinfo=None, ftol=1e-10, xtol=1e-10, gtol=1e-10, damp=0.0, maxiter=200, factor=100.0, nprint=1, iterfunct='default', iterkw={}, nocovar=0, fastnorm=0, rescale=0, autoderivative=1, quiet=0, diag=None, epsfcn=None, debug=0)¶ Bases:
object

calc_covar
(rr, ipvt=None, tol=1e-14)¶

call
(fcn2, x, functkw, fjac=None)¶

defiter
(fcn, x, iter, fnorm=None, functkw=None, quiet=0, iterstop=None, parinfo=None, format=None, pformat='%.10g', dof=1)¶

enorm
(vec)¶

fdjac2
(fcn, x, fvec, step=None, ulimited=None, ulimit=None, dside=None, epsfcn=None, autoderivative=1, functkw=None, xall=None, ifree=None, dstep=None)¶

lmpar
(r, ipvt, diag, qtb, delta, x, sdiag, par=None)¶

parinfo
(parinfo=None, key='a', default=None, n=0)¶

qrfac
(a, pivot=0)¶

qrsolv
(r, ipvt, diag, qtb, sdiag)¶

tie
(p, ptied=None)¶

common.math.statistics module¶
This module provides several statistical utility functions

common.math.statistics.
get_mean_and_stdev
(data)¶ Calculate the mean and the sample standard deviation of a sequence of numbers.
 data : iterable
 the data for which to calculate the statistics
 (float, float)
 a tuple of the calculated mean and standard deviation

common.math.statistics.
get_median
(data)¶ Return the median of a sequence of numbers
 data : iterable
 the data for which to calculate the statistics
 float
 the determined median

common.math.statistics.
get_rms_min_max
(data)¶ Calculate the RMS (Root Mean Square), minimum, and maximum of a sequence of numbers.
 data : iterable
 the data for which to calculate the statistics
 (float, float, float)
 a tuple of the calculated RMS, minimum, and maximum

common.math.statistics.
get_weighted_mean_and_error
(data, errors)¶  Calculate the weighted mean and its uncertainty of a sequence of
 numbers and their errors.
 data : iterable
 the data for which to calculate the statistics
 errors : iterable
 the errors matching the input data
 (float, float)
 a tuple of the calculated mean and uncertainty
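A common way to compute such a weighted mean is inverse-variance weighting, sketched below. It is an assumption that `get_weighted_mean_and_error` uses exactly these formulas; the name `weighted_mean_and_error_sketch` is hypothetical.

```python
import numpy as np

def weighted_mean_and_error_sketch(data, errors):
    """Inverse-variance weighted mean and its propagated uncertainty."""
    data = np.asarray(data, dtype=float)
    weights = 1.0 / np.asarray(errors, dtype=float) ** 2
    mean = np.sum(weights * data) / np.sum(weights)
    error = 1.0 / np.sqrt(np.sum(weights))
    return mean, error

# Two measurements with equal errors: the mean is the plain average
# and the uncertainty shrinks by sqrt(2).
wm, wm_err = weighted_mean_and_error_sketch([1.0, 3.0], [1.0, 1.0])
```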

common.math.statistics.
get_weighted_mean_and_error_chisquare_corrected
(data, errors)¶  Calculate the weighted mean and its uncertainty of a sequence of
 numbers and their errors correcting for over/under estimation of the propagated errors.
 data : iterable
 the data for which to calculate the statistics
 errors : iterable
 the errors matching the input data
 (float, float)
 a tuple of the calculated mean and uncertainty
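One widely used correction for over- or under-estimated errors scales the propagated uncertainty by the square root of the reduced chi-squared of the data about the weighted mean. The sketch below shows that approach; whether this function applies the same scaling (and the same n - 1 degrees of freedom) is an assumption, and `chisquare_corrected_sketch` is a hypothetical name.

```python
import numpy as np

def chisquare_corrected_sketch(data, errors):
    """Weighted mean with the uncertainty scaled by sqrt(reduced chi-squared)."""
    data = np.asarray(data, dtype=float)
    errors = np.asarray(errors, dtype=float)
    weights = 1.0 / errors ** 2
    mean = np.sum(weights * data) / np.sum(weights)
    error = 1.0 / np.sqrt(np.sum(weights))
    # Reduced chi-squared of the data about the weighted mean (n - 1 degrees of freedom).
    chi2_red = np.sum(weights * (data - mean) ** 2) / (data.size - 1)
    return mean, error * np.sqrt(chi2_red)

# The scatter of these two points is larger than their quoted errors suggest,
# so the corrected uncertainty is inflated from 1/sqrt(2) to 1.
cm, cm_err = chisquare_corrected_sketch([1.0, 3.0], [1.0, 1.0])
```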

common.math.statistics.
get_weighted_mean_and_stdev
(data, weights, weight_type=None)¶ Calculate the weighted mean and the weighted sample standard deviation on a sequence of numbers and their weights.
 data : iterable
 the data for which to calculate the statistics
 weights : iterable
 the weights matching the input data
 weight_type : str
one of None, ‘frequency’, or ‘reliability’
If the weight type is None, the calculated standard deviation is biased. If it is ‘frequency’ or ‘reliability’, the calculated standard deviation is unbiased.
A frequency type weight indicates how many of a given value are in the data. For example, a set of values [2,2,4,5,5,5] would have data = [2,4,5] with weights [2,1,3]. A reliability type weight indicates the weights represent the confidence in the data values.
 (float, float)
 a tuple of the calculated weighted mean and standard deviation
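The three weight types can be illustrated with the textbook estimators below. These formulas are an assumption about what the function computes, and `weighted_mean_and_stdev_sketch` is a hypothetical name; the frequency case is checked against the expanded data set from the example above.

```python
import numpy as np

def weighted_mean_and_stdev_sketch(data, weights, weight_type=None):
    x = np.asarray(data, dtype=float)
    w = np.asarray(weights, dtype=float)
    v1 = w.sum()
    mean = np.sum(w * x) / v1
    ss = np.sum(w * (x - mean) ** 2)
    if weight_type == 'frequency':
        # Weights count occurrences: same result as expanding the data.
        var = ss / (v1 - 1.0)
    elif weight_type == 'reliability':
        # Unbiased estimate for non-integer confidence weights.
        var = ss / (v1 - np.sum(w ** 2) / v1)
    else:
        # No correction: biased estimate.
        var = ss / v1
    return mean, np.sqrt(var)

freq_mean, freq_std = weighted_mean_and_stdev_sketch(
    [2, 4, 5], [2, 1, 3], weight_type='frequency')
expanded = np.array([2, 2, 4, 5, 5, 5], dtype=float)
```

With frequency weights the result matches `expanded.mean()` and `expanded.std(ddof=1)`.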

common.math.statistics.
kappa_sigma_clip
(data, kappa=4.0, min_len=0, mask=False, iter=100, verbose=False)¶ Iteratively clip the outlying values of a sequence of numbers. Kappa-sigma clipping looks at all values of data and rejects (clips) any value that lies more than kappa*sigma above or below the mean value of data.
 data : iterable
 the data to clip or to determine the clipping mask
 kappa : float
 sigmamultiplier to determine threshold
 min_len : int
 minimum number of data points to retain in output data (clipping stops if the length of the clipped data falls below this number)
 mask : bool
 don’t return the masked data, but a tuple of the mask used to extract it and the length of the masked data
 iter : int
 the maximum number of iterations
 verbose : bool
 print out clipping messages
 numpy.ndarray or (numpy.ndarray, int)
 a clipped version of data or a tuple of (mask, len), where the length of mask is the same as the length of the input data, and len is the length of the clipped sequence that would result by applying the mask (e.g., numpy.take or array indexing).
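The clipping loop can be sketched as follows. This is a simplified illustration, not the module's code: it omits min_len and verbose, uses the population standard deviation, and the name `kappa_sigma_clip_sketch` is hypothetical.

```python
import numpy as np

def kappa_sigma_clip_sketch(data, kappa=4.0, max_iter=100):
    """Return (clipped values, boolean mask) after iterative kappa-sigma clipping."""
    values = np.asarray(data, dtype=float)
    keep = np.ones(values.size, dtype=bool)
    for _ in range(max_iter):
        mean = values[keep].mean()
        sigma = values[keep].std()
        # Keep only points within kappa*sigma of the current mean.
        new_keep = keep & (np.abs(values - mean) <= kappa * sigma)
        if new_keep.sum() == keep.sum():
            break  # nothing was clipped in this pass
        keep = new_keep
    return values[keep], keep

sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 100.0]
clipped, clip_mask = kappa_sigma_clip_sketch(sample, kappa=2.0)
```

The first pass removes the value 100.0; the second pass finds nothing further to clip, so the loop stops with the ten remaining values.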

common.math.statistics.
mad_estimator
(data)¶ Estimate the standard deviation of a Gaussian distribution using the median of the absolute deviation from the median (MAD) scaled by 1.4826.
 data : iterable
 the data for which to calculate the statistics
 float
 the scaled MAD
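The estimator described above reduces to two medians and a scale factor, as this NumPy sketch shows (the name `mad_sigma_sketch` is hypothetical):

```python
import numpy as np

def mad_sigma_sketch(data):
    """Robust sigma estimate: 1.4826 times the median absolute deviation."""
    values = np.asarray(data, dtype=float)
    mad = np.median(np.abs(values - np.median(values)))
    return 1.4826 * mad

# The outlier at 100 barely affects the estimate: the median is 3 and the
# absolute deviations are [2, 1, 0, 1, 97], whose median is 1.
robust_sigma = mad_sigma_sketch([1, 2, 3, 4, 100])
```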

common.math.statistics.
median
(data)¶ DEPRECATED! Use get_median() instead.
Return the median of a sequence of numbers
 data : iterable
 the data for which to calculate the statistics
 float
 the determined median

common.math.statistics.
median_absolute_deviation
(data, scale=1.0)¶ Calculate the median of the absolute deviation from the median, or the MAD of a sequence of numbers.
 data : iterable
 the data for which to calculate the statistics
 scale : float
 scaling factor to apply to the output to estimate other statistics, e.g., the standard deviation of a Gaussian distribution
 float
 the scaled MAD
Module contents¶
Mathematics