Question : Find a function which best suits the data

Hi,

I have data which when I plot , I get  a curve. The data keeps changing and I have to find a function, which essentially satisfies the data. Is there any way in perl, where we can estimate a function to the data obtained.
Thanks in advance
vijay

Answer : Find a function which best suits the data

Ok. You can create a polynomial based on the points you have. If you have seven points, you will need to create six-degree polynomial. Perhaps your data is best represented not by a single high-order polynomial, but by a sucession of low-order polynomials joined together. That's called a spline.

This is the reference I capture from the book stated earlier:

The two subroutines that follow are used in tandem to generate and evaluate cubic splines. Like poly_fit(), they expect a series of points from which to generate the spline spline_generate() produces what we'll call the spline coefficients: the second derivatives at each of the input points. spline_evaluate() allows you to interpolate along the spline: given a new x value and the spline coefficients, it tells you the corresponding y value.  
 
# spline_generate(@points) calculates the coefficients for
# the cubic spline that hits all of the points in @points
# (provided as an array of arrays, each being an [x, y] pair).
# It returns the coefficients as a reference to an array.
#
sub spline_generate {
    my @points = @_;
    my ($i, $delta, $temp, @factors, @coeffs);
    $coeffs[0] = $factors[0] = 0;


    # Decomposition phase of the tridiagonal system of equations
    for ($i = 1; $i < @points - 1; $i++) {
        $delta = ($points[$i][0] - $points[$i-1][0]) /
            ($points[$i+1][0] - $points[$i-1][0]);
        $temp = $delta * $coeffs[$i-1] + 2;
        $coeffs[$i] = ($delta - 1) / @points;
        $factors[$i] = ($points[$i+1][1] - $points[$i][1]) /
            ($points[$i+1][0] - $points[$i][0]) -
                ($points[$i][1] - $points[$i-1][1]) /
                    ($points[$i][0] - $points[$i-1][0]);
        $factors[$i] = ( 6 * $factors[$i] /
                        ($points[$i+1][0] - $points[$i-1][0]) -
                        $delta * $factors[$i-1] ) / $temp;
    }


    # Backsubstitution phase of the tridiagonal system
    #
    $coeffs[$#points] = 0;
    for ($i = @points - 2; $i >= 0; $i--) {
        $coeffs[$i] = $coeffs[$i] * $coeffs[$i+1] + $factors[$i];
    }
    return \@coeffs;
}


# spline_evaluate($x, $coeffs, @points) returns the y-value
# for the given x-value, along the spline generated by
# $coeffs = spline_generate(@points).
#
sub spline_evaluate {
    my ($x, $coeffs, @points) = @_;
    my ($i, $delta, $mult);


    # Which section of the spline are we in?
    #
    for ($i = @points - 2; $i >= 1; $i--) {
        last if $x >= $points[$i][0];
    }


    $delta = $points[$i+1][0] - $points[$i][0];
    $mult = ( $coeffs->[$i]/2 ) +
        ($x - $points[$i][0]) * ($coeffs->[$i+1] - $coeffs->[$i])
            / (6 * $delta);
    $mult *= $x - $points[$i][0];
    $mult += ($points[$i+1][1] - $points[$i][1]) / $delta;
    $mult -= ($coeffs->[$i+1] + 2 * $coeffs->[$i]) * $delta / 6;
    return $points[$i][1] + $mult * ($x - $points[$i][0]);
}

@points = ( [-1,1], [0,2], [1,-1], [2, 2] );
my $coeffs = spline_generate @points;
print "Spline coefficients: @$coeffs\n";
for (my $i = -1; $i <= 3; $i += .5) {
    printf "[%.2f, %.2f]\n", $i, spline_evaluate($i, $coeffs, @points);
}  
 
We provide four points (in @points) to spline_generate(). To verify that the resulting spline coefficients are accurate, we step along the x in intervals of 0.5 and determine the value of the cubic spline at that point.  
 
Spline coefficients: 0 -7.35483870967742 10.8387096774194 0
[-1.00, 1.00]
[-0.50, 1.96]
[0.00, 2.00]
[0.50, 0.28]
[1.00, -1.00]
[1.50, -0.18]
[2.00, 2.00]
[2.50, 4.18]
[3.00, 5.00]

If you rather prefer a method that can generate polynomial,

sub poly_fit {
    my @points = @_;
    my ($i, $j, $y, @solution);
    for ($i = 0; $i < @points; $i++) {
        ($x, $y) = ( @ {$points[$i]} );  
    for ($j = 0; $j < @points-1; $j++) {
            $points[$i][$j] = $x ** (@points - $j - 1);
        }
        $points[$i][@points-1] = 1;
        $points[$i][@points]   = $y;
    }
    return linear_solve( @points );
}  
 
What polynomial includes the points (2, 7000), (3, 6000), and (4, 9000)? We can fit any three points to a parabola—a polynomial of degree 2—and we can find out those points as follows:  
 
@solution = poly_fit( [2,7000], [3,6000], [4,9000] );
print "@solution\n";                         # 2000 -11000 21000  
 
This tells us that the parabola hitting those three points is 2000x2-11000x+21000.

 
Random Solutions  
 
programming4us programming4us