
Create example data

Generate data from the function y = a * e^(b * x) + c, with added Gaussian noise.

# tibbles instead of data.frame for cleaner syntax
# includes ggplot2
library(tidyverse) 

a <- 23.7
b <- -0.01
c <- 37.9

set.seed(1) # for reproducibility

example_data <- tibble(
  x = 1:500,
  y = a * exp(b * x) + c + rnorm(500, 0, 5)
)

g <- ggplot(example_data, aes(x, y)) + 
  geom_point(alpha = .2) + 
  expand_limits(x = 0, y = 0)

g

Exponential fit

Here we use the nls() (Nonlinear Least Squares) function to estimate the parameters of an exponential model, assuming the form y = a * e^(b * x) + c. Setting trace = TRUE shows how the algorithm searches for the parameters. We could also use the lower and upper arguments to set bounds on the coefficient estimates (see ?nls).

exp_model <- nls(y ~ a * exp(b * x) + c, data = example_data, start = list(a = 1, b = 1, c = 1), trace = TRUE)
## Inf :  1 1 1
## Inf :    2.779366e-06   1.000000e+00 -1.490282e+185
## Inf :   -1.490958e-12   1.000000e+00  1.140191e+191
## Inf :   -4.124165e-18   1.000000e+00  1.725187e+191
## Inf :   -1.230131e-23   1.000000e+00  7.477957e+187
## Inf :    4.630765e-29   1.000000e+00 -4.844672e+182
## Inf :    1.029108e-34   1.000000e+00 -5.930485e+176
## Inf :   -8.193268e-41   1.000000e+00 -9.817949e+170
## Inf :    3.57422e-46   1.00000e+00 -4.68593e+165
## Inf :   -6.298201e-52   1.000000e+00  5.683653e+159
## Inf :   -1.400974e-58   1.000000e+00  1.519829e+153
## 5.237138e+306 :    1.516109e-64   1.000000e+00 -4.059431e+147
## 8.049291e+294 :    1.879589e-70   1.000000e+00 -5.786018e+141
## 8.158728e+281 :   5.984027e-77  1.000000e+00 9.876361e+134
## 6.112797e+269 :   -5.179678e-83   1.000000e+00  1.544031e+128
## 1.758065e+257 :   -2.777795e-89   1.000000e+00 -4.578569e+121
## 2.629482e+245 :    3.397179e-95   1.000000e+00 -7.810888e+116
## 6.718016e+232 :  -1.717140e-101   1.000000e+00  1.297536e+111
## 3.19294e+220 :   1.183798e-107   1.000000e+00 -1.497102e+104
## 3.154732e+208 :   1.176695e-113   1.000000e+00  -1.330234e+98
## 1.84454e+196 :  -8.997606e-120   1.000000e+00   2.229427e+92
## 1.162167e+184 :   7.141950e-126   1.000000e+00  -1.493661e+86
## 2.897853e+172 :  -1.127770e-131   1.000000e+00   3.355787e+80
## 3.083854e+160 :   1.163397e-137   1.000000e+00  -1.386364e+74
## 1.723466e+149 :  -2.750308e-143   1.000000e+00   4.070487e+68
## 3.748853e+137 :  -4.056295e-149   1.000000e+00   2.172452e+62
## 3.257501e+126 :  -1.195706e-154   1.000000e+00   1.971698e+57
## 7.245989e+112 :  1.783321e-161  1.000000e+00  2.175654e+50
## 1.088806e+102 :   6.912874e-167   1.000000e+00  -6.768699e+44
## 5.374782e+87 :  -4.856979e-174   1.000000e+00   3.299454e+38
## 1.550033e+77 :   2.608265e-179   1.000000e+00  -3.819806e+32
## 1.08776e+67 :   2.184999e-184   1.000000e+00  -2.566676e+27
## 4.289769e+55 :   4.339129e-190   1.000000e+00  -5.424943e+21
## 5.24377e+44 :   1.517082e-195   1.000000e+00  -2.179842e+16
## 4.231805e+34 :  -1.362844e-200   1.000000e+00   1.749396e+11
## 6.002008e+23 :   5.132513e-206   1.000000e+00  -8.630624e+05
## 1.370804e+13 :  -2.452835e-211   1.000000e+00   4.516087e+01
## 53921139 :  -4.880299e-214   9.999960e-01   4.274843e+01
## 7308222 :  -4.878564e-214   9.979991e-01   4.274841e+01
## 261382.2 :  -1.320396e-213   9.925925e-01   4.274847e+01
## 30622 :  -1.956274e-212   9.629962e-01   4.274864e+01
## Error in nls(y ~ a * exp(b * x) + c, data = example_data, start = list(a = 1, : singular gradient

The algorithm didn’t find a solution, so we need to provide better initial estimates. Looking at the graph, there is a kind of floor effect at high values of x, with the floor around y = 40, so we can guess c = 40. At x = 0 it looks like y ~= 60; substituting into the formula (60 = a * e^(b * 0) + 40) gives a = 20. We also know that y decreases as x increases, so b must be negative. Now we try these new guesses:

exp_model <- nls(y ~ a * exp(b * x) + c, data = example_data, start = list(a = 20, b = -1, c = 40), trace = TRUE)
## 33871.6 :  20 -1 40
## Error in numericDeriv(form[[3L]], names(ind), env): Missing value or an infinity produced when evaluating the model

We still get an error. If you plot the assumed formula with these start values in something like Desmos, the curve looks very different from ours: it is far too steep. So we try a smaller (in magnitude) decay rate b.

exp_model <- nls(y ~ a * exp(b * x) + c, data = example_data, start = list(a = 20, b = -0.1, c = 40), trace = TRUE)
## 28509.58 :  20.0 -0.1 40.0
## 18237.88 :  16.03433315 -0.02107305 40.33933094
## 14644.91 :  18.694970189 -0.008082207 39.578002795
## 12770.45 :  24.13680444 -0.01053179 38.13190398
## 12757.92 :  24.50718252 -0.01018671 37.94686590
## 12757.9 :  24.51211176 -0.01019428 37.94251313

The algorithm converged and estimated all three parameters.
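As an aside, the lower/upper bounds mentioned earlier are only honoured by the "port" algorithm. A minimal sketch (not needed here, since the unbounded fit already converges) that constrains b to stay negative:

```r
# Bounds require algorithm = "port"; names are matched to the start list.
exp_model_bounded <- nls(
  y ~ a * exp(b * x) + c,
  data      = example_data,
  start     = list(a = 20, b = -0.1, c = 40),
  lower     = c(a = 0,   b = -1, c = 0),
  upper     = c(a = Inf, b = 0,  c = Inf),
  algorithm = "port"
)
coef(exp_model_bounded)
```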

summary(exp_model)
## 
## Formula: y ~ a * exp(b * x) + c
## 
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## a 24.5121118  1.0353559   23.68   <2e-16 ***
## b -0.0101943  0.0009122  -11.18   <2e-16 ***
## c 37.9425131  0.4358450   87.06   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.067 on 497 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 4.204e-06
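Guessing start values by hand can also be avoided entirely with one of R's self-starting models. SSasymp() fits the equivalent asymptotic form y = Asym + (R0 - Asym) * e^(-e^(lrc) * x), so in our parameterisation a = R0 - Asym, b = -e^(lrc) and c = Asym. A sketch:

```r
# Self-starting model: nls() derives its own initial estimates.
ss_model <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = example_data)

# Translate back to the a, b, c parameterisation used above.
ss <- as.list(coef(ss_model))
c(a = ss$R0 - ss$Asym, b = -exp(ss$lrc), c = ss$Asym)
```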

Power fit

Same approach… We assume a power formula (y = a * x^b + c). I() is used defensively because ^ has a special meaning in standard model formulas (e.g. in lm()); wrapping it in I() makes sure it keeps its mathematical meaning (“to the power of”). It’s more difficult to guess the initial values here. In this formula c is the y-intercept (assuming b > 0), so we could first guess c = 60.

pow_model <- nls(y ~ a * I(x ^ b) + c, data = example_data, start = list(a = 1, b = 1, c = 60), trace = TRUE)
## 47045658 :   1  1 60
## 13660771 :  -0.4394207  1.0632696 62.1857775
## 1404461 :  -0.3111440  0.9602111 61.2684086
## 56642.72 :  -0.5453078  0.7097977 62.8097169
## 40995.61 :  -1.3097525  0.4571029 65.3570447
## 40419.79 :  -1.5095426  0.4364405 65.7210265
## 39805.65 :  -1.730165  0.416943 66.107746
## 39158 :  -1.9722521  0.3985661 66.5169332
## 38481.77 :  -2.2362889  0.3812621 66.9482454
## 37781.91 :  -2.522605  0.364981 67.401269
## 37063.21 :  -2.8313769  0.3496714 67.8755179
## 36949.46 :  -3.4938877  0.3208925 68.8653607
## 36558.56 :  -4.2511765  0.2952569 69.9399899
## 35927.19 :  -5.1018656  0.2725202 71.0949270
## 35097.08 :  -6.0420210  0.2524169 72.3240840
## 34111.25 :  -7.0654264  0.2346781 73.6199132
## 33011.16 :  -8.163955  0.219043 74.973632
## 32923.16 :  -10.4920403   0.1914916  77.7773735
## 31805.36 :  -13.0572668   0.1699929  80.7479656
## 30005.43 :  -15.7596580   0.1533636  83.7911990
## 29136.36 :  -21.2311190   0.1277113  89.8309839
## 27561.94 :  -31.60474981   0.09841312 100.94662285
## 13968.71 :  -43.82325007   0.08881615 113.58298352
## 13494.84 :  -42.09867723   0.09381536 111.73532198
## 13493.71 :  -43.24660976   0.09212234 112.94843444
## 13493.61 :  -42.92607262   0.09263826 112.60607165
## 13493.61 :  -43.03061663   0.09248217 112.71729425
## 13493.61 :  -42.99964620   0.09252935 112.68430621
## 13493.61 :  -43.00907681   0.09251508 112.69434756

This looks like it worked: the algorithm converged on a solution. We can verify and play around here.
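Before plotting, a quick numeric check: the trace output above already reports the residual sum of squares at each step, and resid() lets us recompute the final values directly (AIC() also works on nls fits). A sketch:

```r
# Residual sums of squares -- these should match the last trace lines
# above (~12758 for the exponential fit, ~13494 for the power fit).
sum(resid(exp_model)^2)
sum(resid(pow_model)^2)

# Information criteria tell the same story (lower is better).
AIC(exp_model, pow_model)
```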

Compare

Let’s plot the two models on top of our original graph. We can extract the coefficients of each model with coef(), converting to a list for easier access, and then use them to create a smooth line for both models.

exp_coeffs <- as.list(coef(exp_model))

exp_line <- tibble(
  model = "exp",
  x = 1:500,
  y = exp_coeffs$a * exp(exp_coeffs$b * x) + exp_coeffs$c
)

exp_coeffs
## $a
## [1] 24.51211
## 
## $b
## [1] -0.01019428
## 
## $c
## [1] 37.94251
pow_coeffs <- as.list(coef(pow_model))

pow_line <- tibble(
  model = "pow",
  x = 1:500,
  y = pow_coeffs$a * x ^ pow_coeffs$b + pow_coeffs$c
)

pow_coeffs
## $a
## [1] -43.00908
## 
## $b
## [1] 0.09251508
## 
## $c
## [1] 112.6943
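As an alternative to rebuilding each formula from its coefficients, predict() evaluates a fitted nls model directly; a sketch producing the same lines:

```r
# predict() with newdata evaluates the fitted curve on any grid of x.
grid <- tibble(x = 1:500)

exp_line2 <- tibble(model = "exp", x = grid$x,
                    y = predict(exp_model, newdata = grid))
pow_line2 <- tibble(model = "pow", x = grid$x,
                    y = predict(pow_model, newdata = grid))
```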

Add the lines to the plot

g + 
  geom_line(
    aes(x, y, color = model),
    data = bind_rows(exp_line, pow_line),
    size = 1.5,
    linetype = "dashed"
  )