nonLinearRegression.R

Plain text source: nonLinearRegression.R


# -*- Mode:R; Coding:utf-8; fill-column:160 -*-

################################################################################################################################################################
# @file      nonLinearRegression.R
# @author    Mitch Richling <https://www.mitchr.me>
# @Copyright Copyright 2015 by Mitch Richling.  All rights reserved.
# @brief     @EOL
# @Keywords  r base nonlinear regression multiple
#
# The topic of this example is non-linear regression; however, as most of what one might do with a non-linear model is identical to what one might do with a
# linear model, the focus is really on how to fit the data.  For the other stuff see linearRegression.R and polynomialRegression.R.  In addition to fitting the
# model, we demonstrate how to generate random data that follows a predetermined model and then how to compare our fitted model to that original model -- just
# for fun.
#


################################################################################################################################################################
## Make up some data: y=sin(x)+e where e is random (normal or uniform)

numPts   <- 100
daDat    <- data.frame(x=1:numPts/(numPts/20*pi))    # x-data
daDat$f  <- sin(daDat$x)                  # Function
daDat$e  <- rnorm(daDat$x, mean=0, sd=.5) # Identically distributed Normal Error
#daDat$e <- runif(daDat$x, -1, 1)         # Identically distributed Uniform Error
#daDat$e <- runif(daDat$x, 0, 1)          # Identically distributed, but asymmetric, Uniform Error
#daDat$e <- daDat$x*rnorm(daDat$x, sd=.5) # Non-Identically distributed Normal Error
daDat$y  <- daDat$f + daDat$e             # y-data

################################################################################################################################################################
# Compute the non-linear model with independent variable of x, dependent variable of y, and function of a*sin(b*x+c)+d

daFit   <- nls(y~a*sin(b*x+c)+d,           # Specify our model formula
               data=daDat,                 # Always use data= to simplify the formula argument, and future predict calls
               start=list(a=1,b=1,c=0,d=0) # Initial conditions (we set them to the true model values)
               )

summary(daFit)
Formula: y ~ a * sin(b * x + c) + d

Parameters:
  Estimate Std. Error t value Pr(>|t|)    
a  1.07370    0.07131  15.057   <2e-16 ***
b  0.99692    0.03332  29.922   <2e-16 ***
c  0.03616    0.12741   0.284    0.777    
d -0.02121    0.04876  -0.435    0.665    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4863 on 96 degrees of freedom

Number of iterations to convergence: 3 
Achieved convergence tolerance: 9.034e-06
################################################################################################################################################################
# This plot is not terribly useful for the practicing data modeler as one wouldn't be modeling the data in the first place if the true model from which the data
# was generated available!  That said, it is an interesting way to explore how pushing the envelope of the various theoretical requirements impacts the accuracy
# of the fit (for example, try adding one sided, positive errors or non-normal ones).


daDat$fit <- fitted(daFit)                                # Put the fitted valeus in the data.frame so ggplot is simple to use

ggplot(daDat) +
  geom_ribbon(aes(x=x,
                  ymin=pmin(daDat$f, daDat$fit),
                  ymax=pmax(daDat$f, daDat$fit)),
              fill='pink') +
  geom_line(aes(x=x, y=f,   col='function')) +           # Note col is an aes -- so we get a legend
  geom_point(aes(x=x, y=y,  col='data')) +
  geom_line(aes(x=x, y=fit, col='fit')) +
  labs(title='Fit vs Actual',
       x='x',
       y='y') +
  scale_color_manual(values=c("black",     "red", "blue"))
plot of chunk auto-report
################################################################################################################################################################
# How we might do it with base R

par(mar=c(8,4,4,2)+0.1)
plot(0, 0, col='NA',                                                                       # Setup the plot window
     xlim=range(daDat$x),                                                                    # Set the x-range
     ylim=range(daDat$y, daDat$f, fitted(daFit)),                                            # Set the y-range
     main='Fit vs Actual', xlab='x', ylab='y')                                               # Set the plot titles
polygon(c(daDat$x, rev(daDat$x)), c(daDat$f, rev(fitted(daFit))), col='pink', border=NA)   # Fill in area between fit and Actual
points(daDat$x, daDat$y, type='p')                                                         # Draw data (with random noise)
points(daDat$x, daDat$f,       type='l', col='green')                                      # Draw Actual
points(daDat$x, fitted(daFit), type='l', col='red')                                        # Draw Fit
plot of chunk auto-report

The R session information (including the OS info, R version and all packages used):

    options(width=80)
    sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] graphics  grDevices datasets  utils     grid      stats     base     

other attached packages:
[1] RColorBrewer_1.1-2 reshape2_1.4.1     ggplot2_2.1.0      dplyr_0.4.3       
[5] data.table_1.9.6   gridExtra_2.2.1    knitr_1.13         lattice_0.20-33   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5      magrittr_1.5     munsell_0.4.3    colorspace_1.2-6
 [5] R6_2.1.2         stringr_1.0.0    highr_0.6        plyr_1.8.3      
 [9] tools_3.3.0      parallel_3.3.0   gtable_0.2.0     DBI_0.4-1       
[13] assertthat_0.1   digest_0.6.9     formatR_1.4      evaluate_0.9    
[17] labeling_0.3     stringi_1.0-1    methods_3.3.0    scales_0.4.0    
[21] chron_2.3-47    
    Sys.time()
[1] "2016-07-09 20:07:41 CDT"