## Thursday, April 12, 2012

### R: Multiple Linear Regression

In this post we will scratch the surface of R's power for statistical processing. In particular, we will estimate the value of a click for each keyword, knowing only the number of clicks per keyword and the total daily revenue.

In other words, we are estimating the vector B in the equation

$$Y = XB + U$$

where:

- Y is a vector of size [N] holding the revenue per day
- X is a matrix of size [N, M], where N is the number of rows (days) and M is the number of columns (keywords); it holds the number of clicks per keyword
- B is the vector of size [M] we are looking for; it gives the estimated value per click for each keyword
- U is the "residue" term, the part of the daily revenue not explained by the clicks (in the fit below it shows up as a constant intercept plus a per-day residual)
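For reference, with Y, X and B as defined above, the least-squares estimate of B can be written in closed form. This is a sketch that assumes ordinary least squares and an invertible $X^\top X$; the hat notation $\hat{B}$ is not from the original post.

$$\hat{B} = \arg\min_{B} \lVert Y - XB \rVert^2 = (X^\top X)^{-1} X^\top Y$$

If a constant term is fitted as well, X is first extended with a column of ones, and the first entry of $\hat{B}$ is then the intercept.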

For illustration purposes, let's assume the following matrix X:

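| date       | keyword1 | keyword2 | keyword3 |
|------------|----------|----------|----------|
| 2012-01-01 | 3        | 4        | 0        |
| 2012-01-02 | 3        | 4        | 5        |
| 2012-01-03 | 1        | 0        | 3        |
| 2012-01-04 | 0        | 2        | 1        |
| 2012-01-05 | 3        | 0        | 4        |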

and vector Y as:

| date       | revenue |
|------------|---------|
| 2012-01-01 | 12.56   |
| 2012-01-02 | 11      |
| 2012-01-03 | 5       |
| 2012-01-04 | 5       |
| 2012-01-05 | 6       |
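As an aside, the two tables above could also be entered as a single data frame, so that the dates and keyword names stay attached to the numbers. This is only a minimal sketch: the data frame name `clicks` is illustrative and not part of the original post, while the column names are copied from the tables.

```r
# Hypothetical data frame holding both tables; the name `clicks` is illustrative.
clicks <- data.frame(
  date     = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03",
                       "2012-01-04", "2012-01-05")),
  keyword1 = c(3, 3, 1, 0, 3),
  keyword2 = c(4, 4, 0, 2, 0),
  keyword3 = c(0, 5, 3, 1, 4),
  revenue  = c(12.56, 11, 5, 5, 6)
)

# Click counts as a numeric matrix: one row per day, one column per keyword.
X <- as.matrix(clicks[, c("keyword1", "keyword2", "keyword3")])

# Daily revenue as a plain numeric vector.
Y <- clicks$revenue

dim(X)     # 5 rows (days), 3 columns (keywords)
length(Y)  # 5 days
```

The matrix built this way holds the same values as the column-by-column `array(...)` call used in the post; it only adds column names.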

    # assign matrix X (values are filled column by column, one column per keyword)
    > X <- array(c(3,3,1,0,3, 4,4,0,2,0, 0,5,3,1,4), dim=c(5,3))

    # assign vector Y (revenue per day)
    > Y <- c(12.56, 11, 5, 5, 6)

    # fit the linear model
    > lmr <- lm(formula = Y ~ X)

    # review the results
    > summary(lmr)

    Call:
    lm(formula = Y ~ X)

    Residuals:
           1        2        3        4        5
     0.23747  0.05937  0.89050 -0.59367 -0.59367

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   3.5584     1.3074   2.722    0.224
    X1            1.3803     0.5343   2.583    0.235
    X2            1.1558     0.3660   3.158    0.195
    X3           -0.2764     0.3512  -0.787    0.576

    Residual standard error: 1.248 on 1 degrees of freedom
    Multiple R-squared: 0.9699, Adjusted R-squared: 0.8796
    F-statistic: 10.74 on 3 and 1 DF,  p-value: 0.2198
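As a cross-check on the coefficients reported above, the same numbers can be recomputed by hand from the normal equations. This is a minimal sketch, assuming ordinary least squares with an intercept column of ones; the names `X1` and `B_hat` are illustrative and not from the original post.

```r
# Same data as in the post.
X <- array(c(3,3,1,0,3, 4,4,0,2,0, 0,5,3,1,4), dim = c(5, 3))
Y <- c(12.56, 11, 5, 5, 6)
lmr <- lm(Y ~ X)

# Design matrix with a leading column of ones for the intercept (illustrative name).
X1 <- cbind(1, X)

# Solve the normal equations (X1' X1) B = X1' Y for B.
B_hat <- solve(t(X1) %*% X1, t(X1) %*% Y)

# Compare with lm(); TRUE when both agree up to numerical tolerance.
all.equal(as.vector(B_hat), unname(coef(lmr)))
```

`coef(lmr)` returns the estimates as a named vector, which is convenient when only the per-keyword values are needed rather than the full `summary()` printout.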

Fast and handy! Details on the output and on the linear regression methodology can be found in the references below.

[1] Wikipedia's entry on General Linear Regression
http://en.wikipedia.org/wiki/General_linear_model

[2] Using R for statistical analyses - Multiple Regression
http://www.gardenersown.co.uk/Education/Lectures/R/regression.htm#lr_models

[3] Overview of Multiple Linear Regression
http://online.stat.psu.edu/online/development/stat501/08multiple/07multiple_matrix.html