Thursday, April 12, 2012

R: Multiple Linear Regression

In this post we will scratch the surface of R's power for statistical processing. In particular, we shall estimate the value of a click for each keyword, knowing only the number of clicks and the total daily revenue.
In other words, we are estimating the vector B from equation [1]:

\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U},

Here, Y is a vector of size [N] holding the revenue per day
X is a matrix of size [N, M], where N is the number of rows (days) and M is the number of columns (keywords)
X holds the number of clicks per keyword per day
B is the vector we are looking for; it gives the estimated value per click for each keyword
B is a vector of size [M]
U is a vector of size [N] holding the "residual" (error) values, one per day
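
For reference, the least-squares estimate of B has a well-known closed form. The sketch below assumes ordinary least squares (the method that R's lm function implements) and that X^T X is invertible; the hats mark estimated quantities and are notation introduced here, not taken from [1].

```latex
\hat{\mathbf{B}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{Y},
\qquad
\hat{\mathbf{U}} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}}
```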

For illustration purposes, let's assume the following values for matrix X:

date keyword1 keyword2 keyword3
2012-01-01 3 4 0
2012-01-02 3 4 5
2012-01-03 1 0 3
2012-01-04 0 2 1
2012-01-05 3 0 4

and vector Y as:

date revenue
2012-01-01 12.56
2012-01-02 11
2012-01-03 5
2012-01-04 5
2012-01-05 6

# assigning matrix X (values are filled column by column, one column per keyword)
> X <- array(c(3,3,1,0,3, 4,4,0,2,0, 0,5,3,1,4), dim=c(5,3))


# assigning vector Y
> Y <- c(12.56, 11, 5, 5, 6)


# fit the linear model (lm adds an intercept term by default):
> lmr <- lm(formula=Y ~ X)
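
As an alternative, both tables can be kept in a single data frame so that the coefficients are labelled by keyword rather than as X1, X2, X3. This is only a sketch: the names clicks and fit are hypothetical and do not appear in the original session.

```r
# Hypothetical data frame combining the two tables from the post
clicks <- data.frame(
  date     = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03",
                       "2012-01-04", "2012-01-05")),
  keyword1 = c(3, 3, 1, 0, 3),
  keyword2 = c(4, 4, 0, 2, 0),
  keyword3 = c(0, 5, 3, 1, 4),
  revenue  = c(12.56, 11, 5, 5, 6)
)

# Same model as lm(Y ~ X), but each coefficient carries its keyword name
fit <- lm(revenue ~ keyword1 + keyword2 + keyword3, data = clicks)
coef(fit)
```

The date column is kept for readability only; it is not part of the model formula.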


# let's review the results:
> summary(lmr)


Call:
lm(formula = Y ~ X)

Residuals:
       1        2        3        4        5 
 0.23747  0.05937  0.89050 -0.59367 -0.59367 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.5584     1.3074   2.722    0.224
X1            1.3803     0.5343   2.583    0.235
X2            1.1558     0.3660   3.158    0.195
X3           -0.2764     0.3512  -0.787    0.576

Residual standard error: 1.248 on 1 degrees of freedom
Multiple R-squared: 0.9699, Adjusted R-squared: 0.8796 
F-statistic: 10.74 on 3 and 1 DF,  p-value: 0.2198
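
Note that the output contains an (Intercept) row that has no counterpart in the equation Y = XB + U: lm adds a constant term unless told otherwise. The sketch below shows one way to fit the model without it, using the standard "- 1" formula syntax; the variable names with_intercept and no_intercept are made up for this example.

```r
X <- array(c(3,3,1,0,3, 4,4,0,2,0, 0,5,3,1,4), dim = c(5, 3))
Y <- c(12.56, 11, 5, 5, 6)

# Default behaviour: lm() adds a constant column, hence the (Intercept) row
with_intercept <- lm(Y ~ X)

# "- 1" removes the intercept, so the fitted model is exactly Y = XB + U
no_intercept <- lm(Y ~ X - 1)

coef(with_intercept)         # four values: intercept plus one per keyword
coef(no_intercept)           # three values: one per keyword
df.residual(with_intercept)  # 5 observations - 4 coefficients
df.residual(no_intercept)    # 5 observations - 3 coefficients
```

Whether the intercept belongs in the model is a modelling decision: dropping it assumes that a day with zero clicks brings zero revenue.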

Fast and handy! Details on the output and on the linear regression methodology can be found in [2] and [3].
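
As a cross-check of the methodology, the estimates can be reproduced by hand from the normal equations. This is a minimal sketch, assuming ordinary least squares with an intercept column; Xd and B_hat are names chosen for this example, and lm itself relies on a QR decomposition rather than on this direct solve.

```r
X <- array(c(3,3,1,0,3, 4,4,0,2,0, 0,5,3,1,4), dim = c(5, 3))
Y <- c(12.56, 11, 5, 5, 6)

# Design matrix: a column of ones (the intercept) followed by the click counts
Xd <- cbind(1, X)

# Solve the normal equations (Xd' Xd) B = Xd' Y for B
B_hat <- solve(t(Xd) %*% Xd, t(Xd) %*% Y)

# Residuals: observed revenue minus fitted revenue
U_hat <- Y - Xd %*% B_hat

# Compare with lm(); the two should agree up to floating-point error
all.equal(as.vector(B_hat), unname(coef(lm(Y ~ X))))
```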

[1] Wikipedia's entry on the General Linear Model
http://en.wikipedia.org/wiki/General_linear_model

[2] Using R for statistical analyses - Multiple Regression
http://www.gardenersown.co.uk/Education/Lectures/R/regression.htm#lr_models

[3] Overview of Multiple Linear Regression
http://online.stat.psu.edu/online/development/stat501/08multiple/07multiple_matrix.html
