Feature Selection and LASSO Regression

In this lesson, we will use LASSO to select features, building on a pre-implemented LASSO solver.

Learning objectives:

  • Run LASSO with different L1 penalties.
  • Choose the best L1 penalty using a validation set.
  • Choose the best L1 penalty using a validation set, with an additional constraint on the size of the feature subset.

In addition, we will implement our own LASSO solver using coordinate descent.

We will continue to use the house sales data from previous notebooks.

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model
from math import sqrt, log
from sklearn.metrics import mean_squared_error
In [2]:
### Importing data:
dtype_dict = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 'sqft_living15':float, 'grade':int, 
              'yr_renovated':int, 'price':float, 'bedrooms':float, 'zipcode':str, 'long':float, 'sqft_lot15':float, 
              'sqft_living':float, 'floors':float, 'condition':int, 'lat':float, 'date':str, 'sqft_basement':int, 
              'yr_built':int, 'id':str, 'sqft_lot':int, 'view':int}
sales = pd.read_csv('C:/Users/Duy Tung/Downloads/kc_house_data.csv', dtype= dtype_dict)
sales = pd.DataFrame(sales)
sales.head()
Out[2]:
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront view ... grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15
0 7129300520 20141013T000000 221900.0 3.0 1.00 1180.0 5650 1.0 0 0 ... 7 1180 0 1955 0 98178 47.5112 -122.257 1340.0 5650.0
1 6414100192 20141209T000000 538000.0 3.0 2.25 2570.0 7242 2.0 0 0 ... 7 2170 400 1951 1991 98125 47.7210 -122.319 1690.0 7639.0
2 5631500400 20150225T000000 180000.0 2.0 1.00 770.0 10000 1.0 0 0 ... 6 770 0 1933 0 98028 47.7379 -122.233 2720.0 8062.0
3 2487200875 20141209T000000 604000.0 4.0 3.00 1960.0 5000 1.0 0 0 ... 7 1050 910 1965 0 98136 47.5208 -122.393 1360.0 5000.0
4 1954400510 20150218T000000 510000.0 3.0 2.00 1680.0 8080 1.0 0 0 ... 8 1680 0 1987 0 98074 47.6168 -122.045 1800.0 7503.0

5 rows × 21 columns

In [3]:
sales['sqft_living_sqrt'] = sales['sqft_living'].apply(sqrt)
sales['sqft_lot_sqrt'] = sales['sqft_lot'].apply(sqrt)
sales['bedrooms_square'] = sales['bedrooms']*sales['bedrooms']
sales['floors_square'] = sales['floors']*sales['floors']

1. Data Set-up

We consider features that are transformations of the input variables:

  • Squaring bedrooms will increase the separation between not many bedrooms (e.g. 1) and lots of bedrooms (e.g. 4), since 1^2 = 1 but 4^2 = 16. Consequently this variable will mostly affect houses with many bedrooms.
  • On the other hand, taking the square root of sqft_living will decrease the separation between a big house and a small house. The owner may not be exactly twice as happy about getting a house that is twice as big.
In [4]:
sales['sqft_living_sqrt'] = sales['sqft_living'].apply(sqrt)
sales['sqft_lot_sqrt'] = sales['sqft_lot'].apply(sqrt)
sales['bedrooms_square'] = sales['bedrooms']*sales['bedrooms']
sales['floors_square'] = sales['floors']*sales['floors']

2. Learn Regression Weights with L1 Penalty

Let us fit a model with all the features available, plus the features we just created above.

In [5]:
all_features = ['bedrooms', 'bedrooms_square', 'bathrooms','sqft_living', 'sqft_living_sqrt',
            'sqft_lot', 'sqft_lot_sqrt', 'floors', 'floors_square', 'waterfront', 'view',
            'condition', 'grade', 'sqft_above', 'sqft_basement','yr_built', 'yr_renovated']
In [6]:
# note: the normalize parameter was deprecated and later removed in newer scikit-learn versions;
# there, the features would have to be rescaled manually before fitting
model_lasso = linear_model.Lasso(alpha = 5e2, normalize = True)
model_all = model_lasso.fit(sales[all_features], sales['price'])
In [7]:
### Find what features had non-zero weight.
coef_model_all1 = {}
for coef, feat in zip(model_all.coef_,all_features):
    coef_model_all1[feat] = coef
coef_model_all1
Out[7]:
{'bedrooms': 0.0,
 'bedrooms_square': 0.0,
 'bathrooms': 0.0,
 'sqft_living': 134.43931395541435,
 'sqft_living_sqrt': 0.0,
 'sqft_lot': 0.0,
 'sqft_lot_sqrt': 0.0,
 'floors': 0.0,
 'floors_square': 0.0,
 'waterfront': 0.0,
 'view': 24750.004585609502,
 'condition': 0.0,
 'grade': 61749.10309070813,
 'sqft_above': 0.0,
 'sqft_basement': 0.0,
 'yr_built': -0.0,
 'yr_renovated': 0.0}
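
With this penalty, only three features receive non-zero weights: sqft_living, view, and grade; all other coefficients are driven to zero.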

3. Selecting an L1 penalty

To find a good L1 penalty, we will explore multiple values using a validation set. Let us do a three-way split into train, validation, and test sets:

  • Split our sales data into 2 sets: training and test.
  • Further split our training data into two sets: train and validation. (The pre-split files are loaded below; a from-scratch alternative is sketched after this list.)
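
In this notebook the three sets come from pre-split CSV files (loaded in the next cell). If only the full sales table were available, a similar split could be produced directly; a minimal sketch, assuming scikit-learn's train_test_split, with illustrative split fractions and placeholder variable names:

from sklearn.model_selection import train_test_split

# illustrative 80/20 split into (train+validation) and test, then 50/50 into train and validation
train_valid, testing_df = train_test_split(sales, test_size=0.2, random_state=0)
training_df, validation_df = train_test_split(train_valid, test_size=0.5, random_state=0)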
In [8]:
training = pd.read_csv('C:/Users/Duy Tung/Downloads/wk3_kc_house_train_data.csv', dtype = dtype_dict)
validation = pd.read_csv('C:/Users/Duy Tung/Downloads/wk3_kc_house_valid_data.csv', dtype = dtype_dict)
testing = pd.read_csv('C:/Users/Duy Tung/Downloads/wk3_kc_house_test_data.csv', dtype = dtype_dict)
3.1 Selecting an L1 penalty

Next, we write a loop that does the following:

  • For l1_penalty in [10^1, 10^1.5, 10^2, 10^2.5, ..., 10^7] (to get this in Python, use np.logspace(1, 7, num=13)).
  • Fit a regression model with the given l1_penalty on the TRAIN data; with scikit-learn, pass alpha=l1_penalty (and normalize=True) in the parameter list.
  • Compute the RSS on the VALIDATION data (here you will want to use .predict()) for that l1_penalty.
  • Report which l1_penalty produced the lowest RSS on the validation data.
In [9]:
testing['sqft_living_sqrt'] = testing['sqft_living'].apply(sqrt)
testing['sqft_lot_sqrt'] = testing['sqft_lot'].apply(sqrt)
testing['bedrooms_square'] = testing['bedrooms']*testing['bedrooms']
testing['floors_square'] = testing['floors']*testing['floors']

training['sqft_living_sqrt'] = training['sqft_living'].apply(sqrt)
training['sqft_lot_sqrt'] = training['sqft_lot'].apply(sqrt)
training['bedrooms_square'] = training['bedrooms']*training['bedrooms']
training['floors_square'] = training['floors']*training['floors']

validation['sqft_living_sqrt'] = validation['sqft_living'].apply(sqrt)
validation['sqft_lot_sqrt'] = validation['sqft_lot'].apply(sqrt)
validation['bedrooms_square'] = validation['bedrooms']*validation['bedrooms']
validation['floors_square'] = validation['floors']*validation['floors']
In [10]:
l1_penalty = np.logspace(1, 7, num=13)
In [11]:
for alpha in (l1_penalty):
    print('Alpha', alpha) 
    model_lasso2 = linear_model.Lasso(alpha, normalize=True)
    model2 = model_lasso2.fit(training[all_features], training['price'])
    print('The number of nonzero coefficients:',np.count_nonzero(model2.coef_) + np.count_nonzero(model2.intercept_))
    # note: mean_squared_error returns the MSE (RSS divided by the number of rows);
    # since each set has a fixed size, minimizing MSE is equivalent to minimizing RSS
    rss_val = mean_squared_error(validation['price'], model2.predict(validation[all_features]))
    print('RSS for validation data:', rss_val)
    rss_test = mean_squared_error(testing['price'], model2.predict(testing[all_features]))
    print('RSS for testing data:', rss_test)
    print('-------------------------------------')
Alpha 10.0
The number of nonzero coefficients: 15
RSS for validation data: 41329873098.09392
RSS for testing data: 44414705707.12618
-------------------------------------
Alpha 31.622776601683793
The number of nonzero coefficients: 15
RSS for validation data: 41415869253.07181
RSS for testing data: 45005110231.37096
-------------------------------------
Alpha 100.0
The number of nonzero coefficients: 11
RSS for validation data: 44607327874.68185
RSS for testing data: 48272769559.57964
-------------------------------------
Alpha 316.22776601683796
The number of nonzero coefficients: 6
RSS for validation data: 48130755687.090935
RSS for testing data: 51517241147.93143
-------------------------------------
Alpha 1000.0
The number of nonzero coefficients: 4
RSS for validation data: 67036713402.57403
RSS for testing data: 68423119736.932465
-------------------------------------
Alpha 3162.2776601683795
The number of nonzero coefficients: 1
RSS for validation data: 126881874356.73721
RSS for testing data: 128425315836.65945
-------------------------------------
Alpha 10000.0
The number of nonzero coefficients: 1
RSS for validation data: 126881874356.73721
RSS for testing data: 128425315836.65945
-------------------------------------
Alpha 31622.776601683792
The number of nonzero coefficients: 1
RSS for validation data: 126881874356.73721
RSS for testing data: 128425315836.65945
-------------------------------------
Alpha 100000.0
The number of nonzero coefficients: 1
RSS for validation data: 126881874356.73721
RSS for testing data: 128425315836.65945
-------------------------------------
Alpha 316227.7660168379
The number of nonzero coefficients: 1
RSS for validation data: 126881874356.73721
RSS for testing data: 128425315836.65945
-------------------------------------
Alpha 1000000.0
The number of nonzero coefficients: 1
RSS for validation data: 126881874356.73721
RSS for testing data: 128425315836.65945
-------------------------------------
Alpha 3162277.6601683795
The number of nonzero coefficients: 1
RSS for validation data: 126881874356.73721
RSS for testing data: 128425315836.65945
-------------------------------------
Alpha 10000000.0
The number of nonzero coefficients: 1
RSS for validation data: 126881874356.73721
RSS for testing data: 128425315836.65945
-------------------------------------
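
Among the penalties tried, the smallest one, l1_penalty = 10, gives the lowest error on the validation set (about 4.13e10, with 15 non-zero coefficients), so it is the value this search would select.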
3.2 Limit the number of nonzero weights

What if we absolutely wanted to limit ourselves to, say, 7 features? This may be important if we want to derive a "rule of thumb" --- an interpretable model that has only a few features in it.

In this section, we are going to implement a simple, two-phase procedure to achieve this goal:

  • Explore a large range of l1_penalty values to find a narrow region of l1_penalty values where models are likely to have the desired number of non-zero weights.
  • Further explore the narrow region you found to find a good value for l1_penalty that achieves the desired sparsity. Here, we will again use a validation set to choose the best value for l1_penalty.

In [12]:
max_nonzeros = 7
l1_penalty = np.logspace(1, 4, num=20)

Extract the weights of the model and count the number of nonzeros. Save the number of nonzeros to a list.

In [13]:
non_zeros = []
for alpha in l1_penalty:
    print('Alpha:', alpha)
    model_lasso3 = linear_model.Lasso(alpha=alpha, normalize=True)
    model3 = model_lasso3.fit(training[all_features], training['price'])
    non_zeros.append(np.count_nonzero(model3.coef_)+np.count_nonzero(model3.intercept_))
    print('The number of nonzero coefs:',np.count_nonzero(model3.coef_)+np.count_nonzero(model3.intercept_))
    print('-----------------------------------')
Alpha: 10.0
The number of nonzero coefs: 15
-----------------------------------
Alpha: 14.38449888287663
The number of nonzero coefs: 15
-----------------------------------
Alpha: 20.6913808111479
The number of nonzero coefs: 15
-----------------------------------
Alpha: 29.76351441631318
The number of nonzero coefs: 15
-----------------------------------
Alpha: 42.81332398719393
The number of nonzero coefs: 13
-----------------------------------
Alpha: 61.58482110660264
The number of nonzero coefs: 12
-----------------------------------
Alpha: 88.58667904100822
The number of nonzero coefs: 11
-----------------------------------
Alpha: 127.42749857031335
The number of nonzero coefs: 10
-----------------------------------
Alpha: 183.29807108324357
The number of nonzero coefs: 7
-----------------------------------
Alpha: 263.6650898730358
The number of nonzero coefs: 6
-----------------------------------
Alpha: 379.26901907322497
The number of nonzero coefs: 6
-----------------------------------
Alpha: 545.5594781168514
The number of nonzero coefs: 6
-----------------------------------
Alpha: 784.7599703514607
The number of nonzero coefs: 5
-----------------------------------
Alpha: 1128.8378916846884
The number of nonzero coefs: 3
-----------------------------------
Alpha: 1623.776739188721
The number of nonzero coefs: 3
-----------------------------------
Alpha: 2335.7214690901214
The number of nonzero coefs: 2
-----------------------------------
Alpha: 3359.818286283781
The number of nonzero coefs: 1
-----------------------------------
Alpha: 4832.930238571752
The number of nonzero coefs: 1
-----------------------------------
Alpha: 6951.927961775606
The number of nonzero coefs: 1
-----------------------------------
Alpha: 10000.0
The number of nonzero coefs: 1
-----------------------------------
In [14]:
alpha_nonzeros = {}
for alpha, nonzero in zip(l1_penalty,non_zeros):
    alpha_nonzeros[alpha] = nonzero
alpha_nonzeros
Out[14]:
{10.0: 15,
 14.38449888287663: 15,
 20.6913808111479: 15,
 29.76351441631318: 15,
 42.81332398719393: 13,
 61.58482110660264: 12,
 88.58667904100822: 11,
 127.42749857031335: 10,
 183.29807108324357: 7,
 263.6650898730358: 6,
 379.26901907322497: 6,
 545.5594781168514: 6,
 784.7599703514607: 5,
 1128.8378916846884: 3,
 1623.776739188721: 3,
 2335.7214690901214: 2,
 3359.818286283781: 1,
 4832.930238571752: 1,
 6951.927961775606: 1,
 10000.0: 1}

What values did you find for l1_penalty_min and l1_penalty_max?

In [15]:
key1 = min([val for val in alpha_nonzeros.values() if val > max_nonzeros])
l1_penalty_min = max([k for k, v in alpha_nonzeros.items() if v == key1])
print('l1_penalty_min:',l1_penalty_min)

key2 = max([val for val in alpha_nonzeros.values() if val < max_nonzeros])
l1_penalty_max = min(k for k, v in alpha_nonzeros.items() if v == key2)
print('l1_penalty_max:',l1_penalty_max)
l1_penalty_min: 127.42749857031335
l1_penalty_max: 263.6650898730358
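
An equivalent, more direct way to read the same bounds off alpha_nonzeros (a sketch; it yields the same values on this data):

l1_penalty_min = max(a for a, nz in alpha_nonzeros.items() if nz > max_nonzeros)
l1_penalty_max = min(a for a, nz in alpha_nonzeros.items() if nz < max_nonzeros)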

Out of this large range, we want to find the two ends of our desired narrow range of l1_penalty. At one end, we will have l1_penalty values that have too few non-zeros, and at the other end, we will have an l1_penalty that has too many non-zeros.

More formally, find:

  • The largest l1_penalty that has more non-zeros than max_nonzeros (if we pick a penalty smaller than this value, we will definitely have too many non-zero weights). Store this value in the variable l1_penalty_min (we will use it later).
  • The smallest l1_penalty that has fewer non-zeros than max_nonzeros (if we pick a penalty larger than this value, we will definitely have too few non-zero weights). Store this value in the variable l1_penalty_max (we will use it later).
3.3 Exploring a narrow range

We will now explore the narrow region of l1_penalty values we found. We want the solution with the right number of non-zeros and the lowest RSS on the validation set.

In [16]:
alp_list = []
rss_list = []
for l1_penalty in np.linspace(l1_penalty_min,l1_penalty_max,20):
    model_lasso4 = linear_model.Lasso(alpha=l1_penalty, normalize=True)
    model4 = model_lasso4.fit(training[all_features], training['price'])
    nonzeros = np.count_nonzero(model4.coef_)+np.count_nonzero(model4.intercept_)
    rss = mean_squared_error(validation['price'], model4.predict(validation[all_features]))
    if nonzeros == max_nonzeros:
        print('Alpha:', l1_penalty)
        print('The number of nonzero coefs:',nonzeros)
        print('RSS for Validation data:', rss)
        print('-----------------------------------')
        alp_list.append(l1_penalty)
        rss_list.append(rss)
Alpha: 156.10909673930755
The number of nonzero coefs: 7
RSS for Validation data: 45670717723.22954
-----------------------------------
Alpha: 163.2794962815561
The number of nonzero coefs: 7
RSS for Validation data: 45747533953.46188
-----------------------------------
Alpha: 170.44989582380464
The number of nonzero coefs: 7
RSS for Validation data: 45829444534.52412
-----------------------------------
Alpha: 177.6202953660532
The number of nonzero coefs: 7
RSS for Validation data: 45916597113.50973
-----------------------------------
Alpha: 184.79069490830176
The number of nonzero coefs: 7
RSS for Validation data: 46009000194.53192
-----------------------------------
Alpha: 191.96109445055032
The number of nonzero coefs: 7
RSS for Validation data: 46106879141.27054
-----------------------------------
Alpha: 199.13149399279888
The number of nonzero coefs: 7
RSS for Validation data: 46209729096.275475
-----------------------------------
In [17]:
alp_rss = {}
for alp, rss in zip(alp_list, rss_list):
    alp_rss[alp] = rss
alp_rss
Out[17]:
{156.10909673930755: 45670717723.22954,
 163.2794962815561: 45747533953.46188,
 170.44989582380464: 45829444534.52412,
 177.6202953660532: 45916597113.50973,
 184.79069490830176: 46009000194.53192,
 191.96109445055032: 46106879141.27054,
 199.13149399279888: 46209729096.275475}

What value of l1_penalty in our narrow range has the lowest RSS on the VALIDATION set and has sparsity equal to max_nonzeros?

In [18]:
alpha_for_min_rss = min(alp_rss.keys(), key = lambda i: alp_rss[i])
print('Alpha for min RSS:',alpha_for_min_rss)
Alpha for min RSS: 156.10909673930755

What features in this model have non-zero coefficients?

In [19]:
model_lasso5 = linear_model.Lasso(alpha= alpha_for_min_rss, normalize = True)
model5 = model_lasso5.fit(training[all_features], training['price'])
model5_coef = {}
for feat, coef in zip(all_features, model5.coef_):
    model5_coef[feat] = coef
model5_coef
Out[19]:
{'bedrooms': -0.0,
 'bedrooms_square': -0.0,
 'bathrooms': 10610.890284398276,
 'sqft_living': 163.38025164762894,
 'sqft_living_sqrt': 0.0,
 'sqft_lot': -0.0,
 'sqft_lot_sqrt': -0.0,
 'floors': 0.0,
 'floors_square': 0.0,
 'waterfront': 506451.68711484905,
 'view': 41960.04355485288,
 'condition': 0.0,
 'grade': 116253.5536997075,
 'sqft_above': 0.0,
 'sqft_basement': 0.0,
 'yr_built': -2612.2348803574882,
 'yr_renovated': 0.0}

4. LASSO Regression with Coordinate Descent

We will now implement our very own LASSO solver via coordinate descent: writing a function to normalize features, implementing the coordinate descent step for LASSO, and exploring the effects of the L1 penalty.

In [20]:
### The get_numpy_data function
def get_numpy_data(data_sframe, features, output):
    features_matrix = data_sframe[features]
    features_matrix['constant'] = 1              # add an intercept column of 1s
    features.insert(0, 'constant')               # note: mutates the caller's feature list in place (later cells rely on this)
    features_matrix = features_matrix[features]  # reorder columns so 'constant' comes first
    features_matrix = np.array(features_matrix)
    output_array = np.array(data_sframe[output])
    return (features_matrix, output_array)
In [21]:
### The predict_output function
def predict_output(feature_matrix, weights):
    # row-wise dot product; equivalent to np.dot(feature_matrix, weights)
    predictions = (feature_matrix * weights).sum(axis=1)
    return predictions

In the house dataset, features vary wildly in their relative magnitude: sqft_living is very large overall compared to bedrooms, for instance. As a result, the weight for sqft_living would be much smaller than the weight for bedrooms. This is problematic because "small" weights are dropped first as l1_penalty goes up.

To give equal considerations for all features, we need to normalize features as discussed in the lectures: we divide each feature by its 2-norm so that the transformed feature has norm 1.

Let's see how we can do this normalization easily with NumPy: we first define a helper function, then check it on a small matrix (see the example after the next cell).

In [22]:
def normalize_features(features_matrix):
    norms = np.linalg.norm(features_matrix, axis=0)
    normalized_features = features_matrix / norms
    return (normalized_features, norms)
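
As a quick check of the helper, here it is applied to a small matrix (the numbers are arbitrary, chosen for illustration):

small = np.array([[3., 6., 9.],
                  [4., 8., 12.]])
small_normalized, small_norms = normalize_features(small)
print(small_norms)        # [ 5. 10. 15.]
print(small_normalized)   # every column of the result has unit 2-norm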
4.1 Implementing Coordinate Descent with normalized features
In [23]:
feature, price = get_numpy_data(sales, ['sqft_living', 'bedrooms'], 'price')
In [24]:
normalized_features, norms = normalize_features(feature)
In [25]:
ro = [0, 0, 0]
w = [1, 4, 1]
# ro[i] = sum_j( feature[j,i] * (output[j] - prediction[j] + w[i]*feature[j,i]) )
for i in range(normalized_features.shape[1]):
    prediction = predict_output(normalized_features, w)
    error = price - prediction + normalized_features[:,i]*w[i]
    ro[i] = np.dot(normalized_features[:,i], error.transpose())
    print(ro[i])
print(ro)
79400300.01452291
87939470.82325175
80966698.66623949
[79400300.01452291, 87939470.82325175, 80966698.66623949]
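
Recall from the soft-thresholding rule used in the coordinate descent step (section 4.2 below) that weight i is set to exactly zero whenever |ro[i]| <= l1_penalty/2, i.e. whenever l1_penalty >= 2*|ro[i]|. Any l1_penalty between 2*ro[2] and 2*ro[1] therefore sets w[2] to zero while keeping w[1] non-zero, which is the range printed below.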
In [26]:
print('The range of values of l1_penalty: between', 80966698.66623949*2,'and', 87939470.82325175*2)
The range of values of l1_penalty: between 161933397.33247897 and 175878941.6465035
4.2 Single Coordinate Descent Step

Using the soft-thresholding formula (restated below), implement a coordinate descent step that minimizes the cost function over a single feature i. Note that the intercept (weight 0) is not regularized. The function should accept a feature matrix, the output, the current weights, the l1 penalty, and the index of the feature to optimize over, and should return the new weight for feature i.
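
For reference, the update implemented below first computes ro[i] = sum_j( feature[j,i] * (output[j] - prediction[j] + weights[i]*feature[j,i]) ) and then sets the new weight to ro[i] + l1_penalty/2 if ro[i] < -l1_penalty/2, to 0 if -l1_penalty/2 <= ro[i] <= l1_penalty/2, and to ro[i] - l1_penalty/2 if ro[i] > l1_penalty/2; the intercept (i = 0) is simply set to ro[0].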

In [27]:
def zerolistmaker(n):
    # helper: a list of n zeros
    return [0] * n

def lasso_coordinate_descent_step(i, feature_matrix, output, weights, l1_penalty):
    # ro_i = sum_j( feature[j,i] * (output[j] - prediction[j] + weights[i]*feature[j,i]) )
    prediction = predict_output(feature_matrix, weights)
    error = output - prediction + feature_matrix[:,i]*weights[i]
    ro_i = np.dot(feature_matrix[:,i], error.transpose())

    # soft-thresholding; the intercept (i == 0) is not regularized
    if i == 0:
        new_weight_i = ro_i
    elif ro_i < -l1_penalty/2.:
        new_weight_i = ro_i + l1_penalty/2.
    elif ro_i > l1_penalty/2.:
        new_weight_i = ro_i - l1_penalty/2.
    else:
        new_weight_i = 0.
    return new_weight_i
In [28]:
# should print 0.425558846691
import math
print (lasso_coordinate_descent_step(1, np.array([[3./math.sqrt(13),1./math.sqrt(10)],
                   [2./math.sqrt(13),3./math.sqrt(10)]]), np.array([1., 1.]), np.array([1., 4.]), 0.1))
0.4255588466910251
4.3 Cyclical coordinate descent

Now that we have a function that optimizes the cost function over a single coordinate, let us implement cyclical coordinate descent where we optimize coordinates 0, 1, ..., (d-1) in order and repeat.

In [29]:
def lasso_cyclical_coordinate_descent(normalized_features, output, initial_weights, l1_penalty, tolerance):

    weights = list(initial_weights)   # work on a copy so the caller's list is left untouched
    weights_change = zerolistmaker(normalized_features.shape[1])
    converged = False

    while not converged:
        # one full pass over coordinates 0, 1, ..., d-1
        for i in range(len(weights)):
            weights_new = lasso_coordinate_descent_step(i, normalized_features, output, weights, l1_penalty)
            weights_change[i] = np.abs(weights_new - weights[i])
            weights[i] = weights_new

        # converged once no coordinate moved by more than the tolerance in a full pass
        if max(weights_change) < tolerance:
            converged = True

    return weights
In [30]:
feature, price = get_numpy_data(sales, ['sqft_living', 'bedrooms'], 'price')
# normalized_features and norms for this feature matrix were already computed in In [24]

initial_weights = [0., 0., 0.]
l1_penalty = 1e7
tolerance = 1.0

w = lasso_cyclical_coordinate_descent(normalized_features, price, initial_weights, l1_penalty, tolerance)
print(w)
[21624997.959519118, 63157247.20788954, 0.0]
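
As a quick sanity check (a sketch; the numeric result is not reproduced here), the RSS of this fit on the normalized sales data can be computed with the helpers defined above:

prediction = predict_output(normalized_features, w)
rss = np.sum((price - prediction) ** 2)
print('RSS on the normalized sales data:', rss)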
4.4 Evaluating LASSO fit with more features

Let us now use the training and test splits of the house dataset, and create a normalized feature matrix from the training data with more features.

In [31]:
training = pd.read_csv('C:/Users/Duy Tung/Downloads/kc_house_train_data.csv', dtype = dtype_dict)
testing = pd.read_csv('C:/Users/Duy Tung/Downloads/kc_house_test_data.csv', dtype = dtype_dict)
### Let us consider the following set of features.
more_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'waterfront', 
                'view', 'condition', 'grade', 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated']

First, create a normalized feature matrix from the TRAINING data with these features. (Make sure you store the norms from the normalization, since we'll use them later.)

In [32]:
feature1, price1 = get_numpy_data(training, more_features, 'price')
normalized_features1, norms1 = normalize_features(feature1)
In [33]:
### l1_penalty=1e7

# note: get_numpy_data prepended 'constant' to more_features, so this includes a slot for the intercept
initial_weights1 = [0.]*len(more_features)
l1_penalty = 1e7
tolerance = 1.0

weights1e7 = lasso_cyclical_coordinate_descent(normalized_features1, price1, initial_weights1, l1_penalty, tolerance)
weights1e7_dict = {}
for feat, coef in zip(more_features, weights1e7):
    weights1e7_dict[feat] = coef
weights1e7_dict
Out[33]:
{'constant': 24429600.23440312,
 'bedrooms': 0.0,
 'bathrooms': 0.0,
 'sqft_living': 48389174.77154896,
 'sqft_lot': 0.0,
 'floors': 0.0,
 'waterfront': 3317511.2149216533,
 'view': 7329961.811714254,
 'condition': 0.0,
 'grade': 0.0,
 'sqft_above': 0.0,
 'sqft_basement': 0.0,
 'yr_built': 0.0,
 'yr_renovated': 0.0}
In [34]:
### l1_penalty=1e8

initial_weights1 = [0.]*len(more_features)
l1_penalty = 1e8
tolerance = 1.0

weights1e8 = lasso_cyclical_coordinate_descent(normalized_features1, price1, initial_weights1, l1_penalty, tolerance)
print(weights1e8)
weights1e8_dict = {}
for feat, coef in zip(more_features, weights1e8):
    weights1e8_dict[feat] = coef
weights1e8_dict
[71114625.71488704, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Out[34]:
{'constant': 71114625.71488704,
 'bedrooms': 0.0,
 'bathrooms': 0.0,
 'sqft_living': 0.0,
 'sqft_lot': 0.0,
 'floors': 0.0,
 'waterfront': 0.0,
 'view': 0.0,
 'condition': 0.0,
 'grade': 0.0,
 'sqft_above': 0.0,
 'sqft_basement': 0.0,
 'yr_built': 0.0,
 'yr_renovated': 0.0}
In [35]:
### l1_penalty=1e4

initial_weights1 = [0.]*len(more_features)
l1_penalty = 1e4
tolerance = 5e5

weights1e4 = lasso_cyclical_coordinate_descent(normalized_features1, price1, initial_weights1, l1_penalty, tolerance)
weights1e4_dict = {}
for feat, coef in zip(more_features, weights1e4):
    weights1e4_dict[feat] = coef
weights1e4_dict
Out[35]:
{'constant': 78564738.3415677,
 'bedrooms': -22097398.924305353,
 'bathrooms': 12791071.872785168,
 'sqft_living': 93808088.09281203,
 'sqft_lot': -2013172.7570495543,
 'floors': -4219184.932650203,
 'waterfront': 6482842.817535065,
 'view': 7127408.534806885,
 'condition': 5001664.854696389,
 'grade': 14327518.437140487,
 'sqft_above': -15770959.152373984,
 'sqft_basement': -5159591.222131496,
 'yr_built': -84495341.76843643,
 'yr_renovated': 2824439.4970368384}
4.5 Rescaling learned weights

Create a normalized version of each of the sets of weights learned above (weights1e4, weights1e7, weights1e8), so that they can be applied directly to unnormalized features: since each model was fit on feature_matrix / norms, we have (feature_matrix / norms) · weights = feature_matrix · (weights / norms).

In [36]:
weights1e7_normalized = weights1e7 / norms1
weights1e8_normalized = weights1e8 / norms1
weights1e4_normalized = weights1e4 / norms1
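
A quick way to confirm the rescaling is correct (a sketch): predictions from the rescaled weights on the original features should match predictions from the learned weights on the normalized features.

pred_normalized = predict_output(normalized_features1, weights1e7)
pred_original = predict_output(feature1, weights1e7_normalized)
print(np.allclose(pred_normalized, pred_original))   # expected: True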

Evaluating each of the learned models on the test data

In [37]:
more_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'waterfront', 
                'view', 'condition', 'grade', 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated']
(feature2, price2) = get_numpy_data(testing, more_features, 'price')
In [38]:
weights_list = [weights1e4_normalized, weights1e7_normalized, weights1e8_normalized]
for w in weights_list:
    w_array = np.array(w)
    y_pred = np.dot(feature2, w_array)
    rss = (np.square(price2 - y_pred)).sum()
    print('RSS', rss)
RSS 228459958971393.25
RSS 275962075920366.78
RSS 537166151497322.75
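
Of the three models, the one trained with l1_penalty=1e4 has the lowest RSS on the test data, followed by the l1_penalty=1e7 model and then the l1_penalty=1e8 model (which kept only the intercept).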