Maximum Likelihood Estimation for Parameter Learning

Maximum Likelihood Estimation (MLE) is a frequentist approach for estimating the parameters of a model given some observed data. The general approach is: i) observe some data; ii) write down a model for how we believe the data were generated; iii) set the parameters of the model to the values that maximize the likelihood of the parameters given the data.
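As a toy illustration of these three steps (a hypothetical example, separate from this assignment), consider estimating the bias of a coin from simulated flips; for a Bernoulli model, the likelihood is maximized in closed form by the sample mean:

import numpy as np

# i) observe some data: 100 simulated coin flips (1 = heads), true bias 0.7
rng = np.random.default_rng(0)
flips = rng.binomial(1, 0.7, size=100)

# ii) model: flips ~ Bernoulli(p), with log likelihood ll(p)
def bernoulli_log_likelihood(p, y):
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# iii) maximize: for the Bernoulli model the MLE is simply the sample mean
p_hat = flips.mean()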

In this assignment, we’re going to cover the basics of creating models, understanding likelihood, and using the Maximum Likelihood Estimation process for fitting our parameters.

Learning objectives:

  1. Implement the link function for logistic regression.
  2. Write a function to compute the derivative of the log likelihood function with respect to a single coefficient.
  3. Implement gradient ascent.
  4. Given a set of coefficients, predict sentiments.
  5. Compute classification accuracy for the logistic regression model.

Before we start, let's import libraries and load review data!

In [1]:
import pandas as pd
import numpy as np

For this assignment, we will use a subset of the Amazon product review dataset. The subset was chosen to contain similar numbers of positive and negative reviews, as the original dataset consisted primarily of positive reviews.

In [2]:
products = pd.read_csv('C:/Users/tn00230/OneDrive - University of Surrey/Python/Data/Week2/amazon_baby_subset.csv')
products.head(5)
Out[2]:
name review rating sentiment
0 Stop Pacifier Sucking without tears with Thumb... All of my kids have cried non-stop when I trie... 5 1
1 Nature's Lullabies Second Year Sticker Calendar We wanted to get something to keep track of ou... 5 1
2 Nature's Lullabies Second Year Sticker Calendar My daughter had her 1st baby over a year ago. ... 5 1
3 Lamaze Peekaboo, I Love You One of baby's first and favorite books, and it... 4 1
4 SoftPlay Peek-A-Boo Where's Elmo A Children's ... Very cute interactive book! My son loves this ... 5 1

Let us quickly explore more of this dataset.

In [3]:
products.describe()
Out[3]:
rating sentiment
count 53072.000000 53072.000000
mean 3.097490 0.001620
std 1.730509 1.000008
min 1.000000 -1.000000
25% 1.000000 -1.000000
50% 4.000000 1.000000
75% 5.000000 1.000000
max 5.000000 1.000000
In [4]:
products[products['sentiment']>0].count()
Out[4]:
name         26521
review       26438
rating       26579
sentiment    26579
dtype: int64
In [5]:
# Frequency table of the variable 'rating'
freq_table = products['rating'].value_counts().to_frame('count')
freq_table['percent'] = freq_table['count']/freq_table['count'].sum()
freq_table
Out[5]:
rating  count  percent
5       20271  0.381953
1       15183  0.286083
2       11310  0.213107
4        6308  0.118857

Note that the subset contains no 3-star (neutral) reviews: ratings of 4 and 5 are labeled sentiment +1 (20271 + 6308 = 26579 reviews, matching the count above) and ratings of 1 and 2 are labeled -1.

1. Data Set-up

We will perform some simple feature cleaning using data frames. The last assignment used all words when building bag-of-words features, but here we limit ourselves to 193 words (for simplicity). The 193 most frequent words were compiled into the JSON file named important_words.json; we load them into a list important_words.

1.1 Text cleaning

We will perform two simple data transformations: filling in N/A values in the review column and removing punctuation.

In [6]:
products = products.fillna({'review':''})
In [7]:
import string

def remove_punctuation(text):
    # strip all ASCII punctuation characters from the text
    return text.translate(str.maketrans('', '', string.punctuation))

products['review_clean'] = products['review'].apply(remove_punctuation)
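
A quick check on a hypothetical review string confirms the behaviour:

remove_punctuation("Great product, we love it!")   # returns 'Great product we love it'
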
1.2 Compute word counts (only for important_words)

For each word in important_words, we count the number of times the word occurs in each review and store the result in a separate column, one column per word.

As a result, the data frame products should contain one column for each of the 193 important_words. For example, the column perfect contains a count of the number of times the word perfect occurs in each of the reviews.

In [8]:
import json
with open('C:/Users/tn00230/OneDrive - University of Surrey/Python/Data/Week2/important_words.json') as f:
    important_words = json.load(f)
    
for word in important_words:
    products[word] = products['review_clean'].apply(lambda s : s.split().count(word))
In [9]:
products.head(2)
Out[9]:
name review rating sentiment review_clean baby one great love use ... seems picture completely wish buying babies won tub almost either
0 Stop Pacifier Sucking without tears with Thumb... All of my kids have cried non-stop when I trie... 5 1 All of my kids have cried nonstop when I tried... 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 Nature's Lullabies Second Year Sticker Calendar We wanted to get something to keep track of ou... 5 1 We wanted to get something to keep track of ou... 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

2 rows × 198 columns (the 5 original columns plus one count column for each of the 193 important words)

Calculate the number of product reviews that contain the word perfect

In [10]:
products['contains_perfect'] = np.where(products['perfect'] >= 1, 1, 0)
products['contains_perfect'].sum()
Out[10]:
2955

2. Implementing Logistic Regression From Scratch

  1. Convert data frame to multi-dimensional array
  2. Estimating conditional probability with link function
  3. Compute derivative of log likelihood with respect to a single coefficient
  4. Taking gradient steps

2.1 Convert data frame to multi-dimensional array

We will write a function that extracts columns from a data frame and converts them into a multi-dimensional array. The function should accept three parameters:

  • dataframe: a data frame to be converted
  • features: a list of strings containing the names of the columns that are used as features.
  • label: a string, containing the name of the single column that is used as class labels.

The function should return two values: one 2D array for features and one 1D array for class labels.

In [11]:
def get_numpy_data(dataframe, features, label):
    dataframe['constant'] = 1                        # add an intercept column
    features = ['constant'] + features
    feature_matrix = dataframe[features].to_numpy()  # 2D array of features
    label_array = dataframe[label].to_numpy()        # 1D array of class labels
    return (feature_matrix, label_array)

Using this function, we extract the two arrays feature_matrix and sentiment.

In [12]:
feature_matrix, sentiment = get_numpy_data(products, important_words, 'sentiment')
2.2 Estimating conditional probability with sigmoid function

The sigmoid link function is given by: $$P(y_i=+1 \mid x_i, w) = \frac{1}{1+\exp(-w^T h(x_i))}$$

We will write a function to calculate the conditional probability. The function should take two parameters: feature_matrix and coefficients. Then,

  • compute the dot product of feature_matrix and coefficients.
  • compute the sigmoid function P(y=+1|x,w).

The function returns the predictions given by the sigmoid function.

In [13]:
def predict_probability(feature_matrix, coefficients):
    score = np.dot(feature_matrix, coefficients)  # scores w^T h(x_i) for all examples
    predictions = 1. / (1. + np.exp(-score))      # apply the sigmoid
    return predictions
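
As a quick sanity check (hypothetical inputs, not the review data), all-zero coefficients give a score of 0 for every row, so the predicted probability should be exactly 0.5:

dummy_features = np.array([[1., 2., 3.], [1., -1., 0.]])   # hypothetical 2x3 feature matrix
predict_probability(dummy_features, np.zeros(3))           # expected: array([0.5, 0.5])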
2.3 Compute derivative of log likelihood with respect to a single coefficient

We will write a function feature_derivative that computes the derivative of log likelihood with respect to a single coefficient w_j. The function should:

  • accept two parameters errors and feature.
  • compute the dot product of errors and feature.
  • return the dot product. This is the derivative with respect to a single coefficient w_j.

Given the derivative of the log likelihood: $$ \frac{\partial \ell}{\partial w_j} = \displaystyle\sum_{i=1}^{N} h_j(x_i)\left(\mathbf{1}[y_i = +1] - P(y_i=+1 \mid x_i, w)\right)$$

In [14]:
def feature_derivative(errors, feature):
    # dot product of the errors vector with one feature column
    derivative = np.dot(errors, feature)
    return derivative
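
A tiny hand-checkable example with hypothetical values:

errors_toy = np.array([0.5, -0.25, 0.1])
feature_toy = np.array([1., 2., 0.])
feature_derivative(errors_toy, feature_toy)   # 0.5*1 - 0.25*2 + 0.1*0 = 0.0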

We will also write a function to calculate the log likelihood using the following formula: $$ ll(w) = \displaystyle\sum_{i=1}^{N} \Big( \big(\mathbf{1}[y_i = +1]-1\big)\, w^T h(x_i) - \ln\big(1+\exp(-w^T h(x_i))\big) \Big) $$

In [15]:
def compute_log_likelihood(feature_matrix, sentiment, coefficients):
    indicator = (sentiment==+1)
    scores = np.dot(feature_matrix, coefficients)
    lp = np.sum((indicator-1)*scores - np.log(1. + np.exp(-scores)))
    return lp
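
As a sanity check, with all-zero coefficients every score is 0, so each example contributes -ln 2 and the log likelihood should equal -N ln 2 (about -36786.7 for N = 53072):

ll_zero = compute_log_likelihood(feature_matrix, sentiment, np.zeros(feature_matrix.shape[1]))
np.isclose(ll_zero, -len(sentiment) * np.log(2.))   # expected: True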
2.4 Taking gradient steps

Now we are ready to implement our own logistic regression. All we have to do is write a gradient ascent function that takes gradient steps towards the optimum. We will write a function that fits a logistic regression model using gradient ascent. The function should accept the following parameters:

  • feature_matrix: 2D array of features
  • sentiment: 1D array of class labels
  • initial_coefficients: 1D array containing initial values of coefficients
  • step_size: a parameter controlling the size of the gradient steps
  • max_iter: number of iterations to run gradient ascent

The function returns the coefficients obtained after performing gradient ascent.

The function should carry out the following steps:

  1. initialize vector coefficients to initial_coefficients.
  2. predict the class probability P(y_i=+1|x_i,w) using your predict_probability function and save it to variable predictions.
  3. compute indicator value for (y_i = +1) by comparing sentiment against +1. Save it to variable indicator.
  4. compute the errors as difference between indicator and predictions. Save the errors to variable errors.
  5. For each j-th coefficient, compute the per-coefficient derivative by calling feature_derivative with the j-th column of feature_matrix. Then increment the j-th coefficient by (step_size*derivative).
  6. Once in a while, insert code to print out the log likelihood.
  7. Repeat steps 2-6 for max_iter times.
In [16]:
def logistic_regression(feature_matrix, sentiment, initial_coefficients, step_size, max_iter):
    coefficients = np.array(initial_coefficients) # make sure it's a numpy array
    for itr in range(max_iter):
        predictions = predict_probability(feature_matrix, coefficients)
        indicator = (sentiment==+1)
        errors = indicator - predictions
        for j in range(len(coefficients)): 
            derivative = feature_derivative(errors, feature_matrix[:,j])
            coefficients[j]=coefficients[j] + step_size*derivative
        if itr <= 15 or (itr <= 100 and itr % 10 == 0) or (itr <= 1000 and itr % 100 == 0) \
        or (itr <= 10000 and itr % 1000 == 0) or itr % 10000 == 0:
            lp = compute_log_likelihood(feature_matrix, sentiment, coefficients)
            print( 'iteration %*d: log likelihood of observed labels = %.8f' % \
                (int(np.ceil(np.log10(max_iter))), itr, lp))
    return coefficients

Now, let's try to run the logistic regression solver with the parameters below:

In [17]:
initial_coefficients = np.zeros(194)  # 193 word coefficients plus the intercept
step_size = 1e-7
max_iter = 301
coef = logistic_regression(feature_matrix, sentiment, initial_coefficients, step_size, max_iter)
iteration   0: log likelihood of observed labels = -36780.91768478
iteration   1: log likelihood of observed labels = -36775.13434712
iteration   2: log likelihood of observed labels = -36769.35713564
iteration   3: log likelihood of observed labels = -36763.58603240
iteration   4: log likelihood of observed labels = -36757.82101962
iteration   5: log likelihood of observed labels = -36752.06207964
iteration   6: log likelihood of observed labels = -36746.30919497
iteration   7: log likelihood of observed labels = -36740.56234821
iteration   8: log likelihood of observed labels = -36734.82152213
iteration   9: log likelihood of observed labels = -36729.08669961
iteration  10: log likelihood of observed labels = -36723.35786366
iteration  11: log likelihood of observed labels = -36717.63499744
iteration  12: log likelihood of observed labels = -36711.91808422
iteration  13: log likelihood of observed labels = -36706.20710739
iteration  14: log likelihood of observed labels = -36700.50205049
iteration  15: log likelihood of observed labels = -36694.80289716
iteration  20: log likelihood of observed labels = -36666.39512033
iteration  30: log likelihood of observed labels = -36610.01327118
iteration  40: log likelihood of observed labels = -36554.19728365
iteration  50: log likelihood of observed labels = -36498.93316099
iteration  60: log likelihood of observed labels = -36444.20783914
iteration  70: log likelihood of observed labels = -36390.00909449
iteration  80: log likelihood of observed labels = -36336.32546144
iteration  90: log likelihood of observed labels = -36283.14615871
iteration 100: log likelihood of observed labels = -36230.46102347
iteration 200: log likelihood of observed labels = -35728.89418769
iteration 300: log likelihood of observed labels = -35268.51212683

3. Predicting Sentiments

Using the result estimated above, we compute class predictions. We do this in two steps:

  1. First, compute the scores as the dot product of feature_matrix and coefficients.
  2. Then apply threshold 0 on the scores to compute the class predictions. Refer to the formula above.
In [18]:
# Compute the scores as a dot product between feature_matrix and coefficients.
scores = np.dot(feature_matrix, coef)
In [19]:
# threshold the scores at zero: negative score => class -1, otherwise class +1
class_pred = np.where(scores < 0., -1., 1.)
In [20]:
print("The number of positive sentiment:", np.sum(class_pred>0))
The number of positive sentiment: 25126

4. Measuring Accuracy

We now measure how well the model classifies the data it was fit on. Accuracy is defined as: $$ \text{accuracy} = \frac{\text{correctly classified examples}}{\text{total examples}}$$

In [22]:
from sklearn.metrics import accuracy_score
print("Our accuracy was:", accuracy_score(sentiment, class_pred))
Our accuracy was: 0.7518653904130238
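
Equivalently, since sentiment and class_pred are both arrays of +1/-1 labels, the accuracy can be computed with plain NumPy (a sketch using the arrays defined above):

manual_accuracy = np.mean(class_pred == sentiment)   # fraction of correctly classified examples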

5. Reporting the Ten "Most Positive" Words

In [23]:
coefficients = list(coef[1:]) # exclude intercept
word_coefficient_tuples = [(word, coefficient) for word, coefficient in zip(important_words, coefficients)]
word_coefficient_tuples = sorted(word_coefficient_tuples, key=lambda x:x[1], reverse=True)
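
Since the list is sorted in decreasing order of coefficient value, the ten "most positive" words are simply the first ten entries; the full sorted list is shown below.

word_coefficient_tuples[:10]   # the ten most positive words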
In [24]:
word_coefficient_tuples
Out[24]:
[('great', 0.0665460841704577),
 ('love', 0.06589076292212326),
 ('easy', 0.06479458680257838),
 ('little', 0.045435626308421365),
 ('loves', 0.04497640139490604),
 ('well', 0.030135001092107077),
 ('perfect', 0.029739937104968462),
 ('old', 0.020077541034775378),
 ('nice', 0.018408707995268992),
 ('daughter', 0.017703199905701694),
 ('soft', 0.01757027224560289),
 ('fits', 0.01688247107140872),
 ('happy', 0.01680529588976808),
 ('baby', 0.015565696580423507),
 ('recommend', 0.015408450108008665),
 ('also', 0.015216196422918844),
 ('best', 0.014991791565630264),
 ('comfortable', 0.0132539900815849),
 ('car', 0.012685935745813375),
 ('clean', 0.012018174433365525),
 ('son', 0.011944817713693955),
 ('bit', 0.011708248093123262),
 ('works', 0.011703160621987424),
 ('size', 0.010715966516270301),
 ('stroller', 0.009909164635972736),
 ('room', 0.009783241021568061),
 ('price', 0.009572733543590181),
 ('play', 0.00917842898398431),
 ('easily', 0.009032818138954407),
 ('kids', 0.008582843004346528),
 ('still', 0.008264666945374463),
 ('lot', 0.007999389349248327),
 ('around', 0.007508918097314341),
 ('need', 0.00717190727002709),
 ('take', 0.0067101233146641725),
 ('keep', 0.006437668081870187),
 ('crib', 0.006002799788638706),
 ('without', 0.005923536113220277),
 ('year', 0.005733200028487892),
 ('set', 0.005674789908056569),
 ('cute', 0.00553751236405903),
 ('use', 0.005017438816565412),
 ('big', 0.004606618761970606),
 ('diaper', 0.004279382890354769),
 ('wish', 0.00402037650809975),
 ('seat', 0.00398353363941361),
 ('though', 0.0033448995950862924),
 ('every', 0.003081478639854043),
 ('enough', 0.0030678550129901094),
 ('able', 0.0029803067506612626),
 ('bag', 0.0028617875161052087),
 ('babies', 0.00285818985134274),
 ('seems', 0.002815332188646534),
 ('night', 0.0028082590666831486),
 ('good', 0.002768203906397681),
 ('many', 0.002646610638512129),
 ('makes', 0.0023134923740490217),
 ('pretty', 0.0022363904560238454),
 ('long', 0.0021872401621619436),
 ('think', 0.0017456314732199937),
 ('toy', 0.0017230705102314389),
 ('since', 0.0015547861236921195),
 ('looking', 0.0015347116392846498),
 ('us', 0.0015136828514065383),
 ('purchase', 0.0012225783235106362),
 ('put', 0.000899239081415468),
 ('cover', 0.0008292511599959564),
 ('used', 0.0006101288089752651),
 ('found', 0.00041209171300655653),
 ('really', 0.0002725920624677609),
 ('won', 0.00011988500346780872),
 ('go', 9.91666827481431e-05),
 ('looks', 1.366992855023098e-05),
 ('high', -0.00018648923566776294),
 ('day', -0.0001885722805001113),
 ('bottles', -0.0003356660003498164),
 ('chair', -0.0005159604846262285),
 ('using', -0.0006573503620923425),
 ('side', -0.0008613904437686893),
 ('worth', -0.0009762196756223372),
 ('almost', -0.0011449351626316186),
 ('hold', -0.0012476158571719626),
 ('months', -0.0013621951573487417),
 ('look', -0.0016457514473845493),
 ('sure', -0.0016689531424132907),
 ('find', -0.001941995510954018),
 ('amazon', -0.001973209371896165),
 ('month', -0.002203176605623071),
 ('getting', -0.002220344023969568),
 ('come', -0.002478056986088657),
 ('second', -0.003018666010667183),
 ('head', -0.0030257887493927997),
 ('small', -0.0030534547593308713),
 ('place', -0.0033188773175081994),
 ('together', -0.0034133092891886524),
 ('want', -0.0034808852314084323),
 ('like', -0.003504884133335206),
 ('give', -0.0035098451575682472),
 ('say', -0.0037369438587485197),
 ('wanted', -0.0038141640936843104),
 ('know', -0.004074970100351704),
 ('took', -0.00426644386401471),
 ('much', -0.00439700067450943),
 ('see', -0.004659737406888565),
 ('purchased', -0.0047899094286734025),
 ('fit', -0.004795795278093019),
 ('gate', -0.005012917306197128),
 ('bottle', -0.00504082163624915),
 ('different', -0.005041275079058815),
 ('came', -0.00510115608429506),
 ('however', -0.005102879784877111),
 ('make', -0.0052058842106071585),
 ('new', -0.005287862195972152),
 ('buying', -0.0054430034643825976),
 ('last', -0.005470161541392435),
 ('actually', -0.005605733475251416),
 ('less', -0.005654987222257695),
 ('child', -0.006087140302347554),
 ('started', -0.006262831503162304),
 ('instead', -0.006300126718179463),
 ('water', -0.006300542962225544),
 ('maybe', -0.0064009893446270215),
 ('problem', -0.006409589155003782),
 ('right', -0.006413688585050476),
 ('tub', -0.006475876873018089),
 ('said', -0.006762897175112026),
 ('went', -0.006870502394402382),
 ('quality', -0.006910108112146037),
 ('pump', -0.006952869149061942),
 ('top', -0.00700791911646568),
 ('part', -0.007041719908427805),
 ('ordered', -0.0070757309816473464),
 ('either', -0.00709205934413514),
 ('bottom', -0.007220988012088569),
 ('anything', -0.007224199425910094),
 ('made', -0.007353459369094525),
 ('weeks', -0.007372630597764411),
 ('design', -0.00773228781737099),
 ('times', -0.007764187808314246),
 ('picture', -0.008101502952676016),
 ('away', -0.008317829807696091),
 ('one', -0.008502046745757372),
 ('milk', -0.008935728615993477),
 ('stay', -0.009066533373590218),
 ('open', -0.009120491910748275),
 ('cup', -0.009259144043189976),
 ('worked', -0.009315208191956718),
 ('trying', -0.009874247980135878),
 ('completely', -0.010006213059178752),
 ('got', -0.010147655511657534),
 ('difficult', -0.010219204790778673),
 ('piece', -0.010282634317735408),
 ('two', -0.010576801537792878),
 ('box', -0.010669132724912933),
 ('going', -0.010770935491004765),
 ('try', -0.010918793208792996),
 ('another', -0.010950912169649826),
 ('unit', -0.0115230475554972),
 ('working', -0.011895059489782873),
 ('idea', -0.012101030310887904),
 ('bought', -0.012170641997242547),
 ('company', -0.012506394670958822),
 ('received', -0.01260518995366231),
 ('bad', -0.012754571329579798),
 ('something', -0.012808838024750812),
 ('never', -0.012969046546319292),
 ('first', -0.013213475301677043),
 ('hard', -0.013756676731261401),
 ('thing', -0.013868727265297583),
 ('cheap', -0.014709833465080665),
 ('reviews', -0.014866319449976982),
 ('plastic', -0.01497704490358794),
 ('better', -0.01504065897704339),
 ('broke', -0.01553769895565389),
 ('returned', -0.016001798500102516),
 ('item', -0.01713786701085479),
 ('buy', -0.01773754399721805),
 ('time', -0.018246193486087033),
 ('way', -0.018359460662945686),
 ('tried', -0.018702371424325834),
 ('could', -0.01984625666077721),
 ('thought', -0.021394348543682485),
 ('waste', -0.02404274807115496),
 ('monitor', -0.024482100545891724),
 ('return', -0.026592778462247283),
 ('back', -0.027742697230661334),
 ('get', -0.028711552980192574),
 ('disappointed', -0.028978976142317068),
 ('even', -0.030051249236035804),
 ('work', -0.03306951529475273),
 ('money', -0.038982037286487116),
 ('product', -0.041511033392108904),
 ('would', -0.05386014844520314)]