Coursera Machine LearningコースのProgramming AssignmentをPythonで書く (Week 2)

この記事のまとめ

CourseraのMachine Learningコース Week 2のProgramming AssignmentをPythonで書く

背景

機械学習の勉強のために、CourseraのMachine Learningコースを受けております。

その中では機械学習のコーディングも含まれるのですが、プログラミング言語としてMATLAB/Octaveを使うことを対象にしています。MATLAB/Octaveは学習用にはいいのですが、現実的に実装するとなるとMATLABは結構高価なソフトウェアですし、Octaveだとライブラリが不十分かつ動作が不安定なので、一般的に機械学習ではよく使われるPythonを使っていきたいと考えております。

そこで、もともとはMATLAB/Octave用に用意されているProgramming Assignmentを勉強がてらにPythonで記述していこうと思います。

まずはWeek 2の第一回目のProgramming Assignmentについて実装します。初回は、線形回帰モデルの機械学習についての課題です。

※Week 1にはProgramming Assignmentはありません。

ファイル構成

ファイル構成は下記の通りです。オリジナルでは関数化されていたものの内、一部は関数化せずに直接ex1.py内に記載してしまっています。ex1data1.txt、ex1data2.txtはオリジナルからコピーしたものです。

ex1.py
computeCost.py
gradientDescent.py
ex1data1.txt
ex1data2.txt

ソースコードと解説 `ex1.py`

コード量が少し多いのでライブラリ読み込み部と、Part 1~4に分けて説明していきます。

ライブラリ読み込み

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
 
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D # plot_surface
from matplotlib import cm # For using Color Map
 
# My functions for the "costFunction" function and the "gradientDescent" function
from computeCost import computeCost
from gradientDescent import gradientDescent

まずはライブラリの読み込みです。行列計算用のnumpy、2次元プロット、3次元プロット、カラーマップ用にmatplotlibとmpl_toolkitsを読み込みます。

また、自前の関数computeCostとgradientDescentを読み込みます。ソースコードは後で記載します。

Part 1

Part 1は、5x5の単位行列を作るだけです。numpyのeye関数を使うだけです。

## ==================== Part 1: Basic Function ====================
print('Running warmUpExercise ... ');
print('5x5 Identity Matrix: \n{0}'.format(np.eye((5))));
 
input('\nProgram paused. Press enter to continue.\n')

Part 2

Part 2は、ex1data1.txtを読み込んでプロットを行います。array型としてファイル読み込みを行う場合は、loadtxt関数を使えばよいです。また、読み込んだデータからXとyに分ける際、単に=を使ってしまうと参照渡しになってしまうため、値渡しをしたい場合には、.copy()を用います。

## ======================= Part 2: Plotting =======================
print('Plotting Data ...\n')
data = np.loadtxt("./ex1data1.txt",delimiter=",")
X = data[:, 0].copy()
y = data[:, 1].copy()
m = len(y) # number of training examples
 
# Plot Data
fig = plt.figure() # open a new figure window
plt.ion()
plt.plot(X, y, 'rx', label="Training data")
plt.xlabel("Population of City in 10,000s")
plt.ylabel("Profit in $10,000s")
 
plt.pause(0.01)
 
input('\nProgram paused. Press enter to continue.\n')

Part 3

Part 3は、最急勾配法を動かして機械学習を動かします。後述するgradientDescent関数を使い、 $\theta$ を最急勾配法によって求めます。最後の方で、これによって求めた $\theta$ を使い、新たな入力に対する予測を行っています。

## =================== Part 3: Gradient descent ===================
print('Running Gradient Descent ...\n')
 
X     = np.hstack([np.ones((m, 1)), np.c_[X]]) # Add x_0, size = <m x 2>
theta = np.zeros((2, 1)) # initialize fitting parameters
 
# Some gradient descent settings
iterations = 1500
alpha      = 0.01
 
# compute and display initial cost
J = computeCost(X, y, theta)
 
# run gradient descent
theta, J_history = gradientDescent(X, y, theta, alpha, iterations)
 
# print theta to screen
print('Theta found by gradient descent: {0} {1}'.format(theta[0], theta[1]))
 
# Plot the linear fit
plt.ion()
plt.plot(X[:,1], X.dot(theta), 'b', label="Linear regression")
plt.legend()
 
plt.pause(0.01)
 
 
# Predict values for population sizes of 35,000 and 70,000
predict1 = np.c_[1, 3.5].dot(theta)
print('For population = 35,000, we predict a profit of {0}'.format(predict1[0][0]*10000))
predict2 = np.c_[1, 7].dot(theta)
print('For population = 70,000, we predict a profit of {0}'.format(predict2[0][0]*10000))
 
input('\nProgram paused. Press enter to continue.\n')

Part 4

Part 4は、コスト関数を可視化します。 $\theta_0$ と $\theta_1$ の関数としてコスト関数 $J$ をplot_surface関数を用いた3次元プロットと、contour関数を用いた等高線図を描きます。

# ============= Part 4: Visualizing J(theta_0, theta_1) =============
print('Visualizing J(theta_0, theta_1) ...\n')
 
# Grid over which we will calculate J
theta0_vals = np.linspace(-10, 10, 100)
theta1_vals = np.linspace(-1, 4, 100)
 
# initialize J_vals to a matrix of 0's
J_vals = np.zeros((len(theta0_vals), len(theta1_vals)))
 
# Fill out J_vals
for i in range(0,len(theta0_vals)):
    for j in range(0,len(theta1_vals)):
        t = np.array([[theta0_vals[i]], [theta1_vals[j]]])
        J_vals[i][j] = computeCost(X, y, t)
 
# Because of the way meshgrids work in the surf command, we need to
# transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals.T;
theta0_vals, theta1_vals = np.meshgrid(theta0_vals, theta1_vals)
# Surface plot
fig2 = plt.figure()
ax = fig2.gca(projection='3d')
ax.plot_surface(theta0_vals, theta1_vals, J_vals, rstride=2, cstride=2, cmap=cm.jet)
ax.set_xlabel(r'$\theta_0$')
ax.set_ylabel(r'$\theta_1$')
ax.set_zlabel(r'J')
plt.pause(0.01)
 
# Contour plot
fig3 = plt.figure()
# Plot J_vals as 15 contours spaced logarithmically between 0.01 and 100
plt.contour(theta0_vals, theta1_vals, J_vals, np.logspace(-2, 3, 20))
plt.xlabel(r'$\theta_0$')
plt.ylabel(r'$\theta_1$')
plt.pause(0.01)
 
plt.plot(theta[0][0], theta[1][0], 'rx')
 
input('\nProgram paused. Press enter to continue.\n')

ソースコードと解説 `computeCost.py`

このファイルでは、線形回帰のためのコスト算出を行うcomputeCost関数を定義します。

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
 
import numpy as np
 
def computeCost(X, y, theta):
# COMPUTECOST Compute cost for linear regression
# J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
# parameter for linear regression to fit the data points in X and y
 
    # Initialize some useful values
    m = len(y)
 
    # Compute the cost of a particular choice of theta
    J = 1/(2*m) * sum( (X.dot(theta) - np.c_[y])**2 )
 
    return J

コスト関数は本課題のPDFもしくは、week 2のLecture 4の資料にあるように、次のように計算します。

$J(\theta)=\frac{1}{2m}\sum^m_{i=1}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$

ここで、hypothesis $h_\theta(x)$ は線形回帰問題では、下記のように表されます。

$h_\theta(x)=\theta^Tx=\theta_0+\theta_1x_1$

ソースコードと解説 `gradientDescent.py`

このファイルでは、最急勾配法でコスト関数を極小化する $\theta$ を求める関数を定義します。

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
 
import numpy as np
 
# My function
from computeCost import computeCost
 
def gradientDescent(X, y, theta, alpha, num_iters):
# GRADIENTDESCENT Performs gradient descent to learn theta
# theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
# taking num_iters gradient steps with learning rate alpha
 
    # Initialize some useful values
    m = len(y) # number of training examples
    J_history = np.zeros((num_iters, 1))
 
    for iter in range(1,num_iters):
 
        theta = theta - alpha * 1/m * np.c_[((X.dot(theta) - np.c_[y])  * X ).sum(0)];
 
        # Save the cost J in every iteration    
        J_history[iter] = computeCost(X, y, theta);
 
    return (theta, J_history)

最急勾配法は、コスト関数の微分値に係数 $\alpha$ を掛けた値を用いて $\theta$ を更新していきます。

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum^m_{i=1} \left( h_\theta (x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
以上です。最後まで読んでいただき、ありがとうございます。

　← 気に入っていただければ応援ポチをお願いします！

Search This Blog