Introduction to Advanced Machine Learning, Week 1, week01_pa (hse-aml/intro-to-dl, with brief notes, answers, and figures)

This article walks through the week 1 programming assignment (week01_pa, hse-aml/intro-to-dl) of Introduction to Advanced Machine Learning, with brief notes, answers, and figures; hopefully it is a useful reference for anyone working through the assignment.

This is the first course of the HSE (Higher School of Economics) series, Introduction to Advanced Machine Learning, and this is its week 1 programming assignment.
The assignment consists of six tasks, all fairly easy:
1. compute the probability
2. compute the loss function
3. compute the stochastic gradient
4. compute the mini-batch gradient
5. compute the momentum gradient
6. compute the RMSprop gradient
From 3 to 6, convergence should become faster and more stable.

Programming assignment (Linear models, Optimization)

In this programming assignment you will implement a linear classifier and train it using stochastic gradient descent modifications and numpy.

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import sys
sys.path.append("..")
import grading
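
The grading cells below use a grader object and your Coursera credentials, which the course notebook defines. A minimal sketch, assuming the course's grading.Grader interface (the assignment key is a placeholder to take from your own notebook; the part IDs are the ones used in the graded cells below):

# a sketch, assuming the course's grading.Grader interface;
# take the real assignment key from your own notebook
grader = grading.Grader(assignment_key="<YOUR_ASSIGNMENT_KEY>",
                        all_parts=["xU7U4", "HyTF6", "uNidL", "ToK7N", "GBdgZ", "dLdHG"])
COURSERA_EMAIL = "<your coursera email>"
COURSERA_TOKEN = "<your token, generated on the assignment page>"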

Two-dimensional classification

To make things more intuitive, let’s solve a 2D classification problem with synthetic data.

with open('train.npy', 'rb') as fin:
    X = np.load(fin)
with open('target.npy', 'rb') as fin:
    y = np.load(fin)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, s=20)
plt.show()

[Figure: scatter plot of the synthetic 2D data, colored by class]

Task

Features

As you can see, the data above is not linearly separable, so we should add features (or use a non-linear model). Note that the decision boundary between the two classes has the form of a circle; because of that, adding quadratic features makes the problem linearly separable in the expanded feature space (a small numeric illustration follows the expansion code below):

def expand(X):
    """
    Adds quadratic features.
    This expansion allows your linear model to make non-linear separation.

    For each sample (row in matrix), compute an expanded row:
    [feature0, feature1, feature0^2, feature1^2, feature0*feature1, 1]

    :param X: matrix of features, shape [n_samples,2]
    :returns: expanded features of shape [n_samples,6]
    """
    X_expanded = np.ones((X.shape[0], 6))
    X_expanded[:, 0] = X[:, 0]
    X_expanded[:, 1] = X[:, 1]
    X_expanded[:, 2] = X[:, 0] * X[:, 0]
    X_expanded[:, 3] = X[:, 1] * X[:, 1]
    X_expanded[:, 4] = X[:, 0] * X[:, 1]
    return X_expanded
X_expanded = expand(X)
print(X_expanded)
[[ 1.20798057  0.0844994   1.45921706  0.00714015  0.10207364  1.        ]
 [ 0.76121787  0.72510869  0.57945265  0.52578261  0.5519657   1.        ]
 [ 0.55256189  0.51937292  0.30532464  0.26974823  0.28698568  1.        ]
 ...,
 [-1.22224754  0.45743421  1.49388906  0.20924606 -0.55909785  1.        ]
 [ 0.43973452 -1.47275142  0.19336645  2.16899674 -0.64761963  1.        ]
 [ 1.4928118   1.15683375  2.22848708  1.33826433  1.72693508  1.        ]]
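
For intuition, here is a small, hypothetical illustration (the weights below are not part of the assignment): a circular decision boundary $x_0^2 + x_1^2 = r^2$ is linear in the expanded features, since $\langle w, \mathrm{expand}(x) \rangle = x_0^2 + x_1^2 - r^2$ for $w = (0, 0, 1, 1, 0, -r^2)$.

# hypothetical weights encoding the circle x0^2 + x1^2 = r^2 with r = 1
r = 1.0
w_circle = np.array([0, 0, 1, 1, 0, -r ** 2])
inside = expand(np.array([[0.1, 0.2]]))   # a point inside the circle
outside = expand(np.array([[1.5, 1.5]]))  # a point outside the circle
print(np.dot(inside, w_circle))   # negative: 0.01 + 0.04 - 1 = -0.95
print(np.dot(outside, w_circle))  # positive: 2.25 + 2.25 - 1 = 3.5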

Here are some tests for your implementation of the expand function.

# simple test on random numbers
dummy_X = np.array([[0, 0],
                    [1, 0],
                    [2.61, -1.28],
                    [-0.59, 2.1]])

# call your expand function
dummy_expanded = expand(dummy_X)

# what it should have returned:   x0       x1       x0^2     x1^2     x0*x1    1
dummy_expanded_ans = np.array([[ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  1.    ],
                               [ 1.    ,  0.    ,  1.    ,  0.    ,  0.    ,  1.    ],
                               [ 2.61  , -1.28  ,  6.8121,  1.6384, -3.3408,  1.    ],
                               [-0.59  ,  2.1   ,  0.3481,  4.41  , -1.239 ,  1.    ]])

# tests
assert isinstance(dummy_expanded, np.ndarray), "please make sure you return numpy array"
assert dummy_expanded.shape == dummy_expanded_ans.shape, "please make sure your shape is correct"
assert np.allclose(dummy_expanded, dummy_expanded_ans, 1e-3), "Something's out of order with features"
print("Seems legit!")
Seems legit!

Logistic regression

To classify objects we will predict the probability that an object belongs to class '1'. To obtain this probability we pass the output of a linear model through the logistic (sigmoid) function:

$$a(x; w) = \langle w, x \rangle$$

$$P(y = 1 \mid x, w) = \frac{1}{1 + \exp(-\langle w, x \rangle)} = \sigma(\langle w, x \rangle)$$

def probability(X, w):
    """
    Given input features and weights,
    return predicted probabilities of y==1 given x, P(y=1|x), see description above.

    Don't forget to use expand(X) function (where necessary) in this and subsequent functions.

    :param X: feature matrix X of shape [n_samples,6] (expanded)
    :param w: weight vector w of shape [6] for each of the expanded features
    :returns: an array of predicted probabilities in [0,1] interval.
    """
    # sigmoid of the linear model's output <w, x>
    prob = 1 / (1 + np.exp(-np.dot(X, w)))
    return prob
dummy_weights = np.linspace(-1, 1, 6)
ans_part1 = probability(X_expanded[:1, :], dummy_weights)[0]
## GRADED PART, DO NOT CHANGE!
grader.set_answer("xU7U4", ans_part1)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

In logistic regression the optimal parameters $w$ are found by cross-entropy minimization:

$$L(w) = -\frac{1}{\ell}\sum_{i=1}^{\ell}\left[ y_i \log P(y_i \mid x_i, w) + (1 - y_i)\log\left(1 - P(y_i \mid x_i, w)\right) \right]$$

def compute_loss(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute scalar loss function using formula above.
    """
    prob = probability(X, w)
    n_samples = X.shape[0]
    # average cross-entropy over all samples
    loss = -np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob)) / n_samples
    return loss
# use output of this cell to fill answer field 
ans_part2 = compute_loss(X_expanded, y, dummy_weights)
## GRADED PART, DO NOT CHANGE!
grader.set_answer("HyTF6", ans_part2)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

Since we train our model with gradient descent, we should compute gradients.

To be specific, we need the derivative of the loss function with respect to each weight [6 of them].

$$\nabla_w L = \ldots$$

We won’t be giving you the exact formula this time — instead, try figuring out a derivative with pen and paper.

As usual, we've made a small test for you, but if you need more, feel free to check your math against finite differences (estimate how $L$ changes if you shift $w$ by $10^{-5}$ or so); see the sketch after the implementation below.

def compute_grad(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute vector [6] of derivatives of L over each weight.
    """
    # X [n, d]: n examples, d features
    # y [n,]:   n examples, 0/1 targets
    # w [d,]:   d weights
    a = probability(X, w)  # predicted P(y=1|x), shape [n,]
    dz = a - y             # error term, shape [n,]
    # gradient of the cross-entropy loss: (1/n) * X^T (a - y), shape [d,]
    # (the descent steps below subtract it: w <- w - eta * grad)
    grad = np.dot(X.T, dz) / X.shape[0]
    return grad
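
As suggested above, you can sanity-check the analytic gradient against central finite differences. A minimal sketch (the step size h and the tolerance are assumptions):

def numeric_grad(X, y, w, h=1e-5):
    # dL/dw_j ~ (L(w + h*e_j) - L(w - h*e_j)) / (2h)
    grad = np.zeros_like(w, dtype=float)
    for j in range(len(w)):
        e = np.zeros_like(w, dtype=float)
        e[j] = h
        grad[j] = (compute_loss(X, y, w + e) - compute_loss(X, y, w - e)) / (2 * h)
    return grad

assert np.allclose(compute_grad(X_expanded, y, dummy_weights),
                   numeric_grad(X_expanded, y, dummy_weights), atol=1e-4)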
# use output of this cell to fill answer field 
ans_part3 = np.linalg.norm(compute_grad(X_expanded, y, dummy_weights))
## GRADED PART, DO NOT CHANGE!
grader.set_answer("uNidL", ans_part3)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

Here’s an auxiliary function that visualizes the predictions:

from IPython import display

h = 0.01
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

def visualize(X, y, w, history):
    """draws classifier prediction with matplotlib magic"""
    Z = probability(expand(np.c_[xx.ravel(), yy.ravel()]), w)
    Z = Z.reshape(xx.shape)
    plt.subplot(1, 2, 1)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.subplot(1, 2, 2)
    plt.plot(history)
    plt.grid()
    ymin, ymax = plt.ylim()
    plt.ylim(0, ymax)
    display.clear_output(wait=True)
    plt.show()
visualize(X, y, dummy_weights, [0.5, 0.5, 0.25])

Training

In this section we’ll use the functions you wrote to train our classifier using stochastic gradient descent.

You can try changing hyperparameters like batch size, learning rate and so on to find the best ones, but use our hyperparameters when filling in the answers.

Mini-batch SGD

Mini-batch stochastic gradient descent takes a random batch of m examples on each iteration, calculates the gradient of the loss on that batch and makes a step:

$$w_t = w_{t-1} - \eta \frac{1}{m}\sum_{j=1}^{m} \nabla_w L(w_t, x_{i_j}, y_{i_j})$$

# please use np.random.seed(42), eta=0.1, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])

eta = 0.1  # learning rate
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)

    # gradient step on the sampled mini-batch
    grad = compute_grad(X_expanded[ind, :], y[ind], w)
    w = w - eta * grad
visualize(X, y, w, loss)
plt.clf()
# use output of this cell to fill answer field
ans_part4 = compute_loss(X_expanded, y, w)

[Figure: decision boundary and loss curve after mini-batch SGD]

## GRADED PART, DO NOT CHANGE!
grader.set_answer("ToK7N", ans_part4)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

SGD with momentum

Momentum is a method that helps accelerate SGD in the relevant direction and dampens oscillations, as can be seen in the image below. It does this by adding a fraction $\alpha$ of the update vector of the past time step to the current update vector.


$$\nu_t = \alpha \nu_{t-1} + \eta \frac{1}{m}\sum_{j=1}^{m} \nabla_w L(w_t, x_{i_j}, y_{i_j})$$

$$w_t = w_{t-1} - \nu_t$$


# please use np.random.seed(42), eta=0.05, alpha=0.9, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])

eta = 0.05   # learning rate
alpha = 0.9  # momentum
nu = np.zeros_like(w)
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)

    # accumulate the velocity, then step against it
    nu = alpha * nu + eta * compute_grad(X_expanded[ind, :], y[ind], w)
    w = w - nu
visualize(X, y, w, loss)
plt.clf()
# use output of this cell to fill answer field
ans_part5 = compute_loss(X_expanded, y, w)

[Figure: decision boundary and loss curve after SGD with momentum]

## GRADED PART, DO NOT CHANGE!
grader.set_answer("GBdgZ", ans_part5)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

RMSprop

Implement the RMSprop algorithm, which uses squared gradients to adjust the learning rate:

$$G_j^t = \alpha G_j^{t-1} + (1 - \alpha) g_{tj}^2$$

$$w_j^t = w_j^{t-1} - \frac{\eta}{\sqrt{G_j^t + \varepsilon}}\, g_{tj}$$

# please use np.random.seed(42), eta=0.1, alpha=0.9, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1.])

eta = 0.1    # learning rate
alpha = 0.9  # moving average of gradient norm squared
g2 = np.zeros_like(w)
eps = 1e-8
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)

    # per-weight moving average of squared gradients, then a scaled step
    grad = compute_grad(X_expanded[ind, :], y[ind], w)
    g2 = alpha * g2 + (1 - alpha) * grad ** 2
    w = w - eta / np.sqrt(g2 + eps) * grad
visualize(X, y, w, loss)
plt.clf()

[Figure: decision boundary and loss curve after RMSprop]

# use output of this cell to fill answer field 
ans_part6 = compute_loss(X_expanded, y, w)
## GRADED PART, DO NOT CHANGE!
grader.set_answer("dLdHG", ans_part6)
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
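
Optionally, to eyeball the final model, you can compute the training accuracy by thresholding the predicted probabilities at 0.5 (a small extra check, not part of the graded assignment):

# fraction of training points classified correctly by the final weights
train_acc = np.mean((probability(X_expanded, w) >= 0.5) == y)
print("training accuracy:", train_acc)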
