Using tflearn for deep learning to recognize Hatsune Miku

After getting a first taste of the power of TensorFlow and CNNs in my earlier post on TensorFlow MNIST handwriting recognition,
today I felt like writing a neural network that automatically judges whether an image contains Hatsune Miku.

I happened to come across tflearn, a high-level library of deep learning algorithms implemented on top of TensorFlow.

The tflearn website states its positioning right up front: "Deep learning library featuring a higher-level API for TensorFlow." With raw TensorFlow you have to manage things like the input and output dimensions of every layer yourself; tflearn wraps the common deep learning building blocks into an easier, more intuitive API.

On to the main topic.
This post has two parts:

  • Preparing training data and training a CNN
  • Testing with randomly grabbed images

Preparing training data and training the CNN

I want to train a network that tells me whether an image is a picture of Hatsune Miku.
First I grabbed 20 random JPG images of Miku from the web and resized them all to 100x100.

Then I grabbed 20 random images that are not Miku and likewise resized them to 100x100.

These 40 images serve as my training set.

I then grabbed another 5 Miku images and 5 non-Miku images, resized them to 100x100, and used them as my validation set.

With that, the training data is ready: https://github.com/del680202/MachineLearning-memo/blob/master/src/tensorflow/cnn_dataset_mini.zip
The sample is tiny, but the hope is that deep learning can still deliver decent results from little data.
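The batch resizing above can be scripted instead of done by hand. Here is a minimal sketch using Pillow; the `raw/` source directory and the helper name `resize_all` are my own assumptions, not part of the original workflow:

```python
import os
from PIL import Image

def resize_all(src_dir, dst_dir, size=(100, 100)):
    """Resize every JPG in src_dir to the given size and save it into dst_dir."""
    if not os.path.exists(dst_dir):
        os.makedirs(dst_dir)
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".jpg"):
            continue
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        img.resize(size).save(os.path.join(dst_dir, name))

# Hypothetical layout: raw downloads in raw/miku, resized output in miku/
# resize_all("raw/miku", "miku")
```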

With numpy and friends installed beforehand, tensorflow and tflearn can be installed easily with pip.
For numpy and the rest I usually use Anaconda: https://www.continuum.io/downloads

Installing tensorflow and tflearn
Reference: https://www.tensorflow.org/get_started/os_setup

$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.1-py2-none-any.whl
$ pip install $TF_BINARY_URL
$ pip install tflearn

Unzip the dataset prepared earlier into the three directories miku, no-miku, and test-set,
then create a cnn.py file and fill in the following, section by section.

First, the imports:

import tensorflow as tf
import numpy as np
import os
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression
from tflearn.data_utils import load_image

Load the training images and convert them into numpy arrays:

SCRIPT_PATH = os.path.dirname(os.path.abspath(__file__))
num = 20
imgs = []
for i in range(1, num + 1):
    imgs.append(np.asarray(load_image("%s/miku/%s.jpg" % (SCRIPT_PATH, i))))
for i in range(1, num + 1):
    imgs.append(np.asarray(load_image("%s/no-miku/%s.jpg" % (SCRIPT_PATH, i))))
imgs = np.array(imgs)
y_data = np.r_[np.c_[np.ones(num), np.zeros(num)],np.c_[np.zeros(num), np.ones(num)]]
print(imgs.shape)
print(y_data.shape)

tflearn's load_image loads an image as a PIL object, and np.asarray converts it into a numpy array.
As for the labels: the first 20 images are Miku and get the vector [1, 0]; the last 20 are not Miku and get [0, 1].
The resulting dataset and label dimensions are:
(40, 100, 100, 3) # 40 images of 100x100 with 3 RGB channels
(40, 2) # 40 label rows
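The np.r_/np.c_ one-liner that builds y_data is compact but cryptic; a small demonstration with num=3 shows what each piece does:

```python
import numpy as np

num = 3
# np.c_ stacks column vectors side by side: ones|zeros gives [1, 0] rows
miku_labels = np.c_[np.ones(num), np.zeros(num)]
# zeros|ones gives [0, 1] rows for the non-Miku images
other_labels = np.c_[np.zeros(num), np.ones(num)]
# np.r_ concatenates the two blocks along the row axis
y = np.r_[miku_labels, other_labels]
# y has shape (6, 2): the first 3 rows are [1, 0], the last 3 are [0, 1]
print(y.shape)
```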

Load the validation images into numpy arrays the same way;
the first five images are Miku, the last five are not:

x_test = []
for i in range(1, 11):
    x_test.append(np.asarray(load_image("%s/test-set/%s.jpg" % (SCRIPT_PATH, i))))
x_test = np.array(x_test)
y_test = np.r_[np.c_[np.ones(5), np.zeros(5)], np.c_[np.zeros(5), np.ones(5)]]
print(x_test.shape)  # (10, 100, 100, 3)
print(y_test.shape)  # (10, 2)

With the data ready, it's time to build the CNN. I copied the example from the official site and modified it;
compared with raw TensorFlow, there are noticeably fewer parameters to worry about:

# Building convolutional network

network = input_data(shape=[None, 100, 100, 3], name='input')
network = conv_2d(network, 64, 5, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = conv_2d(network, 128, 5, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = fully_connected(network, 512, activation='relu')
#network = dropout(network, 0.8)
network = fully_connected(network, 1024, activation='relu')
network = dropout(network, 0.8)
network = fully_connected(network, 2, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.00001,
                     loss='categorical_crossentropy', name='target')

Once the CNN is built, training can begin.
Train for 500 epochs, with progress displayed along the way.
After training, save the model as miku_model.tflearn.

# Training

model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit({'input': imgs}, {'target': y_data}, n_epoch=500,
          validation_set=({'input': x_test}, {'target': y_test}),
          snapshot_step=100, show_metric=True, run_id='convnet_miku')
model.save('miku_model.tflearn')

During training you can watch the current status,
including the optimizer, loss, and accuracy, which is very convenient:

---------------------------------
Run id: convnet_miku
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 40
Validation samples: 10
--
...
Training Step: 2  | total loss: 2.38131
| Adam | epoch: 002 | loss: 2.38131 - acc: 0.4950 | val_loss: 2.82867 - val_acc: 0.5000 -- iter: 40/40

After training, logs are written to /tmp/tflearn_logs by default, which you can inspect with tensorboard:

$ tensorboard --logdir='/tmp/tflearn_logs'

After running the command above, open http://localhost:6006 to see how the training progressed and what the network looks like.

Despite the small training set, accuracy still reached around 80-90%, which is not bad.

Testing with randomly grabbed images

With the model trained, let's grab a few images from the web to test it.
I prepared two Miku images, t1.jpg and t2.jpg, and two non-Miku images, t3.jpg and t4.jpg, under /tmp/,
then ran the test with the program below.

First, load the model trained earlier:

import numpy as np
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression
from tflearn.data_utils import load_image

# Building convolutional network

network = input_data(shape=[None, 100, 100, 3], name='input')
network = conv_2d(network, 64, 5, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = conv_2d(network, 128, 5, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = fully_connected(network, 512, activation='relu')
network = fully_connected(network, 1024, activation='relu')
network = dropout(network, 0.8)
network = fully_connected(network, 2, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.00001,
                     loss='categorical_crossentropy', name='target')

model = tflearn.DNN(network)
model.load('miku_model.tflearn')

The key point here is using model.load to load the previously saved miku_model.tflearn;
the model structure must be identical to the one used during training.

Once the model is loaded, test it with the four images grabbed earlier.
Since the images come in random sizes, resize them to 100x100 while loading:

#Load test data

imgs = []
num = 4
for i in range(1, num + 1):
    img = load_image("/tmp/t%s.jpg" % (i))
    img = img.resize((100,100))
    img_arr = np.asarray(img)
    imgs.append(img_arr)
imgs = np.array(imgs)

#predict

print(np.round(model.predict(imgs)))

#output

[[ 1.  0.]
 [ 1.  0.]
 [ 0.  1.]
 [ 0.  1.]]

In the output, the model judged the first two images to be Miku and the last two not to be Miku, which is just what we expected.
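To turn the one-hot rows into readable labels, np.argmax works well; the label names and the helper below are my own additions for illustration:

```python
import numpy as np

LABELS = ["miku", "not-miku"]  # index 0 corresponds to [1, 0], index 1 to [0, 1]

def to_labels(predictions):
    """Map each softmax/one-hot row to its label name via argmax."""
    return [LABELS[i] for i in np.argmax(predictions, axis=1)]

preds = np.array([[1., 0.],
                  [1., 0.],
                  [0., 1.],
                  [0., 1.]])
print(to_labels(preds))  # ['miku', 'miku', 'not-miku', 'not-miku']
```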

All in all, tflearn is a pretty handy enhancement layer on top of TensorFlow, and its future evolution should be worth watching.
