After exploring the power of TensorFlow and CNNs in my earlier tensorflow-mnist handwriting-recognition post,
today I felt like writing a neural network that automatically decides whether a picture contains Hatsune Miku.
I had just come across tflearn, a high-level library of deep learning algorithms implemented on top of TensorFlow.
- tflearn website: http://tflearn.org/
- tflearn's MNIST example: https://github.com/tflearn/tflearn/blob/master/examples/images/convnet_mnist.py
The tflearn site states its positioning right up front: "Deep learning library featuring a higher-level API for TensorFlow."
With raw TensorFlow you have to manage every layer's input and output dimensions and definitions yourself; tflearn wraps the common deep learning methods into an API that is easier and more intuitive to use.
On to the main topic.
This post has two parts:
- Preparing the training data and training the CNN
- Testing on randomly grabbed images
Preparing the training data and training the CNN
I want to train a network that tells me whether a given picture is a picture of Miku.
First I grabbed 20 random JPG images of Miku from the web and resized them all to 100x100,
then grabbed another 20 random non-Miku images and resized them to 100x100 as well.
Those 40 images serve as my training set.
I then grabbed 5 more Miku images and 5 non-Miku images, resized them to 100x100, and used them as my validation set.
With that, the training data is ready: https://github.com/del680202/MachineLearning-memo/blob/master/src/tensorflow/cnn_dataset_mini.zip
The sample is tiny, but in theory deep learning should still deliver decent results even with little data.
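The repetitive resizing step can be scripted instead of done by hand. Here is a minimal sketch using Pillow; the function name and directory arguments are my own, not from the original post:

```python
import os
from PIL import Image

def resize_folder(src_dir, dst_dir, size=(100, 100)):
    """Resize every .jpg in src_dir to `size` and save it into dst_dir."""
    if not os.path.exists(dst_dir):
        os.makedirs(dst_dir)
    for name in sorted(os.listdir(src_dir)):
        if name.lower().endswith(".jpg"):
            img = Image.open(os.path.join(src_dir, name)).convert("RGB")
            img.resize(size).save(os.path.join(dst_dir, name))
```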
With numpy and the rest of the environment installed beforehand, TensorFlow and tflearn themselves install easily with pip.
For numpy and friends I personally use Anaconda: https://www.continuum.io/downloads
Installing tensorflow and tflearn
(reference: https://www.tensorflow.org/get_started/os_setup)
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.1-py2-none-any.whl
$ pip install $TF_BINARY_URL
$ pip install tflearn
Unzip the data prepared earlier into three directories: miku, no-miku, and test-set.
Then create a cnn.py file and fill it in step by step as follows.
First, the imports:
import tensorflow as tf
import numpy as np
import os
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression
from tflearn.data_utils import load_image
Load the training images and convert them into numpy arrays:
SCRIPT_PATH = os.path.dirname(os.path.abspath(__file__))
num = 20
imgs = []
for i in range(1, num + 1):
    imgs.append(np.asarray(load_image("%s/miku/%s.jpg" % (SCRIPT_PATH, i))))
for i in range(1, num + 1):
    imgs.append(np.asarray(load_image("%s/no-miku/%s.jpg" % (SCRIPT_PATH, i))))
imgs = np.array(imgs)
y_data = np.r_[np.c_[np.ones(num), np.zeros(num)], np.c_[np.zeros(num), np.ones(num)]]
print(imgs.shape)
print(y_data.shape)
tflearn's load_image loads an image as a PIL object, and np.asarray then converts it into a numpy array.
As for the label data: the first 20 images are Miku and get the vector [1, 0]; the last 20 are not Miku and get [0, 1].
The resulting dataset and label dimensions are:
(40, 100, 100, 3) # 40 images of 100x100 pixels with 3 RGB channels
(40, 2) # labels for the 40 images
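The np.r_/np.c_ expression above is just stacking one-hot rows; with a small num it is easy to see what it builds:

```python
import numpy as np

num = 2  # pretend: 2 Miku samples followed by 2 non-Miku samples
y = np.r_[np.c_[np.ones(num), np.zeros(num)],   # num rows of [1, 0] (Miku)
          np.c_[np.zeros(num), np.ones(num)]]   # num rows of [0, 1] (not Miku)
print(y.shape)  # (4, 2)
```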
Load the validation images into numpy arrays the same way.
The first five images are Miku; the last five are not.
x_test = []
for i in range(1, 11):
    x_test.append(np.asarray(load_image("%s/test-set/%s.jpg" % (SCRIPT_PATH, i))))
x_test = np.array(x_test)
y_test = np.r_[np.c_[np.ones(5), np.zeros(5)], np.c_[np.zeros(5), np.ones(5)]]
print(x_test.shape)
print(y_test.shape)
#output
#(10, 100, 100, 3)
#(10, 2)
With the data ready, it's time to build the CNN. I copied the official example and modified it;
you can see that, compared with raw TensorFlow, there are far fewer parameters to worry about.
# Building convolutional network
network = input_data(shape=[None, 100, 100, 3], name='input')
network = conv_2d(network, 64, 5, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = conv_2d(network, 128, 5, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = fully_connected(network, 512, activation='relu')
#network = dropout(network, 0.8)
network = fully_connected(network, 1024, activation='relu')
network = dropout(network, 0.8)
network = fully_connected(network, 2, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.00001,
                     loss='categorical_crossentropy', name='target')
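As a quick sanity check on the architecture above (a hand calculation, not something tflearn prints): each max_pool_2d with kernel size 2 halves the spatial dimensions, so the 100x100 input becomes 50x50 after the first pool and 25x25 after the second, which is what the first fully_connected layer flattens.

```python
def pooled_size(size, n_pools, kernel=2):
    """Spatial width/height after n_pools non-overlapping max-pooling layers."""
    for _ in range(n_pools):
        size //= kernel
    return size

print(pooled_size(100, 2))  # 25
```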
With the CNN built, training can begin.
Train for 500 epochs and display the training progress as it runs.
After training, save the model as miku_model.tflearn.
# Training
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit({'input': imgs}, {'target': y_data}, n_epoch=500,
          validation_set=({'input': x_test}, {'target': y_test}),
          snapshot_step=100, show_metric=True, run_id='convnet_miku')
model.save('miku_model.tflearn')
During training you can watch the current status,
including the optimizer, loss, accuracy, and so on. Very handy.
---------------------------------
Run id: convnet_miku
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 40
Validation samples: 10
--
...
Training Step: 2 | total loss: 2.38131
| Adam | epoch: 002 | loss: 2.38131 - acc: 0.4950 | val_loss: 2.82867 - val_acc: 0.5000 -- iter: 40/40
After training, logs are written to /tmp/tflearn_logs by default;
you can inspect them with tensorboard:
$ tensorboard --logdir='/tmp/tflearn_logs'
After running the command above, http://localhost:6006 shows how the training progressed, along with a graph of the network.
Despite the small amount of training data, accuracy still reached roughly 80-90%. Not bad at all.
Testing on randomly grabbed images
With the model trained, let's grab a few images from the web and test it.
I prepared two Miku images, t1.jpg and t2.jpg, and two non-Miku images, t3.jpg and t4.jpg, under /tmp/,
then ran the test program below.
First, load the model we just trained:
import numpy as np
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression
from tflearn.data_utils import load_image
# Building convolutional network
network = input_data(shape=[None, 100, 100, 3], name='input')
network = conv_2d(network, 64, 5, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = conv_2d(network, 128, 5, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = fully_connected(network, 512, activation='relu')
network = fully_connected(network, 1024, activation='relu')
network = dropout(network, 0.8)
network = fully_connected(network, 2, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.00001,
                     loss='categorical_crossentropy', name='target')
model = tflearn.DNN(network)
model.load('miku_model.tflearn')
The most important part here is model.load,
which loads the miku_model.tflearn saved earlier.
The model structure must be identical to the one used at training time.
Once the model loads successfully, test it with the four images grabbed earlier.
Since the images were grabbed at arbitrary sizes, resize each one to 100x100 while loading:
# Load test data
imgs = []
num = 4
for i in range(1, num + 1):
    img = load_image("/tmp/t%s.jpg" % (i))
    img = img.resize((100, 100))
    img_arr = np.asarray(img)
    imgs.append(img_arr)
imgs = np.array(imgs)
# Predict
print(np.round(model.predict(imgs)))
#output
[[ 1. 0.]
[ 1. 0.]
[ 0. 1.]
[ 0. 1.]]
From the output, the model judged the first two images to be Miku and the last two not to be, which is exactly what we expected.
All in all, tflearn is a pretty handy enhancement layer on top of TensorFlow, and its future evolution should be worth watching.
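To turn those rounded one-hot rows into readable labels, taking np.argmax over each row works. A small sketch (the label strings are mine, matching the [Miku, not-Miku] column ordering used during training):

```python
import numpy as np

labels = ["Miku", "not Miku"]  # column 0 = Miku, column 1 = not Miku
preds = np.array([[1., 0.],
                  [1., 0.],
                  [0., 1.],
                  [0., 1.]])  # the rounded model.predict output from above
names = [labels[int(np.argmax(row))] for row in preds]
print(names)  # ['Miku', 'Miku', 'not Miku', 'not Miku']
```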