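I want to build a face recognition application, so I spent half a day looking around and plan to port a model based on OpenFace. Below is a translation of the usage guide for the open-source library face-recognition; after reading it I have a rough idea of the overall workflow. I remember writing earlier about L-Softmax -> A-Softmax -> AM-Softmax, losses that are all used in face recognition. But if recognition is based on a softmax loss, doesn't every newly added face require another round of training? I am not sure whether that is the case, and so far I have not seen another approach.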

Deep face recognition with Keras, Dlib and OpenCV


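This notebook uses a deep convolutional neural network (CNN) to extract features from input images. It follows the approach described in [1], with modifications inspired by the OpenFace project. Keras is used to implement the CNN; Dlib and OpenCV are used to align faces in the input images. Face recognition performance is evaluated on a small subset of the LFW dataset, which you can replace with your own custom dataset, for example photos of your family and friends, if you want to experiment further with this notebook. After an overview of the CNN architecture and how the model can be trained, it is demonstrated how to: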

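  • Detect, transform and crop faces in input images. This ensures that faces are aligned before they are fed into the CNN. This preprocessing step is very important for the performance of the neural network.
  • Use the CNN to extract 128-dimensional representations, or embeddings, of faces from the aligned input images. In embedding space, Euclidean distance directly corresponds to a measure of face similarity.
  • Compare input embedding vectors with labeled embedding vectors in a database. Here, a support vector machine (SVM) and a KNN classifier, trained on labeled embedding vectors, play the role of the database. Face recognition in this context means using these classifiers to predict the labels, i.e. the identities, of new inputs.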

Environment setup

For running this notebook, create and activate a new virtual environment and install the packages listed in requirements.txt with pip install -r requirements.txt. Furthermore, you'll need a local copy of Dlib's face landmarks data file for running face alignment:

CNN architecture and training


from model import create_model

nn4_small2 = create_model()

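Model training aims to learn an embedding \(f(x)\) of an image \(x\) such that the squared L2 distance between all faces of the same identity is small, while the distance between a pair of faces from different identities is large. This can be achieved with a triplet loss \(L\), which is minimized when the distance in embedding space between an anchor image \(x^a_i\) and a positive image \(x^p_i\) (same identity) is smaller than the distance between that anchor image and a negative image \(x^n_i\) (different identity) by at least a margin \(\alpha\).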

\[ \begin{aligned} L = \sum^{m}_{i=1} \left[ \lVert f(x_{i}^{a}) - f(x_{i}^{p}) \rVert_2^2 - \lVert f(x_{i}^{a}) - f(x_{i}^{n}) \rVert_2^2 + \alpha \right]_+ \end{aligned} \]

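\([z]_+\) means \(\max(z,0)\), and \(m\) is the number of triplets in the training set. The triplet loss in Keras is best implemented with a custom layer, because the loss function does not follow the usual loss(input, target) pattern. This layer calls self.add_loss to install the triplet loss: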

from tensorflow.keras import backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Layer
import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand (TF 1.x session option).
# Note: the config only takes effect once it is passed to a session.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Input for anchor, positive and negative images
in_a = Input(shape=(96, 96, 3))
in_p = Input(shape=(96, 96, 3))
in_n = Input(shape=(96, 96, 3))

# Output for anchor, positive and negative embedding vectors
# The nn4_small2 model instance is shared (Siamese network)
emb_a = nn4_small2(in_a)
emb_p = nn4_small2(in_p)
emb_n = nn4_small2(in_n)

class TripletLossLayer(Layer):
    def __init__(self, alpha, **kwargs):
        self.alpha = alpha
        super(TripletLossLayer, self).__init__(**kwargs)

    def triplet_loss(self, inputs):
        a, p, n = inputs
        # Squared L2 distances anchor-positive and anchor-negative
        p_dist = K.sum(K.square(a-p), axis=-1)
        n_dist = K.sum(K.square(a-n), axis=-1)
        return K.sum(K.maximum(p_dist - n_dist + self.alpha, 0), axis=0)

    def call(self, inputs):
        loss = self.triplet_loss(inputs)
        # Register the loss with Keras so the model can be compiled with loss=None
        self.add_loss(loss)
        return loss

# Layer that computes the triplet loss from anchor, positive and negative embedding vectors
triplet_loss_layer = TripletLossLayer(alpha=0.2, name='triplet_loss_layer')([emb_a, emb_p, emb_n])

# Model that can be trained with anchor, positive and negative images
nn4_small2_train = Model([in_a, in_p, in_n], triplet_loss_layer)
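
To make the loss formula concrete, here is a small NumPy sketch that evaluates it on two made-up triplets of 2-dimensional embeddings. The function name and the numbers are my own illustration; it mirrors the arithmetic of the Keras layer above but is not part of the notebook.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum over triplets of [||a - p||^2 - ||a - n||^2 + alpha]_+ ."""
    p_dist = np.sum(np.square(anchor - positive), axis=-1)
    n_dist = np.sum(np.square(anchor - negative), axis=-1)
    return float(np.sum(np.maximum(p_dist - n_dist + alpha, 0.0)))

# Two toy triplets with 2-dimensional embeddings (made-up numbers).
a = np.array([[0.0, 0.0], [0.0, 0.0]])
p = np.array([[0.1, 0.0], [1.0, 0.0]])
n = np.array([[1.0, 0.0], [0.5, 0.0]])

# Triplet 1: 0.01 - 1.00 + 0.2 < 0, so it contributes 0 (margin satisfied).
# Triplet 2: 1.00 - 0.25 + 0.2 = 0.95, so it is penalized.
loss = triplet_loss(a, p, n)
print(loss)
```

Only the second triplet, where the negative is closer to the anchor than the positive, contributes to the loss.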


from data import triplet_generator

# triplet_generator() creates a generator that continuously returns
# ([a_batch, p_batch, n_batch], None) tuples where a_batch, p_batch
# and n_batch are batches of anchor, positive and negative RGB images
# each having a shape of (batch_size, 96, 96, 3).
generator = triplet_generator()

nn4_small2_train.compile(loss=None, optimizer='adam')
nn4_small2_train.fit_generator(generator, epochs=10, steps_per_epoch=100)

# Please note that the current implementation of the generator only generates
# random image data. The main goal of this code snippet is to demonstrate
# the general setup for model training. In the following, we will anyway
# use a pre-trained model so we don't need a generator here that operates
# on real training data. I'll maybe provide a fully functional generator
# later.

Epoch 1/10
100/100 [==============================] - 19s 191ms/step - loss: 0.8117
Epoch 2/10
100/100 [==============================] - 5s 46ms/step - loss: 0.7971
Epoch 3/10
100/100 [==============================] - 5s 46ms/step - loss: 0.8035
Epoch 4/10
100/100 [==============================] - 5s 46ms/step - loss: 0.8018
Epoch 5/10
100/100 [==============================] - 5s 46ms/step - loss: 0.8049
Epoch 6/10
100/100 [==============================] - 5s 46ms/step - loss: 0.8009
Epoch 7/10
100/100 [==============================] - 5s 47ms/step - loss: 0.8003
Epoch 8/10
100/100 [==============================] - 5s 48ms/step - loss: 0.7995
Epoch 9/10
100/100 [==============================] - 5s 46ms/step - loss: 0.8004
Epoch 10/10
100/100 [==============================] - 5s 46ms/step - loss: 0.7998
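
The comments in the training snippet say that the generator currently yields only random image data, which is presumably why the loss stays flat across epochs. As a rough idea of what such a placeholder could look like, here is a hypothetical stand-in; it is not the actual data.triplet_generator, and the batch size of 4 and the seed are my assumptions.

```python
import numpy as np

def random_triplet_generator(batch_size=4, image_shape=(96, 96, 3), seed=0):
    """Yield ([a_batch, p_batch, n_batch], None) tuples of random images forever."""
    rng = np.random.RandomState(seed)
    while True:
        # Each batch has shape (batch_size, 96, 96, 3) with values in [0, 1).
        a_batch = rng.rand(batch_size, *image_shape).astype(np.float32)
        p_batch = rng.rand(batch_size, *image_shape).astype(np.float32)
        n_batch = rng.rand(batch_size, *image_shape).astype(np.float32)
        # The target is None because the loss is added inside the model.
        yield [a_batch, p_batch, n_batch], None

gen = random_triplet_generator()
inputs, no_target = next(gen)
print([batch.shape for batch in inputs], no_target)
```

A generator for real training would instead pick anchor and positive images from the same identity and the negative from a different one.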

<tensorflow.python.keras.callbacks.History at 0x7f5cda4385c0>


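The OpenFace project provides pre-trained models that were trained with the public face recognition datasets FaceScrub and CASIA-WebFace. The Keras-OpenFace project converted the weights of the pre-trained nn4.small2.v1 model to CSV files, which were then converted here to a binary format that Keras can load with load_weights: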

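nn4_small2_pretrained = create_model()
# Load the converted nn4.small2.v1 weights here with nn4_small2_pretrained.load_weights(...);
# without this step the model only has randomly initialized weights.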

Custom dataset


import numpy as np
import os.path

class IdentityMetadata():
    def __init__(self, base, name, file):
        # dataset base directory
        self.base = base
        # identity name
        self.name = name
        # image file name
        self.file = file

    def __repr__(self):
        return self.image_path()

    def image_path(self):
        return os.path.join(self.base, self.name, self.file)

def load_metadata(path):
    metadata = []
    # One sub-directory per identity, each containing that identity's images
    for i in sorted(os.listdir(path)):
        for f in sorted(os.listdir(os.path.join(path, i))):
            # Check file extension. Allow only jpg/jpeg files.
            ext = os.path.splitext(f)[1]
            if ext == '.jpg' or ext == '.jpeg':
                metadata.append(IdentityMetadata(path, i, f))
    return np.array(metadata)

metadata = load_metadata('images')
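
load_metadata expects one sub-directory per identity under the base directory, with that person's jpg/jpeg files inside. The sketch below builds a throwaway directory tree to show which files would be picked up. The identity and file names are hypothetical, and list_identities is a simplified stand-in of mine that additionally skips non-directories and lower-cases the extension.

```python
import os
import tempfile

def list_identities(path):
    """Return (identity, file) pairs for jpg/jpeg files, sorted like load_metadata."""
    pairs = []
    for identity in sorted(os.listdir(path)):
        identity_dir = os.path.join(path, identity)
        if not os.path.isdir(identity_dir):
            continue
        for file_name in sorted(os.listdir(identity_dir)):
            ext = os.path.splitext(file_name)[1].lower()
            if ext in ('.jpg', '.jpeg'):
                pairs.append((identity, file_name))
    return pairs

# Build a throwaway dataset with hypothetical identity and file names.
with tempfile.TemporaryDirectory() as root:
    layout = {'alice': ['a_0001.jpg', 'notes.txt'], 'bob': ['b_0001.jpeg']}
    for identity, files in layout.items():
        os.makedirs(os.path.join(root, identity))
        for file_name in files:
            open(os.path.join(root, identity, file_name), 'w').close()
    pairs = list_identities(root)

print(pairs)
```

The text file is ignored, so only the two image files end up in the result.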

Face alignment

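The nn4.small2.v1 model was trained with aligned face images, so the face images in the custom dataset must be aligned too. Here we use Dlib for face detection and OpenCV for image transformation and cropping to produce aligned 96x96 RGB face images. With the AlignDlib utility from the OpenFace project this is straightforward: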

import cv2
import matplotlib.pyplot as plt
import matplotlib.patches as patches

from align import AlignDlib

%matplotlib inline

def load_image(path):
    img = cv2.imread(path, 1)
    # OpenCV loads images with color channels
    # in BGR order. So we need to reverse them
    return img[...,::-1]

# Initialize the OpenFace face alignment utility
alignment = AlignDlib('models/landmarks.dat')

# Load an image of Jacques Chirac
jc_orig = load_image(metadata[77].image_path())

# Detect face and return bounding box
bb = alignment.getLargestFaceBoundingBox(jc_orig)

# Transform image using specified face landmark indices and crop image to 96x96
jc_aligned = alignment.align(96, jc_orig, bb, landmarkIndices=AlignDlib.OUTER_EYES_AND_NOSE)

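# Show original image
plt.subplot(131)
plt.imshow(jc_orig)

# Show original image with bounding box
plt.subplot(132)
plt.imshow(jc_orig)
plt.gca().add_patch(patches.Rectangle((bb.left(), bb.top()), bb.width(), bb.height(), fill=False, color='red'))

# Show aligned image
plt.subplot(133)
plt.imshow(jc_aligned)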

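As described in the OpenFace pre-trained models section, the model nn4.small2.v1 requires the landmark indices OUTER_EYES_AND_NOSE. Let's implement face detection, transformation and cropping as an align_image function for later reuse.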

def align_image(img):
    return alignment.align(96, img, alignment.getLargestFaceBoundingBox(img),
                           landmarkIndices=AlignDlib.OUTER_EYES_AND_NOSE)
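
Conceptually, aligning with three landmarks (the outer eyes and the nose) means finding the affine transform that moves the detected landmark positions onto fixed template positions in the 96x96 crop. Below is a minimal NumPy sketch of that idea. The landmark and template coordinates are made up, and affine_from_landmarks is a hypothetical helper of mine; as far as I understand, AlignDlib does the equivalent with OpenCV.

```python
import numpy as np

def affine_from_landmarks(src_pts, dst_pts):
    """Solve for the 2x3 matrix M with M @ [x, y, 1]^T = [x', y']^T for three point pairs."""
    src = np.hstack([np.asarray(src_pts, dtype=np.float64), np.ones((3, 1))])  # shape (3, 3)
    dst = np.asarray(dst_pts, dtype=np.float64)                                # shape (3, 2)
    # src @ M.T = dst, so M.T is the solution of a 3x3 linear system.
    return np.linalg.solve(src, dst).T

# Made-up landmark positions (left eye, right eye, nose) in the input image ...
detected = [(120.0, 150.0), (200.0, 150.0), (160.0, 210.0)]
# ... and made-up target positions inside a 96x96 crop.
template = [(24.0, 32.0), (72.0, 32.0), (48.0, 68.0)]

M = affine_from_landmarks(detected, template)

# Applying M to the detected landmarks should reproduce the template positions.
homogeneous = np.hstack([np.array(detected), np.ones((3, 1))]).T  # shape (3, 3)
mapped = (M @ homogeneous).T
print(np.round(mapped, 3))
```

Warping the whole image with such a matrix puts the eyes and nose at the same pixel positions in every crop, which is what the network was trained on.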

Embedding vectors


embedded = np.zeros((metadata.shape[0], 128))

for i, m in enumerate(metadata):
    img = load_image(m.image_path())
    img = align_image(img)
    # scale RGB values to interval [0,1]
    img = (img / 255.).astype(np.float32)
    # obtain embedding vector for image
    embedded[i] = nn4_small2_pretrained.predict(np.expand_dims(img, axis=0))[0]

Let's verify on a single triplet example that the squared L2 distance between its anchor-positive pair is smaller than the distance between its anchor-negative pair.


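def distance(emb1, emb2):
    # Squared L2 distance between two embedding vectors
    return np.sum(np.square(emb1 - emb2))

def show_pair(idx1, idx2):
    plt.figure(figsize=(8, 3))
    plt.suptitle(f'Distance = {distance(embedded[idx1], embedded[idx2]):.2f}')
    # Show both images side by side
    plt.subplot(121)
    plt.imshow(load_image(metadata[idx1].image_path()))
    plt.subplot(122)
    plt.imshow(load_image(metadata[idx2].image_path()))

show_pair(77, 78)
show_pair(77, 50)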

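As expected, the distance between the two images of Jacques Chirac is smaller than the distance between an image of Jacques Chirac and an image of Gerhard Schröder (0.30 < 1.12). But we still do not know which distance threshold \(\tau\) is the best boundary for deciding between same identity and different identity.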

Distance threshold

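To find the optimal value for \(\tau\), the face verification performance must be evaluated over a range of distance thresholds. At a given threshold, all possible pairs of embedding vectors are classified as either same identity or different identity and compared to the ground truth. Because we are dealing with skewed classes (many more negative pairs than positive pairs), we use the F1 score as the evaluation metric instead of accuracy.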

from sklearn.metrics import f1_score, accuracy_score

distances = [] # squared L2 distance between pairs
identical = [] # 1 if same identity, 0 otherwise

num = len(metadata)

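# Note: j starts at 1 rather than i + 1, so self-pairs and mirrored pairs are
# included; the pair counts quoted in the text (980 + 8821 = 99 * 99) match this loop.
for i in range(num - 1):
    for j in range(1, num):
        distances.append(distance(embedded[i], embedded[j]))
        identical.append(1 if metadata[i].name == metadata[j].name else 0)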

distances = np.array(distances)
identical = np.array(identical)

thresholds = np.arange(0.3, 1.0, 0.01)

f1_scores = [f1_score(identical, distances < t) for t in thresholds]
acc_scores = [accuracy_score(identical, distances < t) for t in thresholds]

opt_idx = np.argmax(f1_scores)
# Threshold at maximal F1 score
opt_tau = thresholds[opt_idx]
# Accuracy at maximal F1 score
opt_acc = accuracy_score(identical, distances < opt_tau)

# Plot F1 score and accuracy as function of distance threshold
plt.plot(thresholds, f1_scores, label='F1 score')
plt.plot(thresholds, acc_scores, label='Accuracy')
plt.axvline(x=opt_tau, linestyle='--', lw=1, c='lightgrey', label='Threshold')
plt.title(f'Accuracy at threshold {opt_tau:.2f} = {opt_acc:.3f}')
plt.xlabel('Distance threshold')
plt.legend()

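The face verification accuracy at \(\tau\) = 0.56 is 95.7%. This is not bad given a baseline of about 90% (8821 of 9801 pairs) for a classifier that always predicts different identity (there are 980 pos. pairs and 8821 neg. pairs), but since nn4.small2.v1 is a relatively small model it is still below what state-of-the-art models achieve (> 99%).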
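
To see why accuracy alone is a weak metric on such skewed pairs, the baseline can be worked out by hand from the pair counts quoted above. The snippet below is only that arithmetic, for a hypothetical classifier that always answers different identity; it does not use any of the notebook's data.

```python
pos_pairs = 980    # pairs with the same identity
neg_pairs = 8821   # pairs with different identities
total = pos_pairs + neg_pairs

# A classifier that always answers "different identity" never predicts a positive:
tp, fp = 0, 0
fn, tn = pos_pairs, neg_pairs

accuracy = (tp + tn) / total
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f'accuracy = {accuracy:.3f}, F1 = {f1:.3f}')
# prints: accuracy = 0.900, F1 = 0.000
```

The trivial classifier looks decent by accuracy but scores zero on F1, which is why the threshold is chosen by F1.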


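dist_pos = distances[identical == 1]
dist_neg = distances[identical == 0]

plt.figure(figsize=(12, 4))

# Histogram of distances between pairs of the same identity
plt.subplot(121)
plt.hist(dist_pos)
plt.axvline(x=opt_tau, linestyle='--', lw=1, c='lightgrey', label='Threshold')
plt.title('Distances (pos. pairs)')
plt.legend()

# Histogram of distances between pairs of different identities
plt.subplot(122)
plt.hist(dist_neg)
plt.axvline(x=opt_tau, linestyle='--', lw=1, c='lightgrey', label='Threshold')
plt.title('Distances (neg. pairs)')
plt.legend()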

Face recognition

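Given an estimate of the distance threshold \(\tau\), face recognition is now as simple as calculating the distances between an input embedding vector and all embedding vectors in a database. The input is assigned the label (i.e. identity) of the database entry with the smallest distance if that distance is less than \(\tau\), or the label unknown otherwise. This procedure also scales to large databases because it can easily be parallelized. It also supports one-shot learning, since adding only a single entry for a new identity may be sufficient to recognize new examples of that identity.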
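
A minimal sketch of this lookup, assuming the database is just an array of embeddings plus a list of labels. The identify function, the toy 2-D embeddings and the names are hypothetical; the threshold 0.56 is the value found above.

```python
import numpy as np

def identify(query, db_embeddings, db_labels, tau):
    """Return the label of the closest database entry, or 'unknown' if it is not closer than tau."""
    dists = np.sum(np.square(db_embeddings - query), axis=1)  # squared L2 distances
    best = int(np.argmin(dists))
    return db_labels[best] if dists[best] < tau else 'unknown'

# Toy 2-D "embeddings" with hypothetical labels (real ones are 128-dimensional).
db_embeddings = np.array([[0.0, 0.0], [1.0, 1.0]])
db_labels = ['alice', 'bob']
tau = 0.56

r1 = identify(np.array([0.1, 0.1]), db_embeddings, db_labels, tau)
r2 = identify(np.array([3.0, 3.0]), db_embeddings, db_labels, tau)

# Enrolling a new identity is just appending one row; nothing is retrained.
db_embeddings = np.vstack([db_embeddings, [3.0, 3.1]])
db_labels.append('carol')
r3 = identify(np.array([3.0, 3.0]), db_embeddings, db_labels, tau)

print(r1, r2, r3)
# prints: alice unknown carol
```

This also seems to answer the retraining question from the introduction: with an embedding model, adding a face only adds a database entry, whereas a softmax classifier with a fixed set of output classes would have to be retrained.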

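A more robust approach is to label the input using the top \(k\) scoring entries in the database, which is essentially KNN classification with a Euclidean distance metric. Alternatively, a linear support vector machine (SVM) can be trained on the database entries and used to classify, i.e. identify, new inputs. To train these classifiers we use 50% of the dataset, and the other 50% for evaluation.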

from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

targets = np.array([m.name for m in metadata])

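encoder = LabelEncoder()
# The encoder must be fitted on the identity names before it can transform them
encoder.fit(targets)

# Numerical encoding of identities
y = encoder.transform(targets)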

train_idx = np.arange(metadata.shape[0]) % 2 != 0
test_idx = np.arange(metadata.shape[0]) % 2 == 0

# 50 train examples of 10 identities (5 examples each)
X_train = embedded[train_idx]
# 50 test examples of 10 identities (5 examples each)
X_test = embedded[test_idx]

y_train = y[train_idx]
y_test = y[test_idx]

knn = KNeighborsClassifier(n_neighbors=1, metric='euclidean')
svc = LinearSVC()

knn.fit(X_train, y_train)
svc.fit(X_train, y_train)

acc_knn = accuracy_score(y_test, knn.predict(X_test))
acc_svc = accuracy_score(y_test, svc.predict(X_test))

print(f'KNN accuracy = {acc_knn}, SVM accuracy = {acc_svc}')
KNN accuracy = 0.96, SVM accuracy = 0.98
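
The classifier above uses a single neighbor. For the top-k variant mentioned in the text, the prediction is a majority vote among the k closest training embeddings. Here is a small NumPy sketch of that vote, with toy 2-D data and a hypothetical knn_predict helper; it only illustrates the idea and is not how scikit-learn implements it.

```python
from collections import Counter

import numpy as np

def knn_predict(query, X_train, y_train, k=3):
    """Majority vote over the k training embeddings closest to query (Euclidean distance)."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: three examples of identity 0 and two of identity 1.
X_train = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [1.0, 1.0], [1.1, 1.0]])
y_train = np.array([0, 0, 0, 1, 1])

pred_a = knn_predict(np.array([0.1, 0.1]), X_train, y_train)
pred_b = knn_predict(np.array([1.0, 0.9]), X_train, y_train)
print(pred_a, pred_b)
# prints: 0 1
```

For the second query, the three nearest neighbors are two examples of identity 1 and one of identity 0, so the vote returns 1.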


import warnings
# Suppress LabelEncoder warning
warnings.filterwarnings('ignore')

example_idx = 6

example_image = load_image(metadata[test_idx][example_idx].image_path())
example_prediction = svc.predict([embedded[test_idx][example_idx]])
example_identity = encoder.inverse_transform(example_prediction)[0]

plt.imshow(example_image)
plt.title(f'Recognized as {example_identity}')


Dataset visualization

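To embed the dataset into 2D space and display the identity clusters, t-distributed Stochastic Neighbor Embedding (t-SNE) is applied to the 128-dimensional embedding vectors. Apart from a few outliers, the identity clusters are well separated.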

from sklearn.manifold import TSNE

X_embedded = TSNE(n_components=2).fit_transform(embedded)

for i, t in enumerate(set(targets)):
    idx = targets == t
    plt.scatter(X_embedded[idx, 0], X_embedded[idx, 1], label=t)

plt.legend(bbox_to_anchor=(1, 1))
