Let us understand, what is neural network binary classification using CNTK, in this chapter.
Binary classification using NN is like multi-class classification, the only thing is that there are just two output nodes instead of three or more. Here, we are going to perform binary classification using a neural network by using two techniques namely one-node and two-node technique. One-node technique is more common than two-node technique.
Loading Dataset
For both these techniques to implement using NN, we will be using banknote dataset. The dataset can be downloaded from UCI Machine Learning Repository which is available at https://archive.ics.uci.edu/ml/datasets/banknote+authentication.
For our example, we will be using 50 authentic data items having class forgery = 0, and the first 50 fake items having class forgery = 1.
Preparing training & test files
There are 1372 data items in the full dataset. The raw dataset looks as follows −
3.6216, 8.6661, -2.8076, -0.44699, 0 4.5459, 8.1674, -2.4586, -1.4621, 0 … -1.3971, 3.3191, -1.3927, -1.9948, 1 0.39012, -0.14279, -0.031994, 0.35084, 1
Now, first we need to convert this raw data into two-node CNTK format, which would be as follows −
|stats 3.62160000 8.66610000 -2.80730000 -0.44699000 |forgery 0 1 |# authentic |stats 4.54590000 8.16740000 -2.45860000 -1.46210000 |forgery 0 1 |# authentic . . . |stats -1.39710000 3.31910000 -1.39270000 -1.99480000 |forgery 1 0 |# fake |stats 0.39012000 -0.14279000 -0.03199400 0.35084000 |forgery 1 0 |# fake
You can use the following python program to create CNTK-format data from Raw data −
fin = open(".\\...", "r") #provide the location of saved dataset text file. for line in fin: line = line.strip() tokens = line.split(",") if tokens[4] == "0": print("|stats %12.8f %12.8f %12.8f %12.8f |forgery 0 1 |# authentic" % \ (float(tokens[0]), float(tokens[1]), float(tokens[2]), float(tokens[3])) ) else: print("|stats %12.8f %12.8f %12.8f %12.8f |forgery 1 0 |# fake" % \ (float(tokens[0]), float(tokens[1]), float(tokens[2]), float(tokens[3])) ) fin.close()
Two-node binary Classification model
There is very little difference between two-node classification and multi-class classification. Here we first, need to process the data files in CNTK format and for that we are going to use the helper function named create_reader as follows −
def create_reader(path, input_dim, output_dim, rnd_order, sweeps): x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False) y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False) streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm) deserial = C.io.CTFDeserializer(path, streams) mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps) return mb_src
Now, we need to set the architecture arguments for our NN and also provide the location of the data files. It can be done with the help of following python code −
def main(): print("Using CNTK version = " + str(C.__version__) + "\n") input_dim = 4 hidden_dim = 10 output_dim = 2 train_file = ".\\...\\" #provide the name of the training file test_file = ".\\...\\" #provide the name of the test file
Now, with the help of following code line our program will create the untrained NN −
X = C.ops.input_variable(input_dim, np.float32) Y = C.ops.input_variable(output_dim, np.float32) with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)): hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X) oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer) nnet = oLayer model = C.ops.softmax(nnet)
Now, once we created the dual untrained model, we need to set up a Learner algorithm object and afterwards use it to create a Trainer training object. We are going to use SGD learner and cross_entropy_with_softmax loss function −
tr_loss = C.cross_entropy_with_softmax(nnet, Y) tr_clas = C.classification_error(nnet, Y) max_iter = 500 batch_size = 10 learn_rate = 0.01 learner = C.sgd(nnet.parameters, learn_rate) trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
Now, once we finished with Trainer object, we need to create a reader function to read the training data −
rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT) banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
Now, it is time to train our NN model −
for i in range(0, max_iter): curr_batch = rdr.next_minibatch(batch_size, input_map=iris_input_map) trainer.train_minibatch(curr_batch) if i % 500 == 0: mcee = trainer.previous_minibatch_loss_average macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100 print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " \ % (i, mcee, macc))
Once training is completed, let us evaluate the model using test data items −
print("\nEvaluating test data \n") rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1) banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src } num_test = 20 all_test = rdr.next_minibatch(num_test, input_map=iris_input_map) acc = (1.0 - trainer.test_minibatch(all_test)) * 100 print("Classification accuracy = %0.2f%%" % acc)
After evaluating the accuracy of our trained NN model, we will be using it for making a prediction on unseen data −
np.set_printoptions(precision = 1, suppress=True) unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32) print("\nPredicting Banknote authenticity for input features: ") print(unknown[0]) pred_prob = model.eval(unknown) np.set_printoptions(precision = 4, suppress=True) print("Prediction probabilities are: ") print(pred_prob[0]) if pred_prob[0,0] < pred_prob[0,1]: print(“Prediction: authentic”) else: print(“Prediction: fake”)
Complete Two-node Classification Model
def create_reader(path, input_dim, output_dim, rnd_order, sweeps): x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False) y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False) streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm) deserial = C.io.CTFDeserializer(path, streams) mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps) return mb_src def main(): print("Using CNTK version = " + str(C.__version__) + "\n") input_dim = 4 hidden_dim = 10 output_dim = 2 train_file = ".\\...\\" #provide the name of the training file test_file = ".\\...\\" #provide the name of the test file X = C.ops.input_variable(input_dim, np.float32) Y = C.ops.input_variable(output_dim, np.float32) withC.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)): hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X) oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer) nnet = oLayer model = C.ops.softmax(nnet) tr_loss = C.cross_entropy_with_softmax(nnet, Y) tr_clas = C.classification_error(nnet, Y) max_iter = 500 batch_size = 10 learn_rate = 0.01 learner = C.sgd(nnet.parameters, learn_rate) trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner]) rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT) banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src } for i in range(0, max_iter): curr_batch = rdr.next_minibatch(batch_size, input_map=iris_input_map) trainer.train_minibatch(curr_batch) if i % 500 == 0: mcee = trainer.previous_minibatch_loss_average macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100 print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " \ % (i, mcee, macc)) print("\nEvaluating test data \n") rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1) banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src } num_test = 20 all_test = rdr.next_minibatch(num_test, input_map=iris_input_map) acc = (1.0 - trainer.test_minibatch(all_test)) * 100 print("Classification accuracy = %0.2f%%" % acc) np.set_printoptions(precision = 1, suppress=True) unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32) print("\nPredicting Banknote authenticity for input features: ") print(unknown[0]) pred_prob = model.eval(unknown) np.set_printoptions(precision = 4, suppress=True) print("Prediction probabilities are: ") print(pred_prob[0]) if pred_prob[0,0] < pred_prob[0,1]: print(“Prediction: authentic”) else: print(“Prediction: fake”) if __name__== ”__main__”: main()
Output
Using CNTK version = 2.7 batch 0: mean loss = 0.6928, accuracy = 80.00% batch 50: mean loss = 0.6877, accuracy = 70.00% batch 100: mean loss = 0.6432, accuracy = 80.00% batch 150: mean loss = 0.4978, accuracy = 80.00% batch 200: mean loss = 0.4551, accuracy = 90.00% batch 250: mean loss = 0.3755, accuracy = 90.00% batch 300: mean loss = 0.2295, accuracy = 100.00% batch 350: mean loss = 0.1542, accuracy = 100.00% batch 400: mean loss = 0.1581, accuracy = 100.00% batch 450: mean loss = 0.1499, accuracy = 100.00% Evaluating test data Classification accuracy = 84.58% Predicting banknote authenticity for input features: [0.6 1.9 -3.3 -0.3] Prediction probabilities are: [0.7847 0.2536] Prediction: fake
One-node binary Classification model
The implementation program is almost like we have done above for two-node classification. The main change is that when using the two-node classification technique.
We can use the CNTK built-in classification_error() function, but in case of one-node classification CNTK doesn’t support classification_error() function. That’s the reason we need to implement a program-defined function as follows −
def class_acc(mb, x_var, y_var, model): num_correct = 0; num_wrong = 0 x_mat = mb[x_var].asarray() y_mat = mb[y_var].asarray() for i in range(mb[x_var].shape[0]): p = model.eval(x_mat[i] y = y_mat[i] if p[0,0] < 0.5 and y[0,0] == 0.0 or p[0,0] >= 0.5 and y[0,0] == 1.0: num_correct += 1 else: num_wrong += 1 return (num_correct * 100.0)/(num_correct + num_wrong)
With that change let’s see the complete one-node classification example −
Complete one-node Classification Model
import numpy as np import cntk as C def create_reader(path, input_dim, output_dim, rnd_order, sweeps): x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False) y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False) streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm) deserial = C.io.CTFDeserializer(path, streams) mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps) return mb_src def class_acc(mb, x_var, y_var, model): num_correct = 0; num_wrong = 0 x_mat = mb[x_var].asarray() y_mat = mb[y_var].asarray() for i in range(mb[x_var].shape[0]): p = model.eval(x_mat[i] y = y_mat[i] if p[0,0] < 0.5 and y[0,0] == 0.0 or p[0,0] >= 0.5 and y[0,0] == 1.0: num_correct += 1 else: num_wrong += 1 return (num_correct * 100.0)/(num_correct + num_wrong) def main(): print("Using CNTK version = " + str(C.__version__) + "\n") input_dim = 4 hidden_dim = 10 output_dim = 1 train_file = ".\\...\\" #provide the name of the training file test_file = ".\\...\\" #provide the name of the test file X = C.ops.input_variable(input_dim, np.float32) Y = C.ops.input_variable(output_dim, np.float32) with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)): hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X) oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer) model = oLayer tr_loss = C.cross_entropy_with_softmax(model, Y) max_iter = 1000 batch_size = 10 learn_rate = 0.01 learner = C.sgd(model.parameters, learn_rate) trainer = C.Trainer(model, (tr_loss), [learner]) rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT) banknote_input_map = {X : rdr.streams.x_src, Y : rdr.streams.y_src } for i in range(0, max_iter): curr_batch = rdr.next_minibatch(batch_size, input_map=iris_input_map) trainer.train_minibatch(curr_batch) if i % 100 == 0: mcee=trainer.previous_minibatch_loss_average ca = class_acc(curr_batch, X,Y, model) print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " \ % (i, mcee, ca)) print("\nEvaluating test data \n") rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1) banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src } num_test = 20 all_test = rdr.next_minibatch(num_test, input_map=iris_input_map) acc = class_acc(all_test, X,Y, model) print("Classification accuracy = %0.2f%%" % acc) np.set_printoptions(precision = 1, suppress=True) unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32) print("\nPredicting Banknote authenticity for input features: ") print(unknown[0]) pred_prob = model.eval({X:unknown}) print("Prediction probability: ") print(“%0.4f” % pred_prob[0,0]) if pred_prob[0,0] < 0.5: print(“Prediction: authentic”) else: print(“Prediction: fake”) if __name__== ”__main__”: main()
Output
Using CNTK version = 2.7 batch 0: mean loss = 0.6936, accuracy = 10.00% batch 100: mean loss = 0.6882, accuracy = 70.00% batch 200: mean loss = 0.6597, accuracy = 50.00% batch 300: mean loss = 0.5298, accuracy = 70.00% batch 400: mean loss = 0.4090, accuracy = 100.00% batch 500: mean loss = 0.3790, accuracy = 90.00% batch 600: mean loss = 0.1852, accuracy = 100.00% batch 700: mean loss = 0.1135, accuracy = 100.00% batch 800: mean loss = 0.1285, accuracy = 100.00% batch 900: mean loss = 0.1054, accuracy = 100.00% Evaluating test data Classification accuracy = 84.00% Predicting banknote authenticity for input features: [0.6 1.9 -3.3 -0.3] Prediction probability: 0.8846 Prediction: fake