This chapter will help you understand neural network regression with CNTK.
Introduction
As we know, we use regression to predict a numeric value from one or more predictor variables. Let's take the example of predicting the median value of a house in, say, one of 100 towns. To do so, we have data that includes:
- A crime statistic for each town.
- The age of the houses in each town.
- A measure of the distance from each town to a prime location.
- The student-to-teacher ratio in each town.
- A racial demographic statistic for each town.
- The median house value in each town.
Based on the first five of these, the predictor variables, we would like to predict the median house value. For this, we can create a linear regression model along the lines of:
Y = a0 + a1(crime) + a2(house-age) + a3(distance) + a4(ratio) + a5(racial)
In the above equation:
Y is the predicted median value,
a0 is a constant, and
a1 through a5 are constants associated with the five predictors discussed above.
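To make the equation concrete, here is a minimal sketch in plain Python. The coefficient values are made up purely to show the arithmetic (they are not fitted to the Boston data), and the function name predict_median_value is hypothetical.

```python
def predict_median_value(coeffs, predictors):
    """Evaluate Y = a0 + a1*x1 + ... + a5*x5.

    coeffs     -- [a0, a1, ..., a5]; a0 is the constant term
    predictors -- [crime, house_age, distance, ratio, racial]
    """
    if len(coeffs) != len(predictors) + 1:
        raise ValueError("expected one coefficient per predictor plus a constant")
    y = coeffs[0]
    for a, x in zip(coeffs[1:], predictors):
        y += a * x
    return y


# Hypothetical coefficients, chosen only to illustrate the calculation.
example_coeffs = [30.0, -0.5, -0.05, 1.0, -0.5, 0.01]
example_town = [0.09, 50.00, 4.5, 17.00, 350.00]
example_prediction = predict_median_value(example_coeffs, example_town)
print("Predicted median value: %0.2f (x1000)" % example_prediction)
```

In a real linear regression, the constants a0 through a5 would be estimated from the training data rather than chosen by hand.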
An alternative approach is to use a neural network, which can often produce a more accurate prediction model because it can capture nonlinear relationships between the predictors and the target.
Here, we will be creating a neural network regression model by using CNTK.
Loading Dataset
To implement neural network regression using CNTK, we will use the Boston area house values dataset. The dataset can be downloaded from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/machine-learning-databases/housing/. It has a total of 14 variables and 506 instances.
For our program, however, we will use only six of the 14 variables and 100 instances. Of the six variables, five serve as predictors and one is the value to predict. Of the 100 instances, we will use 80 for training and 20 for testing. The value we want to predict is the median house price in a town. Let's look at the five predictors we will be using:
- Crime per capita in the town: we would expect smaller values of this predictor to be associated with higher house values.
- Proportion of owner-occupied units built before 1940: we would expect smaller values of this predictor to be associated with higher house values, because a larger value means older housing.
- Weighted distance of the town to five Boston employment centers.
- Area school pupil-to-teacher ratio.
- An indirect metric of the proportion of black residents in the town.
Preparing training & test files
As we did before, we first need to convert the raw data into CNTK format. We will use the first 80 data items for training; in tab-delimited CNTK format they look as follows:
|predictors 1.612820 96.90 3.76 21.00 248.31 |medval 13.50
|predictors 0.064170 68.20 3.36 19.20 396.90 |medval 18.90
|predictors 0.097440 61.40 3.38 19.20 377.56 |medval 20.00
. . .
The next 20 items, also converted into CNTK format, will be used for testing.
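The conversion and the 80/20 split described above can be sketched in plain Python. This is only an illustration: the helper names to_ctf_line and split_train_test are hypothetical, the number formatting simply mirrors the sample lines, and a tab is assumed as the separator between the two fields.

```python
def to_ctf_line(predictors, medval):
    """Format one data item as a line in CNTK text format."""
    # The sample lines show the first predictor with six decimals, the rest with two.
    fields = ["%0.6f" % predictors[0]] + ["%0.2f" % p for p in predictors[1:]]
    return "|predictors " + " ".join(fields) + "\t|medval %0.2f" % medval


def split_train_test(items, num_train=80, num_test=20):
    """Return the first num_train items and the next num_test items."""
    return items[:num_train], items[num_train:num_train + num_test]


# The three data items shown above, as (predictors, medval) pairs.
rows = [
    ([1.612820, 96.90, 3.76, 21.00, 248.31], 13.50),
    ([0.064170, 68.20, 3.36, 19.20, 396.90], 18.90),
    ([0.097440, 61.40, 3.38, 19.20, 377.56], 20.00),
]
ctf_lines = [to_ctf_line(p, m) for p, m in rows]
for line in ctf_lines:
    print(line)
```

With all 100 selected items in a list, split_train_test would give the 80 training items and the 20 test items, which can then be written to two separate files.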
Constructing Regression model
First, we need to process the data files in CNTK format. For that, we will use a helper function named create_reader, as follows:
def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='predictors', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='medval', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src
Next, we need to create a helper function that accepts a CNTK mini-batch object and computes a custom accuracy metric.
def mb_accuracy(mb, x_var, y_var, model, delta):
    num_correct = 0
    num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        v = model.eval(x_mat[i])
        y = y_mat[i]
        # a prediction counts as correct if it is within delta of the target
        if np.abs(v[0,0] - y[0,0]) < delta:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)
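Regression has no built-in notion of a "correct" prediction, which is why the helper above counts a prediction as correct when it lies within delta of the true value. The same idea, stripped of the CNTK mini-batch handling, can be sketched as follows; the function name accuracy_within_delta and the sample numbers are made up for illustration.

```python
def accuracy_within_delta(predicted, actual, delta):
    """Percentage of predictions whose absolute error is below delta."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    num_correct = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < delta)
    return num_correct * 100.0 / len(predicted)


# Made-up predictions and targets, in thousands of dollars.
demo_predicted = [20.0, 15.0, 30.0, 10.0]
demo_actual = [21.0, 19.0, 28.5, 10.0]
demo_accuracy = accuracy_within_delta(demo_predicted, demo_actual, delta=3.0)
print("accuracy = %0.2f%%" % demo_accuracy)
```

Here three of the four predictions are within 3.0 of their targets, so the sketch reports 75%. The choice of delta is a judgment call: a larger delta makes the metric more forgiving.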
Now we need to set the architecture arguments for our NN and provide the location of the data files. This can be done with the following Python code:
def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 5
    hidden_dim = 20
    output_dim = 1
    train_file = ".\\...\\"  # provide the name of the training file (80 data items)
    test_file = ".\\...\\"  # provide the name of the test file (20 data items)
Now, with the following code, our program creates the untrained NN:
X = C.ops.input_variable(input_dim, np.float32)
Y = C.ops.input_variable(output_dim, np.float32)
with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
    hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
    oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
model = C.ops.alias(oLayer)
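The code above builds a 5-20-1 network: five inputs, a hidden layer of 20 nodes with tanh activation, and a single output node with no activation, so the output can take any numeric value. As a rough illustration of what such a network computes, here is a sketch of the forward pass in plain Python. It does not use CNTK, and the function names and weight layout are assumptions made for this example only.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer regression network.

    x        -- list of input values
    w_hidden -- w_hidden[i][j] is the weight from input i to hidden node j
    b_hidden -- one bias per hidden node
    w_out    -- one weight per hidden node (single output node)
    b_out    -- output bias
    """
    hidden = []
    for j in range(len(b_hidden)):
        total = b_hidden[j]
        for i in range(len(x)):
            total += x[i] * w_hidden[i][j]
        hidden.append(math.tanh(total))  # tanh activation on the hidden layer
    # No activation on the output node: the raw weighted sum is the prediction.
    return b_out + sum(h * w for h, w in zip(hidden, w_out))


def count_parameters(input_dim, hidden_dim, output_dim):
    """Weights plus biases for an input-hidden-output network."""
    return (input_dim * hidden_dim + hidden_dim) + (hidden_dim * output_dim + output_dim)


num_params = count_parameters(5, 20, 1)
print("trainable parameters:", num_params)
```

For the 5-20-1 architecture this gives (5 * 20 + 20) + (20 * 1 + 1) = 141 weights and biases, which is the number of values that training has to adjust.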
Once we have created the untrained model, we need to set up a learner algorithm object. We will use the SGD learner and the squared_error loss function:
tr_loss = C.squared_error(model, Y)
max_iter = 3000
batch_size = 5
base_learn_rate = 0.02
sch = C.learning_parameter_schedule([base_learn_rate, base_learn_rate/2],
    minibatch_size=batch_size, epoch_size=int((max_iter*batch_size)/2))
learner = C.sgd(model.parameters, sch)
trainer = C.Trainer(model, (tr_loss), [learner])
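The schedule above lists two learning rates, 0.02 and 0.01, with epoch_size set to half of the total number of training samples processed (3000 * 5 / 2 = 7500). The following sketch shows which rate would be in effect at a given iteration. It assumes that each rate in the list applies to one block of epoch_size samples and that the last rate stays in effect afterwards; the function learning_rate_at is hypothetical and not part of CNTK.

```python
def learning_rate_at(iteration, rates, batch_size, epoch_size):
    """Learning rate in effect for a given 0-based mini-batch iteration.

    Assumes rates[k] applies to samples k*epoch_size .. (k+1)*epoch_size - 1
    and that the last rate stays in effect afterwards.
    """
    samples_seen = iteration * batch_size
    index = min(samples_seen // epoch_size, len(rates) - 1)
    return rates[index]


max_iter = 3000
batch_size = 5
base_learn_rate = 0.02
epoch_size = int((max_iter * batch_size) / 2)  # 7500 samples
rates = [base_learn_rate, base_learn_rate / 2]

for it in (0, 1499, 1500, 2999):
    print(it, learning_rate_at(it, rates, batch_size, epoch_size))
```

Under these assumptions, the first 1500 iterations train with the base rate and the remaining 1500 with half of it, which lets the model take larger steps early on and smaller ones later.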
Once the learner object is set up, we need to create a reader to read the training data:
rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
Now it's time to train our NN model:
for i in range(0, max_iter):
    curr_batch = rdr.next_minibatch(batch_size, input_map=boston_input_map)
    trainer.train_minibatch(curr_batch)
    if i % int(max_iter/10) == 0:
        mcee = trainer.previous_minibatch_loss_average
        acc = mb_accuracy(curr_batch, X, Y, model, delta=3.00)
        print("batch %4d: mean squared error = %8.4f, accuracy = %5.2f%% "
            % (i, mcee, acc))
Once training is done, let's evaluate the model using the test data items:
print("\nEvaluating test data \n")
rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
num_test = 20
all_test = rdr.next_minibatch(num_test, input_map=boston_input_map)
acc = mb_accuracy(all_test, X, Y, model, delta=3.00)
print("Prediction accuracy = %0.2f%%" % acc)
After evaluating the accuracy of our trained NN model, we will use it to make a prediction on unseen data:
np.set_printoptions(precision = 2, suppress=True)
unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
print("\nPredicting median home value for feature/predictor values: ")
print(unknown[0])
pred_value = model.eval({X: unknown})
print("\nPredicted value is: ")
print("$%0.2f (x1000)" % pred_value[0,0])
Complete Regression Model
import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='predictors', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='medval', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def mb_accuracy(mb, x_var, y_var, model, delta):
    num_correct = 0
    num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        v = model.eval(x_mat[i])
        y = y_mat[i]
        # a prediction counts as correct if it is within delta of the target
        if np.abs(v[0,0] - y[0,0]) < delta:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 5
    hidden_dim = 20
    output_dim = 1
    train_file = ".\\...\\"  # provide the name of the training file (80 data items)
    test_file = ".\\...\\"  # provide the name of the test file (20 data items)

    # create the untrained network
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    model = C.ops.alias(oLayer)

    # set up the loss, learner and trainer
    tr_loss = C.squared_error(model, Y)
    max_iter = 3000
    batch_size = 5
    base_learn_rate = 0.02
    sch = C.learning_parameter_schedule([base_learn_rate, base_learn_rate/2],
        minibatch_size=batch_size, epoch_size=int((max_iter*batch_size)/2))
    learner = C.sgd(model.parameters, sch)
    trainer = C.Trainer(model, (tr_loss), [learner])

    # train
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=boston_input_map)
        trainer.train_minibatch(curr_batch)
        if i % int(max_iter/10) == 0:
            mcee = trainer.previous_minibatch_loss_average
            acc = mb_accuracy(curr_batch, X, Y, model, delta=3.00)
            print("batch %4d: mean squared error = %8.4f, accuracy = %5.2f%% "
                % (i, mcee, acc))

    # evaluate on the test data
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=boston_input_map)
    acc = mb_accuracy(all_test, X, Y, model, delta=3.00)
    print("Prediction accuracy = %0.2f%%" % acc)

    # predict for an unseen data item
    np.set_printoptions(precision = 2, suppress=True)
    unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
    print("\nPredicting median home value for feature/predictor values: ")
    print(unknown[0])
    pred_value = model.eval({X: unknown})
    print("\nPredicted value is: ")
    print("$%0.2f (x1000)" % pred_value[0,0])

if __name__ == "__main__":
    main()
Output
Using CNTK version = 2.7
batch 0: mean squared error = 385.6727, accuracy = 0.00%
batch 300: mean squared error = 41.6229, accuracy = 20.00%
batch 600: mean squared error = 28.7667, accuracy = 40.00%
batch 900: mean squared error = 48.6435, accuracy = 40.00%
batch 1200: mean squared error = 77.9562, accuracy = 80.00%
batch 1500: mean squared error = 7.8342, accuracy = 60.00%
batch 1800: mean squared error = 47.7062, accuracy = 60.00%
batch 2100: mean squared error = 40.5068, accuracy = 40.00%
batch 2400: mean squared error = 46.5023, accuracy = 40.00%
batch 2700: mean squared error = 15.6235, accuracy = 60.00%
Evaluating test data
Prediction accuracy = 64.00%
Predicting median home value for feature/predictor values:
[0.09 50. 4.5 17. 350.]
Predicted value is:
$21.02(x1000)
Saving the trained model
The Boston house value dataset has only 506 data items (of which we used only 100), so training the NN regressor takes only a few seconds. Training on a much larger dataset, however, can take hours or even days.
We can save our model so that we won't have to retrain it from scratch. With the following Python code, we can save our trained NN:
nn_regressor = ".\\neuralregressor.model"  # provide the name of the file
model.save(nn_regressor, format=C.ModelFormat.CNTKv2)
Following are the arguments of the save() function used above:
- The file name is the first argument of the save() function. It can also include the path to the file.
- The second is the format parameter, which has a default value of C.ModelFormat.CNTKv2.
Loading the trained model
Once you have saved the trained model, it is very easy to load it. We only need to use the load() function. Let's check this in the following example:
import numpy as np
import cntk as C
model = C.ops.functions.Function.load(".\\neuralregressor.model")
np.set_printoptions(precision = 2, suppress=True)
unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
print("\nPredicting area median home value for feature/predictor values: ")
print(unknown[0])
# X is not defined in this script, so use the loaded model's own input variable
pred_value = model.eval({model.arguments[0]: unknown})
print("\nPredicted value is: ")
print("$%0.2f (x1000)" % pred_value[0,0])
The benefit of a saved model is that once you load it, it can be used exactly as if it had just been trained.