Deep-belief network with DL4J

A deep-belief network can be defined as a stack of restricted Boltzmann machines in which each RBM layer communicates with both the previous and subsequent layers.
The nodes of any single layer don’t communicate with each other laterally.
This stack of RBMs might end with a Softmax layer to create a classifier, or it may simply help cluster unlabeled data in an unsupervised learning scenario.
With the exception of the first and final layers, each layer in a deep-belief network has a double role: it serves as the hidden layer to the nodes that come before it, and as the input (or “visible”) layer to the nodes that come after.
Deep-belief networks, introduced by Geoff Hinton and his students in 2006, are used to recognize, cluster, and generate images, video sequences, and motion-capture data. A continuous deep-belief network is simply an extension of a deep-belief network that accepts a continuum of decimal values rather than binary data.

Let’s create an example using the Iris dataset, one of the benchmark datasets often used when measuring the precision or accuracy of a machine learning method. The dataset contains 150 samples divided into 3 classes of 50 instances each, and each class refers to a type of Iris plant. There are 4 input features, and therefore 3 outputs. One class is linearly separable from the other two; the latter are not linearly separable from each other.
The implementation begins by setting up the configuration. Here are the variables that need setting:

final int numRows = 4;                        // number of input features per sample
final int numColumns = 1;                     // Iris data is one-dimensional, so one column
int outputNum = 3;                            // number of output classes
int numSamples = 150;                         // total number of samples in the dataset
int batchSize = 150;                          // mini-batch size (here, the whole dataset)
int iterations = 5;                           // number of training iterations
int splitTrainNum = (int) (batchSize * .8);   // 80% of the data (120 samples) for training
int seed = 123;                               // random seed for reproducibility
int listenerFreq = 1;                         // how often the score listener logs the loss

In DL4J, input data can be up to two-dimensional, so you need to specify the number of rows and columns of the data. As Iris is one-dimensional data, numColumns is set to 1. Here, numSamples is the total number of samples and batchSize is the amount of data in each mini-batch. Since the total is only 150, which is relatively small, batchSize is set to the same value; this means learning is done without splitting the data into mini-batches. splitTrainNum is the variable that decides the allocation between training data and test data: 80% of the dataset, that is (int) (150 * 0.8) = 120 samples, becomes training data, and the remaining 30 samples become test data. The listenerFreq parameter decides how often the value of the loss function is logged during training. It is set to 1 here, which means the score is logged on every iteration (and, since the whole dataset forms a single batch, effectively after each pass through the data).
Next, we need to fetch the dataset. DL4J provides classes that make it easy to fetch common benchmark datasets such as Iris, MNIST, and LFW. To fetch the Iris dataset, you just need to write the following line:

DataSetIterator iter = new IrisDataSetIterator(batchSize, numSamples);
The following two lines format the data:
DataSet next = iter.next();
next.normalizeZeroMeanZeroUnitVariance();
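
To make what this normalization does concrete, here is a minimal, purely illustrative sketch of zero-mean, unit-variance standardization applied to one made-up feature column (this mirrors the idea only, not DL4J’s internal implementation):

// Illustrative only: standardize one feature column so that x' = (x - mean) / std.
double[] column = {5.1, 4.9, 4.7, 4.6, 5.0};   // made-up sepal lengths
double sum = 0.0;
for (double v : column) sum += v;
double mean = sum / column.length;
double sumSq = 0.0;
for (double v : column) sumSq += (v - mean) * (v - mean);
double std = Math.sqrt(sumSq / column.length);
double[] standardized = new double[column.length];
for (int i = 0; i < column.length; i++) {
    standardized[i] = (column[i] - mean) / std; // resulting column has ~zero mean, unit variance
}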

This code splits the data into training data and test data and stores them respectively:

SplitTestAndTrain testAndTrain = next.splitTestAndTrain(splitTrainNum, new Random(seed));
DataSet train = testAndTrain.getTrain();
DataSet test = testAndTrain.getTest();

As you can see, DL4J makes data handling easier by wrapping all of the data it prepares in the DataSet class.
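If you want a quick sanity check of the 80/20 split, you can print the sizes of the two sets (assuming the DataSet class in the DL4J version used here exposes numExamples()):

// Optional sanity check: 120 training samples and 30 test samples are expected.
System.out.println("training examples: " + train.numExamples());  // expected 120
System.out.println("test examples: " + test.numExamples());       // expected 30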
Now, let’s actually build a model. The basic structure is as follows:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().layer().layer() … .layer().build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

The code begins by defining the model configuration and then builds and initializes the actual model with the definition. Let’s take a look at the configuration details. At the beginning, the whole network is set up:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.iterations(iterations)
.learningRate(1e-6f)
.optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
.l1(1e-1).regularization(true).l2(2e-4)
.useDropConnect(true)
.list(2)

Most of the configuration setup is self-explanatory. However, since regularization hasn’t come up before now, let’s briefly look at it.
Regularization prevents the neural network model from overfitting and makes it generalize better. To achieve this, the error function E(w) is rewritten with a penalty term as follows:

E(w) + λ||w||_p

Here, ||·||_p denotes the vector norm. The regularization is called L1 regularization when p = 1 and L2 regularization when p = 2, and the norms are called the L1 norm and L2 norm, respectively. That’s why we have .l1() and .l2() in the code. λ is the hyperparameter that controls how strongly the penalty is weighted. The L1 penalty tends to make the model sparse by driving many weights to exactly zero, while the L2 penalty, also called weight decay, keeps the weights small. (In practice, the squared L2 norm, that is, the sum of squared weights, is commonly used for the L2 penalty.)
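To make the penalty term concrete, here is a small, illustrative sketch that computes L1 and L2 penalties for a made-up weight vector; the variable names are hypothetical, and the coefficients simply reuse the .l1() and .l2() values from the configuration above:

// Illustrative only: L1 and L2 penalty terms for a made-up weight vector.
double[] w = {0.5, -0.3, 0.0, 0.8};
double lambda1 = 1e-1;               // plays the role of the .l1(1e-1) coefficient
double lambda2 = 2e-4;               // plays the role of the .l2(2e-4) coefficient
double l1Penalty = 0.0;
double l2Penalty = 0.0;
for (double wi : w) {
    l1Penalty += Math.abs(wi);       // L1 norm: sum of absolute values
    l2Penalty += wi * wi;            // squared L2 norm: sum of squares
}
double totalPenalty = lambda1 * l1Penalty + lambda2 * l2Penalty;  // added to E(w)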
The .useDropConnect() command enables DropConnect, a dropout-style technique that randomly drops connection weights rather than activations during training, and .list() defines the number of layers, excluding the input layer.
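For intuition, the following purely illustrative fragment shows dropout-style masking on made-up activation values (DropConnect applies the same idea to weights rather than activations; this is not how DL4J implements it internally):

// Illustrative dropout mask with drop probability 0.5 on made-up activations.
// Surviving activations are typically rescaled by 1 / (1 - p) ("inverted" dropout).
java.util.Random rng = new java.util.Random(123);
double p = 0.5;
double[] activations = {0.2, 0.7, 0.1, 0.9};
double[] masked = new double[activations.length];
for (int i = 0; i < activations.length; i++) {
    boolean drop = rng.nextDouble() < p;
    masked[i] = drop ? 0.0 : activations[i] / (1.0 - p);
}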
Once the whole model is set up, the next step is to configure each layer. In this sample code, the model is not yet deep: a single RBM layer is defined as the hidden layer:

.layer(0, new RBM.Builder(RBM.HiddenUnit.RECTIFIED, RBM.VisibleUnit.GAUSSIAN)
.nIn(numRows * numColumns)
.nOut(3)
.weightInit(WeightInit.XAVIER)
.k(1)
.activation("relu")
.lossFunction(LossFunctions.LossFunction.RMSE_XENT)
.updater(Updater.ADAGRAD)
.dropOut(0.5)
.build()
)

Here, the value of 0 in the first line is the layer’s index, and .k() sets the number of Gibbs sampling steps used in contrastive divergence. Since the Iris data consists of continuous float values, we can’t use a binary RBM; that’s why we have RBM.VisibleUnit.GAUSSIAN here, enabling the model to handle continuous input. Also worth mentioning in this layer definition is Updater.ADAGRAD, which adapts the learning rate per parameter during training (a small sketch of the idea follows the output-layer definition below). The subsequent output layer is very simple and self-explanatory:

.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
.nIn(3)
.nOut(outputNum)
.activation("softmax")
.build()
)
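
As mentioned above, here is a minimal, illustrative sketch of the AdaGrad idea behind Updater.ADAGRAD: the squared gradients of each parameter are accumulated, and the effective step size shrinks as that accumulator grows (made-up numbers, not the DL4J implementation):

// Illustrative AdaGrad update for a single parameter (not the DL4J internals).
double learningRate = 1e-6;              // the global learning rate from the configuration
double epsilon = 1e-8;                   // small constant for numerical stability
double accumulatedSquaredGrad = 0.0;
double parameter = 0.1;                  // made-up initial weight
double[] gradients = {0.3, -0.2, 0.25};  // made-up gradients from successive iterations
for (double g : gradients) {
    accumulatedSquaredGrad += g * g;
    parameter -= learningRate * g / (Math.sqrt(accumulatedSquaredGrad) + epsilon);
}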

Thus, the network has been built with three layers: an input layer, a hidden layer, and an output layer. The graphical model of this example can be illustrated as follows:

[Figure: graphical model of the three-layer network — 4 input units, 3 hidden units, 3 output units]

After building the model, we need to train the network. Here, again, the code is very simple:
model.setListeners(Arrays.asList((IterationListener) new ScoreIterationListener(listenerFreq)));
model.fit(train);
The first line only attaches a listener that logs the score during training, so all we need to do to train the model is call model.fit(train).
Testing or evaluating the model is also easy with DL4J. First, the variables for evaluation are set up as follows:
Evaluation eval = new Evaluation(outputNum);
INDArray output = model.output(test.getFeatureMatrix());
Then, the predicted outputs are compared against the true labels, and the evaluation statistics are logged:
eval.eval(test.getLabels(), output);
log.info(eval.stats());
By running the code, we will have the result as follows:

==========================Scores=====================================
Accuracy: 0.7667
Precision: 1
Recall: 0.7667
F1 Score: 0.8679245283018869
=====================================================================

F1 Score, also called F-Score or F-measure, is the harmonic mean of precision and recall, and is represented as follows:

F1 = 2 × (precision × recall) / (precision + recall)
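
As a quick check, the reported F1 value can be recomputed from the precision and recall above:

// Recomputing the F1 score from the reported precision and recall.
double precision = 1.0;
double recall = 0.7667;
double f1 = 2 * precision * recall / (precision + recall);
System.out.println(f1);  // ≈ 0.868, matching the reported F1 Score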

This value is also often calculated to measure a model’s performance. In addition, as shown in the example, you can see the actual values and predicted values by writing the following:

for (int i = 0; i < output.rows(); i++) {
String actual = test.getLabels().getRow(i).toString().trim();
String predicted = output.getRow(i).toString().trim();
log.info("actual " + actual + " vs predicted " + predicted);
}

That’s it for the whole training and test process. The network in the preceding code is not deep, but you can easily build a deep neural network just by changing the configuration as follows:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.iterations(iterations)
.learningRate(1e-6f)
.optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
.l1(1e-1).regularization(true).l2(2e-4)
.useDropConnect(true)
.list(3)
.layer(0, new RBM.Builder(RBM.HiddenUnit.RECTIFIED, RBM.VisibleUnit.GAUSSIAN)
.nIn(numRows * numColumns)
.nOut(4)
.weightInit(WeightInit.XAVIER)
.k(1)
.activation("relu")
.lossFunction(LossFunctions.LossFunction.RMSE_XENT)
.updater(Updater.ADAGRAD)
.dropOut(0.5)
.build()
)
.layer(1, new RBM.Builder(RBM.HiddenUnit.RECTIFIED, RBM.VisibleUnit.GAUSSIAN)
.nIn(4)
.nOut(3)
.weightInit(WeightInit.XAVIER)
.k(1)
.activation("relu")
.lossFunction(LossFunctions.LossFunction.RMSE_XENT)
.updater(Updater.ADAGRAD)
.dropOut(0.5)
.build()
)
.layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
.nIn(3)
.nOut(outputNum)
.activation("softmax")
.build()
)
.build();

As you can see, building deep neural networks requires only a small amount of code with DL4J. Once you have set up the model, what remains is to tune the parameters; for example, increasing the iterations value or changing the seed value may return a better result.

References:

Java Deep Learning Essentials
By: Yusuke Sugomori

deeplearning4j.org
