Deep learning for complete beginners

Deep learning for complete beginners: Recognising handwritten digits, by Cambridge Coding Academy

Introduction

Welcome to the first in a series of blog posts that is designed to get you quickly up to speed with deep learning; from first principles, all the way to discussions of some of the intricate details, with the purpose of achieving respectable performance on two established machine learning benchmarks: MNIST (classification of handwritten digits) and CIFAR-10 (classification of small images across 10 distinct classes—airplane, automobile, bird, cat, deer, dog, frog, horse, ship & truck).

[Figure: sample images from MNIST and CIFAR-10]

The accelerated growth of deep learning has led to the development of several very convenient frameworks, which allow us to rapidly construct and prototype our models, as well as offering no-hassle access to established benchmarks such as the aforementioned two. The particular environment we will be using is Keras, which I’ve found to be the most convenient and intuitive for essential use, but still expressive enough to allow detailed model tinkering when it is necessary.

By the end of this part of the tutorial, you should be capable of understanding and producing a simple multilayer perceptron (MLP) deep learning model in Keras, achieving a respectable level of accuracy on MNIST. The next tutorial in the series will explore techniques for handling larger image classification tasks (such as CIFAR-10).

(Artificial) neurons

While the term “deep learning” allows for a broader interpretation, in practice, for the vast majority of cases, it is applied to the model of (artificial) neural networks. These biologically inspired structures attempt to mimic the way in which the neurons in the brain process percepts from the environment and drive decision-making. In fact, a single artificial neuron (sometimes also called a perceptron) has a very simple mode of operation—it computes a weighted sum of all of its inputs $\vec{x}$, using a weight vector $\vec{w}$ (along with an additive bias term, $w_0$), and then potentially applies an activation function, $\sigma$, to the result.

Some of the popular choices for activation functions include (a quick NumPy sketch of these is given below the list):
identity: $\sigma(z) = z$;
sigmoid: especially the logistic function, $\sigma(z) = \frac{1}{1 + \exp(-z)}$, and the hyperbolic tangent, $\sigma(z) = \tanh z$;
rectified linear (ReLU): $\sigma(z) = \max(0, z)$.
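
To make these concrete, here is a minimal NumPy sketch of the three activations listed above (the function names are mine, purely for illustration):

import numpy as np

def identity(z):
    return z # identity: output equals input

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z)) # logistic sigmoid: outputs in (0, 1)

def relu(z):
    return np.maximum(0, z) # ReLU: negative inputs are clipped to zero

z = np.linspace(-3, 3, 7)
print(identity(z), logistic(z), np.tanh(z), relu(z)) # tanh comes straight from NumPy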

Original perceptron models (from the 1950s) were fully linear, i.e. they only employed the identity activation. It soon became evident that tasks of interest are often nonlinear in nature, which led to the use of other activation functions. Sigmoid functions (owing their name to their characteristic “S”-shaped plot) provide a nice way to encode the initial “uncertainty” of a neuron in a binary decision, when $z$ is close to zero, coupled with quick saturation as $z$ shifts in either direction. The two functions presented here are very similar, with the hyperbolic tangent giving outputs within $[-1, 1]$, and the logistic function giving outputs within $[0, 1]$ (and therefore being useful for representing probabilities).

In recent years, ReLU activations (and variations thereof) have become ubiquitous in deep learning—they started out as a simple, “engineer’s” way to inject nonlinearity into the model (“if it’s negative, set it to zero”), but turned out to be far more successful than the historically more popular sigmoid activations, and also have been linked to the way physical neurons transmit electrical potential. As such, we will focus exclusively on them in this tutorial.

A neuron is completely specified by its weight vector $\vec{w}$, and the key aim of a learning algorithm is to assign appropriate weights to the neuron based on a training set of known input/output pairs, such that the notion of a “predictive error/loss” will be minimised when applying the inputs within the training set to the neuron. One typical example of such a learning algorithm is gradient descent, which will, for a particular loss function $E(\vec{w})$, update the weight vector in the direction of steepest descent of the loss function, scaled by a learning rate parameter $\eta$:

$$\vec{w} \leftarrow \vec{w} - \eta \frac{\partial E(\vec{w})}{\partial \vec{w}}$$

The loss function represents our belief of how “incorrect” the neuron is at making predictions under its current parameter values. The simplest such choice of a loss function (one that usually works reasonably well in general settings) is the squared error loss; for a particular training example $(\vec{x}, y)$ it is defined as the squared difference between the ground-truth label $y$ and the output of the neuron when given $\vec{x}$ as input:

$$E(\vec{w}) = \left(y - \sigma\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)\right)^2$$
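
As a concrete (toy) illustration of the update rule, entirely separate from the Keras pipeline built below, a single linear neuron can be trained on one made-up example by repeatedly applying gradient descent to the squared error loss; all values here are arbitrary:

import numpy as np

x = np.array([0.5, -1.0, 2.0]) # a single made-up training input
y = 1.5 # its ground-truth label
w = np.zeros(3) # weight vector
w0 = 0.0 # additive bias term
eta = 0.1 # learning rate

for _ in range(100):
    pred = w0 + np.dot(w, x) # identity activation: sigma(z) = z
    error = pred - y
    w -= eta * 2 * error * x # gradient of (y - pred)^2 with respect to w
    w0 -= eta * 2 * error # gradient with respect to the bias
print(w0 + np.dot(w, x)) # converges towards 1.5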

There are many excellent tutorials online that provide a more in-depth overview of gradient descent algorithms—one of which may be found on this website! Here the framework will take care of the optimisation routines for us, and therefore I will not dedicate further attention to them.

Enter neural networks (& deep learning)

Once we have a notion of a neuron, it is possible to connect outputs of neurons to inputs of other neurons, giving rise to neural networks. In general we will focus on feedforward neural networks, where these neurons are typically organised in layers, such that the neurons within a single layer process the outputs of the previous layer. The most potent of such architectures (a multilayer perceptron or MLP) fully connects all outputs of a layer to all the neurons in the following layer, as illustrated below.

The output neurons’ weights can be updated by direct application of the previously mentioned gradient descent on a given loss function—for other neurons these losses need to be propagated backwards (by applying the chain rule for partial differentiation), thus giving rise to the backpropagation algorithm. As with the basic gradient descent algorithm, I will not focus on the mathematical derivations here, as the framework will take care of them for us.

By Cybenko’s universal approximation theorem, a (wide enough) MLP with a single hidden layer of sigmoid neurons is capable of approximating any continuous real function on a bounded interval; however, the proof of this theorem is not constructive, and therefore does not offer an efficient training algorithm for learning such structures in general. Deep learning represents a response to this: rather than increasing the width, increase the depth; by definition, any neural network with more than one hidden layer is considered deep.

The shift in depth also often allows us to directly feed raw input data into the network; in the past, single-layer neural networks were run on features extracted from the input by carefully crafted feature functions. This meant that significantly different approaches were needed for, e.g., the problems of computer vision, speech recognition and natural language processing, impeding scientific collaboration across these fields. However, when a network has multiple hidden layers, it gains the capability to learn, by itself, the feature functions that best describe the raw data, thus becoming applicable to end-to-end learning and allowing one to use the same kind of network across a wide variety of tasks, eliminating the need for designing feature functions from the pipeline. I will demonstrate graphical evidence of this in the second part of this tutorial, when we explore convolutional neural networks (CNNs).

Applying a deep MLP to MNIST

As this post’s objective, we will implement the simplest possible deep neural network—an MLP with two hidden layers—and apply it to the MNIST handwritten digit recognition task.

Only the following imports are required:

from keras.datasets import mnist # subroutines for fetching the MNIST dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Dense # the two types of neural network layer we will be using
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values

Next up, we’ll define some parameters of our model. These are often called hyperparameters, because they are assumed to be fixed before training starts. For the purposes of this tutorial, we will stick to using some sensible values, but keep in mind that tuning them properly is a significant issue, which will be addressed in more detail in a future tutorial.

In particular, we will define:
– The batch size, representing the number of training examples being used simultaneously during a single iteration of the gradient descent algorithm;
– The number of epochs, representing the number of times the training algorithm will iterate over the entire training set before terminating;
– The number of neurons in each of the two hidden layers of the MLP.

batch_size = 128 # in each iteration, we consider 128 training examples at once
num_epochs = 20 # we iterate twenty times over the entire training set
hidden_size = 512 # there will be 512 neurons in both hidden layers

Now it is time to load and preprocess the MNIST data set. Keras makes this extremely simple, with a fixed interface for fetching and extracting the data from the remote server, directly into NumPy arrays.

To preprocess the input data, we will first flatten the images into 1D (as we will consider each pixel as a separate input feature), and we will then force the pixel intensity values to be in the $[0, 1]$ range by dividing them by 255. This is a very simple way to “normalise” the data, and I will be discussing other ways in future tutorials in this series.

A good approach to a classification problem is to use probabilistic classification, i.e. to have a single output neuron for each class, outputting a value which corresponds to the probability of the input being of that particular class. This implies a need to transform the training output data into a “one-hot” encoding: for example, if the desired output class is 3, and there are five classes overall (labelled 0 to 4), then an appropriate one-hot encoding is: [0 0 0 1 0]. Keras, once again, provides us with out-of-the-box functionality for doing just that.
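
For instance, the encoding from the example above can be reproduced directly with the same np_utils helper we import below (a tiny sketch; the exact printed formatting depends on your NumPy version):

from keras.utils import np_utils
print(np_utils.to_categorical([3], 5)) # a single row: [0, 0, 0, 1, 0]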

num_train = 60000 # there are 60000 training examples in MNIST
num_test = 10000 # there are 10000 test examples in MNIST

height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data

X_train = X_train.reshape(num_train, height * width) # Flatten data to 1D
X_test = X_test.reshape(num_test, height * width) # Flatten data to 1D
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

Now is the time to actually define our model! To do this we will be using a stack of three Dense layers, which correspond to a fully unrestricted MLP structure, linking all of the outputs of the previous layer to the inputs of the next one. We will use ReLU activations for the neurons in the first two layers, and a softmax activation for the neurons in the final one. This activation is designed to turn any real-valued vector into a vector of probabilities, and is defined as follows, for the $j$-th neuron:

$$\sigma(\vec{z})_j = \frac{\exp(z_j)}{\sum_i \exp(z_i)}$$
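
In NumPy terms, the softmax could be sketched as follows (subtracting the maximum first is a standard numerical-stability trick, not part of the definition):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z)) # shift by the maximum for numerical stability
    return e / np.sum(e) # nonnegative values that sum to 1

print(softmax(np.array([1.0, 2.0, 3.0]))) # approximately [0.09, 0.24, 0.67]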

An excellent feature of Keras, which sets it apart from frameworks such as TensorFlow, is automatic inference of shapes; we only need to specify the shape of the input layer, and afterwards Keras will take care of initialising the weight variables with proper shapes. Once all the layers have been defined, we simply need to identify the input(s) and the output(s) in order to define our model, as illustrated below.

inp = Input(shape=(height * width,)) # Our input is a 1D vector of size 784
hidden_1 = Dense(hidden_size, activation='relu')(inp) # First hidden ReLU layer
hidden_2 = Dense(hidden_size, activation='relu')(hidden_1) # Second hidden ReLU layer
out = Dense(num_classes, activation='softmax')(hidden_2) # Output softmax layer

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

To finish off specifying the model, we need to define our loss function, the optimisation algorithm to use, and which metrics to report.

When dealing with probabilistic classification, it is actually better to use the cross-entropy loss, rather than the previously defined squared error. For a particular output probability vector $\vec{y}$, compared with our (ground truth) one-hot vector $\hat{\vec{y}}$, the loss (for $k$-class classification) is defined by

$$\mathcal{L}(\vec{y}, \hat{\vec{y}}) = -\sum_{i=1}^{k} \hat{y}_i \ln y_i$$
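
A minimal NumPy sketch of this loss for a single example (the probability values are illustrative only):

import numpy as np

def cross_entropy(y_true, y_pred):
    return -np.sum(y_true * np.log(y_pred)) # only the correct class contributes

y_true = np.array([0, 0, 0, 1, 0]) # one-hot ground truth
y_pred = np.array([0.1, 0.1, 0.1, 0.6, 0.1]) # predicted probabilities
print(cross_entropy(y_true, y_pred)) # -ln(0.6), roughly 0.51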

This loss is better for probabilistic tasks (i.e. ones with logistic/softmax output neurons), primarily because of its manner of derivation—it aims only to maximise the model’s confidence in the correct class, and is not concerned with the distribution of probabilities for other classes (while the squared error loss would dedicate equal attention to getting all of the other class probabilities as close to zero as possible). This is due to the fact that incorrect classes, i.e. classes $i'$ with $\hat{y}_{i'} = 0$, eliminate the respective neuron’s output from the loss function.

The optimisation algorithm used will typically revolve around some form of gradient descent; the key differences between algorithms lie in the manner in which the previously mentioned learning rate, $\eta$, is chosen or adapted during training. An excellent overview of such approaches is given by this blog post; here we will use the Adam optimiser, which typically performs well.
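
If you prefer to control the learning rate explicitly, Keras also accepts an optimiser object in place of the 'adam' string; a minimal sketch (the value shown is simply Adam’s usual default):

from keras.optimizers import Adam

adam = Adam(lr=0.001) # explicitly set the learning rate
# ...and pass optimizer=adam instead of optimizer='adam' in model.compile(...)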

As our classes are balanced (there is an equal number of handwritten digits across all ten classes), an appropriate metric to report is the accuracy: the proportion of the inputs classified correctly.

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

Finally, we call the training algorithm with the determined batch size and epoch count. It is good practice to set aside a fraction of the training data to be used just for verification that our algorithm is (still) properly generalising (this is commonly referred to as the validation set); here we will hold out 10% of the data for this purpose.

An excellent out-of-the-box feature of Keras is verbosity; it’s able to provide detailed real-time pretty-printing of the training algorithm’s progress.

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Train on 54000 samples, validate on 6000 samples
Epoch 1/20
54000/54000 [==============================] - 9s - loss: 0.2295 - acc: 0.9325 - val_loss: 0.1093 - val_acc: 0.9680
Epoch 2/20
54000/54000 [==============================] - 9s - loss: 0.0819 - acc: 0.9746 - val_loss: 0.0922 - val_acc: 0.9708
Epoch 3/20
54000/54000 [==============================] - 11s - loss: 0.0523 - acc: 0.9835 - val_loss: 0.0788 - val_acc: 0.9772
Epoch 4/20
54000/54000 [==============================] - 12s - loss: 0.0371 - acc: 0.9885 - val_loss: 0.0680 - val_acc: 0.9808
Epoch 5/20
54000/54000 [==============================] - 12s - loss: 0.0274 - acc: 0.9909 - val_loss: 0.0772 - val_acc: 0.9787
Epoch 6/20
54000/54000 [==============================] - 12s - loss: 0.0218 - acc: 0.9931 - val_loss: 0.0718 - val_acc: 0.9808
Epoch 7/20
54000/54000 [==============================] - 12s - loss: 0.0204 - acc: 0.9933 - val_loss: 0.0891 - val_acc: 0.9778
Epoch 8/20
54000/54000 [==============================] - 13s - loss: 0.0189 - acc: 0.9936 - val_loss: 0.0829 - val_acc: 0.9795
Epoch 9/20
54000/54000 [==============================] - 14s - loss: 0.0137 - acc: 0.9950 - val_loss: 0.0835 - val_acc: 0.9797
Epoch 10/20
54000/54000 [==============================] - 13s - loss: 0.0108 - acc: 0.9969 - val_loss: 0.0836 - val_acc: 0.9820
Epoch 11/20
54000/54000 [==============================] - 13s - loss: 0.0123 - acc: 0.9960 - val_loss: 0.0866 - val_acc: 0.9798
Epoch 12/20
54000/54000 [==============================] - 13s - loss: 0.0162 - acc: 0.9951 - val_loss: 0.0780 - val_acc: 0.9838
Epoch 13/20
54000/54000 [==============================] - 12s - loss: 0.0093 - acc: 0.9968 - val_loss: 0.1019 - val_acc: 0.9813
Epoch 14/20
54000/54000 [==============================] - 12s - loss: 0.0075 - acc: 0.9976 - val_loss: 0.0923 - val_acc: 0.9818
Epoch 15/20
54000/54000 [==============================] - 12s - loss: 0.0118 - acc: 0.9965 - val_loss: 0.1176 - val_acc: 0.9772
Epoch 16/20
54000/54000 [==============================] - 12s - loss: 0.0119 - acc: 0.9961 - val_loss: 0.0838 - val_acc: 0.9803
Epoch 17/20
54000/54000 [==============================] - 12s - loss: 0.0073 - acc: 0.9976 - val_loss: 0.0808 - val_acc: 0.9837
Epoch 18/20
54000/54000 [==============================] - 13s - loss: 0.0082 - acc: 0.9974 - val_loss: 0.0926 - val_acc: 0.9822
Epoch 19/20
54000/54000 [==============================] - 12s - loss: 0.0070 - acc: 0.9979 - val_loss: 0.0808 - val_acc: 0.9835
Epoch 20/20
54000/54000 [==============================] - 11s - loss: 0.0039 - acc: 0.9987 - val_loss: 0.1010 - val_acc: 0.9822
10000/10000 [==============================] - 1s

[0.099321320021623111, 0.9819]

As can be seen, our model achieves an accuracy of ~98.2% on the test set; this is quite respectable for such a simple model, despite being outclassed by state-of-the-art approaches enumerated here.

I encourage you to play around with this model: attempt different hyperparameter values/optimisation algorithms/activation functions, add more hidden layers, etc. Eventually, you should be able to achieve accuracies above 99%.
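
As one possible starting point, a third hidden layer can be slotted in with a single extra line; this is merely a sketch of the kind of modification to try, not a recipe for reaching 99%:

hidden_3 = Dense(hidden_size, activation='relu')(hidden_2) # an extra hidden ReLU layer
out = Dense(num_classes, activation='softmax')(hidden_3) # the output layer now reads from it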

Conclusion

Throughout this post we have covered the essentials of deep learning, and successfully implemented a simple two-layer deep MLP in Keras, applying it to MNIST, all in under 30 lines of code.

Next time around, we will explore convolutional neural networks (CNNs), resolving some of the issues posed by applying MLPs to larger image tasks (such as CIFAR-10).

ABOUT THE AUTHOR

Petar Veličković

Petar is currently a Research Assistant in Computational Biology within the Artificial Intelligence Group of the Cambridge University Computer Laboratory, where he is working on developing machine learning algorithms on complex networks, and their applications to bioinformatics. He is also a PhD student within the group, supervised by Dr Pietro Liò and affiliated with Trinity College. He holds a BA degree in Computer Science from the University of Cambridge, having completed the Computer Science Tripos in 2015.

Just show me the code!

from keras.datasets import mnist # subroutines for fetching the MNIST dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Dense # the two types of neural network layer we will be using
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values

batch_size = 128 # in each iteration, we consider 128 training examples at once
num_epochs = 20 # we iterate twenty times over the entire training set
hidden_size = 512 # there will be 512 neurons in both hidden layers

num_train = 60000 # there are 60000 training examples in MNIST
num_test = 10000 # there are 10000 test examples in MNIST

height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data

X_train = X_train.reshape(num_train, height * width) # Flatten data to 1D
X_test = X_test.reshape(num_test, height * width) # Flatten data to 1D
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

inp = Input(shape=(height * width,)) # Our input is a 1D vector of size 784
hidden_1 = Dense(hidden_size, activation='relu')(inp) # First hidden ReLU layer
hidden_2 = Dense(hidden_size, activation='relu')(hidden_1) # Second hidden ReLU layer
out = Dense(num_classes, activation='softmax')(hidden_2) # Output softmax layer

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!


Deep learning for complete beginners: Using convolutional nets to recognise images, by Cambridge Coding Academy

Introduction

Welcome to the second in a series of blog posts that is designed to get you quickly up to speed with deep learning; from first principles, all the way to discussions of some of the intricate details, with the purpose of achieving respectable performance on two established machine learning benchmarks: MNIST (classification of handwritten digits) and CIFAR-10 (classification of small images across 10 distinct classes—airplane, automobile, bird, cat, deer, dog, frog, horse, ship & truck).

[Figure: sample images from MNIST and CIFAR-10]

Last time around, I introduced the fundamental concepts of deep learning, and illustrated how models can be rapidly developed and prototyped by leveraging the Keras deep learning framework. Ultimately, a two-layer multilayer perceptron (MLP) was applied to MNIST, achieving an accuracy level of 98.2%, which can be quite easily improved upon. However, fully connected MLPs will usually not be the model of choice for image-related tasks—it is far more typical to take advantage of a convolutional neural network (CNN) in this case. By the end of this part of the tutorial, you should be capable of understanding and producing a simple CNN in Keras, achieving a respectable level of accuracy on CIFAR-10.

This tutorial will, for the most part, assume familiarity with the previous one in the series.

Image processing

The previously mentioned multilayer perceptrons represent the most general and powerful feedforward neural network model possible; they are organised in layers, such that every neuron within a layer receives its own copy of all the outputs of the previous layer as its input. This kind of model is perfect for the right kind of problem—learning from a fixed number of (more or less) unstructured parameters.

However, consider what happens to the number of parameters (weights) of such a model when it is fed raw image data. CIFAR-10, for example, contains 32×32×3 coloured images: if we are to treat each channel of each pixel as an independent input to an MLP, each neuron of the first hidden layer adds ~3000 new parameters to the model! The situation quickly becomes unmanageable as image sizes grow larger, way before reaching the kind of images people usually want to work with in real applications.

A common solution is to downsample the images to a size where MLPs can safely be applied. However, if we directly downsample the image, we potentially lose a wealth of information; it would be great if we were somehow able to still do some useful processing of the image (without causing an explosion in the parameter count), prior to performing the downsampling.

Convolutions

It turns out that there is a very efficient way of pulling this off, and it takes advantage of the structure of the information encoded within an image—it is assumed that pixels that are spatially closer together will “cooperate” on forming a particular feature of interest much more than ones on opposite corners of the image. Also, if a particular (smaller) feature is found to be of great importance when defining an image’s label, it will be equally important regardless of where within the image it is found.

Enter the convolution operator. Given a two-dimensional image, $I$, and a small matrix, $K$, of size $h \times w$ (known as a convolution kernel), which we assume encodes a way of extracting an interesting image feature, we compute the convolved image, $I * K$, by overlaying the kernel on top of the image in all possible ways, and recording the sum of elementwise products between the image and the kernel:

$$(I * K)_{xy} = \sum_{i=1}^{h} \sum_{j=1}^{w} K_{ij} \cdot I_{x+i-1,\, y+j-1}$$

(in fact, the exact definition would require us to flip the kernel matrix first, but for the purposes of machine learning it is irrelevant whether this is done)
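
To see exactly what the formula computes, here is a deliberately naive NumPy sketch of a “valid” (unpadded) convolution, without kernel flipping, matching the formula above; real frameworks use far faster implementations:

import numpy as np

def conv2d_valid(I, K):
    h, w = K.shape
    H, W = I.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            # sum of elementwise products of the kernel with the overlaid image patch
            out[x, y] = np.sum(K * I[x:x + h, y:y + w])
    return out

I = np.arange(16, dtype=float).reshape(4, 4) # a small test "image"
K = np.array([[1.0, 0.0], [0.0, -1.0]]) # an arbitrary 2x2 kernel
print(conv2d_valid(I, K)) # a 3x3 convolved output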

The images below show a diagrammatical overview of the above formula and the result of applying convolution (with two separate kernels) over an image, to act as an edge detector:


Convolutional and pooling layers

The convolution operator forms the fundamental basis of the convolutional layer of a CNN. The layer is completely specified by a certain number of kernels, $\vec{K}$ (along with additive biases, $\vec{b}$, one per kernel), and it operates by computing the convolution of the output images of a previous layer with each of those kernels, afterwards adding the biases (one per output image). Finally, an activation function, $\sigma$, may be applied to all of the pixels of the output images. Typically, the input to a convolutional layer will have $d$ channels (e.g. red/green/blue in the input layer), in which case the kernels are extended to have this number of channels as well, making the final formula for a single output image channel of a convolutional layer (for a kernel $K$ and bias $b$) as follows:

$$\mathrm{conv}(I, K)_{xy} = \sigma\left(b + \sum_{i=1}^{h} \sum_{j=1}^{w} \sum_{k=1}^{d} K_{ijk} \cdot I_{x+i-1,\, y+j-1,\, k}\right)$$

Note that, since all we’re doing here is addition and scaling of the input pixels, the kernels may be learned from a given training dataset via gradient descent, exactly as the weights of an MLP. In fact, an MLP is perfectly capable of replicating a convolutional layer, but it would require a lot more training time (and data) to learn to approximate that mode of operation.

Finally, let’s just note that a convolutional operator is in no way restricted to two-dimensionally structured data: in fact, most machine learning frameworks (Keras included) will provide you with out-of-the-box layers for 1D and 3D convolutions as well!

It is important to note that, while a convolutional layer significantly decreases the number of parameters compared to a fully connected (FC) layer, it introduces more hyperparameters—parameters whose values need to be chosen before training starts.

Namely, the hyperparameters to choose within a single convolutional layer are (a small output-size calculation is sketched after the list):
depth: how many different kernels (and biases) will be convolved with the output of the previous layer;
height and width of each kernel;
stride: by how much we shift the kernel in each step to compute the next pixel in the result. This specifies the overlap between individual output pixels, and typically it is set to 1, corresponding to the formula given before. Note that larger strides result in smaller output sizes.
padding: note that convolution by any kernel larger than 1×1 will decrease the output image size—it is often desirable to keep sizes the same, in which case the image is sufficiently padded with zeroes at the edges. This is often called “same” padding, as opposed to “valid” (no) padding. It is possible to add arbitrary levels of padding, but typically the padding of choice will be either same or valid.
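
These hyperparameters interact in a simple way: the output height (and, analogously, width) of a convolution can be computed from the input size, kernel size, padding and stride. A quick sketch of the usual formula:

def conv_output_size(input_size, kernel_size, padding, stride):
    # floor((input + 2 * padding - kernel) / stride) + 1
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(32, 3, 1, 1)) # "same" padding for a 3x3 kernel: 32
print(conv_output_size(32, 3, 0, 1)) # "valid" (no) padding: 30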

As already hinted, convolutions are not typically meant to be the sole operation in a CNN (although there have been promising recent developments on all-convolutional networks), but rather to extract useful features of an image prior to downsampling it sufficiently to be manageable by an MLP.

A very popular approach to downsampling is a pooling layer, which consumes small and (usually) disjoint chunks of the image (typically 2×2) and aggregates them into a single value. There are several possible schemes for the aggregation—the most popular being max-pooling, where the maximum pixel value within each chunk is taken. A diagrammatical illustration of 2×2 max-pooling is given below.
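
A minimal NumPy sketch of 2×2 max-pooling (assuming the input height and width are even):

import numpy as np

def max_pool_2x2(I):
    H, W = I.shape # assumes H and W are both even
    # group the pixels into disjoint 2x2 chunks and take the maximum of each
    return I.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

I = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [4., 5., 6., 7.]])
print(max_pool_2x2(I)) # [[4. 8.] [9. 7.]]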

Putting it all together: a common CNN

Now that we have all the building blocks, let’s see what a typical convolutional neural network might look like!

A typical CNN architecture for $k$-class image classification can be split into two distinct parts—a chain of repeating Conv→Pool layers (sometimes with more than one convolutional layer at once), followed by a few fully connected layers (taking each pixel of the computed images as an independent input), culminating in a $k$-way softmax layer, on which a cross-entropy loss is optimised. I did not draw the activation functions here to make the sketch clearer, but do keep in mind that typically, after every convolutional or fully connected layer, an activation (e.g. ReLU) will be applied to all of the outputs.

Note the effect of a single Conv→Pool pass through the image: it reduces height and width of the individual channels in favour of their number, i.e. depth.

The softmax layer and cross-entropy loss are both introduced in more detail in the previous tutorial. To summarise, a softmax layer’s purpose is converting any vector of real numbers into a vector of probabilities (nonnegative real values that add up to 1). Within this context, the probabilities correspond to the likelihoods that an input image is a member of a particular class. Minimising the cross-entropy loss has the effect of maximising the model’s confidence in the correct class, without being concerned with the probabilities for other classes—this makes it a more suitable choice for probabilistic tasks compared to, for example, the squared error loss.

Detour: Overfitting, regularisation and dropout

This will be the first (and hopefully the only) time when I divert your attention to a seemingly unrelated topic. It regards a very important pitfall of machine learning—overfitting a model to the training data. While this is primarily going to be a major topic of the next tutorial in the series, the negative effects of overfitting tend to become quite noticeable on networks like the one we are about to build, so we need to introduce a way to properly protect ourselves against it before going any further. Luckily, there is a very simple technique we can use.

Overfitting corresponds to adapting our model to the training set to such extremes that its generalisation potential (performance on samples outside of the training set) is severely limited. In other words, our model might have learned the training set (along with any noise present within it) perfectly, but it has failed to capture the underlying process that generated it. To illustrate, consider a problem of fitting a sine curve, with white additive noise applied to the data points:

Here we have a training set (denoted by blue circles) derived from the original sine wave, along with some noise. If we fit a degree-3 polynomial to this data, we get a fairly good approximation to the original curve. Someone might argue that a degree-14 polynomial would do better; indeed, given we have 15 points, such a fit would perfectly describe the training data. However, in this case, the additional parameters of the model cause catastrophic results: to cope with the inherent noise of the data, anywhere except in the closest vicinity of the training points, our fit is completely off.
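
This effect is easy to reproduce; a small sketch using NumPy’s polynomial fitting (the number of points and the noise level are arbitrary choices, so your exact numbers will vary):

import numpy as np

np.random.seed(0)
x = np.linspace(0, 2 * np.pi, 15) # 15 training points
y = np.sin(x) + 0.1 * np.random.randn(15) # the sine wave plus white noise

fit_3 = np.polyfit(x, y, 3) # degree-3 polynomial: a smooth approximation
fit_14 = np.polyfit(x, y, 14) # degree-14 polynomial: enough parameters to interpolate the noise

x_test = np.linspace(0, 2 * np.pi, 200)
err_3 = np.mean((np.polyval(fit_3, x_test) - np.sin(x_test)) ** 2)
err_14 = np.mean((np.polyval(fit_14, x_test) - np.sin(x_test)) ** 2)
print(err_3, err_14) # the high-degree fit typically generalises far worse away from the training points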

Deep convolutional neural networks have a large number of parameters, especially in the fully connected layers. Overfitting might often manifest in the following form: if we don’t have sufficiently many training examples, a small group of neurons might become responsible for doing most of the processing, with other neurons becoming redundant; or, in the other extreme, some neurons might actually become detrimental to performance, with several other neurons of their layer ending up doing nothing else but correcting for their errors.

To help our models generalise better in these circumstances, we introduce techniques of regularisation: rather than reducing the number of parameters, we impose constraints on the model parameters during training to keep them from learning the noise in the training data. The particular method I will introduce here is dropout—a technique that initially might seem like “dark magic”, but actually helps to eliminate exactly the failure modes described above. Namely, dropout with parameter $p$ will, within a single training iteration, go through all neurons in a particular layer and, with probability $p$, completely eliminate them from the network throughout the iteration. This has the effect of forcing the neural network to cope with failures, and not to rely on the existence of a particular neuron (or set of neurons)—relying instead on a consensus of several neurons within a layer. This is a very simple technique that works quite well for combatting overfitting on its own, without introducing further regularisers. An illustration is given below.
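
Conceptually, dropout at training time amounts to multiplying a layer’s activations by a random binary mask; a minimal NumPy sketch of one such training-time pass (Keras’ Dropout layer handles this, along with the required rescaling, internally):

import numpy as np

def dropout_train(activations, p):
    # with probability p, zero out each neuron's output for this iteration
    mask = (np.random.rand(*activations.shape) >= p).astype(activations.dtype)
    return activations * mask

h = np.array([0.5, 1.2, 0.3, 2.3, 0.7])
print(dropout_train(h, 0.5)) # roughly half of the entries are zeroed out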

Applying a deep CNN to CIFAR-10

As this post’s objective, we will implement a deep convolutional neural network and apply it to the CIFAR-10 image classification task.

Imports are largely similar to last time, apart from the fact that we will be using a wider variety of layers:

from keras.datasets import cifar10 # subroutines for fetching the CIFAR-10 dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Convolution2D, MaxPooling2D, Dense, Dropout, Flatten
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values
import numpy as np
Using Theano backend.

As already mentioned, a CNN will typically have more hyperparameters than an MLP. For the purposes of this tutorial, we will also stick to “sensible” hand-picked values for them, but do still keep in mind that later on I will introduce a more proper method for learning them.

The hyperparameters are:
– The batch size, representing the number of training examples being used simultaneously during a single iteration of the gradient descent algorithm;
– The number of epochs, representing the number of times the training algorithm will iterate over the entire training set before terminating*;
– The kernel sizes in the convolutional layers;
– The pooling size in the pooling layers;
– The number of kernels in the convolutional layers;
– The dropout probability (we will apply dropout after each pooling, and after the fully connected layer);
– The number of neurons in the fully connected layer of the MLP.

* N.B. here I have set the number of epochs to 200, which might be undesirably slow if you do not have a GPU at your disposal (the convolution layers are going to pose a significant performance bottleneck in this case). You might wish to decrease the epoch count and/or numbers of kernels if you are going to be training the network on a CPU.

batch_size = 32 # in each iteration, we consider 32 training examples at once
num_epochs = 200 # we iterate 200 times over the entire training set
kernel_size = 3 # we will use 3x3 kernels throughout
pool_size = 2 # we will use 2x2 pooling throughout
conv_depth_1 = 32 # we will initially have 32 kernels per conv. layer...
conv_depth_2 = 64 # ...switching to 64 after the first pooling layer
drop_prob_1 = 0.25 # dropout after pooling with probability 0.25
drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5
hidden_size = 512 # the FC layer will have 512 neurons

Loading and preprocessing the CIFAR-10 dataset is done in exactly the same way as for MNIST, with Keras routines doing most of the work. The sole difference is that now we do not initially consider each pixel an independent input feature, and therefore we do not reshape the input to 1D. We will once again force the pixel intensity values to be in the $[0, 1]$ range, and use a one-hot encoding for the output labels.

However, this time around, this stage will be done in a more general way, to allow you to adapt it more easily to new datasets: the sizes will be extracted from the dataset rather than hardcoded, the number of classes is inferred from the number of unique labels in the training set, and the normalisation is performed via division by the maximum value in the training set.

N.B. we will divide the testing set by the maximum of the training set, because our algorithms are not allowed to see the testing data before the learning process is complete, and therefore we are not allowed to compute any statistics on it, other than performing transformations derived entirely from the training set.

(X_train, y_train), (X_test, y_test) = cifar10.load_data() # fetch CIFAR-10 data

num_train, depth, height, width = X_train.shape # there are 50000 training examples in CIFAR-10 
num_test = X_test.shape[0] # there are 10000 test examples in CIFAR-10
num_classes = np.unique(y_train).shape[0] # there are 10 image classes

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= np.max(X_train) # Normalise data to [0, 1] range
X_test /= np.max(X_train) # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

Modelling time! Our network will consist of four Convolution2D layers, with a MaxPooling2D layer following after the second and the fourth convolution. After the first pooling layer, we double the number of kernels (in line with the previously mentioned principle of sacrificing height and width for more depth). Afterwards, the output of the second pooling layer is flattened to 1D (via the Flatten layer), and passed through two fully connected (Dense) layers. ReLU activations will once again be used for all layers except the output dense layer, which will use a softmax activation (for purposes of probabilistic classification).

To regularise our model, a Dropout layer is applied after each pooling layer, and after the first Dense layer. This is another area where Keras shines compared to other frameworks: it has an internal flag that automatically enables or disables dropout, depending on whether the model is currently used for training or testing.

The remainder of the model specification exactly matches our previous setup for MNIST:
– We use the cross-entropy loss function as the objective to optimise (as its derivation is more appropriate for probabilistic tasks);
– We use the Adam optimiser for gradient descent;
– We report the accuracy of the model (as the dataset is balanced across the ten classes)*;
– We hold out 10% of the data for validation purposes.

* To get a feeling for why accuracy might be inappropriate for unbalanced datasets, consider an extreme case where 90% of the test data belongs to class $x$ (this could be, for example, the task of diagnosing patients for an extremely rare disease). In this case, a classifier that just outputs $x$ achieves a seemingly impressive accuracy of 90% on the test data, without really doing any learning/generalisation.

inp = Input(shape=(depth, height, width)) # N.B. depth goes first in Keras!
# Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer)
conv_1 = Convolution2D(conv_depth_1, kernel_size, kernel_size, border_mode='same', activation='relu')(inp)
conv_2 = Convolution2D(conv_depth_1, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_1)
pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2)
drop_1 = Dropout(drop_prob_1)(pool_1)
# Conv [64] -> Conv [64] -> Pool (with dropout on the pooling layer)
conv_3 = Convolution2D(conv_depth_2, kernel_size, kernel_size, border_mode='same', activation='relu')(drop_1)
conv_4 = Convolution2D(conv_depth_2, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_3)
pool_2 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_4)
drop_2 = Dropout(drop_prob_1)(pool_2)
# Now flatten to 1D, apply FC -> ReLU (with dropout) -> softmax
flat = Flatten()(drop_2)
hidden = Dense(hidden_size, activation='relu')(flat)
drop_3 = Dropout(drop_prob_2)(hidden)
out = Dense(num_classes, activation='softmax')(drop_3)

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Train on 45000 samples, validate on 5000 samples
Epoch 1/200
45000/45000 [==============================] - 9s - loss: 1.5435 - acc: 0.4359 - val_loss: 1.2057 - val_acc: 0.5672
Epoch 2/200
45000/45000 [==============================] - 9s - loss: 1.1544 - acc: 0.5886 - val_loss: 0.9679 - val_acc: 0.6566
Epoch 3/200
45000/45000 [==============================] - 8s - loss: 1.0114 - acc: 0.6418 - val_loss: 0.8807 - val_acc: 0.6870
Epoch 4/200
45000/45000 [==============================] - 8s - loss: 0.9183 - acc: 0.6766 - val_loss: 0.7945 - val_acc: 0.7224
Epoch 5/200
45000/45000 [==============================] - 9s - loss: 0.8507 - acc: 0.6994 - val_loss: 0.7531 - val_acc: 0.7400
Epoch 6/200
45000/45000 [==============================] - 9s - loss: 0.8064 - acc: 0.7161 - val_loss: 0.7174 - val_acc: 0.7496
Epoch 7/200
45000/45000 [==============================] - 9s - loss: 0.7561 - acc: 0.7331 - val_loss: 0.7116 - val_acc: 0.7622
Epoch 8/200
45000/45000 [==============================] - 9s - loss: 0.7156 - acc: 0.7476 - val_loss: 0.6773 - val_acc: 0.7670
Epoch 9/200
45000/45000 [==============================] - 9s - loss: 0.6833 - acc: 0.7594 - val_loss: 0.6855 - val_acc: 0.7644
Epoch 10/200
45000/45000 [==============================] - 9s - loss: 0.6580 - acc: 0.7656 - val_loss: 0.6608 - val_acc: 0.7748
Epoch 11/200
45000/45000 [==============================] - 9s - loss: 0.6308 - acc: 0.7750 - val_loss: 0.6854 - val_acc: 0.7730
Epoch 12/200
45000/45000 [==============================] - 9s - loss: 0.6035 - acc: 0.7832 - val_loss: 0.6853 - val_acc: 0.7744
Epoch 13/200
45000/45000 [==============================] - 9s - loss: 0.5871 - acc: 0.7914 - val_loss: 0.6762 - val_acc: 0.7748
Epoch 14/200
45000/45000 [==============================] - 8s - loss: 0.5693 - acc: 0.8000 - val_loss: 0.6868 - val_acc: 0.7740
Epoch 15/200
45000/45000 [==============================] - 9s - loss: 0.5555 - acc: 0.8036 - val_loss: 0.6835 - val_acc: 0.7792
Epoch 16/200
45000/45000 [==============================] - 9s - loss: 0.5370 - acc: 0.8126 - val_loss: 0.6885 - val_acc: 0.7774
Epoch 17/200
45000/45000 [==============================] - 9s - loss: 0.5270 - acc: 0.8134 - val_loss: 0.6604 - val_acc: 0.7866
Epoch 18/200
45000/45000 [==============================] - 9s - loss: 0.5090 - acc: 0.8194 - val_loss: 0.6652 - val_acc: 0.7860
Epoch 19/200
45000/45000 [==============================] - 9s - loss: 0.5066 - acc: 0.8193 - val_loss: 0.6632 - val_acc: 0.7858
Epoch 20/200
45000/45000 [==============================] - 9s - loss: 0.4938 - acc: 0.8248 - val_loss: 0.6844 - val_acc: 0.7872
Epoch 21/200
45000/45000 [==============================] - 9s - loss: 0.4684 - acc: 0.8361 - val_loss: 0.6861 - val_acc: 0.7904
Epoch 22/200
45000/45000 [==============================] - 9s - loss: 0.4696 - acc: 0.8365 - val_loss: 0.6349 - val_acc: 0.7980
Epoch 23/200
45000/45000 [==============================] - 9s - loss: 0.4584 - acc: 0.8387 - val_loss: 0.6592 - val_acc: 0.7926
Epoch 24/200
45000/45000 [==============================] - 9s - loss: 0.4410 - acc: 0.8443 - val_loss: 0.6822 - val_acc: 0.7876
Epoch 25/200
45000/45000 [==============================] - 8s - loss: 0.4404 - acc: 0.8454 - val_loss: 0.7103 - val_acc: 0.7784
Epoch 26/200
45000/45000 [==============================] - 8s - loss: 0.4276 - acc: 0.8512 - val_loss: 0.6783 - val_acc: 0.7858
Epoch 27/200
45000/45000 [==============================] - 8s - loss: 0.4152 - acc: 0.8542 - val_loss: 0.6657 - val_acc: 0.7944
Epoch 28/200
45000/45000 [==============================] - 9s - loss: 0.4107 - acc: 0.8549 - val_loss: 0.6861 - val_acc: 0.7888
Epoch 29/200
45000/45000 [==============================] - 9s - loss: 0.4115 - acc: 0.8548 - val_loss: 0.6634 - val_acc: 0.7996
Epoch 30/200
45000/45000 [==============================] - 9s - loss: 0.4057 - acc: 0.8586 - val_loss: 0.7166 - val_acc: 0.7896
Epoch 31/200
45000/45000 [==============================] - 9s - loss: 0.3992 - acc: 0.8605 - val_loss: 0.6734 - val_acc: 0.7998
Epoch 32/200
45000/45000 [==============================] - 9s - loss: 0.3863 - acc: 0.8637 - val_loss: 0.7263 - val_acc: 0.7844
Epoch 33/200
45000/45000 [==============================] - 9s - loss: 0.3933 - acc: 0.8644 - val_loss: 0.6953 - val_acc: 0.7860
Epoch 34/200
45000/45000 [==============================] - 9s - loss: 0.3838 - acc: 0.8663 - val_loss: 0.7040 - val_acc: 0.7916
Epoch 35/200
45000/45000 [==============================] - 9s - loss: 0.3800 - acc: 0.8674 - val_loss: 0.7233 - val_acc: 0.7970
Epoch 36/200
45000/45000 [==============================] - 9s - loss: 0.3775 - acc: 0.8697 - val_loss: 0.7234 - val_acc: 0.7922
Epoch 37/200
45000/45000 [==============================] - 9s - loss: 0.3681 - acc: 0.8746 - val_loss: 0.6751 - val_acc: 0.7958
Epoch 38/200
45000/45000 [==============================] - 9s - loss: 0.3679 - acc: 0.8732 - val_loss: 0.7014 - val_acc: 0.7976
Epoch 39/200
45000/45000 [==============================] - 9s - loss: 0.3540 - acc: 0.8769 - val_loss: 0.6768 - val_acc: 0.8022
Epoch 40/200
45000/45000 [==============================] - 9s - loss: 0.3531 - acc: 0.8783 - val_loss: 0.7171 - val_acc: 0.7986
Epoch 41/200
45000/45000 [==============================] - 9s - loss: 0.3545 - acc: 0.8786 - val_loss: 0.7164 - val_acc: 0.7930
Epoch 42/200
45000/45000 [==============================] - 9s - loss: 0.3453 - acc: 0.8799 - val_loss: 0.7078 - val_acc: 0.7994
Epoch 43/200
45000/45000 [==============================] - 8s - loss: 0.3488 - acc: 0.8798 - val_loss: 0.7272 - val_acc: 0.7958
Epoch 44/200
45000/45000 [==============================] - 9s - loss: 0.3471 - acc: 0.8797 - val_loss: 0.7110 - val_acc: 0.7916
Epoch 45/200
45000/45000 [==============================] - 9s - loss: 0.3443 - acc: 0.8810 - val_loss: 0.7391 - val_acc: 0.7952
Epoch 46/200
45000/45000 [==============================] - 9s - loss: 0.3342 - acc: 0.8841 - val_loss: 0.7351 - val_acc: 0.7970
Epoch 47/200
45000/45000 [==============================] - 9s - loss: 0.3311 - acc: 0.8842 - val_loss: 0.7302 - val_acc: 0.8008
Epoch 48/200
45000/45000 [==============================] - 9s - loss: 0.3320 - acc: 0.8868 - val_loss: 0.7145 - val_acc: 0.8002
Epoch 49/200
45000/45000 [==============================] - 9s - loss: 0.3264 - acc: 0.8883 - val_loss: 0.7640 - val_acc: 0.7942
Epoch 50/200
45000/45000 [==============================] - 9s - loss: 0.3247 - acc: 0.8880 - val_loss: 0.7289 - val_acc: 0.7948
Epoch 51/200
45000/45000 [==============================] - 9s - loss: 0.3279 - acc: 0.8886 - val_loss: 0.7340 - val_acc: 0.7910
Epoch 52/200
45000/45000 [==============================] - 9s - loss: 0.3224 - acc: 0.8901 - val_loss: 0.7454 - val_acc: 0.7914
Epoch 53/200
45000/45000 [==============================] - 9s - loss: 0.3219 - acc: 0.8916 - val_loss: 0.7328 - val_acc: 0.8016
Epoch 54/200
45000/45000 [==============================] - 9s - loss: 0.3163 - acc: 0.8919 - val_loss: 0.7442 - val_acc: 0.7996
Epoch 55/200
45000/45000 [==============================] - 9s - loss: 0.3071 - acc: 0.8962 - val_loss: 0.7427 - val_acc: 0.7898
Epoch 56/200
45000/45000 [==============================] - 9s - loss: 0.3158 - acc: 0.8944 - val_loss: 0.7685 - val_acc: 0.7920
Epoch 57/200
45000/45000 [==============================] - 8s - loss: 0.3126 - acc: 0.8942 - val_loss: 0.7717 - val_acc: 0.8062
Epoch 58/200
45000/45000 [==============================] - 9s - loss: 0.3156 - acc: 0.8919 - val_loss: 0.6993 - val_acc: 0.7984
Epoch 59/200
45000/45000 [==============================] - 9s - loss: 0.3030 - acc: 0.8970 - val_loss: 0.7359 - val_acc: 0.8016
Epoch 60/200
45000/45000 [==============================] - 9s - loss: 0.3022 - acc: 0.8969 - val_loss: 0.7427 - val_acc: 0.7954
Epoch 61/200
45000/45000 [==============================] - 9s - loss: 0.3072 - acc: 0.8950 - val_loss: 0.7829 - val_acc: 0.7996
Epoch 62/200
45000/45000 [==============================] - 9s - loss: 0.2977 - acc: 0.8996 - val_loss: 0.8096 - val_acc: 0.7958
Epoch 63/200
45000/45000 [==============================] - 9s - loss: 0.3033 - acc: 0.8983 - val_loss: 0.7424 - val_acc: 0.7972
Epoch 64/200
45000/45000 [==============================] - 9s - loss: 0.2985 - acc: 0.9003 - val_loss: 0.7779 - val_acc: 0.7930
Epoch 65/200
45000/45000 [==============================] - 8s - loss: 0.2931 - acc: 0.9004 - val_loss: 0.7302 - val_acc: 0.8010
Epoch 66/200
45000/45000 [==============================] - 8s - loss: 0.2948 - acc: 0.8994 - val_loss: 0.7861 - val_acc: 0.7900
Epoch 67/200
45000/45000 [==============================] - 9s - loss: 0.2911 - acc: 0.9026 - val_loss: 0.7502 - val_acc: 0.7918
Epoch 68/200
45000/45000 [==============================] - 9s - loss: 0.2951 - acc: 0.9001 - val_loss: 0.7911 - val_acc: 0.7820
Epoch 69/200
45000/45000 [==============================] - 9s - loss: 0.2869 - acc: 0.9026 - val_loss: 0.8025 - val_acc: 0.8024
Epoch 70/200
45000/45000 [==============================] - 8s - loss: 0.2933 - acc: 0.9013 - val_loss: 0.7703 - val_acc: 0.7978
Epoch 71/200
45000/45000 [==============================] - 8s - loss: 0.2902 - acc: 0.9007 - val_loss: 0.7685 - val_acc: 0.7962
Epoch 72/200
45000/45000 [==============================] - 9s - loss: 0.2920 - acc: 0.9025 - val_loss: 0.7412 - val_acc: 0.7956
Epoch 73/200
45000/45000 [==============================] - 8s - loss: 0.2861 - acc: 0.9038 - val_loss: 0.7957 - val_acc: 0.8026
Epoch 74/200
45000/45000 [==============================] - 8s - loss: 0.2785 - acc: 0.9069 - val_loss: 0.7522 - val_acc: 0.8002
Epoch 75/200
45000/45000 [==============================] - 9s - loss: 0.2811 - acc: 0.9064 - val_loss: 0.8181 - val_acc: 0.7902
Epoch 76/200
45000/45000 [==============================] - 9s - loss: 0.2841 - acc: 0.9053 - val_loss: 0.7695 - val_acc: 0.7990
Epoch 77/200
45000/45000 [==============================] - 9s - loss: 0.2853 - acc: 0.9061 - val_loss: 0.7608 - val_acc: 0.7972
Epoch 78/200
45000/45000 [==============================] - 9s - loss: 0.2714 - acc: 0.9080 - val_loss: 0.7534 - val_acc: 0.8034
Epoch 79/200
45000/45000 [==============================] - 9s - loss: 0.2797 - acc: 0.9072 - val_loss: 0.7188 - val_acc: 0.7988
Epoch 80/200
45000/45000 [==============================] - 9s - loss: 0.2682 - acc: 0.9110 - val_loss: 0.7751 - val_acc: 0.7954
Epoch 81/200
45000/45000 [==============================] - 9s - loss: 0.2885 - acc: 0.9038 - val_loss: 0.7711 - val_acc: 0.8010
Epoch 82/200
45000/45000 [==============================] - 9s - loss: 0.2705 - acc: 0.9094 - val_loss: 0.7613 - val_acc: 0.8000
Epoch 83/200
45000/45000 [==============================] - 9s - loss: 0.2738 - acc: 0.9095 - val_loss: 0.8300 - val_acc: 0.7944
Epoch 84/200
45000/45000 [==============================] - 9s - loss: 0.2795 - acc: 0.9066 - val_loss: 0.8001 - val_acc: 0.7912
Epoch 85/200
45000/45000 [==============================] - 9s - loss: 0.2721 - acc: 0.9086 - val_loss: 0.7862 - val_acc: 0.8092
Epoch 86/200
45000/45000 [==============================] - 9s - loss: 0.2752 - acc: 0.9087 - val_loss: 0.7331 - val_acc: 0.7942
Epoch 87/200
45000/45000 [==============================] - 9s - loss: 0.2725 - acc: 0.9089 - val_loss: 0.7999 - val_acc: 0.7914
Epoch 88/200
45000/45000 [==============================] - 9s - loss: 0.2644 - acc: 0.9108 - val_loss: 0.7944 - val_acc: 0.7990
Epoch 89/200
45000/45000 [==============================] - 9s - loss: 0.2725 - acc: 0.9106 - val_loss: 0.7622 - val_acc: 0.8006
Epoch 90/200
45000/45000 [==============================] - 9s - loss: 0.2622 - acc: 0.9129 - val_loss: 0.8172 - val_acc: 0.7988
Epoch 91/200
45000/45000 [==============================] - 9s - loss: 0.2772 - acc: 0.9085 - val_loss: 0.8243 - val_acc: 0.8004
Epoch 92/200
45000/45000 [==============================] - 9s - loss: 0.2609 - acc: 0.9136 - val_loss: 0.7723 - val_acc: 0.7992
Epoch 93/200
45000/45000 [==============================] - 9s - loss: 0.2666 - acc: 0.9129 - val_loss: 0.8366 - val_acc: 0.7932
Epoch 94/200
45000/45000 [==============================] - 9s - loss: 0.2593 - acc: 0.9135 - val_loss: 0.8666 - val_acc: 0.7956
Epoch 95/200
45000/45000 [==============================] - 9s - loss: 0.2692 - acc: 0.9100 - val_loss: 0.8901 - val_acc: 0.7954
Epoch 96/200
45000/45000 [==============================] - 8s - loss: 0.2569 - acc: 0.9160 - val_loss: 0.8515 - val_acc: 0.8006
Epoch 97/200
45000/45000 [==============================] - 8s - loss: 0.2636 - acc: 0.9146 - val_loss: 0.8639 - val_acc: 0.7960
Epoch 98/200
45000/45000 [==============================] - 9s - loss: 0.2693 - acc: 0.9113 - val_loss: 0.7891 - val_acc: 0.7916
Epoch 99/200
45000/45000 [==============================] - 9s - loss: 0.2611 - acc: 0.9144 - val_loss: 0.8650 - val_acc: 0.7928
Epoch 100/200
45000/45000 [==============================] - 9s - loss: 0.2589 - acc: 0.9121 - val_loss: 0.8683 - val_acc: 0.7990
Epoch 101/200
45000/45000 [==============================] - 9s - loss: 0.2601 - acc: 0.9142 - val_loss: 0.9116 - val_acc: 0.8030
Epoch 102/200
45000/45000 [==============================] - 9s - loss: 0.2616 - acc: 0.9138 - val_loss: 0.8229 - val_acc: 0.7928
Epoch 103/200
45000/45000 [==============================] - 9s - loss: 0.2603 - acc: 0.9140 - val_loss: 0.8847 - val_acc: 0.7994
Epoch 104/200
45000/45000 [==============================] - 9s - loss: 0.2579 - acc: 0.9150 - val_loss: 0.9079 - val_acc: 0.8004
Epoch 105/200
45000/45000 [==============================] - 8s - loss: 0.2696 - acc: 0.9127 - val_loss: 0.7450 - val_acc: 0.8002
Epoch 106/200
45000/45000 [==============================] - 9s - loss: 0.2555 - acc: 0.9161 - val_loss: 0.8186 - val_acc: 0.7992
Epoch 107/200
45000/45000 [==============================] - 9s - loss: 0.2631 - acc: 0.9160 - val_loss: 0.8686 - val_acc: 0.7920
Epoch 108/200
45000/45000 [==============================] - 9s - loss: 0.2524 - acc: 0.9178 - val_loss: 0.9136 - val_acc: 0.7956
Epoch 109/200
45000/45000 [==============================] - 9s - loss: 0.2569 - acc: 0.9151 - val_loss: 0.8148 - val_acc: 0.7994
Epoch 110/200
45000/45000 [==============================] - 9s - loss: 0.2586 - acc: 0.9150 - val_loss: 0.8826 - val_acc: 0.7984
Epoch 111/200
45000/45000 [==============================] - 9s - loss: 0.2520 - acc: 0.9155 - val_loss: 0.8621 - val_acc: 0.7980
Epoch 112/200
45000/45000 [==============================] - 9s - loss: 0.2586 - acc: 0.9157 - val_loss: 0.8149 - val_acc: 0.8038
Epoch 113/200
45000/45000 [==============================] - 9s - loss: 0.2623 - acc: 0.9151 - val_loss: 0.8361 - val_acc: 0.7972
Epoch 114/200
45000/45000 [==============================] - 9s - loss: 0.2535 - acc: 0.9177 - val_loss: 0.8618 - val_acc: 0.7970
Epoch 115/200
45000/45000 [==============================] - 8s - loss: 0.2570 - acc: 0.9164 - val_loss: 0.7687 - val_acc: 0.8044
Epoch 116/200
45000/45000 [==============================] - 9s - loss: 0.2501 - acc: 0.9183 - val_loss: 0.8270 - val_acc: 0.7934
Epoch 117/200
45000/45000 [==============================] - 8s - loss: 0.2535 - acc: 0.9182 - val_loss: 0.7861 - val_acc: 0.7986
Epoch 118/200
45000/45000 [==============================] - 9s - loss: 0.2507 - acc: 0.9184 - val_loss: 0.8203 - val_acc: 0.7996
Epoch 119/200
45000/45000 [==============================] - 9s - loss: 0.2530 - acc: 0.9173 - val_loss: 0.8294 - val_acc: 0.7904
Epoch 120/200
45000/45000 [==============================] - 9s - loss: 0.2599 - acc: 0.9160 - val_loss: 0.8458 - val_acc: 0.7902
Epoch 121/200
45000/45000 [==============================] - 9s - loss: 0.2483 - acc: 0.9164 - val_loss: 0.7573 - val_acc: 0.7976
Epoch 122/200
45000/45000 [==============================] - 8s - loss: 0.2492 - acc: 0.9190 - val_loss: 0.8435 - val_acc: 0.8012
Epoch 123/200
45000/45000 [==============================] - 9s - loss: 0.2528 - acc: 0.9179 - val_loss: 0.8594 - val_acc: 0.7964
Epoch 124/200
45000/45000 [==============================] - 9s - loss: 0.2581 - acc: 0.9173 - val_loss: 0.9037 - val_acc: 0.7944
Epoch 125/200
45000/45000 [==============================] - 8s - loss: 0.2404 - acc: 0.9212 - val_loss: 0.7893 - val_acc: 0.7976
Epoch 126/200
45000/45000 [==============================] - 8s - loss: 0.2492 - acc: 0.9177 - val_loss: 0.8679 - val_acc: 0.7982
Epoch 127/200
45000/45000 [==============================] - 8s - loss: 0.2483 - acc: 0.9196 - val_loss: 0.8894 - val_acc: 0.7956
Epoch 128/200
45000/45000 [==============================] - 9s - loss: 0.2539 - acc: 0.9176 - val_loss: 0.8413 - val_acc: 0.8006
Epoch 129/200
45000/45000 [==============================] - 8s - loss: 0.2477 - acc: 0.9184 - val_loss: 0.8151 - val_acc: 0.7982
Epoch 130/200
45000/45000 [==============================] - 9s - loss: 0.2586 - acc: 0.9188 - val_loss: 0.8173 - val_acc: 0.7954
Epoch 131/200
45000/45000 [==============================] - 9s - loss: 0.2498 - acc: 0.9189 - val_loss: 0.8539 - val_acc: 0.7996
Epoch 132/200
45000/45000 [==============================] - 9s - loss: 0.2426 - acc: 0.9190 - val_loss: 0.8543 - val_acc: 0.7952
Epoch 133/200
45000/45000 [==============================] - 9s - loss: 0.2460 - acc: 0.9185 - val_loss: 0.8665 - val_acc: 0.8008
Epoch 134/200
45000/45000 [==============================] - 9s - loss: 0.2436 - acc: 0.9216 - val_loss: 0.8933 - val_acc: 0.7950
Epoch 135/200
45000/45000 [==============================] - 8s - loss: 0.2468 - acc: 0.9203 - val_loss: 0.8270 - val_acc: 0.7940
Epoch 136/200
45000/45000 [==============================] - 9s - loss: 0.2479 - acc: 0.9194 - val_loss: 0.8365 - val_acc: 0.8052
Epoch 137/200
45000/45000 [==============================] - 9s - loss: 0.2449 - acc: 0.9206 - val_loss: 0.7964 - val_acc: 0.8018
Epoch 138/200
45000/45000 [==============================] - 9s - loss: 0.2440 - acc: 0.9220 - val_loss: 0.8784 - val_acc: 0.7914
Epoch 139/200
45000/45000 [==============================] - 9s - loss: 0.2485 - acc: 0.9198 - val_loss: 0.8259 - val_acc: 0.7852
Epoch 140/200
45000/45000 [==============================] - 9s - loss: 0.2482 - acc: 0.9204 - val_loss: 0.8954 - val_acc: 0.7960
Epoch 141/200
45000/45000 [==============================] - 9s - loss: 0.2344 - acc: 0.9249 - val_loss: 0.8708 - val_acc: 0.7874
Epoch 142/200
45000/45000 [==============================] - 9s - loss: 0.2476 - acc: 0.9204 - val_loss: 0.9190 - val_acc: 0.7954
Epoch 143/200
45000/45000 [==============================] - 9s - loss: 0.2415 - acc: 0.9223 - val_loss: 0.9607 - val_acc: 0.7960
Epoch 144/200
45000/45000 [==============================] - 9s - loss: 0.2377 - acc: 0.9232 - val_loss: 0.8987 - val_acc: 0.7970
Epoch 145/200
45000/45000 [==============================] - 9s - loss: 0.2481 - acc: 0.9201 - val_loss: 0.8611 - val_acc: 0.8048
Epoch 146/200
45000/45000 [==============================] - 9s - loss: 0.2504 - acc: 0.9197 - val_loss: 0.8411 - val_acc: 0.7938
Epoch 147/200
45000/45000 [==============================] - 9s - loss: 0.2450 - acc: 0.9216 - val_loss: 0.7839 - val_acc: 0.8028
Epoch 148/200
45000/45000 [==============================] - 9s - loss: 0.2327 - acc: 0.9250 - val_loss: 0.8910 - val_acc: 0.8054
Epoch 149/200
45000/45000 [==============================] - 9s - loss: 0.2432 - acc: 0.9219 - val_loss: 0.8568 - val_acc: 0.8000
Epoch 150/200
45000/45000 [==============================] - 9s - loss: 0.2436 - acc: 0.9236 - val_loss: 0.9061 - val_acc: 0.7938
Epoch 151/200
45000/45000 [==============================] - 9s - loss: 0.2434 - acc: 0.9222 - val_loss: 0.8439 - val_acc: 0.7986
Epoch 152/200
45000/45000 [==============================] - 9s - loss: 0.2439 - acc: 0.9225 - val_loss: 0.9002 - val_acc: 0.7994
Epoch 153/200
45000/45000 [==============================] - 8s - loss: 0.2373 - acc: 0.9237 - val_loss: 0.8756 - val_acc: 0.7880
Epoch 154/200
45000/45000 [==============================] - 8s - loss: 0.2359 - acc: 0.9238 - val_loss: 0.8514 - val_acc: 0.7936
Epoch 155/200
45000/45000 [==============================] - 9s - loss: 0.2435 - acc: 0.9222 - val_loss: 0.8377 - val_acc: 0.8080
Epoch 156/200
45000/45000 [==============================] - 9s - loss: 0.2478 - acc: 0.9204 - val_loss: 0.8831 - val_acc: 0.7992
Epoch 157/200
45000/45000 [==============================] - 9s - loss: 0.2337 - acc: 0.9253 - val_loss: 0.8453 - val_acc: 0.7994
Epoch 158/200
45000/45000 [==============================] - 9s - loss: 0.2336 - acc: 0.9257 - val_loss: 0.9027 - val_acc: 0.7882
Epoch 159/200
45000/45000 [==============================] - 9s - loss: 0.2384 - acc: 0.9230 - val_loss: 0.9121 - val_acc: 0.8016
Epoch 160/200
45000/45000 [==============================] - 9s - loss: 0.2481 - acc: 0.9217 - val_loss: 0.9495 - val_acc: 0.7974
Epoch 161/200
45000/45000 [==============================] - 9s - loss: 0.2450 - acc: 0.9224 - val_loss: 0.8510 - val_acc: 0.7884
Epoch 162/200
45000/45000 [==============================] - 9s - loss: 0.2433 - acc: 0.9220 - val_loss: 0.8979 - val_acc: 0.7948
Epoch 163/200
45000/45000 [==============================] - 9s - loss: 0.2339 - acc: 0.9262 - val_loss: 0.8979 - val_acc: 0.7978
Epoch 164/200
45000/45000 [==============================] - 9s - loss: 0.2298 - acc: 0.9257 - val_loss: 0.9036 - val_acc: 0.7990
Epoch 165/200
45000/45000 [==============================] - 9s - loss: 0.2404 - acc: 0.9236 - val_loss: 0.8341 - val_acc: 0.8052
Epoch 166/200
45000/45000 [==============================] - 9s - loss: 0.2402 - acc: 0.9227 - val_loss: 0.8731 - val_acc: 0.7996
Epoch 167/200
45000/45000 [==============================] - 9s - loss: 0.2367 - acc: 0.9250 - val_loss: 0.9218 - val_acc: 0.7992
Epoch 168/200
45000/45000 [==============================] - 9s - loss: 0.2267 - acc: 0.9262 - val_loss: 0.8767 - val_acc: 0.7922
Epoch 169/200
45000/45000 [==============================] - 9s - loss: 0.2336 - acc: 0.9254 - val_loss: 0.8418 - val_acc: 0.8038
Epoch 170/200
45000/45000 [==============================] - 9s - loss: 0.2434 - acc: 0.9232 - val_loss: 0.8362 - val_acc: 0.7920
Epoch 171/200
45000/45000 [==============================] - 9s - loss: 0.2328 - acc: 0.9265 - val_loss: 0.8712 - val_acc: 0.7950
Epoch 172/200
45000/45000 [==============================] - 9s - loss: 0.2346 - acc: 0.9262 - val_loss: 0.9256 - val_acc: 0.7976
Epoch 173/200
45000/45000 [==============================] - 8s - loss: 0.2382 - acc: 0.9242 - val_loss: 0.8875 - val_acc: 0.7982
Epoch 174/200
45000/45000 [==============================] - 9s - loss: 0.2400 - acc: 0.9239 - val_loss: 0.8264 - val_acc: 0.7864
Epoch 175/200
45000/45000 [==============================] - 9s - loss: 0.2334 - acc: 0.9261 - val_loss: 0.9178 - val_acc: 0.8014
Epoch 176/200
45000/45000 [==============================] - 9s - loss: 0.2427 - acc: 0.9219 - val_loss: 0.8458 - val_acc: 0.7920
Epoch 177/200
45000/45000 [==============================] - 9s - loss: 0.2310 - acc: 0.9257 - val_loss: 0.9171 - val_acc: 0.8062
Epoch 178/200
45000/45000 [==============================] - 9s - loss: 0.2310 - acc: 0.9265 - val_loss: 0.8544 - val_acc: 0.7990
Epoch 179/200
45000/45000 [==============================] - 9s - loss: 0.2378 - acc: 0.9240 - val_loss: 0.9259 - val_acc: 0.8000
Epoch 180/200
45000/45000 [==============================] - 9s - loss: 0.2381 - acc: 0.9242 - val_loss: 0.8573 - val_acc: 0.8056
Epoch 181/200
45000/45000 [==============================] - 9s - loss: 0.2231 - acc: 0.9297 - val_loss: 0.8935 - val_acc: 0.8002
Epoch 182/200
45000/45000 [==============================] - 9s - loss: 0.2419 - acc: 0.9248 - val_loss: 1.0145 - val_acc: 0.7900
Epoch 183/200
45000/45000 [==============================] - 9s - loss: 0.2336 - acc: 0.9266 - val_loss: 0.8838 - val_acc: 0.8006
Epoch 184/200
45000/45000 [==============================] - 9s - loss: 0.2429 - acc: 0.9242 - val_loss: 0.8685 - val_acc: 0.7918
Epoch 185/200
45000/45000 [==============================] - 9s - loss: 0.2317 - acc: 0.9260 - val_loss: 0.8297 - val_acc: 0.7942
Epoch 186/200
45000/45000 [==============================] - 9s - loss: 0.2330 - acc: 0.9264 - val_loss: 0.8831 - val_acc: 0.8026
Epoch 187/200
45000/45000 [==============================] - 9s - loss: 0.2353 - acc: 0.9254 - val_loss: 0.8934 - val_acc: 0.7956
Epoch 188/200
45000/45000 [==============================] - 9s - loss: 0.2312 - acc: 0.9247 - val_loss: 0.9275 - val_acc: 0.8042
Epoch 189/200
45000/45000 [==============================] - 9s - loss: 0.2239 - acc: 0.9282 - val_loss: 0.9246 - val_acc: 0.7934
Epoch 190/200
45000/45000 [==============================] - 9s - loss: 0.2349 - acc: 0.9253 - val_loss: 0.8628 - val_acc: 0.8000
Epoch 191/200
45000/45000 [==============================] - 9s - loss: 0.2313 - acc: 0.9266 - val_loss: 0.9020 - val_acc: 0.7978
Epoch 192/200
45000/45000 [==============================] - 9s - loss: 0.2358 - acc: 0.9254 - val_loss: 0.9481 - val_acc: 0.7966
Epoch 193/200
45000/45000 [==============================] - 9s - loss: 0.2298 - acc: 0.9276 - val_loss: 0.8791 - val_acc: 0.8010
Epoch 194/200
45000/45000 [==============================] - 9s - loss: 0.2279 - acc: 0.9265 - val_loss: 0.8890 - val_acc: 0.7976
Epoch 195/200
45000/45000 [==============================] - 9s - loss: 0.2330 - acc: 0.9273 - val_loss: 0.8893 - val_acc: 0.7890
Epoch 196/200
45000/45000 [==============================] - 9s - loss: 0.2416 - acc: 0.9243 - val_loss: 0.9002 - val_acc: 0.7922
Epoch 197/200
45000/45000 [==============================] - 9s - loss: 0.2309 - acc: 0.9273 - val_loss: 0.9232 - val_acc: 0.7990
Epoch 198/200
45000/45000 [==============================] - 9s - loss: 0.2247 - acc: 0.9278 - val_loss: 0.9474 - val_acc: 0.7980
Epoch 199/200
45000/45000 [==============================] - 9s - loss: 0.2335 - acc: 0.9256 - val_loss: 0.9177 - val_acc: 0.8000
Epoch 200/200
45000/45000 [==============================] - 9s - loss: 0.2378 - acc: 0.9254 - val_loss: 0.9205 - val_acc: 0.7966
 9984/10000 [============================>.] - ETA: 0s




[0.97292723369598388, 0.7853]

This model achieves an accuracy of ∼78.6% on the test set; for such a difficult task (where human performance is only around 94%), and given the relative simplicity of this model, this is a respectable result. However, more sophisticated models have recently been able to get as far as 96.53%.

I appreciate that tinkering with this model might be cumbersome if you do not have a GPU in your possession. I would, however, encourage you to apply a similar model to the previously discussed MNIST dataset; you should be able to break 99.3% accuracy on its test set with little to no effort using a CNN with dropout.

Conclusion

Throughout this post we have covered the essentials of convolutional neural networks, introduced the problem of overfitting, briefly touched on how it can be mitigated via regularisation (by applying dropout), and successfully implemented a four-layer deep CNN in Keras, applying it to CIFAR-10, all in under 50 lines of code.

Next time around, we will focus on some assorted topics, tips and tricks that should help you when fine-tuning models at this scale, and extracting more power out of your models while keeping overfitting in check.

ABOUT THE AUTHOR

Petar Veličković

Petar is currently a Research Assistant in Computational Biology within the Artificial Intelligence Group of the Cambridge University Computer Laboratory, where he is working on developing machine learning algorithms on complex networks, and their applications to bioinformatics. He is also a PhD student within the group, supervised by Dr Pietro Liò and affiliated with Trinity College. He holds a BA degree in Computer Science from the University of Cambridge, having completed the Computer Science Tripos in 2015.

Just show me the code!

from keras.datasets import cifar10 # subroutines for fetching the CIFAR-10 dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Convolution2D, MaxPooling2D, Dense, Dropout, Activation, Flatten
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values
import numpy as np

batch_size = 32 # in each iteration, we consider 32 training examples at once
num_epochs = 200 # we iterate 200 times over the entire training set
kernel_size = 3 # we will use 3x3 kernels throughout
pool_size = 2 # we will use 2x2 pooling throughout
conv_depth_1 = 32 # we will initially have 32 kernels per conv. layer...
conv_depth_2 = 64 # ...switching to 64 after the first pooling layer
drop_prob_1 = 0.25 # dropout after pooling with probability 0.25
drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5
hidden_size = 512 # the FC layer will have 512 neurons

(X_train, y_train), (X_test, y_test) = cifar10.load_data() # fetch CIFAR-10 data

num_train, depth, height, width = X_train.shape # there are 50000 training examples in CIFAR-10 
num_test = X_test.shape[0] # there are 10000 test examples in CIFAR-10
num_classes = np.unique(y_train).shape[0] # there are 10 image classes

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= np.max(X_train) # Normalise data to [0, 1] range
X_test /= np.max(X_train) # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

inp = Input(shape=(depth, height, width)) # N.B. depth goes first in Keras!
# Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer)
conv_1 = Convolution2D(conv_depth_1, kernel_size, kernel_size, border_mode='same', activation='relu')(inp)
conv_2 = Convolution2D(conv_depth_1, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_1)
pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2)
drop_1 = Dropout(drop_prob_1)(pool_1)
# Conv [64] -> Conv [64] -> Pool (with dropout on the pooling layer)
conv_3 = Convolution2D(conv_depth_2, kernel_size, kernel_size, border_mode='same', activation='relu')(drop_1)
conv_4 = Convolution2D(conv_depth_2, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_3)
pool_2 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_4)
drop_2 = Dropout(drop_prob_1)(pool_2)
# Now flatten to 1D, apply FC -> ReLU (with dropout) -> softmax
flat = Flatten()(drop_2)
hidden = Dense(hidden_size, activation='relu')(flat)
drop_3 = Dropout(drop_prob_2)(hidden)
out = Dense(num_classes, activation='softmax')(drop_3)

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!

Deep learning for complete beginners: neural network fine-tuning techniques by Cambridge Coding Academy | Download notebook

Introduction

Welcome to the third (and final) in a series of blog posts that is designed to get you quickly up to speed with deep learning; from first principles, all the way to discussions of some of the intricate details, with the purposes of achieving respectable performance on two established machine learning benchmarks: MNIST (classification of handwritten digits) and CIFAR-10 (classification of small images across 10 distinct classes—airplane, automobile, bird, cat, deer, dog, frog, horse, ship & truck).

MNIST CIFAR-10

Last time around, I introduced the convolutional neural network model, and illustrated how, combined with the simple but effective regularisation method of dropout, it can quickly achieve an accuracy level of 78.6% on CIFAR-10, leveraging the Keras deep learning framework.

By now, you have acquired the fundamental skills necessary to apply deep learning to most problems of interest (a notable exception, outside of the scope of these tutorials, is the problem of processing time-series of arbitrary length, for which a recurrent neural network (RNN) model is often preferable). In this tutorial, I will wrap up with an important but often overlooked aspect of tutorials such as this one—the tips and tricks for properly fine-tuning a model, to make it generalise better than the initial baseline you started out with.

This tutorial will, for the most part, assume familiarity with the previous two in the series.

Hyperparameter tuning and the baseline model

Typically, the design process for neural networks starts off by designing a simple network, either directly applying architectures that have shown successes for similar problems, or trying out hyperparameter values that generally seem effective. Eventually, we will hopefully attain performance values that seem like a nice baseline starting point, after which we may look into modifying every fixed detail in order to extract the maximal performance capacity out of the network. This is commonly known as hyperparameter tuning, because it involves modifying the components of the network which need to be specified before training.

While the methods described here can yield far more tangible improvements on CIFAR-10, due to the relative difficulty of rapid prototyping on it without a GPU, we will focus specifically on improving performance on the MNIST benchmark. Of course, I do invite you to have a go at applying methods like these to CIFAR-10 and see the kinds of gains you may achieve compared to the basic CNN approach, should your resources allow for it.

We will start off with the baseline CNN given below. If you find any aspects of this code unclear, I invite you to familiarise yourself with the previous two tutorials in the series—all the relevant concepts have already been introduced there.

from keras.datasets import mnist # subroutines for fetching the MNIST dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Dense, Flatten, Convolution2D, MaxPooling2D, Dropout
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values

batch_size = 128 # in each iteration, we consider 128 training examples at once
num_epochs = 12 # we iterate twelve times over the entire training set
kernel_size = 3 # we will use 3x3 kernels throughout
pool_size = 2 # we will use 2x2 pooling throughout
conv_depth = 32 # use 32 kernels in both convolutional layers
drop_prob_1 = 0.25 # dropout after pooling with probability 0.25
drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5
hidden_size = 128 # there will be 128 neurons in both hidden layers

num_train = 60000 # there are 60000 training examples in MNIST
num_test = 10000 # there are 10000 test examples in MNIST

height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data

X_train = X_train.reshape(X_train.shape[0], depth, height, width)
X_test = X_test.reshape(X_test.shape[0], depth, height, width)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

inp = Input(shape=(depth, height, width)) # N.B. Keras expects channel dimension first
# Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer)
conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', activation='relu')(inp)
conv_2 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_1)
pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2)
drop_1 = Dropout(drop_prob_1)(pool_1)
flat = Flatten()(drop_1)
hidden = Dense(hidden_size, activation='relu')(flat) # Hidden ReLU layer
drop = Dropout(drop_prob_2)(hidden)
out = Dense(num_classes, activation='softmax')(drop) # Output softmax layer

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Train on 54000 samples, validate on 6000 samples
Epoch 1/12
54000/54000 [==============================] - 4s - loss: 0.3010 - acc: 0.9073 - val_loss: 0.0612 - val_acc: 0.9825
Epoch 2/12
54000/54000 [==============================] - 4s - loss: 0.1010 - acc: 0.9698 - val_loss: 0.0400 - val_acc: 0.9893
Epoch 3/12
54000/54000 [==============================] - 4s - loss: 0.0753 - acc: 0.9775 - val_loss: 0.0376 - val_acc: 0.9903
Epoch 4/12
54000/54000 [==============================] - 4s - loss: 0.0629 - acc: 0.9809 - val_loss: 0.0321 - val_acc: 0.9913
Epoch 5/12
54000/54000 [==============================] - 4s - loss: 0.0520 - acc: 0.9837 - val_loss: 0.0346 - val_acc: 0.9902
Epoch 6/12
54000/54000 [==============================] - 4s - loss: 0.0466 - acc: 0.9850 - val_loss: 0.0361 - val_acc: 0.9912
Epoch 7/12
54000/54000 [==============================] - 4s - loss: 0.0405 - acc: 0.9871 - val_loss: 0.0330 - val_acc: 0.9917
Epoch 8/12
54000/54000 [==============================] - 4s - loss: 0.0386 - acc: 0.9879 - val_loss: 0.0326 - val_acc: 0.9908
Epoch 9/12
54000/54000 [==============================] - 4s - loss: 0.0349 - acc: 0.9894 - val_loss: 0.0369 - val_acc: 0.9908
Epoch 10/12
54000/54000 [==============================] - 4s - loss: 0.0315 - acc: 0.9901 - val_loss: 0.0277 - val_acc: 0.9923
Epoch 11/12
54000/54000 [==============================] - 4s - loss: 0.0287 - acc: 0.9906 - val_loss: 0.0346 - val_acc: 0.9922
Epoch 12/12
54000/54000 [==============================] - 4s - loss: 0.0273 - acc: 0.9909 - val_loss: 0.0264 - val_acc: 0.9930
 9888/10000 [============================>.] - ETA: 0s




[0.026324689089493085, 0.99119999999999997]

As can be seen, our model achieves an accuracy level of 99.12% on the test set. This is slightly better than the MLP model explored in the first tutorial, but it should be easy to do even better!

The core of this tutorial will explore common ways in which a baseline neural network such as this one can often be improved (we will keep the basic CNN architecture fixed), after which we will evaluate the relative gains we have achieved.

L2 regularisation

As explained in detail in the previous tutorial, one of the primary pitfalls of machine learning is overfitting: a model may sometimes catastrophically sacrifice generalisation performance for the purposes of minimising training loss.

Previously, we introduced dropout as a very simple way to keep overfitting in check.

There are several other common regularisers that we can apply to our networks. Arguably the most popular of them is L2 regularisation (sometimes also called weight decay), which takes a more direct approach than dropout to regularising. Namely, a common underlying cause of overfitting is that our model is too complex (in terms of parameter count) for the problem and training set size at hand. A regulariser, in this sense, aims to decrease the complexity of the model while keeping the parameter count the same. L2 regularisation does so by penalising weights with large magnitudes, minimising their L2 norm, using a hyperparameter λ to specify the relative importance of minimising the norm to minimising the loss on the training set. Introducing this regulariser effectively adds a cost of $\frac{\lambda}{2}\|\vec{w}\|^2 = \frac{\lambda}{2}\sum_{i=0}^{W} w_i^2$ (the 1/2 factor is there just for nicer backpropagation updates) to the loss function $\mathcal{L}(\hat{\vec{y}}, \vec{y})$ (in our case, this was the cross-entropy loss).

Note that choosing λ properly is important. For too low values, the effect of the regulariser will be negligible; for too high values, the optimal model will set all the weights to zero. We will set λ = 0.0001 here; to add this regulariser to our model, we need an additional import, after which it’s as simple as adding a W_regularizer parameter to each layer we want to regularise:

from keras.regularizers import l2 # L2-regularisation
# ...
l2_lambda = 0.0001
# ...
# This is how to add L2-regularisation to any Keras layer with weights (e.g. Convolution2D/Dense)
conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', W_regularizer=l2(l2_lambda), activation='relu')(inp)

Network initialisation

One issue that was completely overlooked in previous tutorials (and a lot of other existing tutorials as well!) is the policy used for assigning initial weights to the various layers within the network. Clearly, this is a very important issue: simply initialising all weights to zero, for example, would significantly impede learning given that no weight would initially be active. Uniform initialisation between ±1 is also not typically the best way to go—in fact, sometimes (depending on problem and model complexity) choosing the proper initialisation for each layer could mean the difference between superb performance and achieving no convergence at all! Even if the problem does not pose such issues, initialising the weights in an appropriate way can significantly influence how easily the network learns from the training set (as it effectively preselects the initial position of the model parameters with respect to the loss function to optimise).

Here I will mention two schemes that are particularly of interest:
Xavier (sometimes Glorot) initialisation: The key idea behind this initialisation scheme is to make it easier for a signal to pass through the layer during forward as well as backward propagation, for a linear activation (this also works nicely for sigmoid activations, because the interval where they are unsaturated is roughly linear as well). It draws weights from a probability distribution (uniform or normal) with variance $\mathrm{Var}(W) = \frac{2}{n_{\mathrm{in}} + n_{\mathrm{out}}}$, where $n_{\mathrm{in}}$ and $n_{\mathrm{out}}$ are the numbers of neurons in the previous and next layer, respectively.
He initialisation: This scheme is a version of the Xavier initialisation more suitable for ReLU activations, compensating for the fact that this activation is zero for half of the possible input space. Namely, $\mathrm{Var}(W) = \frac{2}{n_{\mathrm{in}}}$ in this case.

In order to derive the required variance for the Xavier initialisation, consider what happens to the variance of the output of a linear neuron (ignoring the bias term), based on the variance of its inputs, assuming that the weights and inputs are uncorrelated, and are both zero-mean:

$$\mathrm{Var}\left(\sum_{i=1}^{n_{\mathrm{in}}} w_i x_i\right) = \sum_{i=1}^{n_{\mathrm{in}}} \mathrm{Var}(w_i x_i) = \sum_{i=1}^{n_{\mathrm{in}}} \mathrm{Var}(W)\,\mathrm{Var}(X) = n_{\mathrm{in}}\,\mathrm{Var}(W)\,\mathrm{Var}(X)$$

This implies that, in order to preserve the variance of the input after passing through the layer, it must hold that $\mathrm{Var}(W) = \frac{1}{n_{\mathrm{in}}}$. We may apply a similar argument to the backpropagation update to get that $\mathrm{Var}(W) = \frac{1}{n_{\mathrm{out}}}$. As we cannot typically satisfy these two constraints simultaneously, we set the variance of the weights to their average, i.e. $\mathrm{Var}(W) = \frac{2}{n_{\mathrm{in}} + n_{\mathrm{out}}}$, which usually works quite well in practice.
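To make the variance argument above concrete, here is a minimal numpy sketch (not from the original tutorial; the layer width of 512 and the batch of 10000 samples are arbitrary choices) that draws Xavier-uniform weights and checks empirically that a linear layer preserves the variance of zero-mean inputs when the layer has equally many inputs and outputs:

import numpy as np

n_in = n_out = 512                                   # arbitrary layer sizes, for illustration only
X = np.random.randn(10000, n_in)                     # zero-mean, unit-variance inputs

# Xavier-uniform: U(-limit, limit) with limit = sqrt(6 / (n_in + n_out)),
# which gives Var(W) = 2 / (n_in + n_out)  (equal to 1 / n_in when n_in == n_out)
limit = np.sqrt(6.0 / (n_in + n_out))
W = np.random.uniform(-limit, limit, size=(n_in, n_out))

out = X.dot(W)                                       # linear neuron outputs (bias ignored)
print(np.var(X), np.var(out))                        # both should be close to 1

He-uniform initialisation works in exactly the same way, but with limit = sqrt(6.0 / n_in) instead, yielding $\mathrm{Var}(W) = \frac{2}{n_{\mathrm{in}}}$.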

These two schemes will be sufficient for most examples you will encounter (although the orthogonal initialisation is also worth investigating in some cases, particularly when initialising recurrent neural networks). Adding a custom initialisation to a layer is simple: you only need to specify an init parameter for it, as described below. We will be using the uniform He initialisation (he_uniform) for all ReLU layers and the uniform Xavier initialisation (glorot_uniform) for the output softmax layer (as it is effectively a generalisation of the logistic function for multiple inputs).

# Add He initialisation to a layer
conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(inp)
# Add Xavier initialisation to a layer
out = Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)

Batch normalisation

If there’s one technique I would like you to pick up and readily use upon reading this tutorial, it has to be batch normalisation, a method for speeding up deep learning pioneered by Ioffe and Szegedy in early 2015, already accumulating 560 citations on arXiv! It is based on a really simple failure mode that impedes efficient training of deep networks: as the signal propagates through the network, even if we normalised it in our input, in an intermediate hidden layer it may well end up completely skewed in both mean and variance properties (an effect called the internal covariate shift by the original authors), meaning that there will be potentially severe discrepancies between gradient updates across different layers. This requires us to be more conservative with our learning rate, and apply stronger regularisers, significantly slowing down learning.

Batch normalisation’s answer to this is really quite simple: normalise the activations of a layer to zero mean and unit variance, across the current batch of data being passed through the network (this means that, during training, we normalise across batch_size examples, and during testing, we normalise using statistics derived from the entire training set, as the testing data cannot be seen upfront). Namely, we compute the mean and variance statistics for a particular batch of activations $\mathcal{B} = \{x_1, \ldots, x_m\}$ as follows:

$$\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_{\mathcal{B}})^2$$

We then use these statistics to transform the activations so that they have zero mean and unit variance across the batch, as follows:

$$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \varepsilon}}$$

where ε > 0 is a small “fuzz” parameter designed to protect us from dividing by zero (in the event that the batch standard deviation is very small or even zero). Finally, to obtain the final activations y, we need to make sure that we haven’t lost any generalisation properties by performing the normalisation; since the operations we performed on the original data were a scale and a shift, we allow for an arbitrary scale and shift on the normalised values to obtain the final activations (this allows the network, for example, to fall back to the original values if it finds this to be more useful):

$$y_i = \gamma \hat{x}_i + \beta$$

where β and γ are trainable parameters of the batch normalisation operation (they can be optimised via gradient descent on the training data). This generalisation also means that batch normalisation can often be usefully applied directly to the inputs of a neural network (given that the presence of these parameters allows the network to assume a different input statistic to the one we selected through manual preprocessing of the data).
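For concreteness, the transform above amounts to only a few lines of numpy; the following is a rough sketch of the training-time computation for a single batch of activations (gamma and beta are held fixed here purely for illustration, whereas the real operation learns them):

import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalise to zero mean, unit variance
    return gamma * x_hat + beta             # arbitrary scale and shift

acts = np.random.randn(128, 64) * 3.0 + 5.0      # a skewed batch of activations
normed = batch_norm(acts)
print(normed.mean(axis=0)[:3], normed.var(axis=0)[:3])  # roughly 0 and 1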

This method, when applied to the layers within a deep convolutional neural network, almost always achieves significant success with its original design goal of speeding up training. Even further, it acts as a great regulariser, allowing us to be far more careless with our choice of learning rate, L2 regularisation strength and use of dropout (sometimes making it completely unnecessary). This regularisation occurs as a consequence of the fact that the output of the network for a single example is no longer deterministic (it depends on the entire batch within which it is contained), which helps the network generalise more easily.

A final piece of analysis: while the authors of batch normalisation suggest performing it before applying the activation function of the neuron (on the computed linear combinations of the input data), recently published results suggest that it might be more beneficial (and at least as good) to do it after, which is what we will be doing within this tutorial.

Adding batch normalisation to our network is simple in Keras; it is expressed by a BatchNormalization layer, to which we may provide a few parameters, the most important of which is axis (the axis of the data along which the statistics should be computed). Specifically, when dealing with the inputs to convolutional layers, we usually want to normalise across individual channels, and therefore we set axis = 1.

from keras.layers.normalization import BatchNormalization # batch normalisation
# ...
inp_norm = BatchNormalization(axis=1)(inp) # apply BN to the input (N.B. need to rename here)
# conv_1 = Convolution2D(...)(inp_norm)
conv_1 = BatchNormalization(axis=1)(conv_1) # apply BN to the first conv layer

Data augmentation

While the previously discussed methods have all tuned the model specification, it is often useful to consider data-driven fine-tuning as well—especially when dealing with image recognition tasks.

Imagine that we trained a neural network on handwritten digits which all, roughly, had the same bounding box, and were nicely oriented. Now consider what happens when someone presents the network with a slightly shifted, scaled and rotated version of a training image to test on: its confidence in the correct class is bound to drop. We would ideally want to instruct the model to remain invariant under feasible levels of such distortions, but our model can only learn from the samples we provided to it, given that it performs a kind of statistical analysis and extrapolation from the training set!

Luckily, there is a very simple remedy to this problem which is often quite effective, especially on image recognition tasks: artificially augment the data with distorted versions during training! This means that, prior to feeding an example to the network for training, we apply any transformations to it that we find appropriate, thereby allowing the network to directly observe their effects on the data and teaching it to behave well on such examples. For illustration purposes, here are a few shifted/scaled/sheared/rotated examples of MNIST digits:

shift shift shear shift & scale rotate & scale

Keras provides a really nice interface to image data augmentation by way of the ImageDataGenerator class. We initialise the class by providing it with the kinds of transformations we want performed to every image, and then feed our training data through the generator, by way of performing a call to its fit method followed by its flow method, returning an (infinitely-extending) iterator across augmented batches. There is even a custom model.fit_generator method which will directly perform training of our model using this iterator, simplifying the code significantly! A slight downside is that we now lose the validation_split parameter, meaning we have to separate the validation dataset ourselves, but this adds only four extra lines of code.

For this tutorial, we will apply random horizontal and vertical shifts to the data. ImageDataGenerator also provides us with methods for applying random rotations, scales, shears and flips. These should all also be sensible transformations to attempt, except for the flips, due to the fact that we are not ever expecting to receive flipped handwritten digits from a person.

from keras.preprocessing.image import ImageDataGenerator # data augmentation
# ... after model.compile(...)
# Explicitly split the training and validation sets
X_val = X_train[54000:]
Y_val = Y_train[54000:]
X_train = X_train[:54000]
Y_train = Y_train[:54000]

datagen = ImageDataGenerator(
            width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1) # randomly shift images vertically (fraction of total height)
datagen.fit(X_train)

# fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit
model.fit_generator(datagen.flow(X_train, Y_train,
                        batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=num_epochs,
                        validation_data=(X_val, Y_val),
                        verbose=1)

Ensembles

One interesting aspect of neural networks that you might observe when using them for classification (on more than two classes) is that, when trained from different initial conditions, they will express better discriminative properties for different classes, and will tend to get more confused on others. On the MNIST example, you might find that a single trained network becomes very good at distinguishing a three from a five, but as a consequence does not learn to distinguish ones from sevens properly; while another trained network could well do the exact opposite.

This discrepancy may be exploited through an ensemble method—rather than building just one model, build several copies of it (with different initial values), and average their predictions on a particular input to obtain the final answer. Here we will perform this across three separate models. The difference between the two architectures can be easily visualised by a diagram such as the one below (plotted within Keras!):

Baseline (with BN) Ensemble

Keras once again provides us with a way to effectively do this at minimal expense to code length – we may wrap the constituent models’ constructions in a loop, extracting just their outputs for a final three-way merge layer.

from keras.layers import merge # for merging predictions in an ensemble
# ...
ens_models = 3 # we will train three separate models on the data
# ...
inp_norm = BatchNormalization(axis=1)(inp) # Apply BN to the input (N.B. need to rename here)

outs = [] # the list of ensemble outputs
for i in range(ens_models):
    # conv_1 = Convolution2D(...)(inp_norm)
    # ...
    outs.append(Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)) # Output softmax layer

out = merge(outs, mode='ave') # average the predictions to obtain the final output

Early stopping

I will discuss one further method here, as an introduction to a much wider area of hyperparameter optimisation. Namely, thus far we have utilised the validation dataset just to monitor training progress, which is arguably wasteful (given that we don’t do anything constructive with this data, other than observe successive losses on it). In fact, validation data represents the primary platform for evaluating hyperparameters of the network (such as depth, neuron/kernel numbers, regularisation factors, etc). We may imagine running our network with different combinations of hyperparameters we wish to optimise, and then basing our decisions on their performance on the validation set. Keep in mind that we may not observe the test dataset until we have irrevocably committed ourselves to all hyperparameters, given that otherwise features of the test set may inadvertently flow into the training procedure! This is sometimes known as the golden rule of machine learning, and breaking it was a common failure of many early approaches.

Perhaps the simplest use of the validation set is for tuning the number of epochs, through a procedure known as early stopping; simply stop training once the validation loss hasn’t decreased for a fixed number of epochs (a parameter known as patience). As this is a relatively small benchmark which saturates quickly, we will have a patience of five epochs, and increase the upper bound on epochs to 50 (which will likely never be reached).

Keras supports early stopping through an EarlyStopping callback class. Callbacks are methods that are called after each epoch of training, upon supplying a callbacks parameter to the fit or fit_generator method of the model. As usual, this is very concise, adding only a single line to our program.

from keras.callbacks import EarlyStopping
# ...
num_epochs = 50 # we iterate at most fifty times over the entire training set
# ...
# fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit
model.fit_generator(datagen.flow(X_train, Y_train,
                        batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=num_epochs,
                        validation_data=(X_val, Y_val),
                        verbose=1,
                        callbacks=[EarlyStopping(monitor='val_loss', patience=5)]) # adding early stopping

Just show me the code! (also, how well does it do?)

With these six techniques applied to our original baseline, the final version of your code should look something like the following:

from keras.datasets import mnist # subroutines for fetching the MNIST dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Dense, Flatten, Convolution2D, MaxPooling2D, Dropout, merge
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values
from keras.regularizers import l2 # L2-regularisation
from keras.layers.normalization import BatchNormalization # batch normalisation
from keras.preprocessing.image import ImageDataGenerator # data augmentation
from keras.callbacks import EarlyStopping # early stopping

batch_size = 128 # in each iteration, we consider 128 training examples at once
num_epochs = 50 # we iterate at most fifty times over the entire training set
kernel_size = 3 # we will use 3x3 kernels throughout
pool_size = 2 # we will use 2x2 pooling throughout
conv_depth = 32 # use 32 kernels in both convolutional layers
drop_prob_1 = 0.25 # dropout after pooling with probability 0.25
drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5
hidden_size = 128 # there will be 128 neurons in both hidden layers
l2_lambda = 0.0001 # use 0.0001 as a L2-regularisation factor
ens_models = 3 # we will train three separate models on the data

num_train = 60000 # there are 60000 training examples in MNIST
num_test = 10000 # there are 10000 test examples in MNIST

height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data

X_train = X_train.reshape(X_train.shape[0], depth, height, width)
X_test = X_test.reshape(X_test.shape[0], depth, height, width)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

# Explicitly split the training and validation sets
X_val = X_train[54000:]
Y_val = Y_train[54000:]
X_train = X_train[:54000]
Y_train = Y_train[:54000]

inp = Input(shape=(depth, height, width)) # N.B. Keras expects channel dimension first
inp_norm = BatchNormalization(axis=1)(inp) # Apply BN to the input (N.B. need to rename here)

outs = [] # the list of ensemble outputs
for i in range(ens_models):
    # Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer), applying BN in between
    conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(inp_norm)
    conv_1 = BatchNormalization(axis=1)(conv_1)
    conv_2 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(conv_1)
    conv_2 = BatchNormalization(axis=1)(conv_2)
    pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2)
    drop_1 = Dropout(drop_prob_1)(pool_1)
    flat = Flatten()(drop_1)
    hidden = Dense(hidden_size, init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(flat) # Hidden ReLU layer
    hidden = BatchNormalization(axis=1)(hidden)
    drop = Dropout(drop_prob_2)(hidden)
    outs.append(Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)) # Output softmax layer

out = merge(outs, mode='ave') # average the predictions to obtain the final output

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

datagen = ImageDataGenerator(
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1)  # randomly shift images vertically (fraction of total height)
datagen.fit(X_train)

# fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit
model.fit_generator(datagen.flow(X_train, Y_train,
                        batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=num_epochs,
                        validation_data=(X_val, Y_val),
                        verbose=1,
                        callbacks=[EarlyStopping(monitor='val_loss', patience=5)]) # adding early stopping

model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Epoch 1/50
54000/54000 [==============================] - 30s - loss: 0.3487 - acc: 0.9031 - val_loss: 0.0579 - val_acc: 0.9863
Epoch 2/50
54000/54000 [==============================] - 30s - loss: 0.1441 - acc: 0.9634 - val_loss: 0.0424 - val_acc: 0.9890
Epoch 3/50
54000/54000 [==============================] - 30s - loss: 0.1126 - acc: 0.9716 - val_loss: 0.0405 - val_acc: 0.9887
Epoch 4/50
54000/54000 [==============================] - 30s - loss: 0.0929 - acc: 0.9757 - val_loss: 0.0390 - val_acc: 0.9890
Epoch 5/50
54000/54000 [==============================] - 30s - loss: 0.0829 - acc: 0.9788 - val_loss: 0.0329 - val_acc: 0.9920
Epoch 6/50
54000/54000 [==============================] - 30s - loss: 0.0760 - acc: 0.9807 - val_loss: 0.0315 - val_acc: 0.9917
Epoch 7/50
54000/54000 [==============================] - 30s - loss: 0.0740 - acc: 0.9824 - val_loss: 0.0310 - val_acc: 0.9917
Epoch 8/50
54000/54000 [==============================] - 30s - loss: 0.0679 - acc: 0.9826 - val_loss: 0.0297 - val_acc: 0.9927
Epoch 9/50
54000/54000 [==============================] - 30s - loss: 0.0663 - acc: 0.9834 - val_loss: 0.0300 - val_acc: 0.9908
Epoch 10/50
54000/54000 [==============================] - 30s - loss: 0.0658 - acc: 0.9833 - val_loss: 0.0281 - val_acc: 0.9923
Epoch 11/50
54000/54000 [==============================] - 30s - loss: 0.0600 - acc: 0.9844 - val_loss: 0.0272 - val_acc: 0.9930
Epoch 12/50
54000/54000 [==============================] - 30s - loss: 0.0563 - acc: 0.9857 - val_loss: 0.0250 - val_acc: 0.9923
Epoch 13/50
54000/54000 [==============================] - 30s - loss: 0.0530 - acc: 0.9862 - val_loss: 0.0266 - val_acc: 0.9925
Epoch 14/50
54000/54000 [==============================] - 31s - loss: 0.0517 - acc: 0.9865 - val_loss: 0.0263 - val_acc: 0.9923
Epoch 15/50
54000/54000 [==============================] - 30s - loss: 0.0510 - acc: 0.9867 - val_loss: 0.0261 - val_acc: 0.9940
Epoch 16/50
54000/54000 [==============================] - 30s - loss: 0.0501 - acc: 0.9871 - val_loss: 0.0238 - val_acc: 0.9937
Epoch 17/50
54000/54000 [==============================] - 30s - loss: 0.0495 - acc: 0.9870 - val_loss: 0.0246 - val_acc: 0.9923
Epoch 18/50
54000/54000 [==============================] - 31s - loss: 0.0463 - acc: 0.9877 - val_loss: 0.0271 - val_acc: 0.9933
Epoch 19/50
54000/54000 [==============================] - 30s - loss: 0.0472 - acc: 0.9877 - val_loss: 0.0239 - val_acc: 0.9935
Epoch 20/50
54000/54000 [==============================] - 30s - loss: 0.0446 - acc: 0.9885 - val_loss: 0.0226 - val_acc: 0.9942
Epoch 21/50
54000/54000 [==============================] - 30s - loss: 0.0435 - acc: 0.9890 - val_loss: 0.0218 - val_acc: 0.9947
Epoch 22/50
54000/54000 [==============================] - 30s - loss: 0.0432 - acc: 0.9889 - val_loss: 0.0244 - val_acc: 0.9928
Epoch 23/50
54000/54000 [==============================] - 30s - loss: 0.0419 - acc: 0.9893 - val_loss: 0.0245 - val_acc: 0.9943
Epoch 24/50
54000/54000 [==============================] - 30s - loss: 0.0423 - acc: 0.9890 - val_loss: 0.0231 - val_acc: 0.9933
Epoch 25/50
54000/54000 [==============================] - 30s - loss: 0.0400 - acc: 0.9894 - val_loss: 0.0213 - val_acc: 0.9938
Epoch 26/50
54000/54000 [==============================] - 30s - loss: 0.0384 - acc: 0.9899 - val_loss: 0.0226 - val_acc: 0.9943
Epoch 27/50
54000/54000 [==============================] - 30s - loss: 0.0398 - acc: 0.9899 - val_loss: 0.0217 - val_acc: 0.9945
Epoch 28/50
54000/54000 [==============================] - 30s - loss: 0.0383 - acc: 0.9902 - val_loss: 0.0223 - val_acc: 0.9940
Epoch 29/50
54000/54000 [==============================] - 31s - loss: 0.0382 - acc: 0.9898 - val_loss: 0.0229 - val_acc: 0.9942
Epoch 30/50
54000/54000 [==============================] - 31s - loss: 0.0379 - acc: 0.9900 - val_loss: 0.0225 - val_acc: 0.9950
Epoch 31/50
54000/54000 [==============================] - 30s - loss: 0.0359 - acc: 0.9906 - val_loss: 0.0228 - val_acc: 0.9943
10000/10000 [==============================] - 2s





[0.017431972888592554, 0.99470000000000003]

Our updated model achieves an accuracy of 99.47% on the test set, a significant improvement over the baseline performance of 99.12%. Of course, for such a small and (comparatively) simple problem as MNIST, the gains might not immediately seem that important. Applying these techniques to problems like CIFAR-10 (provided you have sufficient resources) can yield much more tangible benefits.

I invite you to work on this model even further: specifically, take advantage of the validation data for more than just early stopping: use it to evaluate various kernel sizes/counts, hidden layer sizes, optimisation strategies, activation functions, numbers of networks in the ensemble, etc., and see how you compare to the best of the best (at the time of writing this post, the top-ranked model achieves an accuracy of 99.79% on the MNIST test set).

Conclusion

Throughout this post we have covered six methods that can help further fine-tune the kinds of deep neural networks discussed in the previous two tutorials:
– L2 regularisation
– Initialisation
– Batch normalisation
– Data augmentation
– Ensemble methods
– Early stopping

and successfully applied them to a baseline deep CNN model within Keras, achieving a significant improvement on MNIST, all in under 90 lines of code.

This is also the final topic of the series. I hope that what you’ve learnt here is enough to provide you with an initial drive that, combined with appropriate targeted resources, should see you become a bleeding-edge deep learning engineer in no time!

If you have any feedback on the series as a whole, wishes for future tutorials, or would just like to say hello, feel free to email me: petar [dot] velickovic [at] cl [dot] cam [dot] ac [dot] uk. Hopefully, there will be more tutorials to come from me on this subject, perhaps focusing on topics such as recurrent neural networks or deep reinforcement learning.

Thank you!

Exploring Deep Learning on Satellite Data

This is a guest post, originally posted at the Fast Forward Labs blog, by Patrick Doupe, now at the Arnhold Institute for Global Health. In this post Patrick describes his Insight project, undertaken in consultation with Fast Forward Labs during the Winter 2016 NYC Data Science session.

Machines are getting better at identifying objects in images. These technologies are used to do more than organize your photos or chat with your family and friends using snappy augmented pictures and movies. Some companies are using them to better understand how the world works. Be it by improving forecasts of Chinese economic growth from satellite images of construction sites or by estimating deforestation, algorithms and data can help provide useful information about the current and future states of society.

In early 2016, I developed a prototype of a model to predict population from satellite images. This extends existing classification tasks, which ask whether something exists in an image. In my prototype, I ask how much of something that is not directly visible is in an image. The regression task is difficult; current advice is to turn any regression problem into a classification task. But I wanted to aim higher. After all, satellite images look different across populated and non-populated areas.

Populated Region
Empty region

The prototype was developed in conjunction with Fast Forward Labs, as my project in the Insight Data Science program. I trained convolutional neural networks on LANDSAT satellite imagery to predict Census population estimates. I also learned all of this, from understanding what a convolutional neural network is, to dealing with satellite images, to building a website, with four weeks of support and mentorship from Fast Forward Labs and Insight Data Science. If I can do this within a few weeks, your data scientists too can take your project from idea to prototype in a short amount of time.

LANDSAT-landstats

Counting people is an important task. We need to know where people are in order to provide government services like health care and to develop infrastructure like school buildings. There are also constitutional reasons for a Census, which I’ll leave to Sam Seaborn.

We typically get this information from a Census or other government surveys like the American Community Survey. These are not perfect measures. For example, the inaccuracies are biased against those who are likely to use government services.

If we could develop a model that estimates the population well at the community level, we could help government services better target those in need. The model could also help governments that face resource constraints preventing them from running a census. Also, if it works for counting humans, then maybe it could work for estimating other socio-economic statistics, and maybe even help provide universal internet access. So much promise!

So much reality

Satellite images are huge. To keep the project manageable I chose two US States that are similar in their environmental and human landscape; one State for model training and another for model testing. Oregon and Washington seemed to fit the bill. Since these states were chosen based on their similarity, I thought I would stretch the model by choosing a very different state as a tougher test. I’m from Victoria, Australia, so I chose this glorious region.

Satellite images are also messy and full of interference. To minimize this issue and focus on the model, I chose the LANDSAT Top Of Atmosphere (TOA) annual composite satellite image for 2010. This image is already stitched together from satellite images with minimal interference. I obtained the satellite images from the Google Earth Engine. I began with low resolution images (1km) and lowered my resolution in each iteration of the model.

For the Census estimates, I wanted the highest spatial resolution, which is the Census block. A typical Census block contains between 600 and 3,000 people and covers roughly a city block. To combine these datasets I assigned each pixel its geographic coordinates and merged each pixel with its Census population estimate using various Python geospatial tools. This took enough time that I dropped the bigger plans; better to get something complete than a half-baked idea.
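As a rough sketch of that merging step (not the project's actual code; the file name, column name, and coordinates below are made up for illustration), a point-in-polygon lookup with geopandas and shapely looks like this:

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical inputs: a Census block shapefile with a 'population' column,
# and the geographic coordinates assigned to one satellite pixel.
blocks = gpd.read_file('census_blocks.shp')
pixel = Point(-122.68, 45.52)

# Find the block that contains this pixel and read off its population estimate.
containing = blocks[blocks.geometry.contains(pixel)]
population = containing['population'].iloc[0] if len(containing) else None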

A very high level overview of training Convolutional Neural Networks

The problem I faced is a classic supervised learning problem: train a model on satellite images to predict census data. Then I could use standard methods, like linear regression or neural networks. For every pixel there is a number corresponding to the intensity of various light bandwidths. We then have a number of features equal to the number of bandwidths times the number of pixels. Sure, we could do some more complicated feature engineering, but the basic idea could work, right?

Not really. You see, a satellite image is not a collection of independent pixels. Each pixel is connected to other pixels and this connection has meaning. A mountain range is connected across pixels and human built infrastructure is connected across pixels. We want to retain this information. Instead of modeling pixels independently, we need to model pixels in connection with their neighbors.

Convolutional neural networks (hereafter, “convnets”) do exactly this. These networks are super powerful at image classification, with many models reporting better accuracy than humans. For the problem of estimating population numbers, we can swap the loss function and run a regression.
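To make that concrete, here is a minimal Keras sketch of what such a regression convnet could look like. This is an illustration only, not the architecture used in the project; the input size and layer widths are assumptions.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Hypothetical input: a 64x64 satellite tile with 3 spectral bands.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(1))  # single linear output: predicted (log) population density

# Swapping cross-entropy for mean squared error turns the classifier into a regressor.
model.compile(optimizer='adam', loss='mse')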

Diagram of a simple convolutional neural network processing an input image. From Fast Forward Labs report on Deep Learning: Image Analysis

Training the model

Unfortunately convnets can be hard to train. First, there are a lot of parameters to set in a convnet: how many convolutional layers? Max-pooling or average-pooling? How do I initialize my weights? Which activations? It’s super easy to get overwhelmed. My contact at Fast Forward Labs suggested using VGGNet as a starting base for a model. For other parameters, I based the network on what seemed to be the current best practices. I learned these by following this winter’s convnet course at Stanford.

Second, convnets take a lot of time and data to train, we’re talking weeks here. I want fast results for a prototype. One option is to use pre-trained models, like those available at the Caffe model zoo. I was writing my model using the Keras python library, which at present doesn’t have as large a zoo of models. Instead, I chose to use a smaller model and see if the results pointed in a promising direction.

Results

To validate the model, I used data from Washington, USA and Victoria, Australia. I show the model's accuracy in the following scatter plot of the model's predictions against reality. The unit of observation is the small image used by the network, and I estimate the population density in each image. Since each image covers roughly the same area (complication: the earth is round), this is the same as estimating population. Last, the data is quasi log-normalized.

Phew. Let’s start with Washington:

We see that the model is picking up the signal. Higher actual population densities are associated with higher model predictions. Also noticeable is that the model struggles to estimate regions of zero population density. The R^2 of the model is 0.74; that is, the model explains about 74 percent of the spatial variation in population. This is up from 26 percent over the four weeks at Insight.
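For reference, the R^2 reported here is the usual coefficient of determination, which can be computed with scikit-learn; the numbers below are illustrative only, not the project's data:

from sklearn.metrics import r2_score

# Toy log-scaled population densities: actual vs. predicted.
y_true = [0.0, 1.2, 2.5, 3.1, 4.0]
y_pred = [0.3, 1.0, 2.2, 3.4, 3.6]

# R^2 = 1 - SS_res / SS_tot; a value of 0.74 means about 74% of the variance is explained.
print(r2_score(y_true, y_pred))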

A harder test is a region like Victoria, with a different natural and built environment. The scatter plot of model performance shows the reduced performance. The model's inability to pick out regions of low population is more apparent here. Not only does the model struggle with areas of zero population, it also predicts higher population for low-population areas. Nevertheless, with an R^2 of 0.63, the overall fit is good for a harder test.

An interesting outcome is that the regression estimates are quite similar for both Washington and Victoria: the model consistently underestimates reality. In sample, we also have a model that underestimates population. Given that the images are unlikely to have enough information to identify human settlements at current resolution, it’s understandable that the model struggles to estimate population in these regions.

Model results

Conclusion

LANDSAT-landstats was an experiment to see if convnets could estimate objects they couldn’t ‘see.’ Given project complexity, the timeframe, and my limited understanding of the algorithms at the outset, the results are promising. We’re not at a stage to provide precise estimates of a region’s population, but with improved image resolution and advances in our understanding of convnets, we may not be far away.

Interested in transitioning to a career in data science? Find out more about the Insight Data Science Fellows Program in New York and Silicon Valley, apply today, or sign up for program updates.

Already a data scientist or engineer? Find out more about our Advanced Workshops for Data Professionals. Register for two-day workshops in Apache Spark and Data Visualization, or sign up for workshop updates.


This startup uses machine learning and satellite imagery to predict crop yields


Artificial intelligence + nanosatellites + corn


Cloudless: Open Source Deep Learning Pipeline for Orbital Satellite Data


Introduction

I’m proud to announce the 1.0 release of Cloudless, an open source computer vision pipeline for orbital satellite data, powered by data from Planet Labs and using deep learning under the covers. This blog post contains details and a technical report on the project.


The Cloudless project was born during Dropbox’s week-long Hack Week in August 2015 with contributions from Johann Hauswald, Max Nova, and myself; I’ve continued working on the project post-Hack Week to bring it to a 1.0 release, generally during the weekly colearning groups I run.

What is Planet Labs?


Figure 1 Two Planet Labs “Doves” in orbit

Planet Labs is a startup company that has launched fleets of nanosats, each about the size of a shoebox, to image much of the earth daily. They've launched about eighty of their nanosats, called Doves, into Low Earth Orbit, both via the International Space Station (ISS) and as secondary payloads. Doves are built from commercially available hardware and can be placed into orbit quickly, resulting in much lower prices per satellite and allowing much more rapid iteration in what is known as "agile aerospace." Planet Labs' approach is in contrast to traditional geospatial imaging satellites, which cost billions of dollars, take five to ten years to develop, and are launched into geosynchronous orbit.


Figure 2 Two Planet Labs Doves being launched from the ISS

Motivation

Once Planet Labs' full fleet is deployed, 90% of the earth's surface will be imaged once daily. This will result in a flood of visual data greater than any person can efficiently process. Can we have computers do the detection and localization of items in the orbital satellite data instead?

Interesting applications of automated visual satellite detection include counting the number of cars in Walmart parking lots as a correlate of daily consumer demand, or detecting deforestation from illegal logging operations. However, Planet Labs' resolution is currently three to five meters, which is not fine enough for counting cars, and publicly available deforestation data sets are not yet suitable.

For this reason we focused on automated cloud detection, hence the name Cloudless (which is also a pun on the fact that the project originated during Hack Week at Dropbox, a cloud-based company; who says deep learning can't be funny?).

Detecting clouds in orbital satellite imagery, in order to ignore or eliminate them, is an important pre-processing step for doing interesting work with nanosat imagery: we want to detect changes over time, such as cars appearing, forests disappearing, etc. Being able to first detect and eliminate clouds (which change often and could lead to false positives) is therefore important.

Note that even though the Cloudless pipeline is currently focused on cloud detection and localization, the entire pipeline focuses on what would be needed for any visual orbital detection and localization task and with tweaking can be applied to other problems.

Related Work

Cloud detection and elimination are by no means solved problems. However, there are currently two broad non-deep learning approaches for cloud detection and elimination that help with the problem.

The first involves detailed, hand-rolled feature extraction pipelines that attempt to extract relevant pixel-level details from infrared, thermal, multispectral bands, etc. and then use various thresholds that are sometimes location, day, and season-specific. They are complicated and not necessarily universal. In addition, the Planet Labs satellites only currently work in the visual spectrum (except for a few satellites gained through a recent acquisition named RapidEye, which can detect in the near infrared), which means the multispectral and thermal bands some of these techniques depend on are not present. A good survey of these hand-engineered approaches is in [1].

The second approach involves taking a stack of satellite imagery of the same location over many days and “averaging” them together. This will by definition remove the clouds from the image, leaving consistent details such as cities, roads, surface details, etc. behind. Unfortunately, it also removes much of the “changes over time” information, stripping away what makes Planet Labs imagery so compelling. A good example of the stacking approach is in an open source library from Planet Labs themselves named plcompositor [2].

Approach & Results

Convolutional neural nets (CNNs) have shown surprisingly strong results in image classification tasks in recent years and seem to be a natural fit for this problem. Once trained, a CNN can be used to do object localization to draw bounding boxes around candidate detected items. However, supervised training of CNNs depends on large amounts of training data, which does not readily exist for orbital satellite data, requiring us to bootstrap it ourselves. Cloudless therefore consists of three pieces:

  • An annotation tool that takes data from the Planet Labs API and allows users to draw bounding boxes around clouds to bootstrap training data.
  • A training pipeline that takes annotated data, runs it on EC2 on GPU boxes to fine tune a neural network model using Caffe, and then generates validation statistics to relate how well the trained model performs.
  • A bounding box system that takes the trained cloud classifier and attempts to draw bounding boxes on orbital satellite data, in this case clouds.

The annotation tool is fairly straightforward; it draws imagery from the Planet Labs API, normalizes and pre-processes it, and then presents it to a user through a web browser to draw bounding boxes over candidate objects:


Figure 5 Animated GIF showing annotation tool in action, drawing bounding boxes around clouds

The training portion takes the annotated images, chops them into pieces representing images that either contain clouds or not, splits them into 80% training and 20% validation sets, and trains a binary classifier using Caffe. The model itself is a fine-tuned version of AlexNet. The open source project includes scripts to train these on GPU instances on Amazon EC2, to explore different hyperparameter settings in parallel quickly, and to download and query the trained results to understand how well training went on the validation data; see the two example graphs generated from the training pipeline.
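The split itself is a standard holdout; a minimal sketch of that step is below (using scikit-learn purely as an illustration; the actual pipeline prepares inputs for Caffe):

from sklearn.model_selection import train_test_split

def split_dataset(crops, labels):
    # crops: image patches cut from annotated scenes; labels: 1 = cloud, 0 = clear.
    # Returns an 80% training / 20% hold-out validation split.
    return train_test_split(crops, labels, test_size=0.2, random_state=42)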

Finally, the trained classifier is fed into a Python-based implementation of Selective Search [3][4] in order to draw candidate bounding boxes around clouds. As part of the work on this project, the Python Selective Search library was back-ported from Python 3 to Python 2.7 to be compatible with Caffe and the rest of the Python 2.7-based Cloudless pipeline. Here is example before-and-after output showing detected clouds from the final trained model (detailed in the Trained Model section below); detected clouds are overlaid with yellow boxes:


Figure 6 Satellite image before bounding boxes shown

Figure 7 Satellite image with detected bounding boxes from Cloudless overlaid in yellow

The bounding box system also outputs a JSON file that gives detected cloud coordinates for later use by possible downstream computer vision consumers.
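The overall flow of that bounding box stage can be sketched as follows; propose_regions() and classify_crop() are hypothetical placeholders standing in for Selective Search and the fine-tuned Caffe classifier:

import json

def detect_clouds(image, propose_regions, classify_crop):
    # For each candidate region proposed by Selective Search, crop it out and
    # keep it if the classifier thinks it contains a cloud.
    boxes = []
    for (x, y, w, h) in propose_regions(image):
        crop = image[y:y + h, x:x + w]
        if classify_crop(crop) == 'cloud':
            boxes.append({'x': x, 'y': y, 'width': w, 'height': h})
    return boxes

def save_boxes(boxes, path='clouds.json'):
    # Write detected cloud coordinates for downstream consumers.
    with open(path, 'w') as f:
        json.dump(boxes, f)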

Trained Model

It took quite a number of iterations to get a model with decent results. The major approaches are detailed in Table 1, with the final best approach bolded:


Table 1 Performance results for different model setups, detailed later

All iterations were based on a BVLC AlexNet Caffe Model Zoo trained model [6], with fine-tuning done on Cloudless annotation data. Convolutional layers were frozen during fine-tuning, with training done only on the final fully connected layer. The AlexNet ImageNet output softmax was reduced from one thousand classes to two, indicating whether a cloud is present in a given image or not. All iterations had a standard 80/20 split between training and hold-out validation data. Training for most runs converged fairly quickly, as you can see in the graph below. Finally, all iterations used location imagery restricted to the San Francisco Bay Area, as we were restricted via the Planet Labs API to the state of California.


Figure 8 Training converged very rapidly; this shows accuracy converging after a few hundred iterations and then stabilizing for the rest of the 20,000 iterations.
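The freeze-and-replace fine-tuning described above was done in Caffe; purely as an illustration of the idea, an equivalent setup in Keras (with VGG16 standing in for AlexNet, since Keras ships it) would look roughly like this:

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

# Load an ImageNet-pretrained base network and freeze its convolutional layers.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# Replace the 1000-way ImageNet head with a 2-way cloud / no-cloud softmax.
x = Flatten()(base.output)
out = Dense(2, activation='softmax')(x)
model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])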

700 images from the Planet Labs API were hand-labelled via the annotation tool and used for training; the final accuracy was fairly low, only 62.5%. This was discovered to be caused not necessarily by the small amount of data, but rather by fairly extensive "motion blur" affecting some of the satellite imagery. An example is shown below:


Figure 9 Motion blur in Planet Labs imagery

Despite this, the bounding boxes generated were still in the realm of plausibility, though with some false positives caused by the motion blur. An example from the earlier image:


Figure 10 Planet Labs motion blur image with detected cloud bounding boxes in yellow

Getting to a better accuracy involved three things:

1) About two months after Cloudless was started, Planet Labs purchased a fleet of satellites named RapidEye. The data from these are clearer than what the Planet Labs satellites currently produce. An example image:


Figure 11 Example RapidEye imagery showing clearer resolution without motion blur

2) The same location, over 65 days, was annotated and fed into the network, allowing the network to 'learn' what is constant in a location and what changes. Here are four days of the larger-format imagery that was fed into the annotation tool, as an example:


Figure 12 A single day from the San Francisco Bay Area

Figure 13 A single day from the San Francisco Bay Area

Figure 14 A single day from the San Francisco Bay Area

Figure 15 A single day from the San Francisco Bay Area

In effect, the network was getting examples of what the San Francisco Bay Area looks like when it's clear and when it's cloudy, allowing it to generalize from this.

3) More images were hand-labelled with the annotation tool, totalling about 4000 images.

These three steps added up to a much higher final accuracy of 89.69% on the validation data, with much stronger precision and recall than the earlier run. While it's difficult to say exactly how the hand-engineered, non-deep learning cloud detection pipelines detailed in Related Work above and in [1] perform, as that is fairly dependent on location and hand-chosen thresholds, general accuracy is reported to be in the 80% to 90% range, making Cloudless generally competitive with them. Unlike those other solutions, however, Cloudless could be fed with multi-class annotated data and trained for a wide variety of tasks without a hand-rolled feature engineering pipeline focused just on clouds.

For the final best solution, see the final confusion matrix and the details on the raw data fed into the neural network training pipeline.

Experiments were done to attempt to get beyond 89.69% accuracy via greater data augmentation. All iterations used Caffe's built-in cropping and mirroring transformations during training; experiments were also done with manually rotating all labelled images by 90 degrees in each of four directions during data preparation, to see if this aided training. Counterintuitively, performance was actually lower, as detailed in Table 1 above.
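A small sketch of that offline rotation augmentation (my own illustration; the hookup to Caffe's data layer is omitted):

import numpy as np
from PIL import Image

def rotations(path):
    # Yield the original image plus its 90, 180, and 270 degree rotations.
    img = np.array(Image.open(path))
    for k in range(4):
        yield np.rot90(img, k)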

Limitations

Since Cloudless uses only visible spectrum imagery, it does not work with night-time images. In addition, Cloudless has not been trained or tested on regions with snow, which has been reported to cause issues with other cloud detection schemes.

In addition, the Selective Search bounding box scheme chosen is fairly computationally intensive. On my MacBook Pro laptop with a 2.5 GHz Intel Core i7 and an NVIDIA GPU, for example, processing a single image took about two minutes. This is not terrible for Planet Labs' use case, as a given location would only be imaged once a day, and the task can be parallelized fairly easily.

Unfortunately, generating the bounding boxes depends on a number of hyperparameters for Selective Search that are not always generic across input images and require some hand tuning for a given location; it's not always "hands free" yet. Example hyperparameter values are given on the Cloudless GitHub page. This motivates some of the ideas in the "Future Work" section below.

Future Work

A good future step is to eliminate Selective Search's computational slowness and hand-crafted hyperparameter tuning from the equation. One possibility is to turn the convolutional neural network currently in Cloudless into a deconvolutional neural network [7], feeding in a raw image and having the output be a pixel mask where each value is the class of that pixel, whether it's a cloud or some other desired classification and localization. This might allow the network itself to learn what hyperparameter "thresholds" are appropriate for different input images, eliminating the kind of hand tuning the Selective Search bounding boxes currently require, as well as providing pixel-level classification.

To increase accuracy in general, it's clear from non-deep learning cloud detection techniques that non-visible spectral bands can be important for inferring the presence of clouds or other items. When a user annotates a satellite image in the annotation tool in the visual spectrum, other side-channel non-visual bands, such as infrared, thermal, etc., could be saved and also fed into the network. As Planet Labs integrates more spectral bands into their satellites, this will become more of an option.

In addition, we should be able to feed in the latitude, longitude, altitude, day and time of the year, and the position of the sun for each pixel as input into the network. This is clearly information that a human would use to identify what something is. For example, if you were told that an image came from the North Pole, you would probably classify a white patch as snow if you were unsure, while if you were told it came from the equator in a tropical forest, you'd assign a very high probability to a blurry white patch being a cloud. If you weren't sure what a white patch was while looking at an orbital image of NYC, you'd probably have different answers if you knew it was winter or summer. It seems only natural to give these same features to a neural network to help it decide when it's dealing with "boundary" cases where it's not quite sure; having this information would allow it to take an essentially Bayesian approach to figuring out what it is looking at and make an informed guess.

Finally, the annotation tool itself can be extended to run on Mechanical Turk to bootstrap even more data, which would probably be necessary if the deconvolution approach above is taken, as it has more free parameters to train. If this is done, the annotation tool will have to be extended with a training step to gauge how well labellers do, as well as a validation step to run labelled data by other groups of users to gauge its quality, as in [8]. Once this is done, and once Planet Labs has satellite data with greater resolution, the tool can be used to bootstrap multi-class data sets with examples of cars, roads, power lines, ocean tankers, buildings, different biomes, etc.

Conclusion

In this blog post I’ve introduced Cloudless, a three-part open source orbital satellite pipeline that provides annotation tools, a deep learning component, and a bounding box system. It is currently focused on cloud detection and localization, but could be extended for other non-cloud tasks in the future. The final trained model has an accuracy of 89.69% on cloud detection, generally competitive with hand-tuned non-deep learning cloud detection pipelines.

One strong aspect of deep learning is that the future enhancements proposed in "Future Work" above should help with end-to-end learning of most computer vision oriented orbital satellite tasks, not just cloud detection. Rather than just improving cloud detection, as with other non-deep learning approaches that rely on manual feature engineering, investments in deep learning can help improve a wide range of computer vision tasks beyond the problem at hand.

The GitHub page for Cloudless can be found here.


Special thanks to Johann Hauswald and Max Nova for their help on Cloudless during Hack Week, to Planet Labs for graciously giving us access to their data and API, and to Dropbox for making space for Hack Week and Cloudless.

Bibliography

[1] Gary Jedlovec, “Automated Detection of Clouds in Satellite Imagery”, NASA Marshall Space Flight Center

[2] Frank Warmerdam, Pixel Lapse Compositor (plcompositor)

[3] J. R. R. Uijlings et al., Selective Search for Object Recognition, IJCV, 2013

[4] Koen van de Sande et al., Segmentation As Selective Search for Object Recognition, ICCV, 2011

[5] Brad Neuberg, fork of Selective Search

[6] Caffe Model Zoo

[7] Hyeonwoo Noh et al., Learning Deconvolution Network for Semantic Segmentation

[8] Hao Su et al., Crowdsourcing Annotations for Visual Object Detection



Introducing the PiAQ


Recently, we’ve gotten a lot of buzz for a project we’re really excited about here in the labs – our PiAQ. It’s something we’re glad to see get attention, because we think it’s going to be a valuable tool to increase the quality of life for a lot of people.

The PiAQ is an indoor air quality sensor built to fit on a $35 credit-card-sized computer called a Raspberry Pi. Raspberry Pis are great for educational programs or for do-it-yourself projects like our sensor. The PiAQ uses that small computer to power air quality monitors, as well as to run software (like our Rosetta Home, home-monitoring software in development here in the lab) that can interpret and visualize what those monitors are tracking.

The Raspberry Pi provides a great platform for us to build upon, as it's a trusted low-cost alternative to many of the commercial devices that exist in the Internet of Things space. Since the Pi is compatible with a variety of devices and can run many different types of software, it's the perfect candidate for starting to build up your smart home. There have been several Pi-based smart home DIY kits, including some with environmental sensors.

The goal for this project is to make information about the air people breathe more accessible. While the prevailing thought has been that outdoor air quality, especially in cities, is worse than indoor air quality, the reverse can actually be true. In fact, the EPA estimates that indoor air quality can be two to five times worse than outdoors in some places, which is especially troubling considering we spend most of our time indoors.

As well as helping people monitor their own indoor air quality, we also built the PiAQ as a launching point for a larger project we’ve been working on: Rosetta Home 2.0. Rosetta Home is a whole-home automation system that will include indoor air quality sensing. By building on top of the Pi, we not only are able to use the Pi’s existing hardware and software, but we also bring the project forward into the Pi community at large. By opening the project up to this community, we get one of the largest groups of “beta-testers” possible; people who are passionate about technology and interested in helping us make the best sensor we can. To date, there have been over 10 million Raspberry Pi units sold.

At CRT Labs, we're excited about what open-source hardware and software can do for emerging technologies. Currently, you can view our GitHub repository for the PiAQ (and our other projects) and download our hardware schematics as well as our software. You can build your own PiAQ, or modify the software to your needs. It also allows you to help us debug the code, or find any flaws in our hardware. The more eyes we have on our projects, the quicker we can iterate them.


It's great that we have this community open to us, not just to create this product, but to allow us to use what we learn from the PiAQ to expand our indoor air quality sensing even further. We're working on developing stand-alone sensors that can be networked together to give you a sense of the indoor air quality of your whole home. That stat above, where the EPA says your home's air can be up to five times worse than outdoors, can affect your daily life. For example, NAR's CTO was curious about his home's air quality when his wife complained of frequent headaches. He brought home an indoor air quality (IAQ) sensor and found out his home's CO2 was above recommended levels. To counter this, he started opening the windows at night and running his whole-house fan, and soon after, his family's headaches disappeared.

Our IAQs will measure not only CO2, but temperature, humidity, barometric pressure, light and sound intensity, volatile organic compounds (VOCs), CO, and NO2. Temperature, humidity, barometric pressure, and light and sound intensity all contribute to your home’s comfort levels. When it’s too humid, you know to run a dehumidifier; if your baby can’t sleep, you can check the sound levels to see if maybe the party next door is louder than you thought. CO, CO2, and NO2 can actually cause short and long-term health effects. In the short-term, these pollutants can cause headaches, drowsiness, sinus issues, and light-headedness; in the long-term, they have serious consequences, especially when exposure lasts for hours at a time. If you know about what levels these gases occur within your home, you’re able to start mitigating them, like our CTO did when CO2 reached high levels in his house.

The PiAQ will be available to purchase starting in Q1 2017. We are looking for research partnerships; email us and tell us about your projects, and we can work together to see how the PiAQ can fit your needs.

The PiAQ is the exciting first step in beginning to create an ecosystem where the home’s health is monitored just like we monitor our own fitness. The FitBit got people talking about their own health – we now all know that 10,000 steps is a good goal to maintain our body’s fitness. Our goal is to get people talking about the home in the same way. We think about what we do in the lab in terms of our REALTOR® members’ code of ethics: “Under all is the land. Upon its wise utilization and widely allocated ownership depend the survival and growth of free institutions and of our civilization.” We are striving to change how people think about their homes, and by making the home’s health a priority, we can help to positively impact everyone’s lives.


39 Open Source Swift UI Libraries For iOS App Development


This is an "amazing" series of open source projects.

Developed by Apple Inc., Swift is currently the most popular programming language on GitHub, and it has one of the most active communities that kindly contribute their open source projects.

Open source libraries can be sweet and they can make your life dramatically easier in building your iOS apps. For those iOS folks spending hours and days hunting for good libraries, you may find this post useful.

Mybridge AI evaluates the quality of content and ranks the best articles for professionals. In this observation we compared nearly 2,700 open source Swift UI libraries to select the Top 39. With only a 1.4% chance of being included in the list, the average number of GitHub stars was 2,527.

This is specific to Swift "UI" (User Interface) libraries, broken down into 14 groups: Animation, Transition, Popup, Feed, Onboarding, Color, Image, Graph, Icon, Schedule, Form, Layout, Message, and Search.

If you’re looking for open source Swift “Apps”, follow this link.

<Animation UI>

No 1

Spring: A library to simplify iOS animations in Swift. [9164 stars on Github].


No 2

Material: An animation and graphics framework that is used to create beautiful applications [6120 stars on Github].


No 3

RazzleDazzle: A simple keyframe-based animation framework for iOS, written in Swift. Perfect for scrolling app intros [2291 stars on Github].


No 4

Stellar: A fantastic Physical animation library for swift [1881 stars on Github].


No 5

Macaw: Powerful and easy-to-use vector graphics Swift library with SVG support [594 stars on Github].

<Transition UI>

No 6

PagingMenuController: Paging view controller with customizable menu in Swift [1305 stars on Github].


No 7

PreviewTransition: A simple preview gallery controller [1025 stars on Github].


No 8

PinterestSwift: Transition like Pinterest in Swift [1007 stars on Github].


No 9

Youtube iOS app written in swift 3 [786 stars on Github].


No 10

Twicket Segmented Control: Custom UISegmentedControl replacement for iOS, written in Swift [680 stars on Github].

<Pop up UI>

No 11

SCLAlertView-Swift: Beautiful animated Alert View written in Swift [3056 stars on Github].


No 12

SwiftMessages: Very flexible alert messages written in Swift. [1356 stars on Github].


No 13

XLActionController: Fully customizable and extensible action sheet controller written in Swift 3 [1346 stars on Github].


No 14

Popover: Balloon pop up library like Facebook app, written in pure swift. [852 stars on Github].


No 15

Presentr: Wrapper for custom ViewController presentations [635 stars on Github].

<Feed UI>

No 16

FoldingCell: An expanding content cell inspired by folding paper material [4285 stars on Github].


No 17

ExpandingCollection: A card peek/pop controller [2425 stars on Github].


No 18

DGElasticPullToRefresh: Elastic pull to refresh compontent written in Swift [2308 stars on Github].


No 19

Persei: Animated top menu for UITableView / UICollectionView / UIScrollView written in Swift [2269 stars on Github].


No 20

SCLAlertView-Swift: Beautiful animated Alert View written in Swift by Instagram Engineering. [2443 stars on Github].


No 21

PullToMakeSoup: Custom animated pull-to-refresh that can be easily added to UIScrollView [1301 stars on Github].

<Onboarding UI>

No 22

DZNEmptyDataSet: Empty State UI Library [6552 stars on Github].


No 23

Instructions: Create walkthroughs and guided tours in Swift. [2256 stars on Github].


No 24

Presentation: Make tutorials, release notes and animated pages [1680 stars on Github].

<Color UI>

No 25

Chameleon: Flat Color Framework for Swift Developers [7071 stars on Github].


No 26

Hue: All-in-one coloring utility that you’ll ever need to write in Swift [1612 stars on Github].


No 27

DynamicColor: Yet another extension to manipulate colors easily in Swift [1310 stars on Github].

<Image UI>

No 28

FaceAware: An extension that gives UIImageView the ability to focus on faces within an image when using AspectFill [1424 stars on Github].


No 29

ComplimentaryGradientView: Create complementary gradients generated from dominant and prominent colors in supplied image [384 stars on Github].

<Graph UI>

No 30

Charts: Beautiful charts for iOS built in Swift [11433 stars on Github].


No 31

aper-switch: A Swift module which paints over the parent view when the switch is turned on. [3065 stars on Github].

<Icon UI>

No 32

Paper Switch: RAMPaperSwitch is a Swift module which paints over the parent view when the switch is turned on. [1849 stars on Github].


No 33

Circle-menu: CircleMenu is a simple, elegant menu with a circular layout [1768 stars on Github].

<Schedule UI>

No 34

JTAppleCalendar: The Unofficial Swift Apple Calendar Library. View. Control. for iOS & tvOS [1026 stars on Github].


No 35

DateTimePicker: A nicer iOS UI component for picking date and time [455 stars on Github].

<Form UI>

No 36

Eureka: Elegant iOS form builder in Swift [4117 stars on Github].

<Layout UI>

No 37

Neon: A powerful Swift programmatic UI layout framework for iPhone & iPad [3439 stars on Github].

<Message UI>

No 38

NMessenger: A fast, lightweight messenger component built on AsyncDisplaykit and written in Swift [1492 stars on Github].

<Search UI>

No 39

Reel-search: A search controller that allows you to choose options from a list [1364 stars on Github].

<Resources>

No 1) Learn

The Complete iOS 10 Developer Course: Build 21 Apps including Uber, Instagram & Tinder.

[22,575 recommends, 4.7/5 star]

No 2) Interview

Software Engineer Interview Unleashed: Learn from a former Google interviewer.

[210 recommends, 4.8/5 rating]

No 3) Hosting

For those looking to host a website in under 5 minutes

[One of the cheapest]


Tech Weekly Vol.10 – React Native | Learn Once, Write Anywhere


With the introductory issue (Vol.8 – React, "a 5-minute quick start") and the intermediate issue (Vol.9 – Level Up! React) of the past two weeks behind us, our month-long study of React is nearly complete. Next, we move into the final stage: React Native.

This issue focuses on React Native, from getting started through project practice. We hope its contents give you a big-picture understanding of the overall React landscape.

Getting started with React Native

When picking up a new technology, the official documentation is naturally the most thorough resource. Even after reading the official docs, though, we often still fall into all kinds of pitfalls once we start using it ourselves, so we have curated the following articles to help you avoid those pitfalls while getting started with React Native.

ChanceKing – My first React Native build, or how I waited until the flowers wilted

The author likes React Native because it changes the traditional perception of front-end work and extends its reach: it is not limited to the H5 world but can also move into native clients, raising the front end's profile and giving people something to be excited about. In this article the author records the pitfalls hit while building a React Native project for the first time; if you are also about to pick up React Native, follow along and give it a try.

听海 JamiE – A guide to React Native basics, part 1 and part 2

How is an iOS app developed with React Native? If you are curious too, get Mac OS X, Xcode, node and npm ready, run npm install -g react-native-cli and react-native init AwesomeProject in the terminal, and start by displaying a poster, then work through mock data, rendering, and fetching and displaying live data from an API.

陈学家_6174 – React Native layout

Width units and pixel density, flex layout, image layout, absolute and relative positioning, text elements... detailed explanations with concise summaries of each feature help you master React Native layout with ease.

陈学家_6174 – Blending React-Native and React-Web

On using React Native in practice, Facebook's official position is that React Native provides a common development approach across platforms, not a single codebase used everywhere. With that in mind, the author uses a practical example (a simple demo based on SampleApp) to explore how feasible sharing code actually is.

cnsnake11 – An incremental upgrade scheme for React Native

When code or images are modified, the app only needs to use the new bundle file and assets folder to complete an over-the-air update. Building on that idea, this article walks through a solution for incremental upgrades.

DesGemini – A first look at React Native SVG based on the react-art library

On mobile, given cross-platform requirements and the accumulated experience on the web side, react-art has become a ready-made solution for drawing graphics, with support for react-art added on both iOS and Android. Here the author offers an introductory document (a world first? =_=).

静逸秋水 – Small tips for React Native development

Many React Native developers come from a front-end background and habitually approach problems from a web perspective, but React Native has its own limitations and does not support that many properties and styles. Drawing on personal development practice, the author summarizes problems you are likely to meet and gives small code references that will hopefully help.

DesGemini – A survival guide for React Native development in the wild

React Native's growth has been spectacular, but its documentation updates lag far behind the pace of development, making React Native engineering feel like surviving in the wilderness. The author has spent nearly half a year developing a React Native app for a commercial project and has written up the experience of falling into and climbing out of pits, hoping this guide will help others.

React Native for Android

This section is selected from 侯医生's "React Native Android" series. The tutorials explain React Native with a stronger tie to native Android, starting from setting up a React Native Android environment and going deeper layer by layer; by following along you can become proficient at react-native-android development.

1. Setting up a React Native Android environment

There are many articles about setting up React Native, but most only cover the JS side and do not tie in native Android and its development. This article briefly introduces how to use Android Studio together with React Native to set up a basic React environment and use it to produce a working hello world.

2. Creating a simple RN app (RN from a JS point of view)

Having produced a hello world, we explore how to use React Native to build an interface slightly more complex than Hello World, and along the way examine the directory structure of a React Native Android project.

4. Calling Java code from JS in RN

Having mastered part 3, controlling navigation between native Android activities, we now call that native functionality from JS. With the two combined, you can develop React Native applications much more efficiently.

6. React.js fundamentals in React Native

Much React Native material is about styles, internals, and environment setup; here we study the fundamentals of React.js: our code, the components we create, and other related knowledge.

8. Implementing the mobile Baidu feed stream

After working through the articles above, you will have a basic command of writing React Native styles and laying out screens. In this section we do a hands-on example, imitating the mobile Baidu news feed to consolidate your React Native skills.

Project showcases

ctriptech – React Native in practice: Ctrip's Moles framework

This talk uses the Moles framework to introduce Ctrip's hands-on React Native experience, hoping to offer some inspiration. It covers three areas:

  • What role does the Moles framework play in integrating React Native with our main app?
  • How does the Moles framework bridge Android, iOS, H5 and SEO, letting one codebase run on multiple platforms?
  • What are the components of the Moles framework and how does it work?

静逸秋水 – Building a circular progress bar with React Native

Progress bars are conventionally drawn with canvas or with SVG. How do you write such a progress bar in React Native? Follow the author and try out this approach to completing the progress bar.

DesGemini – Node.js + React Native graduation project: development notes for an agricultural IoT monitoring system

The IoT monitoring system is divided into three layers: a database layer, a server layer and a client layer. Besides the existing Oracle 11d database, the database layer adds a Redis database. The server layer uses Node.js's Express framework as the API backend for the client. The client layer is almost entirely React Native code, with Redux used to unify the app's event dispatching and UI data management. Come and experience the buffs React Native brings.

王铁手 – A first taste of React Native: writing an iOS app with JavaScript

Having already written a hybrid app and feeling it was not enough, the author got the itch to play with React Native. It is a very simple introduction; perhaps your own first impressions will line up with the author's.

(End of this issue)


Raspberry Pi Weather Station: Monitoring Humidity, Temperature and Pressure over Internet

Raspberry Pi Weather Station: Monitoring Humidity, Temperature and Pressure over ThingSpeak

Humidity, temperature and pressure are three basic parameters for building any weather station and for measuring environmental conditions. We have previously built a mini weather station using Arduino, and this time we are extending the weather station with a Raspberry Pi. This IoT-based project aims to show the current humidity, temperature and pressure on an LCD as well as on an internet server using the Raspberry Pi, which makes it a Raspberry Pi weather station. You can install this setup anywhere and monitor the weather conditions of that place from anywhere in the world over the internet; it will not only show the current data but can also show past values in the form of graphs.

 

We have used a DHT11 humidity and temperature sensor for sensing temperature and humidity, and a BMP180 pressure sensor module for measuring barometric pressure. This Celsius-scale thermometer and percentage-scale humidity meter displays the ambient temperature and humidity on an LCD, and the barometric pressure is displayed in millibar or hPa (hectopascal). All this data is sent to the ThingSpeak server for live monitoring from anywhere in the world over the internet. Do check the demonstration video and Python program given at the end of this tutorial.

Raspberry-pi-weather-station-IoT-project

Working and ThingSpeak Setup:

This IoT-based project has four sections. First, the DHT11 sensor senses the humidity and temperature data and the BMP180 sensor measures the atmospheric pressure. Second, the Raspberry Pi reads the DHT11 module's output using a single-wire protocol and the BMP180 pressure sensor's output using the I2C protocol, converting both sensors' values into suitable numbers: percentage (humidity), Celsius (temperature), and hectopascal or millibar (pressure). Third, these values are sent to the ThingSpeak server using the built-in Wi-Fi of the Raspberry Pi 3. Finally, ThingSpeak analyzes the data and shows it in graph form. An LCD is also used to display these values locally.

Raspberry-pi-weather-station-block-diagram

ThingSpeak provides a very good tool for IoT-based projects. Using the ThingSpeak site, we can monitor our data and control our system over the internet, using the channels and webpages provided by ThingSpeak. ThingSpeak 'collects' the data from the sensors, 'analyzes and visualizes' the data and 'acts' by triggering a reaction. We have previously explained sending data to ThingSpeak in detail; you can check that tutorial. Here we briefly explain how to use ThingSpeak for this Raspberry Pi weather station.

First you need to create an account on the ThingSpeak website and create a 'New Channel' in it. In the new channel you have to define some fields for the data you want to monitor; in this project we will create three fields, for the humidity, temperature and pressure data.

 

Now click on the 'API Keys' tab and save the Write and Read API keys; here we are only using the Write key. You need to copy this key into the 'key' variable in the code.

Raspberry-pi-weather-station-thingspeak-API-key

After that, click on 'Data Import/Export' and copy the 'Update Channel Feed' GET request URL, which is:

https://api.thingspeak.com/update?api_key=30BCDSRQ52AOI3UA&field1=0

Raspberry-pi-weather-station-thingspeak-feed-GET-url

Now we need this 'Update Channel Feed' GET request URL in our Python code to open api.thingspeak.com and send data using this feed request as a query string. Before sending, the program inserts the temperature, humidity and pressure values into this query string using variables; see the code at the end of this article.

URL = 'https://api.thingspeak.com/update?api_key=%s' % key
finalURL = URL +"&field1=%s&field2=%s"%(humi, temp)+"&field3=%s" %(pressure)

The DHT11 uses single-wire serial communication for fetching data. Here we have used the Adafruit DHT11 library for interfacing the DHT11 with the Raspberry Pi. The Raspberry Pi collects the humidity and temperature data from the DHT11 and the atmospheric pressure from the BMP180 sensor, and then sends it to the 16×2 LCD and the ThingSpeak server. ThingSpeak displays the data in graph form as below:

Raspberry-pi-weather-station-humidity-temperature-pressure-charts

DHT11 Humidity & Temperature Sensor and BMP180 Pressure Sensor

You can learn more about DHT11 Sensor and its Interfacing with Arduino here.

Circuit Diagram:

Raspberry-pi-weather-station-circuit-diagram

 

Raspberry Pi Configuration and Python Program:

We are using the Python language here for the program. Before coding, you need to configure the Raspberry Pi. You can check our previous tutorials on getting started with the Raspberry Pi and on installing and configuring the Raspbian Jessie OS on the Pi.

First of all, we need to install the Adafruit Python DHT sensor library to run this project on the Raspberry Pi. To do this, run the following commands:

sudo apt-get install git-core
sudo apt-get update
git clone https://github.com/adafruit/Adafruit_Python_DHT.git
cd Adafruit_Python_DHT
sudo apt-get install build-essential python-dev
sudo python setup.py install

Installing-adafruit-python-DHT11-library-in-raspberry-Pi

After that, you need to enable I2C on the Raspberry Pi by going into the RPi Software Configuration Tool:

sudo raspi-config

Then go to 'Advanced Options', select 'I2C' and enable it.

raspberry-pi-software-configuration-tool-raspi-config

Raspberry-pi-weather-station-Enable-I2C-for-BMP180

The programming part of this project plays a very important role in performing all the operations. First of all we include all required libraries, initialize variables and define the pins for the LCD and DHT11.

import sys
import RPi.GPIO as GPIO
import os
import Adafruit_DHT
import urllib2
import smbus
import time
from ctypes import c_short

#Register Address
regCall   = 0xAA
... ......
 ..... ...

In the def main(): function, the code below is used for sending the data to the server and displaying it on the LCD, continuously in a while loop.

def main():

    print 'System Ready...'
    URL = 'https://api.thingspeak.com/update?api_key=%s' % key
    print "Wait...."
    while True:
            (humi, temp)= readDHT()
            (pressure) =readBmp180()

            lcdcmd(0x01)
            lcdstring("Humi#Temp#P(hPa)")
            lcdstring(humi+'%'+"  %sC  %s" %(temp, pressure))
            finalURL = URL +"&field1=%s&field2=%s"%(humi, temp)+"&field3=%s" %(pressure)
            print finalURL
            s=urllib2.urlopen(finalURL);
            print  humi+ " " + temp + " " + pressure
            s.close()
            time.sleep(10)

For the LCD, the def lcd_init() function initializes the LCD in four-bit mode, def lcdcmd(ch) sends a command to the LCD, def lcddata(ch) sends data to the LCD, and def lcdstring(Str) sends a data string to the LCD. You can check all these functions in the code given afterwards.

The def readDHT() function is used for reading the DHT11 sensor:

def readDHT():
    humi, temp = Adafruit_DHT.read_retry(Adafruit_DHT.DHT11, DHTpin)
    return (str(int(humi)), str(int(temp)))

The def readBmp180 function is used for reading pressure from the BMP180 sensor. The BMP180 can also give temperature, but here we only use it for calculating pressure.

def readBmp180(addr=deviceAdd):
  value = bus.read_i2c_block_data(addr, regCall, 22)  # Read calibration data

  # Convert byte data to word values
  AC1 = convert1(value, 0)
  AC2 = convert1(value, 2)
  AC3 = convert1(value, 4)
  AC4 = convert2(value, 6)
  ..... .......
  ........ ......

So this is the basic Raspberry Pi weather station; you can further extend it to measure various weather-related parameters like wind speed, soil temperature, illuminance (lux), rainfall, air quality, etc.

Code:

import sys
import RPi.GPIO as GPIO
import os
import Adafruit_DHT
import urllib2
import smbus
import time
from ctypes import c_short

# BMP180 register addresses
regCall   = 0xAA
regMean   = 0xF4
regMSB    = 0xF6
regLSB    = 0xF7
regPres   = 0x34
regTemp   = 0x2e

DEBUG = 1
sample = 2
deviceAdd = 0x77

humi = ""
temp = ""

#bus = smbus.SMBus(0)  # for the Pi 1 use bus 0
bus = smbus.SMBus(1)   # for the Pi 2 and later use bus 1

DHTpin = 17

key = "30BCDSRQ52AOI3UA"       # Enter your Write API key from ThingSpeak

GPIO.setmode(GPIO.BCM)
GPIO.setwarnings(False)

# Define GPIO to LCD mapping
LCD_RS = 18
LCD_EN = 23
LCD_D4 = 24
LCD_D5 = 16
LCD_D6 = 20
LCD_D7 = 21

GPIO.setup(LCD_EN, GPIO.OUT)
GPIO.setup(LCD_RS, GPIO.OUT)
GPIO.setup(LCD_D4, GPIO.OUT)
GPIO.setup(LCD_D5, GPIO.OUT)
GPIO.setup(LCD_D6, GPIO.OUT)
GPIO.setup(LCD_D7, GPIO.OUT)

def convert1(data, i):   # signed 16-bit value
    return c_short((data[i] << 8) + data[i + 1]).value

def convert2(data, i):   # unsigned 16-bit value
    return (data[i] << 8) + data[i + 1]

def readBmp180(addr=deviceAdd):
    value = bus.read_i2c_block_data(addr, regCall, 22)  # Read calibration data

    # Convert byte data to word values
    AC1 = convert1(value, 0)
    AC2 = convert1(value, 2)
    AC3 = convert1(value, 4)
    AC4 = convert2(value, 6)
    AC5 = convert2(value, 8)
    AC6 = convert2(value, 10)
    B1  = convert1(value, 12)
    B2  = convert1(value, 14)
    MB  = convert1(value, 16)
    MC  = convert1(value, 18)
    MD  = convert1(value, 20)

    # Read uncompensated temperature
    bus.write_byte_data(addr, regMean, regTemp)
    time.sleep(0.005)
    (msb, lsb) = bus.read_i2c_block_data(addr, regMSB, 2)
    P2 = (msb << 8) + lsb

    # Read uncompensated pressure
    bus.write_byte_data(addr, regMean, regPres + (sample << 6))
    time.sleep(0.05)
    (msb, lsb, xsb) = bus.read_i2c_block_data(addr, regMSB, 3)
    P1 = ((msb << 16) + (lsb << 8) + xsb) >> (8 - sample)

    # Refine temperature
    X1 = ((P2 - AC6) * AC5) >> 15
    X2 = (MC << 11) / (X1 + MD)
    B5 = X1 + X2
    temperature = (B5 + 8) >> 4

    # Refine pressure
    B6  = B5 - 4000
    B62 = B6 * B6 >> 12
    X1  = (B2 * B62) >> 11
    X2  = AC2 * B6 >> 11
    X3  = X1 + X2
    B3  = (((AC1 * 4 + X3) << sample) + 2) >> 2

    X1 = AC3 * B6 >> 13
    X2 = (B1 * B62) >> 16
    X3 = ((X1 + X2) + 2) >> 2
    B4 = (AC4 * (X3 + 32768)) >> 15
    B7 = (P1 - B3) * (50000 >> sample)

    P = (B7 * 2) / B4

    X1 = (P >> 8) * (P >> 8)
    X1 = (X1 * 3038) >> 16
    X2 = (-7357 * P) >> 16
    pressure = P + ((X1 + X2 + 3791) >> 4)

    return (str(pressure / 100.0))

def readDHT():
    humi, temp = Adafruit_DHT.read_retry(Adafruit_DHT.DHT11, DHTpin)
    return (str(int(humi)), str(int(temp)))

def lcd_init():
    lcdcmd(0x33)
    lcdcmd(0x32)
    lcdcmd(0x06)
    lcdcmd(0x0C)
    lcdcmd(0x28)
    lcdcmd(0x01)
    time.sleep(0.0005)

def lcdcmd(ch):
    GPIO.output(LCD_RS, 0)
    GPIO.output(LCD_D4, 0)
    GPIO.output(LCD_D5, 0)
    GPIO.output(LCD_D6, 0)
    GPIO.output(LCD_D7, 0)
    if ch & 0x10 == 0x10:
        GPIO.output(LCD_D4, 1)
    if ch & 0x20 == 0x20:
        GPIO.output(LCD_D5, 1)
    if ch & 0x40 == 0x40:
        GPIO.output(LCD_D6, 1)
    if ch & 0x80 == 0x80:
        GPIO.output(LCD_D7, 1)
    GPIO.output(LCD_EN, 1)
    time.sleep(0.0005)
    GPIO.output(LCD_EN, 0)

    # Low bits
    GPIO.output(LCD_D4, 0)
    GPIO.output(LCD_D5, 0)
    GPIO.output(LCD_D6, 0)
    GPIO.output(LCD_D7, 0)
    if ch & 0x01 == 0x01:
        GPIO.output(LCD_D4, 1)
    if ch & 0x02 == 0x02:
        GPIO.output(LCD_D5, 1)
    if ch & 0x04 == 0x04:
        GPIO.output(LCD_D6, 1)
    if ch & 0x08 == 0x08:
        GPIO.output(LCD_D7, 1)
    GPIO.output(LCD_EN, 1)
    time.sleep(0.0005)
    GPIO.output(LCD_EN, 0)

def lcddata(ch):
    GPIO.output(LCD_RS, 1)
    GPIO.output(LCD_D4, 0)
    GPIO.output(LCD_D5, 0)
    GPIO.output(LCD_D6, 0)
    GPIO.output(LCD_D7, 0)
    if ch & 0x10 == 0x10:
        GPIO.output(LCD_D4, 1)
    if ch & 0x20 == 0x20:
        GPIO.output(LCD_D5, 1)
    if ch & 0x40 == 0x40:
        GPIO.output(LCD_D6, 1)
    if ch & 0x80 == 0x80:
        GPIO.output(LCD_D7, 1)
    GPIO.output(LCD_EN, 1)
    time.sleep(0.0005)
    GPIO.output(LCD_EN, 0)

    # Low bits
    GPIO.output(LCD_D4, 0)
    GPIO.output(LCD_D5, 0)
    GPIO.output(LCD_D6, 0)
    GPIO.output(LCD_D7, 0)
    if ch & 0x01 == 0x01:
        GPIO.output(LCD_D4, 1)
    if ch & 0x02 == 0x02:
        GPIO.output(LCD_D5, 1)
    if ch & 0x04 == 0x04:
        GPIO.output(LCD_D6, 1)
    if ch & 0x08 == 0x08:
        GPIO.output(LCD_D7, 1)
    GPIO.output(LCD_EN, 1)
    time.sleep(0.0005)
    GPIO.output(LCD_EN, 0)

def lcdstring(Str):
    for i in range(len(Str)):
        lcddata(ord(Str[i]))

lcd_init()
lcdcmd(0x01)
lcdstring("Circuit Digest")
lcdcmd(0xc0)
lcdstring("Welcomes you")
time.sleep(3)  # 3 second delay

# main() function
def main():
    print 'System Ready...'
    URL = 'https://api.thingspeak.com/update?api_key=%s' % key
    print "Wait...."
    while True:
        (humi, temp) = readDHT()
        (pressure) = readBmp180()

        lcdcmd(0x01)
        lcdstring("Humi#Temp#P(hPa)")
        lcdstring(humi + '%' + "  %sC  %s" % (temp, pressure))
        finalURL = URL + "&field1=%s&field2=%s" % (humi, temp) + "&field3=%s" % (pressure)
        print finalURL
        s = urllib2.urlopen(finalURL)
        print humi + " " + temp + " " + pressure
        s.close()
        time.sleep(10)

if __name__ == "__main__":
    main()

Video:


How to build an autonomous, voice-controlled, face-recognizing drone for $200


More adventures in deep learning and cheap hardware.

October 25, 2016

Early aeronautics, 1818 (source: Library of Congress on Wikimedia Commons).

After building an image-classifying robot, the obvious next step was to make a version that can fly. I decided to construct an autonomous drone that could recognize faces and respond to voice commands.

Choosing a prebuilt drone

One of the hardest parts about hacking drones is getting started. I got my feet wet first by building a drone from parts, but like pretty much all of my DIY projects, building from scratch ended up costing me way more than buying a prebuilt version—and frankly, my homemade drone never flew quite right. It’s definitely much easier and cheaper to buy than to build.

Most of the drone manufacturers claim to offer APIs, but there’s not an obvious winner in terms of a hobbyist ecosystem. Most of the drones with usable-looking APIs cost more than $1,000—a huge barrier to entry.

But after some research, I found the Parrot AR Drone 2.0 (see Figure 1), which I think is a clear choice for a fun, low-end, hackable drone. You can buy one for $200 new, but so many people buy drones and never end up using them that a secondhand drone is a good option and available widely on eBay for $130 or less.

drone collection
Figure 1. The drone collection in my garage. The Parrot AR drone I used is hanging on the far left. Source: Lukas Biewald.

The Parrot AR drone doesn’t fly quite as stably as the much more expensive (about $550) new Parrot Bebop 2 drone, but the Parrot AR comes with an excellent node.js client library called node-ar-drone that is perfect for building onto.

Another advantage: the Parrot AR drone is very hard to break. While testing the autonomous code, I crashed it repeatedly into walls, furniture, house plants, and guests, and it still flies great.

The worst thing about hacking on drones compared to hacking on terrestrial robots is the short battery life. The batteries take hours to charge and then last for about 10 minutes of flying. I recommend buying two additional batteries and cycling through them while testing.

Programming my drone

Javascript turns out to be a great language for controlling drones because it is so inherently event driven. And trust me, while flying a drone, there will be a lot of asynchronous events. Node isn’t a language I’ve spent a lot of time with, but I walked away from this project super impressed with it. The last time I seriously programmed robots, I used C, where the threading and exception handling is painful enough that there is a tendency to avoid it. I hope someone builds Javascript wrappers for other drone platforms because the language makes it easy and fun to deal with our indeterministic world.

Architecture

I decided to run the logic on my laptop and do the machine learning in the cloud. This setup led to lower latency than running a neural network directly on Raspberry Pi hardware, and I think this architecture makes sense for hobby drone projects at the moment.

Microsoft, Google, IBM, and Amazon all have fast, inexpensive cloud machine learning APIs. In the end, I used Microsoft's Cognitive Services APIs for this project because it's the only one that offers custom facial recognition.

See Figure 2 for a diagram illustrating the architecture of the drone:

Smart Drone Architecture
Figure 2. The Smart Drone Architecture. Source: Lukas Biewald.

Getting started

By default, the Parrot AR Drone 2.0 serves a wireless network that clients connect to. This is incredibly annoying for hacking. Every time you want to try something, you need to disconnect from your network and get on the drone’s network. Luckily, there is a super useful project called ardrone-wpa2 that has a script to hack your drone to join your own WiFi network.

It’s fun to Telnet into your drone and poke around. The Parrot runs a stripped down version of Linux. When was the last time you connected to something with Telnet? Here’s an example of how you would open a terminal and log into the drone’s computer directly.

% script/connect "The Optics Lab" -p "particleorwave" -a 192.168.0.1 -d 192.168.7.43
% telnet 192.168.7.43

Flying from the command line

After installing the node library, it’s fun to make a node.js REPL (Read-Evaluate-Print-Loop) and steer your drone:

var arDrone = require('ar-drone');
var client = arDrone.createClient({ip: '192.168.7.43'});
client.createRepl();

drone> takeoff()
true

drone> client.animate('yawDance', 1.0)

If you are actually following along, by now you've definitely crashed your drone—at least a few times. I super-glued the safety hull back together about a thousand times before it disintegrated and I had to buy a new one. I hesitate to mention this, but the Parrot AR actually flies a lot better without the safety hull. Flying without the hull does make the drone much more dangerous: when the drone bumps into something the propellers can snap, and it will leave marks in furniture.

Flying from a webpage

It’s satisfying and easy to build a web-based interface to the drone (see Figure 3). The express.js framework makes it simple to build a nice little web server:

var express = require('express');

app.get('/', function (req, res) {
 res.sendFile(path.join(__dirname + '/index.html'));
});

app.get('/land', function(req, res) {
 client.land();
});

app.get('/takeoff', function(req, res) {
 client.takeoff();
});

app.listen(3000, function () {
});

I set up a function to make AJAX requests using buttons:


<button onclick="fetch('/takeoff')">Takeoff</button>
<button onclick="fetch('/land')">Land</button>


Streaming video from the drone

I found the best way to send a feed from the drone's camera was to open up a connection and send a continuous stream of PNGs from my webserver to my website. My webserver continuously pulls PNGs from the drone's camera using the AR drone library.

var pngStream = client.getPngStream();

pngStream
 .on('error', console.log)
 .on('data', function(pngBuffer) {
       sendPng(pngBuffer);
 });

function sendPng(buffer) {
 res.write('--daboundary\nContent-Type: image/png\nContent-length: ' + buffer.length + '\n\n');
 res.write(buffer);
}

Running face recognition on the drone images

The Azure Face API is powerful and simple to use. You can upload pictures of your friends and it will identify them. It will also guess age and gender, both of which I found to be surprisingly accurate. The latency is around 200 milliseconds, and it costs $1.50 per 1,000 predictions, which feels completely reasonable for this application. See below for my code that sends an image and does face recognition.

var oxford = require('project-oxford'),
oxc = new oxford.Client(CLIENT_KEY);

loadFaces = function() {
 chris_url = "https://media.licdn.com/mpr/mpr/shrinknp_400_400/AAEAAQAAAAAAAALyAAAAJGMyNmIzNWM0LTA5MTYtNDU4Mi05YjExLTgyMzVlMTZjYjEwYw.jpg";
 lukas_url = "https://media.licdn.com/mpr/mpr/shrinknp_400_400/p/3/000/058/147/34969d0.jpg";
 oxc.face.faceList.create('myFaces');
 oxc.face.faceList.addFace('myFaces', {url: chris_url, name: 'Chris'});
 oxc.face.faceList.addFace('myFaces', {url: lukas_url, name: 'Lukas'});
}

oxc.face.detect({
 path: 'camera.png',
 analyzesAge: true,
 analyzesGender: true
}).then(function (response) {
 if (response.length > 0) {
  drawFaces(response, filename)
 }
});

I used the excellent ImageMagick library to annotate the faces in my PNGs. There are a lot of possible extensions at this point; for example, there is an emotion API that can determine the emotion of faces.
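For reference, here is a rough sketch of how that annotation step could be done by shelling out to ImageMagick's convert; this is my own illustration rather than the article's actual drawFaces, and it assumes the faceRectangle fields (left, top, width, height) of the Face API response:

var exec = require('child_process').exec;

function drawFaces(faces, filename) {
 // build one -draw primitive per detected face
 var draws = faces.map(function(face) {
   var r = face.faceRectangle;
   return '-draw "rectangle ' + r.left + ',' + r.top + ' ' +
     (r.left + r.width) + ',' + (r.top + r.height) + '"';
 }).join(' ');

 // 'annotated.png' is just a placeholder output name
 exec('convert ' + filename + ' -stroke red -fill none ' + draws + ' annotated.png',
   function(err) {
     if (err) console.log(err);
   });
}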

Running speech recognition to drive the drone

The trickiest part about doing speech recognition was not the speech recognition itself, but streaming audio from a webpage to my local server in the format Microsoft’s Speech API wants, so that ends up being the bulk of the code. Once you’ve got the audio saved with one channel and the right sample frequency, the API works great and is extremely easy to use. It costs $4 per 1,000 requests, so for hobby applications, it’s basically free.

RecordRTC is a great library, and it's a good starting point for doing client-side web audio recording.
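In the browser, a sketch along these lines captures a clip and posts it to the /audio route shown below (the five-second cutoff, the form field name, and the use of fetch are my own choices, not necessarily what the original page used):

navigator.mediaDevices.getUserMedia({ audio: true }).then(function(stream) {
 var recorder = RecordRTC(stream, { type: 'audio' });
 recorder.startRecording();

 // stop after five seconds and upload the recorded WAV blob
 setTimeout(function() {
   recorder.stopRecording(function() {
     var form = new FormData();
     form.append('file', recorder.getBlob(), 'audio.wav');
     fetch('/audio', { method: 'POST', body: form });
   });
 }, 5000);
});

On the server side, an Express route receives the upload, saves it, and hands it to the speech module: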

app.post('/audio', function(req, res) {
 var form = new formidable.IncomingForm();
 // specify that we want to allow the user to upload multiple files in a single request
 form.multiples = true;
 form.uploadDir = path.join(__dirname, '/uploads');

 form.on('file', function(field, file) {
       filename = "audio.wav"
       fs.rename(file.path, path.join(form.uploadDir, filename));
 });

 // log any errors that occur
 form.on('error', function(err) {
       console.log('An error has occurred: \n' + err);
 });

 // once all the files have been uploaded, send a response to the client
 // and hand the saved WAV file to the speech module
 form.on('end', function() {
       res.end('success');
       speech.parseWav('uploads/audio.wav', function(text) {
             console.log(text);
             controlDrone(text);
       });
 });

 // parse the incoming request containing the form data
 form.parse(req);
});

I used the FFmpeg utility to downsample the audio and combine it into one channel for uploading to Microsoft:

exports.parseWav = function(wavPath, callback) {
 var cmd = 'ffmpeg -i ' + wavPath + ' -ar 8000 -ac 1 -y tmp.wav';

 exec(cmd, function(error, stdout, stderr) {
       console.log(stderr); // ffmpeg writes its progress output to stderr
       postToOxford(callback); // only post once the conversion has finished
 });
};

While we’re at it, we might as well use Microsoft’s text-to-speech API so the drone can talk back to us!

Autonomous search paths

I used the ardrone-autonomy library to map out autonomous search paths for my drone. After crashing my drone into the furniture and houseplants one too many times in my living room, my wife nicely suggested I move my project to my garage, where there is less to break, but there isn't much room to maneuver (see Figure 3).

Flying the drone
Figure 3. Flying the drone in my “lab.” Source: Lukas Biewald.

When I get a bigger lab space, I’ll work more on smart searching algorithms, but for now I’ll just have my drone take off and rotate, looking for my friends and enemies:

var autonomy = require('ardrone-autonomy');
var mission = autonomy.createMission({ip: '10.0.1.3', frameRate: 1, imageSize: '640:320'});

console.log("Here we go!")

mission.takeoff()
         .zero()         // Sets the current state as the reference
         .altitude(1)
         .taskSync(function() { console.log("Checkpoint 1"); })
         .go({x: 0, y: 0, z: 1, yaw: 90})
         .taskSync(function() { console.log("Checkpoint 2"); })
         .hover(1000)
         .go({x: 0, y: 0, z: 1, yaw: 180})
         .taskSync(function() { console.log("Checkpoint 3"); })
         .hover(1000)
         .go({x: 0, y: 0, z: 1, yaw: 270})
         .taskSync(function() { console.log("Checkpoint 4"); })
         .hover(1000)
         .go({x: 0, y: 0, z: 1, yaw: 0})
         .land();
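Note that chaining these calls only describes the mission; in ardrone-autonomy the mission is actually executed by calling run(). Roughly (the error handling here is my own sketch):

mission.run(function (err, result) {
 if (err) {
   console.log('Mission failed: ' + err);
   mission.client().stop();
   mission.client().land();
 } else {
   console.log('Mission success!');
 }
});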

Putting it all together

Check out this video I took of my drone taking off and searching for my friend Chris:

Conclusion

Once everything is set up and you are controlling the drone through an API and getting the video feed, hacking on drones becomes incredibly fun. With all of the newly available image recognition technology, there are all kinds of possible uses, from surveying floorplans to painting the walls. The Parrot drone wasn’t really designed to fly safely inside a small house like mine, but a more expensive drone might make this a totally realistic application. In the end, drones will become more stable, the price will come down, and the real-world applications will explode.

Microsoft’s Cognitive Service Cloud APIs are easy to use and amazingly cheap. At first, I was worried that the drone’s unusually wide-angle camera might affect the face recognition and that the loud drone propeller might interfere with the speech recognition, but overall the performance was much better than I expected. The latency is less of an issue than I was expecting. Doing the computation in the Cloud on a live image feed seems like a strange architecture at first, but it will probably be the way of the future for a lot of applications.

Article image: Early aeronautics, 1818. (source: Library of Congress on Wikimedia Commons).


Deep learning


可可老师's 24-hour top shares (2016.10.13)

2016-10-13 21:12 · 612 reads
No 1. 【Berkeley深度学习专题课程】
No 2. 【YOLO实时目标检测Caffe实现】
No 3. 【致力于让高中生能读懂的高等数学教材】
No 4. 【Jupyter Notebook提示、技巧与快捷键27例】
No 5. 《GitHub 上有哪些值得推荐的开源电子书? – 知乎》
No 6. 《Hybrid computing using a neural network with dynamic external memory》
No 7. 《想知道大家都用python写过哪些有趣的脚本? – 知乎》
No 8. 《机器阅读理解任务综述》
No 9. 《深度学习的“深度”有什么意义? – 知乎》
No 10. 《麻省理工学院(MIT)研究生学习指导》
No 11. 【Keras神经网络优化教程】
No 12. 【Neural Style的TensorFlow实现】
No 13. 【面向软件工程师的机器学习每日学习计划】
No 14. 【深度学习就业指南】
No 15. 【网络爬虫大全】
No 16. 【深度学习软硬件评测】
No 17. 【Python数值计算编程】
No 18. 【九月十大热门Python文章】
No 19. 【Python变分推断】
No 20. 《如何系统地学习Python 中 matplotlib, numpy, scipy, pandas? – 知乎》
No 21. 《Feedforward semantic segmentation with zoom-out features》
No 22. 《为什么有些人除了上课时间以外都没有学习,成绩却还是很好? – 知乎》
No 23. 《[Machine Learning] Active Learning(主动学习)》
No 24. 《为什么使用jupyter? – 知乎》
No 25. “If you can’t explain something to a first-year st…
No 26. 《Book Review: Cathy O’Neil’s Weapons of Math Destruction》
No 27. 【图卷积网络】
No 28. 《国内哪些高校老师在机器学习数据挖掘方面做的不错? – 知乎》
No 29. 读书重在结构生长,形成扎实的支撑;碎片阅读重在视野的纳新和扩展,开枝散叶;思考重在提炼和关联,勾画错…
No 30. 【词嵌入概览】
No 31. 【数据科学中的拓扑】
No 32. 《目前有哪些比较成功的人工智能应用? – 知乎》
No 33. 【DQN从理论到实践】
No 34. 【人物共现关系挖掘实例】
No 35. 【P-values疑云】
No 36. 【视觉图灵测试教程】
No 37. 《Improving performance of recurrent neural network with relu nonlinearity》
No 38. 《How to do Research At the MIT AI Lab》
No 39. 【预测模型标记语言(PMML)研究】
No 40. 【源于Tesseract支持62种语言的纯Javascript字符识别(OCR)引擎Tesseract.js】
No 41. 【与OpenAI gym类似的TORCS(开放赛车模拟器)增强学习环境】
No 42. 【FaceNet人脸识别的Tensorflow实现】
No 43. 《Deep Reinforcement Learning From Raw Pixels in Doom》
No 44. 《Deep Learning with Separable Convolutions》
No 45. 《有没有一种让人欲罢不能的学习方法? – 知乎》
No 46. 《Generative Adversarial Nets from a Density Ratio Estimation Perspective》
No 47. 【轻松实现基于深度学习的高质量目标检测】
No 48. 《Learning in Implicit Generative Models》
No 49. 《Hello, TensorFlow! – Building and training your first TensorFlow graph from the ground up | O’Reilly Media》
No 50. ‘Statistical Machine Learning 10-702/36-702, Sprin…

A comprehensive roundup of 30 important deep learning libraries, organized by 10 languages including Python and C++


Author: Emmanuelle Rieuf

This article surveys deep learning libraries across a range of programming languages, including Python, Java and Haskell.

Python

  • Theano is a Python library for defining and evaluating mathematical expressions over numerical arrays. It makes it much easier to write deep learning algorithms in Python, and many other libraries are built on top of it:
  1. Keras is a minimalist, highly modular neural network library in the spirit of Torch, with Theano underneath optimizing tensor operations on CPU and GPU.
  2. Pylearn2 is a library that wraps a large number of models and training algorithms, such as stochastic gradient descent, that are widely used in deep learning; it is also built on Theano.
  3. Lasagne is a lightweight library for building and training neural networks in Theano. It aims to be simple, transparent, modular, pragmatic, focused and restrained.
  4. Blocks is a framework that helps you build neural network models on top of Theano.
  • Caffe is a deep learning framework built for expressiveness, speed and modularity. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors; Google's DeepDream image-processing program is built on Caffe. It is a BSD-licensed C++ library with a Python interface.
  • nolearn contains wrappers and abstractions around other neural network libraries, most notably Lasagne, plus some machine learning utility modules.
  • Gensim is a deep learning toolkit implemented in Python for handling large text collections with efficient algorithms.
  • Chainer bridges deep learning algorithms and their implementations; it is powerful, flexible and intuitive, and serves as a flexible framework for deep learning.
  • deepnet is a GPU-based Python implementation of deep learning algorithms such as feed-forward neural networks, restricted Boltzmann machines, deep belief networks, autoencoders, deep Boltzmann machines and convolutional neural networks.
  • Hebel is a Python library for deep learning with neural networks, using GPU acceleration with CUDA through PyCUDA. It implements the most important kinds of neural network models and offers a variety of activation functions and training methods such as momentum, Nesterov momentum, dropout and early stopping.
  • CXXNET is a fast, concise, distributed deep learning framework based on MShadow. It is a lightweight, extensible C++/CUDA neural network toolkit with friendly Python/Matlab interfaces for training and prediction.
  • DeepPy is a Pythonic deep learning framework built on top of NumPy.
  • DeepLearning is a deep learning library developed in C++ and Python.
  • Neon is Nervana's Python-based deep learning framework.

C++

  • eblearn is an open-source C++ machine learning library led by Yann LeCun's machine learning lab at New York University, notable for convolutional neural networks with energy-based models, shipped with a GUI, demos and tutorials.
  • SINGA is designed to provide a general implementation of distributed training algorithms for existing systems. It is supported by the Apache Software Foundation.
  • NVIDIA DIGITS is a system for developing, training and visualizing deep neural networks. It puts deep learning into a browser-based interface so that data scientists and researchers can quickly design the best deep neural network (DNN) for their data, with real-time visualization of network behavior.
  • Intel® Deep Learning Framework provides a unified framework to accelerate deep convolutional neural networks on Intel platforms.

Java

  • N-Dimensional Arrays for Java (ND4J) is a scientific computing library for the JVM. It is designed for production environments, meaning routines are built to run fast with minimal RAM requirements.
  • Deeplearning4j is the first commercial-grade, open-source, distributed deep learning library written for Java and Scala. It is designed for use in business settings rather than as a research tool.
  • Encog is an advanced machine learning framework that supports Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models and Genetic Algorithms.

JavaScript

  • ConvNetJS is a Javascript library for deep learning models (mainly neural networks) that runs entirely in the browser: no development tools, no compiler, no installation and no GPU required, simple and easy to use.

Lua

  • Torch is a scientific computing framework that supports a wide range of machine learning algorithms.

Julia

  • Mocha is a deep learning framework for Julia inspired by the C++ framework Caffe. Its efficient implementations of general stochastic gradient solvers and common layers can be used to train deep or shallow (convolutional) neural networks, with optional unsupervised pre-training via (stacked) auto-encoders. Key features include a modular architecture, high-level interfaces, portability and speed, and good compatibility.

Lisp

  • Lush (Lisp Universal Shell) is an object-oriented programming language designed for researchers, experimenters and engineers interested in large-scale numerical and graphics applications. It comes with rich deep learning libraries as part of its machine learning library.

Haskell

  • DNNGraph is a DSL for generating deep neural networks, written in Haskell.

.NET

  • Accord.NET is a .NET machine learning framework, complete with audio and image processing libraries, written entirely in C#. It is a complete framework for building production-grade computer vision, computer audition, signal processing and statistics applications.

R

  • The darch package can be used to build multi-layer neural networks (deep architectures). Training includes pre-training with contrastive divergence and fine-tuning with common training algorithms such as backpropagation and conjugate gradients.
  • deepnet implements several deep learning architectures and neural network algorithms, including backpropagation, RBMs, DBNs, deep autoencoders and more.

Node.js + React Native graduation project: development notes on an agricultural IoT monitoring system


The graduation project is probably one of the most thankless parts of four years of college: pick a bad topic and you can easily waste a year on something useless that teaches you nothing. Luckily, I am in a hardware department and my advisor does not know software. All he wanted was an agricultural IoT monitoring system, and all he could give me was an Oracle 11g database holding the sensor data saved by an IoT deployment over one year of operation. That's all. And because he does not know software, he is entirely results-oriented: as long as I hand over a mobile client and a server, he will not care how many dubious new technologies I use in between.

So what was there to discuss? Go! In a spirit of deliberate mischief, I decided to use the newest, most treacherous, most likely-to-be-abandoned technologies I could find and throw them together into one big geek stew, picturing the bewildered face of the future student who inherits my work. I grinned wickedly; all of this just to satisfy my appetite for shiny new things.

All of the code is open source under the GPL license:

Since the database belongs to the university lab, I cannot release the data needed to actually run the system; my apologies. In theory there should also be a document, but I have been too lazy, and I am not sure when I will get around to writing it.

Overall architecture

OK, here is a diagram of the technology stack.


The IoT monitoring system splits into three layers: the database layer, the server layer and the client layer.

Database and code layer

Besides the existing Oracle 11g database, the database layer adds a Redis database. The reasons for adding a second database:

  1. Node.js's official Oracle dependency, node-oracledb, has no ORM, so every database operation executes raw SQL, simple and crude. I worried that my weak database skills (my day job is Android development) would cause table-locking problems, so I sidestepped the issue by restricting access to read-only.
  2. The system serves the internal staff of an agricultural company, so the number of accounts and the overall data volume are limited. An in-memory store like Redis can therefore be used without worrying about the capacity disadvantages of a non-relational database, while reads are actually faster than with a traditional SQL database.
  3. Non-relational databases are simply more fun than relational ones (half joking).
  4. The Git part on the right of the diagram is there because I originally planned to use Docker to build a continuous-integration and deployment pipeline, a full continuous-delivery flow of push code => run tests automatically => update the server deployment => auto-update the client. In the end there was no time to write it.

Server layer

The server layer uses Node.js with the Express framework as the API backend for the clients. Node.js's single-threaded, asynchronous concurrency model makes a fairly high QPS easy to reach, which suits an API backend well. Its design and main functions are shown in the figure below:

Grand-sounding labels like "gateway layer: authentication module" boil down to a single line, app.use(jwt({secret: config.jwt_secret}).unless({path: ['/signin']}));. The figure is lifted straight from my thesis, and you know what theses are like, so please forgive the bit of mystification.

Client layer

The client layer is mostly React Native code, but the chart-generation feature for monitoring data (shown below) has no mature open-source React Native implementation yet, and drawing the charts in native code would mean nesting native views and React Native views inside each other, with its own likely difficulties. In the end I embedded an HTML page and drew the charts with Baidu's ECharts framework on the front end, so the final client is mostly React Native plus a little HTML5.

I also adopted Redux to unify the app's event dispatching and UI data management. If React Native goes down in history, Redux will be one of the indispensable reasons why. More on that later.

Details

Server side

Server API table:

Writing the server involves a large number of asynchronous operations: database reads, network requests, JSON parsing and so on. Business requirements often force these asynchronous operations to run in a particular order, yet the code has to stay readable; blindly nesting callbacks only leads to unreadable callback hell. And because JavaScript executes in a single-threaded environment, we also want to avoid imposing unnecessary ordering, which would hurt performance. I therefore used the Promise pattern to handle the multi-async logic, as in the code below:

function renderGraph(req, res, filtereds) {
  var x = [];
  var ys = [];
  var titles = [];

  filtereds[0].forEach(function(row) {
    x.push(getLocalTime(row.RECTIME));
  });

  filtereds.forEach(function(filtered){
    if (filtered[0] == undefined)
      // if any of the queries came back empty, fail fast
      throw new Error('数据库返回结果为空');
    var y = [];
    filtered.forEach(function(row) {
      y.push(row.ANALOGYVALUE);
    });
    ys.push(y);
    titles.push(filtered[0].DEVICENAME + ': ' + filtered[0].DEVICECODE);
  });

  res.render('graph', {
    titles: titles,
    dataX: x,
    dataY: ys,
    height: req.query.height == undefined ? 200 : req.query.height,
    width: req.query.width == undefined ? 300 : req.query.width,
  });
}

function resFilter(resolve, reject, connection, resultSet, numRows, filtered) {
  resultSet.getRows(
    numRows,
    function (err, rows)
    {
      if (err) {
        console.log(err.message);
        reject(err);
      } else if (rows.length == 0) {
        resolve(filtered);
        process.nextTick(function() {
          oracle.releaseConnection(connection);
        });
      } else if (rows.length > 0) {
        filtered.push(rows[0]);
        resFilter(resolve, reject, connection, resultSet, numRows, filtered);
      }
    }
  );
}

function createQuerySingleDeviceDataPromise(req, res, device_id, start_time, end_time) {
  return oracle.getConnection()
  .then(function(connection) {
    return oracle.execute(
      "SELECT\
        DEVICE.DEVICEID,\
        DEVICECODE,\
        DEVICENAME,\
        UNIT,\
        ANALOGYVALUE,\
        DEVICEHISTROY.RECTIME\
      FROM\
        DEVICE INNER JOIN DEVICEHISTROY\
      ON\
        DEVICE.DEVICEID = DEVICEHISTROY.DEVICEID\
      WHERE\
        DEVICE.DEVICEID = :device_id\
        AND DEVICEHISTROY.RECTIME\
        BETWEEN :start_time AND :end_time\
      ORDER\
        BY RECTIME",
      [
        device_id,
        start_time,
        end_time
      ],
      {
        outFormat: oracle.OBJECT,
        resultSet: true
      },
      connection
    )
    .then(function(results) {
      var filtered = [];
      var filterGap = Math.floor(
        (end_time - start_time) / (120 * 100)
      );
      return new Promise(function(resolve, reject) {
        resFilter(resolve, reject,
          connection, results.resultSet, filterGap, filtered);
      });
    })
    .catch(function(err) {
      res.status(500).json({
        status: 'error',
        message: err.message
      });
      process.nextTick(function() {
        oracle.releaseConnection(connection);
      });
    });
  });
}

function secureCheck(req, res) {
  let qry = req.query;

  if (
    qry.device_ids == undefined
    || qry.start_time == undefined
    || qry.end_time == undefined
  ) {
    throw new Error('device_ids或start_time或end_time参数为undefined');
  }

  if (req.query.end_time < req.query.start_time) {
    throw new Error('终止时间小于起始时间');
  }
};

router.get('/', function(req, res, next) {
  try {
    var device_ids;
    var queryPromises = [];

    secureCheck(req, res);

    device_ids = req.query.device_ids.toString().split(';');

    for(let i=0; i<device_ids.length; i++) {
      queryPromises.push(createQuerySingleDeviceDataPromise(
        req, res, device_ids[i], req.query.start_time, req.query.end_time));
    };

    Promise.all(queryPromises)
    .then(function(filtereds) {
      renderGraph(req, res, filtereds);
    }).catch(function(err) {
      res.status(500).json({
        status: 'error',
        message: err.message
      });
    })
  } catch(err) {
    res.status(500).json({
      status: 'error',
      message: err.message
    });
  }
});

This is the logic that generates a line chart for N specified sensors over a time range. The business logic requires querying the database N times, once per sensor, to obtain N arrays of value objects, and only then rendering the chart HTML page from those N series. Notice that the core outer Promise flow is concentrated in just a few lines: Promise.all(queryPromises).then(renderGraph).catch(). That is, the chart page is rendered only after all N sensors have been fetched, yet the N queries run concurrently, coordinated by Promise.all(), which makes sensible use of the machine's limited resources.

However, each Promise pushed into the queryPromises array forms its own Promise chain, and only when all of these child chains have completed can the whole all() call be considered finished. A child Promise chain can be summarized roughly as:

function() {
    return new Promise().then().catch();
}

The tricky parts are:

  1. Splitting the business logic sensibly across different then() callbacks, with only one asynchronous operation per then().
  2. Any new Promise chain created by an asynchronous call inside a function body must be attached to the parent chain with return, otherwise parts of the control flow may run in an unintended order.
  3. Every catch() must properly capture and report errors, otherwise mistakes made while writing the code can never be found and the whole asynchronous exercise grinds to a halt.
  4. The choice of Promise implementation is worth a mention. Node.js does not ship with its own Promise; the standard ECMAScript implementation can be used, but its API is rather bare, so in the end I brought in the bluebird library for Promise support (a minimal illustration follows this list).
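As a minimal illustration of what bluebird adds (a generic example, not code from this project), it can wrap a whole callback-style module in one call:

var Promise = require('bluebird');
// promisifyAll generates *Async variants of every callback-style function
var fs = Promise.promisifyAll(require('fs'));

fs.readFileAsync('config.json')
  .then(function(buffer) { console.log(buffer.length); })
  .catch(function(err) { console.log(err); });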

From all this we can see there is no such thing as high performance for free. Node.js buys its excellent concurrency with the extra complexity of asynchronous programming. Of course, you can refuse that programming complexity, skip Promise, async and the other asynchronous patterns, and let the code sink into callback hell, in which case you pay in maintenance complexity instead. Which trade-off to take is a matter of taste.

Client side

The client's main functions are listed in the table below:

Next, a quick look at the main screens. iOS clearly looks better than Android: I only fine-tuned the visuals for iOS, and moving them straight to Android makes everything look rough because the screens render colors differently. For a cross-platform React Native app it is well worth keeping two sets of color configuration files, one for each platform.

The screenshots above show the current-data page of the soil-moisture tab on Android and iOS. By default it shows the latest readings of every sensor; you can filter by sensor type or by monitoring-station ID, set the two conditions independently, and tap Search to query the server and refresh the page. Thanks to React Native's component design, a refresh only re-renders the DashBoard area at the bottom, and any MonitorView rendered last time is reused instead of being re-rendered, which lowers CPU usage. A MonitorView, the small tile for one sensor, shows from top to bottom the sensor type, the sensor ID, the current reading and its unit. Both MonitorView and Dashboard are abstracted into general, reusable components that are also used for weather information and pest monitoring, which sped up development and reduced duplicated code.

The screenshots above show the historical-data page of the soil-moisture tab on Android and iOS. Nothing is shown until a sensor type and station ID have been entered and a date and time chosen; tapping Search then fetches and displays the results. Unlike the current-data page, this page does not share all of its code between Android and iOS, because the date and time pickers are different system controls wrapped as different React Native components. The two green time-selection buttons are therefore wrapped as HistoricalDateSelectPad and kept in separate componentIOS and componentAndroid folders. The data panel at the bottom reuses the Dashboard from the current-data page.

The screenshots above show the chart-generation page of the soil-moisture tab on Android and iOS. The time pickers, search button and selectors reuse code from the previous two pages. The generated line chart is actually a web page shown in an embedded WebView: the chart page is built with the Baidu ECharts library from a pre-written front-end template on the server, Express fills in the data, and the result is sent down to the mobile client for rendering. The chart supports toggling individual series, touching a point to inspect its value, and pinch zooming.

The figure above shows how the Redux-related files are organized in the actual project. The stores folder holds the implementation of the global data store.

The actions folder holds the action-creator functions, grouped by module. In Redux, the only way to change state is through an action, and every action must be a plain JavaScript object. In practice these objects are rarely declared inline each time; they are produced by creator functions known as Action Creators.
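A minimal illustration (the names are made up, not taken from the project):

// an action type and its creator: state only ever changes by dispatching
// plain objects like the one returned here
var REFRESH_SENSOR_DATA = 'REFRESH_SENSOR_DATA';

function refreshSensorData(readings) {
  return { type: REFRESH_SENSOR_DATA, readings: readings };
}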

The reducers folder holds the individual reducers. RootReducer is the root file that stitches the other reducers together into one reducer, and a reducer is the function that applies an action's semantics to produce the state change. Reducers run synchronously: given an initial state and a sequence of actions, replaying the reducer at any time, any number of times, must produce the same new state.
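A sketch of what a module reducer and the root reducer might look like (illustrative names only):

var redux = require('redux');

// a reducer is a pure, synchronous function of (state, action)
function sensorReducer(state, action) {
  state = state || { readings: [] };
  switch (action.type) {
    case 'REFRESH_SENSOR_DATA':
      return Object.assign({}, state, { readings: action.readings });
    default:
      return state;
  }
}

// RootReducer simply stitches the module reducers together
var rootReducer = redux.combineReducers({ sensors: sensorReducer });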

Performance testing

Server

Test tools: OS X Activity Monitor (http_load)

Client

iOS

Test tool: Xcode 7.3

Android

Test tool: Android Studio 1.2.0

Code size

Brief summary

React Native has its share of development pitfalls, but its built-in cross-platform support and better-than-HTML5 mobile performance make it very fast for building apps that are not too complex; it is like carrying a permanent 2x buff.


Draw animated chart on React Native


@2016-07-18 23:43 · JavaScript, React

At Meguro.es #4 on June 21st, 2016, I talked about drawing animated charts on React Native. The talk was about the things I learned while developing a tiny app, Compare. It's a super simple app for comparing temperatures.

Before creating it, I had no idea what the temperatures in a weather forecast, like 15 degrees Celsius, actually felt like. I remember what yesterday was like, but not the numbers. Typical weather forecast apps show only future temperatures without past records. Thanks to The Dark Sky Forecast API, the app fetches both past records and future forecasts and shows them together.

Compare app

The app’s source code is on GitHub:

shuhei/Compare

There might have been some charting libraries to draw similar charts, but I like to write things from scratch. I like to reinvent the wheel especially when it’s a side project. Thanks to that, I found a way to animate smooth paths with the Animated library.
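The core of that trick can be sketched in a few lines: drive an Animated.Value and rebuild the path on every tick. This is a simplified fragment that would live inside a component, not the app's actual code; buildPath() stands in for whatever maps progress to a path string:

var Animated = require('react-native').Animated;

var progress = new Animated.Value(0);

// re-render with a freshly built path every time the animated value changes
progress.addListener(function(state) {
  this.setState({ path: buildPath(state.value) });
}.bind(this));

Animated.timing(progress, { toValue: 1, duration: 500 }).start();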

If I have to add something to the slides:


Home automation with cheap 433MHz plugs, a 1$ 433MHz transmitter, and a TP-Link TL-WR703N router


Two years ago, I started playing around with cheap 433MHz plugs that can be found almost everywhere. At that time, I got several from different brands, from the well known Chacon Di-O plugs, to the most obscure chinese/no-name ones, and my goal was to reverse engineer as much protocols as possible. I compiled the result into a little tool I called rf-ctrl (now available on my GitHub), and forgot about it. However, this summer, I needed to find a solution to remotely control my electric heaters (not because I was cold obviously, but because I had the time to do it), and thought it was time to dig up rf-ctrl with a bit of polishing (a Web UI called Home-RF).

Blue-RF4.jpg

ON/OFF Keying (OOK)

Let's first talk about OOK a little bit. Most of the cheap 433MHz plugs (but also chimes, rolling shutter controllers, thermometers …) use the ON/OFF Keying or OOK modulation. The idea is that data are sent in binary form, by alternately enabling and disabling the transmitter, and thus the carrier frequency. I found mainly two ways of doing so (a short sketch after the list makes the first one concrete):

  • coding a 0 with an ON/OFF transition that has particular ON and OFF durations, and coding a 1 with another ON/OFF transition that has other particular ON and OFF durations

Most of the plugs I found use this scheme, and this is the kind of modulation that rf-ctrl implements. This technique, which could be seen as some kind of Manchester code, allows the receiver to easily recover the clock and sync, since the carrier frequency cannot be enabled or disabled longer than a particular amount of time. The timings for a 0 are often inverted compared to those of a 1, for instance, 160µs-ON/420µs-OFF represents a 0 with the OTAX protocol, while 420µs-ON/160µs-OFF represents a 1. However, this is not systematic, and some protocols use totally different timings, for instance 260µs-ON/260µs-OFF for a 0 and 260µs-ON/1300µs-OFF for a 1 with the Di-O protocol. The data part of the frame is sometimes encapsulated between a starting marker and an ending marker. These markers are also represented with an ON/OFF transition, but with different timings. The whole frame is then repeated a specific number of times, with a delay between the frames that can also be assimilated to either a starting marker without ON state, or an ending one with a long OFF state. Last thing to note is the transition order which is often ON/OFF, but can be OFF/ON as well.

  • coding a 0 by disabling the carrier frequency for a time Tb, and coding a 1 by enabling it for the same time Tb

This is actually the “real” low-level way of doing OOK things. You can even describe the previous one that way by choosing a bit-rate (1/Tb) high enough to represent the previous ON/OFF transitions by a succession of ones and zeros that will match the timings. This kind of coding is rather found in high-end devices, like old car keys and more secure plugs/rolling shutters. It was not compatible with the HE853 dongle I had at that time, and thus is not supported by rf-ctrl. However I played with it at a point in order to control the rolling shutters and plugs from the Somfy brand, and to test TI’s CC110x transceiver, but that’s not the purpose of this post.
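To make the first scheme concrete, here is a tiny sketch (JavaScript, purely for illustration) that expands a bit string into the ON/OFF durations a transmitter would have to produce, using the OTAX timings quoted above:

// OTAX-style timings from the text: 0 = 160µs ON / 420µs OFF,
// 1 = 420µs ON / 160µs OFF
function ookExpand(bits) {
  var timings = { '0': [160, 420], '1': [420, 160] };
  var steps = [];
  for (var i = 0; i < bits.length; i++) {
    var t = timings[bits[i]];
    steps.push({ carrier: 'ON', us: t[0] });
    steps.push({ carrier: 'OFF', us: t[1] });
  }
  return steps;
}

console.log(ookExpand('0110'));

Real protocols then add the start/end markers and frame repetitions described above on top of this.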

Reversing the protocol of a 433MHz plug

To replicate a protocol, one must understand two things. The OOK timings (physical characteristics) is the first one and the easiest, while the actual data format of the frame will be the second one.

OOK timings

The easiest way to capture a frame is to use a 1$ 433MHz (433,92MHz actually) receiver connected to either an oscilloscope or a digital analyzer. You will get something like this (Sumtech protocol):

OOK_da.png

But if you do not have this kind of receiver but have an oscilloscope lying around, you can also use a simple wire of around 17 cm (= 3×10^8/(4×433×10^6) = lambda/4) connected to one of the inputs ! You will get something like this, which is enough to understand the underlying timings (Idk protocol this time):

OOK_scope.jpg

Thanks to this, you can measure the expected timings and the number of times the frame needs to be repeated. It’s time to start writing down zeros and ones on a sheet of paper.

Frame format

Now, what remains is the actual data to send. Most of the time, a frame consists of a remote ID which is the ID of the remote that sends the frame, a device ID which is just the number of the button pressed on the remote, an action, like ON or OFF, which is most likely 1 or 0, some kind of checksum, and some fixed values. In some cases there are additional values that change every time a button is pressed. They are called rolling codes, and are found in brands like Somfy. These codes are often harder to reverse, but the cheap plugs do not use them😉. Finally, some protocols add a simple obfuscation layer on top of the frame, like an XOR for instance.
To understand a protocol, the best method remains to gather as many frames as possible, while writing down what generated them. The first step is to determine if two frames generated by pushing the same button are indeed the same. It will most likely be the case, but if not, you need to find out which part of the frame changes. It can be a simple counter, or something more clever. Remember that if there is some kind of encryption/obfuscation, the whole frame can change because of a simple counter. Anyway, you need to scratch your head and find the solution by comparing as many frames as possible.
Assuming all frames generated by one button are the same, the next thing to do is to change one parameter at a time, and look at the result to identify the different fields. For instance, press the ON and OFF button of the same plug number, on the same remote, and compare the resulting frames. Only a small part of it should change, part that you can now identify as the action field.
Then press the ON button for another plug, and compare to the ON button for the first plug. Check that 1) the action field remains the same, 2) something else changed. This something else is probably the device ID. You can then try to open the remote, and look for some kind of multi-switch or jumpers. You will not necessarily find something in all remotes since some will have their ID stored in an Eeprom or something like that, but if you do find something, try to change it and check the generated frames. This will most likely help to find the remote ID.
If you see a part of the frame that seems to change only when something else changes, then you might just have identified a checksum. Try to find how these bits can be computed from the other ones. It can be, for instance, a simple sum, or an XOR. Repeat the procedure until you are convinced that all those fields behave as assumed.
Now, keep in mind this is just a generic description of a 433MHz device. Some will not fit the mold and might have, for instance, more or less fields. The frame format can even be completely different.

Once the frame format is understood, it's time to test! For this you will need a 433MHz transmitter. I first used this HE853 USB dongle, which works fine with a regular PC, but I found out it was easier to just use this 1$ transmitter connected to a Raspberry Pi, a TP-Link TL-WR703N router, or any device that offers GPIOs. And this is where rf-ctrl comes in handy. It uses a back-end/front-end (transmitter driver/protocol driver) design that makes it easy to implement new protocols. Here is how to do so:

  • copy an existing protocol like auchan.c as <my_protocol>.c and add it to the Makefile by appending <my_protocol>.o to the OBJECTS variable
  • fill in the timing_config structure with the values you measured (values are expected in µs)
  • implement the int (*format_cmd)(uint8_t *data, size_t data_len, uint32_t remote_code, uint32_t device_code, rf_command_t command); function which is supposed to generate the final frame in the pre-allocated *data array of data_len bytes from the remote ID, the device ID and the command.
  • fill in the rf_protocol_driver structure with a short and long name for the protocol, the pointer to the format_cmd() function and to the timing_config structure, the max allowed remote and device IDs, and the actual parameters this protocol needs (most likely PARAM_REMOTE_ID | PARAM_DEVICE_ID | PARAM_COMMAND)

That’s all ! You should be able to build rf-ctrl and control your plug with it. If it does not work, do not hesitate to check the generated signal with your oscilloscope or digital analyzer.

Controlling my electric heaters remotely

Let's get back to the main topic. To control my heaters, I thought I would buy plugs from one of the brands I already reversed, and went to buy the "auchan" ones. Unfortunately, they were still selling 433MHz plugs under the same name, but the underlying supplier had clearly changed. I decided to buy three of them anyway, but knew I would have to reverse yet another protocol, with the risk that it might use some kind of rolling codes… Luckily, it did not, and it was pretty straightforward to understand. For your information it's the protocol I called "auchan2".
Now regarding the actual setup, I used the well known TP-Link TL-WR703N router running OpenWrt and a 1$ 433MHz transmitter (again, like this one) connected, through a 2V -> 5V level shifter, to the GPIO 7 of the router. I wrote the needed Makefile to build rf-ctrl as an OpenWrt package, and also created a kernel driver that generates the proper OOK signal on GPIO 7 once fed with the correct timings and data. This driver, called ook-gpio, is directly provided as an OpenWrt package on my GitHub. Since the WR703N does not have much free space, I chose to build a special firmware for it with everything in it, removing what was useless. Once the firmware was flashed, I verified that I was indeed able to control my heaters. But to do that remotely, I had to connect through SSH and use my command line tool, which looked like something that could be improved. So I made a little Web UI called Home-RF, a little shell script that controls rf-ctrl by generating a web page with configurable presets. It looks like this:

Home-RF2.jpg Home-RF3.jpg

The idea is that you can add presets for devices like plugs, rolling shutters or chimes, and they will be displayed like a remote. As a bonus, it also supports WakeOnLan compatible computers (using etherwake). There is a simple preset editor included in the interface, as well as an advanced panel that allows manual control of rf-ctrl or etherwake. Home-RF will be nicely displayed on a PC, as well as on mobile phones. It is available here on my GitHub, and can be built as an OpenWrt package.
At that point, I rebuilt a firmware with Home-RF inside, and flashed it. I’m using a VPN at home, so I do not care about authentication directly in Home-RF. However, if you plan to use it remotely, do not forget to add some kind of access control on top of it (htaccess, SSH, VPN…) !

How-to do the same

In order to build your own RF gateway, you will need:

  • a TP-Link TL-WR703N router that you can find here for instance
  • a 5V 433MHz transmitter like this one
  • an N-MOSFET in enhancement mode with a threshold voltage smaller than 2V for the level shifter (I used this BSS138)
  • one 10k Ohm resistor
  • a 17,3cm long wire for the antenna
  • a 3 pins female connector with 2.54mm pitch (optional)
  • a piece of prototyping PCB
  • some wire
  • some Kapton

The provided instructions assume you are working on a PC running Linux.

First, the hardware

The schematic for the level shifter is the following:

2V_to_5V_LS.png

– Solder the MOSFET and the resistor to match the schematic above
– Use one pad of the PCB as Ground, and solder three wires on Output, +5V Transmitter, and GND Transmitter
– Either solder the 3 pins connector to the other end of the wires, or solder the RF transmitter directly (remove the male pins of the transmitter if any)
– Open the WR703N router, and look for the four signals below:

  • Gnd is on a little pad, top side of the PCB
  • GPIO 7 is on R15, top side of the PCB
  • +2V is on C47 (or TP2V0), bottom side of the PCB
  • +5V is on R113, bottom side of the PCB

WR703N_bottom.jpg WR703N_top.jpg

– Solder one end of four wires on these signals, and the other end to the level shifter previously made

WR703N_module.jpg

– Solder the 17,3 cm long wire to the antenna pad of the transmitter and put Kapton everywhere to prevent any short-circuit (I tried without antenna at first, that’s why it is missing on my picture)

WR703N_module2.jpg

– Put the board back in its casing, use its reset hole to get the antenna out of it (you will have to bend the antenna to do so, so make sure it does not push the reset button), and close it

You should get something like this:

WR703N_antenna.jpg

Now, the software

I attached a prebuilt Barrier Breaker (14.07) OpenWrt firmware with all the tools in it, but it is more fun to build it yourself:

– Create your root folder for the build, for instance my-gateway:

$ mkdir my-gateway

– Go to that folder, and checkout a Barrier Breaker OpenWrt tree (I did not try Chaos Calmer, so let me know if it works):

$ cd my-gateway
$ git clone -b barrier_breaker git://github.com/openwrt/openwrt.git

– Checkout rf-ctrl, Home-RF and ook-gpio:

$ git clone https://github.com/jcrona/rf-ctrl.git
$ git clone https://github.com/jcrona/home-rf.git
$ git clone https://github.com/jcrona/ook-gpio.git

– Create the packages folders in OpenWrt:

$ mkdir -p openwrt/package/utils/home-rf/files
$ mkdir -p openwrt/package/utils/rf-ctrl/src
$ mkdir -p openwrt/package/kernel/ook-gpio

– Copy the packages content:

$ cp -a home-rf/www openwrt/package/utils/home-rf/files/
$ cp home-rf/OpenWrt/Makefile openwrt/package/utils/home-rf/
$ cp rf-ctrl/* openwrt/package/utils/rf-ctrl/src/
$ cp rf-ctrl/OpenWrt/Makefile openwrt/package/utils/rf-ctrl/
$ cp -a ook-gpio/* openwrt/package/kernel/ook-gpio/

– Update external feeds in OpenWrt and add etherwake to the build system:

$ cd openwrt
$ ./scripts/feeds update -a
$ ./scripts/feeds install etherwake

– Download the attached home-rf_openwrt.config into the my-gateway folder, and use it:

$ cp ../home-rf_openwrt.config .config
$ make oldconfig

– Build the OpenWrt firmware

$ make

– You should have your firmware ready in my-gateway/openwrt/bin/ar71xx/.

If you have any issue building the mac80211 package, it might be because the build system failed to clone the linux-firmware Git. In that case, download the linux-firmware-2014-06-04-7f388b4885cf64d6b7833612052d20d4197af96f.tar.bz2 archive from here, copy it into the my-gateway/openwrt/dl/ folder, and restart the build.

Now, you need to flash your WR703N router. If you never flashed OpenWrt before on your router, use openwrt-ar71xx-generic-tl-wr703n-v1-squashfs-factory.bin as explained here. Otherwise, use openwrt-ar71xx-generic-tl-wr703n-v1-squashfs-sysupgrade.bin with the sysupgrade tool.

Final touch

At that point, you should have your router up and running. You still need to configure it like a regular OpenWrt router, as explained here. You can, for instance, configure it in WiFi station mode, so that you can find the best place to reach all your 433MHz devices.
Once properly configured, open a browser and go to http://<your_router_ip>/home-rf. If everything went well, you will get the Home-RF interface ready to be configured !

Final words

So now I'm able to control my electric heaters from my phone for around 20$, and I hope you will be able to do the same with your own 433MHz devices. All the discussed tools are available on my GitHub. I will be happy to extend the list of supported protocols in rf-ctrl, so feel free to add more.
If you want to play around, try the "scan" mode of rf-ctrl ! It sends all possible frames within a given range of remote IDs, device IDs, and protocols.

That’s all for now !


Recovering 433MHz Messages with RTL-SDR and MATLAB


Posted 2015/02/13. Last updated 2015/02/17.

Introduction

I recently bought a DVB-T dongle containing the Realtek RTL2832U and Raphael Micro R820T chips with the intent to use it as a Software-Defined Radio (SDR) receiver. These dongles are incredible because for about $10, you can tune in to frequencies between 24 and 1766MHz and listen to a wide range of devices and signals, provided you have a proper antenna (and a down-/up- converter if you want to listen outside of this range). The device, pictured below, is truly very simple: the back consists solely of a couple lines that could probably not be routed on the top layer of the PCB.

RTL-SDR dongle

As a first project, I decided to look into the 433MHz frequency, as others have also successfully done (see here, here, and here for instance), but decided to focus on the methodology and the tools available, rather than recovering a specific device’s key, since I didn’t have one lying around. This post describes the manual process I followed with existing tools, as well as a basic MATLAB script that I wrote interfacing with the RTL device which automates the binary signal recovery process.

UPDATE: There is some good discussion of this post going on at Hackaday, RTL-SDR, and Reddit, which also contain a few more pointers for this kind of thing. My response to some of the points raised can be found here. A good alternative to MATLAB which I had not considered is Octave, which apparently interfaces well with GNU Radio.

Setup

As mentioned above, I did not have a device transmitting at 433MHz, so instead I used a typical cheap MX-FS-03V RF transmitter (pictured below) bought off of EBay, connected to an Arduino Uno. I used the rc-switch library, which appears to be pretty popular, with a lot of forks on GitHub. My code‘s loop simply calls mySwitch.send("010010100101") followed by a delay of 1 second and makes no other calls to the library besides enabling transmission on the appropriate Arduino pin.

433MHz transmitter

The goal of the project was to uncover the details of the protocol (and the value transmitted) before looking at the library code to verify it. To this end, I installed SDR# to visualize and record the signal, as well as Audacity to inspect the produced WAV file. I additionally installed the rtl-sdr and rtl_433 libraries which contain command-line utilities for automation (Windows binaries can be found here and here).

Tuning In

Having programmed the Arduino and left it to constantly transmit, my first step was to fire up SDR# to visually inspect the signal. The figures below show SDR#’s spectrum analyzer and waterfall graphs centered at 433MHz. The spectrum analyzer shows a consistent noise level across frequencies when the transmitter is silent, and also indicates a few DC bias spikes. Moreover, the waterfall illustrates that the transmitter output is not filtered and produces noise/energy across many unwanted frequencies. [UPDATE: Per a suggestion here, reducing the gain helps remove the aliases, but does not entirely eliminate them.]

433MHz transmission lowWaterfall across the spectrum

This can be seen even more clearly below, when a transmission is occurring, where we can also identify that the strongest signal is actually at 434MHz.

433MHz transmission high434MHz transmission high

Analyzing the Signal

After selecting the frequency, I recorded 10 seconds of the signal which came out as an astonishingly large 110MB WAV file! Opening up the recording on Audacity, as shown below, we can identify 10 seemingly identical, equally spaced transmissions 1 second apart, with the exception of the 8th one.

Full Signal on Audacity

We ignore the anomaly for now (as a closer inspection indicates it is simply truncated, but otherwise the same as other transmissions), and focus on an individual section:

Single Transmission on Audacity

Once more we find 10 identical transmissions within each section, so zooming further we can clearly identify the modulation as a type of on-off keying (OOK) where 0s are short HIGH bursts followed by long periods of silence, and 1s are long HIGH bursts followed by small periods of silence.

Individual Bits on Audacity

Note of course that the encoding could be reversed, but it is reasonable to assume that it is not (and our knowledge of what is being transmitted tells us we are right!): the signal appears to be 0100101001010. This is indeed what we transmitted, but there is a spurious 0 at the end. Though this could be a checksum, flipping the last bit or removing it does not alter the value, hence we can assume it is simply an End-of-Message (EOM) value. Looking at the individual signals for 0 and 1, we see that the pulse length for a 0 is 350μs long, and it is 3 times as long for a 1.

Amplitude Modulation for 0
Amplitude Modulation for 1

Looking at the setup code, we see that the pulse length is indeed 350μs long, and each message is repeated 10 times, each of which is followed by a sync message. Moreover, for the default protocol, a 0 is represented as 1 HIGH, 3 LOWs, while a 1 is the reverse. Success!

Recovering the Transmission with MATLAB

Even though rtl_433 readily decodes this message for us, when I found out that MATLAB has a package for RTL-SDR (which needs the Communications System Toolbox), I thought I’d try it out. As a first step, I tried the spectrum analyzer example, just to ensure that everything works. 433.989MHz gave the strongest signal, and behaves as expected both during silence and transmission:

MATLAB transmission lowMATLAB transmission high

The data is output in I/Q format with values between -1 and 1, but I did not want to write a demodulator, so I instead took the real part, corresponding to the in-phase component, which proves to be sufficient for our purposes. [UPDATE: An alternative is taking the modulus of the complex value. This has the added benefit of not needing the Hilbert transform below, as this comment mentions. I can confirm that setting rdata = abs(data); and binary(smoothed >= high_thres) = 1; in the code works without further changes.] As can be seen in the figure below and left, the output is very noisy, so I immediately applied a Savitzky-Golay filter, which was chosen to be cubic for data frames of length 41, as in the MATLAB example. As the picture below and to the right shows, the filtering is very effective.

In-phase unfiltered dataIn-phase filtered data

Having reduced the noise, the next step was to calculate the envelope of the signal, which in MATLAB is implemented by taking the modulus of the Hilbert transform, as also explained here. The figures below show what that looks like for the overall signal, as well as for a specific transmission of our 10 bits. As can be seen, during the transmission the envelope fluctuates a bit, but is most frequently above 1. When the transmission is not occurring, the value remains below 0.1, but this is not pictured here.

Signal envelopeSignal envelope detail

The conversion to a binary signal is straightforward: if the magnitude of the above quantity is above 0.5, the signal is considered to be at a logical HIGH, and if it is below 0.5 it is a logical LOW. Zooming into one of the transmissions shows us that the digital pulse produced is as expected, without noise:

Digital pulse output

The basic idea to automatically detect whether a signal is a 0 or 1 is simple: count the number of consecutive samples that were HIGH, and if they are close to the transmission pulse length of a 0 or a 1, print that value! There were a few intricacies in debouncing (where the code basically skips over a few LOWs in between HIGHs) and in setting the appropriate thresholds for what counts as “close enough”, but in the end the code was able to accurately recover all transmitted bits. That said, I expect that changes to the parameters will need to be made for other hardware, depending on factors such as the antennas and power of transmission.

Conclusion

RTL-SDR definitely opens up many possibilities. Even though this post was a “toy example”, it has real-world implications as plenty of devices operate freely at 433MHz and other frequencies, as explained in the introduction. Although MATLAB is not always easy to work with, it has tremendous capabilities, and the fact that it interfaces with the dongle is a great feature.

I believe that the RTL-SDR community would greatly benefit from more open-source projects using MATLAB, so I have made my code available on GitHub, if you would like to try it out for yourself. As mentioned above, it might need some tweaking based on your hardware, but I hope such changes will be minimal. If you have any comments or improvements, feel free to contact me!

Postscript

My initial plan was to use GNU Radio on my new Raspberry Pi 2, but despite its extra processing power, I found that it could not adequately do signal processing, even for FM frequencies, and often underflowed. If you are interested in going down that route, you might want to look at this post containing installation instructions, and gqrx as a *nix alternative to SDR# (it's gqrx-sdr under the repositories). Also take a look at this forum discussion if you get a BadMatch error, and at this post detailing how to approach the analysis using GNU Radio. Finally, if you, like me, don't have an Ethernet plug available, but have an Android phone that can tether (even if it is using Wi-Fi), connect it to your Pi's USB, set the connection mode to "Media" and follow the instructions here!



[Resources] A roundup of commonly used datasets for deep learning in computer vision

2016-10-28 · Liu Nianhong, Fu Rui · THU数据派

[Overview] In the era of big data, data is king. Neither data mining nor today's red-hot deep learning can do without big data. Large companies usually have their own data, but for startups, university teachers and students, "Where can I get large datasets open to the public?" is a question that has to be faced.

This article introduces and summarizes the open datasets commonly used in deep learning for vision, based on the datasets the authors have used during graduate study and research and those encountered in the literature.
MNIST
The "Hello World!" of deep learning and a must for beginners. MNIST is a handwritten digit database with 60,000 training samples and 10,000 test samples; each image is 28*28. The dataset is stored in a binary format that cannot be viewed directly as images, but converters are easy to find.
The earliest deep convolutional network, LeNet, targeted this dataset, and nearly every mainstream deep learning framework uses MNIST as its introductory tutorial; TensorFlow's MNIST tutorial is especially detailed.
Dataset size: ~12MB
Download:
http://yann.lecun.com/exdb/mnist/index.html
Imagenet
MNIST gets beginners in the door; the ImageNet dataset is what gave the deep learning wave its huge push. Hinton's 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks" brought a revolution to computer vision, and that work was based on ImageNet.
ImageNet has more than 14 million images covering over 20,000 categories; more than a million of them carry explicit class labels and object bounding-box annotations. In detail:
    1)Total number of non-empty synsets: 21841
    2)Total number of images: 14,197,122
    3)Number of images with bounding box annotations: 1,034,908
    4)Number of synsets with SIFT features: 1000
    5)Number of images with SIFT features: 1.2 million
ImageNet is one of the most heavily used datasets in deep learning for images; most research on classification, localization and detection is based on it. It is well documented, maintained by a dedicated team and very convenient to use, and it has become the de facto standard dataset for benchmarking image algorithms in the deep learning literature.
The dataset is tied to the world-famous ImageNet Large Scale Visual Recognition Challenge (ILSVRC). In the past, big companies such as Google and MSRA usually took the championships; in ILSVRC2016 Chinese teams swept first place in every task.
ImageNet is an excellent dataset, but labeling errors are inevitable; erroneous data is corrected or removed almost every year, so download the latest release and keep an eye on updates.
Dataset size: ~1TB (all data for the ILSVRC2016 competition)
Download:
http://www.image-net.org/about-stats
COCO
COCO (Common Objects in Context) is a newer dataset for image recognition, segmentation and image semantics, with the following features:
    1)Object segmentation
    2)Recognition in Context
    3)Multiple objects per image
    4)More than 300,000 images
    5)More than 2 Million instances
    6)80 object categories
    7)5 captions per image
    8)Keypoints on 100,000 people
COCO is sponsored by Microsoft. Its annotations include not only classes and positions but also natural-language descriptions of each image. Its release has driven great progress in segmentation and image semantic understanding over the past two or three years, and it has become something of a standard dataset for evaluating such algorithms.
Google's open-sourced image captioning model show and tell was evaluated on this dataset; feel free to download it and play around.
Dataset size: ~40GB
Download: http://mscoco.org/
PASCAL VOC
The PASCAL VOC challenge is a benchmark for visual object classification, recognition and detection, providing standard annotated image datasets and a standard evaluation system. The images cover 20 categories: person; animals (bird, cat, cow, dog, horse, sheep); vehicles (aeroplane, bicycle, boat, bus, car, motorbike, train); and indoor objects (bottle, chair, dining table, potted plant, sofa, TV). The challenge has not been held since 2012, but the images are high quality and fully annotated, which makes the dataset very suitable for testing algorithm performance.
Dataset size: ~2GB
Download:
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
CIFAR
CIFAR-10 has 10 classes, 50,000 32x32 color training images and 10,000 test images. CIFAR-100 is similar but has 100 classes with 600 images each (500 training, 100 test), grouped into 20 superclasses. All images are clearly labeled. CIFAR is a very good small-to-medium dataset for testing image classification algorithms.
Dataset size: ~170MB
Download:
http://www.cs.toronto.edu/~kriz/cifar.html
Open Image
Advances in machine learning over the past few years have brought rapid progress in computer vision: systems can automatically describe a picture and generate natural-language responses to shared photos. Much of that progress can be credited to openly available datasets such as ImageNet and COCO. Google, being a great company, naturally had to make its own gesture, and so we got Open Image.
Open Image is a dataset of roughly 9 million image URLs, annotated with labels covering more than 6,000 classes. Compared with ImageNet's 1,000 classes, its labels cover far more real-life entities, enough to train a deep neural network from scratch.
Made by Google, so presumably high quality. Its one shortcoming is that it only provides image URLs, which is less convenient than providing the images themselves.
The authors have not used this dataset personally, but given its origin the quality should be assured.
Dataset size: ~1.5GB (not including the images)
Download:
https://github.com/openimages/dataset
Youtube-8M
YouTube-8M is Google's open video dataset: 8 million YouTube videos, 500,000 hours in total, across 4,800 classes. To keep the labels stable and high quality, Google only used public videos with more than 1,000 views. So that researchers and students with limited compute can also use it, the videos were preprocessed and frame-level features extracted, compressed to fit on a single hard drive (under 1.5TB).
The dataset comes with a download script. Thanks to the peculiarities of Chinese networks, the download keeps dropping, but the script supports resuming, so reconnecting after a while picks it back up. You can write a wrapper that sleeps and retries whenever the download breaks, so you do not have to babysit it. (As of publication, the download is still, intermittently, in progress...)
Dataset size: ~1.5TB
Download: https://research.google.com/youtube8m/
The datasets above are the ones the authors have found to be in common use, based on study, research and reading. Given the limits of personal knowledge, omissions and mistakes are unavoidable; corrections from readers are welcome.
If the datasets above still do not meet your needs, try the sources below.
 
1. Deep learning dataset collection site
http://deeplearning.net/datasets/
Collects a large number of deep-learning-related datasets, though not every open dataset is listed there.
2. Tiny Images Dataset
http://horatio.cs.nyu.edu/mit/tiny/data/index.html
80 million 32x32 images; CIFAR-10 and CIFAR-100 were selected from it.
3. CoPhIR
http://cophir.isti.cnr.it/whatis.html
A very large Flickr dataset released by Yahoo, containing more than 100 million images.
4. MirFlickr1M
http://press.liacs.nl/mirflickr/
One million images selected from the Flickr dataset.
5. SBU captioned photo dataset
http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/
A Flickr subset containing one million images with captions.
6. NUS-WIDE
http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
270,000 images from Flickr.
7. Large-Scale Image Annotation using Visual Synset (ICCV 2011)
http://cpl.cc.gatech.edu/projects/VisualSynset/
A very large machine-annotated dataset containing 200 million images.
8. SUN dataset
http://people.csail.mit.edu/jxiao/SUN/
A dataset of 130,000 images.
9. MSRA-MM
http://research.microsoft.com/en-us/projects/msrammdata/
One million images and 23,000 videos; produced by Microsoft Research Asia, so the quality should be assured.
 
China is a country rich in data. On the government side, cities such as Beijing and Shanghai have taken the lead in opening transportation, weather and other datasets; among companies, Sina Weibo and others have opened real, useful data, a great convenience for researchers. In computer vision, however, the openness of domestic datasets still lags somewhat behind the rest of the world. We hope more domestic companies and organizations will open up high-quality datasets, advance research in the field, raise China's influence in these research areas, and contribute to the progress of science and technology for all of humanity.
 
 
References:
[1] http://yann.lecun.com/exdb/mnist/index.html
[2] http://www.image-net.org/about-stats
[3] http://mscoco.org/
[4] http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
[5] http://www.cs.toronto.edu/~kriz/cifar.html
[6] https://github.com/openimages/dataset
[7] https://research.google.com/youtube8m/
[8] http://blog.csdn.net/qq_26898461/article/details/50593328
About the authors:
Liu Nianhong: master's student in the Department of Microelectronics at Tsinghua University, Tsinghua "Big Data Master" program, and current president of the Tsinghua Student Big Data Association.
Research focus: deep learning for image detection.
Contact:
lnh15@mails.tsinghua.edu.cn
 
Fu Rui: master's student in the Department of Automation at Tsinghua University, Tsinghua "Big Data Master" program, and former president of the Tsinghua Student Big Data Association.
Research focus: intelligent transportation.
Contact: freefor_ever@163.com.

An in-depth analysis of the Mirai IoT botnet attack


2016-10-28 · 33,090 views · 7 comments · Security report, Endpoint security

mirai.png

Some vendors classify Mirai as a worm, which is not quite accurate: Mirai spreads in a worm-like fashion (though differently from traditional worms), but it is really a bot program, so "Mirai bot worm" is the more accurate name; below we mostly just call it a bot.

The large-scale network outage in the United States

On September 30, 2016, the hacker Anna-senpai publicly released the Mirai bot source code, for two reasons: he had noticed that organizations were cleaning up the infected devices under his control, and putting the source in more hackers' hands would spread the bot further and help hide his own tracks.

On October 21, 2016, the U.S. East Coast suffered a widespread network outage caused by a powerful DDoS attack against the DNS provider Dyn. Dyn said the attack involved tens of millions of IP addresses (the source IPs of the UDP/DNS attacks were almost all spoofed, so that figure does not reflect the number of bots), with a significant share of the attack coming from IoT devices. The activity started at 7:00 a.m. Eastern Time and was not mitigated until 1:00 p.m.; three large waves were launched, though the third was mitigated without a noticeable impact on network access.

It was a complex attack spanning multiple attack vectors and Internet locations. Analysis by Flashpoint and Akamai confirmed that one source of the traffic was devices infected with the Mirai bot, since some of the discrete attacking IP addresses belonged to the Mirai botnet.

After Anna-senpai released the source, other hackers quickly used it to build large Mirai botnets, and some of them took part in this attack. It cannot be ruled out that Anna-senpai, who controls roughly 300,000 to 400,000 Mirai bots, participated as well.

Venustech ADLab's analysis found that Mirai borrows some techniques from QBOT and optimizes its scanning and infection, greatly increasing how fast it spreads.

A look back at major Mirai events

The attack on Dyn's DNS servers showed once more how much the old technique of DDoS can shake the Internet, and the most striking aspect was the involvement of an IoT botnet. The IoT concept has been popular for nearly seven years; huge numbers of smart devices keep connecting to the Internet, and their security weaknesses and closed nature have made them a resource hackers scramble to seize. A large number of IoT botnets already exist, such as QBOT, Luabot, Bashlight, Zollard, Remaiten and KTN-RM, and more and more traditional botnets are joining the IoT ranks as well.

According to Venustech ADLab's investigation, the Mirai botnet has two earlier attacks on record. One targeted the website of security journalist Brian Krebs, with attack traffic reaching 665Gbps.

The other targeted the French hosting provider OVH, with traffic reaching 1.1Tbps, breaking the historical record for DDoS attack volume.

Key Mirai events:

(1) On August 31, 2016, a reverse engineer published a detailed analysis of the Mirai bot on the malwaremustdie blog; publishing the C&C information angered the hacker Anna-senpai.

(2) On September 20, 2016, KrebsOnSecurity.com, the site of well-known security journalist Brian Krebs, was hit by a massive DDoS attack peaking at 665Gbps; Krebs surmised it was launched by the Mirai botnet.

(3) On September 20, 2016, Mirai's attack on the French host OVH broke the DDoS record, reaching 1.1Tbps and peaking at 1.5Tbps.

(4) On September 30, 2016, Anna-senpai published the Mirai source on the Hackforums forum, mocking the earlier mistakes in the reverse engineer's analysis.

(5) On October 21, 2016, the U.S. DNS provider Dyn suffered a massive DDoS attack, with important attack sources confirmed to be Mirai bots.

In early October 2016, researchers at Imperva Incapsula analyzed 49,657 infected device sources and found the main infected devices to be CCTV cameras, DVRs and routers. The device IP addresses spanned 164 countries and regions, with Vietnam, Brazil, the United States, mainland China and Mexico infected the most.

As of October 26, 2016, searching Shodan for Mirai signatures showed more than one million infected devices worldwide: 418,592 in the United States, 145,778 in mainland China, 94,912 in Australia, and 47,198 and 44,386 in Japan and Hong Kong respectively.

On the map, the darker the color, the more infected devices; the countries with the most Mirai infections are the United States, China and Australia.

Mirai source code analysis

The Mirai source was published on a forum by Anna-senpai on September 30, 2016; the copy on GitHub has been starred 2,538 times and forked 1,371 times.

Mirai spreads by scanning the network for Telnet and similar services. The infected devices (the bots) do not actually perform the infection themselves; infection is carried out by a service the attacker runs, called Load. A second attacker server, the C&C, is mainly used to issue control commands and launch attacks against targets.

Our analysis of the bot source shows the following characteristics:

(1) Infection is performed by the attacker's server, not by the bot itself.

(2) It uses an advanced SYN scan that is more than 30 times faster, raising the speed of infection.

(3) It forcibly removes other mainstream IoT bots, such as QBOT, Zollard, Remaiten, anime and others, killing the competition and keeping the device to itself.

(4) Once it gets in through Telnet, it forcibly shuts down the Telnet service as well as other entry points such as SSH and the web interface, and occupies their ports to keep those services from coming back.

(5) It filters out the IP ranges of General Electric, Hewlett-Packard, the U.S. Postal Service, the Department of Defense and similar companies and agencies, avoiding pointless infection attempts.

(6) It implements a distinctive GRE flood, increasing its attack power.

Mirai infection diagram:

The figure above sketches Mirai's infection process; unlike an ordinary bot, the infecting side is the attacker's server, not the bot itself.

The bot on an infected device scans Internet devices with a random strategy and uploads every successfully guessed username, password, IP address and port, in a fixed format, to scanListen. scanListen parses the information and hands it to the Load module, which uses it to log into the device and infect it by one of three methods: echo, wget or tftp. All three push a tiny module with download capability to the target, named dvrHelper once delivered. dvrHelper then downloads and runs the bot, the bot starts its own Telnet scanning and password guessing, and the cycle repeats across the network. The scheme is extremely effective: Anna-senpai was at one point getting 500 successful brute-force results per second.

bot analysis

bot is Mirai's attack module. It scans for the Telnet service on networked devices (the scan is not limited to IoT devices; any network device with Telnet open is a target) and attempts brute-force logins, sending the IP address, port, username, password and other details of every cracked device to the attacker-configured server, while also receiving control commands from the C&C server to attack targets.

1. Preventing IoT devices from rebooting

Because Mirai is designed mainly to target IoT devices, it cannot write itself into the device firmware and exists only in memory, so the bot disappears if the device reboots. To prevent reboots, Mirai sends the control code 0x80045704 to the watchdog to disable it.

Embedded firmware usually implements a feature called a watchdog: a process keeps sending a byte of data to the watchdog process, which is called feeding the dog. If the feeding stops, the device reboots, so Mirai disables the watchdog to keep the device up. The technique is widely used in attacks on embedded devices; the exploit code for the Hikvision vulnerability (CVE-2014-4880), for example, used the same anti-reboot trick.

As a small aside, on August 31, 2016 a reverse engineer misjudged this code and took it for a delay mechanism; when Anna-senpai published the source on Hackforums he mocked and berated the mistake.

2. Hiding the process name

To avoid exposing its process name, Mirai hides it to a limited extent, although the hiding does not help much: it simply randomizes the string.

3. Preventing multiple instances

Like most malware, Mirai needs a mutual-exclusion mechanism so that only one instance runs on a device, but it goes about it differently: it opens port 48101. It binds and listens on that port, and if this fails it kills the process already holding the port, ensuring that only one instance runs. This behavior is the most efficient way to detect Mirai on a network device.
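As a rough illustration of that detection idea, here is a generic Node.js sketch that probes a device for the port (this is not from the Mirai source, and whether the port is reachable from outside depends on how the bot bound it):

var net = require('net');

// probe a device for Mirai's single-instance port (48101)
function checkMiraiPort(host, callback) {
  var socket = net.connect({ host: host, port: 48101, timeout: 2000 });
  socket.on('connect', function() { socket.destroy(); callback(true); });
  socket.on('error', function() { callback(false); });
  socket.on('timeout', function() { socket.destroy(); callback(false); });
}

checkMiraiPort('192.168.1.64', function(open) {
  console.log(open ? 'port 48101 open - worth investigating' : 'port 48101 closed');
});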

4. Re-binding ports to keep out intruders

Mirai is exclusive by nature. Once a device is infected, it shuts down Telnet (23), SSH (22, optional at compile time) and HTTP (80, optional at compile time) via their ports and prevents those services from restarting, mainly by killing the three service processes and forcibly occupying the ports they need to start. This keeps other malware from infecting the device and keeps security staff from reaching it from outside, making forensics on Mirai harder. The functionality is implemented in killer.c.

The re-binding of the Telnet service is shown in the figure below; SSH and HTTP are handled in a similar way.

Re-binding of the SSH service:

Re-binding of the HTTP service:

Analysis of real samples shows that most operators do not re-bind SSH and HTTP at all; the vast majority re-bind only the Telnet service.

5. Killing the competition and monopolizing the device

Mirai uses a memory-scraping technique to kill other malware on the device: it searches memory for QBOT signatures, UPX signatures, Zollard worm signatures and Remaiten bot signatures and kills the matches, so that it has the device's resources to itself.

In addition, if Mirai finds the anime malware it forcibly kills it as well.

6. Probing for infectable devices

The Mirai bot randomly scans the Telnet service of IoT devices on the network and brute-forces it with pre-planted usernames and passwords, then reports the scanned device's IP address, port, processor architecture and other information back to the Load server. Note that the random scan has a filter; interestingly, it skips the IP addresses of companies and agencies such as General Electric, Hewlett-Packard, the U.S. Postal Service and the Department of Defense.

Mirai has more than 60 built-in username/password pairs, stored encrypted. The encryption is a simple repeated single-byte XOR with the key 0xDEADBEEF (decryption key 0xEFBEADDE).

Mirai scans and cracks devices with an advanced SYN scanning technique that is 80 times faster than the scanning used by the QBOT bot while consuming at least 20 times fewer resources, giving it a powerful capacity to scan and infect; while collecting bots the attacker could at one point add 500 IoT devices per second.

The Telnet scan is implemented as follows:

When Mirai finds a Telnet service it connects and attempts brute-force logins. It first tries to log in with its built-in usernames and passwords, then sends a series of commands to determine whether the login succeeded. If it did, it attempts operations such as opening a shell; the commands it sends are initialized in a table, as shown below:

Command type Index Valid Description
TABLE_SCAN_CB_DOMAIN 18 yes domain to connect to
TABLE_SCAN_CB_PORT 19 yes Port to connect to
TABLE_SCAN_SHELL 20 yes ‘shell’ to enable shell access
TABLE_SCAN_ENABLE 21 yes ‘enable’ to enable shell access
TABLE_SCAN_SYSTEM 22 yes ‘system’ to enable shell access
TABLE_SCAN_SH 23 yes ‘sh’ to enable shell access
TABLE_SCAN_QUERY 24 yes echo hex string to verify login
TABLE_SCAN_RESP 25 yes utf8 version of query string
TABLE_SCAN_NCORRECT 26 yes ‘ncorrect’ to fast-check for invalid password
TABLE_SCAN_PS 27 no “/bin/busybox ps”
TABLE_SCAN_KILL_9 28 no “/bin/busybox kill -9 “

Of the entries above, only TABLE_SCAN_PS and TABLE_SCAN_KILL_9 are initialized without being pre-executed on the target device. Entries 20 through 26 are the login-verification operations performed after the username and password are sent. TABLE_SCAN_CB_DOMAIN and TABLE_SCAN_CB_PORT identify the attacker-configured Load server, which collects valid Telnet scan results containing the IP address, port, Telnet username and password. The report is sent in the following format:

zero (1 byte) | IP address (4 bytes) | port (2 bytes) | username length (4 bytes) | username (variable) | password length (4 bytes) | password (variable)

7. Connecting to the C&C and waiting to attack

Mirai's attack types include UDP floods, TCP floods, HTTP attacks and the newer GRE flood. The GRE flood was the main form of attack suffered by KrebsOnSecurity.com, the site of well-known security journalist Brian Krebs. The attack initialization code is as follows:

The C&C is initialized into a table; when Mirai calls back to the C&C it takes the address from that table and connects.

After connecting successfully, Mirai checks in. The check-in could hardly be simpler: it just sends four bytes of 0 to the C&C.

It then waits for control commands from the C&C, ready to attack targets. Command reception is handled with some care: the bot first does a trial read for preprocessing (determining the command length and so on) and only then reads the complete control command.

When a control command is received, Mirai parses and executes it. The command format is as follows:

type Attack struct {
    Duration uint32
    Type     uint8
    Targets  map[uint32]uint8 // Prefix/netmask
    Flags    map[uint8]string // key=value
}

The first 4 bytes are the attack duration, the next byte is the attack type (attack ID), and then come the attack targets, in the following format:

number of targets (4 bytes) | IP address (4 bytes) | mask (1 byte) | IP address (4 bytes) | mask (1 byte) | IP address ... mask ...

Last come the Flags, a series of key-value pairs structured much like the target list. The table below lists the attack functions of the Mirai botnet.

Attack type (32-bit) Type value Attack function
ATK_VEC_UDP 0 attack_udp_generic
ATK_VEC_VSE 1 attack_udp_vse
ATK_VEC_DNS 2 attack_udp_dns
ATK_VEC_UDP_PLAIN 9 attack_udp_plain
ATK_VEC_SYN 3 attack_tcp_syn
ATK_VEC_ACK 4 attack_tcp_ack
ATK_VEC_STOMP 5 attack_tcp_stomp
ATK_VEC_GREIP 6 attack_gre_ip
ATK_VEC_GREETH 7 attack_gre_eth
ATK_VEC_PROXY 8 attack_app_proxy (removed)
ATK_VEC_HTTP 10 attack_app_http

The GRE attack here is the one that carried the bulk of the September 20 attack on security journalist Brian Krebs.

scanListen analysis

scanListen's job is to take the device information produced by the bots' scans (IP, port, username, password), convert it into the following format and feed it to Load.

Load analysis

The Load module's main job is to take scanListen's output, parse it and infect each device. The infection works as follows:

(1) First, log in to the target device via Telnet.

(2) After logging in, try running /bin/busybox ps to confirm that busybox commands can be executed.

(3) Remotely execute /bin/busybox cat /proc/mounts; to find a writable directory.

(4) If a readable and writable directory is found, change into it, copy /bin/echo there, rename the file to dvrHelper, and grant read, write and execute permission to all users.

(5) Next, run "/bin/busybox cat /bin/echo\r\n" to obtain the device's architecture.

(6) If the architecture is obtained successfully, the sample tries to infect the device in one of three ways: echo, wget or tftp.

(7) It then executes the dropped program remotely over Telnet.

(8) Finally, it remotely deletes the bot program.

Summary

Botnets have become a problem the whole world shares. Unlike malware built mainly for data theft or remote control, an attacker who commands a huge botnet can launch a DDoS attack against any target at any time. Infection targets have expanded from servers, PCs and smartphones to cameras, routers, home security systems, smart TVs, wearables and even baby monitors; any internet-connected device is a potential target, and ordinary users rarely notice an infection. Because its source code is public, Mirai may be spreading rapidly, and its traffic patterns may change quickly enough to evade monitoring. Since most infected targets are IoT devices whose passwords are baked into the firmware, a reboot clears Mirai from memory but cannot prevent re-infection, and it is very hard to tell whether such an embedded device has been compromised.

Mitigations:

(1) If a device is infected with Mirai, reboot it and ask the vendor for a firmware update that removes the Telnet service.

(2) Keep devices that do not need internet access off the internet.

(3) Use a port scanner to check whether your devices expose SSH (22), Telnet (23) or HTTP/HTTPS (80/443). If they do, have your technical staff disable SSH and Telnet, and, where possible, disable HTTP/HTTPS as well, to keep similar attacks from infecting the device via the web.


Turn any hard drive into networked storage with Raspberry Pi


Networked hard drives are super convenient. You can access files no matter what computer you’re on — and even remotely.

But they’re expensive. Unless you use the Raspberry Pi.

If you happen to have a few hard drives lying around, you can put them to good use with a Raspberry Pi by creating your own, very cheap NAS setup. My current setup is two 4TB hard drives and one 128GB flash drive, connected to my network and accessible from anywhere using the Raspberry Pi.

Here’s how.

What you will need

[Image: Raspberry Pi NAS running OpenMediaVault. Photo: Taylor Martin/CNET]

For starters, you need an external storage drive, such as an HDD, SSD or a flash drive.

You also need a Raspberry Pi. Models 1 and 2 work just fine for this application but you will get a little better support from the Raspberry Pi 3. With the Pi 3, you’re still limited to USB 2.0 and 100Mbps via Ethernet. However, I was able to power one external HDD with a Pi 3, while the Pi 2 Model B could not supply enough power to the same HDD.

In my Raspberry Pi NAS, I currently have one powered 4TB HDD, one non-powered 4TB HDD and a 128GB flash drive mounted without issue. To use a Pi 1 or 2 with this, you may want to consider using a powered USB hub for your external drives or using a HDD that requires external power.

Additionally, you need a microSD card — 8GB is recommended — and the OpenMediaVault OS image, which you can download here.

Installing the OS

To install the operating system, we will use the same method used for installing any OS without NOOBS. In short: download the OpenMediaVault image, write it to the microSD card with an imaging tool (for example Etcher, or dd on Linux and macOS), then insert the card into the Pi.

More detailed installation instructions can be found here for both Windows and Mac. Just substitute the Raspbian image with OpenMediaVault.

Setup


After the image has been written to the SD card, connect peripherals to the Raspberry Pi. For the first boot, you need a keyboard, monitor and a local network connection via Ethernet. Next, connect power to the Raspberry Pi and let it complete the initial boot process.

Once that is finished, use the default web interface credentials to sign in. (By default, the username is admin and the password is openmediavault.) This will provide you with the IP address of the Raspberry Pi. After you have that, you will no longer need a keyboard and monitor connected to the Pi.

Connect your storage drives to the Raspberry Pi and open a web browser on a computer on the same network. Enter the IP address into the address bar of the browser and press return. Enter the same login credentials again ( admin for the username and openmediavault for the password) and you will be taken to the web interface for your installation of OpenMediaVault.

Mounting the disks


The first thing you will want to do to get your NAS online is to mount your external drives. Click File Systems in the navigation menu to the left under Storage.

Locate your storage drives, which will be listed under the Device column as something like /dev/sda1 or /dev/sdc2. Click one drive to select it and click Mount. After a few seconds have passed, click Apply in the upper right corner to confirm the action.

Repeat this step to mount any additional drives.

Creating a shared folder


Next, you will need to create a shared folder so that the drives can be accessed by other devices on the network. To do this:

  • Click Shared Folders in the navigation pane under Access Rights Management.
  • Click Add and give the folder a name.
  • Select one of the storage drives in the dropdown menu to the right of Volume.
  • Specify a path (if you want it to be different from the name).
  • Click Save.

Enabling SMB/CIFS


Finally, to access these folders and drives from an external computer on the network, you need to enable SMB/CIFS.

Click SMB/CIFS under Services in the left navigation pane and click the toggle button beside Enable. Click Save and Apply to confirm the changes.

Next, click on the Shares tab near the top of the window. Click Add, select one of the folders you created in the dropdown menu beside Shared folder and click Save. Repeat this step for each shared folder you created.

Accessing the drives over your network

Now that your NAS is up and running, you need to map those drives from another computer to see them. This process is different for Windows and Mac, but should only take a few seconds.

Windows

[Screenshot: mapping the network folder in Windows. Taylor Martin/CNET]

To access a networked drive on Windows, open File Explorer and click This PC. Select the Computer tab and click Map network drive.

In the dropdown menu beside Drive, choose an unused drive letter. In the Folder field, input the path to the network drive. By default, it should look something like \\RASPBERRYPI\[folder name]. (For instance, one of my folders is HDD, so the folder path is \\RASPBERRYPI\HDD.) Click Finish and enter the login credentials. By default, the username is pi and the password is raspberry. If you changed or have forgotten the login for the user, you can reset it or create a new user and password in the web interface under User in Access Rights Management.

Mac

[Screenshot: mounting the network drive in OS X]

To open a networked folder in OS X, open Finder and press Command + K. In the window that appears, type smb://raspberrypi or smb://[IP address] and click Connect. In the next window, highlight the volumes you want to mount and click OK.

You should now be able to see and access those drives within Finder or File Explorer and move files on or off the networked drives.

There are tons of settings to tweak inside OpenMediaVault, including the ability to reboot the NAS remotely, setting the date and time, power management, a plugin manager and much, much more. But if all you need is a network storage solution, you’ll never need to dig any deeper.


What is the "agile development" that WeChat godfather Zhang Xiaolong talks about?


GrowingIO • 6 hours ago

In short, it comes down to two concepts: "MVP" and "lean analytics".

About the author: the GrowingIO growth team combines engineering, product, marketing and analytics roles in one, owns acquisition and user activation, and uses data to drive business growth. Originally published on the GrowingIO blog and WeChat account, and published on 36Kr with permission.

Last night, product godfather Zhang Xiaolong's talk at the WXG (WeChat Business Group) leadership conference once again flooded the feeds of everyone in the internet industry. On the topic of agile development, Zhang Xiaolong said plainly:

We can come up with some unusual ideas today and see the effect very quickly, because we can put them online very quickly and then verify them; if an idea is wrong we take it offline, and if there is room for improvement we revise it next week. It is a process that lets you keep turning your ideas into reality.

This style of agile development has been around for a long time and is widely used by Silicon Valley companies such as Google and Facebook. It has grown into a complete methodology, which comes down to two concepts: "MVP" and "lean analytics".

1. Behind the agility: MVP and lean analytics

(1) MVP

MVP is short for Minimum Viable Product. A minimum viable product presents the core concept of a product at the lowest possible cost: build a usable prototype in the fastest, simplest way, use it to express the effect you ultimately want the product to achieve, and then refine the details through iteration.

 

Figure 1: The MVP approach to product iteration

Suppose your product vision is an advanced means of transport, say a car. The traditional approach builds it step by step, process by process: wheels, axles, body, powertrain, interior, until you finally have a finished product. With the MVP approach, we might first build a scooter or a bicycle and see how well users accept the idea of the transport tool. If they accept the concept, we can go on to build something more advanced and complete: a motorcycle, or even the car.

The traditional approach to iteration is expensive, slow and risky: users may not accept a product that cost a great deal to build. The strength of the MVP strategy is that trial and error is cheap, fast and low-risk, which fits the need for rapid product iteration.

(2) Lean analytics

Eric Ries, the well-known Silicon Valley entrepreneur and author, was the first to put forward the concept of the "lean startup". The lean startup spans lean customer development, lean software development and lean manufacturing practice, and is a framework for developing products and services quickly and efficiently.

 

Figure 2: The "build-measure-learn" lean analytics loop

The essence of the lean startup is lean analytics, and its core is the build-measure-learn loop. Zhang Xiaolong's take on agile development fits this methodology exactly: start with an idea or an inspiration and get a product online quickly via the MVP approach; once it is live, measure user behavior with data; keep and continue improving what works, and take down and rethink what does not. Through this loop the product iterates quickly and user needs are met better and better.

2. Data-driven MVP

In the MVP process, and in the lean analytics framework as a whole, the most important link is measurement. How do you choose sensible metrics to evaluate an MVP? Which tools should you use to track user behavior and improve the product? These are the questions agile development has to answer.

In today's era of lean operations, how product, marketing and operations teams can use agile development to raise their effectiveness is a question every internet practitioner has to face.

(1) Product: MVP

In any product, once the volume of information reaches a certain scale, finding things becomes hard. On a Mac packed with files, when you want a document but cannot remember which folder it is in, you reach for Spotlight and search globally. The benefits are that you find every related file and you save time.

Figure 3: Borrowing the Mac's Spotlight search

A data-analytics product may contain a dozen or more modules, such as charts, dashboards, funnels and segments, so being able to find the chart you need with a single search matters a great deal: it avoids hopping back and forth between modules, saves search time and raises efficiency.

1. Propose the idea and build quickly

An engineer proposed building a search tool so that users could search globally across the whole system. Based on user feedback, the product manager defined how the feature would be used, prioritized the feature points and settled on the MVP (minimum viable product): for some features, a click takes the user straight to the detail page, lowering the cost of each operation.

2. Ship the product and launch quickly

The engineers built the search index, wiring the charts stored in each module into a searchable library and supporting several input styles, including pinyin and spaces. The designer handled the interface so that the information is presented clearly.

Figure 4: GrowingIO's global search

3. Validate with data and iterate quickly

In just a day and a half, the global search feature went through three rapid iterations: from keyword search with Chinese characters only, to searching directly in pinyin, to automatically grouping results by keyword and module, making the whole search tool faster.

In this MVP exercise, assembling a team quickly, dividing the work sensibly and launching fast to validate all played an important role. As a team grows, we need more direct cross-team communication and the efficiency of arguing things out in front of a whiteboard. This global-search project is an excellent example of an MVP.

(2) Marketing: fast trial and error

The business environment changes constantly; to overtake rivals in a fiercely competitive market, fast and correct decisions are essential.

Figure 5: GrowingIO's minute-level real-time monitoring

An online-education product was exploring new ways to acquire users. It tried free video courses, free e-textbooks and hot-topic online discussions in turn, all with mediocre results. One day its marketing team ran a free live-streamed class; within five minutes of the stream ending, traffic on the community platform was eight times the daily average and a large number of new users signed up. The real-time data was a revelation to the marketers: live-stream traffic could become the next source of growth.

An internet-finance company spends heavily on SEM, yet its CAC (customer acquisition cost) is only half the industry average. How? In the early days of its campaigns it faced the same questions about which channels and keywords to optimize. With no data to guide them, they ran a short "carpet-bombing" campaign across the mainstream search engines and the relevant keywords.

Figure 6: Ad performance measured in GrowingIO

The carpet-bombing phase gave the company conversion data for every channel and keyword. From that analysis the marketers picked the two channels with the best conversion, scaled up spending on them and kept optimizing. In this way the company acquired a large number of new users at a very low cost.

(3) Operations: going lean

With more and more products on the market and increasingly homogeneous competition, standing out takes more than product design; operations matter just as much.

A technical community had spent a great deal of effort acquiring new users, but new-user churn stayed stubbornly high, with week-two retention at only 11%. By chance, a new user told the community's operations team that their curated articles were excellent and that he read them every day.

Figure 7: GrowingIO retention chart

Inspired by this, the operations team reasoned that if curated articles keep new users around, why not push more curated content to new users? They picked 20 high-quality articles, stitched them into a PDF e-book and sent it to every newly registered user. The data showed that week-two retention among users who opened the e-book rose to 43%, a result that thrilled the team.

Figure 8: The e-book MVP iteration process

Interviews with new users who had read the e-book revealed that the hastily assembled content did not suit everyone and varied in depth, confusing many readers. Over the following weeks the community prepared e-books for four technical tracks: front end, middle tier, back end and testing, each recommended in beginner, intermediate and advanced versions according to the user's years of experience. A month later the new e-books went out to new users, and week-two retention among those who opened them rose again, to 56%.

Lean is not a synonym for cheap or small-scale. Lean means cutting out waste and inefficiency and moving fast, and it applies to any organization. An MVP is not only a way to improve a product; it is also a monitor of the real business.

Speaking of his earlier work, Zhang Xiaolong put it in his own words: "a very mediocre team used some very mediocre methods to build a very mediocre product, and without even realizing it!" So why not give it a shot now: put together a team of like-minded people, try agile development, and keep trial-and-erroring your way to one impressive feature after another.

So, do you want to give it a try?


10 Lessons Learned from Building Deep Learning Systems



Deep Learning is a sub-field of Machine Learning that has its own peculiar ways of doing things. Here are 10 lessons that we've uncovered while building Deep Learning systems. These lessons are a bit general, although they do focus on applying Deep Learning in an area that involves structured and unstructured data.

1. The More Experts the Better

The one tried and true way to improve accuracy is to have more networks perform the inference and to combine their results. In fact, techniques like dropout are a means of creating "implicit ensembles" where multiple subsets of superimposed networks cooperate using shared weights.
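As a toy illustration of averaging several "experts", the Keras sketch below trains a few small MLPs on stand-in random data and averages their softmax outputs; the data, layer sizes and class count are placeholders, not anything from the lesson above.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Stand-in data: 1000 samples, 20 features, 3 classes (replace with a real dataset).
x = np.random.rand(1000, 20)
y = np.eye(3)[np.random.randint(0, 3, 1000)]

def make_expert():
    model = Sequential([
        Dense(64, activation="relu", input_shape=(20,)),
        Dropout(0.5),   # dropout already behaves like an implicit ensemble within one network
        Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

experts = [make_expert() for _ in range(5)]
for m in experts:
    m.fit(x, y, epochs=5, batch_size=32, verbose=0)

# Explicit ensemble: average the predicted class probabilities of all experts.
ensemble_probs = np.mean([m.predict(x) for m in experts], axis=0)
ensemble_pred = ensemble_probs.argmax(axis=1)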

2. Seek Problems where Labeled Data is Abundant

The current state of Deep Learning is that it works well only in a supervised context. The rule of thumb is around 1,000 samples per rule. So if you are given a problem where you don't have enough data to train with, consider an intermediate problem that does have more data, and then run a simpler algorithm on the results of the intermediate problem.

3. Search for ways to Synthesize Data

Not all data is nicely curated and labeled for machine learning. Many times you have data that is only weakly tagged. If you can join data from disparate sources to build a weakly labeled set, this approach works surprisingly well. The best-known example is Word2Vec, where you train for word understanding based on the words that happen to appear in proximity to other words.

4. Leverage Pre-trained Networks

One of the spectacular capabilities of Deep Learning networks is that bootstrapping from an existing pre-trained network and using it to train into a new domain works surprisingly well.
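A common Keras pattern for this (assuming an image task and the ImageNet weights that ship with keras.applications) is to load a pre-trained convolutional base, freeze it, and train only a small new head; the 10-class head below is a placeholder for whatever the new domain needs.

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

# Pre-trained convolutional base, without its original ImageNet classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False   # keep the pre-trained features fixed at first

# New task-specific head; 10 classes is just a placeholder for the new domain.
x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
out = Dense(10, activation="softmax")(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])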

5. Don’t forget to Augment Data

Data usually carries meaning that a human is aware of but that a machine is unlikely to discover on its own. One simple example is a time feature. To a human, the day of the week, whether the day is a holiday, or the time of day may be important attributes, but a Deep Learning system may never be able to surface that if all it is given is seconds since the Unix epoch.
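For the timestamp example, "augmenting" can be as simple as deriving the calendar features a human would consider obvious. The helper below is a minimal sketch (the function name and the chosen features are ours):

from datetime import datetime, timezone

def time_features(unix_seconds: int) -> dict:
    """Turn a raw epoch timestamp into calendar features a model can actually use."""
    t = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return {
        "hour_of_day": t.hour,
        "day_of_week": t.weekday(),           # 0 = Monday
        "is_weekend": int(t.weekday() >= 5),
        "month": t.month,
    }

print(time_features(1480000000))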

6. Explore Different Regularizations

L1 and L2 are not the only regularizations out there. Explore the different kinds, and perhaps consider different regularizations per layer.
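In Keras 2, regularization is specified per layer through the kernel_regularizer argument, which makes it easy to mix L1, L2 or both across a network; the layer sizes below are placeholders:

from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers

model = Sequential([
    # L1 on the first layer encourages sparse weights
    Dense(128, activation="relu", input_shape=(20,),
          kernel_regularizer=regularizers.l1(1e-4)),
    # L2 (weight decay) on the next layer
    Dense(64, activation="relu",
          kernel_regularizer=regularizers.l2(1e-4)),
    # or both at once on the output layer
    Dense(10, activation="softmax",
          kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")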

7. Embrace Randomness

There are multiple techniques for initializing your network prior to training. In fact, you can get very far by training only the last layer of a network, with the previous layers left mostly random. Consider using this technique to speed up your hyper-parameter tuning explorations.
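One cheap way to exploit this during a hyper-parameter search is to freeze the randomly initialized (or pre-trained) lower layers and fit only the final classifier. A minimal Keras sketch, with placeholder layer sizes:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(256, activation="relu", input_shape=(784,)),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),
])

# Freeze everything except the last layer; the earlier layers keep their
# (random or pre-trained) weights and act as a fixed feature extractor.
for layer in model.layers[:-1]:
    layer.trainable = False

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, ...) now only updates the final layer's weights.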

8. End-to-End Deep Learning is a Hail Mary Play

A lot of researchers love to explore end-to-end deep learning research. Unfortunately, the most effective use of Deep Learning so far has been to couple it with other techniques. AlphaGo would not have been successful if Monte Carlo Tree Search had not been employed. If you want to make an impact in the academic community, then end-to-end Deep Learning might be your gamble. In a time-constrained industrial environment that demands predictable results, however, you had best be more pragmatic.

9. Resist the Urge to Distribute

If you can, try to avoid using multiple machines (with the exception of hyper-parameter tuning). Training on a single machine is the most cost-effective way to proceed.

10. Convolution Networks work pretty well even beyond Images

Convolution networks are clearly the most successful kind of network in the Deep Learning space. However, ConvNets are not only for images; you can use them for other kinds of features (e.g. voice, time series, text).
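For sequence data such as text, the same convolution idea applies over a single dimension. Below is a minimal Keras sketch of a 1-D ConvNet text classifier; the vocabulary size, sequence length and binary output are placeholder choices:

from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

vocab_size, seq_len = 20000, 100   # placeholder values

model = Sequential([
    Embedding(vocab_size, 64, input_length=seq_len),
    Conv1D(128, kernel_size=5, activation="relu"),   # convolve along the word dimension
    GlobalMaxPooling1D(),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),                  # e.g. binary sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])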

That's all I have for now. There are certainly many more lessons. Let me know if you stumble on others.

You can find more details of these individual lessons at http://www.deeplearningpatterns.com.


Originally published at blog.alluviate.com.

