AI
Jump to navigation
Jump to search
Applications
- 人何時走完全未知?美研發AI預測臨終準確度達90%
- 美國FDA首次批准AI醫療儀器上市,能自動即時偵測糖尿病視網膜病變
- 在家養老-科技幫大忙
- 病理研究有新幫手,Google以AR顯微鏡結合深度學習即時發現癌細胞
- This New App Is Like Shazam for Your Nature Photos. Seek App.
- Draw This camera prints crappy drawings of the things you photograph (DIY) with Google's quickdraw.
- What Are Machine Learning Algorithms? Here’s How They Work
- Google的人工智慧開源神器三歲了,它被用在很多你想不到的地方 Nov 2018
TensorFlow
- https://www.tensorflow.org/
- https://tensorflow.rstudio.com/
- R interface to Keras. I followed the instruction for the installation but got an error of illegal operand. The solution is to use an older version of tensorflow; see here. library(keras); install_keras(tensorflow = "1.5") (Ubuntu 16.04, Phenom(tm) II X6 1055T)
- https://rviews.rstudio.com/2018/04/03/r-and-tensorflow-presentations/, Slides
- https://hub.docker.com/r/andrie/tensorflowr/ (outdated)
- Deep Learning on Biowulf
- Raspberry Pi
- Books
- Deep Learning with R by François Chollet with J. J. Allaire, 2018. ISBN-10: 161729554X (available on safaribooksonline)
- Deep Learning with Python by François Chollet, 2017 (available on safaribooksonline)
- Deep Learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville
- Deep Learning Glossary
Keras
- Derivative of a tensor operation: the gradient
- Define loss_value = f(W) = dot(W, x)
- W1 = W0 - step * gradient(f)(W0)
- Stochastic gradient descent
- Tensor operations:
- relu(x) = max(0, x)
- Each neural layer from our first network example transforms its input data:output = relu(dot(W, input) + b) where W and b are the weights or trainable parameters of the layer.
Training process:
- Draw a batch of X and Y
- Run the network on x (a step called the forward pass) to obtain predictions y_pred.
- How many layers to use.
- How many “hidden units” to chose for each layer.
- Compute the loss of the network on the batch
- loss
- optimizer: determines how learning proceeds (how the network will be updated based on the loss function). It implements a specific variant of stochastic gradient descent (SGD).
- metrics
- Update all weights of the network in a way that slightly reduces the loss on this batch.
- batch_size
- epochs (=iteration over all samples in a batch_size of samples)
Keras (in order to use Keras, you need to install TensorFlow or CNTK or Theano):
- Define your training data: input tensors and target tensors.
- Define a network of layers (or model). Two ways to define a model:
- using the keras_model_sequential() function (only for linear stacks of layers, which is the most common network architecture by far) or
model <- keras_model_sequential() %>% layer_dense(units = 32, input_shape = c(784)) %>% layer_dense(units = 10, activation = "softmax")
- the functional API (for directed acyclic graphs of layers, which let you build completely arbitrary architectures)
input_tensor <- layer_input(shape = c(784)) output_tensor <- input_tensor %>% layer_dense(units = 32, activation = "relu") %>% layer_dense(units = 10, activation = "softmax") model <- keras_model(inputs = input_tensor, outputs = output_tensor)
- using the keras_model_sequential() function (only for linear stacks of layers, which is the most common network architecture by far) or
- Compile the learning process by choosing a loss function, an optimizer, and some metrics to monitor.
model %>% compile( optimizer = optimizer_rmsprop(lr = 0.0001), loss = "mse", metrics = c("accuracy") )
- Iterate on your training data by calling the fit() method of your model.
model %>% fit(input_tensor, target_tensor, batch_size = 128, epochs = 10)
The following examples can be found at R Markdown Notebooks for "Deep Learning with R"
Some examples
- Binary data (Chapter 3.4).
- The final layer will use a sigmoid activation so as to output a probability (a score between 0 and 1, indicating how likely the sample is to have the target “1”.
- A relu (rectified linear unit) is a function meant to zero-out negative values, while a sigmoid “squashes” arbitrary values into the [0, 1] interval, thus outputting something that can be interpreted as a probability.
library(keras) imdb <- dataset_imdb(num_words = 10000) c(c(train_data, train_labels), c(test_data, test_labels)) %<-% imdb # Preparing the data vectorize_sequences <- function(sequences, dimension = 10000) {...} x_train <- vectorize_sequences(train_data) x_test <- vectorize_sequences(test_data) y_train <- as.numeric(train_labels) y_test <- as.numeric(test_labels) # Build the network ## Two intermediate layers with 16 hidden units each ## The final layer will output the scalar prediction model <- keras_model_sequential() %>% layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>% layer_dense(units = 16, activation = "relu") %>% layer_dense(units = 1, activation = "sigmoid") model %>% compile( optimizer = "rmsprop", loss = "binary_crossentropy", metrics = c("accuracy") ) model %>% fit(x_train, y_train, epochs = 4, batch_size = 512) ## Error in py_call_impl(callable, dots$args, dots$keywords) : MemoryError: # Validation results <- model %>% evaluate(x_test, y_test) # Prediction on new data model %>% predict(x_test[1:10,])
- Multi class data (Chapter 3.5)
- Goal: build a network to classify Reuters newswires into 46 different mutually-exclusive topics.
- You end the network with a dense layer of size 46. This means for each input sample, the network will output a 46-dimensional vector. Each entry in this vector (each dimension) will encode a different output class.
- The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the network will output a probability distribution over the 46 different output classes: that is, for every input sample, the network will produce a 46-dimensional output vector, where outputi is the probability that the sample belongs to class i. The 46 scores will sum to 1.
library(keras) reuters <- dataset_reuters(num_words = 10000) c(c(train_data, train_labels), c(test_data, test_labels)) %<-% reuters model <- keras_model_sequential() %>% layer_dense(units = 64, activation = "relu", input_shape = c(10000)) %>% layer_dense(units = 64, activation = "relu") %>% layer_dense(units = 46, activation = "softmax") model %>% compile( optimizer = "rmsprop", loss = "categorical_crossentropy", metrics = c("accuracy") ) history <- model %>% fit( partial_x_train, partial_y_train, epochs = 9, batch_size = 512, validation_data = list(x_val, y_val) ) results <- model %>% evaluate(x_test, one_hot_test_labels) # Prediction on new data predictions <- model %>% predict(x_test)
- Regression data (Chapter 3.6)