AI
Applications
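- Is the time of death completely unknowable? US-developed AI predicts end of life with up to 90% accuracy
- US FDA approves the first AI medical device for market, able to automatically detect diabetic retinopathy in real time
- Aging at home: technology lends a big hand
- A new helper for pathology research: Google combines an AR microscope with deep learning to spot cancer cells in real time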
- This New App Is Like Shazam for Your Nature Photos. Seek App.
- Draw This camera prints crappy drawings of the things you photograph (DIY) with Google's quickdraw.
- What Are Machine Learning Algorithms? Here’s How They Work
- Google's open-source AI tool turns three, and it is used in many places you would not expect (Nov 2018)
TensorFlow
- https://www.tensorflow.org/
- https://tensorflow.rstudio.com/
- R interface to Keras. I followed the installation instructions but got an "illegal operand" error. A likely cause is that prebuilt TensorFlow binaries from version 1.6 onward use AVX instructions, which this CPU lacks. The solution is to use an older version of tensorflow; see here. library(keras); install_keras(tensorflow = "1.5") (Ubuntu 16.04, Phenom(tm) II X6 1055T)
- https://rviews.rstudio.com/2018/04/03/r-and-tensorflow-presentations/, Slides
- https://hub.docker.com/r/andrie/tensorflowr/, https://hub.docker.com/r/rocker/ml/dockerfile (outdated)
- Deep Learning on Biowulf
- Raspberry Pi
- Books
- Deep Learning with R by François Chollet with J. J. Allaire, 2018. ISBN-10: 161729554X (available on safaribooksonline)
- Deep Learning with Python by François Chollet, 2017 (available on safaribooksonline)
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Deep Learning Glossary
Keras
- Derivative of a tensor operation: the gradient
- Define loss_value = f(W) = dot(W, x)
- W1 = W0 - step * gradient(f)(W0)
- Stochastic gradient descent
- Tensor operations:
- relu(x) = max(0, x)
- Each neural layer from our first network example transforms its input data: output = relu(dot(W, input) + b), where W and b are the weights (trainable parameters) of the layer.
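The layer formula above can be tried out in base R without Keras. This is only a sketch: the helper names relu and dense_relu and the toy values of W, b and input are made up for illustration and are not part of the keras package.

```r
# Element-wise rectified linear unit: relu(x) = max(0, x)
relu <- function(x) pmax(x, 0)

# One dense layer: output = relu(dot(W, input) + b)
# W has one row per output unit and one column per input feature.
dense_relu <- function(W, b, input) {
  relu(as.vector(W %*% input) + b)
}

# Hypothetical toy values: 3 input features, 2 output units
W <- matrix(c( 1, 0, -1,
              -1, 2,  1), nrow = 2, byrow = TRUE)
b <- c(0.5, -0.5)
input <- c(1, 2, 3)

output <- dense_relu(W, b, input)
print(output)   # 0.0 5.5 (the first unit is negative before relu, so it is zeroed)
```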
Training process:
- Draw a batch of training samples x and corresponding targets y.
- Run the network on x (a step called the forward pass) to obtain predictions y_pred. Architecture choices made beforehand:
  - How many layers to use.
  - How many “hidden units” to choose for each layer.
- Compute the loss of the network on the batch. Settings chosen when compiling the model:
  - loss
  - optimizer: determines how learning proceeds (how the network will be updated based on the loss function). It implements a specific variant of stochastic gradient descent (SGD).
  - metrics
- Update all weights of the network in a way that slightly reduces the loss on this batch. Settings chosen when fitting the model:
  - batch_size: number of samples in each batch.
  - epochs: number of passes over the entire training set; each pass goes through all samples in batches of batch_size samples.
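The training steps above can be sketched in base R as a mini-batch SGD loop. A plain linear model with a mean squared error loss stands in for the network here, and the toy data, the learning rate step and the other settings are assumptions chosen for illustration only.

```r
set.seed(1)

# Hypothetical toy data: y = 2 * x1 - 3 * x2 plus a little noise
n <- 200
X <- matrix(rnorm(n * 2), ncol = 2)
y <- as.vector(X %*% c(2, -3)) + rnorm(n, sd = 0.01)

W <- c(0, 0)        # trainable weights, started at zero
step <- 0.1         # learning rate
batch_size <- 20
epochs <- 30

for (epoch in seq_len(epochs)) {
  shuffled <- sample(n)                      # new random order each epoch
  for (start in seq(1, n, by = batch_size)) {
    # 1. Draw a batch of samples x and targets y
    idx <- shuffled[start:min(start + batch_size - 1, n)]
    x_batch <- X[idx, , drop = FALSE]
    y_batch <- y[idx]
    # 2. Forward pass
    y_pred <- as.vector(x_batch %*% W)
    # 3. Loss on the batch (mean squared error)
    loss <- mean((y_pred - y_batch)^2)
    # 4. Gradient of the loss with respect to W, then W1 = W0 - step * gradient
    grad <- as.vector(2 * t(x_batch) %*% (y_pred - y_batch)) / length(idx)
    W <- W - step * grad
  }
}

# W should end up close to the true coefficients c(2, -3)
print(round(W, 2))
```

One epoch here is one pass over all 200 samples, that is 10 weight updates with batch_size = 20.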
Keras (in order to use Keras, you need to install TensorFlow or CNTK or Theano):
- Define your training data: input tensors and target tensors.
- Define a network of layers (or model). Two ways to define a model:
- using the keras_model_sequential() function (only for linear stacks of layers, which is the most common network architecture by far) or
  model <- keras_model_sequential() %>%
    layer_dense(units = 32, input_shape = c(784)) %>%
    layer_dense(units = 10, activation = "softmax")
- the functional API (for directed acyclic graphs of layers, which let you build completely arbitrary architectures)
  input_tensor <- layer_input(shape = c(784))
  output_tensor <- input_tensor %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = 10, activation = "softmax")
  model <- keras_model(inputs = input_tensor, outputs = output_tensor)
- Compile the learning process by choosing a loss function, an optimizer, and some metrics to monitor.
  model %>% compile(
    optimizer = optimizer_rmsprop(lr = 0.0001),
    loss = "mse",
    metrics = c("accuracy")
  )
- Iterate on your training data by calling the fit() method of your model.
model %>% fit(input_tensor, target_tensor, batch_size = 128, epochs = 10)
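As a rough check on model size, the number of trainable parameters in the two-layer sequential model above (784 inputs, then 32 units, then 10 units) can be worked out by hand. The helper dense_params is a made-up name for this sketch and is not a keras function.

```r
# Parameters of a dense layer: one weight per (input, unit) pair plus one bias per unit
dense_params <- function(n_inputs, units) n_inputs * units + units

layer1 <- dense_params(784, 32)   # layer_dense(units = 32, input_shape = c(784))
layer2 <- dense_params(32, 10)    # layer_dense(units = 10, activation = "softmax")
total <- layer1 + layer2

print(c(layer1 = layer1, layer2 = layer2, total = total))
# layer1 = 25120, layer2 = 330, total = 25450
```

With Keras installed, these counts should match what summary(model) reports for each layer.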
The following examples can be found at R Markdown Notebooks for "Deep Learning with R"
Some examples
- Binary data (Chapter 3.4).
- The final layer will use a sigmoid activation so as to output a probability (a score between 0 and 1, indicating how likely the sample is to have the target “1”).
- A relu (rectified linear unit) is a function meant to zero-out negative values, while a sigmoid “squashes” arbitrary values into the [0, 1] interval, thus outputting something that can be interpreted as a probability.
  library(keras)
  imdb <- dataset_imdb(num_words = 10000)
  c(c(train_data, train_labels), c(test_data, test_labels)) %<-% imdb

  # Preparing the data
  vectorize_sequences <- function(sequences, dimension = 10000) {...}
  x_train <- vectorize_sequences(train_data)
  x_test <- vectorize_sequences(test_data)
  y_train <- as.numeric(train_labels)
  y_test <- as.numeric(test_labels)

  # Build the network
  ## Two intermediate layers with 16 hidden units each
  ## The final layer will output the scalar prediction
  model <- keras_model_sequential() %>%
    layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
    layer_dense(units = 16, activation = "relu") %>%
    layer_dense(units = 1, activation = "sigmoid")

  model %>% compile(
    optimizer = "rmsprop",
    loss = "binary_crossentropy",
    metrics = c("accuracy")
  )

  model %>% fit(x_train, y_train, epochs = 4, batch_size = 512)
  ## Error in py_call_impl(callable, dots$args, dots$keywords) : MemoryError:

  # Validation
  results <- model %>% evaluate(x_test, y_test)

  # Prediction on new data
  model %>% predict(x_test[1:10,])
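The body of vectorize_sequences is elided above. One plausible way to write it, assuming its job is to turn each list of word indices into a 0/1 vector of length dimension (multi-hot encoding), is sketched below in base R. The toy input is hypothetical, and this is not necessarily the exact body used in the notebook.

```r
# Sketch of a multi-hot encoder; the original body is elided, so this is an assumption.
vectorize_sequences <- function(sequences, dimension = 10000) {
  # One row per sequence, one column per possible word index
  results <- matrix(0, nrow = length(sequences), ncol = dimension)
  for (i in seq_along(sequences)) {
    # Mark every word index that occurs in sequence i (repeats count once)
    results[i, sequences[[i]]] <- 1
  }
  results
}

# Hypothetical toy input: two "reviews" over a vocabulary of 8 word indices
toy <- list(c(1, 3, 3, 8), c(2, 5))
encoded <- vectorize_sequences(toy, dimension = 8)
print(encoded)
```

A dense 25000 x 10000 matrix of doubles takes about 2 GB, which may be related to the MemoryError noted above on machines with little RAM.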
- Multi-class data (Chapter 3.5)
- Goal: build a network to classify Reuters newswires into 46 different mutually-exclusive topics.
- You end the network with a dense layer of size 46. This means for each input sample, the network will output a 46-dimensional vector. Each entry in this vector (each dimension) will encode a different output class.
- The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the network will output a probability distribution over the 46 different output classes: that is, for every input sample, the network will produce a 46-dimensional output vector, where output[i] is the probability that the sample belongs to class i. The 46 scores will sum to 1.
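  library(keras)
  reuters <- dataset_reuters(num_words = 10000)
  c(c(train_data, train_labels), c(test_data, test_labels)) %<-% reuters

  # Preparing the data (assumed: same vectorize_sequences() helper as in the binary example)
  x_train <- vectorize_sequences(train_data)
  x_test <- vectorize_sequences(test_data)
  one_hot_train_labels <- to_categorical(train_labels)
  one_hot_test_labels <- to_categorical(test_labels)

  # Hold out 1,000 training samples for validation (assumed split)
  val_indices <- 1:1000
  x_val <- x_train[val_indices, ]
  partial_x_train <- x_train[-val_indices, ]
  y_val <- one_hot_train_labels[val_indices, ]
  partial_y_train <- one_hot_train_labels[-val_indices, ]

  model <- keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu", input_shape = c(10000)) %>%
    layer_dense(units = 64, activation = "relu") %>%
    layer_dense(units = 46, activation = "softmax")

  model %>% compile(
    optimizer = "rmsprop",
    loss = "categorical_crossentropy",
    metrics = c("accuracy")
  )

  history <- model %>% fit(
    partial_x_train, partial_y_train,
    epochs = 9, batch_size = 512,
    validation_data = list(x_val, y_val)
  )

  results <- model %>% evaluate(x_test, one_hot_test_labels)

  # Prediction on new data
  predictions <- model %>% predict(x_test)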
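To make the softmax output and the one-hot targets more concrete, here is a small base R sketch with 3 classes instead of 46. The functions softmax and one_hot are hand-written stand-ins for illustration (one_hot plays the role of to_categorical), and the scores are made up.

```r
# Softmax: turns arbitrary scores into probabilities that sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))   # subtracting max(z) avoids overflow
  e / sum(e)
}

# One-hot encoding of 0-based integer class labels
one_hot <- function(labels, num_classes) {
  out <- matrix(0, nrow = length(labels), ncol = num_classes)
  out[cbind(seq_along(labels), labels + 1)] <- 1
  out
}

scores <- c(2.0, 1.0, 0.1)                 # hypothetical last-layer scores
probs <- softmax(scores)
predicted_class <- which.max(probs) - 1    # back to a 0-based label

# Categorical cross-entropy against the true label 0
target <- as.vector(one_hot(0, 3))
loss <- -sum(target * log(probs))

print(probs)
print(sum(probs))
print(predicted_class)
```

The same which.max step can be applied to each row of predictions to get the most likely topic for each newswire.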
- Regression data (Chapter 3.6)
- Because so few samples are available, we will be using a very small network with two hidden layers. In general, the less training data you have, the worse overfitting will be, and using a small network is one way to mitigate overfitting.
- Our network ends with a single unit and no activation (i.e. it will be a linear layer). This is a typical setup for scalar regression (i.e. regression where we are trying to predict a single continuous value). Applying an activation function would constrain the range that the output can take. Here, because the last layer is purely linear, the network is free to learn to predict values in any range.
- We are also monitoring a new metric during training: mae. This stands for Mean Absolute Error.
  library(keras)
  dataset <- dataset_boston_housing()
  c(c(train_data, train_targets), c(test_data, test_targets)) %<-% dataset

  build_model <- function() {
    model <- keras_model_sequential() %>%
      layer_dense(units = 64, activation = "relu",
                  input_shape = dim(train_data)[[2]]) %>%
      layer_dense(units = 64, activation = "relu") %>%
      layer_dense(units = 1)
    model %>% compile(
      optimizer = "rmsprop",
      loss = "mse",
      metrics = c("mae")
    )
  }

  # K-fold CV
  k <- 4
  indices <- sample(1:nrow(train_data))
  folds <- cut(1:length(indices), breaks = k, labels = FALSE)
  num_epochs <- 100
  all_scores <- c()
  for (i in 1:k) {
    cat("processing fold #", i, "\n")

    # Prepare the validation data: data from partition #i
    # (taken from the shuffled row numbers, so that the shuffle is actually used)
    val_indices <- indices[folds == i]
    val_data <- train_data[val_indices,]
    val_targets <- train_targets[val_indices]

    # Prepare the training data: data from all other partitions
    partial_train_data <- train_data[-val_indices,]
    partial_train_targets <- train_targets[-val_indices]

    # Build the Keras model (already compiled)
    model <- build_model()

    # Train the model (in silent mode, verbose=0)
    model %>% fit(partial_train_data, partial_train_targets,
                  epochs = num_epochs, batch_size = 1, verbose = 0)

    # Evaluate the model on the validation data
    results <- model %>% evaluate(val_data, val_targets, verbose = 0)
    all_scores <- c(all_scores, results$mean_absolute_error)
  }
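The fold bookkeeping and the mae metric can be checked on their own in base R. This is a sketch with a hypothetical sample count n = 10, and it shows one way to make fold membership random by indexing the shuffled row numbers. The helper mae is hand-written here and is not the Keras metric itself.

```r
set.seed(42)

# Mean absolute error between predictions and targets
mae <- function(pred, target) mean(abs(pred - target))

n <- 10   # hypothetical number of training samples
k <- 4
indices <- sample(1:n)                          # shuffled row numbers
folds <- cut(1:n, breaks = k, labels = FALSE)   # fold label for each position

# Row numbers that form the validation set of each fold
fold_rows <- lapply(1:k, function(i) indices[folds == i])
print(fold_rows)

# Each row number appears in exactly one fold
stopifnot(identical(sort(unlist(fold_rows)), 1:n))

# Toy check of the metric: errors are 0, 1 and 2, so the mean is 1
print(mae(c(1, 2, 3), c(1, 1, 5)))
```

After the cross-validation loop, mean(all_scores) gives the average validation MAE over the k folds.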