What’s really cool is that Keras in R now has an interface to AI Platform (formerly known as Cloud ML) on Google Cloud Platform. You do some trial and error on your own computer, and once you’re good to go, you can submit your job with a couple of lines of code.
- Part 1: Using Keras in R: Installing and Debugging
- Part 2: Using Keras in R: Training a model
- Part 3: Using Keras in R: Hypertuning a model
- Part 4: Using Keras in R: Submitting a job to AI Platform
Just to be clear: you need a working project in Google Cloud. If you’ve never registered for an account, you get a free $300 to spend. Once that’s done, go to AI Platform in the GCP menu and follow the instructions to enable this component. With that set up, you can proceed with the rest of this blog.
There is a compatibility issue with R versions above 3.5. If you see the following error in the logs of your job, it means you are using R version 3.6.0 on your local computer. This version saves RDS files in a way that is not backward compatible. Once such an RDS file is uploaded to AI Platform, it is unreadable there, since AI Platform runs R 3.5.0.
Error in readRDS("cloudml/deploy.rds") :
cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer
[…]
The replica master 0 exited with a non-zero status of 1.
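By the way, a quick way to confirm which R version your local session runs, straight from the R console:
R.version.string   # prints something like "R version 3.6.0 ..." on an affected machine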
What I recommend from this point onward is (the terminal side of this is sketched right after the list):
- Create a virtual environment in miniconda and activate it
- Install R version 3.4.1 using the command conda install -c conda-forge r=3.4.1
- Run RStudio from your virtual environment
- Select the freshly installed R 3.4.1 in RStudio
- Reinstall all the necessary packages
- Proceed
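As a rough sketch, the first two steps look something like this in a terminal (the environment name r-keras is just an example):
conda create -n r-keras                  # create a fresh environment (pick any name)
conda activate r-keras                   # activate it (older conda versions use 'source activate')
conda install -c conda-forge r=3.4.1     # install R 3.4.1 from the conda-forge channel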
The next thing you should do is install the Google Cloud SDK. I tried to use my existing installation, but R was not able to use it, so I reinstalled the Google Cloud SDK through RStudio.
install.packages("cloudml")
library(cloudml)
gcloud_install()
You can go into the Google Cloud terminal by using the command gcloud_init(). However, I kept getting the following error.
Failed to start terminal: ‘system error 126 (The specified module could not be found)’
It drove me nuts for a while, but I circumvented this step by simply opening the Google Cloud SDK Shell myself. Run gcloud init to get started and select a project, and log in through gcloud auth login.
Next: you should move the data set and training script (see the previous blog post) to a new directory within the current working directory. That’s because, once you submit your job to GCP, all files in your working directory will be passed along with it. These might contain references to R packages that are not available in GCP, and you will get the following error:
Error: Unable to retrieve package records for the following packages
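A minimal way to do that from the R console (the directory name gcp matches the setwd() call further down; data.rds is just a placeholder for whatever file your training script reads):
dir.create("gcp")                               # new directory inside the current working directory
file.copy(c("nn_ht_gcp.R", "data.rds"), "gcp")  # training script plus the data set it needs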
Since you will be deploying the training script to a server somewhere in the world, it should be able to run stand-alone: loading the data, preprocessing, and so on should all happen inside the training script.
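To make that concrete, here is a rough sketch of what such a stand-alone script (nn_ht_gcp.R) could look like, with flags matching the parameters tuned in the YAML file below; the data file name, preprocessing, and architecture are illustrative, not the exact script from the previous post.
library(keras)

# Flags correspond to the parameters tuned in tuning.yml
FLAGS <- flags(
  flag_numeric("dropout1", 0.4),
  flag_integer("neurons1", 128),
  flag_integer("neurons2", 128),
  flag_integer("neurons3", 128),
  flag_numeric("lr", 0.001),
  flag_numeric("l2", 0.001)
)

# Everything the script needs happens inside the script itself
data <- readRDS("data.rds")              # placeholder file name
x_train <- as.matrix(data$x)
y_train <- data$y

model <- keras_model_sequential() %>%
  layer_dense(units = FLAGS$neurons1, activation = "relu",
              kernel_regularizer = regularizer_l2(FLAGS$l2),
              input_shape = ncol(x_train)) %>%
  layer_dropout(rate = FLAGS$dropout1) %>%
  layer_dense(units = FLAGS$neurons2, activation = "relu") %>%
  layer_dense(units = FLAGS$neurons3, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = optimizer_adam(lr = FLAGS$lr),
  loss = "binary_crossentropy",
  metrics = "accuracy"
)

model %>% fit(
  x_train, y_train,
  epochs = 20,
  validation_split = 0.2                 # val_loss is the metric the tuner minimizes
)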
Next up: setting the parameters for the job. This is done through a YAML file, which you can also create and edit in RStudio. Once again, you can hypertune every parameter that is set as a flag in the training script. You can find a complete list of possible parameters and their values on this reference page.
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  hyperparameters:
    goal: MINIMIZE
    hyperparameterMetricTag: val_loss
    maxTrials: 10
    maxParallelTrials: 2
    params:
      - parameterName: dropout1
        type: DOUBLE
        minValue: 0.3
        maxValue: 0.6
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: neurons1
        type: DISCRETE
        discreteValues:
          - 128
          - 256
      - parameterName: neurons2
        type: DISCRETE
        discreteValues:
          - 128
          - 256
      - parameterName: neurons3
        type: DISCRETE
        discreteValues:
          - 128
          - 256
      - parameterName: lr
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.01
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: l2
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.01
        scaleType: UNIT_LINEAR_SCALE
Finally, in R, change to the appropriate working directory and submit the job to AI Platform.
setwd('gcp')
cloudml_train('nn_ht_gcp.R', config = 'tuning.yml')
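While the job runs (and after it finishes), you can also keep an eye on it from R; the cloudml package ships a few helpers for this. A small sketch, where the defaults refer to the most recently submitted job:
job_status()                   # current status of the latest job
job_collect(trials = "best")   # after completion, download the best trial into the local runs/ directory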
Important: before submitting your training job, make sure that Cloud ML (AI Platform) is allowed to write logs. Otherwise, when you submit a job, it will prepare the whole shebang and eventually just tell you that “The replica master 0 exited with a non-zero status of 1.” To check this, go to IAM in the GCP menu and verify that the Cloud ML Service Agent actually has the Logs Writer role. If not, assign it that role.

Now you can go to AI Platform in GCP and check whether the job is active. By clicking on ‘View logs’, you can access the Stackdriver logs and monitor what’s going on in the background.
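If you prefer to stay inside R, the cloudml package can also stream those logs straight into the console for the most recently submitted job:
job_stream_logs()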
