Google Cloud Platform – Machine Learning as a Service

Links:

Google Cloud Console: https://console.cloud.google.com/home/

Example using the Census dataset: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census

Common Commands:

gcloud ml-engine jobs describe [JOB_NAME]    (show the job's state and configuration)

gcloud ml-engine jobs cancel [JOB_NAME]    (cancel a running job)

Related Concepts:

Wide models vs. deep models

https://www.tensorflow.org/tutorials/wide_and_deep

Embedding columns: suitable for sparse, high-cardinality categorical attributes (e.g., native_country, occupation)

https://www.tensorflow.org/programmers_guide/embedding

Note:

Error Message:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 4, in <module>
    import model
  File "/root/.local/lib/python2.7/site-packages/trainer/model.py", line 40, in <module>
    tf.feature_column.categorical_column_with_vocabulary_list(
AttributeError: 'module' object has no attribute 'feature_column'

Solution:

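The TensorFlow version bundled with runtime version 1.0 does not provide tf.feature_column. Change the runtime version from 1.0 to 1.2 with the --runtime-version flag: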

gcloud ml-engine jobs submit training $JOB_NAME \
                                    --stream-logs \
                                    --scale-tier $SCALE_TIER \
                                    --runtime-version 1.2 \
                                    --job-dir $GCS_JOB_DIR \
                                    --module-name trainer.task \
                                    --package-path trainer/ \
                                    --region us-central1 \
                                    -- \
                                    --train-files $TRAIN_FILE \
                                    --eval-files $EVAL_FILE \
                                    --train-steps $TRAIN_STEPS \
                                    --eval-steps 100

Weka APIs

References:

Filters

weka.filters.unsupervised.attribute.Remove

Remove specified attributes (columns)

http://weka.sourceforge.net/doc.stable/weka/filters/unsupervised/attribute/Remove.html

Valid options are:

 -R <index1,index2-index4,...>
  Specify list of columns to delete. First and last are valid
  indexes. (default none)

 -V
  Invert matching sense (i.e. only keep specified columns)
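The -R indexes are 1-based, comma-separated, may include ranges such as 2-4, and accept first and last. Below is a minimal, stdlib-only sketch of which columns survive a given -R/-V combination, assuming those semantics. RemoveRangeSketch and its methods are hypothetical helpers, not part of Weka; in Weka itself the same spec is passed to the filter with setAttributeIndices(String) and setInvertSelection(boolean) (or setOptions), followed by setInputFormat and Filter.useFilter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stdlib-only sketch (NOT Weka code): mimics how the Remove
// filter's -R / -V options select columns, to illustrate the index semantics.
public class RemoveRangeSketch {

    // Resolve one token ("first", "last", or a 1-based number) to a 0-based index.
    static int resolve(String token, int numColumns) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) return 0;
        if (t.equalsIgnoreCase("last")) return numColumns - 1;
        return Integer.parseInt(t) - 1;
    }

    // Parse a spec such as "1,3-4" or "first-2,last" into a selection mask.
    static boolean[] parseRange(String spec, int numColumns) {
        boolean[] selected = new boolean[numColumns];
        for (String part : spec.split(",")) {
            String[] bounds = part.split("-");
            int lo = resolve(bounds[0], numColumns);
            int hi = bounds.length > 1 ? resolve(bounds[1], numColumns) : lo;
            for (int i = lo; i <= hi; i++) selected[i] = true;
        }
        return selected;
    }

    // 1-based column numbers that remain after filtering.
    // invert == false (-R only): the selected columns are deleted.
    // invert == true  (-R with -V): only the selected columns are kept.
    static List<Integer> remainingColumns(String spec, int numColumns, boolean invert) {
        boolean[] selected = parseRange(spec, numColumns);
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < numColumns; i++) {
            if (selected[i] == invert) kept.add(i + 1);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Five columns; "-R 1,3-4" deletes columns 1, 3 and 4.
        System.out.println(remainingColumns("1,3-4", 5, false)); // [2, 5]
        // Same spec with -V keeps only columns 1, 3 and 4.
        System.out.println(remainingColumns("1,3-4", 5, true));  // [1, 3, 4]
        // "first" and "last" resolve to columns 1 and 5.
        System.out.println(remainingColumns("first,last", 5, false)); // [2, 3, 4]
    }
}
```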

Instances

public Instances trainCV(int numFolds, int numFold)

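This method returns the training set for fold numFold (0-based) out of numFolds in a cross-validation: every instance except those in that fold. It does not randomize the dataset before splitting, so each test fold is a contiguous block of instances in their current order. Shuffle first with randomize(Random) (and, for a nominal class, stratify(numFolds)) if the data is sorted.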


public Instances trainCV(int numFolds, int numFold, java.util.Random random)

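This overload makes the same split; the Random object is only used to shuffle the returned training set after splitting, so it does not change which instances end up in each fold.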
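Because no shuffling happens before the split, fold membership depends only on instance positions. Below is a stdlib-only sketch of that fold arithmetic, assuming it mirrors the index computation in Weka's Instances.testCV/trainCV: each fold gets numInstances / numFolds instances, and the first numInstances % numFolds folds get one extra. CvFoldSketch and its methods are hypothetical helpers, not Weka API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (NOT Weka code) of how an unshuffled dataset is cut into
// cross-validation folds: each test fold is a contiguous block, and the first
// (numInstances % numFolds) folds hold one extra row.
public class CvFoldSketch {

    // Returns {first, size}: start index and length of the test block for numFold (0-based).
    static int[] testBlock(int numInstances, int numFolds, int numFold) {
        int size = numInstances / numFolds;
        int offset;
        if (numFold < numInstances % numFolds) {
            size++;            // this fold absorbs one of the leftover rows
            offset = numFold;  // earlier folds were each one row larger
        } else {
            offset = numInstances % numFolds;
        }
        int first = numFold * (numInstances / numFolds) + offset;
        return new int[] {first, size};
    }

    // Row indices of the training set: everything outside the test block, in original order.
    static List<Integer> trainIndices(int numInstances, int numFolds, int numFold) {
        int[] block = testBlock(numInstances, numFolds, numFold);
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            if (i < block[0] || i >= block[0] + block[1]) train.add(i);
        }
        return train;
    }

    public static void main(String[] args) {
        // 10 rows, 3 folds: test blocks are rows 0-3, 4-6 and 7-9.
        for (int fold = 0; fold < 3; fold++) {
            int[] b = testBlock(10, 3, fold);
            System.out.println("fold " + fold + ": test rows " + b[0] + "-" + (b[0] + b[1] - 1)
                    + ", train rows " + trainIndices(10, 3, fold));
        }
    }
}
```

If the rows are sorted (e.g., by class), every fold is biased, which is why the dataset is usually shuffled before looping over folds; passing a Random to trainCV only reorders each training set and does not fix that.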

Save Dataset

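 import java.io.File;
 import weka.core.converters.ArffSaver;

 // dataSet is an existing weka.core.Instances object; writeBatch() throws IOException
 ArffSaver saver = new ArffSaver();
 saver.setInstances(dataSet);
 saver.setFile(new File("./data/test.arff"));
 saver.setDestination(new File("./data/test.arff"));   // not necessary in Weka 3.5.4 and later
 saver.writeBatch();

 Source: https://weka.wikispaces.com/Save+Instances+to+an+ARFF+File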