Spot conf parser #142
Conversation
| "--conf spark.executor.cores=" + SPK_EXEC_CORES, | ||
| "--conf spark.executor.memory=" + SPK_EXEC_MEM, | ||
| "--conf spark.driver.maxResultSize=" + SPK_DRIVER_MAX_RESULTS, | ||
| "--conf spark.yarn.driver.memoryOverhead=" + SPK_DRIVER_MEM_OVERHEAD, |
spark.yarn.driver.memoryOverhead should be spark.yarn.am.memoryOverhead, based on the current spot branch.
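As a hedged sketch of the suggested fix, the option list could be built as below. The `SPK_*` values are placeholders for illustration; in the real script they come from spot.conf.

```python
# Illustrative values only; the real script reads these from spot.conf.
SPK_EXEC_CORES = "4"
SPK_EXEC_MEM = "4g"
SPK_DRIVER_MAX_RESULTS = "8g"
SPK_DRIVER_MEM_OVERHEAD = "512"

spark_extras = [
    "--conf spark.executor.cores=" + SPK_EXEC_CORES,
    "--conf spark.executor.memory=" + SPK_EXEC_MEM,
    "--conf spark.driver.maxResultSize=" + SPK_DRIVER_MAX_RESULTS,
    # spark.yarn.am.memoryOverhead replaces spark.yarn.driver.memoryOverhead
    "--conf spark.yarn.am.memoryOverhead=" + SPK_DRIVER_MEM_OVERHEAD,
]
```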
    SPK_DRIVER_MAX_RESULTS=
    SPK_EXEC_CORES=
    SPK_DRIVER_MEM_OVERHEAD=
    SPK_EXEC_MEM_OVERHEAD=

    TOL = conf.get('DEFAULT','TOL')

    #prepare options for spark-submit
    spark_cmd = [
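The `conf.get('DEFAULT','TOL')` call above uses the standard-library config parser. A minimal self-contained sketch of that pattern (shown with Python 3's `configparser` and an illustrative in-memory config; the keys and values are assumptions, not the project's real settings):

```python
import configparser

# Illustrative spot.conf fragment; real keys and values live in the actual file.
SAMPLE_CONF = """
[DEFAULT]
TOL = 1.1
SPK_EXEC_CORES = 4
"""

conf = configparser.ConfigParser()
conf.read_string(SAMPLE_CONF)

# Values come back as strings, exactly as written in the file.
TOL = conf.get('DEFAULT', 'TOL')
SPK_EXEC_CORES = conf.get('DEFAULT', 'SPK_EXEC_CORES')

# prepare options for spark-submit, in the spirit of the spark_cmd list above
spark_cmd = [
    "spark-submit",
    "--conf spark.executor.cores=" + SPK_EXEC_CORES,
]
```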
Some of the values in spark_cmd and spark_extras are either modified or gone in the spot branch.
I'm working on rebasing my branch right now to resolve this and the conflicts.
There was only the one change to the actual Spark command that I could find. Let me know if there is anything else; pushing the changes now.
Now that we are reviewing this... Obviously these vars are not being used, is there any reason to keep them around?

    PREPROCESS_STEP = "{0}_pre_lda".format(args.type)
    HDFS_DOCRESULTS = "{0}/doc_results.csv".format(HPATH)
    HDFS_WORDRESULTS = "{0}/word_results.csv".format(HPATH)
    LDA_OUTPUT_DIR = "{1}/{1}".format(args.type, args.fdate)
I don't see any reason to keep those.
Moving hdfs_setup and ml_ops to Python scripts instead of bash to support the new spot.conf.
Now all variables are stored in spot.conf, including the ingest configurations.
I have left the original bash scripts in place for comparison and testing this round.
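Since spot.conf is a bash-style file of KEY=VALUE lines, a Python script replacing the bash version needs to read it without sourcing a shell. One possible minimal sketch; the parsing rules and sample keys here are assumptions for illustration, not the project's actual parser:

```python
def parse_spot_conf(text):
    """Return a dict of KEY -> value from KEY=VALUE lines,
    skipping blanks and comments and stripping surrounding quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

# Illustrative fragment of a bash-style spot.conf.
sample = """
# ml settings
TOL='1.1'
SPK_EXEC_CORES=4
"""
conf = parse_spot_conf(sample)
```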