This repository was archived by the owner on May 15, 2019. It is now read-only.

Spot conf parser #142

Open
natedogs911 wants to merge 23 commits into spot from spot_conf_parser

Conversation

@natedogs911
Contributor

Moving hdfs_setup and ml_ops to Python scripts instead of bash to support the new spot.conf. All variables are now stored in spot.conf, including the ingest configurations.

I have left the original bash scripts in place for comparison and testing this round.

Comment thread spot-ml/ml_ops.py Outdated
"--conf spark.executor.cores=" + SPK_EXEC_CORES,
"--conf spark.executor.memory=" + SPK_EXEC_MEM,
"--conf spark.driver.maxResultSize=" + SPK_DRIVER_MAX_RESULTS,
"--conf spark.yarn.driver.memoryOverhead=" + SPK_DRIVER_MEM_OVERHEAD,

spark.yarn.driver.memoryOverhead should be spark.yarn.am.memoryOverhead based on the current spot branch.

Contributor Author

done
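For reference, a hedged sketch of the corrected fragment, assuming the reviewer's suggestion: in yarn-client mode the application-master overhead is configured via spark.yarn.am.memoryOverhead. The SPK_* values below are placeholders, not the PR's actual defaults:

```python
# Placeholder values for illustration; in the PR these are read from spot.conf.
SPK_EXEC_CORES = "1"
SPK_EXEC_MEM = "8g"
SPK_DRIVER_MAX_RESULTS = "8g"
SPK_DRIVER_MEM_OVERHEAD = "1024"

spark_conf_opts = [
    "--conf spark.executor.cores=" + SPK_EXEC_CORES,
    "--conf spark.executor.memory=" + SPK_EXEC_MEM,
    "--conf spark.driver.maxResultSize=" + SPK_DRIVER_MAX_RESULTS,
    # was spark.yarn.driver.memoryOverhead; renamed per the review comment
    "--conf spark.yarn.am.memoryOverhead=" + SPK_DRIVER_MEM_OVERHEAD,
]
```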

Comment thread spot-setup/spot.conf
SPK_DRIVER_MAX_RESULTS=
SPK_EXEC_CORES=
SPK_DRIVER_MEM_OVERHEAD=
SPK_EXEC_MEM_OVERHEAD=

Thanks, you fixed it.

Comment thread spot-ml/ml_ops.py Outdated
TOL = conf.get('DEFAULT','TOL')

#prepare options for spark-submit
spark_cmd = [

Some of the values in spark_cmd and spark_extras have been modified or removed in the spot branch.

Contributor Author

I'm working on rebasing my branch right now to resolve this and the conflicts.

Contributor Author

There was only the one change to the actual spark command that I could find. Let me know if there is anything else; pushing the changes now.
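A hedged sketch of how the assembled option list might be handed to spark-submit from Python. The jar path and values below are illustrative, not necessarily the PR's exact command:

```python
import subprocess

# Illustrative fragments; in ml_ops.py these values come from spot.conf.
spark_cmd = ["spark-submit", "--master yarn"]
spark_cmd += ["--conf spark.executor.memory=8g"]  # placeholder value
spark_cmd += ["target/spot-ml-assembly.jar"]      # hypothetical jar path

# Each element above contains spaces, so the pieces are joined into one
# string and executed through the shell rather than passed as an argv list:
# rc = subprocess.call(" ".join(spark_cmd), shell=True)
```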

@natedogs911
Contributor Author

Now that we are reviewing this...

Obviously these vars are not being used; is there any reason to keep them around?

PREPROCESS_STEP = "{0}_pre_lda".format(args.type)
POSTPROCESS_STEP = "{0}_post_lda".format(args.type)

HDFS_DOCRESULTS = "{0}/doc_results.csv".format(HPATH)
LOCAL_DOCRESULTS = "{0}/doc_results.csv".format(LPATH)

HDFS_WORDRESULTS = "{0}/word_results.csv".format(HPATH)
LOCAL_WORDRESULTS = "{0}/word_results.csv".format(LPATH)

LDA_OUTPUT_DIR = "{1}/{1}".format(args.type, args.fdate)
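Incidentally, the format indices in the LDA_OUTPUT_DIR line look off: "{1}/{1}" repeats the second argument and never uses args.type. A quick illustration with hypothetical values:

```python
# "{1}/{1}" selects the second positional argument twice, dropping the first.
buggy = "{1}/{1}".format("dns", "20190101")     # -> "20190101/20190101"
# Presumably "{0}/{1}" was intended:
intended = "{0}/{1}".format("dns", "20190101")  # -> "dns/20190101"
```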

@rabarona

I don't see any reason to keep those.
