Force-pushed from cdeb5d1 to 04e0547.
Force-pushed from 07caa3c to e57e09d.

What's the status on this?

The only comment from Andrews was to graph mAP rather than loss. But it sounds like the plan is to show loss unless mAP is available, which sounds like an enhancement; it could show both if mAP is output. This could be added to the top of the main page, with the dataset and model jobs just beneath.

I think any version of this will be an improvement on the current home page. Can you rebase to fix the merge conflicts and let me know when this is ready for review?

Rebased. I'll move it to the home page next.
Force-pushed from 30048c7 to 7838d29.

@lukeyeager, can you have a look? I might move job_management.html into home.html rather than include it.

Looks pretty good!
Let's be consistent. How about doing what Google Maps does: tell you (1) the time remaining and (2) the estimated time at which it will finish.
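A minimal sketch of the Google Maps-style display, assuming a task's progress grows linearly with elapsed time; eta_display and format_eta are hypothetical helpers, not existing DIGITS functions:

```python
import datetime


def eta_display(progress, elapsed_seconds, now):
    """Return (time remaining, estimated finish time) for a task.

    Assumes progress (0 < progress <= 1) grows linearly with elapsed time,
    so the total time is extrapolated as elapsed / progress.
    """
    if not 0 < progress <= 1:
        return None  # no estimate until some progress has been reported
    total_seconds = elapsed_seconds / float(progress)
    remaining = datetime.timedelta(seconds=total_seconds - elapsed_seconds)
    return remaining, now + remaining


def format_eta(remaining, finish):
    """Render both values, e.g. '3 min remaining (done at 12:03)'."""
    minutes = int(remaining.total_seconds() // 60)
    return '%d min remaining (done at %s)' % (minutes, finish.strftime('%H:%M'))


now = datetime.datetime(2016, 1, 1, 12, 0, 0)
remaining, finish = eta_display(0.25, 60, now)
print(format_eta(remaining, finish))  # 3 min remaining (done at 12:03)
```

Linear extrapolation is the simplest choice; it will be noisy early in a task, when progress is small.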
digits/job.py
I'm confused by this function. If you have tasks A and B, and A is B's parent, wouldn't this function return [A,B,A]? Is that what you want? Don't you just want the list at job.tasks?

Wait, so if A is B's parent, then it is also in the job's list of tasks? That sounds like the node is in the graph twice.

Yes, all tasks are in the job.tasks list. The task.parents field is optional, and may describe dependencies between tasks. The tasks are not necessarily a fully-connected graph. We can change this behavior if you have a good reason for it.

No need to change it at this point. I was just working with certain assumptions about the graph. I removed that method in job.py and task.py, so I'm happy with that.
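To make the [A,B,A] concern concrete, here is a hypothetical reconstruction (the removed method's code isn't shown in this thread): a walk that emits each task and then, recursively, its parents visits A twice when job.tasks is [A, B] and B.parents is [A]. Because job.tasks already lists every task, iterating it directly visits each task once. Task here is a stand-in, not the DIGITS class.

```python
class Task(object):
    """Hypothetical stand-in for a DIGITS task: a name plus optional parents."""

    def __init__(self, name, parents=None):
        self.name = name
        self.parents = parents or []


def tasks_recursively(tasks):
    """Naive walk: each task, then (recursively) its parents."""
    result = []
    for task in tasks:
        result.append(task)
        result.extend(tasks_recursively(task.parents))
    return result


a = Task('A')
b = Task('B', parents=[a])
job_tasks = [a, b]  # like job.tasks: every task, parents included

walked = [t.name for t in tasks_recursively(job_tasks)]
direct = [t.name for t in job_tasks]
print(walked)  # ['A', 'B', 'A']
print(direct)  # ['A', 'B']
```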

@lukeyeager The trick with time estimation is that the Jobs page shows the estimated time remaining for the task, not the job. Projecting the time for the whole job is not going to be very reliable. We could show just the last task, or display the times for each task.
Force-pushed from 7838d29 to e159fb2.
Oh right, I forgot about this problem. How ridiculous would it be for us to assume that all tasks take the same amount of time? Could we display the overall, naively averaged progress on top, and the progress of each task below? That might look ugly; I'm just spitballing here. I don't think we need to solve this before merging.
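A minimal sketch of the "naively averaged progress on top, per-task progress below" idea, assuming each task reports progress as a float in [0, 1]; the helper names and task names are hypothetical:

```python
def overall_progress(progresses):
    """Naive job progress: the mean of per-task progress values in [0, 1].

    This assumes every task takes the same amount of time, which will
    often be wrong (e.g. dataset creation vs. training).
    """
    if not progresses:
        return 0.0
    return sum(progresses) / float(len(progresses))


def progress_lines(tasks):
    """Overall progress on top, then one line per (name, progress) task."""
    overall = overall_progress([p for _, p in tasks])
    lines = ['Overall: %d%%' % round(100 * overall)]
    lines += ['  %s: %d%%' % (name, round(100 * p)) for name, p in tasks]
    return lines


lines = progress_lines([('Create DB', 1.0), ('Train', 0.5)])
for line in lines:
    print(line)
```

With one finished task and one half-done task this reports 75% overall, even if training actually dominates the wall-clock time; that is the trade-off of the equal-duration assumption.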
Force-pushed from e159fb2 to 008a038.

@lukeyeager, I emit GPU availability after resources are allocated or deallocated. That should be pretty solid, I think. The sparkline is drawn when the page is refreshed. I've removed the get_tasks_recursively method, because job.tasks is already the exhaustive task list. I, too, would rather get this out and deal with the ETA issue in the future.

Sounds good, thanks. You've still got some tests failing on the Travis build. Let's get those sorted out and merge.
Force-pushed from 5c23342 to 91b3327.

Quick, it's passing Travis!
digits/scheduler.py
As I recall, I was getting a test failure that I wasn't seeing in practice: the error was in Flask, because flask._app_ctx_stack.top was None. It was probably late and I couldn't ping you, so I did that. Let me see if it's still needed. If it still errors, I'll ping you.

======================================================================
ERROR: digits.test_scheduler.TestSchedulerFlow.test_add_remove_job
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/jmancewicz/dev/digits/DIGITS/digits/test_scheduler.py", line 44, in test_add_remove_job
    assert self.s.add_job(job), 'failed to add job'
  File "/home/jmancewicz/dev/digits/DIGITS/digits/scheduler.py", line 183, in add_job
    html = flask.render_template('job_row.html', job = job)
  File "/usr/local/lib/python2.7/dist-packages/flask/templating.py", line 126, in render_template
    ctx.app.update_template_context(context)
AttributeError: 'NoneType' object has no attribute 'app'
I wasn't sure what was missing and it looks like I just committed the stopgap measure. What's the correct way to avoid that error? @lukeyeager

Does this work?

    with app.app_context():
        # Push an application context so render_template can find the app
        html = flask.render_template('job_row.html', job=job)

Examples:
https://github.com/NVIDIA/DIGITS/blob/v2.2.1/digits/job.py#L161-L162
https://github.com/NVIDIA/DIGITS/blob/v2.2.1/digits/task.py#L100-L105
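A self-contained sketch of why the app_context suggestion helps, assuming a throwaway Flask app and an inline template (via flask.render_template_string) in place of job_row.html; render_row and render_outside_context are hypothetical helpers. Rendering works while an application context is pushed; outside one, the Flask version in the traceback above fails with AttributeError, while newer Flask raises RuntimeError.

```python
import flask

app = flask.Flask(__name__)

# Hypothetical stand-in for templates/job_row.html; an inline template
# avoids needing a templates/ folder for this sketch.
ROW_TEMPLATE = '<td>{{ name }}</td>'


def render_row(name):
    return flask.render_template_string(ROW_TEMPLATE, name=name)


def render_outside_context(name):
    """Try to render with no context pushed; return None if Flask refuses."""
    try:
        return render_row(name)
    except (RuntimeError, AttributeError):
        # Newer Flask: RuntimeError ("Working outside of application context").
        # Older Flask (as in the traceback above): AttributeError on ctx.app.
        return None


# Pushing an application context makes the app available to the renderer.
with app.app_context():
    html = render_row('demo')

print(html)                            # <td>demo</td>
print(render_outside_context('demo'))  # None
```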

Not quite; there's a new error. I also hit some new Caffe errors, so I rebuilt Caffe and pycaffe. Incidentally, make -j10 doesn't work in Caffe until the first few targets are built. Looking into the new error.
======================================================================
ERROR: Failure: AttributeError ('module' object has no attribute 'scheduler')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/jmancewicz/dev/digits/DIGITS/digits/test_scheduler.py", line 7, in <module>
    from . import scheduler as _
  File "/home/jmancewicz/dev/digits/DIGITS/digits/scheduler.py", line 23, in <module>
    from digits.webapp import app
  File "/home/jmancewicz/dev/digits/DIGITS/digits/webapp.py", line 20, in <module>
    scheduler = digits.scheduler.Scheduler(config_value('gpu_list'))
AttributeError: 'module' object has no attribute 'scheduler'

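One plausible reading of this traceback is a circular import: scheduler.py imports webapp at module level (line 23), and webapp.py then touches digits.scheduler.Scheduler (line 20) while scheduler.py is still executing, so the attribute isn't bound on the package yet. The sketch below reproduces that pattern with two hypothetical throwaway packages written to a temp directory, and shows one common fix, deferring the webapp import into the function that needs it. It is an illustration under those assumptions, not the actual DIGITS fix.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical minimal modules mirroring the chain in the traceback:
# import scheduler -> import webapp -> webapp uses scheduler.Scheduler.
BROKEN = {
    'scheduler.py': '''
        from {pkg}.webapp import app  # runs webapp before Scheduler exists

        class Scheduler(object):
            def add_job(self, job):
                return app
    ''',
    'webapp.py': '''
        import {pkg}.scheduler
        app = 'app'
        scheduler = {pkg}.scheduler.Scheduler()
    ''',
}

FIXED = {
    'scheduler.py': '''
        class Scheduler(object):
            def add_job(self, job):
                from {pkg}.webapp import app  # deferred until first call
                return app
    ''',
    'webapp.py': BROKEN['webapp.py'],
}


def write_package(root, pkg, files):
    """Write an empty __init__.py plus the given modules under root/pkg."""
    os.makedirs(os.path.join(root, pkg))
    open(os.path.join(root, pkg, '__init__.py'), 'w').close()
    for name, source in files.items():
        with open(os.path.join(root, pkg, name), 'w') as f:
            f.write(textwrap.dedent(source.format(pkg=pkg)))


def try_import(pkg):
    """Import pkg.scheduler the way the test does, then call add_job."""
    try:
        module = importlib.import_module(pkg + '.scheduler')
    except AttributeError as e:
        return 'AttributeError: %s' % e
    return module.Scheduler().add_job(None)


root = tempfile.mkdtemp()
sys.path.insert(0, root)
write_package(root, 'broken_pkg', BROKEN)
write_package(root, 'fixed_pkg', FIXED)
importlib.invalidate_caches()  # make the new files visible to the importer

broken_result = try_import('broken_pkg')
fixed_result = try_import('fixed_pkg')
print(broken_result)  # AttributeError: module 'broken_pkg' has no attribute ...
print(fixed_result)   # app
```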
You're talking about this test?
https://github.com/NVIDIA/DIGITS/blob/v2.2.1/digits/test_scheduler.py#L42-L47
That's a pretty sloppy test - my bad.
You could also give Jobs a default type or make job_type() return None instead of throwing an error. Both seem easier than subclassing Job to me.
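A minimal sketch of those two options combined, using a hypothetical, trimmed-down Job class (the real digits.job.Job has more to it): the base class carries a default type of None and job_type() returns it instead of raising, so a plain Job can be used in tests while real subclasses override the default.

```python
class Job(object):
    """Hypothetical, trimmed-down stand-in for digits.job.Job."""

    # Option 1: a class-level default type that subclasses override.
    JOB_TYPE = None

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name

    def job_type(self):
        # Option 2: return the default (None) instead of raising, so a
        # plain Job can be rendered in tests without subclassing.
        return self.JOB_TYPE


class ExampleModelJob(Job):
    """Hypothetical subclass; real DIGITS job classes define their own types."""
    JOB_TYPE = 'Example Model'


print(Job('test job').job_type())       # None
print(ExampleModelJob('m').job_type())  # Example Model
```

Any template that displays the type would then need to handle None for plain jobs.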

  File "/home/jmancewicz/dev/digits/DIGITS/digits/templates/job_row.html", line 5, in top-level template code
    <td><h4 class="list-group-item-heading"><a href="{{ url_for(show_func, job_id=job.id()) }}">{{ job.name() }} </a></h4></td>
  File "/usr/local/lib/python2.7/dist-packages/flask/helpers.py", line 287, in url_for
    raise RuntimeError('Application was not able to create a URL '
@lukeyeager, I think this is close to the error that caused me to add the line you questioned. url_for doesn't work in this test. From what I can see, it's because the server is not running or SERVER_NAME is not set. It feels like we talked about this, but I don't recall whether there was a resolution.

Yes, that's the test. I did the subclass and ran into the above error.

@lukeyeager, as far as imports go, the only import change from origin/master in webapp.py or scheduler.py is this one in scheduler.py:
    from digits.utils import subclass, override

So I seem to be past whatever the import issue may have been, but url_for is not working, which is why I had bailed out when there was no app context (if that is what that check was doing).
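A sketch of the url_for side, assuming a hypothetical /jobs/<job_id> route and a trimmed inline version of the job_row.html link. An app context alone is not enough here: outside a request, Flask needs either a request context or the SERVER_NAME config value to build a URL, which matches the RuntimeError above. In a test, app.test_request_context() supplies a fake request, so no running server is needed.

```python
import flask

app = flask.Flask(__name__)


@app.route('/jobs/<job_id>')
def show_job(job_id):
    """Hypothetical stand-in for the view that job_row.html links to."""
    return job_id


# Trimmed, hypothetical version of the job_row.html link; show_func is
# passed in as an endpoint name, as in the template line above.
ROW_TEMPLATE = '<a href="{{ url_for(show_func, job_id=job_id) }}">{{ name }}</a>'


def render_row(job_id, name):
    return flask.render_template_string(
        ROW_TEMPLATE, show_func='show_job', job_id=job_id, name=name)


# App context only: no request and no SERVER_NAME, so url_for cannot
# build a URL and raises RuntimeError.
with app.app_context():
    try:
        render_row('abc', 'demo')
        app_context_only = 'ok'
    except RuntimeError:
        app_context_only = 'RuntimeError'

# A test request context fakes a request to http://localhost/, so
# url_for can build a relative URL without a running server.
with app.test_request_context():
    html = render_row('abc', 'demo')

print(app_context_only)  # RuntimeError
print(html)              # <a href="/jobs/abc">demo</a>
```

Setting app.config['SERVER_NAME'] is the other fix the error message suggests; the request-context approach keeps test configuration out of the app.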
Force-pushed from e789fbb to e46a4b9.
Force-pushed from e46a4b9 to 2bb22ee.

Removing the 'ago' text, which shows how long ago the job started. There are potential time-computation issues between client and server that need to be resolved. This will most likely return in the future.

@jmancewicz did you mean to leave this route in? It throws an error on my machine when I try to access this URL:
UndefinedError
'running_datasets' is undefined
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/share/digits/digits/views.py", line 181, in job_management
    running_job = running_datasets + running_models,
  File "/usr/lib/python2.7/dist-packages/flask/templating.py", line 128, in render_template
    context, ctx.app)
  File "/usr/lib/python2.7/dist-packages/flask/templating.py", line 110, in _render
    rv = template.render(context)
  File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 969, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 742, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/share/digits/digits/templates/job_management.html", line 204, in top-level template code
    {% block content %}
  File "/usr/share/digits/digits/templates/job_management.html", line 206, in block "content"
    {% set running_jobs = running_datasets + running_models %}
UndefinedError: 'running_datasets' is undefined

Ah, nope. That's left over from the first version.





Initial Job Management page.
There is no link to it; it is served at:
http://localhost:5000/job_management