When creating a new job via mjob create -w ..., if the job/storage shard returns a ServiceUnavailableError (or any other error that would hit this condition) after job creation, mjob stops watching and reports the error to the client. It is easy to conclude that the job itself has failed, and the user may be inclined to try again, but the job may in fact still be running or may even have completed.
Here's some example output of what this would look like. In this case the job did in fact run to completion:
$ mfind -t o /richard/stor/path/to/log/files | mjob create -w -m "grep something"
62171e93-bc55-6348-e35e-fc812b2ee1f0
mjob: ServiceUnavailableError: manta is unable to serve this request
I believe the UUID being reported here means we've at least successfully created the job. The following line reporting the ServiceUnavailableError may be from mjob's poll against Manta (via lib/client.js' job method) for completion some time after creation.
We could probably make this clearer in a few ways:
- Report that the UUID belongs to a successfully created job. However, consumers might rely on the first line of output being just a UUID.
- If it is in fact the poll that failed, we could retry a couple of times and/or expand the error message to explain that the failure happened after job creation.
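
To make the second option concrete, here is a minimal sketch of a watch loop that retries a failed poll a few times and, if it still fails, says explicitly that the job was already created. It is not mjob's actual code: `watchJob`, `fetchJob`, `maxRetries`, `delayMs` and the `RETRYABLE` set are hypothetical names, and the assumption that a finished job reports `state === 'done'` should be checked against the real job object.

```javascript
// Hypothetical sketch; none of these names come from node-manta itself.
// Error names considered transient enough to retry (an assumption).
const RETRYABLE = new Set(['ServiceUnavailableError']);

// fetchJob is a caller-supplied async function that returns the job object
// for jobId, or throws an Error whose .name identifies the failure.
async function watchJob(fetchJob, jobId, opts = {}) {
  const maxRetries = opts.maxRetries ?? 3;
  const delayMs = opts.delayMs ?? 1000;
  let failures = 0; // consecutive failed polls

  for (;;) {
    let job;
    try {
      job = await fetchJob(jobId);
      failures = 0; // a successful poll resets the counter
    } catch (err) {
      failures += 1;
      if (!RETRYABLE.has(err.name) || failures > maxRetries) {
        // Give up, but make clear that the job itself was not the failure.
        const wrapped = new Error(
          `job ${jobId} was created, but polling its status failed ` +
          `(${err.name}: ${err.message}); ` +
          'the job may still be running or may have completed');
        wrapped.name = err.name;
        wrapped.cause = err;
        throw wrapped;
      }
    }

    // Assumption: a finished job reports state 'done'.
    if (job && job.state === 'done') {
      return job;
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

With something along these lines, a single ServiceUnavailableError during the watch would no longer end it, and a persistent failure would produce a message that tells the user not to assume the job needs resubmitting. The UUID line is left untouched, so consumers that parse the first line of output would be unaffected.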