Add protection against failure to reserve jobs#41
Conversation
fdc55bf to
e53381c
Compare
|
All right, this version works for a wrong query in the consumers. Here is an example by adding a non existing column: Though technically, in its current state, it says it will reenqueue tasks even if there are no tasks (and the error will pop every so often, but that's ok: it's pretty bad to have a query that fails even when there are no actual jobs to run), so I suspect we need two different |
1006cf5 to
0d15769
Compare
|
Latest commit also guarantees it doesn't fail when a single job fails to parse (without blocking the others). However, the update logic doesn't seem to work yet for some reason :( |
33b0e21 to
b47da37
Compare
|
Build errors are unrelated (but I suspect we're missing some lib install in the CI). This works, and is tested. Now, the main issue is having all these rollbacks / commit in the middle of the consumer loop - it's not great. Another solution would be to have a savepoint, but I think it actually makes sense to rollback in this context. @arybczak any opinion or better solutions ? |
arybczak
left a comment
There was a problem hiding this comment.
Another solution would be to have a savepoint
Nah, this one is good because it rollbacks only when there's an error, so the common path stays efficient. A savepoint would have to be always created, adding unnecessary overhead for each reservation.
| pure (batchSize > 0) | ||
|
|
||
| reserveJobs :: Int -> m ([job], Int) | ||
| reserveJobs :: Int -> m ([Maybe job], Int) |
There was a problem hiding this comment.
This no longer works as advertised, because if it returns ([Nothing], 1), then batch size lies and the when (batchSize > 0) $ do block above is unnecessarily entered.
I think it's best to return m [job], i.e. get rid of maybes before returning from this helper and calculate batchSize using length in loop.
| else | ||
| handle | ||
| ( \(SomeException e) -> do | ||
| logAttention "Failure to fetch the jobs, will reenqueue for 6 hours later" $ object ["error" .= show e, "job_ids" .= show jobIds] |
There was a problem hiding this comment.
Why 6 hours? #39 does 1 day.
Also, this bit of the exception handler until pure is the same as below, you can make a helper function out of it.
There was a problem hiding this comment.
I don't remember why I changed it. I've set it back to one day.
Unfortunately, factorizing is hard here:
- If I want to factorize the rollback, GHC demands that I add
MonadDBinstances EVERYWHERE; - The log messages are different, and use different objects;
So the only part I managed to really unite is the update logic - which is not nothing, since it gives us a single place to define the delay before the job gets retried.
| toUpdate = (,Failed . RerunAfter . ihours $ 6) <$> jobIds | ||
| rollback | ||
| lift $ updateJobs toUpdate | ||
| commit |
There was a problem hiding this comment.
No need for explicit commit, it's going to be committed right after as we exit runDBT immediately.
| , "RETURNING id, " <+> mintercalate ", " ccJobSelectors | ||
| ] | ||
| qr <- queryResult | ||
| results <- forM (F.toList qr) $ \(jobIdRow :*: other) -> |
There was a problem hiding this comment.
| results <- forM (F.toList qr) $ \(jobIdRow :*: other) -> | |
| results <- forM (F.toList qr) $ \(rawJobId :*: jobSelectors) -> |
jobIdRow is a weird name, I'm not sure how it's a row. And other is non-descriptive.
Also, you can use fetchMany for less cruft, explicit queryResult in the original code was used for lazy processing, but we're forgoing it here.
There was a problem hiding this comment.
fetchMany demands again a cascade of constraint, so I opted to keep it as is.
| qr <- queryResult | ||
| results <- forM (F.toList qr) $ \(jobIdRow :*: other) -> | ||
| let jobId = runIdentity jobIdRow | ||
| in handle |
There was a problem hiding this comment.
I'd do
(Just <$> liftBase (evaluate $ ccJobFetcher other)) `catch` \(SomeException e) -> do
...
seems more readable to me that way since you immediately see which code is being handled.
I also extracted Just outside the evaluate because the combination of evaluate, $ and $! isn't great for readability imo :P
There was a problem hiding this comment.
Much better, thank you !
| ] | ||
| qr <- queryResult | ||
| results <- forM (F.toList qr) $ \(jobIdRow :*: other) -> | ||
| let jobId = runIdentity jobIdRow |
There was a problem hiding this comment.
If you make a do block above, you won't need in for the let.
| :: forall m idx job | ||
| . ( MonadBaseControl IO m | ||
| , MonadLog m | ||
| , MonadCatch m |
There was a problem hiding this comment.
Redundant, MonadMask implies it.
ebfc9da to
77b6504
Compare
77b6504 to
c564f35
Compare
WIP