Interface contract for robust error handling?

In order for a user to implement a production robust use of the library APIs, the error scenarios need to be well documented, and ideally they should all follow well defined patterns so the user has a simple way to appropriately handle all error scenarios.

For example, a use case I've tried to implement is to fetch thousands of URLs, attempting for all of them to succeed.

The starting code looks like:

```
(ns script
  (:require [babashka.http-client :as http]))
```

```
(->> ["https://clojure.org"
      "https://google.com"
      "https://amazon.com"
      "https://localhost"
      "garbage"]
     (mapv #(http/get % {:async true}))
     (mapv deref)
     (println))
```

Where the vector of URL strings would contain thousands more URLs in it.

In a real life scenario, we can't assume we have the cleanest data, first of all that means some URLs might be malformed to begin with, such as `"garbage"` in my example.

In our case, `http/get` will not behave in an `async` way even though we said `:async true`, because it's URL to URI parser is actually ran synchronously and throws synchronously on error, instead of always returning a CompletableFuture.

This means users must know that some errors are synchronous, while others are async. We might update the code as follows:

```
(->> ["https://clojure.org"
      "https://google.com"
      "https://amazon.com"
      "https://localhost"
      "garbage"]
     (mapv #(try (http/get % {:async true})
                 (catch Exception e (delay e))))
     (mapv deref)
     (println))
```

This delays the synchronous errors, so that they show up asynchronously later when we `deref` the results, allowing us to move all the error handling around the `deref`.

Next the user needs to know how to handle errors in the `deref` part. Because we want this to be a robust implementation, the user wants to perform a retry on all errors that indicate a transient issue, but would like to log and continue on all errors that indicate an issue with the input or a bug, for which retrying would be fruitless, and finally after some number of retries if the retryable ones still fail, it wants to similarly log and continue.

The first code refactor might look like:

```
(->> ["https://clojure.org"
      "https://google.com"
      "https://amazon.com"
      "https://localhost"
      "garbage"]
     (mapv #(try (http/get % {:async true})
                 (catch Exception e (delay e))))
     (mapv #(try (deref %)
                 (catch Exception e e)))
     (println))
```

Here we're turning the deref's thrown exceptions into return values, so we can process them after to check for which one succeeded or errored.

Now we refactor the code again so we can handle errors:

```
(defn error-type
  [result]
  (cond (instance? clojure.lang.ExceptionInfo result)
        :retryable-error
        (instance? Exception result)
        :non-retryable-error
        (and (map? result) (not= 200 (:status result)))
        :retryable-error
        :else
        :success))

(defn batch-fetch
  [urls]
  (->> urls
       (mapv #(try (http/get % {:async true})
                   (catch Exception e (delay e))))
       (mapv #(try (deref %)
                   (catch Exception e e)))
       (group-by error-type)
       (doall)))

(batch-fetch ["https://clojure.org"
              "https://google.com"
              "https://amazon.com"
              "https://localhost"
              "garbage"])
```

This is the crux of the issue, look at the `error-type` function. Here we want a reliable way to identify everything that's not a success, and for each thing that's not a success, we want to identify if it's retryable or not retryable. But because babashka.http-client doesn't provide well defined errors, it is very difficult to know how to identify the type of errors that could happen exhaustively.

Currently, it appears sometimes the error will be an `ex-info` but not always. If it is an `ex-info`, it's not clear how to tell what type of error it is, or if there will always be a `:status` key on the `ex-data` for it.

The user would need documented descriptions of where errors are thrown/returned, the set of all possible type of errors at each point, and their structure.

At a minimum, a consistent structure for all errors would help a lot, for example making all errors `ex-info` with the type of error documented on a `:type` key on the `ex-data`, along with a documented list of the set of all possible types of errors, and that all errors will always have an ex-data with the details.

That way inside `error-type` we could simply inspect the ex-data `:type` key and if it's one of a retryable type, we can classify it as retryable, otherwise non-retryable.

This would also allow us, for non-retryable ones, to know what to log in the error, since we can always expect an `ex-data` with some common structure, and a `ex-message`, we can log something like:

```
(log (str (:type (ex-data err)) ", " (ex-message err) ", " (ex-data err)))
```

Finally, it would be nice if all errors contained the URL as well which it failed on, because for the retryable ones for example, the user would want to retry, but if the error no longer contains the information to map back to which of the async `get` this was for, knowing which one to retry is non-trivial. Thus either having the URL as part of the error so we can retry using the error data itself, or having a feature to provide some ID to the `async get` call which is included on the error so we can map it back would be ideal.

Regards.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Interface contract for robust error handling? #11

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Interface contract for robust error handling? #11

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions