WIP: reduced precision for lanczos_opencv#635
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #635 +/- ##
==========================================
+ Coverage 88.10% 88.15% +0.05%
==========================================
Files 29 29
Lines 1908 1925 +17
==========================================
+ Hits 1681 1697 +16
- Misses 227 228 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
second commit makes it slightly faster: |
|
the output is qualitatively different for Float32 compared to Float64. :( specifically, when i don't know enough about lanczos resampling to fix this in the right way. what i do know though is that this is not a problem if opencv's lanczos implementation is followed more closely (see the last commit of this PR). specifically, so i'm putting this on the back burner for now. below is the output without the last commit. |
|
latest commit preserves the original implementation of lanczos_opencv, and adds a new version which can use Float32 internally to mirror openCV identically. |
|
@mkitti can you please review? |
| _lanczos4_opencv_faithful(float(T), float(T).(l4_2d_cs), δx) | ||
|
|
||
| # main differences with `_lanczos4_opencv` are (1) the criterion for preventing a | ||
| # division by zero is `< pi/4 * 1e-6` (instead of `iszero`) and (2) the resulting |
There was a problem hiding this comment.
Can you justify where these new constants are coming from? Is it going to work for Float16?
I wonder if we should use isapprox?
There was a problem hiding this comment.
these new constants come directly from the OpenCV implementation. i don't know about Float16. the whole point of this PR is to copy the OpenCV algorithm verbatim, including the use of Float32. so that one can reproduce results from and OpenCV workflow in julia with nearly identical results at the bit level. the existing non-faithful code here tries to improve on their approach. that's not what this new method is intended for.
|
Any update here? |
using Float32 for Lanczos is ~2x faster and uses ~1/2x as much memory as the current Float64.
this PR currently uses whatever precision was input as the precision the internal calculations are performed with. i could also imagine specifying the type used for internal computations in the type (e.g.
struct Lanczos4OpenCV{T} <: AbstractLanczos end) to separate it from the input.i'm also curious where there is a more clever way to cast
l4_2d_csat compile time so as not to incur runtime penalities.let me know what you think and i'll add some tests and docs.