Optimize PutFS in text.z80 #53
This is what I want to do to the rest of the document.
Optimized the text routine and added it to the pull request. You can test it if you'd like :)
Five bytes smaller, I think...
I don't know how fast this one is, but it is a little more optimized. I would like to remove the
I added some improvements to this version. If I calculated correctly, it takes 839 clock cycles for PutRight and 834 for PutLeft.
I used the ixl half register as temporary storage for the mask. That saved 1 byte and a few more clock cycles.
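To illustrate what a mask like that is for, here is a rough Python model of a masked, shifted write of one glyph row into two adjacent buffer bytes. This is a hypothetical sketch, not the text.z80 code: the name `put_row`, the `width` parameter, and the bit layout (glyph left-aligned in the byte, MSB = leftmost pixel) are all assumptions.

```python
def put_row(buf: bytearray, offset: int, row_bits: int, width: int, shift: int) -> None:
    """Hypothetical model: draw one glyph row at bit position `shift` (0..7)
    inside buf[offset], spilling into buf[offset + 1] when it crosses a byte."""
    glyph_mask = (0xFF << (8 - width)) & 0xFF   # e.g. width 4 -> 0b11110000
    data = row_bits & glyph_mask                 # ignore bits outside the glyph
    mask16 = (glyph_mask << 8) >> shift          # mask spread across two bytes
    data16 = (data << 8) >> shift                # glyph bits spread the same way
    for i, bit in enumerate((8, 0)):             # high byte first, then low byte
        m = (mask16 >> bit) & 0xFF
        d = (data16 >> bit) & 0xFF
        if m:                                    # skip a byte the glyph doesn't touch
            # clear the glyph's pixels, then OR in the new ones
            buf[offset + i] = (buf[offset + i] & ~m & 0xFF) | d
```

For example, writing the 4-pixel row `1010` at shift 6 over a fully set pair of bytes gives `0xFE, 0xBF`: two glyph pixels land in the low bits of the first byte and two in the high bits of the second, and the mask keeps every other pixel intact.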
For code like that, I do something like this:

That goes from 8 bytes, 59cc to 8 bytes, 33cc|41cc (the average is 33.375cc, though, since the 41cc case only happens 12/256 times, ~4.7%, on average).

Your newest text code didn't quite work for me when I compiled it, but here is what I came up with for the PutFS routine:

I reorganized some of the beginning code in PutFS. With the somewhat recent text updates, Grammer (finally) supported archived fonts, but I basically patched the PutFS code instead of reorganizing it to be more optimal. Now it reads the char data from flash to a fixed location, so it doesn't have to keep track of the char # or pointer. It then updates the text coordinates and directly converts those to an offset into the graphics buffer. I tweaked that calculation to save a few more bytes and clock cycles by taking advantage of the Y-coordinate being less than 64.

Then we get into the actual drawing of the char, where I use your idea of calling a common put/shiftput subroutine, but instead of using B as a counter and looping 3 times, I just

Overall, the code that actually draws the char is about 141.25cc faster than your latest routine (and ~263.25cc faster than the original), and I didn't calculate the clock cycles saved by my changes to the load/coord/calculate stages. Your version is 6 bytes smaller than mine and a full 19 bytes smaller than the original, but currently I like the above version more.

(Side note: While I was typing all of this up, I saw your trick of using (de) to restore the byte, and by using it in my code I saved 3 bytes and 6cc, nice! Since that also frees up a variable, I'm hoping to find even more optimizations, so I'll edit this comment.)

EDIT: That
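Two of the numbers above can be sanity-checked with a small Python model. The first function reproduces the weighted average (33cc on 244 of 256 inputs, 41cc on the other 12). The second shows one possible way, not necessarily the one used in PutFS, that y < 64 helps the buffer-offset math: it assumes the usual TI-83+/84+ 96×64 buffer with 12 bytes per row, and the function names are hypothetical.

```python
def average_cycles(fast: int, slow: int, slow_cases: int, total_cases: int = 256) -> float:
    """Expected clock cycles when `slow` happens in slow_cases of total_cases inputs."""
    return (fast * (total_cases - slow_cases) + slow * slow_cases) / total_cases

BYTES_PER_ROW = 12  # assumed: 96 pixels wide, 8 pixels per byte

def buffer_offset(x: int, y: int) -> int:
    """Offset of the byte holding pixel (x, y), computed the way a Z80
    routine could when y < 64 (hypothetical sketch)."""
    assert 0 <= x < 96 and 0 <= y < 64
    a = (y + y + y) & 0xFF        # 3*y <= 189, so an 8-bit add never carries
    assert a == 3 * y             # holds only because y < 64
    hl = a                        # zero-extend into a 16-bit pair (h = 0)
    hl = (hl + hl) & 0xFFFF       # 6*y
    hl = (hl + hl) & 0xFFFF       # 12*y
    return hl + (x >> 3)          # add the column byte
```

Only the last two doublings need a 16-bit register pair; the multiply by 3 fits in A because 3 × 63 = 189 < 256.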
I think these modifications save bytes and clock cycles. I can't test it atm, but I'm almost positive, unless there's something I'm missing. I added some of the operations to the routine instead of the overhead loop, since I think they do the same thing with fewer bytes.

Edit: I tested it and it didn't work, so time to keep trying until I get it. I know the c optimization works, because c isn't used by anything else, so it should stay the same no matter what.

Edit2: I removed the extra bytes and added my trick into the routine. I also cleaned up a large amount of unneeded code in the loop area that was used twice. I took those and put them into the

Edit3
Your optimization is almost too superior, but that's okay. It's fast! That's my final push. I think.