Optimize PutFS in text.z80 by NonstickAtom785 · Pull Request #53 · Zeda/Grammer2

NonstickAtom785 · 2020-11-12T17:37:17Z

No description provided.


NonstickAtom785 · 2020-11-20T19:37:13Z

Optimized the Text Routine and added it to the pull request. You can test it if you'd like :)

NonstickAtom785 · 2020-11-23T13:26:25Z

PutLeft:
  ld a,(de)
  call nc,Put
PutRight:
  ld a,(de)
  call ShiftPut
  ld a,(de)
  inc de
  call c,Put
  djnz PutLeft
  ret

Five bytes smaller. I think...

NonstickAtom785 · 2020-11-24T16:52:08Z

I don't know how fast this one is but it is a little more optimized.

  ld bc,$030F
  ld a,(de)
  jr nz,PutRight    ;Note my nz is there because I am doing and 7 in my GetPixel routine.
  ld c,$F0
PutLeft:
  call Put
  inc de
  call ShiftPut
  djnz PutLeft
  ret
PutRight:
  call ShiftPut
  inc de
  call Put
  djnz PutRight
  ret
ShiftPut:
  rlca \ rlca \ rlca \ rlca
Put:
  bit textInverse,(iy+textFlags)
  jr z,+_
  cpl
_:
  and c
  or (hl)
  ld (hl),a
  push bc
  ld bc,12
  add hl,bc
  pop bc
  ld a,(de)
  ret

I would like to remove the push bc and the pop bc, but I don't know any faster alternatives. Also, everything after ld (hl),a adds extra clock cycles when the routine reaches the last line. Is there a way to improve that?

I added in some improvements to this version. If I calculated right the Clock Cycles are 839 for PutRight and 834 for PutLeft

I have used ixl half register as a temp storage for the mask. It reduced 1 byte and removed more clock cycles.

Zeda · 2020-12-05T23:39:12Z

>   push bc
>   ld bc,12
>   add hl,bc
>   pop bc
>   ld a,(de)
>   ret
>
> I would like to remove the push bc and the pop bc but I don't know any faster alternatives.

For code like that, I do something like this:

  ld a,12
  add a,l
  ld l,a
  ld a,(de)
  ret nc
  inc h
  ret

That goes from 8 bytes, 59cc to 8 bytes, 33cc or 41cc. The average is about 33.375cc, since the 41cc path is only taken when adding 12 to L carries, which happens 12/256 of the time (~4.7%) on average.
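For anyone wondering how those numbers are counted, here is a minimal sketch that just adds up the standard Z80 T-state timings for each instruction. The list names are made up for illustration, and the average assumes every value of L is equally likely, which is only an approximation of real use.

```python
# T-state (clock cycle) costs from the standard Z80 instruction timings.
ORIGINAL = [("push bc", 11), ("ld bc,12", 10), ("add hl,bc", 11),
            ("pop bc", 10), ("ld a,(de)", 7), ("ret", 10)]

# The 8-bit replacement shares its first four instructions on both paths.
COMMON = [("ld a,12", 7), ("add a,l", 4), ("ld l,a", 4), ("ld a,(de)", 7)]
NO_CARRY_TAIL = [("ret nc (taken)", 11)]
CARRY_TAIL = [("ret nc (not taken)", 5), ("inc h", 4), ("ret", 10)]


def total(seq):
    """Sum the cycle cost of a list of (mnemonic, cycles) pairs."""
    return sum(cycles for _, cycles in seq)


original_cc = total(ORIGINAL)
fast_cc = total(COMMON + NO_CARRY_TAIL)
slow_cc = total(COMMON + CARRY_TAIL)

# L + 12 overflows 8 bits only for the 12 values L = 244..255.
carry_cases = sum(1 for l in range(256) if l + 12 > 255)

# Assumption: L is uniformly distributed over 0..255.
average_cc = (fast_cc * (256 - carry_cases) + slow_cc * carry_cases) / 256

print(original_cc, fast_cc, slow_cc, carry_cases, average_cc)
```

The same bookkeeping works for any straight-line stretch of code; only conditional instructions need the taken and not-taken cases listed separately.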

Your newest text code didn't quite work for me when I compiled it, but here is what I came up with for the PutFS routine:

PutFS:
; read the font from flash to RAM
; need to add 3*A to the fontpointer
  ld hl,(FontPointer)
  ld b,0
  ld c,a
  add hl,bc
  add hl,bc
  adc hl,bc     ;add hl,bc won't set the right flags, so use adc
  ld a,(font_ptr_page)
  jp p,+_
  or a
  jr z,+_
  set 6,h
  res 7,h
  inc a
_:
  ld c,3
  ld de,$8005
  call readarc

; get the text position and update it
  ld hl,(textRow)
;  ld b,0 ;B is already 0 from the ReadArc routine
  ld a,h
  cp 24
  ld a,l
  jr c,+_
  ld h,b
  add a,6
_:
  cp 3Bh
  jr c,+_
  sub 3Ch
  jr nc,+_
  add a,6
_:
  ld l,a
; need to advance the x-coord by 1
  inc h
  ld (textRow),hl
  dec h
  ;want A*12+H/2+(gbuf_temp), and we know A < 64
  add a,a
  add a,a
  ;now A*3+(gbuf_temp)+H/2
  ld c,a
  ld a,h
  ld hl,(gbuf_temp)
  add hl,bc
  add hl,bc
  add hl,bc
  ld c,a
  srl c
  add hl,bc
  rra
  ld e,4    ; now DE points to the byte before the char data
  jr nc,put_left
put_right:
  ld c,$0F
  call put_right2
  call put_right2
put_right2:
; read in the byte
  inc de
  ld a,(de)

; check if it needs to be inverted
  bit InvertTextFlag,(iy+UserFlags)
  jr z,$+3
  cpl
  ld b,a      ; back up the byte
  call shift_put_lr
  ld a,b      ;restore the byte
  jr put_lr

put_left:
  ld c,$F0
  call put_left2
  call put_left2
put_left2:
; read in the byte
  inc de
  ld a,(de)

; check if it needs to be inverted
  bit InvertTextFlag,(iy+UserFlags)
  jr z,$+3
  cpl

  ld b,a      ; back up the byte
  call put_lr
  ld a,b      ;restore the byte

shift_put_lr:
; rotate the nibbles
  rrca
  rrca
  rrca
  rrca
put_lr:
; mask the byte
  and c

; OR it to the screen
  or (hl)
  ld (hl),a

; advance the gbuf ptr
  ld a,l
  add a,12
  ld l,a
  ret nc
  inc h
  ret

I reorganized some of the beginning code in PutFS. With the somewhat recent text updates, Grammer (finally) supported archived fonts, but I basically patched the PutFS code instead of reorganizing it to be more optimal. So now it reads the char data from flash to a fixed location so it doesn't have to keep track of the char # or pointer. It then updates the text coordinates and directly proceeds to convert those to an offset into the graphics buffer. I tweaked that calculation to save a few more bytes and clock cycles by taking advantage of the Y-coordinate being less than 64. Then we get into the actual drawing of the char, where I use your idea of calling a common put/shiftput subroutine, but instead of using B as a counter and looping 3 times, I just call the body of the routine twice and fall through for the third iteration. As well, I move the logic to invert the text to the body instead of the put routine, saving about 81cc (at the cost of 7 bytes, since I duplicate that code in the left and right variants).
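The offset calculation can be sketched as a small model. This is only an illustration of the register arithmetic in the listing above, not the routine itself; the function name and the x/y parameter names are made up here (x is the text column held in H, y is the pixel row held in L).

```python
def gbuf_offset_z80(x, y):
    """Mimic the 8-bit steps that turn text coords into a buffer offset.

    Returns (offset, right_half), where right_half says whether the
    4-pixel-wide char lands in the low nibble of the buffer byte.
    """
    a = (y + y) & 0xFF         # add a,a
    a = (a + a) & 0xFF         # add a,a -> 4*y, still 8 bits because y < 64
    c = a                      # ld c,a (B is already 0)
    offset = 3 * c             # three add hl,bc -> 12*y
    offset += x >> 1           # srl c / add hl,bc -> + x/2
    right_half = bool(x & 1)   # rra moves bit 0 of x into carry
    return offset, right_half


# Agrees with 12*y + x//2 for every on-screen position.
for y in range(64):
    for x in range(24):
        assert gbuf_offset_z80(x, y) == (12 * y + x // 2, x % 2 == 1)

# With y = 64 the 8-bit multiply by 4 wraps to 0, so the shortcut
# really does depend on y < 64.
print(gbuf_offset_z80(0, 63), gbuf_offset_z80(0, 64))
```

Multiplying by 4 in A and then adding BC three times is what replaces a 16-bit multiply by 12; it is only safe because 4*63 still fits in a byte.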

Overall, the code that actually draws the char is about 141.25cc faster than your latest routine (and ~263.25cc faster than the original), and I didn't calculate the clock cycles saved from my changes to the load/coord/calculate stages. Your version is 6 bytes smaller than mine and a full 19 bytes smaller than the original, but currently I like the above version more.

(Side note: While I was typing all of this up, I saw your trick with using (de) to restore the byte, and by using that in my code, saved 3 bytes and 6cc, nice! Since that also frees up a variable, I'm hoping to find even more optimizations, so I'll edit this comment.)

EDIT: That ld a,(de) trick won't work with my code because my code doesn't re-apply the invert logic, so every other row of pixels would be un-inverted in invert mode. So I lost the three-byte savings, but I was able to save 18cc more.
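A rough model of that invert problem, assuming the left-half case (mask $F0) and ignoring whatever is already on the screen. The font bytes below are hypothetical sample data, and the function and parameter names are made up for illustration.

```python
def swap_nibbles(b):
    """Four rrca (or rlca) in a row swap the two nibbles of a byte."""
    return ((b << 4) | (b >> 4)) & 0xFF


def draw_rows(font_bytes, invert, reload_from_source):
    """Return the 4-bit pixel rows produced for one char.

    reload_from_source=False models 'ld a,b' (restore the inverted copy).
    reload_from_source=True models 'ld a,(de)' (re-read the raw byte).
    """
    rows = []
    for raw in font_bytes:
        a = raw ^ 0xFF if invert else raw           # cpl, once per byte
        rows.append((a & 0xF0) >> 4)                # put_lr: upper nibble
        a = raw if reload_from_source else a        # restore step
        rows.append((swap_nibbles(a) & 0xF0) >> 4)  # shift_put_lr: lower nibble
    return rows


FONT = [0x4A, 0xEA, 0xA0]  # hypothetical 3-byte, 6-row glyph

plain = draw_rows(FONT, invert=False, reload_from_source=False)
good = draw_rows(FONT, invert=True, reload_from_source=False)
bad = draw_rows(FONT, invert=True, reload_from_source=True)
print(plain, good, bad)
```

In the `bad` case only the first row of each byte comes out inverted; the second row is drawn from the raw byte again. A routine that re-applies the invert check on every put does not have this problem, which is why the trick is fine in a version that keeps the check inside the put routine.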

NonstickAtom785 · 2020-12-07T14:17:13Z

I think these modifications save bytes and clock cycles. I can't test it at the moment, but I'm almost positive they do, unless there is something I'm missing. I moved some of the operations into the routine instead of the overhead loop, as I think they should do the same thing with fewer bytes.

PutFS:
; Read the font from Flash to RAM
; Need to add 3*A to the font-pointer
  ld hl,(FontPointer)
  ld b,0
  ld c,a
  add hl,bc
  add hl,bc
  adc hl,bc     ;add hl,bc won't set the right flags, so use adc
  ld a,(font_ptr_page)
  jp p,+_
  or a
  jr z,+_
  set 6,h
  res 7,h
  inc a
_:
  ld c,3
  ld de,$8005
  call readarc

; get the text position and update it
  ld hl,(textRow)
;  ld b,0 ;B is already 0 from the ReadArc routine
  ld a,h
  cp 24
  ld a,l
  jr c,+_
  ld h,b
  add a,6
_:
  cp 3Bh
  jr c,+_
  sub 3Ch
  jr nc,+_
  add a,6
_:
  ld l,a
; need to advance the x-coord by 1
  inc h
  ld (textRow),hl
  dec h
; Want A*12+H/2+(gbuf_temp), and we know A < 64
  add a,a
  add a,a
; Now A*3+(gbuf_temp)+H/2
  ld c,a
  ld a,h
  ld hl,(gbuf_temp)
  add hl,bc
  add hl,bc
  add hl,bc
  ld c,a
  srl c
  add hl,bc
  rra
  ld e,5    ; Well now DE should point to the character.
  ld a,(de)
  jr nc,+_
  ld c,$0F
put_right:
  call put_right2
  call put_right2
put_right2:
  call shift_put_lr
  inc de
  jr put_lr
_:
  ld c,$F0
put_left:
  call put_left2
  call put_left2
put_left2:
  call put_lr
  inc de

shift_put_lr:
; Rotate the nibbles
  rrca
  rrca
  rrca
  rrca
put_lr:
; Check if it needs to be inverted
  bit InvertTextFlag,(iy+UserFlags)
  jr z,$+3
  cpl
; Mask the byte
  and c

; OR it to the screen
  or (hl)
  ld (hl),a

; advance the gbuf ptr
  ld a,l
  add a,12
  ld l,a
  ld a,(de)   ; Restore the byte
  ret nc
  inc h
  ret

Edit: I tested and it didn't work. So time to try again until I get it. I know the C optimization works because C isn't being used by anything else, so it should stay the same no matter what.

Edit2: I removed the extra bytes and added my trick into the routine. I also fixed a superb amount of extra stuff that wasn't needed in the loop area and was duplicated in both variants. I took those parts and put them into the put_lr: routine, so both variants share them. It works quite well. I still would like to know how you are counting your clock cycles. I updated the code above to my current version.

Edit3: You said something about your invert logic not reapplying. Do you mean that you save A and then use the inverted A again? If that's the case, the size-optimized method might be a tad slower, because it re-applies the invert logic on every put, which could slow it down.
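As a rough check on that question, here is a sketch that counts only the cost of the invert check itself (bit / jr z / cpl) using standard Z80 timings, for a check done on every put (6 per char) versus once per font byte (3 per char). It deliberately leaves out the instructions each version needs to back up or reload the byte, so it is not meant to reproduce the totals quoted earlier in the thread. The function names are made up for illustration.

```python
# Standard Z80 T-state timings for the three instructions in the check.
BIT_IY = 20        # bit n,(iy+d)
JR_TAKEN = 12      # jr z,$+3 when the flag bit is clear (normal text)
JR_NOT_TAKEN = 7   # jr z,$+3 when the flag bit is set (inverted text)
CPL = 4            # cpl, executed only for inverted text


def invert_check_cc(inverted):
    """Cycles spent on one pass through the invert check."""
    if inverted:
        return BIT_IY + JR_NOT_TAKEN + CPL
    return BIT_IY + JR_TAKEN


def per_char(checks, inverted):
    """Cycles spent on the check for one char, given how often it runs."""
    return checks * invert_check_cc(inverted)


for inverted in (False, True):
    in_put = per_char(6, inverted)    # check inside put_lr, once per nibble
    in_body = per_char(3, inverted)   # check in the body, once per byte
    print(inverted, in_put, in_body, in_put - in_body)
```

So re-applying the check on every put costs on the order of 90-odd cycles per char before accounting for anything else, in exchange for the smaller code.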

NonstickAtom785 · 2020-12-07T17:18:26Z

Here are the proofs:

I just prettied it up a bit.

NonstickAtom785 · 2020-12-07T18:34:10Z

Your optimization is almost too superior but that's okay. It is fast! That's my final push. I think.

NonstickAtom785 added 13 commits February 18, 2020 08:56

912c401  Merge pull request #7 from Zeda/master
         merge
8bd5d0a  Update tokenhook.z80
ced8e17  Update tokenhook.z80
7bfe686  Merge pull request #8 from Zeda/master
         Merge
f8809f8  Added another column for OS Token equivalents.
         This is what I want to do to the rest of the document.
cb5a983  Delete Readme.md
df689f5  Create readme.md
1c259b0  Delete readme.md
8f49db4  Update readme.md
bbe4756  Create readme.md
d241e50  Merge pull request #9 from Zeda/master
         Update branch
3d0fea5  Removed unnecessary push/pop af in PutFS
9dd132a  Optimized Fixed Text routine

NonstickAtom785 added 2 commits November 30, 2020 08:53

7f7ea9c  More Optimization to my Version
         I added in some improvements to this version. If I calculated right the Clock Cycles are 839 for PutRight and 834 for PutLeft
b117fd0  An even bigger Optimization(Read description)
         I have used ixl half register as a temp storage for the mask. It reduced 1 byte and removed more clock cycles.

79f4b3d  Optimized your optimization :D

NonstickAtom785 changed the title from "Remove push/pop af in text.z80" to "Optimize PutFS in text.z80" on Dec 7, 2020

1ddfebc  igivein
         I just prettied it up a bit.

NonstickAtom785 wants to merge 17 commits into Zeda:master from NonstickAtom785:master


2 participants
