For my portability and optimization class, I am taking a look at 6502 assembly code in preparation for modern assembly code. I am using a 6502 emulator located here that includes a bitmap display, a text output, and a memory monitor, so we can see both the visual output of the assembly code and the memory it changes.
In this blog I will be running some 6502 assembly code, calculating the time it takes to run, and performing some extra experiments.
The instruction set for this CPU is very minimal, and the manual used for this lab and in the course, which lists all instructions for the 6502 processor, can be found here.
Calculating Performance
The following code fills the bitmap with a solid colour:
lda #$00 ; set a pointer in memory location $40 to point to $0200
sta $40 ; ... low byte ($00) goes in address $40
lda #$02
sta $41 ; ... high byte ($02) goes into address $41
lda #$07 ; colour number
ldy #$00 ; set index to 0
loop: sta ($40),y ; set pixel colour at the address (pointer)+Y
iny ; increment index
bne loop ; continue until done the page (256 pixels)
inc $41 ; increment the page
ldx $41 ; get the current page number
cpx #$06 ; compare with 6
bne loop ; continue until done all pages
Timing Results
Below are the timing results for each instruction used above, including how many cycles each one takes to complete, which is then converted into time.
A bit of clarification: the bitmap display is 32x32 pixels and each pixel takes one byte of the 6502's memory. Because we loop over and change each pixel, the pixel operations run 32x32 = 1024 times, split into 4 pages of 256 pixels.
Overall, this means that the program requires 10326 cycles to complete, which at a 1 MHz CPU speed would take 0.010326 seconds.
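The conversion from cycles to time can be sketched in a few lines of Python (not 6502 assembly). This is only an illustration: the helper name `cycles_to_seconds` is made up for this example, and the cycle total is the figure quoted above rather than something the script measures.

```python
# Convert a 6502 cycle count into wall-clock time.
# CLOCK_HZ is the 1 MHz clock speed assumed in this post.
CLOCK_HZ = 1_000_000


def cycles_to_seconds(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Each cycle lasts 1 / clock_hz seconds."""
    return cycles / clock_hz


total_cycles = 10326  # total quoted in this post, not measured here
print(f"{cycles_to_seconds(total_cycles):.6f} s")
print(f"{cycles_to_seconds(total_cycles) * 1000:.3f} ms")
```

At 1 MHz one cycle is one microsecond, so the cycle count can be read directly as microseconds.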
Memory
During the execution of the code, memory is used for each instruction and for variables. The total memory used for this program was the following:
Just to note: the answer of 8KB is not the maximum used at any one point in the program; it is the total number of bytes used over the entire program. The byte values for the instructions were found in the manual here.
Optimized code
To optimize this code for better time, I moved the loading of the X register to the beginning of the program instead of doing it in the page loop. I then compare X to the pointer's current high byte (since that is what gets incremented), which tells me whether it is still within the bitmap's pages.
lda #$00 ; a pointer in memory location $40 to point to $0200
sta $40 ; low byte ($00) goes in address $40
lda #$02
sta $41 ; high byte ($02) goes into address $41
LDX #$06 ; Loading number of pages
LDA #$06 ; Colour code $06 (blue in the emulator's palette; yellow is $07)
LDY #$00 ; set Y register index to 0
loop:
STA ($40),y ; Set pixel colour at the address (pointer)+Y
INY ; Increment Y register
BNE loop ; Continue until done the page (256 pixels)
INC $41 ; increment the page
CPX $41 ; Comparing the high byte to the number of pages
BNE loop ; continue until done all pages
Here are the new timings below
As can be seen, my new "optimized" implementation doesn't really shave off much time: only 0.000006 seconds.
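To see where those 6 microseconds come from, here is a rough Python sketch that compares only the instructions that differ between the two versions. The cycle counts are the standard 6502 timings for each addressing mode; the dictionary and the helper name `page_tail_cost` are made up for this example.

```python
# Standard 6502 cycle counts for the instructions that differ between versions.
CYCLES = {
    "INC zp": 5,   # inc $41
    "LDX zp": 3,   # ldx $41  (original, once per page)
    "LDX imm": 2,  # ldx #$06 (optimized, once at start)
    "CPX imm": 2,  # cpx #$06 (original)
    "CPX zp": 3,   # cpx $41  (optimized)
}
PAGES = 4             # the bitmap spans pages $02 to $05
CLOCK_HZ = 1_000_000  # 1 MHz, as assumed in this post


def page_tail_cost(instructions):
    """Cycles spent in the end-of-page code, excluding the shared BNE."""
    return sum(CYCLES[name] for name in instructions)


original = PAGES * page_tail_cost(["INC zp", "LDX zp", "CPX imm"])
optimized = CYCLES["LDX imm"] + PAGES * page_tail_cost(["INC zp", "CPX zp"])
saved = original - optimized

print(saved, "cycles saved")
print(f"{saved / CLOCK_HZ:.6f} s saved")
```

The end-of-page code only runs 4 times, while the per-pixel loop runs 1024 times, so a change outside the inner loop can only ever save a handful of cycles.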
I am not sure exactly how to cut more time from this program. I knew that it most likely had something to do with how the loops were constantly repeating instructions and that it could be optimized there somehow.
Modifying the Code
Colouring Each Page
Here is code that adds 1 to the accumulator, which holds the colour code for each pixel, after every page so that the colour changes from page to page.
lda #$00 ; set a pointer in memory location $40 to point to $0200
sta $40 ; low byte ($00) goes in address $40
lda #$02
sta $41 ; high byte ($02) goes into address $41
LDX #$06 ; Loading number of pages
LDA #$06 ; Starting colour code $06 (blue in the emulator's palette; yellow is $07)
LDY #$00 ; set Y register index to 0
loop:
STA ($40),y ; Set pixel colour at the address (pointer)+Y
INY ; Increment Y register
BNE loop ; Continue until done the page (256 pixels)
CLC ; Clear carry so ADC adds exactly 1 (the previous CPX sets it)
ADC #$01 ; Adds 1 to the accumulator to update the colour
; for each page
INC $41 ; increment the page
CPX $41 ; Comparing the high byte to the number of pages in x
BNE loop ; continue until done all pages
Here is the result:
Experiments
TYA
Adding the TYA instruction at the beginning of the loop:
Adding TYA causes the bitmap to display vertical lines of different colours. TYA copies the Y index into the accumulator, so each pixel along a row gets the next colour in the palette, and since there are 16 colours and 32 pixels in each row, every row shows each colour twice.
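The pattern can be modelled with a short Python sketch. It assumes that the emulator picks the colour from the low 4 bits of the stored byte (which is what the 16-colour palette suggests); the function name `tya_colour` is made up for this example.

```python
WIDTH = 32         # pixels per row on the bitmap display
PALETTE_SIZE = 16  # assumption: only the low 4 bits of a byte select the colour


def tya_colour(y: int) -> int:
    """Colour shown when the Y index itself is stored as the pixel value."""
    return y & (PALETTE_SIZE - 1)


first_row = [tya_colour(y) for y in range(WIDTH)]
print(first_row)

# Every row starts at a multiple of 32, which is also a multiple of 16,
# so a given column always gets the same colour: vertical lines.
same_in_every_row = all(
    tya_colour(y) == tya_colour(y % WIDTH) for y in range(256)
)
print(same_in_every_row)
```

The first row counts 0 to 15 twice, and the column check prints `True`, which matches the two sets of 16 vertical lines.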
LSR
Adding LSR after the TYA instruction:
Adding LSR after TYA makes each coloured vertical line twice as thick, meaning that only 1 set of the 16 colours shows in each row.
Adding more LSR instructions keeps making the lines thicker: with 2 LSR instructions there are 8 lines in each row, but since there are 16 colours available, the two halves of the palette alternate on every other row.
Eventually, once 5 LSR instructions are used, each colour fills an entire row, so the display shows horizontal lines one pixel high. Additionally, only the first 8 colours repeat instead of all 16.
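A small Python model of TYA followed by a number of LSR instructions shows how the band width and the number of colours change. As before, it assumes the low 4 bits of the byte select the colour, and the names `lsr_colour` and `run_length` are made up for this example.

```python
PAGE = 256  # pixels (bytes) per page


def lsr_colour(y: int, shifts: int) -> int:
    """Colour after TYA followed by `shifts` LSR instructions."""
    return (y >> shifts) & 0x0F  # assumption: low 4 bits select the colour


def run_length(shifts: int) -> int:
    """How many consecutive pixels share the first colour of the page."""
    first = lsr_colour(0, shifts)
    n = 0
    while n < PAGE and lsr_colour(n, shifts) == first:
        n += 1
    return n


for shifts in range(1, 6):
    colours = {lsr_colour(y, shifts) for y in range(PAGE)}
    print(shifts, "LSR:", run_length(shifts), "pixels wide,",
          len(colours), "colours per page")
```

Each LSR doubles the width of a band. At 5 shifts a band is 32 pixels wide, which is a full row, and Y >> 5 only reaches 0 to 7 within a page, so just 8 colours appear.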
ASL
Restarting the process with just TYA, I added ASL instead, which caused the following.
The display once again showed vertical lines of different colours; however, compared to using LSR, some of the colours were skipped altogether. Referring to the 6502 emulator colour code chart on this page, it seems that the ASL operation skips every other colour, meaning that only 8 of the 16 colours are repeated across the 32 pixels of each row.
As before, adding more ASL operations makes the program skip more and more colours, with each added ASL effectively halving the number of colours available (starting from 16), as seen below.
This culminates at 4 ASL instructions, where only the first colour, black, is shown: 4 shifts multiply the value by 2^4 = 16, leaving 16/16 = 1 colour, so the first colour is repeated for every pixel.
According to the manual located here, ASL is an "Arithmetic shift left": it shifts the byte one bit to the left, which multiplies it by 2 and moves the top bit into the carry flag, explaining what happens here with each additional instruction.
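The halving can be checked with a quick Python sketch of TYA followed by a number of ASL instructions. It assumes the accumulator is 8 bits wide (so bits shifted past bit 7 are dropped) and that the low 4 bits select the colour; `asl_colour` is a made-up name for this example.

```python
def asl_colour(y: int, shifts: int) -> int:
    """Colour after TYA followed by `shifts` ASL instructions."""
    value = (y << shifts) & 0xFF  # keep 8 bits, like the accumulator
    return value & 0x0F           # assumption: low 4 bits select the colour


for shifts in range(0, 5):
    colours = sorted({asl_colour(y, shifts) for y in range(256)})
    print(shifts, "ASL:", len(colours), "colours", colours)
```

Each left shift forces another low bit to 0, so the count goes 16, 8, 4, 2, 1, and with 4 shifts the only value left is 0 (black).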
INY
Another experiment is to add more INY (increment Y register) instructions to each loop iteration. I would expect this to advance the counter in steps of 5, meaning that it skips colouring pixels.
However, what happens is that the screen still fills with colour. When running at the lowest speed, you can see the pixels being filled in 5 apart, repeating 256 times per page, because the 8-bit Y register overflows (255 + 5 = 260, which wraps around to 4) and so on. Eventually every pixel in each page is filled as usual, and then the entire display.
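The wraparound can be simulated in Python. This sketch models one page of the loop with `step` INY instructions in a row, where the BNE only sees the result of the last INY; the function name `visited_offsets` is made up for this example.

```python
def visited_offsets(step: int) -> list:
    """Offsets written within one page when Y advances by `step` per loop."""
    y = 0
    seen = []
    while True:
        seen.append(y)           # sta ($40),y
        y = (y + step) & 0xFF    # `step` INY instructions, 8-bit wraparound
        if y == 0:               # bne loop falls through when Y is back at 0
            break
    return seen


five = visited_offsets(5)
print(five[:4], "...", five[50:54])
print(len(five), "writes,", len(set(five)), "distinct pixels")

two = visited_offsets(2)
print(len(two), "writes,", len(set(two)), "distinct pixels")
```

With a step of 5, Y only returns to 0 after 256 writes, having visited every offset once, because 5 and 256 share no common factor. An even step such as 2 would hit 0 after 128 writes and leave the odd pixels untouched.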
Random Colours
lda #$00 ; set a pointer in memory location $40 to point to $0200
sta $40 ; ... low byte ($00) goes in address $40
lda #$02
sta $41 ; ... high byte ($02) goes into address $41
lda $FE ; Sets random colour (number) to accumulator
ldx #$06 ; page number at which to stop ($06)
ldy #$00 ; set index to 0
loop: LDA $FE ; Adds random colour for next pixel
STA ($40), y
iny
BNE loop
inc $41 ; increment the page
cpx $41 ; compare with 6
bne loop ; continue until done all pages
This code loads a byte from memory address $FE (254), which the emulator defines as a pseudo-random number generator among its peripherals. A new random number is loaded into the accumulator before each pixel is stored, allowing each pixel to be a random colour. Here are two examples with the above code on the 6502 Emulator:
Challenges
The code below sets the bitmap display to a single colour except for the middle 4 pixels.
lda #$00 ; set a pointer in memory location $40 to point to $0200
sta $40 ; ... low byte ($00) goes in address $40
lda #$02
sta $41 ; ... high byte ($02) goes into address $41
lda $FE ; Sets random colour (number) to accumulator
ldx #$06 ; page number at which to stop ($06)
ldy #$00 ; set index to 0
loop:
STA ($40), y
iny
BNE loop
inc $41 ; increment the page
cpx $41 ; compare current page to x ($06)
bne loop ; continue until done all pages
CLC ; Clear carry (set by the final CPX) so ADC adds exactly 1
ADC #$01 ; Add one to the colour code
STA $03EF ; Storing the new coloured pixels at middle location
STA $03F0
STA $040F
STA $0410
My solution to this challenge was to manually store to each of the four middle pixel locations, giving them the chosen colour + 1 (I kept the random colour choice in this code, so the base colour is still random).
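The four addresses can be double-checked with a short Python sketch. It assumes the layout used throughout this post: the display starts at $0200, rows are 32 pixels wide, and each pixel is one byte. The helper name `pixel_address` is made up for this example.

```python
BASE = 0x0200  # first byte of the bitmap display
WIDTH = 32     # pixels (bytes) per row


def pixel_address(row: int, col: int) -> int:
    """Memory address of the pixel at (row, col), both counted from 0."""
    return BASE + row * WIDTH + col


# On a 32x32 grid, the middle four pixels are rows 15-16, columns 15-16.
middle = [pixel_address(r, c) for r in (15, 16) for c in (15, 16)]
print([f"${a:04X}" for a in middle])
```

This prints `['$03EF', '$03F0', '$040F', '$0410']`, the same four addresses used by the STA instructions above.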
Here is the output:
Final Thoughts
After completing this lab and doing the experiments and challenges, I believe that I have learned the basic instructions and addressing methods for this assembler, including the little-endian design philosophy when it comes to byte storage.
When first viewing the assembler demos and lab, I was sort of intimidated by the seeming randomness of how the code operated, but throughout this lab I experimented and got a good beginner's understanding of how the 6502 processor operates and how assembly operates in general.