DEV Community

Juro Zaw


Exploring 6502 Assembly Language - Lab 1 (Part One)

Introduction

Hi everybody! In this blog post, I will document my experience exploring the 6502 assembly language as part of my work for SPO600. Lab 1 is about implementing bitmap code, calculating its execution time and optimizing parts of it. This is Part 1 of Lab 1.

6502 Emulator

To write and test my assembly programs, I started by accessing the 6502 Emulator.

📌 Initial Code

We will start off with this code. It fills the entire bitmap display with one color. The Y register holds the index within the current page. The X register holds the current page number, which is compared with $06 to decide when to stop the loop.

    lda #$00    ; set a pointer in memory location $40 to point to $0200
    sta $40     ; ... low byte ($00) goes in address $40
    lda #$02    
    sta $41     ; ... high byte ($02) goes into address $41
    lda #$07    ; colour number
    ldy #$00    ; set index to 0
 loop:  sta ($40),y ; set pixel colour at the address (pointer)+Y
    iny     ; increment index
    bne loop    ; continue until done the page (256 pixels)
    inc $41     ; increment the page
    ldx $41     ; get the current page number
    cpx #$06    ; compare with 6
    bne loop    ; continue until done all pages

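There are two loops in the code. The inner loop fills one page (256 bytes) with the color. On the first pass, it fills page $02, which covers addresses $0200 to $02FF.

Note that the starting address $0200 is split into two bytes: the low byte $00 is stored at $40 and the high byte $02 is stored at $41. The 6502 is little-endian, so sta ($40),y reads the pointer from $40/$41 and adds Y to get the address it writes to.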
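To make the pointer mechanics concrete, here is a small Python model of how $0200 is split into bytes and how sta ($40),y forms its target address. It is only a sketch and not how the emulator is implemented. Memory is assumed to be a 64 KiB bytearray, and split_pointer and indirect_indexed are hypothetical helper names.

```python
# Sketch: how the pointer at $40/$41 is built and used by sta ($40),y.
# Assumption: memory is modelled as a 64 KiB bytearray (illustrative only).

def split_pointer(address):
    """Split a 16-bit address into (low byte, high byte)."""
    return address & 0xFF, (address >> 8) & 0xFF

def indirect_indexed(memory, zp, y):
    """Effective address for (zp),Y: little-endian pointer at zp/zp+1, plus Y."""
    low = memory[zp]
    high = memory[(zp + 1) & 0xFF]  # the pointer's high byte is also read from zero page
    return (((high << 8) | low) + y) & 0xFFFF

memory = bytearray(0x10000)
low, high = split_pointer(0x0200)
memory[0x40] = low    # lda #$00 / sta $40
memory[0x41] = high   # lda #$02 / sta $41

print(hex(memory[0x40]), hex(memory[0x41]))        # 0x0 0x2
print(hex(indirect_indexed(memory, 0x40, 0x00)))   # 0x200
print(hex(indirect_indexed(memory, 0x40, 0xFF)))   # 0x2ff
```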

 loop:  sta ($40),y ; set pixel colour at the address (pointer)+Y
    iny     ; increment index
    bne loop    ; continue until done the page (256 pixels)

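When Y wraps around from $FF to $00, BNE falls through and the outer part of the loop runs. It increments the high byte of the pointer, which moves it to the next page, and compares the new page number with $06. Execution jumps back to the inner loop until the high byte reaches $06, so pages $02 to $05 (addresses $0200 to $05FF) get filled.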

     inc $41        ; increment the page
     ldx $41        ; get the current page number
     cpx #$06    ; compare with 6
     bne loop    ; continue until done all pages
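Here is a rough way to check the two-loop structure: a Python simulation that follows the same steps and records every address written. This is a hypothetical model (run_initial_fill is an illustrative name). It only tracks the registers and memory the program touches and is not cycle-accurate.

```python
# Sketch: simulate the initial program's inner and outer loops (not cycle-accurate).
# Assumption: only the registers and memory the program touches are modelled.

def run_initial_fill(colour=0x07):
    memory = bytearray(0x10000)
    memory[0x40], memory[0x41] = 0x00, 0x02      # pointer -> $0200
    a, y = colour, 0x00
    written = []                                  # addresses in the order they are written
    while True:
        # inner loop: sta ($40),y / iny / bne loop
        while True:
            address = (memory[0x41] << 8) | memory[0x40]
            memory[address + y] = a
            written.append(address + y)
            y = (y + 1) & 0xFF                    # Y wraps from $FF to $00
            if y == 0:                            # bne falls through when Y == 0
                break
        memory[0x41] = (memory[0x41] + 1) & 0xFF  # inc $41
        x = memory[0x41]                          # ldx $41
        if x == 0x06:                             # cpx #$06 / bne loop
            break
    return memory, written

memory, written = run_initial_fill()
print(len(written), hex(written[0]), hex(written[-1]))  # 1024 0x200 0x5ff
```

The inner loop runs 256 times per page and the outer loop runs 4 times. Those are the counts that appear in the cycle table.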

📌 Calculating Performance

Now, we will calculate how long the code takes to run, assuming a 1 MHz clock speed. The 6502 Reference Sheet is used to look up the number of cycles for each instruction.

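Below is a table detailing how the performance is calculated. For the two BNE rows, Cycles and Count describe the branch being taken (3 cycles, assuming the target is on the same page). Alt Cycles and Alt Count describe the branch falling through (2 cycles).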

Instruction         Cycles  Count  Alt Cycles  Alt Count  Total Cycles
LDA #$00            2       1      -           -          2
STA $40             3       1      -           -          3
LDA #$02            2       1      -           -          2
STA $41             3       1      -           -          3
LDA #$07            2       1      -           -          2
LDY #$00            2       1      -           -          2
STA ($40),Y         6       1,024  -           -          6,144
INY                 2       1,024  -           -          2,048
BNE loop (inner)    3       1,020  2           4          3,068
INC $41             5       4      -           -          20
LDX $41             3       4      -           -          12
CPX #$06            2       4      -           -          8
BNE loop (outer)    3       3      2           1          11
Total                                                     11,325
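The same arithmetic as the table, written as a short Python sketch. The counts are derived from 4 pages of 256 pixels instead of being typed in, which makes the 1,020 and 4 in the inner BNE row easier to follow. The assumptions match the table: a taken branch costs 3 cycles with no page crossing, and a not-taken branch costs 2. ROWS and total_cycles are illustrative names.

```python
# Sketch: reproduce the cycle table for the initial program.
# Assumptions: taken branches cost 3 cycles (no page crossing),
# not-taken branches cost 2 cycles.
PAGES = 4              # pages $02-$05
PIXELS_PER_PAGE = 256

# (instruction, cycles, count, alt_cycles, alt_count); alt_* = branch not taken
ROWS = [
    ("LDA #$00", 2, 1, 0, 0),
    ("STA $40", 3, 1, 0, 0),
    ("LDA #$02", 2, 1, 0, 0),
    ("STA $41", 3, 1, 0, 0),
    ("LDA #$07", 2, 1, 0, 0),
    ("LDY #$00", 2, 1, 0, 0),
    ("STA ($40),Y", 6, PAGES * PIXELS_PER_PAGE, 0, 0),
    ("INY", 2, PAGES * PIXELS_PER_PAGE, 0, 0),
    ("BNE loop (inner)", 3, PAGES * (PIXELS_PER_PAGE - 1), 2, PAGES),
    ("INC $41", 5, PAGES, 0, 0),
    ("LDX $41", 3, PAGES, 0, 0),
    ("CPX #$06", 2, PAGES, 0, 0),
    ("BNE loop (outer)", 3, PAGES - 1, 2, 1),
]

def total_cycles(rows):
    """Sum cycles * count for the main case plus the alternate (not-taken) case."""
    return sum(cycles * count + alt_cycles * alt_count
               for _, cycles, count, alt_cycles, alt_count in rows)

print(total_cycles(ROWS))  # 11325
```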

Execution Timing Details

Parameter               Value
Total Cycles            11,325 cycles
Clock Speed             1 MHz
Cycle Time              1 µs per cycle
Execution Time
  • Seconds (s)         0.011325 s
  • Milliseconds (ms)   11.325 ms
  • Microseconds (µs)   11,325 µs
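The conversion is time = cycles / clock frequency. At 1 MHz, one cycle lasts 1 µs. Below is a tiny sketch of that formula. cycles_to_time is a hypothetical helper, and the 1 MHz default is the assumption stated above.

```python
# Sketch: convert a cycle count to wall-clock time at a given clock speed.
# time = cycles / clock frequency; at 1 MHz one cycle is 1 µs.

def cycles_to_time(cycles, clock_hz=1_000_000):
    """Return (seconds, milliseconds, microseconds) for a cycle count."""
    return (cycles / clock_hz,
            cycles * 1_000 / clock_hz,
            cycles * 1_000_000 / clock_hz)

seconds, ms, us = cycles_to_time(11_325)
print(f"{seconds:.6f} s, {ms:.3f} ms, {us:,.0f} µs")  # 0.011325 s, 11.325 ms, 11,325 µs
```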

Memory Usage

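I also counted the number of bytes each instruction occupies, using a 6502 instruction reference, and calculated the memory usage. Each instruction is one opcode byte plus its operand. Immediate, zero-page, indirect indexed and relative (branch) operands take one byte, and implied instructions such as INY have no operand. Because $40 and $41 are zero-page addresses, STA $40, STA $41, INC $41 and LDX $41 are 2 bytes each, not 3. This is consistent with the 3-cycle zero-page timing used for them in the cycle table.

Component             Bytes
Program Code          25
Pointers/Variables    2
Total Memory Usage    27

Program Code

Instruction         Bytes
LDA #$00            2
STA $40             2
LDA #$02            2
STA $41             2
LDA #$07            2
LDY #$00            2
STA ($40),Y         2
INY                 1
BNE loop (inner)    2
INC $41             2
LDX $41             2
CPX #$06            2
BNE loop (outer)    2
Subtotal            25

Pointers/Variables

Pointer/Variable      Address  Bytes
Pointer Low Byte      $40      1
Pointer High Byte     $41      1
Subtotal                       2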
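The size rule (one opcode byte plus the operand bytes for the addressing mode) can be tallied mechanically. This Python sketch assumes the assembler encodes $40 and $41 as zero-page operands, which is what the 3-cycle STA timing implies. OPERAND_BYTES, PROGRAM and program_size are illustrative names.

```python
# Sketch: program size = sum over instructions of (1 opcode byte + operand bytes).
# Assumption: $40/$41 are assembled as zero-page (1-byte) operands.
OPERAND_BYTES = {
    "implied": 0,     # INY
    "immediate": 1,   # #$nn
    "zeropage": 1,    # $nn
    "indirect_y": 1,  # ($nn),Y
    "relative": 1,    # branch offset
    "absolute": 2,    # $nnnn
}

PROGRAM = [
    ("LDA #$00", "immediate"),
    ("STA $40", "zeropage"),
    ("LDA #$02", "immediate"),
    ("STA $41", "zeropage"),
    ("LDA #$07", "immediate"),
    ("LDY #$00", "immediate"),
    ("STA ($40),Y", "indirect_y"),
    ("INY", "implied"),
    ("BNE loop", "relative"),
    ("INC $41", "zeropage"),
    ("LDX $41", "zeropage"),
    ("CPX #$06", "immediate"),
    ("BNE loop", "relative"),
]

def program_size(program):
    return sum(1 + OPERAND_BYTES[mode] for _, mode in program)

code = program_size(PROGRAM)
data = 2  # pointer bytes at $40 and $41
print(code, code + data)  # 25 27
```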

📌 Let's Optimize it!

Now, the provided code works fine, but I believe we can make it run faster! There are three things we are going to change:

  • Change Addressing Mode

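Indirect indexed addressing (STA ($40),Y) takes 6 cycles per store. Absolute indexed addressing (STA $0200,Y) takes 5, because the page is encoded directly in the instruction instead of being read from a pointer in zero page.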

  • Loop Adjustment

Instead of writing one byte per iteration and switching to the next page after every 256 bytes, we can write four bytes per iteration, one to each page. The loop then runs 256 times instead of 1,024, and the page-switching code disappears.

  • Remove Pointer

This follows from the first change. With absolute addressing, the target page is part of each STA instruction, so we no longer need to maintain a pointer at $40 and $41.
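Before rewriting the loop, here is a back-of-the-envelope estimate of what these changes save per pixel in the hot loop. The sketch uses standard 6502 cycle counts. STA abs,Y always costs 5 cycles, because stores never get the 4-cycle shortcut that loads get when no page boundary is crossed. The constant names are hypothetical.

```python
# Sketch: steady-state cycles per pixel inside the hot loop.
# Cycle costs: STA (zp),Y = 6, STA abs,Y = 5, INY = 2,
# taken BNE = 3 (assuming no page crossing).
STA_INDIRECT_Y = 6
STA_ABSOLUTE_Y = 5
INY = 2
BNE_TAKEN = 3

# Original: one store per iteration of the inner loop.
per_pixel_original = (STA_INDIRECT_Y + INY + BNE_TAKEN) / 1

# Optimized: four stores (one per page) share a single INY/BNE pair.
STORES_PER_ITERATION = 4
per_pixel_optimized = (STORES_PER_ITERATION * STA_ABSOLUTE_Y + INY + BNE_TAKEN) / STORES_PER_ITERATION

print(per_pixel_original, per_pixel_optimized)  # 11.0 6.25
```

Most of the saving comes from spreading INY and BNE over four stores. The cheaper addressing mode saves 1 more cycle per pixel.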

With all these changes in mind, this is what the final optimized code looks like:

        LDA #$07        ; Load accumulator with color number $07
        LDY #$00        ; Initialize Y register to 0

fill_screen:
        STA $0200,Y     ; Store $07 at $0200 + Y
        STA $0300,Y     ; Store $07 at $0300 + Y
        STA $0400,Y     ; Store $07 at $0400 + Y
        STA $0500,Y     ; Store $07 at $0500 + Y
        INY             ; Increment Y
        BNE fill_screen ; Branch to fill_screen if Y != $00 (256 iterations)
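To check that the optimized loop still covers the whole display, here is a hypothetical Python model of it (run_optimized_fill is an illustrative name). It assumes the same $0200 to $05FF screen area as before and is not cycle-accurate.

```python
# Sketch: simulate the optimized loop and check it fills $0200-$05FF.
# Assumption: only the memory the loop writes is modelled.

def run_optimized_fill(colour=0x07):
    memory = bytearray(0x10000)
    a, y = colour, 0x00              # LDA #$07 / LDY #$00
    iterations = 0
    while True:
        # STA $0200,Y / STA $0300,Y / STA $0400,Y / STA $0500,Y
        for page_base in (0x0200, 0x0300, 0x0400, 0x0500):
            memory[page_base + y] = a
        y = (y + 1) & 0xFF           # INY (wraps from $FF to $00)
        iterations += 1
        if y == 0:                   # BNE fill_screen falls through once Y wraps to $00
            break
    return memory, iterations

memory, iterations = run_optimized_fill()
print(iterations)  # 256
```

One visible side effect is that the same offset in all four pages is written in each iteration. As a result, the display fills in four bands at once instead of one page after another.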

Calculating Performance for Optimized Code

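Two details matter here. First, STA with absolute indexed addressing always takes 5 cycles: unlike loads, stores get no discount when the index does not cross a page boundary. Second, the final BNE falls through once and costs 2 cycles.

Instruction         Cycles  Count  Alt Cycles  Alt Count  Total Cycles
LDA #$07            2       1      -           -          2
LDY #$00            2       1      -           -          2
STA $0200,Y         5       256    -           -          1,280
STA $0300,Y         5       256    -           -          1,280
STA $0400,Y         5       256    -           -          1,280
STA $0500,Y         5       256    -           -          1,280
INY                 2       256    -           -          512
BNE fill_screen     3       255    2           1          767
Total                                                     6,403

Execution Timing Details

Parameter               Value
Total Cycles            6,403 cycles
Clock Speed             1 MHz
Cycle Time              1 µs per cycle
Execution Time
  • Seconds (s)         0.006403 s
  • Milliseconds (ms)   6.403 ms
  • Microseconds (µs)   6,403 µs

As you can see, these changes cut the execution time from 11,325 µs to 6,403 µs, a reduction of about 43% (roughly 1.77 times faster)!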
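The optimized total and the comparison can be recomputed the same way. The assumptions in this sketch are a 1 MHz clock, STA abs,Y at 5 cycles, a taken BNE at 3 cycles (no page crossing) and the final not-taken BNE at 2. The variable names are illustrative.

```python
# Sketch: cycle count for the optimized loop, compared with the original 11,325.
# Assumptions: STA abs,Y = 5 cycles, taken BNE = 3 (no page crossing),
# final not-taken BNE = 2.
ITERATIONS = 256
setup = 2 + 2                         # LDA #$07, LDY #$00
stores = 4 * 5 * ITERATIONS           # four STA abs,Y per iteration
iny = 2 * ITERATIONS                  # one INY per iteration
bne = 3 * (ITERATIONS - 1) + 2 * 1    # taken 255 times, falls through once
optimized = setup + stores + iny + bne
original = 11_325

print(optimized)                                # 6403
print(round(original / optimized, 2))           # 1.77
print(round(100 * (1 - optimized / original)))  # 43
```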

In the next blog post, we will modify the code to change the color and do some more experiments.
