Amir Mullagaliev

Posted on Jan 21 • Edited on Jan 22

SPO600: First Lab

#portability #optimization #6502 #assembly

Introduction

This blog post is dedicated to the very first lab in the SPO600 course. It includes solutions for all of the tasks, screenshots, and my thoughts on the whole process.

Throughout week two we were learning the basics of the 6502 CPU and 6502 assembly. It is an interesting and invaluable experience. Moreover, a couple of years ago I wanted to learn assembly language because I had been learning C++, where you have to allocate dynamic memory manually.

The 6502 is an 8-bit microprocessor with a 16-bit address bus. Therefore, a 16-bit address is made up of two 8-bit bytes.
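To make the two-byte address idea concrete, here is a minimal sketch in Python (not part of the lab itself). The function names `make_address` and `split_address` are made up for illustration; the example values are the ones the lab code later stores at $40 (low byte) and $41 (high byte).

```python
def make_address(low: int, high: int) -> int:
    """Combine two 8-bit bytes into one 16-bit address (low byte, then high byte)."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split_address(address: int) -> tuple[int, int]:
    """Split a 16-bit address back into its (low, high) bytes."""
    return address & 0xFF, (address >> 8) & 0xFF


# Low byte $00 and high byte $02 together point at $0200.
start = make_address(0x00, 0x02)
print(f"${start:04X}")  # prints $0200
```

The low byte comes first in memory, which is why the lab code puts $00 at $40 and $02 at $41.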

Lab Description

Since we don't have a physical 6502 CPU chip, we work using an online emulator. I ain't going to dive very deep into the 6502's theory because I still don't feel very comfortable with it, and I don't want to mislead the people reading this.

For the first lab we have initial code that we have to analyze and manipulate in different ways. This is how it looks:

        lda #$00    
        sta $40     
        lda #$02    
        sta $41     
        lda #$07    
        ldy #$00    
 loop:  sta ($40),y 
        iny     
        bne loop    
        inc $41     
        ldx $41     
        cpx #$06    
        bne loop

At first it looks horrifying. However, once you know how everything works, you will find that it is a pretty logical sequence of instructions.

This code fills every page of the bitmap with yellow, where each page is 256 bytes. Here's the output:

As you can see, it filled the bitmap entirely (all 4 pages).
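For readers who prefer to trace the loop outside the emulator, the following is a rough Python model of the initial code. It is only a sketch: `fill_bitmap` is a hypothetical helper, memory is modelled as a plain 64 KB byte array, and each assembly instruction is mirrored by one commented line.

```python
def fill_bitmap(colour: int = 0x07) -> bytearray:
    """Rough model of the lab's initial code: fill $0200-$05FF with one colour."""
    memory = bytearray(0x10000)  # 64 KB address space, all zeros
    memory[0x40] = 0x00          # lda #$00 / sta $40  (pointer low byte)
    memory[0x41] = 0x02          # lda #$02 / sta $41  (pointer high byte)
    a = colour                   # lda #$07
    y = 0                        # ldy #$00
    while True:
        base = memory[0x40] | (memory[0x41] << 8)
        memory[(base + y) & 0xFFFF] = a            # sta ($40),y
        y = (y + 1) & 0xFF                         # iny (wraps from 255 to 0)
        if y != 0:                                 # bne loop
            continue
        memory[0x41] = (memory[0x41] + 1) & 0xFF   # inc $41 (next page)
        x = memory[0x41]                           # ldx $41
        if x == 0x06:                              # cpx #$06 / bne loop
            break
    return memory


mem = fill_bitmap()
print(sum(1 for b in mem[0x0200:0x0600] if b == 0x07))  # prints 1024
```

The 1024 filled bytes are the four 256-byte pages from $0200 to $05FF.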

First Task

Calculate how long it takes for the code to execute, assuming a 1 MHz clock speed.

One of the sources provided by the professor in the lab is documentation that shows how many cycles each instruction takes and how much memory it uses.

I created a spreadsheet to help me count, as the professor showed us during the lab session.

I put the code in the first column, the number of cycles in the second, and the number of iterations in the third. Obviously, you will ask what alt-cycles and alt-count mean. These are the alternative number of cycles and the alternative number of iterations.

Why alternative?

According to the documentation, some instructions take additional cycles under certain conditions.

In the provided code, BNE is the only instruction that has this unique ability.

This is a screenshot from the documentation for the BNE instruction. If you take a closer look, the number of cycles is 2 + t + p, where t is added if the branch is taken and p if a page is crossed. It is the tricky part, but since we know the size of each page, and that the bitmap consists of four of these pages, we can calculate everything.
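The 2 + t + p rule can be written down as a small Python sketch. The helper name `bne_cycles` is made up, and the totals assume that the branch never crosses a page boundary (p = 0), which holds when the whole loop sits inside one page of program memory.

```python
def bne_cycles(taken: bool, page_crossed: bool = False) -> int:
    """Cycles for one BNE: 2, plus 1 if taken, plus 1 more if a taken branch crosses a page."""
    cycles = 2
    if taken:
        cycles += 1
        if page_crossed:
            cycles += 1
    return cycles


PAGES = 4
BYTES_PER_PAGE = 256

# Inner "bne loop": taken 255 times per page, falls through once per page.
taken_count = PAGES * (BYTES_PER_PAGE - 1)
not_taken_count = PAGES
inner_bne_total = (taken_count * bne_cycles(True)
                   + not_taken_count * bne_cycles(False))
print(taken_count, not_taken_count, inner_bne_total)  # prints 1020 4 3068
```

These two counts are what the cycles/alt-cycles and count/alt-count columns capture for the inner branch.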

To calculate the time, we need to understand that the clock speed is 1 MHz (one million cycles per second). Therefore, one cycle takes 0.000001 s, and 11326 cycles take 11326 * 0.000001 s, which is 0.011326 s, or 11.326 ms, or 11326 µs.
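The same conversion as a short Python sketch; `execution_time_seconds` is a hypothetical helper, and the cycle count is the one from the spreadsheet.

```python
def execution_time_seconds(cycles: int, clock_hz: float = 1_000_000) -> float:
    """Convert a cycle count to seconds at the given clock speed (default 1 MHz)."""
    return cycles / clock_hz


t = execution_time_seconds(11326)
print(f"{t:.6f} s = {t * 1e3:.3f} ms = {t * 1e6:.0f} us")
# prints 0.011326 s = 11.326 ms = 11326 us
```

At exactly 1 MHz the number of cycles and the number of microseconds are the same, which makes the result easy to sanity-check.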

Second Task

Calculate the total memory usage for the program code plus any pointers or variables.

This task wasn't hard; I included my calculations in the spreadsheet. However, it wasn't necessary, because this emulator prints the number of bytes to the console.
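As a cross-check, the byte count can also be tallied by hand. The sketch below is one way to do it in Python, using the standard 6502 instruction sizes (opcode plus operand) for each addressing mode. It assumes that the only variables are the two pointer bytes at $40 and $41; the result is worth comparing against the number the emulator reports.

```python
# Instruction size in bytes (opcode + operand) by addressing mode.
SIZE = {"immediate": 2, "zero_page": 2, "indirect_y": 2, "implied": 1, "relative": 2}

# The initial program from the lab, one entry per instruction.
PROGRAM = [
    ("lda #$00", "immediate"),
    ("sta $40", "zero_page"),
    ("lda #$02", "immediate"),
    ("sta $41", "zero_page"),
    ("lda #$07", "immediate"),
    ("ldy #$00", "immediate"),
    ("sta ($40),y", "indirect_y"),
    ("iny", "implied"),
    ("bne loop", "relative"),
    ("inc $41", "zero_page"),
    ("ldx $41", "zero_page"),
    ("cpx #$06", "immediate"),
    ("bne loop", "relative"),
]

code_bytes = sum(SIZE[mode] for _, mode in PROGRAM)
pointer_bytes = 2  # assumption: only $40 and $41 count as variables
total_bytes = code_bytes + pointer_bytes
print(code_bytes, pointer_bytes, total_bytes)  # prints 25 2 27
```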

Third Task

Find one or more ways to decrease the time taken to fill the screen with a solid colour. Calculate the execution time of the fastest version of this program that you can create. Challenge: the fastest version is nearly twice as fast as the original version shown above!

This is a really challenging part, especially in a language that you barely know. In the professor's words, the best solution in terms of performance is the most stupid one, so I started digging.

First of all, I looked at the documentation for the STA instruction, because it supports different addressing modes, and the mode affects the number of cycles the instruction takes.

Unfortunately, it didn't help at all, and I realized that it wouldn't cut off even half of the cycles. However, it pushed me to find another way, and I realized that each page can be filled independently; the only thing that changes is Y as it iterates.

Here's the code that I came up with:

; load accumulator with yellow color
lda #$07
; set y to 00
ldy #$00
loop:
; store the accumulator (the color) at position y
; in each of the four pages
sta $0200, y
sta $0300, y
sta $0400, y
sta $0500, y
; increment y
iny
; loop until y wraps around to 0
bne loop

Hopefully, the comments are clear and explain the principles of the code. This solution cuts almost half of the operations:

Using the same technique, this spreadsheet shows that the number of cycles is now 6403. 11326 / 6403 ≈ 1.8. Therefore, this code is 1.8 times faster than the initial version. I think that my solution fits the requirements perfectly!
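The 6403 figure can be reproduced with a short Python tally. This is only a sketch: it uses the standard 6502 cycle counts (2 for an immediate load, 5 for an absolute,Y store, 2 for INY, 3 for a taken BNE and 2 for a not-taken one) and assumes the branch does not cross a page. The 11326 baseline is the original version's cycle count from the first task.

```python
PAGES = 4
ITERATIONS = 256

setup = 2 + 2                         # lda #$07, ldy #$00
stores = 5 * PAGES * ITERATIONS       # four "sta $0N00, y" per iteration
increments = 2 * ITERATIONS           # iny
branches = 3 * (ITERATIONS - 1) + 2   # bne: taken 255 times, falls through once

optimized_cycles = setup + stores + increments + branches
original_cycles = 11326               # figure from the first task
speedup = original_cycles / optimized_cycles

print(optimized_cycles)               # prints 6403
print(f"{speedup:.2f}")               # prints 1.77
```

Most of the saving comes from running iny and bne 256 times instead of 1024 times, plus the cheaper absolute,Y store.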

Fourth Task

Change the code to fill the display with light blue instead of yellow.

For this task we will need a table of colour codes:

This task is as simple as changing one line of code. In my case it is the first line, where we load the accumulator with 7 (lda #$07); we change it to lda #$e. Code:

; load accumulator with light blue color
lda #$e
; set y to 00
ldy #$00
loop:
; store the accumulator (the color) at position y
; in each of the four pages
sta $0200, y
sta $0300, y
sta $0400, y
sta $0500, y
; increment y
iny
; loop until y wraps around to 0
bne loop

Bitmap Display:

Fifth Task

Change the code to fill the display with a different colour on each page (each “page” will be one-quarter of the bitmapped display).

To solve this task, I decided to use the initial code, where I can change the colour every time a page is crossed. I was surprised that I found the solution fast.

The key point of this solution is to understand that we can load the accumulator with a value that matches the page number: lda $41.

Solution:

; load accumulator with low-byte #$00
lda #$00
; store at address $40
sta $40
; load accumulator with high-byte #$02
lda #$02
; store at address $41
sta $41
; load Y register with #$00, to iterate further
ldy #$00
; load accumulator with color light-blue(#$0e)
lda #$0e
loop:
; store accumulator at the address held in
; $40/$41, plus y
sta ($40), y
; increment y
iny
; iterate while y isn't 0
bne loop
; load accumulator with the number of the page just
; filled ($02, $03, $04); it colours the next page
lda $41
; increment value at address $41
inc $41
; load x with value at address $41
ldx $41
; compare x with 6
cpx #$6
bne loop
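To see which colour code ends up on which page, the outer loop can be traced with a small Python sketch. `page_colours` is a hypothetical helper; it skips the 256 individual stores and only tracks the accumulator and the page number held at $41.

```python
def page_colours(first_colour: int = 0x0E,
                 first_page: int = 0x02,
                 end_page: int = 0x06) -> dict[int, int]:
    """Trace the outer loop: return {page number: colour code stored on that page}."""
    colours = {}
    a = first_colour          # lda #$0e
    page = first_page         # value held at $41
    while True:
        colours[page] = a     # inner loop fills the whole page with A
        a = page              # lda $41 (page number becomes the next colour)
        page += 1             # inc $41
        if page == end_page:  # ldx $41 / cpx #$6 / bne loop
            break
    return colours


for page, colour in page_colours().items():
    print(f"page ${page:02X} -> colour ${colour:02X}")
# page $02 -> colour $0E
# page $03 -> colour $02
# page $04 -> colour $03
# page $05 -> colour $04
```

Because lda $41 runs before inc $41, each page is coloured with the number of the page before it, and the first page keeps the light blue loaded before the loop.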

Bitmap Display:

Sixth Task

Make each pixel a random colour.

This is the last task covered in this blog post, but we will continue in the next one, where we will take a look at the experiments and challenges the professor proposed.

Let's get back to the task.

According to the documentation, lda $fe loads a one-byte pseudo-random number. It doesn't take much effort to do, again using the initial code. I added this instruction inside the loop, so it gives every pixel a random colour.

Code:

; load accumulator with low-byte #$00
lda #$00
; store at address $40
sta $40
; load accumulator with high-byte #$02
lda #$02
; store at address $41
sta $41
; load Y register with #$00, to iterate further
ldy #$00
loop:
; load random color to accumulator
; (before the store, so the first pixel is random too)
lda $fe
; store accumulator at the address held in
; $40/$41, plus y
sta ($40), y
; increment y
iny
; iterate while y isn't 0
bne loop
; increment value at address $41
inc $41
; load x with value at address $41
ldx $41
; compare x with 6
cpx #$6
bne loop

Bitmap Display:

Conclusion

Learning assembly is a really interesting experience, especially when you start from a simple architecture like the 6502. It is not easy for me: it requires a lot of reading, and you cannot just dive into the code without theory. Moreover, assembly requires knowledge of how the hardware works. I will continue in my next blog post with the rest of the lab experiments and the challenges!
