Raspberry Pi 5 bare metal GIC and timers

Intro

Bringing up timers on the Raspberry Pi 5. There are two options: the ARM generic timer and the BCM system timer. The idea is to set up IRQ handlers and handle timer events. The change from previous versions of the Raspberry Pi is that the BCM timer now goes through the GIC rather than the BCM IRQs.

ARM generic timer

ARM provides a generic timer that is available out of the box on ARM SoCs.

The Linux kernel device tree defines the ARMv8 timer as

timer {
		compatible = "arm,armv8-timer";
		interrupts = <GIC_PPI 13 (GIC_CPU_MASK_SIMPLE(4) |
					  IRQ_TYPE_LEVEL_LOW)>,
			     <GIC_PPI 14 (GIC_CPU_MASK_SIMPLE(4) |
					  IRQ_TYPE_LEVEL_LOW)>,
			     <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(4) |
					  IRQ_TYPE_LEVEL_LOW)>,
			     <GIC_PPI 10 (GIC_CPU_MASK_SIMPLE(4) |
					  IRQ_TYPE_LEVEL_LOW)>,
			     <GIC_PPI 12 (GIC_CPU_MASK_SIMPLE(4) |
					  IRQ_TYPE_LEVEL_LOW)>;
	};

This is one of the options that can be used as a timer source.

BCM2835 timer

The timer lets you set a compare value; when the counter matches it, an interrupt is raised. It can be used for system ticks or for scheduling events, and it seems to be the easiest peripheral to start with.

Interrupts

The Linux kernel defines the interrupts in the device tree as

system_timer: timer@7c003000 {
	compatible = "brcm,bcm2835-system-timer";
	reg = <0x7c003000 0x1000>;
	interrupts = <GIC_SPI 64 IRQ_TYPE_LEVEL_HIGH>,
		     <GIC_SPI 65 IRQ_TYPE_LEVEL_HIGH>,
		     <GIC_SPI 66 IRQ_TYPE_LEVEL_HIGH>,
		     <GIC_SPI 67 IRQ_TYPE_LEVEL_HIGH>;
	clock-frequency = <1000000>;
};

| GIC_SPI | Actual number | Note |
|---|---|---|
| 64 | 96 | Timer C0 |
| 65 | 97 | Timer C1 |
| 66 | 98 | Timer C2 |
| 67 | 99 | Timer C3 |
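
The offset of 32 between the device-tree number and the number the GIC reports comes from how the GIC lays out its interrupt IDs: 0-15 are SGIs, 16-31 are PPIs, and SPIs start at 32. A minimal sketch of that conversion, with helper names made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* GIC interrupt ID layout: 0-15 SGI, 16-31 PPI, 32 and up SPI. */
#define GIC_PPI_BASE 16u
#define GIC_SPI_BASE 32u

/* Hypothetical helper: device-tree PPI index -> GIC interrupt ID. */
static inline uint32_t gic_ppi_to_intid(uint32_t ppi) {
    assert(ppi < 16u);              /* there are only 16 PPIs per core */
    return GIC_PPI_BASE + ppi;
}

/* Hypothetical helper: device-tree SPI index -> GIC interrupt ID. */
static inline uint32_t gic_spi_to_intid(uint32_t spi) {
    return GIC_SPI_BASE + spi;
}
```

With these, GIC_SPI 65 (timer C1) becomes ID 97, and GIC_PPI 14 from the armv8-timer node becomes ID 30.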

System Timer Registers

The registers are mapped at address 0x107c003000.

| Offset | Register Name | Description |
|---|---|---|
| 0x0 | CS | System Timer Control/Status |
| 0x4 | CLO | System Timer Counter Lower 32 bits |
| 0x8 | CHI | System Timer Counter Higher 32 bits |
| 0xc | C0 | System Timer Compare 0 |
| 0x10 | C1 | System Timer Compare 1 |
| 0x14 | C2 | System Timer Compare 2 |
| 0x18 | C3 | System Timer Compare 3 |
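
Because the counter is split across CLO and CHI, the low word can roll over between the two reads. A common pattern is to read high, low, then high again and retry if the high word changed. The sketch below takes the register accessor as a function pointer so it can be tried on a host; `timer_read64`, `reg_read_fn` and `fake_read` are illustrative names, and on the board the accessor would wrap `bcm2835_read32()`.

```c
#include <stdint.h>

#define BCM2835_CLO 0x04
#define BCM2835_CHI 0x08

/* Register accessor; on the board this would wrap bcm2835_read32(). */
typedef uint32_t (*reg_read_fn)(uint32_t offset);

/* Read the 64-bit counter without tearing: retry if CHI changed
 * while CLO was being read. */
static inline uint64_t timer_read64(reg_read_fn rd) {
    uint32_t hi, lo, hi2;
    do {
        hi  = rd(BCM2835_CHI);
        lo  = rd(BCM2835_CLO);
        hi2 = rd(BCM2835_CHI);
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side stand-in for the hardware (hypothetical): the counter
 * advances by one tick on every register read. */
static uint64_t fake_counter;

static inline uint32_t fake_read(uint32_t offset) {
    fake_counter++;
    return offset == BCM2835_CHI ? (uint32_t)(fake_counter >> 32)
                                 : (uint32_t)fake_counter;
}
```

If only short intervals are needed, reading CLO alone is enough, which is what the driver code in these notes does.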

Usage of the driver

To clear the match flags in the control/status register, write a 1 to each bit that needs to be cleared.

#define bcm2835_read32(offset) (*(volatile uint32_t *)(TM_BCM2835_REG_OFFSET+offset))
#define bcm2835_write32(offset,val) (*(volatile uint32_t *)(TM_BCM2835_REG_OFFSET+offset)) = val


void timer_clr_stat() {
    bcm2835_write32(BCM2835_CS,0xF);
}

To schedule a timer event, read the counter register, add the timer interval to that value, and write the result to a compare register.

void timer_start_c0() {
    uint32_t t_lo = bcm2835_read32(BCM2835_CLO);
    t_lo += TIMER_INTERVAL;
    bcm2835_write32(BCM2835_C0, t_lo);
}
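
The device tree gives `clock-frequency = <1000000>`, so one tick is one microsecond and a `TIMER_INTERVAL` of 1000000 is roughly one second. CLO is only 32 bits wide, so `now + interval` can wrap; unsigned arithmetic handles that naturally. A small sketch of the arithmetic (the helper names are assumptions, not part of the driver):

```c
#include <stdbool.h>
#include <stdint.h>

/* From the device tree: clock-frequency = <1000000>, 1 tick per microsecond. */
#define TIMER_HZ 1000000u

/* Milliseconds -> timer ticks (hypothetical helper). */
static inline uint32_t timer_ms_to_ticks(uint32_t ms) {
    return ms * (TIMER_HZ / 1000u);
}

/* Compare value for "now + interval"; unsigned overflow wraps modulo 2^32,
 * which matches how the 32-bit CLO counter itself wraps. */
static inline uint32_t timer_next_compare(uint32_t now, uint32_t interval) {
    return now + interval;
}

/* True once the counter has reached the deadline, also across a wrap,
 * as long as the two values are less than 2^31 ticks apart. */
static inline bool timer_deadline_passed(uint32_t now, uint32_t deadline) {
    return (int32_t)(now - deadline) >= 0;
}
```

The signed-difference comparison is useful when polling the counter instead of waiting for the interrupt.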

GIC

GIC - Generic Interrupt Controller

Compared to other interrupt controllers I have worked with, this one is more complex and supports different modes. Most ARM SoCs have one, and in one way or another you will need to deal with it, so it is interesting to see how it works.

The GIC consists of several parts, depending on the version. The Pi 5 uses a GIC-400. The two parts used here are the Distributor (GICD) and the CPU interface (GICC).

GICC Register list

| Register | Offset | Note |
|---|---|---|
| GICC_CTRL | 0x0 | Enable/disable IRQ types, configure IRQ group behaviour |
| GICC_PMR | 0x4 | IRQ priority filter |
| GICC_IAR | 0xC | Read current IRQ to handle |
| GICC_EOIR | 0x10 | Notify that IRQ is handled |
| GICC_APR0 | 0xD0 | State management |
| GICC_APR1 | 0xD4 | State management |
| GICC_APR2 | 0xD8 | State management |
| GICC_APR3 | 0xDC | State management |

GICD Register list

| Register | Offset | Note |
|---|---|---|
| GICD_CTRL | 0x0 | Enable/disable forwarding of IRQs |
| GICD_IDR | 0x8 | Identification register |
| GICD_IGROUPR | 0x080 | Set interrupt group |
| GICD_ISENABLER | 0x100 | Set enable IRQ |
| GICD_ICENABLER | 0x180 | Clear enable IRQ |
| GICD_ISPENDR | 0x200 | Set pending interrupt |
| GICD_ICPENDR | 0x280 | Clear pending interrupt |
| GICD_ISACTIVER | 0x300 | Set active interrupt |
| GICD_ICACTIVER | 0x380 | Clear active interrupt |
| GICD_IPRIORITYR | 0x400 | Set IRQ priority |
| GICD_ITARGETSR | 0x800 | Set target CPU |
| GICD_ICFGR | 0xC00 | Interrupt config |
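
Most of the GICD registers above are arrays, and the index depends on how many bits each interrupt gets: one bit in the enable/pending/active registers, eight bits in the priority and target registers, two bits in the config registers. A sketch of how to locate the fields for one interrupt ID (`gicd_slot` and `gicd_locate` are hypothetical names):

```c
#include <stdint.h>

/* Where the per-interrupt fields for one interrupt ID live. */
struct gicd_slot {
    uint32_t enable_reg;   /* n for GICD_ISENABLER(n) / GICD_ICENABLER(n) */
    uint32_t enable_bit;   /* bit inside that register */
    uint32_t byte_reg;     /* n for GICD_IPRIORITYR(n) / GICD_ITARGETSR(n) */
    uint32_t byte_shift;   /* shift of the 8-bit field */
    uint32_t cfg_reg;      /* n for GICD_ICFGR(n) */
    uint32_t cfg_shift;    /* shift of the 2-bit field */
};

static inline struct gicd_slot gicd_locate(uint32_t intid) {
    struct gicd_slot s;
    s.enable_reg = intid / 32u;          /* 1 bit per interrupt  */
    s.enable_bit = intid % 32u;
    s.byte_reg   = intid / 4u;           /* 8 bits per interrupt */
    s.byte_shift = (intid % 4u) * 8u;
    s.cfg_reg    = intid / 16u;          /* 2 bits per interrupt */
    s.cfg_shift  = (intid % 16u) * 2u;
    return s;
}
```

For timer C1 (ID 97) this gives GICD_ISENABLER(3) bit 1, which matches the `irqn/32` and `irqn % 32` expressions used in the enable code.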

Initializing GIC

Setting up GIC for use

Switch off the CPU interface and the Distributor, then disable all interrupts and clear all pending and active states. Assign every interrupt to Group 1, set priorities and target CPUs, open the priority filter, and finally enable the Distributor and the CPU interface again.

iowrite32(GICD_CTRL, 0);
iowrite32(GICC_CTRL, 0);

for (i = 0; i < 16; i++) {
	iowrite32(GICD_ICENABLER(i), 0xffffffff);
	iowrite32(GICD_ICPENDR(i),   0xffffffff);
	iowrite32(GICD_ICACTIVER(i), 0xffffffff);
}
for (i = 0; i < 16; i++)
	iowrite32(GICD_IGROUPR(i), 0xffffffff);
for (i = 0; i < 128; i++) {
	iowrite32(GICD_IPRIORITYR(i), 0xa0a0a0a0); 
	iowrite32(GICD_ITARGETSR(i),  0x01010101); 
}
for (i = 0; i < 32; i++)
	iowrite32(GICD_ICFGR(i), 0x55);
iowrite32(GICD_CTRL, 0x1);
iowrite32(GICC_PMR, 0xff);
iowrite32(GICC_CTRL, 0x3);
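
The loop bounds 16, 128 and 32 all describe the same number of interrupts, just with a different number of bits per interrupt in each register array. The sketch below derives them from one constant, assuming the init code is meant to cover 512 interrupt IDs (16 registers of 32 bits); `GIC_MAX_IRQS` and `gic_rep4` are names introduced here for illustration.

```c
#include <stdint.h>

/* Assumption: the loops cover 512 interrupt IDs (16 registers * 32 bits). */
#define GIC_MAX_IRQS 512u

#define GIC_NUM_ENABLE_REGS   (GIC_MAX_IRQS / 32u)  /* 1 bit per IRQ  -> 16  */
#define GIC_NUM_PRIORITY_REGS (GIC_MAX_IRQS / 4u)   /* 8 bits per IRQ -> 128 */
#define GIC_NUM_CONFIG_REGS   (GIC_MAX_IRQS / 16u)  /* 2 bits per IRQ -> 32  */

/* Replicate one byte into all four byte lanes of a 32-bit register value. */
static inline uint32_t gic_rep4(uint8_t b) {
    return (uint32_t)b * 0x01010101u;
}
```

`gic_rep4(0xa0)` gives the priority pattern 0xa0a0a0a0 and `gic_rep4(0x01)` gives the "target CPU0" pattern 0x01010101 used above.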

Draining the interrupt queue before use

It took some time to figure out why IAR kept reporting spurious interrupts, so I added code that drains all pending interrupts and resets the state. The issue was that all IRQs were enabled, and three extra interrupts were being raised for some reason, probably something GPU related. On top of that, the GIC state got messed up after loading the kernel a few times from gdb.

Drain all pending interrupts

uint32_t val;
while (((val = ioread32(GICC_IAR)) & 0x3ff) < 1020)
	iowrite32(GICC_EOIR, val);
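
The `& 0x3ff` and `< 1020` in the loop come from the layout of the IAR value on a GICv2: bits [9:0] hold the interrupt ID, bits [12:10] hold the source CPU for SGIs, and IDs 1020-1023 are reserved special values, with 1023 (0x3ff) meaning "spurious". A small decoding sketch, with helper names that are only illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define GIC_IAR_INTID_MASK   0x3ffu   /* bits [9:0] */
#define GIC_IAR_CPUID_SHIFT  10u      /* bits [12:10], only meaningful for SGIs */
#define GIC_IAR_CPUID_MASK   0x7u
#define GIC_INTID_SPECIAL    1020u    /* 1020..1023 are not real interrupts */

static inline uint32_t gic_iar_intid(uint32_t iar) {
    return iar & GIC_IAR_INTID_MASK;
}

static inline uint32_t gic_iar_cpuid(uint32_t iar) {
    return (iar >> GIC_IAR_CPUID_SHIFT) & GIC_IAR_CPUID_MASK;
}

/* True when there is nothing to handle and nothing to write to EOIR. */
static inline bool gic_iar_is_spurious(uint32_t iar) {
    return gic_iar_intid(iar) >= GIC_INTID_SPECIAL;
}
```

Writing the whole IAR value back to EOIR, as the drain loop does, keeps the CPU ID bits intact for SGIs.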

Restart state machine

iowrite32(GICC_APR0, 0);
iowrite32(GICC_APR1, 0);
iowrite32(GICC_APR2, 0);
iowrite32(GICC_APR3, 0);

Enable interrupt handling

iowrite32(GICD_ISENABLER(irqn/32), 1 << (irqn % 32));

GDB notes

Checking the GICC_IAR register in gdb

p/x *0x107fffa00c

If the value is 0x3ff, it is a spurious interrupt. I saw this state after loading the kernel a few times and could not recover from it; the options were to restart the board or to clear the state through the GICC_APR registers.

Execution levels

The Pi 5 starts the kernel at EL2, so interrupts cannot be configured through the EL3 registers. To run at a different level, the armstub would need to be modified.

To check the execution level:

static inline uint32_t current_el(void) {
    uint64_t el;
    __asm__ volatile("mrs %0, CurrentEL" : "=r"(el));
    return (uint32_t)((el >> 2) & 0x3);
}

On my machine it was reported as EL2 without any stub configured. It is possible to configure an armstub in config.txt that is executed before the kernel is loaded:

armstub=armstubv8.bin

A base example can be found at https://github.com/raspberrypi/tools/blob/master/armstubs/armstub8.S
I tried to modify and run it, and the execution level was then reported as EL3. It seems it failed, and it needs more investigation than just a bare replacement of addresses.

IRQ handling

To handle IRQs, a vector table needs to be set up and its address written to the VBAR register, and an IRQ handling function needs to be provided.

Vector tables

.align 11
sync_exception_handler:
    bl kernel_panic
    eret

.macro ventry label
    .align 7
    b \label
.endm

dummy_vector:
    b kernel_panic

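.align 11
vector_table:
    // current EL with SP_EL0: sync, IRQ, FIQ, SError
    ventry sync_exception_handler
    ventry dummy_vector
    ventry dummy_vector
    ventry dummy_vector

    // current EL with SP_ELx (EL2h here): sync, IRQ, FIQ, SError
    ventry dummy_vector
    ventry handle_irq
    ventry dummy_vector
    ventry dummy_vector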
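
The `.align 7` and `.align 11` values follow from the AArch64 vector table layout: 16 entries of 0x80 bytes each, in four groups (current EL with SP_EL0, current EL with SP_ELx, lower EL in AArch64, lower EL in AArch32), each group ordered sync, IRQ, FIQ, SError. The whole table is 0x800 bytes and must be aligned to that size. A sketch of the offset calculation, using enum and function names invented for this example:

```c
#include <stdint.h>

/* Which state the exception was taken from. */
enum vec_source { CUR_EL_SP0 = 0, CUR_EL_SPX = 1, LOWER_EL_A64 = 2, LOWER_EL_A32 = 3 };

/* Exception type within one group. */
enum vec_type { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

#define VEC_ENTRY_SIZE 0x80u                    /* .align 7  */
#define VEC_TABLE_SIZE (16u * VEC_ENTRY_SIZE)   /* .align 11 */

/* Byte offset of a vector from the address written to VBAR_ELx. */
static inline uint32_t vec_offset(enum vec_source src, enum vec_type type) {
    return ((uint32_t)src * 4u + (uint32_t)type) * VEC_ENTRY_SIZE;
}
```

The `handle_irq` entry in the table above is the sixth entry, at offset 0x280: an IRQ taken from the current EL while using SP_ELx.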

The vector table address needs to be configured. Each exception level has its own VBAR register and cannot access the registers of a higher level, so a different register needs to be set for EL2 and for EL3.

for EL2

ldr x1, =vector_table
msr VBAR_EL2, x1
isb

for EL3

ldr x1, =vector_table
msr VBAR_EL3, x1
isb

It took some time to figure out why the EL3 variant did not work; only after checking the execution level was the right approach chosen (it was EL2). Also, when I attempted to run the stub, I got EL3 rather than EL2.

IRQ handler

The IRQ handler function is called whenever an IRQ is triggered, so it needs to handle each of the IRQs in some way.

void irq_handler() {
    uint32_t val = ioread32(GICC_IAR);
    uint32_t irq = val & 0x3ff;

    if (irq >= 1020) {
        return;
    }

    pl011_puts("Got IRQ ");
    pl011_putu32(irq);
    pl011_puts("\r\n");
    bcm2835_write32(BCM2835_CS, 0xf);  // clear all four match flags
    timer_start_c1();

    iowrite32(GICC_EOIR, val);
}
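
The handler above treats every interrupt as a timer interrupt. Once more than one source is enabled, a per-ID handler table is one way to keep that manageable. The following is a host-testable sketch, not the code used in these notes; `irq_register`, `irq_dispatch` and the result enum are hypothetical names, and on the board the caller would write the IAR value to GICC_EOIR for every result except `IRQ_SPURIOUS`.

```c
#include <stddef.h>
#include <stdint.h>

#define GIC_MAX_IRQS 512u   /* assumption: same range the init code covers */

typedef void (*irq_fn)(uint32_t irq);

enum irq_result { IRQ_SPURIOUS, IRQ_UNHANDLED, IRQ_HANDLED };

static irq_fn irq_table[GIC_MAX_IRQS];

/* Register a handler; returns 0 on success, -1 if the ID is out of range. */
static inline int irq_register(uint32_t irq, irq_fn fn) {
    if (irq >= GIC_MAX_IRQS)
        return -1;
    irq_table[irq] = fn;
    return 0;
}

/* Dispatch on the value read from GICC_IAR. */
static inline enum irq_result irq_dispatch(uint32_t iar) {
    uint32_t irq = iar & 0x3ffu;
    if (irq >= 1020u)
        return IRQ_SPURIOUS;            /* nothing to acknowledge */
    if (irq >= GIC_MAX_IRQS || irq_table[irq] == NULL)
        return IRQ_UNHANDLED;           /* still needs an EOI */
    irq_table[irq](irq);
    return IRQ_HANDLED;
}

/* Example handler, for illustration only: counts timer C1 interrupts. */
static uint32_t timer_c1_ticks;

static inline void timer_c1_irq(uint32_t irq) {
    (void)irq;
    timer_c1_ticks++;
}
```

Keeping spurious and unhandled apart matters because an acknowledged but unhandled interrupt still has to be completed with an EOI, while a spurious one must not be.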

Enabling global interrupts

enable_global_interrupts:
    msr daifclr, #7
    dmb osh
    ret
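
The immediate in `msr daifclr, #7` is a 4-bit mask where bit 3 is D (debug), bit 2 is A (SError), bit 1 is I (IRQ) and bit 0 is F (FIQ). The same flags sit in bits [9:6] of the DAIF register. The sketch below only models the effect in plain C so the bit arithmetic can be checked on a host; the macro and function names are made up for the example.

```c
#include <stdint.h>

/* Bits of the 4-bit immediate taken by "msr daifset/daifclr, #imm". */
#define DAIF_IMM_D 0x8u   /* debug exceptions */
#define DAIF_IMM_A 0x4u   /* SError */
#define DAIF_IMM_I 0x2u   /* IRQ */
#define DAIF_IMM_F 0x1u   /* FIQ */

/* The same four flags sit in bits [9:6] of the DAIF system register. */
#define DAIF_REG_SHIFT 6u

/* Model of "msr daifclr, #imm": clear (unmask) the selected flags. */
static inline uint64_t daif_after_clr(uint64_t daif, uint32_t imm) {
    return daif & ~((uint64_t)(imm & 0xfu) << DAIF_REG_SHIFT);
}
```

So `#7` unmasks SError, IRQ and FIQ and leaves debug exceptions masked, while `#2` would unmask only IRQ.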

Summary

Many parts need to be combined to make IRQs work:

  1. Setting up the vector table
  2. Write vector table address
  3. Enable the interrupts in GIC
  4. Enable global interrupts
  5. Set timer
  6. Handle interrupt

Some parts still need to be cleaned up.

AI usage

Some help from LLMs was needed to figure out why the interrupts did not fire.

The LLM's first attempt was to blame the BCM timer IRQ numbers: it tried to work out the IRQs from the Linux kernel and claimed they were wrong. Then it tried to implement the ARM generic timer instead, which was not the goal. It also always assumed that the execution level was EL3. Only after trying to load an ARMv8 stub did it become clear that the execution level was actually EL2: the armstub failed and crashed the kernel, and with it the level was reported as EL3 rather than EL2.

After this things got better, but there was still the issue with the IAR register always reading 0x3ff, and no internet search showed how to deal with it. One of the things the LLM added was a listing of all pending IRQs. Because the config was enabling all IRQs, three extra IRQs popped up: 129, 141 and 260. After checking the Linux kernel, some of them seem to be related to graphics; the default config.txt has an HDMI screen set up, so these come from the Raspberry Pi firmware. They were the reason IAR read 0x3ff when the kernel was loaded a second time, which could only be fixed by a restart.

Searching gave nothing on how to deal with it, but the LLM suggested resetting the APR registers; I can't find much documentation on those. It also suggested the IRQ drain loop. With both in place, the spurious interrupt state could be cleared before loading the kernel. Without the LLM, I disabled all interrupts and enabled only the ones for the timer, and things started to go well.

ARMv8 stubs

The Raspberry Pi provides an option to boot custom armstubs.

I tried to modify the stub to update the registers for the Pi 5, but after loading the stub I don't think it worked properly.

Source

The source below does not include the UART driver, linker script and Makefiles, as those were explored in previous notes.

timer-bcm2835.h

#ifndef __TIMER_BCM2835_H
#define __TIMER_BCM2835_H

#include "io.h"

//typedef unsigned char uint8_t;
//typedef unsigned short uint16_t;
//typedef unsigned int uint32_t;

//#define ARM_IO_BASE 0x107C000000
//#define RP1_IO_BASE 0x1c00000000UL
#define TM_BCM2835_REG_OFFSET (ARM_IO_BASE+0x7c003000)


#define BCM2835_CS  0x00
#define BCM2835_CLO 0x04
#define BCM2835_CHI 0x08
#define BCM2835_C0  0x0c
#define BCM2835_C1  0x10
#define BCM2835_C2  0x14
#define BCM2835_C3  0x18

#define bcm2835_read32(offset) (*(volatile uint32_t *)(TM_BCM2835_REG_OFFSET+offset))
#define bcm2835_write32(offset,val) (*(volatile uint32_t *)(TM_BCM2835_REG_OFFSET+offset)) = val

#define TIMER_INTERVAL 1000000

#define GIC_SPI_IRQ_BCM2835_C0 96
#define GIC_SPI_IRQ_BCM2835_C1 97
#define GIC_SPI_IRQ_BCM2835_C2 98
#define GIC_SPI_IRQ_BCM2835_C3 99



void timer_set_cmp1(uint32_t val);
void timer_get_cmp1();
void timer_get_cnt();
void timer_status();
void timer_clr_stat_c0();
void timer_clr_stat_c1();
void timer_clr_stat();
void timer_start_c0();
void timer_start_c1();
void timer_start_c3();

#endif

timer-bcm2835.c

#include "timer-bcm2835.h"
#include "pl011.h"

void timer_get_cnt() {
    uint32_t lo = bcm2835_read32(BCM2835_CLO);
    uint32_t hi = bcm2835_read32(BCM2835_CHI);
    pl011_putu32(hi);
    pl011_putu32(lo);
    pl011_puts("\r\n");
}

void timer_status() {
    uint32_t stat = bcm2835_read32(BCM2835_CS);
    pl011_puts("timer status ");
    pl011_putu32(stat);
    pl011_puts("\r\n");
}

void timer_clr_stat_c1() {
    bcm2835_write32(BCM2835_CS,0x2);
}
void timer_clr_stat_c0() {
    bcm2835_write32(BCM2835_CS,0x1);
}

void timer_clr_stat() {
    bcm2835_write32(BCM2835_CS,0xF);
}

void timer_start_c0() {
    uint32_t t_lo = bcm2835_read32(BCM2835_CLO);
    t_lo += TIMER_INTERVAL;
    bcm2835_write32(BCM2835_C0, t_lo);
}

void timer_start_c1() {
    uint32_t t_lo = bcm2835_read32(BCM2835_CLO);
    t_lo += TIMER_INTERVAL;
    bcm2835_write32(BCM2835_C1, t_lo);
}

void timer_start_c3() {
    uint32_t t_lo = bcm2835_read32(BCM2835_CLO);
    t_lo += TIMER_INTERVAL;
    bcm2835_write32(BCM2835_C3, t_lo);
}

gic400.h

#ifndef __GIC400_H
#define __GIC400_H

#include "io.h"

#define GIC400_REG_OFFSET (ARM_IO_BASE+0x0)

#define GIC400_GICD_BASE (GIC400_REG_OFFSET+0x7fff9000)
#define GIC400_GICC_BASE (GIC400_REG_OFFSET+0x7fffa000)

#define GICD_CTRL			(GIC400_GICD_BASE+0x0)
#define GICD_IDR 			(GIC400_GICD_BASE+0x8)
#define GICD_IGROUPR(n)     (GIC400_GICD_BASE+0x080+4*(n))
#define GICD_ISENABLER(n)	(GIC400_GICD_BASE+0x100+4*(n))
#define GICD_ICENABLER(n)	(GIC400_GICD_BASE+0x180+4*(n))
#define GICD_ISPENDR(n)     (GIC400_GICD_BASE+0x200+4*(n))
#define GICD_ICPENDR(n)		(GIC400_GICD_BASE+0x280+4*(n))
#define GICD_ISACTIVER(n)   (GIC400_GICD_BASE+0x300+4*(n))
#define GICD_ICACTIVER(n)   (GIC400_GICD_BASE+0x380+4*(n))
#define GICD_IPRIORITYR(n)	(GIC400_GICD_BASE+0x400+4*(n))
#define GICD_ITARGETSR(n)	(GIC400_GICD_BASE+0x800+4*(n))
#define GICD_ICFGR(n)		(GIC400_GICD_BASE+0xC00+4*(n))

#define GICC_CTRL (GIC400_GICC_BASE+0x0)
#define GICC_PMR  (GIC400_GICC_BASE+0x4)
#define GICC_IAR  (GIC400_GICC_BASE+0xC)
#define GICC_EOIR (GIC400_GICC_BASE+0x10)
#define GICC_RPR  (GIC400_GICC_BASE+0x14)   // running priority (read to diagnose)
#define GICC_APR0 (GIC400_GICC_BASE+0xD0)   // active priorities (write 0 to unstick)
#define GICC_APR1 (GIC400_GICC_BASE+0xD4)
#define GICC_APR2 (GIC400_GICC_BASE+0xD8)
#define GICC_APR3 (GIC400_GICC_BASE+0xDC)


void gic400_init();
void gic400_reset();
uint32_t gic400_version();
void gic400_enable_irq(unsigned int irqn);
void gic400_set_pending_irq(unsigned int irqn);
void gicc_drain(void);

#endif

gic400.c

#include <gic400.h>

void gicc_drain(void) {
	uint32_t val;
	while (((val = ioread32(GICC_IAR)) & 0x3ff) < 1020)
		iowrite32(GICC_EOIR, val);
}

void gic400_init() {
	int i;

	iowrite32(GICD_CTRL, 0);
	iowrite32(GICC_CTRL, 0);

	for (i = 0; i < 16; i++) {
		iowrite32(GICD_ICENABLER(i), 0xffffffff);  // disable
		iowrite32(GICD_ICPENDR(i),   0xffffffff);  // clear pending
		iowrite32(GICD_ICACTIVER(i), 0xffffffff);  // clear active
	}

	for (i = 0; i < 16; i++)
		iowrite32(GICD_IGROUPR(i), 0xffffffff);    // all -> Group 1 (NS)
	for (i = 0; i < 128; i++) {
		iowrite32(GICD_IPRIORITYR(i), 0xa0a0a0a0); // mid priority
		iowrite32(GICD_ITARGETSR(i),  0x01010101); // -> CPU0
	}
	for (i = 0; i < 32; i++)
		iowrite32(GICD_ICFGR(i), 0x55);            // (SPIs: level-triggered)

	iowrite32(GICD_CTRL, 0x1);

	iowrite32(GICC_PMR, 0xff);
	iowrite32(GICC_CTRL, 0x3);                      // EnableGrp0 | EnableGrp1
	
	iowrite32(GICC_APR0, 0);
	iowrite32(GICC_APR1, 0);
	iowrite32(GICC_APR2, 0);
	iowrite32(GICC_APR3, 0);

	gicc_drain();
}

uint32_t gic400_version() {
	return ioread32(GICD_IDR);
}

void gic400_enable_irq(unsigned int irqn) {
	iowrite32(GICD_ISENABLER(irqn/32), 1 << (irqn % 32));
}

void gic400_set_pending_irq(unsigned int irqn) {
	iowrite32(GICD_ISPENDR(irqn/32), 1 << (irqn % 32));
}

boot.S

// AArch64 mode

// To keep this in the first portion of the binary.
.section ".text.boot"

// Make _start global.
.globl _start

//    .org 0x80000
// Entry point for the kernel. Registers:
// x0 -> 32 bit pointer to DTB in memory (primary core only) / 0 (secondary cores)
// x1 -> 0
// x2 -> 0
// x3 -> 0
// x4 -> 32 bit kernel entry point, _start location
_start:
    // set stack before our code
    ldr     x5, =_start
    mov     sp, x5

    // clear bss
    ldr     x5, =__bss_start
    ldr     w6, =__bss_size
1:  cbz     w6, 2f
    str     xzr, [x5], #8
    sub     w6, w6, #1
    cbnz    w6, 1b

    

    // jump to C code, should not return
2:  
    //mrs x0, scr_el3
    //orr x0, x0, #(1<<1)   // SCR_EL3.IRQ — take physical IRQ to EL3
    //msr scr_el3, x0
    //isb

    //set vector table (EL2!)
    ldr x1, =vector_table
    msr VBAR_EL2, x1
    isb

    // Route physical IRQ to EL2 (HCR_EL2.IMO = bit 4)
    mrs x0, hcr_el2
    orr x0, x0, #(1 << 4)     // IMO
    msr hcr_el2, x0
    isb

    bl      main
    // for failsafe, halt this core
halt:
    wfe
    b halt

.align 11
sync_exception_handler:
    bl kernel_panic
    eret

.macro ventry label
    .align 7
    b \label
.endm

dummy_vector:
    b kernel_panic

.align 11
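vector_table:
    // current EL with SP_EL0: sync, IRQ, FIQ, SError
    ventry sync_exception_handler
    ventry handle_irq
    ventry dummy_vector
    ventry dummy_vector

    // current EL with SP_ELx (EL2h here): sync, IRQ, FIQ, SError
    ventry dummy_vector
    ventry handle_irq
    ventry dummy_vector
    ventry dummy_vector

    // lower EL using AArch64: sync, IRQ, FIQ, SError
    ventry sync_exception_handler
    ventry handle_irq
    ventry dummy_vector
    ventry dummy_vector

    // lower EL using AArch32: sync, IRQ, FIQ, SError
    ventry dummy_vector
    ventry handle_irq
    ventry dummy_vector
    ventry dummy_vector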

irq.S

.global enable_global_interrupts

enable_global_interrupts:
    msr daifclr, #7
    dmb osh
    ret

//push all the registers just after the stack pointer
.macro m__save_curr_context
    //reserving 16*16 bytes of stack
    sub sp, sp, #16*16  //31 regs (x0-x30), 8 bytes each, rounded up to 256 bytes
    stp x0, x1, [sp, #16*0]
    stp x2, x3, [sp, #16*1]
    stp x4, x5, [sp, #16*2]
    stp x6, x7, [sp, #16*3]
    stp x8, x9, [sp, #16*4]
    stp x10, x11, [sp, #16*5]
    stp x12, x13, [sp, #16*6]
    stp x14, x15, [sp, #16*7]
    stp x16, x17, [sp, #16*8]
    stp x18, x19, [sp, #16*9]
    stp x20, x21, [sp, #16*10]
    stp x22, x23, [sp, #16*11]
    stp x24, x25, [sp, #16*12]
    stp x26, x27, [sp, #16*13]
    stp x28, x29, [sp, #16*14]
    str x30, [sp, #16*15]
.endm

//undo previous operation and start restoring
.macro m__resume_from_context
    ldp x0, x1, [sp, #16*0]
    ldp x2, x3, [sp, #16*1]
    ldp x4, x5, [sp, #16*2]
    ldp x6, x7, [sp, #16*3]
    ldp x8, x9, [sp, #16*4]
    ldp x10, x11, [sp, #16*5]
    ldp x12, x13, [sp, #16*6]
    ldp x14, x15, [sp, #16*7]
    ldp x16, x17, [sp, #16*8]
    ldp x18, x19, [sp, #16*9]
    ldp x20, x21, [sp, #16*10]
    ldp x22, x23, [sp, #16*11]
    ldp x24, x25, [sp, #16*12]
    ldp x26, x27, [sp, #16*13]
    ldp x28, x29, [sp, #16*14]
    ldr x30, [sp, #16*15]
    add sp, sp, #16*16  //reclaim stack
.endm

.global handle_irq
handle_irq:
    m__save_curr_context
    bl irq_handler
    m__resume_from_context
    eret

kernel.c

#include <pl011.h>
#include <timer-bcm2835.h>
#include <gic400.h>

extern void enable_global_interrupts();

void kernel_panic(void) __attribute__ ((noreturn));

void kernel_panic() {
    pl011_puts("Kernel panic\r\n");
    while(1) {
    }
}

volatile uint32_t spurious_count = 0;

void irq_handler() {
    uint32_t val = ioread32(GICC_IAR);
    uint32_t irq = val & 0x3ff;

    if (irq >= 1020) {
        spurious_count++;
        return;
    }

    pl011_puts("Got IRQ ");
    pl011_putu32(irq);
    pl011_puts("\r\n");
    bcm2835_write32(BCM2835_CS, 0xf);  // clear all four match flags
    timer_start_c1();

    iowrite32(GICC_EOIR, val);
}

static inline uint32_t current_el(void) {
    uint64_t el;
    __asm__ volatile("mrs %0, CurrentEL" : "=r"(el));
    return (uint32_t)((el >> 2) & 0x3);
}

static void gic_drain(void) {
    uint32_t val;
    while (((val = ioread32(GICC_IAR)) & 0x3ff) < 1020)
        iowrite32(GICC_EOIR, val);
    
}

static void put_dec(uint32_t v) {
    char buf[10];
    int i = 0;
    if (v == 0) { pl011_putc('0'); return; }
    while (v) { buf[i++] = '0' + (v % 10); v /= 10; }
    while (i) pl011_putc(buf[--i]);
}

static void scan_pending(void) {
    for (int n = 0; n < 16; n++) {
        uint32_t p = ioread32(GICD_ISPENDR(n));
        for (int b = 0; b < 32; b++) {
            if ((p >> b) & 1) {
                pl011_puts("  pending IRQ ");
                put_dec(32 * n + b);
                pl011_puts("\r\n");
            }
        }
    }
}

void main()
{
    pl011_puts("----------------------------\r\n");
    pl011_puts("Running at EL");
    pl011_putu32(current_el());
    pl011_puts("\r\n");

    gic400_init();

    bcm2835_write32(BCM2835_CS, 0xf);
    gic400_enable_irq(97);
    gic400_enable_irq(99);
    enable_global_interrupts();
    timer_start_c1();
    timer_start_c3();
    
    for (volatile int d = 0; d < 4000000; d++);

    pl011_puts("CS = "); 
    pl011_putu32(bcm2835_read32(BCM2835_CS)); 
    pl011_puts("\r\n");

    scan_pending();

    while (1) {
        __asm__ volatile("wfi");
    }
}

Links

https://www.raspberrypi.com/documentation/computers/config_txt.html
https://forums.raspberrypi.com/viewtopic.php?t=371974
https://docs.amd.com/r/en-US/ug585-zynq-7000-SoC-TRM/Shared-Peripheral-Interrupts-SPI?tocId=jlBXQc~lyPkRtaPCJuxp2Q
https://developer.arm.com/documentation/ihi0048/b/Programmers--Model/CPU-interface-register-descriptions/Interrupt-Acknowledge-Register--GICC-IAR
https://www.scs.stanford.edu/~zyedidia/docs/arm/armv8_baremetal.pdf
https://stackoverflow.com/questions/66962053/why-kernels-arm64-vector-table-is-aligned-with-11
https://github.com/raspberrypi/tools/blob/master/armstubs/armstub8.S
https://github.com/rsta2/circle/blob/master/boot/config64.txt
https://github.com/rsta2/circle/blob/master/boot/armstub/armstub8.S
https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
https://github.com/Utsav-Agarwal/HobOS/blob/main/boot.S
https://developer.arm.com/documentation/ddi0471/b/
https://developer.arm.com/documentation/ihi0048/b/Programmers--Model/CPU-interface-register-descriptions/CPU-Interface-Control-Register--GICC-CTLR
https://developer.arm.com/documentation/ihi0048/b/Programmers--Model/CPU-interface-register-descriptions/Interrupt-Priority-Mask-Register--GICC-PMR
https://developer.arm.com/documentation/ihi0048/b/Programmers--Model/Distributor-register-descriptions/Distributor-Control-Register--GICD-CTLR
https://developer.arm.com/documentation/ihi0048/b/Programmers--Model/Distributor-register-descriptions/Interrupt-Group-Registers--GICD-IGROUPRn
https://developer.arm.com/documentation/ddi0601/2025-12/External-Registers/GICD-ITARGETSR-n---Interrupt-Processor-Targets-Registers
/writeup/raspberry5_baremetal_uart.md