Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle signals properly #529

Merged
merged 1 commit into from
Dec 30, 2024
Merged

Conversation

ChinYikMing
Copy link
Collaborator

@ChinYikMing ChinYikMing commented Dec 26, 2024

flowchart TB
    X[page faults] --> trap --> do_page_fault --> lock_mm_and_find_vma --> find_vma
    find_vma -- valid VMA --> A[Map the VMA to page table and resume the userspace]
    find_vma -- NULL --> bad_area_nosemaphore
    bad_area_nosemaphore -- from kernel space --> die_kernel_fault
    bad_area_nosemaphore -- from user space --> do_trap --> B[Checks whether the signal should be ignored or blocked, and if not, adds it to the task's pending signal list.]
                         
    ret_from_exception --> resume_userspace_slow --> do_work_pending -- determined by _TIF_WORK_MASK --> do_signal --> handle_signal --> setup_rt_frame --> C[Update SEPC CSR if needed] --> resume_userspace
Loading

The above flowchart is the simplified version of handling page faults.

Page faults trigger a trap, which is handled by do_page_fault(). This function calls lock_mm_and_find_vma() to locate and validate the virtual memory area (VMA), returning the VMA if valid, or NULL otherwise. Typically, attempts to read or write to a NULL VMA result in a NULL return. If the VMA is invalid, bad_area_nosemaphore() is invoked, which checks whether the fault originated in kernel or user space.

For user-space faults, a SIGSEGV signal is sent to the user process via do_trap(), which determines if the signal should be ignored or blocked, and if not, adds it to the task's pending signal list. Kernel-space faults cause the kernel to crash via die_kernel_fault().

Before returning to user space (via the resume_userspace label), pending work (indicated by the _TIF_WORK_MASK mask) is processed by do_work_pending(). Signals are handled by do_signal(), which in turn calls handle_signal(). handle_signal() creates a signal handler frame that will be jumped to upon returning to user space. This frame creation process might modifies the Control and Status Register (CSR) SEPC. If there are a signal pending, the SEPC CSR overwritten the original trap/fault PC. This caused an assertion failure in get_ppn_and_offset() when running the vi program, reported in [1].

To address this, a variable last_csr_sepc was introduced to store the original SEPC CSR value before entering the trap path. After returning to user space, last_csr_sepc is compared with the current SEPC CSR value. If they differ, the fault ld/st instruction returns early and jumps to the signal handler frame.

This commit prevents emulator crashes when the guest OS accesses invalid memory. Consequently, reads or writes to a NULL value now correctly result in a segmentation fault. In addition, two user-space programs: mem_null_read and mem_null_write are bundled into the rootfs for verification.

mem_null_read and mem_null_write source codes:

mem_null_read

#include <stdio.h>

int main(){
	char *ptr = NULL;
	printf("*ptr: %c\n", *ptr);
	return 0;
}

mem_null_write

#include <stdio.h>

int main(){
        char *ptr = NULL;
        *ptr = 'a';

        return 0;
}

Original behaviour

  1. Run system emulation
$ make system ENABLE_SYSTEM=1 -j$(nproc)
  1. Emulator crashes
$ mem_null_read
  1. Emulator crashes
$ mem_null_write
  1. Emulator crashes
$ vi

Patch Reproduce / Testing procedure:

  1. Run system emulation
$ make system ENABLE_SYSTEM=1 -j$(nproc)
  1. NULL read causes SIGSEGV without crashing emulator
$ mem_null_read
  1. NULL write causes SIGSEGV without crashing emulator
$ mem_null_write
  1. w/o filename causes SIGSEGV without crashing emulator
$ vi

[1] #508

vi Crach Discussion

Running vi without a filename results in a NULL pointer access (behavior that is implementation-defined), correctly triggering a segmentation fault. Dmesg confirms that the faulting address is indeed NULL. However, running vi without a filename in RVVM and semu does not produce a SIGSEGV fault. My investigation revealed that semu simply increments the program counter (PC) by 4, effectively skipping the faulting instruction. I'm unsure if this is a valid approach. What I know is that read/write access to NULL should generate a non-maskable SIGSEGV signal, consistent with standard Unix signal.

Dmesg after running vi (getting SIGSEGV instead of crashing w/ the patch):

[   10.374478] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   10.374812] Oops [#2]
[   10.374962] Modules linked in:
[   10.375152] CPU: 0 PID: 61 Comm: vi Tainted: G      D            6.1.119 #1
[   10.375504] Hardware name: rv32emu (DT)
[   10.375696] epc : strncpy_from_user+0x6c/0x190
[   10.375986]  ra : getname_flags+0x74/0x194
[   10.376269] epc : c01bd3a4 ra : c00ba71c sp : c0989ea0
[   10.376553]  gp : c0476320 tp : c0aa0580 t0 : 00000ff0
[   10.376831]  t1 : fefefeff t2 : 691a2420 s0 : c0989eb0
[   10.377115]  s1 : c0882000 a0 : 00000000 a1 : 00000000
[   10.377377]  a2 : 00000ff0 a3 : 00000000 a4 : 00000000
[   10.377632]  a5 : 00000ff0 a6 : 0000146d a7 : c0882010
[   10.377904]  s2 : c0989f38 s3 : 00000000 s4 : 00000000
[   10.378167]  s5 : c047752c s6 : 00000000 s7 : 00000000
[   10.378429]  s8 : 00001000 s9 : 00000002 s10: 00000014
[   10.378688]  s11: ffffffff t3 : 80808080 t4 : 00040000
[   10.378968]  t5 : 00000005 t6 : 00000ff0
[   10.379174] status: 00040120 badaddr: 00000000 cause: 0000000d
[   10.379455] [<c01bd3a4>] strncpy_from_user+0x6c/0x190
[   10.379790] [<c00ba71c>] getname_flags+0x74/0x194
[   10.380117] [<c00ba88c>] getname+0x1c/0x2c
[   10.380421] [<c00a9f7c>] do_sys_openat2+0x4c/0xf0
[   10.380776] [<c00aa11c>] do_sys_open+0x40/0x58
[   10.381122] [<c00aa194>] sys_openat+0x24/0x34
[   10.381467] [<c00023c0>] ret_from_syscall+0x0/0x4
[   10.381788] ---[ end trace 0000000000000000 ]---
Segmentation fault
#

Copy link
Contributor

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarks

Benchmark suite Current: 8e08387 Previous: f554a00 Ratio
Dhrystone 1334 Average DMIPS over 10 runs 1287 Average DMIPS over 10 runs 0.96
Coremark 970.171 Average iterations/sec over 10 runs 962.761 Average iterations/sec over 10 runs 0.99

This comment was automatically generated by workflow using github-action-benchmark.

src/emulate.c Show resolved Hide resolved
@ChinYikMing ChinYikMing force-pushed the handle-signal branch 2 times, most recently from 6a5733b to b4c467f Compare December 27, 2024 03:39
src/system.c Outdated Show resolved Hide resolved
@ChinYikMing ChinYikMing force-pushed the handle-signal branch 3 times, most recently from c9510ce to 6542e92 Compare December 27, 2024 04:10
@jserv jserv added this to the release-2025.1 milestone Dec 27, 2024
src/system.c Outdated
} \
} while (0)

#define CLEAR_PENDING_SIGNAL_W() \
Copy link
Collaborator

@RinHizakura RinHizakura Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we just introduce these CLEAR_PENDING_SIGNAL_* macros for repeated code? I think the readability is bad to return in a macro that is used as a function-like call routine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I think we can remain return in CLEAR_PENDING_SIGNAL_* and mention they might early return using a comment at the call point of CLEAR_PENDING_SIGNAL_*?

Copy link
Collaborator

@RinHizakura RinHizakura Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any disadvantage to removing these macros and just putting them inline? It can be just a few lines.

if (need_handle_signal = (rv->csr_sepc != rv->last_csr_sepc))
    return 0;

...

if (need_handle_signal = (rv->csr_sepc != rv->last_csr_sepc))
    return;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any disadvantage to removing these macros and just putting them inline? It can be just a few lines.

if (need_handle_signal = (rv->csr_sepc != rv->last_csr_sepc))
    return 0;

...

if (need_handle_signal = (rv->csr_sepc != rv->last_csr_sepc))
    return;

Looking these few LoC is hard to understand (maybe for newcomer), thus abstracting them into a macro and comment around the macro make things clearer.

Copy link
Collaborator

@RinHizakura RinHizakura Dec 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about inline this directly?

need_handle_signal = (rv->csr_sepc != rv->last_csr_sepc);
if (need_handle_signal)
    return 0;

You can replace need_handle_signal = (rv->csr_sepc != rv->last_csr_sepc) with an abstract if you think it is needed. I am just thinking that we should avoid return in a function-like macro as possible because it is not clear that the function will return there at first glance.

@vacantron
Copy link
Collaborator

Should we stack the exception handling (i.e. nested exception handling)? By default, RISC-V does not accept any incoming interrupt in the interrupt service routine (ISR) by disabling m/sie, however, there is no same mechanism when handling the exceptions.

@ChinYikMing
Copy link
Collaborator Author

ChinYikMing commented Dec 28, 2024

Should we stack the exception handling (i.e. nested exception handling)?

Do you mean nested signals or others since the changes in this commit is signals?

By default, RISC-V does not accept any incoming interrupt in the interrupt service routine (ISR) by disabling m/sie

Yes.

however, there is no same mechanism when handling the exceptions.

If you mean regular exceptions, we did store current sie to spie then turning of sie when trap. Check here.

@vacantron
Copy link
Collaborator

Do you mean nested signals or others since the changes in this commit is signals?

Yes. There might be more than two-level nested signals. The oldest m/sepc might be forgotten if we only store two of them.

@ChinYikMing
Copy link
Collaborator Author

Do you mean nested signals or others since the changes in this commit is signals?

Yes. There might be more than two-level nested signals. The oldest m/sepc might be forgotten if we only store two of them.

The signal is a userspace stack frame (refer to the commit message and the flowchart) that is created by the kernel. Thus, when multiple signals arrive, the kernel simply creates multiple signal frames and redirects the PC to handle the signal at a safe point. The return address of old signal frame will be saved in the struct sigcontext and restored via sys_sigreturn or sys_rt_sigreturn.

@ChinYikMing ChinYikMing force-pushed the handle-signal branch 3 times, most recently from 8ee7304 to cb5eb10 Compare December 30, 2024 02:10
src/system.c Outdated
* cannot always redo. For example, invalid memory access causes SIGSEGV.
*/
extern bool need_handle_signal;
#define CHECK_PENDING_SIGNAL() \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify rv parameter in this function-like macro. That is,

#define CHECK_PENDING_SIGNAL(rv) \
    ...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using rv, it may be more appropriate to use need_handle_signal as the parameter of the macro. This is because rv has only a single instance, and changes to its members are part of the macro's internal logic.

On the other hand, need_handle_signal is explicitly modified by the macro and directly determines whether the function shall return. This approach improves intuitiveness and readability. Consider the following revised version:

#define CHECK_PENDING_SIGNAL(flag)        \
do {                                                                 \
    flag = (rv->csr_sepc != rv->last_csr_sepc); \
} while (0);

Page faults trigger a trap, which is handled by do_page_fault(). This
function calls lock_mm_and_find_vma() to locate and validate the virtual
memory area (VMA), returning the VMA if valid, or NULL otherwise.
Typically, attempts to read or write to a NULL VMA result in a NULL
return. If the VMA is invalid, bad_area_nosemaphore() is invoked, which
checks whether the fault originated in kernel or user space.

For user-space faults, a SIGSEGV signal is sent to the user process via
do_trap(), which determines if the signal should be ignored or blocked,
and if not, adds it to the task's pending signal list. Kernel-space
faults cause the kernel to crash via die_kernel_fault().

Before returning to user space (via the resume_userspace label), pending
work (indicated by the _TIF_WORK_MASK mask) is processed by
do_work_pending(). Signals are handled by do_signal(), which in turn
calls handle_signal(). handle_signal() creates a signal handler frame
that will be jumped to upon returning to user space. This frame creation
process might modifies the Control and Status Register (CSR) SEPC.
If there are a signal pending, the SEPC CSR overwritten the original
trap/fault PC. This caused an assertion failure in get_ppn_and_offset()
when running the vi program, reported in [1].

To address this, a variable last_csr_sepc was introduced to store the
original SEPC CSR value before entering the trap path. After returning
to user space, last_csr_sepc is compared with the current SEPC CSR
value. If they differ, the fault ld/st instruction returns early and
jumps to the signal handler frame.

This commit prevents emulator crashes when the guest OS accesses invalid
memory. Consequently, reads or writes to a NULL value now correctly
result in a segmentation fault. In addition, two user-space programs:
mem_null_read and mem_null_write are bundled into the rootfs for
verification.

Original behaviour:
1. $ make system ENABLE_SYSTEM=1 -j$(nproc)   # run system emulation
2. $ mem_null_read                            # Emulator crashes
3. $ mem_null_write                           # Emulator crashes
4. $ vi                                       # Emulator crashes

Patch Reproduce / Testing procedure:
1. $ make system ENABLE_SYSTEM=1 -j$(nproc)   # run system emulation
2. $ mem_null_read       # NULL read causes SIGSEGV with no crashing
3. $ mem_null_write      # NULL write causes SIGSEGV with no crashing
4. $ vi                  # w/o filename causes SIGSEGV with no crashing

[1] sysprog21#508
@jserv jserv merged commit 5a04f3a into sysprog21:master Dec 30, 2024
8 checks passed
@jserv
Copy link
Contributor

jserv commented Dec 30, 2024

Thank @ChinYikMing for contributing!

@ChinYikMing ChinYikMing deleted the handle-signal branch December 30, 2024 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants