Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement memory function lowering and HuC6280 block copy handling #311

Merged

Conversation

asiekierka
Copy link
Contributor

@asiekierka asiekierka commented Jul 1, 2023

My first look at the Legalizer pass. Resolves #207

Also adds support for inlining G_MEMCPY and G_MEMSET using HuC6280's TII block copy instructions, complete with all the special handling this needs to not be too miserable an experience for the end user.

TODO:

  • Learn more about the Combiner pass and decide if HuC block copy emitting should continue to live in a separate pass of its own.
  • Write tests.
  • Give the code one more look.

TODO (after this PR):

  • Implement G_MEMMOVE HuC6280 lowering - only when are known at compile time and can either be proven not to overlap or the overlap direction can be discerned.

Notes:

  • Memory instructions have sub-optimal/curious lowering for volatile copies - they are emitted as batches of G_LOAD followed by G_STORE, which prohibits interleaving them by the MachineScheduler. More interestingly, this means that a memcpy(volatile, volatile, 4); (lowered) will read and write values in a different order than memcpy(volatile, volatile, 3); (libcall).

@asiekierka asiekierka force-pushed the upstream/pr/memcpy-inline branch 2 times, most recently from 4d6f83a to 9df5c7f Compare July 1, 2023 19:16
@mysterymath
Copy link
Member

Memory instructions still have bad lowering - they are emitted as batches of G_LOAD followed by G_STORE, which leads to the register allocator putting all of them in the zero page.

This is very surprising; after instructions are selected for these, the MachineScheduler should interleave them to reduce the register pressure. (We don't have a pipeline, so MachineScheduler is the dedicated "reorder instructions to decrease register pressure" pass.) We should definitely look into why it isn't; there may be something slightly different we can emit to get it to do so, or we may be hitting a limitation of this pass that we can relax.

@asiekierka
Copy link
Contributor Author

asiekierka commented Jul 1, 2023

Ah, I see why.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define L 4
#define a ((volatile uint8_t*) 0x2200)
#define b ((volatile uint8_t*) 0x2300)

int main() {
  memcpy(b, a, L);
  return 0;
}

The key is the volatile - normally, this wouldn't matter for a memcpy call, but when InstCombinerPass replaces such a memcpy with an s32 G_LOAD/G_STORE, it preserves the volatile attribute of the pointer - it is also preserved when the G_LOAD/G_STORE is legalized into multiple s8 G_LOAD/G_STOREs.

So the issue happens at the Legalizer pass, and in what is a fairly narrow edge case.

(This also means a memcpy(volatile, volatile, 3); will load and store in a different order than memcpy(volatile, volatile, 4);, which is an interesting side effect.)

@asiekierka
Copy link
Contributor Author

I've updated the notes in the PR comment to clarify the situation. Sorry!

@asiekierka asiekierka force-pushed the upstream/pr/memcpy-inline branch from 8b6961d to b8e5bd3 Compare July 1, 2023 21:45
@asiekierka asiekierka force-pushed the upstream/pr/memcpy-inline branch from b8e5bd3 to 39e7468 Compare July 2, 2023 05:04
@asiekierka asiekierka force-pushed the upstream/pr/memcpy-inline branch from 39e7468 to a2b36d7 Compare July 2, 2023 06:05
@asiekierka
Copy link
Contributor Author

Okay, everything should be taken care of now! (Probably "almost everything", but.)

@asiekierka asiekierka marked this pull request as ready for review July 6, 2023 19:11
Copy link
Member

@mysterymath mysterymath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through everything except the legalizer changes; I'll tackle that a bit later.

llvm/lib/Target/MOS/MOSCombiner.cpp Outdated Show resolved Hide resolved
auto LengthReg = MRI.createGenericVirtualRegister(LLT::scalar(16));
B.buildConstant(LengthReg, Length);

B.buildInstr(MOS::G_MEMCPY_INLINE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

B.buildMemTransferInst(MOS::G_MEMCPY_INLINE, MI.getOperand(1), MIPrev->getOperand(1), B.buildConstant(LLT::scalar(16), Length), DstMemOperand, SrcMemOperand);

Copy link
Contributor Author

@asiekierka asiekierka Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd think, but:

llc: [...]/llvm-mos/llvm/lib/CodeGen/MachineInstr.cpp:228: void llvm::MachineInstr::addOperand(llvm::MachineFunction &, const llvm::MachineOperand &): Assertion `(MCID->isVariadic() || OpNo < MCID->getNumOperands() || Op.isValidExcessOperand()) && "Trying to add an operand to a machine instr that is already done!"' failed.

...

 #1 0x00007fc48cee7bbb PrintStackTraceSignalHandler(void*) /home/asie/Homebrew/MOS/llvm-mos/llvm/lib/Support/Unix/Signals.inc:675:1
 #2 0x00007fc48cee5ac3 llvm::sys::RunSignalHandlers() /home/asie/Homebrew/MOS/llvm-mos/llvm/lib/Support/Signals.cpp:104:5
 #3 0x00007fc48cee8451 SignalHandler(int) /home/asie/Homebrew/MOS/llvm-mos/llvm/lib/Support/Unix/Signals.inc:0:3
 #4 0x00007fc48944fab0 (/usr/lib/libc.so.6+0x39ab0)
 #5 0x00007fc48949f26c (/usr/lib/libc.so.6+0x8926c)
 #6 0x00007fc48944fa08 raise (/usr/lib/libc.so.6+0x39a08)
 #7 0x00007fc489438538 abort (/usr/lib/libc.so.6+0x22538)
 #8 0x00007fc48943845c (/usr/lib/libc.so.6+0x2245c)
 #9 0x00007fc4894483d6 (/usr/lib/libc.so.6+0x323d6)
#10 0x00007fc48d8a97b8 llvm::MachineInstr::addOperand(llvm::MachineFunction&, llvm::MachineOperand const&) /home/asie/Homebrew/MOS/llvm-mos/llvm/lib/CodeGen/MachineInstr.cpp:0:3
#11 0x00007fc48d650c46 llvm::MachineInstrBuilder::addImm(long) const /home/asie/Homebrew/MOS/llvm-mos/llvm/include/llvm/CodeGen/MachineInstrBuilder.h:132:9
#12 0x00007fc48e5f5377 llvm::SrcOp::addSrcToMIB(llvm::MachineInstrBuilder&) const /home/asie/Homebrew/MOS/llvm-mos/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:0:11
#13 0x00007fc48e5f2c07 llvm::MachineIRBuilder::buildInstr(unsigned int, llvm::ArrayRef<llvm::DstOp>, llvm::ArrayRef<llvm::SrcOp>, std::optional<unsigned int>) /home/asie/Homebrew/MOS/llvm-mos/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp:1305:24
#14 0x00007fc48e4aeb07 llvm::CSEMIRBuilder::buildInstr(unsigned int, llvm::ArrayRef<llvm::DstOp>, llvm::ArrayRef<llvm::SrcOp>, std::optional<unsigned int>) /home/asie/Homebrew/MOS/llvm-mos/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp:278:30
#15 0x00007fc48e4be2dc llvm::MachineIRBuilder::buildMemTransferInst(unsigned int, llvm::SrcOp const&, llvm::SrcOp const&, llvm::SrcOp const&, llvm::MachineMemOperand&, llvm::MachineMemOperand&) /home/asie/Homebrew/MOS/llvm-mos/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1989:16
#16 0x00007fc490f28e2a MOSCombinerHelperState::applyLoadStoreToMemcpy(llvm::MachineInstr&, llvm::MachineRegisterInfo&, llvm::MachineIRBuilder&, llvm::GISelChangeObserver&, llvm::MachineInstr*&) const /home/asie/Homebrew/MOS/llvm-mos/llvm/lib/Target/MOS/MOSCombiner.cpp:380:5```

llvm/lib/Target/MOS/MOSCombiner.cpp Outdated Show resolved Hide resolved
auto LengthReg = MRI.createGenericVirtualRegister(LLT::scalar(16));
B.buildConstant(LengthReg, Length);

B.buildInstr(MOS::G_MEMSET)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

B.buildInst(MOS::G_MEMSET, {MI.getOperand(1)}, {B.buildConstant(LLT::scalar(8), Value), B.buildConstant(LLT::scalar(16), Length)}, UINT64_C(0)}).addMemOperand(MI.memoperands(0)[0]));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, but close.

llvm/lib/Target/MOS/MOSCombiner.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSInstrLogical.td Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSMCInstLower.cpp Outdated Show resolved Hide resolved
llvm/test/CodeGen/MOS/combiner.mir Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSLegalizerInfo.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSLegalizerInfo.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSLegalizerInfo.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSLegalizerInfo.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSLegalizerInfo.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSLegalizerInfo.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSLegalizerInfo.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/MOS/MOSLegalizerInfo.cpp Outdated Show resolved Hide resolved
// Note that Descending transfers must be done in backwards order.
if (KnownLen <= BytesPerTransfer) {
uint64_t AdjOfs = Descending ? (KnownLen - 1) : 0;
Builder.buildInstr(MOS::HuCMemcpy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, these instructions load and store memory, so they need their own memory operands, based on those from the original generic instruction. Otherwise, code motion passes will pessimize the aliasing possibilities of the instruction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I think?

@asiekierka asiekierka force-pushed the upstream/pr/memcpy-inline branch from 7a9cc41 to 7c53034 Compare July 7, 2023 13:56
@asiekierka asiekierka force-pushed the upstream/pr/memcpy-inline branch from 11b500f to 3f4f5c9 Compare July 7, 2023 14:26
@mysterymath mysterymath merged commit dbaa6ec into llvm-mos:main Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mem-family instructions have terrible lowering
2 participants