This repository has been archived by the owner on May 9, 2024. It is now read-only.

116 add shared memory support for l0 path #390

Open
wants to merge 9 commits into
base: main
from

Conversation

lmontigny
Contributor

No description provided.

@lmontigny lmontigny linked an issue Apr 17, 2023 that may be closed by this pull request
@lmontigny
Contributor Author

Facing an issue with the RuntimeFunctions.bc file for L0:

$ ./Tests/GpuSharedMemoryTestIntel
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SingleColumn
[ RUN ] SingleColumn.VariableEntries_CountQuery_4B_Group
warning: Linking two modules of different target triples: '/localdisk/lmontign/hdk/omniscidb/build/QueryEngine/RuntimeFunctions.bc' is 'nvptx64-nvidia-cuda' whereas '/localdisk/lmontign/hdk/omniscidb/build/QueryEngine/RuntimeFunctions.bc' is 'spir-unknown-unknown'
InvalidTargetTriple: Expects spir-unknown-unknown or spir64-unknown-unknown. Actual target triple is nvptx64-nvidia-cuda
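
The error above implies that the SPIR-V path accepts only two target triples. A minimal sketch of such a guard follows, which could be run on a bitcode module's triple string before linking. The function name `is_spir_triple` is hypothetical and not part of this code base; only the two accepted triple strings come from the error message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from this repository): returns true only for the
// two triples named in the InvalidTargetTriple message above.
bool is_spir_triple(const std::string& triple) {
  return triple == "spir-unknown-unknown" ||
         triple == "spir64-unknown-unknown";
}
```

With this check, the `nvptx64-nvidia-cuda` triple reported in the warning would be rejected before the two modules are linked.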

@lmontigny
Contributor Author

Solved the previous .bc mismatch.
Now hitting an address space casting issue.

@lmontigny
Contributor Author

The casting issue is still present: %4 = addrspacecast i64* %3 to i64 addrspace(3)*

SPIR-V generation fails here:

  std::unique_ptr<L0DeviceCompilationContext> gpu_context(compile_and_link_gpu_code(
      module_str, module_, l0_mgr_, getWrapperKernel()->getName().str()));

  auto success = writeSpirv(module, opts, ss, err);

It is unclear where the cast is generated in the application.
It is not related to CreatePointerCast; need to double-check GpuSharedMemoryUtils.cpp.
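
One possible reason the `addrspacecast` above cannot be translated is that the two targets number their address spaces differently. The sketch below encodes the conventions as generally understood for LLVM's NVPTX and SPIR targets. The numeric values and the translation rule are assumptions drawn from those conventions, not from this pull request, and `spir_cast_is_translatable` is a hypothetical name.

```cpp
#include <cassert>

// Assumed address-space numbering for each target (general LLVM conventions).
namespace nvptx { enum : unsigned { kGeneric = 0, kGlobal = 1, kShared = 3 }; }
namespace spir {
enum : unsigned { kPrivate = 0, kGlobal = 1, kConstant = 2, kLocal = 3, kGeneric = 4 };
}

// Hypothetical predicate modelling the assumed rule: an addrspacecast is
// expressible in SPIR-V only as a conversion to or from the generic address
// space, and never when the other side is the constant address space.
bool spir_cast_is_translatable(unsigned src_as, unsigned dst_as) {
  if (src_as == dst_as) {
    return false;  // same address space: not an addrspacecast
  }
  const bool src_generic = src_as == spir::kGeneric;
  const bool dst_generic = dst_as == spir::kGeneric;
  if (!src_generic && !dst_generic) {
    return false;  // e.g. private (0) -> local (3)
  }
  const unsigned other = src_generic ? dst_as : src_as;
  return other != spir::kConstant;
}
```

Under this model, a cast from address space 0 to address space 3 is a generic-to-shared conversion on NVPTX but a private-to-local conversion on SPIR, and the latter would be rejected.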

@lmontigny
Contributor Author

The cast for shared memory is generated here, with address_space = 3:

  auto ptr_type = [&context](const size_t slot_bytes, const hdk::ir::Type* type) {
    if (slot_bytes == sizeof(int32_t)) {
      return llvm::Type::getInt32PtrTy(context, /*address_space=*/3);
    } else {
      CHECK(slot_bytes == sizeof(int64_t));
      return llvm::Type::getInt64PtrTy(context, /*address_space=*/3);
    }
    UNREACHABLE() << "Invalid slot size encountered: " << std::to_string(slot_bytes);
    return llvm::Type::getInt32PtrTy(context, /*address_space=*/3);
  };

  const auto casted_dest_slot_address = ir_builder.CreatePointerCast(
      ir_builder.CreateGEP(
          dest_byte_stream->getType()->getScalarType()->getPointerElementType(),
          dest_byte_stream,
          byte_offset),
      ptr_type(slot_bytes, type),
      "dest_slot_adr_" + std::to_string(slot_idx));
  return casted_dest_slot_address;
}
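
To illustrate why the `CreatePointerCast` call above can produce an `addrspacecast`, here is a simplified model of how the cast kind is chosen for a pointer-to-pointer cast. It reflects a general understanding of LLVM's behavior rather than LLVM's actual code. `CastKind` and `pointer_cast_kind` are hypothetical names, and the `same_pointee` flag assumes typed pointers, as in the snippet above.

```cpp
#include <cassert>

// Hypothetical model, not LLVM code.
enum class CastKind { None, BitCast, AddrSpaceCast };

// Assumed behavior: differing address spaces yield an addrspacecast,
// otherwise a differing pointee type yields a bitcast, otherwise no cast.
CastKind pointer_cast_kind(unsigned src_as, unsigned dst_as, bool same_pointee) {
  if (src_as != dst_as) {
    return CastKind::AddrSpaceCast;
  }
  return same_pointee ? CastKind::None : CastKind::BitCast;
}
```

In this model, if `dest_byte_stream` lives in address space 0 and `ptr_type` requests address space 3, the result is an `addrspacecast`, which matches the instruction reported earlier in this thread.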
