NVIDIA / cutlass Public

Issues: NVIDIA/cutlass

205 Open 1,011 Closed

Issues list

[QST] Terminology question on GMMA::ScaleOut::One ? - Needs Triage question

Question

#2046 opened Jan 17, 2025 by haeunlee99

[FEA] Does it supports quantization-matrix-mul? ? - Needs Triage feature request

New feature or request

#2044 opened Jan 17, 2025 by bianxuxuxu

[BUG][QST] Hopper Grouped GEMM Fails When Workspace not aligned at 64, but MinWorkspaceAlignment =16 ? - Needs Triage bug

Something isn't working

#2042 opened Jan 16, 2025 by ankutalev

[BUG] Modifying the block/warptile shapes and the output datatype in the unit test causes the tests to fail. ? - Needs Triage bug

Something isn't working

#2041 opened Jan 16, 2025 by xiaonans

[QST] link invalid in efficient_gemm.md ? - Needs Triage question

Question

#2038 opened Jan 13, 2025 by unship

[QST]Question about the picture in documentation Efficient GEMM in CUDA ? - Needs Triage question

Question

#2034 opened Jan 9, 2025 by sleepwalker2017

[BUG] Logic issue in nondeterministic reduction mode of Stream-K tile scheduler. ? - Needs Triage bug

Something isn't working

#2027 opened Jan 7, 2025 by allispaul

[QST] What is API version compatibility? ? - Needs Triage question

Question

#2025 opened Jan 6, 2025 by ZzEeKkAa

[QST] why have Int<2>{} in coalesce_x function when last shape value equal to constant one. ? - Needs Triage question

Question

#2023 opened Jan 5, 2025 by Shan19900305

[QST] why the implementation of f16xs8 mixed gemm is different between TRT-LLM and native cutlass mixed gemm example? ? - Needs Triage question

Question

#2022 opened Jan 5, 2025 by danielhua23

[BUG] Memory corruption/undefined behavior on GemmUniversal in 3.4.0 - 3.6.0 🐛 ? - Needs Triage bug

Something isn't working

#2017 opened Dec 28, 2024 by warpuv

[QST]Why Does CUTLASS Use 3-4-3 Swizzle? ? - Needs Triage question

Question

#2015 opened Dec 27, 2024 by ziyuhuang123

[BUG] [QST] Regression - why Sm90RowBroadcast in 3.5.1 stops support smem usage? ? - Needs Triage bug

Something isn't working

#2010 opened Dec 23, 2024 by ankutalev

[BUG] Removal of OpMultiplyAdd template substitutions from mma_sm80.h in 3.5.1 ? - Needs Triage bug

Something isn't working

#2009 opened Dec 23, 2024 by ankutalev

[QST]How Does TMA Work in CUTLASS for Writing from Shared Memory to Global Memory? ? - Needs Triage question

Question

#2008 opened Dec 23, 2024 by ziyuhuang123

[BUG] wmma should be enabled w/ clang. ? - Needs Triage bug

Something isn't working

#2006 opened Dec 20, 2024 by Artem-B

[BUG] Unaligned access in test/unit/gemm/threadblock/batched_gemv.cu ? - Needs Triage bug

Something isn't working

#2003 opened Dec 19, 2024 by Artem-B

[QST]Behavior of TMA Store and Wait Mechanism in CUTLASS ? - Needs Triage question

Question

#2002 opened Dec 19, 2024 by ziyuhuang123

[QST] When to use MainloopSm90TmaGmmaWarpSpecializedFP8? ? - Needs Triage question

Question

#2001 opened Dec 19, 2024 by ginowu

[Proposal] layout deduction ambiguity of Nested Layout Access Problem ? - Needs Triage bug

Something isn't working

#2000 opened Dec 18, 2024 by yiakwy-xpu-ml-framework-team

[QST]Is the Key Difference Between mbarrier and barrier Their Handling of Producer-Consumer Count? ? - Needs Triage inactive-30d question

Question

#1999 opened Dec 18, 2024 by ziyuhuang123

[QST]How to Handle Synchronization with Different Thread Counts for Producer and Consumer in CUTLASS? ? - Needs Triage inactive-30d question

Question

#1998 opened Dec 18, 2024 by ziyuhuang123

[BUG] calling cast_smem_ptr_to_uint(device fn) from make_gmma_desc(host device fn) is not allowed ? - Needs Triage bug inactive-30d

Something isn't working

#1997 opened Dec 18, 2024 by lygztq

[QST] Gemm got 'incomplete type is not allowed' when use Sm90 ? - Needs Triage inactive-30d question

Question

#1996 opened Dec 18, 2024 by TopIdiot

[QST] custom kernel integrated in Pytorch ? - Needs Triage inactive-30d question

Question

#1991 opened Dec 16, 2024 by IzanCatalan
