-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtodo.txt
56 lines (48 loc) · 2.18 KB
/
todo.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
- implement cache-based profiling (how)
- keep data per-sm (simt core)
-- per address, reference
< SM 0, kernel 1 : addr1 ref, addr2 ref, ...addrn ref
kernel 2 : addr1 ref, addr2 ref, ...addrn ref
...
kernel n : addr1 ref, addr2 ref, ...addrn ref >
<SIMT CORE>
<kernel 1: addr1 ref, addr2 ref, addrn ref
<kernel 2:
<kernel 3:
...
<kernel n:>
array of structs? // vector of structs?
struct block_ref {
unsigned long long address;
unsigned int count;
}
std::map<unsigned long long, int> l1d_cache_references_per_block;
//how to keep the data for per kernel?
//m_core[i] in shader.cc
//each core calls a second kernel after completing a cycle
//store map with 2 keys?
while kernel.l1d_access(addr){
l1d_cache_references_per_block[addr]++;
}
- store this data in a file somewhere
This is nested data, how do you store it?
- Core > Kernel > Refs
- core, kernel, ref, count
- load this data before doing a rerun
- All the SIMT core related stuff is implemented via shader_core_ctx.cc/h. So to keeo track of per-SIMT Core stats, you want to allocate the profiler/data structure in the shader_core_ctx.cc
- print <core_id, kernel_id, addref, count>
<m_core->sid,
- in ldst_unit::cycle, memory_cycle, you want to implement the cache bypassing based on the logic.
// calling within the ldst_unit
SIMTCore Id m_core->get_sid(),
Address Ref access.get_addr(),
Kernel ID m_core->get_kernel()->get_uid(),
Reference Count m_core->addr_ref_stat_per_simt_core[access.get_addr()]);
// implement a method or something to store the data for a shader_core to a file
-[x] find where all the stats related to shader_core are being calculated and stored
-[x] Finding someplace after the simulation is over and grabbing all the stats from shader_core
stats is quite difficult. I am looking at gpu-sim.cc for examples but unsure if there
is any there. Looks like we may have to do an intermediate step to calculate all for the cluster?
and collect from the cluster. I am not sure if m_stats is protected
- Load up the hardcoded file before starting the execution
// implement a logic to grab the data from the said file