( turul16 | 2025. 03. 24., Mon – 12:39 )

16x Configured Memory Speed: 5200 MT/s ~ 665 GB/sec nominal
2x INTEL(R) XEON(R) GOLD 6542Y CPU @ 2.9GHz
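
The nominal figure can be checked with a quick calculation. This is only a sketch: it assumes one DIMM per 64-bit (8-byte) channel, so 16 channels across the two sockets, and the function name is made up for illustration.

```python
def nominal_bandwidth_gb_s(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in decimal GB/s.

    Assumption: every channel is 64 bits wide (8 bytes per transfer)
    and all channels run at the configured speed.
    """
    return channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9


# 16 channels at 5200 MT/s, as listed above
print(f"{nominal_bandwidth_gb_s(16, 5200):.1f} GB/s")
```

Real sustained bandwidth is always lower than this peak, so it is only an upper bound to compare the measurements against.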

What was your DDR5 config?

The 1.58-bit model has so far answered the same things the online one did, although that was only a few tests.
Nothing much bigger would fit in this machine.

16 thread:

llama_perf_context_print: eval time = 686830.58 ms / 2389 runs ( 287.50 ms per token, 3.48 tokens per second)

Reported memory bandwidth usage: 80GB/sec (Intel has pcm).

46 threads: 120GB/sec.

32 thread:
llama_perf_context_print: eval time = 449517.01 ms / 2389 runs ( 188.16 ms per token, 5.31 tokens per second)
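
To put the 16 vs 32 thread numbers side by side, here is a small sketch that recomputes tokens per second from the two eval lines above and derives the speedup. The helper name and the "efficiency" definition (speedup divided by the thread ratio) are my own choices, not something llama.cpp reports.

```python
def tokens_per_second(eval_ms: float, runs: int) -> float:
    """Tokens per second from an eval time in milliseconds and a run count."""
    return runs / (eval_ms / 1000.0)


# Values copied from the two llama_perf_context_print lines above
t16 = tokens_per_second(686830.58, 2389)
t32 = tokens_per_second(449517.01, 2389)

speedup = t32 / t16
# Assumed definition: 1.0 would mean perfect linear scaling with thread count
efficiency = speedup / (32 / 16)

print(f"16 threads: {t16:.2f} t/s, 32 threads: {t32:.2f} t/s")
print(f"speedup {speedup:.2f}x, scaling efficiency {efficiency:.0%}")
```

Doubling the threads gives roughly 1.5x here, not 2x.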

Something other than just memory bandwidth is the problem; reportedly a proper thread pool did not help either.
Looking back, the zen4/zen5 test was prompt processing only; I can't find a comparative token generation test.
If token generation is what you need, then zen5 is probably not that big a jump ;-(
llama.cpp does not scale on CPU. I looked at other CPUs too, and scaling also stalled on completely different CPUs with higher core counts, even though there was memory bandwidth to spare.

edit:
llama-3.3-70B-instruct-Q4_K_M can use 203GB/sec. 4.54 t/s. 80GB RES, 38.7 SHR
llama-3.3-70B-instruct-Q8 can use 244GB/sec. 2.81 t/s.
llama-3.3-70B-instruct-f16 232GB/sec 1.59 t/s
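
Dividing the measured bandwidth by the token rate gives the data moved per generated token, which for a dense model should be on the order of the weight size if every token streams all the weights once. A rough sketch using only the three measurements above; the 665.6 GB/sec nominal value is the 16 x 5200 MT/s x 8 byte figure from the top of the post, and the helper name is made up.

```python
NOMINAL_GB_S = 665.6  # assumed peak: 16 channels x 5200 MT/s x 8 bytes

# (measured bandwidth in GB/s, tokens per second), copied from the lines above
runs = {
    "Q4_K_M": (203.0, 4.54),
    "Q8": (244.0, 2.81),
    "f16": (232.0, 1.59),
}


def gb_per_token(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """Data moved per generated token, in GB."""
    return bandwidth_gb_s / tokens_per_s


for name, (bw, tps) in runs.items():
    share = bw / NOMINAL_GB_S
    print(f"{name}: {gb_per_token(bw, tps):.1f} GB/token, {share:.0%} of nominal")
```

For f16 this comes out close to what roughly 70B parameters at 2 bytes each would need, so the bandwidth and t/s numbers are at least consistent with each other. None of the three runs gets much past a third of the nominal bandwidth.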

With 3x UPI, I don't think we are close to the bottleneck.
mbw 46 core memcopy: 361 GB/sec
massive random access 46 core: 392 GB/sec (good chance of row misses, and there is no cache write-through; read+write)
massive random read 46 core: 481 GB/sec (good chance of row misses; no writes, <100 MB/s)