[llama] Store KV Cache on CPU and Use PyTorch SDPA for Next Token Generation
#1182
+187
−71