Consider creating a package for calculating perplexity for LLMs. Test it with NanoGPT and custom local models. Then expand its capabilities to other accuracy or performance metrics, such as BLEU, ROUGE, and accuracy.
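A minimal sketch of the core perplexity calculation such a package might start from. It assumes the model under test (e.g. a NanoGPT checkpoint or a custom local model) can produce one row of raw logits over its vocabulary for each position, plus the target token id at that position. The function names `log_softmax`, `perplexity`, and `perplexity_from_logits` are hypothetical, and plain Python lists stand in for framework tensors:

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits to natural-log probabilities.

    Subtracting the max logit before exponentiating (log-sum-exp trick)
    avoids overflow for large logits.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token log probabilities log p(x_i | x_<i), in nats."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    # Average negative log-likelihood per token (cross-entropy in nats).
    nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of the average negative log-likelihood.
    return math.exp(nll)


def perplexity_from_logits(
    logits_per_step: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Perplexity given one logits row per position and the target token ids.

    Assumes logits_per_step[i] is the model's prediction for targets[i].
    """
    if len(logits_per_step) != len(targets):
        raise ValueError("logits and targets must have the same length")
    # Pick out the log probability the model assigned to each actual target.
    log_probs = [log_softmax(row)[t] for row, t in zip(logits_per_step, targets)]
    return perplexity(log_probs)
```

As a sanity check, a model that is uniform over a 4-token vocabulary has a perplexity of 4, and a model that assigns probability 1 to every target has a perplexity of 1. Keeping the metric a pure function of log probabilities decouples it from any particular model or framework, so the same code could be reused for every backend being tested.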