The main reason I ask, is because my current favorite model is a Llama 2 70B Q4_1 GGML model quantized by The Bloke. Here's the thing though, it was labeled as "Instruct" but it defaults to chat in settings in Oobabooga/Textgen. Every other model I have tried to use for technical help and python/bash snippets has failed to meet my expectations for (skeptically acceptable) accuracy. This 70B is powerful enough that I can prompt it to generate code snippets, and if the code creates an error, by pasting the error into the prompt, it almost always generates a solution in a single correction. Other models I have tried to use this paste-error technique on often crash, 'dig in their heels' insisting they are correct, or fail in several different ways like over fitting that forces resetting context tokens.
For whatever reason, the specific 70B model I am using has far exceeded my expectations, but I must use it with very specific conditions in Oobabooga/Textgen. It must be set to: chat, llama.cpp, the "divine intellect" perimeter preset, and the character profile set to the default of "None."
For whatever reason, deviation from these settings ruins the accuracy of code snippets. Speculatively/intuitively, if I try to use the instruct prompt, or a new persistent character profile, it seems like there is an issue in the way the previous context is handled. In a single session the context seems to drift. In any case, code seems to always have errors and paste corrections fail.
I can't contextualize this issue with such large models. I have had the same issues with smaller models regardless of settings I have tried. I have written or modified a dozen scripts between bash and python using this 70B in chat mode. It is a bit of a pain because the prompt input/output is not proper markdown for code so I have to correct for whitespace scope and have a reasonable understanding of the code syntax, but for the most part, I don't need to make corrections to specific lines of output. Is this rare, an issue/quirk with: the model quantization, llama.cpp, Textgen, other? Has anyone else experienced something like this? Am I just super lucky to have found a chance combination that works really well at snippets combined with my prompting/coding skill level? I haven't had much success with the code specific LLMs either. I'm not sure why this model is doing so well for me.