Terminology
LLMs can have billions of parameters (the numerical weights in the model whose values are learned during training).
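To make "billions of parameters" concrete, here is a back-of-envelope count for a small transformer. The configuration below (vocabulary size, model width, layer count, etc.) is an illustrative assumption modeled on a GPT-2-small-like architecture, not a definition from this text:

```python
# Rough parameter count for a hypothetical GPT-2-small-like transformer.
vocab_size, d_model, n_layers, d_ff, ctx_len = 50257, 768, 12, 3072, 1024

token_embedding = vocab_size * d_model     # one learned vector per vocabulary token
position_embedding = ctx_len * d_model     # one learned vector per position

per_layer = (
    d_model * 3 * d_model + 3 * d_model    # attention Q/K/V projection (+ bias)
    + d_model * d_model + d_model          # attention output projection (+ bias)
    + d_model * d_ff + d_ff                # feed-forward up projection (+ bias)
    + d_ff * d_model + d_model             # feed-forward down projection (+ bias)
    + 4 * d_model                          # two LayerNorms (scale + shift each)
)

total = token_embedding + position_embedding + n_layers * per_layer + 2 * d_model  # + final LayerNorm
print(f"{total:,} parameters")  # roughly 124 million
```

Even this "small" configuration lands around 124 million parameters; scaling the width and depth up by modest factors is what pushes modern LLMs into the billions.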
An LLM is trained (model parameters are determined) using a very large amount of text data.
A pre-trained LLM has already been trained on a large, general body of text. This pre-training is how the model “learns” the language.
A fine-tuned LLM is a pre-trained LLM that is then further trained on additional data for a specific task. Model parameters are updated in the fine-tuning process.
Inference is the process in which a trained (or fine-tuned) LLM makes a prediction for a given input.
The input to an LLM is a sequence of tokens (words, characters, subwords, etc.).
The batch size for an LLM is the number of sequences passed to the model at once.
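A small sketch of what a batch looks like, assuming sequences are already token IDs. The specific ID values and the pad ID of 0 are hypothetical; shorter sequences are typically padded so the batch forms a rectangular array:

```python
# A batch is several token-ID sequences passed to the model at once.
sequences = [
    [101, 7592, 2088, 102],        # sequence 1: 4 tokens (IDs are made up)
    [101, 2129, 2024, 2017, 102],  # sequence 2: 5 tokens
]

batch_size = len(sequences)                 # number of sequences in the batch
max_len = max(len(s) for s in sequences)    # longest sequence sets the width

# Pad shorter sequences (hypothetical pad ID 0) so every row has equal length.
batch = [s + [0] * (max_len - len(s)) for s in sequences]

print(batch_size, max_len)  # 2 5
```

Larger batch sizes let the model process more sequences per forward pass at the cost of more memory.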
Generally, raw text is passed through a tokenizer, which splits it into tokens and maps each token to a numerical ID; these ID sequences are what is actually sent to the LLM.
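The text-to-IDs pipeline can be sketched with a toy whitespace tokenizer. The vocabulary and `<unk>` (unknown-token) entry below are hand-built assumptions for illustration; real tokenizers typically split text into subwords (e.g. with BPE) and have vocabularies of tens of thousands of entries:

```python
# Toy tokenizer pipeline: raw text -> tokens -> numerical IDs.
# Hypothetical tiny vocabulary; <unk> catches out-of-vocabulary tokens.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}

def tokenize(text):
    """Split raw text into tokens (here: lowercased whitespace words)."""
    return text.lower().split()

def encode(text):
    """Map each token to its numerical ID, falling back to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]

print(encode("The cat sat"))  # [1, 2, 3]
print(encode("the dog sat"))  # [1, 0, 3]  ("dog" is out of vocabulary)
```

The resulting lists of integers are the sequences the LLM consumes; batching simply stacks several of them together.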