Terminology
The following are common terms used when working with UVA RC GenAI and LLMs in general.
Core Units
-
Token - The basic unit of text LLMs process. Can be a whole word, part of a word, or a single character. Roughly 100 tokens ≈ 75 words. All usage, limits, and costs are measured in tokens.
-
Prompt - The input text you send to the model. Prompts can be a question, instruction, or conversation history. A prompt is measured in tokens.
-
System Prompt - A hidden prompt that sets the model’s behavior rules for the entire conversation—such as tone, role, or safety constraints. It frames how the model interprets all subsequent user prompts.
Capacity Limits
-
Context Window - The maximum number of tokens (input + output combined) a model can consider at one time.
-
Batched Token Length - The total tokens across multiple requests sent together as a single batch. Batching can improve efficiency; this is the sum of all tokens in that group.
Generation Parameters (API access only)
-
Temperature - Controls output randomness. Lower values (0.1-0.3) produce more focused, deterministic responses; higher values (0.8-1.0) produce more varied and creative outputs.
-
Seed - An integer that initializes the random number generator. Using the same seed with identical prompts and settings produces reproducible outputs.
Usage and Operations
-
Metadata - Non-content information about an API exchange: timestamp, model version, token counts, request ID, cost.
-
Rate Limiting - Restrictions on tokens or requests per time period to prevent overload and ensure fair access.