What is prompting?
Prompting is the practice of representing a task as a natural language utterance in order to query a language model for a response.
Prompt Engineering
"is the process of creating a prompting function that results in the most effective performance on the downstream task."
Manual Prompting
▶ an intuitive art
▶ limited search capacity, but best practices exist
▶ often developed from probing scenarios
▶ humans might fail
Automatic Prompting
▶ hard (discrete) prompts are textual prompts
▶ various search techniques based on texts, heuristics or gradients
▶ continuous (soft) techniques optimize prompts directly in the underlying vector space of an LLM
Llama/GPT Index: Fighting Hallucinations by Supplying Specific Information
▶ Issue: Scarce, vague input elicits hallucinations
▶ Solution: Inform the LLM better by ingesting your data, documents, and knowledge into the prompt
Retrieval-Augmented Generation
Semantic search in a vector space combines information retrieval with abstractive question answering.
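This combination can be sketched in a few lines of pure Python. The documents, their embedding vectors, and the query embedding below are toy values for illustration (a real system would obtain them from an embedding model); the sketch only shows the mechanic of retrieving the most similar document by cosine similarity and stuffing it into the prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document store: text -> (made-up) embedding vector.
docs = {
    "Our refund policy allows returns within 30 days.": [0.9, 0.1, 0.0],
    "The office is closed on public holidays.":         [0.1, 0.8, 0.2],
}

def build_prompt(question, query_embedding):
    """Semantic search over the doc vectors, then prepend the best hit as context."""
    best = max(docs, key=lambda d: cosine(docs[d], query_embedding))
    return f"Context: {best}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Can I return my order?", [0.95, 0.05, 0.0])
```

The LLM then answers the question grounded in the retrieved context instead of relying on its parametric memory alone.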
Explanations for Typical LLM Generation Hyperparameters
▶ Temperature comes from thermodynamics. Low temperatures make high-probability continuations even more probable; temperatures larger than 1 make the distribution more uniform. We are cooking the SoftMax :-) A value of 0.2 serves more consistent output; 1 gives more creative and diverse results.
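The "cooking the SoftMax" idea is just dividing the logits by the temperature before the softmax. A minimal sketch with made-up logit values:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # toy next-token logits
cold = softmax_with_temperature(logits, 0.2)   # sharper: top token dominates
hot = softmax_with_temperature(logits, 2.0)    # flatter: closer to uniform
```

With temperature 0.2 the top token absorbs nearly all probability mass; with temperature 2.0 the same logits yield a much flatter distribution, which is where the extra "creativity" comes from.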
▶ Nucleus Sampling (Top-P): A widely used technique to balance coherence and diversity in text generation. The smallest subset of most probable next tokens (the nucleus) whose cumulative probability mass exceeds P is selected. After renormalizing the nucleus probabilities, a token is sampled from this distribution.
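The selection-and-renormalization step can be sketched directly; the probabilities below are toy values standing in for a model's next-token distribution:

```python
import random

def nucleus(probs, p):
    """Return the renormalized top-p distribution as {token_index: prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:          # stop once the cumulative mass reaches P
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_top_p(probs, p, rng=random):
    dist = nucleus(probs, p)
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = [0.5, 0.3, 0.15, 0.05]    # toy next-token probabilities
dist = nucleus(probs, 0.75)       # keeps tokens 0 and 1 (0.5 + 0.3 >= 0.75)
```

The two unlikely tail tokens are cut off entirely, while the surviving tokens keep their relative proportions after renormalization.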
Prompting vs Fine-Tuning in Very Large Language Models (VLLMs)
▶ Fine-Tuning for VLLMs such as GPT-3.5 or Llama 2 actually means providing prompt/response training pairs
▶ Good prompts from few-shot learning are good prompts for fine-tuning!
▶ Fine-Tuning principle: "Show, don't tell" — especially if telling what to do is cumbersome/difficult
Reasons for Fine-Tuning VLLMs
▶ Prompting results are not satisfying (regarding content or style)
▶ The number of examples needed for in-context learning is high: exceeding the context window (typically 2-8k tokens), or spending too much money at inference time due to long prompts (although input tokens cost less than half the price of output tokens)
▶ Lower latency for requests needed (saving time)
Tuning, Freezing, Prompting
Fine-tuning can lead to catastrophic forgetting. In-context learning can be slow/expensive at test time. Fixed-prompt tuning is a good compromise for few-shot settings.
RAG goal?
Aims to prevent the LLM from hallucinating by grounding its answers in retrieved documents.
Prompt-tuning
Definition: Prompt tuning involves adding a natural language prompt or instructions at the beginning of the input text to guide the model's predictions. The idea is to prepend a few examples or a task description to the input to 'nudge' the model towards the desired output.
Parameters: Typically, no model parameters are updated in prompt tuning. Instead, the creativity lies in designing effective prompts.
Advantages: Flexibility: can be easily customized for various tasks without any training. Efficiency: doesn't require additional computational resources for training.
Drawbacks: Dependence on prompt quality: performance relies heavily on the quality and design of the prompt. Limited control: offers less control over the model's behavior than methods that tune the model's parameters.
Prefix-tuning
Definition: Prefix tuning is a parameter-efficient method where a sequence of continuous vectors (prefixes) is prepended to the input embeddings at each layer of the transformer model. These vectors are learned during a training phase and are task-specific, guiding the model's subsequent layers to produce desired outputs.
Parameters: The prefix vectors are the only parameters that are learned during the training phase, while the rest of the model's parameters remain frozen.
Advantages: Parameter efficiency: only a small set of parameters (the prefixes) needs to be trained, making it much more efficient than full model fine-tuning. Better control: offers more control over the model's behavior than prompt tuning, since the prefixes interact directly with the model's internal representations.
Drawbacks: Need for training: requires a separate training phase to learn the prefixes, which is computationally more expensive than prompt tuning. Task-specific: the learned prefixes are typically task-specific, so a new set may need to be trained for each new task or domain.
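The core mechanic of prefix tuning can be illustrated in pure Python. All dimensions and vector values below are toy numbers; a real implementation prepends learned prefixes to the key/value activations at each transformer layer, but the prepend-only-train-the-prefix idea is the same:

```python
PREFIX_LENGTH, EMBED_DIM = 2, 3

# The only trainable parameters: PREFIX_LENGTH continuous vectors of size
# EMBED_DIM (toy values; in training these would be updated by gradients).
prefix = [[0.1, -0.2, 0.3],
          [0.0, 0.5, -0.1]]

def with_prefix(token_embeddings):
    """Prepend the learned prefix vectors to the (frozen) token embeddings."""
    return prefix + token_embeddings

tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # embeddings of two input tokens
augmented = with_prefix(tokens)

trainable = PREFIX_LENGTH * EMBED_DIM          # 6 parameters in this toy setup
```

The frozen model processes the augmented sequence; since only the six prefix numbers are trainable, the method stays parameter-efficient even when the backbone has billions of weights.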
Prompt vs. Prefix Tuning
Prompt tuning is a quick and flexible way to adapt a language model to a new task using carefully crafted text prompts (and is more parameter-efficient), while prefix tuning involves the more involved process of training task-specific continuous vectors to guide the model, offering more control at the cost of additional computation during the training phase.
Pros and Cons of Adapters
Adapters are a method for fine-tuning pre-trained deep learning models. Instead of fine-tuning all the parameters of a model, adapters only train a small set of parameters inserted between the layers of the original model.
Pros of Adapters:
Parameter Efficiency: Adapters train only a small number of parameters, significantly reducing computational and memory requirements compared to full model fine-tuning.
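A minimal sketch of an adapter block, in pure Python with toy weights and dimensions (hidden size 4, bottleneck size 2), showing the typical structure: down-project, nonlinearity, up-project, and a residual connection. Only the two small projection matrices would be trained:

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def adapter(h, W_down, W_up):
    """Adapter output: h + W_up @ relu(W_down @ h); only W_down/W_up are trained."""
    bottleneck = relu(matvec(W_down, h))
    delta = matvec(W_up, bottleneck)
    return [hi + di for hi, di in zip(h, delta)]

h = [1.0, -2.0, 0.5, 3.0]          # hidden state from a frozen layer
W_down = [[0.1, 0.0, 0.2, 0.0],    # 2 x 4 down-projection (toy values)
          [0.0, 0.3, 0.0, 0.1]]
W_up = [[0.5, 0.0],                # 4 x 2 up-projection (toy values)
        [0.0, 0.5],
        [0.5, 0.5],
        [0.0, 0.0]]
out = adapter(h, W_down, W_up)
```

Here the adapter adds only 2*4 + 4*2 = 16 parameters; with realistic hidden sizes (e.g. 768) and small bottlenecks (e.g. 64), this is a tiny fraction of the frozen layers it sits between.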