Prompt Engineering Basics: Write Effective Prompts for Any LLM
You type "summarize this data" into ChatGPT and get a wall of vague bullet points. Your colleague types a differently worded request about the same data and gets a crisp, structured analysis. Same model, same data, wildly different results. The difference isn't luck — it's the prompt.
This tutorial teaches you the four building blocks of effective prompts — instruction, context, format, and role — and gives you a scoring rubric to evaluate any prompt before you send it. Every example runs live against the OpenAI API, so you can see exactly how small wording changes produce dramatically different outputs.
Why Prompts Matter More Than You Think
I spent my first week with LLMs blaming the model when I got bad output. "GPT isn't smart enough for this," I'd think. Then I'd watch someone else get a perfect answer from the same model by asking differently. The model wasn't the bottleneck — my prompts were.
An LLM doesn't read your mind. It predicts the most likely continuation of the text you give it. A vague prompt gets a vague continuation. A specific, well-structured prompt gets a specific, useful response. Prompt engineering is the skill of writing inputs that reliably produce the outputs you need.
Let's prove this with a real example. Here's the same question asked two different ways:
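The two example blocks do not survive in this excerpt, so the following is a reconstruction: the topic (common Python errors) and the constraints (exactly 3 errors, a requested format) are inferred from the description that follows, and the exact prompt wording is illustrative. The script assumes the openai Python SDK with OPENAI_API_KEY set in the environment.

```python
import asyncio

# Reconstructed prompts; the originals aren't shown in this excerpt.
VAGUE_PROMPT = "Tell me about errors in Python"
SPECIFIC_PROMPT = (
    "List the 3 most common runtime errors that beginner Python "
    "developers hit. For each, give the error name and a one-sentence "
    "cause. Format as a numbered list with the error name in bold."
)

async def ask(prompt: str) -> str:
    # Lazy import so the prompts above can be inspected without the SDK.
    from openai import AsyncOpenAI  # assumes OPENAI_API_KEY is set
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(asyncio.run(ask(VAGUE_PROMPT)))
    print(asyncio.run(ask(SPECIFIC_PROMPT)))
```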
Run both blocks. The vague prompt might give you a 500-word essay, a list of 10 errors, or a tutorial on try/except — you can't predict it. The specific prompt consistently returns exactly 3 errors in the format you requested. That predictability is what prompt engineering buys you.
The Four Building Blocks of a Great Prompt
Every effective prompt — whether it's one sentence or a full page — is built from some combination of four elements. Not every prompt needs all four, but knowing what they are lets you diagnose why a prompt is failing.
Take a prompt like this one: "You are a senior Python developer reviewing code for a junior teammate. Here is a function that processes user data: [the function follows]. Rewrite this function to follow Python best practices. Return the improved code with brief inline comments explaining each change." That prompt has four distinct parts, each doing a specific job:
1. Role — "You are a senior Python developer reviewing code for a junior teammate." This tells the model whose perspective to take. A senior developer reviewing code gives different feedback than a teacher explaining concepts.
2. Context — The function itself plus "processes user data." This is the raw material the model needs to work with.
3. Instruction — "Rewrite this function to follow Python best practices." This is what you want done. It's the action verb.
4. Format — "Return the improved code with brief inline comments explaining each change." This controls how the answer looks.
Let's examine each building block on its own so you understand when and how to use it.
Instruction — The Core of Every Prompt
The instruction is the one non-negotiable. Every prompt has one, even if it's implicit. "Tell me about Python" is an instruction — just a bad one. Good instructions are specific about three things: the action (what to do), the scope (how much), and the constraints (what to avoid).
```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Explain decorators in Python"
    }],
)
print(response.choices[0].message.content)
```

```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Explain Python decorators to someone who "
        "understands functions but has never seen the "
        "@ syntax. Use exactly one analogy and one "
        "code example. Keep it under 150 words."
    }],
)
print(response.choices[0].message.content)
```

The weak version could produce anything from 50 words to 2,000 words. The strong version specifies the audience ("someone who understands functions"), the content ("one analogy and one code example"), and the length ("under 150 words"). Three constraints, and suddenly the output is predictable.
One technique I rely on constantly is specifying what the model should not do. LLMs tend to over-explain — they'll add caveats, disclaimers, and tangential information unless you tell them not to.
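The example this paragraph refers to is missing from this excerpt. Here is a hedged reconstruction: the topic (web-scraping libraries) is inferred from the next paragraph, and the exact wording is illustrative. You would send it with the same client.chat.completions.create(...) pattern used throughout.

```python
# A prompt with an explicit negative constraint to cut out the
# boilerplate the model would otherwise add.
PROMPT = (
    "Recommend one Python library for scraping static HTML pages and "
    "explain your choice in 2-3 sentences. Do NOT include installation "
    "instructions, code snippets, or comparisons with other libraries."
)
```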
Without that "Do NOT include" clause, you'd likely get pip install commands, code snippets, and a paragraph comparing BeautifulSoup to Scrapy. The negative constraint cuts out the noise before it starts.
Context — Giving the Model What It Needs
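The context-free example this section opens with is not shown in this excerpt. It would be a question like the one below (wording is illustrative), sent with the same client pattern used throughout. With no code attached, the model can only reply with generic advice about profiling and optimization.

```python
# A question with zero context: the model has nothing concrete to analyze.
NO_CONTEXT_PROMPT = "Why is my Python code slow, and how do I fix it?"
```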
That's a useless answer because the model has no idea which code you're talking about. Context is the background information the model needs to give a relevant response — your data, your code, your situation, your constraints.
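The with-context version referenced next is also missing from this excerpt. The sketch below reconstructs it from the description: a loop that calls sorted(data) on every row of a 500,000-row list. The function body itself is illustrative.

```python
SLOW_FUNCTION = '''
def flag_top_rows(data):
    flagged = []
    for row in data:
        if row in sorted(data)[-10:]:
            flagged.append(row)
    return flagged
'''

# Same question as before, now with the code and the data size included.
CONTEXT_PROMPT = (
    "This function runs on a list of 500,000 rows and takes minutes "
    "to finish. Identify the performance problem and suggest a fix:\n"
    + SLOW_FUNCTION
)
```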
The model immediately sees that sorted(data) runs inside the loop — re-sorting the entire list on every row. With 500,000 rows, that's catastrophic. It couldn't have told you that without seeing your code and knowing the data size.
Context comes in many forms: code snippets, error messages, data samples, business requirements, user profiles, or even prior conversation. The key question is always the same: "Does the model have everything it needs to give me a useful answer?"
Format — Controlling How the Answer Looks
This is the building block most people skip — and the one that saves the most time. Without a format specification, the model picks its own structure. Sometimes that's fine. But if you need JSON for your pipeline, or a markdown table for your report, or exactly three bullet points for a Slack message, you need to say so.
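The two format-specified calls referenced below are missing from this excerpt. Here is an illustrative pair (question and column/key names are mine): the same question asked twice, once for human eyes and once for a program.

```python
QUESTION = "What are the 3 largest planets in the solar system?"

# For a human reader: a markdown table.
TABLE_PROMPT = (
    QUESTION + " Answer as a markdown table with columns "
    "'Planet' and 'Diameter (km)'."
)

# For a program: machine-parseable JSON, nothing else.
JSON_PROMPT = (
    QUESTION + " Answer as a JSON array of objects with keys "
    "'planet' and 'diameter_km'. Return ONLY the JSON."
)
```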
Same question, completely different output shapes. The table is great for human reading. The JSON is great for feeding into another program. I've found that specifying format is the single highest-leverage prompt improvement for production applications — it turns unstructured AI output into something your code can parse reliably.
Role — Setting the Model's Perspective
Here's something that surprised me early on: asking the model to "act as" a specific expert genuinely changes the quality of its answers. It's not just a gimmick. When you assign a role, the model draws on patterns from that domain — the vocabulary, the reasoning style, the depth of analysis.
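The role-comparison example discussed next is not shown in this excerpt. The sketch below reconstructs its shape: the same small function reviewed under two different roles. The function and role wording are illustrative.

```python
CODE = '''
def calculate(a, b, op):
    if op == "add":
        return a + b
    if op == "div":
        return a / b
'''

# Role 1: a teacher, who will focus on readability and learning.
TEACHER_PROMPT = (
    "You are a patient programming teacher helping a beginner. "
    "Review this function and suggest improvements:\n" + CODE
)

# Role 2: an engineer, who will focus on edge cases and robustness.
ENGINEER_PROMPT = (
    "You are a senior software engineer reviewing code before a "
    "production deploy. Review this function and suggest "
    "improvements:\n" + CODE
)
```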
The teacher focuses on readability and learning — renaming variables, adding docstrings, explaining why. The engineer focuses on edge cases and robustness — division by zero, invalid operations, error handling. Neither response is wrong, but they serve completely different purposes.
When should you use a role? My rule of thumb: if you need domain-specific vocabulary, reasoning, or tone, assign a role. If you just need a factual answer, skip it. Telling the model to "act as a Python expert" before asking "what does len() do" adds nothing.
Practice: Build a Structured Prompt
Time to put the four building blocks together. This exercise tests whether you can identify and combine instruction, context, format, and role into a well-structured prompt.
Write a function build_prompt(role, context, instruction, output_format) that assembles a prompt string from the four building blocks.
Rules:
- If role is provided (non-empty string), it should appear first, on its own line
- instruction always appears next, on its own line
- If context is provided, it appears after the instruction, prefixed with "Context: "
- If output_format is provided, it appears last, prefixed with "Format: "

A Prompt Quality Rubric You Can Actually Use
How do you know if a prompt is good before you send it? I use a simple rubric that scores prompts on five criteria. Each criterion gets a score from 0 to 2. A prompt scoring 8 or above out of 10 almost always produces solid results.
Here's the rubric:

| Criterion | 0 points | 1 point | 2 points |
| --- | --- | --- | --- |
| Specificity | Vague ("tell me about X") | Somewhat specific | Precise action verb + scope |
| Scope | Unbounded (no length/count limit) | Partial constraint | Clear boundaries (word count, item count) |
| Format | No format specified | Implicit format | Explicit format (JSON, table, bullets) |
| Context | No relevant context | Some context | All necessary context included |
| Testability | Can't tell if output is good | Partially checkable | Clear pass/fail criteria |
Let's score a few prompts with this rubric to build your intuition:
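The scoring helper itself is missing from this excerpt. Below is a sketch of what such a helper might look like: a rough keyword heuristic over the five rubric criteria, 0-2 points each. The keyword lists and thresholds are mine and purely illustrative.

```python
def score_prompt(prompt: str) -> dict:
    """Rough heuristic scorer over the five rubric criteria (0-2 each)."""
    p = prompt.lower()
    first_word = p.split()[0] if p.split() else ""
    scores = {}
    # Specificity: starts with a precise action verb
    scores["specificity"] = 2 if first_word in {
        "list", "explain", "rewrite", "summarize", "extract", "compare"
    } else 0
    # Scope: an explicit length or count limit
    scores["scope"] = 2 if any(c.isdigit() for c in p) or "under" in p else 0
    # Format: a named output structure
    scores["format"] = 2 if any(w in p for w in ("json", "table", "bullet")) else 0
    # Context: crude proxy, longer prompts usually carry material to work with
    scores["context"] = 2 if len(prompt) > 200 else 1 if len(prompt) > 80 else 0
    # Testability: words that make the output checkable
    scores["testability"] = 2 if "exactly" in p or "only" in p else 0
    scores["total"] = sum(scores.values())
    return scores
```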
The scoring function is just a visualization helper — the real value is the mental rubric. Before sending any important prompt, run through the five criteria in your head. If your total is below 5, rewrite before hitting send.
Practice: Score and Improve a Prompt
Now that you have the rubric, let's use it to evaluate and improve prompts programmatically.
Write a function evaluate_prompt(prompt_text) that scores a prompt and returns a dictionary with the total score and a list of suggestions.
Scoring rules:
Return a dict with keys "specificity", "scope", "format_score", "total", and "suggestions" (a list of strings, one per criterion that scored 0).
The Five Most Common Prompt Mistakes
After writing hundreds of prompts — and reviewing many more in production codebases — I've found the same five mistakes account for the vast majority of bad LLM output. You can now use the rubric to diagnose each one.
Mistake 1: Asking Two Questions at Once
```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "What is a Python generator and when should "
        "I use it instead of a list comprehension and "
        "what are the performance implications?"
    }],
)
print(response.choices[0].message.content[:400])
```

```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "When should I use a Python generator instead "
        "of a list comprehension? Give 3 specific "
        "scenarios with a one-sentence explanation each."
    }],
)
print(response.choices[0].message.content)
```

The "before" prompt is really three separate questions stitched together with "and." The model tries to answer all three at once and does a mediocre job on each. One clear question per prompt. If you need three answers, make three calls — the cost is fractions of a cent.
Mistake 2: No Output Constraints
Without length or format constraints, LLMs default to verbose. They'll write 500 words when you needed 50. Every production prompt I write includes at least one constraint — a word limit, a sentence count, or a structural format.
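A minimal illustration (the prompt wording is mine, not from the original): the same request with and without constraints.

```python
# Unconstrained: invites a 500-word essay.
UNCONSTRAINED = "Explain how Python virtual environments work"

# Constrained: bounds both length and structure.
CONSTRAINED = (
    "Explain how Python virtual environments work in at most 4 "
    "sentences, then give exactly 2 bullet points: one on creating "
    "an environment and one on activating it."
)
```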
Mistake 3: Ambiguous Action Verbs
"Tell me about," "discuss," and "talk about" are the worst prompt starters. They give the model zero direction on scope, depth, or structure. Watch the difference a single verb swap makes:
```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Discuss Python testing"
    }],
)
print(response.choices[0].message.content[:300])
```

```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "List the 3 most popular Python testing "
        "frameworks and state what each one is "
        "best at in one sentence."
    }],
)
print(response.choices[0].message.content)
```

Mistake 4: Ignoring the System Message
If you're building an application — not just chatting — the system message is where persistent behavior rules belong. Stuffing role instructions and formatting rules into every user message is wasteful and error-prone. Set them once in the system message.
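A sketch of that split (the reviewer rules and JSON keys here are illustrative): persistent behavior lives in the system message, so each user message stays short.

```python
# Set once: role and formatting rules for every request.
SYSTEM = (
    "You are a Python code reviewer. Always respond with a JSON array "
    "of issues. Each issue has 'severity' and 'suggestion' keys. "
    "Return ONLY the JSON."
)

def build_messages(code_snippet: str) -> list[dict]:
    # The user message carries only the per-request payload.
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Review this function:\n" + code_snippet},
    ]
```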
Mistake 5: Not Iterating
The biggest mistake is treating a prompt as a one-shot effort. Professional prompt engineers iterate: try, evaluate the output, adjust one thing, try again. I usually go through 3-5 iterations before settling on a prompt for production use. The rubric from the previous section makes this process systematic — score your prompt, fix the weakest criterion, and re-run.
Putting It All Together — A Real-World Prompt Rewrite
Let's walk through a realistic scenario. You're building an app that reviews Python functions and suggests improvements. Your first attempt at a prompt produces inconsistent results. Here's how you'd iterate to fix it.
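The three iterations themselves are not shown in this excerpt. Below is a sketch of what the final version might look like, assembled from the description that follows; the function under review and the JSON key names are illustrative.

```python
FUNCTION = '''
def get_user(users, user_id):
    for u in users:
        if u["id"] == user_id:
            return u
'''

# Role + context + specific instruction + explicit, testable format.
REVIEW_PROMPT = (
    "You are a senior Python developer reviewing code for a teammate.\n\n"
    "Find issues in this function in three categories: correctness, "
    "readability, and performance:\n"
    + FUNCTION +
    "\nReturn a JSON array. Each issue must be an object with keys "
    "'category', 'description', and 'suggested_fix'. Return ONLY the JSON."
)
```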
Version 3 hits nearly every rubric criterion. It has a role (senior developer), context (the actual function), a specific instruction (find issues in three categories), an explicit format (JSON array), and clear testability (you can parse the JSON and check each issue has the required keys).
Three Prompt Patterns Worth Memorizing
Before we wrap up, here are three prompt structures I reach for constantly. They handle the most common use cases in day-to-day work with LLMs.
Pattern 1: The Structured Extractor
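No example for this pattern survives in this excerpt. A typical structured-extractor prompt might look like this (field names and sample text are illustrative): name the fields, fix the output shape, and handle missing values explicitly.

```python
EXTRACTOR_PROMPT = """Extract the following fields from the text below.
Return a JSON object with exactly these keys: "name", "email", "company".
Use null for any field that is not present. Return ONLY the JSON.

Text:
Hi, I'm Dana Reyes from Acme Corp. You can reach me at dana@acme.example."""
```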
Pattern 2: The Step-by-Step Analyzer
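Again, no example survives here; a step-by-step analyzer prompt typically enumerates the analysis stages in order (the steps and sample function below are illustrative):

```python
ANALYZER_PROMPT = """Analyze this function step by step:
1. First, describe what the function does in one sentence.
2. Then, list any bugs or edge cases it misses.
3. Finally, recommend the single highest-impact fix.

def average(nums):
    return sum(nums) / len(nums)"""
```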
Pattern 3: The Constrained Generator
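A constrained-generator prompt pins down the count, the schema, and what must not appear (the schema below is illustrative):

```python
GENERATOR_PROMPT = (
    "Generate exactly 5 realistic test user records as a JSON array. "
    "Each record has keys 'id' (int), 'name' (string), and 'email' "
    "(a string ending in @example.com). Do NOT include any text "
    "outside the JSON array."
)
```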
Each pattern follows the same logic: clear instruction, relevant context, explicit constraints, and a defined output shape. Once you internalize this formula, writing effective prompts becomes almost automatic.
Summary
Every effective prompt is built from four building blocks: instruction (what to do), context (what to work with), format (how to present the answer), and role (whose perspective to take). Not every prompt needs all four — but when your output is wrong, one of these four is usually the fix.
The prompt quality rubric gives you a quick way to evaluate any prompt before sending it: score it on specificity, scope, format, context, and testability. A prompt scoring 8+ out of 10 almost always produces useful results. Below that, iterate — adjust one criterion at a time until the output is consistently what you need.
The five mistakes to avoid: asking multiple questions at once, leaving output unconstrained, using vague verbs, ignoring the system message, and failing to iterate. Fix those five habits and you're ahead of most LLM users.
The next tutorial covers zero-shot and few-shot prompting — techniques for getting LLMs to classify, extract, and generate data without any training, just by carefully structuring your prompts with examples.
Frequently Asked Questions
Do these prompt techniques work with all LLMs, not just GPT?
Yes. The four building blocks — instruction, context, format, role — work across every major LLM: GPT-4, Claude, Gemini, Llama, Mistral. The fundamentals are universal because all these models predict text continuations. Specific formatting quirks vary (Claude handles XML tags well; GPT handles JSON well), but the structural principles transfer directly.
How long should a prompt be?
As long as it needs to be, and not a word more. A simple factual question works in one sentence. A complex code review prompt with context, constraints, and format might be 200 words. The right length is the minimum needed to reliably produce the output you want. If a shorter prompt gets the same results, use the shorter one — it's faster and cheaper.
Should I always use a system message?
For one-off questions, it doesn't matter much — put everything in the user message. For applications where the model's behavior should be consistent across many inputs (a chatbot, a code reviewer, a data extractor), the system message is the right place for role and formatting rules. It sets the behavior once instead of repeating it in every user message.