Prompt Engineering Basics: Write Effective Prompts for Any LLM
You type "summarize this data" into ChatGPT and get a wall of vague bullet points. Your colleague types a differently worded request about the same data and gets a crisp, structured analysis. Same model, same data, wildly different results. The difference isn't luck — it's the prompt.
This tutorial teaches you the four building blocks of effective prompts — instruction, context, format, and role — and gives you a scoring rubric to evaluate any prompt before you send it. Every example runs live against the OpenAI API, so you can see exactly how small wording changes produce dramatically different outputs.
Why Prompts Matter More Than You Think
I spent my first week with LLMs blaming the model when I got bad output. "GPT isn't smart enough for this," I'd think. Then I'd watch someone else get a perfect answer from the same model by asking differently. The model wasn't the bottleneck — my prompts were.
An LLM doesn't read your mind. It predicts the most likely continuation of the text you give it. A vague prompt gets a vague continuation. A specific, well-structured prompt gets a specific, useful response. Prompt engineering is the skill of writing inputs that reliably produce the outputs you need.
Let's prove this with a real example. Here's the same question asked two different ways:
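The two example blocks do not survive in this excerpt, so the following is a reconstruction: the topic (common Python errors) and the constraints (exactly 3 errors, a requested format) are inferred from the description that follows, and the exact prompt wording is illustrative. The script assumes the openai Python SDK with OPENAI_API_KEY set in the environment.

```python
import asyncio

# Reconstructed prompts; the originals aren't shown in this excerpt.
VAGUE_PROMPT = "Tell me about errors in Python"
SPECIFIC_PROMPT = (
    "List the 3 most common runtime errors that beginner Python "
    "developers hit. For each, give the error name and a one-sentence "
    "cause. Format as a numbered list with the error name in bold."
)

async def ask(prompt: str) -> str:
    # Lazy import so the prompts above can be inspected without the SDK.
    from openai import AsyncOpenAI  # assumes OPENAI_API_KEY is set
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(asyncio.run(ask(VAGUE_PROMPT)))
    print(asyncio.run(ask(SPECIFIC_PROMPT)))
```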
Run both blocks. The vague prompt might give you a 500-word essay, a list of 10 errors, or a tutorial on try/except — you can't predict it. The specific prompt consistently returns exactly 3 errors in the format you requested. That predictability is what prompt engineering buys you.
The Four Building Blocks of a Great Prompt
Every effective prompt — whether it's one sentence or a full page — is built from some combination of four elements. Not every prompt needs all four, but knowing what they are lets you diagnose why a prompt is failing.
Take a prompt like this one: "You are a senior Python developer reviewing code for a junior teammate. Here is a function that processes user data: [the function follows]. Rewrite this function to follow Python best practices. Return the improved code with brief inline comments explaining each change." That prompt has four distinct parts, each doing a specific job:
1. Role — "You are a senior Python developer reviewing code for a junior teammate." This tells the model whose perspective to take. A senior developer reviewing code gives different feedback than a teacher explaining concepts.
2. Context — The function itself plus "processes user data." This is the raw material the model needs to work with.
3. Instruction — "Rewrite this function to follow Python best practices." This is what you want done. It's the action verb.
4. Format — "Return the improved code with brief inline comments explaining each change." This controls how the answer looks.
Let's examine each building block on its own so you understand when and how to use it.
Instruction — The Core of Every Prompt
The instruction is the one non-negotiable. Every prompt has one, even if it's implicit. "Tell me about Python" is an instruction — just a bad one. Good instructions are specific about three things: the action (what to do), the scope (how much), and the constraints (what to avoid).
```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Explain decorators in Python"
    }],
)
print(response.choices[0].message.content)
```

```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Explain Python decorators to someone who "
        "understands functions but has never seen the "
        "@ syntax. Use exactly one analogy and one "
        "code example. Keep it under 150 words."
    }],
)
print(response.choices[0].message.content)
```

The weak version could produce anything from 50 words to 2,000 words. The strong version specifies the audience ("someone who understands functions"), the content ("one analogy and one code example"), and the length ("under 150 words"). Three constraints, and suddenly the output is predictable.
One technique I rely on constantly is specifying what the model should not do. LLMs tend to over-explain — they'll add caveats, disclaimers, and tangential information unless you tell them not to.
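The example this paragraph refers to is missing from this excerpt. Here is a hedged reconstruction: the topic (web-scraping libraries) is inferred from the next paragraph, and the exact wording is illustrative. You would send it with the same client.chat.completions.create(...) pattern used throughout.

```python
# A prompt with an explicit negative constraint to cut out the
# boilerplate the model would otherwise add.
PROMPT = (
    "Recommend one Python library for scraping static HTML pages and "
    "explain your choice in 2-3 sentences. Do NOT include installation "
    "instructions, code snippets, or comparisons with other libraries."
)
```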
Without that "Do NOT include" clause, you'd likely get pip install commands, code snippets, and a paragraph comparing BeautifulSoup to Scrapy. The negative constraint cuts out the noise before it starts.
Context — Giving the Model What It Needs
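The context-free example this section opens with is not shown in this excerpt. It would be a question like the one below (wording is illustrative), sent with the same client pattern used throughout. With no code attached, the model can only reply with generic advice about profiling and optimization.

```python
# A question with zero context: the model has nothing concrete to analyze.
NO_CONTEXT_PROMPT = "Why is my Python code slow, and how do I fix it?"
```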
That's a useless answer because the model has no idea which code you're talking about. Context is the background information the model needs to give a relevant response — your data, your code, your situation, your constraints.
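The with-context version referenced next is also missing from this excerpt. The sketch below reconstructs it from the description: a loop that calls sorted(data) on every row of a 500,000-row list. The function body itself is illustrative.

```python
SLOW_FUNCTION = '''
def flag_top_rows(data):
    flagged = []
    for row in data:
        if row in sorted(data)[-10:]:
            flagged.append(row)
    return flagged
'''

# Same question as before, now with the code and the data size included.
CONTEXT_PROMPT = (
    "This function runs on a list of 500,000 rows and takes minutes "
    "to finish. Identify the performance problem and suggest a fix:\n"
    + SLOW_FUNCTION
)
```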
The model immediately sees that sorted(data) runs inside the loop — re-sorting the entire list on every row. With 500,000 rows, that's catastrophic. It couldn't have told you that without seeing your code and knowing the data size.
Context comes in many forms: code snippets, error messages, data samples, business requirements, user profiles, or even prior conversation. The key question is always the same: "Does the model have everything it needs to give me a useful answer?"
Format — Controlling How the Answer Looks
This is the building block most people skip — and the one that saves the most time. Without a format specification, the model picks its own structure. Sometimes that's fine. But if you need JSON for your pipeline, or a markdown table for your report, or exactly three bullet points for a Slack message, you need to say so.
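The two format-specified calls referenced below are missing from this excerpt. Here is an illustrative pair (question and column/key names are mine): the same question asked twice, once for human eyes and once for a program.

```python
QUESTION = "What are the 3 largest planets in the solar system?"

# For a human reader: a markdown table.
TABLE_PROMPT = (
    QUESTION + " Answer as a markdown table with columns "
    "'Planet' and 'Diameter (km)'."
)

# For a program: machine-parseable JSON, nothing else.
JSON_PROMPT = (
    QUESTION + " Answer as a JSON array of objects with keys "
    "'planet' and 'diameter_km'. Return ONLY the JSON."
)
```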
Same question, completely different output shapes. The table is great for human reading. The JSON is great for feeding into another program. I've found that specifying format is the single highest-leverage prompt improvement for production applications — it turns unstructured AI output into something your code can parse reliably.
Role — Setting the Model's Perspective
Here's something that surprised me early on: asking the model to "act as" a specific expert genuinely changes the quality of its answers. It's not just a gimmick. When you assign a role, the model draws on patterns from that domain — the vocabulary, the reasoning style, the depth of analysis.
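The role-comparison example discussed next is not shown in this excerpt. The sketch below reconstructs its shape: the same small function reviewed under two different roles. The function and role wording are illustrative.

```python
CODE = '''
def calculate(a, b, op):
    if op == "add":
        return a + b
    if op == "div":
        return a / b
'''

# Role 1: a teacher, who will focus on readability and learning.
TEACHER_PROMPT = (
    "You are a patient programming teacher helping a beginner. "
    "Review this function and suggest improvements:\n" + CODE
)

# Role 2: an engineer, who will focus on edge cases and robustness.
ENGINEER_PROMPT = (
    "You are a senior software engineer reviewing code before a "
    "production deploy. Review this function and suggest "
    "improvements:\n" + CODE
)
```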
The teacher focuses on readability and learning — renaming variables, adding docstrings, explaining why. The engineer focuses on edge cases and robustness — division by zero, invalid operations, error handling. Neither response is wrong, but they serve completely different purposes.
When should you use a role? My rule of thumb: if you need domain-specific vocabulary, reasoning, or tone, assign a role. If you just need a factual answer, skip it. Telling the model to "act as a Python expert" before asking "what does len() do" adds nothing.
Practice: Build a Structured Prompt
Time to put the four building blocks together. This exercise tests whether you can identify and combine instruction, context, format, and role into a well-structured prompt.
Write a function build_prompt(role, context, instruction, output_format) that assembles a prompt string from the four building blocks.
Rules:
- If role is provided (non-empty string), it should appear first, on its own line
- instruction always appears next, on its own line
- If context is provided, it appears after the instruction, prefixed with "Context: "
- If output_format is provided, it appears last, prefixed with "Format: "

A Prompt Quality Rubric You Can Actually Use
How do you know if a prompt is good before you send it? I use a simple rubric that scores prompts on five criteria. Each criterion gets a score from 0 to 2. A prompt scoring 8 or above out of 10 almost always produces solid results.
Here's the rubric:

| Criterion | 0 points | 1 point | 2 points |
| --- | --- | --- | --- |
| Specificity | Vague ("tell me about X") | Somewhat specific | Precise action verb + scope |
| Scope | Unbounded (no length/count limit) | Partial constraint | Clear boundaries (word count, item count) |
| Format | No format specified | Implicit format | Explicit format (JSON, table, bullets) |
| Context | No relevant context | Some context | All necessary context included |
| Testability | Can't tell if output is good | Partially checkable | Clear pass/fail criteria |
Let's score a few prompts with this rubric to build your intuition:
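The scoring helper itself is missing from this excerpt. Below is a sketch of what such a helper might look like: a rough keyword heuristic over the five rubric criteria, 0-2 points each. The keyword lists and thresholds are mine and purely illustrative.

```python
def score_prompt(prompt: str) -> dict:
    """Rough heuristic scorer over the five rubric criteria (0-2 each)."""
    p = prompt.lower()
    first_word = p.split()[0] if p.split() else ""
    scores = {}
    # Specificity: starts with a precise action verb
    scores["specificity"] = 2 if first_word in {
        "list", "explain", "rewrite", "summarize", "extract", "compare"
    } else 0
    # Scope: an explicit length or count limit
    scores["scope"] = 2 if any(c.isdigit() for c in p) or "under" in p else 0
    # Format: a named output structure
    scores["format"] = 2 if any(w in p for w in ("json", "table", "bullet")) else 0
    # Context: crude proxy, longer prompts usually carry material to work with
    scores["context"] = 2 if len(prompt) > 200 else 1 if len(prompt) > 80 else 0
    # Testability: words that make the output checkable
    scores["testability"] = 2 if "exactly" in p or "only" in p else 0
    scores["total"] = sum(scores.values())
    return scores
```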
The scoring function is just a visualization helper — the real value is the mental rubric. Before sending any important prompt, run through the five criteria in your head. If your total is below 5, rewrite before hitting send.
Practice: Score and Improve a Prompt
Now that you have the rubric, let's use it to evaluate and improve prompts programmatically.
Write a function evaluate_prompt(prompt_text) that scores a prompt and returns a dictionary with the total score and a list of suggestions.
Scoring rules:
Return a dict with keys "specificity", "scope", "format_score", "total", and "suggestions" (a list of strings, one per criterion that scored 0).
The Five Most Common Prompt Mistakes
After writing hundreds of prompts — and reviewing many more in production codebases — I've found the same five mistakes account for the vast majority of bad LLM output. You can now use the rubric to diagnose each one.
Mistake 1: Asking Two Questions at Once
```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "What is a Python generator and when should "
        "I use it instead of a list comprehension and "
        "what are the performance implications?"
    }],
)
print(response.choices[0].message.content[:400])
```

```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "When should I use a Python generator instead "
        "of a list comprehension? Give 3 specific "
        "scenarios with a one-sentence explanation each."
    }],
)
print(response.choices[0].message.content)
```

The "before" prompt is really three separate questions stitched together with "and." The model tries to answer all three at once and does a mediocre job on each. One clear question per prompt. If you need three answers, make three calls — the cost is fractions of a cent.
Mistake 2: No Output Constraints
Without length or format constraints, LLMs default to verbose. They'll write 500 words when you needed 50. Every production prompt I write includes at least one constraint — a word limit, a sentence count, or a structural format.
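A minimal illustration (the prompt wording is mine, not from the original): the same request with and without constraints.

```python
# Unconstrained: invites a 500-word essay.
UNCONSTRAINED = "Explain how Python virtual environments work"

# Constrained: bounds both length and structure.
CONSTRAINED = (
    "Explain how Python virtual environments work in at most 4 "
    "sentences, then give exactly 2 bullet points: one on creating "
    "an environment and one on activating it."
)
```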
Mistake 3: Ambiguous Action Verbs
"Tell me about," "discuss," and "talk about" are the worst prompt starters. They give the model zero direction on scope, depth, or structure. Watch the difference a single verb swap makes:
```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Discuss Python testing"
    }],
)
print(response.choices[0].message.content[:300])
```

```python
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "List the 3 most popular Python testing "
        "frameworks and state what each one is "
        "best at in one sentence."
    }],
)
print(response.choices[0].message.content)
```

Mistake 4: Ignoring the System Message
If you're building an application — not just chatting — the system message is where persistent behavior rules belong. Stuffing role instructions and formatting rules into every user message is wasteful and error-prone. Set them once in the system message.
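A sketch of that split (the reviewer rules and JSON keys here are illustrative): persistent behavior lives in the system message, so each user message stays short.

```python
# Set once: role and formatting rules for every request.
SYSTEM = (
    "You are a Python code reviewer. Always respond with a JSON array "
    "of issues. Each issue has 'severity' and 'suggestion' keys. "
    "Return ONLY the JSON."
)

def build_messages(code_snippet: str) -> list[dict]:
    # The user message carries only the per-request payload.
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Review this function:\n" + code_snippet},
    ]
```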
Mistake 5: Not Iterating
The biggest mistake is treating a prompt as a one-shot effort. Professional prompt engineers iterate: try, evaluate the output, adjust one thing, try again. I usually go through 3-5 iterations before settling on a prompt for production use. The rubric from the previous section makes this process systematic — score your prompt, fix the weakest criterion, and re-run.
Putting It All Together — A Real-World Prompt Rewrite
Let's walk through a realistic scenario. You're building an app that reviews Python functions and suggests improvements. Your first attempt at a prompt produces inconsistent results. Here's how you'd iterate to fix it.
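The three iterations themselves are not shown in this excerpt. Below is a sketch of what the final version might look like, assembled from the description that follows; the function under review and the JSON key names are illustrative.

```python
FUNCTION = '''
def get_user(users, user_id):
    for u in users:
        if u["id"] == user_id:
            return u
'''

# Role + context + specific instruction + explicit, testable format.
REVIEW_PROMPT = (
    "You are a senior Python developer reviewing code for a teammate.\n\n"
    "Find issues in this function in three categories: correctness, "
    "readability, and performance:\n"
    + FUNCTION +
    "\nReturn a JSON array. Each issue must be an object with keys "
    "'category', 'description', and 'suggested_fix'. Return ONLY the JSON."
)
```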
Version 3 hits nearly every rubric criterion. It has a role (senior developer), context (the actual function), a specific instruction (find issues in three categories), an explicit format (JSON array), and clear testability (you can parse the JSON and check each issue has the required keys).
Three Prompt Patterns Worth Memorizing
Before we wrap up, here are three prompt structures I reach for constantly. They handle the most common use cases in day-to-day work with LLMs.
Pattern 1: The Structured Extractor
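No example for this pattern survives in this excerpt. A typical structured-extractor prompt might look like this (field names and sample text are illustrative): name the fields, fix the output shape, and handle missing values explicitly.

```python
EXTRACTOR_PROMPT = """Extract the following fields from the text below.
Return a JSON object with exactly these keys: "name", "email", "company".
Use null for any field that is not present. Return ONLY the JSON.

Text:
Hi, I'm Dana Reyes from Acme Corp. You can reach me at dana@acme.example."""
```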
Pattern 2: The Step-by-Step Analyzer
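Again, no example survives here; a step-by-step analyzer prompt typically enumerates the analysis stages in order (the steps and sample function below are illustrative):

```python
ANALYZER_PROMPT = """Analyze this function step by step:
1. First, describe what the function does in one sentence.
2. Then, list any bugs or edge cases it misses.
3. Finally, recommend the single highest-impact fix.

def average(nums):
    return sum(nums) / len(nums)"""
```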
Pattern 3: The Constrained Generator
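A constrained-generator prompt pins down the count, the schema, and what must not appear (the schema below is illustrative):

```python
GENERATOR_PROMPT = (
    "Generate exactly 5 realistic test user records as a JSON array. "
    "Each record has keys 'id' (int), 'name' (string), and 'email' "
    "(a string ending in @example.com). Do NOT include any text "
    "outside the JSON array."
)
```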
Each pattern follows the same logic: clear instruction, relevant context, explicit constraints, and a defined output shape. Once you internalize this formula, writing effective prompts becomes almost automatic.
Summary
Every effective prompt is built from four building blocks: instruction (what to do), context (what to work with), format (how to present the answer), and role (whose perspective to take). Not every prompt needs all four — but when your output is wrong, one of these four is usually the fix.
The prompt quality rubric gives you a quick way to evaluate any prompt before sending it: score it on specificity, scope, format, context, and testability. A prompt scoring 8+ out of 10 almost always produces useful results. Below that, iterate — adjust one criterion at a time until the output is consistently what you need.
The five mistakes to avoid: asking multiple questions at once, leaving output unconstrained, using vague verbs, ignoring the system message, and failing to iterate. Fix those five habits and you're ahead of most LLM users.
The next tutorial covers zero-shot and few-shot prompting — techniques for getting LLMs to classify, extract, and generate data without any training, just by carefully structuring your prompts with examples.
Frequently Asked Questions
Do these prompt techniques work with all LLMs, not just GPT?
Yes. The four building blocks — instruction, context, format, role — work across every major LLM: GPT-4, Claude, Gemini, Llama, Mistral. The fundamentals are universal because all these models predict text continuations. Specific formatting quirks vary (Claude handles XML tags well; GPT handles JSON well), but the structural principles transfer directly.
How long should a prompt be?
As long as it needs to be, and not a word more. A simple factual question works in one sentence. A complex code review prompt with context, constraints, and format might be 200 words. The right length is the minimum needed to reliably produce the output you want. If a shorter prompt gets the same results, use the shorter one — it's faster and cheaper.
Should I always use a system message?
For one-off questions, it doesn't matter much — put everything in the user message. For applications where the model's behavior should be consistent across many inputs (a chatbot, a code reviewer, a data extractor), the system message is the right place for role and formatting rules. It sets the behavior once instead of repeating it in every user message.