AI Coding Best Practices for Today
Why This Document Exists
AI coding assistants like GitHub Copilot, Claude, Cursor, and others can significantly accelerate development. However, research shows they also introduce systematic risks:
- 63% more code smells in AI-generated code compared to human-written code
- 25-38% of AI suggestions use deprecated APIs due to training data cutoffs
- Performance degrades significantly as context windows fill up
This guide helps you capture the productivity benefits while avoiding the quality pitfalls. It’s grounded in peer-reviewed research from 2024-2026 and experience from teams at Anthropic, GitHub, and across the industry.
1. Introduction and Philosophy
What AI Coding Assistants Are (and Aren’t)
Think of AI coding assistants as an over-confident pair programming partner. They’re fast at looking things up, excellent at executing tedious tasks, and can write code that looks correct. But they make mistakes—and those mistakes can be subtle and “deeply inhuman” in ways you won’t expect.
“The mental model I use: LLMs are an over-confident pair programming assistant who’s fast at lookups and executing tedious tasks, but makes mistakes.” — Simon Willison
Core Principles
1. You own all AI-generated code.
The moment you commit code, it’s yours—regardless of who or what wrote it. You’re responsible for its correctness, security, and maintainability. Never commit code you don’t understand.
2. AI is an assistant, not a replacement.
AI accelerates your work; it doesn’t replace your judgment. Your experience, architectural knowledge, and understanding of requirements are irreplaceable.
3. Start simple, add complexity only when needed.
Don’t reach for sophisticated AI workflows when a simple prompt works. Many tasks need just a direct request, not an elaborate agent setup.
4. Verify everything.
AI can hallucinate APIs that don’t exist, introduce security vulnerabilities, and miss edge cases. Trust, but verify—with tests, linters, and your own review.
2. The Cardinal Rule: Read All Code Before Committing
Mandatory practice: Engineers must read every line of AI-generated code at least once before committing.
This is the single most important practice in this document. No exceptions.
Many developers resist reading AI output because it’s so easy to regenerate. But for at least the next 12-18 months, I believe understanding what your code is actually doing under the hood is far too important to skip.
Why This Matters
Learning: Reading code builds your understanding of patterns, idioms, and the codebase itself. Even if the code is correct, you learn by reading it.
Complexity Awareness: When bugs appear later (and they will), you’ll need mental models of the code to debug effectively. If you never read the code, you’ll struggle to fix it when AI assistance fails.
Ownership: You cannot truly own code you haven’t read. Blind commits transfer liability without understanding. When someone asks “why did you do it this way?”, “the AI wrote it” is not an acceptable answer.
Bug Detection: AI introduces subtle bugs that only human review catches:
- Race conditions in concurrent code
- Logical errors that pass type checking
- Security vulnerabilities that look functional
- Edge cases the AI never considered
What “Reading” Means
Reading doesn’t mean glancing at the code. It means:
- Understanding what each function does
- Tracing the data flow
- Considering edge cases
- Asking yourself: “Would I have written it this way? If not, why?”
- Verifying the code matches your intended requirements
If you can’t explain what the code does to a colleague, you haven’t read it thoroughly enough.
3. The Planning-First Workflow
The Problem with “Just Code It”
When you ask AI to implement something immediately, you risk:
- Solving the wrong problem (AI misunderstood your requirements)
- Implementing a suboptimal approach (AI didn’t know about existing patterns)
- Creating code that doesn’t fit your architecture
The Solution: Explore → Plan → Implement → Verify
This workflow, recommended by Anthropic, separates research from execution.
Phase 1: Explore (Plan Mode - Read Only)
Before making any changes, research the codebase:
```
I want to add user authentication. Before we implement anything:
- What authentication patterns already exist in this codebase?
- What libraries are we already using?
- Are there any relevant configuration files?
- Which libraries seem resilient and implement the best practices for this today?
```
Let the AI read files and answer questions. Don’t let it make changes yet.
Result: Understanding of the current state and constraints.
Phase 2: Plan
Ask the AI to create a detailed implementation plan:
```
Based on your research, create a plan for implementing OAuth login:
- Which files need to change?
- What new files should we create?
- What's the sequence of changes?
- What edge cases should we handle?
```
Review this plan yourself. Edit it. Challenge assumptions. This is your specification. If anything important isn’t spelled out, tell the model what to do or ask for that concern to be addressed in the plan.
Result: A human-reviewed plan you can reference later.
Phase 3: Implement (Fresh Context)
Start a new session or clear your context (/clear in many tools). Then implement:
```
Implement the OAuth login flow according to this plan:
[paste or reference your reviewed plan]

Start with step 1: creating the authentication service.
```
Execute against your plan, not the exploratory conversation. This keeps context clean.
Result: Clean implementation without research noise.
Phase 4: Verify
- Run your test suite
- Run linters and type checkers
- Manually test the feature
- Have a human review the changes
Result: Confidence the implementation is correct.
When to Skip Planning
For simple, well-understood tasks, skip straight to implementation:
- Fixing a typo
- Adding a log line
- Renaming a variable
- Implementing a function with a clear specification
If you could describe the diff in one sentence, you probably don’t need a plan. You might also benefit from not using a coding agent.
4. Writing Effective Prompts
The Basics
Be specific, not vague:
| Instead of… | Write… |
|---|---|
| “Add tests for foo.py” | “Write tests for the validate_email function in foo.py, covering: valid emails, invalid format, empty string, and None input” |
| “Fix the login bug” | “Users report that login fails after session timeout. The issue is likely in src/auth/. Write a failing test that reproduces the issue, then fix it” |
| “Make the dashboard look better” | “[paste screenshot] Implement this design for the dashboard. Use our existing component library in src/components/” |
Provide verification criteria:
Tell the AI how to verify its work:
```
Implement a rate limiter for our API. Requirements:
- Max 100 requests per minute per user
- Returns 429 status when exceeded
- Write tests that verify the limit is enforced
- Run the tests after implementing
```
Reference existing patterns:
```
Add a new UserSettings component following the same patterns as
ProfileSettings in src/components/ProfileSettings.tsx.
Use our existing form components and validation approach.
```
Break complex tasks into steps:
Instead of “Build a complete user registration system,” break it down:
1. “Create the database schema for users”
2. “Implement the registration endpoint”
3. “Add email verification”
4. “Create the frontend form”
What to Include in Your Prompts
- Context: What files are relevant? What’s the current state?
- Requirements: What exactly should the code do?
- Constraints: What libraries to use? What patterns to follow?
- Verification: How will you know it works?
- Examples: Sample inputs/outputs if applicable
What NOT to Do
- Don’t assume the AI knows your codebase’s conventions
- Don’t accept the first suggestion without review
- Don’t skip verification because the code “looks right”
- Don’t let the AI decide your architecture
5. Code Review Standards for AI-Generated Code
The Layered Review Approach
AI-generated code requires the same (or more) scrutiny as human-written code.
Layer 1: Automated Checks (First)
- Linters and formatters
- Type checkers
- Unit tests
- Static analysis (CodeQL, Semgrep)
- Security scanners (Dependabot, Snyk)
Run these before any human looks at the code. They catch obvious issues quickly.
Layer 2: Self-Review (Second)
- Did you read every line? (See Section 2)
- Does the code match your requirements?
- Are there any TODO comments or incomplete implementations?
- Were any implementations fully mocked to look like they worked?
- Do you understand why each decision was made?
Layer 3: Peer Review (Third)
- Another human reviews the code
- Focus on architectural fit and business logic
- Check for edge cases the AI might have missed
- Verify the code is maintainable
What Human Reviewers Should Focus On
Automated tools catch syntax and known vulnerability patterns. Humans should focus on:
- Architectural alignment: Does this fit our system design?
- Business logic correctness: Does it actually solve the problem?
- Edge cases: What happens with unusual inputs?
- Maintainability: Will this be easy to modify later?
- Security assumptions: Is the threat model appropriate?
Red Flags in AI-Generated Code
Watch for these patterns during review:
- `// TODO:` or `// FIXME:` comments (incomplete code)
- `throw new NotImplementedError()` or similar stubs
- Hardcoded values that should be configurable
- Missing error handling for edge cases
- Overly complex solutions to simple problems
- Code that doesn’t match existing patterns in the codebase
- Loud, insecure logging of sensitive content
Keep Pull Requests Small
Even though AI can generate code quickly, keep PRs small and focused:
- Easier to review thoroughly
- Easier to identify AI-introduced issues
- Easier to revert if problems emerge
6. Maintaining Developer Skills
The Skill Atrophy Risk
Research from Microsoft and Carnegie Mellon (2025) found that increased reliance on AI tools correlates with decreased critical thinking engagement. The more you delegate to AI, the harder it becomes to summon those skills when needed.
This is similar to how GPS navigation eroded wayfinding abilities—convenient in the moment, but problematic when you need to navigate without it.
How to Keep Your Skills Sharp
1. Write Code Yourself First
Before asking AI for help with a new concept, try writing it yourself. Struggle with it. Make mistakes. Then use AI to compare approaches or fill gaps.
2. Understand Before Accepting
Never blindly accept AI suggestions. When AI proposes code:
- Read it (see Section 2)
- Understand why it works
- Consider if you’d write it differently
- Ask the AI to explain (in ask mode) if something is unclear
3. Use Teaching Mode
Instead of asking AI to generate code, ask it to explain concepts:
```
Explain how rate limiting works with examples.
Don't write the code—I want to understand the concepts first.
```
Most coding agents have an Ask mode intended just for this purpose.
4. Code Review as Learning
Treat reviewing AI output as a learning opportunity. Ask yourself:
- What patterns is the AI using?
- Are there approaches I hadn’t considered?
- What would I have done differently?
5. Disable Inline Suggestions Sometimes
During learning-focused sessions, turn off AI autocomplete. Write the code yourself. Use AI chat as a tutor that explains concepts, not as a code generator.
The Key Insight
The mandatory code reading from Section 2 isn’t just about quality—it’s about professional development. The same practice that catches bugs also builds the mental models you need for future debugging.
When AI assistance fails (and it will), your own skills are what you fall back on.
7. AI Mistake Catalog
The 9 Systematic Failure Patterns
AI-generated code fails in predictable patterns. Learning to recognize these makes debugging faster.
1. Hallucinated APIs
What it looks like: The AI suggests importing a package or calling a method that doesn’t exist. It looks plausible but fails at runtime.
Why it happens: AI models learn patterns, not facts. They’ve seen thousands of imports and method calls, so they generate similar-looking code—even when it’s fictional.
How to catch it:
- Read the code before running it whenever new dependencies are added
- Verify unfamiliar imports: read library reviews and check third-party references
- Watch for methods that seem “too convenient”
⚠️ Warning: Hackers have been known to name-squat hallucinated library names with malicious code. Hallucinated APIs can do far worse than simply not work as intended. They can put nation-state hostile actors on your machine!
Example:
```python
# AI suggests this, but the method doesn't exist
from datetime import datetime

result = datetime.parse_flexible("2024-01-15")  # No such method!
```
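For comparison, these are the real standard-library calls for parsing a date string (both exist in Python 3.7+); knowing them makes hallucinated conveniences stand out:
```python
from datetime import datetime

# Real stdlib APIs for parsing a date string
dt1 = datetime.strptime("2024-01-15", "%Y-%m-%d")
dt2 = datetime.fromisoformat("2024-01-15")
```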
2. Security Vulnerabilities That Look Functional
What it looks like: Code that works correctly but has security flaws. The tests pass, but it’s exploitable.
Why it happens: AI optimizes for making code work, not for security. It generates patterns that are common in training data—including insecure patterns in overly simplified examples.
How to catch it:
- Run security scanners (CodeQL, Semgrep)
- Check for SQL injection, XSS, missing authentication
- Look for hardcoded credentials
- Review error messages for information leakage
Example:
```python
# Works, but SQL injection vulnerability
def get_user(username):
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return db.execute(query)
```
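A minimal sketch of the safe version, assuming the same `db` handle with a DB-API style `execute` (placeholder syntax varies by driver: `?` for sqlite3, `%s` for psycopg2):
```python
# Parameterized query: the driver passes the value separately,
# so user input can never change the structure of the SQL
def get_user(username):
    query = "SELECT * FROM users WHERE name = ?"
    return db.execute(query, (username,))
```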
3. Performance Anti-Patterns
What it looks like: Code that’s correct but slow. Often involves inefficient algorithms or data structures.
Why it happens: AI generates code that works at small scale. It doesn’t optimize for production loads.
How to catch it:
- Look for nested loops (O(n²) where O(n) is possible)
- Check for string concatenation in loops
- Profile performance-critical code
- Test with realistic data volumes
Example:
# O(n²) when O(n) is possible
def has_duplicates(items):
for i, item in enumerate(items):
for j, other in enumerate(items):
if i != j and item == other:
return True
return False
# Better: use a set for O(n)
def has_duplicates(items):
return len(items) != len(set(items))
4. Error Handling That Assumes Happy Paths
What it looks like: Try-catch blocks that don’t actually handle errors or missing validation for inputs.
Why it happens: Training data mostly contains happy-path code. Error handling is often missing or incomplete in examples the AI learned from.
How to catch it:
- Look for empty catch blocks
- Check if errors are logged but not handled
- Verify all inputs are validated
- Test with null, empty, and invalid inputs
Example:
```python
# Catches the error but doesn't handle it
try:
    result = risky_operation()
except Exception as e:
    print(f"Error: {e}")  # Then what?
return result  # Unbound (NameError) if risky_operation() raised!
```
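A hedged sketch of a more honest version (`risky_operation` is the same hypothetical helper as above): catch only the failures you can meaningfully handle, log them, and return a defined fallback, while letting everything else propagate.
```python
import logging

logger = logging.getLogger(__name__)

def fetch_value():
    try:
        return risky_operation()
    except ConnectionError:
        # A failure mode we expect and can recover from:
        # log it and return a defined fallback value
        logger.exception("risky_operation failed; using fallback")
        return None
    # Any other exception propagates to a caller that can actually handle it
```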
5. Missing Edge Cases
What it looks like: Code that works for typical inputs but fails on boundaries: empty arrays, null values, maximum integers, Unicode characters.
Why it happens: Training data over-represents common cases. Edge cases appear less frequently, so the AI is less likely to handle them.
How to catch it:
- Test with empty inputs
- Test with null/None values
- Test with boundary values (0, -1, MAX_INT)
- Test with special characters and Unicode
Example:
```python
# Fails on empty list
def get_average(numbers):
    return sum(numbers) / len(numbers)  # ZeroDivisionError!
```
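A minimal guarded version; whether to raise or return a default for empty input is a requirements decision:
```python
def get_average(numbers):
    if not numbers:
        raise ValueError("cannot average an empty sequence")
    return sum(numbers) / len(numbers)
```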
6. Outdated Library Usage
What it looks like: Code uses deprecated APIs or old patterns that have been replaced with better alternatives.
Why it happens: AI training data has cutoff dates (often late 2023). Libraries evolve faster than AI training updates. Research shows 25-38% of AI suggestions use deprecated APIs.
How to catch it:
- Check documentation for deprecation warnings
- Verify package versions against the latest releases
- Run with deprecation warnings enabled
- Watch for security advisories on old versions
💡 Tip: Instead of catching it retroactively, suggest known, excellent libraries and APIs during the planning phase.
Example:
```python
# Old pattern (deprecated since Python 3.9, PEP 585)
from typing import Dict, List

def process(items: List[Dict[str, int]]) -> None: ...

# Modern pattern
def process(items: list[dict[str, int]]) -> None: ...
```
7. Data Model Mismatches
What it looks like: Code assumes data structures that don’t match actual APIs or database schemas. Fails at runtime when real data is used.
Why it happens: AI doesn’t have access to your actual schemas. It guesses based on variable names and context.
How to catch it:
- Verify field names against actual schemas
- Test with real data from your systems
- Check API documentation for response formats
- Use TypeScript/type hints to catch mismatches early
💡 Tip: Always pass in real data and data structures, if you have them, BEFORE implementing code.
Example:
```typescript
// AI assumes this structure
interface User {
  username: string;
  email: string;
}

// Actual API returns this
interface User {
  user_name: string;      // Different field name!
  email_address: string;
}
```
8. Missing Context Dependencies
What it looks like: Code assumes environment variables, configurations, or services that don’t exist in all environments.
Why it happens: AI generates code based on immediate context. It doesn’t know about your deployment configurations.
How to catch it:
- Check for hardcoded environment assumptions
- Verify all configuration values exist
- Test in staging environments
- Document required dependencies
Example:
```python
# Assumes DATABASE_URL exists
import os

db_url = os.environ["DATABASE_URL"]  # KeyError in some environments!

# Better: provide a default or fail explicitly
db_url = os.environ.get("DATABASE_URL")
if not db_url:
    raise EnvironmentError("DATABASE_URL must be set")
```
9. Race Conditions and Concurrency Issues
AI models particularly struggle with concurrent programming.
Watch for:
- Shared mutable state without synchronization (locks, mutexes)
- Async operations without proper awaiting
- Resource cleanup in the wrong order
- Lock acquisition without guaranteed release
- Time-of-check to time-of-use (TOCTOU) bugs
- Inappropriate locking methods used with lock-busting coroutines (synchronized)
- Hardcoded delays used as race condition “fixes”
Example of a race condition:
```python
# Not thread-safe: check and update are separate operations
class Counter:
    def __init__(self):
        self.count = 0

    def increment_if_below(self, limit):
        if self.count < limit:   # Check
            self.count += 1      # Update - another thread could have changed count!
```
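A sketch of the thread-safe fix using the standard library’s threading.Lock, which makes the check and the update a single atomic step:
```python
import threading

class Counter:
    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def increment_if_below(self, limit):
        # Holding the lock makes check-and-update atomic
        with self._lock:
            if self.count < limit:
                self.count += 1
                return True
            return False
```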
Common Footguns
- Hardcoded delays: Race conditions need real solutions, not bandaids that break with the next change or a slower connection
- Implicit type coercion: `"5" + 3` behaves differently across languages (see the example after this list)
- Null pointer exceptions: Unvalidated inputs accessed without checks
- Off-by-one errors: Loop boundaries wrong by one
- Resource leaks: File handles or connections not closed
- Infinite loops: Incorrect termination conditions
- Silent failures: Exceptions caught but not handled or logged
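For instance, Python surfaces the coercion footgun as an exception, while JavaScript silently concatenates:
```python
# Python refuses implicit coercion between str and int
try:
    result = "5" + 3
except TypeError as exc:
    print(f"TypeError: {exc}")  # can only concatenate str (not "int") to str

# JavaScript, by contrast, evaluates "5" + 3 to the string "53"
```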
Deprecated APIs and Outdated Code Patterns
High-risk areas for outdated code:
- Cryptography (algorithms deprecated for security reasons)
- Authentication/authorization libraries
- Web frameworks (breaking changes between major versions)
- Database drivers and ORMs
- Cloud SDK clients (AWS, GCP, Azure APIs change frequently)
Verification question: “When was this method/API introduced and is it still the recommended approach?”
Team practice: Always verify AI-suggested library code against current documentation.
It often helps to directly link the most recent API documentation to the coding agent.
TODOs, Mocks, and Incomplete Implementations
AI systematically leaves incomplete code that can slip into production.
Patterns to search for before committing:
```
TODO
FIXME
XXX
"not implemented"
"placeholder"
"implement this"
```
Set up pre-commit hooks: Use a tool like lefthook to enforce checks that detect these patterns before commit.
Team policy: If a TODO has no ticket number, assume it was added by AI and block the commit via hook. Do not commit TODOs without ticket numbers to production code!
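A hedged sketch of such a check as a standalone script; the ticket format TODO(PROJ-123) and the file extensions are assumptions, so adapt the regex to your tracker’s convention:
```python
#!/usr/bin/env python3
"""Pre-commit check: block TODO/FIXME/XXX markers without a ticket number."""
import re
import subprocess
import sys

# Flags TODO/FIXME/XXX unless immediately followed by a ticket like (PROJ-123)
MARKER = re.compile(r"\b(TODO|FIXME|XXX)\b(?!\(\w+-\d+\))")

def staged_files():
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith((".py", ".ts", ".js"))]

def main():
    failures = []
    for path in staged_files():
        try:
            with open(path, encoding="utf-8", errors="ignore") as fh:
                lines = fh.read().splitlines()
        except OSError:
            continue  # unreadable file; skip rather than crash the hook
        for lineno, line in enumerate(lines, 1):
            if MARKER.search(line):
                failures.append(f"{path}:{lineno}: {line.strip()}")
    if failures:
        print("Blocked: TODO/FIXME without a ticket number:")
        print("\n".join(failures))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```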
8. Avoiding “AI Slop” - Quality Standards
What is “AI Slop”?
“AI slop” refers to low-quality, generic AI-generated content that:
- Uses templated, surface-level approaches
- Lacks genuine insight or fit to your specific problem
- Shows minimal human editing or oversight
- Is harder to refactor than to rewrite
Characteristics of AI Slop in Code
- Generic implementations that don’t match your architecture
- Overly verbose code when simpler solutions exist
- Missing context about your specific requirements
- Boilerplate that doesn’t integrate with existing patterns
- Code comments that state the obvious rather than explain why
Quality Practices
Treat AI prompts like code: Version them, review them, iterate on them. Good prompts lead to better output.
Verify against your own knowledge: You know your architecture. If the AI’s approach conflicts with it, trust yourself.
Don’t accept first drafts that just “seem to work”: Ask for improvements. Request alternatives. Challenge the AI’s assumptions.
The Verification Hierarchy
Critical: Do NOT rely on the same model to review its own output.
Research shows LLMs exhibit “self-preference bias”—they score their own outputs higher and struggle to recognize their own errors. They also hallucinate mistakes when asked to spot their errors.
Correct verification order:
1. Objective tests first: Automated, deterministic checks (unit tests, linters, type checkers). These cannot hallucinate.
2. LLM-as-judge from a DIFFERENT model: If using AI for review, use a different model than the one that generated the code. This avoids self-confirmation bias.
3. Human review: Final verification by humans who understand the requirements.
Why this matters: Same-model self-review can hallucinate problems that don’t exist OR miss real problems while confidently approving flawed code.
9. Security Considerations
Assume AI Code Contains Vulnerabilities
Research shows:
- 45% of AI-generated code contains security vulnerabilities
- AI code is 2.74x more prone to XSS vulnerabilities
Security Checklist for AI-Generated Code
- Run static analysis security tools (CodeQL, Semgrep)
- Check for hardcoded secrets or credentials
- Verify input validation exists for all user inputs
- Check for SQL injection in database queries
- Verify authentication/authorization is properly implemented
- Check error messages don’t leak sensitive information
- Verify dependencies don’t have known vulnerabilities
- Check for proper HTTPS/TLS usage
- Verify file operations don’t allow path traversal
High-Risk Areas
Extra scrutiny for AI-generated code in:
- Authentication and authorization
- Payment processing
- Personal data handling
- API endpoints accessible from the internet
- File upload/download functionality
- Cryptographic operations
Never Trust AI for Security-Critical Decisions
For security-critical code:
1. Write it yourself or closely supervise the AI
2. Have security-focused peer review
3. Consider a professional security audit for high-stakes systems
Team policy: Add a merge request template for project code reviews with a checklist appropriate to that kind of application.
While you can ask another coding agent to check for specific vulnerabilities, you should never assume that is a sufficient level of review. It’s a good first step only.
MCP Security Best Practices
MCP tools are a significant organizational security risk that has no simple solution. It’s best to only use what’s absolutely necessary.
1. Vet MCP servers before installing
- Only use MCP servers from trusted, known sources
- Check for recent security audits or vulnerability reports
- Prefer official servers from major vendors (but even these have had vulnerabilities)
2. Minimize MCP server permissions
- Only grant the minimum permissions needed
- Don’t connect MCP servers to sensitive systems
- Use read-only access where possible
3. Isolate MCP servers
- Run MCP servers in containers or sandboxed environments
- Limit network access to only required endpoints
4. Monitor MCP activity
- Log all MCP tool invocations
- Review what data is being accessed
- Watch for unexpected behavior
5. Keep servers updated
- MCP servers must be actively patched for vulnerabilities
- Update frequently
- Remove unused servers
10. Agent Rules vs. Documentation
Understanding the Difference
Agent Rules (like CLAUDE.md, .cursorrules): Loaded into every AI interaction. Always in context.
Documentation: Files the AI can search and read when needed. Not always loaded.
Best Practices for Rules Files
Rules files consume context tokens on every interaction. Keep them lean.
Include:
- Build/test commands the AI can’t guess
- Coding style rules that differ from defaults
- Architectural decisions specific to your project
- Common gotchas the AI keeps getting wrong
Exclude:
- Standard language conventions (AI already knows these)
- Things your linter catches
- Long explanations (link to docs instead)
- Anything the AI infers correctly from code
Example CLAUDE.md:
```markdown
# Code Style
- Use ES modules (import/export), not CommonJS (require)
- Prefer async/await over .then() chains

# Workflow
- Run `npm test` after making changes
- Run `npm run lint:fix` before committing

# Architecture
- API routes are in src/routes/, follow existing patterns
- All database access goes through src/db/client.ts
```
The Stale Documentation Problem
Warning: AI confidently uses outdated documentation. If your docs are out of date, AI will generate wrong code based on them.
Best practices for AI-consumable documentation:
- Date your docs: Include “Last updated: YYYY-MM-DD”
- Mark deprecated sections: `[DEPRECATED as of v2.0]`
- Update docs with code: Same PR for both
- Delete stale docs: Outdated docs are worse than no docs
- Link to authoritative sources: Don’t duplicate content that drifts
Team practice: When AI suggests something wrong because of stale docs, fix the docs immediately.
Team practice: Ask the coding agent to update docs when the implementation changes.
11. Team Practices and Standards
Establish Shared AI Guidelines
Create technology-specific guidelines that serve as a “contract” with AI tools:
- Preferred patterns and libraries
- Coding conventions
- Testing requirements
- Security standards
Define Responsibilities
| Task | AI Does | Human Does |
|---|---|---|
| Generate initial code | ✓ | Reviews and approves |
| Run tests | ✓ | Interprets failures |
| Suggest refactoring | ✓ | Decides if appropriate |
| Architectural decisions | Suggests | Decides |
| Security review | Assists | Owns final decision |
| Code review | First pass | Final approval |
Team Onboarding
New team members should:
1. Read this document
2. Pair with experienced engineers on AI-assisted tasks
3. Have their AI-generated code receive extra review initially
4. Gradually build trust through demonstrated good practices
12. Context Window Management (Critical)
Why This Matters
Research shows LLM performance degrades 13.9%-85% as context length increases—even with perfect information retrieval.
Real-world examples:
- Claude 3.5 Sonnet dropped from 29% to 3% accuracy at extended context
- Models show >50% performance drops at 100K tokens (far below their claimed limits)
Key insight: Bigger context windows don’t mean better results. Quality beats quantity.
The Planning Mode Pattern
Use plan mode to separate research from implementation:
Phase 1: Explore (Read Only)
- Research the codebase
- All exploration in isolated context
- No changes made

Phase 2: Plan
- Create the implementation plan
- Review and edit it yourself
- Save important decisions to a reference doc

Phase 3: Implement (Fresh Context)
- Start a NEW session or /clear
- Load only the plan and necessary files
- Execute against the plan, not exploration history

Phase 4: Verify
- Run objective tests
- Human review
The Reference Document Pattern
Before complex implementations, create a plan file:
```markdown
# Task: Implement OAuth Login
## docs/plans/oauth-implementation.md

### Requirements
- Support Google OAuth
- Store refresh tokens securely
- Handle token expiration

### Files to Modify
- src/auth/oauth.ts (new)
- src/routes/login.ts (modify)
- src/config/env.ts (add variables)

### Patterns to Follow
- Follow existing auth middleware in src/middleware/auth.ts
- Use our standard error handling from src/utils/errors.ts

### Edge Cases
- Handle OAuth denied
- Handle expired refresh tokens
- Handle network failures during token exchange
```
Benefits:
- Fresh sessions load only this small doc
- Human-reviewable artifact before code exists
- Team can review the plan
- Prevents drift between exploration and implementation
Context Hygiene Practices
Clear context aggressively:
- /clear between unrelated tasks
- Start fresh sessions for new features
- Don’t let debugging sessions pollute feature work

Monitor context fill:
- Watch for 80% capacity—exit and restart
- Large-scale refactoring degrades fastest
- Single-file edits handle high context better

Use subagents for noisy operations:
- File searches, analysis, summarization → subagent
- Only summaries return to the main context
Signs Your Context Is Degraded
- AI “forgets” earlier instructions
- Suggestions become more generic
- AI repeats work it already did
- Responses feel less coherent
- AI makes mistakes on things it got right earlier
When you see these signs: Don’t push through. Clear context and start fresh with a focused prompt.
13. When NOT to Use AI
Tasks Where AI Isn’t Appropriate
- Deep domain expertise required: If you don’t have the expertise to verify the AI’s output, you shouldn’t use AI for it
- Security-critical code: Always requires human expertise and review
- When you can’t verify: If you can’t test or review the output, don’t generate it
- Learning new concepts: Use teaching mode (explanations) instead of generation
The Verification Principle
If you can’t verify whether AI output is correct, you shouldn’t be using AI for that task. This might mean:
- You need to learn more first
- You need better tests
- You need a human expert to review
- The task isn’t suitable for AI assistance
14. Using AI for R&D: Research-First Development
When implementing novel functionality, use AI to discover latest research first.
Why This Matters
- AI training data has cutoffs—the model may not know current best practices
- Academic papers contain authoritative, novel implementation details that blog posts lack
- Recent research may reveal better alternatives to your planned approach
Research Discovery Workflow
Before implementing novel algorithms:
1. Search for recent research: “Search arXiv, IEEE, and ACM for recent papers (2024-2026) on [topic]. Find: state-of-the-art approaches, known limitations, benchmark comparisons, and implementation considerations.”
2. Review findings: Have AI summarize key papers
3. Find implementations: Many papers link to GitHub repos
4. Check for newer work: Is there research that supersedes initial findings?
5. Document: Add findings to your plan before implementing
Key Sources
- arXiv (arxiv.org): Preprints, fastest access
- IEEE Xplore: Peer-reviewed engineering
- ACM Digital Library: Computer science research
- Semantic Scholar: AI-powered discovery
- Papers With Code: Papers with implementations
- Google Scholar: Broad academic search
Cautions
- Verify claims against your use case
- Check paper dates—even recent work may be outdated
- Cross-reference multiple sources
- AI may hallucinate paper titles; ask for a web search of particular sources and verify that the URLs exist
15. References
Academic Research (2024-2025)
- Knight Foundation RCT 2025 - Experienced developers 19% slower with AI
- MIT: GitHub Copilot Field Experiment - 12.9%-21.8% more PRs per week
- arXiv 2510.03029: Code Smells in LLM Code - 63.34% more smells
- arXiv 2406.09834: Deprecated API Usage - 25-38% deprecated API suggestions
- arXiv 2512.03262: Is Vibe Coding Safe? - Only 10.5% secure
- ACL 2025: Context Length Hurts Performance - 13.9%-85% degradation
- arXiv 2505.07897: LongCodeBench - Performance drops at extended context
- arXiv 2504.03846: LLM Self-Preference Bias - Models struggle to recognize own errors
- OpenReview: Human-AI Collaboration - 31.11% success with collaboration