AI Coding Best Practices for Today
Why This Document Exists
AI coding assistants like GitHub Copilot, Claude, Cursor, and others can significantly accelerate development. However, research shows they also introduce systematic risks:
- 63% more code smells in AI-generated code compared to human-written code
- 25-38% of AI suggestions use deprecated APIs due to training data cutoffs
- Performance degrades significantly as context windows fill up
This guide helps you capture the productivity benefits while avoiding the quality pitfalls. It’s grounded in peer-reviewed research from 2024-2026 and experience from teams at Anthropic, GitHub, and across the industry.
1. Introduction and Philosophy
What AI Coding Assistants Are (and Aren’t)
Think of AI coding assistants as an over-confident pair programming partner. They’re fast at looking things up, excellent at executing tedious tasks, and can write code that looks correct. But they make mistakes—and those mistakes can be subtle and “deeply inhuman” in ways you won’t expect.
“The mental model I use: LLMs are an over-confident pair programming assistant who’s fast at lookups and executing tedious tasks, but makes mistakes.” — Simon Willison
Core Principles
1. You own all AI-generated code.
The moment you commit code, it’s yours—regardless of who or what wrote it. You’re responsible for its correctness, security, and maintainability. Never commit code you don’t understand.
2. AI is an assistant, not a replacement.
AI accelerates your work; it doesn’t replace your judgment. Your experience, architectural knowledge, and understanding of requirements are irreplaceable.
3. Start simple, add complexity only when needed.
Don’t reach for sophisticated AI workflows when a simple prompt works. Many tasks need just a direct request, not an elaborate agent setup.
4. Verify everything.
AI can hallucinate APIs that don’t exist, introduce security vulnerabilities, and miss edge cases. Trust, but verify—with tests, linters, and your own review.
2. The Cardinal Rule: Read All Code Before Committing
Mandatory practice: Engineers must read every line of AI-generated code at least once before committing.
This is the single most important practice in this document. No exceptions.
Many developers resist reading AI output because it’s so easy to regenerate. But for at least the next 12-18 months, I believe understanding what your code is actually doing under the hood is far too important to skip.
Why This Matters
Learning: Reading code builds your understanding of patterns, idioms, and the codebase itself. Even if the code is correct, you learn by reading it.
Complexity Awareness: When bugs appear later (and they will), you’ll need mental models of the code to debug effectively. If you never read the code, you’ll struggle to fix it when AI assistance fails.
Ownership: You cannot truly own code you haven’t read. Blind commits transfer liability without understanding. When someone asks “why did you do it this way?”, “the AI wrote it” is not an acceptable answer.
Bug Detection: AI introduces subtle bugs that only human review catches:
- Race conditions in concurrent code
- Logical errors that pass type checking
- Security vulnerabilities that look functional
- Edge cases the AI never considered
What “Reading” Means
Reading doesn’t mean glancing at the code. It means:
- Understanding what each function does
- Tracing the data flow
- Considering edge cases
- Asking yourself: “Would I have written it this way? If not, why?”
- Verifying the code matches your intended requirements
If you can’t explain what the code does to a colleague, you haven’t read it thoroughly enough.
3. The Planning-First Workflow
The Problem with “Just Code It”
When you ask AI to implement something immediately, you risk:
- Solving the wrong problem (AI misunderstood your requirements)
- Implementing a suboptimal approach (AI didn’t know about existing patterns)
- Creating code that doesn’t fit your architecture
The Solution: Explore → Plan → Implement → Verify
This workflow, recommended by Anthropic, separates research from execution.
Phase 1: Explore (Plan Mode - Read Only)
Before making any changes, research the codebase:
```
I want to add user authentication. Before we implement anything:
- What authentication patterns already exist in this codebase?
- What libraries are we already using?
- Are there any relevant configuration files?
- Which libraries seem resilient and implement the best practices for this today?
```
Let the AI read files and answer questions. Don’t let it make changes yet.
Result: Understanding of the current state and constraints.
Phase 2: Plan
Ask the AI to create a detailed implementation plan:
```
Based on your research, create a plan for implementing OAuth login:
- Which files need to change?
- What new files should we create?
- What's the sequence of changes?
- What edge cases should we handle?
```
Review this plan yourself. Edit it. Challenge assumptions. This is your specification. If anything important isn’t spelled out, tell the model what to do or ask for that concern to be addressed in the plan.
Result: A human-reviewed plan you can reference later.
Phase 3: Implement (Fresh Context)
Start a new session or clear your context (/clear in many tools). Then implement:
```
Implement the OAuth login flow according to this plan:
[paste or reference your reviewed plan]

Start with step 1: creating the authentication service.
```
Execute against your plan, not the exploratory conversation. This keeps context clean.
Result: Clean implementation without research noise.
Phase 4: Verify
- Run your test suite
- Run linters and type checkers
- Manually test the feature
- Have a human review the changes
Result: Confidence the implementation is correct.
When to Skip Planning
For simple, well-understood tasks, skip straight to implementation:
- Fixing a typo
- Adding a log line
- Renaming a variable
- Implementing a function with a clear specification
If you could describe the diff in one sentence, you probably don’t need a plan. You might also benefit from not using a coding agent.
4. Writing Effective Prompts
The Basics
Be specific, not vague:
| Instead of… | Write… |
|---|---|
| “Add tests for foo.py” | “Write tests for the validate_email function in foo.py, covering: valid emails, invalid format, empty string, and None input” |
| “Fix the login bug” | “Users report that login fails after session timeout. The issue is likely in src/auth/. Write a failing test that reproduces the issue, then fix it” |
| “Make the dashboard look better” | “[paste screenshot] Implement this design for the dashboard. Use our existing component library in src/components/” |
Provide verification criteria:
Tell the AI how to verify its work:
```
Implement a rate limiter for our API. Requirements:
- Max 100 requests per minute per user
- Returns 429 status when exceeded
- Write tests that verify the limit is enforced
- Run the tests after implementing
```
Reference existing patterns:
```
Add a new UserSettings component following the same patterns as
ProfileSettings in src/components/ProfileSettings.tsx.
Use our existing form components and validation approach.
```
Break complex tasks into steps:
Instead of “Build a complete user registration system,” break it down:
1. “Create the database schema for users”
2. “Implement the registration endpoint”
3. “Add email verification”
4. “Create the frontend form”
What to Include in Your Prompts
- Context: What files are relevant? What’s the current state?
- Requirements: What exactly should the code do?
- Constraints: What libraries to use? What patterns to follow?
- Verification: How will you know it works?
- Examples: Sample inputs/outputs if applicable
What NOT to Do
- Don’t assume the AI knows your codebase’s conventions
- Don’t accept the first suggestion without review
- Don’t skip verification because the code “looks right”
- Don’t let the AI decide your architecture
5. Code Review Standards for AI-Generated Code
The Layered Review Approach
AI-generated code requires the same (or more) scrutiny as human-written code.
Layer 1: Automated Checks (First)
- Linters and formatters
- Type checkers
- Unit tests
- Static analysis (CodeQL, Semgrep)
- Security scanners (Dependabot, Snyk)
Run these before any human looks at the code. They catch obvious issues quickly.
Layer 2: Self-Review (Second)
- Did you read every line? (See Section 2)
- Does the code match your requirements?
- Are there any TODO comments or incomplete implementations?
- Were any implementations fully mocked to look like they worked?
- Do you understand why each decision was made?
Layer 3: Peer Review (Third)
- Another human reviews the code
- Focus on architectural fit and business logic
- Check for edge cases the AI might have missed
- Verify the code is maintainable
What Human Reviewers Should Focus On
Automated tools catch syntax and known vulnerability patterns. Humans should focus on:
- Architectural alignment: Does this fit our system design?
- Business logic correctness: Does it actually solve the problem?
- Edge cases: What happens with unusual inputs?
- Maintainability: Will this be easy to modify later?
- Security assumptions: Is the threat model appropriate?
Red Flags in AI-Generated Code
Watch for these patterns during review:
- `// TODO:` or `// FIXME:` comments (incomplete code)
- `throw new NotImplementedError()` or similar stubs
- Hardcoded values that should be configurable
- Missing error handling for edge cases
- Overly complex solutions to simple problems
- Code that doesn’t match existing patterns in the codebase
- Loud, insecure logging of sensitive content
Keep Pull Requests Small
Even though AI can generate code quickly, keep PRs small and focused:
- Easier to review thoroughly
- Easier to identify AI-introduced issues
- Easier to revert if problems emerge
6. Maintaining Developer Skills
The Skill Atrophy Risk
Research from Microsoft and Carnegie Mellon (2025) found that increased reliance on AI tools correlates with decreased critical thinking engagement. The more you delegate to AI, the harder it becomes to summon those skills when needed.
This is similar to how GPS navigation eroded wayfinding abilities—convenient in the moment, but problematic when you need to navigate without it.
How to Keep Your Skills Sharp
1. Write Code Yourself First
Before asking AI for help with a new concept, try writing it yourself. Struggle with it. Make mistakes. Then use AI to compare approaches or fill gaps.
2. Understand Before Accepting
Never blindly accept AI suggestions. When AI proposes code:
- Read it (see Section 2)
- Understand why it works
- Consider if you’d write it differently
- Ask the AI to explain (in ask mode) if something is unclear
3. Use Teaching Mode
Instead of asking AI to generate code, ask it to explain concepts:
```
Explain how rate limiting works with examples.
Don't write the code—I want to understand the concepts first.
```
Most coding agents have an Ask mode intended just for this purpose.
4. Code Review as Learning
Treat reviewing AI output as a learning opportunity. Ask yourself:
- What patterns is the AI using?
- Are there approaches I hadn’t considered?
- What would I have done differently?
5. Disable Inline Suggestions Sometimes
During learning-focused sessions, turn off AI autocomplete. Write the code yourself. Use AI chat as a tutor that explains concepts, not as a code generator.
The Key Insight
The mandatory code reading from Section 2 isn’t just about quality—it’s about professional development. The same practice that catches bugs also builds the mental models you need for future debugging.
When AI assistance fails (and it will), your own skills are what you fall back on.
7. AI Mistake Catalog
The 9 Systematic Failure Patterns
AI-generated code fails in predictable patterns. Learning to recognize these makes debugging faster.
1. Hallucinated APIs
What it looks like: The AI suggests importing a package or calling a method that doesn’t exist. It looks plausible but fails at runtime.
Why it happens: AI models learn patterns, not facts. They’ve seen thousands of imports and method calls, so they generate similar-looking code—even when it’s fictional.
How to catch it:
- Read the code before running it whenever new dependencies are added
- Verify unfamiliar imports: read library reviews and check third-party references
- Watch for methods that seem “too convenient”
⚠️ Warning: Hackers have been known to name-squat hallucinated library names with malicious code. Hallucinated APIs can do far worse than simply not work as intended. They can put nation-state hostile actors on your machine!
Example:
```python
# AI suggests this, but the method doesn't exist
from datetime import datetime

result = datetime.parse_flexible("2024-01-15")  # No such method!
```
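For comparison, these are the real standard-library calls for parsing a date string (both exist in Python 3.7+); knowing them makes hallucinated conveniences stand out:
```python
from datetime import datetime

# Real stdlib APIs for parsing a date string
dt1 = datetime.strptime("2024-01-15", "%Y-%m-%d")
dt2 = datetime.fromisoformat("2024-01-15")
```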
2. Security Vulnerabilities That Look Functional
What it looks like: Code that works correctly but has security flaws. The tests pass, but it’s exploitable.
Why it happens: AI optimizes for making code work, not for security. It generates patterns that are common in training data—including insecure patterns in overly simplified examples.
How to catch it:
- Run security scanners (CodeQL, Semgrep)
- Check for SQL injection, XSS, missing authentication
- Look for hardcoded credentials
- Review error messages for information leakage
Example:
```python
# Works, but SQL injection vulnerability
def get_user(username):
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return db.execute(query)
```
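A minimal sketch of the safe version, assuming the same `db` handle with a DB-API style `execute` (placeholder syntax varies by driver: `?` for sqlite3, `%s` for psycopg2):
```python
# Parameterized query: the driver passes the value separately,
# so user input can never change the structure of the SQL
def get_user(username):
    query = "SELECT * FROM users WHERE name = ?"
    return db.execute(query, (username,))
```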
3. Performance Anti-Patterns
What it looks like: Code that’s correct but slow. Often involves inefficient algorithms or data structures.
Why it happens: AI generates code that works at small scale. It doesn’t optimize for production loads.
How to catch it:
- Look for nested loops (O(n²) where O(n) is possible)
- Check for string concatenation in loops
- Profile performance-critical code
- Test with realistic data volumes
Example:
# O(n²) when O(n) is possible
def has_duplicates(items):
for i, item in enumerate(items):
for j, other in enumerate(items):
if i != j and item == other:
return True
return False
# Better: use a set for O(n)
def has_duplicates(items):
return len(items) != len(set(items))
4. Error Handling That Assumes Happy Paths
What it looks like: Try-catch blocks that don’t actually handle errors or missing validation for inputs.
Why it happens: Training data mostly contains happy-path code. Error handling is often missing or incomplete in examples the AI learned from.
How to catch it:
- Look for empty catch blocks
- Check if errors are logged but not handled
- Verify all inputs are validated
- Test with null, empty, and invalid inputs
Example:
```python
# Catches the error but doesn't handle it
try:
    result = risky_operation()
except Exception as e:
    print(f"Error: {e}")  # Then what?
return result  # Unbound (NameError) if risky_operation() raised!
```
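A hedged sketch of a more honest version (`risky_operation` is the same hypothetical helper as above): catch only the failures you can meaningfully handle, log them, and return a defined fallback, while letting everything else propagate.
```python
import logging

logger = logging.getLogger(__name__)

def fetch_value():
    try:
        return risky_operation()
    except ConnectionError:
        # A failure mode we expect and can recover from:
        # log it and return a defined fallback value
        logger.exception("risky_operation failed; using fallback")
        return None
    # Any other exception propagates to a caller that can actually handle it
```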
5. Missing Edge Cases
What it looks like: Code that works for typical inputs but fails on boundaries: empty arrays, null values, maximum integers, Unicode characters.
Why it happens: Training data over-represents common cases. Edge cases appear less frequently, so the AI is less likely to handle them.
How to catch it:
- Test with empty inputs
- Test with null/None values
- Test with boundary values (0, -1, MAX_INT)
- Test with special characters and Unicode
Example:
```python
# Fails on empty list
def get_average(numbers):
    return sum(numbers) / len(numbers)  # ZeroDivisionError!
```
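A minimal guarded version; whether to raise or return a default for empty input is a requirements decision:
```python
def get_average(numbers):
    if not numbers:
        raise ValueError("cannot average an empty sequence")
    return sum(numbers) / len(numbers)
```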
6. Outdated Library Usage
What it looks like: Code uses deprecated APIs or old patterns that have been replaced with better alternatives.
Why it happens: AI training data has cutoff dates (often late 2023). Libraries evolve faster than AI training updates. Research shows 25-38% of AI suggestions use deprecated APIs.
How to catch it:
- Check documentation for deprecation warnings
- Verify package versions against the latest releases
- Run with deprecation warnings enabled
- Watch for security advisories on old versions
💡 Tip: Instead of catching it retroactively, suggest known, excellent libraries and APIs during the planning phase.
Example:
```python
# Old pattern (deprecated since Python 3.9, PEP 585)
from typing import Dict, List

def process(items: List[Dict[str, int]]) -> None: ...

# Modern pattern
def process(items: list[dict[str, int]]) -> None: ...
```
7. Data Model Mismatches
What it looks like: Code assumes data structures that don’t match actual APIs or database schemas. Fails at runtime when real data is used.
Why it happens: AI doesn’t have access to your actual schemas. It guesses based on variable names and context.
How to catch it:
- Verify field names against actual schemas
- Test with real data from your systems
- Check API documentation for response formats
- Use TypeScript/type hints to catch mismatches early
💡 Tip: Always pass in real data and data structures, if you have them, BEFORE implementing code.
Example:
```typescript
// AI assumes this structure
interface User {
  username: string;
  email: string;
}

// Actual API returns this
interface User {
  user_name: string;      // Different field name!
  email_address: string;
}
```
8. Missing Context Dependencies
What it looks like: Code assumes environment variables, configurations, or services that don’t exist in all environments.
Why it happens: AI generates code based on immediate context. It doesn’t know about your deployment configurations.
How to catch it:
- Check for hardcoded environment assumptions
- Verify all configuration values exist
- Test in staging environments
- Document required dependencies
Example:
```python
# Assumes DATABASE_URL exists
import os

db_url = os.environ["DATABASE_URL"]  # KeyError in some environments!

# Better: provide a default or fail explicitly
db_url = os.environ.get("DATABASE_URL")
if not db_url:
    raise EnvironmentError("DATABASE_URL must be set")
```
9. Race Conditions and Concurrency Issues
AI models particularly struggle with concurrent programming.
Watch for:
- Shared mutable state without synchronization (locks, mutexes)
- Async operations without proper awaiting
- Resource cleanup in the wrong order
- Lock acquisition without guaranteed release
- Time-of-check to time-of-use (TOCTOU) bugs
- Inappropriate locking methods used with lock-busting coroutines (synchronized)
- Hardcoded delays used as race condition “fixes”
Example of a race condition:
```python
# Not thread-safe: check and update are separate operations
class Counter:
    def __init__(self):
        self.count = 0

    def increment_if_below(self, limit):
        if self.count < limit:   # Check
            self.count += 1      # Update - another thread could have changed count!
```
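A sketch of the thread-safe fix using the standard library’s threading.Lock, which makes the check and the update a single atomic step:
```python
import threading

class Counter:
    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def increment_if_below(self, limit):
        # Holding the lock makes check-and-update atomic
        with self._lock:
            if self.count < limit:
                self.count += 1
                return True
            return False
```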
Common Footguns
- Hardcoded delays: Race conditions need real solutions, not bandaids that break with the next change or a slower connection
- Implicit type coercion: `"5" + 3` behaves differently across languages (see the example after this list)
- Null pointer exceptions: Unvalidated inputs accessed without checks
- Off-by-one errors: Loop boundaries wrong by one
- Resource leaks: File handles or connections not closed
- Infinite loops: Incorrect termination conditions
- Silent failures: Exceptions caught but not handled or logged
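For instance, Python surfaces the coercion footgun as an exception, while JavaScript silently concatenates:
```python
# Python refuses implicit coercion between str and int
try:
    result = "5" + 3
except TypeError as exc:
    print(f"TypeError: {exc}")  # can only concatenate str (not "int") to str

# JavaScript, by contrast, evaluates "5" + 3 to the string "53"
```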
Deprecated APIs and Outdated Code Patterns
High-risk areas for outdated code:
- Cryptography (algorithms deprecated for security reasons)
- Authentication/authorization libraries
- Web frameworks (breaking changes between major versions)
- Database drivers and ORMs
- Cloud SDK clients (AWS, GCP, Azure APIs change frequently)
Verification question: “When was this method/API introduced and is it still the recommended approach?”
Team practice: Always verify AI-suggested library code against current documentation.
It often helps to directly link the most recent API documentation to the coding agent.
TODOs, Mocks, and Incomplete Implementations
AI systematically leaves incomplete code that can slip into production.
Patterns to search for before committing:
```
TODO
FIXME
XXX
"not implemented"
"placeholder"
"implement this"
```
Set up pre-commit hooks: Use a tool like lefthook to enforce checks that detect these patterns before commit.
Team policy: If a TODO has no ticket number, assume it was added by AI and block the commit via hook. Do not commit TODOs without ticket numbers to production code!
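A hedged sketch of such a check as a standalone script; the ticket format TODO(PROJ-123) and the file extensions are assumptions, so adapt the regex to your tracker’s convention:
```python
#!/usr/bin/env python3
"""Pre-commit check: block TODO/FIXME/XXX markers without a ticket number."""
import re
import subprocess
import sys

# Flags TODO/FIXME/XXX unless immediately followed by a ticket like (PROJ-123)
MARKER = re.compile(r"\b(TODO|FIXME|XXX)\b(?!\(\w+-\d+\))")

def staged_files():
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith((".py", ".ts", ".js"))]

def main():
    failures = []
    for path in staged_files():
        try:
            with open(path, encoding="utf-8", errors="ignore") as fh:
                lines = fh.read().splitlines()
        except OSError:
            continue  # unreadable file; skip rather than crash the hook
        for lineno, line in enumerate(lines, 1):
            if MARKER.search(line):
                failures.append(f"{path}:{lineno}: {line.strip()}")
    if failures:
        print("Blocked: TODO/FIXME without a ticket number:")
        print("\n".join(failures))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```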
8. Avoiding “AI Slop” - Quality Standards
What is “AI Slop”?
“AI slop” refers to low-quality, generic AI-generated content that:
- Uses templated, surface-level approaches
- Lacks genuine insight or fit to your specific problem
- Shows minimal human editing or oversight
- Is harder to refactor than to rewrite
Characteristics of AI Slop in Code
- Generic implementations that don’t match your architecture
- Overly verbose code when simpler solutions exist
- Missing context about your specific requirements
- Boilerplate that doesn’t integrate with existing patterns
- Code comments that state the obvious rather than explain why
Quality Practices
Treat AI prompts like code: Version them, review them, iterate on them. Good prompts lead to better output.
Verify against your own knowledge: You know your architecture. If the AI’s approach conflicts with it, trust yourself.
Don’t accept first drafts that just “seem to work”: Ask for improvements. Request alternatives. Challenge the AI’s assumptions.
The Verification Hierarchy
Critical: Do NOT rely on the same model to review its own output.
Research shows LLMs exhibit “self-preference bias”—they score their own outputs higher and struggle to recognize their own errors. They also hallucinate mistakes when asked to spot their errors.
Correct verification order:
1. Objective tests first: Automated, deterministic checks (unit tests, linters, type checkers). These cannot hallucinate.
2. LLM-as-judge from a DIFFERENT model: If using AI for review, use a different model than the one that generated the code. This avoids self-confirmation bias.
3. Human review: Final verification by humans who understand the requirements.
Why this matters: Same-model self-review can hallucinate problems that don’t exist OR miss real problems while confidently approving flawed code.
9. Security Considerations
Assume AI Code Contains Vulnerabilities
Research shows:
- 45% of AI-generated code contains security vulnerabilities
- AI code is 2.74x more prone to XSS vulnerabilities
Security Checklist for AI-Generated Code
- Run static analysis security tools (CodeQL, Semgrep)
- Check for hardcoded secrets or credentials
- Verify input validation exists for all user inputs
- Check for SQL injection in database queries
- Verify authentication/authorization is properly implemented
- Check error messages don’t leak sensitive information
- Verify dependencies don’t have known vulnerabilities
- Check for proper HTTPS/TLS usage
- Verify file operations don’t allow path traversal
High-Risk Areas
Extra scrutiny for AI-generated code in:
- Authentication and authorization
- Payment processing
- Personal data handling
- API endpoints accessible from the internet
- File upload/download functionality
- Cryptographic operations
Never Trust AI for Security-Critical Decisions
For security-critical code:
1. Write it yourself or closely supervise the AI
2. Have security-focused peer review
3. Consider a professional security audit for high-stakes systems
Team policy: Add a merge request template for project code reviews with a checklist appropriate to that kind of application.
While you can ask another coding agent to check for specific vulnerabilities, you should never assume that is a sufficient level of review. It’s a good first step only.
MCP Security Best Practices
MCP tools are a significant organizational security risk that has no simple solution. It’s best to only use what’s absolutely necessary.
1. Vet MCP servers before installing
- Only use MCP servers from trusted, known sources
- Check for recent security audits or vulnerability reports
- Prefer official servers from major vendors (but even these have had vulnerabilities)
2. Minimize MCP server permissions
- Only grant the minimum permissions needed
- Don’t connect MCP servers to sensitive systems
- Use read-only access where possible
3. Isolate MCP servers
- Run MCP servers in containers or sandboxed environments
- Limit network access to only required endpoints
4. Monitor MCP activity
- Log all MCP tool invocations
- Review what data is being accessed
- Watch for unexpected behavior
5. Keep servers updated
- MCP servers must be actively patched for vulnerabilities
- Update frequently
- Remove unused servers
10. Agent Rules vs. Documentation
Understanding the Difference
Agent Rules (like CLAUDE.md, .cursorrules): Loaded into every AI interaction. Always in context.
Documentation: Files the AI can search and read when needed. Not always loaded.
Best Practices for Rules Files
Rules files consume context tokens on every interaction. Keep them lean.
Include:
- Build/test commands the AI can’t guess
- Coding style rules that differ from defaults
- Architectural decisions specific to your project
- Common gotchas the AI keeps getting wrong
Exclude:
- Standard language conventions (AI already knows these)
- Things your linter catches
- Long explanations (link to docs instead)
- Anything the AI infers correctly from code
Example CLAUDE.md:
```markdown
# Code Style
- Use ES modules (import/export), not CommonJS (require)
- Prefer async/await over .then() chains

# Workflow
- Run `npm test` after making changes
- Run `npm run lint:fix` before committing

# Architecture
- API routes are in src/routes/, follow existing patterns
- All database access goes through src/db/client.ts
```
The Stale Documentation Problem
Warning: AI confidently uses outdated documentation. If your docs are out of date, AI will generate wrong code based on them.
Best practices for AI-consumable documentation:
- Date your docs: Include “Last updated: YYYY-MM-DD”
- Mark deprecated sections: `[DEPRECATED as of v2.0]`
- Update docs with code: Same PR for both
- Delete stale docs: Outdated docs are worse than no docs
- Link to authoritative sources: Don’t duplicate content that drifts
Team practice: When AI suggests something wrong because of stale docs, fix the docs immediately.
Team practice: Ask the coding agent to update docs when the implementation changes.
11. Team Practices and Standards
Establish Shared AI Guidelines
Create technology-specific guidelines that serve as a “contract” with AI tools:
- Preferred patterns and libraries
- Coding conventions
- Testing requirements
- Security standards
Define Responsibilities
| Task | AI Does | Human Does |
|---|---|---|
| Generate initial code | ✓ | Reviews and approves |
| Run tests | ✓ | Interprets failures |
| Suggest refactoring | ✓ | Decides if appropriate |
| Architectural decisions | Suggests | Decides |
| Security review | Assists | Owns final decision |
| Code review | First pass | Final approval |
Team Onboarding
New team members should:
1. Read this document
2. Pair with experienced engineers on AI-assisted tasks
3. Have their AI-generated code receive extra review initially
4. Gradually build trust through demonstrated good practices
12. Context Window Management (Critical)
Why This Matters
Research shows LLM performance degrades 13.9%-85% as context length increases—even with perfect information retrieval.
Real-world examples:
- Claude 3.5 Sonnet dropped from 29% to 3% accuracy at extended context
- Models show >50% performance drops at 100K tokens (far below their claimed limits)
Key insight: Bigger context windows don’t mean better results. Quality beats quantity.
The Planning Mode Pattern
Use plan mode to separate research from implementation:
Phase 1: Explore (Read Only)
- Research the codebase
- All exploration in isolated context
- No changes made

Phase 2: Plan
- Create the implementation plan
- Review and edit it yourself
- Save important decisions to a reference doc

Phase 3: Implement (Fresh Context)
- Start a NEW session or /clear
- Load only the plan and necessary files
- Execute against the plan, not exploration history

Phase 4: Verify
- Run objective tests
- Human review
The Reference Document Pattern
Before complex implementations, create a plan file:
```markdown
# Task: Implement OAuth Login
## docs/plans/oauth-implementation.md

### Requirements
- Support Google OAuth
- Store refresh tokens securely
- Handle token expiration

### Files to Modify
- src/auth/oauth.ts (new)
- src/routes/login.ts (modify)
- src/config/env.ts (add variables)

### Patterns to Follow
- Follow existing auth middleware in src/middleware/auth.ts
- Use our standard error handling from src/utils/errors.ts

### Edge Cases
- Handle OAuth denied
- Handle expired refresh tokens
- Handle network failures during token exchange
```
Benefits:
- Fresh sessions load only this small doc
- Human-reviewable artifact before code exists
- Team can review the plan
- Prevents drift between exploration and implementation
Context Hygiene Practices
Clear context aggressively:
- /clear between unrelated tasks
- Start fresh sessions for new features
- Don’t let debugging sessions pollute feature work

Monitor context fill:
- Watch for 80% capacity—exit and restart
- Large-scale refactoring degrades fastest
- Single-file edits handle high context better

Use subagents for noisy operations:
- File searches, analysis, summarization → subagent
- Only summaries return to the main context
Signs Your Context Is Degraded
- AI “forgets” earlier instructions
- Suggestions become more generic
- AI repeats work it already did
- Responses feel less coherent
- AI makes mistakes on things it got right earlier
When you see these signs: Don’t push through. Clear context and start fresh with a focused prompt.
13. When NOT to Use AI
Tasks Where AI Isn’t Appropriate
- Deep domain expertise required: If you don’t have the expertise to verify the AI’s output, you shouldn’t use AI for it
- Security-critical code: Always requires human expertise and review
- When you can’t verify: If you can’t test or review the output, don’t generate it
- Learning new concepts: Use teaching mode (explanations) instead of generation
The Verification Principle
If you can’t verify whether AI output is correct, you shouldn’t be using AI for that task. This might mean:
- You need to learn more first
- You need better tests
- You need a human expert to review
- The task isn’t suitable for AI assistance
14. Using AI for R&D: Research-First Development
When implementing novel functionality, use AI to discover latest research first.
Why This Matters
- AI training data has cutoffs—the model may not know current best practices
- Academic papers contain authoritative, novel implementation details that blog posts lack
- Recent research may reveal better alternatives to your planned approach
Research Discovery Workflow
Before implementing novel algorithms:
1. Search for recent research: “Search arXiv, IEEE, and ACM for recent papers (2024-2026) on [topic]. Find: state-of-the-art approaches, known limitations, benchmark comparisons, and implementation considerations.”
2. Review findings: Have AI summarize key papers
3. Find implementations: Many papers link to GitHub repos
4. Check for newer work: Is there research that supersedes initial findings?
5. Document: Add findings to your plan before implementing
Key Sources
- arXiv (arxiv.org): Preprints, fastest access
- IEEE Xplore: Peer-reviewed engineering
- ACM Digital Library: Computer science research
- Semantic Scholar: AI-powered discovery
- Papers With Code: Papers with implementations
- Google Scholar: Broad academic search
Cautions
- Verify claims against your use case
- Check paper dates—even recent work may be outdated
- Cross-reference multiple sources
- AI may hallucinate paper titles; ask for a web search of particular sources and verify that the URLs exist
15. References
Academic Research (2024-2025)
- Knight Foundation RCT 2025 - Experienced developers 19% slower with AI
- MIT: GitHub Copilot Field Experiment - 12.9%-21.8% more PRs per week
- arXiv 2510.03029: Code Smells in LLM Code - 63.34% more smells
- arXiv 2406.09834: Deprecated API Usage - 25-38% deprecated API suggestions
- arXiv 2512.03262: Is Vibe Coding Safe? - Only 10.5% secure
- ACL 2025: Context Length Hurts Performance - 13.9%-85% degradation
- arXiv 2505.07897: LongCodeBench - Performance drops at extended context
- arXiv 2504.03846: LLM Self-Preference Bias - Models struggle to recognize own errors
- OpenReview: Human-AI Collaboration - 31.11% success with collaboration