Ask HN: A proposal for interviewing "AI-Augmented" Engineers
vanbashan Tuesday, February 03, 2026

Hi HN,
I’m currently rethinking our hiring process. Like many of you, I feel that traditional algorithmic tests (LeetCode style) are becoming less relevant now that LLMs can solve them instantly. Furthermore, prohibiting AI during interviews feels counter-productive; I want to hire engineers who know how to use these tools effectively to multiply their output.
I am designing a new evaluation framework based on real-world open-source work, and I would love the community’s feedback on whether this sounds fair, effective, or if I’m missing something critical.
The Core Philosophy: We shouldn't test if a candidate can write syntax better than an AI. We should test if they can guide, debug, and improve upon an AI's output to handle the "last mile" of complex engineering.
The Proposed Process:
1. Task Selection (Real-World Context)
Instead of synthetic puzzles, we select open issues or discussions from public GitHub repositories that share a tech stack with our product (rough sourcing sketch below).
Scope: 2–4 hours.
Types: Implementing a feature based on a discussion, fixing a bug, or reviewing a PR (specifically one that was eventually rejected, to test "taste").
Ambiguity: Adjusted for seniority. Junior roles get clear specs; senior roles get vague problem statements requiring architectural decisions.
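To make the sourcing concrete, here's a rough sketch of pulling candidate issues from GitHub's public issue search API. The label, language, and comment-count qualifiers are placeholders for whatever matches our actual stack, and the results would still be triaged by hand:

    # Hypothetical sourcing script: pull open, discussed bug reports for a given
    # language from GitHub's /search/issues endpoint, then triage manually.
    import requests

    def find_candidate_tasks(language="typescript", per_page=30):
        query = f"is:issue is:open label:bug language:{language} comments:>3"
        resp = requests.get(
            "https://api.github.com/search/issues",
            params={"q": query, "per_page": per_page},
            headers={"Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()
        # Each item carries a title and html_url -- enough for manual triage.
        return [(item["title"], item["html_url"]) for item in resp.json()["items"]]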
2. Establishing the "AI Baseline"
Before giving the task to a candidate, we run it through current SOTA models with minimal human intervention.
The Filter: If the AI solves it perfectly on the first try, we discard the task.
The Sweet Spot: We are looking for tasks where the AI gets roughly 80% of the way there but fails on edge cases, context integration, or complex logic. In other words, the task should be neither trivially auto-solvable nor impossible within the 2–4 hour scope.
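A minimal sketch of that filter, with run_model() and score_patch() as hypothetical stand-ins for a single zero-intervention model run and a test-based score, and the 0.6/0.99 thresholds as placeholders:

    # Hypothetical baseline filter: keep a task only if the best first-try,
    # zero-intervention run lands in the "sweet spot" (good but not perfect).
    def establish_baseline(task, models, run_model, score_patch, lo=0.6, hi=0.99):
        # run_model(model, task) -> patch, one shot, no human steering (hypothetical callable)
        # score_patch(task, patch) -> 0..1, fraction of tests/checks passing (hypothetical callable)
        best = 0.0
        for model in models:
            score = score_patch(task, run_model(model, task))
            if score >= hi:                      # solved perfectly on the first try -> discard
                return None
            best = max(best, score)
        return task if best >= lo else None      # too hard if even the best run scores poorly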
3. The Candidate Test
Candidates are required to use their preferred AI coding tools. We ask them to submit not just the code, but their chat/prompt history.

How We Evaluate (The "AI Delta"):
We aren't just looking at the final code. We analyze the "diff" between the candidate's process and our "AI Baseline" (a rough scoring sketch follows this list):
1. Exploration Strategy: How does the candidate "load context"? Do they blindly paste errors, or do they guide the AI to understand the repository structure first? We look for a clear understanding of the existing codebase.
2. Engineering Rigor (TDD): Does the candidate push the AI to generate a test plan or reproduction script before generating the fix? We value candidates who treat the AI as a junior partner that needs verification.
3. The "Last 10%" (Edge Cases): Since we picked tasks where AI fails slightly, we look at how the candidate handles those failure modes. Can they spot the boundary conditions and logic errors that the LLM glossed over?
4. Documentation Hygiene: We specifically check if the candidate instructs the AI to search existing documentation and—crucially—if they prompt the AI to update the docs to reflect the new changes.
5. Engineering Taste (The Rejected PR): For the code review task, we ask them to analyze a PR that was rejected in the real world (without telling them the outcome). We want to see whether their verdict and reasoning align with our team's engineering culture (maintainability, complexity, clarity, etc.).
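To keep scoring consistent across interviewers, I'm considering writing the rubric down as plain data; the weights below are invented purely for illustration:

    # Hypothetical rubric for the "AI Delta" review; weights are illustrative only.
    AI_DELTA_RUBRIC = {
        "exploration_strategy": 0.20,  # context loading, repo understanding
        "engineering_rigor":    0.25,  # test plan / repro script before the fix
        "edge_case_handling":   0.25,  # the "last 10%" the baseline model missed
        "documentation":        0.10,  # docs searched and updated
        "engineering_taste":    0.20,  # reasoning on the rejected PR
    }

    def weighted_score(reviewer_scores):  # reviewer_scores: criterion -> 0..5
        return sum(AI_DELTA_RUBRIC[k] * v for k, v in reviewer_scores.items())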
My Questions for HN:
Is analyzing the "chat history" too invasive, or is it the best way to see their thought process in 2026?
For those of you hiring now, how do you distinguish between a "prompt kiddie" and a senior engineer who is just very good at prompting?
Does the 2-4 hour time commitment feel reasonable for a "take-home" if the tooling makes the actual coding faster?
Thanks for your insights!

(Full disclosure: in the spirit of this topic, this post was composed by AI based on my draft notes.)