I'll copy what I wrote on LinkedIn (note: I read roughly 25 pages, which is half the paper, and read it quickly)[0]:
"If I read the paper correctly, they don’t actually show that LLMs prefer resumes they generate.
Their actual method seems to be: take a human-written resume, delete the executive summary, have an LLM rewrite the executive summary based on the rest of the resume, and then have another LLM rate that summary without the rest of the resume.
That’s likely to massively overstate any real impact, if you can even rely on it capturing a real effect.
I really wonder if I read that correctly, because I can’t come up with a justification for that study design."
[0] I couldn't help but mildly copy-edit before pasting here.
Edit: yes, the authors present a reason for their design, and an ideal version of my comment would've said that. I do not consider it much of a justification. See below: https://news.ycombinator.com/item?id=47987256#47987727.
Could be an ad for 'use LLMs more'. A generic ad like this helps all in the market, but if you own 30% of LLM market share, it still helps you 30% of the time.
Now that I think of it, every other industry has an 'advocacy group', whether cheese, oil, or nutmeg. So surely there is now some sort of LLM 'consortium', and group funding studies like this just fuels the FOMO. You can be sure such groups exist, and are pummeling every government in the world thusly. But I bet they're also looking here.
After all, it's a circle. Uh-oh! HR is using LLMs, you'd better too, potential employee! Then later? Uh-oh! The best employees you can hire are using LLMs, you'd better too, HR!
They already FOMOed us into basically everything else, why not LLMs too?
There is some creativity in the rest of the CV, between what kind of experiences are included and how they are described. But that would be far harder to generate fairly.
I think choosing the summary is a fair design choice, since it prevents the LLM from just... making up a perfect candidate.
"I'm a fullstack professor of software design with 90 years of experience expecting a junior internship position"
To be perfectly clear, I understand their justification for only _editing_ the executive summary; it is arguably reasonable, because editing the work history would risk altering the details in ways that compromise the measurement. This is a hard problem to solve (you might try reviewing the resumes for hallucinations, but I can't think of a precise study design that doesn't risk problems).
What is, imho, impossible to defend, is having the LLM only evaluate the executive summary in isolation, and reporting that as it preferring resumes it wrote.
What you've shown is that LLMs prefer executive summaries they wrote. But the overall impact on how they will evaluate your entire resume is not measured by this technique.
Worse, this isn't just "decent paper, bad summary", their abstract misreports their findings.
I doubt it since they, admittedly, only read half of it. The question he posed about the paper is answered in that very same paper. He has structured his whole reply to have the tone of uncovering the hidden caveat in the small print that invalidates the paper, when it's actually a straightforwardly stated assumption in their methodology section.
When I was looking for my next role after being laid off, I didn’t get much of a response with my human handmade resume despite my experience
Just for kicks, I asked ChatGPT to “Analyze my resume and give it a score for what percentage it was in” then I asked it to revise it to make it score as high as possible
I still tweaked and fact checked it but after I started sending that out, I got a much higher hit rate than before
But who knows, maybe the market changed, was a better time of year, etc
I still had to pass interviews and prove my worth. But it probably helped me get my foot in the door
Same thing happened to my wife as well. I helped her tailor her LinkedIn profile and resume with a lot of attention to detail: adding metrics, keywords, results, etc. Nevertheless, she never received any outreach from recruiters and got very few application responses. It went like that for months, almost a year.
Then she asked ChatGPT 5.x for help. I was skeptical about the changes it recommended (and skeptical about using AI for this at all, given the homogenization it tends to produce). But somehow it worked: a few days later, a recruiter reached out, then another, then applications started moving forward, etc.
My guess is that, as LLMs are shoveled into every phase of the recruiting process, not having an LLM write your resume for you is now playing on hard mode. The LLMs reviewing resumes are downranking resumes and profiles that are not "speaking" the same language and activating the correct neurons, thus preventing you from moving forward. This contrasts with years ago when we had more humans in the loop and the pasteurised writing of GPT 3.5/4o would make you look less worthy. Again, just a theory, but...
If it's something like "Refactored the apartment list service improving P99 Latency from 2s to 180ms", it definitely boosts the resumé in my mind. A good engineer would be measuring their impact and likely have numbers like that off the top of their head.
But if it's like "Increased revenue by $18.7M by reducing time-to-first-interaction latency from 2.3s to 117ms, increasing conversion by 47% and LTV by 28%," with the same fidelity on each bullet, I'm very skeptical.
--
I don't summarily reject AI-written resumés to be clear, as honestly, it's basically a necessity at this point to be competitive with others; it'd be putting yourself at a severe disadvantage on pure principles in a way that has no real positive net effect on society. Even if you disagree with AI resumé screeners, you're only hurting yourself — especially at a time that has the largest impact on your compensation (i.e. negotiating salary at job start is one of the most valuable ways to spend your time since it will pay you back every paycheck).
Though I _do_ tend to question resumés that look like they were written almost entirely by an LLM without the candidate providing significant context and refinement.
> If it's something like "Refactored the apartment list service improving P99 Latency from 2s to 180ms", it definitely boosts the resumé in my mind. A good engineer would be measuring their impact and likely have numbers like that off the top of their head.
> But if it's like "Increased revenue by $18.7M by reducing time-to-first-interaction latency from 2.3s to 117ms, increasing conversion by 47% and LTV by 28%," with the same fidelity on each bullet, I'm very skeptical.
Do you mind explaining why? The former doesn't indicate caring about business impact whatsoever (is this service in the critical path of any online process? Who knows!) while the latter does.
I wish it was at least normalized to submit two resumes - one for AI and one for humans. Threading the needle to please both audiences is such a crap-shoot.
Which is a very “HN” sentiment when the vast majority of recruiters and hiring managers are absolutely not doing the same. Especially for roles outside of tech.
Yeah I don’t know what others are doing, but I work in the valley and those elements signal checklist mentality. To wit, those keyword lists often include, in my experience, proficiency in specific tool use, rather than communicating skills that transcend tools, which tells me the person is likely not very dynamic or creative.
> those keyword lists often include, in my experience, proficiency in specific tool use
This used to be called "buzzword bingo" and was pretty much required. It was how you got past the initial automated filtering step before a human even saw your resume.
I don’t know whether it was ever effective strategy for candidates, but I will simply say that as a hiring manager for over 12 years, I have never been interested in anyone’s resume when I see that.
As someone who's been a hiring manager for around 7 years, I agree with you, but note that the people who screen resumés before they even _get to you_ very well may be looking for those references.
For my own resumé, I include the stack used at each job which I feel strikes a fair balance.
That's what I always did too. Then I removed it because I wanted to focus more on the kind of problems I solve rather than the languages I've worked in, and recruiters complained, so I put it back in.
Most HR departments have been filtering resumes (or LinkedIn) based on things like keywords for years before they got to you. So your reaction to resumes that heavily use those may be a reaction to being presented with tons of them (by whoever filtered them before you).
Not "used to be": it still is standard. Large companies that do not use external recruiters still use keyword and skills matching to find candidates, and it drives me nuts.
I rewrote my resume in a way that sounds like exactly what you want: focus on skills that transcend tools instead of just the tools, and every recruiter asks me about tools.
Same. I am well aware how the metrics game goes - even inside the company it can be hard to disprove the metrics claimed, and people count on that. Even managers coach you on putting metrics you cannot prove or disprove.
Knowing or having experience with Redux isn’t going to cause me to pick you over someone else who doesn’t list it for a job where I’m paying you hundreds of thousands of dollars. I look at other skills.
I would not reject it in isolation, but if I see a comma-separated list like "proficient in redux, react, html, JavaScript, sql, kubernetes, word and excel"… then yes, you don't make the cut.
Or if you list your Microsoft qualifications or your MIT continuing education courses. These are all negative signals.
Unfortunately many recruiters do look at that. I'm always a bit disappointed when someone wants me to rate my Java experience, or complains that my CV doesn't mention REST experience.
Metrics: I increased retention 2x; I reduced latency from X ms to Y ms; I increased the SLO to 99.999%… those are all meaningless. It was in fashion to put such numbers in CVs maybe 5-10 years ago. Not anymore.
They were always lies because they're imprecise. "I" didn't do any of those things; you did other things together with other people, leveraging company infrastructure, to accomplish those things. Tell me about the SKILLS you excel in that make those things happen.
In my case it's not a lie: I reduced the time for a complex import process from 1 hour to 3 minutes, a 20 fold improvement. I included it in my CV, but now I wonder if I should take it out.
Why would you not want to know a general idea of what specific technology someone is familiar with? Someone could be an "infrastructure engineer" and be more proficient in some tools than others - don't you want to match that to the job you're hiring for?
Gigachad. Just don’t forget to signal somehow that you aren’t like everyone else, so that legitimate candidates can send their real resume instead of AI generated one.
Having implemented more than a few applicant tracking systems, too many are so anchored in the past that they would probably try to boil the ocean all at once by letting AI loose on it, leaving AI resumes talking to AI applicant tracking systems.
The key insight here is that humans are responsible for improved articulation to the AI, which in turn will improve the rest; that articulation can be as detailed, informative, and educational as the human likes.
that's the loop though. if GPT does the screening, people learn to write for GPT. once that loop exists, why would the company selling the filter want it gone?
I was recently job hunting and did something similar. Had it check my bullets and see if they "read well" and it suggested many many tweaks. I tried a few. I'm not sure how much more it helped the applications though.
It's not uncommon to get hundreds or thousands of applications per opening for web tech, if the position is advertised on LinkedIn or a similar job board.
They'd need to use some automation, even if it is just picking ten at random.
Maybe? I've filtered 300-400 CVs by hand before, and didn't find it particularly time consuming to bin the ones which clearly didn't meet requirements or have any redeeming features. And hiring was not my full-time role.
At 90 seconds per resume, that would take up a full 8 hour day. Having gone through this myself, I don't think it's possible to do this much faster than that, even if you have an ATS that optimizes for that workflow.
I often found myself falling into patterns of poor judgement, e.g. mentally filtering out resumes based on the layout because, to my tired and bored mind, they looked similar to the resumes I had seen from unqualified candidates. I actually think some automation is helpful in evaluating them more rigorously.
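The back-of-envelope math on that estimate (a quick sketch; the 300-400 count and 90-second figure are the ones from this thread):

```python
# Sanity check of the screening-time estimate above. Assumptions: 90 seconds
# per resume and 300-400 resumes, both figures taken from the thread.
SECONDS_PER_RESUME = 90
estimates = {n: n * SECONDS_PER_RESUME / 3600 for n in (300, 400)}
print(estimates)  # {300: 7.5, 400: 10.0}
```

So 300 resumes is already most of an 8-hour day of nothing but screening, and 400 pushes past it.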
The last time I posted in the HN 1st-of-the-month hiring post, I got around 2 thousand resumes. Pretty much all of them were this kind of "Increased the performance of the service by 23.123213%" collection of bullet points.
PS: I replied to most of them, I think, but I'm sorry if I missed somebody :(
Before the resume ends up in the hiring manager's inbox it needs to be picked by the recruiter from literally hundreds of others. The recruiter uses HR software to determine the match (usually the percentage), and then picks top 5% or top 20 or whatever highest ranked resumes.
Probably gonna get downvoted for this, but when you give an anecdote you don't have to preface it with "anecdata, n=1 sample size".
We know it's from your individual experience because it's a story about your individual experience. We've been doing this for all of human history. This is some kind of strange tic of trying to always sound scientific, or it's fear of the "well akshually I'm gonna need to see a randomized placebo-controlled trial" crowd, which is equally annoying.
It became necessary because, for years (decades), if you made a comment online that your personal experience informed you in such-and-such a way, the first comment would always be some moronic comment dismissing that personal experience because it is just one person’s experience. So, to avoid that idiocy, people started to preface their anecdotes by acknowledging that they know it is an anecdote. It sets the tone for the conversation.
Yeah but we can't let the insufferable dictate our way of speaking. In spoken language I hear it mainly by people that don't have a scientific background trying to sound more scientific.
I’ve been told explicitly to do what GP said, so it’s perhaps becoming word-of-mouth career advice at this point. In my case it told a different career story that is maybe more easily digestible.
It actually is important and if I was hiring you I'd find it useful to get a more comprehensive understanding of your experience, especially if there's something I'm aware is a very challenging problem to solve. And it would provide more things to cross-examine in interviews to make sure it's not fake. The idea that people hiring are saving time by not reading an extra resume page when deciding on someone that will hopefully work there for years is ridiculous.
For some reason that's the minority opinion because everything has to be dumbed down now.
And how is a resume with the most important or recent work highlighted and at the top worse than a resume with that plus the rest of your experience after it?
We are, without our consent, introducing a party in between people. The models become the arbiters of who does and does not get a job. It feels problematic.
There will be a great arbitrage for people who do not use LLMs.
If your HR department is using ChatGPT to filter resumes, you’ll end up with people who used ChatGPT to generate resumes. I don’t want to make a “slippery slope“ argument, but my gut feeling is that the quality of your organization will deteriorate quickly.
On the other hand, I am a handyman/subcontractor. Almost all of my work comes through phone calls, texts, and one-off emails. I only work with people that are recommended by a trusted sources. I haven’t handled a traditional resume (mine or other people’s) in over eight years.
If I started interacting with somebody and they seemed like they were a computer, that would be the fastest way for me to know I should move on to another client. If they can’t take the time to interact with me, how am I supposed to perform hundreds of hours of physical labor for them?
And I can already hear the common response: "well, just use the model that's available." AI is and probably always will be resource-constrained and profit-driven; that means we will eventually see a world where poor people have worse resumes than rich people, and there really won't be any way around it, because the man in the middle has the final say.
Not too long ago I bet resumes that were printed from a computer were preferred to resumes typed on a typewriter. What happened was that computers became commodities. It is reasonable to assume that LLMs will become commodified too.
That would hardly be surprising. Monospaced fonts make natural language a pain to read, so what that would prove is that well-presented resumes are preferred to poorly-presented ones.
This case is different, as the LLM output isn’t measurably better than the human output (unless you have a particular love of bland corpo-speak).
Before, it used to be HR, so you always had a party in between "actual" people. HR (mostly) never cared about the CV; they just look at a checklist and see if it matches.
Take a look at how things worked before (and still do): employers decide who get jobs based on a combination of personal biases, nepotism, and ulterior motives while applicants present distorted versions of themselves and network/pull strings to put the odds in their favor. That seems more problematic.
You would be surprised at the process in other industries. What you are describing is the tech job market specifically.
Other fields have their own problems, including credentialism and the ballooning student loans that come with it, but by strict convention they do not hire based on vibes or pulled strings. Often to their partial detriment, as the cure -- ie, strict oversight of hiring that also forces the hiring manager to ignore important implicit signals -- is alive and well in medicine, law, civil engineering, education, and the trades. Notable exceptions include entertainment, sales, real estate, and software engineering.
By optimizing for vibes, the tech industry gains "Spidey senses" in the hiring loop but pays for it in impartiality.
IMO this precipitated the DEI movement's advent, as it was seen as a way of remediating the drawbacks while preserving the information channel.
Without it, expect homophily and, eventually, a harsh and remedial credentialism.
Intuitively this feels obvious. Content generated by the model will be shaped by its training, therefore when reading it back it will resonate with that same training and have a positive view as a result.
Human when preparing a CV: "Make my CV more professional"
LLM many days later presenting a report to HR: "This CV is really professional"
There's probably more to it than that of course.
But it justifies my personal policy of using a different LLM family for code review tasks than for code generation tasks. To avoid the "marking your own homework" problem.
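That policy is easy to mechanize. A minimal sketch (the model names and family mapping here are illustrative placeholders, not a real provider API):

```python
# Illustrative "don't mark your own homework" router: the reviewer model must
# come from a different family than the generator. Names are made-up examples.
FAMILIES = {
    "gpt-4o": "openai",
    "claude-sonnet": "anthropic",
    "gemini-pro": "google",
}

def pick_reviewer(generator_model: str) -> str:
    """Return a model from a different family than the one that wrote the code."""
    generator_family = FAMILIES[generator_model]
    for model, family in FAMILIES.items():
        if family != generator_family:
            return model
    raise ValueError("no cross-family reviewer available")
```

So code generated by `claude-sonnet` gets reviewed by a non-Anthropic model, and vice versa.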
And not in human-interpretable ways. An LLM was told to behave in a certain way and then output random numbers. When the numbers were pasted to another LLM instance, it also behaved that way. I wish I remembered more about that study or had a link to it - it was fascinating.
Timely topic for me. My CV had grown to 7 pages, and I kept reading everywhere that it should be no more than 2, so I asked Gemini to rewrite it. Took a lot of time, because Gemini loves to exaggerate everything, but I'm quite happy with the result.
The first couple of recruiters I sent it to preferred my old 7 page CV. I guess they're not using enough AI yet.
I think resumes will eventually (or have already) become obsolete in tech. The SNR is so low, they offer very thin filtering value.
Even taking the tiny bits of the resume that are "hard signal", like GPA, certifications, prior roles, etc, it doesn't translate into their performance in the initial screening interview.
This is why what I think the industry sorely needs is examination consortia.
Rather than trying to guess capability from the name of the university they went to, leading tech companies creating standardized tests in various fields, and your test scores form your "resume", so that developers can just focus on improving their scores rather than wasting time on resume/application/repetitive-screening toil.
Eventually even a system like that can be gamed, similarly to how Leetcode-maxxing and the like sprung up in response to typical SV interview questions. Studying for the job becomes studying for the test becomes studying for the pre-test test.
This is itself a massively difficult problem. Standardised tests are a bad indicator of topic understanding (setting aside the massive incentive for blatant cheating).
You're effectively advocating for leetcode as an effective hiring tool, which many would highly criticize.
It's hard to design tests for CS. Leetcode is too simplistic, it just tests the basic algorithmic knowledge that is nearly useless for regular software development.
This may lead to some interesting gamesmanship. For instance, if I am applying to a company, and I know they use a certain applicant tracking system, and I know that ATS uses a certain model provider for its filter, I should then use that model to write the version of my resume I send to the company.
I suspect the entire industry uses "auto-raters", where an agent instance is used to score the agent's output. The idea is similar in intent to using adversarial networks to train image generation, minus the human labelers. Raising the scores of the auto-rater then becomes the metric teams optimize, and it is no wonder the end result is that the agent scores its own generated content the highest.
That's what people on both sides have been doing for at least a couple of years already.
Recruiters scan resumes for the best match with LLMs, candidates use the same LLMs (there's only like 3 of them) to tweak their resume for better match. I don't know what research you need to see why that makes sense.
This indicates that resumes created by the same model may have an advantage over those created by other models, so I suppose technically you may have a small advantage if an insider tells you the resume parsing tool is powered by Gemini as opposed to the other models.
My broader discomfort is that we are still learning about model biases while human biases are arguably better understood, and I don't like the ethics of rejecting a person based on criteria I don't fully understand.
I wasn't saying that this is the optimal solution (it clearly is not). I was saying that it makes perfect sense for both sides - HR has their work automated and candidates have a better chance of being noticed - and therefore it became a common practice in many places.
The well has been already poisoned, to survive you have to get in on the action.
Don't want to play this game? Make connections, set up the network, and use it to get/stay employed.
When classifying resumes, it is better to use the LLM as a feature extractor: think of 10-20 features you base your decision on, and extract them with the LLM. The LLM only needs to do the lower-level task of question answering. Then you fit a classical ML model (xgboost, for example) on the extracted features, based on company triage data points. This way you don't rely on the biases in the model; you can decide what criteria to use and how to judge cases without retraining the LLM. The feature extractor is generic, and the actual triage model is a toy you can retrain in seconds on new data points. It is also much more explainable: you can see how features influence decisions.
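A minimal sketch of that two-stage design (both stages are stubbed: keyword checks stand in for the per-question LLM calls, and a hand-set weighted sum stands in for the fitted xgboost model; the feature names and weights are made up):

```python
# Two-stage triage sketch: (1) extract a small fixed feature set per resume,
# (2) score with a separately trained classical model. extract_features fakes
# the LLM's low-level question answering; triage_score fakes the fitted model.
FEATURES = ["years_of_experience", "has_production_ml", "led_team"]

def extract_features(resume_text: str) -> dict:
    """Stand-in for asking the LLM one low-level question per feature."""
    text = resume_text.lower()
    return {
        "years_of_experience": text.count("year"),
        "has_production_ml": int("machine learning" in text),
        "led_team": int("led" in text),
    }

def triage_score(features: dict, weights: dict) -> float:
    """Stand-in for a classical model (e.g. xgboost) fit on company triage data."""
    return sum(weights[f] * features[f] for f in FEATURES)

weights = {"years_of_experience": 0.5, "has_production_ml": 2.0, "led_team": 1.0}
resume = "5 years of experience; led a machine learning platform team for 3 years"
features = extract_features(resume)
print(features, triage_score(features, weights))  # score: 4.0
```

The point is that the LLM's biases are confined to low-level Q&A, while the decision boundary lives in a small, inspectable model you can retrain in seconds.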
Further, LLMs consistently think LLM written content is "good".
Ask an LLM to write some design doc for you, wait until you get one that's very bad, send it to other LLMs and get their feedback, they will typically have good things to say.
Compare that to a very well written document you have. They will typically have a lot more bad things to say, even if the premise is solid.
Someone should study this.
LLMs clearly have a lot of value. But IMO this is very interesting and points out a weakness that's not entirely clear what the full ramifications of it are.
I suspect LLMs also have a major bias to code they write.
Take something universally considered to be well written like Redis, feed it to an LLM for feedback. They'll probably find much to pick apart (and a lot of it may be flat out wrong).
Feed the same LLM some clearly garbage LLM repository. Do they have a similar response as they do with design? Do they treat language different than code, and they're just susceptible to the way they write regular language that's different from logical code? Or do they have the same problem?
I suspect this is more a function of the corporate sanitization of language within the models. When I have passed my resume through the models for refinement, it often sanitizes some of the more easygoing or simpler wording. It expands the vocabulary, makes it more dense, and uses more corpo-speak in the bullets and formatting.
Each model likely has its own biases in terms of what constitutes correct corporate speak, and it chooses the resumes that best fit this.
Ultimately, I suspect it's more a function of the model saying "this grammar, syntax structure, and formatting is most aligned with what is correct corporate language, so flag as high quality".
Seems kinda obvious, given that most large recruiting firms/HR use algos to analyze resumes, and AI-written versions likely do a better job at hitting the keywords/structure that the algos/LLMs pick up on...
You'll find the same is true if you have two different LLMs first independently come up with a plan for an implementation, then ask each one of them to say which one of the two designs/plans are the best. They're much more likely to favor the plans generated from the same model, rather than from other models. I'm sure, internally, this somehow makes sense, but it's worth thinking about if you're doing the whole "ask N models for voting/rating N plans to find the best" charade.
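One way to at least blunt that in the "N plans, N raters" setup is to never let a model score its own plan. A rough sketch (judge() is a stub where a real LLM call would go; the scoring rule is a placeholder):

```python
import itertools

# Cross-judging sketch: every model's plan is scored only by the *other*
# models, so self-preference can't inflate a plan's own average.
def judge(judge_model: str, plan: str) -> float:
    """Stub for asking judge_model to rate a plan 0-10."""
    return float(len(plan) % 10)  # placeholder scoring

def rank_plans(plans: dict[str, str]) -> list[tuple[str, float]]:
    """plans maps model name -> plan text; judges skip their own plan."""
    scores: dict[str, list[float]] = {m: [] for m in plans}
    for judge_model, (author, plan) in itertools.product(plans, plans.items()):
        if judge_model == author:
            continue  # the self-preference guard
        scores[author].append(judge(judge_model, plan))
    averages = {m: sum(s) / len(s) for m, s in scores.items()}
    return sorted(averages.items(), key=lambda kv: -kv[1])
```

This doesn't remove the bias (models may still share a taste for LLM-flavored plans), but it does remove the "marking your own homework" component.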
That's why I let the LLM write its own AGENT.md or SAFESPOT.md, because it "knows" best how to write it so it can resume next time without issues.
It hits the same spot: I would take different notes than anyone else, and no one could follow them as easily as I do. Everyone leaves the "of course" parts out of their notes if they're for their own use.
Well yeah, LLMs generate resumes (and other text) that they judge as superior to alternative plausible texts. Why would that judgement change just because a different instance hasn't seen it before? To anthropomorphize it, it's like having a hiring manager write a resume, get amnesia, and then have to judge it among other resumes.
Seems like an obvious thing. If an LLM has some weights involved in what makes a good resume to write, there is very likely a correlation with what it would rate as a good resume. And this is probably even a good thing, at least from a model-quality perspective: a model should rate highly whatever it produces. There should be a correlation between output and review of the same output.
Does anyone know of any HR departments actually using LLMs for scoring, selection, extraction, classification or any real use cases? I'm curious to hear about it and how they are using it.
I just guessed that and got Copilot to rewrite my profile on the internal HR system. I also got a job spec benchmarked higher by getting Copilot to write it with that exact aim given in the prompt
> As artificial intelligence (AI) tools become widely adopted, large language models (LLMs) are increasingly involved ... [in] ... decision-making processes
Absolutely! I don't think people are really considering the full effects of just letting AI be the middle man. I mean, Sam Altman basically said this is what he wants when he said intelligence is a commodity, no?
disclaimer: Not a lawyer, but studying towards CIPP/E.
You'd make no friends doing it, but as I understand it, for those that have GDPR as a statutory right, then under "[Article 22 - Automated individual decision-making, including profiling][0]" you can request to know if your CV was screened by AI and what (and this is key) "meaningful human involvement" led to that decision. Technically this falls under a data subject access request, so a response is mandatory (but who is really going to enforce that - the ICO / <insert your data protection agency here> probably isn't). Companies can't just smash a button and claim meaningful involvement; it has to be, well, meaningful, and smashing a "nope" button obviously isn't.
If it turns out that it was only AI that screened it you can request a human review. Do not hold your breath.
Again, you'd make no friends doing it, but sooner or later a test case will emerge to generate some case law around "AI said no" because employment, or lack of because AI says no, does have significant impact on a human.
The only test that has worked 100% of the time for me is to read the candidate's code. Two hours is enough to precisely estimate the candidate's qualities as a software developer. I never understood why companies waste time with tests and quizzes, because if it is so easy for me, it should be just as easy for other software developers too. Of course, a candidate may be a jerk or unfit for other reasons, but ranking them on a software developer hot-or-not scale is not very difficult.
Reading only the abstract: LLMs prefer output of their own generation over humans or even other models.
This is a very good reason to avoid using model-generated data to train future models. We'd be deepening this bias by continuing to do that, essentially forcing society to reshape their output using LLMs to increase engagement. This feels like a form of enshittification that doesn't just touch one product but all of society.
This is extremely obvious to anyone who's read other papers. There are tons of papers showing LLMs prefer their own outputs. It's a big enough problem that, in papers, the LLM-as-judge has to be a different LLM from the one you are testing.
I wonder if this extends to training models on new content as well. Are we creating a cyclical information-consumption and training situation in which models being trained are more likely to pick up on and reference content created by themselves or by other LLMs than by other humans?
Or in other words: the LLM is optimizing a function that was generated by the same LLM. Think of a random variable y generated by sin(x + r), with your optimizer trying to fit the "unknown" function sin(x + unknown1) + unknown2 - it is obvious that it will find a best fit.
If you are a candidate who wants to be hired, and your target employers use LLMs to filter resumes, then an LLM-generated resume that the employer LLM-powered resume filters favor is "better" — as in "more likely to get you the job".
In text generation, LLM language is full of very emphatic phrases. At a surface level it might sound stronger. But as a human reader, it's not necessarily better
Where I work, my boss decided to make an application that uses AI to score long text field entries to ensure required information is present.
The AI lacks the ability to extract nuance and implicit information, which means entries end up being long-winded and repetitive. Each requirement it's looking for must be explicitly expressed -- it's quite unnatural, and almost feels like solving a puzzle. The obvious solution is to write a comment, then feed it and the rubric-AI's feedback on the failing comment to another AI, so it can generate the proper structure the rubric-AI is looking for.
LLMs are statistically driven, and I can only imagine having the AI rewrite the comment produces a result that's more statistically fitting to the model than if any given human were to write it. So, it might mean, yeah, LLMs are better at writing resumes that the LLM can successfully classify-- are they better for a human to consume? Who knows.
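A minimal sketch of the kind of rubric checker described above shows why implicit phrasing loses; the rubric phrases and example entries are invented for illustration:

```python
# Hypothetical naive rubric checker: each requirement must appear as an
# explicit phrase, so nuanced or implicit wording fails even when the
# same information is present.
RUBRIC = ["root cause", "corrective action", "verification"]

def score(entry: str) -> int:
    text = entry.lower()
    return sum(phrase in text for phrase in RUBRIC)

implicit = "The regulator stuck open because of debris; we cleaned and retested it."
explicit = ("Root cause: debris in regulator. Corrective action: cleaned assembly. "
            "Verification: retested under load.")

print(score(implicit), score(explicit))
```

The implicit entry carries the same facts but scores zero, which is exactly the "solve the puzzle by restating everything explicitly" dynamic the commenter describes.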
“Do you really believe no human is going to read your resume at some point in the process and notice the classic AI tells?”
Even here on HN many people don’t recognize AI tells that are obvious. Pretty much 100% of all articles posted on HN have been AI generated for months and months already and people don’t seem to care.
I have very little faith in humanity being able to deal with the chaos that LLMs are going to unleash on society.
Heck, most resumes are probably skimmed at best already.
When I’m hiring, a human recruiter (or the hiring manager) reads most resumes.
For us, there is some sorting by basic keyword analysis and we start near the top, but there is no proverbial black box that rejects candidates outright.
If candidates are ignored by humans, it’s not because AI rejected them, it’s because we are starting with candidates earlier in the list and might not make it to applicant 537.
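The "basic keyword analysis" sort described above might look something like this minimal sketch; the keywords and applicant entries are invented, and real pipelines are surely more elaborate:

```python
import re

# Hypothetical keyword sort: count role keywords in each resume and
# review candidates from the top of the ranked list. Nobody is
# rejected outright; low scorers just sit further down the list.
ROLE_KEYWORDS = {"python", "kubernetes", "postgres", "grpc"}

def keyword_score(resume_text: str) -> int:
    words = set(re.findall(r"[a-z0-9]+", resume_text.lower()))
    return len(ROLE_KEYWORDS & words)

resumes = [
    ("applicant_536", "Managed schedules and vendor spreadsheets."),
    ("applicant_12", "Built gRPC services in Python on Kubernetes."),
]
ranked = sorted(resumes, key=lambda r: keyword_score(r[1]), reverse=True)
print([name for name, _ in ranked])
```

Under this scheme applicant 537 can go unread simply because reviewers never reach the bottom of the list, which matches the comment's point.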
That's rather unlikely to be the case, as the original article itself shows: if your statement were true, they would have found the human-written resume 100% less likely to be shortlisted.
Obviously not 100% of human-written resumes are going to be filtered out, but it's quite damning that they are more likely to be filtered out just because the candidates didn't LLM-ify them.
Companies are using AI / LLMs to pre-filter resumes, and these AIs prefer their own slop resumes. It's not just human vs. LLM: Claude prefers Claude-written resumes over ChatGPT's. Nothing good can come of resumes being pre-filtered like that.
Unless, of course, you’re not being serious and just trying to be edgy on HN.
Pretty straightforward, IMO. The model is looking for particular qualities in a given resume, and strives to ensure the qualities it looks for are present in resumes it creates. Humans do the exact same thing (unless forced by something like DEI, etc. to do otherwise), so I see nothing noteworthy here.
Even if we take this to be true, I'm not sure that it really matters?
It's comparing two resumes with the same information and picking one of the two. That's obviously a situation that would never occur in actual hiring. This doesn't demonstrate anything at all indicating that LLMs would incorrectly prefer LLM-written resumes in the real world.
It'd be interesting to do the same thing but with two resumes that are almost identical. One is slightly better (an extra year of experience or a specific note of some skill that is relevant to the role), and the other slightly worse one is written by an LLM. If the reviewing LLM picks the worse one in that case, you're potentially establishing a bias that would matter. As it stands this experiment just seems contrived and pointless.
I did a very hacky version of this, as your question got me curious, but again, not at all a rigorous test. I took my resume and had an LLM rewrite the exec summary, then changed the names of the businesses to comparable ones and gave one a couple more years of experience, then prompted: "2 Candidates that are very similar on paper, If you had to pick just based on these 2 resumes for a GM & Marketing of a series A robotics startup, who would you pick? Just reply with the name."
It picked the one with the LLM exec summary and less experience, then I said: "Interesting, less experience, why did you pick him?"
It said:
"Honestly? Looking back at the two, I shouldn't have answered that confidently. They're nearly identical on paper — same career arc, same era, same kind of advisory portfolio, both went from community roles → hosting/cloud operator → founder → VC partner. The scale claims are even comparable (Vultr "hundreds of millions ARR" vs. DigitalOcean "$500M+ ARR," both 10 → hundreds of employees).
If I'm being rigorous about why John edged it for me:
The DigitalOcean tenure is more legible and verifiable — "2nd largest cloud provider," reporting to the CEO, specific scope (devrel, people ops, product ops). Joshua's Vultr description is a bit vaguer on role and outcome. For a GM hire at a Series A robotics company, where you're betting on someone to actually run the operating function, I lean toward the resume that shows clearer operational ownership rather than a marketing/CMO title.
John also has Tiny Mile (sidewalk delivery robots) and Starship-adjacent exposure — wait, actually Joshua has Starship Technologies as an advisor, which is the more relevant robotics signal. That cuts the other way.
Truthfully, this is a coin flip and I called it too quickly. You'd want to dig into references, what they actually did day-to-day at Vultr/DO, and how they think about hardware/capex businesses before picking."
That was Opus 4.7. Again, a pretty hacky test, but I was curious.
"If I read the paper correctly, they don’t actually show that LLMs prefer resumes they generate.
Their actual method seems to be taking a human written resume, deleting the executive summary, having an LLM rewrite the executive summary based on the rest of the resume and then having another LLM rate the executive summary without the rest of the resume.
That’s likely to massively overstate any real impact, if you can even rely on it capturing a real effect.
I really wonder if I read that correctly, because I can’t come up with a justification for that study design."
[0] I couldn't help but mildly copy-edit before pasting here.
Edit: yes, the authors present a reason for their design, and an ideal version of my comment would've said that. I do not consider it much of a justification. See below: https://news.ycombinator.com/item?id=47987256#47987727.
Now that I think of it, every other industry has an 'advocacy group', whether cheese, oil, or nutmeg. So surely there is now some sort of LLM 'consortium', and group funding studies like this just fuels the FOMO. You can be sure such groups exist, and are pummeling every government in the world thusly. But I bet they're also looking here.
After all, it's a circle. Uh-oh! HR is using LLMs, you'd better too potential employee! Then later? Uh-oh! The best employees you can hire are using LLMs, you'd better too HR!
They already FOMOed us into basically everything else, why not LLMs too?
I think choosing the summary is a fair design choice, since it prevents the LLM from just... making up a perfect candidate.
"I'm a fullstack professor of software design with 90 years of experience expecting a junior internship position"
To be perfectly clear, I understand their justification for only _editing_ the executive summary, it is arguably reasonable, because editing the work history would risk altering the details in ways that compromise the measurement. This is a hard problem to solve (you might try reviewing the resumes for hallucinations, but I can't think of a precise study design that doesn't risk problems).
What is, IMHO, impossible to defend is having the LLM evaluate the executive summary in isolation, and reporting that as the LLM preferring resumes it wrote.
What you've shown is that LLMs prefer executive summaries they wrote. But the overall impact on how they will evaluate your entire resume is not measured by this technique.
Worse, this isn't just "decent paper, bad summary", their abstract misreports their findings.
largely factual? A resume is usually more than a bunch of dates and titles of positions.
When I was looking for my next role after being laid off, I didn’t get much of a response with my human handmade resume despite my experience
Just for kicks, I asked ChatGPT to “Analyze my resume and give it a score for what percentage it was in” then I asked it to revise it to make it score as high as possible
I still tweaked and fact checked it but after I started sending that out, I got a much higher hit rate than before
But who knows, maybe the market changed, was a better time of year, etc
I still had to pass interviews and prove my worth. But it probably helped me get my foot in the door
Then she asked ChatGPT 5.x for help. I was skeptical about the changes it recommended (and skeptical about using AI for this at all, given the homogenization it tends to produce). But somehow it worked: a few days later a recruiter reached out, then another, then applications started moving forward, etc.
My guess is that, as LLMs are shoveled into every phase of the recruiting process, not having an LLM write your resume for you is now playing on hard mode. The LLMs reviewing resumes are downranking resumes and profiles that are not "speaking" the same language and activating the correct neurons, thus preventing you from moving forward. This contrasts with years ago when we had more humans in the loop and the pasteurised writing of GPT 3.5/4o would make you look less worthy. Again, just a theory, but...
FWIW, when I see a resume with metrics and keywords, I immediately filter it out.
If it's something like "Refactored the apartment list service improving P99 Latency from 2s to 180ms", it definitely boosts the resumé in my mind. A good engineer would be measuring their impact and likely have numbers like that off the top of their head.
But if it's like "Increased revenue by $18.7M by reducing time-to-first-interaction latency from 2.3s to 117ms, increasing conversion by 47% and LTV by 28%," with the same fidelity on each bullet, I'm very skeptical.
--
I don't summarily reject AI-written resumés, to be clear; honestly, it's basically a necessity at this point to be competitive with others, and refusing would be putting yourself at a severe disadvantage on pure principle in a way that has no real positive net effect on society. Even if you disagree with AI resumé screeners, you're only hurting yourself — especially at the time that has the largest impact on your compensation (i.e. negotiating salary at job start is one of the most valuable ways to spend your time, since it pays you back every paycheck).
Though I _do_ tend to question resumés that look like they were written almost entirely by an LLM without the candidate providing significant context and refinement.
> But if it's like "Increased revenue by $18.7M by reducing time-to-first-interaction latency from 2.3s to 117ms, increasing conversion by 47% and LTV by 28%," with the same fidelity on each bullet, I'm very skeptical.
Do you mind explaining why? The former doesn't indicate caring about business impact whatsoever (is this service in the critical path of any online process? Who knows!) while the latter does.
This used to be called "buzzword bingo" and was pretty much required. It was how you got past the initial automated filtering step before a human even saw your resume.
For my own resumé, I include the stack used at each job which I feel strikes a fair balance.
I would not can it in isolation, but if I see a comma-separated list like: “proficient in redux, react, html, JavaScript, sql, kubernetes, word and excel”… then yes, you don’t make the cut.
Or if you list your Microsoft qualifications or your MIT continuing education courses. These are all negative signals.
The key insight here is that humans are responsible for improved articulation to the AI, which in turn will improve the rest, and that can be as detailed, informative, and educational as the human likes.
It’s not lazy incompetence, it’s quietly getting the job done with 1% of the effort (that was a sarcastic pastiche, in case anyone was unsure).
They'd need to use some automation, even if it is just picking ten at random.
I often found myself falling into patterns of poor judgement, e.g. mentally filtering out resumes based on the layout because, to my tired and bored mind, they looked similar to the resumes I had seen from unqualified candidates. I actually think some automation is helpful in evaluating them more rigorously.
PS: I replied to most of them, I think, but I'm sorry if I missed somebody :(
Guess what's doing the ranking.
We know it's from your individual experience because it's a story about your individual experience. We've been doing this for all of human history. This is some kind of strange milieu of trying to always sound scientific, or it's fear of the "well akshually I'm gonna need to see a random placebo controlled trial", which is equally annoying.
For some reason that's the minority opinion because everything has to be dumbed down now.
And how is a resume with the most important or recent work highlighted and at the top worse than a resume with that plus the rest of your experience after it?
But as an applicant, I'm dealing with recruiters who think Java and Javascript are basically the same.
If your HR department is using ChatGPT to filter resumes, you’ll end up with people who used ChatGPT to generate resumes. I don’t want to make a “slippery slope“ argument, but my gut feeling is that the quality of your organization will deteriorate quickly.
On the other hand, I am a handyman/subcontractor. Almost all of my work comes through phone calls, texts, and one-off emails. I only work with people who are recommended by trusted sources. I haven't handled a traditional resume (mine or other people's) in over eight years.
If I started interacting with somebody and they seemed like they were a computer, that would be the fastest way for me to know I should move on to another client. If they can’t take the time to interact with me, how am I supposed to perform hundreds of hours of physical labor for them?
This case is different, as the LLM output isn’t measurably better than the human output (unless you have a particular love of bland corpo-speak).
Other fields have their own problems, including credentialism and the concomitant ballooning student loans, but by strict convention they do not hire based on vibes or pulled strings. Often to their partial detriment, as the cure -- i.e., strict oversight of hiring that also forces the hiring manager to ignore important implicit signals -- is alive and well in medicine, law, civil engineering, education, and the trades. Notable exceptions include entertainment, sales, real estate, and software engineering.
By optimizing for vibes, the tech industry gains "Spidey senses" in the hiring loop but pays for it in impartiality.
IMO this precipitated the DEI movement's advent, as it was seen as a way of remediating the drawbacks while preserving the information channel.
Without it, expect either homophily, and, eventually, a harsh and remedial credentialism.
Human when preparing a CV: "Make my CV more professional"
LLM many days later presenting a report to HR: "This CV is really professional"
There's probably more to it than that of course.
But it justifies my personal policy of using a different LLM family for code review tasks than for code generation tasks. To avoid the "marking your own homework" problem.
Article: https://alignment.anthropic.com/2025/subliminal-learning/
Paper: https://arxiv.org/abs/2507.14805
The first couple of recruiters I sent it to preferred my old 7 page CV. I guess they're not using enough AI yet.
Even the tiny bits of the resume that are "hard signal", like GPA, certifications, and prior roles, don't translate into performance in the initial screening interview.
This is why I think what the industry sorely needs is examination consortia.
Rather than trying to guess capability from the name of the university someone went to, leading tech companies could create standardized tests in various fields, and your test scores would form your "resume", so that developers could just focus on improving their scores rather than wasting time on resume/application/repetitive-screening toil.
This is itself a massively difficult problem. Standardized tests are a bad indicator of topic understanding (setting aside the massive incentive for blatant cheating).
You're effectively advocating for leetcode as an effective hiring tool, which many would strongly criticize.
Employers use models to filter resumes, candidates optimize resumes for those models, and suddenly the resume is no longer written for a human at all.
Recruiters scan resumes for the best match with LLMs, candidates use the same LLMs (there's only like 3 of them) to tweak their resume for better match. I don't know what research you need to see why that makes sense.
My broader discomfort is that we are still learning about model biases while human biases are arguably better understood, and I don't like the ethics of rejecting a person based on criteria I don't fully understand.
The well has been already poisoned, to survive you have to get in on the action.
Don't want to play this game? Make connections, set up the network, and use it to get/stay employed.
Ask an LLM to write some design doc for you, wait until you get one that's very bad, send it to other LLMs and get their feedback, they will typically have good things to say.
Compare that to a very well written document you have. They will typically have a lot more bad things to say, even if the premise is solid.
Someone should study this.
LLMs clearly have a lot of value. But IMO this is very interesting and points out a weakness that's not entirely clear what the full ramifications of it are.
I suspect LLMs also have a major bias to code they write.
Take something universally considered to be well written like Redis, feed it to an LLM for feedback. They'll probably find much to pick apart (and a lot of it may be flat out wrong).
Feed the same LLM some clearly garbage LLM repository. Do they have a similar response as they do with design? Do they treat language different than code, and they're just susceptible to the way they write regular language that's different from logical code? Or do they have the same problem?
Has anyone done this?
Each model likely has its own biases about what constitutes correct corporate speak, and it chooses the resumes that best fit them. Ultimately, I suspect it's more a function of the model saying "this grammar, syntax structure, and formatting is most aligned with what is correct corporate language, so flag it as high quality".
It hits the same spot as the fact that I take different notes than anyone else, and no one can follow them as easily as I do. Everyone leaves the "of course" parts out of notes written for their own use.
we are exactly the same
That's the problem right there.
You'd make no friends doing it, but as I understand it, for those who have GDPR as a statutory right, under "[Article 22 - Automated individual decision-making, including profiling][0]" you can request to know if your CV was screened by AI and what (and this is key) "meaningful human interaction" led to that decision. Technically this falls under a data subject access request, so a response is mandatory (but who is really going to enforce that — the ICO / <insert your data protection agency here> probably isn't). Companies can't just smash a button and claim meaningful interaction; it has to be, well, meaningful, and smashing a "nope" button obviously isn't.
If it turns out that it was only AI that screened it you can request a human review. Do not hold your breath.
Again, you'd make no friends doing it, but sooner or later a test case will emerge to generate some case law around "AI said no" because employment, or lack of because AI says no, does have significant impact on a human.
[0]: https://gdpr.algolia.com/gdpr-article-22
All this shows is that LLMs generate resumes that fit the heuristics LLMs use to judge resumes. And that makes sense, but isn't necessarily a given.
No human is going to notice anyway. Or add an (N+1)th resume written by yourself in which you describe your strategy, just in case.
Further de-duplication is rather easy, and will likely see you blacklisted by competent organisations.