I cheated on my technical interview with ChatGPT and no one knew

03-17

This article is machine translated

Author | Michael Mroczka

Translator | Pingchuan

Planning | Chu Xingjuan

It’s no secret that ChatGPT has revolutionized the way people work. It can help small businesses automate administrative tasks, and it can write entire React components for web developers. Its impact is hard to overstate.

At interviewing.io, we’ve been thinking about what changes ChatGPT will bring to technical interviews. The big question is: will ChatGPT make it easy to cheat in interviews? In one TikTok video, an engineer has ChatGPT supply accurate answers to an interviewer's questions.

People's initial reactions to this kind of cheating software were exactly what you would expect:

A Redditor says, "As we all know, ChatGPT is the end of coding."

A YouTuber says, “Software engineering is dead, ChatGPT killed it.”

A user on X (formerly Twitter) asks, “Will ChatGPT mean the end of coding interviews?”

It might seem obvious that ChatGPT can help people during the interview process, but here's what we want to know:

How much can it help?

How easy is it to cheat (and get away with it)?

Will companies using LeetCode questions need to make significant changes to their interview processes?

To answer these questions, we recruited some of our professional interviewers and users to run a cheating experiment! Below, we share everything we found. Spoiler alert, here's what you need to know: companies need to change the type of interview questions they ask, and now!

Experiment preparation

interviewing.io is an interview practice platform and recruitment marketplace for engineers. Engineers use our platform to simulate interviews. Businesses use our platform to recruit great employees. We have thousands of professional interviewers in our ecosystem and thousands of engineers using our platform to prepare for interviews.

Interviewers

Interviewers were drawn from our pool of professional interviewers. They were divided into three groups, and each group was asked to use a different type of question. The interviewers did not know that the experiment was about ChatGPT or cheating; we told them, “The purpose of this study is to understand trends in interviewer decision-making over time, especially when asking standard and non-standard interview questions.”

There were three question types:

Original LeetCode question: The interviewer picked a question directly from LeetCode, using their own judgment, and asked it without any modification.

For example: Ask the Sort Colors question on LeetCode verbatim.

Modified LeetCode question: The interviewer took a question from LeetCode and modified it, so that it is similar to the original but noticeably different.

For example: For the Sort Colors problem above, change the input from an array containing three distinct values (0, 1, 2) to one containing four (0, 1, 2, 3).

Custom question: The interviewer asked a question that is not directly related to any question already on the web.

For example: you are given a log file in the following format: - <username>: <text> - <contribution score> -, and your task is to identify the users in the session who represent the median engagement. Only users whose contribution score is greater than 50% are considered. Assuming that the number of such users is an odd number, then you need to sort by contribution score to find the user in the middle. For the file below, the correct answer is SyntaxSorcerer.
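To make the "modified LeetCode question" example concrete, here is a minimal sketch of the Sort Colors problem and its four-value variant. The function names are our own, and the choice of algorithms (a three-way partition for the original, a counting sort for the variant) is an assumption for illustration, not something the experiment prescribed.

```python
from typing import List


def sort_colors_three(nums: List[int]) -> None:
    """Original problem: values are only 0, 1, 2.

    One-pass, in-place three-way partition.
    """
    low, mid, high = 0, 0, len(nums) - 1
    while mid <= high:
        if nums[mid] == 0:
            nums[low], nums[mid] = nums[mid], nums[low]
            low += 1
            mid += 1
        elif nums[mid] == 1:
            mid += 1
        else:  # nums[mid] == 2
            nums[mid], nums[high] = nums[high], nums[mid]
            high -= 1


def sort_colors_k(nums: List[int], k: int) -> None:
    """Modified problem: values are 0..k-1 (k = 4 in the example above).

    Counting sort that rewrites the list in place.
    """
    counts = [0] * k
    for value in nums:
        counts[value] += 1
    i = 0
    for value, count in enumerate(counts):
        nums[i:i + count] = [value] * count
        i += count
```

The tweak is small: the three-way partition no longer applies directly once a fourth value appears, but a counting approach covers any number of values. That is consistent with the idea that a modest change to the input may not stop a tool that can actually solve the problem.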

For more information on question types and experimental design, you can read the Interviewer Experimentation Guide document: https://docs.google.com/document/u/0/d/1UdWZHUQfeLR8oUiNY4JfwgES42HTlAQL5z_VfQJPPKk/edit

Interviewees

Interviewees were drawn from our pool of active users, whom we invited to take a short survey. Our selection criteria were as follows:

Actively looking for a job in the current market;
More than 4 years of work experience and applying for senior positions;
Moderate or high familiarity with using ChatGPT for coding;
Confident that they could cheat during an interview without getting caught.

This selection method helped us identify candidates who would plausibly cheat in an interview: they had the motivation to do so and were already quite familiar with both ChatGPT and coding interviews.

We told interviewees that they had to use ChatGPT during the interview, and that the goal was to test their ability to cheat with it. We also told them not to try to pass the interview on their own skills but to rely mainly on ChatGPT.

We conducted a total of 37 interviews, 32 of which were valid (we had to drop 5 because participants did not perform as required):

11 interviews used "original LeetCode questions"
9 interviews used "modified LeetCode questions"
12 interviews used "custom questions"

Note: Because our platform allows anonymity, our interviews only have audio and no video. Anonymity is about creating a safe space for users to fail quickly and learn without anyone judging them. For users, this is a good thing. But we acknowledge that the absence of video interviews makes our experiment less realistic. In a real interview, you're facing a camera, which makes cheating more difficult—but doesn't eliminate it.

After the interview, both the interviewer and the interviewee completed an exit survey. We asked interviewees about the difficulties they encountered using ChatGPT during the interview. We asked interviewers about their concerns about the interview: we wanted to see how many would flag their interviews as problematic and report that they suspected cheating.

[Figure: follow-up survey questions for interviewers]

We didn’t know what would happen in the experiment, but if half of the candidates who cheated made it through the interview, that would be a telling result for our industry.

Experimental results

After excluding interviews in which participants did not follow the instructions, we obtained the following results. Our control group was candidates' performance in interviewing.io mock interviews outside this experiment, where 53% of candidates passed. It’s important to note that most of the mock interviews on our platform use LeetCode-style questions, which makes sense since these are the questions FAANG companies primarily ask. We'll come back to this in a moment.

The pass rate for "original" questions was much higher than both the platform average and the pass rate for "custom" questions. The difference between the "original" and "modified" questions was not statistically significant. The pass rate for the "custom" questions was significantly lower than that of any other group.

Original questions: the best performance

As expected, the group using the original questions performed best, with 73% passing the interview. Interviewees reported that they got a perfect solution from ChatGPT.

Here is the most noteworthy comment we got in the post-interview survey for this group, one that we think is particularly revealing about what many interviewers are thinking:

It's hard to tell whether a candidate is able to answer this question easily because they're really good or because they've heard this question before. Typically, I'll make one or two changes to the question in order to differentiate between the two situations.

Often, the interviewer will follow up with a modified question to get more information. So let’s take a look at the “modified question” group and see if the interviewers actually gained more information by making a change or two to the question.

Modified questions: more prompting needed

Recall that interviewers in this group started from a standard LeetCode problem but modified it so that the result was not directly available online. In other words, ChatGPT was unlikely to have seen an answer to the exact question. Interviewees therefore had to rely more on ChatGPT's ability to actually solve problems than on its ability to recite LeetCode tutorials.

As expected, the results for this group were not that different from the “original question” group, with 67% of candidates passing the interview.

The difference from the "original question" group turned out not to be statistically significant; that is, the "modified" and "original" questions were essentially equivalent. This result shows that ChatGPT can handle an interviewer's tweaks to a question without much trouble.

However, interviewees did note that ChatGPT needed more prompting to solve the modified questions. One interviewee said this:

There's no problem answering questions directly from LeetCode. Asking ChatGPT to answer a less direct LeetCode-style follow-up question would be much more difficult.

Custom questions have the lowest pass rate

As expected, the "custom" question group had the lowest pass rate, with only 25% of interviewees passing. Not only was this statistically significantly lower than the other two experimental groups, it was also significantly lower than the control group! When you ask candidates fully custom questions, they perform worse than they would if they weren't cheating (and were being asked LeetCode-style questions)!

Note that this number was slightly higher when we first calculated it. After examining the custom questions in detail, we discovered an unexpected problem, which is explained in the section "Companies should immediately change the questions they ask".
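For readers who want to sanity-check the significance claims, here is a rough sketch. The article does not say which statistical test was used, so Fisher's exact test is our assumption, and the pass counts below are inferred from the reported percentages and group sizes (8 of 11 is about 73%, 6 of 9 is about 67%, 3 of 12 is 25%) rather than taken from published raw data.

```python
from math import comb


def fisher_exact_two_sided(pass_a: int, fail_a: int, pass_b: int, fail_b: int) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 pass/fail table."""
    n_a = pass_a + fail_a
    n_b = pass_b + fail_b
    total_pass = pass_a + pass_b
    denom = comb(n_a + n_b, total_pass)

    def prob(k: int) -> float:
        # Hypergeometric probability that group A has exactly k passes
        return comb(n_a, k) * comb(n_b, total_pass - k) / denom

    p_observed = prob(pass_a)
    k_min = max(0, total_pass - n_b)
    k_max = min(n_a, total_pass)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small tolerance guards against floating-point ties)
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# (passes, fails) per group: inferred counts, NOT published data
ORIGINAL = (8, 3)   # about 73% of 11 interviews
MODIFIED = (6, 3)   # about 67% of 9 interviews
CUSTOM = (3, 9)     # 25% of 12 interviews

p_original_vs_modified = fisher_exact_two_sided(*ORIGINAL, *MODIFIED)
p_original_vs_custom = fisher_exact_two_sided(*ORIGINAL, *CUSTOM)
```

Under these assumptions, the original-versus-modified comparison comes out far above the usual 0.05 threshold and the original-versus-custom comparison falls below it, which agrees with the direction of the results reported here. With groups this small, the exact values should be read with caution.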

No one was caught cheating

In our experiment, the interviewers were unaware that the interviewees had been asked to cheat. As mentioned above, after each interview we had interviewers complete a survey in which they described how confident they were in their assessment of the candidate.

Interviewers were confident in the accuracy of their assessments, with 72% saying they felt confident in their hiring decisions. One interviewer was so impressed with the interviewee's performance that he concluded the candidate should be invited to become an interviewer on the platform!

The candidate performed extremely well and showed the knowledge of a strong Amazon L6 (Google L5) SWE...should be considered as an interviewer/mentor for interviewing.io.

It may be overconfident to make such a judgment after just one interview!

We've long known that engineers are bad at assessing their own performance, so perhaps we shouldn't be surprised to find that interviewers also overestimate the effectiveness of their own questions.

Some interviewers (28%) were not confident in their hiring choices, and we asked them why. Below is the frequency distribution of causes.

Please note: there is no mention of cheating anywhere!

Most interviewers gave specific reasons for their lack of confidence in their hiring decisions. The problems typically included suboptimal solutions, missed edge cases, messy code, or poor communication. We intentionally included an "Other" category to see whether they would raise concerns about interviewee cheating, but when we dug deeper, we found only minor issues like "personality issues" and "they need to code faster."

In addition to this opportunity to highlight cheating, we also had three additional prompts for interviewers to point out other concerns, including free-form text boxes and several multiple-choice questions with options that explained their concerns.

When an interviewee failed an interview because they didn't understand the answers ChatGPT provided, the interviewer attributed the interviewee's odd behavior and stilted answers to a lack of practice, not to cheating. One interviewer thought the candidate had good problem-solving skills, but commented that they were slow and needed to think more carefully about edge cases.

"Candidates appear unprepared to answer any LeetCode questions."
“There’s a lack of clarity in the candidate’s approach, and they’re eager to start coding.”
"This candidate was not prepared to solve even the most basic programming problems on LeetCode."
"Overall, problem-solving skills are good, but the candidate will need to get up to speed on coding and identifying critical edge cases."

So, who raised concerns about cheating? Who got caught?

The fact is, not one interviewer mentioned concerns about candidates cheating.

We were surprised to find that the interviewers did not suspect any cheating. Interestingly, the interviewees were also very confident that they had gotten away with it: 81% said they were not worried about having been caught, 13% thought the interviewer might have caught on, and, surprisingly, only 6% thought the interviewer suspected them of cheating.

Most interviewees were confident that they had cheated without being caught.

In the interviews where interviewees worried about having been discovered, the interviewers did note unusual behavior in their post-interview feedback, but none of them suspected cheating. All in all, most interviewees thought they wouldn't be caught cheating, and they were right!

Companies should immediately change the questions they ask

The obvious conclusion to draw from these results is that companies need to start asking custom questions immediately, otherwise they run a serious risk of candidates cheating in interviews (and ultimately failing to get useful signals from the interviews)!

ChatGPT has made verbatim LeetCode questions obsolete; companies that rely on them are leaving their hiring process to chance. Recruiting is tricky enough without having to worry about cheating. If your company asks LeetCode questions verbatim, please share this article internally!

Using custom questions is not only a great way to prevent cheating, but it also filters out candidates who have memorized a bunch of LeetCode solutions (as you can see, the pass rate for the custom question group was significantly lower than the control group). It also effectively improves the candidate experience and makes people more willing to work for you. Not long ago, we did an analysis on what makes a good interviewer. Not surprisingly, asking good questions is one of their hallmarks, and our highest-rated interviewers tend to be the ones who are more comfortable asking custom questions! In our research, question quality was very important in determining whether a candidate wanted to move forward with the company. This is much more important than the company's brand strength. Brand strength is an important factor when attracting candidates to a company, but it is less important than the quality of questions during the interview process.

Here are some quotes from job seekers:

“It would be nice if it wasn’t just a simple algorithm problem.”
"I liked this question - it took a relatively simple algorithmic problem (building and traversing a tree) and added some depth. I also liked that the interviewer connected the question to [Redacted]'s actual product, which made it sound less like a toy problem and more like a stripped-down version of a real problem.”
"This is my favorite question I've come across on this site. It's one of the only ones that seems to apply to real life, and it comes from a real (or potential) business challenge. It's also a good blend of complexity, efficiency and blocking challenges."

There’s also a more subtle point for companies that decide to use more personalized questions. You might be tempted to take an original question from LeetCode and make a few modifications. That is understandable, since it is much easier than writing a question from scratch. Unfortunately, it doesn't work.

As mentioned earlier, we found in our experiment that just because a question looks custom does not mean that it is custom. A question can appear to be custom and still be identical to an existing LeetCode question. When writing questions for candidates, it's not enough to disguise an existing question. You need to make sure that both the input and the output of the question are unique to effectively prevent ChatGPT from recognizing it!

The questions asked by the interviewers are confidential, and we cannot share the specific questions the interviewers used in the experiment. However, we can give you an example. Here's a "custom question" with this serious flaw that ChatGPT can easily answer:

For her birthday, Mia received a mysterious box containing numbered cards and a note saying, "Combine two cards that add up to 18 to unlock your gift!" Help Mia find the right pair of cards to reveal her surprise.
Input: An array of integers (the numbers on the cards), and the target sum (18). arr = [1, 3, 5, 10, 8], target = 18
Output: The indices of the two cards that add up to the target sum. In this case, [3, 4] because index 3 and 4 add to 18 (10+8).

Did you spot the problem? Although this question may seem "custom" at first glance, its goal is the same as that of the popular TwoSum problem: find two numbers whose sum equals a given target value. The inputs and outputs are identical; the only "customization" is the story wrapped around the problem. So it's not surprising that ChatGPT handles it well: it performs well on any problem whose inputs and outputs match a known problem, even when a unique story is added.
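To see how thin the disguise is, here is a minimal sketch of the standard hash-map approach to TwoSum, applied unchanged to Mia's cards. The function name is our own, and the hash-map technique is simply the common textbook solution, not something taken from the experiment.

```python
from typing import Dict, List, Optional


def two_sum(nums: List[int], target: int) -> Optional[List[int]]:
    """Return the indices of two numbers that add up to target, or None."""
    seen: Dict[int, int] = {}  # value -> index where it was first seen
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return [seen[complement], i]
        seen[value] = i
    return None


# Mia's "custom" birthday puzzle is the same call with the same inputs
cards = [1, 3, 5, 10, 8]
print(two_sum(cards, 18))  # prints [3, 4]
```

Nothing in the solution depends on the birthday story: strip the narrative and the input (an integer array plus a target) and the output (two indices) are exactly those of the well-known problem.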

How to create good custom questions

One thing we've found very useful for coming up with good original questions is to create a shared document for the team. Whenever someone solves a problem they think is interesting, no matter how small, they jot it down quickly. These notes don't need any elaboration or polish, but they can be the seeds for unique interview questions that give candidates insight into the day-to-day work at your company. Turning these rough seeds into interview questions takes thought and effort: you have to cut out a lot of detail and distill the question to its essence so that the candidate doesn't have to spend a lot of time understanding it. You may have to run these questions a few times before you get them right, but the rewards can be huge.

To be clear, we are not advocating removing data structures and algorithms from technical interviews. DS&A questions get a bad rap because of bad, disengaged interviewers and because companies get lazy and reuse LeetCode questions, many of which are terrible and have nothing to do with their jobs. In the hands of a good interviewer, these questions can be powerful. If you use the approach above, you can ask new data structure and algorithm questions, questions that have a practical basis and will attract candidates and get them excited about the work you do.

In doing so, you'll also be driving our industry forward. It is not good that memorizing a bunch of LeetCode questions gives a candidate an advantage in an interview, and it is not good that cheating is starting to look like a rational choice. The solution is for employers to do more work and ask better questions. Let's take action together.

A candid message to job seekers

Okay, now, to all of you who are actively looking for a job, listen up! Yes, a subset of your peers will now use ChatGPT to cheat in interviews, and at companies that use LeetCode questions (sadly, there are many), these peers will have an advantage for a short while.

Right now, we are in a transitional state where company processes have not yet caught up with reality. Soon they will either abandon verbatim LeetCode questions altogether (which would be a boon to our entire industry), or go back to in-person interviews (which would make it largely impossible for cheaters to pass technical interviews), or both.

In an already difficult environment, it is terrible to have to worry about other job seekers cheating, but we cannot in good conscience recommend cheating to "level the playing field."

In addition, interviewees who used ChatGPT unanimously stated that using AI during the interview process made the entire interview process much more difficult.

As you can see in the video below, one interviewee answered the interview question perfectly but stumbled when analyzing the time complexity. He became flustered as he hurriedly tried to explain how he had arrived at the wrong time complexity (the one ChatGPT provided).

No one was caught cheating during the experiment, though their cameras were turned off. But as the video shows, cheating is still difficult even for skilled job seekers.

Ethics aside, cheating is difficult, stressful, and not simple to pull off. Instead, we recommend putting that effort into practice, so you can reap the benefits once companies change their interview processes (which hopefully will happen soon). Ultimately, we hope that the emergence of ChatGPT will serve as a catalyst that shifts the industry's interview standards from grinding and memorization to a true assessment of engineering ability.

Original link:

https://interviewing.io/blog/how-hard-is-it-to-cheat-with-chatgpt-in-technical-interviews

This article comes from the WeChat public account "InfoQ" (ID: InfoQ), author: Michael Mroczka. Published by 36Kr with authorization.
