Just how good is AI-assisted code generation?

Why is AI-assisted coding so powerful?

One of the more heralded aspects of AI-assisted coding is that users don't have to be versed in software development. Natural language processing allows even business users to simply write a prompt and get back the software needed for any number of projects.

For example, users can write a comment in natural language that outlines a specific task in English, such as, "Upload a file with server-side encryption." Based on that information, CodeWhisperer recommends one or more code snippets directly in the development platform to accomplish the task, according to an Amazon spokesperson.

Many of the coding tools also come with enhanced code security capabilities, including scans and code remediation suggestions. Some even come with bias filtering and reference trackers, which detect whether a code suggestion might be similar to open-source training data. The latter are important features in an AI-based coding assistant.

Amazon and other providers are also experimenting with tools to assist non-developers in producing apps for business purposes. For example, an Amazon spokesperson said the company sees the engagement of non-developers as a priority for making AI accessible. After it went viral internally, the company publicly released PartyRock, an "edutainment" generative AI application builder that allows non-developers to work with genAI and LLMs in a sandbox environment.

"You can experiment with building different applications," Seven said in an interview with Computerworld. "We'll see an increase in different tools for different personas that will use generative AI. I think we're just scratching the surface on where we'll see genAI in different places. We'll start to see more and more of these tools."

Accuracy rates vary

Seven said code acceptance rates for CodeWhisperer are around 30% to 40%, but that doesn't mean the code it wrote was incorrect or error-ridden. The acceptance rate refers to whether the genAI tool correctly interpreted what the developer asked it to do.

Seven described something akin to a conversation between a developer and an AI code generator, where the developer asks it to produce something and then modifies the request with follow-up requests. The ability of CodeWhisperer to produce error-free, usable code is quite high, though Seven said Amazon doesn't reveal internal metrics.

Anecdotally, developers and IT leaders have placed the ability of popular AI-based code augmentation tools to correctly generate usable code at anywhere from 50% to 80%.

"We had this as a hypothesis. Now we're starting to see this in actual studies," said Derek Holt, CEO of AI-powered software delivery provider Digital.ai.

According to a study by Cornell University last year, there's a wide variance between various genAI coding tools. The study showed ChatGPT, GitHub Copilot and Amazon CodeWhisperer generate correct code 65.2%, 64.3% and 38.1% of the time, respectively.

While the study is a year old, the accuracy rates for the AI-assisted code tools are more or less the same today, according to Burak Yetiştiren, the paper's lead author and a graduate student researcher at UCLA's Henry Samueli School of Engineering and Applied Science.

A study by GitClear, a developer tool for GitHub and GitLab that provides code analysis and git stats, examined more than 153 million lines of code from 2020 to 2023. Highlighting key shifts in code churn, duplication, and age, it explored the impact of AI tools like GitHub Copilot on programming practices.

Among GitClear's findings was that developers write code 55% faster when using Copilot. When GitClear looked at GitHub's code quality and maintainability compared to what would have been written by a human, it found less experienced developers have a greater advantage with AI-assisted programming compared to veteran developers.

GitHub's own data suggests that junior developers use Copilot about 20% more than more experienced developers, the research found.

GitClear conducted a corresponding survey of 500 developers and asked, "What metrics should you be evaluated on when actively using AI?" The top three issues they named were code quality, time to complete task, and number of production incidents.

"When developers are inundated with quick and easy suggestions that will work in the short term, it becomes a constant temptation to add more lines of code without really checking whether an existing system could be refined for reuse," GitClear's paper said.

More code, but more errors?

Developers are producing 45% more code with the automation tools, according to Digital.ai's Holt, but that's not necessarily a good thing.

"The main challenge with AI-assisted programming, however, is that it becomes so easy to generate a lot of code which shouldn't have been written in the first place," Adam Tornhill, founder and CTO at CodeScene, said on X/Twitter.

Another wrinkle is that when code is not generated by humans, it is more opaque. As a result, quality challenges are emerging, including questions about whether code can effectively be tested for errors and security holes.

In a survey of software engineers last year (96% of whom used AI-based coding tools) by developer security platform Snyk, more than half said insecure AI code suggestions were common.

"That shouldn't surprise us," Holt said. "It's early days and we're training these models on all of the code in certain repositories. All you're going to do is repeat the mistakes that were made by the developers who wrote that original code."

Given that much of a developer's time is spent fixing existing code, not writing new features, the ability to read code and find issues when it's not written by humans becomes yet another issue, Holt said.

Even with those issues, developers wouldn't be adopting tools like Copilot if they didn't believe the tools accelerated their ability to produce code. GitHub's research on the former point found developers are 75% more fulfilled when using Copilot.

In a study of 450 Accenture developers using Copilot for six months, 88% of suggested code was retained, build success rate increased by 45%, and every developer surveyed reported Copilot was useful, according to Microsoft's Silver.

Churn, moved and copy/paste code issues

GitClear, however, also found that with the increased use of AI-assisted programming, the amount of Churn, Moved, and Copy/Pasted code increased significantly.

Churn is the percentage of code that is pushed to the repository, then subsequently reverted, removed or updated within two weeks. It was relatively rare when developers authored all their own code; only 3% to 4% of code was churned prior to 2023.
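The churn definition above can be sketched in a few lines of Python. This is only an illustration of the stated definition (code pushed, then reverted, removed or updated within two weeks), not GitClear's actual methodology; the `LineRecord` type, the `churn_rate` function and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Two-week window taken from the definition in the text.
CHURN_WINDOW = timedelta(days=14)


@dataclass
class LineRecord:
    """Hypothetical record for one pushed line of code."""
    pushed_at: datetime
    # When the line was later reverted, removed or updated (None if never).
    changed_at: Optional[datetime] = None


def churn_rate(lines: List[LineRecord]) -> float:
    """Return the percentage of lines changed within two weeks of being pushed."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.changed_at is not None
        and line.changed_at - line.pushed_at <= CHURN_WINDOW
    )
    return 100.0 * churned / len(lines)


# Made-up sample: four lines pushed on the same day.
pushed = datetime(2023, 3, 1)
sample = [
    LineRecord(pushed, pushed + timedelta(days=3)),   # changed after 3 days: churned
    LineRecord(pushed, pushed + timedelta(days=30)),  # changed after 30 days: not churned
    LineRecord(pushed),                               # never changed
    LineRecord(pushed),                               # never changed
]
print(churn_rate(sample))  # 25.0
```

A real measurement would need to derive these records from repository history, which is outside the scope of this sketch.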

But overall code churn jumped 9% the first year Copilot was available in beta, the same year that ChatGPT became available.

From 2022 through 2023, the rise of AI assistants was strongly correlated with "mistake code" being pushed to the repository. Copilot prevalence (its use in generating code) was 0% in 2021, 5% to 10% in 2022, and 30% in 2023, GitClear found.

"If the current pattern continues into 2024, more than 7% of all code changes will be reverted within two weeks, double the rate of 2021," GitClear's report said.

There is perhaps no greater scourge to long-term code maintainability than copy/pasted code. That's because code that's simply reused can also contain previous mistakes, security holes or other issues.

"I have no doubt we'll be able to figure out the problems, and we'll be able to train models on small amounts of code created only by our best developers," Holt said. "But right now you're getting a junior developer, and if you're not paying attention to what that means to the broader software development lifecycle, you're going to be running some risks."

Amazon's Seven argued that one of the strengths of CodeWhisperer and other products is their ability to examine existing code for errors and then suggest changes. "So, it'll actually give you the code to make that change," Seven said. "The advantage of using Amazon Q [CodeWhisperer] in this context is as a developer, you have a debugging companion."

That could be particularly useful in checking for discrepancies in existing code that may not be familiar to developers. "And Q is really good at that," he said.

Another advantage of automated tools is that they can be used in a set-and-forget mode, where a developer or engineer simply explains a task and then the tools complete it independently, whether developing a new application or debugging an existing one. "In either case, the accuracy of the code, and the quality of the code, is really quite high," Seven said.

What's not in question is that over time, software generation tools will continue to improve, though there will always be the need for a human in the loop.

"My gut tells me there will always be roles for developers, whether that's reviewing or catalogizing or a mixture of both," Holt said. "We're not even talking about the fact that delivering code is not the goal. Delivering great features that customers love is the actual goal.

"So, from my view, I still have a long career ahead of me in software development."
