Just how good is AI-assisted code generation?
Generative AI-assisted coding allows developers to write code faster and, often, more accurately, using digital tools that create code from natural language prompts or partial code inputs. (Like some email platforms, the tools can also suggest code for auto-completion as it's written in real time.)
AI-assisted code generation tools are increasingly prevalent in software engineering and, somewhat unexpectedly, have become low-hanging fruit for most organizations experimenting with generative AI (genAI). Adoption rates are skyrocketing. That's because even if they only suggest a baseline of code for a new application, automation tools can eliminate hours that otherwise would have been devoted to manual code creation and updating.
Evans Data Corp., a market research firm that specializes in software development, conducted a multinational survey of 434 AI and machine learning developers. When asked what they most likely would create using genAI tools, the top answer was software code, followed by algorithms and large language models (LLMs). They also said they expect genAI to shorten the development lifecycle and make it easier to add machine-learning features.
By 2027, 70% of professional developers will be using AI-powered coding tools, up from less than 10% in September 2023, according to Gartner Research. And within three years, 80% of enterprises will have integrated AI-augmented testing tools into their software engineering toolchain, a significant increase from approximately 15% early last year, Gartner said.
One of the top tools used for genAI-automated software development is GitHub Copilot. GitHub Copilot is powered by generative AI models developed by GitHub, OpenAI, and Microsoft, and is trained on all natural languages that appear in public repositories.
Since GitHub Copilot for business was launched last year, more than 50,000 organizations have signed up to use it, including digital natives such as Etsy and HelloFresh, as well as leading enterprises including Autodesk, Dell Technologies, and Goldman Sachs, according to Amanda Silver, corporate vice president of Microsoft's Developer Division. (Microsoft acquired GitHub in 2018.)
GitHub Copilot now has more than 1.3 million paid subscribers, according to Silver. With 50,000 licenses, Accenture is now GitHub's largest Copilot customer to date, Silver said.
Along with GitHub's Copilot, some of the most popular code-generation tools include Google Bard, Amazon CodeWhisperer, Microsoft 365 Copilot (powered by GPT), Replit, Divi AI, Tabnine, Refact.ai, and Codeium. Most are free or come as part of a larger AI-enabled subscription service.
AI-powered software augmentation tools can have an enormous impact on developer efficiency and productivity. Amazon Web Services (AWS), for example, ran a productivity challenge and found developers who used its CodeWhisperer code development tool were 27% more likely to complete tasks successfully and did so an average of 57% faster than those who didn't use the tool.
(Amazon Q is a genAI-based chatbot developed by Amazon for enterprise use, and it underpins its CodeWhisperer tool. Amazon Q is powered by Amazon Bedrock, which offers access to a selection of models, including from the Amazon Titan family.)
According to an AWS-Persistent study, developers using Amazon CodeWhisperer's customization capability completed their tasks an additional 28% faster than without customizations.
For example, a team of five Amazon developers used Amazon Q Code Transformation to upgrade 1,000 production applications from Java 8 to Java 17 in just two days. The average time per application was less than 10 minutes, compared to the two days it used to take to upgrade one app, according to an Amazon spokesperson.
Since becoming generally available in April 2023, Amazon CodeWhisperer has garnered more than 100,000 customers. For example, software development and outsourcing services company HCLTech is rolling out Amazon CodeWhisperer to more than 50,000 HCLTech engineers, cloud practitioners and developers to build secure applications for use both internally and for clients.
Over the next two years, Accenture plans to enroll 50,000 development engineers in AWS AI services, including Amazon Q and Amazon CodeWhisperer.
Because genAI software development tools are based on LLMs, they're trained on millions or billions of lines of code, with the most popular platforms capable of working with any number of coding languages, from C to Python.
Amazon's CodeWhisperer is available as part of the AWS Toolkit for Visual Studio (VS) Code and JetBrains. It currently supports Python, Java, JavaScript, TypeScript, C#, Go, Rust, PHP, Ruby, Kotlin, C, C++, Shell scripting, SQL, Scala, JSON, YAML, and HCL.
"In our early experimentation, we were doing a lot of work in Python, JavaScript and languages like that," GitHub COO Kyle Daigle said in an earlier interview with Computerworld. "GitHub is mainly a Ruby company, but we also write in Go, and C, and FirGit. And so we were expanding our use cases of Copilot and using it in different languages. But overall, Copilot is able to work on the vast majority of languages that are in the public sphere."
Relying on nothing more than user prompts based on natural language processing, genAI-assisted code generators can offer software code suggestions ranging from snippets to full functions. And updates can make the tools even better.
Amazon, for instance, said updates to its CodeWhisperer tool increased code acceptance rates from around 20% on average to 35% across all languages and use cases.
"Now, with Amazon Q included with CodeWhisperer, developers can ask about their code, and leverage Amazon Q's capabilities to find bugs, optimize, and translate code they are working on," Doug Seven, general manager of Amazon CodeWhisperer and director of software development for Amazon Q, said in a blog.
Why is AI-assisted coding so powerful?
One of the more heralded aspects of AI-assisted coding is that users don't have to be versed in software development. Natural language processing allows even business users to simply write a prompt and get back the software needed for any number of projects.
For example, users can write a comment in natural language that outlines a specific task in English, such as, "Upload a file with server-side encryption." Based on that information, CodeWhisperer recommends one or more code snippets directly in the development platform to accomplish the task, according to an Amazon spokesperson.
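To make that concrete, the snippet below is a hand-written illustration of the kind of suggestion such a comment could lead to. It is not actual CodeWhisperer output. It assumes Python, the boto3 S3 client's `upload_file` method and S3-managed encryption keys (`AES256`); the function name and its parameters are hypothetical.

```python
# Upload a file with server-side encryption
def upload_file_encrypted(s3_client, local_path, bucket, key):
    """Upload local_path to s3://bucket/key and ask S3 to encrypt it at rest.

    s3_client is expected to be a boto3 S3 client (an assumption of this
    sketch). It is passed in so the function can be exercised with a stub.
    """
    s3_client.upload_file(
        Filename=local_path,
        Bucket=bucket,
        Key=key,
        # SSE-S3 (S3-managed keys); "aws:kms" would select KMS-managed keys.
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )
    return f"s3://{bucket}/{key}"
```

With AWS credentials configured, the client would typically come from `boto3.client("s3")`. Whatever a tool suggests, a developer still has to confirm that the encryption mode matches the organization's policy.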
Many of the coding tools also come with enhanced code security capabilities, such as scans and code remediation suggestions. Some even come with bias filtering and reference trackers, which detect whether a code suggestion might be similar to open-source training data. The latter are important features in an AI-based coding assistant.
Amazon and other providers are also experimenting with tools to assist non-developers in producing apps for business purposes. For example, an Amazon spokesperson said the company sees the engagement of non-developers as a priority for making AI accessible. The company publicly released PartyRock, an "edutainment" generative AI application builder that allows non-developers to work with genAI and LLMs in a sandbox environment, after it went viral internally.
"You can experiment with building different applications," Seven said in an interview with Computerworld. "We'll see an increase in different tools for different personas that will use generative AI. I think we're just scratching the surface on where we'll see genAI in different places. We'll start to see more and more of these tools."
Accuracy rates vary
Seven said code acceptance rates for CodeWhisperer are around 30% to 40%, but that doesn't mean the code it wrote was incorrect or error ridden. The acceptance rate refers to whether the genAI tool correctly interpreted what the developer asked it to do.
Seven described something akin to a conversation between a developer and an AI-code generator, where the developer asks it to produce something and then modifies the request with follow-up requests. The ability of CodeWhisperer to produce error-free, usable code is quite high, though Seven said Amazon doesn't reveal internal metrics.
Anecdotally, developers and IT leaders have placed the ability of popular AI-based code augmentation tools to correctly generate usable code at anywhere from 50% to 80%.
"We had this as a hypothesis. Now we're starting to see this in actual studies," said Derek Holt, CEO of AI-powered software delivery provider Digital.ai.
According to a study by Cornell University last year, there's a wide variance between various genAI coding tools. The study showed ChatGPT, GitHub Copilot and Amazon CodeWhisperer generate correct code 65.2%, 64.3% and 38.1% of the time, respectively.
While the study is a year old, the accuracy rates for the AI-assisted code tools are more or less the same today, according to Burak Yetiştiren, the paper's lead author and a graduate student researcher at UCLA's Henry Samueli School of Engineering and Applied Science.
A study by GitClear, a developer tool for GitHub and GitLab that provides code analysis and git stats, examined more than 153 million lines of code from 2020 to 2023. Highlighting key shifts in code churn, duplication, and age, it explored the impact of AI tools like GitHub Copilot on programming practices.
Among GitClear's findings was that developers write code 55% faster when using Copilot. When GitClear looked at GitHub's code quality and maintainability compared to what would have been written by a human, it found less experienced developers have a greater advantage with AI-assisted programming compared to veteran developers.
GitHub's own data suggests that junior developers use Copilot about 20% more than more experienced developers, the research found.
GitClear conducted a corresponding survey of 500 developers and asked, "What metrics should you be evaluated on, when actively using AI?" The top three issues they named were code quality, time to complete task, and number of production incidents.
"When developers are inundated with quick and easy suggestions that will work in the short term, it becomes a constant temptation to add more lines of code without really checking whether an existing system could be refined for reuse," GitClear's paper said.
More code, but more errors?
Developers are producing 45% more code with the automation tools, according to Digital.ai's Holt, but that's not necessarily a good thing.
"The main challenge with AI-assisted programming, however, is that it becomes so easy to generate a lot of code which shouldn't have been written in the first place," Adam Tornhill, founder & CTO at CodeScene, said on X/Twitter.
Another wrinkle is that when code is not generated by humans, it is more opaque. As a result, quality challenges are emerging, including questions about whether code can effectively be tested for errors and security holes.
In a survey of software engineers last year (96% of whom used AI-based coding tools) by developer security platform Snyk, more than half said insecure AI code suggestions were common.
"That shouldn't surprise us," Holt said. "It's early days and we're training these models on all of the code in certain repositories. All you're going to do is repeat the mistakes that were made by the developers who wrote that original code."
Given that much of a developer's time is spent fixing existing code, not writing new features, the ability to read code and find issues when it's not written by humans becomes yet another issue, Holt said.
Even with those issues, developers wouldn't be adopting tools like Copilot if they didn't believe the tools accelerated their ability to produce code. GitHub's research on the former point found developers are 75% more fulfilled when using Copilot.
In a study of 450 Accenture developers using Copilot for six months, 88% of suggested code was retained, build success rate increased by 45%, and every developer surveyed reported Copilot was useful, according to Microsoft's Silver.
Churn, moved and copy/paste code issues
GitClear, however, also found that with the increased use of AI-assisted programming, the amount of "Churn," "Moved," and "Copy/Pasted" code increased significantly.
Churn is the percentage of code that is pushed to the repository, then subsequently reverted, removed or updated within two weeks. It was relatively rare when developers authored all their own code; only 3% to 4% of code was churned prior to 2023.
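That definition can be sketched in a few lines of Python. This is a simplified illustration, not GitClear's methodology, which the report summary here does not spell out. The per-line data layout, the `churn_rate` function name and the sample dates are all hypothetical, and "within two weeks" is read as 14 days or fewer.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=14)  # "within two weeks" (assumed inclusive)

def churn_rate(lines):
    """Return the percentage of pushed lines that were churned.

    lines is a list of (pushed_on, changed_on) pairs, where changed_on is the
    date the line was reverted, removed or updated, or None if it never was.
    """
    if not lines:
        return 0.0
    churned = sum(
        1
        for pushed_on, changed_on in lines
        if changed_on is not None and changed_on - pushed_on <= CHURN_WINDOW
    )
    return 100.0 * churned / len(lines)

# Hypothetical sample: four pushed lines, one of them changed within two weeks.
sample = [
    (date(2023, 3, 1), date(2023, 3, 5)),   # reverted after 4 days: churn
    (date(2023, 3, 1), date(2023, 4, 20)),  # updated after 50 days: not churn
    (date(2023, 3, 2), None),               # never touched again
    (date(2023, 3, 3), None),               # never touched again
]
print(churn_rate(sample))  # 25.0
```

On real repositories the pairs would have to be derived from commit history, which is considerably harder than the ratio itself.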
But overall code churn jumped 9% the first year Copilot was available in beta, the same year that ChatGPT became available.
From 2022 through 2023, the rise of AI assistants was strongly correlated with "mistake code" being pushed to the repository. Copilot prevalence (its use in generating code) was 0% in 2021, 5% to 10% in 2022, and 30% in 2023, GitClear found.
"If the current pattern continues into 2024, more than 7% of all code changes will be reverted within two weeks, double the rate of 2021," GitClear's report said.
There is perhaps no greater scourge to long-term code maintainability than copy/pasted code. That's because code that's simply reused can also contain previous mistakes, security holes or other issues.
"I have no doubt we'll be able to figure out the problems, and we'll be able to train models on small amounts of code created only by our best developers," Holt said. "But right now you're getting a junior developer, and if you're not paying attention to what that means to the broader software development lifecycle, you're going to be running some risks."
Amazon's Seven argued that one of the strengths of CodeWhisperer and other products is their ability to examine existing code for errors and then suggest changes. "So, it'll actually give you the code to make that change," Seven said. "The advantage of using Amazon Q [CodeWhisperer] in this context is as a developer, you have a debugging companion."
That could be particularly useful in checking for discrepancies in existing code that may not be familiar to developers. "And Q is really good at that," he said.
Another advantage of automated tools is that they can be used in a set-and-forget mode, where a developer or engineer simply explains a task and then the tools complete it independently, whether developing a new application or debugging an existing one. "In either case, the accuracy of the code, and the quality of the code, is really quite high," Seven said.
What's not in question is that over time, software generation tools will continue to improve, though there will always be the need for a human in the loop.
"My gut tells me there will always be roles for developers, whether that's reviewing or catalogizing or a mixture of both," Holt said. "We're not even talking about the fact that delivering code is not the goal. Delivering great features that customers love is the actual goal.
"So, from my view, I still have a long career ahead of me in software development."