How LLMs Are Reshaping the Developer's Toolkit
Let's examine the state of Large Language Models (LLMs) for coding. Is it hype that's bound to fizzle out, or will we all need to search for new jobs in the coming years?
This is a follow-up to my previous post, which delved specifically into three tools: GitHub Copilot, Sourcegraph Cody, and AWS Q. Now, we'll take a broader look at the landscape.
In this post, we'll cover the following topics:
The inception of the hype surrounding LLMs and coding
A reality check: Where are we now? What do developers say, and what do various studies reveal?
Future outlook
What you should do as a manager or as a developer
The hype train
I can still remember the posts on LinkedIn, X, and other media when LLMs first emerged, showcasing their ability to code simple programs. Everybody was blown away, and when GPT-4 was released, the hype train went into overdrive, with bold claims that AI would soon render software engineers obsolete. If you have forgotten that, just take a look at this post from a little over a year ago: AI is eating software.
In hindsight it is very easy to disregard those claims, but in the early days it was hard to gauge how much traction there really was, and unclear in which direction everything would go. Now we know much more about the hardware costs that are eating into almost every LLM startup, and about the hallucination and training problems that are also very relevant when coding with LLMs. But at that time, everybody thought that after GPT-4, GPT-5 would be released, then 6 and 7, until we reached AGI (Artificial General Intelligence).
For companies like OpenAI, Anthropic, Meta, and others, this was a perfect storm in which they were able to fund tons of projects and attract almost unlimited media attention. They still claim that AGI is within reach in the coming years, but looking realistically at LLMs, they will likely not be the only technology needed to achieve AGI.
Over time, we saw agents that solved more complex tasks, and people were amazed as LLMs were able to create pet projects with a frontend, backend, and database. Not to diminish this achievement: it is undeniably one of the most groundbreaking technologies of recent years, and it is creating a lot of opportunities. The problem was that the potential was overhyped during this time. We're now coming back to reality, recognizing that while LLMs can help, they won't put all software engineers around the world out of work.
Reality check
So where are we now with LLMs for coding? LLMs have become a powerful tool that, when used properly, can boost the performance of individuals, teams, and even entire companies. However, it's crucial to remember that they're just another tool in the developer's toolkit. To harness their full potential, you need to know where they excel and where they fall short.
Let's dive into some diverse data points from various studies to get a clearer picture of the current state of LLMs in coding.
One of the first empirical studies about LLMs and coding that I found was the study at ANZ Bank with GitHub Copilot. [2402.05636] The Impact of AI Tool on Engineering at ANZ Bank An Empirical Study on GitHub Copilot within Corporate Environment (arxiv.org). What makes this study so interesting? It aimed to create a level playing field by comparing non-AI users with AI users. About 100 engineers participated in this exercise, tackling six algorithmic coding challenges each week. One group was allowed to use Copilot, while the other wasn't. Python was chosen as the language for solving problems, and extensive preparation was done beforehand—for example, ensuring Python proficiency—to prevent bias in problem-solving speed. Groups were rotated weekly. In this controlled environment, the group using GitHub Copilot completed tasks 42.36% faster. While this is fascinating, there are a couple of caveats:
As engineers, we know we don't solve algorithmic challenges daily, so the exercise doesn't fully reflect a typical engineer's work life (though we do encounter such challenges occasionally).
We already knew that LLMs excel at solving algorithmic questions (also known as interview questions).
A more realistic approach comes from the BlueOptima paper The Impact of Generative AI on Software Developer Performance - BlueOptima. They created three groups: High AI-Contributing Developers, Low-AI Contributing Developers, and No AI-Contributing Developers. Their study design addressed two key questions in developers' daily work: productivity increase (albeit in single digits) and quality improvement. However, the study's main challenge lies in its methodology:
How do you measure productivity and quality?
While BlueOptima's approach is valid within the context of their tools, it raises questions about whether their findings truly reflect the impact of LLMs on daily work.
As you can see, determining whether AI boosts productivity is challenging, largely because measuring developer productivity is inherently difficult. Not to go deep into this debate here, but I plan to write a blog post exploring this topic in more detail another day.
So, what other data can we examine?
Survey data
So far, these have been quantitative results on LLMs and coding. Let's have a look at some qualitative data; there are two excellent articles from Gergely Orosz on the matter.
The survey data from approximately 211 tech professionals reveals a generally positive outlook on LLMs for coding. Interestingly, when asked about changes after six months of using these tools, respondents reported an even more favorable view of LLMs in coding. Overall, the sentiment among those using LLMs for coding is positive.
However, there's a subset with significant negative feedback, citing unreliability, frequent hallucinations, and a steep learning curve. Some senior engineers express concern that junior developers might blindly follow LLM outputs without grasping the underlying principles.
This reminds me of the early days of Google and Stack Overflow, when senior developers urged juniors to read books and understand fundamentals rather than simply copy-pasting solutions. Of course, the situations differ—an AI will always provide an answer, whereas Stack Overflow might leave you digging deeper if no solution exists.
There is more interesting qualitative data in the following studies:
The study from Bowen Xu et al., AI Tool Use and Adoption in Software Development by Individuals and Organizations: A Grounded Theory Study (arxiv.org), explains why LLM usage varies so widely within the development community. I won't go into too much detail, but the authors identify factors influencing the adoption and use of AI tools, differentiating between individual and organizational motives and challenges. I can relate to this study, as I have seen these factors play out and heavily influence whether people adopt LLMs in their daily work.
The last article is from Google: What Do Developers Want From AI? (computer.org). The survey clearly indicates that developers want help from AI, and that many tasks, especially the tedious ones, should ideally be handled by an AI.
But are LLMs in particular even good enough for certain use cases?
Good use cases for LLMs
Here we get into more of my personal opinion, but it was interesting to find most of my own use cases reflected in the articles I have read and linked above.
Learning a new framework/language
Prototyping
This is probably the biggest benefit for me, as I can do things much quicker, especially if it's just for personal use or to test something out. Previously, many of my ideas remained just that—ideas. Now, they've become prototypes and projects where I learn a lot.
Explaining code
In a codebase, if you can provide the LLM with context, it performs better than a search, especially if you're not familiar with the codebase.
Writing tests
I draft tests and then fine-tune them. Testing is often tedious work, but with AI, it becomes more enjoyable.
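To make this workflow concrete, here is a hypothetical sketch: `normalize_email` and its tests are invented for illustration. The first test is the kind an LLM typically drafts from a one-line prompt; the second is the edge case I would add myself after reviewing the draft.

```python
def normalize_email(address: str) -> str:
    """Lowercase an email address and strip surrounding whitespace."""
    return address.strip().lower()


# The kind of test an LLM drafts first: the obvious happy path.
def test_normalize_email_basic():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"


# The edge case added by hand during review: already-clean input
# must pass through unchanged.
def test_normalize_email_already_clean():
    assert normalize_email("bob@example.com") == "bob@example.com"
```

The division of labor is the point: the LLM handles the boilerplate of test scaffolding, while the human reviewer supplies the edge cases that require knowledge of how the function is actually used.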
Creating simple documentation
Maybe this is just me, but I hate bad README.md files.
Automation
LLMs can write scripts to automate tasks for me. I have a Raspberry Pi running at home, and I've always known I could write scripts to automate some of the manual steps for updating and backing up the system. I never got around to it before, but now it's easier than ever.
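As a minimal sketch of the kind of maintenance script an LLM can draft in minutes: the function names, the `/mnt/backup` path, and the choice of directories to archive are all hypothetical assumptions, not a prescription. It assumes a Debian-based Raspberry Pi with `apt-get` available.

```python
import datetime
import subprocess
import tarfile
from pathlib import Path


def backup_path(backup_dir: Path) -> Path:
    """Build a dated tarball path, e.g. /mnt/backup/pi-backup-2024-08-01.tar.gz."""
    stamp = datetime.date.today().isoformat()
    return backup_dir / f"pi-backup-{stamp}.tar.gz"


def update_system() -> None:
    """Update and upgrade system packages (Debian-based systems only)."""
    subprocess.run(["sudo", "apt-get", "update"], check=True)
    subprocess.run(["sudo", "apt-get", "full-upgrade", "-y"], check=True)


def make_backup(backup_dir: Path, sources=("/etc", "/home")) -> Path:
    """Archive the given directories into a dated, compressed tarball."""
    target = backup_path(backup_dir)
    with tarfile.open(target, "w:gz") as tar:
        for src in sources:
            tar.add(src)
    return target


if __name__ == "__main__":
    update_system()
    make_backup(Path("/mnt/backup"))
```

Nothing here is hard to write by hand; the value is that the LLM removes the activation energy, so the script actually gets written instead of staying on the to-do list.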
I recently came across a great article from Nicholas Carlini, who sums up even better what LLMs are definitely a great use case for: https://nicholas.carlini.com/writing/2024/how-i-use-ai.html.
Bad use cases for LLMs, so far
LLMs currently have limitations. While these might change rapidly, we also face constraints such as the high cost of training new models and the pace at which they can evolve.
Writing production-ready code
It's far too risky to allow an LLM to write code and deploy it directly to production. LLMs are still limited in understanding the boundaries and business cases of your application. It's acceptable to generate code, but always double-check with pull requests and tests.
Context issues
Most LLMs struggle to understand your specific context, both business context and code context (such as custom frameworks). As a result, their output is often unusable, and more work is needed in this area to make LLMs more successful. Many coding LLM tools have recognized this issue and are trying to address it, but challenges remain with context windows, RAG implementations, and other factors. While it's still far from perfect, progress in this space is moving faster than in other areas.
Writing entire applications or fixing issues based on text descriptions
There are numerous impressive videos showcasing LLMs writing complete applications and fixing issues based on, for example, GitHub Issues. However, these demonstrations typically use simple, toy applications and issues. In the real world, you quickly encounter limitations, as the LLM would need comprehensive knowledge of both the business case and the codebase.
Future
This brings me to the speculative section, adding more fuel to the hype train. What does the future hold? It's impossible to predict with certainty, as the field is evolving fast. However, we've also seen limitations with LLMs, to the extent that I'm skeptical they alone will revolutionize everything for developers. I believe we'll see more value from synergies between traditional AI models and LLMs working together.
Even as LLM models grow larger and more sophisticated, I don't foresee them replacing developers as was initially predicted when GPT-3 and GPT-4 were released. Instead, they'll become better-integrated tools with fewer issues (such as hallucinations). It's already becoming easier to create prototypes and test ideas. Consider also automations and pipelines—code that doesn't follow complex business logic can be created much faster with LLMs' help. This could ultimately allow us to achieve much more automation, leading to higher-quality software. I firmly believe that LLMs have and will continue to have a positive impact on software quality.
What should you do as a manager or as a developer?
But what should you do right now?
As a Manager:
If you're in a position to do so, provide your developers access to LLM tools. If not, advocate for it. Here are three compelling reasons:
Control and data protection: By offering official tools, you mitigate the risk of developers using private tools that might expose your code to external training.
Attracting talent: For many developers—myself included—an organization without a plan or access to LLM tools would be a significant drawback.
Developer growth: Early adoption allows your team to learn the limits and benefits of Large Language Models, enhancing their skills.
As an Engineer:
Don't panic: LLMs aren't capable of replacing you, and I believe they won't be in the near future (though don't quote me on that).
Stay informed: LLMs are here to stay. Embrace this technology to maintain your competitive edge.
Master the pros and cons: Understanding where LLMs excel and where they fall short will enable you to leverage these tools effectively, playing to their strengths.
Summary
It's understandable that many people are skeptical about AI in software engineering. The hype can be overwhelming, especially when headlines like Amazon CEO Andy Jassy Says Company's AI Assistant Has Saved $260M And 4.5K Developer-Years Of Work: 'It's Been A Game Changer For Us' (yahoo.com) pop up. But this is part of the game—companies driving this space will always make grand statements to garner attention and funding.
Nevertheless, LLMs can offer significant value. They have limitations, but if you haven't found them useful yet, don't write them off. Experiment with the technology to discover its strengths and how it can benefit you. Keep in mind that this is a rapidly evolving field—the LLM or tool that was once best suited for a task might already be outdated, with superior models emerging. Take the evolution of Claude and its new Artifacts feature as an example (What are Artifacts and how do I use them? | Anthropic Help Center). For those looking to stay up-to-date, I highly recommend following Markus Zimmermann's LinkedIn posts. He regularly analyzes extensive data and shares current benchmark results for various LLMs and their use cases, providing valuable insights into this rapidly evolving field.
These are truly exciting times to be part of this industry.