AI Dev Buddies: What our team learned from using AI Assistants

Stefan Fuchs
willhaben Tech Blog
10 min read · Apr 30, 2024


Tools like GitHub Copilot and JetBrains AI Assistant are rapidly gaining traction, promising to revolutionize how we write code.

The developer community is buzzing with excitement around AI assistants. The potential to boost productivity and tackle tedious tasks is undeniable. However, it’s crucial to move beyond the hype and understand the true impact of these tools.

Nearly a year ago, we at willhaben evaluated GitHub Copilot and Tabnine. While intrigued by the potential, we found a significant gap between what these tools promised and what they delivered. Both tools relied solely on the open file for context, which resulted in inconsistent or incorrect suggestions, such as wrong parameters or calls to non-existent methods. The focus on code suggestions, which were often not particularly helpful, hindered development flow more than it aided it. While we remained interested in the future of AI-powered development tools, we opted to wait for the next generation, such as the then-rumored Copilot X, before revisiting this technology.

That experience reinforced our resolve to judge these tools by what they actually deliver rather than by what they promise.

To gain an updated, first-hand perspective, we recently conducted an internal experiment within our development team. Two-thirds of our developers took part, split evenly between GitHub Copilot (⅓) and JetBrains AI Assistant (⅓), for a period of 42 days. The accompanying surveys not only captured usage frequency but also examined the perceived helpfulness of AI assistants across various development tasks. Did they truly become the coding comrades we hoped for? Were there any unexpected challenges?

In this blog post, we’ll unveil the results of our survey and evaluate the real impact of AI assistants. We’ll explore how these tools influence specific tasks like code completion, debugging, and documentation generation. Did they live up to the hype? We’ll also share valuable insights from our developers, highlighting both the strengths and limitations they encountered.

Methodology

Participant Selection and Groups

Our development team was divided into three groups for a balanced comparison:

  • AI Assistant Users: Developers were assigned to use either GitHub Copilot (18 participants) or JetBrains AI Assistant (17 participants).
  • Control Group: Around a third of the developers continued using only traditional coding aids (like Stack Overflow 😉).

Data Collection and Surveys

To gauge the effectiveness and reception of the AI tools, we implemented a structured survey approach, collecting data at two critical points: midway and at the end of the 42-day experiment. The surveys focused on the developers’ experiences with the AI assistants for specific development tasks:

  • Writing code from scratch
  • Creating unit tests
  • Writing documentation
  • Understanding code
  • Finding bugs
  • Optimizing/refactoring code
  • Generating commit messages

Participants rated each task on a scale of 1 (never used / not useful) to 5 (used very often / very useful), covering both frequency of use and helpfulness. Additionally, open-ended questions encouraged them to share specific examples, suggestions, and potential use cases beyond coding. The response rate among participating developers was 94%.
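
As a side note for readers who want to run a similar survey, the aggregation behind the charts shown later is conceptually simple. The sketch below is a minimal Python illustration: the task names and the 1–5 scale mirror the survey above, while the data structures and sample responses are invented for the example.

```python
from statistics import mean

TASKS = [
    "writing code from scratch",
    "creating unit tests",
    "writing documentation",
    "understanding code",
    "finding bugs",
    "optimizing/refactoring code",
    "generating commit messages",
]

# Each response maps a task name to a (frequency, helpfulness) pair, both 1-5.
# These two sample responses are purely illustrative.
responses = [
    {"writing code from scratch": (5, 4), "creating unit tests": (4, 4)},
    {"writing code from scratch": (4, 3), "finding bugs": (2, 2)},
]

for task in TASKS:
    ratings = [r[task] for r in responses if task in r]
    if not ratings:
        continue
    usage_share = len(ratings) / len(responses)    # share of devs who used the feature
    avg_helpfulness = mean(h for _, h in ratings)  # averaged over users only
    print(f"{task}: used by {usage_share:.0%}, avg. helpfulness {avg_helpfulness:.1f}/5")
```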

Focus on Developer Experience

While we initially aimed to track development metrics such as Lead Time to Change (LTC) and SonarQube code quality to assess the quantitative impact of AI assistants, these metrics turned out to be unsuitable for this study. Given the brevity of the experiment, the relatively small participant groups, and the inherent variability of those metrics, we could not definitively attribute any movement in them to the AI assistants.

Therefore, we opted to prioritize survey data as the primary source of insights. This approach provided valuable perspectives on the developer experience, including the perceived benefits and drawbacks of using AI coding assistants in everyday tasks.
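
To make that variance problem concrete, here is a toy simulation. Everything in it is assumed for illustration: the number of changes, the hypothetical 10% improvement, and the lognormal spread are our guesses, not measured data.

```python
import random

random.seed(7)

def simulate_lead_times(n_changes: int, mean_days: float) -> list[float]:
    # Lead times are heavily right-skewed in practice: most changes ship
    # quickly, a few take far longer. A lognormal spread models that.
    return [mean_days * random.lognormvariate(0, 0.8) for _ in range(n_changes)]

control = simulate_lead_times(60, 3.0)  # baseline group, ~3 days per change (assumed)
treated = simulate_lead_times(60, 2.7)  # hypothetical 10% improvement

avg = lambda xs: sum(xs) / len(xs)
print(f"control: {avg(control):.2f} days, treated: {avg(treated):.2f} days")
# Re-running with different seeds can flip which group looks faster:
# at this sample size the within-group variance swamps a 10% effect.
```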

Survey Results

This section delves (😉) into the insights gleaned from our developer surveys, revealing not just the frequency of AI assistant usage for specific coding tasks, but also their perceived helpfulness and the overall sentiment towards these tools.

Engagement & Usefulness

Bar chart of the percentage of users who used each of the AI assistant features

The chart above illustrates the share of participating developers, using either GitHub Copilot or JetBrains AI Assistant, who used each feature. As expected, writing code from scratch was the most-used feature, highlighting the potential of AI assistants for kickstarting development. Unit testing and optimizing/refactoring code followed closely, indicating areas where developers look to automate repetitive tasks.

Bar chart showing the average perceived helpfulness of each of the AI assistant features

The bar graph comparison reveals a trend favoring GitHub Copilot in helpfulness ratings across several tasks. While both tools were seen as beneficial for some tasks, developers consistently rated Copilot as more helpful for writing code from scratch, code refactoring and unit testing.

Where AI Assistants Excel

  • Writing Code from Scratch: This was the most used and favored feature. Developers appreciated GitHub Copilot’s ability to generate relevant code snippets, accelerating initial coding phases. However, careful review of suggestions is crucial to avoid potential code quality issues.
  • Unit Testing: Over two-thirds of developers found AI assistants helpful for unit testing, with Copilot again receiving higher marks. While some encountered irrelevant or incomplete test cases, AI assistants can significantly streamline this repetitive task (see the sketch after this list).
  • Optimizing/Refactoring Code: A significant 75% of developers used AI for this task, with GitHub Copilot praised for its valuable suggestions. However, developer expertise remains essential for evaluating and implementing the suggested optimizations.
  • Generating Commit Messages: This feature received positive feedback; its suggestions, while only moderately helpful, were still considered useful.
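
To give a flavor of the unit-testing experience, below is a hedged sketch of the kind of test scaffold an assistant typically drafts, together with the review pass our developers described. The function under test, parse_price, and all test data are hypothetical, not taken from our codebase.

```python
import pytest

def parse_price(raw: str) -> float:
    """Parse a user-entered price like '1.234,56' (hypothetical helper)."""
    return float(raw.replace(".", "").replace(",", "."))

# Assistant-drafted tests: solid coverage of the happy path...
@pytest.mark.parametrize("raw, expected", [
    ("100", 100.0),
    ("1.234,56", 1234.56),
    ("0,99", 0.99),
])
def test_parse_price(raw, expected):
    assert parse_price(raw) == expected

# ...but, matching the feedback above, the edge cases still tend to
# come from the human reviewer:
def test_parse_price_rejects_empty_input():
    with pytest.raises(ValueError):
        parse_price("")
```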

Areas for AI Assistant Improvement

  • Understanding Code: Both tools received lower ratings for code comprehension tasks. AI assistants currently struggle with deep code analysis, limiting their effectiveness in this area.
  • Finding Bugs: Only 40% of developers used AI for bug detection due to its perceived limitations in handling complex or logic-related bugs.
  • Writing Documentation: While some found AI-generated documentation moderately helpful, concerns were raised about its quality and completeness, suggesting a need for human intervention.

Additional Insights from Developer Responses

This section reflects on the nuanced feedback from our developers regarding their experience with AI coding assistants. It sheds light on some challenges and unexpected benefits, illustrating current AI capabilities and their practical implications for software development.

  • Over-reliance and Code Quality: Some developers highlighted concerns about becoming overly reliant on AI suggestions, cautioning that it could lead to code quality issues if suggestions aren’t thoroughly vetted. This underscores the continuing need for critical thinking and robust code review practices even (or especially?) when AI tools are involved.
  • Incomplete or Irrelevant Results: Developers encountered issues such as incomplete or irrelevant test cases generated for unit testing and repetitive suggestions for similar class names. These instances underscore the need for ongoing improvements in the AI’s accuracy and its ability to adapt suggestions to the specific context of a project.

Limited Effectiveness for Complex Tasks: AI assistants showed limitations when dealing with complex tasks, particularly:

  • Explaining code laden with project-specific logic: AI tools often struggle to grasp the finer details and unique logic of a project’s codebase, highlighting a gap in their current understanding capabilities.
  • Generating code for niche technologies: The effectiveness of AI assistants decreases with lesser-known technologies (e.g. dbt, Terraform, Keycloak), likely due to limited training data.

Benefits Beyond Coding: Beyond straightforward coding tasks, several developers appreciated the AI assistants’ chat functionality, which proved useful for:

  • Brainstorming solutions: AI suggestions can spark new ideas, providing fresh perspectives that might not be immediately obvious.
  • Gaining explanations unrelated to the current code: The chat feature also serves as a resource for understanding broader programming concepts or exploring alternative approaches, making it a versatile tool in the developer’s toolkit.

These insights not only reflect the mixed effectiveness of AI coding assistants but also hint at their potential to evolve into more capable and context-aware partners in coding.

Overall Impression

Stacked bar chart showing the number of devs who rated the AI assistant on a scale from 1 to 5

None of our developers was blown away by the tools, and not one gave the highest rating when asked about their overall experience with the AI assistant. Nevertheless, the general sentiment was positive, with many acknowledging the tools’ potential to streamline development workflows:

“Copilot has become an invaluable tool for me, especially when starting new projects. It accelerates the initial coding phase significantly.”

“AI assistants have their limitations but excel in automating repetitive tasks like writing unit tests and code refactoring, which saves valuable time.”

4 out of 5 developers want to continue using an AI Assistant, with a strong preference for GitHub Copilot (9 out of 10 participants in that group want to keep it).

Stacked bar chart showing the percentage of devs per group who want to continue using an AI assistant

However, some developers expressed concerns about over-reliance and the need for careful evaluation of AI suggestions to ensure code quality.

These insights demonstrate how AI coding assistants are being integrated into developer workflows, pinpointing the tasks where these tools are most beneficial while also acknowledging their limitations. Moving forward, this data will guide our decisions on further adoption and integration of AI tools in our development practices, with a focus on maximizing their strengths while mitigating potential drawbacks.

Moving Forward

The Future of Development: Powered by AI

The developer survey results confirmed the potential of AI coding assistants to streamline workflows and enhance developer productivity. With a focus on efficiency, we will deepen our integration of AI assistants like GitHub Copilot, which demonstrated strong capabilities in tasks such as code generation and optimization. This enhanced toolset will free developers to focus on more intricate tasks, boosting overall productivity.

Addressing Limitations Proactively

The field of AI technology is rapidly evolving. While we acknowledge current limitations, especially in understanding complex code and niche technologies, we are confident that continuous advancements in AI will soon address these hurdles. Staying abreast of these developments will allow us to seamlessly incorporate improvements, ensuring our tools are always at the cutting edge.

Upholding Quality Standards

We are committed to maintaining high code quality. Our well-established code review processes will continue to be crucial, particularly as we integrate AI more comprehensively. These practices will ensure that any code suggested by AI meets our stringent quality criteria. Importantly, we emphasize the critical role of developer judgement in this process. The final responsibility for the code lies with the developer, and AI suggestions should be carefully evaluated before implementation.

Exploring New Use Cases

Feedback from the survey has opened up intriguing possibilities for AI applications beyond traditional coding tasks:

  • Automated Documentation and Review: We are excited to explore how AI might help streamline the creation of technical documentation and enhance our code review processes. This could significantly reduce the workload on our developers, allowing them to concentrate on innovation.
  • Knowledge Management with AI: Integrating AI into our knowledge management systems could revolutionize how we gather, store, and retrieve information, enhancing learning and decision-making processes across the board.

A Balanced Approach to AI Adoption

Feedback from developers emphasized the importance of integrating AI thoughtfully:

  • AI as a Co-pilot: It is crucial to view AI tools as supplements to, not replacements for, human expertise. Encouraging effective collaboration between developers and AI will maximize the benefits of these technologies. AI suggestions are powerful tools, but should not be seen as a substitute for developer judgement.
    The “the AI did it” excuse will not be accepted.
  • Learning with AI: For our less experienced developers, AI assistants can serve as valuable educational aids, providing guidance and accelerating skill acquisition.

By acting on these insights, we can make the most of AI’s current capabilities within our development workflow. This will inform our strategic approach to AI adoption, ensuring we’re ready to integrate future advancements as they emerge.

Conclusion: Navigating the Future with AI Coding Assistants

Our developer survey has clarified the role AI coding assistants can play in modern software development. The feedback underscores their ability to enhance productivity, particularly in tasks like code completion, unit testing, and optimization. Most developers reported real, if measured, gains and want to keep the tools in their workflow, confirming the value of integrating AI into our processes.

Despite the enthusiasm, the survey also shed light on several challenges. Developers noted difficulties with AI assistants in understanding complex code and adapting to niche technologies. Concerns were also raised about potential over-reliance on AI, emphasizing the necessity for vigilant code reviews to maintain high quality standards.

However, the landscape of AI technology is progressing rapidly, and current limitations are likely to be overcome as these tools evolve. By staying at the forefront of AI developments, we position ourselves to seamlessly integrate forthcoming innovations and enhancements.

Looking beyond conventional coding tasks, the potential for AI extends into areas like automated documentation, synthetic test data generation, and AI-enhanced knowledge management systems. These opportunities could further revolutionize how we handle information and streamline repetitive tasks.

As we advance, our approach to integrating AI will remain balanced. AI assistants are best utilized as supplements to, not substitutes for, human expertise. They offer invaluable support and guidance, particularly for less experienced developers, but should be used in conjunction with, not in place of, developer judgement.

The exploration of AI-powered development tools is an ongoing process. By integrating these tools thoughtfully, we can ensure our developers have the resources they need to thrive in an evolving development landscape. The integration of AI goes beyond mere adoption; it’s a chance to redefine the core principles of software creation for the AI era.
