OpenAI launched ChatGPT on November 30, 2022, and software development hasn’t been the same since. Developers often joke about using Google and Stack Overflow to perform most of their job duties; with ChatGPT, there was suddenly a new tool in the mix. Since then, software development has become inextricably linked with generative AI and large language models. GitHub’s Copilot, which actually predates ChatGPT, brought that power directly into the editor - like having auto-complete on steroids.

Now, I am going to compare three state-of-the-art tools for AI-powered development that move beyond copy-pasting code to and from ChatGPT, or even the single-file auto-complete of Copilot:

  • Cursor
  • GitHub Copilot Workspace
  • Solver

I am going to treat each of these tools like an AI assistant or junior developer: I will give them a task that touches multiple files, and they will produce some output. Then we’ll compare the results and talk about where all of this is going.

Simple Task: Creating a Support Request for Content Moderation Failures

For the first test, I chose a straightforward yet real-world scenario. Here’s the prompt:

When an Episode fails content moderation (see the Episodes::ContentModerator service), create a SupportRequest containing the user’s email and ID, podcast name and ID, episode name and ID, and details of the failure.
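For reference, the change I was hoping for looks roughly like this - a sketch only, since the hook and the attribute names are my assumptions based on the prompt, not the app’s actual code:

    # app/services/episodes/content_moderator.rb (sketch; names assumed)
    module Episodes
      class ContentModerator
        # ... existing moderation logic ...

        private

        # Hypothetical hook, called when moderation flags the episode.
        def create_support_request(episode, results)
          SupportRequest.create!(
            user_email:   episode.user.email,
            user_id:      episode.user.id,
            podcast_name: episode.podcast.name,
            podcast_id:   episode.podcast.id,
            episode_name: episode.name,
            episode_id:   episode.id,
            details:      results.to_json # details of the failure
          )
        end
      end
    end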

Each tool tackled this task differently, and here’s how they measured up:

Cursor

Cursor initially struggled when given no additional context, attempting to create new models and to modify the existing ContentModerator service unnecessarily. However, one of Cursor’s best features is the ability to specify additional context - in this case, the existing SupportRequest model, ContentModerator service, Episode model, and database schema. With that context, Cursor delivered the expected result. I had to ask it to update the tests as well, but they passed on the first try.

GitHub Copilot Workspace

Copilot Workspace performed impressively here, nearly completing the task on the first attempt with minimal guidance. Though it missed a few minor details that Cursor caught, it included tests on the first try, simplifying the workflow. I followed up by asking it to pretty-print the JSON-formatted content moderation results, and it executed the request without issue.
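In Ruby, that follow-up is essentially a one-line change, assuming the moderation results are a plain Hash being serialized into the request details:

    require "json"

    # Example data; the real shape of the moderation results is assumed.
    results = { flagged: true, categories: { hate: 0.01, violence: 0.87 } }

    results.to_json               # => one compact line: {"flagged":true,...}
    JSON.pretty_generate(results) # => the same JSON, indented and human-readable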

Solver

Solver’s response mirrored Copilot Workspace’s, including both the implementation and tests on the first attempt. One interesting difference: Solver included tests for both the error and failure paths, where the other tools covered that case with a single test.
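The distinction is worth testing: a moderation failure means the content was flagged, while an error means the moderation check itself blew up. A sketch of that extra coverage, with the service interface, factory, and stubbing helper all assumed:

    # spec/services/episodes/content_moderator_spec.rb (sketch; names assumed)
    RSpec.describe Episodes::ContentModerator do
      let(:episode) { create(:episode) } # assumes a FactoryBot factory

      it "creates a SupportRequest when moderation flags the content" do
        stub_moderation(flagged: true) # hypothetical helper
        expect { described_class.new(episode).call }
          .to change(SupportRequest, :count).by(1)
      end

      it "creates a SupportRequest when the moderation check raises an error" do
        stub_moderation(error: StandardError.new("API down")) # hypothetical helper
        expect { described_class.new(episode).call }
          .to change(SupportRequest, :count).by(1)
      end
    end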

Each of these tools was able to handle a simple task well. Next, let’s see how they do with a more complicated task.

Complex Task: Enabling Admin Overrides for Moderation Failures

For the second task, I assigned each tool a more complex, multi-layered feature:

Now that we have moderation results, we need to provide admins a way to allow episodes that failed moderation to proceed - in case these are not real violations of our content policy. Using the existing admin sections as a guide, design a simple system to allow admins to allow episodes to proceed into production - this should call the proceed_after_moderation method on Episode.
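To make the target concrete before looking at each tool, here is roughly the shape the feature takes in a Rails app - a sketch in which only proceed_after_moderation comes from the prompt; the route, action, and redirect are my assumptions:

    # config/routes.rb (sketch)
    namespace :admin do
      resources :podcast_episodes, only: [:index, :show] do
        post :bypass_moderation, on: :member # the admin override endpoint
      end
    end

    # The heart of the feature: an admin-only controller action that calls
    # the method named in the prompt and sends the admin back with feedback.
    def bypass_moderation
      @episode = Episode.find(params[:id])
      @episode.proceed_after_moderation
      redirect_to admin_podcast_episode_path(@episode),
                  notice: "Episode approved to proceed to production."
    end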

This task required multiple steps, with controllers, policies, views, and tests needing changes. Here’s how each tool handled the challenge:

Cursor

Cursor’s initial pass struggled with the complexity, missing some UI elements and not fully matching the existing admin screen structure. It took several rounds of iteration to adapt the UI, add the necessary flash messages, and resolve test failures caused by misaligned expectations. Cursor’s ability to add console command results to the Composer for subsequent “fix the tests” iterations proved helpful here, allowing a degree of hands-off debugging and code correction. After several iterations and a few manual edits, Cursor completed the task. Its iterative approach shows promise for complex projects but requires close supervision and intervention for last-mile polishing.

GitHub Copilot Workspace

For GitHub Copilot Workspace, I modified the prompt slightly to fit the added complexity:

Now that we have moderation results, we need to provide admins a way to allow episodes that failed moderation to proceed - in case these are not real violations of our content policy. Using the existing admin sections as a guide, design a simple system to allow admins to allow episodes to proceed into production - this should call the proceed_after_moderation method on Episode. Admins need to be able to view the episode’s full transcript and the moderation results.

This task was slightly complicated by some downtime on Copilot Workspace’s end - it didn’t generate one of the proposed files, and when I refreshed the page I hit a 502 Bad Gateway error. I came back later and was able to generate the file. As with Cursor, there were a few “last mile” things to clean up on the UI side. There were also some hallucinations, though those would likely have been avoided by having the correct files in the context.

Solver

Solver delivered a promising first attempt, which needed a little bit of tweaking:

  • Update the AdminPolicy with the new bypass_moderation method (sketched in code after this list)
  • Admin::PodcastEpisodesController already has a set_episode method; modify the before_action to use it for bypass_moderation as well
  • Always show the Moderation Results and Transcript, but continue to show the Bypass Moderation button only when needed
  • Remove the _episode.html.erb.new file in favor of updating the existing file
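Roughly, the first two tweaks refine the earlier sketch like this - assuming Pundit-style policies, where only bypass_moderation, set_episode, and the controller name come from the session above:

    # app/policies/admin_policy.rb (sketch, assuming Pundit conventions)
    class AdminPolicy < ApplicationPolicy
      def bypass_moderation?
        user.admin? # assumes an admin? predicate on the user
      end
    end

    # app/controllers/admin/podcast_episodes_controller.rb (sketch)
    class Admin::PodcastEpisodesController < Admin::BaseController # base class assumed
      # set_episode already existed; the before_action grows to cover the new action
      before_action :set_episode, only: [:show, :bypass_moderation]

      def bypass_moderation
        authorize @episode, :bypass_moderation?, policy_class: AdminPolicy
        @episode.proceed_after_moderation
        redirect_to admin_podcast_episode_path(@episode), notice: "Episode can now proceed."
      end
    end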

However, this round of tweaks - and the follow-up asking it to add feature specs - revealed one of Solver’s most interesting features: self-iteration.

During this process, I noticed that Solver would essentially converse with itself to improve its solutions. In one instance, it even presented me with suggestions for how to improve the solution; I told it to continue, and it implemented those suggestions.

It also identified some feature tests that were missing and attempted to add them for me.

Like Cursor and Copilot Workspace, Solver needed similar “last mile” UI touch-ups to get to a working solution.

Final Thoughts: Choosing the Right AI-Powered Tool for the Job

It’s remarkable that I am writing a blog article comparing the results of performing the same two tasks three times, once with each tool. All of these tools are genuinely impressive, and any one of them would be a good choice for taking the next step in your AI-powered development workflow.

Each of these tools has a special idea:

  • Cursor - easily specify files or directories to add as context for the current task - as simple as @-mentioning the files while typing.
  • Copilot Workspace - develop and iteratively refine a detailed plan, then proceed to implementation.
  • Solver - automatically iterate on the current task and solution, and allow the user to “Continue Solving”.

As these AI-powered tools continue to evolve, the integration of these special ideas into a unified solution could not only streamline workflows but also empower developers to focus on higher-level tasks, fostering creativity and innovation. I’m excited to see how the future of AI in software development unfolds, and I believe we are on the cusp of transformative advancements that could redefine our industry.