First Experience with Vibe Coding

I am not going to spend time defining Vibe Coding, or explaining its history/origin. It's been on my radar for a while, but only recently did I find time to really educate myself on the topic. The concept is pretty simple - let the AI/Agent do the work for you. But the practical implications are very interesting.

If you choose to do your own research (which I highly encourage), be aware that since this is all changing so fast, articles or videos from even a few weeks ago may already be out of date. That being said, my "Aha moment" came when I watched this video. Fair warning: it's three hours long, but the first hour is more than enough to see the potential.

Where to start?

There are all kinds of AI-assisted (enabled?) tools and IDEs, but I selected Cursor AI. It's based on VS Code, and after watching the video, I liked its visual organization and its ability to understand rules & requirements. So I downloaded Cursor (version 0.49 as of this writing) and signed up for a 14-day trial of the Pro plan (for the record, I am totally subscribing). After that I opened my previous VS Code workspace. Now it was time to start implementing features.

The first thing I wanted to do was to tidy up interacting with Accounts:

  • Add a new field (starting transaction id)
  • Modify the Accounts page to be a simple listing, and only have the details visible when adding/editing.
  • Add a date picker to the Start Date field when in edit mode.

And this is when the magic truly began. I entered a few prompts, and just accepted each change as Cursor responded. When I asked to add a date picker, Cursor informed me that I needed some new dependencies, and even offered to run them:

npm install @mui/x-date-pickers @date-io/date-fns date-fns

So not only was it able to modify the code, it was able to modify my environment to support it. That's pretty amazing.

Ergonomics vs a Browser-based Chatbot

That was just the surface. I continued to implement features, and it only took minutes versus hours or days. Some of the more complex things took many iterations and ultimately did take hours, but there's no way I would have been able to emulate spreadsheet behavior in React in less than a week, particularly because I have zero practical experience with React.

At its core, you're still interacting with an Agent/UI in a chat model; it's just integrated in the IDE instead of a browser window. This is far superior to what I was doing just days before, and it meant no more copy/pasting of generated code.

The agent is able to modify the files within the workspace directly. As I mentioned, it can also manipulate the host environment by executing shell commands, and it can even automatically spot and correct certain types of errors.

The interaction model is typically as follows:

  • In the chat panel, you enter a prompt.
  • The agent will do its best to fulfill the request, or may ask clarifying questions.
  • The agent will generate a series of modifications to the source, and ask for you to review. You can choose to accept the changes, or request further refinement.
  • Occasionally the generated code will result in problems (compiler errors, linter errors), and Cursor will automatically attempt to resolve the errors.

This workflow allows for some truly rapid prototyping, and depending on the tech stack / toolchain, greatly reduces the feedback loop.

To me, this is all just groundbreaking. I can't make a direct comparison to using GitHub Copilot from within VS Code because I just haven't tried it yet. It's possible much of this workflow is available there as well.

It isn't all roses

As cool as all of this is, there are some issues. I'd wager many of my problems were on me; AI prompting is a skill, and one I'm still developing. But these are some of the more frequent issues I ran into:

  • "Chats" have a finite length - A chat session will eventually become ... exhausted? I don't know; that's the best term I can use. But after enough time, the chat interactions slow down, and eventually the agent will act like it's just started a new conversation. When that happens, it has effectively lost its "memory." On this note, as the chat ages, a little note appears in the bottom right-hand corner recommending that you open a new chat. It's not the easiest thing to see, and I hope Cursor makes it more prominent in the future.
  • Lack of "big picture" thinking - even when limited to the context within a single file, multiple times I requested a certain behavior to be implemented, and when I tested the changes, they did not function as desired. I would respond to the agent, describing the behavior, and frequently the fix was to make more changes elsewhere in the same file. For example, when exiting one particular cell in a grid, a validation message was shown to the user twice. The generated code invoked a custom function, but when the agent analyzed the source file to make a later change, it didn't notice that function was already being invoked. I chalk that up to the underlying model, and not Cursor itself.
  • Functionality was occasionally "undone" - I remember at one point I requested that the Date column be formatted in the US style (mm/dd/yyyy). Later when I requested a change to how the Amount column was formatted, the Date column had lost its formatting.
  • Infinite Loops - this only happened once, but it was super annoying. I requested a particular change; the agent made it, which resulted in linter errors, so it automatically attempted to correct them. Apparently those corrections caused other problems, and after a couple of attempts to fix them, the agent abandoned the changes and regenerated the original code. This produced a literally endless cycle where it made the same changes and applied the same fixes, unable to escape. I had to manually stop the cycle, discard the changes, and refine my request.
  • "Naive" implementations - sometimes the agent made choices that just didn't make sense, and seemed like something a very junior developer might put out. For example, I stated that the Balance field was a running total; SUM(transaction.amount) of all prior transactions. At one point while testing the interface, when I edited the Amount field on a particular row, the Balance field was updated on all prior rows (the opposite of what I requested). I scrolled back through my chat history and literally copy/pasted a previous instruction, and the agent responded with an "Aha, I understand now" and corrected the calculation. I don't understand why that prompt didn't achieve the desired result the first time. Another example: the Balance field was being updated in all rows on every keystroke while an Amount cell was in edit mode. That's obviously going to be slow, so I suggested that a performance optimization would be to recalculate only when the edited value was committed. The agent made the requested change, but I'm surprised the model didn't foresee that problem (especially since the initial implementation already memoized this value).
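A couple of the behaviors above are small enough to pin down in plain code. This is my own minimal sketch of what I was asking for, not the actual generated code (the real app uses React and MUI, and `formatDateUS` and `computeBalances` are names I'm inventing for illustration; I'm also using the built-in Intl API instead of date-fns to keep the sketch dependency-free):

```javascript
// US-style date formatting (mm/dd/yyyy) via the built-in Intl API.
function formatDateUS(date) {
  return new Intl.DateTimeFormat('en-US', {
    month: '2-digit',
    day: '2-digit',
    year: 'numeric',
  }).format(date);
}

// Running balance: each row's balance is the sum of its own amount plus
// all prior amounts. Editing row i should only affect rows i and later.
function computeBalances(transactions) {
  let running = 0;
  return transactions.map((tx) => {
    running += tx.amount;
    return { ...tx, balance: running };
  });
}

const rows = computeBalances([
  { amount: 100 },
  { amount: -25 },
  { amount: 10 },
]);
console.log(rows.map((r) => r.balance)); // [100, 75, 85]
console.log(formatDateUS(new Date(2025, 3, 15))); // 04/15/2025
```

The "naive" version I saw did the opposite on both counts: editing a row updated the rows *before* it, and the recomputation ran on every keystroke rather than once on commit.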

Magical Moments

Hiccups aside, there were occasions where what I experienced can only be described as magic (in the sense of Arthur C. Clarke's third law):

  • The initial implementation of the Accounts page had a section where a single account's fields could be modified, followed by a table of all accounts. This UI was busy and ugly, so I requested:
Ok let's work on the accounts page. 
When you click on the Accounts link from the nav bar, 
it shows the area to enter a new account, 
and below that is a listing of the defined accounts, 
I'd like to modify this form so that the default 
view is just the listing, and there should be a 
button to add a new account. When clicking the button 
to add an account, or the pencil to edit an existing 
account, it should transition to a separate page to 
specify the account values. The add account button 
should just be a blue button with a +, with tool 
tips that reads "Add New Account":

That's a medium level of detail, and it was able to do everything I asked in one go. See the following images:

[Screenshot]

[Screenshot]

[Screenshot]

  • Occasionally there were some runtime errors, and the page would cease to render. When that happened, I opened the dev tools to show the console logs (with error and stack trace), and then took a screenshot. I informed the agent that rendering had stopped and there was an error, pasted in the screenshot, and the agent was able to fix the problem. Think about that one: the AI analyzed the screenshot, understood the text of the stack trace, and correlated it with the code to postulate the cause and make a fix. This happened several times. That's truly amazing to me.

Final Implementation

After several hours of iteration, I was able to get the agent to generate code that lets the UI mostly function as a spreadsheet. This is pretty sophisticated behavior for a UI, so it's no surprise it took so many iterations to get there. But keyboard navigation works: tab/reverse tab, arrow keys, pressing enter to accept a row, etc. It's not perfect and there are lots of tweaks to make, but considering I don't know React at all? This is just amazing.
Here's an example of what it looks like currently:

[Screenshot]
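The keyboard navigation was the part I'd never have managed on my own in React. The movement logic itself boils down to a small pure function; this is my own simplified sketch of the idea, not the generated code (the real version also has to manage focus and edit state, and `nextCell` is a name I'm inventing here):

```javascript
// Given the current cell, a key, and the grid dimensions, return the cell
// the cursor should move to. Tab moves right (wrapping to the next row),
// Shift+Tab moves left; Enter commits the row and moves down.
function nextCell({ row, col }, key, shiftKey, { rows, cols }) {
  const clamp = (v, max) => Math.max(0, Math.min(v, max - 1));
  switch (key) {
    case 'Tab': {
      // Treat the grid as one flat sequence of cells so Tab wraps rows.
      const flat = clamp(row * cols + col + (shiftKey ? -1 : 1), rows * cols);
      return { row: Math.floor(flat / cols), col: flat % cols };
    }
    case 'Enter':
    case 'ArrowDown':
      return { row: clamp(row + 1, rows), col };
    case 'ArrowUp':
      return { row: clamp(row - 1, rows), col };
    case 'ArrowRight':
      return { row, col: clamp(col + 1, cols) };
    case 'ArrowLeft':
      return { row, col: clamp(col - 1, cols) };
    default:
      return { row, col }; // any other key leaves the cursor in place
  }
}

// Tab past the last column wraps to the start of the next row:
console.log(nextCell({ row: 0, col: 2 }, 'Tab', false, { rows: 5, cols: 3 }));
// { row: 1, col: 0 }
```

Keeping this as a pure function (cell in, cell out) is also what makes the behavior testable without rendering anything, which matters when you're iterating with an agent.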

Next Steps?

I have just touched the surface. And although I will continue to iterate on application features, I'm curious about exploring Cursor AI in more depth. For example:

  • Start adding some unit tests
  • Dive into the code and ask it to explain a few things
  • Start suggesting refactoring
  • Explore how to leverage Cursor AI in a test driven manner; can it execute tests before applying changes?
  • Explore the limits of workspace size and context. For example, if I am working across multiple layers (database, backend service with an api, front end that consumes that api), can I make vertical changes across all modules?

Conclusion

There are some serious limitations to GenAI-assisted coding in general, and to "vibe coding" specifically. Some of the limitations are in the tools; some are in the models. This is a space that is changing very rapidly; it's worse than trying to stay abreast of front end frameworks. But I am confident that these limitations are only temporary, and I'm excited for the potential. I'm going to do my best to ride the wave and see where it goes.