I tried AI and all I got was disappointed
This is another post that I began a long time ago and never finished: it was originally dated 2024-05-05 but I think I started working on it even earlier.
At the time, the GPT hype cycle was already well underway, so I finally gave in and decided to give AI a serious try.
Prologue: Setting up a local chatbot
I didn’t want to give OpenAI any of my money, so I set up a VM with a GPU passed through.
I originally wrote “I want to be clear that the GPU was already in the box from when I did my gaming VM; I did not have the ambition to go that far out of my way to do this.” However, when I rebuilt my server, I decided to add a Tesla card dedicated to this project.
Ultimately, I settled on llama-cpp-python for the backend. I tried LocalAI too, but ditched it because:
- The prebuilt image assumes you’re not running on decade-old hardware, so I got SIGILL’d (the binaries use CPU instructions my machine predates)
- So I had to build from source, which took forever
- It very badly wanted to be run in Docker
- Docker images with CUDA in them are like 50 GB
It was therefore way easier to just build and install llama-cpp-python and run it as a systemd service, since I didn’t really need any of the additional features.
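For anyone attempting the same setup, here’s a minimal sketch of what I mean – every path, user, and flag below is an illustrative assumption, not my actual config:

```ini
# /etc/systemd/system/llama-server.service -- hypothetical paths and user
[Unit]
Description=llama-cpp-python OpenAI-compatible server
After=network-online.target

[Service]
# Assumes llama-cpp-python was pip-installed into this venv with CUDA enabled
ExecStart=/opt/llama/venv/bin/python -m llama_cpp.server \
    --model /opt/llama/models/model.gguf \
    --n_gpu_layers -1 \
    --host 127.0.0.1 --port 8000
Restart=on-failure
User=llama

[Install]
WantedBy=multi-user.target
```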
For the frontend, I ended up with big-AGI. I liked LibreChat too, but it had some UX issues (it kept logging me out, didn’t like to persist model preferences between chats, etc.).
I ended up sticking with Mistral-7B-instruct for most use; I tried some other Mistral variants, but this one seemed the least likely to start hallucinating when given short prompts.
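Since llama-cpp-python’s server speaks the OpenAI-compatible API, any frontend that accepts a custom endpoint URL can talk to it. A quick smoke test from the shell looks something like this (host and port are whatever the service above was configured with):

```sh
# Sanity-check the local endpoint; the server replies with OpenAI-style JSON
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "Say hello in five words."}], "max_tokens": 32}'
```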
Chapter 1: AI helps me create a Node.js project
Now that I had a robot to ask questions to, I embarked on a project: getting AI to help me build a web app for a random domain I had purchased on a whim because it sounded cool. I am not a web developer at all, and haven’t done any web stuff since the days when jQuery was considered a JavaScript framework, the best CSS we had was 2.1, and we all still put those “Valid XHTML 1.1” badges in our footers. Therefore, I thought this would be a good exercise: I could pretend to be a newbie and AI could help me do things I otherwise couldn’t. If it went well, then I’d know my job really is at risk.
There is no Chapter 2
I spent a week really trying to get Mistral to teach me React. It failed at providing up-to-date guidance on the initial setup of a project, so I ended up doing it the old-fashioned way (googling). I tried to get it to produce the equivalent of a hello-world page, but it couldn’t do that either. I flailed for a while longer, then abandoned the project.
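For contrast, here’s the kind of current-day scaffold I was hoping it would hand me – Vite is just my example of an up-to-date workflow, not a claim about what the model actually said:

```sh
# One modern way (circa 2024) to scaffold a React hello-world, via Vite
npm create vite@latest hello-world -- --template react
cd hello-world
npm install
npm run dev   # serves the starter page at http://localhost:5173
```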
If you are about to say “skill issue,” just shut up. I am told time and time again that LLMs can do great things. But if I try to do something complicated and it screws up, the story is “that’s too hard, it’s better at simple tasks.” I gave it the simplest task I could think of – an ideal use case per all the discourse online – and it couldn’t handle it.
If you were gonna say “that model sucks,” also shut up. What reason do I have to expect a paid service to be better, and not just a waste of my money? If you think it’s because a newer model would be better at the job, then why were we already hyping this shit up? This has been discussed time and time again.
Believe me, I really tried
I admit my stupid experiment had limitations, so I tried using an AI coding assistant at my day job for a few months, too. We had a model trained on internal code, touted to be able to deal with the nuances of our frameworks, etc., but it also sucked. If it didn’t outright hallucinate, the code it generated usually smelled; it failed at all but basic tasks, and the IDE integration was buggy and kept crashing.
It was at least somewhat competent at generating boilerplate, but as I began to rely on it for that, I could feel it making me dumb. This wasn’t just hypochondria – there is evidence of this phenomenon. I uninstalled it and never looked back.
To this day, generative AI forces itself into my daily life in ways I do not consent to. I have to review slop PRs submitted by my coworkers. They’re getting worse and worse. I have to read slop summaries of my own PRs and tickets, forcibly inserted by the tools themselves. The analysis, if it is even accurate, is consistently surface-level at best. It’s usually just verbosely paraphrasing the commit message. It provides no value.
You can’t google a basic question without risk of getting a hallucinated answer. Even DuckDuckGo has inserted slop above all the actual search results.
I do not believe in this technology
LLMs are a brute-force approach to approximately accomplishing anything. I won’t rattle on about the energy concerns, because those have already been discussed. But for what, a cheap parlor trick? These models fundamentally cannot reason. They cannot exercise creativity, except in the eerie way they confidently lie about things that should be fact. I’m tired of hearing about them, and I no longer want to write about them, either.