Ways I've tried to use LLMs for coding:

- Smart autocomplete (dumb, annoying)
- Search with poorly articulated queries (really good)
- One shot from stupid prompt (random result)
- One shot from better prompt (random result)
- One shot from plan (random result)
- Archon's research/plan/implement with sub-agents (100k LOC broken codebase)
- Focused, directed feature implementation (convoluted logic, broken UI)
- Focused easy bugfix (mixed results, sometimes works)
- Focused difficult bugfix (burns tokens, no ability to debug)
- Upgrade dependencies (hallucinations of old versions, usually broken)
- Write tests (bad mock design instead of dependency injection, tautological tests)
- Write documentation (stylistically poor, did a decent job with something I wouldn't otherwise have done though)
- Fix linting errors (useful in a language I don't know, otherwise too slow/expensive to be better than doing it by hand)
- Spec-driven development (ended up maintaining the code myself, asked LLM to update the spec)
- Generate code in a well-defined context against an API/language I don't know (very helpful if I review/edit it)
- Write a plan for me which I implement manually (fails to get design decisions right)
- Write boring functions that I stubbed out or just called (works pretty well given enough context)
- Help me sanity check plans/implementations by finding edge cases (pretty good, isolated work which I can ignore)

So far: LLMs are good for certain categories of search, simple tasks with sufficient context, and providing context that I lack (reading the docs for me, bringing in skills I lack, helping me think things through). I remember a year ago people saying LLMs were most helpful to sharpen your thinking rather than think for you, but the draw of generating tons of code without thinking was so strong I didn't really see that for a long time.

Overall, the net result for me has been that I have moved slower, done worse work, and gotten dumber. But I am slowly coming to a place where I can maybe start using these tools correctly.
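To make the "Write tests" item concrete, here is a minimal TypeScript sketch of the difference between a tautological test and one built on dependency injection. Every name in it (`Clock`, `greeting`, `fixedClock`, and both test functions) is hypothetical and invented for illustration; it is not taken from any real codebase.

```typescript
// Hypothetical example: all names below are invented for illustration.

// A small port that the logic depends on, passed in rather than imported.
interface Clock {
  now(): Date;
}

// The code under test receives its dependency as a parameter.
function greeting(clock: Clock): string {
  return clock.now().getUTCHours() < 12 ? "Good morning" : "Good afternoon";
}

// A hand-written fake: no mocking library and no patching of globals.
function fixedClock(iso: string): Clock {
  return { now: () => new Date(iso) };
}

// Tautological test: it replaces the function under test with a stub and
// then asserts on the stub, so it passes no matter what greeting() does.
function tautologicalTest(): boolean {
  const stub = (): string => "Good morning";
  return stub() === "Good morning";
}

// Meaningful test: it runs the real logic against a controlled dependency
// and checks both branches.
function meaningfulTest(): boolean {
  return (
    greeting(fixedClock("2024-01-01T08:00:00Z")) === "Good morning" &&
    greeting(fixedClock("2024-01-01T15:00:00Z")) === "Good afternoon"
  );
}
```

The first test stays green even if `greeting` is deleted and rewritten incorrectly, which is what makes it useless; the second one fails as soon as the branching logic changes.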

Replies (19)

Based Truth 1 month ago
LLMs serve the interests of Alphabet and Microsoft, not yours, fueling the surveillance state.
To me the main gain is that I don't need to remember how a certain language does certain things. I can center a div without having to worry about which browsers support which instructions, etc. It's quite freeing. For the past 4 months I have been mostly focusing on spec development and just reviewing the implementation. Then I can leave, say, 6 sessions coding in parallel while I refine the next spec. Of course, the more backend the better, since AIs can't really test interfaces well yet.
I have similar issues. I am getting stuff done, but outside of simple one-shot apps, it would have been quicker if I had written it myself. Part of the issue is I like to build in a very specific way. Not just the architecture, but the particulars inside of the scaffolding. I like beautiful, elegant, well-designed code. And that is decidedly NOT what AI writes. I'm learning to let go of being a coder, and take on something of a manager's role. I write a detailed spec, enforce design decisions with code, and ensure I've got decent test coverage. And then I put on my editor's hat, iterate, curate, and refine.
I have the typical scientist "casual" coding experience for specific tasks like data analysis etc., never really "building applications" per se... I've been trying to use Lumo to work on an idea in VR/AR... Results have been a bit all over the map. It'll write code that doesn't work, or suggest libraries that don't exist, etc. It's definitely good at surfacing stuff that I wasn't aware of though, i.e. the "search" application...
I rarely make prompts that are more than a few sentences. It usually builds something reasonable. But the key for me is iteration. I look at every line and commit and keep telling it how to improve. This of course requires (or is easier with) pre-existing expertise. It helps when the underlying project already has thorough test coverage. I require that every commit passes those tests and makes sense on its own.
It's just like playing chess against 6 players at the same time. Your job is to rotate fast enough to give them what to play while you keep verifying their assumptions and architectural decisions. You are hitting a good point with context, but I worked in large teams before, so context was never actually there. With AI, that "context" becomes just the highest level of architecture you can think of; the rest is details that only the AI knows. I have given up on the idea that I can find bugs in the AI code. If I set it up correctly, there won't be any actual bugs, just working behaviors that I don't actually want or missing features that I forgot to mention. Most of my day these days is just that.
I found a level that works for me where I start out on a feature manually to get a feel for how I would solve it and, in the process, create skills that implement abstractable workflows. Then I try to find ways to chain these skills together in order to achieve the outcomes I care about. I avoid having to explain my thought process to the LLM and just point it to skills that were built as a result of my manual process.
The maximum number of sessions I have found ideal without losing track of the work is between 3 and 4. My text editor is still at the forefront of the work I choose to do myself.
Skills for specific units of work that you do often are the unlock. Not generic skills but specific skills that accomplish something, e.g. get all components and their dependencies related to a feature in the same file. Another skill could be: move specific components from one file to another along with their dependencies and create the equivalent UI stories. Another one could be: resolve duplicate imports. It then becomes possible to build skills where each step points to another skill, so that at the very least the agent does things correctly. This has been the middle ground imo.
Yeah, you MUST write skills/experts on how to use the code. Otherwise it is never going to work well. The AI can write those too. You can ask it to review the code, find patterns and write the texts. Reading all the code consumes a lot of tokens, but if the skills are good that only needs to happen once.
I noticed you're using TypeScript too. The rules I want the AI to follow are encoded as a Deno lint plugin in scripts/lint-plugins/innis-rules.ts in this package I released today. They certainly don't address every silly decision the AI makes, but they run on every CI build and catch some basic things. You can probably build something similar for your pipeline.
Innis
And shipping jsr:@innis/nostr-core today. The TypeScript port of the PHP library. Same architecture, same discipline. Branded primitives at the boundary, immutable domain objects, pure functions, ports where the protocol meets the world. The protocol layer separated from everything else, organised around domain concepts rather than NIP numbers, strict enough that a client, a relay, and an application can share the same core.

It is a contracts library and not a batteries-included toolkit. nostr-tools is excellent at the latter and the two are not feature-for-feature competitors. What this exposes that nostr-tools does not is a hex-typed boundary the compiler can check, with PublicKey, EventId, RelayUrl, and Sig all branded, one Signer port that NIP-07 and NIP-46 and a local signer all satisfy, crypto failures returned as Results rather than thrown, and an HttpClient port so libraries that touch the network never reach for fetch directly. If you are building an app and do not need any of that, use nostr-tools. If you are working inside the innis stack or want swappable boundaries you can test against in memory, this is where the contracts live.

The standalone relay-selection library released earlier this month was the first piece of the TypeScript stack to go public. This is the foundation of everything else. The pool, the event store, the NIP-07 and NIP-46 signers, and the work built on top of all of it, all to follow as each layer is cleaned for release. The discipline I am working on is not letting that cleanup become the delay. The lesson keeps coming back around.

AI was involved, same terms as before. The architecture is mine. The decisions are mine. The machine held the other end of the board.

deno add jsr:@innis/nostr-core
https://github.com/johninnis/nostr-core-ts
MIT.
#nostr #typescript #opensource #nostrdev
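For readers unfamiliar with the terms in the note above, here is a rough TypeScript sketch of what "branded primitives", "failures returned as Results rather than thrown", and a "Signer port" can look like. This is an illustration only: the names `Brand`, `Result`, `parsePublicKey`, `inMemorySigner`, and `shortKey` are assumptions made for this example and are not the actual API of @innis/nostr-core. The one protocol fact relied on is that a Nostr public key is 32 bytes written as 64 lowercase hex characters.

```typescript
// Hypothetical sketch: these names echo the note above but are NOT the
// real API of @innis/nostr-core.

// A brand is a compile-time tag; at runtime the value is a plain string.
type Brand<T, B extends string> = T & { readonly __brand: B };
type PublicKey = Brand<string, "PublicKey">;

// Failures are returned as values instead of being thrown.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

// Nostr public keys are 32 bytes, written as 64 lowercase hex characters.
const HEX_64 = /^[0-9a-f]{64}$/;

// The only place a raw string can become a PublicKey is this boundary check.
function parsePublicKey(input: string): Result<PublicKey, string> {
  return HEX_64.test(input)
    ? { ok: true, value: input as PublicKey }
    : { ok: false, error: "expected 64 lowercase hex characters" };
}

// A port: any signer (local, browser extension, remote) could satisfy it.
interface Signer {
  getPublicKey(): Promise<Result<PublicKey, string>>;
}

// An in-memory implementation of the port, handy for tests.
function inMemorySigner(hex: string): Signer {
  return { getPublicKey: async () => parsePublicKey(hex) };
}

// Accepts only a validated key; passing a raw string is a compile error.
function shortKey(key: PublicKey): string {
  return `${key.slice(0, 8)}...`;
}
```

With this shape, `shortKey("abc")` is rejected by the compiler, while `shortKey(result.value)` is accepted once `result.ok` has been checked, so validation cannot be skipped by accident.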
It's just practice, really. Once you get the hang of it, it becomes easy. It's much easier than managing the work of a small team of 6 people. Or taking care of 6 dogs at the same time, for instance.