Matt Lorentz's avatar
Matt Lorentz
_@mattlorentz.com
npub16zsl...92l7
Technologist, solarpunk, gamer, backpacker, passionate about using the internet to push more power to more people.
mplorentz 6 days ago
I haven't shared much about this, but in my free time I've been venturing into the self-hosted AI space. I acquired an old gaming machine with a decent graphics card from 4 years ago (RTX 4070S), put Linux on it, and spent some time getting hermes agent (https://hermes-agent.nousresearch.com/) running on it. I got it running with various sparse versions of Qwen 3. I managed to cobble together a few scripts to do things like scrape some news and flight data, but I kept running into timeout errors at various levels of the hermes stack. It's really not set up to work with agents that take multiple minutes to respond, and after fixing things in a bunch of different places I got tired of it and switched it back to Claude. I did find a fork that supports Zulip, and I really love it as an interface for many long-running async conversations.
Then I decided to try some autonomous coding with local models and fell down hard into the Steve Yegge beads/gas town/gas city rabbit hole. I took gas city (which is like the SDK for agent interactions extracted from gas town) and got it running. I tried running the entire thing with only local models but it wasn't working at all. I ended up with Claude as the mayor of the city, overseeing a bunch of short-lived agents that use Qwen on my GPU and try to write code and open PRs. They aren't doing a very good job yet, but the mayor and I learn and improve things a bit more every day. I'm not a fan of the super-extractive metaphors of gas town, but I do really like beads db as a system for getting agents to cooperate. It's basically an issue tracker, but some issues get labeled as memories, some get labeled as mail, and some even represent agents, so it creates an observable system of cooperation where agents spin up, read their mail, complete their task, hand it off to another agent, then shut down.
I'm trying to run all these agents serially to limit GPU contention and it somewhat works. But it's going against the system's design, which is just to have a mega bonfire of tokens. The biggest weakness, I think, is that the free models that fit in 12GB of VRAM are not good enough to do good coding. But the goal I'm working towards is getting frontier-quality code with free models on my own machine by chaining together enough hill-climbing loops (planner, coder, architect review, QA review, bounce it back to coder, etc.). And I'm thinking a lot about what the right interface is for me to review the work; right now it's just producing pull requests that I review normally. This has been my first time running --dangerously-skip-permissions agents 24/7 on my hardware and it feels quite cyborgian.
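The hill-climbing loop described above (planner, coder, reviewers, bounce back to coder) can be sketched roughly as follows. This is a minimal illustration, not how gas city or beads actually implements it: `Review`, `hill_climb`, and the `planner`/`coder`/`reviewers` callables are all hypothetical names, and each callable stands in for one agent invocation. Everything runs one call at a time, matching the idea of running agents serially to limit GPU contention.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Review:
    """Verdict from one reviewer (hypothetical structure)."""
    approved: bool
    feedback: str = ""


# Hypothetical agent signatures: each callable stands in for one model call.
Planner = Callable[[str], str]              # task -> plan
Coder = Callable[[str, List[str]], str]     # plan, feedback so far -> candidate code
Reviewer = Callable[[str, str], Review]     # plan, candidate code -> verdict


def hill_climb(
    task: str,
    planner: Planner,
    coder: Coder,
    reviewers: List[Reviewer],
    max_rounds: int = 5,
) -> Optional[str]:
    """Run plan -> code -> review rounds until every reviewer approves.

    Returns the approved candidate, or None if max_rounds is exhausted.
    """
    plan = planner(task)
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = coder(plan, feedback)
        # Reviewers run one after another, never in parallel.
        reviews = [review(plan, candidate) for review in reviewers]
        rejections = [r.feedback for r in reviews if not r.approved]
        if not rejections:
            return candidate
        # Bounce the work back to the coder with the accumulated feedback.
        feedback.extend(rejections)
    return None
```

Capping the number of rounds matters with small local models, since a coder that never satisfies the reviewers would otherwise loop forever.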
mplorentz 2 weeks ago
Cursor 3 is so much worse than Cursor 2 that I spent time looking for another IDE yesterday. I thought they really understood how I wanted to work with AI, but that trust has been totally shattered. I tried Windsurf but it doesn't have useful git worktree support. I also tried native VSCode with some plugins and Claude Code CLI. All were disappointing and I'm back on Cursor today. I feel so alone as a Cursor user; I feel like everyone I know is doing CLI development. For me the development bottleneck is reviewing and manually testing AI-generated code. Both of these are much easier in an IDE. The UI to approve each hunk of changes the AI made is key for me (after it's done, not the interactive permissions prompts that Claude insists on unless you use yolo mode). I need to be able to quickly see more context around the lines the AI changed, click through call hierarchies, and go to definition. Then I want a dashboard that lists all my agents working in different worktrees, and I need to be able to spawn a new one quickly. And I want to quickly switch between worktrees and have the associated agent chat all right there. And I want all of this in one window. I'm sure this is all possible on the command line if you spend enough time configuring tmux and vim, but I'm worried that my workflow is going to change in another few months and I'd have to do it all again. So for now I'm reinstalling Cursor 2 and I'll check back in on 3 in a few weeks.
mplorentz 3 weeks ago
Another AI pattern I'm really digging lately is managing my home server with Cursor + ansible. I run a few dozen docker containers and I've always managed the server with SSH, vim, and docker CLI. I don't want an AI agent mucking around on the machine and uploading who-knows-what as context to foreign servers. But for recent containers I have started a repo locally on my Mac where I have Claude or Composer write ansible scripts to deploy compose files and start the services. This feels like the best of both worlds to me: AI can blast out changes much faster than I can, but it doesn't have any access to the actual server and I can easily see exactly what it's going to do before I execute the playbook myself. This has allowed me to layer on additional functionality like creating a zfs dataset and ACL for every container which was too much work to do manually.
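The "see exactly what it's going to do before I execute it" idea can be illustrated with a small dry-run planner. This is a Python sketch of the pattern, not an Ansible playbook and not the actual setup from the post: the pool path `tank/containers`, the `uid` of 1000, the default mountpoint under `/`, and the use of POSIX ACLs via `setfacl` are all assumptions made for illustration. The function only builds the commands; nothing is executed.

```python
from typing import List


def plan_container_storage(
    containers: List[str],
    pool: str = "tank/containers",  # hypothetical parent dataset
    uid: int = 1000,                # hypothetical user that owns the containers
) -> List[List[str]]:
    """Build (but do not run) one dataset-creation and one ACL command per container."""
    commands: List[List[str]] = []
    for name in containers:
        dataset = f"{pool}/{name}"
        # `zfs create -p` also creates any missing parent datasets.
        commands.append(["zfs", "create", "-p", dataset])
        # Assumes the dataset is mounted at its default path and uses POSIX ACLs.
        commands.append(["setfacl", "-m", f"u:{uid}:rwx", f"/{dataset}"])
    return commands


if __name__ == "__main__":
    # Print the plan for human review; running it is a separate, manual step.
    for command in plan_container_storage(["jellyfin", "nextcloud"]):
        print(" ".join(command))
```

Keeping the generation step separate from execution is what lets the AI write changes quickly while the person still decides what actually runs on the server.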
mplorentz 3 weeks ago
I spent some time over the weekend setting up a hermes AI agent on my old gaming PC. It took a lot of fiddling but I finally have some models running locally on it that make it feel like a slower slightly stupider version of claude. It feels so good to finally have a fully local stack. There are a lot of boring chores in my life that I want AI to do and that it's probably capable of, but up until now I have refused to share much personal information with any of the big companies. I'm hoping that I can build up trust with a local agent and gradually make it more useful over time. I gave it an email address and already have it submitting some receipts for reimbursement (after approval by me) which feels like a good start.
mplorentz 3 weeks ago
Instead of waiting for an app developer to fix the bug I reported I just one-shotted a replacement app with Opus. Achievement unlocked?
mplorentz 0 months ago
Every couple of months I do a race where I have some agents go off and build a feature or fix a bug while I do it myself in Cursor. The time I spend reviewing and fixing the agents' work always ends up being longer and more painful, which is why I haven't switched over to an "agent command-center" style of software dev. I do kick off worktree agents here and there throughout the day to make minor changes that come up while I'm working on a larger branch. But those are side quests while I work on the main thing.
mplorentz 0 months ago
Cursor's Composer 2 model is performing much worse for me than Composer 1 :( I feel like Composer 1 really hit a sweet spot for me between speed and quality. For me the bottlenecks for coding with AI are:
- understanding all the code that the model wrote
- testing changes
Composer 1 really helped with the first because it could blast out small amounts of code that I could quickly review without my brain getting bored and context switching to something else. I feel like I'm an outlier in that I'm trying to stay heavily involved in the dev flow rather than having multiple agents work on long tasks and then coming back in cold to review their work. Is anyone else using smaller, quicker models in this way?
mplorentz 1 month ago
Do we have a NIP/tag that says to a relay "only serve this event to the authenticated author or p-tagged recipient?" This behavior is mentioned in NIP-17 and NIP-9a and probably makes sense in a lot of cases, and I want it for my Shamir's Secret Sharing NIP.
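For illustration, the relay-side behavior being asked about could look roughly like the check below. This is a sketch under assumptions, not an existing NIP: `may_serve` is a hypothetical name, the event is a plain dict in the NIP-01 shape (`pubkey` plus a `tags` list of string arrays), and `authed_pubkey` is assumed to come from a completed NIP-42 AUTH handshake (or `None` if the client has not authenticated).

```python
from typing import Optional


def may_serve(event: dict, authed_pubkey: Optional[str]) -> bool:
    """Return True only if the authenticated client is the author or a p-tagged recipient."""
    if authed_pubkey is None:
        # Unauthenticated clients never receive restricted events.
        return False
    if event.get("pubkey") == authed_pubkey:
        return True
    # A "p" tag carries the recipient's pubkey as its second element.
    return any(
        len(tag) >= 2 and tag[0] == "p" and tag[1] == authed_pubkey
        for tag in event.get("tags", [])
    )
```

A relay would apply a check like this to each matching event before sending it in response to a subscription; how an event opts in to this restriction (a tag, a kind range, or something else) is exactly the open question in the post.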
mplorentz 1 month ago
Just had an hour long video call in flotilla (the video part is still in dev, not released yet). The call quality was actually really impressive, better than Jitsi or Keet I would say. Props to Livekit for the killer open source WebRTC toolkit.
mplorentz 1 month ago
This blog post about the end of the American Empire has been living rent-free in my head since Saturday. It's quite long but it touches a lot of ideas that have been rolling around in my head like: how quickly will the American Empire fall apart, is it worth trying to reform the current system, how can we molt into better forms of governance through it? The idea that territorial sovereignty as a concept is on its way out is totally new to me but very intriguing.
mplorentz 1 month ago
Lifetime iOS user 3 days into using GrapheneOS as my daily driver. AMA.