Thread - Nostr Hypermedia

Even the day job is switching away from ESXi to Proxmox. 🤌🤌

2025-10-20 16:14:15 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

Proxmox has its issues tho. I'm sure they all do. I used to run a Windows cluster, it was great but a lot to manage. Proxmox has horrible network stability issues and STONITH really is a killer with corosync. Its HA storage is a little lacking tho. You either get hyperconverged (which I guess is getting popular) or standalone Ceph, which is not cost effective.

2025-10-20 16:40:55 from 1 relay(s) ↑ Parent 3 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

Something nostr:npub12262qa4uhw7u8gdwlgmntqtv7aye8vdcmvszkqwgs0zchel6mz7s6cgrkj and I discussed a little while ago

2025-10-20 16:41:47 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

Dimi npub1r7ps...nspg

>network stability issues

Skill issue. Coming from someone with one.

2025-10-20 16:42:47 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

I really liked a 2 node HA SAN setup with iSCSI and virtual IPs. Windows had SMB multichannel as a backhaul for HA storage quorum. SMB has fantastic performance with Windows using link aggregation and load balancing, so you can easily use a mix of 10g fiber and 1g. Linux cannot do this at all. LACP blows in comparison. iSCSI multipath is better, but still. And I'm not sure if anyone has priced out 10gb switches lately...

2025-10-20 16:46:23 from 1 relay(s) ↑ Parent 1 replies ↓ Reply
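One way to read the LACP complaint above: an aggregated group typically pins each flow to one member link via a hash, so a single iSCSI or SMB session does not get faster just because the group has more links. A toy model of that behavior, assuming a CRC32 hash over made-up addresses (real bonding hash policies differ, and `pick_link` is a hypothetical helper, not any driver's API):

```python
import zlib

def pick_link(src: str, dst: str, n_links: int) -> int:
    """Map a flow to one member link, the same way every time."""
    key = f"{src}->{dst}".encode()
    return zlib.crc32(key) % n_links

# One storage flow between two hosts (addresses are made up).
flow = ("10.0.0.10", "10.0.0.20")
links_used = {pick_link(*flow, n_links=4) for _ in range(1000)}
print(len(links_used))  # 1: a single flow only ever lands on one link
```

Multichannel and multipath approaches get around this by opening several sessions, so there are several flows to spread out.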

Dimi npub1r7ps...nspg

I’m using 10g iSCSI rn in Proxmox

2025-10-20 16:47:15 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

I partially agree, but coming from Windows networking, Linux networking sucks ass out of the box and with just the basic RTFM. Unless you learn to become a wizard (and I haven't), when you need a balance of performance, HA, and hardware pricing, Windows just works out of the box. That and there is no reason for a hard crash when a single node loses quorum. I'm currently learning about STONITH, but that seems unreasonable when a link goes down due to a packet drop or a temporary STP lockout. Which means yeah, you SHOULD have physically redundant connections for a cluster network... Okay so another set of NICs, and another switch, and another 40 ethernet patch cables... I already have 2 48 port switches almost at capacity.

2025-10-20 16:50:00 from 1 relay(s) ↑ Parent 2 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

*Packet loss* due to LACP issues

2025-10-20 16:50:39 from 1 relay(s) ↑ Parent Reply

Dimi npub1r7ps...nspg

Fellow windurr guigud admin, I feel this pain deep in my bones. But you must take responsibility for your environment. Dig deeper.

2025-10-20 16:51:52 from 1 relay(s) ↑ Parent 2 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

What does it look like? Currently my SAN has a single 10g link which is my single point of failure, and I have not found a way to handle fallback to the 4gb LACP group it also has.

2025-10-20 16:52:29 from 1 relay(s) ↑ Parent 1 replies ↓ Reply
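The fallback being asked about here is the active/backup pattern: prefer the fast link, and use the slower group only while the fast one is down. A minimal sketch of just that decision, with hypothetical link names and a hypothetical `choose_active` helper. It does not configure anything, and whether a given bonding setup can express this is not something the thread settles:

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    priority: int  # lower number = preferred
    up: bool

def choose_active(links):
    """Return the preferred link that is up, or None if all are down."""
    candidates = [link for link in links if link.up]
    return min(candidates, key=lambda link: link.priority, default=None)

links = [Link("10g", 0, True), Link("lacp-4x1g", 1, True)]
print(choose_active(links).name)  # 10g
links[0].up = False               # simulate the 10g link failing
print(choose_active(links).name)  # lacp-4x1g
```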

Dimi npub1r7ps...nspg

Won’t be discussing the setup for this one too much, but I will be in the future. I’ll make sure it’s live-streamed & I let you know ahead of time.

2025-10-20 16:53:23 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

We all have limited time and priorities. It will come, but it's still important to explain this nuance to people. If I had been told what I know now, I probably would never have switched to Proxmox. People see what they want to see and leave out the massive blocking complexities that require specialization in the domain. We can't all specialize in everything.

2025-10-20 16:54:52 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

Dimi npub1r7ps...nspg

TOTALLY FAIR!

2025-10-20 16:55:19 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

Sure please do! I'm wondering if I can use Active/Passive virtual IPs now to handle network failover. I don't have too much experience doing that on Linux. Literally learning that in practice as we speak.

2025-10-20 16:56:11 from 1 relay(s) ↑ Parent Reply

Dimi npub1r7ps...nspg

Personal use != business use. Enterprise use is a different beast entirely & needs someone who’s done this MANY times before.

2025-10-20 16:56:25 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

Very true, but I've always been somewhere in between. Like very prosumer anti-gatekeeping type of stuff XD. Like don't tell me I can't have HA at home, I'm gonna fucking send it out of spite.

2025-10-20 16:57:32 from 1 relay(s) ↑ Parent Reply

Enki enki@sovbit.host npub1gnwp...errq

We haven't had too many issues in the day job yet, but we're also not clustering our Proxmox instances... yet. Standalone, they seemed to work just fine. But we do have separate cores. It's going to get interesting when we start clustering them, I'm sure.

2025-10-20 16:58:13 from 1 relay(s) ↑ Parent 2 replies ↓ Reply

Enki enki@sovbit.host npub1gnwp...errq

I guess we'll see how many network engineers it takes to keep them stable. 🤣

2025-10-20 17:02:44 from 1 relay(s) ↑ Parent Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

I think it's one of those things you have to get right the first time, otherwise you're stuck in the hell of "I can't touch the network because the entire cluster will hard crash and take 25 minutes to come back online and God I hope no disks were corrupted". My UPSs are getting kind of old and sometimes brown-outs don't trip fast enough. I had an issue where a quick power loss tripped up my main switch, and looking at the logs, 3/5 nodes lost quorum and the whole cluster hard crashed and hardware reset. I lost 3 VMs in the process that I had to restore from backup. Took almost 2 hours to recover at 3 am XD

2025-10-20 17:02:51 from 1 relay(s) ↑ Parent 2 replies ↓ Reply
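For reference, the arithmetic behind "lost quorum" can be sketched as a strict-majority check. This assumes one vote per node and no tie-breaker or quorum device, which may not match the cluster described above; the function names are made up for illustration:

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for a strict majority."""
    return total_votes // 2 + 1

def is_quorate(votes_visible: int, total_votes: int) -> bool:
    """True if a partition seeing this many votes may keep running."""
    return votes_visible >= quorum_threshold(total_votes)

total = 5
print(quorum_threshold(total))  # 3
print(is_quorate(2, total))     # False
print(is_quorate(3, total))     # True
```

Under those assumptions, once three of five nodes are cut off, the two that can still see each other hold only 2 of 5 votes and are not quorate either, which would be consistent with every node fencing itself.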

Dimi npub1r7ps...nspg

Fr

2025-10-20 17:03:37 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

Also reminds me I want to consider getting switches with redundant hot-swap PSUs

2025-10-20 17:06:08 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

Dimi npub1r7ps...nspg

That anecdote is a power issue though. This is where a holistic approach matters highly, one bad power system fucking higher layers is so damn frequent and it sucks when that’s what you’re stuck using. No power, no revenue.

2025-10-20 17:06:48 from 1 relay(s) ↑ Parent 2 replies ↓ Reply

Dimi npub1r7ps...nspg

Edge-core. 👨‍🍳🤌💋

2025-10-20 17:07:09 from 1 relay(s) ↑ Parent Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

The network outage was intermittent, and 2 nodes still had quorum (yes I'm aware that's too low a vote). Taking the ENTIRE cluster down over 5 seconds of network loss is absolutely nuts to me. The machines in the picture I shared had consumer UPSs, crappier network cards, configurations, and switches in comparison and still did better in terms of stability.

2025-10-20 17:09:16 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

I also cannot fix crappy power, that's a condition I'd expect to survive... Line-interactive UPSs are prohibitively expensive even for many businesses.

2025-10-20 17:10:34 from 1 relay(s) ↑ Parent Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

Why can't Proxmox just kill all services to accomplish fencing? It takes like 10 minutes for a single server to boot into the OS. I think even the kernel watchdog can do a reset without a full system reboot.

2025-10-20 17:14:30 from 1 relay(s) ↑ Parent 2 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

And no I'm not going to adjust my hardware to boot faster, it's old and needs memory checking and firmware updates. Hardware reboot should not be a normal condition.

2025-10-20 17:16:46 from 1 relay(s) ↑ Parent Reply

Dimi npub1r7ps...nspg

I’ll have to explore behavior more, most settings can be tweaked so there should be enough wiggle room.

2025-10-20 17:27:18 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

ChipTuner ChipTuner@gitcitadel.com npub1qdjn...fqm7

I'm playing with Pacemaker and SBD, which appears to accomplish STONITH fencing via the kernel watchdog, but I'm not certain yet. I haven't gotten my cluster established yet.

2025-10-20 17:34:00 from 1 relay(s) ↑ Parent Reply
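The general idea behind watchdog-based self-fencing, as far as it can be modeled without real hardware: a node keeps "petting" a timer while it considers itself healthy, and if it stops, the timer expires and the machine is reset. The sketch below is a toy simulation with made-up names and timings, not the SBD or Proxmox implementation:

```python
class Watchdog:
    """Toy watchdog: expires if not petted within `timeout` seconds."""
    def __init__(self, timeout: float, now: float = 0.0):
        self.timeout = timeout
        self.last_pet = now

    def pet(self, now: float) -> None:
        self.last_pet = now

    def expired(self, now: float) -> bool:
        return now - self.last_pet > self.timeout

def simulate(timeout, quorate_until, end, step=1.0):
    """Pet each step while quorate; return the reset time, or None."""
    wd = Watchdog(timeout)
    t = 0.0
    while t <= end:
        if t <= quorate_until:  # node only pets while it thinks it is healthy
            wd.pet(t)
        if wd.expired(t):
            return t            # this is where the hardware reset would fire
        t += step
    return None

print(simulate(timeout=10, quorate_until=5, end=60))   # 16.0
print(simulate(timeout=10, quorate_until=60, end=60))  # None
```

In this model the timeout is the only slack: an outage shorter than the timeout is survived, anything longer ends in a reset, which is why that setting matters so much for brief network blips.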

npub12kqf...jh7s

Like this? https://github.com/khanh-ph/proxmox-kubernetes

2025-10-20 17:51:05 from 1 relay(s) ↑ Parent Reply