Do. It. Yourself.
Anon, really, it's not that hard. We have the tools.
nostr:note1cfqcewnxjwwv0zcduptdrdajng7dat37ap765x9yt9cmdjq726xq40dvcs
Replies (38)
Even the day job is switching away from ESXi to Proxmox.
Proxmox has its issues tho. I'm sure they all do. I used to run a Windows cluster, it was great but a lot to manage. Proxmox has horrible network stability issues and STONITH really be killer with corosync. Its HA storage is a little lacking tho. You either get hyperconverged (which I guess is getting popular) or standalone Ceph, which is not cost effective.
Something nostr:npub12262qa4uhw7u8gdwlgmntqtv7aye8vdcmvszkqwgs0zchel6mz7s6cgrkj and I discussed a little while ago
>network stability issues.
Skill issue. Coming from someone with one.
I really liked a 2 node HA SAN setup with iSCSI and virtual IPs. Windows had SMB multichannel as a backhaul for HA storage quorum. SMB has fantastic performance with Windows using link aggregation and load balancing, so you can easily use a mix of 10g fiber and 1g. Linux cannot do this at all. LACP blows in comparison. iSCSI multipath is better, but still. And I'm not sure if anyone has priced out 10gb switches lately...
I'm using 10g iSCSI rn in Proxmox
I partially agree, but coming from Windows networking, Linux networking sucks ass out of the box, even with the basic RTFM. Unless you learn to become a wizard (and I haven't), you don't get a balance of performance, HA, and hardware pricing. Windows just works out of the box.
That and there is no reason for a hard crash when a single node loses quorum. I'm currently learning about STONITH, but that seems unreasonable when a link goes down due to a packet drop or a temporary STP lockout.
Which means yeah, you SHOULD have physically redundant connections for a cluster network... Okay so another set of nics, and another switch, and another 40 ethernet patch cables... I already have 2 48 port switches almost at capacity.
*Packet loss* due to LACP issues
Fellow windurr guigud admin, I feel this pain deep in my bones. But you must take responsibility for your environment. Dig deeper.
What does it look like? Currently my SAN has a single 10g link, which is my single point of failure, and I have not found a way to handle fallback to the 4gb LACP group it also has.
Won't be discussing the setup for this one too much, but I will be in the future. I'll make sure it's live-streamed & I let you know ahead of time.
We all have limited time and priorities. It will come, it's still important to explain this nuance to people. If I had been told what I know now I probably would have never switched to proxmox. People see what they want to see and leave out the massive blocking complexities that require specialization in the domain.
We can't all specialize in everything.
TOTALLY FAIR!
Sure please do! I'm wondering if I can use Active/Passive virtual IPs now to handle network failover. I don't have too much experience doing that on Linux. Literally learning that in practice as we speak.
Personal use != business use. Enterprise use is a different beast entirely & needs someone who's done this MANY times before.
Very true, but I've always been somewhere in between. Like very prosumer anti-gatekeeping type of stuff XD. Like don't tell me I can't have HA at home, I'm gonna fucking send it out of spite.
We haven't had too many issues in the day job yet, but we're also not clustering our Proxmox instances... yet.
Stand alone, they seemed to work just fine. But we do have separate cores. It's going to get interesting when we start clustering them, I'm sure.
I guess we'll see how many network engineers it takes to keep them stable. 🤣
I think it's one of those things you have to get right the first time, otherwise you're stuck in the hell of "i can't touch the network because the entire cluster will hard crash and take 25 minutes to come back online and God I hope no disks were corrupted"
My UPSs are getting kind of old and sometimes brown-outs don't trip fast enough. I had an issue where a quick power loss tripped up my main switch and, looking at the logs, 3/5 nodes lost quorum and the whole cluster hard crashed and hardware reset. I lost 3 VMs in the process that I had to restore from backup. Took almost 2 hours to recover at 3 am XD
Fr
Also reminds me I want to consider getting switches with redundant hot-swap PSUs
That anecdote is a power issue though. This is where a holistic approach matters highly. One bad power system fucking higher layers is so damn frequent, and it sucks when that's what you're stuck using.
No power, no revenue.
Edge-core. 👨‍🍳
The network outage was intermittent, and 2 nodes still had quorum (yes I'm aware that's too low a vote). Taking the ENTIRE cluster down over 5 seconds of network loss is absolutely nuts to me. The machines in the picture I shared had consumer UPSs, crappier network cards, configurations, and switches in comparison and still did better in terms of stability.
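For anyone following along, here's the quorum arithmetic as a minimal sketch. It assumes one vote per node and plain majority voting, with no qdevice and no custom vote weights. The function names are made up for illustration, not anything from corosync or Proxmox.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the expected votes (assumed rule)."""
    return total_votes // 2 + 1


def is_quorate(reachable_votes: int, total_votes: int) -> bool:
    """True if a partition that sees `reachable_votes` holds a majority."""
    return reachable_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    total = 5  # cluster size from the anecdote above
    for reachable in range(total, 0, -1):
        state = "quorate" if is_quorate(reachable, total) else "NOT quorate"
        print(f"{reachable}/{total} nodes reachable: {state}")
```

Under those assumptions a 5-node cluster needs 3 votes, so a partition of 2 is not quorate. If the other 3 nodes also can't see each other, no partition holds a majority, which would be consistent with every node getting fenced at once.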
I also cannot fix crappy power, that's a condition I'd expect to survive... Line interactive UPS are prohibitively expensive even for many businesses.
Why can't Proxmox just kill all services to accomplish fencing? It takes like 10 minutes for a single server to boot into the OS. I think even the kernel watchdog can do a reset without a full system reboot.
And no I'm not going to adjust my hardware to boot faster, it's old and needs memory checking and firmware updates. Hardware reboot should not be a normal condition.
I'll have to explore behavior more, most settings can be tweaked so there should be enough wiggle room.
I'm playing with Pacemaker and SBD, which appears to accomplish STONITH fencing via the kernel watchdog, but I'm not certain yet. I haven't gotten my cluster established yet.
The variance in latency is only a few ms at most and it collapses. Like wtf?
Proxmox HA assumes an unrealistic latency target.
Personal clusters between virtual & physical nodes & networks haven't had a problem. I'll circle back.
have you ever worked @ the \/\/hitehouse
Spooky question
4awhile th@z all i remember O;.;’ .trick’treat*****
ITz th@ "cirle back" bro
Did you just call me a suitcoiner?
can't remember faces & name JUSTmemez | R *U* imply-ING ima shITcoiner?