| amarsh04 | git bisecting a problem with hardware MIDI playback that started with kernel 6.5-rc1 kernels | 02:59 |
|---|---|---|
| systemdlete | I have a daedalus VM with a frozen desktop. I can ssh into the VM from the host, and I can run htop, etc. The desktop in the VM simply doesn't take input. The mouse, however, does move normally. What should I look at first? | 22:21 |
| systemdlete | I have another daedalus VM on a different host that does not freeze. | 22:22 |
| systemdlete | Both hosts have 32GB ram, but one is a FX8350 and the other has an Athlon II X6. | 22:23 |
| systemdlete | The host does not seem to have any noticeable problems with its own desktop. The VM's desktop will run for many hours, then freeze. | 22:24 |
| systemdlete | While ssh'd into the daedalus VM, df shows me there is plenty of disk space. And htop does not seem to be screaming any pain. | 22:25 |
| gnarface | the frozen xorg instance isn't taking up any cpu? | 22:26 |
| systemdlete | gnarface, htop shows xorg using 2.0 MB and 0 cpu | 22:28 |
| gnarface | systemdlete: what about its children? | 22:28 |
| gnarface | not sure but i suspect something running inside that xorg instance is hung and needs to be killed, then the desktop will unfreeze | 22:29 |
| systemdlete | one child, using same cpu and mem, child marked D | 22:29 |
| gnarface | hmmm | 22:30 |
| systemdlete | maybe restart xorg? | 22:30 |
| gnarface | yea that doesn't seem like enough children | 22:30 |
| systemdlete | or will that cause widespread mayhem | 22:30 |
| gnarface | usually you should have a window manager and some other stuff... | 22:30 |
| gnarface | mine has one child that's just another Xorg instance | 22:31 |
| systemdlete | xfwm4 is running | 22:31 |
| gnarface | but nothing inside it? | 22:32 |
| gnarface | no actual programs just an empty desktop? | 22:32 |
| systemdlete | looks like xfwm4 has 11 children | 22:33 |
| systemdlete | I have several shell windows open, but only 1 is max'd the others are minimized | 22:34 |
| gnarface | it's gotta be one of them, not necessarily a maxxed one | 22:34 |
| systemdlete | btw, I did that specifically to run a test on this problem. I wanted to eliminate things like browsers etc that can mess things up | 22:34 |
| systemdlete | so maybe try killing one at a time? | 22:34 |
| gnarface | yea | 22:35 |
| systemdlete | I will start with the highest numbered pid and work backwards | 22:36 |
| systemdlete | whoa. | 22:37 |
| systemdlete | SIGTERM didn't work, so I tried SIGHUP. That didn't work either. | 22:37 |
| systemdlete | So I sent that pid SIGKILL, but it killed off all of them apparently. | 22:38 |
| systemdlete | VM desktop is still frozen though | 22:38 |
| gnarface | brutal | 22:39 |
| systemdlete | I wonder, gnarface, if maybe I should try killing off the windows themselves | 22:39 |
| gnarface | maybe yea | 22:39 |
| systemdlete | stand by have to install wmctrl | 22:40 |
| gnarface | you killed the processes, the processes are all gone, and the windows from them are still there? | 22:40 |
| gnarface | that's not right... | 22:41 |
| gnarface | that suggests xfce itself froze | 22:41 |
| gnarface | or went out to lunch somehow | 22:41 |
| systemdlete | well, the desktop still has not refreshed | 22:41 |
| systemdlete | so check panera bread? | 22:41 |
| systemdlete | :D | 22:41 |
| systemdlete | (sorry, getting punchy here. Been scratching my head for days over this.) | 22:42 |
| gnarface | try perf top and radeontop? | 22:42 |
| systemdlete | not familiar with those, but I can install them | 22:42 |
| gnarface | it's amd gpu too right? | 22:43 |
| gnarface | or is it nvidia? | 22:43 |
| gnarface | radeontop won't help for nvidia | 22:43 |
| systemdlete | hold on... | 22:43 |
| gnarface | but perf might still | 22:43 |
| systemdlete | M5A78L/USB3 board iirc | 22:44 |
| systemdlete | and no external graphics card, so its all amd | 22:44 |
| systemdlete | (yes) | 22:44 |
| * systemdlete | looking through system logs inside VM for clues... | 22:46 |
| gnarface | check what size the xorg log is | 22:46 |
| gnarface | see if it's got a lot of repeating warnings | 22:46 |
| gnarface | (or errors) | 22:47 |
| systemdlete | uh-oh... | 22:47 |
| systemdlete | .xsession-errors is dated Mar 3 and xorg log Mar 4. I've had to do the nasty to reboot the VM, so probably those files were not updated | 22:47 |
| systemdlete | which makes me wonder | 22:48 |
| * systemdlete | checks to see if file systems are mounted ro instead of rw | 22:48 |
| systemdlete | nope. FS are all mounted rw. | 22:48 |
| systemdlete | (was just a thought) | 22:48 |
| gnarface | you're looking in ~/.local/share/xorg/ ? | 22:48 |
| systemdlete | oh, sorry no. Was looking at /var/log/Xorg... | 22:49 |
| systemdlete | ah, now that one is dated Mar 14 | 22:49 |
| systemdlete | uptime is 8 days | 22:50 |
| gnarface | being not nvidia, i would assume it would have migrated to running as the user instead of suid root, which would have moved the log to ~/.local/share/xorg/ | 22:50 |
| gnarface | this is a new change | 22:50 |
| systemdlete | ? | 22:50 |
| gnarface | pretty much everything except nvidia runs xorg as the user now | 22:50 |
| gnarface | so the logs go in the home dir instead of /var/log/ | 22:51 |
| systemdlete | I launch xfce from command line after log in. | 22:51 |
| gnarface | even if you run startx, nvidia drivers are still wired to start as suid root, afaik | 22:51 |
| systemdlete | Was having a lot of problems with WMs | 22:51 |
| systemdlete | well, apparently that is not the problem here, as you said | 22:51 |
| gnarface | yes | 22:52 |
| gnarface | is that Xorg log very large? | 22:52 |
| systemdlete | All I meant is that xorg is running as user | 22:52 |
| systemdlete | the one in ~/.local/share/xorg is 27132 bytes | 22:52 |
| gnarface | nah that's not a problem then | 22:52 |
| systemdlete | last 2 messages are (EE) No surface to present from. | 22:53 |
| gnarface | oh, that's a problem | 22:53 |
| gnarface | are there more errors before that? | 22:53 |
| systemdlete | lots of messages, one other error earlier: "(EE) open /dev/fb0: Permission denied" | 22:54 |
| systemdlete | but that was at 52 secs or so, and the desktop had been working fine for a few days | 22:55 |
| gnarface | a couple errors might be normal while auto-detect brute-force fails its way through every driver until it finds one that works | 22:55 |
| gnarface | the last one where it says your surface has gone missing though, that seems like a smoking gun | 22:55 |
| gnarface | this makes it seem more like a driver issue | 22:56 |
| systemdlete | Those are at 60 secs in | 22:56 |
| systemdlete | oooh | 22:56 |
| systemdlete | maybe I need to install some specific FW for this board? | 22:57 |
| systemdlete | (main board) | 22:57 |
| gnarface | possible, or maybe just make it use a different xorg driver | 22:57 |
| systemdlete | hmmm. I never installed firmware-amd-graphics | 22:59 |
| systemdlete | do I need that in a VM? | 22:59 |
| gnarface | no, i doubt that | 22:59 |
| gnarface | but you might need some vm drivers | 23:00 |
| systemdlete | The VM drivers are installed | 23:00 |
| gnarface | virtio or something like that | 23:00 |
| rustyaxe | ya fbdev is quite old, do any modern drivers use that? | 23:00 |
| gnarface | arm hardware still maybe | 23:00 |
| rustyaxe | I think thats just xorg probing for the display and trying an old device (fbdev still exists after all but you probably dont want to use it instead of a more optimized driver) | 23:00 |
| systemdlete | rustyaxe, gnarface: Keep in mind this is 15+ year old tech. AM3 platforms | 23:01 |
| systemdlete | using a built-in video fw | 23:01 |
| rustyaxe | fbdev still predates that | 23:01 |
| rustyaxe | you're passing the video through to the vm? or using emulated video? | 23:02 |
| rustyaxe | That'll decide which driver the guest needs | 23:02 |
| systemdlete | well, I'm wondering if parts of the system are starting to drop support for "older" hardware, esp. if the drivers from them might not be quite up to the standard for u-know-what | 23:02 |
| systemdlete | virtualbox, using kvm virtualization | 23:03 |
| rustyaxe | we still have phenom ii machines running fine | 23:03 |
| rustyaxe | so no the guest shouldnt need the amd graphics stuff as it wont be talking to it, but rather the emulated video card | 23:03 |
| systemdlete | btw, there is just the host and two VMs, and one of the VMs is small (under 1MB) | 23:04 |
| systemdlete | cool. | 23:04 |
| gnarface | if it's like qemu, you'll want to make sure you're loading the virtual driver modules inside the guest | 23:04 |
| gnarface | i forget if you'll need to set xorg.conf too | 23:04 |
| systemdlete | so maybe gnarface's suggestion to switch to a different xorg driver? | 23:04 |
| rustyaxe | yea you can likely select which video card is emulated which will change which driver in the guest you need | 23:04 |
| rustyaxe | I dont use virtualbox, rather proxmox and virt-manager where needful, but im sure its similar to them | 23:05 |
| systemdlete | I have the VM set to use VMSVGA, which is the one recommended for most VMs | 23:05 |
| systemdlete | video memory is 128K | 23:06 |
| systemdlete | ooops | 23:06 |
| systemdlete | 128M | 23:06 |
| systemdlete | and the VM has 8GB RAM | 23:06 |
| gnarface | does it have its own system clock or do they all use the host system clock? | 23:07 |
| systemdlete | I have all of my VMs and hosts using one NTP server, which in turn uses an upstream NTP server | 23:08 |
| gnarface | grasping at straws here, but maybe emulated clock drift could be destabilizing it? | 23:08 |
| systemdlete | good point | 23:08 |
| systemdlete | let me see if it is off | 23:08 |
| systemdlete | no, not by more than a second or so | 23:08 |
| systemdlete | but good idea to check that | 23:08 |
| systemdlete | time sync can be a hazard, esp for network communications | 23:09 |
| gnarface | another daedalus change was the forced migration to ntpsec from ntp, and in the merge of the new example ntp.conf, you might have, like me, accidentally inherited a "...minsec 3" line which, if you're using just one ntp server, will cause it to ignore that server | 23:09 |
| gnarface | and then it will drift if it doesn't have a real clock | 23:09 |
| systemdlete | Vbox does provide a clock, but as far as I know, I don't use that (except for sync'ing up at VM boot). | 23:10 |
| gnarface | ah | 23:10 |
| systemdlete | right | 23:10 |
| gnarface | with qemu i tell it to use the host's clock because when left to its own devices it screws up | 23:10 |
| systemdlete | do you mean minsane? I have that set to 1 | 23:10 |
| gnarface | yea, meant minsane, sorry | 23:11 |
| systemdlete | np. I knew what you meant. | 23:11 |
| gnarface | yes, it should be 1 | 23:11 |
| systemdlete | of course, I'm no longer sure just how much any of this means now that browsers and maybe other programs are using NTP over HTTP or something | 23:11 |
| systemdlete | at any rate, I am not noticing any huge amount of drift, at least not in this case | 23:13 |
| systemdlete | although, maybe at the very moment that the freeze begins, there might be a lag. The only problem with that theory, is that I have other VMs that do not have desktops freezing intermittently | 23:14 |
| systemdlete | I have developed an extensive checklist of gotchas for new VMs and hosts, exactly for this reason. Updating the minsane value is just one of dozens | 23:16 |
| gnarface | do you have any shared mounts with the VMs? | 23:17 |
| systemdlete | yes! | 23:17 |
| gnarface | i wonder if it could be file contention in a shared mount | 23:17 |
| systemdlete | I have a LAN server and just about every host and VM is normally mounted to it. | 23:17 |
| gnarface | cache directory or something maybe...? | 23:17 |
| systemdlete | I use that as a sort of "clipboard" for passing files and data back and forth | 23:18 |
| rustyaxe | A hazard? | 23:18 |
| rustyaxe | You mean a must | 23:18 |
| systemdlete | I'm not using it to boot from, nor for any ongoing file operations. Just to transfer files around. | 23:18 |
| rustyaxe | Many network protocols wont work without good time sync | 23:19 |
| systemdlete | rustyaxe, yes. I have noticed that! | 23:19 |
| rustyaxe | Generally just throw chrony in them and it'll do the right thing | 23:19 |
| systemdlete | N.B.: elogind-daemon is in "D" state and does not seem to ever change. | 23:23 |
| gnarface | do you actually need elogind if you're starting Xorg with startx? | 23:24 |
| systemdlete | no, probably not. I think it is an artifact of when I used to login using a WM | 23:25 |
| systemdlete | ok, I have disabled elogind | 23:26 |
| systemdlete | I could kill off the remaining processes, one by one, hoping to narrow it down a bit. | 23:28 |
| systemdlete | But if I kill them in the wrong order, it could kill dependent processes off also, so I won't get an accurate fix. | 23:28 |
| systemdlete | now this is whacked. "service elogind stop" but elogind-daemon is still running | 23:28 |
| gnarface | suspicious | 23:28 |
| systemdlete | it won't die | 23:29 |
| systemdlete | kill -KILL 1885 (pid of elogind daemon) doesn't do anything | 23:29 |
| gnarface | give it a few | 23:29 |
| systemdlete | its parent is PID 1 | 23:29 |
| gnarface | maybe Xorg is gonna have to be killed | 23:29 |
| gnarface | i would still try to kill the running programs under xfce first though | 23:30 |
| systemdlete | there is still a xfce4-terminal running | 23:31 |
| systemdlete | I thought I'd killed that with the shells | 23:31 |
| systemdlete | stopped dbus, but dbus-launch still running--is that right? | 23:35 |
| systemdlete | desktop is still frozen at this point | 23:35 |
| systemdlete | even though I've killed off xfce4-* processes | 23:35 |
| systemdlete | (I am shelled into the VM as root, incidentally) | 23:36 |
| systemdlete | killed off at-spi-bus-launcher and now dbus-daemon is gone | 23:37 |
| systemdlete | and desktop still frozen | 23:37 |
| gnarface | is the window manager process still there? | 23:39 |
| systemdlete | ps -ef |grep wm shows nothing | 23:40 |
| systemdlete | ok, killed off all the vbox client processes, still frozen | 23:40 |
| systemdlete | xinit, Xorg, xfsettingsd, and xfdesktop still running | 23:43 |
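
Several processes in the log above (the Xorg child and elogind-daemon) were reported in state "D", and `kill -KILL` had no visible effect on them. That matches uninterruptible sleep on Linux: a process blocked that way does not act on signals, SIGKILL included, until the kernel call it is waiting on returns. The following is a minimal sketch, not something anyone in the channel ran, that lists every process currently in state D by reading `/proc/<pid>/stat`. The function names `parse_stat` and `uninterruptible_processes` are made up for illustration, and the script assumes a Linux guest with `/proc` mounted.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) parsed from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    fields = stat_text[right + 1:].split()
    state = fields[0]      # single letter: R, S, D, Z, T, ...
    ppid = int(fields[1])  # parent PID
    return pid, comm, state, ppid


def uninterruptible_processes(proc_root="/proc"):
    """Return a list of (pid, comm, ppid) for processes in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state, ppid = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the file was unreadable
        if state == "D":
            found.append((pid, comm, ppid))
    return found


if __name__ == "__main__":
    for pid, comm, ppid in uninterruptible_processes():
        print(f"{pid}\t{comm}\tppid={ppid}")
```

Running it a few times over an ssh session, some seconds apart, would show whether the same PIDs stay stuck in D, which points at whatever kernel-side operation they are waiting on instead of at the processes themselves.
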