Troubleshooting random hangs

suvayu ali
Hi,

I have been having random hangs on my new Ryzen workstation (Ryzen 5
2400G + B350 mobo). My hardware is supposedly properly supported on
4.15+ kernels.  But I have been unable to boot with any of the ones in
the repo.

That said, I can boot with older kernels, but the desktop hangs
randomly.  When I say hang, I mean it freezes, and my only recourse is
to reset my computer.  I have tried to log in remotely, but then I get
"No route to host" from ssh.  Looking at the journal, I can't figure
out what is causing these hangs.  If someone could have a look, that
would be wonderful.

Logs from the last two hangs:

https://paste.fedoraproject.org/paste/8T~X8BYuVboAJK3Mkal72A
https://paste.fedoraproject.org/paste/qypJAnAKE01GD-OgC6n0SQ

TIA,

--
Suvayu

Open source is the future. It sets us free.
_______________________________________________
users mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]

Re: Troubleshooting random hangs

Jorge Martínez López-2
Hi Suvayu,


On Mon, 5 Mar 2018 at 16:23 Suvayu Ali <[hidden email]> wrote:
> Hi,
>
> I have been having random hangs on my new Ryzen workstation (Ryzen 5
> 2400G + B350 mobo). My hardware is supposedly properly supported on
> 4.15+ kernels.  But I have been unable to boot with any of the ones in
> the repo.
>
> That said, I can boot with older kernels, but the desktop hangs
> randomly.  When I say hang, I mean it freezes, and my only recourse is
> to reset my computer.  I have tried to login remotely, but then I get
> "No route to host" from ssh.  Looking at the journal, I can't figure
> out what is causing these hangs.  If someone could have a look, that
> would be wonderful.
>
> Logs from the last two hangs:
>
> https://paste.fedoraproject.org/paste/8T~X8BYuVboAJK3Mkal72A
> https://paste.fedoraproject.org/paste/qypJAnAKE01GD-OgC6n0SQ
>
> TIA,


Does the screen freeze completely (e.g. the clock stops updating)? Does Magic SysRq still respond? Are you running Chromium by any chance?
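
In case it helps with the SysRq question: which SysRq functions are allowed depends on the value in /proc/sys/kernel/sysrq, so a key combination that appears dead may simply be disabled. Below is a minimal Python sketch that decodes that value. The bit meanings are taken from the kernel's sysrq documentation as I understand it; the function name `describe_sysrq` is just a hypothetical helper for illustration.

```python
from pathlib import Path

# Bit meanings as listed in the kernel's sysrq documentation.
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all real-time tasks",
}


def describe_sysrq(value: int) -> list:
    """Return the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []                   # SysRq disabled completely
    if value == 1:
        return ["all functions"]    # 1 is a special case: everything enabled
    # Any other value is treated as a bitmask of allowed function groups.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    raw = Path("/proc/sys/kernel/sysrq").read_text().strip()
    print(raw, describe_sysrq(int(raw)))
```

If only "sync" shows up, the other key combinations would be ignored even on a machine that is still partly alive, so it may be worth enabling more functions before the next hang.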

I've been experiencing similar hangs, first with an AMD Phenom II processor and recently with a Ryzen 5 2400G and an X370 motherboard. This morning I left the computer unattended with Chromium and top open, and top showed Chromium taking 34% of the CPU around the time of the crash. Then I left the computer unattended without Chromium running, and it ran happily for hours.

I might have a go at running Chromium with debug logs to see if I can see anything unusual. 

Reviewing the journal from a few crashes ago, I saw some Chromium errors regarding the GPU, though that might be a red herring.

Frustrating.

Greetings,
Jorge


Re: Troubleshooting random hangs

Jorge Martínez López-2
Hello all,

On Tue, 13 Mar 2018 at 19:19 Jorge Martínez López <[hidden email]> wrote:
> Does the screen freeze completely (e.g. the clock doesn't change)? Magic SysRq doesn't work? Are you running Chromium by any chance?


I did some research and found the following kernel bug:

https://bugzilla.kernel.org/show_bug.cgi?id=196683

Fedora has CONFIG_RCU_NOCB_CPU=y in the 4.15.8 kernel configuration, so I added "rcu_nocbs=0-7" to the boot parameters and it has been running stable for a while. I have also added "nopti", as there is some anecdotal evidence that it improves stability, but I'm not sure about that.
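
To confirm that the parameters actually took effect after a reboot, the running kernel's command line can be read from /proc/cmdline. Here is a minimal Python sketch of that check. The helper names `parse_cmdline` and `missing_params` are hypothetical, and the simple whitespace split assumes no quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, wanted=("rcu_nocbs", "nopti")) -> list:
    """Return the wanted parameter names that are absent from the command line."""
    params = parse_cmdline(cmdline)
    return [name for name in wanted if name not in params]


if __name__ == "__main__":
    running = Path("/proc/cmdline").read_text()
    print("parsed:", parse_cmdline(running))
    print("missing:", missing_params(running))
```

An empty "missing" list would mean both parameters reached the kernel. It says nothing about whether they are the reason for the improved stability.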

There are also some discussions on the bug page about old PSUs not providing good enough low voltage. AMD is recommending a newer PSU (post-Haswell), but for the time being the boot config is working for me.

Greetings,
Jorge


Re: Troubleshooting random hangs

suvayu ali
Hi Jorge,

I didn't see your responses until today!  I guess I got some clarity from our
bugzilla discussions.

On Fri, Mar 16, 2018 at 09:43:04AM +0000, Jorge Martínez López wrote:
> I did some research and found the following kernel bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=196683
>
> Fedora has CONFIG_RCU_NOCB_CPU=y in the 4.15.8 kernel configuration so I
> added "rcu_nocbs=0-7" to the boot parameters and it has been running stable
> for a while. I have also added "nopti" as well as there is some anecdotal
> evidence it improves stability but I'm not sure about that.

I think my tracebacks are very different.  That said, it also seems to me I'm
having freezes due to several unrelated reasons, and AMDGPU is probably one
among many.

> There is also some discussions in the bug page about old PSUs not providing
> good enough low voltage, AMD is recommending running a newer PSU
> (post-Haswell) but for the time being the boot config is working for me.

This is an interesting point, but very difficult to test :-|.

I haven't been able to debug my issues successfully as nothing useful really
shows up in the journal.  I was hoping someone could suggest a way so that I
could get more information to file a more specific bug report.
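
One thing I may try in the meantime, sketched below in Python, is pulling only the kernel messages at warning level or above from the boot that ended in the hang. This assumes the journal is stored persistently, and a hard freeze may well prevent the last messages from reaching disk, so it is a starting point and not a guaranteed way to catch the culprit. The helper name `previous_boot_cmd` is hypothetical; the journalctl options are the standard --boot, --priority, --dmesg and --no-pager.

```python
import subprocess


def previous_boot_cmd(boots_back: int = 1, priority: str = "warning",
                      kernel_only: bool = True) -> list:
    """Build a journalctl command for an earlier boot, filtered by priority."""
    cmd = [
        "journalctl",
        "--no-pager",
        f"--boot=-{boots_back}",      # -1 is the boot before the current one
        f"--priority={priority}",     # this level and anything more severe
    ]
    if kernel_only:
        cmd.append("--dmesg")         # restrict output to kernel messages
    return cmd


if __name__ == "__main__":
    result = subprocess.run(previous_boot_cmd(), capture_output=True,
                            text=True, check=False)
    # The tail of the output is closest in time to the hang.
    print(result.stdout[-4000:])
```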

Any thoughts anyone?

TIA,

--
Suvayu

Open source is the future. It sets us free.