Help debug a network issue

Robin Laing
Hello,

I am trying to track down a problem with a laptop: when it goes into
suspend for any reason, the network won't come back up.  Only a reboot
will enable the wired network.

This problem started in February after a kernel update on Fedora 26.
I upgraded to Fedora 27 today and the problem still persists; I was
hoping it would be fixed.

The only indication of any issue is an error message that pops up.
        kernel: do_IRQ: 7.33 No irq handler for vector.

I would like to find more details but if I cannot I will just file a bug
against the kernel.

Robin
_______________________________________________
users mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]

Re: Help debug a network issue

Samuel Sieb
On 03/25/2018 02:49 PM, Robin Laing wrote:
> I am trying to trace down a problem with a laptop that when it goes into
> suspend for any reason, the network won't come back up.  Only a reboot
> will enable the wired network.

Have you tried unloading and reloading the kernel module?

> The only indication of any issue is an error message that pops up.
>      kernel: do_IRQ: 7.33 No irq handler for vector.

These messages are usually benign.

> I would like to find more details but if I cannot I will just file a bug
> against the kernel.

Have you checked the journal for the time around the resume?  Note that
the first chunk of messages at the resume time is actually from the end
of the suspend before the resume.
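A minimal sketch of narrowing the journal down to the interesting lines, assuming the driver is r8169 and the interface is enp4s0 (both as reported later in this thread). The helper name `net_events` is hypothetical; it only keeps lines that mention the driver or the interface, and the time window in the usage comment is an example.

```shell
#!/bin/sh
# net_events IFACE: read journal text on stdin and keep only the lines
# that mention the r8169 driver or the given interface name.
# (The r8169 pattern and the helper name are assumptions for this example.)
net_events() {
    grep -E "r8169|$1"
}

# Example usage (not run here): limit the journal to a window around the
# resume, then filter it. Remember the first lines after the resume
# timestamp may still belong to the end of the suspend.
#   journalctl --since "2018-03-25 21:30" --until "2018-03-25 21:35" | net_events enp4s0
```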

It would also be useful to know the network chipset.  "lspci -v" will
tell you both the chipset and the kernel driver being used.  After
resume, try doing "modprobe -r <modulename>", then if that was
successful, do "modprobe <modulename>" and see if that fixes it.
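A sketch of pulling the driver name for the Ethernet controller out of `lspci -v` output, so it can be fed to the `modprobe -r` / `modprobe` pair. The helper name `ethernet_driver` is hypothetical, and it assumes the usual `lspci -v` layout where each device starts with an unindented bus-address line followed by indented details such as `Kernel driver in use:`.

```shell
#!/bin/sh
# ethernet_driver: read `lspci -v` output on stdin and print the driver
# in use for the first Ethernet controller found.
ethernet_driver() {
    awk '
        # Unindented lines start a new device; remember whether it is Ethernet.
        /^[0-9a-fA-F]/ { in_eth = /Ethernet controller/ }
        # Inside that device, report the bound driver and stop.
        in_eth && /Kernel driver in use:/ { print $NF; exit }
    '
}

# Example usage (not run here; reloading the module needs root):
#   drv=$(lspci -v | ethernet_driver)
#   sudo modprobe -r "$drv" && sudo modprobe "$drv"
```

The `&&` in the usage comment mirrors the advice above: only load the module again if unloading it succeeded.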

Re: Help debug a network issue

Robin Laing
On 25/03/18 17:34, Samuel Sieb wrote:

> On 03/25/2018 02:49 PM, Robin Laing wrote:
>> I am trying to trace down a problem with a laptop that when it goes
>> into suspend for any reason, the network won't come back up.  Only a
>> reboot will enable the wired network.
>
> Have you tried unloading and reloading the kernel module?
>
>> The only indication of any issue is an error message that pops up.
>>      kernel: do_IRQ: 7.33 No irq handler for vector.
>
> These messages are usually benign.
>
>> I would like to find more details but if I cannot I will just file a
>> bug against the kernel.
>
> Have you checked the journal for the time around the resume?  Note that
> the first chunk of messages at the resume time are actually from the end
> of the suspend before the resume.
>
> It would also be useful to know the network chipset.  "lspci -v" will
> tell you both the chipset and the kernel driver being used.  After
> resume, try doing "modprobe -r <modulename>", then if that was
> successful, do "modprobe <modulename>" and see if that fixes it.


I have looked through the journal logs before, but I am still learning journalctl.


Looking through my notes, the problem seems to start around Feb 26.




Network controller is:

Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411
PCI Express Gigabit Ethernet Controller (rev 0c)

Module is:

Kernel modules: r8169

From the journal:
The lid closing is detected and NetworkManager shuts down the connection.

The network interface name is enp4s0.


Start of suspend

Mar 25 21:31:44 xx NetworkManager[7949]: <info>  [1522013504.0482]
device (enp4s0): state change: activated -> deactivating (reason
'sleeping', internal state 'managed')

Mar 25 21:31:44 xx NetworkManager[7949]: <info>  [1522013504.0920]
device (enp4s0): state change: deactivating -> disconnected (reason
'sleeping', internal state 'managed')
Mar 25 21:31:44 xx avahi-daemon[7862]: Withdrawing address record for
2001:56a:7680:b500:4216:7eff:fe10:e09a on enp4s0.
Mar 25 21:31:44 xx NetworkManager[7949]: <info>  [1522013504.0926] dhcp6
(enp4s0): canceled DHCP transaction
Mar 25 21:31:44 xx avahi-daemon[7862]: Leaving mDNS multicast group on
interface enp4s0.IPv6 with address 2001:56a:7680:b500:4216:7eff:fe10:e09a.
Mar 25 21:31:44 xx avahi-daemon[7862]: Joining mDNS multicast group on
interface enp4s0.IPv6 with address fe80::4216:7eff:fe10:e09a.
Mar 25 21:31:44 xx avahi-daemon[7862]: Registering new address record
for fe80::4216:7eff:fe10:e09a on enp4s0.*.
Mar 25 21:31:44 xx avahi-daemon[7862]: Withdrawing address record for
fe80::4216:7eff:fe10:e09a on enp4s0.
Mar 25 21:31:44 xx avahi-daemon[7862]: Leaving mDNS multicast group on
interface enp4s0.IPv6 with address fe80::4216:7eff:fe10:e09a.
Mar 25 21:31:44 xx avahi-daemon[7862]: Interface enp4s0.IPv6 no longer
relevant for mDNS.
Mar 25 21:31:44 xx avahi-daemon[7862]: Withdrawing address record for
192.168.1.21 on enp4s0.
Mar 25 21:31:44 xx avahi-daemon[7862]: Leaving mDNS multicast group on
interface enp4s0.IPv4 with address 192.168.1.21.
Mar 25 21:31:44 xx avahi-daemon[7862]: Interface enp4s0.IPv4 no longer
relevant for mDNS.
Mar 25 21:31:44 xx NetworkManager[7949]: <info>  [1522013504.0950]
device (enp4s0): state change: disconnected -> unmanaged (reason
'sleeping', internal state 'managed')
Mar 25 21:31:44 xx nm-dispatcher[9588]: req:2 'down' [enp4s0]: new
request (6 scripts)
Mar 25 21:31:44 xx nm-dispatcher[9588]: req:2 'down' [enp4s0]: start
running ordered scripts...

Start of resume from suspend (lid opened)

Mar 25 21:33:45 xx NetworkManager[7949]: <info>  [1522013625.9934]
device (enp4s0): state change: unmanaged -> unavailable (reason
'managed', internal state 'managed')
Mar 25 21:33:45 xx kernel: IPv6: ADDRCONF(NETDEV_UP): enp4s0: link is
not ready
Mar 25 21:33:46 xx kernel: r8169 0000:04:00.0 enp4s0: link down
Mar 25 21:33:46 xx kernel: IPv6: ADDRCONF(NETDEV_UP): enp4s0: link is
not ready


This laptop is using KDE and sddm.  Is there a

Looking further through the log files at another suspend today, I came
across this:

Mar 26 01:07:54 xx kernel: r8169 0000:04:00.0 enp4s0: link down

Also, I found this, but I am not sure if it is related:


Mar 26 01:07:54 xx audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295
ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=NetworkManager-dispatcher comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 26 01:07:55 xx ModemManager[1116]: <info>  Couldn't check support
for device at '/sys/devices/pci0000:00/0000:00:1c.2/0000:03:00.0': not
supported by any plugin
Mar 26 01:07:55 xx  ModemManager[1116]: <info>  Couldn't check support
for device at '/sys/devices/pci0000:00/0000:00:1c.3/0000:04:00.0': not
supported by any plugin
Mar 26 01:07:55 xx kernel: do_IRQ: 7.33 No irq handler for vector


Looking further into the log files, I don't see any mention of r8169
before March 17, when I tried a change to the boot parameters that I
found on the net, almost a month after the problem started.

pci=nomsi,noaer


I will try the modprobe when I can.

Robin

Re: Help debug a network issue

Robin Laing
On 25/03/18 17:34, Samuel Sieb wrote:

> On 03/25/2018 02:49 PM, Robin Laing wrote:
>> I am trying to trace down a problem with a laptop that when it goes
>> into suspend for any reason, the network won't come back up.  Only a
>> reboot will enable the wired network.
>
> Have you tried unloading and reloading the kernel module?
>
>> The only indication of any issue is an error message that pops up.
>>      kernel: do_IRQ: 7.33 No irq handler for vector.
>
> These messages are usually benign.
>
>> I would like to find more details but if I cannot I will just file a
>> bug against the kernel.
>
> Have you checked the journal for the time around the resume?  Note that
> the first chunk of messages at the resume time are actually from the end
> of the suspend before the resume.
>
> It would also be useful to know the network chipset.  "lspci -v" will
> tell you both the chipset and the kernel driver being used.  After
> resume, try doing "modprobe -r <modulename>", then if that was
> successful, do "modprobe <modulename>" and see if that fixes it.


Finally got to try the modprobe and it did restart the network.

sudo modprobe -r r8169
sudo modprobe r8169

So, what is my next step in finding out why this won't restart on suspend?


Robin

Re: Help debug a network issue

Samuel Sieb
On 03/25/2018 07:45 PM, Robin Laing wrote:
> Network controller is:
>
> Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411
> PCI Express Gigabit Ethernet Controller (rev 0c)
>
> Module is:
>
> Kernel modules: r8169

That's the right driver.  One thing you could try is after removing the
module, try "modprobe r8169 debug=n" where n is a number up to 16.  That
will give you more debugging info in the log.  Careful, 16 might really
spam the log, so maybe start at 8 and work your way up.
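A sketch of that unload/reload cycle with a debug level, assuming it is run as root. `reload_module` and the `MODPROBE` override (which lets you substitute `echo` for a dry run) are hypothetical conveniences for this example; `debug=8` follows the suggestion to start at 8 and work up.

```shell
#!/bin/sh
# reload_module MODULE [PARAM=VALUE ...]: unload MODULE and, only if that
# succeeded, load it again with the given module parameters.
# MODPROBE defaults to modprobe; set MODPROBE=echo to print the arguments instead.
reload_module() {
    mod=$1
    shift
    "${MODPROBE:-modprobe}" -r "$mod" && "${MODPROBE:-modprobe}" "$mod" "$@"
}

# Example usage (as root), then follow the kernel messages:
#   reload_module r8169 debug=8
#   journalctl -k -f
```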

> Start of open lid from suspend
>
> Mar 25 21:33:45 xx NetworkManager[7949]: <info>  [1522013625.9934]
> device (enp4s0): state change: unmanaged -> unavailable (reason
> 'managed', internal state 'managed')
> Mar 25 21:33:45 xx kernel: IPv6: ADDRCONF(NETDEV_UP): enp4s0: link is
> not ready
> Mar 25 21:33:46 xx kernel: r8169 0000:04:00.0 enp4s0: link down
> Mar 25 21:33:46 xx kernel: IPv6: ADDRCONF(NETDEV_UP): enp4s0: link is
> not ready

The driver is saying that there is no link detected.  Are the lights on?
  What does "ethtool enp4s0" tell you?
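To check the link state from a script rather than by eye, a sketch that extracts the `Link detected:` field from `ethtool` output. The helper name `link_detected` is hypothetical; it assumes ethtool's usual `Link detected: yes` / `Link detected: no` line.

```shell
#!/bin/sh
# link_detected: read `ethtool IFACE` output on stdin and print the value
# ("yes" or "no") from its "Link detected:" line.
link_detected() {
    sed -n 's/^[[:space:]]*Link detected:[[:space:]]*//p'
}

# Example usage (not run here):
#   if [ "$(ethtool enp4s0 | link_detected)" = "no" ]; then
#       echo "enp4s0: no link after resume"
#   fi
```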

> Mar 26 01:07:55 xx  ModemManager[1116]: <info>  Couldn't check support
> for device at '/sys/devices/pci0000:00/0000:00:1c.3/0000:04:00.0': not
> supported by any plugin

This one looks like your network card, but you don't want ModemManager
doing anything with it anyway.

> Looking further into the log files, I don't see any mention of r8169
> before March 17 when I tried to make a change to the boot parameters
> from something I found on the net which was almost a month after the
> problem started.
>
> pci=nomsi,noaer

I would suggest removing this.

My guess, given that reloading the driver makes it work again, is that
after resume, the driver is not turning some part of the chipset back
on.  Maybe the interrupts aren't getting turned back on.

 > Mar 26 01:07:55 xx kernel: do_IRQ: 7.33 No irq handler for vector

What does "grep r8169 /proc/interrupts" give you when the interface is
working?  Try it a couple of times and see how the numbers change.  Then
when it's not working try it again a few times and see if the numbers
are still changing.
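A sketch of that comparison, assuming the usual `/proc/interrupts` layout (IRQ number, one counter per CPU, then chip and name). Depending on the kernel, the entry may be labelled with the interface name rather than the driver name (the line posted later in this thread shows `enp4s0`), so the name to match is a parameter. Both helper names are hypothetical.

```shell
#!/bin/sh
# irq_total: read /proc/interrupts lines on stdin and print, for each line,
# the sum of its per-CPU counters (the numeric fields after "NN:").
irq_total() {
    awk '{ s = 0; for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) s += $i; print s }'
}

# irq_growing NAME SECONDS: sample the first /proc/interrupts line matching
# NAME twice, SECONDS apart, and succeed only if the total went up.
irq_growing() {
    before=$(grep "$1" /proc/interrupts | irq_total | head -n 1)
    sleep "$2"
    after=$(grep "$1" /proc/interrupts | irq_total | head -n 1)
    [ "${after:-0}" -gt "${before:-0}" ]
}

# Example usage (not run here), once while working and once after resume:
#   if irq_growing enp4s0 5; then echo "interrupts arriving"; else echo "no new interrupts"; fi
```

If the counts keep rising before suspend but stay flat after resume, that would support the guess that interrupts are not being re-enabled.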

Re: Help debug a network issue

Ulf Volmer
On 26.03.2018 08:40, Robin Laing wrote:

> sudo modprobe -r r8169
> sudo modprobe r8169
>
> So, what is my next step in finding out why this won't restart on suspend?

You can place a script to automatically unload and reload your network driver around suspend.

see

https://blog.christophersmart.com/2016/05/11/running-scripts-before-and-after-suspend-with-systemd/

best regards
Ulf

Re: Help debug a network issue

Robin Laing
I have been busy and unable to look at this until today.

On 26/03/18 01:05, Samuel Sieb wrote:

> On 03/25/2018 07:45 PM, Robin Laing wrote:
>> Network controller is:
>>
>> Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411
>> PCI Express Gigabit Ethernet Controller (rev 0c)
>>
>> Module is:
>>
>> Kernel modules: r8169
>
> That's the right driver.  One thing you could try is after removing the
> module, try "modprobe r8169 debug=n" where n is a number up to 16.  That
> will give you more debugging info in the log.  Careful, 16 might really
> spam the log, so maybe start at 8 and work your way up.

Going up to 16 didn't make any difference compared to 10.

Apr 13 14:03:41 tdllap kernel: r8169 0000:04:00.0: can't disable ASPM;
OS doesn't have ASPM control
Apr 13 14:03:41 tdllap kernel: r8169 0000:04:00.0 eth0: RTL8168g/8111g
at 0x00000000ef2b4190, 40:16:7e:10:e0:9a, XID 0c000880 IRQ 34
Apr 13 14:03:41 tdllap kernel: r8169 0000:04:00.0 eth0: jumbo features
[frames: 9200 bytes, tx checksumming: ko]
Apr 13 14:03:41 tdllap kernel: r8169 0000:04:00.0 enp4s0: renamed from eth0
Apr 13 14:03:41 tdllap kernel: r8169 0000:04:00.0 enp4s0: link down
Apr 13 14:03:44 tdllap kernel: r8169 0000:04:00.0 enp4s0: link up


>> not ready
>
> The driver is saying that there is no link detected.  Are the lights on?
>   What does "ethtool enp4s0" tell you?
>

The link lights on the switch come on when the lid is closed and
opened, even without reloading the network driver.

ethtool shows "Link detected: no", which is interesting.


>> Looking further into the log files, I don't see any mention of r8169
>> before March 17 when I tried to make a change to the boot parameters
>> from something I found on the net which was almost a month after the
>> problem started.
>>
>> pci=nomsi,noaer
>
> I would suggest removing this.
>
> My guess, given that reloading the driver makes it work again, is that
> after resume, the driver is not turning some part of the chipset back
> on.  Maybe the interrupts aren't getting turned back on.
>
>  > Mar 26 01:07:55 xx kernel: do_IRQ: 7.33 No irq handler for vector
>
> What does "grep r8169 /proc/interrupts" give you when the interface is
> working?  Try it a couple of times and see how the numbers change.  Then
> when it's not working try it again a few times and see if the numbers
> are still changing.

This is from /proc/interrupts.  It doesn't change between suspends or
disappear; it is there from boot until I remove the module.

  34:          0          0          0          0        125          0          0        175  IR-PCI-MSI 2097152-edge      enp4s0

It worked until February, but I don't know which update broke it, as I
wasn't told there was an issue until a few kernel updates later.

What I found is that when I load the module, lsmod gives me this:
r8169                  94208  0
mii                    16384  1 r8169

I am going to look more at mii-tool and see if that has anything to
do with it.

I did find another thread about kernel modules breaking in February,
specifically mentioning the r8169 module not coming back after suspend.

https://forum.manjaro.org/t/linux415-r8168-cant-connect-to-the-network-after-suspend-to-ram/39557/4

https://forum.manjaro.org/t/kernel-update-broke-ethernet-driver-realtek-r8168-r8169/39551/4

Robin

Re: Help debug a network issue

Robin Laing
On 26/03/18 11:57, Ulf Volmer wrote:

> On 26.03.2018 08:40, Robin Laing wrote:
>
>> sudo modprobe -r r8169
>> sudo modprobe r8169
>>
>> So, what is my next step in finding out why this won't restart on suspend?
>
> you can place a script for automatically load/unload your network driver.
>
> see
>
> https://blog.christophersmart.com/2016/05/11/running-scripts-before-and-after-suspend-with-systemd/
>
> best regards
> Ulf


This works.

Thanks.

This is the script I used:


#!/bin/sh
# systemd runs executables in /usr/lib/systemd/system-sleep/ with
# $1 = "pre" (before suspend) or "post" (after resume).
if [ "$1" = "pre" ]; then
   # Unload the Realtek driver before suspend
   modprobe -r r8169
elif [ "$1" = "post" ]; then
   # Reload it after resume so the wired link comes back
   modprobe r8169
fi


Robin