Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB host mode doesn't work if compiled with OTG support #6

Open
abrasive opened this issue Feb 25, 2013 · 3 comments
Open

USB host mode doesn't work if compiled with OTG support #6

abrasive opened this issue Feb 25, 2013 · 3 comments

Comments

@abrasive
Copy link
Member

In a kernel with OTG support built as a module (so it defaults to host mode), plugging in a USB device causes an unhandled interrupt 75. This causes the interrupt to be disabled and stops the dual-role port from working at all.

This is probably due to something (uboot?) leaving an interrupt source enabled that is normally handled by the device controller (arcotg_udc).

Demonstrated when building using imx6_hdmidongle_usb_defconfig

@rzk
Copy link
Member

rzk commented Feb 26, 2013

We need to test this u-boot http://www.hiapad.com/?p=2064 and see if it will have same problem.

@abrasive
Copy link
Member Author

I suspect adding

UOG_OTSC = 0;

to the end of mx6_usb_dr_init in arch/arm/mach-mx6/usb_dr.c would fix it, although it'd probably be better off in the board file. I'll try it out tonight.

@abrasive
Copy link
Member Author

This is a wakeup issue. A lot of the code assumes that if OTG is enabled then the UDC is enabled too. Fiddled with it for a while; no progress. Easier just to disable device mode entirely.

mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Sep 22, 2013
xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing
work during mount and unmount.  This flag can be cleared by unmount
after the xfs_sync_worker checks it but before the work is completed.
The has caused crashes in the completion handler for the dummy
transaction commited by xfs_sync_worker:

PID: 27544  TASK: ffff88013544e040  CPU: 3   COMMAND: "kworker/3:0"
 #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
 #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
 imx6-dongle#2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
 imx6-dongle#3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
 imx6-dongle#4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
 imx6-dongle#5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
 imx6-dongle#6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
 imx6-dongle#7 [ffff88016fdffc60] page_fault at ffffffff813ac635
    [exception RIP: xlog_get_lowest_lsn+0x30]
    RIP: ffffffffa04a9910  RSP: ffff88016fdffd10  RFLAGS: 00010246
    RAX: ffffc90014e48000  RBX: ffff88014d879980  RCX: ffff88014d879980
    RDX: ffff8802214ee4c0  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88016fdffd10   R8: ffff88014d879a80   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff8802214ee400
    R13: ffff88014d879980  R14: 0000000000000000  R15: ffff88022fd96605
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 imx6-dongle#8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
 imx6-dongle#9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]

Protect xfs_sync_worker by using the s_umount semaphore at the read
level to provide exclusion with unmount while work is progressing.

Reviewed-by: Mark Tinguely <[email protected]>
Signed-off-by: Ben Myers <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Sep 22, 2013
…condition

When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 imx6-dongle#2 [f06a9e40] no_context at c0433ded
 imx6-dongle#3 [f06a9e64] bad_area_nosemaphore at c043401a
 imx6-dongle#4 [f06a9e6c] __do_page_fault at c0434493
 imx6-dongle#5 [f06a9eec] do_page_fault at c083eb45
 imx6-dongle#6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
    00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 imx6-dongle#7 [f06a9f38] _spin_lock at c083bc14
 imx6-dongle#8 [f06a9f44] sys_mincore at c0507b7d
 imx6-dongle#9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
    *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----

Reported-by: Ulrich Obergfell <[email protected]>
Signed-off-by: Andrea Arcangeli <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Petr Matousek <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Sep 22, 2013
… by ipw2100_pci_init_one

The problem was found by Larry Finger:
http://marc.info/?l=linux-wireless&m=133702401700614&w=2

The problem is identical to the one for ipw2200 which is already fixed:
http://marc.info/?l=linux-wireless&m=133457257407196&w=2

[   17.766431] ------------[ cut here ]------------
[   17.766467] WARNING: at net/wireless/core.c:562 wiphy_register+0x34c/0x3c0 [cfg80211]()
[   17.766471] Hardware name: Latitude D600
[   17.766474] Modules linked in: ipw2100(+) libipw pcmcia cfg80211 ppdev parport_pc yenta_socket sr_mod pcmcia_rsrc parport iTCO_wdt cdrom sg rfkill pcmcia_
core lib80211 tg3 video button battery ac iTCO_vendor_support joydev shpchp pcspkr pciehp pci_hotplug autofs4 radeon ttm drm_kms_helper uhci_hcd ehci_hcd rtc
_cmos thermal drm hwmon i2c_algo_bit i2c_core processor usbcore usb_common ata_generic ata_piix ahci libahci libata
[   17.766525] Pid: 474, comm: modprobe Not tainted 3.4.0-rc7-wl+ imx6-dongle#6
[   17.766528] Call Trace:
[   17.766541]  [<c066ad08>] ? printk+0x28/0x2a
[   17.766552]  [<c0230edd>] warn_slowpath_common+0x6d/0xa0
[   17.766563]  [<e0b253bc>] ? wiphy_register+0x34c/0x3c0 [cfg80211]
[   17.766573]  [<e0b253bc>] ? wiphy_register+0x34c/0x3c0 [cfg80211]
[   17.766578]  [<c0230f2d>] warn_slowpath_null+0x1d/0x20
[   17.766588]  [<e0b253bc>] wiphy_register+0x34c/0x3c0 [cfg80211]
[   17.766605]  [<e0b5b0d6>] ipw2100_wdev_init+0x196/0x1c0 [ipw2100]
[   17.766616]  [<e0b5d962>] ipw2100_pci_init_one+0x2b2/0x694 [ipw2100]
[   17.766632]  [<c047ce52>] local_pci_probe+0x42/0xb0
[   17.766637]  [<c047e2b0>] pci_device_probe+0x60/0x90
[   17.766645]  [<c0376de2>] ? sysfs_create_link+0x12/0x20
[   17.766654]  [<c050f1f6>] really_probe+0x56/0x2e0
[   17.766659]  [<c037636d>] ? create_dir+0x5d/0xa0
[   17.766667]  [<c0518c6b>] ? pm_runtime_barrier+0x3b/0xa0
[   17.766672]  [<c050f5e4>] driver_probe_device+0x44/0xa0
[   17.766677]  [<c047e227>] ? pci_match_device+0x97/0xa0
[   17.766681]  [<c050f6c9>] __driver_attach+0x89/0x90
[   17.766686]  [<c050f640>] ? driver_probe_device+0xa0/0xa0
[   17.766691]  [<c050da2a>] bus_for_each_dev+0x3a/0x70
[   17.766695]  [<c050ee6c>] driver_attach+0x1c/0x30
[   17.766699]  [<c050f640>] ? driver_probe_device+0xa0/0xa0
[   17.766704]  [<c050ea77>] bus_add_driver+0x187/0x280
[   17.766710]  [<c045b9cd>] ? kset_find_obj+0x2d/0x60
[   17.766715]  [<c047e2e0>] ? pci_device_probe+0x90/0x90
[   17.766719]  [<c047e2e0>] ? pci_device_probe+0x90/0x90
[   17.766724]  [<c050fb85>] driver_register+0x65/0x110
[   17.766729]  [<c047e09d>] __pci_register_driver+0x3d/0xa0
[   17.766738]  [<e09f705c>] ipw2100_init+0x5c/0x1000 [ipw2100]
[   17.766743]  [<c020110f>] do_one_initcall+0x2f/0x170
[   17.766749]  [<e09f7000>] ? 0xe09f6fff
[   17.766757]  [<c0287ce8>] sys_init_module+0xa8/0x210
[   17.766766]  [<c067a075>] syscall_call+0x7/0xb
[   17.766769] ---[ end trace 559898c6bb0d1c75 ]---
[   17.767093] ipw2100: probe of 0000:02:03.0 failed with error -5

This warning appears only if we apply Ben Hutchings' fix
http://marc.info/?l=linux-wireless&m=132720204412667&w=2
for the bug reported by Cesare Leonardi
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=656813
with cfg80211 warning during device registration
("cfg80211: failed to add phy80211 symlink to netdev!").

We separate device bring up and registration with network stack
to avoid the problem.

Signed-off-by: Stanislav Yakovlev <[email protected]>
Tested-by: Larry Finger <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Sep 22, 2013
The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 imx6-dongle#2 imx6-dongle#3 imx6-dongle#4 imx6-dongle#5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU imx6-dongle#6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  imx6-dongle#6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  imx6-dongle#7 imx6-dongle#8 imx6-dongle#9 imx6-dongle#10 imx6-dongle#11 Ok.
[    0.807108] Booting Node   3, Processors  imx6-dongle#12 imx6-dongle#13 imx6-dongle#14 imx6-dongle#15 imx6-dongle#16 imx6-dongle#17 Ok.
[    0.897587] Booting Node   2, Processors  imx6-dongle#18 imx6-dongle#19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andreas Herrmann <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Sep 22, 2013
Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 imx6-dongle#2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 imx6-dongle#3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 imx6-dongle#4 [eed63e50] do_writepages at c04f3e32
 imx6-dongle#5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 imx6-dongle#6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 imx6-dongle#7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 imx6-dongle#8 [eed63ecc] do_sync_write at c052d202
 imx6-dongle#9 [eed63f74] vfs_write at c052d4ee
imx6-dongle#10 [eed63f94] sys_write at c052df4c
imx6-dongle#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Cc: <[email protected]>
Reported-by: Jian Li <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Steve French <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Sep 22, 2013
…d reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     imx6-dongle#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     imx6-dongle#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     imx6-dongle#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     imx6-dongle#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     imx6-dongle#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     imx6-dongle#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     imx6-dongle#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     imx6-dongle#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    imx6-dongle#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    imx6-dongle#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    imx6-dongle#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    imx6-dongle#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    imx6-dongle#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    imx6-dongle#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    imx6-dongle#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    imx6-dongle#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    imx6-dongle#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    imx6-dongle#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Sep 22, 2013
commit bea6832 upstream.

On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.

This problem can be triggered in practice on very long lived processes:

  PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
   #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
   #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
   imx6-dongle#2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
   imx6-dongle#3 [ffff880472a51cd0] die at ffffffff8100f26b
   imx6-dongle#4 [ffff880472a51d00] do_trap at ffffffff814f03f4
   imx6-dongle#5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
   imx6-dongle#6 [ffff880472a51e00] divide_error at ffffffff8100be7b
      [exception RIP: thread_group_times+0x56]
      RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
      RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
      RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
      RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
      R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   imx6-dongle#7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
   imx6-dongle#8 [ffff880472a51f40] sys_times at ffffffff81088524
   imx6-dongle#9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
      RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
      RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
      RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
      RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
      R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
      ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b

Signed-off-by: Stanislaw Gruszka <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
…d reasons

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     imx6-dongle#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     imx6-dongle#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     imx6-dongle#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     imx6-dongle#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     imx6-dongle#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     imx6-dongle#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     imx6-dongle#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     imx6-dongle#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    imx6-dongle#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    imx6-dongle#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    imx6-dongle#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    imx6-dongle#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    imx6-dongle#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    imx6-dongle#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    imx6-dongle#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    imx6-dongle#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    imx6-dongle#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    imx6-dongle#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Cc: [email protected]
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.

This problem can be triggered in practice on very long lived processes:

  PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
   #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
   #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
   imx6-dongle#2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
   imx6-dongle#3 [ffff880472a51cd0] die at ffffffff8100f26b
   imx6-dongle#4 [ffff880472a51d00] do_trap at ffffffff814f03f4
   imx6-dongle#5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
   imx6-dongle#6 [ffff880472a51e00] divide_error at ffffffff8100be7b
      [exception RIP: thread_group_times+0x56]
      RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
      RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
      RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
      RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
      R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   imx6-dongle#7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
   imx6-dongle#8 [ffff880472a51f40] sys_times at ffffffff81088524
   imx6-dongle#9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
      RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
      RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
      RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
      RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
      R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
      ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b

Cc: [email protected]
Signed-off-by: Stanislaw Gruszka <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, imx6-dongle#2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl imx6-dongle#16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, imx6-dongle#3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl imx6-dongle#8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, imx6-dongle#6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, imx6-dongle#5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, imx6-dongle#2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, imx6-dongle#4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl imx6-dongle#16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl imx6-dongle#8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, imx6-dongle#7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl imx6-dongle#16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, imx6-dongle#3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl imx6-dongle#8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder why
wasting r2 with the constant 0 in the second case for example.  And all
the mov's could be folded in subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, imx6-dongle#4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, imx6-dongle#2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, imx6-dongle#3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl imx6-dongle#8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl imx6-dongle#16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, imx6-dongle#4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, imx6-dongle#5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, imx6-dongle#2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, imx6-dongle#6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, imx6-dongle#7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, imx6-dongle#3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl imx6-dongle#8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl imx6-dongle#8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl imx6-dongle#16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl imx6-dongle#16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Nicolas Pitre <[email protected]>
Tested-by: Thomas Petazzoni <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Russell King <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
Fixes following lockdep splat :

[ 1614.734896] =============================================
[ 1614.734898] [ INFO: possible recursive locking detected ]
[ 1614.734901] 3.6.0-rc3+ #782 Not tainted
[ 1614.734903] ---------------------------------------------
[ 1614.734905] swapper/11/0 is trying to acquire lock:
[ 1614.734907]  (slock-AF_INET){+.-...}, at: [<ffffffffa0209d72>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.734920]
[ 1614.734920] but task is already holding lock:
[ 1614.734922]  (slock-AF_INET){+.-...}, at: [<ffffffff815fce23>] tcp_v4_err+0x163/0x6b0
[ 1614.734932]
[ 1614.734932] other info that might help us debug this:
[ 1614.734935]  Possible unsafe locking scenario:
[ 1614.734935]
[ 1614.734937]        CPU0
[ 1614.734938]        ----
[ 1614.734940]   lock(slock-AF_INET);
[ 1614.734943]   lock(slock-AF_INET);
[ 1614.734946]
[ 1614.734946]  *** DEADLOCK ***
[ 1614.734946]
[ 1614.734949]  May be due to missing lock nesting notation
[ 1614.734949]
[ 1614.734952] 7 locks held by swapper/11/0:
[ 1614.734954]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff81592801>] __netif_receive_skb+0x251/0xd00
[ 1614.734964]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff815d319c>] ip_local_deliver_finish+0x4c/0x4e0
[ 1614.734972]  imx6-dongle#2:  (rcu_read_lock){.+.+..}, at: [<ffffffff8160d116>] icmp_socket_deliver+0x46/0x230
[ 1614.734982]  imx6-dongle#3:  (slock-AF_INET){+.-...}, at: [<ffffffff815fce23>] tcp_v4_err+0x163/0x6b0
[ 1614.734989]  imx6-dongle#4:  (rcu_read_lock){.+.+..}, at: [<ffffffff815da240>] ip_queue_xmit+0x0/0x680
[ 1614.734997]  imx6-dongle#5:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9925>] ip_finish_output+0x135/0x890
[ 1614.735004]  imx6-dongle#6:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81595680>] dev_queue_xmit+0x0/0xe00
[ 1614.735012]
[ 1614.735012] stack backtrace:
[ 1614.735016] Pid: 0, comm: swapper/11 Not tainted 3.6.0-rc3+ #782
[ 1614.735018] Call Trace:
[ 1614.735020]  <IRQ>  [<ffffffff810a50ac>] __lock_acquire+0x144c/0x1b10
[ 1614.735033]  [<ffffffff810a334b>] ? check_usage+0x9b/0x4d0
[ 1614.735037]  [<ffffffff810a6762>] ? mark_held_locks+0x82/0x130
[ 1614.735042]  [<ffffffff810a5df0>] lock_acquire+0x90/0x200
[ 1614.735047]  [<ffffffffa0209d72>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735051]  [<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10
[ 1614.735060]  [<ffffffff81749b31>] _raw_spin_lock+0x41/0x50
[ 1614.735065]  [<ffffffffa0209d72>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735069]  [<ffffffffa0209d72>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735075]  [<ffffffffa014f7f2>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
[ 1614.735079]  [<ffffffff81595112>] dev_hard_start_xmit+0x502/0xa70
[ 1614.735083]  [<ffffffff81594c6e>] ? dev_hard_start_xmit+0x5e/0xa70
[ 1614.735087]  [<ffffffff815957c1>] ? dev_queue_xmit+0x141/0xe00
[ 1614.735093]  [<ffffffff815b622e>] sch_direct_xmit+0xfe/0x290
[ 1614.735098]  [<ffffffff81595865>] dev_queue_xmit+0x1e5/0xe00
[ 1614.735102]  [<ffffffff81595680>] ? dev_hard_start_xmit+0xa70/0xa70
[ 1614.735106]  [<ffffffff815b4daa>] ? eth_header+0x3a/0xf0
[ 1614.735111]  [<ffffffff8161d33e>] ? fib_get_table+0x2e/0x280
[ 1614.735117]  [<ffffffff8160a7e2>] arp_xmit+0x22/0x60
[ 1614.735121]  [<ffffffff8160a863>] arp_send+0x43/0x50
[ 1614.735125]  [<ffffffff8160b82f>] arp_solicit+0x18f/0x450
[ 1614.735132]  [<ffffffff8159d9da>] neigh_probe+0x4a/0x70
[ 1614.735137]  [<ffffffff815a191a>] __neigh_event_send+0xea/0x300
[ 1614.735141]  [<ffffffff815a1c93>] neigh_resolve_output+0x163/0x260
[ 1614.735146]  [<ffffffff815d9cf5>] ip_finish_output+0x505/0x890
[ 1614.735150]  [<ffffffff815d9925>] ? ip_finish_output+0x135/0x890
[ 1614.735154]  [<ffffffff815dae79>] ip_output+0x59/0xf0
[ 1614.735158]  [<ffffffff815da1cd>] ip_local_out+0x2d/0xa0
[ 1614.735162]  [<ffffffff815da403>] ip_queue_xmit+0x1c3/0x680
[ 1614.735165]  [<ffffffff815da240>] ? ip_local_out+0xa0/0xa0
[ 1614.735172]  [<ffffffff815f4402>] tcp_transmit_skb+0x402/0xa60
[ 1614.735177]  [<ffffffff815f5a11>] tcp_retransmit_skb+0x1a1/0x620
[ 1614.735181]  [<ffffffff815f7e93>] tcp_retransmit_timer+0x393/0x960
[ 1614.735185]  [<ffffffff815fce23>] ? tcp_v4_err+0x163/0x6b0
[ 1614.735189]  [<ffffffff815fd317>] tcp_v4_err+0x657/0x6b0
[ 1614.735194]  [<ffffffff8160d116>] ? icmp_socket_deliver+0x46/0x230
[ 1614.735199]  [<ffffffff8160d19e>] icmp_socket_deliver+0xce/0x230
[ 1614.735203]  [<ffffffff8160d116>] ? icmp_socket_deliver+0x46/0x230
[ 1614.735208]  [<ffffffff8160d464>] icmp_unreach+0xe4/0x2c0
[ 1614.735213]  [<ffffffff8160e520>] icmp_rcv+0x350/0x4a0
[ 1614.735217]  [<ffffffff815d3285>] ip_local_deliver_finish+0x135/0x4e0
[ 1614.735221]  [<ffffffff815d319c>] ? ip_local_deliver_finish+0x4c/0x4e0
[ 1614.735225]  [<ffffffff815d3ffa>] ip_local_deliver+0x4a/0x90
[ 1614.735229]  [<ffffffff815d37b7>] ip_rcv_finish+0x187/0x730
[ 1614.735233]  [<ffffffff815d425d>] ip_rcv+0x21d/0x300
[ 1614.735237]  [<ffffffff81592a1b>] __netif_receive_skb+0x46b/0xd00
[ 1614.735241]  [<ffffffff81592801>] ? __netif_receive_skb+0x251/0xd00
[ 1614.735245]  [<ffffffff81593368>] process_backlog+0xb8/0x180
[ 1614.735249]  [<ffffffff81593cf9>] net_rx_action+0x159/0x330
[ 1614.735257]  [<ffffffff810491f0>] __do_softirq+0xd0/0x3e0
[ 1614.735264]  [<ffffffff8109ed24>] ? tick_program_event+0x24/0x30
[ 1614.735270]  [<ffffffff8175419c>] call_softirq+0x1c/0x30
[ 1614.735278]  [<ffffffff8100425d>] do_softirq+0x8d/0xc0
[ 1614.735282]  [<ffffffff8104983e>] irq_exit+0xae/0xe0
[ 1614.735287]  [<ffffffff8175494e>] smp_apic_timer_interrupt+0x6e/0x99
[ 1614.735291]  [<ffffffff81753a1c>] apic_timer_interrupt+0x6c/0x80
[ 1614.735293]  <EOI>  [<ffffffff810a14ad>] ? trace_hardirqs_off+0xd/0x10
[ 1614.735306]  [<ffffffff81336f85>] ? intel_idle+0xf5/0x150
[ 1614.735310]  [<ffffffff81336f7e>] ? intel_idle+0xee/0x150
[ 1614.735317]  [<ffffffff814e6ea9>] cpuidle_enter+0x19/0x20
[ 1614.735321]  [<ffffffff814e7538>] cpuidle_idle_call+0xa8/0x630
[ 1614.735327]  [<ffffffff8100c1ba>] cpu_idle+0x8a/0xe0
[ 1614.735333]  [<ffffffff8173762e>] start_secondary+0x220/0x222

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
This reverts commit 96d17b7 which
caused the following errors at boot:

  [    1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
  [    1.114885] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ imx6-dongle#6
  [    1.114885] Call Trace:
  [    1.114885]  [<ffffffff81273f37>] kobject_init+0x87/0xa0
  [    1.115555]  [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90
  [    1.115555]  [<ffffffff8127c870>] ? sprintf+0x40/0x50
  [    1.115555]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.115555]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.115555]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.115555]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.115555]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.115836]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.116834]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.117835]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.117835]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.118401]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.119832]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.120325] ------------[ cut here ]------------
  [    1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
  [    1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
  [    1.121831] Modules linked in:
  [    1.122138] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ imx6-dongle#6
  [    1.122831] Call Trace:
  [    1.123074]  [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0
  [    1.123833]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
  [    1.124405]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
  [    1.124832]  [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0
  [    1.125337]  [<ffffffff81195eb3>] create_dir+0x73/0xd0
  [    1.125832]  [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0
  [    1.126363]  [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210
  [    1.126832]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
  [    1.127406]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.127832]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.128384]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.128833]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.129831]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.130305]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.130831]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.131351]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.131830]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.132392]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.132830]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.133315] ---[ end trace 2703540871c8fab7 ]---
  [    1.133830] ------------[ cut here ]------------
  [    1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
  [    1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
  [    1.135829] Modules linked in:
  [    1.136135] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ imx6-dongle#6
  [    1.136828] Call Trace:
  [    1.137071]  [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210
  [    1.137830]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
  [    1.138402]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
  [    1.138830]  [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0
  [    1.139419]  [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210
  [    1.139830]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
  [    1.140429]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.140830]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.141829]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.142307]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.142829]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.143307]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.143829]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.144352]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.144829]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.145405]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.145828]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.146313] ---[ end trace 2703540871c8fab8 ]---

Conflicts:

	mm/slub.c

Signed-off-by: Pekka Enberg <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 imx6-dongle#2 [c5377d98] no_context at c06f614e
 imx6-dongle#3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 imx6-dongle#4 [c5377df4] bad_area_nosemaphore at c06f629b
 imx6-dongle#5 [c5377e00] do_page_fault at c070b0cb
 imx6-dongle#6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 imx6-dongle#7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 imx6-dongle#8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 imx6-dongle#9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
imx6-dongle#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
imx6-dongle#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
imx6-dongle#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
imx6-dongle#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
imx6-dongle#14 [c5377f58] process_one_work at c024ee4c
imx6-dongle#15 [c5377f98] worker_thread at c024f43d
imx6-dongle#16 [c5377fbc] kthread at c025326b
imx6-dongle#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 imx6-dongle#2 [cde0fe30] schedule_timeout at c0705600
 imx6-dongle#3 [cde0fe94] __down_common at c0706098
 imx6-dongle#4 [cde0fec8] __down at c0706122
 imx6-dongle#5 [cde0fed0] down at c025936f
 imx6-dongle#6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 imx6-dongle#7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 imx6-dongle#8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 imx6-dongle#9 [cde0ff1c] generic_shutdown_super at c0333d7a
imx6-dongle#10 [cde0ff38] kill_block_super at c0333e0f
imx6-dongle#11 [cde0ff48] deactivate_locked_super at c0334218
imx6-dongle#12 [cde0ff58] deactivate_super at c033495d
imx6-dongle#13 [cde0ff68] mntput_no_expire at c034bc13
imx6-dongle#14 [cde0ff7c] sys_umount at c034cc69
imx6-dongle#15 [cde0ffa0] sys_oldumount at c034ccd4
imx6-dongle#16 [cde0ffb0] system_call at c0707e66

commit 11159a0 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Mark Tinguely <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 imx6-dongle#2 [c5377d98] no_context at c06f614e
 imx6-dongle#3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 imx6-dongle#4 [c5377df4] bad_area_nosemaphore at c06f629b
 imx6-dongle#5 [c5377e00] do_page_fault at c070b0cb
 imx6-dongle#6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 imx6-dongle#7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 imx6-dongle#8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 imx6-dongle#9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
imx6-dongle#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
imx6-dongle#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
imx6-dongle#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
imx6-dongle#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
imx6-dongle#14 [c5377f58] process_one_work at c024ee4c
imx6-dongle#15 [c5377f98] worker_thread at c024f43d
imx6-dongle#16 [c5377fbc] kthread at c025326b
imx6-dongle#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 imx6-dongle#2 [cde0fe30] schedule_timeout at c0705600
 imx6-dongle#3 [cde0fe94] __down_common at c0706098
 imx6-dongle#4 [cde0fec8] __down at c0706122
 imx6-dongle#5 [cde0fed0] down at c025936f
 imx6-dongle#6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 imx6-dongle#7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 imx6-dongle#8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 imx6-dongle#9 [cde0ff1c] generic_shutdown_super at c0333d7a
imx6-dongle#10 [cde0ff38] kill_block_super at c0333e0f
imx6-dongle#11 [cde0ff48] deactivate_locked_super at c0334218
imx6-dongle#12 [cde0ff58] deactivate_super at c033495d
imx6-dongle#13 [cde0ff68] mntput_no_expire at c034bc13
imx6-dongle#14 [cde0ff7c] sys_umount at c034cc69
imx6-dongle#15 [cde0ffa0] sys_oldumount at c034ccd4
imx6-dongle#16 [cde0ffb0] system_call at c0707e66

commit 11159a0 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Mark Tinguely <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
Acked-by: Gavin Shan <[email protected]>

=====================================
[ BUG: bad unlock balance detected! ]
3.6.0-rc5-00338-gcaa1d63-dirty imx6-dongle#6 Not tainted
-------------------------------------
swapper/0/1 is trying to release lock (eeh_mutex) at:
[<c000000000058218>] .eeh_add_to_parent_pe+0x318/0x410
but there are no more locks to release!

other info that might help us debug this:
no locks held by swapper/0/1.

stack backtrace:
Call Trace:
[c00000003e483870] [c000000000013310] .show_stack+0x70/0x1c0 (unreliable)
[c00000003e483920] [c0000000000d8310] .print_unlock_inbalance_bug+0x110/0x120
[c00000003e4839b0] [c0000000000d9a50] .lock_release+0x1d0/0x240
[c00000003e483a60] [c000000000778064] .__mutex_unlock_slowpath+0xb4/0x250
[c00000003e483b10] [c000000000058218] .eeh_add_to_parent_pe+0x318/0x410
[c00000003e483bc0] [c00000000005a118] .pseries_eeh_of_probe+0x258/0x2f0
[c00000003e483cc0] [c000000000032528] .traverse_pci_devices+0xa8/0x150
[c00000003e483d70] [c000000000aa7288] .eeh_init+0xd4/0x140
[c00000003e483e00] [c00000000000abc4] .do_one_initcall+0x64/0x1e0
[c00000003e483ec0] [c000000000a90418] .kernel_init+0x1e8/0x2bc
[c00000003e483f90] [c00000000002048c] .kernel_thread+0x54/0x70
EEH: PCI Enhanced I/O Error Handling Enabled

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling
off, never to be seen again.  In the case where this occurred, an exiting
thread hit reiserfs homebrew conditional resched while holding a mutex,
bringing the box to its knees.

PID: 18105  TASK: ffff8807fd412180  CPU: 5   COMMAND: "kdmflush"
 #0 [ffff8808157e7670] schedule at ffffffff8143f489
 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs]
 imx6-dongle#2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14
 imx6-dongle#3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs]
 imx6-dongle#4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2
 imx6-dongle#5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41
 imx6-dongle#6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a
 imx6-dongle#7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88
 imx6-dongle#8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850
 imx6-dongle#9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f
    [exception RIP: kernel_thread_helper]
    RIP: ffffffff8144a5c0  RSP: ffff8808157e7f58  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffffffff8107af60  RDI: ffff8803ee491d18
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
The check in omap_musb_mailbox does not properly check if the module has
been fully initialized. The patch fixes that, and the kernel panic below:

$ modprobe twl4030-usb
[   13.924743] twl4030_usb twl4030-usb.33: HW_CONDITIONS 0xe0/224; link 3
[   13.940307] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[   13.948883] pgd = ef27c000
[   13.951751] [00000004] *pgd=af256831, *pte=00000000, *ppte=00000000
[   13.958374] Internal error: Oops: 17 [#1] ARM
[   13.962921] Modules linked in: twl4030_usb(+) omap2430 libcomposite
[   13.969543] CPU: 0    Not tainted  (3.8.0-rc1-n9xx-11758-ge37a37c-dirty imx6-dongle#6)
[   13.976867] PC is at omap_musb_mailbox+0x18/0x54 [omap2430]
[   13.982727] LR is at twl4030_usb_probe+0x240/0x354 [twl4030_usb]
[   13.989013] pc : [<bf013b6c>]    lr : [<bf018958>]    psr: 60000013
[   13.989013] sp : ef273cf0  ip : ef273d08  fp : ef273d04
[   14.001068] r10: bf01b000  r9 : bf0191d8  r8 : 00000001
[   14.006530] r7 : 00000000  r6 : ef140e10  r5 : 00000003  r4 : 00000000
[   14.013397] r3 : bf0142dc  r2 : 00000006  r1 : 00000000  r0 : 00000003
[   14.020233] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   14.027740] Control: 10c5387d  Table: af27c019  DAC: 00000015
[   14.033752] Process modprobe (pid: 616, stack limit = 0xef272238)
[   14.040161] Stack: (0xef273cf0 to 0xef274000)
[   14.044708] 3ce0:                                     ef254310 00000001 ef273d34 ef273d08
[   14.053314] 3d00: bf018958 bf013b60 bf0190a4 ef254310 c0101550 c0c3a138 ef140e10 ef140e44
[   14.061889] 3d20: bf019150 00000001 ef273d44 ef273d38 c019890c bf018724 ef273d64 ef273d48
[   14.070495] 3d40: c01974fc c01988f8 ef140e10 bf019150 ef140e44 00000000 ef273d84 ef273d68
[   14.079071] 3d60: c0197728 c019748c c0197694 00000000 bf019150 c0197694 ef273dac ef273d88
[   14.087677] 3d80: c0195c38 c01976a0 ef03610c ef143eb0 c0128954 ef254780 bf019150 c0b19548
[   14.096252] 3da0: ef273dbc ef273db0 c0197098 c0195bf0 ef273dec ef273dc0 c0196c98 c0197080
[   14.104858] 3dc0: bf0190a4 c0b27bc0 ef273dec bf019150 bf019190 c0b27bc0 ef272000 00000001
[   14.113433] 3de0: ef273e14 ef273df0 c0197c18 c0196b30 ef273f48 bf019190 c0b27bc0 ef272000
[   14.122039] 3e00: 00000001 bf01b000 ef273e24 ef273e18 c0198b28 c0197ba4 ef273e34 ef273e28
[   14.130615] 3e20: bf01b014 c0198ae8 ef273e8c ef273e38 c0008918 bf01b00c c004f730 c012ba1c
[   14.139221] 3e40: ef273e74 00000000 c00505b0 c004f72c 00000000 ef273e60 ef273f48 bf019190
[   14.147796] 3e60: 00000001 ef273f48 bf019190 00000001 ef286340 00000001 bf0191d8 c0065414
[   14.156402] 3e80: ef273f44 ef273e90 c0067754 c00087fc bf01919c 00007fff c0064794 00000000
[   14.164978] 3ea0: ef273ecc f0064000 00000001 ef272000 ef272000 00067f39 bf0192b0 bf01919c
[   14.173583] 3ec0: ef273f0c ef273ed0 c00a6bf0 c00a53fc ff000000 000000d2 c0067dc8 00000000
[   14.182159] 3ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   14.190765] 3f00: 00000000 00000000 00000000 00000000 00000000 00000000 ffffffff 00002968
[   14.199340] 3f20: 00080878 00067f39 00000080 c000e2e8 ef272000 00000000 ef273fa4 ef273f48
[   14.207946] 3f40: c0067e54 c0066188 f0064000 00002968 f0065530 f0065463 f0065fb0 000012c4
[   14.216522] 3f60: 00001664 00000000 00000000 00000000 00000014 00000015 0000000c 00000000
[   14.225128] 3f80: 00000008 00000000 00000000 00080370 00080878 0007422c 00000000 ef273fa8
[   14.233703] 3fa0: c000e140 c0067d80 00080370 00080878 00080878 00002968 00067f39 00000000
[   14.242309] 3fc0: 00080370 00080878 0007422c 00000080 00074030 00067f39 bec7aef8 00000000
[   14.250885] 3fe0: b6f05300 bec7ab68 0000e93c b6f05310 60000010 00080878 af7fe821 af7fec21
[   14.259460] Backtrace:
[   14.262054] [<bf013b54>] (omap_musb_mailbox+0x0/0x54 [omap2430]) from [<bf018958>] (twl4030_usb_probe+0x240/0x354 [twl4030_usb])
[   14.274200]  r5:00000001 r4:ef254310
[   14.277984] [<bf018718>] (twl4030_usb_probe+0x0/0x354 [twl4030_usb]) from [<c019890c>] (platform_drv_probe+0x20/0x24)
[   14.289123]  r8:00000001 r7:bf019150 r6:ef140e44 r5:ef140e10 r4:c0c3a138
[   14.296203] [<c01988ec>] (platform_drv_probe+0x0/0x24) from [<c01974fc>] (driver_probe_device+0x7c/0x214)
[   14.306243] [<c0197480>] (driver_probe_device+0x0/0x214) from [<c0197728>] (__driver_attach+0x94/0x98)
[   14.316009]  r7:00000000 r6:ef140e44 r5:bf019150 r4:ef140e10
[   14.321990] [<c0197694>] (__driver_attach+0x0/0x98) from [<c0195c38>] (bus_for_each_dev+0x54/0x88)
[   14.331390]  r6:c0197694 r5:bf019150 r4:00000000 r3:c0197694
[   14.337371] [<c0195be4>] (bus_for_each_dev+0x0/0x88) from [<c0197098>] (driver_attach+0x24/0x28)
[   14.346588]  r6:c0b19548 r5:bf019150 r4:ef254780
[   14.351440] [<c0197074>] (driver_attach+0x0/0x28) from [<c0196c98>] (bus_add_driver+0x174/0x244)
[   14.360687] [<c0196b24>] (bus_add_driver+0x0/0x244) from [<c0197c18>] (driver_register+0x80/0x154)
[   14.370086]  r8:00000001 r7:ef272000 r6:c0b27bc0 r5:bf019190 r4:bf019150
[   14.377136] [<c0197b98>] (driver_register+0x0/0x154) from [<c0198b28>] (platform_driver_register+0x4c/0x60)
[   14.387390] [<c0198adc>] (platform_driver_register+0x0/0x60) from [<bf01b014>] (twl4030_usb_init+0x14/0x1c [twl4030_usb])
[   14.398895] [<bf01b000>] (twl4030_usb_init+0x0/0x1c [twl4030_usb]) from [<c0008918>] (do_one_initcall+0x128/0x1a8)
[   14.409790] [<c00087f0>] (do_one_initcall+0x0/0x1a8) from [<c0067754>] (load_module+0x15d8/0x1bf8)
[   14.419189] [<c006617c>] (load_module+0x0/0x1bf8) from [<c0067e54>] (sys_init_module+0xe0/0xf4)
[   14.428344] [<c0067d74>] (sys_init_module+0x0/0xf4) from [<c000e140>] (ret_fast_syscall+0x0/0x30)
[   14.437652]  r6:0007422c r5:00080878 r4:00080370
[   14.442504] Code: e24cb004 e59f3038 e1a05000 e593401c (e5940004)
[   14.448944] ---[ end trace dbf47e5bc5ba03c2 ]---
[   14.453826] Kernel panic - not syncing: Fatal exception

Signed-off-by: Aaro Koskinen <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
…ional

All the current platforms can work with 0x8000_0000 based dma_addr_t
since the Bus Bridges typically ignore the top bit (the only excpetion
was Angel4 PCI-AHB bridge which we no longer care for).
That way we don't need plat-specific cpu-addr to bus-addr conversion.

Hooks still provided - just in case a platform has an obscure device
which say needs 0 based bus address.

That way <asm/dma_mapping.h> no longer needs to unconditinally include
<plat/dma_addr.h>

Also verfied that on Angel4 board, other peripherals (IDE-disk / EMAC)
work fine with 0x8000_0000 based dma addresses.

Signed-off-by: Vineet Gupta <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
…ernel/git/vgupta/arc

Pull new ARC architecture from Vineet Gupta:
 "Initial ARC Linux port with some fixes on top for 3.9-rc1:

  I would like to introduce the Linux port to ARC Processors (from
  Synopsys) for 3.9-rc1.  The patch-set has been discussed on the public
  lists since Nov and has received a fair bit of review, specially from
  Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb...

  The arch bits are in arch/arc, some asm-generic changes (acked by
  Arnd), a minor change to PARISC (acked by Helge).

  The series is a touch bigger for a new port for 2 main reasons:

   1. It enables a basic kernel in first sub-series and adds
      ptrace/kgdb/.. later

   2. Some of the fallout of review (DeviceTree support, multi-platform-
      image support) were added on top of orig series, primarily to
      record the revision history.

  This updated pull request additionally contains

   - fixes due to our GNU tools catching up with the new syscall/ptrace
     ABI

   - some (minor) cross-arch Kconfig updates."

* tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits)
  ARC: split elf.h into uapi and export it for userspace
  ARC: Fixup the current ABI version
  ARC: gdbserver using regset interface possibly broken
  ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window
  ARC: make a copy of flat DT
  ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed"
  ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled
  ARC: Fix pt_orig_r8 access
  ARC: [3.9] Fallout of hlist iterator update
  ARC: 64bit RTSC timestamp hardware issue
  ARC: Don't fiddle with non-existent caches
  ARC: Add self to MAINTAINERS
  ARC: Provide a default serial.h for uart drivers needing BASE_BAUD
  ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux
  ARC: [Review] Multi-platform image imx6-dongle#8: platform registers SMP callbacks
  ARC: [Review] Multi-platform image imx6-dongle#7: SMP common code to use callbacks
  ARC: [Review] Multi-platform image imx6-dongle#6: cpu-to-dma-addr optional
  ARC: [Review] Multi-platform image imx6-dongle#5: NR_IRQS defined by ARC core
  ARC: [Review] Multi-platform image imx6-dongle#4: Isolate platform headers
  ARC: [Review] Multi-platform image imx6-dongle#3: switch to board callback
  ...
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
The following script will produce a kernel oops:

    sudo ip netns add v
    sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
    sudo ip netns exec v ip link set lo up
    sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
    sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
    sudo ip netns exec v ip link set vxlan0 up
    sudo ip netns del v

where inspect by gdb:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 107]
    0xffffffffa0289e33 in ?? ()
    (gdb) bt
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
    imx6-dongle#2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
    imx6-dongle#3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
    imx6-dongle#4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
    imx6-dongle#5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
    imx6-dongle#6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
    imx6-dongle#7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
    imx6-dongle#8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
    imx6-dongle#9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
    imx6-dongle#10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
    imx6-dongle#11 <signal handler called>
    imx6-dongle#12 0x0000000000000000 in ?? ()
    imx6-dongle#13 0x0000000000000000 in ?? ()
    (gdb) fr 0
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    533		struct sock *sk = vn->sock->sk;
    (gdb) l
    528	static int vxlan_leave_group(struct net_device *dev)
    529	{
    530		struct vxlan_dev *vxlan = netdev_priv(dev);
    531		struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
    532		int err = 0;
    533		struct sock *sk = vn->sock->sk;
    534		struct ip_mreqn mreq = {
    535			.imr_multiaddr.s_addr	= vxlan->gaddr,
    536			.imr_ifindex		= vxlan->link,
    537		};
    (gdb) p vn->sock
    $4 = (struct socket *) 0x0

The kernel calls `vxlan_exit_net` when deleting the netns before shutting down
vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock`
is already gone causes the oops. so we should manually shutdown all interfaces
before deleting `vn->sock` as the patch does.

Signed-off-by: Zang MingJie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
clk inits on OMAP happen quite early, even before slab is available.
The dependency comes from the fact that the timer init code starts to
use clocks and hwmod and we need clocks to be initialized by then.

There are various problems doing clk inits this early, one is,
not being able to do dynamic clk registrations and hence the
dependency on clk-private.h. The other is, inability to debug
early kernel crashes without enabling DEBUG_LL and earlyprintk.

Doing early clk init also exposed another instance of a kernel
panic due to a BUG() when CONFIG_DEBUG_SLAB is enabled.

[    0.000000] Kernel BUG at c01174f8 [verbose debug info unavailable]
[    0.000000] Internal error: Oops - BUG: 0 [#1] SMP ARM
[    0.000000] Modules linked in:
[    0.000000] CPU: 0    Not tainted  (3.9.0-rc1-12179-g72d48f9 imx6-dongle#6)
[    0.000000] PC is at __kmalloc+0x1d4/0x248
[    0.000000] LR is at __clk_init+0x2e0/0x364
[    0.000000] pc : [<c01174f8>]    lr : [<c0441f54>]    psr: 600001d3
[    0.000000] sp : c076ff28  ip : c065cefc  fp : c0441f54
[    0.000000] r10: 0000001c  r9 : 000080d0  r8 : c076ffd4
[    0.000000] r7 : c074b578  r6 : c0794d88  r5 : 00000040  r4 : 00000000
[    0.000000] r3 : 00000000  r2 : c07cac70  r1 : 000080d0  r0 : 0000001c
[    0.000000] Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment kernel
[    0.000000] Control: 10c53c7d  Table: 8000404a  DAC: 00000017
[    0.000000] Process swapper (pid: 0, stack limit = 0xc076e240)
[    0.000000] Stack: (0xc076ff28 to 0xc0770000)
[    0.000000] ff20:                   22222222 c0794ec8 c06546e8 00000000 00000040 c0794d88
[    0.000000] ff40: c074b578 c076ffd4 c07951c8 c076e000 00000000 c0441f54 c074b578 c076ffd4
[    0.000000] ff60: c0793828 00000040 c0794d88 c074b578 c076ffd4 c0776900 c076e000 c07272ac
[    0.000000] ff80: 2f800000 c074c968 c07f93d0 c0719780 c076ffa0 c076ff98 00000000 00000000
[    0.000000] ffa0: 00000000 00000000 00000000 00000001 c074cd6c c077b1ec 8000406a c0715724
[    0.000000] ffc0: 00000000 00000000 00000000 00000000 00000000 c074c968 10c53c7d c0776974
[    0.000000] ffe0: c074cd6c c077b1ec 8000406a 411fc092 00000000 80008074 00000000 00000000
[    0.000000] [<c01174f8>] (__kmalloc+0x1d4/0x248) from [<c0441f54>] (__clk_init+0x2e0/0x364)
[    0.000000] [<c0441f54>] (__clk_init+0x2e0/0x364) from [<c07272ac>] (omap4xxx_clk_init+0xbc/0x140)
[    0.000000] [<c07272ac>] (omap4xxx_clk_init+0xbc/0x140) from [<c0719780>] (setup_arch+0x15c/0x284)
[    0.000000] [<c0719780>] (setup_arch+0x15c/0x284) from [<c0715724>] (start_kernel+0x7c/0x334)
[    0.000000] [<c0715724>] (start_kernel+0x7c/0x334) from [<80008074>] (0x80008074)
[    0.000000] Code: e5883004 e1a00006 e28dd00c e8bd8ff0 (e7f001f2)
[    0.000000] ---[ end trace 1b75b31a2719ed1c ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!

It was a know issue, that slab allocations would fail when common
clock core tries to cache parent pointers for mux clocks on OMAP,
and hence a patch 'clk: Allow late cache allocation for clk->parents,
commit 7975059' was added to work this problem around.
A BUG() within kmalloc() with CONFIG_DEBUG_SLAB enabled was completely
overlooked causing this regression.

More details on the issue reported can be found here,
http://www.mail-archive.com/[email protected]/msg85932.html

With all these issues around clk inits happening way too early, it
makes sense to at least move them to a point where dynamic memory
allocations are possible. So move them to a point just before the
timer code starts using clocks and hwmod.

This should at least pave way for clk inits on OMAP moving to dynamic
clock registrations instead of using the static macros defined in
clk-private.h.

The issue with kernel panic while CONFIG_DEBUG_SLAB is enabled
was reported by Piotr Haber and Tony Lindgren and this patch
fixes the reported issue as well.

Reported-by: Piotr Haber <[email protected]>
Reported-by: Tony Lindgren <[email protected]>
Signed-off-by: Rajendra Nayak <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Reviewed-by: Mike Turquette <[email protected]>
Acked-by: Paul Walmsley <[email protected]>
Cc: [email protected]  # v3.8
Signed-off-by: Tony Lindgren <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
The settimeofday01 test in the LTP testsuite effectively does

        gettimeofday(current time);
        settimeofday(Jan 1, 1970 + 100 seconds);
        settimeofday(current time);

This test causes a stack trace to be displayed on the console during the
setting of timeofday to Jan 1, 1970 + 100 seconds:

[  131.066751] ------------[ cut here ]------------
[  131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
[  131.104935] Hardware name: Dinar
[  131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
[  131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ imx6-dongle#6
[  131.182248] Call Trace:
[  131.184684]  <IRQ>  [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0
[  131.191312]  [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20
[  131.197131]  [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140
[  131.203721]  [<ffffffff810bb584>] tick_program_event+0x24/0x30
[  131.209534]  [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230
[  131.215437]  [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130
[  131.221509]  [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99
[  131.227839]  [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80
[  131.233816]  <EOI>  [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120
[  131.240267]  [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0
[  131.246252]  [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0
[  131.252238]  [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20
[  131.257877]  [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260
[  131.263692]  [<ffffffff8101c42f>] cpu_idle+0xaf/0x120
[  131.268727]  [<ffffffff815f8971>] start_secondary+0x255/0x257
[  131.274449] ---[ end trace 1151a50552231615 ]---

When we change the system time to a low value like this, the value of
timekeeper->offs_real will be a negative value.

It seems that the WARN occurs because an hrtimer has been started in the time
between the releasing of the timekeeper lock and the IPI call (via a call to
on_each_cpu) in clock_was_set() in the do_settimeofday() code.  The end result
is that a REALTIME_CLOCK timer has been added with softexpires = expires =
KTIME_MAX.  The hrtimer_interrupt() fires/is called and the loop at
kernel/hrtimer.c:1289 is executed.  In this loop the code subtracts the
clock base's offset (which was set to timekeeper->offs_real in
do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
was KTIME_MAX):

	KTIME_MAX - (a negative value) = overflow

A simple check for an overflow can resolve this problem.  Using KTIME_MAX
instead of the overflow value will result in the hrtimer function being run,
and the reprogramming of the timer after that.

Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
[jstultz: Tweaked commit subject]
Signed-off-by: John Stultz <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
Or Gerlitz says:

====================
This series adds support for the SRIOV ndo_set_vf callbacks to the mlx4 driver.

Series done against the net-next tree as of commit 0c50134 "batman-adv: fix
global protection fault during soft_iface destruction".

We have successfully tested the series on net-next, except for getting
the VF link info issue I have reported earlier today on netdev, we
see the problem for both ixgbe and mlx4 VFs. Just to make sure get
VF config is working OK with patch imx6-dongle#6 - we have run it over 3.8.8 too.
====================

Signed-off-by: David S. Miller <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
Or Gerlitz says:

====================
This series adds support for the SRIOV ndo_set_vf callbacks to the mlx4 driver.

Series done against the net-next tree as of commit 37fe066 "net:
fix address check in rtnl_fdb_del"

We have successfully tested the series on net-next, except for getting
the VF link info issue I have reported earlier today on netdev, we
see the problem for both ixgbe and mlx4 VFs. Just to make sure get
VF config is working OK with patch imx6-dongle#6 - we have run it over 3.8.8 too.

We added to the V1 series two patches that disable HW timestamping
when running over a VF, as this isn't supported yet.
====================

Signed-off-by: David S. Miller <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
Toralf reported the following oops to the linux-nfs mailing list:

    -----------------[snip]------------------
    NFSD: unable to generate recoverydir name (-2).
    NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!
    BUG: unable to handle kernel NULL pointer dereference at 000003c8
    IP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
    *pdpt = 000000002ba33001 *pde = 0000000000000000
    Oops: 0000 [#1] SMP
    Modules linked in: loop nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun arc4 iwldvm mac80211 coretemp kvm_intel uvcvideo sdhci_pci sdhci mmc_core videobuf2_vmalloc videobuf2_memops usblp videobuf2_core i915 iwlwifi psmouse videodev cfg80211 kvm fbcon bitblit cfbfillrect acpi_cpufreq mperf evdev softcursor font cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt drm_kms_helper snd_hda_codec_conexant drm agpgart fb fbdev tpm_tis thinkpad_acpi tpm nvram e1000e rfkill thermal ptp wmi pps_core tpm_bios 8250_pci processor 8250 ac snd_hda_intel snd_hda_codec snd_pcm battery video i2c_i801 snd_page_alloc snd_timer button serial_core i2c_core snd soundcore thermal_sys hwmon aesni_intel ablk_helper cryp
td lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
    Pid: 6374, comm: nfsd Not tainted 3.9.1 imx6-dongle#6 LENOVO 4180F65/4180F65
    EIP: 0060:[<f90a3d91>] EFLAGS: 00010202 CPU: 0
    EIP is at nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
    EAX: 00000000 EBX: fffffffe ECX: 00000007 EDX: 00000007
    ESI: eb9dcb00 EDI: eb2991c0 EBP: eb2bde38 ESP: eb2bde34
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    CR0: 80050033 CR2: 000003c8 CR3: 2ba80000 CR4: 000407f0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
    Process nfsd (pid: 6374, ti=eb2bc000 task=eb2711c0 task.ti=eb2bc000)
    Stack:
    fffffffe eb2bde4c f90a3e0c f90a7754 fffffffe eb0a9c00 eb2bdea0 f90a41ed
    eb2991c0 1b270000 eb2991c0 eb2bde7c f9099ce9 eb2bde98 0129a020 eb29a020
    eb2bdecc eb2991c0 eb2bdea8 f9099da5 00000000 eb9dcb00 00000001 67822f08
    Call Trace:
    [<f90a3e0c>] legacy_recdir_name_error+0x3c/0x40 [nfsd]
    [<f90a41ed>] nfsd4_create_clid_dir+0x15d/0x1c0 [nfsd]
    [<f9099ce9>] ? nfsd4_lookup_stateid+0x99/0xd0 [nfsd]
    [<f9099da5>] ? nfs4_preprocess_seqid_op+0x85/0x100 [nfsd]
    [<f90a4287>] nfsd4_client_record_create+0x37/0x50 [nfsd]
    [<f909d6ce>] nfsd4_open_confirm+0xfe/0x130 [nfsd]
    [<f90980b1>] ? nfsd4_encode_operation+0x61/0x90 [nfsd]
    [<f909d5d0>] ? nfsd4_free_stateid+0xc0/0xc0 [nfsd]
    [<f908fd0b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
    [<f9081b7b>] nfsd_dispatch+0x8b/0x1a0 [nfsd]
    [<f857b85d>] svc_process+0x3dd/0x640 [sunrpc]
    [<f908165d>] nfsd+0xad/0x110 [nfsd]
    [<f90815b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
    [<c1054824>] kthread+0x94/0xa0
    [<c1486937>] ret_from_kernel_thread+0x1b/0x28
    [<c1054790>] ? flush_kthread_work+0xd0/0xd0
    Code: 86 b0 00 00 00 90 c5 0a f9 c7 04 24 70 76 0a f9 e8 74 a9 3d c8 eb ba 8d 76 00 55 89 e5 53 66 66 66 66 90 8b 15 68 c7 0a f9 85 d2 <8b> 88 c8 03 00 00 74 2c 3b 11 77 28 8b 5c 91 08 85 db 74 22 8b
    EIP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd] SS:ESP 0068:eb2bde34
    CR2: 00000000000003c8
    ---[ end trace 09e54015d145c9c6 ]---

The problem appears to be a regression that was introduced in commit
9a9c647 "nfsd: make NFSv4 recovery client tracking options per net".
Prior to that commit, it was safe to pass a NULL net pointer to
nfsd4_client_tracking_exit in the legacy recdir case, and
legacy_recdir_name_error did so. After that comit, the net pointer must
be valid.

This patch just fixes legacy_recdir_name_error to pass in a valid net
pointer to that function.

Cc: <[email protected]> # v3.8+
Cc: Stanislav Kinsbursky <[email protected]>
Reported-and-tested-by: Toralf Förster <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
Daniel Petre reported crashes in icmp_dst_unreach() with following call
graph:

imx6-dongle#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
imx6-dongle#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
imx6-dongle#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
imx6-dongle#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
imx6-dongle#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
imx6-dongle#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
imx6-dongle#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
imx6-dongle#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
imx6-dongle#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596

Daniel found a similar problem mentioned in
 http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html

And indeed this is the root cause : skb->cb[] contains data fooling IP
stack.

We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
is called. Or else skb->cb[] might contain garbage from GSO segmentation
layer.

A similar fix was tested on linux-3.9, but gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.

Many thanks to Daniel for providing reports, patches and testing !

Reported-by: Daniel Petre <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
The following backtrace is reported with CONFIG_PROVE_RCU:

    drivers/infiniband/hw/qib/qib_keys.c:64 suspicious rcu_dereference_check() usage!
    other info that might help us debug this:
    rcu_scheduler_active = 1, debug_locks = 1
    4 locks held by kworker/0:1/56:
    #0:  (events){.+.+.+}, at: [<ffffffff8107a4f5>] process_one_work+0x165/0x4a0
    #1:  ((&wfc.work)){+.+.+.}, at: [<ffffffff8107a4f5>] process_one_work+0x165/0x4a0
    imx6-dongle#2:  (device_mutex){+.+.+.}, at: [<ffffffffa0148dd8>] ib_register_device+0x38/0x220 [ib_core]
    imx6-dongle#3:  (&(&dev->lk_table.lock)->rlock){......}, at: [<ffffffffa017e81c>] qib_alloc_lkey+0x3c/0x1b0 [ib_qib]

    stack backtrace:
    Pid: 56, comm: kworker/0:1 Not tainted 3.10.0-rc1+ imx6-dongle#6
    Call Trace:
    [<ffffffff810c0b85>] lockdep_rcu_suspicious+0xe5/0x130
    [<ffffffffa017e8e1>] qib_alloc_lkey+0x101/0x1b0 [ib_qib]
    [<ffffffffa0184886>] qib_get_dma_mr+0xa6/0xd0 [ib_qib]
    [<ffffffffa01461aa>] ib_get_dma_mr+0x1a/0x50 [ib_core]
    [<ffffffffa01678dc>] ib_mad_port_open+0x12c/0x390 [ib_mad]
    [<ffffffff810c2c55>] ?  trace_hardirqs_on_caller+0x105/0x190
    [<ffffffffa0167b92>] ib_mad_init_device+0x52/0x110 [ib_mad]
    [<ffffffffa01917c0>] ?  sl2vl_attr_show+0x30/0x30 [ib_qib]
    [<ffffffffa0148f49>] ib_register_device+0x1a9/0x220 [ib_core]
    [<ffffffffa01b1685>] qib_register_ib_device+0x735/0xa40 [ib_qib]
    [<ffffffff8106ba98>] ? mod_timer+0x118/0x220
    [<ffffffffa017d425>] qib_init_one+0x1e5/0x400 [ib_qib]
    [<ffffffff812ce86e>] local_pci_probe+0x4e/0x90
    [<ffffffff81078118>] work_for_cpu_fn+0x18/0x30
    [<ffffffff8107a566>] process_one_work+0x1d6/0x4a0
    [<ffffffff8107a4f5>] ?  process_one_work+0x165/0x4a0
    [<ffffffff8107c9c9>] worker_thread+0x119/0x370
    [<ffffffff8107c8b0>] ?  manage_workers+0x180/0x180
    [<ffffffff8108294e>] kthread+0xee/0x100
    [<ffffffff81082860>] ?  __init_kthread_worker+0x70/0x70
    [<ffffffff815c04ac>] ret_from_fork+0x7c/0xb0
    [<ffffffff81082860>] ?  __init_kthread_worker+0x70/0x70

Per Documentation/RCU/lockdep-splat.txt, the code now uses rcu_access_pointer()
vs. rcu_dereference().

Reported-by: Jay Fenlason <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
In Steven Rostedt's words:

> I've been debugging the last couple of days why my tests have been
> locking up. One of my tracing tests, runs all available tracers. The
> lockup always happened with the mmiotrace, which is used to trace
> interactions between priority drivers and the kernel. But to do this
> easily, when the tracer gets registered, it disables all but the boot
> CPUs. The lockup always happened after it got done disabling the CPUs.
>
> Then I decided to try this:
>
> while :; do
> 	for i in 1 2 3; do
> 		echo 0 > /sys/devices/system/cpu/cpu$i/online
> 	done
> 	for i in 1 2 3; do
> 		echo 1 > /sys/devices/system/cpu/cpu$i/online
> 	done
> done
>
> Well, sure enough, that locked up too, with the same users. Doing a
> sysrq-w (showing all blocked tasks):
>
> [ 2991.344562]   task                        PC stack   pid father
> [ 2991.344562] rcu_preempt     D ffff88007986fdf8     0    10      2 0x00000000
> [ 2991.344562]  ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
> [ 2991.344562]  ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
> [ 2991.344562]  ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81541750>] schedule_timeout+0xbc/0xf9
> [ 2991.344562]  [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f
> [ 2991.344562]  [<ffffffff81049513>] ? cascade+0xa8/0xa8
> [ 2991.344562]  [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20
> [ 2991.344562]  [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b
> [ 2991.344562]  [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50
> [ 2991.344562]  [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64
> [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
> [ 2991.344562]  [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562] kworker/0:1     D ffffffff81a30680     0    47      2 0x00000000
> [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
> [ 2991.344562]  ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
> [ 2991.344562]  ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
> [ 2991.344562]  ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
> [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
> [ 2991.344562]  [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8
> [ 2991.344562]  [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a
> [ 2991.344562]  [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3
> [ 2991.344562]  [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3
> [ 2991.344562]  [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1
> [ 2991.344562]  [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1
> [ 2991.344562]  [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5
> [ 2991.344562]  [<ffffffff81059365>] ? rescuer_thread+0x332/0x332
> [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562] bash            D ffffffff81a4aa80     0  2618   2612 0x10000000
> [ 2991.344562]  ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
> [ 2991.344562]  ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
> [ 2991.344562]  ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
> [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
> [ 2991.344562]  [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53
> [ 2991.344562]  [<ffffffff81548912>] notifier_call_chain+0x6b/0x98
> [ 2991.344562]  [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10
> [ 2991.344562]  [<ffffffff8103cf64>] __cpu_notify+0x20/0x32
> [ 2991.344562]  [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36
> [ 2991.344562]  [<ffffffff815225de>] _cpu_down+0x154/0x259
> [ 2991.344562]  [<ffffffff81522710>] cpu_down+0x2d/0x3a
> [ 2991.344562]  [<ffffffff81526351>] store_online+0x4e/0xe7
> [ 2991.344562]  [<ffffffff8134d764>] dev_attr_store+0x20/0x22
> [ 2991.344562]  [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144
> [ 2991.344562]  [<ffffffff8114c5ef>] vfs_write+0xfd/0x158
> [ 2991.344562]  [<ffffffff8114c928>] SyS_write+0x5c/0x83
> [ 2991.344562]  [<ffffffff8154c494>] tracesys+0xdd/0xe2
>
> As well as held locks:
>
> [ 3034.728033] Showing all locks held in the system:
> [ 3034.728033] 1 lock held by rcu_preempt/10:
> [ 3034.728033]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b
> [ 3034.728033] 4 locks held by kworker/0:1/47:
> [ 3034.728033]  #0:  (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
> [ 3034.728033]  #1:  (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
> [ 3034.728033]  imx6-dongle#2:  (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a
> [ 3034.728033]  imx6-dongle#3:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
> [ 3034.728033] 1 lock held by mingetty/2563:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2565:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2569:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2572:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2575:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 7 locks held by bash/2618:
> [ 3034.728033]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c
> [ 3034.728033]  #1:  (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144
> [ 3034.728033]  imx6-dongle#2:  (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144
> [ 3034.728033]  imx6-dongle#3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19
> [ 3034.728033]  imx6-dongle#4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19
> [ 3034.728033]  imx6-dongle#5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d
> [ 3034.728033]  imx6-dongle#6:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
> [ 3034.728033] 1 lock held by bash/2980:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
>
> Things looked a little weird. Also, this is a deadlock that lockdep did
> not catch. But what we have here does not look like a circular lock
> issue:
>
> Bash is blocked in rcu_cpu_notify():
>
> 1961		/* Exclude any attempts to start a new grace period. */
> 1962		mutex_lock(&rsp->onoff_mutex);
>
>
> kworker is blocked in get_online_cpus(), which makes sense as we are
> currently taking down a CPU.
>
> But rcu_preempt is not blocked on anything. It is simply sleeping in
> rcu_gp_kthread (really rcu_gp_init) here:
>
> 1453	#ifdef CONFIG_PROVE_RCU_DELAY
> 1454			if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
> 1455			    system_state == SYSTEM_RUNNING)
> 1456				schedule_timeout_uninterruptible(2);
> 1457	#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
>
> And it does this while holding the onoff_mutex that bash is waiting for.
>
> Doing a function trace, it showed me where it happened:
>
> [  125.940066] rcu_pree-10      3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread
> [...]
> [  125.940066] rcu_pree-10      3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
>
> The watchdog ran, and then:
>
> [  125.940066] watchdog-38      3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
>
> Not sure what modprobe was doing, but shortly after that:
>
> [  125.940066] modprobe-2848    3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
>
> Where the migration thread took down the CPU:
>
> [  125.940066] migratio-40      3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
>
> which finally did:
>
> [  125.940066]   <idle>-0       3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry
> [  125.940066]   <idle>-0       3...1 28389282548: native_play_dead <-arch_cpu_idle_dead
> [  125.940066]   <idle>-0       3...1 28389282924: play_dead_common <-native_play_dead
> [  125.940066]   <idle>-0       3...1 28389283468: idle_task_exit <-play_dead_common
> [  125.940066]   <idle>-0       3...1 28389284644: amd_e400_remove_cpu <-play_dead_common
>
>
> CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still
> doing a schedule_timeout_uninterruptible() and it registered it's
> timeout to the timer base for CPU 3. You would think that it would get
> migrated right? The issue here is that the timer migration happens at
> the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
> CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
> held by the thread that just put itself into a uninterruptible sleep,
> that wont wake up until the CPU_DEAD notifier of the timer
> infrastructure is called, which wont happen until the rcu notifier
> finishes. Here's our deadlock!

This commit breaks this deadlock cycle by substituting a shorter udelay()
for the previous schedule_timeout_uninterruptible(), while at the same
time increasing the probability of the delay.  This maintains the intensity
of the testing.

Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Steven Rostedt <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
commit 5b879d78bc0818aa710f5d4d9abbfc2aca075cc3 upstream.

When running the LTP testsuite one may hit this kernel BUG() with the
write06 testcase:

kernel BUG at mm/filemap.c:2023!
CPU: 1 PID: 8614 Comm: writev01 Not tainted 3.10.0-rc7-64bit-c3000+ imx6-dongle#6
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401e6e84 00000000401e6e88
 IIR: 03ffe01f    ISR: 0000000010340000  IOR: 000001fbe0380820
 CPU:        1   CR30: 00000000bef80000 CR31: ffffffffffffffff
 ORIG_R28: 00000000bdc192c0
 IAOQ[0]: iov_iter_advance+0x3c/0xc0
 IAOQ[1]: iov_iter_advance+0x40/0xc0
 RP(r2): generic_file_buffered_write+0x204/0x3f0
Backtrace:
 [<00000000401e764c>] generic_file_buffered_write+0x204/0x3f0
 [<00000000401eab24>] __generic_file_aio_write+0x244/0x448
 [<00000000401eadc0>] generic_file_aio_write+0x98/0x150
 [<000000004024f460>] do_sync_readv_writev+0xc0/0x130
 [<000000004025037c>] compat_do_readv_writev+0x12c/0x340
 [<00000000402505f8>] compat_writev+0x68/0xa0
 [<0000000040251d88>] compat_SyS_writev+0x98/0xf8

Reason for this crash is a gcc miscompilation in the fault handlers of
pa_memcpy() which return the fault address instead of the copied bytes.
Since this seems to be a generic problem with gcc-4.7.x (and below), it's
better to simplify the fault handlers in pa_memcpy to avoid this problem.

Here is a simple reproducer for the problem:

int main(int argc, char **argv)
{
	int fd, nbytes;
	struct iovec wr_iovec[] = {
		{ "TEST STRING                     ",32},
		{ (char*)0x40005000,32} }; // random memory.
	fd = open(DATA_FILE, O_RDWR | O_CREAT, 0666);
	nbytes = writev(fd, wr_iovec, 2);
	printf("return value = %d, errno %d (%s)\n",
		nbytes, errno, strerror(errno));
	return 0;
}

In addition, John David Anglin wrote:
There is no gcc PR as pa_memcpy is not legitimate C code. There is an
implicit assumption that certain variables will contain correct values
when an exception occurs and the code randomly jumps to one of the
exception blocks.  There is no guarantee of this.  If a PR was filed, it
would likely be marked as invalid.

Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
commit ea3768b4386a8d1790f4cc9a35de4f55b92d6442 upstream.

We used to keep the port's char device structs and the /sys entries
around till the last reference to the port was dropped.  This is
actually unnecessary, and resulted in buggy behaviour:

1. Open port in guest
2. Hot-unplug port
3. Hot-plug a port with the same 'name' property as the unplugged one

This resulted in hot-plug being unsuccessful, as a port with the same
name already exists (even though it was unplugged).

This behaviour resulted in a warning message like this one:

-------------------8<---------------------------------------
WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
Hardware name: KVM
sysfs: cannot create duplicate filename
'/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1'

Call Trace:
 [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130
 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0
 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50
 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260
 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60
 [<ffffffff812734b4>] ? kobject_add+0x44/0x70
 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0
 [<ffffffff8134b389>] ? device_add+0xc9/0x650

-------------------8<---------------------------------------

Instead of relying on guest applications to release all references to
the ports, we should go ahead and unregister the port from all the core
layers.  Any open/read calls on the port will then just return errors,
and an unplug/plug operation on the host will succeed as expected.

This also caused buggy behaviour in case of the device removal (not just
a port): when the device was removed (which means all ports on that
device are removed automatically as well), the ports with active
users would clean up only when the last references were dropped -- and
it would be too late then to be referencing char device pointers,
resulting in oopses:

-------------------8<---------------------------------------
PID: 6162   TASK: ffff8801147ad500  CPU: 0   COMMAND: "cat"
 #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b
 #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322
 imx6-dongle#2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50
 imx6-dongle#3 [ffff88011b9d5bf0] die at ffffffff8100f26b
 imx6-dongle#4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2
 imx6-dongle#5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5
    [exception RIP: strlen+2]
    RIP: ffffffff81272ae2  RSP: ffff88011b9d5d00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880118901c18  RCX: 0000000000000000
    RDX: ffff88011799982c  RSI: 00000000000000d0  RDI: 3a303030302f3030
    RBP: ffff88011b9d5d38   R8: 0000000000000006   R9: ffffffffa0134500
    R10: 0000000000001000  R11: 0000000000001000  R12: ffff880117a1cc10
    R13: 00000000000000d0  R14: 0000000000000017  R15: ffffffff81aff700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 imx6-dongle#6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d
 imx6-dongle#7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551
 imx6-dongle#8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb
 imx6-dongle#9 [ffff88011b9d5de0] device_del at ffffffff813440c7

-------------------8<---------------------------------------

So clean up when we have all the context, and all that's left to do when
the references to the port have dropped is to free up the port struct
itself.

Reported-by: chayang <[email protected]>
Reported-by: YOGANANTH SUBRAMANIAN <[email protected]>
Reported-by: FuXiangChun <[email protected]>
Reported-by: Qunfang Zhang <[email protected]>
Reported-by: Sibiao Luo <[email protected]>
Signed-off-by: Amit Shah <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mtx512 pushed a commit to mtx512/linux-imx that referenced this issue Jan 9, 2014
commit 346ece0b7ba2730b4d633b9e371fe55488803102 upstream.

Bug 60815 - Interface hangs in mwifiex_usb
https://bugzilla.kernel.org/show_bug.cgi?id=60815

[ 2.883807] BUG: unable to handle kernel NULL pointer dereference
            at 0000000000000048
[ 2.883813] IP: [<ffffffff815a65e0>] pfifo_fast_enqueue+0x90/0x90

[ 2.883834] CPU: 1 PID: 3220 Comm: kworker/u8:90 Not tainted
            3.11.1-monotone-l0 imx6-dongle#6
[ 2.883834] Hardware name: Microsoft Corporation Surface with
            Windows 8 Pro/Surface with Windows 8 Pro,
            BIOS 1.03.0450 03/29/2013

On Surface Pro, suspend to ram gives a NULL pointer dereference in
pfifo_fast_enqueue(). The stack trace reveals that the offending
call is clearing carrier in mwifiex_usb suspend handler.

Since commit 1499d9f "mwifiex: don't drop carrier flag over suspend"
has removed the carrier flag handling over suspend/resume in SDIO
and PCIe drivers, I'm removing it in USB driver too. This also fixes
the bug for Surface Pro.

Tested-by: Dmitry Khromov <[email protected]>
Signed-off-by: Bing Zhao <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants