Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hello world #1

Open
pahapoika opened this issue Jun 20, 2017 · 3 comments
Open

Hello world #1

pahapoika opened this issue Jun 20, 2017 · 3 comments

Comments

@pahapoika
Copy link

Glad to see this already up, even tho it's placeholder atm.

I hope that you will keep your good track record in releasing kernel sources.

@xAphex
Copy link

xAphex commented Jun 20, 2017

same

@aviraxp
Copy link

aviraxp commented Jun 20, 2017

You guys may need translation software to read the kernel commits, but I am Chinese

Sent from my OnePlus ONE A2001 using FastHub

@yffengdong
Copy link

Hello World!

leonfish77101 pushed a commit that referenced this issue Jun 21, 2017
…age_cache_page()

commit 22f2ac5 upstream.

Antonio reports the following crash when using fuse under memory pressure:

  kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: all of them
  CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
  Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
  task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
  RIP: shadow_lru_isolate+0x181/0x190
  Call Trace:
    __list_lru_walk_one.isra.3+0x8f/0x130
    list_lru_walk_one+0x23/0x30
    scan_shadow_nodes+0x34/0x50
    shrink_slab.part.40+0x1ed/0x3d0
    shrink_zone+0x2ca/0x2e0
    kswapd+0x51e/0x990
    kthread+0xd8/0xf0
    ret_from_fork+0x3f/0x70

which corresponds to the following sanity check in the shadow node
tracking:

  BUG_ON(node->count & RADIX_TREE_COUNT_MASK);

The workingset code tracks radix tree nodes that exclusively contain
shadow entries of evicted pages in them, and this (somewhat obscure)
line checks whether there are real pages left that would interfere with
reclaim of the radix tree node under memory pressure.

While discussing ways how fuse might sneak pages into the radix tree
past the workingset code, Miklos pointed to replace_page_cache_page(),
and indeed there is a problem there: it properly accounts for the old
page being removed - __delete_from_page_cache() does that - but then
does a raw raw radix_tree_insert(), not accounting for the replacement
page.  Eventually the page count bits in node->count underflow while
leaving the node incorrectly linked to the shadow node LRU.

To address this, make sure replace_page_cache_page() uses the tracked
page insertion code, page_cache_tree_insert().  This fixes the page
accounting and makes sure page-containing nodes are properly unlinked
from the shadow node LRU again.

Also, make the sanity checks a bit less obscure by using the helpers for
checking the number of pages and shadows in a radix tree node.

[[email protected]: backport for 4.4]
Fixes: 449dd69 ("mm: keep page cache radix tree nodes in check")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Antonio SJ Musumeci <[email protected]>
Debugged-by: Miklos Szeredi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Git-commit: f84311d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Change-Id: Id2006db3dfc5091308d7dd51ffe2c2680949a47d
Signed-off-by: Vinayak Menon <[email protected]>
leonfish77101 pushed a commit that referenced this issue Jun 21, 2017
commit d3798ae upstream.

When the underflow checks were added to workingset_node_shadow_dec(),
they triggered immediately:

  kernel BUG at ./include/linux/swap.h:276!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
   soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
  CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60ba944 #1
  Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
  task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
  RIP: page_cache_tree_insert+0xf1/0x100
  Call Trace:
    __add_to_page_cache_locked+0x12e/0x270
    add_to_page_cache_lru+0x4e/0xe0
    mpage_readpages+0x112/0x1d0
    blkdev_readpages+0x1d/0x20
    __do_page_cache_readahead+0x1ad/0x290
    force_page_cache_readahead+0xaa/0x100
    page_cache_sync_readahead+0x3f/0x50
    generic_file_read_iter+0x5af/0x740
    blkdev_read_iter+0x35/0x40
    __vfs_read+0xe1/0x130
    vfs_read+0x96/0x130
    SyS_read+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x13/0x8f
  Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
  RIP  page_cache_tree_insert+0xf1/0x100

This is a long-standing bug in the way shadow entries are accounted in
the radix tree nodes. The shrinker needs to know when radix tree nodes
contain only shadow entries, no pages, so node->count is split in half
to count shadows in the upper bits and pages in the lower bits.

Unfortunately, the radix tree implementation doesn't know of this and
assumes all entries are in node->count. When there is a shadow entry
directly in root->rnode and the tree is later extended, the radix tree
implementation will copy that entry into the new node and and bump its
node->count, i.e. increases the page count bits. Once the shadow gets
removed and we subtract from the upper counter, node->count underflows
and triggers the warning. Afterwards, without node->count reaching 0
again, the radix tree node is leaked.

Limit shadow entries to when we have actual radix tree nodes and can
count them properly. That means we lose the ability to detect refaults
from files that had only the first page faulted in at eviction time.

Change-Id: I9846191d5dfdb72658ebcee8cfad1a89a814019d
Fixes: 449dd69 ("mm: keep page cache radix tree nodes in check")
Signed-off-by: Johannes Weiner <[email protected]>
Reported-and-tested-by: Linus Torvalds <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Git-commit: d3798ae
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Signed-off-by: Vinayak Menon <[email protected]>
nikhil18 referenced this issue in nikhil18/Lightning_kernel_oneplus5 Jul 16, 2017
generic/361 reports below warning, this is because: once, there is
someone entering into critical region of sbi.cp_lock, if write_end_io.
f2fs_stop_checkpoint is invoked from an triggered IRQ, we will encounter
deadlock.

So this patch changes to use spin_{,un}lock_irq{save,restore} to create
critical region without IRQ enabled to avoid potential deadlock.

 irq event stamp: 83391573
 loop: Write error at byte offset 438729728, length 1024.
 hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
 hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
 loop: Write error at byte offset 438860288, length 1536.
 softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
 loop: Write error at byte offset 438990848, length 2048.
 ================================
 WARNING: inconsistent lock state
 4.12.0-rc2+ OnePlusOSS#30 Tainted: G           O
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
 {HARDIRQ-ON-W} state was registered at:
   __lock_acquire+0x527/0x7b0
   lock_acquire+0xae/0x220
   _raw_spin_lock+0x42/0x50
   do_checkpoint+0x165/0x9e0 [f2fs]
   write_checkpoint+0x33f/0x740 [f2fs]
   __f2fs_sync_fs+0x92/0x1f0 [f2fs]
   f2fs_sync_fs+0x12/0x20 [f2fs]
   sync_filesystem+0x67/0x80
   generic_shutdown_super+0x27/0x100
   kill_block_super+0x22/0x50
   kill_f2fs_super+0x3a/0x40 [f2fs]
   deactivate_locked_super+0x3d/0x70
   deactivate_super+0x40/0x60
   cleanup_mnt+0x39/0x70
   __cleanup_mnt+0x10/0x20
   task_work_run+0x69/0x80
   exit_to_usermode_loop+0x57/0x85
   do_fast_syscall_32+0x18c/0x1b0
   entry_SYSENTER_32+0x4c/0x7b
 irq event stamp: 1957420
 hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
 hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
 softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&sbi->cp_lock)->rlock);
   <Interrupt>
     lock(&(&sbi->cp_lock)->rlock);

  *** DEADLOCK ***

 2 locks held by xfs_io/7959:
  #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
  #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]

 stack backtrace:
 CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ OnePlusOSS#30
 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 Call Trace:
  dump_stack+0x5f/0x92
  print_usage_bug+0x1d3/0x1dd
  ? check_usage_backwards+0xe0/0xe0
  mark_lock+0x23d/0x280
  __lock_acquire+0x699/0x7b0
  ? __this_cpu_preempt_check+0xf/0x20
  ? trace_hardirqs_off_caller+0x91/0xe0
  lock_acquire+0xae/0x220
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  _raw_spin_lock+0x42/0x50
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_write_end_io+0x147/0x150 [f2fs]
  bio_endio+0x7a/0x1e0
  blk_update_request+0xad/0x410
  blk_mq_end_request+0x16/0x60
  lo_complete_rq+0x3c/0x70
  __blk_mq_complete_request_remote+0x11/0x20
  flush_smp_call_function_queue+0x6d/0x120
  ? debug_smp_processor_id+0x12/0x20
  generic_smp_call_function_single_interrupt+0x12/0x30
  smp_call_function_single_interrupt+0x25/0x40
  call_function_single_interrupt+0x37/0x3c
 EIP: _raw_spin_unlock_irq+0x2d/0x50
 EFLAGS: 00000296 CPU: 2
 EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
 ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  ? inherit_task_group.isra.98.part.99+0x6b/0xb0
  __add_to_page_cache_locked+0x1d4/0x290
  add_to_page_cache_lru+0x38/0xb0
  pagecache_get_page+0x8e/0x200
  f2fs_write_begin+0x96/0xf00 [f2fs]
  ? trace_hardirqs_on_caller+0xdd/0x1c0
  ? current_time+0x17/0x50
  ? trace_hardirqs_on+0xb/0x10
  generic_perform_write+0xa9/0x170
  __generic_file_write_iter+0x1a2/0x1f0
  ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
  f2fs_file_write_iter+0x6e/0x140 [f2fs]
  ? __lock_acquire+0x429/0x7b0
  __vfs_write+0xc1/0x140
  vfs_write+0x9b/0x190
  SyS_pwrite64+0x63/0xa0
  do_fast_syscall_32+0xa1/0x1b0
  entry_SYSENTER_32+0x4c/0x7b
 EIP: 0xb7786c61
 EFLAGS: 00000293 CPU: 2
 EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
 ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Fixes: aaec2b1 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
Cc: [email protected]
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
nikhil18 referenced this issue in nikhil18/Lightning_kernel_oneplus5 Jul 17, 2017
generic/361 reports below warning, this is because: once, there is
someone entering into critical region of sbi.cp_lock, if write_end_io.
f2fs_stop_checkpoint is invoked from an triggered IRQ, we will encounter
deadlock.

So this patch changes to use spin_{,un}lock_irq{save,restore} to create
critical region without IRQ enabled to avoid potential deadlock.

 irq event stamp: 83391573
 loop: Write error at byte offset 438729728, length 1024.
 hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
 hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
 loop: Write error at byte offset 438860288, length 1536.
 softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
 loop: Write error at byte offset 438990848, length 2048.
 ================================
 WARNING: inconsistent lock state
 4.12.0-rc2+ OnePlusOSS#30 Tainted: G           O
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
 {HARDIRQ-ON-W} state was registered at:
   __lock_acquire+0x527/0x7b0
   lock_acquire+0xae/0x220
   _raw_spin_lock+0x42/0x50
   do_checkpoint+0x165/0x9e0 [f2fs]
   write_checkpoint+0x33f/0x740 [f2fs]
   __f2fs_sync_fs+0x92/0x1f0 [f2fs]
   f2fs_sync_fs+0x12/0x20 [f2fs]
   sync_filesystem+0x67/0x80
   generic_shutdown_super+0x27/0x100
   kill_block_super+0x22/0x50
   kill_f2fs_super+0x3a/0x40 [f2fs]
   deactivate_locked_super+0x3d/0x70
   deactivate_super+0x40/0x60
   cleanup_mnt+0x39/0x70
   __cleanup_mnt+0x10/0x20
   task_work_run+0x69/0x80
   exit_to_usermode_loop+0x57/0x85
   do_fast_syscall_32+0x18c/0x1b0
   entry_SYSENTER_32+0x4c/0x7b
 irq event stamp: 1957420
 hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
 hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
 softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&sbi->cp_lock)->rlock);
   <Interrupt>
     lock(&(&sbi->cp_lock)->rlock);

  *** DEADLOCK ***

 2 locks held by xfs_io/7959:
  #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
  #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]

 stack backtrace:
 CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ OnePlusOSS#30
 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 Call Trace:
  dump_stack+0x5f/0x92
  print_usage_bug+0x1d3/0x1dd
  ? check_usage_backwards+0xe0/0xe0
  mark_lock+0x23d/0x280
  __lock_acquire+0x699/0x7b0
  ? __this_cpu_preempt_check+0xf/0x20
  ? trace_hardirqs_off_caller+0x91/0xe0
  lock_acquire+0xae/0x220
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  _raw_spin_lock+0x42/0x50
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_write_end_io+0x147/0x150 [f2fs]
  bio_endio+0x7a/0x1e0
  blk_update_request+0xad/0x410
  blk_mq_end_request+0x16/0x60
  lo_complete_rq+0x3c/0x70
  __blk_mq_complete_request_remote+0x11/0x20
  flush_smp_call_function_queue+0x6d/0x120
  ? debug_smp_processor_id+0x12/0x20
  generic_smp_call_function_single_interrupt+0x12/0x30
  smp_call_function_single_interrupt+0x25/0x40
  call_function_single_interrupt+0x37/0x3c
 EIP: _raw_spin_unlock_irq+0x2d/0x50
 EFLAGS: 00000296 CPU: 2
 EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
 ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  ? inherit_task_group.isra.98.part.99+0x6b/0xb0
  __add_to_page_cache_locked+0x1d4/0x290
  add_to_page_cache_lru+0x38/0xb0
  pagecache_get_page+0x8e/0x200
  f2fs_write_begin+0x96/0xf00 [f2fs]
  ? trace_hardirqs_on_caller+0xdd/0x1c0
  ? current_time+0x17/0x50
  ? trace_hardirqs_on+0xb/0x10
  generic_perform_write+0xa9/0x170
  __generic_file_write_iter+0x1a2/0x1f0
  ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
  f2fs_file_write_iter+0x6e/0x140 [f2fs]
  ? __lock_acquire+0x429/0x7b0
  __vfs_write+0xc1/0x140
  vfs_write+0x9b/0x190
  SyS_pwrite64+0x63/0xa0
  do_fast_syscall_32+0xa1/0x1b0
  entry_SYSENTER_32+0x4c/0x7b
 EIP: 0xb7786c61
 EFLAGS: 00000293 CPU: 2
 EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
 ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Fixes: aaec2b1 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
Cc: [email protected]
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
nikhil18 referenced this issue in nikhil18/Lightning_kernel_oneplus5 Jul 17, 2017
generic/361 reports below warning, this is because: once, there is
someone entering into critical region of sbi.cp_lock, if write_end_io.
f2fs_stop_checkpoint is invoked from an triggered IRQ, we will encounter
deadlock.

So this patch changes to use spin_{,un}lock_irq{save,restore} to create
critical region without IRQ enabled to avoid potential deadlock.

 irq event stamp: 83391573
 loop: Write error at byte offset 438729728, length 1024.
 hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
 hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
 loop: Write error at byte offset 438860288, length 1536.
 softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
 loop: Write error at byte offset 438990848, length 2048.
 ================================
 WARNING: inconsistent lock state
 4.12.0-rc2+ OnePlusOSS#30 Tainted: G           O
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
 {HARDIRQ-ON-W} state was registered at:
   __lock_acquire+0x527/0x7b0
   lock_acquire+0xae/0x220
   _raw_spin_lock+0x42/0x50
   do_checkpoint+0x165/0x9e0 [f2fs]
   write_checkpoint+0x33f/0x740 [f2fs]
   __f2fs_sync_fs+0x92/0x1f0 [f2fs]
   f2fs_sync_fs+0x12/0x20 [f2fs]
   sync_filesystem+0x67/0x80
   generic_shutdown_super+0x27/0x100
   kill_block_super+0x22/0x50
   kill_f2fs_super+0x3a/0x40 [f2fs]
   deactivate_locked_super+0x3d/0x70
   deactivate_super+0x40/0x60
   cleanup_mnt+0x39/0x70
   __cleanup_mnt+0x10/0x20
   task_work_run+0x69/0x80
   exit_to_usermode_loop+0x57/0x85
   do_fast_syscall_32+0x18c/0x1b0
   entry_SYSENTER_32+0x4c/0x7b
 irq event stamp: 1957420
 hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
 hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
 softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&sbi->cp_lock)->rlock);
   <Interrupt>
     lock(&(&sbi->cp_lock)->rlock);

  *** DEADLOCK ***

 2 locks held by xfs_io/7959:
  #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
  #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]

 stack backtrace:
 CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ OnePlusOSS#30
 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 Call Trace:
  dump_stack+0x5f/0x92
  print_usage_bug+0x1d3/0x1dd
  ? check_usage_backwards+0xe0/0xe0
  mark_lock+0x23d/0x280
  __lock_acquire+0x699/0x7b0
  ? __this_cpu_preempt_check+0xf/0x20
  ? trace_hardirqs_off_caller+0x91/0xe0
  lock_acquire+0xae/0x220
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  _raw_spin_lock+0x42/0x50
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_write_end_io+0x147/0x150 [f2fs]
  bio_endio+0x7a/0x1e0
  blk_update_request+0xad/0x410
  blk_mq_end_request+0x16/0x60
  lo_complete_rq+0x3c/0x70
  __blk_mq_complete_request_remote+0x11/0x20
  flush_smp_call_function_queue+0x6d/0x120
  ? debug_smp_processor_id+0x12/0x20
  generic_smp_call_function_single_interrupt+0x12/0x30
  smp_call_function_single_interrupt+0x25/0x40
  call_function_single_interrupt+0x37/0x3c
 EIP: _raw_spin_unlock_irq+0x2d/0x50
 EFLAGS: 00000296 CPU: 2
 EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
 ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  ? inherit_task_group.isra.98.part.99+0x6b/0xb0
  __add_to_page_cache_locked+0x1d4/0x290
  add_to_page_cache_lru+0x38/0xb0
  pagecache_get_page+0x8e/0x200
  f2fs_write_begin+0x96/0xf00 [f2fs]
  ? trace_hardirqs_on_caller+0xdd/0x1c0
  ? current_time+0x17/0x50
  ? trace_hardirqs_on+0xb/0x10
  generic_perform_write+0xa9/0x170
  __generic_file_write_iter+0x1a2/0x1f0
  ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
  f2fs_file_write_iter+0x6e/0x140 [f2fs]
  ? __lock_acquire+0x429/0x7b0
  __vfs_write+0xc1/0x140
  vfs_write+0x9b/0x190
  SyS_pwrite64+0x63/0xa0
  do_fast_syscall_32+0xa1/0x1b0
  entry_SYSENTER_32+0x4c/0x7b
 EIP: 0xb7786c61
 EFLAGS: 00000293 CPU: 2
 EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
 ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Fixes: aaec2b1 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
Cc: [email protected]
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Jul 22, 2017
commit f630c38 upstream.

When destroying a VRF device we cleanup the slaves in its ndo_uninit()
function, but that causes packets to be switched (skb->dev == vrf being
destroyed) even though we're pass the point where the VRF should be
receiving any packets while it is being dismantled. This causes a BUG_ON
to trigger if we have raw sockets (trace below).
The reason is that the inetdev of the VRF has been destroyed but we're
still sending packets up the stack with it, so let's free the slaves in
the dellink callback as David Ahern suggested.

Note that this fix doesn't prevent packets from going up when the VRF
device is admin down.

[   35.631371] ------------[ cut here ]------------
[   35.631603] kernel BUG at net/ipv4/fib_frontend.c:285!
[   35.631854] invalid opcode: 0000 [OnePlusOSS#1] SMP
[   35.631977] Modules linked in:
[   35.632081] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.12.0-rc7+ #45
[   35.632247] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   35.632477] task: ffff88005ad68000 task.stack: ffff88005ad64000
[   35.632632] RIP: 0010:fib_compute_spec_dst+0xfc/0x1ee
[   35.632769] RSP: 0018:ffff88005ad67978 EFLAGS: 00010202
[   35.632910] RAX: 0000000000000001 RBX: ffff880059a7f200 RCX: 0000000000000000
[   35.633084] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff82274af0
[   35.633256] RBP: ffff88005ad679f8 R08: 000000000001ef70 R09: 0000000000000046
[   35.633430] R10: ffff88005ad679f8 R11: ffff880037731cb0 R12: 0000000000000001
[   35.633603] R13: ffff8800599e3000 R14: 0000000000000000 R15: ffff8800599cb852
[   35.634114] FS:  0000000000000000(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000
[   35.634306] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   35.634456] CR2: 00007f3563227095 CR3: 000000000201d000 CR4: 00000000000406e0
[   35.634632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   35.634865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   35.635055] Call Trace:
[   35.635271]  ? __lock_acquire+0xf0d/0x1117
[   35.635522]  ipv4_pktinfo_prepare+0x82/0x151
[   35.635831]  raw_rcv_skb+0x17/0x3c
[   35.636062]  raw_rcv+0xe5/0xf7
[   35.636287]  raw_local_deliver+0x169/0x1d9
[   35.636534]  ip_local_deliver_finish+0x87/0x1c4
[   35.636820]  ip_local_deliver+0x63/0x7f
[   35.637058]  ip_rcv_finish+0x340/0x3a1
[   35.637295]  ip_rcv+0x314/0x34a
[   35.637525]  __netif_receive_skb_core+0x49f/0x7c5
[   35.637780]  ? lock_acquire+0x13f/0x1d7
[   35.638018]  ? lock_acquire+0x15e/0x1d7
[   35.638259]  __netif_receive_skb+0x1e/0x94
[   35.638502]  ? __netif_receive_skb+0x1e/0x94
[   35.638748]  netif_receive_skb_internal+0x74/0x300
[   35.639002]  ? dev_gro_receive+0x2ed/0x411
[   35.639246]  ? lock_is_held_type+0xc4/0xd2
[   35.639491]  napi_gro_receive+0x105/0x1a0
[   35.639736]  receive_buf+0xc32/0xc74
[   35.639965]  ? detach_buf+0x67/0x153
[   35.640201]  ? virtqueue_get_buf_ctx+0x120/0x176
[   35.640453]  virtnet_poll+0x128/0x1c5
[   35.640690]  net_rx_action+0x103/0x343
[   35.640932]  __do_softirq+0x1c7/0x4b7
[   35.641171]  run_ksoftirqd+0x23/0x5c
[   35.641403]  smpboot_thread_fn+0x24f/0x26d
[   35.641646]  ? sort_range+0x22/0x22
[   35.641878]  kthread+0x129/0x131
[   35.642104]  ? __list_add+0x31/0x31
[   35.642335]  ? __list_add+0x31/0x31
[   35.642568]  ret_from_fork+0x2a/0x40
[   35.642804] Code: 05 bd 87 a3 00 01 e8 1f ef 98 ff 4d 85 f6 48 c7 c7 f0 4a 27 82 41 0f 94 c4 31 c9 31 d2 41 0f b6 f4 e8 04 71 a1 ff 45 84 e4 74 02 <0f> 0b 0f b7 93 c4 00 00 00 4d 8b a5 80 05 00 00 48 03 93 d0 00
[   35.644342] RIP: fib_compute_spec_dst+0xfc/0x1ee RSP: ffff88005ad67978

Fixes: 193125d ("net: Introduce VRF device driver")
Reported-by: Chris Cormier <[email protected]>
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[backport to 4.4 - gregkh]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Jul 22, 2017
We use IPI instead of NMI in ARM64 to get backtrace of all CPUs.
However, smp_processor_id expects preempt disabled, and this leads to
a kernel BUG seen as below:

c3  19299 SysRq : Show backtrace of all active CPUs
c3  19299 BUG: using smp_processor_id() in preemptible [00000000] code: sh/19299
c3  19299 caller is debug_smp_processor_id+0x1c/0x24
c3  19299 CPU: 3 PID: 19299 Comm: sh Tainted: G        W      3.18.31-gb1be511 OnePlusOSS#1
c3  19299 Hardware name: HTC Corporation. MSM8996pro + PMI8996 Sailfish XB (DT)
c3  19299 Call trace:
c3  19299 [<ffffffc00008aab4>] dump_backtrace+0x0/0x23c
c3  19299 [<ffffffc00008ad04>] show_stack+0x14/0x1c
c3  19299 [<ffffffc000e29f38>] dump_stack+0x80/0xa4
c3  19299 [<ffffffc0003147a4>] check_preemption_disabled+0xe0/0x104
c3  19299 [<ffffffc0003147e4>] debug_smp_processor_id+0x1c/0x24
c3  19299 [<ffffffc000092b24>] arch_trigger_all_cpu_backtrace+0x18/0xdc
c3  19299 [<ffffffc00048dd64>] sysrq_handle_showallcpus+0xc/0x14
c3  19299 [<ffffffc00048e33c>] __handle_sysrq+0x9c/0x154
c3  19299 [<ffffffc00048e724>] write_sysrq_trigger+0x54/0x68
c3  19299 [<ffffffc0001f9c54>] proc_reg_write+0x68/0x84
c3  19299 [<ffffffc0001a6e0c>] vfs_write+0xbc/0x18c
c3  19299 [<ffffffc0001a6fd4>] SyS_write+0x54/0xa4
c3  19299 Backtrace for cpu 3 (current):
c3  19299 CPU: 3 PID: 19299 Comm: sh Tainted: G        W      3.18.31-gb1be511 OnePlusOSS#1
c3  19299 Hardware name: HTC Corporation. MSM8996pro + PMI8996 Sailfish XB (DT)
c3  19299 Call trace:
c3  19299 [<ffffffc00008aab4>] dump_backtrace+0x0/0x23c
c3  19299 [<ffffffc00008ad04>] show_stack+0x14/0x1c
c3  19299 [<ffffffc000e29f38>] dump_stack+0x80/0xa4
c3  19299 [<ffffffc000092b70>] arch_trigger_all_cpu_backtrace+0x64/0xdc
c3  19299 [<ffffffc00048dd64>] sysrq_handle_showallcpus+0xc/0x14
c3  19299 [<ffffffc00048e33c>] __handle_sysrq+0x9c/0x154
c3  19299 [<ffffffc00048e724>] write_sysrq_trigger+0x54/0x68
c3  19299 [<ffffffc0001f9c54>] proc_reg_write+0x68/0x84
c3  19299 [<ffffffc0001a6e0c>] vfs_write+0xbc/0x18c
c3  19299 [<ffffffc0001a6fd4>] SyS_write+0x54/0xa4
c3  19299
c3  19299 sending IPI to all other CPUs:
c3  19299 BUG: using smp_processor_id() in preemptible [00000000] code: sh/19299
c3  19299 caller is debug_smp_processor_id+0x1c/0x24
c3  19299 CPU: 3 PID: 19299 Comm: sh Tainted: G        W      3.18.31-gb1be511 OnePlusOSS#1
c3  19299 Hardware name: HTC Corporation. MSM8996pro + PMI8996 Sailfish XB (DT)
c3  19299 Call trace:
c3  19299 [<ffffffc00008aab4>] dump_backtrace+0x0/0x23c
c3  19299 [<ffffffc00008ad04>] show_stack+0x14/0x1c
c3  19299 [<ffffffc000e29f38>] dump_stack+0x80/0xa4
c3  19299 [<ffffffc0003147a4>] check_preemption_disabled+0xe0/0x104
c3  19299 [<ffffffc0003147e4>] debug_smp_processor_id+0x1c/0x24
c3  19299 [<ffffffc000327d24>] gic_raise_softirq+0x128/0x158
c3  19299 [<ffffffc000092828>] smp_cross_call_common+0x80/0x94
c3  19299 [<ffffffc000092b94>] arch_trigger_all_cpu_backtrace+0x88/0xdc
c3  19299 [<ffffffc00048dd64>] sysrq_handle_showallcpus+0xc/0x14
c3  19299 [<ffffffc00048e33c>] __handle_sysrq+0x9c/0x154
c3  19299 [<ffffffc00048e724>] write_sysrq_trigger+0x54/0x68
c3  19299 [<ffffffc0001f9c54>] proc_reg_write+0x68/0x84
c3  19299 [<ffffffc0001a6e0c>] vfs_write+0xbc/0x18c
c3  19299 [<ffffffc0001a6fd4>] SyS_write+0x54/0xa4

Bug: 35409577
Bug: 34193533
Test: on marlin, trigger sysrq
Change-Id: I0239f388c8ec63f10fc73a32bf8a7deddc408104
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Francisco Franco <[email protected]>
Signed-off-by: Joe Maples <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Jul 30, 2017
commit c46fc04 upstream.

Zorro reported following crash while having enabled
syscall tracing (CONFIG_FTRACE_SYSCALLS):

  Unable to handle kernel pointer dereference at virtual ...
  Oops: 0011 [OnePlusOSS#1] SMP DEBUG_PAGEALLOC

  SNIP

  Call Trace:
  ([<000000000024d79c>] ftrace_syscall_enter+0xec/0x1d8)
   [<00000000001099c6>] do_syscall_trace_enter+0x236/0x2f8
   [<0000000000730f1c>] sysc_tracesys+0x1a/0x32
   [<000003fffcf946a2>] 0x3fffcf946a2
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<000000000022dd44>] rb_event_data+0x34/0x40
  ---[ end trace 8c795f86b1b3f7b9 ]---

The crash happens in syscall_get_arguments function for
syscalls with zero arguments, that will try to access
first argument (args[0]) in event entry, but it's not
allocated.

Bail out of there are no arguments.

Reported-by: Zorro Lang <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Jul 30, 2017
commit cdea465 upstream.

A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproduceable
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.

crash> bt
PID: 0      TASK: ffff88017c70ce70  CPU: 5   COMMAND: "swapper/5"
 #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
 OnePlusOSS#1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
 OnePlusOSS#2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
 OnePlusOSS#3 [ffff88085c143c10] oops_end at ffffffff8168ef88
 OnePlusOSS#4 [ffff88085c143c38] no_context at ffffffff8167ebb3
 OnePlusOSS#5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
 OnePlusOSS#6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
 OnePlusOSS#7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
 OnePlusOSS#8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
 OnePlusOSS#9 [ffff88085c143d70] page_fault at ffffffff8168e188
    [exception RIP: unknown or invalid address]
    RIP: ffffffffa053c800  RSP: ffff88085c143e28  RFLAGS: 00010206
    RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
    RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
    RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
    R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
    R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <IRQ stack>
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514192  RSP: ffff88017c72be50  RFLAGS: 00000202
    RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
    RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
    R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
    R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018

This is the corresponding stack trace

It has crashed because the area pointed with RIP extracted from timer
element is already removed during a shutdown process.

The function is smi_timeout().

And we think ffff8810588b5a00 in RDX is a parameter struct smi_info

crash> rd ffff8810588b5a00 20
ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
ffff8810588b5a50:  0000000000000000 0000000100000001   ................
ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
ffff8810588b5a90:  0000000500000002 0000000000000000   ................

Unfortunately the top of this area is already detroyed by someone.
But because of two reasonns we think this is struct smi_info
 1) The address included in between  ffff8810588b5a70 and ffff8810588b5a80:
  are inside of ipmi_si_intf.c  see crash> module ffff88085779d2c0

 2) We've found the area which point this.
  It is offset 0x68 of  ffff880859df4000

crash> rd  ffff880859df4000 100
ffff880859df4000:  0000000000000000 0000000000000001   ................
ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
ffff880859df4020:  ffff880859df4020 ffff880859df4020    @.Y.... @.Y....
ffff880859df4030:  0000000000000002 0000000000100010   ................
ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
ffff880859df4050:  0000000000000000 0000000000000000   ................
ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
ffff880859df4070:  0000000000000001 ffff880859df4078   [email protected]....

 If we regards it as struct ipmi_smi in shutdown process
 it looks consistent.

The remedy for this apparent race is affixed below.

Signed-off-by: Tony Camuso <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

This was first introduced in 7ea0ed2 ipmi: Make the
message handler easier to use for SMI interfaces
where some code was moved outside of the rcu_read_lock()
and the lock was not added.

Signed-off-by: Corey Minyard <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Jul 30, 2017
commit 340d394 upstream.

The driver checks port->exists twice in i8042_interrupt(), first when
trying to assign temporary "serio" variable, and second time when deciding
whether it should call serio_interrupt(). The value of port->exists may
change between the 2 checks, and we may end up calling serio_interrupt()
with a NULL pointer:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
IP: [<ffffffff8150feaf>] _spin_lock_irqsave+0x1f/0x40
PGD 0
Oops: 0002 [OnePlusOSS#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Not tainted 2.6.32-358.el6.x86_64 OnePlusOSS#1 QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:[<ffffffff8150feaf>]  [<ffffffff8150feaf>] _spin_lock_irqsave+0x1f/0x40
RSP: 0018:ffff880028203cc0  EFLAGS: 00010082
RAX: 0000000000010000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000282 RSI: 0000000000000098 RDI: 0000000000000050
RBP: ffff880028203cc0 R08: ffff88013e79c000 R09: ffff880028203ee0
R10: 0000000000000298 R11: 0000000000000282 R12: 0000000000000050
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000098
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000050 CR3: 0000000001a85000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88013e79c000, task ffff88013e79b500)
Stack:
ffff880028203d00 ffffffff813de186 ffffffffffffff02 0000000000000000
<d> 0000000000000000 0000000000000000 0000000000000000 0000000000000098
<d> ffff880028203d70 ffffffff813e0162 ffff880028203d20 ffffffff8103b8ac
Call Trace:
<IRQ>
 [<ffffffff813de186>] serio_interrupt+0x36/0xa0
[<ffffffff813e0162>] i8042_interrupt+0x132/0x3a0
[<ffffffff8103b8ac>] ? kvm_clock_read+0x1c/0x20
[<ffffffff8103b8b9>] ? kvm_clock_get_cycles+0x9/0x10
[<ffffffff810e1640>] handle_IRQ_event+0x60/0x170
[<ffffffff8103b154>] ? kvm_guest_apic_eoi_write+0x44/0x50
[<ffffffff810e3d8e>] handle_edge_irq+0xde/0x180
[<ffffffff8100de89>] handle_irq+0x49/0xa0
[<ffffffff81516c8c>] do_IRQ+0x6c/0xf0
[<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
[<ffffffff81076f63>] ? __do_softirq+0x73/0x1e0
[<ffffffff8109b75b>] ? hrtimer_interrupt+0x14b/0x260
[<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
[<ffffffff8100de05>] ? do_softirq+0x65/0xa0
[<ffffffff81076d95>] ? irq_exit+0x85/0x90
[<ffffffff81516d80>] ? smp_apic_timer_interrupt+0x70/0x9b
[<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20

To avoid the issue let's change the second check to test whether serio is
NULL or not.

Also, let's take i8042_lock in i8042_start() and i8042_stop() instead of
trying to be overly smart and using memory barriers.

Signed-off-by: Chen Hong <[email protected]>
[dtor: take lock in i8042_start()/i8042_stop()]
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
hellobbn pushed a commit to hellobbn/android_kernel_oneplus_msm8998 that referenced this issue Aug 6, 2017
lkp-robot reported a BUG:
[   10.151226] BUG: unable to handle kernel NULL pointer dereference at 00000198
[   10.152525] IP: rt6_fill_node+0x164/0x4b8
[   10.153307] *pdpt = 0000000012ee5001 *pde = 0000000000000000
[   10.153309]
[   10.154492] Oops: 0000 [OnePlusOSS#1]
[   10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70ee162-dirty OnePlusOSS#10
[   10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   10.158254] task: d0deb000 task.stack: d0e0c000
[   10.159059] EIP: rt6_fill_node+0x164/0x4b8
[   10.159780] EFLAGS: 00010296 CPU: 0
[   10.160404] EAX: 00000000 EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[   10.161469] ESI: 00000000 EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[   10.162534]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[   10.163482] CR0: 80050033 CR2: 00000198 CR3: 10d94660 CR4: 000006b0
[   10.164535] Call Trace:
[   10.164993]  ? paravirt_sched_clock+0x9/0xd
[   10.165727]  ? sched_clock+0x9/0xc
[   10.166329]  ? sched_clock_cpu+0x19/0xe9
[   10.166991]  ? lock_release+0x13e/0x36c
[   10.167652]  rt6_dump_route+0x4c/0x56
[   10.168276]  fib6_dump_node+0x1d/0x3d
[   10.168913]  fib6_walk_continue+0xab/0x167
[   10.169611]  fib6_walk+0x2a/0x40
[   10.170182]  inet6_dump_fib+0xfb/0x1e0
[   10.170855]  netlink_dump+0xcd/0x21f

This happens when the loopback device is set down and a ipv6 fib route
dump is requested.

ip6_null_entry is the root of all ipv6 fib tables making it integrated
into the table and hence passed to the ipv6 route dump code. The
null_entry route uses the loopback device for dst.dev but may not have
rt6i_idev set because of the order in which initializations are done --
ip6_route_net_init is run before addrconf_init has initialized the
loopback device. Fixing the initialization order is a much bigger problem
with no obvious solution thus far.

The BUG is triggered when the loopback is set down and the netif_running
check added by a1a22c1 fails. The fill_node descends to checking
rt->rt6i_idev for ignore_routes_with_linkdown and since rt6i_idev is
NULL it faults.

The null_entry route should not be processed in a dump request. Catch
and ignore. This check is done in rt6_dump_route as it is the highest
place in the callchain with knowledge of both the route and the network
namespace.

CRs-Fixed: 1065581
Change-Id: Iaf7265f3a83ecba1a3c4f90ec3288b296d918c5a
Fixes: a1a22c1("net: ipv6: Keep nexthop of multipath route on admin down")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Git-commit: 1f17e2f
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Aug 7, 2017
commit 18ae68f upstream.

WMI ops wrappers did not properly check for null
function pointers for spectral scan. This caused
null dereference crash with WMI-TLV based firmware
which doesn't implement spectral scan.

The crash could be triggered with:

  ip link set dev wlan0 up
  echo background > /sys/kernel/debug/ieee80211/phy0/ath10k/spectral_scan_ctl

The crash looked like this:

  [  168.031989] BUG: unable to handle kernel NULL pointer dereference at           (null)
  [  168.037406] IP: [<          (null)>]           (null)
  [  168.040395] PGD cdd4067 PUD fa0f067 PMD 0
  [  168.043303] Oops: 0010 [OnePlusOSS#1] SMP
  [  168.045377] Modules linked in: ath10k_pci(O) ath10k_core(O) ath mac80211 cfg80211 [last unloaded: cfg80211]
  [  168.051560] CPU: 1 PID: 1380 Comm: bash Tainted: G        W  O    4.8.0 #78
  [  168.054336] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
  [  168.059183] task: ffff88000c460c00 task.stack: ffff88000d4bc000
  [  168.061736] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
  ...
  [  168.100620] Call Trace:
  [  168.101910]  [<ffffffffa03b9566>] ? ath10k_spectral_scan_config+0x96/0x200 [ath10k_core]
  [  168.104871]  [<ffffffff811386e2>] ? filemap_fault+0xb2/0x4a0
  [  168.106696]  [<ffffffffa03b97e6>] write_file_spec_scan_ctl+0x116/0x280 [ath10k_core]
  [  168.109618]  [<ffffffff812da3a1>] full_proxy_write+0x51/0x80
  [  168.111443]  [<ffffffff811957b8>] __vfs_write+0x28/0x120
  [  168.113090]  [<ffffffff812f1a2d>] ? security_file_permission+0x3d/0xc0
  [  168.114932]  [<ffffffff8109b912>] ? percpu_down_read+0x12/0x60
  [  168.116680]  [<ffffffff811965f8>] vfs_write+0xb8/0x1a0
  [  168.118293]  [<ffffffff81197966>] SyS_write+0x46/0xa0
  [  168.119912]  [<ffffffff818f2972>] entry_SYSCALL_64_fastpath+0x1a/0xa4
  [  168.121737] Code:  Bad RIP value.
  [  168.123318] RIP  [<          (null)>]           (null)

Signed-off-by: Michal Kazior <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Amit Pundir <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Aug 9, 2017
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.

  general protection fault: 0000 [OnePlusOSS#1] SMP DEBUG_PAGEALLOC KASAN
  Modules linked in:
  CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: writeback wb_workfn (flush-11:0)
  task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
  RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
  locked_inode_to_wb_and_lock_list+0xa2/0x750
  RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
  RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
  RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
  R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
  R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
  FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
  DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
  Stack:
   ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
   ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
   ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
  Call Trace:
   [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
   [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
   [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
   [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
   [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
   [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
   [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
   [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
   [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
   [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
  Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
  00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
  80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
  RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
  RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
  fs/fs-writeback.c:281
   RSP <ffff88006cdaf7d0>
  ---[ end trace 986a4d314dcb2694 ]---

Fix the problem by making sure __writeback_single_inode() writes inode
only with dirty times in WB_SYNC_ALL mode.

Reported-by: Dmitry Vyukov <[email protected]>
Tested-by: Laurent Dufour <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Francisco Franco <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Aug 12, 2017
commit 337c017 upstream.

 WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0
 CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ OnePlusOSS#1
 RIP: 0010:rcu_note_context_switch+0x207/0x6b0
 Call Trace:
  __schedule+0xda/0xba0
  ? kvm_async_pf_task_wait+0x1b2/0x270
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30
 RIP: 0010:__d_lookup_rcu+0x90/0x1e0

I encounter this when trying to stress the async page fault in L1 guest w/
L2 guests running.

Commit 9b132fb (Add rcu user eqs exception hooks for async page
fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu
idle eqs when needed, to protect the code that needs use rcu.  However,
we need to call the pair even if the function calls schedule(), as seen
from the above backtrace.

This patch fixes it by informing the RCU subsystem exit/enter the irq
towards/away from idle for both n.halted and !n.halted.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Aug 14, 2017
[ Upstream commit 2dda640 ]

syzkaller was able to trigger a divide by 0 in TCP stack [1]

Issue here is that keepalive timer needs to be updated to not attempt
to send a probe if the connection setup was deferred using
TCP_FASTOPEN_CONNECT socket option added in linux-4.11

[1]
 divide error: 0000 [OnePlusOSS#1] SMP
 CPU: 18 PID: 0 Comm: swapper/18 Not tainted
 task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000
 RIP: 0010:[<ffffffff8409cc0d>]  [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160
 Call Trace:
  <IRQ>
  [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20
  [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0
  [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160
  [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230
  [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0
  [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280
  [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257
  [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0
  [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80
  [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90
  <EOI>
  [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0
  [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0

Tested:

Following packetdrill no longer crashes the kernel

`echo 0 >/proc/sys/net/ipv4/tcp_timestamps`

// Cache warmup: send a Fast Open cookie request
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
   +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
   +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
 +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop>
   +0 > . 1:1(0) ack 1
   +0 close(3) = 0
   +0 > F. 1:1(0) ack 1
   +0 < F. 1:1(0) ack 2 win 92
   +0 > .  2:2(0) ack 2

   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
   +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
   +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
 +.01 connect(4, ..., ...) = 0
   +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
   +10 close(4) = 0

`echo 1 >/proc/sys/net/ipv4/tcp_timestamps`

Fixes: 19f6d3f ("net/tcp-fastopen: Add new API support")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Cc: Wei Wang <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Sep 2, 2017
[ Upstream commit 36f41f8 ]

pfkey_broadcast() might be called from non process contexts,
we can not use GFP_KERNEL in these cases [1].

This patch partially reverts commit ba51b6b ("net: Fix RCU splat in
af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
section.

[1] : syzkaller reported :

in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
3 locks held by syzkaller183439/2932:
 #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
 OnePlusOSS#1:  (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
 OnePlusOSS#2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
 OnePlusOSS#2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ OnePlusOSS#24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
 __might_sleep+0x95/0x190 kernel/sched/core.c:5947
 slab_pre_alloc_hook mm/slab.h:416 [inline]
 slab_alloc mm/slab.c:3383 [inline]
 kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
 skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
 pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
 pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
 dump_sp+0x3d6/0x500 net/key/af_key.c:2685
 xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
 pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
 pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
 pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
 pfkey_process+0x606/0x710 net/key/af_key.c:2814
 pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 ___sys_sendmsg+0x755/0x890 net/socket.c:2035
 __sys_sendmsg+0xe5/0x210 net/socket.c:2069
 SYSC_sendmsg net/socket.c:2080 [inline]
 SyS_sendmsg+0x2d/0x50 net/socket.c:2076
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x445d79
RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000

Fixes: ba51b6b ("net: Fix RCU splat in af_key")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Cc: David Ahern <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jancsessz pushed a commit to jancsessz/oneplus5_kernel that referenced this issue Sep 2, 2017
[ Upstream commit 348a400 ]

In fib6_add(), it is possible that fib6_add_1() picks an intermediate
node and sets the node's fn->leaf to NULL in order to add this new
route. However, if fib6_add_rt2node() fails to add the new
route for some reason, fn->leaf will be left as NULL and could
potentially cause crash when fn->leaf is accessed in fib6_locate().
This patch makes sure fib6_repair_tree() is called to properly repair
fn->leaf in the above failure case.

Here is the syzkaller reported general protection fault in fib6_locate:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [OnePlusOSS#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
RSP: 0018:ffff8801d01a36a8  EFLAGS: 00010202
RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
FS:  00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
 ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
 ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
Call Trace:
 [<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
 [<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
 [<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
 [<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
 [<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
 [<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
 [<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
 [<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
 [<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
 [<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
 [<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
 [<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
 [<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
 [<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
 [<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
 [<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
 [<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17

Note: there is no "Fixes" tag as this seems to be a bug introduced
very early.

Signed-off-by: Wei Wang <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit ecdd095 upstream.

Inside set_status, transfer need to setup again, so
we have to drain IO before the transition, otherwise
oops may be triggered like the following:

	divide error: 0000 [OnePlusOSS#1] SMP KASAN
	CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ #213
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
	01/01/2011
	task: ffff88006ba1e840 task.stack: ffff880067338000
	RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110
	RSP: 0018:ffff88006733f108 EFLAGS: 00010246
	RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059
	RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08
	RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000
	R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000
	R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000
	FS:  0000000000000000(0000) GS:ffff88006d000000(0000)
	knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0
	Call Trace:
	 lo_do_transfer drivers/block/loop.c:251 [inline]
	 lo_read_transfer drivers/block/loop.c:392 [inline]
	 do_req_filebacked drivers/block/loop.c:541 [inline]
	 loop_handle_cmd drivers/block/loop.c:1677 [inline]
	 loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689
	 kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630
	 kthread+0x326/0x3f0 kernel/kthread.c:227
	 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
	Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08
	84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7>
	7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83
	RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP:
	ffff88006733f108
	---[ end trace 0166f7bd3b0c0933 ]---

Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit 9cf6cdb upstream.

Fixes a regression triggered by a change in the layout of
struct iio_chan_spec, but the real bug is in the driver which assumed
a specific structure layout in the first place. Hint: the two bits were
not OR:ed together as implied by the indentation prior to this patch,
there was a comma between them, which accidentally moved the ..._SCALE
bit to the next structure field. That field was .info_mask_shared_by_type
before the _available attributes was added by commit 5123960
("iio:core: add a callback to allow drivers to provide _available
attributes") and .info_mask_separate_available afterwards, and the
regression happened.

info_mask_shared_by_type is actually a better choice than the originally
intended info_mask_separate for the ..._SCALE bit since a constant is
returned from mpl3115_read_raw for the scale. Using
info_mask_shared_by_type also preserves the behavior from before the
regression and is therefore less likely to cause other interesting side
effects.

The above mentioned regression causes an unintended sysfs attibute to
show up that is not backed by code, in turn causing the following NULL
pointer defererence to happen on access.

Segmentation fault

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ecc3c000
[00000000] *pgd=87f91831
Internal error: Oops: 80000007 [OnePlusOSS#1] SMP ARM
Modules linked in:
CPU: 1 PID: 1051 Comm: cat Not tainted 4.10.0-rc5-00009-gffd8858-dirty OnePlusOSS#3
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
task: ed54ec00 task.stack: ee2bc000
PC is at 0x0
LR is at iio_read_channel_info_avail+0x40/0x280
pc : [<00000000>]    lr : [<c06fbc1c>]    psr: a0070013
sp : ee2bdda8  ip : 00000000  fp : ee2bddf4
r10: c0a53c74  r9 : ed79f000  r8 : ee8d1018
r7 : 00001000  r6 : 00000fff  r5 : ee8b9a00  r4 : ed79f000
r3 : ee2bddc4  r2 : ee2bddbc  r1 : c0a86dcc  r0 : ee8d1000
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 3cc3c04a  DAC: 00000051
Process cat (pid: 1051, stack limit = 0xee2bc210)
Stack: (0xee2bdda8 to 0xee2be000)
dda0:                   ee2bddc0 00000002 c016d720 c016d394 ed54ec00 00000000
ddc0: 60070013 ed413780 00000001 edffd480 ee8b9a00 00000fff 00001000 ee8d1018
dde0: ed79f000 c0a53c74 ee2bde0c ee2bddf8 c0513c58 c06fbbe8 edffd480 edffd540
de00: ee2bde3c ee2bde10 c0293474 c0513c40 c02933e4 ee2bde60 00000001 ed413780
de20: 00000001 ed413780 00000000 edffd480 ee2bde4c ee2bde40 c0291d00 c02933f0
de40: ee2bde9c ee2bde50 c024679c c0291ce0 edffd4b0 b6e37000 00020000 ee2bdf78
de60: 00000000 00000000 ed54ec00 ed013200 00000817 c0a111fc edffd540 ed413780
de80: b6e37000 00020000 00020000 ee2bdf78 ee2bded4 ee2bdea0 c0292890 c0246604
dea0: c0117940 c016ba50 00000025 c0a111fc b6e37000 ed413780 ee2bdf78 00020000
dec0: ee2bc000 b6e37000 ee2bdf44 ee2bded8 c021d158 c0292770 c0117764 b6e36004
dee0: c0f0d7c4 ee2bdfb0 b6f89228 00021008 ee2bdfac ee2bdf00 c0101374 c0117770
df00: 00000000 00000000 ee2bc000 00000000 ee2bdf34 ee2bdf20 c016ba04 c0171080
df20: 00000000 00020000 ed413780 b6e37000 00000000 ee2bdf78 ee2bdf74 ee2bdf48
df40: c021e7a0 c021d130 c023e300 c023e280 ee2bdf74 00000000 00000000 ed413780
df60: ed413780 00020000 ee2bdfa4 ee2bdf78 c021e870 c021e71c 00000000 00000000
df80: 00020000 00020000 b6e37000 00000003 c0108084 00000000 00000000 ee2bdfa8
dfa0: c0107ee0 c021e838 00020000 00020000 00000003 b6e37000 00020000 0001a2b4
dfc0: 00020000 00020000 b6e37000 00000003 7fffe000 00000000 00000000 00020000
dfe0: 00000000 be98eb4c 0000c740 b6f1985c 60070010 00000003 00000000 00000000
Backtrace:
[<c06fbbdc>] (iio_read_channel_info_avail) from [<c0513c58>] (dev_attr_show+0x24/0x50)
 r10:c0a53c74 r9:ed79f000 r8:ee8d1018 r7:00001000 r6:00000fff r5:ee8b9a00
 r4:edffd480
[<c0513c34>] (dev_attr_show) from [<c0293474>] (sysfs_kf_seq_show+0x90/0x110)
 r5:edffd540 r4:edffd480
[<c02933e4>] (sysfs_kf_seq_show) from [<c0291d00>] (kernfs_seq_show+0x2c/0x30)
 r10:edffd480 r9:00000000 r8:ed413780 r7:00000001 r6:ed413780 r5:00000001
 r4:ee2bde60 r3:c02933e4
[<c0291cd4>] (kernfs_seq_show) from [<c024679c>] (seq_read+0x1a4/0x4e0)
[<c02465f8>] (seq_read) from [<c0292890>] (kernfs_fop_read+0x12c/0x1cc)
 r10:ee2bdf78 r9:00020000 r8:00020000 r7:b6e37000 r6:ed413780 r5:edffd540
 r4:c0a111fc
[<c0292764>] (kernfs_fop_read) from [<c021d158>] (__vfs_read+0x34/0x118)
 r10:b6e37000 r9:ee2bc000 r8:00020000 r7:ee2bdf78 r6:ed413780 r5:b6e37000
 r4:c0a111fc
[<c021d124>] (__vfs_read) from [<c021e7a0>] (vfs_read+0x90/0x11c)
 r8:ee2bdf78 r7:00000000 r6:b6e37000 r5:ed413780 r4:00020000
[<c021e710>] (vfs_read) from [<c021e870>] (SyS_read+0x44/0x90)
 r8:00020000 r7:ed413780 r6:ed413780 r5:00000000 r4:00000000
[<c021e82c>] (SyS_read) from [<c0107ee0>] (ret_fast_syscall+0x0/0x1c)
 r10:00000000 r8:c0108084 r7:00000003 r6:b6e37000 r5:00020000 r4:00020000
Code: bad PC value
---[ end trace 9c4938ccd0389004 ]---

Fixes: cc26ad4 ("iio: Add Freescale MPL3115A2 pressure / temperature sensor driver")
Fixes: 5123960 ("iio:core: add a callback to allow drivers to provide _available attributes")
Reported-by: Ken Lin <[email protected]>
Tested-by: Ken Lin <[email protected]>
Signed-off-by: Peter Rosin <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit e9b736d upstream.

The class of 4 n_hdls buf locks is the same because a single function
n_hdlc_buf_list_init is used to init all the locks. But since
flush_tx_queue takes n_hdlc->tx_buf_list.spinlock and then calls
n_hdlc_buf_put which takes n_hdlc->tx_free_buf_list.spinlock, lockdep
emits a warning:
=============================================
[ INFO: possible recursive locking detected ]
4.3.0-25.g91e30a7-default OnePlusOSS#1 Not tainted
---------------------------------------------
a.out/1248 is trying to acquire lock:
 (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]

but task is already holding lock:
 (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&list->spinlock)->rlock);
  lock(&(&list->spinlock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by a.out/1248:
 #0:  (&tty->ldisc_sem){++++++}, at: [<ffffffff814c9eb0>] tty_ldisc_ref_wait+0x20/0x50
 OnePlusOSS#1:  (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
...
Call Trace:
...
 [<ffffffff81738fd0>] _raw_spin_lock_irqsave+0x50/0x70
 [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
 [<ffffffffa01fdc24>] n_hdlc_tty_ioctl+0x144/0x1d0 [n_hdlc]
 [<ffffffff814c25c1>] tty_ioctl+0x3f1/0xe40
...

Fix it by initializing the spin_locks separately. This removes also
reduntand memset of a freshly kzallocated space.

Signed-off-by: Jiri Slaby <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit 6cb72bc upstream.

After srp_process_rsp() returns there is a short time during which
the scsi_host_find_tag() call will return a pointer to the SCSI
command that is being completed. If during that time a duplicate
response is received, avoid that the following call stack appears:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: srp_recv_done+0x450/0x6b0 [ib_srp]
Oops: 0000 [OnePlusOSS#1] SMP
CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ OnePlusOSS#1
Call Trace:
 <IRQ>
 __ib_process_cq+0x4b/0xd0 [ib_core]
 ib_poll_handler+0x1d/0x70 [ib_core]
 irq_poll_softirq+0xba/0x120
 __do_softirq+0xba/0x4c0
 irq_exit+0xbe/0xd0
 smp_apic_timer_interrupt+0x38/0x50
 apic_timer_interrupt+0x90/0xa0
 </IRQ>
RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20

Signed-off-by: Bart Van Assche <[email protected]>
Cc: Israel Rukshin <[email protected]>
Cc: Max Gurtovoy <[email protected]>
Cc: Laurence Oberman <[email protected]>
Cc: Steve Feeley <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit 0d06863 upstream.

Fix a BUG when the kernel tries to mount a file system constructed as
follows:

echo foo > foo.txt
mke2fs -Fq -t ext4 -O encrypt foo.img 100
debugfs -w foo.img << EOF
write foo.txt a
set_inode_field a i_flags 0x80800
set_super_value s_last_orphan 12
quit
EOF

root@kvm-xfstests:~# mount -o loop foo.img /mnt
[  160.238770] ------------[ cut here ]------------
[  160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874!
[  160.240106] invalid opcode: 0000 [OnePlusOSS#1] SMP
[  160.240106] Modules linked in:
[  160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G        W       4.10.0-rc3-00034-gcdd33b941b67 #227
[  160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[  160.240106] task: f4518000 task.stack: f47b6000
[  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4
[  160.240106] EFLAGS: 00010246 CPU: 0
[  160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007
[  160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac
[  160.240106]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0
[  160.240106] Call Trace:
[  160.240106]  ext4_truncate+0x1e9/0x3e5
[  160.240106]  ext4_fill_super+0x286f/0x2b1e
[  160.240106]  ? set_blocksize+0x2e/0x7e
[  160.240106]  mount_bdev+0x114/0x15f
[  160.240106]  ext4_mount+0x15/0x17
[  160.240106]  ? ext4_calculate_overhead+0x39d/0x39d
[  160.240106]  mount_fs+0x58/0x115
[  160.240106]  vfs_kern_mount+0x4b/0xae
[  160.240106]  do_mount+0x671/0x8c3
[  160.240106]  ? _copy_from_user+0x70/0x83
[  160.240106]  ? strndup_user+0x31/0x46
[  160.240106]  SyS_mount+0x57/0x7b
[  160.240106]  do_int80_syscall_32+0x4f/0x61
[  160.240106]  entry_INT80_32+0x2f/0x2f
[  160.240106] EIP: 0xb76b919e
[  160.240106] EFLAGS: 00000246 CPU: 0
[  160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8
[  160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660
[  160.240106]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[  160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9
[  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac
[  160.317241] ---[ end trace d6a773a375c810a5 ]---

The problem is that when the kernel tries to truncate an inode in
ext4_truncate(), it tries to clear any on-disk data beyond i_size.
Without the encryption key, it can't do that, and so it triggers a
BUG.

E2fsck does *not* provide this service, and in practice most file
systems have their orphan list processed by e2fsck, so to avoid
crashing, this patch skips this step if we don't have access to the
encryption key (which is the case when processing the orphan list; in
all other cases, we will have the encryption key, or the kernel
wouldn't have allowed the file to be opened).

An open question is whether the fact that e2fsck isn't clearing the
bytes beyond i_size causing problems --- and if we've lived with it
not doing it for so long, can we drop this from the kernel replay of
the orphan list in all cases (not just when we don't have the key for
encrypted inodes).

Addresses-Google-Bug: #35209576

Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
[ Upstream commit d5afb6f ]

The code where sk_clone() came from created a new socket and locked it,
but then, on the error path didn't unlock it.

This problem stayed there for a long while, till b0691c8 ("net:
Unlock sock before calling sk_free()") fixed it, but unfortunately the
callers of sk_clone() (now sk_clone_locked()) were not audited and the
one in dccp_create_openreq_child() remained.

Now in the age of the syskaller fuzzer, this was finally uncovered, as
reported by Dmitry:

 ---- 8< ----

I've got the following report while running syzkaller fuzzer on
86292b3 ("Merge branch 'akpm' (patches from Andrew)")

  [ BUG: held lock freed! ]
  4.10.0+ #234 Not tainted
  -------------------------
  syz-executor6/6898 is freeing memory
  ffff88006286cac0-ffff88006286d3b7, with a lock still held there!
   (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>] spin_lock
  include/linux/spinlock.h:299 [inline]
   (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>]
  sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
  5 locks held by syz-executor6/6898:
   #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff839a34b4>] lock_sock
  include/net/sock.h:1460 [inline]
   #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff839a34b4>]
  inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:681
   OnePlusOSS#1:  (rcu_read_lock){......}, at: [<ffffffff83bc1c2a>]
  inet6_csk_xmit+0x12a/0x5d0 net/ipv6/inet6_connection_sock.c:126
   OnePlusOSS#2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>] __skb_unlink
  include/linux/skbuff.h:1767 [inline]
   OnePlusOSS#2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>] __skb_dequeue
  include/linux/skbuff.h:1783 [inline]
   OnePlusOSS#2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>]
  process_backlog+0x264/0x730 net/core/dev.c:4835
   OnePlusOSS#3:  (rcu_read_lock){......}, at: [<ffffffff83aeb5c0>]
  ip6_input_finish+0x0/0x1700 net/ipv6/ip6_input.c:59
   OnePlusOSS#4:  (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>] spin_lock
  include/linux/spinlock.h:299 [inline]
   OnePlusOSS#4:  (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>]
  sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504

Fix it just like was done by b0691c8 ("net: Unlock sock before calling
sk_free()").

Reported-by: Dmitry Vyukov <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Gerrit Renker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit ab2a4bf upstream.

The USB core contains a bug that can show up when a USB-3 host
controller is removed.  If the primary (USB-2) hcd structure is
released before the shared (USB-3) hcd, the core will try to do a
double-free of the common bandwidth_mutex.

The problem was described in graphical form by Chung-Geol Kim, who
first reported it:

=================================================
     At *remove USB(3.0) Storage
     sequence <1> --> <5> ((Problem Case))
=================================================
                                  VOLD
------------------------------------|------------
                                 (uevent)
                            ________|_________
                           |<1>               |
                           |dwc3_otg_sm_work  |
                           |usb_put_hcd       |
                           |peer_hcd(kref=2)|
                           |__________________|
                            ________|_________
                           |<2>               |
                           |New USB BUS OnePlusOSS#2    |
                           |                  |
                           |peer_hcd(kref=1)  |
                           |                  |
                         --(Link)-bandXX_mutex|
                         | |__________________|
                         |
    ___________________  |
   |<3>                | |
   |dwc3_otg_sm_work   | |
   |usb_put_hcd        | |
   |primary_hcd(kref=1)| |
   |___________________| |
    _________|_________  |
   |<4>                | |
   |New USB BUS OnePlusOSS#1     | |
   |hcd_release        | |
   |primary_hcd(kref=0)| |
   |                   | |
   |bandXX_mutex(free) |<-
   |___________________|
                               (( VOLD ))
                            ______|___________
                           |<5>               |
                           |      SCSI        |
                           |usb_put_hcd       |
                           |peer_hcd(kref=0)  |
                           |*hcd_release      |
                           |bandXX_mutex(free*)|<- double free
                           |__________________|

=================================================

This happens because hcd_release() frees the bandwidth_mutex whenever
it sees a primary hcd being released (which is not a very good idea
in any case), but in the course of releasing the primary hcd, it
changes the pointers in the shared hcd in such a way that the shared
hcd will appear to be primary when it gets released.

This patch fixes the problem by changing hcd_release() so that it
deallocates the bandwidth_mutex only when the _last_ hcd structure
referencing it is released.  The patch also removes an unnecessary
test, so that when an hcd is released, both the shared_hcd and
primary_hcd pointers in the hcd's peer will be cleared.

Signed-off-by: Alan Stern <[email protected]>
Reported-by: Chung-Geol Kim <[email protected]>
Tested-by: Chung-Geol Kim <[email protected]>
Cc: Sumit Semwal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit 707e59b upstream.

The following commit:

  1fb3a8b ("xen/spinlock: Fix locking path engaging too soon under PVHVM.")

... moved the initalization of the kicker interrupt until after
native_cpu_up() is called.

However, when using qspinlocks, a CPU may try to kick another CPU that is
spinning (because it has not yet initialized its kicker interrupt), resulting
in the following crash during boot:

  kernel BUG at /build/linux-Ay7j_C/linux-4.4.0/drivers/xen/events/events_base.c:1210!
  invalid opcode: 0000 [OnePlusOSS#1] SMP
  ...
  RIP: 0010:[<ffffffff814c97c9>]  [<ffffffff814c97c9>] xen_send_IPI_one+0x59/0x60
  ...
  Call Trace:
   [<ffffffff8102be9e>] xen_qlock_kick+0xe/0x10
   [<ffffffff810cabc2>] __pv_queued_spin_unlock+0xb2/0xf0
   [<ffffffff810ca6d1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
   [<ffffffff81052936>] ? check_tsc_warp+0x76/0x150
   [<ffffffff81052aa6>] check_tsc_sync_source+0x96/0x160
   [<ffffffff81051e28>] native_cpu_up+0x3d8/0x9f0
   [<ffffffff8102b315>] xen_hvm_cpu_up+0x35/0x80
   [<ffffffff8108198c>] _cpu_up+0x13c/0x180
   [<ffffffff81081a4a>] cpu_up+0x7a/0xa0
   [<ffffffff81f80dfc>] smp_init+0x7f/0x81
   [<ffffffff81f5a121>] kernel_init_freeable+0xef/0x212
   [<ffffffff81817f30>] ? rest_init+0x80/0x80
   [<ffffffff81817f3e>] kernel_init+0xe/0xe0
   [<ffffffff8182488f>] ret_from_fork+0x3f/0x70
   [<ffffffff81817f30>] ? rest_init+0x80/0x80

To fix this, only send the kick if the target CPU's interrupt has been
initialized. This check isn't racy, because the target is waiting for
the spinlock, so it won't have initialized the interrupt in the
meantime.

Signed-off-by: Ross Lagerwall <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Sumit Semwal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit 7b2db29 upstream.

If usb_get_bos_descriptor() returns an error, usb->bos will be NULL.
Nevertheless, it is dereferenced unconditionally in
hub_set_initial_usb2_lpm_policy() if usb2_hw_lpm_capable is set.
This results in a crash.

usb 5-1: unable to get BOS descriptor
...
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = ffffffc00165f000
[00000008] *pgd=000000000174f003, *pud=000000000174f003,
		*pmd=0000000001750003, *pte=00e8000001751713
Internal error: Oops: 96000005 [OnePlusOSS#1] PREEMPT SMP
Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac [ ... ]
CPU: 5 PID: 3353 Comm: kworker/5:3 Tainted: G    B 4.4.52 #480
Hardware name: Google Kevin (DT)
Workqueue: events driver_set_config_work
task: ffffffc0c3690000 ti: ffffffc0ae9a8000 task.ti: ffffffc0ae9a8000
PC is at hub_port_init+0xc3c/0xd10
LR is at hub_port_init+0xc3c/0xd10
...
Call trace:
[<ffffffc0007fbbfc>] hub_port_init+0xc3c/0xd10
[<ffffffc0007fbe2c>] usb_reset_and_verify_device+0x15c/0x82c
[<ffffffc0007fc5e0>] usb_reset_device+0xe4/0x298
[<ffffffbffc0e3fcc>] rtl8152_probe+0x84/0x9b0 [r8152]
[<ffffffc00080ca8c>] usb_probe_interface+0x244/0x2f8
[<ffffffc000774a24>] driver_probe_device+0x180/0x3b4
[<ffffffc000774e48>] __device_attach_driver+0xb4/0xe0
[<ffffffc000772168>] bus_for_each_drv+0xb4/0xe4
[<ffffffc0007747ec>] __device_attach+0xd0/0x158
[<ffffffc000775080>] device_initial_probe+0x24/0x30
[<ffffffc0007739d4>] bus_probe_device+0x50/0xe4
[<ffffffc000770bd0>] device_add+0x414/0x738
[<ffffffc000809fe8>] usb_set_configuration+0x89c/0x914
[<ffffffc00080a120>] driver_set_config_work+0xc0/0xf0
[<ffffffc000249bb8>] process_one_work+0x390/0x6b8
[<ffffffc00024abcc>] worker_thread+0x480/0x610
[<ffffffc000251a80>] kthread+0x164/0x178
[<ffffffc0002045d0>] ret_from_fork+0x10/0x40

Since we don't know anything about LPM capabilities without BOS descriptor,
don't attempt to enable LPM if it is not available.

Fixes: 890dae8 ("xhci: Enable LPM support only for hardwired ...")
Cc: Mathias Nyman <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Acked-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nethunteros pushed a commit to nethunteros/android_kernel_oneplus_msm8998 that referenced this issue Sep 10, 2017
commit 6207119 upstream.

With this reproducer:
  struct sockaddr_alg alg = {
          .salg_family = 0x26,
          .salg_type = "hash",
          .salg_feat = 0xf,
          .salg_mask = 0x5,
          .salg_name = "digest_null",
  };
  int sock, sock2;

  sock = socket(AF_ALG, SOCK_SEQPACKET, 0);
  bind(sock, (struct sockaddr *)&alg, sizeof(alg));
  sock2 = accept(sock, NULL, NULL);
  setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2);
  accept(sock2, NULL, NULL);

==== 8< ======== 8< ======== 8< ======== 8< ====

one can immediatelly see an UBSAN warning:
UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7
variable length array bound value 0 <= 0
CPU: 0 PID: 15949 Comm: syz-executor Tainted: G            E      4.4.30-0-default OnePlusOSS#1
...
Call Trace:
...
 [<ffffffff81d598fd>] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188
 [<ffffffff81d597c0>] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc
 [<ffffffffa0e2204d>] ? hash_accept+0x5bd/0x7d0 [algif_hash]
 [<ffffffffa0e2293f>] ? hash_accept_nokey+0x3f/0x51 [algif_hash]
 [<ffffffffa0e206b0>] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash]
 [<ffffffff8235c42b>] ? SyS_accept+0x2b/0x40

It is a correct warning, as hash state is propagated to accept as zero,
but creating a zero-length variable array is not allowed in C.

Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or
similar happens in the code there, so we just allocate one byte even
though we do not use the array.

Signed-off-by: Jiri Slaby <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]> (maintainer:CRYPTO API)
Reported-by: Sasha Levin <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Nov 1, 2019
During backport f7eea63 ("KVM: nVMX: handle page fault in vmread"),
there was a mistake the exception reference should be passed to function
kvm_write_guest_virt_system, instead of NULL, other wise, we will get
NULL pointer deref, eg

kvm-unit-test triggered a NULL pointer deref below:
[  948.518437] kvm [24114]: vcpu0, guest rIP: 0x407ef9 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x3, nop
[  949.106464] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  949.106707] PGD 0 P4D 0
[  949.106872] Oops: 0002 [OnePlusOSS#1] SMP
[  949.107038] CPU: 2 PID: 24126 Comm: qemu-2.7 Not tainted 4.19.77-pserver OnePlusOSS#4.19.77-1+feature+daily+update+20191005.1625+a4168bb~deb9
[  949.107283] Hardware name: Dell Inc. Precision Tower 3620/09WH54, BIOS 2.7.3 01/31/2018
[  949.107549] RIP: 0010:kvm_write_guest_virt_system+0x12/0x40 [kvm]
[  949.107719] Code: c0 5d 41 5c 41 5d 41 5e 83 f8 03 41 0f 94 c0 41 c1 e0 02 e9 b0 ed ff ff 0f 1f 44 00 00 48 89 f0 c6 87 59 56 00 00 01 48 89 d6 <49> c7 00 00 00 00 00 89 ca 49 c7 40 08 00 00 00 00 49 c7 40 10 00
[  949.108044] RSP: 0018:ffffb31b0a953cb0 EFLAGS: 00010202
[  949.108216] RAX: 000000000046b4d8 RBX: ffff9e9f415b0000 RCX: 0000000000000008
[  949.108389] RDX: ffffb31b0a953cc0 RSI: ffffb31b0a953cc0 RDI: ffff9e9f415b0000
[  949.108562] RBP: 00000000d2e14928 R08: 0000000000000000 R09: 0000000000000000
[  949.108733] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffc8
[  949.108907] R13: 0000000000000002 R14: ffff9e9f4f26f2e8 R15: 0000000000000000
[  949.109079] FS:  00007eff8694c700(0000) GS:ffff9e9f51a80000(0000) knlGS:0000000031415928
[  949.109318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  949.109495] CR2: 0000000000000000 CR3: 00000003be53b002 CR4: 00000000003626e0
[  949.109671] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  949.109845] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  949.110017] Call Trace:
[  949.110186]  handle_vmread+0x22b/0x2f0 [kvm_intel]
[  949.110356]  ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
[  949.110549]  kvm_arch_vcpu_ioctl_run+0xa98/0x1b30 [kvm]
[  949.110725]  ? kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
[  949.110901]  kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
[  949.111072]  do_vfs_ioctl+0xa2/0x620

Signed-off-by: Jack Wang <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Nov 1, 2019
commit 1b82feb upstream.

It seems that smp_processor_id() is only used for a best-effort
load-balancing, refer to qat_crypto_get_instance_node(). It's not feasible
to disable preemption for the duration of the crypto requests. Therefore,
just silence the warning. This commit is similar to e7a9b05
("crypto: cavium - Fix smp_processor_id() warnings").

Silences the following splat:
BUG: using smp_processor_id() in preemptible [00000000] code: cryptomgr_test/2904
caller is qat_alg_ablkcipher_setkey+0x300/0x4a0 [intel_qat]
CPU: 1 PID: 2904 Comm: cryptomgr_test Tainted: P           O    4.14.69 OnePlusOSS#1
...
Call Trace:
 dump_stack+0x5f/0x86
 check_preemption_disabled+0xd3/0xe0
 qat_alg_ablkcipher_setkey+0x300/0x4a0 [intel_qat]
 skcipher_setkey_ablkcipher+0x2b/0x40
 __test_skcipher+0x1f3/0xb20
 ? cpumask_next_and+0x26/0x40
 ? find_busiest_group+0x10e/0x9d0
 ? preempt_count_add+0x49/0xa0
 ? try_module_get+0x61/0xf0
 ? crypto_mod_get+0x15/0x30
 ? __kmalloc+0x1df/0x1f0
 ? __crypto_alloc_tfm+0x116/0x180
 ? crypto_skcipher_init_tfm+0xa6/0x180
 ? crypto_create_tfm+0x4b/0xf0
 test_skcipher+0x21/0xa0
 alg_test_skcipher+0x3f/0xa0
 alg_test.part.6+0x126/0x2a0
 ? finish_task_switch+0x21b/0x260
 ? __schedule+0x1e9/0x800
 ? __wake_up_common+0x8d/0x140
 cryptomgr_test+0x40/0x50
 kthread+0xff/0x130
 ? cryptomgr_notify+0x540/0x540
 ? kthread_create_on_node+0x70/0x70
 ret_from_fork+0x24/0x50

Fixes: ed8ccae ("crypto: qat - Add support for SRIOV")
Cc: [email protected]
Signed-off-by: Alexander Sverdlin <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Nov 1, 2019
[ Upstream commit 443f2d5 ]

Observe a segmentation fault when 'perf stat' is asked to repeat forever
with the interval option.

Without fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
  #           time             counts unit events
       5.000211692  3,13,89,82,34,157      cycles
      10.000380119  1,53,98,52,22,294      cycles
      10.040467280       17,16,79,265      cycles
  Segmentation fault

This problem was only observed when we use forever option aka -r 0 and
works with limited repeats. Calling print_counter with ts being set to
NULL, is not a correct option when interval is set. Hence avoid
print_counter(NULL,..)  if interval is set.

With fix:

  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
   #           time             counts unit events
       5.019866622  3,15,14,43,08,697      cycles
      10.039865756  3,15,16,31,95,261      cycles
      10.059950628     1,26,05,47,158      cycles
       5.009902655  3,14,52,62,33,932      cycles
      10.019880228  3,14,52,22,89,154      cycles
      10.030543876       66,90,18,333      cycles
       5.009848281  3,14,51,98,25,437      cycles
      10.029854402  3,15,14,93,04,918      cycles
       5.009834177  3,14,51,95,92,316      cycles

Committer notes:

Did the 'git bisect' to find the cset introducing the problem to add the
Fixes tag below, and at that time the problem reproduced as:

  (gdb) run stat -r0 -I500 sleep 1
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  866		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
  (gdb) bt
  #0  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  OnePlusOSS#1  0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
  OnePlusOSS#2  0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
  OnePlusOSS#3  0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
  OnePlusOSS#4  0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
  OnePlusOSS#5  0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
  OnePlusOSS#6  0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
  (gdb)

Mostly the same as just before this patch:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  964		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
  (gdb) bt
  #0  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  OnePlusOSS#1  0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670)
      at util/stat-display.c:1172
  OnePlusOSS#2  0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
  OnePlusOSS#3  0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
  OnePlusOSS#4  0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
  OnePlusOSS#5  0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
  OnePlusOSS#6  0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
  OnePlusOSS#7  0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
  (gdb)

Fixes: d4f63a4 ("perf stat: Introduce print_counters function")
Signed-off-by: Srikar Dronamraju <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: [email protected] # v4.2+
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Nov 1, 2019
commit 20bb759 upstream.

Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled.  From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned.

This is easily reproducible on arm64 when booting with "nosmp" on the
command line; a couple of shell scripts print out a periodic "Ping"
message whilst another triggers a crash by writing to
/proc/sysrq-trigger:

  | sysrq: Trigger a crash
  | Kernel panic - not syncing: sysrq triggered crash
  | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 OnePlusOSS#1
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x148
  |  show_stack+0x14/0x20
  |  dump_stack+0xa0/0xc4
  |  panic+0x140/0x32c
  |  sysrq_handle_reboot+0x0/0x20
  |  __handle_sysrq+0x124/0x190
  |  write_sysrq_trigger+0x64/0x88
  |  proc_reg_write+0x60/0xa8
  |  __vfs_write+0x18/0x40
  |  vfs_write+0xa4/0x1b8
  |  ksys_write+0x64/0xf0
  |  __arm64_sys_write+0x14/0x20
  |  el0_svc_common.constprop.0+0xb0/0x168
  |  el0_svc_handler+0x28/0x78
  |  el0_svc+0x8/0xc
  | Kernel Offset: disabled
  | CPU features: 0x0002,24002004
  | Memory Limit: none
  | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
  |  Ping 2!
  |  Ping 1!
  |  Ping 1!
  |  Ping 2!

The issue can also be triggered on x86 kernels if CONFIG_SMP=n,
otherwise local interrupts are disabled in 'smp_send_stop()'.

Disable preemption in 'panic()' before re-enabling interrupts.

Link: http://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Signed-off-by: Will Deacon <[email protected]>
Reported-by: Xogium <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Russell King <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Nov 1, 2019
commit a553add upstream.

Call uart_unregister_driver() conditionally instead of
unconditionally, only if it has been previously registered.

This uses driver.state, just as the sh-sci.c driver does.

Fixes this null pointer dereference in tty_unregister_driver(),
since the 'driver' argument is null:

  general protection fault: 0000 [OnePlusOSS#1] PREEMPT SMP KASAN PTI
  RIP: 0010:tty_unregister_driver+0x25/0x1d0

Fixes: 238b872 ("[PATCH] serial uartlite driver")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: stable <[email protected]>
Cc: Peter Korsgaard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Nov 1, 2019
The RCU lock is insufficient to protect the radix tree iteration as
a deletion from the tree can occur before we take the spinlock to
tag the entry.  In 4.19, this has manifested as a bug with the following
trace:

kernel BUG at lib/radix-tree.c:1429!
invalid opcode: 0000 [OnePlusOSS#1] SMP KASAN PTI
CPU: 7 PID: 6935 Comm: syz-executor.2 Not tainted 4.19.36 OnePlusOSS#25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:radix_tree_tag_set+0x200/0x2f0 lib/radix-tree.c:1429
Code: 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 44 24 10 e8 a3 29 7e fe 48 8b 44 24 10 48 0f ab 03 e9 d2 fe ff ff e8 90 29 7e fe <0f> 0b 48 c7 c7 e0 5a 87 84 e8 f0 e7 08 ff 4c 89 ef e8 4a ff ac fe
RSP: 0018:ffff88837b13fb60 EFLAGS: 00010016
RAX: 0000000000040000 RBX: ffff8883c5515d58 RCX: ffffffff82cb2ef0
RDX: 0000000000000b72 RSI: ffffc90004cf2000 RDI: ffff8883c5515d98
RBP: ffff88837b13fb98 R08: ffffed106f627f7e R09: ffffed106f627f7e
R10: 0000000000000001 R11: ffffed106f627f7d R12: 0000000000000004
R13: ffffea000d7fea80 R14: 1ffff1106f627f6f R15: 0000000000000002
FS:  00007fa1b8df2700(0000) GS:ffff8883e2fc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa1b8df1db8 CR3: 000000037d4d2001 CR4: 0000000000160ee0
Call Trace:
 memfd_tag_pins mm/memfd.c:51 [inline]
 memfd_wait_for_pins+0x2c5/0x12d0 mm/memfd.c:81
 memfd_add_seals mm/memfd.c:215 [inline]
 memfd_fcntl+0x33d/0x4a0 mm/memfd.c:247
 do_fcntl+0x589/0xeb0 fs/fcntl.c:421
 __do_sys_fcntl fs/fcntl.c:463 [inline]
 __se_sys_fcntl fs/fcntl.c:448 [inline]
 __x64_sys_fcntl+0x12d/0x180 fs/fcntl.c:448
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:293

The problem does not occur in mainline due to the XArray rewrite which
changed the locking to exclude modification of the tree during iteration.
At the time, nobody realised this was a bugfix.  Backport the locking
changes to stable.

Cc: [email protected]
Reported-by: zhong jiang <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Nov 1, 2019
commit 77c3012 upstream.

We have a test case like block/001 in blktests, which will create a scsi
device by loading scsi_debug module and then try to delete the device by
sysfs interface. At the same time, it may remove the scsi_debug module.

And getting a invalid paging request BUG_ON as following:

[   34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8
[   34.629189] Oops: 0000 [OnePlusOSS#1] SMP PTI
[   34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G        W         5.4.0-rc3+ #473
[   34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0
[   34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0
[   34.644545] Call Trace:
[   34.644907]  scsi_host_dev_release+0x6b/0x1f0
[   34.645511]  device_release+0x74/0x110
[   34.646046]  kobject_put+0x116/0x390
[   34.646559]  put_device+0x17/0x30
[   34.647041]  scsi_target_dev_release+0x2b/0x40
[   34.647652]  device_release+0x74/0x110
[   34.648186]  kobject_put+0x116/0x390
[   34.648691]  put_device+0x17/0x30
[   34.649157]  scsi_device_dev_release_usercontext+0x2e8/0x360
[   34.649953]  execute_in_process_context+0x29/0x80
[   34.650603]  scsi_device_dev_release+0x20/0x30
[   34.651221]  device_release+0x74/0x110
[   34.651732]  kobject_put+0x116/0x390
[   34.652230]  sysfs_unbreak_active_protection+0x3f/0x50
[   34.652935]  sdev_store_delete.cold.4+0x71/0x8f
[   34.653579]  dev_attr_store+0x1b/0x40
[   34.654103]  sysfs_kf_write+0x3d/0x60
[   34.654603]  kernfs_fop_write+0x174/0x250
[   34.655165]  __vfs_write+0x1f/0x60
[   34.655639]  vfs_write+0xc7/0x280
[   34.656117]  ksys_write+0x6d/0x140
[   34.656591]  __x64_sys_write+0x1e/0x30
[   34.657114]  do_syscall_64+0xb1/0x400
[   34.657627]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   34.658335] RIP: 0033:0x7f156f337130

During deleting scsi target, the scsi_debug module have been removed. Then,
sdebug_driver_template belonged to the module cannot be accessd, resulting
in scsi_proc_hostdir_rm() BUG_ON.

To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to
increase refcount of module, avoiding the module been removed.

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yufen Yu <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Nov 1, 2019
commit e4f8e51 upstream.

A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58b ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.

Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.

  WARNING: possible circular locking dependency detected
  ------------------------------------------------------
  cat/5224 is trying to acquire lock:
  ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
  show_slab_objects+0x94/0x3a8

  but task is already holding lock:
  b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> OnePlusOSS#2 (kn->count#45){++++}:
         lock_acquire+0x31c/0x360
         __kernfs_remove+0x290/0x490
         kernfs_remove+0x30/0x44
         sysfs_remove_dir+0x70/0x88
         kobject_del+0x50/0xb0
         sysfs_slab_unlink+0x2c/0x38
         shutdown_cache+0xa0/0xf0
         kmemcg_cache_shutdown_fn+0x1c/0x34
         kmemcg_workfn+0x44/0x64
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> OnePlusOSS#1 (slab_mutex){+.+.}:
         lock_acquire+0x31c/0x360
         __mutex_lock_common+0x16c/0xf78
         mutex_lock_nested+0x40/0x50
         memcg_create_kmem_cache+0x38/0x16c
         memcg_kmem_cache_create_func+0x3c/0x70
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #0 (mem_hotplug_lock.rw_sem){++++}:
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc

  other info that might help us debug this:

  Chain exists of:
    mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(kn->count#45);
                                 lock(slab_mutex);
                                 lock(kn->count#45);
    lock(mem_hotplug_lock.rw_sem);

   *** DEADLOCK ***

  3 locks held by cat/5224:
   #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
   OnePlusOSS#1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
   OnePlusOSS#2: b8ff009693eee398 (kn->count#45){++++}, at:
  kernfs_seq_start+0x44/0xf0

  stack backtrace:
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0xd0/0x140
   print_circular_bug+0x368/0x380
   check_noncircular+0x248/0x250
   validate_chain+0xd10/0x2bcc
   __lock_acquire+0x7f4/0xb8c
   lock_acquire+0x31c/0x360
   get_online_mems+0x54/0x150
   show_slab_objects+0x94/0x3a8
   total_objects_show+0x28/0x34
   slab_attr_show+0x38/0x54
   sysfs_kf_seq_show+0x198/0x2d4
   kernfs_seq_show+0xa4/0xcc
   seq_read+0x30c/0x8a8
   kernfs_fop_read+0xa8/0x314
   __vfs_read+0x88/0x20c
   vfs_read+0xd8/0x10c
   ksys_read+0xb0/0x120
   __arm64_sys_read+0x54/0x88
   el0_svc_handler+0x170/0x240
   el0_svc+0x8/0xc

I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free.  There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html

[[email protected]: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 01fb58b ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jgcaap pushed a commit to NewRom/android_kernel_oneplus_msm8998 that referenced this issue May 8, 2020
We use IPI instead of NMI in ARM64 to get backtrace of all CPUs.
However, smp_processor_id expects preempt disabled, and this leads to
a kernel BUG seen as below:

c3  19299 SysRq : Show backtrace of all active CPUs
c3  19299 BUG: using smp_processor_id() in preemptible [00000000] code: sh/19299
c3  19299 caller is debug_smp_processor_id+0x1c/0x24
c3  19299 CPU: 3 PID: 19299 Comm: sh Tainted: G        W      3.18.31-gb1be511 OnePlusOSS#1
c3  19299 Hardware name: HTC Corporation. MSM8996pro + PMI8996 Sailfish XB (DT)
c3  19299 Call trace:
c3  19299 [<ffffffc00008aab4>] dump_backtrace+0x0/0x23c
c3  19299 [<ffffffc00008ad04>] show_stack+0x14/0x1c
c3  19299 [<ffffffc000e29f38>] dump_stack+0x80/0xa4
c3  19299 [<ffffffc0003147a4>] check_preemption_disabled+0xe0/0x104
c3  19299 [<ffffffc0003147e4>] debug_smp_processor_id+0x1c/0x24
c3  19299 [<ffffffc000092b24>] arch_trigger_all_cpu_backtrace+0x18/0xdc
c3  19299 [<ffffffc00048dd64>] sysrq_handle_showallcpus+0xc/0x14
c3  19299 [<ffffffc00048e33c>] __handle_sysrq+0x9c/0x154
c3  19299 [<ffffffc00048e724>] write_sysrq_trigger+0x54/0x68
c3  19299 [<ffffffc0001f9c54>] proc_reg_write+0x68/0x84
c3  19299 [<ffffffc0001a6e0c>] vfs_write+0xbc/0x18c
c3  19299 [<ffffffc0001a6fd4>] SyS_write+0x54/0xa4
c3  19299 Backtrace for cpu 3 (current):
c3  19299 CPU: 3 PID: 19299 Comm: sh Tainted: G        W      3.18.31-gb1be511 OnePlusOSS#1
c3  19299 Hardware name: HTC Corporation. MSM8996pro + PMI8996 Sailfish XB (DT)
c3  19299 Call trace:
c3  19299 [<ffffffc00008aab4>] dump_backtrace+0x0/0x23c
c3  19299 [<ffffffc00008ad04>] show_stack+0x14/0x1c
c3  19299 [<ffffffc000e29f38>] dump_stack+0x80/0xa4
c3  19299 [<ffffffc000092b70>] arch_trigger_all_cpu_backtrace+0x64/0xdc
c3  19299 [<ffffffc00048dd64>] sysrq_handle_showallcpus+0xc/0x14
c3  19299 [<ffffffc00048e33c>] __handle_sysrq+0x9c/0x154
c3  19299 [<ffffffc00048e724>] write_sysrq_trigger+0x54/0x68
c3  19299 [<ffffffc0001f9c54>] proc_reg_write+0x68/0x84
c3  19299 [<ffffffc0001a6e0c>] vfs_write+0xbc/0x18c
c3  19299 [<ffffffc0001a6fd4>] SyS_write+0x54/0xa4
c3  19299
c3  19299 sending IPI to all other CPUs:
c3  19299 BUG: using smp_processor_id() in preemptible [00000000] code: sh/19299
c3  19299 caller is debug_smp_processor_id+0x1c/0x24
c3  19299 CPU: 3 PID: 19299 Comm: sh Tainted: G        W      3.18.31-gb1be511 OnePlusOSS#1
c3  19299 Hardware name: HTC Corporation. MSM8996pro + PMI8996 Sailfish XB (DT)
c3  19299 Call trace:
c3  19299 [<ffffffc00008aab4>] dump_backtrace+0x0/0x23c
c3  19299 [<ffffffc00008ad04>] show_stack+0x14/0x1c
c3  19299 [<ffffffc000e29f38>] dump_stack+0x80/0xa4
c3  19299 [<ffffffc0003147a4>] check_preemption_disabled+0xe0/0x104
c3  19299 [<ffffffc0003147e4>] debug_smp_processor_id+0x1c/0x24
c3  19299 [<ffffffc000327d24>] gic_raise_softirq+0x128/0x158
c3  19299 [<ffffffc000092828>] smp_cross_call_common+0x80/0x94
c3  19299 [<ffffffc000092b94>] arch_trigger_all_cpu_backtrace+0x88/0xdc
c3  19299 [<ffffffc00048dd64>] sysrq_handle_showallcpus+0xc/0x14
c3  19299 [<ffffffc00048e33c>] __handle_sysrq+0x9c/0x154
c3  19299 [<ffffffc00048e724>] write_sysrq_trigger+0x54/0x68
c3  19299 [<ffffffc0001f9c54>] proc_reg_write+0x68/0x84
c3  19299 [<ffffffc0001a6e0c>] vfs_write+0xbc/0x18c
c3  19299 [<ffffffc0001a6fd4>] SyS_write+0x54/0xa4

Bug: 35409577
Bug: 34193533
Test: on marlin, trigger sysrq
Change-Id: I0239f388c8ec63f10fc73a32bf8a7deddc408104
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Francisco Franco <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Jun 23, 2020
Author: @tanish2k09 (email: [email protected])

What is it?
Kernel-based Lapse ("K-Lapse") is a linear RGB scaling module that 'shifts' RGB based on time (of the day/selected by user), or (since v2.0) brightness. This concept is inspired from
LineageOS (formerly known as 'CyanogenMod') ROM's feature "livedisplay" which also changes the display settings (RGB, hue, temperature, etc) based on time.

Why did you decide to make this? (Tell me a story).
I (personally) am a big fan of the livedisplay feature found on LineageOS ROM. I used it every single day, since Android Lollipop. Starting from Android Nougat, a native night mode
solution was added to AOSP and it felt like livedisplay was still way superior, thanks to its various options (you could say it spoiled me, sure). I also maintained a kernel (Venom
kernel) for the device I was using at that time. It was all good until the OEM dropped support for the device at Android M, and XDA being XDA, was already working on N ROMs. The issue
was, these ROMs weren't LineageOS or based on it, so livedisplay was... gone. I decided I'll try to bring that feature to every other ROM. How would I do that? Of course! The kernel! It
worked on every single ROM, it was the key! I started to work on it ASAP and here it is, up on GitHub, licensed under GPL (check klapse.c), open to everyone :)

How does it work?
Think of it like a fancy night mode, but not really. Klapse is dependent on an RGB interface (like Gamma on MTK and KCAL on SD chipsets). It fetches time from the kernel, converts it to
local time, and selects and RGB set based on the time. The result is really smooth shifting of RGB over time.

How does it really work (dev)?
Klapse mode 1 (time-based scaling) uses a method void klapse_pulse(void) that should ideally be called every minute. This can be done by injecting a pulse call inside another method that
is called repeatedly naturally, like cpufreq or atomic or frame commits. It can be anything, whatever you like, even a kthread, as long as it is called repeatedly naturally. To execute
every 60 seconds, use jiffies or ktime, or any similar method. The pulse function fetches the current time and makes calculations based on the current hour and the values of the tunables
listed down below.

Klapse mode 2 (brightness-based scaling) uses a method void set_rgb_slider(<type> bl_lvl) where is the data type of the brightness level used in your kernel source. (OnePlus 6 uses u32
data type for bl_lvl) set_rgb_slider needs to be called/injected inside a function that sets brightness for your device. (OnePlus 6 uses dsi_panel.c for that, check out the diff for that
file in /op6)

What all stuff can it do?

1, Emulate night mode with the proper RGB settings
2, Smoothly scale from one set of RGB to another set of RGB in integral intervals over time.
3, Reduce perceived brightness using brightness_factor by reducing the amount of color on screen. Allows lower apparent brightness than system permits.
4, Scale RGB based on brightness of display (low brightness usually implies a dark environment, where yellowness is probably useful).
5, Automate the perceived brightness independent of whether klapse is enabled, using its own set of start and stop hours.
6, Be more efficient,faster by residing inside the kernel instead of having to use the HWC HAL like android's night mode.
7, (On older devices) Reduce stuttering or frame lags caused by native night mode.
8, An easier solution against overlay-based apps that run as service in userspace/Android and sometimes block apps asking for permissions.
9, Give you a Livedisplay alternative if it doesn't work in your ROM.
10, Impress your crush so you can get a date (Hey, don't forget to credit me if it works).

Alright, so this is a replacement for night mode?
NO! Not at all. One can say this is merely an alternative for LineageOS' Livedisplay, but inside a kernel. Night mode is a sub-function of both Livedisplay and KLapse. Most comparisons
here were made with night mode because that's what an average user uses, and will relate to the most. There is absolutely no reason for your Android kernel to not have KLapse. Go ahead
and add it or ask your kernel maintainer to. It's super-easy!

What can it NOT do (yet)?

1, Calculate scaling to the level of minutes, like "Start from 5:37pm till 7:19am". --TODO
2, Make coffee for you.
3, Fly you to the moon. Without a heavy suit.
4, Get you a monthly subscription of free food, cereal included.

All these following tunables are found in their respective files in /sys/klapse/

1. enable_klapse : A switch to enable or disable klapse. Values : 0 = off, 1 = on (since v2.0, 2 = brightness-dependent mode)
2. klapse_start_hour : The hour at which klapse should start scaling the RGB values from daytime to target (see next points). Values : 0-23
3. klapse_stop_hour : The hour by which klapse should scale back the RGB values from target to daytime (see next points). Values : 0-23
4. daytime_rgb : The RGB set that must be used for all the time outside of start and stop hour range.
5. target_rgb : The RGB set that must be scaled towards for all the time inside of start and stop hour range.
6. klapse_scaling_rate : Controls how soon the RGB reaches from daytime to target inside of start and stop hour range. Once target is reached, it remains constant till 30 minutes before
   stop hour, where target RGB scales back to daytime RGB.
7. brightness_factor : From the name itself, this value has the ability to bend perception and make your display appear as if it is at a lesser brightness level than it actually is at.
   It works by reducing the RGB values by the same factor. Values : 2-10, (10 means accurate brightness, 5 means 50% of current brightness, you get it)
8. brightness_factor_auto : A switch that allows you to automatically set the brightness factor in a set time range. Value : 0 = off, 1 = on
9. brightness_factor_auto_start_hour : The hour at which brightness_factor should be applied. Works only if OnePlusOSS#8 is 1. Values : 0-23
10. brightness_factor_auto_stop_hour : The hour at which brightness_factor should be reverted to 10. Works only if OnePlusOSS#8 is 1. Values : 0-23
11. backlight_range : The brightness range within which klapse should scale from daytime to target_rgb. Works only if OnePlusOSS#1 is 2. Values : MIN_BRIGHTNESS-MAX_BRIGHTNESS

Signed-off-by: engstk <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Aug 1, 2020
Author: @tanish2k09 (email: [email protected])

What is it?
Kernel-based Lapse ("K-Lapse") is a linear RGB scaling module that 'shifts' RGB based on time (of the day/selected by user), or (since v2.0) brightness. This concept is inspired from
LineageOS (formerly known as 'CyanogenMod') ROM's feature "livedisplay" which also changes the display settings (RGB, hue, temperature, etc) based on time.

Why did you decide to make this? (Tell me a story).
I (personally) am a big fan of the livedisplay feature found on LineageOS ROM. I used it every single day, since Android Lollipop. Starting from Android Nougat, a native night mode
solution was added to AOSP and it felt like livedisplay was still way superior, thanks to its various options (you could say it spoiled me, sure). I also maintained a kernel (Venom
kernel) for the device I was using at that time. It was all good until the OEM dropped support for the device at Android M, and XDA being XDA, was already working on N ROMs. The issue
was, these ROMs weren't LineageOS or based on it, so livedisplay was... gone. I decided I'll try to bring that feature to every other ROM. How would I do that? Of course! The kernel! It
worked on every single ROM, it was the key! I started to work on it ASAP and here it is, up on GitHub, licensed under GPL (check klapse.c), open to everyone :)

How does it work?
Think of it like a fancy night mode, but not really. Klapse is dependent on an RGB interface (like Gamma on MTK and KCAL on SD chipsets). It fetches time from the kernel, converts it to
local time, and selects and RGB set based on the time. The result is really smooth shifting of RGB over time.

How does it really work (dev)?
Klapse mode 1 (time-based scaling) uses a method void klapse_pulse(void) that should ideally be called every minute. This can be done by injecting a pulse call inside another method that
is called repeatedly naturally, like cpufreq or atomic or frame commits. It can be anything, whatever you like, even a kthread, as long as it is called repeatedly naturally. To execute
every 60 seconds, use jiffies or ktime, or any similar method. The pulse function fetches the current time and makes calculations based on the current hour and the values of the tunables
listed down below.

Klapse mode 2 (brightness-based scaling) uses a method void set_rgb_slider(<type> bl_lvl) where is the data type of the brightness level used in your kernel source. (OnePlus 6 uses u32
data type for bl_lvl) set_rgb_slider needs to be called/injected inside a function that sets brightness for your device. (OnePlus 6 uses dsi_panel.c for that, check out the diff for that
file in /op6)

What all stuff can it do?

1, Emulate night mode with the proper RGB settings
2, Smoothly scale from one set of RGB to another set of RGB in integral intervals over time.
3, Reduce perceived brightness using brightness_factor by reducing the amount of color on screen. Allows lower apparent brightness than system permits.
4, Scale RGB based on brightness of display (low brightness usually implies a dark environment, where yellowness is probably useful).
5, Automate the perceived brightness independent of whether klapse is enabled, using its own set of start and stop hours.
6, Be more efficient,faster by residing inside the kernel instead of having to use the HWC HAL like android's night mode.
7, (On older devices) Reduce stuttering or frame lags caused by native night mode.
8, An easier solution against overlay-based apps that run as service in userspace/Android and sometimes block apps asking for permissions.
9, Give you a Livedisplay alternative if it doesn't work in your ROM.
10, Impress your crush so you can get a date (Hey, don't forget to credit me if it works).

Alright, so this is a replacement for night mode?
NO! Not at all. One can say this is merely an alternative for LineageOS' Livedisplay, but inside a kernel. Night mode is a sub-function of both Livedisplay and KLapse. Most comparisons
here were made with night mode because that's what an average user uses, and will relate to the most. There is absolutely no reason for your Android kernel to not have KLapse. Go ahead
and add it or ask your kernel maintainer to. It's super-easy!

What can it NOT do (yet)?

1, Calculate scaling to the level of minutes, like "Start from 5:37pm till 7:19am". --TODO
2, Make coffee for you.
3, Fly you to the moon. Without a heavy suit.
4, Get you a monthly subscription of free food, cereal included.

All these following tunables are found in their respective files in /sys/klapse/

1. enable_klapse : A switch to enable or disable klapse. Values : 0 = off, 1 = on (since v2.0, 2 = brightness-dependent mode)
2. klapse_start_hour : The hour at which klapse should start scaling the RGB values from daytime to target (see next points). Values : 0-23
3. klapse_stop_hour : The hour by which klapse should scale back the RGB values from target to daytime (see next points). Values : 0-23
4. daytime_rgb : The RGB set that must be used for all the time outside of start and stop hour range.
5. target_rgb : The RGB set that must be scaled towards for all the time inside of start and stop hour range.
6. klapse_scaling_rate : Controls how soon the RGB reaches from daytime to target inside of start and stop hour range. Once target is reached, it remains constant till 30 minutes before
   stop hour, where target RGB scales back to daytime RGB.
7. brightness_factor : From the name itself, this value has the ability to bend perception and make your display appear as if it is at a lesser brightness level than it actually is at.
   It works by reducing the RGB values by the same factor. Values : 2-10, (10 means accurate brightness, 5 means 50% of current brightness, you get it)
8. brightness_factor_auto : A switch that allows you to automatically set the brightness factor in a set time range. Value : 0 = off, 1 = on
9. brightness_factor_auto_start_hour : The hour at which brightness_factor should be applied. Works only if OnePlusOSS#8 is 1. Values : 0-23
10. brightness_factor_auto_stop_hour : The hour at which brightness_factor should be reverted to 10. Works only if OnePlusOSS#8 is 1. Values : 0-23
11. backlight_range : The brightness range within which klapse should scale from daytime to target_rgb. Works only if OnePlusOSS#1 is 2. Values : MIN_BRIGHTNESS-MAX_BRIGHTNESS

Signed-off-by: engstk <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Aug 6, 2020
Author: @tanish2k09 (email: [email protected])

What is it?
Kernel-based Lapse ("K-Lapse") is a linear RGB scaling module that 'shifts' RGB based on time (of the day/selected by user), or (since v2.0) brightness. This concept is inspired from
LineageOS (formerly known as 'CyanogenMod') ROM's feature "livedisplay" which also changes the display settings (RGB, hue, temperature, etc) based on time.

Why did you decide to make this? (Tell me a story).
I (personally) am a big fan of the livedisplay feature found on LineageOS ROM. I used it every single day, since Android Lollipop. Starting from Android Nougat, a native night mode
solution was added to AOSP and it felt like livedisplay was still way superior, thanks to its various options (you could say it spoiled me, sure). I also maintained a kernel (Venom
kernel) for the device I was using at that time. It was all good until the OEM dropped support for the device at Android M, and XDA being XDA, was already working on N ROMs. The issue
was, these ROMs weren't LineageOS or based on it, so livedisplay was... gone. I decided I'll try to bring that feature to every other ROM. How would I do that? Of course! The kernel! It
worked on every single ROM, it was the key! I started to work on it ASAP and here it is, up on GitHub, licensed under GPL (check klapse.c), open to everyone :)

How does it work?
Think of it like a fancy night mode, but not really. Klapse is dependent on an RGB interface (like Gamma on MTK and KCAL on SD chipsets). It fetches time from the kernel, converts it to
local time, and selects and RGB set based on the time. The result is really smooth shifting of RGB over time.

How does it really work (dev)?
Klapse mode 1 (time-based scaling) uses a method void klapse_pulse(void) that should ideally be called every minute. This can be done by injecting a pulse call inside another method that
is called repeatedly naturally, like cpufreq or atomic or frame commits. It can be anything, whatever you like, even a kthread, as long as it is called repeatedly naturally. To execute
every 60 seconds, use jiffies or ktime, or any similar method. The pulse function fetches the current time and makes calculations based on the current hour and the values of the tunables
listed down below.

Klapse mode 2 (brightness-based scaling) uses a method void set_rgb_slider(<type> bl_lvl) where is the data type of the brightness level used in your kernel source. (OnePlus 6 uses u32
data type for bl_lvl) set_rgb_slider needs to be called/injected inside a function that sets brightness for your device. (OnePlus 6 uses dsi_panel.c for that, check out the diff for that
file in /op6)

What all stuff can it do?

1, Emulate night mode with the proper RGB settings
2, Smoothly scale from one set of RGB to another set of RGB in integral intervals over time.
3, Reduce perceived brightness using brightness_factor by reducing the amount of color on screen. Allows lower apparent brightness than system permits.
4, Scale RGB based on brightness of display (low brightness usually implies a dark environment, where yellowness is probably useful).
5, Automate the perceived brightness independent of whether klapse is enabled, using its own set of start and stop hours.
6, Be more efficient,faster by residing inside the kernel instead of having to use the HWC HAL like android's night mode.
7, (On older devices) Reduce stuttering or frame lags caused by native night mode.
8, An easier solution against overlay-based apps that run as service in userspace/Android and sometimes block apps asking for permissions.
9, Give you a Livedisplay alternative if it doesn't work in your ROM.
10, Impress your crush so you can get a date (Hey, don't forget to credit me if it works).

Alright, so this is a replacement for night mode?
NO! Not at all. One can say this is merely an alternative for LineageOS' Livedisplay, but inside a kernel. Night mode is a sub-function of both Livedisplay and KLapse. Most comparisons
here were made with night mode because that's what an average user uses, and will relate to the most. There is absolutely no reason for your Android kernel to not have KLapse. Go ahead
and add it or ask your kernel maintainer to. It's super-easy!

What can it NOT do (yet)?

1, Calculate scaling to the level of minutes, like "Start from 5:37pm till 7:19am". --TODO
2, Make coffee for you.
3, Fly you to the moon. Without a heavy suit.
4, Get you a monthly subscription of free food, cereal included.

All these following tunables are found in their respective files in /sys/klapse/

1. enable_klapse : A switch to enable or disable klapse. Values : 0 = off, 1 = on (since v2.0, 2 = brightness-dependent mode)
2. klapse_start_hour : The hour at which klapse should start scaling the RGB values from daytime to target (see next points). Values : 0-23
3. klapse_stop_hour : The hour by which klapse should scale back the RGB values from target to daytime (see next points). Values : 0-23
4. daytime_rgb : The RGB set that must be used for all the time outside of start and stop hour range.
5. target_rgb : The RGB set that must be scaled towards for all the time inside of start and stop hour range.
6. klapse_scaling_rate : Controls how soon the RGB reaches from daytime to target inside of start and stop hour range. Once target is reached, it remains constant till 30 minutes before
   stop hour, where target RGB scales back to daytime RGB.
7. brightness_factor : From the name itself, this value has the ability to bend perception and make your display appear as if it is at a lesser brightness level than it actually is at.
   It works by reducing the RGB values by the same factor. Values : 2-10, (10 means accurate brightness, 5 means 50% of current brightness, you get it)
8. brightness_factor_auto : A switch that allows you to automatically set the brightness factor in a set time range. Value : 0 = off, 1 = on
9. brightness_factor_auto_start_hour : The hour at which brightness_factor should be applied. Works only if OnePlusOSS#8 is 1. Values : 0-23
10. brightness_factor_auto_stop_hour : The hour at which brightness_factor should be reverted to 10. Works only if OnePlusOSS#8 is 1. Values : 0-23
11. backlight_range : The brightness range within which klapse should scale from daytime to target_rgb. Works only if OnePlusOSS#1 is 2. Values : MIN_BRIGHTNESS-MAX_BRIGHTNESS

Signed-off-by: engstk <[email protected]>
infectedmushi pushed a commit to infectedmushi/android_kernel_oneplus_msm8998-oos that referenced this issue Aug 10, 2020
Author: @tanish2k09 (email: [email protected])

What is it?
Kernel-based Lapse ("K-Lapse") is a linear RGB scaling module that 'shifts' RGB based on time (of the day/selected by user), or (since v2.0) brightness. This concept is inspired from
LineageOS (formerly known as 'CyanogenMod') ROM's feature "livedisplay" which also changes the display settings (RGB, hue, temperature, etc) based on time.

Why did you decide to make this? (Tell me a story).
I (personally) am a big fan of the livedisplay feature found on LineageOS ROM. I used it every single day, since Android Lollipop. Starting from Android Nougat, a native night mode
solution was added to AOSP and it felt like livedisplay was still way superior, thanks to its various options (you could say it spoiled me, sure). I also maintained a kernel (Venom
kernel) for the device I was using at that time. It was all good until the OEM dropped support for the device at Android M, and XDA being XDA, was already working on N ROMs. The issue
was, these ROMs weren't LineageOS or based on it, so livedisplay was... gone. I decided I'll try to bring that feature to every other ROM. How would I do that? Of course! The kernel! It
worked on every single ROM, it was the key! I started to work on it ASAP and here it is, up on GitHub, licensed under GPL (check klapse.c), open to everyone :)

How does it work?
Think of it like a fancy night mode, but not really. Klapse is dependent on an RGB interface (like Gamma on MTK and KCAL on SD chipsets). It fetches time from the kernel, converts it to
local time, and selects and RGB set based on the time. The result is really smooth shifting of RGB over time.

How does it really work (dev)?
Klapse mode 1 (time-based scaling) uses a method void klapse_pulse(void) that should ideally be called every minute. This can be done by injecting a pulse call inside another method that
is called repeatedly naturally, like cpufreq or atomic or frame commits. It can be anything, whatever you like, even a kthread, as long as it is called repeatedly naturally. To execute
every 60 seconds, use jiffies or ktime, or any similar method. The pulse function fetches the current time and makes calculations based on the current hour and the values of the tunables
listed down below.

Klapse mode 2 (brightness-based scaling) uses a method void set_rgb_slider(<type> bl_lvl) where is the data type of the brightness level used in your kernel source. (OnePlus 6 uses u32
data type for bl_lvl) set_rgb_slider needs to be called/injected inside a function that sets brightness for your device. (OnePlus 6 uses dsi_panel.c for that, check out the diff for that
file in /op6)

What all stuff can it do?

1, Emulate night mode with the proper RGB settings
2, Smoothly scale from one set of RGB to another set of RGB in integral intervals over time.
3, Reduce perceived brightness using brightness_factor by reducing the amount of color on screen. Allows lower apparent brightness than system permits.
4, Scale RGB based on brightness of display (low brightness usually implies a dark environment, where yellowness is probably useful).
5, Automate the perceived brightness independent of whether klapse is enabled, using its own set of start and stop hours.
6, Be more efficient,faster by residing inside the kernel instead of having to use the HWC HAL like android's night mode.
7, (On older devices) Reduce stuttering or frame lags caused by native night mode.
8, An easier solution against overlay-based apps that run as service in userspace/Android and sometimes block apps asking for permissions.
9, Give you a Livedisplay alternative if it doesn't work in your ROM.
10, Impress your crush so you can get a date (Hey, don't forget to credit me if it works).

Alright, so this is a replacement for night mode?
NO! Not at all. One can say this is merely an alternative for LineageOS' Livedisplay, but inside a kernel. Night mode is a sub-function of both Livedisplay and KLapse. Most comparisons
here were made with night mode because that's what an average user uses, and will relate to the most. There is absolutely no reason for your Android kernel to not have KLapse. Go ahead
and add it or ask your kernel maintainer to. It's super-easy!

What can it NOT do (yet)?

1, Calculate scaling to the level of minutes, like "Start from 5:37pm till 7:19am". --TODO
2, Make coffee for you.
3, Fly you to the moon. Without a heavy suit.
4, Get you a monthly subscription of free food, cereal included.

All these following tunables are found in their respective files in /sys/klapse/

1. enable_klapse : A switch to enable or disable klapse. Values : 0 = off, 1 = on (since v2.0, 2 = brightness-dependent mode)
2. klapse_start_hour : The hour at which klapse should start scaling the RGB values from daytime to target (see next points). Values : 0-23
3. klapse_stop_hour : The hour by which klapse should scale back the RGB values from target to daytime (see next points). Values : 0-23
4. daytime_rgb : The RGB set that must be used for all the time outside of start and stop hour range.
5. target_rgb : The RGB set that must be scaled towards for all the time inside of start and stop hour range.
6. klapse_scaling_rate : Controls how soon the RGB reaches from daytime to target inside of start and stop hour range. Once target is reached, it remains constant till 30 minutes before
   stop hour, where target RGB scales back to daytime RGB.
7. brightness_factor : From the name itself, this value has the ability to bend perception and make your display appear as if it is at a lesser brightness level than it actually is at.
   It works by reducing the RGB values by the same factor. Values : 2-10, (10 means accurate brightness, 5 means 50% of current brightness, you get it)
8. brightness_factor_auto : A switch that allows you to automatically set the brightness factor in a set time range. Value : 0 = off, 1 = on
9. brightness_factor_auto_start_hour : The hour at which brightness_factor should be applied. Works only if OnePlusOSS#8 is 1. Values : 0-23
10. brightness_factor_auto_stop_hour : The hour at which brightness_factor should be reverted to 10. Works only if OnePlusOSS#8 is 1. Values : 0-23
11. backlight_range : The brightness range within which klapse should scale from daytime to target_rgb. Works only if OnePlusOSS#1 is 2. Values : MIN_BRIGHTNESS-MAX_BRIGHTNESS

Signed-off-by: engstk <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
[ Upstream commit 71a174b39f10b4b93223d374722aa894b5d8a82e ]

b6da31b "tty: Fix data race in tty_insert_flip_string_fixed_flag"
puts tty_flip_buffer_push under port->lock introducing the following
possible circular locking dependency:

[30129.876566] ======================================================
[30129.876566] WARNING: possible circular locking dependency detected
[30129.876567] 5.9.0-rc2+ OnePlusOSS#3 Tainted: G S      W
[30129.876568] ------------------------------------------------------
[30129.876568] sysrq.sh/1222 is trying to acquire lock:
[30129.876569] ffffffff92c39480 (console_owner){....}-{0:0}, at: console_unlock+0x3fe/0xa90

[30129.876572] but task is already holding lock:
[30129.876572] ffff888107cb9018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state.cold.55+0x15b/0x6ca

[30129.876576] which lock already depends on the new lock.

[30129.876577] the existing dependency chain (in reverse order) is:

[30129.876578] -> OnePlusOSS#3 (&pool->lock/1){-.-.}-{2:2}:
[30129.876581]        _raw_spin_lock+0x30/0x70
[30129.876581]        __queue_work+0x1a3/0x10f0
[30129.876582]        queue_work_on+0x78/0x80
[30129.876582]        pty_write+0x165/0x1e0
[30129.876583]        n_tty_write+0x47f/0xf00
[30129.876583]        tty_write+0x3d6/0x8d0
[30129.876584]        vfs_write+0x1a8/0x650

[30129.876588] -> OnePlusOSS#2 (&port->lock#2){-.-.}-{2:2}:
[30129.876590]        _raw_spin_lock_irqsave+0x3b/0x80
[30129.876591]        tty_port_tty_get+0x1d/0xb0
[30129.876592]        tty_port_default_wakeup+0xb/0x30
[30129.876592]        serial8250_tx_chars+0x3d6/0x970
[30129.876593]        serial8250_handle_irq.part.12+0x216/0x380
[30129.876593]        serial8250_default_handle_irq+0x82/0xe0
[30129.876594]        serial8250_interrupt+0xdd/0x1b0
[30129.876595]        __handle_irq_event_percpu+0xfc/0x850

[30129.876602] -> OnePlusOSS#1 (&port->lock){-.-.}-{2:2}:
[30129.876605]        _raw_spin_lock_irqsave+0x3b/0x80
[30129.876605]        serial8250_console_write+0x12d/0x900
[30129.876606]        console_unlock+0x679/0xa90
[30129.876606]        register_console+0x371/0x6e0
[30129.876607]        univ8250_console_init+0x24/0x27
[30129.876607]        console_init+0x2f9/0x45e

[30129.876609] -> #0 (console_owner){....}-{0:0}:
[30129.876611]        __lock_acquire+0x2f70/0x4e90
[30129.876612]        lock_acquire+0x1ac/0xad0
[30129.876612]        console_unlock+0x460/0xa90
[30129.876613]        vprintk_emit+0x130/0x420
[30129.876613]        printk+0x9f/0xc5
[30129.876614]        show_pwq+0x154/0x618
[30129.876615]        show_workqueue_state.cold.55+0x193/0x6ca
[30129.876615]        __handle_sysrq+0x244/0x460
[30129.876616]        write_sysrq_trigger+0x48/0x4a
[30129.876616]        proc_reg_write+0x1a6/0x240
[30129.876617]        vfs_write+0x1a8/0x650

[30129.876619] other info that might help us debug this:

[30129.876620] Chain exists of:
[30129.876621]   console_owner --> &port->lock#2 --> &pool->lock/1

[30129.876625]  Possible unsafe locking scenario:

[30129.876626]        CPU0                    CPU1
[30129.876626]        ----                    ----
[30129.876627]   lock(&pool->lock/1);
[30129.876628]                                lock(&port->lock#2);
[30129.876630]                                lock(&pool->lock/1);
[30129.876631]   lock(console_owner);

[30129.876633]  *** DEADLOCK ***

[30129.876634] 5 locks held by sysrq.sh/1222:
[30129.876634]  #0: ffff8881d3ce0470 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x359/0x650
[30129.876637]  OnePlusOSS#1: ffffffff92c612c0 (rcu_read_lock){....}-{1:2}, at: __handle_sysrq+0x4d/0x460
[30129.876640]  OnePlusOSS#2: ffffffff92c612c0 (rcu_read_lock){....}-{1:2}, at: show_workqueue_state+0x5/0xf0
[30129.876642]  OnePlusOSS#3: ffff888107cb9018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state.cold.55+0x15b/0x6ca
[30129.876645]  OnePlusOSS#4: ffffffff92c39980 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x123/0x420

[30129.876648] stack backtrace:
[30129.876649] CPU: 3 PID: 1222 Comm: sysrq.sh Tainted: G S      W         5.9.0-rc2+ OnePlusOSS#3
[30129.876649] Hardware name: Intel Corporation 2012 Client Platform/Emerald Lake 2, BIOS ACRVMBY1.86C.0078.P00.1201161002 01/16/2012
[30129.876650] Call Trace:
[30129.876650]  dump_stack+0x9d/0xe0
[30129.876651]  check_noncircular+0x34f/0x410
[30129.876653]  __lock_acquire+0x2f70/0x4e90
[30129.876656]  lock_acquire+0x1ac/0xad0
[30129.876658]  console_unlock+0x460/0xa90
[30129.876660]  vprintk_emit+0x130/0x420
[30129.876660]  printk+0x9f/0xc5
[30129.876661]  show_pwq+0x154/0x618
[30129.876662]  show_workqueue_state.cold.55+0x193/0x6ca
[30129.876664]  __handle_sysrq+0x244/0x460
[30129.876665]  write_sysrq_trigger+0x48/0x4a
[30129.876665]  proc_reg_write+0x1a6/0x240
[30129.876666]  vfs_write+0x1a8/0x650

It looks like the commit was aimed to protect tty_insert_flip_string and
there is no need for tty_flip_buffer_push to be under this lock.

Fixes: b6da31b ("tty: Fix data race in tty_insert_flip_string_fixed_flag")
Signed-off-by: Artem Savkov <[email protected]>
Acked-by: Jiri Slaby <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
[ Upstream commit e63d6fb5637e92725cf143559672a34b706bca4f ]

Enabling CONFIG_TAU_INT causes random crashes:

Unrecoverable exception 1700 at c0009414 (msr=1000)
Oops: Unrecoverable exception, sig: 6 [OnePlusOSS#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac-00043-gd5f545e1a8593 OnePlusOSS#5
NIP:  c0009414 LR: c0009414 CTR: c00116fc
REGS: c0799eb8 TRAP: 1700   Not tainted  (5.7.0-pmac-00043-gd5f545e1a8593)
MSR:  00001000 <ME>  CR: 22000228  XER: 00000100

GPR00: 00000000 c0799f70 c076e300 00800000 0291c0ac 00e00000 c076e300 00049032
GPR08: 00000001 c00116fc 00000000 dfbd3200 ffffffff 007f80a8 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c075ce04
GPR24: c075ce04 dfff8880 c07b0000 c075ce04 00080000 00000001 c079ef98 c079ef5c
NIP [c0009414] arch_cpu_idle+0x24/0x6c
LR [c0009414] arch_cpu_idle+0x24/0x6c
Call Trace:
[c0799f70] [00000001] 0x1 (unreliable)
[c0799f80] [c0060990] do_idle+0xd8/0x17c
[c0799fa0] [c0060ba4] cpu_startup_entry+0x20/0x28
[c0799fb0] [c072d220] start_kernel+0x434/0x44c
[c0799ff0] [00003860] 0x3860
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX 3d20c07b XXXXXXXX XXXXXXXX XXXXXXXX 7c0802a6
XXXXXXXX XXXXXXXX XXXXXXXX 4e800421 XXXXXXXX XXXXXXXX XXXXXXXX 7d2000a6
---[ end trace 3a0c9b5cb216db6b ]---

Resolve this problem by disabling each THRMn comparator when handling
the associated THRMn interrupt and by disabling the TAU entirely when
updating THRMn thresholds.

Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <[email protected]>
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/5a0ba3dc5612c7aac596727331284a3676c08472.1599260540.git.fthain@telegraphics.com.au
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
[ Upstream commit e8d5f92b8d30bb4ade76494490c3c065e12411b1 ]

Fix this by increase object reference count.

BUG: KASAN: use-after-free in __lock_acquire+0x3fd4/0x4180
kernel/locking/lockdep.c:3831
Read of size 8 at addr ffff8880683b0018 by task syz-executor.0/3377

CPU: 1 PID: 3377 Comm: syz-executor.0 Not tainted 5.6.11 OnePlusOSS#1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xce/0x128 lib/dump_stack.c:118
 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374
 __kasan_report+0x131/0x1b0 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:641
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
 __lock_acquire+0x3fd4/0x4180 kernel/locking/lockdep.c:3831
 lock_acquire+0x127/0x350 kernel/locking/lockdep.c:4488
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
 printer_ioctl+0x4a/0x110 drivers/usb/gadget/function/f_printer.c:723
 vfs_ioctl fs/ioctl.c:47 [inline]
 ksys_ioctl+0xfb/0x130 fs/ioctl.c:763
 __do_sys_ioctl fs/ioctl.c:772 [inline]
 __se_sys_ioctl fs/ioctl.c:770 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:770
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4531a9
Code: ed 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 bb 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fd14ad72c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 00000000004531a9
RDX: fffffffffffffff9 RSI: 000000000000009e RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004bbd61
R13: 00000000004d0a98 R14: 00007fd14ad736d4 R15: 00000000ffffffff

Allocated by task 2393:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 __kasan_kmalloc.constprop.3+0xa7/0xd0 mm/kasan/common.c:515
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529
 kmem_cache_alloc_trace+0xfa/0x2d0 mm/slub.c:2813
 kmalloc include/linux/slab.h:555 [inline]
 kzalloc include/linux/slab.h:669 [inline]
 gprinter_alloc+0xa1/0x870 drivers/usb/gadget/function/f_printer.c:1416
 usb_get_function+0x58/0xc0 drivers/usb/gadget/functions.c:61
 config_usb_cfg_link+0x1ed/0x3e0 drivers/usb/gadget/configfs.c:444
 configfs_symlink+0x527/0x11d0 fs/configfs/symlink.c:202
 vfs_symlink+0x33d/0x5b0 fs/namei.c:4201
 do_symlinkat+0x11b/0x1d0 fs/namei.c:4228
 __do_sys_symlinkat fs/namei.c:4242 [inline]
 __se_sys_symlinkat fs/namei.c:4239 [inline]
 __x64_sys_symlinkat+0x73/0xb0 fs/namei.c:4239
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3368:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 kasan_set_free_info mm/kasan/common.c:337 [inline]
 __kasan_slab_free+0x135/0x190 mm/kasan/common.c:476
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:485
 slab_free_hook mm/slub.c:1444 [inline]
 slab_free_freelist_hook mm/slub.c:1477 [inline]
 slab_free mm/slub.c:3034 [inline]
 kfree+0xf7/0x410 mm/slub.c:3995
 gprinter_free+0x49/0xd0 drivers/usb/gadget/function/f_printer.c:1353
 usb_put_function+0x38/0x50 drivers/usb/gadget/functions.c:87
 config_usb_cfg_unlink+0x2db/0x3b0 drivers/usb/gadget/configfs.c:485
 configfs_unlink+0x3b9/0x7f0 fs/configfs/symlink.c:250
 vfs_unlink+0x287/0x570 fs/namei.c:4073
 do_unlinkat+0x4f9/0x620 fs/namei.c:4137
 __do_sys_unlink fs/namei.c:4184 [inline]
 __se_sys_unlink fs/namei.c:4182 [inline]
 __x64_sys_unlink+0x42/0x50 fs/namei.c:4182
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880683b0000
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 24 bytes inside of
 1024-byte region [ffff8880683b0000, ffff8880683b0400)
The buggy address belongs to the page:
page:ffffea0001a0ec00 refcount:1 mapcount:0 mapping:ffff88806c00e300
index:0xffff8880683b1800 compound_mapcount: 0
flags: 0x100000000010200(slab|head)
raw: 0100000000010200 0000000000000000 0000000600000001 ffff88806c00e300
raw: ffff8880683b1800 000000008010000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Reported-by: Kyungtae Kim <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
[ Upstream commit bad60b8d1a7194df38fd7fe4b22f3f4dcf775099 ]

The idx in __ath10k_htt_rx_ring_fill_n function lives in
consistent dma region writable by the device. Malfunctional
or malicious device could manipulate such idx to have a OOB
write. Either by
    htt->rx_ring.netbufs_ring[idx] = skb;
or by
    ath10k_htt_set_paddrs_ring(htt, paddr, idx);

The idx can also be negative as it's signed, giving a large
memory space to write to.

It's possibly exploitable by corruptting a legit pointer with
a skb pointer. And then fill skb with payload as rougue object.

Part of the log here. Sometimes it appears as UAF when writing
to a freed memory by chance.

 [   15.594376] BUG: unable to handle page fault for address: ffff887f5c1804f0
 [   15.595483] #PF: supervisor write access in kernel mode
 [   15.596250] #PF: error_code(0x0002) - not-present page
 [   15.597013] PGD 0 P4D 0
 [   15.597395] Oops: 0002 [OnePlusOSS#1] SMP KASAN PTI
 [   15.597967] CPU: 0 PID: 82 Comm: kworker/u2:2 Not tainted 5.6.0 #69
 [   15.598843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
 BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 [   15.600438] Workqueue: ath10k_wq ath10k_core_register_work [ath10k_core]
 [   15.601389] RIP: 0010:__ath10k_htt_rx_ring_fill_n
 (linux/drivers/net/wireless/ath/ath10k/htt_rx.c:173) ath10k_core

Signed-off-by: Zekun Shen <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
[ Upstream commit 6a257471fa42c8c9c04a875cd3a2a22db148e0f0 ]

As syzbot reported:

kernel BUG at fs/f2fs/segment.h:657!
invalid opcode: 0000 [OnePlusOSS#1] PREEMPT SMP KASAN
CPU: 1 PID: 16220 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:f2fs_ra_meta_pages+0xa51/0xdc0 fs/f2fs/segment.h:657
Call Trace:
 build_sit_entries fs/f2fs/segment.c:4195 [inline]
 f2fs_build_segment_manager+0x4b8a/0xa3c0 fs/f2fs/segment.c:4779
 f2fs_fill_super+0x377d/0x6b80 fs/f2fs/super.c:3633
 mount_bdev+0x32e/0x3f0 fs/super.c:1417
 legacy_get_tree+0x105/0x220 fs/fs_context.c:592
 vfs_get_tree+0x89/0x2f0 fs/super.c:1547
 do_new_mount fs/namespace.c:2875 [inline]
 path_mount+0x1387/0x2070 fs/namespace.c:3192
 do_mount fs/namespace.c:3205 [inline]
 __do_sys_mount fs/namespace.c:3413 [inline]
 __se_sys_mount fs/namespace.c:3390 [inline]
 __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

@blkno in f2fs_ra_meta_pages could exceed max segment count, causing panic
in following sanity check in current_sit_addr(), add check condition to
avoid this issue.

Reported-by: [email protected]
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
commit b44c018cdf748b96b676ba09fdbc5b34fc443ada upstream.

KoWei reported crash during raid5 reshape:

[ 1032.252932] Oops: 0002 [OnePlusOSS#1] SMP PTI
[...]
[ 1032.252943] RIP: 0010:memcpy_erms+0x6/0x10
[...]
[ 1032.252947] RSP: 0018:ffffba1ac0c03b78 EFLAGS: 00010286
[ 1032.252949] RAX: 0000784ac0000000 RBX: ffff91bec3d09740 RCX: 0000000000001000
[ 1032.252951] RDX: 0000000000001000 RSI: ffff91be6781c000 RDI: 0000784ac0000000
[ 1032.252953] RBP: ffffba1ac0c03bd8 R08: 0000000000001000 R09: ffffba1ac0c03bf8
[ 1032.252954] R10: 0000000000000000 R11: 0000000000000000 R12: ffffba1ac0c03bf8
[ 1032.252955] R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000000000
[ 1032.252958] FS:  0000000000000000(0000) GS:ffff91becf500000(0000) knlGS:0000000000000000
[ 1032.252959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1032.252961] CR2: 0000784ac0000000 CR3: 000000031780a002 CR4: 00000000001606e0
[ 1032.252962] Call Trace:
[ 1032.252969]  ? async_memcpy+0x179/0x1000 [async_memcpy]
[ 1032.252977]  ? raid5_release_stripe+0x8e/0x110 [raid456]
[ 1032.252982]  handle_stripe_expansion+0x15a/0x1f0 [raid456]
[ 1032.252988]  handle_stripe+0x592/0x1270 [raid456]
[ 1032.252993]  handle_active_stripes.isra.0+0x3cb/0x5a0 [raid456]
[ 1032.252999]  raid5d+0x35c/0x550 [raid456]
[ 1032.253002]  ? schedule+0x42/0xb0
[ 1032.253006]  ? schedule_timeout+0x10e/0x160
[ 1032.253011]  md_thread+0x97/0x160
[ 1032.253015]  ? wait_woken+0x80/0x80
[ 1032.253019]  kthread+0x104/0x140
[ 1032.253022]  ? md_start_sync+0x60/0x60
[ 1032.253024]  ? kthread_park+0x90/0x90
[ 1032.253027]  ret_from_fork+0x35/0x40

This is because cache_size_mutex was unlocked too early in resize_stripes,
which races with grow_one_stripe() that grow_one_stripe() allocates a
stripe with wrong pool_size.

Fix this issue by unlocking cache_size_mutex after updating pool_size.

Cc: <[email protected]> # v4.4+
Reported-by: KoWei Sung <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
commit aea948bb80b478ddc2448f7359d574387521a52d upstream.

Every error log reported by OPAL is exported to userspace through a
sysfs interface and notified using kobject_uevent(). The userspace
daemon (opal_errd) then reads the error log and acknowledges the error
log is saved safely to disk. Once acknowledged the kernel removes the
respective sysfs file entry causing respective resources to be
released including kobject.

However it's possible the userspace daemon may already be scanning
elog entries when a new sysfs elog entry is created by the kernel.
User daemon may read this new entry and ack it even before kernel can
notify userspace about it through kobject_uevent() call. If that
happens then we have a potential race between
elog_ack_store->kobject_put() and kobject_uevent which can lead to
use-after-free of a kernfs object resulting in a kernel crash. eg:

  BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6bfb
  Faulting instruction address: 0xc0000000008ff2a0
  Oops: Kernel access of bad area, sig: 11 [OnePlusOSS#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 27 PID: 805 Comm: irq/29-opal-elo Not tainted 5.9.0-rc2-gcc-8.2.0-00214-g6f56a67bcbb5-dirty #363
  ...
  NIP kobject_uevent_env+0xa0/0x910
  LR  elog_event+0x1f4/0x2d0
  Call Trace:
    0x5deadbeef0000122 (unreliable)
    elog_event+0x1f4/0x2d0
    irq_thread_fn+0x4c/0xc0
    irq_thread+0x1c0/0x2b0
    kthread+0x1c4/0x1d0
    ret_from_kernel_thread+0x5c/0x6c

This patch fixes this race by protecting the sysfs file
creation/notification by holding a reference count on kobject until we
safely send kobject_uevent().

The function create_elog_obj() returns the elog object which if used
by caller function will end up in use-after-free problem again.
However, the return value of create_elog_obj() function isn't being
used today and there is no need as well. Hence change it to return
void to make this fix complete.

Fixes: 774fea1 ("powerpc/powernv: Read OPAL error log and export it through sysfs")
Cc: [email protected] # v3.15+
Reported-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Oliver O'Halloran <[email protected]>
Reviewed-by: Vasant Hegde <[email protected]>
[mpe: Rework the logic to use a single return, reword comments, add oops]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
…lone()

[ Upstream commit 286228d382ba6320f04fa2e7c6fc8d4d92e428f4 ]

All user space generated SKBs are owned by a socket (unless injected into the
key via AF_PACKET). If a socket is closed, all associated skbs will be cleaned
up.

This leads to a problem when a CAN driver calls can_put_echo_skb() on a
unshared SKB. If the socket is closed prior to the TX complete handler,
can_get_echo_skb() and the subsequent delivering of the echo SKB to all
registered callbacks, a SKB with a refcount of 0 is delivered.

To avoid the problem, in can_get_echo_skb() the original SKB is now always
cloned, regardless of shared SKB or not. If the process exists it can now
safely discard its SKBs, without disturbing the delivery of the echo SKB.

The problem shows up in the j1939 stack, when it clones the incoming skb, which
detects the already 0 refcount.

We can easily reproduce this with following example:

testj1939 -B -r can0: &
cansend can0 1823ff40#0123

WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
refcount_t: addition on 0; use-after-free.
Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan
CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d OnePlusOSS#1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c010f570>] (dump_backtrace) from [<c010f90c>] (show_stack+0x20/0x24)
[<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
[<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
[<c0127f0c>] (__warn) from [<c01283c8>] (warn_slowpath_fmt+0xa8/0xcc)
[<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>] (refcount_warn_saturate+0x108/0x174)
[<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>] (j1939_can_recv+0x20c/0x210)
[<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>] (can_rcv_filter+0xb4/0x268)
[<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
[<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
[<c0aca300>] (can_rcv) from [<c09b1fdc>] (__netif_receive_skb_one_core+0x64/0x88)
[<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>] (__netif_receive_skb+0x38/0x94)
[<c09b2038>] (__netif_receive_skb) from [<c09b2130>] (netif_receive_skb_internal+0x64/0xf8)
[<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>] (netif_receive_skb+0x34/0x19c)
[<c09b21c4>] (netif_receive_skb) from [<c0791278>] (can_rx_offload_napi_poll+0x58/0xb4)

Fixes: 0ae89be ("can: add destructor for self generated skbs")
Signed-off-by: Oleksij Rempel <[email protected]>
Link: http://lore.kernel.org/r/[email protected]
Acked-by: Oliver Hartkopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
commit df5b0ab3e08a156701b537809914b339b0daa526 upstream.

Limit the CPU number to num_possible_cpus(), because setting it to a
value lower than INT_MAX but higher than NR_CPUS produces the following
error on reboot and shutdown:

    BUG: unable to handle page fault for address: ffffffff90ab1bb0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0
    Oops: 0000 [OnePlusOSS#1] SMP
    CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
    RIP: 0010:migrate_to_reboot_cpu+0xe/0x60
    Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c
    RSP: 0018:ffffc90000013e08 EFLAGS: 00010246
    RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000
    RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0
    RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000
    FS:  00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
      __do_sys_reboot.cold+0x34/0x5b
      do_syscall_64+0x2d/0x40

Fixes: 1b3a5d0 ("reboot: move arch/x86 reboot= handling to generic kernel")
Signed-off-by: Matteo Croce <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Fabian Frederick <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
[sudip: use reboot_mode instead of mode]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
commit e50e4f0b85be308a01b830c5fbdffc657e1a6dd0 upstream

If interrupt comes late, during probe error path or device remove (could
be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler
i2c_imx_isr() will access registers with the clock being disabled.  This
leads to external abort on non-linefetch on Toradex Colibri VF50 module
(with Vybrid VF5xx):

    Unhandled fault: external abort on non-linefetch (0x1008) at 0x8882d003
    Internal error: : 1008 [OnePlusOSS#1] ARM
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper Not tainted 5.7.0 #607
    Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
      (i2c_imx_isr) from [<8017009c>] (free_irq+0x25c/0x3b0)
      (free_irq) from [<805844ec>] (release_nodes+0x178/0x284)
      (release_nodes) from [<80580030>] (really_probe+0x10c/0x348)
      (really_probe) from [<80580380>] (driver_probe_device+0x60/0x170)
      (driver_probe_device) from [<80580630>] (device_driver_attach+0x58/0x60)
      (device_driver_attach) from [<805806bc>] (__driver_attach+0x84/0xc0)
      (__driver_attach) from [<8057e228>] (bus_for_each_dev+0x68/0xb4)
      (bus_for_each_dev) from [<8057f3ec>] (bus_add_driver+0x144/0x1ec)
      (bus_add_driver) from [<80581320>] (driver_register+0x78/0x110)
      (driver_register) from [<8010213c>] (do_one_initcall+0xa8/0x2f4)
      (do_one_initcall) from [<80c0100c>] (kernel_init_freeable+0x178/0x1dc)
      (kernel_init_freeable) from [<80807048>] (kernel_init+0x8/0x110)
      (kernel_init) from [<80100114>] (ret_from_fork+0x14/0x20)

Additionally, the i2c_imx_isr() could wake up the wait queue
(imx_i2c_struct->queue) before its initialization happens.

The resource-managed framework should not be used for interrupt handling,
because the resource will be released too late - after disabling clocks.
The interrupt handler is not prepared for such case.

Fixes: 1c4b6c3 ("i2c: imx: implement bus recovery")
Cc: <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
[sudip: manual backport with extra label for goto]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
commit ee45700 upstream

We recently came across a V4 filesystem causing memory corruption
due to a newly allocated inode being setup twice and being added to
the superblock inode list twice. From code inspection, the only way
this could happen is if a newly allocated inode was not marked as
free on disk (i.e. di_mode wasn't zero).

Running the metadump on an upstream debug kernel fails during inode
allocation like so:

XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inod=
e.c, line: 838
 ------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:114!
invalid opcode: 0000 [OnePlusOSS#1] PREEMPT SMP
CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/0=
1/2014
RIP: 0010:assfail+0x28/0x30
RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
FS:  00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000=
000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
Call Trace:
 xfs_ialloc+0x383/0x570
 xfs_dir_ialloc+0x6a/0x2a0
 xfs_create+0x412/0x670
 xfs_generic_create+0x1f7/0x2c0
 ? capable_wrt_inode_uidgid+0x3f/0x50
 vfs_mkdir+0xfb/0x1b0
 SyS_mkdir+0xcf/0xf0
 do_syscall_64+0x73/0x1a0
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Extracting the inode number we crashed on from an event trace and
looking at it with xfs_db:

xfs_db> inode 184452204
xfs_db> p
core.magic = 0x494e
core.mode = 0100644
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
.....

Confirms that it is not a free inode on disk. xfs_repair
also trips over this inode:

.....
zero length extent (off = 0, fsbno = 0) in ino 184452204
correcting nextents for inode 184452204
bad attribute fork in inode 184452204, would clear attr fork
bad nblocks 1 for inode 184452204, would reset to 0
bad anextents 1 for inode 184452204, would reset to 0
imap claims in-use inode 184452204 is free, would correct imap
would have cleared inode 184452204
.....
disconnected inode 184452204, would move to lost+found

And so we have a situation where the directory structure and the
inobt thinks the inode is free, but the inode on disk thinks it is
still in use. Where this corruption came from is not possible to
diagnose, but we can detect it and prevent the kernel from oopsing
on lookup. The reproducer now results in:

$ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
mkdir: cannot create directory =E2=80=98/mnt/scratch/00=E2=80=99: File ex=
ists
mkdir: cannot create directory =E2=80=98/mnt/scratch/01=E2=80=99: File ex=
ists
mkdir: cannot create directory =E2=80=98/mnt/scratch/03=E2=80=99: Structu=
re needs cleaning
mkdir: cannot create directory =E2=80=98/mnt/scratch/04=E2=80=99: Input/o=
utput error
mkdir: cannot create directory =E2=80=98/mnt/scratch/05=E2=80=99: Input/o=
utput error
....

And this corruption shutdown:

[   54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not=
 marked free on disk
[   54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 =
of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x425/0x670
[   54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #=
443
[   54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO=
S 1.10.2-1 04/01/2014
[   54.852859] Call Trace:
[   54.853531]  dump_stack+0x85/0xc5
[   54.854385]  xfs_trans_cancel+0x197/0x1c0
[   54.855421]  xfs_create+0x425/0x670
[   54.856314]  xfs_generic_create+0x1f7/0x2c0
[   54.857390]  ? capable_wrt_inode_uidgid+0x3f/0x50
[   54.858586]  vfs_mkdir+0xfb/0x1b0
[   54.859458]  SyS_mkdir+0xcf/0xf0
[   54.860254]  do_syscall_64+0x73/0x1a0
[   54.861193]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[   54.862492] RIP: 0033:0x7fb73bddf547
[   54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000=
000000000053
[   54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73=
bddf547
[   54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffda=
a55449a
[   54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a=
8670dd0
[   54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 000000000=
00001ff
[   54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffda=
a553500
[   54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1=
024 of file fs/xfs/xfs_trans.c.  Return address = ffffffff814cd050
[   54.882790] XFS (loop0): Corruption of in-memory data detected.  Shutt=
ing down filesystem
[   54.884597] XFS (loop0): Please umount the filesystem and rectify the =
problem(s)

Note that this crash is only possible on v4 filesystemsi or v5
filesystems mounted with the ikeep mount option. For all other V5
filesystems, this problem cannot occur because we don't read inodes
we are allocating from disk - we simply overwrite them with the new
inode information.

Signed-Off-By: Dave Chinner <[email protected]>
Reviewed-by: Carlos Maiolino <[email protected]>
Tested-by: Carlos Maiolino <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
[sudip: use ip->i_d.di_mode instead of VFS_I(ip)->i_mode]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
[ Upstream commit e773ca7da8beeca7f17fe4c9d1284a2b66839cc1 ]

Actually, burst size is equal to '1 << desc->rqcfg.brst_size'.
we should use burst size, not desc->rqcfg.brst_size.

dma memcpy performance on Rockchip RV1126
@ 1512MHz A7, 1056MHz LPDDR3, 200MHz DMA:

dmatest:

/# echo dma0chan0 > /sys/module/dmatest/parameters/channel
/# echo 4194304 > /sys/module/dmatest/parameters/test_buf_size
/# echo 8 > /sys/module/dmatest/parameters/iterations
/# echo y > /sys/module/dmatest/parameters/norandom
/# echo y > /sys/module/dmatest/parameters/verbose
/# echo 1 > /sys/module/dmatest/parameters/run

dmatest: dma0chan0-copy0: result OnePlusOSS#1: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result OnePlusOSS#2: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result OnePlusOSS#3: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result OnePlusOSS#4: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result OnePlusOSS#5: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result OnePlusOSS#6: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result OnePlusOSS#7: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result OnePlusOSS#8: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000

Before:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 48 iops 200338 KB/s (0)

After this patch:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 179 iops 734873 KB/s (0)

After this patch and increase dma clk to 400MHz:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 259 iops 1062929 KB/s (0)

Signed-off-by: Sugar Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
commit 3d05cad3c357a2b749912914356072b38435edfa upstream

Lockdep reported the following splat when running test btrfs/190 from
fstests:

  [ 9482.126098] ======================================================
  [ 9482.126184] WARNING: possible circular locking dependency detected
  [ 9482.126281] 5.10.0-rc4-btrfs-next-73 OnePlusOSS#1 Not tainted
  [ 9482.126365] ------------------------------------------------------
  [ 9482.126456] mount/24187 is trying to acquire lock:
  [ 9482.126534] ffffa0c869a7dac0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}, at: qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.126647]
		 but task is already holding lock:
  [ 9482.126777] ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.126886]
		 which lock already depends on the new lock.

  [ 9482.127078]
		 the existing dependency chain (in reverse order) is:
  [ 9482.127213]
		 -> OnePlusOSS#1 (btrfs-quota-00){++++}-{3:3}:
  [ 9482.127366]        lock_acquire+0xd8/0x490
  [ 9482.127436]        down_read_nested+0x45/0x220
  [ 9482.127528]        __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.127613]        btrfs_read_lock_root_node+0x41/0x130 [btrfs]
  [ 9482.127702]        btrfs_search_slot+0x514/0xc30 [btrfs]
  [ 9482.127788]        update_qgroup_status_item+0x72/0x140 [btrfs]
  [ 9482.127877]        btrfs_qgroup_rescan_worker+0xde/0x680 [btrfs]
  [ 9482.127964]        btrfs_work_helper+0xf1/0x600 [btrfs]
  [ 9482.128039]        process_one_work+0x24e/0x5e0
  [ 9482.128110]        worker_thread+0x50/0x3b0
  [ 9482.128181]        kthread+0x153/0x170
  [ 9482.128256]        ret_from_fork+0x22/0x30
  [ 9482.128327]
		 -> #0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}:
  [ 9482.128464]        check_prev_add+0x91/0xc60
  [ 9482.128551]        __lock_acquire+0x1740/0x3110
  [ 9482.128623]        lock_acquire+0xd8/0x490
  [ 9482.130029]        __mutex_lock+0xa3/0xb30
  [ 9482.130590]        qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.131577]        btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.132175]        open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.132756]        btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.133325]        legacy_get_tree+0x30/0x60
  [ 9482.133866]        vfs_get_tree+0x28/0xe0
  [ 9482.134392]        fc_mount+0xe/0x40
  [ 9482.134908]        vfs_kern_mount.part.0+0x71/0x90
  [ 9482.135428]        btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.135942]        legacy_get_tree+0x30/0x60
  [ 9482.136444]        vfs_get_tree+0x28/0xe0
  [ 9482.136949]        path_mount+0x2d7/0xa70
  [ 9482.137438]        do_mount+0x75/0x90
  [ 9482.137923]        __x64_sys_mount+0x8e/0xd0
  [ 9482.138400]        do_syscall_64+0x33/0x80
  [ 9482.138873]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.139346]
		 other info that might help us debug this:

  [ 9482.140735]  Possible unsafe locking scenario:

  [ 9482.141594]        CPU0                    CPU1
  [ 9482.142011]        ----                    ----
  [ 9482.142411]   lock(btrfs-quota-00);
  [ 9482.142806]                                lock(&fs_info->qgroup_rescan_lock);
  [ 9482.143216]                                lock(btrfs-quota-00);
  [ 9482.143629]   lock(&fs_info->qgroup_rescan_lock);
  [ 9482.144056]
		  *** DEADLOCK ***

  [ 9482.145242] 2 locks held by mount/24187:
  [ 9482.145637]  #0: ffffa0c8411c40e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xb9/0x400
  [ 9482.146061]  OnePlusOSS#1: ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.146509]
		 stack backtrace:
  [ 9482.147350] CPU: 1 PID: 24187 Comm: mount Not tainted 5.10.0-rc4-btrfs-next-73 OnePlusOSS#1
  [ 9482.147788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 9482.148709] Call Trace:
  [ 9482.149169]  dump_stack+0x8d/0xb5
  [ 9482.149628]  check_noncircular+0xff/0x110
  [ 9482.150090]  check_prev_add+0x91/0xc60
  [ 9482.150561]  ? kvm_clock_read+0x14/0x30
  [ 9482.151017]  ? kvm_sched_clock_read+0x5/0x10
  [ 9482.151470]  __lock_acquire+0x1740/0x3110
  [ 9482.151941]  ? __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.152402]  lock_acquire+0xd8/0x490
  [ 9482.152887]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.153354]  __mutex_lock+0xa3/0xb30
  [ 9482.153826]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154301]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154768]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155226]  qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155690]  btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.156160]  open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.156643]  btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.157108]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.157567]  ? kfree+0x31f/0x3e0
  [ 9482.158030]  legacy_get_tree+0x30/0x60
  [ 9482.158489]  vfs_get_tree+0x28/0xe0
  [ 9482.158947]  fc_mount+0xe/0x40
  [ 9482.159403]  vfs_kern_mount.part.0+0x71/0x90
  [ 9482.159875]  btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.160335]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.160805]  ? kfree+0x31f/0x3e0
  [ 9482.161260]  ? legacy_get_tree+0x30/0x60
  [ 9482.161714]  legacy_get_tree+0x30/0x60
  [ 9482.162166]  vfs_get_tree+0x28/0xe0
  [ 9482.162616]  path_mount+0x2d7/0xa70
  [ 9482.163070]  do_mount+0x75/0x90
  [ 9482.163525]  __x64_sys_mount+0x8e/0xd0
  [ 9482.163986]  do_syscall_64+0x33/0x80
  [ 9482.164437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.164902] RIP: 0033:0x7f51e907caaa

This happens because at btrfs_read_qgroup_config() we can call
qgroup_rescan_init() while holding a read lock on a quota btree leaf,
acquired by the previous call to btrfs_search_slot_for_read(), and
qgroup_rescan_init() acquires the mutex qgroup_rescan_lock.

A qgroup rescan worker does the opposite: it acquires the mutex
qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to
update the qgroup status item in the quota btree through the call to
update_qgroup_status_item(). This inversion of locking order
between the qgroup_rescan_lock mutex and quota btree locks causes the
splat.

Fix this simply by releasing and freeing the path before calling
qgroup_rescan_init() at btrfs_read_qgroup_config().

CC: [email protected] # 4.4+
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
commit ca10845a56856fff4de3804c85e6424d0f6d0cde upstream

While running btrfs/061, btrfs/073, btrfs/078, or btrfs/178 we hit the
following lockdep splat:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0-rc3+ OnePlusOSS#4 Not tainted
  ------------------------------------------------------
  kswapd0/100 is trying to acquire lock:
  ffff96ecc22ef4a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330

  but task is already holding lock:
  ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> OnePlusOSS#3 (fs_reclaim){+.+.}-{0:0}:
	 fs_reclaim_acquire+0x65/0x80
	 slab_pre_alloc_hook.constprop.0+0x20/0x200
	 kmem_cache_alloc+0x37/0x270
	 alloc_inode+0x82/0xb0
	 iget_locked+0x10d/0x2c0
	 kernfs_get_inode+0x1b/0x130
	 kernfs_get_tree+0x136/0x240
	 sysfs_get_tree+0x16/0x40
	 vfs_get_tree+0x28/0xc0
	 path_mount+0x434/0xc00
	 __x64_sys_mount+0xe3/0x120
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> OnePlusOSS#2 (kernfs_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x7e/0x7e0
	 kernfs_add_one+0x23/0x150
	 kernfs_create_link+0x63/0xa0
	 sysfs_do_create_link_sd+0x5e/0xd0
	 btrfs_sysfs_add_devices_dir+0x81/0x130
	 btrfs_init_new_device+0x67f/0x1250
	 btrfs_ioctl+0x1ef/0x2e20
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> OnePlusOSS#1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x7e/0x7e0
	 btrfs_chunk_alloc+0x125/0x3a0
	 find_free_extent+0xdf6/0x1210
	 btrfs_reserve_extent+0xb3/0x1b0
	 btrfs_alloc_tree_block+0xb0/0x310
	 alloc_tree_block_no_bg_flush+0x4a/0x60
	 __btrfs_cow_block+0x11a/0x530
	 btrfs_cow_block+0x104/0x220
	 btrfs_search_slot+0x52e/0x9d0
	 btrfs_insert_empty_items+0x64/0xb0
	 btrfs_insert_delayed_items+0x90/0x4f0
	 btrfs_commit_inode_delayed_items+0x93/0x140
	 btrfs_log_inode+0x5de/0x2020
	 btrfs_log_inode_parent+0x429/0xc90
	 btrfs_log_new_name+0x95/0x9b
	 btrfs_rename2+0xbb9/0x1800
	 vfs_rename+0x64f/0x9f0
	 do_renameat2+0x320/0x4e0
	 __x64_sys_rename+0x1f/0x30
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
	 __lock_acquire+0x119c/0x1fc0
	 lock_acquire+0xa7/0x3d0
	 __mutex_lock+0x7e/0x7e0
	 __btrfs_release_delayed_node.part.0+0x3f/0x330
	 btrfs_evict_inode+0x24c/0x500
	 evict+0xcf/0x1f0
	 dispose_list+0x48/0x70
	 prune_icache_sb+0x44/0x50
	 super_cache_scan+0x161/0x1e0
	 do_shrink_slab+0x178/0x3c0
	 shrink_slab+0x17c/0x290
	 shrink_node+0x2b2/0x6d0
	 balance_pgdat+0x30a/0x670
	 kswapd+0x213/0x4c0
	 kthread+0x138/0x160
	 ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Chain exists of:
    &delayed_node->mutex --> kernfs_mutex --> fs_reclaim

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(fs_reclaim);
				 lock(kernfs_mutex);
				 lock(fs_reclaim);
    lock(&delayed_node->mutex);

   *** DEADLOCK ***

  3 locks held by kswapd0/100:
   #0: ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
   OnePlusOSS#1: ffffffff8dd65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
   OnePlusOSS#2: ffff96ed2ade30e0 (&type->s_umount_key#36){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0

  stack backtrace:
  CPU: 0 PID: 100 Comm: kswapd0 Not tainted 5.9.0-rc3+ OnePlusOSS#4
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Call Trace:
   dump_stack+0x8b/0xb8
   check_noncircular+0x12d/0x150
   __lock_acquire+0x119c/0x1fc0
   lock_acquire+0xa7/0x3d0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   __mutex_lock+0x7e/0x7e0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   ? lock_acquire+0xa7/0x3d0
   ? find_held_lock+0x2b/0x80
   __btrfs_release_delayed_node.part.0+0x3f/0x330
   btrfs_evict_inode+0x24c/0x500
   evict+0xcf/0x1f0
   dispose_list+0x48/0x70
   prune_icache_sb+0x44/0x50
   super_cache_scan+0x161/0x1e0
   do_shrink_slab+0x178/0x3c0
   shrink_slab+0x17c/0x290
   shrink_node+0x2b2/0x6d0
   balance_pgdat+0x30a/0x670
   kswapd+0x213/0x4c0
   ? _raw_spin_unlock_irqrestore+0x41/0x50
   ? add_wait_queue_exclusive+0x70/0x70
   ? balance_pgdat+0x670/0x670
   kthread+0x138/0x160
   ? kthread_create_worker_on_cpu+0x40/0x40
   ret_from_fork+0x1f/0x30

This happens because we are holding the chunk_mutex at the time of
adding in a new device.  However we only need to hold the
device_list_mutex, as we're going to iterate over the fs_devices
devices.  Move the sysfs init stuff outside of the chunk_mutex to get
rid of this lockdep splat.

CC: [email protected] # 4.4.x: f3cd2c5: btrfs: sysfs, rename device_link add/remove functions
CC: [email protected] # 4.4.x
Reported-by: David Sterba <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
[ Upstream commit 340b940ea0ed12d9adbb8f72dea17d516b2019e8 ]

If cm_create_timewait_info() fails, the timewait_info pointer will contain
an error value and will be used in cm_remove_remote() later.

  general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [OnePlusOSS#1] SMP KASAN PTI
  KASAN: null-ptr-deref in range [0×0000000000000120-0×0000000000000127]
  CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e OnePlusOSS#27
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:cm_remove_remote.isra.0+0x24/0×170 drivers/infiniband/core/cm.c:978
  Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00
  RSP: 0018:ffff888013127918 EFLAGS: 00010006
  RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000
  RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4
  RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d
  R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70
  R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c
  FS:  00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   cm_destroy_id+0x189/0×15b0 drivers/infiniband/core/cm.c:1155
   cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline]
   rdma_connect_locked+0x1100/0×17c0 drivers/infiniband/core/cma.c:4107
   rdma_connect+0x2a/0×40 drivers/infiniband/core/cma.c:4140
   ucma_connect+0x277/0×340 drivers/infiniband/core/ucma.c:1069
   ucma_write+0x236/0×2f0 drivers/infiniband/core/ucma.c:1724
   vfs_write+0x220/0×830 fs/read_write.c:603
   ksys_write+0x1df/0×240 fs/read_write.c:658
   do_syscall_64+0x33/0×40 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a977049 ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Maor Gottlieb <[email protected]>
Reported-by: Amit Matityahu <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Feb 14, 2021
[ Upstream commit 4a9d81caf841cd2c0ae36abec9c2963bf21d0284 ]

If the elem is deleted during be iterated on it, the iteration
process will fall into an endless loop.

kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [nfsd:17137]

PID: 17137  TASK: ffff8818d93c0000  CPU: 4   COMMAND: "nfsd"
    [exception RIP: __state_in_grace+76]
    RIP: ffffffffc00e817c  RSP: ffff8818d3aefc98  RFLAGS: 00000246
    RAX: ffff881dc0c38298  RBX: ffffffff81b03580  RCX: ffff881dc02c9f50
    RDX: ffff881e3fce8500  RSI: 0000000000000001  RDI: ffffffff81b03580
    RBP: ffff8818d3aefca0   R8: 0000000000000020   R9: ffff8818d3aefd40
    R10: ffff88017fc03800  R11: ffff8818e83933c0  R12: ffff8818d3aefd40
    R13: 0000000000000000  R14: ffff8818e8391068  R15: ffff8818fa6e4000
    CS: 0010  SS: 0018
 #0 [ffff8818d3aefc98] opens_in_grace at ffffffffc00e81e3 [grace]
 OnePlusOSS#1 [ffff8818d3aefca8] nfs4_preprocess_stateid_op at ffffffffc02a3e6c [nfsd]
 OnePlusOSS#2 [ffff8818d3aefd18] nfsd4_write at ffffffffc028ed5b [nfsd]
 OnePlusOSS#3 [ffff8818d3aefd80] nfsd4_proc_compound at ffffffffc0290a0d [nfsd]
 OnePlusOSS#4 [ffff8818d3aefdd0] nfsd_dispatch at ffffffffc027b800 [nfsd]
 OnePlusOSS#5 [ffff8818d3aefe08] svc_process_common at ffffffffc02017f3 [sunrpc]
 OnePlusOSS#6 [ffff8818d3aefe70] svc_process at ffffffffc0201ce3 [sunrpc]
 OnePlusOSS#7 [ffff8818d3aefe98] nfsd at ffffffffc027b117 [nfsd]
 OnePlusOSS#8 [ffff8818d3aefec8] kthread at ffffffff810b88c1
 OnePlusOSS#9 [ffff8818d3aeff50] ret_from_fork at ffffffff816d1607

The troublemake elem:
crash> lock_manager ffff881dc0c38298
struct lock_manager {
  list = {
    next = 0xffff881dc0c38298,
    prev = 0xffff881dc0c38298
  },
  block_opens = false
}

Fixes: c87fb4a ("lockd: NLM grace period shouldn't block NFSv4 opens")
Signed-off-by: Cheng Lin <[email protected]>
Signed-off-by: Yi Wang <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
marcomarinho pushed a commit to aospa-cheeseburger/android_kernel_oneplus_msm8998_OLD that referenced this issue Mar 21, 2021
…mm_struct

mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too.  It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.

And, the mmap_sem contention may cause unexpected issue like below:

INFO: task ps:14018 blocked for more than 120 seconds.
       Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 OnePlusOSS#1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
 ps              D    0 14018      1 0x00000004
 Call Trace:
   schedule+0x36/0x80
   rwsem_down_read_failed+0xf0/0x150
   call_rwsem_down_read_failed+0x18/0x30
   down_read+0x20/0x40
   proc_pid_cmdline_read+0xd9/0x4e0
   __vfs_read+0x37/0x150
   vfs_read+0x96/0x130
   SyS_read+0x55/0xc0
   entry_SYSCALL_64_fastpath+0x1a/0xc5

Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.

So, introduce a new spinlock in mm_struct to protect the concurrent
access to arg_start|end, env_start|end and others, as well as replace
write map_sem to read to protect the race condition between prctl and
sys_brk which might break check_data_rlimit(), and makes prctl more
friendly to other VM operations.

This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely since the later
access_remote_vm() call needs acquire mmap_sem.  The mmap_sem
scalability issue will be solved in the future.

Change-Id: Ifa8f001ee2fc4f0ce60c18e771cebcf8a1f0943e
[[email protected]: add comment about mmap_sem and arg_lock]
  Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Git-commit: 88aa7cc
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Srinivas Ramana <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants