BUG: soft lockup - CPU#1 stuck for 10s! [php5:3111]

Discussion in 'Linux & BSD' started by LordOfLA, Mar 27, 2009.

  1. LordOfLA

    LordOfLA Godlike!

    Messages:
    7,027
    Location:
    Maidenhead, Berkshire, UK
    Anyone seen this before and know of a fix?

    CentOS 5.2, kernel 2.6.18-92.el5PAE

    Code:
    BUG: soft lockup - CPU#1 stuck for 10s! [php5:3111]
    Pid: 3111, comm:                 php5
    EIP: 0060:[<f887f09f>] CPU: 1
    EIP is at ext3_find_entry+0x16b/0x51e [ext3]
     EFLAGS: 00000293    Not tainted  (2.6.18-92.el5PAE #1)
    EAX: 00000000 EBX: f649ec00 ECX: 00000000 EDX: f6a56600
    ESI: f6be0b64 EDI: 00000045 EBP: f4284ee0 DS: 007b ES: 007b
    CR0: 80050033 CR2: bfe93b78 CR3: 36507600 CR4: 000006f0
     [<f8880835>] ext3_lookup+0x26/0x10f [ext3]
     [<c047bc1e>] __lookup_hash+0xb1/0xe1
     [<c047d59d>] do_unlinkat+0x57/0x10e
     [<c040962b>] sys_ipc+0x133/0x149
     [<c046211e>] sys_brk+0xcb/0xd3
     [<c0404e95>] sysenter_past_esp+0x56/0x79
     =======================
     
  2. Geffy

    Geffy Moderator Folding Team

    Messages:
    7,805
    Location:
    United Kingdom
    Do you know what it's trying to do? It looks like it might be trying to unlink a file on an ext3 file system, but I'm guessing you've worked that much out already.
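
    For anyone reading the trace the same way, here is a minimal Python sketch that pulls the function names out of the "symbol+offset/size" entries in the log above. The names TRACE, SYMBOL_RE and trace_symbols are made up for illustration; the trace text is copied from the first post.

```python
import re

# Call-trace lines copied from the soft lockup report in the first post.
TRACE = """\
EIP is at ext3_find_entry+0x16b/0x51e [ext3]
 [<f8880835>] ext3_lookup+0x26/0x10f [ext3]
 [<c047bc1e>] __lookup_hash+0xb1/0xe1
 [<c047d59d>] do_unlinkat+0x57/0x10e
 [<c040962b>] sys_ipc+0x133/0x149
 [<c046211e>] sys_brk+0xcb/0xd3
 [<c0404e95>] sysenter_past_esp+0x56/0x79
"""

# Matches "symbol+0xOFFSET/0xSIZE" and captures the symbol name.
SYMBOL_RE = re.compile(r"(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def trace_symbols(text):
    """Return the symbol names in the order they appear (innermost first)."""
    return SYMBOL_RE.findall(text)


if __name__ == "__main__":
    for name in trace_symbols(TRACE):
        print(name)
```

    The presence of do_unlinkat above ext3_lookup and ext3_find_entry is what suggests an unlink that is stuck looking up a directory entry.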
     
  3. LordOfLA

    LordOfLA Godlike!

    Aye, that part was obvious :)

    What isn't obvious is why PHP deleting a file ends up locking a CPU core at 100% and leaving the rest of the server almost useless...
     
  4. Geffy

    Geffy Moderator Folding Team

    The only files I can think of that PHP might link/unlink regularly are session files, though an individual script might add more to that. Of course, if you've got Zend Optimizer, I believe it caches the optimised scripts on the file system; at least, I'd expect it to.

    How are you running PHP: mod_php or a FastCGI process?
     
  5. LordOfLA

    LordOfLA Godlike!

    CGI. FastCGI doesn't like output buffering with Apache 2.2.

    It's daft that the other 3 cores are largely useless though.
     
  6. X-Istence

    X-Istence * Political User

    Messages:
    6,498
    Location:
    USA
    Ehm, a CPU soft lockup means that the scheduler is unable to interrupt whatever is running on that CPU, which is bad.

    Taking a look at your EFLAGS:

     EFLAGS: 00000293    Not tainted  (2.6.18-92.el5PAE #1)

    Now when we look at:

    http://en.wikipedia.org/wiki/FLAGS_register_(computing)

    We notice that bit number 9 is "interrupt enable". In your value of 00000293 that bit (0x200) is set, so the scheduler should be able to interrupt the process, unless it is locked by the ext3 file system.
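
    As a quick check of that reading, here is a minimal Python sketch that decodes the EFLAGS value from the trace. The table only covers the common status and control bits listed on the Wikipedia page, and the helper names decode_eflags and interrupts_enabled are made up for illustration.

```python
# Bit positions of the commonly documented x86 EFLAGS bits.
# Bit 1 is reserved and always reads as 1, so it is not listed here.
EFLAGS_BITS = {
    0: "CF",   # carry
    2: "PF",   # parity
    4: "AF",   # auxiliary carry
    6: "ZF",   # zero
    7: "SF",   # sign
    8: "TF",   # trap
    9: "IF",   # interrupt enable
    10: "DF",  # direction
    11: "OF",  # overflow
}


def decode_eflags(value):
    """Return the names of the flags set in value, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if value & (1 << bit)]


def interrupts_enabled(value):
    """True if bit 9 (interrupt enable) is set."""
    return bool(value & (1 << 9))


if __name__ == "__main__":
    flags = 0x00000293  # value from the "EFLAGS:" line of the report
    print(decode_eflags(flags))       # ['CF', 'AF', 'SF', 'IF']
    print(interrupts_enabled(flags))  # True
```

    0x293 is binary 10 1001 0011, so bits 0, 1, 4, 7 and 9 are set, which matches the output above.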

    From what I have been able to get from some kernel hackers I hang out with on IRC, it could be a bug in the module or the kernel. Have you checked that you have the latest updates for your BIOS? Apparently there have been a few microcode updates for certain Intel and AMD processors, and running without them could also cause the issue you have described.

    As for the other cores not working quite as well, that makes perfect sense. If one CPU is locked, certain caches will be locked as well, which makes cache flushes rather hard, along with various other tasks.