https://github.com/torvalds/linux
Revision b29c701deacd5d24453127c37ed77ef851c53b8b authored by Henry Nestler on 12 May 2008, 13:44:39 UTC, committed by Ingo Molnar on 12 June 2008, 19:26:07 UTC
Page faults in the kernel address space between PAGE_OFFSET and VMALLOC_START should not be handled as vmalloc faults. This fixes rare endless page faults inside mount_block_root for the root filesystem at boot time.

All 32-bit kernels up to 2.6.25 can fall into this hole. I cannot reproduce this under a native Linux kernel. I see that the 64-bit code has already fixed the problem, so I copied the same lines into the 32-bit part.

The recorded debug output is from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
Physical memory was trimmed down to 192MB to catch the bug more easily. With more memory the bug occurs more rarely.

Details of how every x86 32-bit system can fail:

Start from "mount_block_root":
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" gets one memory page of 4096 bytes. The variable "p" walks through the existing file system types. The first string is no problem. But in the second loop iteration in mount_block_root, "p" no longer points to the beginning of the page; the offset is, for example, +9 if "reiserfs" is first in the list. It then calls do_mount_root and lands in sys_mount. Remember: the variable "type_page" now contains "fs_type+9" and does not span a full page.

sys_mount copies 4096 bytes with the function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540
Usually pages exist after the buffer "fs_names+4096+9" and the page fault handler is not called. No problem. If the page after "fs_names+4096" is not mapped, the page fault handler is called from
http://lxr.linux.no/linux/fs/namespace.c#L1320

do_page_fault gets the address 0xc03b4000. It is a kernel address (address >= TASK_SIZE), but not from vmalloc! It is from "__getname()", alias "kmem_cache_alloc". The "error_code" is 0. "vmalloc_fault" is called:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332
"vmalloc_fault" tries to find the physical page for a non-existent virtual memory area. The macro "pte_present" in vmalloc_fault() triggers another page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282
No PTE exists for that virtual address. The page fault handler tries to sync the physical page for the PTE lookup. This calls vmalloc_fault() again for address 0xc000000, which also does not exist. The endless loop begins...

Normally the CPU would loop forever with interrupts disabled. Under coLinux this was caught by a stack overflow inside printk debugs.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
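The page-crossing step in the commit message can be checked with a little address arithmetic. The following is a minimal user-space sketch, not kernel code: the `sketch_` names are hypothetical, the 4 KiB page size is assumed, and the buffer base 0xc03b3000 used in the usage note is only inferred as the page preceding the reported fault address 0xc03b4000.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86 32-bit. */
#define SKETCH_PAGE_SIZE 4096u

/* Page number that an address falls into. */
static uintptr_t sketch_page_of(uintptr_t addr)
{
    return addr / SKETCH_PAGE_SIZE;
}

/*
 * True if reading len bytes starting at start touches more than one page.
 * A full-page copy from a page-aligned start stays inside one page; the
 * same copy from start+9 reaches 9 bytes into the following page.
 */
static bool sketch_copy_crosses_page(uintptr_t start, size_t len)
{
    if (len == 0)
        return false;
    return sketch_page_of(start) != sketch_page_of(start + len - 1);
}
```

With a hypothetical page-aligned buffer at 0xc03b3000, `sketch_copy_crosses_page(0xc03b3000, 4096)` is false, while `sketch_copy_crosses_page(0xc03b3000 + 9, 4096)` is true: the last bytes read lie in the page starting at 0xc03b4000, which is where the fault occurs if that page is not mapped.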
1 parent 3703f39
Tip revision: b29c701deacd5d24453127c37ed77ef851c53b8b authored by Henry Nestler on 12 May 2008, 13:44:39 UTC
x86: fix endless page faults in mount_block_root for Linux 2.6
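The idea behind the fix, as the commit message describes it, is that a faulting kernel address is only treated as a vmalloc fault if it actually lies in the vmalloc area. A minimal user-space sketch of such a range check follows. It is not the kernel's code: the `sketch_` names are hypothetical, and the two boundary values are placeholders, since the real VMALLOC_START and VMALLOC_END depend on kernel configuration and installed memory.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical boundaries, for illustration only. The real values of
 * VMALLOC_START and VMALLOC_END vary with configuration and RAM size.
 */
#define SKETCH_VMALLOC_START 0xcc800000u
#define SKETCH_VMALLOC_END   0xff000000u

/* True if address lies in [SKETCH_VMALLOC_START, SKETCH_VMALLOC_END). */
static bool sketch_in_vmalloc_area(uint32_t address)
{
    return address >= SKETCH_VMALLOC_START && address < SKETCH_VMALLOC_END;
}

/*
 * Returns 0 if the fault may be handled as a vmalloc fault, -1 otherwise.
 * Rejecting out-of-range addresses up front avoids walking page tables
 * for an address that was never part of the vmalloc area.
 */
static int sketch_vmalloc_fault(uint32_t address)
{
    if (!sketch_in_vmalloc_area(address))
        return -1;
    /* A real handler would synchronise the page tables here. */
    return 0;
}
```

Under these placeholder boundaries, the addresses from the commit message (0xc03b4000 and 0xc0000ed0) are rejected with -1, so the handler never starts the page-table walk that triggers the nested faults.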
File | Mode | Size |
---|---|---|
async_tx | | |
Kconfig | -rw-r--r-- | 17.4 KB |
Makefile | -rw-r--r-- | 2.4 KB |
ablkcipher.c | -rw-r--r-- | 8.8 KB |
aead.c | -rw-r--r-- | 12.2 KB |
aes_generic.c | -rw-r--r-- | 13.5 KB |
algapi.c | -rw-r--r-- | 14.1 KB |
anubis.c | -rw-r--r-- | 27.8 KB |
api.c | -rw-r--r-- | 9.7 KB |
arc4.c | -rw-r--r-- | 2.0 KB |
authenc.c | -rw-r--r-- | 13.0 KB |
blkcipher.c | -rw-r--r-- | 19.0 KB |
blowfish.c | -rw-r--r-- | 17.5 KB |
camellia.c | -rw-r--r-- | 35.2 KB |
cast5.c | -rw-r--r-- | 34.1 KB |
cast6.c | -rw-r--r-- | 21.5 KB |
cbc.c | -rw-r--r-- | 7.4 KB |
ccm.c | -rw-r--r-- | 21.5 KB |
chainiv.c | -rw-r--r-- | 7.9 KB |
cipher.c | -rw-r--r-- | 3.3 KB |
compress.c | -rw-r--r-- | 1.3 KB |
crc32c.c | -rw-r--r-- | 2.6 KB |
cryptd.c | -rw-r--r-- | 9.2 KB |
crypto_null.c | -rw-r--r-- | 4.7 KB |
cryptomgr.c | -rw-r--r-- | 4.4 KB |
ctr.c | -rw-r--r-- | 10.8 KB |
cts.c | -rw-r--r-- | 9.8 KB |
deflate.c | -rw-r--r-- | 5.5 KB |
des_generic.c | -rw-r--r-- | 35.3 KB |
digest.c | -rw-r--r-- | 3.7 KB |
ecb.c | -rw-r--r-- | 4.9 KB |
eseqiv.c | -rw-r--r-- | 6.2 KB |
fcrypt.c | -rw-r--r-- | 18.0 KB |
gcm.c | -rw-r--r-- | 19.8 KB |
gf128mul.c | -rw-r--r-- | 13.2 KB |
hash.c | -rw-r--r-- | 2.7 KB |
hmac.c | -rw-r--r-- | 7.1 KB |
internal.h | -rw-r--r-- | 3.8 KB |
khazad.c | -rw-r--r-- | 51.8 KB |
lrw.c | -rw-r--r-- | 7.5 KB |
lzo.c | -rw-r--r-- | 2.5 KB |
md4.c | -rw-r--r-- | 6.2 KB |
md5.c | -rw-r--r-- | 7.3 KB |
michael_mic.c | -rw-r--r-- | 3.5 KB |
pcbc.c | -rw-r--r-- | 7.7 KB |
proc.c | -rw-r--r-- | 2.8 KB |
salsa20_generic.c | -rw-r--r-- | 7.3 KB |
scatterwalk.c | -rw-r--r-- | 2.9 KB |
seed.c | -rw-r--r-- | 17.4 KB |
seqiv.c | -rw-r--r-- | 8.2 KB |
serpent.c | -rw-r--r-- | 19.8 KB |
sha1_generic.c | -rw-r--r-- | 3.1 KB |
sha256_generic.c | -rw-r--r-- | 12.2 KB |
sha512_generic.c | -rw-r--r-- | 9.6 KB |
tcrypt.c | -rw-r--r-- | 46.3 KB |
tcrypt.h | -rw-r--r-- | 268.9 KB |
tea.c | -rw-r--r-- | 7.1 KB |
tgr192.c | -rw-r--r-- | 31.1 KB |
twofish.c | -rw-r--r-- | 6.3 KB |
twofish_common.c | -rw-r--r-- | 37.7 KB |
wp512.c | -rw-r--r-- | 60.3 KB |
xcbc.c | -rw-r--r-- | 9.0 KB |
xor.c | -rw-r--r-- | 3.6 KB |
xts.c | -rw-r--r-- | 7.1 KB |