https://github.com/torvalds/linux
Revision b29c701deacd5d24453127c37ed77ef851c53b8b authored by Henry Nestler on 12 May 2008, 13:44:39 UTC, committed by Ingo Molnar on 12 June 2008, 19:26:07 UTC
Page faults in kernel address space between PAGE_OFFSET and
VMALLOC_START should not be handled as vmalloc faults.

Fix rare endless page faults inside mount_block_root for the root
filesystem at boot time.

All 32-bit kernels up to 2.6.25 can fall into this hole.
I cannot demonstrate this under a native Linux kernel. I see that the
64-bit code has already fixed the problem, so I copied the same lines
into the 32-bit part.
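
The idea of the guard can be sketched in user space as follows. This is
a minimal sketch, not the kernel code: the SKETCH_* constants and the
sketch_* function names are hypothetical, and the address values are
illustrative only (the real VMALLOC_START and VMALLOC_END depend on the
kernel configuration and the amount of RAM).

```c
#include <stdint.h>

/* Illustrative values only; not taken from the commit. */
#define SKETCH_VMALLOC_START 0xCC800000u
#define SKETCH_VMALLOC_END   0xFF000000u

/* Returns 1 if addr lies inside the assumed vmalloc window, else 0. */
static int sketch_in_vmalloc_area(uint32_t addr)
{
    return addr >= SKETCH_VMALLOC_START && addr < SKETCH_VMALLOC_END;
}

/*
 * Shape of the guard: refuse with -1 before touching any page tables
 * when the faulting address is not a vmalloc address.
 */
static int sketch_vmalloc_fault(uint32_t addr)
{
    if (!sketch_in_vmalloc_area(addr))
        return -1;  /* leave it to the normal kernel fault path */
    return 0;       /* a real handler would sync the page tables here */
}
```

With these assumed bounds, the address 0xc03b4000 from the trace below
is rejected immediately instead of being walked as a vmalloc address.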

The recorded debug output is from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
Physical memory was trimmed down to 192MB to catch the bug more easily.
With more memory, the bug occurs more rarely.

Details of how any x86 32-bit system can fail:

Start from "mount_block_root":
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" gets one memory page of 4096 bytes.
The variable "p" walks through the existing file system types. The
first string is no problem.
But on the second iteration of the loop in mount_block_root, "p" no
longer points to the beginning of the page; the offset is, for example,
+9 if "reiserfs" is first in the list.
It then calls do_mount_root and lands in sys_mount.
Remember: the variable "type_page" now contains "fs_type+9" and does
not cover a full page.
sys_mount copies 4096 bytes with the function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540
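
The arithmetic behind the overrun can be sketched as below. This is a
user-space illustration under the assumptions stated above (a
NUL-separated name list starting at a page boundary, and a fixed
4096-byte copy); the sketch_* names are hypothetical and do not exist
in the kernel.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Offset of the second name in a NUL-separated list that starts at offset 0. */
static size_t sketch_second_name_offset(const char *first_name)
{
    return strlen(first_name) + 1;  /* skip the name and its NUL terminator */
}

/* Number of bytes a copy of `len` bytes starting at `offset` reads past the page. */
static size_t sketch_copy_overrun(size_t offset, size_t len)
{
    size_t end = offset + len;
    return end > SKETCH_PAGE_SIZE ? end - SKETCH_PAGE_SIZE : 0;
}
```

For "reiserfs" the second name starts at offset 9, so a 4096-byte copy
from there reads 9 bytes of the following page.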

Usually the pages behind the buffer, up to "fs_names+4096+9", are
mapped and the page fault handler is not called. No problem.

If the page after "fs_names+4096" is not mapped, the page fault handler
is called from http://lxr.linux.no/linux/fs/namespace.c#L1320

do_page_fault gets the address 0xc03b4000.
It is a kernel address (address >= TASK_SIZE), but not from vmalloc! It
comes from "__getname()", alias "kmem_cache_alloc".
The "error_code" is 0, so "vmalloc_fault" is called:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332
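
The dispatch decision described here can be sketched as follows. It is
a minimal user-space sketch, assuming the default 3G/1G split
(TASK_SIZE = 0xc0000000) and assuming that the handler takes the
vmalloc path only when the protection, user, and reserved bits of the
error code are all clear. The bit positions are the hardware-defined
x86 page-fault error code bits; the SKETCH_* and sketch_* names are
hypothetical.

```c
#include <stdint.h>

#define SKETCH_TASK_SIZE 0xC0000000u  /* assumed default 3G/1G split */

/* x86 page-fault error code bits. */
#define SKETCH_PF_PROT 0x1u  /* 0: page not present, 1: protection violation */
#define SKETCH_PF_USER 0x4u  /* fault was raised in user mode */
#define SKETCH_PF_RSVD 0x8u  /* reserved bit set in a page-table entry */

/* Returns 1 if the unguarded handler would try the vmalloc path, else 0. */
static int sketch_takes_vmalloc_path(uint32_t addr, uint32_t error_code)
{
    if (addr < SKETCH_TASK_SIZE)
        return 0;  /* user-space address: never a vmalloc fault */
    return (error_code & (SKETCH_PF_PROT | SKETCH_PF_USER | SKETCH_PF_RSVD)) == 0;
}
```

Under these assumptions, the address 0xc03b4000 with error code 0
qualifies for the vmalloc path although it was never allocated by
vmalloc.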

"vmalloc_fault" tries to find the physical page for a non-existent
virtual memory area. The macro "pte_present" in vmalloc_fault()
triggers another page fault, for 0xc0000ed0, at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exists for that virtual address. The page fault handler tries
to sync the physical page for the PTE lookup.

This calls vmalloc_fault() again for address 0xc000000, which also
does not exist. The endless loop begins...

Normally the CPU would keep looping with interrupts disabled. Under
coLinux this was caught by a stack overflow inside the printk debug
output.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
1 parent 3703f39
x86: fix endless page faults in mount_block_root for Linux 2.6
File Mode Size
async_tx
Kconfig -rw-r--r-- 17.4 KB
Makefile -rw-r--r-- 2.4 KB
ablkcipher.c -rw-r--r-- 8.8 KB
aead.c -rw-r--r-- 12.2 KB
aes_generic.c -rw-r--r-- 13.5 KB
algapi.c -rw-r--r-- 14.1 KB
anubis.c -rw-r--r-- 27.8 KB
api.c -rw-r--r-- 9.7 KB
arc4.c -rw-r--r-- 2.0 KB
authenc.c -rw-r--r-- 13.0 KB
blkcipher.c -rw-r--r-- 19.0 KB
blowfish.c -rw-r--r-- 17.5 KB
camellia.c -rw-r--r-- 35.2 KB
cast5.c -rw-r--r-- 34.1 KB
cast6.c -rw-r--r-- 21.5 KB
cbc.c -rw-r--r-- 7.4 KB
ccm.c -rw-r--r-- 21.5 KB
chainiv.c -rw-r--r-- 7.9 KB
cipher.c -rw-r--r-- 3.3 KB
compress.c -rw-r--r-- 1.3 KB
crc32c.c -rw-r--r-- 2.6 KB
cryptd.c -rw-r--r-- 9.2 KB
crypto_null.c -rw-r--r-- 4.7 KB
cryptomgr.c -rw-r--r-- 4.4 KB
ctr.c -rw-r--r-- 10.8 KB
cts.c -rw-r--r-- 9.8 KB
deflate.c -rw-r--r-- 5.5 KB
des_generic.c -rw-r--r-- 35.3 KB
digest.c -rw-r--r-- 3.7 KB
ecb.c -rw-r--r-- 4.9 KB
eseqiv.c -rw-r--r-- 6.2 KB
fcrypt.c -rw-r--r-- 18.0 KB
gcm.c -rw-r--r-- 19.8 KB
gf128mul.c -rw-r--r-- 13.2 KB
hash.c -rw-r--r-- 2.7 KB
hmac.c -rw-r--r-- 7.1 KB
internal.h -rw-r--r-- 3.8 KB
khazad.c -rw-r--r-- 51.8 KB
lrw.c -rw-r--r-- 7.5 KB
lzo.c -rw-r--r-- 2.5 KB
md4.c -rw-r--r-- 6.2 KB
md5.c -rw-r--r-- 7.3 KB
michael_mic.c -rw-r--r-- 3.5 KB
pcbc.c -rw-r--r-- 7.7 KB
proc.c -rw-r--r-- 2.8 KB
salsa20_generic.c -rw-r--r-- 7.3 KB
scatterwalk.c -rw-r--r-- 2.9 KB
seed.c -rw-r--r-- 17.4 KB
seqiv.c -rw-r--r-- 8.2 KB
serpent.c -rw-r--r-- 19.8 KB
sha1_generic.c -rw-r--r-- 3.1 KB
sha256_generic.c -rw-r--r-- 12.2 KB
sha512_generic.c -rw-r--r-- 9.6 KB
tcrypt.c -rw-r--r-- 46.3 KB
tcrypt.h -rw-r--r-- 268.9 KB
tea.c -rw-r--r-- 7.1 KB
tgr192.c -rw-r--r-- 31.1 KB
twofish.c -rw-r--r-- 6.3 KB
twofish_common.c -rw-r--r-- 37.7 KB
wp512.c -rw-r--r-- 60.3 KB
xcbc.c -rw-r--r-- 9.0 KB
xor.c -rw-r--r-- 3.6 KB
xts.c -rw-r--r-- 7.1 KB
