diff -urN 2.2.17pre19/CREDITS 2.2.17pre19aa2/CREDITS --- 2.2.17pre19/CREDITS Tue Aug 22 14:53:30 2000 +++ 2.2.17pre19aa2/CREDITS Sat Aug 26 17:13:50 2000 @@ -1354,6 +1354,13 @@ S: 2300 Copenhagen S S: Denmark +N: Heinz Mauelshagen +E: mge@EZ-Darmstadt.Telekom.de +D: Logical Volume Manager +S: Bartningstr. 12 +S: 64289 Darmstadt +S: Germany + N: Mike McLagan E: mike.mclagan@linux.org W: http://www.invlogic.com/~mmclagan diff -urN 2.2.17pre19/Documentation/Configure.help 2.2.17pre19aa2/Documentation/Configure.help --- 2.2.17pre19/Documentation/Configure.help Tue Aug 22 14:53:31 2000 +++ 2.2.17pre19aa2/Documentation/Configure.help Sat Aug 26 17:13:50 2000 @@ -168,6 +168,11 @@ on the Alpha. The only time you would ever not say Y is to say M in order to debug the code. Say Y unless you know what you are doing. +Big memory support +CONFIG_BIGMEM + This option is required if you want to utilize physical memory which + is not covered by the kernel virtual address space (> 1GB). + Normal PC floppy disk support CONFIG_BLK_DEV_FD If you want to use the floppy disk drive(s) of your PC under Linux, @@ -943,6 +948,30 @@ called on26.o. You must also have a high-level driver for the type of device that you want to support. +Logical Volume Manager (LVM) support +CONFIG_BLK_DEV_LVM + This driver lets you combine several hard disks, hard disk partitions, + multiple devices or even loop devices (for evaluation purposes) into + a volume group. Imagine a volume group as a kind of virtual disk. + Logical volumes, which can be thought of as virtual partitions, + can be created in the volume group. You can resize volume groups and + logical volumes after creation time, corresponding to new capacity needs. + Logical volumes are accessed as block devices named + /dev/VolumeGroupName/LogicalVolumeName. + + For details see /usr/src/linux/Documentaion/LVM-HOWTO. + + To get the newest software see . + +Logical Volume Manager proc filesystem information +CONFIG_LVM_PROC_FS + If you say Y here, you are able to access overall Logical Volume Manager, + Volume Group, Logical and Physical Volume information in /proc/lvm. + + To use this option, you have to check, that the "proc filesystem support" + (CONFIG_PROC_FS) is enabled too. + + Multiple devices driver support CONFIG_BLK_DEV_MD This driver lets you combine several hard disk partitions into one @@ -9518,6 +9547,20 @@ If you think you have a use for such a device (such as periodic data sampling), then say Y here, and read Documentation/rtc.txt for details. + For DEC Alpha users it is highly recommended to say Y here; if you + don't need all the features, you can choose the lightweight version + afterwards. + +Use only lightweight version (no interrupts) +CONFIG_RTC_LIGHT + This option turns off extended features of the RTC driver that deal + with interrupts (periodic signals and alarm). If you only need this + driver to read and set your system hardware clock, say Y here. + If you are on DEC Alpha, enabling this option will allow the kernel + to receive system clock interrupts in the standard, traditional + manner (that is, from the RTC device). Fully featured RTC driver + would move the clock signal source to the PIT (Programmable + Interrupt Timer), like on a PC. 
Tadpole ANA H8 Support CONFIG_H8 diff -urN 2.2.17pre19/Documentation/LVM-HOWTO 2.2.17pre19aa2/Documentation/LVM-HOWTO --- 2.2.17pre19/Documentation/LVM-HOWTO Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/Documentation/LVM-HOWTO Sat Aug 26 17:13:50 2000 @@ -0,0 +1,118 @@ +Heinz Mauelshagen's LVM (Logical Volume Manager) howto. 01/28/1999 + + +Abstract: +--------- +The LVM adds a kind of virtual disks and virtual partitions functionality +to the Linux operating system + +It achieves this by adding an additional layer between the physical peripherals +and the i/o interface in the kernel. + +This allows the concatenation of several disk partitions or total disks +(so-called physical volumes or PVs) or even multiple devices +to form a storage pool (so-called Volume Group or VG) with +allocation units called physical extents (called PE). +You can think of the volume group as a virtual disk. +Please see scenario below. + +Some or all PEs of this VG then can be allocated to so-called Logical Volumes +or LVs in units called logical extents or LEs. +Each LE is mapped to a corresponding PE. +LEs and PEs are equal in size. +Logical volumes are a kind of virtual partitions. + + +The LVs can be used through device special files similar to the known +/dev/sd[a-z]* or /dev/hd[a-z]* named /dev/VolumeGroupName/LogicalVolumeName. + +But going beyond this, you are able to extend or reduce +VGs _AND_ LVs at runtime! + +So... +If for example the capacity of a LV gets too small and your VG containing +this LV is full, you could add another PV to that VG and simply extend +the LV afterwards. +If you reduce or delete a LV you can use the freed capacity for different +LVs in the same VG. + + +The above scenario looks like this: + + /------------------------------------------\ + | /--PV2---\ VG 1 /--PVn---\ | + | |-VGDA---| |-VGDA-- | | + | |PE1PE2..| |PE1PE2..| | + | | | ...... | | | + | | | | | | + | | /-----------------------\ | | + | | \-------LV 1------------/ | | + | | ..PEn| | ..PEn| | + | \--------/ \--------/ | + \------------------------------------------/ + +PV 1 could be /dev/sdc1 sized 3GB +PV n could be /dev/sde1 sized 4GB +VG 1 could be test_vg +LV 1 could be /dev/test_vg/test_lv +VGDA is the volume group descriptor area holding the LVM metadata +PE1 up to PEn is the number of physical extents on each disk(partition) + + + +Installation steps see INSTALL and insmod(1)/modprobe(1), kmod/kerneld(8) +to load the logical volume manager module if you did not bind it +into the kernel. + + +Configuration steps for getting the above scenario: + +1. Set the partition system id to 0xFE on /dev/sdc1 and /dev/sde1. + +2. do a "pvcreate /dev/sd[ce]1" + For testing purposes you can use more than one partition on a disk. + You should not use more than one partition because in the case of + a striped LV you'll have a performance breakdown. + +3. do a "vgcreate test_vg /dev/sd[ce]1" to create the new VG named "test_vg" + which has the total capacity of both partitions. + vgcreate activates (transfers the metadata into the LVM driver in the kernel) + the new volume group too to be able to create LVs in the next step. + +4. do a "lvcreate -L1500 -ntest_lv test_vg" to get a 1500MB linear LV named + "test_lv" and it's block device special "/dev/test_vg/test_lv". + + Or do a "lvcreate -i2 -I4 -l1500 -nanother_test_lv test_vg" to get a 100 LE + large logical volume with 2 stripes and stripesize 4 KB. + +5. For example generate a filesystem in one LV with + "mke2fs /dev/test_vg/test_lv" and mount it. + +6. 
extend /dev/test_vg/test_lv to 1600MB with relative size by + "lvextend -L+100 /dev/test_vg/test_lv" + or with absolute size by + "lvextend -L1600 /dev/test_vg/test_lv" + +7. reduce /dev/test_vg/test_lv to 900 logical extents with relative extents by + "lvreduce -l-700 /dev/test_vg/test_lv" + or with absolute extents by + "lvreduce -l900 /dev/test_vg/test_lv" + +9. rename a VG by deactivating it with + "vgchange -an test_vg" # only VGs with _no_ open LVs can be deactivated! + "vgrename test_vg whatever" + and reactivate it again by + "vgchange -ay whatever" + +9. rename a LV after closing it by + "lvchange -an /dev/whatever/test_lv" # only closed LVs can be deactivated + "lvrename /dev/whatever/test_lv /dev/whatever/whatvolume" + or by + "lvrename whatever test_lv whatvolume" + and reactivate it again by + "lvchange -ay /dev/whatever/whatvolume" + +10. if you own Ted Tso's resize2fs program, you are able to resize the + ext2 type filesystems contained in logical volumes without destroyiing + the data by + "e2fsadm -L+100 /dev/test_vg/another_test_lv" diff -urN 2.2.17pre19/MAINTAINERS 2.2.17pre19aa2/MAINTAINERS --- 2.2.17pre19/MAINTAINERS Tue Aug 22 14:53:33 2000 +++ 2.2.17pre19aa2/MAINTAINERS Sat Aug 26 17:13:50 2000 @@ -561,6 +561,13 @@ W: http://people.redhat.com/zab/maestro/ S: Supported +LOGICAL VOLUME MANAGER +P: Heinz Mauelshagen +M: linux-LVM@EZ-Darmstadt.Telekom.de +L: linux-LVM@msede.com +W: http://linux.msede.com/lvm +S: Maintained + M68K P: Jes Sorensen M: Jes.Sorensen@cern.ch diff -urN 2.2.17pre19/Makefile 2.2.17pre19aa2/Makefile --- 2.2.17pre19/Makefile Tue Aug 22 14:53:33 2000 +++ 2.2.17pre19aa2/Makefile Sat Aug 26 17:13:49 2000 @@ -1,7 +1,7 @@ VERSION = 2 PATCHLEVEL = 2 SUBLEVEL = 17 -EXTRAVERSION = pre19 +EXTRAVERSION = pre19aa2 ARCH := $(shell uname -m | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ -e s/arm.*/arm/ -e s/sa110/arm/) diff -urN 2.2.17pre19/arch/alpha/config.in 2.2.17pre19aa2/arch/alpha/config.in --- 2.2.17pre19/arch/alpha/config.in Tue Aug 22 14:53:33 2000 +++ 2.2.17pre19aa2/arch/alpha/config.in Sat Aug 26 17:13:49 2000 @@ -21,6 +21,7 @@ mainmenu_option next_comment comment 'General setup' +bool 'BIGMEM support' CONFIG_BIGMEM choice 'Alpha system type' \ "Generic CONFIG_ALPHA_GENERIC \ Alcor/Alpha-XLT CONFIG_ALPHA_ALCOR \ diff -urN 2.2.17pre19/arch/alpha/defconfig 2.2.17pre19aa2/arch/alpha/defconfig --- 2.2.17pre19/arch/alpha/defconfig Sun Apr 2 21:07:48 2000 +++ 2.2.17pre19aa2/arch/alpha/defconfig Sat Aug 26 17:13:50 2000 @@ -255,7 +255,8 @@ # CONFIG_QIC02_TAPE is not set # CONFIG_WATCHDOG is not set # CONFIG_NVRAM is not set -# CONFIG_RTC is not set +CONFIG_RTC=y +CONFIG_RTC_LIGHT=y # # Video For Linux diff -urN 2.2.17pre19/arch/alpha/kernel/alpha_ksyms.c 2.2.17pre19aa2/arch/alpha/kernel/alpha_ksyms.c --- 2.2.17pre19/arch/alpha/kernel/alpha_ksyms.c Tue Jun 13 03:48:12 2000 +++ 2.2.17pre19aa2/arch/alpha/kernel/alpha_ksyms.c Sat Aug 26 17:13:49 2000 @@ -37,6 +37,7 @@ extern struct hwrpb_struct *hwrpb; extern void dump_thread(struct pt_regs *, struct user *); extern int dump_fpu(struct pt_regs *, elf_fpregset_t *); +extern spinlock_t rtc_lock; /* these are C runtime functions with special calling conventions: */ extern void __divl (void); @@ -187,6 +188,8 @@ EXPORT_SYMBOL(local_bh_count); EXPORT_SYMBOL(local_irq_count); #endif /* __SMP__ */ + +EXPORT_SYMBOL(rtc_lock); /* * The following are special because they're not called diff -urN 2.2.17pre19/arch/alpha/kernel/irq.h 2.2.17pre19aa2/arch/alpha/kernel/irq.h --- 2.2.17pre19/arch/alpha/kernel/irq.h Sun Apr 2 21:07:48 
2000 +++ 2.2.17pre19aa2/arch/alpha/kernel/irq.h Sat Aug 26 17:13:50 2000 @@ -44,7 +44,7 @@ } #define RTC_IRQ 8 -#ifdef CONFIG_RTC +#if defined(CONFIG_RTC) && !defined(CONFIG_RTC_LIGHT) #define TIMER_IRQ 0 /* timer is the pit */ #else #define TIMER_IRQ RTC_IRQ /* timer is the rtc */ diff -urN 2.2.17pre19/arch/alpha/kernel/osf_sys.c 2.2.17pre19aa2/arch/alpha/kernel/osf_sys.c --- 2.2.17pre19/arch/alpha/kernel/osf_sys.c Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/alpha/kernel/osf_sys.c Sat Aug 26 17:13:50 2000 @@ -108,7 +108,7 @@ int error; }; -static int osf_filldir(void *__buf, const char *name, int namlen, off_t offset, ino_t ino) +static int osf_filldir(void *__buf, const char *name, int namlen, off_t offset, ino_t ino, unsigned int d_type) { struct osf_dirent *dirent; struct osf_dirent_callback *buf = (struct osf_dirent_callback *) __buf; diff -urN 2.2.17pre19/arch/alpha/kernel/process.c 2.2.17pre19aa2/arch/alpha/kernel/process.c --- 2.2.17pre19/arch/alpha/kernel/process.c Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/alpha/kernel/process.c Sat Aug 26 17:13:50 2000 @@ -30,7 +30,7 @@ #include #include -#ifdef CONFIG_RTC +#if defined(CONFIG_RTC) && !defined(CONFIG_RTC_LIGHT) #include #endif @@ -150,7 +150,7 @@ } #endif /* __SMP__ */ -#ifdef CONFIG_RTC +#if defined(CONFIG_RTC) && !defined(CONFIG_RTC_LIGHT) /* Reset rtc to defaults. */ { unsigned char control; diff -urN 2.2.17pre19/arch/alpha/kernel/setup.c 2.2.17pre19aa2/arch/alpha/kernel/setup.c --- 2.2.17pre19/arch/alpha/kernel/setup.c Tue Aug 22 14:53:33 2000 +++ 2.2.17pre19aa2/arch/alpha/kernel/setup.c Sat Aug 26 17:13:52 2000 @@ -25,8 +25,9 @@ #include #include #include +#include -#ifdef CONFIG_RTC +#if defined(CONFIG_RTC) && !defined(CONFIG_RTC_LIGHT) #include #endif #ifdef CONFIG_BLK_DEV_INITRD @@ -65,7 +66,7 @@ #define N(a) (sizeof(a)/sizeof(a[0])) -static unsigned long find_end_memory(void); +static unsigned long find_end_memory(unsigned long); static unsigned long get_memory_end_override(char *); static struct alpha_machine_vector *get_sysvec(long, long, long); static struct alpha_machine_vector *get_sysvec_byname(const char *); @@ -279,26 +280,34 @@ /* Find our memory. 
*/ *memory_start_p = (unsigned long)kernel_end; - *memory_end_p = find_end_memory(); - if (memory_end_override && memory_end_override < *memory_end_p) { - printk("Overriding memory size from %luMB to %luMB\n", - __pa(*memory_end_p) >> 20, - __pa(memory_end_override) >> 20); - *memory_end_p = memory_end_override; - } + *memory_end_p = find_end_memory(memory_end_override); #ifdef CONFIG_BLK_DEV_INITRD initrd_start = INITRD_START; if (initrd_start) { - initrd_end = initrd_start+INITRD_SIZE; + unsigned long initrd_size = INITRD_SIZE; + initrd_end = initrd_start+initrd_size; printk("Initial ramdisk at: 0x%p (%lu bytes)\n", - (void *) initrd_start, INITRD_SIZE); + (void *) initrd_start, initrd_size); if (initrd_end > *memory_end_p) { printk("initrd extends beyond end of memory " "(0x%08lx > 0x%08lx)\ndisabling initrd\n", - initrd_end, (unsigned long) memory_end_p); + initrd_end, *memory_end_p); initrd_start = initrd_end = 0; + } else { + /* move initrd from the middle of the RAM to the + end of the RAM so we won't risk to rewrite + initrd while allocating the memory at boot time */ + unsigned long relocate; + + relocate = (*memory_end_p - initrd_size) & PAGE_MASK; + if (relocate > initrd_start) { + memmove((char *) relocate, + (char *) initrd_start, initrd_size); + initrd_start = relocate; + initrd_end = initrd_start + initrd_size; + } } } #endif @@ -312,7 +321,7 @@ /* ??? There is some circumstantial evidence that this needs to be done now rather than later in time_init, which would be more natural. Someone please explain or refute. */ -#if defined(CONFIG_RTC) +#if defined(CONFIG_RTC) && !defined(CONFIG_RTC_LIGHT) rtc_init_pit(); #else alpha_mv.init_pit(); @@ -354,7 +363,7 @@ } static unsigned long __init -find_end_memory(void) +find_end_memory(unsigned long memory_end_override) { int i; unsigned long high = 0; @@ -372,17 +381,49 @@ high = tmp; } - /* Round it up to an even number of pages. */ - high = (high + PAGE_SIZE) & (PAGE_MASK*2); +#ifndef CONFIG_BIGMEM +#define MAX_MEMORY 0x80000000UL +#else +#define LOW_MEMORY 0x80000000UL +#define MAX_MEMORY (VMALLOC_START-PAGE_OFFSET) +#endif /* Enforce maximum of 2GB even if there is more, * but only if the platform (support) cannot handle it. 
*/ - if (high > 0x80000000UL) { - printk("Cropping memory from %luMB to 2048MB\n", high >> 20); - high = 0x80000000UL; + if (high > MAX_MEMORY) { + printk("Cropping memory from %luMB to %luMB\n", + high>>20, MAX_MEMORY>>20); + high = MAX_MEMORY; } + if (memory_end_override && memory_end_override < high) { + printk("Overriding memory size from %luMB to %luMB\n", + high >> 20, memory_end_override >> 20); + high = memory_end_override; + } + +#ifdef CONFIG_BIGMEM + bigmem_start = bigmem_end = high; + if (high > LOW_MEMORY) + { + high = bigmem_start = LOW_MEMORY; + printk(KERN_NOTICE "%luMB BIGMEM available\n", + (bigmem_end-bigmem_start)>>20); + } +#ifdef BIGMEM_DEBUG + else + { + high -= high/4; + bigmem_start = high; + printk(KERN_NOTICE "emulating %luMB BIGMEM\n", + (bigmem_end-bigmem_start)>>20); + } +#endif + bigmem_start += PAGE_OFFSET; + bigmem_end += PAGE_OFFSET; +#endif + return (unsigned long) __va(high); } @@ -403,7 +444,7 @@ end = end << 30; from++; } - return (unsigned long) __va(end); + return end; } diff -urN 2.2.17pre19/arch/alpha/kernel/sys_nautilus.c 2.2.17pre19aa2/arch/alpha/kernel/sys_nautilus.c --- 2.2.17pre19/arch/alpha/kernel/sys_nautilus.c Tue Jun 13 03:48:12 2000 +++ 2.2.17pre19aa2/arch/alpha/kernel/sys_nautilus.c Sat Aug 26 17:13:50 2000 @@ -89,7 +89,7 @@ nautilus_kill_arch (int mode, char *restart_cmd) { -#ifdef CONFIG_RTC +#if defined(CONFIG_RTC) && !defined(CONFIG_RTC_LIGHT) /* Reset rtc to defaults. */ { unsigned char control; diff -urN 2.2.17pre19/arch/alpha/kernel/time.c 2.2.17pre19aa2/arch/alpha/kernel/time.c --- 2.2.17pre19/arch/alpha/kernel/time.c Tue Jun 13 03:48:12 2000 +++ 2.2.17pre19aa2/arch/alpha/kernel/time.c Sat Aug 26 17:13:50 2000 @@ -47,6 +47,7 @@ static int set_rtc_mmss(unsigned long); +spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED; /* * Shift amount by which scaled_ticks_per_cycle is scaled. Shifting @@ -173,7 +174,7 @@ * drivers depend on them being initialized (e.g., joystick driver). 
*/ -#ifdef CONFIG_RTC +#if defined(CONFIG_RTC) && !defined(CONFIG_RTC_LIGHT) void rtc_init_pit (void) { @@ -338,6 +339,20 @@ irq_handler = timer_interrupt; if (request_irq(TIMER_IRQ, irq_handler, 0, "timer", NULL)) panic("Could not allocate timer IRQ!"); + do_get_fast_time = do_gettimeofday; +} + +static inline void +timeval_normalize(struct timeval * tv) +{ + time_t __sec; + + __sec = tv->tv_usec / 1000000; + if (__sec) + { + tv->tv_usec %= 1000000; + tv->tv_sec += __sec; + } } /* @@ -388,13 +403,11 @@ #endif usec += delta_usec; - if (usec >= 1000000) { - sec += 1; - usec -= 1000000; - } tv->tv_sec = sec; tv->tv_usec = usec; + + timeval_normalize(tv); } void @@ -455,6 +468,8 @@ int real_seconds, real_minutes, cmos_minutes; unsigned char save_control, save_freq_select; + /* irq are locally disabled here */ + spin_lock(&rtc_lock); /* Tell the clock it's being set */ save_control = CMOS_READ(RTC_CONTROL); CMOS_WRITE((save_control|RTC_SET), RTC_CONTROL); @@ -504,6 +519,7 @@ */ CMOS_WRITE(save_control, RTC_CONTROL); CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT); + spin_unlock(&rtc_lock); return retval; } diff -urN 2.2.17pre19/arch/alpha/mm/fault.c 2.2.17pre19aa2/arch/alpha/mm/fault.c --- 2.2.17pre19/arch/alpha/mm/fault.c Tue Aug 22 14:53:33 2000 +++ 2.2.17pre19aa2/arch/alpha/mm/fault.c Sat Aug 26 17:13:49 2000 @@ -102,7 +102,7 @@ goto good_area; if (!(vma->vm_flags & VM_GROWSDOWN)) goto bad_area; - if (expand_stack(vma, address)) + if (expand_stack(vma, address, NULL)) goto bad_area; /* * Ok, we have a good vm_area for this memory access, so diff -urN 2.2.17pre19/arch/alpha/mm/init.c 2.2.17pre19aa2/arch/alpha/mm/init.c --- 2.2.17pre19/arch/alpha/mm/init.c Tue Jun 13 03:48:12 2000 +++ 2.2.17pre19aa2/arch/alpha/mm/init.c Sat Aug 26 17:13:49 2000 @@ -18,6 +18,7 @@ #ifdef CONFIG_BLK_DEV_INITRD #include #endif +#include #include #include @@ -30,6 +31,11 @@ extern void die_if_kernel(char *,struct pt_regs *,long); extern void show_net_buffers(void); +static unsigned long totalram_pages, totalbig_pages; + +#ifdef CONFIG_BIGMEM +unsigned long bigmem_start, bigmem_end; +#endif struct thread_struct original_pcb; #ifndef __SMP__ @@ -232,7 +238,11 @@ struct memdesc_struct * memdesc; /* initialize mem_map[] */ +#ifndef CONFIG_BIGMEM start_mem = free_area_init(start_mem, end_mem); +#else + start_mem = free_area_init(start_mem, bigmem_end); +#endif /* find free clusters, update mem_map[] accordingly */ memdesc = (struct memdesc_struct *) @@ -247,11 +257,20 @@ if (cluster->usage & 3) continue; pfn = cluster->start_pfn; +#ifndef CONFIG_BIGMEM if (pfn >= MAP_NR(end_mem)) /* if we overrode mem size */ +#else + if (pfn >= MAP_NR(bigmem_end)) +#endif continue; nr = cluster->numpages; +#ifndef CONFIG_BIGMEM if ((pfn + nr) > MAP_NR(end_mem)) /* if override in cluster */ nr = MAP_NR(end_mem) - pfn; +#else + if ((pfn + nr) > MAP_NR(bigmem_end)) /* if override in cluster */ + nr = MAP_NR(bigmem_end) - pfn; +#endif while (nr--) clear_bit(PG_reserved, &mem_map[pfn++].flags); @@ -306,9 +325,20 @@ mem_init(unsigned long start_mem, unsigned long end_mem) { unsigned long tmp; + unsigned long reservedpages = 0; +#ifdef CONFIG_BIGMEM + bigmem_start = PAGE_ALIGN(bigmem_start); + bigmem_end &= PAGE_MASK; +#endif end_mem &= PAGE_MASK; +#ifndef CONFIG_BIGMEM max_mapnr = num_physpages = MAP_NR(end_mem); +#else + max_mapnr = num_physpages = MAP_NR(bigmem_end); + /* cache the bigmem_mapnr */ + bigmem_mapnr = MAP_NR(bigmem_start); +#endif high_memory = (void *) end_mem; start_mem = PAGE_ALIGN(start_mem); @@ -321,11 +351,22 @@ tmp += 
PAGE_SIZE; } +#ifndef CONFIG_BIGMEM for (tmp = PAGE_OFFSET ; tmp < end_mem ; tmp += PAGE_SIZE) { +#else + for (tmp = PAGE_OFFSET ; tmp < bigmem_end; tmp += PAGE_SIZE) { +#endif if (tmp >= MAX_DMA_ADDRESS) clear_bit(PG_DMA, &mem_map[MAP_NR(tmp)].flags); if (PageReserved(mem_map+MAP_NR(tmp))) + { + reservedpages++; continue; + } +#ifdef CONFIG_BIGMEM + if (tmp >= bigmem_start) + set_bit(PG_BIGMEM, &mem_map[MAP_NR(tmp)].flags); +#endif atomic_set(&mem_map[MAP_NR(tmp)].count, 1); #ifdef CONFIG_BLK_DEV_INITRD if (initrd_start && tmp >= initrd_start && tmp < initrd_end) @@ -334,8 +375,10 @@ kill_page(tmp); free_page(tmp); } - tmp = nr_free_pages << (PAGE_SHIFT - 10); + tmp = (unsigned long) nr_free_pages << (PAGE_SHIFT - 10); printk("Memory: %luk available\n", tmp); + + totalram_pages = max_mapnr - reservedpages; return; } @@ -359,22 +402,11 @@ void si_meminfo(struct sysinfo *val) { - int i; - - i = max_mapnr; - val->totalram = 0; + val->totalram = totalram_pages << PAGE_SHIFT; val->sharedram = 0; val->freeram = ((unsigned long)nr_free_pages) << PAGE_SHIFT; val->bufferram = buffermem; - while (i-- > 0) { - if (PageReserved(mem_map+i)) - continue; - val->totalram++; - if (!atomic_read(&mem_map[i].count)) - continue; - val->sharedram += atomic_read(&mem_map[i].count) - 1; - } - val->totalram <<= PAGE_SHIFT; - val->sharedram <<= PAGE_SHIFT; + val->totalbig = totalbig_pages << PAGE_SHIFT; + val->freebig = (unsigned long) nr_free_bigpages << PAGE_SHIFT; return; } diff -urN 2.2.17pre19/arch/arm/mm/init.c 2.2.17pre19aa2/arch/arm/mm/init.c --- 2.2.17pre19/arch/arm/mm/init.c Mon Jan 17 16:44:33 2000 +++ 2.2.17pre19aa2/arch/arm/mm/init.c Sat Aug 26 17:13:49 2000 @@ -277,5 +277,7 @@ } val->totalram <<= PAGE_SHIFT; val->sharedram <<= PAGE_SHIFT; + val->totalbig = 0; + val->freebig = 0; } diff -urN 2.2.17pre19/arch/i386/config.in 2.2.17pre19aa2/arch/i386/config.in --- 2.2.17pre19/arch/i386/config.in Tue Aug 22 14:53:33 2000 +++ 2.2.17pre19aa2/arch/i386/config.in Sat Aug 26 17:13:49 2000 @@ -54,6 +54,7 @@ mainmenu_option next_comment comment 'General setup' +bool 'BIGMEM support' CONFIG_BIGMEM bool 'Networking support' CONFIG_NET bool 'PCI support' CONFIG_PCI if [ "$CONFIG_PCI" = "y" ]; then diff -urN 2.2.17pre19/arch/i386/kernel/entry.S 2.2.17pre19aa2/arch/i386/kernel/entry.S --- 2.2.17pre19/arch/i386/kernel/entry.S Tue Aug 22 14:53:33 2000 +++ 2.2.17pre19aa2/arch/i386/kernel/entry.S Sat Aug 26 17:13:50 2000 @@ -569,6 +569,18 @@ .long SYMBOL_NAME(sys_ni_syscall) /* streams1 */ .long SYMBOL_NAME(sys_ni_syscall) /* streams2 */ .long SYMBOL_NAME(sys_vfork) /* 190 */ + .long SYMBOL_NAME(sys_ni_syscall) /* getrlimit */ + .long SYMBOL_NAME(sys_mmap2) /* 192 */ + .long SYMBOL_NAME(sys_truncate64) /* 193 */ + .long SYMBOL_NAME(sys_ftruncate64) /* 194 */ + .long SYMBOL_NAME(sys_stat64) /* 195 */ + .long SYMBOL_NAME(sys_lstat64) /* 196 */ + .long SYMBOL_NAME(sys_fstat64) /* 197 */ + .rept 22 + .long SYMBOL_NAME(sys_ni_syscall) + .endr + .long SYMBOL_NAME(sys_getdents64) /* 220 */ + .long SYMBOL_NAME(sys_fcntl64) /* 221 */ /* * NOTE!! This doesn't have to be exact - we just have @@ -576,6 +588,6 @@ * entries. Don't panic if you notice that this hasn't * been shrunk every time we add a new system call. 
*/ - .rept NR_syscalls-190 + .rept NR_syscalls-221 .long SYMBOL_NAME(sys_ni_syscall) .endr diff -urN 2.2.17pre19/arch/i386/kernel/i386_ksyms.c 2.2.17pre19aa2/arch/i386/kernel/i386_ksyms.c --- 2.2.17pre19/arch/i386/kernel/i386_ksyms.c Mon Jan 17 16:44:33 2000 +++ 2.2.17pre19aa2/arch/i386/kernel/i386_ksyms.c Sat Aug 26 17:13:49 2000 @@ -20,6 +20,7 @@ extern void dump_thread(struct pt_regs *, struct user *); extern int dump_fpu(elf_fpregset_t *); +extern spinlock_t rtc_lock; #if defined(CONFIG_BLK_DEV_IDE) || defined(CONFIG_BLK_DEV_HD) || defined(CONFIG_BLK_DEV_IDE_MODULE) || defined(CONFIG_BLK_DEV_HD_MODULE) extern struct drive_info_struct drive_info; @@ -119,3 +120,5 @@ #ifdef CONFIG_VT EXPORT_SYMBOL(screen_info); #endif + +EXPORT_SYMBOL(rtc_lock); diff -urN 2.2.17pre19/arch/i386/kernel/mtrr.c 2.2.17pre19aa2/arch/i386/kernel/mtrr.c --- 2.2.17pre19/arch/i386/kernel/mtrr.c Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/i386/kernel/mtrr.c Sat Aug 26 17:13:49 2000 @@ -467,9 +467,9 @@ static void intel_get_mtrr (unsigned int reg, unsigned long *base, unsigned long *size, mtrr_type *type) { - unsigned long dummy, mask_lo, base_lo; + unsigned long mask_lo, mask_hi, base_lo, base_hi; - rdmsr (MTRRphysMask_MSR(reg), mask_lo, dummy); + rdmsr (MTRRphysMask_MSR(reg), mask_lo, mask_hi); if ((mask_lo & 0x800) == 0) { /* Invalid (i.e. free) range. */ *base = 0; @@ -478,20 +478,17 @@ return; } - rdmsr(MTRRphysBase_MSR(reg), base_lo, dummy); + rdmsr(MTRRphysBase_MSR(reg), base_lo, base_hi); - /* We ignore the extra address bits (32-35). If someone wants to - run x86 Linux on a machine with >4GB memory, this will be the - least of their problems. */ + /* Work out the shifted address mask. */ + mask_lo = 0xff000000 | mask_hi << (32 - PAGE_SHIFT) + | mask_lo >> PAGE_SHIFT; - /* Clean up mask_lo so it gives the real address mask. */ - mask_lo = (mask_lo & 0xfffff000UL); /* This works correctly if size is a power of two, i.e. a contiguous range. */ - *size = ~(mask_lo - 1); - - *base = (base_lo & 0xfffff000UL); - *type = (base_lo & 0xff); + *size = -mask_lo; + *base = base_hi << (32 - PAGE_SHIFT) | base_lo >> PAGE_SHIFT; + *type = base_lo & 0xff; } /* End Function intel_get_mtrr */ static void cyrix_get_arr (unsigned int reg, unsigned long *base, @@ -516,13 +513,13 @@ /* Enable interrupts if it was enabled previously */ __restore_flags (flags); shift = ((unsigned char *) base)[1] & 0x0f; - *base &= 0xfffff000UL; + *base >>= PAGE_SHIFT; /* Power of two, at least 4K on ARR0-ARR6, 256K on ARR7 * Note: shift==0xf means 4G, this is unsupported. */ if (shift) - *size = (reg < 7 ? 0x800UL : 0x20000UL) << shift; + *size = (reg < 7 ? 0x1UL : 0x40UL) << (shift - 1); else *size = 0; @@ -555,7 +552,7 @@ /* Upper dword is region 1, lower is region 0 */ if (reg == 1) low = high; /* The base masks off on the right alignment */ - *base = low & 0xFFFE0000; + *base = (low & 0xFFFE0000) >> PAGE_SHIFT; *type = 0; if (low & 1) *type = MTRR_TYPE_UNCACHABLE; if (low & 2) *type = MTRR_TYPE_WRCOMB; @@ -580,7 +577,7 @@ * *128K ... 
*/ low = (~low) & 0x1FFFC; - *size = (low + 4) << 15; + *size = (low + 4) << (15 - PAGE_SHIFT); return; } /* End Function amd_get_mtrr */ @@ -599,8 +596,8 @@ unsigned i; u32 tb; tb = centaur_ctx->mcr[reg].low & 0xfff; - *base = centaur_ctx->mcr[reg].high & 0xfffff000; - *size = (~(centaur_ctx->mcr[reg].low & 0xfffff000))+1; + *base = centaur_ctx->mcr[reg].high >> PAGE_SHIFT; + *size = -(centaur_ctx->mcr[reg].low & 0xfffff000) >> PAGE_SHIFT; if (*size) { for( i=0; itype_bits[i]==tb) { @@ -637,8 +634,10 @@ } else { - wrmsr (MTRRphysBase_MSR (reg), base | type, 0); - wrmsr (MTRRphysMask_MSR (reg), ~(size - 1) | 0x800, 0); + wrmsr (MTRRphysBase_MSR (reg), base << PAGE_SHIFT | type, + (base & 0xf00000) >> (32 - PAGE_SHIFT)); + wrmsr (MTRRphysMask_MSR (reg), -size << PAGE_SHIFT | 0x800, + (-size & 0xf00000) >> (32 - PAGE_SHIFT)); } if (do_safe) set_mtrr_done (&ctxt); } /* End Function intel_set_mtrr_up */ @@ -652,7 +651,9 @@ arr = CX86_ARR_BASE + (reg << 1) + reg; /* avoid multiplication by 3 */ /* count down from 32M (ARR0-ARR6) or from 2G (ARR7) */ - size >>= (reg < 7 ? 12 : 18); + if (reg >= 7) + size >>= 6; + size &= 0x7fff; /* make sure arr_size <= 14 */ for(arr_size = 0; size; arr_size++, size >>= 1); @@ -673,6 +674,7 @@ } if (do_safe) set_mtrr_prepare (&ctxt); + base <<= PAGE_SHIFT; setCx86(arr, ((unsigned char *) &base)[3]); setCx86(arr+1, ((unsigned char *) &base)[2]); setCx86(arr+2, (((unsigned char *) &base)[1]) | arr_size); @@ -692,34 +694,36 @@ [RETURNS] Nothing. */ { - u32 low, high; + u32 regs[2]; struct set_mtrr_context ctxt; if (do_safe) set_mtrr_prepare (&ctxt); /* * Low is MTRR0 , High MTRR 1 */ - rdmsr (0xC0000085, low, high); + rdmsr (0xC0000085, regs[0], regs[1]); /* * Blank to disable */ if (size == 0) - *(reg ? &high : &low) = 0; + regs[reg] = 0; else - /* Set the register to the base (already shifted for us), the - type (off by one) and an inverted bitmask of the size - The size is the only odd bit. We are fed say 512K - We invert this and we get 111 1111 1111 1011 but - if you subtract one and invert you get the desired - 111 1111 1111 1100 mask - */ - *(reg ? &high : &low)=(((~(size-1))>>15)&0x0001FFFC)|base|(type+1); + /* Set the register to the base, the type (off by one) and an + inverted bitmask of the size The size is the only odd + bit. We are fed say 512K We invert this and we get 111 1111 + 1111 1011 but if you subtract one and invert you get the + desired 111 1111 1111 1100 mask + + But ~(x - 1) == ~x + 1 == -x. Two's complement rocks! 
*/ + regs[reg] = (-size>>(15-PAGE_SHIFT) & 0x0001FFFC) + | (base<type_bits[type]); + high = base << PAGE_SHIFT; + low = -size << PAGE_SHIFT | centaur_ctx->type_bits[type]; } centaur_ctx->mcr[reg].high = high; centaur_ctx->mcr[reg].low = low; @@ -1041,7 +1045,7 @@ for (i = 0; i < max; ++i) { (*get_mtrr) (i, &lbase, &lsize, <ype); - if (lsize < 1) return i; + if (lsize == 0) return i; } return -ENOSPC; } /* End Function generic_get_free_region */ @@ -1058,7 +1062,7 @@ unsigned long lbase, lsize; /* If we are to set up a region >32M then look at ARR7 immediately */ - if (size > 0x2000000UL) { + if (size > 0x2000UL) { cyrix_get_arr (7, &lbase, &lsize, <ype); if (lsize < 1) return 7; /* else try ARR0-ARR6 first */ @@ -1066,11 +1070,11 @@ for (i = 0; i < 7; i++) { cyrix_get_arr (i, &lbase, &lsize, <ype); - if (lsize < 1) return i; + if (lsize == 0) return i; } /* ARR0-ARR6 isn't free, try ARR7 but its size must be at least 256K */ cyrix_get_arr (i, &lbase, &lsize, <ype); - if ((lsize < 1) && (size >= 0x40000)) return i; + if ((lsize == 0) && (size >= 0x40)) return i; } return -ENOSPC; } /* End Function cyrix_get_free_region */ @@ -1129,7 +1133,7 @@ /* Fall through */ case X86_VENDOR_CYRIX: case X86_VENDOR_CENTAUR: - if ( (base & 0xfff) || (size & 0xfff) ) + if ( (base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1)) ) { printk ("mtrr: size and base must be multiples of 4 kiB\n"); printk ("mtrr: size: %lx base: %lx\n", size, base); @@ -1142,7 +1146,7 @@ return -EINVAL; } } - else if (base + size < 0x100000) /* Cyrix */ + else if (base + size < 0x100000) /* Not Centaur */ { printk ("mtrr: cannot set region below 1 MiB (0x%lx,0x%lx)\n", base, size); @@ -1164,6 +1168,12 @@ return -EINVAL; /*break;*/ } + + /* For all CPU types, the checks above should have ensured that + base and size are page aligned */ + base >>= PAGE_SHIFT; + size >>= PAGE_SHIFT; + /* If the type is WC, check that this processor supports it */ if ( (type == MTRR_TYPE_WRCOMB) && !have_wrcomb () ) { @@ -1183,7 +1193,8 @@ if ( (base < lbase) || (base + size > lbase + lsize) ) { spin_unlock (&main_lock); - printk ("mtrr: 0x%lx,0x%lx overlaps existing 0x%lx,0x%lx\n", + printk ("mtrr: 0x%lx000,0x%lx000 overlaps existing" + " 0x%lx000,0x%lx000\n", base, size, lbase, lsize); return -EINVAL; } @@ -1193,7 +1204,7 @@ if ((boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR) && (type == MTRR_TYPE_UNCACHABLE)) continue; spin_unlock (&main_lock); - printk ( "mtrr: type mismatch for %lx,%lx old: %s new: %s\n", + printk ( "mtrr: type mismatch for %lx000,%lx000 old: %s new: %s\n", base, size, attrib_to_str (ltype), attrib_to_str (type) ); return -EINVAL; } @@ -1241,7 +1252,8 @@ for (i = 0; i < max; ++i) { (*get_mtrr) (i, &lbase, &lsize, <ype); - if ( (lbase == base) && (lsize == size) ) + if (lbase < 0x100000 && lbase << PAGE_SHIFT == base + && lsize < 0x100000 && lsize << PAGE_SHIFT == size) { reg = i; break; @@ -1250,7 +1262,7 @@ if (reg < 0) { spin_unlock (&main_lock); - printk ("mtrr: no MTRR for %lx,%lx found\n", base, size); + printk ("mtrr: no MTRR for %lx000,%lx000 found\n", base, size); return -EINVAL; } } @@ -1431,7 +1443,16 @@ return -EFAULT; if ( gentry.regnum >= get_num_var_ranges () ) return -EINVAL; (*get_mtrr) (gentry.regnum, &gentry.base, &gentry.size, &type); - gentry.type = type; + + /* Hide entries that go above 4GB */ + if (gentry.base + gentry.size > 0x100000 || gentry.size == 0x100000) + gentry.base = gentry.size = gentry.type = 0; + else { + gentry.base <<= PAGE_SHIFT; + gentry.size <<= PAGE_SHIFT; + gentry.type = type; + } + if ( 
copy_to_user ( (void *) arg, &gentry, sizeof gentry) ) return -EFAULT; break; @@ -1523,24 +1544,24 @@ for (i = 0; i < max; i++) { (*get_mtrr) (i, &base, &size, &type); - if (size < 1) usage_table[i] = 0; + if (size == 0) usage_table[i] = 0; else { - if (size < 0x100000) + if (size < 0x100000 >> PAGE_SHIFT) { - /* 1MB */ + /* less than 1MB */ factor = 'k'; - size >>= 10; + size <<= PAGE_SHIFT - 10; } else { factor = 'M'; - size >>= 20; + size >>= 20 - PAGE_SHIFT; } sprintf (ascii_buffer + ascii_buf_bytes, - "reg%02i: base=0x%08lx (%4liMB), size=%4li%cB: %s, count=%d\n", - i, base, base>>20, size, factor, + "reg%02i: base=0x%05lx000 (%4liMB), size=%4li%cB: %s, count=%d\n", + i, base, base >> (20 - PAGE_SHIFT), size, factor, attrib_to_str (type), usage_table[i]); ascii_buf_bytes += strlen (ascii_buffer + ascii_buf_bytes); } diff -urN 2.2.17pre19/arch/i386/kernel/ptrace.c 2.2.17pre19aa2/arch/i386/kernel/ptrace.c --- 2.2.17pre19/arch/i386/kernel/ptrace.c Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/i386/kernel/ptrace.c Sat Aug 26 17:13:49 2000 @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -81,6 +82,7 @@ pmd_t * pgmiddle; pte_t * pgtable; unsigned long page; + unsigned long retval; int fault; repeat: @@ -126,7 +128,10 @@ if (MAP_NR(page) >= max_mapnr) return 0; page += addr & ~PAGE_MASK; - return *(unsigned long *) page; + page = kmap(page, KM_READ); + retval = *(unsigned long *) page; + kunmap(page, KM_READ); + return retval; } /* @@ -196,7 +201,13 @@ } /* this is a hack for non-kernel-mapped video buffers and similar */ if (MAP_NR(page) < max_mapnr) - *(unsigned long *) (page + (addr & ~PAGE_MASK)) = data; + { + unsigned long vaddr; + + vaddr = kmap(page, KM_WRITE); + *(unsigned long *) (vaddr + (addr & ~PAGE_MASK)) = data; + kunmap(vaddr, KM_WRITE); + } /* we're bypassing pagetables, so we have to set the dirty bit ourselves */ /* this should also re-instate whatever read-only mode there was before */ set_pte(pgtable, pte_mkdirty(mk_pte(page, vma->vm_page_prot))); diff -urN 2.2.17pre19/arch/i386/kernel/setup.c 2.2.17pre19aa2/arch/i386/kernel/setup.c --- 2.2.17pre19/arch/i386/kernel/setup.c Tue Aug 22 14:53:34 2000 +++ 2.2.17pre19aa2/arch/i386/kernel/setup.c Sat Aug 26 17:13:49 2000 @@ -29,6 +29,8 @@ * Dragan Stancevic , May 2000 * * Transmeta CPU detection. H. Peter Anvin , May 2000 + * + * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999 */ /* @@ -54,6 +56,7 @@ #ifdef CONFIG_BLK_DEV_RAM #include #endif +#include #include #include #include @@ -383,12 +386,31 @@ #define VMALLOC_RESERVE (64 << 20) /* 64MB for vmalloc */ #define MAXMEM ((unsigned long)(-PAGE_OFFSET-VMALLOC_RESERVE)) +#ifdef CONFIG_BIGMEM + bigmem_start = bigmem_end = memory_end; +#endif if (memory_end > MAXMEM) { +#ifdef CONFIG_BIGMEM +#define MAXBIGMEM ((unsigned long)(~(VMALLOC_RESERVE-1))) + bigmem_start = MAXMEM; + bigmem_end = (memory_end < MAXBIGMEM) ? 
memory_end : MAXBIGMEM; +#endif memory_end = MAXMEM; +#ifdef CONFIG_BIGMEM + printk(KERN_NOTICE "%ldMB BIGMEM available.\n", + (bigmem_end-bigmem_start)>>20); +#else printk(KERN_WARNING "Warning only %ldMB will be used.\n", MAXMEM>>20); +#endif } +#if defined(CONFIG_BIGMEM) && defined(BIGMEM_DEBUG) + else { + memory_end -= memory_end/4; + bigmem_start = memory_end; + } +#endif memory_end += PAGE_OFFSET; *memory_start_p = memory_start; diff -urN 2.2.17pre19/arch/i386/kernel/smp.c 2.2.17pre19aa2/arch/i386/kernel/smp.c --- 2.2.17pre19/arch/i386/kernel/smp.c Tue Aug 22 14:53:34 2000 +++ 2.2.17pre19aa2/arch/i386/kernel/smp.c Sat Aug 26 17:13:49 2000 @@ -795,7 +795,6 @@ return memory_start; } -#ifdef CONFIG_X86_TSC /* * TSC synchronization. * @@ -995,8 +994,6 @@ } #undef NR_LOOPS -#endif - extern void calibrate_delay(void); void __init smp_callin(void) @@ -1083,12 +1080,11 @@ */ set_bit(cpuid, (unsigned long *)&cpu_callin_map[0]); -#ifdef CONFIG_X86_TSC /* * Synchronize the TSC with the BP */ - synchronize_tsc_ap (); -#endif + if (boot_cpu_data.x86_capability & X86_FEATURE_TSC) + synchronize_tsc_ap (); } int cpucount = 0; @@ -1624,13 +1620,11 @@ smp_done: -#ifdef CONFIG_X86_TSC /* * Synchronize the TSC with the AP */ - if (cpucount) + if (boot_cpu_data.x86_capability & X86_FEATURE_TSC && cpucount) synchronize_tsc_bp(); -#endif } /* diff -urN 2.2.17pre19/arch/i386/kernel/sys_i386.c 2.2.17pre19aa2/arch/i386/kernel/sys_i386.c --- 2.2.17pre19/arch/i386/kernel/sys_i386.c Mon Jan 17 16:44:33 2000 +++ 2.2.17pre19aa2/arch/i386/kernel/sys_i386.c Sat Aug 26 17:13:50 2000 @@ -41,6 +41,42 @@ return error; } +/* common code for old and new mmaps */ +static inline long do_mmap2( + unsigned long addr, unsigned long len, + unsigned long prot, unsigned long flags, + unsigned long fd, unsigned long pgoff) +{ + int error = -EBADF; + struct file * file = NULL; + + down(¤t->mm->mmap_sem); + lock_kernel(); + + flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE); + if (!(flags & MAP_ANONYMOUS)) { + file = fget(fd); + if (!file) + goto out; + } + + error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff); + + if (file) + fput(file); +out: + unlock_kernel(); + up(¤t->mm->mmap_sem); + return error; +} + +asmlinkage long sys_mmap2(unsigned long addr, unsigned long len, + unsigned long prot, unsigned long flags, + unsigned long fd, unsigned long pgoff) +{ + return do_mmap2(addr, len, prot, flags, fd, pgoff); +} + /* * Perform the select(nd, in, out, ex, tv) and mmap() system * calls. 
Linux/i386 didn't use to be able to handle more than @@ -59,30 +95,19 @@ asmlinkage int old_mmap(struct mmap_arg_struct *arg) { - int error = -EFAULT; - struct file * file = NULL; struct mmap_arg_struct a; + int err = -EFAULT; if (copy_from_user(&a, arg, sizeof(a))) - return -EFAULT; + goto out; - down(¤t->mm->mmap_sem); - lock_kernel(); - if (!(a.flags & MAP_ANONYMOUS)) { - error = -EBADF; - file = fget(a.fd); - if (!file) - goto out; - } - a.flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE); + err = -EINVAL; + if (a.offset & ~PAGE_MASK) + goto out; - error = do_mmap(file, a.addr, a.len, a.prot, a.flags, a.offset); - if (file) - fput(file); + err = do_mmap2(a.addr, a.len, a.prot, a.flags, a.fd, a.offset >> PAGE_SHIFT); out: - unlock_kernel(); - up(¤t->mm->mmap_sem); - return error; + return err; } extern asmlinkage int sys_select(int, fd_set *, fd_set *, fd_set *, struct timeval *); diff -urN 2.2.17pre19/arch/i386/kernel/time.c 2.2.17pre19aa2/arch/i386/kernel/time.c --- 2.2.17pre19/arch/i386/kernel/time.c Tue Jun 13 03:48:12 2000 +++ 2.2.17pre19aa2/arch/i386/kernel/time.c Sat Aug 26 17:13:49 2000 @@ -77,6 +77,9 @@ unsigned long fast_gettimeoffset_quotient=0; extern rwlock_t xtime_lock; +extern volatile unsigned long lost_ticks; + +spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED; static inline unsigned long do_fast_gettimeoffset(void) { @@ -112,6 +115,8 @@ #ifndef CONFIG_X86_TSC +spinlock_t i8253_lock = SPIN_LOCK_UNLOCKED; + /* This function must be called with interrupts disabled * It was inspired by Steve McCanne's microtime-i386 for BSD. -- jrs * @@ -156,6 +161,8 @@ */ unsigned long jiffies_t; + /* gets recalled with irq locally disabled */ + spin_lock(&i8253_lock); /* timer count may underflow right here */ outb_p(0x00, 0x43); /* latch the count ASAP */ @@ -214,6 +221,7 @@ } } else jiffies_p = jiffies_t; + spin_unlock(&i8253_lock); count_p = count; @@ -231,13 +239,26 @@ #endif +/* FIXME: should be inline but gcc is buggy and breaks */ +static void +timeval_normalize(struct timeval * tv) +{ + time_t __sec; + + __sec = tv->tv_usec / 1000000; + if (__sec) + { + tv->tv_usec %= 1000000; + tv->tv_sec += __sec; + } +} + /* * This version of gettimeofday has microsecond resolution * and better than microsecond precision on fast x86 machines with TSC. */ void do_gettimeofday(struct timeval *tv) { - extern volatile unsigned long lost_ticks; unsigned long flags; unsigned long usec, sec; @@ -252,13 +273,10 @@ usec += xtime.tv_usec; read_unlock_irqrestore(&xtime_lock, flags); - while (usec >= 1000000) { - usec -= 1000000; - sec++; - } - tv->tv_sec = sec; tv->tv_usec = usec; + + timeval_normalize(tv); } void do_settimeofday(struct timeval *tv) @@ -271,6 +289,7 @@ * would have done, and then undo it! */ tv->tv_usec -= do_gettimeoffset(); + tv->tv_usec -= lost_ticks * (1000000 / HZ); while (tv->tv_usec < 0) { tv->tv_usec += 1000000; @@ -301,6 +320,8 @@ int real_seconds, real_minutes, cmos_minutes; unsigned char save_control, save_freq_select; + /* gets recalled with irq locally disabled */ + spin_lock(&rtc_lock); save_control = CMOS_READ(RTC_CONTROL); /* tell the clock it's being set */ CMOS_WRITE((save_control|RTC_SET), RTC_CONTROL); @@ -346,6 +367,7 @@ */ CMOS_WRITE(save_control, RTC_CONTROL); CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT); + spin_unlock(&rtc_lock); return retval; } @@ -447,10 +469,19 @@ rdtscl(last_tsc_low); +#if 0 /* + * SUBTLE: this is not necessary from here because it's implicit in the + * write xtime_lock. 
+ */ + spin_lock(&i8253_lock); +#endif outb_p(0x00, 0x43); /* latch the count ASAP */ count = inb_p(0x40); /* read the latched count */ count |= inb(0x40) << 8; +#if 0 + spin_unlock(&i8253_lock); +#endif count = ((LATCH-1) - count) * TICK_SIZE; delay_at_last_interrupt = (count + LATCH/2) / LATCH; diff -urN 2.2.17pre19/arch/i386/lib/usercopy.c 2.2.17pre19aa2/arch/i386/lib/usercopy.c --- 2.2.17pre19/arch/i386/lib/usercopy.c Mon Jan 17 16:44:33 2000 +++ 2.2.17pre19aa2/arch/i386/lib/usercopy.c Sat Aug 26 17:13:52 2000 @@ -10,16 +10,20 @@ unsigned long __generic_copy_to_user(void *to, const void *from, unsigned long n) { - if (access_ok(VERIFY_WRITE, to, n)) + if (access_ok(VERIFY_WRITE, to, n)) { __copy_user(to,from,n); + conditional_schedule(); + } return n; } unsigned long __generic_copy_from_user(void *to, const void *from, unsigned long n) { - if (access_ok(VERIFY_READ, from, n)) + if (access_ok(VERIFY_READ, from, n)) { __copy_user_zeroing(to,from,n); + conditional_schedule(); + } return n; } @@ -61,6 +65,7 @@ { long res; __do_strncpy_from_user(dst, src, count, res); + conditional_schedule(); return res; } @@ -68,8 +73,10 @@ strncpy_from_user(char *dst, const char *src, long count) { long res = -EFAULT; - if (access_ok(VERIFY_READ, src, 1)) + if (access_ok(VERIFY_READ, src, 1)) { __do_strncpy_from_user(dst, src, count, res); + conditional_schedule(); + } return res; } @@ -102,8 +109,10 @@ unsigned long clear_user(void *to, unsigned long n) { - if (access_ok(VERIFY_WRITE, to, n)) + if (access_ok(VERIFY_WRITE, to, n)) { __do_clear_user(to, n); + conditional_schedule(); + } return n; } @@ -111,6 +120,7 @@ __clear_user(void *to, unsigned long n) { __do_clear_user(to, n); + conditional_schedule(); return n; } @@ -143,5 +153,6 @@ :"=r" (n), "=D" (s), "=a" (res), "=c" (tmp) :"0" (n), "1" (s), "2" (0), "3" (mask) :"cc"); + conditional_schedule(); return res & mask; } diff -urN 2.2.17pre19/arch/i386/mm/Makefile 2.2.17pre19aa2/arch/i386/mm/Makefile --- 2.2.17pre19/arch/i386/mm/Makefile Mon Jan 18 02:28:56 1999 +++ 2.2.17pre19aa2/arch/i386/mm/Makefile Sat Aug 26 17:13:49 2000 @@ -10,4 +10,8 @@ O_TARGET := mm.o O_OBJS := init.o fault.o ioremap.o extable.o +ifeq ($(CONFIG_BIGMEM),y) +O_OBJS += bigmem.o +endif + include $(TOPDIR)/Rules.make diff -urN 2.2.17pre19/arch/i386/mm/bigmem.c 2.2.17pre19aa2/arch/i386/mm/bigmem.c --- 2.2.17pre19/arch/i386/mm/bigmem.c Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/arch/i386/mm/bigmem.c Sat Aug 26 17:13:49 2000 @@ -0,0 +1,35 @@ +/* + * BIGMEM IA32 code and variables. + * + * (C) 1999 Andrea Arcangeli, SuSE GmbH, andrea@suse.de + * Gerhard Wichert, Siemens AG, Gerhard.Wichert@pdb.siemens.de + */ + +#include +#include + +unsigned long bigmem_start, bigmem_end; + +/* NOTE: fixmap_init alloc all the fixmap pagetables contigous on the + physical space so we can cache the place of the first one and move + around without checking the pgd every time. 
*/ +pte_t *kmap_pte; +pgprot_t kmap_prot; + +#define kmap_get_fixmap_pte(vaddr) \ + pte_offset(pmd_offset(pgd_offset_k(vaddr), (vaddr)), (vaddr)) + +void __init kmap_init(void) +{ + unsigned long kmap_vstart; + + /* cache the first kmap pte */ + kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN); + kmap_pte = kmap_get_fixmap_pte(kmap_vstart); + + kmap_prot = PAGE_KERNEL; +#if 0 + if (boot_cpu_data.x86_capability & X86_FEATURE_PGE) + pgprot_val(kmap_prot) |= _PAGE_GLOBAL; +#endif +} diff -urN 2.2.17pre19/arch/i386/mm/fault.c 2.2.17pre19aa2/arch/i386/mm/fault.c --- 2.2.17pre19/arch/i386/mm/fault.c Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/i386/mm/fault.c Sat Aug 26 17:13:49 2000 @@ -29,13 +29,13 @@ */ int __verify_write(const void * addr, unsigned long size) { - struct vm_area_struct * vma; + struct vm_area_struct * vma, * prev_vma; unsigned long start = (unsigned long) addr; if (!size) return 1; - vma = find_vma(current->mm, start); + vma = find_vma_prev(current->mm, start, &prev_vma); if (!vma) goto bad_area; if (vma->vm_start > start) @@ -75,7 +75,7 @@ check_stack: if (!(vma->vm_flags & VM_GROWSDOWN)) goto bad_area; - if (expand_stack(vma, start) == 0) + if (expand_stack(vma, start, prev_vma) == 0) goto good_area; bad_area: @@ -112,7 +112,7 @@ { struct task_struct *tsk; struct mm_struct *mm; - struct vm_area_struct * vma; + struct vm_area_struct * vma, * prev_vma; unsigned long address; unsigned long page; unsigned long fixup; @@ -133,7 +133,7 @@ down(&mm->mmap_sem); - vma = find_vma(mm, address); + vma = find_vma_prev(mm, address, &prev_vma); if (!vma) goto bad_area; if (vma->vm_start <= address) @@ -150,7 +150,7 @@ if (address + 32 < regs->esp) goto bad_area; } - if (expand_stack(vma, address)) + if (expand_stack(vma, address, prev_vma)) goto bad_area; /* * Ok, we have a good vm_area for this memory access, so diff -urN 2.2.17pre19/arch/i386/mm/init.c 2.2.17pre19aa2/arch/i386/mm/init.c --- 2.2.17pre19/arch/i386/mm/init.c Sat Oct 23 15:31:08 1999 +++ 2.2.17pre19aa2/arch/i386/mm/init.c Sat Aug 26 17:13:52 2000 @@ -2,6 +2,8 @@ * linux/arch/i386/mm/init.c * * Copyright (C) 1995 Linus Torvalds + * + * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999 */ #include @@ -20,6 +22,7 @@ #ifdef CONFIG_BLK_DEV_INITRD #include #endif +#include #include #include @@ -28,6 +31,8 @@ #include #include +static int totalram_pages, totalbig_pages; + extern void show_net_buffers(void); extern unsigned long init_smp_mappings(unsigned long); @@ -148,6 +153,7 @@ { int i,free = 0,total = 0,reserved = 0; int shared = 0, cached = 0; + int bigmem = 0; printk("Mem-info:\n"); show_free_areas(); @@ -155,6 +161,8 @@ i = max_mapnr; while (i-- > 0) { total++; + if (PageBIGMEM(mem_map+i)) + bigmem++; if (PageReserved(mem_map+i)) reserved++; else if (PageSwapCache(mem_map+i)) @@ -165,6 +173,7 @@ shared += atomic_read(&mem_map[i].count) - 1; } printk("%d pages of RAM\n",total); + printk("%d pages of BIGMEM\n",bigmem); printk("%d reserved pages\n",reserved); printk("%d pages shared\n",shared); printk("%d pages swap cached\n",cached); @@ -260,6 +269,24 @@ set_pte_phys (address,phys); } +static void __init relocate_initrd(unsigned long mem_start, + unsigned long end_mem) +{ +#ifdef CONFIG_BLK_DEV_INITRD + unsigned long initrd_size, relocate; + + if (!initrd_start || mem_start > initrd_start) + return; + initrd_size = initrd_end - initrd_start; + relocate = (end_mem - initrd_size) & PAGE_MASK; + if (initrd_start < relocate) { + memmove((char *) relocate, (char *) initrd_start, initrd_size); + initrd_start = 
relocate; + initrd_end = initrd_start + initrd_size; + } +#endif +} + /* * paging_init() sets up the page tables - note that the first 4MB are * already mapped by head.S. @@ -344,7 +371,15 @@ #endif local_flush_tlb(); + /* relocate initrd as soon as we have the paging working */ + relocate_initrd(start_mem, end_mem); + +#ifndef CONFIG_BIGMEM return free_area_init(start_mem, end_mem); +#else + kmap_init(); /* run after fixmap_init */ + return free_area_init(start_mem, bigmem_end + PAGE_OFFSET); +#endif } /* @@ -396,8 +431,18 @@ unsigned long tmp; end_mem &= PAGE_MASK; +#ifdef CONFIG_BIGMEM + bigmem_start = PAGE_ALIGN(bigmem_start); + bigmem_end &= PAGE_MASK; +#endif high_memory = (void *) end_mem; +#ifndef CONFIG_BIGMEM max_mapnr = num_physpages = MAP_NR(end_mem); +#else + max_mapnr = num_physpages = PHYSMAP_NR(bigmem_end); + /* cache the bigmem_mapnr */ + bigmem_mapnr = PHYSMAP_NR(bigmem_start); +#endif /* clear the zero-page */ memset(empty_zero_page, 0, PAGE_SIZE); @@ -452,16 +497,39 @@ #endif free_page(tmp); } - printk("Memory: %luk/%luk available (%dk kernel code, %dk reserved, %dk data, %dk init)\n", +#ifdef CONFIG_BIGMEM + for (tmp = bigmem_start; tmp < bigmem_end; tmp += PAGE_SIZE) { + /* + RMQUEUE_ORDER in page_alloc.c returns PAGE_OFFSET + tmp + which cannot be allowed to be 0 since the callers of + __get_free_pages treat 0 as an allocation failure. To + avoid this possibility, do not allow allocation of the + BIGMEM page which would map to 0. + + Leonard N. Zubkoff, 30 October 1999 + */ + if (tmp + PAGE_OFFSET != 0) { + clear_bit(PG_reserved, &mem_map[PHYSMAP_NR(tmp)].flags); + set_bit(PG_BIGMEM, &mem_map[PHYSMAP_NR(tmp)].flags); + atomic_set(&mem_map[PHYSMAP_NR(tmp)].count, 1); + free_page(tmp + PAGE_OFFSET); + totalbig_pages++; + } + } +#endif + printk("Memory: %luk/%luk available (%dk kernel code, %dk reserved, %dk data, %dk init, %dk bigmem)\n", (unsigned long) nr_free_pages << (PAGE_SHIFT-10), max_mapnr << (PAGE_SHIFT-10), codepages << (PAGE_SHIFT-10), reservedpages << (PAGE_SHIFT-10), datapages << (PAGE_SHIFT-10), - initpages << (PAGE_SHIFT-10)); + initpages << (PAGE_SHIFT-10), + totalbig_pages << (PAGE_SHIFT-10)); if (boot_cpu_data.wp_works_ok < 0) test_wp_bit(); + + totalram_pages = max_mapnr - reservedpages; } void free_initmem(void) @@ -479,22 +547,11 @@ void si_meminfo(struct sysinfo *val) { - int i; - - i = max_mapnr; - val->totalram = 0; + val->totalram = totalram_pages << PAGE_SHIFT; val->sharedram = 0; val->freeram = nr_free_pages << PAGE_SHIFT; val->bufferram = buffermem; - while (i-- > 0) { - if (PageReserved(mem_map+i)) - continue; - val->totalram++; - if (!atomic_read(&mem_map[i].count)) - continue; - val->sharedram += atomic_read(&mem_map[i].count) - 1; - } - val->totalram <<= PAGE_SHIFT; - val->sharedram <<= PAGE_SHIFT; + val->totalbig = totalbig_pages << PAGE_SHIFT; + val->freebig = nr_free_bigpages << PAGE_SHIFT; return; } diff -urN 2.2.17pre19/arch/m68k/mm/init.c 2.2.17pre19aa2/arch/m68k/mm/init.c --- 2.2.17pre19/arch/m68k/mm/init.c Mon Jan 17 16:44:34 2000 +++ 2.2.17pre19aa2/arch/m68k/mm/init.c Sat Aug 26 17:13:49 2000 @@ -490,5 +490,7 @@ } val->totalram <<= PAGE_SHIFT; val->sharedram <<= PAGE_SHIFT; + val->totalbig = 0; + val->freebig = 0; return; } diff -urN 2.2.17pre19/arch/mips/kernel/sysirix.c 2.2.17pre19aa2/arch/mips/kernel/sysirix.c --- 2.2.17pre19/arch/mips/kernel/sysirix.c Mon Jan 17 16:44:34 2000 +++ 2.2.17pre19aa2/arch/mips/kernel/sysirix.c Sat Aug 26 17:13:50 2000 @@ -1984,7 +1984,7 @@ #define ROUND_UP32(x) (((x)+sizeof(u32)-1) & 
~(sizeof(u32)-1)) static int irix_filldir32(void *__buf, const char *name, int namlen, - off_t offset, ino_t ino) + off_t offset, ino_t ino, unsigned int d_type) { struct irix_dirent32 *dirent; struct irix_dirent32_callback *buf = @@ -2097,7 +2097,7 @@ #define ROUND_UP64(x) (((x)+sizeof(u64)-1) & ~(sizeof(u64)-1)) static int irix_filldir64(void * __buf, const char * name, int namlen, - off_t offset, ino_t ino) + off_t offset, ino_t ino, unsigned int d_type) { struct irix_dirent64 *dirent; struct irix_dirent64_callback * buf = diff -urN 2.2.17pre19/arch/mips/mm/init.c 2.2.17pre19aa2/arch/mips/mm/init.c --- 2.2.17pre19/arch/mips/mm/init.c Mon Jan 17 16:44:34 2000 +++ 2.2.17pre19aa2/arch/mips/mm/init.c Sat Aug 26 17:13:49 2000 @@ -380,6 +380,8 @@ } val->totalram <<= PAGE_SHIFT; val->sharedram <<= PAGE_SHIFT; + val->totalbig = 0; + val->freebig = 0; return; } diff -urN 2.2.17pre19/arch/ppc/kernel/misc.S 2.2.17pre19aa2/arch/ppc/kernel/misc.S --- 2.2.17pre19/arch/ppc/kernel/misc.S Tue Aug 22 14:53:36 2000 +++ 2.2.17pre19aa2/arch/ppc/kernel/misc.S Sat Aug 26 17:13:50 2000 @@ -866,7 +866,7 @@ .long sys_swapon .long sys_reboot .long old_readdir - .long sys_mmap /* 90 */ + .long old_mmap /* 90 */ .long sys_munmap .long sys_truncate .long sys_ftruncate @@ -970,4 +970,19 @@ .long sys_ni_syscall /* streams1 */ .long sys_ni_syscall /* streams2 */ .long sys_vfork - .space (NR_syscalls-183)*4 + .long sys_ni_syscall /* 190 getrlimit */ + .long sys_ni_syscall /* 191 unused */ + .long sys_mmap2 /* 192 */ + .long sys_truncate64 /* 193 */ + .long sys_ftruncate64 /* 194 */ + .long sys_stat64 /* 195 */ + .long sys_lstat64 /* 196 */ + .long sys_fstat64 /* 197 */ + .rept 4 + .long sys_ni_syscall + .endr + .long sys_getdents64 /* 202 */ + .long sys_fcntl64 /* 203 */ + .rept NR_syscalls-203 + .long sys_ni_syscall + .endr diff -urN 2.2.17pre19/arch/ppc/kernel/syscalls.c 2.2.17pre19aa2/arch/ppc/kernel/syscalls.c --- 2.2.17pre19/arch/ppc/kernel/syscalls.c Tue Aug 22 14:53:36 2000 +++ 2.2.17pre19aa2/arch/ppc/kernel/syscalls.c Sat Aug 26 17:13:50 2000 @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -192,25 +193,55 @@ return error; } -asmlinkage unsigned long sys_mmap(unsigned long addr, size_t len, - unsigned long prot, unsigned long flags, - unsigned long fd, off_t offset) +/* common code for old and new mmaps */ +static inline long do_mmap2( + unsigned long addr, unsigned long len, + unsigned long prot, unsigned long flags, + unsigned long fd, unsigned long pgoff) { struct file * file = NULL; int ret = -EBADF; down(¤t->mm->mmap_sem); - lock_kernel(); + if (!(flags & MAP_ANONYMOUS)) { - if (fd >= NR_OPEN || !(file = current->files->fd[fd])) + file = fget(fd); + if (!file) goto out; } - + flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE); - ret = do_mmap(file, addr, len, prot, flags, offset); -out: + ret = do_mmap_pgoff(file, addr, len, prot, flags, pgoff); + + if (file) + fput(file); + out: unlock_kernel(); up(¤t->mm->mmap_sem); + return ret; + +} + +asmlinkage long sys_mmap2(unsigned long addr, unsigned long len, + unsigned long prot, unsigned long flags, + unsigned long fd, unsigned long pgoff) +{ + return do_mmap2(addr, len, prot, flags, fd, pgoff); +} + +asmlinkage unsigned long old_mmap(unsigned long addr, size_t len, + unsigned long prot, unsigned long flags, + unsigned long fd, off_t offset) +{ + int ret; + + ret = -EINVAL; + if (offset & ~PAGE_MASK) + goto out; + + ret = do_mmap2(addr, len, prot, flags, fd, offset >> PAGE_SHIFT); + + out: return ret; } diff -urN 
2.2.17pre19/arch/ppc/kernel/time.c 2.2.17pre19aa2/arch/ppc/kernel/time.c --- 2.2.17pre19/arch/ppc/kernel/time.c Tue Aug 22 14:53:36 2000 +++ 2.2.17pre19aa2/arch/ppc/kernel/time.c Sat Aug 26 17:13:48 2000 @@ -147,6 +147,19 @@ hardirq_exit(cpu); } +static inline void +timeval_normalize(struct timeval * tv) +{ + time_t __sec; + + __sec = tv->tv_usec / 1000000; + if (__sec) + { + tv->tv_usec %= 1000000; + tv->tv_sec += __sec; + } +} + /* * This version of gettimeofday has microsecond resolution. */ @@ -161,10 +174,7 @@ #ifndef __SMP__ tv->tv_usec += (decrementer_count - get_dec()) * count_period_num / count_period_den; - if (tv->tv_usec >= 1000000) { - tv->tv_usec -= 1000000; - tv->tv_sec++; - } + timeval_normalize(tv); #endif restore_flags(flags); } diff -urN 2.2.17pre19/arch/ppc/mm/fault.c 2.2.17pre19aa2/arch/ppc/mm/fault.c --- 2.2.17pre19/arch/ppc/mm/fault.c Tue Aug 22 14:53:36 2000 +++ 2.2.17pre19aa2/arch/ppc/mm/fault.c Sat Aug 26 17:13:49 2000 @@ -58,7 +58,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address, unsigned long error_code) { - struct vm_area_struct * vma; + struct vm_area_struct * vma, * prev_vma; struct mm_struct *mm = current->mm; int fault; @@ -92,14 +92,14 @@ } down(&mm->mmap_sem); - vma = find_vma(mm, address); + vma = find_vma_prev(mm, address, &prev_vma); if (!vma) goto bad_area; if (vma->vm_start <= address) goto good_area; if (!(vma->vm_flags & VM_GROWSDOWN)) goto bad_area; - if (expand_stack(vma, address)) + if (expand_stack(vma, address, prev_vma)) goto bad_area; good_area: diff -urN 2.2.17pre19/arch/ppc/mm/init.c 2.2.17pre19aa2/arch/ppc/mm/init.c --- 2.2.17pre19/arch/ppc/mm/init.c Tue Aug 22 14:53:36 2000 +++ 2.2.17pre19aa2/arch/ppc/mm/init.c Sat Aug 26 17:13:49 2000 @@ -329,6 +329,8 @@ } val->totalram <<= PAGE_SHIFT; val->sharedram <<= PAGE_SHIFT; + val->totalbig = 0; + val->freebig = 0; return; } diff -urN 2.2.17pre19/arch/s390/mm/init.c 2.2.17pre19aa2/arch/s390/mm/init.c --- 2.2.17pre19/arch/s390/mm/init.c Tue Jun 13 03:48:12 2000 +++ 2.2.17pre19aa2/arch/s390/mm/init.c Sat Aug 26 17:13:49 2000 @@ -427,5 +427,7 @@ } val->totalram <<= PAGE_SHIFT; val->sharedram <<= PAGE_SHIFT; + val->totalbig = 0; + val->freebig = 0; return; } diff -urN 2.2.17pre19/arch/sparc/kernel/sys_sparc.c 2.2.17pre19aa2/arch/sparc/kernel/sys_sparc.c --- 2.2.17pre19/arch/sparc/kernel/sys_sparc.c Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/sparc/kernel/sys_sparc.c Sat Aug 26 17:13:50 2000 @@ -176,9 +176,9 @@ } /* Linux version of mmap */ -asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len, +asmlinkage unsigned long do_mmap2(unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, - unsigned long off) + unsigned long pgoff) { struct file * file = NULL; unsigned long retval = -EBADF; @@ -211,7 +211,7 @@ goto out_putf; flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE); - retval = do_mmap(file, addr, len, prot, flags, off); + retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff); out_putf: if (file) @@ -220,6 +220,22 @@ unlock_kernel(); up(¤t->mm->mmap_sem); return retval; +} + +asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len, + unsigned long prot, unsigned long flags, unsigned long fd, + unsigned long pgoff) +{ + /* Make sure the shift for mmap2 is constant (12), no matter what PAGE_SIZE + we have. 
*/ + return do_mmap2(addr, len, prot, flags, fd, pgoff >> (PAGE_SHIFT - 12)); +} + +asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len, + unsigned long prot, unsigned long flags, unsigned long fd, + unsigned long off) +{ + return do_mmap2(addr, len, prot, flags, fd, off >> PAGE_SHIFT); } /* we come to here via sys_nis_syscall so it can setup the regs argument */ diff -urN 2.2.17pre19/arch/sparc/kernel/sys_sunos.c 2.2.17pre19aa2/arch/sparc/kernel/sys_sunos.c --- 2.2.17pre19/arch/sparc/kernel/sys_sunos.c Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/sparc/kernel/sys_sunos.c Sat Aug 26 17:13:50 2000 @@ -409,7 +409,7 @@ #define ROUND_UP(x) (((x)+sizeof(long)-1) & ~(sizeof(long)-1)) static int sunos_filldir(void * __buf, const char * name, int namlen, - off_t offset, ino_t ino) + off_t offset, ino_t ino, unsigned int d_type) { struct sunos_dirent * dirent; struct sunos_dirent_callback * buf = (struct sunos_dirent_callback *) __buf; @@ -500,7 +500,7 @@ }; static int sunos_filldirentry(void * __buf, const char * name, int namlen, - off_t offset, ino_t ino) + off_t offset, ino_t ino, unsigned int d_type) { struct sunos_direntry * dirent; struct sunos_direntry_callback * buf = (struct sunos_direntry_callback *) __buf; diff -urN 2.2.17pre19/arch/sparc/kernel/systbls.S 2.2.17pre19aa2/arch/sparc/kernel/systbls.S --- 2.2.17pre19/arch/sparc/kernel/systbls.S Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/sparc/kernel/systbls.S Sat Aug 26 17:13:50 2000 @@ -29,12 +29,12 @@ /*40*/ .long sys_newlstat, sys_dup, sys_pipe, sys_times, sys_nis_syscall /*45*/ .long sys_umount, sys_setgid, sys_getgid, sys_signal, sys_geteuid /*50*/ .long sys_getegid, sys_acct, sys_nis_syscall, sys_nis_syscall, sys_ioctl -/*55*/ .long sys_reboot, sys_lfs_syscall, sys_symlink, sys_readlink, sys_execve -/*60*/ .long sys_umask, sys_chroot, sys_newfstat, sys_lfs_syscall, sys_getpagesize +/*55*/ .long sys_reboot, sys_mmap2, sys_symlink, sys_readlink, sys_execve +/*60*/ .long sys_umask, sys_chroot, sys_newfstat, sys_fstat64, sys_getpagesize /*65*/ .long sys_msync, sys_vfork, sys_pread, sys_pwrite, sys_nis_syscall /*70*/ .long sys_nis_syscall, sys_mmap, sys_nis_syscall, sys_munmap, sys_mprotect -/*75*/ .long sys_nis_syscall, sys_vhangup, sys_lfs_syscall, sys_nis_syscall, sys_getgroups -/*80*/ .long sys_setgroups, sys_getpgrp, sys_nis_syscall, sys_setitimer, sys_lfs_syscall +/*75*/ .long sys_nis_syscall, sys_vhangup, sys_truncate64, sys_nis_syscall, sys_getgroups +/*80*/ .long sys_setgroups, sys_getpgrp, sys_nis_syscall, sys_setitimer, sys_ftruncate64 /*85*/ .long sys_swapon, sys_getitimer, sys_nis_syscall, sys_sethostname, sys_nis_syscall /*90*/ .long sys_dup2, sys_nis_syscall, sys_fcntl, sys_select, sys_nis_syscall /*95*/ .long sys_fsync, sys_setpriority, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall @@ -44,12 +44,12 @@ /*115*/ .long sys_nis_syscall, sys_gettimeofday, sys_getrusage, sys_nis_syscall, sys_getcwd /*120*/ .long sys_readv, sys_writev, sys_settimeofday, sys_fchown, sys_fchmod /*125*/ .long sys_nis_syscall, sys_setreuid, sys_setregid, sys_rename, sys_truncate -/*130*/ .long sys_ftruncate, sys_flock, sys_lfs_syscall, sys_nis_syscall, sys_nis_syscall -/*135*/ .long sys_nis_syscall, sys_mkdir, sys_rmdir, sys_utimes, sys_lfs_syscall -/*140*/ .long sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_getrlimit +/*130*/ .long sys_ftruncate, sys_flock, sys_lstat64, sys_nis_syscall, sys_nis_syscall +/*135*/ .long sys_nis_syscall, sys_mkdir, sys_rmdir, sys_utimes, sys_stat64 +/*140*/ .long 
sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_fcntl64, sys_getrlimit /*145*/ .long sys_setrlimit, sys_nis_syscall, sys_prctl, sys_pciconfig_read, sys_pciconfig_write -/*150*/ .long sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_poll, sys_nis_syscall -/*155*/ .long sys_nis_syscall, sys_nis_syscall, sys_statfs, sys_fstatfs, sys_oldumount +/*150*/ .long sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_poll, sys_getdents64 +/*155*/ .long sys_fcntl64, sys_nis_syscall, sys_statfs, sys_fstatfs, sys_oldumount /*160*/ .long sys_nis_syscall, sys_nis_syscall, sys_getdomainname, sys_setdomainname, sys_nis_syscall /*165*/ .long sys_quotactl, sys_nis_syscall, sys_mount, sys_ustat, sys_nis_syscall /*170*/ .long sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_getdents diff -urN 2.2.17pre19/arch/sparc/mm/init.c 2.2.17pre19aa2/arch/sparc/mm/init.c --- 2.2.17pre19/arch/sparc/mm/init.c Mon Jan 17 16:44:36 2000 +++ 2.2.17pre19aa2/arch/sparc/mm/init.c Sat Aug 26 17:13:49 2000 @@ -380,4 +380,6 @@ } val->totalram <<= PAGE_SHIFT; val->sharedram <<= PAGE_SHIFT; + val->totalbig = 0; + val->freebig = 0; } diff -urN 2.2.17pre19/arch/sparc64/kernel/sparc64_ksyms.c 2.2.17pre19aa2/arch/sparc64/kernel/sparc64_ksyms.c --- 2.2.17pre19/arch/sparc64/kernel/sparc64_ksyms.c Tue Aug 22 14:53:37 2000 +++ 2.2.17pre19aa2/arch/sparc64/kernel/sparc64_ksyms.c Sat Aug 26 17:13:50 2000 @@ -83,6 +83,7 @@ extern int sys32_ioctl(unsigned int fd, unsigned int cmd, u32 arg); extern int (*handle_mathemu)(struct pt_regs *, struct fpustate *); extern void VISenter(void); +extern long sparc32_open(const char * filename, int flags, int mode); extern void bcopy (const char *, char *, int); extern int __ashrdi3(int, int); @@ -270,6 +271,7 @@ EXPORT_SYMBOL(prom_cpu_nodes); EXPORT_SYMBOL(sys_ioctl); EXPORT_SYMBOL(sys32_ioctl); +EXPORT_SYMBOL(sparc32_open); EXPORT_SYMBOL(move_addr_to_kernel); EXPORT_SYMBOL(move_addr_to_user); #endif diff -urN 2.2.17pre19/arch/sparc64/kernel/sys32.S 2.2.17pre19aa2/arch/sparc64/kernel/sys32.S --- 2.2.17pre19/arch/sparc64/kernel/sys32.S Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/sparc64/kernel/sys32.S Sat Aug 26 17:13:50 2000 @@ -60,3 +60,12 @@ sethi %hi(sys_bdflush), %g1 jmpl %g1 + %lo(sys_bdflush), %g0 sra %o1, 0, %o1 + + .align 32 + .globl sys32_mmap2 +sys32_mmap2: + srl %o4, 0, %o4 + sethi %hi(sys_mmap), %g1 + srl %o5, 0, %o5 + jmpl %g1 + %lo(sys_mmap), %g0 + sllx %o5, 12, %o5 diff -urN 2.2.17pre19/arch/sparc64/kernel/sys_sparc32.c 2.2.17pre19aa2/arch/sparc64/kernel/sys_sparc32.c --- 2.2.17pre19/arch/sparc64/kernel/sys_sparc32.c Tue Aug 22 14:53:37 2000 +++ 2.2.17pre19aa2/arch/sparc64/kernel/sys_sparc32.c Sat Aug 26 17:13:51 2000 @@ -597,15 +597,27 @@ old_fs = get_fs(); set_fs (KERNEL_DS); ret = sys_fcntl(fd, cmd, (unsigned long)&f); set_fs (old_fs); + if (ret) return ret; + if (f.l_start >= 0x7fffffffUL || + f.l_len >= 0x7fffffffUL || + f.l_start + f.l_len >= 0x7fffffffUL) + return -EOVERFLOW; if(put_flock(&f, (struct flock32 *)arg)) return -EFAULT; - return ret; + return 0; } default: return sys_fcntl(fd, cmd, (unsigned long)arg); } } +asmlinkage long sys32_fcntl64(unsigned int fd, unsigned int cmd, unsigned long arg) +{ + if (cmd >= F_GETLK64 && cmd <= F_SETLKW64) + return sys_fcntl(fd, cmd + F_GETLK - F_GETLK64, arg); + return sys32_fcntl(fd, cmd, arg); +} + struct dqblk32 { __u32 dqb_bhardlimit; __u32 dqb_bsoftlimit; @@ -717,6 +729,25 @@ return ret; } +extern asmlinkage long sys_truncate(const char * path, unsigned long length); +extern asmlinkage long 
sys_ftruncate(unsigned int fd, unsigned long length); + +asmlinkage int sys32_truncate64(const char * path, unsigned long high, unsigned long low) +{ + if ((int)high < 0) + return -EINVAL; + else + return sys_truncate(path, (high << 32) | low); +} + +asmlinkage int sys32_ftruncate64(unsigned int fd, unsigned long high, unsigned long low) +{ + if ((int)high < 0) + return -EINVAL; + else + return sys_ftruncate(fd, (high << 32) | low); +} + extern asmlinkage int sys_utime(char * filename, struct utimbuf * times); struct utimbuf32 { @@ -914,7 +945,7 @@ }; static int fillonedir(void * __buf, const char * name, int namlen, - off_t offset, ino_t ino) + off_t offset, ino_t ino, unsigned int d_type) { struct readdir_callback32 * buf = (struct readdir_callback32 *) __buf; struct old_linux_dirent32 * dirent; @@ -979,7 +1010,8 @@ int error; }; -static int filldir(void * __buf, const char * name, int namlen, off_t offset, ino_t ino) +static int filldir(void * __buf, const char * name, int namlen, off_t offset, ino_t ino, + unsigned int d_type) { struct linux_dirent32 * dirent; struct getdents_callback32 * buf = (struct getdents_callback32 *) __buf; @@ -4021,4 +4053,37 @@ ret = -EFAULT; return ret; +} + +/* This is just a version for 32-bit applications which does + * not force O_LARGEFILE on. + */ + +asmlinkage long sparc32_open(const char * filename, int flags, int mode) +{ + char * tmp; + int fd, error; + + tmp = getname(filename); + fd = PTR_ERR(tmp); + if (!IS_ERR(tmp)) { + lock_kernel(); + fd = get_unused_fd(); + if (fd >= 0) { + struct file * f = filp_open(tmp, flags, mode); + error = PTR_ERR(f); + if (IS_ERR(f)) + goto out_error; + fd_install(fd, f); + } +out: + unlock_kernel(); + putname(tmp); + } + return fd; + +out_error: + put_unused_fd(fd); + fd = error; + goto out; } diff -urN 2.2.17pre19/arch/sparc64/kernel/sys_sunos32.c 2.2.17pre19aa2/arch/sparc64/kernel/sys_sunos32.c --- 2.2.17pre19/arch/sparc64/kernel/sys_sunos32.c Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/sparc64/kernel/sys_sunos32.c Sat Aug 26 17:13:51 2000 @@ -366,7 +366,7 @@ #define ROUND_UP(x) (((x)+sizeof(s32)-1) & ~(sizeof(s32)-1)) static int sunos_filldir(void * __buf, const char * name, int namlen, - off_t offset, ino_t ino) + off_t offset, ino_t ino, unsigned int d_type) { struct sunos_dirent * dirent; struct sunos_dirent_callback * buf = (struct sunos_dirent_callback *) __buf; @@ -458,7 +458,7 @@ }; static int sunos_filldirentry(void * __buf, const char * name, int namlen, - off_t offset, ino_t ino) + off_t offset, ino_t ino, unsigned int d_type) { struct sunos_direntry * dirent; struct sunos_direntry_callback * buf = (struct sunos_direntry_callback *) __buf; @@ -1296,13 +1296,15 @@ return rval; } +extern asmlinkage long sparc32_open(const char * filename, int flags, int mode); + asmlinkage int sunos_open(u32 filename, int flags, int mode) { int ret; lock_kernel(); current->personality |= PER_BSD; - ret = sys_open ((char *)A(filename), flags, mode); + ret = sparc32_open ((char *)A(filename), flags, mode); unlock_kernel(); return ret; } diff -urN 2.2.17pre19/arch/sparc64/kernel/systbls.S 2.2.17pre19aa2/arch/sparc64/kernel/systbls.S --- 2.2.17pre19/arch/sparc64/kernel/systbls.S Thu May 4 13:00:36 2000 +++ 2.2.17pre19aa2/arch/sparc64/kernel/systbls.S Sat Aug 26 17:13:51 2000 @@ -20,7 +20,7 @@ .globl sys_call_table32 sys_call_table32: /*0*/ .word sys_nis_syscall, sparc_exit, sys_fork, sys_read, sys_write -/*5*/ .word sys_open, sys_close, sys32_wait4, sys_creat, sys_link +/*5*/ .word sparc32_open, sys_close, sys32_wait4, 
sys_creat, sys_link /*10*/ .word sys_unlink, sunos_execv, sys_chdir, sys32_chown16, sys32_mknod /*15*/ .word sys32_chmod, sys32_lchown16, sparc_brk, sys_perfctr, sys32_lseek /*20*/ .word sys_getpid, sys_capget, sys_capset, sys_setuid, sys_getuid @@ -30,12 +30,12 @@ /*40*/ .word sys32_newlstat, sys_dup, sys_pipe, sys32_times, sys_nis_syscall .word sys_umount, sys_setgid, sys_getgid, sys_signal, sys_geteuid /*50*/ .word sys_getegid, sys_acct, sys_nis_syscall, sys_nis_syscall, sys32_ioctl - .word sys_reboot, sys_lfs_syscall, sys_symlink, sys_readlink, sys32_execve -/*60*/ .word sys_umask, sys_chroot, sys32_newfstat, sys_lfs_syscall, sys_getpagesize + .word sys_reboot, sys32_mmap2, sys_symlink, sys_readlink, sys32_execve +/*60*/ .word sys_umask, sys_chroot, sys32_newfstat, sys_fstat64, sys_getpagesize .word sys_msync, sys_vfork, sys32_pread, sys32_pwrite, sys_nis_syscall /*70*/ .word sys_nis_syscall, sys32_mmap, sys_nis_syscall, sys_munmap, sys_mprotect - .word sys_nis_syscall, sys_vhangup, sys_lfs_syscall, sys_nis_syscall, sys32_getgroups -/*80*/ .word sys32_setgroups, sys_getpgrp, sys_nis_syscall, sys32_setitimer, sys_lfs_syscall + .word sys_nis_syscall, sys_vhangup, sys32_truncate64, sys_nis_syscall, sys32_getgroups +/*80*/ .word sys32_setgroups, sys_getpgrp, sys_nis_syscall, sys32_setitimer, sys32_ftruncate64 .word sys_swapon, sys32_getitimer, sys_nis_syscall, sys_sethostname, sys_nis_syscall /*90*/ .word sys_dup2, sys_nis_syscall, sys32_fcntl, sys32_select, sys_nis_syscall .word sys_fsync, sys_setpriority, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall @@ -45,12 +45,12 @@ .word sys_nis_syscall, sys32_gettimeofday, sys32_getrusage, sys_nis_syscall, sys_getcwd /*120*/ .word sys32_readv, sys32_writev, sys32_settimeofday, sys32_fchown16, sys_fchmod .word sys_nis_syscall, sys32_setreuid, sys32_setregid, sys_rename, sys_truncate -/*130*/ .word sys_ftruncate, sys_flock, sys_lfs_syscall, sys_nis_syscall, sys_nis_syscall - .word sys_nis_syscall, sys_mkdir, sys_rmdir, sys32_utimes, sys_lfs_syscall -/*140*/ .word sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys32_getrlimit +/*130*/ .word sys_ftruncate, sys_flock, sys_lstat64, sys_nis_syscall, sys_nis_syscall + .word sys_nis_syscall, sys_mkdir, sys_rmdir, sys32_utimes, sys_stat64 +/*140*/ .word sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys32_fcntl64, sys32_getrlimit .word sys32_setrlimit, sys_nis_syscall, sys32_prctl, sys32_pciconfig_read, sys32_pciconfig_write -/*150*/ .word sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_poll, sys_nis_syscall - .word sys_nis_syscall, sys_nis_syscall, sys32_statfs, sys32_fstatfs, sys_oldumount +/*150*/ .word sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_poll, sys_getdents64 + .word sys32_fcntl64, sys_nis_syscall, sys32_statfs, sys32_fstatfs, sys_oldumount /*160*/ .word sys_nis_syscall, sys_nis_syscall, sys_getdomainname, sys_setdomainname, sys_nis_syscall .word sys32_quotactl, sys_nis_syscall, sys32_mount, sys_ustat, sys_nis_syscall /*170*/ .word sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall, sys32_getdents @@ -108,7 +108,7 @@ .word sys_socketpair, sys_mkdir, sys_rmdir, sys_utimes, sys_nis_syscall /*140*/ .word sys_nis_syscall, sys_getpeername, sys_nis_syscall, sys_nis_syscall, sys_getrlimit .word sys_setrlimit, sys_nis_syscall, sys_prctl, sys_pciconfig_read, sys_pciconfig_write -/*150*/ .word sys_getsockname, sys_nis_syscall, sys_nis_syscall, sys_poll, sys_nis_syscall +/*150*/ .word sys_getsockname, sys_nis_syscall, sys_nis_syscall, sys_poll, 
sys_getdents64 .word sys_nis_syscall, sys_nis_syscall, sys_statfs, sys_fstatfs, sys_oldumount /*160*/ .word sys_nis_syscall, sys_nis_syscall, sys_getdomainname, sys_setdomainname, sys_utrap_install .word sys_quotactl, sys_nis_syscall, sys_mount, sys_ustat, sys_nis_syscall diff -urN 2.2.17pre19/arch/sparc64/mm/init.c 2.2.17pre19aa2/arch/sparc64/mm/init.c --- 2.2.17pre19/arch/sparc64/mm/init.c Thu May 4 13:00:37 2000 +++ 2.2.17pre19aa2/arch/sparc64/mm/init.c Sat Aug 26 17:13:49 2000 @@ -1756,4 +1756,6 @@ } val->totalram <<= PAGE_SHIFT; val->sharedram <<= PAGE_SHIFT; + val->totalbig = 0; + val->freebig = 0; } diff -urN 2.2.17pre19/arch/sparc64/solaris/fs.c 2.2.17pre19aa2/arch/sparc64/solaris/fs.c --- 2.2.17pre19/arch/sparc64/solaris/fs.c Mon Jan 17 16:44:36 2000 +++ 2.2.17pre19aa2/arch/sparc64/solaris/fs.c Sat Aug 26 17:13:51 2000 @@ -572,20 +572,20 @@ return error; } +extern asmlinkage long sparc32_open(const char * filename, int flags, int mode); + asmlinkage int solaris_open(u32 filename, int flags, u32 mode) { - int (*sys_open)(const char *,int,int) = - (int (*)(const char *,int,int))SYS(open); int fl = flags & 0xf; -/* if (flags & 0x2000) - allow LFS */ + if (flags & 0x2000) fl |= O_LARGEFILE; if (flags & 0x8050) fl |= O_SYNC; if (flags & 0x80) fl |= O_NONBLOCK; if (flags & 0x100) fl |= O_CREAT; if (flags & 0x200) fl |= O_TRUNC; if (flags & 0x400) fl |= O_EXCL; if (flags & 0x800) fl |= O_NOCTTY; - return sys_open((const char *)A(filename), fl, mode); + return sparc32_open((const char *)A(filename), fl, mode); } #define SOL_F_SETLK 6 diff -urN 2.2.17pre19/drivers/block/Config.in 2.2.17pre19aa2/drivers/block/Config.in --- 2.2.17pre19/drivers/block/Config.in Tue Jun 13 03:48:12 2000 +++ 2.2.17pre19aa2/drivers/block/Config.in Sat Aug 26 17:13:50 2000 @@ -96,6 +96,10 @@ comment 'Additional Block Devices' +tristate 'Logical volume manager (LVM) support' CONFIG_BLK_DEV_LVM N +if [ "$CONFIG_BLK_DEV_LVM" != "n" ]; then + bool ' LVM information in proc filesystem' CONFIG_LVM_PROC_FS Y +fi tristate 'Loopback device support' CONFIG_BLK_DEV_LOOP if [ "$CONFIG_NET" = "y" ]; then tristate 'Network block device support' CONFIG_BLK_DEV_NBD diff -urN 2.2.17pre19/drivers/block/Makefile 2.2.17pre19aa2/drivers/block/Makefile --- 2.2.17pre19/drivers/block/Makefile Thu May 4 13:00:37 2000 +++ 2.2.17pre19aa2/drivers/block/Makefile Sat Aug 26 17:13:50 2000 @@ -254,6 +254,14 @@ endif endif +ifeq ($(CONFIG_BLK_DEV_LVM),y) +L_OBJS += lvm.o lvm-snap.o +else + ifeq ($(CONFIG_BLK_DEV_LVM),m) + M_OBJS += lvm-mod.o + endif +endif + ifeq ($(CONFIG_BLK_DEV_MD),y) LX_OBJS += md.o @@ -312,3 +320,6 @@ ide-mod.o: ide.o $(IDE_OBJS) $(LD) $(LD_RFLAG) -r -o $@ ide.o $(IDE_OBJS) + +lvm-mod.o: lvm.o lvm-snap.o + $(LD) -r -o $@ lvm.o lvm-snap.o diff -urN 2.2.17pre19/drivers/block/README.lvm 2.2.17pre19aa2/drivers/block/README.lvm --- 2.2.17pre19/drivers/block/README.lvm Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/drivers/block/README.lvm Sat Aug 26 17:13:50 2000 @@ -0,0 +1,8 @@ + +This is the Logical Volume Manager driver for Linux, + +Tools, library that manage logical volumes can be found +at . + +There you can obtain actual driver versions too. 
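[Editor's note: the genhd.c and ll_rw_blk.c hunks that follow hook LVM into the generic block layer through function pointers (lvm_hd_name_ptr, lvm_map_ptr) rather than direct calls, so the driver can also be built and loaded as a module. A minimal kernel-context sketch of the pattern, with hypothetical names:

	/* in the always-built-in core: the hook starts out empty */
	int (*remap_hook)(struct buffer_head *bh, int rw) = NULL;

	int core_submit(struct buffer_head *bh, int rw)
	{
		if (remap_hook)			/* driver registered itself */
			return remap_hook(bh, rw);
		return -ENXIO;			/* driver absent */
	}

	/* the module stores its own function in remap_hook from
	   init_module() and resets it to NULL in cleanup_module() */
]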
+ diff -urN 2.2.17pre19/drivers/block/genhd.c 2.2.17pre19aa2/drivers/block/genhd.c --- 2.2.17pre19/drivers/block/genhd.c Tue Aug 22 14:53:39 2000 +++ 2.2.17pre19aa2/drivers/block/genhd.c Sat Aug 26 17:13:50 2000 @@ -50,6 +50,11 @@ le32_to_cpu(__a); \ }) +#if defined CONFIG_BLK_DEV_LVM || defined CONFIG_BLK_DEV_LVM_MODULE +#include +void ( *lvm_hd_name_ptr) ( char *, int) = NULL; +#endif + struct gendisk *gendisk_head = NULL; static int current_minor = 0; @@ -104,6 +109,13 @@ * MD devices are named md0, md1, ... md15, fix it up here. */ switch (hd->major) { +#if defined CONFIG_BLK_DEV_LVM || defined CONFIG_BLK_DEV_LVM_MODULE + case LVM_BLK_MAJOR: + *buf = 0; + if ( lvm_hd_name_ptr != NULL) + ( lvm_hd_name_ptr) ( buf, minor); + return buf; +#endif case IDE5_MAJOR: unit += 2; case IDE4_MAJOR: diff -urN 2.2.17pre19/drivers/block/ll_rw_blk.c 2.2.17pre19aa2/drivers/block/ll_rw_blk.c --- 2.2.17pre19/drivers/block/ll_rw_blk.c Tue Jun 13 03:48:12 2000 +++ 2.2.17pre19aa2/drivers/block/ll_rw_blk.c Sat Aug 26 17:13:50 2000 @@ -26,6 +26,14 @@ #include +#if defined CONFIG_BLK_DEV_LVM || defined CONFIG_BLK_DEV_LVM_MODULE +#include + /* function pointer to the LVM driver remapping function + which will be setup during driver/module init; neccessary + to be able to load LVM as a module */ +int (*lvm_map_ptr) (struct buffer_head *, int) = NULL; +#endif + /* * The request-struct contains all necessary data * to load a nr of sectors into memory @@ -312,7 +320,7 @@ output.queue_ID = elevator->queue_ID; output.read_latency = elevator->read_latency; output.write_latency = elevator->write_latency; - output.max_bomb_segments = elevator->max_bomb_segments; + output.max_bomb_segments = 0; ret = -EFAULT; if (copy_to_user(arg, &output, sizeof(blkelv_ioctl_arg_t))) @@ -336,12 +344,9 @@ goto out; if (input.write_latency < 0) goto out; - if (input.max_bomb_segments <= 0) - goto out; elevator->read_latency = input.read_latency; elevator->write_latency = input.write_latency; - elevator->max_bomb_segments = input.max_bomb_segments; ret = 0; out: @@ -362,26 +367,15 @@ return -EINVAL; } -static inline int seek_to_not_starving_chunk(struct request ** req, int * lat) +static inline struct request * seek_to_not_starving_chunk(struct request * req) { - struct request * tmp = *req; - int found = 0, pos = 0; - int last_pos = 0, __lat = *lat; + struct request * tmp = req; - do { - if (tmp->elevator_latency <= 0) - { - *req = tmp; - found = 1; - last_pos = pos; - if (last_pos >= __lat) - break; - } - pos += tmp->nr_segments; - } while ((tmp = tmp->next)); - *lat -= last_pos; + while ((tmp = tmp->next)) + if ((tmp->cmd != READ && tmp->cmd != WRITE) || !tmp->elevator_latency) + req = tmp; - return found; + return req; } #define CASE_COALESCE_BUT_FIRST_REQUEST_MAYBE_BUSY \ @@ -426,51 +420,18 @@ #define elevator_starve_rest_of_queue(req) \ do { \ struct request * tmp = (req); \ - for ((tmp) = (tmp)->next; (tmp); (tmp) = (tmp)->next) \ - (tmp)->elevator_latency--; \ + for ((tmp) = (tmp)->next; (tmp); (tmp) = (tmp)->next) { \ + if ((tmp)->cmd != READ && (tmp)->cmd != WRITE) \ + continue; \ + if (--(tmp)->elevator_latency < 0) \ + panic("elevator_starve_rest_of_queue"); \ + } \ } while (0) static inline void elevator_queue(struct request * req, - struct request * tmp, - int latency, - struct blk_dev_struct * dev, - struct request ** queue_head) -{ - struct request * __tmp; - int starving, __latency; - - starving = seek_to_not_starving_chunk(&tmp, &latency); - __tmp = tmp; - __latency = latency; - - for (;; tmp = tmp->next) - { - if 
((latency -= tmp->nr_segments) <= 0) - { - tmp = __tmp; - latency = __latency - tmp->nr_segments; - - if (starving) - break; - - switch (MAJOR(req->rq_dev)) - { - CASE_COALESCE_BUT_FIRST_REQUEST_MAYBE_BUSY - if (tmp == dev->current_request) - default: - goto link; - CASE_COALESCE_ALSO_FIRST_REQUEST - } - - latency += tmp->nr_segments; - req->next = tmp; - *queue_head = req; - goto after_link; - } - - if (!tmp->next) - break; - + struct request * tmp) +{ + for (tmp = seek_to_not_starving_chunk(tmp); tmp->next; tmp = tmp->next) { { const int after_current = IN_ORDER(tmp,req); const int before_next = IN_ORDER(req,tmp->next); @@ -485,13 +446,9 @@ } } - link: req->next = tmp->next; tmp->next = req; - after_link: - req->elevator_latency = latency; - elevator_starve_rest_of_queue(req); } @@ -513,7 +470,6 @@ short disk_index; unsigned long flags; int queue_new_request = 0; - int latency; switch (major) { case DAC960_MAJOR+0: @@ -544,7 +500,7 @@ if (disk_index >= 0 && disk_index < 4) drive_stat_acct(req->cmd, req->nr_sectors, disk_index); - latency = get_request_latency(&dev->elevator, req->cmd); + req->elevator_latency = get_request_latency(&dev->elevator, req->cmd); /* * We use the goto to reduce locking complexity @@ -556,13 +512,12 @@ mark_buffer_clean(req->bh); if (!(tmp = *current_request)) { req->next = NULL; - req->elevator_latency = latency; *current_request = req; if (dev->current_request != &dev->plug) queue_new_request = 1; goto out; } - elevator_queue(req, tmp, latency, dev, current_request); + elevator_queue(req, tmp); /* for SCSI devices, call request_fn unconditionally */ if (scsi_blk_major(major) || @@ -610,28 +565,13 @@ wake_up (&wait_for_request); } -#define read_pendings(req) \ -({ \ - int __ret = 0; \ - struct request * tmp = (req); \ - do { \ - if (tmp->cmd == READ) \ - { \ - __ret = 1; \ - break; \ - } \ - tmp = tmp->next; \ - } while (tmp); \ - __ret; \ -}) - void make_request(int major, int rw, struct buffer_head * bh) { unsigned int sector, count; struct request * req, * prev; int rw_ahead, max_req, max_sectors, max_segments; unsigned long flags; - int latency, starving; + int back, front; count = bh->b_size >> 9; sector = bh->b_rsector; @@ -708,8 +648,6 @@ max_sectors = get_max_sectors(bh->b_rdev); max_segments = get_max_segments(bh->b_rdev); - latency = get_request_latency(&blk_dev[major].elevator, rw); - /* * Now we acquire the request spinlock, we have to be mega careful * not to schedule or do something nonatomic @@ -719,6 +657,9 @@ if (!req) { /* MD and loop can't handle plugging without deadlocking */ if (major != MD_MAJOR && major != LOOP_MAJOR && +#if defined CONFIG_BLK_DEV_LVM || defined CONFIG_BLK_DEV_LVM_MODULE + major != LVM_BLK_MAJOR && +#endif major != DDV_MAJOR && major != NBD_MAJOR) plug_device(blk_dev + major); /* is atomic */ } else switch (major) { @@ -733,33 +674,31 @@ * entry may be busy being processed and we thus can't change it. 
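* [Editor's note on the surrounding elevator rework: each queued READ/WRITE request now carries its remaining reordering budget in req->elevator_latency, set once from get_request_latency() when the request is created; every request inserted or merged ahead of it decrements that budget through elevator_starve_rest_of_queue(), and seek_to_not_starving_chunk() refuses to pass a request whose budget has reached zero, which bounds how long any request can be starved.]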
*/ if (req == blk_dev[major].current_request) - { if (!(req = req->next)) break; - latency -= req->nr_segments; - } /* fall through */ CASE_COALESCE_ALSO_FIRST_REQUEST - /* avoid write-bombs to not hurt iteractiveness of reads */ - if (rw != READ && read_pendings(req)) - max_segments = blk_dev[major].elevator.max_bomb_segments; - - starving = seek_to_not_starving_chunk(&req, &latency); + req = seek_to_not_starving_chunk(req); prev = NULL; + back = front = 0; do { - if (req->sem) - continue; if (req->cmd != rw) continue; + if (req->rq_dev != bh->b_rdev) + continue; + if (req->sector + req->nr_sectors == sector) + back = 1; + else if (req->sector - count == sector) + front = 1; + if (req->nr_sectors + count > max_sectors) continue; - if (req->rq_dev != bh->b_rdev) + if (req->sem) continue; + /* Can we add it to the end of this request? */ - if (req->sector + req->nr_sectors == sector) { - if (latency - req->nr_segments < 0) - break; + if (back) { if (req->bhtail->b_data + req->bhtail->b_size != bh->b_data) { if (req->nr_segments < max_segments) @@ -770,16 +709,18 @@ req->bhtail = bh; req->nr_sectors += count; - /* latency stuff */ - if ((latency -= req->nr_segments) < req->elevator_latency) - req->elevator_latency = latency; elevator_starve_rest_of_queue(req); /* Can we now merge this req with the next? */ attempt_merge(req, max_sectors, max_segments); /* or to the beginning? */ - } else if (req->sector - count == sector) { - if (!prev && starving) + } else if (front) { + /* + * Check that we didn't seek on a starving request, + * that could happen only at the first pass, thus + * do that only if prev is NULL. + */ + if (!prev && ((req->cmd != READ && req->cmd != WRITE) || !req->elevator_latency)) break; if (bh->b_data + bh->b_size != req->bh->b_data) { @@ -795,8 +736,8 @@ req->nr_sectors += count; /* latency stuff */ - if (latency < --req->elevator_latency) - req->elevator_latency = latency; + --req->elevator_latency; + elevator_starve_rest_of_queue(req); if (prev) @@ -809,7 +750,7 @@ return; } while (prev = req, - (latency -= req->nr_segments) >= 0 && (req = req->next) != NULL); + !front && !back && (req = req->next) != NULL); } /* find an unused request. */ @@ -886,13 +827,34 @@ correct_size, bh[i]->b_size); goto sorry; } - - /* Md remaps blocks now */ + /* LVM and MD remap blocks now */ +#if defined CONFIG_BLK_DEV_LVM || defined CONFIG_BLK_DEV_LVM_MODULE + major = MAJOR(bh[i]->b_dev); + if (major == LVM_BLK_MAJOR) { + if (lvm_map_ptr == NULL) { + printk(KERN_ERR + "Bad lvm_map_ptr in ll_rw_block\n"); + goto sorry; + } + if ((lvm_map_ptr) (bh[i], rw) != 0) { + printk(KERN_ERR + "Bad lvm_map in ll_rw_block\n"); + goto sorry; + } + /* remap major too ... 
*/ + major = MAJOR(bh[i]->b_rdev); + } else { + bh[i]->b_rdev = bh[i]->b_dev; + bh[i]->b_rsector = bh[i]->b_blocknr * (bh[i]->b_size >> 9); + } +#else bh[i]->b_rdev = bh[i]->b_dev; bh[i]->b_rsector=bh[i]->b_blocknr*(bh[i]->b_size >> 9); +#endif #ifdef CONFIG_BLK_DEV_MD if (major==MD_MAJOR && - md_map (MINOR(bh[i]->b_dev), &bh[i]->b_rdev, + /* changed v to allow LVM to remap */ + md_map (MINOR(bh[i]->b_rdev), &bh[i]->b_rdev, &bh[i]->b_rsector, bh[i]->b_size >> 9)) { printk (KERN_ERR "Bad md_map in ll_rw_block\n"); @@ -911,8 +873,10 @@ if (bh[i]) { set_bit(BH_Req, &bh[i]->b_state); #ifdef CONFIG_BLK_DEV_MD - if (MAJOR(bh[i]->b_dev) == MD_MAJOR) { - md_make_request(MINOR (bh[i]->b_dev), rw, bh[i]); + /* changed v to allow LVM to remap */ + if (MAJOR(bh[i]->b_rdev) == MD_MAJOR) { + /* remap device for MD too v */ + md_make_request(MINOR (bh[i]->b_rdev), rw, bh[i]); continue; } #endif @@ -1097,6 +1061,9 @@ #ifdef CONFIG_SJCD sjcd_init(); #endif CONFIG_SJCD +#ifdef CONFIG_BLK_DEV_LVM + lvm_init(); +#endif #ifdef CONFIG_BLK_DEV_MD md_init(); #endif CONFIG_BLK_DEV_MD diff -urN 2.2.17pre19/drivers/block/loop.c 2.2.17pre19aa2/drivers/block/loop.c --- 2.2.17pre19/drivers/block/loop.c Sun Apr 2 21:07:48 2000 +++ 2.2.17pre19aa2/drivers/block/loop.c Sat Aug 26 17:13:51 2000 @@ -143,12 +143,12 @@ int size; if (S_ISREG(lo->lo_dentry->d_inode->i_mode)) - size = (lo->lo_dentry->d_inode->i_size - lo->lo_offset) / BLOCK_SIZE; + size = (lo->lo_dentry->d_inode->i_size - lo->lo_offset) >> BLOCK_SIZE_BITS; else { kdev_t lodev = lo->lo_device; if (blk_size[MAJOR(lodev)]) size = blk_size[MAJOR(lodev)][MINOR(lodev)] - - lo->lo_offset / BLOCK_SIZE; + (lo->lo_offset >> BLOCK_SIZE_BITS); else size = MAX_DISK_SIZE; } diff -urN 2.2.17pre19/drivers/block/lvm-snap.c 2.2.17pre19aa2/drivers/block/lvm-snap.c --- 2.2.17pre19/drivers/block/lvm-snap.c Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/drivers/block/lvm-snap.c Sat Aug 26 17:13:50 2000 @@ -0,0 +1,414 @@ +/* linux/drivers/block/lvm-snap.c + + Copyright (C) 2000 Andrea Arcangeli SuSE + + LVM snapshotting */ + +#include +#include +#include +#include +#include + + +extern const char *const lvm_name; +extern int lvm_blocksizes[]; + +void lvm_snapshot_release(lv_t *); + +#define hashfn(dev,block,mask,chunk_size) \ + ((HASHDEV(dev)^((block)/(chunk_size))) & (mask)) + +static inline lv_block_exception_t * +lvm_find_exception_table(kdev_t org_dev, unsigned long org_start, lv_t * lv) +{ + struct list_head * hash_table = lv->lv_snapshot_hash_table, * next; + unsigned long mask = lv->lv_snapshot_hash_mask; + int chunk_size = lv->lv_chunk_size; + lv_block_exception_t * ret; + int i = 0; + + hash_table = &hash_table[hashfn(org_dev, org_start, mask, chunk_size)]; + ret = NULL; + for (next = hash_table->next; next != hash_table; next = next->next) + { + lv_block_exception_t * exception; + + exception = list_entry(next, lv_block_exception_t, hash); + if (exception->rsector_org == org_start && + exception->rdev_org == org_dev) + { + if (i) + { + /* fun, isn't it? 
:) */ + list_del(next); + list_add(next, hash_table); + } + ret = exception; + break; + } + i++; + } + return ret; +} + +static inline void lvm_hash_link(lv_block_exception_t * exception, + kdev_t org_dev, unsigned long org_start, + lv_t * lv) +{ + struct list_head * hash_table = lv->lv_snapshot_hash_table; + unsigned long mask = lv->lv_snapshot_hash_mask; + int chunk_size = lv->lv_chunk_size; + + hash_table = &hash_table[hashfn(org_dev, org_start, mask, chunk_size)]; + list_add(&exception->hash, hash_table); +} + +int lvm_snapshot_remap_block(kdev_t * org_dev, unsigned long * org_sector, + unsigned long pe_start, lv_t * lv) +{ + int ret; + unsigned long pe_off, pe_adjustment, __org_start; + kdev_t __org_dev; + int chunk_size = lv->lv_chunk_size; + lv_block_exception_t * exception; + + pe_off = pe_start % chunk_size; + pe_adjustment = (*org_sector-pe_off) % chunk_size; + __org_start = *org_sector - pe_adjustment; + __org_dev = *org_dev; + + ret = 0; + exception = lvm_find_exception_table(__org_dev, __org_start, lv); + if (exception) + { + *org_dev = exception->rdev_new; + *org_sector = exception->rsector_new + pe_adjustment; + ret = 1; + } + return ret; +} + +static void lvm_drop_snapshot(lv_t * lv_snap, const char * reason) +{ + kdev_t last_dev; + int i; + + /* no exception storage space available for this snapshot + or error on this snapshot --> release it */ + invalidate_buffers(lv_snap->lv_dev); + + for (i = last_dev = 0; i < lv_snap->lv_remap_ptr; i++) { + if ( lv_snap->lv_block_exception[i].rdev_new != last_dev) { + last_dev = lv_snap->lv_block_exception[i].rdev_new; + invalidate_buffers(last_dev); + } + } + + lvm_snapshot_release(lv_snap); + + printk(KERN_INFO + "%s -- giving up to snapshot %s on %s due %s\n", + lvm_name, lv_snap->lv_snapshot_org->lv_name, lv_snap->lv_name, + reason); +} + +static inline void lvm_snapshot_prepare_blocks(unsigned long * blocks, + unsigned long start, + int nr_sectors, + int blocksize) +{ + int i, sectors_per_block, nr_blocks; + + sectors_per_block = blocksize >> 9; + nr_blocks = nr_sectors / sectors_per_block; + start /= sectors_per_block; + + for (i = 0; i < nr_blocks; i++) + blocks[i] = start++; +} + +static inline int get_blksize(kdev_t dev) +{ + int correct_size = BLOCK_SIZE, i, major; + + major = MAJOR(dev); + if (blksize_size[major]) + { + i = blksize_size[major][MINOR(dev)]; + if (i) + correct_size = i; + } + return correct_size; +} + +#ifdef DEBUG_SNAPSHOT +static inline void invalidate_snap_cache(unsigned long start, unsigned long nr, + kdev_t dev) +{ + struct buffer_head * bh; + int sectors_per_block, i, blksize, minor; + + minor = MINOR(dev); + blksize = lvm_blocksizes[minor]; + sectors_per_block = blksize >> 9; + nr /= sectors_per_block; + start /= sectors_per_block; + + for (i = 0; i < nr; i++) + { + bh = get_hash_table(dev, start++, blksize); + if (bh) + bforget(bh); + } +} +#endif + +/* + * copy on write handler for one snapshot logical volume + * + * read the original blocks and store it/them on the new one(s). + * if there is no exception storage space free any longer --> release snapshot. + * + * this routine gets called for each _first_ write to a physical chunk. 
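+ * [Editor's note, a worked example of the chunk arithmetic below with assumed numbers: chunk_size = 128 sectors, org_pe_start = 10, org_phys_sector = 300. Then pe_off = 10 % 128 = 10, (300 - 10) % 128 = 34, so org_start = 300 - 34 = 266 = 10 + 2*128; i.e. chunk boundaries are aligned to the start of the physical extent rather than to sector 0 of the device.]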
+ */ +int lvm_snapshot_COW(kdev_t org_phys_dev, + unsigned long org_phys_sector, + unsigned long org_pe_start, + unsigned long org_virt_sector, + lv_t * lv_snap) +{ + const char * reason; + unsigned long org_start, snap_start, snap_phys_dev, virt_start, pe_off; + int idx = lv_snap->lv_remap_ptr, chunk_size = lv_snap->lv_chunk_size; + struct kiobuf * iobuf; + unsigned long blocks[KIO_MAX_SECTORS]; + int blksize_snap, blksize_org, min_blksize, max_blksize; + int max_sectors, nr_sectors; + + /* check if we are out of snapshot space */ + if (idx >= lv_snap->lv_remap_end) + goto fail_out_of_space; + + /* calculate physical boundaries of source chunk */ + pe_off = org_pe_start % chunk_size; + org_start = org_phys_sector - ((org_phys_sector-pe_off) % chunk_size); + virt_start = org_virt_sector - (org_phys_sector - org_start); + + /* calculate physical boundaries of destination chunk */ + snap_phys_dev = lv_snap->lv_block_exception[idx].rdev_new; + snap_start = lv_snap->lv_block_exception[idx].rsector_new; + +#ifdef DEBUG_SNAPSHOT + printk(KERN_INFO + "%s -- COW: " + "org %02d:%02d faulting %lu start %lu, " + "snap %02d:%02d start %lu, " + "size %d, pe_start %lu pe_off %lu, virt_sec %lu\n", + lvm_name, + MAJOR(org_phys_dev), MINOR(org_phys_dev), org_phys_sector, + org_start, + MAJOR(snap_phys_dev), MINOR(snap_phys_dev), snap_start, + chunk_size, + org_pe_start, pe_off, + org_virt_sector); +#endif + + iobuf = lv_snap->lv_iobuf; + + blksize_org = get_blksize(org_phys_dev); + blksize_snap = get_blksize(snap_phys_dev); + max_blksize = max(blksize_org, blksize_snap); + min_blksize = min(blksize_org, blksize_snap); + max_sectors = KIO_MAX_SECTORS * (min_blksize>>9); + + if (chunk_size % (max_blksize>>9)) + goto fail_blksize; + + while (chunk_size) + { + nr_sectors = min(chunk_size, max_sectors); + chunk_size -= nr_sectors; + + iobuf->length = nr_sectors << 9; + + lvm_snapshot_prepare_blocks(blocks, org_start, + nr_sectors, blksize_org); + if (brw_kiovec(READ, 1, &iobuf, org_phys_dev, + blocks, blksize_org) != (nr_sectors<<9)) + goto fail_raw_read; + + lvm_snapshot_prepare_blocks(blocks, snap_start, + nr_sectors, blksize_snap); + if (brw_kiovec(WRITE, 1, &iobuf, snap_phys_dev, + blocks, blksize_snap) != (nr_sectors<<9)) + goto fail_raw_write; + } + +#ifdef DEBUG_SNAPSHOT + /* invalidate the logical snapshot buffer cache */ + invalidate_snap_cache(virt_start, lv_snap->lv_chunk_size, + lv_snap->lv_dev); +#endif + + /* the original chunk is now stored on the snapshot volume + so update the exception table */ + lv_snap->lv_block_exception[idx].rdev_org = org_phys_dev; + lv_snap->lv_block_exception[idx].rsector_org = org_start; + lvm_hash_link(lv_snap->lv_block_exception + idx, + org_phys_dev, org_start, lv_snap); + lv_snap->lv_remap_ptr = idx + 1; + return 0; + + /* slow path */ + out: + lvm_drop_snapshot(lv_snap, reason); + return 1; + + fail_out_of_space: + reason = "out of space"; + goto out; + fail_raw_read: + reason = "read error"; + goto out; + fail_raw_write: + reason = "write error"; + goto out; + fail_blksize: + reason = "blocksize error"; + goto out; +} + +static int lvm_snapshot_alloc_iobuf_pages(struct kiobuf * iobuf, int sectors) +{ + int bytes, nr_pages, err, i; + + bytes = sectors << 9; + nr_pages = (bytes + ~PAGE_MASK) >> PAGE_SHIFT; + err = expand_kiobuf(iobuf, nr_pages); + if (err) + goto out; + + err = -ENOMEM; + iobuf->locked = 1; + iobuf->nr_pages = 0; + for (i = 0; i < nr_pages; i++) + { + struct page * page; + +#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,3,27) + page = 
alloc_page(GFP_KERNEL); + if (!page) + goto out; +#else + { + unsigned long addr = __get_free_page(GFP_USER); + if (!addr) + goto out; + iobuf->pagelist[i] = addr; + page = mem_map + MAP_NR(addr); + } +#endif + + iobuf->maplist[i] = page; + /* the only point to lock the page here is to be allowed + to share unmap_kiobuf() in the fail-path */ +#ifndef LockPage +#define LockPage(map) set_bit(PG_locked, &(map)->flags) +#endif + LockPage(page); + iobuf->nr_pages++; + } + iobuf->offset = 0; + + err = 0; + out: + return err; +} + +static int calc_max_buckets(void) +{ + unsigned long mem; + + mem = num_physpages << PAGE_SHIFT; + mem /= 100; + mem *= 2; + mem /= sizeof(struct list_head); + + return mem; +} + +static int lvm_snapshot_alloc_hash_table(lv_t * lv) +{ + int err; + unsigned long buckets, max_buckets, size; + struct list_head * hash; + + buckets = lv->lv_remap_end; + max_buckets = calc_max_buckets(); + buckets = min(buckets, max_buckets); + while (buckets & (buckets-1)) + buckets &= (buckets-1); + + size = buckets * sizeof(struct list_head); + + err = -ENOMEM; + hash = vmalloc(size); + lv->lv_snapshot_hash_table = hash; + + if (!hash) + goto out; + + lv->lv_snapshot_hash_mask = buckets-1; + while (buckets--) + INIT_LIST_HEAD(hash+buckets); + err = 0; + out: + return err; +} + +int lvm_snapshot_alloc(lv_t * lv_snap) +{ + int err, blocksize, max_sectors; + + err = alloc_kiovec(1, &lv_snap->lv_iobuf); + if (err) + goto out; + + blocksize = lvm_blocksizes[MINOR(lv_snap->lv_dev)]; + max_sectors = KIO_MAX_SECTORS << (PAGE_SHIFT-9); + + err = lvm_snapshot_alloc_iobuf_pages(lv_snap->lv_iobuf, max_sectors); + if (err) + goto out_free_kiovec; + + err = lvm_snapshot_alloc_hash_table(lv_snap); + if (err) + goto out_free_kiovec; + out: + return err; + + out_free_kiovec: + unmap_kiobuf(lv_snap->lv_iobuf); + free_kiovec(1, &lv_snap->lv_iobuf); + goto out; +} + +void lvm_snapshot_release(lv_t * lv) +{ + if (lv->lv_block_exception) + { + vfree(lv->lv_block_exception); + lv->lv_block_exception = NULL; + } + if (lv->lv_snapshot_hash_table) + { + vfree(lv->lv_snapshot_hash_table); + lv->lv_snapshot_hash_table = NULL; + } + if (lv->lv_iobuf) + { + free_kiovec(1, &lv->lv_iobuf); + lv->lv_iobuf = NULL; + } +} diff -urN 2.2.17pre19/drivers/block/lvm.c 2.2.17pre19aa2/drivers/block/lvm.c --- 2.2.17pre19/drivers/block/lvm.c Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/drivers/block/lvm.c Sat Aug 26 17:13:50 2000 @@ -0,0 +1,2576 @@ +/* + * kernel/lvm.c + * + * Copyright (C) 1997 - 2000 Heinz Mauelshagen, Germany + * + * February-November 1997 + * April-May,July-August,November 1998 + * January-March,May,July,September,October 1999 + * + * + * LVM driver is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * LVM driver is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU CC; see the file COPYING. If not, write to + * the Free Software Foundation, 59 Temple Place - Suite 330, + * Boston, MA 02111-1307, USA. 
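+ * [Editor's note on the preceding lvm-snap.c: calc_max_buckets() sizes the exception hash at roughly 2% of physical memory, and lvm_snapshot_alloc_hash_table() rounds the bucket count down to a power of two by repeatedly clearing the lowest set bit (while (buckets & (buckets-1)) buckets &= (buckets-1)); e.g. 3000 buckets become 2048, so that lv_snapshot_hash_mask = buckets - 1 works as a cheap modulo in hashfn().]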
+ * + */ + +/* + * Changelog + * + * 09/11/1997 - added chr ioctls VG_STATUS_GET_COUNT + * and VG_STATUS_GET_NAMELIST + * 18/01/1998 - changed lvm_chr_open/close lock handling + * 30/04/1998 - changed LV_STATUS ioctl to LV_STATUS_BYNAME and + * - added LV_STATUS_BYINDEX ioctl + * - used lvm_status_byname_req_t and + * lvm_status_byindex_req_t vars + * 04/05/1998 - added multiple device support + * 08/05/1998 - added support to set/clear extendable flag in volume group + * 09/05/1998 - changed output of lvm_proc_get_info() because of + * support for free (e.g. longer) logical volume names + * 12/05/1998 - added spin_locks (thanks to Pascal van Dam + * ) + * 25/05/1998 - fixed handling of locked PEs in lvm_map() and lvm_chr_ioctl() + * 26/05/1998 - reactivated verify_area by access_ok + * 07/06/1998 - used vmalloc/vfree instead of kmalloc/kfree to go + * beyond 128/256 KB max allocation limit per call + * - #ifdef blocked spin_lock calls to avoid compile errors + * with 2.0.x + * 11/06/1998 - another enhancement to spinlock code in lvm_chr_open() + * and use of LVM_VERSION_CODE instead of my own macros + * (thanks to Michael Marxmeier ) + * 07/07/1998 - added statistics in lvm_map() + * 08/07/1998 - saved statistics in do_lv_extend_reduce() + * 25/07/1998 - used __initfunc macro + * 02/08/1998 - changes for official char/block major numbers + * 07/08/1998 - avoided making init_module() and cleanup_module() static + * 30/08/1998 - changed VG lv_open counter from sum of LV lv_open counters + * to sum of LVs open (no matter how often each is) + * 01/09/1998 - fixed lvm_gendisk.part[] index error + * 07/09/1998 - added copying of lv_current_pe-array + * in LV_STATUS_BYINDEX ioctl + * 17/11/1998 - added KERN_* levels to printk + * 13/01/1999 - fixed LV index bug in do_lv_create() which hit lvrename + * 07/02/1999 - fixed spinlock handling bug in case of LVM_RESET + * by moving spinlock code from lvm_chr_open() + * to lvm_chr_ioctl() + * - added LVM_LOCK_LVM ioctl to lvm_chr_ioctl() + * - allowed LVM_RESET and retrieval commands to go ahead; + * only other update ioctls are blocked now + * - fixed pv->pe to NULL for pv_status + * - using lv_req structure in lvm_chr_ioctl() now + * - fixed NULL ptr reference bug in do_lv_extend_reduce() + * caused by uncontiguous PV array in lvm_chr_ioctl(VG_REDUCE) + * 09/02/1999 - changed BLKRASET and BLKRAGET in lvm_chr_ioctl() to + * handle logical volume private read ahead sector + * - implemented LV read_ahead handling with lvm_blk_read() + * and lvm_blk_write() + * 10/02/1999 - implemented 2.[12].* support function lvm_hd_name() + * to be used in drivers/block/genhd.c by disk_name() + * 12/02/1999 - fixed index bug in lvm_blk_ioctl(), HDIO_GETGEO + * - enhanced gendisk insert/remove handling + * 16/02/1999 - changed to dynamic block minor number allocation to + * have as many as 99 volume groups with 256 logical volumes + * as the grand total; this allows having 1 volume group with + * up to 256 logical volumes in it + * 21/02/1999 - added LV open count information to proc filesystem + * - substituted redundant LVM_RESET code by calls + * to do_vg_remove() + * 22/02/1999 - used schedule_timeout() to be more responsive + * in case of do_vg_remove() with lots of logical volumes + * 19/03/1999 - fixed NULL pointer bug in module_init/lvm_init + * 17/05/1999 - used DECLARE_WAIT_QUEUE_HEAD macro (>2.3.0) + * - enhanced lvm_hd_name support + * 03/07/1999 - avoided use of KERNEL_VERSION macro based ifdefs and + * memcpy_tofs/memcpy_fromfs macro redefinitions + * 
06/07/1999 - corrected reads/writes statistic counter copy in case + * of striped logical volume + * 28/07/1999 - implemented snapshot logical volumes + * - lvm_chr_ioctl + * - LV_STATUS_BYINDEX + * - LV_STATUS_BYNAME + * - do_lv_create + * - do_lv_remove + * - lvm_map + * - new lvm_snapshot_remap_block + * - new lvm_snapshot_remap_new_block + * 08/10/1999 - implemented support for multiple snapshots per + * original logical volume + * 12/10/1999 - support for 2.3.19 + * 11/11/1999 - support for 2.3.28 + * 21/11/1999 - changed lvm_map() interface to buffer_head based + * 19/12/1999 - support for 2.3.33 + * 01/01/2000 - changed locking concept in lvm_map(), + * do_vg_create() and do_lv_remove() + * + */ + + +/* + * TODO + * + * - implement special handling of unavailable physical volumes + * + */ + +char *lvm_version = "LVM version 0.8e by Heinz Mauelshagen (4/1/2000)\n"; +char *lvm_short_version = "version 0.8e (4/1/2000)"; + +#define MAJOR_NR LVM_BLK_MAJOR +#define DEVICE_OFF(device) + +#include +#include + +#ifdef MODVERSIONS +# undef MODULE +# define MODULE +# include +#endif + +#ifdef MODULE +# include +#endif + +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef CONFIG_KERNELD +#include +#endif + +#include +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 0) +#include +#endif + +#include +#include + +#define LVM_CORRECT_READ_AHEAD(a) \ +do { \ + if ((a) < LVM_MIN_READ_AHEAD) \ + (a) = LVM_MIN_READ_AHEAD; \ + if ((a) > LVM_MAX_READ_AHEAD) \ + (a) = LVM_MAX_READ_AHEAD; \ +} while(0) + +#define suser() ( current->uid == 0 && current->euid == 0) + + +/* + * External function prototypes + */ +#ifdef MODULE +int init_module ( void); +void cleanup_module ( void); +#else +extern int lvm_init ( void); +#endif + +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 30) +static void lvm_dummy_device_request ( request_queue_t*); +#else +static void lvm_dummy_device_request ( void); +#endif +static int lvm_blk_ioctl ( struct inode *, struct file *, uint, ulong); +static int lvm_blk_open ( struct inode *, struct file *); + +static int lvm_chr_open ( struct inode *, struct file *); + +static int lvm_chr_release ( struct inode *, struct file *); +static int lvm_blk_release ( struct inode *, struct file *); + +static int lvm_chr_ioctl ( struct inode *, struct file *, uint, ulong); + +#if defined CONFIG_LVM_PROC_FS && defined CONFIG_PROC_FS +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 30) +static int lvm_proc_get_info ( char *, char **, off_t, int); +static int (*lvm_proc_get_info_ptr)(char *, char **, off_t, int) = + &lvm_proc_get_info; +#else +static int lvm_proc_get_info ( char *, char **, off_t, int, int); +#endif +#endif + +#ifdef LVM_HD_NAME +void lvm_hd_name ( char*, int); +#endif + +/* external snapshot calls */ +int lvm_snapshot_remap_block ( kdev_t*, ulong*, unsigned long, lv_t*); +int lvm_snapshot_COW(kdev_t, unsigned long, unsigned long, + unsigned long, lv_t *); +int lvm_snapshot_alloc(lv_t *); +void lvm_snapshot_release(lv_t *); + +/* End external function prototypes */ + + +/* + * Internal function prototypes + */ +static void lvm_init_vars ( void); +extern int (*lvm_map_ptr) ( struct buffer_head*, int); + + +#ifdef LVM_HD_NAME +extern void (*lvm_hd_name_ptr) ( char*, int); +#endif +static int lvm_map ( struct buffer_head*, int); +static int do_vg_create ( int, void *); +static int do_vg_remove ( int); +static int do_lv_create ( int, char *, lv_t *); +static int do_lv_remove ( int, char *, 
int); +static int do_lv_extend_reduce ( int, char *, lv_t *); +static void lvm_geninit ( struct gendisk *); +#ifdef LVM_GET_INODE + static struct inode *lvm_get_inode ( int); + void lvm_clear_inode ( struct inode *); +#endif +inline int lvm_strlen ( char *); +inline void lvm_memcpy ( char *, char *, int); +inline int lvm_strcmp ( char *, char *); +inline char *lvm_strrchr ( char *, char c); +/* END Internal function prototypes */ + + +/* volume group descriptor area pointers */ +static vg_t *vg[ABS_MAX_VG + 1]; +static pv_t *pvp = NULL; +static lv_t *lvp = NULL; +static pe_t *pep = NULL; +static pe_t *pep1 = NULL; + + +/* map from block minor number to VG and LV numbers */ +typedef struct { + int vg_number; + int lv_number; +} vg_lv_map_t; +static vg_lv_map_t vg_lv_map[ABS_MAX_LV]; + + +/* Request structures (lvm_chr_ioctl()) */ +static pv_change_req_t pv_change_req; +static pv_flush_req_t pv_flush_req; +static pv_status_req_t pv_status_req; +static pe_lock_req_t pe_lock_req; +static le_remap_req_t le_remap_req; +static lv_req_t lv_req; + +#ifdef LVM_TOTAL_RESET +static int lvm_reset_spindown = 0; +#endif + +static char pv_name[NAME_LEN]; +/* static char rootvg[NAME_LEN] = { 0, }; */ +static uint lv_open = 0; +const char *const lvm_name = LVM_NAME; +static int lock = 0; +static int loadtime = 0; +static uint vg_count = 0; +static long lvm_chr_open_count = 0; +static ushort lvm_iop_version = LVM_DRIVER_IOP_VERSION; +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 0) +static DECLARE_WAIT_QUEUE_HEAD ( lvm_wait); +static DECLARE_WAIT_QUEUE_HEAD ( lvm_map_wait); +#else +struct wait_queue *lvm_wait = NULL; +struct wait_queue *lvm_map_wait = NULL; +#endif + +static spinlock_t lvm_lock = SPIN_LOCK_UNLOCKED; + +#if defined CONFIG_LVM_PROC_FS && defined CONFIG_PROC_FS +#if LINUX_VERSION_CODE < KERNEL_VERSION ( 2, 3, 31) +static struct proc_dir_entry lvm_proc_entry = { + 0, 3, LVM_NAME, S_IFREG | S_IRUGO, + 1, 0, 0, 0, + NULL, + lvm_proc_get_info, + NULL, NULL, NULL, NULL, NULL, +}; +#endif +#endif + +static struct file_operations lvm_chr_fops = { + ioctl: lvm_chr_ioctl, + open: lvm_chr_open, + release: lvm_chr_release, +}; + +#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 3, 38) +static struct file_operations lvm_blk_fops = { + read: block_read, + write: block_write, + ioctl: lvm_blk_ioctl, + open: lvm_blk_open, + release: lvm_blk_release, + fsync: block_fsync, +}; +#else +static struct block_device_operations lvm_blk_fops = +{ + open: lvm_blk_open, + release: lvm_blk_release, + ioctl: lvm_blk_ioctl, +}; +#endif + +/* gendisk structures */ +static struct hd_struct lvm_hd_struct[MAX_LV]; +int lvm_blocksizes[MAX_LV] = { 0, }; +static int lvm_size[MAX_LV] = { 0, }; +static struct gendisk lvm_gendisk = { + MAJOR_NR, /* major # */ + LVM_NAME, /* name of major */ + 0, /* number of times minor is shifted + to get real minor */ + 1, /* maximum partitions per device */ + MAX_LV, /* maximum number of real devices */ + lvm_geninit, /* initialization called before we + do other things */ + lvm_hd_struct, /* partition table */ + lvm_size, /* device size in blocks, copied + to block_size[] */ + MAX_LV, /* number or real devices */ + NULL, /* internal */ + NULL, /* pointer to next gendisk struct (internal) */ +}; + + +#ifdef MODULE +/* + * Module initialization... + */ +int init_module ( void) +#else +/* + * Driver initialization... 
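+ * [Editor's note: lvm_init()/init_module() below registers the driver's gendisk with a plain singly linked list splice: it walks gendisk_head while the entries' majors are greater than LVM's, links lvm_gendisk in at that point, and then publishes lvm_map and lvm_hd_name through the lvm_map_ptr/lvm_hd_name_ptr hooks declared in ll_rw_blk.c and genhd.c.]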
+ */ +#ifdef __initfunc +__initfunc ( int lvm_init ( void)) +#else +int __init lvm_init ( void) +#endif +#endif /* #ifdef MODULE */ +{ + struct gendisk *gendisk_ptr = NULL; + + lvm_init_vars (); + + /* insert our gendisk at the corresponding major */ + lvm_geninit ( &lvm_gendisk); + if ( gendisk_head != NULL) { + gendisk_ptr = gendisk_head; + while ( gendisk_ptr->next != NULL && + gendisk_ptr->major > lvm_gendisk.major) { + gendisk_ptr = gendisk_ptr->next; + } + lvm_gendisk.next = gendisk_ptr->next; + gendisk_ptr->next = &lvm_gendisk; + } else { + gendisk_head = &lvm_gendisk; + lvm_gendisk.next = NULL; + } + + /* reference from drivers/block/ll_rw_blk.c */ + lvm_map_ptr = lvm_map; + +#ifdef LVM_HD_NAME + /* reference from drivers/block/genhd.c */ + lvm_hd_name_ptr = lvm_hd_name; +#endif + +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 30) + blk_init_queue ( BLK_DEFAULT_QUEUE ( MAJOR_NR), lvm_dummy_device_request); +#else + blk_dev[MAJOR_NR].request_fn = lvm_dummy_device_request; + blk_dev[MAJOR_NR].current_request = NULL; +#endif + + /* optional read root VGDA */ +/* + if ( *rootvg != 0) { + vg_read_with_pv_and_lv ( rootvg, &vg); + } +*/ + + if ( register_chrdev ( LVM_CHAR_MAJOR, lvm_name, &lvm_chr_fops) < 0) { + printk ( KERN_ERR "%s -- register_chrdev failed\n", lvm_name); + return -EIO; + } + if ( register_blkdev ( MAJOR_NR, lvm_name, &lvm_blk_fops) < 0) { + printk ( "%s -- register_blkdev failed\n", lvm_name); + if ( unregister_chrdev ( LVM_CHAR_MAJOR, lvm_name) < 0) + printk ( KERN_ERR "%s -- unregister_chrdev failed\n", lvm_name); + return -EIO; + } + +#if defined CONFIG_LVM_PROC_FS && defined CONFIG_PROC_FS +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 25) + create_proc_info_entry ( LVM_NAME, S_IFREG | S_IRUGO, + &proc_root, lvm_proc_get_info_ptr); +# else + proc_register ( &proc_root, &lvm_proc_entry); +# endif +#endif + + printk ( KERN_INFO + "%s%s -- " +#ifdef MODULE + "Module" +#else + "Driver" +#endif + " successfully initialized\n", + lvm_version, lvm_name); + + return 0; +} /* init_module () / lvm_init () */ + + +#ifdef MODULE +/* + * Module cleanup... 
+ */ +void cleanup_module ( void) { + struct gendisk *gendisk_ptr = NULL, *gendisk_ptr_prev = NULL; + + if ( unregister_chrdev ( LVM_CHAR_MAJOR, lvm_name) < 0) { + printk ( KERN_ERR "%s -- unregister_chrdev failed\n", lvm_name); + } + if ( unregister_blkdev ( MAJOR_NR, lvm_name) < 0) { + printk ( KERN_ERR "%s -- unregister_blkdev failed\n", lvm_name); + } + +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 30) + blk_cleanup_queue ( BLK_DEFAULT_QUEUE ( MAJOR_NR)); +#else + blk_dev[MAJOR_NR].request_fn = NULL; + blk_dev[MAJOR_NR].current_request = NULL; +#endif + + gendisk_ptr = gendisk_ptr_prev = gendisk_head; + while ( gendisk_ptr != NULL) { + if ( gendisk_ptr == &lvm_gendisk) break; + gendisk_ptr_prev = gendisk_ptr; + gendisk_ptr = gendisk_ptr->next; + } + /* delete our gendisk from chain */ + if ( gendisk_ptr == &lvm_gendisk) gendisk_ptr_prev->next = gendisk_ptr->next; + + blk_size[MAJOR_NR] = NULL; + blksize_size[MAJOR_NR] = NULL; + +#if defined CONFIG_LVM_PROC_FS && defined CONFIG_PROC_FS +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 30) + remove_proc_entry ( LVM_NAME, &proc_root); +# else + proc_unregister ( &proc_root, lvm_proc_entry.low_ino); +# endif +#endif + + /* reference from linux/drivers/block/ll_rw_blk.c */ + lvm_map_ptr = NULL; + +#ifdef LVM_HD_NAME + /* reference from linux/drivers/block/genhd.c */ + lvm_hd_name_ptr = NULL; +#endif + + printk ( KERN_INFO "%s -- Module successfully deactivated\n", lvm_name); + + return; +} /* void cleanup_module () */ +#endif /* #ifdef MODULE */ + + +/* + * support function to initialize lvm variables + */ +#ifdef __initfunc +__initfunc ( void lvm_init_vars ( void)) +#else +void __init lvm_init_vars ( void) +#endif +{ + int v; + + loadtime = CURRENT_TIME; + + lvm_lock = SPIN_LOCK_UNLOCKED; + + pe_lock_req.lock = UNLOCK_PE; + pe_lock_req.data.lv_dev = \ + pe_lock_req.data.pv_dev = \ + pe_lock_req.data.pv_offset = 0; + + /* Initialize VG pointers */ + for ( v = 0; v <= ABS_MAX_VG; v++) vg[v] = NULL; + + /* Initialize LV -> VG association */ + for ( v = 0; v < ABS_MAX_LV; v++) { + /* index ABS_MAX_VG never used for real VG */ + vg_lv_map[v].vg_number = ABS_MAX_VG; + vg_lv_map[v].lv_number = -1; + } + + return; +} /* lvm_init_vars () */ + + +/******************************************************************** + * + * Character device functions + * + ********************************************************************/ + +/* + * character device open routine + */ +static int lvm_chr_open ( struct inode *inode, + struct file *file) { + int minor = MINOR ( inode->i_rdev); + +#ifdef DEBUG + printk ( KERN_DEBUG + "%s -- lvm_chr_open MINOR: %d VG#: %d mode: 0x%X lock: %d\n", + lvm_name, minor, VG_CHR(minor), file->f_mode, lock); +#endif + + /* super user validation */ + if ( ! suser()) return -EACCES; + + /* Group special file open */ + if ( VG_CHR(minor) > MAX_VG) return -ENXIO; + +#ifdef MODULE + MOD_INC_USE_COUNT; +#endif + + lvm_chr_open_count++; + return 0; +} /* lvm_chr_open () */ + + +/* + * character device i/o-control routine + * + * Only one changing process can do ioctl at one time, others will block. 
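+ * [Editor's note: a sketch of the locking protocol implemented below, using the names from this file; the -EINTR signal check and the goto-based retry of the real code are condensed into a loop here:
+ *
+ *	spin_lock(&lvm_lock);
+ *	while (lock != 0 && lock != current->pid) {
+ *		spin_unlock(&lvm_lock);
+ *		interruptible_sleep_on(&lvm_wait);	/* woken on unlock */
+ *		spin_lock(&lvm_lock);
+ *	}
+ *	lock = current->pid;	/* we are the single updater now */
+ *	spin_unlock(&lvm_lock);
+ * ]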
+ * + */ +static int lvm_chr_ioctl ( struct inode *inode, struct file *file, + uint command, ulong a) { + int minor = MINOR ( inode->i_rdev); + int extendable; + ulong l, le, p, v; + ulong size; + void *arg = ( void*) a; +#ifdef LVM_GET_INODE + struct inode *inode_sav; +#endif + lv_status_byname_req_t lv_status_byname_req; + lv_status_byindex_req_t lv_status_byindex_req; + lv_t lv; + + /* otherwise cc will complain about unused variables */ + ( void) lvm_lock; + + +#ifdef DEBUG_IOCTL + printk ( KERN_DEBUG + "%s -- lvm_chr_ioctl: command: 0x%X MINOR: %d " + "VG#: %d mode: 0x%X\n", + lvm_name, command, minor, VG_CHR(minor), file->f_mode); +#endif + +#ifdef LVM_TOTAL_RESET + if ( lvm_reset_spindown > 0) return -EACCES; +#endif + + + /* Main command switch */ + switch ( command) { + /* lock the LVM */ + case LVM_LOCK_LVM: +lock_try_again: + spin_lock ( &lvm_lock); + if( lock != 0 && lock != current->pid ) { +#ifdef DEBUG_IOCTL + printk ( KERN_INFO "lvm_chr_ioctl: %s is locked by pid %d ...\n", + lvm_name, lock); +#endif + spin_unlock ( &lvm_lock); + interruptible_sleep_on ( &lvm_wait); + if ( current->sigpending != 0) return -EINTR; +#ifdef LVM_TOTAL_RESET + if ( lvm_reset_spindown > 0) return -EACCES; +#endif + goto lock_try_again; + } + lock = current->pid; + spin_unlock ( &lvm_lock); + return 0; + + + /* check lvm version to ensure driver/tools+lib interoperability */ + case LVM_GET_IOP_VERSION: + if ( copy_to_user ( arg, &lvm_iop_version, sizeof ( ushort)) != 0) + return -EFAULT; + return 0; + + +#ifdef LVM_TOTAL_RESET + /* lock reset function */ + case LVM_RESET: + lvm_reset_spindown = 1; + for ( v = 0; v < ABS_MAX_VG; v++) { + if ( vg[v] != NULL) { + do_vg_remove ( v); + } + } + +#ifdef MODULE + while ( GET_USE_COUNT ( &__this_module) < 1) + MOD_INC_USE_COUNT; + while ( GET_USE_COUNT ( &__this_module) > 1) + MOD_DEC_USE_COUNT; +#endif /* MODULE */ + lock = 0; /* release lock */ + wake_up_interruptible ( &lvm_wait); + return 0; +#endif /* LVM_TOTAL_RESET */ + + + /* lock/unlock i/o to a physical extent to move it to another + physical volume (move's done in user space's pvmove) */ + case PE_LOCK_UNLOCK: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( &pe_lock_req, arg, sizeof ( pe_lock_req_t)) != 0) + return -EFAULT; + + switch ( pe_lock_req.lock) { + case LOCK_PE: + for ( p = 0; p < vg[VG_CHR(minor)]->pv_max; p++) { + if ( vg[VG_CHR(minor)]->pv[p] != NULL && + pe_lock_req.data.pv_dev == + vg[VG_CHR(minor)]->pv[p]->pv_dev) + break; + } + + if ( p == vg[VG_CHR(minor)]->pv_max) return -ENXIO; + + pe_lock_req.lock = UNLOCK_PE; + fsync_dev ( pe_lock_req.data.lv_dev); + pe_lock_req.lock = LOCK_PE; + break; + + case UNLOCK_PE: + pe_lock_req.lock = UNLOCK_PE; + pe_lock_req.data.lv_dev = \ + pe_lock_req.data.pv_dev = \ + pe_lock_req.data.pv_offset = 0; + wake_up ( &lvm_map_wait); + break; + + default: + return -EINVAL; + } + + return 0; + + + /* remap a logical extent (after moving the physical extent) */ + case LE_REMAP: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( &le_remap_req, arg, + sizeof ( le_remap_req_t)) != 0) + return -EFAULT; + + for ( l = 0; l < vg[VG_CHR(minor)]->lv_max; l++) { + if ( vg[VG_CHR(minor)]->lv[l] != NULL && + lvm_strcmp ( vg[VG_CHR(minor)]->lv[l]->lv_name, + le_remap_req.lv_name) == 0) { + for ( le = 0; le < vg[VG_CHR(minor)]->lv[l]->lv_allocated_le; + le++) { + if ( vg[VG_CHR(minor)]->lv[l]->lv_current_pe[le].dev == + le_remap_req.old_dev && + vg[VG_CHR(minor)]->lv[l]->lv_current_pe[le].pe == + le_remap_req.old_pe) { 
+ vg[VG_CHR(minor)]->lv[l]->lv_current_pe[le].dev = + le_remap_req.new_dev; + vg[VG_CHR(minor)]->lv[l]->lv_current_pe[le].pe = + le_remap_req.new_pe; + return 0; + } + } + return -EINVAL; + } + } + + return -ENXIO; + + + /* create a VGDA */ + case VG_CREATE: + return do_vg_create ( minor, arg); + + + /* remove an inactive VGDA */ + case VG_REMOVE: + return do_vg_remove ( minor); + + + /* extend a volume group */ + case VG_EXTEND: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( vg[VG_CHR(minor)]->pv_cur < vg[VG_CHR(minor)]->pv_max) { + for ( p = 0; p < vg[VG_CHR(minor)]->pv_max; p++) { + if ( vg[VG_CHR(minor)]->pv[p] == NULL) { + if ( ( vg[VG_CHR(minor)]->pv[p] = + kmalloc ( sizeof ( pv_t), GFP_USER)) == NULL) { + printk ( KERN_CRIT + "%s -- VG_EXTEND: kmalloc error PV at line %d\n", + lvm_name, __LINE__); + return -ENOMEM; + } + if ( copy_from_user ( vg[VG_CHR(minor)]->pv[p], arg, + sizeof ( pv_t)) != 0) + return -EFAULT; + + vg[VG_CHR(minor)]->pv[p]->pv_status = PV_ACTIVE; + /* We don't need the PE list + in kernel space as with the LV pe_t lists */ + vg[VG_CHR(minor)]->pv[p]->pe = NULL; + vg[VG_CHR(minor)]->pv_cur++; + vg[VG_CHR(minor)]->pv_act++; + vg[VG_CHR(minor)]->pe_total += + vg[VG_CHR(minor)]->pv[p]->pe_total; +#ifdef LVM_GET_INODE + /* insert a dummy inode for fs_may_mount */ + vg[VG_CHR(minor)]->pv[p]->inode = + lvm_get_inode ( vg[VG_CHR(minor)]->pv[p]->pv_dev); +#endif + return 0; + } + } + } + return -EPERM; + + + /* reduce a volume group */ + case VG_REDUCE: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( pv_name, arg, sizeof ( pv_name)) != 0) + return -EFAULT; + + for ( p = 0; p < vg[VG_CHR(minor)]->pv_max; p++) { + if ( vg[VG_CHR(minor)]->pv[p] != NULL && + lvm_strcmp ( vg[VG_CHR(minor)]->pv[p]->pv_name, + pv_name) == 0) { + if ( vg[VG_CHR(minor)]->pv[p]->lv_cur > 0) return -EPERM; + vg[VG_CHR(minor)]->pe_total -= + vg[VG_CHR(minor)]->pv[p]->pe_total; + vg[VG_CHR(minor)]->pv_cur--; + vg[VG_CHR(minor)]->pv_act--; +#ifdef DEBUG_VFREE + printk ( KERN_DEBUG + "%s -- kfree %d\n", lvm_name, __LINE__); +#endif +#ifdef LVM_GET_INODE + lvm_clear_inode ( vg[VG_CHR(minor)]->pv[p]->inode); +#endif + kfree ( vg[VG_CHR(minor)]->pv[p]); + /* Make PV pointer array contiguous */ + for ( ; p < vg[VG_CHR(minor)]->pv_max-1; p++) + vg[VG_CHR(minor)]->pv[p] = vg[VG_CHR(minor)]->pv[p + 1]; + vg[VG_CHR(minor)]->pv[p] = NULL; + return 0; + } + } + return -ENXIO; + + + /* set/clear extendability flag of volume group */ + case VG_SET_EXTENDABLE: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( &extendable, arg, sizeof ( extendable)) != 0) + return -EFAULT; + + if ( extendable == VG_EXTENDABLE || + extendable == ~VG_EXTENDABLE) { + if ( extendable == VG_EXTENDABLE) + vg[VG_CHR(minor)]->vg_status |= VG_EXTENDABLE; + else + vg[VG_CHR(minor)]->vg_status &= ~VG_EXTENDABLE; + } else return -EINVAL; + return 0; + + + /* get volume group data (only the vg_t struct) */ + case VG_STATUS: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_to_user ( arg, vg[VG_CHR(minor)], sizeof ( vg_t)) != 0) + return -EFAULT; + + return 0; + + + /* get volume group count */ + case VG_STATUS_GET_COUNT: + if ( copy_to_user ( arg, &vg_count, sizeof ( vg_count)) != 0) + return -EFAULT; + + return 0; + + + /* get volume group name list */ + case VG_STATUS_GET_NAMELIST: + for ( l = v = 0; v < ABS_MAX_VG; v++) { + if ( vg[v] != NULL) { + if ( copy_to_user ( arg + l++ * NAME_LEN, + vg[v]->vg_name, + NAME_LEN) != 0) + return -EFAULT; + } + } + return 0; + + + /* create,
remove, extend or reduce a logical volume */ + case LV_CREATE: + case LV_REMOVE: + case LV_EXTEND: + case LV_REDUCE: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( &lv_req, arg, sizeof ( lv_req)) != 0) + return -EFAULT; + + if ( command != LV_REMOVE) { + if ( copy_from_user ( &lv, lv_req.lv, sizeof ( lv_t)) != 0) + return -EFAULT; + } + + switch ( command) { + case LV_CREATE: + return do_lv_create ( minor, lv_req.lv_name, &lv); + + case LV_REMOVE: + return do_lv_remove ( minor, lv_req.lv_name, -1); + + case LV_EXTEND: + case LV_REDUCE: + return do_lv_extend_reduce ( minor, lv_req.lv_name, &lv); + } + + + /* get status of a logical volume by name */ + case LV_STATUS_BYNAME: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( &lv_status_byname_req, arg, + sizeof ( lv_status_byname_req_t)) != 0) + return -EFAULT; + + if ( lv_status_byname_req.lv == NULL) return -EINVAL; + if ( copy_from_user ( &lv, lv_status_byname_req.lv, + sizeof ( lv_t)) != 0) + return -EFAULT; + + for ( l = 0; l < vg[VG_CHR(minor)]->lv_max; l++) { + if ( vg[VG_CHR(minor)]->lv[l] != NULL && + lvm_strcmp ( vg[VG_CHR(minor)]->lv[l]->lv_name, + lv_status_byname_req.lv_name) == 0) { + if ( copy_to_user ( lv_status_byname_req.lv, + vg[VG_CHR(minor)]->lv[l], + sizeof ( lv_t)) != 0) + return -EFAULT; + + if ( lv.lv_current_pe != NULL) { + size = vg[VG_CHR(minor)]->lv[l]->lv_allocated_le * + sizeof ( pe_t); + if ( copy_to_user ( lv.lv_current_pe, + vg[VG_CHR(minor)]->lv[l]->lv_current_pe, + size) != 0) + return -EFAULT; + } + return 0; + } + } + return -ENXIO; + + + /* get status of a logical volume by index */ + case LV_STATUS_BYINDEX: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( &lv_status_byindex_req, arg, + sizeof ( lv_status_byindex_req)) != 0) + return -EFAULT; + + if ( ( lvp = lv_status_byindex_req.lv) == NULL) return -EINVAL; + l = lv_status_byindex_req.lv_index; + if ( vg[VG_CHR(minor)]->lv[l] == NULL) return -ENXIO; + + if ( copy_from_user ( &lv, lvp, sizeof ( lv_t)) != 0) + return -EFAULT; + + if ( copy_to_user ( lvp, vg[VG_CHR(minor)]->lv[l], + sizeof ( lv_t)) != 0) + return -EFAULT; + + if ( lv.lv_current_pe != NULL) { + size = vg[VG_CHR(minor)]->lv[l]->lv_allocated_le * sizeof ( pe_t); + if ( copy_to_user ( lv.lv_current_pe, + vg[VG_CHR(minor)]->lv[l]->lv_current_pe, + size) != 0) + return -EFAULT; + } + return 0; + + + /* change a physical volume */ + case PV_CHANGE: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( &pv_change_req, arg, + sizeof ( pv_change_req)) != 0) + return -EFAULT; + + for ( p = 0; p < vg[VG_CHR(minor)]->pv_max; p++) { + if ( vg[VG_CHR(minor)]->pv[p] != NULL && + lvm_strcmp ( vg[VG_CHR(minor)]->pv[p]->pv_name, + pv_change_req.pv_name) == 0) { +#ifdef LVM_GET_INODE + inode_sav = vg[VG_CHR(minor)]->pv[p]->inode; +#endif + if ( copy_from_user ( vg[VG_CHR(minor)]->pv[p], + pv_change_req.pv, + sizeof ( pv_t)) != 0) + return -EFAULT; + + /* We don't need the PE list + in kernel space as with LVs pe_t list */ + vg[VG_CHR(minor)]->pv[p]->pe = NULL; +#ifdef LVM_GET_INODE + vg[VG_CHR(minor)]->pv[p]->inode = inode_sav; +#endif + return 0; + } + } + return -ENXIO; + + + /* get physical volume data (pv_t structure only) */ + case PV_STATUS: + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + if ( copy_from_user ( &pv_status_req, arg, + sizeof ( pv_status_req)) != 0) + return -EFAULT; + + for ( p = 0; p < vg[VG_CHR(minor)]->pv_max; p++) { + if ( vg[VG_CHR(minor)]->pv[p] != NULL) { + if ( lvm_strcmp ( 
vg[VG_CHR(minor)]->pv[p]->pv_name, + pv_status_req.pv_name) == 0) { + if ( copy_to_user ( pv_status_req.pv, + vg[VG_CHR(minor)]->pv[p], + sizeof ( pv_t)) != 0) + return -EFAULT; + return 0; + } + } + } + return -ENXIO; + + + /* physical volume buffer flush/invalidate */ + case PV_FLUSH: + if ( copy_from_user ( &pv_flush_req, arg, sizeof ( pv_flush_req)) != 0) + return -EFAULT; + + for ( v = 0; v < ABS_MAX_VG; v++) { + if ( vg[v] == NULL) continue; + for ( p = 0; p < vg[v]->pv_max; p++) { + if ( vg[v]->pv[p] != NULL && + lvm_strcmp ( vg[v]->pv[p]->pv_name, + pv_flush_req.pv_name) == 0) { + fsync_dev ( vg[v]->pv[p]->pv_dev); + invalidate_buffers ( vg[v]->pv[p]->pv_dev); + return 0; + } + } + } + return 0; + + + default: + printk ( KERN_WARNING + "%s -- lvm_chr_ioctl: unknown command %x\n", + lvm_name, command); + return -EINVAL; + } + + return 0; +} /* lvm_chr_ioctl */ + + +/* + * character device close routine + */ +static int lvm_chr_release ( struct inode *inode, struct file *file) +{ +#ifdef DEBUG + int minor = MINOR ( inode->i_rdev); + printk ( KERN_DEBUG + "%s -- lvm_chr_release VG#: %d\n", lvm_name, VG_CHR(minor)); +#endif + +#ifdef MODULE + if ( GET_USE_COUNT ( &__this_module) > 0) MOD_DEC_USE_COUNT; +#endif + +#ifdef LVM_TOTAL_RESET + if ( lvm_reset_spindown > 0) { + lvm_reset_spindown = 0; + lvm_chr_open_count = 1; + } +#endif + + if ( lvm_chr_open_count > 0) lvm_chr_open_count--; + if ( lock == current->pid) { + lock = 0; /* release lock */ + wake_up_interruptible ( &lvm_wait); + } + + return 0; +} /* lvm_chr_release () */ + + + +/******************************************************************** + * + * Block device functions + * + ********************************************************************/ + +/* + * block device open routine + */ +static int lvm_blk_open ( struct inode *inode, struct file *file) { + int minor = MINOR ( inode->i_rdev); + +#ifdef DEBUG_LVM_BLK_OPEN + printk ( KERN_DEBUG + "%s -- lvm_blk_open MINOR: %d VG#: %d LV#: %d mode: 0x%X\n", + lvm_name, minor, VG_BLK(minor), LV_BLK(minor), file->f_mode); +#endif + +#ifdef LVM_TOTAL_RESET + if ( lvm_reset_spindown > 0) return -EPERM; +#endif + + if ( vg[VG_BLK(minor)] != NULL && + ( vg[VG_BLK(minor)]->vg_status & VG_ACTIVE) && + vg[VG_BLK(minor)]->lv[LV_BLK(minor)] != NULL && + LV_BLK(minor) >= 0 && + LV_BLK(minor) < vg[VG_BLK(minor)]->lv_max) { + + /* Check parallel LV spindown (LV remove) */ + if ( vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_status & LV_SPINDOWN) + return -EPERM; + + /* Check inactive LV and open for read/write */ + if ( file->f_mode & O_RDWR) { + if ( ! ( vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_status & LV_ACTIVE)) + return -EPERM; + if ( ! 
( vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_access & LV_WRITE)) + return -EACCES; + } + + if ( vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_open == 0) + vg[VG_BLK(minor)]->lv_open++; + vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_open++; + +#ifdef MODULE + MOD_INC_USE_COUNT; +#endif + +#ifdef DEBUG_LVM_BLK_OPEN + printk ( KERN_DEBUG + "%s -- lvm_blk_open MINOR: %d VG#: %d LV#: %d size: %d\n", + lvm_name, minor, VG_BLK(minor), LV_BLK(minor), + vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_size); +#endif + + return 0; + } + + return -ENXIO; +} /* lvm_blk_open () */ + + +/* + * block device i/o-control routine + */ +static int lvm_blk_ioctl (struct inode *inode, struct file *file, + uint command, ulong a) { + int minor = MINOR ( inode->i_rdev); + void *arg = ( void*) a; + struct hd_geometry *hd = ( struct hd_geometry *) a; + +#ifdef DEBUG_IOCTL + printk ( KERN_DEBUG + "%s -- lvm_blk_ioctl MINOR: %d command: 0x%X arg: %lX " + "VG#: %d LV#: %d\n", + lvm_name, minor, command, ( ulong) arg, + VG_BLK(minor), LV_BLK(minor)); +#endif + + switch ( command) { + /* return device size */ + case BLKGETSIZE: +#ifdef DEBUG_IOCTL + printk ( KERN_DEBUG + "%s -- lvm_blk_ioctl -- BLKGETSIZE: %u\n", + lvm_name, vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_size); +#endif + copy_to_user ( ( long*) arg, &vg[VG_BLK(minor)]->\ + lv[LV_BLK(minor)]->lv_size, + sizeof ( vg[VG_BLK(minor)]->\ + lv[LV_BLK(minor)]->lv_size)); + break; + + + /* flush buffer cache */ + case BLKFLSBUF: + /* super user validation */ + if ( ! suser ()) return -EACCES; + +#ifdef DEBUG_IOCTL + printk ( KERN_DEBUG + "%s -- lvm_blk_ioctl -- BLKFLSBUF\n", lvm_name); +#endif + fsync_dev ( inode->i_rdev); + invalidate_buffers(inode->i_rdev); + break; + + + /* set read ahead for block device */ + case BLKRASET: + /* super user validation */ + if ( ! suser ()) return -EACCES; + +#ifdef DEBUG_IOCTL + printk ( KERN_DEBUG + "%s -- lvm_blk_ioctl -- BLKRASET: %ld sectors for %02X:%02X\n", + lvm_name, ( long) arg, MAJOR( inode->i_rdev), minor); +#endif + if ( ( long) arg < LVM_MIN_READ_AHEAD || + ( long) arg > LVM_MAX_READ_AHEAD) return -EINVAL; + read_ahead[MAJOR_NR] = + vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_read_ahead = ( long) arg; + break; + + + /* get current read ahead setting */ + case BLKRAGET: +#ifdef DEBUG_IOCTL + printk ( KERN_DEBUG + "%s -- lvm_blk_ioctl -- BLKRAGET\n", lvm_name); +#endif + copy_to_user ( ( long*) arg, + &vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_read_ahead, + sizeof ( vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->\ + lv_read_ahead)); + break; + + + /* get disk geometry */ + case HDIO_GETGEO: +#ifdef DEBUG_IOCTL + printk ( KERN_DEBUG + "%s -- lvm_blk_ioctl -- HDIO_GETGEO\n", lvm_name); +#endif + if ( hd == NULL) return -EINVAL; + { + unsigned char heads = 64; + unsigned char sectors = 32; + long start = 0; + short cylinders = vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_size / + heads / sectors; + + if ( copy_to_user ( ( char*) &hd->heads, &heads, + sizeof ( heads)) != 0 || + copy_to_user ( ( char*) &hd->sectors, &sectors, + sizeof ( sectors)) != 0 || + copy_to_user ( ( short*) &hd->cylinders, + &cylinders, sizeof ( cylinders)) != 0 || + copy_to_user ( ( long*) &hd->start, &start, + sizeof ( start)) != 0) + return -EFAULT; + +#ifdef DEBUG_IOCTL + printk ( KERN_DEBUG + "%s -- lvm_blk_ioctl -- cylinders: %d\n", + lvm_name, cylinders); +#endif + } + break; + + + /* set access flags of a logical volume */ + case LV_SET_ACCESS: + /* super user validation */ + if ( !
suser ()) return -EACCES; + vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_access = ( ulong) arg; + break; + + + /* set status flags of a logical volume */ + case LV_SET_STATUS: + /* super user validation */ + if ( ! suser ()) return -EACCES; + if ( ! ( ( ulong) arg & LV_ACTIVE) && + vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_open > 1) return -EPERM; + vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_status = ( ulong) arg; + break; + + + /* set allocation flags of a logical volume */ + case LV_SET_ALLOCATION: + /* super user validation */ + if ( ! suser ()) return -EACCES; + vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_allocation = ( ulong) arg; + break; + + + default: + printk ( KERN_WARNING + "%s -- lvm_blk_ioctl: unknown command %d\n", + lvm_name, command); + return -EINVAL; + } + + return 0; +} /* lvm_blk_ioctl () */ + + +/* + * block device close routine + */ +static int lvm_blk_release ( struct inode *inode, struct file *file) +{ + int minor = MINOR ( inode->i_rdev); + +#ifdef DEBUG + printk ( KERN_DEBUG + "%s -- lvm_blk_release MINOR: %d VG#: %d LV#: %d\n", + lvm_name, minor, VG_BLK(minor), LV_BLK(minor)); +#endif + + sync_dev ( inode->i_rdev); + if ( vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_open == 1) + vg[VG_BLK(minor)]->lv_open--; + vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_open--; + +#ifdef MODULE + MOD_DEC_USE_COUNT; +#endif + + return 0; +} /* lvm_blk_release () */ + + +#if defined CONFIG_LVM_PROC_FS && defined CONFIG_PROC_FS +/* + * Support function /proc-Filesystem + */ +#define LVM_PROC_BUF ( i == 0 ? dummy_buf : &buf[sz]) + +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 25) +static int lvm_proc_get_info ( char *page, char **start, off_t pos, int count) +#else +static int lvm_proc_get_info ( char *page, char **start, off_t pos, + int count, int whence) +#endif +{ + int c, i, l, p, v, vg_counter, pv_counter, lv_counter, lv_open_counter, + lv_open_total, pe_t_bytes, lv_block_exception_t_bytes, seconds; + static off_t sz; + off_t sz_last; + char allocation_flag, inactive_flag, rw_flag, stripes_flag; + char *lv_name = NULL; + static char *buf = NULL; + static char dummy_buf[160]; /* sized for 2 lines */ + +#ifdef DEBUG_LVM_PROC_GET_INFO + printk ( KERN_DEBUG + "%s - lvm_proc_get_info CALLED pos: %lu count: %d whence: %d\n", + lvm_name, pos, count, whence); +#endif + + if ( pos == 0 || buf == NULL) { + sz_last = vg_counter = pv_counter = lv_counter = lv_open_counter = \ + lv_open_total = pe_t_bytes = lv_block_exception_t_bytes = 0; + + /* search for activity */ + for ( v = 0; v < ABS_MAX_VG; v++) { + if ( vg[v] != NULL) { + vg_counter++; + pv_counter += vg[v]->pv_cur; + lv_counter += vg[v]->lv_cur; + if ( vg[v]->lv_cur > 0) { + for ( l = 0; l < vg[v]->lv_max; l++) { + if ( vg[v]->lv[l] != NULL) { + pe_t_bytes += vg[v]->lv[l]->lv_allocated_le; + if ( vg[v]->lv[l]->lv_block_exception != NULL) { + lv_block_exception_t_bytes += + vg[v]->lv[l]->lv_remap_end; + } + if ( vg[v]->lv[l]->lv_open > 0) { + lv_open_counter++; + lv_open_total += vg[v]->lv[l]->lv_open; + } + } + } + } + } + } + pe_t_bytes *= sizeof ( pe_t); + lv_block_exception_t_bytes *= sizeof ( lv_block_exception_t); + + if ( buf != NULL) { +#ifdef DEBUG_VFREE + printk ( KERN_DEBUG + "%s -- vfree %d\n", lvm_name, __LINE__); +#endif + vfree ( buf); + buf = NULL; + } + + /* 2 times: first to get size to allocate buffer, + 2nd to fill the vmalloced buffer */ + for ( i = 0; i < 2; i++) { + sz = 0; + sz += sprintf ( LVM_PROC_BUF, + "LVM " +#ifdef MODULE + "module" +#else + "driver" +#endif + " %s\n\n" + "Total: %d VG%s %d PV%s %d LV%s ", + 
lvm_short_version, + vg_counter, vg_counter == 1 ? "" : "s", + pv_counter, pv_counter == 1 ? "" : "s", + lv_counter, lv_counter == 1 ? "" : "s"); + sz += sprintf ( LVM_PROC_BUF, + "(%d LV%s open", + lv_open_counter, + lv_open_counter == 1 ? "" : "s"); + if ( lv_open_total > 0) sz += sprintf ( LVM_PROC_BUF, + " %d times)\n", + lv_open_total); + else sz += sprintf ( LVM_PROC_BUF, ")"); + sz += sprintf ( LVM_PROC_BUF, + "\nGlobal: %lu bytes vmalloced IOP version: %d ", + vg_counter * sizeof ( vg_t) + + pv_counter * sizeof ( pv_t) + + lv_counter * sizeof ( lv_t) + + pe_t_bytes + lv_block_exception_t_bytes + sz_last, + lvm_iop_version); + + seconds = CURRENT_TIME - loadtime; + if ( seconds < 0) loadtime = CURRENT_TIME + seconds; + if ( seconds / 86400 > 0) { + sz += sprintf ( LVM_PROC_BUF, "%d day%s ", + seconds / 86400, + seconds / 86400 == 0 || + seconds / 86400 > 1 ? "s": ""); + } + sz += sprintf ( LVM_PROC_BUF, "%d:%02d:%02d active\n", + ( seconds % 86400) / 3600, + ( seconds % 3600) / 60, + seconds % 60); + + if ( vg_counter > 0) { + for ( v = 0; v < ABS_MAX_VG; v++) { + /* volume group */ + if ( vg[v] != NULL) { + inactive_flag = ' '; + if ( ! ( vg[v]->vg_status & VG_ACTIVE)) + inactive_flag = 'I'; + sz += sprintf ( LVM_PROC_BUF, + "\nVG: %c%s [%d PV, %d LV/%d open] " + " PE Size: %d KB\n" + " Usage [KB/PE]: %d /%d total " + "%d /%d used %d /%d free", + inactive_flag, + vg[v]->vg_name, + vg[v]->pv_cur, + vg[v]->lv_cur, + vg[v]->lv_open, + vg[v]->pe_size >> 1, + vg[v]->pe_size * vg[v]->pe_total >> 1, + vg[v]->pe_total, + vg[v]->pe_allocated * vg[v]->pe_size >> 1, + vg[v]->pe_allocated, + ( vg[v]->pe_total - vg[v]->pe_allocated) * + vg[v]->pe_size >> 1, + vg[v]->pe_total - vg[v]->pe_allocated); + + /* physical volumes */ + sz += sprintf ( LVM_PROC_BUF, + "\n PV%s ", + vg[v]->pv_cur == 1 ? ": " : "s:"); + c = 0; + for ( p = 0; p < vg[v]->pv_max; p++) { + if ( vg[v]->pv[p] != NULL) { + inactive_flag = 'A'; + if ( ! ( vg[v]->pv[p]->pv_status & PV_ACTIVE)) + inactive_flag = 'I'; + allocation_flag = 'A'; + if ( ! ( vg[v]->pv[p]->pv_allocatable & PV_ALLOCATABLE)) + allocation_flag = 'N'; + sz += sprintf ( LVM_PROC_BUF, + "[%c%c] %-21s %8d /%-6d " + "%8d /%-6d %8d /%-6d", + inactive_flag, + allocation_flag, + vg[v]->pv[p]->pv_name, + vg[v]->pv[p]->pe_total * + vg[v]->pv[p]->pe_size >> 1, + vg[v]->pv[p]->pe_total, + vg[v]->pv[p]->pe_allocated * + vg[v]->pv[p]->pe_size >> 1, + vg[v]->pv[p]->pe_allocated, + ( vg[v]->pv[p]->pe_total - + vg[v]->pv[p]->pe_allocated) * + vg[v]->pv[p]->pe_size >> 1, + vg[v]->pv[p]->pe_total - + vg[v]->pv[p]->pe_allocated); + c++; + if ( c < vg[v]->pv_cur) sz += sprintf ( LVM_PROC_BUF, + "\n "); + } + } + + /* logical volumes */ + sz += sprintf ( LVM_PROC_BUF, + "\n LV%s ", + vg[v]->lv_cur == 1 ? ": " : "s:"); + c = 0; + for ( l = 0; l < vg[v]->lv_max; l++) { + if ( vg[v]->lv[l] != NULL) { + inactive_flag = 'A'; + if ( ! 
( vg[v]->lv[l]->lv_status & LV_ACTIVE)) + inactive_flag = 'I'; + rw_flag = 'R'; + if ( vg[v]->lv[l]->lv_access & LV_WRITE) rw_flag = 'W'; + allocation_flag = 'D'; + if ( vg[v]->lv[l]->lv_allocation & LV_CONTIGUOUS) + allocation_flag = 'C'; + stripes_flag = 'L'; + if ( vg[v]->lv[l]->lv_stripes > 1) stripes_flag = 'S'; + sz += sprintf ( LVM_PROC_BUF, + "[%c%c%c%c", + inactive_flag, + rw_flag, + allocation_flag, + stripes_flag); + if ( vg[v]->lv[l]->lv_stripes > 1) + sz += sprintf ( LVM_PROC_BUF, "%-2d", + vg[v]->lv[l]->lv_stripes); + else + sz += sprintf ( LVM_PROC_BUF, " "); + lv_name = lvm_strrchr ( vg[v]->lv[l]->lv_name, '/'); + if ( lv_name != NULL) lv_name++; + else lv_name = vg[v]->lv[l]->lv_name; + sz += sprintf ( LVM_PROC_BUF, "] %-25s", lv_name); + if ( lvm_strlen ( lv_name) > 25) + sz += sprintf ( LVM_PROC_BUF, + "\n "); + sz += sprintf ( LVM_PROC_BUF, "%9d /%-6d ", + vg[v]->lv[l]->lv_size >> 1, + vg[v]->lv[l]->lv_size / vg[v]->pe_size); + + if ( vg[v]->lv[l]->lv_open == 0) + sz += sprintf ( LVM_PROC_BUF, "close"); + else + sz += sprintf ( LVM_PROC_BUF, "%dx open", + vg[v]->lv[l]->lv_open); + c++; + if ( c < vg[v]->lv_cur) sz += sprintf ( LVM_PROC_BUF, + "\n "); + } + } + if ( vg[v]->lv_cur == 0) + sz += sprintf ( LVM_PROC_BUF, "none"); + sz += sprintf ( LVM_PROC_BUF, "\n"); + } + } + } + + if ( buf == NULL) { + if ( ( buf = vmalloc ( sz)) == NULL) { + sz = 0; + return sprintf ( page, "%s - vmalloc error at line %d\n", + lvm_name, __LINE__); + } + } + sz_last = sz; + } + } + + if ( pos > sz - 1) { + vfree ( buf); + buf = NULL; + return 0; + } + + *start = &buf[pos]; + if ( sz - pos < count) return sz - pos; + else return count; +} /* lvm_proc_get_info () */ +#endif /* #if defined CONFIG_LVM_PROC_FS && defined CONFIG_PROC_FS */ + + +/* + * block device support function for /usr/src/linux/drivers/block/ll_rw_blk.c + * (see init_module/lvm_init) + */ +static int lvm_map ( struct buffer_head *bh, int rw) { + int minor = MINOR ( bh->b_dev); + int ret = 0; + ulong index; + ulong size = bh->b_size >> 9; + ulong rsector_tmp = bh->b_blocknr * size; + ulong rsector_sav; + kdev_t rdev_tmp = bh->b_dev; + kdev_t rdev_sav; + lv_t *lv = vg[VG_BLK(minor)]->lv[LV_BLK(minor)]; + unsigned long pe_start; + + + if ( ! ( lv->lv_status & LV_ACTIVE)) { + printk ( KERN_ALERT + "%s - lvm_map: ll_rw_blk for inactive LV %s\n", + lvm_name, lv->lv_name); + return -1; + } + +/* +if ( lv->lv_access & LV_SNAPSHOT) +printk ( "%s -- %02d:%02d block: %lu rw: %d\n", lvm_name, MAJOR ( bh->b_dev), MINOR ( bh->b_dev), bh->b_blocknr, rw); +*/ + + /* take care of snapshot chunk writes before + check for writable logical volume */ + if ( ( lv->lv_access & LV_SNAPSHOT) && + MAJOR ( bh->b_dev) != 0 && + MAJOR ( bh->b_dev) != MAJOR_NR && +#ifdef WRITEA + ( rw == WRITEA || rw == WRITE)) +#else + rw == WRITE) +#endif + { +/* +printk ( "%s -- doing snapshot write for %02d:%02d[%02d:%02d] b_blocknr: %lu b_rsector: %lu\n", lvm_name, MAJOR ( bh->b_dev), MINOR ( bh->b_dev), MAJOR ( bh->b_dev), MINOR ( bh->b_dev), bh->b_blocknr, bh->b_rsector); +*/ + return 0; + } + +#ifdef WRITEA + if ( ( rw == WRITE || rw == WRITEA) && +#else + if ( rw == WRITE && +#endif + ! 
( lv->lv_access & LV_WRITE)) { + printk ( KERN_CRIT + "%s - lvm_map: ll_rw_blk write for readonly LV %s\n", + lvm_name, lv->lv_name); + return -1; + } + + +#ifdef DEBUG_MAP + printk ( KERN_DEBUG + "%s - lvm_map minor:%d *rdev: %02d:%02d *rsector: %lu " + "size:%lu\n", + lvm_name, minor, + MAJOR ( rdev_tmp), + MINOR ( rdev_tmp), + rsector_tmp, size); +#endif + + if ( rsector_tmp + size > lv->lv_size) { + printk ( KERN_ALERT + "%s - lvm_map *rsector: %lu or size: %lu wrong for" + " minor: %2d\n", lvm_name, rsector_tmp, size, minor); + return -1; + } + + rsector_sav = rsector_tmp; + rdev_sav = rdev_tmp; + +lvm_second_remap: + /* linear mapping */ + if ( lv->lv_stripes < 2) { + index = rsector_tmp / vg[VG_BLK(minor)]->pe_size; /* get the index */ + pe_start = lv->lv_current_pe[index].pe; + rsector_tmp = lv->lv_current_pe[index].pe + + ( rsector_tmp % vg[VG_BLK(minor)]->pe_size); + rdev_tmp = lv->lv_current_pe[index].dev; + +#ifdef DEBUG_MAP + printk ( KERN_DEBUG + "lv_current_pe[%ld].pe: %d rdev: %02d:%02d rsector:%ld\n", + index, + lv->lv_current_pe[index].pe, + MAJOR ( rdev_tmp), + MINOR ( rdev_tmp), + rsector_tmp); +#endif + + /* striped mapping */ + } else { + ulong stripe_index; + ulong stripe_length; + + stripe_length = vg[VG_BLK(minor)]->pe_size * lv->lv_stripes; + stripe_index = ( rsector_tmp % stripe_length) / lv->lv_stripesize; + index = rsector_tmp / stripe_length + + ( stripe_index % lv->lv_stripes) * + ( lv->lv_allocated_le / lv->lv_stripes); + pe_start = lv->lv_current_pe[index].pe; + rsector_tmp = lv->lv_current_pe[index].pe + + ( rsector_tmp % stripe_length) - + ( stripe_index % lv->lv_stripes) * lv->lv_stripesize - + stripe_index / lv->lv_stripes * + ( lv->lv_stripes - 1) * lv->lv_stripesize; + rdev_tmp = lv->lv_current_pe[index].dev; + +#ifdef DEBUG_MAP + printk(KERN_DEBUG + "lv_current_pe[%ld].pe: %d rdev: %02d:%02d rsector:%ld\n" + "stripe_length: %ld stripe_index: %ld\n", + index, + lv->lv_current_pe[index].pe, + MAJOR ( rdev_tmp), + MINOR ( rdev_tmp), + rsector_tmp, + stripe_length, + stripe_index); +#endif + } + + /* handle physical extents on the move */ + if ( pe_lock_req.lock == LOCK_PE) { + if ( rdev_tmp == pe_lock_req.data.pv_dev && + rsector_tmp >= pe_lock_req.data.pv_offset && + rsector_tmp < ( pe_lock_req.data.pv_offset + + vg[VG_BLK(minor)]->pe_size)) { + sleep_on ( &lvm_map_wait); + rsector_tmp = rsector_sav; + rdev_tmp = rdev_sav; + goto lvm_second_remap; + } + } + + /* statistics */ +#ifdef WRITEA + if ( rw == WRITE || rw == WRITEA) +#else + if ( rw == WRITE) +#endif + lv->lv_current_pe[index].writes++; + else + lv->lv_current_pe[index].reads++; + + /* snapshot volume exception handling on a physical device address basis */ + if ( lv->lv_access & ( LV_SNAPSHOT | LV_SNAPSHOT_ORG)) { + /* original logical volume */ + if ( lv->lv_access & LV_SNAPSHOT_ORG) { +#ifdef WRITEA + if ( rw == WRITE || rw == WRITEA) +#else + if ( rw == WRITE) +#endif + { + lv_t *lv_ptr; + + /* start with the first snapshot and loop through all of them */ + for ( lv_ptr = lv->lv_snapshot_next; + lv_ptr != NULL; + lv_ptr = lv_ptr->lv_snapshot_next) { + down(&lv_ptr->lv_snapshot_sem); + /* do we still have exception storage for this snapshot free?
*/ + if ( lv_ptr->lv_block_exception != NULL) { + kdev_t __dev; + unsigned long __sector; + + __dev = rdev_tmp; + __sector = rsector_tmp; + if (!lvm_snapshot_remap_block(&rdev_tmp, + &rsector_tmp, + pe_start, + lv_ptr)) + /* create a new mapping */ + ret = lvm_snapshot_COW(rdev_tmp, + rsector_tmp, + pe_start, + rsector_sav, + lv_ptr); + rdev_tmp = __dev; + rsector_tmp = __sector; + } + up(&lv_ptr->lv_snapshot_sem); + } + } + } else { + /* remap snapshot logical volume */ + down(&lv->lv_snapshot_sem); + if ( lv->lv_block_exception != NULL) + lvm_snapshot_remap_block ( &rdev_tmp, &rsector_tmp, pe_start, lv); + up(&lv->lv_snapshot_sem); + } + } + + bh->b_rdev = rdev_tmp; + bh->b_rsector = rsector_tmp; + + return ret; +} /* lvm_map () */ + + +/* + * lvm_map snapshot logical volume support functions + */ + +/* + * end lvm_map snapshot logical volume support functions + */ + + +/* + * internal support functions + */ + +#ifdef LVM_HD_NAME +/* + * generate "hard disk" name + */ +void lvm_hd_name ( char *buf, int minor) { + int len = 0; + + if ( vg[VG_BLK(minor)] == NULL || + vg[VG_BLK(minor)]->lv[LV_BLK(minor)] == NULL) return; + len = lvm_strlen ( vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_name) - 5; + lvm_memcpy ( buf, &vg[VG_BLK(minor)]->lv[LV_BLK(minor)]->lv_name[5], len); + buf[len] = 0; + return; +} +#endif + + +/* + * this one never should be called... + */ +#if LINUX_VERSION_CODE > KERNEL_VERSION ( 2, 3, 30) +static void lvm_dummy_device_request ( request_queue_t *t) +#else +static void lvm_dummy_device_request ( void) +#endif +{ + printk ( KERN_EMERG + "%s -- oops, got lvm request for %02d:%02d [sector: %lu]\n", + lvm_name, + MAJOR ( CURRENT->rq_dev), + MINOR ( CURRENT->rq_dev), + CURRENT->sector); + return; +} + + +/* + * character device support function VGDA create + */ +int do_vg_create ( int minor, void *arg) { + int snaporg_minor = 0; + ulong l, p; + lv_t lv; + vg_t *vg_ptr; + + if ( vg[VG_CHR(minor)] != NULL) return -EPERM; + + if ( ( vg_ptr = kmalloc ( sizeof ( vg_t), GFP_USER)) == NULL) { + printk ( KERN_CRIT + "%s -- VG_CREATE: kmalloc error VG at line %d\n", + lvm_name, __LINE__); + return -ENOMEM; + } + + /* get the volume group structure */ + if ( copy_from_user ( vg_ptr, arg, sizeof ( vg_t)) != 0) { + kfree ( vg_ptr); + return -EFAULT; + } + + /* we are not that active so far... 
*/ + vg_ptr->vg_status &= ~VG_ACTIVE; + vg[VG_CHR(minor)] = vg_ptr; + + vg[VG_CHR(minor)]->pe_allocated = 0; + if ( vg[VG_CHR(minor)]->pv_max > ABS_MAX_PV) { + printk ( KERN_WARNING + "%s -- Can't activate VG: ABS_MAX_PV too small\n", + lvm_name); + kfree ( vg[VG_CHR(minor)]); + vg[VG_CHR(minor)] = NULL; + return -EPERM; + } + if ( vg[VG_CHR(minor)]->lv_max > ABS_MAX_LV) { + printk ( KERN_WARNING + "%s -- Can't activate VG: ABS_MAX_LV too small for %u\n", + lvm_name, vg[VG_CHR(minor)]->lv_max); + kfree ( vg[VG_CHR(minor)]); + vg[VG_CHR(minor)] = NULL; + return -EPERM; + } + + /* get the physical volume structures */ + vg[VG_CHR(minor)]->pv_act = vg[VG_CHR(minor)]->pv_cur = 0; + for ( p = 0; p < vg[VG_CHR(minor)]->pv_max; p++) { + /* user space address */ + if ( ( pvp = vg[VG_CHR(minor)]->pv[p]) != NULL) { + vg[VG_CHR(minor)]->pv[p] = kmalloc ( sizeof ( pv_t), GFP_USER); + if ( vg[VG_CHR(minor)]->pv[p] == NULL) { + printk ( KERN_CRIT + "%s -- VG_CREATE: kmalloc error PV at line %d\n", + lvm_name, __LINE__); + do_vg_remove ( minor); + return -ENOMEM; + } + if ( copy_from_user ( vg[VG_CHR(minor)]->pv[p], pvp, + sizeof ( pv_t)) != 0) { + do_vg_remove ( minor); + return -EFAULT; + } + + /* We don't need the PE list + in kernel space as with the LV pe_t lists (see below) */ + vg[VG_CHR(minor)]->pv[p]->pe = NULL; + vg[VG_CHR(minor)]->pv[p]->pe_allocated = 0; + vg[VG_CHR(minor)]->pv[p]->pv_status = PV_ACTIVE; + vg[VG_CHR(minor)]->pv_act++; + vg[VG_CHR(minor)]->pv_cur++; + +#ifdef LVM_GET_INODE + /* insert a dummy inode for fs_may_mount */ + vg[VG_CHR(minor)]->pv[p]->inode = + lvm_get_inode ( vg[VG_CHR(minor)]->pv[p]->pv_dev); +#endif + } + } + + /* get the logical volume structures */ + vg[VG_CHR(minor)]->lv_cur = 0; + for ( l = 0; l < vg[VG_CHR(minor)]->lv_max; l++) { + /* user space address */ + if ( ( lvp = vg[VG_CHR(minor)]->lv[l]) != NULL) { + if ( copy_from_user ( &lv, lvp, sizeof ( lv_t)) != 0) { + do_vg_remove ( minor); + return -EFAULT; + } + vg[VG_CHR(minor)]->lv[l] = NULL; + { + int err; + + err = do_lv_create(minor, lv.lv_name, &lv); + if (err) + { + do_vg_remove(minor); + return err; + } + } + } + } + + /* Second pass to correct snapshot logical volumes which were not + in place during the first pass above */ + for ( l = 0; l < vg[VG_CHR(minor)]->lv_max; l++) { + if ( vg[VG_CHR(minor)]->lv[l] != NULL && + vg[VG_CHR(minor)]->lv[l]->lv_access & LV_SNAPSHOT) { + snaporg_minor = vg[VG_CHR(minor)]->lv[l]->lv_snapshot_minor; + if ( vg[VG_CHR(minor)]->lv[LV_BLK(snaporg_minor)] != NULL) { + /* get pointer to original logical volume */ + lv_t *lv_ptr = vg[VG_CHR(minor)]->lv[l]->lv_snapshot_org = + vg[VG_CHR(minor)]->lv[LV_BLK(snaporg_minor)]; + + /* set necessary fields of original logical volume */ + lv_ptr->lv_access |= LV_SNAPSHOT_ORG; + lv_ptr->lv_snapshot_minor = 0; + lv_ptr->lv_snapshot_org = lv_ptr; + lv_ptr->lv_snapshot_prev = NULL; + + /* find last snapshot logical volume in the chain */ + while ( lv_ptr->lv_snapshot_next != NULL) + lv_ptr = lv_ptr->lv_snapshot_next; + + /* set back pointer to this last one in our new logical volume */ + vg[VG_CHR(minor)]->lv[l]->lv_snapshot_prev = lv_ptr; + + /* last logical volume now points to our new snapshot volume */ + lv_ptr->lv_snapshot_next = vg[VG_CHR(minor)]->lv[l]; + + /* now point to the new one */ + lv_ptr = lv_ptr->lv_snapshot_next; + + /* set necessary fields of new snapshot logical volume */ + lv_ptr->lv_snapshot_next = NULL; + lv_ptr->lv_current_pe = + vg[VG_CHR(minor)]->lv[LV_BLK(snaporg_minor)]->lv_current_pe;
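+ /* the snapshot shares the origin's LE-to-PE map; lvm_map() remaps + individual chunks only once they have been copied on write */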
+ lv_ptr->lv_allocated_le = + vg[VG_CHR(minor)]->lv[LV_BLK(snaporg_minor)]->lv_allocated_le; + lv_ptr->lv_current_le = + vg[VG_CHR(minor)]->lv[LV_BLK(snaporg_minor)]->lv_current_le; + lv_ptr->lv_size = + vg[VG_CHR(minor)]->lv[LV_BLK(snaporg_minor)]->lv_size; + } + } + } + + vg_count++; + + /* let's go active */ + vg[VG_CHR(minor)]->vg_status |= VG_ACTIVE; + +#ifdef MODULE + MOD_INC_USE_COUNT; +#endif + return 0; +} /* do_vg_create () */ + + +/* + * character device support function VGDA remove + */ +static int do_vg_remove ( int minor) { + int i; + + if ( vg[VG_CHR(minor)] == NULL) return -ENXIO; + +#ifdef LVM_TOTAL_RESET + if ( vg[VG_CHR(minor)]->lv_open > 0 && lvm_reset_spindown == 0) +#else + if ( vg[VG_CHR(minor)]->lv_open > 0) +#endif + return -EPERM; + + /* let's go inactive */ + vg[VG_CHR(minor)]->vg_status &= ~VG_ACTIVE; + + /* free LVs */ + /* first free snapshot logical volumes */ + for ( i = 0; i < vg[VG_CHR(minor)]->lv_max; i++) { + if ( vg[VG_CHR(minor)]->lv[i] != NULL && + vg[VG_CHR(minor)]->lv[i]->lv_access & LV_SNAPSHOT) { + do_lv_remove ( minor, NULL, i); + current->state = TASK_INTERRUPTIBLE; + schedule_timeout ( 1); + } + } + /* then free the rest */ + for ( i = 0; i < vg[VG_CHR(minor)]->lv_max; i++) { + if ( vg[VG_CHR(minor)]->lv[i] != NULL) { + do_lv_remove ( minor, NULL, i); + current->state = TASK_INTERRUPTIBLE; + schedule_timeout ( 1); + } + } + + /* free PVs */ + for ( i = 0; i < vg[VG_CHR(minor)]->pv_max; i++) { + if ( vg[VG_CHR(minor)]->pv[i] != NULL) { +#ifdef DEBUG_VFREE + printk ( KERN_DEBUG + "%s -- kfree %d\n", lvm_name, __LINE__); +#endif +#ifdef LVM_GET_INODE + lvm_clear_inode ( vg[VG_CHR(minor)]->pv[i]->inode); +#endif + kfree ( vg[VG_CHR(minor)]->pv[i]); + vg[VG_CHR(minor)]->pv[i] = NULL; + } + } + +#ifdef DEBUG_VFREE + printk ( KERN_DEBUG "%s -- kfree %d\n", lvm_name, __LINE__); +#endif + kfree ( vg[VG_CHR(minor)]); + vg[VG_CHR(minor)] = NULL; + + vg_count--; + +#ifdef MODULE + MOD_DEC_USE_COUNT; +#endif + return 0; +} /* do_vg_remove () */ + + +/* + * character device support function logical volume create + */ +static int do_lv_create ( int minor, char *lv_name, lv_t *lv) { + int l, le, l_new, p, size; + ulong lv_status_save; + lv_block_exception_t *lvbe = lv->lv_block_exception; + lv_t *lv_ptr = NULL; + + if ( ( pep = lv->lv_current_pe) == NULL) return -EINVAL; + if ( lv->lv_chunk_size > LVM_SNAPSHOT_MAX_CHUNK) return -EINVAL; + + for ( l = 0; l < vg[VG_CHR(minor)]->lv_max; l++) { + if ( vg[VG_CHR(minor)]->lv[l] != NULL && + lvm_strcmp ( vg[VG_CHR(minor)]->lv[l]->lv_name, lv_name) == 0) + return -EEXIST; + } + + /* in case of an lv_remove(), lv_create() pair; e.g. lvrename does this */
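+ /* prefer the slot matching lv->lv_number so the LV keeps its minor + number across a remove/create pair, else take the first free slot */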
+ l_new = -1; + if ( vg[VG_CHR(minor)]->lv[lv->lv_number] == NULL) l_new = lv->lv_number; + else { + for ( l = 0; l < vg[VG_CHR(minor)]->lv_max; l++) { + if ( vg[VG_CHR(minor)]->lv[l] == NULL) if ( l_new == -1) l_new = l; + } + } + if ( l_new == -1) return -EPERM; + l = l_new; + + if ( ( lv_ptr = kmalloc ( sizeof ( lv_t), GFP_USER)) == NULL) { + printk ( KERN_CRIT "%s -- LV_CREATE: kmalloc error LV at line %d\n", + lvm_name, __LINE__); + return -ENOMEM; + } + + /* copy preloaded LV */ + lvm_memcpy ( ( char*) lv_ptr, ( char *) lv, sizeof ( lv_t)); + + lv_status_save = lv_ptr->lv_status; + lv_ptr->lv_status &= ~LV_ACTIVE; + lv_ptr->lv_snapshot_org = \ + lv_ptr->lv_snapshot_prev = \ + lv_ptr->lv_snapshot_next = NULL; + lv_ptr->lv_block_exception = NULL; +#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 3, 4) + lv_ptr->lv_snapshot_sem = MUTEX; +#else + init_MUTEX(&lv_ptr->lv_snapshot_sem); +#endif + vg[VG_CHR(minor)]->lv[l] = lv_ptr; + + /* get the PE structures from user space if this + is not a snapshot logical volume */ + if ( ! ( lv_ptr->lv_access & LV_SNAPSHOT)) { + size = lv_ptr->lv_allocated_le * sizeof ( pe_t); + if ( ( lv_ptr->lv_current_pe = vmalloc ( size)) == NULL) { + printk ( KERN_CRIT + "%s -- LV_CREATE: vmalloc error LV_CURRENT_PE of %d Byte " + "at line %d\n", + lvm_name, size, __LINE__); +#ifdef DEBUG_VFREE + printk ( KERN_DEBUG "%s -- vfree %d\n", lvm_name, __LINE__); +#endif + kfree ( lv_ptr); + vg[VG_CHR(minor)]->lv[l] = NULL; + return -ENOMEM; + } + + if ( copy_from_user ( lv_ptr->lv_current_pe, pep, size)) { + vfree ( lv_ptr->lv_current_pe); + kfree ( lv_ptr); + vg[VG_CHR(minor)]->lv[l] = NULL; + return -EFAULT; + } + + /* correct the PE count in PVs */ + for ( le = 0; le < lv_ptr->lv_allocated_le; le++) { + vg[VG_CHR(minor)]->pe_allocated++; + for ( p = 0; p < vg[VG_CHR(minor)]->pv_cur; p++) { + if ( vg[VG_CHR(minor)]->pv[p]->pv_dev == + lv_ptr->lv_current_pe[le].dev) + vg[VG_CHR(minor)]->pv[p]->pe_allocated++; + } + } + } else { + /* Get snapshot exception data and block list */ + if ( lvbe != NULL) { + lv_ptr->lv_snapshot_org = + vg[VG_CHR(minor)]->lv[LV_BLK(lv_ptr->lv_snapshot_minor)]; + if ( lv_ptr->lv_snapshot_org != NULL) { + size = lv_ptr->lv_remap_end * sizeof ( lv_block_exception_t); + if ( ( lv_ptr->lv_block_exception = vmalloc ( size)) == NULL) { + printk ( KERN_CRIT + "%s -- do_lv_create: vmalloc error LV_BLOCK_EXCEPTION " + "of %d byte at line %d\n", + lvm_name, size, __LINE__); +#ifdef DEBUG_VFREE + printk ( KERN_DEBUG "%s -- vfree %d\n", lvm_name, __LINE__); +#endif + kfree ( lv_ptr); + vg[VG_CHR(minor)]->lv[l] = NULL; + return -ENOMEM; + } + + if ( copy_from_user ( lv_ptr->lv_block_exception, lvbe, size)) { + vfree ( lv_ptr->lv_block_exception); + kfree ( lv_ptr); + vg[VG_CHR(minor)]->lv[l] = NULL; + return -EFAULT; + } + + /* get pointer to original logical volume */ + lv_ptr = lv_ptr->lv_snapshot_org; + + lv_ptr->lv_snapshot_minor = 0; + lv_ptr->lv_snapshot_org = lv_ptr; + lv_ptr->lv_snapshot_prev = NULL; + /* walk through the snapshot list */ + while ( lv_ptr->lv_snapshot_next != NULL) + lv_ptr = lv_ptr->lv_snapshot_next; + /* now lv_ptr points to the last existing snapshot in the chain */ + vg[VG_CHR(minor)]->lv[l]->lv_snapshot_prev = lv_ptr; + /* our new one now points back to the previous last in the chain */ + lv_ptr = vg[VG_CHR(minor)]->lv[l]; + /* now lv_ptr points to our new last snapshot logical volume */ + lv_ptr->lv_snapshot_org = lv_ptr->lv_snapshot_prev->lv_snapshot_org; + lv_ptr->lv_snapshot_next = NULL;
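+ /* the new snapshot is now the tail of the chain; inherit the + origin's extent map and geometry below */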
+ lv_ptr->lv_current_pe = lv_ptr->lv_snapshot_org->lv_current_pe; + lv_ptr->lv_allocated_le = lv_ptr->lv_snapshot_org->lv_allocated_le; + lv_ptr->lv_current_le = lv_ptr->lv_snapshot_org->lv_current_le; + lv_ptr->lv_size = lv_ptr->lv_snapshot_org->lv_size; + lv_ptr->lv_stripes = lv_ptr->lv_snapshot_org->lv_stripes; + lv_ptr->lv_stripesize = lv_ptr->lv_snapshot_org->lv_stripesize; + { + int err; + + err = lvm_snapshot_alloc(lv_ptr); + if (err) + { + vfree(lv_ptr->lv_block_exception); + kfree(lv_ptr); + vg[VG_CHR(minor)]->lv[l] = NULL; + return err; + } + } + } else { + vfree ( lv_ptr->lv_block_exception); + kfree ( lv_ptr); + vg[VG_CHR(minor)]->lv[l] = NULL; + return -EFAULT; + } + } else { + kfree ( vg[VG_CHR(minor)]->lv[l]); + vg[VG_CHR(minor)]->lv[l] = NULL; + return -EINVAL; + } + } /* if ( vg[VG_CHR(minor)]->lv[l]->lv_access & LV_SNAPSHOT) */ + + lv_ptr = vg[VG_CHR(minor)]->lv[l]; + lvm_gendisk.part[MINOR(lv_ptr->lv_dev)].start_sect = 0; + lvm_gendisk.part[MINOR(lv_ptr->lv_dev)].nr_sects = lv_ptr->lv_size; + lvm_size[MINOR(lv_ptr->lv_dev)] = lv_ptr->lv_size >> 1; + vg_lv_map[MINOR(lv_ptr->lv_dev)].vg_number = vg[VG_CHR(minor)]->vg_number; + vg_lv_map[MINOR(lv_ptr->lv_dev)].lv_number = lv_ptr->lv_number; + LVM_CORRECT_READ_AHEAD ( lv_ptr->lv_read_ahead); + read_ahead[MAJOR_NR] = lv_ptr->lv_read_ahead; + vg[VG_CHR(minor)]->lv_cur++; + lv_ptr->lv_status = lv_status_save; + + /* optionally add our new snapshot LV */ + if ( lv_ptr->lv_access & LV_SNAPSHOT) { + /* sync the original logical volume */ + fsync_dev ( lv_ptr->lv_snapshot_org->lv_dev); + /* put ourselves into the chain */ + lv_ptr->lv_snapshot_prev->lv_snapshot_next = lv_ptr; + lv_ptr->lv_snapshot_org->lv_access |= LV_SNAPSHOT_ORG; + } + + return 0; +} /* do_lv_create () */ + + +/* + * character device support function logical volume remove + */ +static int do_lv_remove ( int minor, char *lv_name, int l) { + uint le, p; + lv_t *lv_ptr; + + if ( l == -1) { + for ( l = 0; l < vg[VG_CHR(minor)]->lv_max; l++) { + if ( vg[VG_CHR(minor)]->lv[l] != NULL && + lvm_strcmp ( vg[VG_CHR(minor)]->lv[l]->lv_name, lv_name) == 0) { + break; + } + } + } + + if ( l < vg[VG_CHR(minor)]->lv_max) { + lv_ptr = vg[VG_CHR(minor)]->lv[l]; +#ifdef LVM_TOTAL_RESET + if ( lv_ptr->lv_open > 0 && lvm_reset_spindown == 0) +#else + if ( lv_ptr->lv_open > 0) +#endif + return -EBUSY; + + /* check for deletion of snapshot source while + snapshot volume still exists */ + if ( ( lv_ptr->lv_access & LV_SNAPSHOT_ORG) && + lv_ptr->lv_snapshot_next != NULL) + return -EPERM; + + lv_ptr->lv_status |= LV_SPINDOWN; + + /* sync the buffers */ + fsync_dev ( lv_ptr->lv_dev); + + lv_ptr->lv_status &= ~LV_ACTIVE; + + /* invalidate the buffers */ + invalidate_buffers ( lv_ptr->lv_dev); + + /* reset generic hd */ + lvm_gendisk.part[MINOR(lv_ptr->lv_dev)].start_sect = -1; + lvm_gendisk.part[MINOR(lv_ptr->lv_dev)].nr_sects = 0; + lvm_size[MINOR(lv_ptr->lv_dev)] = 0; + + /* reset VG/LV mapping */ + vg_lv_map[MINOR(lv_ptr->lv_dev)].vg_number = ABS_MAX_VG; + vg_lv_map[MINOR(lv_ptr->lv_dev)].lv_number = -1; + + /* correct the PE count in PVs if this is not a snapshot logical volume */ + if ( !
( lv_ptr->lv_access & LV_SNAPSHOT)) { + /* only if this is no snapshot logical volume because we share + the lv_current_pe[] structs with the original logical volume */ + for ( le = 0; le < lv_ptr->lv_allocated_le; le++) { + vg[VG_CHR(minor)]->pe_allocated--; + for ( p = 0; p < vg[VG_CHR(minor)]->pv_cur; p++) { + if ( vg[VG_CHR(minor)]->pv[p]->pv_dev == + lv_ptr->lv_current_pe[le].dev) + vg[VG_CHR(minor)]->pv[p]->pe_allocated--; + } + } + vfree ( lv_ptr->lv_current_pe); + /* LV_SNAPSHOT */ + } else { +/* + if ( lv_ptr->lv_block_exception != NULL) { + int i; + kdev_t last_dev; + for ( i = last_dev = 0; i < lv_ptr->lv_remap_ptr; i++) { + if ( lv_ptr->lv_block_exception[i].rdev_new != last_dev) { + last_dev = lv_ptr->lv_block_exception[i].rdev_new; + invalidate_buffers ( last_dev); + current->state = TASK_INTERRUPTIBLE; + schedule_timeout ( 1); + } + } + } +*/ + /* remove this snapshot logical volume from the chain */ + lv_ptr->lv_snapshot_prev->lv_snapshot_next = lv_ptr->lv_snapshot_next; + if ( lv_ptr->lv_snapshot_next != NULL) { + lv_ptr->lv_snapshot_next->lv_snapshot_prev = + lv_ptr->lv_snapshot_prev; + } + /* no more snapshots? */ + if ( lv_ptr->lv_snapshot_org->lv_snapshot_next == NULL) + lv_ptr->lv_snapshot_org->lv_access &= ~LV_SNAPSHOT_ORG; + lvm_snapshot_release(lv_ptr); + } + +#ifdef DEBUG_VFREE + printk ( KERN_DEBUG "%s -- kfree %d\n", lvm_name, __LINE__); +#endif + kfree ( lv_ptr); + vg[VG_CHR(minor)]->lv[l] = NULL; + vg[VG_CHR(minor)]->lv_cur--; + return 0; + } + + return -ENXIO; +} /* do_lv_remove () */ + + +/* + * character device support function logical volume extend / reduce + */ +static int do_lv_extend_reduce ( int minor, char *lv_name, lv_t *lv) { + int l, le, p, size, old_allocated_le; + uint32_t end, lv_status_save; + pe_t *pe; + + if ( ( pep = lv->lv_current_pe) == NULL) return -EINVAL; + + for ( l = 0; l < vg[VG_CHR(minor)]->lv_max; l++) { + if ( vg[VG_CHR(minor)]->lv[l] != NULL && + lvm_strcmp ( vg[VG_CHR(minor)]->lv[l]->lv_name, lv_name) == 0) + break; + } + if ( l == vg[VG_CHR(minor)]->lv_max) return -ENXIO; + + /* check for active snapshot */ + if ( lv->lv_access & ( LV_SNAPSHOT|LV_SNAPSHOT_ORG)) return -EPERM; + + if ( ( pe = vmalloc ( size = lv->lv_current_le * sizeof ( pe_t))) == NULL) { + printk ( KERN_CRIT + "%s -- do_lv_extend_reduce: vmalloc error LV_CURRENT_PE " + "of %d Byte at line %d\n", + lvm_name, size, __LINE__); + return -ENOMEM; + } + + /* get the PE structures from user space */ + if ( copy_from_user ( pe, pep, size)) { + vfree ( pe); + return -EFAULT; + } + +#ifdef DEBUG + printk ( KERN_DEBUG + "%s -- fsync_dev and " + "invalidate_buffers for %s [%s] in %s\n", + lvm_name, vg[VG_CHR(minor)]->lv[l]->lv_name, + kdevname ( vg[VG_CHR(minor)]->lv[l]->lv_dev), + vg[VG_CHR(minor)]->vg_name); +#endif + + vg[VG_CHR(minor)]->lv[l]->lv_status |= LV_SPINDOWN; + fsync_dev ( vg[VG_CHR(minor)]->lv[l]->lv_dev); + vg[VG_CHR(minor)]->lv[l]->lv_status &= ~LV_ACTIVE; + invalidate_buffers ( vg[VG_CHR(minor)]->lv[l]->lv_dev); + + /* reduce allocation counters on PV(s) */ + for ( le = 0; le < vg[VG_CHR(minor)]->lv[l]->lv_allocated_le; le++) { + vg[VG_CHR(minor)]->pe_allocated--; + for ( p = 0; p < vg[VG_CHR(minor)]->pv_cur; p++) { + if ( vg[VG_CHR(minor)]->pv[p]->pv_dev == + vg[VG_CHR(minor)]->lv[l]->lv_current_pe[le].dev) { + vg[VG_CHR(minor)]->pv[p]->pe_allocated--; + break; + } + } + } + +#ifdef DEBUG_VFREE + printk ( KERN_DEBUG "%s -- vfree %d\n", lvm_name, __LINE__); +#endif + + /* save pointer to "old" lv/pe pointer array */ + pep1 = 
vg[VG_CHR(minor)]->lv[l]->lv_current_pe; + end = vg[VG_CHR(minor)]->lv[l]->lv_current_le; + + /* save open counter */ + lv_open = vg[VG_CHR(minor)]->lv[l]->lv_open; + + /* save # of old allocated logical extents */ + old_allocated_le = vg[VG_CHR(minor)]->lv[l]->lv_allocated_le; + + /* copy preloaded LV */ + lv_status_save = lv->lv_status; + lv->lv_status |= LV_SPINDOWN; + lv->lv_status &= ~LV_ACTIVE; + lvm_memcpy ( ( char*) vg[VG_CHR(minor)]->lv[l], ( char*) lv, sizeof ( lv_t)); + vg[VG_CHR(minor)]->lv[l]->lv_current_pe = pe; + vg[VG_CHR(minor)]->lv[l]->lv_open = lv_open; + + /* save available i/o statistics data */ + /* linear logical volume */ + if ( vg[VG_CHR(minor)]->lv[l]->lv_stripes < 2) { + /* Check which last LE shall be used */ + if ( end > vg[VG_CHR(minor)]->lv[l]->lv_current_le) + end = vg[VG_CHR(minor)]->lv[l]->lv_current_le; + for ( le = 0; le < end; le++) { + vg[VG_CHR(minor)]->lv[l]->lv_current_pe[le].reads = pep1[le].reads; + vg[VG_CHR(minor)]->lv[l]->lv_current_pe[le].writes = pep1[le].writes; + } + /* striped logical volume */ + } else { + uint i, j, source, dest, end, old_stripe_size, new_stripe_size; + + old_stripe_size = old_allocated_le / vg[VG_CHR(minor)]->lv[l]->lv_stripes; + new_stripe_size = vg[VG_CHR(minor)]->lv[l]->lv_allocated_le / + vg[VG_CHR(minor)]->lv[l]->lv_stripes; + end = old_stripe_size; + if ( end > new_stripe_size) end = new_stripe_size; + for ( i = source = dest = 0; + i < vg[VG_CHR(minor)]->lv[l]->lv_stripes; i++) { + for ( j = 0; j < end; j++) { + vg[VG_CHR(minor)]->lv[l]->lv_current_pe[dest+j].reads = + pep1[source+j].reads; + vg[VG_CHR(minor)]->lv[l]->lv_current_pe[dest+j].writes = + pep1[source+j].writes; + } + source += old_stripe_size; + dest += new_stripe_size; + } + } + vfree ( pep1); pep1 = NULL; + + + /* extend the PE count in PVs */ + for ( le = 0; le < vg[VG_CHR(minor)]->lv[l]->lv_allocated_le; le++) { + vg[VG_CHR(minor)]->pe_allocated++; + for ( p = 0; p < vg[VG_CHR(minor)]->pv_cur; p++) { + if ( vg[VG_CHR(minor)]->pv[p]->pv_dev == + vg[VG_CHR(minor)]->lv[l]->lv_current_pe[le].dev) { + vg[VG_CHR(minor)]->pv[p]->pe_allocated++; + break; + } + } + } + + lvm_gendisk.part[MINOR(vg[VG_CHR(minor)]->lv[l]->lv_dev)].start_sect = 0; + lvm_gendisk.part[MINOR(vg[VG_CHR(minor)]->lv[l]->lv_dev)].nr_sects = + vg[VG_CHR(minor)]->lv[l]->lv_size; + lvm_size[MINOR(vg[VG_CHR(minor)]->lv[l]->lv_dev)] = + vg[VG_CHR(minor)]->lv[l]->lv_size >> 1; + /* vg_lv_map array doesn't have to be changed here */ + + LVM_CORRECT_READ_AHEAD ( vg[VG_CHR(minor)]->lv[l]->lv_read_ahead); + read_ahead[MAJOR_NR] = vg[VG_CHR(minor)]->lv[l]->lv_read_ahead; + vg[VG_CHR(minor)]->lv[l]->lv_status = lv_status_save; + + return 0; +} /* do_lv_extend_reduce () */ + + +/* + * support function initialize gendisk variables + */ +#ifdef __initfunc +__initfunc ( void lvm_geninit ( struct gendisk *lvm_gdisk)) +#else +void __init lvm_geninit ( struct gendisk *lvm_gdisk) +#endif +{ + int i = 0; + +#ifdef DEBUG_GENDISK + printk ( KERN_DEBUG "%s -- lvm_gendisk\n", lvm_name); +#endif + + for ( i = 0; i < MAX_LV; i++) { + lvm_gendisk.part[i].start_sect = -1; /* avoid partition check */ + lvm_size[i] = lvm_gendisk.part[i].nr_sects = 0; + lvm_blocksizes[i] = BLOCK_SIZE; + } + + blksize_size[MAJOR_NR] = lvm_blocksizes; + blk_size[MAJOR_NR] = lvm_size; + + return; +} /* lvm_geninit () */ + + +#ifdef LVM_GET_INODE +/* + * support function to get an empty inode + * + * Gets an empty inode to be inserted into the inode hash, + * so that a physical volume can't be mounted.
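+ * (The dummy inode sits in the inode hash, so the fs_may_mount() + * check sees the device as busy and refuses to mount the PV.)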
+ * This is analog to drivers/block/md.c + * + * Is this the real thing? + * + */ +struct inode *lvm_get_inode ( int dev) { + struct inode *inode_this = NULL; + + /* Lock the device by inserting a dummy inode. */ + inode_this = get_empty_inode (); + inode_this->i_dev = dev; + insert_inode_hash ( inode_this); + return inode_this; +} + + +/* + * support function to clear an inode + * + */ +void lvm_clear_inode ( struct inode *inode) { +#ifdef I_FREEING + inode->i_state |= I_FREEING; +#endif + clear_inode ( inode); + return; +} +#endif /* #ifdef LVM_GET_INODE */ + + +/* my strlen */ +inline int lvm_strlen ( char *s1) { + int len = 0; + + while ( s1[len] != 0) len++; + return len; +} + + +/* my strcmp */ +inline int lvm_strcmp ( char *s1, char *s2) { + while ( *s1 != 0 && *s2 != 0) { + if ( *s1 != *s2) return -1; + s1++; s2++; + } + if ( *s1 == 0 && *s2 == 0) return 0; + return -1; +} + + +/* my strrchr */ +inline char *lvm_strrchr ( char *s1, char c) { + char *s2 = NULL; + + while ( *s1 != 0) { + if ( *s1 == c) s2 = s1; + s1++; + } + return s2; +} + + +/* my memcpy */ +inline void lvm_memcpy ( char *dest, char *source, int size) { + for ( ;size > 0; size--) *dest++ = *source++; +} diff -urN 2.2.17pre19/drivers/block/rd.c 2.2.17pre19aa2/drivers/block/rd.c --- 2.2.17pre19/drivers/block/rd.c Tue Aug 22 14:53:39 2000 +++ 2.2.17pre19aa2/drivers/block/rd.c Sat Aug 26 17:13:52 2000 @@ -173,7 +173,7 @@ if (CURRENT->cmd == READ) memset(CURRENT->buffer, 0, len); else - set_bit(BH_Protected, &CURRENT->bh->b_state); + mark_buffer_protected(CURRENT->bh); end_request(1); goto repeat; diff -urN 2.2.17pre19/drivers/char/Config.in 2.2.17pre19aa2/drivers/char/Config.in --- 2.2.17pre19/drivers/char/Config.in Tue Jun 13 03:48:13 2000 +++ 2.2.17pre19aa2/drivers/char/Config.in Sat Aug 26 17:13:50 2000 @@ -117,6 +117,9 @@ tristate '/dev/nvram support' CONFIG_NVRAM bool 'Enhanced Real Time Clock Support' CONFIG_RTC +if [ "$CONFIG_RTC" = "y" -a "$ARCH" = "alpha" ]; then + bool ' Use only lightweight version (no interrupts)' CONFIG_RTC_LIGHT +fi if [ "$CONFIG_ALPHA_BOOK1" = "y" ]; then bool 'Tadpole ANA H8 Support' CONFIG_H8 fi diff -urN 2.2.17pre19/drivers/char/Makefile 2.2.17pre19aa2/drivers/char/Makefile --- 2.2.17pre19/drivers/char/Makefile Tue Aug 22 14:53:39 2000 +++ 2.2.17pre19aa2/drivers/char/Makefile Sat Aug 26 17:13:50 2000 @@ -20,7 +20,7 @@ L_TARGET := char.a M_OBJS := -L_OBJS := tty_io.o n_tty.o tty_ioctl.o mem.o random.o +L_OBJS := tty_io.o n_tty.o tty_ioctl.o mem.o random.o raw.o LX_OBJS := pty.o misc.o ifdef CONFIG_VT diff -urN 2.2.17pre19/drivers/char/mem.c 2.2.17pre19aa2/drivers/char/mem.c --- 2.2.17pre19/drivers/char/mem.c Sun Apr 2 21:07:48 2000 +++ 2.2.17pre19aa2/drivers/char/mem.c Sat Aug 26 17:13:50 2000 @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -620,6 +621,7 @@ if (register_chrdev(MEM_MAJOR,"mem",&memory_fops)) printk("unable to get major %d for memory devs\n", MEM_MAJOR); rand_initialize(); + raw_init(); #ifdef CONFIG_USB #ifdef CONFIG_USB_UHCI uhci_init(); diff -urN 2.2.17pre19/drivers/char/raw.c 2.2.17pre19aa2/drivers/char/raw.c --- 2.2.17pre19/drivers/char/raw.c Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/drivers/char/raw.c Sat Aug 26 17:13:50 2000 @@ -0,0 +1,387 @@ +/* + * linux/drivers/char/raw.c + * + * Front-end raw character devices. These can be bound to any block + * devices to provide genuine Unix raw character device semantics. + * + * We reserve minor number 0 for a control interface. 
ioctl()s on this + * device are used to bind the other minor numbers to block devices. + */ + +#include +#include +#include +#include +#include +#include + +#define dprintk(x...) + +static kdev_t raw_device_bindings[256] = {}; +static int raw_device_inuse[256] = {}; +static int raw_device_sector_size[256] = {}; +static int raw_device_sector_bits[256] = {}; + +extern struct file_operations * get_blkfops(unsigned int major); + +static ssize_t rw_raw_dev(int rw, struct file *, char *, size_t, loff_t *); + +ssize_t raw_read(struct file *, char *, size_t, loff_t *); +ssize_t raw_write(struct file *, const char *, size_t, loff_t *); +int raw_open(struct inode *, struct file *); +int raw_release(struct inode *, struct file *); +int raw_ctl_ioctl(struct inode *, struct file *, unsigned int, unsigned long); + + +static struct file_operations raw_fops = { + NULL, /* llseek */ + raw_read, /* read */ + raw_write, /* write */ + NULL, /* readdir */ + NULL, /* poll */ + NULL, /* ioctl */ + NULL, /* mmap */ + raw_open, /* open */ + NULL, /* flush */ + raw_release, /* release */ + NULL /* fsync */ +}; + +static struct file_operations raw_ctl_fops = { + NULL, /* llseek */ + NULL, /* read */ + NULL, /* write */ + NULL, /* readdir */ + NULL, /* poll */ + raw_ctl_ioctl, /* ioctl */ + NULL, /* mmap */ + raw_open, /* open */ + NULL, /* flush */ + NULL, /* no special release code */ + NULL /* fsync */ +}; + + + +void __init raw_init(void) +{ + register_chrdev(RAW_MAJOR, "raw", &raw_fops); +} + + +/* + * The raw IO open and release code needs to fake appropriate + * open/release calls to the underlying block devices. + */ + +static int bdev_open(kdev_t dev, int mode) +{ + int err = 0; + struct file dummy_file = {}; + struct dentry dummy_dentry = {}; + struct inode * inode = get_empty_inode(); + + if (!inode) + return -ENOMEM; + + dummy_file.f_op = get_blkfops(MAJOR(dev)); + if (!dummy_file.f_op) { + err = -ENODEV; + goto done; + } + + if (dummy_file.f_op->open) { + inode->i_rdev = dev; + dummy_dentry.d_inode = inode; + dummy_file.f_dentry = &dummy_dentry; + dummy_file.f_mode = mode; + err = dummy_file.f_op->open(inode, &dummy_file); + } + + done: + iput(inode); + return err; +} + +static int bdev_close(kdev_t dev) +{ + int err; + struct inode * inode = get_empty_inode(); + + if (!inode) + return -ENOMEM; + + inode->i_rdev = dev; + err = blkdev_release(inode); + iput(inode); + return err; +} + + + +/* + * Open/close code for raw IO. + */ + +int raw_open(struct inode *inode, struct file *filp) +{ + int minor; + kdev_t bdev; + int err; + int sector_size; + int sector_bits; + + minor = MINOR(inode->i_rdev); + + /* + * Is it the control device? + */ + + if (minor == 0) { + filp->f_op = &raw_ctl_fops; + return 0; + } + + /* + * No, it is a normal raw device. All we need to do on open is + * to check that the device is bound, and force the underlying + * block device to a sector-size blocksize. + */ + + bdev = raw_device_bindings[minor]; + if (bdev == NODEV) + return -ENODEV; + + err = bdev_open(bdev, filp->f_mode); + if (err) + return err; + + /* + * Don't change the blocksize if we already have users using + * this device + */ + + if (raw_device_inuse[minor]++) + return 0; + + /* + * Don't interfere with mounted devices: we cannot safely set + * the blocksize on a device which is already mounted. 
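+ * For a mounted device we therefore keep the filesystem's soft + * blocksize from blksize_size[]; otherwise we take the hardware + * sector size from hardsect_size[], defaulting to 512 bytes.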
+ */ + + sector_size = 512; + if (lookup_vfsmnt(bdev) != NULL) { + if (blksize_size[MAJOR(bdev)]) + sector_size = blksize_size[MAJOR(bdev)][MINOR(bdev)]; + } else { + if (hardsect_size[MAJOR(bdev)]) + sector_size = hardsect_size[MAJOR(bdev)][MINOR(bdev)]; + } + + set_blocksize(bdev, sector_size); + raw_device_sector_size[minor] = sector_size; + + for (sector_bits = 0; !(sector_size & 1); ) + sector_size>>=1, sector_bits++; + raw_device_sector_bits[minor] = sector_bits; + + return 0; +} + +int raw_release(struct inode *inode, struct file *filp) +{ + int minor; + kdev_t bdev; + + minor = MINOR(inode->i_rdev); + bdev = raw_device_bindings[minor]; + bdev_close(bdev); + raw_device_inuse[minor]--; + return 0; +} + + + +/* + * Deal with ioctls against the raw-device control interface, to bind + * and unbind other raw devices. + */ + +int raw_ctl_ioctl(struct inode *inode, + struct file *filp, + unsigned int command, + unsigned long arg) +{ + struct raw_config_request rq; + int err = 0; + int minor; + + switch (command) { + case RAW_SETBIND: + case RAW_GETBIND: + + /* First, find out which raw minor we want */ + + err = copy_from_user(&rq, (void *) arg, sizeof(rq)); + if (err) + break; + + minor = rq.raw_minor; + if (minor == 0 || minor > MINORMASK) { + err = -EINVAL; + break; + } + + if (command == RAW_SETBIND) { + /* + * For now, we don't need to check that the underlying + * block device is present or not: we can do that when + * the raw device is opened. Just check that the + * major/minor numbers make sense. + */ + + if (rq.block_major == NODEV || + rq.block_major > MAX_BLKDEV || + rq.block_minor > MINORMASK) { + err = -EINVAL; + break; + } + + if (raw_device_inuse[minor]) { + err = -EBUSY; + break; + } + raw_device_bindings[minor] = + MKDEV(rq.block_major, rq.block_minor); + } else { + rq.block_major = MAJOR(raw_device_bindings[minor]); + rq.block_minor = MINOR(raw_device_bindings[minor]); + err = copy_to_user((void *) arg, &rq, sizeof(rq)); + } + break; + + default: + err = -EINVAL; + } + + return err; +} + + + +ssize_t raw_read(struct file *filp, char * buf, + size_t size, loff_t *offp) +{ + return rw_raw_dev(READ, filp, buf, size, offp); +} + +ssize_t raw_write(struct file *filp, const char *buf, + size_t size, loff_t *offp) +{ + return rw_raw_dev(WRITE, filp, (char *) buf, size, offp); +} + +#define SECTOR_BITS 9 +#define SECTOR_SIZE (1U << SECTOR_BITS) +#define SECTOR_MASK (SECTOR_SIZE - 1) + +ssize_t rw_raw_dev(int rw, struct file *filp, char *buf, + size_t size, loff_t *offp) +{ + struct kiobuf * iobuf; + int err; + unsigned long blocknr, blocks; + unsigned long b[KIO_MAX_SECTORS]; + size_t transferred; + int iosize; + int i; + int minor; + kdev_t dev; + unsigned long limit; + + int sector_size, sector_bits, sector_mask; + int max_sectors; + + /* + * First, a few checks on device size limits + */ + + minor = MINOR(filp->f_dentry->d_inode->i_rdev); + dev = raw_device_bindings[minor]; + sector_size = raw_device_sector_size[minor]; + sector_bits = raw_device_sector_bits[minor]; + sector_mask = sector_size - 1; + max_sectors = KIO_MAX_SECTORS >> (sector_bits - 9); + + if (blk_size[MAJOR(dev)]) + limit = (((loff_t) blk_size[MAJOR(dev)][MINOR(dev)]) << BLOCK_SIZE_BITS) >> sector_bits; + else + limit = INT_MAX; + dprintk ("rw_raw_dev: dev %d:%d (+%lu)\n", + MAJOR(dev), MINOR(dev), limit); + + if ((*offp & sector_mask) || (size & sector_mask)) + return -EINVAL; + if ((*offp >> sector_bits) >= limit) { + if (size) + return -ENXIO; + return 0; + } + + /* + * We'll just use one kiobuf + */ +
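+ /* Each pass of the loop below maps the user pages into the kiobuf, + submits up to max_sectors blocks via brw_kiovec() and unmaps again; + a short transfer ends the loop and we return the bytes moved so far. */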
= alloc_kiovec(1, &iobuf); + if (err) + return err; + + /* + * Split the IO into KIO_MAX_SECTORS chunks, mapping and + * unmapping the single kiobuf as we go to perform each chunk of + * IO. + */ + + transferred = 0; + blocknr = *offp >> sector_bits; + while (size > 0) { + blocks = size >> sector_bits; + if (blocks > max_sectors) + blocks = max_sectors; + if (blocks > limit - blocknr) + blocks = limit - blocknr; + if (!blocks) + break; + + iosize = blocks << sector_bits; + + err = map_user_kiobuf(rw, iobuf, (unsigned long) buf, iosize); + if (err) + break; + + for (i=0; i < blocks; i++) + b[i] = blocknr++; + + err = brw_kiovec(rw, 1, &iobuf, dev, b, sector_size); + + if (err >= 0) { + transferred += err; + size -= err; + buf += err; + } + + unmap_kiobuf(iobuf); + + if (err != iosize) + break; + } + + free_kiovec(1, &iobuf); + + if (transferred) { + *offp += transferred; + return transferred; + } + + return err; +} diff -urN 2.2.17pre19/drivers/char/rtc.c 2.2.17pre19aa2/drivers/char/rtc.c --- 2.2.17pre19/drivers/char/rtc.c Sun Jan 2 18:26:37 2000 +++ 2.2.17pre19aa2/drivers/char/rtc.c Sat Aug 26 17:13:50 2000 @@ -113,13 +113,19 @@ unsigned char days_in_mo[] = {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}; +extern spinlock_t rtc_lock; + /* * A very tiny interrupt handler. It runs with SA_INTERRUPT set, - * so that there is no possibility of conflicting with the - * set_rtc_mmss() call that happens during some timer interrupts. + * but there is a possibility of conflicting with the set_rtc_mmss() + * call (the rtc irq and the timer irq can easily run at the same + * time on two different CPUs). So we need to serialize + * accesses to the chip with the rtc_lock spinlock that each + * architecture should implement in the timer code. * (See ./arch/XXXX/kernel/time.c for the set_rtc_mmss() function.) */ +#ifndef CONFIG_RTC_LIGHT static void rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs) { /* @@ -131,12 +137,16 @@ rtc_irq_data += 0x100; rtc_irq_data &= ~0xff; + /* runs with irq locally disabled (see SA_INTERRUPT flag). */ + spin_lock(&rtc_lock); rtc_irq_data |= (CMOS_READ(RTC_INTR_FLAGS) & 0xF0); + spin_unlock(&rtc_lock); wake_up_interruptible(&rtc_wait); if (rtc_status & RTC_TIMER_ON) mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100); } +#endif /* * Now all the various file operations that we export. @@ -150,6 +160,9 @@ static ssize_t rtc_read(struct file *file, char *buf, size_t count, loff_t *ppos) { +#ifdef CONFIG_RTC_LIGHT + return -EIO; +#else struct wait_queue wait = { current, NULL }; unsigned long data; ssize_t retval; @@ -181,6 +194,7 @@ remove_wait_queue(&rtc_wait, &wait); return retval; +#endif } static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd, @@ -191,6 +205,7 @@ struct rtc_time wtime; switch (cmd) { +#ifndef CONFIG_RTC_LIGHT case RTC_AIE_OFF: /* Mask alarm int. enab.
bit */ { mask_rtc_irq_bit(RTC_AIE); @@ -238,6 +253,7 @@ set_rtc_irq_bit(RTC_UIE); return 0; } +#endif case RTC_ALM_READ: /* Read the present alarm time */ { /* @@ -276,8 +292,7 @@ if (sec >= 60) sec = 0xff; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); if (!(CMOS_READ(RTC_CONTROL) & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { @@ -288,7 +303,7 @@ CMOS_WRITE(hrs, RTC_HOURS_ALARM); CMOS_WRITE(min, RTC_MINUTES_ALARM); CMOS_WRITE(sec, RTC_SECONDS_ALARM); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); return 0; } @@ -336,12 +351,11 @@ if ((yrs -= epoch) > 255) /* They are unsigned */ return -EINVAL; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); if (!(CMOS_READ(RTC_CONTROL) & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { if (yrs > 169) { - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); return -EINVAL; } if (yrs >= 100) @@ -370,13 +384,14 @@ CMOS_WRITE(save_control, RTC_CONTROL); CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); return 0; } case RTC_IRQP_READ: /* Read the periodic IRQ rate. */ { return put_user(rtc_freq, (unsigned long *)arg); } +#ifndef CONFIG_RTC_LIGHT case RTC_IRQP_SET: /* Set periodic IRQ rate. */ { int tmp = 0; @@ -405,14 +420,14 @@ rtc_freq = arg; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); val = CMOS_READ(RTC_FREQ_SELECT) & 0xf0; val |= (16 - tmp); CMOS_WRITE(val, RTC_FREQ_SELECT); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); return 0; } +#endif #ifdef __alpha__ case RTC_EPOCH_READ: /* Read the epoch. */ { @@ -462,18 +477,18 @@ * in use, and clear the data. */ +#ifndef CONFIG_RTC_LIGHT unsigned char tmp; unsigned long flags; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); tmp = CMOS_READ(RTC_CONTROL); tmp &= ~RTC_PIE; tmp &= ~RTC_AIE; tmp &= ~RTC_UIE; CMOS_WRITE(tmp, RTC_CONTROL); CMOS_READ(RTC_INTR_FLAGS); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); if (rtc_status & RTC_TIMER_ON) { rtc_status &= ~RTC_TIMER_ON; @@ -481,10 +496,12 @@ } rtc_irq_data = 0; +#endif rtc_status &= ~RTC_IS_OPEN; return 0; } +#ifndef CONFIG_RTC_LIGHT static unsigned int rtc_poll(struct file *file, poll_table *wait) { poll_wait(file, &rtc_wait, wait); @@ -492,6 +509,7 @@ return POLLIN | POLLRDNORM; return 0; } +#endif /* * The various file operations we support. @@ -502,7 +520,11 @@ rtc_read, NULL, /* No write */ NULL, /* No readdir */ +#ifdef CONFIG_RTC_LIGHT + NULL, +#else rtc_poll, +#endif rtc_ioctl, NULL, /* No mmap */ rtc_open, @@ -526,12 +548,14 @@ char *guess = NULL; #endif printk(KERN_INFO "Real Time Clock Driver v%s\n", RTC_VERSION); +#ifndef CONFIG_RTC_LIGHT if(request_irq(RTC_IRQ, rtc_interrupt, SA_INTERRUPT, "rtc", NULL)) { /* Yeah right, seeing as irq 8 doesn't even hit the bus. */ printk(KERN_ERR "rtc: IRQ %d is not free.\n", RTC_IRQ); return -EIO; } +#endif misc_register(&rtc_dev); /* Check region? Naaah! Just snarf it up. */ request_region(RTC_PORT(0), RTC_IO_EXTENT, "rtc"); @@ -546,11 +570,10 @@ while (jiffies - uip_watchdog < 2*HZ/100) barrier(); - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); year = CMOS_READ(RTC_YEAR); ctrl = CMOS_READ(RTC_CONTROL); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD) BCD_TO_BIN(year); /* This should never happen... 
*/ @@ -565,14 +588,15 @@ if (guess) printk("rtc: %s epoch (%lu) detected\n", guess, epoch); #endif +#ifndef CONFIG_RTC_LIGHT init_timer(&rtc_irq_timer); rtc_irq_timer.function = rtc_dropped_irq; rtc_wait = NULL; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); /* Initialize periodic freq. to CMOS reset default, which is 1024Hz */ CMOS_WRITE(((CMOS_READ(RTC_FREQ_SELECT) & 0xF0) | 0x06), RTC_FREQ_SELECT); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); +#endif rtc_freq = 1024; return 0; } @@ -589,6 +613,7 @@ * for something that requires a steady > 1KHz signal anyways.) */ +#ifndef CONFIG_RTC_LIGHT void rtc_dropped_irq(unsigned long data) { unsigned long flags; @@ -596,13 +621,13 @@ printk(KERN_INFO "rtc: lost some interrupts at %ldHz.\n", rtc_freq); mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100); - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); rtc_irq_data += ((rtc_freq/HZ)<<8); rtc_irq_data &= ~0xff; rtc_irq_data |= (CMOS_READ(RTC_INTR_FLAGS) & 0xF0); /* restart */ - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); } +#endif /* * Info exported via "/proc/rtc". @@ -615,11 +640,10 @@ unsigned char batt, ctrl; unsigned long flags; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); batt = CMOS_READ(RTC_VALID) & RTC_VRT; ctrl = CMOS_READ(RTC_CONTROL); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); p = buf; @@ -690,10 +714,9 @@ unsigned long flags; unsigned char uip; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); uip = (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); return uip; } @@ -723,8 +746,7 @@ * RTC has RTC_DAY_OF_WEEK, we ignore it, as it is only updated * by the RTC when initially set to a non-zero value. */ - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); rtc_tm->tm_sec = CMOS_READ(RTC_SECONDS); rtc_tm->tm_min = CMOS_READ(RTC_MINUTES); rtc_tm->tm_hour = CMOS_READ(RTC_HOURS); @@ -732,7 +754,7 @@ rtc_tm->tm_mon = CMOS_READ(RTC_MONTH); rtc_tm->tm_year = CMOS_READ(RTC_YEAR); ctrl = CMOS_READ(RTC_CONTROL); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { @@ -763,13 +785,12 @@ * Only the values that we read from the RTC are set. That * means only tm_hour, tm_min, and tm_sec. */ - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); alm_tm->tm_sec = CMOS_READ(RTC_SECONDS_ALARM); alm_tm->tm_min = CMOS_READ(RTC_MINUTES_ALARM); alm_tm->tm_hour = CMOS_READ(RTC_HOURS_ALARM); ctrl = CMOS_READ(RTC_CONTROL); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { @@ -789,18 +810,18 @@ * meddles with the interrupt enable/disable bits. 
*/ +#ifndef CONFIG_RTC_LIGHT void mask_rtc_irq_bit(unsigned char bit) { unsigned char val; unsigned long flags; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); val = CMOS_READ(RTC_CONTROL); val &= ~bit; CMOS_WRITE(val, RTC_CONTROL); CMOS_READ(RTC_INTR_FLAGS); - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); rtc_irq_data = 0; } @@ -809,12 +830,12 @@ unsigned char val; unsigned long flags; - save_flags(flags); - cli(); + spin_lock_irqsave(&rtc_lock, flags); val = CMOS_READ(RTC_CONTROL); val |= bit; CMOS_WRITE(val, RTC_CONTROL); CMOS_READ(RTC_INTR_FLAGS); rtc_irq_data = 0; - restore_flags(flags); + spin_unlock_irqrestore(&rtc_lock, flags); } +#endif diff -urN 2.2.17pre19/fs/Makefile 2.2.17pre19aa2/fs/Makefile --- 2.2.17pre19/fs/Makefile Thu Aug 26 14:20:19 1999 +++ 2.2.17pre19aa2/fs/Makefile Sat Aug 26 17:13:50 2000 @@ -13,7 +13,7 @@ O_OBJS = open.o read_write.o devices.o file_table.o buffer.o \ super.o block_dev.o stat.o exec.o pipe.o namei.o fcntl.o \ ioctl.o readdir.o select.o fifo.o locks.o filesystems.o \ - dcache.o inode.o attr.o bad_inode.o file.o $(BINFMTS) + dcache.o inode.o attr.o bad_inode.o file.o iobuf.o $(BINFMTS) MOD_LIST_NAME := FS_MODULES ALL_SUB_DIRS = coda minix ext2 fat msdos vfat proc isofs nfs umsdos ntfs \ diff -urN 2.2.17pre19/fs/adfs/dir.c 2.2.17pre19aa2/fs/adfs/dir.c --- 2.2.17pre19/fs/adfs/dir.c Mon Jan 17 16:44:41 2000 +++ 2.2.17pre19aa2/fs/adfs/dir.c Sat Aug 26 17:13:51 2000 @@ -320,12 +320,12 @@ if (filp->f_pos < 2) { if (filp->f_pos < 1) { - if (filldir (dirent, ".", 1, 0, inode->i_ino) < 0) + if (filldir (dirent, ".", 1, 0, inode->i_ino, DT_DIR) < 0) return 0; filp->f_pos ++; } if (filldir (dirent, "..", 2, 1, - adfs_inode_generate (parent_object_id, 0)) < 0) + adfs_inode_generate (parent_object_id, 0), DT_DIR) < 0) return 0; filp->f_pos ++; } @@ -337,7 +337,7 @@ if (!adfs_dir_get (sb, bh, buffers, pos, dir_object_id, &ide)) break; - if (filldir (dirent, ide.name, ide.name_len, filp->f_pos, ide.inode_no) < 0) + if (filldir (dirent, ide.name, ide.name_len, filp->f_pos, ide.inode_no, DT_UNKNOWN) < 0) return 0; filp->f_pos ++; pos += 26; diff -urN 2.2.17pre19/fs/adfs/inode.c 2.2.17pre19aa2/fs/adfs/inode.c --- 2.2.17pre19/fs/adfs/inode.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/adfs/inode.c Sat Aug 26 17:13:51 2000 @@ -181,7 +181,7 @@ inode->i_nlink = 2; inode->i_size = ADFS_NEWDIR_SIZE; inode->i_blksize = PAGE_SIZE; - inode->i_blocks = inode->i_size / sb->s_blocksize; + inode->i_blocks = inode->i_size >> sb->s_blocksize_bits; inode->i_mtime = inode->i_atime = inode->i_ctime = 0; diff -urN 2.2.17pre19/fs/affs/dir.c 2.2.17pre19aa2/fs/affs/dir.c --- 2.2.17pre19/fs/affs/dir.c Mon Jan 17 16:44:41 2000 +++ 2.2.17pre19aa2/fs/affs/dir.c Sat Aug 26 17:13:51 2000 @@ -98,14 +98,14 @@ if (filp->f_pos == 0) { filp->private_data = (void *)0; - if (filldir(dirent,".",1,filp->f_pos,inode->i_ino) < 0) { + if (filldir(dirent,".",1,filp->f_pos,inode->i_ino, DT_DIR) < 0) { return 0; } ++filp->f_pos; stored++; } if (filp->f_pos == 1) { - if (filldir(dirent,"..",2,filp->f_pos,affs_parent_ino(inode)) < 0) { + if (filldir(dirent,"..",2,filp->f_pos,affs_parent_ino(inode), DT_DIR) < 0) { return stored; } filp->f_pos = 2; @@ -161,7 +161,7 @@ pr_debug("AFFS: readdir(): filldir(\"%.*s\",ino=%lu), i=%d\n", namelen,name,ino,i); filp->private_data = (void *)ino; - if (filldir(dirent,name,namelen,filp->f_pos,ino) < 0) + if (filldir(dirent,name,namelen,filp->f_pos,ino, DT_UNKNOWN) < 0) goto readdir_done; filp->private_data = (void *)i; 
affs_brelse(fh_bh); diff -urN 2.2.17pre19/fs/affs/file.c 2.2.17pre19aa2/fs/affs/file.c --- 2.2.17pre19/fs/affs/file.c Tue Aug 22 14:53:58 2000 +++ 2.2.17pre19aa2/fs/affs/file.c Sat Aug 26 17:13:51 2000 @@ -582,17 +582,17 @@ affs_file_write(struct file *filp, const char *buf, size_t count, loff_t *ppos) { struct inode *inode = filp->f_dentry->d_inode; - off_t pos; + loff_t pos; ssize_t written; ssize_t c; - ssize_t blocksize; + ssize_t blocksize, blockshift; struct buffer_head *bh; char *p; if (!count) return 0; - pr_debug("AFFS: file_write(ino=%lu,pos=%lu,count=%d)\n",inode->i_ino, - (unsigned long)*ppos,count); + pr_debug("AFFS: file_write(ino=%lu,pos=%Lu,count=%d)\n",inode->i_ino, + *ppos,count); if (!inode) { affs_error(inode->i_sb,"file_write","Inode = NULL"); @@ -611,16 +611,22 @@ else pos = *ppos; written = 0; - blocksize = AFFS_I2BSIZE(inode); + blocksize = AFFS_I2BSIZE(inode); + blockshift = AFFS_I2BITS(inode); + + if (pos >= 0x7fffffff) /* Max size: 2G-1 */ + return -EFBIG; + if ((pos + count) > 0x7fffffff) + count = 0x7fffffff - pos; while (written < count) { - bh = affs_getblock(inode,pos / blocksize); + bh = affs_getblock(inode, pos >> blockshift); if (!bh) { if (!written) written = -ENOSPC; break; } - c = blocksize - (pos % blocksize); + c = blocksize - (pos & (blocksize -1)); if (c > count - written) c = count - written; if (c != blocksize && !buffer_uptodate(bh)) { @@ -633,7 +639,7 @@ break; } } - p = (pos % blocksize) + bh->b_data; + p = (pos & (blocksize -1)) + bh->b_data; c -= copy_from_user(p,buf,c); if (!c) { affs_brelse(bh); @@ -664,7 +670,7 @@ off_t pos; ssize_t written; ssize_t c; - ssize_t blocksize; + ssize_t blocksize, blockshift; struct buffer_head *bh; char *p; @@ -692,15 +698,16 @@ bh = NULL; blocksize = AFFS_I2BSIZE(inode) - 24; + blockshift = AFFS_I2BITS(inode); written = 0; while (written < count) { - bh = affs_getblock(inode,pos / blocksize); + bh = affs_getblock(inode,pos >> blockshift); if (!bh) { if (!written) written = -ENOSPC; break; } - c = blocksize - (pos % blocksize); + c = blocksize - (pos & (blocksize -1)); if (c > count - written) c = count - written; if (c != blocksize && !buffer_uptodate(bh)) { @@ -713,7 +720,7 @@ break; } } - p = (pos % blocksize) + bh->b_data + 24; + p = (pos & (blocksize -1)) + bh->b_data + 24; c -= copy_from_user(p,buf,c); if (!c) { affs_brelse(bh); @@ -782,10 +789,10 @@ int rem; int ext; - pr_debug("AFFS: truncate(inode=%ld,size=%lu)\n",inode->i_ino,inode->i_size); + pr_debug("AFFS: truncate(inode=%ld,size=%Lu)\n",inode->i_ino,inode->i_size); net_blocksize = blocksize - ((inode->i_sb->u.affs_sb.s_flags & SF_OFS) ? 24 : 0); - first = (inode->i_size + net_blocksize - 1) / net_blocksize; + first = (u_long)(inode->i_size + net_blocksize - 1) / net_blocksize; if (inode->u.affs_i.i_lastblock < first - 1) { /* There has to be at least one new block to be allocated */ if (!inode->u.affs_i.i_ec && alloc_ext_cache(inode)) { @@ -795,9 +802,9 @@ bh = affs_getblock(inode,first - 1); if (!bh) { affs_warning(inode->i_sb,"truncate","Cannot extend file"); - inode->i_size = net_blocksize * (inode->u.affs_i.i_lastblock + 1); + inode->i_size = (inode->u.affs_i.i_lastblock + 1) * net_blocksize; } else if (inode->i_sb->u.affs_sb.s_flags & SF_OFS) { - rem = inode->i_size % net_blocksize; + rem = ((u_long)inode->i_size) & (net_blocksize -1); DATA_FRONT(bh)->data_size = cpu_to_be32(rem ? 
rem : net_blocksize); affs_fix_checksum(blocksize,bh->b_data,5); mark_buffer_dirty(bh,0); @@ -864,7 +871,7 @@ affs_free_block(inode->i_sb,ekey); ekey = key; } - block = ((inode->i_size + net_blocksize - 1) / net_blocksize) - 1; + block = (((u_long)inode->i_size + net_blocksize - 1) / net_blocksize) - 1; inode->u.affs_i.i_lastblock = block; /* If the file is not truncated to a block boundary, @@ -872,7 +879,7 @@ * so it cannot become accessible again. */ - rem = inode->i_size % net_blocksize; + rem = inode->i_size & (net_blocksize -1); if (rem) { if ((inode->i_sb->u.affs_sb.s_flags & SF_OFS)) rem += 24; diff -urN 2.2.17pre19/fs/affs/inode.c 2.2.17pre19aa2/fs/affs/inode.c --- 2.2.17pre19/fs/affs/inode.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/affs/inode.c Sat Aug 26 17:13:51 2000 @@ -146,7 +146,7 @@ block = AFFS_I2BSIZE(inode) - 24; else block = AFFS_I2BSIZE(inode); - inode->u.affs_i.i_lastblock = ((inode->i_size + block - 1) / block) - 1; + inode->u.affs_i.i_lastblock = (((u_long)inode->i_size + block - 1) / block) - 1; break; case ST_SOFTLINK: inode->i_mode |= S_IFLNK; diff -urN 2.2.17pre19/fs/autofs/dir.c 2.2.17pre19aa2/fs/autofs/dir.c --- 2.2.17pre19/fs/autofs/dir.c Mon Jan 17 16:44:41 2000 +++ 2.2.17pre19aa2/fs/autofs/dir.c Sat Aug 26 17:13:51 2000 @@ -20,12 +20,12 @@ switch((unsigned long) filp->f_pos) { case 0: - if (filldir(dirent, ".", 1, 0, inode->i_ino) < 0) + if (filldir(dirent, ".", 1, 0, inode->i_ino, DT_DIR) < 0) return 0; filp->f_pos++; /* fall through */ case 1: - if (filldir(dirent, "..", 2, 1, AUTOFS_ROOT_INO) < 0) + if (filldir(dirent, "..", 2, 1, AUTOFS_ROOT_INO, DT_DIR) < 0) return 0; filp->f_pos++; /* fall through */ diff -urN 2.2.17pre19/fs/autofs/root.c 2.2.17pre19aa2/fs/autofs/root.c --- 2.2.17pre19/fs/autofs/root.c Mon Jan 17 16:44:41 2000 +++ 2.2.17pre19aa2/fs/autofs/root.c Sat Aug 26 17:13:51 2000 @@ -79,19 +79,19 @@ switch(nr) { case 0: - if (filldir(dirent, ".", 1, nr, inode->i_ino) < 0) + if (filldir(dirent, ".", 1, nr, inode->i_ino, DT_DIR) < 0) return 0; filp->f_pos = ++nr; /* fall through */ case 1: - if (filldir(dirent, "..", 2, nr, inode->i_ino) < 0) + if (filldir(dirent, "..", 2, nr, inode->i_ino, DT_DIR) < 0) return 0; filp->f_pos = ++nr; /* fall through */ default: while ( onr = nr, ent = autofs_hash_enum(dirhash,&nr,ent) ) { if ( !ent->dentry || ent->dentry->d_mounts != ent->dentry ) { - if (filldir(dirent,ent->name,ent->len,onr,ent->ino) < 0) + if (filldir(dirent,ent->name,ent->len,onr,ent->ino, DT_UNKNOWN) < 0) return 0; filp->f_pos = nr; } diff -urN 2.2.17pre19/fs/binfmt_aout.c 2.2.17pre19aa2/fs/binfmt_aout.c --- 2.2.17pre19/fs/binfmt_aout.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/binfmt_aout.c Sat Aug 26 17:13:52 2000 @@ -62,9 +62,9 @@ static int dump_write(struct file *file, const void *addr, int nr) { int r; - down(&file->f_dentry->d_inode->i_sem); + fs_down(&file->f_dentry->d_inode->i_sem); r = file->f_op->write(file, addr, nr, &file->f_pos) == nr; - up(&file->f_dentry->d_inode->i_sem); + fs_up(&file->f_dentry->d_inode->i_sem); return r; } diff -urN 2.2.17pre19/fs/binfmt_elf.c 2.2.17pre19aa2/fs/binfmt_elf.c --- 2.2.17pre19/fs/binfmt_elf.c Tue Jun 13 03:48:14 2000 +++ 2.2.17pre19aa2/fs/binfmt_elf.c Sat Aug 26 17:13:52 2000 @@ -948,9 +948,9 @@ static int dump_write(struct file *file, const void *addr, int nr) { int r; - down(&file->f_dentry->d_inode->i_sem); + fs_down(&file->f_dentry->d_inode->i_sem); r = file->f_op->write(file, addr, nr, &file->f_pos) == nr; - up(&file->f_dentry->d_inode->i_sem); + 
fs_up(&file->f_dentry->d_inode->i_sem); return r; } diff -urN 2.2.17pre19/fs/buffer.c 2.2.17pre19aa2/fs/buffer.c --- 2.2.17pre19/fs/buffer.c Tue Aug 22 14:53:58 2000 +++ 2.2.17pre19aa2/fs/buffer.c Sat Aug 26 17:13:52 2000 @@ -27,9 +27,7 @@ /* invalidate_buffers/set_blocksize/sync_dev race conditions and fs corruption fixes, 1999, Andrea Arcangeli */ -/* Wait for dirty buffers to sync in sync_page_buffers. - * 2000, Marcelo Tosatti - */ +/* async buffer flushing, 1999 Andrea Arcangeli */ #include #include @@ -43,6 +41,7 @@ #include #include #include +#include <linux/iobuf.h> #include #include @@ -83,6 +82,7 @@ static int nr_buffers = 0; static int nr_buffers_type[NR_LIST] = {0,}; +static unsigned long size_buffers_type[NR_LIST]; static int nr_buffer_heads = 0; static int nr_unused_buffer_heads = 0; static int nr_hashed_buffers = 0; @@ -143,13 +143,14 @@ bh->b_count++; wait.task = tsk; add_wait_queue(&bh->b_wait, &wait); -repeat: - tsk->state = TASK_UNINTERRUPTIBLE; - run_task_queue(&tq_disk); - if (buffer_locked(bh)) { + do { + run_task_queue(&tq_disk); + tsk->state = TASK_UNINTERRUPTIBLE; + mb(); + if (!buffer_locked(bh)) + break; schedule(); - goto repeat; - } + } while (buffer_locked(bh)); tsk->state = TASK_RUNNING; remove_wait_queue(&bh->b_wait, &wait); bh->b_count--; @@ -359,9 +360,9 @@ goto out_putf; /* We need to protect against concurrent writers.. */ - down(&inode->i_sem); + fs_down(&inode->i_sem); err = file->f_op->fsync(file, dentry); - up(&inode->i_sem); + fs_up(&inode->i_sem); out_putf: fput(file); @@ -396,9 +397,9 @@ goto out_putf; /* this needs further work, at the moment it is identical to fsync() */ - down(&inode->i_sem); + fs_down(&inode->i_sem); err = file->f_op->fsync(file, dentry); - up(&inode->i_sem); + fs_up(&inode->i_sem); out_putf: fput(file); @@ -474,6 +475,7 @@ return; } nr_buffers_type[bh->b_list]--; + size_buffers_type[bh->b_list] -= bh->b_size; remove_from_hash_queue(bh); remove_from_lru_list(bh); } @@ -523,6 +525,7 @@ (*bhp)->b_prev_free = bh; nr_buffers_type[bh->b_list]++; + size_buffers_type[bh->b_list] += bh->b_size; /* Put the buffer in new hash-queue if it has a device. */ bh->b_next = NULL; @@ -571,8 +574,10 @@ { struct buffer_head * bh; bh = find_buffer(dev,block,size); - if (bh) + if (bh) { bh->b_count++; + touch_buffer(bh); + } return bh; } @@ -816,6 +821,46 @@ insert_into_queues(bh); } +/* -1 -> no need to flush + 0 -> async flush + 1 -> sync flush (wait for I/O completion) */ +static int balance_dirty_state(kdev_t dev) +{ + unsigned long dirty, tot, hard_dirty_limit, soft_dirty_limit; + + dirty = size_buffers_type[BUF_DIRTY] >> PAGE_SHIFT; + tot = (buffermem >> PAGE_SHIFT) + nr_free_pages; + tot -= size_buffers_type[BUF_PROTECTED] >> PAGE_SHIFT; + + dirty *= 200; + soft_dirty_limit = tot * bdf_prm.b_un.nfract; + hard_dirty_limit = soft_dirty_limit * 2; + + if (dirty > soft_dirty_limit) + { + if (dirty > hard_dirty_limit) + return 1; + return 0; + } + return -1; +} + +/* + * if a new dirty buffer is created we need to balance bdflush. + * + * in the future we might want to make bdflush aware of different + * pressures on different devices - thus the (currently unused) + * 'dev' parameter. + */ +void balance_dirty(kdev_t dev) +{ + int state = balance_dirty_state(dev); + + if (state < 0) + return; + wakeup_bdflush(state); +} + /* * A buffer may need to be moved from one buffer list to another * (e.g. in case it is not shared any more). Handle this.
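
The thresholds above are easier to check with concrete numbers. Below is a minimal, self-contained userspace sketch of the same three-way decision; the function name, the page counts in main() and the nfract value of 40 are hypothetical stand-ins for the kernel's size_buffers_type[], buffermem/nr_free_pages accounting and the bdf_prm.b_un.nfract tunable.

#include <stdio.h>

/* -1 -> no flush needed, 0 -> async flush, 1 -> sync flush */
static int balance_dirty_state_sketch(unsigned long dirty_pages,
				      unsigned long total_pages,
				      unsigned long nfract)
{
	unsigned long dirty = dirty_pages * 200;
	unsigned long soft_dirty_limit = total_pages * nfract;
	unsigned long hard_dirty_limit = soft_dirty_limit * 2;

	if (dirty > soft_dirty_limit) {
		if (dirty > hard_dirty_limit)
			return 1;	/* writer should wait for the flush */
		return 0;		/* just kick bdflush in the background */
	}
	return -1;
}

int main(void)
{
	/* hypothetical machine: 25000 reclaimable pages, nfract = 40 */
	printf("%d\n", balance_dirty_state_sketch(1000, 25000, 40));	/* -1 */
	printf("%d\n", balance_dirty_state_sketch(6000, 25000, 40));	/* 0 */
	printf("%d\n", balance_dirty_state_sketch(12000, 25000, 40));	/* 1 */
	return 0;
}

Since dirty is scaled by 200 before being compared against tot * nfract, the soft limit is hit once nfract/2 percent of the reclaimable pages are dirty and the hard limit at nfract percent; only past the hard limit does balance_dirty() call wakeup_bdflush(1), making the dirtying writer itself wait.
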
@@ -828,7 +873,9 @@ printk("Attempt to refile free buffer\n"); return; } - if (buffer_dirty(buf)) + if (buffer_protected(buf)) + dispose = BUF_PROTECTED; + else if (buffer_dirty(buf)) dispose = BUF_DIRTY; else if (buffer_locked(buf)) dispose = BUF_LOCKED; @@ -837,13 +884,7 @@ if(dispose != buf->b_list) { file_buffer(buf, dispose); if(dispose == BUF_DIRTY) { - int too_many = (nr_buffers * bdf_prm.b_un.nfract/100); - - /* This buffer is dirty, maybe we need to start flushing. - * If too high a percentage of the buffers are dirty... - */ - if (nr_buffers_type[BUF_DIRTY] > too_many) - wakeup_bdflush(1); + balance_dirty(buf->b_dev); /* If this is a loop device, and * more than half of the buffers are dirty... @@ -864,7 +905,6 @@ /* If dirty, mark the time this buffer should be written back. */ set_writetime(buf, 0); refile_buffer(buf); - touch_buffer(buf); if (buf->b_count) { buf->b_count--; @@ -1168,7 +1208,7 @@ #endif } if (test_and_clear_bit(PG_swap_unlock_after, &page->flags)) - swap_after_unlock_page(page->offset); + swap_after_unlock_page(pgoff2ulong(page->index)); if (test_and_clear_bit(PG_free_after, &page->flags)) __free_page(page); } @@ -1261,6 +1301,225 @@ return; } + +/* + * For brw_kiovec: submit a set of buffer_head temporary IOs and wait + * for them to complete. Clean up the buffer_heads afterwards. + */ + +#define dprintk(x...) + +static int do_kio(int rw, int nr, struct buffer_head *bh[], int size) +{ + int iosize; + int i; + int err; + struct buffer_head *tmp; + + dprintk ("do_kio start\n"); + + ll_rw_block(rw, nr, bh); + iosize = err = 0; + + for (i = nr; --i >= 0; ) { + tmp = bh[i]; + wait_on_buffer(tmp); + if (!buffer_uptodate(tmp)) { + err = -EIO; + /* We are waiting on bh'es in reverse order so + clearing iosize on error calculates the + amount of IO before the first error. */ + iosize = 0; + } + + free_async_buffers(tmp); + iosize += size; + } + + dprintk ("do_kio end %d %d\n", iosize, err); + + if (iosize) + return iosize; + else + return err; +} + +/* + * Clean up the bounce buffers potentially used by brw_kiovec. All of + * the kiovec's bounce buffers must be cleared of temporarily allocated + * bounce pages, but only READ pages for which IO completed successfully + * can actually be transferred back to user space. + */ + +void cleanup_bounce_buffers(int rw, int nr, struct kiobuf *iovec[], + int transferred) +{ + int i; + for (i = 0; i < nr; i++) { + struct kiobuf *iobuf = iovec[i]; + if (iobuf->bounced) { + if (transferred > 0 && !(rw & WRITE)) + kiobuf_copy_bounce(iobuf, COPY_FROM_BOUNCE, + transferred); + + clear_kiobuf_bounce_pages(iobuf); + } + transferred -= iobuf->length; + } +} + +/* + * Start I/O on a physical range of kernel memory, defined by a vector + * of kiobuf structs (much like a user-space iovec list). + * + * The kiobuf must already be locked for IO. IO is submitted + * asynchronously: you need to check page->locked, page->uptodate, and + * maybe wait on page->wait. + * + * It is up to the caller to make sure that there are enough blocks + * passed in to completely map the iobufs to disk.
+ */ + +int brw_kiovec(int rw, int nr, struct kiobuf *iovec[], + kdev_t dev, unsigned long b[], int size) +{ + int err; + int length; + int transferred; + int i; + int bufind; + int pageind; + int bhind; + int offset; + unsigned long blocknr; + struct kiobuf * iobuf = NULL; + unsigned long page; + unsigned long bounce; + struct page * map; + struct buffer_head *tmp, *bh[KIO_MAX_SECTORS]; + + /* + * First, do some alignment and validity checks + */ + for (i = 0; i < nr; i++) { + iobuf = iovec[i]; + if ((iobuf->offset & (size-1)) || + (iobuf->length & (size-1))) + return -EINVAL; + if (!iobuf->locked) + panic("brw_kiovec: iobuf not locked for I/O"); + if (!iobuf->nr_pages) + panic("brw_kiovec: iobuf not initialised"); + } + + /* DEBUG */ +#if 0 + return iobuf->length; +#endif + dprintk ("brw_kiovec: start\n"); + + /* + * OK to walk down the iovec doing page IO on each page we find. + */ + bufind = bhind = transferred = err = 0; + for (i = 0; i < nr; i++) { + iobuf = iovec[i]; + err = setup_kiobuf_bounce_pages(iobuf, GFP_USER); + if (err) + goto finished; + if (rw & WRITE) + kiobuf_copy_bounce(iobuf, COPY_TO_BOUNCE, -1); + + offset = iobuf->offset; + length = iobuf->length; + dprintk ("iobuf %d %d %d\n", offset, length, size); + + for (pageind = 0; pageind < iobuf->nr_pages; pageind++) { + map = iobuf->maplist[pageind]; + bounce = iobuf->bouncelist[pageind]; + + if (bounce) + page = bounce; + else + page = iobuf->pagelist[pageind]; + + while (length > 0) { + blocknr = b[bufind++]; + tmp = get_unused_buffer_head(0); + if (!tmp) { + err = -ENOMEM; + goto error; + } + + tmp->b_dev = B_FREE; + tmp->b_size = size; + tmp->b_data = (char *) (page + offset); + tmp->b_this_page = tmp; + + init_buffer(tmp, dev, blocknr, + end_buffer_io_sync, NULL); + if (rw == WRITE) { + set_bit(BH_Uptodate, &tmp->b_state); + set_bit(BH_Dirty, &tmp->b_state); + } + + dprintk ("buffer %d (%d) at %p\n", + bhind, tmp->b_blocknr, tmp->b_data); + bh[bhind++] = tmp; + length -= size; + offset += size; + + /* + * Start the IO if we have got too much or if + * this is the end of the last iobuf + */ + if (bhind >= KIO_MAX_SECTORS) { + err = do_kio(rw, bhind, bh, size); + if (err >= 0) + transferred += err; + else + goto finished; + bhind = 0; + } + + if (offset >= PAGE_SIZE) { + offset = 0; + break; + } + } /* End of block loop */ + } /* End of page loop */ + } /* End of iovec loop */ + + /* Is there any IO still left to submit? */ + if (bhind) { + err = do_kio(rw, bhind, bh, size); + if (err >= 0) + transferred += err; + else + goto finished; + } + + finished: + dprintk ("brw_kiovec: end (%d, %d)\n", transferred, err); + + cleanup_bounce_buffers(rw, nr, iovec, transferred); + + if (transferred) + return transferred; + return err; + + error: + /* We got an error allocating the bh'es. Just free the current + buffer_heads and exit. */ + for (i = bhind; --i >= 0; ) { + free_async_buffers(bh[i]); + } + + clear_kiobuf_bounce_pages(iobuf); + + goto finished; +} + /* * Start I/O on a page. * This function expects the page to be locked and may return before I/O is complete.
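
The caller-side arithmetic that feeds brw_kiovec() is easiest to follow in isolation. Below is a minimal standalone sketch of the chunking rw_raw_dev() performs, assuming a power-of-two sector size of at least 512 and a hypothetical KIO_MAX_SECTORS of 128; plan_chunks() and the values in main() are illustrative only, and each loop iteration stands for one map_user_kiobuf()/brw_kiovec()/unmap_kiobuf() round.

#include <stdio.h>

#define KIO_MAX_SECTORS 128	/* hypothetical value for this sketch */

static long plan_chunks(unsigned long long offset, unsigned long size,
			unsigned int sector_size, unsigned long limit)
{
	unsigned int sector_bits = 0, s;
	unsigned long sector_mask = sector_size - 1;
	unsigned long blocknr, blocks, max_sectors, nchunks = 0;

	/* derive the sector shift the same way raw_open() does */
	for (s = sector_size; !(s & 1); s >>= 1)
		sector_bits++;

	/* bigger soft sectors shrink the per-chunk sector budget */
	max_sectors = KIO_MAX_SECTORS >> (sector_bits - 9);

	/* raw I/O must be sector aligned in both offset and length */
	if ((offset & sector_mask) || (size & sector_mask))
		return -1;	/* -EINVAL in the driver */

	blocknr = offset >> sector_bits;
	while (size > 0) {
		blocks = size >> sector_bits;
		if (blocks > max_sectors)
			blocks = max_sectors;
		if (blocks > limit - blocknr)
			blocks = limit - blocknr;
		if (!blocks)
			break;	/* ran off the end of the device */
		blocknr += blocks;
		size -= blocks << sector_bits;
		nchunks++;	/* one map/brw_kiovec/unmap round */
	}
	return nchunks;
}

int main(void)
{
	/* 1 MB of 512-byte sectors on a large device: 16 chunks of 64 KB */
	printf("%ld chunks\n", plan_chunks(0, 1UL << 20, 512, ~0UL >> 1));
	return 0;
}

Note that max_sectors << sector_bits is constant: a chunk always covers KIO_MAX_SECTORS << 9 bytes regardless of the sector size, so the per-round kiovec bookkeeping does not grow with larger sectors.
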
@@ -1395,15 +1654,46 @@ set_bit(PG_locked, &page->flags); set_bit(PG_free_after, &page->flags); + /* Blocks within a page */ i = PAGE_SIZE >> inode->i_sb->s_blocksize_bits; - block = page->offset >> inode->i_sb->s_blocksize_bits; - p = nr; - do { - *p = inode->i_op->bmap(inode, block); - i--; - block++; - p++; - } while (i > 0); + + block = pgoff2ulong(page->index); + /* Scaled already by PAGE_SHIFT, which said shift should + be same or larger, than that of any filesystem in + this system -- that is, at i386 with 4k pages one + can't use 8k (primitive) blocks at the filesystems... */ + + if (i > 0) { + /* Filesystem blocksize is same, or smaller than CPU + page size, we can easily process this.. */ + + if (i > 1) + block *= i; + /* Scale by FS blocks per page, presuming FS-blocks are smaller + than the processor page... */ + + p = nr; + do { + *p = inode->i_op->bmap(inode, block); + i--; + block++; + p++; + } while (i > 0); + } else { + /* Filesystem blocksize is larger than CPU page size, + but if the underlying storage system block size is + smaller than CPU page size, all is well, else we + are in deep trouble -- for direct paging in at least.. */ + /* Nobody needs such monstrous fs block sizes ? + Well, it is the only way to get files in terabyte + range.. Nobody needs them ? You are in for a surprise.. + However EXT2 (at least) needs access to internal + blocks and there it needs allocations of 8k/16k (or + whatever the block size is) for internal uses.. + Fixing this function alone isn't enough, although + perhaps fairly trivial.. */ + /* FIXME: WRITE THE CODE HERE !!! */ + } /* IO start */ brw_page(READ, page, inode->i_dev, nr, inode->i_sb->s_blocksize, 1); @@ -1457,6 +1747,7 @@ } tmp->b_this_page = bh; free_list[isize] = bh; + mem_map[MAP_NR(page)].flags = 0; mem_map[MAP_NR(page)].buffers = bh; buffermem += PAGE_SIZE; return 1; @@ -1468,28 +1759,21 @@ #define BUFFER_BUSY_BITS ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected)) #define buffer_busy(bh) ((bh)->b_count || ((bh)->b_state & BUFFER_BUSY_BITS)) -static int sync_page_buffers(struct buffer_head *bh, int wait) +static void sync_page_buffers(struct buffer_head *bh) { struct buffer_head * tmp = bh; do { struct buffer_head *p = tmp; tmp = tmp->b_this_page; - if (buffer_locked(p)) { - if (wait) - __wait_on_buffer(p); - } else if (buffer_dirty(p)) - ll_rw_block(WRITE, 1, &p); - } while (tmp != bh); - do { - struct buffer_head *p = tmp; - tmp = tmp->b_this_page; - if (buffer_busy(p)) - return 1; + if (buffer_dirty(p)) + ll_rw_block(WRITEA, 1, &p); + + if (buffer_dirty(p)) + if (test_and_set_bit(BH_Wait_IO, &p->b_state)) + ll_rw_block(WRITE, 1, &p); } while (tmp != bh); - - return 0; } /* @@ -1499,10 +1783,9 @@ * Wake up bdflush() if this fails - if we're running low on memory due * to dirty buffers, we need to flush them out as quickly as possible. */ -int try_to_free_buffers(struct page * page_map, int wait) +int try_to_free_buffers(struct page * page_map, int gfp_mask) { struct buffer_head * tmp, * bh = page_map->buffers; - int too_many; tmp = bh; do { @@ -1511,8 +1794,6 @@ tmp = tmp->b_this_page; } while (tmp != bh); - succeed: - tmp = bh; do { struct buffer_head * p = tmp; tmp = tmp->b_this_page; @@ -1531,25 +1812,12 @@ return 1; busy: - too_many = (nr_buffers * bdf_prm.b_un.nfract/100); + if (gfp_mask & __GFP_IO) + sync_page_buffers(bh); - if (!sync_page_buffers(bh, wait)) { - - /* If a high percentage of the buffers are dirty, - * wake kflushd - */ - if (nr_buffers_type[BUF_DIRTY] > too_many) - wakeup_bdflush(0); - - /* - * We can jump after the busy check because - * we rely on the kernel lock.
- */ - goto succeed; - } - - if(nr_buffers_type[BUF_DIRTY] > too_many) + if (balance_dirty_state(NODEV) >= 0) wakeup_bdflush(0); + return 0; } @@ -1561,7 +1829,7 @@ int found = 0, locked = 0, dirty = 0, used = 0, lastused = 0; int protected = 0; int nlist; - static char *buf_types[NR_LIST] = {"CLEAN","LOCKED","DIRTY"}; + static char *buf_types[NR_LIST] = {"CLEAN","LOCKED","DIRTY","PROTECTED",}; printk("Buffer memory: %8ldkB\n",buffermem>>10); printk("Buffer heads: %6d\n",nr_buffer_heads); @@ -1585,7 +1853,7 @@ used++, lastused = found; bh = bh->b_next_free; } while (bh != lru_list[nlist]); - printk("%8s: %d buffers, %d used (last=%d), " + printk("%9s: %d buffers, %d used (last=%d), " "%d locked, %d protected, %d dirty\n", buf_types[nlist], found, used, lastused, locked, protected, dirty); @@ -1757,7 +2025,6 @@ if (ncount) printk("sync_old_buffers: %d dirty buffers not on dirty list\n", ncount); printk("Wrote %d/%d buffers\n", nwritten, ndirty); #endif - run_task_queue(&tq_disk); return 0; } @@ -1930,7 +2197,8 @@ /* If there are still a lot of dirty buffers around, skip the sleep and flush some more */ - if(ndirty == 0 || nr_buffers_type[BUF_DIRTY] <= nr_buffers * bdf_prm.b_un.nfract/100) { + if (!ndirty || balance_dirty_state(NODEV) < 0) + { spin_lock_irq(&current->sigmask_lock); flush_signals(current); spin_unlock_irq(&current->sigmask_lock); @@ -1954,13 +2222,18 @@ tsk->session = 1; tsk->pgrp = 1; strcpy(tsk->comm, "kupdate"); + + /* sigstop and sigcont will stop and wakeup kupdate */ + spin_lock_irq(&tsk->sigmask_lock); sigfillset(&tsk->blocked); - /* sigcont will wakeup kupdate after setting interval to 0 */ sigdelset(&tsk->blocked, SIGCONT); + sigdelset(&tsk->blocked, SIGSTOP); + spin_unlock_irq(&tsk->sigmask_lock); lock_kernel(); for (;;) { + /* update interval */ interval = bdf_prm.b_un.interval; if (interval) { @@ -1969,8 +2242,24 @@ } else { + stop_kupdate: tsk->state = TASK_STOPPED; schedule(); /* wait for SIGCONT */ + } + /* check for sigstop */ + if (signal_pending(tsk)) + { + int stopped = 0; + spin_lock_irq(&tsk->sigmask_lock); + if (sigismember(&tsk->signal, SIGSTOP)) + { + sigdelset(&tsk->signal, SIGSTOP); + stopped = 1; + } + recalc_sigpending(tsk); + spin_unlock_irq(&tsk->sigmask_lock); + if (stopped) + goto stop_kupdate; } #ifdef DEBUG printk("kupdate() activated...\n"); diff -urN 2.2.17pre19/fs/coda/dir.c 2.2.17pre19aa2/fs/coda/dir.c --- 2.2.17pre19/fs/coda/dir.c Tue Aug 22 14:53:58 2000 +++ 2.2.17pre19aa2/fs/coda/dir.c Sat Aug 26 17:13:51 2000 @@ -749,7 +749,7 @@ char *name = vdirent->d_name; errfill = filldir(getdent, name, namlen, - offs, ino); + offs, ino, DT_UNKNOWN); CDEBUG(D_FILE, "entry %d: ino %ld, namlen %d, reclen %d, type %d, pos %d, string_offs %d, name %*s, offset %d, result: %d, errfill: %d.\n", i,vdirent->d_fileno, vdirent->d_namlen, vdirent->d_reclen, vdirent->d_type, pos, string_offset, vdirent->d_namlen, vdirent->d_name, (u_int) offs, result, errfill); /* errfill means no space for filling in this round */ if ( errfill < 0 ) { diff -urN 2.2.17pre19/fs/coda/file.c 2.2.17pre19aa2/fs/coda/file.c --- 2.2.17pre19/fs/coda/file.c Tue Aug 22 14:53:59 2000 +++ 2.2.17pre19aa2/fs/coda/file.c Sat Aug 26 17:13:52 2000 @@ -99,7 +99,7 @@ &cont_file, &cont_dentry); CDEBUG(D_INODE, "coda ino: %ld, cached ino %ld, page offset: %lx\n", - coda_inode->i_ino, cii->c_ovp->i_ino, pgoff2ulong(page->index)); generic_readpage(&cont_file, page); EXIT; @@ -190,10 +190,10 @@ return -1; } - down(&cont_inode->i_sem); +
fs_down(&cont_inode->i_sem); result = cont_file.f_op->write(&cont_file , buff, count, &(cont_file.f_pos)); - up(&cont_inode->i_sem); + fs_up(&cont_inode->i_sem); coda_restore_codafile(coda_inode, coda_file, cont_inode, &cont_file); if (result) @@ -228,14 +228,14 @@ coda_prepare_openfile(coda_inode, coda_file, cont_inode, &cont_file, &cont_dentry); - down(&cont_inode->i_sem); + fs_down(&cont_inode->i_sem); result = file_fsync(&cont_file ,&cont_dentry); if ( result == 0 ) { result = venus_fsync(coda_inode->i_sb, &(cnp->c_fid)); } - up(&cont_inode->i_sem); + fs_up(&cont_inode->i_sem); coda_restore_codafile(coda_inode, coda_file, cont_inode, &cont_file); return result; diff -urN 2.2.17pre19/fs/dcache.c 2.2.17pre19aa2/fs/dcache.c --- 2.2.17pre19/fs/dcache.c Tue Jun 13 03:48:14 2000 +++ 2.2.17pre19aa2/fs/dcache.c Sat Aug 26 17:13:52 2000 @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -475,9 +476,9 @@ */ void shrink_dcache_memory(int priority, unsigned int gfp_mask) { - if (gfp_mask & __GFP_IO) { + if (gfp_mask & __GFP_IO && !current->fs_locks) { int count = 0; - if (priority) + if (priority > 1) count = dentry_stat.nr_unused / priority; prune_dcache(count, -1); } @@ -927,7 +928,11 @@ if (!dentry_cache) panic("Cannot create dentry cache"); +#ifndef CONFIG_BIGMEM memory_size = num_physpages << PAGE_SHIFT; +#else + memory_size = bigmem_mapnr << PAGE_SHIFT; +#endif memory_size >>= 13; memory_size *= 2 * sizeof(void *); for (order = 0; ((1UL << order) << PAGE_SHIFT) < memory_size; order++); diff -urN 2.2.17pre19/fs/devpts/root.c 2.2.17pre19aa2/fs/devpts/root.c --- 2.2.17pre19/fs/devpts/root.c Mon Jan 17 16:44:41 2000 +++ 2.2.17pre19aa2/fs/devpts/root.c Sat Aug 26 17:13:51 2000 @@ -86,12 +86,12 @@ switch(nr) { case 0: - if (filldir(dirent, ".", 1, nr, inode->i_ino) < 0) + if (filldir(dirent, ".", 1, nr, inode->i_ino, DT_DIR) < 0) return 0; filp->f_pos = ++nr; /* fall through */ case 1: - if (filldir(dirent, "..", 2, nr, inode->i_ino) < 0) + if (filldir(dirent, "..", 2, nr, inode->i_ino, DT_DIR) < 0) return 0; filp->f_pos = ++nr; /* fall through */ @@ -100,7 +100,7 @@ int ptynr = nr - 2; if ( sbi->inodes[ptynr] ) { genptsname(numbuf, ptynr); - if ( filldir(dirent, numbuf, strlen(numbuf), nr, nr) < 0 ) + if ( filldir(dirent, numbuf, strlen(numbuf), nr, nr, DT_CHR) < 0 ) return 0; } filp->f_pos = ++nr; diff -urN 2.2.17pre19/fs/dquot.c 2.2.17pre19aa2/fs/dquot.c --- 2.2.17pre19/fs/dquot.c Thu May 4 13:00:39 2000 +++ 2.2.17pre19aa2/fs/dquot.c Sat Aug 26 17:13:48 2000 @@ -570,7 +570,7 @@ */ if (prune_dcache(0, 128)) { - free_inode_memory(10); + free_inode_memory(); goto repeat; } diff -urN 2.2.17pre19/fs/efs/dir.c 2.2.17pre19aa2/fs/efs/dir.c --- 2.2.17pre19/fs/efs/dir.c Mon Jan 17 16:44:41 2000 +++ 2.2.17pre19aa2/fs/efs/dir.c Sat Aug 26 17:13:51 2000 @@ -107,7 +107,7 @@ filp->f_pos = (block << EFS_DIRBSIZE_BITS) | slot; /* copy filename and data in dirslot */ - filldir(dirent, nameptr, namelen, filp->f_pos, inodenum); + filldir(dirent, nameptr, namelen, filp->f_pos, inodenum, DT_UNKNOWN); /* sanity check */ if (nameptr - (char *) dirblock + namelen > EFS_DIRBSIZE) { diff -urN 2.2.17pre19/fs/ext2/dir.c 2.2.17pre19aa2/fs/ext2/dir.c --- 2.2.17pre19/fs/ext2/dir.c Thu May 4 13:00:39 2000 +++ 2.2.17pre19aa2/fs/ext2/dir.c Sat Aug 26 17:13:51 2000 @@ -32,6 +32,10 @@ return -EISDIR; } +static unsigned char ext2_filetype_table[] = { + DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK +}; + static int ext2_readdir(struct file *, void *, filldir_t); static struct 
file_operations ext2_dir_operations = { @@ -201,10 +205,14 @@ * the descriptor. */ unsigned long version = filp->f_version; + unsigned char d_type = DT_UNKNOWN; + if (EXT2_HAS_INCOMPAT_FEATURE(sb, EXT2_FEATURE_INCOMPAT_FILETYPE) + && de->file_type < EXT2_FT_MAX) + d_type = ext2_filetype_table[de->file_type]; error = filldir(dirent, de->name, de->name_len, - filp->f_pos, le32_to_cpu(de->inode)); + filp->f_pos, le32_to_cpu(de->inode), d_type); if (error) break; if (version != filp->f_version) diff -urN 2.2.17pre19/fs/ext2/file.c 2.2.17pre19aa2/fs/ext2/file.c --- 2.2.17pre19/fs/ext2/file.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/ext2/file.c Sat Aug 26 17:13:51 2000 @@ -39,10 +39,6 @@ static long long ext2_file_lseek(struct file *, long long, int); static ssize_t ext2_file_write (struct file *, const char *, size_t, loff_t *); static int ext2_release_file (struct inode *, struct file *); -#if BITS_PER_LONG < 64 -static int ext2_open_file (struct inode *, struct file *); - -#else #define EXT2_MAX_SIZE(bits) \ (((EXT2_NDIR_BLOCKS + (1LL << (bits - 2)) + \ @@ -55,8 +51,6 @@ EXT2_MAX_SIZE(10), EXT2_MAX_SIZE(11), EXT2_MAX_SIZE(12), EXT2_MAX_SIZE(13) }; -#endif - /* * We have mostly NULL's here: the current defaults are ok for * the ext2 filesystem. @@ -69,11 +63,7 @@ NULL, /* poll - default */ ext2_ioctl, /* ioctl */ generic_file_mmap, /* mmap */ -#if BITS_PER_LONG == 64 NULL, /* no special open is needed */ -#else - ext2_open_file, -#endif NULL, /* flush */ ext2_release_file, /* release */ ext2_sync_file, /* fsync */ @@ -121,12 +111,8 @@ offset += file->f_pos; } if (((unsigned long long) offset >> 32) != 0) { -#if BITS_PER_LONG < 64 - return -EINVAL; -#else if (offset > ext2_max_sizes[EXT2_BLOCK_SIZE_BITS(inode->i_sb)]) return -EINVAL; -#endif } if (offset != file->f_pos) { file->f_pos = offset; @@ -155,7 +141,7 @@ size_t count, loff_t *ppos) { struct inode * inode = filp->f_dentry->d_inode; - off_t pos; + loff_t pos; long block; int offset; int written, c; @@ -202,24 +188,18 @@ /* Check for overflow.. */ -#if BITS_PER_LONG < 64 - /* If the fd's pos is already greater than or equal to the file - * descriptor's offset maximum, then we need to return EFBIG for - * any non-zero count (and we already tested for zero above). */ - if (((unsigned) pos) >= 0x7FFFFFFFUL) - return -EFBIG; - - /* If we are about to overflow the maximum file size, we also - * need to return the error, but only if no bytes can be written - * successfully. */ - if (((unsigned) pos + count) > 0x7FFFFFFFUL) { - count = 0x7FFFFFFFL - pos; - if (((signed) count) < 0) + /* L-F-S spec 2.2.1.27: */ + if (!(filp->f_flags & O_LARGEFILE)) { + if (pos >= 0x7ffffffeULL) /* pos@2G forbidden */ return -EFBIG; + + if (pos + count >= 0x7fffffffULL) + /* Write only until end of allowed region */ + count = 0x7fffffffULL - pos; } -#else + { - off_t max = ext2_max_sizes[EXT2_BLOCK_SIZE_BITS(sb)]; + loff_t max = ext2_max_sizes[EXT2_BLOCK_SIZE_BITS(sb)]; if (pos >= max) return -EFBIG; @@ -239,7 +219,6 @@ mark_buffer_dirty(sb->u.ext2_sb.s_sbh, 1); } } -#endif /* From SUS: We must generate a SIGXFSZ for file size overflow * only if no bytes were actually written to the file. --sct */ @@ -382,15 +361,3 @@ return 0; } -#if BITS_PER_LONG < 64 -/* - * Called when an inode is about to be open. - * We use this to disallow opening RW large files on 32bit systems. 
- */ -static int ext2_open_file (struct inode * inode, struct file * filp) -{ - if (inode->u.ext2_i.i_high_size && (filp->f_mode & FMODE_WRITE)) - return -EFBIG; - return 0; -} -#endif diff -urN 2.2.17pre19/fs/ext2/inode.c 2.2.17pre19aa2/fs/ext2/inode.c --- 2.2.17pre19/fs/ext2/inode.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/ext2/inode.c Sat Aug 26 17:13:51 2000 @@ -533,15 +533,8 @@ inode->u.ext2_i.i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl); else { inode->u.ext2_i.i_dir_acl = 0; - inode->u.ext2_i.i_high_size = - le32_to_cpu(raw_inode->i_size_high); -#if BITS_PER_LONG < 64 - if (raw_inode->i_size_high) - inode->i_size = (__u32)-1; -#else - inode->i_size |= ((__u64)le32_to_cpu(raw_inode->i_size_high)) - << 32; -#endif + inode->i_size = ((__u64)(inode->i_size & 0xFFFFFFFFUL)) | + (((__u64)le32_to_cpu(raw_inode->i_size_high)) << 32); } inode->u.ext2_i.i_version = le32_to_cpu(raw_inode->i_version); inode->i_generation = inode->u.ext2_i.i_version; @@ -667,12 +660,7 @@ if (S_ISDIR(inode->i_mode)) raw_inode->i_dir_acl = cpu_to_le32(inode->u.ext2_i.i_dir_acl); else { -#if BITS_PER_LONG < 64 - raw_inode->i_size_high = - cpu_to_le32(inode->u.ext2_i.i_high_size); -#else raw_inode->i_size_high = cpu_to_le32(inode->i_size >> 32); -#endif } raw_inode->i_version = cpu_to_le32(inode->u.ext2_i.i_version); if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) @@ -728,21 +716,18 @@ } if (iattr->ia_valid & ATTR_SIZE) { - off_t size = iattr->ia_size; + loff_t size = iattr->ia_size; unsigned long limit = current->rlim[RLIMIT_FSIZE].rlim_cur; if (size < 0) return -EINVAL; -#if BITS_PER_LONG == 64 if (size > ext2_max_sizes[EXT2_BLOCK_SIZE_BITS(inode->i_sb)]) return -EFBIG; -#endif if (limit < RLIM_INFINITY && size > limit) { send_sig(SIGXFSZ, current, 0); return -EFBIG; } -#if BITS_PER_LONG == 64 if (size >> 33) { struct super_block *sb = inode->i_sb; struct ext2_super_block *es = sb->u.ext2_sb.s_es; @@ -755,7 +740,6 @@ mark_buffer_dirty(sb->u.ext2_sb.s_sbh, 1); } } -#endif } retval = inode_change_ok(inode, iattr); diff -urN 2.2.17pre19/fs/ext2/truncate.c 2.2.17pre19aa2/fs/ext2/truncate.c --- 2.2.17pre19/fs/ext2/truncate.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/ext2/truncate.c Sat Aug 26 17:13:51 2000 @@ -53,9 +53,10 @@ * Currently we always hold the inode semaphore during truncate, so * there's no need to test for changes during the operation. */ -#define DIRECT_BLOCK(inode) \ - ((inode->i_size + inode->i_sb->s_blocksize - 1) / \ - inode->i_sb->s_blocksize) +#define DIRECT_BLOCK(inode) \ + ((long) \ + ((inode->i_size + inode->i_sb->s_blocksize - 1) >> \ + inode->i_sb->s_blocksize_bits)) #define INDIRECT_BLOCK(inode,offset) ((int)DIRECT_BLOCK(inode) - offset) #define DINDIRECT_BLOCK(inode,offset) \ (INDIRECT_BLOCK(inode,offset) / addr_per_block) diff -urN 2.2.17pre19/fs/fat/dir.c 2.2.17pre19aa2/fs/fat/dir.c --- 2.2.17pre19/fs/fat/dir.c Tue Jun 13 03:48:14 2000 +++ 2.2.17pre19aa2/fs/fat/dir.c Sat Aug 26 17:13:51 2000 @@ -313,7 +313,7 @@ /* Fake . and .. for the root directory. */ if (inode->i_ino == MSDOS_ROOT_INO) { while (cpos < 2) { - if (filldir(dirent, "..", cpos+1, cpos, MSDOS_ROOT_INO) < 0) + if (filldir(dirent, "..", cpos+1, cpos, MSDOS_ROOT_INO, DT_DIR) < 0) return 0; cpos++; filp->f_pos++; @@ -456,7 +456,8 @@ if (!long_slots||shortnames) { if (both) bufname[i] = '\0'; - if (filldir(dirent, bufname, i, *furrfu, inum) < 0) + if (filldir(dirent, bufname, i, *furrfu, inum, + (de->attr & ATTR_DIR) ? 
DT_DIR : DT_REG) < 0) goto FillFailed; } else { char longname[275]; @@ -467,7 +468,8 @@ memcpy(&longname[long_len+1], bufname, i); long_len += i; } - if (filldir(dirent, longname, long_len, *furrfu, inum) < 0) + if (filldir(dirent, longname, long_len, *furrfu, inum, + (de->attr & ATTR_DIR) ? DT_DIR : DT_REG) < 0) goto FillFailed; } @@ -497,7 +499,8 @@ const char * name, int name_len, off_t offset, - ino_t ino) + ino_t ino, + unsigned int d_type) { struct dirent *d1 = (struct dirent *)buf; struct dirent *d2 = d1 + 1; diff -urN 2.2.17pre19/fs/fat/file.c 2.2.17pre19aa2/fs/fat/file.c --- 2.2.17pre19/fs/fat/file.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/fat/file.c Sat Aug 26 17:13:51 2000 @@ -227,7 +227,7 @@ Each time we process one block in bhlist, we replace it by a new prefetch block if needed. */ - PRINTK (("#### ino %ld pos %ld size %ld count %d\n",inode->i_ino,*ppos,inode->i_size,count)); + PRINTK (("#### ino %ld pos %ld size %ld count %d\n",inode->i_ino,*ppos,(u_long)inode->i_size,count)); { /* We must prefetch complete block, so we must @@ -253,7 +253,7 @@ } pre.nolist = 0; PRINTK (("count %d ahead %d nblist %d\n",count,read_ahead[MAJOR(inode->i_dev)],pre.nblist)); - while ((left_in_file = inode->i_size - *ppos) > 0 + while ((left_in_file = (u_long)inode->i_size - *ppos) > 0 && buf < end){ struct buffer_head *bh = pre.bhlist[pre.nolist]; char *data; @@ -451,7 +451,7 @@ void fat_truncate(struct inode *inode) { - int cluster; + int cluster_bytes, cluster_shift; /* Why no return value? Surely the disk could fail... */ if (IS_IMMUTABLE(inode)) @@ -460,8 +460,10 @@ printk("FAT: fat_truncate called though fs is read-only, uhh...\n"); return /* -EROFS */; } - cluster = SECTOR_SIZE*MSDOS_SB(inode->i_sb)->cluster_size; - (void) fat_free(inode,(inode->i_size+(cluster-1))/cluster); + cluster_bytes = SECTOR_SIZE * MSDOS_SB(inode->i_sb)->cluster_size; + cluster_shift = fslog2(cluster_bytes); + (void) fat_free(inode, + (inode->i_size+(cluster_bytes-1)) >> cluster_shift); MSDOS_I(inode)->i_attrs |= ATTR_ARCH; mark_inode_dirty(inode); } diff -urN 2.2.17pre19/fs/fat/inode.c 2.2.17pre19aa2/fs/fat/inode.c --- 2.2.17pre19/fs/fat/inode.c Tue Jun 13 03:48:14 2000 +++ 2.2.17pre19aa2/fs/fat/inode.c Sat Aug 26 17:13:51 2000 @@ -392,8 +392,9 @@ sizeof(struct msdos_dir_entry); } inode->i_blksize = MSDOS_SB(sb)->cluster_size* SECTOR_SIZE; - inode->i_blocks = (inode->i_size+inode->i_blksize-1)/ - inode->i_blksize*MSDOS_SB(sb)->cluster_size; + inode->i_blocks = (((inode->i_size+inode->i_blksize-1) >> + fslog2(inode->i_blksize)) * + MSDOS_SB(sb)->cluster_size); MSDOS_I(inode)->i_logstart = 0; MSDOS_I(inode)->i_attrs = 0; @@ -823,8 +824,9 @@ MSDOS_I(inode)->i_attrs = de->attr & ATTR_UNUSED; /* this is as close to the truth as we can get ... 
*/ inode->i_blksize = MSDOS_SB(sb)->cluster_size*SECTOR_SIZE; - inode->i_blocks = (inode->i_size+inode->i_blksize-1)/ - inode->i_blksize*MSDOS_SB(sb)->cluster_size; + inode->i_blocks = (((inode->i_size+inode->i_blksize-1) >> + fslog2(inode->i_blksize)) * + MSDOS_SB(sb)->cluster_size); inode->i_mtime = inode->i_atime = date_dos2unix(CF_LE_W(de->time),CF_LE_W(de->date)); inode->i_ctime = diff -urN 2.2.17pre19/fs/fcntl.c 2.2.17pre19aa2/fs/fcntl.c --- 2.2.17pre19/fs/fcntl.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/fcntl.c Sat Aug 26 17:13:51 2000 @@ -143,17 +143,11 @@ return 0; } -asmlinkage long sys_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg) -{ - struct file * filp; - long err = -EBADF; - - lock_kernel(); - filp = fget(fd); - if (!filp) - goto out; +static long do_fcntl(unsigned int fd, unsigned int cmd, + unsigned long arg, struct file * filp) +{ + long err = 0; - err = 0; switch (cmd) { case F_DUPFD: err = dupfd(fd, arg); @@ -217,11 +211,60 @@ err = sock_fcntl (filp, cmd, arg); break; } + + return err; +} + +asmlinkage long sys_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg) +{ + struct file * filp; + long err = -EBADF; + + lock_kernel(); + filp = fget(fd); + if (!filp) + goto out; + + err = do_fcntl(fd, cmd, arg, filp); + + fput(filp); +out: + unlock_kernel(); + return err; +} + +#if BITS_PER_LONG == 32 +asmlinkage long sys_fcntl64(unsigned int fd, unsigned int cmd, unsigned long arg) +{ + struct file * filp; + long err = -EBADF; + + lock_kernel(); + filp = fget(fd); + if (!filp) + goto out; + + switch (cmd) { + case F_GETLK64: + err = fcntl_getlk64(fd, (struct flock64 *) arg); + break; + case F_SETLK64: + err = fcntl_setlk64(fd, cmd, (struct flock64 *) arg); + break; + case F_SETLKW64: + err = fcntl_setlk64(fd, cmd, (struct flock64 *) arg); + break; + default: + err = do_fcntl(fd, cmd, arg, filp); + break; + } + fput(filp); out: unlock_kernel(); return err; } +#endif static void send_sigio(struct fown_struct *fown, struct fasync_struct *fa) { diff -urN 2.2.17pre19/fs/hfs/dir_cap.c 2.2.17pre19aa2/fs/hfs/dir_cap.c --- 2.2.17pre19/fs/hfs/dir_cap.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/hfs/dir_cap.c Sat Aug 26 17:13:51 2000 @@ -243,7 +243,7 @@ if (filp->f_pos == 0) { /* Entry 0 is for "." 
*/ - if (filldir(dirent, DOT->Name, DOT_LEN, 0, dir->i_ino)) { + if (filldir(dirent, DOT->Name, DOT_LEN, 0, dir->i_ino, DT_DIR)) { return 0; } filp->f_pos = 1; @@ -260,7 +260,7 @@ } if (filldir(dirent, DOT_DOT->Name, - DOT_DOT_LEN, 1, ntohl(cnid))) { + DOT_DOT_LEN, 1, ntohl(cnid), DT_DIR)) { return 0; } filp->f_pos = 2; @@ -287,7 +287,7 @@ len = hfs_namein(dir, tmp_name, &((struct hfs_cat_key *)brec.key)->CName); if (filldir(dirent, tmp_name, len, - filp->f_pos, ino)) { + filp->f_pos, ino, DT_UNKNOWN)) { hfs_cat_close(entry, &brec); return 0; } @@ -303,7 +303,7 @@ /* In root dir last-2 entry is for ".rootinfo" */ if (filldir(dirent, DOT_ROOTINFO->Name, DOT_ROOTINFO_LEN, filp->f_pos, - ntohl(entry->cnid) | HFS_CAP_FNDR)) { + ntohl(entry->cnid) | HFS_CAP_FNDR, DT_UNKNOWN)) { return 0; } } @@ -315,7 +315,7 @@ /* In normal dirs last-1 entry is for ".finderinfo" */ if (filldir(dirent, DOT_FINDERINFO->Name, DOT_FINDERINFO_LEN, filp->f_pos, - ntohl(entry->cnid) | HFS_CAP_FDIR)) { + ntohl(entry->cnid) | HFS_CAP_FDIR, DT_UNKNOWN)) { return 0; } } @@ -327,7 +327,7 @@ /* In normal dirs last entry is for ".resource" */ if (filldir(dirent, DOT_RESOURCE->Name, DOT_RESOURCE_LEN, filp->f_pos, - ntohl(entry->cnid) | HFS_CAP_RDIR)) { + ntohl(entry->cnid) | HFS_CAP_RDIR, DT_UNKNOWN)) { return 0; } } diff -urN 2.2.17pre19/fs/hfs/dir_dbl.c 2.2.17pre19aa2/fs/hfs/dir_dbl.c --- 2.2.17pre19/fs/hfs/dir_dbl.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/hfs/dir_dbl.c Sat Aug 26 17:13:51 2000 @@ -206,7 +206,7 @@ if (filp->f_pos == 0) { /* Entry 0 is for "." */ - if (filldir(dirent, DOT->Name, DOT_LEN, 0, dir->i_ino)) { + if (filldir(dirent, DOT->Name, DOT_LEN, 0, dir->i_ino, DT_DIR)) { return 0; } filp->f_pos = 1; @@ -215,7 +215,7 @@ if (filp->f_pos == 1) { /* Entry 1 is for ".." */ if (filldir(dirent, DOT_DOT->Name, DOT_DOT_LEN, 1, - hfs_get_hl(entry->key.ParID))) { + hfs_get_hl(entry->key.ParID), DT_DIR)) { return 0; } filp->f_pos = 2; @@ -252,7 +252,7 @@ &((struct hfs_cat_key *)brec.key)->CName); } - if (filldir(dirent, tmp_name, len, filp->f_pos, ino)) { + if (filldir(dirent, tmp_name, len, filp->f_pos, ino, DT_UNKNOWN)) { hfs_cat_close(entry, &brec); return 0; } @@ -266,7 +266,7 @@ /* In root dir last entry is for "%RootInfo" */ if (filldir(dirent, PCNT_ROOTINFO->Name, PCNT_ROOTINFO_LEN, filp->f_pos, - ntohl(entry->cnid) | HFS_DBL_HDR)) { + ntohl(entry->cnid) | HFS_DBL_HDR, DT_UNKNOWN)) { return 0; } } diff -urN 2.2.17pre19/fs/hfs/dir_nat.c 2.2.17pre19aa2/fs/hfs/dir_nat.c --- 2.2.17pre19/fs/hfs/dir_nat.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/hfs/dir_nat.c Sat Aug 26 17:13:51 2000 @@ -231,7 +231,7 @@ if (filp->f_pos == 0) { /* Entry 0 is for "." 
*/ - if (filldir(dirent, DOT->Name, DOT_LEN, 0, dir->i_ino)) { + if (filldir(dirent, DOT->Name, DOT_LEN, 0, dir->i_ino, DT_DIR)) { return 0; } filp->f_pos = 1; @@ -248,7 +248,7 @@ } if (filldir(dirent, DOT_DOT->Name, - DOT_DOT_LEN, 1, ntohl(cnid))) { + DOT_DOT_LEN, 1, ntohl(cnid), DT_DIR)) { return 0; } filp->f_pos = 2; @@ -275,7 +275,7 @@ len = hfs_namein(dir, tmp_name, &((struct hfs_cat_key *)brec.key)->CName); if (filldir(dirent, tmp_name, len, - filp->f_pos, ino)) { + filp->f_pos, ino, DT_UNKNOWN)) { hfs_cat_close(entry, &brec); return 0; } @@ -290,14 +290,14 @@ /* In normal dirs entry 2 is for ".AppleDouble" */ if (filldir(dirent, DOT_APPLEDOUBLE->Name, DOT_APPLEDOUBLE_LEN, filp->f_pos, - ntohl(entry->cnid) | HFS_NAT_HDIR)) { + ntohl(entry->cnid) | HFS_NAT_HDIR, DT_UNKNOWN)) { return 0; } } else if (type == HFS_NAT_HDIR) { /* In .AppleDouble entry 2 is for ".Parent" */ if (filldir(dirent, DOT_PARENT->Name, DOT_PARENT_LEN, filp->f_pos, - ntohl(entry->cnid) | HFS_NAT_HDR)) { + ntohl(entry->cnid) | HFS_NAT_HDR, DT_UNKNOWN)) { return 0; } } @@ -310,7 +310,7 @@ (type == HFS_NAT_HDIR)) { if (filldir(dirent, ROOTINFO->Name, ROOTINFO_LEN, filp->f_pos, - ntohl(entry->cnid) | HFS_NAT_HDR)) { + ntohl(entry->cnid) | HFS_NAT_HDR, DT_UNKNOWN)) { return 0; } } diff -urN 2.2.17pre19/fs/hpfs/hpfs_fs.c 2.2.17pre19aa2/fs/hpfs/hpfs_fs.c --- 2.2.17pre19/fs/hpfs/hpfs_fs.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/hpfs/hpfs_fs.c Sat Aug 26 17:13:51 2000 @@ -1376,13 +1376,13 @@ break; case 0: - if (filldir(dirent, ".", 1, filp->f_pos, inode->i_ino) < 0) + if (filldir(dirent, ".", 1, filp->f_pos, inode->i_ino, DT_DIR) < 0) break; filp->f_pos = -1; /* fall through */ case -1: - if (filldir(dirent, "..", 2, filp->f_pos, inode->i_hpfs_parent_dir) < 0) + if (filldir(dirent, "..", 2, filp->f_pos, inode->i_hpfs_parent_dir, DT_DIR) < 0) break; filp->f_pos = 1; /* fall through */ @@ -1402,7 +1402,7 @@ else ino = file_ino(de->fnode); brelse4(&qbh); - if (filldir(dirent, tempname, namelen, old_pos, ino) < 0) { + if (filldir(dirent, tempname, namelen, old_pos, ino, DT_UNKNOWN) < 0) { filp->f_pos = old_pos; break; } diff -urN 2.2.17pre19/fs/inode.c 2.2.17pre19aa2/fs/inode.c --- 2.2.17pre19/fs/inode.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/inode.c Sat Aug 26 17:13:48 2000 @@ -435,7 +435,7 @@ * This is the externally visible routine for * inode memory management. */ -void free_inode_memory(int goal) +void free_inode_memory(void) { spin_lock(&inode_lock); free_inodes(); diff -urN 2.2.17pre19/fs/iobuf.c 2.2.17pre19aa2/fs/iobuf.c --- 2.2.17pre19/fs/iobuf.c Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/fs/iobuf.c Sat Aug 26 17:13:50 2000 @@ -0,0 +1,236 @@ +/* + * iobuf.c + * + * Keep track of the general-purpose IO-buffer structures used to track + * abstract kernel-space io buffers. 
+ * + */ + +#include <linux/iobuf.h> +#include <linux/malloc.h> +#include <linux/slab.h> +#include <linux/bigmem.h> + +static kmem_cache_t *kiobuf_cachep; + +void __init kiobuf_init(void) +{ + kiobuf_cachep = kmem_cache_create("kiobuf", + sizeof(struct kiobuf), + 0, + SLAB_HWCACHE_ALIGN, NULL, NULL); + if(!kiobuf_cachep) + panic("Cannot create kernel iobuf cache\n"); +} + + +int alloc_kiovec(int nr, struct kiobuf **bufp) +{ + int i; + struct kiobuf *iobuf; + + for (i = 0; i < nr; i++) { + iobuf = kmem_cache_alloc(kiobuf_cachep, SLAB_KERNEL); + if (!iobuf) { + free_kiovec(i, bufp); + return -ENOMEM; + } + + memset(iobuf, 0, sizeof(*iobuf)); + iobuf->array_len = KIO_STATIC_PAGES; + iobuf->pagelist = iobuf->page_array; + iobuf->maplist = iobuf->map_array; + iobuf->bouncelist = iobuf->bounce_array; + *bufp++ = iobuf; + } + + return 0; +} + +void clear_kiobuf_bounce_pages(struct kiobuf *iobuf) +{ + int i; + + if (!iobuf->bounced) + return; + + for (i = 0; i < iobuf->nr_pages; i++) { + unsigned long page = iobuf->bouncelist[i]; + if (page) + free_page(page); + } + iobuf->bounced = 0; +} + +void free_kiovec(int nr, struct kiobuf **bufp) +{ + struct kiobuf *iobuf; + int i; + + for (i = 0; i < nr; i++) { + iobuf = bufp[i]; + clear_kiobuf_bounce_pages(iobuf); + if (iobuf->array_len > KIO_STATIC_PAGES) { + kfree (iobuf->pagelist); + } + kmem_cache_free(kiobuf_cachep, bufp[i]); + } +} + +int expand_kiobuf(struct kiobuf *iobuf, int wanted) +{ + unsigned long * pagelist, * bouncelist; + struct page ** maplist; + + if (iobuf->array_len >= wanted) + return 0; + + /* + * kmalloc enough space for the page, map and bounce lists all + * at once. + */ + pagelist = (unsigned long *) + kmalloc(3 * wanted * sizeof(unsigned long), GFP_KERNEL); + if (!pagelist) + return -ENOMEM; + + /* Did it grow while we waited? */ + if (iobuf->array_len >= wanted) { + kfree(pagelist); + return 0; + } + + maplist = (struct page **) (pagelist + wanted); + bouncelist = pagelist + 2 * wanted; + + memcpy (pagelist, iobuf->pagelist, + iobuf->array_len * sizeof(unsigned long)); + memcpy (maplist, iobuf->maplist, + iobuf->array_len * sizeof(struct page **)); + memcpy (bouncelist, iobuf->bouncelist, + iobuf->array_len * sizeof(unsigned long)); + + if (iobuf->array_len > KIO_STATIC_PAGES) + kfree (iobuf->pagelist); + + iobuf->pagelist = pagelist; + iobuf->maplist = maplist; + iobuf->bouncelist = bouncelist; + iobuf->array_len = wanted; + return 0; +} + + +/* + * Test whether a given page from the bounce buffer matches the given + * gfp_mask. Return true if a bounce buffer is required for this + * page. + */ + +static inline int test_bounce_page(unsigned long page, + struct page * map, + int gfp_mask) +{ + /* Unmapped pages from PCI memory or BIGMEM pages always need a + * bounce buffer unless the caller is prepared to accept + * GFP_BIGMEM pages.
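
/*
 * For orientation, a minimal caller of the kiovec API defined above --
 * hypothetical, for illustration only.  alloc_kiovec() frees its own
 * partial work on failure, so the caller never sees a half-built set:
 */
static int kiovec_demo(void)
{
	struct kiobuf *iov[1];
	int err;

	err = alloc_kiovec(1, iov);
	if (err)
		return err;			/* -ENOMEM */
	err = expand_kiobuf(iov[0], 256);	/* beyond KIO_STATIC_PAGES */
	if (!err) {
		/* ... fill pagelist/maplist and drive the I/O here ... */
	}
	free_kiovec(1, iov);
	return err;
}
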
*/ + + if (!map || PageBIGMEM(map) ) + /* Careful, the following must return the right value + * even if CONFIG_BIGMEM is not set */ + return !(gfp_mask & __GFP_BIGMEM); + + /* A DMA-able page never needs a bounce buffer */ + if (PageDMA(map)) + return 0; + + /* Otherwise it is a non-ISA-DMA-capable page and needs bounce + * buffers if GFP_DMA is requested */ + return gfp_mask & __GFP_DMA; +} + +int setup_kiobuf_bounce_pages(struct kiobuf *iobuf, int gfp_mask) +{ + int i; + + clear_kiobuf_bounce_pages(iobuf); + + for (i = 0; i < iobuf->nr_pages; i++) { + struct page *map = iobuf->maplist[i]; + unsigned long page = iobuf->pagelist[i]; + unsigned long bounce_page; + + if (!test_bounce_page(page, map, gfp_mask)) { + iobuf->bouncelist[i] = 0; + continue; + } + + bounce_page = __get_free_page(gfp_mask); + if (!bounce_page) + goto error; + + iobuf->bouncelist[i] = bounce_page; + iobuf->bounced = 1; + } + return 0; + + error: + clear_kiobuf_bounce_pages(iobuf); + return -ENOMEM; +} + +/* + * Copy a bounce buffer. For completion of partially-failed read IOs, + * we need to be able to place an upper limit on the data successfully + * transferred from bounce buffers to the user's own buffers. + */ + +void kiobuf_copy_bounce(struct kiobuf *iobuf, int direction, int max) +{ + int i; + int offset, length; + + if (!iobuf->bounced) + return; + + offset = iobuf->offset; + length = iobuf->length; + if (max >= 0 && length > max) + length = max; + + i = 0; + + if (offset > PAGE_SIZE) { + i = (offset >> PAGE_SHIFT); + offset &= ~PAGE_MASK; + } + + for (; i < iobuf->nr_pages && length > 0; i++) { + unsigned long page = iobuf->pagelist[i]; + unsigned long bounce_page = iobuf->bouncelist[i]; + unsigned long kin, kout; + int pagelen = length; + + if (bounce_page) { + if ((pagelen+offset) > PAGE_SIZE) + pagelen = PAGE_SIZE - offset; + + if (direction == COPY_TO_BOUNCE) { + kin = kmap(page, KM_READ); + kout = kmap(bounce_page, KM_WRITE); + } else { + kin = kmap(bounce_page, KM_READ); + kout = kmap(page, KM_WRITE); + } + + memcpy((char *) (kout+offset), + (char *) (kin+offset), + pagelen); + kunmap(kout, KM_WRITE); + kunmap(kin, KM_READ); + } + + length -= pagelen; + offset = 0; + } +} diff -urN 2.2.17pre19/fs/isofs/dir.c 2.2.17pre19aa2/fs/isofs/dir.c --- 2.2.17pre19/fs/isofs/dir.c Tue Aug 22 14:53:59 2000 +++ 2.2.17pre19aa2/fs/isofs/dir.c Sat Aug 26 17:13:51 2000 @@ -220,7 +220,7 @@ /* Handle the case of the '.' directory */ if (de->name_len[0] == 1 && de->name[0] == 0) { - if (filldir(dirent, ".", 1, filp->f_pos, inode->i_ino) < 0) + if (filldir(dirent, ".", 1, filp->f_pos, inode->i_ino, DT_DIR) < 0) break; filp->f_pos += de_len; continue; @@ -231,7 +231,7 @@ /* Handle the case of the '..' 
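
/*
 * The calling sequence the two helpers above are built for, sketched
 * for a write; COPY_TO_BOUNCE is the direction used by
 * kiobuf_copy_bounce(), and max == -1 means "copy the whole
 * iobuf->length":
 *
 *	if (setup_kiobuf_bounce_pages(iobuf, GFP_KERNEL))
 *		return -ENOMEM;
 *	kiobuf_copy_bounce(iobuf, COPY_TO_BOUNCE, -1);
 *	... submit bouncelist[i] ? bouncelist[i] : pagelist[i] ...
 *	clear_kiobuf_bounce_pages(iobuf);
 *
 * On a read the copy runs the other way after the I/O completes, with
 * max capped to the number of bytes that actually transferred.
 */
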
directory */ if (de->name_len[0] == 1 && de->name[0] == 1) { inode_number = filp->f_dentry->d_parent->d_inode->i_ino; - if (filldir(dirent, "..", 2, filp->f_pos, inode_number) < 0) + if (filldir(dirent, "..", 2, filp->f_pos, inode_number, DT_DIR) < 0) break; filp->f_pos += de_len; continue; @@ -276,7 +276,7 @@ } } if (len > 0) { - if (filldir(dirent, p, len, filp->f_pos, inode_number) < 0) + if (filldir(dirent, p, len, filp->f_pos, inode_number, DT_UNKNOWN) < 0) break; } filp->f_pos += de_len; diff -urN 2.2.17pre19/fs/isofs/inode.c 2.2.17pre19aa2/fs/isofs/inode.c --- 2.2.17pre19/fs/isofs/inode.c Tue Aug 22 14:54:00 2000 +++ 2.2.17pre19aa2/fs/isofs/inode.c Sat Aug 26 17:13:51 2000 @@ -898,7 +898,8 @@ int isofs_bmap(struct inode * inode,int block) { - off_t b_off, offset, size; + loff_t b_off; + unsigned offset, size; struct inode *ino; unsigned int firstext; unsigned long nextino; @@ -909,7 +910,7 @@ return 0; } - b_off = block << ISOFS_BUFFER_BITS(inode); + b_off = (loff_t)block << ISOFS_BUFFER_BITS(inode); /* * If we are beyond the end of this file, don't give out any @@ -917,7 +918,7 @@ */ if( b_off >= inode->i_size ) { - off_t max_legal_read_offset; + loff_t max_legal_read_offset; /* * If we are *way* beyond the end of the file, print a message. @@ -928,20 +929,21 @@ * I/O errors. */ max_legal_read_offset = (inode->i_size + PAGE_SIZE - 1) - & ~(PAGE_SIZE - 1); + & ~(loff_t)(PAGE_SIZE - 1); if( b_off >= max_legal_read_offset ) { printk("_isofs_bmap: block>= EOF(%d, %ld)\n", block, - inode->i_size); + (u_long)((inode->i_size >> ISOFS_BUFFER_BITS(inode)) + + ((inode->i_size & ((1 << ISOFS_BUFFER_BITS(inode))-1)) != 0))); } return 0; } offset = 0; firstext = inode->u.isofs_i.i_first_extent; - size = inode->u.isofs_i.i_section_size; - nextino = inode->u.isofs_i.i_next_section_ino; + size = inode->u.isofs_i.i_section_size; + nextino = inode->u.isofs_i.i_next_section_ino; #ifdef DEBUG printk("first inode: inode=%x nextino=%x firstext=%u size=%lu\n", inode->i_ino, nextino, firstext, size); @@ -1180,7 +1182,7 @@ #ifdef DEBUG printk("Get inode %x: %d %d: %d\n",inode->i_ino, block, - ((int)pnt) & 0x3ff, inode->i_size); + ((int)pnt) & 0x3ff, (u_long)inode->i_size); #endif inode->i_mtime = inode->i_atime = inode->i_ctime = diff -urN 2.2.17pre19/fs/lockd/svclock.c 2.2.17pre19aa2/fs/lockd/svclock.c --- 2.2.17pre19/fs/lockd/svclock.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/lockd/svclock.c Sat Aug 26 17:13:51 2000 @@ -94,13 +94,15 @@ struct file_lock *fl; dprintk("lockd: nlmsvc_lookup_block f=%p pd=%d %ld-%ld ty=%d\n", - file, lock->fl.fl_pid, lock->fl.fl_start, - lock->fl.fl_end, lock->fl.fl_type); + file, lock->fl.fl_pid, + (u_long)lock->fl.fl_start, (u_long)lock->fl.fl_end, + lock->fl.fl_type); for (head = &nlm_blocked; (block = *head); head = &block->b_next) { fl = &block->b_call.a_args.lock.fl; dprintk(" check f=%p pd=%d %ld-%ld ty=%d\n", - block->b_file, fl->fl_pid, fl->fl_start, - fl->fl_end, fl->fl_type); + block->b_file, fl->fl_pid, + (u_long)fl->fl_start, (u_long)fl->fl_end, + fl->fl_type); if (block->b_file == file && nlm_compare_locks(fl, &lock->fl)) { if (remove) *head = block->b_next; @@ -286,8 +288,8 @@ file->f_file.f_dentry->d_inode->i_dev, file->f_file.f_dentry->d_inode->i_ino, lock->fl.fl_type, lock->fl.fl_pid, - lock->fl.fl_start, - lock->fl.fl_end, + (u_long)lock->fl.fl_start, + (u_long)lock->fl.fl_end, wait); /* Lock file against concurrent access */ @@ -359,12 +361,13 @@ file->f_file.f_dentry->d_inode->i_dev, file->f_file.f_dentry->d_inode->i_ino, lock->fl.fl_type, - 
lock->fl.fl_start, - lock->fl.fl_end); + (u_long)lock->fl.fl_start, + (u_long)lock->fl.fl_end); if ((fl = posix_test_lock(&file->f_file, &lock->fl)) != NULL) { dprintk("lockd: conflicting lock(ty=%d, %ld-%ld)\n", - fl->fl_type, fl->fl_start, fl->fl_end); + fl->fl_type, + (u_long)fl->fl_start, (u_long)fl->fl_end); conflock->caller = "somehost"; /* FIXME */ conflock->oh.len = 0; /* don't return OH info */ conflock->fl = *fl; @@ -390,8 +393,8 @@ file->f_file.f_dentry->d_inode->i_dev, file->f_file.f_dentry->d_inode->i_ino, lock->fl.fl_pid, - lock->fl.fl_start, - lock->fl.fl_end); + (u_long)lock->fl.fl_start, + (u_long)lock->fl.fl_end); /* First, cancel any lock that might be there */ nlmsvc_cancel_blocked(file, lock); @@ -418,8 +421,8 @@ file->f_file.f_dentry->d_inode->i_dev, file->f_file.f_dentry->d_inode->i_ino, lock->fl.fl_pid, - lock->fl.fl_start, - lock->fl.fl_end); + (u_long)lock->fl.fl_start, + (u_long)lock->fl.fl_end); down(&file->f_sema); if ((block = nlmsvc_lookup_block(file, lock, 1)) != NULL) diff -urN 2.2.17pre19/fs/lockd/xdr.c 2.2.17pre19aa2/fs/lockd/xdr.c --- 2.2.17pre19/fs/lockd/xdr.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/lockd/xdr.c Sat Aug 26 17:13:51 2000 @@ -142,7 +142,7 @@ fl->fl_pid = ntohl(*p++); fl->fl_flags = FL_POSIX; fl->fl_type = F_RDLCK; /* as good as anything else */ - fl->fl_start = ntohl(*p++); + fl->fl_start = (u_long)ntohl(*p++); len = ntohl(*p++); if (len == 0 || (fl->fl_end = fl->fl_start + len - 1) < 0) fl->fl_end = NLM_OFFSET_MAX; @@ -163,11 +163,11 @@ return NULL; *p++ = htonl(fl->fl_pid); - *p++ = htonl(lock->fl.fl_start); + *p++ = htonl((u_long)lock->fl.fl_start); if (lock->fl.fl_end == NLM_OFFSET_MAX) *p++ = xdr_zero; else - *p++ = htonl(lock->fl.fl_end - lock->fl.fl_start + 1); + *p++ = htonl((u_long)(lock->fl.fl_end - lock->fl.fl_start + 1)); return p; } @@ -192,11 +192,11 @@ if (!(p = xdr_encode_netobj(p, &resp->lock.oh))) return 0; - *p++ = htonl(fl->fl_start); + *p++ = htonl((u_long)fl->fl_start); if (fl->fl_end == NLM_OFFSET_MAX) *p++ = xdr_zero; else - *p++ = htonl(fl->fl_end - fl->fl_start + 1); + *p++ = htonl((u_long)(fl->fl_end - fl->fl_start + 1)); } return p; @@ -425,7 +425,7 @@ fl->fl_flags = FL_POSIX; fl->fl_type = excl? F_WRLCK : F_RDLCK; - fl->fl_start = ntohl(*p++); + fl->fl_start = (u_long)ntohl(*p++); len = ntohl(*p++); if (len == 0 || (fl->fl_end = fl->fl_start + len - 1) < 0) fl->fl_end = NLM_OFFSET_MAX; diff -urN 2.2.17pre19/fs/locks.c 2.2.17pre19aa2/fs/locks.c --- 2.2.17pre19/fs/locks.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/locks.c Sat Aug 26 17:13:51 2000 @@ -111,12 +111,15 @@ #include -#define OFFSET_MAX ((off_t)LONG_MAX) /* FIXME: move elsewhere? */ +#define OFFSET_MAX ((loff_t)((~0ULL)>>1)) /* FIXME: move elsewhere? 
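
/*
 * Two idioms recur in the lockd and locks.c hunks for printing the now
 * 64-bit fl_start/fl_end: truncate through a u_long cast where the
 * value is known to fit (NLM offsets are still 32 bits on the wire),
 * or print the full width with printk's long-long qualifier:
 *
 *	dprintk("%ld-%ld", (u_long)fl->fl_start, (u_long)fl->fl_end);
 *	printk("%Ld-%Ld", fl->fl_start, fl->fl_end);
 */
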
*/ +#define OFFT_OFFSET_MAX ((off_t)((~0UL)>>1)) static int flock_make_lock(struct file *filp, struct file_lock *fl, unsigned int cmd); -static int posix_make_lock(struct file *filp, struct file_lock *fl, +static int flock_to_posix_lock(struct file *filp, struct file_lock *fl, struct flock *l); +static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl, + struct flock64 *l); static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl); static int posix_locks_conflict(struct file_lock *caller_fl, @@ -195,7 +198,7 @@ if (waiter->fl_prevblock) { printk(KERN_ERR "locks_insert_block: remove duplicated lock " - "(pid=%d %ld-%ld type=%d)\n", + "(pid=%d %Ld-%Ld type=%d)\n", waiter->fl_pid, waiter->fl_start, waiter->fl_end, waiter->fl_type); locks_delete_block(waiter->fl_prevblock, waiter); @@ -342,7 +345,7 @@ if (!filp->f_dentry || !filp->f_dentry->d_inode) goto out_putf; - if (!posix_make_lock(filp, &file_lock, &flock)) + if (!flock_to_posix_lock(filp, &file_lock, &flock)) goto out_putf; if (filp->f_op->lock) { @@ -361,8 +364,21 @@ flock.l_type = F_UNLCK; if (fl != NULL) { flock.l_pid = fl->fl_pid; +#if BITS_PER_LONG == 32 + /* + * Make sure we can represent the posix lock via + * legacy 32bit flock. + */ + error = -EOVERFLOW; + if (fl->fl_end > OFFT_OFFSET_MAX) + goto out_putf; +#endif flock.l_start = fl->fl_start; +#if BITS_PER_LONG == 32 + flock.l_len = fl->fl_end == OFFT_OFFSET_MAX ? 0 : +#else flock.l_len = fl->fl_end == OFFSET_MAX ? 0 : +#endif fl->fl_end - fl->fl_start + 1; flock.l_whence = 0; flock.l_type = fl->fl_type; @@ -425,7 +441,167 @@ } error = -EINVAL; - if (!posix_make_lock(filp, &file_lock, &flock)) + if (!flock_to_posix_lock(filp, &file_lock, &flock)) + goto out_putf; + + error = -EBADF; + switch (flock.l_type) { + case F_RDLCK: + if (!(filp->f_mode & FMODE_READ)) + goto out_putf; + break; + case F_WRLCK: + if (!(filp->f_mode & FMODE_WRITE)) + goto out_putf; + break; + case F_UNLCK: + break; + case F_SHLCK: + case F_EXLCK: +#ifdef __sparc__ +/* warn a bit for now, but don't overdo it */ +{ + static int count = 0; + if (!count) { + count=1; + printk(KERN_WARNING + "fcntl_setlk() called by process %d (%s) with broken flock() emulation\n", + current->pid, current->comm); + } +} + if (!(filp->f_mode & 3)) + goto out_putf; + break; +#endif + default: + error = -EINVAL; + goto out_putf; + } + + if (filp->f_op->lock != NULL) { + error = filp->f_op->lock(filp, cmd, &file_lock); + if (error < 0) + goto out_putf; + } + error = posix_lock_file(filp, &file_lock, cmd == F_SETLKW); + +out_putf: + fput(filp); +out: + return error; +} + +#if BITS_PER_LONG == 32 +/* Report the first existing lock that would conflict with l. + * This implements the F_GETLK command of fcntl(). 
+ */ +int fcntl_getlk64(unsigned int fd, struct flock64 *l) +{ + struct file *filp; + struct file_lock *fl,file_lock; + struct flock64 flock; + int error; + + error = -EFAULT; + if (copy_from_user(&flock, l, sizeof(flock))) + goto out; + error = -EINVAL; + if ((flock.l_type != F_RDLCK) && (flock.l_type != F_WRLCK)) + goto out; + + error = -EBADF; + filp = fget(fd); + if (!filp) + goto out; + + error = -EINVAL; + if (!filp->f_dentry || !filp->f_dentry->d_inode) + goto out_putf; + + if (!flock64_to_posix_lock(filp, &file_lock, &flock)) + goto out_putf; + + if (filp->f_op->lock) { + error = filp->f_op->lock(filp, F_GETLK, &file_lock); + if (error < 0) + goto out_putf; + else if (error == LOCK_USE_CLNT) + /* Bypass for NFS with no locking - 2.0.36 compat */ + fl = posix_test_lock(filp, &file_lock); + else + fl = (file_lock.fl_type == F_UNLCK ? NULL : &file_lock); + } else { + fl = posix_test_lock(filp, &file_lock); + } + + flock.l_type = F_UNLCK; + if (fl != NULL) { + flock.l_pid = fl->fl_pid; + flock.l_start = fl->fl_start; + flock.l_len = fl->fl_end == OFFSET_MAX ? 0 : + fl->fl_end - fl->fl_start + 1; + flock.l_whence = 0; + flock.l_type = fl->fl_type; + } + error = -EFAULT; + if (!copy_to_user(l, &flock, sizeof(flock))) + error = 0; + +out_putf: + fput(filp); +out: + return error; +} + +/* Apply the lock described by l to an open file descriptor. + * This implements both the F_SETLK and F_SETLKW commands of fcntl(). + */ +int fcntl_setlk64(unsigned int fd, unsigned int cmd, struct flock64 *l) +{ + struct file *filp; + struct file_lock file_lock; + struct flock64 flock; + struct dentry * dentry; + struct inode *inode; + int error; + + /* + * This might block, so we do it before checking the inode. + */ + error = -EFAULT; + if (copy_from_user(&flock, l, sizeof(flock))) + goto out; + + /* Get arguments and validate them ... + */ + + error = -EBADF; + filp = fget(fd); + if (!filp) + goto out; + + error = -EINVAL; + if (!(dentry = filp->f_dentry)) + goto out_putf; + if (!(inode = dentry->d_inode)) + goto out_putf; + + /* Don't allow mandatory locks on files that may be memory mapped + * and shared. + */ + if (IS_MANDLOCK(inode) && + (inode->i_mode & (S_ISGID | S_IXGRP)) == S_ISGID && + inode->i_mmap) { + struct vm_area_struct *vma = inode->i_mmap; + error = -EAGAIN; + do { + if (vma->vm_flags & VM_MAYSHARE) + goto out_putf; + } while ((vma = vma->vm_next_share) != NULL); + } + + error = -EINVAL; + if (!flock64_to_posix_lock(filp, &file_lock, &flock)) goto out_putf; error = -EBADF; @@ -474,6 +650,7 @@ out: return error; } +#endif /* BITS_PER_LONG == 32 */ /* * This function is called when the file is being removed @@ -653,10 +830,60 @@ /* Verify a "struct flock" and copy it to a "struct file_lock" as a POSIX * style lock. 
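
/*
 * Userspace reaches the pair above through fcntl() with F_GETLK64/
 * F_SETLK64/F_SETLKW64 and a struct flock64 -- a sketch, assuming a
 * libc that exposes the 64-bit commands (needs <fcntl.h> and
 * <string.h>).  l_len == 0 still means "to end of file", which the
 * kernel records as OFFSET_MAX:
 */
static int lock_past_2g(int fd)
{
	struct flock64 fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type   = F_WRLCK;
	fl.l_whence = 0;		/* SEEK_SET */
	fl.l_start  = 3ULL << 30;	/* 3GB, beyond any 32-bit off_t */
	fl.l_len    = 0;		/* lock to end of file */
	return fcntl(fd, F_SETLK64, &fl);
}
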
*/ -static int posix_make_lock(struct file *filp, struct file_lock *fl, - struct flock *l) +static int flock_to_posix_lock(struct file *filp, struct file_lock *fl, + struct flock *l) +{ + loff_t start; + + memset(fl, 0, sizeof(*fl)); + + fl->fl_flags = FL_POSIX; + + switch (l->l_type) { + case F_RDLCK: + case F_WRLCK: + case F_UNLCK: + fl->fl_type = l->l_type; + break; + default: + return (0); + } + + switch (l->l_whence) { + case 0: /*SEEK_SET*/ + start = 0; + break; + case 1: /*SEEK_CUR*/ + start = filp->f_pos; + break; + case 2: /*SEEK_END*/ + start = filp->f_dentry->d_inode->i_size; + break; + default: + return (0); + } + + if (((start += l->l_start) < 0) || (l->l_len < 0)) + return (0); + fl->fl_end = start + l->l_len - 1; + if (l->l_len > 0 && fl->fl_end < 0) + return (0); + fl->fl_start = start; /* we record the absolute position */ + if (l->l_len == 0) + fl->fl_end = OFFSET_MAX; + + fl->fl_file = filp; + fl->fl_owner = current->files; + fl->fl_pid = current->pid; + + return (1); +} + +#if BITS_PER_LONG == 32 +static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl, + struct flock64 *l) { - off_t start; + loff_t start; memset(fl, 0, sizeof(*fl)); @@ -701,6 +928,7 @@ return (1); } +#endif /* Verify a call to flock() and fill in a file_lock structure with * an appropriate FLOCK lock. @@ -1207,7 +1435,7 @@ p += sprintf(p, "FLOCK ADVISORY "); } p += sprintf(p, "%s ", (fl->fl_type == F_RDLCK) ? "READ " : "WRITE"); - p += sprintf(p, "%d %s:%ld %ld %ld ", + p += sprintf(p, "%d %s:%ld %Ld %Ld ", fl->fl_pid, kdevname(inode->i_dev), inode->i_ino, fl->fl_start, fl->fl_end); @@ -1271,6 +1499,3 @@ *start = buffer; return (q - buffer); } - - - diff -urN 2.2.17pre19/fs/minix/dir.c 2.2.17pre19aa2/fs/minix/dir.c --- 2.2.17pre19/fs/minix/dir.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/minix/dir.c Sat Aug 26 17:13:51 2000 @@ -82,7 +82,7 @@ de = (struct minix_dir_entry *) (offset + bh->b_data); if (de->inode) { int size = strnlen(de->name, info->s_namelen); - if (filldir(dirent, de->name, size, filp->f_pos, de->inode) < 0) { + if (filldir(dirent, de->name, size, filp->f_pos, de->inode, DT_UNKNOWN) < 0) { brelse(bh); return 0; } diff -urN 2.2.17pre19/fs/minix/file.c 2.2.17pre19aa2/fs/minix/file.c --- 2.2.17pre19/fs/minix/file.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/minix/file.c Sat Aug 26 17:13:51 2000 @@ -70,7 +70,7 @@ size_t count, loff_t *ppos) { struct inode * inode = filp->f_dentry->d_inode; - off_t pos; + loff_t pos; ssize_t written, c; struct buffer_head * bh; char * p; @@ -87,6 +87,24 @@ pos = inode->i_size; else pos = *ppos; + + /* L-F-S spec 2.2.1.27: */ + if (!(filp->f_flags & O_LARGEFILE)) { + if (pos >= 0x7ffffffeULL) /* pos@2G forbidden */ + return -EFBIG; + + if (pos + count >= 0x7fffffffULL) + /* Write only until end of allowed region */ + count = 0x7fffffffULL - pos; + } + /* MINIX i-node file-size can't exceed 4G-1 */ + /* With 1k blocks and triple indirection MINIX can have files + up to 16 GB in size -- filesystem maximum is then 4G*1k = 4T */ + if (pos >= 0xffffffffULL) + return -EFBIG; /* Absolutely too much! */ + if ((pos + count) >= 0x100000000ULL) /* too much to write! 
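
/*
 * The minix checks above are one instance of the LFS rule that every
 * 32-bit write path in this patch repeats: without O_LARGEFILE a write
 * must not start at or grow a file past 2G-2, and a write crossing the
 * limit is shortened rather than failed.  Distilled into a helper
 * (hypothetical, for illustration only):
 */
static long lfs_check(loff_t pos, unsigned long count, int largefile)
{
	if (!largefile) {
		if (pos >= 0x7ffffffeULL)
			return -EFBIG;
		if (pos + count >= 0x7fffffffULL)
			count = 0x7fffffffULL - pos;
	}
	return count;	/* possibly shortened */
}
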
*/ + count = 0xffffffffULL - pos; + written = 0; while (written < count) { bh = minix_getblk(inode,pos/BLOCK_SIZE,1); diff -urN 2.2.17pre19/fs/ncpfs/dir.c 2.2.17pre19aa2/fs/ncpfs/dir.c --- 2.2.17pre19/fs/ncpfs/dir.c Tue Jun 13 03:48:14 2000 +++ 2.2.17pre19aa2/fs/ncpfs/dir.c Sat Aug 26 17:13:51 2000 @@ -447,14 +447,14 @@ result = 0; if (filp->f_pos == 0) { ncp_invalid_dir_cache(inode); - if (filldir(dirent, ".", 1, 0, inode->i_ino) < 0) { + if (filldir(dirent, ".", 1, 0, inode->i_ino, DT_DIR) < 0) { goto finished; } filp->f_pos = 1; } if (filp->f_pos == 1) { if (filldir(dirent, "..", 2, 1, - dentry->d_parent->d_inode->i_ino) < 0) { + dentry->d_parent->d_inode->i_ino, DT_DIR) < 0) { goto finished; } filp->f_pos = 2; @@ -535,7 +535,7 @@ ino = ncp_invent_inos(1); if (filldir(dirent, entry->i.entryName, entry->i.nameLen, - entry->f_pos, ino) < 0) { + entry->f_pos, ino, DT_UNKNOWN) < 0) { break; } if ((inode->i_dev != c_dev) diff -urN 2.2.17pre19/fs/ncpfs/file.c 2.2.17pre19aa2/fs/ncpfs/file.c --- 2.2.17pre19/fs/ncpfs/file.c Thu May 4 13:00:39 2000 +++ 2.2.17pre19aa2/fs/ncpfs/file.c Sat Aug 26 17:13:51 2000 @@ -17,6 +17,7 @@ #include #include #include +#include #include #include "ncplib_kernel.h" @@ -161,7 +162,7 @@ /* First read in as much as possible for each bufsize. */ while (already_read < count) { int read_this_time; - size_t to_read = min(bufsize - (pos % bufsize), + size_t to_read = min(bufsize - (pos & (bufsize-1)), count - already_read); error = ncp_read_bounce(NCP_SERVER(inode), @@ -201,7 +202,7 @@ struct dentry *dentry = file->f_dentry; struct inode *inode = dentry->d_inode; size_t already_written = 0; - off_t pos; + loff_t pos; size_t bufsize; int errno; void* bouncebuffer; @@ -238,12 +239,18 @@ already_written = 0; + /* Maximum file size: 2G-1 */ + if (pos >= 0x7fffffffULL) + return -EFBIG; + if ((pos + count) >= 0x7fffffffULL) + count = 0x7fffffffULL - pos; + bouncebuffer = kmalloc(bufsize, GFP_NFS); if (!bouncebuffer) return -EIO; /* -ENOMEM */ while (already_written < count) { int written_this_time; - size_t to_write = min(bufsize - (pos % bufsize), + size_t to_write = min(bufsize - (pos & (bufsize-1)), count - already_written); if (copy_from_user(bouncebuffer, buf, to_write)) { diff -urN 2.2.17pre19/fs/ncpfs/inode.c 2.2.17pre19aa2/fs/ncpfs/inode.c --- 2.2.17pre19/fs/ncpfs/inode.c Tue Jun 13 03:48:14 2000 +++ 2.2.17pre19aa2/fs/ncpfs/inode.c Sat Aug 26 17:13:51 2000 @@ -131,7 +131,7 @@ } inode->i_blocks = 0; if ((inode->i_size)&&(inode->i_blksize)) { - inode->i_blocks = (inode->i_size-1)/(inode->i_blksize)+1; + inode->i_blocks = ((inode->i_size-1) >> fslog2(inode->i_blksize)) +1; } inode->i_mtime = ncp_date_dos2unix(le16_to_cpu(nwi->modifyTime), @@ -201,8 +201,7 @@ inode->i_blocks = 0; if ((inode->i_blksize != 0) && (inode->i_size != 0)) { - inode->i_blocks = - (inode->i_size - 1) / inode->i_blksize + 1; + inode->i_blocks = ((inode->i_size - 1) >> fslog2(inode->i_blksize)) + 1; } inode->i_mtime = ncp_date_dos2unix(le16_to_cpu(nwi->modifyTime), diff -urN 2.2.17pre19/fs/nfs/dir.c 2.2.17pre19aa2/fs/nfs/dir.c --- 2.2.17pre19/fs/nfs/dir.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/nfs/dir.c Sat Aug 26 17:13:51 2000 @@ -271,7 +271,7 @@ fileid, result); */ - if (filldir(dirent, name, length, cookie, fileid) < 0) + if (filldir(dirent, name, length, cookie, fileid, DT_UNKNOWN) < 0) break; cookie = nextpos; index++; diff -urN 2.2.17pre19/fs/nfs/file.c 2.2.17pre19aa2/fs/nfs/file.c --- 2.2.17pre19/fs/nfs/file.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/nfs/file.c Sat Aug 26 17:13:51 
2000 @@ -114,6 +114,11 @@ dentry->d_parent->d_name.name, dentry->d_name.name, (unsigned long) count, (unsigned long) *ppos); + /* Unconditionally allow only up to 2G files */ + /* FIXME: NFSv3 could allow 64-bit file offsets! */ + if (*ppos >= 0x7ffffffeULL) + return -EOVERFLOW; + result = nfs_revalidate_inode(NFS_DSERVER(dentry), dentry); if (!result) result = generic_file_read(file, buf, count, ppos); @@ -178,6 +183,13 @@ if (result) goto out; + /* Unconditionally allow only up to 2G files */ + /* FIXME: NFSv3 could allow 64-bit file offsets! */ + if (*ppos >= 0x7ffffffeULL) { + result = -EOVERFLOW; + goto out; + } + result = count; if (!count) goto out; @@ -203,7 +215,7 @@ dprintk("NFS: nfs_lock(f=%4x/%ld, t=%x, fl=%x, r=%ld:%ld)\n", inode->i_dev, inode->i_ino, fl->fl_type, fl->fl_flags, - fl->fl_start, fl->fl_end); + (u_long)fl->fl_start, (u_long)fl->fl_end); if (!inode) return -EINVAL; diff -urN 2.2.17pre19/fs/nfs/inode.c 2.2.17pre19aa2/fs/nfs/inode.c --- 2.2.17pre19/fs/nfs/inode.c Tue Jun 13 03:48:14 2000 +++ 2.2.17pre19aa2/fs/nfs/inode.c Sat Aug 26 17:13:51 2000 @@ -673,8 +673,13 @@ sattr.gid = attr->ia_gid; sattr.size = (u32) -1; - if ((attr->ia_valid & ATTR_SIZE) && S_ISREG(inode->i_mode)) - sattr.size = attr->ia_size; + if ((attr->ia_valid & ATTR_SIZE) && S_ISREG(inode->i_mode)) { + /* XXX: should check rlim here */ + if (attr->ia_size < 0 || attr->ia_size >= 0x7FFFFFFF) + return -EFBIG; + sattr.size = (u32)attr->ia_size; + } + sattr.mtime.seconds = sattr.mtime.useconds = (u32) -1; if (attr->ia_valid & ATTR_MTIME) { @@ -702,7 +707,7 @@ */ if (sattr.size != (u32) -1) { if (sattr.size != fattr.size) - printk("nfs_notify_change: sattr=%d, fattr=%d??\n", + printk("nfs_notify_change: sattr=%Ld, fattr=%Ld??\n", sattr.size, fattr.size); inode->i_size = sattr.size; inode->i_mtime = fattr.mtime.seconds; diff -urN 2.2.17pre19/fs/nfs/proc.c 2.2.17pre19aa2/fs/nfs/proc.c --- 2.2.17pre19/fs/nfs/proc.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/nfs/proc.c Sat Aug 26 17:13:51 2000 @@ -110,14 +110,14 @@ int nfs_proc_read(struct nfs_server *server, struct nfs_fh *fhandle, int swap, - unsigned long offset, unsigned int count, - void *buffer, struct nfs_fattr *fattr) + loff_t offset, unsigned int count, + void *buffer, struct nfs_fattr *fattr) { struct nfs_readargs arg = { fhandle, offset, count, buffer }; struct nfs_readres res = { fattr, count }; int status; - dprintk("NFS call read %d @ %ld\n", count, offset); + dprintk("NFS call read %d @ %Ld\n", count, offset); status = rpc_call(server->client, NFSPROC_READ, &arg, &res, swap? NFS_RPC_SWAPFLAGS : 0); dprintk("NFS reply read: %d\n", status); @@ -126,13 +126,13 @@ int nfs_proc_write(struct nfs_server *server, struct nfs_fh *fhandle, int swap, - unsigned long offset, unsigned int count, - const void *buffer, struct nfs_fattr *fattr) + loff_t offset, unsigned int count, + const void *buffer, struct nfs_fattr *fattr) { struct nfs_writeargs arg = { fhandle, offset, count, buffer }; int status; - dprintk("NFS call write %d @ %ld\n", count, offset); + dprintk("NFS call write %d @ %Ld\n", count, offset); status = rpc_call(server->client, NFSPROC_WRITE, &arg, fattr, swap? 
(RPC_TASK_SWAPPER|RPC_TASK_ROOTCREDS) : 0); dprintk("NFS reply read: %d\n", status); diff -urN 2.2.17pre19/fs/nfs/read.c 2.2.17pre19aa2/fs/nfs/read.c --- 2.2.17pre19/fs/nfs/read.c Thu May 4 13:00:40 2000 +++ 2.2.17pre19aa2/fs/nfs/read.c Sat Aug 26 17:13:51 2000 @@ -52,7 +52,7 @@ */ static inline void nfs_readreq_setup(struct nfs_rreq *req, struct nfs_fh *fh, - unsigned long offset, void *buffer, unsigned int rsize) + loff_t offset, void *buffer, unsigned int rsize) { req->ra_args.fh = fh; req->ra_args.offset = offset; @@ -70,12 +70,12 @@ nfs_readpage_sync(struct dentry *dentry, struct inode *inode, struct page *page) { struct nfs_rreq rqst; - unsigned long offset = page->offset; - char *buffer = (char *) page_address(page); - int rsize = NFS_SERVER(inode)->rsize; - int result, refresh = 0; - int count = PAGE_SIZE; - int flags = IS_SWAPFILE(inode)? NFS_RPC_SWAPFLAGS : 0; + loff_t offset = pgoff2loff(page->index); + char *buffer = (char *) page_address(page); + int rsize = NFS_SERVER(inode)->rsize; + int result = 0, refresh = 0; + int count = PAGE_SIZE; + int flags = IS_SWAPFILE(inode)? NFS_RPC_SWAPFLAGS : 0; dprintk("NFS: nfs_readpage_sync(%p)\n", page); clear_bit(PG_error, &page->flags); @@ -84,13 +84,28 @@ if (count < rsize) rsize = count; - dprintk("NFS: nfs_proc_read(%s, (%s/%s), %ld, %d, %p)\n", + dprintk("NFS: nfs_proc_read(%s, (%s/%s), %Ld, %d, %p)\n", NFS_SERVER(inode)->hostname, dentry->d_parent->d_name.name, dentry->d_name.name, offset, rsize, buffer); + /* FIXME: NFSv3 could allow 64-bit offsets! ... */ + + if (offset > 0x7ffffffeULL) { + if (result) + break; + result = -EOVERFLOW; + goto io_error; + } + if ((offset + rsize) > 0x7fffffffULL) /* 2G-1 */ + rsize = 0x7fffffffULL - offset; + + /* ... END FIXME! */ + /* Set up arguments and perform rpc call */ - nfs_readreq_setup(&rqst, NFS_FH(dentry), offset, buffer, rsize); + nfs_readreq_setup(&rqst, NFS_FH(dentry), offset, + buffer, rsize); + result = rpc_call(NFS_CLIENT(inode), NFSPROC_READ, &rqst.ra_args, &rqst.ra_res, flags); @@ -173,8 +188,16 @@ unsigned long address = page_address(page); struct nfs_rreq *req; int result = -1, flags; + loff_t loffset = pgoff2loff(page->index); dprintk("NFS: nfs_readpage_async(%p)\n", page); + + /* FIXME: NFSv3 allows 64-bit offsets.. */ + if ((loffset + PAGE_SIZE) >= 0x7fffffffULL) { + dprintk("NFS: Async read beyond 2G-1 marker!\n"); + return -EOVERFLOW; + } + if (NFS_CONGESTED(inode)) goto out_defer; @@ -186,8 +209,9 @@ /* Initialize request */ /* N.B. Will the dentry remain valid for life of request? 
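
/*
 * These NFS hunks (and the smbfs ones below) also convert from the old
 * byte-valued page->offset to page->index.  pgoff2loff() and
 * ulong2pgoff() are assumed here to be the obvious conversions between
 * a page index and a byte offset, roughly:
 *
 *	#define pgoff2loff(po)	((loff_t)(po) << PAGE_SHIFT)
 *	#define ulong2pgoff(ul)	((pgoff_t)(ul))
 */
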
*/ - nfs_readreq_setup(req, NFS_FH(dentry), page->offset, - (void *) address, PAGE_SIZE); + nfs_readreq_setup(req, NFS_FH(dentry), loffset, + (void *) address, PAGE_SIZE); + req->ra_inode = inode; req->ra_page = page; /* count has been incremented by caller */ @@ -230,8 +254,8 @@ struct inode *inode = dentry->d_inode; int error; - dprintk("NFS: nfs_readpage (%p %ld@%ld)\n", - page, PAGE_SIZE, page->offset); + dprintk("NFS: nfs_readpage (%p %ld@%Ld)\n", + page, PAGE_SIZE, pgoff2loff(page->index)); atomic_inc(&page->count); set_bit(PG_locked, &page->flags); diff -urN 2.2.17pre19/fs/nfs/write.c 2.2.17pre19aa2/fs/nfs/write.c --- 2.2.17pre19/fs/nfs/write.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/nfs/write.c Sat Aug 26 17:13:51 2000 @@ -86,26 +86,43 @@ */ static int nfs_writepage_sync(struct dentry *dentry, struct inode *inode, - struct page *page, unsigned long offset, unsigned int count) + struct page *page, unsigned int offset, unsigned int count) { unsigned int wsize = NFS_SERVER(inode)->wsize; int result, refresh = 0, written = 0; u8 *buffer; struct nfs_fattr fattr; + loff_t loffset = pgoff2loff(page->index)+offset; - dprintk("NFS: nfs_writepage_sync(%s/%s %d@%ld)\n", + dprintk("NFS: nfs_writepage_sync(%s/%s %d@%Ld)\n", dentry->d_parent->d_name.name, dentry->d_name.name, - count, page->offset + offset); + count, loffset); buffer = (u8 *) page_address(page) + offset; - offset += page->offset; + + /* FIXME: NFSv3 !!! */ +#if 1 + if (loffset >= 0x7ffffffeULL) + return -EFBIG; + if (loffset + count >= 0x7fffffffULL) { + /* At MOST this much! */ + count = 0x7fffffffULL - loffset; + } +#else + if (S_ISREG(inode->i_flags) && + !(dentry->d_file->f_flags & O_LARGEFILE) && + (loffset >= 0x7ffffffeULL || + loffset + count >= 0x7fffffffULL) { + /* Writing beyond LargeFile maximums without O_LARGEFILE */ + } +#endif do { if (count < wsize && !IS_SWAPFILE(inode)) wsize = count; result = nfs_proc_write(NFS_DSERVER(dentry), NFS_FH(dentry), - IS_SWAPFILE(inode), offset, wsize, + IS_SWAPFILE(inode), loffset, wsize, buffer, &fattr); if (result < 0) { @@ -118,15 +135,15 @@ wsize, result); refresh = 1; buffer += wsize; - offset += wsize; + loffset += wsize; written += wsize; count -= wsize; /* * If we've extended the file, update the inode * now so we don't invalidate the cache. */ - if (offset > inode->i_size) - inode->i_size = offset; + if (loffset > inode->i_size) + inode->i_size = loffset; } while (count); io_error: @@ -271,7 +288,7 @@ dprintk("NFS: create_write_request(%s/%s, %ld+%d)\n", dentry->d_parent->d_name.name, dentry->d_name.name, - page->offset + offset, bytes); + (u_long)pgoff2loff(page->index) + offset, bytes); /* FIXME: Enforce hard limit on number of concurrent writes? */ wreq = (struct nfs_wreq *) kmalloc(sizeof(*wreq), GFP_KERNEL); @@ -417,7 +434,7 @@ dprintk("NFS: nfs_updatepage(%s/%s %d@%ld, sync=%d)\n", dentry->d_parent->d_name.name, dentry->d_name.name, - count, page->offset+offset, sync); + count, (u_long)pgoff2loff(page->index)+offset, sync); /* * Try to find a corresponding request on the writeback queue. 
@@ -603,7 +620,7 @@ /* Setup the task struct for a writeback call */ req->wb_flags |= NFS_WRITE_INPROGRESS; req->wb_args.fh = NFS_FH(dentry); - req->wb_args.offset = page->offset + req->wb_offset; + req->wb_args.offset = pgoff2loff(page->index) + req->wb_offset; req->wb_args.count = req->wb_bytes; req->wb_args.buffer = (void *) (page_address(page) + req->wb_offset); diff -urN 2.2.17pre19/fs/nfsd/nfs3xdr.c 2.2.17pre19aa2/fs/nfsd/nfs3xdr.c --- 2.2.17pre19/fs/nfsd/nfs3xdr.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/nfsd/nfs3xdr.c Sat Aug 26 17:13:51 2000 @@ -636,7 +636,7 @@ #define NFS3_ENTRYPLUS_BAGGAGE ((1 + 20 + 1 + NFS3_FHSIZE) << 2) int nfs3svc_encode_entry(struct readdir_cd *cd, const char *name, - int namlen, unsigned long offset, ino_t ino) + int namlen, unsigned long offset, ino_t ino, unsigned int d_type) { u32 *p = cd->buffer; int buflen, slen, elen; diff -urN 2.2.17pre19/fs/nfsd/nfsfh.c 2.2.17pre19aa2/fs/nfsd/nfsfh.c --- 2.2.17pre19/fs/nfsd/nfsfh.c Thu May 4 13:00:40 2000 +++ 2.2.17pre19aa2/fs/nfsd/nfsfh.c Sat Aug 26 17:13:51 2000 @@ -319,7 +319,7 @@ * the specified inode number. */ static int filldir_one(void * __buf, const char * name, int len, - off_t pos, ino_t ino) + off_t pos, ino_t ino, unsigned int d_type) { struct nfsd_getdents_callback *buf = __buf; struct nfsd_dirent *dirent = buf->dirent; diff -urN 2.2.17pre19/fs/nfsd/nfsxdr.c 2.2.17pre19aa2/fs/nfsd/nfsxdr.c --- 2.2.17pre19/fs/nfsd/nfsxdr.c Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/nfsd/nfsxdr.c Sat Aug 26 17:13:51 2000 @@ -424,7 +424,7 @@ int nfssvc_encode_entry(struct readdir_cd *cd, const char *name, - int namlen, off_t offset, ino_t ino) + int namlen, off_t offset, ino_t ino, unsigned int d_type) { u32 *p = cd->buffer; int buflen, slen; diff -urN 2.2.17pre19/fs/nfsd/vfs.c 2.2.17pre19aa2/fs/nfsd/vfs.c --- 2.2.17pre19/fs/nfsd/vfs.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/nfsd/vfs.c Sat Aug 26 17:13:51 2000 @@ -503,8 +503,9 @@ /* Write back readahead params */ if (ra != NULL) { dprintk("nfsd: raparms %ld %ld %ld %ld %ld\n", - file.f_reada, file.f_ramax, file.f_raend, - file.f_ralen, file.f_rawin); + (u_long)file.f_reada, (u_long)file.f_ramax, + (u_long)file.f_raend, (u_long)file.f_ralen, + (u_long)file.f_rawin); ra->p_reada = file.f_reada; ra->p_ramax = file.f_ramax; ra->p_raend = file.f_raend; diff -urN 2.2.17pre19/fs/ntfs/fs.c 2.2.17pre19aa2/fs/ntfs/fs.c --- 2.2.17pre19/fs/ntfs/fs.c Tue Aug 22 14:54:10 2000 +++ 2.2.17pre19aa2/fs/ntfs/fs.c Sat Aug 26 17:13:51 2000 @@ -199,7 +199,7 @@ /* filldir expects an off_t rather than an loff_t. 
Hope we don't have more than 65535 index records */ error=nf->filldir(nf->dirent,nf->name,nf->namelen, - (nf->ph<<16)|nf->pl,inum); + (nf->ph<<16)|nf->pl,inum,DT_UNKNOWN); ntfs_free(nf->name); /* Linux filldir errors are negative, other errors positive */ return error; @@ -225,11 +225,11 @@ if(cb.ph==0xFFFF){ /* FIXME: Maybe we can return those with the previous call */ switch(cb.pl){ - case 0: filldir(dirent,".",1,filp->f_pos,dir->i_ino); + case 0: filldir(dirent,".",1,filp->f_pos,dir->i_ino,DT_DIR); filp->f_pos=0xFFFF0001; return 0; /* FIXME: parent directory */ - case 1: filldir(dirent,"..",2,filp->f_pos,0); + case 1: filldir(dirent,"..",2,filp->f_pos,0,DT_DIR); filp->f_pos=0xFFFF0002; return 0; } @@ -822,6 +822,7 @@ struct statfs fs; struct inode *mft; ntfs_volume *vol; + ntfs_u64 size; int error; ntfs_debug(DEBUG_OTHER, "ntfs_statfs\n"); @@ -830,16 +831,17 @@ fs.f_type=NTFS_SUPER_MAGIC; fs.f_bsize=vol->clustersize; - error = ntfs_get_volumesize( NTFS_SB2VOL( sb ), &fs.f_blocks ); + error = ntfs_get_volumesize( NTFS_SB2VOL( sb ), &size ); if( error ) return -error; + fs.f_blocks = size; fs.f_bfree=ntfs_get_free_cluster_count(vol->bitmap); fs.f_bavail=fs.f_bfree; /* Number of files is limited by free space only, so we lie here */ fs.f_ffree=0; mft=iget(sb,FILE_MFT); - fs.f_files=mft->i_size/vol->mft_recordsize; + fs.f_files = (long)mft->i_size / vol->mft_recordsize; iput(mft); /* should be read from volume */ diff -urN 2.2.17pre19/fs/ntfs/super.c 2.2.17pre19aa2/fs/ntfs/super.c --- 2.2.17pre19/fs/ntfs/super.c Tue Aug 22 14:54:11 2000 +++ 2.2.17pre19aa2/fs/ntfs/super.c Sat Aug 26 17:13:51 2000 @@ -304,7 +304,7 @@ * Writes the volume size into vol_size. Returns 0 if successful * or error. */ -int ntfs_get_volumesize(ntfs_volume *vol, long *vol_size ) +int ntfs_get_volumesize(ntfs_volume *vol, ntfs_u64 *vol_size ) { ntfs_io io; ntfs_u64 size; @@ -325,9 +325,7 @@ ntfs_getput_clusters(vol,0,0,&io); size=NTFS_GETU64(cluster0+0x28); ntfs_free(cluster0); - /* FIXME: more than 2**32 cluster */ - /* FIXME: gcc will emit udivdi3 if we don't truncate it */ - *vol_size = ((unsigned long)size)/vol->clusterfactor; + *vol_size = size; return 0; } diff -urN 2.2.17pre19/fs/ntfs/super.h 2.2.17pre19aa2/fs/ntfs/super.h --- 2.2.17pre19/fs/ntfs/super.h Mon Jan 17 16:44:42 2000 +++ 2.2.17pre19aa2/fs/ntfs/super.h Sat Aug 26 17:13:51 2000 @@ -10,7 +10,7 @@ #define ALLOC_REQUIRE_SIZE 2 int ntfs_get_free_cluster_count(ntfs_inode *bitmap); -int ntfs_get_volumesize(ntfs_volume *vol, long *vol_size ); +int ntfs_get_volumesize(ntfs_volume *vol, ntfs_u64 *vol_size ); int ntfs_init_volume(ntfs_volume *vol,char *boot); int ntfs_load_special_files(ntfs_volume *vol); int ntfs_release_volume(ntfs_volume *vol); diff -urN 2.2.17pre19/fs/open.c 2.2.17pre19aa2/fs/open.c --- 2.2.17pre19/fs/open.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/open.c Sat Aug 26 17:13:52 2000 @@ -12,7 +12,7 @@ #include -asmlinkage int sys_statfs(const char * path, struct statfs * buf) +asmlinkage long sys_statfs(const char * path, struct statfs * buf) { struct dentry * dentry; int error; @@ -34,7 +34,7 @@ return error; } -asmlinkage int sys_fstatfs(unsigned int fd, struct statfs * buf) +asmlinkage long sys_fstatfs(unsigned int fd, struct statfs * buf) { struct file * file; struct inode * inode; @@ -63,17 +63,18 @@ return error; } -int do_truncate(struct dentry *dentry, unsigned long length) +int do_truncate(struct dentry *dentry, loff_t length) { struct inode *inode = dentry->d_inode; int error; struct iattr newattrs; - /* Not pretty: 
"inode->i_size" shouldn't really be "off_t". But it is. */ - if ((off_t) length < 0) - return -EINVAL; + /* Not pretty: "inode->i_size" shouldn't really be signed. But it is. */ + error = -EINVAL; + if (length < 0) + goto out; - down(&inode->i_sem); + fs_down(&inode->i_sem); newattrs.ia_size = length; newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME; error = notify_change(dentry, &newattrs); @@ -83,16 +84,21 @@ if (inode->i_op && inode->i_op->truncate) inode->i_op->truncate(inode); } - up(&inode->i_sem); + fs_up(&inode->i_sem); +out: return error; } -asmlinkage int sys_truncate(const char * path, unsigned long length) +static inline long do_sys_truncate(const char * path, loff_t length) { struct dentry * dentry; struct inode * inode; int error; + error = -EINVAL; + if (length < 0) + goto out_nolock; + lock_kernel(); dentry = namei(path); @@ -133,10 +139,16 @@ dput(dentry); out: unlock_kernel(); +out_nolock: return error; } -asmlinkage int sys_ftruncate(unsigned int fd, unsigned long length) +asmlinkage long sys_truncate(const char * path, unsigned long length) +{ + return do_sys_truncate(path, length); +} + +static inline long do_sys_ftruncate(unsigned int fd, loff_t length) { struct inode * inode; struct dentry *dentry; @@ -171,6 +183,24 @@ return error; } +asmlinkage long sys_ftruncate(unsigned int fd, unsigned long length) +{ + return do_sys_ftruncate(fd, length); +} + +/* LFS versions of truncate are only needed on 32 bit machines */ +#if BITS_PER_LONG == 32 +asmlinkage long sys_truncate64(const char * path, loff_t length) +{ + return do_sys_truncate(path, length); +} + +asmlinkage long sys_ftruncate64(unsigned int fd, loff_t length) +{ + return do_sys_ftruncate(fd, length); +} +#endif + #ifndef __alpha__ /* @@ -184,7 +214,7 @@ * must be owner or have write permission. * Else, update from *times, must be owner or super user. */ -asmlinkage int sys_utime(char * filename, struct utimbuf * times) +asmlinkage long sys_utime(char * filename, struct utimbuf * times) { int error; struct dentry * dentry; @@ -232,7 +262,7 @@ * must be owner or have write permission. * Else, update from *times, must be owner or super user. */ -asmlinkage int sys_utimes(char * filename, struct timeval * utimes) +asmlinkage long sys_utimes(char * filename, struct timeval * utimes) { int error; struct dentry * dentry; @@ -278,7 +308,7 @@ * We do this by temporarily clearing all FS-related capabilities and * switching the fsuid/fsgid around to the real ones. 
*/ -asmlinkage int sys_access(const char * filename, int mode) +asmlinkage long sys_access(const char * filename, int mode) { struct dentry * dentry; int old_fsuid, old_fsgid; @@ -319,7 +349,7 @@ return res; } -asmlinkage int sys_chdir(const char * filename) +asmlinkage long sys_chdir(const char * filename) { int error; struct inode *inode; @@ -354,7 +384,7 @@ return error; } -asmlinkage int sys_fchdir(unsigned int fd) +asmlinkage long sys_fchdir(unsigned int fd) { struct file *file; struct dentry *dentry; @@ -391,7 +421,7 @@ return error; } -asmlinkage int sys_chroot(const char * filename) +asmlinkage long sys_chroot(const char * filename) { int error; struct inode *inode; @@ -431,7 +461,7 @@ return error; } -asmlinkage int sys_fchmod(unsigned int fd, mode_t mode) +asmlinkage long sys_fchmod(unsigned int fd, mode_t mode) { struct inode * inode; struct dentry * dentry; @@ -469,7 +499,7 @@ return err; } -asmlinkage int sys_chmod(const char * filename, mode_t mode) +asmlinkage long sys_chmod(const char * filename, mode_t mode) { struct dentry * dentry; struct inode * inode; @@ -565,7 +595,7 @@ return error; } -asmlinkage int sys_chown(const char * filename, uid_t user, gid_t group) +asmlinkage long sys_chown(const char * filename, uid_t user, gid_t group) { struct dentry * dentry; int error; @@ -582,7 +612,7 @@ return error; } -asmlinkage int sys_lchown(const char * filename, uid_t user, gid_t group) +asmlinkage long sys_lchown(const char * filename, uid_t user, gid_t group) { struct dentry * dentry; int error; @@ -600,7 +630,7 @@ } -asmlinkage int sys_fchown(unsigned int fd, uid_t user, gid_t group) +asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group) { struct dentry * dentry; struct file * file; @@ -760,6 +790,9 @@ char * tmp; int fd, error; +#if BITS_PER_LONG != 32 + flags |= O_LARGEFILE; +#endif tmp = getname(filename); fd = PTR_ERR(tmp); if (!IS_ERR(tmp)) { @@ -790,7 +823,7 @@ * For backward compatibility? Maybe this should be moved * into arch/i386 instead? */ -asmlinkage int sys_creat(const char * pathname, int mode) +asmlinkage long sys_creat(const char * pathname, int mode) { return sys_open(pathname, O_CREAT | O_WRONLY | O_TRUNC, mode); } @@ -863,7 +896,7 @@ * This routine simulates a hangup on the tty, to arrange that users * are given clean terminals at login time. */ -asmlinkage int sys_vhangup(void) +asmlinkage long sys_vhangup(void) { int ret = -EPERM; diff -urN 2.2.17pre19/fs/proc/array.c 2.2.17pre19aa2/fs/proc/array.c --- 2.2.17pre19/fs/proc/array.c Tue Aug 22 14:54:11 2000 +++ 2.2.17pre19aa2/fs/proc/array.c Sat Aug 26 17:13:51 2000 @@ -42,6 +42,8 @@ * Alan Cox : security fixes. 
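
/*
 * With the sys_open() change above (BITS_PER_LONG != 32 forces
 * O_LARGEFILE), 64-bit architectures mark every descriptor large-file
 * capable.  32-bit userland still has to opt in per open:
 *
 *	int fd = open("big.dat", O_CREAT | O_RDWR | O_LARGEFILE, 0644);
 */
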
* * + * Gerhard Wichert : added BIGMEM support + * Siemens AG */ #include @@ -389,6 +391,8 @@ "MemShared: %8lu kB\n" "Buffers: %8lu kB\n" "Cached: %8lu kB\n" + "BigTotal: %8lu kB\n" + "BigFree: %8lu kB\n" "SwapTotal: %8lu kB\n" "SwapFree: %8lu kB\n", i.totalram >> 10, @@ -396,6 +400,8 @@ i.sharedram >> 10, i.bufferram >> 10, page_cache_size << (PAGE_SHIFT - 10), + i.totalbig >> 10, + i.freebig >> 10, i.totalswap >> 10, i.freeswap >> 10); } @@ -451,6 +457,8 @@ return pte_page(pte) + (ptr & ~PAGE_MASK); } +#include + static int get_array(struct task_struct *p, unsigned long start, unsigned long end, char * buffer) { unsigned long addr; @@ -463,6 +471,7 @@ addr = get_phys_addr(p, start); if (!addr) return result; + addr = kmap(addr, KM_READ); do { c = *(char *) addr; if (!c) @@ -470,12 +479,19 @@ if (size < PAGE_SIZE) buffer[size++] = c; else + { + kunmap(addr, KM_READ); return result; + } addr++; start++; if (!c && start >= end) + { + kunmap(addr, KM_READ); return result; + } } while (addr & ~PAGE_MASK); + kunmap(addr-1, KM_READ); } return result; } @@ -1161,11 +1177,11 @@ * + (index into the line) */ /* for systems with sizeof(void*) == 4: */ -#define MAPS_LINE_FORMAT4 "%08lx-%08lx %s %08lx %s %lu" -#define MAPS_LINE_MAX4 49 /* sum of 8 1 8 1 4 1 8 1 5 1 10 1 */ +#define MAPS_LINE_FORMAT4 "%08lx-%08lx %s %016Lx %s %lu" +#define MAPS_LINE_MAX4 57 /* sum of 8 1 8 1 4 1 16 1 5 1 10 1 */ /* for systems with sizeof(void*) == 8: */ -#define MAPS_LINE_FORMAT8 "%016lx-%016lx %s %016lx %s %lu" +#define MAPS_LINE_FORMAT8 "%016lx-%016lx %s %016Lx %s %lu" #define MAPS_LINE_MAX8 73 /* sum of 16 1 16 1 4 1 16 1 5 1 10 1 */ #define MAPS_LINE_MAX MAPS_LINE_MAX8 diff -urN 2.2.17pre19/fs/proc/fd.c 2.2.17pre19aa2/fs/proc/fd.c --- 2.2.17pre19/fs/proc/fd.c Sun Oct 31 23:31:32 1999 +++ 2.2.17pre19aa2/fs/proc/fd.c Sat Aug 26 17:13:51 2000 @@ -87,7 +87,6 @@ fd = 0; len = dentry->d_name.len; name = dentry->d_name.name; - if (len > 1 && *name == '0') goto out; while (len-- > 0) { c = *name - '0'; name++; @@ -147,7 +146,7 @@ ino = inode->i_ino; if (fd) ino = (ino & 0xffff0000) | PROC_PID_INO; - if (filldir(dirent, "..", fd+1, fd, ino) < 0) + if (filldir(dirent, "..", fd+1, fd, ino, DT_DIR) < 0) goto out; } @@ -177,7 +176,7 @@ read_unlock(&tasklist_lock); ino = (pid << 16) + PROC_PID_FD_DIR + fd; - if (filldir(dirent, buf+j, NUMBUF-j, fd+2, ino) < 0) + if (filldir(dirent, buf+j, NUMBUF-j, fd+2, ino, DT_LNK) < 0) goto out; read_lock(&tasklist_lock); diff -urN 2.2.17pre19/fs/proc/mem.c 2.2.17pre19aa2/fs/proc/mem.c --- 2.2.17pre19/fs/proc/mem.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/proc/mem.c Sat Aug 26 17:13:49 2000 @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -120,7 +121,9 @@ i = PAGE_SIZE-(addr & ~PAGE_MASK); if (i > scount) i = scount; + page = (char *) kmap((unsigned long) page, KM_READ); copy_to_user(tmp, page, i); + kunmap((unsigned long) page, KM_READ); addr += i; tmp += i; scount -= i; @@ -177,7 +180,9 @@ i = PAGE_SIZE-(addr & ~PAGE_MASK); if (i > count) i = count; + page = (unsigned long) kmap((unsigned long) page, KM_WRITE); copy_from_user(page, tmp, i); + kunmap((unsigned long) page, KM_WRITE); addr += i; tmp += i; count -= i; diff -urN 2.2.17pre19/fs/proc/openpromfs.c 2.2.17pre19aa2/fs/proc/openpromfs.c --- 2.2.17pre19/fs/proc/openpromfs.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/proc/openpromfs.c Sat Aug 26 17:13:51 2000 @@ -846,14 +846,14 @@ i = filp->f_pos; switch (i) { case 0: - if (filldir(dirent, ".", 1, i, ino) < 0) return 0; + if (filldir(dirent, ".", 1, 
i, ino, DT_DIR) < 0) return 0; i++; filp->f_pos++; /* fall thru */ case 1: if (filldir(dirent, "..", 2, i, (NODE(ino).parent == 0xffff) ? - PROC_ROOT_INO : NODE2INO(NODE(ino).parent)) < 0) + PROC_ROOT_INO : NODE2INO(NODE(ino).parent), DT_DIR) < 0) return 0; i++; filp->f_pos++; @@ -869,14 +869,14 @@ if (prom_getname (nodes[node].node, buffer, 128) < 0) return 0; if (filldir(dirent, buffer, strlen(buffer), - filp->f_pos, NODE2INO(node)) < 0) + filp->f_pos, NODE2INO(node), DT_DIR) < 0) return 0; filp->f_pos++; node = nodes[node].next; } j = NODEP2INO(NODE(ino).first_prop); if (!i) { - if (filldir(dirent, ".node", 5, filp->f_pos, j) < 0) + if (filldir(dirent, ".node", 5, filp->f_pos, j, DT_REG) < 0) return 0; filp->f_pos++; } else @@ -887,7 +887,7 @@ if (alias_names [i]) { if (filldir (dirent, alias_names [i], strlen (alias_names [i]), - filp->f_pos, j) < 0) return 0; + filp->f_pos, j, DT_REG) < 0) return 0; filp->f_pos++; } } @@ -899,7 +899,7 @@ if (i) i--; else { if (filldir(dirent, p, strlen(p), - filp->f_pos, j) < 0) + filp->f_pos, j, DT_REG) < 0) return 0; filp->f_pos++; } @@ -911,7 +911,7 @@ else { if (filldir(dirent, d->name, strlen(d->name), - filp->f_pos, d->inode) < 0) + filp->f_pos, d->inode, d->mode >> 12) < 0) return 0; filp->f_pos++; } diff -urN 2.2.17pre19/fs/proc/root.c 2.2.17pre19aa2/fs/proc/root.c --- 2.2.17pre19/fs/proc/root.c Sun Oct 31 23:31:32 1999 +++ 2.2.17pre19aa2/fs/proc/root.c Sat Aug 26 17:13:51 2000 @@ -845,7 +845,6 @@ } pid *= 10; pid += c; - if (!pid) break; if (pid & 0xffff0000) { pid = 0; break; @@ -892,13 +891,13 @@ i = filp->f_pos; switch (i) { case 0: - if (filldir(dirent, ".", 1, i, ino) < 0) + if (filldir(dirent, ".", 1, i, ino, DT_DIR) < 0) return 0; i++; filp->f_pos++; /* fall through */ case 1: - if (filldir(dirent, "..", 2, i, de->parent->low_ino) < 0) + if (filldir(dirent, "..", 2, i, de->parent->low_ino, DT_DIR) < 0) return 0; i++; filp->f_pos++; @@ -917,7 +916,7 @@ } do { - if (filldir(dirent, de->name, de->namelen, filp->f_pos, ino | de->low_ino) < 0) + if (filldir(dirent, de->name, de->namelen, filp->f_pos, ino | de->low_ino, de->mode >> 12) < 0) return 0; filp->f_pos++; de = de->next; @@ -984,7 +983,7 @@ pid /= 10; } while (pid); - if (filldir(dirent, buf+j, PROC_NUMBUF-j, filp->f_pos, ino) < 0) + if (filldir(dirent, buf+j, PROC_NUMBUF-j, filp->f_pos, ino, DT_DIR) < 0) break; filp->f_pos++; } diff -urN 2.2.17pre19/fs/qnx4/dir.c 2.2.17pre19aa2/fs/qnx4/dir.c --- 2.2.17pre19/fs/qnx4/dir.c Thu May 4 13:00:40 2000 +++ 2.2.17pre19aa2/fs/qnx4/dir.c Sat Aug 26 17:13:51 2000 @@ -61,7 +61,7 @@ QNX4_INODES_PER_BLOCK + le->dl_inode_ndx; } - if (filldir(dirent, de->di_fname, size, filp->f_pos, ino) < 0) { + if (filldir(dirent, de->di_fname, size, filp->f_pos, ino, DT_UNKNOWN) < 0) { brelse(bh); return 0; } diff -urN 2.2.17pre19/fs/read_write.c 2.2.17pre19aa2/fs/read_write.c --- 2.2.17pre19/fs/read_write.c Tue Aug 22 14:54:11 2000 +++ 2.2.17pre19aa2/fs/read_write.c Sat Aug 26 17:13:52 2000 @@ -48,7 +48,7 @@ asmlinkage off_t sys_lseek(unsigned int fd, off_t offset, unsigned int origin) { - off_t retval; + off_t retval, oldpos; struct file * file; struct dentry * dentry; struct inode * inode; @@ -62,9 +62,21 @@ if (!(dentry = file->f_dentry) || !(inode = dentry->d_inode)) goto out_putf; + oldpos = file->f_pos; retval = -EINVAL; if (origin <= 2) retval = llseek(file, offset, origin); + +#if BITS_PER_LONG < 64 + /* Demand L-F-S compliance only from normal files, + thus raw devices can do whatever they please.. 
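
/*
 * This sys_lseek path and the sys_llseek path below restore the old
 * position and return -EOVERFLOW when a plain (non-O_LARGEFILE)
 * descriptor on a regular file would land at or past 2^31-1.  Seeking
 * further therefore needs the 64-bit entry point on a large-file
 * descriptor (sketch, assuming a libc that maps lseek64() onto
 * sys_llseek):
 *
 *	off64_t pos = lseek64(fd, 3ULL << 30, SEEK_SET);
 *	if (pos == (off64_t)-1)
 *		perror("lseek64");
 */
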
*/ + if (retval >= 0 && S_ISREG(inode->i_mode) && + !(file->f_flags & O_LARGEFILE) && + file->f_pos >= LONG_MAX) { + file->f_pos = oldpos; + retval = -EOVERFLOW; + } +#endif out_putf: fput(file); bad: @@ -81,7 +93,7 @@ struct file * file; struct dentry * dentry; struct inode * inode; - loff_t offset; + loff_t offset, oldpos; lock_kernel(); retval = -EBADF; @@ -96,6 +108,7 @@ if (origin > 2) goto out_putf; + oldpos = file->f_pos; offset = llseek(file, ((loff_t) offset_high << 32) | offset_low, origin); @@ -105,6 +118,14 @@ if (!copy_to_user(result, &offset, sizeof(offset))) retval = 0; } + if (!(file->f_flags & O_LARGEFILE) && S_ISREG(inode->i_mode) && + file->f_pos >= LONG_MAX) { + /* The target position isn't presentable without + O_LARGEFILE flag being set --> yield error, and + restore the file position. */ + file->f_pos = oldpos; + retval = -EOVERFLOW; + } out_putf: fput(file); bad: @@ -166,9 +187,9 @@ if (!file->f_op || !(write = file->f_op->write)) goto out; - down(&inode->i_sem); + fs_down(&inode->i_sem); ret = write(file, buf, count, &file->f_pos); - up(&inode->i_sem); + fs_up(&inode->i_sem); out: fput(file); bad_file: @@ -314,9 +335,9 @@ if (!file) goto bad_file; if (file->f_op && file->f_op->write && (file->f_mode & FMODE_WRITE)) { - down(&file->f_dentry->d_inode->i_sem); + fs_down(&file->f_dentry->d_inode->i_sem); ret = do_readv_writev(VERIFY_READ, file, vector, count); - up(&file->f_dentry->d_inode->i_sem); + fs_up(&file->f_dentry->d_inode->i_sem); } fput(file); @@ -335,6 +356,7 @@ ssize_t ret; struct file * file; ssize_t (*read)(struct file *, char *, size_t, loff_t *); + struct inode * inode; lock_kernel(); @@ -342,10 +364,13 @@ file = fget(fd); if (!file) goto bad_file; + + inode = file->f_dentry->d_inode; + if (!(file->f_mode & FMODE_READ)) goto out; - ret = locks_verify_area(FLOCK_VERIFY_READ, file->f_dentry->d_inode, - file, pos, count); + + ret = locks_verify_area(FLOCK_VERIFY_READ, inode, file, pos, count); if (ret) goto out; ret = -EINVAL; @@ -367,6 +392,7 @@ ssize_t ret; struct file * file; ssize_t (*write)(struct file *, const char *, size_t, loff_t *); + struct inode * inode; lock_kernel(); @@ -376,8 +402,10 @@ goto bad_file; if (!(file->f_mode & FMODE_WRITE)) goto out; - ret = locks_verify_area(FLOCK_VERIFY_WRITE, file->f_dentry->d_inode, - file, pos, count); + + inode = file->f_dentry->d_inode; + + ret = locks_verify_area(FLOCK_VERIFY_WRITE, inode, file, pos, count); if (ret) goto out; ret = -EINVAL; @@ -386,9 +414,9 @@ if (pos < 0) goto out; - down(&file->f_dentry->d_inode->i_sem); + fs_down(&inode->i_sem); ret = write(file, buf, count, &pos); - up(&file->f_dentry->d_inode->i_sem); + fs_up(&inode->i_sem); out: fput(file); diff -urN 2.2.17pre19/fs/readdir.c 2.2.17pre19aa2/fs/readdir.c --- 2.2.17pre19/fs/readdir.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/readdir.c Sat Aug 26 17:13:51 2000 @@ -36,7 +36,7 @@ int count; }; -static int fillonedir(void * __buf, const char * name, int namlen, off_t offset, ino_t ino) +static int fillonedir(void * __buf, const char * name, int namlen, off_t offset, ino_t ino, unsigned int d_type) { struct readdir_callback * buf = (struct readdir_callback *) __buf; struct old_linux_dirent * dirent; @@ -118,7 +118,7 @@ int error; }; -static int filldir(void * __buf, const char * name, int namlen, off_t offset, ino_t ino) +static int filldir(void * __buf, const char * name, int namlen, off_t offset, ino_t ino, unsigned int d_type) { struct linux_dirent * dirent; struct getdents_callback * buf = (struct getdents_callback *) __buf; @@ 
-188,6 +188,123 @@ if (lastdirent) { put_user(file->f_pos, &lastdirent->d_off); error = count - buf.count; + } + +out_putf: + fput(file); +out: + unlock_kernel(); + return error; +} + + +/* + * And even better one including d_type field and 64bit d_ino and d_off. + */ +struct linux_dirent64 { + u64 d_ino; + s64 d_off; + unsigned short d_reclen; + unsigned char d_type; + char d_name[0]; +}; + +#define ROUND_UP64(x) (((x)+sizeof(u64)-1) & ~(sizeof(u64)-1)) + +struct getdents_callback64 { + struct linux_dirent64 * current_dir; + struct linux_dirent64 * previous; + int count; + int error; +}; + +static int filldir64(void * __buf, const char * name, int namlen, off_t offset, + ino_t ino, unsigned int d_type) +{ + struct linux_dirent64 * dirent, d; + struct getdents_callback64 * buf = (struct getdents_callback64 *) __buf; + int reclen = ROUND_UP64(NAME_OFFSET(dirent) + namlen + 1); + + buf->error = -EINVAL; /* only used if we fail.. */ + if (reclen > buf->count) + return -EINVAL; + dirent = buf->previous; + if (dirent) { +#if BITS_PER_LONG < 64 + d.d_off = offset; + copy_to_user(&dirent->d_off, &d.d_off, sizeof(d.d_off)); +#else + put_user(offset, &dirent->d_off); +#endif + } + dirent = buf->current_dir; + buf->previous = dirent; + memset(&d, 0, NAME_OFFSET(&d)); + d.d_ino = ino; + d.d_reclen = reclen; + d.d_type = d_type; + copy_to_user(dirent, &d, NAME_OFFSET(&d)); + copy_to_user(dirent->d_name, name, namlen); + put_user(0, dirent->d_name + namlen); + ((char *) dirent) += reclen; + buf->current_dir = dirent; + buf->count -= reclen; + return 0; +} + +asmlinkage int sys_getdents64(unsigned int fd, void * dirent, unsigned int count) +{ + struct file * file; + struct dentry * dentry; + struct inode * inode; + struct linux_dirent64 * lastdirent; + struct getdents_callback64 buf; + int error; + + lock_kernel(); + error = -EBADF; + file = fget(fd); + if (!file) + goto out; + + dentry = file->f_dentry; + if (!dentry) + goto out_putf; + + inode = dentry->d_inode; + if (!inode) + goto out_putf; + + buf.current_dir = (struct linux_dirent64 *) dirent; + buf.previous = NULL; + buf.count = count; + buf.error = 0; + + error = -ENOTDIR; + if (!file->f_op || !file->f_op->readdir) + goto out_putf; + + + /* + * Get the inode's semaphore to prevent changes + * to the directory while we read it. 
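
/*
 * The matching userspace side of sys_getdents64(), as a self-contained
 * sketch.  __NR_getdents64 stands for whatever number the architecture
 * assigns when wiring the call up; the struct mirrors the kernel
 * definition above:
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

struct linux_dirent64 {
	unsigned long long	d_ino;
	long long		d_off;
	unsigned short		d_reclen;
	unsigned char		d_type;
	char			d_name[0];
};

int main(void)
{
	char buf[4096];
	int fd = open(".", O_RDONLY), n, off;

	if (fd < 0)
		return 1;
	while ((n = syscall(__NR_getdents64, fd, buf, sizeof(buf))) > 0)
		for (off = 0; off < n; ) {
			struct linux_dirent64 *d =
				(struct linux_dirent64 *)(buf + off);
			printf("%llu %u %s\n", d->d_ino, d->d_type, d->d_name);
			off += d->d_reclen;
		}
	close(fd);
	return n ? 1 : 0;
}
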
+ */ + down(&inode->i_sem); + error = file->f_op->readdir(file, &buf, filldir64); + up(&inode->i_sem); + if (error < 0) + goto out_putf; + error = buf.error; + lastdirent = buf.previous; + if (lastdirent) { +#if BITS_PER_LONG < 64 + s64 d_off; + d_off = file->f_pos; + copy_to_user(&lastdirent->d_off, &d_off, sizeof(d_off)); + error = count - buf.count; +#else + put_user(file->f_pos, &lastdirent->d_off); +#endif } out_putf: diff -urN 2.2.17pre19/fs/romfs/inode.c 2.2.17pre19aa2/fs/romfs/inode.c --- 2.2.17pre19/fs/romfs/inode.c Thu May 4 13:00:40 2000 +++ 2.2.17pre19aa2/fs/romfs/inode.c Sat Aug 26 17:13:51 2000 @@ -258,6 +258,10 @@ return res; } +static unsigned char romfs_dtype_table[] = { + DT_UNKNOWN, DT_DIR, DT_REG, DT_LNK, DT_BLK, DT_CHR, DT_SOCK, DT_FIFO +}; + static int romfs_readdir(struct file *filp, void *dirent, filldir_t filldir) { @@ -302,7 +306,8 @@ nextfh = ntohl(ri.next); if ((nextfh & ROMFH_TYPE) == ROMFH_HRD) ino = ntohl(ri.spec); - if (filldir(dirent, fsname, j, offset, ino) < 0) { + if (filldir(dirent, fsname, j, offset, ino, + romfs_dtype_table[nextfh & ROMFH_TYPE]) < 0) { return stored; } stored++; @@ -396,7 +401,7 @@ buf = page_address(page); clear_bit(PG_uptodate, &page->flags); clear_bit(PG_error, &page->flags); - offset = page->offset; + offset = pgoff2loff(page->index); if (offset < inode->i_size) { avail = inode->i_size-offset; readlen = min(avail, PAGE_SIZE); diff -urN 2.2.17pre19/fs/smbfs/cache.c 2.2.17pre19aa2/fs/smbfs/cache.c --- 2.2.17pre19/fs/smbfs/cache.c Tue Aug 22 14:54:11 2000 +++ 2.2.17pre19aa2/fs/smbfs/cache.c Sat Aug 26 17:13:51 2000 @@ -40,7 +40,7 @@ struct cache_head * cachep; VERBOSE("finding cache for %s/%s\n", DENTRY_PATH(dentry)); - cachep = (struct cache_head *) get_cached_page(inode, 0, 1); + cachep = (struct cache_head *) get_cached_page(inode, ulong2pgoff(0), 1); if (!cachep) goto out; if (cachep->valid) @@ -61,9 +61,10 @@ PARANOIA("cache %s/%s has existing block!\n", DENTRY_PATH(dentry)); #endif - offset = PAGE_SIZE + (i << PAGE_SHIFT); - block = (struct cache_block *) get_cached_page(inode, - offset, 0); + /* byte_offset = PAGE_SIZE + (i << PAGE_SHIFT); */ + /* --> page_offset = 1 + i */ + block = (struct cache_block *) + get_cached_page(inode, ulong2pgoff(i+1), 0); if (!block) goto out; index->block = block; @@ -128,7 +129,7 @@ struct inode * inode = get_cache_inode(cachep); struct cache_index * index; struct cache_block * block; - unsigned long page_off; + pgoff_t page_off; unsigned int nent, offset, len = entry->len; unsigned int needed = len + sizeof(struct cache_entry); @@ -180,14 +181,15 @@ */ get_block: cachep->pages++; - page_off = PAGE_SIZE + (cachep->idx << PAGE_SHIFT); + /* page_byte_off = PAGE_SIZE + (cachep->idx << PAGE_SHIFT); */ + page_off = ulong2pgoff(1 + cachep->idx); block = (struct cache_block *) get_cached_page(inode, page_off, 1); if (block) { index->block = block; index->space = PAGE_SIZE; VERBOSE("inode=%p, pages=%d, block at %ld\n", - inode, cachep->pages, page_off); + inode, cachep->pages, (u_long)pgoff2loff(page_off)); goto add_entry; } /* diff -urN 2.2.17pre19/fs/smbfs/dir.c 2.2.17pre19aa2/fs/smbfs/dir.c --- 2.2.17pre19/fs/smbfs/dir.c Tue Aug 22 14:54:11 2000 +++ 2.2.17pre19aa2/fs/smbfs/dir.c Sat Aug 26 17:13:51 2000 @@ -91,12 +91,12 @@ switch ((unsigned int) filp->f_pos) { case 0: - if (filldir(dirent, ".", 1, 0, dir->i_ino) < 0) + if (filldir(dirent, ".", 1, 0, dir->i_ino, DT_DIR) < 0) goto out; filp->f_pos = 1; case 1: if (filldir(dirent, "..", 2, 1, - dentry->d_parent->d_inode->i_ino) < 0) + 
dentry->d_parent->d_inode->i_ino, DT_DIR) < 0) goto out; filp->f_pos = 2; } @@ -151,7 +151,7 @@ } if (filldir(dirent, entry->name, entry->len, - filp->f_pos, entry->ino) < 0) + filp->f_pos, entry->ino, DT_UNKNOWN) < 0) break; filp->f_pos += 1; } diff -urN 2.2.17pre19/fs/smbfs/file.c 2.2.17pre19aa2/fs/smbfs/file.c --- 2.2.17pre19/fs/smbfs/file.c Tue Aug 22 14:54:11 2000 +++ 2.2.17pre19aa2/fs/smbfs/file.c Sat Aug 26 17:13:51 2000 @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -47,15 +48,15 @@ smb_readpage_sync(struct dentry *dentry, struct page *page) { char *buffer = (char *) page_address(page); - unsigned long offset = page->offset; + loff_t loffset = pgoff2loff(page->index); int rsize = smb_get_rsize(server_from_dentry(dentry)); int count = PAGE_SIZE; int result; clear_bit(PG_error, &page->flags); - VERBOSE("file %s/%s, count=%d@%ld, rsize=%d\n", - DENTRY_PATH(dentry), count, offset, rsize); + VERBOSE("file %s/%s, count=%d@%Ld, rsize=%d\n", + DENTRY_PATH(dentry), count, loffset, rsize); result = smb_open(dentry, SMB_O_RDONLY); if (result < 0) { @@ -68,12 +69,12 @@ if (count < rsize) rsize = count; - result = smb_proc_read(dentry, offset, rsize, buffer); + result = smb_proc_read(dentry, loffset, rsize, buffer); if (result < 0) goto io_error; count -= result; - offset += result; + loffset += result; buffer += result; dentry->d_inode->i_atime = CURRENT_TIME; if (result < rsize) @@ -113,23 +114,37 @@ * Offset is the data offset within the page. */ static int -smb_writepage_sync(struct dentry *dentry, struct page *page, +smb_writepage_sync(struct file *file, struct page *page, unsigned long offset, unsigned int count) { + struct dentry * dentry = file->f_dentry; struct inode *inode = dentry->d_inode; u8 *buffer = (u8 *) page_address(page) + offset; int wsize = smb_get_wsize(server_from_dentry(dentry)); int result, written = 0; + loff_t loffset = pgoff2loff(page->index) + offset; + + VERBOSE("file %s/%s, count=%d@%Ld, wsize=%d\n", + DENTRY_PATH(dentry), count, loffset, wsize); - offset += page->offset; - VERBOSE("file %s/%s, count=%d@%ld, wsize=%d\n", - DENTRY_PATH(dentry), count, offset, wsize); + if (!(file->f_flags & O_LARGEFILE) && + loffset >= 0x7ffffffeULL) + return -EFBIG; + + if (!(file->f_flags & O_LARGEFILE) && + loffset + count >= 0x7fffffffULL) + count = LONG_MAX - loffset; + + if (loffset >= 0xffffffffULL) /* 4G-1 ??? Or 2G-1 ??? */ + return -EFBIG; + if ((loffset + count) >= 0xffffffffULL) + count = 0xffffffffULL - loffset; do { if (count < wsize) wsize = count; - result = smb_proc_write(dentry, offset, wsize, buffer); + result = smb_proc_write(dentry, loffset, wsize, buffer); if (result < 0) break; /* N.B. what if result < wsize?? */ @@ -138,29 +153,27 @@ printk(KERN_DEBUG "short write, wsize=%d, result=%d\n", wsize, result); #endif - buffer += wsize; - offset += wsize; + buffer += wsize; + loffset += wsize; written += wsize; - count -= wsize; + count -= wsize; /* * Update the inode now rather than waiting for a refresh. */ inode->i_mtime = inode->i_atime = CURRENT_TIME; - if (offset > inode->i_size) - inode->i_size = offset; + if (loffset > inode->i_size) + inode->i_size = loffset; inode->u.smbfs_i.cache_valid |= SMB_F_LOCALWRITE; } while (count); return written ? written : result; } /* - * Write a page to the server. This will be used for NFS swapping only - * (for now), and we currently do this synchronously only. + * Write a page to the server. 
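
The block just added is the LFS rule that recurs through this patch (the sysv and ufs write paths below get the same treatment) for files opened without O_LARGEFILE -- the flag the kernel now sets on every open on alpha, see asm-alpha/fcntl.h further down: a write positioned at or beyond 2G-2 fails with EFBIG, and one that would cross 2G-1 is shortened so it ends at that boundary. smbfs alone then caps everything at 4G-1 because its wire format only carries 32-bit offsets (the same limit behind the -EIO guards added to smb_proc_read/smb_proc_write below). A condensed restatement, with a helper name invented for this sketch:

	/* Illustrative only -- LFS spec 2.2.1.27 as applied in the hunk above.
	 * Returns the (possibly shortened) byte count, or -EFBIG. */
	static long lfs_clamp_count(loff_t pos, unsigned long count, unsigned int f_flags)
	{
		if (!(f_flags & O_LARGEFILE)) {
			if (pos >= 0x7ffffffeULL)		/* at or past 2G-2: refuse */
				return -EFBIG;
			if (pos + count >= 0x7fffffffULL)	/* would cross 2G-1 */
				count = 0x7fffffffULL - pos;	/* stop at the boundary */
		}
		return count;
	}
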
*/ static int smb_writepage(struct file *file, struct page *page) { - struct dentry *dentry = file->f_dentry; int result; #ifdef SMBFS_PARANOIA @@ -169,7 +182,7 @@ #endif set_bit(PG_locked, &page->flags); atomic_inc(&page->count); - result = smb_writepage_sync(dentry, page, 0, PAGE_SIZE); + result = smb_writepage_sync(file, page, 0, PAGE_SIZE); smb_unlock_page(page); free_page(page_address(page)); return result; @@ -180,10 +193,10 @@ { struct dentry *dentry = file->f_dentry; - DEBUG1("(%s/%s %d@%ld, sync=%d)\n", - DENTRY_PATH(dentry), count, page->offset+offset, sync); + DEBUG1("(%s/%s %d@%Ld, sync=%d)\n", + DENTRY_PATH(dentry), count, pgoff2loff(page->index)+offset, sync); - return smb_writepage_sync(dentry, page, offset, count); + return smb_writepage_sync(file, page, offset, count); } static ssize_t @@ -192,8 +205,8 @@ struct dentry * dentry = file->f_dentry; ssize_t status; - VERBOSE("file %s/%s, count=%lu@%lu\n", DENTRY_PATH(dentry), - (unsigned long) count, (unsigned long) *ppos); + VERBOSE("file %s/%s, count=%lu@%Lu\n", DENTRY_PATH(dentry), + (unsigned long) count, *ppos); status = smb_revalidate_inode(dentry); if (status) @@ -242,8 +255,8 @@ struct dentry * dentry = file->f_dentry; ssize_t result; - VERBOSE("file %s/%s, count=%lu@%lu, pages=%ld\n", DENTRY_PATH(dentry), - (unsigned long) count, (unsigned long) *ppos, + VERBOSE("file %s/%s, count=%lu@%Lu, pages=%ld\n", DENTRY_PATH(dentry), + (unsigned long) count, *ppos, dentry->d_inode->i_nrpages); result = smb_revalidate_inode(dentry); @@ -261,8 +274,8 @@ if (count > 0) { result = generic_file_write(file, buf, count, ppos); - VERBOSE("pos=%ld, size=%ld, mtime=%ld, atime=%ld\n", - (long) file->f_pos, dentry->d_inode->i_size, + VERBOSE("pos=%Ld, size=%Ld, mtime=%ld, atime=%ld\n", + file->f_pos, dentry->d_inode->i_size, dentry->d_inode->i_mtime, dentry->d_inode->i_atime); } out: diff -urN 2.2.17pre19/fs/smbfs/proc.c 2.2.17pre19aa2/fs/smbfs/proc.c --- 2.2.17pre19/fs/smbfs/proc.c Tue Aug 22 14:54:11 2000 +++ 2.2.17pre19aa2/fs/smbfs/proc.c Sat Aug 26 17:13:51 2000 @@ -947,13 +947,16 @@ file-id would not be valid after a reconnection. 
*/ int -smb_proc_read(struct dentry *dentry, off_t offset, int count, char *data) +smb_proc_read(struct dentry *dentry, loff_t offset, int count, char *data) { struct smb_sb_info *server = server_from_dentry(dentry); __u16 returned_count, data_len; unsigned char *buf; int result; + if (offset > 0xffffffff) + return -EIO; + smb_lock_server(server); smb_setup_header(server, SMBread, 5, 0); buf = server->packet; @@ -996,13 +999,16 @@ } int -smb_proc_write(struct dentry *dentry, off_t offset, int count, const char *data) +smb_proc_write(struct dentry *dentry, loff_t offset, int count, const char *data) { struct smb_sb_info *server = server_from_dentry(dentry); int result; __u8 *p; - VERBOSE("file %s/%s, count=%d@%ld, packet_size=%d\n", + if (offset > 0xffffffff) + return -EIO; + + VERBOSE("file %s/%s, count=%d@%Ld, packet_size=%d\n", DENTRY_PATH(dentry), count, offset, server->packet_size); smb_lock_server(server); diff -urN 2.2.17pre19/fs/stat.c 2.2.17pre19aa2/fs/stat.c --- 2.2.17pre19/fs/stat.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/stat.c Sat Aug 26 17:13:51 2000 @@ -48,6 +48,10 @@ tmp.st_uid = inode->i_uid; tmp.st_gid = inode->i_gid; tmp.st_rdev = kdev_t_to_nr(inode->i_rdev); +#if BITS_PER_LONG == 32 + if (inode->i_size > 0x7fffffff) + return -EOVERFLOW; +#endif tmp.st_size = inode->i_size; tmp.st_atime = inode->i_atime; tmp.st_mtime = inode->i_mtime; @@ -70,6 +74,10 @@ tmp.st_uid = inode->i_uid; tmp.st_gid = inode->i_gid; tmp.st_rdev = kdev_t_to_nr(inode->i_rdev); +#if BITS_PER_LONG == 32 + if (inode->i_size > 0x7fffffff) + return -EOVERFLOW; +#endif tmp.st_size = inode->i_size; tmp.st_atime = inode->i_atime; tmp.st_mtime = inode->i_mtime; @@ -280,3 +288,124 @@ unlock_kernel(); return error; } + + +/* ---------- LFS-64 ----------- */ +#if !defined(__alpha__) + +static long cp_new_stat64(struct inode * inode, struct stat64 * statbuf) +{ + struct stat64 tmp; + unsigned int blocks, indirect; + + memset(&tmp, 0, sizeof(tmp)); + tmp.st_dev = kdev_t_to_nr(inode->i_dev); + tmp.st_ino = inode->i_ino; + tmp.st_mode = inode->i_mode; + tmp.st_nlink = inode->i_nlink; + tmp.st_uid = inode->i_uid; + tmp.st_gid = inode->i_gid; + tmp.st_rdev = kdev_t_to_nr(inode->i_rdev); + tmp.st_atime = inode->i_atime; + tmp.st_mtime = inode->i_mtime; + tmp.st_ctime = inode->i_ctime; + tmp.st_size = inode->i_size; +/* + * st_blocks and st_blksize are approximated with a simple algorithm if + * they aren't supported directly by the filesystem. The minix and msdos + * filesystems don't keep track of blocks, so they would either have to + * be counted explicitly (by delving into the file itself), or by using + * this simple algorithm to get a reasonable (although not 100% accurate) + * value. + */ + +/* + * Use minix fs values for the number of direct and indirect blocks. The + * count is now exact for the minix fs except that it counts zero blocks. + * Everything is in units of BLOCK_SIZE until the assignment to + * tmp.st_blksize. 
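
A worked instance of the estimate described above, before the D_B/I_B definitions that implement it: with BLOCK_SIZE = 1024, I_B is 512 pointers per indirect block, so a 1 MB file needs 1024 data blocks; 1017 of them lie past the 7 direct slots and cost ceil(1017/512) = 2 single-indirect blocks, and since more than one indirect block is in use a double-indirect block is added, giving 1027 one-kilobyte blocks, i.e. st_blocks = 2054 sectors of 512 bytes. The same computation as a standalone sketch:

	#include <stdio.h>

	#define BLOCK_SIZE	1024
	#define D_B	7					/* direct blocks */
	#define I_B	(BLOCK_SIZE / sizeof(unsigned short))	/* pointers per indirect block: 512 */

	static unsigned long est_st_blocks(unsigned long long size)
	{
		unsigned long blocks = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
		unsigned long indirect;

		if (blocks > D_B) {
			indirect = (blocks - D_B + I_B - 1) / I_B;	/* single indirect */
			blocks += indirect;
			if (indirect > 1) {
				indirect = (indirect - 1 + I_B - 1) / I_B;	/* double indirect */
				blocks += indirect;
				if (indirect > 1)
					blocks++;			/* triple indirect */
			}
		}
		return (BLOCK_SIZE / 512) * blocks;	/* st_blocks counts 512-byte sectors */
	}

	int main(void)
	{
		printf("%lu\n", est_st_blocks(1048576ULL));	/* prints 2054 */
		return 0;
	}
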
+ */ +#define D_B 7 +#define I_B (BLOCK_SIZE / sizeof(unsigned short)) + + if (!inode->i_blksize) { + blocks = (tmp.st_size + BLOCK_SIZE - 1) >> BLOCK_SIZE_BITS; + if (blocks > D_B) { + indirect = (blocks - D_B + I_B - 1) / I_B; + blocks += indirect; + if (indirect > 1) { + indirect = (indirect - 1 + I_B - 1) / I_B; + blocks += indirect; + if (indirect > 1) + blocks++; + } + } + tmp.st_blocks = (BLOCK_SIZE / 512) * blocks; + tmp.st_blksize = BLOCK_SIZE; + } else { + tmp.st_blocks = inode->i_blocks; + tmp.st_blksize = inode->i_blksize; + } + return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0; +} + +asmlinkage long sys_stat64(char * filename, struct stat64 * statbuf, long flags) +{ + struct dentry * dentry; + int error; + + lock_kernel(); + dentry = namei(filename); + + error = PTR_ERR(dentry); + if (!IS_ERR(dentry)) { + error = do_revalidate(dentry); + if (!error) + error = cp_new_stat64(dentry->d_inode, statbuf); + + dput(dentry); + } + unlock_kernel(); + return error; +} + +asmlinkage long sys_lstat64(char * filename, struct stat64 * statbuf, long flags) +{ + struct dentry * dentry; + int error; + + lock_kernel(); + dentry = lnamei(filename); + + error = PTR_ERR(dentry); + if (!IS_ERR(dentry)) { + error = do_revalidate(dentry); + if (!error) + error = cp_new_stat64(dentry->d_inode, statbuf); + + dput(dentry); + } + unlock_kernel(); + return error; +} + +asmlinkage long sys_fstat64(unsigned long fd, struct stat64 * statbuf, long flags) +{ + struct file * f; + int err = -EBADF; + + lock_kernel(); + f = fget(fd); + if (f) { + struct dentry * dentry = f->f_dentry; + + err = do_revalidate(dentry); + if (!err) + err = cp_new_stat64(dentry->d_inode, statbuf); + fput(f); + } + unlock_kernel(); + return err; +} + +#endif /* LFS-64 */ diff -urN 2.2.17pre19/fs/sysv/dir.c 2.2.17pre19aa2/fs/sysv/dir.c --- 2.2.17pre19/fs/sysv/dir.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/sysv/dir.c Sat Aug 26 17:13:51 2000 @@ -100,7 +100,7 @@ inode->i_ino, (off_t) filp->f_pos, sde.inode); i = strnlen(sde.name, SYSV_NAMELEN); - if (filldir(dirent, sde.name, i, filp->f_pos, sde.inode) < 0) { + if (filldir(dirent, sde.name, i, filp->f_pos, sde.inode, DT_UNKNOWN) < 0) { brelse(bh); return 0; } diff -urN 2.2.17pre19/fs/sysv/file.c 2.2.17pre19aa2/fs/sysv/file.c --- 2.2.17pre19/fs/sysv/file.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/sysv/file.c Sat Aug 26 17:13:51 2000 @@ -207,7 +207,7 @@ { struct inode * inode = filp->f_dentry->d_inode; struct super_block * sb = inode->i_sb; - off_t pos; + loff_t pos; ssize_t written, c; struct buffer_head * bh; char * p; @@ -232,6 +232,21 @@ else pos = *ppos; written = 0; + + /* L-F-S spec 2.2.1.27: */ + if (!(filp->f_flags & O_LARGEFILE)) { + if (pos >= 0x7ffffffeULL) /* pos@2G forbidden */ + return -EFBIG; + + if (pos + count >= 0x7fffffffULL) + /* Write only until end of allowed region */ + count = 0x7fffffffULL - pos; + } + if (pos >= 0xffffffffULL) + return -EFBIG; /* Only up to 4G-1! 
*/ + if ((pos + count) > 0xffffffffULL) + count = 0xffffffffULL - pos; + while (written> sb->sv_block_size_bits, 1); if (!bh) { diff -urN 2.2.17pre19/fs/ufs/balloc.c 2.2.17pre19aa2/fs/ufs/balloc.c --- 2.2.17pre19/fs/ufs/balloc.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/ufs/balloc.c Sat Aug 26 17:13:51 2000 @@ -660,9 +660,9 @@ struct ufs_sb_private_info * uspi; struct ufs_super_block_first * usb1; struct ufs_cylinder_group * ucg; - unsigned start, length, location, result; - unsigned possition, fragsize, blockmap, mask; - unsigned swab; + unsigned int start, length, location, result; + unsigned int possition, fragsize, blockmap, mask; + unsigned int swab; UFSD(("ENTER, cg %u, goal %u, count %u\n", ucpi->c_cgx, goal, count)) @@ -676,7 +676,7 @@ else start = ucpi->c_frotor >> 3; - length = howmany(uspi->s_fpg, 8) - start; + length = ((uspi->s_fpg + 7) >> 3) - start; location = ubh_scanc(UCPI_UBH, ucpi->c_freeoff + start, length, (uspi->s_fpb == 8) ? ufs_fragtable_8fpb : ufs_fragtable_other, 1 << (count - 1 + (uspi->s_fpb & 7))); diff -urN 2.2.17pre19/fs/ufs/dir.c 2.2.17pre19aa2/fs/ufs/dir.c --- 2.2.17pre19/fs/ufs/dir.c Thu May 4 13:00:40 2000 +++ 2.2.17pre19aa2/fs/ufs/dir.c Sat Aug 26 17:13:51 2000 @@ -15,6 +15,7 @@ #include #include +#include #include "swab.h" #include "util.h" @@ -124,11 +125,14 @@ * not the directory has been modified * during the copy operation. */ unsigned long version = inode->i_version; + unsigned char d_type = DT_UNKNOWN; UFSD(("filldir(%s,%u)\n", de->d_name, SWAB32(de->d_ino))) UFSD(("namlen %u\n", ufs_get_de_namlen(de))) + if ((flags & UFS_DE_MASK) == UFS_DE_44BSD) + d_type = de->d_u.d_44.d_type; error = filldir(dirent, de->d_name, ufs_get_de_namlen(de), - filp->f_pos, SWAB32(de->d_ino)); + filp->f_pos, SWAB32(de->d_ino), d_type); if (error) break; if (version != inode->i_version) @@ -170,7 +174,7 @@ error_msg = "inode out of bounds"; if (error_msg != NULL) - ufs_error (sb, function, "bad entry in directory #%lu, size %lu: %s - " + ufs_error (sb, function, "bad entry in directory #%lu, size %Lu: %s - " "offset=%lu, inode=%lu, reclen=%d, namlen=%d", dir->i_ino, dir->i_size, error_msg, offset, (unsigned long) SWAB32(de->d_ino), diff -urN 2.2.17pre19/fs/ufs/file.c 2.2.17pre19aa2/fs/ufs/file.c --- 2.2.17pre19/fs/ufs/file.c Sun Apr 2 21:07:49 2000 +++ 2.2.17pre19aa2/fs/ufs/file.c Sat Aug 26 17:13:51 2000 @@ -140,7 +140,7 @@ loff_t *ppos ) { struct inode * inode = filp->f_dentry->d_inode; - __u32 pos; + loff_t pos; long block; int offset; int written, c; @@ -177,11 +177,14 @@ return -EINVAL; } - /* Check for overflow.. 
*/ - if (pos > (__u32) (pos + count)) { - count = ~pos; /* == 0xFFFFFFFF - pos */ - if (!count) + /* L-F-S spec 2.2.1.27: */ + if (!(filp->f_flags & O_LARGEFILE)) { + if (pos >= 0x7ffffffeULL) /* pos@2G forbidden */ return -EFBIG; + + if (pos + count >= 0x7fffffffULL) + /* Write only until end of allowed region */ + count = 0x7fffffffULL - pos; } /* diff -urN 2.2.17pre19/fs/ufs/inode.c 2.2.17pre19aa2/fs/ufs/inode.c --- 2.2.17pre19/fs/ufs/inode.c Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/fs/ufs/inode.c Sat Aug 26 17:13:51 2000 @@ -54,7 +54,7 @@ { unsigned swab = inode->i_sb->u.ufs_sb.s_swab; printk("ino %lu mode 0%6.6o nlink %d uid %d uid32 %u" - " gid %d gid32 %u size %lu blocks %lu\n", + " gid %d gid32 %u size %Lu blocks %lu\n", inode->i_ino, inode->i_mode, inode->i_nlink, inode->i_uid, inode->u.ufs_i.i_uid, inode->i_gid, inode->u.ufs_i.i_gid, inode->i_size, inode->i_blocks); @@ -213,13 +213,14 @@ if (!create) return NULL; limit = current->rlim[RLIMIT_FSIZE].rlim_cur; - if (limit < RLIM_INFINITY) { + if (limit != RLIM_INFINITY) { limit >>= sb->s_blocksize_bits; if (new_fragment >= limit) { send_sig(SIGXFSZ, current, 0); return NULL; } } + lastblock = ufs_fragstoblks (lastfrag); lastblockoff = ufs_fragnum (lastfrag); /* @@ -321,7 +322,8 @@ brelse (result); goto repeat; } - if (!create || new_fragment >= (current->rlim[RLIMIT_FSIZE].rlim_cur >> sb->s_blocksize)) { + if (!create || (current->rlim[RLIMIT_FSIZE].rlim_cur != RLIM_INFINITY && + new_fragment >= (current->rlim[RLIMIT_FSIZE].rlim_cur >> sb->s_blocksize))) { brelse (bh); *err = -EFBIG; return NULL; @@ -497,13 +499,10 @@ } /* - * Linux i_size can be 32 on some architectures. We will mark - * big files as read only and let user access first 32 bits. + * Linux i_size used to be 32 bits on some architectures. + * These days we allow access to the entire file as is.. */ - inode->u.ufs_i.i_size = SWAB64(ufs_inode->ui_size); - inode->i_size = (off_t) inode->u.ufs_i.i_size; - if (sizeof(off_t) == 4 && (inode->u.ufs_i.i_size >> 32)) - inode->i_size = (__u32)-1; + inode->i_size = SWAB64(ufs_inode->ui_size); inode->i_atime = SWAB32(ufs_inode->ui_atime.tv_sec); inode->i_ctime = SWAB32(ufs_inode->ui_ctime.tv_sec); @@ -516,7 +515,7 @@ inode->u.ufs_i.i_gen = SWAB32(ufs_inode->ui_gen); inode->u.ufs_i.i_shadow = SWAB32(ufs_inode->ui_u3.ui_sun.ui_shadow); inode->u.ufs_i.i_oeftflag = SWAB32(ufs_inode->ui_u3.ui_sun.ui_oeftflag); - inode->u.ufs_i.i_lastfrag = howmany (inode->i_size, uspi->s_fsize); + inode->u.ufs_i.i_lastfrag = (inode->i_size + uspi->s_fsize -1) >> uspi->s_fshift; if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) inode->i_rdev = to_kdev_t(SWAB32(ufs_inode->ui_u2.ui_addr.ui_db[0])); diff -urN 2.2.17pre19/fs/ufs/super.c 2.2.17pre19aa2/fs/ufs/super.c --- 2.2.17pre19/fs/ufs/super.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/ufs/super.c Sat Aug 26 17:13:51 2000 @@ -328,7 +328,7 @@ * on the device. 
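
The hunk that follows, like the truncate.c macros and the i_lastfrag updates nearby, replaces howmany(x, y) -- a rounding-up division -- with an add-and-shift through the precomputed log2 fields (s_fshift, s_bshift, s_apbshift, s_2apbshift). The likely reason, inferred from context: i_size is now a 64-bit loff_t, and 64-bit division on 32-bit targets would pull in libgcc helper routines, while the shift is a couple of instructions; the two forms agree whenever the divisor is the matching power of two. Illustration:

	#define howmany(x,y)	(((x)+(y)-1)/(y))	/* the macro being retired */

	/* assumes fsize == 1 << fshift, as the ufs superblock guarantees */
	static unsigned long frags_div(unsigned long long size, unsigned int fsize)
	{
		return howmany(size, fsize);		/* 64-bit division */
	}

	static unsigned long frags_shift(unsigned long long size,
					 unsigned int fsize, unsigned int fshift)
	{
		return (size + fsize - 1) >> fshift;	/* same value, no division */
	}

	/* e.g. frags_div(8193, 1024) == frags_shift(8193, 1024, 10) == 9 */
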
*/ size = uspi->s_cssize; - blks = howmany(size, uspi->s_fsize); + blks = (size + uspi->s_fsize-1) >> uspi->s_fshift; base = space = kmalloc(size, GFP_KERNEL); if (!base) goto failed; @@ -405,7 +405,7 @@ uspi = sb->u.ufs_sb.s_uspi; size = uspi->s_cssize; - blks = howmany(size, uspi->s_fsize); + blks = (size + uspi->s_fsize-1) >> uspi->s_fshift; base = space = (char*) sb->u.ufs_sb.s_csp[0]; for (i = 0; i < blks; i += uspi->s_fpb) { size = uspi->s_bsize; diff -urN 2.2.17pre19/fs/ufs/truncate.c 2.2.17pre19aa2/fs/ufs/truncate.c --- 2.2.17pre19/fs/ufs/truncate.c Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/fs/ufs/truncate.c Sat Aug 26 17:13:51 2000 @@ -59,8 +59,8 @@ * Linus */ -#define DIRECT_BLOCK howmany (inode->i_size, uspi->s_bsize) -#define DIRECT_FRAGMENT howmany (inode->i_size, uspi->s_fsize) +#define DIRECT_BLOCK ((inode->i_size + uspi->s_bsize -1) >> uspi->s_bshift) +#define DIRECT_FRAGMENT ((inode->i_size + uspi->s_fsize -1) >> uspi->s_fshift) static int ufs_trunc_direct (struct inode * inode) { @@ -194,7 +194,7 @@ } -static int ufs_trunc_indirect (struct inode * inode, unsigned offset, u32 * p) +static int ufs_trunc_indirect (struct inode * inode, u_long offset, u32 * p) { struct super_block * sb; struct ufs_sb_private_info * uspi; @@ -297,7 +297,7 @@ struct super_block * sb; struct ufs_sb_private_info * uspi; struct ufs_buffer_head * dind_bh; - unsigned i, tmp, dindirect_block; + unsigned int i, tmp, dindirect_block; u32 * dind; int retry = 0; unsigned swab; @@ -308,8 +308,8 @@ swab = sb->u.ufs_sb.s_swab; uspi = sb->u.ufs_sb.s_uspi; - dindirect_block = (DIRECT_BLOCK > offset) - ? ((DIRECT_BLOCK - offset) / uspi->s_apb) : 0; + dindirect_block = ((DIRECT_BLOCK > offset) ? + ((DIRECT_BLOCK - offset) >> uspi->s_apbshift) : 0); retry = 0; tmp = SWAB32(*p); @@ -379,7 +379,7 @@ retry = 0; tindirect_block = (DIRECT_BLOCK > (UFS_NDADDR + uspi->s_apb + uspi->s_2apb)) - ? ((DIRECT_BLOCK - UFS_NDADDR - uspi->s_apb - uspi->s_2apb) / uspi->s_2apb) : 0; + ? 
((DIRECT_BLOCK - UFS_NDADDR - uspi->s_apb - uspi->s_2apb) >> uspi->s_2apbshift) : 0; p = inode->u.ufs_i.i_u1.i_data + UFS_TIND_BLOCK; if (!(tmp = SWAB32(*p))) return 0; @@ -467,7 +467,8 @@ } } inode->i_mtime = inode->i_ctime = CURRENT_TIME; - inode->u.ufs_i.i_lastfrag = howmany (inode->i_size, uspi->s_fsize); + inode->u.ufs_i.i_lastfrag = + (inode->i_size + uspi->s_fsize -1) >> uspi->s_fshift; mark_inode_dirty(inode); UFSD(("EXIT\n")) } diff -urN 2.2.17pre19/fs/ufs/util.h 2.2.17pre19aa2/fs/ufs/util.h --- 2.2.17pre19/fs/ufs/util.h Sat May 20 00:06:21 2000 +++ 2.2.17pre19aa2/fs/ufs/util.h Sat Aug 26 17:13:51 2000 @@ -14,7 +14,6 @@ * some useful macros */ #define in_range(b,first,len) ((b)>=(first)&&(b)<(first)+(len)) -#define howmany(x,y) (((x)+(y)-1)/(y)) #define min(x,y) ((x)<(y)?(x):(y)) #define max(x,y) ((x)>(y)?(x):(y)) diff -urN 2.2.17pre19/fs/umsdos/dir.c 2.2.17pre19aa2/fs/umsdos/dir.c --- 2.2.17pre19/fs/umsdos/dir.c Thu May 4 13:00:40 2000 +++ 2.2.17pre19aa2/fs/umsdos/dir.c Sat Aug 26 17:13:51 2000 @@ -90,7 +90,7 @@ if (d->count == 0) { PRINTK ((KERN_DEBUG "dir_once :%.*s: offset %Ld\n", len, name, offset)); - ret = d->filldir (d->dirbuf, name, len, offset, ino); + ret = d->filldir (d->dirbuf, name, len, offset, ino, DT_UNKNOWN); d->stop = ret < 0; d->count = 1; } @@ -136,7 +136,7 @@ Printk ((KERN_WARNING "umsdos_readdir_x: pseudo_root thing UMSDOS_SPECIAL_DIRFPOS\n")); if (filldir (dirbuf, "DOS", 3, - UMSDOS_SPECIAL_DIRFPOS, UMSDOS_ROOT_INO) == 0) { + UMSDOS_SPECIAL_DIRFPOS, UMSDOS_ROOT_INO, DT_DIR) == 0) { filp->f_pos++; } goto out_end; @@ -255,7 +255,7 @@ if (inode != pseudo_root && (internal_read || !(entry.flags & UMSDOS_HIDDEN))) { if (filldir (dirbuf, entry.name, entry.name_len, - cur_f_pos, inode->i_ino) < 0) { + cur_f_pos, inode->i_ino, DT_UNKNOWN) < 0) { new_filp.f_pos = cur_f_pos; } Printk(("umsdos_readdir_x: got %s/%s, ino=%ld\n", diff -urN 2.2.17pre19/fs/umsdos/rdir.c 2.2.17pre19aa2/fs/umsdos/rdir.c --- 2.2.17pre19/fs/umsdos/rdir.c Thu May 4 13:00:40 2000 +++ 2.2.17pre19aa2/fs/umsdos/rdir.c Sat Aug 26 17:13:51 2000 @@ -33,7 +33,8 @@ const char *name, int name_len, off_t offset, - ino_t ino) + ino_t ino, + unsigned int d_type) { int ret = 0; struct RDIR_FILLDIR *d = (struct RDIR_FILLDIR *) buf; @@ -48,11 +49,11 @@ /* Make sure the .. entry points back to the pseudo_root */ ino = pseudo_root->i_ino; } - ret = d->filldir (d->dirbuf, name, name_len, offset, ino); + ret = d->filldir (d->dirbuf, name, name_len, offset, ino, DT_UNKNOWN); } } else { /* Any DOS directory */ - ret = d->filldir (d->dirbuf, name, name_len, offset, ino); + ret = d->filldir (d->dirbuf, name, name_len, offset, ino, DT_UNKNOWN); } return ret; } diff -urN 2.2.17pre19/include/asm-alpha/bigmem.h 2.2.17pre19aa2/include/asm-alpha/bigmem.h --- 2.2.17pre19/include/asm-alpha/bigmem.h Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/include/asm-alpha/bigmem.h Sat Aug 26 17:13:49 2000 @@ -0,0 +1,27 @@ +/* + * linux/include/asm-alpha/bigmem.h + * + * On alpha we can address all the VM with a flat mapping. We need + * to differentiate BIGMEM memory only because the default PCI DMA window + * is currently limited to 2 GB. Thus kmap/kunmap are noops here. + * + * With bigmem support the alpha is now capable of allocating up to + * 2048 GB of memory.
+ * + * Copyright (C) 2000 Andrea Arcangeli , SuSE GmbH + */ + +#ifndef _ASM_BIGMEM_H +#define _ASM_BIGMEM_H + +#include + +#undef BIGMEM_DEBUG /* undef for production */ + +/* declarations for bigmem.c */ +extern unsigned long bigmem_start, bigmem_end; + +#define kmap(kaddr, type) kaddr +#define kunmap(vaddr, type) do { } while (0) + +#endif /* _ASM_BIGMEM_H */ diff -urN 2.2.17pre19/include/asm-alpha/fcntl.h 2.2.17pre19aa2/include/asm-alpha/fcntl.h --- 2.2.17pre19/include/asm-alpha/fcntl.h Mon Jan 17 16:44:43 2000 +++ 2.2.17pre19aa2/include/asm-alpha/fcntl.h Sat Aug 26 17:13:51 2000 @@ -20,6 +20,7 @@ #define O_DIRECT 040000 /* direct disk access - should check with OSF/1 */ #define O_DIRECTORY 0100000 /* must be a directory */ #define O_NOFOLLOW 0200000 /* don't follow links */ +#define O_LARGEFILE 0400000 /* will be set by the kernel on every open */ #define F_DUPFD 0 /* dup */ #define F_GETFD 1 /* get f_flags */ @@ -61,5 +62,9 @@ __kernel_off_t l_len; __kernel_pid_t l_pid; }; + +#ifdef __KERNEL__ +#define flock64 flock +#endif #endif diff -urN 2.2.17pre19/include/asm-alpha/smplock.h 2.2.17pre19aa2/include/asm-alpha/smplock.h --- 2.2.17pre19/include/asm-alpha/smplock.h Thu Mar 30 16:04:36 2000 +++ 2.2.17pre19aa2/include/asm-alpha/smplock.h Sat Aug 26 17:13:49 2000 @@ -17,8 +17,6 @@ { if (task->lock_depth >= 0) spin_unlock(&kernel_flag); - release_irqlock(cpu); - __sti(); } /* diff -urN 2.2.17pre19/include/asm-alpha/unistd.h 2.2.17pre19aa2/include/asm-alpha/unistd.h --- 2.2.17pre19/include/asm-alpha/unistd.h Tue Aug 22 14:54:12 2000 +++ 2.2.17pre19aa2/include/asm-alpha/unistd.h Sat Aug 26 17:13:51 2000 @@ -314,6 +314,7 @@ #define __NR_pivot_root 374 /* implemented in 2.3 */ #define __NR_mincore 375 /* implemented in 2.3 */ #define __NR_pciconfig_iobase 376 +#define __NR_getdents64 377 #if defined(__GNUC__) diff -urN 2.2.17pre19/include/asm-arm/fcntl.h 2.2.17pre19aa2/include/asm-arm/fcntl.h --- 2.2.17pre19/include/asm-arm/fcntl.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-arm/fcntl.h Sat Aug 26 17:13:51 2000 @@ -18,6 +18,8 @@ #define FASYNC 020000 /* fcntl, for BSD compatibility */ #define O_DIRECTORY 040000 /* must be a directory */ #define O_NOFOLLOW 0100000 /* don't follow links */ +#define O_DIRECT 0200000 /* direct disk access hint - currently ignored */ +#define O_LARGEFILE 0400000 #define F_DUPFD 0 /* dup */ #define F_GETFD 1 /* get f_flags */ @@ -33,6 +35,10 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. 
*/ +#define F_GETLK64 12 /* using 'struct flock64' */ +#define F_SETLK64 13 +#define F_SETLKW64 14 + /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ @@ -58,6 +64,14 @@ off_t l_start; off_t l_len; pid_t l_pid; +}; + +struct flock64 { + short l_type; + short l_whence; + loff_t l_start; + loff_t l_len; + pid_t l_pid; +}; #endif diff -urN 2.2.17pre19/include/asm-arm/smplock.h 2.2.17pre19aa2/include/asm-arm/smplock.h --- 2.2.17pre19/include/asm-arm/smplock.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-arm/smplock.h Sat Aug 26 17:13:49 2000 @@ -15,8 +15,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-arm/stat.h 2.2.17pre19aa2/include/asm-arm/stat.h --- 2.2.17pre19/include/asm-arm/stat.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-arm/stat.h Sat Aug 26 17:13:51 2000 @@ -38,4 +38,5 @@ unsigned long __unused5; }; +/* Someone please add a glibc/arm compatible stat64 struct here. */ #endif diff -urN 2.2.17pre19/include/asm-arm/unistd.h 2.2.17pre19aa2/include/asm-arm/unistd.h --- 2.2.17pre19/include/asm-arm/unistd.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-arm/unistd.h Sat Aug 26 17:13:52 2000 @@ -198,6 +198,13 @@ /* 188 reserved */ /* 189 reserved */ #define __NR_vfork (__NR_SYSCALL_BASE+190) +/* #define __NR_getrlimit (__NR_SYSCALL_BASE+191) */ +#define __NR_mmap2 (__NR_SYSCALL_BASE+192) +#define __NR_truncate64 (__NR_SYSCALL_BASE+193) +#define __NR_ftruncate64 (__NR_SYSCALL_BASE+194) +#define __NR_stat64 (__NR_SYSCALL_BASE+195) +#define __NR_lstat64 (__NR_SYSCALL_BASE+196) +#define __NR_fstat64 (__NR_SYSCALL_BASE+197) #define __sys2(x) #x #define __sys1(x) __sys2(x) diff -urN 2.2.17pre19/include/asm-generic/smplock.h 2.2.17pre19aa2/include/asm-generic/smplock.h --- 2.2.17pre19/include/asm-generic/smplock.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-generic/smplock.h Sat Aug 26 17:13:49 2000 @@ -15,8 +15,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-i386/bigmem.h 2.2.17pre19aa2/include/asm-i386/bigmem.h --- 2.2.17pre19/include/asm-i386/bigmem.h Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/include/asm-i386/bigmem.h Sat Aug 26 17:13:49 2000 @@ -0,0 +1,69 @@ +/* + * bigmem.h: virtual kernel memory mappings for big memory + * + * Used in CONFIG_BIGMEM systems for memory pages which are not + * addressable by direct kernel virtual addresses.
+ * + * Copyright (C) 1999 Gerhard Wichert, Siemens AG + * Gerhard.Wichert@pdb.siemens.de + */ + +#ifndef _ASM_BIGMEM_H +#define _ASM_BIGMEM_H + +#include + +#undef BIGMEM_DEBUG /* undef for production */ + +/* declarations for bigmem.c */ +extern unsigned long bigmem_start, bigmem_end; +extern int nr_free_bigpages; + +extern pte_t *kmap_pte; +extern pgprot_t kmap_prot; + +extern void kmap_init(void) __init; + +/* kmap helper functions necessary to access the bigmem pages in kernel */ +#include +#include + +extern inline unsigned long kmap(unsigned long kaddr, enum km_type type) +{ + if (__pa(kaddr) < bigmem_start) + return kaddr; + { + enum fixed_addresses idx = type+KM_TYPE_NR*smp_processor_id(); + unsigned long vaddr = __fix_to_virt(FIX_KMAP_BEGIN+idx); + +#ifdef BIGMEM_DEBUG + if (!pte_none(*(kmap_pte-idx))) + { + __label__ here; + here: + printk(KERN_ERR "not null pte on CPU %d from %p\n", + smp_processor_id(), &&here); + } +#endif + set_pte(kmap_pte-idx, mk_pte(kaddr & PAGE_MASK, kmap_prot)); + __flush_tlb_one(vaddr); + + return vaddr | (kaddr & ~PAGE_MASK); + } +} + +extern inline void kunmap(unsigned long vaddr, enum km_type type) +{ +#ifdef BIGMEM_DEBUG + enum fixed_addresses idx = type+KM_TYPE_NR*smp_processor_id(); + if ((vaddr & PAGE_MASK) == __fix_to_virt(FIX_KMAP_BEGIN+idx)) + { + /* force other mappings to Oops if they'll try to access + this pte without first remap it */ + pte_clear(kmap_pte-idx); + __flush_tlb_one(vaddr); + } +#endif +} + +#endif /* _ASM_BIGMEM_H */ diff -urN 2.2.17pre19/include/asm-i386/fcntl.h 2.2.17pre19aa2/include/asm-i386/fcntl.h --- 2.2.17pre19/include/asm-i386/fcntl.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-i386/fcntl.h Sat Aug 26 17:13:52 2000 @@ -35,6 +35,10 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ +#define F_GETLK64 12 /* using 'struct flock64' */ +#define F_SETLK64 13 +#define F_SETLKW64 14 + /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ @@ -60,6 +64,14 @@ off_t l_start; off_t l_len; pid_t l_pid; +}; + +struct flock64 { + short l_type; + short l_whence; + loff_t l_start; + loff_t l_len; + pid_t l_pid; }; #endif diff -urN 2.2.17pre19/include/asm-i386/fixmap.h 2.2.17pre19aa2/include/asm-i386/fixmap.h --- 2.2.17pre19/include/asm-i386/fixmap.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/asm-i386/fixmap.h Sat Aug 26 17:13:49 2000 @@ -6,6 +6,8 @@ * for more details. * * Copyright (C) 1998 Ingo Molnar + * + * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999 */ #ifndef _ASM_FIXMAP_H @@ -14,6 +16,10 @@ #include #include #include +#ifdef CONFIG_BIGMEM +#include +#include +#endif /* * Here we define all the compile-time 'special' virtual @@ -55,6 +61,10 @@ FIX_CO_APIC, /* Cobalt APIC Redirection Table */ FIX_LI_PCIA, /* Lithium PCI Bridge A */ FIX_LI_PCIB, /* Lithium PCI Bridge B */ +#endif +#ifdef CONFIG_BIGMEM + FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */ + FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1, #endif __end_of_fixed_addresses }; diff -urN 2.2.17pre19/include/asm-i386/io.h 2.2.17pre19aa2/include/asm-i386/io.h --- 2.2.17pre19/include/asm-i386/io.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/asm-i386/io.h Sat Aug 26 17:13:49 2000 @@ -27,6 +27,7 @@ /* * Bit simplified and optimized by Jan Hubicka + * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999. 
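
A note on the kmap()/kunmap() inlines above: each CPU owns KM_TYPE_NR fixmap pte slots (the km_type enum arrives in the new asm-i386/kmap_types.h just below), and the (type, cpu) pair selects one, so simultaneous KM_READ and KM_WRITE mappings on the same CPU cannot collide and no lock is taken -- the trade-off is that the code between kmap and kunmap must not sleep or migrate to another CPU. A hypothetical caller, mirroring copy_bigpage() from include/linux/bigmem.h later in the patch (which uses copy_page() rather than memcpy()):

	/* illustrative: "to"/"from" are page-aligned kernel addresses that may
	 * lie above bigmem_start; on alpha both kmap calls compile away */
	static void copy_possibly_big_page(unsigned long to, unsigned long from)
	{
		unsigned long vfrom, vto;

		vfrom = kmap(from, KM_READ);	/* per-CPU pte slot reserved for reads */
		vto = kmap(to, KM_WRITE);	/* distinct slot -- no collision */
		memcpy((void *) vto, (void *) vfrom, PAGE_SIZE);
		kunmap(vfrom, KM_READ);		/* must still be on the same CPU */
		kunmap(vto, KM_WRITE);
	}
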
*/ #ifdef SLOW_IO_BY_JUMPING @@ -109,12 +110,20 @@ */ extern inline unsigned long virt_to_phys(volatile void * address) { +#ifdef CONFIG_BIGMEM + return __pa(address); +#else return __io_phys(address); +#endif } extern inline void * phys_to_virt(unsigned long address) { +#ifdef CONFIG_BIGMEM + return __va(address); +#else return __io_virt(address); +#endif } extern void * __ioremap(unsigned long offset, unsigned long size, unsigned long flags); diff -urN 2.2.17pre19/include/asm-i386/kmap_types.h 2.2.17pre19aa2/include/asm-i386/kmap_types.h --- 2.2.17pre19/include/asm-i386/kmap_types.h Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/include/asm-i386/kmap_types.h Sat Aug 26 17:13:49 2000 @@ -0,0 +1,10 @@ +#ifndef _ASM_KMAP_TYPES_H +#define _ASM_KMAP_TYPES_H + +enum km_type { + KM_READ, + KM_WRITE, + KM_TYPE_NR, +}; + +#endif diff -urN 2.2.17pre19/include/asm-i386/page.h 2.2.17pre19aa2/include/asm-i386/page.h --- 2.2.17pre19/include/asm-i386/page.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/asm-i386/page.h Sat Aug 26 17:13:49 2000 @@ -88,6 +88,7 @@ #define __pa(x) ((unsigned long)(x)-PAGE_OFFSET) #define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET)) #define MAP_NR(addr) (__pa(addr) >> PAGE_SHIFT) +#define PHYSMAP_NR(addr) ((unsigned long)(addr) >> PAGE_SHIFT) #endif /* __KERNEL__ */ diff -urN 2.2.17pre19/include/asm-i386/smplock.h 2.2.17pre19aa2/include/asm-i386/smplock.h --- 2.2.17pre19/include/asm-i386/smplock.h Wed Aug 23 19:57:40 2000 +++ 2.2.17pre19aa2/include/asm-i386/smplock.h Sat Aug 26 17:13:49 2000 @@ -15,8 +15,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-i386/stat.h 2.2.17pre19aa2/include/asm-i386/stat.h --- 2.2.17pre19/include/asm-i386/stat.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-i386/stat.h Sat Aug 26 17:13:52 2000 @@ -38,4 +38,40 @@ unsigned long __unused5; }; +/* This matches struct stat64 in glibc2.1, hence the absolutely + * insane amounts of padding around dev_t's. + */ +struct stat64 { + unsigned short st_dev; + unsigned char __pad0[10]; + + unsigned long st_ino; + unsigned int st_mode; + unsigned int st_nlink; + + unsigned long st_uid; + unsigned long st_gid; + + unsigned short st_rdev; + unsigned char __pad3[10]; + + long long st_size; + unsigned long st_blksize; + + unsigned long st_blocks; /* Number 512-byte blocks allocated. 
*/ + unsigned long __pad4; /* future possible st_blocks high bits */ + + unsigned long st_atime; + unsigned long __pad5; + + unsigned long st_mtime; + unsigned long __pad6; + + unsigned long st_ctime; + unsigned long __pad7; /* will be high 32 bits of ctime someday */ + + unsigned long __unused1; + unsigned long __unused2; +}; + #endif diff -urN 2.2.17pre19/include/asm-i386/uaccess.h 2.2.17pre19aa2/include/asm-i386/uaccess.h --- 2.2.17pre19/include/asm-i386/uaccess.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/asm-i386/uaccess.h Sat Aug 26 17:13:52 2000 @@ -6,6 +6,7 @@ */ #include #include +#include #include #define VERIFY_READ 0 @@ -270,6 +271,7 @@ : "=&c"(size), "=&D" (__d0), "=&S" (__d1) \ : "r"(size & 3), "0"(size / 4), "1"(to), "2"(from) \ : "memory"); \ + conditional_schedule(); \ } while (0) #define __copy_user_zeroing(to,from,size) \ @@ -298,6 +300,7 @@ : "=&c"(size), "=&D" (__d0), "=&S" (__d1) \ : "r"(size & 3), "0"(size / 4), "1"(to), "2"(from) \ : "memory"); \ + conditional_schedule(); \ } while (0) /* We let the __ versions of copy_from/to_user inline, because they're often @@ -338,6 +341,7 @@ : "=c"(size), "=&S" (__d0), "=&D" (__d1)\ : "1"(from), "2"(to), "0"(size/4) \ : "memory"); \ + conditional_schedule(); \ break; \ case 1: \ __asm__ __volatile__( \ @@ -428,6 +432,7 @@ : "=c"(size), "=&S" (__d0), "=&D" (__d1)\ : "1"(from), "2"(to), "0"(size/4) \ : "memory"); \ + conditional_schedule(); \ break; \ case 1: \ __asm__ __volatile__( \ diff -urN 2.2.17pre19/include/asm-i386/unistd.h 2.2.17pre19aa2/include/asm-i386/unistd.h --- 2.2.17pre19/include/asm-i386/unistd.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-i386/unistd.h Sat Aug 26 17:13:52 2000 @@ -80,7 +80,7 @@ #define __NR_sigpending 73 #define __NR_sethostname 74 #define __NR_setrlimit 75 -#define __NR_getrlimit 76 +#define __NR_getrlimit 76 /* Back compatible 2Gig limited rlimit */ #define __NR_getrusage 77 #define __NR_gettimeofday 78 #define __NR_settimeofday 79 @@ -195,8 +195,42 @@ #define __NR_getpmsg 188 /* some people actually want streams */ #define __NR_putpmsg 189 /* some people actually want streams */ #define __NR_vfork 190 +/* #define __NR_ugetrlimit 191 SuS compliant getrlimit */ +#define __NR_mmap2 192 +#define __NR_truncate64 193 +#define __NR_ftruncate64 194 +#define __NR_stat64 195 +#define __NR_lstat64 196 +#define __NR_fstat64 197 +#if 0 /* 2.3.x */ +#define __NR_lchown32 198 +#define __NR_getuid32 199 +#define __NR_getgid32 200 +#define __NR_geteuid32 201 +#define __NR_getegid32 202 +#define __NR_setreuid32 203 +#define __NR_setregid32 204 +#define __NR_getgroups32 205 +#define __NR_setgroups32 206 +#define __NR_fchown32 207 +#define __NR_setresuid32 208 +#define __NR_getresuid32 209 +#define __NR_setresgid32 210 +#define __NR_getresgid32 211 +#define __NR_chown32 212 +#define __NR_setuid32 213 +#define __NR_setgid32 214 +#define __NR_setfsuid32 215 +#define __NR_setfsgid32 216 +#define __NR_pivot_root 217 +#define __NR_mincore 218 +#define __NR_madvise 219 +#define __NR_madvise1 219 /* delete when C lib stub is removed */ +#endif +#define __NR_fcntl64 220 +#define __NR_getdents64 221 -/* user-visible error numbers are in the range -1 - -122: see */ +/* user-visible error numbers are in the range -1 - -124: see */ #define __syscall_return(type, res) \ do { \ @@ -269,6 +303,19 @@ : "=a" (__res) \ : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \ "d" ((long)(arg3)),"S" ((long)(arg4)),"D" ((long)(arg5))); \ +__syscall_return(type,__res); \ +} + +#define 
_syscall6(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4, \ + type5,arg5,type6,arg6) \ +type name (type1 arg1,type2 arg2,type3 arg3,type4 arg4,type5 arg5,type6 arg6) \ +{ \ +long __res; \ +__asm__ volatile ("push %%ebp ; movl %%eax,%%ebp ; movl %1,%%eax ; int $0x80 ; pop %%ebp" \ + : "=a" (__res) \ + : "i" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \ + "d" ((long)(arg3)),"S" ((long)(arg4)),"D" ((long)(arg5)), \ + "0" ((long)(arg6))); \ __syscall_return(type,__res); \ } diff -urN 2.2.17pre19/include/asm-m68k/fcntl.h 2.2.17pre19aa2/include/asm-m68k/fcntl.h --- 2.2.17pre19/include/asm-m68k/fcntl.h Mon Jan 17 16:44:44 2000 +++ 2.2.17pre19aa2/include/asm-m68k/fcntl.h Sat Aug 26 17:13:52 2000 @@ -33,6 +33,10 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ +#define F_GETLK64 12 /* using 'struct flock64' */ +#define F_SETLK64 13 +#define F_SETLKW64 14 + /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ @@ -58,6 +62,14 @@ off_t l_start; off_t l_len; pid_t l_pid; +}; + +struct flock64 { + short l_type; + short l_whence; + loff_t l_start; + loff_t l_len; + pid_t l_pid; }; #endif /* _M68K_FCNTL_H */ diff -urN 2.2.17pre19/include/asm-m68k/smplock.h 2.2.17pre19aa2/include/asm-m68k/smplock.h --- 2.2.17pre19/include/asm-m68k/smplock.h Mon Jan 17 16:44:45 2000 +++ 2.2.17pre19aa2/include/asm-m68k/smplock.h Sat Aug 26 17:13:49 2000 @@ -15,8 +15,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-m68k/stat.h 2.2.17pre19aa2/include/asm-m68k/stat.h --- 2.2.17pre19/include/asm-m68k/stat.h Mon Jan 17 16:44:45 2000 +++ 2.2.17pre19aa2/include/asm-m68k/stat.h Sat Aug 26 17:13:52 2000 @@ -38,4 +38,8 @@ unsigned long __unused5; }; +/* stat64 struct goes here -- someone please make + * it mesh with whatever glibc does in userland on + * m68k's. + */ #endif /* _M68K_STAT_H */ diff -urN 2.2.17pre19/include/asm-m68k/unistd.h 2.2.17pre19aa2/include/asm-m68k/unistd.h --- 2.2.17pre19/include/asm-m68k/unistd.h Mon Jan 17 16:44:45 2000 +++ 2.2.17pre19aa2/include/asm-m68k/unistd.h Sat Aug 26 17:13:52 2000 @@ -80,7 +80,7 @@ #define __NR_sigpending 73 #define __NR_sethostname 74 #define __NR_setrlimit 75 -#define __NR_getrlimit 76 +#define __NR_getrlimit 76 #define __NR_getrusage 77 #define __NR_gettimeofday 78 #define __NR_settimeofday 79 @@ -194,6 +194,13 @@ #define __NR_getpmsg 188 /* some people actually want streams */ #define __NR_putpmsg 189 /* some people actually want streams */ #define __NR_vfork 190 +/* #define __NR_getrlimit 191 */ +#define __NR_mmap2 192 +#define __NR_truncate64 193 +#define __NR_ftruncate64 194 +#define __NR_stat64 195 +#define __NR_lstat64 196 +#define __NR_fstat64 197 /* user-visible error numbers are in the range -1 - -122: see */ diff -urN 2.2.17pre19/include/asm-mips/fcntl.h 2.2.17pre19aa2/include/asm-mips/fcntl.h --- 2.2.17pre19/include/asm-mips/fcntl.h Mon Jan 17 16:44:45 2000 +++ 2.2.17pre19aa2/include/asm-mips/fcntl.h Sat Aug 26 17:13:52 2000 @@ -44,6 +44,10 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. 
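
Back to the i386 _syscall6() stub defined above: it exists because several of the new entry points take six arguments -- mmap2 being the canonical case -- and the five-argument stubs run out of registers, so %ebp is pressed into service for the sixth. A hypothetical user-space binding, assuming the _syscall6 and __syscall_return macros from asm-i386/unistd.h are in scope (the kernel-side handler lives outside this hunk); note that mmap2's final argument is an offset in pages, which is how a 32-bit register can express file offsets beyond 4 GB:

	#define __NR_mmap2 192	/* from the asm-i386/unistd.h hunk above */

	/* expands to a function: void *mmap2(void *addr, size_t len, int prot,
	 * int flags, int fd, off_t pgoff) wrapping int $0x80 */
	static _syscall6(void *, mmap2, void *, addr, size_t, len,
			 int, prot, int, flags, int, fd, off_t, pgoff)
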
*/ +#define F_GETLK64 33 /* using 'struct flock64' */ +#define F_SETLK64 34 +#define F_SETLKW64 35 + /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ @@ -72,5 +76,13 @@ __kernel_pid_t l_pid; long pad[4]; /* ZZZZZZZZZZZZZZZZZZZZZZZZZZ */ } flock_t; + +typedef struct flock64 { + short l_type; + short l_whence; + loff_t l_start; + loff_t l_len; + pid_t l_pid; +} flock64_t; #endif /* __ASM_MIPS_FCNTL_H */ diff -urN 2.2.17pre19/include/asm-mips/smplock.h 2.2.17pre19aa2/include/asm-mips/smplock.h --- 2.2.17pre19/include/asm-mips/smplock.h Mon Jan 17 16:44:45 2000 +++ 2.2.17pre19aa2/include/asm-mips/smplock.h Sat Aug 26 17:13:49 2000 @@ -18,8 +18,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-ppc/fcntl.h 2.2.17pre19aa2/include/asm-ppc/fcntl.h --- 2.2.17pre19/include/asm-ppc/fcntl.h Mon Jan 17 16:44:45 2000 +++ 2.2.17pre19aa2/include/asm-ppc/fcntl.h Sat Aug 26 17:13:52 2000 @@ -18,6 +18,8 @@ #define FASYNC 020000 /* fcntl, for BSD compatibility */ #define O_DIRECTORY 040000 /* must be a directory */ #define O_NOFOLLOW 0100000 /* don't follow links */ +#define O_LARGEFILE 0200000 +#define O_DIRECT 0400000 /* direct disk access hint - currently ignored */ #define F_DUPFD 0 /* dup */ #define F_GETFD 1 /* get f_flags */ @@ -33,6 +35,10 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ +#define F_GETLK64 12 /* using 'struct flock64' */ +#define F_SETLK64 13 +#define F_SETLKW64 14 + /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ @@ -64,6 +70,14 @@ off_t l_start; off_t l_len; pid_t l_pid; +}; + +struct flock64 { + short l_type; + short l_whence; + loff_t l_start; + loff_t l_len; + pid_t l_pid; }; #endif diff -urN 2.2.17pre19/include/asm-ppc/smplock.h 2.2.17pre19aa2/include/asm-ppc/smplock.h --- 2.2.17pre19/include/asm-ppc/smplock.h Mon Jan 17 16:44:45 2000 +++ 2.2.17pre19aa2/include/asm-ppc/smplock.h Sat Aug 26 17:13:49 2000 @@ -15,8 +15,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-ppc/stat.h 2.2.17pre19aa2/include/asm-ppc/stat.h --- 2.2.17pre19/include/asm-ppc/stat.h Mon Jan 17 16:44:45 2000 +++ 2.2.17pre19aa2/include/asm-ppc/stat.h Sat Aug 26 17:13:52 2000 @@ -37,4 +37,30 @@ unsigned long __unused5; }; +/* This matches struct stat64 in glibc2.1. + */ +struct stat64 { + unsigned long long st_dev; /* Device. */ + unsigned short int __pad1; + unsigned long st_ino; /* File serial number. */ + unsigned int st_mode; /* File mode. */ + unsigned int st_nlink; /* Link count. */ + unsigned int st_uid; /* User ID of the file's owner. */ + unsigned int st_gid; /* Group ID of the file's group. */ + unsigned long long st_rdev; /* Device number, if device. */ + unsigned short int __pad2; + long long st_size; /* Size of file, in bytes. */ + long st_blksize; /* Optimal block size for I/O. */ + + long long st_blocks; /* Number 512-byte blocks allocated. */ + long st_atime; /* Time of last access. */ + unsigned long int __unused1; + long st_mtime; /* Time of last modification. */ + unsigned long int __unused2; + long st_ctime; /* Time of last status change. 
*/ + unsigned long int __unused3; + unsigned long int __unused4; + unsigned long int __unused5; +}; + #endif diff -urN 2.2.17pre19/include/asm-ppc/unistd.h 2.2.17pre19aa2/include/asm-ppc/unistd.h --- 2.2.17pre19/include/asm-ppc/unistd.h Tue Aug 22 14:54:13 2000 +++ 2.2.17pre19aa2/include/asm-ppc/unistd.h Sat Aug 26 17:13:52 2000 @@ -194,11 +194,18 @@ #define __NR_getpmsg 187 /* some people actually want streams */ #define __NR_putpmsg 188 /* some people actually want streams */ #define __NR_vfork 189 - +#define __NR_mmap2 192 +#define __NR_truncate64 193 +#define __NR_ftruncate64 194 +#define __NR_stat64 195 +#define __NR_lstat64 196 +#define __NR_fstat64 197 #define __NR_sys_pciconfig_read 198 #define __NR_sys_pciconfig_write 199 #define __NR_sys_pciconfig_iobase 200 #define __NR_multiplexer 201 +#define __NR_getdents64 202 +#define __NR_fcntl64 203 #define __NR(n) #n diff -urN 2.2.17pre19/include/asm-s390/smplock.h 2.2.17pre19aa2/include/asm-s390/smplock.h --- 2.2.17pre19/include/asm-s390/smplock.h Sun Apr 2 21:07:50 2000 +++ 2.2.17pre19aa2/include/asm-s390/smplock.h Sat Aug 26 17:13:49 2000 @@ -18,8 +18,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-sparc/fcntl.h 2.2.17pre19aa2/include/asm-sparc/fcntl.h --- 2.2.17pre19/include/asm-sparc/fcntl.h Mon Jan 17 16:44:46 2000 +++ 2.2.17pre19aa2/include/asm-sparc/fcntl.h Sat Aug 26 17:13:52 2000 @@ -19,6 +19,7 @@ #define O_NOCTTY 0x8000 /* not fcntl */ #define O_DIRECTORY 0x10000 /* must be a directory */ #define O_NOFOLLOW 0x20000 /* don't follow links */ +#define O_LARGEFILE 0x40000 /* LFS */ #define F_DUPFD 0 /* dup */ #define F_GETFD 1 /* get f_flags */ @@ -32,6 +33,9 @@ #define F_SETLKW 9 #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. 
*/ +#define F_GETLK64 12 +#define F_SETLK64 13 +#define F_SETLKW64 14 /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ @@ -57,6 +61,15 @@ short l_whence; off_t l_start; off_t l_len; + pid_t l_pid; + short __unused; +}; + +struct flock64 { + short l_type; + short l_whence; + loff_t l_start; + loff_t l_len; pid_t l_pid; short __unused; }; diff -urN 2.2.17pre19/include/asm-sparc/smplock.h 2.2.17pre19aa2/include/asm-sparc/smplock.h --- 2.2.17pre19/include/asm-sparc/smplock.h Mon Jan 17 16:44:46 2000 +++ 2.2.17pre19aa2/include/asm-sparc/smplock.h Sat Aug 26 17:13:49 2000 @@ -15,8 +15,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-sparc/stat.h 2.2.17pre19aa2/include/asm-sparc/stat.h --- 2.2.17pre19/include/asm-sparc/stat.h Mon Jan 17 16:44:46 2000 +++ 2.2.17pre19aa2/include/asm-sparc/stat.h Sat Aug 26 17:13:52 2000 @@ -1,4 +1,4 @@ -/* $Id: stat.h,v 1.9 1998/07/26 05:24:39 davem Exp $ */ +/* $Id: stat.h,v 1.10 1999/12/21 14:09:41 jj Exp $ */ #ifndef _SPARC_STAT_H #define _SPARC_STAT_H @@ -36,6 +36,42 @@ off_t st_blksize; off_t st_blocks; unsigned long __unused4[2]; +}; + +struct stat64 { + unsigned char __pad0[6]; + unsigned short st_dev; + unsigned char __pad1[4]; + + unsigned int st_ino; + unsigned int st_mode; + unsigned int st_nlink; + + unsigned int st_uid; + unsigned int st_gid; + + unsigned char __pad2[6]; + unsigned short st_rdev; + + unsigned char __pad3[8]; + + long long st_size; + unsigned int st_blksize; + + unsigned char __pad4[8]; + unsigned int st_blocks; + + unsigned int st_atime; + unsigned int __unused1; + + unsigned int st_mtime; + unsigned int __unused2; + + unsigned int st_ctime; + unsigned int __unused3; + + unsigned int __unused4; + unsigned int __unused5; }; #endif diff -urN 2.2.17pre19/include/asm-sparc/unistd.h 2.2.17pre19aa2/include/asm-sparc/unistd.h --- 2.2.17pre19/include/asm-sparc/unistd.h Mon Jan 17 16:44:46 2000 +++ 2.2.17pre19aa2/include/asm-sparc/unistd.h Sat Aug 26 17:13:52 2000 @@ -71,14 +71,14 @@ /* #define __NR_mctl 53 SunOS specific */ #define __NR_ioctl 54 /* Common */ #define __NR_reboot 55 /* Common */ -/* #define __NR_ni_syscall 56 ENOSYS under SunOS */ +#define __NR_mmap2 56 /* Linux sparc32 Specific */ #define __NR_symlink 57 /* Common */ #define __NR_readlink 58 /* Common */ #define __NR_execve 59 /* Common */ #define __NR_umask 60 /* Common */ #define __NR_chroot 61 /* Common */ #define __NR_fstat 62 /* Common */ -/* #define __NR_ni_syscall 63 ENOSYS under SunOS */ +#define __NR_fstat64 63 /* Linux sparc32 Specific */ #define __NR_getpagesize 64 /* Common */ #define __NR_msync 65 /* Common in newer 1.3.x revs... */ #define __NR_vfork 66 /* Common */ @@ -92,14 +92,14 @@ #define __NR_mprotect 74 /* Common */ /* #define __NR_madvise 75 SunOS Specific */ #define __NR_vhangup 76 /* Common */ -/* #define __NR_ni_syscall 77 ENOSYS under SunOS */ +#define __NR_truncate64 77 /* Linux sparc32 Specific */ /* #define __NR_mincore 78 SunOS Specific */ #define __NR_getgroups 79 /* Common */ #define __NR_setgroups 80 /* Common */ #define __NR_getpgrp 81 /* Common */ /* #define __NR_setpgrp 82 setpgid, same difference... 
*/ #define __NR_setitimer 83 /* Common */ -/* #define __NR_ni_syscall 84 ENOSYS under SunOS */ +#define __NR_ftruncate64 84 /* Linux sparc32 Specific */ #define __NR_swapon 85 /* Common */ #define __NR_getitimer 86 /* Common */ /* #define __NR_gethostname 87 SunOS Specific */ @@ -147,18 +147,18 @@ #define __NR_truncate 129 /* Common */ #define __NR_ftruncate 130 /* Common */ #define __NR_flock 131 /* Common */ -/* #define __NR_ni_syscall 132 ENOSYS under SunOS */ +#define __NR_lstat64 132 /* Linux sparc32 Specific */ #define __NR_sendto 133 /* Common */ #define __NR_shutdown 134 /* Common */ #define __NR_socketpair 135 /* Common */ #define __NR_mkdir 136 /* Common */ #define __NR_rmdir 137 /* Common */ #define __NR_utimes 138 /* SunOS Specific */ -/* #define __NR_ni_syscall 139 ENOSYS under SunOS */ +#define __NR_stat64 139 /* Linux sparc32 Specific */ /* #define __NR_adjtime 140 SunOS Specific */ #define __NR_getpeername 141 /* Common */ /* #define __NR_gethostid 142 SunOS Specific */ -/* #define __NR_ni_syscall 143 ENOSYS under SunOS */ +#define __NR_fcntl64 143 /* Linux sparc32 Specific */ #define __NR_getrlimit 144 /* Common */ #define __NR_setrlimit 145 /* Common */ /* #define __NR_killpg 146 SunOS Specific */ @@ -169,7 +169,7 @@ /* #define __NR_getmsg 151 SunOS Specific */ /* #define __NR_putmsg 152 SunOS Specific */ #define __NR_poll 153 /* Common */ -/* #define __NR_ni_syscall 154 ENOSYS under SunOS */ +#define __NR_getdents64 154 /* Linux Specific */ /* #define __NR_nfssvc 155 SunOS Specific */ /* #define __NR_getdirentries 156 SunOS Specific */ #define __NR_statfs 157 /* Common */ diff -urN 2.2.17pre19/include/asm-sparc64/fcntl.h 2.2.17pre19aa2/include/asm-sparc64/fcntl.h --- 2.2.17pre19/include/asm-sparc64/fcntl.h Mon Jan 17 16:44:46 2000 +++ 2.2.17pre19aa2/include/asm-sparc64/fcntl.h Sat Aug 26 17:13:52 2000 @@ -19,6 +19,7 @@ #define O_NOCTTY 0x8000 /* not fcntl */ #define O_DIRECTORY 0x10000 /* must be a directory */ #define O_NOFOLLOW 0x20000 /* don't follow links */ +#define O_LARGEFILE 0x40000 #define F_DUPFD 0 /* dup */ #define F_GETFD 1 /* get f_flags */ @@ -32,6 +33,11 @@ #define F_SETLKW 9 #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. 
*/ +#ifdef __KERNEL__ +#define F_GETLK64 12 +#define F_SETLK64 13 +#define F_SETLKW64 14 +#endif /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ @@ -58,7 +64,6 @@ off_t l_start; off_t l_len; pid_t l_pid; - short __unused; }; #ifdef __KERNEL__ @@ -70,6 +75,17 @@ __kernel_pid_t32 l_pid; short __unused; }; + +struct flock32_64 { + short l_type; + short l_whence; + __kernel_loff_t32 l_start; + __kernel_loff_t32 l_len; + __kernel_pid_t32 l_pid; + short __unused; +}; + +#define flock64 flock #endif #endif /* !(_SPARC64_FCNTL_H) */ diff -urN 2.2.17pre19/include/asm-sparc64/smplock.h 2.2.17pre19aa2/include/asm-sparc64/smplock.h --- 2.2.17pre19/include/asm-sparc64/smplock.h Mon Jan 17 16:44:46 2000 +++ 2.2.17pre19aa2/include/asm-sparc64/smplock.h Sat Aug 26 17:13:49 2000 @@ -16,8 +16,6 @@ do { \ if (task->lock_depth >= 0) \ spin_unlock(&kernel_flag); \ - release_irqlock(cpu); \ - __sti(); \ } while (0) /* diff -urN 2.2.17pre19/include/asm-sparc64/stat.h 2.2.17pre19aa2/include/asm-sparc64/stat.h --- 2.2.17pre19/include/asm-sparc64/stat.h Mon Jan 17 16:44:46 2000 +++ 2.2.17pre19aa2/include/asm-sparc64/stat.h Sat Aug 26 17:13:52 2000 @@ -1,4 +1,4 @@ -/* $Id: stat.h,v 1.5 1998/07/26 05:24:41 davem Exp $ */ +/* $Id: stat.h,v 1.6 1999/12/21 14:09:48 jj Exp $ */ #ifndef _SPARC64_STAT_H #define _SPARC64_STAT_H @@ -41,5 +41,46 @@ off_t st_blocks; unsigned long __unused4[2]; }; + +#ifdef __KERNEL__ +/* This is sparc32 stat64 structure. */ + +struct stat64 { + unsigned char __pad0[6]; + unsigned short st_dev; + unsigned char __pad1[4]; + + unsigned int st_ino; + unsigned int st_mode; + unsigned int st_nlink; + + unsigned int st_uid; + unsigned int st_gid; + + unsigned char __pad2[6]; + unsigned short st_rdev; + + unsigned char __pad3[8]; + + long long st_size; + unsigned int st_blksize; + + unsigned char __pad4[8]; + unsigned int st_blocks; + + unsigned int st_atime; + unsigned int __unused1; + + unsigned int st_mtime; + unsigned int __unused2; + + unsigned int st_ctime; + unsigned int __unused3; + + unsigned int __unused4; + unsigned int __unused5; +}; + +#endif #endif diff -urN 2.2.17pre19/include/asm-sparc64/unistd.h 2.2.17pre19aa2/include/asm-sparc64/unistd.h --- 2.2.17pre19/include/asm-sparc64/unistd.h Mon Jan 17 16:44:46 2000 +++ 2.2.17pre19aa2/include/asm-sparc64/unistd.h Sat Aug 26 17:13:52 2000 @@ -71,14 +71,14 @@ /* #define __NR_mctl 53 SunOS specific */ #define __NR_ioctl 54 /* Common */ #define __NR_reboot 55 /* Common */ -/* #define __NR_ni_syscall 56 ENOSYS under SunOS */ +/* #define __NR_mmap2 56 Linux sparc32 Specific */ #define __NR_symlink 57 /* Common */ #define __NR_readlink 58 /* Common */ #define __NR_execve 59 /* Common */ #define __NR_umask 60 /* Common */ #define __NR_chroot 61 /* Common */ #define __NR_fstat 62 /* Common */ -/* #define __NR_ni_syscall 63 ENOSYS under SunOS */ +/* #define __NR_fstat64 63 Linux sparc32 Specific */ #define __NR_getpagesize 64 /* Common */ #define __NR_msync 65 /* Common in newer 1.3.x revs... */ #define __NR_vfork 66 /* Common */ @@ -92,14 +92,14 @@ #define __NR_mprotect 74 /* Common */ /* #define __NR_madvise 75 SunOS Specific */ #define __NR_vhangup 76 /* Common */ -/* #define __NR_ni_syscall 77 ENOSYS under SunOS */ +/* #define __NR_truncate64 77 Linux sparc32 Specific */ /* #define __NR_mincore 78 SunOS Specific */ #define __NR_getgroups 79 /* Common */ #define __NR_setgroups 80 /* Common */ #define __NR_getpgrp 81 /* Common */ /* #define __NR_setpgrp 82 setpgid, same difference... 
*/ #define __NR_setitimer 83 /* Common */ -/* #define __NR_ni_syscall 84 ENOSYS under SunOS */ +/* #define __NR_ftruncate64 84 Linux sparc32 Specific */ #define __NR_swapon 85 /* Common */ #define __NR_getitimer 86 /* Common */ /* #define __NR_gethostname 87 SunOS Specific */ @@ -147,19 +147,19 @@ #define __NR_truncate 129 /* Common */ #define __NR_ftruncate 130 /* Common */ #define __NR_flock 131 /* Common */ -/* #define __NR_ni_syscall 132 ENOSYS under SunOS */ +/* #define __NR_lstat64 132 Linux sparc32 Specific */ #define __NR_sendto 133 /* Common */ #define __NR_shutdown 134 /* Common */ #define __NR_socketpair 135 /* Common */ #define __NR_mkdir 136 /* Common */ #define __NR_rmdir 137 /* Common */ #define __NR_utimes 138 /* SunOS Specific */ -/* #define __NR_ni_syscall 139 ENOSYS under SunOS */ +/* #define __NR_stat64 139 Linux sparc32 Specific */ /* #define __NR_adjtime 140 SunOS Specific */ #define __NR_getpeername 141 /* Common */ /* #define __NR_gethostid 142 SunOS Specific */ -/* #define __NR_ni_syscall 143 ENOSYS under SunOS */ -#define __NR_getrlimit 144 /* Common */ +/* #define __NR_fcntl64 143 Linux sparc32 Specific */ +#define __NR_getrlimit 144 /* Common */ #define __NR_setrlimit 145 /* Common */ /* #define __NR_killpg 146 SunOS Specific */ #define __NR_prctl 147 /* ENOSYS under SunOS */ @@ -169,8 +169,8 @@ /* #define __NR_getmsg 151 SunOS Specific */ /* #define __NR_putmsg 152 SunOS Specific */ #define __NR_poll 153 /* Common */ -/* #define __NR_ni_syscall 154 ENOSYS under SunOS */ -/* #define __NR_nfssvc 155 SunOS Specific */ +#define __NR_getdents64 154 /* Linux specific */ +/* #define __NR_fcntl64 155 Linux sparc32 Specific */ /* #define __NR_getdirentries 156 SunOS Specific */ #define __NR_statfs 157 /* Common */ #define __NR_fstatfs 158 /* Common */ diff -urN 2.2.17pre19/include/linux/bigmem.h 2.2.17pre19aa2/include/linux/bigmem.h --- 2.2.17pre19/include/linux/bigmem.h Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/include/linux/bigmem.h Sat Aug 26 17:13:49 2000 @@ -0,0 +1,50 @@ +#ifndef _LINUX_BIGMEM_H +#define _LINUX_BIGMEM_H + +#include + +#ifdef CONFIG_BIGMEM + +#include + +/* declarations for linux/mm/bigmem.c */ +extern unsigned long bigmem_mapnr; +extern int nr_free_bigpages; + +extern struct page * prepare_bigmem_swapout(struct page *); +extern struct page * replace_with_bigmem(struct page *); +extern unsigned long prepare_bigmem_shm_swapin(unsigned long); + +#else /* CONFIG_BIGMEM */ + +#define prepare_bigmem_swapout(page) page +#define replace_with_bigmem(page) page +#define prepare_bigmem_shm_swapin(page) page +#define kmap(kaddr, type) kaddr +#define kunmap(vaddr, type) do { } while (0) +#define nr_free_bigpages 0 + +#endif /* CONFIG_BIGMEM */ + +/* when CONFIG_BIGMEM is not set these will be plain clear/copy_page */ +extern inline void clear_bigpage(unsigned long kaddr) +{ + unsigned long vaddr; + + vaddr = kmap(kaddr, KM_WRITE); + clear_page(vaddr); + kunmap(vaddr, KM_WRITE); +} + +extern inline void copy_bigpage(unsigned long to, unsigned long from) +{ + unsigned long vfrom, vto; + + vfrom = kmap(from, KM_READ); + vto = kmap(to, KM_WRITE); + copy_page(vto, vfrom); + kunmap(vfrom, KM_READ); + kunmap(vto, KM_WRITE); +} + +#endif /* _LINUX_BIGMEM_H */ diff -urN 2.2.17pre19/include/linux/blkdev.h 2.2.17pre19aa2/include/linux/blkdev.h --- 2.2.17pre19/include/linux/blkdev.h Wed Aug 23 16:31:10 2000 +++ 2.2.17pre19aa2/include/linux/blkdev.h Sat Aug 26 17:13:48 2000 @@ -42,15 +42,13 @@ { int read_latency; int write_latency; - int max_bomb_segments; unsigned int 
queue_ID; } elevator_t; #define ELEVATOR_DEFAULTS \ ((elevator_t) { \ - 128, /* read_latency */ \ - 8192, /* write_latency */ \ - 32, /* max_bomb_segments */ \ + 256, /* read_latency */ \ + 512, /* write_latency */ \ 0 /* queue_ID */ \ }) diff -urN 2.2.17pre19/include/linux/condsched.h 2.2.17pre19aa2/include/linux/condsched.h --- 2.2.17pre19/include/linux/condsched.h Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/include/linux/condsched.h Sat Aug 26 17:13:52 2000 @@ -0,0 +1,14 @@ +#ifndef _LINUX_CONDSCHED_H +#define _LINUX_CONDSCHED_H + +#ifndef __ASSEMBLY__ +#define conditional_schedule() \ +do { \ + if (current->need_resched) { \ + current->state = TASK_RUNNING; \ + schedule(); \ + } \ +} while(0) +#endif + +#endif diff -urN 2.2.17pre19/include/linux/dcache.h 2.2.17pre19aa2/include/linux/dcache.h --- 2.2.17pre19/include/linux/dcache.h Sun Apr 2 21:07:50 2000 +++ 2.2.17pre19aa2/include/linux/dcache.h Sat Aug 26 17:13:48 2000 @@ -143,7 +143,7 @@ /* dcache memory management */ extern void shrink_dcache_memory(int, unsigned int); extern void check_dcache_memory(void); -extern void free_inode_memory(int); /* defined in fs/inode.c */ +extern void free_inode_memory(void); /* defined in fs/inode.c */ /* only used at mount-time */ extern struct dentry * d_alloc_root(struct inode * root_inode, struct dentry * old_root); diff -urN 2.2.17pre19/include/linux/dirent.h 2.2.17pre19aa2/include/linux/dirent.h --- 2.2.17pre19/include/linux/dirent.h Tue Feb 1 18:24:19 2000 +++ 2.2.17pre19aa2/include/linux/dirent.h Sat Aug 26 17:13:52 2000 @@ -8,4 +8,12 @@ char d_name[256]; /* We must not include limits.h! */ }; +struct dirent64 { + __u64 d_ino; + __s64 d_off; + unsigned short d_reclen; + unsigned char d_type; + char d_name[256]; +}; + #endif diff -urN 2.2.17pre19/include/linux/ext2_fs_i.h 2.2.17pre19aa2/include/linux/ext2_fs_i.h --- 2.2.17pre19/include/linux/ext2_fs_i.h Tue Feb 1 18:24:19 2000 +++ 2.2.17pre19aa2/include/linux/ext2_fs_i.h Sat Aug 26 17:13:52 2000 @@ -35,7 +35,6 @@ __u32 i_next_alloc_goal; __u32 i_prealloc_block; __u32 i_prealloc_count; - __u32 i_high_size; int i_new_inode:1; /* Is a freshly allocated inode */ }; diff -urN 2.2.17pre19/include/linux/fs.h 2.2.17pre19aa2/include/linux/fs.h --- 2.2.17pre19/include/linux/fs.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/linux/fs.h Sat Aug 26 17:13:52 2000 @@ -185,6 +185,7 @@ #define BH_Lock 2 /* 1 if the buffer is locked */ #define BH_Req 3 /* 0 if the buffer has been invalidated */ #define BH_Protected 6 /* 1 if the buffer is protected */ +#define BH_Wait_IO 7 /* 1 if we should throttle on this buffer */ /* * Try to keep the most commonly used fields in single cache lines (16 @@ -260,6 +261,25 @@ #define buffer_page(bh) (mem_map + MAP_NR((bh)->b_data)) #define touch_buffer(bh) set_bit(PG_referenced, &buffer_page(bh)->flags) +/* log of base-2 for filesystem uses, in case their super-blocks + don't have the shift counts readily calculated.. -- presuming + the divisors in question are power-of-two values! 
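+
+   A worked illustration (an editorial aside; s_blocksize is assumed to
+   hold the filesystem block size in bytes): fslog2(512) returns 9 and
+   fslog2(4096) returns 12, so a filesystem may compute
+
+	sb->s_blocksize_bits = fslog2(sb->s_blocksize);
+
+   rather than storing the shift count on disk.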
*/ +static int fslog2(unsigned long val) __attribute__ ((const)); +static __inline__ int fslog2(unsigned long val) +{ + int i; + for (i = 0; val != 0; ++i, val >>= 1) { + if (val & 1) return i; + } + return 0; +} + +static int off_t_presentable(loff_t) __attribute((const)); +static __inline__ int off_t_presentable(loff_t loff) +{ + return loff >= 0 && loff <= (~0UL >> 1); +} + #include #include #include @@ -310,7 +330,7 @@ umode_t ia_mode; uid_t ia_uid; gid_t ia_gid; - off_t ia_size; + loff_t ia_size; time_t ia_atime; time_t ia_mtime; time_t ia_ctime; @@ -345,7 +365,7 @@ uid_t i_uid; gid_t i_gid; kdev_t i_rdev; - off_t i_size; + loff_t i_size; time_t i_atime; time_t i_mtime; time_t i_ctime; @@ -422,7 +442,7 @@ mode_t f_mode; loff_t f_pos; unsigned int f_count, f_flags; - unsigned long f_reada, f_ramax, f_raend, f_ralen, f_rawin; + loff_t f_reada, f_ramax, f_raend, f_ralen, f_rawin; struct fown_struct f_owner; unsigned int f_uid, f_gid; int f_error; @@ -462,8 +482,8 @@ struct file *fl_file; unsigned char fl_flags; unsigned char fl_type; - off_t fl_start; - off_t fl_end; + loff_t fl_start; + loff_t fl_end; void (*fl_notify)(struct file_lock *); /* unblock callback */ @@ -479,6 +499,9 @@ extern int fcntl_getlk(unsigned int fd, struct flock *l); extern int fcntl_setlk(unsigned int fd, unsigned int cmd, struct flock *l); +extern int fcntl_getlk64(unsigned int fd, struct flock64 *l); +extern int fcntl_setlk64(unsigned int fd, unsigned int cmd, struct flock64 *l); + /* fs/locks.c */ extern void locks_remove_posix(struct file *, fl_owner_t id); extern void locks_remove_flock(struct file *); @@ -574,12 +597,25 @@ extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *); /* + * File types + */ +#define DT_UNKNOWN 0 +#define DT_FIFO 1 +#define DT_CHR 2 +#define DT_DIR 4 +#define DT_BLK 6 +#define DT_REG 8 +#define DT_LNK 10 +#define DT_SOCK 12 +#define DT_WHT 14 + +/* * This is the "filldir" function type, used by readdir() to let * the kernel specify what kind of dirent layout it wants to have. * This allows the kernel to read directories into kernel space or * to have different dirent layouts depending on the binary type. 
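 *
 * The new final "unsigned" argument carries one of the DT_* file type
 * codes defined above.  A minimal sketch of a callback under the new
 * signature -- the name and the counting behaviour are illustrative,
 * not part of this patch:
 *
 *	static int count_subdirs(void *buf, const char *name, int namlen,
 *	                         off_t offset, ino_t ino, unsigned d_type)
 *	{
 *		if (d_type == DT_DIR)
 *			++*(int *) buf;
 *		return 0;
 *	}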
*/ -typedef int (*filldir_t)(void *, const char *, int, off_t, ino_t); +typedef int (*filldir_t)(void *, const char *, int, off_t, ino_t, unsigned); struct file_operations { loff_t (*llseek) (struct file *, loff_t, int); @@ -700,7 +736,7 @@ asmlinkage int sys_open(const char *, int, int); asmlinkage int sys_close(unsigned int); /* yes, it's really unsigned */ -extern int do_truncate(struct dentry *, unsigned long); +extern int do_truncate(struct dentry *, loff_t); extern int get_unused_fd(void); extern void put_unused_fd(unsigned int); @@ -754,7 +790,7 @@ extern void refile_buffer(struct buffer_head * buf); extern void set_writetime(struct buffer_head * buf, int flag); -extern int try_to_free_buffers(struct page *, int wait); +extern int try_to_free_buffers(struct page *, int); extern int nr_buffers; extern long buffermem; @@ -763,15 +799,25 @@ #define BUF_CLEAN 0 #define BUF_LOCKED 1 /* Buffers scheduled for write */ #define BUF_DIRTY 2 /* Dirty buffers, not yet scheduled for write */ -#define NR_LIST 3 +#define BUF_PROTECTED 3 /* Ramdisk persistent storage */ +#define NR_LIST 4 void mark_buffer_uptodate(struct buffer_head * bh, int on); +extern inline void mark_buffer_protected(struct buffer_head * bh) +{ + if (!test_and_set_bit(BH_Protected, &bh->b_state)) { + if (bh->b_list != BUF_PROTECTED) + refile_buffer(bh); + } +} + extern inline void mark_buffer_clean(struct buffer_head * bh) { if (test_and_clear_bit(BH_Dirty, &bh->b_state)) { if (bh->b_list == BUF_DIRTY) refile_buffer(bh); + clear_bit(BH_Wait_IO, &bh->b_state); } } @@ -908,6 +954,9 @@ extern void inode_setattr(struct inode *, struct iattr *); extern __u32 inode_generation_count; + +#define fs_down(sem) do { current->fs_locks++; down(sem); } while (0) +#define fs_up(sem) do { up(sem); current->fs_locks--; } while (0) #endif /* __KERNEL__ */ diff -urN 2.2.17pre19/include/linux/iobuf.h 2.2.17pre19aa2/include/linux/iobuf.h --- 2.2.17pre19/include/linux/iobuf.h Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/include/linux/iobuf.h Sat Aug 26 17:13:50 2000 @@ -0,0 +1,82 @@ +/* + * iobuf.h + * + * Defines the structures used to track abstract kernel-space io buffers. + * + */ + +#ifndef __LINUX_IOBUF_H +#define __LINUX_IOBUF_H + +#include +#include + +/* + * The kiobuf structure describes a physical set of pages reserved and + * locked for IO. The reference counts on each page will have been + * incremented, and the flags field will indicate whether or not we have + * pre-locked all of the pages for IO. + * + * kiobufs may be passed in arrays to form a kiovec, but we must + * preserve the property that no page is present more than once over the + * entire iovec. + */ + +#define KIO_MAX_ATOMIC_IO 64 /* in kb */ +#define KIO_MAX_ATOMIC_BYTES (64 * 1024) +#define KIO_STATIC_PAGES (KIO_MAX_ATOMIC_IO / (PAGE_SIZE >> 10)) +#define KIO_MAX_SECTORS (KIO_MAX_ATOMIC_IO * 2) + +struct kiobuf +{ + int nr_pages; /* Pages actually referenced */ + int array_len; /* Space in the allocated lists */ + int offset; /* Offset to start of valid data */ + int length; /* Number of valid bytes of data */ + + /* Keep separate track of the physical addresses and page + * structs involved. If we do IO to a memory-mapped device + * region, there won't necessarily be page structs defined for + * every address.
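+ *
+ * As a hedged usage sketch (an editorial aside: error handling is
+ * omitted and every variable name is invented), the interfaces
+ * declared below pin a user buffer and run raw block IO through it:
+ *
+ *	struct kiobuf *iobuf;
+ *	alloc_kiovec(1, &iobuf);
+ *	map_user_kiobuf(READ, iobuf, user_va, len);
+ *	brw_kiovec(READ, 1, &iobuf, dev, blocks, blocksize);
+ *	unmap_kiobuf(iobuf);
+ *	free_kiovec(1, &iobuf);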
*/ + + unsigned long * pagelist; + struct page ** maplist; + unsigned long * bouncelist; + + unsigned int locked : 1; /* If set, pages have been locked */ + unsigned int bounced : 1; /* If set, bounce pages are set up */ + + /* Always embed enough struct pages for 64k of IO */ + unsigned long page_array[KIO_STATIC_PAGES]; + struct page * map_array[KIO_STATIC_PAGES]; + unsigned long bounce_array[KIO_STATIC_PAGES]; +}; + + +/* mm/memory.c */ + +int map_user_kiobuf(int rw, struct kiobuf *, unsigned long va, size_t len); +void unmap_kiobuf(struct kiobuf *iobuf); + +/* fs/iobuf.c */ + +void __init kiobuf_init(void); +int alloc_kiovec(int nr, struct kiobuf **); +void free_kiovec(int nr, struct kiobuf **); +int expand_kiobuf(struct kiobuf *, int); +int setup_kiobuf_bounce_pages(struct kiobuf *, int gfp_mask); +void clear_kiobuf_bounce_pages(struct kiobuf *); +void kiobuf_copy_bounce(struct kiobuf *, int direction, int max); + +/* Direction codes for kiobuf_copy_bounce: */ +enum { + COPY_TO_BOUNCE, + COPY_FROM_BOUNCE +}; + +/* fs/buffer.c */ + +int brw_kiovec(int rw, int nr, struct kiobuf *iovec[], + kdev_t dev, unsigned long b[], int size); + +#endif /* __LINUX_IOBUF_H */ diff -urN 2.2.17pre19/include/linux/kernel.h 2.2.17pre19aa2/include/linux/kernel.h --- 2.2.17pre19/include/linux/kernel.h Fri Jun 30 03:56:09 2000 +++ 2.2.17pre19aa2/include/linux/kernel.h Sat Aug 26 17:13:49 2000 @@ -90,7 +90,9 @@ unsigned long totalswap; /* Total swap space size */ unsigned long freeswap; /* swap space still available */ unsigned short procs; /* Number of current processes */ - char _f[22]; /* Pads structure to 64 bytes */ + unsigned long totalbig; /* Total big memory size */ + unsigned long freebig; /* Available big memory size */ + char _f[20-2*sizeof(long)]; /* Padding: libc5 uses this.. */ }; #endif diff -urN 2.2.17pre19/include/linux/locks.h 2.2.17pre19aa2/include/linux/locks.h --- 2.2.17pre19/include/linux/locks.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/linux/locks.h Sat Aug 26 17:13:52 2000 @@ -50,10 +50,12 @@ if (sb->s_lock) __wait_on_super(sb); sb->s_lock = 1; + current->fs_locks++; } extern inline void unlock_super(struct super_block * sb) { + current->fs_locks--; sb->s_lock = 0; wake_up(&sb->s_wait); } diff -urN 2.2.17pre19/include/linux/lvm.h 2.2.17pre19aa2/include/linux/lvm.h --- 2.2.17pre19/include/linux/lvm.h Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/include/linux/lvm.h Sat Aug 26 17:13:50 2000 @@ -0,0 +1,827 @@ +/* + * kernel/lvm.h + * + * Copyright (C) 1997 - 2000 Heinz Mauelshagen, Germany + * + * February-November 1997 + * May-July 1998 + * January-March,July,September,October,December 1999 + * January 2000 + * + * lvm is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * lvm is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU CC; see the file COPYING. If not, write to + * the Free Software Foundation, 59 Temple Place - Suite 330, + * Boston, MA 02111-1307, USA.
+ * + */ + +/* + * Changelog + * + * 10/10/1997 - beginning of new structure creation + * 12/05/1998 - incorporated structures from lvm_v1.h and deleted lvm_v1.h + * 07/06/1998 - avoided LVM_KMALLOC_MAX define by using vmalloc/vfree + * instead of kmalloc/kfree + * 01/07/1998 - fixed wrong LVM_MAX_SIZE + * 07/07/1998 - extended pe_t structure by ios member (for statistics) + * 02/08/1998 - changes for official char/block major numbers + * 07/08/1998 - avoided making init_module() and cleanup_module() static + * 29/08/1998 - separated core and disk structure type definitions + * 01/09/1998 - merged kernel integration version (mike) + * 20/01/1999 - added LVM_PE_DISK_OFFSET macro for use in + * vg_read_with_pv_and_lv(), pv_move_pe(), pv_show_pe_text()... + * 18/02/1999 - added definition of time_disk_t structure for keeping + * time stamps on disk for nonatomic writes (future) + * 15/03/1999 - corrected LV() and VG() macro definition to use argument + * instead of minor + * 03/07/1999 - define for genhd.c name handling + * 23/07/1999 - implemented snapshot part + * 08/12/1999 - changed LVM_LV_SIZE_MAX macro to reflect current 1TB limit + * 01/01/2000 - extended lv_v2 core structure by wait_queue member + * + */ + + +#ifndef _LVM_H_INCLUDE +#define _LVM_H_INCLUDE + +#define _LVM_H_VERSION "LVM 0.8 (1/1/2000)" + +/* + * preprocessor definitions + */ +/* if you like emergency reset code in the driver */ +#define LVM_TOTAL_RESET + +#define LVM_GET_INODE +#define LVM_HD_NAME + +/* lots of debugging output (see driver source) +#define DEBUG_LVM_GET_INFO +#define DEBUG +#define DEBUG_MAP +#define DEBUG_MAP_SIZE +#define DEBUG_IOCTL +#define DEBUG_READ +#define DEBUG_GENDISK +#define DEBUG_VG_CREATE +#define DEBUG_LVM_BLK_OPEN +#define DEBUG_VFREE +#define DEBUG_SNAPSHOT +*/ +/* + * end of preprocessor definitions + */ + +#ifndef LINUX_VERSION_CODE +# include + /* for 2.0.x series */ +# ifndef KERNEL_VERSION +# define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c)) +# endif +#endif + +#include +#include +#include +#if LINUX_VERSION_CODE >= KERNEL_VERSION ( 2, 3 ,0) +# include +#else +# include +#endif + +/* leave this for now until major.h is updated (mike) */ +#ifndef LVM_BLK_MAJOR +# define LVM_BLK_MAJOR 58 +#endif +#ifndef LVM_CHAR_MAJOR +# define LVM_CHAR_MAJOR 109 +#endif + +#if !defined ( LVM_BLK_MAJOR) || !defined ( LVM_CHAR_MAJOR) + #error Bad include/linux/major.h - LVM MAJOR undefined +#endif + + +#if LINUX_VERSION_CODE < KERNEL_VERSION ( 2, 1 ,0) +# ifndef uint8_t +# define uint8_t __u8 +# endif +# ifndef uint16_t +# define uint16_t __u16 +# endif +# ifndef uint32_t +# define uint32_t __u32 +# endif +# ifndef uint64_t +# define uint64_t __u64 +# endif +#endif + +#define LVM_STRUCT_VERSION 1 /* structure version */ + +#ifndef min +#define min(a,b) (((a)<(b))?(a):(b)) +#endif +#ifndef max +#define max(a,b) (((a)>(b))?(a):(b)) +#endif + +/* set the default structure version */ +#if ( LVM_STRUCT_VERSION == 1) +# define pv_t pv_v1_t +# define lv_t lv_v2_t +# define vg_t vg_v1_t +# define pv_disk_t pv_disk_v1_t +# define lv_disk_t lv_disk_v1_t +# define vg_disk_t vg_disk_v1_t +# define lv_exception_t lv_v2_exception_t +#endif + + +/* + * i/o protocol version + * + * defined here for the driver and defined separately in the + * user-land LVM parts + * + */ +#define LVM_DRIVER_IOP_VERSION 6 + +#define LVM_NAME "lvm" + +/* + * VG/LV indexing macros + */ +/* character minor maps directly to volume group */ +#define VG_CHR(a) ( a) + +/* block minor indexes into a volume group/logical volume
indirection table */ +#define VG_BLK(a) ( vg_lv_map[a].vg_number) +#define LV_BLK(a) ( vg_lv_map[a].lv_number) + +/* + * absolute limits for VGs, PVs per VG and LVs per VG + */ +#define ABS_MAX_VG 99 +#define ABS_MAX_PV 256 +#define ABS_MAX_LV 256 /* caused by 8 bit minor */ + +#define MAX_VG ABS_MAX_VG +#define MAX_LV ABS_MAX_LV +#define MAX_PV ABS_MAX_PV + +#if ( MAX_VG > ABS_MAX_VG) +# undef MAX_VG +# define MAX_VG ABS_MAX_VG +#endif + +#if ( MAX_LV > ABS_MAX_LV) +# undef MAX_LV +# define MAX_LV ABS_MAX_LV +#endif + + +/* + * VGDA: default disk spaces and offsets + * + * there's space after the structures for later extensions. + * + * offset what size + * --------------- ---------------------------------- ------------ + * 0 physical volume structure ~500 byte + * + * 1K volume group structure ~200 byte + * + * 5K time stamp structure ~ + * + * 6K namelist of physical volumes 128 byte each + * + * 6k + n * 128byte n logical volume structures ~300 byte each + * + * + m * 328byte m physical extent alloc. structs 4 byte each + * + * End of disk - first physical extent typical 4 megabyte + * PE total * + * PE size + * + * + */ + +/* DONT TOUCH THESE !!! */ +/* base of PV structure in disk partition */ +#define LVM_PV_DISK_BASE 0L + +/* size reserved for PV structure on disk */ +#define LVM_PV_DISK_SIZE 1024L + +/* base of VG structure in disk partition */ +#define LVM_VG_DISK_BASE LVM_PV_DISK_SIZE + +/* size reserved for VG structure */ +#define LVM_VG_DISK_SIZE ( 9 * 512L) + +/* size reserved for timekeeping */ +#define LVM_TIMESTAMP_DISK_BASE ( LVM_VG_DISK_BASE + LVM_VG_DISK_SIZE) +#define LVM_TIMESTAMP_DISK_SIZE 512L /* reserved for timekeeping */ + +/* name list of physical volumes on disk */ +#define LVM_PV_NAMELIST_DISK_BASE ( LVM_TIMESTAMP_DISK_BASE + \ + LVM_TIMESTAMP_DISK_SIZE) + +/* now for the dynamically calculated parts of the VGDA */ +#define LVM_LV_DISK_OFFSET(a, b) ( (a)->lv_on_disk.base + sizeof ( lv_t) * b) +#define LVM_DISK_SIZE(pv) ( (pv)->pe_on_disk.base + \ + (pv)->pe_on_disk.size) +#define LVM_PE_DISK_OFFSET(pe, pv) ( pe * pv->pe_size + \ + ( LVM_DISK_SIZE ( pv) / SECTOR_SIZE)) +#define LVM_PE_ON_DISK_BASE(pv) \ + { int rest; \ + pv->pe_on_disk.base = pv->lv_on_disk.base + pv->lv_on_disk.size; \ + if ( ( rest = pv->pe_on_disk.base % SECTOR_SIZE) != 0) \ + pv->pe_on_disk.base += ( SECTOR_SIZE - rest); \ + } +/* END default disk spaces and offsets for PVs */ + + +/* + * LVM_PE_T_MAX is 2^16 - 2 = 65534 usable extents, so the maximum + * LV size is 65534 * PE size: + * + * 8KB PE size can map a ~512 MB logical volume at the cost of 1MB memory, + * + * 128MB PE size can map an 8TB logical volume at the same cost of memory. + * + * Default PE size of 4 MB gives a maximum logical volume size of 256 GB. + * + * Maximum PE size of 16GB gives a maximum logical volume size of 1024 TB. + * + * AFAIK, current kernels limit this to 1 TB. + * + * Should be a sufficient spectrum ;*) + */ + +/* This is the usable size of disk_pe_t.le_num !!! v v */ +#define LVM_PE_T_MAX ( ( 1 << ( sizeof ( uint16_t) * 8)) - 2) + +#define LVM_LV_SIZE_MAX(a) ( ( long long) LVM_PE_T_MAX * (a)->pe_size > ( long long) 2*1024*1024*1024 ?
( long long) 2*1024*1024*1024 : ( long long) LVM_PE_T_MAX * (a)->pe_size) +#define LVM_MIN_PE_SIZE ( 8L * 2) /* 8 KB in sectors */ +#define LVM_MAX_PE_SIZE ( 16L * 1024L * 1024L * 2) /* 16GB in sectors */ +#define LVM_DEFAULT_PE_SIZE ( 4096L * 2) /* 4 MB in sectors */ +#define LVM_DEFAULT_STRIPE_SIZE 16L /* 16 KB */ +#define LVM_MIN_STRIPE_SIZE 2L /* 1 KB in sectors */ +#define LVM_MAX_STRIPE_SIZE ( 512L * 2) /* 512 KB in sectors */ +#define LVM_MAX_STRIPES 128 /* max # of stripes */ +#define LVM_MAX_SIZE ( 1024LU * 1024 * 1024 * 2) /* 1TB[sectors] */ +#define LVM_MAX_MIRRORS 2 /* future use */ +#define LVM_MIN_READ_AHEAD 0 /* minimum read ahead sectors */ +#define LVM_MAX_READ_AHEAD 256 /* maximum read ahead sectors */ +#define LVM_DEF_READ_AHEAD ((LVM_MAX_READ_AHEAD-LVM_MIN_READ_AHEAD)/2 + LVM_MIN_READ_AHEAD) +#define LVM_MAX_LV_IO_TIMEOUT 60 /* seconds I/O timeout (future use) */ +#define LVM_PARTITION 0xfe /* LVM partition id */ +#define LVM_NEW_PARTITION 0x8e /* new LVM partition id (10/09/1999) */ +#define LVM_PE_SIZE_PV_SIZE_REL 5 /* max relation PV size and PE size */ + +#define LVM_SNAPSHOT_MAX_CHUNK 256 /* 256 KB */ +#define LVM_SNAPSHOT_DEF_CHUNK 64 /* 64 KB */ +#define LVM_SNAPSHOT_MIN_CHUNK 1 /* 1 KB */ + +#define UNDEF -1 +#define FALSE 0 +#define TRUE 1 + + +/* + * ioctls + */ +/* volume group */ +#define VG_CREATE _IOW ( 0xfe, 0x00, 1) +#define VG_REMOVE _IOW ( 0xfe, 0x01, 1) + +#define VG_EXTEND _IOW ( 0xfe, 0x03, 1) +#define VG_REDUCE _IOW ( 0xfe, 0x04, 1) + +#define VG_STATUS _IOWR ( 0xfe, 0x05, 1) +#define VG_STATUS_GET_COUNT _IOWR ( 0xfe, 0x06, 1) +#define VG_STATUS_GET_NAMELIST _IOWR ( 0xfe, 0x07, 1) + +#define VG_SET_EXTENDABLE _IOW ( 0xfe, 0x08, 1) + + +/* logical volume */ +#define LV_CREATE _IOW ( 0xfe, 0x20, 1) +#define LV_REMOVE _IOW ( 0xfe, 0x21, 1) + +#define LV_ACTIVATE _IO ( 0xfe, 0x22) +#define LV_DEACTIVATE _IO ( 0xfe, 0x23) + +#define LV_EXTEND _IOW ( 0xfe, 0x24, 1) +#define LV_REDUCE _IOW ( 0xfe, 0x25, 1) + +#define LV_STATUS_BYNAME _IOWR ( 0xfe, 0x26, 1) +#define LV_STATUS_BYINDEX _IOWR ( 0xfe, 0x27, 1) + +#define LV_SET_ACCESS _IOW ( 0xfe, 0x28, 1) +#define LV_SET_ALLOCATION _IOW ( 0xfe, 0x29, 1) +#define LV_SET_STATUS _IOW ( 0xfe, 0x2a, 1) + +#define LE_REMAP _IOW ( 0xfe, 0x2b, 1) + + +/* physical volume */ +#define PV_STATUS _IOWR ( 0xfe, 0x40, 1) +#define PV_CHANGE _IOWR ( 0xfe, 0x41, 1) +#define PV_FLUSH _IOW ( 0xfe, 0x42, 1) + +/* physical extent */ +#define PE_LOCK_UNLOCK _IOW ( 0xfe, 0x50, 1) + +/* i/o protocol version */ +#define LVM_GET_IOP_VERSION _IOR ( 0xfe, 0x98, 1) + +#ifdef LVM_TOTAL_RESET +/* special reset function for testing purposes */ +#define LVM_RESET _IO ( 0xfe, 0x99) +#endif + +/* lock the logical volume manager */ +#define LVM_LOCK_LVM _IO ( 0xfe, 0x100) +/* END ioctls */ + + +/* + * Status flags + */ +/* volume group */ +#define VG_ACTIVE 0x01 /* vg_status */ +#define VG_EXPORTED 0x02 /* " */ +#define VG_EXTENDABLE 0x04 /* " */ + +#define VG_READ 0x01 /* vg_access */ +#define VG_WRITE 0x02 /* " */ + +/* logical volume */ +#define LV_ACTIVE 0x01 /* lv_status */ +#define LV_SPINDOWN 0x02 /* " */ + +#define LV_READ 0x01 /* lv_access */ +#define LV_WRITE 0x02 /* " */ +#define LV_SNAPSHOT 0x04 /* " */ +#define LV_SNAPSHOT_ORG 0x08 /* " */ + +#define LV_BADBLOCK_ON 0x01 /* lv_badblock */ + +#define LV_STRICT 0x01 /* lv_allocation */ +#define LV_CONTIGUOUS 0x02 /* " */ + +/* physical volume */ +#define PV_ACTIVE 0x01 /* pv_status */ +#define PV_ALLOCATABLE 0x02 /* pv_allocatable */ + + +/* + * Structure definitions core/disk follow + 
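+ *
+ * (a hedged aside: the core variants carry live kernel state such as
+ * kdev_t values, pointers and semaphores, while the *_disk_t twins
+ * hold only fixed-width integers that can be written to the VGDA)
+ *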
* + * conditional conversion takes place on big endian architectures + * in functions * pv_copy_*(), vg_copy_*() and lv_copy_*() + * + */ + +#define NAME_LEN 128 /* don't change!!! */ +#define UUID_LEN 16 /* don't change!!! */ + +/* remap physical sector/rdev pairs */ +typedef struct { + struct list_head hash; + ulong rsector_org; + kdev_t rdev_org; + ulong rsector_new; + kdev_t rdev_new; +} lv_block_exception_t; + + +/* disk stored pe information */ +typedef struct { + uint16_t lv_num; + uint16_t le_num; +} disk_pe_t; + +/* disk stored PV, VG, LV and PE size and offset information */ +typedef struct { + uint32_t base; + uint32_t size; +} lvm_disk_data_t; + + +/* + * Structure Physical Volume (PV) Version 1 + */ + +/* core */ +typedef struct { + uint8_t id[2]; /* Identifier */ + uint16_t version; /* HM lvm version */ + lvm_disk_data_t pv_on_disk; + lvm_disk_data_t vg_on_disk; + lvm_disk_data_t pv_namelist_on_disk; + lvm_disk_data_t lv_on_disk; + lvm_disk_data_t pe_on_disk; + uint8_t pv_name[NAME_LEN]; + uint8_t vg_name[NAME_LEN]; + uint8_t system_id[NAME_LEN]; /* for vgexport/vgimport */ + kdev_t pv_dev; + uint32_t pv_number; + uint32_t pv_status; + uint32_t pv_allocatable; + uint32_t pv_size; /* HM */ + uint32_t lv_cur; + uint32_t pe_size; + uint32_t pe_total; + uint32_t pe_allocated; + uint32_t pe_stale; /* for future use */ + disk_pe_t *pe; /* HM */ + struct inode *inode; /* HM */ +} pv_v1_t; + +/* disk */ +typedef struct { + uint8_t id[2]; /* Identifier */ + uint16_t version; /* HM lvm version */ + lvm_disk_data_t pv_on_disk; + lvm_disk_data_t vg_on_disk; + lvm_disk_data_t pv_namelist_on_disk; + lvm_disk_data_t lv_on_disk; + lvm_disk_data_t pe_on_disk; + uint8_t pv_name[NAME_LEN]; + uint8_t vg_name[NAME_LEN]; + uint8_t system_id[NAME_LEN]; /* for vgexport/vgimport */ + uint32_t pv_major; + uint32_t pv_number; + uint32_t pv_status; + uint32_t pv_allocatable; + uint32_t pv_size; /* HM */ + uint32_t lv_cur; + uint32_t pe_size; + uint32_t pe_total; + uint32_t pe_allocated; + uint32_t dummy1; + uint32_t dummy2; + uint32_t dummy3; +} pv_disk_v1_t; + + +/* + * Structure Physical Volume (PV) Version 2 (future!) 
+ */ + +typedef struct { + uint8_t id[2]; /* Identifier */ + uint16_t version; /* HM lvm version */ + lvm_disk_data_t pv_on_disk; + lvm_disk_data_t vg_on_disk; + lvm_disk_data_t pv_uuid_on_disk; + lvm_disk_data_t lv_on_disk; + lvm_disk_data_t pe_on_disk; + uint8_t pv_name[NAME_LEN]; + uint8_t vg_name[NAME_LEN]; + uint8_t system_id[NAME_LEN]; /* for vgexport/vgimport */ + kdev_t pv_dev; + uint32_t pv_number; + uint32_t pv_status; + uint32_t pv_allocatable; + uint32_t pv_size; /* HM */ + uint32_t lv_cur; + uint32_t pe_size; + uint32_t pe_total; + uint32_t pe_allocated; + uint32_t pe_stale; /* for future use */ + disk_pe_t *pe; /* HM */ + struct inode *inode; /* HM */ + /* delta to version 1 starts here */ + uint8_t pv_uuid[UUID_LEN]; + uint32_t pv_atime; /* PV access time */ + uint32_t pv_ctime; /* PV creation time */ + uint32_t pv_mtime; /* PV modification time */ +} pv_v2_t; + + +/* + * Structures for Logical Volume (LV) + */ + +/* core PE information */ +typedef struct { + kdev_t dev; + uint32_t pe; /* to be changed if > 2TB */ + uint32_t reads; + uint32_t writes; +} pe_t; + +typedef struct { + uint8_t lv_name[NAME_LEN]; + kdev_t old_dev; + kdev_t new_dev; + ulong old_pe; + ulong new_pe; +} le_remap_req_t; + + + +/* + * Structure Logical Volume (LV) Version 1 + */ + +/* core */ +typedef struct { + uint8_t lv_name[NAME_LEN]; + uint8_t vg_name[NAME_LEN]; + uint32_t lv_access; + uint32_t lv_status; + uint32_t lv_open; /* HM */ + kdev_t lv_dev; /* HM */ + uint32_t lv_number; /* HM */ + uint32_t lv_mirror_copies; /* for future use */ + uint32_t lv_recovery; /* " */ + uint32_t lv_schedule; /* " */ + uint32_t lv_size; + pe_t *lv_current_pe; /* HM */ + uint32_t lv_current_le; /* for future use */ + uint32_t lv_allocated_le; + uint32_t lv_stripes; + uint32_t lv_stripesize; + uint32_t lv_badblock; /* for future use */ + uint32_t lv_allocation; + uint32_t lv_io_timeout; /* for future use */ + uint32_t lv_read_ahead; +} lv_v1_t; + +/* disk */ +typedef struct { + uint8_t lv_name[NAME_LEN]; + uint8_t vg_name[NAME_LEN]; + uint32_t lv_access; + uint32_t lv_status; + uint32_t lv_open; /* HM */ + uint32_t lv_dev; /* HM */ + uint32_t lv_number; /* HM */ + uint32_t lv_mirror_copies; /* for future use */ + uint32_t lv_recovery; /* " */ + uint32_t lv_schedule; /* " */ + uint32_t lv_size; + uint32_t dummy; + uint32_t lv_current_le; /* for future use */ + uint32_t lv_allocated_le; + uint32_t lv_stripes; + uint32_t lv_stripesize; + uint32_t lv_badblock; /* for future use */ + uint32_t lv_allocation; + uint32_t lv_io_timeout; /* for future use */ + uint32_t lv_read_ahead; /* HM, for future use */ +} lv_disk_v1_t; + + +/* + * Structure Logical Volume (LV) Version 2 + */ + +/* core */ +typedef struct lv_v2 { + uint8_t lv_name[NAME_LEN]; + uint8_t vg_name[NAME_LEN]; + uint32_t lv_access; + uint32_t lv_status; + uint32_t lv_open; /* HM */ + kdev_t lv_dev; /* HM */ + uint32_t lv_number; /* HM */ + uint32_t lv_mirror_copies; /* for future use */ + uint32_t lv_recovery; /* " */ + uint32_t lv_schedule; /* " */ + uint32_t lv_size; + pe_t *lv_current_pe; /* HM */ + uint32_t lv_current_le; /* for future use */ + uint32_t lv_allocated_le; + uint32_t lv_stripes; + uint32_t lv_stripesize; + uint32_t lv_badblock; /* for future use */ + uint32_t lv_allocation; + uint32_t lv_io_timeout; /* for future use */ + uint32_t lv_read_ahead; + /* delta to version 1 starts here */ + struct lv_v2 *lv_snapshot_org; + struct lv_v2 *lv_snapshot_prev; + struct lv_v2 *lv_snapshot_next; + lv_block_exception_t *lv_block_exception; + uint8_t 
__unused2; + uint32_t lv_remap_ptr; + uint32_t lv_remap_end; + uint32_t lv_chunk_size; + uint32_t lv_snapshot_minor; + struct kiobuf * lv_iobuf; + struct semaphore lv_snapshot_sem; + struct list_head * lv_snapshot_hash_table; + unsigned long lv_snapshot_hash_mask; +} lv_v2_t; + +/* disk */ +typedef struct { + uint8_t lv_name[NAME_LEN]; + uint8_t vg_name[NAME_LEN]; + uint32_t lv_access; + uint32_t lv_status; + uint32_t lv_open; /* HM */ + uint32_t lv_dev; /* HM */ + uint32_t lv_number; /* HM */ + uint32_t lv_mirror_copies; /* for future use */ + uint32_t lv_recovery; /* " */ + uint32_t lv_schedule; /* " */ + uint32_t lv_size; + uint32_t dummy; + uint32_t lv_current_le; /* for future use */ + uint32_t lv_allocated_le; + uint32_t lv_stripes; + uint32_t lv_stripesize; + uint32_t lv_badblock; /* for future use */ + uint32_t lv_allocation; + uint32_t lv_io_timeout; /* for future use */ + uint32_t lv_read_ahead; /* HM, for future use */ +} lv_disk_v2_t; + + +/* + * Structure Volume Group (VG) Version 1 + */ + +typedef struct { + uint8_t vg_name[NAME_LEN]; /* volume group name */ + uint32_t vg_number; /* volume group number */ + uint32_t vg_access; /* read/write */ + uint32_t vg_status; /* active or not */ + uint32_t lv_max; /* maximum logical volumes */ + uint32_t lv_cur; /* current logical volumes */ + uint32_t lv_open; /* open logical volumes */ + uint32_t pv_max; /* maximum physical volumes */ + uint32_t pv_cur; /* current physical volumes FU */ + uint32_t pv_act; /* active physical volumes */ + uint32_t dummy; /* was obsolete max_pe_per_pv */ + uint32_t vgda; /* volume group descriptor arrays FU */ + uint32_t pe_size; /* physical extent size in sectors */ + uint32_t pe_total; /* total of physical extents */ + uint32_t pe_allocated; /* allocated physical extents */ + uint32_t pvg_total; /* physical volume groups FU */ + struct proc_dir_entry *proc; + pv_t *pv[ABS_MAX_PV+1]; /* physical volume struct pointers */ + lv_t *lv[ABS_MAX_LV+1]; /* logical volume struct pointers */ +} vg_v1_t; + +typedef struct { + uint8_t vg_name[NAME_LEN]; /* volume group name */ + uint32_t vg_number; /* volume group number */ + uint32_t vg_access; /* read/write */ + uint32_t vg_status; /* active or not */ + uint32_t lv_max; /* maximum logical volumes */ + uint32_t lv_cur; /* current logical volumes */ + uint32_t lv_open; /* open logical volumes */ + uint32_t pv_max; /* maximum physical volumes */ + uint32_t pv_cur; /* current physical volumes FU */ + uint32_t pv_act; /* active physical volumes */ + uint32_t dummy; + uint32_t vgda; /* volume group descriptor arrays FU */ + uint32_t pe_size; /* physical extent size in sectors */ + uint32_t pe_total; /* total of physical extents */ + uint32_t pe_allocated; /* allocated physical extents */ + uint32_t pvg_total; /* physical volume groups FU */ +} vg_disk_v1_t; + +/* + * Structure Volume Group (VG) Version 2 + */ + +typedef struct { + uint8_t vg_name[NAME_LEN]; /* volume group name */ + uint32_t vg_number; /* volume group number */ + uint32_t vg_access; /* read/write */ + uint32_t vg_status; /* active or not */ + uint32_t lv_max; /* maximum logical volumes */ + uint32_t lv_cur; /* current logical volumes */ + uint32_t lv_open; /* open logical volumes */ + uint32_t pv_max; /* maximum physical volumes */ + uint32_t pv_cur; /* current physical volumes FU */ + uint32_t pv_act; /* future: active physical volumes */ + uint32_t max_pe_per_pv; /* OBSOLETE maximum PE/PV */ + uint32_t vgda; /* volume group descriptor arrays FU */ + uint32_t pe_size; /* physical extent size in 
sectors */ + uint32_t pe_total; /* total of physical extents */ + uint32_t pe_allocated; /* allocated physical extents */ + uint32_t pvg_total; /* physical volume groups FU */ + struct proc_dir_entry *proc; + pv_t *pv[ABS_MAX_PV+1]; /* physical volume struct pointers */ + lv_t *lv[ABS_MAX_LV+1]; /* logical volume struct pointers */ + /* delta to version 1 starts here */ + uint8_t vg_uuid[UUID_LEN]; /* volume group UUID */ + time_t vg_atime; /* VG access time */ + time_t vg_ctime; /* VG creation time */ + time_t vg_mtime; /* VG modification time */ +} vg_v2_t; + + +/* + * Timekeeping structure on disk (0.7 feature) + * + * Holds several timestamps for start/stop time of non + * atomic VGDA disk i/o operations + * + */ + +typedef struct { + uint32_t seconds; /* seconds since the epoch */ + uint32_t jiffies; /* micro timer */ +} lvm_time_t; + +#define TIMESTAMP_ID_SIZE 2 +typedef struct { + uint8_t id[TIMESTAMP_ID_SIZE]; /* Identifier */ + lvm_time_t pv_vg_lv_pe_io_begin; + lvm_time_t pv_vg_lv_pe_io_end; + lvm_time_t pv_io_begin; + lvm_time_t pv_io_end; + lvm_time_t vg_io_begin; + lvm_time_t vg_io_end; + lvm_time_t lv_io_begin; + lvm_time_t lv_io_end; + lvm_time_t pe_io_begin; + lvm_time_t pe_io_end; + lvm_time_t pe_move_io_begin; + lvm_time_t pe_move_io_end; + uint8_t dummy[LVM_TIMESTAMP_DISK_SIZE - + TIMESTAMP_ID_SIZE - + 12 * sizeof(lvm_time_t)]; + /* ATTENTION ^^ */ +} timestamp_disk_t; + +/* same on disk and in core so far */ +typedef timestamp_disk_t timestamp_t; + +/* function identifiers for timestamp actions */ +typedef enum { PV_VG_LV_PE_IO_BEGIN, + PV_VG_LV_PE_IO_END, + PV_IO_BEGIN, + PV_IO_END, + VG_IO_BEGIN, + VG_IO_END, + LV_IO_BEGIN, + LV_IO_END, + PE_IO_BEGIN, + PE_IO_END, + PE_MOVE_IO_BEGIN, + PE_MOVE_IO_END} ts_fct_id_t; + + +/* + * Request structures for ioctls + */ + +/* Request structure PV_STATUS */ +typedef struct { + char pv_name[NAME_LEN]; + pv_t *pv; +} pv_status_req_t, pv_change_req_t; + +/* Request structure PV_FLUSH */ +typedef struct { + char pv_name[NAME_LEN]; +} pv_flush_req_t; + + +/* Request structure PE_MOVE */ +typedef struct { + enum { LOCK_PE, UNLOCK_PE} lock; + struct { + kdev_t lv_dev; + kdev_t pv_dev; + uint32_t pv_offset; + } data; +} pe_lock_req_t; + + +/* Request structure LV_STATUS_BYNAME */ +typedef struct { + char lv_name[NAME_LEN]; + lv_t *lv; +} lv_status_byname_req_t, lv_req_t; + +/* Request structure LV_STATUS_BYINDEX */ +typedef struct { + ulong lv_index; + lv_t *lv; +} lv_status_byindex_req_t; + +#endif /* #ifndef _LVM_H_INCLUDE */ diff -urN 2.2.17pre19/include/linux/major.h 2.2.17pre19aa2/include/linux/major.h --- 2.2.17pre19/include/linux/major.h Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/include/linux/major.h Sat Aug 26 17:13:50 2000 @@ -117,6 +117,8 @@ #define AURORA_MAJOR 79 +#define RAW_MAJOR 162 + #define UNIX98_PTY_MASTER_MAJOR 128 #define UNIX98_PTY_MAJOR_COUNT 8 #define UNIX98_PTY_SLAVE_MAJOR (UNIX98_PTY_MASTER_MAJOR+UNIX98_PTY_MAJOR_COUNT) diff -urN 2.2.17pre19/include/linux/mm.h 2.2.17pre19aa2/include/linux/mm.h --- 2.2.17pre19/include/linux/mm.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/linux/mm.h Sat Aug 26 17:13:52 2000 @@ -54,7 +54,7 @@ struct vm_area_struct **vm_pprev_share; struct vm_operations_struct * vm_ops; - unsigned long vm_offset; + loff_t vm_offset; struct file * vm_file; unsigned long vm_pte; /* shared mem */ }; @@ -106,9 +106,46 @@ unsigned long (*wppage)(struct vm_area_struct * area, unsigned long address, unsigned long page); int (*swapout)(struct vm_area_struct *, struct page *); - pte_t 
(*swapin)(struct vm_area_struct *, unsigned long, unsigned long); + pte_t (*swapin)(struct vm_area_struct *, loff_t, unsigned long); }; + +/* + * The pgoff_t type comes in two flavours: a structure wrapper, whose + * distinct type lets the compiler trap misuse at compile time, and a + * plain integer scalar, which generates simpler code. Either way a + * byte position converts to a page index via loff2pgoff(pos), i.e. + * pos >> PAGE_SHIFT. + */ + +#if 0 +typedef struct pgoff_t { + unsigned long pgoff; +} pgoff_t; + +#define pgoff2ulong(pgof) ((pgof).pgoff) +extern __inline__ pgoff_t ulong2pgoff(unsigned long ul) { + pgoff_t up; + up.pgoff = ul; + return up; +} + +#define pgoff2loff(pgof) (((loff_t)(pgof).pgoff) << PAGE_SHIFT) +#define loff2pgoff(loff) ulong2pgoff((loff) >> PAGE_SHIFT) + +#else /* Integer scalars -- simpler code.. */ + +typedef unsigned long pgoff_t; + +#define pgoff2ulong(pgof) (pgof) +#define ulong2pgoff(pgof) (pgof) + +#define pgoff2loff(pgof) (((loff_t)(pgof)) << PAGE_SHIFT) +#define loff2pgoff(loff) ulong2pgoff((loff) >> PAGE_SHIFT) + +#endif + +#define PAGE_MASK_loff ((loff_t)(long)(PAGE_MASK)) + + /* * Try to keep the most commonly accessed fields in single cache lines * here (16 bytes or greater). This ordering should be particularly @@ -117,12 +154,13 @@ * The first line is data used in page cache lookup, the second line * is used for linear searches (eg. clock algorithm scans). */ + typedef struct page { /* these must be first (free area handling) */ struct page *next; struct page *prev; + pgoff_t index; struct inode *inode; - unsigned long offset; struct page *next_hash; atomic_t count; unsigned long flags; /* atomic flags, some possibly updated asynchronously */ @@ -144,6 +182,7 @@ #define PG_Slab 9 #define PG_swap_cache 10 #define PG_skip 11 +#define PG_BIGMEM 12 #define PG_reserved 31 /* Make it prettier to test the above... */ @@ -175,6 +214,11 @@ (test_and_clear_bit(PG_dirty, &(page)->flags)) #define PageTestandClearSwapCache(page) \ (test_and_clear_bit(PG_swap_cache, &(page)->flags)) +#ifdef CONFIG_BIGMEM +#define PageBIGMEM(page) (test_bit(PG_BIGMEM, &(page)->flags)) +#else +#define PageBIGMEM(page) 0 /* needed to optimize away at compile time */ +#endif /* * Various page->flags bits: @@ -291,7 +335,7 @@ extern int remap_page_range(unsigned long from, unsigned long to, unsigned long size, pgprot_t prot); extern int zeromap_page_range(unsigned long from, unsigned long size, pgprot_t prot); -extern void vmtruncate(struct inode * inode, unsigned long offset); +extern void vmtruncate(struct inode * inode, loff_t offset); extern int handle_mm_fault(struct task_struct *tsk,struct vm_area_struct *vma, unsigned long address, int write_access); extern int make_pages_present(unsigned long addr, unsigned long end); @@ -311,16 +355,22 @@ extern void exit_mmap(struct mm_struct *); extern unsigned long get_unmapped_area(unsigned long, unsigned long); -extern unsigned long do_mmap(struct file *, unsigned long, unsigned long, - unsigned long, unsigned long, unsigned long); +extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, + unsigned long len, unsigned long prot, + unsigned long flag, unsigned long pgoff); + +extern unsigned long do_mmap(struct file *, unsigned long, + unsigned long, unsigned long, + unsigned long, unsigned long); + extern int do_munmap(unsigned long, size_t); /* filemap.c */ extern void remove_inode_page(struct page *); extern unsigned long page_unuse(struct page *); extern int shrink_mmap(int, int); -extern void truncate_inode_pages(struct inode *, unsigned long); -extern unsigned long get_cached_page(struct inode *, unsigned long, int); +extern void
truncate_inode_pages(struct inode *, loff_t); +extern unsigned long get_cached_page(struct inode *, pgoff_t, int); extern void put_cached_page(unsigned long); /* @@ -332,11 +382,17 @@ #define __GFP_HIGH 0x08 #define __GFP_IO 0x10 #define __GFP_SWAP 0x20 +#ifdef CONFIG_BIGMEM +#define __GFP_BIGMEM 0x40 +#else +#define __GFP_BIGMEM 0x0 /* noop */ +#endif #define __GFP_DMA 0x80 #define GFP_BUFFER (__GFP_MED | __GFP_WAIT) #define GFP_ATOMIC (__GFP_HIGH) +#define GFP_BIGUSER (__GFP_LOW | __GFP_WAIT | __GFP_IO | __GFP_BIGMEM) #define GFP_USER (__GFP_LOW | __GFP_WAIT | __GFP_IO) #define GFP_KERNEL (__GFP_MED | __GFP_WAIT | __GFP_IO) #define GFP_NFS (__GFP_HIGH | __GFP_WAIT | __GFP_IO) @@ -347,13 +403,23 @@ #define GFP_DMA __GFP_DMA +/* Flag - indicates that the buffer can be taken from big memory which is not + directly addressable by the kernel */ + +#define GFP_BIGMEM __GFP_BIGMEM + +extern int heap_stack_gap; + /* vma is the first one with address < vma->vm_end, * and even address < vma->vm_start. Have to extend vma. */ -static inline int expand_stack(struct vm_area_struct * vma, unsigned long address) +static inline int expand_stack(struct vm_area_struct * vma, unsigned long address, + struct vm_area_struct * prev_vma) { unsigned long grow; address &= PAGE_MASK; + if (prev_vma && prev_vma->vm_end + (heap_stack_gap << PAGE_SHIFT) > address) + return -ENOMEM; grow = vma->vm_start - address; if ((vma->vm_end - address > current->rlim[RLIMIT_STACK].rlim_cur) || diff -urN 2.2.17pre19/include/linux/nfs.h 2.2.17pre19aa2/include/linux/nfs.h --- 2.2.17pre19/include/linux/nfs.h Tue Feb 1 18:24:19 2000 +++ 2.2.17pre19aa2/include/linux/nfs.h Sat Aug 26 17:13:52 2000 @@ -111,7 +111,7 @@ __u32 nlink; __u32 uid; __u32 gid; - __u32 size; + __u64 size; __u32 blocksize; __u32 rdev; __u32 blocks; @@ -126,7 +126,7 @@ __u32 mode; __u32 uid; __u32 gid; - __u32 size; + __u64 size; struct nfs_time atime; struct nfs_time mtime; }; @@ -141,7 +141,7 @@ struct nfs_writeargs { struct nfs_fh * fh; - __u32 offset; + __u64 offset; __u32 count; const void * buffer; }; @@ -160,7 +160,7 @@ struct nfs_readargs { struct nfs_fh * fh; - __u32 offset; + __u64 offset; __u32 count; void * buffer; }; diff -urN 2.2.17pre19/include/linux/nfs_fs.h 2.2.17pre19aa2/include/linux/nfs_fs.h --- 2.2.17pre19/include/linux/nfs_fs.h Fri May 19 20:24:31 2000 +++ 2.2.17pre19aa2/include/linux/nfs_fs.h Sat Aug 26 17:13:52 2000 @@ -143,10 +143,10 @@ void **p0, char **string, unsigned int *len, unsigned int maxlen); extern int nfs_proc_read(struct nfs_server *server, struct nfs_fh *fhandle, - int swap, unsigned long offset, unsigned int count, + int swap, loff_t offset, unsigned int count, void *buffer, struct nfs_fattr *fattr); extern int nfs_proc_write(struct nfs_server *server, struct nfs_fh *fhandle, - int swap, unsigned long offset, unsigned int count, + int swap, loff_t offset, unsigned int count, const void *buffer, struct nfs_fattr *fattr); extern int nfs_proc_create(struct nfs_server *server, struct nfs_fh *dir, const char *name, struct nfs_sattr *sattr, diff -urN 2.2.17pre19/include/linux/nfsd/nfsd.h 2.2.17pre19aa2/include/linux/nfsd/nfsd.h --- 2.2.17pre19/include/linux/nfsd/nfsd.h Wed Aug 23 16:41:12 2000 +++ 2.2.17pre19aa2/include/linux/nfsd/nfsd.h Sat Aug 26 17:13:52 2000 @@ -54,7 +54,7 @@ char dotonly; }; typedef int (*encode_dent_fn)(struct readdir_cd *, const char *, - int, off_t, ino_t); + int, off_t, ino_t, unsigned int); typedef int (*nfsd_dirop_t)(struct inode *, struct dentry *, int, int); /* diff -urN 
2.2.17pre19/include/linux/nfsd/xdr.h 2.2.17pre19aa2/include/linux/nfsd/xdr.h --- 2.2.17pre19/include/linux/nfsd/xdr.h Wed Aug 23 16:41:12 2000 +++ 2.2.17pre19aa2/include/linux/nfsd/xdr.h Sat Aug 26 17:13:52 2000 @@ -152,7 +152,7 @@ int nfssvc_encode_readdirres(struct svc_rqst *, u32 *, struct nfsd_readdirres *); int nfssvc_encode_entry(struct readdir_cd *, const char *name, - int namlen, off_t offset, ino_t ino); + int namlen, off_t offset, ino_t ino, unsigned int d_type); int nfssvc_release_fhandle(struct svc_rqst *, u32 *, struct nfsd_fhandle *); diff -urN 2.2.17pre19/include/linux/nfsd/xdr3.h 2.2.17pre19aa2/include/linux/nfsd/xdr3.h --- 2.2.17pre19/include/linux/nfsd/xdr3.h Mon Jan 17 16:44:48 2000 +++ 2.2.17pre19aa2/include/linux/nfsd/xdr3.h Sat Aug 26 17:13:52 2000 @@ -263,6 +263,6 @@ int nfs3svc_release_fhandle2(struct svc_rqst *, u32 *, struct nfsd3_fhandle2 *); int nfs3svc_encode_entry(struct readdir_cd *, const char *name, - int namlen, unsigned long offset, ino_t ino); + int namlen, unsigned long offset, ino_t ino, unsigned int d_type); #endif /* LINUX_NFSD_XDR3_H */ diff -urN 2.2.17pre19/include/linux/pagemap.h 2.2.17pre19aa2/include/linux/pagemap.h --- 2.2.17pre19/include/linux/pagemap.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/linux/pagemap.h Sat Aug 26 17:13:52 2000 @@ -28,6 +28,7 @@ #define PAGE_CACHE_SHIFT PAGE_SHIFT #define PAGE_CACHE_SIZE PAGE_SIZE #define PAGE_CACHE_MASK PAGE_MASK +#define PAGE_CACHE_MASK_loff PAGE_MASK_loff #define page_cache_alloc() __get_free_page(GFP_USER) #define page_cache_free(x) free_page(x) @@ -54,10 +55,10 @@ * inode pointer and offsets are distributed (ie, we * roughly know which bits are "significant") */ -static inline unsigned long _page_hashfn(struct inode * inode, unsigned long offset) +static inline unsigned long _page_hashfn(struct inode * inode, pgoff_t index) { #define i (((unsigned long) inode)/(sizeof(struct inode) & ~ (sizeof(struct inode) - 1))) -#define o ((offset >> PAGE_SHIFT) + (offset & ~PAGE_MASK)) +#define o ((pgoff2ulong(index) >> PAGE_SHIFT) + (pgoff2ulong(index) & ~PAGE_MASK)) return ((i+o) & PAGE_HASH_MASK); #undef i #undef o @@ -65,7 +66,7 @@ #define page_hash(inode,offset) (page_hash_table+_page_hashfn(inode,offset)) -static inline struct page * __find_page(struct inode * inode, unsigned long offset, struct page *page) +static inline struct page * __find_page(struct inode * inode, pgoff_t index, struct page *page) { goto inside; for (;;) { @@ -75,7 +76,7 @@ goto not_found; if (page->inode != inode) continue; - if (page->offset == offset) + if (pgoff2ulong(page->index) == pgoff2ulong(index)) break; } /* Found the page. 
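   (it was located on the hash chain picked by _page_hashfn() above,
   and the identity test goes through pgoff2ulong() so that it works
   with both the struct and the scalar flavour of pgoff_t)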
*/ @@ -85,9 +86,9 @@ return page; } -static inline struct page *find_page(struct inode * inode, unsigned long offset) +static inline struct page *find_page(struct inode * inode, pgoff_t poffset) { - return __find_page(inode, offset, *page_hash(inode, offset)); + return __find_page(inode, poffset, *page_hash(inode, poffset)); } static inline void remove_page_from_hash_queue(struct page * page) @@ -110,9 +111,9 @@ page->pprev_hash = p; } -static inline void add_page_to_hash_queue(struct page * page, struct inode * inode, unsigned long offset) +static inline void add_page_to_hash_queue(struct page * page, struct inode * inode, pgoff_t poffset) { - __add_page_to_hash_queue(page, page_hash(inode,offset)); + __add_page_to_hash_queue(page, page_hash(inode,poffset)); } static inline void remove_page_from_inode_queue(struct page * page) @@ -150,7 +151,7 @@ __wait_on_page(page); } -extern void update_vm_cache_conditional(struct inode *, unsigned long, const char *, int, unsigned long); -extern void update_vm_cache(struct inode *, unsigned long, const char *, int); +extern void update_vm_cache_conditional(struct inode *, loff_t, const char *, int, unsigned long); +extern void update_vm_cache(struct inode *, loff_t, const char *, int); #endif diff -urN 2.2.17pre19/include/linux/raw.h 2.2.17pre19aa2/include/linux/raw.h --- 2.2.17pre19/include/linux/raw.h Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/include/linux/raw.h Sat Aug 26 17:13:50 2000 @@ -0,0 +1,23 @@ +#ifndef __LINUX_RAW_H +#define __LINUX_RAW_H + +#include + +#define RAW_SETBIND _IO( 0xac, 0 ) +#define RAW_GETBIND _IO( 0xac, 1 ) + +struct raw_config_request +{ + int raw_minor; + __u64 block_major; + __u64 block_minor; +}; + +#ifdef __KERNEL__ + +/* drivers/char/raw.c */ +extern void raw_init(void); + +#endif /* __KERNEL__ */ + +#endif /* __LINUX_RAW_H */ diff -urN 2.2.17pre19/include/linux/sched.h 2.2.17pre19aa2/include/linux/sched.h --- 2.2.17pre19/include/linux/sched.h Wed Aug 23 19:57:22 2000 +++ 2.2.17pre19aa2/include/linux/sched.h Sat Aug 26 17:13:52 2000 @@ -299,7 +299,7 @@ kernel_cap_t cap_effective, cap_inheritable, cap_permitted; struct user_struct *user; /* limits */ - struct rlimit rlim[RLIM_NLIMITS]; + struct rlimit rlim[RLIM_NLIMITS]; unsigned short used_math; char comm[16]; /* file system info */ @@ -316,6 +316,8 @@ struct files_struct *files; /* memory management info */ struct mm_struct *mm; + struct list_head local_pages; int allocation_order, nr_local_pages; + int fs_locks; /* signal handlers */ spinlock_t sigmask_lock; /* Protects signal and blocked */ @@ -348,6 +350,7 @@ #define PF_SIGNALED 0x00000400 /* killed by a signal */ #define PF_MEMALLOC 0x00000800 /* Allocating memory */ #define PF_VFORK 0x00001000 /* Wake up parent in mm_release */ +#define PF_FREE_PAGES 0x00002000 /* The current-> */ #define PF_USEDFPU 0x00100000 /* task used FPU this quantum (SMP) */ #define PF_DTRACE 0x00200000 /* delayed trace (used on m68k, i386) */ @@ -395,7 +398,7 @@ /* tss */ INIT_TSS, \ /* fs */ &init_fs, \ /* files */ &init_files, \ -/* mm */ &init_mm, \ +/* mm */ &init_mm, { &init_task.local_pages, &init_task.local_pages}, 0, 0, 0, \ /* signals */ SPIN_LOCK_UNLOCKED, &init_signals, {{0}}, {{0}}, NULL, &init_task.sigqueue, 0, 0, \ /* exec cts */ 0,0, \ /* oom */ 0, \ diff -urN 2.2.17pre19/include/linux/shm.h 2.2.17pre19aa2/include/linux/shm.h --- 2.2.17pre19/include/linux/shm.h Tue Feb 1 18:24:19 2000 +++ 2.2.17pre19aa2/include/linux/shm.h Sat Aug 26 17:13:49 2000 @@ -7,7 +7,7 @@ struct shmid_ds { struct ipc_perm shm_perm; /* operation 
perms */ - int shm_segsz; /* size of segment (bytes) */ + unsigned int shm_segsz; /* size of segment (bytes) */ __kernel_time_t shm_atime; /* last attach time */ __kernel_time_t shm_dtime; /* last detach time */ __kernel_time_t shm_ctime; /* last change time */ @@ -68,7 +68,7 @@ #define SHM_DEST 01000 /* segment will be destroyed on last detach */ #define SHM_LOCKED 02000 /* segment will not be swapped */ -asmlinkage int sys_shmget (key_t key, int size, int flag); +asmlinkage int sys_shmget (key_t key, unsigned int size, int flag); asmlinkage int sys_shmat (int shmid, char *shmaddr, int shmflg, unsigned long *addr); asmlinkage int sys_shmdt (char *shmaddr); asmlinkage int sys_shmctl (int shmid, int cmd, struct shmid_ds *buf); diff -urN 2.2.17pre19/include/linux/smb_fs.h 2.2.17pre19aa2/include/linux/smb_fs.h --- 2.2.17pre19/include/linux/smb_fs.h Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/include/linux/smb_fs.h Sat Aug 26 17:13:52 2000 @@ -128,8 +128,8 @@ void smb_close_dentry(struct dentry *); int smb_close_fileid(struct dentry *, __u16); int smb_open(struct dentry *, int); -int smb_proc_read(struct dentry *, off_t, int, char *); -int smb_proc_write(struct dentry *, off_t, int, const char *); +int smb_proc_read(struct dentry *, loff_t, int, char *); +int smb_proc_write(struct dentry *, loff_t, int, const char *); int smb_proc_create(struct dentry *, __u16, time_t, __u16 *); int smb_proc_mv(struct dentry *, struct dentry *); int smb_proc_mkdir(struct dentry *); diff -urN 2.2.17pre19/include/linux/swap.h 2.2.17pre19aa2/include/linux/swap.h --- 2.2.17pre19/include/linux/swap.h Fri Jun 30 04:03:13 2000 +++ 2.2.17pre19aa2/include/linux/swap.h Sat Aug 26 17:13:52 2000 @@ -114,7 +114,7 @@ extern unsigned int nr_swapfiles; extern struct swap_info_struct swap_info[]; void si_swapinfo(struct sysinfo *); -unsigned long get_swap_page(void); +extern unsigned long get_swap_page(void); extern void FASTCALL(swap_free(unsigned long)); struct swap_list_t { int head; /* head of priority-ordered swapfile list */ @@ -147,7 +147,7 @@ extern inline unsigned long in_swap_cache(struct page *page) { if (PageSwapCache(page)) - return page->offset; + return pgoff2ulong(page->index); return 0; } @@ -164,7 +164,7 @@ return 1; count = atomic_read(&page->count); if (PageSwapCache(page)) - count += swap_count(page->offset) - 2; + count += swap_count(pgoff2ulong(page->index)) - 2; if (PageFreeAfter(page)) count--; return count > 1; } diff -urN 2.2.17pre19/include/linux/sysctl.h 2.2.17pre19aa2/include/linux/sysctl.h --- 2.2.17pre19/include/linux/sysctl.h Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/include/linux/sysctl.h Sat Aug 26 17:13:49 2000 @@ -121,7 +121,8 @@ VM_PAGECACHE=7, /* struct: Set cache memory thresholds */ VM_PAGERDAEMON=8, /* struct: Control kswapd behaviour */ VM_PGT_CACHE=9, /* struct: Set page table cache parameters */ - VM_PAGE_CLUSTER=10 /* int: set number of pages to swap together */ + VM_PAGE_CLUSTER=10, /* int: set number of pages to swap together */ + VM_HEAP_STACK_GAP=11, /* int: page gap between heap and stack */ }; diff -urN 2.2.17pre19/include/linux/time.h 2.2.17pre19aa2/include/linux/time.h --- 2.2.17pre19/include/linux/time.h Tue Feb 1 18:24:19 2000 +++ 2.2.17pre19aa2/include/linux/time.h Sat Aug 26 17:13:48 2000 @@ -46,10 +46,53 @@ value->tv_sec = jiffies / HZ; } +static __inline__ int +timespec_before(struct timespec a, struct timespec b) +{ + if (a.tv_sec == b.tv_sec) + return a.tv_nsec < b.tv_nsec; + return a.tv_sec < b.tv_sec; +} + +/* computes `a - b' and writes the result in `result',
assumes `a >= b' */ +static inline void +timespec_less(struct timespec a, struct timespec b, struct timespec * result) +{ + if (a.tv_nsec < b.tv_nsec) + { + a.tv_sec--; + a.tv_nsec += 1000000000; + } + + result->tv_sec = a.tv_sec - b.tv_sec; + result->tv_nsec = a.tv_nsec - b.tv_nsec; +} + struct timeval { time_t tv_sec; /* seconds */ suseconds_t tv_usec; /* microseconds */ }; + +/* computes `a - b' and writes the result in `result'; assumes `a >= b' + (e.g. a = {5, 100000} minus b = {2, 400000} gives {2, 700000}) */ +static inline void +timeval_less(struct timeval a, struct timeval b, struct timeval * result) +{ + if (a.tv_usec < b.tv_usec) + { + a.tv_sec--; + a.tv_usec += 1000000; + } + + result->tv_sec = a.tv_sec - b.tv_sec; + result->tv_usec = a.tv_usec - b.tv_usec; +} + +static __inline__ void +timeval_to_timespec(struct timeval tv, struct timespec * ts) +{ + ts->tv_sec = tv.tv_sec; + ts->tv_nsec = (long) tv.tv_usec * 1000; +} struct timezone { int tz_minuteswest; /* minutes west of Greenwich */ diff -urN 2.2.17pre19/include/linux/ufs_fs_i.h 2.2.17pre19aa2/include/linux/ufs_fs_i.h --- 2.2.17pre19/include/linux/ufs_fs_i.h Tue Feb 1 18:24:19 2000 +++ 2.2.17pre19aa2/include/linux/ufs_fs_i.h Sat Aug 26 17:13:52 2000 @@ -18,7 +18,6 @@ __u32 i_data[15]; __u8 i_symlink[4*15]; } i_u1; - __u64 i_size; __u32 i_flags; __u32 i_gen; __u32 i_shadow; diff -urN 2.2.17pre19/init/main.c 2.2.17pre19aa2/init/main.c --- 2.2.17pre19/init/main.c Tue Aug 22 14:54:13 2000 +++ 2.2.17pre19aa2/init/main.c Sat Aug 26 17:13:52 2000 @@ -22,6 +22,7 @@ #include #include #include +#include #include #include @@ -79,7 +80,6 @@ extern int bdflush(void *); extern int kupdate(void *); extern int kswapd(void *); -extern int kpiod(void *); extern void kswapd_setup(void); extern unsigned long init_IRQ( unsigned long); extern void init_modules(void); @@ -1413,6 +1413,7 @@ #ifdef CONFIG_ARCH_S390 ccwcache_init(); #endif + kiobuf_init(); signals_init(); inode_init(); file_table_init(); @@ -1530,7 +1531,6 @@ kernel_thread(kupdate, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND); /* Start the background pageout daemon. */ kswapd_setup(); - kernel_thread(kpiod, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND); kernel_thread(kswapd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND); #if CONFIG_AP1000 diff -urN 2.2.17pre19/ipc/shm.c 2.2.17pre19aa2/ipc/shm.c --- 2.2.17pre19/ipc/shm.c Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/ipc/shm.c Sat Aug 26 17:13:52 2000 @@ -4,6 +4,7 @@ * Many improvements/fixes by Bruno Haible. * Replaced `struct shm_desc' by `struct vm_area_struct', July 1994. * Fixed the shm swap deallocation (shm_unuse()), August 1998 Andrea Arcangeli. + * BIGMEM support, Andrea Arcangeli */ #include @@ -13,6 +14,8 @@ #include #include #include +#include +#include #include #include @@ -20,7 +23,7 @@ extern int ipcperms (struct ipc_perm *ipcp, short shmflg); extern unsigned long get_swap_page (void); static int findkey (key_t key); -static int newseg (key_t key, int shmflg, int size); +static int newseg (key_t key, int shmflg, unsigned int size); static int shm_map (struct vm_area_struct *shmd); static void killseg (int id); static void shm_open (struct vm_area_struct *shmd); @@ -75,7 +78,7 @@ /* * allocate new shmid_kernel and pgtable. protected by shm_segs[id] = NOID.
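 * (a hedged aside: `size' becomes unsigned int here, matching the
 * widened shmmax and sys_shmget() prototypes further down)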
*/ -static int newseg (key_t key, int shmflg, int size) +static int newseg (key_t key, int shmflg, unsigned int size) { struct shmid_kernel *shp; int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT; @@ -136,9 +139,9 @@ return (unsigned int) shp->u.shm_perm.seq * SHMMNI + id; } -int shmmax = SHMMAX; +unsigned int shmmax = SHMMAX; -asmlinkage int sys_shmget (key_t key, int size, int shmflg) +asmlinkage int sys_shmget (key_t key, unsigned int size, int shmflg) { struct shmid_kernel *shp; int err, id = 0; @@ -648,21 +651,29 @@ pte = __pte(shp->shm_pages[idx]); if (!pte_present(pte)) { - unsigned long page = get_free_page(GFP_USER); + unsigned long page = __get_free_page(GFP_BIGUSER); if (!page) return -1; + clear_bigpage(page); pte = __pte(shp->shm_pages[idx]); if (pte_present(pte)) { free_page (page); /* doesn't sleep */ goto done; } if (!pte_none(pte)) { + struct page * page_map; + + page = prepare_bigmem_shm_swapin(page); + if (!page) + return -1; rw_swap_page_nocache(READ, pte_val(pte), (char *)page); pte = __pte(shp->shm_pages[idx]); if (pte_present(pte)) { free_page (page); /* doesn't sleep */ goto done; } + page_map = replace_with_bigmem(&mem_map[MAP_NR(page)]); + page = page_address(page_map); swap_free(pte_val(pte)); shm_swp--; } @@ -679,7 +690,7 @@ } /* - * Goes through counter = (shm_rss >> prio) present shm pages. + * Goes through counter = (shm_rss / prio) present shm pages. */ static unsigned long swap_id = 0; /* currently being swapped */ static unsigned long swap_idx = 0; /* next to swap */ @@ -692,8 +703,9 @@ unsigned long id, idx; int loop = 0; int counter; + struct page * page_map; - counter = shm_rss >> prio; + counter = shm_rss / prio; if (!counter || !(swap_nr = get_swap_page())) return 0; @@ -720,7 +732,10 @@ page = __pte(shp->shm_pages[idx]); if (!pte_present(page)) goto check_table; - if ((gfp_mask & __GFP_DMA) && !PageDMA(&mem_map[MAP_NR(pte_page(page))])) + page_map = &mem_map[MAP_NR(pte_page(page))]; + if ((gfp_mask & __GFP_DMA) && !PageDMA(page_map)) + goto check_table; + if (!(gfp_mask & __GFP_BIGMEM) && PageBIGMEM(page_map)) goto check_table; swap_attempts++; @@ -729,11 +744,13 @@ swap_free (swap_nr); return 0; } - if (atomic_read(&mem_map[MAP_NR(pte_page(page))].count) != 1) + if (atomic_read(&page_map->count) != 1) + goto check_table; + if (!(page_map = prepare_bigmem_swapout(page_map))) goto check_table; shp->shm_pages[idx] = swap_nr; - rw_swap_page_nocache (WRITE, swap_nr, (char *) pte_page(page)); - free_page(pte_page(page)); + rw_swap_page_nocache (WRITE, swap_nr, (char *) page_address(page_map)); + __free_page(page_map); swap_successes++; shm_swp++; shm_rss--; diff -urN 2.2.17pre19/ipc/util.c 2.2.17pre19aa2/ipc/util.c --- 2.2.17pre19/ipc/util.c Mon Jan 17 16:44:49 2000 +++ 2.2.17pre19aa2/ipc/util.c Sat Aug 26 17:13:49 2000 @@ -99,7 +99,7 @@ return -ENOSYS; } -asmlinkage int sys_shmget (key_t key, int size, int flag) +asmlinkage int sys_shmget (key_t key, unsigned int size, int flag) { return -ENOSYS; } diff -urN 2.2.17pre19/kernel/fork.c 2.2.17pre19aa2/kernel/fork.c --- 2.2.17pre19/kernel/fork.c Mon Jan 17 16:44:50 2000 +++ 2.2.17pre19aa2/kernel/fork.c Sat Aug 26 17:13:52 2000 @@ -665,6 +665,8 @@ p->lock_depth = -1; /* -1 = no lock */ p->start_time = jiffies; + INIT_LIST_HEAD(&p->local_pages); + retval = -ENOMEM; /* copy all the process information */ if (copy_files(clone_flags, p)) diff -urN 2.2.17pre19/kernel/ksyms.c 2.2.17pre19aa2/kernel/ksyms.c --- 2.2.17pre19/kernel/ksyms.c Tue Aug 22 14:54:13 2000 +++ 2.2.17pre19aa2/kernel/ksyms.c Sat Aug 26 17:13:52 
2000 @@ -39,6 +39,7 @@ #include #include #include +#include #if defined(CONFIG_PROC_FS) #include @@ -72,6 +73,13 @@ }; #endif +#ifdef CONFIG_BLK_DEV_LVM_MODULE + extern int (*lvm_map_ptr) ( int, kdev_t *, unsigned long *, + unsigned long, int); + extern void (*lvm_hd_name_ptr) ( char*, int); + EXPORT_SYMBOL(lvm_map_ptr); + EXPORT_SYMBOL(lvm_hd_name_ptr); +#endif #ifdef CONFIG_KMOD EXPORT_SYMBOL(request_module); @@ -85,6 +93,7 @@ /* process memory management */ EXPORT_SYMBOL(do_mmap); +EXPORT_SYMBOL(do_mmap_pgoff); EXPORT_SYMBOL(do_munmap); EXPORT_SYMBOL(exit_mm); EXPORT_SYMBOL(exit_files); @@ -108,6 +117,7 @@ EXPORT_SYMBOL(mem_map); EXPORT_SYMBOL(remap_page_range); EXPORT_SYMBOL(max_mapnr); +EXPORT_SYMBOL(num_physpages); EXPORT_SYMBOL(high_memory); EXPORT_SYMBOL(update_vm_cache); EXPORT_SYMBOL(update_vm_cache_conditional); @@ -240,6 +250,14 @@ EXPORT_SYMBOL(max_sectors); EXPORT_SYMBOL(max_segments); EXPORT_SYMBOL(max_readahead); + +/* kiobuf support */ +EXPORT_SYMBOL(map_user_kiobuf); +EXPORT_SYMBOL(unmap_kiobuf); +EXPORT_SYMBOL(alloc_kiovec); +EXPORT_SYMBOL(free_kiovec); +EXPORT_SYMBOL(expand_kiobuf); +EXPORT_SYMBOL(brw_kiovec); /* tty routines */ EXPORT_SYMBOL(tty_hangup); diff -urN 2.2.17pre19/kernel/sched.c 2.2.17pre19aa2/kernel/sched.c --- 2.2.17pre19/kernel/sched.c Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/kernel/sched.c Sat Aug 26 17:13:49 2000 @@ -212,101 +212,89 @@ } /* - * If there is a dependency between p1 and p2, - * don't be too eager to go into the slow schedule. - * In particular, if p1 and p2 both want the kernel - * lock, there is no point in trying to make them - * extremely parallel.. - * - * (No lock - lock_depth < 0) - * - * There are two additional metrics here: - * - * first, a 'cutoff' interval, currently 0-200 usecs on - * x86 CPUs, depending on the size of the 'SMP-local cache'. - * If the current process has longer average timeslices than - * this, then we utilize the idle CPU. - * - * second, if the wakeup comes from a process context, - * then the two processes are 'related'. (they form a - * 'gang') - * - * An idle CPU is almost always a bad thing, thus we skip - * the idle-CPU utilization only if both these conditions - * are true. (ie. a 'process-gang' rescheduling with rather - * high frequency should stay on the same CPU). - * - * [We can switch to something more finegrained in 2.3.] - * - * do not 'guess' if the to-be-scheduled task is RT. + * This is ugly, but reschedule_idle() is very timing-critical. + * We enter with the runqueue spinlock held, but we might end + * up unlocking it early, so the caller must not unlock the + * runqueue, it's always done by reschedule_idle(). */ -#define related(p1,p2) (((p1)->lock_depth >= 0) && (p2)->lock_depth >= 0) && \ - (((p2)->policy == SCHED_OTHER) && ((p1)->avg_slice < cacheflush_time)) - -static inline void reschedule_idle_slow(struct task_struct * p) +static inline void reschedule_idle(struct task_struct * p, unsigned long flags) { #ifdef __SMP__ -/* - * (see reschedule_idle() for an explanation first ...) - * - * Pass #2 - * - * We try to find another (idle) CPU for this woken-up process. - * - * On SMP, we mostly try to see if the CPU the task used - * to run on is idle.. but we will use another idle CPU too, - * at this point we already know that this CPU is not - * willing to reschedule in the near future. - * - * An idle CPU is definitely wasted, especially if this CPU is - * running long-timeslice processes. 
The following algorithm is
- * pretty good at finding the best idle CPU to send this process
- * to.
- *
- * [We can try to preempt low-priority processes on other CPUs in
- * 2.3. Also we can try to use the avg_slice value to predict
- * 'likely reschedule' events even on other CPUs.]
- */
 int this_cpu = smp_processor_id(), target_cpu;
- struct task_struct *tsk, *target_tsk;
- int cpu, best_cpu, weight, best_weight, i;
- unsigned long flags;
-
- best_weight = 0; /* prevents negative weight */
-
- spin_lock_irqsave(&runqueue_lock, flags);
+ struct task_struct *tsk;
+ int cpu, best_cpu, i;
 /*
 * shortcut if the woken up task's last CPU is
 * idle now.
 */
 best_cpu = p->processor;
- target_tsk = idle_task(best_cpu);
- if (cpu_curr(best_cpu) == target_tsk)
+ tsk = idle_task(best_cpu);
+ if (cpu_curr(best_cpu) == tsk)
 goto send_now;
- target_tsk = NULL;
- for (i = 0; i < smp_num_cpus; i++) {
+ /*
+ * We know that the preferred CPU has a cache-affine current
+ * process, let's try to find a new idle CPU for the woken-up
+ * process:
+ */
+ for (i = smp_num_cpus - 1; i >= 0; i--) {
 cpu = cpu_logical_map(i);
+ if (cpu == best_cpu)
+ continue;
 tsk = cpu_curr(cpu);
- if (related(tsk, p))
- goto out_no_target;
- weight = preemption_goodness(tsk, p, cpu);
- if (weight > best_weight) {
- best_weight = weight;
- target_tsk = tsk;
- }
+ /*
+ * We use the last available idle CPU. This creates
+ * a priority list between idle CPUs, but this is not
+ * a problem.
+ */
+ if (tsk == idle_task(cpu))
+ goto send_now;
 }
 /*
- * found any suitable CPU?
+ * No CPU is idle, but maybe this process has enough priority
+ * to preempt its preferred CPU.
+ */
+ tsk = cpu_curr(best_cpu);
+ if (preemption_goodness(tsk, p, best_cpu) > 0)
+ goto send_now;
+
+ /*
+ * We will get here often - or in the high CPU contention
+ * case. No CPU is idle and this process is either lowprio or
+ * the preferred CPU is highprio. Try to preempt some other CPU
+ * only if it's RT or if it's interactive and the preferred
+ * CPU won't reschedule shortly.
 */
- if (!target_tsk)
- goto out_no_target;
+ if ((p->avg_slice < cacheflush_time && cpu_curr(best_cpu)->avg_slice > cacheflush_time) ||
+ ((p->policy & ~SCHED_YIELD) != SCHED_OTHER))
+ {
+ int weight, best_weight = 0;
+ struct task_struct * best_tsk = NULL;
+
+ for (i = smp_num_cpus - 1; i >= 0; i--) {
+ cpu = cpu_logical_map(i);
+ if (cpu == best_cpu)
+ continue;
+ tsk = cpu_curr(cpu);
+ weight = preemption_goodness(tsk, p, cpu);
+ if (weight > best_weight) {
+ best_weight = weight;
+ best_tsk = tsk;
+ }
+ }
+
+ if ((tsk = best_tsk))
+ goto send_now;
+ }
+
+ spin_unlock_irqrestore(&runqueue_lock, flags);
+ return;
 send_now:
- target_cpu = target_tsk->processor;
- target_tsk->need_resched = 1;
+ target_cpu = tsk->processor;
+ tsk->need_resched = 1;
 spin_unlock_irqrestore(&runqueue_lock, flags);
 /*
 * the APIC stuff can go outside of the lock because
@@ -315,9 +303,6 @@
 if (target_cpu != this_cpu)
 smp_send_reschedule(target_cpu);
 return;
-out_no_target:
- spin_unlock_irqrestore(&runqueue_lock, flags);
- return;
#else /* UP */
 int this_cpu = smp_processor_id();
 struct task_struct *tsk;
@@ -325,38 +310,10 @@
 tsk = current;
 if (preemption_goodness(tsk, p, this_cpu) > 0)
 tsk->need_resched = 1;
+ spin_unlock_irqrestore(&runqueue_lock, flags);
#endif
 }
-static void reschedule_idle(struct task_struct * p)
-{
-#ifdef __SMP__
- int cpu = smp_processor_id();
- /*
- * ("wakeup()" should not be called before we've initialized
- * SMP completely.
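In outline, the reworked reschedule_idle() tries four things in order: send the wakee to its last CPU if that is idle, otherwise to any other idle CPU, otherwise preempt the preferred CPU if the wakee beats its current task, and only then (gated on RT or interactive wakeups in the real code) hunt for the best preemptible CPU elsewhere. A toy model of that decision order; idle[], curr_prio[] and the plain priority difference are stand-ins for the kernel's per-CPU state and preemption_goodness(), not real interfaces:

    #include <stdio.h>

    #define NCPUS 4

    static int idle[NCPUS];        /* 1 if that CPU runs its idle task */
    static int curr_prio[NCPUS];   /* priority of the task running there */

    static int pick_cpu(int wakee_prio, int preferred)
    {
        int cpu, best = -1, best_weight = 0;

        if (idle[preferred])                     /* 1: last CPU still idle */
            return preferred;
        for (cpu = NCPUS - 1; cpu >= 0; cpu--)   /* 2: any other idle CPU  */
            if (cpu != preferred && idle[cpu])
                return cpu;
        if (wakee_prio > curr_prio[preferred])   /* 3: preempt preferred   */
            return preferred;
        for (cpu = NCPUS - 1; cpu >= 0; cpu--) { /* 4: best other victim   */
            int weight = wakee_prio - curr_prio[cpu];
            if (cpu != preferred && weight > best_weight) {
                best_weight = weight;
                best = cpu;
            }
        }
        return best;                             /* -1: nobody reschedules */
    }

    int main(void)
    {
        idle[2] = 1;
        curr_prio[0] = 10; curr_prio[1] = 5; curr_prio[3] = 1;
        printf("wake -> CPU %d\n", pick_cpu(7, 0));  /* 2 (idle CPU)       */
        idle[2] = 0; curr_prio[2] = 9;
        printf("wake -> CPU %d\n", pick_cpu(7, 0));  /* 3 (best weight, 6) */
        return 0;
    }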
- * Basically a not-yet initialized SMP subsystem can be - * considered as a not-yet working scheduler, simply dont use - * it before it's up and running ...) - * - * SMP rescheduling is done in 2 passes: - * - pass #1: faster: 'quick decisions' - * - pass #2: slower: 'lets try and find a suitable CPU' - */ - - /* - * Pass #1. (subtle. We might be in the middle of __switch_to, so - * to preserve scheduling atomicity we have to use cpu_curr) - */ - if ((p->processor == cpu) && related(cpu_curr(cpu), p)) - return; -#endif /* __SMP__ */ - /* - * Pass #2 - */ - reschedule_idle_slow(p); -} - /* * Careful! * @@ -453,9 +410,8 @@ if (p->next_run) goto out; add_to_runqueue(p); - spin_unlock_irqrestore(&runqueue_lock, flags); + reschedule_idle(p, flags); // spin_unlocks runqueue - reschedule_idle(p); return; out: spin_unlock_irqrestore(&runqueue_lock, flags); @@ -668,8 +624,12 @@ { #ifdef __SMP__ if ((prev->state == TASK_RUNNING) && - (prev != idle_task(smp_processor_id()))) - reschedule_idle(prev); + (prev != idle_task(smp_processor_id()))) { + unsigned long flags; + + spin_lock_irqsave(&runqueue_lock, flags); + reschedule_idle(prev, flags); // spin_unlocks runqueue + } wmb(); prev->has_cpu = 0; #endif /* __SMP__ */ @@ -696,6 +656,7 @@ struct task_struct *prev, *next, *p; int this_cpu, c; + sti(); if (tq_scheduler) goto handle_tq_scheduler; tq_scheduler_back: @@ -1919,6 +1880,7 @@ { struct timespec t; unsigned long expire; + struct timeval before, after; if(copy_from_user(&t, rqtp, sizeof(struct timespec))) return -EFAULT; @@ -1943,11 +1905,20 @@ expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec); current->state = TASK_INTERRUPTIBLE; + get_fast_time(&before); expire = schedule_timeout(expire); + get_fast_time(&after); if (expire) { if (rmtp) { - jiffies_to_timespec(expire, &t); + struct timespec elapsed; + + timeval_less(after, before, &after); + timeval_to_timespec(after, &elapsed); + if (timespec_before(elapsed, t)) + timespec_less(t, elapsed, &t); + else + t.tv_nsec = t.tv_sec = 0; if (copy_to_user(rmtp, &t, sizeof(struct timespec))) return -EFAULT; } diff -urN 2.2.17pre19/kernel/sysctl.c 2.2.17pre19aa2/kernel/sysctl.c --- 2.2.17pre19/kernel/sysctl.c Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/kernel/sysctl.c Sat Aug 26 17:13:49 2000 @@ -252,6 +252,8 @@ &pgt_cache_water, 2*sizeof(int), 0600, NULL, &proc_dointvec}, {VM_PAGE_CLUSTER, "page-cluster", &page_cluster, sizeof(int), 0600, NULL, &proc_dointvec}, + {VM_HEAP_STACK_GAP, "heap-stack-gap", + &heap_stack_gap, sizeof(int), 0644, NULL, &proc_dointvec}, {0} }; diff -urN 2.2.17pre19/lib/vsprintf.c 2.2.17pre19aa2/lib/vsprintf.c --- 2.2.17pre19/lib/vsprintf.c Mon Jan 17 16:44:50 2000 +++ 2.2.17pre19aa2/lib/vsprintf.c Sat Aug 26 17:13:52 2000 @@ -67,10 +67,106 @@ #define LARGE 64 /* use 'ABCDEF' instead of 'abcdef' */ #define do_div(n,base) ({ \ -int __res; \ -__res = ((unsigned long) n) % (unsigned) base; \ -n = ((unsigned long) n) / (unsigned) base; \ -__res; }) + int __res; \ + __res = ((unsigned long) n) % (unsigned) base; \ + n = ((unsigned long) n) / (unsigned) base; \ + __res; }) + +#if BITS_PER_LONG < 64 + +/* Note: do_ldiv assumes that unsigned long long is a 64 bit long + * and unsigned long is at least a 32 bits long. 
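The sys_nanosleep() change above stops deriving the rmtp value from the timeout's jiffy return and instead times the sleep itself: remaining = requested - (after - before), clamped at zero. The same computation in userspace, with clock_gettime() standing in for the kernel's get_fast_time():

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec req = { 2, 0 }, before, after, rem;
        long long req_ns, elapsed_ns, rem_ns;

        clock_gettime(CLOCK_REALTIME, &before);
        /* ... the interruptible sleep would happen here ... */
        clock_gettime(CLOCK_REALTIME, &after);

        elapsed_ns = (after.tv_sec - before.tv_sec) * 1000000000LL
                   + (after.tv_nsec - before.tv_nsec);
        req_ns = req.tv_sec * 1000000000LL + req.tv_nsec;
        rem_ns = req_ns > elapsed_ns ? req_ns - elapsed_ns : 0;  /* clamp */

        rem.tv_sec = rem_ns / 1000000000LL;
        rem.tv_nsec = rem_ns % 1000000000LL;
        printf("remaining %lld.%09lds\n", (long long) rem.tv_sec, rem.tv_nsec);
        return 0;
    }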
+ */ +#define do_ldiv(n, base) \ +({ \ + unsigned long long value = n; \ + unsigned long long leftover; \ + unsigned long temp; \ + unsigned long result_div1, result_div2, result_div3, result_mod; \ +\ + temp = value >> 32; \ + result_div1 = temp/(base); \ + result_mod = temp%(base); \ +\ + temp = (result_mod << 24) | ((value >> 8) & 0xFFFFFF); \ + result_div2 = temp/(base); \ + result_mod = temp%(base); \ +\ + temp = (result_mod << 8) | (value & 0xFF); \ + result_div3 = temp/(base); \ + result_mod = temp%(base);\ +\ + leftover = ((unsigned long long)result_div1 << 32) | \ + ((unsigned long long)result_div2 << 8) | (result_div3); \ +\ + n = leftover; \ + result_mod; \ +}) + + +static char * lnumber(char * str, long long num, int base, int size, + int precision, int type) +{ + char c,sign,tmp[66]; + const char *digits="0123456789abcdef"; + int i; + + if (type & LARGE) + digits = "0123456789ABCDEF"; + if (type & LEFT) + type &= ~ZEROPAD; + if (base < 2 || base > 36) + return 0; + c = (type & ZEROPAD) ? '0' : ' '; + sign = 0; + if (type & SIGN) { + if (num < 0) { + sign = '-'; + num = -num; + size--; + } else if (type & PLUS) { + sign = '+'; + size--; + } else if (type & SPACE) { + sign = ' '; + size--; + } + } + if (type & SPECIAL) { + if (base == 16) + size -= 2; + } + i = 0; + if (num == 0) + tmp[i++]='0'; + else while (num != 0) + tmp[i++] = digits[do_ldiv(num,base)]; + if (i > precision) + precision = i; + size -= precision; + if (!(type&(ZEROPAD+LEFT))) + while(size-->0) + *str++ = ' '; + if (sign) + *str++ = sign; + if (type & SPECIAL) { + if (base==16) { + *str++ = '0'; + *str++ = digits[33]; + } + } + if (!(type & LEFT)) + while (size-- > 0) + *str++ = c; + while (i < precision--) + *str++ = '0'; + while (i-- > 0) + *str++ = tmp[i]; + while (size-- > 0) + *str++ = ' '; + return str; +} +#endif static char * number(char * str, long num, int base, int size, int precision ,int type) @@ -207,7 +303,10 @@ /* get the conversion qualifier */ qualifier = -1; if (*fmt == 'h' || *fmt == 'l' || *fmt == 'L') { - qualifier = *fmt; + if (*fmt == 'l' && qualifier == 'l') + qualifier = 'L'; + else + qualifier = *fmt; ++fmt; } @@ -290,7 +389,22 @@ --fmt; continue; } - if (qualifier == 'l') + if (qualifier == 'L') { + +#if BITS_PER_LONG < 64 + /* 64-bit printout in 32-bit systems !! + Needed at some point for 64-bit file offsets and + mmap() reporting functions. */ + + unsigned long long lnum; + lnum = va_arg(args, unsigned long long); + str = lnumber(str, lnum, base, field_width, + precision, flags); + continue; +#else + num = va_arg(args, unsigned long); /* 64-bit longs..*/ +#endif + } else if (qualifier == 'l') num = va_arg(args, unsigned long); else if (qualifier == 'h') { num = (unsigned short) va_arg(args, int); diff -urN 2.2.17pre19/mm/Makefile 2.2.17pre19aa2/mm/Makefile --- 2.2.17pre19/mm/Makefile Mon Jan 18 02:27:01 1999 +++ 2.2.17pre19aa2/mm/Makefile Sat Aug 26 17:13:49 2000 @@ -12,4 +12,8 @@ vmalloc.o slab.o \ swap.o vmscan.o page_io.o page_alloc.o swap_state.o swapfile.o +ifeq ($(CONFIG_BIGMEM),y) +O_OBJS += bigmem.o +endif + include $(TOPDIR)/Rules.make diff -urN 2.2.17pre19/mm/bigmem.c 2.2.17pre19aa2/mm/bigmem.c --- 2.2.17pre19/mm/bigmem.c Thu Jan 1 01:00:00 1970 +++ 2.2.17pre19aa2/mm/bigmem.c Sat Aug 26 17:13:52 2000 @@ -0,0 +1,87 @@ +/* + * BIGMEM common code and variables. 
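The do_ldiv() macro above divides a 64-bit value by the numeric base using only 32-bit divisions, by processing the value in 32-, 24- and 8-bit pieces; because the base is small, each partial quotient provably fits in 32, 24 and 8 bits respectively, so they can be OR-ed back together. A standalone check of that chunking against the compiler's native 64-bit division (which the kernel cannot use on 32-bit targets, since it would pull in a libgcc helper):

    #include <stdio.h>

    static unsigned int ldiv_chunked(unsigned long long *n, unsigned int base)
    {
        unsigned long long value = *n;
        unsigned long temp, q1, q2, q3, rem;

        temp = value >> 32;                      /* top 32 bits         */
        q1 = temp / base;  rem = temp % base;

        temp = (rem << 24) | ((value >> 8) & 0xFFFFFF); /* middle 24 bits */
        q2 = temp / base;  rem = temp % base;    /* q2 < 2^24           */

        temp = (rem << 8) | (value & 0xFF);      /* low 8 bits          */
        q3 = temp / base;  rem = temp % base;    /* q3 < 2^8            */

        *n = ((unsigned long long) q1 << 32)
           | ((unsigned long long) q2 << 8) | q3;
        return (unsigned int) rem;               /* the digit to print  */
    }

    int main(void)
    {
        unsigned long long a = 0x123456789ABCDEFULL, b = a;
        unsigned int r = ldiv_chunked(&a, 10);

        printf("q=%llu r=%u  native q=%llu r=%llu\n", a, r, b / 10, b % 10);
        return 0;
    }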
* * (C) 1999 Andrea Arcangeli, SuSE GmbH, andrea@suse.de
+ * Gerhard Wichert, Siemens AG, Gerhard.Wichert@pdb.siemens.de
+ */
+
+#include
+#include
+#include
+
+unsigned long bigmem_mapnr;
+int nr_free_bigpages = 0;
+
+struct page * prepare_bigmem_swapout(struct page * page)
+{
+ /* if this is a bigmem page it can't be swapped out directly,
+ otherwise the b_data buffer addresses will break
+ the lowlevel device drivers. */
+ if (PageBIGMEM(page))
+ {
+ unsigned long regular_page;
+ unsigned long vaddr;
+
+ regular_page = __get_free_page(GFP_ATOMIC);
+ if (!regular_page)
+ return NULL;
+
+ vaddr = kmap(page_address(page), KM_READ);
+ copy_page(regular_page, vaddr);
+ kunmap(vaddr, KM_READ);
+
+ /* ok, we can just forget about our bigmem page since
+ we stored its data into the new regular_page. */
+ __free_page(page);
+
+ page = MAP_NR(regular_page) + mem_map;
+ }
+ return page;
+}
+
+struct page * replace_with_bigmem(struct page * page)
+{
+ if (!PageBIGMEM(page) && nr_free_bigpages)
+ {
+ unsigned long kaddr;
+
+ kaddr = __get_free_page(GFP_ATOMIC|GFP_BIGMEM);
+ if (kaddr)
+ {
+ struct page * bigmem_page;
+
+ bigmem_page = MAP_NR(kaddr) + mem_map;
+ if (PageBIGMEM(bigmem_page))
+ {
+ unsigned long vaddr;
+
+ vaddr = kmap(kaddr, KM_WRITE);
+ copy_page(vaddr, page_address(page));
+ kunmap(vaddr, KM_WRITE);
+
+ /* Preserve the caching of the swap_entry. */
+ bigmem_page->index = page->index;
+
+ /* We can just forget the old page since
+ we stored its data into the new
+ bigmem_page. */
+ __free_page(page);
+
+ page = bigmem_page;
+ }
+ }
+ }
+ return page;
+}
+
+unsigned long prepare_bigmem_shm_swapin(unsigned long page)
+{
+ if (!PageBIGMEM(&mem_map[MAP_NR(page)]))
+ return page;
+
+ free_page(page);
+
+ /* no need to clear the page since it will be rewritten by the
+ swapin. */
+ return __get_free_page(GFP_ATOMIC);
+}
diff -urN 2.2.17pre19/mm/filemap.c 2.2.17pre19aa2/mm/filemap.c
--- 2.2.17pre19/mm/filemap.c Tue Aug 22 14:54:13 2000
+++ 2.2.17pre19aa2/mm/filemap.c Sat Aug 26 17:13:52 2000
@@ -19,8 +19,8 @@
 #include
 #include
 #include
-#include
 #include
+#include
 #include
 #include
@@ -36,26 +36,6 @@
 unsigned int page_hash_bits, page_hash_mask;
 struct page **page_hash_table;
-/*
- * Define a request structure for outstanding page write requests
- * to the background page io daemon
- */
-
-struct pio_request
-{
- struct pio_request * next;
- struct file * file;
- unsigned long offset;
- unsigned long page;
-};
-static struct pio_request *pio_first = NULL, **pio_last = &pio_first;
-static kmem_cache_t *pio_request_cache;
-static struct wait_queue *pio_wait = NULL;
-
-static inline void
-make_pio_request(struct file *, unsigned long, unsigned long);
-
-
 /*
 * Invalidate the pages of an inode, removing all pages that aren't
 * locked down (those are sure to be up-to-date anyway, so we shouldn't
@@ -88,7 +68,7 @@
 * Truncate the page cache at a set offset, removing the pages
 * that are beyond that offset (and zeroing out partial pages).
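prepare_bigmem_swapout() above is essentially a bounce buffer: data sitting in a page the block drivers cannot address (a BIGMEM page) is copied into a regular low-memory page before the swap write is issued, and the high page is freed. A conceptual userspace sketch of that pattern; fake_page, bounce_for_io() and the 'high' flag are illustrative stand-ins, not kernel structures:

    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct fake_page { char *data; int high; };

    /* Before handing 'page' to a device that can only address low memory,
     * bounce its contents through a freshly allocated low page, as the
     * kernel code does with __get_free_page() + kmap() + copy_page(). */
    static struct fake_page *bounce_for_io(struct fake_page *page)
    {
        struct fake_page *low;

        if (!page->high)
            return page;            /* already reachable by the device */

        low = malloc(sizeof(*low));
        if (!low)
            return NULL;
        low->data = malloc(PAGE_SIZE);
        if (!low->data) {
            free(low);
            return NULL;
        }
        low->high = 0;
        memcpy(low->data, page->data, PAGE_SIZE);  /* the kmap + copy */
        return low;                 /* caller may now free the high page */
    }

    int main(void)
    {
        static char buf[PAGE_SIZE] = "swap me";
        struct fake_page hi = { buf, 1 };
        struct fake_page *io = bounce_for_io(&hi);
        return (io && !io->high) ? 0 : 1;
    }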
*/ -void truncate_inode_pages(struct inode * inode, unsigned long start) +void truncate_inode_pages(struct inode * inode, loff_t start) { struct page ** p; struct page * page; @@ -96,10 +76,10 @@ repeat: p = &inode->i_pages; while ((page = *p) != NULL) { - unsigned long offset = page->offset; + loff_t loffset = pgoff2loff(page->index); /* page wholly truncated - free it */ - if (offset >= start) { + if (loffset >= start) { if (PageLocked(page)) { wait_on_page(page); goto repeat; @@ -115,9 +95,10 @@ continue; } p = &page->next; - offset = start - offset; + loffset = start - loffset; /* partial truncate, clear end of page */ - if (offset < PAGE_CACHE_SIZE) { + if (loffset < PAGE_CACHE_SIZE) { + unsigned int offset = loffset; /* truncate ok */ unsigned long address = page_address(page); memset((void *) (offset + address), 0, PAGE_CACHE_SIZE - offset); flush_page_to_ram(address); @@ -138,26 +119,39 @@ int shrink_mmap(int priority, int gfp_mask) { static unsigned long clock = 0; +#ifndef CONFIG_BIGMEM unsigned long limit = num_physpages; +#else + unsigned long limit = bigmem_mapnr; +#endif struct page * page; int count; - int nr_dirty = 0; - + /* Make sure we scan all pages twice at priority 0. */ - count = (limit << 1) >> priority; + count = limit / priority; refresh_clock: page = mem_map + clock; do { int referenced; + if (current->need_resched) { + current->state = TASK_RUNNING; + schedule(); + goto refresh_clock; + } + /* This works even in the presence of PageSkip because * the first two entries at the beginning of a hole will * be marked, not just the first. */ page++; clock++; +#ifndef CONFIG_BIGMEM if (clock >= max_mapnr) { +#else + if (clock >= bigmem_mapnr) { +#endif clock = 0; page = mem_map; } @@ -176,16 +170,22 @@ if (PageLocked(page)) continue; + if (!(gfp_mask & __GFP_BIGMEM) && PageBIGMEM(page)) + continue; + if ((gfp_mask & __GFP_DMA) && !PageDMA(page)) continue; + count--; + /* * Is it a page swap page? If so, we want to * drop it if it is no longer used, even if it * were to be marked referenced.. */ if (PageSwapCache(page)) { - if (referenced && swap_count(page->offset) != 1) + if (referenced && + swap_count(pgoff2ulong(page->index)) != 1) continue; delete_from_swap_cache(page); return 1; @@ -196,14 +196,6 @@ /* Is it a buffer page? */ if (page->buffers) { - /* - * Wait for async IO to complete - * at each 64 buffers - */ - - int wait = ((gfp_mask & __GFP_IO) - && (!(nr_dirty++ % 64))); - if (buffer_under_min()) continue; /* @@ -211,7 +203,7 @@ * throttling. */ - if (!try_to_free_buffers(page, wait)) + if (!try_to_free_buffers(page, gfp_mask)) goto refresh_clock; return 1; } @@ -223,8 +215,7 @@ remove_inode_page(page); return 1; } - - } while (--count > 0); + } while (count > 0); return 0; } @@ -249,11 +240,12 @@ * memory maps. 
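Throughout the filemap changes, a page's position in its file is now kept as a page index (page->index, a pgoff_t) instead of a byte offset, which is what lets a 32-bit machine cache file pages past 4GB. A sketch of the conversions and of the partial-page arithmetic used by truncate_inode_pages() above, assuming the natural shift-based definitions of loff2pgoff()/pgoff2loff() and a 4kB page:

    #include <stdio.h>

    #define PAGE_CACHE_SHIFT 12
    #define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

    typedef unsigned long pgoff_t;

    static pgoff_t loff2pgoff(long long off) { return (pgoff_t)(off >> PAGE_CACHE_SHIFT); }
    static long long pgoff2loff(pgoff_t pg) { return (long long) pg << PAGE_CACHE_SHIFT; }

    int main(void)
    {
        long long start = 6000;     /* truncate the file to 6000 bytes */
        long long page_off = pgoff2loff(loff2pgoff(start));
        unsigned int partial = (unsigned int)(start - page_off);

        /* the page straddling 'start' keeps its first 'partial' bytes;
         * the remaining PAGE_CACHE_SIZE - partial bytes are zeroed,
         * exactly as in the partial-truncate branch above */
        printf("pgoff=%lu keep=%u zero=%lu\n",
               loff2pgoff(start), partial, PAGE_CACHE_SIZE - partial);
        return 0;
    }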
--sct */ -void update_vm_cache_conditional(struct inode * inode, unsigned long pos, const char * buf, int count, unsigned long source_address) +void update_vm_cache_conditional(struct inode * inode, loff_t pos, const char * buf, int count, unsigned long source_address) { unsigned long offset, len; + pgoff_t pgoff = loff2pgoff(pos); - offset = (pos & ~PAGE_CACHE_MASK); + offset = ((unsigned long)pos & ~PAGE_CACHE_MASK); pos = pos & PAGE_CACHE_MASK; len = PAGE_CACHE_SIZE - offset; do { @@ -261,7 +253,7 @@ if (len > count) len = count; - page = find_page(inode, pos); + page = find_page(inode, pgoff); if (page) { char *dest = (char*) (offset + page_address(page)); @@ -280,19 +272,20 @@ } while (count); } -void update_vm_cache(struct inode * inode, unsigned long pos, const char * buf, int count) +void update_vm_cache(struct inode * inode, loff_t pos, const char * buf, int count) { update_vm_cache_conditional(inode, pos, buf, count, 0); } static inline void add_to_page_cache(struct page * page, - struct inode * inode, unsigned long offset, - struct page **hash) + struct inode * inode, + pgoff_t pgoff, + struct page **hash) { atomic_inc(&page->count); - page->flags = (page->flags & ~((1 << PG_uptodate) | (1 << PG_error))) | (1 << PG_referenced); - page->offset = offset; + page->flags = page->flags & ~((1 << PG_uptodate) | (1 << PG_error)); + page->index = pgoff; add_page_to_inode_queue(inode, page); __add_page_to_hash_queue(page, hash); } @@ -303,29 +296,32 @@ * this is all overlapped with the IO on the previous page finishing anyway) */ static unsigned long try_to_read_ahead(struct file * file, - unsigned long offset, unsigned long page_cache) + pgoff_t pgoff, unsigned long page_cache) { struct inode *inode = file->f_dentry->d_inode; - struct page * page; - struct page ** hash; + pgoff_t pg_size; + + /* Calculate file size in 'pages' -- if even one byte (according to + the 'i_size') exceeds the final page-size block, round up. */ + pg_size = loff2pgoff(inode->i_size+(PAGE_SIZE-1)); - offset &= PAGE_CACHE_MASK; - switch (page_cache) { - case 0: + if (!page_cache) { page_cache = page_cache_alloc(); if (!page_cache) - break; - default: - if (offset >= inode->i_size) - break; - hash = page_hash(inode, offset); - page = __find_page(inode, offset, *hash); + return 0; /* Can't allocate! */ + } + /* Ok, we have a page, make sure it is in the page cache */ + if (pgoff2ulong(pgoff) < pgoff2ulong(pg_size)) { + struct page * page; + struct page ** hash; + hash = page_hash(inode, pgoff); + page = __find_page(inode, pgoff, *hash); if (!page) { /* * Ok, add the new page to the hash-queues... 
*/ page = page_cache_entry(page_cache); - add_to_page_cache(page, inode, offset, hash); + add_to_page_cache(page, inode, pgoff, hash); inode->i_op->readpage(file, page); page_cache = 0; } @@ -348,13 +344,14 @@ wait.task = tsk; add_wait_queue(&page->wait, &wait); -repeat: - tsk->state = TASK_UNINTERRUPTIBLE; - run_task_queue(&tq_disk); - if (PageLocked(page)) { + do { + run_task_queue(&tq_disk); + tsk->state = TASK_UNINTERRUPTIBLE; + mb(); + if (!PageLocked(page)) + break; schedule(); - goto repeat; - } + } while (PageLocked(page)); tsk->state = TASK_RUNNING; remove_wait_queue(&page->wait, &wait); } @@ -379,11 +376,11 @@ #define PROFILE_MAXREADCOUNT 1000 -static unsigned long total_reada; -static unsigned long total_async; -static unsigned long total_ramax; -static unsigned long total_ralen; -static unsigned long total_rawin; +static u_long total_reada; +static u_long total_async; +static u_long total_ramax; +static u_long total_ralen; +static u_long total_rawin; static void profile_readahead(int async, struct file *filp) { @@ -491,13 +488,13 @@ static inline unsigned long generic_file_readahead(int reada_ok, struct file * filp, struct inode * inode, - unsigned long ppos, struct page * page, unsigned long page_cache) + loff_t ppos, struct page * page, unsigned long page_cache) { - unsigned long max_ahead, ahead; - unsigned long raend; + loff_t max_ahead, ahead; + loff_t raend; int max_readahead = get_max_readahead(inode); - raend = filp->f_raend & PAGE_CACHE_MASK; + raend = filp->f_raend & PAGE_CACHE_MASK_loff; max_ahead = 0; /* @@ -555,7 +552,7 @@ ahead = 0; while (ahead < max_ahead) { ahead += PAGE_CACHE_SIZE; - page_cache = try_to_read_ahead(filp, raend + ahead, + page_cache = try_to_read_ahead(filp, loff2pgoff(raend + ahead), page_cache); } /* @@ -570,9 +567,11 @@ * accessed sequentially. */ if (ahead) { +#if 0 if (reada_ok == 2) { run_task_queue(&tq_disk); } +#endif filp->f_ralen += ahead; filp->f_rawin += filp->f_ralen; @@ -621,14 +620,17 @@ { struct dentry *dentry = filp->f_dentry; struct inode *inode = dentry->d_inode; - size_t pos, pgpos, page_cache; + size_t page_cache; + pgoff_t pgpos; + loff_t pos, posp; int reada_ok; int max_readahead = get_max_readahead(inode); page_cache = 0; pos = *ppos; - pgpos = pos & PAGE_CACHE_MASK; + posp = pos & PAGE_CACHE_MASK_loff; + pgpos = loff2pgoff(pos); /* * If the current position is outside the previous read-ahead window, * we reset the current read-ahead context and set read ahead max to zero @@ -636,7 +638,7 @@ * otherwise, we assume that the file accesses are sequential enough to * continue read-ahead. */ - if (pgpos > filp->f_raend || pgpos + filp->f_rawin < filp->f_raend) { + if (posp > filp->f_raend || posp + filp->f_rawin < filp->f_raend) { reada_ok = 0; filp->f_raend = 0; filp->f_ralen = 0; @@ -652,12 +654,12 @@ * Then, at least MIN_READAHEAD if read ahead is ok, * and at most MAX_READAHEAD in all cases. */ - if (pos + desc->count <= (PAGE_CACHE_SIZE >> 1)) { + if (pos + desc->count <= (loff_t)(PAGE_CACHE_SIZE >> 1)) { filp->f_ramax = 0; } else { - unsigned long needed; + loff_t needed; - needed = ((pos + desc->count) & PAGE_CACHE_MASK) - pgpos; + needed = ((pos + desc->count) & PAGE_CACHE_MASK) - posp; if (filp->f_ramax < needed) filp->f_ramax = needed; @@ -670,6 +672,7 @@ for (;;) { struct page *page, **hash; + pgoff_t pgoff; if (pos >= inode->i_size) break; @@ -677,8 +680,9 @@ /* * Try to find the data in the page cache.. 
*/ - hash = page_hash(inode, pos & PAGE_CACHE_MASK); - page = __find_page(inode, pos & PAGE_CACHE_MASK, *hash); + pgoff = loff2pgoff(pos); + hash = page_hash(inode, pgoff); + page = __find_page(inode, pgoff, *hash); if (!page) goto no_cached_page; @@ -691,7 +695,7 @@ * the page has been rewritten. */ if (PageUptodate(page) || PageLocked(page)) - page_cache = generic_file_readahead(reada_ok, filp, inode, pos & PAGE_CACHE_MASK, page, page_cache); + page_cache = generic_file_readahead(reada_ok, filp, inode, pos & PAGE_CACHE_MASK_loff, page, page_cache); else if (reada_ok && filp->f_ramax > MIN_READAHEAD) filp->f_ramax = MIN_READAHEAD; @@ -709,8 +713,8 @@ unsigned long offset, nr; offset = pos & ~PAGE_CACHE_MASK; - nr = PAGE_CACHE_SIZE - offset; - if (nr > inode->i_size - pos) + nr = PAGE_CACHE_SIZE - offset; /* small value */ + if ((loff_t)nr > (inode->i_size - pos)) nr = inode->i_size - pos; /* @@ -750,7 +754,7 @@ */ page = page_cache_entry(page_cache); page_cache = 0; - add_to_page_cache(page, inode, pos & PAGE_CACHE_MASK, hash); + add_to_page_cache(page, inode, pgoff, hash); /* * Error handling is tricky. If we get a read error, @@ -831,10 +835,28 @@ ssize_t generic_file_read(struct file * filp, char * buf, size_t count, loff_t *ppos) { ssize_t retval; + struct inode *inode = filp->f_dentry->d_inode; retval = -EFAULT; if (access_ok(VERIFY_WRITE, buf, count)) { retval = 0; + +#if BITS_PER_LONG < 64 + /* L-F-S spec 2.2.1.25: */ + if (count && !(filp->f_flags & O_LARGEFILE) && + S_ISREG(inode->i_mode) && + (*ppos < inode->i_size) && + (*ppos >= 0x7ffffffe)) /* pos@2G forbidden */ + return -EOVERFLOW; + if (count && !(filp->f_flags & O_LARGEFILE) && + S_ISREG(inode->i_mode) && + (*ppos < inode->i_size) && + (*ppos + count >= 0x7fffffff)) { + /* Read only until end of allowed region */ + count = LONG_MAX - *ppos; + } +#endif + if (count) { read_descriptor_t desc; @@ -862,12 +884,12 @@ if (size > count) size = count; - down(&inode->i_sem); + fs_down(&inode->i_sem); old_fs = get_fs(); set_fs(KERNEL_DS); written = file->f_op->write(file, area, size, &file->f_pos); set_fs(old_fs); - up(&inode->i_sem); + fs_up(&inode->i_sem); if (written < 0) { desc->error = written; written = 0; @@ -976,20 +998,25 @@ struct file * file = area->vm_file; struct dentry * dentry = file->f_dentry; struct inode * inode = dentry->d_inode; - unsigned long offset, reada, i; + loff_t offset; + pgoff_t pgoff, reada; + int i; struct page * page, **hash; unsigned long old_page, new_page; new_page = 0; - offset = (address & PAGE_MASK) - area->vm_start + area->vm_offset; + offset = ((loff_t)((address & PAGE_MASK) - area->vm_start) + + area->vm_offset); + if (offset >= inode->i_size && (area->vm_flags & VM_SHARED) && area->vm_mm == current->mm) goto no_page; /* * Do we have something in the page cache already? */ - hash = page_hash(inode, offset); - page = __find_page(inode, offset, *hash); + pgoff = loff2pgoff(offset); + hash = page_hash(inode, pgoff); + page = __find_page(inode, pgoff, *hash); if (!page) goto no_cached_page; @@ -1040,11 +1067,12 @@ /* * Try to read in an entire cluster at once. 
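The O_LARGEFILE test added to generic_file_read() above implements the Large File Summit rule the patch cites (spec 2.2.1.25): without O_LARGEFILE a read must not return data at or beyond offset 2^31-2, and a read crossing that boundary is shortened. Reduced to its arithmetic (the real code also requires a regular file and pos < i_size, and returns -EOVERFLOW rather than -1):

    #include <stdio.h>

    static long long check_read(long long pos, long long count, int largefile)
    {
        if (largefile)
            return count;                   /* no 2GB restriction */
        if (pos >= 0x7ffffffeLL)
            return -1;                      /* pos at 2GB: refused */
        if (pos + count >= 0x7fffffffLL)
            count = 0x7fffffffLL - pos;     /* LONG_MAX - pos on 32-bit */
        return count;
    }

    int main(void)
    {
        printf("%lld\n", check_read(0x7ffffff0LL, 4096, 0)); /* 15: shortened */
        printf("%lld\n", check_read(0x7ffffffeLL, 4096, 0)); /* -1: refused   */
        printf("%lld\n", check_read(0x7ffffffeLL, 4096, 1)); /* 4096: allowed */
        return 0;
    }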
*/ - reada = offset; - reada >>= PAGE_CACHE_SHIFT + page_cluster; - reada <<= PAGE_CACHE_SHIFT + page_cluster; + reada = loff2pgoff(offset); + /* Mask lowest 'page_cluster' worth of the lowest bits */ + reada = ulong2pgoff(pgoff2ulong(reada) & ((~(0UL)) << page_cluster)); - for (i = 1 << page_cluster; i > 0; --i, reada += PAGE_CACHE_SIZE) + for (i = 1 << page_cluster; i > 0; + --i, reada = ulong2pgoff(pgoff2ulong(reada)+1)) new_page = try_to_read_ahead(file, reada, new_page); if (!new_page) @@ -1058,7 +1086,7 @@ * cache.. The page we just got may be useful if we * can't share, so don't get rid of it here. */ - page = find_page(inode, offset); + page = find_page(inode, pgoff); if (page) goto found_page; @@ -1067,7 +1095,7 @@ */ page = page_cache_entry(new_page); new_page = 0; - add_to_page_cache(page, inode, offset, hash); + add_to_page_cache(page, inode, pgoff, hash); if (inode->i_op->readpage(file, page) != 0) goto failure; @@ -1116,10 +1144,10 @@ * if the disk is full. */ static inline int do_write_page(struct inode * inode, struct file * file, - const char * page, unsigned long offset) + const char * page, loff_t offset) { int retval; - unsigned long size; + loff_t size; loff_t loff = offset; mm_segment_t old_fs; @@ -1143,9 +1171,8 @@ } static int filemap_write_page(struct vm_area_struct * vma, - unsigned long offset, - unsigned long page, - int wait) + loff_t offset, + unsigned long page) { int result; struct file * file; @@ -1163,20 +1190,9 @@ * and file could be released ... increment the count to be safe. */ file->f_count++; - - /* - * If this is a swapping operation rather than msync(), then - * leave the actual IO, and the restoration of the file count, - * to the kpiod thread. Just queue the request for now. - */ - if (!wait) { - make_pio_request(file, offset, page); - return 0; - } - - down(&inode->i_sem); + fs_down(&inode->i_sem); result = do_write_page(inode, file, (const char *) page, offset); - up(&inode->i_sem); + fs_up(&inode->i_sem); fput(file); return result; } @@ -1189,7 +1205,7 @@ */ int filemap_swapout(struct vm_area_struct * vma, struct page * page) { - return filemap_write_page(vma, page->offset, page_address(page), 0); + return filemap_write_page(vma, pgoff2loff(page->index), page_address(page)); } static inline int filemap_sync_pte(pte_t * ptep, struct vm_area_struct *vma, @@ -1226,7 +1242,7 @@ return 0; } } - error = filemap_write_page(vma, address - vma->vm_start + vma->vm_offset, page, 1); + error = filemap_write_page(vma, address - vma->vm_start + vma->vm_offset, page); page_cache_free(page); return error; } @@ -1398,9 +1414,9 @@ if (file) { struct dentry * dentry = file->f_dentry; struct inode * inode = dentry->d_inode; - down(&inode->i_sem); + fs_down(&inode->i_sem); error = file_fsync(file, dentry); - up(&inode->i_sem); + fs_up(&inode->i_sem); } } return error; @@ -1488,7 +1504,7 @@ { struct dentry *dentry = file->f_dentry; struct inode *inode = dentry->d_inode; - unsigned long pos = *ppos; + loff_t pos = *ppos; unsigned long limit = current->rlim[RLIMIT_FSIZE].rlim_cur; struct page *page, **hash; unsigned long page_cache = 0; @@ -1514,7 +1530,7 @@ * Check whether we've reached the file size limit. */ status = -EFBIG; - if (pos >= limit) { + if (limit != RLIM_INFINITY && pos >= limit) { send_sig(SIGXFSZ, current, 0); goto out; } @@ -1524,21 +1540,21 @@ * Check whether to truncate the write, * and send the signal if we do. 
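The fault-time read-ahead in filemap_nopage() above rounds the faulting page index down to a 2^page_cluster boundary and reads the whole cluster. The masking, checked in plain C with illustrative values:

    #include <stdio.h>

    int main(void)
    {
        unsigned long pgoff = 0x2b;     /* faulting page's index */
        int page_cluster = 4;           /* 16-page clusters      */
        unsigned long start = pgoff & (~0UL << page_cluster);

        printf("fault at page %#lx -> read pages %#lx..%#lx\n",
               pgoff, start, start + (1UL << page_cluster) - 1);
        return 0;
    }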
*/ - if (count > limit - pos) { + if (limit != RLIM_INFINITY && count > limit - pos) { send_sig(SIGXFSZ, current, 0); count = limit - pos; } while (count) { - unsigned long bytes, pgpos, offset; + unsigned long bytes, offset; + pgoff_t pgpos = loff2pgoff(pos); char * dest; /* * Try to find the page in the cache. If it isn't there, * allocate a free page. */ - offset = (pos & ~PAGE_CACHE_MASK); - pgpos = pos & PAGE_CACHE_MASK; + offset = ((unsigned long)pos & ~PAGE_CACHE_MASK); bytes = PAGE_CACHE_SIZE - offset; if (bytes > count) bytes = count; @@ -1610,15 +1626,14 @@ * Note: we don't have to worry about races here, as the caller * is holding the inode semaphore. */ -unsigned long get_cached_page(struct inode * inode, unsigned long offset, - int new) +unsigned long get_cached_page(struct inode * inode, pgoff_t pgoff, int new) { struct page * page; struct page ** hash; unsigned long page_cache = 0; - hash = page_hash(inode, offset); - page = __find_page(inode, offset, *hash); + hash = page_hash(inode, pgoff); + page = __find_page(inode, pgoff, *hash); if (!page) { if (!new) goto out; @@ -1627,7 +1642,7 @@ goto out; clear_page(page_cache); page = page_cache_entry(page_cache); - add_to_page_cache(page, inode, offset, hash); + add_to_page_cache(page, inode, pgoff, hash); } if (atomic_read(&page->count) != 2) printk(KERN_ERR "get_cached_page: page count=%d\n", @@ -1656,130 +1671,6 @@ clear_bit(PG_locked, &page->flags); wake_up(&page->wait); page_cache_release(page); -} - - -/* Add request for page IO to the queue */ - -static inline void put_pio_request(struct pio_request *p) -{ - *pio_last = p; - p->next = NULL; - pio_last = &p->next; -} - -/* Take the first page IO request off the queue */ - -static inline struct pio_request * get_pio_request(void) -{ - struct pio_request * p = pio_first; - pio_first = p->next; - if (!pio_first) - pio_last = &pio_first; - return p; -} - -/* Make a new page IO request and queue it to the kpiod thread */ - -static inline void make_pio_request(struct file *file, - unsigned long offset, - unsigned long page) -{ - struct pio_request *p; - - atomic_inc(&page_cache_entry(page)->count); - - /* - * We need to allocate without causing any recursive IO in the - * current thread's context. We might currently be swapping out - * as a result of an allocation made while holding a critical - * filesystem lock. To avoid deadlock, we *MUST* not reenter - * the filesystem in this thread. - * - * We can wait for kswapd to free memory, or we can try to free - * pages without actually performing further IO, without fear of - * deadlock. --sct - */ - - while ((p = kmem_cache_alloc(pio_request_cache, GFP_BUFFER)) == NULL) { - if (try_to_free_pages(__GFP_WAIT)) - continue; - current->state = TASK_INTERRUPTIBLE; - schedule_timeout(HZ/10); - } - - p->file = file; - p->offset = offset; - p->page = page; - - put_pio_request(p); - wake_up(&pio_wait); -} - - -/* - * This is the only thread which is allowed to write out filemap pages - * while swapping. - * - * To avoid deadlock, it is important that we never reenter this thread. - * Although recursive memory allocations within this thread may result - * in more page swapping, that swapping will always be done by queuing - * another IO request to the same thread: we will never actually start - * that IO request until we have finished with the current one, and so - * we will not deadlock. 
- */ - -int kpiod(void * unused) -{ - struct task_struct *tsk = current; - struct wait_queue wait = { tsk, }; - struct inode * inode; - struct dentry * dentry; - struct pio_request * p; - - tsk->session = 1; - tsk->pgrp = 1; - strcpy(tsk->comm, "kpiod"); - sigfillset(&tsk->blocked); - init_waitqueue(&pio_wait); - /* - * Mark this task as a memory allocator - we don't want to get caught - * up in the regular mm freeing frenzy if we have to allocate memory - * in order to write stuff out. - */ - tsk->flags |= PF_MEMALLOC; - - lock_kernel(); - - pio_request_cache = kmem_cache_create("pio_request", - sizeof(struct pio_request), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (!pio_request_cache) - panic ("Could not create pio_request slab cache"); - - while (1) { - tsk->state = TASK_INTERRUPTIBLE; - add_wait_queue(&pio_wait, &wait); - if (!pio_first) - schedule(); - remove_wait_queue(&pio_wait, &wait); - tsk->state = TASK_RUNNING; - - while (pio_first) { - p = get_pio_request(); - dentry = p->file->f_dentry; - inode = dentry->d_inode; - - down(&inode->i_sem); - do_write_page(inode, p->file, - (const char *) p->page, p->offset); - up(&inode->i_sem); - fput(p->file); - page_cache_free(p->page); - kmem_cache_free(pio_request_cache, p); - } - } } void __init page_cache_init(unsigned long memory_size) diff -urN 2.2.17pre19/mm/memory.c 2.2.17pre19aa2/mm/memory.c --- 2.2.17pre19/mm/memory.c Sun Apr 2 21:07:50 2000 +++ 2.2.17pre19aa2/mm/memory.c Sat Aug 26 17:13:52 2000 @@ -31,12 +31,18 @@ /* * 05.04.94 - Multi-page memory management added for v1.1. * Idea by Alex Bligh (alex@cconcepts.co.uk) + * + * 16.07.99 - Support of BIGMEM added by Gerhard Wichert, Siemens AG + * (Gerhard.Wichert@pdb.siemens.de) */ #include #include #include #include +#include +#include +#include #include #include @@ -53,10 +59,10 @@ static inline void copy_cow_page(unsigned long from, unsigned long to) { if (from == ZERO_PAGE(to)) { - clear_page(to); + clear_bigpage(to); return; } - copy_page(to, from); + copy_bigpage(to, from); } mem_map_t * mem_map = NULL; @@ -397,6 +403,183 @@ } } + +/* + * Do a quick page-table lookup for a single page. + */ +static unsigned long get_page(unsigned long address) +{ + pgd_t *pgd; + pmd_t *pmd; + + pgd = pgd_offset(current->mm, address); + pmd = pmd_offset(pgd, address); + if (pmd) { + pte_t * pte = pte_offset(pmd, address); + if (pte && pte_present(*pte)) { + return pte_page(*pte); + } + } + + printk(KERN_ERR "Missing page in lock_down_page\n"); + return 0; +} + +/* + * Given a physical address, is there a useful struct page pointing to it? + */ + +static struct page * get_page_map(unsigned long page) +{ + struct page *map; + + if (MAP_NR(page) >= max_mapnr) + return 0; + if (page == ZERO_PAGE(page)) + return 0; + map = mem_map + MAP_NR(page); + if (PageReserved(map)) + return 0; + return map; +} + +/* + * Force in an entire range of pages from the current process's user VA, + * and pin and lock the pages for IO. + */ + +#define dprintk(x...) +int map_user_kiobuf(int rw, struct kiobuf *iobuf, unsigned long va, size_t len) +{ + unsigned long ptr, end; + int err; + struct mm_struct * mm; + struct vm_area_struct * vma = 0; + unsigned long page; + struct page * map; + int doublepage = 0; + int repeat = 0; + int i; + + /* Make sure the iobuf is not already mapped somewhere. 
*/ + if (iobuf->nr_pages) + return -EINVAL; + + mm = current->mm; + dprintk ("map_user_kiobuf: begin\n"); + + ptr = va & PAGE_MASK; + end = (va + len + PAGE_SIZE - 1) & PAGE_MASK; + err = expand_kiobuf(iobuf, (end - ptr) >> PAGE_SHIFT); + if (err) + return err; + + repeat: + down(&mm->mmap_sem); + + err = -EFAULT; + iobuf->locked = 1; + iobuf->offset = va & ~PAGE_MASK; + iobuf->length = len; + + i = 0; + + /* + * First of all, try to fault in all of the necessary pages + */ + while (ptr < end) { + if (!vma || ptr >= vma->vm_end) { + vma = find_vma(current->mm, ptr); + if (!vma) + goto out_unlock; + } + if (!handle_mm_fault(current, vma, ptr, (rw==READ))) + goto out_unlock; + page = get_page(ptr); + if (!page) { + printk (KERN_ERR "Missing page in map_user_kiobuf\n"); + goto out_unlock; + } + map = get_page_map(page); + if (map) { + if (PageLocked(map)) + goto retry; + atomic_inc(&map->count); + set_bit(PG_locked, &map->flags); + } + dprintk ("Installing page %p %p: %d\n", (void *)page, map, i); + iobuf->pagelist[i] = page; + iobuf->maplist[i] = map; + iobuf->nr_pages = ++i; + + ptr += PAGE_SIZE; + } + + up(&mm->mmap_sem); + dprintk ("map_user_kiobuf: end OK\n"); + return 0; + + out_unlock: + up(&mm->mmap_sem); + unmap_kiobuf(iobuf); + dprintk ("map_user_kiobuf: end %d\n", err); + return err; + + retry: + + /* + * Undo the locking so far, wait on the page we got to, and try again. + */ + unmap_kiobuf(iobuf); + up(&mm->mmap_sem); + + /* + * Did the release also unlock the page we got stuck on? + */ + if (!PageLocked(map)) { + /* If so, we may well have the page mapped twice in the + * IO address range. Bad news. Of course, it _might_ + * just be a coincidence, but if it happens more than + * once, chances are we have a double-mapped page. */ + if (++doublepage >= 3) { + return -EINVAL; + } + } + + /* + * Try again... + */ + wait_on_page(map); + if (++repeat < 16) + goto repeat; + return -EAGAIN; +} + + +/* + * Unmap all of the pages referenced by a kiobuf. We release the pages, + * and unlock them if they were locked. + */ + +void unmap_kiobuf (struct kiobuf *iobuf) +{ + int i; + struct page *map; + + for (i = 0; i < iobuf->nr_pages; i++) { + map = iobuf->maplist[i]; + + if (map && iobuf->locked) { + clear_bit(PG_locked, &map->flags); + wake_up(&map->wait); + __free_page(map); + } + } + + iobuf->nr_pages = 0; + iobuf->locked = 0; +} + static inline void zeromap_pte_range(pte_t * pte, unsigned long address, unsigned long size, pgprot_t prot) { @@ -613,7 +796,7 @@ struct page * page_map; pte = *page_table; - new_page = __get_free_page(GFP_USER); + new_page = __get_free_page(GFP_BIGUSER); /* Did swap_out() unmapped the protected page while we slept? */ if (pte_val(*page_table) != pte_val(pte)) goto end_wp_page; @@ -639,7 +822,7 @@ case 2: if (!PageSwapCache(page_map)) break; - if (swap_count(page_map->offset) != 1) + if (swap_count(pgoff2ulong(page_map->index)) != 1) break; delete_from_swap_cache(page_map); /* FallThrough */ @@ -730,7 +913,7 @@ * between the file and the memory map for a potential last * incomplete page. Ugly, but necessary. 
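Taken together with the symbols exported in kernel/ksyms.c earlier in this patch, map_user_kiobuf()/unmap_kiobuf() give drivers a way to pin a user buffer for direct I/O. A sketch of the intended call sequence; only the map_user_kiobuf() and unmap_kiobuf() prototypes actually appear in this patch, while the alloc_kiovec()/free_kiovec() prototypes are assumed from the companion raw-I/O work, so check the patched tree's headers (presumably linux/iobuf.h) before relying on them:

    #include <linux/iobuf.h>

    static int pin_user_buffer(int rw, unsigned long uaddr, size_t len)
    {
        struct kiobuf *iobuf;
        int err;

        err = alloc_kiovec(1, &iobuf);      /* assumed prototype */
        if (err)
            return err;

        /* faults in, grabs and locks every page of [uaddr, uaddr+len) */
        err = map_user_kiobuf(rw, iobuf, uaddr, len);
        if (err == 0) {
            /* ... hand iobuf->pagelist[]/maplist[] to the device ... */
            unmap_kiobuf(iobuf);            /* unlock and release pages */
        }
        free_kiovec(1, &iobuf);             /* assumed prototype */
        return err;
    }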
*/ -void vmtruncate(struct inode * inode, unsigned long offset) +void vmtruncate(struct inode * inode, loff_t offset) { struct vm_area_struct * mpnt; @@ -807,10 +990,10 @@ { pte_t entry = pte_wrprotect(mk_pte(ZERO_PAGE(addr), vma->vm_page_prot)); if (write_access) { - unsigned long page = __get_free_page(GFP_USER); + unsigned long page = __get_free_page(GFP_BIGUSER); if (!page) return -1; - clear_page(page); + clear_bigpage(page); entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); vma->vm_mm->rss++; tsk->min_flt++; diff -urN 2.2.17pre19/mm/mmap.c 2.2.17pre19aa2/mm/mmap.c --- 2.2.17pre19/mm/mmap.c Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/mm/mmap.c Sat Aug 26 17:13:52 2000 @@ -40,6 +40,7 @@ kmem_cache_t *vm_area_cachep; int sysctl_overcommit_memory; +int heap_stack_gap = 1; /* Check that a process has enough memory to allocate a * new virtual mapping. @@ -66,7 +67,6 @@ free += page_cache_size; free += nr_free_pages; free += nr_swap_pages; - free -= (page_cache.min_percent + buffer_mem.min_percent + 2)*num_physpages/100; return free > pages; } @@ -169,11 +169,25 @@ #undef _trans } -unsigned long do_mmap(struct file * file, unsigned long addr, unsigned long len, - unsigned long prot, unsigned long flags, unsigned long off) +unsigned long do_mmap(struct file *file, unsigned long addr, + unsigned long len, unsigned long prot, + unsigned long flag, unsigned long offset) +{ + unsigned long ret = -EINVAL; + if ((offset + PAGE_ALIGN(len)) < offset) + goto out; + if (!(offset & ~PAGE_MASK)) + ret = do_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT); +out: + return ret; +} + +unsigned long do_mmap_pgoff(struct file * file, unsigned long addr, unsigned long len, + unsigned long prot, unsigned long flags, unsigned long pg_off) { struct mm_struct * mm = current->mm; struct vm_area_struct * vma; + loff_t off = (loff_t)pg_off << PAGE_SHIFT; int error; if (file && (!file->f_op || !file->f_op->mmap)) @@ -371,9 +385,14 @@ for (vmm = find_vma(current->mm, addr); ; vmm = vmm->vm_next) { /* At this point: (!vmm || addr < vmm->vm_end). */ + unsigned long __heap_stack_gap = 0; if (TASK_SIZE - len < addr) return 0; - if (!vmm || addr + len <= vmm->vm_start) + if (!vmm) + return addr; + if (vmm->vm_flags & VM_GROWSDOWN) + __heap_stack_gap = heap_stack_gap << PAGE_SHIFT; + if (addr + len + __heap_stack_gap <= vmm->vm_start) return addr; addr = vmm->vm_end; } @@ -836,7 +855,8 @@ * the offsets must be contiguous.. 
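The get_unmapped_area() change above in one picture: a new mapping may no longer butt right up against a VM_GROWSDOWN (stack) area; heap_stack_gap pages must fit in between. Standalone arithmetic with made-up addresses and the patch's default gap of one 4kB page:

    #include <stdio.h>

    #define PAGE_SHIFT 12

    int main(void)
    {
        unsigned long heap_stack_gap = 1;       /* default in the patch  */
        unsigned long vm_start = 0xbfffe000UL;  /* bottom of a stack vma */
        unsigned long addr = 0xbfffc000UL, len = 0x1000;
        unsigned long gap = heap_stack_gap << PAGE_SHIFT;

        if (addr + len + gap <= vm_start)
            printf("ok: map at %#lx\n", addr);      /* exactly fits here */
        else
            printf("too close to the stack\n");
        return 0;
    }

Given the VM_HEAP_STACK_GAP sysctl added earlier in the patch, the gap should be tunable at runtime, presumably as /proc/sys/vm/heap-stack-gap (path inferred from the VM sysctl table; verify on a patched system).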
*/ if ((mpnt->vm_file != NULL) || (mpnt->vm_flags & VM_SHM)) { - unsigned long off = prev->vm_offset+prev->vm_end-prev->vm_start; + loff_t off = (prev->vm_offset + + (loff_t)(prev->vm_end - prev->vm_start)); if (off != mpnt->vm_offset) continue; } diff -urN 2.2.17pre19/mm/mremap.c 2.2.17pre19aa2/mm/mremap.c --- 2.2.17pre19/mm/mremap.c Mon Jan 17 16:44:50 2000 +++ 2.2.17pre19aa2/mm/mremap.c Sat Aug 26 17:13:48 2000 @@ -127,7 +127,7 @@ new_vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); if (new_vma) { - unsigned long new_addr = get_unmapped_area(addr, new_len); + unsigned long new_addr = get_unmapped_area(0, new_len); if (new_addr && !move_page_tables(current->mm, new_addr, addr, old_len)) { *new_vma = *vma; diff -urN 2.2.17pre19/mm/page_alloc.c 2.2.17pre19aa2/mm/page_alloc.c --- 2.2.17pre19/mm/page_alloc.c Tue Aug 22 14:54:13 2000 +++ 2.2.17pre19aa2/mm/page_alloc.c Sat Aug 26 17:13:52 2000 @@ -3,6 +3,7 @@ * * Copyright (C) 1991, 1992, 1993, 1994 Linus Torvalds * Swap reorganised 29.12.95, Stephen Tweedie + * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999 */ #include @@ -13,6 +14,7 @@ #include #include #include +#include /* export bigmem vars */ #include #include /* for copy_to/from_user */ @@ -35,7 +37,11 @@ #else #define NR_MEM_LISTS 10 #endif +#ifndef CONFIG_BIGMEM #define NR_MEM_TYPES 2 /* GFP_DMA vs not for now. */ +#else +#define NR_MEM_TYPES 3 +#endif /* The start of this MUST match the start of "struct page" */ struct free_area_struct { @@ -93,34 +99,79 @@ */ spinlock_t page_alloc_lock = SPIN_LOCK_UNLOCKED; -static inline void free_pages_ok(unsigned long map_nr, unsigned long order, unsigned type) -{ - struct free_area_struct *area = free_area[type] + order; - unsigned long index = map_nr >> (1 + order); +#define list(x) (mem_map+(x)) +#define __free_pages_ok(map_nr, mask, area, index) \ + nr_free_pages -= (mask); \ + while ((mask) + (1 << (NR_MEM_LISTS-1))) { \ + if (!test_and_change_bit((index), (area)->map)) \ + break; \ + (area)->count--; \ + remove_mem_queue(list((map_nr) ^ -(mask))); \ + (mask) <<= 1; \ + (area)++; \ + (index) >>= 1; \ + (map_nr) &= (mask); \ + } \ + add_mem_queue(area, list(map_nr)); + +static void free_local_pages(struct page * page) { + unsigned long order = page->index; + unsigned int type = PageDMA(page) ? 
1 : 0;
+ struct free_area_struct *area;
+ unsigned long map_nr = page - mem_map;
 unsigned long mask = (~0UL) << order;
- unsigned long flags;
+ unsigned long index = map_nr >> (1 + order);
- spin_lock_irqsave(&page_alloc_lock, flags);
+#ifdef CONFIG_BIGMEM
+ if (PageBIGMEM(page))
+ type = 2;
+#endif
+ area = free_area[type] + order;
+ __free_pages_ok(map_nr, mask, area, index);
+}
-#define list(x) (mem_map+(x))
+static inline void free_pages_ok(unsigned long map_nr, unsigned long order, unsigned type)
+{
+ struct free_area_struct *area;
+ unsigned long index;
+ unsigned long mask;
+ unsigned long flags;
+ struct page * page;
+ if (current->flags & PF_FREE_PAGES)
+ goto local_freelist;
+ back_local_freelist:
+
+ area = free_area[type] + order;
+ index = map_nr >> (1 + order);
+ mask = (~0UL) << order;
 map_nr &= mask;
- nr_free_pages -= mask;
- while (mask + (1 << (NR_MEM_LISTS-1))) {
- if (!test_and_change_bit(index, area->map))
- break;
- area->count--;
- remove_mem_queue(list(map_nr ^ -mask));
- mask <<= 1;
- area++;
- index >>= 1;
- map_nr &= mask;
+
+ spin_lock_irqsave(&page_alloc_lock, flags);
+#ifdef CONFIG_BIGMEM
+ if (map_nr >= bigmem_mapnr) {
+ area = free_area[2] + order;
+ nr_free_bigpages -= mask;
 }
- add_mem_queue(area, list(map_nr));
+#endif
+ __free_pages_ok(map_nr, mask, area, index);
+ spin_unlock_irqrestore(&page_alloc_lock, flags);
+ return;
-#undef list
+ local_freelist:
+ /*
+ * This is a little subtle: if the allocation order
+ * wanted is greater than zero we'd better take all the pages
+ * local since we must deal with fragmentation too and we
+ * can't rely on the nr_local_pages information.
+ */
+ if (current->nr_local_pages && !current->allocation_order)
+ goto back_local_freelist;
- spin_unlock_irqrestore(&page_alloc_lock, flags);
+ page = mem_map + map_nr;
+ list_add((struct list_head *) page, &current->local_pages);
+ page->index = order;
+ current->nr_local_pages++;
 }
 void __free_pages(struct page *page, unsigned long order)
@@ -148,6 +199,17 @@
 #define MARK_USED(index, order, area) \
 change_bit((index) >> (1+(order)), (area)->map)
 #define ADDRESS(x) (PAGE_OFFSET + ((x) << PAGE_SHIFT))
+#ifdef CONFIG_BIGMEM
+#define UPDATE_NR_FREE_BIGPAGES(map_nr, order) \
+ do \
+ { \
+ if ((map_nr) >= bigmem_mapnr) \
+ nr_free_bigpages -= 1 << (order); \
+ } \
+ while (0)
+#else
+#define UPDATE_NR_FREE_BIGPAGES(map_nr, order) do { } while (0)
+#endif
 #define RMQUEUE_TYPE(order, type) \
 do { struct free_area_struct * area = free_area[type]+order; \
 unsigned long new_order = order; \
@@ -158,6 +220,7 @@
 map_nr = ret - mem_map; \
 MARK_USED(map_nr, new_order, area); \
 nr_free_pages -= 1 << order; \
+ UPDATE_NR_FREE_BIGPAGES(map_nr, order); \
 area->count--; \
 EXPAND(ret, map_nr, order, new_order, area); \
 spin_unlock_irqrestore(&page_alloc_lock, flags); \
@@ -179,13 +242,32 @@
 atomic_set(&map->count, 1); \
 } while (0)
+static void refile_local_pages(void)
+{
+ if (current->nr_local_pages) {
+ struct page * page;
+ struct list_head * entry;
+ int nr_pages = current->nr_local_pages;
+
+ while ((entry = current->local_pages.next) != &current->local_pages) {
+ list_del(entry);
+ page = (struct page *) entry;
+ free_local_pages(page);
+ if (!nr_pages--)
+ panic("__get_free_pages local_pages list corrupted I");
+ }
+ if (nr_pages)
+ panic("__get_free_pages local_pages list corrupted II");
+ current->nr_local_pages = 0;
+ }
+}
+
 unsigned long __get_free_pages(int gfp_mask, unsigned long order)
 {
 unsigned long flags;
- static atomic_t free_before_allocate = ATOMIC_INIT(0);
 if (order >= NR_MEM_LISTS)
- goto nopage;
+ goto out;
#ifdef ATOMIC_MEMORY_DEBUGGING
 if ((gfp_mask & __GFP_WAIT) && in_interrupt()) {
@@ -194,26 +276,25 @@
 printk("gfp called nonatomically from interrupt %p\n",
 __builtin_return_address(0));
 }
- goto nopage;
+ goto out;
 }
#endif
 /*
+ * Acquire lock before reading nr_free_pages to make sure it
+ * won't change from under us.
+ */
+ spin_lock_irqsave(&page_alloc_lock, flags);
+
+ /*
 * If this is a recursive call, we'd better
 * do our best to just allocate things without
 * further thought.
 */
 if (!(current->flags & PF_MEMALLOC)) {
- int freed;
 extern struct wait_queue * kswapd_wait;
- /* Somebody needs to free pages so we free some of our own. */
- if (atomic_read(&free_before_allocate)) {
- current->flags |= PF_MEMALLOC;
- try_to_free_pages(gfp_mask);
- current->flags &= ~PF_MEMALLOC;
- }
-
+#ifndef CONFIG_BIGMEM
 if (nr_free_pages > freepages.low)
 goto ok_to_allocate;
@@ -223,35 +304,88 @@
 /* Do we have to block or can we proceed? */
 if (nr_free_pages > freepages.min)
 goto ok_to_allocate;
+#else
+ if (gfp_mask & __GFP_BIGMEM) {
+ if (nr_free_pages > freepages.low)
+ goto ok_to_allocate;
+
+ /*
+ * Wake kswapd only if the normal classzone
+ * is low on memory, otherwise waking up kswapd would
+ * be useless.
+ */
+ if (nr_free_pages-nr_free_bigpages <= freepages.low &&
+ waitqueue_active(&kswapd_wait))
+ wake_up_interruptible(&kswapd_wait);
+
+ /* Do we have to block or can we proceed? */
+ if (nr_free_pages > freepages.min)
+ goto ok_to_allocate;
+ } else {
+ if (nr_free_pages-nr_free_bigpages > freepages.low)
+ goto ok_to_allocate;
+
+ if (waitqueue_active(&kswapd_wait))
+ wake_up_interruptible(&kswapd_wait);
+
+ /* Do we have to block or can we proceed? */
+ if (nr_free_pages-nr_free_bigpages > freepages.min)
+ goto ok_to_allocate;
+ }
+#endif
+ if (gfp_mask & __GFP_WAIT) {
+ int freed;
+ /*
+ * If the task is ok to sleep it's fine also
+ * if we release irq here.
+ */
+ spin_unlock_irq(&page_alloc_lock);
+
+ current->flags |= PF_MEMALLOC|PF_FREE_PAGES;
+ current->allocation_order = order;
+ freed = try_to_free_pages(gfp_mask);
+ current->flags &= ~(PF_MEMALLOC|PF_FREE_PAGES);
+
+ spin_lock_irq(&page_alloc_lock);
+ refile_local_pages();
+
+ /*
+ * Re-check we're still low on memory after we blocked
+ * for some time. Somebody may have released lots of
+ * memory from under us while we were trying to free
+ * the pages. We check against pages_high to be sure
+ * to succeed only if lots of memory has been released.
+ */
+#ifndef CONFIG_BIGMEM
+ if (nr_free_pages > freepages.high)
+ goto ok_to_allocate;
+#else
+ if (gfp_mask & __GFP_BIGMEM) {
+ if (nr_free_pages > freepages.high)
+ goto ok_to_allocate;
+ } else {
+ if (nr_free_pages-nr_free_bigpages > freepages.high)
+ goto ok_to_allocate;
+ }
+#endif
- current->flags |= PF_MEMALLOC;
- atomic_inc(&free_before_allocate);
- freed = try_to_free_pages(gfp_mask);
- atomic_dec(&free_before_allocate);
- current->flags &= ~PF_MEMALLOC;
-
- /*
- * Re-check we're still low on memory after we blocked
- * for some time. Somebody may have released lots of
- * memory from under us while we was trying to free
- * the pages. We check against pages_high to be sure
- * to succeed only if lots of memory is been released.
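The __free_pages_ok() macro introduced a little above keeps the classic buddy bookkeeping: for a block of 2^order pages, mask = ~0UL << order, the buddy's map number is map_nr ^ -mask (equivalently map_nr ^ (1 << order)), and index = map_nr >> (1 + order) selects the pair's shared bit in that order's bitmap. The arithmetic, checked standalone:

    #include <stdio.h>

    int main(void)
    {
        unsigned long map_nr = 0x1a8;   /* first page of the freed block */
        unsigned long order;

        for (order = 0; order < 4; order++) {
            unsigned long mask = (~0UL) << order;
            unsigned long base = map_nr & mask;     /* block start       */
            printf("order %lu: block %#lx buddy %#lx index %#lx\n",
                   order, base, base ^ -mask,       /* -mask == 1<<order */
                   base >> (1 + order));
        }
        return 0;
    }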
- */ - if (nr_free_pages > freepages.high) - goto ok_to_allocate; - - if (!freed && !(gfp_mask & (__GFP_MED | __GFP_HIGH))) - goto nopage; + if (!freed && !(gfp_mask & (__GFP_MED | __GFP_HIGH))) + goto nopage; + } } ok_to_allocate: - spin_lock_irqsave(&page_alloc_lock, flags); /* if it's not a dma request, try non-dma first */ - if (!(gfp_mask & __GFP_DMA)) + if (!(gfp_mask & __GFP_DMA)) { +#ifdef CONFIG_BIGMEM + if (gfp_mask & __GFP_BIGMEM) + RMQUEUE_TYPE(order, 2); +#endif RMQUEUE_TYPE(order, 0); + } RMQUEUE_TYPE(order, 1); + nopage: spin_unlock_irqrestore(&page_alloc_lock, flags); - -nopage: + out: return 0; } @@ -266,7 +400,9 @@ unsigned type; spin_lock_irqsave(&page_alloc_lock, flags); - printk("Free pages: %6dkB\n ( ",nr_free_pages<<(PAGE_SHIFT-10)); + printk("Free pages: %6dkB (%6dkB BigMem)\n ( ", + nr_free_pages<<(PAGE_SHIFT-10), + nr_free_bigpages<<(PAGE_SHIFT-10)); printk("Free: %d (%d %d %d)\n", nr_free_pages, freepages.min, @@ -274,7 +410,19 @@ freepages.high); for (type = 0; type < NR_MEM_TYPES; type++) { unsigned long total = 0; +#ifdef CONFIG_BIGMEM + switch (type) + { + case 0: + case 1: +#endif printk("%sDMA: ", type ? "" : "Non"); +#ifdef CONFIG_BIGMEM + break; + case 2: + printk("BIGMEM: "); + } +#endif for (order=0 ; order < NR_MEM_LISTS; order++) { unsigned long nr = free_area[type][order].count; @@ -426,6 +574,8 @@ * this process. */ delete_from_swap_cache(page_map); + page_map = replace_with_bigmem(page_map); + page = page_address(page_map); set_pte(page_table, pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)))); return 1; } diff -urN 2.2.17pre19/mm/page_io.c 2.2.17pre19aa2/mm/page_io.c --- 2.2.17pre19/mm/page_io.c Tue Jun 13 03:48:15 2000 +++ 2.2.17pre19aa2/mm/page_io.c Sat Aug 26 17:13:52 2000 @@ -112,7 +112,7 @@ * as if it were: we are not allowed to manipulate the inode * hashing for locked pages. 
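The reworked __get_free_pages() above judges memory pressure per class: a __GFP_BIGMEM caller may count every free page, while a normal allocation must ignore bigmem pages and compare nr_free_pages - nr_free_bigpages against the watermarks, since bigmem pages cannot satisfy it. The decision, reduced to plain arithmetic with made-up numbers (the real code also wakes kswapd and can block in try_to_free_pages()):

    #include <stdio.h>

    int main(void)
    {
        int nr_free_pages = 900, nr_free_bigpages = 700;
        int low = 300, min = 150;
        int normal_free = nr_free_pages - nr_free_bigpages;   /* 200 */

        printf("__GFP_BIGMEM caller: %s\n",
               nr_free_pages > low ? "proceed" : "reclaim first");
        printf("normal caller: %s\n",
               normal_free > low ? "proceed" :
               normal_free > min ? "wake kswapd, proceed" : "must free/block");
        return 0;
    }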
diff -urN 2.2.17pre19/mm/page_io.c 2.2.17pre19aa2/mm/page_io.c
--- 2.2.17pre19/mm/page_io.c	Tue Jun 13 03:48:15 2000
+++ 2.2.17pre19aa2/mm/page_io.c	Sat Aug 26 17:13:52 2000
@@ -112,7 +112,7 @@
 	 * as if it were: we are not allowed to manipulate the inode
 	 * hashing for locked pages.
 	 */
-	if (page->offset != entry) {
+	if (pgoff2ulong(page->index) != entry) {
 		printk ("swap entry mismatch");
 		return;
 	}
@@ -265,8 +265,8 @@
 		printk("VM: swap page is not in swap cache\n");
 		return;
 	}
-	if (page->offset != entry) {
-		printk ("swap entry mismatch");
+	if (pgoff2ulong(page->index) != entry) {
+		printk ("VM: swap entry mismatch");
 		return;
 	}
 	rw_swap_page_base(rw, entry, page, wait);
@@ -291,12 +291,12 @@
 		printk ("VM: read_swap_page: page already in page cache!\n");
 		return;
 	}
-	page->inode = &swapper_inode;
-	page->offset = entry;
+	page->inode = &swapper_inode;
+	page->index = ulong2pgoff(entry);
 	atomic_inc(&page->count);	/* Protect from shrink_mmap() */
 	rw_swap_page(rw, entry, buffer, 1);
 	atomic_dec(&page->count);
-	page->inode = 0;
+	page->inode = 0;
 	clear_bit(PG_swap_cache, &page->flags);
 }
diff -urN 2.2.17pre19/mm/swap.c 2.2.17pre19aa2/mm/swap.c
--- 2.2.17pre19/mm/swap.c	Mon Jan 18 02:27:01 1999
+++ 2.2.17pre19aa2/mm/swap.c	Sat Aug 26 17:13:47 2000
@@ -47,13 +47,13 @@
 atomic_t nr_async_pages = ATOMIC_INIT(0);
 
 buffer_mem_t buffer_mem = {
-	2,	/* minimum percent buffer */
+	4,	/* minimum percent buffer */
 	10,	/* borrow percent buffer */
 	60	/* maximum percent buffer */
 };
 
 buffer_mem_t page_cache = {
-	2,	/* minimum percent page cache */
+	4,	/* minimum percent page cache */
 	15,	/* borrow percent page cache */
 	75	/* maximum */
 };
diff -urN 2.2.17pre19/mm/swap_state.c 2.2.17pre19aa2/mm/swap_state.c
--- 2.2.17pre19/mm/swap_state.c	Mon Jan 17 16:44:50 2000
+++ 2.2.17pre19aa2/mm/swap_state.c	Sat Aug 26 17:13:52 2000
@@ -54,7 +54,7 @@
 	if (PageTestandSetSwapCache(page)) {
 		printk(KERN_ERR "swap_cache: replacing non-empty entry %08lx "
 		       "on page %08lx\n",
-		       page->offset, page_address(page));
+		       pgoff2ulong(page->index), page_address(page));
 		return 0;
 	}
 	if (page->inode) {
@@ -63,9 +63,10 @@
 		return 0;
 	}
 	atomic_inc(&page->count);
+	page->flags = page->flags & ~((1 << PG_uptodate) | (1 << PG_error));
 	page->inode = &swapper_inode;
-	page->offset = entry;
-	add_page_to_hash_queue(page, &swapper_inode, entry);
+	page->index = ulong2pgoff(entry);
+	add_page_to_hash_queue(page, &swapper_inode, ulong2pgoff(entry));
 	add_page_to_inode_queue(&swapper_inode, page);
 	return 1;
 }
@@ -203,7 +204,7 @@
  */
 void delete_from_swap_cache(struct page *page)
 {
-	long entry = page->offset;
+	long entry = pgoff2ulong(page->index);
 
 #ifdef SWAP_CACHE_INFO
 	swap_cache_del_total++;
@@ -251,7 +252,7 @@
 	swap_cache_find_total++;
 #endif
 	while (1) {
-		found = find_page(&swapper_inode, entry);
+		found = find_page(&swapper_inode, ulong2pgoff(entry));
 		if (!found)
 			return 0;
 		if (found->inode != &swapper_inode || !PageSwapCache(found))
diff -urN 2.2.17pre19/mm/vmalloc.c 2.2.17pre19aa2/mm/vmalloc.c
--- 2.2.17pre19/mm/vmalloc.c	Tue Jul 13 00:33:04 1999
+++ 2.2.17pre19aa2/mm/vmalloc.c	Sat Aug 26 17:13:49 2000
@@ -2,6 +2,7 @@
 * linux/mm/vmalloc.c
 *
 * Copyright (C) 1993 Linus Torvalds
+ * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999
 */
 
 #include
@@ -94,7 +95,7 @@
 	unsigned long page;
 
 	if (!pte_none(*pte))
 		printk("alloc_area_pte: page already exists\n");
-	page = __get_free_page(GFP_KERNEL);
+	page = __get_free_page(GFP_KERNEL|GFP_BIGMEM);
 	if (!page)
 		return -ENOMEM;
 	set_pte(pte, mk_pte(page, PAGE_KERNEL));
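All of the page->offset conversions above funnel through pgoff2ulong() and ulong2pgoff(). Their definitions are outside this excerpt; assuming pgoff_t is simply a typedef for unsigned long, as in the page-index patches of this era, they would reduce to plain casts, kept only so that a wider pgoff_t can later be introduced without touching the callers again (a sketch, not the patch's actual header):

	/* Hypothetical minimal definitions; check include/linux/pagemap.h
	 * of the real tree before relying on them. */
	typedef unsigned long pgoff_t;

	#define pgoff2ulong(pgoff)	((unsigned long)(pgoff))
	#define ulong2pgoff(val)	((pgoff_t)(val))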
diff -urN 2.2.17pre19/mm/vmscan.c 2.2.17pre19aa2/mm/vmscan.c
--- 2.2.17pre19/mm/vmscan.c	Tue Aug 22 14:54:13 2000
+++ 2.2.17pre19aa2/mm/vmscan.c	Sat Aug 26 17:13:52 2000
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -60,7 +61,8 @@
 	if (PageReserved(page_map)
 	    || PageLocked(page_map)
-	    || ((gfp_mask & __GFP_DMA) && !PageDMA(page_map)))
+	    || ((gfp_mask & __GFP_DMA) && !PageDMA(page_map))
+	    || (!(gfp_mask & __GFP_BIGMEM) && PageBIGMEM(page_map)))
 		return 0;
 
 	/*
@@ -72,7 +74,7 @@
 	 * memory, and we should just continue our scan.
 	 */
 	if (PageSwapCache(page_map)) {
-		entry = page_map->offset;
+		entry = pgoff2ulong(page_map->index);
 		swap_duplicate(entry);
 		set_pte(page_table, __pte(entry));
 drop_pte:
@@ -96,6 +98,9 @@
 	 * some real work in the future in "shrink_mmap()".
 	 */
 	if (!pte_dirty(pte)) {
+		if (page_map->inode && pgcache_under_min())
+			/* unmapping this page would be useless */
+			return 0;
 		flush_cache_page(vma, address);
 		pte_clear(page_table);
 		goto drop_pte;
@@ -106,7 +111,7 @@
 	 * we cannot do I/O! Avoid recursing on FS
 	 * locks etc.
 	 */
-	if (!(gfp_mask & __GFP_IO))
+	if (!(gfp_mask & __GFP_IO) || current->fs_locks)
 		return 0;
 
 	/*
@@ -151,6 +156,9 @@
 	if (!entry)
 		return 0; /* No swap space left */
 
+	if (!(page_map = prepare_bigmem_swapout(page_map)))
+		goto out_swap_free;
+
 	vma->vm_mm->rss--;
 	tsk->nswap++;
 	set_pte(page_table, __pte(entry));
@@ -162,10 +170,14 @@
 	set_bit(PG_locked, &page_map->flags);
 
 	/* OK, do a physical asynchronous write to swap.  */
-	rw_swap_page(WRITE, entry, (char *) page, 0);
+	rw_swap_page(WRITE, entry, (char *) page_address(page_map), 0);
 
 	__free_page(page_map);
 	return 1;
+
+ out_swap_free:
+	swap_free(entry);
+	return 0;
 }
 
 /*
@@ -208,6 +220,8 @@
 		result = try_to_swap_out(tsk, vma, address, pte, gfp_mask);
 		if (result)
 			return result;
+		if (current->need_resched)
+			return 2;
 		address += PAGE_SIZE;
 		pte++;
 	} while (address < end);
@@ -327,7 +341,7 @@
 	 * Think of swap_cnt as a "shadow rss" - it tells us which process
 	 * we want to page out (always try largest first).
 	 */
-	counter = nr_tasks / (priority+1);
+	counter = nr_tasks / priority;
 	if (counter < 1)
 		counter = 1;
 
@@ -361,8 +375,13 @@
 			goto out;
 		}
 
-		if (swap_out_process(pbest, gfp_mask))
+		switch (swap_out_process(pbest, gfp_mask)) {
+		case 1:
 			return 1;
+		case 2:
+			current->state = TASK_RUNNING;
+			schedule();
+		}
 	}
 out:
 	return 0;
 }
 
 /*
@@ -377,11 +396,9 @@
 * cluster them so that we get good swap-out behaviour. See
 * the "free_memory()" macro for details.
 */
-static int do_try_to_free_pages(unsigned int gfp_mask)
+int try_to_free_pages(unsigned int gfp_mask)
 {
 	int priority;
-	int ret = 0;
-	int swapcount;
 	int count = SWAP_CLUSTER_MAX;
 
 	lock_kernel();
@@ -389,41 +406,34 @@
 	/* Always trim SLAB caches when memory gets low. */
 	kmem_cache_reap(gfp_mask);
 
-	priority = 6;
+	priority = 5;
 	do {
 		while (shrink_mmap(priority, gfp_mask)) {
-			ret = 1;
 			if (!--count)
 				goto done;
 		}
 
 		/* Try to get rid of some shared memory pages.. */
-		if (gfp_mask & __GFP_IO) {
+		if (gfp_mask & __GFP_IO && !current->fs_locks) {
 			while (shm_swap(priority, gfp_mask)) {
-				ret = 1;
 				if (!--count)
 					goto done;
 			}
 		}
 
 		/* Then, try to page stuff out.. */
-		swapcount = count;
 		while (swap_out(priority, gfp_mask)) {
-			ret = 1;
-			if (!--swapcount)
-				break;
+			if (!--count)
+				goto done;
 		}
 
 		shrink_dcache_memory(priority, gfp_mask);
-	} while (--priority >= 0);
+	} while (--priority > 0);
 done:
 	unlock_kernel();
 
-	if (!ret)
-		printk("VM: do_try_to_free_pages failed for %s...\n",
-			current->comm);
-
 	/* Return success if we freed a page. */
-	return ret;
+	return priority > 0;
 }
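The rewritten try_to_free_pages() above drops the old ret/swapcount bookkeeping: success now means a full cluster of SWAP_CLUSTER_MAX pages was freed before the priority countdown expired, with each reclaim source drained at increasing aggressiveness. Its control flow, reduced to a standalone sketch (the reclaim_*() helpers are illustrative stand-ins for shrink_mmap(), shm_swap() and swap_out(); the SWAP_CLUSTER_MAX value is assumed):

	#define SWAP_CLUSTER_MAX 32	/* assumed; see mm/vmscan.c */

	extern int reclaim_cache(int priority);	/* stand-in: 1 = freed a page */
	extern int reclaim_swap(int priority);	/* stand-in: 1 = swapped a page */

	static int sketch_try_to_free_pages(void)
	{
		int priority = 5;	/* 5 = gentle, 1 = desperate */
		int count = SWAP_CLUSTER_MAX;

		do {
			/* Drain each source until it fails at this
			 * priority; freeing a whole cluster ends the
			 * scan early. */
			while (reclaim_cache(priority))
				if (!--count)
					goto done;
			while (reclaim_swap(priority))
				if (!--count)
					goto done;
		} while (--priority > 0);
	done:
		/* Success only if the cluster was filled before the
		 * scan fell through at the lowest priority. */
		return priority > 0;
	}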
 
 /*
@@ -497,9 +507,13 @@
 		 */
 		interruptible_sleep_on(&kswapd_wait);
 
-		while (nr_free_pages < freepages.high)
+		/*
+		 * In 2.2.x-bigmem kswapd is critical for satisfying
+		 * GFP_ATOMIC allocations (not GFP_BIGMEM ones).
+		 */
+		while (nr_free_pages - nr_free_bigpages < freepages.high)
 		{
-			if (do_try_to_free_pages(GFP_KSWAPD))
+			if (try_to_free_pages(GFP_KSWAPD))
 			{
 				if (tsk->need_resched)
 					schedule();
@@ -510,17 +524,3 @@
 		}
 	}
 }
-
-/*
- * Called by non-kswapd processes when kswapd really cannot
- * keep up with the demand for free memory.
- */
-int try_to_free_pages(unsigned int gfp_mask)
-{
-	int retval = 1;
-
-	if (gfp_mask & __GFP_WAIT)
-		retval = do_try_to_free_pages(gfp_mask);
-	return retval;
-}
-
diff -urN 2.2.17pre19/net/ipv4/tcp_input.c 2.2.17pre19aa2/net/ipv4/tcp_input.c
--- 2.2.17pre19/net/ipv4/tcp_input.c	Tue Aug 22 14:54:14 2000
+++ 2.2.17pre19aa2/net/ipv4/tcp_input.c	Sat Aug 26 17:13:48 2000
@@ -97,6 +97,7 @@
 */
 static void tcp_delack_estimator(struct tcp_opt *tp)
 {
+	tcp_exit_quickack_mode(tp);
 	if(tp->ato == 0) {
 		tp->lrcvtime = tcp_time_stamp;
@@ -115,10 +116,7 @@
 			if(m > tp->rto)
 				tp->ato = tp->rto;
 			else {
-				/* This funny shift makes sure we
-				 * clear the "quick ack mode" bit.
-				 */
-				tp->ato = ((tp->ato << 1) >> 2) + m;
+				tp->ato = (tp->ato >> 1) + m;
 			}
 		}
 	}
diff -urN 2.2.17pre19/net/ipv4/tcp_ipv4.c 2.2.17pre19aa2/net/ipv4/tcp_ipv4.c
--- 2.2.17pre19/net/ipv4/tcp_ipv4.c	Tue Aug 22 14:54:14 2000
+++ 2.2.17pre19aa2/net/ipv4/tcp_ipv4.c	Sat Aug 26 17:13:48 2000
@@ -1402,6 +1402,7 @@
 	newtp->snd_una = req->snt_isn + 1;
 	newtp->srtt = 0;
 	newtp->ato = 0;
+	tcp_enter_quickack_mode(newtp);
 	newtp->snd_wl1 = req->rcv_isn;
 	newtp->snd_wl2 = req->snt_isn;
@@ -1947,6 +1948,7 @@
 	skb_queue_head_init(&tp->out_of_order_queue);
 	tcp_init_xmit_timers(sk);
+	tcp_enter_quickack_mode(tp);
 	tp->rto = TCP_TIMEOUT_INIT;		/*TCP_WRITE_TIME*/
 	tp->mdev = TCP_TIMEOUT_INIT;
 	tp->mss_clamp = ~0;
diff -urN 2.2.17pre19/net/ipv4/tcp_output.c 2.2.17pre19aa2/net/ipv4/tcp_output.c
--- 2.2.17pre19/net/ipv4/tcp_output.c	Tue Aug 22 14:54:14 2000
+++ 2.2.17pre19aa2/net/ipv4/tcp_output.c	Sat Aug 26 17:13:48 2000
@@ -1038,6 +1038,17 @@
 		timeout = (tp->ato << 1) >> 1;
 		if (timeout > max_timeout)
 			timeout = max_timeout;
+		if (!timeout)
+		{
+			timeout = tp->rto;
+			if ((signed) timeout <= 0)
+			{
+				printk(KERN_ERR
+				       "tcp_send_delayed_ack: rto %ld!\n",
+				       timeout);
+				timeout = 1;
+			}
+			timeout = min(timeout, max_timeout);
+		}
 		timeout += jiffies;
 
 		/* Use new timeout only if there wasn't a older one earlier. */
diff -urN 2.2.17pre19/net/ipv4/tcp_timer.c 2.2.17pre19aa2/net/ipv4/tcp_timer.c
--- 2.2.17pre19/net/ipv4/tcp_timer.c	Tue Jun 13 03:48:15 2000
+++ 2.2.17pre19aa2/net/ipv4/tcp_timer.c	Sat Aug 26 17:13:48 2000
@@ -195,7 +195,21 @@
 		if (!atomic_read(&sk->sock_readers))
 			tcp_send_ack(sk);
 		else
-			tcp_send_delayed_ack(&(sk->tp_pinfo.af_tcp), HZ/10);
+		{
+			struct tcp_opt * tp = &(sk->tp_pinfo.af_tcp);
+			int rto;
+
+			rto = tp->rto;
+			if (rto <= 0)
+			{
+				printk(KERN_ERR
+				       "tcp_delack_timer: rto %d!\n", rto);
+				rto = 1;
+			}
+			rto = min(rto, HZ/10);
+			tp->delack_timer.expires = rto + jiffies;
+			add_timer(&tp->delack_timer);
+		}
 	}
 }
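The tcp_output.c and tcp_timer.c hunks above close the same hole twice: with the quick-ack changes, tp->ato can legitimately be zero, and a zero (or negative) timeout would arm the delayed-ACK timer to fire immediately. Both paths now fall back to the retransmit timeout, complain if even that is unusable, and clamp the result to HZ/10. A condensed sketch of the clamp (HZ and jiffies are taken as given; the function name is illustrative):

	#define HZ 100			/* assumed clock tick rate */
	extern unsigned long jiffies;	/* current time in ticks */

	/* Compute a sane expiry for the delayed-ACK timer from a
	 * possibly bogus rto value. */
	static unsigned long sane_delack_expiry(long rto)
	{
		long max_timeout = HZ / 10;

		/* rto <= 0 would fire the timer at once, burning CPU
		 * for no progress; fall back to the smallest delay. */
		if (rto <= 0)
			rto = 1;
		if (rto > max_timeout)
			rto = max_timeout;
		return jiffies + rto;
	}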