diff -Nru a/CREDITS b/CREDITS --- a/CREDITS Fri Aug 22 17:05:49 2003 +++ b/CREDITS Fri Aug 22 17:05:49 2003 @@ -110,16 +110,18 @@ S: USA N: Andrea Arcangeli -E: andrea@e-mind.com -W: http://e-mind.com/~andrea/ -P: 1024/CB4660B9 CC A0 71 81 F4 A0 63 AC C0 4B 81 1D 8C 15 C8 E5 +E: andrea@suse.de +W: http://www.kernel.org/pub/linux/kernel/people/andrea/ +P: 1024D/68B9CB43 13D9 8355 295F 4823 7C49 C012 DFA1 686E 68B9 CB43 +P: 1024R/CB4660B9 CC A0 71 81 F4 A0 63 AC C0 4B 81 1D 8C 15 C8 E5 D: Parport hacker D: Implemented a workaround for some interrupt buggy printers -D: Author of pscan that helps to fix lp/parport bug +D: Author of pscan that helps to fix lp/parport bugs D: Author of lil (Linux Interrupt Latency benchmark) D: Fixed the shm swap deallocation at swapoff time (try_to_unuse message) +D: VM hacker D: Various other kernel hacks -S: Via Ciaclini 26 +S: Via Cicalini 26 S: Imola 40026 S: Italy @@ -629,17 +631,18 @@ S: USA N: Alan Cox -W: http://roadrunner.swansea.linux.org.uk/alan.shtml -E: alan@lxorguk.ukuu.org.uk -E: alan@www.linux.org.uk (linux.org.uk stuff) -E: Alan.Cox@linux.org (if others fail) +W: http://www.linux.org.uk/diary/ D: Linux Networking (0.99.10->2.0.29) D: Original Appletalk, AX.25, and IPX code -D: Current 3c501 hacker. >>More 3c501 info/tricks wanted<<. +D: 3c501 hacker D: Watchdog timer drivers D: Linux/SMP x86 (up to 2.0 only) D: Initial Mac68K port D: Video4Linux design, bw-qcam and PMS driver ports. 
+D: IDE modularisation work +D: Z85230 driver +D: Former security contact point (please use vendor-sec@lst.de) +D: ex 2.2 maintainer D: 2.1.x modular sound S: c/o Red Hat UK Ltd S: Alexandra House @@ -1988,7 +1991,7 @@ S: Canada B3J 3C8 N: Kai Mäkisara -E: Kai.Makisara@metla.fi +E: Kai.Makisara@kolumbus.fi D: SCSI Tape Driver N: Asit Mallick diff -Nru a/Documentation/00-INDEX b/Documentation/00-INDEX --- a/Documentation/00-INDEX Fri Aug 22 17:05:42 2003 +++ b/Documentation/00-INDEX Fri Aug 22 17:05:42 2003 @@ -70,6 +70,8 @@ - info about directory notification in Linux. driver-model.txt - info about Linux driver model. +early-userspace/ + - info about initramfs, klibc, and userspace early during boot. exception.txt - how Linux v2.2 handles exceptions without verify_area etc. fb/ diff -Nru a/Documentation/BK-usage/bk-kernel-howto.txt b/Documentation/BK-usage/bk-kernel-howto.txt --- a/Documentation/BK-usage/bk-kernel-howto.txt Fri Aug 22 17:05:46 2003 +++ b/Documentation/BK-usage/bk-kernel-howto.txt Fri Aug 22 17:05:46 2003 @@ -216,7 +216,7 @@ 3) Include a summary and "diffstat -p1" of each changeset that will be downloaded, when Linus issues a "bk pull". The author auto-generates -these summaries using "bk push -nl 2>&1", to obtain a listing +these summaries using "bk changes -L ", to obtain a listing of all the pending-to-send changesets, and their commit messages. 
It is important to show Linus what he will be downloading when he issues diff -Nru a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt --- a/Documentation/DMA-mapping.txt Fri Aug 22 17:05:46 2003 +++ b/Documentation/DMA-mapping.txt Fri Aug 22 17:05:46 2003 @@ -689,7 +689,7 @@ and offset using something like this: struct page *page = virt_to_page(ptr); - unsigned long offset = ((unsigned long)ptr & ~PAGE_MASK); + unsigned long offset = offset_in_page(ptr); Here are the interfaces: diff -Nru a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl --- a/Documentation/DocBook/kernel-api.tmpl Fri Aug 22 17:05:38 2003 +++ b/Documentation/DocBook/kernel-api.tmpl Fri Aug 22 17:05:38 2003 @@ -206,7 +206,7 @@ Power Management -!Ekernel/pm.c +!Ekernel/power/pm.c diff -Nru a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt --- a/Documentation/binfmt_misc.txt Fri Aug 22 17:05:44 2003 +++ b/Documentation/binfmt_misc.txt Fri Aug 22 17:05:44 2003 @@ -11,6 +11,9 @@ bits) you have supplied. Binfmt_misc can also recognise a filename extension aka '.com' or '.exe'. +First you must mount binfmt_misc: + mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc + To actually register a new binary type, you have to set up a string looking like :name:type:offset:magic:mask:interpreter: (where you can choose the ':' upon your needs) and echo it to /proc/sys/fs/binfmt_misc/register. diff -Nru a/Documentation/crypto/api-intro.txt b/Documentation/crypto/api-intro.txt --- a/Documentation/crypto/api-intro.txt Fri Aug 22 17:05:49 2003 +++ b/Documentation/crypto/api-intro.txt Fri Aug 22 17:05:49 2003 @@ -185,7 +185,7 @@ Matthew Skala (Twofish) Dag Arne Osvik (Serpent) Brian Gladman (AES) - + Kartikey Mahendra Bhatt (CAST6) SHA1 algorithm contributors: Jean-Francois Dive @@ -213,6 +213,9 @@ Herbert Valerio Riedel Kyle McMartin Adam J. Richter + +CAST5 algorithm contributors: + Kartikey Mahendra Bhatt (original developers unknown, FSF copyright). 
Generic scatterwalk code by Adam J. Richter diff -Nru a/Documentation/dnotify.txt b/Documentation/dnotify.txt --- a/Documentation/dnotify.txt Fri Aug 22 17:05:46 2003 +++ b/Documentation/dnotify.txt Fri Aug 22 17:05:46 2003 @@ -32,7 +32,8 @@ Preferably the application will choose one of the real time signals (SIGRTMIN + ) so that the notifications may be queued. This is -especially important if DN_MULTISHOT is specified. +especially important if DN_MULTISHOT is specified. Note that SIGRTMIN +is often blocked, so it is better to use (at least) SIGRTMIN + 1. Implementation expectations (features and bugs :-)) --------------------------- @@ -78,10 +79,10 @@ act.sa_sigaction = handler; sigemptyset(&act.sa_mask); act.sa_flags = SA_SIGINFO; - sigaction(SIGRTMIN, &act, NULL); + sigaction(SIGRTMIN + 1, &act, NULL); fd = open(".", O_RDONLY); - fcntl(fd, F_SETSIG, SIGRTMIN); + fcntl(fd, F_SETSIG, SIGRTMIN + 1); fcntl(fd, F_NOTIFY, DN_MODIFY|DN_CREATE|DN_MULTISHOT); /* we will now be notified if any of the files in "." is modified or new files are created */ diff -Nru a/Documentation/early-userspace/README b/Documentation/early-userspace/README --- /dev/null Wed Dec 31 16:00:00 1969 +++ b/Documentation/early-userspace/README Fri Aug 22 17:06:18 2003 @@ -0,0 +1,75 @@ +Early userspace support +======================= + +Last update: 2003-08-21 + + +"Early userspace" is a set of libraries and programs that provide +various pieces of functionality that are important enough to be +available while a Linux kernel is coming up, but that don't need to be +run inside the kernel itself. + +It consists of several major infrastructure components: + +- gen_init_cpio, a program that builds a cpio-format archive + containing a root filesystem image. This archive is compressed, and + the compressed image is linked into the kernel image. +- initramfs, a chunk of code that unpacks the compressed cpio image + midway through the kernel boot process. 
+- klibc, a userspace C library, currently packaged separately, that is + optimised for correctness and small size. + +The cpio file format used by initramfs is the "newc" (aka "cpio -c") +format, and is documented in the file "buffer-format.txt". If you +want to generate your own cpio files directly instead of hacking on +gen_init_cpio, you will need to short-circuit the build process in +usr/ so that gen_init_cpio does not get run, then simply pop your own +initramfs_data.cpio.gz file into place. + + +Where's this all leading? +========================= + +The klibc distribution contains some of the necessary software to make +early userspace useful. The klibc distribution is currently +maintained separately from the kernel, but this may change early in +the 2.7 era (it missed the boat for 2.5). + +You can obtain somewhat infrequent snapshots of klibc from +ftp://ftp.kernel.org/pub/linux/libs/klibc/ + +For active users, you are better off using the klibc BitKeeper +repositories, at http://klibc.bkbits.net/ + +The standalone klibc distribution currently provides three components, +in addition to the klibc library: + +- ipconfig, a program that configures network interfaces. It can + configure them statically, or use DHCP to obtain information + dynamically (aka "IP autoconfiguration"). +- nfsmount, a program that can mount an NFS filesystem. +- kinit, the "glue" that uses ipconfig and nfsmount to replace the old + support for IP autoconfig, mount a filesystem over NFS, and continue + system boot using that filesystem as root. + +kinit is built as a single statically linked binary to save space. 
+ +Eventually, several more chunks of kernel functionality will hopefully +move to early userspace: + +- Almost all of init/do_mounts* (the beginning of this is already in + place) +- ACPI table parsing +- Insert unwieldy subsystem that doesn't really need to be in kernel + space here + +If kinit doesn't meet your current needs and you've got bytes to burn, +the klibc distribution includes a small Bourne-compatible shell (ash) +and a number of other utilities, so you can replace kinit and build +custom initramfs images that meet your needs exactly. + +For questions and help, you can sign up for the early userspace +mailing list at http://www.zytor.com/mailman/listinfo/klibc + + +Bryan O'Sullivan diff -Nru a/Documentation/early-userspace/buffer-format.txt b/Documentation/early-userspace/buffer-format.txt --- /dev/null Wed Dec 31 16:00:00 1969 +++ b/Documentation/early-userspace/buffer-format.txt Fri Aug 22 17:06:18 2003 @@ -0,0 +1,112 @@ + initramfs buffer format + ----------------------- + + Al Viro, H. Peter Anvin + Last revision: 2002-01-13 + +Starting with kernel 2.5.x, the old "initial ramdisk" protocol is +getting {replaced/complemented} with the new "initial ramfs" +(initramfs) protocol. The initramfs contents is passed using the same +memory buffer protocol used by the initrd protocol, but the contents +is different. The initramfs buffer contains an archive which is +expanded into a ramfs filesystem; this document details the format of +the initramfs buffer format. + +The initramfs buffer format is based around the "newc" or "crc" CPIO +formats, and can be created with the cpio(1) utility. The cpio +archive can be compressed using gzip(1). One valid version of an +initramfs buffer is thus a single .cpio.gz file. 
+ +The full format of the initramfs buffer is defined by the following +grammar, where: + * is used to indicate "0 or more occurrences of" + (|) indicates alternatives + + indicates concatenation + GZIP() indicates the gzip(1) of the operand + ALGN(n) means padding with null bytes to an n-byte boundary + + initramfs := ("\0" | cpio_archive | cpio_gzip_archive)* + + cpio_gzip_archive := GZIP(cpio_archive) + + cpio_archive := cpio_file* + ( | cpio_trailer) + + cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data + + cpio_trailer := ALGN(4) + cpio_header + "TRAILER!!!\0" + ALGN(4) + + +In human terms, the initramfs buffer contains a collection of +compressed and/or uncompressed cpio archives (in the "newc" or "crc" +formats); arbitrary amounts of zero bytes (for padding) can be added +between members. + +The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is +not ignored; see "handling of hard links" below. + +The structure of the cpio_header is as follows (all fields contain +hexadecimal ASCII numbers fully padded with '0' on the left to the +full width of the field, for example, the integer 4780 is represented +by the ASCII string "000012ac"): + +Field name Field size Meaning +c_magic 6 bytes The string "070701" or "070702" +c_ino 8 bytes File inode number +c_mode 8 bytes File mode and permissions +c_uid 8 bytes File uid +c_gid 8 bytes File gid +c_nlink 8 bytes Number of links +c_mtime 8 bytes Modification time +c_filesize 8 bytes Size of data field +c_maj 8 bytes Major part of file device number +c_min 8 bytes Minor part of file device number +c_rmaj 8 bytes Major part of device node reference +c_rmin 8 bytes Minor part of device node reference +c_namesize 8 bytes Length of filename, including final \0 +c_chksum 8 bytes Checksum of data field if c_magic is 070702; + otherwise zero + +The c_mode field matches the contents of st_mode returned by stat(2) +on Linux, and encodes the file type and file permissions. 
+ +The c_filesize should be zero for any file which is not a regular file +or symlink. + +The c_chksum field contains a simple 32-bit unsigned sum of all the +bytes in the data field. cpio(1) refers to this as "crc", which is +clearly incorrect (a cyclic redundancy check is a different and +significantly stronger integrity check), however, this is the +algorithm used. + +If the filename is "TRAILER!!!" this is actually an end-of-archive +marker; the c_filesize for an end-of-archive marker must be zero. + + +*** Handling of hard links + +When a nondirectory with c_nlink > 1 is seen, the (c_maj,c_min,c_ino) +tuple is looked up in a tuple buffer. If not found, it is entered in +the tuple buffer and the entry is created as usual; if found, a hard +link rather than a second copy of the file is created. It is not +necessary (but permitted) to include a second copy of the file +contents; if the file contents is not included, the c_filesize field +should be set to zero to indicate no data section follows. If data is +present, the previous instance of the file is overwritten; this allows +the data-carrying instance of a file to occur anywhere in the sequence +(GNU cpio is reported to attach the data to the last instance of a +file only.) + +c_filesize must not be zero for a symlink. + +When a "TRAILER!!!" end-of-archive marker is seen, the tuple buffer is +reset. This permits archives which are generated independently to be +concatenated. + +To combine file data from different sources (without having to +regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of +the following techniques can be used: + +a) Separate the different file data sources with a "TRAILER!!!" + end-of-archive marker, or + +b) Make sure c_nlink == 1 for all nondirectory entries. 
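Reviewer note (not part of the patch): the header layout and grammar in buffer-format.txt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how gen_init_cpio or the kernel unpacker is implemented. The helper names `pad4`, `cpio_entry` and `parse_header` are hypothetical; only regular files and the trailer are handled; and the trailing padding after the data stands in for the leading ALGN(4) of the next entry, which assumes the archive itself starts on a 4-byte boundary.

```python
# Illustrative sketch of the "newc"/"crc" cpio header described above.
# All helper names are hypothetical; this is not kernel or gen_init_cpio code.

# The thirteen 8-byte hex fields that follow the 6-byte c_magic, in order.
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "maj", "min", "rmaj", "rmin", "namesize", "chksum")
HEADER_LEN = 6 + 8 * len(FIELDS)  # 110 bytes


def pad4(buf: bytes) -> bytes:
    """ALGN(4): pad with null bytes up to the next 4-byte boundary."""
    return buf + b"\0" * (-len(buf) % 4)


def cpio_entry(name: str, data: bytes = b"", *, mode: int = 0o100644,
               ino: int = 1, nlink: int = 1, crc: bool = False) -> bytes:
    """Build one cpio_file member: header + filename + "\\0" + ALGN(4) + data."""
    name_z = name.encode("ascii") + b"\0"
    values = dict.fromkeys(FIELDS, 0)
    values.update(ino=ino, mode=mode, nlink=nlink,
                  filesize=len(data), namesize=len(name_z))
    if crc:
        # "crc" format: a plain 32-bit unsigned sum of the data bytes.
        values["chksum"] = sum(data) & 0xFFFFFFFF
    magic = b"070702" if crc else b"070701"
    header = magic + "".join("%08x" % values[f] for f in FIELDS).encode("ascii")
    return pad4(header + name_z) + pad4(data)


def parse_header(buf: bytes) -> dict:
    """Decode the 110-byte header at the start of buf into integers."""
    fields = {"magic": buf[:6].decode("ascii")}
    for i, f in enumerate(FIELDS):
        start = 6 + 8 * i
        fields[f] = int(buf[start:start + 8], 16)
    return fields


# An uncompressed archive with one file, closed by the end-of-archive marker
# (whose c_filesize must be zero).
archive = (cpio_entry("init", b"#!/bin/sh\n", mode=0o100755)
           + cpio_entry("TRAILER!!!", mode=0, ino=0))
```

Because every entry built this way is already a multiple of four bytes long, independently generated archives can simply be concatenated, which is the case the "TRAILER!!!" tuple-buffer reset is meant to support.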
diff -Nru a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking --- a/Documentation/filesystems/Locking Fri Aug 22 17:05:52 2003 +++ b/Documentation/filesystems/Locking Fri Aug 22 17:05:52 2003 @@ -28,8 +28,9 @@ --------------------------- inode_operations --------------------------- prototypes: - int (*create) (struct inode *,struct dentry *,int); - struct dentry * (*lookup) (struct inode *,struct dentry *); + int (*create) (struct inode *,struct dentry *,int, struct nameidata *); + struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameid +ata *); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct inode *,struct dentry *,const char *); @@ -38,13 +39,13 @@ int (*mknod) (struct inode *,struct dentry *,int,dev_t); int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *); - int (*readlink) (struct dentry *, char *,int); + int (*readlink) (struct dentry *, char __user *,int); int (*follow_link) (struct dentry *, struct nameidata *); void (*truncate) (struct inode *); - int (*permission) (struct inode *, int); + int (*permission) (struct inode *, int, struct nameidata *); int (*setattr) (struct dentry *, struct iattr *); int (*getattr) (struct vfsmount *, struct dentry *, struct kstat *); - int (*setxattr) (struct dentry *, const char *, void *, size_t, int); + int (*setxattr) (struct dentry *, const char *,const void *,size_t,int); ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t); ssize_t (*listxattr) (struct dentry *, char *, size_t); int (*removexattr) (struct dentry *, const char *); @@ -85,42 +86,55 @@ --------------------------- super_operations --------------------------- prototypes: + struct inode *(*alloc_inode)(struct super_block *sb); + void (*destroy_inode)(struct inode *); void (*read_inode) (struct inode *); + void (*dirty_inode) (struct inode *); void (*write_inode) (struct inode *, int); void 
(*put_inode) (struct inode *); void (*drop_inode) (struct inode *); void (*delete_inode) (struct inode *); void (*put_super) (struct super_block *); void (*write_super) (struct super_block *); - int (*sync_fs) (struct super_block *sb, int wait); - int (*statfs) (struct super_block *, struct statfs *); + int (*sync_fs)(struct super_block *sb, int wait); + void (*write_super_lockfs) (struct super_block *); + void (*unlockfs) (struct super_block *); + int (*statfs) (struct super_block *, struct kstatfs *); int (*remount_fs) (struct super_block *, int *, char *); void (*clear_inode) (struct inode *); void (*umount_begin) (struct super_block *); + int (*show_options)(struct seq_file *, struct vfsmount *); locking rules: All may block. - BKL s_lock mount_sem -read_inode: yes (see below) -write_inode: no -put_inode: no -drop_inode: no !!!inode_lock!!! -delete_inode: no -clear_inode: no -put_super: yes yes maybe (see below) -write_super: no yes maybe (see below) -sync_fs: no no maybe (see below) -statfs: no no no -remount_fs: yes yes maybe (see below) -umount_begin: yes no maybe (see below) + BKL s_lock s_umount +alloc_inode: no no no +destroy_inode: no +read_inode: no (see below) +dirty_inode: no (must not sleep) +write_inode: no +put_inode: no +drop_inode: no !!!inode_lock!!! +delete_inode: no +put_super: yes yes no +write_super: no yes read +sync_fs: no no read +write_super_lockfs: ? +unlockfs: ? +statfs: no no no +remount_fs: no yes maybe (see below) +clear_inode: no +umount_begin: yes no no +show_options: no (vfsmount->sem) ->read_inode() is not a method - it's a callback used in iget(). -rules for mount_sem are not too nice - it is going to die and be replaced -by better scheme anyway. +->remount_fs() will have the s_umount lock if it's already mounted. +When called from get_sb_single, it does NOT have the s_umount lock. 
--------------------------- file_system_type --------------------------- prototypes: - struct super_block *(*get_sb) (struct file_system_type *, int, const char *, void *); + struct super_block *(*get_sb) (struct file_system_type *, int, + const char *, void *); void (*kill_sb) (struct super_block *); locking rules: may block BKL @@ -128,7 +142,7 @@ kill_sb yes yes ->get_sb() returns error or a locked superblock (exclusive on ->s_umount). -->kill_sb() takes a locked superblock, does all shutdown work on it, +->kill_sb() takes a write-locked superblock, does all shutdown work on it, unlocks and drops the reference. --------------------------- address_space_operations -------------------------- @@ -138,12 +152,15 @@ int (*sync_page)(struct page *); int (*writepages)(struct address_space *, struct writeback_control *); int (*set_page_dirty)(struct page *page); + int (*readpages)(struct file *filp, struct address_space *mapping, + struct list_head *pages, unsigned nr_pages); int (*prepare_write)(struct file *, struct page *, unsigned, unsigned); int (*commit_write)(struct file *, struct page *, unsigned, unsigned); - int (*bmap)(struct address_space *, long); + sector_t (*bmap)(struct address_space *, sector_t); int (*invalidatepage) (struct page *, unsigned long); int (*releasepage) (struct page *, int); - int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int); + int (*direct_IO)(int, struct kiocb *, const struct iovec *iov, + loff_t offset, unsigned long nr_segs); locking rules: All except set_page_dirty may block @@ -151,15 +168,16 @@ BKL PageLocked(page) writepage: no yes, unlocks (see below) readpage: no yes, unlocks -readpages: no sync_page: no maybe writepages: no set_page_dirty no no +readpages: no prepare_write: no yes commit_write: no yes bmap: yes invalidatepage: no yes releasepage: no yes +direct_IO: no ->prepare_write(), ->commit_write(), ->sync_page() and ->readpage() may be called from the request handler (/dev/loop). 
@@ -253,8 +271,8 @@ locking rules: BKL may block fl_notify: yes no -fl_insert: yes maybe -fl_remove: yes maybe +fl_insert: yes no +fl_remove: yes no Currently only NLM provides instances of this class. None of the them block. If you have out-of-tree instances - please, show up. Locking in that area will change. @@ -274,57 +292,75 @@ int (*open) (struct inode *, struct file *); int (*release) (struct inode *, struct file *); int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long); - int (*check_media_change) (kdev_t); - int (*revalidate) (kdev_t); + int (*media_changed) (struct gendisk *); + int (*revalidate_disk) (struct gendisk *); + locking rules: BKL bd_sem open: yes yes release: yes yes ioctl: yes no -check_media_change: yes no -revalidate: yes no +media_changed: no no +revalidate_disk: no no -The last two are called only from check_disk_change(). Prototypes are very -bad - as soon as we'll get disk_struct they will change (and methods will -become per-disk instead of per-partition). +The last two are called only from check_disk_change(). 
--------------------------- file_operations ------------------------------- prototypes: loff_t (*llseek) (struct file *, loff_t, int); - ssize_t (*read) (struct file *, char *, size_t, loff_t *); - ssize_t (*write) (struct file *, const char *, size_t, loff_t *); + ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); + ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t); + ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); + ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t, + loff_t); int (*readdir) (struct file *, void *, filldir_t); unsigned int (*poll) (struct file *, struct poll_table_struct *); - int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); + int (*ioctl) (struct inode *, struct file *, unsigned int, + unsigned long); int (*mmap) (struct file *, struct vm_area_struct *); int (*open) (struct inode *, struct file *); int (*flush) (struct file *); int (*release) (struct inode *, struct file *); int (*fsync) (struct file *, struct dentry *, int datasync); + int (*aio_fsync) (struct kiocb *, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); - ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *); - ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *); + ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, + loff_t *); + ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, + loff_t *); + ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, + void __user *); + ssize_t (*sendpage) (struct file *, struct page *, int, size_t, + loff_t *, int); + unsigned long (*get_unmapped_area)(struct file *, unsigned long, + unsigned long, unsigned long, unsigned long); }; locking rules: All except ->poll() may block. 
- BKL -llseek: yes (see below) -read: no -write: no -readdir: no -poll: no -ioctl: yes (see below) -mmap: no -open: maybe (see below) -flush: no -release: no -fsync: yes (see below) -fasync: yes (see below) -lock: yes -readv: no -writev: no + BKL +llseek: no (see below) +read: no +aio_read: no +write: no +aio_write: no +readdir: no +poll: no +ioctl: yes (see below) +mmap: no +open: maybe (see below) +flush: no +release: no +fsync: no (see below) +aio_fsync: no +fasync: yes (see below) +lock: yes +readv: no +writev: no +sendfile: no +sendpage: no +get_unmapped_area: no ->llseek() locking has moved from llseek to the individual llseek implementations. If your fs is not using generic_file_llseek, you diff -Nru a/Documentation/firmware_class/README b/Documentation/firmware_class/README --- a/Documentation/firmware_class/README Fri Aug 22 17:05:49 2003 +++ b/Documentation/firmware_class/README Fri Aug 22 17:05:49 2003 @@ -15,6 +15,71 @@ 3) Some people, like the Debian crowd, don't consider some firmware free enough and remove entire drivers (e.g.: keyspan). + High level behavior (mixed): + ============================ + + kernel(driver): calls request_firmware(&fw_entry, $FIRMWARE, device) + + userspace: + - /sys/class/firmware/xxx/{loading,data} appear. + - hotplug gets called with a firmware identifier in $FIRMWARE + and the usual hotplug environment. + - hotplug: echo 1 > /sys/class/firmware/xxx/loading + + kernel: Discard any previous partial load. + + userspace: + - hotplug: cat appropriate_firmware_image > \ + /sys/class/firmware/xxx/data + + kernel: grows a buffer in PAGE_SIZE increments to hold the image as it + comes in. + + userspace: + - hotplug: echo 0 > /sys/class/firmware/xxx/loading + + kernel: request_firmware() returns and the driver has the firmware + image in fw_entry->{data,size}. If something went wrong + request_firmware() returns non-zero and fw_entry is set to + NULL. 
+ + kernel(driver): Driver code calls release_firmware(fw_entry) releasing + the firmware image and any related resource. + + High level behavior (driver code): + ================================== + + if(request_firmware(&fw_entry, $FIRMWARE, device) == 0) + copy_fw_to_device(fw_entry->data, fw_entry->size); + release_firmware(fw_entry); + + Sample/simple hotplug script: + ============================ + + # Both $DEVPATH and $FIRMWARE are already provided in the environment. + + HOTPLUG_FW_DIR=/usr/lib/hotplug/firmware/ + + echo 1 > /sysfs/$DEVPATH/loading + cat $HOTPLUG_FW_DIR/$FIRMWARE > /sysfs/$DEVPATH/data + echo 0 > /sysfs/$DEVPATH/loading + + Random notes: + ============ + + - "echo -1 > /sys/class/firmware/xxx/loading" will cancel the load at + once and make request_firmware() return with error. + + - firmware_data_read() and firmware_loading_show() are just provided + for testing and completeness, they are not called in normal use. + + - There is also /sys/class/firmware/timeout which holds a timeout in + seconds for the whole load operation. + + - request_firmware_nowait() is also provided for convenience in + non-user contexts. + + about in-kernel persistence: --------------------------- Under some circumstances, as explained below, it would be interesting to keep @@ -56,3 +121,4 @@ Note: If persistence is implemented on top of initramfs, register_firmware() may not be appropriate. 
+ diff -Nru a/Documentation/hw_random.txt b/Documentation/hw_random.txt --- a/Documentation/hw_random.txt Fri Aug 22 17:05:48 2003 +++ b/Documentation/hw_random.txt Fri Aug 22 17:05:48 2003 @@ -1,17 +1,17 @@ - Hardware driver for Intel i810 Random Number Generator (RNG) + Hardware driver for Intel/AMD/VIA Random Number Generators (RNG) Copyright 2000,2001 Jeff Garzik Copyright 2000,2001 Philipp Rumpf Introduction: - The i810_rng device driver is software that makes use of a - special hardware feature on the Intel i8xx-based chipsets, + The hw_random device driver is software that makes use of a + special hardware feature on your CPU or motherboard, a Random Number Generator (RNG). In order to make effective use of this device driver, you should download the support software as well. Download the - latest version of the "intel-rng-tools" package from the - i810_rng driver's official Web site: + latest version of the "rng-tools" package from the + hw_random driver's official Web site: http://sourceforge.net/projects/gkernel/ @@ -29,14 +29,14 @@ Character driver. Using the standard open() and read() system calls, you can read random data from - the i810 RNG device. This data is NOT CHECKED by any + the hardware RNG device. This data is NOT CHECKED by any fitness tests, and could potentially be bogus (if the hardware is faulty or has been tampered with). Data is only output if the hardware "has-data" flag is set, but nevertheless a security-conscious person would run fitness tests on the data before assuming it is truly random. - /dev/intel_rng is char device major 10, minor 183. + /dev/hwrandom is char device major 10, minor 183. Driver notes: @@ -69,6 +69,10 @@ did the "brains" and all the testing. Change history: + + Version 1.0.0: + * Merge Intel, AMD, VIA RNG drivers into one. + Further changelog in BitKeeper. 
Version 0.9.8: * Support other i8xx chipsets by adding 82801E detection diff -Nru a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt --- a/Documentation/kbuild/kconfig-language.txt Fri Aug 22 17:05:43 2003 +++ b/Documentation/kbuild/kconfig-language.txt Fri Aug 22 17:05:43 2003 @@ -124,8 +124,8 @@ '!=' (3) '(' ')' (4) '!' (5) - '||' (6) - '&&' (7) + '&&' (6) + '||' (7) Expressions are listed in decreasing order of precedence. diff -Nru a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt --- a/Documentation/kernel-parameters.txt Fri Aug 22 17:06:18 2003 +++ b/Documentation/kernel-parameters.txt Fri Aug 22 17:06:18 2003 @@ -85,7 +85,10 @@ See also Documentation/scsi/ncr53c7xx.txt. acpi= [HW,ACPI] Advanced Configuration and Power Interface - Format: off[,<...>] + Format: { force | off | ht } + force -- enables ACPI for systems with default off + off -- disables ACPI for systems with default on + ht -- run only enough ACPI to enable Hyper Threading See also Documentation/pm.txt. ad1816= [HW,OSS] @@ -436,6 +439,8 @@ l2cr= [PPC] + lapic [IA-32,APIC] Enable the local APIC even if BIOS disabled it. + lasi= [HW,SCSI] PARISC LASI driver for the 53c700 chip Format: addr:,irq: @@ -613,8 +618,6 @@ instruction doesn't work correctly and not to use it. - noht [SMP,IA-32] Disables P4 Xeon(tm) HyperThreading. - noirqdebug [IA-32] Disables the code which attempts to detect and disable unhandled interrupt sources. @@ -624,6 +627,8 @@ initial RAM disk. nointroute [IA-64] + + nolapic [IA-32,APIC] Do not enable or use the local APIC. 
nomce [IA-32] Machine Check Exception diff -Nru a/Documentation/networking/8139too.txt b/Documentation/networking/8139too.txt --- a/Documentation/networking/8139too.txt Fri Aug 22 17:05:47 2003 +++ b/Documentation/networking/8139too.txt Fri Aug 22 17:05:47 2003 @@ -93,6 +93,7 @@ --------------- AOpen ALN-325C AT-2500TX 10/100 PCI Fast Ethernet Network Adapter Card +D-Link DFE-530TX Cnet CNF401 'SinglePoint' 10/100 Base-TX Genius GF 100TXR4 Fast Ethernet 10/100M PCI Network Card KTI KF-230TX diff -Nru a/Documentation/networking/driver.txt b/Documentation/networking/driver.txt --- a/Documentation/networking/driver.txt Fri Aug 22 17:05:51 2003 +++ b/Documentation/networking/driver.txt Fri Aug 22 17:05:51 2003 @@ -82,3 +82,13 @@ 1) Any hardware layer address you obtain for your device should be verified. For example, for ethernet check it with linux/etherdevice.h:is_valid_ether_addr() + +Close/stop guidelines: + +1) After the dev->stop routine has been called, the hardware must + not receive or transmit any data. All in flight packets must + be aborted. If necessary, poll or wait for completion of + any reset commands. + +2) The dev->stop routine will be called by unregister_netdevice + if device is still UP. diff -Nru a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt --- a/Documentation/networking/netdevices.txt Fri Aug 22 17:05:42 2003 +++ b/Documentation/networking/netdevices.txt Fri Aug 22 17:05:42 2003 @@ -7,6 +7,18 @@ The following is a random collection of documentation regarding network devices. +struct net_device allocation rules +================================== +Network device structures need to persist even after module is unloaded and +must be allocated with kmalloc. If device has registered successfully, +it will be freed on last use by free_netdev. This is required to handle the +pathologic case cleanly (example: rmmod mydriver priv) then it is up to the module exit handler to free that. 
struct net_device synchronization rules diff -Nru a/Documentation/nmi_watchdog.txt b/Documentation/nmi_watchdog.txt --- a/Documentation/nmi_watchdog.txt Fri Aug 22 17:05:41 2003 +++ b/Documentation/nmi_watchdog.txt Fri Aug 22 17:05:41 2003 @@ -12,7 +12,7 @@ NMI interrupts, the kernel can monitor whether any CPU has locked up, and print out debugging messages if so. -In order to use the NMI watchdoc, you need to have APIC support in your +In order to use the NMI watchdog, you need to have APIC support in your kernel. For SMP kernels, APIC support gets compiled in automatically. For UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and diff -Nru a/Documentation/power/devices.txt b/Documentation/power/devices.txt --- /dev/null Wed Dec 31 16:00:00 1969 +++ b/Documentation/power/devices.txt Fri Aug 22 17:06:18 2003 @@ -0,0 +1,143 @@ + +Device Power Management + + +Device power management encompasses two areas - the ability to save +state and transition a device to a low-power state when the system is +entering a low-power state; and the ability to transition a device to +a low-power state while the system is running (and independently of +any other power management activity). + + +Methods + +The methods to suspend and resume devices reside in struct bus_type: + +struct bus_type { + ... + int (*suspend)(struct device * dev, u32 state); + int (*resume)(struct device * dev); +}; + +Each bus driver is responsible for implementing these methods, translating +the call into a bus-specific request and forwarding the call to the +bus-specific drivers. For example, PCI drivers implement suspend() and +resume() methods in struct pci_driver. The PCI core is simply +responsible for translating the pointers to PCI-specific ones and +calling the low-level driver. 
+ +This is done to a) ease transition to the new power management methods +and leverage the existing PM code in various bus drivers; b) allow +buses to implement generic and default PM routines for devices, and c) +make the flow of execution obvious to the reader. + + +System Power Management + +When the system enters a low-power state, the device tree is walked in +a depth-first fashion to transition each device into a low-power +state. The ordering of the device tree is guaranteed by the order in +which devices get registered - children are never registered before +their ancestors, and devices are placed at the back of the list when +registered. By walking the list in reverse order, we are guaranteed to +suspend devices in the proper order. + +Devices are suspended once with interrupts enabled. Drivers are +expected to stop I/O transactions, save device state, and place the +device into a low-power state. Drivers may sleep, allocate memory, +etc. at will. + +Some devices are broken and will inevitably have problems powering +down or disabling themselves with interrupts enabled. For these +special cases, they may return -EAGAIN. This will put the device on a +list to be taken care of later. When interrupts are disabled, before +we enter the low-power state, their drivers are called again to put +their device to sleep. + +On resume, the devices that returned -EAGAIN will be called to power +themselves back on with interrupts disabled. Once interrupts have been +re-enabled, the rest of the drivers will be called to resume their +devices. On resume, a driver is responsible for powering back on each +device, restoring state, and re-enabling I/O transactions for that +device. + +System devices follow a slightly different API, which can be found in + + include/linux/sysdev.h + drivers/base/sys.c + +System devices will only be suspended with interrupts disabled, and +after all other devices have been suspended. 
On resume, they will be +resumed before any other devices, and also with interrupts disabled. + + +Runtime Power Management + +Many devices are able to dynamically power down while the system is +still running. This feature is useful for devices that are not being +used, and can offer significant power savings on a running system. + +In each device's directory, there is a 'power' directory, which +contains at least a 'state' file. Reading from this file displays what +power state the device is currently in. Writing to this file initiates +a transition to the specified power state, which must be a decimal number in +the range 1-3, inclusive; or 0 for 'On'. + +The PM core will call the ->suspend() method in the bus_type object +that the device belongs to if the specified state is not 0, or +->resume() if it is. + +Nothing will happen if the specified state is the same state the +device is currently in. + +If the device is already in a low-power state, and the specified state +is a different low-power state, the ->resume() method will +first be called to power the device back on, then ->suspend() will be +called again with the new state. + +The driver is responsible for saving the working state of the device +and putting it into the low-power state specified. If this was +successful, it returns 0, and the device's power_state field is +updated. + +The driver must take care to know whether or not it is able to +properly resume the device, including all steps of reinitialization +necessary. (This is the hardest part, and the one most protected by +NDA'd documents). + +The driver must also take care not to suspend a device that is +currently in use. It is the driver's responsibility to provide its own +exclusion mechanisms. + +The runtime power transition happens with interrupts enabled.
If a +device cannot support being powered down with interrupts enabled, it may +return -EAGAIN (as it would during a system power management +transition), but it will _not_ be called again, and the transition +will fail. + +There is currently no way to know what states a device or driver +supports a priori. This will change in the future. + + +Driver Detach Power Management + +The kernel now supports the ability to place a device in a low-power +state when it is detached from its driver, which happens when its +module is removed. + +Each device contains a 'detach_state' file in its sysfs directory +which can be used to control this state. Reading from this file +displays what the current detach state is set to. This is 0 (On) by +default. A user may write a positive integer value to this file in the +range of 1-4 inclusive. + +A value of 1-3 will indicate the device should be placed in that +low-power state, which will cause ->suspend() to be called for that +device. A value of 4 indicates that the device should be shut down, so +->shutdown() will be called for that device. + +The driver is responsible for reinitializing the device when the +module is re-inserted during its ->probe() (or equivalent) method. +The driver core will not call any extra functions when binding the +device to the driver. + diff -Nru a/Documentation/power/interface.txt b/Documentation/power/interface.txt --- /dev/null Wed Dec 31 16:00:00 1969 +++ b/Documentation/power/interface.txt Fri Aug 22 17:06:18 2003 @@ -0,0 +1,43 @@ +Power Management Interface + + +The power management subsystem provides a unified sysfs interface to +userspace, regardless of what architecture or platform one is +running. The interface exists in the /sys/power/ directory (assuming sysfs +is mounted at /sys). + +/sys/power/state controls the system power state. Reading from this file +returns what states are supported, which is hard-coded to 'standby' +(Power-On Suspend), 'mem' (Suspend-to-RAM), and 'disk' +(Suspend-to-Disk).
+ +Writing one of those strings to this file causes the system to +transition into that state. Please see the file +Documentation/power/states.txt for a description of each of those +states. + + +/sys/power/disk controls the operating mode of the suspend-to-disk +mechanism. Suspend-to-disk can be handled in several ways. The +greatest distinction is who writes memory to disk - the firmware or +the kernel. If the firmware does it, we assume that it also handles +suspending the system. + +If the kernel does it, then we have three options for putting the system +to sleep - using the platform driver (e.g. ACPI or other PM +registers), powering off the system or rebooting the system (for +testing). The system will support either 'firmware' or 'platform', and +that is known a priori. However, the user may choose 'shutdown' or +'reboot' as alternatives. + +Reading from this file will display what the mode is currently set +to. Writing to this file will accept one of + + 'firmware' + 'platform' + 'shutdown' + 'reboot' + +It will only change to 'firmware' or 'platform' if the system supports +it. + diff -Nru a/Documentation/power/states.txt b/Documentation/power/states.txt --- /dev/null Wed Dec 31 16:00:00 1969 +++ b/Documentation/power/states.txt Fri Aug 22 17:06:18 2003 @@ -0,0 +1,79 @@ + +System Power Management States + + +The kernel supports three power management states generically, though +each is dependent on platform support code to implement the low-level +details for each state. This file describes each state, what it is +commonly called, what ACPI state it maps to, and what string to write +to /sys/power/state to enter that state. + + +State: Standby / Power-On Suspend +ACPI State: S1 +String: "standby" + +This state offers minimal, though real, power savings, while providing +a very low-latency transition back to a working system. No operating +state is lost (the CPU retains power), so the system easily starts up +again where it left off.
+ +We try to put devices in a low-power state equivalent to D1, which +also offers low power savings, but low resume latency. Not all devices +support D1, and those that don't are left on. + +A transition from Standby to the On state should take about 1-2 +seconds. + + +State: Suspend-to-RAM +ACPI State: S3 +String: "mem" + +This state offers significant power savings as everything in the +system is put into a low-power state, except for memory, which is +placed in self-refresh mode to retain its contents. + +System and device state is saved and kept in memory. All devices are +suspended and put into D3. In many cases, all peripheral buses lose +power when entering STR, so devices must be able to handle the +transition back to the On state. + +For at least ACPI, STR requires some minimal boot-strapping code to +resume the system from STR. This may also be true on other platforms. + +A transition from Suspend-to-RAM to the On state should take about +3-5 seconds. + + +State: Suspend-to-Disk +ACPI State: S4 +String: "disk" + +This state offers the greatest power savings, and can be used even in +the absence of low-level platform support for power management. This +state operates similarly to Suspend-to-RAM, but includes a final step +of writing memory contents to disk. On resume, this is read and memory +is restored to its pre-suspend state. + +STD can be handled by the firmware or the kernel. If it is handled by +the firmware, it usually requires a dedicated partition that must be +set up via another operating system for it to use. Despite the +inconvenience, this method requires minimal work by the kernel, since +the firmware will also handle restoring memory contents on resume. + +If the kernel is responsible for persistently saving state, a mechanism +called 'swsusp' (Swap Suspend) is used to write memory contents to +free swap space. swsusp has some restrictive requirements, but should +work in most cases.
Some, albeit outdated, documentation can be found +in Documentation/power/swsusp.txt. + +Once memory state is written to disk, the system may either enter a +low-power state (like ACPI S4), or it may simply power down. Powering +down offers greater savings, and allows this mechanism to work on any +system. However, entering a real low-power state allows the user to +trigger wake up events (e.g. pressing a key or opening a laptop lid). + +A transition from Suspend-to-Disk to the On state should take about 30 +seconds, though it's typically a bit more with the current +implementation. diff -Nru a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt --- /dev/null Wed Dec 31 16:00:00 1969 +++ b/Documentation/power/swsusp.txt Fri Aug 22 17:05:38 2003 @@ -0,0 +1,196 @@ +From kernel/suspend.c: + + * BIG FAT WARNING ********************************************************* + * + * If you have unsupported (*) devices using DMA... + * ...say goodbye to your data. + * + * If you touch anything on disk between suspend and resume... + * ...kiss your data goodbye. + * + * If your disk driver does not support suspend... (IDE does) + * ...you'd better find out how to get along + * without your data. + * + * (*) pm interface support is needed to make it safe. + +You need to append resume=/dev/your_swap_partition to the kernel command +line. Then suspend with echo 4 > /proc/acpi/sleep. + +[Notice: the rest of this document is pretty outdated (see date!). It should be safe to +use swsusp on ext3/reiserfs these days.] + + +Article about goals and implementation of Software Suspend for Linux +Author: Gábor Kuti +Last revised: 2002-04-08 + +Idea and goals to achieve + +Nowadays it is common for laptops to have a suspend button. It +saves the state of the machine to a filesystem or to a partition and switches +to standby mode. Later, when the machine is resumed, the saved state is loaded back into +RAM and the machine can continue its work. It has two real benefits.
First, we +save the time the machine takes to go down and later boot up, which costs +a lot of energy when running from batteries. The other gain is that we don't have to +interrupt our programs, so processes that are calculating something for a long +time don't need to be written to be interruptible. + +On desktop machines the power saving function isn't as important as it is in +laptops but we really may benefit from the second one. Nowadays the number of +desktop machines supporting a suspend function in their APM is going up but there +are (and there will still be for a long time) machines that don't even support +APM of any kind. On the other hand, it is reported that when using APM's suspend +some IRQs (e.g. the ATA disk IRQ) are lost, which is annoying for the user until +the Linux kernel resets the device. + +So I started thinking about implementing Software Suspend, which doesn't need +any APM support and - since it uses almost only high-level routines - is +supposed to be architecture-independent code. + +Using the code + +The code is experimental right now - testers and extra eyes are welcome. To +compile this support into the kernel, you need to enable CONFIG_EXPERIMENTAL, +and then CONFIG_SOFTWARE_SUSPEND in the General Setup menu. It +cannot be built as a module, and I don't think that will ever be needed. + +You have two ways to use this code. The first: if you've compiled in +sysrq support, you may press Sysrq-D to request suspend. The other way +is with a patched SysVinit (my patch is against 2.76 and available at my +home page). You might call 'swsusp' or 'shutdown -z