## Automatically generated incremental diff
## From: linux-2.5.1-pre10
## To: linux-2.5.1-pre11
## Robot: $Id: make-incremental-diff,v 1.9 2001/12/10 00:06:56 hpa Exp $
diff -urN linux-2.5.1-pre10/Documentation/driver-model.txt linux/Documentation/driver-model.txt
--- linux-2.5.1-pre10/Documentation/driver-model.txt	Wed Dec 31 16:00:00 1969
+++ linux/Documentation/driver-model.txt	Wed Dec 12 23:32:28 2001
@@ -0,0 +1,598 @@
+The (New) Linux Kernel Driver Model
+
+Version 0.04
+
+Patrick Mochel
+
+03 December 2001
+
+
+Overview
+~~~~~~~~
+
+This driver model is a unification of all the disparate driver models
+currently in the kernel. It is intended to augment the bus-specific drivers
+for bridges and devices by consolidating a set of data and operations into
+globally accessible data structures.
+
+Current driver models implement some sort of tree-like structure (sometimes
+just a list) for the devices they control. But, there is no linkage between
+the different bus types.
+
+A common data structure can provide this linkage with little overhead: when a
+bus driver discovers a particular device, it can insert it into the global
+tree as well as its local tree. In fact, the local tree becomes just a subset
+of the global tree.
+
+Common data fields can also be moved out of the local bus models into the
+global model. Some of the manipulation of these fields can also be
+consolidated. Most likely, manipulation functions will become a set
+of helper functions, which the bus drivers wrap to include any
+bus-specific items.
+
+The common device and bridge interface currently reflects the goals of the
+modern PC: namely the ability to do seamless Plug and Play, power management,
+and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) assures
+us that any device in the system may fit any of these criteria.)
+
+In reality, not every bus will be able to support such operations. But, most
+buses will support a majority of those operations, and all future buses will.
+In other words, a bus that doesn't support an operation is the exception,
+instead of the other way around.
+
+
+Drivers
+~~~~~~~
+
+The callbacks for bridges and devices are intended to be singular for a
+particular type of bus. For each type of bus that has support compiled into
+the kernel, there should be one statically allocated structure with the
+appropriate callbacks that each device (or bridge) of that type shares.
+
+Each bus layer should implement the callbacks for these drivers. It then
+forwards the calls on to the device-specific callbacks. This means that
+device-specific drivers must still implement callbacks for each operation.
+But, they are not called from the top-level driver layer.
+
+This does add another layer of indirection for calling one of these functions,
+but there are benefits that are believed to outweigh this slowdown.
+
+First, it prevents device-specific drivers from having to know about the
+global device layer. This speeds up integration time incredibly. It also
+allows drivers to be more portable across kernel versions. Note that the
+former was intentional; the latter is an added bonus.
+
+Second, this added indirection allows the bus to perform any additional logic
+necessary for its child devices. A bus layer may add additional information to
+the call, or translate it into something meaningful for its children.
+
+This could be done in the driver, but if it happens for every object of a
+particular type, it is best done at a higher level.
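As a concrete illustration of this indirection, here is a minimal sketch of a
bus layer forwarding the global probe() operation (part of struct
device_driver, described under "Device Structures" below) to a device-specific
driver. The helper names to_pci_dev() and pci_find_driver() are hypothetical,
and the shape of struct pci_driver is assumed; this is not code from the patch
itself:

/* One statically allocated struct device_driver serves every PCI device;
 * each callback locates the device-specific driver and forwards the call,
 * adding any bus-specific logic on the way. */
static int pci_device_probe(struct device * dev)
{
	struct pci_dev * pci_dev = to_pci_dev(dev);		/* hypothetical */
	struct pci_driver * drv = pci_find_driver(pci_dev);	/* hypothetical */

	/* Bus-specific setup (enabling the device, translating the bus ID,
	 * etc.) would happen here, before the call is forwarded. */
	if (drv && drv->probe)
		return drv->probe(pci_dev);
	return -ENODEV;
}

static struct device_driver pci_device_ops = {
	probe:	pci_device_probe,
	/* remove, suspend and resume forward in the same way. */
};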
+
+Recap
+~~~~~
+
+Instances of devices and bridges are allocated dynamically as the system
+discovers their existence. Their fields describe the individual object.
+Drivers - in the global sense - are statically allocated and singular for a
+particular type of bus. They describe a set of operations that every type of
+bus could implement, the implementation following the bus's semantics.
+
+
+Downstream Access
+~~~~~~~~~~~~~~~~~
+
+Common data fields have been moved out of individual bus layers into a common
+data structure. But, these fields must still be accessed by the bus layers,
+and sometimes by the device-specific drivers.
+
+Other bus layers are encouraged to do what has been done for the PCI layer.
+struct pci_dev now looks like this:
+
+struct pci_dev {
+	...
+
+	struct device device;
+};
+
+Note first that it is statically allocated. This means only one allocation on
+device discovery. Note also that it is at the _end_ of struct pci_dev. This is
+to make people think about what they're doing when switching between the bus
+driver and the global driver, and to prevent mindless casts between the two.
+
+The PCI bus layer freely accesses the fields of struct device. It knows about
+the structure of struct pci_dev, and it should know the structure of struct
+device. PCI drivers that have been converted generally do not touch the fields
+of struct device. More precisely, device-specific drivers should not touch the
+fields of struct device unless there is a strong, compelling reason to do so.
+
+This abstraction prevents unnecessary pain during transitional phases. If the
+name of a field changes, or the field is removed, then every downstream driver
+will break. On the other hand, if only the bus layer (and not the device
+layer) accesses struct device, it is only those layers that need to change.
+
+
+User Interface
+~~~~~~~~~~~~~~
+
+By virtue of having a complete hierarchical view of all the devices in the
+system, exporting a complete hierarchical view to userspace becomes relatively
+easy.
+
+Whenever a device is inserted into the tree, a directory is created for it.
+This directory may be populated at each layer of discovery - the global layer,
+the bus layer, or the device layer.
+
+The global layer currently creates two files - 'status' and 'power'. The
+former only reports the name of the device and its bus ID. The latter reports
+the current power state of the device. It can also be used to set the current
+power state.
+
+The bus layer may also create files for the devices it finds while probing the
+bus. For example, the PCI layer currently creates 'wake' and 'resource' files
+for each PCI device.
+
+A device-specific driver may also export files in its directory to expose
+device-specific data or tunable interfaces.
+
+These features were initially implemented using procfs. However, after one
+conversation with Linus, a new filesystem - driverfs - was created to
+implement these features. It is an in-memory filesystem, based heavily on
+ramfs, though it uses procfs as inspiration for its callback functionality.
+
+Each struct device has a 'struct driver_dir_entry' which encapsulates the
+device's directory and the files within.
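Because struct device sits at the end of struct pci_dev, going from the
embedded struct device back to its containing struct pci_dev is simple pointer
arithmetic. A minimal sketch, assuming the layout shown above; the
to_pci_dev() macro is illustrative and not part of the patch:

#include <stddef.h>	/* offsetof */

/* Recover the containing struct pci_dev from its embedded struct device.
 * Keeping this an explicit, named macro makes the switch between the two
 * views a conscious act rather than a mindless cast. */
#define to_pci_dev(d) \
	((struct pci_dev *)((char *)(d) - offsetof(struct pci_dev, device)))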
+
+Device Structures
+~~~~~~~~~~~~~~~~~
+
+struct device {
+	struct list_head bus_list;
+	struct iobus	*parent;
+	struct iobus	*subordinate;
+
+	char	name[DEVICE_NAME_SIZE];
+	char	bus_id[BUS_ID_SIZE];
+
+	struct driver_dir_entry * dir;
+
+	spinlock_t	lock;
+	atomic_t	refcount;
+
+	struct device_driver *driver;
+	void		*driver_data;
+	void		*platform_data;
+
+	u32		current_state;
+	unsigned char	*saved_state;
+};
+
+bus_list:
+   List of all devices on a particular bus; i.e. the device's siblings.
+
+parent:
+   The parent bridge for the device.
+
+subordinate:
+   If the device is a bridge itself, this points to the struct iobus that is
+   created for it.
+
+name:
+   Human readable (descriptive) name of the device. E.g. "Intel EEPro 100"
+
+bus_id:
+   Parsable (yet ASCII) bus id. E.g. "00:04.00" (PCI Bus 0, Device 4, Function
+   0). It is necessary to have a searchable bus id for each device; making it
+   ASCII allows us to use it for its directory name without translating it.
+
+dir:
+   The device's driverfs directory.
+
+lock:
+   Device-specific lock.
+
+refcount:
+   The device's usage count.
+   When this goes to 0, the device is assumed to be removed. It will be removed
+   from its parent's list of children. Its remove() callback will be called to
+   inform the driver to clean up after itself.
+
+driver:
+   Pointer to a struct device_driver, the common operations for each device.
+   See next section.
+
+driver_data:
+   Private data for the driver.
+   Much like the PCI implementation of this field, this allows device-specific
+   drivers to keep a pointer to device-specific data.
+
+platform_data:
+   Data that the platform (firmware) provides about the device.
+   For example, the ACPI BIOS or EFI may have additional information about the
+   device that is not directly mappable to any existing kernel data structure.
+   It also allows the platform driver (e.g. ACPI) to pass information to a
+   driver without the driver having to have explicit knowledge of (atrocities
+   like) ACPI.
+
+
+current_state:
+   Current power state of the device. For PCI and other modern devices, this is
+   0-3, though it's not necessarily limited to those values.
+
+saved_state:
+   Pointer to a driver-specific set of saved state.
+   Having it here allows modules to be unloaded on system suspend and reloaded
+   on resume while maintaining state across the transition.
+   It also allows generic drivers to maintain state across system state
+   transitions.
+   (I've implemented a generic PCI driver for devices that don't have a
+   device-specific driver. Instead of managing some vector of saved state
+   for each device the generic driver supports, it can simply store it here.)
+
+
+
+struct device_driver {
+	int	(*probe)	(struct device *dev);
+	int	(*remove)	(struct device *dev);
+
+	int	(*suspend)	(struct device *dev, u32 state, u32 level);
+	int	(*resume)	(struct device *dev, u32 level);
+};
+
+probe:
+   Check for device existence and associate the driver with it.
+
+remove:
+   Dissociate the driver from the device. Releases the device so that it can
+   be used by another driver. Also, if it is a hotplug device (hotplug PCI,
+   CardBus), an ejection event could take place here.
+
+suspend:
+   Perform one step of the device suspend process.
+
+resume:
+   Perform one step of the device resume process.
+
+The probe() and remove() callbacks are intended to be much simpler than their
+current PCI counterparts.
+
+probe() should do the following only:
+
+- Check if hardware is present
+- Register device interface
+- Disable DMA/interrupts, etc, just in case.
+
+Some device initialisation was done in probe().
+This should not be the case anymore. All initialisation should take place in
+the open() call for the device.
+
+Breaking initialisation code out must also be done for the resume() callback,
+as most devices will have to be completely reinitialised when coming back from
+a suspend state.
+
+remove() should simply unregister the device interface.
+
+
+Device power management can be quite complicated, based on exactly what needs
+to be done. Four operations sum up most of it:
+
+- OS directed power management.
+  The OS takes care of notifying all drivers that a suspend is requested,
+  saving device state, and powering devices down.
+- Firmware controlled power management.
+  The OS only wants to notify devices that a suspend is requested.
+- Device power management.
+  A user wants to place only one device in a low power state, and maybe save
+  state.
+- System reboot.
+  The system wants to place devices in a quiescent state before the system is
+  reset.
+
+In an attempt to accommodate all of these scenarios, the power management
+transition for any device is broken up into several stages - notify, save
+state, and power down. A disable stage, which would happen after notify and
+before save state, has been considered and may be implemented in the future.
+
+Depending on what the system-wide policy is (usually dictated by the power
+management scheme present), each driver's suspend callback may be called
+multiple times, each time with a different stage.
+
+On all power management transitions, the stages should be called sequentially
+(notify before save state; save state before power down). However, drivers
+should not assume that any stage was called beforehand. (If a driver gets a
+power down call, it shouldn't assume notify or save state was called first.)
+This allows the framework to be used seamlessly by all power management
+actions. Hopefully.
+
+Resume transitions happen in a similar manner. They are currently broken up
+into two stages (power on and restore state), though a third stage (enable)
+may be added later.
+
+For suspend and resume transitions, the following values are defined to denote
+the stage:
+
+enum {
+	SUSPEND_NOTIFY,
+	SUSPEND_SAVE_STATE,
+	SUSPEND_POWER_DOWN,
+};
+
+enum {
+	RESUME_POWER_ON,
+	RESUME_RESTORE_STATE,
+};
+
+
+During a system power transition, the device tree must be walked in order,
+calling the suspend() or resume() callback for each node. This may happen
+several times.
+
+Initially, this was done in kernel space. However, it has occurred to me that
+recursing to an unbounded depth is dangerous, and that there are a lot of
+inherent race conditions in such an operation.
+
+Non-recursive walking of the device tree is possible. However, this makes for
+convoluted code.
+
+No matter what, if the transition happens in kernel space, it is difficult to
+gracefully recover from errors or to implement a policy that prevents one from
+shutting down the device(s) you want to save state to.
+
+Instead, the walking of the device tree has been moved to userspace. When a
+user requests that the system suspend, the userspace process will walk the
+device tree, as exported via driverfs, and tell each device to go to sleep. It
+will do this multiple times based on what the system policy is.
+
+Device resume should happen in the same manner when the system awakens.
+
+Each suspend stage is described below:
+
+SUSPEND_NOTIFY:
+
+This stage notifies the driver that the device is going to sleep.
+If the driver knows that it cannot resume the hardware from the requested
+level, or it feels that the device is too important to be put to sleep, it
+should return an error from this function.
+
+It does not have to stop I/O requests or actually save state at this point.
+
+SUSPEND_DISABLE:
+
+The driver should stop taking I/O requests at this stage. Because the save
+state stage happens afterwards, the driver may not want to physically disable
+the device; it may instead only mark itself unavailable, if possible.
+
+SUSPEND_SAVE_STATE:
+
+The driver should allocate memory and save any device state that is relevant
+for the state it is going to enter.
+
+SUSPEND_POWER_DOWN:
+
+The driver should place the device in the power state requested.
+
+
+For resume, the stages are defined as follows:
+
+RESUME_POWER_ON:
+
+Devices should be powered on and reinitialised to some known working state.
+
+RESUME_RESTORE_STATE:
+
+The driver should restore device state to its pre-suspend state and free any
+memory allocated for its saved state.
+
+RESUME_ENABLE:
+
+The device should start taking I/O requests again.
+
+
+A driver does not have to implement every stage. But if it does implement a
+stage, it should do what is described above. It should not assume that it
+performed any stage previously, or that it will perform any stage later.
+
+It is quite possible that a driver can fail during the suspend process, for
+whatever reason. In this event, the calling process must gracefully recover
+and restore everything to the state it was in before the suspend transition
+began.
+
+If a driver knows that it cannot suspend or resume properly, it should fail
+during the notify stage. Properly implemented power management schemes should
+make sure that this is the first stage that is called.
+
+If a driver gets a power down request, it should obey it, as the request may
+very well come during a reboot.
+
+
+Bus Structures
+~~~~~~~~~~~~~~
+
+struct iobus {
+	struct list_head node;
+	struct iobus	*parent;
+	struct list_head children;
+	struct list_head devices;
+
+	struct list_head bus_list;
+
+	spinlock_t	lock;
+	atomic_t	refcount;
+
+	struct device	*self;
+	struct driver_dir_entry * dir;
+
+	char	name[DEVICE_NAME_SIZE];
+	char	bus_id[BUS_ID_SIZE];
+
+	struct iobus_driver *driver;
+};
+
+node:
+   The bus's node in its sibling list (its parent's list of child buses).
+
+parent:
+   Pointer to the parent bridge.
+
+children:
+   List of subordinate buses.
+   In each child, this corresponds to its 'node' field.
+
+devices:
+   List of devices on the bus this bridge controls.
+   This field corresponds to the 'bus_list' field in each child device.
+
+bus_list:
+   Each type of bus keeps a list of all bridges that it finds. This is the
+   bridge's entry in that list.
+
+self:
+   Pointer to the struct device for this bridge.
+
+lock:
+   Lock for the bus.
+
+refcount:
+   Usage count for the bus.
+
+dir:
+   Driverfs directory.
+
+name:
+   Human readable ASCII name of the bus.
+
+bus_id:
+   Machine readable (though ASCII) description of the position on the parent
+   bus.
+
+driver:
+   Pointer to the operations for the bus.
+
+
+struct iobus_driver {
+	char	name[16];
+	struct list_head node;
+
+	int	(*scan)		(struct iobus *);
+	int	(*add_device)	(struct iobus *, char *);
+};
+
+name:
+   ASCII name of the bus.
+
+node:
+   List of buses of this type in the system.
+
+scan:
+   Search the bus for new devices. This may happen either at boot - where every
+   device discovered will be new - or later on - in which case there may be
+   only a few (or no) new devices.
+
+add_device:
+   Trigger a device insertion at a particular location.
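To illustrate how the stages fit together, here is a minimal sketch of a
suspend() callback that switches on the stage argument and treats each stage
independently, as the rules above require. The mydev_* names and
MYDEV_DEEPEST_STATE are hypothetical, not part of the model itself:

static int mydev_suspend(struct device * dev, u32 state, u32 level)
{
	struct mydev * md = dev->driver_data;

	switch (level) {
	case SUSPEND_NOTIFY:
		/* Refuse now if the requested state cannot be resumed
		 * from; no I/O has to be stopped yet. */
		if (state > MYDEV_DEEPEST_STATE)
			return -EINVAL;
		break;
	case SUSPEND_SAVE_STATE:
		/* Allocate and fill dev->saved_state, without assuming
		 * that SUSPEND_NOTIFY was seen first. */
		return mydev_save_state(md, dev);
	case SUSPEND_POWER_DOWN:
		/* Obey unconditionally - this may be a reboot. */
		mydev_power_off(md, state);
		break;
	}
	return 0;
}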
+
+
+
+The API
+~~~~~~~
+
+There are several functions exported by the global device layer, including
+several optional helper functions, written solely to try to make your life
+easier.
+
+void device_init_dev(struct device * dev);
+
+Initialise a device structure. It first zeros the device, then initialises all
+of the lists. (Note that this would have been called device_init(), but that
+name was already taken. :/)
+
+
+struct device * device_alloc(void);
+
+Allocate memory for a device structure and initialise it.
+It first allocates the memory, then calls device_init_dev() on the new
+pointer.
+
+
+int device_register(struct device * dev);
+
+Register a device with the global device layer.
+The bus layer should call this function upon device discovery, e.g. when
+probing the bus.
+dev should be fully initialised when this is called.
+If dev->parent is not set, it sets its parent to be the device root.
+It then does the following:
+ - inserts it into its parent's list of children
+ - creates a driverfs directory for it
+ - creates a set of default files for the device in its directory
+ - calls platform_notify() to notify the firmware driver of its existence.
+
+
+void get_device(struct device * dev);
+
+Increment the refcount for a device.
+
+
+int valid_device(struct device * dev);
+
+Check whether the reference count for a device is positive (i.e. the device is
+not waiting to be freed). If it is positive, the reference count is
+incremented. It returns whether or not the device is usable.
+
+
+void put_device(struct device * dev);
+
+Decrement the reference count for the device. If it hits 0, it removes the
+device from its parent's list of children and calls the remove() callback for
+the device.
+
+
+void lock_device(struct device * dev);
+
+Take the spinlock for the device.
+
+
+void unlock_device(struct device * dev);
+
+Release the spinlock for the device.
+
+
+
+void iobus_init(struct iobus * iobus);
+struct iobus * iobus_alloc(void);
+int iobus_register(struct iobus * iobus);
+void get_iobus(struct iobus * iobus);
+int valid_iobus(struct iobus * iobus);
+void put_iobus(struct iobus * iobus);
+void lock_iobus(struct iobus * iobus);
+void unlock_iobus(struct iobus * iobus);
+
+These functions provide the same functionality as their device_*
+counterparts, only operating on a struct iobus. One important thing to note,
+though, is that iobus_register() and iobus_unregister() operate recursively:
+it is possible to add an entire tree in one call.
+
+
+
+int device_driver_init(void);
+
+Main initialisation routine.
+
+This makes sure driverfs is up and running and initialises the device tree.
+
+
+void device_driver_exit(void);
+
+This frees up the device tree.
+
+
+
+
+Credits
+~~~~~~~
+
+The following people have been extremely helpful in solidifying this document
+and the driver model.
+
+Randy Dunlap <rddunlap@osdl.org>
+Jeff Garzik <jgarzik@mandrakesoft.com>
+Ben Herrenschmidt <benh@kernel.crashing.org>
+
+
diff -urN linux-2.5.1-pre10/Documentation/filesystems/driverfs.txt linux/Documentation/filesystems/driverfs.txt
--- linux-2.5.1-pre10/Documentation/filesystems/driverfs.txt	Wed Dec 31 16:00:00 1969
+++ linux/Documentation/filesystems/driverfs.txt	Wed Dec 12 23:32:28 2001
@@ -0,0 +1,211 @@
+
+driverfs - The Device Driver Filesystem
+
+Patrick Mochel
+
+3 December 2001
+
+
+What it is:
+~~~~~~~~~~~
+driverfs is a unified means for device drivers to export interfaces to
+userspace.
+
+Some drivers need to export interfaces for things like setting
+device-specific parameters, or tuning the device performance.
+For example, wireless networking cards export a file in procfs to set
+their SSID.
+
+Other times, the bus on which a device resides may export other
+information about the device. For example, PCI and USB both export
+device information via procfs or usbdevfs.
+
+In these cases, the files or directories are in nearly random places
+in /proc. One benefit of driverfs is that it can consolidate all of
+these interfaces in one standard location.
+
+
+Why it's better than procfs:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This of course can't happen without changing every single driver that
+exports a procfs interface, and having some coordination between all
+of them as to what the proper place for their files is. Or can it?
+
+
+driverfs was developed in conjunction with the new driver model for
+the 2.5 kernel. In that model, the system has one unified tree of all
+the devices that are present in the system. It follows naturally that
+this tree can be exported to userspace in the same structure.
+
+So, every bus and every device gets a directory in the filesystem.
+This directory is created when the device is registered in the tree,
+before the driver actually gets initialised. The dentry for this
+directory is stored in the struct device for the device, so the
+driver has access to it.
+
+Now, every driver has one standard place to export its files.
+
+Granted, the location of the file is not as intuitive as it may have
+been under procfs. But, I argue that with the exception of
+/proc/bus/pci, none of the files had intuitive locations. I also argue
+that the development of userspace tools can help cope with these
+changes and inconsistencies in locations.
+
+
+Why we're not just using procfs:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When developing the new driver model, I initially implemented it
+with a procfs tree. When I explained the concept to Linus, he said
+"Don't use proc."
+
+I was a little shocked (especially considering I had already
+implemented it using procfs). "What do you mean 'don't use proc'?"
+
+His argument was that too many things use proc that shouldn't. And
+even more things misuse proc. On top of that, procfs
+was written before the VFS layer was written, so it doesn't use the
+dcache. It reimplements many of the same features that the dcache
+does, and is, in general, crufty.
+
+So, he told me to write my own. Soon after, he pointed me at ramfs,
+the simplest filesystem known to man.
+
+Consequently, we have a virtual filesystem based heavily on ramfs,
+borrowing some conceptual functionality from procfs.
+
+It may suck, but it does what it was designed to. At least so far.
+
+
+How it works:
+~~~~~~~~~~~~~
+
+Directories are encapsulated like this:
+
+struct driver_dir_entry {
+	char		* name;
+	struct dentry	* dentry;
+	mode_t		mode;
+	struct list_head files;
+};
+
+name:
+   Name of the directory.
+dentry:
+   Dentry for the directory.
+mode:
+   Permissions of the directory.
+files:
+   Linked list of the driver_file_entry's that are in the directory.
+
+
+To create a directory, one first calls
+
+struct driver_dir_entry *
+driverfs_create_dir_entry(const char * name, mode_t mode);
+
+which allocates and initialises a struct driver_dir_entry.
+Then, to actually create the directory:
+
+int driverfs_create_dir(struct driver_dir_entry *, struct driver_dir_entry *);
+
+To remove a directory:
+
+void driverfs_remove_dir(struct driver_dir_entry * entry);
+
+
+Files are encapsulated like this:
+
+struct driver_file_entry {
+	struct driver_dir_entry * parent;
+	struct list_head node;
+	char		* name;
+	mode_t		mode;
+	struct dentry	* dentry;
+	void		* data;
+	struct driverfs_operations * ops;
+};
+
+struct driverfs_operations {
+	ssize_t (*read) (char *, size_t, loff_t, void *);
+	ssize_t (*write)(const char *, size_t, loff_t, void *);
+};
+
+node:
+   Node in its parent directory's list of files.
+
+name:
+   The name of the file.
+
+dentry:
+   The dentry for the file.
+
+data:
+   Caller-specific data that is passed to the callbacks when they
+   are called.
+
+ops:
+   Operations for the file. Currently, this only contains read() and write()
+   callbacks for the file.
+
+To create a file, one first calls
+
+struct driver_file_entry *
+driverfs_create_entry (const char * name, mode_t mode,
+                       struct driverfs_operations * ops, void * data);
+
+That allocates and initialises a struct driver_file_entry. Then, to actually
+create the file, one calls
+
+int driverfs_create_file(struct driver_file_entry * entry,
+                         struct driver_dir_entry * parent);
+
+
+To remove a file, one calls
+
+void driverfs_remove_file(struct driver_dir_entry *, const char * name);
+
+
+The callback functionality is similar to the way procfs works. When a
+user performs a read(2) or write(2) on the file, the kernel first calls a
+driverfs function. This function then checks for a non-NULL pointer in
+the file->private_data field, which it assumes to be a pointer to a
+struct driver_file_entry.
+
+It then checks for the appropriate callback and calls it.
+
+
+What driverfs is not:
+~~~~~~~~~~~~~~~~~~~~~
+It is not a replacement for either devfs or procfs.
+
+It does not handle device nodes, like devfs is intended to do. I think
+that functionality is possible, and I do think that the device nodes
+and control files should be integrated. Whether driverfs or devfs, or
+something else, is the place to do it, I don't know.
+
+It is not intended to be a replacement for all of the procfs
+functionality. I think that many of the driver files should be moved
+out of /proc (and maybe a few other things as well ;).
+
+
+
+Limitations:
+~~~~~~~~~~~~
+The driverfs functions assume that at most a page is being either read
+or written each time.
+
+
+Possible bugs:
+~~~~~~~~~~~~~~
+It may not deal with offsets and/or seeks very well, especially if
+they cross a page boundary.
+
+There may be locking issues when rapidly adding and removing files and
+directories dynamically (like if you have a hotplug device).
+
+There are some people who believe that filesystems which add
+files/directories dynamically based on the presence of devices are
+inherently flawed. Though not as technically versed in this area as
+some of those people, I like to believe that they can be made to work,
+with the right guidance.
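Tying the pieces together, a driver could expose a read-only attribute roughly
like this. The sketch uses only the calls declared above; struct mydev, its
irq field, and the embedded struct device are assumptions made for
illustration:

static ssize_t mydev_irq_read(char * buf, size_t count, loff_t off, void * data)
{
	struct mydev * md = data;

	/* driverfs hands us at most one page; reads past the formatted
	 * string simply return 0 (EOF). count is unused because the
	 * output always fits well within a page. */
	if (off)
		return 0;
	return sprintf(buf, "%d\n", md->irq);
}

static struct driverfs_operations mydev_irq_ops = {
	read:	mydev_irq_read,
};

static int mydev_export(struct mydev * md)
{
	struct driver_file_entry * entry;

	entry = driverfs_create_entry("irq", S_IRUGO, &mydev_irq_ops, md);
	if (!entry)
		return -ENOMEM;
	return driverfs_create_file(entry, md->dev.dir);
}

Cleanup with driverfs_remove_file() on device removal is omitted here for
brevity.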
+ diff -urN linux-2.5.1-pre10/Makefile linux/Makefile --- linux-2.5.1-pre10/Makefile Wed Dec 12 23:32:26 2001 +++ linux/Makefile Wed Dec 12 23:32:28 2001 @@ -1,7 +1,7 @@ VERSION = 2 PATCHLEVEL = 5 SUBLEVEL = 1 -EXTRAVERSION =-pre10 +EXTRAVERSION =-pre11 KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION) diff -urN linux-2.5.1-pre10/arch/i386/lib/iodebug.c linux/arch/i386/lib/iodebug.c --- linux-2.5.1-pre10/arch/i386/lib/iodebug.c Tue Oct 19 12:26:53 1999 +++ linux/arch/i386/lib/iodebug.c Wed Dec 12 23:32:28 2001 @@ -9,11 +9,3 @@ return (void *)x; } -unsigned long __io_phys_debug(unsigned long x, const char *file, int line) -{ - if (x < PAGE_OFFSET) { - printk("io mapaddr 0x%05lx not valid at %s:%d!\n", x, file, line); - return x; - } - return __pa(x); -} diff -urN linux-2.5.1-pre10/drivers/block/cciss.c linux/drivers/block/cciss.c --- linux-2.5.1-pre10/drivers/block/cciss.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/cciss.c Wed Dec 12 23:32:29 2001 @@ -1237,7 +1237,7 @@ blkdev_dequeue_request(creq); - spin_unlock_irq(&q->queue_lock); + spin_unlock_irq(q->queue_lock); c->cmd_type = CMD_RWREQ; c->rq = creq; @@ -1298,7 +1298,7 @@ c->Request.CDB[8]= creq->nr_sectors & 0xff; c->Request.CDB[9] = c->Request.CDB[11] = c->Request.CDB[12] = 0; - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); addQ(&(h->reqQ),c); h->Qdepth++; @@ -1866,7 +1866,7 @@ q = BLK_DEFAULT_QUEUE(MAJOR_NR + i); q->queuedata = hba[i]; - blk_init_queue(q, do_cciss_request); + blk_init_queue(q, do_cciss_request, &hba[i]->lock); blk_queue_bounce_limit(q, hba[i]->pdev->dma_mask); blk_queue_max_segments(q, MAXSGENTRIES); blk_queue_max_sectors(q, 512); diff -urN linux-2.5.1-pre10/drivers/block/cciss.h linux/drivers/block/cciss.h --- linux-2.5.1-pre10/drivers/block/cciss.h Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/cciss.h Wed Dec 12 23:32:29 2001 @@ -66,6 +66,7 @@ unsigned int Qdepth; unsigned int maxQsinceinit; unsigned int maxSG; + spinlock_t lock; //* pointers to command and error info pool */ CommandList_struct *cmd_pool; @@ -242,7 +243,7 @@ struct access_method *access; }; -#define CCISS_LOCK(i) (&((BLK_DEFAULT_QUEUE(MAJOR_NR + i))->queue_lock)) +#define CCISS_LOCK(i) ((BLK_DEFAULT_QUEUE(MAJOR_NR + i))->queue_lock) #endif /* CCISS_H */ diff -urN linux-2.5.1-pre10/drivers/block/cpqarray.c linux/drivers/block/cpqarray.c --- linux-2.5.1-pre10/drivers/block/cpqarray.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/cpqarray.c Wed Dec 12 23:32:29 2001 @@ -467,7 +467,7 @@ q = BLK_DEFAULT_QUEUE(MAJOR_NR + i); q->queuedata = hba[i]; - blk_init_queue(q, do_ida_request); + blk_init_queue(q, do_ida_request, &hba[i]->lock); blk_queue_bounce_limit(q, hba[i]->pci_dev->dma_mask); blk_queue_max_segments(q, SG_MAX); blksize_size[MAJOR_NR+i] = ida_blocksizes + (i*256); @@ -882,7 +882,7 @@ blkdev_dequeue_request(creq); - spin_unlock_irq(&q->queue_lock); + spin_unlock_irq(q->queue_lock); c->ctlr = h->ctlr; c->hdr.unit = MINOR(creq->rq_dev) >> NWD_SHIFT; @@ -915,7 +915,7 @@ c->req.hdr.cmd = (rq_data_dir(creq) == READ) ? 
IDA_READ : IDA_WRITE; c->type = CMD_RWREQ; - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); /* Put the request on the tail of the request queue */ addQ(&h->reqQ, c); diff -urN linux-2.5.1-pre10/drivers/block/cpqarray.h linux/drivers/block/cpqarray.h --- linux-2.5.1-pre10/drivers/block/cpqarray.h Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/cpqarray.h Wed Dec 12 23:32:29 2001 @@ -106,6 +106,7 @@ cmdlist_t *cmd_pool; dma_addr_t cmd_pool_dhandle; __u32 *cmd_pool_bits; + spinlock_t lock; unsigned int Qdepth; unsigned int maxQsinceinit; @@ -117,7 +118,7 @@ unsigned int misc_tflags; }; -#define IDA_LOCK(i) (&((BLK_DEFAULT_QUEUE(MAJOR_NR + i))->queue_lock)) +#define IDA_LOCK(i) ((BLK_DEFAULT_QUEUE(MAJOR_NR + i))->queue_lock) #endif diff -urN linux-2.5.1-pre10/drivers/block/floppy.c linux/drivers/block/floppy.c --- linux-2.5.1-pre10/drivers/block/floppy.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/floppy.c Wed Dec 12 23:32:29 2001 @@ -204,6 +204,8 @@ * record each buffers capabilities */ +static spinlock_t floppy_lock; + static unsigned short virtual_dma_port=0x3f0; void floppy_interrupt(int irq, void *dev_id, struct pt_regs * regs); static int set_dor(int fdc, char mask, char data); @@ -2296,7 +2298,7 @@ DRS->maxtrack = 1; /* unlock chained buffers */ - spin_lock_irqsave(&QUEUE->queue_lock, flags); + spin_lock_irqsave(QUEUE->queue_lock, flags); while (current_count_sectors && !QUEUE_EMPTY && current_count_sectors >= CURRENT->current_nr_sectors){ current_count_sectors -= CURRENT->current_nr_sectors; @@ -2304,7 +2306,7 @@ CURRENT->sector += CURRENT->current_nr_sectors; end_request(1); } - spin_unlock_irqrestore(&QUEUE->queue_lock, flags); + spin_unlock_irqrestore(QUEUE->queue_lock, flags); if (current_count_sectors && !QUEUE_EMPTY){ /* "unlock" last subsector */ @@ -2329,9 +2331,9 @@ DRWE->last_error_sector = CURRENT->sector; DRWE->last_error_generation = DRS->generation; } - spin_lock_irqsave(&QUEUE->queue_lock, flags); + spin_lock_irqsave(QUEUE->queue_lock, flags); end_request(0); - spin_unlock_irqrestore(&QUEUE->queue_lock, flags); + spin_unlock_irqrestore(QUEUE->queue_lock, flags); } } @@ -2433,17 +2435,20 @@ static int buffer_chain_size(void) { struct bio *bio; - int size; + struct bio_vec *bv; + int size, i; char *base; - base = CURRENT->buffer; + base = bio_data(CURRENT->bio); size = 0; rq_for_each_bio(bio, CURRENT) { - if (bio_data(bio) != base + size) - break; + bio_for_each_segment(bv, bio, i) { + if (page_address(bv->bv_page) + bv->bv_offset != base + size) + break; - size += bio->bi_size; + size += bv->bv_len; + } } return size >> 9; @@ -2469,9 +2474,10 @@ static void copy_buffer(int ssize, int max_sector, int max_sector_2) { int remaining; /* number of transferred 512-byte sectors */ + struct bio_vec *bv; struct bio *bio; char *buffer, *dma_buffer; - int size; + int size, i; max_sector = transfer_size(ssize, minimum(max_sector, max_sector_2), @@ -2501,12 +2507,17 @@ dma_buffer = floppy_track_buffer + ((fsector_t - buffer_min) << 9); - bio = CURRENT->bio; size = CURRENT->current_nr_sectors << 9; - buffer = CURRENT->buffer; - while (remaining > 0){ - SUPBOUND(size, remaining); + rq_for_each_bio(bio, CURRENT) { + bio_for_each_segment(bv, bio, i) { + if (!remaining) + break; + + size = bv->bv_len; + SUPBOUND(size, remaining); + + buffer = page_address(bv->bv_page) + bv->bv_offset; #ifdef FLOPPY_SANITY_CHECK if (dma_buffer + size > floppy_track_buffer + (max_buffer_sectors << 10) || @@ -2526,24 +2537,14 @@ if (((unsigned long)buffer) % 512) DPRINT("%p buffer not 
aligned\n", buffer); #endif - if (CT(COMMAND) == FD_READ) - memcpy(buffer, dma_buffer, size); - else - memcpy(dma_buffer, buffer, size); - remaining -= size; - if (!remaining) - break; + if (CT(COMMAND) == FD_READ) + memcpy(buffer, dma_buffer, size); + else + memcpy(dma_buffer, buffer, size); - dma_buffer += size; - bio = bio->bi_next; -#ifdef FLOPPY_SANITY_CHECK - if (!bio){ - DPRINT("bh=null in copy buffer after copy\n"); - break; + remaining -= size; + dma_buffer += size; } -#endif - size = bio->bi_size; - buffer = bio_data(bio); } #ifdef FLOPPY_SANITY_CHECK if (remaining){ @@ -4169,7 +4170,7 @@ blk_size[MAJOR_NR] = floppy_sizes; blksize_size[MAJOR_NR] = floppy_blocksizes; - blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), DEVICE_REQUEST); + blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), DEVICE_REQUEST, &floppy_lock); reschedule_timeout(MAXTIMEOUT, "floppy init", MAXTIMEOUT); config_types(); @@ -4477,6 +4478,7 @@ #else __setup ("floppy=", floppy_setup); +module_init(floppy_init) /* eject the boot floppy (if we need the drive for a different root floppy) */ /* This should only be called at boot time when we're sure that there's no diff -urN linux-2.5.1-pre10/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c --- linux-2.5.1-pre10/drivers/block/ll_rw_blk.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/ll_rw_blk.c Wed Dec 12 23:32:29 2001 @@ -254,6 +254,12 @@ q->seg_boundary_mask = mask; } +void blk_queue_assign_lock(request_queue_t *q, spinlock_t *lock) +{ + spin_lock_init(lock); + q->queue_lock = lock; +} + static char *rq_flags[] = { "REQ_RW", "REQ_RW_AHEAD", "REQ_BARRIER", "REQ_CMD", "REQ_NOMERGE", "REQ_STARTED", "REQ_DONTPREP", "REQ_DRIVE_CMD", "REQ_DRIVE_TASK", @@ -536,9 +542,9 @@ request_queue_t *q = (request_queue_t *) data; unsigned long flags; - spin_lock_irqsave(&q->queue_lock, flags); + spin_lock_irqsave(q->queue_lock, flags); __generic_unplug_device(q); - spin_unlock_irqrestore(&q->queue_lock, flags); + spin_unlock_irqrestore(q->queue_lock, flags); } static int __blk_cleanup_queue(struct request_list *list) @@ -624,7 +630,6 @@ init_waitqueue_head(&q->rq[READ].wait); init_waitqueue_head(&q->rq[WRITE].wait); - spin_lock_init(&q->queue_lock); return 0; nomem: blk_cleanup_queue(q); @@ -661,7 +666,7 @@ * blk_init_queue() must be paired with a blk_cleanup_queue() call * when the block device is deactivated (such as at module unload). 
**/ -int blk_init_queue(request_queue_t *q, request_fn_proc *rfn) +int blk_init_queue(request_queue_t *q, request_fn_proc *rfn, spinlock_t *lock) { int ret; @@ -682,6 +687,7 @@ q->plug_tq.routine = &generic_unplug_device; q->plug_tq.data = q; q->queue_flags = (1 << QUEUE_FLAG_CLUSTER); + q->queue_lock = lock; /* * by default assume old behaviour and bounce for any highmem page @@ -728,7 +734,7 @@ struct request_list *rl = &q->rq[rw]; struct request *rq; - spin_lock_prefetch(&q->queue_lock); + spin_lock_prefetch(q->queue_lock); generic_unplug_device(q); add_wait_queue(&rl->wait, &wait); @@ -736,9 +742,9 @@ set_current_state(TASK_UNINTERRUPTIBLE); if (rl->count < batch_requests) schedule(); - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); rq = get_request(q, rw); - spin_unlock_irq(&q->queue_lock); + spin_unlock_irq(q->queue_lock); } while (rq == NULL); remove_wait_queue(&rl->wait, &wait); current->state = TASK_RUNNING; @@ -949,9 +955,9 @@ { unsigned long flags; - spin_lock_irqsave(&q->queue_lock, flags); + spin_lock_irqsave(q->queue_lock, flags); __blk_attempt_remerge(q, rq); - spin_unlock_irqrestore(&q->queue_lock, flags); + spin_unlock_irqrestore(q->queue_lock, flags); } static int __make_request(request_queue_t *q, struct bio *bio) @@ -974,7 +980,7 @@ */ blk_queue_bounce(q, &bio); - spin_lock_prefetch(&q->queue_lock); + spin_lock_prefetch(q->queue_lock); latency = elevator_request_latency(elevator, rw); barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw); @@ -983,7 +989,7 @@ req = NULL; head = &q->queue_head; - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); insert_here = head->prev; if (blk_queue_empty(q) || barrier) { @@ -1066,7 +1072,7 @@ freereq = NULL; } else if ((req = get_request(q, rw)) == NULL) { - spin_unlock_irq(&q->queue_lock); + spin_unlock_irq(q->queue_lock); /* * READA bit set @@ -1111,7 +1117,7 @@ out: if (freereq) blkdev_release_request(freereq); - spin_unlock_irq(&q->queue_lock); + spin_unlock_irq(q->queue_lock); return 0; end_io: @@ -1608,3 +1614,4 @@ EXPORT_SYMBOL(blk_dump_rq_flags); EXPORT_SYMBOL(submit_bio); EXPORT_SYMBOL(blk_contig_segment); +EXPORT_SYMBOL(blk_queue_assign_lock); diff -urN linux-2.5.1-pre10/drivers/block/nbd.c linux/drivers/block/nbd.c --- linux-2.5.1-pre10/drivers/block/nbd.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/nbd.c Wed Dec 12 23:32:29 2001 @@ -62,6 +62,8 @@ static struct nbd_device nbd_dev[MAX_NBD]; static devfs_handle_t devfs_handle; +static spinlock_t nbd_lock; + #define DEBUG( s ) /* #define DEBUG( s ) printk( s ) */ @@ -347,22 +349,22 @@ #endif req->errors = 0; blkdev_dequeue_request(req); - spin_unlock_irq(&q->queue_lock); + spin_unlock_irq(q->queue_lock); down (&lo->queue_lock); list_add(&req->queuelist, &lo->queue_head); nbd_send_req(lo->sock, req); /* Why does this block? 
*/ up (&lo->queue_lock); - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); continue; error_out: req->errors++; blkdev_dequeue_request(req); - spin_unlock(&q->queue_lock); + spin_unlock(q->queue_lock); nbd_end_request(req); - spin_lock(&q->queue_lock); + spin_lock(q->queue_lock); } return; } @@ -515,7 +517,7 @@ #endif blksize_size[MAJOR_NR] = nbd_blksizes; blk_size[MAJOR_NR] = nbd_sizes; - blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), do_nbd_request); + blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), do_nbd_request, &nbd_lock); for (i = 0; i < MAX_NBD; i++) { nbd_dev[i].refcnt = 0; nbd_dev[i].file = NULL; diff -urN linux-2.5.1-pre10/drivers/block/paride/pcd.c linux/drivers/block/paride/pcd.c --- linux-2.5.1-pre10/drivers/block/paride/pcd.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/paride/pcd.c Wed Dec 12 23:32:29 2001 @@ -146,6 +146,8 @@ #include +static spinlock_t pcd_lock; + #ifndef MODULE #include "setup.h" @@ -355,7 +357,7 @@ } } - blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), DEVICE_REQUEST); + blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), DEVICE_REQUEST, &pcd_lock); read_ahead[MAJOR_NR] = 8; /* 8 sector (4kB) read ahead */ for (i=0;iqueue_lock,saved_flags); + spin_lock_irqsave(&pcd_lock,saved_flags); pcd_busy = 0; end_request(0); do_pcd_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pcd_lock,saved_flags); return; } @@ -845,11 +847,11 @@ pcd_retries = 0; pcd_transfer(); if (!pcd_count) { - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pcd_lock,saved_flags); end_request(1); pcd_busy = 0; do_pcd_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pcd_lock,saved_flags); return; } @@ -868,19 +870,19 @@ pi_do_claimed(PI,pcd_start); return; } - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pcd_lock,saved_flags); pcd_busy = 0; pcd_bufblk = -1; end_request(0); do_pcd_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pcd_lock,saved_flags); return; } do_pcd_read(); - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pcd_lock,saved_flags); do_pcd_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pcd_lock,saved_flags); } /* the audio_ioctl stuff is adapted from sr_ioctl.c */ diff -urN linux-2.5.1-pre10/drivers/block/paride/pf.c linux/drivers/block/paride/pf.c --- linux-2.5.1-pre10/drivers/block/paride/pf.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/paride/pf.c Wed Dec 12 23:32:29 2001 @@ -164,6 +164,8 @@ #include +static spinlock_t pf_spin_lock; + #ifndef MODULE #include "setup.h" @@ -358,7 +360,7 @@ return -1; } q = BLK_DEFAULT_QUEUE(MAJOR_NR); - blk_init_queue(q, DEVICE_REQUEST); + blk_init_queue(q, DEVICE_REQUEST, &pf_spin_lock); blk_queue_max_segments(q, cluster); read_ahead[MAJOR_NR] = 8; /* 8 sector (4kB) read ahead */ @@ -876,9 +878,9 @@ { long saved_flags; - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pf_spin_lock,saved_flags); end_request(1); - if (!pf_run) { spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + if (!pf_run) { spin_unlock_irqrestore(&pf_spin_lock,saved_flags); return; } @@ -894,7 +896,7 @@ pf_count = CURRENT->current_nr_sectors; pf_buf = CURRENT->buffer; - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pf_spin_lock,saved_flags); } static void do_pf_read( void ) @@ -918,11 +920,11 @@ pi_do_claimed(PI,do_pf_read_start); 
return; } - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pf_spin_lock,saved_flags); end_request(0); pf_busy = 0; do_pf_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pf_spin_lock,saved_flags); return; } pf_mask = STAT_DRQ; @@ -944,11 +946,11 @@ pi_do_claimed(PI,do_pf_read_start); return; } - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pf_spin_lock,saved_flags); end_request(0); pf_busy = 0; do_pf_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pf_spin_lock,saved_flags); return; } pi_read_block(PI,pf_buf,512); @@ -959,11 +961,11 @@ if (!pf_count) pf_next_buf(unit); } pi_disconnect(PI); - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pf_spin_lock,saved_flags); end_request(1); pf_busy = 0; do_pf_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pf_spin_lock,saved_flags); } static void do_pf_write( void ) @@ -985,11 +987,11 @@ pi_do_claimed(PI,do_pf_write_start); return; } - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pf_spin_lock,saved_flags); end_request(0); pf_busy = 0; do_pf_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pf_spin_lock,saved_flags); return; } @@ -1002,11 +1004,11 @@ pi_do_claimed(PI,do_pf_write_start); return; } - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pf_spin_lock,saved_flags); end_request(0); pf_busy = 0; do_pf_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pf_spin_lock,saved_flags); return; } pi_write_block(PI,pf_buf,512); @@ -1032,19 +1034,19 @@ pi_do_claimed(PI,do_pf_write_start); return; } - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pf_spin_lock,saved_flags); end_request(0); pf_busy = 0; do_pf_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pf_spin_lock,saved_flags); return; } pi_disconnect(PI); - spin_lock_irqsave(&QUEUE->queue_lock,saved_flags); + spin_lock_irqsave(&pf_spin_lock,saved_flags); end_request(1); pf_busy = 0; do_pf_request(NULL); - spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags); + spin_unlock_irqrestore(&pf_spin_lock,saved_flags); } /* end of pf.c */ diff -urN linux-2.5.1-pre10/drivers/block/ps2esdi.c linux/drivers/block/ps2esdi.c --- linux-2.5.1-pre10/drivers/block/ps2esdi.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/ps2esdi.c Wed Dec 12 23:32:29 2001 @@ -189,6 +189,8 @@ return 0; } /* ps2esdi_init */ +module_init(ps2esdi_init); + #ifdef MODULE static int cyl[MAX_HD] = {-1,-1}; diff -urN linux-2.5.1-pre10/drivers/block/rd.c linux/drivers/block/rd.c --- linux-2.5.1-pre10/drivers/block/rd.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/block/rd.c Wed Dec 12 23:32:29 2001 @@ -44,9 +44,6 @@ #include #include -#include -#include -#include #include #include #include @@ -79,19 +76,10 @@ /* The RAM disk size is now a parameter */ #define NUM_RAMDISKS 16 /* This cannot be overridden (yet) */ -#ifndef MODULE -/* We don't have to load RAM disks or gunzip them in a module. */ -#define RD_LOADER -#define BUILD_CRAMDISK - -void rd_load(void); -static int crd_load(struct file *fp, struct file *outfp); - #ifdef CONFIG_BLK_DEV_INITRD static int initrd_users; static spinlock_t initrd_users_lock = SPIN_LOCK_UNLOCKED; #endif -#endif /* Various static variables go here. 
Most are used only in the RAM disk code. */ @@ -542,6 +530,8 @@ #ifdef CONFIG_BLK_DEV_INITRD /* We ought to separate initrd operations here */ register_disk(NULL, MKDEV(MAJOR_NR,INITRD_MINOR), 1, &rd_bd_op, rd_size<<1); + devfs_register(devfs_handle, "initrd", DEVFS_FL_DEFAULT, MAJOR_NR, + INITRD_MINOR, S_IFBLK | S_IRUSR, &rd_bd_op, NULL); #endif blksize_size[MAJOR_NR] = rd_blocksizes; /* Avoid set_blocksize() check */ @@ -565,462 +555,3 @@ MODULE_PARM_DESC(rd_blocksize, "Blocksize of each RAM disk in bytes."); MODULE_LICENSE("GPL"); - -/* End of non-loading portions of the RAM disk driver */ - -#ifdef RD_LOADER -/* - * This routine tries to find a RAM disk image to load, and returns the - * number of blocks to read for a non-compressed image, 0 if the image - * is a compressed image, and -1 if an image with the right magic - * numbers could not be found. - * - * We currently check for the following magic numbers: - * minix - * ext2 - * romfs - * gzip - */ -static int __init -identify_ramdisk_image(kdev_t device, struct file *fp, int start_block) -{ - const int size = 512; - struct minix_super_block *minixsb; - struct ext2_super_block *ext2sb; - struct romfs_super_block *romfsb; - int nblocks = -1; - unsigned char *buf; - - buf = kmalloc(size, GFP_KERNEL); - if (buf == 0) - return -1; - - minixsb = (struct minix_super_block *) buf; - ext2sb = (struct ext2_super_block *) buf; - romfsb = (struct romfs_super_block *) buf; - memset(buf, 0xe5, size); - - /* - * Read block 0 to test for gzipped kernel - */ - if (fp->f_op->llseek) - fp->f_op->llseek(fp, start_block * BLOCK_SIZE, 0); - fp->f_pos = start_block * BLOCK_SIZE; - - fp->f_op->read(fp, buf, size, &fp->f_pos); - - /* - * If it matches the gzip magic numbers, return -1 - */ - if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) { - printk(KERN_NOTICE - "RAMDISK: Compressed image found at block %d\n", - start_block); - nblocks = 0; - goto done; - } - - /* romfs is at block zero too */ - if (romfsb->word0 == ROMSB_WORD0 && - romfsb->word1 == ROMSB_WORD1) { - printk(KERN_NOTICE - "RAMDISK: romfs filesystem found at block %d\n", - start_block); - nblocks = (ntohl(romfsb->size)+BLOCK_SIZE-1)>>BLOCK_SIZE_BITS; - goto done; - } - - /* - * Read block 1 to test for minix and ext2 superblock - */ - if (fp->f_op->llseek) - fp->f_op->llseek(fp, (start_block+1) * BLOCK_SIZE, 0); - fp->f_pos = (start_block+1) * BLOCK_SIZE; - - fp->f_op->read(fp, buf, size, &fp->f_pos); - - /* Try minix */ - if (minixsb->s_magic == MINIX_SUPER_MAGIC || - minixsb->s_magic == MINIX_SUPER_MAGIC2) { - printk(KERN_NOTICE - "RAMDISK: Minix filesystem found at block %d\n", - start_block); - nblocks = minixsb->s_nzones << minixsb->s_log_zone_size; - goto done; - } - - /* Try ext2 */ - if (ext2sb->s_magic == cpu_to_le16(EXT2_SUPER_MAGIC)) { - printk(KERN_NOTICE - "RAMDISK: ext2 filesystem found at block %d\n", - start_block); - nblocks = le32_to_cpu(ext2sb->s_blocks_count); - goto done; - } - - printk(KERN_NOTICE - "RAMDISK: Couldn't find valid RAM disk image starting at %d.\n", - start_block); - -done: - if (fp->f_op->llseek) - fp->f_op->llseek(fp, start_block * BLOCK_SIZE, 0); - fp->f_pos = start_block * BLOCK_SIZE; - - kfree(buf); - return nblocks; -} - -/* - * This routine loads in the RAM disk image. 
- */ -static void __init rd_load_image(kdev_t device, int offset, int unit) -{ - struct inode *inode, *out_inode; - struct file infile, outfile; - struct dentry in_dentry, out_dentry; - mm_segment_t fs; - kdev_t ram_device; - int nblocks, i; - char *buf; - unsigned short rotate = 0; - unsigned short devblocks = 0; -#if !defined(CONFIG_ARCH_S390) && !defined(CONFIG_PPC_ISERIES) - char rotator[4] = { '|' , '/' , '-' , '\\' }; -#endif - ram_device = MKDEV(MAJOR_NR, unit); - - if ((inode = get_empty_inode()) == NULL) - return; - memset(&infile, 0, sizeof(infile)); - memset(&in_dentry, 0, sizeof(in_dentry)); - infile.f_mode = 1; /* read only */ - infile.f_dentry = &in_dentry; - in_dentry.d_inode = inode; - infile.f_op = &def_blk_fops; - init_special_inode(inode, S_IFBLK | S_IRUSR, kdev_t_to_nr(device)); - - if ((out_inode = get_empty_inode()) == NULL) - goto free_inode; - memset(&outfile, 0, sizeof(outfile)); - memset(&out_dentry, 0, sizeof(out_dentry)); - outfile.f_mode = 3; /* read/write */ - outfile.f_dentry = &out_dentry; - out_dentry.d_inode = out_inode; - outfile.f_op = &def_blk_fops; - init_special_inode(out_inode, S_IFBLK | S_IRUSR | S_IWUSR, kdev_t_to_nr(ram_device)); - - if (blkdev_open(inode, &infile) != 0) { - iput(out_inode); - goto free_inode; - } - if (blkdev_open(out_inode, &outfile) != 0) - goto free_inodes; - - fs = get_fs(); - set_fs(KERNEL_DS); - - nblocks = identify_ramdisk_image(device, &infile, offset); - if (nblocks < 0) - goto done; - - if (nblocks == 0) { -#ifdef BUILD_CRAMDISK - if (crd_load(&infile, &outfile) == 0) - goto successful_load; -#else - printk(KERN_NOTICE - "RAMDISK: Kernel does not support compressed " - "RAM disk images\n"); -#endif - goto done; - } - - /* - * NOTE NOTE: nblocks suppose that the blocksize is BLOCK_SIZE, so - * rd_load_image will work only with filesystem BLOCK_SIZE wide! - * So make sure to use 1k blocksize while generating ext2fs - * ramdisk-images. - */ - if (nblocks > (rd_length[unit] >> BLOCK_SIZE_BITS)) { - printk("RAMDISK: image too big! (%d/%ld blocks)\n", - nblocks, rd_length[unit] >> BLOCK_SIZE_BITS); - goto done; - } - - /* - * OK, time to copy in the data - */ - buf = kmalloc(BLOCK_SIZE, GFP_KERNEL); - if (buf == 0) { - printk(KERN_ERR "RAMDISK: could not allocate buffer\n"); - goto done; - } - - if (blk_size[MAJOR(device)]) - devblocks = blk_size[MAJOR(device)][MINOR(device)]; - -#ifdef CONFIG_BLK_DEV_INITRD - if (MAJOR(device) == MAJOR_NR && MINOR(device) == INITRD_MINOR) - devblocks = nblocks; -#endif - - if (devblocks == 0) { - printk(KERN_ERR "RAMDISK: could not determine device size\n"); - goto done; - } - - printk(KERN_NOTICE "RAMDISK: Loading %d blocks [%d disk%s] into ram disk... ", - nblocks, ((nblocks-1)/devblocks)+1, nblocks>devblocks ? "s" : ""); - for (i=0; i < nblocks; i++) { - if (i && (i % devblocks == 0)) { - printk("done disk #%d.\n", i/devblocks); - rotate = 0; - if (infile.f_op->release(inode, &infile) != 0) { - printk("Error closing the disk.\n"); - goto noclose_input; - } - printk("Please insert disk #%d and press ENTER\n", i/devblocks+1); - wait_for_keypress(); - if (blkdev_open(inode, &infile) != 0) { - printk("Error opening disk.\n"); - goto noclose_input; - } - infile.f_pos = 0; - printk("Loading disk #%d... 
", i/devblocks+1); - } - infile.f_op->read(&infile, buf, BLOCK_SIZE, &infile.f_pos); - outfile.f_op->write(&outfile, buf, BLOCK_SIZE, &outfile.f_pos); -#if !defined(CONFIG_ARCH_S390) && !defined(CONFIG_PPC_ISERIES) - if (!(i % 16)) { - printk("%c\b", rotator[rotate & 0x3]); - rotate++; - } -#endif - } - printk("done.\n"); - kfree(buf); - -successful_load: - ROOT_DEV = MKDEV(MAJOR_NR, unit); - if (ROOT_DEVICE_NAME != NULL) strcpy (ROOT_DEVICE_NAME, "rd/0"); - -done: - infile.f_op->release(inode, &infile); -noclose_input: - blkdev_close(out_inode, &outfile); - iput(inode); - iput(out_inode); - set_fs(fs); - return; -free_inodes: /* free inodes on error */ - iput(out_inode); - infile.f_op->release(inode, &infile); -free_inode: - iput(inode); -} - -#ifdef CONFIG_MAC_FLOPPY -int swim3_fd_eject(int devnum); -#endif - -static void __init rd_load_disk(int n) -{ - - if (rd_doload == 0) - return; - - if (MAJOR(ROOT_DEV) != FLOPPY_MAJOR -#ifdef CONFIG_BLK_DEV_INITRD - && MAJOR(real_root_dev) != FLOPPY_MAJOR -#endif - ) - return; - - if (rd_prompt) { -#ifdef CONFIG_BLK_DEV_FD - floppy_eject(); -#endif -#ifdef CONFIG_MAC_FLOPPY - if(MAJOR(ROOT_DEV) == FLOPPY_MAJOR) - swim3_fd_eject(MINOR(ROOT_DEV)); - else if(MAJOR(real_root_dev) == FLOPPY_MAJOR) - swim3_fd_eject(MINOR(real_root_dev)); -#endif - printk(KERN_NOTICE - "VFS: Insert root floppy disk to be loaded into RAM disk and press ENTER\n"); - wait_for_keypress(); - } - - rd_load_image(ROOT_DEV,rd_image_start, n); - -} - -void __init rd_load(void) -{ - rd_load_disk(0); -} - -void __init rd_load_secondary(void) -{ - rd_load_disk(1); -} - -#ifdef CONFIG_BLK_DEV_INITRD -void __init initrd_load(void) -{ - rd_load_image(MKDEV(MAJOR_NR, INITRD_MINOR),rd_image_start,0); -} -#endif - -#endif /* RD_LOADER */ - -#ifdef BUILD_CRAMDISK - -/* - * gzip declarations - */ - -#define OF(args) args - -#ifndef memzero -#define memzero(s, n) memset ((s), 0, (n)) -#endif - -typedef unsigned char uch; -typedef unsigned short ush; -typedef unsigned long ulg; - -#define INBUFSIZ 4096 -#define WSIZE 0x8000 /* window size--must be a power of two, and */ - /* at least 32K for zip's deflate method */ - -static uch *inbuf; -static uch *window; - -static unsigned insize; /* valid bytes in inbuf */ -static unsigned inptr; /* index of next byte to be processed in inbuf */ -static unsigned outcnt; /* bytes in output buffer */ -static int exit_code; -static long bytes_out; -static struct file *crd_infp, *crd_outfp; - -#define get_byte() (inptr < insize ? inbuf[inptr++] : fill_inbuf()) - -/* Diagnostic functions (stubbed out) */ -#define Assert(cond,msg) -#define Trace(x) -#define Tracev(x) -#define Tracevv(x) -#define Tracec(c,x) -#define Tracecv(c,x) - -#define STATIC static - -static int fill_inbuf(void); -static void flush_window(void); -static void *malloc(int size); -static void free(void *where); -static void error(char *m); -static void gzip_mark(void **); -static void gzip_release(void **); - -#include "../../lib/inflate.c" - -static void __init *malloc(int size) -{ - return kmalloc(size, GFP_KERNEL); -} - -static void __init free(void *where) -{ - kfree(where); -} - -static void __init gzip_mark(void **ptr) -{ -} - -static void __init gzip_release(void **ptr) -{ -} - - -/* =========================================================================== - * Fill the input buffer. This is called only when the buffer is empty - * and at least one byte is really needed. 
- */ -static int __init fill_inbuf(void) -{ - if (exit_code) return -1; - - insize = crd_infp->f_op->read(crd_infp, inbuf, INBUFSIZ, - &crd_infp->f_pos); - if (insize == 0) return -1; - - inptr = 1; - - return inbuf[0]; -} - -/* =========================================================================== - * Write the output window window[0..outcnt-1] and update crc and bytes_out. - * (Used for the decompressed data only.) - */ -static void __init flush_window(void) -{ - ulg c = crc; /* temporary variable */ - unsigned n; - uch *in, ch; - - crd_outfp->f_op->write(crd_outfp, window, outcnt, &crd_outfp->f_pos); - in = window; - for (n = 0; n < outcnt; n++) { - ch = *in++; - c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8); - } - crc = c; - bytes_out += (ulg)outcnt; - outcnt = 0; -} - -static void __init error(char *x) -{ - printk(KERN_ERR "%s", x); - exit_code = 1; -} - -static int __init -crd_load(struct file * fp, struct file *outfp) -{ - int result; - - insize = 0; /* valid bytes in inbuf */ - inptr = 0; /* index of next byte to be processed in inbuf */ - outcnt = 0; /* bytes in output buffer */ - exit_code = 0; - bytes_out = 0; - crc = (ulg)0xffffffffL; /* shift register contents */ - - crd_infp = fp; - crd_outfp = outfp; - inbuf = kmalloc(INBUFSIZ, GFP_KERNEL); - if (inbuf == 0) { - printk(KERN_ERR "RAMDISK: Couldn't allocate gzip buffer\n"); - return -1; - } - window = kmalloc(WSIZE, GFP_KERNEL); - if (window == 0) { - printk(KERN_ERR "RAMDISK: Couldn't allocate gzip window\n"); - kfree(inbuf); - return -1; - } - makecrc(); - result = gunzip(); - kfree(inbuf); - kfree(window); - return result; -} - -#endif /* BUILD_CRAMDISK */ - diff -urN linux-2.5.1-pre10/drivers/ide/ide-probe.c linux/drivers/ide/ide-probe.c --- linux-2.5.1-pre10/drivers/ide/ide-probe.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/ide/ide-probe.c Wed Dec 12 23:32:29 2001 @@ -597,7 +597,7 @@ int max_sectors; q->queuedata = HWGROUP(drive); - blk_init_queue(q, do_ide_request); + blk_init_queue(q, do_ide_request, &ide_lock); blk_queue_segment_boundary(q, 0xffff); /* IDE can do up to 128K per request, pdc4030 needs smaller limit */ diff -urN linux-2.5.1-pre10/drivers/ide/ide.c linux/drivers/ide/ide.c --- linux-2.5.1-pre10/drivers/ide/ide.c Wed Dec 12 23:32:26 2001 +++ linux/drivers/ide/ide.c Wed Dec 12 23:32:29 2001 @@ -177,8 +177,6 @@ /* * protects global structures etc, we want to split this into per-hwgroup * instead. - * - * anti-deadlock ordering: ide_lock -> DRIVE_LOCK */ spinlock_t ide_lock __cacheline_aligned = SPIN_LOCK_UNLOCKED; @@ -583,11 +581,9 @@ if (!end_that_request_first(rq, uptodate, nr_secs)) { add_blkdev_randomness(MAJOR(rq->rq_dev)); - spin_lock(DRIVE_LOCK(drive)); blkdev_dequeue_request(rq); hwgroup->rq = NULL; end_that_request_last(rq); - spin_unlock(DRIVE_LOCK(drive)); ret = 0; } @@ -900,11 +896,9 @@ } } - spin_lock(DRIVE_LOCK(drive)); blkdev_dequeue_request(rq); HWGROUP(drive)->rq = NULL; end_that_request_last(rq); - spin_unlock(DRIVE_LOCK(drive)); spin_unlock_irqrestore(&ide_lock, flags); } @@ -1368,7 +1362,7 @@ /* * Issue a new request to a drive from hwgroup - * Caller must have already done spin_lock_irqsave(DRIVE_LOCK(drive), ...) + * Caller must have already done spin_lock_irqsave(&ide_lock, ...) * * A hwgroup is a serialized group of IDE interfaces. Usually there is * exactly one hwif (interface) per hwgroup, but buggy controllers (eg. 
CMD640) @@ -1456,9 +1450,7 @@ /* * just continuing an interrupted request maybe */ - spin_lock(DRIVE_LOCK(drive)); rq = hwgroup->rq = elv_next_request(&drive->queue); - spin_unlock(DRIVE_LOCK(drive)); /* * Some systems have trouble with IDE IRQs arriving while @@ -1496,19 +1488,7 @@ */ void do_ide_request(request_queue_t *q) { - unsigned long flags; - - /* - * release queue lock, grab IDE global lock and restore when - * we leave... - */ - spin_unlock(&q->queue_lock); - - spin_lock_irqsave(&ide_lock, flags); ide_do_request(q->queuedata, 0); - spin_unlock_irqrestore(&ide_lock, flags); - - spin_lock(&q->queue_lock); } /* @@ -1875,7 +1855,6 @@ if (action == ide_wait) rq->waiting = &wait; spin_lock_irqsave(&ide_lock, flags); - spin_lock(DRIVE_LOCK(drive)); if (blk_queue_empty(&drive->queue) || action == ide_preempt) { if (action == ide_preempt) hwgroup->rq = NULL; @@ -1886,7 +1865,6 @@ queue_head = queue_head->next; } q->elevator.elevator_add_req_fn(q, rq, queue_head); - spin_unlock(DRIVE_LOCK(drive)); ide_do_request(hwgroup, 0); spin_unlock_irqrestore(&ide_lock, flags); if (action == ide_wait) { diff -urN linux-2.5.1-pre10/drivers/md/linear.c linux/drivers/md/linear.c --- linux-2.5.1-pre10/drivers/md/linear.c Wed Dec 12 23:32:27 2001 +++ linux/drivers/md/linear.c Wed Dec 12 23:32:29 2001 @@ -189,7 +189,7 @@ status: linear_status, }; -static int md__init linear_init (void) +static int __init linear_init (void) { return register_md_personality (LINEAR, &linear_personality); } diff -urN linux-2.5.1-pre10/drivers/md/md.c linux/drivers/md/md.c --- linux-2.5.1-pre10/drivers/md/md.c Wed Dec 12 23:32:27 2001 +++ linux/drivers/md/md.c Wed Dec 12 23:32:29 2001 @@ -130,7 +130,7 @@ /* * Enables to iterate over all existing md arrays */ -static MD_LIST_HEAD(all_mddevs); +static LIST_HEAD(all_mddevs); /* * The mapping between kdev and mddev is not necessary a simple @@ -201,8 +201,8 @@ init_MUTEX(&mddev->reconfig_sem); init_MUTEX(&mddev->recovery_sem); init_MUTEX(&mddev->resync_sem); - MD_INIT_LIST_HEAD(&mddev->disks); - MD_INIT_LIST_HEAD(&mddev->all_mddevs); + INIT_LIST_HEAD(&mddev->disks); + INIT_LIST_HEAD(&mddev->all_mddevs); atomic_set(&mddev->active, 0); /* @@ -211,7 +211,7 @@ * if necessary. 
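The per-minor mapping maintained here is what keeps the kdev_t to mddev_t lookup cheap; in miniature (helper name illustrative, mddev_map[] as used and exported further down):

static inline mddev_t *sketch_kdev_to_mddev(kdev_t dev)
{
	/* mddev_map[] is indexed by md unit (minor) number */
	return mddev_map[MINOR(dev)].mddev;
}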
*/ add_mddev_mapping(mddev, dev, 0); - md_list_add(&mddev->all_mddevs, &all_mddevs); + list_add(&mddev->all_mddevs, &all_mddevs); MOD_INC_USE_COUNT; @@ -221,7 +221,7 @@ mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr) { mdk_rdev_t * rdev; - struct md_list_head *tmp; + struct list_head *tmp; ITERATE_RDEV(mddev,rdev,tmp) { if (rdev->desc_nr == nr) @@ -232,7 +232,7 @@ mdk_rdev_t * find_rdev(mddev_t * mddev, kdev_t dev) { - struct md_list_head *tmp; + struct list_head *tmp; mdk_rdev_t *rdev; ITERATE_RDEV(mddev,rdev,tmp) { @@ -242,17 +242,17 @@ return NULL; } -static MD_LIST_HEAD(device_names); +static LIST_HEAD(device_names); char * partition_name(kdev_t dev) { struct gendisk *hd; static char nomem [] = ""; dev_name_t *dname; - struct md_list_head *tmp = device_names.next; + struct list_head *tmp = device_names.next; while (tmp != &device_names) { - dname = md_list_entry(tmp, dev_name_t, list); + dname = list_entry(tmp, dev_name_t, list); if (dname->dev == dev) return dname->name; tmp = tmp->next; @@ -275,8 +275,8 @@ } dname->dev = dev; - MD_INIT_LIST_HEAD(&dname->list); - md_list_add(&dname->list, &device_names); + INIT_LIST_HEAD(&dname->list); + list_add(&dname->list, &device_names); return dname->name; } @@ -311,7 +311,7 @@ { unsigned int mask; mdk_rdev_t * rdev; - struct md_list_head *tmp; + struct list_head *tmp; if (!mddev->sb) { MD_BUG(); @@ -341,7 +341,7 @@ { int i, c; mdk_rdev_t *rdev; - struct md_list_head *tmp; + struct list_head *tmp; /* * First, all devices must be fully functional @@ -435,7 +435,7 @@ mddev->sb = (mdp_super_t *) __get_free_page (GFP_KERNEL); if (!mddev->sb) return -ENOMEM; - md_clear_page(mddev->sb); + clear_page(mddev->sb); return 0; } @@ -449,7 +449,7 @@ printk(OUT_OF_MEM); return -EINVAL; } - md_clear_page(rdev->sb); + clear_page(rdev->sb); return 0; } @@ -564,7 +564,7 @@ static mdk_rdev_t * match_dev_unit(mddev_t *mddev, kdev_t dev) { - struct md_list_head *tmp; + struct list_head *tmp; mdk_rdev_t *rdev; ITERATE_RDEV(mddev,rdev,tmp) @@ -576,7 +576,7 @@ static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2) { - struct md_list_head *tmp; + struct list_head *tmp; mdk_rdev_t *rdev; ITERATE_RDEV(mddev1,rdev,tmp) @@ -586,8 +586,8 @@ return 0; } -static MD_LIST_HEAD(all_raid_disks); -static MD_LIST_HEAD(pending_raid_disks); +static LIST_HEAD(all_raid_disks); +static LIST_HEAD(pending_raid_disks); static void bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev) { @@ -605,7 +605,7 @@ mdidx(mddev), partition_name(rdev->dev), partition_name(same_pdev->dev)); - md_list_add(&rdev->same_set, &mddev->disks); + list_add(&rdev->same_set, &mddev->disks); rdev->mddev = mddev; mddev->nb_dev++; printk(KERN_INFO "md: bind<%s,%d>\n", partition_name(rdev->dev), mddev->nb_dev); @@ -617,8 +617,8 @@ MD_BUG(); return; } - md_list_del(&rdev->same_set); - MD_INIT_LIST_HEAD(&rdev->same_set); + list_del(&rdev->same_set); + INIT_LIST_HEAD(&rdev->same_set); rdev->mddev->nb_dev--; printk(KERN_INFO "md: unbind<%s,%d>\n", partition_name(rdev->dev), rdev->mddev->nb_dev); @@ -664,13 +664,13 @@ MD_BUG(); unlock_rdev(rdev); free_disk_sb(rdev); - md_list_del(&rdev->all); - MD_INIT_LIST_HEAD(&rdev->all); + list_del(&rdev->all); + INIT_LIST_HEAD(&rdev->all); if (rdev->pending.next != &rdev->pending) { printk(KERN_INFO "md: (%s was pending)\n", partition_name(rdev->dev)); - md_list_del(&rdev->pending); - MD_INIT_LIST_HEAD(&rdev->pending); + list_del(&rdev->pending); + INIT_LIST_HEAD(&rdev->pending); } #ifndef MODULE md_autodetect_dev(rdev->dev); @@ -688,7 +688,7 @@ static void 
export_array(mddev_t *mddev) { - struct md_list_head *tmp; + struct list_head *tmp; mdk_rdev_t *rdev; mdp_super_t *sb = mddev->sb; @@ -723,14 +723,14 @@ * Make sure nobody else is using this mddev * (careful, we rely on the global kernel lock here) */ - while (md_atomic_read(&mddev->resync_sem.count) != 1) + while (atomic_read(&mddev->resync_sem.count) != 1) schedule(); - while (md_atomic_read(&mddev->recovery_sem.count) != 1) + while (atomic_read(&mddev->recovery_sem.count) != 1) schedule(); del_mddev_mapping(mddev, MKDEV(MD_MAJOR, mdidx(mddev))); - md_list_del(&mddev->all_mddevs); - MD_INIT_LIST_HEAD(&mddev->all_mddevs); + list_del(&mddev->all_mddevs); + INIT_LIST_HEAD(&mddev->all_mddevs); kfree(mddev); MOD_DEC_USE_COUNT; } @@ -793,7 +793,7 @@ void md_print_devices(void) { - struct md_list_head *tmp, *tmp2; + struct list_head *tmp, *tmp2; mdk_rdev_t *rdev; mddev_t *mddev; @@ -871,12 +871,12 @@ static mdk_rdev_t * find_rdev_all(kdev_t dev) { - struct md_list_head *tmp; + struct list_head *tmp; mdk_rdev_t *rdev; tmp = all_raid_disks.next; while (tmp != &all_raid_disks) { - rdev = md_list_entry(tmp, mdk_rdev_t, all); + rdev = list_entry(tmp, mdk_rdev_t, all); if (rdev->dev == dev) return rdev; tmp = tmp->next; @@ -980,7 +980,7 @@ { mdk_rdev_t *rdev; mdp_super_t *sb; - struct md_list_head *tmp; + struct list_head *tmp; ITERATE_RDEV(mddev,rdev,tmp) { if (rdev->faulty || rdev->alias_device) @@ -996,15 +996,15 @@ int md_update_sb(mddev_t * mddev) { int err, count = 100; - struct md_list_head *tmp; + struct list_head *tmp; mdk_rdev_t *rdev; repeat: mddev->sb->utime = CURRENT_TIME; - if ((++mddev->sb->events_lo)==0) + if (!(++mddev->sb->events_lo)) ++mddev->sb->events_hi; - if ((mddev->sb->events_lo|mddev->sb->events_hi)==0) { + if (!(mddev->sb->events_lo | mddev->sb->events_hi)) { /* * oops, this 64-bit counter should never wrap. 
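The increment above keeps a 64-bit event count in two 32-bit superblock fields, carrying by hand; as a standalone sketch (helper name illustrative):

static inline void sketch_events_inc(mdp_super_t *sb)
{
	if (!(++sb->events_lo))		/* low half wrapped to zero */
		++sb->events_hi;	/* propagate the carry */
}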
* Either we are in around ~1 trillion A.C., assuming @@ -1128,8 +1128,8 @@ rdev->desc_nr = -1; } } - md_list_add(&rdev->all, &all_raid_disks); - MD_INIT_LIST_HEAD(&rdev->pending); + list_add(&rdev->all, &all_raid_disks); + INIT_LIST_HEAD(&rdev->pending); if (rdev->faulty && rdev->sb) free_disk_sb(rdev); @@ -1167,7 +1167,7 @@ static int analyze_sbs(mddev_t * mddev) { int out_of_date = 0, i, first; - struct md_list_head *tmp, *tmp2; + struct list_head *tmp, *tmp2; mdk_rdev_t *rdev, *rdev2, *freshest; mdp_super_t *sb; @@ -1225,7 +1225,7 @@ */ if (calc_sb_csum(rdev->sb) != rdev->sb->sb_csum) { if (rdev->sb->events_lo || rdev->sb->events_hi) - if ((rdev->sb->events_lo--)==0) + if (!(rdev->sb->events_lo--)) rdev->sb->events_hi--; } @@ -1513,7 +1513,7 @@ int data_disks = 0, persistent; unsigned int readahead; mdp_super_t *sb = mddev->sb; - struct md_list_head *tmp; + struct list_head *tmp; mdk_rdev_t *rdev; /* @@ -1572,7 +1572,7 @@ md_size[mdidx(mddev)] = sb->size * data_disks; readahead = MD_READAHEAD; - if ((sb->level == 0) || (sb->level == 4) || (sb->level == 5)) { + if (!sb->level || (sb->level == 4) || (sb->level == 5)) { readahead = (mddev->sb->chunk_size>>PAGE_SHIFT) * 4 * data_disks; if (readahead < data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2) readahead = data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2; @@ -1608,7 +1608,7 @@ { int pnum, err; int chunk_size; - struct md_list_head *tmp; + struct list_head *tmp; mdk_rdev_t *rdev; @@ -1873,7 +1873,7 @@ static void autorun_array(mddev_t *mddev) { mdk_rdev_t *rdev; - struct md_list_head *tmp; + struct list_head *tmp; int err; if (mddev->disks.prev == &mddev->disks) { @@ -1913,8 +1913,8 @@ */ static void autorun_devices(kdev_t countdev) { - struct md_list_head candidates; - struct md_list_head *tmp; + struct list_head candidates; + struct list_head *tmp; mdk_rdev_t *rdev0, *rdev; mddev_t *mddev; kdev_t md_kdev; @@ -1922,11 +1922,11 @@ printk(KERN_INFO "md: autorun ...\n"); while (pending_raid_disks.next != &pending_raid_disks) { - rdev0 = md_list_entry(pending_raid_disks.next, + rdev0 = list_entry(pending_raid_disks.next, mdk_rdev_t, pending); printk(KERN_INFO "md: considering %s ...\n", partition_name(rdev0->dev)); - MD_INIT_LIST_HEAD(&candidates); + INIT_LIST_HEAD(&candidates); ITERATE_RDEV_PENDING(rdev,tmp) { if (uuid_equal(rdev0, rdev)) { if (!sb_equal(rdev0->sb, rdev->sb)) { @@ -1936,8 +1936,8 @@ continue; } printk(KERN_INFO "md: adding %s ...\n", partition_name(rdev->dev)); - md_list_del(&rdev->pending); - md_list_add(&rdev->pending, &candidates); + list_del(&rdev->pending); + list_add(&rdev->pending, &candidates); } } /* @@ -1964,8 +1964,8 @@ printk(KERN_INFO "md: created md%d\n", mdidx(mddev)); ITERATE_RDEV_GENERIC(candidates,pending,rdev,tmp) { bind_rdev_to_array(rdev, mddev); - md_list_del(&rdev->pending); - MD_INIT_LIST_HEAD(&rdev->pending); + list_del(&rdev->pending); + INIT_LIST_HEAD(&rdev->pending); } autorun_array(mddev); } @@ -2025,7 +2025,7 @@ partition_name(startdev)); goto abort; } - md_list_add(&start_rdev->pending, &pending_raid_disks); + list_add(&start_rdev->pending, &pending_raid_disks); sb = start_rdev->sb; @@ -2058,7 +2058,7 @@ MD_BUG(); goto abort; } - md_list_add(&rdev->pending, &pending_raid_disks); + list_add(&rdev->pending, &pending_raid_disks); } /* @@ -2091,7 +2091,7 @@ ver.minor = MD_MINOR_VERSION; ver.patchlevel = MD_PATCHLEVEL_VERSION; - if (md_copy_to_user(arg, &ver, sizeof(ver))) + if (copy_to_user(arg, &ver, sizeof(ver))) return -EFAULT; return 0; @@ -2128,7 +2128,7 @@ SET_FROM_SB(layout); 
SET_FROM_SB(chunk_size); - if (md_copy_to_user(arg, &info, sizeof(info))) + if (copy_to_user(arg, &info, sizeof(info))) return -EFAULT; return 0; @@ -2144,7 +2144,7 @@ if (!mddev->sb) return -EINVAL; - if (md_copy_from_user(&info, arg, sizeof(info))) + if (copy_from_user(&info, arg, sizeof(info))) return -EFAULT; nr = info.number; @@ -2156,7 +2156,7 @@ SET_FROM_SB(raid_disk); SET_FROM_SB(state); - if (md_copy_to_user(arg, &info, sizeof(info))) + if (copy_to_user(arg, &info, sizeof(info))) return -EFAULT; return 0; @@ -2191,7 +2191,7 @@ return -EINVAL; } if (mddev->nb_dev) { - mdk_rdev_t *rdev0 = md_list_entry(mddev->disks.next, + mdk_rdev_t *rdev0 = list_entry(mddev->disks.next, mdk_rdev_t, same_set); if (!uuid_equal(rdev0, rdev)) { printk(KERN_WARNING "md: %s has different UUID to %s\n", @@ -2223,7 +2223,7 @@ SET_SB(raid_disk); SET_SB(state); - if ((info->state & (1<<MD_DISK_FAULTY))==0) { + if (!(info->state & (1<<MD_DISK_FAULTY))) { ... dev = inode->i_rdev; @@ -2604,12 +2604,12 @@ MD_BUG(); goto abort; } - err = md_put_user(md_hd_struct[minor].nr_sects, + err = put_user(md_hd_struct[minor].nr_sects, (unsigned long *) arg); goto done; case BLKGETSIZE64: /* Return device size */ - err = md_put_user((u64)md_hd_struct[minor].nr_sects << 9, + err = put_user((u64)md_hd_struct[minor].nr_sects << 9, (u64 *) arg); goto done; @@ -2618,7 +2618,7 @@ case BLKFLSBUF: case BLKBSZGET: case BLKBSZSET: - err = blk_ioctl (dev, cmd, arg); + err = blk_ioctl(dev, cmd, arg); goto abort; default:; @@ -2670,7 +2670,7 @@ } if (arg) { mdu_array_info_t info; - if (md_copy_from_user(&info, (void*)arg, sizeof(info))) { + if (copy_from_user(&info, (void*)arg, sizeof(info))) { err = -EFAULT; goto abort_unlock; } @@ -2753,17 +2753,17 @@ err = -EINVAL; goto abort_unlock; } - err = md_put_user (2, (char *) &loc->heads); + err = put_user (2, (char *) &loc->heads); if (err) goto abort_unlock; - err = md_put_user (4, (char *) &loc->sectors); + err = put_user (4, (char *) &loc->sectors); if (err) goto abort_unlock; - err = md_put_user (md_hd_struct[mdidx(mddev)].nr_sects/8, + err = put_user (md_hd_struct[mdidx(mddev)].nr_sects/8, (short *) &loc->cylinders); if (err) goto abort_unlock; - err = md_put_user (get_start_sect(dev), + err = put_user (get_start_sect(dev), (long *) &loc->start); goto done_unlock; } @@ -2787,7 +2787,7 @@ case ADD_NEW_DISK: { mdu_disk_info_t info; - if (md_copy_from_user(&info, (void*)arg, sizeof(info))) + if (copy_from_user(&info, (void*)arg, sizeof(info))) err = -EFAULT; else err = add_new_disk(mddev, &info); @@ -2828,7 +2828,7 @@ { /* The data is never used....
mdu_param_t param; - err = md_copy_from_user(¶m, (mdu_param_t *)arg, + err = copy_from_user(¶m, (mdu_param_t *)arg, sizeof(param)); if (err) goto abort_unlock; @@ -2887,7 +2887,7 @@ return 0; } -static struct block_device_operations md_fops= +static struct block_device_operations md_fops = { owner: THIS_MODULE, open: md_open, @@ -2896,11 +2896,18 @@ }; +static inline void flush_curr_signals(void) +{ + spin_lock(¤t->sigmask_lock); + flush_signals(current); + spin_unlock(¤t->sigmask_lock); +} + int md_thread(void * arg) { mdk_thread_t *thread = arg; - md_lock_kernel(); + lock_kernel(); /* * Detach thread @@ -2909,8 +2916,9 @@ daemonize(); sprintf(current->comm, thread->name); - md_init_signals(); - md_flush_signals(); + current->exit_signal = SIGCHLD; + siginitsetinv(¤t->blocked, sigmask(SIGKILL)); + flush_curr_signals(); thread->tsk = current; /* @@ -2926,7 +2934,7 @@ */ current->policy = SCHED_OTHER; current->nice = -20; - md_unlock_kernel(); + unlock_kernel(); complete(thread->event); while (thread->run) { @@ -2949,8 +2957,8 @@ run(thread->data); run_task_queue(&tq_disk); } - if (md_signal_pending(current)) - md_flush_signals(); + if (signal_pending(current)) + flush_curr_signals(); } complete(thread->event); return 0; @@ -2976,7 +2984,7 @@ return NULL; memset(thread, 0, sizeof(mdk_thread_t)); - md_init_waitqueue_head(&thread->wqueue); + init_waitqueue_head(&thread->wqueue); init_completion(&event); thread->event = &event; @@ -3064,7 +3072,7 @@ { int sz = 0, i = 0; mdk_rdev_t *rdev; - struct md_list_head *tmp; + struct list_head *tmp; sz += sprintf(page + sz, "unused devices: "); @@ -3150,7 +3158,7 @@ int count, int *eof, void *data) { int sz = 0, j, size; - struct md_list_head *tmp, *tmp2; + struct list_head *tmp, *tmp2; mdk_rdev_t *rdev; mddev_t *mddev; @@ -3207,7 +3215,7 @@ if (mddev->curr_resync) { sz += status_resync (page+sz, mddev); } else { - if (md_atomic_read(&mddev->resync_sem.count) != 1) + if (atomic_read(&mddev->resync_sem.count) != 1) sz += sprintf(page + sz, " resync=DELAYED"); } sz += sprintf(page + sz, "\n"); @@ -3251,7 +3259,7 @@ mdp_super_t *sb = mddev->sb; mdp_disk_t *disk; mdk_rdev_t *rdev; - struct md_list_head *tmp; + struct list_head *tmp; ITERATE_RDEV(mddev,rdev,tmp) { if (rdev->faulty) @@ -3288,7 +3296,7 @@ static int is_mddev_idle(mddev_t *mddev) { mdk_rdev_t * rdev; - struct md_list_head *tmp; + struct list_head *tmp; int idle; unsigned long curr_events; @@ -3311,7 +3319,7 @@ return idle; } -MD_DECLARE_WAIT_QUEUE_HEAD(resync_wait); +DECLARE_WAIT_QUEUE_HEAD(resync_wait); void md_done_sync(mddev_t *mddev, int blocks, int ok) { @@ -3333,7 +3341,7 @@ unsigned long mark[SYNC_MARKS]; unsigned long mark_cnt[SYNC_MARKS]; int last_mark,m; - struct md_list_head *tmp; + struct list_head *tmp; unsigned long last_check; @@ -3356,8 +3364,8 @@ } if (serialize) { interruptible_sleep_on(&resync_wait); - if (md_signal_pending(current)) { - md_flush_signals(); + if (signal_pending(current)) { + flush_curr_signals(); err = -EINTR; goto out; } @@ -3365,8 +3373,7 @@ } mddev->curr_resync = 1; - - max_sectors = mddev->sb->size<<1; + max_sectors = mddev->sb->size << 1; printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev)); printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed: %d KB/sec/disc.\n", @@ -3403,7 +3410,6 @@ int sectors; sectors = mddev->pers->sync_request(mddev, j); - if (sectors < 0) { err = sectors; goto out; @@ -3432,13 +3438,13 @@ } - if (md_signal_pending(current)) { + if (signal_pending(current)) { /* * got a signal, exit. 
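The md_thread() setup above follows the stock recipe for a kernel daemon that only SIGKILL can reach; condensed into one sketch (thread body illustrative):

static int sketch_daemon(void *unused)
{
	lock_kernel();
	daemonize();				/* detach from the spawning process */
	sprintf(current->comm, "sketchd");
	current->exit_signal = SIGCHLD;
	siginitsetinv(&current->blocked, sigmask(SIGKILL));	/* block everything else */
	flush_curr_signals();
	unlock_kernel();

	while (!signal_pending(current))
		schedule();			/* a real thread sleeps on its waitqueue */
	return 0;
}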
*/ mddev->curr_resync = 0; printk(KERN_INFO "md: md_do_sync() got signal ... exiting\n"); - md_flush_signals(); + flush_curr_signals(); err = -EINTR; goto out; } @@ -3451,7 +3457,7 @@ * about not overloading the IO subsystem. (things like an * e2fsck being done on the RAID array should execute fast) */ - if (md_need_resched(current)) + if (current->need_resched) schedule(); currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1; @@ -3462,7 +3468,7 @@ if ((currspeed > sysctl_speed_limit_max) || !is_mddev_idle(mddev)) { current->state = TASK_INTERRUPTIBLE; - md_schedule_timeout(HZ/4); + schedule_timeout(HZ/4); goto repeat; } } else @@ -3474,7 +3480,7 @@ * this also signals 'finished resyncing' to md_stop */ out: - wait_event(mddev->recovery_wait, atomic_read(&mddev->recovery_active)==0); + wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); up(&mddev->resync_sem); out_nolock: mddev->curr_resync = 0; @@ -3497,7 +3503,7 @@ mddev_t *mddev; mdp_super_t *sb; mdp_disk_t *spare; - struct md_list_head *tmp; + struct list_head *tmp; printk(KERN_INFO "md: recovery thread got woken up ...\n"); restart: @@ -3581,13 +3587,13 @@ int md_notify_reboot(struct notifier_block *this, unsigned long code, void *x) { - struct md_list_head *tmp; + struct list_head *tmp; mddev_t *mddev; - if ((code == MD_SYS_DOWN) || (code == MD_SYS_HALT) - || (code == MD_SYS_POWER_OFF)) { + if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) { printk(KERN_INFO "md: stopping all md devices.\n"); + return NOTIFY_DONE; ITERATE_MDDEV(mddev,tmp) do_md_stop (mddev, 1); @@ -3597,7 +3603,7 @@ * right place to handle this issue is the given * driver, we do want to have a safe RAID driver ... */ - md_mdelay(1000*1); + mdelay(1000*1); } return NOTIFY_DONE; } @@ -3628,7 +3634,7 @@ #endif } -int md__init md_init(void) +int __init md_init(void) { static char * name = "mdrecoveryd"; int minor; @@ -3665,7 +3671,7 @@ printk(KERN_ALERT "md: bug: couldn't allocate md_recovery_thread\n"); - md_register_reboot_notifier(&md_notifier); + register_reboot_notifier(&md_notifier); raid_table_header = register_sysctl_table(raid_root_table, 1); md_geninit(); @@ -3687,7 +3693,7 @@ struct { int set; int noautodetect; -} raid_setup_args md__initdata; +} raid_setup_args __initdata; /* * Searches all registered partitions for autorun RAID arrays @@ -3730,7 +3736,7 @@ MD_BUG(); continue; } - md_list_add(&rdev->pending, &pending_raid_disks); + list_add(&rdev->pending, &pending_raid_disks); } dev_cnt = 0; @@ -3742,7 +3748,7 @@ int pers[MAX_MD_DEVS]; int chunk[MAX_MD_DEVS]; char *device_names[MAX_MD_DEVS]; -} md_setup_args md__initdata; +} md_setup_args __initdata; /* * Parse the command-line parameters given our kernel, but do not @@ -3764,7 +3770,7 @@ * Shifted name_to_kdev_t() and related operations to md_set_drive() * for later execution. Rewrote section to make devfs compatible. */ -static int md__init md_setup(char *str) +static int __init md_setup(char *str) { int minor, level, factor, fault; char *pername = ""; @@ -3783,7 +3789,7 @@ } switch (get_option(&str, &level)) { /* RAID Personality */ case 2: /* could be 0 or -1.. 
*/ - if (level == 0 || level == -1) { + if (!level || level == -1) { if (get_option(&str, &factor) != 2 || /* Chunk Size */ get_option(&str, &fault) != 2) { printk(KERN_WARNING "md: Too few arguments supplied to md=.\n"); @@ -3825,8 +3831,8 @@ return 1; } -extern kdev_t name_to_kdev_t(char *line) md__init; -void md__init md_setup_drive(void) +extern kdev_t name_to_kdev_t(char *line) __init; +void __init md_setup_drive(void) { int minor, i; kdev_t dev; @@ -3838,7 +3844,8 @@ char *devname; mdu_disk_info_t dinfo; - if ((devname = md_setup_args.device_names[minor]) == 0) continue; + if (!(devname = md_setup_args.device_names[minor])) + continue; for (i = 0; i < MD_SB_DISKS && devname != 0; i++) { @@ -3857,7 +3864,7 @@ devfs_get_maj_min(handle, &major, &minor); dev = MKDEV(major, minor); } - if (dev == 0) { + if (!dev) { printk(KERN_WARNING "md: Unknown device name: %s\n", devname); break; } @@ -3869,7 +3876,7 @@ } devices[i] = 0; - if (md_setup_args.device_set[minor] == 0) + if (!md_setup_args.device_set[minor]) continue; if (mddev_map[minor].mddev) { @@ -3933,7 +3940,7 @@ } } -static int md__init raid_setup(char *str) +static int __init raid_setup(char *str) { int len, pos; @@ -3947,7 +3954,7 @@ wlen = (comma-str)-pos; else wlen = (len-1)-pos; - if (strncmp(str, "noautodetect", wlen) == 0) + if (!strncmp(str, "noautodetect", wlen)) raid_setup_args.noautodetect = 1; pos += wlen+1; } @@ -3955,7 +3962,7 @@ return 1; } -int md__init md_run_setup(void) +int __init md_run_setup(void) { if (raid_setup_args.noautodetect) printk(KERN_INFO "md: Skipping autodetection of RAID arrays. (raid=noautodetect)\n"); @@ -4008,23 +4015,23 @@ } #endif -MD_EXPORT_SYMBOL(md_size); -MD_EXPORT_SYMBOL(register_md_personality); -MD_EXPORT_SYMBOL(unregister_md_personality); -MD_EXPORT_SYMBOL(partition_name); -MD_EXPORT_SYMBOL(md_error); -MD_EXPORT_SYMBOL(md_do_sync); -MD_EXPORT_SYMBOL(md_sync_acct); -MD_EXPORT_SYMBOL(md_done_sync); -MD_EXPORT_SYMBOL(md_recover_arrays); -MD_EXPORT_SYMBOL(md_register_thread); -MD_EXPORT_SYMBOL(md_unregister_thread); -MD_EXPORT_SYMBOL(md_update_sb); -MD_EXPORT_SYMBOL(md_wakeup_thread); -MD_EXPORT_SYMBOL(md_print_devices); -MD_EXPORT_SYMBOL(find_rdev_nr); -MD_EXPORT_SYMBOL(md_interrupt_thread); -MD_EXPORT_SYMBOL(mddev_map); -MD_EXPORT_SYMBOL(md_check_ordering); -MD_EXPORT_SYMBOL(get_spare); +EXPORT_SYMBOL(md_size); +EXPORT_SYMBOL(register_md_personality); +EXPORT_SYMBOL(unregister_md_personality); +EXPORT_SYMBOL(partition_name); +EXPORT_SYMBOL(md_error); +EXPORT_SYMBOL(md_do_sync); +EXPORT_SYMBOL(md_sync_acct); +EXPORT_SYMBOL(md_done_sync); +EXPORT_SYMBOL(md_recover_arrays); +EXPORT_SYMBOL(md_register_thread); +EXPORT_SYMBOL(md_unregister_thread); +EXPORT_SYMBOL(md_update_sb); +EXPORT_SYMBOL(md_wakeup_thread); +EXPORT_SYMBOL(md_print_devices); +EXPORT_SYMBOL(find_rdev_nr); +EXPORT_SYMBOL(md_interrupt_thread); +EXPORT_SYMBOL(mddev_map); +EXPORT_SYMBOL(md_check_ordering); +EXPORT_SYMBOL(get_spare); diff -urN linux-2.5.1-pre10/drivers/md/raid0.c linux/drivers/md/raid0.c --- linux-2.5.1-pre10/drivers/md/raid0.c Wed Dec 12 23:32:27 2001 +++ linux/drivers/md/raid0.c Wed Dec 12 23:32:29 2001 @@ -334,7 +334,7 @@ status: raid0_status, }; -static int md__init raid0_init (void) +static int __init raid0_init (void) { return register_md_personality (RAID0, &raid0_personality); } diff -urN linux-2.5.1-pre10/drivers/md/raid1.c linux/drivers/md/raid1.c --- linux-2.5.1-pre10/drivers/md/raid1.c Wed Oct 17 14:21:00 2001 +++ linux/drivers/md/raid1.c Wed Dec 12 23:32:29 2001 @@ -1,7 +1,7 @@ /* * raid1.c : 
Multiple Devices driver for Linux * - * Copyright (C) 1999, 2000 Ingo Molnar, Red Hat + * Copyright (C) 1999, 2000, 2001 Ingo Molnar, Red Hat * * Copyright (C) 1996, 1997, 1998 Ingo Molnar, Miguel de Icaza, Gadi Oxman * @@ -22,330 +22,208 @@ * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#include -#include #include -#include #define MAJOR_NR MD_MAJOR #define MD_DRIVER #define MD_PERSONALITY #define MAX_WORK_PER_DISK 128 - -#define NR_RESERVED_BUFS 32 - - /* - * The following can be used to debug the driver + * Number of guaranteed r1bios in case of extreme VM load: */ -#define RAID1_DEBUG 0 - -#if RAID1_DEBUG -#define PRINTK(x...) printk(x) -#define inline -#define __inline__ -#else -#define PRINTK(x...) do { } while (0) -#endif - +#define NR_RAID1_BIOS 256 static mdk_personality_t raid1_personality; -static md_spinlock_t retry_list_lock = MD_SPIN_LOCK_UNLOCKED; -struct raid1_bh *raid1_retry_list = NULL, **raid1_retry_tail; +static spinlock_t retry_list_lock = SPIN_LOCK_UNLOCKED; +static LIST_HEAD(retry_list_head); -static struct buffer_head *raid1_alloc_bh(raid1_conf_t *conf, int cnt) +static inline void check_all_w_bios_empty(r1bio_t *r1_bio) { - /* return a linked list of "cnt" struct buffer_heads. - * don't take any off the free list unless we know we can - * get all we need, otherwise we could deadlock - */ - struct buffer_head *bh=NULL; - - while(cnt) { - struct buffer_head *t; - md_spin_lock_irq(&conf->device_lock); - if (!conf->freebh_blocked && conf->freebh_cnt >= cnt) - while (cnt) { - t = conf->freebh; - conf->freebh = t->b_next; - t->b_next = bh; - bh = t; - t->b_state = 0; - conf->freebh_cnt--; - cnt--; - } - md_spin_unlock_irq(&conf->device_lock); - if (cnt == 0) - break; - t = kmem_cache_alloc(bh_cachep, SLAB_NOIO); - if (t) { - t->b_next = bh; - bh = t; - cnt--; - } else { - PRINTK("raid1: waiting for %d bh\n", cnt); - conf->freebh_blocked = 1; - wait_disk_event(conf->wait_buffer, - !conf->freebh_blocked || - conf->freebh_cnt > conf->raid_disks * NR_RESERVED_BUFS/2); - conf->freebh_blocked = 0; - } - } - return bh; + int i; + + return; + for (i = 0; i < MD_SB_DISKS; i++) + if (r1_bio->write_bios[i]) + BUG(); } -static inline void raid1_free_bh(raid1_conf_t *conf, struct buffer_head *bh) +static inline void check_all_bios_empty(r1bio_t *r1_bio) { - unsigned long flags; - spin_lock_irqsave(&conf->device_lock, flags); - while (bh) { - struct buffer_head *t = bh; - bh=bh->b_next; - if (t->b_pprev == NULL) - kmem_cache_free(bh_cachep, t); - else { - t->b_next= conf->freebh; - conf->freebh = t; - conf->freebh_cnt++; - } - } - spin_unlock_irqrestore(&conf->device_lock, flags); - wake_up(&conf->wait_buffer); + return; + if (r1_bio->read_bio) + BUG(); + check_all_w_bios_empty(r1_bio); } -static int raid1_grow_bh(raid1_conf_t *conf, int cnt) +static void * r1bio_pool_alloc(int gfp_flags, void *data) { - /* allocate cnt buffer_heads, possibly less if kmalloc fails */ - int i = 0; + r1bio_t *r1_bio; - while (i < cnt) { - struct buffer_head *bh; - bh = kmem_cache_alloc(bh_cachep, SLAB_KERNEL); - if (!bh) break; + r1_bio = kmalloc(sizeof(r1bio_t), gfp_flags); + if (r1_bio) + memset(r1_bio, 0, sizeof(*r1_bio)); - md_spin_lock_irq(&conf->device_lock); - bh->b_pprev = &conf->freebh; - bh->b_next = conf->freebh; - conf->freebh = bh; - conf->freebh_cnt++; - md_spin_unlock_irq(&conf->device_lock); - - i++; - } - return i; + return r1_bio; } -static void raid1_shrink_bh(raid1_conf_t *conf) +static void r1bio_pool_free(void *r1_bio, void *data) { - /* discard all 
buffer_heads */ - - md_spin_lock_irq(&conf->device_lock); - while (conf->freebh) { - struct buffer_head *bh = conf->freebh; - conf->freebh = bh->b_next; - kmem_cache_free(bh_cachep, bh); - conf->freebh_cnt--; - } - md_spin_unlock_irq(&conf->device_lock); + check_all_bios_empty(r1_bio); + kfree(r1_bio); } - -static struct raid1_bh *raid1_alloc_r1bh(raid1_conf_t *conf) +#define RESYNC_BLOCK_SIZE (64*1024) +#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE) +#define RESYNC_WINDOW (2048*1024) + +static void * r1buf_pool_alloc(int gfp_flags, void *data) { - struct raid1_bh *r1_bh = NULL; + conf_t *conf = data; + struct page *page; + r1bio_t *r1_bio; + struct bio *bio; + int i, j; - do { - md_spin_lock_irq(&conf->device_lock); - if (!conf->freer1_blocked && conf->freer1) { - r1_bh = conf->freer1; - conf->freer1 = r1_bh->next_r1; - conf->freer1_cnt--; - r1_bh->next_r1 = NULL; - r1_bh->state = (1 << R1BH_PreAlloc); - r1_bh->bh_req.b_state = 0; - } - md_spin_unlock_irq(&conf->device_lock); - if (r1_bh) - return r1_bh; - r1_bh = (struct raid1_bh *) kmalloc(sizeof(struct raid1_bh), GFP_NOIO); - if (r1_bh) { - memset(r1_bh, 0, sizeof(*r1_bh)); - return r1_bh; - } - conf->freer1_blocked = 1; - wait_disk_event(conf->wait_buffer, - !conf->freer1_blocked || - conf->freer1_cnt > NR_RESERVED_BUFS/2 - ); - conf->freer1_blocked = 0; - } while (1); -} - -static inline void raid1_free_r1bh(struct raid1_bh *r1_bh) -{ - struct buffer_head *bh = r1_bh->mirror_bh_list; - raid1_conf_t *conf = mddev_to_conf(r1_bh->mddev); - - r1_bh->mirror_bh_list = NULL; - - if (test_bit(R1BH_PreAlloc, &r1_bh->state)) { - unsigned long flags; - spin_lock_irqsave(&conf->device_lock, flags); - r1_bh->next_r1 = conf->freer1; - conf->freer1 = r1_bh; - conf->freer1_cnt++; - spin_unlock_irqrestore(&conf->device_lock, flags); - /* don't need to wakeup wait_buffer because - * raid1_free_bh below will do that - */ - } else { - kfree(r1_bh); - } - raid1_free_bh(conf, bh); -} + r1_bio = mempool_alloc(conf->r1bio_pool, gfp_flags); + check_all_bios_empty(r1_bio); -static int raid1_grow_r1bh (raid1_conf_t *conf, int cnt) -{ - int i = 0; + bio = bio_alloc(gfp_flags, RESYNC_PAGES); + if (!bio) + goto out_free_r1_bio; - while (i < cnt) { - struct raid1_bh *r1_bh; - r1_bh = (struct raid1_bh*)kmalloc(sizeof(*r1_bh), GFP_KERNEL); - if (!r1_bh) - break; - memset(r1_bh, 0, sizeof(*r1_bh)); - set_bit(R1BH_PreAlloc, &r1_bh->state); - r1_bh->mddev = conf->mddev; + for (i = 0; i < RESYNC_PAGES; i++) { + page = alloc_page(gfp_flags); + if (unlikely(!page)) + goto out_free_pages; - raid1_free_r1bh(r1_bh); - i++; + bio->bi_io_vec[i].bv_page = page; + bio->bi_io_vec[i].bv_len = PAGE_SIZE; + bio->bi_io_vec[i].bv_offset = 0; } - return i; -} -static void raid1_shrink_r1bh(raid1_conf_t *conf) -{ - md_spin_lock_irq(&conf->device_lock); - while (conf->freer1) { - struct raid1_bh *r1_bh = conf->freer1; - conf->freer1 = r1_bh->next_r1; - conf->freer1_cnt--; - kfree(r1_bh); - } - md_spin_unlock_irq(&conf->device_lock); -} + /* + * Allocate a single data page for this iovec. 
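These callbacks plug into the generic mempool API; the full lifecycle in miniature (illustrative; NULL pool data is assumed here since the r1bio callbacks ignore it):

static int sketch_pool_cycle(void)
{
	mempool_t *pool;
	r1bio_t *r1_bio;

	pool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc,
			      r1bio_pool_free, NULL);
	if (!pool)
		return -ENOMEM;

	r1_bio = mempool_alloc(pool, GFP_NOIO);	/* may sleep, draws on the reserve under VM load */
	/* ... use r1_bio as make_request() does ... */
	mempool_free(r1_bio, pool);

	mempool_destroy(pool);
	return 0;
}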
+ */ + bio->bi_vcnt = RESYNC_PAGES; + bio->bi_idx = 0; + bio->bi_size = RESYNC_BLOCK_SIZE; + bio->bi_end_io = NULL; + atomic_set(&bio->bi_cnt, 1); + r1_bio->master_bio = bio; + return r1_bio; -static inline void raid1_free_buf(struct raid1_bh *r1_bh) -{ - unsigned long flags; - struct buffer_head *bh = r1_bh->mirror_bh_list; - raid1_conf_t *conf = mddev_to_conf(r1_bh->mddev); - r1_bh->mirror_bh_list = NULL; - - spin_lock_irqsave(&conf->device_lock, flags); - r1_bh->next_r1 = conf->freebuf; - conf->freebuf = r1_bh; - spin_unlock_irqrestore(&conf->device_lock, flags); - raid1_free_bh(conf, bh); +out_free_pages: + for (j = 0; j < i; j++) + __free_page(bio->bi_io_vec[j].bv_page); + bio_put(bio); +out_free_r1_bio: + mempool_free(r1_bio, conf->r1bio_pool); + return NULL; } -static struct raid1_bh *raid1_alloc_buf(raid1_conf_t *conf) +static void r1buf_pool_free(void *__r1_bio, void *data) { - struct raid1_bh *r1_bh; - - md_spin_lock_irq(&conf->device_lock); - wait_event_lock_irq(conf->wait_buffer, conf->freebuf, conf->device_lock); - r1_bh = conf->freebuf; - conf->freebuf = r1_bh->next_r1; - r1_bh->next_r1= NULL; - md_spin_unlock_irq(&conf->device_lock); + int i; + conf_t *conf = data; + r1bio_t *r1bio = __r1_bio; + struct bio *bio = r1bio->master_bio; - return r1_bh; + check_all_bios_empty(r1bio); + if (atomic_read(&bio->bi_cnt) != 1) + BUG(); + for (i = 0; i < RESYNC_PAGES; i++) { + __free_page(bio->bi_io_vec[i].bv_page); + bio->bi_io_vec[i].bv_page = NULL; + } + if (atomic_read(&bio->bi_cnt) != 1) + BUG(); + bio_put(bio); + mempool_free(r1bio, conf->r1bio_pool); } -static int raid1_grow_buffers (raid1_conf_t *conf, int cnt) +static void put_all_bios(conf_t *conf, r1bio_t *r1_bio) { - int i = 0; - - md_spin_lock_irq(&conf->device_lock); - while (i < cnt) { - struct raid1_bh *r1_bh; - struct page *page; - - page = alloc_page(GFP_KERNEL); - if (!page) - break; + int i; - r1_bh = (struct raid1_bh *) kmalloc(sizeof(*r1_bh), GFP_KERNEL); - if (!r1_bh) { - __free_page(page); - break; + if (r1_bio->read_bio) { + if (atomic_read(&r1_bio->read_bio->bi_cnt) != 1) + BUG(); + bio_put(r1_bio->read_bio); + r1_bio->read_bio = NULL; + } + for (i = 0; i < MD_SB_DISKS; i++) { + struct bio **bio = r1_bio->write_bios + i; + if (*bio) { + if (atomic_read(&(*bio)->bi_cnt) != 1) + BUG(); + bio_put(*bio); } - memset(r1_bh, 0, sizeof(*r1_bh)); - r1_bh->bh_req.b_page = page; - r1_bh->bh_req.b_data = page_address(page); - r1_bh->next_r1 = conf->freebuf; - conf->freebuf = r1_bh; - i++; + *bio = NULL; } - md_spin_unlock_irq(&conf->device_lock); - return i; + check_all_bios_empty(r1_bio); } -static void raid1_shrink_buffers (raid1_conf_t *conf) +static inline void free_r1bio(r1bio_t *r1_bio) { - md_spin_lock_irq(&conf->device_lock); - while (conf->freebuf) { - struct raid1_bh *r1_bh = conf->freebuf; - conf->freebuf = r1_bh->next_r1; - __free_page(r1_bh->bh_req.b_page); - kfree(r1_bh); - } - md_spin_unlock_irq(&conf->device_lock); + conf_t *conf = mddev_to_conf(r1_bio->mddev); + + put_all_bios(conf, r1_bio); + mempool_free(r1_bio, conf->r1bio_pool); } -static int raid1_map (mddev_t *mddev, kdev_t *rdev) +static inline void put_buf(r1bio_t *r1_bio) { - raid1_conf_t *conf = mddev_to_conf(mddev); + conf_t *conf = mddev_to_conf(r1_bio->mddev); + struct bio *bio = r1_bio->master_bio; + + /* + * undo any possible partial request fixup magic: + */ + if (bio->bi_size != RESYNC_BLOCK_SIZE) + bio->bi_io_vec[bio->bi_vcnt-1].bv_len = PAGE_SIZE; + put_all_bios(conf, r1_bio); + mempool_free(r1_bio, conf->r1buf_pool); +} + +static int 
map(mddev_t *mddev, kdev_t *rdev) +{ + conf_t *conf = mddev_to_conf(mddev); int i, disks = MD_SB_DISKS; /* - * Later we do read balancing on the read side + * Later we do read balancing on the read side * now we use the first available disk. */ for (i = 0; i < disks; i++) { if (conf->mirrors[i].operational) { *rdev = conf->mirrors[i].dev; - return (0); + return 0; } } printk (KERN_ERR "raid1_map(): huh, no more operational devices?\n"); - return (-1); + return -1; } -static void raid1_reschedule_retry (struct raid1_bh *r1_bh) +static void reschedule_retry(r1bio_t *r1_bio) { unsigned long flags; - mddev_t *mddev = r1_bh->mddev; - raid1_conf_t *conf = mddev_to_conf(mddev); + mddev_t *mddev = r1_bio->mddev; + conf_t *conf = mddev_to_conf(mddev); + + spin_lock_irqsave(&retry_list_lock, flags); + list_add(&r1_bio->retry_list, &retry_list_head); + spin_unlock_irqrestore(&retry_list_lock, flags); - md_spin_lock_irqsave(&retry_list_lock, flags); - if (raid1_retry_list == NULL) - raid1_retry_tail = &raid1_retry_list; - *raid1_retry_tail = r1_bh; - raid1_retry_tail = &r1_bh->next_r1; - r1_bh->next_r1 = NULL; - md_spin_unlock_irqrestore(&retry_list_lock, flags); md_wakeup_thread(conf->thread); } -static void inline io_request_done(unsigned long sector, raid1_conf_t *conf, int phase) +static void inline raid_request_done(unsigned long sector, conf_t *conf, int phase) { unsigned long flags; spin_lock_irqsave(&conf->segment_lock, flags); @@ -359,9 +237,10 @@ spin_unlock_irqrestore(&conf->segment_lock, flags); } -static void inline sync_request_done (unsigned long sector, raid1_conf_t *conf) +static void inline sync_request_done(sector_t sector, conf_t *conf) { unsigned long flags; + spin_lock_irqsave(&conf->segment_lock, flags); if (sector >= conf->start_ready) --conf->cnt_ready; @@ -375,73 +254,80 @@ } /* - * raid1_end_bh_io() is called when we have finished servicing a mirrored + * raid_end_bio_io() is called when we have finished servicing a mirrored * operation and are ready to return a success/failure code to the buffer * cache layer. */ -static void raid1_end_bh_io (struct raid1_bh *r1_bh, int uptodate) +static int raid_end_bio_io(r1bio_t *r1_bio, int uptodate, int nr_sectors) { - struct buffer_head *bh = r1_bh->master_bh; + struct bio *bio = r1_bio->master_bio; + + raid_request_done(bio->bi_sector, mddev_to_conf(r1_bio->mddev), + test_bit(R1BIO_SyncPhase, &r1_bio->state)); - io_request_done(bh->b_rsector, mddev_to_conf(r1_bh->mddev), - test_bit(R1BH_SyncPhase, &r1_bh->state)); + bio_endio(bio, uptodate, nr_sectors); + free_r1bio(r1_bio); - bh->b_end_io(bh, uptodate); - raid1_free_r1bh(r1_bh); + return 0; } -void raid1_end_request (struct buffer_head *bh, int uptodate) + +static int end_request(struct bio *bio, int nr_sectors) { - struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private); + int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); + r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private); /* * this branch is our 'one mirror IO has finished' event handler: */ if (!uptodate) - md_error (r1_bh->mddev, bh->b_dev); + md_error(r1_bio->mddev, bio->bi_dev); else /* - * Set R1BH_Uptodate in our master buffer_head, so that + * Set R1BIO_Uptodate in our master bio, so that * we will return a good error code for to the higher * levels even if IO on some other mirrored buffer fails. * - * The 'master' represents the complex operation to + * The 'master' represents the complex operation to * user-side. So if something waits for IO, then it will - * wait for the 'master' buffer_head. 
+ * wait for the 'master' bio. */ - set_bit (R1BH_Uptodate, &r1_bh->state); + set_bit(R1BIO_Uptodate, &r1_bio->state); /* - * We split up the read and write side, imho they are + * We split up the read and write side, imho they are * conceptually different. */ - if ( (r1_bh->cmd == READ) || (r1_bh->cmd == READA) ) { + if ((r1_bio->cmd == READ) || (r1_bio->cmd == READA)) { + if (!r1_bio->read_bio) + BUG(); /* - * we have only one buffer_head on the read side + * we have only one bio on the read side */ - if (uptodate) { - raid1_end_bh_io(r1_bh, uptodate); - return; + raid_end_bio_io(r1_bio, uptodate, nr_sectors); + return 0; } /* * oops, read error: */ - printk(KERN_ERR "raid1: %s: rescheduling block %lu\n", - partition_name(bh->b_dev), bh->b_blocknr); - raid1_reschedule_retry(r1_bh); - return; + printk(KERN_ERR "raid1: %s: rescheduling sector %lu\n", + partition_name(bio->bi_dev), r1_bio->sector); + reschedule_retry(r1_bio); + return 0; } + if (r1_bio->read_bio) + BUG(); /* * WRITE: * - * Let's see if all mirrored write operations have finished + * Let's see if all mirrored write operations have finished * already. */ - - if (atomic_dec_and_test(&r1_bh->remaining)) - raid1_end_bh_io(r1_bh, test_bit(R1BH_Uptodate, &r1_bh->state)); + if (atomic_dec_and_test(&r1_bio->remaining)) + raid_end_bio_io(r1_bio, uptodate, nr_sectors); + return 0; } /* @@ -456,22 +342,20 @@ * reads should be somehow balanced. */ -static int raid1_read_balance (raid1_conf_t *conf, struct buffer_head *bh) +static int read_balance(conf_t *conf, struct bio *bio, r1bio_t *r1_bio) { - int new_disk = conf->last_used; - const int sectors = bh->b_size >> 9; - const unsigned long this_sector = bh->b_rsector; - int disk = new_disk; - unsigned long new_distance; - unsigned long current_distance; - + const int sectors = bio->bi_size >> 9; + const unsigned long this_sector = r1_bio->sector; + unsigned long new_distance, current_distance; + int new_disk = conf->last_used, disk = new_disk; + /* * Check if it is sane at all to balance */ - + if (conf->resync_mirrors) goto rb_out; - + /* make sure that disk is operational */ while( !conf->mirrors[new_disk].operational) { @@ -483,7 +367,7 @@ * Nothing much to do, lets not change anything * and hope for the best... */ - + new_disk = conf->last_used; goto rb_out; @@ -491,53 +375,51 @@ } disk = new_disk; /* now disk == new_disk == starting point for search */ - + /* * Don't touch anything for sequential reads. */ - if (this_sector == conf->mirrors[new_disk].head_position) goto rb_out; - + /* * If reads have been done only on a single disk * for a time, lets give another disk a change. * This is for kicking those idling disks so that * they would find work near some hotspot. 
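The balancing below boils down to seek distance: with mirror heads at sectors 100 and 5000, a read at 4800 goes to the second mirror (distance 200 against 4700); the comparison, as a sketch (helper name illustrative):

static int sketch_nearer_mirror(conf_t *conf, unsigned long this_sector,
				int a, int b)
{
	unsigned long da = abs(this_sector - conf->mirrors[a].head_position);
	unsigned long db = abs(this_sector - conf->mirrors[b].head_position);

	return (db < da) ? b : a;	/* prefer the shorter seek */
}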
*/ - if (conf->sect_count >= conf->mirrors[new_disk].sect_limit) { conf->sect_count = 0; do { - if (new_disk<=0) + if (new_disk <= 0) new_disk = conf->raid_disks; new_disk--; if (new_disk == disk) break; } while ((conf->mirrors[new_disk].write_only) || - (!conf->mirrors[new_disk].operational)); + (!conf->mirrors[new_disk].operational)); goto rb_out; } - + current_distance = abs(this_sector - conf->mirrors[disk].head_position); - + /* Find the disk which is closest */ - + do { if (disk <= 0) disk = conf->raid_disks; disk--; - + if ((conf->mirrors[disk].write_only) || (!conf->mirrors[disk].operational)) continue; - + new_distance = abs(this_sector - conf->mirrors[disk].head_position); - + if (new_distance < current_distance) { conf->sect_count = 0; current_distance = new_distance; @@ -554,69 +436,73 @@ return new_disk; } -static int raid1_make_request (mddev_t *mddev, int rw, - struct buffer_head * bh) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); - struct buffer_head *bh_req, *bhl; - struct raid1_bh * r1_bh; - int disks = MD_SB_DISKS; - int i, sum_bhs = 0; - struct mirror_info *mirror; - - if (!buffer_locked(bh)) - BUG(); - /* - * make_request() can abort the operation when READA is being - * used and no empty request is available. - * - * Currently, just replace the command with READ/WRITE. + * Wait if the reconstruction state machine puts up a bar for + * new requests in this sector range: */ - if (rw == READA) - rw = READ; - - r1_bh = raid1_alloc_r1bh (conf); - +static inline void new_request(conf_t *conf, r1bio_t *r1_bio) +{ spin_lock_irq(&conf->segment_lock); wait_event_lock_irq(conf->wait_done, - bh->b_rsector < conf->start_active || - bh->b_rsector >= conf->start_future, + r1_bio->sector < conf->start_active || + r1_bio->sector >= conf->start_future, conf->segment_lock); - if (bh->b_rsector < conf->start_active) + if (r1_bio->sector < conf->start_active) conf->cnt_done++; else { conf->cnt_future++; if (conf->phase) - set_bit(R1BH_SyncPhase, &r1_bh->state); + set_bit(R1BIO_SyncPhase, &r1_bio->state); } spin_unlock_irq(&conf->segment_lock); - +} + +static int make_request(mddev_t *mddev, int rw, struct bio * bio) +{ + conf_t *conf = mddev_to_conf(mddev); + mirror_info_t *mirror; + r1bio_t *r1_bio; + struct bio *read_bio; + int i, sum_bios = 0, disks = MD_SB_DISKS; + /* - * i think the read and write branch should be separated completely, - * since we want to do read balancing on the read side for example. - * Alternative implementations? :) --mingo + * make_request() can abort the operation when READA is being + * used and no empty request is available. + * + * Currently, just replace the command with READ. 
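Both branches of make_request() below work by cloning the master bio instead of hand-copying buffer_head fields; one mirror's worth of write fan-out, in miniature (fragment, names as in the code that follows):

	struct bio *mbio = bio_clone(bio, GFP_NOIO);	/* shares the master's data pages */

	mbio->bi_sector  = r1_bio->sector;
	mbio->bi_dev     = conf->mirrors[i].dev;
	mbio->bi_end_io  = end_request;			/* common completion handler */
	mbio->bi_rw      = rw;
	mbio->bi_private = r1_bio;
	generic_make_request(mbio);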
*/ + if (rw == READA) + rw = READ; + + r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO); + check_all_bios_empty(r1_bio); - r1_bh->master_bh = bh; - r1_bh->mddev = mddev; - r1_bh->cmd = rw; + r1_bio->master_bio = bio; + + r1_bio->mddev = mddev; + r1_bio->sector = bio->bi_sector; + r1_bio->cmd = rw; + + new_request(conf, r1_bio); if (rw == READ) { /* * read balancing logic: */ - mirror = conf->mirrors + raid1_read_balance(conf, bh); + mirror = conf->mirrors + read_balance(conf, bio, r1_bio); + + read_bio = bio_clone(bio, GFP_NOIO); + if (r1_bio->read_bio) + BUG(); + r1_bio->read_bio = read_bio; + + read_bio->bi_sector = r1_bio->sector; + read_bio->bi_dev = mirror->dev; + read_bio->bi_end_io = end_request; + read_bio->bi_rw = rw; + read_bio->bi_private = r1_bio; - bh_req = &r1_bh->bh_req; - memcpy(bh_req, bh, sizeof(*bh)); - bh_req->b_blocknr = bh->b_rsector; - bh_req->b_dev = mirror->dev; - bh_req->b_rdev = mirror->dev; - /* bh_req->b_rsector = bh->n_rsector; */ - bh_req->b_end_io = raid1_end_request; - bh_req->b_private = r1_bh; - generic_make_request (rw, bh_req); + generic_make_request(read_bio); return 0; } @@ -624,62 +510,35 @@ * WRITE: */ - bhl = raid1_alloc_bh(conf, conf->raid_disks); + check_all_w_bios_empty(r1_bio); + for (i = 0; i < disks; i++) { - struct buffer_head *mbh; - if (!conf->mirrors[i].operational) + struct bio *mbio; + if (!conf->mirrors[i].operational) continue; - - /* - * We should use a private pool (size depending on NR_REQUEST), - * to avoid writes filling up the memory with bhs - * - * Such pools are much faster than kmalloc anyways (so we waste - * almost nothing by not using the master bh when writing and - * win alot of cleanness) but for now we are cool enough. --mingo - * - * It's safe to sleep here, buffer heads cannot be used in a shared - * manner in the write branch. Look how we lock the buffer at the - * beginning of this function to grok the difference ;) - */ - mbh = bhl; - if (mbh == NULL) { - MD_BUG(); - break; - } - bhl = mbh->b_next; - mbh->b_next = NULL; - mbh->b_this_page = (struct buffer_head *)1; - - /* - * prepare mirrored mbh (fields ordered for max mem throughput): - */ - mbh->b_blocknr = bh->b_rsector; - mbh->b_dev = conf->mirrors[i].dev; - mbh->b_rdev = conf->mirrors[i].dev; - mbh->b_rsector = bh->b_rsector; - mbh->b_state = (1<<BH_Req) | (1<<BH_Dirty) | (1<<BH_Mapped) | (1<<BH_Lock); - atomic_set(&mbh->b_count, 1); - mbh->b_size = bh->b_size; - mbh->b_page = bh->b_page; - mbh->b_data = bh->b_data; - mbh->b_list = BUF_LOCKED; - mbh->b_end_io = raid1_end_request; - mbh->b_private = r1_bh; - - mbh->b_next = r1_bh->mirror_bh_list; - r1_bh->mirror_bh_list = mbh; - sum_bhs++; - } - if (bhl) raid1_free_bh(conf,bhl); - if (!sum_bhs) { - /* Gag - all mirrors non-operational.. */ - raid1_end_bh_io(r1_bh, 0); + + mbio = bio_clone(bio, GFP_NOIO); + if (r1_bio->write_bios[i]) + BUG(); + r1_bio->write_bios[i] = mbio; + + mbio->bi_sector = r1_bio->sector; + mbio->bi_dev = conf->mirrors[i].dev; + mbio->bi_end_io = end_request; + mbio->bi_rw = rw; + mbio->bi_private = r1_bio; + + sum_bios++; + } + if (!sum_bios) { + /* + * If all mirrors are non-operational + * then return an IO error: + */ + raid_end_bio_io(r1_bio, 0, 0); return 0; } - md_atomic_set(&r1_bh->remaining, sum_bhs); + atomic_set(&r1_bio->remaining, sum_bios); /* * We have to be a bit careful about the semaphore above, thats @@ -688,28 +547,30 @@ * safer solution. Imagine, end_request decreasing the semaphore * before we could have set it up ... We could play tricks with
We could play tricks with * the semaphore (presetting it and correcting at the end if - * sum_bhs is not 'n' but we have to do end_request by hand if + * sum_bios is not 'n' but we have to do end_request by hand if * all requests finish until we had a chance to set up the * semaphore correctly ... lots of races). */ - bh = r1_bh->mirror_bh_list; - while(bh) { - struct buffer_head *bh2 = bh; - bh = bh->b_next; - generic_make_request(rw, bh2); + for (i = 0; i < disks; i++) { + struct bio *mbio; + mbio = r1_bio->write_bios[i]; + if (!mbio) + continue; + + generic_make_request(mbio); } - return (0); + return 0; } -static int raid1_status (char *page, mddev_t *mddev) +static int status(char *page, mddev_t *mddev) { - raid1_conf_t *conf = mddev_to_conf(mddev); + conf_t *conf = mddev_to_conf(mddev); int sz = 0, i; - - sz += sprintf (page+sz, " [%d/%d] [", conf->raid_disks, - conf->working_disks); + + sz += sprintf(page+sz, " [%d/%d] [", conf->raid_disks, + conf->working_disks); for (i = 0; i < conf->raid_disks; i++) - sz += sprintf (page+sz, "%s", + sz += sprintf(page+sz, "%s", conf->mirrors[i].operational ? "U" : "_"); sz += sprintf (page+sz, "]"); return sz; @@ -731,10 +592,10 @@ #define ALREADY_SYNCING KERN_INFO \ "raid1: syncing already in progress.\n" -static void mark_disk_bad (mddev_t *mddev, int failed) +static void mark_disk_bad(mddev_t *mddev, int failed) { - raid1_conf_t *conf = mddev_to_conf(mddev); - struct mirror_info *mirror = conf->mirrors+failed; + conf_t *conf = mddev_to_conf(mddev); + mirror_info_t *mirror = conf->mirrors+failed; mdp_super_t *sb = mddev->sb; mirror->operational = 0; @@ -749,37 +610,36 @@ md_wakeup_thread(conf->thread); if (!mirror->write_only) conf->working_disks--; - printk (DISK_FAILED, partition_name (mirror->dev), - conf->working_disks); + printk(DISK_FAILED, partition_name(mirror->dev), + conf->working_disks); } -static int raid1_error (mddev_t *mddev, kdev_t dev) +static int error(mddev_t *mddev, kdev_t dev) { - raid1_conf_t *conf = mddev_to_conf(mddev); - struct mirror_info * mirrors = conf->mirrors; + conf_t *conf = mddev_to_conf(mddev); + mirror_info_t * mirrors = conf->mirrors; int disks = MD_SB_DISKS; int i; - /* Find the drive. + /* + * Find the drive. * If it is not operational, then we have already marked it as dead * else if it is the last working disks, ignore the error, let the * next level up know. 
* else mark the drive as failed */ - for (i = 0; i < disks; i++) - if (mirrors[i].dev==dev && mirrors[i].operational) + if (mirrors[i].dev == dev && mirrors[i].operational) break; if (i == disks) return 0; - if (i < conf->raid_disks && conf->working_disks == 1) { - /* Don't fail the drive, act as though we were just a + if (i < conf->raid_disks && conf->working_disks == 1) + /* + * Don't fail the drive, act as though we were just a * normal single drive */ - return 1; - } mark_disk_bad(mddev, i); return 0; } @@ -790,41 +650,42 @@ #undef START_SYNCING -static void print_raid1_conf (raid1_conf_t *conf) +static void print_conf(conf_t *conf) { int i; - struct mirror_info *tmp; + mirror_info_t *tmp; printk("RAID1 conf printout:\n"); if (!conf) { - printk("(conf==NULL)\n"); + printk("(!conf)\n"); return; } printk(" --- wd:%d rd:%d nd:%d\n", conf->working_disks, - conf->raid_disks, conf->nr_disks); + conf->raid_disks, conf->nr_disks); for (i = 0; i < MD_SB_DISKS; i++) { tmp = conf->mirrors + i; printk(" disk %d, s:%d, o:%d, n:%d rd:%d us:%d dev:%s\n", - i, tmp->spare,tmp->operational, - tmp->number,tmp->raid_disk,tmp->used_slot, + i, tmp->spare, tmp->operational, + tmp->number, tmp->raid_disk, tmp->used_slot, partition_name(tmp->dev)); } } -static void close_sync(raid1_conf_t *conf) +static void close_sync(conf_t *conf) { mddev_t *mddev = conf->mddev; - /* If reconstruction was interrupted, we need to close the "active" and "pending" - * holes. - * we know that there are no active rebuild requests, os cnt_active == cnt_ready ==0 + /* + * If reconstruction was interrupted, we need to close the "active" + * and "pending" holes. + * we know that there are no active rebuild requests, + * os cnt_active == cnt_ready == 0 */ - /* this is really needed when recovery stops too... */ spin_lock_irq(&conf->segment_lock); conf->start_active = conf->start_pending; conf->start_ready = conf->start_pending; wait_event_lock_irq(conf->wait_ready, !conf->cnt_pending, conf->segment_lock); - conf->start_active =conf->start_ready = conf->start_pending = conf->start_future; + conf->start_active = conf->start_ready = conf->start_pending = conf->start_future; conf->start_future = mddev->sb->size+1; conf->cnt_pending = conf->cnt_future; conf->cnt_future = 0; @@ -838,18 +699,18 @@ wake_up(&conf->wait_done); } -static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state) +static int diskop(mddev_t *mddev, mdp_disk_t **d, int state) { int err = 0; - int i, failed_disk=-1, spare_disk=-1, removed_disk=-1, added_disk=-1; - raid1_conf_t *conf = mddev->private; - struct mirror_info *tmp, *sdisk, *fdisk, *rdisk, *adisk; + int i, failed_disk = -1, spare_disk = -1, removed_disk = -1, added_disk = -1; + conf_t *conf = mddev->private; + mirror_info_t *tmp, *sdisk, *fdisk, *rdisk, *adisk; mdp_super_t *sb = mddev->sb; mdp_disk_t *failed_desc, *spare_desc, *added_desc; mdk_rdev_t *spare_rdev, *failed_rdev; - print_raid1_conf(conf); - md_spin_lock_irq(&conf->device_lock); + print_conf(conf); + spin_lock_irq(&conf->device_lock); /* * find the disk ... */ @@ -871,7 +732,7 @@ } /* * When we activate a spare disk we _must_ have a disk in - * the lower (active) part of the array to replace. + * the lower (active) part of the array to replace. 
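The spare/failed swap further down leans on xchg_values(), which is nothing more than a typed two-way exchange; a sketch of its definition (the real macro lives in the md headers):

#define xchg_values(x, y) do {			\
		__typeof__(x) __tmp = (x);	\
		(x) = (y);			\
		(y) = __tmp;			\
	} while (0)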
*/ if ((failed_disk == -1) || (failed_disk >= conf->raid_disks)) { MD_BUG(); @@ -982,7 +843,7 @@ err = 1; goto abort; } - + if (sdisk->raid_disk != spare_disk) { MD_BUG(); err = 1; @@ -1007,13 +868,14 @@ spare_rdev = find_rdev_nr(mddev, spare_desc->number); failed_rdev = find_rdev_nr(mddev, failed_desc->number); - /* There must be a spare_rdev, but there may not be a - * failed_rdev. That slot might be empty... + /* + * There must be a spare_rdev, but there may not be a + * failed_rdev. That slot might be empty... */ spare_rdev->desc_nr = failed_desc->number; if (failed_rdev) failed_rdev->desc_nr = spare_desc->number; - + xchg_values(*spare_desc, *failed_desc); xchg_values(*fdisk, *sdisk); @@ -1024,7 +886,6 @@ * give the proper raid_disk number to the now activated * disk. (this means we switch back these values) */ - xchg_values(spare_desc->raid_disk, failed_desc->raid_disk); xchg_values(sdisk->raid_disk, fdisk->raid_disk); xchg_values(spare_desc->number, failed_desc->number); @@ -1054,7 +915,7 @@ rdisk = conf->mirrors + removed_disk; if (rdisk->spare && (removed_disk < conf->raid_disks)) { - MD_BUG(); + MD_BUG(); err = 1; goto abort; } @@ -1068,14 +929,14 @@ added_desc = *d; if (added_disk != added_desc->number) { - MD_BUG(); + MD_BUG(); err = 1; goto abort; } adisk->number = added_desc->number; adisk->raid_disk = added_desc->raid_disk; - adisk->dev = MKDEV(added_desc->major,added_desc->minor); + adisk->dev = MKDEV(added_desc->major, added_desc->minor); adisk->operational = 0; adisk->write_only = 0; @@ -1087,17 +948,18 @@ break; default: - MD_BUG(); + MD_BUG(); err = 1; goto abort; } abort: - md_spin_unlock_irq(&conf->device_lock); - if (state == DISKOP_SPARE_ACTIVE || state == DISKOP_SPARE_INACTIVE) - /* should move to "END_REBUILD" when such exists */ - raid1_shrink_buffers(conf); + spin_unlock_irq(&conf->device_lock); + if (state == DISKOP_SPARE_ACTIVE || state == DISKOP_SPARE_INACTIVE) { + mempool_destroy(conf->r1buf_pool); + conf->r1buf_pool = NULL; + } - print_raid1_conf(conf); + print_conf(conf); return err; } @@ -1108,6 +970,122 @@ #define REDIRECT_SECTOR KERN_ERR \ "raid1: %s: redirecting sector %lu to another mirror\n" +static int end_sync_read(struct bio *bio, int nr_sectors) +{ + int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); + r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private); + + check_all_w_bios_empty(r1_bio); + if (r1_bio->read_bio != bio) + BUG(); + /* + * we have read a block, now it needs to be re-written, + * or re-read if the read failed. 
+ * We don't do much here, just schedule handling by raid1d + */ + if (!uptodate) + md_error (r1_bio->mddev, bio->bi_dev); + else + set_bit(R1BIO_Uptodate, &r1_bio->state); + reschedule_retry(r1_bio); + + return 0; +} + +static int end_sync_write(struct bio *bio, int nr_sectors) +{ + int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); + r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private); + mddev_t *mddev = r1_bio->mddev; + + if (!uptodate) + md_error(mddev, bio->bi_dev); + + if (atomic_dec_and_test(&r1_bio->remaining)) { + sync_request_done(r1_bio->sector, mddev_to_conf(mddev)); + md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, uptodate); + put_buf(r1_bio); + } + return 0; +} + +static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio) +{ + conf_t *conf = mddev_to_conf(mddev); + int i, sum_bios = 0; + int disks = MD_SB_DISKS; + struct bio *bio, *mbio; + + bio = r1_bio->master_bio; + + /* + * have to allocate lots of bio structures and + * schedule writes + */ + if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) { + /* + * There is no point trying a read-for-reconstruct as + * reconstruct is about to be aborted + */ + printk(IO_ERROR, partition_name(bio->bi_dev), r1_bio->sector); + md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 0); + return; + } + + check_all_w_bios_empty(r1_bio); + + for (i = 0; i < disks ; i++) { + if (!conf->mirrors[i].operational) + continue; + if (i == conf->last_used) + /* + * we read from here, no need to write + */ + continue; + if (i < conf->raid_disks && !conf->resync_mirrors) + /* + * don't need to write this we are just rebuilding + */ + continue; + + mbio = bio_clone(bio, GFP_NOIO); + if (r1_bio->write_bios[i]) + BUG(); + r1_bio->write_bios[i] = mbio; + mbio->bi_dev = conf->mirrors[i].dev; + mbio->bi_sector = r1_bio->sector; + mbio->bi_end_io = end_sync_write; + mbio->bi_rw = WRITE; + mbio->bi_private = r1_bio; + + sum_bios++; + } + if (i != disks) + BUG(); + atomic_set(&r1_bio->remaining, sum_bios); + + + if (!sum_bios) { + /* + * Nowhere to write this to... I guess we + * must be done + */ + printk(IO_ERROR, partition_name(bio->bi_dev), r1_bio->sector); + sync_request_done(r1_bio->sector, conf); + md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 0); + put_buf(r1_bio); + return; + } + for (i = 0; i < disks ; i++) { + mbio = r1_bio->write_bios[i]; + if (!mbio) + continue; + + md_sync_acct(mbio->bi_dev, mbio->bi_size >> 9); + generic_make_request(mbio); + } +} + /* * This is a kernel thread which: * @@ -1115,134 +1093,56 @@ * 2. Updates the raid superblock when problems encounter. * 3. Performs writes following reads for array syncronising. 
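raid1d() below drains the retry list as a FIFO: reschedule_retry() links new entries in at the head, and the thread pulls the oldest from the tail; both halves, as a sketch (fragments, locking as in the code):

	/* producer, e.g. from a completion handler: */
	spin_lock_irqsave(&retry_list_lock, flags);
	list_add(&r1_bio->retry_list, &retry_list_head);
	spin_unlock_irqrestore(&retry_list_lock, flags);

	/* consumer, in raid1d: the oldest entry sits at the tail */
	r1_bio = list_entry(retry_list_head.prev, r1bio_t, retry_list);
	list_del(retry_list_head.prev);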
*/ -static void end_sync_write(struct buffer_head *bh, int uptodate); -static void end_sync_read(struct buffer_head *bh, int uptodate); -static void raid1d (void *data) +static void raid1d(void *data) { - struct raid1_bh *r1_bh; - struct buffer_head *bh; + struct list_head *head = &retry_list_head; + r1bio_t *r1_bio; + struct bio *bio; unsigned long flags; mddev_t *mddev; kdev_t dev; for (;;) { - md_spin_lock_irqsave(&retry_list_lock, flags); - r1_bh = raid1_retry_list; - if (!r1_bh) + spin_lock_irqsave(&retry_list_lock, flags); + if (list_empty(head)) break; - raid1_retry_list = r1_bh->next_r1; - md_spin_unlock_irqrestore(&retry_list_lock, flags); + r1_bio = list_entry(head->prev, r1bio_t, retry_list); + list_del(head->prev); + spin_unlock_irqrestore(&retry_list_lock, flags); + check_all_w_bios_empty(r1_bio); - mddev = r1_bh->mddev; + mddev = r1_bio->mddev; if (mddev->sb_dirty) { printk(KERN_INFO "raid1: dirty sb detected, updating.\n"); mddev->sb_dirty = 0; md_update_sb(mddev); } - bh = &r1_bh->bh_req; - switch(r1_bh->cmd) { + bio = r1_bio->master_bio; + switch(r1_bio->cmd) { case SPECIAL: - /* have to allocate lots of bh structures and - * schedule writes - */ - if (test_bit(R1BH_Uptodate, &r1_bh->state)) { - int i, sum_bhs = 0; - int disks = MD_SB_DISKS; - struct buffer_head *bhl, *mbh; - raid1_conf_t *conf; - - conf = mddev_to_conf(mddev); - bhl = raid1_alloc_bh(conf, conf->raid_disks); /* don't really need this many */ - for (i = 0; i < disks ; i++) { - if (!conf->mirrors[i].operational) - continue; - if (i==conf->last_used) - /* we read from here, no need to write */ - continue; - if (i < conf->raid_disks - && !conf->resync_mirrors) - /* don't need to write this, - * we are just rebuilding */ - continue; - mbh = bhl; - if (!mbh) { - MD_BUG(); - break; - } - bhl = mbh->b_next; - mbh->b_this_page = (struct buffer_head *)1; - - - /* - * prepare mirrored bh (fields ordered for max mem throughput): - */ - mbh->b_blocknr = bh->b_blocknr; - mbh->b_dev = conf->mirrors[i].dev; - mbh->b_rdev = conf->mirrors[i].dev; - mbh->b_rsector = bh->b_blocknr; - mbh->b_state = (1<<BH_Req) | (1<<BH_Dirty) | (1<<BH_Mapped) | (1<<BH_Lock); - atomic_set(&mbh->b_count, 1); - mbh->b_size = bh->b_size; - mbh->b_page = bh->b_page; - mbh->b_data = bh->b_data; - mbh->b_list = BUF_LOCKED; - mbh->b_end_io = end_sync_write; - mbh->b_private = r1_bh; - - mbh->b_next = r1_bh->mirror_bh_list; - r1_bh->mirror_bh_list = mbh; - - sum_bhs++; - } - md_atomic_set(&r1_bh->remaining, sum_bhs); - if (bhl) raid1_free_bh(conf, bhl); - mbh = r1_bh->mirror_bh_list; - - if (!sum_bhs) { - /* nowhere to write this too...
I guess we - * must be done - */ - sync_request_done(bh->b_blocknr, conf); - md_done_sync(mddev, bh->b_size>>9, 0); - raid1_free_buf(r1_bh); - } else - while (mbh) { - struct buffer_head *bh1 = mbh; - mbh = mbh->b_next; - generic_make_request(WRITE, bh1); - md_sync_acct(bh1->b_dev, bh1->b_size/512); - } - } else { - /* There is no point trying a read-for-reconstruct - * as reconstruct is about to be aborted - */ - - printk (IO_ERROR, partition_name(bh->b_dev), bh->b_blocknr); - md_done_sync(mddev, bh->b_size>>9, 0); - } - + sync_request_write(mddev, r1_bio); break; case READ: case READA: - dev = bh->b_dev; - raid1_map (mddev, &bh->b_dev); - if (bh->b_dev == dev) { - printk (IO_ERROR, partition_name(bh->b_dev), bh->b_blocknr); - raid1_end_bh_io(r1_bh, 0); - } else { - printk (REDIRECT_SECTOR, - partition_name(bh->b_dev), bh->b_blocknr); - bh->b_rdev = bh->b_dev; - bh->b_rsector = bh->b_blocknr; - generic_make_request (r1_bh->cmd, bh); + dev = bio->bi_dev; + map(mddev, &bio->bi_dev); + if (bio->bi_dev == dev) { + printk(IO_ERROR, partition_name(bio->bi_dev), r1_bio->sector); + raid_end_bio_io(r1_bio, 0, 0); + break; } + printk(REDIRECT_SECTOR, + partition_name(bio->bi_dev), r1_bio->sector); + bio->bi_sector = r1_bio->sector; + bio->bi_rw = r1_bio->cmd; + + generic_make_request(bio); break; } } - md_spin_unlock_irqrestore(&retry_list_lock, flags); + spin_unlock_irqrestore(&retry_list_lock, flags); } #undef IO_ERROR #undef REDIRECT_SECTOR @@ -1251,9 +1151,9 @@ * Private kernel thread to reconstruct mirrors after an unclean * shutdown. */ -static void raid1syncd (void *data) +static void raid1syncd(void *data) { - raid1_conf_t *conf = data; + conf_t *conf = data; mddev_t *mddev = conf->mddev; if (!conf->resync_mirrors) @@ -1271,7 +1171,56 @@ close_sync(conf); up(&mddev->recovery_sem); - raid1_shrink_buffers(conf); +} + +static int init_resync(conf_t *conf) +{ + int buffs; + + conf->start_active = 0; + conf->start_ready = 0; + conf->start_pending = 0; + conf->start_future = 0; + conf->phase = 0; + + buffs = RESYNC_WINDOW / RESYNC_BLOCK_SIZE; + if (conf->r1buf_pool) + BUG(); + conf->r1buf_pool = mempool_create(buffs, r1buf_pool_alloc, r1buf_pool_free, conf); + if (!conf->r1buf_pool) + return -ENOMEM; + conf->window = 2048; + conf->cnt_future += conf->cnt_done+conf->cnt_pending; + conf->cnt_done = conf->cnt_pending = 0; + if (conf->cnt_ready || conf->cnt_active) + MD_BUG(); + return 0; +} + +static void wait_sync_pending(conf_t *conf, sector_t sector_nr) +{ + spin_lock_irq(&conf->segment_lock); + while (sector_nr >= conf->start_pending) { +// printk("wait .. 
sect=%lu start_active=%d ready=%d pending=%d future=%d, cnt_done=%d active=%d ready=%d pending=%d future=%d\n", sector_nr, conf->start_active, conf->start_ready, conf->start_pending, conf->start_future, conf->cnt_done, conf->cnt_active, conf->cnt_ready, conf->cnt_pending, conf->cnt_future); + wait_event_lock_irq(conf->wait_done, !conf->cnt_active, + conf->segment_lock); + wait_event_lock_irq(conf->wait_ready, !conf->cnt_pending, + conf->segment_lock); + conf->start_active = conf->start_ready; + conf->start_ready = conf->start_pending; + conf->start_pending = conf->start_future; + conf->start_future = conf->start_future+conf->window; + + // Note: falling off the end is not a problem + conf->phase = conf->phase ^1; + conf->cnt_active = conf->cnt_ready; + conf->cnt_ready = 0; + conf->cnt_pending = conf->cnt_future; + conf->cnt_future = 0; + wake_up(&conf->wait_done); + } + conf->cnt_ready++; + spin_unlock_irq(&conf->segment_lock); } /* @@ -1279,7 +1228,7 @@ * * We need to make sure that no normal I/O request - particularly write * requests - conflict with active sync requests. - * This is achieved by conceptually dividing the device space into a + * This is achieved by conceptually dividing the block space into a * number of sections: * DONE: 0 .. a-1 These blocks are in-sync * ACTIVE: a.. b-1 These blocks may have active sync requests, but @@ -1322,149 +1271,81 @@ * issue suitable write requests */ -static int raid1_sync_request (mddev_t *mddev, unsigned long sector_nr) +static int sync_request(mddev_t *mddev, sector_t sector_nr) { - raid1_conf_t *conf = mddev_to_conf(mddev); - struct mirror_info *mirror; - struct raid1_bh *r1_bh; - struct buffer_head *bh; - int bsize; - int disk; - int block_nr; + conf_t *conf = mddev_to_conf(mddev); + mirror_info_t *mirror; + r1bio_t *r1_bio; + struct bio *read_bio, *bio; + sector_t max_sector, nr_sectors; + int disk, partial; + + if (!sector_nr) + if (init_resync(conf)) + return -ENOMEM; - spin_lock_irq(&conf->segment_lock); - if (!sector_nr) { - /* initialize ...*/ - int buffs; - conf->start_active = 0; - conf->start_ready = 0; - conf->start_pending = 0; - conf->start_future = 0; - conf->phase = 0; - /* we want enough buffers to hold twice the window of 128*/ - buffs = 128 *2 / (PAGE_SIZE>>9); - buffs = raid1_grow_buffers(conf, buffs); - if (buffs < 2) - goto nomem; - - conf->window = buffs*(PAGE_SIZE>>9)/2; - conf->cnt_future += conf->cnt_done+conf->cnt_pending; - conf->cnt_done = conf->cnt_pending = 0; - if (conf->cnt_ready || conf->cnt_active) - MD_BUG(); - } - while (sector_nr >= conf->start_pending) { - PRINTK("wait .. 
sect=%lu start_active=%d ready=%d pending=%d future=%d, cnt_done=%d active=%d ready=%d pending=%d future=%d\n", - sector_nr, conf->start_active, conf->start_ready, conf->start_pending, conf->start_future, - conf->cnt_done, conf->cnt_active, conf->cnt_ready, conf->cnt_pending, conf->cnt_future); - wait_event_lock_irq(conf->wait_done, - !conf->cnt_active, - conf->segment_lock); - wait_event_lock_irq(conf->wait_ready, - !conf->cnt_pending, - conf->segment_lock); - conf->start_active = conf->start_ready; - conf->start_ready = conf->start_pending; - conf->start_pending = conf->start_future; - conf->start_future = conf->start_future+conf->window; - // Note: falling off the end is not a problem - conf->phase = conf->phase ^1; - conf->cnt_active = conf->cnt_ready; - conf->cnt_ready = 0; - conf->cnt_pending = conf->cnt_future; - conf->cnt_future = 0; - wake_up(&conf->wait_done); - } - conf->cnt_ready++; - spin_unlock_irq(&conf->segment_lock); - + wait_sync_pending(conf, sector_nr); - /* If reconstructing, and >1 working disc, + /* + * If reconstructing, and >1 working disc, * could dedicate one to rebuild and others to * service read requests .. */ disk = conf->last_used; /* make sure disk is operational */ while (!conf->mirrors[disk].operational) { - if (disk <= 0) disk = conf->raid_disks; + if (disk <= 0) + disk = conf->raid_disks; disk--; if (disk == conf->last_used) break; } conf->last_used = disk; - + mirror = conf->mirrors+conf->last_used; - - r1_bh = raid1_alloc_buf (conf); - r1_bh->master_bh = NULL; - r1_bh->mddev = mddev; - r1_bh->cmd = SPECIAL; - bh = &r1_bh->bh_req; - - block_nr = sector_nr; - bsize = 512; - while (!(block_nr & 1) && bsize < PAGE_SIZE - && (block_nr+2)*(bsize>>9) < (mddev->sb->size *2)) { - block_nr >>= 1; - bsize <<= 1; - } - bh->b_size = bsize; - bh->b_list = BUF_LOCKED; - bh->b_dev = mirror->dev; - bh->b_rdev = mirror->dev; - bh->b_state = (1<<BH_Req) | (1<<BH_Mapped) | (1<<BH_Lock); - if (!bh->b_page) - BUG(); - if (!bh->b_data) - BUG(); - if (bh->b_data != page_address(bh->b_page)) + + r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO); + check_all_bios_empty(r1_bio); + + r1_bio->mddev = mddev; + r1_bio->sector = sector_nr; + r1_bio->cmd = SPECIAL; + + max_sector = mddev->sb->size << 1; + if (sector_nr >= max_sector) BUG(); - bh->b_end_io = end_sync_read; - bh->b_private = r1_bh; - bh->b_blocknr = sector_nr; - bh->b_rsector = sector_nr; - init_waitqueue_head(&bh->b_wait); - generic_make_request(READ, bh); - md_sync_acct(bh->b_dev, bh->b_size/512); + bio = r1_bio->master_bio; + nr_sectors = RESYNC_BLOCK_SIZE >> 9; + if (max_sector - sector_nr < nr_sectors) + nr_sectors = max_sector - sector_nr; + bio->bi_size = nr_sectors << 9; + bio->bi_vcnt = (bio->bi_size + PAGE_SIZE-1) / PAGE_SIZE; + /* + * Is there a partial page at the end of the request? + */ + partial = bio->bi_size % PAGE_SIZE; + if (partial) + bio->bi_io_vec[bio->bi_vcnt-1].bv_len = partial; - return (bsize >> 9); -nomem: - raid1_shrink_buffers(conf); - spin_unlock_irq(&conf->segment_lock); - return -ENOMEM; -} + read_bio = bio_clone(r1_bio->master_bio, GFP_NOIO); -static void end_sync_read(struct buffer_head *bh, int uptodate) -{ - struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private); + read_bio->bi_sector = sector_nr; + read_bio->bi_dev = mirror->dev; + read_bio->bi_end_io = end_sync_read; + read_bio->bi_rw = READ; + read_bio->bi_private = r1_bio; - /* we have read a block, now it needs to be re-written, - * or re-read if the read failed.
- * We don't do much here, just schedule handling by raid1d - */ - if (!uptodate) - md_error (r1_bh->mddev, bh->b_dev); - else - set_bit(R1BH_Uptodate, &r1_bh->state); - raid1_reschedule_retry(r1_bh); -} + if (r1_bio->read_bio) + BUG(); + r1_bio->read_bio = read_bio; -static void end_sync_write(struct buffer_head *bh, int uptodate) -{ - struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private); - - if (!uptodate) - md_error (r1_bh->mddev, bh->b_dev); - if (atomic_dec_and_test(&r1_bh->remaining)) { - mddev_t *mddev = r1_bh->mddev; - unsigned long sect = bh->b_blocknr; - int size = bh->b_size; - raid1_free_buf(r1_bh); - sync_request_done(sect, mddev_to_conf(mddev)); - md_done_sync(mddev,size>>9, uptodate); - } + md_sync_acct(read_bio->bi_dev, nr_sectors); + + generic_make_request(read_bio); + + return nr_sectors; } #define INVALID_LEVEL KERN_WARNING \ @@ -1506,15 +1387,15 @@ #define START_RESYNC KERN_WARNING \ "raid1: raid set md%d not clean; reconstructing mirrors\n" -static int raid1_run (mddev_t *mddev) +static int run(mddev_t *mddev) { - raid1_conf_t *conf; + conf_t *conf; int i, j, disk_idx; - struct mirror_info *disk; + mirror_info_t *disk; mdp_super_t *sb = mddev->sb; mdp_disk_t *descriptor; mdk_rdev_t *rdev; - struct md_list_head *tmp; + struct list_head *tmp; int start_recovery = 0; MOD_INC_USE_COUNT; @@ -1525,11 +1406,10 @@ } /* * copy the already verified devices into our private RAID1 - * bookkeeping area. [whatever we allocate in raid1_run(), - * should be freed in raid1_stop()] + * bookkeeping area. [whatever we allocate in run(), + * should be freed in stop()] */ - - conf = kmalloc(sizeof(raid1_conf_t), GFP_KERNEL); + conf = kmalloc(sizeof(conf_t), GFP_KERNEL); mddev->private = conf; if (!conf) { printk(MEM_ERROR, mdidx(mddev)); @@ -1537,7 +1417,16 @@ } memset(conf, 0, sizeof(*conf)); - ITERATE_RDEV(mddev,rdev,tmp) { + conf->r1bio_pool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc, + r1bio_pool_free, NULL); + if (!conf->r1bio_pool) { + printk(MEM_ERROR, mdidx(mddev)); + goto out; + } + +// for (tmp = (mddev)->disks.next; rdev = ((mdk_rdev_t *)((char *)(tmp)-(unsigned long)(&((mdk_rdev_t *)0)->same_set))), tmp = tmp->next, tmp->prev != &(mddev)->disks ; ) { + + ITERATE_RDEV(mddev, rdev, tmp) { if (rdev->faulty) { printk(ERRORS, partition_name(rdev->dev)); } else { @@ -1573,7 +1462,7 @@ continue; } if ((descriptor->number > MD_SB_DISKS) || - (disk_idx > sb->raid_disks)) { + (disk_idx > sb->raid_disks)) { printk(INCONSISTENT, partition_name(rdev->dev)); @@ -1586,7 +1475,7 @@ continue; } printk(OPERATIONAL, partition_name(rdev->dev), - disk_idx); + disk_idx); disk->number = descriptor->number; disk->raid_disk = disk_idx; disk->dev = rdev->dev; @@ -1616,10 +1505,9 @@ conf->raid_disks = sb->raid_disks; conf->nr_disks = sb->nr_disks; conf->mddev = mddev; - conf->device_lock = MD_SPIN_LOCK_UNLOCKED; + conf->device_lock = SPIN_LOCK_UNLOCKED; - conf->segment_lock = MD_SPIN_LOCK_UNLOCKED; - init_waitqueue_head(&conf->wait_buffer); + conf->segment_lock = SPIN_LOCK_UNLOCKED; init_waitqueue_head(&conf->wait_done); init_waitqueue_head(&conf->wait_ready); @@ -1628,25 +1516,8 @@ goto out_free_conf; } - - /* pre-allocate some buffer_head structures. - * As a minimum, 1 r1bh and raid_disks buffer_heads - * would probably get us by in tight memory situations, - * but a few more is probably a good idea. 
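The mempool calls above replace the driver-private reserve lists (raid1_grow_r1bh(), raid1_grow_bh() and friends) that this patch deletes: a mempool is created with a fixed number of pre-allocated elements and a pair of alloc/free callbacks, and mempool_alloc(pool, GFP_NOIO) falls back to the reserve and then sleeps rather than fail. The r1bio_pool_alloc()/r1bio_pool_free() bodies are outside this hunk; what follows is only a plausible minimal sketch, patterned on the slab_pool_alloc() shape visible in the fs/bio.c hunk later in this diff:

static void *r1bio_pool_alloc(int gfp_flags, void *data)
{
	/* one zeroed r1bio_t per pool element; 'data' is unused here */
	r1bio_t *r1_bio = kmalloc(sizeof(r1bio_t), gfp_flags);
	if (r1_bio)
		memset(r1_bio, 0, sizeof(*r1_bio));
	return r1_bio;
}

static void r1bio_pool_free(void *r1_bio, void *data)
{
	kfree(r1_bio);
}

With callbacks of that shape, run() can do conf->r1bio_pool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc, r1bio_pool_free, NULL), and the teardown paths undo it with mempool_destroy(), exactly as the hunks above and below show.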
- * For now, try NR_RESERVED_BUFS r1bh and - * NR_RESERVED_BUFS*raid_disks bufferheads - * This will allow at least NR_RESERVED_BUFS concurrent - * reads or writes even if kmalloc starts failing - */ - if (raid1_grow_r1bh(conf, NR_RESERVED_BUFS) < NR_RESERVED_BUFS || - raid1_grow_bh(conf, NR_RESERVED_BUFS*conf->raid_disks) - < NR_RESERVED_BUFS*conf->raid_disks) { - printk(MEM_ERROR, mdidx(mddev)); - goto out_free_conf; - } - for (i = 0; i < MD_SB_DISKS; i++) { - + descriptor = sb->disks+i; disk_idx = descriptor->raid_disk; disk = conf->mirrors + disk_idx; @@ -1691,10 +1562,10 @@ } if (!start_recovery && !(sb->state & (1 << MD_SB_CLEAN)) && - (conf->working_disks > 1)) { + (conf->working_disks > 1)) { const char * name = "raid1syncd"; - conf->resync_thread = md_register_thread(raid1syncd, conf,name); + conf->resync_thread = md_register_thread(raid1syncd, conf, name); if (!conf->resync_thread) { printk(THREAD_ERROR, mdidx(mddev)); goto out_free_conf; @@ -1731,9 +1602,8 @@ return 0; out_free_conf: - raid1_shrink_r1bh(conf); - raid1_shrink_bh(conf); - raid1_shrink_buffers(conf); + if (conf->r1bio_pool) + mempool_destroy(conf->r1bio_pool); kfree(conf); mddev->private = NULL; out: @@ -1752,9 +1622,9 @@ #undef NONE_OPERATIONAL #undef ARRAY_IS_ACTIVE -static int raid1_stop_resync (mddev_t *mddev) +static int stop_resync(mddev_t *mddev) { - raid1_conf_t *conf = mddev_to_conf(mddev); + conf_t *conf = mddev_to_conf(mddev); if (conf->resync_thread) { if (conf->resync_mirrors) { @@ -1769,9 +1639,9 @@ return 0; } -static int raid1_restart_resync (mddev_t *mddev) +static int restart_resync(mddev_t *mddev) { - raid1_conf_t *conf = mddev_to_conf(mddev); + conf_t *conf = mddev_to_conf(mddev); if (conf->resync_mirrors) { if (!conf->resync_thread) { @@ -1785,46 +1655,45 @@ return 0; } -static int raid1_stop (mddev_t *mddev) +static int stop(mddev_t *mddev) { - raid1_conf_t *conf = mddev_to_conf(mddev); + conf_t *conf = mddev_to_conf(mddev); md_unregister_thread(conf->thread); if (conf->resync_thread) md_unregister_thread(conf->resync_thread); - raid1_shrink_r1bh(conf); - raid1_shrink_bh(conf); - raid1_shrink_buffers(conf); + if (conf->r1bio_pool) + mempool_destroy(conf->r1bio_pool); kfree(conf); mddev->private = NULL; MOD_DEC_USE_COUNT; return 0; } -static mdk_personality_t raid1_personality= +static mdk_personality_t raid1_personality = { name: "raid1", - make_request: raid1_make_request, - run: raid1_run, - stop: raid1_stop, - status: raid1_status, - error_handler: raid1_error, - diskop: raid1_diskop, - stop_resync: raid1_stop_resync, - restart_resync: raid1_restart_resync, - sync_request: raid1_sync_request + make_request: make_request, + run: run, + stop: stop, + status: status, + error_handler: error, + diskop: diskop, + stop_resync: stop_resync, + restart_resync: restart_resync, + sync_request: sync_request }; -static int md__init raid1_init (void) +static int __init raid_init(void) { - return register_md_personality (RAID1, &raid1_personality); + return register_md_personality(RAID1, &raid1_personality); } -static void raid1_exit (void) +static void raid_exit(void) { - unregister_md_personality (RAID1); + unregister_md_personality(RAID1); } -module_init(raid1_init); -module_exit(raid1_exit); +module_init(raid_init); +module_exit(raid_exit); MODULE_LICENSE("GPL"); diff -urN linux-2.5.1-pre10/drivers/net/tulip/ChangeLog linux/drivers/net/tulip/ChangeLog --- linux-2.5.1-pre10/drivers/net/tulip/ChangeLog Mon Nov 19 15:19:42 2001 +++ linux/drivers/net/tulip/ChangeLog Wed Dec 12 23:32:29 2001 @@ -1,3 +1,8 @@ 
+2001-12-11 Jeff Garzik + + * eeprom.c, timer.c, media.c, tulip_core.c: + Remove 21040 and 21041 chip support. + 2001-11-13 David S. Miller * tulip_core.c (tulip_mwi_config): Kill unused label early_out. diff -urN linux-2.5.1-pre10/drivers/net/tulip/eeprom.c linux/drivers/net/tulip/eeprom.c --- linux-2.5.1-pre10/drivers/net/tulip/eeprom.c Tue Oct 2 09:00:58 2001 +++ linux/drivers/net/tulip/eeprom.c Wed Dec 12 23:32:29 2001 @@ -136,23 +136,6 @@ subsequent_board: if (ee_data[27] == 0) { /* No valid media table. */ - } else if (tp->chip_id == DC21041) { - unsigned char *p = (void *)ee_data + ee_data[27 + controller_index*3]; - int media = get_u16(p); - int count = p[2]; - p += 3; - - printk(KERN_INFO "%s: 21041 Media table, default media %4.4x (%s).\n", - dev->name, media, - media & 0x0800 ? "Autosense" : medianame[media & MEDIA_MASK]); - for (i = 0; i < count; i++) { - unsigned char media_block = *p++; - int media_code = media_block & MEDIA_MASK; - if (media_block & 0x40) - p += 6; - printk(KERN_INFO "%s: 21041 media #%d, %s.\n", - dev->name, media_code, medianame[media_code]); - } } else { unsigned char *p = (void *)ee_data + ee_data[27]; unsigned char csr12dir = 0; diff -urN linux-2.5.1-pre10/drivers/net/tulip/media.c linux/drivers/net/tulip/media.c --- linux-2.5.1-pre10/drivers/net/tulip/media.c Tue Jul 17 18:53:55 2001 +++ linux/drivers/net/tulip/media.c Wed Dec 12 23:32:29 2001 @@ -21,12 +21,6 @@ #include "tulip.h" -/* This is a mysterious value that can be written to CSR11 in the 21040 (only) - to support a pre-NWay full-duplex signaling mechanism using short frames. - No one knows what it should be, but if left at its default value some - 10base2(!) packets trigger a full-duplex-request interrupt. */ -#define FULL_DUPLEX_MAGIC 0x6969 - /* The maximum data clock rate is 2.5 Mhz. The minimum timing is usually met by back-to-back PCI I/O cycles, but we insert a delay to avoid "overclocking" issues or future 66Mhz PCI. */ @@ -326,17 +320,6 @@ printk(KERN_DEBUG "%s: Using media type %s, CSR12 is %2.2x.\n", dev->name, medianame[dev->if_port], inl(ioaddr + CSR12) & 0xff); - } else if (tp->chip_id == DC21041) { - int port = dev->if_port <= 4 ? dev->if_port : 0; - if (tulip_debug > 1) - printk(KERN_DEBUG "%s: 21041 using media %s, CSR12 is %4.4x.\n", - dev->name, medianame[port == 3 ? 12: port], - inl(ioaddr + CSR12)); - outl(0x00000000, ioaddr + CSR13); /* Reset the serial interface */ - outl(t21041_csr14[port], ioaddr + CSR14); - outl(t21041_csr15[port], ioaddr + CSR15); - outl(t21041_csr13[port], ioaddr + CSR13); - new_csr6 = 0x80020000; } else if (tp->chip_id == LC82C168) { if (startup && ! tp->medialock) dev->if_port = tp->mii_cnt ? 11 : 0; @@ -363,26 +346,6 @@ new_csr6 = 0x00420000; outl(0x1F078, ioaddr + 0xB8); } - } else if (tp->chip_id == DC21040) { /* 21040 */ - /* Turn on the xcvr interface. */ - int csr12 = inl(ioaddr + CSR12); - if (tulip_debug > 1) - printk(KERN_DEBUG "%s: 21040 media type is %s, CSR12 is %2.2x.\n", - dev->name, medianame[dev->if_port], csr12); - if (tulip_media_cap[dev->if_port] & MediaAlwaysFD) - tp->full_duplex = 1; - new_csr6 = 0x20000; - /* Set the full duplux match frame. 
*/ - outl(FULL_DUPLEX_MAGIC, ioaddr + CSR11); - outl(0x00000000, ioaddr + CSR13); /* Reset the serial interface */ - if (t21040_csr13[dev->if_port] & 8) { - outl(0x0705, ioaddr + CSR14); - outl(0x0006, ioaddr + CSR15); - } else { - outl(0xffff, ioaddr + CSR14); - outl(0x0000, ioaddr + CSR15); - } - outl(0x8f01 | t21040_csr13[dev->if_port], ioaddr + CSR13); } else { /* Unknown chip type with no media table. */ if (tp->default_port == 0) dev->if_port = tp->mii_cnt ? 11 : 3; diff -urN linux-2.5.1-pre10/drivers/net/tulip/timer.c linux/drivers/net/tulip/timer.c --- linux-2.5.1-pre10/drivers/net/tulip/timer.c Wed Jun 20 11:15:44 2001 +++ linux/drivers/net/tulip/timer.c Wed Dec 12 23:32:29 2001 @@ -33,60 +33,6 @@ inl(ioaddr + CSR14), inl(ioaddr + CSR15)); } switch (tp->chip_id) { - case DC21040: - if (!tp->medialock && csr12 & 0x0002) { /* Network error */ - printk(KERN_INFO "%s: No link beat found.\n", - dev->name); - dev->if_port = (dev->if_port == 2 ? 0 : 2); - tulip_select_media(dev, 0); - dev->trans_start = jiffies; - } - break; - case DC21041: - if (tulip_debug > 2) - printk(KERN_DEBUG "%s: 21041 media tick CSR12 %8.8x.\n", - dev->name, csr12); - if (tp->medialock) break; - switch (dev->if_port) { - case 0: case 3: case 4: - if (csr12 & 0x0004) { /*LnkFail */ - /* 10baseT is dead. Check for activity on alternate port. */ - tp->mediasense = 1; - if (csr12 & 0x0200) - dev->if_port = 2; - else - dev->if_port = 1; - printk(KERN_INFO "%s: No 21041 10baseT link beat, Media switched to %s.\n", - dev->name, medianame[dev->if_port]); - outl(0, ioaddr + CSR13); /* Reset */ - outl(t21041_csr14[dev->if_port], ioaddr + CSR14); - outl(t21041_csr15[dev->if_port], ioaddr + CSR15); - outl(t21041_csr13[dev->if_port], ioaddr + CSR13); - next_tick = 10*HZ; /* 2.4 sec. */ - } else - next_tick = 30*HZ; - break; - case 1: /* 10base2 */ - case 2: /* AUI */ - if (csr12 & 0x0100) { - next_tick = (30*HZ); /* 30 sec. */ - tp->mediasense = 0; - } else if ((csr12 & 0x0004) == 0) { - printk(KERN_INFO "%s: 21041 media switched to 10baseT.\n", - dev->name); - dev->if_port = 0; - tulip_select_media(dev, 0); - next_tick = (24*HZ)/10; /* 2.4 sec. */ - } else if (tp->mediasense || (csr12 & 0x0002)) { - dev->if_port = 3 - dev->if_port; /* Swap ports. 
*/ - tulip_select_media(dev, 0); - next_tick = 20*HZ; - } else { - next_tick = 20*HZ; - } - break; - } - break; case DC21140: case DC21142: case MX98713: diff -urN linux-2.5.1-pre10/drivers/net/tulip/tulip_core.c linux/drivers/net/tulip/tulip_core.c --- linux-2.5.1-pre10/drivers/net/tulip/tulip_core.c Mon Nov 19 15:19:42 2001 +++ linux/drivers/net/tulip/tulip_core.c Wed Dec 12 23:32:29 2001 @@ -15,8 +15,8 @@ */ #define DRV_NAME "tulip" -#define DRV_VERSION "0.9.15-pre9" -#define DRV_RELDATE "Nov 6, 2001" +#define DRV_VERSION "1.1.0" +#define DRV_RELDATE "Dec 11, 2001" #include #include @@ -130,12 +130,8 @@ */ struct tulip_chip_table tulip_tbl[] = { - /* DC21040 */ - { "Digital DC21040 Tulip", 128, 0x0001ebef, 0, tulip_timer }, - - /* DC21041 */ - { "Digital DC21041 Tulip", 128, 0x0001ebef, - HAS_MEDIA_TABLE | HAS_NWAY, tulip_timer }, + { }, /* placeholder for array, slot unused currently */ + { }, /* placeholder for array, slot unused currently */ /* DC21140 */ { "Digital DS21140 Tulip", 128, 0x0001ebef, @@ -192,8 +188,6 @@ static struct pci_device_id tulip_pci_tbl[] __devinitdata = { - { 0x1011, 0x0002, PCI_ANY_ID, PCI_ANY_ID, 0, 0, DC21040 }, - { 0x1011, 0x0014, PCI_ANY_ID, PCI_ANY_ID, 0, 0, DC21041 }, { 0x1011, 0x0009, PCI_ANY_ID, PCI_ANY_ID, 0, 0, DC21140 }, { 0x1011, 0x0019, PCI_ANY_ID, PCI_ANY_ID, 0, 0, DC21143 }, { 0x11AD, 0x0002, PCI_ANY_ID, PCI_ANY_ID, 0, 0, LC82C168 }, @@ -224,19 +218,6 @@ /* A full-duplex map for media types. */ const char tulip_media_cap[32] = {0,0,0,16, 3,19,16,24, 27,4,7,5, 0,20,23,20, 28,31,0,0, }; -u8 t21040_csr13[] = {2,0x0C,8,4, 4,0,0,0, 0,0,0,0, 4,0,0,0}; - -/* 21041 transceiver register settings: 10-T, 10-2, AUI, 10-T, 10T-FD*/ -u16 t21041_csr13[] = { - csr13_mask_10bt, /* 10-T */ - csr13_mask_auibnc, /* 10-2 */ - csr13_mask_auibnc, /* AUI */ - csr13_mask_10bt, /* 10-T */ - csr13_mask_10bt, /* 10T-FD */ -}; -u16 t21041_csr14[] = { 0xFFFF, 0xF7FD, 0xF7FD, 0x7F3F, 0x7F3D, }; -u16 t21041_csr15[] = { 0x0008, 0x0006, 0x000E, 0x0008, 0x0008, }; - static void tulip_tx_timeout(struct net_device *dev); static void tulip_init_ring(struct net_device *dev); @@ -388,19 +369,6 @@ outl(0x0008, ioaddr + CSR15); } tulip_select_media(dev, 1); - } else if (tp->chip_id == DC21041) { - dev->if_port = 0; - tp->nway = tp->mediasense = 1; - tp->nwayset = tp->lpar = 0; - outl(0x00000000, ioaddr + CSR13); - outl(0xFFFFFFFF, ioaddr + CSR14); - outl(0x00000008, ioaddr + CSR15); /* Listen on AUI also. */ - tp->csr6 = 0x80020000; - if (tp->sym_advertise & 0x0040) - tp->csr6 |= FullDuplex; - outl(tp->csr6, ioaddr + CSR6); - outl(0x0000EF01, ioaddr + CSR13); - } else if (tp->chip_id == DC21142) { if (tp->mii_cnt) { tulip_select_media(dev, 1); @@ -538,33 +506,6 @@ if (tulip_debug > 1) printk(KERN_WARNING "%s: Transmit timeout using MII device.\n", dev->name); - } else if (tp->chip_id == DC21040) { - if ( !tp->medialock && inl(ioaddr + CSR12) & 0x0002) { - dev->if_port = (dev->if_port == 2 ? 0 : 2); - printk(KERN_INFO "%s: 21040 transmit timed out, switching to " - "%s.\n", - dev->name, medianame[dev->if_port]); - tulip_select_media(dev, 0); - } - goto out; - } else if (tp->chip_id == DC21041) { - int csr12 = inl(ioaddr + CSR12); - - printk(KERN_WARNING "%s: 21041 transmit timed out, status %8.8x, " - "CSR12 %8.8x, CSR13 %8.8x, CSR14 %8.8x, resetting...\n", - dev->name, inl(ioaddr + CSR5), csr12, - inl(ioaddr + CSR13), inl(ioaddr + CSR14)); - tp->mediasense = 1; - if ( ! 
tp->medialock) { - if (dev->if_port == 1 || dev->if_port == 2) - if (csr12 & 0x0004) { - dev->if_port = 2 - dev->if_port; - } else - dev->if_port = 0; - else - dev->if_port = 1; - tulip_select_media(dev, 0); - } } else if (tp->chip_id == DC21140 || tp->chip_id == DC21142 || tp->chip_id == MX98713 || tp->chip_id == COMPEX9881 || tp->chip_id == DM910X) { @@ -636,7 +577,6 @@ tp->stats.tx_errors++; -out: spin_unlock_irqrestore (&tp->lock, flags); dev->trans_start = jiffies; netif_wake_queue (dev); @@ -802,10 +742,6 @@ /* release any unconsumed transmit buffers */ tulip_clean_tx_ring(tp); - /* 21040 -- Leave the card in 10baseT state. */ - if (tp->chip_id == DC21040) - outl (0x00000004, ioaddr + CSR13); - if (inl (ioaddr + CSR6) != 0xffffffff) tp->stats.rx_missed_errors += inl (ioaddr + CSR8) & 0xffff; @@ -966,16 +902,14 @@ 0x1848 + ((csr12&0x7000) == 0x5000 ? 0x20 : 0) + ((csr12&0x06) == 6 ? 0 : 4); - if (tp->chip_id != DC21041) - data->val_out |= 0x6048; + data->val_out |= 0x6048; break; case 4: /* Advertised value, bogus 10baseTx-FD value from CSR6. */ data->val_out = ((inl(ioaddr + CSR6) >> 3) & 0x0040) + ((csr14 >> 1) & 0x20) + 1; - if (tp->chip_id != DC21041) - data->val_out |= ((csr14 >> 9) & 0x03C0); + data->val_out |= ((csr14 >> 9) & 0x03C0); break; case 5: data->val_out = tp->lpar; break; default: data->val_out = 0; break; @@ -1358,7 +1292,6 @@ long ioaddr; static int board_idx = -1; int chip_idx = ent->driver_data; - unsigned int t2104x_mode = 0; unsigned int eeprom_missing = 0; unsigned int force_csr0 = 0; @@ -1527,31 +1460,12 @@ /* Clear the missed-packet counter. */ inl(ioaddr + CSR8); - if (chip_idx == DC21041) { - if (inl(ioaddr + CSR9) & 0x8000) { - chip_idx = DC21040; - t2104x_mode = 1; - } else { - t2104x_mode = 2; - } - } - /* The station address ROM is read byte serially. The register must be polled, waiting for the value to be read bit serially from the EEPROM. */ sum = 0; - if (chip_idx == DC21040) { - outl(0, ioaddr + CSR9); /* Reset the pointer with a dummy write. */ - for (i = 0; i < 6; i++) { - int value, boguscnt = 100000; - do - value = inl(ioaddr + CSR9); - while (value < 0 && --boguscnt > 0); - dev->dev_addr[i] = value; - sum += value & 0xff; - } - } else if (chip_idx == LC82C168) { + if (chip_idx == LC82C168) { for (i = 0; i < 3; i++) { int value, boguscnt = 100000; outl(0x600 | i, ioaddr + 0x98); @@ -1719,10 +1633,6 @@ dev->name, tulip_tbl[chip_idx].chip_name, chip_rev, ioaddr); pci_set_drvdata(pdev, dev); - if (t2104x_mode == 1) - printk(" 21040 compatible mode,"); - else if (t2104x_mode == 2) - printk(" 21041 mode,"); if (eeprom_missing) printk(" EEPROM not present,"); for (i = 0; i < 6; i++) @@ -1731,26 +1641,13 @@ if (tp->chip_id == PNIC2) tp->link_change = pnic2_lnk_change; - else if ((tp->flags & HAS_NWAY) || tp->chip_id == DC21041) + else if (tp->flags & HAS_NWAY) tp->link_change = t21142_lnk_change; else if (tp->flags & HAS_PNICNWAY) tp->link_change = pnic_lnk_change; /* Reset the xcvr interface and turn on heartbeat. */ switch (chip_idx) { - case DC21041: - if (tp->sym_advertise == 0) - tp->sym_advertise = 0x0061; - outl(0x00000000, ioaddr + CSR13); - outl(0xFFFFFFFF, ioaddr + CSR14); - outl(0x00000008, ioaddr + CSR15); /* Listen on AUI also. 
*/ - outl(inl(ioaddr + CSR6) | csr6_fd, ioaddr + CSR6); - outl(0x0000EF01, ioaddr + CSR13); - break; - case DC21040: - outl(0x00000000, ioaddr + CSR13); - outl(0x00000004, ioaddr + CSR13); - break; case DC21140: case DM910X: default: diff -urN linux-2.5.1-pre10/drivers/scsi/eata.c linux/drivers/scsi/eata.c --- linux-2.5.1-pre10/drivers/scsi/eata.c Fri Nov 9 14:05:06 2001 +++ linux/drivers/scsi/eata.c Wed Dec 12 23:32:30 2001 @@ -1,6 +1,9 @@ /* * eata.c - Low-level driver for EATA/DMA SCSI host adapters. * + * 11 Dec 2001 Rev. 7.00 for linux 2.5.1 + * + Use host->host_lock instead of io_request_lock. + * * 1 May 2001 Rev. 6.05 for linux 2.4.4 * + Clean up all pci related routines. * + Fix data transfer direction for opcode SEND_CUE_SHEET (0x5d) @@ -438,13 +441,6 @@ #include #include -#define SPIN_FLAGS unsigned long spin_flags; -#define SPIN_LOCK spin_lock_irq(&io_request_lock); -#define SPIN_LOCK_SAVE spin_lock_irqsave(&io_request_lock, spin_flags); -#define SPIN_UNLOCK spin_unlock_irq(&io_request_lock); -#define SPIN_UNLOCK_RESTORE \ - spin_unlock_irqrestore(&io_request_lock, spin_flags); - /* Subversion values */ #define ISA 0 #define ESA 1 @@ -1589,10 +1585,12 @@ #endif HD(j)->in_reset = TRUE; - SPIN_UNLOCK + + spin_unlock_irq(&sh[j]->host_lock); time = jiffies; while ((jiffies - time) < (10 * HZ) && limit++ < 200000) udelay(100L); - SPIN_LOCK + spin_lock_irq(&sh[j]->host_lock); + printk("%s: reset, interrupts disabled, loops %d.\n", BN(j), limit); for (i = 0; i < sh[j]->can_queue; i++) { @@ -2036,14 +2034,14 @@ static void do_interrupt_handler(int irq, void *shap, struct pt_regs *regs) { unsigned int j; - SPIN_FLAGS + unsigned long spin_flags; /* Check if the interrupt must be processed by this handler */ if ((j = (unsigned int)((char *)shap - sha)) >= num_boards) return; - SPIN_LOCK_SAVE + spin_lock_irqsave(&sh[j]->host_lock, spin_flags); ihdlr(irq, j); - SPIN_UNLOCK_RESTORE + spin_unlock_irqrestore(&sh[j]->host_lock, spin_flags); } int eata2x_release(struct Scsi_Host *shpnt) { @@ -2077,4 +2075,4 @@ #ifndef MODULE __setup("eata=", option_setup); #endif /* end MODULE */ -MODULE_LICENSE("Dual BSD/GPL"); +MODULE_LICENSE("GPL"); diff -urN linux-2.5.1-pre10/drivers/scsi/eata.h linux/drivers/scsi/eata.h --- linux-2.5.1-pre10/drivers/scsi/eata.h Wed Dec 12 23:32:27 2001 +++ linux/drivers/scsi/eata.h Wed Dec 12 23:32:30 2001 @@ -13,7 +13,7 @@ int eata2x_reset(Scsi_Cmnd *); int eata2x_biosparam(Disk *, kdev_t, int *); -#define EATA_VERSION "6.05.00" +#define EATA_VERSION "7.00.00" #define EATA { \ name: "EATA/DMA 2.0x rev. 
" EATA_VERSION " ", \ diff -urN linux-2.5.1-pre10/drivers/scsi/scsi.c linux/drivers/scsi/scsi.c --- linux-2.5.1-pre10/drivers/scsi/scsi.c Wed Dec 12 23:32:27 2001 +++ linux/drivers/scsi/scsi.c Wed Dec 12 23:32:30 2001 @@ -183,7 +183,7 @@ request_queue_t *q = &SDpnt->request_queue; int max_segments = SHpnt->sg_tablesize; - blk_init_queue(q, scsi_request_fn); + blk_init_queue(q, scsi_request_fn, &SHpnt->host_lock); q->queuedata = (void *) SDpnt; #ifdef DMA_CHUNK_SIZE diff -urN linux-2.5.1-pre10/drivers/scsi/scsi_error.c linux/drivers/scsi/scsi_error.c --- linux-2.5.1-pre10/drivers/scsi/scsi_error.c Wed Dec 12 23:32:27 2001 +++ linux/drivers/scsi/scsi_error.c Wed Dec 12 23:32:30 2001 @@ -1254,9 +1254,7 @@ break; } - spin_lock(&q->queue_lock); q->request_fn(q); - spin_unlock(&q->queue_lock); } spin_unlock_irqrestore(&host->host_lock, flags); } diff -urN linux-2.5.1-pre10/drivers/scsi/scsi_lib.c linux/drivers/scsi/scsi_lib.c --- linux-2.5.1-pre10/drivers/scsi/scsi_lib.c Wed Dec 12 23:32:27 2001 +++ linux/drivers/scsi/scsi_lib.c Wed Dec 12 23:32:30 2001 @@ -70,7 +70,7 @@ { unsigned long flags; - ASSERT_LOCK(&q->queue_lock, 0); + ASSERT_LOCK(q->queue_lock, 0); /* * tell I/O scheduler that this isn't a regular read/write (ie it @@ -91,10 +91,10 @@ * head of the queue for things like a QUEUE_FULL message from a * device, or a host that is unable to accept a particular command. */ - spin_lock_irqsave(&q->queue_lock, flags); + spin_lock_irqsave(q->queue_lock, flags); __elv_add_request(q, rq, !at_head, 0); q->request_fn(q); - spin_unlock_irqrestore(&q->queue_lock, flags); + spin_unlock_irqrestore(q->queue_lock, flags); } @@ -250,9 +250,9 @@ Scsi_Device *SDpnt; struct Scsi_Host *SHpnt; - ASSERT_LOCK(&q->queue_lock, 0); + ASSERT_LOCK(q->queue_lock, 0); - spin_lock_irqsave(&q->queue_lock, flags); + spin_lock_irqsave(q->queue_lock, flags); if (SCpnt != NULL) { /* @@ -325,7 +325,7 @@ SHpnt->some_device_starved = 0; } } - spin_unlock_irqrestore(&q->queue_lock, flags); + spin_unlock_irqrestore(q->queue_lock, flags); } /* @@ -360,7 +360,7 @@ request_queue_t *q = &SCpnt->device->request_queue; struct request *req = &SCpnt->request; - ASSERT_LOCK(&q->queue_lock, 0); + ASSERT_LOCK(q->queue_lock, 0); /* * If there are blocks left over at the end, set up the command @@ -445,7 +445,7 @@ { struct request *req = &SCpnt->request; - ASSERT_LOCK(&SCpnt->device->request_queue.queue_lock, 0); + ASSERT_LOCK(&SCpnt->host->host_lock, 0); /* * Free up any indirection buffers we allocated for DMA purposes. @@ -518,7 +518,7 @@ * would be used if we just wanted to retry, for example. * */ - ASSERT_LOCK(&q->queue_lock, 0); + ASSERT_LOCK(q->queue_lock, 0); /* * Free up any indirection buffers we allocated for DMA purposes. @@ -746,8 +746,6 @@ kdev_t dev = req->rq_dev; int major = MAJOR(dev); - ASSERT_LOCK(&req->q->queue_lock, 1); - for (spnt = scsi_devicelist; spnt; spnt = spnt->next) { /* * Search for a block device driver that supports this @@ -804,7 +802,7 @@ struct Scsi_Host *SHpnt; struct Scsi_Device_Template *STpnt; - ASSERT_LOCK(&q->queue_lock, 1); + ASSERT_LOCK(q->queue_lock, 1); SDpnt = (Scsi_Device *) q->queuedata; if (!SDpnt) { @@ -871,9 +869,9 @@ */ SDpnt->was_reset = 0; if (SDpnt->removable && !in_interrupt()) { - spin_unlock_irq(&q->queue_lock); + spin_unlock_irq(q->queue_lock); scsi_ioctl(SDpnt, SCSI_IOCTL_DOORLOCK, 0); - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); continue; } } @@ -982,7 +980,7 @@ * another. 
*/ req = NULL; - spin_unlock_irq(&q->queue_lock); + spin_unlock_irq(q->queue_lock); if (SCpnt->request.flags & REQ_CMD) { /* @@ -1012,7 +1010,7 @@ { panic("Should not have leftover blocks\n"); } - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); SHpnt->host_busy--; SDpnt->device_busy--; continue; @@ -1028,7 +1026,7 @@ { panic("Should not have leftover blocks\n"); } - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); SHpnt->host_busy--; SDpnt->device_busy--; continue; @@ -1049,7 +1047,7 @@ * Now we need to grab the lock again. We are about to mess * with the request queue and try to find another command. */ - spin_lock_irq(&q->queue_lock); + spin_lock_irq(q->queue_lock); } } diff -urN linux-2.5.1-pre10/drivers/scsi/scsi_merge.c linux/drivers/scsi/scsi_merge.c --- linux-2.5.1-pre10/drivers/scsi/scsi_merge.c Wed Dec 12 23:32:27 2001 +++ linux/drivers/scsi/scsi_merge.c Wed Dec 12 23:32:30 2001 @@ -307,7 +307,7 @@ } #ifdef DMA_CHUNK_SIZE - if (MERGEABLE_BUFFERS(bio, req->bio)) + if (MERGEABLE_BUFFERS(req->biotail, bio)) return scsi_new_mergeable(q, req, bio); #endif @@ -461,9 +461,7 @@ * (mainly because we don't need queue management functions * which keep the tally uptodate. */ -__inline static int __init_io(Scsi_Cmnd * SCpnt, - int sg_count_valid, - int dma_host) +__inline static int __init_io(Scsi_Cmnd * SCpnt, int dma_host) { struct bio * bio; char * buff; @@ -480,11 +478,7 @@ /* * First we need to know how many scatter gather segments are needed. */ - if (!sg_count_valid) { - count = __count_segments(req, dma_host, NULL); - } else { - count = req->nr_segments; - } + count = req->nr_segments; /* * If the dma pool is nearly empty, then queue a minimal request @@ -721,20 +715,14 @@ return 1; } -#define INITIO(_FUNCTION, _VALID, _DMA) \ +#define INITIO(_FUNCTION, _DMA) \ static int _FUNCTION(Scsi_Cmnd * SCpnt) \ { \ - return __init_io(SCpnt, _VALID, _DMA); \ + return __init_io(SCpnt, _DMA); \ } -/* - * ll_rw_blk.c now keeps track of the number of segments in - * a request. Thus we don't have to do it any more here. - * We always force "_VALID" to 1. Eventually clean this up - * and get rid of the extra argument. - */ -INITIO(scsi_init_io_v, 1, 0) -INITIO(scsi_init_io_vd, 1, 1) +INITIO(scsi_init_io_v, 0) +INITIO(scsi_init_io_vd, 1) /* * Function: initialize_merge_fn() diff -urN linux-2.5.1-pre10/drivers/scsi/scsi_queue.c linux/drivers/scsi/scsi_queue.c --- linux-2.5.1-pre10/drivers/scsi/scsi_queue.c Wed Dec 12 23:32:27 2001 +++ linux/drivers/scsi/scsi_queue.c Wed Dec 12 23:32:30 2001 @@ -80,7 +80,6 @@ { struct Scsi_Host *host; unsigned long flags; - request_queue_t *q = &cmd->device->request_queue; SCSI_LOG_MLQUEUE(1, printk("Inserting command %p into mlqueue\n", cmd)); @@ -138,10 +137,10 @@ * Decrement the counters, since these commands are no longer * active on the host/device. */ - spin_lock_irqsave(&q->queue_lock, flags); + spin_lock_irqsave(&cmd->host->host_lock, flags); cmd->host->host_busy--; cmd->device->device_busy--; - spin_unlock_irqrestore(&q->queue_lock, flags); + spin_unlock_irqrestore(&cmd->host->host_lock, flags); /* * Insert this command at the head of the queue for it's device. diff -urN linux-2.5.1-pre10/drivers/scsi/u14-34f.c linux/drivers/scsi/u14-34f.c --- linux-2.5.1-pre10/drivers/scsi/u14-34f.c Thu Oct 25 13:53:51 2001 +++ linux/drivers/scsi/u14-34f.c Wed Dec 12 23:32:30 2001 @@ -1,6 +1,9 @@ /* * u14-34f.c - Low-level driver for UltraStor 14F/34F SCSI host adapters. * + * 11 Dec 2001 Rev. 
7.00 for linux 2.5.1 + * + Use host->host_lock instead of io_request_lock. + * * 1 May 2001 Rev. 6.05 for linux 2.4.4 * + Fix data transfer direction for opcode SEND_CUE_SHEET (0x5d) * @@ -334,7 +337,6 @@ * the driver sets host->wish_block = TRUE for all ISA boards. */ -#include #include #ifndef LinuxVersionCode @@ -343,6 +345,9 @@ #define MAX_INT_PARAM 10 +#if defined(MODULE) +#include + MODULE_PARM(boot_options, "s"); MODULE_PARM(io_port, "1-" __MODULE_STRING(MAX_INT_PARAM) "i"); MODULE_PARM(linked_comm, "i"); @@ -352,6 +357,8 @@ MODULE_PARM(ext_tran, "i"); MODULE_AUTHOR("Dario Ballabio"); +#endif + #include #include #include @@ -374,13 +381,6 @@ #include #include -#define SPIN_FLAGS unsigned long spin_flags; -#define SPIN_LOCK spin_lock_irq(&io_request_lock); -#define SPIN_LOCK_SAVE spin_lock_irqsave(&io_request_lock, spin_flags); -#define SPIN_UNLOCK spin_unlock_irq(&io_request_lock); -#define SPIN_UNLOCK_RESTORE \ - spin_unlock_irqrestore(&io_request_lock, spin_flags); - /* Values for the PRODUCT_ID ports for the 14/34F */ #define PRODUCT_ID1 0x56 #define PRODUCT_ID2 0x40 /* NOTE: Only upper nibble is used */ @@ -672,10 +672,8 @@ /* Issue OGM interrupt */ outb(CMD_OGM_INTR, sh[j]->io_port + REG_LCL_INTR); - SPIN_UNLOCK time = jiffies; while ((jiffies - time) < HZ && limit++ < 20000) udelay(100L); - SPIN_LOCK if (cpp->adapter_status || HD(j)->cp_stat[0] != FREE) { HD(j)->cp_stat[0] = FREE; @@ -1274,10 +1272,12 @@ #endif HD(j)->in_reset = TRUE; - SPIN_UNLOCK + + spin_unlock_irq(&sh[j]->host_lock); time = jiffies; while ((jiffies - time) < (10 * HZ) && limit++ < 200000) udelay(100L); - SPIN_LOCK + spin_lock_irq(&sh[j]->host_lock); + printk("%s: reset, interrupts disabled, loops %d.\n", BN(j), limit); for (i = 0; i < sh[j]->can_queue; i++) { @@ -1718,14 +1718,14 @@ static void do_interrupt_handler(int irq, void *shap, struct pt_regs *regs) { unsigned int j; - SPIN_FLAGS + unsigned long spin_flags; /* Check if the interrupt must be processed by this handler */ if ((j = (unsigned int)((char *)shap - sha)) >= num_boards) return; - SPIN_LOCK_SAVE + spin_lock_irqsave(&sh[j]->host_lock, spin_flags); ihdlr(irq, j); - SPIN_UNLOCK_RESTORE + spin_unlock_irqrestore(&sh[j]->host_lock, spin_flags); } int u14_34f_release(struct Scsi_Host *shpnt) { @@ -1752,7 +1752,6 @@ return FALSE; } -MODULE_LICENSE("BSD without advertisement clause"); static Scsi_Host_Template driver_template = ULTRASTOR_14_34F; #include "scsi_module.c" @@ -1760,3 +1759,4 @@ #ifndef MODULE __setup("u14-34f=", option_setup); #endif /* end MODULE */ +MODULE_LICENSE("GPL"); diff -urN linux-2.5.1-pre10/drivers/scsi/u14-34f.h linux/drivers/scsi/u14-34f.h --- linux-2.5.1-pre10/drivers/scsi/u14-34f.h Wed Dec 12 23:32:27 2001 +++ linux/drivers/scsi/u14-34f.h Wed Dec 12 23:32:30 2001 @@ -13,7 +13,7 @@ int u14_34f_reset(Scsi_Cmnd *); int u14_34f_biosparam(Disk *, kdev_t, int *); -#define U14_34F_VERSION "6.05.00" +#define U14_34F_VERSION "7.00.00" #define ULTRASTOR_14_34F { \ name: "UltraStor 14F/34F rev. 
" U14_34F_VERSION " ", \ diff -urN linux-2.5.1-pre10/fs/bio.c linux/fs/bio.c --- linux-2.5.1-pre10/fs/bio.c Wed Dec 12 23:32:27 2001 +++ linux/fs/bio.c Wed Dec 12 23:32:30 2001 @@ -48,7 +48,7 @@ #define BIO_MAX_PAGES (bvec_pool_sizes[BIOVEC_NR_POOLS - 1]) -static void * slab_pool_alloc(int gfp_mask, void *data) +static void *slab_pool_alloc(int gfp_mask, void *data) { return kmem_cache_alloc(data, gfp_mask); } diff -urN linux-2.5.1-pre10/fs/block_dev.c linux/fs/block_dev.c --- linux-2.5.1-pre10/fs/block_dev.c Wed Dec 12 23:32:27 2001 +++ linux/fs/block_dev.c Wed Dec 12 23:32:30 2001 @@ -324,6 +324,7 @@ new_bdev->bd_dev = dev; new_bdev->bd_op = NULL; new_bdev->bd_inode = inode; + inode->i_mode = S_IFBLK; inode->i_rdev = kdev; inode->i_dev = kdev; inode->i_bdev = new_bdev; diff -urN linux-2.5.1-pre10/fs/buffer.c linux/fs/buffer.c --- linux-2.5.1-pre10/fs/buffer.c Wed Dec 12 23:32:27 2001 +++ linux/fs/buffer.c Wed Dec 12 23:32:30 2001 @@ -2005,12 +2005,12 @@ { int i, nr_blocks, retval; sector_t *blocks = iobuf->blocks; - struct buffer_head bh; - bh.b_dev = inode->i_dev; nr_blocks = iobuf->length / blocksize; /* build the blocklist */ for (i = 0; i < nr_blocks; i++, blocknr++) { + struct buffer_head bh; + bh.b_state = 0; bh.b_dev = inode->i_dev; bh.b_size = blocksize; @@ -2037,7 +2037,7 @@ } /* This does not understand multi-device filesystems currently */ - retval = brw_kiovec(rw, 1, &iobuf, bh.b_dev, blocks, blocksize); + retval = brw_kiovec(rw, 1, &iobuf, inode->i_dev, blocks, blocksize); out: return retval; diff -urN linux-2.5.1-pre10/fs/ufs/inode.c linux/fs/ufs/inode.c --- linux-2.5.1-pre10/fs/ufs/inode.c Mon Nov 19 14:55:46 2001 +++ linux/fs/ufs/inode.c Wed Dec 12 23:32:30 2001 @@ -311,7 +311,7 @@ return result; } -static int ufs_getfrag_block (struct inode *inode, long fragment, struct buffer_head *bh_result, int create) +static int ufs_getfrag_block (struct inode *inode, sector_t fragment, struct buffer_head *bh_result, int create) { struct super_block * sb; struct ufs_sb_private_info * uspi; diff -urN linux-2.5.1-pre10/include/asm-i386/io.h linux/include/asm-i386/io.h --- linux-2.5.1-pre10/include/asm-i386/io.h Wed Dec 12 23:32:27 2001 +++ linux/include/asm-i386/io.h Wed Dec 12 23:32:30 2001 @@ -51,12 +51,9 @@ */ #if CONFIG_DEBUG_IOVIRT extern void *__io_virt_debug(unsigned long x, const char *file, int line); - extern unsigned long __io_phys_debug(unsigned long x, const char *file, int line); #define __io_virt(x) __io_virt_debug((unsigned long)(x), __FILE__, __LINE__) -//#define __io_phys(x) __io_phys_debug((unsigned long)(x), __FILE__, __LINE__) #else #define __io_virt(x) ((void *)(x)) -//#define __io_phys(x) __pa(x) #endif /* diff -urN linux-2.5.1-pre10/include/asm-s390/io.h linux/include/asm-s390/io.h --- linux-2.5.1-pre10/include/asm-s390/io.h Wed Jul 25 14:12:02 2001 +++ linux/include/asm-s390/io.h Wed Dec 12 23:32:30 2001 @@ -19,7 +19,7 @@ #define IO_SPACE_LIMIT 0xffffffff #define __io_virt(x) ((void *)(PAGE_OFFSET | (unsigned long)(x))) -#define __io_phys(x) ((unsigned long)(x) & ~PAGE_OFFSET) + /* * Change virtual addresses to physical addresses and vv. 
* These are pretty trivial diff -urN linux-2.5.1-pre10/include/asm-s390x/io.h linux/include/asm-s390x/io.h --- linux-2.5.1-pre10/include/asm-s390x/io.h Wed Jul 25 14:12:02 2001 +++ linux/include/asm-s390x/io.h Wed Dec 12 23:32:30 2001 @@ -19,7 +19,7 @@ #define IO_SPACE_LIMIT 0xffffffff #define __io_virt(x) ((void *)(PAGE_OFFSET | (unsigned long)(x))) -#define __io_phys(x) ((unsigned long)(x) & ~PAGE_OFFSET) + /* * Change virtual addresses to physical addresses and vv. * These are pretty trivial diff -urN linux-2.5.1-pre10/include/linux/blkdev.h linux/include/linux/blkdev.h --- linux-2.5.1-pre10/include/linux/blkdev.h Wed Dec 12 23:32:27 2001 +++ linux/include/linux/blkdev.h Wed Dec 12 23:32:30 2001 @@ -160,7 +160,7 @@ /* * protects queue structures from reentrancy */ - spinlock_t queue_lock; + spinlock_t *queue_lock; /* * queue settings @@ -258,13 +258,14 @@ extern void blk_plug_device(request_queue_t *); extern void blk_recount_segments(request_queue_t *, struct bio *); extern inline int blk_contig_segment(request_queue_t *q, struct bio *, struct bio *); +extern void blk_queue_assign_lock(request_queue_t *q, spinlock_t *); extern int block_ioctl(kdev_t, unsigned int, unsigned long); /* * Access functions for manipulating queue properties */ -extern int blk_init_queue(request_queue_t *, request_fn_proc *); +extern int blk_init_queue(request_queue_t *, request_fn_proc *, spinlock_t *); extern void blk_cleanup_queue(request_queue_t *); extern void blk_queue_make_request(request_queue_t *, make_request_fn *); extern void blk_queue_bounce_limit(request_queue_t *, u64); diff -urN linux-2.5.1-pre10/include/linux/devfs_fs_kernel.h linux/include/linux/devfs_fs_kernel.h --- linux-2.5.1-pre10/include/linux/devfs_fs_kernel.h Wed Dec 12 23:32:27 2001 +++ linux/include/linux/devfs_fs_kernel.h Wed Dec 12 23:32:30 2001 @@ -47,14 +47,6 @@ typedef struct devfs_entry * devfs_handle_t; - -#ifdef CONFIG_BLK_DEV_INITRD -# define ROOT_DEVICE_NAME ((real_root_dev ==ROOT_DEV) ? 
root_device_name:NULL) -#else -# define ROOT_DEVICE_NAME root_device_name -#endif - - #ifdef CONFIG_DEVFS_FS struct unique_numspace diff -urN linux-2.5.1-pre10/include/linux/ide.h linux/include/linux/ide.h --- linux-2.5.1-pre10/include/linux/ide.h Wed Dec 12 23:32:27 2001 +++ linux/include/linux/ide.h Wed Dec 12 23:32:30 2001 @@ -1001,7 +1001,6 @@ void hwif_unregister (ide_hwif_t *hwif); -#define DRIVE_LOCK(drive) (&(drive)->queue.queue_lock) extern spinlock_t ide_lock; #endif /* _IDE_H */ diff -urN linux-2.5.1-pre10/include/linux/mempool.h linux/include/linux/mempool.h --- linux-2.5.1-pre10/include/linux/mempool.h Wed Dec 12 23:32:27 2001 +++ linux/include/linux/mempool.h Wed Dec 12 23:32:30 2001 @@ -25,6 +25,7 @@ }; extern mempool_t * mempool_create(int min_nr, mempool_alloc_t *alloc_fn, mempool_free_t *free_fn, void *pool_data); +extern void mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask); extern void mempool_destroy(mempool_t *pool); extern void * mempool_alloc(mempool_t *pool, int gfp_mask); extern void mempool_free(void *element, mempool_t *pool); diff -urN linux-2.5.1-pre10/include/linux/nbd.h linux/include/linux/nbd.h --- linux-2.5.1-pre10/include/linux/nbd.h Wed Dec 12 23:32:27 2001 +++ linux/include/linux/nbd.h Wed Dec 12 23:32:30 2001 @@ -46,7 +46,7 @@ #ifdef PARANOIA requests_out++; #endif - spin_lock_irqsave(&q->queue_lock, flags); + spin_lock_irqsave(q->queue_lock, flags); while((bio = req->bio) != NULL) { nsect = bio_sectors(bio); blk_finished_io(nsect); @@ -55,7 +55,7 @@ bio_endio(bio, uptodate, nsect); } blkdev_release_request(req); - spin_unlock_irqrestore(&q->queue_lock, flags); + spin_unlock_irqrestore(q->queue_lock, flags); } #define MAX_NBD 128 diff -urN linux-2.5.1-pre10/include/linux/raid/md.h linux/include/linux/raid/md.h --- linux-2.5.1-pre10/include/linux/raid/md.h Thu Nov 22 11:48:07 2001 +++ linux/include/linux/raid/md.h Wed Dec 12 23:32:30 2001 @@ -37,8 +37,12 @@ #include #include #include +#include +#include +#include +#include +#include -#include /* * 'md_p.h' holds the 'physical' layout of RAID devices * 'md_u.h' holds the user <=> kernel API diff -urN linux-2.5.1-pre10/include/linux/raid/md_compatible.h linux/include/linux/raid/md_compatible.h --- linux-2.5.1-pre10/include/linux/raid/md_compatible.h Thu Nov 22 11:48:07 2001 +++ linux/include/linux/raid/md_compatible.h Wed Dec 31 16:00:00 1969 @@ -1,158 +0,0 @@ - -/* - md.h : Multiple Devices driver compatibility layer for Linux 2.0/2.2 - Copyright (C) 1998 Ingo Molnar - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2, or (at your option) - any later version. - - You should have received a copy of the GNU General Public License - (for example /usr/src/linux/COPYING); if not, write to the Free - Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
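The blkdev.h hunk above (queue_lock becomes a spinlock_t pointer, and blk_init_queue() grows a third argument carrying the lock) is what drives the mechanical &q->queue_lock to q->queue_lock edits in the SCSI and nbd hunks: the queue no longer embeds its own lock but borrows one from the driver, which lets SCSI hand every per-device queue the same host_lock. A minimal sketch of the new idiom, with mydev_lock and mydev_request_fn as illustrative names only:

static spinlock_t mydev_lock = SPIN_LOCK_UNLOCKED;

/* the request_fn is entered with *q->queue_lock (here: mydev_lock) held */
static void mydev_request_fn(request_queue_t *q)
{
	/* process requests; drop q->queue_lock around any blocking work */
}

static int mydev_init_queue(request_queue_t *q)
{
	/* the driver now supplies the lock the block layer will take */
	return blk_init_queue(q, mydev_request_fn, &mydev_lock);
}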
-*/ - -#include - -#ifndef _MD_COMPATIBLE_H -#define _MD_COMPATIBLE_H - -/** 2.3/2.4 stuff: **/ - -#include -#include -#include - -/* 000 */ -#define md__get_free_pages(x,y) __get_free_pages(x,y) - -#if defined(__i386__) || defined(__x86_64__) -/* 001 */ -static __inline__ int md_cpu_has_mmx(void) -{ - return test_bit(X86_FEATURE_MMX, &boot_cpu_data.x86_capability); -} -#else -#define md_cpu_has_mmx(x) (0) -#endif - -/* 002 */ -#define md_clear_page(page) clear_page(page) - -/* 003 */ -#define MD_EXPORT_SYMBOL(x) EXPORT_SYMBOL(x) - -/* 004 */ -#define md_copy_to_user(x,y,z) copy_to_user(x,y,z) - -/* 005 */ -#define md_copy_from_user(x,y,z) copy_from_user(x,y,z) - -/* 006 */ -#define md_put_user put_user - -/* 007 */ -static inline int md_capable_admin(void) -{ - return capable(CAP_SYS_ADMIN); -} - -/* 008 */ -#define MD_FILE_TO_INODE(file) ((file)->f_dentry->d_inode) - -/* 009 */ -static inline void md_flush_signals (void) -{ - spin_lock(¤t->sigmask_lock); - flush_signals(current); - spin_unlock(¤t->sigmask_lock); -} - -/* 010 */ -static inline void md_init_signals (void) -{ - current->exit_signal = SIGCHLD; - siginitsetinv(¤t->blocked, sigmask(SIGKILL)); -} - -/* 011 */ -#define md_signal_pending signal_pending - -/* 012 - md_set_global_readahead - nowhere used */ - -/* 013 */ -#define md_mdelay(x) mdelay(x) - -/* 014 */ -#define MD_SYS_DOWN SYS_DOWN -#define MD_SYS_HALT SYS_HALT -#define MD_SYS_POWER_OFF SYS_POWER_OFF - -/* 015 */ -#define md_register_reboot_notifier register_reboot_notifier - -/* 016 */ -#define md_test_and_set_bit test_and_set_bit - -/* 017 */ -#define md_test_and_clear_bit test_and_clear_bit - -/* 018 */ -#define md_atomic_read atomic_read -#define md_atomic_set atomic_set - -/* 019 */ -#define md_lock_kernel lock_kernel -#define md_unlock_kernel unlock_kernel - -/* 020 */ - -#include - -#define md__init __init -#define md__initdata __initdata -#define md__initfunc(__arginit) __initfunc(__arginit) - -/* 021 */ - - -/* 022 */ - -#define md_list_head list_head -#define MD_LIST_HEAD(name) LIST_HEAD(name) -#define MD_INIT_LIST_HEAD(ptr) INIT_LIST_HEAD(ptr) -#define md_list_add list_add -#define md_list_del list_del -#define md_list_empty list_empty - -#define md_list_entry(ptr, type, member) list_entry(ptr, type, member) - -/* 023 */ - -#define md_schedule_timeout schedule_timeout - -/* 024 */ -#define md_need_resched(tsk) ((tsk)->need_resched) - -/* 025 */ -#define md_spinlock_t spinlock_t -#define MD_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED - -#define md_spin_lock spin_lock -#define md_spin_unlock spin_unlock -#define md_spin_lock_irq spin_lock_irq -#define md_spin_unlock_irq spin_unlock_irq -#define md_spin_unlock_irqrestore spin_unlock_irqrestore -#define md_spin_lock_irqsave spin_lock_irqsave - -/* 026 */ -typedef wait_queue_head_t md_wait_queue_head_t; -#define MD_DECLARE_WAITQUEUE(w,t) DECLARE_WAITQUEUE((w),(t)) -#define MD_DECLARE_WAIT_QUEUE_HEAD(x) DECLARE_WAIT_QUEUE_HEAD(x) -#define md_init_waitqueue_head init_waitqueue_head - -/* END */ - -#endif - diff -urN linux-2.5.1-pre10/include/linux/raid/md_k.h linux/include/linux/raid/md_k.h --- linux-2.5.1-pre10/include/linux/raid/md_k.h Wed Dec 12 23:32:27 2001 +++ linux/include/linux/raid/md_k.h Wed Dec 12 23:32:30 2001 @@ -158,9 +158,9 @@ */ struct mdk_rdev_s { - struct md_list_head same_set; /* RAID devices within the same set */ - struct md_list_head all; /* all RAID devices */ - struct md_list_head pending; /* undetected RAID devices */ + struct list_head same_set; /* RAID devices within the same set */ + struct 
list_head all; /* all RAID devices */ + struct list_head pending; /* undetected RAID devices */ kdev_t dev; /* Device number */ kdev_t old_dev; /* "" when it was last imported */ @@ -197,7 +197,7 @@ int __minor; mdp_super_t *sb; int nb_dev; - struct md_list_head disks; + struct list_head disks; int sb_dirty; mdu_param_t param; int ro; @@ -212,9 +212,9 @@ atomic_t active; atomic_t recovery_active; /* blocks scheduled, but not written */ - md_wait_queue_head_t recovery_wait; + wait_queue_head_t recovery_wait; - struct md_list_head all_mddevs; + struct list_head all_mddevs; }; struct mdk_personality_s @@ -240,7 +240,7 @@ int (*stop_resync)(mddev_t *mddev); int (*restart_resync)(mddev_t *mddev); - int (*sync_request)(mddev_t *mddev, unsigned long block_nr); + int (*sync_request)(mddev_t *mddev, sector_t sector_nr); }; @@ -269,9 +269,9 @@ */ #define ITERATE_RDEV_GENERIC(head,field,rdev,tmp) \ \ - for (tmp = head.next; \ - rdev = md_list_entry(tmp, mdk_rdev_t, field), \ - tmp = tmp->next, tmp->prev != &head \ + for ((tmp) = (head).next; \ + (rdev) = (list_entry((tmp), mdk_rdev_t, field)), \ + (tmp) = (tmp)->next, (tmp)->prev != &(head) \ ; ) /* * iterates through the 'same array disks' ringlist @@ -305,7 +305,7 @@ #define ITERATE_MDDEV(mddev,tmp) \ \ for (tmp = all_mddevs.next; \ - mddev = md_list_entry(tmp, mddev_t, all_mddevs), \ + mddev = list_entry(tmp, mddev_t, all_mddevs), \ tmp = tmp->next, tmp->prev != &all_mddevs \ ; ) @@ -325,7 +325,7 @@ typedef struct mdk_thread_s { void (*run) (void *data); void *data; - md_wait_queue_head_t wqueue; + wait_queue_head_t wqueue; unsigned long flags; struct completion *event; struct task_struct *tsk; @@ -337,7 +337,7 @@ #define MAX_DISKNAME_LEN 64 typedef struct dev_name_s { - struct md_list_head list; + struct list_head list; kdev_t dev; char namebuf [MAX_DISKNAME_LEN]; char *name; diff -urN linux-2.5.1-pre10/include/linux/raid/raid1.h linux/include/linux/raid/raid1.h --- linux-2.5.1-pre10/include/linux/raid/raid1.h Sun Aug 12 12:39:02 2001 +++ linux/include/linux/raid/raid1.h Wed Dec 12 23:32:30 2001 @@ -3,6 +3,8 @@ #include +typedef struct mirror_info mirror_info_t; + struct mirror_info { int number; int raid_disk; @@ -20,34 +22,21 @@ int used_slot; }; -struct raid1_private_data { +typedef struct r1bio_s r1bio_t; + +struct r1_private_data_s { mddev_t *mddev; - struct mirror_info mirrors[MD_SB_DISKS]; + mirror_info_t mirrors[MD_SB_DISKS]; int nr_disks; int raid_disks; int working_disks; int last_used; - unsigned long next_sect; + sector_t next_sect; int sect_count; mdk_thread_t *thread, *resync_thread; int resync_mirrors; - struct mirror_info *spare; - md_spinlock_t device_lock; - - /* buffer pool */ - /* buffer_heads that we have pre-allocated have b_pprev -> &freebh - * and are linked into a stack using b_next - * raid1_bh that are pre-allocated have R1BH_PreAlloc set. 
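A small but real fix rides along in the md_k.h hunk above: ITERATE_RDEV_GENERIC() now parenthesizes every macro parameter where it is expanded. Without the parentheses, passing anything other than a plain identifier can misexpand under operator precedence; a classic illustration (a hypothetical SQUARE macro, not from this patch):

#define SQUARE_BAD(x)  x * x
#define SQUARE_GOOD(x) ((x) * (x))

/* SQUARE_BAD(a + 1)  expands to  a + 1 * a + 1        -- wrong */
/* SQUARE_GOOD(a + 1) expands to  ((a + 1) * (a + 1))  -- right */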
diff -urN linux-2.5.1-pre10/include/linux/raid/raid1.h linux/include/linux/raid/raid1.h --- linux-2.5.1-pre10/include/linux/raid/raid1.h Sun Aug 12 12:39:02 2001 +++ linux/include/linux/raid/raid1.h Wed Dec 12 23:32:30 2001 @@ -3,6 +3,8 @@ #include +typedef struct mirror_info mirror_info_t; + struct mirror_info { int number; int raid_disk; @@ -20,34 +22,21 @@ int used_slot; }; -struct raid1_private_data { +typedef struct r1bio_s r1bio_t; + +struct r1_private_data_s { mddev_t *mddev; - struct mirror_info mirrors[MD_SB_DISKS]; + mirror_info_t mirrors[MD_SB_DISKS]; int nr_disks; int raid_disks; int working_disks; int last_used; - unsigned long next_sect; + sector_t next_sect; int sect_count; mdk_thread_t *thread, *resync_thread; int resync_mirrors; - struct mirror_info *spare; - md_spinlock_t device_lock; - - /* buffer pool */ - /* buffer_heads that we have pre-allocated have b_pprev -> &freebh - * and are linked into a stack using b_next - * raid1_bh that are pre-allocated have R1BH_PreAlloc set. - * All these variable are protected by device_lock - */ - struct buffer_head *freebh; - int freebh_cnt; /* how many are on the list */ - int freebh_blocked; - struct raid1_bh *freer1; - int freer1_blocked; - int freer1_cnt; - struct raid1_bh *freebuf; /* each bh_req has a page allocated */ - md_wait_queue_head_t wait_buffer; + mirror_info_t *spare; + spinlock_t device_lock; /* for use when syncing mirrors: */ unsigned long start_active, start_ready, @@ -56,18 +45,21 @@ cnt_pending, cnt_future; int phase; int window; - md_wait_queue_head_t wait_done; - md_wait_queue_head_t wait_ready; - md_spinlock_t segment_lock; + wait_queue_head_t wait_done; + wait_queue_head_t wait_ready; + spinlock_t segment_lock; + + mempool_t *r1bio_pool; + mempool_t *r1buf_pool; }; -typedef struct raid1_private_data raid1_conf_t; +typedef struct r1_private_data_s conf_t; /* * this is the only point in the RAID code where we violate * C type safety. mddev->private is an 'opaque' pointer. */ -#define mddev_to_conf(mddev) ((raid1_conf_t *) mddev->private) +#define mddev_to_conf(mddev) ((conf_t *) mddev->private) /* * this is our 'private' 'collective' RAID1 buffer head. @@ -75,20 +67,32 @@ * for this RAID1 operation, and about their status: */ -struct raid1_bh { +struct r1bio_s { atomic_t remaining; /* 'have we finished' count, * used from IRQ handlers */ int cmd; + sector_t sector; unsigned long state; mddev_t *mddev; - struct buffer_head *master_bh; - struct buffer_head *mirror_bh_list; - struct buffer_head bh_req; - struct raid1_bh *next_r1; /* next for retry or in free list */ + /* + * original bio going to /dev/mdx + */ + struct bio *master_bio; + /* + * if the IO is in READ direction, then this bio is used: + */ + struct bio *read_bio; + /* + * if the IO is in WRITE direction, then multiple bios are used: + */ + struct bio *write_bios[MD_SB_DISKS]; + + r1bio_t *next_r1; /* next for retry or in free list */ + struct list_head retry_list; }; -/* bits for raid1_bh.state */ -#define R1BH_Uptodate 1 -#define R1BH_SyncPhase 2 -#define R1BH_PreAlloc 3 /* this was pre-allocated, add to free list */ + +/* bits for r1bio.state */ +#define R1BIO_Uptodate 1 +#define R1BIO_SyncPhase 2 #endif diff -urN linux-2.5.1-pre10/init/do_mounts.c linux/init/do_mounts.c --- linux-2.5.1-pre10/init/do_mounts.c Wed Dec 12 23:32:27 2001 +++ linux/init/do_mounts.c Wed Dec 12 23:32:30 2001 @@ -14,37 +14,44 @@ #include #include #include +#include +#include +#include #include -/* syscalls missing from unistd.h */ - -static inline _syscall2(int,mkdir,char *,name,int,mode); -static inline _syscall1(int,chdir,char *,name); -static inline _syscall1(int,chroot,char *,name); -static inline _syscall1(int,unlink,char *,name); -static inline _syscall3(int,mknod,char *,name,int,mode,dev_t,dev); -static inline _syscall5(int,mount,char *,dev,char *,dir,char *,type, - unsigned long,flags,void *,data); -static inline _syscall2(int,umount,char *,name,int,flags); +#define BUILD_CRAMDISK -extern void rd_load(void); -extern void initrd_load(void); extern int get_filesystem_list(char * buf); extern void wait_for_keypress(void); -asmlinkage long sys_mount(char * dev_name, char * dir_name, char * type, - unsigned long flags, void * data); +asmlinkage long sys_mount(char *dev_name, char *dir_name, char *type, + unsigned long flags, void *data); +asmlinkage long sys_mkdir(char *name, int mode); +asmlinkage long sys_chdir(char *name); +asmlinkage long sys_chroot(char *name); +asmlinkage long sys_unlink(char *name); +asmlinkage long sys_symlink(char *old, char *new);
+asmlinkage long sys_mknod(char *name, int mode, dev_t dev); +asmlinkage long sys_umount(char *name, int flags); +asmlinkage long sys_ioctl(int fd, int cmd, unsigned long arg); #ifdef CONFIG_BLK_DEV_INITRD unsigned int real_root_dev; /* do_proc_dointvec cannot handle kdev_t */ #endif -int root_mountflags = MS_RDONLY; -char root_device_name[64]; +#ifdef CONFIG_BLK_DEV_RAM +extern int rd_doload; +#else +static int rd_doload = 0; +#endif +int root_mountflags = MS_RDONLY | MS_VERBOSE; +static char root_device_name[64]; /* this is initialized in init/main.c */ kdev_t ROOT_DEV; +static int do_devfs = 0; + static int __init readonly(char *str) { if (*str) @@ -275,91 +282,20 @@ } *s = '\0'; } - -static void __init mount_root(void) +static void __init mount_block_root(char *name, int flags) { - void *handle; - char path[64]; - char *name = "/dev/root"; - char *fs_names, *p; - int do_devfs = 0; - - root_mountflags |= MS_VERBOSE; + char *fs_names = __getname(); + char *p; - fs_names = __getname(); get_fs_names(fs_names); - -#ifdef CONFIG_ROOT_NFS - if (MAJOR(ROOT_DEV) == UNNAMED_MAJOR) { - void *data; - data = nfs_root_data(); - if (data) { - int err = mount("/dev/root", "/root", "nfs", root_mountflags, data); - if (!err) - goto done; - } - printk(KERN_ERR "VFS: Unable to mount root fs via NFS, trying floppy.\n"); - ROOT_DEV = MKDEV(FLOPPY_MAJOR, 0); - } -#endif - -#ifdef CONFIG_BLK_DEV_FD - if (MAJOR(ROOT_DEV) == FLOPPY_MAJOR) { -#ifdef CONFIG_BLK_DEV_RAM - extern int rd_doload; - extern void rd_load_secondary(void); -#endif - floppy_eject(); -#ifndef CONFIG_BLK_DEV_RAM - printk(KERN_NOTICE "(Warning, this kernel has no ramdisk support)\n"); -#else - /* rd_doload is 2 for a dual initrd/ramload setup */ - if(rd_doload==2) - rd_load_secondary(); - else -#endif - { - printk(KERN_NOTICE "VFS: Insert root floppy and press ENTER\n"); - wait_for_keypress(); - } - } -#endif - - devfs_make_root (root_device_name); - handle = devfs_find_handle (NULL, ROOT_DEVICE_NAME, - MAJOR (ROOT_DEV), MINOR (ROOT_DEV), - DEVFS_SPECIAL_BLK, 1); - if (handle) { - int n; - unsigned major, minor; - - devfs_get_maj_min (handle, &major, &minor); - ROOT_DEV = MKDEV (major, minor); - if (!ROOT_DEV) - panic("I have no root and I want to scream"); - n = devfs_generate_path (handle, path + 5, sizeof (path) - 5); - if (n >= 0) { - name = path + n; - devfs_mk_symlink (NULL, "root", DEVFS_FL_DEFAULT, - name + 5, NULL, NULL); - memcpy (name, "/dev/", 5); - do_devfs = 1; - } - } - chdir("/dev"); - unlink("root"); - mknod("root", S_IFBLK|0600, kdev_t_to_nr(ROOT_DEV)); - if (do_devfs) - mount("devfs", ".", "devfs", 0, NULL); retry: for (p = fs_names; *p; p += strlen(p)+1) { - int err; - err = sys_mount(name,"/root",p,root_mountflags,root_mount_data); + int err = sys_mount(name, "/root", p, flags, root_mount_data); switch (err) { case 0: - goto done; + goto out; case -EACCES: - root_mountflags |= MS_RDONLY; + flags |= MS_RDONLY; goto retry; case -EINVAL: continue; @@ -375,94 +311,324 @@ kdevname(ROOT_DEV)); } panic("VFS: Unable to mount root fs on %s", kdevname(ROOT_DEV)); - -done: +out: putname(fs_names); - if (do_devfs) - umount(".", 0); + sys_chdir("/root"); + ROOT_DEV = current->fs->pwdmnt->mnt_sb->s_dev; + printk("VFS: Mounted root (%s filesystem)%s.\n", + current->fs->pwdmnt->mnt_sb->s_type->name, + (current->fs->pwdmnt->mnt_sb->s_flags & MS_RDONLY) ? 
" readonly" : ""); } + +#ifdef CONFIG_ROOT_NFS +static int __init mount_nfs_root(void) +{ + void *data = nfs_root_data(); -#ifdef CONFIG_BLK_DEV_INITRD + if (data && sys_mount("/dev/root","/root","nfs",root_mountflags,data) == 0) + return 1; + return 0; +} +#endif + +static int __init create_dev(char *name, kdev_t dev, char *devfs_name) +{ + void *handle; + char path[64]; + int n; + + sys_unlink(name); + if (!do_devfs) + return sys_mknod(name, S_IFBLK|0600, kdev_t_to_nr(dev)); + + handle = devfs_find_handle(NULL, dev ? NULL : devfs_name, + MAJOR(dev), MINOR(dev), DEVFS_SPECIAL_BLK, 1); + if (!handle) + return -1; + n = devfs_generate_path(handle, path + 5, sizeof (path) - 5); + if (n < 0) + return -1; + return sys_symlink(path + n + 5, name); +} + +#ifdef CONFIG_MAC_FLOPPY +int swim3_fd_eject(int devnum); +#endif +static void __init change_floppy(char *fmt, ...) +{ + extern void wait_for_keypress(void); + char buf[80]; + va_list args; + va_start(args, fmt); + vsprintf(buf, fmt, args); + va_end(args); +#ifdef CONFIG_BLK_DEV_FD + floppy_eject(); +#endif +#ifdef CONFIG_MAC_FLOPPY + swim3_fd_eject(MINOR(ROOT_DEV)); +#endif + printk(KERN_NOTICE "VFS: Insert %s and press ENTER\n", buf); + wait_for_keypress(); +} -static int __init change_root(kdev_t new_root_dev,const char *put_old) +#ifdef CONFIG_BLK_DEV_RAM + +static int __init crd_load(int in_fd, int out_fd); + +/* + * This routine tries to find a RAM disk image to load, and returns the + * number of blocks to read for a non-compressed image, 0 if the image + * is a compressed image, and -1 if an image with the right magic + * numbers could not be found. + * + * We currently check for the following magic numbers: + * minix + * ext2 + * romfs + * gzip + */ +static int __init +identify_ramdisk_image(int fd, int start_block) { - struct vfsmount *old_rootmnt; - struct nameidata devfs_nd; - char *new_devname = kmalloc(strlen("/dev/root.old")+1, GFP_KERNEL); - int error = 0; - - if (new_devname) - strcpy(new_devname, "/dev/root.old"); - - /* .. 
here is directory mounted over root */ - mount("..", ".", NULL, MS_MOVE, NULL); - chdir("/old"); - - read_lock(&current->fs->lock); - old_rootmnt = mntget(current->fs->pwdmnt); - read_unlock(&current->fs->lock); - - /* First unmount devfs if mounted */ - if (path_init("/old/dev", LOOKUP_FOLLOW|LOOKUP_POSITIVE, &devfs_nd)) - error = path_walk("/old/dev", &devfs_nd); - if (!error) { - if (devfs_nd.mnt->mnt_sb->s_magic == DEVFS_SUPER_MAGIC && - devfs_nd.dentry == devfs_nd.mnt->mnt_root) - umount("/old/dev", 0); - path_release(&devfs_nd); + const int size = 512; + struct minix_super_block *minixsb; + struct ext2_super_block *ext2sb; + struct romfs_super_block *romfsb; + int nblocks = -1; + unsigned char *buf; + + buf = kmalloc(size, GFP_KERNEL); + if (buf == 0) + return -1; + + minixsb = (struct minix_super_block *) buf; + ext2sb = (struct ext2_super_block *) buf; + romfsb = (struct romfs_super_block *) buf; + memset(buf, 0xe5, size); + + /* + * Read block 0 to test for gzipped kernel + */ + lseek(fd, start_block * BLOCK_SIZE, 0); + read(fd, buf, size); + + /* + * If it matches the gzip magic numbers, return -1 + */ + if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) { + printk(KERN_NOTICE + "RAMDISK: Compressed image found at block %d\n", + start_block); + nblocks = 0; + goto done; } - ROOT_DEV = new_root_dev; - mount_root(); + /* romfs is at block zero too */ + if (romfsb->word0 == ROMSB_WORD0 && + romfsb->word1 == ROMSB_WORD1) { + printk(KERN_NOTICE + "RAMDISK: romfs filesystem found at block %d\n", + start_block); + nblocks = (ntohl(romfsb->size)+BLOCK_SIZE-1)>>BLOCK_SIZE_BITS; + goto done; + } - chdir("/root"); - ROOT_DEV = current->fs->pwdmnt->mnt_sb->s_dev; - printk("VFS: Mounted root (%s filesystem)%s.\n", - current->fs->pwdmnt->mnt_sb->s_type->name, - (current->fs->pwdmnt->mnt_sb->s_flags & MS_RDONLY) ? " readonly" : ""); + /* + * Read block 1 to test for minix and ext2 superblock + */ + lseek(fd, (start_block+1) * BLOCK_SIZE, 0); + read(fd, buf, size); + + /* Try minix */ + if (minixsb->s_magic == MINIX_SUPER_MAGIC || + minixsb->s_magic == MINIX_SUPER_MAGIC2) { + printk(KERN_NOTICE + "RAMDISK: Minix filesystem found at block %d\n", + start_block); + nblocks = minixsb->s_nzones << minixsb->s_log_zone_size; + goto done; + } -#if 1 - shrink_dcache(); - printk("change_root: old root has d_count=%d\n", - atomic_read(&old_rootmnt->mnt_root->d_count)); -#endif - - error = mount("/old", "/root/initrd", NULL, MS_MOVE, NULL); - if (error) { - int blivet; - struct block_device *ramdisk = old_rootmnt->mnt_sb->s_bdev; - - atomic_inc(&ramdisk->bd_count); - blivet = blkdev_get(ramdisk, FMODE_READ, 0, BDEV_FS); - printk(KERN_NOTICE "Trying to unmount old root ... 
"); - umount("/old", MNT_DETACH); - if (!blivet) { - blivet = ioctl_by_bdev(ramdisk, BLKFLSBUF, 0); - blkdev_put(ramdisk, BDEV_FS); - } - if (blivet) { - printk(KERN_ERR "error %d\n", blivet); - } else { - printk("okay\n"); - error = 0; + /* Try ext2 */ + if (ext2sb->s_magic == cpu_to_le16(EXT2_SUPER_MAGIC)) { + printk(KERN_NOTICE + "RAMDISK: ext2 filesystem found at block %d\n", + start_block); + nblocks = le32_to_cpu(ext2sb->s_blocks_count); + goto done; + } + + printk(KERN_NOTICE + "RAMDISK: Couldn't find valid RAM disk image starting at %d.\n", + start_block); + +done: + lseek(fd, start_block * BLOCK_SIZE, 0); + kfree(buf); + return nblocks; +} +#endif + +static int __init rd_load_image(char *from) +{ + int res = 0; + +#ifdef CONFIG_BLK_DEV_RAM + int in_fd, out_fd; + int nblocks, rd_blocks, devblocks, i; + char *buf; + unsigned short rotate = 0; +#if !defined(CONFIG_ARCH_S390) && !defined(CONFIG_PPC_ISERIES) + char rotator[4] = { '|' , '/' , '-' , '\\' }; +#endif + + out_fd = open("/dev/ram", O_RDWR, 0); + if (out_fd < 0) + goto out; + + in_fd = open(from, O_RDONLY, 0); + if (in_fd < 0) + goto noclose_input; + + nblocks = identify_ramdisk_image(in_fd, rd_image_start); + if (nblocks < 0) + goto done; + + if (nblocks == 0) { +#ifdef BUILD_CRAMDISK + if (crd_load(in_fd, out_fd) == 0) + goto successful_load; +#else + printk(KERN_NOTICE + "RAMDISK: Kernel does not support compressed " + "RAM disk images\n"); +#endif + goto done; + } + + /* + * NOTE NOTE: nblocks suppose that the blocksize is BLOCK_SIZE, so + * rd_load_image will work only with filesystem BLOCK_SIZE wide! + * So make sure to use 1k blocksize while generating ext2fs + * ramdisk-images. + */ + if (sys_ioctl(out_fd, BLKGETSIZE, (unsigned long)&rd_blocks) < 0) + rd_blocks = 0; + else + rd_blocks >>= 1; + + if (nblocks > rd_blocks) { + printk("RAMDISK: image too big! (%d/%d blocks)\n", + nblocks, rd_blocks); + goto done; + } + + /* + * OK, time to copy in the data + */ + buf = kmalloc(BLOCK_SIZE, GFP_KERNEL); + if (buf == 0) { + printk(KERN_ERR "RAMDISK: could not allocate buffer\n"); + goto done; + } + + if (sys_ioctl(in_fd, BLKGETSIZE, (unsigned long)&devblocks) < 0) + devblocks = 0; + else + devblocks >>= 1; + + if (strcmp(from, "/dev/initrd") == 0) + devblocks = nblocks; + + if (devblocks == 0) { + printk(KERN_ERR "RAMDISK: could not determine device size\n"); + goto done; + } + + printk(KERN_NOTICE "RAMDISK: Loading %d blocks [%d disk%s] into ram disk... ", + nblocks, ((nblocks-1)/devblocks)+1, nblocks>devblocks ? "s" : ""); + for (i=0; i < nblocks; i++) { + if (i && (i % devblocks == 0)) { + printk("done disk #%d.\n", i/devblocks); + rotate = 0; + if (close(in_fd)) { + printk("Error closing the disk.\n"); + goto noclose_input; + } + change_floppy("disk #%d", i/devblocks+1); + in_fd = open(from, O_RDONLY, 0); + if (in_fd < 0) { + printk("Error opening disk.\n"); + goto noclose_input; + } + printk("Loading disk #%d... 
", i/devblocks+1); } - } else { - spin_lock(&dcache_lock); - if (new_devname) { - void *p = old_rootmnt->mnt_devname; - old_rootmnt->mnt_devname = new_devname; - new_devname = p; + read(in_fd, buf, BLOCK_SIZE); + write(out_fd, buf, BLOCK_SIZE); +#if !defined(CONFIG_ARCH_S390) && !defined(CONFIG_PPC_ISERIES) + if (!(i % 16)) { + printk("%c\b", rotator[rotate & 0x3]); + rotate++; } - spin_unlock(&dcache_lock); +#endif } + printk("done.\n"); + kfree(buf); - /* put the old stuff */ - mntput(old_rootmnt); - kfree(new_devname); - return error; +successful_load: + res = 1; +done: + close(in_fd); +noclose_input: + close(out_fd); +out: + sys_unlink("/dev/ram"); +#endif + return res; } +static int __init rd_load_disk(int n) +{ +#ifdef CONFIG_BLK_DEV_RAM + extern int rd_prompt; + if (rd_prompt) + change_floppy("root floppy disk to be loaded into RAM disk"); + create_dev("/dev/ram", MKDEV(RAMDISK_MAJOR, n), NULL); #endif + return rd_load_image("/dev/root"); +} + +static void __init mount_root(void) +{ +#ifdef CONFIG_ROOT_NFS + if (MAJOR(ROOT_DEV) == UNNAMED_MAJOR) { + if (mount_nfs_root()) { + sys_chdir("/root"); + ROOT_DEV = current->fs->pwdmnt->mnt_sb->s_dev; + printk("VFS: Mounted root (nfs filesystem).\n"); + return; + } + printk(KERN_ERR "VFS: Unable to mount root fs via NFS, trying floppy.\n"); + ROOT_DEV = MKDEV(FLOPPY_MAJOR, 0); + } +#endif + devfs_make_root(root_device_name); + create_dev("/dev/root", ROOT_DEV, root_device_name); +#ifdef CONFIG_BLK_DEV_FD + if (MAJOR(ROOT_DEV) == FLOPPY_MAJOR) { + /* rd_doload is 2 for a dual initrd/ramload setup */ + if (rd_doload==2) { + if (rd_load_disk(1)) { + ROOT_DEV = MKDEV(RAMDISK_MAJOR, 1); + create_dev("/dev/root", ROOT_DEV, NULL); + } + } else + change_floppy("root floppy"); + } +#endif + mount_block_root("/dev/root", root_mountflags); +} #ifdef CONFIG_BLK_DEV_INITRD static int do_linuxrc(void * shell) @@ -470,9 +636,9 @@ static char *argv[] = { "linuxrc", NULL, }; extern char * envp_init[]; - chdir("/root"); - mount(".", "/", NULL, MS_MOVE, NULL); - chroot("."); + sys_chdir("/root"); + sys_mount(".", "/", NULL, MS_MOVE, NULL); + sys_chroot("."); mount_devfs_fs (); @@ -486,76 +652,247 @@ #endif +static void __init handle_initrd(void) +{ +#ifdef CONFIG_BLK_DEV_INITRD + int ram0 = kdev_t_to_nr(MKDEV(RAMDISK_MAJOR,0)); + int error; + int i, pid; + + create_dev("/dev/root.old", ram0, NULL); + mount_block_root("/dev/root.old", root_mountflags & ~MS_RDONLY); + sys_mkdir("/old", 0700); + sys_chdir("/old"); + + pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD); + if (pid > 0) { + while (pid != wait(&i)) { + current->policy |= SCHED_YIELD; + schedule(); + } + } + + sys_mount("..", ".", NULL, MS_MOVE, NULL); + sys_umount("/old/dev", 0); + + if (real_root_dev == ram0) { + sys_chdir("/old"); + return; + } + + ROOT_DEV = real_root_dev; + mount_root(); + + printk(KERN_NOTICE "Trying to move old root to /initrd ... "); + error = sys_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL); + if (!error) + printk("okay\n"); + else { + int fd = open("/dev/root.old", O_RDWR, 0); + printk("failed\n"); + printk(KERN_NOTICE "Unmounting old root\n"); + sys_umount("/old", MNT_DETACH); + printk(KERN_NOTICE "Trying to free ramdisk memory ... "); + if (fd < 0) { + error = fd; + } else { + error = sys_ioctl(fd, BLKFLSBUF, 0); + close(fd); + } + printk(error ? 
"okay\n" : "failed\n"); + } +#endif +} + +static int __init initrd_load(void) +{ +#ifdef CONFIG_BLK_DEV_INITRD + create_dev("/dev/ram", MKDEV(RAMDISK_MAJOR, 0), NULL); + create_dev("/dev/initrd", MKDEV(RAMDISK_MAJOR, INITRD_MINOR), NULL); +#endif + return rd_load_image("/dev/initrd"); +} + /* * Prepare the namespace - decide what/where to mount, load ramdisks, etc. */ void prepare_namespace(void) { + int do_initrd = 0; + int is_floppy = MAJOR(ROOT_DEV) == FLOPPY_MAJOR; #ifdef CONFIG_BLK_DEV_INITRD - int real_root_mountflags = root_mountflags; if (!initrd_start) mount_initrd = 0; if (mount_initrd) - root_mountflags &= ~MS_RDONLY; + do_initrd = 1; real_root_dev = ROOT_DEV; #endif - mkdir("/dev", 0700); - mkdir("/root", 0700); - -#ifdef CONFIG_BLK_DEV_RAM -#ifdef CONFIG_BLK_DEV_INITRD - if (mount_initrd) - initrd_load(); - else -#endif - rd_load(); + sys_mkdir("/dev", 0700); + sys_mkdir("/root", 0700); +#ifdef CONFIG_DEVFS_FS + sys_mount("devfs", "/dev", "devfs", 0, NULL); + do_devfs = 1; #endif - /* Mount the root filesystem.. */ + create_dev("/dev/root", ROOT_DEV, NULL); + if (do_initrd) { + if (initrd_load() && ROOT_DEV != MKDEV(RAMDISK_MAJOR, 0)) { + handle_initrd(); + goto out; + } + } else if (is_floppy && rd_doload && rd_load_disk(0)) + ROOT_DEV = MKDEV(RAMDISK_MAJOR, 0); mount_root(); - chdir("/root"); - ROOT_DEV = current->fs->pwdmnt->mnt_sb->s_dev; - printk("VFS: Mounted root (%s filesystem)%s.\n", - current->fs->pwdmnt->mnt_sb->s_type->name, - (current->fs->pwdmnt->mnt_sb->s_flags & MS_RDONLY) ? " readonly" : ""); +out: + sys_umount("/dev", 0); + sys_mount(".", "/", NULL, MS_MOVE, NULL); + sys_chroot("."); + mount_devfs_fs (); +} -#ifdef CONFIG_BLK_DEV_INITRD - root_mountflags = real_root_mountflags; - if (mount_initrd && ROOT_DEV != real_root_dev - && MAJOR(ROOT_DEV) == RAMDISK_MAJOR && MINOR(ROOT_DEV) == 0) { - int error; - int i, pid; - mkdir("/old", 0700); - chdir("/old"); - - pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD); - if (pid > 0) { - while (pid != wait(&i)) { - current->policy |= SCHED_YIELD; - schedule(); - } - } - if (MAJOR(real_root_dev) != RAMDISK_MAJOR - || MINOR(real_root_dev) != 0) { - error = change_root(real_root_dev,"/initrd"); - if (error) - printk(KERN_ERR "Change root to /initrd: " - "error %d\n",error); - - chdir("/root"); - mount(".", "/", NULL, MS_MOVE, NULL); - chroot("."); +#ifdef BUILD_CRAMDISK - mount_devfs_fs (); - return; - } - chroot(".."); - chdir("/"); - return; - } +/* + * gzip declarations + */ + +#define OF(args) args + +#ifndef memzero +#define memzero(s, n) memset ((s), 0, (n)) #endif - mount(".", "/", NULL, MS_MOVE, NULL); - chroot("."); - mount_devfs_fs (); +typedef unsigned char uch; +typedef unsigned short ush; +typedef unsigned long ulg; + +#define INBUFSIZ 4096 +#define WSIZE 0x8000 /* window size--must be a power of two, and */ + /* at least 32K for zip's deflate method */ + +static uch *inbuf; +static uch *window; + +static unsigned insize; /* valid bytes in inbuf */ +static unsigned inptr; /* index of next byte to be processed in inbuf */ +static unsigned outcnt; /* bytes in output buffer */ +static int exit_code; +static long bytes_out; +static int crd_infd, crd_outfd; + +#define get_byte() (inptr < insize ? 
inbuf[inptr++] : fill_inbuf()) + +/* Diagnostic functions (stubbed out) */ +#define Assert(cond,msg) +#define Trace(x) +#define Tracev(x) +#define Tracevv(x) +#define Tracec(c,x) +#define Tracecv(c,x) + +#define STATIC static + +static int fill_inbuf(void); +static void flush_window(void); +static void *malloc(int size); +static void free(void *where); +static void error(char *m); +static void gzip_mark(void **); +static void gzip_release(void **); + +#include "../lib/inflate.c" + +static void __init *malloc(int size) +{ + return kmalloc(size, GFP_KERNEL); +} + +static void __init free(void *where) +{ + kfree(where); +} + +static void __init gzip_mark(void **ptr) +{ +} + +static void __init gzip_release(void **ptr) +{ +} + + +/* =========================================================================== + * Fill the input buffer. This is called only when the buffer is empty + * and at least one byte is really needed. + */ +static int __init fill_inbuf(void) +{ + if (exit_code) return -1; + + insize = read(crd_infd, inbuf, INBUFSIZ); + if (insize == 0) return -1; + + inptr = 1; + + return inbuf[0]; +} + +/* =========================================================================== + * Write the output window window[0..outcnt-1] and update crc and bytes_out. + * (Used for the decompressed data only.) + */ +static void __init flush_window(void) +{ + ulg c = crc; /* temporary variable */ + unsigned n; + uch *in, ch; + + write(crd_outfd, window, outcnt); + in = window; + for (n = 0; n < outcnt; n++) { + ch = *in++; + c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8); + } + crc = c; + bytes_out += (ulg)outcnt; + outcnt = 0; +} + +static void __init error(char *x) +{ + printk(KERN_ERR "%s", x); + exit_code = 1; +} + +static int __init crd_load(int in_fd, int out_fd) +{ + int result; + + insize = 0; /* valid bytes in inbuf */ + inptr = 0; /* index of next byte to be processed in inbuf */ + outcnt = 0; /* bytes in output buffer */ + exit_code = 0; + bytes_out = 0; + crc = (ulg)0xffffffffL; /* shift register contents */ + + crd_infd = in_fd; + crd_outfd = out_fd; + inbuf = kmalloc(INBUFSIZ, GFP_KERNEL); + if (inbuf == 0) { + printk(KERN_ERR "RAMDISK: Couldn't allocate gzip buffer\n"); + return -1; + } + window = kmalloc(WSIZE, GFP_KERNEL); + if (window == 0) { + printk(KERN_ERR "RAMDISK: Couldn't allocate gzip window\n"); + kfree(inbuf); + return -1; + } + makecrc(); + result = gunzip(); + kfree(inbuf); + kfree(window); + return result; } + +#endif /* BUILD_CRAMDISK */ diff -urN linux-2.5.1-pre10/mm/memory.c linux/mm/memory.c --- linux-2.5.1-pre10/mm/memory.c Thu Nov 15 10:03:06 2001 +++ linux/mm/memory.c Wed Dec 12 23:32:30 2001 @@ -1221,8 +1221,10 @@ */ if (write_access && !(vma->vm_flags & VM_SHARED)) { struct page * page = alloc_page(GFP_HIGHUSER); - if (!page) + if (!page) { + page_cache_release(new_page); return -1; + } copy_highpage(page, new_page); page_cache_release(new_page); lru_cache_add(page); diff -urN linux-2.5.1-pre10/mm/mempool.c linux/mm/mempool.c --- linux-2.5.1-pre10/mm/mempool.c Wed Dec 12 23:32:27 2001 +++ linux/mm/mempool.c Wed Dec 12 23:32:30 2001 @@ -1,9 +1,9 @@ /* * linux/mm/mempool.c * - * memory buffer pool support. Such pools are mostly used to - * guarantee deadlock-free IO operations even during extreme - * VM load. + * memory buffer pool support. Such pools are mostly used + * for guaranteed, deadlock-free memory allocations during + * extreme VM load. 
* * started by Ingo Molnar, Copyright (C) 2001 */ @@ -75,6 +75,71 @@ } /** + * mempool_resize - resize an existing memory pool + * @pool: pointer to the memory pool which was allocated via + * mempool_create(). + * @new_min_nr: the new minimum number of elements guaranteed to be + * allocated for this pool. + * @gfp_mask: the usual allocation bitmask. + * + * This function shrinks/grows the pool. In the case of growing, + * it cannot be guaranteed that the pool will be grown to the new + * size immediately, but new mempool_free() calls will refill it. + * + * Note, the caller must guarantee that no mempool_destroy is called + * while this function is running. mempool_alloc() & mempool_free() + * might be called (e.g. from IRQ contexts) while this function executes. + */ +void mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask) +{ + int delta; + void *element; + unsigned long flags; + struct list_head *tmp; + + if (new_min_nr <= 0) + BUG(); + + spin_lock_irqsave(&pool->lock, flags); + if (new_min_nr < pool->min_nr) { + pool->min_nr = new_min_nr; + /* + * Free possible excess elements. + */ + while (pool->curr_nr > pool->min_nr) { + tmp = pool->elements.next; + if (tmp == &pool->elements) + BUG(); + list_del(tmp); + element = tmp; + pool->curr_nr--; + spin_unlock_irqrestore(&pool->lock, flags); + + pool->free(element, pool->pool_data); + + spin_lock_irqsave(&pool->lock, flags); + } + spin_unlock_irqrestore(&pool->lock, flags); + return; + } + delta = new_min_nr - pool->min_nr; + pool->min_nr = new_min_nr; + spin_unlock_irqrestore(&pool->lock, flags); + + /* + * We refill the pool up to the new threshold - but we don't + * (cannot) guarantee that the refill succeeds. + */ + while (delta) { + element = pool->alloc(gfp_mask, pool->pool_data); + if (!element) + break; + mempool_free(element, pool); + delta--; + } +} + +/** * mempool_destroy - deallocate a memory pool * @pool: pointer to the memory pool which was allocated via * mempool_create(). @@ -110,7 +175,7 @@ * @gfp_mask: the usual allocation bitmask. * * this function only sleeps if the alloc_fn function sleeps or - * returns NULL. Note that due to preallocation guarantees this function + * returns NULL. Note that due to preallocation, this function + * *never* fails. */ void * mempool_alloc(mempool_t *pool, int gfp_mask) @@ -175,7 +240,7 @@ /** * mempool_free - return an element to the pool. - * @gfp_mask: pool element pointer. + * @element: pool element pointer. * @pool: pointer to the memory pool which was allocated via * mempool_create(). * @@ -200,6 +265,7 @@ } EXPORT_SYMBOL(mempool_create); +EXPORT_SYMBOL(mempool_resize); EXPORT_SYMBOL(mempool_destroy); EXPORT_SYMBOL(mempool_alloc); EXPORT_SYMBOL(mempool_free);
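
The mempool_resize() addition above rounds out the mempool API whose calling conventions are visible in these hunks: pool->alloc(gfp_mask, pool->pool_data) and pool->free(element, pool->pool_data) for the callbacks, and mempool_alloc()/mempool_free() for clients such as the reworked RAID1 code with its r1bio_pool and r1buf_pool. A minimal sketch of the usage pattern follows; the kmalloc-backed callbacks, the element size, the pool depth and every demo_* name are assumptions made for illustration, and the mempool_create() argument order is inferred from these hunks rather than quoted from the patch:

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/mempool.h>

#define DEMO_POOL_MIN	64	/* assumed reserve depth */
#define DEMO_ELEM_SIZE	256	/* assumed element size, smuggled in via pool_data */

/* signatures match how mempool.c invokes pool->alloc()/pool->free() */
static void *demo_alloc(int gfp_mask, void *pool_data)
{
	return kmalloc((size_t) pool_data, gfp_mask);
}

static void demo_free(void *element, void *pool_data)
{
	kfree(element);
}

static mempool_t *demo_pool;

static int demo_pool_setup(void)
{
	demo_pool = mempool_create(DEMO_POOL_MIN, demo_alloc, demo_free,
				   (void *) DEMO_ELEM_SIZE);
	return demo_pool ? 0 : -ENOMEM;
}

static void demo_pool_use(void)
{
	/* never fails; may sleep until another user returns an element */
	void *element = mempool_alloc(demo_pool, GFP_KERNEL);

	/* ... a guaranteed-progress I/O path would use the element here ... */

	mempool_free(element, demo_pool);

	/* shrinking frees the excess at once; growing is best-effort and
	 * is topped up by later mempool_free() calls, as the kerneldoc says */
	mempool_resize(demo_pool, 2 * DEMO_POOL_MIN, GFP_KERNEL);
}
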