Changelog:
o Big ACPI update (Paul Diefenbaugh)
o Added support for HP's zx1 McKinley platform (Alex Williamson,
  Bjorn Helgaas, et al.)
o Fix GENERIC build (Bjorn Helgaas)
o Fix GCC version-detection for cross-compilation (Gary Hade)
o Make loopback device and ram disk support available for HP Ski simulator
(Peter Chubb)
o Correct IA-32 Locked Data reference fault from 3 to 4 (NOMURA, Jun'ichi)
o Update efivars from v0.04 to v0.05 (Matt Domsch)
o Tune stacked-register clearing loop for McKinley.
o Make __ia64_init_fpu() both smaller and faster.
o Tweak memset() to call __bzero() if we can determine at compile time that
the value is zero.
o Update perfmon to latest version (Stephane Eranian).
o Fix ia64_iobase initialization on APs (Bjorn Helgaas), clean up code to use
ioremap() (me).
o Fix VGA legacy initialization (Bjorn Helgaas, Alex Williamson).
o Fix bug which prevented mmap() at the very end of the page-table-mapped space.
Bug reported by Peter A. Buhr.
o Consolidate exception handling with SEARCH_EXCEPTION_TABLE() macro (Keith Owens).
o McKinley-tuned clear_page() (Ken Chen)
o McKinley-tuned copy_page().
o Fix software I/O TLB to return <4GB addresses for coherent buffers (reported
  by Dave Miller).
o AGP/DRM cleanup (Bjorn Helgaas)
o Make PC keyboard driver gracefully handle the case where no legacy keyboard
  exists (Alex Williamson, I think)
o EFI update and clean up (Matt Domsch)
o Don't fail on kernel modules with no unwind data (Andreas Schwab)
o Add back flush_icache_page() for easier compatibility with vanilla 2.4 kernel.
o Drop include/linux/crc32.h and lib/crc32.c (suggested by Matt Domsch).
diff -urN linux-2.4.18/Documentation/Configure.help lia64-2.4/Documentation/Configure.help
--- linux-2.4.18/Documentation/Configure.help Tue Feb 26 11:03:49 2002
+++ lia64-2.4/Documentation/Configure.help Fri Apr 5 16:44:44 2002
@@ -3253,6 +3253,16 @@
slot then this option isn't going to do you much good. If you're
dying to do Direct Rendering on IA-64, this is what you're looking for.
+Intel 460GX support
+CONFIG_AGP_I460
+ This option gives you AGP support for the Intel 460GX chipset. This
+ chipset, the first to support Intel Itanium processors, is new and
+ this option is correspondingly a little experimental.
+
+ If you don't have a 460GX based machine (such as BigSur) with an AGP
+ slot then this option isn't going to do you much good. If you're
+ dying to do Direct Rendering on IA-64, this is what you're looking for.
+
Intel I810/I815 DC100/I810e support
CONFIG_AGP_I810
This option gives you AGP support for the Xserver on the Intel 810
@@ -14968,12 +14978,6 @@
were partitioned using EFI GPT. Presently only useful on the
IA-64 platform.
-/dev/guid support (EXPERIMENTAL)
-CONFIG_DEVFS_GUID
- Say Y here if you would like to access disks and partitions by
- their Globally Unique Identifiers (GUIDs) which will appear as
- symbolic links in /dev/guid.
-
Ultrix partition table support
CONFIG_ULTRIX_PARTITION
Say Y here if you would like to be able to read the hard disk
@@ -23783,13 +23787,19 @@
HP-simulator For the HP simulator
().
+ HP-zx1 For HP zx1 Platforms.
SN1 For SGI SN1 Platforms.
SN2 For SGI SN2 Platforms.
DIG-compliant For DIG ("Developer's Interface Guide") compliant
- system.
+ systems.
If you don't know what to do, choose "generic".
+CONFIG_IA64_HP_ZX1
+ Build a kernel that runs on HP zx1-based systems. This adds support
+ for the zx1 IOMMU and makes root bus bridges appear in PCI config space
+ (required for zx1 agpgart support).
+
CONFIG_IA64_SGI_SN_SIM
Build a kernel that runs on both the SGI simulator AND on hardware.
There is a very slight performance penalty on hardware for including this
@@ -23888,6 +23898,15 @@
Layer) information in /proc/pal. This contains useful information
about the processors in your systems, such as cache and TLB sizes
and the PAL firmware version in use.
+
+ To use this option, you have to check that the "/proc file system
+ support" (CONFIG_PROC_FS) is enabled, too.
+
+/proc/efi/vars support
+CONFIG_EFI_VARS
+ If you say Y here, you are able to get EFI (Extensible Firmware
+ Interface) variable information in /proc/efi/vars. You may read,
+ write, create, and destroy EFI variables through this interface.
To use this option, you have to check that the "/proc file system
support" (CONFIG_PROC_FS) is enabled, too.
diff -urN linux-2.4.18/Documentation/ia64/IRQ-redir.txt lia64-2.4/Documentation/ia64/IRQ-redir.txt
--- linux-2.4.18/Documentation/ia64/IRQ-redir.txt Wed Dec 31 16:00:00 1969
+++ lia64-2.4/Documentation/ia64/IRQ-redir.txt Mon Feb 11 16:17:26 2002
@@ -0,0 +1,69 @@
+IRQ affinity on IA64 platforms
+------------------------------
+ 07.01.2002, Erich Focht
+
+
+By writing to /proc/irq/IRQ#/smp_affinity the interrupt routing can be
+controlled. The behavior on IA64 platforms is slightly different from
+that described in Documentation/IRQ-affinity.txt for i386 systems.
+
+Because of the usage of SAPIC mode and physical destination mode the
+IRQ target is one particular CPU and cannot be a mask of several
+CPUs. Only the first non-zero bit is taken into account.
+
+
+Usage examples:
+
+The target CPU has to be specified as a hexadecimal CPU mask. The
+first non-zero bit is the selected CPU. This format has been kept for
+compatibility reasons with i386.
+
+Set the delivery mode of interrupt 41 to fixed and route the
+interrupts to CPU #3 (logical CPU number) (2^3=0x08):
+ echo "8" >/proc/irq/41/smp_affinity
+
+Set the default route for IRQ number 41 to CPU 6 in lowest priority
+delivery mode (redirectable):
+ echo "r 40" >/proc/irq/41/smp_affinity
+
+The output of the command
+ cat /proc/irq/IRQ#/smp_affinity
+gives the target CPU mask for the specified interrupt vector. If the CPU
+mask is preceded by the character "r", the interrupt is redirectable
+(i.e. lowest priority mode routing is used), otherwise its route is
+fixed.
+
+
+
+Initialization and default behavior:
+
+If the platform features IRQ redirection (info provided by SAL) all
+IO-SAPIC interrupts are initialized with CPU#0 as their default target
+and the routing is the so called "lowest priority mode" (actually
+fixed SAPIC mode with hint). The XTP chipset registers are used as hints
+for the IRQ routing. Currently in Linux XTP registers can have three
+values:
+ - minimal for an idle task,
+ - normal if any other task runs,
+ - maximal if the CPU is going to be switched off.
+The IRQ is routed to the CPU with the lowest XTP register value; the
+search begins at the default CPU. Therefore most of the interrupts
+will be handled by CPU #0.
+
+If the platform doesn't feature interrupt redirection, IOSAPIC fixed
+routing is used. The target CPUs are distributed in a round robin
+manner. IRQs will be routed only to the selected target CPUs. Check
+with
+ cat /proc/interrupts
+
+
+
+Comments:
+
+On large (multi-node) systems it is recommended to route the IRQs to
+the node to which the corresponding device is connected.
+For systems like the NEC AzusA we get IRQ node-affinity for free. This
+is because usually the chipsets on each node redirect the interrupts
+only to their own CPUs (as they cannot see the XTP registers on the
+other nodes).
+
diff -urN linux-2.4.18/Makefile lia64-2.4/Makefile
--- linux-2.4.18/Makefile Tue Feb 26 11:03:51 2002
+++ lia64-2.4/Makefile Fri Apr 5 20:31:50 2002
@@ -88,7 +88,7 @@
CPPFLAGS := -D__KERNEL__ -I$(HPATH)
-CFLAGS := $(CPPFLAGS) -Wall -Wstrict-prototypes -Wno-trigraphs -O2 \
+CFLAGS := $(CPPFLAGS) -Wall -Wstrict-prototypes -Wno-trigraphs -g -O2 \
-fomit-frame-pointer -fno-strict-aliasing -fno-common
AFLAGS := -D__ASSEMBLY__ $(CPPFLAGS)
diff -urN linux-2.4.18/arch/ia64/Makefile lia64-2.4/arch/ia64/Makefile
--- linux-2.4.18/arch/ia64/Makefile Mon Nov 26 11:18:19 2001
+++ lia64-2.4/arch/ia64/Makefile Sat Apr 6 00:29:08 2002
@@ -22,10 +22,10 @@
# -ffunction-sections
CFLAGS_KERNEL := -mconstant-gp
-GCC_VERSION=$(shell $(CROSS_COMPILE)$(HOSTCC) -v 2>&1 | fgrep 'gcc version' | cut -f3 -d' ' | cut -f1 -d'.')
+GCC_VERSION=$(shell $(CC) -v 2>&1 | fgrep 'gcc version' | cut -f3 -d' ' | cut -f1 -d'.')
ifneq ($(GCC_VERSION),2)
- CFLAGS += -frename-registers --param max-inline-insns=400
+ CFLAGS += -frename-registers --param max-inline-insns=2000
endif
ifeq ($(CONFIG_ITANIUM_BSTEP_SPECIFIC),y)
@@ -33,16 +33,11 @@
endif
ifdef CONFIG_IA64_GENERIC
- CORE_FILES := arch/$(ARCH)/hp/hp.a \
- arch/$(ARCH)/sn/sn.o \
- arch/$(ARCH)/dig/dig.a \
- arch/$(ARCH)/sn/io/sgiio.o \
+ CORE_FILES := arch/$(ARCH)/hp/hp.o \
+ arch/$(ARCH)/dig/dig.a \
$(CORE_FILES)
SUBDIRS := arch/$(ARCH)/hp \
- arch/$(ARCH)/sn/sn1 \
- arch/$(ARCH)/sn \
arch/$(ARCH)/dig \
- arch/$(ARCH)/sn/io \
$(SUBDIRS)
else # !GENERIC
@@ -50,7 +45,16 @@
ifdef CONFIG_IA64_HP_SIM
SUBDIRS := arch/$(ARCH)/hp \
$(SUBDIRS)
- CORE_FILES := arch/$(ARCH)/hp/hp.a \
+ CORE_FILES := arch/$(ARCH)/hp/hp.o \
+ $(CORE_FILES)
+endif
+
+ifdef CONFIG_IA64_HP_ZX1
+ SUBDIRS := arch/$(ARCH)/hp \
+ arch/$(ARCH)/dig \
+ $(SUBDIRS)
+ CORE_FILES := arch/$(ARCH)/hp/hp.o \
+ arch/$(ARCH)/dig/dig.a \
$(CORE_FILES)
endif
@@ -58,7 +62,7 @@
CFLAGS += -DBRINGUP
SUBDIRS := arch/$(ARCH)/sn/kernel \
arch/$(ARCH)/sn/io \
- arch/$(ARCH)/sn/fprom \
+ arch/$(ARCH)/sn/fakeprom \
$(SUBDIRS)
CORE_FILES := arch/$(ARCH)/sn/kernel/sn.o \
arch/$(ARCH)/sn/io/sgiio.o \
diff -urN linux-2.4.18/arch/ia64/config.in lia64-2.4/arch/ia64/config.in
--- linux-2.4.18/arch/ia64/config.in Mon Nov 26 11:18:19 2001
+++ lia64-2.4/arch/ia64/config.in Fri Apr 5 16:49:19 2002
@@ -41,6 +41,7 @@
"generic CONFIG_IA64_GENERIC \
DIG-compliant CONFIG_IA64_DIG \
HP-simulator CONFIG_IA64_HP_SIM \
+ HP-zx1 CONFIG_IA64_HP_ZX1 \
SGI-SN1 CONFIG_IA64_SGI_SN1 \
SGI-SN2 CONFIG_IA64_SGI_SN2" generic
@@ -68,7 +69,8 @@
fi
fi
-if [ "$CONFIG_IA64_DIG" = "y" ]; then
+if [ "$CONFIG_IA64_GENERIC" = "y" ] || [ "$CONFIG_IA64_DIG" = "y" ] \
+ || [ "$CONFIG_IA64_HP_ZX1" = "y" ]; then
bool ' Enable IA-64 Machine Check Abort' CONFIG_IA64_MCA
define_bool CONFIG_PM y
fi
@@ -135,6 +137,7 @@
source drivers/mtd/Config.in
source drivers/pnp/Config.in
source drivers/block/Config.in
+source drivers/ieee1394/Config.in
source drivers/message/i2o/Config.in
source drivers/md/Config.in
@@ -151,6 +154,17 @@
fi
endmenu
+else # ! HP_SIM
+mainmenu_option next_comment
+comment 'Block devices'
+tristate 'Loopback device support' CONFIG_BLK_DEV_LOOP
+dep_tristate 'Network block device support' CONFIG_BLK_DEV_NBD $CONFIG_NET
+
+tristate 'RAM disk support' CONFIG_BLK_DEV_RAM
+if [ "$CONFIG_BLK_DEV_RAM" = "y" -o "$CONFIG_BLK_DEV_RAM" = "m" ]; then
+ int ' Default RAM disk size' CONFIG_BLK_DEV_RAM_SIZE 4096
+fi
+endmenu
fi # !HP_SIM
mainmenu_option next_comment
@@ -244,7 +258,7 @@
mainmenu_option next_comment
comment 'Simulated drivers'
- tristate 'Simulated Ethernet ' CONFIG_SIMETH
+ bool 'Simulated Ethernet ' CONFIG_SIMETH
bool 'Simulated serial driver support' CONFIG_SIM_SERIAL
if [ "$CONFIG_SCSI" != "n" ]; then
bool 'Simulated SCSI disk' CONFIG_SCSI_SIM
@@ -266,9 +280,7 @@
bool ' Disable VHPT' CONFIG_DISABLE_VHPT
bool ' Magic SysRq key' CONFIG_MAGIC_SYSRQ
-# early printk is currently broken for SMP: the secondary processors get stuck...
-# bool ' Early printk support (requires VGA!)' CONFIG_IA64_EARLY_PRINTK
-
+ bool ' Early printk support (requires VGA!)' CONFIG_IA64_EARLY_PRINTK
bool ' Debug memory allocations' CONFIG_DEBUG_SLAB
bool ' Spinlock debugging' CONFIG_DEBUG_SPINLOCK
bool ' Turn on compare-and-exchange bug checking (slow!)' CONFIG_IA64_DEBUG_CMPXCHG
diff -urN linux-2.4.18/arch/ia64/defconfig lia64-2.4/arch/ia64/defconfig
--- linux-2.4.18/arch/ia64/defconfig Mon Nov 26 11:18:19 2001
+++ lia64-2.4/arch/ia64/defconfig Thu Mar 28 16:11:08 2002
@@ -299,6 +299,7 @@
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_NCR53C406A is not set
# CONFIG_SCSI_NCR53C7xx is not set
+# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_NCR53C8XX is not set
# CONFIG_SCSI_SYM53C8XX is not set
# CONFIG_SCSI_PAS16 is not set
@@ -373,6 +374,7 @@
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
# CONFIG_VIA_RHINE is not set
+# CONFIG_VIA_RHINE_MMIO is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_NET_POCKET is not set
@@ -554,6 +556,9 @@
# CONFIG_TUNER_3036 is not set
# CONFIG_VIDEO_STRADIS is not set
# CONFIG_VIDEO_ZORAN is not set
+# CONFIG_VIDEO_ZORAN_BUZ is not set
+# CONFIG_VIDEO_ZORAN_DC10 is not set
+# CONFIG_VIDEO_ZORAN_LML33 is not set
# CONFIG_VIDEO_ZR36120 is not set
# CONFIG_VIDEO_MEYE is not set
@@ -584,11 +589,15 @@
# CONFIG_AUTOFS4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_REISERFS_CHECK is not set
+# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_ADFS_FS is not set
# CONFIG_ADFS_FS_RW is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_BFS_FS is not set
+CONFIG_EXT3_FS=m
+CONFIG_JBD=m
+CONFIG_JBD_DEBUG=y
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
# CONFIG_UMSDOS_FS is not set
@@ -626,6 +635,7 @@
# Network File Systems
#
# CONFIG_CODA_FS is not set
+# CONFIG_INTERMEZZO_FS is not set
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_ROOT_NFS is not set
@@ -662,7 +672,6 @@
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
CONFIG_EFI_PARTITION=y
-# CONFIG_DEVFS_GUID is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
@@ -874,6 +883,7 @@
CONFIG_IA64_PRINT_HAZARDS=y
# CONFIG_DISABLE_VHPT is not set
CONFIG_MAGIC_SYSRQ=y
+CONFIG_IA64_EARLY_PRINTK=y
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_IA64_DEBUG_CMPXCHG is not set
diff -urN linux-2.4.18/arch/ia64/dig/setup.c lia64-2.4/arch/ia64/dig/setup.c
--- linux-2.4.18/arch/ia64/dig/setup.c Thu Apr 5 12:51:47 2001
+++ lia64-2.4/arch/ia64/dig/setup.c Wed Apr 10 11:04:02 2002
@@ -33,8 +33,7 @@
* is sufficient (the IDE driver will autodetect the drive geometry).
*/
char drive_info[4*16];
-
-unsigned char aux_device_present = 0xaa; /* XXX remove this when legacy I/O is gone */
+extern int pcat_compat;
void __init
dig_setup (char **cmdline_p)
@@ -81,13 +80,7 @@
screen_info.orig_video_ega_bx = 3; /* XXX fake */
}
-void
+void __init
dig_irq_init (void)
{
- /*
- * Disable the compatibility mode interrupts (8259 style), needs IN/OUT support
- * enabled.
- */
- outb(0xff, 0xA1);
- outb(0xff, 0x21);
}
diff -urN linux-2.4.18/arch/ia64/hp/Makefile lia64-2.4/arch/ia64/hp/Makefile
--- linux-2.4.18/arch/ia64/hp/Makefile Thu Jan 4 12:50:17 2001
+++ lia64-2.4/arch/ia64/hp/Makefile Fri Apr 5 16:44:44 2002
@@ -1,17 +1,15 @@
-#
-# ia64/platform/hp/Makefile
-#
-# Copyright (C) 1999 Silicon Graphics, Inc.
-# Copyright (C) Srinivasa Thirumalachar (sprasad@engr.sgi.com)
-#
+# arch/ia64/hp/Makefile
+# Copyright (c) 2002 Matthew Wilcox for Hewlett Packard
-all: hp.a
+ALL_SUB_DIRS := sim zx1 common
-O_TARGET := hp.a
+O_TARGET := hp.o
-obj-y := hpsim_console.o hpsim_irq.o hpsim_setup.o
-obj-$(CONFIG_IA64_GENERIC) += hpsim_machvec.o
+subdir-$(CONFIG_IA64_GENERIC) += $(ALL_SUB_DIRS)
+subdir-$(CONFIG_IA64_HP_SIM) += sim
+subdir-$(CONFIG_IA64_HP_ZX1) += zx1 common
-clean::
+SUB_DIRS := $(subdir-y)
+obj-y += $(join $(subdir-y),$(subdir-y:%=/%.o))
include $(TOPDIR)/Rules.make
diff -urN linux-2.4.18/arch/ia64/hp/common/Makefile lia64-2.4/arch/ia64/hp/common/Makefile
--- linux-2.4.18/arch/ia64/hp/common/Makefile Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/common/Makefile Fri Apr 5 16:44:44 2002
@@ -0,0 +1,14 @@
+#
+# ia64/platform/hp/common/Makefile
+#
+# Copyright (C) 2002 Hewlett Packard
+# Copyright (C) Alex Williamson (alex_williamson@hp.com)
+#
+
+O_TARGET := common.o
+
+export-objs := sba_iommu.o
+
+obj-y := sba_iommu.o
+
+include $(TOPDIR)/Rules.make
diff -urN linux-2.4.18/arch/ia64/hp/common/sba_iommu.c lia64-2.4/arch/ia64/hp/common/sba_iommu.c
--- linux-2.4.18/arch/ia64/hp/common/sba_iommu.c Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/common/sba_iommu.c Fri Apr 5 23:28:59 2002
@@ -0,0 +1,1850 @@
+/*
+** IA64 System Bus Adapter (SBA) I/O MMU manager
+**
+** (c) Copyright 2002 Alex Williamson
+** (c) Copyright 2002 Hewlett-Packard Company
+**
+** Portions (c) 2000 Grant Grundler (from parisc I/O MMU code)
+** Portions (c) 1999 Dave S. Miller (from sparc64 I/O MMU code)
+**
+** This program is free software; you can redistribute it and/or modify
+** it under the terms of the GNU General Public License as published by
+** the Free Software Foundation; either version 2 of the License, or
+** (at your option) any later version.
+**
+**
+** This module initializes the IOC (I/O Controller) found on HP
+** McKinley machines and their successors.
+**
+*/
+
+#include <linux/config.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+
+#include <asm/delay.h>		/* ia64_get_itc() */
+#include <asm/io.h>
+#include <asm/page.h>		/* PAGE_OFFSET */
+#include <asm/efi.h>
+
+
+#define DRIVER_NAME "SBA"
+
+#ifndef CONFIG_IA64_HP_PROTO
+#define ALLOW_IOV_BYPASS
+#endif
+#define ENABLE_MARK_CLEAN
+/*
+** The number of debug flags is a clue - this code is fragile.
+*/
+#undef DEBUG_SBA_INIT
+#undef DEBUG_SBA_RUN
+#undef DEBUG_SBA_RUN_SG
+#undef DEBUG_SBA_RESOURCE
+#undef ASSERT_PDIR_SANITY
+#undef DEBUG_LARGE_SG_ENTRIES
+#undef DEBUG_BYPASS
+
+#define SBA_INLINE __inline__
+/* #define SBA_INLINE */
+
+#ifdef DEBUG_SBA_INIT
+#define DBG_INIT(x...) printk(x)
+#else
+#define DBG_INIT(x...)
+#endif
+
+#ifdef DEBUG_SBA_RUN
+#define DBG_RUN(x...) printk(x)
+#else
+#define DBG_RUN(x...)
+#endif
+
+#ifdef DEBUG_SBA_RUN_SG
+#define DBG_RUN_SG(x...) printk(x)
+#else
+#define DBG_RUN_SG(x...)
+#endif
+
+
+#ifdef DEBUG_SBA_RESOURCE
+#define DBG_RES(x...) printk(x)
+#else
+#define DBG_RES(x...)
+#endif
+
+#ifdef DEBUG_BYPASS
+#define DBG_BYPASS(x...) printk(x)
+#else
+#define DBG_BYPASS(x...)
+#endif
+
+#ifdef ASSERT_PDIR_SANITY
+#define ASSERT(expr) \
+ if(!(expr)) { \
+ printk( "\n" __FILE__ ":%d: Assertion " #expr " failed!\n",__LINE__); \
+ panic(#expr); \
+ }
+#else
+#define ASSERT(expr)
+#endif
+
+#define KB(x) ((x) * 1024)
+#define MB(x) (KB (KB (x)))
+#define GB(x) (MB (KB (x)))
+
+/*
+** The number of pdir entries to "free" before issuing
+** a read to PCOM register to flush out PCOM writes.
+** Interacts with allocation granularity (ie 4 or 8 entries
+** allocated and free'd/purged at a time might make this
+** less interesting).
+*/
+#define DELAYED_RESOURCE_CNT 16
+
+#define DEFAULT_DMA_HINT_REG 0
+
+#define ZX1_FUNC_ID_VALUE ((PCI_DEVICE_ID_HP_ZX1_SBA << 16) | PCI_VENDOR_ID_HP)
+#define ZX1_MC_ID ((PCI_DEVICE_ID_HP_ZX1_MC << 16) | PCI_VENDOR_ID_HP)
+
+#define SBA_FUNC_ID 0x0000 /* function id */
+#define SBA_FCLASS 0x0008 /* function class, bist, header, rev... */
+
+#define SBA_FUNC_SIZE 0x10000 /* SBA configuration function reg set */
+
+unsigned int __initdata zx1_func_offsets[] = {0x1000, 0x4000, 0x8000,
+ 0x9000, 0xa000, -1};
+
+#define SBA_IOC_OFFSET 0x1000
+
+#define MAX_IOC 1 /* we only have 1 for now */
+
+#define IOC_IBASE 0x300 /* IO TLB */
+#define IOC_IMASK 0x308
+#define IOC_PCOM 0x310
+#define IOC_TCNFG 0x318
+#define IOC_PDIR_BASE 0x320
+
+#define IOC_IOVA_SPACE_BASE 0x40000000 /* IOVA ranges start at 1GB */
+
+/*
+** IOC supports 4/8/16/64KB page sizes (see TCNFG register)
+** It's safer (avoid memory corruption) to keep DMA page mappings
+** equivalently sized to VM PAGE_SIZE.
+**
+** We really can't avoid generating a new mapping for each
+** page since the Virtual Coherence Index has to be generated
+** and updated for each page.
+**
+** IOVP_SIZE could only be greater than PAGE_SIZE if we are
+** confident the drivers really only touch the next physical
+** page iff that driver instance owns it.
+*/
+#define IOVP_SIZE PAGE_SIZE
+#define IOVP_SHIFT PAGE_SHIFT
+#define IOVP_MASK PAGE_MASK
+
+struct ioc {
+ unsigned long ioc_hpa; /* I/O MMU base address */
+ char *res_map; /* resource map, bit == pdir entry */
+ u64 *pdir_base; /* physical base address */
+ unsigned long ibase; /* pdir IOV Space base */
+ unsigned long imask; /* pdir IOV Space mask */
+
+ unsigned long *res_hint; /* next avail IOVP - circular search */
+ spinlock_t res_lock;
+ unsigned long hint_mask_pdir; /* bits used for DMA hints */
+ unsigned int res_bitshift; /* from the RIGHT! */
+ unsigned int res_size; /* size of resource map in bytes */
+ unsigned int hint_shift_pdir;
+ unsigned long dma_mask;
+#if DELAYED_RESOURCE_CNT > 0
+ int saved_cnt;
+ struct sba_dma_pair {
+ dma_addr_t iova;
+ size_t size;
+ } saved[DELAYED_RESOURCE_CNT];
+#endif
+
+#ifdef CONFIG_PROC_FS
+#define SBA_SEARCH_SAMPLE 0x100
+ unsigned long avg_search[SBA_SEARCH_SAMPLE];
+ unsigned long avg_idx; /* current index into avg_search */
+ unsigned long used_pages;
+ unsigned long msingle_calls;
+ unsigned long msingle_pages;
+ unsigned long msg_calls;
+ unsigned long msg_pages;
+ unsigned long usingle_calls;
+ unsigned long usingle_pages;
+ unsigned long usg_calls;
+ unsigned long usg_pages;
+#ifdef ALLOW_IOV_BYPASS
+ unsigned long msingle_bypass;
+ unsigned long usingle_bypass;
+ unsigned long msg_bypass;
+#endif
+#endif
+
+ /* STUFF We don't need in performance path */
+ unsigned int pdir_size; /* in bytes, determined by IOV Space size */
+};
+
+struct sba_device {
+ struct sba_device *next; /* list of SBA's in system */
+ const char *name;
+ unsigned long sba_hpa; /* base address */
+ spinlock_t sba_lock;
+ unsigned int flags; /* state/functionality enabled */
+ unsigned int hw_rev; /* HW revision of chip */
+
+ unsigned int num_ioc; /* number of on-board IOC's */
+ struct ioc ioc[MAX_IOC];
+};
+
+
+static struct sba_device *sba_list;
+static int sba_count;
+static int reserve_sba_gart = 1;
+
+#define sba_sg_iova(sg) (sg->address)
+#define sba_sg_len(sg) (sg->length)
+#define sba_sg_buffer(sg) (sg->orig_address)
+
+/* REVISIT - fix me for multiple SBAs/IOCs */
+#define GET_IOC(dev) (sba_list->ioc)
+#define SBA_SET_AGP(sba_dev) (sba_dev->flags |= 0x1)
+#define SBA_GET_AGP(sba_dev) (sba_dev->flags & 0x1)
+
+/*
+** DMA_CHUNK_SIZE is used by the SCSI mid-layer to break up
+** (or rather not merge) DMA's into manageable chunks.
+** On parisc, this is more of a software/tuning constraint
+** than a HW one. I/O MMU allocation algorithms can be
+** faster with smaller sizes (to some degree).
+*/
+#define DMA_CHUNK_SIZE (BITS_PER_LONG*PAGE_SIZE)
+
+/* Looks nice and keeps the compiler happy */
+#define SBA_DEV(d) ((struct sba_device *) (d))
+
+#define ROUNDUP(x,y) ((x + ((y)-1)) & ~((y)-1))
+
+/************************************
+** SBA register read and write support
+**
+** BE WARNED: register writes are posted.
+** (ie follow writes which must reach HW with a read)
+**
+*/
+#define READ_REG(addr) __raw_readq(addr)
+#define WRITE_REG(val, addr) __raw_writeq(val, addr)
+
+#ifdef DEBUG_SBA_INIT
+
+/**
+ * sba_dump_tlb - debugging only - print IOMMU operating parameters
+ * @hpa: base address of the IOMMU
+ *
+ * Print the size/location of the IO MMU PDIR.
+ */
+static void
+sba_dump_tlb(char *hpa)
+{
+ DBG_INIT("IO TLB at 0x%p\n", (void *)hpa);
+ DBG_INIT("IOC_IBASE : %016lx\n", READ_REG(hpa+IOC_IBASE));
+ DBG_INIT("IOC_IMASK : %016lx\n", READ_REG(hpa+IOC_IMASK));
+ DBG_INIT("IOC_TCNFG : %016lx\n", READ_REG(hpa+IOC_TCNFG));
+ DBG_INIT("IOC_PDIR_BASE: %016lx\n", READ_REG(hpa+IOC_PDIR_BASE));
+ DBG_INIT("\n");
+}
+#endif
+
+
+#ifdef ASSERT_PDIR_SANITY
+
+/**
+ * sba_dump_pdir_entry - debugging only - print one IOMMU PDIR entry
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @msg: text to print on the output line.
+ * @pide: pdir index.
+ *
+ * Print one entry of the IO MMU PDIR in human readable form.
+ */
+static void
+sba_dump_pdir_entry(struct ioc *ioc, char *msg, uint pide)
+{
+ /* start printing from lowest pde in rval */
+ u64 *ptr = &(ioc->pdir_base[pide & ~(BITS_PER_LONG - 1)]);
+ unsigned long *rptr = (unsigned long *) &(ioc->res_map[(pide >>3) & ~(sizeof(unsigned long) - 1)]);
+ uint rcnt;
+
+ /* printk(KERN_DEBUG "SBA: %s rp %p bit %d rval 0x%lx\n", */
+ printk("SBA: %s rp %p bit %d rval 0x%lx\n",
+ msg, rptr, pide & (BITS_PER_LONG - 1), *rptr);
+
+ rcnt = 0;
+ while (rcnt < BITS_PER_LONG) {
+ printk("%s %2d %p %016Lx\n",
+ (rcnt == (pide & (BITS_PER_LONG - 1)))
+ ? " -->" : " ",
+ rcnt, ptr, *ptr );
+ rcnt++;
+ ptr++;
+ }
+ printk("%s", msg);
+}
+
+
+/**
+ * sba_check_pdir - debugging only - consistency checker
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @msg: text to print on the output line.
+ *
+ * Verify the resource map and pdir state is consistent
+ */
+static int
+sba_check_pdir(struct ioc *ioc, char *msg)
+{
+ u64 *rptr_end = (u64 *) &(ioc->res_map[ioc->res_size]);
+ u64 *rptr = (u64 *) ioc->res_map; /* resource map ptr */
+ u64 *pptr = ioc->pdir_base; /* pdir ptr */
+ uint pide = 0;
+
+ while (rptr < rptr_end) {
+ u64 rval;
+ int rcnt; /* number of bits we might check */
+
+ rval = *rptr;
+ rcnt = 64;
+
+ while (rcnt) {
+ /* Get last byte and highest bit from that */
+ u32 pde = ((u32)((*pptr >> (63)) & 0x1));
+ if ((rval & 0x1) ^ pde)
+ {
+ /*
+ ** BUMMER! -- res_map != pdir --
+ ** Dump rval and matching pdir entries
+ */
+ sba_dump_pdir_entry(ioc, msg, pide);
+ return(1);
+ }
+ rcnt--;
+ rval >>= 1; /* try the next bit */
+ pptr++;
+ pide++;
+ }
+ rptr++; /* look at next word of res_map */
+ }
+ /* It'd be nice if we always got here :^) */
+ return 0;
+}
+
+
+/**
+ * sba_dump_sg - debugging only - print Scatter-Gather list
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @startsg: head of the SG list
+ * @nents: number of entries in SG list
+ *
+ * print the SG list so we can verify it's correct by hand.
+ */
+static void
+sba_dump_sg( struct ioc *ioc, struct scatterlist *startsg, int nents)
+{
+ while (nents-- > 0) {
+ printk(" %d : %08lx/%05x %p\n",
+ nents,
+ (unsigned long) sba_sg_iova(startsg),
+ sba_sg_len(startsg),
+ sba_sg_buffer(startsg));
+ startsg++;
+ }
+}
+static void
+sba_check_sg( struct ioc *ioc, struct scatterlist *startsg, int nents)
+{
+ struct scatterlist *the_sg = startsg;
+ int the_nents = nents;
+
+ while (the_nents-- > 0) {
+ if (sba_sg_buffer(the_sg) == 0x0UL)
+ sba_dump_sg(NULL, startsg, nents);
+ the_sg++;
+ }
+}
+
+#endif /* ASSERT_PDIR_SANITY */
+
+
+
+
+/**************************************************************
+*
+* I/O Pdir Resource Management
+*
+* Bits set in the resource map are in use.
+* Each bit can represent a number of pages.
+* LSbs represent lower addresses (IOVA's).
+*
+***************************************************************/
+#define PAGES_PER_RANGE 1 /* could increase this to 4 or 8 if needed */
+
+/* Convert from IOVP to IOVA and vice versa. */
+#define SBA_IOVA(ioc,iovp,offset,hint_reg) ((ioc->ibase) | (iovp) | (offset) | ((hint_reg)<<(ioc->hint_shift_pdir)))
+#define SBA_IOVP(ioc,iova) (((iova) & ioc->hint_mask_pdir) & ~(ioc->ibase))
+
+/* FIXME : review these macros to verify correctness and usage */
+#define PDIR_INDEX(iovp) ((iovp)>>IOVP_SHIFT)
+
+#define RESMAP_MASK(n) ~(~0UL << (n))
+#define RESMAP_IDX_MASK (sizeof(unsigned long) - 1)
+
+
+/**
+ * sba_search_bitmap - find free space in IO PDIR resource bitmap
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @bits_wanted: number of entries we need.
+ *
+ * Find consecutive free bits in resource bitmap.
+ * Each bit represents one entry in the IO Pdir.
+ * Cool perf optimization: search for log2(size) bits at a time.
+ */
+static SBA_INLINE unsigned long
+sba_search_bitmap(struct ioc *ioc, unsigned long bits_wanted)
+{
+ unsigned long *res_ptr = ioc->res_hint;
+ unsigned long *res_end = (unsigned long *) &(ioc->res_map[ioc->res_size]);
+ unsigned long pide = ~0UL;
+
+ ASSERT(((unsigned long) ioc->res_hint & (sizeof(unsigned long) - 1UL)) == 0);
+ ASSERT(res_ptr < res_end);
+ if (bits_wanted > (BITS_PER_LONG/2)) {
+ /* Search word at a time - no mask needed */
+ for(; res_ptr < res_end; ++res_ptr) {
+ if (*res_ptr == 0) {
+ *res_ptr = RESMAP_MASK(bits_wanted);
+ pide = ((unsigned long)res_ptr - (unsigned long)ioc->res_map);
+ pide <<= 3; /* convert to bit address */
+ break;
+ }
+ }
+ /* point to the next word on next pass */
+ res_ptr++;
+ ioc->res_bitshift = 0;
+ } else {
+ /*
+ ** Search the resource bit map on well-aligned values.
+ ** "o" is the alignment.
+ ** We need the alignment to invalidate I/O TLB using
+ ** SBA HW features in the unmap path.
+ */
+ unsigned long o = 1 << get_order(bits_wanted << PAGE_SHIFT);
+ uint bitshiftcnt = ROUNDUP(ioc->res_bitshift, o);
+ unsigned long mask;
+
+ if (bitshiftcnt >= BITS_PER_LONG) {
+ bitshiftcnt = 0;
+ res_ptr++;
+ }
+ mask = RESMAP_MASK(bits_wanted) << bitshiftcnt;
+
+ DBG_RES("%s() o %ld %p", __FUNCTION__, o, res_ptr);
+ while(res_ptr < res_end)
+ {
+ DBG_RES(" %p %lx %lx\n", res_ptr, mask, *res_ptr);
+ ASSERT(0 != mask);
+ if(0 == ((*res_ptr) & mask)) {
+ *res_ptr |= mask; /* mark resources busy! */
+ pide = ((unsigned long)res_ptr - (unsigned long)ioc->res_map);
+ pide <<= 3; /* convert to bit address */
+ pide += bitshiftcnt;
+ break;
+ }
+ mask <<= o;
+ bitshiftcnt += o;
+ if (0 == mask) {
+ mask = RESMAP_MASK(bits_wanted);
+ bitshiftcnt=0;
+ res_ptr++;
+ }
+ }
+ /* look in the same word on the next pass */
+ ioc->res_bitshift = bitshiftcnt + bits_wanted;
+ }
+
+ /* wrapped ? */
+ if (res_end <= res_ptr) {
+ ioc->res_hint = (unsigned long *) ioc->res_map;
+ ioc->res_bitshift = 0;
+ } else {
+ ioc->res_hint = res_ptr;
+ }
+ return (pide);
+}
+
+
+/**
+ * sba_alloc_range - find free bits and mark them in IO PDIR resource bitmap
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @size: number of bytes to create a mapping for
+ *
+ * Given a size, find consecutive unmarked bits and then mark them in the
+ * resource bit map.
+ */
+static int
+sba_alloc_range(struct ioc *ioc, size_t size)
+{
+ unsigned int pages_needed = size >> IOVP_SHIFT;
+#ifdef CONFIG_PROC_FS
+ unsigned long itc_start = ia64_get_itc();
+#endif
+ unsigned long pide;
+
+ ASSERT(pages_needed);
+ ASSERT((pages_needed * IOVP_SIZE) <= DMA_CHUNK_SIZE);
+ ASSERT(pages_needed <= BITS_PER_LONG);
+ ASSERT(0 == (size & ~IOVP_MASK));
+
+ /*
+ ** "seek and ye shall find"...praying never hurts either...
+ */
+
+ pide = sba_search_bitmap(ioc, pages_needed);
+ if (pide >= (ioc->res_size << 3)) {
+ pide = sba_search_bitmap(ioc, pages_needed);
+ if (pide >= (ioc->res_size << 3))
+ panic(__FILE__ ": I/O MMU @ %lx is out of mapping resources\n", ioc->ioc_hpa);
+ }
+
+#ifdef ASSERT_PDIR_SANITY
+ /* verify the first enable bit is clear */
+ if(0x00 != ((u8 *) ioc->pdir_base)[pide*sizeof(u64) + 7]) {
+ sba_dump_pdir_entry(ioc, "sba_search_bitmap() botched it?", pide);
+ }
+#endif
+
+ DBG_RES("%s(%x) %d -> %lx hint %x/%x\n",
+ __FUNCTION__, size, pages_needed, pide,
+ (uint) ((unsigned long) ioc->res_hint - (unsigned long) ioc->res_map),
+ ioc->res_bitshift );
+
+#ifdef CONFIG_PROC_FS
+ {
+ unsigned long itc_end = ia64_get_itc();
+ unsigned long tmp = itc_end - itc_start;
+ /* check for roll over */
+ itc_start = (itc_end < itc_start) ? -(tmp) : (tmp);
+ }
+ ioc->avg_search[ioc->avg_idx++] = itc_start;
+ ioc->avg_idx &= SBA_SEARCH_SAMPLE - 1;
+
+ ioc->used_pages += pages_needed;
+#endif
+
+ return (pide);
+}
+
+
+/**
+ * sba_free_range - unmark bits in IO PDIR resource bitmap
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @iova: IO virtual address which was previously allocated.
+ * @size: number of bytes to create a mapping for
+ *
+ * clear bits in the ioc's resource map
+ */
+static SBA_INLINE void
+sba_free_range(struct ioc *ioc, dma_addr_t iova, size_t size)
+{
+ unsigned long iovp = SBA_IOVP(ioc, iova);
+ unsigned int pide = PDIR_INDEX(iovp);
+ unsigned int ridx = pide >> 3; /* convert bit to byte address */
+ unsigned long *res_ptr = (unsigned long *) &((ioc)->res_map[ridx & ~RESMAP_IDX_MASK]);
+
+ int bits_not_wanted = size >> IOVP_SHIFT;
+
+ /* 3-bits "bit" address plus 2 (or 3) bits for "byte" == bit in word */
+ unsigned long m = RESMAP_MASK(bits_not_wanted) << (pide & (BITS_PER_LONG - 1));
+
+ DBG_RES("%s( ,%x,%x) %x/%lx %x %p %lx\n",
+ __FUNCTION__, (uint) iova, size,
+ bits_not_wanted, m, pide, res_ptr, *res_ptr);
+
+#ifdef CONFIG_PROC_FS
+ ioc->used_pages -= bits_not_wanted;
+#endif
+
+ ASSERT(m != 0);
+ ASSERT(bits_not_wanted);
+ ASSERT((bits_not_wanted * IOVP_SIZE) <= DMA_CHUNK_SIZE);
+ ASSERT(bits_not_wanted <= BITS_PER_LONG);
+ ASSERT((*res_ptr & m) == m); /* verify same bits are set */
+ *res_ptr &= ~m;
+}
+
+
+/**************************************************************
+*
+* "Dynamic DMA Mapping" support (aka "Coherent I/O")
+*
+***************************************************************/
+
+#define SBA_DMA_HINT(ioc, val) ((val) << (ioc)->hint_shift_pdir)
+
+
+/**
+ * sba_io_pdir_entry - fill in one IO PDIR entry
+ * @pdir_ptr: pointer to IO PDIR entry
+ * @vba: Virtual CPU address of buffer to map
+ *
+ * SBA Mapping Routine
+ *
+ * Given a virtual address (vba, arg1) sba_io_pdir_entry()
+ * loads the I/O PDIR entry pointed to by pdir_ptr (arg0).
+ * Each IO Pdir entry consists of 8 bytes as shown below
+ * (LSB == bit 0):
+ *
+ * 63 40 11 7 0
+ * +-+---------------------+----------------------------------+----+--------+
+ * |V| U | PPN[39:12] | U | FF |
+ * +-+---------------------+----------------------------------+----+--------+
+ *
+ * V == Valid Bit
+ * U == Unused
+ * PPN == Physical Page Number
+ *
+ * The physical address fields are filled with the results of virt_to_phys()
+ * on the vba.
+ */
+
+#if 1
+#define sba_io_pdir_entry(pdir_ptr, vba) *pdir_ptr = ((vba & ~0xE000000000000FFFULL) | 0x80000000000000FFULL)
+#else
+void SBA_INLINE
+sba_io_pdir_entry(u64 *pdir_ptr, unsigned long vba)
+{
+ *pdir_ptr = ((vba & ~0xE000000000000FFFULL) | 0x80000000000000FFULL);
+}
+#endif
+
+#ifdef ENABLE_MARK_CLEAN
+/**
+ * Since DMA is i-cache coherent, any (complete) pages that were written via
+ * DMA can be marked as "clean" so that update_mmu_cache() doesn't have to
+ * flush them when they get mapped into an executable vm-area.
+ */
+static void
+mark_clean (void *addr, size_t size)
+{
+ unsigned long pg_addr, end;
+
+ pg_addr = PAGE_ALIGN((unsigned long) addr);
+ end = (unsigned long) addr + size;
+ while (pg_addr + PAGE_SIZE <= end) {
+ struct page *page = virt_to_page(pg_addr);
+ set_bit(PG_arch_1, &page->flags);
+ pg_addr += PAGE_SIZE;
+ }
+}
+#endif
+
+/**
+ * sba_mark_invalid - invalidate one or more IO PDIR entries
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @iova: IO Virtual Address mapped earlier
+ * @byte_cnt: number of bytes this mapping covers.
+ *
+ * Marks the IO PDIR entry(ies) as Invalid and invalidates the
+ * corresponding IO TLB entry. The PCOM (Purge Command Register)
+ * is used to purge stale entries in the IO TLB when unmapping.
+ *
+ * The PCOM register supports purging of multiple pages, with a minimum
+ * of 1 page and a maximum of 2GB. Hardware requires the address be
+ * aligned to the size of the range being purged. The size of the range
+ * must be a power of 2. The "Cool perf optimization" in the
+ * allocation routine helps keep that true.
+ */
+static SBA_INLINE void
+sba_mark_invalid(struct ioc *ioc, dma_addr_t iova, size_t byte_cnt)
+{
+ u32 iovp = (u32) SBA_IOVP(ioc,iova);
+
+ int off = PDIR_INDEX(iovp);
+
+ /* Must be non-zero and rounded up */
+ ASSERT(byte_cnt > 0);
+ ASSERT(0 == (byte_cnt & ~IOVP_MASK));
+
+#ifdef ASSERT_PDIR_SANITY
+ /* Assert first pdir entry is set */
+ if (!(ioc->pdir_base[off] >> 60)) {
+ sba_dump_pdir_entry(ioc,"sba_mark_invalid()", PDIR_INDEX(iovp));
+ }
+#endif
+
+ if (byte_cnt <= IOVP_SIZE)
+ {
+ ASSERT(off < ioc->pdir_size);
+
+ iovp |= IOVP_SHIFT; /* set "size" field for PCOM */
+
+ /*
+ ** clear I/O PDIR entry "valid" bit
+ ** Do NOT clear the rest - save it for debugging.
+ ** We should only clear bits that have previously
+ ** been enabled.
+ */
+ ioc->pdir_base[off] &= ~(0x80000000000000FFULL);
+ } else {
+ u32 t = get_order(byte_cnt) + PAGE_SHIFT;
+
+ iovp |= t;
+ ASSERT(t <= 31); /* 2GB! Max value of "size" field */
+
+ do {
+ /* verify this pdir entry is enabled */
+ ASSERT(ioc->pdir_base[off] >> 63);
+ /* clear I/O Pdir entry "valid" bit first */
+ ioc->pdir_base[off] &= ~(0x80000000000000FFULL);
+ off++;
+ byte_cnt -= IOVP_SIZE;
+ } while (byte_cnt > 0);
+ }
+
+ WRITE_REG(iovp, ioc->ioc_hpa+IOC_PCOM);
+}
+
+/**
+ * sba_map_single - map one buffer and return IOVA for DMA
+ * @dev: instance of PCI owned by the driver that's asking.
+ * @addr: driver buffer to map.
+ * @size: number of bytes to map in driver buffer.
+ * @direction: R/W or both.
+ *
+ * See Documentation/DMA-mapping.txt
+ */
+dma_addr_t
+sba_map_single(struct pci_dev *dev, void *addr, size_t size, int direction)
+{
+ struct ioc *ioc;
+ unsigned long flags;
+ dma_addr_t iovp;
+ dma_addr_t offset;
+ u64 *pdir_start;
+ int pide;
+#ifdef ALLOW_IOV_BYPASS
+ unsigned long pci_addr = virt_to_phys(addr);
+#endif
+
+ ioc = GET_IOC(dev);
+ ASSERT(ioc);
+
+#ifdef ALLOW_IOV_BYPASS
+ /*
+ ** Check if the PCI device can DMA to ptr... if so, just return ptr
+ */
+ if ((pci_addr & ~dev->dma_mask) == 0) {
+ /*
+ ** Device is capable of DMA'ing to the buffer...
+ ** just return the PCI address of ptr
+ */
+#ifdef CONFIG_PROC_FS
+ spin_lock_irqsave(&ioc->res_lock, flags);
+ ioc->msingle_bypass++;
+ spin_unlock_irqrestore(&ioc->res_lock, flags);
+#endif
+ DBG_BYPASS("sba_map_single() bypass mask/addr: 0x%lx/0x%lx\n",
+ dev->dma_mask, pci_addr);
+ return pci_addr;
+ }
+#endif
+
+ ASSERT(size > 0);
+ ASSERT(size <= DMA_CHUNK_SIZE);
+
+ /* save offset bits */
+ offset = ((dma_addr_t) (long) addr) & ~IOVP_MASK;
+
+ /* round up to nearest IOVP_SIZE */
+ size = (size + offset + ~IOVP_MASK) & IOVP_MASK;
+
+ spin_lock_irqsave(&ioc->res_lock, flags);
+#ifdef ASSERT_PDIR_SANITY
+ if (sba_check_pdir(ioc,"Check before sba_map_single()"))
+ panic("Sanity check failed");
+#endif
+
+#ifdef CONFIG_PROC_FS
+ ioc->msingle_calls++;
+ ioc->msingle_pages += size >> IOVP_SHIFT;
+#endif
+ pide = sba_alloc_range(ioc, size);
+ iovp = (dma_addr_t) pide << IOVP_SHIFT;
+
+ DBG_RUN("%s() 0x%p -> 0x%lx\n",
+ __FUNCTION__, addr, (long) iovp | offset);
+
+ pdir_start = &(ioc->pdir_base[pide]);
+
+ while (size > 0) {
+ ASSERT(((u8 *)pdir_start)[7] == 0); /* verify availability */
+ sba_io_pdir_entry(pdir_start, (unsigned long) addr);
+
+ DBG_RUN(" pdir 0x%p %lx\n", pdir_start, *pdir_start);
+
+ addr += IOVP_SIZE;
+ size -= IOVP_SIZE;
+ pdir_start++;
+ }
+ /* form complete address */
+#ifdef ASSERT_PDIR_SANITY
+ sba_check_pdir(ioc,"Check after sba_map_single()");
+#endif
+ spin_unlock_irqrestore(&ioc->res_lock, flags);
+ return SBA_IOVA(ioc, iovp, offset, DEFAULT_DMA_HINT_REG);
+}
+
+/**
+ * sba_unmap_single - unmap one IOVA and free resources
+ * @dev: instance of PCI owned by the driver that's asking.
+ * @iova: IOVA of driver buffer previously mapped.
+ * @size: number of bytes mapped in driver buffer.
+ * @direction: R/W or both.
+ *
+ * See Documentation/DMA-mapping.txt
+ */
+void sba_unmap_single(struct pci_dev *dev, dma_addr_t iova, size_t size,
+ int direction)
+{
+ struct ioc *ioc;
+#if DELAYED_RESOURCE_CNT > 0
+ struct sba_dma_pair *d;
+#endif
+ unsigned long flags;
+ dma_addr_t offset;
+
+ ioc = GET_IOC(dev);
+ ASSERT(ioc);
+
+#ifdef ALLOW_IOV_BYPASS
+ if ((iova & ioc->imask) != ioc->ibase) {
+ /*
+ ** Address does not fall w/in IOVA, must be bypassing
+ */
+#ifdef CONFIG_PROC_FS
+ spin_lock_irqsave(&ioc->res_lock, flags);
+ ioc->usingle_bypass++;
+ spin_unlock_irqrestore(&ioc->res_lock, flags);
+#endif
+ DBG_BYPASS("sba_unmap_single() bypass addr: 0x%lx\n", iova);
+
+#ifdef ENABLE_MARK_CLEAN
+ if (direction == PCI_DMA_FROMDEVICE) {
+ mark_clean(phys_to_virt(iova), size);
+ }
+#endif
+ return;
+ }
+#endif
+ offset = iova & ~IOVP_MASK;
+
+ DBG_RUN("%s() iovp 0x%lx/%x\n",
+ __FUNCTION__, (long) iova, size);
+
+ iova ^= offset; /* clear offset bits */
+ size += offset;
+ size = ROUNDUP(size, IOVP_SIZE);
+
+ spin_lock_irqsave(&ioc->res_lock, flags);
+#ifdef CONFIG_PROC_FS
+ ioc->usingle_calls++;
+ ioc->usingle_pages += size >> IOVP_SHIFT;
+#endif
+
+#if DELAYED_RESOURCE_CNT > 0
+ d = &(ioc->saved[ioc->saved_cnt]);
+ d->iova = iova;
+ d->size = size;
+ if (++(ioc->saved_cnt) >= DELAYED_RESOURCE_CNT) {
+ int cnt = ioc->saved_cnt;
+ while (cnt--) {
+ sba_mark_invalid(ioc, d->iova, d->size);
+ sba_free_range(ioc, d->iova, d->size);
+ d--;
+ }
+ ioc->saved_cnt = 0;
+ READ_REG(ioc->ioc_hpa+IOC_PCOM); /* flush purges */
+ }
+#else /* DELAYED_RESOURCE_CNT == 0 */
+ sba_mark_invalid(ioc, iova, size);
+ sba_free_range(ioc, iova, size);
+ READ_REG(ioc->ioc_hpa+IOC_PCOM); /* flush purges */
+#endif /* DELAYED_RESOURCE_CNT == 0 */
+#ifdef ENABLE_MARK_CLEAN
+ if (direction == PCI_DMA_FROMDEVICE) {
+ u32 iovp = (u32) SBA_IOVP(ioc,iova);
+ int off = PDIR_INDEX(iovp);
+ void *addr;
+
+ if (size <= IOVP_SIZE) {
+ addr = phys_to_virt(ioc->pdir_base[off] &
+ ~0xE000000000000FFFULL);
+ mark_clean(addr, size);
+ } else {
+ size_t byte_cnt = size;
+
+ do {
+ addr = phys_to_virt(ioc->pdir_base[off] &
+ ~0xE000000000000FFFULL);
+ mark_clean(addr, min(byte_cnt, IOVP_SIZE));
+ off++;
+ byte_cnt -= IOVP_SIZE;
+
+ } while (byte_cnt > 0);
+ }
+ }
+#endif
+ spin_unlock_irqrestore(&ioc->res_lock, flags);
+
+ /* XXX REVISIT for 2.5 Linux - need syncdma for zero-copy support.
+ ** For Astro based systems this isn't a big deal WRT performance.
+ ** As long as 2.4 kernels copyin/copyout data from/to userspace,
+ ** we don't need the syncdma. The issue here is I/O MMU cachelines
+ ** are *not* coherent in all cases. May be hwrev dependent.
+ ** Need to investigate more.
+ asm volatile("syncdma");
+ */
+}
+
+
+/**
+ * sba_alloc_consistent - allocate/map shared mem for DMA
+ * @hwdev: instance of PCI owned by the driver that's asking.
+ * @size: number of bytes mapped in driver buffer.
+ * @dma_handle: IOVA of new buffer.
+ *
+ * See Documentation/DMA-mapping.txt
+ */
+void *
+sba_alloc_consistent(struct pci_dev *hwdev, size_t size, dma_addr_t *dma_handle)
+{
+ void *ret;
+
+ if (!hwdev) {
+ /* only support PCI */
+ *dma_handle = 0;
+ return 0;
+ }
+
+ ret = (void *) __get_free_pages(GFP_ATOMIC, get_order(size));
+
+ if (ret) {
+ memset(ret, 0, size);
+ *dma_handle = sba_map_single(hwdev, ret, size, 0);
+ }
+
+ return ret;
+}
+
+
+/**
+ * sba_free_consistent - free/unmap shared mem for DMA
+ * @hwdev: instance of PCI owned by the driver that's asking.
+ * @size: number of bytes mapped in driver buffer.
+ * @vaddr: virtual address of "consistent" buffer.
+ * @dma_handle: IO virtual address of "consistent" buffer.
+ *
+ * See Documentation/DMA-mapping.txt
+ */
+void sba_free_consistent(struct pci_dev *hwdev, size_t size, void *vaddr,
+ dma_addr_t dma_handle)
+{
+ sba_unmap_single(hwdev, dma_handle, size, 0);
+ free_pages((unsigned long) vaddr, get_order(size));
+}
+
+
+/*
+** Since 0 is a valid pdir_base index value, can't use that
+** to determine if a value is valid or not. Use a flag to indicate
+** the SG list entry contains a valid pdir index.
+*/
+#define PIDE_FLAG 0x1UL
+
+#ifdef DEBUG_LARGE_SG_ENTRIES
+int dump_run_sg = 0;
+#endif
+
+
+/**
+ * sba_fill_pdir - write allocated SG entries into IO PDIR
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @startsg: list of IOVA/size pairs
+ * @nents: number of entries in startsg list
+ *
+ * Take preprocessed SG list and write corresponding entries
+ * in the IO PDIR.
+ */
+
+static SBA_INLINE int
+sba_fill_pdir(
+ struct ioc *ioc,
+ struct scatterlist *startsg,
+ int nents)
+{
+ struct scatterlist *dma_sg = startsg; /* pointer to current DMA */
+ int n_mappings = 0;
+ u64 *pdirp = 0;
+ unsigned long dma_offset = 0;
+
+ dma_sg--;
+ while (nents-- > 0) {
+ int cnt = sba_sg_len(startsg);
+ sba_sg_len(startsg) = 0;
+
+#ifdef DEBUG_LARGE_SG_ENTRIES
+ if (dump_run_sg)
+ printk(" %2d : %08lx/%05x %p\n",
+ nents,
+ (unsigned long) sba_sg_iova(startsg), cnt,
+ sba_sg_buffer(startsg)
+ );
+#else
+ DBG_RUN_SG(" %d : %08lx/%05x %p\n",
+ nents,
+ (unsigned long) sba_sg_iova(startsg), cnt,
+ sba_sg_buffer(startsg)
+ );
+#endif
+ /*
+ ** Look for the start of a new DMA stream
+ */
+ if ((u64)sba_sg_iova(startsg) & PIDE_FLAG) {
+ u32 pide = (u64)sba_sg_iova(startsg) & ~PIDE_FLAG;
+ dma_offset = (unsigned long) pide & ~IOVP_MASK;
+ sba_sg_iova(startsg) = 0;
+ dma_sg++;
+ sba_sg_iova(dma_sg) = (char *)(pide | ioc->ibase);
+ pdirp = &(ioc->pdir_base[pide >> IOVP_SHIFT]);
+ n_mappings++;
+ }
+
+ /*
+ ** Look for a VCONTIG chunk
+ */
+ if (cnt) {
+ unsigned long vaddr = (unsigned long) sba_sg_buffer(startsg);
+ ASSERT(pdirp);
+
+ /* Since multiple Vcontig blocks could make up
+ ** one DMA stream, *add* cnt to dma_len.
+ */
+ sba_sg_len(dma_sg) += cnt;
+ cnt += dma_offset;
+ dma_offset=0; /* only want offset on first chunk */
+ cnt = ROUNDUP(cnt, IOVP_SIZE);
+#ifdef CONFIG_PROC_FS
+ ioc->msg_pages += cnt >> IOVP_SHIFT;
+#endif
+ do {
+ sba_io_pdir_entry(pdirp, vaddr);
+ vaddr += IOVP_SIZE;
+ cnt -= IOVP_SIZE;
+ pdirp++;
+ } while (cnt > 0);
+ }
+ startsg++;
+ }
+#ifdef DEBUG_LARGE_SG_ENTRIES
+ dump_run_sg = 0;
+#endif
+ return(n_mappings);
+}
+
+
+/*
+** Two address ranges are DMA contiguous *iff* "end of prev" and
+** "start of next" are both on a page boundary.
+**
+** (shift left is a quick trick to mask off upper bits)
+*/
+#define DMA_CONTIG(__X, __Y) \
+ (((((unsigned long) __X) | ((unsigned long) __Y)) << (BITS_PER_LONG - PAGE_SHIFT)) == 0UL)
+
+
+/**
+ * sba_coalesce_chunks - preprocess the SG list
+ * @ioc: IO MMU structure which owns the pdir we are interested in.
+ * @startsg: list of IOVA/size pairs
+ * @nents: number of entries in startsg list
+ *
+ * First pass is to walk the SG list and determine where the breaks are
+ * in the DMA stream. Allocates PDIR entries but does not fill them.
+ * Returns the number of DMA chunks.
+ *
+ * Doing the fill separately from the coalescing/allocation keeps the
+ * code simpler. Future enhancement could make one pass through
+ * the sglist do both.
+ */
+static SBA_INLINE int
+sba_coalesce_chunks( struct ioc *ioc,
+ struct scatterlist *startsg,
+ int nents)
+{
+ struct scatterlist *vcontig_sg; /* VCONTIG chunk head */
+ unsigned long vcontig_len; /* len of VCONTIG chunk */
+ unsigned long vcontig_end;
+ struct scatterlist *dma_sg; /* next DMA stream head */
+ unsigned long dma_offset, dma_len; /* start/len of DMA stream */
+ int n_mappings = 0;
+
+ while (nents > 0) {
+ unsigned long vaddr = (unsigned long) (startsg->address);
+
+ /*
+ ** Prepare for first/next DMA stream
+ */
+ dma_sg = vcontig_sg = startsg;
+ dma_len = vcontig_len = vcontig_end = sba_sg_len(startsg);
+ vcontig_end += vaddr;
+ dma_offset = vaddr & ~IOVP_MASK;
+
+ /* PARANOID: clear entries */
+ sba_sg_buffer(startsg) = sba_sg_iova(startsg);
+ sba_sg_iova(startsg) = 0;
+ sba_sg_len(startsg) = 0;
+
+ /*
+ ** This loop terminates one iteration "early" since
+ ** it's always looking one "ahead".
+ */
+ while (--nents > 0) {
+ unsigned long vaddr; /* tmp */
+
+ startsg++;
+
+ /* catch brokenness in SCSI layer */
+ ASSERT(startsg->length <= DMA_CHUNK_SIZE);
+
+ /*
+ ** First make sure current dma stream won't
+ ** exceed DMA_CHUNK_SIZE if we coalesce the
+ ** next entry.
+ */
+ if (((dma_len + dma_offset + startsg->length + ~IOVP_MASK) & IOVP_MASK) > DMA_CHUNK_SIZE)
+ break;
+
+ /*
+ ** Then look for virtually contiguous blocks.
+ **
+ ** append the next transaction?
+ */
+ vaddr = (unsigned long) sba_sg_iova(startsg);
+ if (vcontig_end == vaddr)
+ {
+ vcontig_len += sba_sg_len(startsg);
+ vcontig_end += sba_sg_len(startsg);
+ dma_len += sba_sg_len(startsg);
+ sba_sg_buffer(startsg) = (char *)vaddr;
+ sba_sg_iova(startsg) = 0;
+ sba_sg_len(startsg) = 0;
+ continue;
+ }
+
+#ifdef DEBUG_LARGE_SG_ENTRIES
+ dump_run_sg = (vcontig_len > IOVP_SIZE);
+#endif
+
+ /*
+ ** Not virtually contiguous.
+ ** Terminate prev chunk.
+ ** Start a new chunk.
+ **
+ ** Once we start a new VCONTIG chunk, dma_offset
+ ** can't change. And we need the offset from the first
+ ** chunk - not the last one. Ergo, successive chunks
+ ** must start on page boundaries and dovetail
+ ** with their predecessor.
+ */
+ sba_sg_len(vcontig_sg) = vcontig_len;
+
+ vcontig_sg = startsg;
+ vcontig_len = sba_sg_len(startsg);
+
+ /*
+ ** 3) do the entries end/start on page boundaries?
+ ** Don't update vcontig_end until we've checked.
+ */
+ if (DMA_CONTIG(vcontig_end, vaddr))
+ {
+ vcontig_end = vcontig_len + vaddr;
+ dma_len += vcontig_len;
+ sba_sg_buffer(startsg) = (char *)vaddr;
+ sba_sg_iova(startsg) = 0;
+ continue;
+ } else {
+ break;
+ }
+ }
+
+ /*
+ ** End of DMA Stream
+ ** Terminate last VCONTIG block.
+ ** Allocate space for DMA stream.
+ */
+ sba_sg_len(vcontig_sg) = vcontig_len;
+ dma_len = (dma_len + dma_offset + ~IOVP_MASK) & IOVP_MASK;
+ ASSERT(dma_len <= DMA_CHUNK_SIZE);
+ sba_sg_iova(dma_sg) = (char *) (PIDE_FLAG
+ | (sba_alloc_range(ioc, dma_len) << IOVP_SHIFT)
+ | dma_offset);
+ n_mappings++;
+ }
+
+ return n_mappings;
+}
+
+
+/**
+ * sba_map_sg - map Scatter/Gather list
+ * @dev: instance of PCI owned by the driver that's asking.
+ * @sglist: array of buffer/length pairs
+ * @nents: number of entries in list
+ * @direction: R/W or both.
+ *
+ * See Documentation/DMA-mapping.txt
+ */
+int sba_map_sg(struct pci_dev *dev, struct scatterlist *sglist, int nents,
+ int direction)
+{
+ struct ioc *ioc;
+ int coalesced, filled = 0;
+ unsigned long flags;
+#ifdef ALLOW_IOV_BYPASS
+ struct scatterlist *sg;
+#endif
+
+ DBG_RUN_SG("%s() START %d entries\n", __FUNCTION__, nents);
+ ioc = GET_IOC(dev);
+ ASSERT(ioc);
+
+#ifdef ALLOW_IOV_BYPASS
+ if (dev->dma_mask >= ioc->dma_mask) {
+ for (sg = sglist ; filled < nents ; filled++, sg++){
+ sba_sg_buffer(sg) = sba_sg_iova(sg);
+ sba_sg_iova(sg) = (char *)virt_to_phys(sba_sg_buffer(sg));
+ }
+#ifdef CONFIG_PROC_FS
+ spin_lock_irqsave(&ioc->res_lock, flags);
+ ioc->msg_bypass++;
+ spin_unlock_irqrestore(&ioc->res_lock, flags);
+#endif
+ return filled;
+ }
+#endif
+ /* Fast path single entry scatterlists. */
+ if (nents == 1) {
+ sba_sg_buffer(sglist) = sba_sg_iova(sglist);
+ sba_sg_iova(sglist) = (char *)sba_map_single(dev,
+ sba_sg_buffer(sglist),
+ sba_sg_len(sglist), direction);
+#ifdef CONFIG_PROC_FS
+ /*
+ ** Should probably do some stats counting, but trying to
+ ** be precise quickly starts wasting CPU time.
+ */
+#endif
+ return 1;
+ }
+
+ spin_lock_irqsave(&ioc->res_lock, flags);
+
+#ifdef ASSERT_PDIR_SANITY
+ if (sba_check_pdir(ioc,"Check before sba_map_sg()"))
+ {
+ sba_dump_sg(ioc, sglist, nents);
+ panic("Check before sba_map_sg()");
+ }
+#endif
+
+#ifdef CONFIG_PROC_FS
+ ioc->msg_calls++;
+#endif
+
+ /*
+ ** First coalesce the chunks and allocate I/O pdir space
+ **
+ ** If this is one DMA stream, we can properly map using the
+ ** correct virtual address associated with each DMA page.
+ ** w/o this association, we wouldn't have coherent DMA!
+ ** Access to the virtual address is what forces a two pass algorithm.
+ */
+ coalesced = sba_coalesce_chunks(ioc, sglist, nents);
+
+ /*
+ ** Program the I/O Pdir
+ **
+ ** map the virtual addresses to the I/O Pdir
+ ** o dma_address will contain the pdir index
+ ** o dma_len will contain the number of bytes to map
+ ** o address contains the virtual address.
+ */
+ filled = sba_fill_pdir(ioc, sglist, nents);
+
+#ifdef ASSERT_PDIR_SANITY
+ if (sba_check_pdir(ioc,"Check after sba_map_sg()"))
+ {
+ sba_dump_sg(ioc, sglist, nents);
+ panic("Check after sba_map_sg()\n");
+ }
+#endif
+
+ spin_unlock_irqrestore(&ioc->res_lock, flags);
+
+ ASSERT(coalesced == filled);
+ DBG_RUN_SG("%s() DONE %d mappings\n", __FUNCTION__, filled);
+
+ return filled;
+}
+
+
+/**
+ * sba_unmap_sg - unmap Scatter/Gather list
+ * @dev: instance of PCI owned by the driver that's asking.
+ * @sglist: array of buffer/length pairs
+ * @nents: number of entries in list
+ * @direction: R/W or both.
+ *
+ * See Documentation/DMA-mapping.txt
+ */
+void sba_unmap_sg(struct pci_dev *dev, struct scatterlist *sglist, int nents,
+ int direction)
+{
+ struct ioc *ioc;
+#ifdef ASSERT_PDIR_SANITY
+ unsigned long flags;
+#endif
+
+ DBG_RUN_SG("%s() START %d entries, %p,%x\n",
+ __FUNCTION__, nents, sba_sg_buffer(sglist), sglist->length);
+
+ ioc = GET_IOC(dev);
+ ASSERT(ioc);
+
+#ifdef CONFIG_PROC_FS
+ ioc->usg_calls++;
+#endif
+
+#ifdef ASSERT_PDIR_SANITY
+ spin_lock_irqsave(&ioc->res_lock, flags);
+ sba_check_pdir(ioc,"Check before sba_unmap_sg()");
+ spin_unlock_irqrestore(&ioc->res_lock, flags);
+#endif
+
+ while (sba_sg_len(sglist) && nents--) {
+
+ sba_unmap_single(dev, (dma_addr_t)sba_sg_iova(sglist),
+ sba_sg_len(sglist), direction);
+#ifdef CONFIG_PROC_FS
+ /*
+ ** This leaves inconsistent data in the stats, but we can't
+ ** tell which sg lists were mapped by map_single and which
+ ** were coalesced to a single entry. The stats are fun,
+ ** but speed is more important.
+ */
+ ioc->usg_pages += (((u64)sba_sg_iova(sglist) & ~IOVP_MASK) + sba_sg_len(sglist) + IOVP_SIZE - 1) >> PAGE_SHIFT;
+#endif
+ ++sglist;
+ }
+
+ DBG_RUN_SG("%s() DONE (nents %d)\n", __FUNCTION__, nents);
+
+#ifdef ASSERT_PDIR_SANITY
+ spin_lock_irqsave(&ioc->res_lock, flags);
+ sba_check_pdir(ioc,"Check after sba_unmap_sg()");
+ spin_unlock_irqrestore(&ioc->res_lock, flags);
+#endif
+
+}
+
+unsigned long
+sba_dma_address (struct scatterlist *sg)
+{
+ return ((unsigned long)sba_sg_iova(sg));
+}
+
+/**************************************************************
+*
+* Initialization and claim
+*
+***************************************************************/
+
+
+static void
+sba_ioc_init(struct sba_device *sba_dev, struct ioc *ioc, int ioc_num)
+{
+ u32 iova_space_size, iova_space_mask;
+ void * pdir_base;
+ int pdir_size, iov_order, tcnfg;
+
+ /*
+ ** Firmware programs the maximum IOV space size into the imask reg
+ */
+ iova_space_size = ~(READ_REG(ioc->ioc_hpa + IOC_IMASK) & 0xFFFFFFFFUL) + 1;
+#ifdef CONFIG_IA64_HP_PROTO
+ if (!iova_space_size)
+ iova_space_size = GB(1);
+#endif
+
+ /*
+ ** iov_order is always based on a 1GB IOVA space since we want to
+ ** turn on the other half for AGP GART.
+ */
+ iov_order = get_order(iova_space_size >> (IOVP_SHIFT-PAGE_SHIFT));
+ ioc->pdir_size = pdir_size = (iova_space_size/IOVP_SIZE) * sizeof(u64);
+
+ DBG_INIT("%s() hpa 0x%lx IOV %dMB (%d bits) PDIR size 0x%0x\n",
+ __FUNCTION__, ioc->ioc_hpa, iova_space_size>>20,
+ iov_order + PAGE_SHIFT, ioc->pdir_size);
+
+ /* FIXME : DMA HINTs not used */
+ ioc->hint_shift_pdir = iov_order + PAGE_SHIFT;
+ ioc->hint_mask_pdir = ~(0x3 << (iov_order + PAGE_SHIFT));
+
+ ioc->pdir_base =
+ pdir_base = (void *) __get_free_pages(GFP_KERNEL, get_order(pdir_size));
+ if (NULL == pdir_base)
+ {
+ panic(__FILE__ ":%s() could not allocate I/O Page Table\n", __FUNCTION__);
+ }
+ memset(pdir_base, 0, pdir_size);
+
+ DBG_INIT("%s() pdir %p size %x hint_shift_pdir %x hint_mask_pdir %lx\n",
+ __FUNCTION__, pdir_base, pdir_size,
+ ioc->hint_shift_pdir, ioc->hint_mask_pdir);
+
+ ASSERT((((unsigned long) pdir_base) & PAGE_MASK) == (unsigned long) pdir_base);
+ WRITE_REG(virt_to_phys(pdir_base), ioc->ioc_hpa + IOC_PDIR_BASE);
+
+ DBG_INIT(" base %p\n", pdir_base);
+
+ /* build IMASK for IOC and Elroy */
+ iova_space_mask = 0xffffffff;
+ iova_space_mask <<= (iov_order + PAGE_SHIFT);
+
+#ifdef CONFIG_IA64_HP_PROTO
+ /*
+ ** REVISIT - this is a kludge, but we won't be supporting anything but
+ ** zx1 2.0 or greater for real. When fw is in shape, ibase will
+ ** be preprogrammed w/ the IOVA hole base and imask will give us
+ ** the size.
+ */
+ if ((sba_dev->hw_rev & 0xFF) < 0x20) {
+ DBG_INIT("%s() Found SBA rev < 2.0, setting IOVA base to 0. This device will not be supported in the future.\n", __FUNCTION__);
+ ioc->ibase = 0x0;
+ } else
+#endif
+ ioc->ibase = READ_REG(ioc->ioc_hpa + IOC_IBASE) & 0xFFFFFFFEUL;
+
+ ioc->imask = iova_space_mask; /* save it */
+
+ DBG_INIT("%s() IOV base 0x%lx mask 0x%0lx\n",
+ __FUNCTION__, ioc->ibase, ioc->imask);
+
+ /*
+ ** FIXME: Hint registers are programmed with default hint
+ ** values during boot, so hints should be sane even if we
+ ** can't reprogram them the way drivers want.
+ */
+
+ WRITE_REG(ioc->imask, ioc->ioc_hpa+IOC_IMASK);
+
+ /*
+ ** Setting the upper bits makes checking for bypass addresses
+ ** a little faster later on.
+ */
+ ioc->imask |= 0xFFFFFFFF00000000UL;
+
+ /* Set I/O PDIR Page size to system page size */
+ switch (PAGE_SHIFT) {
+ case 12: /* 4K */
+ tcnfg = 0;
+ break;
+ case 13: /* 8K */
+ tcnfg = 1;
+ break;
+ case 14: /* 16K */
+ tcnfg = 2;
+ break;
+ case 16: /* 64K */
+ tcnfg = 3;
+ break;
+ }
+ WRITE_REG(tcnfg, ioc->ioc_hpa+IOC_TCNFG);
+
+ /*
+ ** Program the IOC's ibase and enable IOVA translation
+ ** Bit zero == enable bit.
+ */
+ WRITE_REG(ioc->ibase | 1, ioc->ioc_hpa+IOC_IBASE);
+
+ /*
+ ** Clear I/O TLB of any possible entries.
+ ** (Yes. This is a bit paranoid...but so what)
+ */
+ WRITE_REG(0 | 31, ioc->ioc_hpa+IOC_PCOM);
+
+ /*
+ ** If an AGP device is present, only use half of the IOV space
+ ** for PCI DMA. Unfortunately we can't know ahead of time
+ ** whether GART support will actually be used, for now we
+ ** can just key on an AGP device found in the system.
+ ** We program the next pdir index after we stop w/ a key for
+ ** the GART code to handshake on.
+ */
+ if (SBA_GET_AGP(sba_dev)) {
+ DBG_INIT("%s() AGP Device found, reserving 512MB for GART support\n", __FUNCTION__);
+ ioc->pdir_size /= 2;
+ ((u64 *)pdir_base)[PDIR_INDEX(iova_space_size/2)] = 0x0000badbadc0ffeeULL;
+ }
+
+ DBG_INIT("%s() DONE\n", __FUNCTION__);
+}
+
+
+
+/**************************************************************************
+**
+** SBA initialization code (HW and SW)
+**
+** o identify SBA chip itself
+** o FIXME: initialize DMA hints for reasonable defaults
+**
+**************************************************************************/
+
+static void
+sba_hw_init(struct sba_device *sba_dev)
+{
+ int i;
+ int num_ioc;
+ u64 dma_mask;
+ u32 func_id;
+
+ /*
+ ** Identify the SBA so we can set the dma_mask. We can make a virtual
+ ** dma_mask of the memory subsystem such that devices not implementing
+ ** a full 64bit mask might still be able to bypass efficiently.
+ */
+ func_id = READ_REG(sba_dev->sba_hpa + SBA_FUNC_ID);
+
+ if (func_id == ZX1_FUNC_ID_VALUE) {
+ dma_mask = 0xFFFFFFFFFFUL;
+ } else {
+ dma_mask = 0xFFFFFFFFFFFFFFFFUL;
+ }
+
+ DBG_INIT("%s(): ioc->dma_mask == 0x%lx\n", __FUNCTION__, dma_mask);
+
+ /*
+ ** Leaving in the multiple IOC code from parisc for the future;
+ ** currently there are no multi-IOC McKinley SBAs.
+ */
+ sba_dev->ioc[0].ioc_hpa = SBA_IOC_OFFSET;
+ num_ioc = 1;
+
+ sba_dev->num_ioc = num_ioc;
+ for (i = 0; i < num_ioc; i++) {
+ sba_dev->ioc[i].dma_mask = dma_mask;
+ sba_dev->ioc[i].ioc_hpa += sba_dev->sba_hpa;
+ sba_ioc_init(sba_dev, &(sba_dev->ioc[i]), i);
+ }
+}
+
+static void
+sba_common_init(struct sba_device *sba_dev)
+{
+ int i;
+
+ /* add this one to the head of the list (order doesn't matter)
+ ** This will be useful for debugging - especially if we get coredumps
+ */
+ sba_dev->next = sba_list;
+ sba_list = sba_dev;
+ sba_count++;
+
+ for(i=0; i< sba_dev->num_ioc; i++) {
+ int res_size;
+
+ /* resource map size dictated by pdir_size */
+ res_size = sba_dev->ioc[i].pdir_size/sizeof(u64); /* entries */
+ res_size >>= 3; /* convert bit count to byte count */
+ DBG_INIT("%s() res_size 0x%x\n",
+ __FUNCTION__, res_size);
+
+ sba_dev->ioc[i].res_size = res_size;
+ sba_dev->ioc[i].res_map = (char *) __get_free_pages(GFP_KERNEL, get_order(res_size));
+
+ if (NULL == sba_dev->ioc[i].res_map)
+ {
+ panic(__FILE__ ":%s() could not allocate resource map\n", __FUNCTION__ );
+ }
+
+ memset(sba_dev->ioc[i].res_map, 0, res_size);
+ /* next available IOVP - circular search */
+ if ((sba_dev->hw_rev & 0xFF) >= 0x20) {
+ sba_dev->ioc[i].res_hint = (unsigned long *)
+ sba_dev->ioc[i].res_map;
+ } else {
+ u64 reserved_iov;
+
+ /* Yet another 1.x hack */
+ printk("zx1 1.x: Starting resource hint offset into IOV space to avoid initial zero value IOVA\n");
+ sba_dev->ioc[i].res_hint = (unsigned long *)
+ &(sba_dev->ioc[i].res_map[L1_CACHE_BYTES]);
+
+ sba_dev->ioc[i].res_map[0] = 0x1;
+ sba_dev->ioc[i].pdir_base[0] = 0x8000badbadc0ffeeULL;
+
+ for (reserved_iov = 0xA0000 ; reserved_iov < 0xC0000 ; reserved_iov += IOVP_SIZE) {
+ u64 *res_ptr = sba_dev->ioc[i].res_map;
+ int index = PDIR_INDEX(reserved_iov);
+ int res_word;
+ u64 mask;
+
+ res_word = (int)(index / BITS_PER_LONG);
+ mask = 0x1UL << (index - (res_word * BITS_PER_LONG));
+ res_ptr[res_word] |= mask;
+ sba_dev->ioc[i].pdir_base[PDIR_INDEX(reserved_iov)] = (0x80000000000000FFULL | reserved_iov);
+
+ }
+ }
+
+#ifdef ASSERT_PDIR_SANITY
+ /* Mark first bit busy - ie no IOVA 0 */
+ sba_dev->ioc[i].res_map[0] = 0x1;
+ sba_dev->ioc[i].pdir_base[0] = 0x8000badbadc0ffeeULL;
+#endif
+
+ DBG_INIT("%s() %d res_map %x %p\n", __FUNCTION__,
+ i, res_size, (void *)sba_dev->ioc[i].res_map);
+ }
+
+ sba_dev->sba_lock = SPIN_LOCK_UNLOCKED;
+}
+
+#ifdef CONFIG_PROC_FS
+static int sba_proc_info(char *buf, char **start, off_t offset, int len)
+{
+ struct sba_device *sba_dev = sba_list;
+ struct ioc *ioc = &sba_dev->ioc[0]; /* FIXME: Multi-IOC support! */
+ int total_pages = (int) (ioc->res_size << 3); /* 8 bits per byte */
+ unsigned long i = 0, avg = 0, min, max;
+
+ sprintf(buf, "%s rev %d.%d\n",
+ "Hewlett Packard zx1 SBA",
+ ((sba_dev->hw_rev >> 4) & 0xF),
+ (sba_dev->hw_rev & 0xF)
+ );
+ sprintf(buf, "%sIO PDIR size : %d bytes (%d entries)\n",
+ buf,
+ (int) ((ioc->res_size << 3) * sizeof(u64)), /* 8 bits/byte */
+ total_pages);
+
+ sprintf(buf, "%sIO PDIR entries : %ld free %ld used (%d%%)\n", buf,
+ total_pages - ioc->used_pages, ioc->used_pages,
+ (int) (ioc->used_pages * 100 / total_pages));
+
+ sprintf(buf, "%sResource bitmap : %d bytes (%d pages)\n",
+ buf, ioc->res_size, ioc->res_size << 3); /* 8 bits per byte */
+
+ min = max = ioc->avg_search[0];
+ for (i = 0; i < SBA_SEARCH_SAMPLE; i++) {
+ avg += ioc->avg_search[i];
+ if (ioc->avg_search[i] > max) max = ioc->avg_search[i];
+ if (ioc->avg_search[i] < min) min = ioc->avg_search[i];
+ }
+ avg /= SBA_SEARCH_SAMPLE;
+ sprintf(buf, "%s Bitmap search : %ld/%ld/%ld (min/avg/max CPU Cycles)\n",
+ buf, min, avg, max);
+
+ sprintf(buf, "%spci_map_single(): %12ld calls %12ld pages (avg %d/1000)\n",
+ buf, ioc->msingle_calls, ioc->msingle_pages,
+ (int) ((ioc->msingle_pages * 1000)/ioc->msingle_calls));
+#ifdef ALLOW_IOV_BYPASS
+ sprintf(buf, "%spci_map_single(): %12ld bypasses\n",
+ buf, ioc->msingle_bypass);
+#endif
+
+ sprintf(buf, "%spci_unmap_single: %12ld calls %12ld pages (avg %d/1000)\n",
+ buf, ioc->usingle_calls, ioc->usingle_pages,
+ (int) ((ioc->usingle_pages * 1000)/ioc->usingle_calls));
+#ifdef ALLOW_IOV_BYPASS
+ sprintf(buf, "%spci_unmap_single: %12ld bypasses\n",
+ buf, ioc->usingle_bypass);
+#endif
+
+ sprintf(buf, "%spci_map_sg() : %12ld calls %12ld pages (avg %d/1000)\n",
+ buf, ioc->msg_calls, ioc->msg_pages,
+ (int) ((ioc->msg_pages * 1000)/ioc->msg_calls));
+#ifdef ALLOW_IOV_BYPASS
+ sprintf(buf, "%spci_map_sg() : %12ld bypasses\n",
+ buf, ioc->msg_bypass);
+#endif
+
+ sprintf(buf, "%spci_unmap_sg() : %12ld calls %12ld pages (avg %d/1000)\n",
+ buf, ioc->usg_calls, ioc->usg_pages,
+ (int) ((ioc->usg_pages * 1000)/ioc->usg_calls));
+
+ return strlen(buf);
+}
+
+static int
+sba_resource_map(char *buf, char **start, off_t offset, int len)
+{
+ struct ioc *ioc = sba_list->ioc; /* FIXME: Multi-IOC support! */
+ unsigned int *res_ptr = (unsigned int *)ioc->res_map;
+ int i;
+
+ buf[0] = '\0';
+ for(i = 0; i < (ioc->res_size / sizeof(unsigned int)); ++i, ++res_ptr) {
+ if ((i & 7) == 0)
+ strcat(buf,"\n ");
+ sprintf(buf, "%s %08x", buf, *res_ptr);
+ }
+ strcat(buf, "\n");
+
+ return strlen(buf);
+}
+#endif
+
+/*
+** Determine whether sba should claim this chip; if so, initialize the
+** chip and tell the other partners in crime they have work to do.
+*/
+void __init sba_init(void)
+{
+ struct sba_device *sba_dev;
+ u32 func_id, hw_rev;
+ u32 *func_offset = NULL;
+ int i, agp_found = 0;
+ static char sba_rev[6];
+ struct pci_dev *device = NULL;
+ u64 hpa = 0;
+
+ if (!(device = pci_find_device(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_ZX1_SBA, NULL)))
+ return;
+
+ for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+ if (pci_resource_flags(device, i) == IORESOURCE_MEM) {
+ hpa = ioremap(pci_resource_start(device, i),
+ pci_resource_len(device, i));
+ break;
+ }
+ }
+
+ func_id = READ_REG(hpa + SBA_FUNC_ID);
+
+ if (func_id == ZX1_FUNC_ID_VALUE) {
+ (void)strcpy(sba_rev, "zx1");
+ func_offset = zx1_func_offsets;
+ } else {
+ return;
+ }
+
+ /* Read HW Rev First */
+ hw_rev = READ_REG(hpa + SBA_FCLASS) & 0xFFUL;
+
+ /*
+ * Not all revision registers of the chipset are updated on every
+ * turn. Must scan through all functions looking for the highest rev
+ */
+ if (func_offset) {
+ for (i = 0 ; func_offset[i] != -1 ; i++) {
+ u32 func_rev;
+
+ func_rev = READ_REG(hpa + SBA_FCLASS + func_offset[i]) & 0xFFUL;
+ DBG_INIT("%s() func offset: 0x%x rev: 0x%x\n",
+ __FUNCTION__, func_offset[i], func_rev);
+ if (func_rev > hw_rev)
+ hw_rev = func_rev;
+ }
+ }
+
+ printk(KERN_INFO "%s found %s %d.%d at %s, HPA 0x%lx\n", DRIVER_NAME,
+ sba_rev, ((hw_rev >> 4) & 0xF), (hw_rev & 0xF),
+ device->slot_name, hpa);
+
+ if ((hw_rev & 0xFF) < 0x20) {
+ printk(KERN_INFO "%s WARNING rev 2.0 or greater will be required for IO MMU support in the future\n", DRIVER_NAME);
+#ifndef CONFIG_IA64_HP_PROTO
+ panic("%s: CONFIG_IA64_HP_PROTO MUST be enabled to support SBA rev less than 2.0", DRIVER_NAME);
+#endif
+ }
+
+ sba_dev = kmalloc(sizeof(struct sba_device), GFP_KERNEL);
+ if (NULL == sba_dev) {
+ printk(KERN_ERR DRIVER_NAME " - couldn't alloc sba_device\n");
+ return;
+ }
+
+ memset(sba_dev, 0, sizeof(struct sba_device));
+
+	for(i=0; i<MAX_IOC; i++) spin_lock_init(&(sba_dev->ioc[i].res_lock));
+
+ sba_dev->hw_rev = hw_rev;
+ sba_dev->sba_hpa = hpa;
+
+ /*
+ * We need to check for an AGP device, if we find one, then only
+ * use part of the IOVA space for PCI DMA, the rest is for GART.
+ * REVISIT for multiple IOC.
+ */
+ pci_for_each_dev(device)
+ agp_found |= pci_find_capability(device, PCI_CAP_ID_AGP);
+
+ if (agp_found && reserve_sba_gart)
+ SBA_SET_AGP(sba_dev);
+
+ sba_hw_init(sba_dev);
+ sba_common_init(sba_dev);
+
+#ifdef CONFIG_PROC_FS
+ {
+ struct proc_dir_entry * proc_mckinley_root;
+
+ proc_mckinley_root = proc_mkdir("bus/mckinley",0);
+ create_proc_info_entry(sba_rev, 0, proc_mckinley_root, sba_proc_info);
+ create_proc_info_entry("bitmap", 0, proc_mckinley_root, sba_resource_map);
+ }
+#endif
+}
+
+static int __init
+nosbagart (char *str)
+{
+ reserve_sba_gart = 0;
+ return 1;
+}
+
+__setup("nosbagart",nosbagart);
+
+EXPORT_SYMBOL(sba_init);
+EXPORT_SYMBOL(sba_map_single);
+EXPORT_SYMBOL(sba_unmap_single);
+EXPORT_SYMBOL(sba_map_sg);
+EXPORT_SYMBOL(sba_unmap_sg);
+EXPORT_SYMBOL(sba_dma_address);
+EXPORT_SYMBOL(sba_alloc_consistent);
+EXPORT_SYMBOL(sba_free_consistent);
diff -urN linux-2.4.18/arch/ia64/hp/hpsim_console.c lia64-2.4/arch/ia64/hp/hpsim_console.c
--- linux-2.4.18/arch/ia64/hp/hpsim_console.c Thu Oct 12 14:20:48 2000
+++ lia64-2.4/arch/ia64/hp/hpsim_console.c Wed Dec 31 16:00:00 1969
@@ -1,74 +0,0 @@
-/*
- * Platform dependent support for HP simulator.
- *
- * Copyright (C) 1998, 1999 Hewlett-Packard Co
- * Copyright (C) 1998, 1999 David Mosberger-Tang
- * Copyright (C) 1999 Vijay Chander
- */
-#include <linux/config.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/param.h>
-#include <linux/string.h>
-#include <linux/types.h>
-#include <linux/tty.h>
-
-#include <asm/delay.h>
-#include <asm/irq.h>
-#include <asm/pal.h>
-#include <asm/machvec.h>
-#include <asm/pgtable.h>
-#include <asm/sal.h>
-
-#include "hpsim_ssc.h"
-
-static int simcons_init (struct console *, char *);
-static void simcons_write (struct console *, const char *, unsigned);
-static int simcons_wait_key (struct console *);
-static kdev_t simcons_console_device (struct console *);
-
-struct console hpsim_cons = {
- name: "simcons",
- write: simcons_write,
- device: simcons_console_device,
- wait_key: simcons_wait_key,
- setup: simcons_init,
- flags: CON_PRINTBUFFER,
- index: -1,
-};
-
-static int
-simcons_init (struct console *cons, char *options)
-{
- return 0;
-}
-
-static void
-simcons_write (struct console *cons, const char *buf, unsigned count)
-{
- unsigned long ch;
-
- while (count-- > 0) {
- ch = *buf++;
- ia64_ssc(ch, 0, 0, 0, SSC_PUTCHAR);
- if (ch == '\n')
- ia64_ssc('\r', 0, 0, 0, SSC_PUTCHAR);
- }
-}
-
-static int
-simcons_wait_key (struct console *cons)
-{
- char ch;
-
- do {
- ch = ia64_ssc(0, 0, 0, 0, SSC_GETCHAR);
- } while (ch == '\0');
- return ch;
-}
-
-static kdev_t
-simcons_console_device (struct console *c)
-{
- return MKDEV(TTY_MAJOR, 64 + c->index);
-}
diff -urN linux-2.4.18/arch/ia64/hp/hpsim_irq.c lia64-2.4/arch/ia64/hp/hpsim_irq.c
--- linux-2.4.18/arch/ia64/hp/hpsim_irq.c Thu Apr 5 12:51:47 2001
+++ lia64-2.4/arch/ia64/hp/hpsim_irq.c Wed Dec 31 16:00:00 1969
@@ -1,46 +0,0 @@
-/*
- * Platform dependent support for HP simulator.
- *
- * Copyright (C) 1998-2001 Hewlett-Packard Co
- * Copyright (C) 1998-2001 David Mosberger-Tang
- */
-
-#include <linux/init.h>
-#include <linux/types.h>
-#include <linux/sched.h>
-#include <linux/irq.h>
-
-static unsigned int
-hpsim_irq_startup (unsigned int irq)
-{
- return 0;
-}
-
-static void
-hpsim_irq_noop (unsigned int irq)
-{
-}
-
-static struct hw_interrupt_type irq_type_hp_sim = {
- typename: "hpsim",
- startup: hpsim_irq_startup,
- shutdown: hpsim_irq_noop,
- enable: hpsim_irq_noop,
- disable: hpsim_irq_noop,
- ack: hpsim_irq_noop,
- end: hpsim_irq_noop,
- set_affinity: (void (*)(unsigned int, unsigned long)) hpsim_irq_noop,
-};
-
-void __init
-hpsim_irq_init (void)
-{
- irq_desc_t *idesc;
- int i;
-
- for (i = 0; i < NR_IRQS; ++i) {
- idesc = irq_desc(i);
- if (idesc->handler == &no_irq_type)
- idesc->handler = &irq_type_hp_sim;
- }
-}
diff -urN linux-2.4.18/arch/ia64/hp/hpsim_machvec.c lia64-2.4/arch/ia64/hp/hpsim_machvec.c
--- linux-2.4.18/arch/ia64/hp/hpsim_machvec.c Fri Aug 11 19:09:06 2000
+++ lia64-2.4/arch/ia64/hp/hpsim_machvec.c Wed Dec 31 16:00:00 1969
@@ -1,2 +0,0 @@
-#define MACHVEC_PLATFORM_NAME hpsim
-#include <asm/machvec_init.h>
diff -urN linux-2.4.18/arch/ia64/hp/hpsim_setup.c lia64-2.4/arch/ia64/hp/hpsim_setup.c
--- linux-2.4.18/arch/ia64/hp/hpsim_setup.c Tue Jul 31 10:30:08 2001
+++ lia64-2.4/arch/ia64/hp/hpsim_setup.c Wed Dec 31 16:00:00 1969
@@ -1,58 +0,0 @@
-/*
- * Platform dependent support for HP simulator.
- *
- * Copyright (C) 1998, 1999 Hewlett-Packard Co
- * Copyright (C) 1998, 1999 David Mosberger-Tang
- * Copyright (C) 1999 Vijay Chander
- */
-#include <linux/console.h>
-#include <linux/init.h>
-#include <linux/kdev_t.h>
-#include <linux/kernel.h>
-#include <linux/param.h>
-#include <linux/string.h>
-#include <linux/types.h>
-
-#include <asm/delay.h>
-#include <asm/irq.h>
-#include <asm/pal.h>
-#include <asm/machvec.h>
-#include <asm/pgtable.h>
-#include <asm/sal.h>
-
-#include "hpsim_ssc.h"
-
-extern struct console hpsim_cons;
-
-/*
- * Simulator system call.
- */
-asm (".text\n"
- ".align 32\n"
- ".global ia64_ssc\n"
- ".proc ia64_ssc\n"
- "ia64_ssc:\n"
- "mov r15=r36\n"
- "break 0x80001\n"
- "br.ret.sptk.many rp\n"
- ".endp\n");
-
-void
-ia64_ssc_connect_irq (long intr, long irq)
-{
- ia64_ssc(intr, irq, 0, 0, SSC_CONNECT_INTERRUPT);
-}
-
-void
-ia64_ctl_trace (long on)
-{
- ia64_ssc(on, 0, 0, 0, SSC_CTL_TRACE);
-}
-
-void __init
-hpsim_setup (char **cmdline_p)
-{
- ROOT_DEV = to_kdev_t(0x0801); /* default to first SCSI drive */
-
- register_console (&hpsim_cons);
-}
diff -urN linux-2.4.18/arch/ia64/hp/hpsim_ssc.h lia64-2.4/arch/ia64/hp/hpsim_ssc.h
--- linux-2.4.18/arch/ia64/hp/hpsim_ssc.h Sun Feb 6 18:42:40 2000
+++ lia64-2.4/arch/ia64/hp/hpsim_ssc.h Wed Dec 31 16:00:00 1969
@@ -1,36 +0,0 @@
-/*
- * Platform dependent support for HP simulator.
- *
- * Copyright (C) 1998, 1999 Hewlett-Packard Co
- * Copyright (C) 1998, 1999 David Mosberger-Tang
- * Copyright (C) 1999 Vijay Chander
- */
-#ifndef _IA64_PLATFORM_HPSIM_SSC_H
-#define _IA64_PLATFORM_HPSIM_SSC_H
-
-/* Simulator system calls: */
-
-#define SSC_CONSOLE_INIT 20
-#define SSC_GETCHAR 21
-#define SSC_PUTCHAR 31
-#define SSC_CONNECT_INTERRUPT 58
-#define SSC_GENERATE_INTERRUPT 59
-#define SSC_SET_PERIODIC_INTERRUPT 60
-#define SSC_GET_RTC 65
-#define SSC_EXIT 66
-#define SSC_LOAD_SYMBOLS 69
-#define SSC_GET_TOD 74
-#define SSC_CTL_TRACE 76
-
-#define SSC_NETDEV_PROBE 100
-#define SSC_NETDEV_SEND 101
-#define SSC_NETDEV_RECV 102
-#define SSC_NETDEV_ATTACH 103
-#define SSC_NETDEV_DETACH 104
-
-/*
- * Simulator system call.
- */
-extern long ia64_ssc (long arg0, long arg1, long arg2, long arg3, int nr);
-
-#endif /* _IA64_PLATFORM_HPSIM_SSC_H */
diff -urN linux-2.4.18/arch/ia64/hp/sim/Makefile lia64-2.4/arch/ia64/hp/sim/Makefile
--- linux-2.4.18/arch/ia64/hp/sim/Makefile Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/sim/Makefile Fri Apr 5 16:44:44 2002
@@ -0,0 +1,13 @@
+#
+# ia64/platform/hp/sim/Makefile
+#
+# Copyright (C) 1999 Silicon Graphics, Inc.
+# Copyright (C) Srinivasa Thirumalachar (sprasad@engr.sgi.com)
+#
+
+O_TARGET := sim.o
+
+obj-y := hpsim_console.o hpsim_irq.o hpsim_setup.o
+obj-$(CONFIG_IA64_GENERIC) += hpsim_machvec.o
+
+include $(TOPDIR)/Rules.make
diff -urN linux-2.4.18/arch/ia64/hp/sim/hpsim_console.c lia64-2.4/arch/ia64/hp/sim/hpsim_console.c
--- linux-2.4.18/arch/ia64/hp/sim/hpsim_console.c Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/sim/hpsim_console.c Wed Nov 1 23:10:42 2000
@@ -0,0 +1,74 @@
+/*
+ * Platform dependent support for HP simulator.
+ *
+ * Copyright (C) 1998, 1999 Hewlett-Packard Co
+ * Copyright (C) 1998, 1999 David Mosberger-Tang
+ * Copyright (C) 1999 Vijay Chander
+ */
+#include <linux/config.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/param.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/tty.h>
+
+#include <asm/delay.h>
+#include <asm/irq.h>
+#include <asm/pal.h>
+#include <asm/machvec.h>
+#include <asm/pgtable.h>
+#include <asm/sal.h>
+
+#include "hpsim_ssc.h"
+
+static int simcons_init (struct console *, char *);
+static void simcons_write (struct console *, const char *, unsigned);
+static int simcons_wait_key (struct console *);
+static kdev_t simcons_console_device (struct console *);
+
+struct console hpsim_cons = {
+ name: "simcons",
+ write: simcons_write,
+ device: simcons_console_device,
+ wait_key: simcons_wait_key,
+ setup: simcons_init,
+ flags: CON_PRINTBUFFER,
+ index: -1,
+};
+
+static int
+simcons_init (struct console *cons, char *options)
+{
+ return 0;
+}
+
+static void
+simcons_write (struct console *cons, const char *buf, unsigned count)
+{
+ unsigned long ch;
+
+ while (count-- > 0) {
+ ch = *buf++;
+ ia64_ssc(ch, 0, 0, 0, SSC_PUTCHAR);
+ if (ch == '\n')
+ ia64_ssc('\r', 0, 0, 0, SSC_PUTCHAR);
+ }
+}
+
+static int
+simcons_wait_key (struct console *cons)
+{
+ char ch;
+
+ do {
+ ch = ia64_ssc(0, 0, 0, 0, SSC_GETCHAR);
+ } while (ch == '\0');
+ return ch;
+}
+
+static kdev_t
+simcons_console_device (struct console *c)
+{
+ return MKDEV(TTY_MAJOR, 64 + c->index);
+}
diff -urN linux-2.4.18/arch/ia64/hp/sim/hpsim_irq.c lia64-2.4/arch/ia64/hp/sim/hpsim_irq.c
--- linux-2.4.18/arch/ia64/hp/sim/hpsim_irq.c Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/sim/hpsim_irq.c Wed Feb 28 14:43:45 2001
@@ -0,0 +1,46 @@
+/*
+ * Platform dependent support for HP simulator.
+ *
+ * Copyright (C) 1998-2001 Hewlett-Packard Co
+ * Copyright (C) 1998-2001 David Mosberger-Tang
+ */
+
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/sched.h>
+#include <linux/irq.h>
+
+static unsigned int
+hpsim_irq_startup (unsigned int irq)
+{
+ return 0;
+}
+
+static void
+hpsim_irq_noop (unsigned int irq)
+{
+}
+
+static struct hw_interrupt_type irq_type_hp_sim = {
+ typename: "hpsim",
+ startup: hpsim_irq_startup,
+ shutdown: hpsim_irq_noop,
+ enable: hpsim_irq_noop,
+ disable: hpsim_irq_noop,
+ ack: hpsim_irq_noop,
+ end: hpsim_irq_noop,
+ set_affinity: (void (*)(unsigned int, unsigned long)) hpsim_irq_noop,
+};
+
+void __init
+hpsim_irq_init (void)
+{
+ irq_desc_t *idesc;
+ int i;
+
+ for (i = 0; i < NR_IRQS; ++i) {
+ idesc = irq_desc(i);
+ if (idesc->handler == &no_irq_type)
+ idesc->handler = &irq_type_hp_sim;
+ }
+}
diff -urN linux-2.4.18/arch/ia64/hp/sim/hpsim_machvec.c lia64-2.4/arch/ia64/hp/sim/hpsim_machvec.c
--- linux-2.4.18/arch/ia64/hp/sim/hpsim_machvec.c Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/sim/hpsim_machvec.c Thu Aug 24 08:17:30 2000
@@ -0,0 +1,2 @@
+#define MACHVEC_PLATFORM_NAME hpsim
+#include <asm/machvec_init.h>
diff -urN linux-2.4.18/arch/ia64/hp/sim/hpsim_setup.c lia64-2.4/arch/ia64/hp/sim/hpsim_setup.c
--- linux-2.4.18/arch/ia64/hp/sim/hpsim_setup.c Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/sim/hpsim_setup.c Wed May 30 22:41:37 2001
@@ -0,0 +1,58 @@
+/*
+ * Platform dependent support for HP simulator.
+ *
+ * Copyright (C) 1998, 1999 Hewlett-Packard Co
+ * Copyright (C) 1998, 1999 David Mosberger-Tang
+ * Copyright (C) 1999 Vijay Chander
+ */
+#include <linux/console.h>
+#include <linux/init.h>
+#include <linux/kdev_t.h>
+#include <linux/kernel.h>
+#include <linux/param.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#include <asm/delay.h>
+#include <asm/irq.h>
+#include <asm/pal.h>
+#include <asm/machvec.h>
+#include <asm/pgtable.h>
+#include <asm/sal.h>
+
+#include "hpsim_ssc.h"
+
+extern struct console hpsim_cons;
+
+/*
+ * Simulator system call.
+ */
+asm (".text\n"
+ ".align 32\n"
+ ".global ia64_ssc\n"
+ ".proc ia64_ssc\n"
+ "ia64_ssc:\n"
+ "mov r15=r36\n"
+ "break 0x80001\n"
+ "br.ret.sptk.many rp\n"
+ ".endp\n");
+
+void
+ia64_ssc_connect_irq (long intr, long irq)
+{
+ ia64_ssc(intr, irq, 0, 0, SSC_CONNECT_INTERRUPT);
+}
+
+void
+ia64_ctl_trace (long on)
+{
+ ia64_ssc(on, 0, 0, 0, SSC_CTL_TRACE);
+}
+
+void __init
+hpsim_setup (char **cmdline_p)
+{
+ ROOT_DEV = to_kdev_t(0x0801); /* default to first SCSI drive */
+
+ register_console (&hpsim_cons);
+}
diff -urN linux-2.4.18/arch/ia64/hp/sim/hpsim_ssc.h lia64-2.4/arch/ia64/hp/sim/hpsim_ssc.h
--- linux-2.4.18/arch/ia64/hp/sim/hpsim_ssc.h Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/sim/hpsim_ssc.h Sun Feb 6 18:42:40 2000
@@ -0,0 +1,36 @@
+/*
+ * Platform dependent support for HP simulator.
+ *
+ * Copyright (C) 1998, 1999 Hewlett-Packard Co
+ * Copyright (C) 1998, 1999 David Mosberger-Tang
+ * Copyright (C) 1999 Vijay Chander
+ */
+#ifndef _IA64_PLATFORM_HPSIM_SSC_H
+#define _IA64_PLATFORM_HPSIM_SSC_H
+
+/* Simulator system calls: */
+
+#define SSC_CONSOLE_INIT 20
+#define SSC_GETCHAR 21
+#define SSC_PUTCHAR 31
+#define SSC_CONNECT_INTERRUPT 58
+#define SSC_GENERATE_INTERRUPT 59
+#define SSC_SET_PERIODIC_INTERRUPT 60
+#define SSC_GET_RTC 65
+#define SSC_EXIT 66
+#define SSC_LOAD_SYMBOLS 69
+#define SSC_GET_TOD 74
+#define SSC_CTL_TRACE 76
+
+#define SSC_NETDEV_PROBE 100
+#define SSC_NETDEV_SEND 101
+#define SSC_NETDEV_RECV 102
+#define SSC_NETDEV_ATTACH 103
+#define SSC_NETDEV_DETACH 104
+
+/*
+ * Simulator system call.
+ */
+extern long ia64_ssc (long arg0, long arg1, long arg2, long arg3, int nr);
+
+#endif /* _IA64_PLATFORM_HPSIM_SSC_H */
diff -urN linux-2.4.18/arch/ia64/hp/zx1/Makefile lia64-2.4/arch/ia64/hp/zx1/Makefile
--- linux-2.4.18/arch/ia64/hp/zx1/Makefile Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/zx1/Makefile Fri Apr 5 16:44:44 2002
@@ -0,0 +1,13 @@
+#
+# ia64/platform/hp/zx1/Makefile
+#
+# Copyright (C) 2002 Hewlett Packard
+# Copyright (C) Alex Williamson (alex_williamson@hp.com)
+#
+
+O_TARGET := zx1.o
+
+obj-y := hpzx1_misc.o
+obj-$(CONFIG_IA64_GENERIC) += hpzx1_machvec.o
+
+include $(TOPDIR)/Rules.make
diff -urN linux-2.4.18/arch/ia64/hp/zx1/hpzx1_machvec.c lia64-2.4/arch/ia64/hp/zx1/hpzx1_machvec.c
--- linux-2.4.18/arch/ia64/hp/zx1/hpzx1_machvec.c Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/zx1/hpzx1_machvec.c Fri Apr 5 16:44:44 2002
@@ -0,0 +1,2 @@
+#define MACHVEC_PLATFORM_NAME hpzx1
+#include <asm/machvec_init.h>
diff -urN linux-2.4.18/arch/ia64/hp/zx1/hpzx1_misc.c lia64-2.4/arch/ia64/hp/zx1/hpzx1_misc.c
--- linux-2.4.18/arch/ia64/hp/zx1/hpzx1_misc.c Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/hp/zx1/hpzx1_misc.c Fri Apr 5 23:29:03 2002
@@ -0,0 +1,402 @@
+/*
+ * Misc. support for the HP zx1 chipset
+ *
+ * Copyright (C) 2002 Hewlett-Packard Co
+ * Copyright (C) 2002 Alex Williamson
+ * Copyright (C) 2002 Bjorn Helgaas
+ */
+
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../drivers/acpi/include/platform/acgcc.h"
+#include "../drivers/acpi/include/actypes.h"
+#include "../drivers/acpi/include/acexcep.h"
+#include "../drivers/acpi/include/acpixf.h"
+#include "../drivers/acpi/include/actbl.h"
+#include "../drivers/acpi/include/acconfig.h"
+#include "../drivers/acpi/include/acmacros.h"
+#include "../drivers/acpi/include/aclocal.h"
+#include "../drivers/acpi/include/acobject.h"
+#include "../drivers/acpi/include/acstruct.h"
+#include "../drivers/acpi/include/acnamesp.h"
+#include "../drivers/acpi/include/acutils.h"
+
+#define PFX "hpzx1: "
+
+struct fake_pci_dev {
+ struct fake_pci_dev *next;
+ unsigned char bus;
+ unsigned int devfn;
+ int sizing; // in middle of BAR sizing operation?
+ unsigned long csr_base;
+ unsigned int csr_size;
+ unsigned long mapped_csrs; // ioremapped
+};
+
+static struct fake_pci_dev *fake_pci_head, **fake_pci_tail = &fake_pci_head;
+
+static struct pci_ops orig_pci_ops;
+
+static inline struct fake_pci_dev *
+fake_pci_find_slot(unsigned char bus, unsigned int devfn)
+{
+ struct fake_pci_dev *dev;
+
+ for (dev = fake_pci_head; dev; dev = dev->next)
+ if (dev->bus == bus && dev->devfn == devfn)
+ return dev;
+ return NULL;
+}
+
+static struct fake_pci_dev *
+alloc_fake_pci_dev(void)
+{
+ struct fake_pci_dev *dev;
+
+ dev = kmalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev)
+ return NULL;
+
+ memset(dev, 0, sizeof(*dev));
+
+ *fake_pci_tail = dev;
+ fake_pci_tail = &dev->next;
+
+ return dev;
+}
+
+#define HP_CFG_RD(sz, bits, name) \
+static int hp_cfg_read##sz (struct pci_dev *dev, int where, u##bits *value) \
+{ \
+ struct fake_pci_dev *fake_dev; \
+ if (!(fake_dev = fake_pci_find_slot(dev->bus->number, dev->devfn))) \
+ return orig_pci_ops.name(dev, where, value); \
+ \
+ switch (where) { \
+ case PCI_COMMAND: \
+ *value = read##sz(fake_dev->mapped_csrs + where); \
+ *value |= PCI_COMMAND_MEMORY; /* SBA omits this */ \
+ break; \
+ case PCI_BASE_ADDRESS_0: \
+ if (fake_dev->sizing) \
+ *value = ~(fake_dev->csr_size - 1); \
+ else \
+ *value = (fake_dev->csr_base & \
+ PCI_BASE_ADDRESS_MEM_MASK) | \
+ PCI_BASE_ADDRESS_SPACE_MEMORY; \
+ fake_dev->sizing = 0; \
+ break; \
+ default: \
+ *value = read##sz(fake_dev->mapped_csrs + where); \
+ break; \
+ } \
+ return PCIBIOS_SUCCESSFUL; \
+}
+
+#define HP_CFG_WR(sz, bits, name) \
+static int hp_cfg_write##sz (struct pci_dev *dev, int where, u##bits value) \
+{ \
+ struct fake_pci_dev *fake_dev; \
+ if (!(fake_dev = fake_pci_find_slot(dev->bus->number, dev->devfn))) \
+ return orig_pci_ops.name(dev, where, value); \
+ \
+ switch (where) { \
+ case PCI_BASE_ADDRESS_0: \
+ if (value == ~0) \
+ fake_dev->sizing = 1; \
+ break; \
+ default: \
+ write##sz(value, fake_dev->mapped_csrs + where); \
+ break; \
+ } \
+ return PCIBIOS_SUCCESSFUL; \
+}
+
+HP_CFG_RD(b, 8, read_byte)
+HP_CFG_RD(w, 16, read_word)
+HP_CFG_RD(l, 32, read_dword)
+HP_CFG_WR(b, 8, write_byte)
+HP_CFG_WR(w, 16, write_word)
+HP_CFG_WR(l, 32, write_dword)
+
+static struct pci_ops hp_pci_conf = {
+ hp_cfg_readb,
+ hp_cfg_readw,
+ hp_cfg_readl,
+ hp_cfg_writeb,
+ hp_cfg_writew,
+ hp_cfg_writel,
+};
+
+/*
+ * Assume we'll never have a physical slot higher than 0x10, so we can
+ * use slots above that for "fake" PCI devices to represent things
+ * that only show up in the ACPI namespace.
+ */
+#define HP_MAX_SLOT 0x10
+
+static struct fake_pci_dev *
+hpzx1_fake_pci_dev(unsigned long addr, unsigned int bus, unsigned int size)
+{
+ struct fake_pci_dev *dev;
+ int slot;
+
+ // Note: lspci thinks 0x1f is invalid
+ for (slot = 0x1e; slot > HP_MAX_SLOT; slot--) {
+ if (!fake_pci_find_slot(bus, PCI_DEVFN(slot, 0)))
+ break;
+ }
+ if (slot == HP_MAX_SLOT) {
+ printk(KERN_ERR PFX
+ "no slot space for device (0x%p) on bus 0x%02x\n",
+ (void *) addr, bus);
+ return NULL;
+ }
+
+ dev = alloc_fake_pci_dev();
+ if (!dev) {
+ printk(KERN_ERR PFX
+ "no memory for device (0x%p) on bus 0x%02x\n",
+ (void *) addr, bus);
+ return NULL;
+ }
+
+ dev->bus = bus;
+ dev->devfn = PCI_DEVFN(slot, 0);
+ dev->csr_base = addr;
+ dev->csr_size = size;
+
+ /*
+ * Drivers should ioremap what they need, but we have to do
+ * it here, too, so PCI config accesses work.
+ */
+ dev->mapped_csrs = (unsigned long) ioremap(dev->csr_base, dev->csr_size);
+
+ return dev;
+}
+
+typedef struct {
+ u8 guid_id;
+ u8 guid[16];
+ u8 csr_base[8];
+ u8 csr_length[8];
+} acpi_hp_vendor_long;
+
+#define HP_CCSR_LENGTH 0x21
+#define HP_CCSR_TYPE 0x2
+#define HP_CCSR_GUID EFI_GUID(0x69e9adf9, 0x924f, 0xab5f, \
+ 0xf6, 0x4a, 0x24, 0xd2, 0x01, 0x37, 0x0e, 0xad)
+
+extern acpi_status acpi_get_crs(acpi_handle, acpi_buffer *);
+extern acpi_resource *acpi_get_crs_next(acpi_buffer *, int *);
+extern acpi_resource_data *acpi_get_crs_type(acpi_buffer *, int *, int);
+extern void acpi_dispose_crs(acpi_buffer *);
+extern acpi_status acpi_cf_evaluate_method(acpi_handle, UINT8 *, NATIVE_UINT *);
+
+static acpi_status
+hp_csr_space(acpi_handle obj, u64 *csr_base, u64 *csr_length)
+{
+ int i, offset = 0;
+ acpi_status status;
+ acpi_buffer buf;
+ acpi_resource_vendor *res;
+ acpi_hp_vendor_long *hp_res;
+ efi_guid_t vendor_guid;
+
+ *csr_base = 0;
+ *csr_length = 0;
+
+ status = acpi_get_crs(obj, &buf);
+ if (status != AE_OK) {
+ printk(KERN_ERR PFX "Unable to get _CRS data on object\n");
+ return status;
+ }
+
+ res = (acpi_resource_vendor *)acpi_get_crs_type(&buf, &offset, ACPI_RSTYPE_VENDOR);
+ if (!res) {
+ printk(KERN_ERR PFX "Failed to find config space for device\n");
+ acpi_dispose_crs(&buf);
+ return AE_NOT_FOUND;
+ }
+
+ hp_res = (acpi_hp_vendor_long *)(res->reserved);
+
+ if (res->length != HP_CCSR_LENGTH || hp_res->guid_id != HP_CCSR_TYPE) {
+ printk(KERN_ERR PFX "Unknown Vendor data\n");
+ acpi_dispose_crs(&buf);
+ return AE_TYPE; /* Revisit error? */
+ }
+
+ memcpy(&vendor_guid, hp_res->guid, sizeof(efi_guid_t));
+ if (efi_guidcmp(vendor_guid, HP_CCSR_GUID) != 0) {
+ printk(KERN_ERR PFX "Vendor GUID does not match\n");
+ acpi_dispose_crs(&buf);
+ return AE_TYPE; /* Revisit error? */
+ }
+
+ for (i = 0 ; i < 8 ; i++) {
+ *csr_base |= ((u64)(hp_res->csr_base[i]) << (i * 8));
+ *csr_length |= ((u64)(hp_res->csr_length[i]) << (i * 8));
+ }
+
+ acpi_dispose_crs(&buf);
+
+ return AE_OK;
+}
+
+static acpi_status
+hpzx1_sba_probe(acpi_handle obj, u32 depth, void *context, void **ret)
+{
+ u64 csr_base = 0, csr_length = 0;
+ char *name = context;
+ struct fake_pci_dev *dev;
+ acpi_status status;
+
+ status = hp_csr_space(obj, &csr_base, &csr_length);
+
+printk("hpzx1_sba_probe: status=%d\n", status);
+ if (status != AE_OK)
+ return status;
+
+ /*
+ * Only SBA shows up in ACPI namespace, so its CSR space
+ * includes both SBA and IOC. Make SBA and IOC show up
+ * separately in PCI space.
+ */
+ if ((dev = hpzx1_fake_pci_dev(csr_base, 0, 0x1000)))
+ printk(KERN_INFO PFX "%s SBA at 0x%lx; pci dev %02x:%02x.%d\n",
+ name, csr_base, dev->bus,
+ PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
+ if ((dev = hpzx1_fake_pci_dev(csr_base + 0x1000, 0, 0x1000)))
+ printk(KERN_INFO PFX "%s IOC at 0x%lx; pci dev %02x:%02x.%d\n",
+ name, csr_base + 0x1000, dev->bus,
+ PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
+
+ return AE_OK;
+}
+
+static acpi_status
+hpzx1_lba_probe(acpi_handle obj, u32 depth, void *context, void **ret)
+{
+ acpi_status status;
+ u64 csr_base = 0, csr_length = 0;
+ char *name = context;
+ NATIVE_UINT busnum = 0;
+ struct fake_pci_dev *dev;
+
+ status = hp_csr_space(obj, &csr_base, &csr_length);
+
+ if (status != AE_OK)
+ return status;
+
+ status = acpi_cf_evaluate_method(obj, METHOD_NAME__BBN, &busnum);
+ if (ACPI_FAILURE(status)) {
+ printk(KERN_ERR PFX "evaluate _BBN fail=0x%x\n", status);
+ busnum = 0; // no _BBN; stick it on bus 0
+ }
+
+ if ((dev = hpzx1_fake_pci_dev(csr_base, busnum, csr_length)))
+ printk(KERN_INFO PFX "%s LBA at 0x%lx, _BBN 0x%02x; "
+ "pci dev %02x:%02x.%d\n",
+ name, csr_base, busnum, dev->bus,
+ PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
+
+ return AE_OK;
+}
+
+static void
+hpzx1_acpi_dev_init(void)
+{
+ extern struct pci_ops pci_conf;
+
+ /*
+ * Make fake PCI devices for the following hardware in the
+ * ACPI namespace. This makes it more convenient for drivers
+ * because they can claim these devices based on PCI
+ * information, rather than needing to know about ACPI. The
+ * 64-bit "HPA" space for this hardware is available as BAR
+ * 0/1.
+ *
+ * HWP0001: Single IOC SBA w/o IOC in namespace
+ * HWP0002: LBA device
+ * HWP0003: AGP LBA device
+ */
+printk("hpzx1_acpi_dev_init\n");
+ acpi_get_devices("HWP0001", hpzx1_sba_probe, "HWP0001", NULL);
+#ifdef CONFIG_IA64_HP_PROTO
+ if (fake_pci_tail != &fake_pci_head) {
+#endif
+ acpi_get_devices("HWP0002", hpzx1_lba_probe, "HWP0002", NULL);
+ acpi_get_devices("HWP0003", hpzx1_lba_probe, "HWP0003", NULL);
+
+#ifdef CONFIG_IA64_HP_PROTO
+ }
+
+#define ZX1_FUNC_ID_VALUE (PCI_DEVICE_ID_HP_ZX1_SBA << 16) | PCI_VENDOR_ID_HP
+ /*
+ * Early protos don't have bridges in the ACPI namespace, so
+ * if we didn't find anything, add the things we know are
+ * there.
+ */
+ if (fake_pci_tail == &fake_pci_head) {
+ u64 hpa, csr_base;
+ struct fake_pci_dev *dev;
+
+ csr_base = 0xfed00000UL;
+ hpa = (u64) ioremap(csr_base, 0x1000);
+ if (__raw_readl(hpa) == ZX1_FUNC_ID_VALUE) {
+ if ((dev = hpzx1_fake_pci_dev(csr_base, 0, 0x1000)))
+ printk(KERN_INFO PFX "HWP0001 SBA at 0x%lx; "
+ "pci dev %02x:%02x.%d\n", csr_base,
+ dev->bus, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn));
+ if ((dev = hpzx1_fake_pci_dev(csr_base + 0x1000, 0,
+ 0x1000)))
+ printk(KERN_INFO PFX "HWP0001 IOC at 0x%lx; "
+ "pci dev %02x:%02x.%d\n",
+ csr_base + 0x1000,
+ dev->bus, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn));
+
+ csr_base = 0xfed24000UL;
+ iounmap(hpa);
+ hpa = (u64) ioremap(csr_base, 0x1000);
+ if ((dev = hpzx1_fake_pci_dev(csr_base, 0x40, 0x1000)))
+ printk(KERN_INFO PFX "HWP0003 AGP LBA at "
+ "0x%lx; pci dev %02x:%02x.%d\n",
+ csr_base,
+ dev->bus, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn));
+ }
+ iounmap(hpa);
+ }
+#endif
+
+ if (fake_pci_tail == &fake_pci_head)
+ return;
+
+ /*
+ * Replace PCI ops, but only if we made fake devices.
+ */
+ orig_pci_ops = pci_conf;
+ pci_conf = hp_pci_conf;
+}
+
+extern void sba_init(void);
+
+void
+hpzx1_pci_fixup (int phase)
+{
+ if (phase == 0)
+ hpzx1_acpi_dev_init();
+ iosapic_pci_fixup(phase);
+ if (phase == 1)
+ sba_init();
+}
diff -urN linux-2.4.18/arch/ia64/ia32/binfmt_elf32.c lia64-2.4/arch/ia64/ia32/binfmt_elf32.c
--- linux-2.4.18/arch/ia64/ia32/binfmt_elf32.c Mon Nov 26 11:18:19 2001
+++ lia64-2.4/arch/ia64/ia32/binfmt_elf32.c Mon Feb 4 22:42:54 2002
@@ -142,10 +142,11 @@
/*
* Setup GDTD. Note: GDTD is the descrambled version of the pseudo-descriptor
* format defined by Figure 3-11 "Pseudo-Descriptor Format" in the IA-32
- * architecture manual.
+ * architecture manual. Also note that the only fields that are not ignored are
+ * `base', `limit', 'G', `P' (must be 1) and `S' (must be 0).
*/
- regs->r31 = IA32_SEG_UNSCRAMBLE(IA32_SEG_DESCRIPTOR(IA32_GDT_OFFSET, IA32_PAGE_SIZE - 1, 0,
- 0, 0, 0, 0, 0, 0));
+ regs->r31 = IA32_SEG_UNSCRAMBLE(IA32_SEG_DESCRIPTOR(IA32_GDT_OFFSET, IA32_PAGE_SIZE - 1,
+ 0, 0, 0, 1, 0, 0, 0));
/* Setup the segment selectors */
regs->r16 = (__USER_DS << 16) | __USER_DS; /* ES == DS, GS, FS are zero */
regs->r17 = (__USER_DS << 16) | __USER_CS; /* SS, CS; ia32_load_state() sets TSS and LDT */
@@ -206,6 +207,7 @@
set_personality(PER_LINUX32);
current->thread.map_base = IA32_PAGE_OFFSET/3;
current->thread.task_size = IA32_PAGE_OFFSET; /* use what Linux/x86 uses... */
+ current->thread.flags |= IA64_THREAD_XSTACK; /* data must be executable */
set_fs(USER_DS); /* set addr limit for new TASK_SIZE */
}
diff -urN linux-2.4.18/arch/ia64/ia32/ia32_entry.S lia64-2.4/arch/ia64/ia32/ia32_entry.S
--- linux-2.4.18/arch/ia64/ia32/ia32_entry.S Mon Nov 26 11:18:19 2001
+++ lia64-2.4/arch/ia64/ia32/ia32_entry.S Sat Feb 9 10:41:41 2002
@@ -37,7 +37,7 @@
mov loc1=r16 // save ar.pfs across do_fork
.body
zxt4 out1=in1 // newsp
- mov out3=0 // stacksize
+ mov out3=16 // stacksize (compensates for 16-byte scratch area)
adds out2=IA64_SWITCH_STACK_SIZE+16,sp // out2 = &regs
zxt4 out0=in0 // out0 = clone_flags
br.call.sptk.many rp=do_fork
@@ -220,7 +220,7 @@
data8 sys32_pipe
data8 sys32_times
data8 sys32_ni_syscall /* old prof syscall holder */
- data8 sys_brk /* 45 */
+ data8 sys32_brk /* 45 */
data8 sys_setgid /* 16-bit version */
data8 sys_getgid /* 16-bit version */
data8 sys32_signal
diff -urN linux-2.4.18/arch/ia64/ia32/ia32_ioctl.c lia64-2.4/arch/ia64/ia32/ia32_ioctl.c
--- linux-2.4.18/arch/ia64/ia32/ia32_ioctl.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/ia32/ia32_ioctl.c Thu Feb 21 11:49:45 2002
@@ -79,6 +79,38 @@
return ret;
}
+ case IOCTL_NR(SIOCGIFCONF):
+ {
+ struct ifconf32 {
+ int ifc_len;
+ unsigned int ifc_ptr;
+ } ifconf32;
+ struct ifconf ifconf;
+ int i, n;
+ char *p32, *p64;
+ char buf[32]; /* sizeof IA32 ifreq structure */
+
+ if (copy_from_user(&ifconf32, P(arg), sizeof(ifconf32)))
+ return -EFAULT;
+ ifconf.ifc_len = ifconf32.ifc_len;
+ ifconf.ifc_req = P(ifconf32.ifc_ptr);
+ ret = DO_IOCTL(fd, SIOCGIFCONF, &ifconf);
+ ifconf32.ifc_len = ifconf.ifc_len;
+ if (copy_to_user(P(arg), &ifconf32, sizeof(ifconf32)))
+ return -EFAULT;
+ n = ifconf.ifc_len / sizeof(struct ifreq);
+ p32 = P(ifconf32.ifc_ptr);
+ p64 = P(ifconf32.ifc_ptr);
+ for (i = 0; i < n; i++) {
+ if (copy_from_user(buf, p64, sizeof(struct ifreq)))
+ return -EFAULT;
+ if (copy_to_user(p32, buf, sizeof(buf)))
+ return -EFAULT;
+ p32 += sizeof(buf);
+ p64 += sizeof(struct ifreq);
+ }
+ return ret;
+ }
case IOCTL_NR(DRM_IOCTL_VERSION):
{
diff -urN linux-2.4.18/arch/ia64/ia32/ia32_signal.c lia64-2.4/arch/ia64/ia32/ia32_signal.c
--- linux-2.4.18/arch/ia64/ia32/ia32_signal.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/ia32/ia32_signal.c Tue Feb 26 13:53:30 2002
@@ -522,6 +522,7 @@
static int
setup_frame_ia32 (int sig, struct k_sigaction *ka, sigset_t *set, struct pt_regs * regs)
{
+ struct exec_domain *ed = current->exec_domain;
struct sigframe_ia32 *frame;
int err = 0;
@@ -530,12 +531,8 @@
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
goto give_sigsegv;
- err |= __put_user((current->exec_domain
- && current->exec_domain->signal_invmap
- && sig < 32
- ? (int)(current->exec_domain->signal_invmap[sig])
- : sig),
- &frame->sig);
+ err |= __put_user((ed && ed->signal_invmap
+ && sig < 32 ? (int)(ed->signal_invmap[sig]) : sig), &frame->sig);
err |= setup_sigcontext_ia32(&frame->sc, &frame->fpstate, regs, set->sig[0]);
@@ -590,6 +587,7 @@
setup_rt_frame_ia32 (int sig, struct k_sigaction *ka, siginfo_t *info,
sigset_t *set, struct pt_regs * regs)
{
+ struct exec_domain *ed = current->exec_domain;
struct rt_sigframe_ia32 *frame;
int err = 0;
@@ -598,11 +596,8 @@
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
goto give_sigsegv;
- err |= __put_user((current->exec_domain
- && current->exec_domain->signal_invmap
- && sig < 32
- ? current->exec_domain->signal_invmap[sig]
- : sig),
+ err |= __put_user((ed && ed->signal_invmap
+ && sig < 32 ? ed->signal_invmap[sig] : sig),
&frame->sig);
err |= __put_user((long)&frame->info, &frame->pinfo);
err |= __put_user((long)&frame->uc, &frame->puc);
diff -urN linux-2.4.18/arch/ia64/ia32/ia32_support.c lia64-2.4/arch/ia64/ia32/ia32_support.c
--- linux-2.4.18/arch/ia64/ia32/ia32_support.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/ia32/ia32_support.c Fri Feb 22 17:07:58 2002
@@ -3,7 +3,7 @@
*
* Copyright (C) 1999 Arun Sharma
* Copyright (C) 2000 Asit K. Mallick
- * Copyright (C) 2001 Hewlett-Packard Co
+ * Copyright (C) 2001-2002 Hewlett-Packard Co
* David Mosberger-Tang
*
* 06/16/00 A. Mallick added csd/ssd/tssd for ia32 thread context
@@ -153,10 +153,12 @@
/* We never change the TSS and LDT descriptors, so we can share them across all CPUs. */
ldt_size = PAGE_ALIGN(IA32_LDT_ENTRIES*IA32_LDT_ENTRY_SIZE);
for (nr = 0; nr < NR_CPUS; ++nr) {
- ia32_gdt[_TSS(nr)] = IA32_SEG_DESCRIPTOR(IA32_TSS_OFFSET, 235,
- 0xb, 0, 3, 1, 1, 1, 0);
- ia32_gdt[_LDT(nr)] = IA32_SEG_DESCRIPTOR(IA32_LDT_OFFSET, ldt_size - 1,
- 0x2, 0, 3, 1, 1, 1, 0);
+ ia32_gdt[_TSS(nr) >> IA32_SEGSEL_INDEX_SHIFT]
+ = IA32_SEG_DESCRIPTOR(IA32_TSS_OFFSET, 235,
+ 0xb, 0, 3, 1, 1, 1, 0);
+ ia32_gdt[_LDT(nr) >> IA32_SEGSEL_INDEX_SHIFT]
+ = IA32_SEG_DESCRIPTOR(IA32_LDT_OFFSET, ldt_size - 1,
+ 0x2, 0, 3, 1, 1, 1, 0);
}
}
@@ -172,6 +174,10 @@
siginfo.si_signo = SIGTRAP;
siginfo.si_errno = int_num; /* XXX is it OK to abuse si_errno like this? */
+ siginfo.si_flags = 0;
+ siginfo.si_isr = 0;
+ siginfo.si_addr = 0;
+ siginfo.si_imm = 0;
siginfo.si_code = TRAP_BRKPT;
force_sig_info(SIGTRAP, &siginfo, current);
}
diff -urN linux-2.4.18/arch/ia64/ia32/ia32_traps.c lia64-2.4/arch/ia64/ia32/ia32_traps.c
--- linux-2.4.18/arch/ia64/ia32/ia32_traps.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/ia32/ia32_traps.c Fri Mar 15 12:03:00 2002
@@ -2,7 +2,7 @@
* IA-32 exception handlers
*
* Copyright (C) 2000 Asit K. Mallick
- * Copyright (C) 2001 Hewlett-Packard Co
+ * Copyright (C) 2001-2002 Hewlett-Packard Co
* David Mosberger-Tang
*
* 06/16/00 A. Mallick added siginfo for most cases (close to IA32)
@@ -20,7 +20,7 @@
{
switch ((isr >> 16) & 0xff) {
case 0: /* Instruction intercept fault */
- case 3: /* Locked Data reference fault */
+ case 4: /* Locked Data reference fault */
case 1: /* Gate intercept trap */
return -1;
@@ -40,7 +40,11 @@
{
struct siginfo siginfo;
+ /* initialize these fields to avoid leaking kernel bits to user space: */
siginfo.si_errno = 0;
+ siginfo.si_flags = 0;
+ siginfo.si_isr = 0;
+ siginfo.si_imm = 0;
switch ((isr >> 16) & 0xff) {
case 1:
case 2:
@@ -103,6 +107,8 @@
* and it will suffer the consequences since we won't be able to
* fully reproduce the context of the exception
*/
+ siginfo.si_isr = isr;
+ siginfo.si_flags = __ISR_VALID;
switch(((~fcr) & (fsr & 0x3f)) | (fsr & 0x240)) {
case 0x000:
default:
diff -urN linux-2.4.18/arch/ia64/ia32/sys_ia32.c lia64-2.4/arch/ia64/ia32/sys_ia32.c
--- linux-2.4.18/arch/ia64/ia32/sys_ia32.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/ia32/sys_ia32.c Tue Feb 26 14:35:20 2002
@@ -6,7 +6,7 @@
* Copyright (C) 1999 Arun Sharma
* Copyright (C) 1997,1998 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
* Copyright (C) 1997 David S. Miller (davem@caip.rutgers.edu)
- * Copyright (C) 2000-2001 Hewlett-Packard Co
+ * Copyright (C) 2000-2002 Hewlett-Packard Co
* David Mosberger-Tang
*
* These routines maintain argument size conversion between 32bit and 64bit
@@ -82,6 +82,7 @@
/* forward declaration: */
asmlinkage long sys32_mprotect (unsigned int, unsigned int, int);
+asmlinkage unsigned long sys_brk(unsigned long);
/*
* Anything that modifies or inspects ia32 user virtual memory must hold this semaphore
@@ -412,7 +413,7 @@
return -EINVAL;
}
if (!(prot & PROT_WRITE) && sys_mprotect(pstart, pend - pstart, prot) < 0)
- return EINVAL;
+ return -EINVAL;
}
return start;
}
@@ -2590,6 +2591,7 @@
default:
return -EINVAL;
}
+ return -EINVAL;
}
/*
@@ -3807,6 +3809,19 @@
ret = sys_personality(personality);
if (ret == PER_LINUX32)
ret = PER_LINUX;
+ return ret;
+}
+
+asmlinkage unsigned long
+sys32_brk (unsigned int brk)
+{
+ unsigned long ret, obrk;
+ struct mm_struct *mm = current->mm;
+
+ obrk = mm->brk;
+ ret = sys_brk(brk);
+ if (ret < obrk)
+ clear_user((void *) ret, PAGE_ALIGN(ret) - ret);
return ret;
}
diff -urN linux-2.4.18/arch/ia64/kernel/Makefile lia64-2.4/arch/ia64/kernel/Makefile
--- linux-2.4.18/arch/ia64/kernel/Makefile Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/Makefile Fri Apr 5 16:44:44 2002
@@ -14,9 +14,10 @@
export-objs := ia64_ksyms.o
obj-y := acpi.o entry.o gate.o efi.o efi_stub.o ia64_ksyms.o irq.o irq_ia64.o irq_lsapic.o ivt.o \
- machvec.o pal.o process.o perfmon.o ptrace.o sal.o semaphore.o setup.o \
+ machvec.o pal.o process.o perfmon.o ptrace.o sal.o salinfo.o semaphore.o setup.o \
signal.o sys_ia64.o traps.o time.o unaligned.o unwind.o
obj-$(CONFIG_IA64_GENERIC) += iosapic.o
+obj-$(CONFIG_IA64_HP_ZX1) += iosapic.o
obj-$(CONFIG_IA64_DIG) += iosapic.o
obj-$(CONFIG_IA64_PALINFO) += palinfo.o
obj-$(CONFIG_EFI_VARS) += efivars.o
diff -urN linux-2.4.18/arch/ia64/kernel/acpi.c lia64-2.4/arch/ia64/kernel/acpi.c
--- linux-2.4.18/arch/ia64/kernel/acpi.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/acpi.c Wed Apr 10 11:25:39 2002
@@ -1,21 +1,34 @@
/*
- * Advanced Configuration and Power Interface
+ * acpi.c - Architecture-Specific Low-Level ACPI Support
*
- * Based on 'ACPI Specification 1.0b' February 2, 1999 and
- * 'IA-64 Extensions to ACPI Specification' Revision 0.6
+ * Copyright (C) 1999 VA Linux Systems
+ * Copyright (C) 1999,2000 Walt Drummond
+ * Copyright (C) 2000 Hewlett-Packard Co.
+ * Copyright (C) 2000 David Mosberger-Tang
+ * Copyright (C) 2000 Intel Corp.
+ * Copyright (C) 2000,2001 J.I. Lee
+ * Copyright (C) 2001 Paul Diefenbaugh
*
- * Copyright (C) 1999 VA Linux Systems
- * Copyright (C) 1999,2000 Walt Drummond
- * Copyright (C) 2000 Hewlett-Packard Co.
- * Copyright (C) 2000 David Mosberger-Tang
- * Copyright (C) 2000 Intel Corp.
- * Copyright (C) 2000,2001 J.I. Lee
- * ACPI based kernel configuration manager.
- * ACPI 2.0 & IA64 ext 0.71
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
#include
-
#include
#include
#include
@@ -23,29 +36,16 @@
#include
#include
#include
-#ifdef CONFIG_SERIAL_ACPI
-#include
-#endif
-
-#include
-#include
+#include
#include
#include
#include
#include
#include
+#include
-#undef ACPI_DEBUG /* Guess what this does? */
-
-/* global array to record platform interrupt vectors for generic int routing */
-int platform_irq_list[ACPI_MAX_PLATFORM_IRQS];
-/* These are ugly but will be reclaimed by the kernel */
-int __initdata available_cpus;
-int __initdata total_cpus;
-
-void (*pm_idle) (void);
-void (*pm_power_off) (void);
+#define PREFIX "ACPI: "
asm (".weak iosapic_register_irq");
asm (".weak iosapic_register_legacy_irq");
@@ -53,10 +53,16 @@
asm (".weak iosapic_init");
asm (".weak iosapic_version");
+void (*pm_idle) (void);
+void (*pm_power_off) (void);
+
+
+/*
+ * TBD: Should go away once we have an ACPI parser.
+ */
const char *
acpi_get_sysname (void)
{
- /* the following should go away once we have an ACPI parser: */
#ifdef CONFIG_IA64_GENERIC
return "hpsim";
#else
@@ -72,16 +78,19 @@
# error Unknown platform. Fix acpi.c.
# endif
#endif
-
}
+#define ACPI_MAX_PLATFORM_IRQS 256
+
+/* Array to record platform interrupt vectors for generic interrupt routing. */
+int platform_irq_list[ACPI_MAX_PLATFORM_IRQS];
+
/*
- * Interrupt routing API for device drivers.
- * Provides the interrupt vector for a generic platform event
- * (currently only CPEI implemented)
+ * Interrupt routing API for device drivers. Provides interrupt vector for
+ * a generic platform event. Currently only CPEI is implemented.
*/
int
-acpi_request_vector(u32 int_type)
+acpi_request_vector (u32 int_type)
{
int vector = -1;
@@ -94,586 +103,492 @@
return vector;
}
-/*
- * Configure legacy IRQ information.
- */
-static void __init
-acpi_legacy_irq (char *p)
+
+/* --------------------------------------------------------------------------
+ Boot-time Table Parsing
+ -------------------------------------------------------------------------- */
+
+static int total_cpus __initdata;
+static int available_cpus __initdata;
+struct acpi_table_madt * acpi_madt __initdata;
+
+
+static int __init
+acpi_parse_lapic_addr_ovr (acpi_table_entry_header *header)
{
- acpi_entry_int_override_t *legacy = (acpi_entry_int_override_t *) p;
- unsigned long polarity = 0, edge_triggered = 0;
+ struct acpi_table_lapic_addr_ovr *lapic = NULL;
- /*
- * If the platform we're running doesn't define
- * iosapic_register_legacy_irq(), we ignore this info...
- */
- if (!iosapic_register_legacy_irq)
- return;
+ lapic = (struct acpi_table_lapic_addr_ovr *) header;
+ if (!lapic)
+ return -EINVAL;
+
+ acpi_table_print_madt_entry(header);
- switch (legacy->flags) {
- case 0x5: polarity = 1; edge_triggered = 1; break;
- case 0x7: polarity = 0; edge_triggered = 1; break;
- case 0xd: polarity = 1; edge_triggered = 0; break;
- case 0xf: polarity = 0; edge_triggered = 0; break;
- default:
- printk(" ACPI Legacy IRQ 0x%02x: Unknown flags 0x%x\n", legacy->isa_irq,
- legacy->flags);
- break;
+ if (lapic->address) {
+ iounmap((void *) ipi_base_addr);
+ ipi_base_addr = (unsigned long) ioremap(lapic->address, 0);
}
- iosapic_register_legacy_irq(legacy->isa_irq, legacy->pin, polarity, edge_triggered);
+
+ return 0;
}
-/*
- * ACPI 2.0 tables parsing functions
- */
-static unsigned long
-readl_unaligned(void *p)
+static int __init
+acpi_parse_lsapic (acpi_table_entry_header *header)
{
- unsigned long ret;
-
- memcpy(&ret, p, sizeof(long));
- return ret;
-}
+ struct acpi_table_lsapic *lsapic = NULL;
-/*
- * Identify usable CPU's and remember them for SMP bringup later.
- */
-static void __init
-acpi20_lsapic (char *p)
-{
- int add = 1;
+ lsapic = (struct acpi_table_lsapic *) header;
+ if (!lsapic)
+ return -EINVAL;
- acpi20_entry_lsapic_t *lsapic = (acpi20_entry_lsapic_t *) p;
- printk(" CPU %.04x:%.04x: ", lsapic->eid, lsapic->id);
+ acpi_table_print_madt_entry(header);
- if ((lsapic->flags & LSAPIC_ENABLED) == 0) {
- printk("disabled.\n");
- add = 0;
- }
+ printk("CPU %d (0x%04x)", total_cpus, (lsapic->id << 8) | lsapic->eid);
-#ifdef CONFIG_SMP
- smp_boot_data.cpu_phys_id[total_cpus] = -1;
-#endif
- if (add) {
+ if (lsapic->flags.enabled) {
available_cpus++;
- printk("available");
+ printk(" enabled");
#ifdef CONFIG_SMP
smp_boot_data.cpu_phys_id[total_cpus] = (lsapic->id << 8) | lsapic->eid;
if (hard_smp_processor_id() == smp_boot_data.cpu_phys_id[total_cpus])
printk(" (BSP)");
#endif
- printk(".\n");
}
+ else {
+ printk(" disabled");
+#ifdef CONFIG_SMP
+ smp_boot_data.cpu_phys_id[total_cpus] = -1;
+#endif
+ }
+
+ printk("\n");
+
total_cpus++;
+ return 0;
}
-/*
- * Extract iosapic info from madt (again) to determine which iosapic
- * this platform interrupt resides in
- */
+
static int __init
-acpi20_which_iosapic (int global_vector, acpi_madt_t *madt, u32 *irq_base, char **iosapic_address)
+acpi_parse_lapic_nmi (acpi_table_entry_header *header)
{
- acpi_entry_iosapic_t *iosapic;
- char *p, *end;
- int ver, max_pin;
+ struct acpi_table_lapic_nmi *lacpi_nmi = NULL;
+
+ lacpi_nmi = (struct acpi_table_lapic_nmi*) header;
+ if (!lacpi_nmi)
+ return -EINVAL;
- p = (char *) (madt + 1);
- end = p + (madt->header.length - sizeof(acpi_madt_t));
+ acpi_table_print_madt_entry(header);
+
+ /* TBD: Support lapic_nmi entries */
+
+ return 0;
+}
+
+
+static int __init
+acpi_find_iosapic (int global_vector, u32 *irq_base, char **iosapic_address)
+{
+ struct acpi_table_iosapic *iosapic = NULL;
+ int ver = 0;
+ int max_pin = 0;
+ char *p = 0;
+ char *end = 0;
+
+ if (!irq_base || !iosapic_address)
+ return -ENODEV;
+
+ p = (char *) (acpi_madt + 1);
+ end = p + (acpi_madt->header.length - sizeof(struct acpi_table_madt));
while (p < end) {
- switch (*p) {
- case ACPI20_ENTRY_IO_SAPIC:
- /* collect IOSAPIC info for platform int use later */
- iosapic = (acpi_entry_iosapic_t *)p;
- *irq_base = iosapic->irq_base;
+ if (*p == ACPI_MADT_IOSAPIC) {
+ iosapic = (struct acpi_table_iosapic *) p;
+
+ *irq_base = iosapic->global_irq_base;
*iosapic_address = ioremap(iosapic->address, 0);
- /* is this the iosapic we're looking for? */
+
ver = iosapic_version(*iosapic_address);
max_pin = (ver >> 16) & 0xff;
+
if ((global_vector - *irq_base) <= max_pin)
- return 0; /* found it! */
- break;
- default:
- break;
+ return 0; /* Found it! */
}
p += p[1];
}
- return 1;
+ return -ENODEV;
}
-/*
- * Info on platform interrupt sources: NMI, PMI, INIT, etc.
- */
-static void __init
-acpi20_platform (char *p, acpi_madt_t *madt)
+
+static int __init
+acpi_parse_iosapic (acpi_table_entry_header *header)
{
- int vector;
- u32 irq_base;
- char *iosapic_address;
- unsigned long polarity = 0, trigger = 0;
- acpi20_entry_platform_src_t *plat = (acpi20_entry_platform_src_t *) p;
+ struct acpi_table_iosapic *iosapic;
+
+ iosapic = (struct acpi_table_iosapic *) header;
+ if (!iosapic)
+ return -EINVAL;
+
+ acpi_table_print_madt_entry(header);
+
+ if (iosapic_init) {
+#ifndef CONFIG_ITANIUM
+ /* PCAT_COMPAT flag indicates dual-8259 setup */
+ iosapic_init(iosapic->address, iosapic->global_irq_base,
+ acpi_madt->flags.pcat_compat);
+#else
+ /* Firmware on old Itanium systems is broken */
+ iosapic_init(iosapic->address, iosapic->global_irq_base, 1);
+#endif
+ }
+ return 0;
+}
+
- printk("PLATFORM: IOSAPIC %x -> Vector %x on CPU %.04u:%.04u\n",
- plat->iosapic_vector, plat->global_vector, plat->eid, plat->id);
+static int __init
+acpi_parse_plat_int_src (acpi_table_entry_header *header)
+{
+ struct acpi_table_plat_int_src *plintsrc = NULL;
+ int vector = 0;
+ u32 irq_base = 0;
+ char *iosapic_address = NULL;
+
+ plintsrc = (struct acpi_table_plat_int_src *) header;
+ if (!plintsrc)
+ return -EINVAL;
- /* record platform interrupt vectors for generic int routing code */
+ acpi_table_print_madt_entry(header);
if (!iosapic_register_platform_irq) {
- printk("acpi20_platform(): no ACPI platform IRQ support\n");
- return;
+ printk(KERN_WARNING PREFIX "No ACPI platform IRQ support\n");
+ return -ENODEV;
}
- /* extract polarity and trigger info from flags */
- switch (plat->flags) {
- case 0x5: polarity = 1; trigger = 1; break;
- case 0x7: polarity = 0; trigger = 1; break;
- case 0xd: polarity = 1; trigger = 0; break;
- case 0xf: polarity = 0; trigger = 0; break;
- default:
- printk("acpi20_platform(): unknown flags 0x%x\n", plat->flags);
- break;
- }
-
- /* which iosapic does this IRQ belong to? */
- if (acpi20_which_iosapic(plat->global_vector, madt, &irq_base, &iosapic_address)) {
- printk("acpi20_platform(): I/O SAPIC not found!\n");
- return;
+ if (0 != acpi_find_iosapic(plintsrc->global_irq, &irq_base, &iosapic_address)) {
+ printk(KERN_WARNING PREFIX "IOSAPIC not found\n");
+ return -ENODEV;
}
/*
- * get vector assignment for this IRQ, set attributes, and program the IOSAPIC
- * routing table
+ * Get vector assignment for this IRQ, set attributes, and program the
+ * IOSAPIC routing table.
*/
- vector = iosapic_register_platform_irq(plat->int_type,
- plat->global_vector,
- plat->iosapic_vector,
- plat->eid,
- plat->id,
- polarity,
- trigger,
- irq_base,
- iosapic_address);
- platform_irq_list[plat->int_type] = vector;
+ vector = iosapic_register_platform_irq (plintsrc->type,
+ plintsrc->global_irq,
+ plintsrc->iosapic_vector,
+ plintsrc->eid,
+ plintsrc->id,
+ (plintsrc->flags.polarity == 1) ? 1 : 0,
+ (plintsrc->flags.trigger == 1) ? 1 : 0,
+ irq_base,
+ iosapic_address);
+
+ platform_irq_list[plintsrc->type] = vector;
+ return 0;
}
-/*
- * Override the physical address of the local APIC in the MADT stable header.
- */
-static void __init
-acpi20_lapic_addr_override (char *p)
+
+static int __init
+acpi_parse_int_src_ovr (acpi_table_entry_header *header)
{
- acpi20_entry_lapic_addr_override_t * lapic = (acpi20_entry_lapic_addr_override_t *) p;
+ struct acpi_table_int_src_ovr *p = NULL;
- if (lapic->lapic_address) {
- iounmap((void *)ipi_base_addr);
- ipi_base_addr = (unsigned long) ioremap(lapic->lapic_address, 0);
+ p = (struct acpi_table_int_src_ovr *) header;
+ if (!p)
+ return -EINVAL;
- printk("LOCAL ACPI override to 0x%lx(p=0x%lx)\n",
- ipi_base_addr, lapic->lapic_address);
- }
+ acpi_table_print_madt_entry(header);
+
+ /* Ignore if the platform doesn't support overrides */
+ if (!iosapic_register_legacy_irq)
+ return 0;
+
+ iosapic_register_legacy_irq(p->bus_irq, p->global_irq,
+ (p->flags.polarity == 1) ? 1 : 0,
+ (p->flags.trigger == 1) ? 1 : 0);
+
+ return 0;
}
-/*
- * Parse the ACPI Multiple APIC Description Table
- */
-static void __init
-acpi20_parse_madt (acpi_madt_t *madt)
+
+static int __init
+acpi_parse_nmi_src (acpi_table_entry_header *header)
{
- acpi_entry_iosapic_t *iosapic = NULL;
- acpi20_entry_lsapic_t *lsapic = NULL;
- char *p, *end;
- int i;
-
- /* Base address of IPI Message Block */
- if (madt->lapic_address) {
- ipi_base_addr = (unsigned long) ioremap(madt->lapic_address, 0);
- printk("Lapic address set to 0x%lx\n", ipi_base_addr);
- } else
- printk("Lapic address set to default 0x%lx\n", ipi_base_addr);
+ struct acpi_table_nmi_src *nmi_src = NULL;
- p = (char *) (madt + 1);
- end = p + (madt->header.length - sizeof(acpi_madt_t));
+ nmi_src = (struct acpi_table_nmi_src*) header;
+ if (!nmi_src)
+ return -EINVAL;
- /* Initialize platform interrupt vector array */
- for (i = 0; i < ACPI_MAX_PLATFORM_IRQS; i++)
- platform_irq_list[i] = -1;
+ acpi_table_print_madt_entry(header);
- /*
- * Split-up entry parsing to ensure ordering.
- */
- while (p < end) {
- switch (*p) {
- case ACPI20_ENTRY_LOCAL_APIC_ADDR_OVERRIDE:
- printk("ACPI 2.0 MADT: LOCAL APIC Override\n");
- acpi20_lapic_addr_override(p);
- break;
-
- case ACPI20_ENTRY_LOCAL_SAPIC:
- printk("ACPI 2.0 MADT: LOCAL SAPIC\n");
- lsapic = (acpi20_entry_lsapic_t *) p;
- acpi20_lsapic(p);
- break;
-
- case ACPI20_ENTRY_IO_SAPIC:
- iosapic = (acpi_entry_iosapic_t *) p;
- if (iosapic_init)
- /*
- * The PCAT_COMPAT flag indicates that the system has a
- * dual-8259 compatible setup.
- */
- iosapic_init(iosapic->address, iosapic->irq_base,
-#ifdef CONFIG_ITANIUM
- 1 /* fw on some Itanium systems is broken... */
-#else
- (madt->flags & MADT_PCAT_COMPAT)
-#endif
- );
- break;
+ /* TBD: Support nmi_src entries */
- case ACPI20_ENTRY_PLATFORM_INT_SOURCE:
- printk("ACPI 2.0 MADT: PLATFORM INT SOURCE\n");
- acpi20_platform(p, madt);
- break;
-
- case ACPI20_ENTRY_LOCAL_APIC:
- printk("ACPI 2.0 MADT: LOCAL APIC entry\n"); break;
- case ACPI20_ENTRY_IO_APIC:
- printk("ACPI 2.0 MADT: IO APIC entry\n"); break;
- case ACPI20_ENTRY_NMI_SOURCE:
- printk("ACPI 2.0 MADT: NMI SOURCE entry\n"); break;
- case ACPI20_ENTRY_LOCAL_APIC_NMI:
- printk("ACPI 2.0 MADT: LOCAL APIC NMI entry\n"); break;
- case ACPI20_ENTRY_INT_SRC_OVERRIDE:
- break;
- default:
- printk("ACPI 2.0 MADT: unknown entry skip\n"); break;
- break;
- }
- p += p[1];
- }
+ return 0;
+}
- p = (char *) (madt + 1);
- end = p + (madt->header.length - sizeof(acpi_madt_t));
- while (p < end) {
- switch (*p) {
- case ACPI20_ENTRY_LOCAL_APIC:
- if (lsapic) break;
- printk("ACPI 2.0 MADT: LOCAL APIC entry\n");
- /* parse local apic if there's no local Sapic */
- break;
- case ACPI20_ENTRY_IO_APIC:
- if (iosapic) break;
- printk("ACPI 2.0 MADT: IO APIC entry\n");
- /* parse ioapic if there's no ioSapic */
- break;
- default:
- break;
- }
- p += p[1];
- }
+static int __init
+acpi_parse_madt (unsigned long phys_addr, unsigned long size)
+{
+ int i = 0;
- p = (char *) (madt + 1);
- end = p + (madt->header.length - sizeof(acpi_madt_t));
+ if (!phys_addr || !size)
+ return -EINVAL;
- while (p < end) {
- switch (*p) {
- case ACPI20_ENTRY_INT_SRC_OVERRIDE:
- printk("ACPI 2.0 MADT: INT SOURCE Override\n");
- acpi_legacy_irq(p);
- break;
- default:
- break;
- }
- p += p[1];
+ acpi_madt = (struct acpi_table_madt *) __va(phys_addr);
+ if (!acpi_madt) {
+ printk(KERN_WARNING PREFIX "Unable to map MADT\n");
+ return -ENODEV;
}
- /* Make bootup pretty */
- printk(" %d CPUs available, %d CPUs total\n",
- available_cpus, total_cpus);
+ /* Initialize platform interrupt vector array */
+
+ for (i = 0; i < ACPI_MAX_PLATFORM_IRQS; i++)
+ platform_irq_list[i] = -1;
+
+ /* Get base address of IPI Message Block */
+
+ if (acpi_madt->lapic_address)
+ ipi_base_addr = (unsigned long)
+ ioremap(acpi_madt->lapic_address, 0);
+
+ printk(KERN_INFO PREFIX "Local APIC address 0x%lx\n", ipi_base_addr);
+
+ return 0;
}
+
int __init
-acpi20_parse (acpi20_rsdp_t *rsdp20)
+acpi_find_rsdp (unsigned long *rsdp_phys)
{
-# ifdef CONFIG_ACPI
- acpi_xsdt_t *xsdt;
- acpi_desc_table_hdr_t *hdrp;
- acpi_madt_t *madt;
- int tables, i;
+ if (!rsdp_phys)
+ return -EINVAL;
- if (strncmp(rsdp20->signature, ACPI_RSDP_SIG, ACPI_RSDP_SIG_LEN)) {
- printk("ACPI 2.0 RSDP signature incorrect!\n");
+ if (efi.acpi20) {
+ (*rsdp_phys) = __pa(efi.acpi20);
return 0;
- } else {
- printk("ACPI 2.0 Root System Description Ptr at 0x%lx\n",
- (unsigned long)rsdp20);
}
-
- xsdt = __va(rsdp20->xsdt);
- hdrp = &xsdt->header;
- if (strncmp(hdrp->signature,
- ACPI_XSDT_SIG, ACPI_XSDT_SIG_LEN)) {
- printk("ACPI 2.0 XSDT signature incorrect. Trying RSDT\n");
- /* RSDT parsing here */
- return 0;
- } else {
- printk("ACPI 2.0 XSDT at 0x%lx (p=0x%lx)\n",
- (unsigned long)xsdt, (unsigned long)rsdp20->xsdt);
+ else if (efi.acpi) {
+ printk(KERN_WARNING PREFIX "v1.0/r0.71 tables no longer supported\n");
}
- printk("ACPI 2.0: %.6s %.8s %d.%d\n",
- hdrp->oem_id,
- hdrp->oem_table_id,
- hdrp->oem_revision >> 16,
- hdrp->oem_revision & 0xffff);
+ return -ENODEV;
+}
+
- acpi_cf_init((void *)rsdp20);
+#ifdef CONFIG_SERIAL_ACPI
- tables =(hdrp->length -sizeof(acpi_desc_table_hdr_t))>>3;
+#include
- for (i = 0; i < tables; i++) {
- hdrp = (acpi_desc_table_hdr_t *) __va(readl_unaligned(&xsdt->entry_ptrs[i]));
- printk(" :table %4.4s found\n", hdrp->signature);
+static int __init
+acpi_parse_spcr (unsigned long phys_addr, unsigned long size)
+{
+ acpi_ser_t *spcr = NULL;
+ unsigned long global_int = 0;
- /* Only interested int the MADT table for now ... */
- if (strncmp(hdrp->signature,
- ACPI_MADT_SIG, ACPI_MADT_SIG_LEN) != 0)
- continue;
+ if (!phys_addr || !size)
+ return -EINVAL;
- /* Save MADT pointer for later */
- madt = (acpi_madt_t *) hdrp;
- acpi20_parse_madt(madt);
- }
+ if (!iosapic_register_irq)
+ return -ENODEV;
-#ifdef CONFIG_SERIAL_ACPI
/*
- * Now we're interested in other tables. We want the iosapics already
- * initialized, so we do it in a separate loop.
+ * ACPI is able to describe serial ports that live at non-standard
+ * memory addresses and use non-standard interrupts, either via
+ * direct SAPIC mappings or via PCI interrupts. We handle interrupt
+ * routing for SAPIC-based (non-PCI) devices here. Interrupt routing
+ * for PCI devices will be handled when processing the PCI Interrupt
+ * Routing Table (PRT).
*/
- for (i = 0; i < tables; i++) {
- hdrp = (acpi_desc_table_hdr_t *) __va(readl_unaligned(&xsdt->entry_ptrs[i]));
- /*
- * search for SPCR and DBGP table entries so we can enable
- * non-pci interrupts to IO-SAPICs.
- */
- if (!strncmp(hdrp->signature, ACPI_SPCRT_SIG, ACPI_SPCRT_SIG_LEN) ||
- !strncmp(hdrp->signature, ACPI_DBGPT_SIG, ACPI_DBGPT_SIG_LEN))
- {
- acpi_ser_t *spcr = (void *)hdrp;
- unsigned long global_int;
-
- setup_serial_acpi(hdrp);
-
- /*
- * ACPI is able to describe serial ports that live at non-standard
- * memory space addresses and use SAPIC interrupts. If not also
- * PCI devices, there would be no interrupt vector information for
- * them. This checks for and fixes that situation.
- */
- if (spcr->length < sizeof(acpi_ser_t))
- /* table is not long enough for full info, thus no int */
- break;
-
- /*
- * If the device is not in PCI space, but uses a SAPIC interrupt,
- * we need to program the SAPIC so that serial can autoprobe for
- * the IA64 interrupt vector later on. If the device is in PCI
- * space, it should already be setup via the PCI vectors
- */
- if (spcr->base_addr.space_id != ACPI_SERIAL_PCICONF_SPACE &&
- spcr->int_type == ACPI_SERIAL_INT_SAPIC)
- {
- u32 irq_base;
- char *iosapic_address;
- int vector;
-
- /* We have a UART in memory space with a SAPIC interrupt */
- global_int = ( (spcr->global_int[3] << 24)
- | (spcr->global_int[2] << 16)
- | (spcr->global_int[1] << 8)
- | spcr->global_int[0]);
-
- if (!iosapic_register_irq)
- continue;
-
- /* which iosapic does this IRQ belong to? */
- if (acpi20_which_iosapic(global_int, madt, &irq_base,
- &iosapic_address) == 0)
- {
- vector = iosapic_register_irq(global_int,
- 1, /* active high polarity */
- 1, /* edge triggered */
- irq_base,
- iosapic_address);
- }
- }
- }
+
+ spcr = (acpi_ser_t *) __va(phys_addr);
+ if (!spcr) {
+ printk(KERN_WARNING PREFIX "Unable to map SPCR\n");
+ return -ENODEV;
}
-#endif
- acpi_cf_terminate();
-# ifdef CONFIG_SMP
- if (available_cpus == 0) {
- printk("ACPI: Found 0 CPUS; assuming 1\n");
- available_cpus = 1; /* We've got at least one of these, no? */
+ setup_serial_acpi(spcr);
+
+ if (spcr->length < sizeof(acpi_ser_t))
+ /* Table not long enough for full info, thus no interrupt */
+ return -ENODEV;
+
+ if ((spcr->base_addr.space_id != ACPI_SERIAL_PCICONF_SPACE) &&
+ (spcr->int_type == ACPI_SERIAL_INT_SAPIC))
+ {
+ u32 irq_base = 0;
+ char *iosapic_address = NULL;
+ int vector = 0;
+
+ /* We have a UART in memory space with an SAPIC interrupt */
+
+ global_int = ( (spcr->global_int[3] << 24) |
+ (spcr->global_int[2] << 16) |
+ (spcr->global_int[1] << 8) |
+ (spcr->global_int[0]) );
+
+ /* Which iosapic does this IRQ belong to? */
+
+ if (0 == acpi_find_iosapic(global_int, &irq_base, &iosapic_address)) {
+ vector = iosapic_register_irq (global_int, 1, 1,
+ irq_base, iosapic_address);
+ }
}
- smp_boot_data.cpu_count = total_cpus;
-# endif
-# endif /* CONFIG_ACPI */
- return 1;
+ return 0;
}
-/*
- * ACPI 1.0b with 0.71 IA64 extensions functions; should be removed once all
- * platforms start supporting ACPI 2.0
- */
-/*
- * Identify usable CPU's and remember them for SMP bringup later.
- */
-static void __init
-acpi_lsapic (char *p)
+#endif /*CONFIG_SERIAL_ACPI*/
+
+
+int __init
+acpi_boot_init (char *cmdline)
{
- int add = 1;
+ int result = 0;
+
+ /* Initialize the ACPI boot-time table parser */
+ result = acpi_table_init(cmdline);
+ if (0 != result)
+ return result;
- acpi_entry_lsapic_t *lsapic = (acpi_entry_lsapic_t *) p;
+ /*
+ * MADT
+ * ----
+ * Parse the Multiple APIC Description Table (MADT), if it exists.
+ * Note that this table provides platform SMP configuration
+ * information -- the successor to MPS tables.
+ */
- if ((lsapic->flags & LSAPIC_PRESENT) == 0)
- return;
+ result = acpi_table_parse(ACPI_APIC, acpi_parse_madt);
+ if (1 > result)
+ return result;
- printk(" CPU %d (%.04x:%.04x): ", total_cpus, lsapic->eid, lsapic->id);
+ /* Local APIC */
- if ((lsapic->flags & LSAPIC_ENABLED) == 0) {
- printk("Disabled.\n");
- add = 0;
- } else if (lsapic->flags & LSAPIC_PERFORMANCE_RESTRICTED) {
- printk("Performance Restricted; ignoring.\n");
- add = 0;
+ result = acpi_table_parse_madt(ACPI_MADT_LAPIC_ADDR_OVR, acpi_parse_lapic_addr_ovr);
+ if (0 > result) {
+ printk(KERN_ERR PREFIX "Error parsing LAPIC address override entry\n");
+ return result;
}
-#ifdef CONFIG_SMP
- smp_boot_data.cpu_phys_id[total_cpus] = -1;
-#endif
- if (add) {
- printk("Available.\n");
- available_cpus++;
-#ifdef CONFIG_SMP
- smp_boot_data.cpu_phys_id[total_cpus] = (lsapic->id << 8) | lsapic->eid;
-#endif /* CONFIG_SMP */
+ result = acpi_table_parse_madt(ACPI_MADT_LSAPIC, acpi_parse_lsapic);
+ if (1 > result) {
+ printk(KERN_ERR PREFIX "Error parsing MADT - no LAPIC entries!\n");
+ return -ENODEV;
}
- total_cpus++;
-}
-/*
- * Info on platform interrupt sources: NMI. PMI, INIT, etc.
- */
-static void __init
-acpi_platform (char *p)
-{
- acpi_entry_platform_src_t *plat = (acpi_entry_platform_src_t *) p;
+ result = acpi_table_parse_madt(ACPI_MADT_LAPIC_NMI, acpi_parse_lapic_nmi);
+ if (0 > result) {
+ printk(KERN_ERR PREFIX "Error parsing LAPIC NMI entry\n");
+ return result;
+ }
- printk("PLATFORM: IOSAPIC %x -> Vector %x on CPU %.04u:%.04u\n",
- plat->iosapic_vector, plat->global_vector, plat->eid, plat->id);
-}
+ /* I/O APIC */
-/*
- * Parse the ACPI Multiple SAPIC Table
- */
-static void __init
-acpi_parse_msapic (acpi_sapic_t *msapic)
-{
- acpi_entry_iosapic_t *iosapic;
- char *p, *end;
+ result = acpi_table_parse_madt(ACPI_MADT_IOSAPIC, acpi_parse_iosapic);
+ if (1 > result) {
+ printk(KERN_ERR PREFIX "Error parsing MADT - no IOAPIC entries!\n");
+ return ((result == 0) ? -ENODEV : result);
+ }
- /* Base address of IPI Message Block */
- ipi_base_addr = (unsigned long) ioremap(msapic->interrupt_block, 0);
+ /* System-Level Interrupt Routing */
- p = (char *) (msapic + 1);
- end = p + (msapic->header.length - sizeof(acpi_sapic_t));
+ result = acpi_table_parse_madt(ACPI_MADT_PLAT_INT_SRC, acpi_parse_plat_int_src);
+ if (0 > result) {
+ printk(KERN_ERR PREFIX "Error parsing platform interrupt source entry\n");
+ return result;
+ }
- while (p < end) {
- switch (*p) {
- case ACPI_ENTRY_LOCAL_SAPIC:
- acpi_lsapic(p);
- break;
-
- case ACPI_ENTRY_IO_SAPIC:
- iosapic = (acpi_entry_iosapic_t *) p;
- if (iosapic_init)
- /*
- * The ACPI I/O SAPIC table doesn't have a PCAT_COMPAT
- * flag like the MADT table, but we can safely assume that
- * ACPI 1.0b systems have a dual-8259 setup.
- */
- iosapic_init(iosapic->address, iosapic->irq_base, 1);
- break;
-
- case ACPI_ENTRY_INT_SRC_OVERRIDE:
- acpi_legacy_irq(p);
- break;
-
- case ACPI_ENTRY_PLATFORM_INT_SOURCE:
- acpi_platform(p);
- break;
+ result = acpi_table_parse_madt(ACPI_MADT_INT_SRC_OVR, acpi_parse_int_src_ovr);
+ if (0 > result) {
+ printk(KERN_ERR PREFIX "Error parsing interrupt source overrides entry\n");
+ return result;
+ }
- default:
- break;
- }
+ result = acpi_table_parse_madt(ACPI_MADT_NMI_SRC, acpi_parse_nmi_src);
+ if (0 > result) {
+ printk(KERN_ERR PREFIX "Error parsing NMI SRC entry\n");
+ return result;
+ }
- /* Move to next table entry. */
- p += p[1];
+#ifdef CONFIG_SERIAL_ACPI
+ /*
+ * TBD: Need phased approach to table parsing (only do those absolutely
+ * required during boot-up). Recommend expanding concept of fix-
+ * feature devices (LDM) to include table-based devices such as
+ * serial ports, EC, SMBus, etc.
+ */
+ acpi_table_parse(ACPI_SPCR, acpi_parse_spcr);
+#endif /*CONFIG_SERIAL_ACPI*/
+
+#ifdef CONFIG_SMP
+ if (available_cpus == 0) {
+ printk("ACPI: Found 0 CPUS; assuming 1\n");
+ available_cpus = 1; /* We've got at least one of these, no? */
}
+ smp_boot_data.cpu_count = total_cpus;
+#endif
+ /* Make boot-up look pretty */
+ printk("%d CPUs available, %d CPUs total\n", available_cpus, total_cpus);
- /* Make bootup pretty */
- printk(" %d CPUs available, %d CPUs total\n", available_cpus, total_cpus);
+ return 0;
}
+
+/* --------------------------------------------------------------------------
+ PCI Interrupt Routing
+ -------------------------------------------------------------------------- */
+
int __init
-acpi_parse (acpi_rsdp_t *rsdp)
+acpi_get_prt (struct pci_vector_struct **vectors, int *count)
{
-# ifdef CONFIG_ACPI
- acpi_rsdt_t *rsdt;
- acpi_desc_table_hdr_t *hdrp;
- long tables, i;
+ struct pci_vector_struct *vector = NULL;
+ struct list_head *node = NULL;
+ struct acpi_prt_entry *entry = NULL;
+ int i = 0;
- if (strncmp(rsdp->signature, ACPI_RSDP_SIG, ACPI_RSDP_SIG_LEN)) {
- printk("Uh-oh, ACPI RSDP signature incorrect!\n");
- return 0;
- }
+ if (!vectors || !count)
+ return -EINVAL;
- rsdt = __va(rsdp->rsdt);
- if (strncmp(rsdt->header.signature, ACPI_RSDT_SIG, ACPI_RSDT_SIG_LEN)) {
- printk("Uh-oh, ACPI RDST signature incorrect!\n");
- return 0;
+ *vectors = NULL;
+ *count = 0;
+
+ if (acpi_prts.count < 0) {
+ printk(KERN_ERR PREFIX "No PCI IRQ routing entries\n");
+ return -ENODEV;
}
- printk("ACPI: %.6s %.8s %d.%d\n", rsdt->header.oem_id, rsdt->header.oem_table_id,
- rsdt->header.oem_revision >> 16, rsdt->header.oem_revision & 0xffff);
+ /* Allocate vectors */
- acpi_cf_init(rsdp);
+ *vectors = kmalloc(sizeof(struct pci_vector_struct) * acpi_prts.count, GFP_KERNEL);
+ if (!(*vectors))
+ return -ENOMEM;
- tables = (rsdt->header.length - sizeof(acpi_desc_table_hdr_t)) / 8;
- for (i = 0; i < tables; i++) {
- hdrp = (acpi_desc_table_hdr_t *) __va(rsdt->entry_ptrs[i]);
+ /* Convert PRT entries to IOSAPIC PCI vectors */
- /* Only interested int the MSAPIC table for now ... */
- if (strncmp(hdrp->signature, ACPI_SAPIC_SIG, ACPI_SAPIC_SIG_LEN) != 0)
- continue;
+ vector = *vectors;
- acpi_parse_msapic((acpi_sapic_t *) hdrp);
+ list_for_each(node, &acpi_prts.entries) {
+ entry = (struct acpi_prt_entry *)node;
+ vector[i].bus = (u16) entry->id.bus;
+ vector[i].pci_id = (u32) entry->id.dev << 16 | 0xffff;
+ vector[i].pin = (u8) entry->id.pin;
+ vector[i].irq = (u8) entry->source.index;
+ i++;
}
+ *count = acpi_prts.count;
+ return 0;
+}
- acpi_cf_terminate();
+/* Assume IA64 always uses I/O SAPIC */
-# ifdef CONFIG_SMP
- if (available_cpus == 0) {
- printk("ACPI: Found 0 CPUS; assuming 1\n");
- available_cpus = 1; /* We've got at least one of these, no? */
- }
- smp_boot_data.cpu_count = total_cpus;
-# endif
-# endif /* CONFIG_ACPI */
- return 1;
+int __init
+acpi_get_interrupt_model (int *type)
+{
+ if (!type)
+ return -EINVAL;
+
+ *type = ACPI_INT_MODEL_IOSAPIC;
+
+ return 0;
}
diff -urN linux-2.4.18/arch/ia64/kernel/brl_emu.c lia64-2.4/arch/ia64/kernel/brl_emu.c
--- linux-2.4.18/arch/ia64/kernel/brl_emu.c Thu Apr 5 12:51:47 2001
+++ lia64-2.4/arch/ia64/kernel/brl_emu.c Fri Feb 22 17:12:29 2002
@@ -2,6 +2,9 @@
* Emulation of the "brl" instruction for IA64 processors that
* don't support it in hardware.
* Author: Stephan Zeisset, Intel Corp.
+ *
+ * 02/22/02 D. Mosberger Clear si_flags, si_isr, and si_imm to avoid
+ * leaking kernel bits.
*/
#include
@@ -195,6 +198,9 @@
printk("Woah! Unimplemented Instruction Address Trap!\n");
siginfo.si_signo = SIGILL;
siginfo.si_errno = 0;
+ siginfo.si_flags = 0;
+ siginfo.si_isr = 0;
+ siginfo.si_imm = 0;
siginfo.si_code = ILL_BADIADDR;
force_sig_info(SIGILL, &siginfo, current);
} else if (ia64_psr(regs)->tb) {
@@ -205,6 +211,10 @@
siginfo.si_signo = SIGTRAP;
siginfo.si_errno = 0;
siginfo.si_code = TRAP_BRANCH;
+ siginfo.si_flags = 0;
+ siginfo.si_isr = 0;
+ siginfo.si_addr = 0;
+ siginfo.si_imm = 0;
force_sig_info(SIGTRAP, &siginfo, current);
} else if (ia64_psr(regs)->ss) {
/*
@@ -214,6 +224,10 @@
siginfo.si_signo = SIGTRAP;
siginfo.si_errno = 0;
siginfo.si_code = TRAP_TRACE;
+ siginfo.si_flags = 0;
+ siginfo.si_isr = 0;
+ siginfo.si_addr = 0;
+ siginfo.si_imm = 0;
force_sig_info(SIGTRAP, &siginfo, current);
}
return rv;
diff -urN linux-2.4.18/arch/ia64/kernel/efi.c lia64-2.4/arch/ia64/kernel/efi.c
--- linux-2.4.18/arch/ia64/kernel/efi.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/efi.c Wed Apr 10 11:53:19 2002
@@ -155,10 +155,10 @@
case EFI_CONVENTIONAL_MEMORY:
if (!(md->attribute & EFI_MEMORY_WB))
continue;
- if (md->phys_addr + (md->num_pages << 12) > mem_limit) {
+ if (md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) > mem_limit) {
if (md->phys_addr > mem_limit)
continue;
- md->num_pages = (mem_limit - md->phys_addr) >> 12;
+ md->num_pages = (mem_limit - md->phys_addr) >> EFI_PAGE_SHIFT;
}
if (md->num_pages == 0) {
printk("efi_memmap_walk: ignoring empty region at 0x%lx",
@@ -167,7 +167,7 @@
}
curr.start = PAGE_OFFSET + md->phys_addr;
- curr.end = curr.start + (md->num_pages << 12);
+ curr.end = curr.start + (md->num_pages << EFI_PAGE_SHIFT);
if (!prev_valid) {
prev = curr;
@@ -250,16 +250,17 @@
* dedicated ITR for the PAL code.
*/
if ((vaddr & mask) == (KERNEL_START & mask)) {
- printk(__FUNCTION__ ": no need to install ITR for PAL code\n");
+ printk("%s: no need to install ITR for PAL code\n", __FUNCTION__);
continue;
}
- if (md->num_pages << 12 > IA64_GRANULE_SIZE)
+ if (md->num_pages << EFI_PAGE_SHIFT > IA64_GRANULE_SIZE)
panic("Woah! PAL code size bigger than a granule!");
mask = ~((1 << IA64_GRANULE_SHIFT) - 1);
printk("CPU %d: mapping PAL code [0x%lx-0x%lx) into [0x%lx-0x%lx)\n",
- smp_processor_id(), md->phys_addr, md->phys_addr + (md->num_pages << 12),
+ smp_processor_id(), md->phys_addr,
+ md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT),
vaddr & mask, (vaddr & mask) + IA64_GRANULE_SIZE);
/*
@@ -375,7 +376,8 @@
md = p;
printk("mem%02u: type=%u, attr=0x%lx, range=[0x%016lx-0x%016lx) (%luMB)\n",
i, md->type, md->attribute, md->phys_addr,
- md->phys_addr + (md->num_pages<<12) - 1, md->num_pages >> 8);
+ md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1,
+ md->num_pages >> (20 - EFI_PAGE_SHIFT));
}
}
#endif
@@ -482,8 +484,50 @@
return 0;
}
+u32
+efi_mem_type (u64 phys_addr)
+{
+ void *efi_map_start, *efi_map_end, *p;
+ efi_memory_desc_t *md;
+ u64 efi_desc_size;
+
+ efi_map_start = __va(ia64_boot_param->efi_memmap);
+ efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
+ efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+ for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
+ md = p;
+
+ if ((md->phys_addr <= phys_addr) && (phys_addr <=
+ (md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1)))
+ return md->type;
+ }
+ return 0;
+}
+
+u64
+efi_mem_attributes (u64 phys_addr)
+{
+ void *efi_map_start, *efi_map_end, *p;
+ efi_memory_desc_t *md;
+ u64 efi_desc_size;
+
+ efi_map_start = __va(ia64_boot_param->efi_memmap);
+ efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
+ efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+ for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
+ md = p;
+
+ if ((md->phys_addr <= phys_addr) && (phys_addr <=
+ (md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1)))
+ return md->attribute;
+ }
+ return 0;
+}
+
static void __exit
-efivars_exit(void)
+efivars_exit (void)
{
#ifdef CONFIG_PROC_FS
remove_proc_entry(efi_dir->name, NULL);
diff -urN linux-2.4.18/arch/ia64/kernel/efivars.c lia64-2.4/arch/ia64/kernel/efivars.c
--- linux-2.4.18/arch/ia64/kernel/efivars.c Wed Dec 26 16:58:36 2001
+++ lia64-2.4/arch/ia64/kernel/efivars.c Thu Mar 28 16:11:08 2002
@@ -29,6 +29,14 @@
*
* Changelog:
*
+ * 25 Mar 2002 - Matt Domsch
+ * move uuid_unparse() to include/asm-ia64/efi.h:efi_guid_unparse()
+ *
+ * 12 Feb 2002 - Matt Domsch
+ * use list_for_each_safe when deleting vars.
+ * remove ifdef CONFIG_SMP around include
+ * v0.04 release to linux-ia64@linuxia64.org
+ *
* 20 April 2001 - Matt Domsch
* Moved vars from /proc/efi to /proc/efi/vars, and made
* efi.c own the /proc/efi directory.
@@ -56,18 +64,16 @@
#include /* for capable() */
#include
#include
+#include <linux/smp.h>
#include
#include
-#ifdef CONFIG_SMP
-#include <linux/smp.h>
-#endif
MODULE_AUTHOR("Matt Domsch ");
MODULE_DESCRIPTION("/proc interface to EFI Variables");
MODULE_LICENSE("GPL");
-#define EFIVARS_VERSION "0.03 2001-Apr-20"
+#define EFIVARS_VERSION "0.05 2002-Mar-26"
static int
efivar_read(char *page, char **start, off_t off,
@@ -138,20 +144,6 @@
return len;
}
-
-static void
-uuid_unparse(efi_guid_t *guid, char *out)
-{
- sprintf(out, "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",
- guid->data1, guid->data2, guid->data3,
- guid->data4[0], guid->data4[1], guid->data4[2], guid->data4[3],
- guid->data4[4], guid->data4[5], guid->data4[6], guid->data4[7]);
-}
-
-
-
-
-
/*
* efivar_create_proc_entry()
* Requires:
@@ -194,7 +186,7 @@
private variables from another's. */
*(short_name + strlen(short_name)) = '-';
- uuid_unparse(vendor_guid, short_name + strlen(short_name));
+ efi_guid_unparse(vendor_guid, short_name + strlen(short_name));
/* Create the entry in proc */
@@ -265,7 +257,7 @@
{
unsigned long strsize1, strsize2;
int found=0;
- struct list_head *pos;
+ struct list_head *pos, *n;
unsigned long size = sizeof(efi_variable_t);
efi_status_t status;
efivar_entry_t *efivar = data, *search_efivar = NULL;
@@ -297,7 +289,7 @@
This allows any properly formatted data structure to
be written to any of the files in /proc/efi/vars and it will work.
*/
- list_for_each(pos, &efivar_list) {
+ list_for_each_safe(pos, n, &efivar_list) {
search_efivar = efivar_entry(pos);
strsize1 = utf8_strsize(search_efivar->var.VariableName, 1024);
strsize2 = utf8_strsize(var_data->VariableName, 1024);
@@ -413,12 +405,12 @@
static void __exit
efivars_exit(void)
{
- struct list_head *pos;
+ struct list_head *pos, *n;
efivar_entry_t *efivar;
spin_lock(&efivars_lock);
- list_for_each(pos, &efivar_list) {
+ list_for_each_safe(pos, n, &efivar_list) {
efivar = efivar_entry(pos);
remove_proc_entry(efivar->entry->name, efi_vars_dir);
list_del(&efivar->list);
diff -urN linux-2.4.18/arch/ia64/kernel/entry.S lia64-2.4/arch/ia64/kernel/entry.S
--- linux-2.4.18/arch/ia64/kernel/entry.S Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/entry.S Tue Apr 9 22:01:38 2002
@@ -3,7 +3,7 @@
*
* Kernel entry points.
*
- * Copyright (C) 1998-2001 Hewlett-Packard Co
+ * Copyright (C) 1998-2002 Hewlett-Packard Co
* David Mosberger-Tang
* Copyright (C) 1999 VA Linux Systems
* Copyright (C) 1999 Walt Drummond
@@ -115,7 +115,7 @@
mov loc1=r16 // save ar.pfs across do_fork
.body
mov out1=in1
- mov out3=0
+ mov out3=16 // stacksize (compensates for 16-byte scratch area)
 adds out2=IA64_SWITCH_STACK_SIZE+16,sp // out2 = &regs
mov out0=in0 // out0 = clone_flags
br.call.sptk.many rp=do_fork
@@ -521,35 +521,38 @@
;;
mov.ret.sptk rp=r14,.restart
.restart:
+ // need_resched and signals atomic test
+(pUser) rsm psr.i
adds r17=IA64_TASK_NEED_RESCHED_OFFSET,r13
adds r18=IA64_TASK_SIGPENDING_OFFSET,r13
#ifdef CONFIG_PERFMON
- adds r19=IA64_TASK_PFM_MUST_BLOCK_OFFSET,r13
+ adds r19=IA64_TASK_PFM_OVFL_BLOCK_RESET_OFFSET,r13
#endif
;;
#ifdef CONFIG_PERFMON
-(pUser) ld8 r19=[r19] // load current->thread.pfm_must_block
+(pUser) ld8 r19=[r19] // load current->thread.pfm_ovfl_block_reset
#endif
(pUser) ld8 r17=[r17] // load current->need_resched
(pUser) ld4 r18=[r18] // load current->sigpending
;;
#ifdef CONFIG_PERFMON
-(pUser) cmp.ne.unc p9,p0=r19,r0 // current->thread.pfm_must_block != 0?
+(pUser) cmp.ne.unc p9,p0=r19,r0 // current->thread.pfm_ovfl_block_reset != 0?
#endif
(pUser) cmp.ne.unc p7,p0=r17,r0 // current->need_resched != 0?
(pUser) cmp.ne.unc p8,p0=r18,r0 // current->sigpending != 0?
;;
- adds r2=PT(R8)+16,r12
- adds r3=PT(R9)+16,r12
#ifdef CONFIG_PERFMON
-(p9) br.call.spnt.many b7=pfm_block_on_overflow
+(p9) br.call.spnt.many b7=pfm_ovfl_block_reset
#endif
#if __GNUC__ < 3
(p7) br.call.spnt.many b7=invoke_schedule
#else
(p7) br.call.spnt.many b7=schedule
#endif
-(p8) br.call.spnt.many b7=handle_signal_delivery // check & deliver pending signals
+(p8) br.call.spnt.many rp=handle_signal_delivery // check & deliver pending signals (once)
+ ;;
+.ret9: adds r2=PT(R8)+16,r12
+ adds r3=PT(R9)+16,r12
;;
// start restoring the state saved on the kernel stack (struct pt_regs):
ld8.fill r8=[r2],16
@@ -582,7 +585,7 @@
ld8.fill r30=[r2],16
ld8.fill r31=[r3],16
;;
- rsm psr.i | psr.ic // initiate turning off of interrupts & interruption collection
+ rsm psr.i | psr.ic // initiate turning off of interrupt and interruption collection
invala // invalidate ALAT
;;
ld8 r1=[r2],16 // ar.ccv
@@ -601,7 +604,7 @@
mov ar.fpsr=r13
mov b0=r14
;;
- srlz.i // ensure interrupts & interruption collection are off
+ srlz.i // ensure interruption collection is off
mov b7=r15
;;
bsw.0 // switch back to bank 0
@@ -664,23 +667,38 @@
/*
* To prevent leaking bits between the kernel and user-space,
* we must clear the stacked registers in the "invalid" partition here.
- * Not pretty, but at least it's fast (3.34 registers/cycle).
- * Architecturally, this loop could go at 4.67 registers/cycle, but that would
- * oversubscribe Itanium.
+ * Not pretty, but at least it's fast (3.34 registers/cycle on Itanium,
+ * 5 registers/cycle on McKinley).
*/
# define pRecurse p6
# define pReturn p7
+#ifdef CONFIG_ITANIUM
# define Nregs 10
+#else
+# define Nregs 14
+#endif
alloc loc0=ar.pfs,2,Nregs-2,2,0
shr.u loc1=r18,9 // RNaTslots <= dirtySize / (64*8) + 1
sub r17=r17,r18 // r17 = (physStackedSize + 8) - dirtySize
;;
+#if 1
+ .align 32 // see comment below about gas bug...
+#endif
mov ar.rsc=r19 // load ar.rsc to be used for "loadrs"
shladd in0=loc1,3,r17
mov in1=0
+#if 0
+ // gas-2.11.90 is unable to generate a stop bit after .align, which is bad,
+ // because alloc must be at the beginning of an insn-group.
+ .align 32
+#else
+ nop 0
+ nop 0
+ nop 0
+#endif
;;
-// .align 32 // gas-2.11.90 is unable to generate a stop bit after .align
rse_clear_invalid:
+#ifdef CONFIG_ITANIUM
// cycle 0
{ .mii
alloc loc0=ar.pfs,2,Nregs-2,2,0
@@ -709,9 +727,31 @@
mov loc7=0
(pReturn) br.ret.sptk.many b6
}
+#else /* !CONFIG_ITANIUM */
+ alloc loc0=ar.pfs,2,Nregs-2,2,0
+ cmp.lt pRecurse,p0=Nregs*8,in0 // if more than Nregs regs left to clear, (re)curse
+ add out0=-Nregs*8,in0
+ add out1=1,in1 // increment recursion count
+ mov loc1=0
+ mov loc2=0
+ ;;
+ mov loc3=0
+ mov loc4=0
+ mov loc9=0
+ mov loc5=0
+ mov loc6=0
+(pRecurse) br.call.sptk.many b6=rse_clear_invalid
+ ;;
+ mov loc7=0
+ mov loc8=0
+ cmp.ne pReturn,p0=r0,in1 // if recursion count != 0, we need to do a br.ret
+ mov loc10=0
+ mov loc11=0
+(pReturn) br.ret.sptk.many b6
+#endif /* !CONFIG_ITANIUM */
# undef pRecurse
# undef pReturn
-
+ ;;
alloc r17=ar.pfs,0,0,0,0 // drop current register frame
;;
loadrs
diff -urN linux-2.4.18/arch/ia64/kernel/gate.S lia64-2.4/arch/ia64/kernel/gate.S
--- linux-2.4.18/arch/ia64/kernel/gate.S Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/gate.S Wed Feb 20 13:49:41 2002
@@ -90,7 +90,7 @@
(p8) br.cond.spnt setup_rbs // yup -> (clobbers r14, r15, and r16)
back_from_setup_rbs:
- .save ar.pfs, r8
+ .spillreg ar.pfs, r8
alloc r8=ar.pfs,0,0,3,0 // get CFM0, EC0, and CPL0 into r8
ld8 out0=[base0],16 // load arg0 (signum)
adds base1=(ARG1_OFF-(RBS_BASE_OFF+SIGCONTEXT_OFF)),base1
diff -urN linux-2.4.18/arch/ia64/kernel/head.S lia64-2.4/arch/ia64/kernel/head.S
--- linux-2.4.18/arch/ia64/kernel/head.S Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/head.S Tue Apr 9 21:50:52 2002
@@ -180,10 +180,12 @@
.rodata
alive_msg:
stringz "I'm alive and well\n"
+alive_msg_end:
.previous
alloc r2=ar.pfs,0,0,2,0
movl out0=alive_msg
+ movl out1=alive_msg_end-alive_msg-1
;;
br.call.sptk.many rp=early_printk
1: // force new bundle
@@ -560,137 +562,114 @@
END(__ia64_load_fpu)
GLOBAL_ENTRY(__ia64_init_fpu)
- alloc r2=ar.pfs,0,0,0,0
- stf.spill [sp]=f0
- mov f32=f0
- ;;
- ldf.fill f33=[sp]
- ldf.fill f34=[sp]
- mov f35=f0
- ;;
- ldf.fill f36=[sp]
- ldf.fill f37=[sp]
- mov f38=f0
- ;;
- ldf.fill f39=[sp]
- ldf.fill f40=[sp]
- mov f41=f0
- ;;
- ldf.fill f42=[sp]
- ldf.fill f43=[sp]
- mov f44=f0
- ;;
- ldf.fill f45=[sp]
- ldf.fill f46=[sp]
- mov f47=f0
- ;;
- ldf.fill f48=[sp]
- ldf.fill f49=[sp]
- mov f50=f0
- ;;
- ldf.fill f51=[sp]
- ldf.fill f52=[sp]
- mov f53=f0
- ;;
- ldf.fill f54=[sp]
- ldf.fill f55=[sp]
- mov f56=f0
- ;;
- ldf.fill f57=[sp]
- ldf.fill f58=[sp]
- mov f59=f0
- ;;
- ldf.fill f60=[sp]
- ldf.fill f61=[sp]
- mov f62=f0
- ;;
- ldf.fill f63=[sp]
- ldf.fill f64=[sp]
- mov f65=f0
- ;;
- ldf.fill f66=[sp]
- ldf.fill f67=[sp]
- mov f68=f0
- ;;
- ldf.fill f69=[sp]
- ldf.fill f70=[sp]
- mov f71=f0
- ;;
- ldf.fill f72=[sp]
- ldf.fill f73=[sp]
- mov f74=f0
- ;;
- ldf.fill f75=[sp]
- ldf.fill f76=[sp]
- mov f77=f0
- ;;
- ldf.fill f78=[sp]
- ldf.fill f79=[sp]
- mov f80=f0
- ;;
- ldf.fill f81=[sp]
- ldf.fill f82=[sp]
- mov f83=f0
- ;;
- ldf.fill f84=[sp]
- ldf.fill f85=[sp]
- mov f86=f0
- ;;
- ldf.fill f87=[sp]
- ldf.fill f88=[sp]
- mov f89=f0
- ;;
- ldf.fill f90=[sp]
- ldf.fill f91=[sp]
- mov f92=f0
- ;;
- ldf.fill f93=[sp]
- ldf.fill f94=[sp]
- mov f95=f0
- ;;
- ldf.fill f96=[sp]
- ldf.fill f97=[sp]
- mov f98=f0
- ;;
- ldf.fill f99=[sp]
- ldf.fill f100=[sp]
- mov f101=f0
- ;;
- ldf.fill f102=[sp]
- ldf.fill f103=[sp]
- mov f104=f0
- ;;
- ldf.fill f105=[sp]
- ldf.fill f106=[sp]
- mov f107=f0
- ;;
- ldf.fill f108=[sp]
- ldf.fill f109=[sp]
- mov f110=f0
- ;;
- ldf.fill f111=[sp]
- ldf.fill f112=[sp]
- mov f113=f0
- ;;
- ldf.fill f114=[sp]
- ldf.fill f115=[sp]
- mov f116=f0
- ;;
- ldf.fill f117=[sp]
- ldf.fill f118=[sp]
- mov f119=f0
- ;;
- ldf.fill f120=[sp]
- ldf.fill f121=[sp]
- mov f122=f0
- ;;
- ldf.fill f123=[sp]
- ldf.fill f124=[sp]
- mov f125=f0
+ stf.spill [sp]=f0 // M3
+ mov f32=f0 // F
+ nop.b 0
+
+ ldfps f33,f34=[sp] // M0
+ ldfps f35,f36=[sp] // M1
+ mov f37=f0 // F
;;
- ldf.fill f126=[sp]
- mov f127=f0
- br.ret.sptk.many rp
+
+ setf.s f38=r0 // M2
+ setf.s f39=r0 // M3
+ mov f40=f0 // F
+
+ ldfps f41,f42=[sp] // M0
+ ldfps f43,f44=[sp] // M1
+ mov f45=f0 // F
+
+ setf.s f46=r0 // M2
+ setf.s f47=r0 // M3
+ mov f48=f0 // F
+
+ ldfps f49,f50=[sp] // M0
+ ldfps f51,f52=[sp] // M1
+ mov f53=f0 // F
+
+ setf.s f54=r0 // M2
+ setf.s f55=r0 // M3
+ mov f56=f0 // F
+
+ ldfps f57,f58=[sp] // M0
+ ldfps f59,f60=[sp] // M1
+ mov f61=f0 // F
+
+ setf.s f62=r0 // M2
+ setf.s f63=r0 // M3
+ mov f64=f0 // F
+
+ ldfps f65,f66=[sp] // M0
+ ldfps f67,f68=[sp] // M1
+ mov f69=f0 // F
+
+ setf.s f70=r0 // M2
+ setf.s f71=r0 // M3
+ mov f72=f0 // F
+
+ ldfps f73,f74=[sp] // M0
+ ldfps f75,f76=[sp] // M1
+ mov f77=f0 // F
+
+ setf.s f78=r0 // M2
+ setf.s f79=r0 // M3
+ mov f80=f0 // F
+
+ ldfps f81,f82=[sp] // M0
+ ldfps f83,f84=[sp] // M1
+ mov f85=f0 // F
+
+ setf.s f86=r0 // M2
+ setf.s f87=r0 // M3
+ mov f88=f0 // F
+
+ /*
+ * When the instructions are cached, it would be faster to initialize
+ * the remaining registers with simple mov instructions (F-unit).
+ * This gets the time down to ~29 cycles. However, this would use up
+ * 33 bundles, whereas continuing with the above pattern yields
+ * 10 bundles and ~30 cycles.
+ */
+
+ ldfps f89,f90=[sp] // M0
+ ldfps f91,f92=[sp] // M1
+ mov f93=f0 // F
+
+ setf.s f94=r0 // M2
+ setf.s f95=r0 // M3
+ mov f96=f0 // F
+
+ ldfps f97,f98=[sp] // M0
+ ldfps f99,f100=[sp] // M1
+ mov f101=f0 // F
+
+ setf.s f102=r0 // M2
+ setf.s f103=r0 // M3
+ mov f104=f0 // F
+
+ ldfps f105,f106=[sp] // M0
+ ldfps f107,f108=[sp] // M1
+ mov f109=f0 // F
+
+ setf.s f110=r0 // M2
+ setf.s f111=r0 // M3
+ mov f112=f0 // F
+
+ ldfps f113,f114=[sp] // M0
+ ldfps f115,f116=[sp] // M1
+ mov f117=f0 // F
+
+ setf.s f118=r0 // M2
+ setf.s f119=r0 // M3
+ mov f120=f0 // F
+
+ ldfps f121,f122=[sp] // M0
+ ldfps f123,f124=[sp] // M1
+ mov f125=f0 // F
+
+ setf.s f126=r0 // M2
+ setf.s f127=r0 // M3
+ br.ret.sptk.many rp // F
END(__ia64_init_fpu)
/*
diff -urN linux-2.4.18/arch/ia64/kernel/ia64_ksyms.c lia64-2.4/arch/ia64/kernel/ia64_ksyms.c
--- linux-2.4.18/arch/ia64/kernel/ia64_ksyms.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/ia64_ksyms.c Tue Apr 9 11:03:59 2002
@@ -6,7 +6,8 @@
#include
#include
-EXPORT_SYMBOL_NOVERS(memset);
+EXPORT_SYMBOL_NOVERS(__memset_generic);
+EXPORT_SYMBOL_NOVERS(__bzero);
EXPORT_SYMBOL(memchr);
EXPORT_SYMBOL(memcmp);
EXPORT_SYMBOL_NOVERS(memcpy);
@@ -24,6 +25,7 @@
EXPORT_SYMBOL(strrchr);
EXPORT_SYMBOL(strstr);
EXPORT_SYMBOL(strtok);
+EXPORT_SYMBOL(strpbrk);
#include
EXPORT_SYMBOL(isa_irq_to_vector_map);
@@ -147,3 +149,10 @@
#include
extern struct proc_dir_entry *efi_dir;
EXPORT_SYMBOL(efi_dir);
+
+#include <asm/machvec.h>
+#ifdef CONFIG_IA64_GENERIC
+EXPORT_SYMBOL(ia64_mv);
+#endif
+EXPORT_SYMBOL(machvec_noop);
+
diff -urN linux-2.4.18/arch/ia64/kernel/iosapic.c lia64-2.4/arch/ia64/kernel/iosapic.c
--- linux-2.4.18/arch/ia64/kernel/iosapic.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/iosapic.c Wed Apr 10 11:03:55 2002
@@ -3,8 +3,9 @@
*
* Copyright (C) 1999 Intel Corp.
* Copyright (C) 1999 Asit Mallick
- * Copyright (C) 1999-2000 Hewlett-Packard Co.
- * Copyright (C) 1999-2000 David Mosberger-Tang
+ * Copyright (C) 2000-2002 J.I. Lee
+ * Copyright (C) 1999-2000, 2002 Hewlett-Packard Co.
+ * David Mosberger-Tang
* Copyright (C) 1999 VA Linux Systems
* Copyright (C) 1999,2000 Walt Drummond
*
@@ -15,6 +16,13 @@
* PCI to vector mapping, shared PCI interrupts.
* 00/10/27 D. Mosberger Document things a bit more to make them more understandable.
* Clean up much of the old IOSAPIC cruft.
+ * 01/07/27 J.I. Lee PCI irq routing, Platform/Legacy interrupts and fixes for
+ * ACPI S5(SoftOff) support.
+ * 02/01/23 J.I. Lee iosapic pgm fixes for PCI irq routing from _PRT
+ * 02/01/07 E. Focht Redirectable interrupt vectors in
+ * iosapic_set_affinity(), initializations for
+ * /proc/irq/#/smp_affinity
+ * 02/04/02 P. Diefenbaugh Cleaned up ACPI PCI IRQ routing.
*/
/*
* Here is what the interrupt logic between a PCI device and the CPU looks like:
@@ -49,9 +57,8 @@
#include
#include
#include
+#include
-#include
-#include
#include
#include
#include
@@ -63,6 +70,7 @@
#undef DEBUG_IRQ_ROUTING
+#undef OVERRIDE_DEBUG
static spinlock_t iosapic_lock = SPIN_LOCK_UNLOCKED;
@@ -84,6 +92,32 @@
unsigned char trigger : 1; /* trigger mode (see iosapic.h) */
} iosapic_irq[IA64_NUM_VECTORS];
+static struct iosapic {
+ char *addr; /* base address of IOSAPIC */
+ unsigned char pcat_compat; /* 8259 compatibility flag */
+ unsigned char base_irq; /* first irq assigned to this IOSAPIC */
+ unsigned short max_pin; /* max input pin supported in this IOSAPIC */
+} iosapic_lists[256] __initdata;
+
+static int num_iosapic = 0;
+
+
+/*
+ * Find an IOSAPIC associated with an IRQ
+ */
+static inline int __init
+find_iosapic (unsigned int irq)
+{
+ int i;
+
+ for (i = 0; i < num_iosapic; i++) {
+ if ((irq - iosapic_lists[i].base_irq) < iosapic_lists[i].max_pin)
+ return i;
+ }
+
+ return -1;
+}
+
/*
* Translate IOSAPIC irq number to the corresponding IA-64 interrupt vector. If no
* entry exists, return -1.
@@ -121,6 +155,7 @@
u32 low32, high32;
char *addr;
int pin;
+ char redir;
pin = iosapic_irq[vector].pin;
if (pin < 0)
@@ -131,6 +166,11 @@
trigger = iosapic_irq[vector].trigger;
dmode = iosapic_irq[vector].dmode;
+ redir = (dmode == IOSAPIC_LOWEST_PRIORITY) ? 1 : 0;
+#ifdef CONFIG_SMP
+ set_irq_affinity_info(vector, (int)(dest & 0xffff), redir);
+#endif
+
low32 = ((pol << IOSAPIC_POLARITY_SHIFT) |
(trigger << IOSAPIC_TRIGGER_SHIFT) |
(dmode << IOSAPIC_DELIVERY_SHIFT) |
@@ -211,6 +251,7 @@
u32 high32, low32;
int dest, pin;
char *addr;
+ int redir = (irq & (1<<31)) ? 1 : 0;
mask &= (1UL << smp_num_cpus) - 1;
@@ -225,6 +266,8 @@
if (pin < 0)
return; /* not an IOSAPIC interrupt */
+ set_irq_affinity_info(irq, dest, redir);
+
/* dest contains both id and eid */
high32 = dest << IOSAPIC_DEST_SHIFT;
@@ -234,9 +277,13 @@
writel(IOSAPIC_RTE_LOW(pin), addr + IOSAPIC_REG_SELECT);
low32 = readl(addr + IOSAPIC_WINDOW);
- /* change delivery mode to fixed */
low32 &= ~(7 << IOSAPIC_DELIVERY_SHIFT);
- low32 |= (IOSAPIC_FIXED << IOSAPIC_DELIVERY_SHIFT);
+ if (redir)
+ /* change delivery mode to lowest priority */
+ low32 |= (IOSAPIC_LOWEST_PRIORITY << IOSAPIC_DELIVERY_SHIFT);
+ else
+ /* change delivery mode to fixed */
+ low32 |= (IOSAPIC_FIXED << IOSAPIC_DELIVERY_SHIFT);
writel(IOSAPIC_RTE_HIGH(pin), addr + IOSAPIC_REG_SELECT);
writel(high32, addr + IOSAPIC_WINDOW);
@@ -343,29 +390,64 @@
}
/*
- * ACPI can describe IOSAPIC interrupts via static tables and namespace
- * methods. This provides an interface to register those interrupts and
- * program the IOSAPIC RTE.
+ * if the given vector is already owned by other,
+ * assign a new vector for the other and make the vector available
*/
-int
-iosapic_register_irq (u32 global_vector, unsigned long polarity, unsigned long
- edge_triggered, u32 base_irq, char *iosapic_address)
+static void
+iosapic_reassign_vector (int vector)
+{
+ int new_vector;
+
+ if (iosapic_irq[vector].pin >= 0 || iosapic_irq[vector].addr
+ || iosapic_irq[vector].base_irq || iosapic_irq[vector].dmode
+ || iosapic_irq[vector].polarity || iosapic_irq[vector].trigger)
+ {
+ new_vector = ia64_alloc_irq();
+ printk("Reassigning Vector 0x%x to 0x%x\n", vector, new_vector);
+ memcpy (&iosapic_irq[new_vector], &iosapic_irq[vector],
+ sizeof(struct iosapic_irq));
+ memset (&iosapic_irq[vector], 0, sizeof(struct iosapic_irq));
+ iosapic_irq[vector].pin = -1;
+ }
+}
+
+static void
+register_irq (u32 global_vector, int vector, int pin, unsigned char delivery,
+ unsigned long polarity, unsigned long edge_triggered,
+ u32 base_irq, char *iosapic_address)
{
irq_desc_t *idesc;
struct hw_interrupt_type *irq_type;
- int vector;
-
- vector = iosapic_irq_to_vector(global_vector);
- if (vector < 0)
- vector = ia64_alloc_irq();
- /* fill in information from this vector's IOSAPIC */
- iosapic_irq[vector].addr = iosapic_address;
- iosapic_irq[vector].base_irq = base_irq;
- iosapic_irq[vector].pin = global_vector - iosapic_irq[vector].base_irq;
+ iosapic_irq[vector].pin = pin;
iosapic_irq[vector].polarity = polarity ? IOSAPIC_POL_HIGH : IOSAPIC_POL_LOW;
- iosapic_irq[vector].dmode = IOSAPIC_LOWEST_PRIORITY;
+ iosapic_irq[vector].dmode = delivery;
+ /*
+ * Interrupt source overrides do not provide addr/base_irq; global_vector is
+ * enough to locate the IOSAPIC addr, base_irq, and pin by examining the
+ * base_irq and max_pin of the registered IOSAPICs (TBD).
+ */
+#ifndef OVERRIDE_DEBUG
+ if (iosapic_address) {
+ iosapic_irq[vector].addr = iosapic_address;
+ iosapic_irq[vector].base_irq = base_irq;
+ }
+#else
+ if (iosapic_address) {
+ if (iosapic_irq[vector].addr && (iosapic_irq[vector].addr != iosapic_address))
+ printk("WARN: register_irq: diff IOSAPIC ADDRESS for gv %x, v %x\n",
+ global_vector, vector);
+ iosapic_irq[vector].addr = iosapic_address;
+ if (iosapic_irq[vector].base_irq && (iosapic_irq[vector].base_irq != base_irq)) {
+ printk("WARN: register_irq: diff BASE IRQ %x for gv %x, v %x\n",
+ base_irq, global_vector, vector);
+ }
+ iosapic_irq[vector].base_irq = base_irq;
+ } else if (!iosapic_irq[vector].addr)
+ printk("WARN: register_irq: invalid override for gv %x, v %x\n",
+ global_vector, vector);
+#endif
if (edge_triggered) {
iosapic_irq[vector].trigger = IOSAPIC_EDGE;
irq_type = &irq_type_iosapic_edge;
@@ -377,12 +459,32 @@
idesc = irq_desc(vector);
if (idesc->handler != irq_type) {
if (idesc->handler != &no_irq_type)
- printk("iosapic_register_irq(): changing vector 0x%02x from"
+ printk("register_irq(): changing vector 0x%02x from "
"%s to %s\n", vector, idesc->handler->typename, irq_type->typename);
idesc->handler = irq_type;
}
+}
+
+/*
+ * ACPI can describe IOSAPIC interrupts via static tables and namespace
+ * methods. This provides an interface to register those interrupts and
+ * program the IOSAPIC RTE.
+ */
+int
+iosapic_register_irq (u32 global_vector, unsigned long polarity, unsigned long
+ edge_triggered, u32 base_irq, char *iosapic_address)
+{
+ int vector;
- printk("IOSAPIC %x(%s,%s) -> Vector %x\n", global_vector,
+ vector = iosapic_irq_to_vector(global_vector);
+ if (vector < 0)
+ vector = ia64_alloc_irq();
+
+ register_irq (global_vector, vector, global_vector - base_irq,
+ IOSAPIC_LOWEST_PRIORITY, polarity, edge_triggered,
+ base_irq, iosapic_address);
+
+ printk("IOSAPIC 0x%x(%s,%s) -> Vector 0x%x\n", global_vector,
(polarity ? "high" : "low"), (edge_triggered ? "edge" : "level"), vector);
/* program the IOSAPIC routing table */
@@ -395,51 +497,40 @@
* Note that the irq_base and IOSAPIC address must be set in iosapic_init().
*/
int
-iosapic_register_platform_irq (u32 int_type, u32 global_vector, u32 iosapic_vector,
- u16 eid, u16 id, unsigned long polarity,
+iosapic_register_platform_irq (u32 int_type, u32 global_vector,
+ u32 iosapic_vector, u16 eid, u16 id, unsigned long polarity,
unsigned long edge_triggered, u32 base_irq, char *iosapic_address)
{
- struct hw_interrupt_type *irq_type;
- irq_desc_t *idesc;
+ unsigned char delivery;
int vector;
switch (int_type) {
- case ACPI20_ENTRY_PIS_CPEI:
- vector = IA64_PCE_VECTOR;
- iosapic_irq[vector].dmode = IOSAPIC_LOWEST_PRIORITY;
+ case ACPI_INTERRUPT_PMI:
+ vector = iosapic_vector;
+ /*
+ * since PMI vector is alloc'd by FW(ACPI) not by kernel,
+ * we need to make sure the vector is available
+ */
+ iosapic_reassign_vector(vector);
+ delivery = IOSAPIC_PMI;
break;
- case ACPI20_ENTRY_PIS_INIT:
+ case ACPI_INTERRUPT_INIT:
vector = ia64_alloc_irq();
- iosapic_irq[vector].dmode = IOSAPIC_INIT;
+ delivery = IOSAPIC_INIT;
+ break;
+ case ACPI_INTERRUPT_CPEI:
+ vector = IA64_PCE_VECTOR;
+ delivery = IOSAPIC_LOWEST_PRIORITY;
break;
default:
printk("iosapic_register_platform_irq(): invalid int type\n");
return -1;
}
- /* fill in information from this vector's IOSAPIC */
- iosapic_irq[vector].addr = iosapic_address;
- iosapic_irq[vector].base_irq = base_irq;
- iosapic_irq[vector].pin = global_vector - iosapic_irq[vector].base_irq;
- iosapic_irq[vector].polarity = polarity ? IOSAPIC_POL_HIGH : IOSAPIC_POL_LOW;
-
- if (edge_triggered) {
- iosapic_irq[vector].trigger = IOSAPIC_EDGE;
- irq_type = &irq_type_iosapic_edge;
- } else {
- iosapic_irq[vector].trigger = IOSAPIC_LEVEL;
- irq_type = &irq_type_iosapic_level;
- }
+ register_irq(global_vector, vector, global_vector - base_irq, delivery, polarity,
+ edge_triggered, base_irq, iosapic_address);
- idesc = irq_desc(vector);
- if (idesc->handler != irq_type) {
- if (idesc->handler != &no_irq_type)
- printk("iosapic_register_platform_irq(): changing vector 0x%02x from"
- "%s to %s\n", vector, idesc->handler->typename, irq_type->typename);
- idesc->handler = irq_type;
- }
-
- printk("PLATFORM int %x: IOSAPIC %x(%s,%s) -> Vector %x CPU %.02u:%.02u\n",
+ printk("PLATFORM int 0x%x: IOSAPIC 0x%x(%s,%s) -> Vector 0x%x CPU %.02u:%.02u\n",
int_type, global_vector, (polarity ? "high" : "low"),
(edge_triggered ? "edge" : "level"), vector, eid, id);
@@ -450,15 +541,18 @@
/*
- * ACPI calls this when it finds an entry for a legacy ISA interrupt. Note that the
- * irq_base and IOSAPIC address must be set in iosapic_init().
+ * ACPI calls this when it finds an entry for a legacy ISA interrupt.
+ * Note that the irq_base and IOSAPIC address must be set in iosapic_init().
*/
void
iosapic_register_legacy_irq (unsigned long irq,
unsigned long pin, unsigned long polarity,
unsigned long edge_triggered)
{
- unsigned int vector = isa_irq_to_vector(irq);
+ int vector = isa_irq_to_vector(irq);
+
+ register_irq(irq, vector, (int)pin, IOSAPIC_LOWEST_PRIORITY, polarity, edge_triggered,
+ 0, NULL); /* ignored for override */
#ifdef DEBUG_IRQ_ROUTING
printk("ISA: IRQ %u -> IOSAPIC irq 0x%02x (%s, %s) -> vector %02x\n",
@@ -467,43 +561,48 @@
vector);
#endif
- iosapic_irq[vector].pin = pin;
- iosapic_irq[vector].dmode = IOSAPIC_LOWEST_PRIORITY;
- iosapic_irq[vector].polarity = polarity ? IOSAPIC_POL_HIGH : IOSAPIC_POL_LOW;
- iosapic_irq[vector].trigger = edge_triggered ? IOSAPIC_EDGE : IOSAPIC_LEVEL;
+ /* program the IOSAPIC routing table */
+ set_rte(vector, (ia64_get_lid() >> 16) & 0xffff);
}
void __init
iosapic_init (unsigned long phys_addr, unsigned int base_irq, int pcat_compat)
{
- struct hw_interrupt_type *irq_type;
- int i, irq, max_pin, vector;
- irq_desc_t *idesc;
+ int irq, max_pin, vector, pin;
unsigned int ver;
char *addr;
static int first_time = 1;
if (first_time) {
first_time = 0;
-
for (vector = 0; vector < IA64_NUM_VECTORS; ++vector)
iosapic_irq[vector].pin = -1; /* mark as unused */
+ }
+ if (pcat_compat) {
/*
- * Fetch the PCI interrupt routing table:
+ * Disable the compatibility-mode (8259-style) interrupts; this requires
+ * IN/OUT support to be enabled.
*/
- acpi_cf_get_pci_vectors(&pci_irq.route, &pci_irq.num_routes);
+ printk("%s: Disabling PC-AT compatible 8259 interrupts\n", __FUNCTION__);
+ outb(0xff, 0xA1);
+ outb(0xff, 0x21);
}
addr = ioremap(phys_addr, 0);
-
ver = iosapic_version(addr);
max_pin = (ver >> 16) & 0xff;
+ iosapic_lists[num_iosapic].addr = addr;
+ iosapic_lists[num_iosapic].pcat_compat = pcat_compat;
+ iosapic_lists[num_iosapic].base_irq = base_irq;
+ iosapic_lists[num_iosapic].max_pin = max_pin;
+ num_iosapic++;
+
printk("IOSAPIC: version %x.%x, address 0x%lx, IRQs 0x%02x-0x%02x\n",
(ver & 0xf0) >> 4, (ver & 0x0f), phys_addr, base_irq, base_irq + max_pin);
- if ((base_irq == 0) && pcat_compat)
+ if ((base_irq == 0) && pcat_compat) {
/*
* Map the legacy ISA devices into the IOSAPIC data. Some of these may
* get reprogrammed later on with data from the ACPI Interrupt Source
@@ -511,36 +610,53 @@
*/
for (irq = 0; irq < 16; ++irq) {
vector = isa_irq_to_vector(irq);
- iosapic_irq[vector].addr = addr;
- iosapic_irq[vector].base_irq = 0;
- if (iosapic_irq[vector].pin == -1)
- iosapic_irq[vector].pin = irq;
- iosapic_irq[vector].dmode = IOSAPIC_LOWEST_PRIORITY;
- iosapic_irq[vector].trigger = IOSAPIC_EDGE;
- iosapic_irq[vector].polarity = IOSAPIC_POL_HIGH;
+ if ((pin = iosapic_irq[vector].pin) == -1)
+ pin = irq;
+
+ register_irq(irq, vector, pin,
+ /* IOSAPIC_POL_HIGH, IOSAPIC_EDGE */
+ IOSAPIC_LOWEST_PRIORITY, 1, 1, base_irq, addr);
+
#ifdef DEBUG_IRQ_ROUTING
printk("ISA: IRQ %u -> IOSAPIC irq 0x%02x (high, edge) -> vector 0x%02x\n",
irq, iosapic_irq[vector].base_irq + iosapic_irq[vector].pin,
vector);
#endif
- irq_type = &irq_type_iosapic_edge;
- idesc = irq_desc(vector);
- if (idesc->handler != irq_type) {
- if (idesc->handler != &no_irq_type)
- printk("iosapic_init: changing vector 0x%02x from %s to "
- "%s\n", irq, idesc->handler->typename,
- irq_type->typename);
- idesc->handler = irq_type;
- }
/* program the IOSAPIC routing table: */
set_rte(vector, (ia64_get_lid() >> 16) & 0xffff);
}
+ }
+}
+
+void __init
+iosapic_init_pci_irq (void)
+{
+ int i, index, vector, pin;
+ int base_irq, max_pin, pcat_compat;
+ unsigned int irq;
+ char *addr;
+
+ if (0 != acpi_get_prt(&pci_irq.route, &pci_irq.num_routes))
+ return;
for (i = 0; i < pci_irq.num_routes; i++) {
+
irq = pci_irq.route[i].irq;
- if ((unsigned) (irq - base_irq) > max_pin)
+ index = find_iosapic(irq);
+ if (index < 0) {
+ printk("PCI: IRQ %u has no IOSAPIC mapping\n", irq);
+ continue;
+ }
+
+ addr = iosapic_lists[index].addr;
+ base_irq = iosapic_lists[index].base_irq;
+ max_pin = iosapic_lists[index].max_pin;
+ pcat_compat = iosapic_lists[index].pcat_compat;
+ pin = irq - base_irq;
+
+ if ((unsigned) pin > max_pin)
/* the interrupt route is for another controller... */
continue;
@@ -553,29 +669,13 @@
vector = ia64_alloc_irq();
}
- iosapic_irq[vector].addr = addr;
- iosapic_irq[vector].base_irq = base_irq;
- iosapic_irq[vector].pin = (irq - base_irq);
- iosapic_irq[vector].dmode = IOSAPIC_LOWEST_PRIORITY;
- iosapic_irq[vector].trigger = IOSAPIC_LEVEL;
- iosapic_irq[vector].polarity = IOSAPIC_POL_LOW;
+ register_irq(irq, vector, pin, IOSAPIC_LOWEST_PRIORITY, 0, 0, base_irq, addr);
-# ifdef DEBUG_IRQ_ROUTING
+#ifdef DEBUG_IRQ_ROUTING
printk("PCI: (B%d,I%d,P%d) -> IOSAPIC irq 0x%02x -> vector 0x%02x\n",
pci_irq.route[i].bus, pci_irq.route[i].pci_id>>16, pci_irq.route[i].pin,
iosapic_irq[vector].base_irq + iosapic_irq[vector].pin, vector);
-# endif
- irq_type = &irq_type_iosapic_level;
- idesc = irq_desc(vector);
- if (idesc->handler != irq_type){
- if (idesc->handler != &no_irq_type)
- printk("iosapic_init: changing vector 0x%02x from %s to %s\n",
- vector, idesc->handler->typename, irq_type->typename);
- idesc->handler = irq_type;
- }
-
- /* program the IOSAPIC routing table: */
- set_rte(vector, (ia64_get_lid() >> 16) & 0xffff);
+#endif
}
}
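[Commentary, not part of the patch: the hunk above replaces the old single global base_irq/max_pin pair with a per-IOSAPIC descriptor table searched via find_iosapic(). A minimal C sketch of that lookup, assuming a descriptor layout and lookup body that mirror the diff — find_iosapic() itself is not visible in this hunk, so the implementation below is an illustration, not the kernel's:]

```c
#include <assert.h>

/* Hypothetical mirror of the per-IOSAPIC descriptor table the patch
 * introduces (iosapic_lists[] / num_iosapic).  Field names follow the
 * diff; the lookup body is an assumption. */
struct iosapic_desc {
	unsigned int base_irq;	/* first global IRQ served by this IOSAPIC */
	unsigned int max_pin;	/* highest input pin */
};

static struct iosapic_desc iosapic_lists[] = {
	{  0, 15 },	/* legacy/ISA block */
	{ 16, 23 },	/* e.g. a PCI-only IOSAPIC */
};
static int num_iosapic = 2;

/* Return the index of the IOSAPIC whose pin window covers `irq`, or -1.
 * The unsigned subtraction makes irq < base_irq wrap to a huge pin
 * number, exactly like the `(unsigned) pin > max_pin` test in the diff. */
static int find_iosapic_idx(unsigned int irq)
{
	int i;

	for (i = 0; i < num_iosapic; i++) {
		unsigned int pin = irq - iosapic_lists[i].base_irq;

		if (pin <= iosapic_lists[i].max_pin)
			return i;
	}
	return -1;
}
```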
@@ -585,6 +685,13 @@
struct pci_dev *dev;
unsigned char pin;
int vector;
+ struct hw_interrupt_type *irq_type;
+ irq_desc_t *idesc;
+
+ if (phase == 0) {
+ iosapic_init_pci_irq();
+ return;
+ }
if (phase != 1)
return;
@@ -611,19 +718,28 @@
if (vector >= 0)
printk(KERN_WARNING
"PCI: using PPB(B%d,I%d,P%d) to get vector %02x\n",
- bridge->bus->number, PCI_SLOT(bridge->devfn),
+ dev->bus->number, PCI_SLOT(dev->devfn),
pin, vector);
else
printk(KERN_WARNING
- "PCI: Couldn't map irq for (B%d,I%d,P%d)o\n",
- bridge->bus->number, PCI_SLOT(bridge->devfn),
- pin);
+ "PCI: Couldn't map irq for (B%d,I%d,P%d)\n",
+ dev->bus->number, PCI_SLOT(dev->devfn), pin);
}
if (vector >= 0) {
printk("PCI->APIC IRQ transform: (B%d,I%d,P%d) -> 0x%02x\n",
dev->bus->number, PCI_SLOT(dev->devfn), pin, vector);
dev->irq = vector;
+ irq_type = &irq_type_iosapic_level;
+ idesc = irq_desc(vector);
+ if (idesc->handler != irq_type) {
+ if (idesc->handler != &no_irq_type)
+ printk("iosapic_pci_fixup: changing vector 0x%02x "
+ "from %s to %s\n", vector,
+ idesc->handler->typename,
+ irq_type->typename);
+ idesc->handler = irq_type;
+ }
#ifdef CONFIG_SMP
/*
* For platforms that do not support interrupt redirect
@@ -638,7 +754,16 @@
cpu_index++;
if (cpu_index >= smp_num_cpus)
cpu_index = 0;
+ } else {
+ /*
+ * Direct the interrupt vector to the current cpu,
+ * platform redirection will distribute them.
+ */
+ set_rte(vector, (ia64_get_lid() >> 16) & 0xffff);
}
+#else
+ /* direct the interrupt vector to the running cpu id */
+ set_rte(vector, (ia64_get_lid() >> 16) & 0xffff);
#endif
}
}
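[Commentary, not part of the patch: on platforms without hardware interrupt redirection, the fixup loop above walks target CPUs round-robin. A minimal model of that walk, with `smp_num_cpus` passed as a parameter in place of the kernel global:]

```c
#include <assert.h>

/* Minimal model of the round-robin target-CPU selection used in
 * iosapic_pci_fixup() when platform redirection is unavailable. */
static int cpu_index;

static int next_target_cpu(int smp_num_cpus)
{
	int cpu = cpu_index;

	/* same increment-and-wrap as the hunk context above */
	cpu_index++;
	if (cpu_index >= smp_num_cpus)
		cpu_index = 0;
	return cpu;
}
```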
diff -urN linux-2.4.18/arch/ia64/kernel/irq.c lia64-2.4/arch/ia64/kernel/irq.c
--- linux-2.4.18/arch/ia64/kernel/irq.c Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/irq.c Fri Apr 5 17:05:37 2002
@@ -67,6 +67,27 @@
irq_desc_t _irq_desc[NR_IRQS] __cacheline_aligned =
{ [0 ... NR_IRQS-1] = { IRQ_DISABLED, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED}};
+#ifdef CONFIG_IA64_GENERIC
+struct irq_desc *
+__ia64_irq_desc (unsigned int irq)
+{
+ return _irq_desc + irq;
+}
+
+ia64_vector
+__ia64_irq_to_vector (unsigned int irq)
+{
+ return (ia64_vector) irq;
+}
+
+unsigned int
+__ia64_local_vector_to_irq (ia64_vector vec)
+{
+ return (unsigned int) vec;
+}
+
+#endif
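[Commentary, not part of the patch: the `__ia64_*` wrappers added above look trivial, but on CONFIG_IA64_GENERIC kernels irq_desc() and friends are reached through a per-platform function-pointer table, so even identity mappings need out-of-line bodies whose addresses can be stored there. An illustrative sketch with made-up stub types, not the kernel's machine-vector definitions:]

```c
#include <assert.h>

/* Illustrative stand-ins for irq_desc_t and the machine vector. */
struct irq_desc_stub { int status; };

static struct irq_desc_stub descs[16];

static struct irq_desc_stub *generic_irq_desc(unsigned int irq)
{
	return descs + irq;	/* plays the role of __ia64_irq_desc */
}

struct machvec_stub {
	struct irq_desc_stub *(*irq_desc)(unsigned int irq);
};

/* A GENERIC kernel would fill this table per platform at boot. */
static struct machvec_stub mv = { generic_irq_desc };
```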
+
static void register_irq_proc (unsigned int irq);
/*
@@ -287,10 +308,11 @@
* already executing in one..
*/
if (!irqs_running())
- if (local_bh_count() || !spin_is_locked(&global_bh_lock))
+ if (really_local_bh_count() || !spin_is_locked(&global_bh_lock))
break;
/* Duh, we have to loop. Release the lock to avoid deadlocks */
+ smp_mb__before_clear_bit(); /* need barrier before releasing lock... */
clear_bit(0,&global_irq_lock);
for (;;) {
@@ -305,7 +327,7 @@
continue;
if (global_irq_lock)
continue;
- if (!local_bh_count() && spin_is_locked(&global_bh_lock))
+ if (!really_local_bh_count() && spin_is_locked(&global_bh_lock))
continue;
if (!test_and_set_bit(0,&global_irq_lock))
break;
@@ -378,14 +400,14 @@
__save_flags(flags);
if (flags & IA64_PSR_I) {
__cli();
- if (!local_irq_count())
+ if (!really_local_irq_count())
get_irqlock();
}
#else
__save_flags(flags);
if (flags & (1 << EFLAGS_IF_SHIFT)) {
__cli();
- if (!local_irq_count())
+ if (!really_local_irq_count())
get_irqlock();
}
#endif
@@ -393,7 +415,7 @@
void __global_sti(void)
{
- if (!local_irq_count())
+ if (!really_local_irq_count())
release_irqlock(smp_processor_id());
__sti();
}
@@ -422,7 +444,7 @@
retval = 2 + local_enabled;
/* check for global flags if we're not in an interrupt */
- if (!local_irq_count()) {
+ if (!really_local_irq_count()) {
if (local_enabled)
retval = 1;
if (global_irq_holder == cpu)
@@ -529,7 +551,7 @@
disable_irq_nosync(irq);
#ifdef CONFIG_SMP
- if (!local_irq_count()) {
+ if (!really_local_irq_count()) {
do {
barrier();
} while (irq_desc(irq)->status & IRQ_INPROGRESS);
@@ -1009,6 +1031,11 @@
rand_initialize_irq(irq);
}
+ if (new->flags & SA_PERCPU_IRQ) {
+ desc->status |= IRQ_PER_CPU;
+ desc->handler = &irq_type_ia64_lsapic;
+ }
+
/*
* The following block of code has to be executed atomically
*/
@@ -1089,13 +1116,25 @@
static struct proc_dir_entry * smp_affinity_entry [NR_IRQS];
static unsigned long irq_affinity [NR_IRQS] = { [0 ... NR_IRQS-1] = ~0UL };
+static char irq_redir [NR_IRQS]; // = { [0 ... NR_IRQS-1] = 1 };
+
+void set_irq_affinity_info(int irq, int hwid, int redir)
+{
+ unsigned long mask = 1UL<<hwid;
+
+ if (irq >= 0 && irq < NR_IRQS) {
+ irq_affinity[irq] = mask;
+ irq_redir[irq] = (char) (redir & 0xff);
+ }
+}
static int irq_affinity_read_proc (char *page, char **start, off_t off,
int count, int *eof, void *data)
{
- if (count < HEX_DIGITS+1)
+ if (count < HEX_DIGITS+3)
return -EINVAL;
- return sprintf (page, "%08lx\n", irq_affinity[(long)data]);
+ return sprintf (page, "%s%08lx\n", irq_redir[(long)data] ? "r " : "",
+ irq_affinity[(long)data]);
}
static int irq_affinity_write_proc (struct file *file, const char *buffer,
@@ -1103,11 +1142,20 @@
{
int irq = (long) data, full_count = count, err;
unsigned long new_value;
+ const char *buf = buffer;
+ int redir;
if (!irq_desc(irq)->handler->set_affinity)
return -EIO;
- err = parse_hex_value(buffer, count, &new_value);
+ if (buf[0] == 'r' || buf[0] == 'R') {
+ ++buf;
+ while (*buf == ' ') ++buf;
+ redir = 1;
+ } else
+ redir = 0;
+
+ err = parse_hex_value(buf, count, &new_value);
/*
* Do not allow disabling IRQs completely - it's a too easy
@@ -1117,8 +1165,7 @@
if (!(new_value & cpu_online_map))
return -EINVAL;
- irq_affinity[irq] = new_value;
- irq_desc(irq)->handler->set_affinity(irq, new_value);
+ irq_desc(irq)->handler->set_affinity(irq | (redir?(1<<31):0), new_value);
return full_count;
}
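[Commentary, not part of the patch: the irq.c changes above extend the /proc/irq/N/smp_affinity syntax with an optional leading 'r' (or 'R') that requests hardware redirection before the usual hex CPU mask. A small userspace-style sketch of that parse, with strtoul() standing in for the kernel's parse_hex_value():]

```c
#include <assert.h>
#include <stdlib.h>

/* Does the buffer start with the redirection prefix? */
static int affinity_redir(const char *buf)
{
	return buf[0] == 'r' || buf[0] == 'R';
}

/* Skip the prefix and any spaces, then parse the hex CPU mask. */
static unsigned long affinity_mask(const char *buf)
{
	if (affinity_redir(buf)) {
		++buf;
		while (*buf == ' ')
			++buf;
	}
	return strtoul(buf, NULL, 16);
}
```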
diff -urN linux-2.4.18/arch/ia64/kernel/ivt.S lia64-2.4/arch/ia64/kernel/ivt.S
--- linux-2.4.18/arch/ia64/kernel/ivt.S Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/ivt.S Wed Feb 20 16:12:21 2002
@@ -275,6 +275,7 @@
mov r16=cr.ifa // get address that caused the TLB miss
movl r17=PAGE_KERNEL
mov r21=cr.ipsr
+ movl r19=(((1 << IA64_MAX_PHYS_BITS) - 1) & ~0xfff)
mov r31=pr
;;
#ifdef CONFIG_DISABLE_VHPT
@@ -289,12 +290,12 @@
(p8) br.cond.dptk itlb_fault
#endif
extr.u r23=r21,IA64_PSR_CPL0_BIT,2 // extract psr.cpl
+ and r19=r19,r16 // clear ed, reserved bits, and PTE control bits
shr.u r18=r16,57 // move address bit 61 to bit 4
- dep r19=0,r16,IA64_MAX_PHYS_BITS,(64-IA64_MAX_PHYS_BITS) // clear ed & reserved bits
;;
andcm r18=0x10,r18 // bit 4=~address-bit(61)
cmp.ne p8,p0=r0,r23 // psr.cpl != 0?
- dep r19=r17,r19,0,12 // insert PTE control bits into r19
+ or r19=r17,r19 // insert PTE control bits into r19
;;
or r19=r19,r18 // set bit 4 (uncached) if the access was to region 6
(p8) br.cond.spnt page_fault
@@ -312,6 +313,7 @@
mov r16=cr.ifa // get address that caused the TLB miss
movl r17=PAGE_KERNEL
mov r20=cr.isr
+ movl r19=(((1 << IA64_MAX_PHYS_BITS) - 1) & ~0xfff)
mov r21=cr.ipsr
mov r31=pr
;;
@@ -328,15 +330,15 @@
#endif
extr.u r23=r21,IA64_PSR_CPL0_BIT,2 // extract psr.cpl
tbit.nz p6,p7=r20,IA64_ISR_SP_BIT // is speculation bit on?
+ and r19=r19,r16 // clear ed, reserved bits, and PTE control bits
shr.u r18=r16,57 // move address bit 61 to bit 4
- dep r19=0,r16,IA64_MAX_PHYS_BITS,(64-IA64_MAX_PHYS_BITS) // clear ed & reserved bits
;;
andcm r18=0x10,r18 // bit 4=~address-bit(61)
cmp.ne p8,p0=r0,r23
(p8) br.cond.spnt page_fault
dep r21=-1,r21,IA64_PSR_ED_BIT,1
- dep r19=r17,r19,0,12 // insert PTE control bits into r19
+ or r19=r19,r17 // insert PTE control bits into r19
;;
or r19=r19,r18 // set bit 4 (uncached) if the access was to region 6
(p6) mov cr.ipsr=r21
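[Commentary, not part of the patch: the ivt.S change above precomputes one `movl` constant so the identity-mapping PTE can be built with a single and/or pair instead of two dependent `dep` instructions. A C rendering of the new sequence — the physical-address width and PAGE_KERNEL bit pattern below are illustrative values, not the kernel's:]

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PHYS_BITS	 50		/* stand-in for IA64_MAX_PHYS_BITS */
#define PAGE_KERNEL_BITS 0x7b1ULL	/* hypothetical; must fit in low 12 bits */

static uint64_t make_ident_pte(uint64_t ifa)
{
	/* movl r19=(((1 << IA64_MAX_PHYS_BITS) - 1) & ~0xfff) */
	uint64_t mask = ((1ULL << MAX_PHYS_BITS) - 1) & ~0xfffULL;

	/* and r19=r19,r16  -- clear ed, reserved, and PTE control bits
	 * or  r19=r17,r19  -- insert PTE control bits */
	return (ifa & mask) | PAGE_KERNEL_BITS;
}
```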
diff -urN linux-2.4.18/arch/ia64/kernel/mca.c lia64-2.4/arch/ia64/kernel/mca.c
--- linux-2.4.18/arch/ia64/kernel/mca.c Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/mca.c Wed Apr 10 10:11:02 2002
@@ -3,6 +3,12 @@
* Purpose: Generic MCA handling layer
*
* Updated for latest kernel
+ * Copyright (C) 2002 Dell Computer Corporation
+ * Copyright (C) Matt Domsch (Matt_Domsch@dell.com)
+ *
+ * Copyright (C) 2002 Intel
+ * Copyright (C) Jenna Hall (jenna.s.hall@intel.com)
+ *
* Copyright (C) 2001 Intel
* Copyright (C) Fred Lewis (frederick.v.lewis@intel.com)
*
@@ -12,6 +18,13 @@
* Copyright (C) 1999 Silicon Graphics, Inc.
* Copyright (C) Vijay Chander(vijay@engr.sgi.com)
*
+ * 02/03/25 M. Domsch GUID cleanups
+ *
+ * 02/01/04 J. Hall Aligned MCA stack to 16 bytes, added platform vs. CPU
+ * error flag, set SAL default return values, changed
+ * error record structure to linked list, added init call
+ * to sal_get_state_info_size().
+ *
* 01/01/03 F. Lewis Added setup of CMCI and CPEI IRQs, logging of corrected
* platform errors, completed code for logging of
* corrected & uncorrected machine check errors, and
@@ -27,6 +40,8 @@
#include
#include
#include
+#include
+#include
#include
#include
@@ -37,7 +52,6 @@
#include
#include
-#include
#undef MCA_PRT_XTRA_DATA
@@ -50,18 +64,22 @@
ia64_mca_sal_to_os_state_t ia64_sal_to_os_handoff_state;
ia64_mca_os_to_sal_state_t ia64_os_to_sal_handoff_state;
u64 ia64_mca_proc_state_dump[512];
-u64 ia64_mca_stack[1024];
+u64 ia64_mca_stack[1024] __attribute__((aligned(16)));
u64 ia64_mca_stackframe[32];
u64 ia64_mca_bspstore[1024];
u64 ia64_init_stack[INIT_TASK_SIZE] __attribute__((aligned(16)));
+u64 ia64_mca_sal_data_area[1356];
+u64 ia64_mca_min_state_save_info;
+u64 ia64_tlb_functional;
+u64 ia64_os_mca_recovery_successful;
static void ia64_mca_wakeup_ipi_wait(void);
static void ia64_mca_wakeup(int cpu);
static void ia64_mca_wakeup_all(void);
static void ia64_log_init(int);
-extern void ia64_monarch_init_handler (void);
-extern void ia64_slave_init_handler (void);
-extern struct hw_interrupt_type irq_type_iosapic_level;
+extern void ia64_monarch_init_handler (void);
+extern void ia64_slave_init_handler (void);
+extern struct hw_interrupt_type irq_type_iosapic_level;
static struct irqaction cmci_irqaction = {
handler: ia64_mca_cmc_int_handler,
@@ -95,25 +113,31 @@
* memory.
*
* Inputs : sal_info_type (Type of error record MCA/CMC/CPE/INIT)
- * Outputs : None
+ * Outputs : platform error status
*/
-void
+int
ia64_mca_log_sal_error_record(int sal_info_type)
{
+ int platform_err = 0;
+
/* Get the MCA error record */
if (!ia64_log_get(sal_info_type, (prfunc_t)printk))
- return; // no record retrieved
+ return platform_err; // no record retrieved
- /* Log the error record */
- ia64_log_print(sal_info_type, (prfunc_t)printk);
+ /* TODO:
+ * 1. analyze error logs to determine recoverability
+ * 2. perform error recovery procedures, if applicable
+ * 3. set ia64_os_mca_recovery_successful flag, if applicable
+ */
- /* Clear the CMC SAL logs now that they have been logged */
+ platform_err = ia64_log_print(sal_info_type, (prfunc_t)printk);
ia64_sal_clear_state_info(sal_info_type);
+
+ return platform_err;
}
/*
- * hack for now, add platform dependent handlers
- * here
+ * platform dependent error handling
*/
#ifndef PLATFORM_MCA_HANDLERS
void
@@ -275,8 +299,8 @@
cmcv_reg_t cmcv;
cmcv.cmcv_regval = 0;
- cmcv.cmcv_mask = 0; /* Unmask/enable interrupt */
- cmcv.cmcv_vector = IA64_CMC_VECTOR;
+ cmcv.cmcv_mask = 0; /* Unmask/enable interrupt */
+ cmcv.cmcv_vector = IA64_CMC_VECTOR;
ia64_set_cmcv(cmcv.cmcv_regval);
IA64_MCA_DEBUG("ia64_mca_platform_init: CPU %d corrected "
@@ -329,17 +353,13 @@
verify_guid (efi_guid_t *test, efi_guid_t *target)
{
int rc;
+ char out[40];
- if ((rc = memcmp((void *)test, (void *)target, sizeof(efi_guid_t)))) {
- IA64_MCA_DEBUG("ia64_mca_print: invalid guid = "
- "{ %08x, %04x, %04x, { %#02x, %#02x, %#02x, %#02x, "
- "%#02x, %#02x, %#02x, %#02x, } } \n ",
- test->data1, test->data2, test->data3, test->data4[0],
- test->data4[1], test->data4[2], test->data4[3],
- test->data4[4], test->data4[5], test->data4[6],
- test->data4[7]);
+ if ((rc = efi_guidcmp(*test, *target))) {
+ IA64_MCA_DEBUG(KERN_DEBUG
+ "verify_guid: invalid GUID = %s\n",
+ efi_guid_unparse(test, out));
}
-
return rc;
}
@@ -374,6 +394,9 @@
IA64_MCA_DEBUG("ia64_mca_init: begin\n");
+ /* initialize recovery success indicator */
+ ia64_os_mca_recovery_successful = 0;
+
/* Clear the Rendez checkin flag for all cpus */
for(i = 0 ; i < NR_CPUS; i++)
ia64_mc_info.imi_rendez_checkin[i] = IA64_MCA_RENDEZ_CHECKIN_NOTDONE;
@@ -459,7 +482,7 @@
/*
* Configure the CMCI vector and handler. Interrupts for CMC are
- * per-processor, so AP CMC interrupts are setup in smp_callin() (smp.c).
+ * per-processor, so AP CMC interrupts are setup in smp_callin() (smpboot.c).
*/
register_percpu_irq(IA64_CMC_VECTOR, &cmci_irqaction);
ia64_mca_cmc_vector_setup(); /* Setup vector on BSP & enable */
@@ -474,7 +497,7 @@
{
irq_desc_t *desc;
unsigned int irq;
- int cpev = acpi_request_vector(ACPI20_ENTRY_PIS_CPEI);
+ int cpev = acpi_request_vector(ACPI_INTERRUPT_CPEI);
if (cpev >= 0) {
for (irq = 0; irq < NR_IRQS; ++irq)
@@ -498,6 +521,9 @@
ia64_log_init(SAL_INFO_TYPE_CMC);
ia64_log_init(SAL_INFO_TYPE_CPE);
+ /* Zero the min state save info */
+ ia64_mca_min_state_save_info = 0;
+
#if defined(MCA_TEST)
mca_test();
#endif /* #if defined(MCA_TEST) */
@@ -576,7 +602,7 @@
int cpu;
/* Clear the Rendez checkin flag for all cpus */
- for(cpu = 0 ; cpu < smp_num_cpus; cpu++)
+ for(cpu = 0; cpu < smp_num_cpus; cpu++)
if (ia64_mc_info.imi_rendez_checkin[cpu] == IA64_MCA_RENDEZ_CHECKIN_DONE)
ia64_mca_wakeup(cpu);
@@ -668,6 +694,13 @@
/* Cold Boot for uncorrectable MCA */
ia64_os_to_sal_handoff_state.imots_os_status = IA64_MCA_COLD_BOOT;
+
+ /* Default = tell SAL to return to same context */
+ ia64_os_to_sal_handoff_state.imots_context = IA64_MCA_SAME_CONTEXT;
+
+ /* Register pointer to new min state values */
+ /* NOTE: need to do something with this during recovery phase */
+ ia64_os_to_sal_handoff_state.imots_new_min_state = &ia64_mca_min_state_save_info;
}
/*
@@ -678,10 +711,10 @@
* This is the place where the core of OS MCA handling is done.
* Right now the logs are extracted and displayed in a well-defined
* format. This handler code is supposed to be run only on the
- * monarch processor. Once the monarch is done with MCA handling
+ * monarch processor. Once the monarch is done with MCA handling
* further MCA logging is enabled by clearing logs.
* Monarch also has the duty of sending wakeup-IPIs to pull the
- * slave processors out of rendezvous spinloop.
+ * slave processors out of rendezvous spinloop.
*
* Inputs : None
* Outputs : None
@@ -689,20 +722,16 @@
void
ia64_mca_ucmc_handler(void)
{
-#if 0 /* stubbed out @FVL */
- /*
- * Attempting to log a DBE error Causes "reserved register/field panic"
- * in printk.
- */
+ int platform_err = 0;
/* Get the MCA error record and log it */
- ia64_mca_log_sal_error_record(SAL_INFO_TYPE_MCA);
-#endif /* stubbed out @FVL */
+ platform_err = ia64_mca_log_sal_error_record(SAL_INFO_TYPE_MCA);
/*
* Do Platform-specific mca error handling if required.
*/
- mca_handler_platform() ;
+ if (platform_err)
+ mca_handler_platform();
/*
* Wakeup all the processors which are spinning in the rendezvous
@@ -749,13 +778,16 @@
{
spinlock_t isl_lock;
int isl_index;
- ia64_err_rec_t isl_log[IA64_MAX_LOGS]; /* need space to store header + error log */
+ ia64_err_rec_t *isl_log[IA64_MAX_LOGS]; /* need space to store header + error log */
} ia64_state_log_t;
static ia64_state_log_t ia64_state_log[IA64_MAX_LOG_TYPES];
-/* Note: Some of these macros assume IA64_MAX_LOGS is always 2. Should be */
-/* fixed. @FVL */
+#define IA64_LOG_ALLOCATE(it, size) \
+ {ia64_state_log[it].isl_log[IA64_LOG_CURR_INDEX(it)] = \
+ (ia64_err_rec_t *)alloc_bootmem(size); \
+ ia64_state_log[it].isl_log[IA64_LOG_NEXT_INDEX(it)] = \
+ (ia64_err_rec_t *)alloc_bootmem(size);}
#define IA64_LOG_LOCK_INIT(it) spin_lock_init(&ia64_state_log[it].isl_lock)
#define IA64_LOG_LOCK(it) spin_lock_irqsave(&ia64_state_log[it].isl_lock, s)
#define IA64_LOG_UNLOCK(it) spin_unlock_irqrestore(&ia64_state_log[it].isl_lock,s)
@@ -765,13 +797,13 @@
ia64_state_log[it].isl_index = 1 - ia64_state_log[it].isl_index
#define IA64_LOG_INDEX_DEC(it) \
ia64_state_log[it].isl_index = 1 - ia64_state_log[it].isl_index
-#define IA64_LOG_NEXT_BUFFER(it) (void *)(&(ia64_state_log[it].isl_log[IA64_LOG_NEXT_INDEX(it)]))
-#define IA64_LOG_CURR_BUFFER(it) (void *)(&(ia64_state_log[it].isl_log[IA64_LOG_CURR_INDEX(it)]))
+#define IA64_LOG_NEXT_BUFFER(it) (void *)((ia64_state_log[it].isl_log[IA64_LOG_NEXT_INDEX(it)]))
+#define IA64_LOG_CURR_BUFFER(it) (void *)((ia64_state_log[it].isl_log[IA64_LOG_CURR_INDEX(it)]))
/*
* C portion of the OS INIT handler
*
- * Called from ia64__init_handler
+ * Called from ia64_monarch_init_handler
*
* Inputs: pointer to pt_regs where processor info was saved.
*
@@ -825,11 +857,8 @@
void
ia64_log_prt_guid (efi_guid_t *p_guid, prfunc_t prfunc)
{
- printk("GUID = { %08x, %04x, %04x, { %#02x, %#02x, %#02x, %#02x, "
- "%#02x, %#02x, %#02x, %#02x, } } \n ", p_guid->data1,
- p_guid->data2, p_guid->data3, p_guid->data4[0], p_guid->data4[1],
- p_guid->data4[2], p_guid->data4[3], p_guid->data4[4],
- p_guid->data4[5], p_guid->data4[6], p_guid->data4[7]);
+ char out[40];
+ printk(KERN_DEBUG "GUID = %s\n", efi_guid_unparse(p_guid, out));
}
static void
@@ -885,10 +914,18 @@
void
ia64_log_init(int sal_info_type)
{
- IA64_LOG_LOCK_INIT(sal_info_type);
+ u64 max_size = 0;
+
IA64_LOG_NEXT_INDEX(sal_info_type) = 0;
- memset(IA64_LOG_NEXT_BUFFER(sal_info_type), 0,
- sizeof(ia64_err_rec_t) * IA64_MAX_LOGS);
+ IA64_LOG_LOCK_INIT(sal_info_type);
+
+ // SAL will tell us the maximum size of any error record of this type
+ max_size = ia64_sal_get_state_info_size(sal_info_type);
+
+ // set up OS data structures to hold error info
+ IA64_LOG_ALLOCATE(sal_info_type, max_size);
+ memset(IA64_LOG_CURR_BUFFER(sal_info_type), 0, max_size);
+ memset(IA64_LOG_NEXT_BUFFER(sal_info_type), 0, max_size);
}
/*
@@ -923,8 +960,7 @@
return total_len;
} else {
IA64_LOG_UNLOCK(sal_info_type);
- prfunc("ia64_log_get: Failed to retrieve SAL error record type %d\n",
- sal_info_type);
+ prfunc("ia64_log_get: No SAL error record available for type %d\n", sal_info_type);
return 0;
}
}
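[Commentary, not part of the patch: the mca.c hunks above replace the fixed-size in-struct log array with two pointers, each allocated at boot with a size reported by ia64_sal_get_state_info_size(). A model of that two-slot scheme, with malloc() standing in for alloc_bootmem() and a placeholder size:]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define IA64_MAX_LOGS 2

struct state_log {
	int   index;			/* isl_index: the "current" slot */
	void *log[IA64_MAX_LOGS];
};

static void log_init(struct state_log *sl, size_t max_size)
{
	int i;

	sl->index = 0;
	for (i = 0; i < IA64_MAX_LOGS; i++) {
		sl->log[i] = malloc(max_size);	/* plays IA64_LOG_ALLOCATE */
		memset(sl->log[i], 0, max_size);
	}
}

/* With exactly two slots, IA64_LOG_INDEX_INC and _DEC are the same flip. */
static void log_flip(struct state_log *sl)
{
	sl->index = 1 - sl->index;
}
```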
@@ -1268,7 +1304,7 @@
}
if (mdei->valid.oem_data) {
- ia64_log_prt_oem_data((int)mdei->header.len,
+ platform_mem_dev_err_print((int)mdei->header.len,
(int)sizeof(sal_log_mem_dev_err_info_t) - 1,
&(mdei->oem_data[0]), prfunc);
}
@@ -1357,7 +1393,7 @@
prfunc("\n");
if (pbei->valid.oem_data) {
- ia64_log_prt_oem_data((int)pbei->header.len,
+ platform_pci_bus_err_print((int)pbei->header.len,
(int)sizeof(sal_log_pci_bus_err_info_t) - 1,
&(pbei->oem_data[0]), prfunc);
}
@@ -1456,7 +1492,7 @@
}
}
if (pcei->valid.oem_data) {
- ia64_log_prt_oem_data((int)pcei->header.len, n_pci_data,
+ platform_pci_comp_err_print((int)pcei->header.len, n_pci_data,
p_oem_data, prfunc);
prfunc("\n");
}
@@ -1485,7 +1521,7 @@
ia64_log_prt_guid(&psei->guid, prfunc);
}
if (psei->valid.oem_data) {
- ia64_log_prt_oem_data((int)psei->header.len,
+ platform_plat_specific_err_print((int)psei->header.len,
(int)sizeof(sal_log_plat_specific_err_info_t) - 1,
&(psei->oem_data[0]), prfunc);
}
@@ -1519,7 +1555,7 @@
if (hcei->valid.bus_spec_data)
prfunc(" Bus Specific Data: %#lx", hcei->bus_spec_data);
if (hcei->valid.oem_data) {
- ia64_log_prt_oem_data((int)hcei->header.len,
+ platform_host_ctlr_err_print((int)hcei->header.len,
(int)sizeof(sal_log_host_ctlr_err_info_t) - 1,
&(hcei->oem_data[0]), prfunc);
}
@@ -1553,7 +1589,7 @@
if (pbei->valid.bus_spec_data)
prfunc(" Bus Specific Data: %#lx", pbei->bus_spec_data);
if (pbei->valid.oem_data) {
- ia64_log_prt_oem_data((int)pbei->header.len,
+ platform_plat_bus_err_print((int)pbei->header.len,
(int)sizeof(sal_log_plat_bus_err_info_t) - 1,
&(pbei->oem_data[0]), prfunc);
}
@@ -1716,7 +1752,7 @@
ia64_log_prt_section_header(slsh, prfunc);
#endif // MCA_PRT_XTRA_DATA for test only @FVL
- if (verify_guid((void *)&slsh->guid, (void *)&(SAL_PROC_DEV_ERR_SECT_GUID))) {
+ if (verify_guid(&slsh->guid, &(SAL_PROC_DEV_ERR_SECT_GUID))) {
IA64_MCA_DEBUG("ia64_mca_log_print: unsupported record section\n");
continue;
}
@@ -1745,17 +1781,18 @@
* Inputs : lh (Pointer to the sal error record header with format
* specified by the SAL spec).
* prfunc (fn ptr of log output function to use)
- * Outputs : None
+ * Outputs : platform error status
*/
-void
+int
ia64_log_platform_info_print (sal_log_record_header_t *lh, prfunc_t prfunc)
{
- sal_log_section_hdr_t *slsh;
- int n_sects;
- int ercd_pos;
+ sal_log_section_hdr_t *slsh;
+ int n_sects;
+ int ercd_pos;
+ int platform_err = 0;
if (!lh)
- return;
+ return platform_err;
#ifdef MCA_PRT_XTRA_DATA // for test only @FVL
ia64_log_prt_record_header(lh, prfunc);
@@ -1765,7 +1802,7 @@
IA64_MCA_DEBUG("ia64_mca_log_print: "
"truncated SAL error record. len = %d\n",
lh->len);
- return;
+ return platform_err;
}
/* Print record header info */
@@ -1796,35 +1833,43 @@
ia64_log_proc_dev_err_info_print((sal_log_processor_info_t *)slsh,
prfunc);
} else if (efi_guidcmp(slsh->guid, SAL_PLAT_MEM_DEV_ERR_SECT_GUID) == 0) {
+ platform_err = 1;
prfunc("+Platform Memory Device Error Info Section\n");
ia64_log_mem_dev_err_info_print((sal_log_mem_dev_err_info_t *)slsh,
prfunc);
} else if (efi_guidcmp(slsh->guid, SAL_PLAT_SEL_DEV_ERR_SECT_GUID) == 0) {
+ platform_err = 1;
prfunc("+Platform SEL Device Error Info Section\n");
ia64_log_sel_dev_err_info_print((sal_log_sel_dev_err_info_t *)slsh,
prfunc);
} else if (efi_guidcmp(slsh->guid, SAL_PLAT_PCI_BUS_ERR_SECT_GUID) == 0) {
+ platform_err = 1;
prfunc("+Platform PCI Bus Error Info Section\n");
ia64_log_pci_bus_err_info_print((sal_log_pci_bus_err_info_t *)slsh,
prfunc);
} else if (efi_guidcmp(slsh->guid, SAL_PLAT_SMBIOS_DEV_ERR_SECT_GUID) == 0) {
+ platform_err = 1;
prfunc("+Platform SMBIOS Device Error Info Section\n");
ia64_log_smbios_dev_err_info_print((sal_log_smbios_dev_err_info_t *)slsh,
prfunc);
} else if (efi_guidcmp(slsh->guid, SAL_PLAT_PCI_COMP_ERR_SECT_GUID) == 0) {
+ platform_err = 1;
prfunc("+Platform PCI Component Error Info Section\n");
ia64_log_pci_comp_err_info_print((sal_log_pci_comp_err_info_t *)slsh,
prfunc);
} else if (efi_guidcmp(slsh->guid, SAL_PLAT_SPECIFIC_ERR_SECT_GUID) == 0) {
+ platform_err = 1;
prfunc("+Platform Specific Error Info Section\n");
ia64_log_plat_specific_err_info_print((sal_log_plat_specific_err_info_t *)
slsh,
prfunc);
} else if (efi_guidcmp(slsh->guid, SAL_PLAT_HOST_CTLR_ERR_SECT_GUID) == 0) {
+ platform_err = 1;
prfunc("+Platform Host Controller Error Info Section\n");
ia64_log_host_ctlr_err_info_print((sal_log_host_ctlr_err_info_t *)slsh,
prfunc);
} else if (efi_guidcmp(slsh->guid, SAL_PLAT_BUS_ERR_SECT_GUID) == 0) {
+ platform_err = 1;
prfunc("+Platform Bus Error Info Section\n");
ia64_log_plat_bus_err_info_print((sal_log_plat_bus_err_info_t *)slsh,
prfunc);
@@ -1838,8 +1883,9 @@
n_sects, lh->len);
if (!n_sects) {
prfunc("No Platform Error Info Sections found\n");
- return;
+ return platform_err;
}
+ return platform_err;
}
/*
@@ -1849,15 +1895,17 @@
*
* Inputs : info_type (SAL_INFO_TYPE_{MCA,INIT,CMC,CPE})
* prfunc (fn ptr of log output function to use)
- * Outputs : None
+ * Outputs : platform error status
*/
-void
+int
ia64_log_print(int sal_info_type, prfunc_t prfunc)
{
+ int platform_err = 0;
+
switch(sal_info_type) {
case SAL_INFO_TYPE_MCA:
prfunc("+BEGIN HARDWARE ERROR STATE AT MCA\n");
- ia64_log_platform_info_print(IA64_LOG_CURR_BUFFER(sal_info_type), prfunc);
+ platform_err = ia64_log_platform_info_print(IA64_LOG_CURR_BUFFER(sal_info_type), prfunc);
prfunc("+END HARDWARE ERROR STATE AT MCA\n");
break;
case SAL_INFO_TYPE_INIT:
@@ -1877,4 +1925,5 @@
prfunc("+MCA UNKNOWN ERROR LOG (UNIMPLEMENTED)\n");
break;
}
+ return platform_err;
}
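[Commentary, not part of the patch: the mca.c changes thread a platform-error status up the call chain — ia64_log_platform_info_print() and ia64_log_print() now return whether any platform (as opposed to CPU) error section was found, and ia64_mca_ucmc_handler() only invokes mca_handler_platform() when one was. A control-flow sketch with stubs in place of the real SAL record walk:]

```c
#include <assert.h>

static int platform_handler_calls;

/* Stub for ia64_log_print(): in the kernel the flag is set while
 * matching GUID-tagged platform sections (memory device, PCI bus,
 * SEL, SMBIOS, ...). */
static int log_print(int saw_platform_section)
{
	return saw_platform_section;
}

/* Stub for ia64_mca_ucmc_handler(): platform handling is now gated
 * on the returned status instead of running unconditionally. */
static void ucmc_handler(int saw_platform_section)
{
	int platform_err = log_print(saw_platform_section);

	if (platform_err)
		platform_handler_calls++;	/* mca_handler_platform() */
}
```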
diff -urN linux-2.4.18/arch/ia64/kernel/mca_asm.S lia64-2.4/arch/ia64/kernel/mca_asm.S
--- linux-2.4.18/arch/ia64/kernel/mca_asm.S Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/mca_asm.S Thu Jan 24 11:28:57 2002
@@ -7,6 +7,12 @@
// 00/03/29 cfleck Added code to save INIT handoff state in pt_regs format, switch to temp
// kstack, switch modes, jump to C INIT handler
//
+// 02/01/04 J.Hall
+// Before entering virtual mode code:
+// 1. Check for TLB CPU error
+// 2. Restore current thread pointer to kr6
+// 3. Move stack ptr 16 bytes to conform to C calling convention
+//
#include
#include
@@ -21,10 +27,21 @@
*/
#define MINSTATE_PHYS /* Make sure stack access is physical for MINSTATE */
+/*
+ * Needed for ia64_sal call
+ */
+#define SAL_GET_STATE_INFO 0x01000001
+
+/*
+ * Needed for return context to SAL
+ */
+#define IA64_MCA_SAME_CONTEXT 0x0
+#define IA64_MCA_COLD_BOOT -2
+
#include "minstate.h"
/*
- * SAL_TO_OS_MCA_HANDOFF_STATE (SAL 3.0 spec)
+ * SAL_TO_OS_MCA_HANDOFF_STATE (SAL 3.0 spec)
* 1. GR1 = OS GP
* 2. GR8 = PAL_PROC physical address
* 3. GR9 = SAL_PROC physical address
@@ -40,26 +57,34 @@
st8 [_tmp]=r9,0x08;; \
st8 [_tmp]=r10,0x08;; \
st8 [_tmp]=r11,0x08;; \
- st8 [_tmp]=r12,0x08;;
+ st8 [_tmp]=r12,0x08
/*
- * OS_MCA_TO_SAL_HANDOFF_STATE (SAL 3.0 spec)
- * 1. GR8 = OS_MCA return status
+ * OS_MCA_TO_SAL_HANDOFF_STATE (SAL 3.0 spec)
+ * (p6) is executed if we never entered virtual mode (TLB error)
+ * (p7) is executed if we entered virtual mode as expected (normal case)
+ * 1. GR8 = OS_MCA return status
* 2. GR9 = SAL GP (physical)
- * 3. GR10 = 0/1 returning same/new context
- * 4. GR22 = New min state save area pointer
- * returns ptr to SAL rtn save loc in _tmp
+ * 3. GR10 = 0/1 returning same/new context
+ * 4. GR22 = New min state save area pointer
+ * returns ptr to SAL rtn save loc in _tmp
*/
-#define OS_MCA_TO_SAL_HANDOFF_STATE_RESTORE(_tmp) \
- movl _tmp=ia64_os_to_sal_handoff_state;; \
- DATA_VA_TO_PA(_tmp);; \
- ld8 r8=[_tmp],0x08;; \
- ld8 r9=[_tmp],0x08;; \
- ld8 r10=[_tmp],0x08;; \
- ld8 r22=[_tmp],0x08;; \
- movl _tmp=ia64_sal_to_os_handoff_state;; \
- DATA_VA_TO_PA(_tmp);; \
- add _tmp=0x28,_tmp;; // point to SAL rtn save location
+#define OS_MCA_TO_SAL_HANDOFF_STATE_RESTORE(_tmp) \
+(p6) movl _tmp=ia64_sal_to_os_handoff_state;; \
+(p7) movl _tmp=ia64_os_to_sal_handoff_state;; \
+ DATA_VA_TO_PA(_tmp);; \
+(p6) movl r8=IA64_MCA_COLD_BOOT; \
+(p6) movl r10=IA64_MCA_SAME_CONTEXT; \
+(p6) add _tmp=0x18,_tmp;; \
+(p6) ld8 r9=[_tmp],0x10; \
+(p6) movl r22=ia64_mca_min_state_save_info;; \
+(p7) ld8 r8=[_tmp],0x08;; \
+(p7) ld8 r9=[_tmp],0x08;; \
+(p7) ld8 r10=[_tmp],0x08;; \
+(p7) ld8 r22=[_tmp],0x08;; \
+ DATA_VA_TO_PA(r22)
+ // now _tmp is pointing to SAL rtn save location
+
.global ia64_os_mca_dispatch
.global ia64_os_mca_dispatch_end
@@ -70,6 +95,9 @@
.global ia64_mca_stackframe
.global ia64_mca_bspstore
.global ia64_init_stack
+ .global ia64_mca_sal_data_area
+ .global ia64_tlb_functional
+ .global ia64_mca_min_state_save_info
.text
.align 16
@@ -90,26 +118,34 @@
// for ia64_mca_sal_to_os_state_t has been
// defined in include/asm/mca.h
SAL_TO_OS_MCA_HANDOFF_STATE_SAVE(r2)
+ ;;
// LOG PROCESSOR STATE INFO FROM HERE ON..
- ;;
begin_os_mca_dump:
br ia64_os_mca_proc_state_dump;;
ia64_os_mca_done_dump:
// Setup new stack frame for OS_MCA handling
- movl r2=ia64_mca_bspstore;; // local bspstore area location in r2
+ movl r2=ia64_mca_bspstore;; // local bspstore area location in r2
DATA_VA_TO_PA(r2);;
- movl r3=ia64_mca_stackframe;; // save stack frame to memory in r3
+ movl r3=ia64_mca_stackframe;; // save stack frame to memory in r3
DATA_VA_TO_PA(r3);;
- rse_switch_context(r6,r3,r2);; // RSC management in this new context
- movl r12=ia64_mca_stack;;
- mov r2=8*1024;; // stack size must be same as c array
- add r12=r2,r12;; // stack base @ bottom of array
+ rse_switch_context(r6,r3,r2);; // RSC management in this new context
+ movl r12=ia64_mca_stack
+ mov r2=8*1024;; // stack size must be same as C array
+ add r12=r2,r12;; // stack base @ bottom of array
+ adds r12=-16,r12;; // allow 16 bytes of scratch
+ // (C calling convention)
DATA_VA_TO_PA(r12);;
- // Enter virtual mode from physical mode
+ // Check to see if the MCA resulted from a TLB error
+begin_tlb_error_check:
+ br ia64_os_mca_tlb_error_check;;
+
+done_tlb_error_check:
+
+ // If TLB is functional, enter virtual mode from physical mode
VIRTUAL_MODE_ENTER(r2, r3, ia64_os_mca_virtual_begin, r4)
ia64_os_mca_virtual_begin:
@@ -130,25 +166,28 @@
#endif /* #if defined(MCA_TEST) */
// restore the original stack frame here
- movl r2=ia64_mca_stackframe // restore stack frame from memory at r2
+ movl r2=ia64_mca_stackframe // restore stack frame from memory at r2
;;
DATA_VA_TO_PA(r2)
movl r4=IA64_PSR_MC
;;
- rse_return_context(r4,r3,r2) // switch from interrupt context for RSE
+ rse_return_context(r4,r3,r2) // switch from interrupt context for RSE
// let us restore all the registers from our PSI structure
- mov r8=gp
+ mov r8=gp
;;
begin_os_mca_restore:
br ia64_os_mca_proc_state_restore;;
ia64_os_mca_done_restore:
- ;;
+ movl r3=ia64_tlb_functional;;
+ DATA_VA_TO_PA(r3);;
+ ld8 r3=[r3];;
+ cmp.eq p6,p7=r0,r3;;
+ OS_MCA_TO_SAL_HANDOFF_STATE_RESTORE(r2);;
// branch back to SALE_CHECK
- OS_MCA_TO_SAL_HANDOFF_STATE_RESTORE(r2)
ld8 r3=[r2];;
- mov b0=r3;; // SAL_CHECK return address
+ mov b0=r3;; // SAL_CHECK return address
br b0
;;
ia64_os_mca_dispatch_end:
@@ -405,7 +444,7 @@
movl r2=ia64_mca_proc_state_dump // Convert virtual address
;; // of OS state dump area
DATA_VA_TO_PA(r2) // to physical address
- ;;
+
restore_GRs: // restore bank-1 GRs 16-31
bsw.1;;
add r3=16*8,r2;; // to get to NaT of GR 16-31
@@ -621,6 +660,80 @@
//EndStub//////////////////////////////////////////////////////////////////////
+//++
+// Name:
+// ia64_os_mca_tlb_error_check()
+//
+// Stub Description:
+//
+// This stub checks to see if the MCA resulted from a TLB error
+//
+//--
+
+ia64_os_mca_tlb_error_check:
+
+ // Retrieve sal data structure for uncorrected MCA
+
+ // Make the ia64_sal_get_state_info() call
+ movl r4=ia64_mca_sal_data_area;;
+ movl r7=ia64_sal;;
+ mov r6=r1 // save gp
+ DATA_VA_TO_PA(r4) // convert to physical address
+ DATA_VA_TO_PA(r7);; // convert to physical address
+ ld8 r7=[r7] // get addr of pdesc from ia64_sal
+ movl r3=SAL_GET_STATE_INFO;;
+ DATA_VA_TO_PA(r7);; // convert to physical address
+ ld8 r8=[r7],8;; // get pdesc function pointer
+ DATA_VA_TO_PA(r8) // convert to physical address
+ ld8 r1=[r7];; // set new (ia64_sal) gp
+ DATA_VA_TO_PA(r1) // convert to physical address
+ mov b6=r8
+
+ alloc r5=ar.pfs,8,0,8,0;; // allocate stack frame for SAL call
+ mov out0=r3 // which SAL proc to call
+ mov out1=r0 // error type == MCA
+ mov out2=r0 // null arg
+ mov out3=r4 // data copy area
+ mov out4=r0 // null arg
+ mov out5=r0 // null arg
+ mov out6=r0 // null arg
+ mov out7=r0;; // null arg
+
+ br.call.sptk.few b0=b6;;
+
+ mov r1=r6 // restore gp
+ mov ar.pfs=r5;; // restore ar.pfs
+
+ movl r6=ia64_tlb_functional;;
+ DATA_VA_TO_PA(r6) // needed later
+
+ cmp.eq p6,p7=r0,r8;; // check SAL call return status
+(p7) st8 [r6]=r0 // clear tlb_functional flag
+(p7) br tlb_failure // error; return to SAL
+
+ // examine processor error log for type of error
+ add r4=40+24,r4;; // parse past record header (length=40)
+ // and section header (length=24)
+ ld4 r4=[r4] // get valid field of processor log
+ mov r5=0xf00;;
+ and r5=r4,r5;; // read bits 8-11 of valid field
+ // to determine if we have a TLB error
+ movl r3=0x1
+ cmp.eq p6,p7=r0,r5;;
+ // if no TLB failure, set tlb_functional flag
+(p6) st8 [r6]=r3
+ // else clear flag
+(p7) st8 [r6]=r0
+
+ // if no TLB failure, continue with normal virtual mode logging
+(p6) br done_tlb_error_check
+ // else no point in entering virtual mode for logging
+tlb_failure:
+ br ia64_os_mca_virtual_end
+
+//EndStub//////////////////////////////////////////////////////////////////////
+
+
// ok, the issue here is that we need to save state information so
// it can be usable by the kernel debugger and show_regs routines.
// In order to do this, our best bet is save the current state (plus
@@ -633,7 +746,7 @@
// This has been defined for registration purposes with SAL
// as a part of ia64_mca_init.
//
-// When we get here, the follow registers have been
+// When we get here, the following registers have been
// set by the SAL for our use
//
// 1. GR1 = OS INIT GP
@@ -649,42 +762,10 @@
GLOBAL_ENTRY(ia64_monarch_init_handler)
-#if defined(CONFIG_SMP) && defined(SAL_MPINIT_WORKAROUND)
- //
- // work around SAL bug that sends all processors to monarch entry
- //
- mov r17=cr.lid
- // XXX fix me: this is wrong: hard_smp_processor_id() is a pair of lid/eid
- movl r18=ia64_cpu_to_sapicid
- ;;
- dep r18=0,r18,61,3 // convert to physical address
- ;;
- shr.u r17=r17,16
- ld4 r18=[r18] // get the BSP ID
- ;;
- dep r17=0,r17,16,48
- ;;
- cmp4.ne p6,p0=r17,r18 // Am I the BSP ?
-(p6) br.cond.spnt slave_init_spin_me
- ;;
-#endif
-
-//
-// ok, the first thing we do is stash the information
-// the SAL passed to os
-//
-_tmp = r2
- movl _tmp=ia64_sal_to_os_handoff_state
- ;;
- dep _tmp=0,_tmp, 61, 3 // get physical address
+ // stash the information the SAL passed to os
+ SAL_TO_OS_MCA_HANDOFF_STATE_SAVE(r2)
;;
- st8 [_tmp]=r1,0x08;;
- st8 [_tmp]=r8,0x08;;
- st8 [_tmp]=r9,0x08;;
- st8 [_tmp]=r10,0x08;;
- st8 [_tmp]=r11,0x08;;
- st8 [_tmp]=r12,0x08;;
// now we want to save information so we can dump registers
SAVE_MIN_WITH_COVER
@@ -695,12 +776,10 @@
;;
SAVE_REST
-// ok, enough should be saved at this point to be dangerous, and supply
+// ok, enough should be saved at this point to be dangerous, and supply
// information for a dump
// We need to switch to Virtual mode before hitting the C functions.
-//
-//
-//
+
movl r2=IA64_PSR_IT|IA64_PSR_IC|IA64_PSR_DT|IA64_PSR_RT|IA64_PSR_DFH|IA64_PSR_BN
mov r3=psr // get the current psr, minimum enabled at this point
;;
@@ -708,8 +787,8 @@
;;
movl r3=IVirtual_Switch
;;
- mov cr.iip=r3 // short return to set the appropriate bits
- mov cr.ipsr=r2 // need to do an rfi to set appropriate bits
+ mov cr.iip=r3 // short return to set the appropriate bits
+ mov cr.ipsr=r2 // need to do an rfi to set appropriate bits
;;
rfi
;;
@@ -717,7 +796,7 @@
//
// We should now be running virtual
//
- // Lets call the C handler to get the rest of the state info
+ // Let's call the C handler to get the rest of the state info
//
alloc r14=ar.pfs,0,0,1,0 // now it's safe (must be first in insn group!)
;; //
diff -urN linux-2.4.18/arch/ia64/kernel/minstate.h lia64-2.4/arch/ia64/kernel/minstate.h
--- linux-2.4.18/arch/ia64/kernel/minstate.h Tue Jul 31 10:30:08 2001
+++ lia64-2.4/arch/ia64/kernel/minstate.h Tue Apr 9 22:21:40 2002
@@ -92,7 +92,6 @@
*
* Assumed state upon entry:
* psr.ic: off
- * psr.dt: off
* r31: contains saved predicates (pr)
*
* Upon exit, the state is as follows:
@@ -186,7 +185,6 @@
*
* Assumed state upon entry:
* psr.ic: on
- * psr.dt: on
* r2: points to &pt_regs.r16
* r3: points to &pt_regs.r17
*/
diff -urN linux-2.4.18/arch/ia64/kernel/pal.S lia64-2.4/arch/ia64/kernel/pal.S
--- linux-2.4.18/arch/ia64/kernel/pal.S Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/pal.S Tue Feb 26 20:41:18 2002
@@ -161,7 +161,7 @@
;;
mov loc3 = psr // save psr
adds r8 = 1f-1b,r8 // calculate return address for call
- ;;
+ ;;
mov loc4=ar.rsc // save RSE configuration
dep.z loc2=loc2,0,61 // convert pal entry point to physical
dep.z r8=r8,0,61 // convert rp to physical
@@ -216,7 +216,7 @@
mov out3 = in3 // copy arg3
;;
mov loc3 = psr // save psr
- ;;
+ ;;
mov loc4=ar.rsc // save RSE configuration
dep.z loc2=loc2,0,61 // convert pal entry point to physical
;;
diff -urN linux-2.4.18/arch/ia64/kernel/palinfo.c lia64-2.4/arch/ia64/kernel/palinfo.c
--- linux-2.4.18/arch/ia64/kernel/palinfo.c Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/palinfo.c Thu Jan 24 17:16:06 2002
@@ -724,7 +724,7 @@
status = ia64_pal_tr_read(j, i, tr_buffer, &tr_valid);
if (status != 0) {
- printk(__FUNCTION__ " pal call failed on tr[%d:%d]=%ld\n", i, j, status);
+ printk("palinfo: pal call failed on tr[%d:%d]=%ld\n", i, j, status);
continue;
}
@@ -842,9 +842,8 @@
palinfo_smp_call(void *info)
{
palinfo_smp_data_t *data = (palinfo_smp_data_t *)info;
- /* printk(__FUNCTION__" called on CPU %d\n", smp_processor_id());*/
if (data == NULL) {
- printk(KERN_ERR __FUNCTION__" data pointer is NULL\n");
+ printk(KERN_ERR "palinfo: data pointer is NULL\n");
data->ret = 0; /* no output */
return;
}
@@ -868,11 +867,10 @@
ptr.page = page;
ptr.ret = 0; /* just in case */
- /*printk(__FUNCTION__" calling CPU %d from CPU %d for function %d\n", f->req_cpu,smp_processor_id(), f->func_id);*/
/* will send IPI to other CPU and wait for completion of remote call */
if ((ret=smp_call_function_single(f->req_cpu, palinfo_smp_call, &ptr, 0, 1))) {
- printk(__FUNCTION__" remote CPU call from %d to %d on function %d: error %d\n", smp_processor_id(), f->req_cpu, f->func_id, ret);
+ printk("palinfo: remote CPU call from %d to %d on function %d: error %d\n", smp_processor_id(), f->req_cpu, f->func_id, ret);
return 0;
}
return ptr.ret;
@@ -881,7 +879,7 @@
static
int palinfo_handle_smp(pal_func_cpu_u_t *f, char *page)
{
- printk(__FUNCTION__" should not be called with non SMP kernel\n");
+ printk("palinfo: should not be called with non SMP kernel\n");
return 0;
}
#endif /* CONFIG_SMP */
diff -urN linux-2.4.18/arch/ia64/kernel/pci.c lia64-2.4/arch/ia64/kernel/pci.c
--- linux-2.4.18/arch/ia64/kernel/pci.c Wed Dec 26 16:58:36 2001
+++ lia64-2.4/arch/ia64/kernel/pci.c Wed Apr 10 10:28:58 2002
@@ -42,101 +42,183 @@
extern void ia64_mca_check_errors( void );
#endif
+struct pci_fixup pcibios_fixups[];
+
+struct pci_ops *pci_root_ops;
+
+int (*pci_config_read)(int seg, int bus, int dev, int fn, int reg, int len, u32 *value);
+int (*pci_config_write)(int seg, int bus, int dev, int fn, int reg, int len, u32 value);
+
+
/*
- * This interrupt-safe spinlock protects all accesses to PCI
- * configuration space.
+ * Low-level SAL-based PCI configuration access functions. Note that SAL
+ * calls are already serialized (via sal_lock), so we don't need another
+ * synchronization mechanism here. Not using segment number (yet).
*/
-static spinlock_t pci_lock = SPIN_LOCK_UNLOCKED;
-struct pci_fixup pcibios_fixups[] = {
- { 0 }
-};
+#define PCI_SAL_ADDRESS(bus, dev, fn, reg) \
+ ((u64)(bus << 16) | (u64)(dev << 11) | (u64)(fn << 8) | (u64)(reg))
+
+static int
+pci_sal_read (int seg, int bus, int dev, int fn, int reg, int len, u32 *value)
+{
+ int result = 0;
+ u64 data = 0;
+
+ if (!value || (bus > 255) || (dev > 31) || (fn > 7) || (reg > 255))
+ return -EINVAL;
+
+ result = ia64_sal_pci_config_read(PCI_SAL_ADDRESS(bus, dev, fn, reg), len, &data);
+
+ *value = (u32) data;
-/* Macro to build a PCI configuration address to be passed as a parameter to SAL. */
+ return result;
+}
+
+static int
+pci_sal_write (int seg, int bus, int dev, int fn, int reg, int len, u32 value)
+{
+ if ((bus > 255) || (dev > 31) || (fn > 7) || (reg > 255))
+ return -EINVAL;
+
+ return ia64_sal_pci_config_write(PCI_SAL_ADDRESS(bus, dev, fn, reg), len, value);
+}
-#define PCI_CONFIG_ADDRESS(dev, where) \
- (((u64) dev->bus->number << 16) | ((u64) (dev->devfn & 0xff) << 8) | (where & 0xff))
static int
-pci_conf_read_config_byte(struct pci_dev *dev, int where, u8 *value)
+pci_sal_read_config_byte (struct pci_dev *dev, int where, u8 *value)
{
- s64 status;
- u64 lval;
+ int result = 0;
+ u32 data = 0;
- status = ia64_sal_pci_config_read(PCI_CONFIG_ADDRESS(dev, where), 1, &lval);
- *value = lval;
- return status;
+ if (!value)
+ return -EINVAL;
+
+ result = pci_sal_read(0, dev->bus->number, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn), where, 1, &data);
+
+ *value = (u8) data;
+
+ return result;
}
static int
-pci_conf_read_config_word(struct pci_dev *dev, int where, u16 *value)
+pci_sal_read_config_word (struct pci_dev *dev, int where, u16 *value)
{
- s64 status;
- u64 lval;
+ int result = 0;
+ u32 data = 0;
+
+ if (!value)
+ return -EINVAL;
+
+ result = pci_sal_read(0, dev->bus->number, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn), where, 2, &data);
- status = ia64_sal_pci_config_read(PCI_CONFIG_ADDRESS(dev, where), 2, &lval);
- *value = lval;
- return status;
+ *value = (u16) data;
+
+ return result;
}
static int
-pci_conf_read_config_dword(struct pci_dev *dev, int where, u32 *value)
+pci_sal_read_config_dword (struct pci_dev *dev, int where, u32 *value)
{
- s64 status;
- u64 lval;
+ if (!value)
+ return -EINVAL;
- status = ia64_sal_pci_config_read(PCI_CONFIG_ADDRESS(dev, where), 4, &lval);
- *value = lval;
- return status;
+ return pci_sal_read(0, dev->bus->number, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn), where, 4, value);
}
static int
-pci_conf_write_config_byte (struct pci_dev *dev, int where, u8 value)
+pci_sal_write_config_byte (struct pci_dev *dev, int where, u8 value)
{
- return ia64_sal_pci_config_write(PCI_CONFIG_ADDRESS(dev, where), 1, value);
+ return pci_sal_write(0, dev->bus->number, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn), where, 1, value);
}
static int
-pci_conf_write_config_word (struct pci_dev *dev, int where, u16 value)
+pci_sal_write_config_word (struct pci_dev *dev, int where, u16 value)
{
- return ia64_sal_pci_config_write(PCI_CONFIG_ADDRESS(dev, where), 2, value);
+ return pci_sal_write(0, dev->bus->number, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn), where, 2, value);
}
static int
-pci_conf_write_config_dword (struct pci_dev *dev, int where, u32 value)
+pci_sal_write_config_dword (struct pci_dev *dev, int where, u32 value)
{
- return ia64_sal_pci_config_write(PCI_CONFIG_ADDRESS(dev, where), 4, value);
+ return pci_sal_write(0, dev->bus->number, PCI_SLOT(dev->devfn),
+ PCI_FUNC(dev->devfn), where, 4, value);
}
-struct pci_ops pci_conf = {
- pci_conf_read_config_byte,
- pci_conf_read_config_word,
- pci_conf_read_config_dword,
- pci_conf_write_config_byte,
- pci_conf_write_config_word,
- pci_conf_write_config_dword
+struct pci_ops pci_sal_ops = {
+ pci_sal_read_config_byte,
+ pci_sal_read_config_word,
+ pci_sal_read_config_dword,
+ pci_sal_write_config_byte,
+ pci_sal_write_config_word,
+ pci_sal_write_config_dword
};
+
/*
* Initialization. Uses the SAL interface
*/
+
+struct pci_bus *
+pcibios_scan_root(int seg, int bus)
+{
+ struct list_head *list = NULL;
+ struct pci_bus *pci_bus = NULL;
+
+ list_for_each(list, &pci_root_buses) {
+ pci_bus = pci_bus_b(list);
+ if (pci_bus->number == bus) {
+ /* Already scanned */
+ printk("PCI: Bus (%02x:%02x) already probed\n", seg, bus);
+ return pci_bus;
+ }
+ }
+
+ printk("PCI: Probing PCI hardware on bus (%02x:%02x)\n", seg, bus);
+
+ return pci_scan_bus(bus, pci_root_ops, NULL);
+}
+
+void __init
+pcibios_config_init (void)
+{
+ if (pci_root_ops)
+ return;
+
+ printk("PCI: Using SAL to access configuration space\n");
+
+ pci_root_ops = &pci_sal_ops;
+ pci_config_read = pci_sal_read;
+ pci_config_write = pci_sal_write;
+
+ return;
+}
+
void __init
pcibios_init (void)
{
# define PCI_BUSES_TO_SCAN 255
- int i;
+ int i = 0;
#ifdef CONFIG_IA64_MCA
ia64_mca_check_errors(); /* For post-failure MCA error logging */
#endif
- platform_pci_fixup(0); /* phase 0 initialization (before PCI bus has been scanned) */
+ pcibios_config_init();
+
+ platform_pci_fixup(0); /* phase 0 fixups (before buses scanned) */
printk("PCI: Probing PCI hardware\n");
for (i = 0; i < PCI_BUSES_TO_SCAN; i++)
- pci_scan_bus(i, &pci_conf, NULL);
+ pci_scan_bus(i, pci_root_ops, NULL);
+
+ platform_pci_fixup(1); /* phase 1 fixups (after buses scanned) */
- platform_pci_fixup(1); /* phase 1 initialization (after PCI bus has been scanned) */
return;
}
@@ -186,7 +268,14 @@
int
pcibios_enable_device (struct pci_dev *dev)
{
+ if (!dev)
+ return -EINVAL;
+
/* Not needed, since we enable all devices at startup. */
+
+ printk(KERN_INFO "PCI: Found IRQ %d for device %s\n", dev->irq,
+ dev->slot_name);
+
return 0;
}
diff -urN linux-2.4.18/arch/ia64/kernel/perfmon.c lia64-2.4/arch/ia64/kernel/perfmon.c
--- linux-2.4.18/arch/ia64/kernel/perfmon.c Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/perfmon.c Tue Apr 9 13:23:36 2002
@@ -1,13 +1,16 @@
/*
- * This file contains the code to configure and read/write the ia64 performance
- * monitoring stuff.
+ * This file implements the perfmon subsystem which is used
+ * to program the IA-64 Performance Monitoring Unit (PMU).
*
* Originally written by Ganesh Venkitachalam, IBM Corp.
- * Modifications by David Mosberger-Tang, Hewlett-Packard Co.
- * Modifications by Stephane Eranian, Hewlett-Packard Co.
* Copyright (C) 1999 Ganesh Venkitachalam
- * Copyright (C) 1999 David Mosberger-Tang
- * Copyright (C) 2000-2001 Stephane Eranian
+ *
+ * Modifications by Stephane Eranian, Hewlett-Packard Co.
+ * Modifications by David Mosberger-Tang, Hewlett-Packard Co.
+ *
+ * Copyright (C) 1999-2002 Hewlett Packard Co
+ * Stephane Eranian
+ * David Mosberger-Tang
*/
#include
@@ -20,286 +23,413 @@
#include
#include
#include
+#include
#include
-#include
#include
-#include
#include
#include
#include
-#include
#include
#include
#include
-#include
#include
#include /* for ia64_get_itc() */
#ifdef CONFIG_PERFMON
-#define PFM_VERSION "0.3"
-#define PFM_SMPL_HDR_VERSION 1
-
-#define PMU_FIRST_COUNTER 4 /* first generic counter */
-
-#define PFM_WRITE_PMCS 0xa0
-#define PFM_WRITE_PMDS 0xa1
-#define PFM_READ_PMDS 0xa2
-#define PFM_STOP 0xa3
-#define PFM_START 0xa4
-#define PFM_ENABLE 0xa5 /* unfreeze only */
-#define PFM_DISABLE 0xa6 /* freeze only */
-#define PFM_RESTART 0xcf
-#define PFM_CREATE_CONTEXT 0xa7
-#define PFM_DESTROY_CONTEXT 0xa8
/*
- * Those 2 are just meant for debugging. I considered using sysctl() for
- * that but it is a little bit too pervasive. This solution is at least
- * self-contained.
+ * For PMUs which rely on the debug registers for some features, you must
+ * enable the following flag to activate support for accessing these
+ * registers via the perfmonctl() interface.
*/
-#define PFM_DEBUG_ON 0xe0
-#define PFM_DEBUG_OFF 0xe1
-
-#define PFM_DEBUG_BASE PFM_DEBUG_ON
-
+#if defined(CONFIG_ITANIUM) || defined(CONFIG_MCKINLEY)
+#define PFM_PMU_USES_DBR 1
+#endif
/*
- * perfmon API flags
+ * perfmon context states
*/
-#define PFM_FL_INHERIT_NONE 0x00 /* never inherit a context across fork (default) */
-#define PFM_FL_INHERIT_ONCE 0x01 /* clone pfm_context only once across fork() */
-#define PFM_FL_INHERIT_ALL 0x02 /* always clone pfm_context across fork() */
-#define PFM_FL_SMPL_OVFL_NOBLOCK 0x04 /* do not block on sampling buffer overflow */
-#define PFM_FL_SYSTEM_WIDE 0x08 /* create a system wide context */
-#define PFM_FL_EXCL_INTR 0x10 /* exclude interrupt from system wide monitoring */
+#define PFM_CTX_DISABLED 0
+#define PFM_CTX_ENABLED 1
/*
- * PMC API flags
+ * Reset register flags
*/
-#define PFM_REGFL_OVFL_NOTIFY 1 /* send notification on overflow */
+#define PFM_RELOAD_LONG_RESET 1
+#define PFM_RELOAD_SHORT_RESET 2
/*
- * Private flags and masks
+ * Misc macros and definitions
*/
+#define PMU_FIRST_COUNTER 4
+
+#define PFM_IS_DISABLED() pmu_conf.pfm_is_disabled
+
+#define PMC_OVFL_NOTIFY(ctx, i) ((ctx)->ctx_soft_pmds[i].flags & PFM_REGFL_OVFL_NOTIFY)
#define PFM_FL_INHERIT_MASK (PFM_FL_INHERIT_NONE|PFM_FL_INHERIT_ONCE|PFM_FL_INHERIT_ALL)
-#ifdef CONFIG_SMP
-#define cpu_is_online(i) (cpu_online_map & (1UL << i))
-#else
-#define cpu_is_online(i) 1
-#endif
+/* assume i is unsigned */
+#define PMC_IS_IMPL(i) (i < pmu_conf.num_pmcs && pmu_conf.impl_regs[i>>6] & (1UL<< (i) %64))
+#define PMD_IS_IMPL(i) (i < pmu_conf.num_pmds && pmu_conf.impl_regs[4+(i>>6)] & (1UL<<(i) % 64))
+
+/* XXX: these three assume that register i is implemented */
+#define PMD_IS_COUNTING(i) (pmu_conf.pmd_desc[i].type == PFM_REG_COUNTING)
+#define PMC_IS_COUNTING(i) (pmu_conf.pmc_desc[i].type == PFM_REG_COUNTING)
+#define PMC_IS_MONITOR(c) (pmu_conf.pmc_desc[i].type == PFM_REG_MONITOR)
+
+/* assume k is unsigned */
+#define IBR_IS_IMPL(k) (k < pmu_conf.num_ibrs)
+#define DBR_IS_IMPL(k) (k < pmu_conf.num_dbrs)
+
+#define CTX_IS_ENABLED(c) ((c)->ctx_flags.state == PFM_CTX_ENABLED)
+#define CTX_OVFL_NOBLOCK(c) ((c)->ctx_fl_block == 0)
+#define CTX_INHERIT_MODE(c) ((c)->ctx_fl_inherit)
+#define CTX_HAS_SMPL(c) ((c)->ctx_psb != NULL)
+/* XXX: does not support more than 64 PMDs */
+#define CTX_USED_PMD(ctx, mask) (ctx)->ctx_used_pmds[0] |= (mask)
+#define CTX_IS_USED_PMD(ctx, c) (((ctx)->ctx_used_pmds[0] & (1UL << (c))) != 0UL)
+
+
+#define CTX_USED_IBR(ctx,n) (ctx)->ctx_used_ibrs[(n)>>6] |= 1UL<< ((n) % 64)
+#define CTX_USED_DBR(ctx,n) (ctx)->ctx_used_dbrs[(n)>>6] |= 1UL<< ((n) % 64)
+#define CTX_USES_DBREGS(ctx) (((pfm_context_t *)(ctx))->ctx_fl_using_dbreg==1)
+
+#define LOCK_CTX(ctx) spin_lock(&(ctx)->ctx_lock)
+#define UNLOCK_CTX(ctx) spin_unlock(&(ctx)->ctx_lock)
+
+#define SET_PMU_OWNER(t) do { pmu_owners[smp_processor_id()].owner = (t); } while(0)
+#define PMU_OWNER() pmu_owners[smp_processor_id()].owner
-#define PMC_IS_IMPL(i) (i < pmu_conf.num_pmcs && pmu_conf.impl_regs[i>>6] & (1<< (i&~(64-1))))
-#define PMD_IS_IMPL(i) (i < pmu_conf.num_pmds && pmu_conf.impl_regs[4+(i>>6)] & (1<< (i&~(64-1))))
-#define PMD_IS_COUNTER(i) (i>=PMU_FIRST_COUNTER && i < (PMU_FIRST_COUNTER+pmu_conf.max_counters))
-#define PMC_IS_COUNTER(i) (i>=PMU_FIRST_COUNTER && i < (PMU_FIRST_COUNTER+pmu_conf.max_counters))
+#define LOCK_PFS() spin_lock(&pfm_sessions.pfs_lock)
+#define UNLOCK_PFS() spin_unlock(&pfm_sessions.pfs_lock)
-/* This is the Itanium-specific PMC layout for counter config */
+#define PFM_REG_RETFLAG_SET(flags, val) do { flags &= ~PFM_REG_RETFL_MASK; flags |= (val); } while(0)
+
+/*
+ * debugging
+ */
+#define DBprintk(a) \
+ do { \
+ if (pfm_debug_mode >0 || pfm_sysctl.debug >0) { printk("%s.%d: CPU%d ", __FUNCTION__, __LINE__, smp_processor_id()); printk a; } \
+ } while (0)
+
+
+/*
+ * Architected PMC structure
+ */
typedef struct {
unsigned long pmc_plm:4; /* privilege level mask */
unsigned long pmc_ev:1; /* external visibility */
unsigned long pmc_oi:1; /* overflow interrupt */
unsigned long pmc_pm:1; /* privileged monitor */
unsigned long pmc_ig1:1; /* reserved */
- unsigned long pmc_es:7; /* event select */
- unsigned long pmc_ig2:1; /* reserved */
- unsigned long pmc_umask:4; /* unit mask */
- unsigned long pmc_thres:3; /* threshold */
- unsigned long pmc_ig3:1; /* reserved (missing from table on p6-17) */
- unsigned long pmc_ism:2; /* instruction set mask */
- unsigned long pmc_ig4:38; /* reserved */
-} pmc_counter_reg_t;
-
-/* test for EAR/BTB configuration */
-#define PMU_DEAR_EVENT 0x67
-#define PMU_IEAR_EVENT 0x23
-#define PMU_BTB_EVENT 0x11
-
-#define PMC_IS_DEAR(a) (((pmc_counter_reg_t *)(a))->pmc_es == PMU_DEAR_EVENT)
-#define PMC_IS_IEAR(a) (((pmc_counter_reg_t *)(a))->pmc_es == PMU_IEAR_EVENT)
-#define PMC_IS_BTB(a) (((pmc_counter_reg_t *)(a))->pmc_es == PMU_BTB_EVENT)
-
-/*
- * This header is at the beginning of the sampling buffer returned to the user.
- * It is exported as Read-Only at this point. It is directly followed with the
- * first record.
- */
-typedef struct {
- int hdr_version; /* could be used to differentiate formats */
- int hdr_reserved;
- unsigned long hdr_entry_size; /* size of one entry in bytes */
- unsigned long hdr_count; /* how many valid entries */
- unsigned long hdr_pmds; /* which pmds are recorded */
-} perfmon_smpl_hdr_t;
-
-/*
- * Header entry in the buffer as a header as follows.
- * The header is directly followed with the PMDS to saved in increasing index order:
- * PMD4, PMD5, .... How many PMDs are present is determined by the tool which must
- * keep track of it when generating the final trace file.
- */
-typedef struct {
- int pid; /* identification of process */
- int cpu; /* which cpu was used */
- unsigned long rate; /* initial value of this counter */
- unsigned long stamp; /* timestamp */
- unsigned long ip; /* where did the overflow interrupt happened */
- unsigned long regs; /* which registers overflowed (up to 64)*/
-} perfmon_smpl_entry_t;
+ unsigned long pmc_es:8; /* event select */
+ unsigned long pmc_ig2:48; /* reserved */
+} pfm_monitor_t;
/*
* There is one such data structure per perfmon context. It is used to describe the
- * sampling buffer. It is to be shared among siblings whereas the pfm_context isn't.
+ * sampling buffer. It is to be shared among siblings whereas the pfm_context
+ * is not.
* Therefore we maintain a refcnt which is incremented on fork().
- * This buffer is private to the kernel only the actual sampling buffer including its
- * header are exposed to the user. This construct allows us to export the buffer read-write,
- * if needed, without worrying about security problems.
+ * This buffer is private to the kernel; only the actual sampling buffer,
+ * including its header, is exposed to the user. This construct allows us to
+ * export the buffer read-write, if needed, without worrying about security
+ * problems.
*/
-typedef struct {
- atomic_t psb_refcnt; /* how many users for the buffer */
- int reserved;
+typedef struct _pfm_smpl_buffer_desc {
+ spinlock_t psb_lock; /* protection lock */
+ unsigned long psb_refcnt; /* how many users for the buffer */
+ int psb_flags; /* bitvector of flags */
+
void *psb_addr; /* points to location of first entry */
unsigned long psb_entries; /* maximum number of entries */
unsigned long psb_size; /* aligned size of buffer */
- unsigned long psb_index; /* next free entry slot */
+ unsigned long psb_index; /* next free entry slot XXX: must use the one in buffer */
unsigned long psb_entry_size; /* size of each entry including entry header */
perfmon_smpl_hdr_t *psb_hdr; /* points to sampling buffer header */
+
+ struct _pfm_smpl_buffer_desc *psb_next; /* next psb, used for rvfreeing of psb_hdr */
+
} pfm_smpl_buffer_desc_t;
+#define LOCK_PSB(p) spin_lock(&(p)->psb_lock)
+#define UNLOCK_PSB(p) spin_unlock(&(p)->psb_lock)
+
+#define PFM_PSB_VMA 0x1 /* a VMA is describing the buffer */
/*
- * This structure is initialized at boot time and contains
- * a description of the PMU main characteristic as indicated
- * by PAL
+ * The possible type of a PMU register
*/
-typedef struct {
- unsigned long pfm_is_disabled; /* indicates if perfmon is working properly */
- unsigned long perf_ovfl_val; /* overflow value for generic counters */
- unsigned long max_counters; /* upper limit on counter pair (PMC/PMD) */
- unsigned long num_pmcs ; /* highest PMC implemented (may have holes) */
- unsigned long num_pmds; /* highest PMD implemented (may have holes) */
- unsigned long impl_regs[16]; /* buffer used to hold implememted PMC/PMD mask */
-} pmu_config_t;
-
-#define PERFMON_IS_DISABLED() pmu_conf.pfm_is_disabled
+typedef enum {
+ PFM_REG_NOTIMPL, /* not implemented */
+ PFM_REG_NONE, /* end marker */
+ PFM_REG_MONITOR, /* a PMC with a pmc.pm field only */
+ PFM_REG_COUNTING,/* a PMC with a pmc.pm AND pmc.oi, a PMD used as a counter */
+ PFM_REG_CONTROL, /* PMU control register */
+ PFM_REG_CONFIG, /* refine configuration */
+ PFM_REG_BUFFER /* PMD used as buffer */
+} pfm_pmu_reg_type_t;
+/*
+ * 64-bit software counter structure
+ */
typedef struct {
- __u64 val; /* virtual 64bit counter value */
- __u64 ival; /* initial value from user */
- __u64 smpl_rval; /* reset value on sampling overflow */
- __u64 ovfl_rval; /* reset value on overflow */
- int flags; /* notify/do not notify */
+ u64 val; /* virtual 64bit counter value */
+ u64 ival; /* initial value from user */
+ u64 long_reset; /* reset value on sampling overflow */
+ u64 short_reset;/* reset value on overflow */
+ u64 reset_pmds[4]; /* which other pmds to reset when this counter overflows */
+ int flags; /* notify/do not notify */
} pfm_counter_t;
-#define PMD_OVFL_NOTIFY(ctx, i) ((ctx)->ctx_pmds[i].flags & PFM_REGFL_OVFL_NOTIFY)
/*
- * perfmon context. One per process, is cloned on fork() depending on inheritance flags
+ * perfmon context. One per process, is cloned on fork() depending on
+ * inheritance flags
*/
typedef struct {
- unsigned int inherit:2; /* inherit mode */
- unsigned int noblock:1; /* block/don't block on overflow with notification */
- unsigned int system:1; /* do system wide monitoring */
- unsigned int frozen:1; /* pmu must be kept frozen on ctxsw in */
- unsigned int exclintr:1;/* exlcude interrupts from system wide monitoring */
- unsigned int reserved:26;
+ unsigned int state:1; /* 0=disabled, 1=enabled */
+ unsigned int inherit:2; /* inherit mode */
+ unsigned int block:1; /* when 1, task will block on user notifications */
+ unsigned int system:1; /* do system wide monitoring */
+ unsigned int frozen:1; /* pmu must be kept frozen on ctxsw in */
+ unsigned int protected:1; /* allow access to creator of context only */
+ unsigned int using_dbreg:1; /* using range restrictions (debug registers) */
+ unsigned int reserved:24;
} pfm_context_flags_t;
+/*
+ * perfmon context: encapsulates all the state of a monitoring session
+ * XXX: probably need to change layout
+ */
typedef struct pfm_context {
+ pfm_smpl_buffer_desc_t *ctx_psb; /* sampling buffer, if any */
+ unsigned long ctx_smpl_vaddr; /* user level virtual address of smpl buffer */
- pfm_smpl_buffer_desc_t *ctx_smpl_buf; /* sampling buffer descriptor, if any */
- unsigned long ctx_dear_counter; /* which PMD holds D-EAR */
- unsigned long ctx_iear_counter; /* which PMD holds I-EAR */
- unsigned long ctx_btb_counter; /* which PMD holds BTB */
-
- spinlock_t ctx_notify_lock;
+ spinlock_t ctx_lock;
pfm_context_flags_t ctx_flags; /* block/noblock */
- int ctx_notify_sig; /* XXX: SIGPROF or other */
+
struct task_struct *ctx_notify_task; /* who to notify on overflow */
- struct task_struct *ctx_creator; /* pid of creator (debug) */
+ struct task_struct *ctx_owner; /* pid of creator (debug) */
+
+ unsigned long ctx_ovfl_regs[4]; /* which registers overflowed (notification) */
+ unsigned long ctx_smpl_regs[4]; /* which registers to record on overflow */
+
+ struct semaphore ctx_restart_sem; /* use for blocking notification mode */
+
+ unsigned long ctx_used_pmds[4]; /* bitmask of PMD used */
+ unsigned long ctx_reload_pmds[4]; /* bitmask of PMD to reload on ctxsw */
- unsigned long ctx_ovfl_regs; /* which registers just overflowed (notification) */
- unsigned long ctx_smpl_regs; /* which registers to record on overflow */
+ unsigned long ctx_used_pmcs[4]; /* bitmask PMC used by context */
+ unsigned long ctx_reload_pmcs[4]; /* bitmask of PMC to reload on ctxsw */
- struct semaphore ctx_restart_sem; /* use for blocking notification mode */
+ unsigned long ctx_used_ibrs[4]; /* bitmask of used IBR (speedup ctxsw) */
+ unsigned long ctx_used_dbrs[4]; /* bitmask of used DBR (speedup ctxsw) */
- unsigned long ctx_used_pmds[4]; /* bitmask of used PMD (speedup ctxsw) */
- unsigned long ctx_used_pmcs[4]; /* bitmask of used PMC (speedup ctxsw) */
+ pfm_counter_t ctx_soft_pmds[IA64_NUM_PMD_REGS]; /* XXX: size should be dynamic */
- pfm_counter_t ctx_pmds[IA64_NUM_PMD_COUNTERS]; /* XXX: size should be dynamic */
+ u64 ctx_saved_psr; /* copy of psr used for lazy ctxsw */
+ unsigned long ctx_saved_cpus_allowed; /* copy of the task cpus_allowed (system wide) */
+ unsigned long ctx_cpu; /* cpu to which perfmon is applied (system wide) */
+ atomic_t ctx_saving_in_progress; /* flag indicating actual save in progress */
+ atomic_t ctx_is_busy; /* context accessed by overflow handler */
+ atomic_t ctx_last_cpu; /* CPU id of current or last CPU used */
} pfm_context_t;
-#define CTX_USED_PMD(ctx,n) (ctx)->ctx_used_pmds[(n)>>6] |= 1<< ((n) % 64)
-#define CTX_USED_PMC(ctx,n) (ctx)->ctx_used_pmcs[(n)>>6] |= 1<< ((n) % 64)
+#define ctx_fl_inherit ctx_flags.inherit
+#define ctx_fl_block ctx_flags.block
+#define ctx_fl_system ctx_flags.system
+#define ctx_fl_frozen ctx_flags.frozen
+#define ctx_fl_protected ctx_flags.protected
+#define ctx_fl_using_dbreg ctx_flags.using_dbreg
-#define ctx_fl_inherit ctx_flags.inherit
-#define ctx_fl_noblock ctx_flags.noblock
-#define ctx_fl_system ctx_flags.system
-#define ctx_fl_frozen ctx_flags.frozen
-#define ctx_fl_exclintr ctx_flags.exclintr
+/*
+ * global information about all sessions
+ * mostly used to synchronize between system wide and per-process
+ */
+typedef struct {
+ spinlock_t pfs_lock; /* lock the structure */
-#define CTX_OVFL_NOBLOCK(c) ((c)->ctx_fl_noblock == 1)
-#define CTX_INHERIT_MODE(c) ((c)->ctx_fl_inherit)
-#define CTX_HAS_SMPL(c) ((c)->ctx_smpl_buf != NULL)
+ unsigned long pfs_task_sessions; /* number of per task sessions */
+ unsigned long pfs_sys_sessions; /* number of per system wide sessions */
+ unsigned long pfs_sys_use_dbregs; /* incremented when a system wide session uses debug regs */
+ unsigned long pfs_ptrace_use_dbregs; /* incremented when a process uses debug regs */
+ struct task_struct *pfs_sys_session[NR_CPUS]; /* point to task owning a system-wide session */
+} pfm_session_t;
-static pmu_config_t pmu_conf;
+/*
+ * information about a PMC or PMD.
+ * dep_pmd[]: a bitmask of dependent PMD registers
+ * dep_pmc[]: a bitmask of dependent PMC registers
+ */
+typedef struct {
+ pfm_pmu_reg_type_t type;
+ int pm_pos;
+ int (*read_check)(struct task_struct *task, unsigned int cnum, unsigned long *val);
+ int (*write_check)(struct task_struct *task, unsigned int cnum, unsigned long *val);
+ unsigned long dep_pmd[4];
+ unsigned long dep_pmc[4];
+} pfm_reg_desc_t;
+/* assume cnum is a valid monitor */
+#define PMC_PM(cnum, val) (((val) >> (pmu_conf.pmc_desc[cnum].pm_pos)) & 0x1)
+#define PMC_WR_FUNC(cnum) (pmu_conf.pmc_desc[cnum].write_check)
+#define PMD_WR_FUNC(cnum) (pmu_conf.pmd_desc[cnum].write_check)
+#define PMD_RD_FUNC(cnum) (pmu_conf.pmd_desc[cnum].read_check)
-/* for debug only */
-static int pfm_debug=0; /* 0= nodebug, >0= debug output on */
+/*
+ * This structure is initialized at boot time and contains
+ * a description of the PMU's main characteristics as indicated
+ * by PAL, along with a list of inter-register dependencies and configurations.
+ */
+typedef struct {
+ unsigned long pfm_is_disabled; /* indicates if perfmon is working properly */
+ unsigned long perf_ovfl_val; /* overflow value for generic counters */
+ unsigned long max_counters; /* upper limit on counter pair (PMC/PMD) */
+ unsigned long num_pmcs ; /* highest PMC implemented (may have holes) */
+ unsigned long num_pmds; /* highest PMD implemented (may have holes) */
+ unsigned long impl_regs[16]; /* buffer used to hold implemented PMC/PMD mask */
+ unsigned long num_ibrs; /* number of instruction debug registers */
+ unsigned long num_dbrs; /* number of data debug registers */
+ pfm_reg_desc_t *pmc_desc; /* detailed PMC register descriptions */
+ pfm_reg_desc_t *pmd_desc; /* detailed PMD register descriptions */
+} pmu_config_t;
-#define DBprintk(a) \
- do { \
- if (pfm_debug >0) { printk(__FUNCTION__" %d: ", __LINE__); printk a; } \
- } while (0);
-static void ia64_reset_pmu(void);
+/*
+ * structure used to pass argument to/from remote CPU
+ * using IPI to check and possibly save the PMU context on SMP systems.
+ *
+ * not used in UP kernels
+ */
+typedef struct {
+ struct task_struct *task; /* which task we are interested in */
+ int retval; /* return value of the call: 0=you can proceed, 1=need to wait for completion */
+} pfm_smp_ipi_arg_t;
/*
- * structure used to pass information between the interrupt handler
- * and the tasklet.
+ * perfmon command descriptions
*/
typedef struct {
- pid_t to_pid; /* which process to notify */
- pid_t from_pid; /* which process is source of overflow */
- int sig; /* with which signal */
- unsigned long bitvect; /* which counters have overflowed */
-} notification_info_t;
+ int (*cmd_func)(struct task_struct *task, pfm_context_t *ctx, void *arg, int count, struct pt_regs *regs);
+ int cmd_flags;
+ unsigned int cmd_narg;
+ size_t cmd_argsize;
+} pfm_cmd_desc_t;
+
+#define PFM_CMD_PID 0x1 /* command requires pid argument */
+#define PFM_CMD_ARG_READ 0x2 /* command must read argument(s) */
+#define PFM_CMD_ARG_WRITE 0x4 /* command must write argument(s) */
+#define PFM_CMD_CTX 0x8 /* command needs a perfmon context */
+#define PFM_CMD_NOCHK 0x10 /* command does not need to check task's state */
+
+#define PFM_CMD_IDX(cmd) (cmd)
+
+#define PFM_CMD_IS_VALID(cmd) ((PFM_CMD_IDX(cmd) >= 0) && (PFM_CMD_IDX(cmd) < PFM_CMD_COUNT) \
+ && pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_func != NULL)
+
+#define PFM_CMD_USE_PID(cmd) ((pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_flags & PFM_CMD_PID) != 0)
+#define PFM_CMD_READ_ARG(cmd) ((pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_flags & PFM_CMD_ARG_READ) != 0)
+#define PFM_CMD_WRITE_ARG(cmd) ((pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_flags & PFM_CMD_ARG_WRITE) != 0)
+#define PFM_CMD_USE_CTX(cmd) ((pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_flags & PFM_CMD_CTX) != 0)
+#define PFM_CMD_CHK(cmd) ((pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_flags & PFM_CMD_NOCHK) == 0)
+
+#define PFM_CMD_ARG_MANY -1 /* cannot be zero */
+#define PFM_CMD_NARG(cmd) (pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_narg)
+#define PFM_CMD_ARG_SIZE(cmd) (pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_argsize)
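The PFM_CMD_* macros above implement a table-driven dispatch: a command number indexes a descriptor that bundles the handler, flag bits, and argument shape. A minimal user-space sketch of the same pattern follows; the table contents and the `dummy_cmd` handler are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical sketch of the table-driven command dispatch
 * set up by the PFM_CMD_* macros; the table contents are made up. */
#define PFM_CMD_PID 0x1
#define PFM_CMD_CTX 0x8

typedef struct {
	int (*cmd_func)(void *arg, int count);
	int cmd_flags;
	unsigned int cmd_narg;
} cmd_desc_t;

static int dummy_cmd(void *arg, int count)
{
	(void)arg;
	return count;
}

/* entry 1 has a NULL cmd_func, i.e. that command number is unassigned */
static cmd_desc_t cmd_tab[] = {
	{ dummy_cmd, PFM_CMD_PID | PFM_CMD_CTX, 1 },
	{ NULL, 0, 0 },
};
#define CMD_COUNT ((int)(sizeof(cmd_tab)/sizeof(cmd_tab[0])))

/* mirrors PFM_CMD_IS_VALID(): in range and bound to a handler */
static int cmd_is_valid(int cmd)
{
	return cmd >= 0 && cmd < CMD_COUNT && cmd_tab[cmd].cmd_func != NULL;
}
```

Adding a new command then only requires a new table entry; the checks in the common syscall path never change.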
+typedef struct {
+ int debug; /* turn on/off debugging via syslog */
+ int fastctxsw; /* turn on/off fast (insecure) ctxsw */
+} pfm_sysctl_t;
typedef struct {
- unsigned long pfs_proc_sessions;
- unsigned long pfs_sys_session; /* can only be 0/1 */
- unsigned long pfs_dfl_dcr; /* XXX: hack */
- unsigned int pfs_pp;
-} pfm_session_t;
+ unsigned long pfm_spurious_ovfl_intr_count; /* keep track of spurious ovfl interrupts */
+ unsigned long pfm_ovfl_intr_count; /* keep track of ovfl interrupts */
+ unsigned long pfm_recorded_samples_count;
+ unsigned long pfm_restore_dbrs;
+ unsigned long pfm_ctxsw_reload_pmds;
+ unsigned long pfm_ctxsw_used_pmds;
+} pfm_stats_t;
+
+/*
+ * perfmon internal variables
+ */
+static pmu_config_t pmu_conf; /* PMU configuration */
+static int pfm_debug_mode; /* 0= nodebug, >0= debug output on */
+static pfm_session_t pfm_sessions; /* global sessions information */
+static struct proc_dir_entry *perfmon_dir; /* for debug only */
+static pfm_stats_t pfm_stats;
+
+/* sysctl() controls */
+static pfm_sysctl_t pfm_sysctl;
+
+static ctl_table pfm_ctl_table[]={
+ {1, "debug", &pfm_sysctl.debug, sizeof(int), 0666, NULL, &proc_dointvec, NULL,},
+ {1, "fastctxsw", &pfm_sysctl.fastctxsw, sizeof(int), 0600, NULL, &proc_dointvec, NULL,},
+ { 0, },
+};
+static ctl_table pfm_sysctl_dir[] = {
+ {1, "perfmon", NULL, 0, 0755, pfm_ctl_table, },
+ {0,},
+};
+static ctl_table pfm_sysctl_root[] = {
+ {1, "kernel", NULL, 0, 0755, pfm_sysctl_dir, },
+ {0,},
+};
+static struct ctl_table_header *pfm_sysctl_header;
+
+static unsigned long reset_pmcs[IA64_NUM_PMC_REGS]; /* contains PAL reset values for PMCS */
-struct {
+static void pfm_vm_close(struct vm_area_struct * area);
+
+static struct vm_operations_struct pfm_vm_ops={
+ close: pfm_vm_close
+};
+
+/*
+ * keep track of task owning the PMU per CPU.
+ */
+static struct {
struct task_struct *owner;
} ____cacheline_aligned pmu_owners[NR_CPUS];
-/*
- * helper macros
- */
-#define SET_PMU_OWNER(t) do { pmu_owners[smp_processor_id()].owner = (t); } while(0);
-#define PMU_OWNER() pmu_owners[smp_processor_id()].owner
+/*
+ * forward declarations
+ */
+static void ia64_reset_pmu(struct task_struct *);
#ifdef CONFIG_SMP
-#define PFM_CAN_DO_LAZY() (smp_num_cpus==1 && pfs_info.pfs_sys_session==0)
-#else
-#define PFM_CAN_DO_LAZY() (pfs_info.pfs_sys_session==0)
+static void pfm_fetch_regs(int cpu, struct task_struct *task, pfm_context_t *ctx);
#endif
-
static void pfm_lazy_save_regs (struct task_struct *ta);
-/* for debug only */
-static struct proc_dir_entry *perfmon_dir;
+#if defined(CONFIG_ITANIUM)
+#include "perfmon_itanium.h"
+#elif defined(CONFIG_MCKINLEY)
+#include "perfmon_mckinley.h"
+#else
+#include "perfmon_generic.h"
+#endif
+
+static inline unsigned long
+pfm_read_soft_counter(pfm_context_t *ctx, int i)
+{
+ return ctx->ctx_soft_pmds[i].val + (ia64_get_pmd(i) & pmu_conf.perf_ovfl_val);
+}
-/*
- * XXX: hack to indicate that a system wide monitoring session is active
- */
-static pfm_session_t pfs_info;
+static inline void
+pfm_write_soft_counter(pfm_context_t *ctx, int i, unsigned long val)
+{
+ ctx->ctx_soft_pmds[i].val = val & ~pmu_conf.perf_ovfl_val;
+ /*
+ * writing to the unimplemented part is ignored, so we do not need to
+ * mask off the top part
+ */
+ ia64_set_pmd(i, val & pmu_conf.perf_ovfl_val);
+}
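The two helpers above split a logical 64-bit counter between software and hardware: `perf_ovfl_val` masks the bits the PMD actually implements, and the rest lives in `ctx_soft_pmds[].val`. A stand-alone sketch, assuming a hypothetical 32-bit hardware counter and using a plain field in place of the real PMD register:

```c
#include <assert.h>

/* Illustration of the 64-bit "soft" counter split used by
 * pfm_read/write_soft_counter(). perf_ovfl_val acts as a mask of the bits
 * the hardware PMD implements; the 32-bit width here is hypothetical. */
static const unsigned long ovfl_mask = (1UL << 32) - 1;

struct soft_pmd {
	unsigned long val; /* software-maintained upper bits */
	unsigned long hw;  /* stands in for the real PMD register */
};

static void write_soft_counter(struct soft_pmd *p, unsigned long v)
{
	p->val = v & ~ovfl_mask; /* keep the upper bits in software */
	p->hw  = v &  ovfl_mask; /* lower bits go to the hardware counter */
}

static unsigned long read_soft_counter(const struct soft_pmd *p)
{
	return p->val + (p->hw & ovfl_mask);
}
```

Reads always recombine both halves, so the full 64-bit value round-trips even though the hardware only holds the low bits.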
/*
* finds the number of PM(C|D) registers given
@@ -324,10 +454,10 @@
* Generates a unique (per CPU) timestamp
*/
static inline unsigned long
-perfmon_get_stamp(void)
+pfm_get_stamp(void)
{
/*
- * XXX: maybe find something more efficient
+ * XXX: must find something more efficient
*/
return ia64_get_itc();
}
@@ -353,80 +483,184 @@
}
}
}
- DBprintk(("uv2kva(%lx-->%lx)\n", adr, ret));
+ DBprintk(("[%d] uv2kva(%lx-->%lx)\n", current->pid, adr, ret));
return ret;
}
-
/* Here we want the physical address of the memory.
* This is used when initializing the contents of the
* area and marking the pages as reserved.
*/
static inline unsigned long
-kvirt_to_pa(unsigned long adr)
+pfm_kvirt_to_pa(unsigned long adr)
{
__u64 pa = ia64_tpa(adr);
- DBprintk(("kv2pa(%lx-->%lx)\n", adr, pa));
+ //DBprintk(("kv2pa(%lx-->%lx)\n", adr, pa));
return pa;
}
static void *
-rvmalloc(unsigned long size)
+pfm_rvmalloc(unsigned long size)
{
void *mem;
unsigned long adr, page;
- /* XXX: may have to revisit this part because
- * vmalloc() does not necessarily return a page-aligned buffer.
- * This maybe a security problem when mapped at user level
- */
mem=vmalloc(size);
if (mem) {
+ //printk("perfmon: CPU%d pfm_rvmalloc(%ld)=%p\n", smp_processor_id(), size, mem);
memset(mem, 0, size); /* Clear the ram out, no junk to the user */
adr=(unsigned long) mem;
while (size > 0) {
- page = kvirt_to_pa(adr);
+ page = pfm_kvirt_to_pa(adr);
mem_map_reserve(virt_to_page(__va(page)));
- adr+=PAGE_SIZE;
- size-=PAGE_SIZE;
+ adr += PAGE_SIZE;
+ size -= PAGE_SIZE;
}
}
return mem;
}
static void
-rvfree(void *mem, unsigned long size)
+pfm_rvfree(void *mem, unsigned long size)
{
- unsigned long adr, page;
+ unsigned long adr, page = 0;
if (mem) {
adr=(unsigned long) mem;
while (size > 0) {
- page = kvirt_to_pa(adr);
+ page = pfm_kvirt_to_pa(adr);
mem_map_unreserve(virt_to_page(__va(page)));
adr+=PAGE_SIZE;
size-=PAGE_SIZE;
}
vfree(mem);
}
+ return;
+}
+
+/*
+ * This function gets called from mm/mmap.c:exit_mmap() only when there is a sampling buffer
+ * attached to the context AND the current task has a mapping for it, i.e., it is the original
+ * creator of the context.
+ *
+ * This function is used to remember the fact that the vma describing the sampling buffer
+ * has now been removed. It can only be called when no other tasks share the same mm context.
+ *
+ */
+static void
+pfm_vm_close(struct vm_area_struct *vma)
+{
+ pfm_smpl_buffer_desc_t *psb = (pfm_smpl_buffer_desc_t *)vma->vm_private_data;
+
+ if (psb == NULL) {
+ printk("perfmon: psb is null in [%d]\n", current->pid);
+ return;
+ }
+ /*
+ * Add PSB to list of buffers to free on release_thread() when no more users
+ *
+ * This call is safe because, once the count is zero, it cannot be modified anymore.
+ * The fact that there are no more users of the mm context does not mean that the
+ * sampling buffer is no longer being used outside of this task. In fact, it can still
+ * be accessed from within the kernel by another task (such as the monitored task).
+ *
+ * Therefore, we only move the psb into the list of buffers to free when we know
+ * nobody else is using it.
+ * The linked list is independent of the perfmon context, because in the case of
+ * multi-threaded processes, the last thread may not have been involved with
+ * monitoring; however, it will be the one removing the vma, and it should therefore
+ * also remove the sampling buffer. This buffer cannot be removed until the vma
+ * is removed.
+ *
+ * This function cannot remove the buffer from here, because exit_mmap() must first
+ * complete. Given that there is no other vma-related callback in the generic code,
+ * we have created our own, using the linked list of sampling buffers to free which
+ * is part of the thread structure. In release_thread() we check if the list is
+ * empty. If not, we call into perfmon to free the buffer and psb. That is the only
+ * way to ensure a safe deallocation of the sampling buffer, which works when
+ * the buffer is shared between distinct processes or with multi-threaded programs.
+ *
+ * We need to lock the psb because the refcnt test and flag manipulation must
+ * look like an atomic operation vis-a-vis pfm_context_exit()
+ */
+ LOCK_PSB(psb);
+
+ if (psb->psb_refcnt == 0) {
+
+ psb->psb_next = current->thread.pfm_smpl_buf_list;
+ current->thread.pfm_smpl_buf_list = psb;
+
+ DBprintk(("psb for [%d] smpl @%p size %ld inserted into list\n",
+ current->pid, psb->psb_hdr, psb->psb_size));
+ }
+ DBprintk(("psb vma flag cleared for [%d] smpl @%p size %ld\n",
+ current->pid, psb->psb_hdr, psb->psb_size));
+
+ /*
+ * indicate to pfm_context_exit() that the vma has been removed.
+ */
+ psb->psb_flags &= ~PFM_PSB_VMA;
+
+ UNLOCK_PSB(psb);
+}
+
+/*
+ * This function is called from pfm_destroy_context() and also from pfm_inherit()
+ * to explicitly remove the sampling buffer mapping from the user level address space.
+ */
+static int
+pfm_remove_smpl_mapping(struct task_struct *task)
+{
+ pfm_context_t *ctx = task->thread.pfm_context;
+ pfm_smpl_buffer_desc_t *psb;
+ int r;
+
+ /*
+ * some sanity checks first
+ */
+ if (ctx == NULL || task->mm == NULL || ctx->ctx_smpl_vaddr == 0 || ctx->ctx_psb == NULL) {
+ printk("perfmon: invalid context mm=%p\n", task->mm);
+ return -1;
+ }
+ psb = ctx->ctx_psb;
+
+ down_write(&task->mm->mmap_sem);
+
+ r = do_munmap(task->mm, ctx->ctx_smpl_vaddr, psb->psb_size);
+
+ up_write(&task->mm->mmap_sem);
+ if (r != 0) {
+ printk("perfmon: pid %d unable to unmap sampling buffer @0x%lx size=%ld\n",
+ task->pid, ctx->ctx_smpl_vaddr, psb->psb_size);
+ }
+ DBprintk(("[%d] do_munmap(0x%lx, %ld)=%d\n",
+ task->pid, ctx->ctx_smpl_vaddr, psb->psb_size, r));
+
+ /*
+ * make sure we suppress all traces of this buffer
+ * (important for pfm_inherit)
+ */
+ ctx->ctx_smpl_vaddr = 0;
+
+ return 0;
}
static pfm_context_t *
pfm_context_alloc(void)
{
- pfm_context_t *pfc;
+ pfm_context_t *ctx;
/* allocate context descriptor */
- pfc = vmalloc(sizeof(*pfc));
- if (pfc) memset(pfc, 0, sizeof(*pfc));
-
- return pfc;
+ ctx = kmalloc(sizeof(pfm_context_t), GFP_KERNEL);
+ if (ctx) memset(ctx, 0, sizeof(pfm_context_t));
+
+ return ctx;
}
static void
-pfm_context_free(pfm_context_t *pfc)
+pfm_context_free(pfm_context_t *ctx)
{
- if (pfc) vfree(pfc);
+ if (ctx) kfree(ctx);
}
static int
@@ -434,11 +668,13 @@
{
unsigned long page;
+ DBprintk(("CPU%d buf=0x%lx addr=0x%lx size=%ld\n", smp_processor_id(), buf, addr, size));
+
while (size > 0) {
- page = kvirt_to_pa(buf);
+ page = pfm_kvirt_to_pa(buf);
if (remap_page_range(addr, page, PAGE_SIZE, PAGE_SHARED)) return -ENOMEM;
-
+
addr += PAGE_SIZE;
buf += PAGE_SIZE;
size -= PAGE_SIZE;
@@ -458,7 +694,7 @@
for (i=0; i < size; i++, which++) res += hweight64(*which);
- DBprintk((" res=%ld\n", res));
+ DBprintk(("weight=%ld\n", res));
return res;
}
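The hunk above sizes a sampling entry by counting set bits in the requested PMD bitmask with `hweight64()`: one u64 slot per recorded register, plus a fixed header. A stand-alone sketch of that calculation, using a portable popcount in place of the kernel helper (the header size parameter is made up for illustration):

```c
#include <assert.h>

/* Sketch of deriving a sampling-entry size from a PMD bitmask: one u64
 * slot per bit set, counted with a popcount standing in for the kernel's
 * hweight64(). The header size is a hypothetical parameter. */
static int popcount64(unsigned long long v)
{
	int n = 0;
	for (; v; v &= v - 1) /* clear the lowest set bit each round */
		n++;
	return n;
}

static unsigned long long entry_size(unsigned long long pmd_mask,
				     unsigned long long hdr_size)
{
	return hdr_size
		+ (unsigned long long)popcount64(pmd_mask)
		* sizeof(unsigned long long);
}
```

Note a zero mask is legal, matching the comment in the patch that only the per-entry header is recorded in that case.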
@@ -467,15 +703,16 @@
* Allocates the sampling buffer and remaps it into caller's address space
*/
static int
-pfm_smpl_buffer_alloc(pfm_context_t *ctx, unsigned long which_pmds, unsigned long entries, void **user_addr)
+pfm_smpl_buffer_alloc(pfm_context_t *ctx, unsigned long *which_pmds, unsigned long entries,
+ void **user_vaddr)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
- unsigned long addr, size, regcount;
+ struct vm_area_struct *vma = NULL;
+ unsigned long size, regcount;
void *smpl_buf;
pfm_smpl_buffer_desc_t *psb;
- regcount = pfm_smpl_entry_size(&which_pmds, 1);
+ regcount = pfm_smpl_entry_size(which_pmds, 1);
/* note that regcount might be 0, in this case only the header for each
* entry will be recorded.
@@ -488,133 +725,207 @@
+ entries * (sizeof(perfmon_smpl_entry_t) + regcount*sizeof(u64)));
/*
* check requested size to avoid Denial-of-service attacks
- * XXX: may have to refine this test
+ * XXX: may have to refine this test
+ * Check against address space limit.
+ *
+ * if ((mm->total_vm << PAGE_SHIFT) + len> current->rlim[RLIMIT_AS].rlim_cur)
+ * return -ENOMEM;
*/
if (size > current->rlim[RLIMIT_MEMLOCK].rlim_cur) return -EAGAIN;
- /* find some free area in address space */
- addr = get_unmapped_area(NULL, 0, size, 0, MAP_PRIVATE);
- if (!addr) goto no_addr;
+ /*
+ * We do the easy to undo allocations first.
+ *
+ * pfm_rvmalloc(), clears the buffer, so there is no leak
+ */
+ smpl_buf = pfm_rvmalloc(size);
+ if (smpl_buf == NULL) {
+ DBprintk(("Can't allocate sampling buffer\n"));
+ return -ENOMEM;
+ }
+
+ DBprintk(("smpl_buf @%p\n", smpl_buf));
- DBprintk((" entries=%ld aligned size=%ld, unmapped @0x%lx\n", entries, size, addr));
+ /* allocate sampling buffer descriptor now */
+ psb = kmalloc(sizeof(*psb), GFP_KERNEL);
+ if (psb == NULL) {
+ DBprintk(("Can't allocate sampling buffer descriptor\n"));
+ pfm_rvfree(smpl_buf, size);
+ return -ENOMEM;
+ }
/* allocate vma */
vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
- if (!vma) goto no_vma;
-
- /* XXX: see rvmalloc() for page alignment problem */
- smpl_buf = rvmalloc(size);
- if (smpl_buf == NULL) goto no_buffer;
-
- DBprintk((" smpl_buf @%p\n", smpl_buf));
-
- if (pfm_remap_buffer((unsigned long)smpl_buf, addr, size)) goto cant_remap;
-
- /* allocate sampling buffer descriptor now */
- psb = vmalloc(sizeof(*psb));
- if (psb == NULL) goto no_buffer_desc;
+ if (!vma) {
+ DBprintk(("Cannot allocate vma\n"));
+ goto error;
+ }
+ /*
+ * partially initialize the vma for the sampling buffer
+ */
+ vma->vm_mm = mm;
+ vma->vm_flags = VM_READ | VM_MAYREAD | VM_RESERVED;
+ vma->vm_page_prot = PAGE_READONLY; /* XXX may need to change */
+ vma->vm_ops = &pfm_vm_ops; /* necessary to get the close() callback */
+ vma->vm_pgoff = 0;
+ vma->vm_file = NULL;
+ vma->vm_raend = 0;
+ vma->vm_private_data = psb; /* information needed by the pfm_vm_close() function */
- /* start with something clean */
- memset(smpl_buf, 0x0, size);
+ /*
+ * Now we have everything we need and we can initialize
+ * and connect all the data structures
+ */
psb->psb_hdr = smpl_buf;
- psb->psb_addr = (char *)smpl_buf+sizeof(perfmon_smpl_hdr_t); /* first entry */
+ psb->psb_addr = ((char *)smpl_buf)+sizeof(perfmon_smpl_hdr_t); /* first entry */
psb->psb_size = size; /* aligned size */
psb->psb_index = 0;
psb->psb_entries = entries;
+ psb->psb_flags = PFM_PSB_VMA; /* remember that there is a vma describing the buffer */
+ psb->psb_refcnt = 1;
- atomic_set(&psb->psb_refcnt, 1);
+ spin_lock_init(&psb->psb_lock);
+ /*
+ * XXX: will need to do cacheline alignment to avoid false sharing in SMP mode and
+ * multitask monitoring.
+ */
psb->psb_entry_size = sizeof(perfmon_smpl_entry_t) + regcount*sizeof(u64);
- DBprintk((" psb @%p entry_size=%ld hdr=%p addr=%p\n", (void *)psb,psb->psb_entry_size, (void *)psb->psb_hdr, (void *)psb->psb_addr));
-
- /* initialize some of the fields of header */
- psb->psb_hdr->hdr_version = PFM_SMPL_HDR_VERSION;
- psb->psb_hdr->hdr_entry_size = sizeof(perfmon_smpl_entry_t)+regcount*sizeof(u64);
- psb->psb_hdr->hdr_pmds = which_pmds;
-
- /* store which PMDS to record */
- ctx->ctx_smpl_regs = which_pmds;
+ DBprintk(("psb @%p entry_size=%ld hdr=%p addr=%p\n",
+ (void *)psb,psb->psb_entry_size, (void *)psb->psb_hdr,
+ (void *)psb->psb_addr));
- /* link to perfmon context */
- ctx->ctx_smpl_buf = psb;
+ /* initialize some of the fields of user visible buffer header */
+ psb->psb_hdr->hdr_version = PFM_SMPL_VERSION;
+ psb->psb_hdr->hdr_entry_size = psb->psb_entry_size;
+ psb->psb_hdr->hdr_pmds[0] = which_pmds[0];
/*
- * initialize the vma for the sampling buffer
+ * Let's do the difficult operations next.
+ *
+ * now we atomically find some area in the address space and
+ * remap the buffer in it.
*/
- vma->vm_mm = mm;
- vma->vm_start = addr;
- vma->vm_end = addr + size;
- vma->vm_flags = VM_READ|VM_MAYREAD;
- vma->vm_page_prot = PAGE_READONLY; /* XXX may need to change */
- vma->vm_ops = NULL;
- vma->vm_pgoff = 0;
- vma->vm_file = NULL;
- vma->vm_raend = 0;
+ down_write(&current->mm->mmap_sem);
+
+
+ /* find some free area in address space, must have mmap sem held */
+ vma->vm_start = get_unmapped_area(NULL, 0, size, 0, MAP_PRIVATE|MAP_ANONYMOUS);
+ if (vma->vm_start == 0UL) {
+ DBprintk(("Cannot find unmapped area for size %ld\n", size));
+ up_write(&current->mm->mmap_sem);
+ goto error;
+ }
+ vma->vm_end = vma->vm_start + size;
- vma->vm_private_data = ctx; /* link to pfm_context(not yet used) */
+ DBprintk(("entries=%ld aligned size=%ld, unmapped @0x%lx\n", entries, size, vma->vm_start));
+
+ /* can only be applied to current, need to have the mm semaphore held when called */
+ if (pfm_remap_buffer((unsigned long)smpl_buf, vma->vm_start, size)) {
+ DBprintk(("Can't remap buffer\n"));
+ up_write(&current->mm->mmap_sem);
+ goto error;
+ }
/*
- * now insert the vma in the vm list for the process
+ * now insert the vma in the vm list for the process, must be
+ * done with mmap lock held
*/
insert_vm_struct(mm, vma);
mm->total_vm += size >> PAGE_SHIFT;
+ up_write(&current->mm->mmap_sem);
+
+ /* store which PMDS to record */
+ ctx->ctx_smpl_regs[0] = which_pmds[0];
+
+
+ /* link to perfmon context */
+ ctx->ctx_psb = psb;
+
/*
- * that's the address returned to the user
+ * keep track of user level virtual address
*/
- *user_addr = (void *)addr;
+ ctx->ctx_smpl_vaddr = *(unsigned long *)user_vaddr = vma->vm_start;
return 0;
- /* outlined error handling */
-no_addr:
- DBprintk(("Cannot find unmapped area for size %ld\n", size));
- return -ENOMEM;
-no_vma:
- DBprintk(("Cannot allocate vma\n"));
- return -ENOMEM;
-cant_remap:
- DBprintk(("Can't remap buffer\n"));
- rvfree(smpl_buf, size);
-no_buffer:
- DBprintk(("Can't allocate sampling buffer\n"));
- kmem_cache_free(vm_area_cachep, vma);
- return -ENOMEM;
-no_buffer_desc:
- DBprintk(("Can't allocate sampling buffer descriptor\n"));
- kmem_cache_free(vm_area_cachep, vma);
- rvfree(smpl_buf, size);
+error:
+ pfm_rvfree(smpl_buf, size);
+ kfree(psb);
return -ENOMEM;
}
+/*
+ * XXX: do something better here
+ */
+static int
+pfm_bad_permissions(struct task_struct *task)
+{
+ /* stolen from bad_signal() */
+ return (current->session != task->session)
+ && (current->euid ^ task->suid) && (current->euid ^ task->uid)
+ && (current->uid ^ task->suid) && (current->uid ^ task->uid);
+}
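The check above (borrowed from `bad_signal()`) relies on a compact xor idiom: each `(a ^ b)` term is zero exactly when the two ids match, so the whole `&&` chain evaluates false (access allowed) as soon as the sessions match or any uid pair matches. A stand-alone version, with a hypothetical `cred` struct reduced from `task_struct`:

```c
#include <assert.h>

/* Stand-alone version of the xor-based credential check: every (a ^ b)
 * term is zero when the ids match, so a single matching pair short-circuits
 * the chain to 0 (permitted). The cred struct is a made-up reduction of
 * task_struct for illustration. */
struct cred {
	int session, uid, euid, suid;
};

static int bad_permissions(const struct cred *cur, const struct cred *tgt)
{
	return (cur->session != tgt->session)
	    && (cur->euid ^ tgt->suid) && (cur->euid ^ tgt->uid)
	    && (cur->uid  ^ tgt->suid) && (cur->uid  ^ tgt->uid);
}
```

The xor spelling is equivalent to `!=` here; it is kept only because the original kernel code uses it.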
+
+
static int
-pfx_is_sane(pfreq_context_t *pfx)
+pfx_is_sane(struct task_struct *task, pfarg_context_t *pfx)
{
int ctx_flags;
+ int cpu;
/* valid signal */
- //if (pfx->notify_sig < 1 || pfx->notify_sig >= _NSIG) return -EINVAL;
- if (pfx->notify_sig !=0 && pfx->notify_sig != SIGPROF) return -EINVAL;
/* cannot send to process 1, 0 means do not notify */
- if (pfx->notify_pid < 0 || pfx->notify_pid == 1) return -EINVAL;
-
- ctx_flags = pfx->flags;
+ if (pfx->ctx_notify_pid == 1) {
+ DBprintk(("invalid notify_pid %d\n", pfx->ctx_notify_pid));
+ return -EINVAL;
+ }
+ ctx_flags = pfx->ctx_flags;
if (ctx_flags & PFM_FL_SYSTEM_WIDE) {
-#ifdef CONFIG_SMP
- if (smp_num_cpus > 1) {
- printk("perfmon: system wide monitoring on SMP not yet supported\n");
+ DBprintk(("cpu_mask=0x%lx\n", pfx->ctx_cpu_mask));
+ /*
+ * cannot block in this mode
+ */
+ if (ctx_flags & PFM_FL_NOTIFY_BLOCK) {
+ DBprintk(("cannot use blocking mode when in system wide monitoring\n"));
return -EINVAL;
}
-#endif
- if ((ctx_flags & PFM_FL_SMPL_OVFL_NOBLOCK) == 0) {
- printk("perfmon: system wide monitoring cannot use blocking notification mode\n");
+ /*
+ * must only have one bit set in the CPU mask
+ */
+ if (hweight64(pfx->ctx_cpu_mask) != 1UL) {
+ DBprintk(("invalid CPU mask specified\n"));
+ return -EINVAL;
+ }
+ /*
+ * and it must be a valid CPU
+ */
+ cpu = ffs(pfx->ctx_cpu_mask);
+ if (cpu > smp_num_cpus) {
+ DBprintk(("CPU%d is not online\n", cpu));
+ return -EINVAL;
+ }
+ /*
+ * check for pre-existing pinning, if conflicting reject
+ */
+ if (task->cpus_allowed != ~0UL && (task->cpus_allowed & (1UL << cpu)) == 0) {
+ DBprintk(("[%d] pinned on incompatible mask 0x%lx for CPU%d\n", task->pid,
+ task->cpus_allowed, cpu));
return -EINVAL;
}
+
+ } else {
+ /*
+ * must provide a target for the signal in blocking mode even when
+ * no counter is configured with PFM_FL_REG_OVFL_NOTIFY
+ */
+ if ((ctx_flags & PFM_FL_NOTIFY_BLOCK) && pfx->ctx_notify_pid == 0) return -EINVAL;
}
/* probably more to add here */
@@ -622,68 +933,97 @@
}
static int
-pfm_context_create(int flags, perfmon_req_t *req)
+pfm_create_context(struct task_struct *task, pfm_context_t *ctx, void *req, int count,
+ struct pt_regs *regs)
{
- pfm_context_t *ctx;
- struct task_struct *task = NULL;
- perfmon_req_t tmp;
+ pfarg_context_t tmp;
void *uaddr = NULL;
- int ret;
+ int ret, cpu = 0;
int ctx_flags;
- pid_t pid;
+ pid_t notify_pid;
- /* to go away */
- if (flags) {
- printk("perfmon: use context flags instead of perfmon() flags. Obsoleted API\n");
- }
+ /* a context has already been defined */
+ if (ctx) return -EBUSY;
+
+ /*
+ * not yet supported
+ */
+ if (task != current) return -EINVAL;
if (copy_from_user(&tmp, req, sizeof(tmp))) return -EFAULT;
- ret = pfx_is_sane(&tmp.pfr_ctx);
+ ret = pfx_is_sane(task, &tmp);
if (ret < 0) return ret;
- ctx_flags = tmp.pfr_ctx.flags;
+ ctx_flags = tmp.ctx_flags;
- if (ctx_flags & PFM_FL_SYSTEM_WIDE) {
+ ret = -EBUSY;
+
+ LOCK_PFS();
+
+ if (ctx_flags & PFM_FL_SYSTEM_WIDE) {
+
+ /* at this point, we know there is at least one bit set */
+ cpu = ffs(tmp.ctx_cpu_mask) - 1;
+
+ DBprintk(("requesting CPU%d currently on CPU%d\n",cpu, smp_processor_id()));
+
+ if (pfm_sessions.pfs_task_sessions > 0) {
+ DBprintk(("system wide not possible, task_sessions=%ld\n", pfm_sessions.pfs_task_sessions));
+ goto abort;
+ }
+
+ if (pfm_sessions.pfs_sys_session[cpu]) {
+ DBprintk(("system wide not possible, conflicting session [%d] on CPU%d\n",pfm_sessions.pfs_sys_session[cpu]->pid, cpu));
+ goto abort;
+ }
+ pfm_sessions.pfs_sys_session[cpu] = task;
/*
- * XXX: This is not AT ALL SMP safe
+ * count the number of system wide sessions
*/
- if (pfs_info.pfs_proc_sessions > 0) return -EBUSY;
- if (pfs_info.pfs_sys_session > 0) return -EBUSY;
-
- pfs_info.pfs_sys_session = 1;
+ pfm_sessions.pfs_sys_sessions++;
- } else if (pfs_info.pfs_sys_session >0) {
+ } else if (pfm_sessions.pfs_sys_sessions == 0) {
+ pfm_sessions.pfs_task_sessions++;
+ } else {
/* no per-process monitoring while there is a system wide session */
- return -EBUSY;
- } else
- pfs_info.pfs_proc_sessions++;
+ goto abort;
+ }
+
+ UNLOCK_PFS();
+
+ ret = -ENOMEM;
ctx = pfm_context_alloc();
if (!ctx) goto error;
- /* record the creator (debug only) */
- ctx->ctx_creator = current;
+ /* record the creator (important for inheritance) */
+ ctx->ctx_owner = current;
- pid = tmp.pfr_ctx.notify_pid;
+ notify_pid = tmp.ctx_notify_pid;
- spin_lock_init(&ctx->ctx_notify_lock);
+ spin_lock_init(&ctx->ctx_lock);
+
+ if (notify_pid == current->pid) {
- if (pid == current->pid) {
ctx->ctx_notify_task = task = current;
current->thread.pfm_context = ctx;
- atomic_set(¤t->thread.pfm_notifiers_check, 1);
+ } else if (notify_pid!=0) {
+ struct task_struct *notify_task;
- } else if (pid!=0) {
read_lock(&tasklist_lock);
- task = find_task_by_pid(pid);
- if (task) {
+ notify_task = find_task_by_pid(notify_pid);
+
+ if (notify_task) {
+
+ ret = -EPERM;
+
/*
- * record who to notify
- */
- ctx->ctx_notify_task = task;
+ * check if we can send this task a signal
+ */
+ if (pfm_bad_permissions(notify_task)) goto buffer_error;
/*
* make visible
@@ -702,7 +1042,9 @@
* task has been detached from the tasklist otherwise you are
* exposed to race conditions.
*/
- atomic_add(1, &task->thread.pfm_notifiers_check);
+ atomic_add(1, &ctx->ctx_notify_task->thread.pfm_notifiers_check);
+
+ ctx->ctx_notify_task = notify_task;
}
read_unlock(&tasklist_lock);
}
@@ -710,71 +1052,71 @@
/*
* notification process does not exist
*/
- if (pid != 0 && task == NULL) {
+ if (notify_pid != 0 && ctx->ctx_notify_task == NULL) {
ret = -EINVAL;
goto buffer_error;
}
- ctx->ctx_notify_sig = SIGPROF; /* siginfo imposes a fixed signal */
-
- if (tmp.pfr_ctx.smpl_entries) {
- DBprintk((" sampling entries=%ld\n",tmp.pfr_ctx.smpl_entries));
+ if (tmp.ctx_smpl_entries) {
+ DBprintk(("sampling entries=%ld\n",tmp.ctx_smpl_entries));
- ret = pfm_smpl_buffer_alloc(ctx, tmp.pfr_ctx.smpl_regs,
- tmp.pfr_ctx.smpl_entries, &uaddr);
+ ret = pfm_smpl_buffer_alloc(ctx, tmp.ctx_smpl_regs,
+ tmp.ctx_smpl_entries, &uaddr);
if (ret<0) goto buffer_error;
- tmp.pfr_ctx.smpl_vaddr = uaddr;
+ tmp.ctx_smpl_vaddr = uaddr;
}
/* initialization of context's flags */
- ctx->ctx_fl_inherit = ctx_flags & PFM_FL_INHERIT_MASK;
- ctx->ctx_fl_noblock = (ctx_flags & PFM_FL_SMPL_OVFL_NOBLOCK) ? 1 : 0;
- ctx->ctx_fl_system = (ctx_flags & PFM_FL_SYSTEM_WIDE) ? 1: 0;
- ctx->ctx_fl_exclintr = (ctx_flags & PFM_FL_EXCL_INTR) ? 1: 0;
- ctx->ctx_fl_frozen = 0;
-
- /*
- * Keep track of the pmds we want to sample
- * XXX: may be we don't need to save/restore the DEAR/IEAR pmds
- * but we do need the BTB for sure. This is because of a hardware
- * buffer of 1 only for non-BTB pmds.
- */
- ctx->ctx_used_pmds[0] = tmp.pfr_ctx.smpl_regs;
- ctx->ctx_used_pmcs[0] = 1; /* always save/restore PMC[0] */
+ ctx->ctx_fl_inherit = ctx_flags & PFM_FL_INHERIT_MASK;
+ ctx->ctx_fl_block = (ctx_flags & PFM_FL_NOTIFY_BLOCK) ? 1 : 0;
+ ctx->ctx_fl_system = (ctx_flags & PFM_FL_SYSTEM_WIDE) ? 1: 0;
+ ctx->ctx_fl_frozen = 0;
+ /*
+ * setting this flag to 0 here means that the creator, or the task to which the
+ * context is being attached, is granted access. Given that a context can only
+ * be created for the calling process, this in effect only allows the creator
+ * to access the context. See pfm_protect() for more.
+ */
+ ctx->ctx_fl_protected = 0;
+
+ /* for system wide mode only (only 1 bit set) */
+ ctx->ctx_cpu = cpu;
+
+ atomic_set(&ctx->ctx_last_cpu,-1); /* SMP only, means no CPU */
+
+ /* may be redundant with memset() but at least it's easier to remember */
+ atomic_set(&ctx->ctx_saving_in_progress, 0);
+ atomic_set(&ctx->ctx_is_busy, 0);
sema_init(&ctx->ctx_restart_sem, 0); /* init this semaphore to locked */
-
if (copy_to_user(req, &tmp, sizeof(tmp))) {
ret = -EFAULT;
goto buffer_error;
}
- DBprintk((" context=%p, pid=%d notify_sig %d notify_task=%p\n",(void *)ctx, current->pid, ctx->ctx_notify_sig, ctx->ctx_notify_task));
- DBprintk((" context=%p, pid=%d flags=0x%x inherit=%d noblock=%d system=%d\n",(void *)ctx, current->pid, ctx_flags, ctx->ctx_fl_inherit, ctx->ctx_fl_noblock, ctx->ctx_fl_system));
+ DBprintk(("context=%p, pid=%d notify_task=%p\n",
+ (void *)ctx, task->pid, ctx->ctx_notify_task));
+
+ DBprintk(("context=%p, pid=%d flags=0x%x inherit=%d block=%d system=%d\n",
+ (void *)ctx, task->pid, ctx_flags, ctx->ctx_fl_inherit,
+ ctx->ctx_fl_block, ctx->ctx_fl_system));
/*
* when no notification is required, we can make this visible at the last moment
*/
- if (pid == 0) current->thread.pfm_context = ctx;
-
+ if (notify_pid == 0) task->thread.pfm_context = ctx;
/*
- * by default, we always include interrupts for system wide
- * DCR.pp is set by default to zero by kernel in cpu_init()
+ * pin task to CPU and force reschedule on exit to ensure
+ * that when back to user level the task runs on the designated
+ * CPU.
*/
if (ctx->ctx_fl_system) {
- if (ctx->ctx_fl_exclintr == 0) {
- unsigned long dcr = ia64_get_dcr();
-
- ia64_set_dcr(dcr|IA64_DCR_PP);
- /*
- * keep track of the kernel default value
- */
- pfs_info.pfs_dfl_dcr = dcr;
-
- DBprintk((" dcr.pp is set\n"));
- }
- }
+ ctx->ctx_saved_cpus_allowed = task->cpus_allowed;
+ task->cpus_allowed = 1UL << cpu;
+ task->need_resched = 1;
+ DBprintk(("[%d] rescheduled allowed=0x%lx\n", task->pid,task->cpus_allowed));
+ }
return 0;
@@ -784,225 +1126,514 @@
/*
* undo session reservation
*/
+ LOCK_PFS();
+
if (ctx_flags & PFM_FL_SYSTEM_WIDE) {
- pfs_info.pfs_sys_session = 0;
+ pfm_sessions.pfs_sys_session[cpu] = NULL;
+ pfm_sessions.pfs_sys_sessions--;
} else {
- pfs_info.pfs_proc_sessions--;
+ pfm_sessions.pfs_task_sessions--;
}
+abort:
+ UNLOCK_PFS();
+
return ret;
}
static void
-pfm_reset_regs(pfm_context_t *ctx)
+pfm_reset_regs(pfm_context_t *ctx, unsigned long *ovfl_regs, int flag)
{
- unsigned long mask = ctx->ctx_ovfl_regs;
- int i, cnum;
+ unsigned long mask = ovfl_regs[0];
+ unsigned long reset_others = 0UL;
+ unsigned long val;
+ int i;
+
+ DBprintk(("masks=0x%lx\n", mask));
- DBprintk((" ovfl_regs=0x%lx\n", mask));
/*
* now restore reset value on sampling overflowed counters
*/
- for(i=0, cnum=PMU_FIRST_COUNTER; i < pmu_conf.max_counters; i++, cnum++, mask >>= 1) {
+ mask >>= PMU_FIRST_COUNTER;
+ for(i = PMU_FIRST_COUNTER; mask; i++, mask >>= 1) {
if (mask & 0x1) {
- DBprintk((" reseting PMD[%d]=%lx\n", cnum, ctx->ctx_pmds[i].smpl_rval & pmu_conf.perf_ovfl_val));
+ val = flag == PFM_RELOAD_LONG_RESET ?
+ ctx->ctx_soft_pmds[i].long_reset:
+ ctx->ctx_soft_pmds[i].short_reset;
+
+ reset_others |= ctx->ctx_soft_pmds[i].reset_pmds[0];
+
+ DBprintk(("[%d] %s reset soft_pmd[%d]=%lx\n",
+ current->pid,
+ flag == PFM_RELOAD_LONG_RESET ? "long" : "short", i, val));
/* upper part is ignored on rval */
- ia64_set_pmd(cnum, ctx->ctx_pmds[i].smpl_rval);
+ pfm_write_soft_counter(ctx, i, val);
+ }
+ }
- /*
- * we must reset BTB index (clears pmd16.full to make
- * sure we do not report the same branches twice.
- * The non-blocking case in handled in update_counters()
- */
- if (cnum == ctx->ctx_btb_counter) {
- DBprintk(("reseting PMD16\n"));
- ia64_set_pmd(16, 0);
- }
+ /*
+ * Now take care of resetting the other registers
+ */
+ for(i = 0; reset_others; i++, reset_others >>= 1) {
+
+ if ((reset_others & 0x1) == 0) continue;
+
+ val = flag == PFM_RELOAD_LONG_RESET ?
+ ctx->ctx_soft_pmds[i].long_reset:
+ ctx->ctx_soft_pmds[i].short_reset;
+
+ if (PMD_IS_COUNTING(i)) {
+ pfm_write_soft_counter(ctx, i, val);
+ } else {
+ ia64_set_pmd(i, val);
}
+
+ DBprintk(("[%d] %s reset_others pmd[%d]=%lx\n",
+ current->pid,
+ flag == PFM_RELOAD_LONG_RESET ? "long" : "short", i, val));
}
/* just in case ! */
- ctx->ctx_ovfl_regs = 0;
+ ctx->ctx_ovfl_regs[0] = 0UL;
}
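The rewritten `pfm_reset_regs()` above gives each soft pmd two preloaded reset values and lets the `flag` argument pick between them (`PFM_RELOAD_LONG_RESET` versus the short reset used on overflow-driven restarts). A hypothetical reduction of just that selection step:

```c
#include <assert.h>

/* Hypothetical reduction of the long/short reset selection performed by
 * pfm_reset_regs(): each soft pmd carries two preloaded reset values and
 * the flag decides which one the counter restarts from. */
#define PFM_RELOAD_LONG_RESET  1
#define PFM_RELOAD_SHORT_RESET 2

struct reset_pmd {
	unsigned long long_reset;  /* value for an explicit user restart */
	unsigned long short_reset; /* value for an overflow-driven restart */
};

static unsigned long pick_reset(const struct reset_pmd *p, int flag)
{
	return flag == PFM_RELOAD_LONG_RESET ? p->long_reset
					     : p->short_reset;
}
```

In the real function the chosen value is then written back through `pfm_write_soft_counter()` for counting registers, or `ia64_set_pmd()` otherwise.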
static int
-pfm_write_pmcs(struct task_struct *ta, perfmon_req_t *req, int count)
+pfm_write_pmcs(struct task_struct *task, pfm_context_t *ctx, void *arg, int count, struct pt_regs *regs)
{
- struct thread_struct *th = &ta->thread;
- pfm_context_t *ctx = th->pfm_context;
- perfmon_req_t tmp;
- unsigned long cnum;
+ struct thread_struct *th = &task->thread;
+ pfarg_reg_t tmp, *req = (pfarg_reg_t *)arg;
+ unsigned int cnum;
int i;
+ int ret = 0, reg_retval = 0;
+
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
+
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
/* XXX: ctx locking may be required here */
for (i = 0; i < count; i++, req++) {
+
if (copy_from_user(&tmp, req, sizeof(tmp))) return -EFAULT;
- cnum = tmp.pfr_reg.reg_num;
+ cnum = tmp.reg_num;
- /* XXX needs to check validity of the data maybe */
- if (!PMC_IS_IMPL(cnum)) {
- DBprintk((" invalid pmc[%ld]\n", cnum));
- return -EINVAL;
+ /*
+ * we reject all unimplemented PMCs as well
+ * as attempts to modify PMC[0-3] which are used
+ * as status registers by the PMU
+ */
+ if (!PMC_IS_IMPL(cnum) || cnum < 4) {
+ DBprintk(("pmc[%u] is unimplemented or invalid\n", cnum));
+ ret = -EINVAL;
+ goto abort_mission;
+ }
+ /*
+ * A PMC used to configure monitors must be:
+ * - system-wide session: privileged monitor
+ * - per-task : user monitor
+ * any other configuration is rejected.
+ */
+ if (PMC_IS_MONITOR(cnum) || PMC_IS_COUNTING(cnum)) {
+ DBprintk(("pmc[%u].pm=%ld\n", cnum, PMC_PM(cnum, tmp.reg_value)));
+
+ if (ctx->ctx_fl_system ^ PMC_PM(cnum, tmp.reg_value)) {
+ DBprintk(("pmc_pm=%ld fl_system=%d\n", PMC_PM(cnum, tmp.reg_value), ctx->ctx_fl_system));
+ ret = -EINVAL;
+ goto abort_mission;
+ }
}
- if (PMC_IS_COUNTER(cnum)) {
+ if (PMC_IS_COUNTING(cnum)) {
+ pfm_monitor_t *p = (pfm_monitor_t *)&tmp.reg_value;
+ /*
+ * enforce generation of overflow interrupt. Necessary on all
+ * CPUs.
+ */
+ p->pmc_oi = 1;
+
+ if (tmp.reg_flags & PFM_REGFL_OVFL_NOTIFY) {
+ /*
+ * must have a target for the signal
+ */
+ if (ctx->ctx_notify_task == NULL) {
+ DBprintk(("no notify_task && PFM_REGFL_OVFL_NOTIFY\n"));
+ ret = -EINVAL;
+ goto abort_mission;
+ }
+ ctx->ctx_soft_pmds[cnum].flags |= PFM_REGFL_OVFL_NOTIFY;
+ }
/*
- * we keep track of EARS/BTB to speed up sampling later
+ * copy reset vector
*/
- if (PMC_IS_DEAR(&tmp.pfr_reg.reg_value)) {
- ctx->ctx_dear_counter = cnum;
- } else if (PMC_IS_IEAR(&tmp.pfr_reg.reg_value)) {
- ctx->ctx_iear_counter = cnum;
- } else if (PMC_IS_BTB(&tmp.pfr_reg.reg_value)) {
- ctx->ctx_btb_counter = cnum;
- }
-#if 0
- if (tmp.pfr_reg.reg_flags & PFM_REGFL_OVFL_NOTIFY)
- ctx->ctx_pmds[cnum - PMU_FIRST_COUNTER].flags |= PFM_REGFL_OVFL_NOTIFY;
-#endif
+ ctx->ctx_soft_pmds[cnum].reset_pmds[0] = tmp.reg_reset_pmds[0];
+ ctx->ctx_soft_pmds[cnum].reset_pmds[1] = tmp.reg_reset_pmds[1];
+ ctx->ctx_soft_pmds[cnum].reset_pmds[2] = tmp.reg_reset_pmds[2];
+ ctx->ctx_soft_pmds[cnum].reset_pmds[3] = tmp.reg_reset_pmds[3];
}
- /* keep track of what we use */
- CTX_USED_PMC(ctx, cnum);
- ia64_set_pmc(cnum, tmp.pfr_reg.reg_value);
+ /*
+ * execute write checker, if any
+ */
+ if (PMC_WR_FUNC(cnum)) ret = PMC_WR_FUNC(cnum)(task, cnum, &tmp.reg_value);
+abort_mission:
+ if (ret == -EINVAL) reg_retval = PFM_REG_RETFL_EINVAL;
- DBprintk((" setting PMC[%ld]=0x%lx flags=0x%x used_pmcs=0%lx\n", cnum, tmp.pfr_reg.reg_value, ctx->ctx_pmds[cnum - PMU_FIRST_COUNTER].flags, ctx->ctx_used_pmcs[0]));
+ PFM_REG_RETFLAG_SET(tmp.reg_flags, reg_retval);
- }
- /*
- * we have to set this here event hough we haven't necessarily started monitoring
- * because we may be context switched out
- */
- if (ctx->ctx_fl_system==0) th->flags |= IA64_THREAD_PM_VALID;
+ /*
+ * update register return value, abort all if problem during copy.
+ */
+ if (copy_to_user(req, &tmp, sizeof(tmp))) return -EFAULT;
- return 0;
+ /*
+ * if there was something wrong with this register, don't touch
+ * the hardware at all and abort the write request for the others.
+ *
+ * On error, the user must sequentially scan the table and the first
+ * entry which has a return flag set is the one that caused the error.
+ */
+ if (ret != 0) {
+ DBprintk(("[%d] pmc[%u]=0x%lx error %d\n",
+ task->pid, cnum, tmp.reg_value, reg_retval));
+ break;
+ }
+
+ /*
+ * We can proceed with this register!
+ */
+
+ /*
+ * Needed in case the user does not initialize the equivalent
+ * PMD. Clearing is done in reset_pmu() so there is no possible
+ * leak here.
+ */
+ CTX_USED_PMD(ctx, pmu_conf.pmc_desc[cnum].dep_pmd[0]);
+
+ /*
+ * keep a copy of the pmc, used for register reload
+ */
+ th->pmc[cnum] = tmp.reg_value;
+
+ ia64_set_pmc(cnum, tmp.reg_value);
+
+ DBprintk(("[%d] pmc[%u]=0x%lx flags=0x%x used_pmds=0x%lx\n",
+ task->pid, cnum, tmp.reg_value,
+ ctx->ctx_soft_pmds[cnum].flags,
+ ctx->ctx_used_pmds[0]));
+
+ }
+ return ret;
}
static int
-pfm_write_pmds(struct task_struct *ta, perfmon_req_t *req, int count)
+pfm_write_pmds(struct task_struct *task, pfm_context_t *ctx, void *arg, int count, struct pt_regs *regs)
{
- struct thread_struct *th = &ta->thread;
- pfm_context_t *ctx = th->pfm_context;
- perfmon_req_t tmp;
- unsigned long cnum;
+ pfarg_reg_t tmp, *req = (pfarg_reg_t *)arg;
+ unsigned int cnum;
int i;
+ int ret = 0, reg_retval = 0;
+
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
+
+ /*
+ * Cannot do anything before PMU is enabled
+ */
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
+
/* XXX: ctx locking may be required here */
for (i = 0; i < count; i++, req++) {
- int k;
if (copy_from_user(&tmp, req, sizeof(tmp))) return -EFAULT;
- cnum = tmp.pfr_reg.reg_num;
+ cnum = tmp.reg_num;
+ if (!PMD_IS_IMPL(cnum)) {
+ ret = -EINVAL;
+ goto abort_mission;
+ }
+
+ /* update virtualized (64bits) counter */
+ if (PMD_IS_COUNTING(cnum)) {
+ ctx->ctx_soft_pmds[cnum].ival = tmp.reg_value;
+ ctx->ctx_soft_pmds[cnum].val = tmp.reg_value & ~pmu_conf.perf_ovfl_val;
+ ctx->ctx_soft_pmds[cnum].long_reset = tmp.reg_long_reset;
+ ctx->ctx_soft_pmds[cnum].short_reset = tmp.reg_short_reset;
- k = cnum - PMU_FIRST_COUNTER;
+ }
+ /*
+ * execute write checker, if any
+ */
+ if (PMD_WR_FUNC(cnum)) ret = PMD_WR_FUNC(cnum)(task, cnum, &tmp.reg_value);
+abort_mission:
+ if (ret == -EINVAL) reg_retval = PFM_REG_RETFL_EINVAL;
- if (!PMD_IS_IMPL(cnum)) return -EINVAL;
+ PFM_REG_RETFLAG_SET(tmp.reg_flags, reg_retval);
- /* update virtualized (64bits) counter */
- if (PMD_IS_COUNTER(cnum)) {
- ctx->ctx_pmds[k].ival = tmp.pfr_reg.reg_value;
- ctx->ctx_pmds[k].val = tmp.pfr_reg.reg_value & ~pmu_conf.perf_ovfl_val;
- ctx->ctx_pmds[k].smpl_rval = tmp.pfr_reg.reg_smpl_reset;
- ctx->ctx_pmds[k].ovfl_rval = tmp.pfr_reg.reg_ovfl_reset;
+ if (copy_to_user(req, &tmp, sizeof(tmp))) return -EFAULT;
- if (tmp.pfr_reg.reg_flags & PFM_REGFL_OVFL_NOTIFY)
- ctx->ctx_pmds[cnum - PMU_FIRST_COUNTER].flags |= PFM_REGFL_OVFL_NOTIFY;
+ /*
+ * if there was something wrong with this register, don't touch
+ * the hardware at all and abort the write request for the others.
+ *
+ * On error, the user must sequentially scan the table and the first
+ * entry which has a return flag set is the one that caused the error.
+ */
+ if (ret != 0) {
+ DBprintk(("[%d] pmc[%u]=0x%lx error %d\n",
+ task->pid, cnum, tmp.reg_value, reg_retval));
+ break;
}
+
/* keep track of what we use */
- CTX_USED_PMD(ctx, cnum);
+ CTX_USED_PMD(ctx, pmu_conf.pmd_desc[(cnum)].dep_pmd[0]);
/* writes to unimplemented part is ignored, so this is safe */
- ia64_set_pmd(cnum, tmp.pfr_reg.reg_value);
+ ia64_set_pmd(cnum, tmp.reg_value & pmu_conf.perf_ovfl_val);
/* to go away */
ia64_srlz_d();
- DBprintk((" setting PMD[%ld]: ovfl_notify=%d pmd.val=0x%lx pmd.ovfl_rval=0x%lx pmd.smpl_rval=0x%lx pmd=%lx used_pmds=0%lx\n",
- cnum,
- PMD_OVFL_NOTIFY(ctx, cnum - PMU_FIRST_COUNTER),
- ctx->ctx_pmds[k].val,
- ctx->ctx_pmds[k].ovfl_rval,
- ctx->ctx_pmds[k].smpl_rval,
- ia64_get_pmd(cnum) & pmu_conf.perf_ovfl_val,
- ctx->ctx_used_pmds[0]));
- }
- /*
- * we have to set this here event hough we haven't necessarily started monitoring
- * because we may be context switched out
- */
- if (ctx->ctx_fl_system==0) th->flags |= IA64_THREAD_PM_VALID;
- return 0;
+ DBprintk(("[%d] pmd[%u]: soft_pmd=0x%lx short_reset=0x%lx "
+ "long_reset=0x%lx hw_pmd=%lx notify=%c used_pmds=0x%lx reset_pmds=0x%lx\n",
+ task->pid, cnum,
+ ctx->ctx_soft_pmds[cnum].val,
+ ctx->ctx_soft_pmds[cnum].short_reset,
+ ctx->ctx_soft_pmds[cnum].long_reset,
+ ia64_get_pmd(cnum) & pmu_conf.perf_ovfl_val,
+ PMC_OVFL_NOTIFY(ctx, cnum) ? 'Y':'N',
+ ctx->ctx_used_pmds[0],
+ ctx->ctx_soft_pmds[cnum].reset_pmds[0]));
+ }
+ return ret;
}
static int
-pfm_read_pmds(struct task_struct *ta, perfmon_req_t *req, int count)
+pfm_read_pmds(struct task_struct *task, pfm_context_t *ctx, void *arg, int count, struct pt_regs *regs)
{
- struct thread_struct *th = &ta->thread;
- pfm_context_t *ctx = th->pfm_context;
+ struct thread_struct *th = &task->thread;
unsigned long val=0;
- perfmon_req_t tmp;
- int i;
+ pfarg_reg_t tmp, *req = (pfarg_reg_t *)arg;
+ unsigned int cnum;
+ int i, ret = 0;
+
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
/*
* XXX: MUST MAKE SURE WE DON"T HAVE ANY PENDING OVERFLOW BEFORE READING
- * This is required when the monitoring has been stoppped by user of kernel.
- * If ity is still going on, then that's fine because we a re not gauranteed
- * to return an accurate value in this case
+ * This is required when the monitoring has been stopped by user or kernel.
+ * If it is still going on, then that's fine because we are not guaranteed
+ * to return an accurate value in this case.
*/
/* XXX: ctx locking may be required here */
+ DBprintk(("ctx_last_cpu=%d for [%d]\n", atomic_read(&ctx->ctx_last_cpu), task->pid));
+
for (i = 0; i < count; i++, req++) {
- unsigned long reg_val = ~0, ctx_val = ~0;
+ unsigned long reg_val = ~0UL, ctx_val = ~0UL;
if (copy_from_user(&tmp, req, sizeof(tmp))) return -EFAULT;
- if (!PMD_IS_IMPL(tmp.pfr_reg.reg_num)) return -EINVAL;
+ cnum = tmp.reg_num;
+
+ if (!PMD_IS_IMPL(cnum)) goto abort_mission;
+ /*
+ * we can only read the registers that we use. That includes
+ * the ones we explicitly initialize AND the ones we want included
+ * in the sampling buffer (smpl_regs).
+ *
+ * Having this restriction allows optimization in the ctxsw routine
+ * without compromising security (leaks)
+ */
+ if (!CTX_IS_USED_PMD(ctx, cnum)) goto abort_mission;
- if (PMD_IS_COUNTER(tmp.pfr_reg.reg_num)) {
- if (ta == current){
- val = ia64_get_pmd(tmp.pfr_reg.reg_num);
- } else {
- val = reg_val = th->pmd[tmp.pfr_reg.reg_num];
+ /*
+ * If the task is not the current one, then we check if the
+ * PMU state is still in the local live register due to lazy ctxsw.
+ * If true, then we read directly from the registers.
+ */
+ if (atomic_read(&ctx->ctx_last_cpu) == smp_processor_id()){
+ ia64_srlz_d();
+ val = reg_val = ia64_get_pmd(cnum);
+ DBprintk(("reading pmd[%u]=0x%lx from hw\n", cnum, val));
+ } else {
+#ifdef CONFIG_SMP
+ int cpu;
+ /*
+ * on an SMP system, the context may still be live on another
+ * CPU, so we need to fetch it before proceeding with the read.
+ * This call will only be made once for the whole loop because
+ * of ctx_last_cpu becoming == -1.
+ *
+ * We cannot reuse ctx_last_cpu as it may change before we get to the
+ * actual IPI call. In this case, we will do the call for nothing but
+ * there is no way around it. The receiving side will simply do nothing.
+ */
+ cpu = atomic_read(&ctx->ctx_last_cpu);
+ if (cpu != -1) {
+ DBprintk(("must fetch on CPU%d for [%d]\n", cpu, task->pid));
+ pfm_fetch_regs(cpu, task, ctx);
}
- val &= pmu_conf.perf_ovfl_val;
+#endif
+ /* context has been saved */
+ val = reg_val = th->pmd[cnum];
+ }
+ if (PMD_IS_COUNTING(cnum)) {
/*
- * lower part of .val may not be zero, so we must be an addition because of
- * residual count (see update_counters).
+ * XXX: need to check for overflow
*/
- val += ctx_val = ctx->ctx_pmds[tmp.pfr_reg.reg_num - PMU_FIRST_COUNTER].val;
+
+ val &= pmu_conf.perf_ovfl_val;
+ val += ctx_val = ctx->ctx_soft_pmds[cnum].val;
} else {
- /* for now */
- if (ta != current) return -EINVAL;
+ val = reg_val = ia64_get_pmd(cnum);
+ }
- ia64_srlz_d();
- val = ia64_get_pmd(tmp.pfr_reg.reg_num);
+ tmp.reg_value = val;
+
+ /*
+ * execute read checker, if any
+ */
+ if (PMD_RD_FUNC(cnum)) {
+ ret = PMD_RD_FUNC(cnum)(task, cnum, &tmp.reg_value);
}
- tmp.pfr_reg.reg_value = val;
- DBprintk((" reading PMD[%ld]=0x%lx reg=0x%lx ctx_val=0x%lx pmc=0x%lx\n",
- tmp.pfr_reg.reg_num, val, reg_val, ctx_val, ia64_get_pmc(tmp.pfr_reg.reg_num)));
+ PFM_REG_RETFLAG_SET(tmp.reg_flags, ret);
+
+ DBprintk(("read pmd[%u] ret=%d soft_pmd=0x%lx reg=0x%lx pmc=0x%lx\n",
+ cnum, ret, ctx_val, reg_val,
+ ia64_get_pmc(cnum)));
if (copy_to_user(req, &tmp, sizeof(tmp))) return -EFAULT;
}
return 0;
+abort_mission:
+ PFM_REG_RETFLAG_SET(tmp.reg_flags, PFM_REG_RETFL_EINVAL);
+ /*
+ * XXX: if this fails, we stick with the original failure, flag not updated!
+ */
+ copy_to_user(req, &tmp, sizeof(tmp));
+ return -EINVAL;
+
+}
+
+#ifdef PFM_PMU_USES_DBR
+/*
+ * Only call this function when a process is trying to
+ * write the debug registers (reading is always allowed)
+ */
+int
+pfm_use_debug_registers(struct task_struct *task)
+{
+ pfm_context_t *ctx = task->thread.pfm_context;
+ int ret = 0;
+
+ DBprintk(("called for [%d]\n", task->pid));
+
+ /*
+ * do it only once
+ */
+ if (task->thread.flags & IA64_THREAD_DBG_VALID) return 0;
+
+ /*
+ * Even on SMP, we do not need to use an atomic here because
+ * the only way in is via ptrace() and this is possible only when the
+ * process is stopped. Even in the case where the ctxsw out is not totally
+ * completed by the time we come here, there is no way the 'stopped' process
+ * could be in the middle of fiddling with the pfm_write_ibr_dbr() routine.
+ * So this is always safe.
+ */
+ if (ctx && ctx->ctx_fl_using_dbreg == 1) return -1;
+
+ /*
+ * XXX: not pretty
+ */
+ LOCK_PFS();
+
+ /*
+ * We only allow the use of debug registers when there is no system
+ * wide monitoring
+ * XXX: we could relax this by
+ */
+ if (pfm_sessions.pfs_sys_use_dbregs > 0)
+ ret = -1;
+ else
+ pfm_sessions.pfs_ptrace_use_dbregs++;
+
+ DBprintk(("ptrace_use_dbregs=%lu sys_use_dbregs=%lu by [%d] ret = %d\n",
+ pfm_sessions.pfs_ptrace_use_dbregs,
+ pfm_sessions.pfs_sys_use_dbregs,
+ task->pid, ret));
+
+ UNLOCK_PFS();
+
+ return ret;
+}
+
+/*
+ * This function is called for every task that exits with the
+ * IA64_THREAD_DBG_VALID set. This indicates a task which was
+ * able to use the debug registers for debugging purposes via
+ * ptrace(). Therefore we know it was not using them for
+ * performance monitoring, so we only decrement the number
+ * of "ptraced" debug register users to keep the count up to date.
+ */
+int
+pfm_release_debug_registers(struct task_struct *task)
+{
+ int ret;
+
+ LOCK_PFS();
+ if (pfm_sessions.pfs_ptrace_use_dbregs == 0) {
+ printk("perfmon: invalid release for [%d] ptrace_use_dbregs=0\n", task->pid);
+ ret = -1;
+ } else {
+ pfm_sessions.pfs_ptrace_use_dbregs--;
+ ret = 0;
+ }
+ UNLOCK_PFS();
+
+ return ret;
+}
+#else /* !PFM_PMU_USES_DBR */
+/*
+ * In case the PMU does not use the debug registers, these two functions are nops.
+ * The first function is called from arch/ia64/kernel/ptrace.c.
+ * The second function is called from arch/ia64/kernel/process.c.
+ */
+int
+pfm_use_debug_registers(struct task_struct *task)
+{
+ return 0;
+}
+int
+pfm_release_debug_registers(struct task_struct *task)
+{
+ return 0;
}
+#endif /* PFM_PMU_USES_DBR */
static int
-pfm_do_restart(struct task_struct *task)
+pfm_restart(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
{
- struct thread_struct *th = &task->thread;
- pfm_context_t *ctx = th->pfm_context;
void *sem = &ctx->ctx_restart_sem;
+ /*
+ * Cannot do anything before PMU is enabled
+ */
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
+
if (task == current) {
- DBprintk((" restarting self %d frozen=%d \n", current->pid, ctx->ctx_fl_frozen));
+ DBprintk(("restarting self %d frozen=%d \n", current->pid, ctx->ctx_fl_frozen));
+
+ pfm_reset_regs(ctx, ctx->ctx_ovfl_regs, PFM_RELOAD_LONG_RESET);
- pfm_reset_regs(ctx);
+ ctx->ctx_ovfl_regs[0] = 0UL;
/*
* We ignore block/don't block because we never block
@@ -1011,26 +1642,36 @@
ctx->ctx_fl_frozen = 0;
if (CTX_HAS_SMPL(ctx)) {
- ctx->ctx_smpl_buf->psb_hdr->hdr_count = 0;
- ctx->ctx_smpl_buf->psb_index = 0;
+ ctx->ctx_psb->psb_hdr->hdr_count = 0;
+ ctx->ctx_psb->psb_index = 0;
}
- /* pfm_reset_smpl_buffers(ctx,th->pfm_ovfl_regs);*/
-
/* simply unfreeze */
ia64_set_pmc(0, 0);
ia64_srlz_d();
return 0;
- }
+ }
+ /* restart on another task */
- /* check if blocking */
+ /*
+ * if blocking, then post the semaphore.
+ * if non-blocking, then we ensure that the task will go into
+ * pfm_overflow_must_block() before returning to user mode.
+ * We cannot explicitly reset another task; it MUST always
+ * be done by the task itself. This works for system wide because
+ * the tool that is controlling the session is doing "self-monitoring".
+ *
+ * XXX: what if the task never goes back to user?
+ *
+ */
if (CTX_OVFL_NOBLOCK(ctx) == 0) {
- DBprintk((" unblocking %d \n", task->pid));
+ DBprintk(("unblocking %d \n", task->pid));
up(sem);
- return 0;
+ } else {
+ task->thread.pfm_ovfl_block_reset = 1;
}
-
+#if 0
/*
* in case of non blocking mode, then it's just a matter of
* of reseting the sampling buffer (if any) index. The PMU
@@ -1041,281 +1682,723 @@
* must reset the header count first
*/
if (CTX_HAS_SMPL(ctx)) {
- DBprintk((" resetting sampling indexes for %d \n", task->pid));
- ctx->ctx_smpl_buf->psb_hdr->hdr_count = 0;
- ctx->ctx_smpl_buf->psb_index = 0;
+ DBprintk(("resetting sampling indexes for %d \n", task->pid));
+ ctx->ctx_psb->psb_hdr->hdr_count = 0;
+ ctx->ctx_psb->psb_index = 0;
}
-
+#endif
return 0;
}
+#ifndef CONFIG_SMP
/*
- * system-wide mode: propagate activation/desactivation throughout the tasklist
- *
- * XXX: does not work for SMP, of course
+ * On UP kernels, we do not need to constantly set the psr.pp bit
+ * when a task is scheduled. The psr.pp bit can only be changed in
+ * the kernel because of a user request. Given we are on a UP non-preemptive
+ * kernel we know that no other task is running, so we can simply update their
+ * psr.pp from their saved state. There is thus no impact on the context switch
+ * code compared to the SMP case.
*/
static void
-pfm_process_tasklist(int cmd)
+pfm_tasklist_toggle_pp(unsigned int val)
{
struct task_struct *p;
struct pt_regs *regs;
+ DBprintk(("invoked by [%d] pp=%u\n", current->pid, val));
+
+ read_lock(&tasklist_lock);
+
for_each_task(p) {
- regs = (struct pt_regs *)((unsigned long)p + IA64_STK_OFFSET);
+ regs = (struct pt_regs *)((unsigned long) p + IA64_STK_OFFSET);
+
+ /*
+ * position on pt_regs saved on stack on 1st entry into the kernel
+ */
regs--;
- ia64_psr(regs)->pp = cmd;
+
+ /*
+ * update psr.pp
+ */
+ ia64_psr(regs)->pp = val;
}
+ read_unlock(&tasklist_lock);
}
+#endif
+
+
static int
-do_perfmonctl (struct task_struct *task, int cmd, int flags, perfmon_req_t *req, int count, struct pt_regs *regs)
+pfm_stop(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
{
- perfmon_req_t tmp;
- struct thread_struct *th = &task->thread;
- pfm_context_t *ctx = th->pfm_context;
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
- memset(&tmp, 0, sizeof(tmp));
+ /*
+ * Cannot do anything before PMU is enabled
+ */
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
- if (ctx == NULL && cmd != PFM_CREATE_CONTEXT && cmd < PFM_DEBUG_BASE) {
- DBprintk((" PFM_WRITE_PMCS: no context for task %d\n", task->pid));
- return -EINVAL;
- }
+ DBprintk(("[%d] fl_system=%d owner=%p current=%p\n",
+ current->pid,
+ ctx->ctx_fl_system, PMU_OWNER(),
+ current));
+ /* simply stop monitoring but not the PMU */
+ if (ctx->ctx_fl_system) {
- switch (cmd) {
- case PFM_CREATE_CONTEXT:
- /* a context has already been defined */
- if (ctx) return -EBUSY;
+ __asm__ __volatile__ ("rsm psr.pp;;"::: "memory");
- /*
- * cannot directly create a context in another process
- */
- if (task != current) return -EINVAL;
+ /* disable dcr pp */
+ ia64_set_dcr(ia64_get_dcr() & ~IA64_DCR_PP);
- if (req == NULL || count != 1) return -EINVAL;
+#ifdef CONFIG_SMP
+ local_cpu_data->pfm_dcr_pp = 0;
+#else
+ pfm_tasklist_toggle_pp(0);
+#endif
- if (!access_ok(VERIFY_READ, req, sizeof(struct perfmon_req_t)*count)) return -EFAULT;
+ ia64_psr(regs)->pp = 0;
- return pfm_context_create(flags, req);
+ } else {
+ __asm__ __volatile__ ("rum psr.up;;"::: "memory");
- case PFM_WRITE_PMCS:
- /* we don't quite support this right now */
- if (task != current) return -EINVAL;
+ ia64_psr(regs)->up = 0;
+ }
+ return 0;
+}
- if (!access_ok(VERIFY_READ, req, sizeof(struct perfmon_req_t)*count)) return -EFAULT;
+static int
+pfm_disable(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
- return pfm_write_pmcs(task, req, count);
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
- case PFM_WRITE_PMDS:
- /* we don't quite support this right now */
- if (task != current) return -EINVAL;
+ /*
+ * stop monitoring, freeze PMU, and save state in context
+ * this call will clear IA64_THREAD_PM_VALID for per-task sessions.
+ */
+ pfm_flush_regs(task);
- if (!access_ok(VERIFY_READ, req, sizeof(struct perfmon_req_t)*count)) return -EFAULT;
+ if (ctx->ctx_fl_system) {
+ ia64_psr(regs)->pp = 0;
+ } else {
+ ia64_psr(regs)->up = 0;
+ }
+ /*
+ * goes back to default behavior
+ * no need to change live psr.sp because useless at the kernel level
+ */
+ ia64_psr(regs)->sp = 1;
+
+ DBprintk(("enabling psr.sp for [%d]\n", current->pid));
+
+ ctx->ctx_flags.state = PFM_CTX_DISABLED;
+
+ return 0;
+}
+
+
+
+static int
+pfm_destroy_context(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
+
+ /*
+ * if context was never enabled, then there is not much
+ * to do
+ */
+ if (!CTX_IS_ENABLED(ctx)) goto skipped_stop;
+
+ /*
+ * Disable context: stop monitoring, flush regs to software state (useless here),
+ * and freeze PMU
+ *
+ * The IA64_THREAD_PM_VALID is cleared by pfm_flush_regs() called from pfm_disable()
+ */
+ pfm_disable(task, ctx, arg, count, regs);
+
+ if (ctx->ctx_fl_system) {
+ ia64_psr(regs)->pp = 0;
+ } else {
+ ia64_psr(regs)->up = 0;
+ }
+
+ /* restore security level */
+ ia64_psr(regs)->sp = 1;
+
+skipped_stop:
+ /*
+ * remove sampling buffer mapping, if any
+ */
+ if (ctx->ctx_smpl_vaddr) pfm_remove_smpl_mapping(task);
+
+ /* now free context and related state */
+ pfm_context_exit(task);
+
+ return 0;
+}
+
+/*
+ * does nothing at the moment
+ */
+static int
+pfm_unprotect_context(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ return 0;
+}
+
+static int
+pfm_protect_context(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ DBprintk(("context from [%d] is protected\n", task->pid));
+ /*
+ * from now on, only the creator of the context has access to it
+ */
+ ctx->ctx_fl_protected = 1;
+
+ /*
+ * reinforce secure monitoring: cannot toggle psr.up
+ */
+ ia64_psr(regs)->sp = 1;
+
+ return 0;
+}
+
+static int
+pfm_debug(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ unsigned int mode = *(unsigned int *)arg;
+
+ pfm_debug_mode = mode == 0 ? 0 : 1;
+
+ printk("perfmon debugging %s\n", pfm_debug_mode ? "on" : "off");
+
+ return 0;
+}
+
+#ifdef PFM_PMU_USES_DBR
+
+typedef struct {
+ unsigned long ibr_mask:56;
+ unsigned long ibr_plm:4;
+ unsigned long ibr_ig:3;
+ unsigned long ibr_x:1;
+} ibr_mask_reg_t;
+
+typedef struct {
+ unsigned long dbr_mask:56;
+ unsigned long dbr_plm:4;
+ unsigned long dbr_ig:2;
+ unsigned long dbr_w:1;
+ unsigned long dbr_r:1;
+} dbr_mask_reg_t;
+
+typedef union {
+ unsigned long val;
+ ibr_mask_reg_t ibr;
+ dbr_mask_reg_t dbr;
+} dbreg_t;
+
+
+static int
+pfm_write_ibr_dbr(int mode, struct task_struct *task, void *arg, int count, struct pt_regs *regs)
+{
+ struct thread_struct *thread = &task->thread;
+ pfm_context_t *ctx = task->thread.pfm_context;
+ pfarg_dbreg_t tmp, *req = (pfarg_dbreg_t *)arg;
+ dbreg_t dbreg;
+ unsigned int rnum;
+ int first_time;
+ int i, ret = 0;
- return pfm_write_pmds(task, req, count);
+ /*
+ * for range restriction: psr.db must be cleared or
+ * the PMU will ignore the debug registers.
+ *
+ * XXX: may need more in system wide mode,
+ * no task can have this bit set?
+ */
+ if (ia64_psr(regs)->db == 1) return -EINVAL;
- case PFM_START:
- /* we don't quite support this right now */
- if (task != current) return -EINVAL;
- if (PMU_OWNER() && PMU_OWNER() != current && PFM_CAN_DO_LAZY()) pfm_lazy_save_regs(PMU_OWNER());
+ first_time = ctx->ctx_fl_using_dbreg == 0;
- SET_PMU_OWNER(current);
+ /*
+ * check for debug registers in system wide mode
+ *
+ */
+ LOCK_PFS();
+ if (ctx->ctx_fl_system && first_time) {
+ if (pfm_sessions.pfs_ptrace_use_dbregs)
+ ret = -EBUSY;
+ else
+ pfm_sessions.pfs_sys_use_dbregs++;
+ }
+ UNLOCK_PFS();
- /* will start monitoring right after rfi */
- ia64_psr(regs)->up = 1;
- ia64_psr(regs)->pp = 1;
+ if (ret != 0) return ret;
- if (ctx->ctx_fl_system) {
- pfm_process_tasklist(1);
- pfs_info.pfs_pp = 1;
+ if (ctx->ctx_fl_system) {
+ /* we mark ourselves as owner of the debug registers */
+ ctx->ctx_fl_using_dbreg = 1;
+ } else {
+ if (ctx->ctx_fl_using_dbreg == 0) {
+ ret = -EBUSY;
+ if ((thread->flags & IA64_THREAD_DBG_VALID) != 0) {
+ DBprintk(("debug registers already in use for [%d]\n", task->pid));
+ goto abort_mission;
}
+ /* we mark ourselves as owner of the debug registers */
+ ctx->ctx_fl_using_dbreg = 1;
- /*
- * mark the state as valid.
- * this will trigger save/restore at context switch
+ /*
+ * Given debug registers cannot be used for both debugging
+ * and performance monitoring at the same time, we reuse
+ * the storage area to save and restore the registers on ctxsw.
*/
- if (ctx->ctx_fl_system==0) th->flags |= IA64_THREAD_PM_VALID;
+ memset(task->thread.dbr, 0, sizeof(task->thread.dbr));
+ memset(task->thread.ibr, 0, sizeof(task->thread.ibr));
- ia64_set_pmc(0, 0);
+ /*
+ * clear hardware registers to make sure we don't
+ * pick up stale state
+ */
+ for (i=0; i < pmu_conf.num_ibrs; i++) {
+ ia64_set_ibr(i, 0UL);
+ }
+ ia64_srlz_i();
+ for (i=0; i < pmu_conf.num_dbrs; i++) {
+ ia64_set_dbr(i, 0UL);
+ }
ia64_srlz_d();
+ }
+ }
- break;
+ ret = -EFAULT;
+
+ /*
+ * Now install the values into the registers
+ */
+ for (i = 0; i < count; i++, req++) {
+
+
+ if (copy_from_user(&tmp, req, sizeof(tmp))) goto abort_mission;
+
+ rnum = tmp.dbreg_num;
+ dbreg.val = tmp.dbreg_value;
+
+ ret = -EINVAL;
+
+ if ((mode == 0 && !IBR_IS_IMPL(rnum)) || ((mode == 1) && !DBR_IS_IMPL(rnum))) {
+ DBprintk(("invalid register %u val=0x%lx mode=%d i=%d count=%d\n",
+ rnum, dbreg.val, mode, i, count));
- case PFM_ENABLE:
- /* we don't quite support this right now */
- if (task != current) return -EINVAL;
+ goto abort_mission;
+ }
- if (PMU_OWNER() && PMU_OWNER() != current && PFM_CAN_DO_LAZY()) pfm_lazy_save_regs(PMU_OWNER());
+ /*
+ * make sure we do not install an enabled breakpoint
+ */
+ if (rnum & 0x1) {
+ if (mode == 0)
+ dbreg.ibr.ibr_x = 0;
+ else
+ dbreg.dbr.dbr_r = dbreg.dbr.dbr_w = 0;
+ }
- /* reset all registers to stable quiet state */
- ia64_reset_pmu();
+ /*
+ * clear return flags and copy back to user
+ *
+ * XXX: fix once EAGAIN is implemented
+ */
+ ret = -EFAULT;
- /* make sure nothing starts */
- ia64_psr(regs)->up = 0;
- ia64_psr(regs)->pp = 0;
+ PFM_REG_RETFLAG_SET(tmp.dbreg_flags, 0);
- /* do it on the live register as well */
- __asm__ __volatile__ ("rsm psr.pp|psr.pp;;"::: "memory");
+ if (copy_to_user(req, &tmp, sizeof(tmp))) goto abort_mission;
- SET_PMU_OWNER(current);
+ /*
+ * Debug registers, just like PMC, can only be modified
+ * by a kernel call. Moreover, perfmon() access to those
+ * registers is centralized in this routine. The hardware
+ * does not modify the value of these registers, therefore,
+ * if we save them as they are written, we can avoid having
+ * to save them on context switch out. This is made possible
+ * by the fact that when perfmon uses debug registers, ptrace()
+ * won't be able to modify them concurrently.
+ */
+ if (mode == 0) {
+ CTX_USED_IBR(ctx, rnum);
- /*
- * mark the state as valid.
- * this will trigger save/restore at context switch
- */
- if (ctx->ctx_fl_system==0) th->flags |= IA64_THREAD_PM_VALID;
+ ia64_set_ibr(rnum, dbreg.val);
+ ia64_srlz_i();
- /* simply unfreeze */
- ia64_set_pmc(0, 0);
- ia64_srlz_d();
- break;
+ thread->ibr[rnum] = dbreg.val;
- case PFM_DISABLE:
- /* we don't quite support this right now */
- if (task != current) return -EINVAL;
+ DBprintk(("write ibr%u=0x%lx used_ibrs=0x%lx\n", rnum, dbreg.val, ctx->ctx_used_ibrs[0]));
+ } else {
+ CTX_USED_DBR(ctx, rnum);
- /* simply freeze */
- ia64_set_pmc(0, 1);
+ ia64_set_dbr(rnum, dbreg.val);
ia64_srlz_d();
- /*
- * XXX: cannot really toggle IA64_THREAD_PM_VALID
- * but context is still considered valid, so any
- * read request would return something valid. Same
- * thing when this task terminates (pfm_flush_regs()).
- */
- break;
- case PFM_READ_PMDS:
- if (!access_ok(VERIFY_READ, req, sizeof(struct perfmon_req_t)*count)) return -EFAULT;
- if (!access_ok(VERIFY_WRITE, req, sizeof(struct perfmon_req_t)*count)) return -EFAULT;
-
- return pfm_read_pmds(task, req, count);
-
- case PFM_STOP:
- /* we don't quite support this right now */
- if (task != current) return -EINVAL;
-
- /* simply stop monitors, not PMU */
- ia64_psr(regs)->up = 0;
- ia64_psr(regs)->pp = 0;
-
- if (ctx->ctx_fl_system) {
- pfm_process_tasklist(0);
- pfs_info.pfs_pp = 0;
- }
+ thread->dbr[rnum] = dbreg.val;
- break;
+ DBprintk(("write dbr%u=0x%lx used_dbrs=0x%lx\n", rnum, dbreg.val, ctx->ctx_used_dbrs[0]));
+ }
+ }
- case PFM_RESTART: /* temporary, will most likely end up as a PFM_ENABLE */
+ return 0;
- if ((th->flags & IA64_THREAD_PM_VALID) == 0 && ctx->ctx_fl_system==0) {
- printk(" PFM_RESTART not monitoring\n");
- return -EINVAL;
- }
- if (CTX_OVFL_NOBLOCK(ctx) == 0 && ctx->ctx_fl_frozen==0) {
- printk("task %d without pmu_frozen set\n", task->pid);
- return -EINVAL;
- }
+abort_mission:
+ /*
+ * in case it was our first attempt, we undo the global modifications
+ */
+ if (first_time) {
+ LOCK_PFS();
+ if (ctx->ctx_fl_system) {
+ pfm_sessions.pfs_sys_use_dbregs--;
+ }
+ UNLOCK_PFS();
+ ctx->ctx_fl_using_dbreg = 0;
+ }
+ /*
+ * install error return flag
+ */
+ if (ret != -EFAULT) {
+ /*
+ * XXX: for now we can only come here on EINVAL
+ */
+ PFM_REG_RETFLAG_SET(tmp.dbreg_flags, PFM_REG_RETFL_EINVAL);
+ copy_to_user(req, &tmp, sizeof(tmp));
+ }
+ return ret;
+}
- return pfm_do_restart(task); /* we only look at first entry */
+static int
+pfm_write_ibrs(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
- case PFM_DESTROY_CONTEXT:
- /* we don't quite support this right now */
- if (task != current) return -EINVAL;
-
- /* first stop monitors */
- ia64_psr(regs)->up = 0;
- ia64_psr(regs)->pp = 0;
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
- /* then freeze PMU */
- ia64_set_pmc(0, 1);
- ia64_srlz_d();
+ return pfm_write_ibr_dbr(0, task, arg, count, regs);
+}
- /* don't save/restore on context switch */
- if (ctx->ctx_fl_system ==0) task->thread.flags &= ~IA64_THREAD_PM_VALID;
+static int
+pfm_write_dbrs(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
- SET_PMU_OWNER(NULL);
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
- /* now free context and related state */
- pfm_context_exit(task);
- break;
+ return pfm_write_ibr_dbr(1, task, arg, count, regs);
+}
- case PFM_DEBUG_ON:
- printk("perfmon debugging on\n");
- pfm_debug = 1;
- break;
+#endif /* PFM_PMU_USES_DBR */
- case PFM_DEBUG_OFF:
- printk("perfmon debugging off\n");
- pfm_debug = 0;
- break;
+static int
+pfm_get_features(struct task_struct *task, pfm_context_t *ctx, void *arg, int count, struct pt_regs *regs)
+{
+ pfarg_features_t tmp;
+
+ memset(&tmp, 0, sizeof(tmp));
+
+ tmp.ft_version = PFM_VERSION;
+ tmp.ft_smpl_version = PFM_SMPL_VERSION;
+
+ if (copy_to_user(arg, &tmp, sizeof(tmp))) return -EFAULT;
+
+ return 0;
+}
+
+static int
+pfm_start(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
+
+ /*
+ * Cannot do anything before PMU is enabled
+ */
+ if (!CTX_IS_ENABLED(ctx)) return -EINVAL;
+
+ DBprintk(("[%d] fl_system=%d owner=%p current=%p\n",
+ current->pid,
+ ctx->ctx_fl_system, PMU_OWNER(),
+ current));
+
+ if (PMU_OWNER() != task) {
+ printk("perfmon: pfm_start task [%d] not pmu owner\n", task->pid);
+ return -EINVAL;
+ }
+
+ if (ctx->ctx_fl_system) {
+
+ /* enable dcr pp */
+ ia64_set_dcr(ia64_get_dcr()|IA64_DCR_PP);
+
+#ifdef CONFIG_SMP
+ local_cpu_data->pfm_dcr_pp = 1;
+#else
+ pfm_tasklist_toggle_pp(1);
+#endif
+ ia64_psr(regs)->pp = 1;
+
+ __asm__ __volatile__ ("ssm psr.pp;;"::: "memory");
- default:
- DBprintk((" UNknown command 0x%x\n", cmd));
+ } else {
+ if ((task->thread.flags & IA64_THREAD_PM_VALID) == 0) {
+ printk("perfmon: pfm_start task flag not set for [%d]\n", task->pid);
return -EINVAL;
+ }
+ ia64_psr(regs)->up = 1;
+ __asm__ __volatile__ ("sum psr.up;;"::: "memory");
+ }
+ ia64_srlz_d();
+
+ return 0;
+}
+
+static int
+pfm_enable(struct task_struct *task, pfm_context_t *ctx, void *arg, int count,
+ struct pt_regs *regs)
+{
+ /* we don't quite support this right now */
+ if (task != current) return -EINVAL;
+
+ if (ctx->ctx_fl_system == 0 && PMU_OWNER() && PMU_OWNER() != current)
+ pfm_lazy_save_regs(PMU_OWNER());
+
+ /* reset all registers to stable quiet state */
+ ia64_reset_pmu(task);
+
+ /* make sure nothing starts */
+ if (ctx->ctx_fl_system) {
+ ia64_psr(regs)->pp = 0;
+ ia64_psr(regs)->up = 0; /* just to make sure! */
+
+ __asm__ __volatile__ ("rsm psr.pp;;"::: "memory");
+
+#ifdef CONFIG_SMP
+ local_cpu_data->pfm_syst_wide = 1;
+ local_cpu_data->pfm_dcr_pp = 0;
+#endif
+ } else {
+ /*
+ * needed in case the task was a passive task during
+ * a system wide session and now wants to have its own
+ * session
+ */
+ ia64_psr(regs)->pp = 0; /* just to make sure! */
+ ia64_psr(regs)->up = 0;
+
+ __asm__ __volatile__ ("rum psr.up;;"::: "memory");
+ /*
+ * allow user control (user monitors only)
+ if (task == ctx->ctx_owner) {
+ */
+ {
+ DBprintk(("clearing psr.sp for [%d]\n", current->pid));
+ ia64_psr(regs)->sp = 0;
+ }
+ task->thread.flags |= IA64_THREAD_PM_VALID;
}
+
+ SET_PMU_OWNER(task);
+
+
+ ctx->ctx_flags.state = PFM_CTX_ENABLED;
+ atomic_set(&ctx->ctx_last_cpu, smp_processor_id());
+
+ /* simply unfreeze */
+ ia64_set_pmc(0, 0);
+ ia64_srlz_d();
+
return 0;
}
/*
- * XXX: do something better here
+ * functions MUST be listed in increasing order of their index (see perfmon.h)
*/
+static pfm_cmd_desc_t pfm_cmd_tab[]={
+/* 0 */{ NULL, 0, 0, 0}, /* not used */
+/* 1 */{ pfm_write_pmcs, PFM_CMD_PID|PFM_CMD_CTX|PFM_CMD_ARG_READ|PFM_CMD_ARG_WRITE, PFM_CMD_ARG_MANY, sizeof(pfarg_reg_t)},
+/* 2 */{ pfm_write_pmds, PFM_CMD_PID|PFM_CMD_CTX|PFM_CMD_ARG_READ, PFM_CMD_ARG_MANY, sizeof(pfarg_reg_t)},
+/* 3 */{ pfm_read_pmds, PFM_CMD_PID|PFM_CMD_CTX|PFM_CMD_ARG_READ|PFM_CMD_ARG_WRITE, PFM_CMD_ARG_MANY, sizeof(pfarg_reg_t)},
+/* 4 */{ pfm_stop, PFM_CMD_PID|PFM_CMD_CTX, 0, 0},
+/* 5 */{ pfm_start, PFM_CMD_PID|PFM_CMD_CTX, 0, 0},
+/* 6 */{ pfm_enable, PFM_CMD_PID|PFM_CMD_CTX, 0, 0},
+/* 7 */{ pfm_disable, PFM_CMD_PID|PFM_CMD_CTX, 0, 0},
+/* 8 */{ pfm_create_context, PFM_CMD_ARG_READ, 1, sizeof(pfarg_context_t)},
+/* 9 */{ pfm_destroy_context, PFM_CMD_PID|PFM_CMD_CTX, 0, 0},
+/* 10 */{ pfm_restart, PFM_CMD_PID|PFM_CMD_CTX|PFM_CMD_NOCHK, 0, 0},
+/* 11 */{ pfm_protect_context, PFM_CMD_PID|PFM_CMD_CTX, 0, 0},
+/* 12 */{ pfm_get_features, PFM_CMD_ARG_WRITE, 0, 0},
+/* 13 */{ pfm_debug, 0, 1, sizeof(unsigned int)},
+/* 14 */{ pfm_unprotect_context, PFM_CMD_PID|PFM_CMD_CTX, 0, 0},
+/* 15 */{ NULL, 0, 0, 0}, /* not used */
+/* 16 */{ NULL, 0, 0, 0}, /* not used */
+/* 17 */{ NULL, 0, 0, 0}, /* not used */
+/* 18 */{ NULL, 0, 0, 0}, /* not used */
+/* 19 */{ NULL, 0, 0, 0}, /* not used */
+/* 20 */{ NULL, 0, 0, 0}, /* not used */
+/* 21 */{ NULL, 0, 0, 0}, /* not used */
+/* 22 */{ NULL, 0, 0, 0}, /* not used */
+/* 23 */{ NULL, 0, 0, 0}, /* not used */
+/* 24 */{ NULL, 0, 0, 0}, /* not used */
+/* 25 */{ NULL, 0, 0, 0}, /* not used */
+/* 26 */{ NULL, 0, 0, 0}, /* not used */
+/* 27 */{ NULL, 0, 0, 0}, /* not used */
+/* 28 */{ NULL, 0, 0, 0}, /* not used */
+/* 29 */{ NULL, 0, 0, 0}, /* not used */
+/* 30 */{ NULL, 0, 0, 0}, /* not used */
+/* 31 */{ NULL, 0, 0, 0}, /* not used */
+#ifdef PFM_PMU_USES_DBR
+/* 32 */{ pfm_write_ibrs, PFM_CMD_PID|PFM_CMD_CTX|PFM_CMD_ARG_READ|PFM_CMD_ARG_WRITE, PFM_CMD_ARG_MANY, sizeof(pfarg_dbreg_t)},
+/* 33 */{ pfm_write_dbrs, PFM_CMD_PID|PFM_CMD_CTX|PFM_CMD_ARG_READ|PFM_CMD_ARG_WRITE, PFM_CMD_ARG_MANY, sizeof(pfarg_dbreg_t)}
+#endif
+};
+#define PFM_CMD_COUNT (sizeof(pfm_cmd_tab)/sizeof(pfm_cmd_desc_t))
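The table above drives a generic dispatcher: the command number indexes a descriptor that carries the handler, validity flags, the expected argument count (with a sentinel for "one or more"), and the per-argument size. A minimal user-space sketch of the same pattern, with hypothetical names (`cmd_desc_t`, `dispatch` are illustrative, not the kernel's):

```c
#include <stddef.h>

#define CMD_ARG_MANY  -1    /* variable count, must be >= 1 */

typedef int (*cmd_func_t)(void *arg, int count);

typedef struct {
    cmd_func_t func;   /* NULL marks an unused slot */
    int        narg;   /* expected count, or CMD_ARG_MANY */
    size_t     argsz;  /* size of one argument record */
} cmd_desc_t;

static int do_start(void *arg, int count) { (void)arg; (void)count; return 0; }

static cmd_desc_t cmd_tab[] = {
    /* 0 */ { NULL,     0,            0  },  /* not used */
    /* 1 */ { do_start, CMD_ARG_MANY, 8  },
    /* 2 */ { do_start, 1,            16 },
};
#define CMD_COUNT ((int)(sizeof(cmd_tab)/sizeof(cmd_tab[0])))

/* Validate and dispatch, mimicking the sys_perfmonctl() entry checks:
 * reject out-of-range or unused commands, and reject argument counts
 * that do not match the descriptor. Returns -1 on rejection. */
int dispatch(int cmd, void *arg, int count)
{
    cmd_desc_t *d;

    if (cmd < 0 || cmd >= CMD_COUNT) return -1;
    d = &cmd_tab[cmd];
    if (d->func == NULL) return -1;
    if ((d->narg == CMD_ARG_MANY && count == 0) ||
        (d->narg > 0 && d->narg != count)) return -1;
    return d->func(arg, count);
}
```

The same count predicate appears verbatim in the new sys_perfmonctl() below; keeping it in the descriptor means adding a command never touches the entry path.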
+
static int
-perfmon_bad_permissions(struct task_struct *task)
+check_task_state(struct task_struct *task)
{
- /* stolen from bad_signal() */
- return (current->session != task->session)
- && (current->euid ^ task->suid) && (current->euid ^ task->uid)
- && (current->uid ^ task->suid) && (current->uid ^ task->uid);
+ int ret = 0;
+#ifdef CONFIG_SMP
+ /* We must wait until the state has been completely
+ * saved. There can be situations where the reader arrives
+ * after the task is marked as STOPPED but before pfm_save_regs()
+ * is completed.
+ */
+ for (;;) {
+
+ task_lock(task);
+ if (!task_has_cpu(task)) break;
+ task_unlock(task);
+
+ do {
+ if (task->state != TASK_ZOMBIE && task->state != TASK_STOPPED) return -EBUSY;
+ barrier();
+ cpu_relax();
+ } while (task_has_cpu(task));
+ }
+ task_unlock(task);
+#else
+ if (task->state != TASK_ZOMBIE && task->state != TASK_STOPPED) {
+ DBprintk(("warning [%d] not in stable state %ld\n", task->pid, task->state));
+ ret = -EBUSY;
+ }
+#endif
+ return ret;
}
asmlinkage int
-sys_perfmonctl (int pid, int cmd, int flags, perfmon_req_t *req, int count, long arg6, long arg7, long arg8, long stack)
+sys_perfmonctl (pid_t pid, int cmd, void *arg, int count, long arg5, long arg6, long arg7,
+ long arg8, long stack)
{
- struct pt_regs *regs = (struct pt_regs *) &stack;
- struct task_struct *child = current;
- int ret = -ESRCH;
+ struct pt_regs *regs = (struct pt_regs *)&stack;
+ struct task_struct *task = current;
+ pfm_context_t *ctx = task->thread.pfm_context;
+ size_t sz;
+ int ret = -ESRCH, narg;
- /* sanity check:
- *
- * ensures that we don't do bad things in case the OS
- * does not have enough storage to save/restore PMC/PMD
+ /*
+ * reject any call if perfmon was disabled at initialization time
*/
- if (PERFMON_IS_DISABLED()) return -ENOSYS;
+ if (PFM_IS_DISABLED()) return -ENOSYS;
- /* XXX: pid interface is going away in favor of pfm context */
- if (pid != current->pid) {
- read_lock(&tasklist_lock);
+ DBprintk(("cmd=%d idx=%d valid=%d narg=0x%x\n", cmd, PFM_CMD_IDX(cmd),
+ PFM_CMD_IS_VALID(cmd), PFM_CMD_NARG(cmd)));
- child = find_task_by_pid(pid);
+ if (PFM_CMD_IS_VALID(cmd) == 0) return -EINVAL;
- if (!child) goto abort_call;
+ /* ignore arguments when command has none */
+ narg = PFM_CMD_NARG(cmd);
+ if ((narg == PFM_CMD_ARG_MANY && count == 0) || (narg > 0 && narg != count)) return -EINVAL;
- ret = -EPERM;
+ sz = PFM_CMD_ARG_SIZE(cmd);
- if (perfmon_bad_permissions(child)) goto abort_call;
+ if (PFM_CMD_READ_ARG(cmd) && !access_ok(VERIFY_READ, arg, sz*count)) return -EFAULT;
- /*
- * XXX: need to do more checking here
+ if (PFM_CMD_WRITE_ARG(cmd) && !access_ok(VERIFY_WRITE, arg, sz*count)) return -EFAULT;
+
+ if (PFM_CMD_USE_PID(cmd)) {
+ /*
+ * XXX: may need to fine tune this one
*/
- if (child->state != TASK_ZOMBIE && child->state != TASK_STOPPED) {
- DBprintk((" warning process %d not in stable state %ld\n", pid, child->state));
+ if (pid < 2) return -EPERM;
+
+ if (pid != current->pid) {
+
+ read_lock(&tasklist_lock);
+
+ task = find_task_by_pid(pid);
+
+ if (!task) goto abort_call;
+
+ ret = -EPERM;
+
+ if (pfm_bad_permissions(task)) goto abort_call;
+
+ if (PFM_CMD_CHK(cmd)) {
+ ret = check_task_state(task);
+ if (ret != 0) goto abort_call;
+ }
+ ctx = task->thread.pfm_context;
}
+ }
+
+ if (PFM_CMD_USE_CTX(cmd)) {
+ ret = -EINVAL;
+ if (ctx == NULL) {
+ DBprintk(("no context for task %d\n", task->pid));
+ goto abort_call;
+ }
+ ret = -EPERM;
+ /*
+ * we only grant access to the context if:
+ * - the caller is the creator of the context (ctx_owner)
+ * OR - the context is attached to the caller AND The context IS NOT
+ * in protected mode
+ */
+ if (ctx->ctx_owner != current && (ctx->ctx_fl_protected || task != current)) {
+ DBprintk(("context protected, no access for [%d]\n", task->pid));
+ goto abort_call;
+ }
}
- ret = do_perfmonctl(child, cmd, flags, req, count, regs);
+
+ ret = (*pfm_cmd_tab[PFM_CMD_IDX(cmd)].cmd_func)(task, ctx, arg, count, regs);
abort_call:
- if (child != current) read_unlock(&tasklist_lock);
+ if (task != current) read_unlock(&tasklist_lock);
return ret;
}
#if __GNUC__ >= 3
void asmlinkage
-pfm_block_on_overflow(void)
+pfm_ovfl_block_reset(u64 arg0, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5,
+ u64 arg6, u64 arg7, long info)
#else
void asmlinkage
-pfm_block_on_overflow(u64 arg0, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7)
+pfm_ovfl_block_reset(u64 arg0, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5,
+ u64 arg6, u64 arg7, long info)
#endif
{
struct thread_struct *th = &current->thread;
@@ -1323,32 +2406,22 @@
int ret;
/*
- * NO matter what notify_pid is,
- * we clear overflow, won't notify again
+ * clear the flag, to make sure we won't get here
+ * again
*/
- th->pfm_must_block = 0;
+ th->pfm_ovfl_block_reset = 0;
/*
* do some sanity checks first
*/
if (!ctx) {
- printk("perfmon: process %d has no PFM context\n", current->pid);
- return;
- }
- if (ctx->ctx_notify_task == 0) {
- printk("perfmon: process %d has no task to notify\n", current->pid);
+ printk("perfmon: [%d] has no PFM context\n", current->pid);
return;
}
- DBprintk((" current=%d task=%d\n", current->pid, ctx->ctx_notify_task->pid));
-
- /* should not happen */
- if (CTX_OVFL_NOBLOCK(ctx)) {
- printk("perfmon: process %d non-blocking ctx should not be here\n", current->pid);
- return;
- }
+ if (CTX_OVFL_NOBLOCK(ctx)) goto non_blocking;
- DBprintk((" CPU%d %d before sleep\n", smp_processor_id(), current->pid));
+ DBprintk(("[%d] before sleeping\n", current->pid));
/*
* may go through without blocking on SMP systems
@@ -1356,12 +2429,14 @@
*/
ret = down_interruptible(&ctx->ctx_restart_sem);
- DBprintk((" CPU%d %d after sleep ret=%d\n", smp_processor_id(), current->pid, ret));
+ DBprintk(("[%d] after sleeping ret=%d\n", current->pid, ret));
/*
* in case of interruption of down() we don't restart anything
*/
if (ret >= 0) {
+
+non_blocking:
/* we reactivate on context switch */
ctx->ctx_fl_frozen = 0;
/*
@@ -1369,19 +2444,19 @@
* use the local reference
*/
- pfm_reset_regs(ctx);
+ pfm_reset_regs(ctx, ctx->ctx_ovfl_regs, PFM_RELOAD_LONG_RESET);
+
+ ctx->ctx_ovfl_regs[0] = 0UL;
/*
* Unlock sampling buffer and reset index atomically
* XXX: not really needed when blocking
*/
if (CTX_HAS_SMPL(ctx)) {
- ctx->ctx_smpl_buf->psb_hdr->hdr_count = 0;
- ctx->ctx_smpl_buf->psb_index = 0;
+ ctx->ctx_psb->psb_hdr->hdr_count = 0;
+ ctx->ctx_psb->psb_index = 0;
}
- DBprintk((" CPU%d %d unfreeze PMU\n", smp_processor_id(), current->pid));
-
ia64_set_pmc(0, 0);
ia64_srlz_d();
@@ -1390,264 +2465,257 @@
}
/*
- * main overflow processing routine.
- * it can be called from the interrupt path or explicitely during the context switch code
+ * This function will record an entry in the sampling buffer if it is not full already.
* Return:
- * new value of pmc[0]. if 0x0 then unfreeze, else keep frozen
+ * 0 : buffer is not full (did not BECOME full: still space or was already full)
+ * 1 : buffer is full (recorded the last entry)
*/
-unsigned long
-update_counters (struct task_struct *task, u64 pmc0, struct pt_regs *regs)
+static int
+pfm_record_sample(struct task_struct *task, pfm_context_t *ctx, unsigned long ovfl_mask, struct pt_regs *regs)
{
- unsigned long mask, i, cnum;
- struct thread_struct *th;
- pfm_context_t *ctx;
- unsigned long bv = 0;
- int my_cpu = smp_processor_id();
- int ret = 1, buffer_is_full = 0;
- int ovfl_has_long_recovery, can_notify, need_reset_pmd16=0;
- struct siginfo si;
+ pfm_smpl_buffer_desc_t *psb = ctx->ctx_psb;
+ unsigned long *e, m, idx;
+ perfmon_smpl_entry_t *h;
+ int j;
+
+
+ pfm_stats.pfm_recorded_samples_count++;
+
+ idx = ia64_fetch_and_add(1, &psb->psb_index);
+ DBprintk(("recording index=%ld entries=%ld\n", idx-1, psb->psb_entries));
/*
- * It is never safe to access the task for which the overflow interrupt is destinated
- * using the current variable as the interrupt may occur in the middle of a context switch
- * where current does not hold the task that is running yet.
- *
- * For monitoring, however, we do need to get access to the task which caused the overflow
- * to account for overflow on the counters.
- *
- * We accomplish this by maintaining a current owner of the PMU per CPU. During context
- * switch the ownership is changed in a way such that the reflected owner is always the
- * valid one, i.e. the one that caused the interrupt.
+ * XXX: there is a small chance that we could run out of index space before resetting
+ * but index is unsigned long, so it will take some time.....
+ * We use > instead of == because fetch_and_add() is off by one (see below)
+ *
+ * This case can happen in non-blocking mode or with multiple processes.
+ * For non-blocking, we need to reload and continue.
*/
+ if (idx > psb->psb_entries) return 0;
- if (task == NULL) {
- DBprintk((" owners[%d]=NULL\n", my_cpu));
- return 0x1;
- }
- th = &task->thread;
- ctx = th->pfm_context;
+ /* first entry is really entry 0, not 1 caused by fetch_and_add */
+ idx--;
- /*
- * XXX: debug test
- * Don't think this could happen given upfront tests
- */
- if ((th->flags & IA64_THREAD_PM_VALID) == 0 && ctx->ctx_fl_system == 0) {
- printk("perfmon: Spurious overflow interrupt: process %d not using perfmon\n", task->pid);
- return 0x1;
- }
- if (!ctx) {
- printk("perfmon: Spurious overflow interrupt: process %d has no PFM context\n", task->pid);
- return 0;
- }
+ h = (perfmon_smpl_entry_t *)(((char *)psb->psb_addr) + idx*(psb->psb_entry_size));
/*
- * sanity test. Should never happen
+ * initialize entry header
*/
- if ((pmc0 & 0x1 )== 0) {
- printk("perfmon: pid %d pmc0=0x%lx assumption error for freeze bit\n", task->pid, pmc0);
- return 0x0;
- }
+ h->pid = task->pid;
+ h->cpu = smp_processor_id();
+ h->rate = 0; /* XXX: add the sampling rate used here */
+ h->ip = regs ? regs->cr_iip : 0x0; /* where did the fault happen */
+ h->regs = ovfl_mask; /* which registers overflowed */
- mask = pmc0 >> PMU_FIRST_COUNTER;
+ /* guaranteed to monotonically increase on each cpu */
+ h->stamp = pfm_get_stamp();
+ h->period = 0UL; /* not yet used */
- DBprintk(("pmc0=0x%lx pid=%d owner=%d iip=0x%lx, ctx is in %s mode used_pmds=0x%lx used_pmcs=0x%lx\n",
- pmc0, task->pid, PMU_OWNER()->pid, regs->cr_iip,
- CTX_OVFL_NOBLOCK(ctx) ? "NO-BLOCK" : "BLOCK",
- ctx->ctx_used_pmds[0],
- ctx->ctx_used_pmcs[0]));
+ /* position for first pmd */
+ e = (unsigned long *)(h+1);
/*
- * XXX: need to record sample only when an EAR/BTB has overflowed
+ * selectively store PMDs in increasing index number
*/
- if (CTX_HAS_SMPL(ctx)) {
- pfm_smpl_buffer_desc_t *psb = ctx->ctx_smpl_buf;
- unsigned long *e, m, idx=0;
- perfmon_smpl_entry_t *h;
- int j;
-
- idx = ia64_fetch_and_add(1, &psb->psb_index);
- DBprintk((" recording index=%ld entries=%ld\n", idx, psb->psb_entries));
-
- /*
- * XXX: there is a small chance that we could run out on index before resetting
- * but index is unsigned long, so it will take some time.....
- * We use > instead of == because fetch_and_add() is off by one (see below)
- *
- * This case can happen in non-blocking mode or with multiple processes.
- * For non-blocking, we need to reload and continue.
- */
- if (idx > psb->psb_entries) {
- buffer_is_full = 1;
- goto reload_pmds;
- }
+ m = ctx->ctx_smpl_regs[0];
+ for (j=0; m; m >>=1, j++) {
- /* first entry is really entry 0, not 1 caused by fetch_and_add */
- idx--;
+ if ((m & 0x1) == 0) continue;
- h = (perfmon_smpl_entry_t *)(((char *)psb->psb_addr) + idx*(psb->psb_entry_size));
-
- h->pid = task->pid;
- h->cpu = my_cpu;
- h->rate = 0;
- h->ip = regs ? regs->cr_iip : 0x0; /* where did the fault happened */
- h->regs = mask; /* which registers overflowed */
-
- /* guaranteed to monotonically increase on each cpu */
- h->stamp = perfmon_get_stamp();
-
- e = (unsigned long *)(h+1);
-
- /*
- * selectively store PMDs in increasing index number
- */
- for (j=0, m = ctx->ctx_smpl_regs; m; m >>=1, j++) {
- if (m & 0x1) {
- if (PMD_IS_COUNTER(j))
- *e = ctx->ctx_pmds[j-PMU_FIRST_COUNTER].val
- + (ia64_get_pmd(j) & pmu_conf.perf_ovfl_val);
- else {
- *e = ia64_get_pmd(j); /* slow */
- }
- DBprintk((" e=%p pmd%d =0x%lx\n", (void *)e, j, *e));
- e++;
- }
+ if (PMD_IS_COUNTING(j)) {
+ *e = pfm_read_soft_counter(ctx, j);
+ /* check if this pmd overflowed as well */
+ *e += ovfl_mask & (1UL << j) ? 1 + pmu_conf.perf_ovfl_val : 0;
+ } else {
+ *e = ia64_get_pmd(j); /* slow */
+ }
+ DBprintk(("e=%p pmd%d =0x%lx\n", (void *)e, j, *e));
+ e++;
+ }
+ /*
+ * make the new entry visible to user, needs to be atomic
+ */
+ ia64_fetch_and_add(1, &psb->psb_hdr->hdr_count);
- DBprintk((" index=%ld entries=%ld hdr_count=%ld\n", idx, psb->psb_entries, psb->psb_hdr->hdr_count));
- /*
- * sampling buffer full ?
+ DBprintk(("index=%ld entries=%ld hdr_count=%ld\n",
+ idx, psb->psb_entries, psb->psb_hdr->hdr_count));
+ /*
+ * sampling buffer full ?
+ */
+ if (idx == (psb->psb_entries-1)) {
+ DBprintk(("sampling buffer full\n"));
+ /*
+ * XXX: must reset buffer in blocking mode when the notify task is lost
*/
- if (idx == (psb->psb_entries-1)) {
- /*
- * will cause notification, cannot be 0
- */
- bv = mask << PMU_FIRST_COUNTER;
-
- buffer_is_full = 1;
-
- DBprintk((" sampling buffer full must notify bv=0x%lx\n", bv));
-
- /*
- * we do not reload here, when context is blocking
- */
- if (!CTX_OVFL_NOBLOCK(ctx)) goto no_reload;
-
- /*
- * here, we have a full buffer but we are in non-blocking mode
- * so we need to reload overflowed PMDs with sampling reset values
- * and restart right away.
- */
- }
- /* FALL THROUGH */
+ return 1;
}
-reload_pmds:
+ return 0;
+}
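pfm_record_sample() claims its slot lock-free: writers race on a fetch-and-add over an unsigned index, the returned value is off by one (the kernel's ia64_fetch_and_add folds the increment in), so a claimer sees 1 for slot 0 and anything greater than the entry count means the buffer is already full. A sketch of just that claiming scheme, using C11 atomics and illustrative names (`smpl_buf_t`, `claim_slot` are not the kernel's):

```c
#include <stdatomic.h>

typedef struct {
    atomic_ulong  index;    /* number of claims issued so far */
    unsigned long entries;  /* capacity of the buffer */
} smpl_buf_t;

/* Returns the claimed slot (0-based), or -1 if the buffer is full. */
long claim_slot(smpl_buf_t *b)
{
    /* atomic_fetch_add returns the pre-add value; +1 mirrors the
     * off-by-one of ia64_fetch_and_add, which returns the new value */
    unsigned long idx = atomic_fetch_add(&b->index, 1) + 1;

    if (idx > b->entries) return -1;  /* ran past the end: full */
    return (long)(idx - 1);           /* first entry is really entry 0 */
}
```

As the XXX comment notes, the index only grows until an explicit reset, so a reader must compare against `entries` rather than trust the raw counter.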
+/*
+ * main overflow processing routine.
+ * it can be called from the interrupt path or explicitely during the context switch code
+ * Return:
+ * new value of pmc[0]. if 0x0 then unfreeze, else keep frozen
+ */
+static unsigned long
+pfm_overflow_handler(struct task_struct *task, pfm_context_t *ctx, u64 pmc0, struct pt_regs *regs)
+{
+ unsigned long mask;
+ struct thread_struct *t;
+ unsigned long old_val;
+ unsigned long ovfl_notify = 0UL, ovfl_pmds = 0UL;
+ int i;
+ int ret = 1;
+ struct siginfo si;
/*
- * in the case of a non-blocking context, we reload
- * with the ovfl_rval when no user notification is taking place (short recovery)
- * otherwise when the buffer is full which requires user interaction) then we use
- * smpl_rval which is the long_recovery path (disturbance introduce by user execution).
+ * It is never safe to access the task for which the overflow interrupt is destinated
+ * using the current variable as the interrupt may occur in the middle of a context switch
+ * where current does not hold the task that is running yet.
+ *
+ * For monitoring, however, we do need to get access to the task which caused the overflow
+ * to account for overflow on the counters.
*
- * XXX: implies that when buffer is full then there is always notification.
+ * We accomplish this by maintaining a current owner of the PMU per CPU. During context
+ * switch the ownership is changed in a way such that the reflected owner is always the
+ * valid one, i.e. the one that caused the interrupt.
*/
- ovfl_has_long_recovery = CTX_OVFL_NOBLOCK(ctx) && buffer_is_full;
+
+ t = &task->thread;
/*
- * XXX: CTX_HAS_SMPL() should really be something like CTX_HAS_SMPL() and is activated,i.e.,
- * one of the PMC is configured for EAR/BTB.
- *
- * When sampling, we can only notify when the sampling buffer is full.
+ * XXX: debug test
+ * Don't think this could happen given upfront tests
*/
- can_notify = CTX_HAS_SMPL(ctx) == 0 && ctx->ctx_notify_task;
+ if ((t->flags & IA64_THREAD_PM_VALID) == 0 && ctx->ctx_fl_system == 0) {
+ printk("perfmon: Spurious overflow interrupt: process %d not using perfmon\n",
+ task->pid);
+ return 0x1;
+ }
+ /*
+ * sanity test. Should never happen
+ */
+ if ((pmc0 & 0x1) == 0) {
+ printk("perfmon: pid %d pmc0=0x%lx assumption error for freeze bit\n",
+ task->pid, pmc0);
+ return 0x0;
+ }
- DBprintk((" ovfl_has_long_recovery=%d can_notify=%d\n", ovfl_has_long_recovery, can_notify));
+ mask = pmc0 >> PMU_FIRST_COUNTER;
- for (i = 0, cnum = PMU_FIRST_COUNTER; mask ; cnum++, i++, mask >>= 1) {
+ DBprintk(("pmc0=0x%lx pid=%d iip=0x%lx, %s"
+ " mode used_pmds=0x%lx used_pmcs=0x%lx reload_pmcs=0x%lx\n",
+ pmc0, task->pid, (regs ? regs->cr_iip : 0),
+ CTX_OVFL_NOBLOCK(ctx) ? "nonblocking" : "blocking",
+ ctx->ctx_used_pmds[0],
+ ctx->ctx_used_pmcs[0],
+ ctx->ctx_reload_pmcs[0]));
+ /*
+ * First we update the virtual counters
+ */
+ for (i = PMU_FIRST_COUNTER; mask ; i++, mask >>= 1) {
+
+ /* skip pmd which did not overflow */
if ((mask & 0x1) == 0) continue;
- DBprintk((" PMD[%ld] overflowed pmd=0x%lx pmod.val=0x%lx\n", cnum, ia64_get_pmd(cnum), ctx->ctx_pmds[i].val));
+ DBprintk(("pmd[%d] overflowed hw_pmd=0x%lx soft_pmd=0x%lx\n",
+ i, ia64_get_pmd(i), ctx->ctx_soft_pmds[i].val));
/*
* Because we sometimes (EARS/BTB) reset to a specific value, we cannot simply use
- * val to count the number of times we overflowed. Otherwise we would loose the current value
- * in the PMD (which can be >0). So to make sure we don't loose
+ * val to count the number of times we overflowed. Otherwise we would lose the
+ * current value in the PMD (which can be >0). So to make sure we don't lose
* the residual counts we set val to contain full 64bits value of the counter.
- *
- * XXX: is this needed for EARS/BTB ?
*/
- ctx->ctx_pmds[i].val += 1 + pmu_conf.perf_ovfl_val
- + (ia64_get_pmd(cnum) & pmu_conf.perf_ovfl_val); /* slow */
+ old_val = ctx->ctx_soft_pmds[i].val;
+ ctx->ctx_soft_pmds[i].val = 1 + pmu_conf.perf_ovfl_val + pfm_read_soft_counter(ctx, i);
- DBprintk((" pmod[%ld].val=0x%lx pmd=0x%lx\n", i, ctx->ctx_pmds[i].val, ia64_get_pmd(cnum)&pmu_conf.perf_ovfl_val));
+ DBprintk(("soft_pmd[%d].val=0x%lx old_val=0x%lx pmd=0x%lx\n",
+ i, ctx->ctx_soft_pmds[i].val, old_val,
+ ia64_get_pmd(i) & pmu_conf.perf_ovfl_val));
- if (can_notify && PMD_OVFL_NOTIFY(ctx, i)) {
- DBprintk((" CPU%d should notify task %p with signal %d\n", my_cpu, ctx->ctx_notify_task, ctx->ctx_notify_sig));
- bv |= 1 << i;
- } else {
- DBprintk((" CPU%d PMD[%ld] overflow, no notification\n", my_cpu, cnum));
- /*
- * In case no notification is requested, we reload the reset value right away
- * otherwise we wait until the notify_pid process has been called and has
- * has finished processing data. Check out pfm_overflow_notify()
- */
+ /*
+ * now that we have extracted the hardware counter, we can clear it to ensure
+ * that a subsequent PFM_READ_PMDS will not include it again.
+ */
+ ia64_set_pmd(i, 0UL);
+
+ /*
+ * check for overflow condition
+ */
+ if (old_val > ctx->ctx_soft_pmds[i].val) {
+
+ ovfl_pmds |= 1UL << i;
- /* writes to upper part are ignored, so this is safe */
- if (ovfl_has_long_recovery) {
- DBprintk((" CPU%d PMD[%ld] reload with smpl_val=%lx\n", my_cpu, cnum,ctx->ctx_pmds[i].smpl_rval));
- ia64_set_pmd(cnum, ctx->ctx_pmds[i].smpl_rval);
- } else {
- DBprintk((" CPU%d PMD[%ld] reload with ovfl_val=%lx\n", my_cpu, cnum,ctx->ctx_pmds[i].smpl_rval));
- ia64_set_pmd(cnum, ctx->ctx_pmds[i].ovfl_rval);
+ DBprintk(("soft_pmd[%d] overflowed flags=0x%x, ovfl=0x%lx\n", i, ctx->ctx_soft_pmds[i].flags, ovfl_pmds));
+
+ if (PMC_OVFL_NOTIFY(ctx, i)) {
+ ovfl_notify |= 1UL << i;
}
}
- if (cnum == ctx->ctx_btb_counter) need_reset_pmd16=1;
}
+
/*
- * In case of BTB overflow we need to reset the BTB index.
+ * check for sampling buffer
+ *
+ * if present, record sample. We propagate notification ONLY when buffer
+ * becomes full.
*/
- if (need_reset_pmd16) {
- DBprintk(("reset PMD16\n"));
- ia64_set_pmd(16, 0);
+ if(CTX_HAS_SMPL(ctx)) {
+ ret = pfm_record_sample(task, ctx, ovfl_pmds, regs);
+ if (ret == 1) {
+ /*
+ * Sampling buffer became full
+ * If no notification was requested, then we reset buffer index
+ * and reset registers (done below) and resume.
+ * If notification requested, then defer reset until pfm_restart()
+ */
+ if (ovfl_notify == 0UL) {
+ ctx->ctx_psb->psb_hdr->hdr_count = 0UL;
+ ctx->ctx_psb->psb_index = 0UL;
+ }
+ } else {
+ /*
+ * sample recorded in buffer, no need to notify user
+ */
+ ovfl_notify = 0UL;
+ }
}
-no_reload:
-
/*
- * some counters overflowed, but they did not require
- * user notification, so after having reloaded them above
- * we simply restart
+ * No overflow requiring a user level notification
*/
- if (!bv) return 0x0;
+ if (ovfl_notify == 0UL) {
+ pfm_reset_regs(ctx, &ovfl_pmds, PFM_RELOAD_SHORT_RESET);
+ return 0x0;
+ }
- ctx->ctx_ovfl_regs = bv; /* keep track of what to reset when unblocking */
- /*
- * Now we know that:
- * - we have some counters which overflowed (contains in bv)
- * - someone has asked to be notified on overflow.
+ /*
+ * keep track of what to reset when unblocking
*/
+ ctx->ctx_ovfl_regs[0] = ovfl_pmds;
-
/*
- * If the notification task is still present, then notify_task is non
- * null. It is clean by that task if it ever exits before we do.
+ * we have come to this point because there was an overflow and that notification
+ * was requested. The notify_task may have disappeared, in which case notify_task
+ * is NULL.
*/
-
if (ctx->ctx_notify_task) {
si.si_errno = 0;
si.si_addr = NULL;
si.si_pid = task->pid; /* who is sending */
- si.si_signo = ctx->ctx_notify_sig; /* is SIGPROF */
- si.si_code = PROF_OVFL; /* goes to user */
- si.si_pfm_ovfl = bv;
-
-
+ si.si_signo = SIGPROF;
+ si.si_code = PROF_OVFL; /* indicates a perfmon SIGPROF signal */
+ /*
+ * Shift the bitvector such that the user sees bit 4 for PMD4 and so on.
+ * We only use smpl_ovfl[0] for now. It should be fine for quite a while
+ * until we have more than 61 PMD available.
+ */
+ si.si_pfm_ovfl[0] = ovfl_notify;
/*
* when the target of the signal is not ourself, we have to be more
@@ -1659,15 +2727,29 @@
if (ctx->ctx_notify_task != current) {
/*
* grab the notification lock for this task
+ * This guarantees that the sequence: test + send_signal
+ * is atomic with regards to the ctx_notify_task field.
+ *
+ * We need a spinlock and not just an atomic variable for this.
+ *
*/
- spin_lock(&ctx->ctx_notify_lock);
+ spin_lock(&ctx->ctx_lock);
/*
* now notify_task cannot be modified until we're done
* if NULL, then it got modified while we were in the handler
*/
if (ctx->ctx_notify_task == NULL) {
- spin_unlock(&ctx->ctx_notify_lock);
+
+ spin_unlock(&ctx->ctx_lock);
+
+ /*
+ * If we've lost the notified task, then we will run
+ * to completion but keep the PMU frozen. Results
+ * will be incorrect anyway. We do not kill the task,
+ * so that it remains possible to attach a perfmon
+ * context to an already running task.
+ */
goto lost_notify;
}
/*
@@ -1681,20 +2763,23 @@
* necessarily go to the signal handler (if any) when it goes back to
* user mode.
*/
- DBprintk((" %d sending %d notification to %d\n", task->pid, si.si_signo, ctx->ctx_notify_task->pid));
+ DBprintk(("[%d] sending notification to [%d]\n",
+ task->pid, ctx->ctx_notify_task->pid));
/*
* this call is safe in an interrupt handler, so does read_lock() on tasklist_lock
*/
- ret = send_sig_info(ctx->ctx_notify_sig, &si, ctx->ctx_notify_task);
- if (ret != 0) printk(" send_sig_info(process %d, SIGPROF)=%d\n", ctx->ctx_notify_task->pid, ret);
+ ret = send_sig_info(SIGPROF, &si, ctx->ctx_notify_task);
+ if (ret != 0)
+ printk("send_sig_info(process %d, SIGPROF)=%d\n",
+ ctx->ctx_notify_task->pid, ret);
/*
* now undo the protections in order
*/
if (ctx->ctx_notify_task != current) {
read_unlock(&tasklist_lock);
- spin_unlock(&ctx->ctx_notify_lock);
+ spin_unlock(&ctx->ctx_lock);
}
/*
@@ -1711,35 +2796,41 @@
* before, changing it to NULL will still maintain this invariant.
* Of course, when it is equal to current it cannot change at this point.
*/
- if (!CTX_OVFL_NOBLOCK(ctx) && ctx->ctx_notify_task != current) {
- th->pfm_must_block = 1; /* will cause blocking */
+ DBprintk(("block=%d notify [%d] current [%d]\n",
+ ctx->ctx_fl_block,
+ ctx->ctx_notify_task ? ctx->ctx_notify_task->pid: -1,
+ current->pid ));
+
+ if (!CTX_OVFL_NOBLOCK(ctx) && ctx->ctx_notify_task != task) {
+ t->pfm_ovfl_block_reset = 1; /* will cause blocking */
}
} else {
-lost_notify:
- DBprintk((" notification task has disappeared !\n"));
+lost_notify: /* XXX: more to do here, to convert to non-blocking (reset values) */
+
+ DBprintk(("notification task has disappeared !\n"));
/*
- * for a non-blocking context, we make sure we do not fall into the pfm_overflow_notify()
- * trap. Also in the case of a blocking context with lost notify process, then we do not
- * want to block either (even though it is interruptible). In this case, the PMU will be kept
- * frozen and the process will run to completion without monitoring enabled.
+ * for a non-blocking context, we make sure we do not fall into the
+ * pfm_overflow_notify() trap. Also in the case of a blocking context with lost
+ * notify process, then we do not want to block either (even though it is
+ * interruptible). In this case, the PMU will be kept frozen and the process will
+ * run to completion without monitoring enabled.
*
* Of course, we cannot loose notify process when self-monitoring.
*/
- th->pfm_must_block = 0;
+ t->pfm_ovfl_block_reset = 0;
}
/*
- * if we block, we keep the PMU frozen. If non-blocking we restart.
- * in the case of non-blocking were the notify process is lost, we also
- * restart.
+ * If notification was successful, then we rely on the pfm_restart()
+ * call to unfreeze and reset (in both blocking or non-blocking mode).
+ *
+ * If notification failed, then we will keep the PMU frozen and run
+ * the task to completion
*/
- if (!CTX_OVFL_NOBLOCK(ctx))
- ctx->ctx_fl_frozen = 1;
- else
- ctx->ctx_fl_frozen = 0;
+ ctx->ctx_fl_frozen = 1;
- DBprintk((" reload pmc0=0x%x must_block=%ld\n",
- ctx->ctx_fl_frozen ? 0x1 : 0x0, th->pfm_must_block));
+ DBprintk(("return pmc0=0x%x must_block=%ld\n",
+ ctx->ctx_fl_frozen ? 0x1 : 0x0, t->pfm_ovfl_block_reset));
return ctx->ctx_fl_frozen ? 0x1 : 0x0;
}
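The overflow handler above widens each narrow hardware counter into a 64-bit software value: on every overflow interrupt it folds (1 + perf_ovfl_val), i.e. one full counter period, plus the wrapped hardware residue into ctx_soft_pmds[i].val and clears the hardware PMD, while reads merge the software part with the live hardware bits. A minimal sketch of that virtualization, assuming a 16-bit counter for illustration (the width and the names `soft_pmd_t`, `handle_overflow` are mine, not the kernel's):

```c
#include <stdint.h>

#define OVFL_VAL 0xffffUL  /* mask of implemented counter bits */

typedef struct {
    uint64_t soft;  /* accumulated overflows, upper bits */
    uint64_t hw;    /* simulated hardware counter, low bits only */
} soft_pmd_t;

/* what a PFM_READ_PMDS-style read returns: the full 64-bit value */
uint64_t read_soft_counter(const soft_pmd_t *p)
{
    return p->soft + (p->hw & OVFL_VAL);
}

/* what the overflow handler does: fold one full period plus the
 * wrapped residue into the software part, then clear the hardware
 * counter so a later read does not count the residue twice */
void handle_overflow(soft_pmd_t *p)
{
    p->soft = 1 + OVFL_VAL + read_soft_counter(p);
    p->hw = 0;
}
```

Clearing the PMD after folding is what makes the old_val > new_val comparison in the handler a reliable overflow test: the software value only moves forward by whole periods.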
@@ -1748,29 +2839,71 @@
perfmon_interrupt (int irq, void *arg, struct pt_regs *regs)
{
u64 pmc0;
- struct task_struct *ta;
+ struct task_struct *task;
+ pfm_context_t *ctx;
- pmc0 = ia64_get_pmc(0); /* slow */
+ pfm_stats.pfm_ovfl_intr_count++;
+
+ /*
+ * srlz.d done before arriving here
+ *
+ * This is slow
+ */
+ pmc0 = ia64_get_pmc(0);
/*
* if we have some pending bits set
* assumes : if any PM[0].bit[63-1] is set, then PMC[0].fr = 1
*/
- if ((pmc0 & ~0x1) && (ta=PMU_OWNER())) {
+ if ((pmc0 & ~0x1UL)!=0UL && (task=PMU_OWNER())!= NULL) {
+ /*
+ * we assume that pmc0.fr is always set here
+ */
+ ctx = task->thread.pfm_context;
- /* assumes, PMC[0].fr = 1 at this point */
- pmc0 = update_counters(ta, pmc0, regs);
+ /* sanity check */
+ if (!ctx) {
+ printk("perfmon: Spurious overflow interrupt: process %d has no PFM context\n",
+ task->pid);
+ return;
+ }
+#ifdef CONFIG_SMP
+ /*
+ * Because an IPI has higher priority than the PMU overflow interrupt, it is
+ * possible that the handler be interrupted by a request from another CPU to fetch
+ * the PMU state of the currently active context. The task may have just been
+ * migrated to another CPU which is trying to restore the context. If there was
+ * a pending overflow interrupt when the task left this CPU, it is possible for
+ * the handler to be interrupted by the IPI, in which case the fetch request
+ * MUST be postponed until the interrupt handler is done. The ctx_is_busy
+ * flag indicates such a condition. The other CPU must busy wait until it's cleared.
+ */
+ atomic_set(&ctx->ctx_is_busy, 1);
+#endif
+
+ /*
+ * assume PMC[0].fr = 1 at this point
+ */
+ pmc0 = pfm_overflow_handler(task, ctx, pmc0, regs);
/*
- * if pmu_frozen = 0
- * pmc0 = 0 and we resume monitoring right away
- * else
- * pmc0 = 0x1 frozen but all pending bits are cleared
+ * We always clear the overflow status bits and either unfreeze
+ * or keep the PMU frozen.
*/
ia64_set_pmc(0, pmc0);
ia64_srlz_d();
+
+#ifdef CONFIG_SMP
+ /*
+ * announce that we are doing with the context
+ */
+ atomic_set(&ctx->ctx_is_busy, 0);
+#endif
} else {
- printk("perfmon: Spurious PMU overflow interrupt: pmc0=0x%lx owner=%p\n", pmc0, (void *)PMU_OWNER());
+ pfm_stats.pfm_spurious_ovfl_intr_count++;
+
+ DBprintk(("perfmon: Spurious PMU overflow interrupt on CPU%d: pmc0=0x%lx owner=%p\n",
+ smp_processor_id(), pmc0, (void *)PMU_OWNER()));
}
}
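
The ctx_is_busy handshake described in the SMP comment above can be sketched in isolation. This is a minimal user-space model using C11 atomics, not the kernel code: the overflow handler marks the context busy while it processes an overflow, and a remote fetch request must observe busy==0 before touching the context. All names here are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical mirror of the ctx_is_busy protocol. */
typedef struct { atomic_int is_busy; } demo_ctx_t;

/* overflow handler sets the flag on entry, clears it on exit */
static void handler_enter(demo_ctx_t *ctx) { atomic_store(&ctx->is_busy, 1); }
static void handler_exit(demo_ctx_t *ctx)  { atomic_store(&ctx->is_busy, 0); }

/* remote fetch: 0 = may proceed, 2 = busy, caller must retry */
static int try_fetch(demo_ctx_t *ctx)
{
	return atomic_load(&ctx->is_busy) ? 2 : 0;
}
```

The retval of 2 corresponds to the "must retry whole request" case handled later in pfm_fetch_regs().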
@@ -1778,26 +2911,74 @@
static int
perfmon_proc_info(char *page)
{
+#ifdef CONFIG_SMP
+#define cpu_is_online(i) (cpu_online_map & (1UL << i))
+#else
+#define cpu_is_online(i) 1
+#endif
char *p = page;
- u64 pmc0 = ia64_get_pmc(0);
int i;
- p += sprintf(p, "CPU%d.pmc[0]=%lx\nPerfmon debug: %s\n", smp_processor_id(), pmc0, pfm_debug ? "On" : "Off");
- p += sprintf(p, "proc_sessions=%lu sys_sessions=%lu\n",
- pfs_info.pfs_proc_sessions,
- pfs_info.pfs_sys_session);
+ p += sprintf(p, "enabled : %s\n", pmu_conf.pfm_is_disabled ? "No": "Yes");
+ p += sprintf(p, "debug : %s\n", pfm_debug_mode > 0 || pfm_sysctl.debug > 0 ? "Yes": "No");
+ p += sprintf(p, "fastctxsw : %s\n", pfm_sysctl.fastctxsw > 0 ? "Yes": "No");
+ p += sprintf(p, "ovfl_mask : 0x%lx\n", pmu_conf.perf_ovfl_val);
+ p += sprintf(p, "overflow intrs : %lu\n", pfm_stats.pfm_ovfl_intr_count);
+ p += sprintf(p, "spurious intrs : %lu\n", pfm_stats.pfm_spurious_ovfl_intr_count);
+ p += sprintf(p, "recorded samples : %lu\n", pfm_stats.pfm_recorded_samples_count);
+ p += sprintf(p, "restored dbrs : %lu\n", pfm_stats.pfm_restore_dbrs);
+ p += sprintf(p, "ctxsw reload pmds: %lu\n", pfm_stats.pfm_ctxsw_reload_pmds);
+ p += sprintf(p, "ctxsw used pmds : %lu\n", pfm_stats.pfm_ctxsw_used_pmds);
+
+#ifdef CONFIG_SMP
+ p += sprintf(p, "CPU%d syst_wide : %d\n"
+ "CPU%d dcr_pp : %d\n",
+ smp_processor_id(),
+ local_cpu_data->pfm_syst_wide,
+ smp_processor_id(),
+ local_cpu_data->pfm_dcr_pp);
+#endif
+
+ LOCK_PFS();
+ p += sprintf(p, "proc_sessions : %lu\n"
+ "sys_sessions : %lu\n"
+ "sys_use_dbregs : %lu\n"
+ "ptrace_use_dbregs: %lu\n",
+ pfm_sessions.pfs_task_sessions,
+ pfm_sessions.pfs_sys_sessions,
+ pfm_sessions.pfs_sys_use_dbregs,
+ pfm_sessions.pfs_ptrace_use_dbregs);
+
+ UNLOCK_PFS();
for(i=0; i < NR_CPUS; i++) {
if (cpu_is_online(i)) {
- p += sprintf(p, "CPU%d.pmu_owner: %-6d\n",
+ p += sprintf(p, "CPU%d owner : %-6d\n",
i,
pmu_owners[i].owner ? pmu_owners[i].owner->pid: -1);
}
}
+
+ for(i=0; pmd_desc[i].type != PFM_REG_NONE; i++) {
+ p += sprintf(p, "PMD%-2d: %d 0x%lx 0x%lx\n",
+ i,
+ pmd_desc[i].type,
+ pmd_desc[i].dep_pmd[0],
+ pmd_desc[i].dep_pmc[0]);
+ }
+
+ for(i=0; pmc_desc[i].type != PFM_REG_NONE; i++) {
+ p += sprintf(p, "PMC%-2d: %d 0x%lx 0x%lx\n",
+ i,
+ pmc_desc[i].type,
+ pmc_desc[i].dep_pmd[0],
+ pmc_desc[i].dep_pmc[0]);
+ }
+
return p - page;
}
-/* for debug only */
+/* /proc interface, for debug only */
static int
perfmon_read_entry(char *page, char **start, off_t off, int count, int *eof, void *data)
{
@@ -1814,153 +2995,90 @@
return len;
}
-static struct irqaction perfmon_irqaction = {
- handler: perfmon_interrupt,
- flags: SA_INTERRUPT,
- name: "perfmon"
-};
-
-void __init
-perfmon_init (void)
+#ifdef CONFIG_SMP
+void
+pfm_syst_wide_update_task(struct task_struct *task, int mode)
{
- pal_perf_mon_info_u_t pm_info;
- s64 status;
+ struct pt_regs *regs = (struct pt_regs *)((unsigned long) task + IA64_STK_OFFSET);
- register_percpu_irq(IA64_PERFMON_VECTOR, &perfmon_irqaction);
+ regs--;
- ia64_set_pmv(IA64_PERFMON_VECTOR);
- ia64_srlz_d();
-
- pmu_conf.pfm_is_disabled = 1;
+ /*
+ * propagate the value of the dcr_pp bit to the psr
+ */
+ ia64_psr(regs)->pp = mode ? local_cpu_data->pfm_dcr_pp : 0;
+}
+#endif
- printk("perfmon: version %s (sampling format v%d)\n", PFM_VERSION, PFM_SMPL_HDR_VERSION);
- printk("perfmon: Interrupt vectored to %u\n", IA64_PERFMON_VECTOR);
- if ((status=ia64_pal_perf_mon_info(pmu_conf.impl_regs, &pm_info)) != 0) {
- printk("perfmon: PAL call failed (%ld)\n", status);
- return;
- }
- pmu_conf.perf_ovfl_val = (1L << pm_info.pal_perf_mon_info_s.width) - 1;
- pmu_conf.max_counters = pm_info.pal_perf_mon_info_s.generic;
- pmu_conf.num_pmcs = find_num_pm_regs(pmu_conf.impl_regs);
- pmu_conf.num_pmds = find_num_pm_regs(&pmu_conf.impl_regs[4]);
+void
+pfm_save_regs (struct task_struct *task)
+{
+ pfm_context_t *ctx;
+ u64 psr;
- printk("perfmon: %d bits counters (max value 0x%lx)\n", pm_info.pal_perf_mon_info_s.width, pmu_conf.perf_ovfl_val);
- printk("perfmon: %ld PMC/PMD pairs, %ld PMCs, %ld PMDs\n", pmu_conf.max_counters, pmu_conf.num_pmcs, pmu_conf.num_pmds);
+ ctx = task->thread.pfm_context;
- /* sanity check */
- if (pmu_conf.num_pmds >= IA64_NUM_PMD_REGS || pmu_conf.num_pmcs >= IA64_NUM_PMC_REGS) {
- printk(KERN_ERR "perfmon: ERROR not enough PMC/PMD storage in kernel, perfmon is DISABLED\n");
- return; /* no need to continue anyway */
- }
- /* we are all set */
- pmu_conf.pfm_is_disabled = 0;
/*
- * Insert the tasklet in the list.
- * It is still disabled at this point, so it won't run
- printk(__FUNCTION__" tasklet is %p state=%d, count=%d\n", &perfmon_tasklet, perfmon_tasklet.state, perfmon_tasklet.count);
+ * save current PSR: needed because we modify it
*/
+ __asm__ __volatile__ ("mov %0=psr;;": "=r"(psr) :: "memory");
/*
- * for now here for debug purposes
+ * stop monitoring:
+ * This is the last instruction which can generate an overflow
+ *
+	 * We do not need to set psr.sp because it is irrelevant in the kernel.
+ * It will be restored from ipsr when going back to user level
*/
- perfmon_dir = create_proc_read_entry ("perfmon", 0, 0, perfmon_read_entry, NULL);
-}
+ __asm__ __volatile__ ("rum psr.up;;"::: "memory");
+
+ ctx->ctx_saved_psr = psr;
+
+ //ctx->ctx_last_cpu = smp_processor_id();
-void
-perfmon_init_percpu (void)
-{
- ia64_set_pmv(IA64_PERFMON_VECTOR);
- ia64_srlz_d();
}
-void
-pfm_save_regs (struct task_struct *ta)
+static void
+pfm_lazy_save_regs (struct task_struct *task)
{
- struct task_struct *owner;
pfm_context_t *ctx;
struct thread_struct *t;
- u64 pmc0, psr;
unsigned long mask;
int i;
- t = &ta->thread;
- ctx = ta->thread.pfm_context;
+ DBprintk(("on [%d] by [%d]\n", task->pid, current->pid));
- /*
- * We must make sure that we don't loose any potential overflow
- * interrupt while saving PMU context. In this code, external
- * interrupts are always enabled.
- */
+ t = &task->thread;
+ ctx = task->thread.pfm_context;
- /*
- * save current PSR: needed because we modify it
+#ifdef CONFIG_SMP
+ /*
+ * announce we are saving this PMU state
+	 * This will cause other CPUs to wait until we're done
+	 * before using the context.
+ *
+ * must be an atomic operation
*/
- __asm__ __volatile__ ("mov %0=psr;;": "=r"(psr) :: "memory");
+ atomic_set(&ctx->ctx_saving_in_progress, 1);
- /*
- * stop monitoring:
- * This is the only way to stop monitoring without destroying overflow
- * information in PMC[0].
- * This is the last instruction which can cause overflow when monitoring
- * in kernel.
- * By now, we could still have an overflow interrupt in-flight.
- */
- __asm__ __volatile__ ("rsm psr.up|psr.pp;;"::: "memory");
+ /*
+ * if owner is NULL, it means that the other CPU won the race
+	 * and the IPI has caused the context to be saved in pfm_handle_fetch_regs()
+ * instead of here. We have nothing to do
+ *
+ * note that this is safe, because the other CPU NEVER modifies saving_in_progress.
+ */
+ if (PMU_OWNER() == NULL) goto do_nothing;
+#endif
/*
- * Mark the PMU as not owned
- * This will cause the interrupt handler to do nothing in case an overflow
- * interrupt was in-flight
- * This also guarantees that pmc0 will contain the final state
- * It virtually gives us full control over overflow processing from that point
- * on.
- * It must be an atomic operation.
+	 * give up ownership of the PMU
*/
- owner = PMU_OWNER();
SET_PMU_OWNER(NULL);
- /*
- * read current overflow status:
- *
- * we are guaranteed to read the final stable state
- */
ia64_srlz_d();
- pmc0 = ia64_get_pmc(0); /* slow */
-
- /*
- * freeze PMU:
- *
- * This destroys the overflow information. This is required to make sure
- * next process does not start with monitoring on if not requested
- */
- ia64_set_pmc(0, 1);
-
- /*
- * Check for overflow bits and proceed manually if needed
- *
- * It is safe to call the interrupt handler now because it does
- * not try to block the task right away. Instead it will set a
- * flag and let the task proceed. The blocking will only occur
- * next time the task exits from the kernel.
- */
- if (pmc0 & ~0x1) {
- update_counters(owner, pmc0, NULL);
- /* we will save the updated version of pmc0 */
- }
- /*
- * restore PSR for context switch to save
- */
- __asm__ __volatile__ ("mov psr.l=%0;; srlz.i;;"::"r"(psr): "memory");
-
- /*
- * we do not save registers if we can do lazy
- */
- if (PFM_CAN_DO_LAZY()) {
- SET_PMU_OWNER(owner);
- return;
- }
/*
* XXX needs further optimization.
@@ -1971,116 +3089,405 @@
if (mask & 0x1) t->pmd[i] =ia64_get_pmd(i);
}
- /* skip PMC[0], we handle it separately */
- mask = ctx->ctx_used_pmcs[0]>>1;
- for (i=1; mask; i++, mask>>=1) {
- if (mask & 0x1) t->pmc[i] = ia64_get_pmc(i);
- }
+ /* save pmc0 */
+ t->pmc[0] = ia64_get_pmc(0);
+
+ /* not owned by this CPU */
+ atomic_set(&ctx->ctx_last_cpu, -1);
+
+#ifdef CONFIG_SMP
+do_nothing:
+#endif
/*
- * Throughout this code we could have gotten an overflow interrupt. It is transformed
- * into a spurious interrupt as soon as we give up pmu ownership.
+ * declare we are done saving this context
+ *
+ * must be an atomic operation
*/
+ atomic_set(&ctx->ctx_saving_in_progress,0);
+
}
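
The save paths above all walk a 64-bit "used registers" bitmask with the idiom `for (i=0; mask; i++, mask>>=1)`, visiting only the set bits and terminating as soon as the remaining mask is zero. A standalone sketch of that pattern (function name is illustrative, not from the kernel):

```c
#include <assert.h>

/* Record the indices of the set bits in `mask` into `out` (up to `max`
 * entries) and return the total count, mirroring the save-loop idiom. */
static int visit_set_bits(unsigned long mask, int *out, int max)
{
	int i, n = 0;

	/* loop stops early once no bits remain, just like the PMD save loop */
	for (i = 0; mask; i++, mask >>= 1) {
		if (mask & 0x1UL) {
			if (n < max)
				out[n] = i;
			n++;
		}
	}
	return n;
}
```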
-static void
-pfm_lazy_save_regs (struct task_struct *ta)
+#ifdef CONFIG_SMP
+/*
+ * Handles request coming from other CPUs
+ */
+static void
+pfm_handle_fetch_regs(void *info)
{
- pfm_context_t *ctx;
+ pfm_smp_ipi_arg_t *arg = info;
struct thread_struct *t;
+ pfm_context_t *ctx;
unsigned long mask;
int i;
- DBprintk((" on [%d] by [%d]\n", ta->pid, current->pid));
+ ctx = arg->task->thread.pfm_context;
+ t = &arg->task->thread;
+
+ DBprintk(("task=%d owner=%d saving=%d\n",
+ arg->task->pid,
+ PMU_OWNER() ? PMU_OWNER()->pid: -1,
+ atomic_read(&ctx->ctx_saving_in_progress)));
+
+ /* must wait until not busy before retrying whole request */
+ if (atomic_read(&ctx->ctx_is_busy)) {
+ arg->retval = 2;
+ return;
+ }
+
+ /* must wait if saving was interrupted */
+ if (atomic_read(&ctx->ctx_saving_in_progress)) {
+ arg->retval = 1;
+ return;
+ }
+
+ /* can proceed, done with context */
+ if (PMU_OWNER() != arg->task) {
+ arg->retval = 0;
+ return;
+ }
+
+ DBprintk(("saving state for [%d] used_pmcs=0x%lx reload_pmcs=0x%lx used_pmds=0x%lx\n",
+ arg->task->pid,
+ ctx->ctx_used_pmcs[0],
+ ctx->ctx_reload_pmcs[0],
+ ctx->ctx_used_pmds[0]));
+
+ /*
+ * XXX: will be replaced with pure assembly call
+ */
+ SET_PMU_OWNER(NULL);
+
+ ia64_srlz_d();
- t = &ta->thread;
- ctx = ta->thread.pfm_context;
/*
* XXX needs further optimization.
- * Also must take holes into account
*/
mask = ctx->ctx_used_pmds[0];
for (i=0; mask; i++, mask>>=1) {
- if (mask & 0x1) t->pmd[i] =ia64_get_pmd(i);
+ if (mask & 0x1) t->pmd[i] = ia64_get_pmd(i);
}
-
- /* skip PMC[0], we handle it separately */
- mask = ctx->ctx_used_pmcs[0]>>1;
- for (i=1; mask; i++, mask>>=1) {
- if (mask & 0x1) t->pmc[i] = ia64_get_pmc(i);
+
+ /* save pmc0 */
+ t->pmc[0] = ia64_get_pmc(0);
+
+ /* not owned by this CPU */
+ atomic_set(&ctx->ctx_last_cpu, -1);
+
+ /* can proceed */
+ arg->retval = 0;
+}
+
+/*
+ * Function call to fetch PMU state from another CPU identified by 'cpu'.
+ * If the context is being saved on the remote CPU, then we busy wait until
+ * the saving is done and then we return. In this case, no IPI is sent.
+ * Otherwise, we send an IPI to the remote CPU, potentially interrupting
+ * pfm_lazy_save_regs() over there.
+ *
+ * If retval==1, it means that we interrupted the remote save and that we must
+ * wait until the saving is over before proceeding.
+ * Otherwise, the state was saved by the remote handler, or was already saved
+ * by the time we got there. In either case, we can proceed.
+ */
+static void
+pfm_fetch_regs(int cpu, struct task_struct *task, pfm_context_t *ctx)
+{
+ pfm_smp_ipi_arg_t arg;
+ int ret;
+
+ arg.task = task;
+ arg.retval = -1;
+
+ if (atomic_read(&ctx->ctx_is_busy)) {
+must_wait_busy:
+ while (atomic_read(&ctx->ctx_is_busy));
}
- SET_PMU_OWNER(NULL);
+
+ if (atomic_read(&ctx->ctx_saving_in_progress)) {
+ DBprintk(("no IPI, must wait for [%d] to be saved on [%d]\n", task->pid, cpu));
+must_wait_saving:
+ /* busy wait */
+ while (atomic_read(&ctx->ctx_saving_in_progress));
+ DBprintk(("done saving for [%d] on [%d]\n", task->pid, cpu));
+ return;
+ }
+ DBprintk(("calling CPU %d from CPU %d\n", cpu, smp_processor_id()));
+
+ if (cpu == -1) {
+ printk("refusing to use -1 for [%d]\n", task->pid);
+ return;
+ }
+
+ /* will send IPI to other CPU and wait for completion of remote call */
+ if ((ret=smp_call_function_single(cpu, pfm_handle_fetch_regs, &arg, 0, 1))) {
+ printk("perfmon: remote CPU call from %d to %d error %d\n", smp_processor_id(), cpu, ret);
+ return;
+ }
+ /*
+ * we must wait until saving is over on the other CPU
+ * This is the case, where we interrupted the saving which started just at the time we sent the
+ * IPI.
+ */
+ if (arg.retval == 1) goto must_wait_saving;
+ if (arg.retval == 2) goto must_wait_busy;
}
+#endif /* CONFIG_SMP */
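
The decision logic of the remote-fetch handler above can be summarized as a pure function of the three conditions it checks, in order. This is only a sketch of the protocol described in the comments (retval 2 = context busy in the overflow handler, retry the whole request; 1 = lazy save in progress, busy-wait for it; 0 = state saved, proceed); the function name and signature are illustrative:

```c
#include <assert.h>

/* Models the retval chosen by the IPI handler, checked in the same
 * order as pfm_handle_fetch_regs(): busy first, then saving, then
 * ownership. A non-owner means the state is already saved. */
static int fetch_retval(int is_busy, int saving, int is_owner)
{
	if (is_busy)
		return 2;	/* retry whole request */
	if (saving)
		return 1;	/* busy-wait for lazy save */
	if (!is_owner)
		return 0;	/* already saved, proceed */
	/* else: handler saves the state on its CPU, then proceed */
	return 0;
}
```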
void
-pfm_load_regs (struct task_struct *ta)
+pfm_load_regs (struct task_struct *task)
{
- struct thread_struct *t = &ta->thread;
- pfm_context_t *ctx = ta->thread.pfm_context;
+ struct thread_struct *t;
+ pfm_context_t *ctx;
struct task_struct *owner;
unsigned long mask;
+ u64 psr;
int i;
+#ifdef CONFIG_SMP
+ int cpu;
+#endif
owner = PMU_OWNER();
- if (owner == ta) goto skip_restore;
+ ctx = task->thread.pfm_context;
+
+ /*
+ * if we were the last user, then nothing to do except restore psr
+ */
+ if (owner == task) {
+ if (atomic_read(&ctx->ctx_last_cpu) != smp_processor_id())
+ DBprintk(("invalid last_cpu=%d for [%d]\n",
+ atomic_read(&ctx->ctx_last_cpu), task->pid));
+
+ psr = ctx->ctx_saved_psr;
+ __asm__ __volatile__ ("mov psr.l=%0;; srlz.i;;"::"r"(psr): "memory");
+
+ return;
+ }
+ DBprintk(("load_regs: must reload for [%d] owner=%d\n",
+ task->pid, owner ? owner->pid : -1 ));
+ /*
+ * someone else is still using the PMU, first push it out and
+ * then we'll be able to install our stuff !
+ */
if (owner) pfm_lazy_save_regs(owner);
- SET_PMU_OWNER(ta);
+#ifdef CONFIG_SMP
+ /*
+ * check if context on another CPU (-1 means saved)
+	 * We MUST read it into a local variable, as ctx_last_cpu may change behind
+	 * our back. If it changes to -1 (not on a CPU anymore), then cpu still
+	 * holds the last CPU the context was on. We may be sending the
+ * IPI for nothing, but we have no way of verifying this.
+ */
+ cpu = atomic_read(&ctx->ctx_last_cpu);
+ if (cpu != -1) {
+ pfm_fetch_regs(cpu, task, ctx);
+ }
+#endif
+ t = &task->thread;
- mask = ctx->ctx_used_pmds[0];
+ /*
+ * To avoid leaking information to the user level when psr.sp=0,
+ * we must reload ALL implemented pmds (even the ones we don't use).
+ * In the kernel we only allow PFM_READ_PMDS on registers which
+ * we initialized or requested (sampling) so there is no risk there.
+ *
+	 * As an optimization, we only reload the PMDs that we use when
+	 * the context is in protected mode, i.e. psr.sp=1, because then there
+	 * is no leak possible.
+ */
+ mask = pfm_sysctl.fastctxsw || ctx->ctx_fl_protected ? ctx->ctx_used_pmds[0] : ctx->ctx_reload_pmds[0];
for (i=0; mask; i++, mask>>=1) {
if (mask & 0x1) ia64_set_pmd(i, t->pmd[i]);
}
+#if 0
+ mask = ctx->ctx_used_pmds[0];
+ for (i=0; mask; i++, mask>>=1) {
+ if (mask & 0x1)
+ ia64_set_pmd(i, t->pmd[i]);
+ else
+ ia64_set_pmd(i, 0UL);
+ }
+#endif
- /* skip PMC[0] to avoid side effects */
- mask = ctx->ctx_used_pmcs[0]>>1;
- for (i=1; mask; i++, mask>>=1) {
+ /*
+ * PMC0 is never set in the mask because it is always restored
+ * separately.
+ *
+ * ALL PMCs are systematically reloaded, unused registers
+ * get their default (PAL reset) values to avoid picking up
+ * stale configuration.
+ */
+ mask = ctx->ctx_reload_pmcs[0];
+ for (i=0; mask; i++, mask>>=1) {
if (mask & 0x1) ia64_set_pmc(i, t->pmc[i]);
}
-skip_restore:
+
+ /*
+ * we restore ALL the debug registers to avoid picking up
+ * stale state.
+ */
+ if (ctx->ctx_fl_using_dbreg) {
+ pfm_stats.pfm_restore_dbrs++;
+ for (i=0; i < pmu_conf.num_ibrs; i++) {
+ ia64_set_ibr(i, t->ibr[i]);
+ }
+ ia64_srlz_i();
+ for (i=0; i < pmu_conf.num_dbrs; i++) {
+ ia64_set_dbr(i, t->dbr[i]);
+ }
+ }
+ ia64_srlz_d();
+
+ if (t->pmc[0] & ~0x1) {
+ pfm_overflow_handler(task, ctx, t->pmc[0], NULL);
+ }
+
/*
- * unfreeze only when possible
+ * fl_frozen==1 when we are in blocking mode waiting for restart
*/
if (ctx->ctx_fl_frozen == 0) {
ia64_set_pmc(0, 0);
ia64_srlz_d();
- /* place where we potentially (kernel level) start monitoring again */
}
+ atomic_set(&ctx->ctx_last_cpu, smp_processor_id());
+
+ SET_PMU_OWNER(task);
+
+ /*
+ * restore the psr we changed in pfm_save_regs()
+ */
+ psr = ctx->ctx_saved_psr;
+ __asm__ __volatile__ ("mov psr.l=%0;; srlz.i;;"::"r"(psr): "memory");
+
+}
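
The mask selection on reload, per the leak-avoidance comment in pfm_load_regs() above, reduces to a single conditional: only when the context is protected (psr.sp=1) or fast context switch is enabled is it safe to restrict the reload to the PMDs actually in use; otherwise all implemented PMDs must be reloaded so a stale value cannot leak to user level. A hedged sketch, with illustrative names:

```c
#include <assert.h>

/* Choose which PMD bitmask to reload: the narrow `used` set when no
 * leak is possible, the full `all` set otherwise. */
static unsigned long pick_reload_mask(int fastctxsw, int fl_protected,
				      unsigned long used, unsigned long all)
{
	return (fastctxsw || fl_protected) ? used : all;
}
```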
+
+/*
+ * XXX: make this routine able to work with non current context
+ */
+static void
+ia64_reset_pmu(struct task_struct *task)
+{
+ struct thread_struct *t = &task->thread;
+ pfm_context_t *ctx = t->pfm_context;
+ unsigned long mask;
+ int i;
+
+ if (task != current) {
+ printk("perfmon: invalid task in ia64_reset_pmu()\n");
+ return;
+ }
+
+ /* Let's make sure the PMU is frozen */
+ ia64_set_pmc(0,1);
+
+ /*
+ * install reset values for PMC. We skip PMC0 (done above)
+	 * XXX: good up to 64 PMCs
+ */
+ mask = pmu_conf.impl_regs[0] >> 1;
+ for(i=1; mask; mask>>=1, i++) {
+ if (mask & 0x1) {
+ ia64_set_pmc(i, reset_pmcs[i]);
+ /*
+ * When restoring context, we must restore ALL pmcs, even the ones
+			 * that the task does not use, to avoid leaks and possible corruption
+			 * of the session because of configuration conflicts. So here, we
+ * initialize the entire set used in the context switch restore routine.
+ */
+ t->pmc[i] = reset_pmcs[i];
+ DBprintk((" pmc[%d]=0x%lx\n", i, reset_pmcs[i]));
+
+ }
+ }
+ /*
+	 * install reset (zero) values for the PMDs.
+	 * XXX: good up to 64 PMDs. Assumes that zero is a valid value.
+ */
+ mask = pmu_conf.impl_regs[4];
+ for(i=0; mask; mask>>=1, i++) {
+ if (mask & 0x1) ia64_set_pmd(i, 0UL);
+ t->pmd[i] = 0UL;
+ }
+
+ /*
+ * On context switched restore, we must restore ALL pmc and ALL pmd even
+ * when they are not actively used by the task. In UP, the incoming process
+ * may otherwise pick up left over PMC, PMD state from the previous process.
+ * As opposed to PMD, stale PMC can cause harm to the incoming
+ * process because they may change what is being measured.
+ * Therefore, we must systematically reinstall the entire
+ * PMC state. In SMP, the same thing is possible on the
+	 * same CPU but also between 2 CPUs.
+ *
+ * The problem with PMD is information leaking especially
+ * to user level when psr.sp=0
+ *
+ * There is unfortunately no easy way to avoid this problem
+ * on either UP or SMP. This definitively slows down the
+ * pfm_load_regs() function.
+ */
+
+ /*
+ * We must include all the PMC in this mask to make sure we don't
+ * see any side effect of a stale state, such as opcode matching
+ * or range restrictions, for instance.
+ *
+ * We never directly restore PMC0 so we do not include it in the mask.
+ */
+ ctx->ctx_reload_pmcs[0] = pmu_conf.impl_regs[0] & ~0x1;
+ /*
+ * We must include all the PMD in this mask to avoid picking
+ * up stale value and leak information, especially directly
+ * at the user level when psr.sp=0
+ */
+ ctx->ctx_reload_pmds[0] = pmu_conf.impl_regs[4];
+
+ /*
+ * Keep track of the pmds we want to sample
+	 * XXX: maybe we don't need to save/restore the DEAR/IEAR pmds
+	 * but we do need the BTB for sure. This is because the hardware
+	 * buffer is only 1 entry deep for non-BTB pmds.
+ *
+ * We ignore the unimplemented pmds specified by the user
+ */
+ ctx->ctx_used_pmds[0] = ctx->ctx_smpl_regs[0] & pmu_conf.impl_regs[4];
+ ctx->ctx_used_pmcs[0] = 1; /* always save/restore PMC[0] */
+
+ /*
+ * useful in case of re-enable after disable
+ */
+ ctx->ctx_used_ibrs[0] = 0UL;
+ ctx->ctx_used_dbrs[0] = 0UL;
+
+ ia64_srlz_d();
}
-
/*
* This function is called when a thread exits (from exit_thread()).
* This is a simplified pfm_save_regs() that simply flushes the current
* register state into the save area taking into account any pending
- * overflow. This time no notification is sent because the taks is dying
+ * overflow. This time no notification is sent because the task is dying
* anyway. The inline processing of overflows avoids loosing some counts.
* The PMU is frozen on exit from this call and is to never be reenabled
* again for this task.
+ *
*/
void
-pfm_flush_regs (struct task_struct *ta)
+pfm_flush_regs (struct task_struct *task)
{
pfm_context_t *ctx;
- u64 pmc0, psr, mask;
- int i,j;
+ u64 pmc0;
+ unsigned long mask, mask2, val;
+ int i;
- if (ta == NULL) {
- panic(__FUNCTION__" task is NULL\n");
- }
- ctx = ta->thread.pfm_context;
- if (ctx == NULL) {
- panic(__FUNCTION__" no PFM ctx is NULL\n");
- }
- /*
- * We must make sure that we don't loose any potential overflow
- * interrupt while saving PMU context. In this code, external
- * interrupts are always enabled.
- */
+ ctx = task->thread.pfm_context;
- /*
- * save current PSR: needed because we modify it
+ if (ctx == NULL) return;
+
+ /*
+ * that's it if context already disabled
*/
- __asm__ __volatile__ ("mov %0=psr;;": "=r"(psr) :: "memory");
+ if (ctx->ctx_flags.state == PFM_CTX_DISABLED) return;
/*
* stop monitoring:
@@ -2090,7 +3497,27 @@
* in kernel.
* By now, we could still have an overflow interrupt in-flight.
*/
- __asm__ __volatile__ ("rsm psr.up;;"::: "memory");
+ if (ctx->ctx_fl_system) {
+
+ __asm__ __volatile__ ("rsm psr.pp;;"::: "memory");
+
+ /* disable dcr pp */
+ ia64_set_dcr(ia64_get_dcr() & ~IA64_DCR_PP);
+
+#ifdef CONFIG_SMP
+ local_cpu_data->pfm_syst_wide = 0;
+ local_cpu_data->pfm_dcr_pp = 0;
+#else
+ pfm_tasklist_toggle_pp(0);
+#endif
+
+ } else {
+
+ __asm__ __volatile__ ("rum psr.up;;"::: "memory");
+
+ /* no more save/restore on ctxsw */
+ current->thread.flags &= ~IA64_THREAD_PM_VALID;
+ }
/*
* Mark the PMU as not owned
@@ -2121,85 +3548,68 @@
ia64_srlz_d();
/*
- * restore PSR for context switch to save
+ * We don't need to restore psr, because we are on our way out anyway
*/
- __asm__ __volatile__ ("mov psr.l=%0;;srlz.i;"::"r"(psr): "memory");
/*
* This loop flushes the PMD into the PFM context.
- * IT also processes overflow inline.
+ * It also processes overflow inline.
*
* IMPORTANT: No notification is sent at this point as the process is dying.
* The implicit notification will come from a SIGCHILD or a return from a
* waitpid().
*
- * XXX: must take holes into account
*/
- mask = pmc0 >> PMU_FIRST_COUNTER;
- for (i=0,j=PMU_FIRST_COUNTER; i< pmu_conf.max_counters; i++,j++) {
-
- /* collect latest results */
- ctx->ctx_pmds[i].val += ia64_get_pmd(j) & pmu_conf.perf_ovfl_val;
-
- /*
- * now everything is in ctx_pmds[] and we need
- * to clear the saved context from save_regs() such that
- * pfm_read_pmds() gets the correct value
- */
- ta->thread.pmd[j] = 0;
- /* take care of overflow inline */
- if (mask & 0x1) {
- ctx->ctx_pmds[i].val += 1 + pmu_conf.perf_ovfl_val;
- DBprintk((" PMD[%d] overflowed pmd=0x%lx pmds.val=0x%lx\n",
- j, ia64_get_pmd(j), ctx->ctx_pmds[i].val));
- }
- mask >>=1;
- }
-}
+ if (atomic_read(&ctx->ctx_last_cpu) != smp_processor_id())
+ printk("perfmon: [%d] last_cpu=%d\n", task->pid, atomic_read(&ctx->ctx_last_cpu));
-/*
- * XXX: this routine is not very portable for PMCs
- * XXX: make this routine able to work with non current context
- */
-static void
-ia64_reset_pmu(void)
-{
- int i;
+ mask = pmc0 >> PMU_FIRST_COUNTER;
+ mask2 = ctx->ctx_used_pmds[0] >> PMU_FIRST_COUNTER;
- /* PMU is frozen, no pending overflow bits */
- ia64_set_pmc(0,1);
+ for (i = PMU_FIRST_COUNTER; mask2; i++, mask>>=1, mask2>>=1) {
- /* extra overflow bits + counter configs cleared */
- for(i=1; i< PMU_FIRST_COUNTER + pmu_conf.max_counters ; i++) {
- ia64_set_pmc(i,0);
- }
+ /* skip non used pmds */
+ if ((mask2 & 0x1) == 0) continue;
- /* opcode matcher set to all 1s */
- ia64_set_pmc(8,~0);
- ia64_set_pmc(9,~0);
+ val = ia64_get_pmd(i);
- /* I-EAR config cleared, plm=0 */
- ia64_set_pmc(10,0);
+ if (PMD_IS_COUNTING(i)) {
- /* D-EAR config cleared, PMC[11].pt must be 1 */
- ia64_set_pmc(11,1 << 28);
+ DBprintk(("[%d] pmd[%d] soft_pmd=0x%lx hw_pmd=0x%lx\n", task->pid, i, ctx->ctx_soft_pmds[i].val, val & pmu_conf.perf_ovfl_val));
- /* BTB config. plm=0 */
- ia64_set_pmc(12,0);
+ /* collect latest results */
+ ctx->ctx_soft_pmds[i].val += val & pmu_conf.perf_ovfl_val;
- /* Instruction address range, PMC[13].ta must be 1 */
- ia64_set_pmc(13,1);
+ /*
+ * now everything is in ctx_soft_pmds[] and we need
+ * to clear the saved context from save_regs() such that
+ * pfm_read_pmds() gets the correct value
+ */
+ task->thread.pmd[i] = 0;
- /* clears all PMD registers */
- for(i=0;i< pmu_conf.num_pmds; i++) {
- if (PMD_IS_IMPL(i)) ia64_set_pmd(i,0);
+ /* take care of overflow inline */
+ if (mask & 0x1) {
+ ctx->ctx_soft_pmds[i].val += 1 + pmu_conf.perf_ovfl_val;
+ DBprintk(("[%d] pmd[%d] overflowed soft_pmd=0x%lx\n",
+ task->pid, i, ctx->ctx_soft_pmds[i].val));
+ }
+ } else {
+ DBprintk(("[%d] pmd[%d] hw_pmd=0x%lx\n", task->pid, i, val));
+ /* not a counter, just save value as is */
+ task->thread.pmd[i] = val;
+ }
}
- ia64_srlz_d();
+ /*
+ * indicates that context has been saved
+ */
+ atomic_set(&ctx->ctx_last_cpu, -1);
+
}
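
The flush loop above maintains 64-bit software-extended counters: the hardware counter is only `width` bits wide (perf_ovfl_val = 2^width - 1), so the software value accumulates the masked low bits, plus one full wrap whenever the overflow status bit for that counter is set. A standalone sketch of that update (names illustrative):

```c
#include <assert.h>

/* One counter's flush step: fold the hardware PMD value into the
 * 64-bit software counter, accounting for a pending overflow bit. */
static unsigned long flush_counter(unsigned long soft_val, unsigned long hw_pmd,
				   unsigned long ovfl_mask, int overflowed)
{
	/* collect latest low bits from the hardware counter */
	soft_val += hw_pmd & ovfl_mask;

	/* a set overflow bit means one extra full wrap of 2^width counts */
	if (overflowed)
		soft_val += 1 + ovfl_mask;

	return soft_val;
}
```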
+
/*
- * task is the newly created task
+ * task is the newly created task, pt_regs for new child
*/
int
pfm_inherit(struct task_struct *task, struct pt_regs *regs)
@@ -2207,25 +3617,43 @@
pfm_context_t *ctx = current->thread.pfm_context;
pfm_context_t *nctx;
struct thread_struct *th = &task->thread;
- int i, cnum;
+ unsigned long m;
+ int i;
/*
- * bypass completely for system wide
+ * make sure child cannot mess up the monitoring session
*/
- if (pfs_info.pfs_sys_session) {
- DBprintk((" enabling psr.pp for %d\n", task->pid));
- ia64_psr(regs)->pp = pfs_info.pfs_pp;
- return 0;
- }
+ ia64_psr(regs)->sp = 1;
+ DBprintk(("enabling psr.sp for [%d]\n", task->pid));
+
+ /*
+ * remove any sampling buffer mapping from child user
+ * address space. Must be done for all cases of inheritance.
+ */
+ if (ctx->ctx_smpl_vaddr) pfm_remove_smpl_mapping(task);
/*
* takes care of easiest case first
*/
if (CTX_INHERIT_MODE(ctx) == PFM_FL_INHERIT_NONE) {
- DBprintk((" removing PFM context for %d\n", task->pid));
- task->thread.pfm_context = NULL;
- task->thread.pfm_must_block = 0;
- atomic_set(&task->thread.pfm_notifiers_check, 0);
+ DBprintk(("removing PFM context for [%d]\n", task->pid));
+ task->thread.pfm_context = NULL;
+ task->thread.pfm_ovfl_block_reset = 0;
+ atomic_set(&task->thread.pfm_notifiers_check,0);
+ atomic_set(&task->thread.pfm_owners_check, 0);
+ task->thread.pfm_smpl_buf_list = NULL;
+
+ /*
+ * we must clear psr.up because the new child does
+ * not have a context and the PM_VALID flag is cleared
+ * in copy_thread().
+ *
+ * we do not clear psr.pp because it is always
+ * controlled by the system wide logic and we should
+ * never be here when system wide is running anyway
+ */
+ ia64_psr(regs)->up = 0;
+
/* copy_thread() clears IA64_THREAD_PM_VALID */
return 0;
}
@@ -2235,45 +3663,85 @@
/* copy content */
*nctx = *ctx;
+
if (CTX_INHERIT_MODE(ctx) == PFM_FL_INHERIT_ONCE) {
nctx->ctx_fl_inherit = PFM_FL_INHERIT_NONE;
- atomic_set(&task->thread.pfm_notifiers_check, 0);
- DBprintk((" downgrading to INHERIT_NONE for %d\n", task->pid));
- pfs_info.pfs_proc_sessions++;
+ atomic_set(&nctx->ctx_last_cpu, -1);
+
+ /*
+ * task is not yet visible in the tasklist, so we do
+ * not need to lock the newly created context.
+ * However, we must grab the tasklist_lock to ensure
+ * that the ctx_owner or ctx_notify_task do not disappear
+ * while we increment their check counters.
+ */
+ read_lock(&tasklist_lock);
+
+ if (nctx->ctx_notify_task)
+ atomic_inc(&nctx->ctx_notify_task->thread.pfm_notifiers_check);
+
+ if (nctx->ctx_owner)
+ atomic_inc(&nctx->ctx_owner->thread.pfm_owners_check);
+
+ read_unlock(&tasklist_lock);
+
+ DBprintk(("downgrading to INHERIT_NONE for [%d]\n", task->pid));
+
+ LOCK_PFS();
+ pfm_sessions.pfs_task_sessions++;
+ UNLOCK_PFS();
}
/* initialize counters in new context */
- for(i=0, cnum= PMU_FIRST_COUNTER; i < pmu_conf.max_counters; cnum++, i++) {
- nctx->ctx_pmds[i].val = nctx->ctx_pmds[i].ival & ~pmu_conf.perf_ovfl_val;
- th->pmd[cnum] = nctx->ctx_pmds[i].ival & pmu_conf.perf_ovfl_val;
+ m = nctx->ctx_used_pmds[0] >> PMU_FIRST_COUNTER;
+ for(i = PMU_FIRST_COUNTER ; m ; m>>=1, i++) {
+ if ((m & 0x1) && pmu_conf.pmd_desc[i].type == PFM_REG_COUNTING) {
+ nctx->ctx_soft_pmds[i].val = nctx->ctx_soft_pmds[i].ival & ~pmu_conf.perf_ovfl_val;
+ th->pmd[i] = nctx->ctx_soft_pmds[i].ival & pmu_conf.perf_ovfl_val;
+ }
}
- /* clear BTB index register */
+ /*
+ * clear BTB index register
+ * XXX: CPU-model specific knowledge!
+ */
th->pmd[16] = 0;
- /* if sampling then increment number of users of buffer */
- if (nctx->ctx_smpl_buf) {
- atomic_inc(&nctx->ctx_smpl_buf->psb_refcnt);
+ /*
+ * if sampling then increment number of users of buffer
+ */
+ if (nctx->ctx_psb) {
+ /*
+ * XXX: not very pretty!
+ */
+ LOCK_PSB(nctx->ctx_psb);
+ nctx->ctx_psb->psb_refcnt++;
+ UNLOCK_PSB(nctx->ctx_psb);
+ /*
+ * remove any pointer to sampling buffer mapping
+ */
+ nctx->ctx_smpl_vaddr = 0;
}
- nctx->ctx_fl_frozen = 0;
- nctx->ctx_ovfl_regs = 0;
+ nctx->ctx_fl_frozen = 0;
+ nctx->ctx_ovfl_regs[0] = 0UL;
+
sema_init(&nctx->ctx_restart_sem, 0); /* reset this semaphore to locked */
/* clear pending notification */
- th->pfm_must_block = 0;
+ th->pfm_ovfl_block_reset = 0;
/* link with new task */
- th->pfm_context = nctx;
+ th->pfm_context = nctx;
- DBprintk((" nctx=%p for process %d\n", (void *)nctx, task->pid));
+ DBprintk(("nctx=%p for process [%d]\n", (void *)nctx, task->pid));
/*
* the copy_thread routine automatically clears
* IA64_THREAD_PM_VALID, so we need to reenable it, if it was used by the caller
*/
if (current->thread.flags & IA64_THREAD_PM_VALID) {
- DBprintk((" setting PM_VALID for %d\n", task->pid));
+ DBprintk(("setting PM_VALID for [%d]\n", task->pid));
th->flags |= IA64_THREAD_PM_VALID;
}
@@ -2281,100 +3749,251 @@
}
/*
- * called from release_thread(), at this point this task is not in the
- * tasklist anymore
+ *
+ * We cannot touch any of the PMU registers at this point as we may
+ * not be running on the same CPU the task was last run on. Therefore
+ * it is assumed that the PMU has been stopped appropriately in
+ * pfm_flush_regs() called from exit_thread().
+ *
+ * The function is called in the context of the parent via a release_thread()
+ * and wait4(). The task is not in the tasklist anymore.
*/
void
pfm_context_exit(struct task_struct *task)
{
pfm_context_t *ctx = task->thread.pfm_context;
- if (!ctx) {
- DBprintk((" invalid context for %d\n", task->pid));
- return;
- }
+ /*
+ * check sampling buffer
+ */
+ if (ctx->ctx_psb) {
+ pfm_smpl_buffer_desc_t *psb = ctx->ctx_psb;
+
+ LOCK_PSB(psb);
+
+ DBprintk(("sampling buffer from [%d] @%p size %ld vma_flag=0x%x\n",
+ task->pid,
+ psb->psb_hdr, psb->psb_size, psb->psb_flags));
+
+ /*
+ * in the case where we are the last user, we may be able to free
+ * the buffer
+ */
+ psb->psb_refcnt--;
+
+ if (psb->psb_refcnt == 0) {
+
+ /*
+		 * The flag is cleared in pfm_vm_close(), which gets
+ * called from do_exit() via exit_mm().
+ * By the time we come here, the task has no more mm context.
+ *
+ * We can only free the psb and buffer here after the vm area
+ * describing the buffer has been removed. This normally happens
+ * as part of do_exit() but the entire mm context is ONLY removed
+ * once its reference count goes to zero. This is typically
+ * the case except for multi-threaded (several tasks) processes.
+ *
+ * See pfm_vm_close() and pfm_cleanup_smpl_buf() for more details.
+ */
+ if ((psb->psb_flags & PFM_PSB_VMA) == 0) {
+
+ DBprintk(("cleaning sampling buffer from [%d] @%p size %ld\n",
+ task->pid,
+ psb->psb_hdr, psb->psb_size));
+
+ /*
+ * free the buffer and psb
+ */
+ pfm_rvfree(psb->psb_hdr, psb->psb_size);
+ kfree(psb);
+ psb = NULL;
+ }
+ }
+ /* psb may have been deleted */
+ if (psb) UNLOCK_PSB(psb);
+ }
+
+ DBprintk(("cleaning [%d] pfm_context @%p notify_task=%p check=%d mm=%p\n",
+ task->pid, ctx,
+ ctx->ctx_notify_task,
+ atomic_read(&task->thread.pfm_notifiers_check), task->mm));
- /* check is we have a sampling buffer attached */
- if (ctx->ctx_smpl_buf) {
- pfm_smpl_buffer_desc_t *psb = ctx->ctx_smpl_buf;
-
- /* if only user left, then remove */
- DBprintk((" [%d] [%d] psb->refcnt=%d\n", current->pid, task->pid, psb->psb_refcnt.counter));
-
- if (atomic_dec_and_test(&psb->psb_refcnt) ) {
- rvfree(psb->psb_hdr, psb->psb_size);
- vfree(psb);
- DBprintk((" [%d] cleaning [%d] sampling buffer\n", current->pid, task->pid ));
- }
- }
- DBprintk((" [%d] cleaning [%d] pfm_context @%p\n", current->pid, task->pid, (void *)ctx));
-
- /*
- * To avoid getting the notified task scan the entire process list
- * when it exits because it would have pfm_notifiers_check set, we
- * decrease it by 1 to inform the task, that one less task is going
- * to send it notification. each new notifer increases this field by
- * 1 in pfm_context_create(). Of course, there is race condition between
- * decreasing the value and the notified task exiting. The danger comes
- * from the fact that we have a direct pointer to its task structure
- * thereby bypassing the tasklist. We must make sure that if we have
- * notify_task!= NULL, the target task is still somewhat present. It may
- * already be detached from the tasklist but that's okay. Note that it is
- * okay if we 'miss the deadline' and the task scans the list for nothing,
- * it will affect performance but not correctness. The correctness is ensured
- * by using the notify_lock whic prevents the notify_task from changing on us.
- * Once holdhing this lock, if we see notify_task!= NULL, then it will stay like
+ /*
+ * To avoid having the notified task or owner task scan the entire process
+ * list when they exit, we decrement notifiers_check and owners_check respectively.
+ *
+ * Of course, there is a race condition between decreasing the value and the
+ * task exiting. The danger comes from the fact that, in both cases, we have a
+ * direct pointer to a task structure thereby bypassing the tasklist.
+ * We must make sure that, if we have task != NULL, the target task is still
+ * present and is identical to the initial task specified
+ * during pfm_create_context(). It may already be detached from the tasklist but
+ * that's okay. Note that it is okay if we miss the deadline and the task scans
+ * the list for nothing, it will affect performance but not correctness.
+ * The correctness is ensured by using the ctx_lock which prevents the
+ * notify_task from changing the fields in our context.
+ * Once holding this lock, if we see task != NULL, then it will stay like
* that until we release the lock. If it is NULL already then we came too late.
*/
- spin_lock(&ctx->ctx_notify_lock);
+ LOCK_CTX(ctx);
- if (ctx->ctx_notify_task) {
- DBprintk((" [%d] [%d] atomic_sub on [%d] notifiers=%u\n", current->pid, task->pid,
- ctx->ctx_notify_task->pid,
- atomic_read(&ctx->ctx_notify_task->thread.pfm_notifiers_check)));
+ if (ctx->ctx_notify_task != NULL) {
+ DBprintk(("[%d], [%d] atomic_sub on [%d] notifiers=%u\n", current->pid,
+ task->pid,
+ ctx->ctx_notify_task->pid,
+ atomic_read(&ctx->ctx_notify_task->thread.pfm_notifiers_check)));
+
+ atomic_dec(&ctx->ctx_notify_task->thread.pfm_notifiers_check);
+ }
+
+ if (ctx->ctx_owner != NULL) {
+ DBprintk(("[%d], [%d] atomic_sub on [%d] owners=%u\n",
+ current->pid,
+ task->pid,
+ ctx->ctx_owner->pid,
+ atomic_read(&ctx->ctx_owner->thread.pfm_owners_check)));
- atomic_sub(1, &ctx->ctx_notify_task->thread.pfm_notifiers_check);
+ atomic_dec(&ctx->ctx_owner->thread.pfm_owners_check);
}
- spin_unlock(&ctx->ctx_notify_lock);
+ UNLOCK_CTX(ctx);
+
+ LOCK_PFS();
if (ctx->ctx_fl_system) {
- /*
- * if included interrupts (true by default), then reset
- * to get default value
- */
- if (ctx->ctx_fl_exclintr == 0) {
- /*
- * reload kernel default DCR value
- */
- ia64_set_dcr(pfs_info.pfs_dfl_dcr);
- DBprintk((" restored dcr to 0x%lx\n", pfs_info.pfs_dfl_dcr));
+
+ pfm_sessions.pfs_sys_session[ctx->ctx_cpu] = NULL;
+ pfm_sessions.pfs_sys_sessions--;
+ DBprintk(("freeing syswide session on CPU%ld\n", ctx->ctx_cpu));
+ /* update perfmon debug register counter */
+ if (ctx->ctx_fl_using_dbreg) {
+ if (pfm_sessions.pfs_sys_use_dbregs == 0) {
+ printk("perfmon: invalid release for [%d] sys_use_dbregs=0\n", task->pid);
+ } else
+ pfm_sessions.pfs_sys_use_dbregs--;
}
- /*
- * free system wide session slot
- */
- pfs_info.pfs_sys_session = 0;
+
+ /*
+ * remove any CPU pinning
+ */
+ task->cpus_allowed = ctx->ctx_saved_cpus_allowed;
+ task->need_resched = 1;
} else {
- pfs_info.pfs_proc_sessions--;
+ pfm_sessions.pfs_task_sessions--;
}
+ UNLOCK_PFS();
pfm_context_free(ctx);
/*
* clean pfm state in thread structure,
*/
- task->thread.pfm_context = NULL;
- task->thread.pfm_must_block = 0;
+ task->thread.pfm_context = NULL;
+ task->thread.pfm_ovfl_block_reset = 0;
+
/* pfm_notifiers is cleaned in pfm_cleanup_notifiers() */
+}
+
+/*
+ * function invoked from release_thread when pfm_smpl_buf_list is not NULL
+ */
+int
+pfm_cleanup_smpl_buf(struct task_struct *task)
+{
+ pfm_smpl_buffer_desc_t *tmp, *psb = task->thread.pfm_smpl_buf_list;
+
+ if (psb == NULL) {
+ printk("perfmon: psb is null in [%d]\n", current->pid);
+ return -1;
+ }
+ /*
+ * Walk through the list and free the sampling buffer and psb
+ */
+ while (psb) {
+ DBprintk(("[%d] freeing smpl @%p size %ld\n", current->pid, psb->psb_hdr, psb->psb_size));
+ pfm_rvfree(psb->psb_hdr, psb->psb_size);
+ tmp = psb->psb_next;
+ kfree(psb);
+ psb = tmp;
+ }
+
+ /* just in case */
+ task->thread.pfm_smpl_buf_list = NULL;
+
+ return 0;
+}
+
+/*
+ * function invoked from release_thread to make sure that the ctx_owner field does not
+ * point to a nonexistent task.
+ */
+void
+pfm_cleanup_owners(struct task_struct *task)
+{
+ struct task_struct *p;
+ pfm_context_t *ctx;
+
+ DBprintk(("called by [%d] for [%d]\n", current->pid, task->pid));
+
+ read_lock(&tasklist_lock);
+
+ for_each_task(p) {
+ /*
+ * It is safe to do the 2-step test here, because thread.ctx
+ * is cleaned up only in release_thread() and at that point
+ * the task has been detached from the tasklist which is an
+ * operation which uses the write_lock() on the tasklist_lock
+ * so it cannot run concurrently to this loop. So we have the
+ * guarantee that if we find p and it has a perfmon ctx then
+ * it is going to stay like this for the entire execution of this
+ * loop.
+ */
+ ctx = p->thread.pfm_context;
+
+ //DBprintk(("[%d] scanning task [%d] ctx=%p\n", task->pid, p->pid, ctx));
+
+ if (ctx && ctx->ctx_owner == task) {
+ DBprintk(("trying for owner [%d] in [%d]\n", task->pid, p->pid));
+ /*
+ * the spinlock is required to take care of a race condition
+ * with the send_sig_info() call. We must make sure that
+ * either the send_sig_info() completes using a valid task,
+ * or the notify_task is cleared before the send_sig_info()
+ * can pick up a stale value. Note that by the time this
+ * function is executed the 'task' is already detached from the
+ * tasklist. The problem is that the notifiers have a direct
+ * pointer to it. It is okay to send a signal to a task in this
+ * stage, it simply will have no effect. But it is better than sending
+ * to a completely destroyed task or worse to a new task using the same
+ * task_struct address.
+ */
+ LOCK_CTX(ctx);
+
+ ctx->ctx_owner = NULL;
+
+ UNLOCK_CTX(ctx);
+
+ DBprintk(("done for notifier [%d] in [%d]\n", task->pid, p->pid));
+ }
+ }
+ read_unlock(&tasklist_lock);
+
+ atomic_set(&task->thread.pfm_owners_check, 0);
}
+
+/*
+ * function called from release_thread to make sure that the ctx_notify_task is not pointing
+ * to a nonexistent task
+ */
void
pfm_cleanup_notifiers(struct task_struct *task)
{
struct task_struct *p;
pfm_context_t *ctx;
- DBprintk((" [%d] called\n", task->pid));
+ DBprintk(("called by [%d] for [%d]\n", current->pid, task->pid));
read_lock(&tasklist_lock);
@@ -2391,10 +4010,10 @@
*/
ctx = p->thread.pfm_context;
- DBprintk((" [%d] scanning task [%d] ctx=%p\n", task->pid, p->pid, ctx));
+ //DBprintk(("[%d] scanning task [%d] ctx=%p\n", task->pid, p->pid, ctx));
if (ctx && ctx->ctx_notify_task == task) {
- DBprintk((" trying for notifier %d in %d\n", task->pid, p->pid));
+ DBprintk(("trying for notifier [%d] in [%d]\n", task->pid, p->pid));
/*
* the spinlock is required to take care of a race condition
* with the send_sig_info() call. We must make sure that
@@ -2408,23 +4027,145 @@
* to a completely destroyed task or worse to a new task using the same
* task_struct address.
*/
- spin_lock(&ctx->ctx_notify_lock);
+ LOCK_CTX(ctx);
ctx->ctx_notify_task = NULL;
- spin_unlock(&ctx->ctx_notify_lock);
+ UNLOCK_CTX(ctx);
- DBprintk((" done for notifier %d in %d\n", task->pid, p->pid));
+ DBprintk(("done for notifier [%d] in [%d]\n", task->pid, p->pid));
}
}
read_unlock(&tasklist_lock);
+ atomic_set(&task->thread.pfm_notifiers_check, 0);
+}
+
+static struct irqaction perfmon_irqaction = {
+ handler: perfmon_interrupt,
+ flags: SA_INTERRUPT,
+ name: "perfmon"
+};
+
+
+static void
+pfm_pmu_snapshot(void)
+{
+ int i;
+
+ for (i=0; i < IA64_NUM_PMC_REGS; i++) {
+ if (i >= pmu_conf.num_pmcs) break;
+ if (PMC_IS_IMPL(i)) reset_pmcs[i] = ia64_get_pmc(i);
+ }
+#ifdef CONFIG_MCKINLEY
+ /*
+ * set the 'stupid' enable bit to power the PMU!
+ */
+ reset_pmcs[4] |= 1UL << 23;
+#endif
+}
+
+/*
+ * perfmon initialization routine, called from the initcall() table
+ */
+int __init
+perfmon_init (void)
+{
+ pal_perf_mon_info_u_t pm_info;
+ s64 status;
+
+ register_percpu_irq(IA64_PERFMON_VECTOR, &perfmon_irqaction);
+
+ ia64_set_pmv(IA64_PERFMON_VECTOR);
+ ia64_srlz_d();
+
+ pmu_conf.pfm_is_disabled = 1;
+
+ printk("perfmon: version %u.%u (sampling format v%u.%u) IRQ %u\n",
+ PFM_VERSION_MAJ,
+ PFM_VERSION_MIN,
+ PFM_SMPL_VERSION_MAJ,
+ PFM_SMPL_VERSION_MIN,
+ IA64_PERFMON_VECTOR);
+
+ if ((status=ia64_pal_perf_mon_info(pmu_conf.impl_regs, &pm_info)) != 0) {
+ printk("perfmon: PAL call failed (%ld), perfmon disabled\n", status);
+ return -1;
+ }
+
+ pmu_conf.perf_ovfl_val = (1UL << pm_info.pal_perf_mon_info_s.width) - 1;
+ pmu_conf.max_counters = pm_info.pal_perf_mon_info_s.generic;
+ pmu_conf.num_pmcs = find_num_pm_regs(pmu_conf.impl_regs);
+ pmu_conf.num_pmds = find_num_pm_regs(&pmu_conf.impl_regs[4]);
+
+ printk("perfmon: %u-bit counters\n", pm_info.pal_perf_mon_info_s.width);
+
+ printk("perfmon: %lu PMC/PMD pairs, %lu PMCs, %lu PMDs\n",
+ pmu_conf.max_counters, pmu_conf.num_pmcs, pmu_conf.num_pmds);
+
+ /* sanity check */
+ if (pmu_conf.num_pmds >= IA64_NUM_PMD_REGS || pmu_conf.num_pmcs >= IA64_NUM_PMC_REGS) {
+ printk(KERN_ERR "perfmon: not enough pmc/pmd, perfmon is DISABLED\n");
+ return -1; /* no need to continue anyway */
+ }
+
+ if (ia64_pal_debug_info(&pmu_conf.num_ibrs, &pmu_conf.num_dbrs)) {
+ printk(KERN_WARNING "perfmon: unable to get number of debug registers\n");
+ pmu_conf.num_ibrs = pmu_conf.num_dbrs = 0;
+ }
+ /* PAL reports the number of pairs */
+ pmu_conf.num_ibrs <<=1;
+ pmu_conf.num_dbrs <<=1;
+
+ /*
+ * take a snapshot of all PMU registers. PAL is supposed
+ * to configure them with stable/safe values, i.e., not
+ * capturing anything.
+ * We take a snapshot now, before we make any modifications. This
+ * will become our master copy. Then we will reuse the snapshot
+ * to reset the PMU in pfm_enable(). Using this technique, perfmon
+ * does NOT have to know about the specific values to program for
+ * the PMC/PMD. The safe values may be different from one CPU model to
+ * the other.
+ */
+ pfm_pmu_snapshot();
+
+ /*
+ * setup the register configuration descriptions for the CPU
+ */
+ pmu_conf.pmc_desc = pmc_desc;
+ pmu_conf.pmd_desc = pmd_desc;
+
+ /* we are all set */
+ pmu_conf.pfm_is_disabled = 0;
+
+ /*
+ * for now here for debug purposes
+ */
+ perfmon_dir = create_proc_read_entry ("perfmon", 0, 0, perfmon_read_entry, NULL);
+
+ pfm_sysctl_header = register_sysctl_table(pfm_sysctl_root, 0);
+
+ spin_lock_init(&pfm_sessions.pfs_lock);
+
+ return 0;
+}
+
+__initcall(perfmon_init);
+
+void
+perfmon_init_percpu (void)
+{
+ ia64_set_pmv(IA64_PERFMON_VECTOR);
+ ia64_srlz_d();
}
+
#else /* !CONFIG_PERFMON */
asmlinkage int
-sys_perfmonctl (int pid, int cmd, int flags, perfmon_req_t *req, int count, long arg6, long arg7, long arg8, long stack)
+sys_perfmonctl (int pid, int cmd, void *req, int count, long arg5, long arg6,
+ long arg7, long arg8, long stack)
{
return -ENOSYS;
}
diff -urN linux-2.4.18/arch/ia64/kernel/perfmon_generic.h lia64-2.4/arch/ia64/kernel/perfmon_generic.h
--- linux-2.4.18/arch/ia64/kernel/perfmon_generic.h Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/kernel/perfmon_generic.h Wed Apr 10 11:16:59 2002
@@ -0,0 +1,29 @@
+#define RDEP(x) (1UL<<(x))
+
+#ifdef CONFIG_ITANIUM
+#error "This file should not be used when CONFIG_ITANIUM is defined"
+#endif
+
+static pfm_reg_desc_t pmc_desc[256]={
+/* pmc0 */ { PFM_REG_CONTROL, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc1 */ { PFM_REG_CONTROL, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc2 */ { PFM_REG_CONTROL, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc3 */ { PFM_REG_CONTROL, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc4 */ { PFM_REG_COUNTING, 0, NULL, NULL, {RDEP(4),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc5 */ { PFM_REG_COUNTING, 0, NULL, NULL, {RDEP(5),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc6 */ { PFM_REG_COUNTING, 0, NULL, NULL, {RDEP(6),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc7 */ { PFM_REG_COUNTING, 0, NULL, NULL, {RDEP(7),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+ { PFM_REG_NONE, 0, NULL, NULL, {0,}, {0,}}, /* end marker */
+};
+
+static pfm_reg_desc_t pmd_desc[256]={
+/* pmd0 */ { PFM_REG_NOTIMPL, 0, NULL, NULL, {0,}, {0,}},
+/* pmd1 */ { PFM_REG_NOTIMPL, 0, NULL, NULL, {0,}, {0,}},
+/* pmd2 */ { PFM_REG_NOTIMPL, 0, NULL, NULL, {0,}, {0,}},
+/* pmd3 */ { PFM_REG_NOTIMPL, 0, NULL, NULL, {0,}, {0,}},
+/* pmd4 */ { PFM_REG_COUNTING, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {RDEP(4),0UL, 0UL, 0UL}},
+/* pmd5 */ { PFM_REG_COUNTING, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {RDEP(5),0UL, 0UL, 0UL}},
+/* pmd6 */ { PFM_REG_COUNTING, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {RDEP(6),0UL, 0UL, 0UL}},
+/* pmd7 */ { PFM_REG_COUNTING, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {RDEP(7),0UL, 0UL, 0UL}},
+ { PFM_REG_NONE, 0, NULL, NULL, {0,}, {0,}}, /* end marker */
+};
diff -urN linux-2.4.18/arch/ia64/kernel/perfmon_itanium.h lia64-2.4/arch/ia64/kernel/perfmon_itanium.h
--- linux-2.4.18/arch/ia64/kernel/perfmon_itanium.h Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/kernel/perfmon_itanium.h Wed Apr 10 11:16:59 2002
@@ -0,0 +1,65 @@
+#define RDEP(x) (1UL<<(x))
+
+#ifndef CONFIG_ITANIUM
+#error "This file is only valid when CONFIG_ITANIUM is defined"
+#endif
+
+static int pfm_ita_pmc_check(struct task_struct *task, unsigned int cnum, unsigned long *val);
+
+static pfm_reg_desc_t pmc_desc[256]={
+/* pmc0 */ { PFM_REG_CONTROL, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc1 */ { PFM_REG_CONTROL, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc2 */ { PFM_REG_CONTROL, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc3 */ { PFM_REG_CONTROL, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc4 */ { PFM_REG_COUNTING, 0, NULL, NULL, {RDEP(4),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc5 */ { PFM_REG_COUNTING, 0, NULL, NULL, {RDEP(5),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc6 */ { PFM_REG_COUNTING, 0, NULL, NULL, {RDEP(6),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc7 */ { PFM_REG_COUNTING, 0, NULL, NULL, {RDEP(7),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc8 */ { PFM_REG_CONFIG, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc9 */ { PFM_REG_CONFIG, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc10 */ { PFM_REG_MONITOR, 0, NULL, NULL, {RDEP(0)|RDEP(1),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc11 */ { PFM_REG_MONITOR, 0, NULL, pfm_ita_pmc_check, {RDEP(2)|RDEP(3)|RDEP(17),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc12 */ { PFM_REG_MONITOR, 0, NULL, NULL, {RDEP(8)|RDEP(9)|RDEP(10)|RDEP(11)|RDEP(12)|RDEP(13)|RDEP(14)|RDEP(15)|RDEP(16),0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+/* pmc13 */ { PFM_REG_CONFIG, 0, NULL, pfm_ita_pmc_check, {0UL,0UL, 0UL, 0UL}, {0UL,0UL, 0UL, 0UL}},
+ { PFM_REG_NONE, 0, NULL, NULL, {0,}, {0,}}, /* end marker */
+};
+
+static pfm_reg_desc_t pmd_desc[256]={
+/* pmd0 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(1),0UL, 0UL, 0UL}, {RDEP(10),0UL, 0UL, 0UL}},
+/* pmd1 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(0),0UL, 0UL, 0UL}, {RDEP(10),0UL, 0UL, 0UL}},
+/* pmd2 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(3)|RDEP(17),0UL, 0UL, 0UL}, {RDEP(11),0UL, 0UL, 0UL}},
+/* pmd3 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(2)|RDEP(17),0UL, 0UL, 0UL}, {RDEP(11),0UL, 0UL, 0UL}},
+/* pmd4 */ { PFM_REG_COUNTING, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {RDEP(4),0UL, 0UL, 0UL}},
+/* pmd5 */ { PFM_REG_COUNTING, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {RDEP(5),0UL, 0UL, 0UL}},
+/* pmd6 */ { PFM_REG_COUNTING, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {RDEP(6),0UL, 0UL, 0UL}},
+/* pmd7 */ { PFM_REG_COUNTING, 0, NULL, NULL, {0UL,0UL, 0UL, 0UL}, {RDEP(7),0UL, 0UL, 0UL}},
+/* pmd8 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(9)|RDEP(10)|RDEP(11)|RDEP(12)|RDEP(13)|RDEP(14)|RDEP(15)|RDEP(16),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd9 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(8)|RDEP(10)|RDEP(11)|RDEP(12)|RDEP(13)|RDEP(14)|RDEP(15)|RDEP(16),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd10 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(8)|RDEP(9)|RDEP(11)|RDEP(12)|RDEP(13)|RDEP(14)|RDEP(15)|RDEP(16),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd11 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(8)|RDEP(9)|RDEP(10)|RDEP(12)|RDEP(13)|RDEP(14)|RDEP(15)|RDEP(16),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd12 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(8)|RDEP(9)|RDEP(10)|RDEP(11)|RDEP(13)|RDEP(14)|RDEP(15)|RDEP(16),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd13 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(8)|RDEP(9)|RDEP(10)|RDEP(11)|RDEP(12)|RDEP(14)|RDEP(15)|RDEP(16),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd14 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(8)|RDEP(9)|RDEP(10)|RDEP(11)|RDEP(12)|RDEP(13)|RDEP(15)|RDEP(16),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd15 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(8)|RDEP(9)|RDEP(10)|RDEP(11)|RDEP(12)|RDEP(13)|RDEP(14)|RDEP(16),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd16 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(8)|RDEP(9)|RDEP(10)|RDEP(11)|RDEP(12)|RDEP(13)|RDEP(14)|RDEP(15),0UL, 0UL, 0UL}, {RDEP(12),0UL, 0UL, 0UL}},
+/* pmd17 */ { PFM_REG_BUFFER, 0, NULL, NULL, {RDEP(2)|RDEP(3),0UL, 0UL, 0UL}, {RDEP(11),0UL, 0UL, 0UL}},
+ { PFM_REG_NONE, 0, NULL, NULL, {0,}, {0,}}, /* end marker */
+};
+
+static int
+pfm_ita_pmc_check(struct task_struct *task, unsigned int cnum, unsigned long *val)
+{
+ pfm_context_t *ctx = task->thread.pfm_context;
+
+ if (cnum == 13 && (*val & 0x1) && ctx->ctx_fl_using_dbreg == 0) {
+ DBprintk(("cannot configure range restriction without initializing the instruction debug registers first\n"));
+ return -EINVAL;
+ }
+
+ if (cnum == 11 && ((*val >> 28)& 0x1) == 0 && ctx->ctx_fl_using_dbreg == 0) {
+ DBprintk(("cannot configure range restriction without initializing the data debug registers first pmc11=0x%lx\n", *val));
+ return -EINVAL;
+ }
+ return 0;
+}
+
diff -urN linux-2.4.18/arch/ia64/kernel/process.c lia64-2.4/arch/ia64/kernel/process.c
--- linux-2.4.18/arch/ia64/kernel/process.c Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/process.c Tue Feb 26 14:53:42 2002
@@ -1,8 +1,8 @@
/*
* Architecture-specific setup.
*
- * Copyright (C) 1998-2001 Hewlett-Packard Co
- * Copyright (C) 1998-2001 David Mosberger-Tang
+ * Copyright (C) 1998-2002 Hewlett-Packard Co
+ * David Mosberger-Tang
*/
#define __KERNEL_SYSCALLS__ /* see */
#include
@@ -12,6 +12,7 @@
#include
#include
#include
+#include
#include
#include
#include
@@ -28,6 +29,10 @@
#include
#include
+#ifdef CONFIG_IA64_SGI_SN
+#include
+#endif
+
static void
do_show_stack (struct unw_frame_info *info, void *arg)
{
@@ -46,6 +51,15 @@
}
void
+show_trace_task (struct task_struct *task)
+{
+ struct unw_frame_info info;
+
+ unw_init_from_blocked_task(&info, task);
+ do_show_stack(&info, 0);
+}
+
+void
show_stack (struct task_struct *task)
{
if (!task)
@@ -90,8 +104,8 @@
printk("r26 : %016lx r27 : %016lx r28 : %016lx\n", regs->r26, regs->r27, regs->r28);
printk("r29 : %016lx r30 : %016lx r31 : %016lx\n", regs->r29, regs->r30, regs->r31);
- /* print the stacked registers if cr.ifs is valid: */
- if (regs->cr_ifs & 0x8000000000000000) {
+ if (user_mode(regs)) {
+ /* print the stacked registers */
unsigned long val, sof, *bsp, ndirty;
int i, is_nat = 0;
@@ -122,8 +136,18 @@
if (!current->need_resched)
min_xtp();
#endif
- while (!current->need_resched)
+
+ while (!current->need_resched) {
+#ifdef CONFIG_IA64_SGI_SN
+ snidle();
+#endif
continue;
+ }
+
+#ifdef CONFIG_IA64_SGI_SN
+ snidleoff();
+#endif
+
#ifdef CONFIG_SMP
normal_xtp();
#endif
@@ -139,10 +163,17 @@
{
if ((task->thread.flags & IA64_THREAD_DBG_VALID) != 0)
ia64_save_debug_regs(&task->thread.dbr[0]);
+
#ifdef CONFIG_PERFMON
if ((task->thread.flags & IA64_THREAD_PM_VALID) != 0)
pfm_save_regs(task);
+
+# ifdef CONFIG_SMP
+ if (local_cpu_data->pfm_syst_wide)
+ pfm_syst_wide_update_task(task, 0);
+# endif
#endif
+
if (IS_IA32_PROCESS(ia64_task_regs(task)))
ia32_save_state(task);
}
@@ -152,10 +183,17 @@
{
if ((task->thread.flags & IA64_THREAD_DBG_VALID) != 0)
ia64_load_debug_regs(&task->thread.dbr[0]);
+
#ifdef CONFIG_PERFMON
if ((task->thread.flags & IA64_THREAD_PM_VALID) != 0)
pfm_load_regs(task);
+
+# ifdef CONFIG_SMP
+ if (local_cpu_data->pfm_syst_wide)
+ pfm_syst_wide_update_task(task, 1);
+# endif
#endif
+
if (IS_IA32_PROCESS(ia64_task_regs(task)))
ia32_load_state(task);
}
@@ -235,7 +273,7 @@
if (user_mode(child_ptregs)) {
if (user_stack_base) {
- child_ptregs->r12 = user_stack_base + user_stack_size;
+ child_ptregs->r12 = user_stack_base + user_stack_size - 16;
child_ptregs->ar_bspstore = user_stack_base;
child_ptregs->ar_rnat = 0;
child_ptregs->loadrs = 0;
@@ -288,9 +326,15 @@
if (IS_IA32_PROCESS(ia64_task_regs(current)))
ia32_save_state(p);
#endif
+
#ifdef CONFIG_PERFMON
- if (p->thread.pfm_context)
- retval = pfm_inherit(p, child_ptregs);
+ /*
+ * reset notifiers and owner check (may not have a perfmon context)
+ */
+ atomic_set(&p->thread.pfm_notifiers_check, 0);
+ atomic_set(&p->thread.pfm_owners_check, 0);
+
+ if (current->thread.pfm_context) retval = pfm_inherit(p, child_ptregs);
#endif
return retval;
}
@@ -414,6 +458,16 @@
return error;
}
+void
+ia64_set_personality (struct elf64_hdr *elf_ex, int ibcs2_interpreter)
+{
+ set_personality(PER_LINUX);
+ if (elf_ex->e_flags & EF_IA_64_LINUX_EXECUTABLE_STACK)
+ current->thread.flags |= IA64_THREAD_XSTACK;
+ else
+ current->thread.flags &= ~IA64_THREAD_XSTACK;
+}
+
pid_t
kernel_thread (int (*fn)(void *), void *arg, unsigned long flags)
{
@@ -445,15 +499,15 @@
#ifdef CONFIG_PERFMON
/*
- * By the time we get here, the task is detached from the tasklist. This is important
- * because it means that no other tasks can ever find it as a notifiied task, therfore
- * there is no race condition between this code and let's say a pfm_context_create().
- * Conversely, the pfm_cleanup_notifiers() cannot try to access a task's pfm context if
- * this other task is in the middle of its own pfm_context_exit() because it would alreayd
- * be out of the task list. Note that this case is very unlikely between a direct child
- * and its parents (if it is the notified process) because of the way the exit is notified
- * via SIGCHLD.
+ * by the time we get here, the task is detached from the tasklist. This is important
+ * because it means that no other tasks can ever find it as a notified task, therefore there
+ * is no race condition between this code and let's say a pfm_context_create().
+ * Conversely, the pfm_cleanup_notifiers() cannot try to access a task's pfm context if this
+ * other task is in the middle of its own pfm_context_exit() because it would already be out of
+ * the task list. Note that this case is very unlikely between a direct child and its parents
+ * (if it is the notified process) because of the way the exit is notified via SIGCHLD.
*/
+
void
release_thread (struct task_struct *task)
{
@@ -462,6 +516,12 @@
if (atomic_read(&task->thread.pfm_notifiers_check) > 0)
pfm_cleanup_notifiers(task);
+
+ if (atomic_read(&task->thread.pfm_owners_check) > 0)
+ pfm_cleanup_owners(task);
+
+ if (task->thread.pfm_smpl_buf_list)
+ pfm_cleanup_smpl_buf(task);
}
#endif
@@ -477,21 +537,13 @@
ia64_set_fpu_owner(0);
#endif
#ifdef CONFIG_PERFMON
- /* stop monitoring */
- if ((current->thread.flags & IA64_THREAD_PM_VALID) != 0) {
- /*
- * we cannot rely on switch_to() to save the PMU
- * context for the last time. There is a possible race
- * condition in SMP mode between the child and the
- * parent. by explicitly saving the PMU context here
- * we garantee no race. this call we also stop
- * monitoring
- */
+ /* if needed, stop monitoring and flush state to perfmon context */
+ if (current->thread.pfm_context)
pfm_flush_regs(current);
- /*
- * make sure that switch_to() will not save context again
- */
- current->thread.flags &= ~IA64_THREAD_PM_VALID;
+
+ /* free debug register resources */
+ if ((current->thread.flags & IA64_THREAD_DBG_VALID) != 0) {
+ pfm_release_debug_registers(current);
}
#endif
}
diff -urN linux-2.4.18/arch/ia64/kernel/ptrace.c lia64-2.4/arch/ia64/kernel/ptrace.c
--- linux-2.4.18/arch/ia64/kernel/ptrace.c Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/ptrace.c Fri Feb 22 16:40:10 2002
@@ -1,7 +1,7 @@
/*
* Kernel support for the ptrace() and syscall tracing interfaces.
*
- * Copyright (C) 1999-2001 Hewlett-Packard Co
+ * Copyright (C) 1999-2002 Hewlett-Packard Co
* David Mosberger-Tang
*
* Derived from the x86 and Alpha versions. Most of the code in here
@@ -23,6 +23,9 @@
#include
#include
#include
+#ifdef CONFIG_PERFMON
+# include
+#endif
/*
* Bits in the PSR that we allow ptrace() to change:
@@ -755,11 +758,6 @@
} else {
/* access debug registers */
- if (!(child->thread.flags & IA64_THREAD_DBG_VALID)) {
- child->thread.flags |= IA64_THREAD_DBG_VALID;
- memset(child->thread.dbr, 0, sizeof(child->thread.dbr));
- memset(child->thread.ibr, 0, sizeof(child->thread.ibr));
- }
if (addr >= PT_IBR) {
regnum = (addr - PT_IBR) >> 3;
ptr = &child->thread.ibr[0];
@@ -772,6 +770,30 @@
dprintk("ptrace: rejecting access to register address 0x%lx\n", addr);
return -1;
}
+#ifdef CONFIG_PERFMON
+ /*
+ * Check if debug registers are used by perfmon. This test must be done
+ * once we know that we can do the operation, i.e. the arguments are all
+ * valid, but before we start modifying the state.
+ *
+ * Perfmon needs to keep a count of how many processes are trying to
+ * modify the debug registers for system wide monitoring sessions.
+ *
+ * We also include read access here, because they may cause the
+ * PMU-installed debug register state (dbr[], ibr[]) to be reset. The two
+ * arrays are also used by perfmon, but we do not use
+ * IA64_THREAD_DBG_VALID. The registers are restored by the PMU context
+ * switch code.
+ */
+ if (pfm_use_debug_registers(child))
+ return -1;
+#endif
+
+ if (!(child->thread.flags & IA64_THREAD_DBG_VALID)) {
+ child->thread.flags |= IA64_THREAD_DBG_VALID;
+ memset(child->thread.dbr, 0, sizeof(child->thread.dbr));
+ memset(child->thread.ibr, 0, sizeof(child->thread.ibr));
+ }
ptr += regnum;
@@ -789,6 +811,260 @@
return 0;
}
+static long
+ptrace_getregs (struct task_struct *child, struct pt_all_user_regs *ppr)
+{
+ struct switch_stack *sw;
+ struct pt_regs *pt;
+ long ret, retval;
+ struct unw_frame_info info;
+ char nat = 0;
+ int i;
+
+ retval = verify_area(VERIFY_WRITE, ppr, sizeof(struct pt_all_user_regs));
+ if (retval != 0) {
+ return -EIO;
+ }
+
+ pt = ia64_task_regs(child);
+ sw = (struct switch_stack *) (child->thread.ksp + 16);
+ unw_init_from_blocked_task(&info, child);
+ if (unw_unwind_to_user(&info) < 0) {
+ return -EIO;
+ }
+
+ if (((unsigned long) ppr & 0x7) != 0) {
+ dprintk("ptrace: unaligned register address %p\n", ppr);
+ return -EIO;
+ }
+
+ retval = 0;
+
+ /* control regs */
+
+ retval |= __put_user(pt->cr_iip, &ppr->cr_iip);
+ retval |= access_uarea(child, PT_CR_IPSR, &ppr->cr_ipsr, 0);
+
+ /* app regs */
+
+ retval |= __put_user(pt->ar_pfs, &ppr->ar[PT_AUR_PFS]);
+ retval |= __put_user(pt->ar_rsc, &ppr->ar[PT_AUR_RSC]);
+ retval |= __put_user(pt->ar_bspstore, &ppr->ar[PT_AUR_BSPSTORE]);
+ retval |= __put_user(pt->ar_unat, &ppr->ar[PT_AUR_UNAT]);
+ retval |= __put_user(pt->ar_ccv, &ppr->ar[PT_AUR_CCV]);
+ retval |= __put_user(pt->ar_fpsr, &ppr->ar[PT_AUR_FPSR]);
+
+ retval |= access_uarea(child, PT_AR_EC, &ppr->ar[PT_AUR_EC], 0);
+ retval |= access_uarea(child, PT_AR_LC, &ppr->ar[PT_AUR_LC], 0);
+ retval |= access_uarea(child, PT_AR_RNAT, &ppr->ar[PT_AUR_RNAT], 0);
+ retval |= access_uarea(child, PT_AR_BSP, &ppr->ar[PT_AUR_BSP], 0);
+ retval |= access_uarea(child, PT_CFM, &ppr->cfm, 0);
+
+ /* gr1-gr3 */
+
+ retval |= __copy_to_user(&ppr->gr[1], &pt->r1, sizeof(long) * 3);
+
+ /* gr4-gr7 */
+
+ for (i = 4; i < 8; i++) {
+ retval |= unw_access_gr(&info, i, &ppr->gr[i], &nat, 0);
+ }
+
+ /* gr8-gr11 */
+
+ retval |= __copy_to_user(&ppr->gr[8], &pt->r8, sizeof(long) * 4);
+
+ /* gr12-gr15 */
+
+ retval |= __copy_to_user(&ppr->gr[12], &pt->r12, sizeof(long) * 4);
+
+ /* gr16-gr31 */
+
+ retval |= __copy_to_user(&ppr->gr[16], &pt->r16, sizeof(long) * 16);
+
+ /* b0 */
+
+ retval |= __put_user(pt->b0, &ppr->br[0]);
+
+ /* b1-b5 */
+
+ for (i = 1; i < 6; i++) {
+ retval |= unw_access_br(&info, i, &ppr->br[i], 0);
+ }
+
+ /* b6-b7 */
+
+ retval |= __put_user(pt->b6, &ppr->br[6]);
+ retval |= __put_user(pt->b7, &ppr->br[7]);
+
+ /* fr2-fr5 */
+
+ for (i = 2; i < 6; i++) {
+ retval |= access_fr(&info, i, 0, (unsigned long *) &ppr->fr[i], 0);
+ retval |= access_fr(&info, i, 1, (unsigned long *) &ppr->fr[i] + 1, 0);
+ }
+
+ /* fr6-fr9 */
+
+ retval |= __copy_to_user(&ppr->fr[6], &pt->f6, sizeof(struct ia64_fpreg) * 4);
+
+ /* fp scratch regs(10-15) */
+
+ retval |= __copy_to_user(&ppr->fr[10], &sw->f10, sizeof(struct ia64_fpreg) * 6);
+
+ /* fr16-fr31 */
+
+ for (i = 16; i < 32; i++) {
+ retval |= access_fr(&info, i, 0, (unsigned long *) &ppr->fr[i], 0);
+ retval |= access_fr(&info, i, 1, (unsigned long *) &ppr->fr[i] + 1, 0);
+ }
+
+ /* fph */
+
+ ia64_flush_fph(child);
+ retval |= __copy_to_user(&ppr->fr[32], &child->thread.fph, sizeof(ppr->fr[32]) * 96);
+
+ /* preds */
+
+ retval |= __put_user(pt->pr, &ppr->pr);
+
+ /* nat bits */
+
+ retval |= access_uarea(child, PT_NAT_BITS, &ppr->nat, 0);
+
+ ret = retval ? -EIO : 0;
+ return ret;
+}
+
+static long
+ptrace_setregs (struct task_struct *child, struct pt_all_user_regs *ppr)
+{
+ struct switch_stack *sw;
+ struct pt_regs *pt;
+ long ret, retval;
+ struct unw_frame_info info;
+ char nat = 0;
+ int i;
+
+ retval = verify_area(VERIFY_READ, ppr, sizeof(struct pt_all_user_regs));
+ if (retval != 0) {
+ return -EIO;
+ }
+
+ pt = ia64_task_regs(child);
+ sw = (struct switch_stack *) (child->thread.ksp + 16);
+ unw_init_from_blocked_task(&info, child);
+ if (unw_unwind_to_user(&info) < 0) {
+ return -EIO;
+ }
+
+ if (((unsigned long) ppr & 0x7) != 0) {
+ dprintk("ptrace:unaligned register address %p\n", ppr);
+ return -EIO;
+ }
+
+ retval = 0;
+
+ /* control regs */
+
+ retval |= __get_user(pt->cr_iip, &ppr->cr_iip);
+ retval |= access_uarea(child, PT_CR_IPSR, &ppr->cr_ipsr, 1);
+
+ /* app regs */
+
+ retval |= __get_user(pt->ar_pfs, &ppr->ar[PT_AUR_PFS]);
+ retval |= __get_user(pt->ar_rsc, &ppr->ar[PT_AUR_RSC]);
+ retval |= __get_user(pt->ar_bspstore, &ppr->ar[PT_AUR_BSPSTORE]);
+ retval |= __get_user(pt->ar_unat, &ppr->ar[PT_AUR_UNAT]);
+ retval |= __get_user(pt->ar_ccv, &ppr->ar[PT_AUR_CCV]);
+ retval |= __get_user(pt->ar_fpsr, &ppr->ar[PT_AUR_FPSR]);
+
+ retval |= access_uarea(child, PT_AR_EC, &ppr->ar[PT_AUR_EC], 1);
+ retval |= access_uarea(child, PT_AR_LC, &ppr->ar[PT_AUR_LC], 1);
+ retval |= access_uarea(child, PT_AR_RNAT, &ppr->ar[PT_AUR_RNAT], 1);
+ retval |= access_uarea(child, PT_AR_BSP, &ppr->ar[PT_AUR_BSP], 1);
+ retval |= access_uarea(child, PT_CFM, &ppr->cfm, 1);
+
+ /* gr1-gr3 */
+
+ retval |= __copy_from_user(&pt->r1, &ppr->gr[1], sizeof(long) * 3);
+
+ /* gr4-gr7 */
+
+ for (i = 4; i < 8; i++) {
+ long ret = unw_get_gr(&info, i, &ppr->gr[i], &nat);
+ if (ret < 0) {
+ return ret;
+ }
+ retval |= unw_access_gr(&info, i, &ppr->gr[i], &nat, 1);
+ }
+
+ /* gr8-gr11 */
+
+ retval |= __copy_from_user(&pt->r8, &ppr->gr[8], sizeof(long) * 4);
+
+ /* gr12-gr15 */
+
+ retval |= __copy_from_user(&pt->r12, &ppr->gr[12], sizeof(long) * 4);
+
+ /* gr16-gr31 */
+
+ retval |= __copy_from_user(&pt->r16, &ppr->gr[16], sizeof(long) * 16);
+
+ /* b0 */
+
+ retval |= __get_user(pt->b0, &ppr->br[0]);
+
+ /* b1-b5 */
+
+ for (i = 1; i < 6; i++) {
+ retval |= unw_access_br(&info, i, &ppr->br[i], 1);
+ }
+
+ /* b6-b7 */
+
+ retval |= __get_user(pt->b6, &ppr->br[6]);
+ retval |= __get_user(pt->b7, &ppr->br[7]);
+
+ /* fr2-fr5 */
+
+ for (i = 2; i < 6; i++) {
+ retval |= access_fr(&info, i, 0, (unsigned long *) &ppr->fr[i], 1);
+ retval |= access_fr(&info, i, 1, (unsigned long *) &ppr->fr[i] + 1, 1);
+ }
+
+ /* fr6-fr9 */
+
+ retval |= __copy_from_user(&pt->f6, &ppr->fr[6], sizeof(ppr->fr[6]) * 4);
+
+ /* fp scratch regs(10-15) */
+
+ retval |= __copy_from_user(&sw->f10, &ppr->fr[10], sizeof(ppr->fr[10]) * 6);
+
+ /* fr16-fr31 */
+
+ for (i = 16; i < 32; i++) {
+ retval |= access_fr(&info, i, 0, (unsigned long *) &ppr->fr[i], 1);
+ retval |= access_fr(&info, i, 1, (unsigned long *) &ppr->fr[i] + 1, 1);
+ }
+
+ /* fph */
+
+ ia64_sync_fph(child);
+ retval |= __copy_from_user(&child->thread.fph, &ppr->fr[32], sizeof(ppr->fr[32]) * 96);
+
+ /* preds */
+
+ retval |= __get_user(pt->pr, &ppr->pr);
+
+ /* nat bits */
+
+ retval |= access_uarea(child, PT_NAT_BITS, &ppr->nat, 1);
+
+ ret = retval ? -EIO : 0;
+ return ret;
+}
+
/*
* Called by kernel/ptrace.c when detaching..
*
@@ -977,6 +1253,14 @@
case PTRACE_DETACH: /* detach a process that was attached. */
ret = ptrace_detach(child, data);
+ goto out_tsk;
+
+ case PTRACE_GETREGS:
+ ret = ptrace_getregs(child, (struct pt_all_user_regs*) data);
+ goto out_tsk;
+
+ case PTRACE_SETREGS:
+ ret = ptrace_setregs(child, (struct pt_all_user_regs*) data);
goto out_tsk;
default:
diff -urN linux-2.4.18/arch/ia64/kernel/sal.c lia64-2.4/arch/ia64/kernel/sal.c
--- linux-2.4.18/arch/ia64/kernel/sal.c Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/sal.c Fri Dec 14 15:50:57 2001
@@ -18,7 +18,8 @@
#include
#include
-spinlock_t sal_lock = SPIN_LOCK_UNLOCKED;
+spinlock_t sal_lock __cacheline_aligned = SPIN_LOCK_UNLOCKED;
+unsigned long sal_platform_features;
static struct {
void *addr; /* function entry point */
@@ -76,7 +77,7 @@
return str;
}
-static void __init
+static void __init
ia64_sal_handler_init (void *entry_point, void *gpval)
{
/* fill in the SAL procedure descriptor and point ia64_sal to it: */
@@ -102,7 +103,7 @@
if (strncmp(systab->signature, "SST_", 4) != 0)
printk("bad signature in system table!");
- /*
+ /*
* revisions are coded in BCD, so %x does the job for us
*/
printk("SAL v%x.%02x: oem=%.32s, product=%.32s\n",
@@ -152,12 +153,12 @@
case SAL_DESC_PLATFORM_FEATURE:
{
struct ia64_sal_desc_platform_feature *pf = (void *) p;
+ sal_platform_features = pf->feature_mask;
printk("SAL: Platform features ");
- if (pf->feature_mask & (1 << 0))
+ if (pf->feature_mask & IA64_SAL_PLATFORM_FEATURE_BUS_LOCK)
printk("BusLock ");
-
- if (pf->feature_mask & (1 << 1)) {
+ if (pf->feature_mask & IA64_SAL_PLATFORM_FEATURE_IRQ_REDIR_HINT) {
printk("IRQ_Redirection ");
#ifdef CONFIG_SMP
if (no_int_routing)
@@ -166,15 +167,17 @@
smp_int_redirect |= SMP_IRQ_REDIRECTION;
#endif
}
- if (pf->feature_mask & (1 << 2)) {
+ if (pf->feature_mask & IA64_SAL_PLATFORM_FEATURE_IPI_REDIR_HINT) {
printk("IPI_Redirection ");
#ifdef CONFIG_SMP
- if (no_int_routing)
+ if (no_int_routing)
smp_int_redirect &= ~SMP_IPI_REDIRECTION;
else
smp_int_redirect |= SMP_IPI_REDIRECTION;
#endif
}
+ if (pf->feature_mask & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)
+ printk("ITC_Drift ");
printk("\n");
break;
}
diff -urN linux-2.4.18/arch/ia64/kernel/salinfo.c lia64-2.4/arch/ia64/kernel/salinfo.c
--- linux-2.4.18/arch/ia64/kernel/salinfo.c Wed Dec 31 16:00:00 1969
+++ lia64-2.4/arch/ia64/kernel/salinfo.c Tue Feb 26 14:36:07 2002
@@ -0,0 +1,105 @@
+/*
+ * salinfo.c
+ *
+ * Creates entries in /proc/sal for various system features.
+ *
+ * Copyright (c) 2001 Silicon Graphics, Inc. All rights reserved.
+ *
+ * 10/30/2001 jbarnes@sgi.com copied much of Stephane's palinfo
+ * code to create this file
+ */
+
+#include <linux/types.h>
+#include <linux/proc_fs.h>
+#include <linux/module.h>
+
+#include <asm/sal.h>
+
+MODULE_AUTHOR("Jesse Barnes <jbarnes@sgi.com>");
+MODULE_DESCRIPTION("/proc interface to IA-64 SAL features");
+MODULE_LICENSE("GPL");
+
+static int salinfo_read(char *page, char **start, off_t off, int count, int *eof, void *data);
+
+typedef struct {
+ const char *name; /* name of the proc entry */
+ unsigned long feature; /* feature bit */
+ struct proc_dir_entry *entry; /* registered entry (removal) */
+} salinfo_entry_t;
+
+/*
+ * List {name,feature} pairs for every entry in /proc/sal/
+ * that this module exports
+ */
+static salinfo_entry_t salinfo_entries[]={
+ { "bus_lock", IA64_SAL_PLATFORM_FEATURE_BUS_LOCK, },
+ { "irq_redirection", IA64_SAL_PLATFORM_FEATURE_IRQ_REDIR_HINT, },
+ { "ipi_redirection", IA64_SAL_PLATFORM_FEATURE_IPI_REDIR_HINT, },
+ { "itc_drift", IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT, },
+};
+
+#define NR_SALINFO_ENTRIES (sizeof(salinfo_entries)/sizeof(salinfo_entry_t))
+
+/*
+ * One for each feature and one more for the directory entry...
+ */
+static struct proc_dir_entry *salinfo_proc_entries[NR_SALINFO_ENTRIES + 1];
+
+static int __init
+salinfo_init(void)
+{
+ struct proc_dir_entry *salinfo_dir; /* /proc/sal dir entry */
+ struct proc_dir_entry **sdir = salinfo_proc_entries; /* keeps track of every entry */
+ int i;
+
+ salinfo_dir = proc_mkdir("sal", NULL);
+
+ for (i=0; i < NR_SALINFO_ENTRIES; i++) {
+ /* pass the feature bit in question as misc data */
+ *sdir++ = create_proc_read_entry (salinfo_entries[i].name, 0, salinfo_dir,
+ salinfo_read, (void *)salinfo_entries[i].feature);
+ }
+ *sdir++ = salinfo_dir;
+
+ return 0;
+}
+
+static void __exit
+salinfo_exit(void)
+{
+ int i = 0;
+
+ for (i = 0; i < NR_SALINFO_ENTRIES ; i++) {
+ if (salinfo_proc_entries[i])
+ remove_proc_entry (salinfo_proc_entries[i]->name, NULL);
+ }
+}
+
+/*
+ * 'data' contains an integer that corresponds to the feature we're
+ * testing
+ */
+static int
+salinfo_read(char *page, char **start, off_t off, int count, int *eof, void *data)
+{
+ int len = 0;
+
+ MOD_INC_USE_COUNT;
+
+ len = sprintf(page, (sal_platform_features & (unsigned long)data) ? "1\n" : "0\n");
+
+ if (len <= off+count) *eof = 1;
+
+ *start = page + off;
+ len -= off;
+
+ if (len>count) len = count;
+ if (len<0) len = 0;
+
+ MOD_DEC_USE_COUNT;
+
+ return len;
+}
+
+module_init(salinfo_init);
+module_exit(salinfo_exit);
diff -urN linux-2.4.18/arch/ia64/kernel/setup.c lia64-2.4/arch/ia64/kernel/setup.c
--- linux-2.4.18/arch/ia64/kernel/setup.c Mon Nov 26 11:18:24 2001
+++ lia64-2.4/arch/ia64/kernel/setup.c Wed Apr 10 11:31:13 2002
@@ -3,7 +3,7 @@
*
* Copyright (C) 1998-2001 Hewlett-Packard Co
* David Mosberger-Tang <davidm@hpl.hp.com>
- * Copyright (C) 1998, 1999, 2001 Stephane Eranian <eranian@hpl.hp.com>
+ * Stephane Eranian <eranian@hpl.hp.com>
* Copyright (C) 2000, Rohit Seth <rohit.seth@intel.com>
* Copyright (C) 1999 VA Linux Systems
* Copyright (C) 1999 Walt Drummond <drummond@valinux.com>
@@ -28,8 +28,8 @@
#include
#include
#include
+#include
-#include
#include
#include
#include