Monday, December 10, 2012

Automatically detecting profiles.

Volatility uses a profile to encapsulate information about a specific version of an operating system. As internal kernel data structures change from release to release, so too must Volatility adapt to these changes. The profile is usually generated automatically from debugging symbols taken from a specific version of the operating system. For analyzing Windows systems, Volatility comes with a number of pre-generated profiles for the common Windows releases. For Linux, a profile must almost always be generated for the system under examination, since the layout of Linux data structures changes even with local configuration changes. For that reason we won’t be discussing Linux profiles in this post.
Previous versions of Volatility required the profile to always be explicitly specified. This is fine if you know in advance what version of Windows you have, but sometimes you receive an image taken by a third party which has no context - you don’t know exactly the version or patch level of the image. In previous versions of Volatility you would need to run the imageinfo plugin. This plugin uses differences in the KDBG structure between Windows versions to guess the correct profile. You would then need to copy the detected profile and provide it to all future invocations with the --profile option:
~/volatility/trunk$ python vol.py -f image.raw imageinfo
Volatile Systems Volatility Framework 2.3_alpha
Determining profile based on KDBG search...

          Suggested Profile(s) : WinXPSP2x86, WinXPSP3x86 (Instantiated with WinXPSP2x86)
                     AS Layer1 : JKIA32PagedMemoryPae (Kernel AS)
                     AS Layer2 : FileAddressSpace (image.raw)
                      PAE type : PAE
                           DTB : 0xb94000L
                          KDBG : 0x8054d2e0
          Number of Processors : 2
     Image Type (Service Pack) : 3
                KPCR for CPU 0 : 0xffdff000
                KPCR for CPU 1 : 0xf7777000
             KUSER_SHARED_DATA : 0xffdf0000
           Image date and time : 2012-12-11 00:03:40 UTC+0000
     Image local date and time : 2012-12-10 18:03:40 -0600
The old imageinfo plugin takes a long time to run since it instantiates every profile in turn. It is OK to use this plugin to discover the profile name initially, but we must then still supply that name on every subsequent invocation.
Note: Previous versions of Volatility used incidental information in the kernel debugger block to guess the profile (for example, the size of the KdDebuggerDataBlock struct). This information needed to be maintained by hand for each profile since it is not present in debug symbols. In the next Volatility version, this information is not used, and even the KDBG scanner does not use profile-specific information (the kernel debugger block layout is mostly identical for all versions of Windows).
For the next version of Volatility, I wanted to have an automatic profile selection system, so that users do not need to think about the profile or even provide it at all. To make this work we need a very quick but very reliable method of detecting the profile.

Finding the kernel DTB

One of the first things Volatility does when opening the image is to detect the kernel Directory Table Base (DTB). This is the physical address of the base of the kernel’s page tables. Without a valid DTB it is impossible to create the kernel’s virtual address space - and hence this is the first thing we need.
Volatility finds the address of the DTB by scanning for a well known _EPROCESS signature corresponding to a kernel process. Volatility 2.x scans for the Idle process by searching for the string "Idle". Once the Idle process is found, we can deduce the kernel DTB by following the _EPROCESS.Pcb.DirectoryTableBase member. In order to follow this member we need to know the memory layout of the _EPROCESS struct - which we get from the profile. Therefore in the old 2.x releases of Volatility, we needed to know the profile before determining the DTB.

Guessing the profile

With the next generation of Volatility, profile auto-selection is combined with the DTB scan in order to kill two birds with one stone. The guess_profile plugin is automatically run by the framework when the profile is not provided. This plugin scans for both the DTB and profile parameters at the same time. Deciding on the correct profile comes down to the differences in the _EPROCESS struct layout alone (i.e. we no longer look at differences in the KDBG structure). In the new version, we search for the "System" process instead of the "Idle" process since it is more robust for verification (the Idle process does not always have valid threads, and is not always in the PsActiveProcessHead list).
Since profiles are large and rather expensive to load and compile (there are now over 20 different Windows profiles), we pre-generate a very small test profile containing the important _EPROCESS member offsets for all the currently supported profiles. We then load this concise profile and overlay each version of the _EPROCESS struct over the System process signature, taking care to sanity check the match:
  • Search for a string "System" - this should be the ImageFileName member.
  • For each test profile in the concise look-up map, perform the following:
    • Go back to the start of the _EPROCESS and get the DirectoryTableBase member. Validate the DTB to eliminate obviously wrong values.
    • Instantiate a virtual address space using the DTB value, and use this address space to reflect through both the ActiveProcessLinks and ThreadListHead linked lists. This gives a pretty strong signal that the _EPROCESS address is valid.
    • When we get here we are pretty certain to have found the correct profile and DTB.
When the System _EPROCESS is identified, we set both the DTB and the profile. The above process is pretty fast and actually does not add any more processing than in previous versions. We are still scanning the image once, looking for the System process signature. This algorithm is also very robust.
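The steps above can be sketched in a few lines of standalone Python 3. This is a minimal illustration and not the guess_profile code itself: the profile names and member offsets in CANDIDATE_OFFSETS are made up, the image is a synthetic buffer, and the linked-list reflection check is replaced by a simple DTB plausibility test.

```python
import struct

# Hypothetical look-up map: profile name -> offsets of two _EPROCESS members.
# The numbers are illustrative only and are not taken from real profiles.
CANDIDATE_OFFSETS = {
    "HypotheticalProfileA": {"DirectoryTableBase": 0x18, "ImageFileName": 0x174},
    "HypotheticalProfileB": {"DirectoryTableBase": 0x28, "ImageFileName": 0x2E0},
}


def plausible_dtb(dtb, image_size):
    """Cheap sanity check: non-zero, page aligned and inside the image."""
    return dtb != 0 and dtb % 0x1000 == 0 and dtb < image_size


def guess_profile(image):
    """Return (profile name, _EPROCESS offset, DTB) or None."""
    signature = b"System\x00"
    hit = image.find(signature)
    while hit != -1:
        for name, offsets in CANDIDATE_OFFSETS.items():
            # Step back from ImageFileName to the start of the struct.
            eprocess = hit - offsets["ImageFileName"]
            dtb_pos = eprocess + offsets["DirectoryTableBase"]
            if eprocess < 0 or dtb_pos + 4 > len(image):
                continue
            (dtb,) = struct.unpack_from("<I", image, dtb_pos)
            # The real check also walks ActiveProcessLinks and ThreadListHead
            # through a virtual address space built from this DTB.
            if plausible_dtb(dtb, len(image)):
                return name, eprocess, dtb
        hit = image.find(signature, hit + 1)
    return None


def build_test_image():
    """Synthetic image holding one fake System _EPROCESS laid out as profile B."""
    image = bytearray(0x4000)
    eprocess = 0x1000
    struct.pack_into("<I", image, eprocess + 0x28, 0x3000)
    image[eprocess + 0x2E0:eprocess + 0x2E0 + 7] = b"System\x00"
    return bytes(image)


result = guess_profile(build_test_image())
print(result)
```

The single pass over the image is the point of the design: the signature search is shared by every candidate layout, so adding more profiles to the map only adds a few cheap reads per hit.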
Due to the low cost of this plugin in comparison to the previous imageinfo plugin, we can afford to run this by default at every run. This improves usability since the user doesn’t even need to think about a profile - they just run Volatility as normal:
~/volatility/svn$ ./vol.py -f image.raw pslist
Updating session profile and address spaces.
Offset (V) Name                    PID   PPID   Thds     Hnds   Sess  Wow64 Start                Exit
---------- -------------------- ------ ------ ------ -------- ------ ------ -------------------- --------------------
0x8a6d3490 System                    4      0     83      944 ------  False -                    -
0x89c66990 smss.exe                868      4      2       21 ------  False 2012-12-03 18:33:03  -
0x89cce6c8 csrss.exe               928    868     12      492      0  False 2012-12-03 18:33:05  -
0x89474788 winlogon.exe            952    868     16      520      0  False 2012-12-03 18:33:08  -
0x89465da0 services.exe            996    952     16      299      0  False 2012-12-03 18:33:11  -
0x89458020 lsass.exe              1008    952     25      507      0  False 2012-12-03 18:33:12  -
.....
When no plugin is provided, Volatility drops into the interactive shell. In this case the user can see the chosen profile as part of the Volatility prompt:
~/volatility/svn$ python vol.py -f image.raw
Updating session profile and address spaces.
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
Type "copyright", "credits" or "license" for more information.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License.
WinXPSP3x86PAE:image.raw 07:16:09> pslist
---------------------------------> pslist()
Offset (V) Name                    PID   PPID   Thds     Hnds   Sess  Wow64 Start                Exit
---------- -------------------- ------ ------ ------ -------- ------ ------ -------------------- --------------------
0x8a6d3490 System                    4      0     83      944 ------  False -                    -
0x89c66990 smss.exe                868      4      2       21 ------  False 2012-12-03 18:33:03  -
0x89cce6c8 csrss.exe               928    868     12      492      0  False 2012-12-03 18:33:05  -

Guessing a profile from KDBG

Sometimes we already know a lot of information about the image before we start. For example, when analyzing a crash dump, we already have the DirectoryTableBase member, as well as the KdDebuggerDataBlock address provided for us in the crash dump _DMP_HEADER64 header.
Similarly when using the WinPmem driver for analyzing live systems, we can obtain these structures directly from the driver which collects the kernel’s CR3 and the KdDebuggerDataBlock address.
In these cases, profile auto-selection does add a significant amount of work to startup, since we would not normally need to perform a DTB scan at all, but without a profile we still need to scan for the System process as described above.
To avoid this we need to implement a slightly different algorithm for profile selection when the KDBG and DTB are already known:
  • Locate the _KDDEBUGGER_DATA64.PsActiveProcessHead member for the head of the process list. We also instantiate a virtual kernel address space using the provided DTB.
  • For each _EPROCESS in the test profile, attempt to get the next process in the list from the PsActiveProcessHead. We can then verify this process in the same way as above (i.e. check that we can properly reflect through its lists etc). The profile which works is the correct profile to select.
This method is extremely fast and adds practically nothing to startup time.
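A rough sketch of this second algorithm, in standalone Python 3, might look as follows. It is an illustration under several assumptions: the profile names and offsets are hypothetical, pointers are 32 bits wide, and the "address space" is a flat buffer in which an address is simply an index, so no page table translation takes place.

```python
import struct

# Hypothetical offsets of two _EPROCESS list members for each candidate profile.
CANDIDATES = {
    "HypotheticalProfileY": {"ActiveProcessLinks": 0xA0, "ThreadListHead": 0x1A0},
    "HypotheticalProfileX": {"ActiveProcessLinks": 0x88, "ThreadListHead": 0x190},
}


def read_ptr(mem, addr):
    """Read a 32-bit pointer, or return None if the address is out of range."""
    if addr < 0 or addr + 4 > len(mem):
        return None
    return struct.unpack_from("<I", mem, addr)[0]


def reflects(mem, entry):
    """True if entry.Flink.Blink points back at entry (a _LIST_ENTRY check)."""
    flink = read_ptr(mem, entry)
    if not flink:
        return False
    return read_ptr(mem, flink + 4) == entry


def select_profile(mem, ps_active_process_head):
    """Try each candidate layout on the first process in the list."""
    first_links = read_ptr(mem, ps_active_process_head)
    if not first_links:
        return None
    for name, off in CANDIDATES.items():
        eprocess = first_links - off["ActiveProcessLinks"]
        # The process list entry reflects for every candidate by construction,
        # so the thread list is what actually separates the layouts here.
        if (reflects(mem, eprocess + off["ActiveProcessLinks"])
                and reflects(mem, eprocess + off["ThreadListHead"])):
            return name, eprocess
    return None


# Synthetic memory: list head at 0x100, one process at 0x1000 laid out as X.
memory = bytearray(0x2000)
struct.pack_into("<II", memory, 0x100, 0x1088, 0x1088)    # PsActiveProcessHead
struct.pack_into("<II", memory, 0x1088, 0x100, 0x100)     # ActiveProcessLinks
struct.pack_into("<II", memory, 0x1190, 0x1190, 0x1190)   # empty ThreadListHead

selected = select_profile(memory, 0x100)
print(selected)
```

No scanning is involved at all: the work is a handful of pointer reads per candidate, which is why this path is so much cheaper than searching the image for the System process.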
So now we can avoid having to specify the profile altogether, making the user experience much smoother. As a result the imageinfo plugin is now deprecated (in its current form) in the Tech Preview branch since it is no longer useful.

Sunday, November 18, 2012

Finding the Kernel Debugger Block

The kernel debugger block (named KdDebuggerDataBlock of the type _KDDEBUGGER_DATA64) is important for many things that Volatility and debuggers do. For example, it has a reference to the PsActiveProcessHead which is the list head of all processes required for process listing.
Because it is so useful, the location of the kernel debugger block is also stored in the crash dump header. This helps the kernel debugger find the _KDDEBUGGER_DATA64 quickly.
I wanted to add the ability to write crash dump files to the WinPmem memory acquisition tool. Many commercial acquisition tools also offer this useful feature. Storing the image in crash dump format is convenient since it is possible to open the image using a variety of tools such as Volatility and Microsoft’s own kernel debugger, WinDbg. (Of course it’s possible to convert a raw image to a crash dump using Volatility’s raw2dmp plugin, but it’s more convenient to take the image in crash dump format in the first place.)
So I needed a way to locate the kernel debugger block (KdDebuggerDataBlock) from the running system. Two methods for doing this have been published previously (see Finding Kernel Global Variables in Windows):
  1. The first method involves finding the KPCR (which I talked about previously) and following its KdVersionBlock member. This method is ideal for use on the running system since the KPCR is simply stored in the fs/gs register of the running thread.
  2. The second method which is used by Volatility itself is to scan for KdDebuggerDataBlock using a specific signature for a valid _KDDEBUGGER_DATA64.
Neither of these methods is ideal. The first method was used by win32dd for a time and Matthew Suiche wrote about it here. Unfortunately this method stopped working in recent versions of Windows, where the KdVersionBlock member is always 0 and does not link to the kernel debugger block.
The second method would work but would be quite slow as it scans the entire kernel address space for the KDBG signature. It is also less reliable since there could be a number of hits for the KDBG signature and we might get the wrong hit.
There has to be an easier way :).
The first thing I asked myself was "What exactly is the kernel debugger block?". I used Google to locate a definition for it in the sources for ReactOS (an open source reimplementation of Windows). The relevant snippet shows:
KDDEBUGGER_DATA64 KdDebuggerDataBlock =
{
    {{0}},
    0,
    {(ULONG_PTR)RtlpBreakWithStatusInstruction},
    0,
    FIELD_OFFSET(KTHREAD, CallbackStack),
    FIELD_OFFSET(KCALLOUT_FRAME, CallbackStack),
    FIELD_OFFSET(KCALLOUT_FRAME, CBSTACK_FRAME_POINTER),
    FALSE,
    {(ULONG_PTR)KiCallUserMode},
    0,
    {(ULONG_PTR)&PsLoadedModuleList},
    {(ULONG_PTR)&PsActiveProcessHead},
    {(ULONG_PTR)&PspCidTable},
    {(ULONG_PTR)&ExpSystemResourcesList},
    {(ULONG_PTR)ExpPagedPoolDescriptor},
    {(ULONG_PTR)&ExpNumberOfPagedPools},
    {(ULONG_PTR)&KeTimeIncrement},
...
    {(ULONG_PTR)&IopNumTriageDumpDataBlocks},
    {(ULONG_PTR)IopTriageDumpDataBlocks},
};
So according to the ReactOS sources the debugger block is a statically allocated structure which seems to be filled in by KdInitSystem.
This is quite interesting since static structs are always located in the same position relative to the PE executable’s base address. The compiler usually places static variables in the .data section of the PE binary. This means that in practice, the KdDebuggerDataBlock can only exist within the .data section of the kernel binary.
The problem boils down to finding the kernel base address. Once we know that, we can easily find the valid ranges for the data section. In most kernels the .data section is very small (less than 100 KB) and is very fast to scan. Also, since there is only deterministic data there, there will never be another struct with the same signature in such a small region.
So how can the WinPmem driver locate the kernel base address? The base address itself is not actually exported, but many other symbols are exported from the kernel’s Export Address Tables (pretty much any kernel API). We know that the kernel export address table is usually located at a higher address than the kernel base, and that the kernel base address is page aligned. We now can scan for it:
Figure 1. KDBG scan algorithm
We choose to use NtBuildNumber as the exported symbol to look for (just in case a real API function is hooked). We then round down to page align and step backwards looking for the PE header. In Windows 7, there are unmapped guard pages between the mapped kernel sections - trying to read from these will blue screen the system. We therefore check first that the pages are mapped using the MmIsAddressValid() API.
Once the PE header is found, we calculate the region for the .data section. Since this region is very small and we do not need to worry about false positives, we can relax the search criteria and just search for the OwnerTag "KDBG".
The result is an extremely fast and reliable way for locating the kernel debugger block in a live system. This allows the WinPmem kernel driver to report this address so that the user space component can write the crash dump files correctly.
The added benefit of the kernel driver locating the debugger block itself is that Volatility does not need to scan for it while it is analysing the live system. This saves a rather expensive call for the kdbgscan plugin, which would otherwise need to be made before running most plugins.
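The scan described in this post can be modelled in user space with a short Python 3 sketch. It is not the WinPmem driver code (which runs in the kernel): SparseMemory is a hypothetical toy address space whose is_valid() method stands in for MmIsAddressValid(), the PE image is a hand-built stub with a single .data section, and the 0x10 subtracted from the hit assumes the OwnerTag follows a 16 byte list entry in the debugger data header.

```python
import struct

PAGE = 0x1000


class SparseMemory:
    """Toy address space: only pages that were written to are mapped."""

    def __init__(self):
        self.pages = {}

    def is_valid(self, addr):
        # Stand-in for MmIsAddressValid(): is the page mapped at all?
        return addr // PAGE in self.pages

    def write(self, addr, data):
        for i, byte in enumerate(data):
            page = self.pages.setdefault((addr + i) // PAGE, bytearray(PAGE))
            page[(addr + i) % PAGE] = byte

    def read(self, addr, length):
        return bytes(self.pages[a // PAGE][a % PAGE]
                     for a in range(addr, addr + length))


def find_kernel_base(mem, symbol_addr, max_pages=0x1000):
    """Step back page by page from an exported symbol to the PE header."""
    page = symbol_addr & ~(PAGE - 1)
    for _ in range(max_pages):
        if page < 0:
            break
        # Never touch unmapped (guard) pages.
        if mem.is_valid(page) and mem.read(page, 2) == b"MZ":
            (e_lfanew,) = struct.unpack("<I", mem.read(page + 0x3C, 4))
            if e_lfanew <= PAGE - 4 and mem.read(page + e_lfanew, 4) == b"PE\x00\x00":
                return page
        page -= PAGE
    return None


def data_section(mem, base):
    """Return (start address, size) of the .data section, or None."""
    (e_lfanew,) = struct.unpack("<I", mem.read(base + 0x3C, 4))
    pe = base + e_lfanew
    (num_sections,) = struct.unpack("<H", mem.read(pe + 6, 2))
    (opt_size,) = struct.unpack("<H", mem.read(pe + 20, 2))
    table = pe + 24 + opt_size          # section table follows the headers
    for i in range(num_sections):
        entry = table + 40 * i
        name = mem.read(entry, 8).rstrip(b"\x00")
        virtual_size, rva = struct.unpack("<II", mem.read(entry + 8, 8))
        if name == b".data":
            return base + rva, virtual_size
    return None


def find_kdbg(mem, symbol_addr):
    base = find_kernel_base(mem, symbol_addr)
    section = data_section(mem, base) if base is not None else None
    if section is None:
        return None
    start, size = section
    hit = mem.read(start, size).find(b"KDBG")
    return None if hit < 0 else start + hit - 0x10


# Fake kernel: header at 0x10000, unmapped page at 0x11000, .data at 0x12000.
mem = SparseMemory()
header = bytearray(0x200)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<H", header, 0x80 + 6, 1)      # NumberOfSections
struct.pack_into("<H", header, 0x80 + 20, 0)     # SizeOfOptionalHeader (toy)
header[0x98:0xA0] = b".data\x00\x00\x00"
struct.pack_into("<II", header, 0x98 + 8, 0x200, 0x2000)
mem.write(0x10000, header)
mem.write(0x12000, bytes(0x200))
mem.write(0x12060, b"KDBG")
mem.write(0x13010, b"\x00")                      # the "exported symbol"

kdbg = find_kdbg(mem, 0x13010)
print(hex(kdbg))
```

Checking validity before every read mirrors the guard page concern above: in the sketch a bad read only raises an exception, while in the driver it would crash the machine.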

Saturday, November 17, 2012

The PMEM Memory acquisition suite


Memory acquisition is the first step in memory analysis. Before any analysis can be done, we need to acquire the memory in the first place. There are a number of commercial solutions to acquire memory, but sadly open source solutions have been abandoned or not maintained (for example, win32dd was a popular solution many years ago but has since been commercialized and is no longer open source).
We believe in open source forensic tools to make testing and transparency easier. We also believe that the availability of open source solutions spurs further development in the field and enables choices.
That is the reason we feel an open source, well tested and capable forensic memory acquisition tool is essential - we call it the Pmem suite of tools. The pmem acquisition tool aims to provide a complete imaging solution for Windows, Linux and OSX.
The following is a quick overview of how to use the pmem tools. For detailed information consult the source.

1. WinPmem

The windows memory acquisition tool is called WinPmem.
These are the features it supports:
  1. Supports all windows versions from WinXP SP2 to Windows 8 in both i386 and amd64 flavours.
  2. Output formats include:
    1. Raw memory images.
    2. Microsoft Crashdump files for use in windbg and volatility.
    3. Output to stdout (in both the above formats) for piping through other tools (e.g. ssh, ewfacquirestream etc).
  3. Memory acquisition using
    1. MmMapIoSpace method.
    2. \Device\PhysicalMemory and ZwMapViewOfSection method.
  4. Direct analysis of the running kernel using Volatility (Live memory analysis).
  5. Optional Write support for manipulating kernel data structures from Volatility.

1.1. Download

The latest version can be found here or on the Volatility download page. You will find the tool released in two versions:
  • winpmem-1.3.1.exe: is the recommended binary for general use. This binary contains signed drivers so it can load on any windows system (even 64 bit ones). This binary does not include write support for memory.
  • winpmem-1.3.1-write.exe: is the binary with write support enabled. It is not signed so it will only work on 32 bit windows or 64 bit windows with special preparation (see below).
Important: The recommended version for regular use is the one without write support. The version with write support cannot be used on a regular system.
c:\..> winpmem_1.3.exe -h
Winpmem - A memory imager for windows.
Copyright Michael Cohen (scudette@gmail.com) 2012.

Version 1.3. Built Nov 12 2012
Usage:
  winpmem_1.3.exe [option] [output path]

Option:
  -l    Load the driver and exit.
  -u    Unload the driver and exit.
  -h    Display this help.
  -w    Turn on/off write mode.
  -1    Use MmMapIoSpace method.
  -2    Use \\Device\PhysicalMemory method (Default).
  -d    Produce a crashdump file.


NOTE: an output filename of - will write the image to STDOUT.

1.2. Examples

Writes a raw image to physmem.raw
winpmem_1.3.exe physmem.raw
Writes a crashdump file to netcat for network transport. Output is suppressed here because STDOUT is redirected.
winpmem_1.3.exe -d - | nc 192.168.1.1 80
Normally the driver will be automatically unloaded after the image is acquired. To allow volatility to attach to the raw device for live analysis, we need to load the driver and exit:
c:\..> winpmem.exe -l
Loaded Driver.
c:\..> vol.py -f \\.\pmem
Note: Only the tech preview Volatility version is able to open the raw device. In the tech preview version there is no need to specify a profile, since it is autodetected.
To unload the driver and exit:
c:\..> winpmem.exe -u
Driver Unloaded.
To acquire a raw image using the MmMapIoSpace method:
c:\..> winpmem_1.3.exe -1 myimage.raw
To acquire an image in crashdump format:
c:\..>winpmem_1.3.exe -d c:\temp\test.dmp
Driver Unloaded.
Loaded Driver C:\Users\mic\AppData\Local\Temp\win6C6.tmp.
Will write a crash dump file
CR3: 0x0000187000
 2 memory ranges:
Start 0x00001000 - Length 0x0009E000
Start 0x00100000 - Length 0x6F6FB000


00% 0x00001000 .

00% 0x00100000 ..................................................
02% 0x03300000 ..................................................
05% 0x06500000 ..................................................
...
92% 0x67300000 ..................................................
95% 0x6A500000 ..................................................
98% 0x6D700000 .................................
Driver Unloaded.

1.3. Experimental write support

As of version 1.1, the winpmem drivers support writing to memory as well as reading. This capability is a great learning tool since many rootkit hiding techniques can be emulated by writing to memory directly. For example the following Volatility session illustrates changing the name of the binary:
c:\..> vol.py -f \\.\pmem

WinXPSP2x86:pmem 03:10:40> task = session.profile._EPROCESS(0x82079c18)
WinXPSP2x86:pmem 03:10:57> task.ImageFileName
                    Out    [String:ImageFileName]: 'cmd.exe\x00'
WinXPSP2x86:pmem 03:11:15> task.ImageFileName = "foo.exe\x00"
WinXPSP2x86:pmem 03:11:21> task.ImageFileName
                    Out    [String:ImageFileName]: 'foo.exe\x00'
Since this is a rather dangerous capability, the signed binary drivers have write support disabled. The unsigned binaries (really self-signed with a test certificate) cannot load on a regular system. You can allow the test-signed drivers to be loaded on a test system by issuing (see http://msdn.microsoft.com/en-us/library/windows/hardware/ff553484(v=vs.85).aspx):
Bcdedit.exe -set TESTSIGNING ON
and reboot. You will see a small "Test Mode" text on the desktop to remind you that this machine is configured for test signed drivers.
Alternatively you can test this on XP or Vista32 which have no driver signing restrictions.
Once the correct driver is loaded, write support must also be enabled at load time using the -w switch:
winpmem_1.3-write.exe -w -l
This will load the drivers and turn on write support. Then we can run volatility interactively, as usual on the raw device:
vol.exe --profile Win7SP1x64 --file \\.\pmem

Thursday, October 11, 2012

Finding KPCR in memory images.


I was looking into finding the KPCR structure in memory images. The KPCR is used to store a lot of information about the running CPU, and in Windows versions prior to Windows 7 it also contains a link to the kernel debugger block.
Volatility already contains a scanner for KPCR (in this example the image is 821676355 bytes big):
~/projects/volatility$ time python vol.py --profile Win7SP1x64 --file ~/images/win7_trial_64bit.dmp kpcrscan
Volatile Systems Volatility Framework 2.2
**************************************************
Offset (V)                    : 0xf80002842d00
Offset (P)                    : 0x2842d00
KdVersionBlock                : 0x0
IDT                           : 0xf80000b95080
GDT                           : 0xf80000b95000
CurrentThread                 : 0xf80002850c40 TID 0 (Idle:0)
IdleThread                    : 0xf80002850c40 TID 0 (Idle:0)
Details                       : CPU 0 (GenuineIntel @ 2394 MHz)
CR3/DTB                       : 0x187000


real 48m33.825s
user 48m13.717s
sys 0m3.888s
As you can see this is extremely slow. Why is it so slow? We can look at the source code for the kpcr scanner check:
def check(self, offset):
    """ We check that _KCPR.pSelfPCR points to the start of the _KCPR struct """
    paKCPR = offset
    paPRCBDATA = offset + self.PrcbData_offset

    try:
        pSelfPCR = obj.Object('Pointer', offset = (offset + self.SelfPcr_offset), vm = self.address_space)
        pPrcb = obj.Object('Pointer', offset = (offset + self.Prcb_offset), vm = self.address_space)
        if pSelfPCR == paKCPR and pPrcb == paPRCBDATA:
            self.KPCR = pSelfPCR
            return True

    except BaseException:
        return False

    return False
So the scanner overlays the _KPCR struct on each 4 byte position in the image and checks that _KPCR.Self points at the address of the _KPCR (and that the Prcb pointer points at the embedded PrcbData).
This is extremely slow since it performs millions of comparisons in Python as well as many very small read operations from the raw image (IO operations are very expensive on Windows systems, making this even slower).
There must be an easier way to find KPCR!
The solution is to examine the _KPCR object and see what we can leverage so we do not need to resort to an exhaustive search:
Win7SP1x64:win7_trial_64bit.raw 12:54:13> dt "_KPCR"
[_KPCR _KPCR] @ 0x00000000
  0x00 GdtBase
  0x00 NtTib                         [_NT_TIB NtTib] @ 0x00000000
  0x00 _GDT
  0x08 TssBase
  0x10 UserRsp                        [unsigned long long:UserRsp]: 0x00000000
  0x18 Self
  0x20 CurrentPrcb
  .....
  0x180 Prcb                          [_KPRCB Prcb] @ 0x00000180
And we can examine the contents of the Prcb:
Win7SP1x64:win7_trial_64bit.raw 12:54:19> dt "_KPRCB"
[_KPRCB _KPRCB] @ 0x00000000
...
  0x5F0 CpuType                                                [unsigned char:CpuType]: 0x00000000
  0x5F1 CpuID                                                  [unsigned char:CpuID]: 0x00000000
  0x5F2 CpuStep                                                [unsigned short:CpuStep]: 0x00000000
  0x5F2 CpuStepping                                            [unsigned char:CpuStepping]: 0x00000000
  0x5F3 CpuModel                                               [unsigned char:CpuModel]: 0x00000000
  0x5F4 MHz                                                    [unsigned long:MHz]: 0x00000000
  0x5F8 HalReserved
  0x638 MinorVersion                                           [unsigned short:MinorVersion]: 0x00000000
  0x63A MajorVersion                                           [unsigned short:MajorVersion]: 0x00000000
  0x63C BuildType                                              [unsigned char:BuildType]: 0x00000000
  0x63D CpuVendor                                              [unsigned char:CpuVendor]: 0x00000000
...
  0x4480 WaitListHead                                          [_LIST_ENTRY WaitListHead] @ 0x00004480
We notice the WaitListHead member. After Googling for this (http://www.codemachine.com/article_kernelstruct.html#KPCR) we find that this is the list head of threads waiting to run on this CPU:
nt!_KTHREAD

The WaitListEntry field is used to add the KTHREAD structure to the list of
threads that have entered into a wait state on a particular CPU. The
WaitListHead field of the Kernel Processor Control Region (KPRCB) structure for
every CPU links such threads together via the KTHREAD.WaitListEntry
field. Threads are added to this list by the function KiCommitThreadWait() and
removed from this list by KiSignalThread().
So the trick is to enumerate all threads and see which ones are waiting to run on this CPU. In order to find the kernel DTB we already are searching for the System process so we usually already have one _EPROCESS object we know about.
        for kthread in task.Pcb.ThreadListHead.list_of_type(
            "_KTHREAD", "ThreadListEntry"):

            # Look for threads in the Wait state. If this thread is in the Wait
            # state, the WaitListEntry will belong to the list of all waiting
            # threads. By following this list we should get to the list head
            # which lives inside the _KPCR object.
            for kwaiter in kthread.WaitListEntry.list_of_type(
                "_KTHREAD", "WaitListEntry"):

                # Assume the kwaiter is actually the KPRCB.WaitListHead.
                possible_kpcr = self.profile._KPCR(
                    kwaiter.WaitListEntry.obj_offset - offset)

                # Check for validity using the usual condition.
                if possible_kpcr.Self == possible_kpcr.obj_offset:
                    if possible_kpcr.obj_offset not in seen:
                        seen[possible_kpcr.obj_offset] = possible_kpcr
This runs a lot faster:
~/volatility/svn$ time vol.py -f ~/images/win7_trial_64bit.dmp kpcr
**************************************************
Property                       Value
------------------------------ -----
Offset (V)                     0xf80002842d00
Offset (P)                     0x2842d00L
KdVersionBlock                 0
IDT                            0xf80000b95080L
GDT                            0xf80000b95000L
CurrentThread                 : 0xf80002850c40 TID 0 (System:0)
IdleThread                    : 0xf80002850c40 TID 0 (System:0)
Details                       : CPU 0 (GenuineIntel @ 2394 MHz)
CR3/DTB                       : 0x187000

real 0m6.457s
user 0m5.948s
sys 0m0.480s
Yes, that’s 6.5 seconds vs 48 minutes!
Unfortunately this currently only works on versions of Windows later than XP, since the WaitListHead was introduced with Windows 2003.
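The offset subtracted from the list head in the loop shown earlier is not defined in the excerpt. One plausible reading, using the dt output for Win7SP1x64 from this post, is the offset of Prcb within _KPCR (0x180) plus the offset of WaitListHead within _KPRCB (0x4480). The standalone Python 3 sketch below shows only that arithmetic and the Self check. The read_qword callback and the dictionary used as memory are hypothetical stand-ins for a real address space.

```python
# Offsets taken from the dt output for Win7SP1x64 shown in this post.
KPCR_SELF_OFFSET = 0x18
KPCR_PRCB_OFFSET = 0x180
KPRCB_WAITLISTHEAD_OFFSET = 0x4480


def kpcr_from_wait_list_head(read_qword, list_head_addr):
    """Map the address of a suspected KPRCB.WaitListHead back to its _KPCR.

    read_qword is a hypothetical callback returning the 64-bit value stored at
    a virtual address, or None if the address cannot be read.
    """
    candidate = list_head_addr - (KPCR_PRCB_OFFSET + KPRCB_WAITLISTHEAD_OFFSET)
    # The usual validity condition: _KPCR.Self points at the _KPCR itself.
    if read_qword(candidate + KPCR_SELF_OFFSET) == candidate:
        return candidate
    return None


# Toy memory holding only the Self pointer of one KPCR.
kpcr_addr = 0xF80002842D00
toy_memory = {kpcr_addr + KPCR_SELF_OFFSET: kpcr_addr}

found = kpcr_from_wait_list_head(toy_memory.get, kpcr_addr + 0x180 + 0x4480)
missed = kpcr_from_wait_list_head(toy_memory.get, 0xF80002850000)
print(hex(found), missed)
```

Most entries on the wait list are ordinary threads, so the subtraction usually lands on something that fails the Self check. Only the entry that really is the list head maps back to a valid _KPCR.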
PS - I originally wanted to examine the KPCR as a way of quickly retrieving the KDBG, but this does not work as recent Windows versions set KdVersionBlock to 0. We cannot use the KPCR as a substitute for the DTB scan either, since all this work is done in the kernel virtual address space (so we need a DTB already). In practice DTB scanning is very quick and KDBG scans are also not too bad, so it’s not a huge problem.