So I have just finished my DFRWS paper submission - what a rush!!!
This year we decided to use google docs for collaborating between the authors. Google docs is a great collaborative tool because it allows everyone to edit the document at the same time. There is also a perfectly adequate but simple diagram drawing tool.
Unfortunately the formatting capabilities of google doc do not compare to latex (no word processor can really). What we wanted is a quick way to convert to latex for final proofing but for the document to be as intuitive as possible.
So I wipped together a quick script to convert the google doc into latex. Simple formatting available on google doc will be converted to some simple latex formatting but you can write any latex commands within the google doc for fine level control.
I thought this was such a useful thing that I quickly put it up on google code for others to share - no guarantee of how good this is.
http://code.google.com/p/googledoc2latex/wiki/Usage
Hopefully this can be refined further - but i think the idea is great. Edit a somewhat WYSIWYG collaborative document like google doc, and still produce publication quality product.
Scudette in Wonderland
Sunday, February 28, 2010
Thursday, February 18, 2010
Treaps - Best data structure since sliced bread
I will be the first to admit that data structures are not my strong point. The wonderful thing about data structures is that once someone shows you how one works - its easy to understand, but trying to do something with the wrong data structure is a performance killer.
While updating the AFF4 map implementation I had a requirement for a unique data structure. First a quick recap of the AFF4 map implementation. AFF4 maps are a list of points which specify a linear transormation between the map stream and one or more backing streams (which may be maps or images or whatever). For example:
map stream offset, target offset, target URL
0,100,file:///file1.dd
500,1000,file:///file2.dd
800,2123,file:///file3.dd
This means that data in the map stream from offset 0 to 500 will be take from offset 100 to 600 in file://file1.dd. From byte 500 to 800, the data will be taken from offset 1000 to 1300 in file2.dd etc.
In order to satisfy a read request at a specific offset we need to quickly search the map for the two elements immediately before and after the offset. For example if I want to read 100 bytes from offset 450 in the map stream, I need to find row 1 and row 2 above:
Before:
0,100,file:///file1.dd
After:
500,1000,file:///file2.dd
I then extrapolate the target offset and realise that I only have 50 bytes available from offset 100 + 450 = 550 in file1.dd.
When the map is huge the speed of this search is very important. The interesting thing is that I will rarely need an exact match - I really need a range.
My initial implementation used a sorted list for the map, sorted by the stream offset. I then did a binary search for the stream offset and retreived the index before and after the final point. Binary searching is O(lg N) and storage efficiency is excellent - no overhead is requires as the array can simply be allocated at once.
The biggest problem with this approach is that it is not possible to add points to the map at the same time as reading them. This is because the array needs to be sorted before you can binary search it. If we just add all the elements in batch and then sort we can get away with O(1) on adding and then O(n) for the sort. There are a number of applications, however, where we need to be able to read the map while we are writing it. In particular whenever we need to know if a point is already present in the map, before adding a new one we need to sort it first. Sorting upon each insertion will result in O(n lg n) for insertion which is very poor.
So I needed a data structure which:
A treap http://en.wikipedia.org/wiki/Treap is the perfect solution:
Reading the wikipedia page above made my head spin. Implementing one would be tricky. Luckily I found this http://www.canonware.com/trp/ - Awesome.
How to actually use it?
The .h file defines a bunch of macros (huge macros I might add) which implement the basic functions required for a treap implementation. I guess this is kind of like c++ templates - the idea is to have the macros define the algorithm and fit it to any struct you need.
So there are two major structures a tree_t struct (the names are settable as args to the macro), and a node_t struct. The node carries whatever information you want to store in each node, while the tree just stores the head of the treap (and is used in all operations on the treap).
You start off by defining the node:
Inside our C file we now just generate all the functions we need to manipulate the tree. We need to define a comparison function so the implementation can order elements in the tree. The comparison function is called on two nodes to decide if they are the same or one is bigger than the other. In our case we only care about the stream offsets:
In other words it will make static functions like tree_insert(), tree_remove(), tree_search() etc. Note that the macros do not manage memory at all, they just maintain the pointers to each node. This means you still need to worry about allocating and deleting individual nodes - you might want to allocate a slab for a bunch of nodes or allocate each node separately.
Now to use the functions:
You can also do tree_nsearch() and tree_psearch() to retrieve the nodes before and after this one.
To dump the tree in sorted order we define an iterator function callback and run it on all the tree:
then (data is a pointer passed to the callback):
Thats it.
This can be used very easily to build a fast data tree as well - making it an awesome substitute for a hash table or a dictionary. Even when we just want to store a bunch of items by a key.
While updating the AFF4 map implementation I had a requirement for a unique data structure. First a quick recap of the AFF4 map implementation. AFF4 maps are a list of points which specify a linear transormation between the map stream and one or more backing streams (which may be maps or images or whatever). For example:
map stream offset, target offset, target URL
0,100,file:///file1.dd
500,1000,file:///file2.dd
800,2123,file:///file3.dd
This means that data in the map stream from offset 0 to 500 will be take from offset 100 to 600 in file://file1.dd. From byte 500 to 800, the data will be taken from offset 1000 to 1300 in file2.dd etc.
In order to satisfy a read request at a specific offset we need to quickly search the map for the two elements immediately before and after the offset. For example if I want to read 100 bytes from offset 450 in the map stream, I need to find row 1 and row 2 above:
Before:
0,100,file:///file1.dd
After:
500,1000,file:///file2.dd
I then extrapolate the target offset and realise that I only have 50 bytes available from offset 100 + 450 = 550 in file1.dd.
When the map is huge the speed of this search is very important. The interesting thing is that I will rarely need an exact match - I really need a range.
My initial implementation used a sorted list for the map, sorted by the stream offset. I then did a binary search for the stream offset and retreived the index before and after the final point. Binary searching is O(lg N) and storage efficiency is excellent - no overhead is requires as the array can simply be allocated at once.
The biggest problem with this approach is that it is not possible to add points to the map at the same time as reading them. This is because the array needs to be sorted before you can binary search it. If we just add all the elements in batch and then sort we can get away with O(1) on adding and then O(n) for the sort. There are a number of applications, however, where we need to be able to read the map while we are writing it. In particular whenever we need to know if a point is already present in the map, before adding a new one we need to sort it first. Sorting upon each insertion will result in O(n lg n) for insertion which is very poor.
So I needed a data structure which:
- At least O(lg N) on retrieval
- Better than O(n log n) on insertion.
- Reasonably efficient for sorting.
- Most importantly it has to be a data structure which can retrieve the nearest match rather than an exact match. We need to be able to identify the highest offset below the query point, and the lowest offset above the query point.
A treap http://en.wikipedia.org/wiki/Treap is the perfect solution:
- It has O(lg n) on insertion and retrieval
- Its possible to do a previous search and next search to retrieve the elements which are before and after the query point. This gives us direct ranging.
- The treap is already sorted - to dump a sorted list, just traverse the treap in forward order.
Reading the wikipedia page above made my head spin. Implementing one would be tricky. Luckily I found this http://www.canonware.com/trp/ - Awesome.
How to actually use it?
The .h file defines a bunch of macros (huge macros I might add) which implement the basic functions required for a treap implementation. I guess this is kind of like c++ templates - the idea is to have the macros define the algorithm and fit it to any struct you need.
So there are two major structures a tree_t struct (the names are settable as args to the macro), and a node_t struct. The node carries whatever information you want to store in each node, while the tree just stores the head of the treap (and is used in all operations on the treap).
You start off by defining the node:
typedef struct map_point_node_s map_point_node_t;
struct map_point_node_s {
uint64_t image_offset;
uint64_t target_offset;
// This is a pointer into the target list (i.e. its not unique to
// this node).
RDFURN target;
// The link is what makes this node part of the tree.
trp_node(map_point_node_t) link;
};
// This just defines the tree type.
typedef trp(map_point_node_t) map_point_tree_t;
Inside our C file we now just generate all the functions we need to manipulate the tree. We need to define a comparison function so the implementation can order elements in the tree. The comparison function is called on two nodes to decide if they are the same or one is bigger than the other. In our case we only care about the stream offsets:
static int map_point_cmp(map_point_node_t *a, map_point_node_t *b) {
int rVal = (a->image_offset > b->image_offset) - \
(a->image_offset <>image_offset);
return rVal;
};
/* This huge macro generates all the tree tranversal and searching
functions.
*/
trp_gen(static, tree_, map_point_tree_t, map_point_node_t, \
link, map_point_cmp, 1297, 1301);
In other words it will make static functions like tree_insert(), tree_remove(), tree_search() etc. Note that the macros do not manage memory at all, they just maintain the pointers to each node. This means you still need to worry about allocating and deleting individual nodes - you might want to allocate a slab for a bunch of nodes or allocate each node separately.
Now to use the functions:
// Statically allocate a tree
map_point_tree_t tree;
// Initialise it
tree_new(&tree, 42);
// Make a new node
map_point_node_t *node = calloc(sizeof(map_point_node_t));
//set the key:
node->stream_offset = 5;
// add it
tree_add(&tree, node);
// retrieve it - we make a static node, fill it with the key and then search for the node in the tree which matches it:
{
map_point_node_t query;
query.stream_offset = 5;
// This will be NULL if the query is not in the tree
node = tree_search(&tree, &query);
}
You can also do tree_nsearch() and tree_psearch() to retrieve the nodes before and after this one.
To dump the tree in sorted order we define an iterator function callback and run it on all the tree:
static map_point_node_t *inline_map_iterate_cb(map_point_tree_t *tree, map_point_node_t *node, void *data) {
printf("%llu\n",node->stream_offset);
return NULL;
};
then (data is a pointer passed to the callback):
tree_iter(&tree, NULL, inline_map_iterate_cb, (void *)data);
Thats it.
This can be used very easily to build a fast data tree as well - making it an awesome substitute for a hash table or a dictionary. Even when we just want to store a bunch of items by a key.
Tuesday, July 21, 2009
screenrc file
I have been using the awesome screen program for a long time. I recently remembered that someone once gave me the following ~/.screenrc file:
This file enables a small status bar at the bottom of the screen with the hostname and color. I you need to manage lots of different servers this is great as you can make each status bar a different color - never get hosts confused again!
# Screen configuration
# Statusbar
hardstatus off
hardstatus alwayslastline
hardstatus string "%{.bW}%-w%{.rW}%n %t%{-}%+w %=%{..G} %H %{..Y} %d/%m %C%a"
# Scrolling in xterms
termcapinfo xterm|xterms|xs|rxvt ti@:te@
# Miscellaneous
startup_message off
This file enables a small status bar at the bottom of the screen with the hostname and color. I you need to manage lots of different servers this is great as you can make each status bar a different color - never get hosts confused again!
Python Debugger
I thought I will document this somewhere in case its useful to anyone.
I recently had to do some work with the python debugger which is cool, but does not support any command line completion or even a history. I googled and apparently you can use ipython as a debugger but I could not get it to work properly.
I found some code on the net to do completion and I added a bit of history. To do this you need to add the following files:
~/.pdbrc.py:
and ~/.pdbrc
I recently had to do some work with the python debugger which is cool, but does not support any command line completion or even a history. I googled and apparently you can use ipython as a debugger but I could not get it to work properly.
I found some code on the net to do completion and I added a bit of history. To do this you need to add the following files:
~/.pdbrc.py:
import readline, os.path, os
HISTORY = os.path.join(os.environ["HOME"], ".pdbhist")
try:
readline.read_history_file(HISTORY)
except IOError: pass
import atexit
atexit.register(readline.write_history_file,HISTORY)
# save this in .pdbrc.py in your home directory
def complete(self, text, state):
"""return the next possible completion for text, using the current frame's
local namespace
This is called successively with state == 0, 1, 2, ... until it
returns None. The completion should begin with 'text'.
"""
try:
# keep a completer class, and make sure that it uses the current local scope
if not hasattr(self, 'completer'):
self.completer = rlcompleter.Completer(self.curframe.f_locals)
else:
self.completer.namespace = self.curframe.f_locals
return self.completer.complete(text, state)
except Exception,e:
print e
and ~/.pdbrc
# save this in .pdbrc in your home directory
import os
import sys
# import rlcompleter early, as it contains side effects
import rlcompleter
# refresh the terminal
os.system("stty sane")
# this rc file takes single lines, so define our complete function here
execfile(os.path.expanduser("~/.pdbrc.py"))
# replace the Pdb class's complete method with ours
sys._getframe(1).f_globals['Pdb'].complete = complete
# set use_rawinput to 1 as tab completion relies on rawinput being used
sys._getframe(1).f_locals['self'].use_rawinput = 1
Thursday, October 2, 2008
pstree - a volatility plugin
I have been lurking on the volatility irc channel (#volatility @ irc.freenode.net) and I overheard a challenge to make a pstree like plugin. I thought this would be a great way to learn more of the code base.
The volatility code base is very nice and well written. It can be improved however, and this plugin demonstrates what I propose to make the code more reusable. As can be seen producing the data and rendering it are separated into 2 methods. The calculate() method just produces a data structure, and the render method output the data to the screen. This architecture is better because other tools can then just over ride the render method to output the data in any format they see fit - e.g. XML, HTML etc.
Ed: After posting the initial version I had lots of discussions from the IRC channel. The next challenge was to recover the path and name of the binary for each task. There are lots of ways to do this and I was offered 3:
First off we can get the CommandLine from the PEB. This is what volatility already does in dlllist for example
Second we can get the ImagePathName from the PEB (as moyix and msuiche point out - thanks guys for the help). Note that for those two options we need to switch to process address space first.
Lastly aschuster pointer out that same information is also stored in the auditing subsystem SeAuditProcessCreationInfo (thanks aschuster).
So there are many ways to skin this cat. Here is a sample output. Hopefully all these agree with each other.
The volatility code base is very nice and well written. It can be improved however, and this plugin demonstrates what I propose to make the code more reusable. As can be seen producing the data and rendering it are separated into 2 methods. The calculate() method just produces a data structure, and the render method output the data to the screen. This architecture is better because other tools can then just over ride the render method to output the data in any format they see fit - e.g. XML, HTML etc.
Ed: After posting the initial version I had lots of discussions from the IRC channel. The next challenge was to recover the path and name of the binary for each task. There are lots of ways to do this and I was offered 3:
First off we can get the CommandLine from the PEB. This is what volatility already does in dlllist for example
Second we can get the ImagePathName from the PEB (as moyix and msuiche point out - thanks guys for the help). Note that for those two options we need to switch to process address space first.
Lastly aschuster pointer out that same information is also stored in the auditing subsystem SeAuditProcessCreationInfo (thanks aschuster).
So there are many ways to skin this cat. Here is a sample output. Hopefully all these agree with each other.
Wednesday, July 16, 2008
Digital Forensics Research Workshop Challenge
Every year the DFRWS guys put on a great forensic challenge and this year was no different. While last years challenge was very hard and not that realistic, this years challenge was designed to reflect what many people would experience in their work. The challenge was a simulated incident which involved network traffic, some file forensics and Linux memory forensics.
On the positive side it is quite easy to get something from the challenge. Pretty much anyone can have a go with this challenge and get something (You can probably even solve it using commercial tools !!). Thats a very good aspect of this years challenge - I'm sure it will be used for training for many years to come.
The negative side is that it becomes difficult to do something extra ordinary in order to win the challenge. Just getting the right answer is not going to cut it because many others can get the right answer. We needed to get the right answer in style... Our tools need to be better and look nicer and also be easier to use. Unless, of course, we missed something blatantly obvious....
This year I was lucky enough to be involved with the great team of David Collett and Aaron Walters. These guys are brilliant and each brings their own unique skills to the table. I believe this made our submission well rounded and certainly better than what each of us could have done on our own. Regardless of whether we are lucky enough to place first, I think we did great and certainly developed pyflag in new directions and added some very cool features - so it was well worth it.
We actually took guidance from the perp (now i sound like a real CSI) and decided to use google docs for collaborative editing. Thats a great product and works very well considering that we tried to collaborate on a 50 page document with many screen shots and images. As a bonus you can publish the document when you are finished:
https://docs.google.com/Doc?id=ddmm9hjf_16hbkgjh4m&hl=en
Getting the paper in on time was a tremendous effort and well done to the team for foregoing sleep and time with their loved ones to make it happen....
It turned out to be a really cool document with a pyflag walkthrough of a realistic scenario. Hopefully people can read it and get a better idea of how to use pyflag from it.
On the positive side it is quite easy to get something from the challenge. Pretty much anyone can have a go with this challenge and get something (You can probably even solve it using commercial tools !!). Thats a very good aspect of this years challenge - I'm sure it will be used for training for many years to come.
The negative side is that it becomes difficult to do something extra ordinary in order to win the challenge. Just getting the right answer is not going to cut it because many others can get the right answer. We needed to get the right answer in style... Our tools need to be better and look nicer and also be easier to use. Unless, of course, we missed something blatantly obvious....
This year I was lucky enough to be involved with the great team of David Collett and Aaron Walters. These guys are brilliant and each brings their own unique skills to the table. I believe this made our submission well rounded and certainly better than what each of us could have done on our own. Regardless of whether we are lucky enough to place first, I think we did great and certainly developed pyflag in new directions and added some very cool features - so it was well worth it.
We actually took guidance from the perp (now i sound like a real CSI) and decided to use google docs for collaborative editing. Thats a great product and works very well considering that we tried to collaborate on a 50 page document with many screen shots and images. As a bonus you can publish the document when you are finished:
https://docs.google.com/Doc?id=ddmm9hjf_16hbkgjh4m&hl=en
Getting the paper in on time was a tremendous effort and well done to the team for foregoing sleep and time with their loved ones to make it happen....
It turned out to be a really cool document with a pyflag walkthrough of a realistic scenario. Hopefully people can read it and get a better idea of how to use pyflag from it.
Saturday, July 12, 2008
Magic Flute
So we went to see the magic flute last night... The performance was excellent, taking the liberty of changing the original story to occur in a circus. This variation really turned out well with colorful characters and an interesting set. The clown costumes were engaging, and the set was beautifully constructed.
Jason Barri Smith (Papageno) was excellent - not only was his singing superb, his acting was engaging, and so effortlessly combined with the music. Judit Lorincz, (Queen of the night) was superb, performing the vocal scales in Der Hoelle Rache superbly (This is a really hard to sing aria).
My most memorable aria of all was the Papageno/Papagna duet at the end was terrific - thats my favorite aria.
Well done opera Queensland.
Jason Barri Smith (Papageno) was excellent - not only was his singing superb, his acting was engaging, and so effortlessly combined with the music. Judit Lorincz, (Queen of the night) was superb, performing the vocal scales in Der Hoelle Rache superbly (This is a really hard to sing aria).
My most memorable aria of all was the Papageno/Papagna duet at the end was terrific - thats my favorite aria.
Well done opera Queensland.
Subscribe to:
Posts (Atom)