Technical Intelligence–T.I.

Enough is enough

Day after day, I help financial analysts, managers, and CEOs troubleshoot their freaking business *quick look behind me* with nice visualizations, great multi-dimensional models, and their so deaaaaaar Excel.

And ME, I am stuck troubleshooting this stupid legacy program with plain old raw XML files, text files, CSV files, rows and columns… and this ****** command line.

Enough is enough, dear developers and friends.
This is the day we take power into our own hands; this is the day we fight with the same weapons as the people we serve.
This is the day I decide to spread the word to you: we are not slaves, and we deserve better than stupid text files and command-line interfaces.
… and don’t tell me that you have Notepad++ so you can’t complain; you deserve way more.

Let’s reclaim the power that is ours. Let’s proclaim Technical Intelligence.

From Wikipedia:
Business intelligence (BI) is a set of theories, methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information for business purpose.

So here and now, both on my blog and on CodeProject, in June 2013, I declare:
Technical intelligence (TI) is a set of theories, methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information for technical purpose.

How great is this buzzword ?

This article is about using BI tools for our own technical needs. There is no reason why BI practices would not be relevant for analyzing technical data.
With this hacker mindset, let’s see how we can play with other people’s tools for our own purposes.

This journey is about manipulation of information for developers by developers.
In a more poetic way, we’ll eat our own dog food.

This article will have almost no code, except where it helps explain my point. It is about ideas: ways to troubleshoot problems from an original point of view.

Disclaimer: The few lines of code in this article are images. The reason for that is not laziness. The point of this article is to show you how to get higher level visualization of technical data quickly by crafting your own tools for your own problems and leveraging existing user applications the best you can.
The code here is throw away because it has a very narrow use case, specific to my problem.

Aggregate technical data with Pivot Table

If you don’t know about procmon, then it probably means that your main tool for debugging other people’s programs was a mix of rain dancing and Google.
You wasted your life, but fear not: now that you know procmon, you’ll see a whole new world. And that’s only the beginning.
This excellent program was written by Mark Russinovich a long time ago. He is my coding hero for his contributions to Sysinternals, Windows Internals, and Windows Azure.

As you can see, it shows every file system or registry operation happening in real time on your system. How cool is that?

image

A tabular view is fine for sequential analysis.
But sometimes you want to know about frequencies: “Which program is spamming my hard drive?” or “Where is Evernote’s data concentrated on the hard drive?”
In that case, you will prefer an aggregated view of the same data… something like this.

image

But maybe you prefer an aggregated view of the result of your operations… Easy enough… just change which dimension you map to the columns.

imageimage

Everything is possible thanks to the undervalued “Export the trace as CSV” in procmon… You guessed it: CSV is an open bar for Excel.

image

… you can create your pivot table yourself, but since I do this stuff daily, I created my own program to tweak procmon’s CSV and break it into the fields that interest me. (Noticed the nice path hierarchy? :))
You can check it out on CodePlex. There is no release, so download the code and compile it yourself… and, more importantly, tweak it to your needs; the code is not difficult. (Sorry Git fans, I did this stuff before becoming a Git fanboy.)

Know thy time

How does the disk access evolve through time by program ?
It seems Evernote is continuously hungry !
But hey, that’s cool! It seems that Windows SearchIndexer shuts up as soon as another program gets a high data throughput!

image

To draw this sort of graph, Excel expects a measurement representation of the data instead of an event representation.
Let me show you.

I get this sort of data from procmon…

image

I created the Time column, which is a number representation of the column Time of Day.
This is an event-driven representation of data. One event equals one row.

On the other hand, if you want to use graphs in Excel, you need this representation: every series gets its own column, plus the time column.
image
This is a measurement-driven representation of data. You can see each row as one measure of 10 different properties at the same time.

So how do you transform an event-driven representation to a measurement-driven representation ?
Easy… you just create a pivot table from the event-driven representation of data.
Then you put the time in rows and the properties to measure in columns… and just count occurrences for each Time/Process name pair.
image

The resolution of the Time column is important (i.e., how coarsely you round the Time of Day (timespan) column).
If you are very precise on the resolution, you get this.
image
The measures are messy and we can’t really see trends, but we can see sharply when each process starts.
High resolution clearly shows us that SearchIndexer stops as soon as Evernote needs to do stuff, and resumes as soon as Evernote stops.

On the other hand, if you lower the precision of the time, you can see the trends more sharply, but you are less precise about time.
With a low resolution, you can clearly see that there are two spikes of Evernote activity.
With a low resolution, you can deduce when I decided to save a note in Evernote, but you can’t conclude as well as with the first graph that SearchIndexer stops as soon as Evernote starts.

image

This is what mathematicians call the uncertainty principle, or Heisenberg principle: by sampling, you can’t get exactly both the position in time (high resolution) and where it is going (low resolution).

But you can sample at multiple resolutions by taking a high-resolution sampling and rounding it at different scales, like I have done.
Or use a mathematical tool called wavelets… but I’m just a developer and I don’t know how to use them, so I’ll stop here.

In the first graph, the Time was an integer representation of the “Time of day” column rounded to 0.00000001, whereas in the second graph I removed two zeros (0.000001).
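The pivot-plus-rounding trick can also be written down in a few lines of code. What follows is only a minimal C++ sketch, not the Excel workflow itself: the Event struct, the Pivot alias, and the pivot_by_time function are names invented for illustration, and it assumes the time has already been turned into a plain number, like the Time column above.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// One row of the event-driven representation (hypothetical layout).
struct Event {
    double time;          // assumed: time of day already converted to a number
    std::string process;  // process name
};

// Measurement-driven representation: time bucket -> (process -> event count).
using Pivot = std::map<long long, std::map<std::string, int>>;

// Round each event time down to a multiple of `resolution` and count the
// events per (bucket, process), like a pivot table with Time in rows and
// Process Name in columns.
Pivot pivot_by_time(const std::vector<Event>& events, double resolution) {
    assert(resolution > 0.0);
    Pivot pivot;
    for (const Event& e : events) {
        long long bucket = static_cast<long long>(std::floor(e.time / resolution));
        ++pivot[bucket][e.process];
    }
    return pivot;
}
```

Calling pivot_by_time twice on the same events, once with a small resolution and once with a larger one, gives the two views discussed above: many sparse buckets that are sharp in time, or a few dense buckets that show the trend.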

Show me who’s slowing down my hard drive

image

From the exact same data, you can easily map it onto a TreeMap by transforming procmon’s data.
Procmon gave me that type of data.
image

All I need is to transform it into this new form, which any TreeMap framework will understand: one column to identify a rectangle (Id), one to identify the parent (Process), one to specify the label (Label), and one to specify the size of the rectangle (Count).
image

We don’t need a single line of code to make the transformation.

Starting with the CSV output of procmon, create a new calculated column with all the information that the TreeMap data needs, separated by commas.

image

Next, we create a pivot table, put our new column in it, and count it.

image

We copy all the values aside, and split the first column on ‘,’.
This leaves us with what we need.

image

Then merge the columns into one column separated with “,” by using Excel’s Concatenate function.
Copy that column, and you have your CSV in the right format.

Once you have done that, find a simple treemap tool. I ended up with Dojo’s treemap.
Copy their sample, and find where they are inserting data.

image

Replace it with a placeholder, and change the code that is relevant to your own columns.

image

Then hack a quick tool that transforms your CSV into JSON.
image
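Since the author’s tool is only visible as an image, here is a rough idea of what such a throwaway converter can look like. It is a hedged C++ sketch, not the original code: the function names and the JSON key names (id, process, label, count) are assumptions, and the keys your treemap sample expects may differ. It also assumes no field contains a comma and escapes only backslashes and double quotes, which is what Windows paths usually need.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Escape the two characters that most often break JSON strings built from
// Windows paths: backslash and double quote. Control characters are not handled.
std::string json_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '\\' || c == '"') out += '\\';
        out += c;
    }
    return out;
}

// Convert lines of "Id,Process,Label,Count" into a JSON array of objects.
// Assumes exactly four fields per line and no CSV quoting.
std::string csv_to_json(const std::vector<std::string>& lines) {
    std::string json = "[";
    bool first = true;
    for (const std::string& line : lines) {
        std::vector<std::string> f;
        std::stringstream ss(line);
        std::string field;
        while (std::getline(ss, field, ',')) f.push_back(field);
        assert(f.size() == 4);
        if (!first) json += ",";
        first = false;
        json += "{\"id\":\"" + json_escape(f[0]) +
                "\",\"process\":\"" + json_escape(f[1]) +
                "\",\"label\":\"" + json_escape(f[2]) +
                "\",\"count\":" + std::to_string(std::stoi(f[3])) + "}";
    }
    return json + "]";
}
```

The resulting string is what you would paste over the placeholder in the treemap sample.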

And enjoy:

image

The code is not pretty, but it gets to the point very quickly, with no over-abstraction.

Conclusion

This was just an introduction, I will probably write again about the idea.
Data never tells any story, your point of view does.
Plain old XML rarely explains anything.

I write about the techniques I use to reverse-engineer undocumented stuff on my blog ao-sec, and you don’t need to go deep into assembler when you know 3 things: what to measure, how to measure, and how to see it.

We are developers, so why are we sometimes so afraid to code for ourselves ?
The data part is simple, for most of our needs we only need plain old CSV.
The view part is simple, for most of our needs we can just use Excel, you don’t have to be ashamed.

Don’t be afraid to use business users’ tools; searching for insight in data is a common need. Don’t let marketing tell you who should use a tool.

Let the hacker mind blossom.

Reversing data and the scientific method

Introduction

Source code is nothing but knowledge acquired by its author and written as computer instructions: a language without any ambiguity.

But where does this knowledge come from? After all, source code is only a medium for knowledge.

Google and someone else’s source code are the best sources of knowledge.

And sometimes, other people’s source code is outdated.
And sometimes, you do something so esoteric that few people are talking about it.

But nature gifted us with a wonderful source of knowledge: observation and imagination.
Our brains really can create ideas; it is something we forget when we are used to swimming in this ocean of knowledge, the internet.

The scientific method is entirely about observation and imagination. And this is something we can benefit from as software engineers. The scientific method is not just for scientists, as we will see.

Goal

I am programming a productivity tool: a program launcher working with the Leap Motion… the leap launcher.

My program launcher will show you the most recent programs you used; odds are high that you want to open them again very soon.

A quick search on Google turns up UserAssist, and someone who developed a tool to view its data… Looking at the source, it seems Windows stores information in the registry about every program launched.

All UserAssist data is inside the registry key HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\UserAssist\{GUID}\Count

I don’t know yet what the {GUID} depends on, but what I am sure about is that the source I just downloaded does not use the right GUIDs for my computer.

I fixed the source, but the last date and time a program was launched still did not appear; it seems that Windows 7 changed the format a little bit.

image

From that point I decided to reverse the data inside the Count keys by myself.

Once again, a dimensional view of the data, coupled with good visualization techniques, will expose the obvious.

No data is by itself obvious. Only a good point of view can expose obviousness.

Reversing

Let’s take a look at what is in the registry.

image

And each sub key has strange data inside.

image

The first piece of code will extract every subkey inside the Count key and save it in CSV format. I will call it a snapshot.

Then it will start a program, and take a second snapshot.

We will then compare what changed with WinDiff.

My code is Test Driven, I will not show you the implementation (the source code will be uploaded at the end of the article), you will only see my experiments.

image

WinDiff shows no change…
Maybe it only works with user initiated programs…

Let’s change it, take snap1, run a program manually, take snap2.

image

Bingo !!

image

But I expected only one change… got more than one.
Which lines can identify the process I just launched ?

To answer that, I will create two diff files and compare them. (4 snapshots in total)

One diff is created by launching cmd.exe and the other by launching Acrobat; I will then compare the diffs.

By comparing the diff files, I’ll see which lines give us any information about the identity of the launched process.

image

Note: The names of the artifacts are not random. I chose them so that alphabetical order is identical to experiment order, so I can easily use Windows Explorer to check the files manually.

Result:
image

Lines 1, 537, and 607 do not give any information.

Line 105 is modified in only one diff. So it is not the line that uniquely identifies the launched process.

This leaves us with lines 23 and 47, along with lines 611 and 623.

Red lines are related to cmd.exe, and yellow lines to Acrobat.

What can you conclude from this experiment? cmd.exe consistently differs on line 23, and Acrobat on line 47. The other lines are common to both, so they give us no information about the identity of the launched process.

Here is the line 23 (cmd.exe):
image

and line 47 (acrobat):
image

The names of the values under the Count key are obfuscated with ROT13.
So now I change the code of my snapshot method to decipher them.
Result:
image
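ROT13 just rotates each letter by 13 places, so deciphering takes a handful of lines. This is a minimal C++ sketch of the idea, not the author’s snapshot code (which is not shown in the article); the function name rot13 is an assumption.

```cpp
#include <cassert>  // only needed for the sanity checks in the usage note
#include <string>

// Rotate every ASCII letter by 13 places; everything else (digits, braces,
// backslashes, dots) is left untouched. Applying it twice gives back the input.
std::string rot13(const std::string& text) {
    std::string out = text;
    for (char& c : out) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>('a' + (c - 'a' + 13) % 26);
        else if (c >= 'A' && c <= 'Z')
            c = static_cast<char>('A' + (c - 'A' + 13) % 26);
    }
    return out;
}
```

As a quick check, rot13("pzq.rkr") gives "cmd.exe", and because ROT13 is its own inverse, assert(rot13(rot13(name)) == name) holds for any value name.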

45 changed to 46.
So the launch count seems to be an int32 starting at byte 1.
Another byte range changed… at byte 60…
image
It might be related to time. Maybe how long the program was open, or maybe when it was opened.

To confirm the relation of this number with time, I will open programs 5 times, for a random period, one at a time.

We will record the start time, the duration, and these 5 digits for each program run.

Then, by plotting these points on a 2D graph, we will see if there is any correlation between the period, the mysterious number, and the starting time.
image

Result:
image

Exact correlation between StartDate and the magic number !!!

Now I know that byte 60 is an int64 representing a timestamp… By experimenting with a debugger, I found that I could use DateTime.FromFileTime to parse it!
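For readers outside .NET, the same decoding can be done by hand. The sketch below is a hedged C++ illustration, not the author’s Snapshot class: read_le and filetime_to_unix_seconds are hypothetical helpers, and the offsets are left to the caller because they come from the experiments above. It relies on two facts: x86 stores integers little-endian, and a FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian unsigned integer of `size` bytes starting at `offset`.
std::uint64_t read_le(const std::vector<std::uint8_t>& data,
                      std::size_t offset, std::size_t size) {
    assert(size <= 8 && offset + size <= data.size());
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size; ++i)
        value |= static_cast<std::uint64_t>(data[offset + i]) << (8 * i);
    return value;
}

// Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to Unix seconds.
// The Unix epoch (1970-01-01) starts 11,644,473,600 seconds after 1601-01-01.
std::int64_t filetime_to_unix_seconds(std::uint64_t filetime) {
    const std::uint64_t ticks_per_second = 10000000ULL;
    const std::int64_t epoch_gap_seconds = 11644473600LL;
    return static_cast<std::int64_t>(filetime / ticks_per_second) - epoch_gap_seconds;
}
```

With the raw bytes of one value in a vector called record, the timestamp found above would be read with filetime_to_unix_seconds(read_le(record, 60, 8)), and the launch counter with a 4-byte read_le at whatever offset your own diff shows.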

Conclusion

This was a simple example, but too few software developers use scientific methods to study software and data they don’t understand, even when they come from a scientific background.

Scientists are the best reverse engineers in the world. They taught us lots of best practices and wisdom for reverse engineering nature, so why not use the same principles on unknown software and data?

The study was fun, and, even if it was not intended to create a usable class for my target project, I can copy/paste the Snapshot class for production. :)

(Ok, ok I know that undocumented data structure might change… and guess what ? my code can change too !)

The Test Driven Development was a nice equivalent of repeatable experiments.

Excel is a nice data visualizer, and WinDiff a nice diff tool…
CsvHelper a nice CSV parser and writer,
DiffPlex a nice diff library,

All the hard work was already done by the community.

You can download the source code on GitHub.

Apply BI analysis to reversing

Windows is a big eventing system.
Event logs, ETW, and layered drivers are part of this eventing system… and only the tip of the iceberg.

These events are for diagnostic purposes and can give valuable information about what is going on in the system.
The problem is that there is an information overflow: millions of events can be intercepted each minute on your system.

File access, registry access, keyboard typing, and mouse moves are the most obvious. But ETW (Event Tracing for Windows) contains much more, and I would recommend Inside Windows Debugging: A Practical Guide to Debugging and Tracing Strategies in Windows for more information.

But events are not enough to solve problems and take action; events are only part of the equation.

The second part is called Reporting : that is finding a way to display data so you can take action and gain knowledge.

Two really different domains inspired me: Business Intelligence and Data Mining.
One thing data analysts really like is the pivot table: it is a way to cross data in a tabular format and aggregate it.

So I took the output of procmon in CSV and turned it into a pivot table in Excel automatically. You can see the code here.

Basically it turns this kind of stuff

image

Into this kind of stuff

image

Procmon gives you events.
Excel gives you reporting.

And you need both to take action, gain knowledge and visibility.

NHook–.NET x86 debugger library

Download page: http://nhook.codeplex.com/

Why would you need a debugger API ?

I am not the kind of person who can break a new device into pieces to know how it is made. I am not a hardware person. Every person who gave me a screwdriver regretted it afterward.

For software, that’s a different story. When a piece of software does cool stuff, I want to know how they did it… I also use exactly the same toolset to troubleshoot bugs, because software almost never works as expected… Google and MSDN are sometimes not enough to solve your problem.

So I use lots of different tools to see what is going on inside. These programs are like your eyes inside Windows, and you certainly use them on a daily basis… For example, the mighty Procmon to troubleshoot file system and registry access, the great API Monitor to troubleshoot problematic API calls (this one deserves more attention, it’s a truly great tool), the famous Wireshark for network or protocol related problems, and, as a last resort, debuggers like OllyDbg, IDA Pro, and WinDbg.

The problem is that I love my comfortable .NET world, so I thought, “Hey, what if all of these great tools were available within a nice and friendly .NET API?” This API is my response to that question. Not everything is implemented yet, but these are the first bits. For now it only supports x86 programs.

At the end of this article, you will be able to :

  • Set breakpoints programmatically in assembler
  • Find the address of modules and functions imported and loaded in a remote process.
  • By extension, set breakpoints in well-known Win API functions.
  • Apply patches to a remote process (changing x86 instructions or data)
  • Push the patches back to file.

All of that nicely implemented in a comfortable .NET API.

Advanced developers will also understand:

  • How a debugger works (Under the hood)
  • What is a native breakpoint (Under the hood)
  • What is a PE file, a module, a PDB.

NHook is using OllyDbg assembler/disassembler for the assembler part.

NHook and this article have been brought to you by Dusan Trtica and me. We are working together on this project. Dusan is a very talented C++ developer, and this project advances smoothly thanks to him! (I’m wayyyy toooo slooooow in C++!)

So let’s get started !

How to crack a simple Crack Me program

Before coding anything, my goal was to create an API that will make cracking this simple program easy. Now that the goal is set, coding is just a matter of expressing the path to this goal.

#include <stdio.h>

int main(int argc, char** argv)
{
    if(argc == 3)
    {
        printf("success");
        return 1;
    }
    printf("miss");
    return 0;
}

With NHook, I will bypass the if condition so that even if I invoke the program without 2 parameters, it will print success.

What does the assembly code of this function look like ?

To see the assembly code let’s run SimpleCrackMe.exe with OllyDbg, and let’s bring up the Executable modules window.

image

The code is compiled with the VC++ 2012 runtime, so please download and install it, or the loader will not find MSVCR110D (Debug library of the C Runtime library that contains printf).

You can see that before running any code, several DLLs have been loaded into my process address space. How did the loader know which DLLs to load? Well… that’s just written in the import table of any PE file (dll or exe file). You can view it with CFF Explorer.

image

Each imported DLL has other dependencies that will be loaded as well. OllyDbg breaks only once the loader is done.

So in the Executable modules window you can see the SimpleCrackMe is loaded at 0x3D0000 and the entrypoint is at 0x3E110E.

So you might think that 0x3E110E is the address of main… but wrong! It might be, if you disabled every setting and feature of the MS compiler, but normally there is some bootstrapping code. If it is a mixed assembly (C++/CLI), you have some .NET-related stuff, and if it is linked with the C runtime libraries, there is some bootstrapping as well… Here is the entry point: a jump to the bootstrap mainCRTStartup.

image

You can also get the entry point of your executable by using CFF Explorer, the entry point is in the PE file.

image

0x1110E is the address relative to the base address of module SimpleCrackMe. It is called an RVA (relative virtual address).

That’s why the effective VA (virtual address) shown by OllyDbg is 0x3D0000 + 0x1110E = 0x3E110E.
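The RVA/VA relationship is plain addition, but it is easy to mix the two up when jumping between a PE viewer and a debugger. The following is a tiny C++ sketch with hypothetical helper names (rva_to_va, va_to_rva), using 32-bit values because the article targets x86.

```cpp
#include <cassert>
#include <cstdint>

// VA = base address where the module was loaded + RVA stored in the PE file.
std::uint32_t rva_to_va(std::uint32_t module_base, std::uint32_t rva) {
    return module_base + rva;
}

// RVA = VA seen in the debugger - base address of the module that contains it.
std::uint32_t va_to_rva(std::uint32_t module_base, std::uint32_t va) {
    assert(va >= module_base);
    return va - module_base;
}
```

With the numbers from this session, rva_to_va(0x3D0000, 0x1110E) is 0x3E110E. The base address can change from one run to the next, which is why symbols are stored as RVAs.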

So, back to the question: how could you find the address of main? Answer: by using the PDB.

PDB is just a database of symbols (a name and a type) with their RVA and maybe source file and line.

In .NET, the PDB does not store RVAs, but only MetadataToken to source file:line mappings. The JIT compiler is ultimately responsible for choosing where to put the code generated from MSIL and how to compile it (x86, x64, etc.); that’s why the C# or VB.NET compiler can’t predict RVAs in advance and uses MetadataTokens instead. I will probably extend NHook to support .NET, so this will be a subject for the next article.

Let’s right click on the SimpleCrackMe.exe module in the Executable modules window, and it will show us all the symbols it resolved thanks to the PDB or the PE’s export directory.

Search for the main symbol, click on it and you should see the code in the memory dump.

image

If you see a hex editor instead of the instructions, right click on the memory dump, and click on disassemble to change its view.

The interesting part is this: there is a test at 0x3E13EE, and a conditional jump (JNZ) on it just after, at 0x3E13F2… This is the if we need to bypass.

We can bypass it by patching the JNZ with two NOP instructions.

Don’t worry… NHook links with OllyDbg’s assembler/disassembler library, so you don’t have to know the opcodes for these instructions.
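For the curious, the opcodes involved are small enough to show by hand. A short JNZ is the byte 0x75 followed by a one-byte offset, and NOP is the single byte 0x90, which is why two NOPs are needed to cover it. The sketch below is plain C++ working on a byte buffer standing in for the debuggee’s memory; nop_short_jnz is a hypothetical helper, not part of NHook, and it assumes the caller already knows the instruction’s offset from a disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t kJnzShort = 0x75;  // JNZ rel8: opcode byte + 1-byte offset
const std::uint8_t kNop = 0x90;       // NOP: one byte

// Overwrite the 2-byte short JNZ at `offset` with two NOPs.
// Returns false (and changes nothing) if the byte at `offset` is not 0x75.
bool nop_short_jnz(std::vector<std::uint8_t>& code, std::size_t offset) {
    assert(offset + 2 <= code.size());
    if (code[offset] != kJnzShort)
        return false;
    code[offset] = kNop;
    code[offset + 1] = kNop;
    return true;
}
```

After the patch, execution simply falls through into the code that used to be guarded by the if, whatever the value of argc.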

Now that I have covered the basics, you know everything you need to use NHook; the following code should be self-explanatory.

using(var dbg = new ProcessDebugger())
{
    dbg.Start(SimpleCrackMe);
    RVA mainRVA = dbg.SymbolManager.FromName("SimpleCrackMe.exe", "main").RVA; //Get main address
    Breakpoint breakPoint = dbg.Breakpoints.At("SimpleCrackMe.exe", mainRVA); //Set breakpoint
    dbg.BreakingThread.Continue.UntilBreakpoint(breakPoint); //Run to it
    var disasm = dbg.BreakingThread.Continue.UntilInstruction("JNZ"); //Continue execution until next JNZ
    dbg.BreakingThread.WriteInstruction("NOP"); //Nop the JNZ
    dbg.Patches.ModifyImage(dbg.Debuggee.MainModule, "SimpleCrackMe-Patched.exe"); //Save changes back to a file
    Assert.Equal(1, dbg.Continue.ToEnd()); //Debugged process resolved !
    var result = StartAndGetReturnCode("SimpleCrackMe-Patched.exe");
    Assert.Equal(1, result); //SimpleCrakMe-Patched also resolved !!!!
}

Let’s dig deeper

If you just want to use NHook, you can stop here; I will not present new features in this section. If you want to understand how things work under the hood… you can continue.

The first question I will answer is this one: what is a breakpoint?

A breakpoint is simply an x86 instruction (INT 3) that fires an interrupt that is handled by a debugger. You can verify it yourself: try to set a breakpoint with OllyDbg.

image

Then, check the instruction’s address 0x012E13D0 with another memory editor. For example, with the excellent API Monitor.

image

So the debugger is reporting the value 0x55 (PUSH EBP), but the memory editor is reporting 0xCC (INT 3). Conclusion: a debugger sets a breakpoint by overwriting the first byte of a given instruction with 0xCC (INT 3).
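This also explains why the debugger can still show 0x55: it remembers the byte it overwrote. Here is a minimal C++ sketch of that bookkeeping on an in-memory byte buffer. It illustrates the principle only and is not NHook’s or OllyDbg’s implementation; SoftwareBreakpoint, set_breakpoint, and clear_breakpoint are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t kInt3 = 0xCC;  // one-byte INT 3 instruction

// What a debugger has to remember for each software breakpoint.
struct SoftwareBreakpoint {
    std::size_t address;    // where the 0xCC was written
    std::uint8_t original;  // the byte that used to be there
};

// Save the original byte, then overwrite it with INT 3.
SoftwareBreakpoint set_breakpoint(std::vector<std::uint8_t>& memory, std::size_t address) {
    assert(address < memory.size());
    SoftwareBreakpoint bp{address, memory[address]};
    memory[address] = kInt3;
    return bp;
}

// Put the original byte back so the real instruction can execute again.
void clear_breakpoint(std::vector<std::uint8_t>& memory, const SoftwareBreakpoint& bp) {
    assert(bp.address < memory.size());
    memory[bp.address] = bp.original;
}
```

A real debugger does the same thing across process boundaries: when the breakpoint is hit, it restores the original byte, moves the instruction pointer back by one, and usually re-arms the breakpoint after stepping over the instruction.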

So your next question might be: how do you write into another process’s memory?

You can do that with WriteProcessMemory. Here is a simple C++/CLI wrapper; the processHandle parameter can easily be obtained from the Process.Handle property in .NET.

public:static void WriteMemory(IntPtr processHandle, IntPtr baseAddress, array<Byte>^ input)
       {
           // Copy the managed array into a native buffer.
           mauto_handle<BYTE> bytes(Util::ToCArray(input));
           SIZE_T written; // number of bytes actually written
           ASSERT_TRUE(WriteProcessMemory(processHandle.ToPointer(),baseAddress.ToPointer(),bytes.get(),input->Length,&written));
       }

WriteProcessMemory and all the debug functions will not work if the debugger thread’s security token does not have SeDebugPrivilege. Moreover, this privilege is filtered when you are using UAC.

Here is a screenshot of procexp that shows a process (csrss.exe) running with the System account. The System account’s token has the SeDebugPrivilege.

image

You can grant this privilege with Local Computer Policy (enabled by default for Administrators).

image

If you don’t have this privilege, the debuggee’s process ACL should grant the debugger the PROCESS_VM_OPERATION, PROCESS_VM_READ, and PROCESS_VM_WRITE rights… But we will not take that path.

Now that you understand what a breakpoint is, how do you attach a debugger to a process ?

The first solution is to start the process with the debugger. Here is a C++/CLI wrapper.

static ProcessInformation^ StartDebugProcess(String^ appPath)
{
    marshal_context context;
    STARTUPINFO startupInfo = {0};
    startupInfo.cb = sizeof(STARTUPINFO);

    PROCESS_INFORMATION processInformation = {0};
    auto directory = System::IO::Path::GetDirectoryName(appPath);
    ASSERT_TRUE(CreateProcess(context.marshal_as<LPCTSTR>(appPath),
        NULL, 
        NULL,
        NULL,
        false,
        CREATE_DEFAULT_ERROR_MODE | CREATE_NEW_CONSOLE | DEBUG_ONLY_THIS_PROCESS | NORMAL_PRIORITY_CLASS,
        NULL,
        context.marshal_as<LPCTSTR>(directory),
        &startupInfo,
        &processInformation));
    return gcnew ProcessInformation(&processInformation);
}

The interesting point is the creation flag DEBUG_ONLY_THIS_PROCESS. Contrary to DEBUG_PROCESS, it will not debug child processes.

The second solution is to attach to a running process with DebugActiveProcess.

public: static ProcessInformation^ DebugActiveProcess(int pid)
        {
            ASSERT_TRUE(::DebugActiveProcess(pid));
            auto process = System::Diagnostics::Process::GetProcessById(pid);
            PROCESS_INFORMATION info;
            info.dwProcessId = process->Id;
            info.dwThreadId = process->Threads[0]->Id;
            info.hProcess = OpenProcess(PROCESS_ALL_ACCESS,false,process->Id);
            info.hThread = OpenThread(THREAD_ALL_ACCESS,false, process->Threads[0]->Id);
            return gcnew ProcessInformation(&info);
        }

Once your debugger is attached, only two functions control how you receive the debuggee’s events and when it continues execution.

The first is WaitForDebugEvent. It blocks the debugger’s thread until the next debug event. Once a debug event is received, you call ContinueDebugEvent to continue the debuggee’s execution.

My C++/CLI wrapper transforms the DEBUG_EVENT into CLR types. The debug events currently of interest are: Dll Loaded, Exception thrown, CreateThread, and ExitProcess.

static DebugEvent^ WaitForEvent(ProcessInformation^ processInformation, TimeSpan timeout)
{
    DEBUG_EVENT dbgEvent;
    BOOL result;
    if(timeout == TimeSpan::MaxValue)
        result = WaitForDebugEvent(&dbgEvent, INFINITE);
    else
        result = WaitForDebugEvent(&dbgEvent, (DWORD)timeout.TotalMilliseconds);
    if(!result)
        return nullptr;

    if(dbgEvent.dwDebugEventCode == (DWORD)DebugEventType::LoadDllEvent)
        return gcnew DebugEventEx<LoadDllDetail^>(&dbgEvent, gcnew LoadDllDetail(processInformation, &dbgEvent.u.LoadDll));
    if(dbgEvent.dwDebugEventCode == (DWORD)DebugEventType::ExceptionEvent)
        return gcnew DebugEventEx<ExceptionDetail^>(&dbgEvent, gcnew ExceptionDetail(processInformation, &dbgEvent.u.Exception));
    if(dbgEvent.dwDebugEventCode == (DWORD)DebugEventType::CreateThreadEvent)
        return gcnew DebugEventEx<CreateThreadDetail^>(&dbgEvent, gcnew CreateThreadDetail(processInformation, &dbgEvent.u.CreateThread));
    if(dbgEvent.dwDebugEventCode == (DWORD)DebugEventType::ExitProcessEvent)
        return gcnew DebugEventEx<ExitProcessDetail^>(&dbgEvent, gcnew ExitProcessDetail(processInformation, &dbgEvent.u.ExitProcess));
    return gcnew DebugEvent(&dbgEvent);
}

ContinueDebugEvent just takes the process and thread that need to continue, as well as whether the debugger handled the exception.

static void Continue(DebugEvent^ debugEvent, bool handleException)
{
    ContinueDebugEvent(debugEvent->ProcessId, debugEvent->ThreadId, handleException ? DBG_CONTINUE : DBG_EXCEPTION_NOT_HANDLED);
}

So now, how do you break into a breakpoint ?

Simple enough: loop through all the DebugEvents and stop when an “exception” one is received and its reason is “Breakpoint”. Here is the C# code that does just that.

public DebugEventEx<ExceptionDetail> UntilNextBreakpoint()
{
    return (DebugEventEx<ExceptionDetail>)Until(ev =>
    {
        if(ev.EventType != DebugEventType.ExceptionEvent)
            return false;
        return ((DebugEventEx<ExceptionDetail>)ev).Details.Exception.Reason == ExceptionReason.ExceptionBreakpoint;
    });
}

The Until method loops until the predicate returns true. (The Run method ultimately calls ContinueDebugEvent, and the NextEvent method ultimately calls WaitForDebugEvent.)

public DebugEvent Until(Func<DebugEvent, bool> filter)
{
    DebugEvent lastEvent = null;
    bool first = true;
    if(_Commandable.CurrentEvent != null)
        Run();
    while(lastEvent == null || !filter(lastEvent))
    {
        if(!first)
            Run();
        first = false;
        lastEvent = _Commandable.Wait.NextEvent();
    }
    return lastEvent;
}

How to dump memory addresses from PDB

Without PDBs, life would be hard for developers. Without a PDB you would have no way to know that your current thread broke inside the main method. The only thing you would see is a bunch of bytes on the stack.

And you would have no way to know the value or type of local variables during your debugging session. Life would be hard. So hard that debugging experts will tell you that PDBs are as important as your source code.

It is precisely thanks to PDBs that I was able to find the RVA of the main method in this example.

using(var dbg = new ProcessDebugger())
{
    dbg.Start(SimpleCrackMe);
    RVA mainRVA = dbg.SymbolManager.FromName("SimpleCrackMe.exe", "main").RVA; //Get main address
    Breakpoint breakPoint = dbg.Breakpoints.At("SimpleCrackMe.exe", mainRVA); //Set breakpoint
    dbg.BreakingThread.Continue.UntilBreakpoint(breakPoint); //Run to it
    var disasm = dbg.BreakingThread.Continue.UntilInstruction("JNZ"); //Continue execution until next JNZ
    dbg.BreakingThread.WriteInstruction("NOP"); //Nop the JNZ
    dbg.Patches.ModifyImage(dbg.Debuggee.MainModule, "SimpleCrackMe-Patched.exe"); //Save changes back to a file
    Assert.Equal(1, dbg.Continue.ToEnd()); //Debugged process resolved !
    var result = StartAndGetReturnCode("SimpleCrackMe-Patched.exe");
    Assert.Equal(1, result); //SimpleCrakMe-Patched also resolved !!!!
}

Sounds cool ? So let’s see how to parse a PDB.

Microsoft ships with Visual Studio a COM component called DIA (Debug Interface Access) that does just that. So I just had to generate the COM interop .NET assembly for this COM component.

Go to C:\ProgF\Microsoft Visual Studio 11.0\DIA SDK\idl.

Generate a tlb file from the idl.
midl /I "C:\ProgF\Microsoft Visual Studio 11.0\DIA SDK\include" dia2.idl /tlb dia2.tlb

Generate the dll from the tlb
tlbimp dia2.tlb

Reference the resulting Dia2Lib.dll in NHook.

The rest is only about creating one DiaSource object for each module loaded in the process, in the WaitNextEvent method of the debugger.

if(dbgEvent.EventType == DebugEventType.LoadDllEvent)
{
    if(AreSameImage(dbgEvent.As<LoadDllDetail>().ImageName, "kernel32.dll"))
    {
        Debuggee.SetProcess();
        SymbolManager.LoadSymbolsFromExe(Debuggee.MainModule.FileName);
    }
    SymbolManager.LoadSymbolsFromExe(dbgEvent.As<LoadDllDetail>().ImageName);
}

Loading the symbols is easy:

public bool LoadSymbolsFromExe(string exePath)
{
    var pdb = Path.ChangeExtension(exePath, "pdb");
    if(File.Exists(pdb))
    {
        var source = LoadSymbols(pdb);
        _Sources.Add(Path.GetFileName(exePath), source);
        return true;
    }
    return false;
}

private DiaSource LoadSymbols(string pdbPath)
{
    if(!File.Exists(pdbPath))
        throw new FileNotFoundException(pdbPath);
    var diaSource = COMHelper.RegisterIfNotExistAndCreate(() => new DiaSource(), "msdia110.dll");
    diaSource.loadDataFromPdb(pdbPath);
    return diaSource;
}

RegisterIfNotExistAndCreate is just a method that registers the COM component automatically on the machine if it is missing, so the user does not get a weird exception.

SymbolManager.FromName just needs to get the right DiaSource, fetch the right symbol, and return a nice plain old .NET object.

public SymbolInfo FromName(string moduleName, string name)
{
    return FromName(moduleName, name, null);
}
public SymbolInfo FromName(string moduleName, string name, SymTagEnum? type)
{
    _Debugger.EnsureProcessLoaded();
    DiaSource diaSource = FindSource(moduleName);
    IDiaSession session;
    IDiaEnumTables tables;
    diaSource.openSession(out session);
    session.getEnumTables(out tables);

    return tables.ToEnumerable()
        .OfType<IDiaEnumSymbols>()
        .SelectMany(s => s.ToEnumerable())
        .Where(s => s.name == name && (type == null || (uint)type.Value == s.symTag))
        .Select(s => new SymbolInfo(s))
        .FirstOrDefault();
}

Patching

What if you modified the behavior of a program and you want to persist the changes you made to the binary?

You can do that easily too! Each time the debugger writes to the debuggee’s memory, the change is tracked by Debugger.Patches. Each Patch is one change, and you can decide to remove it or to apply it to a binary.

using(var dbg = new ProcessDebugger())
{
    dbg.Start(SimpleCrackMe);
    RVA mainRVA = dbg.SymbolManager.FromName("SimpleCrackMe.exe", "main").RVA; //Get main address
    Breakpoint breakPoint = dbg.Breakpoints.At("SimpleCrackMe.exe", mainRVA); //Set breakpoint
    dbg.BreakingThread.Continue.UntilBreakpoint(breakPoint); //Run to it
    var disasm = dbg.BreakingThread.Continue.UntilInstruction("JNZ"); //Continue execution until next JNZ
    dbg.BreakingThread.WriteInstruction("NOP"); //Nop the JNZ
    dbg.Patches.ModifyImage(dbg.Debuggee.MainModule, "SimpleCrackMe-Patched.exe"); //Save changes back to a file
    Assert.Equal(1, dbg.Continue.ToEnd()); //Debugged process resolved !
    var result = StartAndGetReturnCode("SimpleCrackMe-Patched.exe");
    Assert.Equal(1, result); //SimpleCrackMe-Patched also resolved !!!!
}

How it works under the hood requires some explanation of how a dll or exe is laid out in RAM versus on disk.

A byte inside a dll has two locations: a file offset and an RVA (Relative Virtual Address). The file offset is its location in the file on disk; the RVA is its location in RAM, relative to the load address of the module.

Why is that?
Two main reasons:

  • The file needs to be compact. It does not have to take up space to store global variables whose values are only decided at runtime.
  • In memory, on the other hand, the sections of the dll need to be loaded on page boundaries (4 KB) for security and performance reasons. The processor can prevent code from executing within some pages, so that a buffer or stack overflow cannot be easily exploited.
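To make the two layouts concrete, here is a minimal Python sketch of the rounding involved. The alignment values are assumptions (0x200 on disk is a common default, 0x1000 is the 4 KB page mentioned above), and the 0x400 header size is hypothetical; nothing here is read from a real binary.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


# Assumed, typical values: they are not read from a real executable.
FILE_ALIGNMENT = 0x200      # granularity of section data on disk
SECTION_ALIGNMENT = 0x1000  # granularity in memory: one 4 KB page

headers_size = 0x400  # hypothetical size of the headers at the start of the file

# On disk, the first section starts right after the (file-aligned) headers.
first_section_file_offset = align_up(headers_size, FILE_ALIGNMENT)

# In memory, the same section has to start on the next page boundary.
first_section_rva = align_up(headers_size, SECTION_ALIGNMENT)

print(hex(first_section_file_offset))  # 0x400
print(hex(first_section_rva))          # 0x1000
```

With these assumed values, the first section starts at 0x400 in the file but at 0x1000 in memory. That is the kind of gap the example that follows shows for the .text section.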

Let’s take an example:
You can see with CFF Explorer that an executable (dll or exe) has multiple sections.
The Virtual Address of the section is the RVA. The Raw Address is the File Offset.
Image [29]

The .text section is usually where the machine code lives.
Now imagine that a method called IVssAdmin::RegisterProvider sits in this section, in RAM, at RVA E3BC.
Image [30]
Then, given the Virtual Address and the Raw Address (file offset) of the .text section, IVssAdmin::RegisterProvider will be located at D7BC on disk.

Image [31]

So here is the math to convert an RVA to a file offset.

E3BC (RVA) – 1000 (RVA of .text section) = D3BC (Relative address to the .text section)
D3BC + 400 (File offset of .text section) = D7BC (File Offset)

Or in code :

public FileOffset ToImageOffset(RVA rva)
{
    var section = GetSectionAt(rva);
    if(section == null)
        throw new InvalidOperationException("This RVA belongs to no section");
    var addressInSection = rva.Address - section.VirtualAddress;
    var offset = section.PointerToRawData + addressInSection;
    if(offset >= section.PointerToRawData + section.SizeOfRawData)
        throw new InvalidOperationException("This RVA is virtual");
    return new FileOffset(this, offset);
}

This is how a Patch converts changes tracked by RVA into changes to binary files.
New: NHook can now find RVAs from the export directory of a dll.

Conclusion

I hope you liked it. I think it is not a bad thing for .NET developers to understand what is going on at a lower level. This is the start of an API I will use for my own needs whenever I need to manipulate a process’s memory or behavior, for whatever reason.

There are tons of important things left to do, like:

  • x64 support
  • .NET support

Download page : http://nhook.codeplex.com/

Let’s see: Attach an Azure blob drive in Windows 7

Introduction

The main point of this article is not how to make a cloud drive work on Windows 7.

This article will teach you how to see what is going on inside, and how to take advantage of it.

That is even better than relying on sparse, outdated or non-existent documentation.

This article is not a step-by-step tutorial for every tool I use. That would be useless: documentation can be found on their websites, and once you know what you are looking for, their UI is self-explanatory.

You also don’t need to be knowledgeable about Azure. In fact, the destination of this article is not important. The trip is the big deal.

But here is my goal: I want to create a new drive backed by Azure on Windows 7.

The code can be found on MSDN, so I tried to run it on my Windows 7 machine.

 Image

ERROR_UNSUPPORTED_OS, you say?

A quick search on Google tells me that I can’t use a blob drive outside Azure…

What a shame… But I have not said my last word! Today it is decided! Today I’m gonna fight!

How to hack your way through it

Digging deep inside CloudDrive.Create with ILSpy led me to this method in a dll called mswacdmi.dll.

This assembly is a C++/CLI one: some parts are native, others managed, and here I am stuck at an unmanaged one…

 Image [1]

This dll is x64… My free edition of IDA Pro does not support x64, and neither does OllyDbg, so I can’t dig deeper.

But precious information can come from sources other than code. Reverse engineering the code should be a last resort.

“When in doubt… run ProcMon”, the great Mark Russinovich said… So that’s what I did, to see what file and registry accesses were going on under the hood.

 Image [2]

Ok it seems to be looking for registry stuff.

Moreover, creating a new drive is kernel stuff, so I suspect that this mswacdmi.dll module somehow tries to start a driver, or to communicate with one.

So I use API Monitor to filter and watch the Win32 API calls happening in my program.

I decide to trace all the file-related APIs, because that is how userland communicates with a driver (WriteFile, ReadFile, CreateFile, DeviceIoControl), as well as the Win32 APIs related to services, like OpenSCManager and OpenService. (For those who do not know, drivers are started by the Service Control Manager.)

I run my code again, and here is the result.

Image [3]

Ok, so the CloudDrive service is missing on my computer.

To learn more about this service, I create and deploy a new Azure role with remote desktop enabled.

 Image [4]

I connect to the role.

Then a quick sc qc CloudDrive shows that the service exists in Azure, but it is not a driver as I thought. (Otherwise the type would be KERNEL_DRIVER.)

 Image [5]

I also dump the missing registry keys that ProcMon told me about earlier.

 Image [6]

This reminds me that Microsoft lets you create a VM Role from an image you built yourself, so you can scale out any server or service you want.

A quick search on the internet tells me that such an image must have the VM Role integration components installed, so that instances can communicate with Azure.

The installer is in the Azure SDK: “C:\Program Files\Microsoft SDKs\Windows Azure\.NET SDK\2012-06\iso\wavmroleic.iso”.

So I start a new Azure VM and install it. (An Azure VM is different from the VM Role I used previously! An Azure VM is a plain old VM as we know it: it is never trashed, unlike instances in VM Roles, but it can’t scale.)

Unfortunately, my VM never reboots!

Frustrated, I set up a new VM hosted on my own server with Hyper-V, and install the Azure integration components.

Then I check that the service is installed.

 Image [7]

I check the registry, and take a dump of the registry keys that were missing.

 Image [8]

I run my program aaand… it works!! I create a new drive backed by my blob storage. Now I can use the normal System.IO APIs or Explorer to write to the cloud!!

 Image [9]

All is good, but I was not satisfied with the result.

Now I want to install it on my Windows 7 machine. Moreover, a quick look at the msi with InstEd shows me all the stuff it installed on my VM… Most of it is not related to the Azure drive feature and is just Azure infrastructure. This extra stuff may be the reason why my Azure VM never rebooted. (Probably because the Azure Integration Components should only be used by VM Roles, and not by Azure VMs.)

The solution is to remove every feature not related to Azure Cloud Drive from the install. For that, msi files have native support for transform files. A transform file applies a transformation to the msi in memory just before the installation starts.

InstEd makes it dead easy to create one.

 Image [10]

The Features tab shows you what the WADrive feature will install. And sure enough, you can recognize every single component.

 Image [11]

CloudDriveSvc.exe, the userland part of the Cloud Drive feature.
WADrive_wadrive.sys, the kernel part of the feature.

mswacdmi.dll, the C/C++ interface in user code that interacts with this driver.

The CloudDrive and StorageClient assemblies, the high-level .NET API to manipulate CloudDrive.

Just right click and delete the two other features.

 Image [12]

To install this stuff on Windows 7, you need to remove the launch conditions in the LaunchCondition table.

 Image [13]

Save your transform (an .mst file).

Then run msiexec and pass the msi and the transform file.
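For reference, here is a minimal sketch of that command line, built in Python. The /i switch and the TRANSFORMS property are standard msiexec usage; the function name and the two file names are hypothetical placeholders, not the real file names.

```python
def msiexec_install_command(msi_path, transform_path):
    """Build the msiexec arguments that install an msi with one transform applied."""
    return ["msiexec", "/i", msi_path, "TRANSFORMS=" + transform_path]


# Hypothetical file names, for illustration only.
command = msiexec_install_command("components.msi", "drive-only.mst")
print(" ".join(command))  # msiexec /i components.msi TRANSFORMS=drive-only.mst
```

On a Windows machine, the resulting list could be handed to subprocess.run to start the installation.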

 Image [14]

Cool !!!

 Image [15]

Pleased with myself, I thought the battle was over, but I was wrong: that was only the beginning!!

I run the installer on my Windows 7 machine, but my C# program just hangs.

A quick check with sc query shows that the CloudDrive service is not started and returned an error.

 Image [16]

I spy on my application with API Monitor. It seems that it is trying to communicate with a named pipe that does not exist.

 Image [17]

A quick handle search with procexp on my Windows Server 2008 R2 VM shows me that this named pipe is open there, and was probably created by CloudDriveSvc.exe.

 Image [18]

In other words, on my Windows 7 box, CloudDriveSvc.exe, the userland part of cloud drive, can’t start, and the .NET API of CloudDrive communicates directly with it… If the kernel part is not running either, things will get hard, because I will need to use WinDbg on a kernel driver, and that would be a different story… In kernel mode, you can forget all the nice tools we have used to spy so far. Your hands would get really dirty.

So I use Driver Loader from OSR to see if the driver is running. (I could also use sc query.)

Image [19]

Good news, the driver is running on my windows 7 box.

Just to be sure I did not strip too much stuff from the MSI, I start a new VM and create a transform that keeps all the features but removes the launch conditions… The result is the same.

Image [20]

OK, so now let’s dive into CloudDriveSvc.exe to see why it can’t start on Windows 7.

To see where things break, I will start CloudDriveSvc.exe (with sc start CloudDrive) on Windows 2008 and Windows 7, then compare the ProcMon traces.

So I stop the CloudDrive service on Windows 2008, run ProcMon on both machines, run sc start CloudDrive, and save the traces.

I save each trace both as a PML file (the native Process Monitor format, so I can open it in another instance of ProcMon later) and as an XML file. (Maybe I will need to run some code on the trace to analyze more complicated stuff.)
Here is the Windows 7 trace, and here the Windows 2008 R2 one.

Then I open both traces side by side, and try to sync them to see where the execution path diverges.

 Image [21]
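If eyeballing two windows gets tedious, the XML exports can also be compared in code. Here is a minimal Python sketch, assuming each event is an <event> element with <Operation>, <Path> and <Result> children (check this against your own export). The registry paths and results below are made up for the demo; they are not the real traces.

```python
import xml.etree.ElementTree as ET


def load_events(xml_text):
    """Return (operation, path, result) tuples from a ProcMon-style XML export."""
    root = ET.fromstring(xml_text)
    return [
        (
            event.findtext("Operation", ""),
            event.findtext("Path", "").lower(),  # Windows paths are case-insensitive
            event.findtext("Result", ""),
        )
        for event in root.iter("event")
    ]


def first_divergence(reference, other):
    """Index of the first event where the two traces differ, or None if they match."""
    for index, (left, right) in enumerate(zip(reference, other)):
        if left != right:
            return index
    if len(reference) != len(other):
        return min(len(reference), len(other))
    return None


# Made-up traces: the element names are an assumption about the export format.
WIN2008_XML = r"""<procmon><eventlist>
<event><Operation>RegOpenKey</Operation><Path>HKLM\Software\Example\Settings</Path><Result>SUCCESS</Result></event>
<event><Operation>CreateFile</Operation><Path>C:\Example\service.dll</Path><Result>SUCCESS</Result></event>
<event><Operation>RegOpenKey</Operation><Path>HKLM\Software\Example\KeyA</Path><Result>SUCCESS</Result></event>
</eventlist></procmon>"""

WIN7_XML = r"""<procmon><eventlist>
<event><Operation>RegOpenKey</Operation><Path>HKLM\Software\Example\Settings</Path><Result>SUCCESS</Result></event>
<event><Operation>CreateFile</Operation><Path>C:\Example\service.dll</Path><Result>SUCCESS</Result></event>
<event><Operation>RegOpenKey</Operation><Path>HKLM\Software\Example\KeyA</Path><Result>NAME NOT FOUND</Result></event>
</eventlist></procmon>"""

win2008 = load_events(WIN2008_XML)
win7 = load_events(WIN7_XML)
index = first_divergence(win2008, win7)
print(index)  # 2
print(win2008[index])
print(win7[index])
```

Real traces are noisier than this: you would probably filter on the process name first, and tolerate events that are merely reordered.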

And then I find the first divergence… (Left is Windows 2008, right is Windows 7)

 Image [22]

What is this key? I go to the registry and see that it is a COM component whose implementation is in vss_ps.dll. A quick check tells me that it belongs to the Volume Shadow Copy Service, which lets you back up files even if they are locked by other processes.

Now that I know who the enemy is, I run API Monitor, filter the calls for this COM component, and start spying on CloudDriveSvc.

You can ask API Monitor to hold new programs back as they start, so you have time to attach before they exit. That’s what I did.

 Image [23]

Here is the result on the Windows 7 box… this call is failing.

 Image [24]

On the Windows 2008 box:

 Image [25]

And here are all the parameters, the same on both machines.

 Image [26]

A search on the E_INVALIDARG error on MSDN tells me :

 Image [27]

OK, that explains the launch condition in the MSI… :)

But the truth is that I don’t care about the volume shadow copy feature… so if I can somehow change the parameter from VSS_PROV_HARDWARE (0x3) to VSS_PROV_SOFTWARE (0x2), the call should pass.

With the stack trace, I can see that RegisterProvider is called directly by CloudDriveSvc.

 Image [28]

And I know that the call will return to the RVA 0xE3BC. (offset column)

If I open a disassembler a little bit before this address, I should see instructions that push the parameters on the stack, and an x64 call instruction.

Can I modify CloudDriveSvc.exe and change the parameters? Definitely.

So I run the great CFF Explorer! (A small dll file viewer, disassembler and hex editor.)

The problem is that what API Monitor calls an offset is in reality an RVA, i.e. the Relative Virtual Address: the address in memory relative to the base address of the module that owns the function.

For CFF Explorer, on the other hand, an offset is a file offset, i.e. the position of a byte or instruction when it sits on disk.

A dll is divided into multiple sections. The position of these sections in virtual memory (Virtual Address, or RVA) is different from their position on disk (Raw Address, or file offset).

 Image [29]

The code section is .text (it is a convention). Moreover, you can see that the RVA of this section is 0x1000, and its virtual size is 0x25F06.

The return RVA of RegisterProvider that I got earlier with API Monitor is E3BC, so, as you can see, it sits in the .text section.

Here is how CloudDriveSvc.exe is mapped in the process’s virtual memory.

 Image [30]

Here is the math to convert an RVA to a file offset.

E3BC (RVA) – 1000 (RVA of .text section) = D3BC (Relative address to the .text section)

D3BC + 400 (File offset of .text section) = D7BC (File Offset)
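The same conversion as a small Python sketch. The .text values come from the figures quoted above (RVA 1000, virtual size 25F06, raw address 400); the raw size is not quoted in the text, so 0x26000 is an assumption, and the Section type and the function name are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    virtual_address: int  # RVA of the section once the module is loaded
    virtual_size: int     # size of the section in memory
    raw_address: int      # file offset of the section data
    raw_size: int         # size of the section data on disk


def rva_to_file_offset(rva, sections):
    """Translate an RVA into a file offset using the section table."""
    for section in sections:
        start = section.virtual_address
        if start <= rva < start + section.virtual_size:
            offset_in_section = rva - start
            if offset_in_section >= section.raw_size:
                raise ValueError("this RVA only exists in memory, not on disk")
            return section.raw_address + offset_in_section
    raise ValueError("this RVA belongs to no section")


# raw_size (last argument) is an assumed value; the others are quoted in the text.
text_section = Section(".text", 0x1000, 0x25F06, 0x400, 0x26000)

print(hex(rva_to_file_offset(0xE3BC, [text_section])))  # 0xd7bc
```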

In other words, here is CloudDriveSvc.exe when it is sitting on the drive.

 Image [31]

I jump to the file offset with the CFF Explorer disassembler, and move back a little to see how the parameters are put on the stack.

I stop at D7AB. (Do not pay attention to the “Base Address” textbox or the Address column: whatever you choose there will not change the result, it is just a visualization feature.)

 Image [32]

You can see the value 0x3 being moved onto the stack. So I just need to replace this 0x3 (VSS_PROV_HARDWARE) with 0x2 (VSS_PROV_SOFTWARE).

The file offset of the 0x03 byte is D7AB + 4 = D7AF.

 Image [33]

Save your modification, and overwrite CloudDriveSvc.exe.
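CFF Explorer does the edit for you, but the same one-byte patch can be sketched in Python. This is only an illustration under assumptions: patch_byte is a hypothetical helper, and the demo runs on a throwaway file filled with zeros rather than on the real CloudDriveSvc.exe. Checking the expected old value first avoids patching the wrong build.

```python
import os
import tempfile


def patch_byte(path, offset, expected, replacement):
    """Replace one byte at a file offset, but only if it holds the expected value."""
    with open(path, "r+b") as image:
        image.seek(offset)
        current = image.read(1)
        if current != bytes([expected]):
            raise ValueError("unexpected byte at offset " + hex(offset))
        image.seek(offset)
        image.write(bytes([replacement]))


# Demo on a throwaway file: zeros everywhere, with 0x03 planted at offset D7AF.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "throwaway.bin")
    content = bytearray(0xD7B0)
    content[0xD7AF] = 0x03
    with open(target, "wb") as image:
        image.write(content)

    patch_byte(target, 0xD7AF, expected=0x03, replacement=0x02)

    with open(target, "rb") as image:
        image.seek(0xD7AF)
        print(image.read(1).hex())  # 02
```

Note that 0x03 and 0x02 differ only by their lowest bit, so the whole patch really is a single bit.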

Restart the service.

 Image [34]

If you spy on CloudDriveSvc.exe, you can see that it really does use VSS_PROV_SOFTWARE now.

 Image [35]

I run my program again.

I wait, very anxious… and one minute later, my program closes without any crash and I have my new drive backed by an Azure VHD!!! :)

Image [36]

Conclusion

In the end, you can support Azure drive on Windows 7 by changing a single bit in CloudDriveSvc.exe.

It reminds me of a tale I read somewhere:

A company calls in a contractor to fix an important mainframe that has stopped working.
The contractor comes, opens the mainframe, scratches his head, changes one screw, closes the mainframe, and the mainframe comes back from the dead.

- OK, that will cost you $10,000.
- Isn’t that expensive for just one screw? Can you give me the details of your invoice?
- Sure. The screw: $1. Transport fees: $10… and knowing which screw to change: $9,989.

The result is cool, but less interesting than the trip. (Truth be told, I removed the Azure drive once I knew it worked.)

I hope you enjoyed it as well. But most importantly, be very grateful to the creators of these tools. Without them, such things would be impossible.

Thanks

NHook, a .NET debugger API

I am an enterprise application developer, and in this industry comfort is the real deal.

With comfort, you make bad things hard to do and good things easy to do, so that less knowledgeable people can use your code.

As you move down the abstraction layers, bad things become easier to do and good things harder. Good and bad also get blurred: they lose the clear, straight line that separates them. And as a developer, you need more time, effort and knowledge to write each line of code.

Coming from enterprise application development, I start my journey into security on the road of comfort. My first goal is to make the dynamic reverse engineering you normally do with IDA Pro or OllyDbg easy to code, with a full .NET API.

Here is a preview of an API that will help me crack this test program without having to pass 2 parameters to it.

int main(int argc, char** argv)
{
    if(argc == 3)
    {
        printf("success");
        return 1;
    }
    printf("miss");
    return 0;
}

Simple enough. Here is the OllyDbg dump.

image

So my goal was to patch the JNZ at 0x3213F2 with two NOPs.

That’s easy now.

using(var dbg = new ProcessDebugger())
{
    dbg.Start(SimpleCrackMe);
    RVA mainRVA = dbg.RVAOfFunction("SimpleCrackMe.exe", "main"); //Get the RVA of main
    Breakpoint breakPoint = dbg.Breakpoints.At("SimpleCrackMe.exe", mainRVA); //Set a breakpoint on it
    dbg.BreakingThread.Continue.UntilBreakpoint(breakPoint); //Run to it
    var disasm = dbg.BreakingThread.Continue.UntilInstruction("JNZ"); //Continue until the next JNZ
    dbg.BreakingThread.NOP(disasm.Instruction.Size); //Overwrite the JNZ with NOPs
    dbg.Patches.ModifyImage(dbg.Process.MainModule, "SimpleCrackMe-Patched.exe"); //Save the patch to a new file
}
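At the byte level, this patch is tiny. A short JNZ is two bytes (opcode 0x75 followed by a one-byte displacement) and a NOP is the single byte 0x90, hence the two NOPs. Here is a minimal Python sketch of the overwrite; nop_out is a hypothetical helper and the code bytes are a made-up stand-in, not the actual OllyDbg dump.

```python
NOP = 0x90  # x86 one-byte no-operation


def nop_out(code, offset, size):
    """Overwrite `size` bytes of a bytearray with NOPs, starting at `offset`."""
    if offset < 0 or offset + size > len(code):
        raise ValueError("patch falls outside the buffer")
    code[offset:offset + size] = bytes([NOP]) * size


# Made-up stand-in for the test program: cmp dword ptr [ebp+8], 3 ; jnz +0x0e
code = bytearray.fromhex("837d0803" "750e")

jnz_offset = 4  # position of the 0x75 opcode in this hypothetical buffer
jnz_size = 2    # opcode + one-byte displacement

nop_out(code, jnz_offset, jnz_size)
print(code.hex())  # 837d08039090
```

A real tool finds the instruction and its size with a disassembler, as the API above does; scanning for the byte 0x75 would also match operand bytes.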

The RVA of main is found thanks to a SymbolTable, which will be filled from the .pdb and from exported functions. It will also be possible to fill this SymbolTable with custom symbols and save it, for easy redistribution during reverse engineering tasks.

This library will be my building block to analyze, hook and re-use executables. It is not stealthy, but I don’t care: as a reverse engineer, I don’t need to be hidden. Future features include anti-anti-debugging stuff. :)

Who is this blog for?

This blog is about combining tools, code and knowledge to see and take advantage of what the majority of developers overlook.

This is not a set of tutorials. You can find those by yourself on other websites or in books. This blog is not for consumers, it is for doers.

I will mix other people’s ideas, code, knowledge, libraries and tools for some defined purpose, and explain it here.

You can use everything for your own purpose. You can be angry or just curious, you can use it for fun, glory or money, you are welcome. As long as you are smart, you’ll have your own good reason. The most dangerous person is the incompetent with powerful tools, that’s why we don’t give a knife to a baby, and I hope that the technical depth of this blog will keep him at home.

I am a developer myself, and my passion is to learn, combine ideas, and share with everyone. That’s why I write a lot on CodeProject and why I earn money with training. I am mainly a .NET programmer, and now you will also learn the low-level stuff with me. Both .NET and C/C++ will help us achieve our goal; they are just tools at our disposal that we will use and combine to express our creativity.

This blog is for hackers. Not hackers as in pirates, but hackers as in people who like to take things apart and reassemble them to create harmonious solutions. Again, this is not a blog for consumers, it is a blog for doers.

If you appreciate my posts, share them with the smartest people, and share your ideas: these are the best gifts you can give.