Note: Get the full running example here
While you can use inline assembler in a 32 bit project, you cannot use it in a 64 bit build. Microsoft simply does not support this. You have to use an external .asm file instead, which can be processed by MASM during the build process.
Assembler in VS
To activate MASM support in a C++ project in VS2019, please follow the guidelines in the official documentation. Additionally, I suggest installing the VS extension AsmDude to get syntax highlighting.
Switching between both implementations
To access the procedures that are defined in the assembler file, you must declare them with extern "C" in the header file. Otherwise the compiler decorates (mangles) the function names, and the linker won't be able to match the symbols from the CPP and ASM output.
To see that effect, add a new function to a header file and call it somewhere in your CPP code. Then open the obj file and search for the function name.
extern bool DevToTest(int a, int b);
Without the "C" addition, the name does not match the original one:
Using "C" fixes this:
extern "C" bool DevToTest(int a, int b);
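The difference between both kinds of linkage can be sketched in a few lines. The function names and bodies below are made up for illustration and are not part of the profiler project:

```cpp
// Hypothetical functions, only meant to illustrate linkage.

// C++ linkage: the compiler encodes the parameter types into the symbol
// name, which is what makes overloading possible.
bool DevToTestMangled(int a, int b) { return a < b; }
bool DevToTestMangled(double a, double b) { return a < b; }

// C linkage: the symbol name is derived from the plain function name, so an
// assembler file can define or reference it by that name. Overloading such
// a function is not possible.
extern "C" bool DevToTestPlain(int a, int b) { return a < b; }

// Several declarations can also be wrapped in one block.
extern "C" {
    int DevToAdd(int a, int b) { return a + b; }
}
```

Calling these functions from C++ looks the same in both cases. Only the symbol name that ends up in the obj file differs.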
To differentiate between 32 and 64 bit code, you can use preprocessor directives. Adjust Naked32Bit.h as follows:
#pragma once
#include "pch.h"
#ifdef _WIN64
EXTERN_C void InitEnterLeaveCallbacks(bool* activate, int* hashMap, int size);
EXTERN_C void FnEnterCallback(FunctionID funcId,
UINT_PTR clientData,
COR_PRF_FRAME_INFO func,
COR_PRF_FUNCTION_ARGUMENT_INFO* argumentInfo);
EXTERN_C void FnLeaveCallback(FunctionID funcId,
UINT_PTR clientData,
COR_PRF_FRAME_INFO func,
COR_PRF_FUNCTION_ARGUMENT_INFO* argumentInfo);
EXTERN_C void FnTailcallCallback(FunctionID funcId,
UINT_PTR clientData,
COR_PRF_FRAME_INFO func);
#else
void InitEnterLeaveCallbacks(bool* activate, int* hashMap, int size);
//....
#endif
In a 64 bit build, the functions refer to external symbols. I also adjusted the signature of the Init function. This was necessary because I wanted to show you how to build the same logic as in the inline assembler version, which requires a hashmap. To avoid allocating memory in assembler, I simply pass the variables from CPP into the assembler code. This saves me some time and makes the whole thing more readable.
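If you want to check which branch of such an #ifdef ends up in your build, a tiny probe helps. This is only a sketch: the names kBuiltWithWin64, DescribeBuild and kPointerBits are hypothetical and do not exist in the project.

```cpp
#include <cstddef>
#include <string>

// _WIN64 is predefined by MSVC only when compiling for a 64 bit target.
// With other compilers this sketch simply takes the #else branch.
#ifdef _WIN64
constexpr bool kBuiltWithWin64 = true;
#else
constexpr bool kBuiltWithWin64 = false;
#endif

// Hypothetical helper: reports which branch was compiled in.
std::string DescribeBuild() {
    return kBuiltWithWin64
        ? "64 bit Windows build: callbacks come from the .asm file"
        : "other build: callbacks come from the C++ file";
}

// The pointer width can be checked independently of the macro.
constexpr std::size_t kPointerBits = sizeof(void*) * 8;
```

Printing DescribeBuild() once at startup is enough to confirm that the configuration you selected in VS really is the one being compiled.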
Note: Of course the naming of the header file is not correct anymore, but this does not matter 😄
Now adjust Naked32bit.cpp:
extern "C" void _stdcall StackOverflowDetected(FunctionID funcId, int count) {
std::cout << "stackoverflow: " << funcId << ", count: " << count;
}
extern "C" void _stdcall EnterCpp(
FunctionID funcId,
int identifier) {
std::cout << "enter function id: " << funcId << ", Arguments in correct order: " << (identifier == 12345) << "\r\n";
}
#ifdef _WIN64
#else
bool* activateCallbacks;
int* pHashMap;
int mapSize;
void InitEnterLeaveCallbacks(bool* activate, int* hashMap, int size) {
activateCallbacks = activate;
pHashMap = hashMap;
mapSize = size;
}
//....
#endif
Both functions, EnterCpp and StackOverflowDetected, must be marked with extern "C". The Init function and the variables must be moved into the 32 bit code block. You can leave the 64 bit code block empty because everything will be in the assembler file.
Now add the initialization in ProfilerConcreteImpl.cpp:
this->PHashMap = new int[mapSize];
memset(this->PHashMap, 0, mapSize * sizeof(int)); // memset expects a size in bytes, not in elements
InitEnterLeaveCallbacks(&this->ActivateCallbacks, this->PHashMap, mapSize);
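One detail that is easy to get wrong when zeroing such an array is that memset counts bytes, not elements. Here is a minimal sketch of a safe allocation, using a hypothetical helper CreateCounters that is not part of the project:

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: allocates `mapSize` counters, all starting at zero.
std::unique_ptr<int[]> CreateCounters(std::size_t mapSize) {
    // The trailing () value-initializes the array, so every int is already 0.
    std::unique_ptr<int[]> counters(new int[mapSize]());

    // Equivalent explicit form: memset takes a size in bytes, so the element
    // count has to be multiplied by sizeof(int).
    std::memset(counters.get(), 0, mapSize * sizeof(int));
    return counters;
}
```

Either form on its own would be enough. Both are shown only to make the byte count explicit.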
The ASM Code
What you will see now is no magic. There is only one thing you have to pay attention to: in 64 bit builds there is only one calling convention, the x64 variant of fastcall. See the links at the end of the post to get an insight into it. The most important points (at least these are the points I came across a few times):
- The first four integer or pointer parameters are passed from left to right in the registers RCX, RDX, R8 and R9
- The caller must reserve 4*8 bytes on the stack (the shadow space) in case the callee wants to store the parameters there
- The caller has to clean up the stack afterwards
I stumbled over the last two points a few times, which led to unwanted behavior.
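The stack bookkeeping behind the last two points can be written down as a small calculation. Besides the shadow space, the x64 convention also expects RSP to be 16-byte aligned at every call instruction. The sketch below assumes a procedure that pushes nothing itself, and the names kShadowSpaceBytes and StackAdjustment are hypothetical:

```cpp
#include <cstddef>

// Shadow space: room for the four register parameters (RCX, RDX, R8, R9).
constexpr std::size_t kShadowSpaceBytes = 4 * 8;  // 32 = 20h

// Smallest N for `sub rsp, N` that provides the shadow space plus
// `localBytes` of scratch space and leaves RSP 16-byte aligned at the call.
// Assumption: the procedure pushes nothing, so on entry RSP is 8 bytes off
// a 16-byte boundary (the return address was just pushed).
constexpr std::size_t StackAdjustment(std::size_t localBytes) {
    return (kShadowSpaceBytes + localBytes + 7) / 16 * 16 + 8;
}

static_assert(StackAdjustment(0) == 0x28, "shadow space only");
static_assert(StackAdjustment(8) == 0x28, "one saved register fits into the padding");
static_assert(StackAdjustment(16) == 0x38, "two saved registers");
```

The 8 bytes of padding in the first case are not wasted: they can hold one register value that has to survive the call.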
_DATA SEGMENT
pActivateEnterLeaveCallback qword 0
pHashMap qword 0
mapSize dword 0
_DATA ENDS
extern EnterCpp:proc
extern StackOverflowDetected:proc
_TEXT SEGMENT
PUBLIC InitEnterLeaveCallbacks
InitEnterLeaveCallbacks PROC
mov pActivateEnterLeaveCallback, RCX
mov pHashMap, RDX
mov mapSize, R8D
ret
InitEnterLeaveCallbacks ENDP
PUBLIC FnEnterCallback
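FnEnterCallback PROC
; RCX = funcId (first argument), the other arguments are not used
MOV RAX, pActivateEnterLeaveCallback
CMP BYTE PTR [RAX], 1
JNE skipCallback
MOV R8, pHashMap
MOV RAX, RCX
XOR RDX, RDX
DIV DWORD PTR [mapSize] ; EDX = low 32 bits of funcId modulo mapSize
LEA R8, [R8 + RDX*4] ; address of the counter, every slot is a 4 byte int
INC DWORD PTR [R8]
CMP DWORD PTR [R8], 30
JB skipStackOverflow
MOV EDX, [R8] ; second argument: current count (writing EDX clears the upper half of RDX)
SUB RSP, 28h ; 20h shadow space + 8 bytes, this also keeps RSP 16-byte aligned at the call
MOV [RSP+20h], RCX ; RCX is volatile, so save funcId across the call
CALL StackOverflowDetected
MOV RCX, [RSP+20h] ; restore funcId for EnterCpp
ADD RSP, 28h
skipStackOverflow:
SUB RSP, 28h
MOV RDX, 12345 ; second argument: identifier
CALL EnterCpp
ADD RSP, 28h
skipCallback:
RET
FnEnterCallback ENDP
PUBLIC FnLeaveCallback
FnLeaveCallback PROC
MOV RAX, pActivateEnterLeaveCallback
CMP BYTE PTR [RAX], 1
JNE skipCallback
MOV R8, pHashMap
MOV RAX, RCX
XOR RDX, RDX
DIV DWORD PTR [mapSize] ; EDX = index of the slot, as in FnEnterCallback
LEA R8, [R8 + RDX*4]
DEC DWORD PTR [R8]
skipCallback:
RET
FnLeaveCallback ENDP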
PUBLIC FnTailcallCallback
FnTailcallCallback PROC
ret
FnTailcallCallback ENDP
_TEXT ENDS
END
You see, nothing new here. The sub RSP and add RSP instructions around each call reserve the shadow space on the stack and clean it up afterwards.
Using CPP implementations
As it seems that the CLR uses the fastcall convention for calling the callbacks, you might assume that you can use CPP implementations instead of writing assembler code. Indeed, I was able to do this:
#ifdef _WIN64
bool* activateCallbacks;
int* pHashMap;
int mapSize;
void InitEnterLeaveCallbacks(bool* activate, int* hashMap, int size) {
activateCallbacks = activate;
pHashMap = hashMap;
mapSize = size;
}
void __fastcall FnEnterCallback(
FunctionID funcId,
UINT_PTR clientData,
COR_PRF_FRAME_INFO func,
COR_PRF_FUNCTION_ARGUMENT_INFO* argumentInfo) {
if (*activateCallbacks) {
int amount = pHashMap[funcId % mapSize];
amount++;
pHashMap[funcId % mapSize] = amount;
if (amount >= 30) {
StackOverflowDetected(funcId, amount);
}
EnterCpp(funcId, 12345);
}
}
void __fastcall FnLeaveCallback(
FunctionID funcId,
UINT_PTR clientData,
COR_PRF_FRAME_INFO func,
COR_PRF_FUNCTION_ARGUMENT_INFO* argumentInfo) {
if (*activateCallbacks) {
pHashMap[funcId % mapSize] = pHashMap[funcId % mapSize] - 1;
}
}
void __fastcall FnTailcallCallback(FunctionID funcId,
UINT_PTR clientData,
COR_PRF_FRAME_INFO func) {
}
#else
//....
#endif
While testing the code I did not see any errors, but I do not know whether this approach is intended by Microsoft.
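To try the counting logic without attaching a profiler, it can be isolated into a small class. This is a simplified stand-in for the callbacks above, not the project's code. RecursionTracker and its members are hypothetical names, and it only counts reports where the real code calls StackOverflowDetected.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the profiler state; all names here are hypothetical.
struct RecursionTracker {
    std::vector<int> counters;   // one slot per hash bucket, starts at zero
    int threshold;               // depth at which an overflow is reported
    int overflowReports = 0;     // how often the threshold was reached

    RecursionTracker(std::size_t mapSize, int limit)
        : counters(mapSize, 0), threshold(limit) {}

    // Mirrors FnEnterCallback: increment the bucket and check the limit.
    void Enter(std::uintptr_t funcId) {
        int& depth = counters[funcId % counters.size()];
        ++depth;
        if (depth >= threshold) {
            ++overflowReports;
        }
    }

    // Mirrors FnLeaveCallback: decrement the bucket again.
    void Leave(std::uintptr_t funcId) {
        --counters[funcId % counters.size()];
    }
};
```

Keep in mind that function IDs which map to the same bucket share one counter, so the detection is only an approximation.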
Conclusion
The differences between 32 and 64 bit are not that big. I think the most relevant thing is the calling convention.
Additional Links
Configure project in VS to enable MASM
Use correct #define for x86/x64
Impact of fastcall to stack consumption
Unwind code macros
Stack usage on x64
Another link about stack frames
X64 ASM code for the profiler
Example about unwind information
Explanation of fast call asm code
Found a typo?
As I am not a native English speaker, it is very likely that you will find an error. In this case, feel free to create a pull request here: https://github.com/gabbersepp/dev.to-posts . Also please open a PR for all other kind of errors.
Do not worry about merge conflicts. I will resolve them on my own.