(Note: This article uses .NET Web Driver and .NET Interoperability to work with the Win32 API.)
In this article, I'll show you how to bring a Chrome browser window to the front, ensuring it becomes the active, focused window during Selenium automation. Usually, Selenium is used to automate web tasks in the background, allowing you to continue working on other tasks. However, some websites 😉 get suspicious if they detect actions happening while the browser is inactive—after all, how could there be interactions if the browser isn't even in focus?
To help address this, let's take a closer look at how browser focus works. In JavaScript, the onfocus and onblur events allow websites to determine if the browser is the active window. The onfocus event triggers when the window gains input focus (meaning it can receive keyboard input and is the active window on the screen), while the onblur event triggers when the browser loses focus. These simple events can give websites clues about whether interactions seem automated.
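To see what a page itself would observe, one option is to ask it from the Selenium side. The sketch below is written as an F# script (.fsx) and assumes the Selenium.WebDriver NuGet package; pageHasFocus is a hypothetical helper name, not part of Selenium. It relies on the standard DOM method document.hasFocus(), which reports the same state that the focus and blur events track.

```fsharp
#r "nuget: Selenium.WebDriver"
open OpenQA.Selenium

/// Returns true when the page believes its window currently has focus.
/// Assumes `driver` is a live session whose driver supports script execution.
let pageHasFocus (driver: IWebDriver) : bool =
    let js = driver :?> IJavaScriptExecutor
    // document.hasFocus() is a standard DOM method; the JS boolean comes back boxed as obj
    js.ExecuteScript("return document.hasFocus();") :?> bool
```

Calling pageHasFocus driver before and after bringing the window to the front is a quick way to check whether the rest of this article had the intended effect.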
Many websites also use more sophisticated approaches to detect human-like interaction patterns. Therefore, if you're trying to convincingly simulate human behavior, controlling the browser window—bringing it to the front or switching between windows—can be essential. This is where the Win32 API comes in handy.
Little Note about F#
This article uses F#. Here is a brief explanation of its call syntax. In C#, if you have a method named f that takes one argument (say, x), you call it with var y = f(x). In F#, the same call can be written in several ways:
let y = f(x) // normal call
let y = f x // parentheses are optional
let y = x |> f // forward pipe call
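The forward pipe matters most when several calls are chained, which is how most of the code in this article is written. Here is a minimal sketch with a made-up square function, comparing a nested call with the piped equivalent:

```fsharp
let square x = x * x

// Nested calls read inside-out...
let nested = List.sum (List.map square [1; 2; 3])

// ...while the forward pipe reads top to bottom.
let piped =
    [1; 2; 3]
    |> List.map square
    |> List.sum

printfn "%d %d" nested piped // prints "14 14"
```

Both forms compute the same value; the piped version simply lists the steps in the order they run.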
Bringing Chrome to the Front with Selenium
On Windows, you can control any application window using the Win32 API, but you need a "window handle" to do so. A window handle (often abbreviated as HWND) is a token provided by the Windows OS that uniquely identifies a window in the system's GUI environment.
When using Selenium, starting ChromeDriver automatically opens the browser, but it doesn't directly provide the window handle. Instead, we can get to it indirectly via the ChromeDriver process ID, which is accessible through ChromeDriverService.
Step 1: Obtain the ChromeDriver Process ID
When we create the ChromeDriver instance, we can access its process ID through the given ChromeDriverService object.
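open OpenQA.Selenium.Chrome

let options = ChromeOptions()
// ChromeDriverService has no public constructor; use the factory method instead
let service = ChromeDriverService.CreateDefaultService()
// ChromeDriver is IDisposable, so construct it with `new`
let driver = new ChromeDriver(service, options)
printfn $"Process ID = {service.ProcessId}"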
Step 2: Find the Browser's Process ID
The ChromeDriver executable (chromedriver.exe) is responsible for launching the Chrome browser instance, but we need the browser's process ID, not the driver's. To find it, we need to iterate through all running Chrome processes on the machine and identify the one spawned by our driver.
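open System.Diagnostics

// Note: getParentId (Step 2.1) must be defined above this function in the source file
let getBrowserId (driver_process_id: int) =
    let chrome_processes = Process.GetProcessesByName "chrome"
    try
        let candidates =
            chrome_processes
            |> Seq.filter (fun p -> getParentId p.Id = driver_process_id)
            |> Seq.map _.Id
            |> Seq.toArray
        assert (candidates.Length = 1) // exactly one Chrome process should be a direct child of the driver
        candidates[0]
    finally
        // Process objects hold native handles, so release them explicitly
        chrome_processes |> Seq.iter _.Dispose()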
Step 2.1: Get the Parent Process ID
Getting the parent process ID from a given process ID is a bit tricky: it requires querying detailed process information using the NtQueryInformationProcess function. Essentially, we open the process with the known ID, call this function, and extract the required information.
open System
open System.Runtime.InteropServices

[<Flags>]
type private ProcessAccessFlags = QueryInformation = 0x400u

// Mirrors the native PROCESS_BASIC_INFORMATION layout (six pointer-sized fields)
[<Struct; StructLayout(LayoutKind.Sequential)>]
type private ProcessBasicInformation =
    { Reserved1: IntPtr
      PebBaseAddress: IntPtr
      Reserved2_0: IntPtr
      Reserved2_1: IntPtr
      UniqueProcessId: IntPtr
      InheritedFromUniqueProcessId: IntPtr }
    static member Default =
        { Reserved1 = IntPtr.Zero
          PebBaseAddress = IntPtr.Zero
          Reserved2_0 = IntPtr.Zero
          Reserved2_1 = IntPtr.Zero
          UniqueProcessId = IntPtr.Zero
          InheritedFromUniqueProcessId = IntPtr.Zero }

[<DllImport("kernel32.dll")>]
extern IntPtr private OpenProcess(ProcessAccessFlags dwDesiredAccess, bool bInheritHandle, int dwProcessId)

[<DllImport("kernel32.dll", SetLastError = true)>]
extern bool private CloseHandle(IntPtr hObject)

[<DllImport("ntdll.dll")>]
extern int private NtQueryInformationProcess(IntPtr ProcessHandle, int ProcessInformationClass, ProcessBasicInformation& ProcessInformation, int ProcessInformationLength, int& ReturnLength)

let getParentId (process_id: int) =
    let handle = OpenProcess(ProcessAccessFlags.QueryInformation, false, process_id)
    if handle = IntPtr.Zero then
        failwith "OpenProcess failed"
    try
        let mutable pbi = ProcessBasicInformation.Default
        let mutable returnLength = 0
        let size = Marshal.SizeOf<ProcessBasicInformation>()
        // Information class 0 is ProcessBasicInformation
        let status = NtQueryInformationProcess(handle, 0, &pbi, size, &returnLength)
        if status <> 0 then failwith $"NtQueryInformationProcess failed with status 0x{status:X8}"
        pbi.InheritedFromUniqueProcessId.ToInt32()
    finally
        // Always release the process handle, even if the query failed
        CloseHandle handle |> ignore
Step 3: Locate the Window Handle
Once we have the process ID for the browser, the next step is to enumerate the window handles owned by this process.
type private EnumWindowsProc = delegate of IntPtr * IntPtr -> bool

[<DllImport("user32.dll")>]
extern bool private EnumWindows(EnumWindowsProc lpEnumFunc, IntPtr lParam)

[<DllImport("user32.dll")>]
extern int private GetWindowThreadProcessId(IntPtr hWnd, int& lpdwProcessId)

[<DllImport("user32.dll")>]
extern bool private IsWindowVisible(IntPtr hWnd)

let findBrowserHandle (browser_pid: int) =
    // A ResizeArray is used because closures cannot capture `let mutable` locals
    let result = ResizeArray<IntPtr>()
    let enumWindowsProc (hWnd: IntPtr) (_: IntPtr) : bool =
        let mutable pid = 0
        GetWindowThreadProcessId(hWnd, &pid) |> ignore
        if pid = browser_pid then result.Add hWnd
        true // returning true keeps the enumeration going
    let callback = EnumWindowsProc(enumWindowsProc)
    if not <| EnumWindows(callback, IntPtr.Zero) then failwith "EnumWindows failed"
    // Throws unless the process owns exactly one visible top-level window
    result
    |> Seq.filter (fun hWnd -> IsWindowVisible hWnd)
    |> Seq.exactlyOne
Since there could be multiple windows, we specifically look for the visible one—this is the Chrome window that we want to interact with.
Final Step: Bring Chrome to the Front!
Now that we have the correct window handle, we can bring it to the front by using the SetForegroundWindow API. This call will make Chrome the active window, ready for any user input.
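// SetLastError = true lets the marshaller capture the error code right after the call
[<DllImport("user32.dll", SetLastError = true)>]
extern bool private SetForegroundWindow(IntPtr hWnd)

let [<Literal>] private NoError = 0
let [<Literal>] private AccessDeniedError = 5

let setForegroundWindow (hwnd: IntPtr) =
    if SetForegroundWindow hwnd then
        true
    else
        // P/Invoking GetLastError directly is unreliable in .NET because the runtime may call
        // other APIs in between; Marshal.GetLastWin32Error reads the captured value instead.
        // SetForegroundWindow is not documented to set an error code, so treat this as best effort.
        match Marshal.GetLastWin32Error() with
        | AccessDeniedError ->
            printfn "Cannot set foreground, process doesn't own foreground privilege"
            false
        | NoError -> true // window is already on top... I think...
        | code -> failwith $"SetForegroundWindow failed (0x{code:X})"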
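One caveat: Windows restricts which processes may change the foreground window. According to the SetForegroundWindow documentation, the call generally succeeds only under certain conditions, for example when the calling process is the foreground process, was started by the foreground process, or received the last input event. In practical terms, this works most reliably when your Selenium script is the active application, or was launched from it (for example, from the terminal you are currently working in). If those conditions aren't met, Windows typically flashes the taskbar button instead of activating the window.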
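Putting the steps together, the whole procedure is a single pipeline from the driver's process ID to a success flag. The sketch below shows only that wiring: bringToFront is a hypothetical helper name, and the three steps are passed in as parameters so the snippet runs on its own, here with stand-in functions and made-up values instead of the real Win32 calls.

```fsharp
open System

/// Chains the steps: driver PID -> browser PID -> window handle -> foreground.
/// The step functions are parameters so this sketch stays self-contained.
let bringToFront
    (getBrowserId: int -> int)
    (findBrowserHandle: int -> IntPtr)
    (setForegroundWindow: IntPtr -> bool)
    (driverPid: int)
    : bool =
    driverPid
    |> getBrowserId
    |> findBrowserHandle
    |> setForegroundWindow

// Stand-in steps (hypothetical values) so the wiring can be checked without a browser.
let demo =
    bringToFront
        (fun driverPid -> driverPid + 1)        // pretend the browser PID is the driver PID + 1
        (fun browserPid -> nativeint browserPid) // pretend the handle equals the PID
        (fun hwnd -> hwnd = 43n)                 // "succeeds" only for the expected handle
        42

printfn "%b" demo // prints "true"
```

With the functions from this article in scope, the real call would read service.ProcessId |> bringToFront getBrowserId findBrowserHandle setForegroundWindow.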
Conclusion
In this tutorial, we used a combination of Win32 API calls to obtain and manipulate the window handle of a Chrome browser, allowing us to bring it to the front during Selenium automation. While this may seem like a lot of effort for something seemingly simple, it's often necessary if you need to convincingly simulate human interactions. Ideally, Selenium would offer more direct support for this, but until then, this approach ensures your automation scripts remain effective and human-like.