The Cat and Mouse Game: Exploiting Statistical Weaknesses in Human Interaction Anti-Evasions

As a discipline, information security involves a vast web of entry vectors, mitigations, and counter-mitigations. Among these, one of the most impactful points of conflict between attackers and defenders is what happens when binaries are subjected to sandbox emulation. Purely static analysis has been understood to be a game that attackers win, and defenders lose: code can be obfuscated (“packed”, “crypted”, “virtualized”, take your pick), forcing any analysis to first run it in some way in order to understand it. If you can tell anything at all about a binary’s functionality at face value, it’s due to its author’s lack of effort, not their lack of means.

The same is far from true about sandbox emulation. Countermeasures to sandbox emulations do exist, but each comes with its own baggage, disadvantages, and chance of outright failure. No one-click silver bullet has popularly emerged in the attacker arsenal to just trivialize sandboxes. Some attackers may mistakenly believe they can ‘solve’ sandbox emulation by compiling the al-khaser test suite into their binary and refusing to run if a test fails, but unlike packing, which is a one-way operation, a test suite like this can be flipped around and used as a shield: anyone operating a sandbox can verify that their sandbox passes all the tests. In this way, paradoxically, by publicizing numerous sandbox evasions, projects like al-khaser have significantly cut down the maneuvering space available to attackers.

As a result of all this, the sandbox is one of the focal points where malicious campaigns sink or swim. Binaries that will just execute obvious malicious functionality in a sandbox are, in some fundamental sense, naked and ripe for detection, and so attackers are naturally pressured to add sandbox evasions. The moment they do, we’ve flipped the script on them: now _they_ have to worry about false negatives and false positives. A failed evasion or a falsely triggered evasion both result in the attack chain being cut short. Depending on the attacker’s motivation, this directly equals lost money, lost power, or lost prestige.

A natural question to ask is, “Who wins this game in the long run?” That is, if sandboxes and evasions continue to get better, is the natural end result of this process the uber-sandbox that will force binaries to expose all their latent malicious behavior or the uber-evasion that can allow malware to run only when advantageous to attackers? We won’t fully explore this question here, as its scope is huge and involves some notions that we would rather not help find their way to the attacker TTP mainstream (as Lovecraft cautioned, “Do not call up that which you cannot put down”). But we can reasonably discuss some of the next moves attackers might make as long as it’s feasible to get ahead of them.

Observing the way many evasions work as seen in (e.g.) the al-khaser test suite and taking it at face value can lead to a mistaken mental model where ‘evasions’ must result from simple glitches that can be fixed, or otherwise from naive responses by the sandbox to a query about something: some file, some registry key, a disk drive size, some other artifact. One might complacently reason that sandboxes simply need to ‘fix all the glitches and patch all the query results’, and that sandbox hardening is in this way reduced to going through a gigantic check list and is effectively a solved problem. Unfortunately, this approach falls apart in cases where satisfying the query _requires,_ in a strong sense, some intrinsic resource that the sandbox does not have access to.

One example of such a scenario is all evasions having to do with ‘human interaction’. A human is not a simple `TRUE` result of a call to `IsHumanPresent()`. Humans have behaviors, tendencies, resources, and capabilities. Until such time that every sandbox has a dedicated AI convinced that it is human and going on about its day-to-day routine in a way indistinguishable from one, this is a difficult and, we argue, under-explored frontier in the sandbox evasion space.

In this article, we first outline an example of such an evasion that bypasses several sandboxes, then provide an implemented mitigation.

Evasions based on mouse movements are not new. For example, certain strains of malware from the early 2020s would periodically sample the mouse position and wait for the `xor` of the X and Y positions to reach a certain value. This was an implicit condition on the number of different values the cursor position would take; it defeated some naive mouse movement emulation algorithms that picked a new random position for the mouse every second or so. This, and some slightly more advanced examples, prompted off-the-shelf sandbox solutions to introduce hardening in the form of less obviously auto-generated mouse movements. One particularly popular movement model used for this purpose is the random walk. Whenever some predetermined amount of time passes, the cursor is moved in some chosen direction by some distance, chosen by a process that has at least some random element.

It is really not our intention here for this to be a copy-paste-able widget or even a guide for successfully evading sandboxes. So we will not quite give the specifics here, but we will say that in many cases, implementations of random walks are vulnerable to a statistical attack. We created a proof-of-concept binary that evaded detection by several off-the-shelf sandboxes but would reliably trigger whenever we let a human naturally move the cursor for a while using a mouse.

Having verified that this is possible, we set out to design an algorithm for simulated mouse movement that will resist this attack (and, hopefully, this kind of attack in general; a short discussion on this issue follows the technical exposition).

The simulated mouse movement algorithm is comprised of the following loop:

The code corresponding to this top level behavior is below.

“`
def generate_movement(t: timedelta) -> List[Coor]: output_len = int(floor((t.total_seconds() * TIME.MS_IN_S) / TIME.INTERVAL)) result : List[Coor] = [] cur_pos : Coor = gauss_step_from( [SCR.WIDTH//2,SCR.HEIGHT//2], [SCR.WIDTH,SCR.HEIGHT] ) while len(result) Prototype: wave = lambda n,shift: lambda x: sin(n*tau*x+shift) freqs = [random.choice(range(1,PROTO.MAX_FREQ+1)) for _ in range(PROTO.NUM_WAVES)] profile = {f:np.random.uniform()*tau for f in freqs} h = lambda x: sum([wave(f,s)(x) for (f,s) in profile.items()])/len(profile) psi = lambda x: 4*(x-x**2) f = lambda x: x*(1-psi(x))+((h(x)+1)/2)*psi(x) return f
“`

Below is a collection of 25 randomly generated different prototypes using the above process.

Once a source point and a destination point on the screen are chosen, as well as path prototypes for both the \(x\) and \(y\) coordinates and a desired time for the trip to take place, creating the actual cursor movement is a matter of algebra — for each point in time seeing what portion \(0 \leq r \leq 1\) of the trip has elapsed, what values the prototypes \(\varphi_x\) and \(\varphi_y\) take at that \(r\), and finally computing the corresponding linear combinations of \(v_{\text{cur}}\) and \(v_{\text{next}}\) at the \(x\) coordinate and \(y\) coordinate.

Let’s see an example. Suppose we are charting a path between two points on a 1920×1080 display. We have determined that the journey should take \(T= 5_{\text{sec}}\). Using the above method, we generate path prototypes, one per each coordinate.

Applying these two prototypes to the movement between \(v_{\text{cur}}\) and \(v_{\text{next}}\), one then obtains the following cursor movement.

As explained earlier, the full simulated mouse movement is just a sequence of these paths, where the next destination is sampled from a Gaussian centered around the current position, and there is an exponentially distributed delay between one path and the next.

At this point, it is natural to ask how favorable the prospects are of an attacker trying to construct an evasion again, sidestepping a sandbox that employs this kind of simulated mouse movement. We think the answer is nuanced and is influenced mainly by the following considerations.

`base_movement_time. Call` `generate_movementand interpolate the result with a bunch of other not-obviously-auto-generated stuff. Get creative. If there is no single algorithm to beat, attackers are at a further disadvantage.`

The universe of sandbox evasions is very large and very strange. Even something as prosaic as minor quirks in a mouse movement can, in theory, spell the difference between a successful attack and an attack stopped in its tracks or between a targeted analysis taking an hour and taking half a week. Malware authors are doing their best to come up with new evasions all the time, and it’s hard to imagine a sweeping solution to all their many creative ideas. To win the day, defenders must take many Sisyphean small steps, closing half-open avenues and raising walls ever so slightly higher. The best we can hope for is that one day a malware author with a half-baked evasion idea on their mind reads this and says “never mind, I’m going to try something else”.

“`
import numpy as np import random import time from math import floor,sqrt,sin,tau from datetime import timedelta from typing import Tuple, List, Callable import platform def get_display_resolution() -> Tuple[int,int]: if platform.system() == ‘Windows’: import ctypes user32 = ctypes.windll.user32 user32.SetProcessDPIAware() return user32.GetSystemMetrics(0), user32.GetSystemMetrics(1) if platform.system() == ‘Linux’: from Xlib.display import Display screen = Display(‘:0’).screen() return (screen.width_in_pixels, screen.height_in_pixels) print(“Could not get screen resolution.”) exit(1) #parameters #All time intervals are in ms class TIME: MS_IN_S = 1000 INTERVAL = 10 class MOV: GLOBAL_COEF = 15000 MIN_TIME = 100 EXP_DIST_MEAN = 1 DEST_MOTILITY = 0.25 class WAIT: GLOBAL_COEF = 500 EXP_DIST_MEAN = 1 class SCR: WIDTH, HEIGHT = get_display_resolution() class PROTO: MAX_FREQ = 4 NUM_WAVES = 6 type Coor = List[int] type Prototype = Callable[[float],float] #functions def generate_movement(t: timedelta) -> List[Coor]: output_len = int(floor((t.total_seconds() * TIME.MS_IN_S) / TIME.INTERVAL)) result : List[Coor] = [] cur_pos : Coor = gauss_step_from( [SCR.WIDTH//2,SCR.HEIGHT//2], [SCR.WIDTH,SCR.HEIGHT] ) while len(result) int: return int(floor(t.total_seconds()*TIME.MS_IN_S / TIME.INTERVAL)) def pick_movement_time(cur_pos: Coor, next_pos: Coor) -> timedelta: mlc = mov_len_coef(cur_pos,next_pos) base_movement_time = np.random.exponential(MOV.EXP_DIST_MEAN) return timedelta(milliseconds=max( MOV.MIN_TIME, base_movement_time*mlc*MOV.GLOBAL_COEF )) def gauss_step_from(src: Coor, bound: Coor) -> Coor: assert len(src)==len(bound) dim = len(src) assert(0 float: abs_dist = sqrt((dst[0]-src[0])**2+(dst[1]-src[1])**2) screen_diag = sqrt(SCR.WIDTH**2+SCR.HEIGHT**2) return abs_dist / screen_diag def gen_path_proto_1d() -> Prototype: wave = lambda n,shift: lambda x: sin(n*tau*x+shift) freqs = [random.choice(range(1,PROTO.MAX_FREQ+1)) for _ in range(PROTO.NUM_WAVES)] profile = {f:np.random.uniform()*tau for f in freqs} h = lambda x: sum([wave(f,s)(x) for (f,s) in profile.items()])/len(profile) psi = lambda x: 4*(x-x**2) f = lambda x: x*(1-psi(x))+((h(x)+1)/2)*psi(x) return f def actualize_path_1d(proto: Prototype, start: int, end: int, steps: int) -> List[int]: r = lambda i: proto(i/steps) samples = steps+1 # @—@—@—@ ||| —*3 whereas @*(3+1) return [round(start*(1-r(i)) + end*r(i)) for i in range(samples)] def gen_path(src: Coor, dst: Coor, steps: int) -> List[Coor]: assert(len(src)==len(dst)) dim = len(src) protos = [gen_path_proto_1d() for _ in range(dim)] path = [actualize_path_1d(protos[i],src[i],dst[i],steps) for i in range(dim)] return list(np.transpose(path)) def demo(coors: List[Coor]): from pynput.mouse import Controller mouse = Controller() for coor in coors: mouse.position = coor time.sleep(TIME.INTERVAL/TIME.MS_IN_S) if __name__ == “__main__”: coors = generate_movement(timedelta(seconds=30)) demo(coors)
“`