The importance of language, binary diffing and other "One Day" stories


This article was originally published on the INCIBE security blog.

The role of language in a profession is important. Any discipline generates its own technical language as it evolves and becomes more complex, and it is a mechanism by which professionals in the same field can share knowledge and interact concisely, accurately and unambiguously.

Disciplines can even have their own sub-disciplines, as is the case with IT, which encompasses so many different things that specializations become necessary, each with its own technical language.

Our discipline is information security, a field as complex as its jargon.

However, this specialization has grown in interest and complexity over a very short period of time, and that creates a problem: it has become a trend, which makes its proper development difficult. It is a field "where marketing prevails over technological development".

Given that expressions such as "computer attacks, hackers, information leaks or espionage" regularly appear in the media, the technical language of the profession is starting to be used indiscriminately and often, too often, incorrectly. Sometimes we even see invented terms that do not exist or, worse, are semantically incorrect.

Likewise, only the most popular and user-friendly techniques are advertised and made to stand out ("Become an expert using Metasploit", "Learn malware analysis in just one day"), instead of focusing on the fundamental knowledge about computing and processor architecture needed to understand what we are doing ("math is not fashionable and does not sell"). It is not unusual to find this kind of philosophy even in training courses.

All of this simply creates hot air, and professionals (especially those who are just starting out) run the risk of learning concepts incorrectly, or of hitting a plateau at the basics if they are unable to dig deeper without getting distracted by everything in the spotlight.

Other contributing factors include the excessive use of buzzwords and the recent trend of giving vulnerabilities catchy names (Shellshock, Heartbleed, VENOM...) in order to sell them as new discoveries or as exceptionally dangerous. In reality, they are no more relevant than any other vulnerability and are actually quite "simple".

In this article we will discuss 0day vulnerabilities as an example of how poor (or marketing-driven) use of jargon can be misleading.

Zero-day is one of the terms whose meaning is being abused, giving the public the false perception that this type of vulnerability is especially dangerous and that other "common" vulnerabilities are less dangerous just because we don't hear about them until they appear in a patch changelog. A common example is responsible disclosure programs in which the manufacturer patches silently, giving no details about the vulnerabilities fixed and hiding their criticality from the general public, a questionable practice.

The reality is that any vulnerability can be dangerous. Patched vulnerabilities (let's call them 1-day to illustrate how unnecessary the term is) are often even riskier than undisclosed ones, since once they are patched our attention to them drops, and a potential attacker knows that.

Unconsciously, we downplay their importance because they are not 0day.

Let's see an example of how an attacker can leverage this "lower priority" behaviour when patching is needed.

Binary diffing

Binary diffing (or program diffing) is a classic reverse engineering technique in which two files are compared at the binary or instruction level, looking for differences in code. In other words, one file is examined to see what has changed in it with respect to the other.

To achieve this and obtain meaningful information, several heuristics are used, such as finding differences between function names, blocks of instructions, function prologues, etc.
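
The crudest possible comparison works on raw bytes. The minimal sketch below (the function name count_byte_diffs is hypothetical, not taken from any of the tools mentioned here) counts the offsets at which two buffers differ, and it shows why real tools rely on higher-level heuristics: inserting a single instruction shifts every byte that follows it.

```c
#include <stddef.h>

/* Hypothetical helper: naive byte-level diff. Returns the number of offsets
   at which the two buffers differ; bytes beyond the end of the shorter
   buffer also count as differences. */
size_t count_byte_diffs(const unsigned char *a, size_t na,
                        const unsigned char *b, size_t nb)
{
    size_t common = na < nb ? na : nb;
    size_t diffs = 0;

    for (size_t i = 0; i < common; i++) {
        if (a[i] != b[i])
            diffs++;
    }
    /* The length difference is reported as changed bytes too. */
    diffs += na > nb ? na - nb : nb - na;
    return diffs;
}
```

For example, the x86 bytes 55 8B EC 5D C3 (push ebp; mov ebp, esp; pop ebp; ret) and the same sequence with a nop (90) prepended differ at all six offsets, although the logic is identical. Comparing functions, basic blocks and instructions avoids this problem.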

There are a few well-known tools for this, such as DarunGrim, zynamics BinDiff or Diaphora, an open source plugin for IDA Pro written by the researcher Joxean Koret, which we will use for the demonstration in this article.

This type of technique is useful for several tasks, such as discovering the new features or improvements a piece of software has received, or checking whether a new patch is compatible with our system to avoid "breaking it" on deployment. The most interesting application, however, is finding the vulnerabilities that were patched by comparing the original vulnerable file with the patched one.

This has legitimate applications, such as generating IDS or antivirus signatures, but it is also a widely used method for discovering and exploiting vulnerabilities whose details have not yet been made public.

To illustrate how this is done, we have created a simple program made up of two files, prog.exe and utils.dll.

This software takes the path to a text file as an argument and prints the file's first 8 characters to the screen.

Later, we receive a patch for this program containing a new version of utils.dll, whose changelog says:

"Version 1.2: A bug that caused the software to be unstable has been fixed."

This doesn't say much, and seems a bit opaque. What do they mean by "unstable"?

The only new file is utils.dll, so we know the fix was applied there. We will compare it with the original file to see what changed in its code.

First we load the original utils.dll into IDA and run the Diaphora script to generate an SQLite database with all the information it needs.


Now we repeat this process with the new utils.dll, this time also providing the first SQLite database so that the diffing can be performed.

Diff assembly

The script opens several views, and in "Best matches" we quickly notice that the function utils_1 matches in both files, but only by its name.

Best match

To refine the results, we run the script again, this time with the option "Ignore all function names", and we can see that the code of utils_1 doesn't match.
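
The effect of "Ignore all function names" can be illustrated with a toy matcher. This is a hypothetical simplification, not Diaphora's actual algorithm: each function is reduced to a name plus its raw bytes, code equality is tested with a 64-bit FNV-1a hash of those bytes, and a name-only match is reported only when the code differs and names are not being ignored. The names func_info, match_kind and classify_match are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record of a function as a diffing tool might store it. */
struct func_info {
    const char *name;
    const unsigned char *bytes;
    size_t size;
};

enum match_kind { MATCH_NONE, MATCH_NAME_ONLY, MATCH_IDENTICAL };

/* 64-bit FNV-1a hash of a function's bytes, standing in for a "bytes hash". */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Classifies how function a (old binary) matches function b (new binary). */
enum match_kind classify_match(const struct func_info *a,
                               const struct func_info *b,
                               int ignore_names)
{
    if (a->size == b->size &&
        fnv1a64(a->bytes, a->size) == fnv1a64(b->bytes, b->size))
        return MATCH_IDENTICAL;           /* same code, barring hash collisions */
    if (!ignore_names && strcmp(a->name, b->name) == 0)
        return MATCH_NAME_ONLY;           /* same name, different code */
    return MATCH_NONE;
}
```

With names taken into account, two versions of utils_1 whose code differs come out as a name-only match; with ignore_names set, the same pair no longer matches at all, which mirrors what the second run shows.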


It's time to compare the two functions and analyze in detail what has changed in the code where the patch was applied.

Diff assembly in a graph. Left: original function; right: patched code.

Diff assembly. Left: original code; right: patched code.

Since this function is fairly simple, before studying the modified code let's look at what the function does as a whole; this will help us understand the reasoning behind the fix.


utils_1 sets up the stack frame for two 8-byte arrays (var_8 and var_10), opens a file in read mode (probably the file we pass to the program as an argument) and then reads 8 bytes from it into var_8 with fread.

If we reconstruct the code in C:

Unpatched code
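
Since that reconstruction is only available as an image, here is a hypothetical textual version based solely on the disassembly description above and on the copy loop that is identified below as an inlined strcpy. The parameter name path, the return values, closing the file and the printf call are assumptions. The code is intentionally vulnerable and is shown only to make the bug readable.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reconstruction of the UNPATCHED utils_1 (intentionally vulnerable). */
int utils_1(const char *path)
{
    char var_10[8];
    char var_8[8];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;                     /* assumption: error handling not described */
    fread(var_8, 1, sizeof var_8, f);  /* reads up to 8 bytes and adds no '\0' */
    fclose(f);                         /* assumption */
    strcpy(var_10, var_8);             /* BUG: var_8 may not be null-terminated */
    printf("%s\n", var_10);            /* assumption: how the text is printed */
    return 0;
}
```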

Now let's check the patched code:

Patched code

We can see there is a call to strncpy_s to copy var_8 into var_10. From this we can already infer what the vulnerability was, but let's continue the analysis.
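
strncpy_s belongs to the optional bounds-checking interfaces (Annex K) of C11 and is mainly available in Microsoft's C runtime, so it may be missing on other platforms. This portable sketch of the patched function therefore swaps it for a small hypothetical helper, bounded_copy, with the same intent: copy at most the bytes fread actually read, never write past the destination and always null-terminate. The exact arguments passed to strncpy_s by the real patch are not visible here, and the other details carry the same assumptions as the unpatched reconstruction.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the bounded copy: copies at most srclen bytes,
   stops at the first '\0', never writes more than dstsz bytes and always
   null-terminates (truncating if needed). Requires dstsz > 0. */
void bounded_copy(char *dst, size_t dstsz, const char *src, size_t srclen)
{
    size_t i = 0;

    while (i + 1 < dstsz && i < srclen && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
}

/* Hypothetical reconstruction of the PATCHED utils_1. */
int utils_1_patched(const char *path)
{
    char var_10[8];
    char var_8[8];
    size_t n;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    n = fread(var_8, 1, sizeof var_8, f);          /* still not null-terminated */
    fclose(f);
    bounded_copy(var_10, sizeof var_10, var_8, n); /* reads only the n bytes read */
    printf("%s\n", var_10);
    return 0;
}
```

With an 8-byte destination, at most 7 characters fit before the terminator, so this version prints one character fewer than the file's first 8.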

Patched code

In the original code, in place of the call to strncpy_s, we find:

Unchanged code

The routine loops, copying one byte at a time until a null byte ('\0') is found.

That is the behaviour of strcpy (a function considered "unsafe"): if the source contains no null byte, the routine keeps copying bytes past the end of the buffers, overwriting the stack and causing a classic buffer overflow.
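
Written back in C, the inlined loop is the classic byte-by-byte copy below (the name inline_strcpy is hypothetical). Its only stopping condition is having copied a '\0'; nothing in it knows the size of either buffer.

```c
/* Hypothetical C equivalent of the inlined strcpy loop: copies one byte at a
   time and stops only after copying the terminating '\0'. No bounds checks. */
char *inline_strcpy(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}
```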

We have discovered that the patch fixes an overflow caused by an unchecked use of fread: it copies the first 8 bytes of the file into the array but doesn't add a null terminator, so the subsequent strcpy into the other array overflows it.
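
The root cause can be checked mechanically: after fread fills all 8 bytes of var_8 with file data, there is no '\0' inside the array, so any string function reading it runs past its end. A small hypothetical helper, not part of the analysed program, makes that explicit with memchr.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if a '\0' occurs within the first n bytes of buf,
   i.e. if buf can be treated as a C string without reading past its end. */
bool has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}
```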

Knowing this, writing an exploit is trivial: loading a crafted file with a carefully designed payload would allow us to execute our own code.

To confirm that the overflow is there, we simply feed prog.exe a file larger than 8 bytes.
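
An input like this can be produced with a few lines of C. The function name write_filler_file, the file name and the size are arbitrary, hypothetical choices; the filler byte 'A' is simply a non-zero value, so fread never leaves a terminator in var_8.

```c
#include <stdio.h>

/* Hypothetical helper: writes `count` copies of the byte 'A' (and no '\0')
   to `path`. Returns 0 on success, -1 on error. */
int write_filler_file(const char *path, size_t count)
{
    FILE *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (fputc('A', f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The resulting file is then passed to prog.exe as its only argument.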

Patched binary

Unpatched binary

This was an extremely simple example; in a real scenario we would face some additional difficulties:

  • More code differences here and there (compiler optimizations, ...)
  • Bug fixes unrelated to the vulnerability
  • Obfuscated code
  • Anti-disassembly or anti-diffing techniques
  • ...

Also, common protections such as DEP, ASLR or StackGuard (stack canaries) were disabled for this experiment.
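
To see why a stack canary would have caught this overflow, here is a toy model, not how any compiler implements it: an 8-byte buffer followed by a fixed 4-byte canary inside one array, an unbounded-style copy into the buffer, and a check of the canary afterwards. Real canaries use a random value, sit between the local buffers and the saved return address, and abort the process on mismatch. The names copy_and_check and CANARY, and the canary value, are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8
#define CANARY_SIZE 4

/* Toy canary value; real implementations randomize it per process. */
static const unsigned char CANARY[CANARY_SIZE] = {0xDE, 0xAD, 0xBE, 0xEF};

/* Toy model of a frame: the buffer is followed by the canary. Copies `len`
   bytes starting at the buffer (clamped so the copy stays inside the model)
   and returns true if the canary survived. */
bool copy_and_check(const unsigned char *src, size_t len)
{
    unsigned char frame[BUF_SIZE + CANARY_SIZE];

    memcpy(frame + BUF_SIZE, CANARY, CANARY_SIZE);   /* "prologue": place canary */
    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(frame, src, len);                         /* the overflowing copy */
    return memcmp(frame + BUF_SIZE, CANARY, CANARY_SIZE) == 0; /* "epilogue" check */
}
```

Copying 8 bytes leaves the canary intact, while copying 12 bytes overwrites it and the check fails, which is the point at which a real implementation would terminate the program before the corrupted return address is used.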

In any case, for an experienced analyst, finding the vulnerability and writing an exploit for it would be a matter of hours.

Attackers can leverage the time window between the release of a patch and its actual deployment to reverse engineer it, create an exploit and attack the target. Sometimes that window is extended for multiple reasons (poor patching policies, logistics, ...), including simply not considering these vulnerabilities critical.