Defense in Depth, Layered Security and Mitigation¶

There is no such thing as bug-free software. Only an insufficiently tested one.

Layers¶

Let's begin with a piece of code that has been successfully reviewed by the internal quality assurance team and has even passed an external code audit, although the reviewers suggested one change.

int32_t marker_data = 0;
char marker[17] = "MagickMarkerBytes";
int marker_version = 1;

The auditing company had a note on this line: they would prefer to reserve space for the '\0' byte in the array, even though C99 permits this construction and considers it valid (the compiler simply does not store the '\0' in the marker array). However, as this was an embedded design, the development team considered the variant that consumes less memory the better one. Moreover, the in-house static analyzer was silent about the line. (Note: for those who doubt, this really is valid C code.)

The auditing company also had one more note, turned into a patch:

diff -rupN a/payload_handler.c b/payload_handler.c
--- a/payload_handler.c
+++ b/payload_handler.c
@@ -38,7 +38,7 @@
         marker_data = *(int*)(payload + 196);
     }

-    if (strcmp(marker, payload + 200) == 0) {
+    if (strncmp(marker, payload + 200, 18) == 0) {
         ready = true;
     } else {
         // not a marker tag - already authenticated command is

Now, the nasty thing is that far too many things about these lines went undiscovered. Can you spot them all?

  • The unterminated marker is used in strcmp(). In a packed variable layout (recall the pressure to save memory?), marker may be followed directly by marker_data, which might not contain a zero byte - effectively letting strcmp() walk far past the array and into the payload (a safer variant is sketched after this list).
  • The code casts a pointer into an arbitrary payload to a platform-dependent type.
  • The in-house static analyzer was implicitly trusted as the only static analyzer, and it stayed silent about the strcmp + fixed-size issue.
  • A rare corner case of the C language distracted all the auditors.
  • The text refers to an embedded design and pushes for memory compactness. Usually the difference between an SoC with sufficient memory and one without is about half a dollar; times a million manufactured pieces, that is about half a million dollars.
  • The comment at the end of the diff suggests that potentially more dangerous commands are the default choice.
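
For illustration, here is a minimal sketch of a safer variant, assuming the handler also receives the payload length (the name payload_len and both helper functions are hypothetical). Note that even the proposed strncmp(..., 18) patch still reads one byte past the 17-byte marker array; an explicit memcmp() with the exact length avoids relying on a terminating zero altogether:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MARKER     "MagickMarkerBytes"
#define MARKER_LEN (sizeof(MARKER) - 1)   /* 17 bytes, terminating '\0' not counted */

/* Hypothetical helper: true only when the payload is long enough and the
 * marker bytes match exactly - no dependence on a terminating zero. */
static bool has_marker(const uint8_t *payload, size_t payload_len)
{
    if (payload_len < 200 + MARKER_LEN)
        return false;                      /* bounds check before touching any byte */
    return memcmp(payload + 200, MARKER, MARKER_LEN) == 0;
}

/* Hypothetical helper: read the 32-bit field without casting to a
 * platform-dependent int - memcpy() avoids alignment traps. */
static uint32_t read_marker_data(const uint8_t *payload)
{
    uint32_t value;
    memcpy(&value, payload + 196, sizeof value);   /* byte order still has to be agreed on */
    return value;
}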

Vocabulary - failure safety vs. failure security¶

  • Safe - does not endanger lives or property.
  • Secure - does not leak information, prevents unauthorized use, security is not compromised.

These are nearly orthogonal qualities, and it is desirable to develop both at the same time. The behavior during failure is critical, as that is the point where your system is most vulnerable and most dangerous to the rest of the world. It is therefore desirable to keep the system under control even during a failure.

if (check_access() == EDENIED) {
    die("access denied");
}
// fail-open: any other error (consider ENOMEM) slips through as if access were granted

// vs.

if (check_access() != OK) {
    die("access denied");
}
// fail-closed: anything but an explicit success is treated as a denial

Error-handling paths are among the most exposed targets in a program. There are two main reasons for this:

  1. Technical - error handling tends to lengthen the code and thus complicates auditing.
  2. Psychological - error handling is boring; once an error has been detected, it is natural to feel that the vulnerability has been sanitized and to let one's guard down. Plus, you often want to provide as much information about the error as possible to help resolve it - yet that might not be the correct decision. Consider the following error messages:
  1. "Login unknown"
  2. "Wrong password"
  3. "Login or password incorrect" (returned fast)
  4. "Login or password incorrect" (returned with a slight delay)

What side channels are there?
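
One classic countermeasure against the timing channel is a comparison whose duration does not depend on where the inputs differ, combined with a single undifferentiated error message. A minimal sketch (the helper name ct_equal is made up):

#include <stddef.h>
#include <stdint.h>

/* Constant-time comparison: no early exit, so the response time does not
 * reveal at which byte the two buffers diverge. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= a[i] ^ b[i];
    return diff == 0;
}

/* The caller then reports the same "Login or password incorrect" message
 * whether the login lookup or the password check failed. */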

Layered design principles¶

Generally, we would like the systems to be safe and secure. This means:

  • no leaks of private data,
  • no unauthorized access,
  • no outages,
  • fair use.

Still, even if we address those issues, seemingly safe code can lead to catastrophe. History simply shows that failures are inevitable; all we can do is lower the probability of their occurrence.

The layered design splits this into several ideas:

  1. Split the design into separate layers, fortify and harden each layer.
  2. Do not trust or assume that authentication or validation has already been done.
  3. Attempt to catch failures in one layer by the other one.
  4. Suspect all extraordinary and unusual behavior.
  5. Keep the layers simple, define them both inside and outside.
  6. Prefer established libraries and tools, complement them with in-house solutions.
  7. Mitigate the impact of the attack wherever possible.
  8. Review; keep the code KISS.

Even though this cannot make your code perfect and impenetrable, it will slow the attacker down. Early detection gives you time to respond to the attack, identify the attack vector and patch. The attacker might also pick an easier target; if they lose interest, you win as well.

Single-point-of-failure¶

The buffer overflow above can lead either to exploitation of the vulnerability or - if built with a stack sanitizer - to a crash. A crash is something that can be logged: the process core dumped, the event log from the last few minutes saved, the crash reconstructed, the vulnerability found, the attacker neutralized. All of that thanks to good design and - especially - logging. It does not help you much if the logger is the first thing you lose.

As a result, you might want to create a backup logger that is independent of the main one. Once the backup logger is online, you have successfully removed one single point of failure. The fewer there are, the better for the stability of your system.
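
A minimal sketch of the idea, with hypothetical handles opened at startup (for example a log collector socket as the primary and a local append-only file as the backup): the primary logger is tried first, and its failure itself becomes an event for the independent backup.

#include <stdio.h>

static FILE *primary_log;   /* hypothetical: opened towards the main logger */
static FILE *backup_log;    /* hypothetical: opened towards an independent destination */

static void log_event(const char *msg)
{
    /* Try the primary logger; if it fails, the failure itself is worth
     * recording in the independent backup. */
    if (primary_log == NULL ||
        fprintf(primary_log, "%s\n", msg) < 0 || fflush(primary_log) != 0) {
        if (backup_log != NULL) {
            fprintf(backup_log, "primary logger unavailable: %s\n", msg);
            fflush(backup_log);
        }
    }
}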

Some designs are full of single points of failure, and it might be impossible to remove them. For example, if a user decides to put a sticker with their password on their monitor, you can't really expect them to remove it. All you can do is reduce the reason they felt the need to write the password on a sticker in the first place.

Review¶

Reviews are the basic tool for maintaining code culture in a team. To be effective, though, they require the reviewer to be at least at the same experience level as the writer. The main purpose is to identify unreadable code and minimize it. That does not mean a less experienced reviewer can't have their bright moments; after all, if the code is incomprehensible, the less experienced reviewer might simply point that out, while the experienced one could subconsciously compensate for it.

This idea can be scaled up: a closed-source project can be opened so that it can be observed by many people at the same time. It is a big step, though; many companies consider it a high risk, aware of insufficient internal processes and afraid of losing their know-how. And even after publishing the code, an error might still remain present for many years (Shellshock, various SSL-related issues like Heartbleed).

Obscurity¶

But closing the source and obscuring it is not the solution either; it just complicates security audits and internal reviews. Quite the contrary: while it complicates the defense, the attacker merely has to perform reverse engineering, core inspection and disassembly of the known code (which they probably would have done anyway).

You might (and will) find arguments about tamper-proof devices with secured code and inputs, yet beware - no device is tamper-proof if the attacker has a sufficient number of pieces to experiment on. Thus a middle ground is often used: do not open the source, open the design.

The design can then undergo independent reviews that catch protocol flaws, while the cost of implementation still protects your investment. Plus, it can increase the net value of your technology, unlike the opposite approach.

There are several exemplary design failures:

  • Every (offline) DRM protection so far
  • GSM (explained for example here)
  • Car remote keys (quite a few of them, in fact, and still vulnerable through protocol flaws)

Mitigation by careful design¶

You should stick to the best practices:

  • Use well-established code, especially when it comes to cryptographic libraries. These are often (and the good ones especially) subject to independent side-channel analysis and tend to be designed and implemented by cryptography experts.
  • Review the code, even though the value of a review diminishes over time.
  • Audit your code, be suspicious of it.
  • Try to avoid corner cases of the language; they complicate audits and reviews.
  • No spared half-dollar is worth your company's name; use sufficient resources.
  • Don't consider your device impenetrable; minimize the damage through responsibility isolation and combine approaches and tools.
  • There is no truth behind // Assume already validated.

A personal side note: anybody claiming that something wireless is safe and secure either has a poor imagination or works as an insurance salesman.

Mitigation in Depth¶

Gameplay rules:

  1. Bugs are inevitable. Deal with it.
  2. Make the exploitation expensive.
  3. Escalate the costs of escalations.
  4. If you have the budget: pull in honeypots.

Make the exploitation expensive¶

This rule essentially covers the idea that even if the attacker penetrates the system, the resources they had to burn (for example on brute-forcing a MAC) do not pay off. And even if the system is penetrated, the attacker just hits another wall: for example, even if they manage to find a buffer overflow, they hit a memory fence.

Escalate the costs of escalations¶

Each layer in the layered security model should add to the total security, not bypass the other layers. This is particularly relevant for layers that add capabilities. Consider a router that has a reasonably working SSH but uses old encryption. Even if you get in through it, you shouldn't find tools lying around that can bypass the SSH entirely (for example a telnet server). If you do, there obviously was no layer jailing the faulty SSH away from the rest of the system.

Stick to the principle of least privilege: if a network service does not need to fork new processes, it shouldn't even have the capability to do so (seccomp()). The principle of least privilege provides you with one more guard rule - write permission should be exclusive with execute permission.
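
As a sketch of that rule using libseccomp (assuming the library is available; the default-allow policy here is only for brevity, a real filter would rather default to deny):

#include <errno.h>
#include <seccomp.h>   /* libseccomp, link with -lseccomp */

/* Deny this process the ability to create new processes; everything else
 * stays allowed for the sake of a short example. */
static int drop_fork_capability(void)
{
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL)
        return -1;

    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fork), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(vfork), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(clone), 0);

    int rc = seccomp_load(ctx);   /* the filter applies from now on */
    seccomp_release(ctx);
    return rc;
}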

This applies to the principle of minimal knowledge as well. Consider a web application that allows you to send free SMS messages as well as make payments. If it checks the account balance before authentication is complete, you can easily devise an attack with an expensive SMS: receiving HTTP status code 402 instead of 403 first effectively turns the site into a username oraculum.

(An oraculum is a thought device that allows one to gain previously unknown information through a side channel.)
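
The fix is simply a strict ordering of the checks, so that no account-specific status can leak before authentication has succeeded. A minimal sketch with hypothetical helpers standing in for the real application logic:

#include <stdbool.h>

/* Hypothetical checks - stand-ins for the real application logic. */
bool authenticate(const char *login, const char *password);
bool has_sufficient_balance(const char *login, int cost_cents);
int  send_sms(const char *login, const char *text);

/* Authenticate first, only then touch account-specific state, so an
 * unauthenticated probe always sees 403 and never a revealing 402. */
int handle_send_sms(const char *login, const char *password,
                    const char *text, int cost_cents)
{
    if (!authenticate(login, password))
        return 403;    /* same answer for an unknown login and a wrong password */

    if (!has_sufficient_balance(login, cost_cents))
        return 402;    /* only reachable after authentication succeeded */

    return send_sms(login, text) == 0 ? 200 : 500;
}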

Use what you already have:

  • You can use the linker to defend yourself - randomize the memory regions, use signatures.
  • The compiler? Fortify source, use trap sleds.
  • Memory allocation? Canaries around variables, guard and hole pages (a sketch follows this list).
  • Secure randomness as the default, not as an explicit choice.
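
To give the guard-page bullet one concrete shape, here is a minimal POSIX sketch: the buffer is placed flush against a page mapped with no permissions, so an overflow past its end traps immediately instead of silently overwriting a neighbour.

#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate 'len' usable bytes followed by one inaccessible guard page.
 * Any write past the end of the buffer hits PROT_NONE memory and faults. */
static void *alloc_with_guard(size_t len)
{
    size_t page   = (size_t)sysconf(_SC_PAGESIZE);
    size_t usable = (len + page - 1) / page * page;   /* round up to whole pages */

    char *base = mmap(NULL, usable + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    if (mprotect(base + usable, page, PROT_NONE) != 0) {   /* the guard page */
        munmap(base, usable + page);
        return NULL;
    }
    /* A real allocator would remember 'base' for munmap() and mind alignment;
     * here the object is simply placed flush against the guard page. */
    return base + (usable - len);
}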

Minimize the Trusted Computing Base¶

The trusted computing base should be only a small amount of code that requires elevated privileges; it is capable of overriding the security subsystems. If an application is sufficiently sandboxed, it needs no elevated privilege and thus does not belong to the TCB. Unfortunately, the TCB tends to be very large in practice. For comparison, study the differences between the usual Linux security model and Android's.

The goal of privilege separation is to minimize the TCB by constantly re-evaluating the need for elevated privileges and, once they are no longer needed, dropping to a common account. This is achieved by splitting the process into multiple processes, where the auxiliary ones are no longer privileged. Thus, privilege separation is a multi-process, IPC-based protocol.

Example of privilege separation in Linux web server¶

  1. Kernel allows binding on well-known ports (below 1024) only to superuser.
  2. Only superuser may read private keys.
  3. If these two operations are the only ones where root is required, the web server can subsequently chroot to a subdirectory within the filesystem and drop to an auxiliary account (a sketch follows this list).
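
A minimal sketch of that start-up sequence (error handling trimmed; the unprivileged uid/gid values and the chroot path are placeholders):

#include <arpa/inet.h>
#include <grp.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Start as root: bind the privileged port and read the private key,
 * then give the privileges up before serving any request. */
int start_server(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(443),
                                .sin_addr.s_addr = INADDR_ANY };
    bind(sock, (struct sockaddr *)&addr, sizeof addr);   /* needs root: port < 1024 */
    listen(sock, 128);

    /* ... load the private key here, while still root ... */

    chroot("/var/empty");    /* placeholder path: confine the filesystem view */
    chdir("/");

    setgroups(0, NULL);      /* drop supplementary groups first */
    setgid(33);              /* placeholder unprivileged gid */
    setuid(33);              /* placeholder unprivileged uid; must come last */

    return sock;             /* serve requests with no root privileges left */
}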

Honeypots & isolation¶

Honeypotting is a technique widely used in network security; it essentially helps you observe and steal the attacker's tools. Has their previous attack failed? Prepare (firewall) rules and put them into an isolated environment where you log everything.

Although isolation does not solve everything, it definitely helps you stop the propagation of a fault and contain it in an environment that you can destroy and restart. This does not apply only to honeypots; you can isolate individual subsystems in a non-honeypot setup as well. And although there are attacks that effectively break isolation, doing so is still not easy. (Recall Rowhammer and Spectre?)

There are several approaches:

  1. Physical isolation (the machine cannot expose the data it doesn't have).
  2. Containers, VMs (process isolation).
  3. Exclusive memory mapping, locking.
  4. Sandboxing - further restrict untrusted code.
    • Use auxiliary processes (e.g. Chrome renderers) - the kernel does the cleanup of a failed process.
    • SELinux/AppArmor

Yet sometimes you need to isolate subsystems that must communicate with each other - for example, a WordPress CMS and its database. Under normal circumstances this would not be possible, as WordPress needs write access to the database. Still, you can apply mitigation: for example, run two instances of WordPress against the same database, one with write access hidden behind an authenticated service, the other for common access. Moreover, with the recent spread of overlay filesystems, you can even go as far as having a separate filesystem for each Apache instance. The only question is when and where you bow to the law of diminishing returns.

KISS¶

KISS is a philosophy that can be applied as mitigation as well. Keeping your design simple and avoiding overthought clever hacks allows for more thorough code review and auditing, and makes it possible to imagine the impact of side effects on security. This applies to both the code and the overall design.

Plus, simpler code indirectly leads to more readable code and thus fewer bugs (especially thanks to code review). As a result, this also means that you need to restrain yourself from adding unnecessary features, as each one provides more room for the attacker.

Sanitize channels¶

Never trust your inputs, as they originate outside your system. This means you should validate and sanitize inputs at every entry into your application. Yet this does not mean that your internal inputs are safe or have already been validated. Assume that the attacker has access to your device and that they might have used a bug in the input channel (recall the original example?). Far too many faults propagate along trusted channels.
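
As a sketch of what sanitizing on every entry can look like in practice (the length limit and the allowed character set are made up for illustration; the same check is worth repeating at internal boundaries rather than trusted once):

#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Reject anything outside the expected shape *before* it travels further.
 * The 64-byte limit and the "printable, no quoting characters" rule are
 * illustrative, not a universal policy. */
static bool sanitize_command_name(const char *input, size_t len)
{
    if (input == NULL || len == 0 || len > 64)
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)input[i];
        if (!isprint(c) || c == ';' || c == '\'' || c == '"')
            return false;   /* no control bytes, no quoting characters */
    }
    return true;            /* and the next layer should check again */
}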

Misuse of language, tools and ignorance¶

Different languages have different memory policies. This has an impact on how safe the language is and how error-prone it is. C has the nice property of pointer arithmetic, which is also its caveat. Given the external parallelism in the language, it is very easy to create unsafe - or even dangerous - code.

In that respect, one might order the languages as follows: C, C++ (correctly used), Java (VM + memory management). And then there are Ada and Rust.

Keep yourself educated on both current exploits and mitigations. Study the codebase you use, study the security record of the components you use - even the hardware (AMD vs. Intel vs. Spectre). Learn from your own mistakes, learn from other people's mistakes. Try hacking into a device from time to time, including your own. Use elevator rides to read security news. Follow well-known security patterns (among other things).

There is no such thing as bug-free software.

Only an insufficiently tested one.

Until there is a new,

insufficiently tested release.