Why most audits miss what actually breaks systems
Auditing complex systems isn’t the same thing as debugging.
Debugging is chasing a known failure. Auditing doesn’t have a single target. You’re hunting for wrong assumptions, unintended behaviors, and attack paths that only show up once the system meets reality.
That’s where a lot of audits fall short.
Most real failures don’t come from some exotic “gotcha” vulnerability or a galaxy-brain exploit. They come from the mismatch between how the system was supposed to behave and how it actually behaves under stress, misuse, or adversarial conditions.
At Hypotenuse Labs, we treat audits as structured skepticism. Tools help, but understanding matters more. Below is the methodology we use to surface real risk in complex, production-grade systems.
Suggested Procedure
1. List Expected Behaviors
One nice property of DeFi is that there’s a lot of reuse—high-level behaviors, concepts, and often the code itself. Because of that, it’s relatively rare to see behaviors that are truly foreign; usually there are close analogs elsewhere. As auditors, we should exploit that: prior attacks on similar behaviors can often be adapted to the protocol in front of us. The first step is capturing that intuition.
Before doing anything—even before reading documentation or code in detail—do a high-level scan of the protocol’s contracts. For every function, RECORD the following using only the contract name and function name as context:
The expected high-level behavior of the function in English
Who you expect to have access to the function (e.g., only admins)
A list of corner cases or likely mistakes that might exist and should be checked
Note: for protocols like MakerDAO where the code may be intentionally obfuscated, swap steps 1 and 2.
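The per-function notes above are easiest to reconcile later if they live in a consistent structure. A minimal sketch in Python (the field names and the `Vault.withdraw` example are illustrative, not prescriptive):

```python
from dataclasses import dataclass, field

@dataclass
class FunctionNote:
    """Pre-reading expectations for one contract function (step 1)."""
    contract: str                  # e.g. "Vault"
    function: str                  # e.g. "withdraw"
    expected_behavior: str         # plain-English guess from the name alone
    expected_access: str           # e.g. "only admins", "any user"
    corner_cases: list[str] = field(default_factory=list)

notes = [
    FunctionNote(
        contract="Vault",
        function="withdraw",
        expected_behavior="Burns shares and sends the underlying asset to the caller",
        expected_access="any user with a balance",
        corner_cases=[
            "zero-amount withdraw",
            "reentrancy via token callback",
            "rounding in share-to-asset conversion",
        ],
    ),
]
```

Keeping expectations in separate fields (rather than freeform prose) pays off in steps 4 and 5, where actual behavior gets recorded alongside these fields without overwriting them.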
2. Consult Documentation and Tests
After you’ve recorded what you expect to see, read the provided documentation and inspect the test suite. The goal is to treat the docs as a spec that captures the developer’s intent. If you understand that intent clearly, you can look for deviations—which often map directly to bugs. Prioritize behavioral descriptions, assumptions, access controls, invariants, and expected user flows.
As you read, augment your existing notes with:
A description of the developer’s intention (don’t overwrite your original expectations—keep both)
Any explicit or implicit access controls
Potential invariants that should hold
Also flag any parts of the protocol with weak test coverage—those areas are more likely to hide issues.
While you should incorporate intent, never assume the developer is “right.” We’re the experts here, and our judgment takes precedence. After your documentation pass, revisit your notes and identify where your expectations diverged from the developer’s. This matters because sometimes the design itself is what creates the exploit surface (Beanstalk is a good example). In particular, ask yourself:
Where does the developer’s expectation diverge from my own?
Does the developer’s intended design introduce security implications?
3. Consult Tools
Once you have a working model of how the protocol is supposed to behave, bring in tools. Tools can pinpoint concrete locations worth investigating and help validate behavior. I find it useful to split tooling into two buckets:
Common Vulnerability Detection
Tools: Slither
These tools flag specific code locations that may be exploitable. A report doesn’t necessarily mean a real vuln—it may be a false positive. Treat tool output as hypotheses: write them down and validate later (in step 6).
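Turning tool output into a hypothesis list can be as simple as parsing the report. As a sketch: Slither can emit JSON via `slither . --json <file>`, and each detector result carries a check name, impact, and confidence (verify the schema against your installed version; the field names below match recent releases):

```python
import json

# Run first:  slither . --json slither-report.json
# Each detector result becomes a hypothesis to confirm or rule out later.
def load_hypotheses(report_path: str) -> list[dict]:
    with open(report_path) as f:
        report = json.load(f)
    hypotheses = []
    for finding in report.get("results", {}).get("detectors", []):
        hypotheses.append({
            "check": finding["check"],        # e.g. "reentrancy-eth"
            "impact": finding["impact"],      # High / Medium / Low / Informational
            "confidence": finding["confidence"],
            "summary": finding["description"].strip(),
            "status": "unvalidated",          # updated during validation
        })
    # Surface high-impact reports first when validating.
    order = {"High": 0, "Medium": 1, "Low": 2, "Informational": 3}
    hypotheses.sort(key=lambda h: order.get(h["impact"], 4))
    return hypotheses
```

The point is not the parsing itself but the discipline: every tool report gets written down with an explicit "unvalidated" status, so nothing silently becomes an accepted fact.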
Behavioral Checking
Tools: property-based fuzzers (e.g., Echidna) and formal verifiers (e.g., the Certora Prover)
These tools require specifications. Build those specs by translating your behavioral descriptions and access control rules into machine-checkable properties. Then use the tools to test whether the properties actually hold. Because these runs tend to be slower than vulnerability scanners: (1) prioritize specs by importance, and (2) run them in parallel with the rest of the process.
If you get a counterexample, don’t assume the code is wrong—the spec might be. You need to understand the counterexample, then either fix the spec and rerun, or decide whether the spec should be refined. As you iterate, keep notes on how the code deviates from the original spec; those deviations can become bugs in a broader context (e.g., an access-control quirk that becomes a DoS vector elsewhere). I’ve found this refinement loop extremely helpful for building a real mental model of the protocol.
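To make the spec-and-counterexample loop concrete: in practice you would point a fuzzer or prover at the contracts, but the shape of the loop is visible even in plain Python against a deliberately simplified model (everything below—`ToyVault`, the conservation invariant, the fuzz driver—is a toy assumption, not any real protocol):

```python
import random

class ToyVault:
    """Deliberately simplified model: shares are minted 1:1 with deposits."""
    def __init__(self):
        self.assets = 0
        self.shares = {}

    def deposit(self, user: str, amount: int) -> None:
        self.assets += amount
        self.shares[user] = self.shares.get(user, 0) + amount

    def withdraw(self, user: str, amount: int) -> None:
        if self.shares.get(user, 0) >= amount:
            self.shares[user] -= amount
            self.assets -= amount

def invariant_conservation(v: ToyVault) -> bool:
    # Spec: total assets always equal the sum of outstanding shares.
    return v.assets == sum(v.shares.values())

def fuzz(spec, steps: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)
    v = ToyVault()
    for _ in range(steps):
        user = rng.choice(["alice", "bob"])
        amount = rng.randint(0, 100)
        rng.choice([v.deposit, v.withdraw])(user, amount)
        assert spec(v), f"counterexample: assets={v.assets}, shares={v.shares}"

fuzz(invariant_conservation)   # no assertion error: the invariant holds on this model
```

When a real run produces a counterexample, the question from the paragraph above applies: is the code wrong, or did the spec fail to capture intended behavior (say, a fee that legitimately breaks strict conservation)?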
4. Understand the Code
By now you should have an intended-behavior model you can compare against the implementation. To do that comparison, you have to actually read and understand the code. There are two general approaches: bottom-up and top-down. Bottom-up means starting at the simplest “leaf” functions and moving up the call chain. Top-down means starting at the highest-level entry points and drilling down into dependencies. I prefer top-down, so that’s what I’ll outline.
Top-Down Approach
I like top-down because it makes the full picture clearer, including the assumptions callers place on callees. Starting from the most important functions, do the following:
Read the code and infer the behavior it actually implements. When you infer behavior, don’t fall into the trap of seeing what you expect to see. Actually READ the code and UNDERSTAND it. Think like an interpreter: what happens for any input? If you’re not used to this, quiz yourself by walking through a few concrete test cases to confirm your understanding. When you hit a function call, substitute your behavioral description of that function: if you’ve already analyzed it, use the actual description; if you haven’t, use your expected description (from steps 1 and 2).
RECORD the actual behavior of the function. Do not overwrite what you wrote in steps 1 and 2—keep this separate.
RECORD any deviations between actual behavior and expected behavior.
RECORD any assumptions (or invariants) the current function is making about callees that you haven’t documented yet.
RECORD any vulnerability patterns you notice while reading. Just mark them for now—you’ll return to them later.
RECORD any invariants that hold for the current function.
RECORD who can actually access the function.
After doing this for a single function, pick a dependency and analyze it next. Repeat until you’ve covered all of the current function’s dependencies (its descendants in the call graph). Then move on to an unvisited entry point until everything has been inspected.
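The reading order described above is a depth-first walk over the call graph, visiting each function once with callers before callees. A sketch (the call graph and function names are hypothetical):

```python
# Hypothetical call graph: each function maps to its callees.
call_graph = {
    "deposit": ["_mintShares", "_transferIn"],
    "withdraw": ["_burnShares", "_transferOut"],
    "_mintShares": [], "_burnShares": [],
    "_transferIn": [], "_transferOut": [],
}

def top_down_order(graph: dict[str, list[str]], entry_points: list[str]) -> list[str]:
    """Visit each function once, callers before their callees (step 4)."""
    seen, order = set(), []
    def visit(fn: str) -> None:
        if fn in seen:
            return
        seen.add(fn)
        order.append(fn)        # record actual behavior, deviations, invariants here
        for callee in graph.get(fn, []):
            visit(callee)
    for entry in entry_points:  # start from the most important entry points
        visit(entry)
    return order

print(top_down_order(call_graph, ["withdraw", "deposit"]))
# ['withdraw', '_burnShares', '_transferOut', 'deposit', '_mintShares', '_transferIn']
```

Ordering entry points by importance means that if time runs short, the best-covered code is the code that matters most.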
5. Reconcile
Step 4 produces a lot of recorded information—now you need to reconcile it. You can reconcile as you go, but it’s often better to step back and come back later with fresh eyes. I recommend finishing your code pass first, then reconciling each function in the same order you visited them.
During reconciliation:
Determine whether any corner cases from step 1 actually apply.
Determine whether any anticipated invariants (from steps 2 and 4) fail.
Determine whether the function is accessible by someone who shouldn’t be, by comparing anticipated access controls to the actual access controls (from steps 1, 2, and 4).
Determine whether differences between actual and expected behavior introduce a security vulnerability.
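The access-control comparison in particular reduces to a diff between the notes from steps 1–2 and step 4. A minimal sketch (the function names and access descriptions are illustrative):

```python
# Illustrative notes: anticipated access (steps 1-2) vs. actual access (step 4).
expected_access = {
    "setFee": "only admin",
    "pause": "only admin",
    "withdraw": "any user with a balance",
}
actual_access = {
    "setFee": "only admin",
    "pause": "any caller",       # e.g. a missing onlyOwner modifier
    "withdraw": "any user with a balance",
}

def access_mismatches(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Functions whose real access control differs from what was anticipated."""
    return [fn for fn in expected
            if fn in actual and expected[fn] != actual[fn]]

print(access_mismatches(expected_access, actual_access))   # ['pause']
```

Every mismatch this surfaces is either a finding or a correction to your own model—both outcomes are progress.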
6. Investigate Common Vulnerabilities
At this point you should have a complete list of potential “common” vulns—some surfaced by tools, some found manually. Now go through that list and validate each one. Depending on what you ran in step 3, expect plenty of false positives. The way through is prioritization: weigh severity against the likelihood the report is real (based on experience). Try to rule out as many as possible quickly. This step can feel overwhelming, so do it in parallel with step 7.
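The severity-versus-likelihood weighing can be made explicit with a rough score. A sketch (the weights, findings, and likelihood estimates are made up—likelihood in particular is a judgment call, not something a tool reports):

```python
# Triage sketch: rank reports by severity weight times the auditor's
# estimated probability that the finding is real.
SEVERITY_WEIGHT = {"High": 3, "Medium": 2, "Low": 1, "Informational": 0}

findings = [
    {"check": "reentrancy-eth", "impact": "High", "likelihood": 0.4},
    {"check": "timestamp", "impact": "Low", "likelihood": 0.9},
    {"check": "arbitrary-send", "impact": "High", "likelihood": 0.7},
]

def triage_order(reports: list[dict]) -> list[dict]:
    return sorted(reports,
                  key=lambda r: SEVERITY_WEIGHT[r["impact"]] * r["likelihood"],
                  reverse=True)

for r in triage_order(findings):
    print(r["check"])   # arbitrary-send, then reentrancy-eth, then timestamp
```

The numbers are crude, but making the ranking explicit keeps a long false-positive-heavy list from swallowing the time that high-severity candidates deserve.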
7. Get Creative, Think Like an Attacker
This last step is also the least mechanical.
Once you understand how the system is intended to work, how it actually works, and where assumptions start to crack, what’s left is creativity. Great auditors are great at breaking things.
This phase is about fully adopting an adversarial mindset:
If you were attacking this system, where would you begin?
Can you disrupt a single user? Many users?
Can you extract value or cause irreversible damage?
Which assumptions would you exploit rather than bypass?
These attacks are rarely isolated or obvious. They typically emerge from unexpected interactions between otherwise “correct” behaviors. Creativity grounded in deep systemic understanding is why experience matters.
Why methodology matters
Audit reports are easy to crank out. Confidence is harder to earn.
A long findings list doesn’t imply safety, and “passing an audit” doesn’t mean a system is resilient. What matters is whether the audit process was actually capable of uncovering the issues that matter.
This methodology reflects how we audit complex systems at Hypotenuse Labs: senior-led, assumption-driven, and grounded in real-world behavior rather than checklist compliance.
If you’re building systems where failure is expensive, methodology isn’t a detail. It’s the difference between surface-level assurance and genuine confidence.
Authored by Xiangan He, Senior Software Developer @ Hypotenuse Labs