A while ago I played around with Android applications, trying to find new ways to make my life easier as a vulnerability researcher. I stumbled upon Caleb Fenton’s amazing Simplify. This neat framework executes an application’s code and tries to make it more approachable for humans, which helps researchers like me understand obfuscated code. Going through Simplify’s code was a bit of a shock, as I wasn’t a big fan of Java frameworks, let alone code analysis done in Java.
So, being the stubborn fella I am, I tried to implement my own version of smali code analysis in Python. Dalvik, Android’s virtual machine for running Java-based apps, executes Dalvik bytecode, and smali is the human-readable assembly language for that bytecode.
The following regular expressions extract the data I need from the relevant smali directives and opcodes.
For class declaration extraction:
This yields a single capture group: the name of the class.
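To make this concrete, here is a minimal sketch of such a pattern in Python. It is a hypothetical illustration rather than the regex from parser.py: `CLASS_RE` and `extract_class_name` are names invented here, and the sketch assumes baksmali-style output where `.class` starts the line and the class descriptor ends with `;`.

```python
import re
from typing import Optional

# Hypothetical pattern, not necessarily the regex used in parser.py.
# Matches a declaration such as ".class public final Lcom/example/Foo;":
# the directive, any access flags, then the class descriptor (group 1).
CLASS_RE = re.compile(r"^\.class\s+(?:[\w-]+\s+)*(L[^;\s]+;)", re.MULTILINE)


def extract_class_name(smali_source: str) -> Optional[str]:
    """Return the class descriptor declared in a smali file, or None."""
    match = CLASS_RE.search(smali_source)
    return match.group(1) if match else None


sample = ".class public final Lcom/example/Foo;\n.super Ljava/lang/Object;\n"
print(extract_class_name(sample))
```

Keeping the full `L...;` descriptor instead of converting it to a dotted Java name makes the result directly comparable with the class types that show up later in invocation lines.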
For method declaration extraction:
Quite a fancy regex, but for every method declared inside a class it captures, in a single match, the method’s name, its parameter types, its return type, and the method’s body.
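A hypothetical pattern of this kind (again a sketch, not the one in parser.py) is shown below. `METHOD_RE` is an invented name; the sketch assumes `.end method` begins its own line, and it captures each body lazily so a match stops at the first `.end method` instead of swallowing the following method.

```python
import re

# Hypothetical pattern (an assumption, not the regex from parser.py).
# Groups: 1 = method name, 2 = parameter type descriptors,
#         3 = return type descriptor, 4 = method body.
METHOD_RE = re.compile(
    r"^\.method\s+(?:[\w-]+\s+)*"  # directive plus access flags
    r"([\w$<>-]+)"                 # method name, e.g. add or <init>
    r"\(([^)]*)\)"                 # parameter descriptors, e.g. II
    r"(\S+)[ \t\r]*\n"             # return type descriptor, e.g. I or V
    r"(.*?)"                       # body, lazy; DOTALL lets it span lines
    r"^[ \t]*\.end method",
    re.MULTILINE | re.DOTALL,
)

sample = """.class public Lcom/example/Calc;
.super Ljava/lang/Object;

.method public static add(II)I
    .registers 3
    add-int v0, p0, p1
    return v0
.end method
"""

for name, params, ret, body in METHOD_RE.findall(sample):
    print(name, params, ret, len(body.splitlines()))
```

Constructors work too: in `.method public constructor <init>()V` the `constructor` flag is consumed as a modifier and `<init>` is captured as the name.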
For method invocation extraction:
Applying this regex to the method body captured above yields, for every call, the registers holding the argument values, the class that owns the called method, the method’s name, its parameter types, and its return type.
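Below is a sketch of an invocation pattern plus a small helper that splits the parameter descriptors into individual types. `INVOKE_RE`, `TYPE_RE` and `parse_invocations` are hypothetical names, and the sketch only splits plain `{vX, vY}` register lists: a `/range` call such as `{v0 .. v3}` is kept as one unexpanded string.

```python
import re

# Hypothetical pattern (an assumption, not the regex from parser.py).
# Matches lines such as:
#   invoke-virtual {v0, p1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
# Groups: 1 = opcode, 2 = argument registers, 3 = owning class,
#         4 = method name, 5 = parameter descriptors, 6 = return descriptor.
INVOKE_RE = re.compile(
    r"(invoke-[\w/-]+)\s+\{([^}]*)\},\s*"
    r"(\S+?)->([\w$<>-]+)\(([^)]*)\)(\S+)"
)

# Splits a descriptor list such as "ILjava/lang/String;[I" into single types:
# optional array prefixes, then an object type or a primitive letter.
TYPE_RE = re.compile(r"\[*(?:L[^;]+;|[ZBSCIJFDV])")


def parse_invocations(method_body):
    """Return one dict per invoke-* instruction found in a method body."""
    calls = []
    for opcode, regs, owner, name, params, ret in INVOKE_RE.findall(method_body):
        calls.append({
            "opcode": opcode,
            # Range forms like "v0 .. v3" stay as a single, unexpanded entry.
            "registers": [r.strip() for r in regs.split(",") if r.strip()],
            "class": owner,
            "name": name,
            "param_types": TYPE_RE.findall(params),
            "return_type": ret,
        })
    return calls


body = """
    new-instance v0, Ljava/lang/StringBuilder;
    invoke-direct {v0}, Ljava/lang/StringBuilder;-><init>()V
    invoke-virtual {v0, p1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
"""

for call in parse_invocations(body):
    print(call["name"], call["registers"], call["param_types"], call["return_type"])
```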
For method return value extraction:
The move-result opcode (along with its -wide and -object variants) copies the result of the immediately preceding method invocation into a register. This allows us to track the values returned by the various method invocations inside a single smali class.
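One way to use this, sketched under the assumption that `move-result` directly follows the invoke whose result it stores, is to remember the last call and attach it to the register named by the next `move-result`. `track_results`, `MOVE_RESULT_RE` and `INVOKE_TARGET_RE` are hypothetical names, and `filled-new-array`, which can also feed `move-result`, is deliberately ignored here.

```python
import re

# Hypothetical patterns (assumptions, not the regexes from parser.py).
# move-result, move-result-wide or move-result-object; group 1 = destination.
MOVE_RESULT_RE = re.compile(r"^\s*move-result(?:-wide|-object)?\s+([vp]\d+)\s*$")
# Any invoke-* line; group 1 = "Lowner;->name(params)ret".
INVOKE_TARGET_RE = re.compile(r"^\s*invoke-[\w/-]+\s+\{[^}]*\},\s*(\S+?->\S+)")


def track_results(method_body):
    """Map each destination register to the call whose result it received.

    Assumes move-result directly follows the invoke that produced the value.
    Blank lines, comments and directives (e.g. .line) are skipped.
    """
    results = {}
    last_call = None
    for line in method_body.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith((".", "#")):
            continue
        invoke = INVOKE_TARGET_RE.match(line)
        if invoke:
            last_call = invoke.group(1)
            continue
        move = MOVE_RESULT_RE.match(line)
        if move and last_call is not None:
            results[move.group(1)] = last_call
        # Any instruction other than an invoke ends the pairing window.
        last_call = None
    return results


body = """
    const-string v0, "secret"
    invoke-static {v0}, Lcom/example/Crypto;->decode(Ljava/lang/String;)Ljava/lang/String;
    move-result-object v1
    invoke-virtual {v1}, Ljava/lang/String;->length()I
    move-result v2
"""

print(track_results(body))
```

If a register is reused for a later result, the dictionary simply keeps the most recent call, which matches how the register's value is overwritten at run time.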
The whole Python package and its features will be discussed in a future blog post; for now, the complete code that extracts all of the data above from a set of smali files can be found here: https://github.com/0rka/smalidroid/blob/master/parser.py