Unmarshalling JSON structs containing interfaces in Go

As far as developing goes, Go is the most fun programming language I’ve used until now. If you are not familiar with Go, you should really check it out. However, all languages have their own barriers and obstacles. This blog post describes the most recent one I’ve had to tackle: Unmarshalling JSON structs that contain interfaces as their attributes.

What’s the problem you may ask? Well, if you have an interface (we won’t be talking about empty interfaces this time) encoding/json doesn’t handle decoding those for you. You need to be very specific on which types you pass to the decoder. But we want to be able to do so, so how do we manage to do that?

Lets build a simple TV guide. The guide consists of some technical information about the guide itself like company’s name, legal notices, ID and so forth. The guide also contains the library itself, which is conceptually everything you can hit play on: movies, series, music, radio etc. We’ll start with defining what is actually a Playable is:

That will be our interface for playing various types of items inside the library. Like we said, we want the library to contain everything we can hit the play button on. Next we will need a TVGuide struct to hold all of our data inside:

As you can see, we already lay the groundwork for JSON decoding. Also, we used type aliasing and define a slice of Playable as Playables. We do that because, turns out that functions declared in Go don’t go well with interface types as receivers. Now we need something to implement that interface. Hence, the Movie struct:

We create the Movie struct and instantly implement the Playable interface. Here too, we define our struct attributes as JSON decoding compatible.

Just one more thing and we are done. We need to implement UnmarshalJSON for Playables in order to manually decide, depending on the type which JSON field to parse, into what implementation of a Playable should we decode to. The following code does exactly that:

We start with unmarshalling the passed data into a map of strings as keys and pointers to json.RawMessage as value.  This way we ensure we can iterate over all the fields received under the library JSON field and check for what Playable implementation they should decode to. Because we support only Movie implementations for now, we will parse only movies.

We iterate through all the library fields and enter into the movies JSON. Because we want to decode multiple movies, that value is a list of one or more JSONs. Every one of those matches exactly to what we defined in Movies struct. This describes exactly the JSON we require:

The way we receive those types of JSON string is entirely up to you, Whether it is by a web-service, directly from files, receiving from a raw socket and so forth. We will skip that part because it is entirely out of scope for this post. But there is a common way to handle all of those. This pseudo code completes this blog post by unmarshalling the above JSON string and printing it for us to be joyful it all works.

Running this will result in the required output:

{0rkaTV NO RUNNING WITH SCISSORS 0x1337H4X0R [0xc0420505e0 0xc042050640]}

The complete code is available on gist and also on Go’s playground.

Dalvik’s smali static code analysis with Python, Part II

If you still haven’t read the first blog post of this series, I highly recommend you do. You can grab it here: Dalvik’s smali static code analysis with Python.

Without further ado, lets dive straight into business. Continuing the topic of static data extraction from smali files such as classes info, method deceleration and invocation. Why only extract these types of data and not something like other glorious types of opcodes? Well, It would take a really long time and I needed a quick solution for my main question – “What could have called the function I’m currently reading?”

When dealing with regular code and utilizing some advanced IDE’s you might as well click on the “call hierarchy” button, if you have one, and check out all the method invocations done all the way until the method you are standing on. But with obfuscated code, it is much harder for the IDE’s to track these calls and tell you what’s the call hierarchy.

Personally I found that when dealing with a large and obfuscated code base, all of the major IDE’s cant cope with displaying call hierarchies. When dumping all of our extracted data into a database, you could by theory track method calls and make queries in order to recursively print call hierarchies. So that’s exactly what we’ll do.

We will start with creating the database and all the relevant schemes for our call tracking. We will use python’s sqlite3 library because we don’t need crazy amount of processing power, at least for this stage. The following present the scheme layout:

This gives us a nice start for inserting all of the extracted data to be later selected by our recursive matcher function. The id field is simply the full representation of a smali class+method. For example: ibe/a->c when ibe is the parent class, a is the child class and c is the method declared and defined inside a class. The data column contains all of the method’s smali code. It is much simpler containing this inside the database and not extracting this straight out of the smali files each time we want to access some method’s data. calling_to column is for listing all of the method invocation done inside a single method definition.

Lets try to give an example. Imagine you have the following class definition and a method definition like so:

Let’s try to deconstruct this code ourselves. What come after .class is the type of the class and the full path that the class is contained in. The .super is the super class, or the base class from which the class should derive. The .source is simply for stating in which file Java file the code should reside. The .method directive is what we are really after, and it states what method is being declared, what it’s type is, it’s parameter types, it’s return type and all of the method is defined up until the closest .end method. Also, all of the invoke- directives are for method calling. This is exactly where we extract what methods are being called.

I’ll skip the part where we scan for all smali files inside a directory and load all of the extracted data into the database. This could be done as seen in parse_smali_file method.

After understanding the most basic directives, and ignoring some for the sake of simplicity, we can start building our method matcher. Let’s assume we stumble upon a method, and we would like to know who across all the application’s code could have invoked it. After populating all of the defined methods inside the application’s smali code files, we could select from the database all of the rows containing the relevant method ID inside their calling_to column. This is achieved by:

For each method received by this select statement, apply the same statement recursively. This will result in something like this:

downloadThe full image can be downloaded from here.

To summarize it all, by applying all of the above we could achieve greater accomplishments which we would talk about extensively later on the series. The matcher function and the graph populating function can be grabbed from here.

Dalvik’s smali static code analysis with Python

A while ago I played around with android applications, trying to figure out new ways I could make my life easier as a vulnerability researcher. I stumbled upon Caleb Fenton’s amazing Simplify. This neat framework executes application’s code, trying to make it more human-approachable. This allows researchers like me understand obfuscated code better. Going through Simplify’s code was a bit shocker, as I wasn’t a big fan of java frameworks, not to talk about code analysis done in java.

So being the stubborn fella I am, I tried to implement my own version of smali code analysis with Python. Dalvik, which is Android’s Java Virtual Machine implementation uses smali code as it’s assembly-level opcodes.

The following regular expressions take care of extracting data from any relevant opcodes for my needs.

For class deceleration extraction:

This results in a result group containing only one member – the name of the class.

For method deceleration extraction:

Quite a fancy regex, but this extracts into a single result group: methods’ name, methods’ parameters, methods’ return value, and all of the methods’ data declared inside a single class.

For method invocation extraction:

When applying this regex on the captured method’s data as seen above, you get all of the called methods’ parameters values, methods’ object type, methods’ name, methods’ parameter object types and the methods’ return object type.

For method return deceleration extraction:

The move-result opcode moves the result of the last invoked method inside a register. This allows us to track the returned values from various method invocations inside a single smali class.

The whole python package and it’s features will be discussed in a future blog post, but for now, the complete code extracting all of the data above from a set of smali files can be found here: https://github.com/0rka/smalidroid/blob/master/parser.py

CSC CISO Conference talk on IoT hacking

Earlier today I took part in a conference arranged by Israeli’s pc.co.il (People & Computers) group. The main topic was IoT (Internet of Things) security concerns and what we as consumers can do about it.

My presentation talked about real case scenarios of IoT hacking, showing case studies from numerous researches and vulnerability disclosures. The presentation was uploaded to my personal GitHub account and can be found here: http://bit.ly/2tHAdmJ