Bulk Analysis

Androguard can analyse large collections of APKs, from thousands up to millions. Tools such as multiprocessing can be used to analyse APKs in parallel. Usually you want to store the results of your analysis somewhere, for example in a database or a log file. It is also possible to use Session objects for this job, but you should be aware of some caveats:

1) Sessions take up a lot of space per APK. The resulting Session object can be more than 30 times larger than the original APK.
2) Sessions should not be used to collect unrelated APKs. Again, the size will blow up, and you need to figure out which objects belong to which APK.

So the rule of thumb is not to use Sessions for bulk analysis unless you know what you are doing. Another way is to pickle the resulting objects. As the DalvikVMFormat objects are already stored in the Analysis object, there is no need to pickle them separately. Thus, only the APK and Analysis objects need to be saved.
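The parallel approach mentioned at the start of this section can be sketched as follows, as an alternative that stores only a small summary per APK instead of whole objects. This is a minimal sketch, not a definitive implementation: the names summarise_apk and run_bulk, the JSON-lines output format and the choice of summary fields are assumptions made for illustration. Only AnalyzeAPK, get_package() and get_permissions() come from Androguard.

```python
import json
import multiprocessing


def summarise_apk(path):
    """Analyse one APK and return a small, JSON-serialisable summary."""
    try:
        # Imported inside the worker so that importing this module
        # does not itself require Androguard.
        from androguard.misc import AnalyzeAPK

        a, _, _ = AnalyzeAPK(path)
        return {
            "path": path,
            "package": a.get_package(),
            "permissions": sorted(a.get_permissions()),
        }
    except Exception as exc:
        # One broken APK should not stop the whole run.
        return {"path": path, "error": repr(exc)}


def _write_lines(out, results):
    """Write each result as one JSON line and return how many were written."""
    count = 0
    for result in results:
        out.write(json.dumps(result) + "\n")
        count += 1
    return count


def run_bulk(paths, out_path, worker=summarise_apk, processes=2):
    """Run `worker` over `paths` and log one JSON line per APK to `out_path`."""
    paths = list(paths)
    with open(out_path, "w") as out:
        if processes > 1:
            with multiprocessing.Pool(processes) as pool:
                # Results arrive in completion order, not input order.
                return _write_lines(out, pool.imap_unordered(worker, paths))
        # Serial fallback, handy when debugging a worker.
        return _write_lines(out, map(worker, paths))
```

A call such as run_bulk(glob.glob("apks/*.apk"), "results.jsonl") would typically be placed under an if __name__ == "__main__": guard, which multiprocessing needs on platforms that start workers by spawning. Only the parent process writes to the file, so no locking is needed.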

The following example obtains the two objects and saves them to disk:

import sys
from pickle import dump
from hashlib import sha512
from androguard.misc import AnalyzeAPK

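# Analyse the APK; the DalvikVMFormat list is not needed separately
a, _, dx = AnalyzeAPK('examples/tests/a2dp.Vol_137.apk')

# Name the output files after the SHA-512 of the raw APK
sha = sha512()
sha.update(a.get_raw())

with open("{}_apk.p".format(sha.hexdigest()), "wb") as fp:
    dump(a, fp)

with open("{}_analysis.p".format(sha.hexdigest()), "wb") as fp:
    # Pickling the Analysis object seems to hit Python's default
    # recursion limit, so raise it before dumping
    sys.setrecursionlimit(50000)
    dump(dx, fp)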

But the resulting files are very large, especially the pickled Analysis object:

$ du -sh examples/tests/a2dp.Vol_137.apk
808K examples/tests/a2dp.Vol_137.apk

$ du -sh *.p
31M  24a62690a770891a8f43d71e8f7beb24821d46a75e017ef4f4e6a04624105466621c96305d8e86f9900042e3ef1d5806a5d9ac873bebdf798483790446bd275e_analysis.p
852K 24a62690a770891a8f43d71e8f7beb24821d46a75e017ef4f4e6a04624105466621c96305d8e86f9900042e3ef1d5806a5d9ac873bebdf798483790446bd275e_apk.p

However, both files can be compressed to save disk space:

import sys
import lzma
from pickle import dump
from hashlib import sha512
from androguard.misc import AnalyzeAPK

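a, _, dx = AnalyzeAPK('examples/tests/a2dp.Vol_137.apk')

# Name the output files after the SHA-512 of the raw APK
sha = sha512()
sha.update(a.get_raw())

# lzma.open compresses the pickle stream while it is written
with lzma.open("{}_apk.p.lzma".format(sha.hexdigest()), "wb") as fp:
    dump(a, fp)

with lzma.open("{}_analysis.p.lzma".format(sha.hexdigest()), "wb") as fp:
    # Pickling the Analysis object seems to hit Python's default
    # recursion limit, so raise it before dumping
    sys.setrecursionlimit(50000)
    dump(dx, fp)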

This results in much smaller files:

$ du -sh *.lzma
4,5M 24a62690a770891a8f43d71e8f7beb24821d46a75e017ef4f4e6a04624105466621c96305d8e86f9900042e3ef1d5806a5d9ac873bebdf798483790446bd275e_analysis.p.lzma
748K 24a62690a770891a8f43d71e8f7beb24821d46a75e017ef4f4e6a04624105466621c96305d8e86f9900042e3ef1d5806a5d9ac873bebdf798483790446bd275e_apk.p.lzma

As the APK itself is already compressed, the pickled APK object shrinks only slightly.
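To use the stored objects later, they have to be loaded again. The following is a minimal sketch of the reverse direction. The helper name load_pickle and the rule of choosing the opener by file suffix are assumptions made for illustration. Androguard must be importable when the files are loaded, because pickle needs the original classes to rebuild the objects. Deeply nested objects may need the same raised recursion limit when loading as when dumping; whether the Analysis object does is an assumption here.

```python
import lzma
import pickle


def load_pickle(path):
    """Load one pickled object, handling LZMA-compressed files as well."""
    # Files written with lzma.open carry the ".lzma" suffix used above;
    # everything else is treated as a plain pickle file.
    opener = lzma.open if path.endswith(".lzma") else open
    with opener(path, "rb") as fp:
        return pickle.load(fp)
```

For example, load_pickle("<sha512>_analysis.p.lzma") would return the Analysis object saved earlier. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.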

Using AndroAuto

Another method is to use the AndroAuto framework. AndroAuto allows you to write small Python classes that implement certain methods, which AndroAuto then calls at defined points during the analysis. AndroAuto is capable of analysing thousands of apps, and uses threading to distribute the load to multiple CPUs. The results of your analysis can then be dumped to disk, or you can write your own method of saving them, for example in a database.

The two key components are a Logger, for example DefaultAndroLog, and an Analysis Runner, for example DefaultAndroAnalysis. Both are passed to AndroAuto via a settings dictionary.

Next, a minimal working example is given:

from androguard.core.analysis import auto
import sys

class AndroTest(auto.DirectoryAndroAnalysis):
    def __init__(self, path):
        super(AndroTest, self).__init__(path)
        self.has_crashed = False

    def analysis_app(self, log, apkobj, dexobj, analysisobj):
        # Just print all objects to stdout
        print(log.id_file, log.filename, apkobj, dexobj, analysisobj)

    def finish(self, log):
        # This method can be used to save the information in `log`.
        # finish is called regardless of a crash, so this is a good
        # place to store the information somewhere
        if self.has_crashed:
            print("Analysis of {} has finished with Errors".format(log))
        else:
            print("Analysis of {} has finished!".format(log))

    def crash(self, log, why):
        # This method is called if an error happens during the analysis
        self.has_crashed = True
        print("Error during analysis of {}: {}".format(log, why), file=sys.stderr)

settings = {
    # The directory `some/directory` should contain some APK files
    "my": AndroTest('some/directory'),
    # Use the default Logger
    "log": auto.DefaultAndroLog,
    # Use a maximum of 2 threads
    "max_fetcher": 2,
}

aa = auto.AndroAuto(settings)
aa.go()

In this example, the analysis_app() method receives all objects created by the analysis and simply prints them to stdout.
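Instead of printing, the results could be saved in a database, as suggested above. Because AndroAuto uses threads, a store shared between them should serialise its writes. The following is a minimal sketch of such a store built on sqlite3 from the standard library. The class ResultStore, its table layout and its method names are hypothetical and not part of Androguard.

```python
import sqlite3
import threading


class ResultStore:
    """Hypothetical helper: a small SQLite table guarded by a lock."""

    def __init__(self, db_path):
        # check_same_thread=False lets worker threads share the connection;
        # the lock makes sure only one thread uses it at a time.
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS results ("
                "filename TEXT PRIMARY KEY, status TEXT, detail TEXT)"
            )

    def record(self, filename, status, detail=""):
        """Insert or overwrite the row for one analysed file."""
        # Using the connection as a context manager commits on success.
        with self._lock, self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
                (filename, status, detail),
            )

    def rows(self):
        """Return all rows sorted by filename."""
        with self._lock:
            return self._conn.execute(
                "SELECT filename, status, detail FROM results ORDER BY filename"
            ).fetchall()
```

Inside analysis_app(), a call such as store.record(log.filename, "ok", apkobj.get_package()) would save one row per app, and crash() could call store.record(log.filename, "crash", str(why)) in the same way.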

More information can be found in the documentation of AndroAuto.