Bulk Analysis
Androguard is capable of analysing thousands to millions of APKs.
It is also possible to use tools like multiprocessing for this job and
analyse APKs in parallel.
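As a rough sketch of the parallel approach, the following uses multiprocessing.Pool to hand APK paths to worker processes. The helper names analyse_one and analyse_many, the serial fallback, and the choice of returned fields are assumptions made for illustration and are not part of Androguard. Only AnalyzeAPK, get_package() and get_permissions() are Androguard calls.

```python
import multiprocessing


def analyse_one(path):
    """Analyse one APK and return a small, picklable summary dict."""
    try:
        # Imported here so that every worker process loads Androguard itself.
        from androguard.misc import AnalyzeAPK

        a, _, _ = AnalyzeAPK(path)
        return {
            "path": path,
            "package": a.get_package(),
            "permissions": sorted(a.get_permissions()),
        }
    except Exception as exc:
        # One broken APK should not abort the whole batch.
        return {"path": path, "error": repr(exc)}


def analyse_many(paths, worker=analyse_one, processes=2):
    """Apply worker to every path, in parallel if processes > 1."""
    if processes <= 1:
        # Serial fallback (hypothetical convenience), handy for debugging.
        return [worker(p) for p in paths]
    with multiprocessing.Pool(processes=processes) as pool:
        # Results arrive in completion order, not in input order.
        return list(pool.imap_unordered(worker, paths))
```

Returning a small dict instead of the APK or Analysis objects keeps the data sent back to the parent process small. When started as a script, analyse_many() should be called from under an if __name__ == "__main__": guard so that worker processes can import the module safely.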
Usually you want to put the results of your analysis somewhere, for example a
database or some log file.
It is also possible to use Session objects for this job, but you should be aware of some caveats:
1) Sessions take up a lot of space per APK. The resulting Session object can be more than 30 times larger than the original APK.
2) Sessions should not be used to add unrelated APKs. Again, the size will blow up, and you need to figure out which objects belong to which APK.
So the rule of thumb is not to use Sessions for bulk analysis unless you
know what you are doing.
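For the database option mentioned above, a minimal sketch using Python's built-in sqlite3 module could look like this. The table layout and the helper names open_db and store_result are assumptions made for illustration. In a real run, the values would come from the APK object, for example from a.get_package().

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the results database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS apk_results ("
        " sha256 TEXT PRIMARY KEY,"
        " package TEXT,"
        " n_permissions INTEGER)"
    )
    return conn


def store_result(conn, sha256, package, n_permissions):
    """Insert one row; re-analysing the same APK overwrites its row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO apk_results VALUES (?, ?, ?)",
            (sha256, package, n_permissions),
        )
```

Keying the table on a hash of the APK means that re-running the analysis does not create duplicate rows. If the analysis itself runs in several processes, it is usually simplest to let only the parent process write to the database.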
Another way is to pickle the resulting objects.
As the DalvikVMFormat objects are already stored in the Analysis object, there is no need to pickle them separately.
Thus, it is only required to save the APK and Analysis objects.
This is an example of how to obtain the two objects and save them to disk:
import sys
from pickle import dump
from hashlib import sha512
from androguard.misc import AnalyzeAPK

a, _, dx = AnalyzeAPK('examples/tests/a2dp.Vol_137.apk')

# Name the output files after the SHA-512 digest of the raw APK
sha = sha512()
sha.update(a.get_raw())

with open("{}_apk.p".format(sha.hexdigest()), "wb") as fp:
    dump(a, fp)

with open("{}_analysis.p".format(sha.hexdigest()), "wb") as fp:
    # It looks like here is the recursion problem...
    # so raise the recursion limit before pickling the Analysis object
    sys.setrecursionlimit(50000)
    dump(dx, fp)
However, the resulting files are very large, especially the pickled Analysis object:
$ du -sh examples/tests/a2dp.Vol_137.apk
808K examples/tests/a2dp.Vol_137.apk
$ du -sh *.p
31M 24a62690a770891a8f43d71e8f7beb24821d46a75e017ef4f4e6a04624105466621c96305d8e86f9900042e3ef1d5806a5d9ac873bebdf798483790446bd275e_analysis.p
852K 24a62690a770891a8f43d71e8f7beb24821d46a75e017ef4f4e6a04624105466621c96305d8e86f9900042e3ef1d5806a5d9ac873bebdf798483790446bd275e_apk.p
But it is possible to compress both files to save disk space:
import sys
import lzma
from pickle import dump
from hashlib import sha512
from androguard.misc import AnalyzeAPK

a, _, dx = AnalyzeAPK('examples/tests/a2dp.Vol_137.apk')

# Name the output files after the SHA-512 digest of the raw APK
sha = sha512()
sha.update(a.get_raw())

with lzma.open("{}_apk.p.lzma".format(sha.hexdigest()), "wb") as fp:
    dump(a, fp)

with lzma.open("{}_analysis.p.lzma".format(sha.hexdigest()), "wb") as fp:
    # It looks like here is the recursion problem...
    # so raise the recursion limit before pickling the Analysis object
    sys.setrecursionlimit(50000)
    dump(dx, fp)
which results in much smaller files:
$ du -sh *.lzma
4,5M 24a62690a770891a8f43d71e8f7beb24821d46a75e017ef4f4e6a04624105466621c96305d8e86f9900042e3ef1d5806a5d9ac873bebdf798483790446bd275e_analysis.p.lzma
748K 24a62690a770891a8f43d71e8f7beb24821d46a75e017ef4f4e6a04624105466621c96305d8e86f9900042e3ef1d5806a5d9ac873bebdf798483790446bd275e_apk.p.lzma
Obviously, as the APK is already packed, there is not much to compress anymore.
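To read such files back, a small loader along the following lines may be enough. The helper name load_pickle is hypothetical, and raising the recursion limit on load is an assumption that mirrors the dump step above; it may not be needed in every case. Androguard has to be importable when the objects are unpickled.

```python
import lzma
import pickle
import sys


def load_pickle(path, recursion_limit=50000):
    """Load a pickle file; files ending in '.lzma' are decompressed."""
    # Assumption: deep object graphs may need the same raised limit on load.
    sys.setrecursionlimit(max(sys.getrecursionlimit(), recursion_limit))
    opener = lzma.open if path.endswith(".lzma") else open
    with opener(path, "rb") as fp:
        return pickle.load(fp)
```

Unpickling can execute arbitrary code, so only load files that you created yourself or that come from a source you trust.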
Using AndroAuto
Another method is to use the AndroAuto framework. AndroAuto allows you to write small Python classes that implement certain methods, which AndroAuto then calls at specific points during the analysis. AndroAuto is capable of analysing thousands of apps and uses threading to distribute the load to multiple CPUs. The results of your analysis can then be dumped to disk, or you could write your own method of saving them, for example in a database.
The two key components are a Logger, for example DefaultAndroLog, and an Analysis Runner, for example DefaultAndroAnalysis.
Both are passed via a settings dictionary into AndroAuto.
Next, a minimal working example is given:
from androguard.core.analysis import auto
import sys


class AndroTest(auto.DirectoryAndroAnalysis):
    def __init__(self, path):
        super(AndroTest, self).__init__(path)
        self.has_crashed = False

    def analysis_app(self, log, apkobj, dexobj, analysisobj):
        # Just print all objects to stdout
        print(log.id_file, log.filename, apkobj, dexobj, analysisobj)

    def finish(self, log):
        # This method can be used to save information in `log`
        # finish is called regardless of a crash, so maybe store the
        # information somewhere
        if self.has_crashed:
            print("Analysis of {} has finished with Errors".format(log))
        else:
            print("Analysis of {} has finished!".format(log))

    def crash(self, log, why):
        # If some error happens during the analysis, this method will be
        # called
        self.has_crashed = True
        print("Error during analysis of {}: {}".format(log, why), file=sys.stderr)


settings = {
    # The directory `some/directory` should contain some APK files
    "my": AndroTest('some/directory'),
    # Use the default Logger
    "log": auto.DefaultAndroLog,
    # Use maximum of 2 threads
    "max_fetcher": 2,
}

aa = auto.AndroAuto(settings)
aa.go()
In this example, the analysis_app() function is used to get all created objects of the analysis and just print them to stdout.
More information can be found in the documentation of AndroAuto.
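Instead of printing, the results can be written to disk from inside analysis_app(). The following is a minimal sketch under a few assumptions: JsonLinesWriter and make_runner are hypothetical helper names, the output is one JSON object per line, and a lock is used because AndroAuto runs the analysis in several threads. It also assumes that apkobj may be None for inputs that are not APKs and simply skips those.

```python
import json
import threading


class JsonLinesWriter:
    """Append one JSON object per line; safe to call from several threads."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()

    def write(self, record):
        line = json.dumps(record, sort_keys=True)
        with self._lock, open(self._path, "a", encoding="utf-8") as fp:
            fp.write(line + "\n")


def make_runner(directory, writer):
    """Build an analysis runner that records one line per analysed APK."""
    # Imported lazily so the writer above can be used without Androguard.
    from androguard.core.analysis import auto

    class JsonRunner(auto.DirectoryAndroAnalysis):
        def analysis_app(self, log, apkobj, dexobj, analysisobj):
            if apkobj is None:
                # Assumption: no APK object is available for this input.
                return
            writer.write({
                "file": log.filename,
                "package": apkobj.get_package(),
                "permissions": sorted(apkobj.get_permissions()),
            })

        def crash(self, log, why):
            # Record failures too, so that every input leaves a trace.
            writer.write({"file": getattr(log, "filename", None),
                          "error": str(why)})

    return JsonRunner(directory)
```

The object returned by make_runner('some/directory', JsonLinesWriter('results.jsonl')) can then be used as the "my" entry of the settings dictionary in place of AndroTest.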