Deep-pwning is modularized into several components to minimize code repetition. Because of the vastly different nature of potential classification tasks, the current iteration of the code is optimized for classifying images and phrases (using word vectors).
These are the code modules that make up the current iteration of Deep-pwning:
The drivers are the main execution point of the code. This is where you can tie the different modules and components together, and where you can inject more customizations into the adversarial generation processes.
This is where the actual machine learning model implementations are located. For example, the provided `lenet5` model definition is located in `lenet5.py`. It defines the network as follows:
-> Convolutional Layer 1
-> Max Pooling Layer 1
-> Convolutional Layer 2
-> Max Pooling Layer 2
-> Dropout Layer
-> Softmax Layer
LeCun et al. LeNet-5 Convolutional Neural Network
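As an illustration of the stack above, here is a minimal sketch tracking how a 28x28 MNIST input flows through these layers. The 'SAME'-padded convolutions and 2x2/stride-2 max-pools are assumptions based on typical LeNet-5-style Tensorflow implementations, not necessarily dpwn's exact parameters.

```python
# Track the spatial size of a MNIST feature map through the layer stack.
# Assumed hyperparameters: 'SAME'-padded stride-1 convolutions and
# 2x2 stride-2 max-pooling (common in LeNet-5-style implementations).

def conv_same(size):
    # 'SAME' padding with stride 1 leaves the spatial size unchanged.
    return size

def max_pool_2x2(size):
    # 2x2 pooling with stride 2 halves the spatial size.
    return size // 2

size = 28                   # MNIST images are 28x28
size = conv_same(size)      # Convolutional Layer 1 -> 28x28
size = max_pool_2x2(size)   # Max Pooling Layer 1   -> 14x14
size = conv_same(size)      # Convolutional Layer 2 -> 14x14
size = max_pool_2x2(size)   # Max Pooling Layer 2   -> 7x7
# Dropout keeps the shape; the softmax layer then maps the flattened
# 7*7*depth features to the 10 MNIST class probabilities.
print(size)  # 7
```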
- Adversarial (advgen)
This module contains the code that generates adversarial output for the models. The `run()` function defined in each of these `advgen` classes takes in an `input_dict`, which contains several predefined tensor operations for the machine learning model defined in Tensorflow. If the model that you are generating the adversarial sample for is known, the variables in the input dict should be based off that model definition. Otherwise, if the model is unknown (black-box generation), a substitute model should be used or implemented, and that model definition should be used instead. The variables that need to be passed in are the input tensor placeholder variables and labels (often referred to as `x` -> input and `y_` -> labels), the model output (often referred to as `y_conv`), and the actual test data and labels that the adversarial images will be based off of.
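To make the expected structure concrete, here is a hypothetical sketch of assembling such an `input_dict`. The key names mirror the conventions described above (`x`, `y_`, `y_conv`) but are assumptions; check the `advgen` class you are using for the exact keys it reads.

```python
# Hypothetical helper: bundle the tensors and data an advgen run() needs.
# In real use, x / y_ / y_conv are Tensorflow placeholder and output tensors
# from the (known or substitute) model definition; plain values stand in here.

def build_input_dict(x, y_, y_conv, test_data, test_labels):
    """Package model tensors and evaluation data for an advgen run()."""
    return {
        "x": x,                      # input placeholder tensor
        "y_": y_,                    # label placeholder tensor
        "y_conv": y_conv,            # model output (e.g. softmax output)
        "test_data": test_data,      # samples the adversarial images are based on
        "test_labels": test_labels,  # their ground-truth labels
    }

input_dict = build_input_dict("x:0", "y_:0", "y_conv:0",
                              [[0.0] * 784], [[1] + [0] * 9])
print(sorted(input_dict))  # ['test_data', 'test_labels', 'x', 'y_', 'y_conv']
```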
Miscellaneous utilities that don’t belong anywhere else. These include helper functions to read data, deal with Tensorflow queue inputs, etc.
These are the resource directories relevant to the application:
Tensorflow allows you to load a partially trained model to resume training, or load a fully trained model into the application for evaluation or performing other operations. All these saved ‘checkpoints’ are stored in this resource directory.
This directory stores all the input data in whatever format that the driver application takes in.
This is the output directory for all application output, including adversarial images that are generated.
Please follow the Tensorflow installation directions found at https://www.tensorflow.org/versions/r0.8/get_started/os_setup.html, which will allow you to pick the Tensorflow binary to install.
$ pip install -r requirements.txt
Execution Example (with the MNIST driver)
To restore from a previously trained checkpoint (configuration in config/mnist.conf):
$ cd dpwn
$ python mnist_driver.py --restore_checkpoint
To train from scratch (note that any previous checkpoint(s) located in the folder specified in the configuration will be overwritten):
$ cd dpwn
$ python mnist_driver.py
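The config/mnist.conf file referenced above drives the driver's behavior. Assuming an INI-style layout, a minimal sketch of reading such a file looks like this; the section and key names below are illustrative guesses based on the resource directories described earlier, not dpwn's actual schema.

```python
# Illustrative only -- consult the mnist.conf shipped with dpwn for the
# real section and key names.
from configparser import ConfigParser  # Python 3; Python 2 (dpwn-era) uses the ConfigParser module

sample = u"""\
[main]
checkpoint_dir = ./resources/checkpoints
data_dir = ./resources/data
output_dir = ./resources/output
"""

config = ConfigParser()
config.read_string(sample)  # a driver would use config.read("config/mnist.conf")
print(config.get("main", "checkpoint_dir"))  # ./resources/checkpoints
```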
- Implement saliency graph method of generating adversarial samples
- Add a `defense` module to the project for examples of some defenses proposed in literature
- Upgrade to Tensorflow 0.9.0
- Add support for using pretrained word2vec model in
- Add SVM & Logistic Regression support in `models` (+ example that uses them)
- Add non-image and non-phrase classifier example
- Add multi-GPU training support for faster training speeds
Note that dpwn requires Tensorflow 0.8.0; Tensorflow 0.9.0 introduces some breaking changes and is not yet supported (see the task list above).
(borrowed from the amazing Requests repository by kennethreitz)
- Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
- Fork the repository on GitHub to start making your changes to the master branch (or branch off of it).
- Write a test which shows that the bug was fixed or that the feature works as expected.
- Send a pull request and bug the maintainer until it gets merged and published. 🙂 Make sure to add yourself to
There is so much impressive work from so many machine learning and security researchers that directly or indirectly contributed to this project and inspired this framework. This is a non-exhaustive list of resources that were used or referenced in one way or another:
- Szegedy et al. Intriguing properties of neural networks
- Papernot et al. The Limitations of Deep Learning in Adversarial Settings
- Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
- Goodfellow et al. Explaining and Harnessing Adversarial Examples
- Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
- Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification
- Nguyen et al. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
- Xu et al. Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers
- Kantchelian et al. Evasion and Hardening of Tree Ensemble Classifiers
- Biggio et al. Support Vector Machines Under Adversarial Label Noise
- Biggio et al. Poisoning Attacks against Support Vector Machines
- Papernot et al. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
- Ororbia II et al. Unifying Adversarial Training Algorithms with Flexible Deep Data Gradient Regularization
- Jin et al. Robust Convolutional Neural Networks under Adversarial Noise
- Pang et al. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales
- Goodfellow et al. Deep Learning Adversarial Examples – Clarifying Misconceptions
- WildML Implementing a CNN for Text Classification in Tensorflow