The recent growth of social media and social platforms has increased the number of applications, and automatic age and gender classification has become relevant to a growing number of use cases. However, the performance of current methods on real-world images still falls considerably short of the accuracies reported for the related task of face recognition. This project learns representations for age and gender classification using deep convolutional neural networks. Given an input face image, the trained model predicts its age group and gender category.
|Language||Intel® Distribution for Python*|
|Frameworks||Intel® optimization for Caffe*, Intel® optimization for TensorFlow*, and Keras|
|Libraries||OpenCV, NumPy, SciPy, scikit-learn, matplotlib|
|Hardware||Intel® Core™ i7 processor with 16 GB RAM (model: HP envy17t-s000cto*), Intel® AI DevCloud|
- The recent growth of social media and social platforms has increased the number of applications.
- Automatic age and gender classification has become relevant to a growing number of applications.
- The performance of current methods on real-world images falls considerably short of what is reported for the related task of face recognition.
The problem that we are trying to solve is finding the age group and the gender of the person from an image.
Input: Face of a person
Output: Age and gender group for the input image
Scope: Classification into respective age group (rather than regression) and gender.
- Age detection plays a key role in many fields like multimedia retrieval and human machine interaction.
- The vocabulary used to address people often changes with their age group.
- Gender identification is one of the major components for developing gender-dependent acoustic modules for speech recognition.
- Salutations and grammar rules of languages vary from one gender to the other.
Some of the past experiments in research for age detection:
- Calculating ratios between different measurements of localized facial features.
- Because this technique requires accurate localization of facial features, it is unsuitable for the images that generally appear on social platforms.
- Combined 3D structure of the head and image intensities were used.
Some of the past experiments in research for gender detection:
- More recently, Weber's local texture descriptor was used, which has produced the highest accuracies on the FERET benchmark.
- However, the FERET dataset was collected under completely controlled conditions, so results on it are not a reliable indicator of performance on real-life images.
Dataset: Adience Benchmark
- The Adience dataset consists entirely of images uploaded to Flickr* directly from smartphones.
- Because of their source, the images in the dataset are highly unconstrained and reflect many real-world challenges.
- The complete Adience collection comprises approximately 26,000 images. The detailed information can be seen in the table below.
Overview of dataset
The following images are from the dataset.
The network has three convolutional layers, followed by two fully connected layers and a final output layer. The convolutional layers are:
- 96 filters of size 3x7x7
- 256 filters of size 96x5x5
- 384 filters of size 256x3x3
Each convolutional layer is followed by a rectified linear unit (ReLU) and a max pooling layer that takes the maximum over 3x3 regions with a stride of 2.
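To get a feel for the size of the convolutional part of the network, the learnable parameters can be counted directly from the filter sizes listed above. The sketch below assumes one bias term per filter, which the article does not state; the helper name `conv_param_count` is made up for this illustration.

```python
# Each entry: (number of filters, input channels, kernel height, kernel width),
# copied from the filter sizes listed in the article.
CONV_LAYERS = [
    (96, 3, 7, 7),
    (256, 96, 5, 5),
    (384, 256, 3, 3),
]


def conv_param_count(filters, in_channels, kernel_h, kernel_w, bias=True):
    """Return the number of learnable parameters in one convolutional layer."""
    weights = filters * in_channels * kernel_h * kernel_w
    # Assumption: one bias value per filter (not stated in the article).
    return weights + (filters if bias else 0)


param_counts = [conv_param_count(*layer) for layer in CONV_LAYERS]

for index, count in enumerate(param_counts, start=1):
    print(f"conv{index}: {count:,} parameters")
print(f"total: {sum(param_counts):,} parameters")
```

Under that assumption, the three convolutional layers together hold roughly 1.5 million parameters, which is small by the standards of deeper architectures.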
Fully connected layers
The first fully connected layer receives the output of the convolutional layers and has 512 neurons followed by ReLU. Its output is forwarded to a second fully connected layer with the same architecture. The final fully connected layer maps to the class labels.
Architecture Credits: G. Levi and T. Hassner. Age and gender classification using convolutional neural networks. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) workshops, June 2015.
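As a rough illustration of how the outputs of the final fully connected layer can be mapped to a class label, the sketch below applies a softmax and picks the most probable class. The softmax step, the label names, and the example scores are assumptions made for illustration; they are not taken from this article.

```python
import math


def softmax(scores):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    peak = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def predict_label(scores, labels):
    """Return the most probable label and its probability."""
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]


# Hypothetical label names and raw scores, for illustration only.
GENDER_LABELS = ["female", "male"]
label, confidence = predict_label([0.3, 1.9], GENDER_LABELS)
print(label, round(confidence, 2))
```

The same function would apply to age classification by passing one score per age group and the corresponding list of group labels.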
Training and testing
- Data augmentation: We took a random crop of 227x227 pixels from the 256x256 input image and randomly mirrored it in each forward-backward training pass.
- Cross validation: The dataset was divided into five subject-exclusive folds, and each fold was further divided into males and females. All of the sub-folds from the fifth original fold were held out as test data. Three folds were used as the training set, and the fourth fold was used as the validation set.
- Initialization: The network was trained from scratch. The weights in all layers were initialized with random values from a zero-mean Gaussian with a standard deviation of 0.01.
- Network training: Training used stochastic gradient descent with a batch size of 50. The initial learning rate was 1e-3, reduced to 5e-4 after every 10,000 iterations.
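The data augmentation step described above can be sketched in plain Python. This is a minimal illustration, not the code used in the project: the image is represented as a nested list of pixels, the function name `random_crop_and_mirror` is made up, and the mirroring probability of 0.5 is an assumption (the article only says the crop is randomly mirrored).

```python
import random

CROP_SIZE = 227  # crop size stated in the article


def random_crop_and_mirror(image, crop=CROP_SIZE, rng=random):
    """Return a random crop of the image, mirrored horizontally half the time.

    image is a list of rows; each row is a list of pixel values.
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)   # randint is inclusive at both ends
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    # Assumption: mirror with probability 0.5.
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]
    return patch


# Synthetic 256x256 "image" whose pixel value records its own (row, column).
image = [[(y, x) for x in range(256)] for y in range(256)]
rng = random.Random(0)  # fixed seed so the example is repeatable
patch = random_crop_and_mirror(image, rng=rng)
print(len(patch), len(patch[0]))
```

Because a fresh crop and mirror decision is drawn on every call, the network sees a slightly different view of the same image in each training pass.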
Gender estimation results on Adience benchmark
The following table gives accuracies for gender detection using different architectures and frameworks.
|Architecture||Framework||Accuracy (%)|
|Above Architecture||Caffe*||85.89 +/- 4.2|
|Above Architecture||TensorFlow*||87.24 +/- 3.7|
|AlexNet||Caffe||92.57 +/- 1.5|
|AlexNet||TensorFlow||93.42 +/- 2.7|
Age estimation results on Adience benchmark
The following table gives accuracies for age detection using different architectures and frameworks.
|Architecture||Framework||Accuracy (%)|
|Above Architecture||Caffe*||57.7 +/- 14.3|
|Above Architecture||TensorFlow*||55.43 +/- 12.43|
|AlexNet||Caffe||62.46 +/- 5.6|
|AlexNet||TensorFlow||61.37 +/- 11.27|
Training time periods: (above architecture)
|Framework||Hardware Used||Time Taken for Training|
|Caffe*||Intel® Core™ i7 processor with 16 GB RAM||approximately 42 hours|
|Caffe||Intel® AI DevCloud||approximately 6 hours|
|TensorFlow*||Intel Core i7 processor with 16 GB RAM||approximately 40 hours|
|TensorFlow||Intel AI DevCloud||approximately 6 hours|
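The training times in the table translate into a rough speedup factor for Intel AI DevCloud over the Intel Core i7 machine. The short calculation below uses only the approximate hours reported above, so the resulting ratios are approximate as well; the dictionary keys are labels chosen for this sketch.

```python
# Approximate training times in hours, taken from the table above.
TRAINING_HOURS = {
    "Caffe": {"core_i7": 42, "devcloud": 6},
    "TensorFlow": {"core_i7": 40, "devcloud": 6},
}

speedups = {
    framework: hours["core_i7"] / hours["devcloud"]
    for framework, hours in TRAINING_HOURS.items()
}

for framework, factor in speedups.items():
    print(f"{framework}: about {factor:.1f}x faster on Intel AI DevCloud")
```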
Conclusion and Future Work
Although many previous methods have addressed the problems of age and gender classification, much of this work has focused on constrained images taken in lab settings. Such settings do not adequately reflect the appearance variations common in real-world images from social websites and online repositories.
Considering the related problem of face recognition, we investigated how well deep convolutional neural networks perform on these tasks using web data. The network tested above is "shallow" compared with some recent network architectures, which reduces the number of its parameters and the chance of overfitting.
Two important conclusions can be drawn from our experiments and results. First, convolutional neural networks can deliver much better performance on the age and gender classification problem. Second, the simplicity of our model implies that more elaborate systems trained on more data may be capable of substantially improving on the results reported here.
In the future, we hope to modify the proposed architecture to create a new network that improves age classification accuracy. We would also like to deploy the trained model on AWS* DeepLens so that it runs in real time.