• We propose supervised spatial attention that employs a heatmap generator for instructive feature learning.
• We formulate a rectified Gaussian scoring function to generate informative heatmaps.
• We present scale-aware layer attention that eliminates redundant information from pyramid features.
• A voting strategy is designed to produce more reliable classification results.
• Our face detector achieves encouraging performance in accuracy and speed on several benchmarks.

Modern anchor-based face detectors learn discriminative features using large-capacity networks and extensive anchor settings. Despite their promising results, they have two problems. First, most anchors extract redundant features from the background, so the performance improvements come at a disproportionate computational cost. Second, the predicted face boxes are distinguished only by a classifier supervised with pre-defined positive, negative and ignored anchors. During inference, this strategy may discard the contributions of anchors that were labelled negative or ignored simply because of their inferior initialisation, even though they can regress well to a target. In other words, true positives and representative features may be filtered out by unreliable confidence scores. To address the first concern and achieve more efficient face detection, we propose a Heatmap-assisted Spatial Attention (HSA) module and a Scale-aware Layer Attention (SLA) module, which extract informative features at lower computational cost. Specifically, SLA incorporates information from all the feature pyramid layers, weighted adaptively to remove redundant layers. HSA predicts a reshaped Gaussian heatmap and uses it to guide spatial feature selection by better highlighting facial areas. To address the second concern and make decisions more reliable, we merge the predicted heatmap scores and classification results by voting.
Since our heatmap scores are based on the distance to the face centres, they can retain all the well-regressed anchors. Experiments on several well-known benchmarks demonstrate the merits of the proposed method.
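To make the idea of distance-based heatmap scores and voting concrete, the following is a minimal sketch only. The abstract gives neither the scoring formula nor the voting rule, so everything here is an assumption: a plain axis-aligned Gaussian centred on the face box stands in for the rectified Gaussian, a simple cut-off `floor` stands in for the rectification, and a weighted average stands in for the vote. The names `gaussian_heatmap_score`, `vote`, `sigma_scale`, `floor` and `weight` are hypothetical.

```python
import math


def gaussian_heatmap_score(px, py, box, sigma_scale=0.5, floor=0.0):
    """Score a point by its distance to the centre of a face box.

    box is (x1, y1, x2, y2). sigma_scale and floor are hypothetical
    parameters, not values taken from the paper.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Assumption: the spread along each axis is proportional to the box size.
    sx = max(sigma_scale * (x2 - x1), 1e-6)
    sy = max(sigma_scale * (y2 - y1), 1e-6)
    score = math.exp(-0.5 * (((px - cx) / sx) ** 2 + ((py - cy) / sy) ** 2))
    # Assumption: scores below the floor are zeroed out.
    return score if score >= floor else 0.0


def vote(cls_score, heat_score, weight=0.5):
    """Merge classifier confidence and heatmap score by weighted average.

    This is an assumed merging rule, used only for illustration.
    """
    return weight * cls_score + (1.0 - weight) * heat_score


if __name__ == "__main__":
    face = (10.0, 10.0, 30.0, 30.0)
    # A prediction centred near the face centre but with a weak classifier score.
    heat = gaussian_heatmap_score(21.0, 19.0, face)
    print("heatmap score:", round(heat, 3))
    print("merged score:", round(vote(0.3, heat), 3))
```

Under these assumptions, a well-regressed prediction with a weak classifier score still receives a high merged score, because its centre lies close to the face centre. This is the behaviour the abstract attributes to the voting strategy.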