A number of computer vision problems, such as facial age estimation, crowd density estimation, and body/face pose (view angle) estimation, can be formulated as regression problems. Such a learning problem is difficult for two reasons: i) the training data are sparse and imbalanced, and ii) the features vary widely because of both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success of attributes in solving classification problems with sparse training data, a novel cumulative attribute concept is introduced here for learning a regression model when only sparse and imbalanced data are available.
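To make the cumulative attribute idea concrete, the following is a minimal sketch of one plausible encoding, not a definitive statement of the method. It assumes the scalar label is a non-negative integer (for example an age or a person count) and that attribute i is switched on whenever the label exceeds i, so that neighbouring labels share most of their attributes. The function name `cumulative_attributes` and the parameter `max_value` are hypothetical and chosen only for illustration.

```python
import numpy as np

def cumulative_attributes(y, max_value):
    """Encode integer labels as cumulative binary attribute vectors.

    Assumption: each label lies in 0..max_value. Entry i (0-indexed) of the
    code is 1 when i < y, so the number of ones in a code equals the label.
    """
    y = np.asarray(y, dtype=int)
    thresholds = np.arange(max_value)          # 0, 1, ..., max_value - 1
    # Broadcasting compares every label against every threshold.
    return (thresholds < y[..., None]).astype(int)

# Illustrative labels only, not data from the paper.
codes = cumulative_attributes([0, 2, 5], max_value=5)
print(codes)
```

Under this encoding, two labels that differ by one differ in exactly one attribute, which is one way sparse neighbouring labels could share training evidence.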

This paper presents a multi-output regression model for crowd counting in public scenes. Existing counting-by-regression methods either learn a single model for global counting or train a large number of separate regressors for localised density estimation. In contrast, our approach uses a single regression model to estimate people counts in spatially localised regions, and it is more scalable because it does not require training a number of regressors proportional to the number of local regions.
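The contrast between one multi-output model and many per-region regressors can be illustrated with a short sketch. It assumes ridge regression as the multi-output learner, which the text above does not specify, and it uses synthetic features and counts. The names `fit_multi_output_ridge`, `predict_counts`, and `alpha` are hypothetical.

```python
import numpy as np

def fit_multi_output_ridge(X, Y, alpha=1.0):
    """Fit one linear model that predicts all region counts jointly.

    X: (n_frames, n_features) image features.
    Y: (n_frames, n_regions) counts, one column per local region.
    Returns W of shape (n_features + 1, n_regions); the last row is the bias.
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])       # append a bias column
    reg = alpha * np.eye(d + 1)
    reg[-1, -1] = 0.0                          # leave the bias unpenalised
    # A single solve yields the weights for every region at once.
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)

def predict_counts(W, X):
    """Predict per-region counts for each row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

# Synthetic data for illustration only: 50 frames, 6 features, 4 regions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
true_W = rng.normal(size=(7, 4))
Y = np.hstack([X, np.ones((50, 1))]) @ true_W

W = fit_multi_output_ridge(X, Y, alpha=1e-8)
local_counts = predict_counts(W, X)            # one column per region
global_counts = local_counts.sum(axis=1)       # global count per frame
```

In this sketch the number of regions only changes the width of `Y` and `W`, so adding regions does not add models to train; the global count is recovered by summing the local estimates.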