The computer ability to detect human being by computer vision is still being improved both in accuracy or computation time. In low-lighting condition, the detection accuracy is usually low. This research uses additional information, besides RGB channels, namely a depth map that shows objects' distance relative to the camera. This research integrates Cascade Classifier (CC) to localize the potential object, the Convolutional Neural Network (CNN) technique to identify the human and nonhuman image, and the Kalman filter technique to track human movement. For training and testing purposes, there are two kinds of RGB-D datasets used with different points of view and lighting conditions. Both datasets have been selected to remove images which contain a lot of noises and occlusions so that during the training process it will be more directed. Using these integrated techniques, detection and tracking accuracy reach 77.7%. The impact of using Kalman filter increases computation efficiency by 41%.