  • 06-4 Have you conducted sampling to prevent bias in data?
    Determine applicability: Consider this question if class imbalance is possible, or has been confirmed, in the medical data being used, and determine whether the requirement has been satisfied.

    • Studies report that medical data are often imbalanced between the majority class (negative, or healthy, patients) and the minority class (positive, or ill, patients) [30]. The resulting misclassifications (false negatives and false positives) directly affect the outcome of a medical diagnosis, so data imbalance requires careful attention.

    • Sampling techniques can be used to balance the data distribution between classes. Sampling creates a sample by extracting data from a population according to defined criteria. The extracted sample should represent the distribution of the population while also preventing the bias caused by class imbalance in that population.

    • Two widely used techniques are undersampling and oversampling. Undersampling discards majority-class data, but collecting data is itself challenging in the healthcare sector. Oversampling, which increases the number of minority-class samples until it matches the majority class, can therefore be used to prevent bias.
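
The oversampling idea described above can be illustrated with a minimal sketch. It assumes the simplest variant, random oversampling by duplication, which the guideline does not specifically prescribe. The function name `random_oversample` and the toy records are hypothetical and are not taken from the source.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Randomly duplicate records of smaller classes until every class
    has as many records as the largest class (illustrative sketch)."""
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_records, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing records with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for record in group + extra:
            out_records.append(record)
            out_labels.append(label)
    return out_records, out_labels


# Hypothetical toy data: 8 negative (0) cases and 2 positive (1) cases.
records = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [2.1], [2.4]]
labels = [0] * 8 + [1] * 2

balanced_records, balanced_labels = random_oversample(records, labels)
print(Counter(balanced_labels))  # each class now has 8 records
```

If a sketch like this is used, it is generally applied only to the training split, after the data have been divided, so that duplicated minority records do not also appear in the evaluation data.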