  • 05-1a Have you checked for possible errors by visualizing the overall distribution of the training data?
    • Visualizing the distribution of the full dataset after data cleansing, one of the data pre-processing steps, can reveal additional input errors. Data entered manually by nurses and specialists, such as patient medical records, may contain no omissions such as null or N/A values; however, human errors such as incorrect entries can still produce outliers, which visualizing the data distribution can expose.

    • In small datasets, outliers and errors are easy to identify and manage. With large volumes of data, such as more than one million patient medical records, visualization makes it easier to spot errors introduced by human mistakes. Visualizing the data distribution also helps in exploring and understanding the data used for AI model training.

    • Techniques for visualizing data distribution differ by data attribute. They include distribution charts, which visualize the distribution using the mean, variance, and deviation of the entire dataset; categorical charts, which visualize categorical data; and matrix charts, which visualize data in a 2D array.
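
The check described above can be sketched in a few lines of Python. This is a minimal illustration and not a prescribed method: the blood pressure readings are made-up sample values, the function names (`histogram_counts`, `render_histogram`, `iqr_outliers`) are hypothetical, and the 1.5 × IQR rule is one common outlier heuristic chosen here as an assumption. The text histogram stands in for a distribution chart; a plotting library would typically draw the graphical equivalent.

```python
from collections import Counter
import statistics


def histogram_counts(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def render_histogram(counts, bin_width):
    """Render bin counts as text bars so isolated, far-off bins stand out."""
    lines = []
    for lower, n in counts.items():
        upper = lower + bin_width - 1
        lines.append(f"{lower:>5}-{upper:<5} | {'#' * n} ({n})")
    return "\n".join(lines)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed heuristic)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical systolic blood pressure readings in mmHg (illustrative only).
# 1200 and 12 mimic keying errors for a value such as 120; note that neither
# is null or N/A, so a missing-value check alone would not catch them.
systolic = [118, 122, 120, 125, 130, 117, 121, 1200, 119, 124, 128, 12]

counts = histogram_counts(systolic, bin_width=10)
print(render_histogram(counts, bin_width=10))

suspects = iqr_outliers(systolic)
print("Values to review:", suspects)
```

In the printed histogram, most readings cluster in the 110 to 139 range, while the two mistyped entries each land in a bin of their own far from that cluster. Flagged values are candidates for manual review rather than automatic deletion, since an extreme value can also be a genuine measurement.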