  • 04-1a Have you explained the data attributes before and after cleansing?
    • Data cleansing is the stage in which data are selected and processed to create training data before labeling. Users who only work with cleansed data cannot accurately identify the attributes of the raw data. Therefore, explain the data attributes before and after cleansing, along with any related information about the cleansing process, in case additional data need to be collected in the future.

    • Generally, data cleansing can be performed by excluding or converting parts of the data according to predefined rules using open-source tools, or by visual inspection. You can analyze data attributes by visualizing the cleansed data.

    • If you have collected the raw data yourself, provide information about the purpose of building the data, the type of data, the criteria for cleansing (e.g. domain characteristics), and the cleansing tool. The following are examples of data cleansing standards for each data type.
    Image data: Image size, aspect ratio, resolution, imaging equipment, personal data processing, copyright, etc.
    Text data: Amount of text, grammatical accuracy, appropriateness of the content, relevance to the topic, etc.
    Audio data: Volume, accuracy of pronunciation, noise and static, inaudible segments (based on the acceptable range), personal data, copyright, etc.
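
As a rough illustration of rule-based cleansing with attributes recorded before and after, the following Python sketch filters a small list of text records and produces a summary that could be included in the dataset documentation. The rules (a minimum character count and removal of exact duplicates), the `min_chars` threshold, and the names `CleansingReport`, `summarize`, and `cleanse_texts` are all assumptions made for this example; they are not prescribed by the guideline.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CleansingReport:
    """Attributes of a text dataset before and after cleansing."""
    rule: str
    count_before: int
    count_after: int
    mean_length_before: float
    mean_length_after: float


def summarize(texts):
    """Return (record count, mean character length) for a list of strings."""
    return len(texts), (mean(len(t) for t in texts) if texts else 0.0)


def cleanse_texts(texts, min_chars=10):
    """Apply two hypothetical rules: drop short records and exact duplicates."""
    seen = set()
    kept = []
    for text in texts:
        normalized = text.strip()  # trim surrounding whitespace before checking
        if len(normalized) < min_chars or normalized in seen:
            continue
        seen.add(normalized)
        kept.append(normalized)

    count_before, length_before = summarize(texts)
    count_after, length_after = summarize(kept)
    report = CleansingReport(
        rule=f"min_chars={min_chars}, exact duplicates removed",
        count_before=count_before,
        count_after=count_after,
        mean_length_before=length_before,
        mean_length_after=length_after,
    )
    return kept, report


if __name__ == "__main__":
    raw = [
        "ok",
        "The scan shows no abnormality.",
        "The scan shows no abnormality.",
        "  Patient reports mild pain.  ",
    ]
    cleansed, report = cleanse_texts(raw)
    print(report)
```

Keeping the rule description and the before/after figures in one record makes it easier to state which criteria were applied and how they changed the data, which is the information this item asks for. Image or audio data would need different attributes (for example resolution or volume), but the same before/after pattern applies.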