The Challenges

  • Highly detailed annotation to train ML models for image and video analysis
  • Complex manual work to capture intricate details such as objects, actions, and environmental settings
  • Consistency across video transcriptions and image captions
  • High-quality audio mapping and speaker identification for video transcription
  • Datasets delivered in a standard format for quick integration into ML models

The Solution

  • Created a customized roadmap to define attributes and metadata for each image and video
  • Set image and video annotation guidelines to ensure uniformity and precision
  • Manually reviewed and captioned each image (descriptions, actions, environment settings)
  • Assigned a team to audio-to-text transcription, including speaker identification
  • Implemented a QA process featuring multiple reviews