Data Marketplace

Use verified, licensed data with confidence. Download a dataset right away, or submit an inquiry to learn more about the data.

A total of 236 datasets
  • Frontier Data · Text

    Multilingual Chain-of-Thought Reasoning Text Dataset

    A multilingual chain-of-thought reasoning dataset built from complex problems requiring step-by-step decomposition and coherent answer generation, with AI-generated drafts reviewed by expert-level annotators.

  • Frontier Data · Text

    Expert CoT Text Dataset

    An expert chain-of-thought text dataset built from expert verbal reasoning to support LLM training for step-by-step reasoning.

  • Frontier Data · Text

    Doctoral Exam Questions and Solutions Text Dataset

    A high-difficulty text dataset built from doctoral-level exam questions and solutions to support LLM training for expert reasoning and problem solving.

  • Frontier Data · Text

    Domain-Specific Benchmark Dataset

    A multi-turn benchmark dataset modeled on BFCL that evaluates agent action performance across the finance, legal, medical, manufacturing, and defense domains.

  • Frontier Data · Text

    Safety Response Multi-turn Dataset

    A multi-turn conversational dataset designed to evaluate model response capabilities against major safety risk categories and attack patterns.

  • Pre-training Data · Video

    Physical AI: Human-Object Interaction Video Dataset

    A video dataset collected for training Physical AI models in manufacturing environments. Includes human-object manipulation footage along with structured annotations such as trajectory and mesh data.

  • Pre-training Data · Video

    AI-Generated Video with Frame-level Caption Dataset

    A dataset of AI-generated videos sampled at 1 fps, with a scene description caption for each frame. Suitable for video understanding and multimodal model training.

  • Pre-training Data · Image

    Automotive Body Defect Detection Image Dataset

    A labeled image dataset capturing surface defects on automotive body panels, including scratches, dents, and paint irregularities. Built to support vision inspection model training in automotive manufacturing processes.

  • Pre-training Data · Image

    Jailbreak Image Dataset (Red-teaming)

    A red-teaming image dataset designed for AI safety evaluation. Includes adversarial inputs, attempts to bypass harmful-content filters, and policy-violating scenarios to support model safety assessment and stronger AI governance.

  • Pre-training Data · Audio

    Korean Dialect Speech Dataset for Manufacturing & Shipbuilding Work Instructions

    A single-turn Korean speech dataset featuring shipbuilding and manufacturing work instructions, including regional dialects.

Check out the details of Snowflakes, Flitto's core dataset.

Explore Flitto's high-precision datasets, structured for seamless integration, to strengthen your AI models and business decision-making.