Overview
SurveyAnalytica’s Data Management module lets you upload, organize, and prepare datasets for analytics, machine learning, and campaign workflows. Whether you are importing customer records from a CSV, uploading images for computer vision, or connecting to a live data stream, the Data section gives you a guided wizard experience from upload through to analysis.
Navigating to Data Management
From the main navigation sidebar, click Data. You will see the Data Listing view, which displays all datasets in your workspace. Each row shows the dataset name, type, data source, last modified date, and size. You can search, filter, sort, and paginate through your datasets.
Creating a New Dataset
To create a new dataset, click the + New button in the Data listing view. This opens the Dataset Wizard, a guided multi-step process that walks you through the entire dataset setup.
Step 1: Upload
The upload step lets you drag and drop files or browse to select them from your computer. Supported file formats include:
- Tabular data: CSV, Excel (.xlsx, .xls), Parquet, Avro, JSON
- Images: PNG, JPG, JPEG, GIF, BMP, WebP
- Documents: PDF
- Archives: ZIP (automatically extracted after upload)
Key upload features:
- Resumable uploads: Large files are uploaded in chunks with progress tracking per file. If a chunk fails, it is retried automatically.
- Multi-file upload: Upload multiple files at once. Each file shows its own progress bar with chunk-level detail.
- Auto-detection: The system automatically detects the data type (TABLE, IMAGE, etc.) from your uploaded files and suggests a dataset name based on the first file.
- ZIP handling: ZIP archives are extracted and analyzed after upload. For image classification datasets, folder structure is preserved and used for class labeling.
- Image thumbnails: When uploading images, a thumbnail preview grid is displayed so you can verify the files before proceeding.
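The chunked, retry-on-failure upload behavior described above can be pictured with a minimal sketch. The `send_chunk` callback, chunk size, and retry count here are illustrative assumptions, not the platform's actual transport:

```python
import io

CHUNK_SIZE = 4      # tiny for illustration; real uploads use multi-MB chunks
MAX_RETRIES = 3

def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, bytes) chunks from a binary stream."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield index, chunk
        index += 1

def upload_with_retries(stream, send_chunk, max_retries=MAX_RETRIES):
    """Send each chunk, retrying failures; return the number of chunks sent."""
    sent = 0
    for index, chunk in iter_chunks(stream):
        for _attempt in range(max_retries):
            if send_chunk(index, chunk):   # send_chunk returns True on success
                sent += 1
                break
        else:
            raise RuntimeError(f"chunk {index} failed after {max_retries} retries")
    return sent
```

Because each chunk is tracked independently, a failed chunk is retried without restarting the whole file, which is what makes large uploads resumable.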
Direct Cloud Upload (Advanced)
For very large datasets (over 5 GB), you can use the Advanced: Direct Cloud Upload option. This generates a cloud storage dropzone path and a command you can run from your terminal to upload files directly. After uploading, click Scan Dropzone to detect the uploaded files.
Step 2: Configure
After uploading, configure your dataset identity:
- Dataset Name: Auto-suggested from the first uploaded file, or enter your own descriptive name.
- Description: Optional text describing the dataset purpose or contents.
- Team / Workspace: Select which workspace or team this dataset belongs to. If you belong to only one team, it is automatically selected.
- Data Type: Auto-detected from your files. Available types include TABLE, IMAGE, VIDEO, AUDIO, TEXT, FILE, and OTHER.
Under Advanced Settings, you can set:
- Retention Period (months): How long to keep this data before archiving. Default is 6 months.
- Archival Period (months): How long to keep archived data before deletion. Default is 6 months.
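The two periods chain together: data is archived after the retention period and deleted after the archival period. A small sketch of how the resulting lifecycle dates could be computed (the function names are illustrative, not the platform's API):

```python
from datetime import date

def add_months(d, months):
    """Shift a date forward by whole months, clamping the day if needed."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days_in_month = [31, 29 if leap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    # e.g. Jan 31 + 1 month clamps to Feb 28/29
    return date(year, month, min(d.day, days_in_month[month - 1]))

def lifecycle_dates(created, retention_months=6, archival_months=6):
    """Return (archive_on, delete_on) for a dataset created on `created`."""
    archive_on = add_months(created, retention_months)
    delete_on = add_months(archive_on, archival_months)
    return archive_on, delete_on
```

With both defaults at 6 months, a dataset created in January is archived in July and deleted the following January.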
Step 3: Review and Save
Review a summary of your dataset configuration: name, description, team, data type, total files uploaded, and retention settings. Click Save and Continue to persist the dataset. After saving, the system takes you to the data preview step.
Dataset Wizard — Post-Upload Steps
After the initial upload and save, the Dataset Wizard provides additional steps that depend on whether your data is tabular or image-based.
Tabular Datasets (CSV, Excel, Parquet, JSON, Avro)
- Upload — Upload your files.
- Prepare — Preview the schema. The system reads column names and types from your uploaded files. Review the detected schema, see column-level quality metrics, and adjust data types.
- Refine (Clean and Transform) — Apply data transformations through a visual pipeline. Available operations include calculated fields, data cleaning, and transformation steps.
- Publish (Validate and Publish) — Validate data integrity and publish the dataset for analysis.
- Analyze — Opens the full Analytics dashboard directly within the data context.
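The Prepare step's column type detection can be pictured with a simplified sketch. The `infer_schema` function and its fallback order (BOOL, then INT64, FLOAT64, STRING) are illustrative assumptions, not the platform's actual detection logic:

```python
import csv
import io

def infer_type(values):
    """Guess a column type from sample string values (simplified sketch)."""
    def all_match(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all(v.lower() in ("true", "false") for v in values):
        return "BOOL"
    if all_match(int):
        return "INT64"
    if all_match(float):
        return "FLOAT64"
    return "STRING"

def infer_schema(csv_text, sample_rows=100):
    """Read the header and a sample of rows, inferring a type per column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for _, row in zip(range(sample_rows), reader)]
    return {col: infer_type([row[i] for row in rows])
            for i, col in enumerate(header)}
```

Sampling only the first rows is a common shortcut; it is also why you should always review the detected schema, since later rows can contain values the sample missed.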
Image Datasets
- Upload — Upload images individually, in bulk, or as a ZIP archive.
- Prepare (Label) — Assign labels to images for classification. Labels can come from folder structure, CSV mapping files, or manual tagging.
- Refine (Curate) — Review image quality, detect outliers, exclude problematic images, and balance class distributions.
- Publish (Split and Export) — Generate train/validation/test splits with configurable ratios. Export in multiple formats including organized folders and CSV manifests.
- Analyze — View analytics on your image dataset including class distributions and tag statistics.
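The Publish step's train/validation/test splitting can be sketched as a seeded shuffle followed by ratio-based slicing. The function name, default ratios, and seed are illustrative, not the platform's implementation:

```python
import random

def split_dataset(items, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle items deterministically and split by the given ratios."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed = reproducible splits
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Fixing the seed makes the split reproducible, so re-exporting the dataset yields the same partitioning.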
Schema Definition and Mapping
When creating a dataset through the schema editor, you define columns with the following properties:
- Column Name and Description
- Column Type: DATETIME, DATE, BOOL, BYTES, GEOGRAPHY, INTERVAL, INT64, NUMERIC, BIGNUMERIC, FLOAT64, RANGE, STRING, TIMESTAMP, TIME, STRUCT, ARRAY
- Column Properties: isPrimaryKey, isUnique, isNullable, isAutoIncrement, isEncrypted, isAnonymized, isClassified, isCategorized, isNormalized, isStandardized
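A column definition like the one above can be represented as a simple record. This dataclass is an illustrative model of the schema editor's fields, not the platform's actual data structure, and shows only a subset of the listed properties:

```python
from dataclasses import dataclass

@dataclass
class ColumnDef:
    """Illustrative representation of a schema-editor column."""
    name: str
    column_type: str          # e.g. "STRING", "INT64", "TIMESTAMP"
    description: str = ""
    is_primary_key: bool = False
    is_unique: bool = False
    is_nullable: bool = True
    is_encrypted: bool = False

# A primary-key column is typically also non-nullable:
user_id = ColumnDef("user_id", "INT64", is_primary_key=True, is_nullable=False)
```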
Data Storage
All tabular datasets are stored in a scalable cloud data warehouse for high-performance analytics. Each dataset is organized under your workspace for easy management and access control.
Data Listing Features
- Edit: Open the dataset in the wizard to modify configuration or schema.
- Clone: Create a copy of the dataset and its configuration.
- Lock / Unlock: Prevent accidental modifications to a dataset.
- Analyze: Jump directly to the analytics dashboard for this dataset.
- Share: Open the analytics dashboard view for sharing.
- Delete: Permanently remove the dataset. Supports single and bulk deletion.
Auto-Save and Version History
The Dataset Wizard includes auto-save functionality that persists your work automatically. An undo/redo history of up to 50 versions is maintained so you can revert changes.
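A bounded undo/redo history like this is commonly built on two stacks, with the undo stack capped at the version limit. This is an illustrative sketch of the pattern, not the wizard's actual implementation:

```python
from collections import deque

class VersionHistory:
    """Keep up to `limit` snapshots; undo/redo walk between them."""

    def __init__(self, limit=50):
        self._undo = deque(maxlen=limit)  # oldest versions drop off the front
        self._redo = []

    def save(self, snapshot):
        self._undo.append(snapshot)
        self._redo.clear()                # new edits invalidate the redo stack

    def undo(self):
        if len(self._undo) < 2:
            return None                   # nothing earlier to revert to
        self._redo.append(self._undo.pop())
        return self._undo[-1]

    def redo(self):
        if not self._redo:
            return None
        snapshot = self._redo.pop()
        self._undo.append(snapshot)
        return snapshot
```

The `deque(maxlen=...)` cap means the 51st save silently evicts the oldest snapshot, matching the "up to 50 versions" behavior.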
Tips and Best Practices
- Use descriptive dataset names so they are easy to find in the listing.
- For image classification datasets, organize images into folders by class name before uploading as a ZIP — the system will automatically detect class labels from the folder structure.
- Set appropriate retention and archival periods based on your data governance requirements.
- Use the Direct Cloud Upload option for datasets larger than 5 GB to avoid browser upload limitations.
- Always review the auto-detected schema before publishing to ensure column types are correct.
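The folder-name-as-class-label convention from the image tip above can be sketched in a few lines. The function name is illustrative; the platform applies the same idea when it extracts a ZIP:

```python
from pathlib import PurePosixPath

def labels_from_paths(paths):
    """Map each image path to a class label taken from its parent folder."""
    return {p: PurePosixPath(p).parent.name for p in paths}
```

So an archive laid out as `cats/img1.jpg`, `dogs/img2.jpg` yields the labels `cats` and `dogs` with no manual tagging.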