Overview
SurveyAnalytica’s Advanced Segmentation module lets you divide your data into meaningful groups based on shared characteristics. The platform supports three segmentation approaches: rules-based segmentation using business logic, auto-segmentation using unsupervised machine learning algorithms, and predictive segmentation using supervised ML classifiers. Access segmentation from any analytics view by clicking the Segments tab.
Rules-Based Segmentation
Rules-based segmentation lets you define segments using explicit conditions and business rules. This approach is ideal when you know exactly how you want to categorize your data.
Creating a Segment Rule
- Navigate to the Segments view within Analytics.
- Click Create Segment and select Rules-Based.
- Give your segment a descriptive name.
- Add conditions using the condition builder.
Condition Operators
Each condition evaluates a field against a value using one of these operators:
- EQUALS / NOT_EQUALS: Exact match (case-insensitive for strings)
- CONTAINS / NOT_CONTAINS: Substring match
- STARTS_WITH / ENDS_WITH: Prefix or suffix match
- GREATER_THAN / LESS_THAN: Numeric comparison
- GREATER_THAN_OR_EQUAL / LESS_THAN_OR_EQUAL: Inclusive numeric comparison
- BETWEEN: Range check (requires a second value)
- IN / NOT_IN: List membership
- IS_EMPTY / IS_NOT_EMPTY: Null or blank check
- MATCHES_REGEX: Regular expression matching
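To make the operator semantics concrete, here is a minimal Python sketch of how a single condition might be evaluated. It is illustrative only and is not the platform's implementation. The function names are hypothetical, only a subset of operators is covered, and several behaviors are assumptions: BETWEEN is treated as inclusive, CONTAINS as case-sensitive, MATCHES_REGEX as a search anywhere in the value, and empty fields fail every operator except IS_EMPTY.

```python
import re
from typing import Any


def is_empty(value: Any) -> bool:
    """Treat None and blank strings as empty (assumed definition of 'blank')."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def evaluate(operator: str, field_value: Any, value: Any = None,
             second_value: Any = None) -> bool:
    """Evaluate one condition; only a subset of the operators is sketched."""
    if operator == "IS_EMPTY":
        return is_empty(field_value)
    if operator == "IS_NOT_EMPTY":
        return not is_empty(field_value)
    if is_empty(field_value):
        return False  # assumption: empty fields fail all other operators
    if operator == "EQUALS":
        if isinstance(field_value, str) and isinstance(value, str):
            return field_value.lower() == value.lower()  # case-insensitive
        return field_value == value
    if operator == "CONTAINS":
        return str(value) in str(field_value)
    if operator == "STARTS_WITH":
        return str(field_value).startswith(str(value))
    if operator == "GREATER_THAN":
        return float(field_value) > float(value)
    if operator == "BETWEEN":
        # assumption: both bounds are inclusive
        return float(value) <= float(field_value) <= float(second_value)
    if operator == "IN":
        return field_value in value
    if operator == "MATCHES_REGEX":
        return re.search(value, str(field_value)) is not None
    raise ValueError(f"Unsupported operator: {operator}")
```

For example, evaluate("EQUALS", "Gold", "gold") returns True because string equality ignores case, and evaluate("BETWEEN", 11, 5, 10) returns False.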
Combining Conditions
Conditions are organized into groups. Within a group, conditions are combined using AND or OR logic. Multiple groups can themselves be combined with AND/OR logic, giving you full flexibility to express complex business rules.
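The two-level AND/OR structure can be sketched as follows. This is an illustration under assumptions, not the platform's data model: ConditionGroup, segment_matches, and the sample field names (country, nps, plan) are hypothetical, and each condition is represented as a plain Python predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Condition = Callable[[Record], bool]


@dataclass
class ConditionGroup:
    logic: str  # "AND" or "OR", applied to the conditions in this group
    conditions: List[Condition]

    def matches(self, record: Record) -> bool:
        results = (condition(record) for condition in self.conditions)
        return all(results) if self.logic == "AND" else any(results)


def segment_matches(groups: List[ConditionGroup], group_logic: str,
                    record: Record) -> bool:
    """Combine the group results with a second level of AND/OR logic."""
    results = (group.matches(record) for group in groups)
    return all(results) if group_logic == "AND" else any(results)


# Example rule: (country is US AND nps >= 9) OR (plan is enterprise)
promoters_us = ConditionGroup("AND", [
    lambda r: r.get("country") == "US",
    lambda r: r.get("nps", 0) >= 9,
])
enterprise = ConditionGroup("OR", [lambda r: r.get("plan") == "enterprise"])
rule = [promoters_us, enterprise]
```

Calling segment_matches(rule, "OR", record) returns True for a US respondent with an NPS of 10, and also for any enterprise respondent regardless of country.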
Segment Statistics
After applying a segment rule, the system calculates and displays:
- Total records matching the segment
- Percentage of total dataset
- Distribution of values within the segment
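As a rough illustration of what these three statistics represent, the sketch below computes them for a small in-memory dataset. The function name, the sample records, and the choice to round the percentage to one decimal place are assumptions made for the example.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def segment_statistics(records: List[Record],
                       predicate: Callable[[Record], bool],
                       field: str) -> Dict[str, Any]:
    """Return the match count, share of the dataset, and value distribution."""
    matching = [r for r in records if predicate(r)]
    total = len(records)
    percentage = round(100.0 * len(matching) / total, 1) if total else 0.0
    distribution = Counter(r.get(field) for r in matching)
    return {
        "count": len(matching),
        "percentage": percentage,
        "distribution": dict(distribution),
    }


records = [
    {"plan": "pro", "nps": 9},
    {"plan": "free", "nps": 10},
    {"plan": "pro", "nps": 4},
    {"plan": "pro", "nps": 10},
]
stats = segment_statistics(records, lambda r: r["nps"] >= 9, "plan")
# stats == {"count": 3, "percentage": 75.0,
#           "distribution": {"pro": 2, "free": 1}}
```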
Auto-Segmentation (ML Clustering)
Auto-segmentation uses unsupervised machine learning to automatically discover natural groupings in your data. The platform supports three clustering algorithms:
K-Means Clustering
The most commonly used algorithm, ideal for finding spherical clusters of similar size.
- How it works: Partitions data into K clusters by minimizing the distance between data points and their cluster center.
- Configuration: Specify the number of clusters (K), or let the system auto-detect the optimal K using the elbow method and silhouette analysis.
- Scalability: Optimized to handle large datasets efficiently.
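The partitioning idea behind K-Means can be sketched in a few lines of plain Python. This is a teaching sketch of the standard Lloyd's algorithm, not the platform's optimized implementation. The function names, the random initialization, and the sample points are assumptions. The inertia helper returns the within-cluster sum of squared distances, which is the quantity usually plotted against K in the elbow method.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_means(points: List[Point], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and center-update steps."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


def inertia(points: List[Point], labels: List[int],
            centers: List[List[float]]) -> float:
    """Within-cluster sum of squared distances (used by the elbow method)."""
    return sum(math.dist(p, centers[label]) ** 2
               for p, label in zip(points, labels))


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = k_means(points, k=2)
```

On this sample, the first three points end up in one cluster and the last three in the other. Which cluster receives label 0 depends on the random initialization.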
Hierarchical Clustering
Builds a tree-like hierarchy of clusters, useful for exploring data at different levels of granularity.
- How it works: Uses an agglomerative (bottom-up) approach that starts with every record as its own cluster and repeatedly merges the two closest clusters. Supported linkage methods are ward, complete, average, and single.
- Best for: Smaller datasets where you want to explore nested cluster structures.
- Output: A dendrogram visualization showing how clusters merge at different distance thresholds.
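The bottom-up merging process can be illustrated with a small sketch. It is a naive implementation intended only to show the idea, under stated assumptions: the function name and return format are hypothetical, Ward linkage is omitted for brevity, and the brute-force search is only practical for small inputs. The returned merge history is the information a dendrogram draws.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]

# How the distance between two clusters is derived from pairwise distances.
LINKAGES: Dict[str, Callable[[List[float]], float]] = {
    "single": min,
    "complete": max,
    "average": lambda distances: sum(distances) / len(distances),
}


def agglomerative(points: List[Point],
                  linkage: str = "average") -> List[Tuple[int, int, float]]:
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    reduce_fn = LINKAGES[linkage]
    # Every point starts as its own cluster, keyed by its index.
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(points))}

    def cluster_distance(a: int, b: int) -> float:
        pairwise = [math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b]]
        return reduce_fn(pairwise)

    merges: List[Tuple[int, int, float]] = []
    next_id = len(points)
    while len(clusters) > 1:
        # Find and merge the two closest clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


merges = agglomerative([(0.0,), (1.0,), (10.0,), (12.0,)], linkage="average")
```

Cutting the merge history at a distance threshold yields the clusters at that level of granularity. In the sample above, a threshold of 5 leaves two clusters, because the final merge happens at a distance of 10.5.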
DBSCAN (Density-Based Spatial Clustering)
Discovers clusters of arbitrary shape based on data density, automatically detecting outliers.
- How it works: Groups together points that are closely packed, marking points in low-density regions as outliers.
- Key parameters: eps (neighborhood radius) and min_samples (minimum points to form a cluster).
- Best for: Datasets with irregularly shaped clusters or when you expect outliers.
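To show how eps and min_samples interact, here is a compact sketch of the standard DBSCAN procedure in plain Python. It is not the platform's implementation. The function name is hypothetical, and two conventions are assumptions: a point counts as its own neighbor when checking min_samples, and outliers are labeled -1.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]
NOISE = -1


def dbscan(points: List[Point], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id, or NOISE (-1) for outliers."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster_id += 1
    return [label if label is not None else NOISE for label in labels]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
labels = dbscan(points, eps=1.5, min_samples=3)
# labels == [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (50, 50) has no neighbors within eps, so it is reported as an outlier rather than forced into a cluster.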
Mixed Data Type Support
The clustering engine handles both numeric and categorical data:
- Numeric features: Scaled to a common range before clustering so that no single feature dominates the distance calculations.
- Categorical features: Encoded appropriately depending on cardinality.
- Auto-detection: The system automatically identifies feature types from question types (RADIO, CHECKBOX, DROPDOWN are treated as categorical; NPS, SLIDER, numeric fields as numeric).
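The auto-detection rule amounts to a lookup from question type to feature type. The sketch below illustrates that mapping. The function names and the sample field names are hypothetical, the "NUMERIC" label stands in for the generic numeric fields mentioned above, and returning "unknown" for unlisted types is an assumption.

```python
from typing import Dict, List

CATEGORICAL_QUESTION_TYPES = {"RADIO", "CHECKBOX", "DROPDOWN"}
# "NUMERIC" is an assumed label for generic numeric fields.
NUMERIC_QUESTION_TYPES = {"NPS", "SLIDER", "NUMERIC"}


def detect_feature_type(question_type: str) -> str:
    """Map a question type to 'categorical', 'numeric', or 'unknown'."""
    normalized = question_type.strip().upper()
    if normalized in CATEGORICAL_QUESTION_TYPES:
        return "categorical"
    if normalized in NUMERIC_QUESTION_TYPES:
        return "numeric"
    return "unknown"


def split_features(question_types: Dict[str, str]) -> Dict[str, List[str]]:
    """Group field names by their detected feature type."""
    groups: Dict[str, List[str]] = {
        "categorical": [], "numeric": [], "unknown": [],
    }
    for field, question_type in question_types.items():
        groups[detect_feature_type(question_type)].append(field)
    return groups


features = split_features({
    "region": "DROPDOWN",
    "recommend_score": "NPS",
    "satisfaction": "SLIDER",
    "comments": "TEXT",
})
# features == {"categorical": ["region"],
#              "numeric": ["recommend_score", "satisfaction"],
#              "unknown": ["comments"]}
```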
Feature Studio
Before running clustering, you can use the Feature Studio to prepare and transform your data features. Available transformations include:
- Normalize: Standard scaling, min-max normalization
- Encode: Encoding for categorical variables
- Impute: Handle missing values with mean, median, mode, or custom strategies
- Bin: Convert continuous variables into discrete bins (equal-width, equal-frequency, or custom)
- Scale: Apply logarithmic, square root, or other scaling transformations
- Derived features: Create new features from existing ones using custom expressions
- Feature selection: Automatically identify the most important features for clustering
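Several of these transformations are simple enough to sketch directly. The functions below illustrate median imputation, standard scaling, min-max normalization, and equal-width binning on a single column of values. They are hypothetical helpers written for this example, not Feature Studio's API, and the handling of constant columns (returning zeros) is an assumption.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the median of the present values."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standard_scale(values: List[float]) -> List[float]:
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = impute_median([1.0, None, 3.0, 5.0])   # [1.0, 3.0, 3.0, 5.0]
scaled = min_max_scale([10.0, 20.0, 30.0])    # [0.0, 0.5, 1.0]
bins = equal_width_bins([0, 5, 10], n_bins=2)  # [0, 1, 1]
```

Imputation generally comes first, because the scaling and binning helpers here expect a column with no missing values.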
Clustering Quality Metrics
After running auto-segmentation, the platform provides quality metrics to evaluate your clusters:
- Silhouette Score: Measures how similar a point is to its own cluster versus other clusters. Ranges from -1 to 1, with higher values indicating better-defined clusters.
- Davies-Bouldin Index: Measures the average similarity between each cluster and the cluster most similar to it. Lower values indicate better separation between clusters, with 0 as the lowest possible value.
- Cluster sizes: The number of records in each cluster, helping you identify imbalanced segments.
- Cluster profiles: Statistical summaries of each cluster showing the distinguishing characteristics.
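For readers who want to see what the Silhouette Score measures, the sketch below computes it from scratch using Euclidean distance. It follows the standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The function name and the choice to score singleton clusters as 0 are assumptions, and the Davies-Bouldin Index is not covered here.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def silhouette_score(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("The silhouette score needs at least two clusters")

    def mean_distance(i: int, members: List[int]) -> float:
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = mean_distance(i, own)  # cohesion within the point's own cluster
        b = min(mean_distance(i, members)  # distance to nearest other cluster
                for other, members in clusters.items() if other != label)
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters that interleave
```

On this sample, the natural grouping scores close to 0.9, while the interleaved grouping scores below zero, which signals that points sit closer to the other cluster than to their own.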
Predictive Segmentation
Once you have established segments (either through rules or clustering), you can train a supervised ML model to predict segment membership for new data. Predictive segmentation supports:
- Logistic Regression: Fast, interpretable model for segment classification.
- Random Forest: More complex model that can capture non-linear relationships.
Training a Predictive Model
- Select the features to use for prediction.
- Choose the target column (existing segment assignments).
- Configure the train/test split ratio (default: 80/20, meaning 80% of records for training and 20% for testing).
- Select the model type (logistic regression or random forest).
- Train the model and review metrics (accuracy, precision, recall, F1 score, confusion matrix).
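The split and the review metrics from these steps can be illustrated without a specific model. The sketch below shows an 80/20 split and how accuracy, precision, recall, F1, and the confusion matrix are derived from true and predicted segment labels. The function names, the fixed random seed, and the sample labels are assumptions, and the predictions stand in for the output of whichever classifier was trained.

```python
import random
from collections import Counter
from typing import Any, Dict, List, Tuple


def train_test_split(rows: List[Any], test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List[Any], List[Any]]:
    """Shuffle the rows and hold out test_ratio of them for evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true: List[str], y_pred: List[str],
                           positive: str) -> Dict[str, Any]:
    """Compute metrics for one segment treated as the positive class."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    tp = pairs[(positive, positive)]
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": dict(pairs),
    }


train, test = train_test_split(list(range(10)))  # 8 training rows, 2 test rows
metrics = classification_metrics(
    y_true=["A", "A", "A", "B", "B"],
    y_pred=["A", "A", "B", "B", "A"],
    positive="A",
)
```

Here three of the five predictions are correct, so accuracy is 0.6, while precision, recall, and F1 for segment A are each two thirds.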
Feature Importance
After training, the system extracts feature importance scores showing which variables have the most influence on segment assignment. This helps you understand what drives the differences between your segments.
Applying Segments
Once segments are defined, each record's segment assignment is stored on the record itself. Each stored assignment includes:
- configId: The segmentation configuration ID
- configName: Name of the segmentation config
- segmentName: The assigned segment name
- appliedAt: Timestamp when the segment was applied
Segments can be used as filters throughout the analytics interface and as targeting criteria in campaign workflows.
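As an illustration of how a stored assignment might look and how it could drive a filter, consider the sketch below. The four field names come from the list above. Everything else is an assumption made for the example: the "segments" key on the record, the ISO 8601 timestamp format, the sample IDs, and the helper function names.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List

Record = Dict[str, Any]


@dataclass
class SegmentAssignment:
    configId: str      # the segmentation configuration ID
    configName: str    # name of the segmentation config
    segmentName: str   # the assigned segment name
    appliedAt: str     # timestamp; ISO 8601 format is an assumption


def apply_segment(record: Record, config_id: str, config_name: str,
                  segment_name: str) -> Record:
    """Attach a segment assignment to a record under an assumed 'segments' key."""
    assignment = SegmentAssignment(
        configId=config_id,
        configName=config_name,
        segmentName=segment_name,
        appliedAt=datetime.now(timezone.utc).isoformat(),
    )
    record.setdefault("segments", []).append(asdict(assignment))
    return record


def filter_by_segment(records: List[Record], segment_name: str) -> List[Record]:
    """Keep only the records assigned to the named segment."""
    return [
        r for r in records
        if any(s["segmentName"] == segment_name for s in r.get("segments", []))
    ]


records = [{"id": 1}, {"id": 2}, {"id": 3}]
apply_segment(records[0], "cfg-1", "NPS tiers", "Promoters")
apply_segment(records[1], "cfg-1", "NPS tiers", "Detractors")
promoters = filter_by_segment(records, "Promoters")  # only the record with id 1
```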