# Dimensions, Measures & Filters

Understanding the difference between dimensions and measures is the single most important thing to know about SemQL. Get this right and most queries just work.

***

## Dimensions

A **dimension** is an attribute you slice, filter, or group by. Think: names, dates, categories, IDs, statuses.

```sql
-- Examples of dimensions
name
region
created_at
product_category
customer_id
```

Dimensions map to plain columns in your database, or sometimes to expressions (e.g., `full_name` might be `CONCAT(first_name, ' ', last_name)` under the hood — you don't need to know).

You use dimensions in:

* `SELECT` (to show the attribute)
* `WHERE` (to filter before aggregation)
* `GROUP BY` (to group rows)
* `ORDER BY` (to sort)

***

## Measures

A **measure** is a pre-defined, pre-aggregated metric. Think: revenue, user counts, order totals, conversion rates.

```sql
-- Examples of measures
total_revenue         -- might be SUM(amount - refunds)
user_count            -- might be COUNT(DISTINCT user_id)
average_order_value   -- might be AVG(amount) WHERE amount > 0
conversion_rate       -- might be SUM(conversions) / COUNT(sessions)
```

Measures are defined once by your data team in the semantic layer. They encode all the business logic: which columns to use, how to handle nulls, what to exclude, how to aggregate.

You use measures in:

* `SELECT` (to show the metric)
* `HAVING` (to filter after aggregation)
* `ORDER BY` (to sort by the metric)

***

## The Golden Rule: Never Aggregate a Measure

This is the most common SemQL mistake, and it produces silent, wrong results.

**Measures are already aggregated.** When you write `total_revenue`, it already contains a `SUM()` inside its definition. If you wrap it in another `SUM()`, you get double aggregation — which either errors out or returns garbage.

```sql
-- WRONG: total_revenue is already SUM(amount - refunds)
-- This becomes SUM(SUM(amount - refunds)) — double aggregation
SELECT region, SUM(total_revenue)
FROM sales
GROUP BY region

-- CORRECT: just reference the measure directly
SELECT region, total_revenue
FROM sales
GROUP BY region
```

Same rule applies to `COUNT`, `AVG`, `MIN`, `MAX`, and any other aggregate:

```sql
-- WRONG
SELECT product, AVG(average_rating) FROM products GROUP BY product

-- CORRECT
SELECT product, average_rating FROM products GROUP BY product
```

**How to tell if something is a measure:** Check your semantic layer. Measures are tagged as such. When in doubt, if a field sounds like a business metric (revenue, count, rate, total, average), it's probably a measure.

***

## WHERE vs. HAVING

Since dimensions and measures behave differently, they use different filter clauses:

|                     | WHERE                              | HAVING                                      |
| ------------------- | ---------------------------------- | ------------------------------------------- |
| **What it filters** | Dimensions (raw column values)     | Measures (aggregated results)               |
| **When it runs**    | Before aggregation                 | After aggregation                           |
| **Use for**         | `status = 'active'`, `date >= ...` | `total_revenue > 10000`, `order_count >= 5` |

```sql
-- WHERE: filter by a dimension (region is a dimension)
SELECT region, total_revenue
FROM sales
WHERE region = 'Europe'
GROUP BY region

-- HAVING: filter by a measure (total_revenue is a measure)
SELECT region, total_revenue
FROM sales
GROUP BY region
HAVING total_revenue > 500000

-- Both together
SELECT region, total_revenue
FROM sales
WHERE created_at >= '2024-01-01'   -- dimension filter: rows to include
GROUP BY region
HAVING total_revenue > 100000       -- measure filter: groups to keep
ORDER BY total_revenue DESC
```

A simple test: if you could put the condition in `WHERE` without a `GROUP BY`, it filters a dimension. If it only makes sense after grouping (because it involves an aggregated value), it belongs in `HAVING`.

***

## Predefined Filters

**Predefined filters** are reusable `WHERE` conditions defined in the semantic layer. Instead of writing `WHERE member_count > 0` everywhere, your data team defines it once as `active_only`. Anyone who uses `organizations[active_only]` gets exactly that condition — consistently.

### Syntax

Filters go in the `FROM` or `JOIN` clause, in square brackets after the model name:

```sql
-- Single filter
FROM organizations[active_only]

-- Multiple filters (all conditions applied together)
FROM sales[last_quarter, high_value]

-- Filter on a joined model
FROM sales
JOIN customers[enterprise] ON sales.customer_id = customers.customer_id
```

### Filters are not just convenience — they're consistency

Imagine "enterprise customer" means `annual_contract_value >= 50000 AND plan = 'enterprise'`. Without predefined filters, every analyst writes that condition slightly differently. Some forget the plan check. Some use different thresholds.

With `customers[enterprise]`, there's one definition. Every query, every agent, every analyst uses the same criteria.

### Auto-applied model filters

Some models have filters that apply automatically to every query, without you having to specify them. These are **guardrails** set by your data team.

For example, if the `transactions` model has an auto-filter for `status = 'completed'`, then `FROM transactions` always excludes incomplete transactions — even if you don't ask for it. You can't bypass them, and you shouldn't need to. They exist to prevent mistakes.

### Filters don't belong in SELECT

This is a syntax error:

```sql
-- WRONG: filters in SELECT
SELECT organizations[active_only].name FROM organizations

-- CORRECT: filters in FROM
SELECT name FROM organizations[active_only]
```

***

## Quick Reference

| Concept                         | Where to use                      | Example                           |
| ------------------------------- | --------------------------------- | --------------------------------- |
| Dimension                       | SELECT, WHERE, GROUP BY, ORDER BY | `region`, `created_at`, `name`    |
| Measure                         | SELECT, HAVING, ORDER BY          | `total_revenue`, `user_count`     |
| Predefined filter               | FROM, JOIN (in square brackets)   | `[active_only]`, `[last_quarter]` |
| Aggregate function on dimension | SELECT with GROUP BY              | `COUNT(order_id)`                 |
| Aggregate function on measure   | **Never** — already aggregated    | —                                 |
| Dimension filter                | WHERE                             | `WHERE region = 'EU'`             |
| Measure filter                  | HAVING                            | `HAVING total_revenue > 0`        |
