260605 sync r61 to r62 #17496

Open
zmstone wants to merge 71 commits into emqx:release-62 from zmstone:260605-sync-r61-to-r62

zmstone (Member) commented Jun 5, 2026

No description provided.

zmstone and others added 30 commits May 31, 2026 10:20
The export and import paths already filter / reject API key callers
when an archive contains `dashboard_users` or `api_keys`. The stored
backup listing and download paths were unchecked, letting a
system-scope API key, dashboard viewer, or namespaced administrator
download an archive produced by a global administrator that contains
salted dashboard password hashes, MFA / TOTP state, API key records,
and cluster.hocon.

Guards added in this commit:

* `GET /data/files/:filename` now peeks the archive for sensitive
  table sets before returning the bytes. If sensitive content is
  found, only the global administrator may proceed; API keys,
  viewers, and namespaced administrators are rejected with 403.
* `GET /data/files` (listing) is now restricted to the global
  administrator. The listing is not useful to callers that cannot
  download anything sensitive, and restricting it avoids paying the
  per-archive peek cost on every list call.

The peek runs locally on the API-receiving node, so non-global-admin
callers must hit the node where the archive lives -- the same
locality constraint as the existing import-side guard, which
side-steps freezing a new BPAPI proto version just for the cross-node
peek.

Per review: drop the per-archive sensitive-table peek and gate the
download endpoint on the global administrator role alone. Listing
remains open to any caller that previously had access.

Backups can contain dashboard accounts and API-key records regardless
of who triggered the export, so the role gate is the correct
invariant; peeking only added cost and a remote-node locality
caveat. Viewers, namespaced administrators, and API-key callers can
still list (filenames / sizes / timestamps are not credential
material) but cannot download.
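The final rule reduces to a single role check on the download path. The following is a minimal Python sketch of that gate, not the EMQX implementation: the `Caller` type, its field values, and the function names are hypothetical, and the 403 status is assumed to carry over from the earlier version of the guard.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical caller model for illustration; EMQX represents actors differently.
@dataclass(frozen=True)
class Caller:
    kind: str                        # "dashboard_user" or "api_key"
    role: str                        # e.g. "administrator" or "viewer"
    namespace: Optional[str] = None  # None means a global (non-namespaced) account

def is_global_admin(caller: Caller) -> bool:
    """True only for a non-namespaced dashboard administrator."""
    return (
        caller.kind == "dashboard_user"
        and caller.role == "administrator"
        and caller.namespace is None
    )

def download_status(caller: Caller) -> int:
    """HTTP status for GET /data/files/:filename under the role gate."""
    return 200 if is_global_admin(caller) else 403
```

Listing needs no equivalent check in this sketch, because filenames, sizes, and timestamps stay visible to any caller that could list before.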
Kanidm (and any IDM that sends `Cache-Control: max-age=0' on its
`.well-known/openid-configuration' response) made the OIDC SSO worker
crash at startup with `{badmatch, {error, badarg}}'. The badmatch came
from oidcc 3.2.1, where `cache_deadline/2' returned the atom `true'
instead of a millisecond integer when `max-age=0' was present, and the
worker then hard-matched the `{error, badarg}' coming back from
`timer:send_after(true, _)'. The OIDC sub-supervisor was configured
with restart intensity `{0, 1}', so a single crash put SSO into a
permanently-dead state — re-saving the config did not reliably
recover it.

Three changes:

  * Bump the oidcc dep to `3.2.2', which carries the upstream fix
    in `cache_deadline/2' plus a defensive `safe_send_after/2' in
    `oidcc_provider_configuration_worker' so any future bad expiry
    routes through the existing backoff/retry path instead of
    crashing the gen_server. Tag `3.2.2' must be published on the
    `emqx/oidcc' fork; the dep itself is patched in-place in
    `deps/oidcc/' for local testing.

  * Bump `emqx_dashboard_sso_oidc_sup' restart intensity from
    `{0, 1}' to `{10, 60}'. The oidcc worker has its own random
    5-10 s backoff for soft failures; the sup budget now only burns
    when the worker actually crashes.

  * Add a CT regression case (`t_kanidm_cache_control_max_age_zero')
    that drives the worker with a Kanidm-shape discovery doc plus
    `Cache-Control: max-age=0, no-store' and asserts the SSO login
    redirect still works.

Fixes: emqx#17422
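The failure mode is a timer delay that is not a usable integer. A minimal Python sketch of the defensive idea follows; it is not the oidcc code, and the function names, the 15-minute default, and the 1-second floor are all assumptions made for illustration.

```python
import re
from typing import Optional

def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract max-age from a Cache-Control header value, if present."""
    match = re.search(r"(?:^|[,\s])max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None

def refresh_delay_ms(cache_control: str,
                     default_ms: int = 15 * 60 * 1000,  # assumed default
                     min_ms: int = 1000) -> int:        # assumed floor
    """Always return a positive integer delay in milliseconds.

    max-age=0 is clamped to the floor instead of producing a
    non-integer value that a timer call would reject.
    """
    age = max_age_seconds(cache_control)
    if age is None:
        return default_ms
    return max(age * 1000, min_ms)
```

With `Cache-Control: max-age=0, no-store`, the sketch schedules the next refresh after the floor delay rather than failing.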
…cope

fix(api): move topic metrics endpoints to monitoring scope
fix(sso): survive OIDC providers that send Cache-Control: max-age=0
…nload-leak

fix(backup): dashboard / api-key viewers cannot download backups
…dashobard-http-listener-by-default

ACME Plugin: add UI aid to reconfigure ssl/wss listeners with one click
…ion-no-conn

fix(prometheus): cope with inconsistent action/connector configs

validate_role_scope_compat/2 and role_default_scopes/1 used raw
string pattern matching against ?ROLE_SUPERUSER, which broke for
namespaced administrators (e.g. "ns:test::administrator"). Parse
the role via parse_dashboard_role/1 first to extract the base
role name before comparing.

fix(license): API serves monthly session HWM history symmetric with CLI

Cover namespaced administrator (ns:test::administrator) scope
assignment: POST with admin-only scopes, POST role defaults, PUT
description-only update, PUT with explicit admin scopes, and a
regression case ensuring viewer still cannot hold admin scopes.
…er-busy

test(mt): retry MQTT connect on transient server_busy

Limit namespaced administrators to connections, monitoring,
data_integration, access_control (common) and user_management,
api_key_management (login-only). RBAC already blocks ns admins
from system, license, and other endpoints — scope defaults and
validation must align.

Added macros for the ns-admin scope whitelist. Updated
role_default_scopes/1 and validate_role_scope_compat/2 to
recognize namespaced roles and enforce the restricted set.
Expanded tests from 43 to 45 covering allowed scopes,
rejection of system/mfa/sso, and description-only PUT.
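Parsing the role before comparing, then checking scopes against the whitelist, can be sketched in Python as below. This is an illustration only: the `ns:<namespace>::<role>` shape is inferred from the `ns:test::administrator` example, the scope strings are taken from the list above, and the function names other than `parse_dashboard_role` are hypothetical.

```python
from typing import Optional, Set, Tuple

# Scope names as listed in the commit message; the exact identifiers are assumed.
NS_ADMIN_SCOPES = frozenset({
    "connections", "monitoring", "data_integration", "access_control",
    "user_management", "api_key_management",
})

def parse_dashboard_role(role: str) -> Tuple[Optional[str], str]:
    """Split "ns:<namespace>::<role>" into (namespace, base_role).

    A plain role such as "administrator" yields (None, "administrator").
    """
    if role.startswith("ns:"):
        namespace, sep, base = role[len("ns:"):].partition("::")
        if not (sep and namespace and base):
            raise ValueError(f"malformed namespaced role: {role!r}")
        return namespace, base
    return None, role

def rejected_scopes(role: str, requested: Set[str]) -> Set[str]:
    """Scopes a namespaced administrator asked for but may not hold."""
    namespace, base = parse_dashboard_role(role)
    if namespace is not None and base == "administrator":
        return set(requested) - NS_ADMIN_SCOPES
    return set()  # other roles are outside this sketch
```

Comparing the raw string `"ns:test::administrator"` against the superuser role name never matches, which is the bug the parsing step avoids.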
…iod-r60

fix(license): API serves monthly session HWM history symmetric with CLI
hjianbo and others added 26 commits June 4, 2026 14:42
Co-authored-by: Andrew Maiorov <encube.ul@gmail.com>
…p-deps

fix(plugins): demote stop-deps warning to info
fix(dashboard): parse namespaced roles in scope validation

Documentation-only change. The `POST /api/v5/clients/:clientid/subscribe[/bulk]`
and `/unsubscribe[/bulk]` endpoints have always installed (or removed)
subscriptions directly on the target channel without invoking
`emqx_access_control:authorize/3`. This is intentional -- the REST API
key / dashboard role is the authorization boundary -- but it was not
documented.

- Expanded swagger `desc` for the four endpoint keys in
  `rel/i18n/emqx_mgmt_api_clients.hocon`.
- Added `-doc` attributes on `emqx_mgmt:subscribe/2`, `unsubscribe/2`,
  and `unsubscribe_batch/2`, plus an inline comment at the channel-pid
  send in `do_subscribe/2`.

No code behavior change.
perf(iotdb): reduce REST health check cost
…-bypass-acl

docs(api): force-subscribe/unsubscribe APIs bypass ACL

The avsc schema declared acc_key/acc_key_password as `["null", "string"]`
unions. The Avro JSON decoder requires union values to be tagged
(`{"string": "..."}`), but the dashboard sends bare strings, so saving
a `file://...` URI crashed with function_clause from
avro_json_decoder:parse_union/4. The fields are now plain "string" with
"" default; the parser already maps "" to "unset" and still handles
null defensively for upgraded configs.
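The tagging rule comes from Avro's JSON encoding, where a non-null union value is written as a single-key object naming its branch. A small Python sketch of a decoder for the `["null", "string"]` union shows why a bare string fails; the function name is hypothetical and this is not the `avro_json_decoder` code.

```python
def decode_nullable_string(value):
    """Decode a JSON value against the Avro union ["null", "string"].

    Avro's JSON encoding writes null as JSON null and any other branch
    as a one-key object tagged with the branch type.
    """
    if value is None:
        return None
    if (isinstance(value, dict) and len(value) == 1
            and isinstance(value.get("string"), str)):
        return value["string"]
    raise ValueError(f"untagged or invalid union value: {value!r}")
```

A plain `"string"` field has no such wrapper, so the bare value sent by the dashboard decodes directly.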

A malformed domain entry like `admin@example` was passed through to
acme-erlang-client where idna:to_ascii/1 raises exit:{bad_label, ...}
from inside a try block that only catches throw, killing the issuance
worker (issuance_worker_crashed in the logs). The config parser now
rejects entries with `@`, whitespace, `/`, `\`, or control chars at
save time, and the worker wraps run_action/1 in a catch-all so any
remaining exception lands in last_result as {error, {Class, Reason}}
and in_progress clears for the next attempt.
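Both halves of this fix, rejecting bad entries at save time and containing any remaining exception in the worker, can be sketched in Python. The names `validate_domain` and `run_guarded` and the dict-based worker state are hypothetical; only the rejected character set and the `last_result` / `in_progress` fields come from the description above.

```python
import re

# Characters rejected at save time: "@", whitespace, "/", "\", control chars.
_FORBIDDEN = re.compile(r"[@\s/\\\x00-\x1f\x7f]")

def validate_domain(entry: str) -> str:
    """Return the entry unchanged, or raise if it contains a rejected character."""
    if not entry or _FORBIDDEN.search(entry):
        raise ValueError(f"invalid domain entry: {entry!r}")
    return entry

def run_guarded(state: dict, action) -> dict:
    """Run one action; record any exception instead of letting it kill the worker."""
    state["in_progress"] = True
    try:
        state["last_result"] = ("ok", action())
    except Exception as exc:  # catch-all, mirroring the wrapper around run_action/1
        state["last_result"] = ("error", (type(exc).__name__, str(exc)))
    finally:
        state["in_progress"] = False  # always cleared so the next attempt can start
    return state
```

The save-time check is the first line of defense; the catch-all only matters for inputs the check does not anticipate.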
…nable

Two SSRF-policy bugs (emqx#17483, emqx#17484), fixed symmetrically:

1. The MQTT connector schema enforced `ssrf_check` via `servers_sc/2`,
   so HOCON re-validation of the whole `connectors.mqtt` subtree (which
   happens on every add/update/import) rejected unrelated new connectors
   whenever any sibling MQTT connector's `server` was denied. The check
   is moved to `emqx_connector_resource:parse_confs/3` so it runs once
   per connector being created/updated.

2. The connector enable/disable toggle fast-path in
   `emqx_connector_resource:update/5` skipped `parse_confs/3`, so a
   connector created when SSRF was permissive could be re-enabled after
   the policy tightened. `parse_confs/3` is now re-run before the
   restart on toggle-to-enable. Toggle-to-disable still skips
   validation so administrators can always take down a now-denied
   connector.
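The toggle rule is asymmetric: enabling re-validates, disabling never does. A minimal Python sketch under stated assumptions follows: `parse_confs` here only stands in for `emqx_connector_resource:parse_confs/3`, and `SsrfDenied`, `toggle`, the `is_allowed` callback, and the dict-shaped config are hypothetical.

```python
from typing import Callable, Dict

class SsrfDenied(Exception):
    """Raised when the current SSRF policy denies a connector's server."""

def parse_confs(conf: Dict, is_allowed: Callable[[str], bool]) -> Dict:
    """Validate one connector's config against the current policy."""
    if not is_allowed(conf["server"]):
        raise SsrfDenied(conf["server"])
    return conf

def toggle(conf: Dict, enable: bool, is_allowed: Callable[[str], bool]) -> Dict:
    """Enable or disable a single connector.

    Enabling re-runs validation so a tightened policy takes effect;
    disabling skips it so a now-denied connector can always be stopped.
    """
    if enable:
        parse_confs(conf, is_allowed)
    return {**conf, "enable": enable}
```

Because validation takes one connector's config at a time, a denied sibling connector cannot block an unrelated create or update.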
…hboard-config

fix(acme): unblock plugin config save from the dashboard
…t-scope

fix(connector): scope SSRF validation per-connector and re-check on enable
…kick

refactor(username_quota): cluster-fanout kick_username to reach all nodes
@zmstone zmstone requested review from a team and JimMoen as code owners June 5, 2026 08:06
@codecov-commenter

Codecov Report

❌ Patch coverage is 89.21569% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.07%. Comparing base (79cc678) to head (c93f8c7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| apps/emqx_utils/src/emqx_metrics_installer.erl | 63.63% | 4 Missing ⚠️ |
| .../emqx_management/src/emqx_mgmt_api_data_backup.erl | 78.57% | 3 Missing ⚠️ |
| ...pps/emqx_connector/src/emqx_connector_resource.erl | 85.71% | 1 Missing ⚠️ |
| apps/emqx_management/src/emqx_mgmt.erl | 50.00% | 1 Missing ⚠️ |
| apps/emqx_plugins/src/emqx_plugins_apps.erl | 0.00% | 1 Missing ⚠️ |
| ...s/emqx_rule_engine/src/emqx_rule_engine_schema.erl | 66.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           release-62   #17496   +/-   ##
===========================================
  Coverage       86.07%   86.07%           
===========================================
  Files            1262     1263    +1     
  Lines           91381    91403   +22     
===========================================
+ Hits            78653    78678   +25     
+ Misses          12728    12725    -3     


6 participants