
Refresh Knowledge Data

A knowledge is only as good as the data it has indexed. This page explains how to keep that data current, with scheduled refreshes and with manual operations, and what to do when things go wrong.

What happens during a refresh

For every source configured on the knowledge:

  1. The connector re-reads the source — the folder is scanned, the URL is fetched, the database is queried.
  2. New and changed content is detected and chunked.
  3. Chunks are embedded (sent to the knowledge's embedding model) and stored.
  4. Removed content is unindexed.

Refreshes are incremental where possible. Sources that have not changed since the last refresh are not re-embedded.
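
The incremental behaviour can be pictured with a short sketch. It is an illustration only, not the platform's implementation: it assumes changes are detected by comparing content hashes, and the names `fingerprint` and `plan_refresh` are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable hash of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Decide what a refresh has to do for one source.

    indexed: document id -> fingerprint stored at the last refresh
    current: document id -> content read from the source now
    """
    plan: dict[str, list[str]] = {"embed": [], "unindex": [], "skip": []}
    for doc_id, text in current.items():
        if indexed.get(doc_id) == fingerprint(text):
            plan["skip"].append(doc_id)    # unchanged: no embedding call
        else:
            plan["embed"].append(doc_id)   # new or changed: chunk and embed
    # Documents indexed earlier but no longer in the source are unindexed.
    plan["unindex"] = [doc_id for doc_id in indexed if doc_id not in current]
    return plan


indexed = {
    "a.md": fingerprint("alpha"),
    "b.md": fingerprint("beta"),
    "old.md": fingerprint("obsolete"),
}
current = {"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"}
print(plan_refresh(indexed, current))
# prints {'embed': ['b.md', 'c.md'], 'unindex': ['old.md'], 'skip': ['a.md']}
```

Only the documents under `embed` would reach the embedding model, which is why a refresh of an unchanged source is cheap.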

Refresh modes

Mode | Triggered by | When to use
Scheduled | The refresh policy on the knowledge (hourly, daily, weekly). | Default mode. Set it once and forget.
Manual | Clicking Refresh in the editor. | After adding a source, after editing a document you know the knowledge should reflect, or while testing.
From a workflow | A workflow step that triggers a refresh by knowledge ID. | When refresh should follow another event, for example "refresh after the nightly export finishes".

How long a refresh takes

A few rough numbers:

  • Small knowledge (a folder with a dozen documents): seconds.
  • Mid-size knowledge (a hundred documents or a couple of websites): a few minutes.
  • Large knowledge (thousands of documents): tens of minutes to hours. First-time indexing is the slowest; incremental refreshes are much faster.

The Jobs view shows progress and an estimated time for in-flight refreshes.

📷 SCREENSHOT: A refresh in progress in the Jobs view, with the percentage of processed chunks visible.

Manual refresh — step by step

  1. Open the knowledge in the editor.
  2. Click Refresh in the toolbar.
  3. The status badge changes to Refreshing. You can leave the page — the refresh runs in the background.
  4. Open the Jobs view to see the progress.
  5. When the refresh completes, the badge changes to Up to date and the Last refreshed timestamp updates.

Refresh from a workflow

Sometimes you want refresh to be event-driven instead of time-driven: a workflow that finalizes nightly data should refresh the knowledge that depends on it as the last step.

To do this from a workflow, use the integration that triggers a knowledge refresh (it accepts the knowledge ID and runs the refresh asynchronously).
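
The ordering matters more than the call itself, so here is a minimal sketch of the pattern rather than of the integration. `trigger_refresh` is a stand-in for the platform's refresh step, with a hypothetical signature (knowledge ID in, job ID out); the real integration may differ.

```python
from typing import Callable


def run_nightly_workflow(
    export_data: Callable[[], None],
    trigger_refresh: Callable[[str], str],
    knowledge_id: str,
) -> str:
    """Run the export, then request a refresh as the final step.

    trigger_refresh is assumed to return immediately with a job ID
    while the refresh itself runs in the background.
    """
    export_data()                          # must finish before the refresh is requested
    return trigger_refresh(knowledge_id)   # asynchronous: do not wait for completion


# Stand-ins so the sketch runs on its own.
events: list[str] = []


def fake_export() -> None:
    events.append("export finished")


def fake_trigger(knowledge_id: str) -> str:
    events.append(f"refresh requested for {knowledge_id}")
    return "job-1"


job_id = run_nightly_workflow(fake_export, fake_trigger, "kb-123")
print(job_id, events)
# prints job-1 ['export finished', 'refresh requested for kb-123']
```

If the export step raises an error, the refresh is never requested, so the knowledge keeps its previous content instead of indexing a half-written export.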

Looking at past refreshes

Each refresh is a job — the same concept used for workflows. Open the Jobs view from inside the knowledge editor to see:

  • Start and end time, duration.
  • A status — completed, failed, partial.
  • Number of chunks processed, added, removed.
  • The log of any sources that failed.

📷 SCREENSHOT: The knowledge Jobs view with one completed refresh, one in-progress refresh, and a sidebar showing the selected job's per-source breakdown.
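
If job history can be exported, the fields listed above are enough to spot sources that fail repeatedly. The sketch below models a job with a hypothetical `RefreshJob` record; the field names are assumptions made for illustration, not the platform's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class RefreshJob:
    started: datetime
    ended: datetime
    status: str                       # "completed", "failed", or "partial"
    chunks_processed: int = 0
    chunks_added: int = 0
    chunks_removed: int = 0
    failed_sources: list[str] = field(default_factory=list)

    @property
    def duration(self) -> timedelta:
        return self.ended - self.started


def sources_needing_attention(jobs: list[RefreshJob]) -> dict[str, int]:
    """Count how often each source appears in the failure log of non-completed jobs."""
    counts: dict[str, int] = {}
    for job in jobs:
        if job.status == "completed":
            continue
        for source in job.failed_sources:
            counts[source] = counts.get(source, 0) + 1
    return counts


jobs = [
    RefreshJob(datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 2, 4), "completed", 120, 5, 1),
    RefreshJob(datetime(2024, 1, 2, 2, 0), datetime(2024, 1, 2, 2, 3), "partial", 90, 0, 0, ["wiki-url"]),
    RefreshJob(datetime(2024, 1, 3, 2, 0), datetime(2024, 1, 3, 2, 1), "failed", 0, 0, 0, ["wiki-url", "sales-db"]),
]
print(sources_needing_attention(jobs))
# prints {'wiki-url': 2, 'sales-db': 1}
```

A source that shows up in several consecutive jobs is more likely a persistent problem than a transient one.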

When a refresh fails

Common causes:

  • Source unreachable. A URL is down, a folder is no longer mounted, a database is unavailable. The refresh log identifies which source failed; other sources continue.
  • Authentication. Credentials in the source configuration have changed or expired.
  • Embedding model unavailable. The connected model's provider is down or rate-limited.
  • Content too big. A single source contains a document the platform cannot process. The log identifies which file.

If the cause is transient (network blip, rate limit), the next scheduled refresh recovers automatically. For persistent issues, correct the source configuration or the embedding model settings, then run a manual refresh to confirm the fix.

Cost considerations

Every embedded chunk costs one call to the embedding model. Frequent refreshes on large knowledges add up.

A few tips to keep cost reasonable:

  • Use the longest sensible refresh interval. Hourly is rarely needed; daily is enough for most content.
  • Split content between knowledges that change often and knowledges that change rarely. Refresh policies are per-knowledge.
  • Watch Metrics → Consumption — embeddings appear as their own line item.
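
As a rough worked example of how the interval drives cost, the sketch below counts embedding calls at one call per embedded chunk. It assumes the worst case, where the same chunks are embedded again on every refresh (for instance a source that cannot be refreshed incrementally); the figure of 5,000 chunks is made up for illustration.

```python
def embedding_calls(chunks_per_refresh: int, refreshes_per_day: int, days: int = 30) -> int:
    """Estimate embedding calls over a period, at one call per embedded chunk."""
    return chunks_per_refresh * refreshes_per_day * days


CHUNKS = 5_000  # hypothetical number of chunks embedded on each refresh
hourly = embedding_calls(CHUNKS, refreshes_per_day=24)
daily = embedding_calls(CHUNKS, refreshes_per_day=1)
print(f"hourly: {hourly:,} calls, daily: {daily:,} calls")
# prints "hourly: 3,600,000 calls, daily: 150,000 calls"
```

For sources that refresh incrementally the gap is smaller, because unchanged content is skipped, but content that is edited several times a day is still embedded once per refresh.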

Recommendations

  • ✅ Set refresh policies that match how often the source data actually changes, not how often you wish it did.
  • ✅ Run a manual refresh after editing the configuration. Scheduled refreshes do not always pick up new sources immediately.
  • ✅ Inspect the Jobs view at least once a week. Quietly failing sources are a common cause of "the AI doesn't know about X".
  • ⚠️ Refreshes can overrun their interval. If a refresh takes longer than the interval (say a daily refresh takes 25 hours), the next one is queued until the current one finishes. This is harmless, but the back-to-back runs are wasted effort, so increase the interval.
  • ⚠️ Removing a source removes its content from the index. If the same content was added via a second source, it stays. Surprising at first, correct on reflection.
  • ❌ Do not trigger refreshes from very short cron schedules ("every minute"). Embedding model providers rate-limit requests, and queued refreshes will pile up.
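
The two queueing warnings above can be checked with simple arithmetic. The sketch below is a rough model, not a description of the platform's scheduler: it assumes every trigger is queued, refreshes run one at a time, and each refresh takes the same number of minutes.

```python
def queued_after(window_minutes: int, interval_minutes: int, refresh_minutes: int) -> int:
    """Refreshes triggered but not yet completed at the end of the window."""
    triggered = window_minutes // interval_minutes
    completed = min(triggered, window_minutes // refresh_minutes)
    return triggered - completed


# "Every minute" schedule, 5-minute refresh, observed for one hour.
print(queued_after(window_minutes=60, interval_minutes=1, refresh_minutes=5))
# prints 48

# Daily schedule, 25-hour refresh, observed for one week.
print(queued_after(window_minutes=7 * 1440, interval_minutes=1440, refresh_minutes=1500))
# prints 1
```

Whenever the refresh duration exceeds the interval the backlog grows without bound, so the only stable settings are those where a refresh comfortably finishes before the next one is due.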

What to do next