# Pave Mining

## Overview
The Pave mining pipeline uploads a user’s Plaid transactions into the Pave third-party analytics platform and retrieves the financial insights Pave computes from them. The pipeline is split across four Lambdas:
| Lambda | Trigger | Role |
|---|---|---|
| `prod-insight-feeder` | Kinesis (`prod-txn-plaid-transactions`) | Uploads transactions and balances to Pave; tracks per-user pagination state |
| `prod-insight-webhook` | API Gateway (Pave webhook, HMAC-verified) | Receives Pave callbacks; enqueues mining jobs on `prod-insight-miner` |
| `prod-insight-miner` | SQS (`prod-insight-miner`) | Fetches processed insights from Pave; writes to DynamoDB; emits `user_new_insights_available` |
| `prod-insight-replay-feeder` | SQS (`prod-insight-replay-feeder`) | Manual replay: refetches Pave data with custom parameters; no downstream events |
The feeder and miner are intentionally decoupled. The feeder pushes data to Pave asynchronously; Pave does its own processing and signals completion via a webhook callback. The webhook Lambda bridges these two phases by enqueuing a mining job.
## Feeder

Lambda: `prod-insight-feeder`
Trigger: Kinesis stream `prod-txn-plaid-transactions` (`LATEST` position, batched)
The feeder runs on every Plaid transaction event from the Transactions Service. Its job is to upload the user’s current transactions and balances to Pave and record that the upload happened.
### Pagination State

Pave returns insights in pages. The feeder tracks two DynamoDB entities per user:

| Entity | Purpose |
|---|---|
| `pave_label` | Marks whether the user's Plaid data has ever been labeled in Pave. Written once on first upload; used to avoid re-labeling. |
| `last_page` | Records the latest pagination request ID for a user. The miner checks this entity to decide whether all pages for a given insight run have been processed before emitting the `user_new_insights_available` event. |
### Upload Logic

For each Kinesis record, the feeder:

1. Extracts the user ID and transaction batch from the event.
2. Checks and sets the `pave_label` entity if this is the user's first upload.
3. Uploads the transaction batch and account balances to the Pave API in paginated requests.
4. Saves the `last_page` entity with the latest request ID and page metadata.
GrowthBook feature flags control the transaction batch size and which Pave upload method is used (configurable per-environment).
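As a minimal sketch of the per-record flow above — the event shape, function names, and a plain dict standing in for the DynamoDB entities are all illustrative, not the real implementation:

```python
import json

# In-memory stand-in for the DynamoDB entities, keyed by (entity, user_id).
db = {}

def upload_batch_to_pave(user_id, transactions):
    """Placeholder for the paginated Pave upload; returns a request ID."""
    return f"req-{user_id}-{len(transactions)}"

def handle_kinesis_record(record):
    # 1. Extract the user ID and transaction batch from the event.
    event = json.loads(record["data"])
    user_id = event["user_id"]
    transactions = event["transactions"]

    # 2. Check-and-set the pave_label entity so the user's data is only
    #    labeled in Pave once, on the first upload.
    if ("pave_label", user_id) not in db:
        db[("pave_label", user_id)] = {"labeled": True}

    # 3. Upload the batch and balances to Pave.
    request_id = upload_batch_to_pave(user_id, transactions)

    # 4. Record the latest pagination request ID; the miner later correlates
    #    against this entity before emitting the completion event.
    db[("last_page", user_id)] = {"request_id": request_id}
    return request_id
```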
## Webhook

Lambda: `prod-insight-webhook`
Trigger: API Gateway (HMAC signature-verified endpoint)
The webhook Lambda receives all callbacks Pave sends to the FloatMe webhook URL. Each request is validated via HMAC signature before any processing occurs; requests with invalid signatures are rejected with a 401.
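The verification step can be sketched with the standard library; the HMAC-SHA256/hex-digest scheme and header contents are assumptions about Pave's webhook contract, not confirmed details:

```python
import hashlib
import hmac

def verify_pave_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare the received signature against a locally computed
    HMAC-SHA256 of the raw request body.

    hmac.compare_digest is used for constant-time comparison, which
    avoids leaking signature prefixes through timing differences.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

A handler would call this before any parsing and return a 401 when it is false, matching the behaviour described above.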
### Supported Event Types

| Pave Event Type | Behaviour |
|---|---|
| `USER_DATA_INSIGHTS_READY` | Insights are ready for this user. Enqueues a mining job (containing `user_id` and `request_id`) on the miner SQS queue. |
| … | Acknowledged and logged. No downstream action taken. |
| … | Acknowledged and logged. No downstream action taken. |
| Any other type | Logged as a warning. No downstream action taken. |
Non-200 status fields in the Pave webhook payload (indicating Pave-side processing errors) are also logged as warnings without enqueuing a mining job.
### Miner SQS Message Shape

The SQS message body enqueued for each `USER_DATA_INSIGHTS_READY` event:

```json
{
  "user_id": "<floatme_user_id>",
  "request_id": "<pave_request_id_from_webhook_metadata>"
}
```
The `request_id` is carried through from the Pave webhook metadata and used by the miner to correlate with the `last_page` entity in DynamoDB when deciding whether to emit the `user_new_insights_available` event.
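The two sides of this queue contract can be sketched as a serializer/parser pair (helper names are hypothetical):

```python
import json

def build_mining_job(user_id: str, request_id: str) -> str:
    """Serialize the SQS message body the webhook enqueues for a
    USER_DATA_INSIGHTS_READY callback."""
    return json.dumps({"user_id": user_id, "request_id": request_id})

def parse_mining_job(body: str) -> tuple:
    """Parse the body on the miner side. A malformed body raises here,
    which the miner reports as a batch item failure."""
    msg = json.loads(body)
    return msg["user_id"], msg["request_id"]
```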
## Miner

Lambda: `prod-insight-miner`
Trigger: SQS (`prod-insight-miner`, batch processing)
The miner dequeues mining jobs and fetches the full set of processed insights from Pave for the given user. It writes all entity types to DynamoDB, then conditionally emits an EventBridge event to notify downstream services that fresh insights are available.
### What Is Fetched and Saved

For each SQS message the miner:

1. Gets unified insights from Pave, which returns recurring expenses, ritual expenses, and income sources in a single call.
2. Writes to DynamoDB:
   - `recurring` — recurring expense records (merchant, due date, amount, bill type fixed/variable, transactions)
   - `ritual` — ritual expense records (normalized merchant, frequency, average amount, transactions)
   - `income` — income source records (recurring income sources, total recurring income)
3. Gets the cash-advance score from Pave: repayment probability scores across 15d, 30d, and 45d intervals.
4. Writes `scores` to DynamoDB.
5. Saves a bank account balance snapshot (`balance` entity) by querying the Transactions Service for the user's current checking and savings accounts.
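A sketch of that per-message sequence, with `pave`, `store`, and `txn_service` as hypothetical client interfaces — the point is the split between required fetches (exceptions propagate to the batch failure handling) and the best-effort balance snapshot:

```python
import logging

logger = logging.getLogger("insight-miner")

def process_mining_job(user_id, request_id, pave, store, txn_service):
    """Fetch insights and scores for one user and persist them."""
    # Required: a failure here propagates and marks the message failed.
    insights = pave.get_unified_insights(user_id)
    store.write(user_id, "recurring", insights["recurring"])
    store.write(user_id, "ritual", insights["ritual"])
    store.write(user_id, "income", insights["income"])

    # Required: cash-advance repayment probability scores.
    scores = pave.get_cash_advance_score(user_id)
    store.write(user_id, "scores", scores)

    # Best-effort: a failed balance snapshot is logged, not retried.
    try:
        balance = txn_service.get_balances(user_id)
        store.write(user_id, "balance", balance)
    except Exception:
        logger.exception("balance snapshot failed for %s", user_id)
```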
### `user_new_insights_available` Emission

The miner only emits the `user_new_insights_available` EventBridge event when it determines it has processed the last page of a user's insight run. This avoids notifying downstream services (e.g., the Float Service) multiple times per insight cycle when Pave returns results across multiple pages.

The check is done by reading the `last_page` DynamoDB entity for the `{user_id, request_id}` pair. If `is_last_page = true`, the event is emitted.
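The emission guard reduces to a small predicate; `last_page_item` stands in for the `last_page` DynamoDB item, and any field names beyond `request_id` and `is_last_page` are assumptions:

```python
def should_emit_insights_event(last_page_item, request_id) -> bool:
    """Emit user_new_insights_available only when the stored last_page
    entity matches this run's request_id and marks the final page."""
    if last_page_item is None:
        return False
    return (last_page_item.get("request_id") == request_id
            and last_page_item.get("is_last_page") is True)
```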
| EventBridge Field | Value |
|---|---|
| Bus | … |
| Source | … |
| Detail-Type | … |
| Payload | … |
Failures to emit the event are logged as errors but do not cause the SQS record to be retried — this prevents insight data from being rewritten unnecessarily when only the event emission failed.
### Error Handling

The miner processes SQS messages in batch mode and returns per-message failure responses (`SQSBatchItemFailure`). A message is added to the failure list if any of the following occur: the SQS message body cannot be unmarshalled/parsed, the unified insight fetch fails, or the cash-advance score fetch fails. The balance snapshot and event emission are best-effort; failures are logged but do not add to the batch failure list.

Failed messages are retried up to the queue's configured maximum receive count before being routed to the dead-letter queue (`prod-insight-miner-dlq`).
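The response the Lambda returns follows the standard SQS partial-batch-failure shape (`ReportBatchItemFailures`): every `messageId` listed under `batchItemFailures` goes back to the queue for retry, and all others are deleted.

```python
def batch_response(failed_message_ids):
    """Build the partial-batch response Lambda returns to SQS.
    Each failed messageId becomes an itemIdentifier entry."""
    return {
        "batchItemFailures": [
            {"itemIdentifier": mid} for mid in failed_message_ids
        ]
    }
```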
## Replay Feeder

Lambda: `prod-insight-replay-feeder`
Trigger: SQS (`prod-insight-replay-feeder`)
The replay feeder is a manual operations tool. It accepts SQS messages that specify a user ID plus optional date or parameter overrides, then calls the Pave API to refetch data using those parameters and updates the DynamoDB `last_page` state accordingly.

The replay feeder does not enqueue to the miner SQS and does not emit any EventBridge events. It is used to re-synchronise pagination state or trigger a fresh Pave data pull without waiting for new Plaid transaction events.
```
Replay SQS message
  └─▶ prod-insight-replay-feeder
        ├─▶ Pave API (refetch with custom params)
        └─▶ DynamoDB (update last_page state)
```
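A replay message could be built as below; only "user ID plus optional overrides" is stated by this page, so the `start_date`/`end_date` field names and the pass-through of extra overrides are hypothetical:

```python
import json
from datetime import date

def build_replay_message(user_id, start_date=None, end_date=None, **overrides):
    """Serialize an SQS body for the replay feeder.

    start_date/end_date and any extra keyword overrides are illustrative
    field names, not a confirmed schema.
    """
    body = {"user_id": user_id}
    if start_date:
        body["start_date"] = start_date.isoformat()
    if end_date:
        body["end_date"] = end_date.isoformat()
    body.update(overrides)
    return json.dumps(body)
```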
## Related Pages

- Architecture — System context and full Lambda inventory
- Event Flows — `user_new_insights_available` event routing and downstream consumers
- DynamoDB Tables — `recurring`, `ritual`, `income`, `scores`, `pave_label`, `last_page` entity schemas