90 changes: 90 additions & 0 deletions asynclogparser/README.md
@@ -0,0 +1,90 @@
# Async Log Parser

A Go tool to parse binary log files generated by `asyncloguploader`, decode MPLog protobuf messages, fetch feature schemas from the inference API, and output human-readable logs.

## Features

- **Deframing**: Strips the 8-byte frame headers and padding from binary log files, then extracts the 4-byte length-prefixed records
- **Protobuf unmarshaling**: Parses MPLog protobuf messages from record bytes
- **Schema fetching**: Automatically fetches feature schemas from the inference API (Custodian)
- **Feature decoding**: Decodes proto-encoded entity features using the fetched schema
- **Human-readable output**: Writes parsed and decoded data to a readable log file

## Usage

```bash
go run ./cmd/asynclogparse <log-file-path> [output-file]
```

### Arguments

- `log-file-path`: Path to the `.log` file generated by `asyncloguploader`
- `output-file`: (Optional) Path to the output file. Defaults to `<log-file>.parsed.log`

### Environment Variables

- `INFERENCE_HOST`: Inference API host (default: `http://custodian.prd.meesho.int`)
- `INFERENCE_PATH`: Inference API path for schema endpoint (default: `/api/v1/custodian/mp-config-registry/get_feature_schema`)
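The defaulting behaviour described above could be sketched as follows. This is a minimal illustration, not the tool's actual code: the helper names `envOrDefault` and `joinEndpoint` are hypothetical, and treating an empty variable the same as an unset one is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Defaults taken from the list above.
const (
	defaultInferenceHost = "http://custodian.prd.meesho.int"
	defaultInferencePath = "/api/v1/custodian/mp-config-registry/get_feature_schema"
)

// envOrDefault returns the value of key, or fallback when the variable is
// unset or empty (assumption: empty is treated like unset).
func envOrDefault(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

// joinEndpoint concatenates host and path with exactly one slash between them.
func joinEndpoint(host, path string) string {
	return strings.TrimRight(host, "/") + "/" + strings.TrimLeft(path, "/")
}

func main() {
	host := envOrDefault("INFERENCE_HOST", defaultInferenceHost)
	path := envOrDefault("INFERENCE_PATH", defaultInferencePath)
	fmt.Println(joinEndpoint(host, path))
}
```

Any query parameters the schema endpoint expects are not covered here, since this README does not describe them.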

### Example

```bash
# Set environment variables (optional, defaults provided)
export INFERENCE_HOST="http://custodian.prd.meesho.int"
export INFERENCE_PATH="/api/v1/custodian/mp-config-registry/get_feature_schema"

# Parse a log file
go run ./cmd/asynclogparse search-ad-head-multitask-fieldaware-categorylevelscaleup_2026-02-03_17-10-42.log

# Or specify output file
go run ./cmd/asynclogparse input.log output.parsed.log
```

## Output Format

The output file contains human-readable text with the following structure:

```
=== Record 1 ===
User ID: 212079220
Tracking ID: abc
Model Config ID: search-ad-head-multitask-fieldaware-categorylevelscaleup
Version: 4
Format Type: 0
Entities: 300
Parent Entities: 8
Encoded Features: 300

--- Entity 1 ---
Entity ID: 60889110
Parent Entity: saree
Features:
catalog_id: 60889110
user_id: 212079220
pcvr_score: 0.000302
score: 0.000302

--- Entity 2 ---
...
```

## Architecture

### Log File Format

Each binary log file is a sequence of frames laid out as follows:
- **Frame Header** (8 bytes): `capacity` (uint32 LE) followed by `validDataBytes` (uint32 LE)
- **Frame Data**: A fixed-size buffer of `capacity` bytes, of which only `validDataBytes` bytes contain real data; the remainder is padding
- **Records**: Within the valid data, each record is a 4-byte length prefix (uint32 LE) followed by that many record bytes
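A reader for this layout might look like the sketch below. It rests on assumptions the format description does not settle: `capacity` is taken to count only the data bytes after the 8-byte header, the valid data is taken to sit at the start of the buffer, and a record that overruns the valid data is reported as an error. The function names `deframe` and `splitRecords` are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

const frameHeaderSize = 8 // capacity (uint32 LE) + validDataBytes (uint32 LE)

// splitRecords walks a frame's valid data and returns each length-prefixed record.
func splitRecords(data []byte) ([][]byte, error) {
	var out [][]byte
	for len(data) > 0 {
		if len(data) < 4 {
			return out, fmt.Errorf("truncated length prefix: %d bytes left", len(data))
		}
		n := binary.LittleEndian.Uint32(data[:4])
		data = data[4:]
		if uint64(n) > uint64(len(data)) {
			return out, fmt.Errorf("record length %d exceeds remaining %d bytes", n, len(data))
		}
		out = append(out, data[:n])
		data = data[n:]
	}
	return out, nil
}

// deframe reads frames until EOF and returns all records in order.
// Assumption: capacity excludes the header, and valid data comes first.
func deframe(r io.Reader) ([][]byte, error) {
	var records [][]byte
	header := make([]byte, frameHeaderSize)
	for {
		_, err := io.ReadFull(r, header)
		if errors.Is(err, io.EOF) {
			return records, nil // clean end of file between frames
		}
		if err != nil {
			return records, fmt.Errorf("read frame header: %w", err)
		}
		capacity := binary.LittleEndian.Uint32(header[0:4])
		valid := binary.LittleEndian.Uint32(header[4:8])
		if valid > capacity {
			return records, fmt.Errorf("validDataBytes %d exceeds capacity %d", valid, capacity)
		}
		frame := make([]byte, capacity)
		if _, err := io.ReadFull(r, frame); err != nil {
			return records, fmt.Errorf("read frame data: %w", err)
		}
		recs, err := splitRecords(frame[:valid]) // drop trailing padding
		if err != nil {
			return records, err
		}
		records = append(records, recs...)
	}
}

func main() {
	// Build one synthetic frame: two records plus padding up to capacity 16.
	payload := []byte{3, 0, 0, 0, 'a', 'b', 'c', 2, 0, 0, 0, 'h', 'i'}
	frame := make([]byte, 16)
	copy(frame, payload)
	header := make([]byte, frameHeaderSize)
	binary.LittleEndian.PutUint32(header[0:4], uint32(len(frame)))
	binary.LittleEndian.PutUint32(header[4:8], uint32(len(payload)))

	records, err := deframe(bytes.NewReader(append(header, frame...)))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, rec := range records {
		fmt.Printf("record %d: %q\n", i+1, rec)
	}
}
```

Allocating `capacity` bytes straight from the header keeps the sketch short; a real parser would likely cap that value so a corrupt header cannot trigger a huge allocation.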

### MPLog Protobuf Structure

- `user_id` (string)
- `tracking_id` (string)
- `model_proxy_config_id` (string)
- `version` (int32)
- `format_type` (int32): 0 = proto, 1 = arrow, 2 = parquet
- `entities` (repeated string)
- `parent_entity` (repeated string)
- `encoded_features` (repeated bytes): Per-entity encoded feature blobs
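For illustration, the fields above can be mirrored in a plain Go struct and rendered in the record-header layout shown under Output Format. The `mpLog` type here is a hand-written stand-in, not the generated protobuf type, and `formatTypeName`, `checkLengths`, and `writeSummary` are hypothetical helpers. The length check assumes "per-entity" means one encoded blob per entry in `entities`.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// mpLog is a hand-written stand-in for the generated MPLog message.
type mpLog struct {
	UserID             string   // user_id
	TrackingID         string   // tracking_id
	ModelProxyConfigID string   // model_proxy_config_id
	Version            int32    // version
	FormatType         int32    // format_type
	Entities           []string // entities
	ParentEntity       []string // parent_entity
	EncodedFeatures    [][]byte // encoded_features
}

// formatTypeName maps format_type to the names listed above.
func formatTypeName(t int32) string {
	switch t {
	case 0:
		return "proto"
	case 1:
		return "arrow"
	case 2:
		return "parquet"
	default:
		return fmt.Sprintf("unknown(%d)", t)
	}
}

// checkLengths verifies the assumed one-blob-per-entity invariant.
func checkLengths(m mpLog) error {
	if len(m.EncodedFeatures) != len(m.Entities) {
		return fmt.Errorf("%d encoded feature blobs for %d entities",
			len(m.EncodedFeatures), len(m.Entities))
	}
	return nil
}

// writeSummary prints the record header in the layout from Output Format.
func writeSummary(w io.Writer, index int, m mpLog) error {
	_, err := fmt.Fprintf(w,
		"=== Record %d ===\nUser ID: %s\nTracking ID: %s\nModel Config ID: %s\n"+
			"Version: %d\nFormat Type: %d\nEntities: %d\nParent Entities: %d\nEncoded Features: %d\n",
		index, m.UserID, m.TrackingID, m.ModelProxyConfigID,
		m.Version, m.FormatType, len(m.Entities), len(m.ParentEntity), len(m.EncodedFeatures))
	return err
}

func main() {
	m := mpLog{
		UserID:             "212079220",
		TrackingID:         "abc",
		ModelProxyConfigID: "search-ad-head-multitask-fieldaware-categorylevelscaleup",
		Version:            4,
		FormatType:         0,
		Entities:           []string{"60889110"},
		ParentEntity:       []string{"saree"},
		EncodedFeatures:    [][]byte{{0x01}}, // placeholder blob, not real feature data
	}
	if err := checkLengths(m); err != nil {
		fmt.Fprintln(os.Stderr, "invalid record:", err)
		os.Exit(1)
	}
	if err := writeSummary(os.Stdout, 1, m); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Println("Encoding:", formatTypeName(m.FormatType))
}
```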

Binary file added asynclogparser/asynclogparse
Binary file not shown.