1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -37,6 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Documentation
- Environment variable and authentication docs updated to use `COGSOL_API_KEY` and optional Azure AD B2C credentials.
- Removed outdated "no external dependencies" statements from README.
+- Added nested-topic ingestion examples and corrected ingest file paths to use topic-aligned `data/<topic-path>/` locations in docs.
- Clarified in README topic examples that `documentation` is only a sample topic name and not required.
- Retrieval-tool examples now instantiate retrieval definitions (e.g., `ProductDocsRetrieval()`) to avoid runtime confusion from class references.
- Setup guides now explicitly document creating and activating a local `.venv` before installing dependencies.
24 changes: 19 additions & 5 deletions README.md
@@ -163,7 +163,10 @@ python manage.py makemigrations data
python manage.py migrate data

# Ingest documents into a topic
-python manage.py ingest documentation ./docs/*.pdf
+python manage.py ingest documentation ./data/documentation/*.pdf
+
+# Ingest documents into a nested topic
+python manage.py ingest documentation/tutorials ./data/documentation/tutorials/*.pdf
```

---
@@ -297,6 +300,14 @@ python manage.py ingest <topic> <files...> [options]
- `topic`: Topic path (e.g., `documentation` or `parent/child/topic`)
- `files`: Files, directories, or glob patterns to ingest

+Use slash-separated paths for nested topics. For example, if you created `tutorials` under
+`documentation` with `starttopic tutorials --path documentation`, ingest into it with
+`documentation/tutorials`.
+
+For a topic-aligned workflow, place files under `data/<topic-path>/` and ingest from that
+folder (for example, `./data/documentation/*.pdf` or
+`./data/documentation/tutorials/*.pdf`).

**Options:**
- `--doc-type`: Document type (defaults to `Text Document`)
- `--ingestion-config`: Name of an ingestion config from `data/ingestion.py`
@@ -313,13 +324,16 @@ python manage.py ingest <topic> <files...> [options]
**Examples:**
```bash
# Ingest PDF files
-python manage.py ingest documentation ./docs/*.pdf
+python manage.py ingest documentation ./data/documentation/*.pdf
+
+# Ingest into a child topic
+python manage.py ingest documentation/tutorials ./data/documentation/tutorials/*.pdf

# Ingest with custom config
-python manage.py ingest documentation ./docs/ --ingestion-config HighQuality
+python manage.py ingest documentation ./data/documentation/ --ingestion-config HighQuality

# Dry run to preview
-python manage.py ingest documentation ./data/ --dry-run
+python manage.py ingest documentation ./data/documentation/ --dry-run
```

### `topics`
@@ -613,7 +627,7 @@ from cogsol.content import BaseIngestionConfig, PDFParsingMode, ChunkingMode
Use with the `ingest` command:

```bash
-python manage.py ingest documentation ./docs/ --ingestion-config high_quality
+python manage.py ingest documentation ./data/documentation/ --ingestion-config high_quality
```

#### Reference Formatters
18 changes: 13 additions & 5 deletions docs/commands.md
@@ -587,6 +587,11 @@ python manage.py ingest <topic> <files...> [options]
| `topic` | Yes | - | Topic path (e.g., `docs` or `parent/child`) |
| `files` | Yes | - | Files, directories, or glob patterns |

+Use slash-separated paths for nested topics during ingestion (for example:
+`documentation/tutorials`). For a topic-aligned workflow, place files under
+`data/<topic-path>/` and ingest from that matching path (for example:
+`./data/documentation/*.pdf` and `./data/documentation/tutorials/*.pdf`).
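
The topic-aligned convention described here can be sketched in Python. This is an illustrative helper only, not part of the project: `split_topic`, `topic_data_dir`, and `ingest_argv` are hypothetical names, and the `*.pdf` default pattern is an assumption. The sketch only derives the folder and argument list that mirror a slash-separated topic path; it does not run the command.

```python
from pathlib import PurePosixPath


def split_topic(topic: str) -> list[str]:
    """Split a slash-separated topic path into its non-empty segments."""
    parts = [segment for segment in topic.split("/") if segment]
    if not parts:
        raise ValueError("topic path must not be empty")
    return parts


def topic_data_dir(topic: str, root: str = "data") -> PurePosixPath:
    """Return the folder under `root` that mirrors the topic path (hypothetical helper)."""
    return PurePosixPath(root, *split_topic(topic))


def ingest_argv(topic: str, pattern: str = "*.pdf") -> list[str]:
    """Build the arguments for `python manage.py ingest <topic> <files...>`."""
    normalized_topic = "/".join(split_topic(topic))
    files = f"./{topic_data_dir(topic) / pattern}"
    return ["python", "manage.py", "ingest", normalized_topic, files]


if __name__ == "__main__":
    # Print the command line for a nested topic.
    print(" ".join(ingest_argv("documentation/tutorials")))
```

Deriving the file location from the topic path keeps the two from drifting apart when a topic is nested or renamed.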

#### Options

| Option | Default | Description |
Expand Down Expand Up @@ -628,27 +633,30 @@ class HighQualityConfig(BaseIngestionConfig):
Then use with:

```bash
-python manage.py ingest documentation ./docs/ --ingestion-config high_quality
+python manage.py ingest documentation ./data/documentation/ --ingestion-config high_quality
```

#### Example Usage

```bash
# Ingest all PDFs in a directory
-python manage.py ingest documentation ./docs/*.pdf
+python manage.py ingest documentation ./data/documentation/*.pdf
+
+# Ingest into a child topic using parent/child path
+python manage.py ingest documentation/tutorials ./data/documentation/tutorials/*.pdf

# Ingest an entire directory recursively
-python manage.py ingest documentation ./docs/
+python manage.py ingest documentation ./data/documentation/

# Use custom settings
-python manage.py ingest documentation ./reports/ \
+python manage.py ingest documentation ./data/documentation/reports/ \
--doc-type "Text Document" \
--pdf-mode ocr \
--chunking ingestor \
--max-size-block 2000

# Preview what would be ingested
-python manage.py ingest documentation ./docs/ --dry-run
+python manage.py ingest documentation ./data/documentation/ --dry-run
```

#### Output Messages
9 changes: 6 additions & 3 deletions docs/getting-started.md
@@ -623,14 +623,17 @@ python manage.py migrate data

### Step 8: Ingest Documents

-Upload documents to your topic:
+Upload documents to your topic. In this guide, examples place files under `data/<topic-path>/` so the file location mirrors the topic path:

```bash
# Ingest a directory of documents
-python manage.py ingest product_docs ./docs/
+python manage.py ingest product_docs ./data/product_docs/
+
+# Ingest into a nested child topic (parent/child path)
+python manage.py ingest product_docs/tutorials ./data/product_docs/tutorials/*.pdf

# Preview first (dry run)
-python manage.py ingest product_docs ./docs/ --dry-run
+python manage.py ingest product_docs ./data/product_docs/ --dry-run
```
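
Because the file location mirrors the topic path, the list of topics with documents waiting to be ingested can be derived from the folder tree. The following is a minimal sketch under that assumption; `topics_with_pdfs` is a hypothetical helper, not a project command, and it assumes documents are PDFs placed under a `data/` root. It only prints suggested `ingest --dry-run` commands and does not execute them.

```python
from pathlib import Path


def topics_with_pdfs(data_root: Path) -> list[str]:
    """List topic paths (relative to data_root) whose folder directly contains a PDF."""
    topics = set()
    for pdf in data_root.rglob("*.pdf"):
        relative_dir = pdf.parent.relative_to(data_root)
        if relative_dir.parts:  # skip PDFs sitting directly in data_root
            topics.add(relative_dir.as_posix())
    return sorted(topics)


if __name__ == "__main__":
    data_root = Path("data")  # assumed project layout
    if data_root.is_dir():
        for topic in topics_with_pdfs(data_root):
            # Suggest a dry run first, as in the examples in this guide.
            print(f"python manage.py ingest {topic} ./data/{topic}/*.pdf --dry-run")
```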

### Step 9: List Topics