Add NLP synthetic data generation notebook #98

Open
catalystshakya wants to merge 2 commits into InnovAIte-Deakin:main from catalystshakya:feature/nlp-synthetic-data
@catalystshakya
Collaborator

@catalystshakya catalystshakya commented Apr 28, 2026

Summary
This script generates a labelled synthetic dataset of social-media-style posts about bushfire events in Victoria.

Each post is augmented with realistic social-media metadata (timestamp, likes, retweets, platform, hashtags, location).
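
For reviewers who want a quick picture of the record shape, here is a minimal sketch of how one labelled post with the metadata fields listed above could be built. The templates, place names, hashtags, value ranges, and the `make_post` helper are illustrative assumptions, not the notebook's actual contents.

```python
import random
from datetime import datetime, timedelta

# Illustrative placeholder values only; the notebook's real lists may differ.
TEMPLATES = {
    "fire": [
        "Smoke is getting thick near {loc}, stay safe everyone.",
        "Evacuation warning issued for {loc}.",
    ],
    "non_fire": [
        "Great coffee this morning in {loc}.",
        "Traffic is slow around {loc} today.",
    ],
}
LOCATIONS = ["Ballarat", "Bendigo", "Gippsland"]
PLATFORMS = ["twitter", "facebook"]
HASHTAGS = {
    "fire": ["#bushfire", "#VicFires"],
    "non_fire": ["#coffee", "#traffic"],
}


def make_post(rng, label, start):
    """Build one synthetic post dict for the given label (hypothetical helper)."""
    loc = rng.choice(LOCATIONS)
    # Spread timestamps over one week after `start`.
    offset = timedelta(minutes=rng.randint(0, 7 * 24 * 60))
    return {
        "text": rng.choice(TEMPLATES[label]).format(loc=loc),
        "label": label,
        "timestamp": (start + offset).isoformat(),
        "likes": rng.randint(0, 500),
        "retweets": rng.randint(0, 200),
        "platform": rng.choice(PLATFORMS),
        "hashtags": rng.sample(HASHTAGS[label], k=rng.randint(1, 2)),
        "location": loc,
    }


# A seeded generator keeps the dataset reproducible between runs.
rng = random.Random(42)
posts = [
    make_post(rng, rng.choice(["fire", "non_fire"]), datetime(2026, 1, 1))
    for _ in range(5)
]
```

Seeding `random.Random` makes it easier to compare clustering runs on the same data.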

Planner task: Synthetic post data generation script for narrative clustering
Link:

Type

  • script

Changes (Week 9):

  • Post IDs shortened
  • Non-fire-related posts added
  • JSON output script added
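
The first and third changes above can be sketched roughly as follows. This is only one possible approach: the PR does not say how the IDs are shortened, so truncating a UUID to 8 hex characters, the `post_id` field name, and the output filename are all assumptions made for illustration.

```python
import json
import uuid


def short_id(length=8):
    """Return the first `length` hex characters of a random UUID (assumed scheme)."""
    return uuid.uuid4().hex[:length]


def assign_ids(posts, length=8):
    """Attach a unique short ID to each post, retrying on the rare collision."""
    seen = set()
    result = []
    for post in posts:
        pid = short_id(length)
        while pid in seen:
            pid = short_id(length)
        seen.add(pid)
        result.append({"post_id": pid, **post})
    return result


def write_json(posts, path):
    """Write the posts as a pretty-printed UTF-8 JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(posts, f, indent=2, ensure_ascii=False)


# Hypothetical sample records, one fire-related and one not.
sample_posts = [
    {"text": "Evacuation warning issued for Bendigo.", "label": "fire"},
    {"text": "Great coffee this morning in Ballarat.", "label": "non_fire"},
]
records = assign_ids(sample_posts)

if __name__ == "__main__":
    write_json(records, "synthetic_posts.json")
```

Shorter IDs raise the collision probability, which is why the sketch checks each new ID against the ones already issued.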

Reasons:

Collaborator

@vuhoangnamdoan vuhoangnamdoan left a comment

Hi Samadhi, as we discussed in the quick team meeting, your work looks quite clean for synthetic data generation. Consider using LLM APIs to generate more random data later, but for now it looks good overall.

@catalystshakya
Collaborator Author

@vuhoangnamdoan The changes we discussed, such as shortening the post IDs and adding non-fire-related posts, have been made.

2 participants