Autonomous daily LinkedIn content pipeline for Arjun Acharya — Port Automation & AI Engineer at Prosertek (Bilbao, Spain). The system researches maritime and industrial automation news, avoids near-duplicate posts via vector memory, generates PhD-level long-form posts with Gemini 1.5 Pro, and publishes via the official LinkedIn REST API v2 (/v2/ugcPosts) using an OAuth 2.0 access token (w_member_social), unless dry-run mode is enabled.

    ┌─────────────────────┐     ┌──────────────────┐     ┌─────────────────┐
    │  post_calendar.csv  │────▶│   Orchestrator   │────▶│  PostGenerator  │
    │ (editorial calendar)│     │ (daily pipeline) │     │    (Gemini)     │
    └─────────────────────┘     └────────┬─────────┘     └────────▲────────┘
                                         │                        │
    ┌─────────────────────┐              │               ┌────────┴─────────┐
    │  MarketResearcher   │──────────────┼──────────────▶│  Prompt w/ news  │
    │ (RSS + site scrape) │              │               └──────────────────┘
    └─────────────────────┘              │
                                         ▼
                                ┌─────────────────┐
                                │    RAGEngine    │
                                │   (ChromaDB)    │
                                │ dedupe + recall │
                                └────────┬────────┘
                                         │
                                         ▼
                                ┌─────────────────┐
                                │LinkedInPublisher│
                                │   (OAuth /v2)   │
                                └────────┬────────┘
                                         │
                                         ▼
                                ┌─────────────────┐
                                │  PostScheduler  │
                                │  (APScheduler)  │
                                └─────────────────┘

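
The dedupe step in the RAGEngine box can be pictured with a small stdlib-only sketch. It is not the project's implementation: the real pipeline stores embeddings in ChromaDB, whereas this example swaps in a toy bag-of-words vector and plain cosine similarity. The names `embed`, `cosine`, and `is_near_duplicate` and the `0.8` threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_near_duplicate(candidate: str, history: list[str], threshold: float = 0.8) -> bool:
    """Return True if the candidate is too similar to any earlier post.

    The threshold is a hypothetical value; a real setup would tune it
    against the embedding model actually in use.
    """
    candidate_vec = embed(candidate)
    return any(cosine(candidate_vec, embed(old)) >= threshold for old in history)
```

With a history containing one post about automated mooring, a lightly reworded version of that post is flagged, while a post on an unrelated topic passes through.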
Create a virtual environment and install the dependencies:

    python -m venv .venv
    .venv\Scripts\activate          # Windows
    # source .venv/bin/activate     # Linux / macOS
    pip install -r requirements.txt

Copy `.env.example` to `.env` and fill in API keys and credentials.
The .env file is gitignored and will not be committed or pushed; only .env.example (placeholders) belongs in the repository. If you ever ran git add .env by mistake, run git rm --cached .env and keep secrets only locally.

| Variable | Purpose |
|---|---|
| `GEMINI_API_KEY` | Google AI Studio key for Gemini |
| `LINKEDIN_ACCESS_TOKEN` | OAuth 2.0 access token with `w_member_social` (UGC Posts) |
| `LINKEDIN_CLIENT_ID` / `LINKEDIN_CLIENT_SECRET` | Only needed to refresh the token locally (see script below) |
| `LINKEDIN_EMAIL` / `LINKEDIN_PASSWORD` | Optional legacy placeholders (not used for publishing) |
| `CHROMA_PERSIST_DIR` | Persistent ChromaDB directory |
| `POST_CALENDAR_PATH` | CSV calendar (`data/post_calendar.csv`) |
| `SCHEDULE_HOUR` / `SCHEDULE_MINUTE` | Hour and minute of the daily scheduled run, in local time |
| `TIMEZONE` | IANA zone (default `Europe/Madrid`) |
| `CALENDAR_SEQUENCE_START` | Optional `YYYY-MM-DD` date. When set, this date maps to row 1 of the CSV and each following day uses the next row, wrapping after row N (N = number of CSV rows). When unset, the row is derived from day-of-year mod N, not from the number of runs. In GitHub Actions, set the repo variable `CALENDAR_SEQUENCE_START`. |
| `DRY_RUN` | `true` to skip publishing |

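
The row-selection rule described for `CALENDAR_SEQUENCE_START` can be sketched as follows. This is a minimal illustration, not the project's code: the function name `calendar_row` is hypothetical, rows are assumed to be numbered from 1, and the day-of-year fallback is assumed to map 1 January to row 1. Dates before the start date wrap backwards through Python's modulo, which is also an assumption.

```python
from datetime import date
from typing import Optional


def calendar_row(today: date, n_rows: int, sequence_start: Optional[date] = None) -> int:
    """Return the 1-based CSV row to use for `today`.

    With a start date, row 1 is the start date and each later day advances
    one row, wrapping after `n_rows`. Without one, the row follows the
    day of the year (assumed here: 1 January -> row 1).
    """
    if n_rows <= 0:
        raise ValueError("calendar must contain at least one row")
    if sequence_start is not None:
        offset = (today - sequence_start).days
    else:
        offset = today.timetuple().tm_yday - 1
    return offset % n_rows + 1


# Start date set: two days after the start date is row 3.
print(calendar_row(date(2025, 3, 3), 100, date(2025, 3, 1)))  # 3
# Start date unset: day 101 of the year wraps back to row 1 of 100.
print(calendar_row(date(2025, 4, 11), 100))  # 1
```

Because the row depends only on the date, re-running the pipeline on the same day selects the same row, which matches the "not run count" note in the table.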
With LINKEDIN_ACCESS_TOKEN set in .env, verify the token against LinkedIn (GET /v2/userinfo, then GET /v2/me if needed):

    python main.py --test-linkedin

This does not publish anything. The access token must include OpenID scopes for `userinfo` (`openid`, `profile`, …) and `w_member_social` for posting. If you only authorized `w_member_social`, `userinfo` returns 401; in that case, re-run the local OAuth helper below.

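
The two-step check (try `/v2/userinfo`, fall back to `/v2/me`) might look roughly like the sketch below. It is an assumption about how `--test-linkedin` could work, not a copy of it: `http_status` and `check_token` are hypothetical names, and the status lookup is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

API_BASE = "https://api.linkedin.com"


def http_status(url: str, token: str) -> int:
    """GET `url` with a bearer token and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError; the code is what we want here.
        return exc.code


def check_token(token: str, get_status: Callable[[str, str], int] = http_status) -> Optional[str]:
    """Return the first endpoint path that accepts the token, or None."""
    for path in ("/v2/userinfo", "/v2/me"):
        if get_status(API_BASE + path, token) == 200:
            return path
    return None
```

Neither request creates a post; both are read-only `GET` calls, which is why the check is safe to run against a production token.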
- In the Developer Portal, open your app → Products and ensure Sign In with LinkedIn using OpenID Connect and Share on LinkedIn are added.
- Under Auth, add this Authorized redirect URL (exactly): `http://127.0.0.1:8765/oauth/callback`
- Put `LINKEDIN_CLIENT_ID` and `LINKEDIN_CLIENT_SECRET` in `.env`.
- Run: `python scripts/linkedin_oauth_local.py`

Sign in when the browser opens, click Allow, and the script will exchange the code and update `LINKEDIN_ACCESS_TOKEN` in `.env`.
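
For orientation, the first and last steps of that browser flow can be sketched with the standard library alone. This is not the contents of `scripts/linkedin_oauth_local.py`: the helper names `build_authorize_url` and `extract_code` are hypothetical, and the scope list below contains only the scopes named in this README (the real script may request more).

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://www.linkedin.com/oauth/v2/authorization"
REDIRECT_URI = "http://127.0.0.1:8765/oauth/callback"
# Assumption: only the scopes mentioned in this README.
SCOPES = ("openid", "profile", "w_member_social")


def build_authorize_url(client_id: str, state: str,
                        redirect_uri: str = REDIRECT_URI,
                        scopes: tuple = SCOPES) -> str:
    """Build the URL the browser opens to start the authorization-code flow."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}"


def extract_code(callback_url: str, expected_state: str) -> str:
    """Pull the authorization code out of the redirect, checking `state`."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; possible CSRF or stale callback")
    return params["code"][0]
```

The redirect URI sent in the request has to match the one registered under Auth character for character, which is why the README asks for it to be entered exactly.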

    # One-off run for "today's" slot (1–100 cycle from day-of-year)
    python main.py --run-now

    # Specific calendar day (1–N)
    python main.py --day 42

    # Daemon scheduler (cron at configured local time)
    python main.py --schedule

    # No LinkedIn post, log only
    python main.py --run-now --dry-run

Project layout:

    professional_linkedin/
    ├── main.py
    ├── requirements.txt
    ├── .env.example
    ├── data/
    │   └── post_calendar.csv
    ├── src/
    │   ├── config.py
    │   ├── post_generator.py
    │   ├── market_researcher.py
    │   ├── rag_engine.py
    │   ├── linkedin_publisher.py
    │   ├── scheduler.py
    │   └── orchestrator.py
    ├── scripts/
    │   └── linkedin_oauth_local.py
    └── tests/
        ├── test_post_generator.py
        └── test_full_pipeline.py

- Python 3.10+
- Gemini AI (`google-generativeai`) for long-form posts
- ChromaDB for embeddings, post history, and market insight recall
- APScheduler with cron triggers and timezone support
- LinkedIn REST API v2 (`ugcPosts`, OAuth2 bearer token) via `requests`
- BeautifulSoup + `requests` for lightweight research scraping and Google News RSS
- Azure themes appear in prompts and calendar topics (cloud IoT patterns)
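
To make the `ugcPosts` item concrete, here is a hedged sketch of a text-only post body and the dry-run switch. It is not taken from `src/linkedin_publisher.py`: the function names and the injectable `send` callable are hypothetical, and the author URN is a placeholder. The payload shape follows LinkedIn's documented UGC Post schema for a plain text share.

```python
import json
from typing import Callable, Optional

UGC_POSTS_URL = "https://api.linkedin.com/v2/ugcPosts"


def build_ugc_post(author_urn: str, text: str) -> dict:
    """Body for a public, text-only UGC post."""
    return {
        "author": author_urn,
        "lifecycleState": "PUBLISHED",
        "specificContent": {
            "com.linkedin.ugc.ShareContent": {
                "shareCommentary": {"text": text},
                "shareMediaCategory": "NONE",
            }
        },
        "visibility": {"com.linkedin.ugc.MemberNetworkVisibility": "PUBLIC"},
    }


def build_headers(access_token: str) -> dict:
    """Headers for the v2 REST call, including the Rest.li protocol version."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        "X-Restli-Protocol-Version": "2.0.0",
    }


def publish(author_urn: str, text: str, access_token: str, dry_run: bool = True,
            send: Optional[Callable[[str, dict, str], int]] = None) -> dict:
    """Build the post and send it unless dry-run is on.

    `send` is a hypothetical hook taking (url, headers, body) and returning
    an HTTP status; with requests it could wrap `requests.post(...)`.
    """
    payload = build_ugc_post(author_urn, text)
    if dry_run or send is None:
        return {"published": False, "payload": payload}
    status = send(UGC_POSTS_URL, build_headers(access_token), json.dumps(payload))
    return {"published": status == 201, "payload": payload}
```

Defaulting `dry_run` to `True` in this sketch means a forgotten argument logs a payload instead of publishing it; whether the project makes the same choice is not stated here.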
Unit + generator tests:

    python -m pytest tests/ -v

Full pipeline (orchestrator) without real Gemini or LinkedIn:

    python -m pytest tests/test_full_pipeline.py -v

These tests mock `PostGenerator.generate_post` and stub market research so Chroma + dry-run publish run end-to-end.

Live smoke test (real Gemini + HTTP research + Chroma; LinkedIn skipped if `--dry-run`):

    # Requires a valid GEMINI_API_KEY and LinkedIn credentials in .env
    python main.py --run-now --dry-run --day 1

If you see `API_KEY_INVALID`, create a new key in Google AI Studio and set `GEMINI_API_KEY` in `.env`.

MIT