diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index 89f6c20..b65cbf5 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -570,6 +570,7 @@ front matter...
    ```
    - **Purpose:** Tells search engines where to find the sitemap
    - **Impact:** SEO - affects how search engines crawl the site
+   - **Note:** `/sitemap.xml` is generated automatically by the `jekyll-sitemap` plugin (configured in `_config.yml`). Do not maintain a manual `sitemap.xml` file.
    - **Priority:** 🔴 CRITICAL
 
 3. **`CNAME`** (line 19)
@@ -615,6 +616,7 @@ If you need to change the website URL:
 #### Step 1: Update Critical Configuration
 - [ ] Update `_config.yml` → `url:` field
 - [ ] Verify `robots.txt` → `Sitemap:` line (generated from `{{ site.url }}{{ site.baseurl }}`)
+- [ ] Verify `/sitemap.xml` is generated (jekyll-sitemap) and includes key pages
 - [ ] Update or remove `CNAME` file if using custom domain
 
 #### Step 2: Test Locally
@@ -768,9 +770,10 @@ headline: "Text with [link](OPENINGS_LINK)"
 **Security & Protection:**
 - ✅ Enhanced 404 page with navigation buttons and Bootstrap icons
 - ✅ Comprehensive `robots.txt` with crawler access control
-  - Allows: Googlebot, Bingbot, Slurp with 10-second crawl delay
-  - Blocks: MJ12bot, AhrefsBot, SemrushBot, DotBot, PetalBot, DataForSeoBot
-  - Restricts: `/images/`, `/assets/`, `/css/`, `/js/` directories
+  - Allows: major search engines (Googlebot, Bingbot, Slurp, DuckDuckBot, etc.)
+  - Blocks: known heavy scraper / SEO bots (AhrefsBot, SemrushBot, MJ12bot, DotBot, PetalBot, DataForSeoBot)
+  - Avoids `Crawl-delay` (ignored by Googlebot and may trigger Search Console warnings)
+  - Restricts: internal build artifacts only (`/_site/`, `/bin/`)
 - ✅ Apache security configuration (`.htaccess`)
   - Directory browsing disabled
   - Security headers (X-Frame-Options, X-XSS-Protection, etc.)
@@ -968,7 +971,7 @@ headline: "Text with [link](OPENINGS_LINK)"
    - Test theme compatibility
 
 3. **Security Review:**
-   - Review `robots.txt` blocked crawlers
+   - Review `robots.txt` blocked scraper list (and confirm sitemap URL)
    - Update `.htaccess` security headers
    - Check GitHub Pages security settings
 
diff --git a/Gemfile b/Gemfile
index 345c2d1..5e62951 100644
--- a/Gemfile
+++ b/Gemfile
@@ -10,6 +10,7 @@
 # Key Dependencies:
 # - jekyll 4.4.1+: Static site generator
 # - jekyll-scholar: BibTeX bibliography support
+# - jekyll-sitemap: Automatic sitemap.xml generation for search engines
 # - webrick 1.9+: Ruby web server for local development
 #
 # Installation:
@@ -29,5 +30,6 @@
 gem "jekyll", "4.4.1"
 # gem "github-pages", "~> 232", group: :jekyll_plugins
 gem "jekyll-scholar", group: :jekyll_plugins
+gem "jekyll-sitemap", group: :jekyll_plugins
 gem "webrick", "~> 1.9"
 gem "wdm", ">= 0.1.0" if Gem.win_platform?
diff --git a/README.md b/README.md
index 5909ab8..a26f704 100644
--- a/README.md
+++ b/README.md
@@ -328,7 +328,7 @@ bundle install
 
 ## Security Features
 
-- ✅ **Crawler Protection:** `robots.txt` controls search engine access
+- ✅ **Crawler Protection:** `robots.txt` allows major search engines and blocks known heavy scraper bots
 - ✅ **Custom 404 Page:** User-friendly error handling with navigation
 - ✅ **DDoS Protection:** GitHub Pages + Cloudflare CDN
 - ✅ **Security Headers:** Content security and XSS protection
diff --git a/_config.yml b/_config.yml
index 5a7527f..e99042d 100644
--- a/_config.yml
+++ b/_config.yml
@@ -34,6 +34,10 @@ include:
   - _pages
   - robots.txt
 
+plugins:
+  - jekyll-scholar
+  - jekyll-sitemap
+
 sass:
   sass_dir: _sass
 
diff --git a/_pages/aboutwebsite.md b/_pages/aboutwebsite.md
index 222a868..26d7741 100644
--- a/_pages/aboutwebsite.md
+++ b/_pages/aboutwebsite.md
@@ -2,7 +2,6 @@
 title: "About the website"
 layout: textlay
 excerpt: "About the website."
-sitemap: false
 permalink: /aboutwebsite.html
 ---
 
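After a local build, you can spot-check two items from the DEVELOPMENT.md changes: the new checklist item "Verify `/sitemap.xml` is generated (jekyll-sitemap) and includes key pages" and the `robots.txt` review. Below is a minimal sketch. It is not part of this change, and it makes a few assumptions:

- The site was built with `bundle exec jekyll build` into Jekyll's default `_site/` directory.
- `https://example.org` stands in for the real `site.url` + `site.baseurl`.
- The helper names (`sitemap_urls`, `missing_pages`, `robots_problems`) and any key-page list are hypothetical.

```ruby
# Hypothetical post-build checks (not part of this change). Assumes the site
# was built with `bundle exec jekyll build` into Jekyll's default `_site/`.

# Extract every <loc> URL from a sitemap.xml string. A regex is adequate for
# the flat <urlset> that jekyll-sitemap writes; it is not a general XML parser.
def sitemap_urls(xml)
  xml.scan(%r{<loc>\s*([^<]+?)\s*</loc>}).flatten
end

# Return the paths from key_pages whose absolute URL (base_url + path) is
# absent from the sitemap. base_url should be site.url + site.baseurl.
def missing_pages(xml, base_url, key_pages)
  urls = sitemap_urls(xml)
  key_pages.reject { |path| urls.include?(base_url + path) }
end

# Check robots.txt against the policy described in DEVELOPMENT.md: it should
# contain a Sitemap: line with the expected URL and no Crawl-delay directive.
def robots_problems(robots, expected_sitemap)
  problems = []
  sitemap_lines = robots.lines.map(&:strip).grep(/\ASitemap:/i)
  unless sitemap_lines.any? { |line| line.split(/:\s*/, 2).last == expected_sitemap }
    problems << "missing Sitemap: #{expected_sitemap}"
  end
  problems << "contains Crawl-delay" if robots.match?(/^\s*Crawl-delay\s*:/i)
  problems
end
```

Here is an example call: `missing_pages(File.read("_site/sitemap.xml"), "https://example.org", ["/", "/aboutwebsite.html"])`. It should return an empty array now that `sitemap: false` has been removed from `_pages/aboutwebsite.md`.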