6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
.idea

dist
RSS_Reader.egg-info

__pycache__
144 changes: 144 additions & 0 deletions Narek Arsenyan/README.md
@@ -0,0 +1,144 @@

# Feed Format

---

**`Feed:`**      Feed source title

 

**`Title:`**    Title of the news item

**`Date:`**      Publishing date of the news item

**`Link:`**      Link to the news item

 

### `Content of the news item`

 

**`Links:`**


`[1]: Link 1` (type of source)

`[2]: Link 2` (type of source)

.

.

.

`[n]: Link n` (type of source)


---

# Feed Format (JSON)

---

```json
{
  "source_title": "Feed source title",
  "source_url": "Feed source URL",
  "title": "Title of the news item",
  "date": "Publishing date of the news item",
  "link": "Link to the news item",
  "content": "Content of the news item",
  "non_media_links": [
    {
      "href": "URL of a link used in the news item",
      "link_type": "type of the link"
    }
  ],
  "media_links": [
    {
      "href": "URL of media used in the news item",
      "link_type": "type of media"
    }
  ]
}
```
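For reference, the JSON format above can be mirrored with Python dataclasses. This is a minimal sketch, not the project's `Feed` class (whose code is not part of this diff): `FeedItem` and `Link` are hypothetical names, and the round trip through `to_json`/`from_json` only assumes the field names listed above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Link:
    href: str
    link_type: str


@dataclass
class FeedItem:
    """Hypothetical stand-in for the project's Feed class, mirroring the JSON format."""
    source_title: str
    source_url: str
    title: str
    date: str
    link: str
    content: str
    non_media_links: List[Link] = field(default_factory=list)
    media_links: List[Link] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested Link dataclasses to plain dicts
        return json.dumps(asdict(self), indent=1)

    @classmethod
    def from_json(cls, text: str) -> "FeedItem":
        data = json.loads(text)
        # Rebuild the nested link objects from their dict form
        data["non_media_links"] = [Link(**link) for link in data.get("non_media_links", [])]
        data["media_links"] = [Link(**link) for link in data.get("media_links", [])]
        return cls(**data)
```

Because dataclasses compare field by field, `FeedItem.from_json(item.to_json()) == item` holds for any item built from these fields.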

---

# Feed caching

---

Previously fetched feeds are cached in the user cache directory:

**For macOS:**

>/Users/[USER]/Library/Caches/RSSReader

**For Linux:**

>/home/[USER]/.cache/RSSReader

**For Windows 7:**

>C:\Users\[USER]\AppData\Local\nmac99\RSSReader\Cache
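These paths match what `appdirs.user_cache_dir(appname, appauthor)` returns on each platform, assuming `appname` is `RSSReader` and `appauthor` is `nmac99` (the values in `utils/config.py` are not part of this diff). The sketch below approximates those rules with the standard library only, to show where each path comes from; in real code, call `appdirs` directly. `approx_user_cache_dir` is a hypothetical helper.

```python
import ntpath
import posixpath
from typing import Optional


def approx_user_cache_dir(appname: str, appauthor: str, platform: str, home: str,
                          local_appdata: Optional[str] = None,
                          xdg_cache_home: Optional[str] = None) -> str:
    """Approximate appdirs.user_cache_dir for a given platform (illustration only)."""
    if platform == "darwin":
        # macOS: ~/Library/Caches/<appname>
        return posixpath.join(home, "Library", "Caches", appname)
    if platform == "win32":
        # Windows: <LOCAL_APPDATA>\<appauthor>\<appname>\Cache
        return ntpath.join(local_appdata, appauthor, appname, "Cache")
    # Linux and other Unix-likes: $XDG_CACHE_HOME (default ~/.cache)/<appname>
    return posixpath.join(xdg_cache_home or posixpath.join(home, ".cache"), appname)
```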

## Format of caching

Fetched feeds are stored in `[DATE].json` files, where `[DATE]` is the publication date of the feed item in `YYYY-MM-DD` format.

Each `.json` file contains a JSON object whose keys are feed source URLs and whose values are lists of cached feed items, each stored as a JSON-encoded string.

**Example:**

`2022-06-11.json`

```json
{
  "https://timesofindia.indiatimes.com/rssfeedstopstories.cms": [
    "{\n \"source_title\": \"Times of India\",\n \"source_url\": \"https://timesofindia.indiatimes.com/rssfeedstopstories.cms\",\n \"title\": \"Presidential polls: Mamata invites 22 oppn CMs, leaders for joint meeting on June 15\",\n \"date\": \"2022-06-11T16:22:36+05:30\",\n \"link\": \"https://timesofindia.indiatimes.com/india/presidential-polls-mamata-invites-22-oppn-cms-leaders-for-joint-meeting-on-june-15/articleshow/92146582.cms\",\n \"content\": \"With the Rajya Sabha results exposing dissension and lack of cohesion among opposition parties, West Bengal chief minister Mamata Banerjee on Saturday reached out to her counterparts and other leaders to participate in a meeting in Delhi on June 15 to discuss the upcoming presidential polls, which are scheduled for July 18.\",\n \"non_media_links\": [\n {\n \"href\": \"https://timesofindia.indiatimes.com/india/presidential-polls-mamata-invites-22-oppn-cms-leaders-for-joint-meeting-on-june-15/articleshow/92146582.cms\",\n \"link_type\": \"link\"\n }\n ],\n \"media_links\": []\n}"
  ]
}
```
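Each value in a source's list is itself a JSON-encoded string, so reading a cache file takes two decoding steps. The sketch below shows that, plus how a publication date becomes the file name, reusing the `%a, %d %b %Y %H:%M:%S %z` format string from `CacheWorker.store_feed_in_cache`. `cache_file_name` and `decode_cache` are hypothetical helpers, not part of the package.

```python
import json
from datetime import datetime
from typing import Dict, List


def cache_file_name(pub_date: str) -> str:
    """Map an RFC 822 style pubDate to its '[DATE].json' cache file name."""
    # Same format string as store_feed_in_cache; %a and %b assume English day/month names
    parsed = datetime.strptime(pub_date, "%a, %d %b %Y %H:%M:%S %z")
    return f"{parsed.date()}.json"


def decode_cache(raw: str) -> Dict[str, List[dict]]:
    """Parse a date cache file: outer JSON object, then each stored JSON string."""
    cache = json.loads(raw)
    # The second json.loads turns every stored string back into a dict
    return {source: [json.loads(item) for item in items] for source, items in cache.items()}
```

For example, `cache_file_name("Sat, 11 Jun 2022 16:22:36 +0530")` gives `"2022-06-11.json"`, the file name used in the example above.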
 

Three types of cache checks are implemented:

1. When the number of date cache files exceeds 10, the cache file with the earliest date is deleted.
2. When the number of sources in one date cache file exceeds 10, the first source is deleted together with its cached feeds.
3. When the number of cached feeds for one source exceeds 10, the first cached feed of that source is deleted.
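The three rules above can be sketched as small pure functions. This is an illustration under assumptions: the helper names are hypothetical, and unlike `CacheWorker`, which removes a single entry per call once a count is already above 10, these helpers trim down to a hard cap of 10.

```python
from typing import Dict, List

MAX_ENTRIES = 10  # limit taken from the rules above


def evict_date_files(file_names: List[str]) -> List[str]:
    """Rule 1: return the date cache files to delete so that at most MAX_ENTRIES remain."""
    json_files = sorted(name for name in file_names if name.endswith(".json"))
    # YYYY-MM-DD names sort chronologically, so the earliest dates come first
    return json_files[:max(0, len(json_files) - MAX_ENTRIES)]


def evict_sources(cache: Dict[str, List[str]]) -> None:
    """Rule 2: drop the first-inserted sources (dicts keep insertion order)."""
    while len(cache) > MAX_ENTRIES:
        del cache[next(iter(cache))]


def evict_feeds(feeds: List[str]) -> None:
    """Rule 3: drop the first cached feeds of a single source."""
    while len(feeds) > MAX_ENTRIES:
        feeds.pop(0)
```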

 

When reading from the cache, the stored JSON strings are converted back into normalized `Feed` objects.

---

# Feeds conversion

---

Currently, there are **2 types** of conversion available:

1. HTML
2. EPUB

 

You can easily convert your feeds to these 2 formats, whether they are newly fetched or were read from cache.
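The project lists `yattag` and `EbookLib` in `requirements.txt`, presumably for the HTML and EPUB converters, whose code is not part of this diff. As a rough, dependency-free illustration of the HTML side only, the sketch below renders feed items (dicts using the keys from the JSON feed format) with the standard library's `html.escape`; `feed_items_to_html` is a hypothetical name and the markup layout is an assumption.

```python
from html import escape
from typing import Dict, List


def feed_items_to_html(items: List[Dict[str, str]]) -> str:
    """Render feed items as a minimal standalone HTML page (illustration only)."""
    parts = ["<!DOCTYPE html>",
             '<html><head><meta charset="utf-8"><title>RSS feeds</title></head><body>']
    for item in items:
        # Escape every field so text coming from the feed cannot inject markup
        parts.append("<article>")
        parts.append(f'<h2><a href="{escape(item["link"])}">{escape(item["title"])}</a></h2>')
        parts.append(f"<p><small>{escape(item['source_title'])} | {escape(item['date'])}</small></p>")
        parts.append(f"<p>{escape(item['content'])}</p>")
        parts.append("</article>")
    parts.append("</body></html>")
    return "\n".join(parts)
```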

Converted files are saved in the directory you provide. If that directory does not exist, files are saved in the user data directory:

**For macOS:**

>/Users/[USER]/Library/Application Support/RSSReader

**For Linux:**

>/home/[USER]/.local/share/RSSReader

**For Windows 7:**

>C:\Users\[USER]\AppData\Local\nmac99\RSSReader
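The fallback rule can be expressed as a small helper. This is a sketch with a hypothetical name, `resolve_output_dir`; the `fallback` argument would be the `appdirs.user_data_dir(appname, appauthor)` path listed above, and the caller is assumed to create it (for example with `os.makedirs(path, exist_ok=True)`) before writing files.

```python
import os


def resolve_output_dir(requested: str, fallback: str) -> str:
    """Return the requested directory if it exists, otherwise the fallback (user data dir)."""
    # Only an existing directory counts; a missing path or a regular file falls back
    if os.path.isdir(requested):
        return requested
    return fallback
```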
Empty file added Narek Arsenyan/__init__.py
Empty file.
6 changes: 6 additions & 0 deletions Narek Arsenyan/requirements.txt
@@ -0,0 +1,6 @@
appdirs==1.4.4
beautifulsoup4==4.11.1
EbookLib==0.17.1
requests==2.28.0
setuptools==61.2.0
yattag==1.14.0
Binary file added Narek Arsenyan/rss_reader_package/.DS_Store
Binary file not shown.
Empty file.
127 changes: 127 additions & 0 deletions Narek Arsenyan/rss_reader_package/cache_worker.py
@@ -0,0 +1,127 @@
"""
Module for CacheWorker class

exports CacheWorker class
"""
import os
import utils.config as config
from appdirs import user_cache_dir
from datetime import datetime
from json import dumps, load
from feed import Feed
from utils.exceptions import CachedFeedNotFoundError


class CacheWorker:
"""Class for working with feeds caching"""

appname = config.appname
appauthor = config.appauthor

@staticmethod
def store_feed_in_cache(feed: Feed):
"""
Function that stores feed in cache files

Args:
feed: Feed object with all necessary data
"""

date = str(datetime.strptime(feed.date, "%a, %d %b %Y %H:%M:%S %z").date())
source = feed.source_url
feed_id = feed.link
config.verbose_print("Getting user cache directory", "bold")
cache_dir = user_cache_dir(CacheWorker.appname, CacheWorker.appauthor)
if not os.path.exists(cache_dir):
os.mkdir(cache_dir)
config.verbose_print("Checking user cache directory overload", "und")
cache_files = []
for (_, __, filenames) in os.walk(cache_dir):
for file in filenames:
if file.endswith(".json"):
cache_files.append(file)
if len(cache_files) > 10:
config.verbose_print("User cache directory is overloaded. Removing old cache file", "warn")
os.remove(os.path.join(cache_dir, f"{cache_files[0]}"))
try:
with open(os.path.join(cache_dir, f"{date}.json"), "r") as cache_file:
config.verbose_print("Reading date cache file", "bold")
try:
cache = load(cache_file)
except Exception as e:
config.verbose_print(f"Warning: JSON not found ({e})", "warn")
cache = dict()
except Exception as e:
config.verbose_print(f"Warning: date cache not found ({e})", "warn")
cache = dict()
config.verbose_print("Checking date cache file overload", "und")
if len(cache.keys()) > 10:
config.verbose_print("Date cache file is overloaded. Removing old cache entry", "warn")
del cache[list(cache.keys())[0]]
try:
with open(os.path.join(cache_dir, f"{date}.json"), "w") as cache_file:
config.verbose_print("Checking if feed is already in cache", "und")
feed_is_in_cache = False
if source in cache:
for feed in cache[source]:
if feed_id in feed:
config.verbose_print("Feed is in cache already", "green")
feed_is_in_cache = True
break
if not feed_is_in_cache:
config.verbose_print("Feed not in cache. Storing feed in cache", "bold")
if len(cache[source]) > 10:
cache[source].pop(0)
cache[source].append(feed.to_json())
else:
cache[source] = [feed.to_json()]
config.verbose_print("Update date cache file", "bold")
cache_file.write(dumps(cache, indent=4))
except Exception as e:
config.verbose_print(f"Unable to open cache file ({e})", "warn")

@staticmethod
def read_feed_from_cache(date: str, source: str or None, limit: int or None) -> [Feed]:
"""
Args:
date: for which date cached feed should be read
source: specific source for feed
limit: limit of feeds that should be retrieved

Returns:
[Feed]: list of fetched feeds from cache

Raises:
CachedFeedNotFoundError
"""

config.verbose_print("Opening user cache directory", "bold")
cache_dir = user_cache_dir(CacheWorker.appname, CacheWorker.appauthor)
try:
with open(os.path.join(cache_dir, f"{date}.json"), "r") as cache_file:
try:
cache = load(cache_file)
except Exception as e:
config.verbose_print(f"Cannot read JSON from cache file ({e})", "warn")
cache = dict()
if len(cache.keys()) == 0:
raise CachedFeedNotFoundError("Error: Cached Feed not found")
formatted_cached_feeds = []
limit_final = 100
if limit is not None:
limit_final = limit
config.verbose_print("Reading feeds from cache", "bold")
if source is None:
for cached_feed in cache.values():
if len(formatted_cached_feeds) == limit_final:
break
formatted_cached_feeds.append(Feed.json_to_feed(cached_feed[0]))
else:
for cached_feed in cache[source]:
if len(formatted_cached_feeds) == limit_final:
break
formatted_cached_feeds.append(Feed.json_to_feed(cached_feed))
return formatted_cached_feeds
except Exception as e:
config.verbose_print(f"Date cache file not found ({e})", "warn")
raise CachedFeedNotFoundError("Error: Cached Feed not found")
25 changes: 25 additions & 0 deletions Narek Arsenyan/rss_reader_package/date.py
@@ -0,0 +1,25 @@
"""
Module for datetime enhancements

exports valid_date
"""
from argparse import ArgumentTypeError
from datetime import datetime


def valid_date(s: str) -> datetime.date:
"""
Parses string with datetime to date format

Args:
s: datetime containing string

Returns:
datetime.date: parsed string to date
"""

try:
return datetime.strptime(s, "%Y%m%d").date()
except ValueError:
msg = "Not a valid date: {0!r}".format(s)
raise ArgumentTypeError(msg)
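Since `valid_date` raises `argparse.ArgumentTypeError`, it is presumably meant to be passed as an argparse `type=` callable. A minimal sketch of that wiring, assuming a `--date` option and a `rss_reader` program name (the actual CLI definition is not part of this diff); the function body is repeated so the snippet runs on its own.

```python
import argparse
from datetime import date, datetime


def valid_date(s: str) -> date:
    # Same parsing rule as date.valid_date, repeated so this sketch is self-contained
    try:
        return datetime.strptime(s, "%Y%m%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"Not a valid date: {s!r}")


parser = argparse.ArgumentParser(prog="rss_reader")
# Hypothetical flag; argparse calls valid_date on the raw string and reports
# ArgumentTypeError messages as a usage error
parser.add_argument("--date", type=valid_date)
```

With this setup, `parser.parse_args(["--date", "20220611"]).date` is `date(2022, 6, 11)`, while a malformed value such as `2022-06-11` makes argparse print the "Not a valid date" message and exit.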