130 changes: 130 additions & 0 deletions EduardMalyautskiy/README.md
@@ -0,0 +1,130 @@
# RSS reader
RSS reader is a command-line utility that receives an RSS URL and prints the results in a human-readable format.

## Installation and usage
Using a Python virtual environment is recommended.
You can read how to use a virtual environment [here](https://docs.python.org/3/library/venv.html).

After downloading RSS reader, install it from the project directory with:
```
pip install .
```
and then run it with:
```
rss_reader ...
```
Alternatively, change to the directory containing the `rss_reader.py` file:
```
cd rss_reader
```
install the requirements:
```
pip install -r requirements.txt
```
and run:
```
python rss_reader.py ...
```
You can view the RSS reader help with:
```
rss_reader --help
```
or
```
python rss_reader.py --help
```
RSS reader provides the following interface:
```
usage: rss_reader.py [-h] [--version] [--json] [--verbose] [--colorize] [--limit LIMIT] [--date DATE] [--to_html [PATH]] [--to_pdf [PATH]] [--clear_cache] [source]

Pure Python command-line RSS reader.

positional arguments:
source RSS URL

optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--json Print result as JSON in stdout
--verbose Outputs verbose status messages
--colorize That will print the result of the utility in colorized mode
--limit LIMIT Limit news topics if this parameter provided
--date DATE It should take a date in YearMonthDay format. For example: --date 20191020. The cached news can be read with it. The news from the specified day will be printed out. If no news is found, an error is returned.
--to_html [PATH] Upload result to html-file in folder in PATH parameter, file name is current date with ms.
--to_pdf [PATH] Upload result to pdf-file in folder in PATH parameter, file name is current date with ms.
--clear_cache Delete data from cache DB.
```
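The interface above maps naturally onto Python's `argparse`. The following is a minimal sketch, not the project's actual parser (the `run` entry point in `rss_parser.py` is not part of this change set). The `build_parser` and `parse_date` names and the `0.9.5` version string (taken from the `dist` archive name) are assumptions. It shows how `--date` could be validated as `YYYYMMDD` and how `--to_html`/`--to_pdf` can take an optional `PATH`.

```python
import argparse
import datetime


def parse_date(value):
    """Validate a YYYYMMDD string such as '20191020' and return a date."""
    try:
        return datetime.datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid date {value!r}, expected YYYYMMDD") from None


def build_parser():
    # Hypothetical reconstruction of the documented interface.
    parser = argparse.ArgumentParser(description="Pure Python command-line RSS reader.")
    parser.add_argument("source", nargs="?", help="RSS URL")
    parser.add_argument("--version", action="version", version="%(prog)s 0.9.5")
    parser.add_argument("--json", action="store_true", help="Print result as JSON in stdout")
    parser.add_argument("--verbose", action="store_true", help="Outputs verbose status messages")
    parser.add_argument("--colorize", action="store_true", help="Print the result in colorized mode")
    parser.add_argument("--limit", type=int, help="Limit news topics if this parameter provided")
    parser.add_argument("--date", type=parse_date, help="Read cached news for a YYYYMMDD date")
    # nargs="?" makes PATH optional: the flag alone yields const (""), omitting it yields None.
    parser.add_argument("--to_html", nargs="?", const="", metavar="PATH")
    parser.add_argument("--to_pdf", nargs="?", const="", metavar="PATH")
    parser.add_argument("--clear_cache", action="store_true", help="Delete data from cache DB.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["https://example.com/rss", "--limit", "3", "--date", "20191020", "--to_html"]
    )
    print(args.limit, args.date, repr(args.to_html))  # 3 2019-10-20 ''
```

Using an empty string as `const` fits the converters in this change set: `save_to_html` and `save_to_pdf` treat a falsy `dst_path` as "use the package directory".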
By default, the output has the following structure:
```
------------------------------------------------------------
Feed: Газета ВСЁ
Title: The bridge from across the river to the city center is in a deplorable state
Date: 2022.06.16
Link: https://vse.sale/news/view/37514
Description: The recently repaired bridge is in poor condition again, and to keep your car in one piece you have to break traffic rules, namely by driving into the oncoming lane.
Media: http://vse.sale/files/news/2022/06/102375_1655367671.jpg
------------------------------------------------------------
```

With the `--json` argument, RSS reader converts the news into JSON format.
The JSON contains the following data:
```json
{
    "Date": "",
    "Description": "",
    "Feed": "",
    "Link": "",
    "LocalImgLink": "",
    "Media": "",
    "Title": "",
    "Url": ""
}
```
Description of the JSON keys:

`Date` - date of the article in `%Y.%m.%d` format, for example `"2022.06.29"`.

`Description` - description of the article.

`Feed` - title of the feed.

`Link` - link to the article.

`LocalImgLink` - path to the locally saved image.

`Media` - link to the image.

`Title` - title of the article.

`Url` - URL of the RSS feed.
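A consumer can load an article object and turn `Date` back into a date value. The sketch below assumes a single article object with the keys listed above; whether `--json` prints one object per article or a list of them is not stated here, so the sample string is illustrative only (its values are taken from the sample output above).

```python
import datetime
import json

# Illustrative article object using the documented keys.
raw = """{
    "Date": "2022.06.16",
    "Description": "",
    "Feed": "Газета ВСЁ",
    "Link": "https://vse.sale/news/view/37514",
    "LocalImgLink": "",
    "Media": "http://vse.sale/files/news/2022/06/102375_1655367671.jpg",
    "Title": "",
    "Url": ""
}"""

article = json.loads(raw)
# Date uses the %Y.%m.%d pattern, e.g. "2022.06.16".
published = datetime.datetime.strptime(article["Date"], "%Y.%m.%d").date()
print(article["Feed"], published.isoformat())
```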

## News caching
News is cached in an SQLite database, and images are saved to the `images` folder.
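Cached images are stored under their original file name. The helper below mirrors the logic of `get_file_local_path` in `converters.py`: it takes the last segment of the URL path and joins it to an `images` directory. Here the base directory is a parameter (an assumption made so the example runs on its own) rather than the package directory, and `local_image_path` is a hypothetical name.

```python
from os import path
from urllib.parse import urlparse


def local_image_path(url, base_dir):
    """Return the cache location for an image URL: base_dir/images/<file name>."""
    # urlparse separates the query string and fragment, so only the path's last segment is used.
    file_name = path.basename(urlparse(url).path)
    return path.join(base_dir, "images", file_name)


print(local_image_path("http://vse.sale/files/news/2022/06/102375_1655367671.jpg", "/tmp/rss_reader"))
```

One consequence of this scheme is that two different images that share a file name, for example from two feeds, map to the same local file.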

The SQLite database uses the following schema:

```sql
create table if not exists news (
    id integer not NULL primary key AUTOINCREMENT,
    Feed text,
    Title text,
    Link text,
    Date text,
    Description text,
    Media text,
    Url text,
    LocalImgLink text
);
```
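The `--date` option reads from this table. The following is a minimal sketch with Python's `sqlite3` module. It assumes that `Date` is stored as the `YYYY.MM.DD` text described in the JSON section and that a lookup converts the `YYYYMMDD` argument to that form. The helper names `save_article` and `news_for_date` are hypothetical, since the project's real queries are not part of this change set. An in-memory database keeps it standalone.

```python
import sqlite3

SCHEMA = """
create table if not exists news (
    id integer not NULL primary key AUTOINCREMENT,
    Feed text, Title text, Link text, Date text,
    Description text, Media text, Url text, LocalImgLink text
);
"""

COLUMNS = ("Feed", "Title", "Link", "Date", "Description", "Media", "Url", "LocalImgLink")


def save_article(conn, article):
    # Parameterized insert; keys missing from the article are stored as NULL.
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"insert into news ({', '.join(COLUMNS)}) values ({placeholders})",
        [article.get(c) for c in COLUMNS],
    )
    conn.commit()


def news_for_date(conn, date_arg, limit=None):
    # Convert the --date value (YYYYMMDD) to the assumed stored form (YYYY.MM.DD).
    stored = f"{date_arg[:4]}.{date_arg[4:6]}.{date_arg[6:8]}"
    query = "select * from news where Date = ? order by id"
    params = [stored]
    if limit is not None:
        query += " limit ?"
        params.append(limit)
    return [dict(row) for row in conn.execute(query, params)]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts keyed by column name
conn.executescript(SCHEMA)
save_article(conn, {"Feed": "Газета ВСЁ", "Date": "2022.06.16", "Link": "https://vse.sale/news/view/37514"})
print(news_for_date(conn, "20220616"))
```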

To clear the cache database, use the `--clear_cache` option.

RSS reader can also save data in HTML and PDF formats. To do so, use the `--to_html` and `--to_pdf` options with the path to an existing directory.
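According to the help text, output files are named after the current date and time with microseconds. `save_to_html` and `save_to_pdf` in `converters.py` build that name with the `%Y-%m-%d-%H-%M-%S-%f` pattern. The standalone sketch below reproduces only the naming step. `output_file_name` is a hypothetical helper, and its directory check is an illustrative addition rather than something the converters do.

```python
import datetime
import os


def output_file_name(dst_dir, extension):
    """Build '<dst_dir>/<YYYY-mm-dd-HH-MM-SS-ffffff>.<extension>' for the current time."""
    if not os.path.isdir(dst_dir):
        raise ValueError(f"{dst_dir!r} is not an existing directory")
    stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f")
    return os.path.join(dst_dir, f"{stamp}.{extension}")


print(output_file_name(".", "html"))
```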

RSS reader can colorize its output in both the default and JSON formats. To do so, use the `--colorize` option.
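This change set does not show how `--colorize` is implemented. One common approach, sketched here purely as an assumption, is to wrap the label of each `Key: value` line of the default output in ANSI SGR escape codes. Terminals that support ANSI colors then render the label in color, and the rest of the line stays unchanged. `colorize_line` is a hypothetical name.

```python
# ANSI SGR escape sequences: "1;36" = bold cyan, "0" = reset all attributes.
CYAN_BOLD = "\033[1;36m"
RESET = "\033[0m"


def colorize_line(line):
    """Color the 'Key:' prefix of a 'Key: value' line; leave other lines untouched."""
    # partition splits on the first ": ", so "https://..." values are not affected.
    key, sep, value = line.partition(": ")
    if not sep:
        return line
    return f"{CYAN_BOLD}{key}:{RESET} {value}"


for line in ["-" * 60, "Feed: Газета ВСЁ", "Date: 2022.06.16"]:
    print(colorize_line(line))
```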

## Testing
To run the RSS reader unit tests, use:
```
python -m unittest parser_tests.py
```
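The contents of `parser_tests.py` are not part of this excerpt. To illustrate the `unittest` style that the command above discovers, here is a self-contained test case for a hypothetical helper that converts the `--date` argument into the `YYYY.MM.DD` form used by the `Date` field. Neither the helper nor the test comes from the project.

```python
import datetime
import unittest


def date_arg_to_stored(value):
    """Hypothetical helper: '20220616' -> '2022.06.16'; raises ValueError on bad input."""
    return datetime.datetime.strptime(value, "%Y%m%d").strftime("%Y.%m.%d")


class DateArgTest(unittest.TestCase):
    def test_valid_date(self):
        self.assertEqual(date_arg_to_stored("20220616"), "2022.06.16")

    def test_invalid_date(self):
        with self.assertRaises(ValueError):
            date_arg_to_stored("2022-06-16")
```

Saved as a module, it runs with `python -m unittest <file>.py`, the same way as the command above.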
Binary file added EduardMalyautskiy/dist/rss-reader-0.9.5.tar.gz
Binary file not shown.
Binary file not shown.
3 changes: 3 additions & 0 deletions EduardMalyautskiy/pyproject.toml
@@ -0,0 +1,3 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
5 changes: 5 additions & 0 deletions EduardMalyautskiy/rss_reader/__init__.py
@@ -0,0 +1,5 @@
import sys
import os


sys.path.append(os.path.dirname(__file__))
4 changes: 4 additions & 0 deletions EduardMalyautskiy/rss_reader/__main__.py
@@ -0,0 +1,4 @@
from rss_parser import run

if __name__ == '__main__':
    run()
124 changes: 124 additions & 0 deletions EduardMalyautskiy/rss_reader/converters.py
@@ -0,0 +1,124 @@
from jinja2 import Environment, FileSystemLoader
import datetime
from fpdf import FPDF, HTMLMixin
from os import path
from urllib.parse import urlparse
from progress.bar import Bar


class PDF(FPDF, HTMLMixin):
    pass


def get_file_local_path(url):
    """Return the path in the package's images folder where the image at `url` is cached."""
    local_path = path.join(path.abspath(path.dirname(__file__)), 'images', path.basename(urlparse(url).path))

    return local_path


def convert_to_html(data, local=False):
    """
    Generate HTML from a list of articles.

    :param data: list of articles in dict format
    :param local: source of the data: False for the Internet, True for the cache DB
    :return: HTML as a string
    """

    env = Environment(
        loader=FileSystemLoader(path.join(path.abspath(path.dirname(__file__)), "templates")),
    )

    template = env.get_template("template.html")

    html_string = template.render(data=data, local=local)

    return html_string


def save_to_html(html_string, dst_path=None):
    """
    Save an HTML string to a timestamped .html file in the specified directory.

    :param html_string: HTML as a string
    :param dst_path: directory to save the HTML file in; defaults to the package directory
    :return: path of the created file
    """
    if not dst_path:
        dst_path = path.abspath(path.dirname(__file__))
    file_name = path.join(dst_path, datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") + '.html')
    with open(file_name, 'w', encoding='utf-8') as f:
        f.write(html_string)

    return file_name


def save_to_pdf(data, dst_path=None, local=False):
    """
    Save a list of articles to a timestamped .pdf file in the specified directory.

    :param data: list of articles in dict format
    :param dst_path: directory to save the PDF file in; defaults to the package directory
    :param local: True to take images from the local cache, False to load them from the Internet
    :return: path of the created file
    """

    if not dst_path:
        dst_path = path.abspath(path.dirname(__file__))
    templates_dir = path.join(path.abspath(path.dirname(__file__)), 'templates')

    pdf = PDF(orientation='P', unit='mm', format='A4')
    pdf.add_page()
    # Register a Unicode TTF family in all four styles so non-Latin (e.g. Cyrillic) text renders.
    pdf.add_font("Sans", style="", fname=path.join(templates_dir, "NotoSans-Regular.ttf"))
    pdf.add_font("Sans", style="B", fname=path.join(templates_dir, "NotoSans-Bold.ttf"))
    pdf.add_font("Sans", style="I", fname=path.join(templates_dir, "NotoSans-Italic.ttf"))
    pdf.add_font("Sans", style="BI", fname=path.join(templates_dir, "NotoSans-BoldItalic.ttf"))

    pdf.set_font('Sans', size=12)
    pdf.set_auto_page_break(auto=True)
    bar = Bar('Generating PDF', max=len(data))

    for d in data:  # Add each article to the PDF document
        # `or` also covers keys that are present but empty or None.
        pdf.set_font('Sans', style='B', size=12)
        pdf.write(5, txt='Feed: ' + (d.get('Feed') or 'No feed'))
        pdf.ln(10)
        pdf.write(5, txt='Title: ' + (d.get('Title') or 'No title'))
        pdf.ln(10)
        pdf.set_font('Sans', style='', size=10)
        pdf.write(5, txt='Description: ' + (d.get('Description') or 'No description'))
        pdf.ln(5)
        pdf.set_font('Sans', style='I', size=8)
        pdf.write(5, txt='Date: ' + (d.get('Date') or 'No date'))
        pdf.ln(5)
        link = d.get('Link')
        if link:
            pdf.write_html(f'<p>Article link: <a href="{link}">{link}</a></p>')
            pdf.ln(5)

        # Fall back to the bundled placeholder image when no image is available.
        media_link = path.join(templates_dir, 'NoImage.jpg')
        if local:
            media_link = d.get('LocalImgLink') or media_link
        elif d.get('Media'):
            media_link = d.get('Media')

        pdf.image(media_link, w=70, type="", link=media_link)
        pdf.ln(5)
        pdf.write_html(f'<p>Image link: <a href="{media_link}">{d.get("Media")}</a></p>')

        pdf.ln(20)
        bar.next()
    bar.finish()
    file_name = path.join(dst_path, datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") + '.pdf')
    pdf.output(file_name)

    return file_name
11 changes: 11 additions & 0 deletions EduardMalyautskiy/rss_reader/db_schema.sql
@@ -0,0 +1,11 @@
create table if not exists news (
id integer not NULL primary key AUTOINCREMENT,
Feed text,
Title text,
Link text,
Date text,
Description text,
Media text,
Url text,
LocalImgLink text
);
Empty file.
31 changes: 31 additions & 0 deletions EduardMalyautskiy/rss_reader/parser_db.py
@@ -0,0 +1,31 @@
import sqlite3
from os import path


class DataConn:
    """Context manager that opens the SQLite cache DB, creating its schema on first use."""

    def __init__(self, db_name=path.join(path.abspath(path.dirname(__file__)), 'rss_cache.db')):
        """Constructor."""
        self.db_name = db_name

    def __enter__(self):
        """
        Open a connection to the database.
        """
        db_exists = path.exists(self.db_name)
        self.conn = sqlite3.connect(self.db_name)
        if not db_exists:
            # New database file: create the news table from db_schema.sql.
            with open(path.join(path.abspath(path.dirname(__file__)), 'db_schema.sql'), 'r') as f:
                schema = f.read()
            self.conn.executescript(schema)

        return self.conn

    def __exit__(self, exc_type, exc_val, exc_tb):
        """
        Close the connection.
        """
        self.conn.close()
        # Returning False lets any exception raised inside the `with` block propagate.
        return False

15 changes: 15 additions & 0 deletions EduardMalyautskiy/rss_reader/parser_exceptions.py
@@ -0,0 +1,15 @@
class NoItemsException(Exception):
    pass


class NotRSSException(Exception):
    pass


class RequestProblem(Exception):
    pass


class IncorrectPath(Exception):
    pass
